Text-Mining the Voice of the People
As recent advances in information and communication technologies continue to reshape the relationship between governments and citizens, opportunities emerge at both ends. Citizens route their voices through new electronic channels, hoping to have their opinions heard at any time from any place. Governments can listen, understand, and adapt to social changes as they unfold. But what technologies would allow governments and citizens alike to make the most of these opportunities? Incorporating such input into administrative processes is not always straightforward. Citizen input data arrives in a variety of unstructured formats, including mobile phone SMS messages and online text submissions. New approaches to the analysis of such unstructured data promise to offer solutions that amplify the citizens' voice, ensuring it is heard by government.
In recent years, best practices in governance have expanded to include Internet and wireless technologies that help governments improve the quality of services to their citizens. These technologies also leverage a new kind of citizen empowerment, allowing direct participation in political discourse. Consequently, the political vocabulary has been enriched with terms like e-government and e-democracy. E-government is typically defined as the use of innovative ICTs (such as Web-based applications and applications based on wireless devices) that aim to provide citizens and businesses with convenient, high-quality access to government information and services.6 The main players in e-government systems are politicians who enact laws; public administrators who translate the law into domain-specific regulations and processes; programmers who implement these processes by designing and implementing related systems; and citizens who are end users of the systems.21
The evolution of e-government systems is captured by different implementation stages (such as initial presence, extended presence, interactive presence, transactional presence, vertical integration, horizontal integration, and totally integrated presence).8 More on e-government, including definitions, models, technology, theories, methods, and limitations, was covered in Heeks and Bailur,11 a special issue of the European Journal of Information Systems,12 and Yildiz.24
In our digital age, e-government helps give citizens access to government information and personal benefits while promoting general compliance, efficient spending of money, integrating government-to-government information and services, and encouraging general participation in political life.6 Parts of these services are also available in developing countries,17 though related projects are not always successful, especially in Africa.10
With Internet proliferation, governments have become e-governments, citizens e-citizens, and democracies e-democracies. Even though there is no consensus, a broadly accepted definition of e-democracy is one provided by Steven Clift, founder and executive director of e-democracy.org: "any use of the Internet for political purposes either by governments, politicians, media, political parties, civil society organizations, or citizens."13
Informative background on e-democracy principles, definitions, and implementation can be found in the Council of Europe's Recommendation CM/Rec(2009)1 of the Committee of Ministers to member states on electronic democracy.3 In terms of phone lines,22 cellphones, and Internet connections,9 e-democracy involves a learning curve23 that can be steep, especially for certain socioeconomic strata in the digital divide. Studies of the influence of communication technologies on the interaction of citizens and their governments show small but significant differences across technologies.2 A short discussion in the following sections on the role of technology in e-government and e-democracy (with a focus on Internet and wireless phones) supports understanding of the two case studies explored later in this article.
The importance of the Internet as an enabling platform for e-government and e-democracy is well established through a number of research studies.3,9 Cellphones, and especially short message service (SMS) communication, were a less obvious option until recently but are an increasingly important asset in the political world. M-government, or "the utilization of all kinds of wireless and mobile technology, services, applications, and devices for improving benefits to the parties involved in e-government, including citizens, businesses, and all government units,"15 emerged in the early 2000s as an alternative to the relative passivity and infrastructure-intensive nature of Web browser-based e-government. Its use is seen in China, Kenya, Saudi Arabia, and other countries. Both Internet connectivity and cellphones empower citizens, encouraging participation in political discourse that occasionally crosses national borders, as explored in the first case study.
Among 10 recommendations proposed by a panel of experts at the second conference on "Working Together to Strengthen Our Nation's Democracy" in Washington, D.C., August 2009 (http://www.whitehouse.gov/files/documents/ostp/opengov/sond2%20final%20report.pdf), the first recommendation was "Involve the American public in meaningful deliberations about important policy questions." The suggested "next step" was "Pursue discussions with the White House Office of Public Engagement, supply relevant information (including case material), and offer opportunities to observe different approaches and to consider how they might be adapted for this purpose."7 Similar approaches for soliciting and rewarding citizen feedback are being practiced today by a number of U.S. cities and states, as well as by various national governments, including the U.K. (the 2010 Spending Challenge project partnering with Facebook) and Greece (the 2011 Make Innovation Work competition).
Acquiring and using feedback provided by citizens is a key e-democracy challenge. ICTs, including cellphones and Internet access, serve as technological mediation between political administrative systems and citizens.1 However, as citizen input often takes the form of unstructured text, administrations and other political institutions need tools that help sift through mountains of textual data, uncovering hidden value; Figure 1 outlines a framework for supporting citizen involvement by incorporating their feedback in the administrative decision-making process, with citizens expressing themselves in the form of online comments and text messages sent by mobile phones. The messages are then summarized through concept-based processing methods to extract the meaning of the collected textual data and feed it back to elected decision makers.
However, such feedback is not free of bias. One obvious and potentially significant form is self-selection, since some citizen groups may simply be more willing to participate in such feedback, trying to promote their own political agendas. A 2010 Pew Research Center report found that income, education, and age continue to be significant sources of bias in citizen participation in e-government.20 The specific algorithm used for concept-based processing is another potential source of bias. Finally, analysts' own subjective interpretation can influence the quality of summarization labels. To mitigate subjectivity bias, analysts can work in groups and follow consensus-building procedures.
Techniques and algorithms capable of summarizing high-level semantic content in unstructured text have been proposed in the context of text mining, machine learning, natural language processing, and information retrieval14; they include Latent Semantic Analysis (LSA), probabilistic LSA, Non-Negative Matrix Factorization, and Latent Dirichlet Allocation. For analyzing the two case studies here, we used LSA, aiming to understand and summarize the thematic (topical) structure in citizens' messages. LSA was introduced by researchers at Bell Communications Research as an information-retrieval method for improving search-engine query performance. It was also pioneered in the 1990s by psychologists who theorized that it mathematically describes cognitive functions of the human mind. The stream of research on LSA reached a milestone with publication of the Handbook of Latent Semantic Analysis in 2007 (see the sidebar "Latent Semantic Analysis").
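For readers who want the core idea in symbols, the following is a sketch of the usual LSA formulation as a truncated singular value decomposition. The notation is an assumption introduced here for illustration and does not come from the article: X is the weighted term-by-document matrix, k is the number of retained factors, and L_T and L_D are the term and document loadings as commonly defined in the factor-analysis reading of LSA.

```latex
X \;\approx\; X_k \;=\; U_k \,\Sigma_k\, V_k^{\top},
\qquad
L_T \;=\; U_k \Sigma_k,
\qquad
L_D \;=\; V_k \Sigma_k
```

Here U_k and V_k hold the first k left and right singular vectors and \Sigma_k is the diagonal matrix of the k largest singular values. Each column of L_T and L_D corresponds to one latent factor.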
The two case studies demonstrate how LSA helps incorporate citizen feedback into e-democracy. To that end, we employ a methodological twist in LSA (originally introduced in 2008): extraction of articulated factors of meaning through rotation of the corresponding mathematical components.19 The reason we chose this twist is the interpretability of the extracted topical factors, a property that benefits e-democracy by producing meaningful summaries of citizen feedback.
SMS Messages from Africa
In July 2009, following the G8 summit in L'Aquila, Italy, U.S. President Barack Obama visited Ghana in Western Africa. In anticipation of a presidential speech scheduled for July 11 in Accra, Ghana, the White House encouraged African citizens to send their comments and questions through SMS (see Figure 2) from July 3 to July 11, 2009; see the America.gov archive (http://www.america.gov) for the electronic version of the invitation. To facilitate access for Africans, the White House provided four short codes and four additional phone numbers. Participants could text in English or French. We obtained a sample of 902 of those SMS messages from America.gov, analyzed them using LSA, and now discuss specific details of our method implementation:
In order to represent all messages accurately, we corrected some obvious typos and spelled out some abbreviations. We manually identified typos and abbreviations through examination of the extracted vocabulary; replacement was done automatically. We removed a few duplicate messages and split up some messages that were accidentally concatenated. Of the 902 messages, 98 were originally in French, which we translated into English.
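The cleaning step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the replacement table, its entries, and the function names are all hypothetical, and the sketch covers only automatic replacement and exact-duplicate removal (splitting concatenated messages and translating French are not shown).

```python
import re

# Hypothetical replacement table; the article does not list the actual
# typos and abbreviations that were identified by hand.
REPLACEMENTS = {"u": "you", "pls": "please", "presidnt": "president"}


def normalize(message, replacements=REPLACEMENTS):
    """Lowercase a message and replace whole words found in the table."""
    def swap(match):
        word = match.group(0)
        return replacements.get(word, word)

    # Match runs of letters/apostrophes so only whole words are replaced.
    return re.sub(r"[a-z']+", swap, message.lower())


def deduplicate(messages):
    """Drop messages that are identical after normalization, keeping order."""
    seen = set()
    unique = []
    for msg in messages:
        key = normalize(msg)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique
```

Keeping the manually curated table separate from the replacement logic mirrors the division of labor in the text: people decide what counts as a typo, and the substitution itself is mechanical.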
Following best practices in analyzing textual data, we represented our collection of documents in matrix format using the Vector Space Model.18 Following standard text-processing operations, we performed raw text cleaning, tokenization, stemming, filtering, term weighting, and dimensionality reduction.4 For the rest of our analysis, we followed the LSA factor-analysis approach,5 whereby the factors represent socially constructed components of meaning.16 Since we were interested in a mid-level analysis of semantic granularity, we extracted 10 factors. As is typically done in factor analysis of numerical data, we obtained high-loading terms and high-loading documents for each factor. In order to explain the factors in the way they are articulated by humans, we performed varimax rotations of the term loadings and reciprocated the same rotations on the document loadings.
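The pipeline in this paragraph can be approximated with the sketch below. It assumes NumPy is available, uses a TF-IDF weighting as one common choice (the article does not say which weighting was used), and skips stemming and stop-word filtering. The function names are hypothetical, and the varimax routine is the widely used SVD-based iteration rather than anything confirmed by the source.

```python
import numpy as np


def tfidf_matrix(docs):
    """Build a term-by-document matrix with TF-IDF weights (assumed weighting)."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.split():
            X[index[w], j] += 1.0
    df = np.count_nonzero(X, axis=1)        # number of documents containing each term
    idf = np.log(len(docs) / df) + 1.0      # +1 keeps terms that occur everywhere
    return vocab, X * idf[:, None]


def varimax(L, max_iter=100, tol=1e-8):
    """Rotate loadings L (rows = items, columns = factors) toward simple structure.

    Returns the rotated loadings and the orthogonal rotation matrix R.
    """
    p, k = L.shape
    R = np.eye(k)
    prev = 0.0
    for _ in range(max_iter):
        LR = L @ R
        G = L.T @ (LR ** 3 - LR @ np.diag(np.sum(LR ** 2, axis=0)) / p)
        U, s, Vt = np.linalg.svd(G)
        R = U @ Vt
        crit = s.sum()
        if prev != 0 and crit / prev < 1 + tol:
            break
        prev = crit
    return L @ R, R


def lsa_factors(X, k):
    """Truncated SVD, then varimax on term loadings, mirrored on document loadings."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    term_load = U[:, :k] * s[:k]            # term loadings: U_k * Sigma_k
    doc_load = Vt[:k, :].T * s[:k]          # document loadings: V_k * Sigma_k
    rot_terms, R = varimax(term_load)
    rot_docs = doc_load @ R                 # apply the same rotation to documents
    return rot_terms, rot_docs
```

For the first case study, a call such as `lsa_factors(X, 10)` would correspond to extracting 10 factors. Because R is orthogonal, the rotation redistributes loadings across factors without changing how much of each term or document the k factors explain together.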
We interpreted and labeled the resulting rotated factors through a co-examination of each factor's high-loading terms and documents; Table 1 lists the top-loading terms for factor F10.1, and Table 2 lists selected high-loading documents for the same factor. This examination showed that the factor primarily discusses the White House's choice of Ghana, as opposed to Nigeria, for Obama's visit (highest loadings), making passing reference to sending a message to African leaders (lower loadings). Our label for factor F10.1 is "Choice of Visiting Ghana." Following a similar examination of high-loading terms and documents, we labeled all 10 factors (see Figure 3). The number of high-loading documents served as a measure of importance for each topic theme (factor). This message count includes cross-loading documents. We ranked topic factors by importance, that is, in descending order of SMS document count.
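The importance ranking described above amounts to counting high-loading documents per factor and sorting. A minimal sketch follows; the loading threshold is a hypothetical parameter, since the article does not state how "high-loading" was determined, and the use of absolute values is likewise an assumption.

```python
def rank_factors(doc_loadings, labels, threshold):
    """Rank factors by the number of documents that load highly on them.

    doc_loadings: one row per document, one value per factor.
    A document that loads highly on several factors is counted once for
    each of them, so cross-loading documents are included in the counts.
    """
    counts = [0] * len(labels)
    for row in doc_loadings:
        for f, value in enumerate(row):
            if abs(value) >= threshold:
                counts[f] += 1
    # Descending order of document count, as in a topic-importance chart.
    return sorted(zip(labels, counts), key=lambda pair: pair[1], reverse=True)
```

Because cross-loading documents are counted more than once, the counts can sum to more than the number of messages in the collection.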
The topic-importance chart in Figure 3 is one of the main results of our LSA treatment of the African SMS messages. Such a chart is essential in helping leaders summarize and understand the voice of their people; for each factor (topic), Table 3 lists a representative high-loading message, bridging the semantic abstraction grain, represented by the topic labels, with individual document grain, represented by example messages. Table 3 also lists classification accuracy based on a random sample of 100 messages and consensus between two of the authors. In terms of e-democracy, such example messages can help leaders stay connected with individual citizens.
Further analysis of our data by drilling across the country dimension provides a better understanding of how the importance of the identified topics (factors) varies by country; Figure 4 shows countries that contributed at least four messages for two selected factors: The "help Kenya change" request was brought up mainly by Kenyan citizens, whereas the "hello your Excellency" greeting was more dispersed around the African continent. Such analysis highlights the mass-customization aspect of e-democracy, helping leaders localize political dialogue with their citizens and become aware of local priorities.
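Drilling across the country dimension can be sketched as a grouped count over the high-loading messages of one factor. The sketch assumes each message carries a country label (how the authors obtained it is not stated), and the loading threshold is again a hypothetical parameter. The default cutoff of four messages follows the text.

```python
from collections import Counter


def messages_by_country(doc_loadings, countries, factor, threshold, min_messages=4):
    """Count high-loading messages per country for a single factor.

    Only countries contributing at least min_messages are kept.
    """
    counts = Counter(
        country
        for row, country in zip(doc_loadings, countries)
        if abs(row[factor]) >= threshold
    )
    return {c: n for c, n in counts.most_common() if n >= min_messages}
```

A topic concentrated in one country yields a single dominant entry, while a widely shared topic spreads its counts over many countries.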
In 2009 President Obama launched the "Securing Americans' Value and Efficiency [SAVE] Award" (http://www.whitehouse.gov/save-award), aiming to make the U.S. government more effective and efficient at spending taxpayer money. Initially it sought ideas from federal employees, but the program was expanded to include any person who could log onto the SAVE Award Web page and wished to contribute. The winner, Trudy Givens of Portage, WI, a Bureau of Prisons employee who proposed eliminating the printing and mailing of thousands of Federal Register copies to employees who don't need them since they are available online, was invited to the White House (http://www.whitehouse.gov/blog/2011/01/28/welcoming-our-2010-save-award-winner). All finalists and all SAVE Award submissions were sent to the agencies for potential action and inclusion in the President's FY2012 federal budget (http://www.whitehouse.gov/blog/2010/11/15/and-top-saver).
Overall, the program received more than 18,000 ideas. Using an offline browser, we then downloaded 16,537 submitted ideas. The analysis presented here focuses on the Department of Homeland Security, about which 1,481 contributing ideas were submitted. For our concept-based analysis of this dataset we used the same LSA method we described in the first case study. Our results include a topic importance chart (see Figure 5) in which the most popular money-saving idea theme was consolidation of DHS-wide operations and IT systems. Other important idea themes included cutting full-time employee work hours and a number of "green policy" ideas (such as electronic documents to save paper, mandatory recycling programs, and motion-sensor switches to save energy).
We analyzed 902 SMS messages sent to President Obama by African people in July 2009 with reference to his visit to Ghana (case study 1) and 1,481 ideas submitted by American citizens to the SAVE 2010 award, concerning the U.S. Department of Homeland Security (case study 2). In each, our analysis demonstrated the effectiveness of LSA as a concept-based method for processing unstructured text and closing the loop in the political dialogue between leaders and citizens.
Concept-based methods for processing unstructured text have improved and are today capable of contributing to the political dialogue between leaders and citizens. The two case studies demonstrate how LSA and similar methods can benefit e-democracy by distilling the voice of the people. It is then up to the leaders to harness the wisdom of the political crowds and to the citizens to seek a higher level of participatory democracy. Democracy is the prerequisite, of course, with the people confident enough to speak freely. In any case, the technology is ready to help.
The authors thank William May and Kimberly Harrington of the U.S. State Department for their encouragement in the early stages of this project.
3. Council of Europe. Recommendation CM/Rec (2009)1 of the Committee of Ministers to member states on electronic democracy. Council of Europe, Strasbourg, France, Oct. 2009; https://wcd.coe.int/ViewDoc.jsp?id=1410627
4. Coussement, K. and Van Den Poel, D. Improving customer complaint management by automatic email classification using linguistic style features as predictors. Decision Support Systems 44, 4 (Mar. 2008), 870–882.
7. Fung, A., Goldman, J., McCoy, M., and Wright, B. Working Together to Strengthen Our Nation's Democracy: Ten Recommendations. AmericaSpeaks, Ash Institute for Democratic Governance and Innovation, Everyday Democracy, and Demos, Washington, D.C., 2009; http://www.whitehouse.gov/files/documents/ostp/opengov/sond2%20final%20report.pdf
15. Kushchu, I. and Kuscu, H.M. From E-government to M-government: Facing the Inevitable, White Paper. Mobile Government Lab, May 2004; http://www.mgovlab.org
20. Smith, A. Government Online: The Internet Gives Citizens New Paths to Government Services and Information. Pew Research Center, Washington, D.C., Apr. 27, 2010; http://pewinternet.org/Reports/2010/E-Government.aspx
21. United Nations. United Nations e-Government Survey 2008: From e-Government to Connected Governance, M. Mimicopoulos, Ed. United Nations, New York, 2008; http://unpan1.un.org/intradoc/groups/public/documents/un/unpan028607.pdf
Figure 5. Labels and high-loading document counts for the 10 topic factors extracted by applying LSA to the 1,481 ideas submitted with reference to the Department of Homeland Security in the second case study.
©2012 ACM 0001-0782/12/02 $10.00
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and full citation on the first page. Copyright for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or fee. Request permission to publish from email@example.com or fax (212) 869-0481.
The Digital Library is published by the Association for Computing Machinery. Copyright © 2012 ACM, Inc.