(Discourse Approaches to Politics, Society and Culture 55) Bertie Kaal (ed.), Isa Maks (ed.),...


  • From Text to Political Positions

  • Volume 55

    From Text to Political Positions. Text analysis across disciplines
    Edited by Bertie Kaal, Isa Maks and Annemarie van Elfrinkhof

    Discourse Approaches to Politics, Society and Culture (DAPSAC)

    The editors invite contributions that investigate political, social and cultural processes from a linguistic/discourse-analytic point of view. The aim is to publish monographs and edited volumes which combine language-based approaches with disciplines concerned essentially with human interaction: disciplines such as political science, international relations, social psychology, social anthropology, sociology, economics, and gender studies.

    For an overview of all books published in this series, please see http://benjamins.com/catalog/dapsac

    General Editors
    Ruth Wodak, Andreas Musolff and Johann Unger
    Lancaster University / University of East Anglia / Lancaster University

    Advisory Board

    Christine Anthonissen, Stellenbosch University
    Michael Billig, Loughborough University
    Piotr Cap, University of Łódź
    Paul Chilton, Lancaster University
    Teun A. van Dijk, Universitat Pompeu Fabra, Barcelona
    Konrad Ehlich, Free University, Berlin
    J.R. Martin, University of Sydney
    Jacob L. Mey, University of Southern Denmark
    Greg Myers, Lancaster University
    John Richardson, Loughborough University
    Luisa Martín Rojo, Universidad Autónoma de Madrid
    Christina Schäffner, Aston University
    Louis de Saussure, University of Neuchâtel
    Hailong Tian, Tianjin Foreign Studies University
    Joanna Thornborrow, Cardiff University
    Sue Wright, University of Portsmouth

  • From Text to Political Positions: Text analysis across disciplines

    Edited by

    Bertie Kaal, Isa Maks and Annemarie van Elfrinkhof
    VU University Amsterdam

    John Benjamins Publishing Company
    Amsterdam / Philadelphia

  • Library of Congress Cataloging-in-Publication Data

    From Text to Political Positions : Text analysis across disciplines / Edited by Bertie Kaal, Isa Maks and Annemarie van Elfrinkhof.

    p. cm. (Discourse Approaches to Politics, Society and Culture, ISSN 1569-9463 ; v. 55)
    Includes bibliographical references and index.
    1. Discourse analysis--Political aspects. 2. Public communication--Political aspects. 3. Mass media--Political aspects. 4. Communication in politics. I. Kaal, Bertie.
    P302.77.F76 2014
    401.41--dc23 2014004430
    ISBN 978 90 272 0646 6 (Hb ; alk. paper)
    ISBN 978 90 272 7034 4 (Eb)

    © 2014 John Benjamins B.V.
    No part of this book may be reproduced in any form, by print, photoprint, microfilm, or any other means, without written permission from the publisher.

    John Benjamins Publishing Co., P.O. Box 36224, 1020 ME Amsterdam, The Netherlands
    John Benjamins North America, P.O. Box 27519, Philadelphia PA 19118-0519, USA

    The paper used in this publication meets the minimum requirements of the American National Standard for Information Sciences, Permanence of Paper for Printed Library Materials, ANSI Z39.48-1984.

  • Table of contents

    Foreword vii

    Chapter 1. Positions of parties and political cleavages between parties in texts 1
    Jan Kleinnijenhuis and Wouter van Atteveldt

    Part I. Computational methods for political text analysis

    Introduction 23
    Piek Vossen

    Chapter 2. Comparing the position of Canadian political parties using French and English manifestos as textual data 27
    Benoît Collette and François Pétry

    Chapter 3. Leveraging textual sentiment analysis with social network modeling: Sentiment analysis of political blogs in the 2008 U.S. presidential election 47
    Wojciech Gryc and Karo Moilanen

    Chapter 4. Issue framing and language use in the Swedish blogosphere: Changing notions of the outsider concept 71
    Stefan Dahlberg and Magnus Sahlgren

    Chapter 5. Text to ideology or text to party status? 93
    Graeme Hirst, Yaroslav Riabinin, Jory Graham, Magali Boizot-Roche, and Colin Morris

    Chapter 6. Sentiment analysis in parliamentary proceedings 117
    Steven Grijzenhout, Maarten Marx, and Valentin Jijkoun

    Chapter 7. The qualitative analysis of political documents 135
    Jared J. Wesley

    Part II. From text to political positions via discourse analysis

    Introduction 163
    Veronika Koller

    Chapter 8. The potential of narrative strategies in the discursive construction of hegemonic positions and social change 171
    Nicolina Montesano Montessori

    Chapter 9. Christians, feminists, liberals, socialists, workers and employers: The emergence of an unusual discourse coalition 189
    Anja Eleveld

    Chapter 10. Between the Union and a United Ireland: Shifting positions in Northern Ireland's post-Agreement political discourse 207
    Laura Filardo-Llamas

    Chapter 11. Systematic stylistic analysis: The use of a linguistic checklist 225
    Maarten van Leeuwen

    Chapter 12. Participation and recontextualisation in new media: Political discourse analysis and YouTube 245
    Michael S. Boyd

    Part III. Converging methods

    Introduction 271
    Alan Cienki

    Chapter 13. From text to the construction of political party landscapes: A hybrid methodology developed for Voting Advice Applications 275
    André Krouwel and Matthew Wall

    Chapter 14. From text to political positions: The convergence of political, linguistic and discourse analysis 297
    Annemarie van Elfrinkhof, Isa Maks, and Bertie Kaal

    About the authors 325

    Index 331

  • Foreword

    There is clearly a need for the identification and positioning of political parties and their stances. This need is both of a societal and of an academic nature. Socially it seems that traditional voting patterns have dissolved and information sources have diversified exponentially. Voters no longer vote primarily according to their community identity, such as class or religion (Franklin et al. 1992), but they make their choice independently. Accessible information about party positions on the social structure in general as well as on specific issues is therefore imperative to inform the public so that they can cast an informed vote.

    Academically, identifying policy preferences and positions of political parties and their actors is primarily a concern for political scientists. When analysing the behaviour of political parties over time or across countries, (political) scientists need reliable and valid measures to establish positions. The search for such estimates has resulted in a range of analytical methods, from expert surveys, opinion polls and roll-call behaviour to various forms of content analysis (Krouwel and Van Elfrinkhof 2013). Each of these methods has its advantages and disadvantages, resulting in more or less reliable, valid and transparent measurements that might be improved by adopting cross-disciplinary research designs.

    This volume contributes to the literature of party positioning by focusing on political text analyses with a wide variety of approaches from political science, linguistics and discourse analysis. The central question is how to identify verbal expressions of politically motivated ideas in texts. Political texts in particular function to articulate ideas persuasively to a broad audience. For this reason, political texts reflect the goals and policies induced by the norms and values of a party's ideology and identity. As such, texts are an excellent source of political positions and their rationale. Furthermore, the sheer abundance and availability of political texts make them an attractive source for analysis, especially since collections of these texts are often available digitally. The chapters in this volume discuss methods to analyse a variety of political text genres that are designed to establish and communicate party positions, including newspaper articles, election manifestos, campaign and parliamentary speeches, online documents, blogs and interviews (see Table 1).

    The purpose of this volume is to discover how stance is encoded in political texts and how such characteristics can best be gauged on political dimensions. Simply reading and interpreting a text will not do, as meaning making and the social implications of policies are subject to their dynamic and diverse contexts, as analysed in qualitative studies. Another approach is to analyse large amounts of political text to find variations in content, linguistic and discursive aspects that can be linked to positions on political dimensions. Current types of analysis apply qualitative and quantitative methods, or combinations thereof. Such attempts at modelling political text analysis for party positioning reveal the complexity of sense making. Both qualitative and quantitative approaches are pursued in many of the contributions in this volume.

    We have tried to divide the variety of approaches into meaningful parts that give an impression of the division between qualitative and quantitative methods, as well as attempts to converge them with reflections on the flaws and advantages of mixing methods. As an introduction to the problems addressed, Kleinnijenhuis and Van Atteveldt (Chapter 1) describe the evolution of political text analysis from the early concept of political thought (Machiavelli) to modern advances in the domains of knowledge representation, natural language parsing, automated content analysis and semantic-web technologies. That chapter then focuses on conceptualisation through the logic and structure of semantic networks and discusses the sophisticated NET-Method, developed at The Network Institute (VU University Amsterdam). Part I, introduced by Piek Vossen, is a collection of discussions about automated, quantitative methods to identify characteristics and variation in political texts. Part II, introduced by Veronika Koller, focuses on qualitative methods, some of which apply quantitative analysis as evidence of the existence of linguistic and discursive aspects of meaning making. Part III is introduced by Alan Cienki and includes two chapters on projects attempting to mix and match qualitative and quantitative methods for better results and best practice. The purpose of this book is to present the variety of considerations and applications of text analysis for political party positioning, and to inform scholars of the wide range of available methods that have their advantages and disadvantages under different conditions. More ambitiously, this book aims to trigger cross-disciplinary research in which methods are combined, as a way to refine research methods for the analysis of stance taking and to achieve more accurate results overall.

    However, attempts to converge quantitative and qualitative methods of content analysis run across the three parts, as shown in Table 1. For instance, the chapters by Kleinnijenhuis and Van Atteveldt, Collette and Pétry, and Krouwel and Wall show the ability and the promise of quantitative methods for the analysis of large amounts of text vis-à-vis the in-depth analysis of qualitative methods. In linguistics, various quantitative methods have been developed for analysing expressions of subjectivity and sentiment in party documents as well as their rhetorical affordances. The chapters by Gryc and Moilanen, Dahlberg and Sahlgren, and Grijzenhout, Marx and Jijkoun show how adding knowledge of linguistic features can enhance the potential of quantitative methods. The focus of discourse analysis is on extracting constructions of meaning and analysing the persuasive nature of political discourse in its social context. Montesano Montessori, Eleveld, Filardo-Llamas, and Van Leeuwen show how linguistic formulations and conceptualisations relate to the social context.

    Table 1. Qualitative and quantitative methods and genres across chapters.

    Ch. | Authors | Data source | Methodology | Quant / Qual
    1 | Kleinnijenhuis & Van Atteveldt | Newspapers | NET Method | Quantitative
    2 | Collette & Pétry | Election manifestos | Wordfish, Wordscores | Quantitative
    3 | Gryc & Moilanen | Political blogs | Entity-centric document-level sentiment classification | Quantitative
    4 | Dahlberg & Sahlgren | Blogs, websites political parties | Random Indexing | Quantitative and Qualitative
    5 | Hirst, Riabinin, Graham, Boizot-Roche & Morris | Parliamentary proceedings | Machine Learning | Quantitative
    6 | Grijzenhout, Marx & Jijkoun | Parliamentary proceedings | Sentiment analysis, Machine Learning | Quantitative
    7 | Wesley | Political texts | Qualitative document analysis | Qualitative
    8 | Montesano Montessori | Speeches and declarations | Critical Discourse/narrative analysis | Qualitative
    9 | Eleveld | Interviews and other documents | Discourse/narrative analysis | Qualitative
    10 | Filardo-Llamas | Speeches | Critical discourse analysis | Qualitative
    11 | Van Leeuwen | Parliamentary speeches | Stylistic analysis | Qualitative
    12 | Boyd | Text comments on political videos | Critical discourse analysis | Quantitative and Qualitative
    13 | Krouwel & Wall | Election manifestos, party documents | Content analysis | Quantitative and Qualitative
    14 | Van Elfrinkhof, Maks & Kaal | Election manifestos | Wordscores, subjectivity analysis, Critical Discourse Analysis | Quantitative and Qualitative


    Whether assessing political texts quantitatively or qualitatively, researchers are confronted with strategic and creative language use: "politicians are creative language users, and political language as a result changes fast" (Grijzenhout et al., page 130), and these functional aspects also deserve attention. Filardo-Llamas stresses the ambiguity and vagueness of political language, which allow for multiple interpretations of the same texts that can divide or unify parties, as was the case with the peace settlement in Northern Ireland. The possibility of multiple interpretations is also the topic of Eleveld's chapter, which discusses how one common policy narrative can be supported by different political parties and lead to unusual coalitions. Hirst et al. discuss unexpected and hidden structures, where polarization is not an ideological opposition, but rather a pragmatic attack and defense dichotomy. These examples of political text analyses give evidence of the researchers' challenges and opportunities.

    With a view to making the collaboration between disciplines tangible, we organised the workshop From Text to Political Positions at the VU University Amsterdam, in April 2010. This led to discussions on work in progress, on the challenges of the complexity of the material to be analysed for relations between language and politics, and on finding solutions to bringing qualitative and quantitative methods together. This volume includes a selection of the projects presented at the workshop that have since been revised and updated. The result is a representative variety of approaches to political text analysis across genres, languages, theories and methods. It brings together state-of-the-art political, discourse and linguistic analytical models that make links between linguistics and the social sciences. The chapters present top-down and bottom-up methods and applications using a variety of texts, each with their own communicative functions. Each chapter reflects on advantages and disadvantages of the methods that were chosen in view of their relation to the data and the reliability of their results.

    Developments in the field of political text analysis have been recorded in this volume with the aim to take notice of the challenges of cross-disciplinary analytical models as well as the complexity of the relationship between content, language use and different motivations for political action.

    Amsterdam, November 2013
    Bertie Kaal, Isa Maks and Annemarie van Elfrinkhof


    References

    Franklin, M., T. Mackie, and H. Valen. 1992. Electoral Change: Responses to Evolving Social and Attitudinal Structures in Western Countries. Cambridge: Cambridge University Press.

    Krouwel, A. and A. van Elfrinkhof. 2013. Combining strengths of methods of party positioning to counter their weaknesses: the development of a new methodology to calibrate parties on issues and ideological dimensions. Quality & Quantity, DOI: 10.1007/s11135-013-9846-0.

    Acknowledgements

    The research project From Text to Political Positions and the workshop from which these chapters resulted were made possible by the VU institute for interdisciplinary research, The Network Institute (formerly the Center for Advanced Media Research Amsterdam, CAMeRA). Other sponsors of the workshop were Kieskompas and the Nederlandse Nieuwsmonitor. We would like to thank all participants for their presentations and lively discussions. We are grateful to the contributors to this volume for their patience and open-mindedness in making revisions and following up on the editors' and reviewers' suggestions. Special thanks are due to Alan Cienki for his critical comments and meticulous editing.

  • Chapter 1

    Positions of parties and political cleavages between parties in texts

    Jan Kleinnijenhuis and Wouter van Atteveldt
    VU University Amsterdam, Department of Communication Science

    If humans are political animals, and language is their most versatile communication tool, then the old question about what should be extracted from political texts to understand politics deserves on-going attention. Recent advances in the information and communication sciences have resulted in new means to process political texts, especially advances in the domains of knowledge representation, natural language parsing, automated content analysis and semantic web technologies. However, applying these innovations to uncover what matters in politics is far from trivial. The aim of this chapter is to give an introduction to an analysis of political texts aimed at inferring their political meaning.

    First, a few concepts from the history of political thought will be reviewed so as to arrive at a feeling for what should be revealed by means of an analysis of political texts. Next, three sets of Natural Language Processing tools will be distinguished to analyze political texts: tools to assess whether a concept (or object) occurs, tools to assess whether an (asymmetric) relationship between concepts occurs, and tools to assess the nature of the relationship between concepts. These three sets of tools relate to advances in ontology construction and entity recognition, advances in statistical associations and network theory, and advances in part-of-speech tagging and grammar parsing respectively. The aim is to show how automation of the three sets of tools could be employed in the near future and could give reliable and valid answers to frequently asked questions in political communication.

    Political language and content analysis

    Webster's Unabridged Dictionary defines politics in various ways. Politics is the total complex of interacting and usually conflicting relations between men living in society; it is concerned with governing or with influencing or winning and holding control, and with actions, practices or policies to achieve goals with respect to issues. In short, politics is usually about achieving goals by means of policies, about conflict or cooperation, and about winning or losing.

    Machiavelli (1469–1527) observed that princes can acquire new principalities by means of their own virtue, or by fortune. Their virtue prescribes them to take great pains to satisfy the people and make them content with laws that serve their issue interests, but also to raise prestige by the art of conflict and cooperation, thus by revealing themselves without any reservation in favor of one side against another, with the risk of vigorous wars. Fortune is what befalls leaders, such as unexpected natural disasters, or popular support. Virtue can be practiced whereas fortune can only be anticipated. Machiavelli's primary concern is political language rather than politics itself, that is, how others would speak about the acts of actors and about what befalls them. Even winning a war was perceived by Machiavelli as the art of inspiring the army with confidence in itself and in its general, thus as the art of inducing others to speak about themselves in relationship to their leader. Interpreted from the contemporary perspective of this chapter, Machiavelli already elucidated that political language centers around four types of statements:¹

    Virtue: what an actor can do:
    1. take an issue position (pro or con a cause, e.g., poverty, crime, unemployment);
    2. cooperation or conflict with support or criticism from other actors (e.g., building a coalition government, waging war).

    Fortuna: what befalls an actor:
    3. real world developments with regard to issues (e.g., famine, unemployment);
    4. actors' success or failure, gains or losses (e.g., gains and losses in wars, opinion polls, or in political debates).

    Machiavelli is considered the forerunner of the age of absolutism, but his basic ideas are equally important for a democracy. In a democracy, each of these four types of statements is important to attract voters. Voters choose a party when they agree with the issue positions of that party (Tomz and Van Houweling 2008; Westholm 1997), and even when the issues on which that party holds a strong reputation dominate the campaign (Budge and Farlie 1983; Hayes 2005; Petrocik 1996). Voters choose a party with a strong profile in terms of attacks and criticisms from political adversaries and support from within and from societal actors (Shah et al. 2002). They prefer incumbent parties in case of favorable real world developments (e.g., economic growth) and a challenger in case of deteriorating real world developments (e.g., unemployment) (Schumpeter 1950 [1943]), but in the absence of objective knowledge they are susceptible to the positive or negative portrayal of these developments in the media (Hetherington 1996; Sanders and Gavin 2004; Soroka 2006). Of these four driving factors, the attribution of successes and failures to parties in the media is the most important predictor of shifts in party preferences (Kleinnijenhuis et al. 2007).

    1. This interpretation of Machiavelli is based especially on Chapters 6–7, 17–18, 21 and 25 of The Prince and Chapters 33–35 of The Discourses.

    Rather unsuccessful attempts have been made to proclaim one of these four types of statements as the most fundamental one. In their seminal study of the historical origins of party systems and voter alignments, Lipset and Rokkan observed that parties, as well as voters' loyalty to these parties, rest on old conflicts and cleavages between parties (Lipset and Rokkan 1967), for example between workers and owners (between the rich and the poor, the haves and the have-nots), or between permissive, secular, urban, orthodox religious and rural groups (e.g., in the Dutch context the rekkelijken versus preciezen in the sixteenth and seventeenth century). From the perspective of Lipset and Rokkan, disagreement about issue positions mirrors the historical dividing lines of conflicts or cleavages between actors. A central claim in Marxist theories is that political-issue positions simply mirror class cleavages, i.e., one particular type of cleavage. An opposed view is apparent from the literature about issue voting and election campaigns. Disagreement about issues tends to be seen as the heart of politics. Attack politics and news about conflicts between parties are either regarded as a mirror of issue (dis)agreement, or as degenerated forms of political communication that merely enhance political cynicism (Ansolabehere and Iyengar 1995; Cappella and Jamieson 1997; Patterson 1993). The intellectual origins of this opposed view date back to Alexis de Tocqueville's ideas about political apathy (De Tocqueville 1951 [1835]). Jon Elster (2009) shows that De Tocqueville's thoughts can be rewritten in the language of current social-science methodology as reciprocal causal assertions, as two-way interactions, between developments at the societal level (e.g., issue positions of parties, press coverage) and developments at the level of citizens (beliefs, desires, preferences, behaviors). De Tocqueville's ideas are also at the heart of Elisabeth Noelle-Neumann's (1980) theory of the spiral of silence, which maintains that dominant issue positions in the media make citizens with opposing views feel less free to express their opinions, which in turn reinforces these dominant issue positions. The comparative research literature on party manifestos (Budge, Robertson, and Hearl 1987; cf. the chapters by Collette and Pétry; Van Elfrinkhof, Maks and Kaal; and Krouwel and Wall in this volume) also contributes to the view that issue positions drive cooperation and conflict between parties. As Van Elfrinkhof, Maks and Kaal point out (this volume), a striking feature of the genre of party manifestos is that they deal solely with the issue positions of a single party, rather than, for example, with the conflicts and the teamwork within that party that produced a particular manifesto, or with the pattern of conflicts and cooperation that will emerge in election campaigns, in government coalition negotiations, or within the next government coalition. Remarkably little longitudinal empirical research has been done, however, to verify whether one fundamental type of statement is systematically mirrored in other types of statements.

    The simple truth may be that a single causal order is impossible or unlikely. Disagreement about a topic is sometimes the result of conflicts between actors, but conflicts between actors may also follow from disagreement about a topic. Sometimes the two may be unrelated, for example when a political party attacks an ideologically similar party, or induces third parties to attack or to neglect an ideologically similar party, precisely because ideologically similar parties are serious electoral competitors. Winning or losing may be the outcome of conflicts, but acquiring a reputation as a loser may also cause conflicts, as Machiavelli stated over and over again. Longitudinal research is needed to answer the important research question as to which causal order is likely under which conditions. However, to arrive at relevant data for this type of research, new tools are required to analyze the emphasis, turns, shifts and moves in political language, both comprehensively and in depth.

    Meta-language about political language

    More than two thousand years ago the ancient Greeks invented political theatre, political dialogues and democracy. We still use their concepts to talk about political language, such as sign and signifier, symbols, and last but not least the subject-object-predicate triplet. Although the Greek concepts of subject, object and predicate are ambiguous and outdated from the point of view of today's theories about grammar, logic and semantics, they are still valid to discuss what language is all about.

    The Australian linguist Robert M. W. Dixon observes that in all languages sentences deal with a subject: who or what directs its action or energy towards a target or object (Dixon 1992, 2005). The nature of this action, or energy, is a two-place predicate. The subject and the object are either animate entities, which we will label actors, or circumstances. Other non-animate entities will be labeled as issues here, although in non-political contexts labels such as variables, circumstances or states of affairs would presumably be more intuitive. Subject-predicate-object triples resemble the asymmetric xRy-triples in relational logic, which was pointed out succinctly by Ludwig Wittgenstein (1889–1951) in his famous statement: "Namen gleichen Punkten, Sätze Pfeilen, sie haben Sinn" ("names resemble points; propositions resemble arrows, they have sense" [Wittgenstein 1922: 3.144]). The boundaries of one's propositions would also be the boundaries of one's world, according to the early Wittgenstein, but in his later work he was precisely interested in the exchange and the misunderstandings between different world views (Wittgenstein 1953), thereby recognizing the source of propositions as an integral part of the proposition itself (effectively s: xRy, instead of merely xRy, in which s represents the source, x the subject, R the predicate, and y the object).
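    The s: xRy notation can be made concrete with a small data structure. The following is a minimal sketch, not the authors' implementation: the class name `Statement`, the helper `group_by_source` and the example sources and parties are all hypothetical, chosen only to show why keeping the source s separate from the triple xRy matters when two sources disagree.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """One source-qualified proposition, s: xRy (illustrative names only)."""
    source: str     # s: who asserts the proposition
    subject: str    # x: the actor or issue the action starts from
    predicate: str  # R: the two-place predicate linking x and y
    obj: str        # y: the target of the action

    def core(self):
        """Return the bare xRy triple, dropping the source."""
        return (self.subject, self.predicate, self.obj)


def group_by_source(statements):
    """Collect the bare triples asserted by each source."""
    grouped = {}
    for st in statements:
        grouped.setdefault(st.source, []).append(st.core())
    return grouped


# Hypothetical example: two sources describe the same subject and object differently.
statements = [
    Statement("Newspaper A", "Party P", "supports", "tax cuts"),
    Statement("Newspaper B", "Party P", "opposes", "tax cuts"),
]
views = group_by_source(statements)
```

    Without the source field, the two triples would simply contradict each other; with it, they describe two world views that can be compared.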

    Taking the analysis of propositions one step further, Fritz Heider (1896–1988), another Austrian who moved into the Anglo-Saxon world, developed balance theory in a remarkably short paper. Balance theory also deals with triangles of three statements and assumes that the third relationship can usually be predicted correctly from the first two on the basis of the principle that friends of friends, but also enemies of enemies, tend to be friends, whereas enemies of friends, as well as friends of enemies, will usually be enemies. Thus, if x dislikes y whereas y likes z, then the expectation is raised that x dislikes z as well. In Heider's notation, in which ~L means the opposite of the liking-relation L, this would be: x~Ly and yLz, therefore x~Lz. Balance theory, and later theories of cognitive consistency, such as congruence theory and the theory of cognitive dissonance, hold that people will try to avoid cognitive representations that violate balance by a number of Freudian escape routes, such as the negation of information, blaming the messenger, or the rationalization of previous choices with ingenious new arguments (Severin and Tankard 2005).
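    The balance principle can be sketched as sign arithmetic. Coding L as +1 and ~L as -1 is an assumption of this illustration rather than Heider's own notation, and the function names are hypothetical; under that coding the expected third relation is the product of the first two, and a triangle is balanced when the product of all three signs is positive.

```python
def predict_third(sign_xy: int, sign_yz: int) -> int:
    """Predict the x-z relation from x-y and y-z (+1 = likes, -1 = dislikes)."""
    for sign in (sign_xy, sign_yz):
        if sign not in (1, -1):
            raise ValueError("signs must be +1 or -1")
    return sign_xy * sign_yz


def is_balanced(sign_xy: int, sign_yz: int, sign_xz: int) -> bool:
    """A triangle is balanced when the observed x-z sign matches the prediction."""
    return predict_third(sign_xy, sign_yz) == sign_xz


# The example from the text: x dislikes y, y likes z, so x is expected to dislike z.
expected = predict_third(-1, +1)
```

    An unbalanced triangle, such as `is_balanced(+1, +1, -1)`, is the kind of configuration that the escape routes mentioned above are meant to resolve.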

    Charles Osgood was the first to develop a coding instruction to extract xRy-statements from full sentences and complete texts (Osgood, Saporta, and Nunnally 1956). Many different elaborations of this method have been proposed (Deetjen 1977; Kleinnijenhuis 2008; van Cuilenburg, Kleinnijenhuis and De Ridder 1986). Since xRy-statements build up a network, the enterprise of extracting them from texts has been labeled semantic network analysis (Krippendorff 2004; Van Atteveldt 2008), which is the topic of the next paragraph.
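    How coded xRy-statements add up to a network can be sketched in a few lines. This is an illustrative assumption, not the coding scheme of any of the cited methods: each statement is taken to be a (subject, quality, object) tuple with the predicate already coded as a number between -1 and +1, and the function names and example data are hypothetical.

```python
from collections import defaultdict


def build_network(statements):
    """Map each directed (subject, object) pair to its list of coded qualities.

    Coding the predicate R as a number in [-1, 1] is an assumption of this sketch.
    """
    edges = defaultdict(list)
    for subject, quality, obj in statements:
        if not -1.0 <= quality <= 1.0:
            raise ValueError("quality must lie in [-1, 1]")
        edges[(subject, obj)].append(quality)
    return dict(edges)


def targets_of(network, subject):
    """List the objects a given subject points to, in sorted order."""
    return sorted(obj for (subj, obj) in network if subj == subject)


# Hypothetical coded statements; the edge A -> B is distinct from B -> A.
coded = [
    ("Party A", -1.0, "Party B"),
    ("Party A", -0.5, "Party B"),
    ("Party A", 1.0, "lower taxes"),
    ("Party B", 0.5, "Party A"),
]
network = build_network(coded)
```

    Keeping the edges directed preserves the asymmetry of xRy: what Party A says about Party B is stored separately from what Party B says about Party A.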

    Elementary xRy-statements may look all too familiar from the point of view of contemporary logic as applied in semantic web approaches (Antoniou and Van Harmelen 2004), but it should be pointed out that in logic the focus is usually on deriving theorems from sets of axioms in which the predicate R is invariant between axioms, or at least predicates R, S, T differ categorically from each other. However, the predicates in political language, and in theories of balance and cognitive consistency, can be mapped onto a positive-negative continuum. In contemporary logic, axioms that give rise to contradictions are deemed untenable and therefore uninteresting, whereas in cognitive consistency theories the primary diagnostic of belief systems is the degree to which they are unbalanced and therefore ambiguous when it comes to drawing inferences. In most logics, statements are either true or false, whereas in political language statements have both a magnitude (a frequency, a saliency), and an angle (a direction, a continuous positive-negative scale): they amount to a vector model (Kleinnijenhuis and Pennings 2001). Moreover, the direction of a predicate has not only a mean interpretation as its first statistical moment, but also higher-order moments, such as variance, skewness and kurtosis. In short, contemporary logic stops where political language, political dialogues and cognitive consistency theories start, namely after a contradiction and after a variety of interpretations.
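    The idea that a set of statements has a magnitude and a direction with several statistical moments can be illustrated as follows. This is a sketch under stated assumptions: directions are taken to be numbers on a -1 to +1 scale, the moments are plain population moments (the chapter does not specify an estimator), and the function name is hypothetical.

```python
def direction_moments(directions):
    """Summarise coded directions by frequency and their first four moments.

    Uses population (uncorrected) moments; skewness and kurtosis are
    undefined, and returned as None, when all directions are identical.
    """
    n = len(directions)
    if n == 0:
        raise ValueError("at least one direction is required")
    mean = sum(directions) / n
    m2 = sum((d - mean) ** 2 for d in directions) / n
    summary = {"frequency": n, "mean": mean, "variance": m2,
               "skewness": None, "kurtosis": None}
    if m2 > 0:
        m3 = sum((d - mean) ** 3 for d in directions) / n
        m4 = sum((d - mean) ** 4 for d in directions) / n
        summary["skewness"] = m3 / m2 ** 1.5
        summary["kurtosis"] = m4 / m2 ** 2
    return summary


# Two hypothetical sets of statements with the same mean direction.
polarised = direction_moments([-1.0, -1.0, 1.0, 1.0])
neutral = direction_moments([0.0, 0.0, 0.0, 0.0])
```

    Both sets have a mean direction of zero, but only the variance reveals that the first consists of flatly contradictory statements, which is exactly the information a true/false logic would discard.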

    Three tools for analysing political texts

    Although the aim of this chapter is to address whether political meanings can be inferred from texts by automated extraction of subject-predicate-object-triples from texts, we will start out with two more basic questions. Is it feasible to automate a textual analysis to extract the occurrence and co-occurrence of concepts in texts? From a theoretical perspective this question is equivalent to the question whether it is possible to automate first-order and second-order agenda-setting research.

    Does a concept occur? First-order agenda setting and entity recognition

    Agenda setting (McCombs 2004; Rogers, Dearing and Bregman 1993), that is to say first-order agenda-setting, is concerned with the transfer of issue saliency of the agenda of one actor to the agenda of another actor. Agenda setting theory predicts, for example, that significant media attention to a particular issue will be followed by huge public attention for that issue. A transfer of issue saliency is also at the heart of the issue ownership theory which attempts to explain party competition. The issues that are prominent in party manifestos will tend to become the issues that are central in election campaigns. Pre-established reputations of parties with respect to the issues that dominate the campaign, that may well go back to the old cleavages between parties, determine which party will win at the elections, and ultimately also determine the policy emphasis and policy expenditures of parties in government. In short, party competition is primarily directed at increasing the emphasis on owned issues, rather than at pro- or contra-arguments (Budge and Farlie 1983; Budge et al. 2001; Hayes 2005; Petrocik 1996). In mediatized democracies, politics is basically the politics of attention (Jones and Baumgartner 2005).

    If the emphasis on issues and attributes could be measured with single-word lists or with elementary boolean search strings, then it would be easy to automate agenda setting research, but this is often not the case. However, ontology-matching may nevertheless help to automate agenda-setting research.

  • Chapter 1. Positions of parties and political cleavages between parties in texts 7

    The ontological problem: (named) entity recognition

    Let us start with an example article to elucidate the ontology-matching problem that will be encountered when one tries to count the occurrence of concepts, rather than the occurrence of words. The example article was compiled from a number of available articles about the Israeli-Palestinian conflict. Scholars in the tradition of Critical Discourse Analysis would regard texts from Western newspapers about the Israeli-Palestinian conflict as examples of us-them thinking. For example, in Richardson (2004) a positive self-representation of the Western world as a civilizing force is combined with a negative other-representation of the Muslim world, including military threat, extremism and terrorism, despotism and sexism. However, the bottom-up analysis of the sample article that is presented below does not aim at such a high-level interpretation.

    1. Time running out for Mideast two-state solution.

    2. Only a few months ago, president Obama welcomed Israeli Prime Minister Benjamin Netanyahu and Palestinian Authority President Mahmoud Abbas to the White House.

    3. President Obama said both leaders came to Washington in an effort to restart the peace process and reach the goal of a two-state solution that ensures the rights and security of both Israelis and Palestinians.

    4. Obama told reporters that the Israeli government and the Palestinian Authority had taken important steps to build mutual confidence since May.

    5. Since then, president Abbas stated repeatedly that only a complete Israeli settlement freeze would create the conditions for a return to the negotiating table.

    6. Today not only Israel, but also the United States dropped the Palestinian demand for a settlement freeze that would have opened up negotiations shortly.

    7. If the past two years have shown nothing else, it is that the weak Ramallah government will not realize enough success to help lead the path back to negotiations that bring about a two-state solution.

    8. No two-state solution is possible without Hamas, but Israel and the United States do not want to negotiate with Hamas.

    9. Hamas leader Khaled Mashal called for continuing the jihad against Israel and categorically denied any possibility of talks with Israel.

    Readers who understand this article must have an ontology in mind of existing concepts and their relations. Ontology is the study of the things that are, and an ontology is a name used in Knowledge Engineering to denote a (shared) formalization of a view on the world. Table 1 presents a simple strictly hierarchical ontology (or taxonomy) to match the content of the example article.


    Table 1. An ontology to match the content of the sample article.

    Actors
        Palestinians
            government: Palestinian Authority
            leader: President Mahmoud Abbas
            capital: Ramallah
        United States
            leader: President Barack Obama
        Hamas
            leader: Khaled Mashal
        Israel
            leader: Prime Minister Benjamin Netanyahu
    Issues
        peace process, Israeli-Palestinian negotiations, peace talks
        two-state solution
        Israeli settlements
            settlement freeze []

    An easy observation is that pre-established ontologies are indispensable, since otherwise the article would become incoherent (what has Ramallah to do with Abbas?). Explicit ontologies enable automation of agenda setting at the level of concepts, and not only at the level of words or boolean search strings of words. From an automation perspective the advantage of an ontology is its additivity. Frequency counts of lower-order concepts (e.g. Abbas, Mashal) are sufficient to arrive at counts of higher-order concepts such as the Palestinians or Hamas. Antonyms and concepts with an opposed meaning give rise to a complication. Lower-order antonyms of higher-order concepts are marked with a []-mark. The []-mark in the example ontology means that attention for a settlement freeze can be counted as attention for Israeli settlements, but also that protagonists of a settlement freeze should be counted as antagonists of Israeli settlements, and vice versa.

    Ontologies used in research on election campaigns or government policy typically contain 500 to 2,500 lower-order concepts. A smaller number of concepts is often desirable for interpretation and reporting. To achieve this, the concepts may be mapped, in accordance with the rules implied in the ontology, onto 5 to 25 relevant concepts.
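The additivity of an ontology can be illustrated with a minimal Python sketch. The parent links loosely follow Table 1, while the mention counts, the PARENT mapping and the roll_up helper are hypothetical and only serve as an illustration; a real ontology would have more levels and would need special treatment of antonyms.

```python
from collections import Counter

# Hypothetical parent links, loosely following the hierarchy of Table 1.
PARENT = {
    "Abbas": "Palestinians",
    "Palestinian Authority": "Palestinians",
    "Ramallah": "Palestinians",
    "Obama": "United States",
    "Mashal": "Hamas",
    "Netanyahu": "Israel",
}

def roll_up(counts, parent=PARENT):
    """Add each lower-order count to its higher-order concept (one level only)."""
    totals = Counter()
    for concept, n in counts.items():
        # Concepts without a parent are already higher-order and keep their name.
        totals[parent.get(concept, concept)] += n
    return totals

# Made-up mention counts, for illustration only.
lower_order = Counter({"Abbas": 3, "Ramallah": 1, "Mashal": 1, "Hamas": 2, "Obama": 4})
higher_order = roll_up(lower_order)
print(higher_order)
```

In this sketch the counts for Abbas and Ramallah are added to the Palestinians, and the count for Mashal is added to the direct mentions of Hamas.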

    Co-occurrence of concepts? Conditional probabilities and associative framing

    Second-order agenda setting maintains that the attributes that are associated with an object in the media to which one is exposed will also become the attributes that the audience will associate with the concept (McCombs 2004). Second-order agenda setting rests on the transfer of saliency, as did first-order agenda setting.


    The question that needs to be answered next is what causes a concept B (dotted line) to be attributed to, or associated with, a concept A (solid line), as shown in Figure 1. In his seminal article "Features of Similarity", Amos Tversky points out that the smaller set often lets you think of the larger set simply because the number of elements in the intersection of two sets is a larger percentage of the number of elements in the smaller set than of the number of elements in the larger set (Tversky 1977).

    P(solid | dotted) = 2/5 = 0.4
    P(dotted | solid) = 2/10 = 0.2

    Figure 1. Conditional probabilities and asymmetric associative framing.

    For example, if you think of kerosine, you may come to think about an airplane, but it is quite unlikely that thinking about an airplane will immediately generate thoughts about kerosine. Planes are more strongly associated with visions of holiday or congress destinations. In the study of language, the size of sets is equivalent to the occurrence of concepts, whereas their intersection is equivalent to their co-occurrence.
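The asymmetry can be checked with a few lines of Python. This is only a sketch: the element labels are arbitrary, and only the set sizes (10 and 5) and the size of the overlap (2) are taken from the probabilities reported with Figure 1.

```python
def conditional_probability(target, given):
    """P(target | given): share of the elements of `given` that are also in `target`."""
    return len(target & given) / len(given)

# Arbitrary element labels; only the sizes and the overlap follow Figure 1.
solid = set(range(10))         # larger set: 10 elements (0..9)
dotted = {8, 9, 10, 11, 12}    # smaller set: 5 elements, 2 of them shared with solid

p_solid_given_dotted = conditional_probability(solid, dotted)   # 2 / 5
p_dotted_given_solid = conditional_probability(dotted, solid)   # 2 / 10
print(p_solid_given_dotted, p_dotted_given_solid)
```

The same overlap of two elements weighs more heavily in the smaller set, which is why the smaller set evokes the larger one more readily than the other way around.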

    After 9/11 politically correct journalists attempted to show that most Muslims (a large set) were actually not terrorists (a small set) by giving examples of hard-working integrated Muslim immigrants, but this did not prevent thoughts about terrorism from generating pictures of 9/11 and negative ideas about Muslims. Conditional probabilities give the cue to associative framing of topics in the media, for example, of terrorism and Islam (Ruigrok and Van Atteveldt 2007). In Spain, however, even the Madrid train bombings by Muslim terrorists shortly before the national elections in 2004 were exploited by the governing Partido Popular (PP) as new evidence of ETA violence. In spite of the Madrid bombings, in spite of the protests against the PP propaganda, and in spite of the victory of Spain after almost a millennium of servitude to Muslim invaders, the Spanish press never came to associate terrorism strongly with Islam in the period between 2000 and 2008. The primary association of terrorism remained with ETA. Primary associations in the press of immigrants were not Islam, or terrorism, but rather the economy and the "calling effect" of the regularization of immigration (Mena-Montes 2010). Associative framing can be automated fully, since techniques to count occurrences


    and co-occurrences of words are straightforward, whereas progress in ontology matching also enables counting the occurrence and co-occurrence of higher-order concepts.

    Many scholars have been puzzled by conditional probabilities, for example, John Maynard Keynes who wrote in 1929, after his renewed encounter with Ludwig Wittgenstein in Cambridge: "Well, God has arrived. I met him on the 5:15 train", thereby referring to proposition 5.15 (Wittgenstein 1922), one of Wittgenstein's propositions about conditional probability in terms of sets. Conditional probabilities are also at the heart of Bayesian statistics. Many theorems from Bayesian statistics have an analogue in associative framing.

    When the frequency of concept occurrence per textual unit is known, it is simple to compute the asymmetrical associations or co-occurrence between concepts. These associations can be conceptualized as the conditional probability of encountering one concept, given that another concept is encountered: given that a sentence contains a reference to Hamas, how likely are we to see a reference to Israel? This conditional probability is the association between Hamas and Israel. Taking the sentence as the contextual unit, in the sample text this probability is 100% as both sentences mentioning Hamas also mention Israel. However, the reverse is not true, as only 2 out of 6 sentences mentioning Israel also mention Hamas, making the association between Israel and Hamas 33%. Figure 2 shows the network of all associations greater than 50% as extracted from the example text. The lower right shows a central cluster of strongly interconnected actors: Israel, USA, and Palestinians. Interestingly, Palestinians and Hamas are not associated with each other at all. Moreover, while both are associated with the Peace Process and the Two-state Solution, these issues are not associated with Hamas and the Palestinians, but rather with Israel and the USA. If this article were representative of Middle-East reporting, one would expect that people think of the USA and Israel when they think of the peace process, and not of Hamas or the Palestinians.
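The computation for Hamas and Israel can be sketched in Python as follows. The concept sets per sentence are a hand-made annotation assumed for this illustration, not the output of the ontology matching used for Figure 2; they were chosen to be consistent with the counts reported above (two sentences mentioning Hamas, six mentioning Israel) and are not claimed to reproduce the full network.

```python
# Hand-made concept annotation of the nine sample sentences (an assumption).
SENTENCES = {
    1: {"Two-state solution"},
    2: {"USA", "Israel", "Palestinians"},
    3: {"USA", "Israel", "Palestinians", "Peace process", "Two-state solution"},
    4: {"USA", "Israel", "Palestinians"},
    5: {"Palestinians", "Settlement freeze", "Peace process"},
    6: {"USA", "Israel", "Palestinians", "Settlement freeze", "Peace process"},
    7: {"Palestinians", "Peace process", "Two-state solution"},
    8: {"USA", "Israel", "Hamas", "Two-state solution"},
    9: {"Hamas", "Israel"},
}

def association(given, target, sentences=SENTENCES):
    """P(target | given), taking the sentence as the contextual unit."""
    with_given = [concepts for concepts in sentences.values() if given in concepts]
    if not with_given:
        return 0.0
    return sum(target in concepts for concepts in with_given) / len(with_given)

print(association("Hamas", "Israel"))             # both Hamas sentences mention Israel
print(round(association("Israel", "Hamas"), 2))   # 2 of the 6 Israel sentences
print(association("Hamas", "Palestinians"))       # no shared sentence
```

Because the denominator changes with the concept that is taken as given, the association from Hamas to Israel differs from the association from Israel to Hamas.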

    [Figure 2 (network diagram). Nodes: Hamas, Israel, Peace process, Two-state solution, USA, Palestinians.]

    Figure 2. Association network extracted from the sample text.


    Semantic network analysis

    In the examples above we have shown how the occurrence and co-occurrence of words can provide meaningful information on the agenda and associations of relevant concepts. These techniques do not tell us, however, how these concepts are related. Fully understanding the relations expressed in language requires fully understanding both the intricacies of natural language and the context in which the language is to be understood. This is beyond the capabilities of the computer. However, it is possible to employ grammatical analysis to analyze some of these relations. First, we should understand how humans would analyze the example text on the two-state solution.

    Manual coding using the NET-method

    Presumably, the most straightforward way to understand the extraction of political statements from political texts is to present the extracted statements in a network. Figure 3 represents the statements about the two-state solution from the sample text based on human coding that are attributed to president Abbas and to president Obama, whereas Figure 4 represents the network according to the compiler of the article. In Figures 3 and 4 lower-order concepts (e.g., Obama, Abbas) are mapped in accordance with the tree-structure of the ontology to higher-order concepts (e.g., USA, Palestinians). Solid arrows represent positive associations; dashed arrows represent negative associations. The arrow labels include the quoted actor, an abbreviation for the type of statement, the sentence number in the example article on which the arrow is based, and a few crucial words from the predicate which clarify why these relationships are positive or negative. Table 2 gives an overview of the abbreviations used.

    Two additional statement types that were not discussed before pop up in the example article: CSQ (= consequences of issues for actors) and CAU (= causal relationships between issues).

    Figure 3 shows that, according to Obama, Israel, the Palestinians and the peace process are positively associated with each other in each direction. They mutually trust each other, both can benefit from the rights and security delivered by the peace process, and both want to restart the peace process. According to Abbas, however, a settlement freeze is a precondition for the peace process.


    [Figure 3 (network diagram). Nodes: Israel, Palestinians, Peace process, Settlement freeze. Arrow labels: USA: CSQ 3: ensures rights and security (twice); USA: CC 4: mutual confidence (twice); USA: IP 3: restart (twice); Palestinians: CAU 5: create conditions.]

    Figure 3. Quotations from Obama and Abbas, attributed respectively to the USA and the Palestinians.

    [Figure 4 (network diagram). Nodes: Reality, USA, Israel, Palestinians, Hamas, Settlement freeze, Peace process, Two-state solution. Arrow labels: CC 2: welcomed (twice); CC 6: drops (twice); CC 8: not negotiate (twice); CC 9: jihad; CC 9: denied possibility; IP 6: demand; IP 7: not lead back; IP 8: without support; CAU 5: open up; CAU 7: bring; REA 1 & 8: time running out & not possible; SF 7: weak & not success.]

    Figure 4. Statements from the sample news article on behalf of its author.


    In contrast with Figure 3, negative relationships show up in Figure 4. Actually, only a few relationships are positive. The USA welcomed the Palestinians, who demanded a settlement freeze, which could open up the peace process, which could bring about a two-state solution. In line with the transitivity principle, this chain of reasoning implies that the USA furthers a two-state solution. Other chains of reasoning support this conclusion. The USA does not want to negotiate with Hamas, since Hamas lends no support to a two-state solution, and denies the possibility of a peace process that could bring about a two-state solution. The USA drops the demands of the Palestinians, and welcomes Israel, which also drops the demands of the Palestinians, since the Palestinians themselves will not lead back to the peace process, which could bring about the two-state solution.
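The transitivity principle applied in these chains of reasoning can be sketched in Python by multiplying the signs of the links along a chain. This sign arithmetic is a simplifying assumption made for illustration; the links and their signs are read off the paragraph above, and the names CHAIN and chain_sign are hypothetical.

```python
from math import prod

# Signed links read off the first chain in the text (+1 positive, -1 negative).
CHAIN = [
    ("USA", "Palestinians", +1),                   # welcomed
    ("Palestinians", "Settlement freeze", +1),     # demanded
    ("Settlement freeze", "Peace process", +1),    # could open up
    ("Peace process", "Two-state solution", +1),   # could bring about
]

def chain_sign(chain):
    """Sign of the indirect relation implied by a chain of signed links."""
    for (_, target, _), (source, _, _) in zip(chain, chain[1:]):
        if target != source:
            raise ValueError("links do not form a chain")
    return prod(sign for _, _, sign in chain)

usa_two_state = chain_sign(CHAIN)   # all links positive, so the result is positive

# A second chain from the text: two negative links also yield a positive result.
hamas_chain = [
    ("USA", "Hamas", -1),                  # does not want to negotiate
    ("Hamas", "Two-state solution", -1),   # lends no support
]
usa_via_hamas = chain_sign(hamas_chain)
print(usa_two_state, usa_via_hamas)
```

Under this reading, rejecting an actor who opposes an issue counts as indirect support for that issue, which is why both chains point in the same direction.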

    Nevertheless, the sample article gives an inconsistent, unbalanced view (Heider 1946) of the US position. The imbalance hinges on two inconsistencies. The Palestinians demand a settlement freeze that could open up the peace process, but they will not lead the path back to negotiations, according to the author of the example article. The USA welcomes the Palestinians, but also drops their demands. Given these inconsistencies, one may also argue that the USA now rejects a two-state solution. For example, the USA welcomed Palestinians who will not lead back to the peace process themselves. More importantly, they dropped Palestinian demands although these demands for a settlement freeze could have opened the peace process, which could have brought about a two-state solution.
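One way to make such inconsistencies visible is to enumerate all chains between two concepts and to compare their signs. The Python sketch below does this for a selection of the links described in the text; the edge list, the path_signs helper and the rule that a relation is balanced only if all chains agree in sign are assumptions made for this illustration.

```python
# Selection of signed links described in the text about Figure 4 (an assumption).
EDGES = [
    ("USA", "Palestinians", +1),                   # welcomed
    ("USA", "Palestinians", -1),                   # drops their demands
    ("Palestinians", "Settlement freeze", +1),     # demand
    ("Palestinians", "Peace process", -1),         # will not lead back
    ("Settlement freeze", "Peace process", +1),    # could open up
    ("Peace process", "Two-state solution", +1),   # could bring about
]

def path_signs(source, goal, edges=EDGES, seen=()):
    """Yield the product of signs for every simple path from source to goal."""
    if source == goal:
        yield 1
        return
    seen = seen + (source,)
    for start, end, sign in edges:
        if start == source and end not in seen:
            for rest in path_signs(end, goal, edges, seen):
                yield sign * rest

signs = list(path_signs("USA", "Two-state solution"))
balanced = len(set(signs)) <= 1   # balanced only if all chains agree in sign
print(signs, balanced)
```

With these links some chains are positive and others negative, so the position of the USA on the two-state solution is ambiguous in the sense discussed above.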

    Table 2. Statement types and their abbreviations.

    Abbreviations of statement type               Subject/agent   Object/target   Example

    2-place predicates
    IP: Issue Position                            actor           issue           Obama freezes bonuses
    CC: conflict/cooperation, Support/Criticism   actor           actor           Palin unleashes attack against Obama
    CSQ: consequences                             issue           actor           Bonuses are simply good for bankers
    CAU: Causation                                issue           issue           Bonuses help the economy

    1-place predicates
    REA: Real World developments                  reality         issue           Bonuses rose further in 2011
    SF: Success/failure                           reality         actor           Obama has lost heavily
    AEV: Actor Evaluations                        actor           ideal           Obama is doing a great job
    IEV: Issue Evaluations                        issue           ideal           Bonuses are obscene


    In summary, a semantic network analysis aims at inferring political meanings from texts by highlighting the chains of reasoning in texts, as well as their consistency or inconsistency. The semantic network analysis of the sample article clearly reveals that it throws the ball into the Palestinian court by portraying the Palestinian demands for a settlement freeze as the major obstacle for the peace process, rather than the Israeli settlements.

    Automation using semantic rules on top of an ontology, POS-tags, syntax dependency trees and a sentiment analysis of predicates

    We will now show how grammatical analysis can be used to automatically extract part of the network as extracted by human coders. In particular, we will extract citations (sources) and semantic subject/predicate/object triples. Grammatical analysis yields "syntax trees", i.e., graphs containing the grammatical relations between the words of a sentence. For this example, we used the freely available Stanford parser to parse the sentences listed above (Klein and Manning 2003). In other cases we have used the (also freely available) Alpino parser for Dutch with similar techniques (Van Atteveldt et al. 2008; Van Noord 2006).

    The key intuition for using syntax trees is that these trees are closer to the (semantic) relation we wish to measure than the raw words of the sentence. As an example, consider the sentences "John hits Mary" and "Mary, who has been the victim of domestic violence before, was hit by John". Both sentences express a hitting relation between John (the hitter) and Mary. However, the surface structure is very different: the second example has many words between John and Mary that are irrelevant for this relation, and the order of John and Mary is reversed. As will be shown below, the grammatical structure of these sentences makes it clear that the relative clause ("who has been ...") is not central to the expressed relation and that the second sentence is in the passive voice.

    We use the grammatical structure of the text by defining rules that match specific patterns in the syntax trees. The concepts occurring in the relevant parts of the patterns are then translated to semantic roles between these concepts. To illustrate this, Figure 5 shows the annotated parse tree of the fifth sentence from the example above. The words in italics are the words from the text, with the labels on the edges indicating the grammatical relations between them. President Abbas, for example, is the subject of stated, while the whole sub-tree under would create is the complement of that verb (see note 2). As can be seen from this graph, irrelevant

    2. Note that noun phrase (NP) chunks were collapsed to simplify the graph. Normally, each word would be a single node, with for example Abbas being the subject of stated and President a modifying node under Abbas.


    modifiers such as repeatedly, only, and since then are no longer in between the predicates and their subjects, bringing the grammatical structure indeed closer to our intended semantic structure. Moreover, grammatical relations such as subject are related (but not identical) to the semantic relations, giving hope that moving from grammar to semantics might be doable.

    [Figure 5 (tree diagram). Root: stated (that). Under stated: subject President Abbas (Citation: source); complement clause would create (Citation: quote; Triple: predicate). Under would create: subject a complete Israeli settlement freeze (Triple: subject); object the conditions (Triple: object). Remaining nodes: the modifiers repeatedly and only, and the prepositions since (object: then) and for (object: a return to the negotiating table).]

    Figure 5. Semantic tree of sentence 5: Since then, president Abbas stated repeatedly that only a complete Israeli settlement freeze would create the conditions for a return to the negotiating table.

    In order to move from grammar to semantics, we have defined a (relatively) small number of rules that match patterns on the syntactic tree. Figure 6 contains a list of the four rules that are used in the examples here. In the current example, Rule 1 matches the verb stated, as it is a speech act verb and has a subject and a complement. Thus, a Citation is created with President Abbas as source and would create (and all nodes below it) as quote. Rule 2 matches the verb would create, as it is not a speech act verb and has a subject and an object. Thus, a Triple is created with the settlement freeze as subject, the conditions (again including underlying nodes) as object, and the verb would create as predicate.
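How Rules 1 and 2 fire on sentence 5 can be sketched in Python. The tree below is built by hand rather than produced by a parser, noun phrases are collapsed, modifiers are omitted, and each node has at most one child per grammatical relation; the data layout, the tiny verb lexicon and the extract function are all assumptions of this sketch.

```python
SPEECH_ACT_VERBS = {"stated", "said"}   # hypothetical, tiny lexicon

# Hand-built tree for sentence 5 (collapsed noun phrases, modifiers omitted).
TREE = {
    "word": "stated",
    "children": {
        "subject": {"word": "President Abbas", "children": {}},
        "complement": {
            "word": "would create",
            "children": {
                "subject": {"word": "a complete Israeli settlement freeze", "children": {}},
                "object": {"word": "the conditions", "children": {}},
            },
        },
    },
}

def extract(node, found=None):
    """Apply Rule 1 (citation) and Rule 2 (active verb) to every node of the tree."""
    found = [] if found is None else found
    kids = node["children"]
    if node["word"] in SPEECH_ACT_VERBS and kids.keys() >= {"subject", "complement"}:
        # Rule 1: the quote is recorded by the head word of the complement.
        found.append(("Citation", kids["subject"]["word"], kids["complement"]["word"]))
    elif kids.keys() >= {"subject", "object"}:
        # Rule 2: a non-speech-act verb with a subject and an object.
        found.append(("Triple", kids["subject"]["word"], node["word"], kids["object"]["word"]))
    for child in kids.values():
        extract(child, found)
    return found

annotations = extract(TREE)
for annotation in annotations:
    print(annotation)
```

The sketch yields one Citation with President Abbas as source and one Triple with would create as predicate, in line with the description above.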

    If we use the same ontology to identify the concepts of interest, and if we can detect that create is a positive relation, we can reduce this citation containing a triple to an s: xRy relation [Palestine: SettlementFreeze + PeaceProcess].
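This reduction step can be sketched in Python as two table lookups. Simple keyword spotting stands in here for real ontology matching and sentiment analysis, and the tables ONTOLOGY and SENTIMENT as well as the functions lookup and to_statement are hypothetical.

```python
# Hypothetical lookup tables; keyword spotting stands in for ontology matching.
ONTOLOGY = {
    "abbas": "Palestine",
    "settlement freeze": "SettlementFreeze",
    "negotiating table": "PeaceProcess",
}
SENTIMENT = {"create": "+"}   # toy predicate lexicon

def lookup(text, table):
    """Return the value of the first key that occurs in the text, else None."""
    lowered = text.lower()
    for key, value in table.items():
        if key in lowered:
            return value
    return None

def to_statement(source, subject, predicate, obj):
    """Reduce a citation containing a triple to an s: xRy statement."""
    parts = [lookup(source, ONTOLOGY), lookup(subject, ONTOLOGY),
             lookup(predicate, SENTIMENT), lookup(obj, ONTOLOGY)]
    if None in parts:
        return None   # unmatched material is left to a human coder
    s, x, r, y = parts
    return f"[{s}: {x} {r} {y}]"

statement = to_statement(
    "President Abbas",
    "a complete Israeli settlement freeze",
    "would create",
    "the conditions for a return to the negotiating table",
)
print(statement)
```

Returning None for unmatched material is a design choice of the sketch: it keeps the automated step conservative instead of guessing a concept or a direction.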


    Rule 1: Citation
    Pattern: a speech act verb (state, say, ...) with a subject child node and a complement child node
    Result: Citation(source = subject, quote = complement)

    Rule 2: Active verbs
    Pattern: a non-speech act verb with a subject child and an object child
    Result: Triple(subject = subject, predicate = verb, object = object)

    Rule 3: Citation from gerund
    Pattern: a speech act gerund (stating, saying, ...) with a complement parent node that has a subject child node and a complement clause child node
    Result: Citation(source = subject of complement, quote = complement clause)

    Rule 4: Action Nouns
    Pattern: an action noun (attack, policy) with a possessive child and a preposition object grandchild
    Result: Triple(subject = possessive, predicate = noun, object = object of preposition)

    Figure 6. Selection of pattern rules for detecting semantic roles.

    [Figure 7 (tree diagram). Nodes and annotations: attacked (Triple1: predicate); Mr. Gaylard (Triple1: subject; Citation: source); policy (Triple1: object; Triple2: predicate); Israel's (Triple2: subject); Gaza (Triple2: object); strengthened (Citation: quote; Triple3: predicate); it (Triple3: subject); Hamas (Triple3: object); and the unannotated nodes siege, towards, saying and had. Edge labels: subject, object, complement, compl. clause, auxiliary, possessive, multiword, preposition.]

    Figure 7. Semantic tree of "Mr Gaylard attacked Israel's siege policy towards Gaza, saying it had strengthened Hamas."

    Figure 7 shows a more complicated example from an actual newspaper article: "Mr Gaylard attacked Israel's siege policy towards Gaza, saying it had strengthened Hamas." This tree sets off several rules. Rule 2 matches both active verbs strengthened and attacked. Rule 3 matches the gerund saying, finding as its source the subject


    of its parent node attacked and as its quote the clause below strengthened. Rule 4 matches the policy, using the prepositional object Gaza as object and the possessive Israel's as subject. This yields the annotations Citation and Triple1 to Triple3 displayed in the syntax tree. In order to create a semantic network from these annotations we would need a suitable ontology to link Mr. Gaylard to the UN and Gaza to Palestinians. Moreover, we need sentiment analysis to determine that siege policy and attack are negative while strengthening is positive. Finally, we need to use anaphora resolution to determine that the it in this sentence refers back to Israel's policy (Lappin and Leass 1994; Van Atteveldt et al. 2008). This yields three semantic roles: [UN - policy], [Israel - Palestinians], and [UN: policy + Hamas].

    Note that this example showcases another complexity in extracting a semantic graph from language: graphs are by definition first order, meaning that relations cannot themselves be used as nodes in another relation. However, in natural language relations are frequently nested, as in our example of Mr Gaylard attacking Israel's policy against Gaza. To reduce this complex network to a normal graph we need to resolve these containments using transitivity rules based on the cognitive-consistency theories discussed above. In this case, we would conclude that Mr Gaylard is against Israel (since he disagrees with their policy) and in favour of Gaza (since he disagrees with a policy detrimental to them).
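The resolution of such a nested relation can be sketched in Python with the same sign arithmetic. The function resolve_nested and the encoding of directions as +1 and -1 are assumptions of this illustration, not a description of the rules actually implemented.

```python
def resolve_nested(outer_sign, inner_sign):
    """Flatten 'source evaluates (actor's policy towards target)'.

    outer_sign: direction of the source's stance on the policy (+1 or -1).
    inner_sign: direction of the policy towards its target (+1 or -1).
    Returns the implied pair (source -> actor, source -> target).
    """
    source_to_actor = outer_sign                 # stance on the policy carries over to its owner
    source_to_target = outer_sign * inner_sign   # opposing a harmful policy favours its target
    return source_to_actor, source_to_target

# Mr Gaylard attacked (-1) Israel's siege policy, which is negative (-1) for Gaza.
gaylard_israel, gaylard_gaza = resolve_nested(outer_sign=-1, inner_sign=-1)
print(gaylard_israel, gaylard_gaza)
```

The result is a negative relation from Mr Gaylard to Israel and a positive relation from Mr Gaylard to Gaza, as in the conclusion drawn above.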

    These steps entail a substantive interpretation of the implications of statements and a move from the manifest to the latent content of the text. Since we can formally describe the rules for these interpretations, this is preferable to asking human coders to draw such inferences, as it is sometimes difficult for coders to keep political knowledge (and bias) out of the interpretation.

    Summary

    Starting from old ideas about politics and political language this chapter explored whether the occurrence of concepts, their co-occurrence, and the relationships between them can be extracted automatically so as to infer the political meanings underlying a text. From a theoretical perspective, these three objectives correspond with the automation of first-order agenda setting, second-order agenda setting and semantic network analysis. The latter is not only concerned with the extraction of issue positions of actors from texts, but simultaneously with the extraction of other political relationships, such as conflict or cooperation between actors, success or failure of actors, consequences of issues for actors, causal relationships between issues, and so on.

    By showing what information the different methods (association analysis, manual coding, syntactic parsing) extract from a single sample text on the Middle


    East conflict, the chapter illustrates how these different methods show the semantic network expressed in this text in different levels of detail. By using pattern matching on the automatically parsed syntax trees, it showed that automation of semantic network analysis can proceed beyond word counts and co-occurrence. It also illustrated the complex patterns originating from single sentences and the additional techniques required to move from extracted syntactic roles to a full semantic network.

    References

    Ansolabehere, S., and S. Iyengar. 1995. Going Negative: How Attack Ads Shrink and Polarize the Electorate. New York: Free Press.

    Antoniou, G., and F. van Harmelen. 2004. A Semantic Web Primer. Cambridge: MIT Press.

    Budge, I., and D. J. Farlie. 1983. Explaining and Predicting Elections: Issue Effects and Party Strategies in Twenty-Three Democracies. London: George Allen and Unwin.

    Budge, I., H. D. Klingemann, A. Volkens, J. Bara, and E. Tanenbaum. 2001. Mapping Policy Preferences: Estimates for Parties, Electors, and Governments, 1945-1998. Oxford: Oxford University Press.

    Budge, I., D. Robertson, and D. Hearl. 1987. Ideology, Strategy and Party Change: A Spatial Analysis of Post-war Election Programmes in 19 Democracies. Cambridge: Cambridge University Press. DOI: 10.1017/CBO9780511558771

    Cappella, J. I., and K. H. Jamieson. 1997. Spiral of Cynicism: The Press and The Public Good. New York and Oxford: Oxford University Press.

    De Tocqueville, A. C. H. C. 1951 [1835]. De la Démocratie en Amérique. Paris: Gallimard.

    Deetjen, G. 1977. Industriellenprofile in Massenmedien: Ein neuer Ansatz zur Aussagenanalyse. Hamburg: Verlag Hans-Bredow-Institut.

    Dixon, R. M. W. 1992. A New Approach to English Grammar, Based on Semantic Principles. Oxford: Clarendon.

    Dixon, R. M. W. 2005. A Semantic Approach to English Grammar. Oxford: Clarendon.

    Elster, J. 2009. Alexis de Tocqueville: The First Social Scientist. Cambridge: Cambridge University Press. DOI: 10.1017/CBO9780511800429

    Hayes, D. 2005. Candidate qualities through a partisan lens: a theory of trait ownership. American Journal of Political Science 49(4), pp. 908-923. DOI: 10.1111/j.1540-5907.2005.00163.x

    Heider, F. 1946. Attitudes and cognitive organization. The Journal of Psychology, 21(1), pp. 107-112. DOI: 10.1080/00223980.1946.9917275

    Hetherington, M. J. 1996. The media's role in forming voters' national economic evaluations in 1992. American Journal of Political Science 40(2), pp. 372-395. DOI: 10.2307/2111629

    Jones, B., and F. Baumgartner. 2005. The Politics of Attention: How Government Prioritizes Problems. Chicago: University of Chicago Press.

    Klein, D., and Ch. D. Manning. 2003. Fast exact inference with a factored model for natural language parsing. In Advances in Neural Information Processing Systems 15 (NIPS 2002), Cambridge, MA: MIT Press, pp. 3-10.


    Kleinnijenhuis, J. 2008. Reasoning in economic discourse: A network approach to the Dutch Press. In K. Krippendorff and M. A. Bock (eds), The Content Analysis Reader, pp. 430-442. Thousand Oaks: Sage.

    Kleinnijenhuis, J., and P. Pennings. 2001. Measurement of party positions on the basis of party programmes, media coverage and voter perceptions. In M. Laver (ed.), Estimating the Policy Positions of Political Actors. London and New York: Routledge, pp. 162-182.

    Kleinnijenhuis, J., A. M. J. Van Hoof, D. Oegema, and J. A. De Ridder. 2007. A test of rivaling approaches to explain news effects: News on issue positions of parties, real world developments, support and criticism, and success and failure. Journal of Communication 57(2), pp. 366-384. DOI: 10.1111/j.1460-2466.2007.00347.x

    Krippendorff, K. 2004. Content Analysis. Thousand Oaks: Sage.

    Lappin, S., and H. J. Leass. 1994. An algorithm for pronominal anaphora resolution. Computational Linguistics, 20(4), pp. 535-561.

    Lipset, S., and S. Rokkan. 1967. Cleavage Structures, Party Systems, and Voter Alignments: An Introduction. New York: The Free Press.

    McCombs, M. E. 2004. Setting the Agenda: The Mass Media and Public Opinion. Cambridge: Polity Press.

    Mena-Montes, N. 2010. Un estudio sobre la inmigración (2000-2008): La construcción de la agenda (Agenda Building) y la evolución de los encuadres (Frame Building) en el discurso político-parlamentario, mediático y sus referencias a la opinión pública (with an extended English summary). Madrid: Universidad Rey Juan Carlos (PhD dissertation).

    Noelle-Neumann, E. 1980. Die Schweigespirale: ffentliche Meinung -- unsere soziale Haut. Frankfurt am Main: Ullstein.

    Osgood, C. E., S. Saporta, and J. C. Nunally. 1956. Evaluative assertion analysis. Litera 3, pp.47102.

    Patterson, T. E. 1993. Out of Order. New York: Knopf.Petrocik, J. R. 1996. Issue ownership in Presidential Elections, with a 1980 case study. American

    Journal of Political Science 40, pp. 825850. DOI: 10.2307/2111797Richardson, J. E. 2004. (Mis) Representing Islam: The Racism and Rhetoric of British Broadsheet

    Newspapers. Amsterdam: John Benjamins Publishing Company. DOI: 10.1075/dapsac.9Rogers, E. M., J. W. Dearing, and D. Bregman. 1993. The anatomy of agenda-setting research.

    Journal of Communication 43(2), pp. 6884. DOI: 10.1111/j.1460-2466.1993.tb01263.xRuigrok, N., and W. van Atteveldt. 2007. Global angling with a local angle: How U.S., British,

    and Dutch newspapers frame global and local terrorist attacks. The Harvard International Journal of Press and Politics 12, pp. 6890. DOI: 10.1177/1081180X06297436

    Sanders, D., and N. Gavin. 2004. Television news, economic perceptions and political pref-erences in Britain, 19972001. The Journal of Politics 66(4), pp. 12451266. DOI: 10.1111/j.0022-3816.2004.00298.x

    Schumpeter, J. 1950 (published first in 1943). Capitalism, Socialism and Democracy. New York: Harper and Row.

    Severin, W., and J. W. Tankard. 2005 (5th ed.). Communication Theories: Origins, Methods, Uses. New York: Addison-Wesley / Longman.

    Shah, D. V., M. D. Watts. D. Domke, and D. P. Fan. 2002. News framing and cueing of issue regimes: Explaining clintons public approval in spite of scandal Public Opinion Quarterly 66(3), pp. 339370. DOI: 10.1086/341396


    Soroka, S. 2006. Good news and bad news: Asymmetric responses to economic information. Journal of Politics 68(2), pp. 372–385. DOI: 10.1111/j.1468-2508.2006.00413.x

    Tomz, M., and R. P. van Houweling. 2008. Candidate positioning and voter choice. American Political Science Review 102(3), pp. 303–318. DOI: 10.1017/S0003055408080301

    Tversky, A. 1977. Features of similarity. Psychological Review 84(4), pp. 327–352. DOI: 10.1037/0033-295X.84.4.327

    Van Atteveldt, W. 2008. Semantic network analysis. Techniques for extracting, representing and querying media content. PhD Thesis. Amsterdam: Vrije Universiteit Amsterdam.

    Van Atteveldt, W. 2008. Semantic Network Analysis: Techniques for Extracting, Representing and Querying Media Content. Charleston, SC: BookSurge Publishers.

    Van Atteveldt, W., J. Kleinnijenhuis, and N. Ruigrok. 2008. Parsing, semantic networks, and political authority: Using syntactic analysis to extract semantic relations from Dutch newspaper articles. Political Analysis, 16(4), pp. 428–446.

    van Cuilenburg, J. J., J. Kleinnijenhuis, and J. A. de Ridder. 1986. Towards a graph theory of journalistic texts. European Journal of Communication 1, pp. 65–96. DOI: 10.1177/0267323186001001005

    Van Noord, G.-J. 2006. At last parsing is now operational. In Mertens, P., C. Fairon, A. Dister, and P. Watrin (eds). TALN06. Verbum Ex Machina. Actes de la 13e conférence sur le traitement automatique des langues naturelles, pp. 20–42.

    Westholm, A. 1997. Distance versus direction: the illusory defeat of the proximity theory of electoral choice. The American Political Science Review 91(4), pp. 865–883. DOI: 10.2307/2952170

    Wittgenstein, L. 1922. Tractatus Logico-Philosophicus: Logisch-philosophische Abhandlung. London: Kegan Paul.

    Wittgenstein, L. 1953. Philosophical Investigations. Oxford: Blackwell.

  • part i

    Computational methods for political text analysis

  • Introduction

    Piek Vossen
    VU University Amsterdam

    Political language is one of the most challenging text types to analyze. In debates, politicians use language with great rhetorical skill, which makes political text a genuine challenge for computational analysis. Natural language processing (NLP) can be used to analyze text automatically and to derive implications from it, but the technology is limited to information that is structurally and statistically observable. Computers can draw some implications from vast amounts of text that people cannot, simply because people cannot store exact statistical data or read that fast. Computers can, for example, detect trends in word usage across time and across different groups of (political) speakers that signify changes in political discourse. Conversely, humans can draw a vast number of implications from even the smallest piece of text, which computers cannot do. People can do this because they understand the complex social relations between the participants in a debate, have rich and complex background knowledge about the world we live in, and are extremely experienced in, and sensitive to, the use of language within such contexts. One of the exciting aspects of the field of text mining is to see how far we can get in drawing implications from (political) text. Part I of this volume contains six articles that illustrate the possibilities and limitations of applying computational techniques to the analysis of political text. It is not a comprehensive overview, but the following chapters show some of the main issues that are currently under discussion.

    In principle, the units and aspects of language can vary from individual words, to full text and collections of text, and from plain statistics to party issue positions and rhetoric. Table 1 gives an overview of the possible units of analysis (cf. first column) and their types of analysis (cf. first row).

    Research and development is done on all these aspects, but progress and the complexity of such efforts vary considerably. Statistics can be derived easily for any structure, but automatic analysis becomes more complex as we move along the scale from statistics to world knowledge, and fewer systems are available that can do the job. In this book we see examples of NLP techniques concentrated on the left side of the table, whereas the deeper qualitative approaches on the right side are necessarily restricted to a single document or a small set of documents to be analyzed by humans (see Wesley, this volume).


    Table 1. Units of text and types of text analysis in the field.

    Statistics Structure Meaning Polarity Position Rhetoric World knowledge

    words + + + + +

    phrases + + + + +

    sentences + + +/- + +

    paragraphs + + + +

    complete discourse or document level + +/- + + +

    text collections + +/- +

    In terms of methods, NLP has seen an impressive development over the past decades from rule-based and knowledge-rich systems, to machine-learning approaches, and most recently, to hybrid solutions. For various reasons, statistical and machine-learning methods are more accessible and widely used. One reason is their success over the traditional rule-based systems. Another reason is the light-weight and shallow processing of the text, which can be applied to large volumes. It has been shown that statistical and machine-learning NLP can, to a reasonable extent, predict party positions along political dimensions on the basis of language used by politicians. This book includes several examples of how this can be done.

    In Chapters 2 (Collette and Pétry) and 5 (Hirst, Riabinin, Graham, Boizot-Roche and Morris), we see how the words of texts can be used to find party positions. In these approaches, the assumption is made that similar texts tend to use similar words and that there is no need to preserve the structure and complex composition of texts. The words that make up the text are the features from which predictions are machine learned. Collette and Pétry apply the simple word-frequency methods of the widely used programs Wordfish and Wordscores to English and French versions of Canadian party manifestos and evaluate the results against expert surveys on party positions. They try to measure the degree to which language influences party positioning and also consider the effects of word stemming (reducing the feature space). The programs associate words and their frequencies with the party manifestos of one election and compare these with the frequencies in the manifestos of another election. Collette and Pétry show that this works equally well for the English and French manifestos, despite the different morphological properties of these languages. They also note that Wordscores outperforms Wordfish and that stemming words does not lead to significant effects.

    Chapter 5 offers an interesting contrast with Chapter 2. Hirst et al. also try to discover party positions in Canadian politics (liberal versus conservative) but use a Support-Vector Machine (SVM) as a model and apply their analysis to the English and French debates rather than to party manifestos. Hirst et al. come to the remarkable conclusion that their SVM classifier does not learn the language of the political position but merely the language of defense and attack. In their corpus, the liberals and conservatives swap places from opposition to government and vice versa. A classifier trained on opposition language identifies the opposition regardless of which party is in opposition, and the same holds for the language of the governing party. The results hold for both English and French. This chapter demonstrates that texts consist of many layers of information and that it is dangerous to associate bags-of-words with any type of label in a classifier, since we do not know what the classifier actually learns. In other words, we do not know which words belong to which layer of information. This is a genuine risk of any machine-learning approach, which can only be dealt with by rigorous testing and evaluation on many different data sets.

    Another pair of chapters tries to extract positive or negative attitude, or sentiment, from heterogeneous types of text. In Chapter 3, Gryc and Moilanen compare different types of linguistic attitude indicators and social network data to derive sentiments centered around Barack Obama during the 2008 US elections. They apply their methods to 700 blog posts that were classified by crowdsourcing. Different feature sets are derived from the blog posts (social network features, sentiment words, bags-of-words) and different classifiers are trained. Results are moderate, with the bags-of-words approach (using a large feature set) working best but social network analysis appearing to contribute additional evidence. Combining different classifiers through voting gives the best results, which is a well-known phenomenon in machine learning. Chapter 6 is closely related to this work: here, Grijzenhout, Jijkoun and Marx compare various techniques to determine the subjectivity and polarity of paragraphs in Dutch parliamentary debates. They split the problem into determining (1) the subjectivity of a paragraph and (2) the polarity of the subjectivity found. They compare different types of mathematical models for learning classifiers from training data and contrast the results with algorithms based on subjectivity lexicons. Both chapters show that state-of-the-art approaches to sentiment analysis (both machine-learning and lexicon-based) give reasonable results for political topics in various types of text, such as blogs and debates. Nevertheless, sentiment analysis for negative or positive attitude is rather one-dimensional compared to a complete analysis of the meaning and implication of political text.
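    Combining classifiers through voting can be illustrated with a minimal sketch. The three label lists below are hypothetical toy outputs, not results from Chapter 3, and the function name `majority_vote` and its tie-breaking rule are assumptions made for illustration only.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label; on a tie, the tied label that
    occurs first in `labels` wins."""
    counts = Counter(labels)
    top = max(counts.values())
    return next(label for label in labels if counts[label] == top)


# Hypothetical outputs of three classifiers for four blog posts
# (toy labels, not results from the chapter).
network_based = ["positive", "negative", "negative", "positive"]
lexicon_based = ["positive", "positive", "negative", "negative"]
word_based = ["negative", "positive", "negative", "positive"]

# One combined label per post: the label most classifiers agree on.
combined = [
    majority_vote(votes)
    for votes in zip(network_based, lexicon_based, word_based)
]
```

    With an odd number of classifiers and two labels no ties occur; weighting votes by each classifier's held-out accuracy would be one possible refinement.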

    Chapter 4 can be seen as an attempt to use similar techniques to perform a more complex task, namely to model the usage of the concept "outsiders" in the Swedish political debate. Dahlberg and Sahlgren use Random Indexing to measure concept shifts. The basic idea is that statistics on the surrounding words tell you something about the meaning of a target word or concept. Whereas in the previous techniques presented in this volume sets of words are used to model larger text fragments associated with positions or sentiments, in this research the surrounding words (the word space) characterize the word "outsider" itself. In a first step, the language surrounding the concept "outsider" is learned from official documents of different parties. Next, the language used in a large collection of blogs is analyzed and compared with the "outsider" language of the parties. In this study, the analysis is complicated by a lack of diachronic data. However, the "outsider" language of the parties tends to be similar, and the study provides some evidence that the Conservative Moderate Party may have introduced connected concepts such as "unemployment" to the word space.
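    The word-space idea can be sketched in a few lines. The following is a simplified illustration of Random Indexing, not the setup used in Chapter 4: the dimensionality, the number of non-zero entries, the window size and the toy sentences are all assumptions, and the function names are hypothetical.

```python
import math
import random


def index_vector(word, dim=300, nonzero=6, seed=0):
    """Sparse ternary random vector for a word. Seeding the generator
    with the word makes the vector reproducible."""
    rng = random.Random(f"{seed}:{word}")
    vec = [0] * dim
    for pos in rng.sample(range(dim), nonzero):
        vec[pos] = rng.choice((-1, 1))
    return vec


def context_vector(target, documents, window=2, dim=300):
    """Sum the index vectors of all words that occur within `window`
    tokens of each occurrence of `target`."""
    vec = [0] * dim
    for doc in documents:
        tokens = doc.lower().split()
        for i, token in enumerate(tokens):
            if token != target:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    for k, value in enumerate(index_vector(tokens[j], dim)):
                        vec[k] += value
    return vec


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy sentences, invented for illustration only.
party_a = ["the outsider lacks work", "an outsider lacks work"]
party_b = ["every outsider needs housing"]
blogs = ["the outsider lacks work again"]

vec_a = context_vector("outsider", party_a)
vec_b = context_vector("outsider", party_b)
vec_blogs = context_vector("outsider", blogs)

# Higher cosine means the blogs use the target word in contexts
# closer to that party's usage.
similarity_a = cosine(vec_a, vec_blogs)
similarity_b = cosine(vec_b, vec_blogs)
```

    In this toy setting the blog usage of "outsider" shares its neighbouring words with party A, so `similarity_a` comes out well above `similarity_b`; with real corpora the vectors would be built from many thousands of contexts.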

    The chapters on computational methods show a strong tendency towards statistical or quantitative approaches rather than deeper qualitative approaches. In the light of Chapter 1, by Kleinnijenhuis and van Atteveldt, this kind of text analysis is relatively shallow. Even so, the conclusions and results of these shallow quantitative techniques are still not without controversy. Results are moderate and probably not stable across different types of texts. Furthermore, Hirst et al. clearly show that we do not know what is learned, since text, and political text in particular, comprises many different layers of information that may have been mixed up by the statistical analysis. However, it remains to be seen whether deeper text analysis and more comprehensive approaches, as described by Kleinnijenhuis and van Atteveldt, can do a better job. For instance, the Net Method also leaves out many details and depends on large volumes of news text to extract statistical value from noisy data. In order to become aware of the limitations and to evaluate the adequacy of computerized methods, one should carefully consider and account for what is ignored.

    To build a bridge between the (dis-)advantages of automated text analysis and methods for qualitative, interpretive analysis, Part I ends with a discussion of the fundamental differences between the qualitative and quantitative methods for analyzing political texts that are the subject of Parts II and III (Wesley, this volume). Even though quantitative methods are often computerized and therefore assumed to be more objective, Wesley argues for qualitative approaches in which the interpretation of the researcher plays a role, even when it leads to a bias in the interpretation of results. He claims that interpretive-subjective methods can result in productive insights as long as they are applied rigorously, follow specified guidelines, and document choices properly. Although it does not follow the CDA paradigm, Wesley's chapter indicates a need for text analysis beyond quantitative premises. His approach builds a bridge to the chapters on qualitative discourse-analytic methods presented in Part II.

  • chapter 2

    Comparing the position of Canadian political parties using French and English manifestos as textual data

    Benoît Collette and François Pétry
    Université Laval, Department of Political Science

    Recently, computer-assisted, quantitative methods have been developed to position political parties. These word-based textual analysis techniques rely exclusively on the relative frequency of words. As such, they do not require knowledge of any particular language to extract policy positions from texts. However, different languages have different word distributions and other syntactic idiosyncrasies. These differences might cause word-based textual analysis techniques to extract noticeably different positions from parallel texts that are similar in every aspect except language. How crippling is this potential disadvantage when comparing political texts written in different languages? It is this chapter's objective to determine the effect of language on the two word-frequency methods Wordscores and Wordfish by comparing the policy positions of Canadian parties as extracted from their English and French party manifestos.

    Word-based parallel content analysis

    Over the past thirty years, the methods employed by researchers to locate political parties have evolved from hand-coded methods, such as the well-known Comparative Manifesto Project (CMP) (Budge et al. 2001; Klingemann et al. 2007), to expert surveys (Castles and Mair 1984; Huber and Inglehart 1995; Laver and Hunt 1992) to dictionary methods (Kleinnijenhuis and Pennings 1999; Laver and Garry 2000; Ray 2001). New computer-assisted, quantitative methods for extracting political party positions on the left-right axis or other policy dimensions from political texts have been a useful addition to the researcher's toolbox. They rely on objective textual data, they can be applied to a nearly unlimited flow of data, and they make it possible to isolate policy preferences from behaviour (see Benoit and Laver 2007b; Laver and Garry 2000; Marks et al. 2007). One advantage of word-based textual analysis techniques is that, since they rely exclusively on the relative frequency of words, they do not require knowledge of any particular language to extract policy positions from texts. However, different languages have different word distributions and other syntactic differences. These differences might cause word-based textual analysis techniques to extract noticeably different positions from parallel texts that are similar in every aspect except language.

    Word-based techniques, such as Wordscores (Laver et al. 2003) and Wordfish (Slapin and Proksch 2008), analyse the distribution of words in political texts to extract policy positions from them. We call these techniques word-based because the analytical units are the words in a text, not paragraphs, sentences, locutions or topics. This particularity has two main advantages. First, by chopping texts into words, word-based techniques gain analytical simplicity, because words can be automatically identified and treated without human intervention.¹ Second, since words are treated like quantitative data, knowledge of a language is no longer necessary to extract and then compare policy positions from texts written in different languages.
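    To make the word-frequency logic concrete, the following is a simplified sketch in the spirit of Wordscores, not the published program: it omits the rescaling of raw scores and the uncertainty estimates, and the function names, toy texts and reference positions are assumptions introduced for illustration. Each word is scored by the positions of the reference texts in which it occurs, and a new text is placed at the frequency-weighted mean of its word scores.

```python
from collections import Counter


def relative_frequencies(text):
    """Return each word's share of all word tokens in the text."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def word_scores(reference_texts, reference_positions):
    """Score each word as the expected position of the reference text
    it comes from, given that the word was observed."""
    freqs = {name: relative_frequencies(t) for name, t in reference_texts.items()}
    vocabulary = set().union(*freqs.values())
    scores = {}
    for word in vocabulary:
        total = sum(f.get(word, 0.0) for f in freqs.values())
        scores[word] = sum(
            (f.get(word, 0.0) / total) * reference_positions[name]
            for name, f in freqs.items()
        )
    return scores


def score_text(text, scores):
    """Frequency-weighted mean word score, using only words that also
    occur in the reference texts (at least one such word is required)."""
    freqs = relative_frequencies(text)
    scored = {w: f for w, f in freqs.items() if w in scores}
    weight = sum(scored.values())
    return sum(f * scores[w] for w, f in scored.items()) / weight


# Toy reference texts with assumed positions on a left-right axis.
reference_texts = {
    "left": "welfare welfare equality taxes",
    "right": "markets markets freedom taxes",
}
reference_positions = {"left": -1.0, "right": 1.0}

scores = word_scores(reference_texts, reference_positions)
position = score_text("welfare equality taxes", scores)
```

    Here "taxes" is equally frequent in both reference texts and therefore scores 0, while "welfare" and "equality" occur only in the left text and score -1, so the new text lands left of centre. Nothing in the computation depends on what the words mean or on the language they are written in.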

    The disadvantage of word-based techniques is that they do not take into account the meaning or the grammatical structure of sentences and of the words that make them up. Focusing exclusively on the relative frequency of words can lead to linguistic nonsense when the logic is pushed to its limits. As an extreme illustration, it is possible to extract a policy position from a random bunch of words that no human reader could make sense of, or from a text freely reorganized in, say, alphabetical order. It is also impossible to measure the positive or negative direction of a policy preference in a text (Monroe et al. 2008). For example, the meaning of the sentence "We will raise taxes" is the exact opposite of "We will not raise taxes". But if we cut these sentences into separate words, "we" (2 times), "will" (2), "not" (1), "raise" (2) and "taxes" (2), the only difference between the two is "not", a relatively meaningless word which is likely to be overlooked. The difference in the meanings of the two sentences will be blurred as a result.
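    The negation example can be reproduced with a minimal sketch. The tokenizer and the function name `bag_of_words` are illustrative assumptions and not part of Wordscores or Wordfish; the point is only that, once word order is discarded, the two sentences differ by a single count.

```python
from collections import Counter
import re


def bag_of_words(text):
    """Lowercase the text and count word tokens, discarding word order."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


raise_taxes = bag_of_words("We will raise taxes")
not_raise_taxes = bag_of_words("We will not raise taxes")

# Pooled counts over both sentences, as in the example in the text.
pooled = raise_taxes + not_raise_taxes

# Words whose counts differ between the two sentences.
difference = {
    word: not_raise_taxes[word] - raise_taxes[word]
    for word in pooled
    if not_raise_taxes[word] != raise_taxes[word]
}

# Any reordering of the words yields the same bag of words.
shuffled = bag_of_words("taxes raise will we")
```

    The only entry in `difference` is "not", and `shuffled` is indistinguishable from the original sentence, which is exactly the loss of information described above.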

    How crippling is this potential disadvantage when comparing political texts written in different languages? It is this chapter's objective to determine the effect of language by comparing the policy positions of parties in recent Canadian elections extracted from English and French party manifestos. We do this by checking whether two different methods for automated text analysis, Wordscores and Wordfish, extract the same policy positions on the left-right axis from parallel texts. Parallel texts are original documents written in different languages, not translations. They can be used to benchmark automatic translation quality (see Jian-Yun et al. 1999 for applications of parallel texts).

    1. In practice things are more complicated, and experience showed us that, for example, hyphenated locutions are sometimes treated as a single word and sometimes as separate words, depending on the method used to produce a frequency matrix.

    Do Wordscores and Wordfish extract the same policy positions on the left-right axis from parallel documents? In theory, there is no reason to believe that they would not. Parallel documents are rigorously the same. Their format is identical, they include the same topics, and a bilingual reader will consider those documents to be almost exactly the same. Studies analyzing parallel textual data are interesting because they give an opportunity to test the validity and the reliability of word-based textual analysis methods in a more rigorous way than repeated studies focusing on a single language and/or a single party system. Wordscores has been tested with languages other than English, such as Dutch (Klemmensen et al. 2007), German (Hug and Schulz 2007; Magin et al. 2009; Bräuninger and Debus 2008), and French (Laver et al. 2006). But we could find only one study comparing Wordscores results using parallel texts as input. Debus (2009: 53–54) uses Wordscores to compare Flemish and French coalition agreements in Belgium and finds no significant difference between them for economic policy positions. In their analysis of European Parliament speeches, Slapin and Proksch (2008) compare Wordfish results for speeches in English, French and German. They find remarkable similarities between languages (English and French especially):

    The comparison of the results across languages suggests that the position estimation technique is in fact highly robust to the choice of language (the correlation coefficient is 0.86 or higher). The highest correlation is between positions estimated from the English and French translations. These two languages are so similar to each other with regard to the information contained in words that they produce virtually identical position estimates. (Proksch and Slapin 2009: 13)

    It should be noted that, due to the large number of official languages spoken in the European Union, speeches in the European Parliament are delivered in one language and then translated. Slapin and Proksch relied on translations instead of original texts. It is unclear whether the relatively high level of similarity between texts in different languages was achieved because of, or in spite of, the fact that they were translations. One way to clarify this is to extract party positions from parallel documents whose original versions are written in more than one language.

    Some specific features of languages can have a significant impact on the distribution of words in a text. French and English differ on many levels: syntax, grammar, and style (see Lederer 1994; Vinay and Darbelnet 2003 [1977]). An important syntactic difference between French and English is the use of articles. In French, they are more frequent than in English, because they are quasi-mandatory before nouns. We can expect some common articles (de, du, la, les, etc.) to have a very high frequency, higher than their equivalents in English.

    Looking at grammar, we find several significant differences. In French, grammatical gender results in a duplicate set of adjective, pronoun and article forms, one for masculine and one for feminine nouns. French texts therefore tend to show more differentiated forms of those words than English texts, which means that the same concepts are expressed with more words that are each relatively less frequent than in English. While in English nouns may be singular or plural, in French adjectives can also be singular or plural, in addition to being masculine or feminine. So it is possible to have four word forms to express the same concept ins