The quality of social interaction: Towards an automatic analysis of sentiments in informative and...

The quality of social interaction: Towards an automatic analysis of sentiments in informative and persuasive texts. Khurshid Ahmad, Department of Computing, University of Surrey; Department of Computer Science, Trinity College, Dublin, Ireland. Workshop on Information Management and e-Science, Lancaster e-Science Centre, Lancaster University, 5th October 2005

Transcript of The quality of social interaction: Towards an automatic analysis of sentiments in informative and...

The quality of social interaction: Towards an automatic analysis of sentiments in informative and persuasive texts.

Khurshid Ahmad, Department of Computing, University of Surrey

Department of Computer Science, Trinity College, Dublin, Ireland

Workshop on Information Management and e-Science, Lancaster e-Science Centre, Lancaster University, 5th October 2005

2

Motivation

Newly emergent subjects and e-Science: Behavioural Economics; Investor Psychology; Social Studies of Finance; Economic Sociology

‘The number of items of quantitative and qualitative information available to a well-equipped actor is, in effect, infinite, yet the capacity of any agencement [humans, machines, algorithms, location, ...] to apprehend and to interpret that data is finite’ (Hardie and Mackenzie 2005).

‘The economies of calculation’ (Mackenzie 2003, 2004, 2005)

3

Motivation

Newly emergent subjects and e-Science:

“I remember ’29 very well,” Steinbeck writes (2002: 17), “We had it made…I remember the drugged and happy faces of people who built paper fortunes in stocks they couldn’t possibly have paid for…Their eyes had the look you see around the roulette table.” Then, however, “came panic, and panic changed to dull shock…People remembered their little bank balances, the only certainties in a treacherous world. They rushed to draw the money out. There were fights and riots and lines of policemen. Some banks failed; rumors began to fly”

4

Motivation

Of all the contested boundaries that define the discipline of sociology, none is more crucial than the divide between sociology and economics […] Talcott Parsons, for all [his] synthesizing ambitions, solidified the divide. “Basically,” […] “Parsons made a pact ... you, economists, study value; we, the sociologists, will study values.”

If the financial markets are the core of many high-modern economies, so at their core is arbitrage: the exploitation of discrepancies in the prices of identical or similar assets.

MacKenzie, Donald. 2000b. “Long-Term Capital Management: a Sociological Essay.” In Ökonomie und Gesellschaft, (Eds.) Herbert Kalthoff, Richard Rottenburg and Hans-Jürgen Wagener. Marburg: Metropolis. pp. 277-287.

5

Motivation

Social studies of finance repopulates abstracted financial markets with human
• traders and speculators, who have particular and complex relations to what they understand to be the market;
• inventors of market models and formulas, that prove to be contested and fallible interpretations of economic reality rather than unproblematic representations;
• designers of technology and risk assessment models, which have normative choices and criteria at their hearts; and
• journalists who do not just write impassive financial news, but play important roles in marketing financial products and creating space for speculation in everyday life.

de Goede, Marieke (2005). "Resocialising and Repoliticising Financial Markets: Contours of Social Studies of Finance". Economic Sociology. Vol. 6, No. 3 - July 2005

6

Motivation

Newly emergent subjects and e-Science:
• Criminology: Crime Perception, Detection and Prevention;
• Anthropology: Ethnic and Cultural Identity

‘The number of items of quantitative and qualitative information available to a well-equipped actor is, in effect, infinite, yet the capacity of any agencement [humans, machines, algorithms, location, ...] to apprehend and to interpret that data is finite’ (Hardie and Mackenzie 2005)

7

Motivation: Bounded Rationality

Herbert Simon

• Mechanisms of Bounded Rationality – rationality is bounded when it falls short of omniscience – largely due to failures of knowing all of the alternatives, uncertainty about relevant exogenous events, and inability to calculate consequences (p. 356)

• Human behaviour, even rational human behaviour, is not to be accounted for by a handful of invariants (p. 367)

8

Motivation: Sentiment Analysis?

In the 1960’s and 1970’s “The unpredictability of inflation was a primary cause of business cycles”.

Friedman: “the level of inflation was not a problem; it was the uncertainty about future costs and prices that would prevent entrepreneurs from investing and lead to a recession” (Milton Friedman 1977).

Friedman’s conjecture “could only be plausible if the uncertainty were changing over time so this was my goal. Econometricians call this heteroskedasticity.” (Robert Engle 2003)

Friedman, M. (1977), "Nobel Lecture: Inflation and Unemployment," Journal of Political Economy, 85, 451-472.

Engle, Robert (2003). Risk and Volatility: Econometric Models and Financial Practice, Nobel Lecture, December 8, 2003
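The idea that uncertainty itself changes over time can be made concrete with a small simulation. The sketch below is not taken from the talk: it assumes an ARCH(1) process, in which today's conditional variance depends on yesterday's squared shock, and the function name `simulate_arch1` and the parameter values `omega` and `alpha` are illustrative choices only.

```python
import random

def simulate_arch1(n, omega=0.2, alpha=0.7, seed=42):
    """Simulate an ARCH(1) series: var_t = omega + alpha * shock_{t-1}**2.

    omega and alpha are illustrative values, not estimates from any data.
    Returns (shocks, conditional_variances).
    """
    rng = random.Random(seed)
    shocks, variances = [], []
    prev_shock = 0.0
    for _ in range(n):
        var_t = omega + alpha * prev_shock ** 2      # conditional variance today
        shock = rng.gauss(0.0, 1.0) * var_t ** 0.5   # scale a standard normal draw
        shocks.append(shock)
        variances.append(var_t)
        prev_shock = shock
    return shocks, variances

shocks, variances = simulate_arch1(500)
# The conditional variance is not constant over time: that is heteroskedasticity.
print(min(variances), max(variances))
```

A large shock raises the next period's variance, so calm and turbulent periods cluster, which is the kind of time-varying uncertainty the Engle quotation refers to.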

9

Motivation: Sentiment Analysis?

Two strands of literature imply asymmetry in the response of exchange rates to news.
• First Strand: bad news in “good times” should have an unusually large impact
• Second Strand: “bad news should have unusually large effects”

Robert Engle shared the 2003 Nobel Prize in Economic Sciences for formulating the impact of ‘news’ on economic and financial variables. ‘News’ was code for the ‘announcement of key economic indices by various agencies’.

Torben G. Andersen, Tim Bollerslev, Francis X. Diebold & Clara Vega (2002). Micro Effects of Macro Announcements: Real-Time Price Discovery in Foreign Exchange. Working Paper 8959. Cambridge, MA: National Bureau of Economic Research. http://www.nber.org/papers/w8959
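One simple way to look for the asymmetry described above is to compare how far prices move per unit of surprise after bad news and after good news. The sketch below is only an illustration and is not the method of the cited working paper: it assumes that a negative surprise (actual minus expected) counts as bad news, and the function name `asymmetric_response` and all numbers are made up.

```python
def asymmetric_response(surprises, returns):
    """Average absolute price move per unit of absolute surprise, split by sign.

    surprises: announcement surprises (actual minus expected); negative = bad news
               (an assumption made for this illustration).
    returns:   price changes observed after each announcement.
    """
    def response(pairs):
        total_surprise = sum(abs(s) for s, _ in pairs)
        total_move = sum(abs(r) for _, r in pairs)
        return total_move / total_surprise if total_surprise else 0.0

    pairs = list(zip(surprises, returns))
    bad = [(s, r) for s, r in pairs if s < 0]
    good = [(s, r) for s, r in pairs if s > 0]
    return {"bad": response(bad), "good": response(good)}

# Made-up numbers purely for illustration, not data from any study.
surprises = [0.5, -0.5, 1.0, -1.0]
returns = [0.10, -0.30, 0.20, -0.60]
result = asymmetric_response(surprises, returns)
print(result)
```

With these invented figures the response to bad news is about three times the response to good news; real data would of course need a proper regression with controls.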

10

Motivation: Bounded Rationality

Daniel Kahneman

• Maps of Bounded Rationality – Two generic modes of cognitive function: an intuitive mode, where judgements and decisions are made automatically and rapidly, and a controlled mode which is deliberate and slower (p. 449)

• Kahneman and Tversky found that intuitive judgements occupy a position […] between automatic operation of perception and the deliberate operations of reasoning (e.g. discrepancy between statistical judgement and statistical knowledge). (p. 450)

• Highly accessible features will influence decisions, while features of low accessibility will be largely ignored. (p. 459)

• Abrupt transition from risk aversion to risk seeking could not be plausibly explained by a utility function for wealth (p. 461)

11

Motivation: Bounded Rationality

[Figure: Japanese yen/US dollar exchange rate (decreasing solid line); US consumer price index (increasing solid line); Japanese consumer price index (increasing dashed line), 1970:1 − 2003:5, monthly observations]

Why is it that the Japanese consumer price index is following the same trend as the US CPI?

12

Motivation: I wrote therefore I existed; I may write and change the world

The real world | Genre
News Reports; Regulatory Body Reports | Informative
Commentaries; Letters to the Editors; Rumour-laden e-mails | Appellative
Semi-structured interviews; Confidence Surveys | Expressive

++ Language and text are constitutive (and not merely representational)
-- ‘society is not reducible to language and linguistic analysis’ (Hodgson 2000:62).
-- ‘Discourses are broader than language, being constituted not just in texts, but also in definite institutional and organizational practices’ (Jackson 2004).
++ But text is all we have after the event, the interview, the survey, the news, the review – a trace of the sentiment.

13

The quality of social interaction

or the world according to Khurshid Ahmad

Any analysis of the interaction between the members of a well defined social group, where each is engaged in optimising return on his or her economic and social investment, should involve an analysis of the 'sentiments' of the group members

14

The quality of social interaction

or the world according to Khurshid Ahmad

The sentiment is expressed in the news and views that emanate from and on behalf of the members in free natural language writing and speech excerpts.

The quantifiable aspects of the exchange of objects, abstract (power) and concrete (money, goods, and services), have to be assessed in the context of how the news and views may impact on the exchange.

15

The quality of social interaction

or the world according to other folk

More importantly the sentiment may be expressed through action:
(a) panic buying and selling of financial instruments by the investors and traders, and
(b) the sometimes complacent attitude of the regulators,
are good examples of economic, social and political action by individuals and groups.

Simon, H.A. (1978). “Rational Decision-Making in Business Organizations”. Nobel Lectures, Economics 1969-1980, (Editor) Assar Lindbeck, World Scientific Publishing Co.: Singapore, 1992. http://www.nobel.se/economics/laureates/1978/simon-lecture.html.

Kahneman, D. (2002). “Maps of Bounded Rationality: A perspective on Intuitive Judgement and Choice”, Les Prix Nobel 2002. (Editor) Professor Tore Frängsmyr. http://www.nobel.se/economics/laureates/2002/kahneman-lecture.html.

Mackenzie, Donald. (2000). ‘Fear in the Markets’. London Review of Books. Vol 22 (No. 8).

16

The quality of social interaction

or the world according to other folk

Actions motivated by panic can equally well be seen in mass hysteria related to national/ethnic identity that, in turn, can motivate concerns related to security and safety (Jackson 2004).

Jackson, Richard (2004). ‘The Social Construction of Internal War’ In (Ed.) Richard Jackson. (Re)Constructing Cultures of Violence and Peace. Rodopi: Amsterdam/New York.

17

e-Science and social interaction?

The UK e-Science programme is moving towards successful completion.

Major contribution has been made to UK science and technology:
• Bioinformatics, psychiatry, chemistry and engineering (Discovery Net and myGrid)
• New ways of doing chemistry (CombeChem)
• Visualisation of complex systems (RealityGrid);
• Novel design (GEODISE);
• Safer aircraft (DAME)

18

e-Science and social interaction?

Crime, conflict, and economy are deeply interrelated and highly interactive.

However, data and methods in each area are in a mono-disciplinary silo, referred to by some as data tombs, where access to others requires significant mediation.

Data required in each case includes quantitative data, textual data, and historical data.

19

e-Science and social interaction?

Social sciences and the so-called hard sciences increasingly use complementary methodologies, and a century or more of discussion of methodology, statistical methods and structural models is witness to this.

E-Science offers the potential for convergence of scientific methods through provision of a common underlying structure, or "grid", of computational methods, data-base technologies and conceptual models.

20

e-Science and social interaction?

Social scientists often want to develop evidence based substantive theory. They want to know “what determines what”, e.g. long term unemployment and social exclusion

And social scientists want to explore the consequences of policy changes on individual behaviour, e.g. encouragement to stay on at school on educational attainment, truancy, and social exclusion

Social science data sets may be small (<10GB (some exceptions)) but they are complex

(Imitation is the sincerest form of flattery – Rob)

21

e-Science and social interaction?

Disciplines: Financial Economics | Sociology of Crime; Crime Science | Social Anthropology

Quantitative data shared across the disciplines: Macro-micro Economic Indicators; Census Statistics; Survey of Social Attitudes; Life-style and Well-being Statistics

Quantitative data by discipline:
• Financial Economics: Market Movement
• Sociology of Crime; Crime Science: Crime Statistics
• Social Anthropology: Ethnicity-related data

Textual data shared across the disciplines: Political News – Reports, Editorials, Letters to the Editor; Political and Social Opinion Polls; Consumer Confidence Survey

Textual data by discipline:
• Financial Economics: Investor/Trader Confidence Surveys; Regulatory Body Output; Financial News
• Sociology of Crime; Crime Science: Citizen Confidence Surveys; Police Forces/Home Office Reports; Crime Reports
• Social Anthropology: Ethnic Minority Surveys; Police Forces/Home Office Reports; Crime Reports

22

The Surrey Society Grid Demonstrator

Was developed under the aegis of the ESRC e-Social Science Programme (FINGRID).

It demonstrated how Grid technologies could support novel research activities in financial economics that involve
• the rapid processing of large volumes of time-varying qualitative and quantitative data (Monte Carlo simulation, wavelet analysis, fuzzy logic and neural network based simulations)
• fusing/visualising of such qualitative and quantitative data (qualitative data – news, e-mails – and quantitative data – non-stationary and heteroskedastic data collated at different frequencies and in different units).
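Fusing qualitative and quantitative data collated at different frequencies requires, at a minimum, placing both on a common time grid. The sketch below is not the demonstrator's implementation: it assumes timestamps in seconds, a numeric sentiment score already attached to each news item, and a hypothetical helper name `fuse_by_interval`.

```python
from collections import defaultdict

def fuse_by_interval(ticks, news, interval=60):
    """Align tick prices and news scores on a common time grid.

    ticks: list of (timestamp_seconds, price)
    news:  list of (timestamp_seconds, sentiment_score)
    Returns {bucket_start: {"last_price": ..., "news_score": ...}}.
    """
    fused = defaultdict(lambda: {"last_price": None, "news_score": 0.0})
    for ts, price in sorted(ticks):
        fused[ts - ts % interval]["last_price"] = price    # keep the latest tick
    for ts, score in news:
        fused[ts - ts % interval]["news_score"] += score   # sum scores per bucket
    return dict(sorted(fused.items(), key=lambda kv: kv[0]))

# Invented sample values, for illustration only.
ticks = [(0, 100.0), (30, 100.5), (65, 99.8)]
news = [(10, -1.0), (70, 1.0), (80, 1.0)]
fused = fuse_by_interval(ticks, news)
print(fused)
```

Keeping the last price and summing the scores per bucket is one of several reasonable aggregation choices; differences in units would still need separate normalisation.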

23

The Society Grid Demonstrator

Globus Toolkit 3.0 (based on Open Grid Services Architecture (OGSA))

Java CogKit (Java Commodity Grid) for resource management and system integration

Languages for Development:
• Java for the implementation of the application
• Reuters SSL Developer’s Kit (Java) for the connection with the Reuters streaming data

Other Technologies:
• XML (NewsML) for the news information
• JMatlink (adapted to Linux environment for the communication with Matlab environment)
• CGI for communication of Java Applet with the server side
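As a rough illustration of how news delivered as XML can be turned into text for analysis, the sketch below parses a simplified, NewsML-like fragment with Python's standard library. The element names (`NewsItem`, `HeadLine`, `Body`) and the sample content are assumptions made for this example and are not guaranteed to match the NewsML schema or the Reuters feed; the demonstrator itself was written in Java.

```python
import xml.etree.ElementTree as ET

# A simplified, NewsML-like fragment. The element names are illustrative
# assumptions and are not guaranteed to match the NewsML schema.
SAMPLE = """
<NewsML>
  <NewsItem>
    <HeadLine>Bond prices rise</HeadLine>
    <Body>Traders were upbeat as bond prices rose sharply.</Body>
  </NewsItem>
  <NewsItem>
    <HeadLine>Bank shares fall</HeadLine>
    <Body>Investors feared further losses.</Body>
  </NewsItem>
</NewsML>
"""

def extract_items(xml_text):
    """Return a list of {"headline", "body"} dicts, one per NewsItem element."""
    root = ET.fromstring(xml_text.strip())
    return [
        {
            "headline": item.findtext("HeadLine", default=""),
            "body": item.findtext("Body", default=""),
        }
        for item in root.iter("NewsItem")
    ]

items = extract_items(SAMPLE)
for item in items:
    print(item["headline"])
```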

24

The Society Grid Demonstrator

Live financial data: news, historical time series data and tick data provided by Reuters (Reuters SSL SDK).

Time series analysis: a FORTRAN bootstrap algorithm, and the MATLAB toolkit for Wavelet Analysis (via JMatLink)

News/Sentiment analysis: System Quirk components for terminology extraction, ontology learning and local grammar analysis.

Visualisation and fusion: System Quirk components for corpus visualisation, financial charting, and data fusion.
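The slide does not say which bootstrap variant the FORTRAN code implements. As a minimal sketch under that caveat, the example below uses a plain nonparametric bootstrap with a percentile interval for the mean of a series; the function name `bootstrap_mean_interval`, the number of resamples, and the sample returns are all illustrative assumptions.

```python
import random

def bootstrap_mean_interval(values, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the mean of `values`.

    Resamples with replacement, computes the mean of each resample,
    and reads the interval off the sorted resample means.
    """
    rng = random.Random(seed)
    n = len(values)
    means = sorted(
        sum(rng.choice(values) for _ in range(n)) / n
        for _ in range(n_resamples)
    )
    tail = (1.0 - level) / 2.0
    lower = means[int(tail * n_resamples)]
    upper = means[int((1.0 - tail) * n_resamples) - 1]
    return lower, upper

# Invented returns, for illustration only.
returns = [0.4, -0.2, 0.1, 0.3, -0.5, 0.2, 0.0, 0.6, -0.1, 0.2]
low, high = bootstrap_mean_interval(returns)
print(low, high)
```

Resampling individual observations ignores serial dependence, so for real financial time series a block bootstrap or similar variant would usually be more appropriate.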

25

Design and Performance of the Society Grid

[Figure: Preparation Time, GridFTP Upload Time, Processing Time and GridFTP Download Time; y-axis: time in ms (log scale, 4 to 7); x-axis: number of CPUs (0 to 80)]

26

The new (e-) Social Sciences?

Social sciences deal with collectives, or agencements comprising human beings, technical devices, algorithms, workplaces and so on (Callon 1998), such that the number of items of quantitative and qualitative information to a well equipped economic actor, or agencement, ‘is, in effect, infinite, yet the capacity of any agencement to apprehend and to interpret that data is finite’ (Hardie and MacKenzie 2005)

Callon, Michel. (1998). The Laws of the Markets. Oxford: Blackwell.

Hardie, Iain & MacKenzie, Donald. (July 2005). An Economy of Calculation: Agencement and Distributed Cognition in a Hedge Fund (http://www.sps.ed.ac.uk/staff/An%20Economy%20of%20Calculation.pdf)

27

The new (e-) Social Sciences?

The number of data items available to an agencement in a market place – financial instruments, commodity markets, e-Bay (?) – is potentially infinite but at any given time only a fraction of that data can be processed. The market place is a fickle place and the information derived from historical data can be so quickly outdated that ‘in any agencement for a selective, socially distributed, technologically-mediated ‘economy of calculation’.

“The economies of calculation and the agencements that underpin them stretch beyond individual firms: the sifting of information often takes place in networks of interacting participants. The features of processes involved – for instance, where agency lies, the types of information that are deemed relevant or irrelevant, how that information is processed – are consequential. They affect, for example, the possibility of a ‘global’ market and help shape how ‘markets’ and ‘politics’ interact.” (Hardie & MacKenzie 2005).

Hardie, Iain & MacKenzie, Donald. (July 2005). An Economy of Calculation: Agencement and Distributed Cognition in a Hedge Fund (available from [email protected])

28

The new (e-) Social Sciences?

Sentiments and the sociology of financial markets

Mackenzie has focused on how a mathematical-economics theory is used to create a new instrument – especially arbitrage (Mackenzie 2003) and options markets (Mackenzie and Millo 2003, Mackenzie 2004) – and then the theory is used to explain and monitor the workings of the instrument.

Mackenzie, Knorr-Cetina and others are studying the rise of electronic markets – where people in distant geographical locations can be ‘interactionally present’

Mackenzie, Donald. (2003). ‘Long-Term Capital Management and the sociology of arbitrage’. Economy and Society Vol. 32 (No. 3). pp 349-380.

29

The new (e-) Social Sciences?

Sentiments and the sociology of financial markets

Mackenzie used interviewing techniques to understand the collapse of a large arbitrage firm (Long-Term Capital Management, LTCM), a firm that pioneered trading of financial instruments that sought to profit from price discrepancies; the 24/7 watch on price discrepancies requires a distributed computational infrastructure.

Mackenzie (2003) has looked at the change in the value of the instruments and has conducted just under 70 interviews with partners and employees of the failed firm, including a Nobel Laureate who was a partner, and with other experts, together with documents that were found to have precipitated or hastened the demise of LTCM. The sentiment about LTCM as expressed in the interviews, and in some of the key documents, formed the basis of an analysis of a set of time series and the computation of key parameters of the time series.

Mackenzie, Donald. (2003). ‘Long-Term Capital Management and the sociology of arbitrage’. Economy and Society Vol. 32 (No. 3). pp 349-380.

30

The new (e-) Social Sciences?

Sentiments and the sociology of financial markets

Mackenzie found that he was working with a community of people who had organized themselves and knew each other. There was evidence that imitation by others of the business model and practices adopted by the firm played a major role in the demise of the firm. Most importantly for us, Mackenzie cites the existence of a fax sent by one of the principals of the firm that asked investors to make more investment as problems had started to arise: this fax was posted on the Internet within five minutes of its dispatch and contributed to the demise of the firm. The sentiments expressed by the principal were misconstrued by the recipients and, despite the fairly sound reasons expressed in the fax, albeit in a febrile atmosphere, bounded rationality of the recipients came into play.

Mackenzie, Donald. (2003). ‘Long-Term Capital Management and the sociology of arbitrage’. Economy and Society Vol. 32 (No. 3). pp 349-380.

31

The new (e-) Social Sciences?

Sentiments and the sociology of financial markets

Knorr-Cetina and Bruegger (2002) have looked at the emergence of electronic markets and focused on the virtual societies being formed in the financial markets through the infrastructure that supports electronic trading.

The trading room operative is in a disembodied world dealing with an on-screen reality that ‘lacks an off-screen counterpart’ – a form of representation (appresentation) of markets. The operative is connected to others through electronic mail, news and data feeds (this is not explicitly dealt with in Knorr-Cetina and Bruegger), and has access to a computing system that can process very complex data in a timely and efficient manner.

This virtual world has fast throughput of data and processed information and the rapidity of the interaction perhaps compensates for the disembodied nature of the electronic trading markets.

Knorr-Cetina, Karin & Bruegger, Urs. (2002). ‘Global Microstructures: The Virtual Societies of Financial Markets’. American Journal of Sociology. Volume 107, pp 909-950.

32

The new (e-) Social Sciences?

Hardie, Iain & MacKenzie, Donald. (July 2005). An Economy of Calculation: Agencement and Distributed Cognition in a Hedge Fund (available from [email protected])

There is a constant stream of news and e-mails in a dealing room. Some directly from news agencies (*) and some annotated items based on the news


35

The new (e-) Social Sciences?

But whilst the trader is not ‘reading’ the news off the live news wire streams (Reuters, Bloomberg, BBC, CNN), somebody else is eyeballing the news for the content (Brazilian economics, Chilean politics) and the sentiment (bonds so hot that they were on fire!)

36

The classical Social Sciences: Eyeballing the text!

The key requirement in contemporary social sciences is to complement the analysis of a range of data sets, demographic, economic and political, with data related to the person (Kahneman 2002, Simon 1972), or lived experience (Sacks 1992, Silverman 2004)

Sacks, H. (1992). Lectures on Conversation. Oxford: Blackwell Publishers (Ed. Gail Jefferson).
Silverman, David. (2004). ‘Who cares about experience?’. In (Ed.) David Silverman. Qualitative Research. London: Sage Publications. pp 342-367.

37

The classical Social Sciences: Eyeballing the text!

Package | Function | Facilities
ATLAS.ti | text analysis and model building | Users attach code and annotate; search/select segments by code; manual hotlinks connecting segments, displays link information diagrammatically. Similar segments can be coded automatically
The General Inquirer | content analysis | Users can establish patterns in the meaning of words supported by large content dictionaries (Lasswell Value Dictionary; Harvard Psycho-Sociological Dictionary)
Nvivo | ‘Entry’ level qualitative text analysis | Users supply text patterns and can analyse text data base through text-pattern matching to search for repetition, variant word forms, recurrent phrases.
QUALRUS | General purpose qualitative analysis package | Offers intelligent suggestions throughout the coding process; analysis of data once it has already been coded
TextSmart (SPSS's module) | coding and analyzing open-ended survey questions | Automated stemming; grouping of synonyms; excludes grammatical words automatically; term clustering; text categorisation based on clustering; dictionary-free approach

38

The classical Social Sciences: Eyeballing the text!

What is missing in the qualitative analysis packages?

The texts have to be eye-balled – most phrases, clauses, paragraphs have to be coded/annotated by hand: an impossible task when the volume of text all around us is exploding;

There is a need for a domain specific thesaurus (conceptually-organised terminology or ‘ontology’) for each new domain:
• Identify ontological commitments;
• Find terms, and the broader/narrower equivalents; synonyms and antonyms;
• Maintain terminology data bases

Texts that are conceptually similar within a domain have to be clustered using unsupervised learning algorithms

39

The new (e-) Social Sciences? Towards an automatic analysis

What is missing in the qualitative analysis packages?

40

The new (e-) Social Sciences? Towards an automatic analysis

One key result of close social interaction is the emergence of a sub-set of the natural language of a given community that is idiosyncratic of the desires, aspirations, goals and prejudices of the community, reflecting the idiosyncratic nature of the ontological commitment of the community;

The subset has its own lexicogrammar and is called the language for special purposes of a given specialism

Lexicogrammar: Vocabulary (terminology) + Local Grammar

41

The new (e-) Social Sciences? Towards an automatic analysis

July 2005 Reuters Financial News Service: News items disambiguated using an automatically extracted terminology and an automatically constructed local grammar that only recognises changes in financial instruments

Measure | Total | Per Hour
Number of News Items | 134,975 | 208
Number of Words | 46,337,111 | 71,508
Raw Sentiment | 774,507 | 1,195
Raw Positive | 520,006 | 802
Raw Negative | 254,501 | 393
Filtered Sentiment | 56,102 | 87
Filtered Positive | 17,340 | 27
Filtered Negative | 38,762 | 60

42

The new (e-) Social Sciences? Towards an automatic analysis

Changes in ‘semantic orientation’ for a news input, for July 2005 for all shares in the FTSE.

[Figure: semantic orientation (y-axis, -500 to 500) against hours (x-axis, 0 to 250).]

43

The new (e-) Social Sciences? Towards an automatic analysis

• There is no obvious technique in social science research methods that can improve the researcher's productivity in collecting and analysing large volumes of speech and text.

• Social scientists survey, and occasionally interview, interesting individuals in various social groups – analyse the survey form and quantify.

• So what about the data collected in the field? Data is buried in tombs never to be taken out again.

• Most text, if coded at all, is hand-coded by the social science researcher and then the proxy of the interpretation of the codes is presented as objective analysis.

The real world – Genre:
• News Reports; Regulatory Body Reports: Informative
• Commentaries; Letters to the Editors; Rumour-laden e-mails: Appellative
• Semi-structured interviews; Confidence Surveys: Expressive

44

The new (e-) Social Sciences? Towards an automatic analysis

The real world – Genre:
• News Reports; Regulatory Body Reports: Informative
• Commentaries; Letters to the Editors; Rumour-laden e-mails: Appellative
• Semi-structured interviews; Confidence Surveys: Expressive

• We present a method for systematically identifying sentiment bearing phrases in large volumes of streaming texts – a local grammar comprising templates to extract the phrases with a minimal number of false positives.

• The sentiments are aligned with quantitative (time-varying) information, and the aligned series are tested for co-integration and Granger causality

• The grammar itself is constructed automatically from a corpus of domain specific texts

45

Conclusions and Future Work

The methods developed in the Society Grids project can be used to investigate a person’s perception of his or her own well being, at different times and in different places, and in various facets - social, political and economic.

This can be the same as, or at variance with, for example, crime statistics, economic indicators, achievements or failures of (other) ethnic/racial categories.

These can be extended to new areas like:
• the reassurance gap in policing
• totalising war discourse that leads to ethnic/racial conflicts

46

We rely on reviews and opinion polls of various kinds:
• Film & TV reviews; Book reviews; Resort reviews
• Bank reviews; Automobile reviews; White goods reviews;
• Consumer surveys; ‘write your own’ reviews;
• Newspaper editorials; Editors’ choice.

Towards an automatic analysis of sentiments?

47

We rely on the sentiment of the reviewers, editors, investment experts, and …
• We do know the cost of durables, shares, holidays.
• A reasonable price is rejected if the reviews are poor; an exorbitant price is acceptable if the reviews are good;
• Bad reviews stick in the mind for longer than good reviews.

Towards an automatic analysis of sentiments?

48

We rely on the sentiment of the more vociferous in the society sometimes
• The vociferous may call black white, and white black;
• The vociferous may repudiate facts and purvey fiction.

Towards an automatic analysis of sentiments?

49

Turney, Peter D. (2002). “Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews”. In Proc of the 40th Ann. Meeting of the Ass. for Comp. Linguistics (ACL). Philadelphia, July 2002, pp. 417-424. (Available at http://acl.ldc.upenn.edu/P/P02/P02-1053.pdf).

Column 1 | Column 2
online service | unethical practices
online experience | low funds
direct deposit | other problems
local branch | old man
low fees | lesser evil
well other | virtual monopoly
small part | probably wondering
printable version | little difference
true service | other bank
other bank | possible moment
inconveniently located | extra day

A new bank has just been launched: Punter Smith has passed his judgement on the bank. Which of the two columns tells us that he likes the new outfit?

Towards an automatic analysis of sentiments?

50


How can a machine detect the positive/negative sentiment from texts? We eyeball the collocation of words like excellent & poor in a text corpus.

The pointwise mutual information is computed between word1 & word2:

$$\mathrm{PMI}(word_1, word_2) = \log_2 \frac{p(word_1 \,\&\, word_2)}{p(word_1)\,p(word_2)}$$

Semantic orientation of a phrase is given as:

$$\mathrm{SemOr}(phrase) = \mathrm{PMI}(\text{``excellent''}, phrase) - \mathrm{PMI}(\text{``poor''}, phrase)$$
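The two quantities can be sketched in a few lines of Python. This is only an illustration of Turney's (2002) definitions, assuming co-occurrence counts are already available; the function names, the argument names and the small smoothing constant are choices made for this sketch, not part of the slides.

```python
import math

def pmi(count_xy, count_x, count_y, n):
    """Pointwise mutual information (in bits) from raw counts over n observations."""
    p_xy = count_xy / n
    p_x = count_x / n
    p_y = count_y / n
    return math.log2(p_xy / (p_x * p_y))

def semantic_orientation(near_excellent, near_poor, n_excellent, n_poor, smoothing=0.01):
    """PMI(excellent, phrase) - PMI(poor, phrase), written as a single log ratio.

    near_excellent / near_poor: how often the phrase co-occurs with each word.
    n_excellent / n_poor: how often each reference word occurs overall.
    smoothing: small constant (an assumption here) that avoids division by zero.
    """
    return math.log2(((near_excellent + smoothing) * n_poor) /
                     ((near_poor + smoothing) * n_excellent))

# Hypothetical counts: a phrase seen 8 times near "excellent" and 2 times near "poor".
score = semantic_orientation(8, 2, 100, 100, smoothing=0)
```

A positive score suggests the phrase keeps company with "excellent" more than with "poor"; a negative score suggests the opposite. The probability of the phrase itself cancels in the difference, which is why it does not appear as an argument.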

Towards an automatic analysis of sentiments?

51

Phrase | Semantic Orientation | Phrase | Semantic Orientation
online service | 2.780 | unethical practices | -8.484
online experience | 2.253 | low funds | -6.843
direct deposit | 1.288 | other problems | -2.748
local branch | 0.421 | old man | -2.566
low fees | 0.333 | lesser evil | -2.288
well other | 0.237 | virtual monopoly | -2.050
small part | 0.053 | probably wondering | -1.830
printable version | -0.705 | little difference | -1.615
true service | -0.732 | other bank | -0.850
other bank | -0.850 | possible moment | -0.668
inconveniently located | -1.541 | extra day | -0.286

How can a machine detect the positive/negative sentiment from texts? We eyeball the collocation of words like excellent & poor in a number of texts.

Towards an automatic analysis of sentiments?

52

Robert Engle’s contribution: Volatility may vary considerably over time: large (small) changes in returns are followed by large (small) changes.

Engle, R. F. (1982). Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica Vol 50, pp 987-1007.

Towards an automatic analysis of sentiments?

53

Engle and Ng have developed the concept of the news impact curve.

The idea is to condition at time t on the information available at t − 2 and thus consider the effect of the shock ε_{t−1} on the conditional variance h_t in isolation.

The conditional variance is affected by the latest information, “the news” ε_{t−1}:

• The symmetric case: Both positive and negative news have the same effect.

• The asymmetric case: a positive and an equally large negative piece of “news” do not have the same effect on the conditional variance.

Engle, R. F. and Ng, V. K (1993). Measuring and testing the impact of news on volatility, Journal of Finance Vol. 48, pp 1749-1777.

$$h_t = \alpha_0 + \alpha_1 \varepsilon_{t-1}^2$$

$$h_t = \alpha_0 + \alpha_1 \varepsilon_{t-1}^2 + \beta_1 h_{t-1}$$
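A news impact curve can be sketched by holding the previous conditional variance fixed and varying the shock. The sketch below assumes the symmetric GARCH(1,1) form h_t = α0 + α1 ε²_{t−1} + β1 h_{t−1}; the asymmetric variant adds an extra weight on negative shocks in the style of the GJR model, which is an assumption of this example and not something specified on the slide. All parameter values are hypothetical.

```python
def nic_symmetric(eps, alpha0, alpha1, beta1, h_prev):
    """Symmetric case: the response depends only on the size of the shock."""
    return alpha0 + alpha1 * eps ** 2 + beta1 * h_prev

def nic_asymmetric(eps, alpha0, alpha1, beta1, gamma, h_prev):
    """Asymmetric case (GJR-style, assumed): negative shocks get extra weight gamma."""
    extra = gamma * eps ** 2 if eps < 0 else 0.0
    return alpha0 + alpha1 * eps ** 2 + extra + beta1 * h_prev

# Hypothetical parameters, chosen only to illustrate the shape of the curves.
params = dict(alpha0=0.1, alpha1=0.1, beta1=0.8, h_prev=1.0)
shocks = [-2.0, -1.0, 0.0, 1.0, 2.0]
symmetric_curve = [nic_symmetric(e, **params) for e in shocks]
asymmetric_curve = [nic_asymmetric(e, gamma=0.1, **params) for e in shocks]
```

In the symmetric curve the values for a shock and its negative coincide; in the asymmetric curve the negative branch lies above the positive one, which is the pattern described as bad news raising volatility more than good news.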

Towards an automatic analysis of sentiments?

54

News Analysis and Sentiment Analysis

Dan Nelson (1992) ‘recognized that volatility could respond asymmetrically to past forecast errors. In a financial context, negative returns seemed to be more important predictors of volatility than positive returns. Large price declines forecast greater volatility than similarly large price increases. This is an economically interesting effect that has wide ranging implications’

55

[Figure: news impact curves for the symmetric case and the asymmetric case (Engle and Ng 1993).]

Towards an automatic analysis of sentiments?

56

News Effects
I: News Announcements Matter, and Quickly;
II: Announcement Timing Matters
III: Volatility Adjusts to News Gradually
IV: Pure Announcement Effects are Present in Volatility
V: Announcement Effects are Asymmetric – Responses Vary with the Sign of the News;
VI: The effect on traded volume persists longer than on prices.

Andersen, T. G., Bollerslev, T., Diebold, F X., & Vega, C. (2002). Micro effects of macro announcements: Real time price discovery in foreign exchange. National Bureau of Economic Research Working Paper 8959, http://www.nber.org/papers/w8959

Towards an automatic analysis of sentiments?

57

Eyeballing News for Sentiments

Qualitative research methods are being used in financial economics, and in sociological studies of financial markets, for systematically studying the hopes and fears of the traders, investors, and regulators in the analysis of the behaviour of the markets.

Since 2000, the analysis of news wire has become selective and targeted.

Some researchers choose news related to economic and financial topics:
• news about employment
• distinguish between scheduled and non-scheduled news announcements;

58

Eyeballing News for Sentiments

Some pre-select keywords that indicate change in the value of a financial instrument – including metaphorical terms like above, below, up and down – and use them to ‘represent’ positive/negative news stories.

Some use the frequency of collocation patterns for assigning a ‘feel-good/bad’ score to the story:
• ‘Good’ news stories appear to comprise collocates like revenues rose, share rose;
• ‘Bad’ news stories contain profit warning, poor expectation;
• ‘Neutral’ stories contain collocates such as announces product, alliance made;

The ‘sentiment’ of the story is then correlated with that of a financial instrument cited in the stories and inferences made.

59

Automating News Analysis for Extracting Sentiments

We adopt a text-driven and bottom-up method: starting from a collection of texts in a specialist domain, together with a representative general language corpus, we use the following five-step algorithm for identifying discourse patterns with more or less unique meanings, without any overt access to an external knowledge base

60

Automating News Analysis for Extracting Sentiments: A method

I. Select training corpora: Reuters Corpus Volume 1 (RCV1) and a general language corpus.
II. Extract key words;
III. Extract key collocates;
IV. Extract local grammar using collocation and relevance feedback;
V. Assert the grammar as a finite state automaton.

61

Automating News Analysis for Extracting Sentiments: An experiment

I. Select training corpora

Training corpus:
• The British National Corpus, comprising 100 million tokens distributed over 4124 texts (Aston and Burnard 1998);
• Reuters Corpus Volume 1 (RCV1), comprising news texts produced in 1996-1997 and containing 181 million words distributed over 806,791 texts

62

Automating News Analysis for Extracting Sentiments: An experiment

II. Extract key words

• The frequencies of individual words in the RCV1 were computed using System Quirk;
• for describing how our method works we will use a randomly selected component of the corpus – the output of February 1997, henceforth referred to as the RCV1-Feb97 corpus;
• the RCV1-Feb97 corpus contains 14 million words distributed over 63,364 texts.

63

Automating News Analysis for Extracting Sentiments: An experiment

Ranks | RCV1 Feb97 (N_RCV1Feb97 = 14 Million) | Cumulative Number of Tokens (%) | British National Corpus (N_BNC = 100 Million) | Cumulative Number of Tokens (%)
1-10 | the, to, of, in, a, and, said, on, s, for | 0.87 M (21.3%) | the, of, and, a, in, to, for, is, as, that | 22.3 M (22.3%)
11-20 | at, that, was, is, it, by, with, from, percent, be | 0.28 M (6.8%) | was, I, on, with, as, be, he, you, at, by | 6.51 M (6.5%)
21-30 | as, he, million, year, its, will, but, has, would, were | 0.17 M (4.2%) | are, this, have, but, not, from, had, his, they, or | 4.23 M (4.2%)
31-40 | an, not, are, have, which, had, up, n, new, market | 0.13 M (3.3%) | which, an, she, where, here, we, one, there, all, been | 3.05 M (3.1%)
41-50 | this, we, after, one, last, company, u, they, bank, government | 0.10 M (2.6%) | their, if, has, will, so, would, no, what, can, when | 2.35 M (2.4%)

64

Automating News Analysis for Extracting Sentiments: An experiment

RCV1 Feb97: N_RCV1Feb97 = 14,244,349; BNC: N_BNC = 100,000,000

Token | RCV1 Feb97 Rank | f_RCV1Feb97 | f_RCV1Feb97 / N_RCV1Feb97 (a) | BNC Rank | f_BNC | f_BNC / N_BNC (b) | Weirdness (a/b)
percent | 19 | 65763 | 0.462% | 3394 | 2928 | 0.003% | 157.84
market | 40 | 36349 | 0.255% | 301 | 30078 | 0.030% | 8.49
company | 46 | 29058 | 0.204% | 219 | 40118 | 0.040% | 5.09
bank | 49 | 28041 | 0.197% | 562 | 17932 | 0.018% | 10.99
shares | 56 | 23352 | 0.164% | 1285 | 8412 | 0.008% | 19.51
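The weirdness ratio (a/b) in the table is simple to compute: the relative frequency of a token in the specialist corpus divided by its relative frequency in the general language corpus. A minimal sketch follows, using the frequencies and corpus sizes quoted in the table; the function and variable names are illustrative only. With the rounded corpus size of 100,000,000 for the BNC, the results come out close to, but not exactly at, the tabulated values.

```python
def weirdness(f_special, n_special, f_general, n_general):
    """Ratio of relative frequencies: specialist corpus over general corpus."""
    return (f_special / n_special) / (f_general / n_general)

N_RCV1_FEB97 = 14_244_349   # tokens in the RCV1-Feb97 corpus (from the table)
N_BNC = 100_000_000         # tokens in the BNC (rounded, as in the table)

# token -> (frequency in RCV1-Feb97, frequency in BNC), copied from the table
counts = {
    "percent": (65763, 2928),
    "market": (36349, 30078),
    "company": (29058, 40118),
    "bank": (28041, 17932),
    "shares": (23352, 8412),
}

scores = {
    token: weirdness(f_s, N_RCV1_FEB97, f_g, N_BNC)
    for token, (f_s, f_g) in counts.items()
}

# Tokens most characteristic of the specialist corpus come first.
ranked = sorted(scores, key=scores.get, reverse=True)
```

A token with a weirdness well above 1 is used far more often in the news corpus than in general language, which is what makes it a candidate key word.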

65

Automating News Analysis for Extracting Sentiments: An experiment

III. Extract key collocates

Collocate of percent (f = 65763) | f | Left | Right | Total | z-score
up | 5315 | 4360 | 955 | 5315 | 15.91
rose | 4361 | 3988 | 373 | 4361 | 13.04
rise | 2391 | 980 | 1411 | 2391 | 7.12
down | 2291 | 1636 | 655 | 2291 | 6.82
fell | 2074 | 1844 | 230 | 2074 | 6.17
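The slide does not say how the left/right counts and z-scores were computed. The sketch below shows one plausible reading, assumed for illustration: count the words that occur within a fixed window to the left and right of the key word, then standardise each collocate's total frequency against the mean and standard deviation over all collocates (a Smadja-style strength score). The window size, function names and toy sentence are all hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev

def collocate_counts(tokens, node, window=5):
    """Count words within `window` positions to the left and right of `node`."""
    left, right = Counter(), Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        for w in tokens[max(0, i - window):i]:
            left[w] += 1
        for w in tokens[i + 1:i + 1 + window]:
            right[w] += 1
    return left, right

def strength_z_scores(left, right):
    """Standardise each collocate's total frequency over all collocates."""
    total = left + right
    freqs = list(total.values())
    mu, sigma = mean(freqs), pstdev(freqs)
    if sigma == 0:
        return {w: 0.0 for w in total}
    return {w: (f - mu) / sigma for w, f in total.items()}

# Toy text, invented for this example.
tokens = ("shares rose 10 percent to 5 billion while profits fell 3 percent "
          "and revenues rose 20 percent to 2 billion").split()
left, right = collocate_counts(tokens, "percent", window=2)
z = strength_z_scores(left, right)
```

On a real corpus one would keep only collocates whose score exceeds some threshold; the threshold, like the window, is a tuning choice.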

66

Automating News Analysis for Extracting Sentiments: An experiment

IV. Extract local grammar using collocation and relevance feedback

Pattern | f | Collocate | Left | Right | z-score
10 percent to | 108 | rose | 24 | 0 | 5.45
by 10 percent to | 18 | rose | 5 | 0 | 2.27
rose 10 percent to | 14 | billion | 0 | 7 | 4.24
rose 20 percent to | 11 | billion | 1 | 7 | 6.02

67

Automating News Analysis for Extracting Sentiments: An experiment

V. Assert the grammar as a finite state automaton

The (re-) collocation patterns can then be asserted as a finite state automaton for each of the movement verbs and spatial preposition metaphors
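As an illustration of what such an automaton might look like, the sketch below encodes one pattern family seen in the collocation tables ("rose 10 percent", "fell by 3 percent") as an explicit state-transition table. The states, the word list and the polarity assigned to each movement word are assumptions made for this example; the grammar actually extracted from the corpus is not reproduced on the slides.

```python
import re

# Movement words and an assumed polarity for each (+1 positive, -1 negative).
MOVEMENT = {"rose": +1, "up": +1, "rise": +1, "fell": -1, "down": -1}
NUMBER = re.compile(r"^\d+(\.\d+)?$")

def is_movement(tok): return tok in MOVEMENT
def is_by(tok): return tok == "by"
def is_number(tok): return bool(NUMBER.match(tok))
def is_percent(tok): return tok == "percent"

# state -> list of (token test, next state)
TRANSITIONS = {
    "START": [(is_movement, "VERB")],
    "VERB": [(is_by, "BY"), (is_number, "NUM")],
    "BY": [(is_number, "NUM")],
    "NUM": [(is_percent, "ACCEPT")],
}

def match_at(tokens, start):
    """Run the automaton from tokens[start]; return (end, polarity) or None."""
    state, i = "START", start
    while state != "ACCEPT":
        if i >= len(tokens):
            return None
        for test, next_state in TRANSITIONS[state]:
            if test(tokens[i]):
                state = next_state
                break
        else:
            return None  # no transition fires: reject
        i += 1
    return i, MOVEMENT[tokens[start]]

def extract(text):
    """Return (phrase, polarity) pairs for every accepted token sequence."""
    tokens = text.lower().split()
    hits, i = [], 0
    while i < len(tokens):
        m = match_at(tokens, i)
        if m:
            end, polarity = m
            hits.append((" ".join(tokens[i:end]), polarity))
            i = end
        else:
            i += 1
    return hits

hits = extract("Shares rose 10 percent to 5 billion while profits fell by 3 percent")
```

Because a movement word only counts when it is followed by a number and "percent", an isolated "up" or "down" is ignored, which is how a local grammar of this kind keeps false positives low. The sketch splits on whitespace only, so punctuation attached to a token would need to be stripped first.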

70

Experiments and Evaluation of sentiment analysis method

V. Assert the grammar as a finite state automaton: the (re-) collocation patterns can then be asserted as a finite state automaton for each of the movement verbs and spatial preposition metaphors

71

Automating News Analysis for Extracting Sentiments: Some results

Changes in the total number of positive/negative words, together with those that are used in the local grammars (filtered positive/negative words), and the total number of words.

[Figure: number of words (log scale, 0 to 7) against hours from midnight, Nov. 15th, 2004 (0 to 42). Series: Raw Sentiment; Filtered Sentiment; Total number of Tokens.]
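
The distinction between raw and filtered counts can be sketched as follows. This is only an illustration under stated assumptions: the sentiment lexicon is invented, and the "local grammar" is reduced to a single hypothetical rule (a sentiment word counts as filtered when a listed context word directly follows it). The filter used in the experiment is not specified on the slide.

```python
import math
from collections import Counter

# Hypothetical lexicon and context words, for illustration only.
POSITIVE = {"gain", "up", "strong"}
NEGATIVE = {"loss", "down", "weak"}
CONTEXT = {"to", "by", "from"}


def count_sentiment(tokens):
    """Count raw and filtered sentiment words, plus the total number of tokens."""
    counts = Counter(total=len(tokens))
    for index, token in enumerate(tokens):
        word = token.lower()
        if word in POSITIVE:
            polarity = "positive"
        elif word in NEGATIVE:
            polarity = "negative"
        else:
            continue
        counts["raw_" + polarity] += 1
        following = tokens[index + 1].lower() if index + 1 < len(tokens) else ""
        if following in CONTEXT:  # assumed stand-in for a local-grammar match
            counts["filtered_" + polarity] += 1
    return counts


def log_scale(count):
    """Value to plot on a log10 axis; zero counts are plotted at 0."""
    return math.log10(count) if count > 0 else 0.0


tokens = "Stocks up by two points but banks down and weak from debt".split()
counts = count_sentiment(tokens)
```

Running `count_sentiment` once per hourly batch of news and plotting `log_scale` of each count would give curves of the kind described in the caption.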

72

Automating News Analysis for Extracting Sentiments: Some results

Changes in the total number of positive/negative words, together with those that are used in the local grammars (filtered positive/negative words), and the total number of words.

[Figure: number of words (log scale, 1 to 6) against hours from midnight, Nov. 15th, 2004 (0 to 42). Series: Raw Positive Words; Raw Negative Words; Filtered Positive Words; Filtered Negative Words; Total Number of Words.]

73

Automating News Analysis for Extracting Sentiments: Bradford Riots?

BBC News tracked from 9/11/1999 to 5/08/2005 for the keywords Bradford Riots, Burnley Riots, and Oldham Riots

"City"      Number of News Items   Total # of Tokens   Average # of Tokens (±Std. Dev)
Bradford    253                    175191              3368 (±5478)
Burnley     172                    99059               2304 (±3236)
Oldham      261                    151696              3096 (±3041)

74

Automating News Analysis for Extracting Sentiments: Bradford Riots?

[Figure: percentage change 2001-2002 (-42% to 38%) against months (3 to 13). Series: Bradford; Oldham; Burnley.]

BBC News tracked from 9/11/1999 to 5/08/2005 for the keywords Bradford Riots, Burnley Riots, and Oldham Riots. The results for the period July 2001-July 2002

75

Automating News Analysis for Extracting Sentiments: Bradford Riots?

[Figure: percentage occurrence of riots (-10% to 60%) against months (0 to 10). Series: Bradford; Oldham; Burnley.]

Rate of change?

76

Automating News Analysis for Extracting Sentiments: Bradford Riots?

Shared between

All 3 corpora: asian, blair, bradford, racial, rioting, youths, blunkett, burnley, riots, bnp, oldham

2 corpora: asians, griffin, racist, disturbances, riot

Unique to a corpus: immigrant~, malik, manningham, shahid

The ‘common’ agencements: persons, places, institutions and acts
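
A grouping of this kind (shared by all corpora, shared by some, unique to one) can be sketched with set membership counts. The function name and the placeholder keyword lists below are assumptions for illustration; they are not the keyword lists of the Bradford, Burnley and Oldham corpora.

```python
from collections import Counter


def shared_vocabulary(keywords_by_corpus):
    """Group keywords by the number of corpora in which they occur."""
    membership = Counter()
    for words in keywords_by_corpus.values():
        membership.update(set(words))  # count each word once per corpus

    corpus_count = len(keywords_by_corpus)
    groups = {"all": set(), "some": set(), "unique": set()}
    for word, occurrences in membership.items():
        if occurrences == corpus_count:
            groups["all"].add(word)
        elif occurrences > 1:
            groups["some"].add(word)
        else:
            groups["unique"].add(word)
    return groups


# Placeholder keyword lists, not taken from the slides.
toy_keywords = {
    "corpus_a": ["w1", "w2", "w3"],
    "corpus_b": ["w1", "w2", "w4"],
    "corpus_c": ["w1", "w5"],
}
groups = shared_vocabulary(toy_keywords)
```

With three corpora, the "some" group corresponds to the "2 corpora" row of the table.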

77

Grids for Automating News Analysis

We followed the word frequency counting approach of Hughes et al. (2003) to evaluate the performance of our implementation

The corpora used in our experiments are the Brown Corpus and the Reuters RCV1 Corpus

         Files      Size (MB)   Words (M)
Brown    500        5.2         1.0
RCV1     806,791    2576.8      169.9
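
Word frequency counting parallelises naturally: each worker counts its own share of the files and the partial counts are merged at the end. The sketch below shows only that structure. It is not the implementation evaluated here, and it uses a local thread pool as a stand-in for grid nodes, so it will not reproduce grid timings; the function names and sample documents are assumptions.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(text):
    """Count lower-cased, whitespace-separated tokens in one document."""
    return Counter(text.lower().split())


def parallel_word_frequencies(documents, workers=4):
    """Count each document in a worker, then merge the partial counts."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_words, documents):
            total.update(partial)
    return total


documents = ["the cat sat", "the dog sat", "The end"]
frequencies = parallel_word_frequencies(documents, workers=2)
```

Timing `parallel_word_frequencies` for different values of `workers` on a real cluster back end is the kind of measurement a time-against-CPUs plot summarises.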

78

Grids for Automating News Analysis

[Figure: time in seconds (0 to 8000) against number of CPUs (0 to 80).]

79

Afterthought

Though we have devised programs that can learn unambiguous patterns of use of positive or negative sentiment, a sentence is always used in the context of other sentences and the context may change if the inference is made on the basis of one sentence only.

One can argue that a new text is a response to some or all of the existing texts, and in that sense each text is contextualised within a network of other texts - even if all the existing texts unambiguously expressed a positive sentiment, a new text with strong negative sentiment may invalidate all of the positive sentiment.

80

Conclusions and Future Work

Data Sources, by discipline: Financial Economics; Sociology of Crime and Crime Science; Social Anthropology

Quantitative

Common to all: Macro-micro Economic Indicators; Census Statistics; Survey of Social Attitudes; Life-style and Well-being Statistics

Financial Economics: Market Movement

Sociology of Crime; Crime Science: Crime Statistics

Social Anthropology: Ethnicity-related data

Qualitative

Common to all: Political News (Reports, Editorials, Letters to the Editor); Political and Social Opinion Polls; Consumer Confidence Survey

Financial Economics: Investor/Trader Confidence Surveys; Regulatory Body Output; Financial News

Sociology of Crime; Crime Science: Citizen Confidence Surveys; Police Forces/Home Office Reports; Crime Reports

Social Anthropology: Ethnic Minority ; Police Forces/Home Office Reports; Crime Reports