Stock prediction using social network


Page 1: Stock prediction using social network

Stock Prediction Using Social Network Data

Rohit Tiwari (rtiwari2) Chanon Hongsirikulkit (hongsir2)

Page 2: Stock prediction using social network

Outline

- Introduction

- Data Sources

- APIs

- Filter Relevant Data

- Text Normalization

- Noise Removal

- Feature Extraction

- Topic Modeling

- Sentiment Analysis

- Tweet Features

- Prediction Model Construction

- Conclusion

- Future Works

Page 3: Stock prediction using social network

Fake Tweet -> Stocks Plunged

Page 4: Stock prediction using social network

Introduction

- A social network is a communication platform that contains hidden, valuable knowledge

- Information on social networks can reflect real-world events

- Many studies exploit this information to enhance application capabilities

- Analyzing tweets that express information needs (Zhao and Mei 2013)

- Applying tweet-rate to predict movie box-office revenue (Asur and Huberman 2010)

- Our survey focuses on using social network data to predict stock market movement

- A false message on Twitter, “BREAKING: Two Explosions in the White House and Barack Obama is injured.” -> the Dow Jones and S&P 500 indexes dropped by close to 1%, the equivalent of hundreds of billions of dollars changing hands.

- In August 2012, an Italian journalist set up a fake Twitter account for a member of Russia's government and tweeted that the president of Syria had been killed, causing brief fluctuations in the oil markets.

http://www.telegraph.co.uk/finance/markets/10013768/Bogus-AP-tweet-about-explosion-at-the-White-House-wipes-billions-off-US-markets.html

Page 5: Stock prediction using social network

Formal Description: The Efficient Market Hypothesis (EMH)

- The EMH states that market prices fully reflect all available information. Investors take new information into account and act accordingly, so market prices reflect changes in investor behavior.

- Research asserts that investors' rational considerations are influenced by psychological biases and emotions.

- For several decades, direct surveys have been the prominent method for estimating public mood and investor sentiment. However, explicitly stated responses can be distorted or misreported, and surveys cannot take behavior-based indicators into consideration.

J. Bollen and H. Mao, “Twitter Mood as a Stock Market Predictor,” Computer, vol. 44, no. 10, pp. 91-94, 2011.

Page 6: Stock prediction using social network

General Methodology for Stock prediction

Flow diagram of the pipeline (stage labels as they appear on the slide):

- Data Sources -> (via APIs) -> Relevant Dataset

- Data Preprocessing: Text Filter, Text Normalization, Noise Removal

- Feature Extraction: Topic Modeling, Sentiment Analysis, Tweet Features -> Features

- Training Data -> Classifiers -> Results

- Correlation / Prediction Capability Testing

Page 7: Stock prediction using social network

Data Sources

- Twitter (Asur and Huberman 2010; Bollen and Mao 2011; Zhao and Mei 2013; Arias et al. 2015)

- Streaming API -> collect real-time tweets

- Search API -> search and collect historical tweets from the past week

- Yahoo Finance (Nguyen et al. 2015)

- Collect historical stock prices

- Collect posts from Yahoo Finance Board

- Sina Weibo (Liu et al. 2015)

- A microblogging service from China that is similar to Twitter

Page 8: Stock prediction using social network

Filter Relevant Data from Corpus

- Data collected from social networks contains both data relevant to our specific domain and non-relevant data

- We need to keep only the relevant data

- Approaches used in the literature:

- Filter by keywords -> exploit hashtags or cashtags in the messages

- Apply LDA for topic modeling and then keep only the related topics (Arias et al. 2015)

M. Arias, A. Arratia, and R. Xuriguera, “Forecasting with Twitter Data,” ACM Transactions on Intelligent Systems and Technology, vol. 5, no. 1, pp. 1-24, 2015.
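The keyword filter described on this slide can be sketched as follows. This is a minimal illustration and not the method of any cited paper: the regular expressions, the function name `is_relevant`, and the example ticker and tag are all assumptions made for the example.

```python
import re

# Assumed patterns: a cashtag is "$" followed by 1-5 letters,
# a hashtag is "#" followed by word characters.
CASHTAG = re.compile(r"\$([A-Za-z]{1,5})\b")
HASHTAG = re.compile(r"#(\w+)")


def is_relevant(text, tickers, tags):
    """Return True if the message mentions a watched cashtag or hashtag."""
    cashtags = {m.upper() for m in CASHTAG.findall(text)}
    hashtags = {m.lower() for m in HASHTAG.findall(text)}
    return bool(cashtags & tickers) or bool(hashtags & tags)


tweets = [
    "$AAPL beats earnings expectations",
    "Paid $5 for coffee this morning",
    "Big day for #Apple investors",
]
# Hypothetical watch lists: upper-case tickers, lower-case hashtags.
relevant = [t for t in tweets if is_relevant(t, {"AAPL"}, {"apple"})]
```

A hashtag match such as "#Apple" can still pull in messages about fruit rather than the company, which is one reason the slides also list topic-model filtering.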

Page 9: Stock prediction using social network

Text Normalization

The primary step in refining the data. It can involve the following tasks:

- Stop word removal

- Punctuation removal

- Lowercase conversion

- Compressing elongated words

- Transform “Haaappyyyy” to “Happy”. This is done over multiple iterations, with the result validated by a dictionary lookup at the end.
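The normalization tasks above can be sketched in a few lines. This is only an illustration under stated assumptions: `STOP_WORDS` and `DICTIONARY` are tiny hypothetical word lists, and the compression step enumerates candidate spellings rather than reproducing the exact iteration scheme of any cited paper.

```python
import string
from itertools import groupby, product

# Hypothetical, deliberately tiny word lists for the example.
STOP_WORDS = {"i", "am", "the", "a", "is", "to"}
DICTIONARY = {"so", "happy", "today", "good", "stock"}


def compress(word, dictionary):
    """Shrink runs of repeated letters until a dictionary word is found."""
    runs = [(ch, len(list(group))) for ch, group in groupby(word)]
    # Each repeated run may stand for a double or a single letter.
    options = [(ch,) if n == 1 else (ch * 2, ch) for ch, n in runs]
    for combo in product(*options):
        candidate = "".join(combo)
        if candidate in dictionary:
            return candidate
    # No dictionary match: fall back to capping every run at two letters.
    return "".join(ch * min(n, 2) for ch, n in runs)


def normalize(text):
    """Lowercase, strip punctuation, drop stop words, compress elongations."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return [compress(t, DICTIONARY) for t in tokens]


tokens = normalize("I am sooo Haaappyyyy today!!!")
```

Stripping all punctuation also removes the "$" and "#" of cashtags and hashtags, so in this sketch any keyword filtering would have to run before normalization.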

Page 10: Stock prediction using social network

Noise Removal in tweets

- Standard noise-removal tools use inverse document frequency (IDF) weighting to identify and remove terms that are too frequent to be informative.

- Named entity recognition (NER) system: built on a conditional random fields (CRF) model to determine whether a tweet contains named entities related to a company (or another feature). If the tweet does not contain any named entity from the company's keyword list, it is removed.
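The IDF idea can be illustrated with a short sketch (the CRF-based NER step is not shown). The formula idf(t) = log(N / df(t)) is the standard definition; the function names and the `min_idf` threshold are assumptions chosen for the example.

```python
import math
from collections import Counter


def idf_scores(docs):
    """Map each term to log(N / df), where df counts documents containing it."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def drop_frequent_terms(docs, min_idf):
    """Remove terms whose IDF falls below min_idf (i.e. very common terms)."""
    idf = idf_scores(docs)
    return [[t for t in doc if idf[t] >= min_idf] for doc in docs]


docs = [
    ["rt", "apple", "up"],
    ["rt", "apple", "down"],
    ["rt", "google", "up"],
]
# "rt" appears in every document, so its IDF is log(3/3) = 0 and it is dropped.
cleaned = drop_frequent_terms(docs, min_idf=0.1)
```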

Page 11: Stock prediction using social network

Feature Extraction

[Figure: cluttered information -> refined form]

Page 12: Stock prediction using social network

Topic Modelling

- Some studies use the topics of messages as features for the forecasting model

- Many approaches have been proposed for topic extraction

- Extract n-grams (unigrams or bigrams)

- Latent Dirichlet Allocation (LDA)

- Joint Sentiment-Topic (JST) -> extracts sentiment information and topics from text data simultaneously

- Aspect-based sentiment -> extracts topics first and then calculates sentiment scores, taking into account the distance between topics and emotion words and the importance of each topic (Nguyen et al. 2015)
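Of the approaches listed on this slide, n-gram extraction is the simplest to show. The sketch below counts unigrams or bigrams over tokenized messages; the function names and sample data are assumptions, and LDA, JST, and aspect-based sentiment are not covered by it.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token tuples in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams(docs, n, k):
    """Return the k most frequent n-grams across all documents."""
    counts = Counter(gram for doc in docs for gram in ngrams(doc, n))
    return counts.most_common(k)


docs = [
    ["apple", "stock", "rises"],
    ["apple", "stock", "falls"],
]
bigrams = ngrams(docs[0], 2)
most_common = top_ngrams(docs, n=2, k=1)
```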

Page 13: Stock prediction using social network

Aspect-based sentiment algorithm

- Extracts topics first and then calculates sentiment scores, taking into account the distance between topics and emotion words and the importance of each topic (Nguyen et al. 2015)

[Figure: Algorithm for extracting topics from dataset]

[Figure: Algorithm for extracting topics and their sentiment values]

T. H. Nguyen, K. Shirai, and J. Velcin, “Sentiment analysis on social media for stock movement prediction,” Expert Systems with Applications, vol. 42, no. 24, pp. 9603-9611, 2015.

Page 14: Stock prediction using social network

Sentiment Analysis

- Some studies use sentiment information from social networks as features for their models

- There are two ways to extract sentiment scores

- Use existing software to calculate sentiment scores

- Construct a classifier for sentiment classification

- Popular tools

- GPOMS -> categorizes people's emotions into 6 categories: calm, alert, sure, vital, kind, and happy

- OpinionFinder (OF) -> classifies sentiment as positive or negative

Page 15: Stock prediction using social network

Constructing Sentiment Classifier

- Have experts annotate sentiment data and use it as training data

- Extract features from the training data -> n-grams, POS tags

- Use a classifier (SVM, Linear Regression Model) to learn from the training data

- Apply the classifier to the entire collection
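The annotate, train, and apply loop can be sketched end to end. To keep the example dependency-free, a small multinomial naive Bayes classifier over unigram features is swapped in here; it is not the SVM or regression model named on the slide, and the function names and the four hand-labelled examples are purely illustrative.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_docs):
    """labeled_docs is a list of (tokens, label) pairs from annotators."""
    class_counts = Counter()
    term_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in labeled_docs:
        class_counts[label] += 1
        term_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, term_counts, vocab


def classify(tokens, model):
    """Return the label with the highest log-probability (add-one smoothing)."""
    class_counts, term_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)
        denom = sum(term_counts[label].values()) + len(vocab)
        for token in tokens:
            if token in vocab:  # ignore words never seen in training
                score += math.log((term_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


training = [
    (["great", "earnings"], "positive"),
    (["strong", "growth"], "positive"),
    (["weak", "sales"], "negative"),
    (["bad", "earnings"], "negative"),
]
model = train_naive_bayes(training)
# Apply the trained classifier to unlabelled messages.
predictions = [classify(t, model) for t in (["great", "growth"], ["weak", "bad"])]
```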

Page 16: Stock prediction using social network

Extracting Sentiment Features

After classifying the sentiment data, we can generate sentiment features in various ways

Examples of sentiment features used in the literature:

- Average daily sentiment score

- Sentiment index = Number of positive tweets / Total number of tweets

- PNRatio = Number of positive tweets / Number of negative tweets

- Sentiment polarity = (ptw - ntw) / (ptw + ntw)

- ptw : number of positive tweets

- ntw : number of negative tweets
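The three ratio features above translate directly into code. In this sketch the label strings ("positive", "negative", "neutral") and the choice to return `None` when a denominator is zero are assumptions; the formulas themselves are the ones on the slide.

```python
def sentiment_features(labels):
    """Compute ratio features from one day's per-tweet sentiment labels."""
    ptw = sum(1 for label in labels if label == "positive")
    ntw = sum(1 for label in labels if label == "negative")
    total = len(labels)
    return {
        # Sentiment index = positive tweets / all tweets
        "sentiment_index": ptw / total if total else None,
        # PNRatio = positive tweets / negative tweets
        "pn_ratio": ptw / ntw if ntw else None,
        # Sentiment polarity = (ptw - ntw) / (ptw + ntw)
        "polarity": (ptw - ntw) / (ptw + ntw) if (ptw + ntw) else None,
    }


day = ["positive", "positive", "negative", "positive", "neutral"]
features = sentiment_features(day)
```

For this sample day, ptw = 3 and ntw = 1 out of 5 tweets, giving a sentiment index of 3/5, a PNRatio of 3, and a polarity of (3 - 1) / (3 + 1) = 0.5.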

Page 17: Stock prediction using social network

Sentiment Features Testing

- Goal: ensure that sentiment information reflects real-world events and can be used for prediction

- Approaches used in the literature (Bollen and Mao 2011)

- Causality testing: tests the correlation between sentiment information and stock market indicators (DJIA / VIX)

- Self-organizing fuzzy neural network (SOFNN): tests the prediction capability of sentiment information

J. Bollen, and H. Mao, “Twitter Mood as a Stock Market Predictor,” Computer, vol. 44, no. 10, pp. 91-94, 2011.
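A first, rough check of whether sentiment leads the market is a lagged correlation: pair the sentiment on day t with the market return on day t + lag. The sketch below is a simplified stand-in and not a formal causality test or the procedure used in the cited paper; the function names and sample series are assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (non-zero variance)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def lagged_correlation(sentiment, returns, lag):
    """Correlate sentiment on day t with returns on day t + lag."""
    if len(sentiment) != len(returns):
        raise ValueError("series must have the same length")
    if lag < 0 or lag >= len(sentiment) - 1:
        raise ValueError("lag must leave at least two paired days")
    xs = sentiment[:len(sentiment) - lag]
    ys = returns[lag:]
    return pearson(xs, ys)


# Toy data in which returns simply repeat the previous day's sentiment.
sentiment = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
returns = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
r_lag1 = lagged_correlation(sentiment, returns, lag=1)
```

A high lagged correlation is only suggestive; a proper causality test also controls for the market's own past values.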

Page 18: Stock prediction using social network

Extracting Tweet Features

Useful quantifiable information that can be extracted from the corpus:

- Number of followers of the company, or of a famous personality tweeting about the company (a typical aggregation problem for the MapReduce framework)

- Tweet volume (related to a specific identity or hashtag)

- Retweet volume (related to a specific hashtag coupled with an identity)

- Tweet-rate = Number of tweets / Duration over which those tweets were generated

- Tweet length
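Several of these features can be computed in one pass over the collected tweets. The sketch below assumes each tweet is a dictionary with hypothetical `text` and `is_retweet` fields and measures duration in hours; neither the field names nor the unit come from the slides.

```python
from datetime import datetime


def tweet_features(tweets, start, end):
    """Volume, retweet volume, tweet-rate and mean length for one period."""
    hours = (end - start).total_seconds() / 3600
    if hours <= 0:
        raise ValueError("end must be later than start")
    volume = len(tweets)
    return {
        "tweet_volume": volume,
        "retweet_volume": sum(1 for t in tweets if t["is_retweet"]),
        # Tweet-rate = number of tweets / duration of the collection period
        "tweet_rate_per_hour": volume / hours,
        "mean_length": sum(len(t["text"]) for t in tweets) / volume if volume else 0.0,
    }


tweets = [
    {"text": "abcd", "is_retweet": False},
    {"text": "ab", "is_retweet": True},
    {"text": "abc", "is_retweet": False},
    {"text": "abc", "is_retweet": False},
]
features = tweet_features(
    tweets, start=datetime(2015, 1, 1, 9), end=datetime(2015, 1, 1, 11)
)
```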

Page 19: Stock prediction using social network

Prediction Model Construction

1. Combine features from the previous steps

- Topic features

- Sentiment features

- Tweet features

- Stock historical price features (additional features)
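Combining the feature groups amounts to aligning them on a shared key and concatenating them into one vector per trading day. In this sketch each feature group is assumed to be a dictionary from a date string to a list of numbers; that layout, the function name, and the sample values are illustrative only.

```python
def combine_features(*feature_tables):
    """Concatenate per-date feature lists, keeping dates present in every table."""
    if not feature_tables:
        return {}
    common_dates = set(feature_tables[0])
    for table in feature_tables[1:]:
        common_dates &= set(table)
    rows = {}
    for date in sorted(common_dates):
        row = []
        for table in feature_tables:
            row.extend(table[date])
        rows[date] = row
    return rows


# Hypothetical per-day features from the earlier steps.
topic = {"2015-01-02": [0.7, 0.3], "2015-01-05": [0.4, 0.6]}
sentiment = {"2015-01-02": [0.5], "2015-01-05": [-0.2], "2015-01-06": [0.1]}
price = {"2015-01-02": [101.0], "2015-01-05": [99.5]}
rows = combine_features(topic, sentiment, price)
```

Dates missing from any one group (here 2015-01-06) are dropped, so every resulting row has the same length.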

Page 20: Stock prediction using social network

Google Heat Map: gives a fair idea of how any kind of information is concentrated geographically, e.g., Facebook trends

Page 21: Stock prediction using social network

Iterative Training & Validation

2. Train the classifier -> SVM, Linear Regression, Neural Networks

3. Test and evaluate the model

- The most popular method is a windowing mechanism, in which the model groups the tweets that fall in a window (w1) spanning several days and analyzes their sentiment or other features.

- Then, in the subsequent window (w2) of 1-2 days, the stock indices are measured.

- Finally, w1 and w2 are analyzed together to find interesting patterns.
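The windowing mechanism can be sketched as a function that slides over daily data and emits one training example per position: a feature computed from window w1 and an up/down label measured over the following window w2. The mean-sentiment feature, the binary label, and the default window sizes are assumptions for illustration.

```python
def windowed_examples(daily_sentiment, daily_close, w1=3, w2=1):
    """Pair mean sentiment over w1 days with price direction over the next w2 days."""
    if len(daily_sentiment) != len(daily_close):
        raise ValueError("series must have the same length")
    examples = []
    last_start = len(daily_sentiment) - w1 - w2
    for start in range(last_start + 1):
        w1_end = start + w1  # exclusive end index of window w1
        feature = sum(daily_sentiment[start:w1_end]) / w1
        price_before = daily_close[w1_end - 1]     # last day of w1
        price_after = daily_close[w1_end + w2 - 1]  # last day of w2
        label = 1 if price_after > price_before else 0
        examples.append((feature, label))
    return examples


sentiment = [0.1, 0.2, 0.3, 0.4, 0.5]
close = [10, 11, 12, 11, 13]
examples = windowed_examples(sentiment, close, w1=3, w2=1)
```

With five days of data, w1 = 3 and w2 = 1, two examples are produced: the first window is followed by a price drop (12 to 11) and the second by a rise (11 to 13).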

Page 22: Stock prediction using social network

Correlation of sentiments & indices

This involves formally testing for a causal correlation between social network sentiment and stock market indices such as the Dow Jones, NASDAQ, NYSE, and VIX

M. Arias, A. Arratia, and R. Xuriguera, “Forecasting with Twitter Data,” ACM Transactions on Intelligent Systems and Technology, vol. 5, no. 1, pp. 1-24, 2015.

T. H. Nguyen, K. Shirai, and J. Velcin, “Sentiment analysis on social media for stock movement prediction,” Expert Systems with Applications, vol. 42, no. 24, pp. 9603-9611, 2015.

Page 23: Stock prediction using social network

Conclusion

- Information on social networks reflects real-world events

- Social network data can be used to predict stock market movement to a certain degree

- The knowledge extracted from social media can be applied to different applications

- Individual stock price prediction

- Predicting box-office revenue of a movie

- Presidential/Senate election prediction based on campaigning data.

Page 24: Stock prediction using social network

Future Works

- Work with datasets covering a longer period -> some current works use only 15 transaction dates

- Combine information from different data sources, which might improve prediction accuracy -> Twitter is known to contain a lot of noisy data

- Come up with new features, such as the credibility of tweets -> most current research focuses on topics and sentiment without considering the reliability of the data

Page 25: Stock prediction using social network

References

[1] M. Arias, A. Arratia, and R. Xuriguera, “Forecasting with Twitter Data,” ACM Transactions on Intelligent Systems and Technology, vol. 5, no. 1, pp. 1-24, 2015.

[2] L. Liu, J. Wu, P. Li, and Q. Li, “A social-media-based approach to predicting stock comovement,” Expert Systems with Applications, vol. 42, no. 8, pp. 3893-3901, 2015.

[3] T. H. Nguyen, K. Shirai, and J. Velcin, “Sentiment analysis on social media for stock movement prediction,” Expert Systems with Applications, vol. 42, no. 24, pp. 9603-9611, 2015.

[4] S. Asur and B. A. Huberman, “Predicting the Future with Social Media,” 2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, pp. 492-499, 2010.

[5] Z. Zhao and Q. Mei, “Questions about questions: an empirical analysis of information needs on Twitter,” Proceedings of the 22nd International Conference on World Wide Web, Rio de Janeiro, Brazil, May 13-17, 2013.

[6] J. Bollen and H. Mao, “Twitter Mood as a Stock Market Predictor,” Computer, vol. 44, no. 10, pp. 91-94, 2011.

[7] J. Si, A. Mukherjee, B. Liu, Q. Li, H. Li, and X. Deng, “Exploiting Topic based Twitter Sentiment for Stock Prediction,” Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pp. 24-29, 2013.

[8] X. Zhang, H. Fuehres, and P. A. Gloor, “Predicting Stock Market Indicators Through Twitter ‘I hope it is not as bad as I fear’,” The 2nd Collaborative Innovation Networks Conference - COINs2010, vol. 26, pp. 55-62, 2011.

[9] G. Ranco, D. Aleksovski, G. Caldarelli, M. Grcar, and I. Mozetic, “The Effects of Twitter Sentiment on Stock Price Returns,” PLoS ONE, vol. 10, no. 9, pp. 1-21, 2015.

[10] T. T. Vu, S. Chang, Q. T. Ha, and N. Collier, “An Experiment in Integrating Sentiment Features for Tech Stock Prediction in Twitter,” Workshop on Information Extraction and Entity Analytics on Social Media Data, pp. 23-38, 2012.

Page 26: Stock prediction using social network

Thank You :)