20121108 sntmnt data_sciencenl
![Page 1: 20121108 sntmnt data_sciencenl](https://reader036.fdocuments.net/reader036/viewer/2022070304/54c6f8754a795937038b45ee/html5/thumbnails/1.jpg)
The prevailing attitude of investors as to anticipated price development in a market.
< sen·ti·ment >
Tim Harbers, CTO SNTMNT
DataScienceNL Meetup, November 8th 2012
![Page 2: 20121108 sntmnt data_sciencenl](https://reader036.fdocuments.net/reader036/viewer/2022070304/54c6f8754a795937038b45ee/html5/thumbnails/2.jpg)
Tim Harbers
Background
BSc Computer Science
MSc Computer Science
Researcher
Data Miner
Technical Consultant
Co-Founder and COO
Co-Founder and CTO
![Page 3: 20121108 sntmnt data_sciencenl](https://reader036.fdocuments.net/reader036/viewer/2022070304/54c6f8754a795937038b45ee/html5/thumbnails/3.jpg)
Vincent van Leeuwen, Customer Development
Kees van Nunen, Product Development
Durk Kingma, Data Mining Expert
Tim Harbers, Machine Learning Expert
The Rockstars
‣ Balanced multidisciplinary team
‣ Two machine learning experts in predictive analysis and large datasets
‣ Academic degrees in Behavioral Finance, Portfolio Finance, Strategic Management & Artificial Intelligence
‣ Strong network in (Dutch) financial industry
‣ Young, enthusiastic team with a proven entrepreneurial mindset
![Page 4: 20121108 sntmnt data_sciencenl](https://reader036.fdocuments.net/reader036/viewer/2022070304/54c6f8754a795937038b45ee/html5/thumbnails/4.jpg)
How to select the right stock to invest in?
![Page 5: 20121108 sntmnt data_sciencenl](https://reader036.fdocuments.net/reader036/viewer/2022070304/54c6f8754a795937038b45ee/html5/thumbnails/5.jpg)
Our solution:
Predicting stock price movement based on online buzz
Engineered based on academic research:
Bollen et al. (2010)
Sprenger and Welpe (2010)
Van Leeuwen (2011)
Sehgal and Song (2007)
![Page 6: 20121108 sntmnt data_sciencenl](https://reader036.fdocuments.net/reader036/viewer/2022070304/54c6f8754a795937038b45ee/html5/thumbnails/6.jpg)
Why would this work?
‣ Very different from traditional indicators
‣ News travels faster via social media than via traditional media
‣ Tremendous amount of data
‣ (Almost) nobody uses it yet
![Page 7: 20121108 sntmnt data_sciencenl](https://reader036.fdocuments.net/reader036/viewer/2022070304/54c6f8754a795937038b45ee/html5/thumbnails/7.jpg)
Why focus on Twitter?
‣ Public data & easily accessible
‣ Structured language
‣ 400M tweets per day
![Page 8: 20121108 sntmnt data_sciencenl](https://reader036.fdocuments.net/reader036/viewer/2022070304/54c6f8754a795937038b45ee/html5/thumbnails/8.jpg)
Previous research
‣ Bollen et al. (2010): created a model based on Twitter mood states that was 86% accurate in predicting the direction of the Dow Jones Industrial Average (DJIA).
‣ Sprenger and Welpe (2011): analyzed the correlation between the stock market and microblogs.
![Page 9: 20121108 sntmnt data_sciencenl](https://reader036.fdocuments.net/reader036/viewer/2022070304/54c6f8754a795937038b45ee/html5/thumbnails/9.jpg)
Financial Sentiment vs Brand Sentiment

| Financial Sentiment | Brand Sentiment |
|---|---|
| Tweets relating to stocks | Tweets relating to brands |
| Written by traders | Written by consumers |
| Trader mumbo jumbo | Any language |
| More relevant | Larger dataset |
| Shorter term | Longer term |
![Page 10: 20121108 sntmnt data_sciencenl](https://reader036.fdocuments.net/reader036/viewer/2022070304/54c6f8754a795937038b45ee/html5/thumbnails/10.jpg)
Data setup
‣ Period: June 2010 to April 2012
‣ Stocks: Top 15 most tweeted stocks in the S&P 500
‣ Tweets: Financial dataset by Timm Sprenger (4 million tweets); Topsy brand tweets (100+ million tweets)
‣ Other: Klout, PeerIndex
![Page 11: 20121108 sntmnt data_sciencenl](https://reader036.fdocuments.net/reader036/viewer/2022070304/54c6f8754a795937038b45ee/html5/thumbnails/11.jpg)
![Page 12: 20121108 sntmnt data_sciencenl](https://reader036.fdocuments.net/reader036/viewer/2022070304/54c6f8754a795937038b45ee/html5/thumbnails/12.jpg)
Sentiment Scoring
![Page 13: 20121108 sntmnt data_sciencenl](https://reader036.fdocuments.net/reader036/viewer/2022070304/54c6f8754a795937038b45ee/html5/thumbnails/13.jpg)
Financial tweets
![Page 14: 20121108 sntmnt data_sciencenl](https://reader036.fdocuments.net/reader036/viewer/2022070304/54c6f8754a795937038b45ee/html5/thumbnails/14.jpg)
Commercial tweets
![Page 15: 20121108 sntmnt data_sciencenl](https://reader036.fdocuments.net/reader036/viewer/2022070304/54c6f8754a795937038b45ee/html5/thumbnails/15.jpg)
Sentiment analysis:
Enabling computers to derive sentiment from natural language
![Page 16: 20121108 sntmnt data_sciencenl](https://reader036.fdocuments.net/reader036/viewer/2022070304/54c6f8754a795937038b45ee/html5/thumbnails/16.jpg)
Naive approach: dictionaries
‣ Use a dictionary of common positive and negative terms
‣ Count the number of positive and negative terms
‣ Use the difference between the two
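A minimal Python sketch of the dictionary approach, assuming tiny illustrative word lists. `POSITIVE`, `NEGATIVE` and `dictionary_score` are hypothetical names, not SNTMNT's actual lexicon or code.

```python
import re

# Hypothetical mini-lexicons for illustration only; a real system would use a
# much larger, curated word list.
POSITIVE = {"buy", "bullish", "up", "gain", "long", "great"}
NEGATIVE = {"sell", "bearish", "down", "loss", "short", "bad"}


def dictionary_score(text: str) -> int:
    """Return (#positive terms - #negative terms) found in a tweet."""
    tokens = re.findall(r"[a-z$#']+", text.lower())
    pos = sum(1 for t in tokens if t in POSITIVE)
    neg = sum(1 for t in tokens if t in NEGATIVE)
    return pos - neg


print(dictionary_score("Bullish on $AAPL, going long"))  # 2
print(dictionary_score("Not bullish, time to sell"))     # 0
```

The second tweet is clearly negative yet scores 0, because "Not bullish" still counts as a positive hit. This is one reason plain word counting falls short.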
![Page 17: 20121108 sntmnt data_sciencenl](https://reader036.fdocuments.net/reader036/viewer/2022070304/54c6f8754a795937038b45ee/html5/thumbnails/17.jpg)
SNTMNT’s approach: machine learning
‣ Label a training set of tweets (target)
‣ Use preprocessing techniques
‣ Use several feature extractors
‣ Create a sparse dataset
‣ Use supervised learning to train a machine learning model
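The steps above can be sketched end to end in plain Python. Everything here is an assumption for illustration: the preprocessing rules, the two feature extractors (unigrams and bigrams), the toy training set, and the choice of multinomial Naive Bayes as the supervised learner. The slides do not say which model SNTMNT used to classify tweets.

```python
import math
import re
from collections import Counter, defaultdict


def preprocess(text):
    """Lower-case, replace URLs and @mentions with placeholders, tokenise."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " URL ", text)
    text = re.sub(r"@\w+", " USER ", text)
    return re.findall(r"URL|USER|[a-z$#']+", text)


def unigrams(tokens):
    return list(tokens)


def bigrams(tokens):
    return [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]


FEATURE_EXTRACTORS = [unigrams, bigrams]


def featurize(text):
    """Sparse feature vector: a Counter that stores only non-zero counts."""
    tokens = preprocess(text)
    features = Counter()
    for extract in FEATURE_EXTRACTORS:
        features.update(extract(tokens))
    return features


class MultinomialNB:
    """Multinomial Naive Bayes with Laplace smoothing over sparse Counters."""

    def fit(self, X, y, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter(y)
        self.feature_counts = defaultdict(Counter)
        for features, label in zip(X, y):
            self.feature_counts[label].update(features)
        self.vocab = {f for counts in self.feature_counts.values() for f in counts}
        self.totals = {c: sum(cnt.values()) for c, cnt in self.feature_counts.items()}
        return self

    def predict(self, features):
        n_docs = sum(self.class_counts.values())
        smoothing = self.alpha * len(self.vocab)
        best_label, best_logp = None, -math.inf
        for label, count in self.class_counts.items():
            logp = math.log(count / n_docs)  # class prior
            for f, k in features.items():
                if f in self.vocab:  # ignore features never seen in training
                    p = (self.feature_counts[label][f] + self.alpha) / (
                        self.totals[label] + smoothing)
                    logp += k * math.log(p)
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


# Hypothetical hand-labelled training tweets.
train = [
    ("$AAPL looking bullish, buying more", "pos"),
    ("Great earnings, $GOOG going up", "pos"),
    ("Selling my $MSFT, very bearish", "neg"),
    ("$NFLX down again, terrible quarter", "neg"),
]
X = [featurize(text) for text, _ in train]
y = [label for _, label in train]
model = MultinomialNB().fit(X, y)
print(model.predict(featurize("bullish on $AAPL")))  # pos
```

Storing each tweet as a `Counter` keeps the dataset sparse: a tweet only carries the handful of features it actually contains, not a slot for every word in the vocabulary.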
![Page 18: 20121108 sntmnt data_sciencenl](https://reader036.fdocuments.net/reader036/viewer/2022070304/54c6f8754a795937038b45ee/html5/thumbnails/18.jpg)
Labeling
• 25K financial tweets hand-labeled
• 30K commercial tweets hand-labeled
• 1M #happy vs. #sad tweets
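The slide does not say how the 1M #happy vs. #sad tweets were used. A common option is distant supervision, where the hashtag serves as a noisy label and is then removed from the text. The sketch below assumes that approach; `hashtag_label` is a hypothetical helper.

```python
import re


def hashtag_label(tweet):
    """Return (text_without_label_hashtag, label), or None if unusable.

    Assumes distant supervision: #happy -> "pos", #sad -> "neg".
    """
    tags = {t.lower() for t in re.findall(r"#\w+", tweet)}
    has_happy, has_sad = "#happy" in tags, "#sad" in tags
    if has_happy == has_sad:
        return None  # neither or both hashtags: skip the tweet
    label = "pos" if has_happy else "neg"
    # Strip the label hashtags so a classifier cannot simply memorise them.
    text = re.sub(r"#(happy|sad)\b", "", tweet, flags=re.IGNORECASE)
    return " ".join(text.split()), label


print(hashtag_label("Finally Friday #happy"))  # ('Finally Friday', 'pos')
```

Removing the hashtag matters: if it stayed in the text, the model would learn to detect "#happy" rather than the wording that expresses happiness.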
![Page 19: 20121108 sntmnt data_sciencenl](https://reader036.fdocuments.net/reader036/viewer/2022070304/54c6f8754a795937038b45ee/html5/thumbnails/19.jpg)
Difficulties in sentiment analysis
‣ Authors / URLs
‣ Foreign languages
‣ Slang: aykm, lol, tgsttttptct
‣ Negation
‣ Target sentiment analysis
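A widely used trick for negation is to prefix every token between a negation word and the next punctuation mark with `NOT_`, so that "not bullish" yields a feature distinct from "bullish". The sketch below assumes this technique; the slides do not say how SNTMNT handled negation, and `NEGATORS` is an incomplete, hypothetical list.

```python
import re

# Hypothetical, incomplete list of negation words.
NEGATORS = {"not", "no", "never", "don't", "isn't", "can't", "won't"}
# Punctuation that ends the scope of a negation.
CLAUSE_END = set(".,!?;:")


def mark_negation(text):
    """Tokenise and prefix tokens inside a negation scope with NOT_."""
    tokens = re.findall(r"[\w$#']+|[.,!?;:]", text.lower())
    out, negated = [], False
    for tok in tokens:
        if tok in CLAUSE_END:
            negated = False  # punctuation closes the negation scope
            out.append(tok)
        elif tok in NEGATORS:
            negated = True
            out.append(tok)
        else:
            out.append("NOT_" + tok if negated else tok)
    return out


print(mark_negation("Not bullish on $AAPL, buying anyway"))
```

This prints `['not', 'NOT_bullish', 'NOT_on', 'NOT_$aapl', ',', 'buying', 'anyway']`. The tokens after the comma keep their normal form.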
![Page 20: 20121108 sntmnt data_sciencenl](https://reader036.fdocuments.net/reader036/viewer/2022070304/54c6f8754a795937038b45ee/html5/thumbnails/20.jpg)
Results
Financial tweets
‣ 84.3% accurate on a 2-point scale (baseline: 60.4%)
‣ 76.8% accurate on a 3-point scale (baseline: 65.0%)
‣ Beat Lexalytics (84.3% vs. 70.3%)
Commercial tweets
‣ 84.7% accurate on a 2-point scale (baseline: 61.0%)
‣ 86.9% accurate on a 3-point scale (baseline: 81.1%)
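The slides do not define the baseline. One plausible reading is a majority-class baseline: the accuracy obtained by always predicting the most frequent label. That reading would explain the high 81.1% baseline for 3-point commercial tweets if most of them are neutral. The sketch assumes this definition and uses toy labels, not SNTMNT's data.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(y_true):
    """Accuracy of always predicting the most frequent class (assumed baseline)."""
    most_common_count = Counter(y_true).most_common(1)[0][1]
    return most_common_count / len(y_true)


# Toy labels for illustration only.
y_true = ["pos", "pos", "pos", "neg", "neg"]
y_pred = ["pos", "pos", "neg", "neg", "neg"]
print(majority_baseline(y_true), accuracy(y_true, y_pred))  # 0.6 0.8
```

Under this reading, the gap between a model's accuracy and its baseline is what matters. This is why 86.9% against 81.1% is a smaller gain than 84.3% against 60.4%.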
![Page 21: 20121108 sntmnt data_sciencenl](https://reader036.fdocuments.net/reader036/viewer/2022070304/54c6f8754a795937038b45ee/html5/thumbnails/21.jpg)
Stock Regression
![Page 22: 20121108 sntmnt data_sciencenl](https://reader036.fdocuments.net/reader036/viewer/2022070304/54c6f8754a795937038b45ee/html5/thumbnails/22.jpg)
Stock Regression
Input:
‣ Sentiment scores
‣ Mood states
‣ Metadata
‣ Stock
Output:
‣ Trading indication
‣ Confidence
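The slide names the inputs and outputs but not the model that connects them. The sketch below assumes a logistic model that turns aggregated features into a probability that the price goes up. That probability becomes a buy/sell/hold indication plus a confidence. `TradingIndication`, `indicate`, `WEIGHTS`, the feature names and the 0.55 threshold are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class TradingIndication:
    stock: str
    direction: str     # "buy", "sell" or "hold"
    confidence: float  # probability assigned to the more likely direction


# Hypothetical weights for a few sentiment, mood and metadata features.
WEIGHTS = {"avg_sentiment": 2.0, "mood_calm": 0.5, "tweet_volume_z": -0.1}


def indicate(stock, features, weights, bias=0.0, threshold=0.55):
    """Logistic model: P(price up) from features; hold when not confident."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    p_up = 1.0 / (1.0 + math.exp(-z))
    confidence = max(p_up, 1.0 - p_up)
    if confidence < threshold:
        return TradingIndication(stock, "hold", confidence)
    return TradingIndication(stock, "buy" if p_up > 0.5 else "sell", confidence)


print(indicate("AAPL", {"avg_sentiment": 0.4, "mood_calm": 0.2, "tweet_volume_z": 1.0}, WEIGHTS))
```

Returning a confidence alongside the direction lets a downstream trading model decide how much weight to give the indication.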
![Page 23: 20121108 sntmnt data_sciencenl](https://reader036.fdocuments.net/reader036/viewer/2022070304/54c6f8754a795937038b45ee/html5/thumbnails/23.jpg)
Many dimensions
‣ Tweet period
‣ Trading period
‣ Financial tweets or commercial tweets
‣ Tweet Crunchers
‣ Models
‣ Trading strategy
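Every dimension multiplies the number of experiments. The sketch below enumerates a full grid with `itertools.product`. The option values are hypothetical, and reading "Tweet Crunchers" as aggregation functions is an assumption.

```python
from itertools import product

# Hypothetical option values for each experimental dimension on the slide.
DIMENSIONS = {
    "tweet_period": ["1h", "1d", "3d"],
    "trading_period": ["1d", "5d"],
    "dataset": ["financial", "commercial"],
    "tweet_cruncher": ["avg_sentiment", "volume_positive", "sentiment_growth"],
    "model": ["linear_regression", "decision_tree", "svm"],
    "strategy": ["long_only", "long_short"],
}


def experiment_grid(dimensions):
    """Yield one configuration dict per combination of option values."""
    names = list(dimensions)
    for values in product(*(dimensions[n] for n in names)):
        yield dict(zip(names, values))


configs = list(experiment_grid(DIMENSIONS))
print(len(configs))  # 3 * 2 * 2 * 3 * 3 * 2 = 216
```

Even these few options produce 216 backtests. Testing many configurations also raises the risk of finding a spurious winner by chance.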
![Page 24: 20121108 sntmnt data_sciencenl](https://reader036.fdocuments.net/reader036/viewer/2022070304/54c6f8754a795937038b45ee/html5/thumbnails/24.jpg)
Tweet Aggregation Problem
‣ Tweet volume
‣ Volume of positive tweets
‣ Average sentiment
‣ Sentiment growth
‣ Etc.
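The aggregation problem is turning many per-tweet scores into a few per-period features that a regression can use. The sketch below computes daily versions of the features listed above. It assumes per-tweet sentiment in [-1, 1], and the exact feature definitions (for example, growth as the day-over-day change in average sentiment) are hypothetical.

```python
from collections import defaultdict
from datetime import date


def aggregate(scored_tweets):
    """Aggregate (day, sentiment) pairs into per-day features.

    Sentiment is assumed to lie in [-1, 1]; the feature definitions are
    hypothetical interpretations of the slide's list.
    """
    by_day = defaultdict(list)
    for day, score in scored_tweets:
        by_day[day].append(score)

    features, prev_avg = {}, None
    for day in sorted(by_day):
        scores = by_day[day]
        avg = sum(scores) / len(scores)
        features[day] = {
            "tweet_volume": len(scores),
            "volume_positive": sum(1 for s in scores if s > 0),
            "avg_sentiment": avg,
            # Change versus the previous day that had tweets (None on the first day).
            "sentiment_growth": None if prev_avg is None else avg - prev_avg,
        }
        prev_avg = avg
    return features


tweets = [
    (date(2012, 11, 5), 0.5), (date(2012, 11, 5), -0.5), (date(2012, 11, 5), 1.0),
    (date(2012, 11, 6), 0.25), (date(2012, 11, 6), 0.75),
]
daily = aggregate(tweets)
print(daily[date(2012, 11, 6)]["volume_positive"])  # 2
```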
![Page 25: 20121108 sntmnt data_sciencenl](https://reader036.fdocuments.net/reader036/viewer/2022070304/54c6f8754a795937038b45ee/html5/thumbnails/25.jpg)
Machine Learning Models
‣ Linear regression
‣ Bayesian approaches
‣ Decision trees
‣ Neural nets
‣ Support vector machines
![Page 26: 20121108 sntmnt data_sciencenl](https://reader036.fdocuments.net/reader036/viewer/2022070304/54c6f8754a795937038b45ee/html5/thumbnails/26.jpg)
Results
‣ R² < 0.01
‣ Not usable as an independent trading model after transaction costs
‣ Still usable as an extra indicator alongside proven trading models
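The slides do not define R², but it is presumably the usual coefficient of determination of the regression from tweet features to stock returns. Here y_t is the observed return in trading period t, ŷ_t is the model's prediction, and ȳ is the mean observed return:

```latex
R^2 = 1 - \frac{\sum_{t} \left(y_t - \hat{y}_t\right)^2}{\sum_{t} \left(y_t - \bar{y}\right)^2}
```

R² < 0.01 therefore means the model explains less than 1% of the variance in returns. For a single-feature least-squares fit with an intercept, R² equals the squared Pearson correlation, so this corresponds to |r| < 0.1.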
![Page 27: 20121108 sntmnt data_sciencenl](https://reader036.fdocuments.net/reader036/viewer/2022070304/54c6f8754a795937038b45ee/html5/thumbnails/27.jpg)
Stock Dashboard (B2B2C)
Sentiment APIs (B2B)
Trading Indicator API (B2B)
Products - next steps:
‣ Extend scope to further niche domains and languages.
‣ Become market leader and thought leader in financial sentiment analysis.
‣ Gain more insight into the added value of the SNTMNT algorithm as an indicator alongside fundamental and technical analysis.
![Page 28: 20121108 sntmnt data_sciencenl](https://reader036.fdocuments.net/reader036/viewer/2022070304/54c6f8754a795937038b45ee/html5/thumbnails/28.jpg)
Any questions?
For more info, visit:
www.SNTMNT.com