Real time machine learning architecture & sentiment analysis
![Page 1: Real time machine learning architecture & sentiment analysis](https://reader036.fdocuments.net/reader036/viewer/2022062412/589da6491a28ab21728b4a75/html5/thumbnails/1.jpg)
Real Time Machine Learning Architecture & Sentiment Analysis
Quantcon 2016, Singapore
Juan CHENG, PhD
Data [email protected]
www.infotrie.com | @infotrie
www.finsents.com | @finsents
![Page 2: Real time machine learning architecture & sentiment analysis](https://reader036.fdocuments.net/reader036/viewer/2022062412/589da6491a28ab21728b4a75/html5/thumbnails/2.jpg)
Agenda
● About us
● News analytics in finance
● A news analytics case
  • Information extraction of text
  • Text feature extraction for machine learning classification
  • Big data tools applied
  • Architecture that combines all
![Page 3: Real time machine learning architecture & sentiment analysis](https://reader036.fdocuments.net/reader036/viewer/2022062412/589da6491a28ab21728b4a75/html5/thumbnails/3.jpg)
Our team
Frederic GEORJON, CEO
Ajil GEORGE, Head of Development Center
Daniel ABROUK, Head of EMEA
Paris/Singapore, London
LONG Zhicheng, CTO
Singapore, India
![Page 4: Real time machine learning architecture & sentiment analysis](https://reader036.fdocuments.net/reader036/viewer/2022062412/589da6491a28ab21728b4a75/html5/thumbnails/4.jpg)
Services
FinSentS.com
➔ Real-time information and trading portal
➔ Millions of sources / Multilingual
➔ SaaS or on premises
➔ Real-time Alerts
➔ Actionable signals
Sentiment Data
➔ Through API or third parties
➔ Up to 15 years of history
➔ Low latency / Tick by tick
➔ 50,000+ entities
➔ Stock, Forex, commodities, index, Macroeconomic topics etc…
Consultancy and Training
➔ Trading Technology
➔ Algorithmic trading
➔ Big Data
➔ Natural Language Processing (NLP)
➔ Machine Learning
![Page 5: Real time machine learning architecture & sentiment analysis](https://reader036.fdocuments.net/reader036/viewer/2022062412/589da6491a28ab21728b4a75/html5/thumbnails/5.jpg)
Do you use news in your strategies?
A. No, I find news noisy. There is just too much of it.
B. No, I’m a quant. I find it hard to quantify news.
C. Yes. But I find using news is not very efficient. I have to manually relate it to my portfolio.
![Page 6: Real time machine learning architecture & sentiment analysis](https://reader036.fdocuments.net/reader036/viewer/2022062412/589da6491a28ab21728b4a75/html5/thumbnails/6.jpg)
News Analytics in Finance
Access to News / News management
- Visualization tools
- Filtering tools
- On demand view
Feed from multiple sources:
- Social Media
- Web based content
- Private sources
- Internal data
News Content Alerts based on sentiment indicator
Provide accurate information from the Big Data environment and push it in front of users in real time for risk management
Dashboard
- Consolidated Dashboard
- Portfolio Alerts
Actionable indicators
Users receive news signals for trading / hedging / risk management based on the sentiment indicator
Algo Trading / Robo Trading
Real Time algorithmic trading Sentiment indicator and News Analytics
Equity Research / Sales Team Hedging Trader / Prop Trader
- News Tag Cloud
- Filtering newsfeed with Social media blotter, news blotter
- Search Engine on demand
- Topics detection
- Rumours alerts
- News qualification per importance
- Relevant information from single screen
- Automatic Alert
- Integrated to OMS
Provide relevant news analytics indicator for hedging or trade idea generation
News analytics signals fully integrated into algo trading strategies
![Page 7: Real time machine learning architecture & sentiment analysis](https://reader036.fdocuments.net/reader036/viewer/2022062412/589da6491a28ab21728b4a75/html5/thumbnails/7.jpg)
What’s in the news?
Reuters
MARKET NEWS | Fri Oct 21, 2016 | 2:18am EDT
AT&T acquires Time Warner for $85 billion
NEW YORK - AT&T Inc said it agreed to buy Time Warner Inc for $85.4 billion, the boldest move yet by a telecommunications company to acquire content to stream over its high-speed network to attract a growing number of online viewers.
The trend of consolidation comes as technology advances have been upending traditional entertainment companies. Many in the industry believe that getting bigger is the best way to compete with companies like Google, Apple, Netflix and Facebook.
David Goldman and Paul R. La Monica contributed to this report.
![Page 8: Real time machine learning architecture & sentiment analysis](https://reader036.fdocuments.net/reader036/viewer/2022062412/589da6491a28ab21728b4a75/html5/thumbnails/8.jpg)
What’s in the news?
Reuters
MARKET NEWS | Fri Oct 21, 2016 | 2:18am EDT
AT&T acquires Time Warner for $85 billion
NEW YORK - AT&T Inc said it agreed to buy Time Warner Inc for $85.4 billion, the boldest move yet by a telecommunications company to acquire content to stream over its high-speed network to attract a growing number of online viewers.
The trend of consolidation comes as technology advances have been upending traditional entertainment companies. Many in the industry believe that getting bigger is the best way to compete with companies like Google, Apple, Netflix and Facebook.
David Goldman and Paul R. La Monica contributed to this report.
Information to extract:
- Source
- Category
- Time
- Location
- Named Entity
- Sentiment
- Event
Tools: hacking skills, regex, NLP, named entity recognition, POS taggers
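As a rough illustration of the "regex" part of this slide, the sketch below pulls a few of the labelled fields (category, time, location, named entities, amounts) out of the Reuters example using only Python's standard `re` module. The patterns and the `extract_fields` helper are hypothetical and tuned to this single article; a production system would more likely rely on a trained named entity recognizer than on hand-written rules.

```python
import re

# Sample text copied from the Reuters example on the slide.
ARTICLE = (
    "MARKET NEWS | Fri Oct 21, 2016 | 2:18am EDT\n"
    "AT&T acquires Time Warner for $85 billion\n"
    "NEW YORK- AT&T Inc said it agreed to buy Time Warner Inc for $85.4 billion."
)

# Hypothetical patterns, written for this one article only.
HEADER_RE = re.compile(
    r"^(?P<category>[A-Z ]+) \| (?P<date>[^|]+) \| (?P<time>.+)$", re.M
)
DATELINE_RE = re.compile(r"^([A-Z][A-Z ]*?)\s*-\s", re.M)
COMPANY_RE = re.compile(r"\b([A-Z][\w&]*(?: [A-Z][\w&]*)*) Inc\b")
MONEY_RE = re.compile(r"\$\d+(?:\.\d+)? (?:million|billion)")


def extract_fields(text):
    """Return a dict of fields found by the regular expressions above."""
    header = HEADER_RE.search(text)
    dateline = DATELINE_RE.search(text)
    return {
        "category": header.group("category") if header else None,
        "date": header.group("date") if header else None,
        "time": header.group("time") if header else None,
        "location": dateline.group(1) if dateline else None,
        # Capitalised word sequences followed by "Inc" are treated as companies.
        "companies": sorted(set(COMPANY_RE.findall(text))),
        "amounts": MONEY_RE.findall(text),
    }


fields = extract_fields(ARTICLE)
print(fields)
```

Rules like these are brittle (a company without the "Inc" suffix is missed), which is one reason the slide lists NER and POS taggers alongside regex.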
![Page 9: Real time machine learning architecture & sentiment analysis](https://reader036.fdocuments.net/reader036/viewer/2022062412/589da6491a28ab21728b4a75/html5/thumbnails/9.jpg)
Text feature extraction
Train Document Set:
d1: The sky is blue.
d2: The sun is bright.
Test Document Set:
d3: The sun in the sky is bright.
d4: We can see the shining sun, the bright sun.
Vector Space Model (VSM)
t1 t2...
d1
d2 ...
![Page 10: Real time machine learning architecture & sentiment analysis](https://reader036.fdocuments.net/reader036/viewer/2022062412/589da6491a28ab21728b4a75/html5/thumbnails/10.jpg)
Text feature extraction
Train Document Set:
d1: The sky is blue.
d2: The sun is bright.
Vocabulary
Term frequency(TF)
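The vocabulary and term-frequency step can be sketched in plain Python as follows. The stop-word list is an assumption (the slide does not give one), and the helper names are hypothetical; the vocabulary is built from the train set and then used to turn the test documents d3 and d4 into count vectors.

```python
import re
from collections import Counter

TRAIN_DOCS = {"d1": "The sky is blue.", "d2": "The sun is bright."}
TEST_DOCS = {
    "d3": "The sun in the sky is bright.",
    "d4": "We can see the shining sun, the bright sun.",
}

# Assumption: a tiny hand-picked stop-word list, not taken from the slides.
STOP_WORDS = {"the", "is", "in", "we", "can", "see"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def build_vocabulary(docs):
    """Map each distinct training term to a column index (sorted for stability)."""
    terms = sorted({t for text in docs.values() for t in tokenize(text)})
    return {term: index for index, term in enumerate(terms)}


def tf_vector(text, vocabulary):
    """Count how often each vocabulary term occurs in the text."""
    counts = Counter(tokenize(text))
    return [counts[term] for term in sorted(vocabulary, key=vocabulary.get)]


vocabulary = build_vocabulary(TRAIN_DOCS)
tf_matrix = {name: tf_vector(text, vocabulary) for name, text in TEST_DOCS.items()}

print(vocabulary)  # term -> column index
print(tf_matrix)   # document -> term-frequency vector
```

Terms that never appeared in the train set (for example "shining" in d4) have no column and are simply ignored.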
![Page 11: Real time machine learning architecture & sentiment analysis](https://reader036.fdocuments.net/reader036/viewer/2022062412/589da6491a28ab21728b4a75/html5/thumbnails/11.jpg)
Text feature extraction
TF alone overemphasizes terms that appear in almost every document of the corpus
TF-IDF
TF example / IDF example
Normalized TF-IDF
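A minimal sketch of the weighting and normalization step, assuming the term counts for d3 and d4 over the vocabulary (blue, bright, sky, sun). The slides do not show which IDF variant was used; this example assumes the smoothed form idf(t) = ln((1 + N) / (1 + df(t))) + 1, the default in scikit-learn's `TfidfTransformer`, followed by L2 normalization of each document vector.

```python
import math

TERMS = ["blue", "bright", "sky", "sun"]
# Term counts for the two test documents, one row per document.
COUNTS = {"d3": [0, 1, 1, 1], "d4": [0, 1, 0, 2]}


def idf_weights(count_rows):
    """Smoothed inverse document frequency for every column (assumed variant)."""
    n_docs = len(count_rows)
    weights = []
    for column in range(len(count_rows[0])):
        df = sum(1 for row in count_rows if row[column] > 0)
        weights.append(math.log((1 + n_docs) / (1 + df)) + 1.0)
    return weights


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length (zero vectors are left as-is)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def tfidf(count_rows):
    """Multiply counts by IDF weights, then normalize each row."""
    idf = idf_weights(count_rows)
    return [l2_normalize([tf * w for tf, w in zip(row, idf)]) for row in count_rows]


rows = list(COUNTS.values())
idf = idf_weights(rows)
vectors = dict(zip(COUNTS, tfidf(rows)))

for name, vector in vectors.items():
    print(name, [round(x, 3) for x in vector])
```

Terms present in both documents ("bright", "sun") get the minimum weight of 1.0, while the rarer "sky" is weighted higher, which is the correction to raw TF that the slide describes.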
![Page 12: Real time machine learning architecture & sentiment analysis](https://reader036.fdocuments.net/reader036/viewer/2022062412/589da6491a28ab21728b4a75/html5/thumbnails/12.jpg)
Text feature extraction
Train Document Set:
d1: The sky is blue.
d2: The sun is bright.
Test Document Set:
d3: The sun in the sky is bright.
d4: We can see the shining sun, the bright sun.
Vector Space Model (VSM)
t1 t2...
d1
d2 ...
Machine Learning
![Page 13: Real time machine learning architecture & sentiment analysis](https://reader036.fdocuments.net/reader036/viewer/2022062412/589da6491a28ab21728b4a75/html5/thumbnails/13.jpg)
Processes in text analytics
Text
- Dow Jones, Bloomberg
- Web news, blogs, Twitter
- 1000+ sources
NLP
- Companies, indexes
- People, locations, organizations
- Events
- Regions
Training
- 15 years history
- Tens of millions of articles
Feature Extraction
Classification
Sentiment
Indexing
- Sector/industry
- Commodity, FX, ETFs
- Political, country risk
- Macroeconomic
- Fear, greed, anger, happiness
Aggregation
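To make the training and classification steps concrete, here is a small multinomial Naive Bayes classifier with Laplace smoothing, written in plain Python. The slides do not say which classifier the speaker's system uses, so Naive Bayes is a stand-in chosen for brevity, and the four labelled headlines are invented for illustration only.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical labelled headlines, invented for this example.
TRAINING = [
    ("profits surge after strong quarter", "positive"),
    ("shares rally on record earnings", "positive"),
    ("shares plunge after profit warning", "negative"),
    ("regulator fines bank over fraud", "negative"),
]


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def train(examples):
    """Collect per-label word counts, label counts and the vocabulary."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # unseen words carry no evidence
            # Laplace (add-one) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAINING)
print(classify("record earnings and strong profits", model))
print(classify("bank fraud warning", model))
```

In the pipeline on this slide, the word counts would be replaced by the extracted features (for example TF-IDF vectors) and the training set by the historical article archive.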
![Page 14: Real time machine learning architecture & sentiment analysis](https://reader036.fdocuments.net/reader036/viewer/2022062412/589da6491a28ab21728b4a75/html5/thumbnails/14.jpg)
Architecture requirements
❏ Guaranteed data processing
❏ Horizontal scalability
❏ Fault-tolerance
❏ Higher level abstraction than message passing
❏ Real-time machine learning for classification and predictive analytics
![Page 15: Real time machine learning architecture & sentiment analysis](https://reader036.fdocuments.net/reader036/viewer/2022062412/589da6491a28ab21728b4a75/html5/thumbnails/15.jpg)
Analytics on Massive Historical Text Data
Analytics on the recent past
Real-time analytics
Batch layer / Real-time layer
Architecture Solutions
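One way to picture how a batch layer and a real-time layer cooperate is a query that merges a precomputed view over historical data with an incrementally updated view over recent data. The sketch below does this for an average sentiment score per entity; the `SentimentView` class, the function names and the scores are all hypothetical.

```python
from collections import defaultdict


class SentimentView:
    """Running sum and count of sentiment scores per entity."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    def add(self, entity, score):
        self.totals[entity] += score
        self.counts[entity] += 1


def build_batch_view(historical_records):
    """Batch layer: recompute the view from the full history."""
    view = SentimentView()
    for entity, score in historical_records:
        view.add(entity, score)
    return view


def merged_average(entity, batch_view, realtime_view):
    """Query time: combine both views into one average score."""
    total = batch_view.totals.get(entity, 0.0) + realtime_view.totals.get(entity, 0.0)
    count = batch_view.counts.get(entity, 0) + realtime_view.counts.get(entity, 0)
    return total / count if count else None


# Hypothetical scores in [-1, 1].
HISTORY = [("AT&T", 0.2), ("AT&T", 0.4), ("Time Warner", 0.5)]
batch_view = build_batch_view(HISTORY)

# Real-time layer: updated one message at a time as news arrives.
realtime_view = SentimentView()
realtime_view.add("AT&T", -0.3)

print(merged_average("AT&T", batch_view, realtime_view))
print(merged_average("Unknown Corp", batch_view, realtime_view))
```

When the batch layer next recomputes its view over the full history, the corresponding entries in the real-time view can be discarded.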
![Page 16: Real time machine learning architecture & sentiment analysis](https://reader036.fdocuments.net/reader036/viewer/2022062412/589da6491a28ab21728b4a75/html5/thumbnails/16.jpg)
Fast and general engine for large-scale distributed data processing
Memory, Network, CPUs, Disk
Reference: spark
Logistic regression in Hadoop and Spark
What’s Spark?
![Page 17: Real time machine learning architecture & sentiment analysis](https://reader036.fdocuments.net/reader036/viewer/2022062412/589da6491a28ab21728b4a75/html5/thumbnails/17.jpg)
What’s Storm?
An open source distributed real-time computation system that makes it easy to process unbounded streams of data
Storm was benchmarked at processing one million 100 byte messages per second per node on hardware with the following specs:
Processor: 2x Intel [email protected]
Memory: 24 GB
Reference: storm
Spout
bolt
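The spout and bolt roles can be imitated with plain Python generators: a spout emits tuples from a source, and each bolt consumes a stream and emits a new one. This is only a conceptual sketch and does not use the Storm API; it ignores parallelism, acknowledgements and fault tolerance, and the headlines and function names are hypothetical.

```python
from collections import Counter


def headline_spout(headlines):
    """Spout: emit one tuple per incoming message."""
    for source, text in headlines:
        yield {"source": source, "text": text}


def filter_bolt(stream, keyword):
    """Bolt: pass through only tuples whose text mentions the keyword."""
    for tup in stream:
        if keyword.lower() in tup["text"].lower():
            yield tup


def count_bolt(stream):
    """Bolt: keep a running count per source and emit a snapshot each time."""
    counts = Counter()
    for tup in stream:
        counts[tup["source"]] += 1
        yield dict(counts)


# Hypothetical input messages.
HEADLINES = [
    ("news", "AT&T agrees to buy Time Warner"),
    ("blog", "Ten recipes for autumn"),
    ("twitter", "Time Warner deal is huge"),
]

# Wiring spout -> filter -> count plays the role of a topology.
topology = count_bolt(filter_bolt(headline_spout(HEADLINES), "time warner"))
snapshots = list(topology)
print(snapshots)
```

In a real Storm topology each of these stages would run as many parallel tasks across worker processes, with the framework routing tuples between them.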
![Page 18: Real time machine learning architecture & sentiment analysis](https://reader036.fdocuments.net/reader036/viewer/2022062412/589da6491a28ab21728b4a75/html5/thumbnails/18.jpg)
Requirements
✓ Guaranteed data processing
✓ Horizontal scalability
✓ Fault-tolerance
✓ Higher level abstraction than message passing
✓ Real-time machine learning for classification and predictive analytics
![Page 19: Real time machine learning architecture & sentiment analysis](https://reader036.fdocuments.net/reader036/viewer/2022062412/589da6491a28ab21728b4a75/html5/thumbnails/19.jpg)
Architecture
Producers: blogs, Twitter, news, Bloomberg...
Kafka
Apache Storm: filter, topic classification, sentiment calculation, entity detection, stock mapping, sentiment aggregation
Apache Spark: model training, batch cleaning, batch calculation
DFS: NLP models, ML models
NoSQL database: cache, persistent
Solr
Relational database
Web app
![Page 20: Real time machine learning architecture & sentiment analysis](https://reader036.fdocuments.net/reader036/viewer/2022062412/589da6491a28ab21728b4a75/html5/thumbnails/20.jpg)
Use cases
Apply similar architecture in
➔ Scale analysis pipeline
➔ Live stats
➔ Recommendations
➔ Predictions
➔ Real-time analytics
➔ Online machine learning
![Page 22: Real time machine learning architecture & sentiment analysis](https://reader036.fdocuments.net/reader036/viewer/2022062412/589da6491a28ab21728b4a75/html5/thumbnails/22.jpg)
USE CASE in Trading I - Positive buzz
Sentiment in itself is a powerful trading indicator on which multiple trading strategies can be built
Simulate impact of complex events
![Page 23: Real time machine learning architecture & sentiment analysis](https://reader036.fdocuments.net/reader036/viewer/2022062412/589da6491a28ab21728b4a75/html5/thumbnails/23.jpg)
USE CASE in Trading II- Monitoring & Rebalancing
MiFID alert
Improve client communication
Regulatory Process complex / low signals events
ESG monitoring
Environmental – Social – Governance
A union calls for a strike in a factory in Argentina?
Negative news coverage of a stock I hold is accelerating in the Chinese press but has not yet reached the English press?
A European company employs children in Bangladesh (*)?
ACTIONS
![Page 24: Real time machine learning architecture & sentiment analysis](https://reader036.fdocuments.net/reader036/viewer/2022062412/589da6491a28ab21728b4a75/html5/thumbnails/24.jpg)
Spark basics - word count
text_file.flatMap(lambda line: line.split(" ")).map(lambda word: (word, 1)).reduceByKey(lambda a, b: a + b)
Diagram labels: dfs, Job, Executor
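To see what each stage of the word-count one-liner does, the sketch below reproduces `flatMap`, `map` and `reduceByKey` on an ordinary Python list, so it runs without a Spark installation. In PySpark, `text_file` would typically be an RDD created with `sc.textFile(...)`; the `flat_map` and `reduce_by_key` helpers and the two input lines here are stand-ins for illustration.

```python
from itertools import chain


def flat_map(func, items):
    """Apply func to every item and flatten the resulting lists into one."""
    return list(chain.from_iterable(func(item) for item in items))


def reduce_by_key(func, pairs):
    """Combine all values that share a key using the binary function func."""
    result = {}
    for key, value in pairs:
        result[key] = func(result[key], value) if key in result else value
    return result


# Stand-in for the lines of a text file stored on the DFS.
lines = ["the sky is blue", "the sun is bright"]

words = flat_map(lambda line: line.split(" "), lines)  # flatMap: lines -> words
pairs = [(word, 1) for word in words]                  # map: word -> (word, 1)
counts = reduce_by_key(lambda a, b: a + b, pairs)      # reduceByKey: sum per word

print(counts)
```

In Spark the same three steps are distributed: each executor processes its own partitions of the file, and the per-key sums are then combined across executors.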
![Page 25: Real time machine learning architecture & sentiment analysis](https://reader036.fdocuments.net/reader036/viewer/2022062412/589da6491a28ab21728b4a75/html5/thumbnails/25.jpg)
Storm basics
Cluster diagram: Nimbus, Zookeeper (×2), Worker (×4)
![Page 26: Real time machine learning architecture & sentiment analysis](https://reader036.fdocuments.net/reader036/viewer/2022062412/589da6491a28ab21728b4a75/html5/thumbnails/26.jpg)
Big Data in Finance
Velocity
Big Data
Variety
- News, blogs, social media, analyst reports, company announcement, traders’ chat room…
- Financial reports, price, economic events...
- Weather, GPS, image....
Volume
- ETL
- Machine learning
- Correlation analysis
- Regressions…
- As fast as possible