The analytics landscape: A personal viewcseweb.ucsd.edu/~elkan/AnalyticsDec2011.pdf1.Structured data...
Transcript of The analytics landscape: A personal viewcseweb.ucsd.edu/~elkan/AnalyticsDec2011.pdf1.Structured data...
![Page 1: The analytics landscape: A personal viewcseweb.ucsd.edu/~elkan/AnalyticsDec2011.pdf1.Structured data (predictive, visual) 2.Unstructured data 3.The business of analytics 4.A research](https://reader035.fdocuments.net/reader035/viewer/2022071020/5fd45e69bc298110191c525b/html5/thumbnails/1.jpg)
The analytics landscape:A personal view
Charles Elkan
el December 20, 2011
![Page 2: The analytics landscape: A personal viewcseweb.ucsd.edu/~elkan/AnalyticsDec2011.pdf1.Structured data (predictive, visual) 2.Unstructured data 3.The business of analytics 4.A research](https://reader035.fdocuments.net/reader035/viewer/2022071020/5fd45e69bc298110191c525b/html5/thumbnails/2.jpg)
What is analytics?
• Big data, business intelligence (BI), decision support (DSS), data warehousing, unstructured data, knowledge discovery in databases (KDD), information visualization, map-reduce.
• analytics = convert data into intelligence + capture value = statistics + optimization
• statistics = machine learning = data mining
• optimization = microeconomics + operations research
JARGON
![Page 3: The analytics landscape: A personal viewcseweb.ucsd.edu/~elkan/AnalyticsDec2011.pdf1.Structured data (predictive, visual) 2.Unstructured data 3.The business of analytics 4.A research](https://reader035.fdocuments.net/reader035/viewer/2022071020/5fd45e69bc298110191c525b/html5/thumbnails/3.jpg)
Outline
1.Structured data (predictive, visual)2.Unstructured data3.The business of analytics4.A research and business opportunity
![Page 4: The analytics landscape: A personal viewcseweb.ucsd.edu/~elkan/AnalyticsDec2011.pdf1.Structured data (predictive, visual) 2.Unstructured data 3.The business of analytics 4.A research](https://reader035.fdocuments.net/reader035/viewer/2022071020/5fd45e69bc298110191c525b/html5/thumbnails/4.jpg)
A basic distinction
I. Structured dataTables in databasesNodes and links in networks
II. Unstructured dataTextVideosTables in web pagesXML
![Page 5: The analytics landscape: A personal viewcseweb.ucsd.edu/~elkan/AnalyticsDec2011.pdf1.Structured data (predictive, visual) 2.Unstructured data 3.The business of analytics 4.A research](https://reader035.fdocuments.net/reader035/viewer/2022071020/5fd45e69bc298110191c525b/html5/thumbnails/5.jpg)
I. Structured data
• A data warehouse is a cost center, not a profit center.• How can structured data be a profit center?
1.Predictive analytics2.Visual analytics
![Page 6: The analytics landscape: A personal viewcseweb.ucsd.edu/~elkan/AnalyticsDec2011.pdf1.Structured data (predictive, visual) 2.Unstructured data 3.The business of analytics 4.A research](https://reader035.fdocuments.net/reader035/viewer/2022071020/5fd45e69bc298110191c525b/html5/thumbnails/6.jpg)
1. Predictive analytics
• So, what can we do with structured data?• Answer: Make predictions, then take actions. • Example:
• But, what are the costs and benefits of alternative actions?• And, who pays which costs?
![Page 7: The analytics landscape: A personal viewcseweb.ucsd.edu/~elkan/AnalyticsDec2011.pdf1.Structured data (predictive, visual) 2.Unstructured data 3.The business of analytics 4.A research](https://reader035.fdocuments.net/reader035/viewer/2022071020/5fd45e69bc298110191c525b/html5/thumbnails/7.jpg)
Cost-sensitive learning
• Cross-domain theory of making optimal decisions given predictions:
![Page 8: The analytics landscape: A personal viewcseweb.ucsd.edu/~elkan/AnalyticsDec2011.pdf1.Structured data (predictive, visual) 2.Unstructured data 3.The business of analytics 4.A research](https://reader035.fdocuments.net/reader035/viewer/2022071020/5fd45e69bc298110191c525b/html5/thumbnails/8.jpg)
2. Visual analytics
• So, what can we do with structured data?• Answer: Find and display patterns; prompt human insight.
![Page 9: The analytics landscape: A personal viewcseweb.ucsd.edu/~elkan/AnalyticsDec2011.pdf1.Structured data (predictive, visual) 2.Unstructured data 3.The business of analytics 4.A research](https://reader035.fdocuments.net/reader035/viewer/2022071020/5fd45e69bc298110191c525b/html5/thumbnails/9.jpg)
Patterns of human metabolism
![Page 10: The analytics landscape: A personal viewcseweb.ucsd.edu/~elkan/AnalyticsDec2011.pdf1.Structured data (predictive, visual) 2.Unstructured data 3.The business of analytics 4.A research](https://reader035.fdocuments.net/reader035/viewer/2022071020/5fd45e69bc298110191c525b/html5/thumbnails/10.jpg)
Information visualization
• “state of the art analytic tools to identify biomarkers”
![Page 11: The analytics landscape: A personal viewcseweb.ucsd.edu/~elkan/AnalyticsDec2011.pdf1.Structured data (predictive, visual) 2.Unstructured data 3.The business of analytics 4.A research](https://reader035.fdocuments.net/reader035/viewer/2022071020/5fd45e69bc298110191c525b/html5/thumbnails/11.jpg)
II. Unstructured data
![Page 12: The analytics landscape: A personal viewcseweb.ucsd.edu/~elkan/AnalyticsDec2011.pdf1.Structured data (predictive, visual) 2.Unstructured data 3.The business of analytics 4.A research](https://reader035.fdocuments.net/reader035/viewer/2022071020/5fd45e69bc298110191c525b/html5/thumbnails/12.jpg)
A case study
![Page 13: The analytics landscape: A personal viewcseweb.ucsd.edu/~elkan/AnalyticsDec2011.pdf1.Structured data (predictive, visual) 2.Unstructured data 3.The business of analytics 4.A research](https://reader035.fdocuments.net/reader035/viewer/2022071020/5fd45e69bc298110191c525b/html5/thumbnails/13.jpg)
A general need: Task-oriented semantic search
LaVerne Council, CIO of Johnson & Johnson:
“... allow anyone to ask a question ... folks that have given us access to their email ... data mining for answers to that question
... help us solve a very hairy issue for one of our products ... one of the associates had completed his thesis in college on that very topic ... they weren’t in the same company
... we were able to really come back with answers.”
![Page 14: The analytics landscape: A personal viewcseweb.ucsd.edu/~elkan/AnalyticsDec2011.pdf1.Structured data (predictive, visual) 2.Unstructured data 3.The business of analytics 4.A research](https://reader035.fdocuments.net/reader035/viewer/2022071020/5fd45e69bc298110191c525b/html5/thumbnails/14.jpg)
A grand vision
• “Open source intelligence (OSI)”
![Page 15: The analytics landscape: A personal viewcseweb.ucsd.edu/~elkan/AnalyticsDec2011.pdf1.Structured data (predictive, visual) 2.Unstructured data 3.The business of analytics 4.A research](https://reader035.fdocuments.net/reader035/viewer/2022071020/5fd45e69bc298110191c525b/html5/thumbnails/15.jpg)
A less grand vision
![Page 16: The analytics landscape: A personal viewcseweb.ucsd.edu/~elkan/AnalyticsDec2011.pdf1.Structured data (predictive, visual) 2.Unstructured data 3.The business of analytics 4.A research](https://reader035.fdocuments.net/reader035/viewer/2022071020/5fd45e69bc298110191c525b/html5/thumbnails/16.jpg)
III. The business of analytics
• Analytics applications are valuable.
![Page 17: The analytics landscape: A personal viewcseweb.ucsd.edu/~elkan/AnalyticsDec2011.pdf1.Structured data (predictive, visual) 2.Unstructured data 3.The business of analytics 4.A research](https://reader035.fdocuments.net/reader035/viewer/2022071020/5fd45e69bc298110191c525b/html5/thumbnails/17.jpg)
Analytics companies are valuable
![Page 18: The analytics landscape: A personal viewcseweb.ucsd.edu/~elkan/AnalyticsDec2011.pdf1.Structured data (predictive, visual) 2.Unstructured data 3.The business of analytics 4.A research](https://reader035.fdocuments.net/reader035/viewer/2022071020/5fd45e69bc298110191c525b/html5/thumbnails/18.jpg)
Are valuations bubble-icious?
• HP compared to Autonomy:
Sales: $128B versus $963MIncome: $12B versus $343M
Value: $50B versus $11B
• Forrester: “The Autonomy IP is stagnant. There hasn’t been a major release in five years.”
• Zero recent patents for the core analytics.
![Page 19: The analytics landscape: A personal viewcseweb.ucsd.edu/~elkan/AnalyticsDec2011.pdf1.Structured data (predictive, visual) 2.Unstructured data 3.The business of analytics 4.A research](https://reader035.fdocuments.net/reader035/viewer/2022071020/5fd45e69bc298110191c525b/html5/thumbnails/19.jpg)
IV. A research and market opportunity
![Page 20: The analytics landscape: A personal viewcseweb.ucsd.edu/~elkan/AnalyticsDec2011.pdf1.Structured data (predictive, visual) 2.Unstructured data 3.The business of analytics 4.A research](https://reader035.fdocuments.net/reader035/viewer/2022071020/5fd45e69bc298110191c525b/html5/thumbnails/20.jpg)
• New platform for diverse dataCloud-basedMultiply the user base 10x:
• Easy to use• Fun to use
• Opportunity: Add “secret sauce” to open-source softwareNewer artificial intelligencePatented artificial intelligence
Disruption from below
![Page 21: The analytics landscape: A personal viewcseweb.ucsd.edu/~elkan/AnalyticsDec2011.pdf1.Structured data (predictive, visual) 2.Unstructured data 3.The business of analytics 4.A research](https://reader035.fdocuments.net/reader035/viewer/2022071020/5fd45e69bc298110191c525b/html5/thumbnails/21.jpg)
• A role model for cloud-based ease of use: Box.net• $650M valuation, but no intelligence.
![Page 22: The analytics landscape: A personal viewcseweb.ucsd.edu/~elkan/AnalyticsDec2011.pdf1.Structured data (predictive, visual) 2.Unstructured data 3.The business of analytics 4.A research](https://reader035.fdocuments.net/reader035/viewer/2022071020/5fd45e69bc298110191c525b/html5/thumbnails/22.jpg)
• Cloud-based software as a service (SaaS)• Easy to use, fun to use• Newer AI, patented AI
• Open-source foundation:Lucene and Solr as backendTika for importing unstructured data
Disruption from below
![Page 23: The analytics landscape: A personal viewcseweb.ucsd.edu/~elkan/AnalyticsDec2011.pdf1.Structured data (predictive, visual) 2.Unstructured data 3.The business of analytics 4.A research](https://reader035.fdocuments.net/reader035/viewer/2022071020/5fd45e69bc298110191c525b/html5/thumbnails/23.jpg)
Newer artificial intelligence
• Sentiment analysis• Topic models for
organizing content• Recursive neural nets
for deep understanding www.socher.org/index.php/Main/ParsingNaturalScenes AndNaturalLanguageWith RecursiveNeuralNetworks
![Page 24: The analytics landscape: A personal viewcseweb.ucsd.edu/~elkan/AnalyticsDec2011.pdf1.Structured data (predictive, visual) 2.Unstructured data 3.The business of analytics 4.A research](https://reader035.fdocuments.net/reader035/viewer/2022071020/5fd45e69bc298110191c525b/html5/thumbnails/24.jpg)
Newer AI: Fewer topics, better fit
![Page 25: The analytics landscape: A personal viewcseweb.ucsd.edu/~elkan/AnalyticsDec2011.pdf1.Structured data (predictive, visual) 2.Unstructured data 3.The business of analytics 4.A research](https://reader035.fdocuments.net/reader035/viewer/2022071020/5fd45e69bc298110191c525b/html5/thumbnails/25.jpg)
Patented AI: Sentiment analysis
• ... labels designate level of quality, such as interestingness, appropriateness, timeliness, humor, style of language, obscenity, sentiment
• ... a classifier means effective to automatically associate a quality value to items of data, wherein said quality value is indicative of the qualitative nature of said items of data
![Page 26: The analytics landscape: A personal viewcseweb.ucsd.edu/~elkan/AnalyticsDec2011.pdf1.Structured data (predictive, visual) 2.Unstructured data 3.The business of analytics 4.A research](https://reader035.fdocuments.net/reader035/viewer/2022071020/5fd45e69bc298110191c525b/html5/thumbnails/26.jpg)
Today in the New York Times
![Page 27: The analytics landscape: A personal viewcseweb.ucsd.edu/~elkan/AnalyticsDec2011.pdf1.Structured data (predictive, visual) 2.Unstructured data 3.The business of analytics 4.A research](https://reader035.fdocuments.net/reader035/viewer/2022071020/5fd45e69bc298110191c525b/html5/thumbnails/27.jpg)
SQUID
• Sentiment analysis• Question answering• Unstructured data organization• Interactive insight• Diverse entity extraction
• But what will be most beneficial and profitable?
• Historical answer: Specific vertical applications.
![Page 28: The analytics landscape: A personal viewcseweb.ucsd.edu/~elkan/AnalyticsDec2011.pdf1.Structured data (predictive, visual) 2.Unstructured data 3.The business of analytics 4.A research](https://reader035.fdocuments.net/reader035/viewer/2022071020/5fd45e69bc298110191c525b/html5/thumbnails/28.jpg)
Profit lies in verticals, I
![Page 29: The analytics landscape: A personal viewcseweb.ucsd.edu/~elkan/AnalyticsDec2011.pdf1.Structured data (predictive, visual) 2.Unstructured data 3.The business of analytics 4.A research](https://reader035.fdocuments.net/reader035/viewer/2022071020/5fd45e69bc298110191c525b/html5/thumbnails/29.jpg)
Profit lies in verticals, II
![Page 30: The analytics landscape: A personal viewcseweb.ucsd.edu/~elkan/AnalyticsDec2011.pdf1.Structured data (predictive, visual) 2.Unstructured data 3.The business of analytics 4.A research](https://reader035.fdocuments.net/reader035/viewer/2022071020/5fd45e69bc298110191c525b/html5/thumbnails/30.jpg)
Discussion
• Acknowledgement: Most images are due to other authors.