DATA FOR SCIENCE
HOW ELSEVIER IS USING DATA SCIENCE TO EMPOWER RESEARCHERS
Paul Groth | @pgroth | pgroth.com
Disruptive Technology Director
Elsevier Labs | @elsevierlabs
European Data Forum 2016
12 million people per month
40 million reactions
75 million compounds
500 million facts
3 EXAMPLES
• Personalized: what should I read?
• Actionable: who should I collaborate with?
• Consumable: how do I make my data available?
RECOMMENDATIONS AT MENDELEY
• Maya Hristakeva
• Data Scientist at Mendeley
• @mayahhf
• Spark Summit 2015
• http://www.slideshare.net/SparkSummit/sparking-science-up-with-research-recommendations-by-maya-hristakeva
MENDELEY BUILDS TOOLS TO HELP RESEARCHERS …
• Read & Organize
• Search & Discover
• Collaborate & Network
• Experiment & Synthesize
BEING THE BEST RESEARCHER YOU CAN BE!
• Good researchers are on top of their game
• Large amount of research produced
• Takes time to get what you need
• Help researchers by recommending relevant research
PERSONALIZED ARTICLE RECOMMENDATION
Input: User libraries
Output: Suggested articles to read
Algorithms:
• Collaborative Filtering
– Item-based
– User-based
– Matrix Factorization
• Content-based
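As a rough illustration of the item-based collaborative filtering option listed above, here is a minimal sketch in plain Python. It is not Mendeley's production pipeline (the talk refers to Mahout and Spark implementations); the user names, article IDs, and the functions item_similarities and recommend are hypothetical, and cosine similarity over library co-occurrence is only one reasonable choice of similarity measure.

```python
from collections import defaultdict
from math import sqrt


def item_similarities(libraries):
    """Cosine similarity between articles, based on which users hold them."""
    # Invert the input: article -> set of users who have it in their library.
    readers = defaultdict(set)
    for user, articles in libraries.items():
        for article in articles:
            readers[article].add(user)

    sims = defaultdict(dict)
    items = sorted(readers)
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            shared = len(readers[a] & readers[b])
            if shared:
                score = shared / sqrt(len(readers[a]) * len(readers[b]))
                sims[a][b] = score
                sims[b][a] = score
    return sims


def recommend(user, libraries, sims, top_n=3):
    """Rank articles the user does not own by summed similarity to owned ones."""
    owned = libraries[user]
    scores = defaultdict(float)
    for article in owned:
        for other, score in sims.get(article, {}).items():
            if other not in owned:
                scores[other] += score
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda a: (-scores[a], a))[:top_n]


# Hypothetical toy input: each user's library is a set of article IDs.
libraries = {
    "alice": {"paper_a", "paper_b"},
    "bob": {"paper_a", "paper_b", "paper_c"},
    "carol": {"paper_b", "paper_c", "paper_d"},
}
sims = item_similarities(libraries)
suggestions = recommend("alice", libraries, sims)
print(suggestions)
```

The pairwise loop is quadratic in the number of articles, so at real library scale this step would presumably be distributed or approximated, which is the kind of cost trade-off the comparison that follows addresses.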
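[Figure: cost versus quality of recommender implementations. Quadrants: Costly & Good, Costly & Bad, Cheap & Good, Cheap & Bad. Implementations plotted: Tuned IB Mahout, Tuned UB Mahout, Tuned UB Spark, Tuned IB Spark, UB DimSum (Spark MLlib), ALS Matrix Fact. (Spark MLlib). Annotations: Performance +100%, +150%, ~$50.]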
CALCULATING 75 TRILLION METRICS
• Benchmark 4600 institutions & 220 countries updated weekly
• 40 terabytes of data
• HPCC massively parallel compute system – 40 node system
ALL DATA ISN’T CURATED
60% OF TIME IS SPENT ON DATA PREPARATION
10 ASPECTS OF HIGHLY EFFECTIVE RESEARCH DATA
https://www.elsevier.com/connect/10-aspects-of-highly-effective-research-data
http://data.mendeley.com/
Each dataset receives a versioned DOI, so it can be cited
The citation for the associated article is displayed
ACADEMIC COLLABORATIONS
CONCLUSION
• Researchers are faced with an ever-growing amount of data and content
• Data Science is key to making systems that help them
• I’ve shown three Elsevier examples. Many more!
• Antonio Gulli’s codingplayground.blogspot.nl • labs.elsevier.com
• Of course, we’re hiring
Contact: Paul Groth @pgroth