Big Data: Web, Science, Mining
james-littlejohn -
PHPUK
Data Web, Data Science, Data Mining
BIG
Welcome, and thanks for the invite. Background: assumed you have read my profile.
First talk. As an entrepreneur, being thrown in at the deep end is always good: it makes sure you learn to swim fast.
Agenda
BIG DATA WEB
BIG DATA SCIENCE
BIG DATA MINING
SUMMARY
10 min: set the scene
10 min: what is data science, and a review of characters in the industry (what they are saying, what is being learnt, open source)
20 min: hands-on code
10 min: Q&A
.com
It started with a dot; physicists will tell you it was a big bang!
The data story began: .com was commercial, transaction-focused, e-commerce automation, mechanical.
After the burst, think of a continuum into Web 2.0.
Web 2.0
Moore's Law
Economics
Bandwidth
SOCIAL: open/share
Send it to friends and family, share; the openness story; build applications.
Infrastructure economics
Pace of data bandwidth
Facebook, Zuckerberg: the open/share ratio is growing faster than Moore's Law.
What happens as we cycle through this and it speeds up? Data web, Web 2.0 squared, Web 3.0 ...
Web 2.0 + mobile
Cloud computing
Data science
Data web
Open and share accelerate (privacy debate: won't go there).
How is cloud different from Moore's Law? That plus more: Hadoop, more to go in the cloud, don't want to pay per hour, want what I need, NoSQL, data portability, etc.
Data science: what does it all mean?
In practice
Existing data
Always working
Every webpage personalized
Data Web Summary
Data: expanding at a fast rate
Economic free cloud
Personalization in real time
Science applied to society
Recap and draw conclusions
Data Science
What is data science?
Data lifecycle
Case studies
Live state of physics 1800 Chairman Google
Community rallying around data science: Strata Conference, structure, local meetups.
How does data live?
Characters in the industry I've been reading about; useful links to posts to get started.
What is data science?
Combines three areas
Engineering
Mathematics, statistics, ML
Communication: PowerPoint to infographic, product, API
Add three hats graphic: yellow hard hat, professor's hat and marketing hat! Dave McClure!
Data lifecycle
Comes from?
Data conditioning
Scale
Tell a story
Intelligence
Data flow
Clean: keep up to date, include new? (Big problem: if the data with the answer is not included, it doesn't matter how smart your data mining is!)
Algorithm magic
Present: communicate, API, portable, feedback loop, etc.
Case Studies
Range of perspectives
Cloudera
Bitly
e-commerce
Range of businesses: infrastructure (Hadoop, Cloudera), business (LinkedIn), e-commerce (Amazon), health, everything (LL, me). Link into data mining.
Cloudera
Jeff Hammerbacher
http://jeffhammerbacher.com/
Video http://www.cloudera.com/?resource=orbitz-ideas-jeff-hammerbacher-evolving-new-analytical-platform-apache-hadoop
Enterprise side Dataspaces
Infrastructure stack
Bitly
Hilary Mason
http://www.hilarymason.com/
Video http://www.youtube.com/watch?v=KWszSUm-x2Y
Links across lots of services
Cross source view of world
Monica Rogati
http://www.linkedin.com/in/mrogati
Video http://www.forbes.com/sites/danwoods/2011/11/27/linkedins-monica-rogati-on-what-is-a-data-scientist/
Core part of product team
e-commerce
eBay.com: keynote Saturday morning
Amazon.com - John Rauser http://www.forbes.com/sites/danwoods/2011/10/07/amazons-john-rauser-on-what-is-a-data-scientist/
Heart of discovery: probability to purchase
Amazon and eBay talk in tomorrow's keynote.
Me OSDS
Vision
Wisdom of Crowds
Big made from small
Yahoo meetup: James Surowiecki's book The Wisdom of Crowds, prediction markets; choices are better when you bet with money. What if you replace betting with money with betting with your life? Need to measure life?
Set a hypothesis and test it. Need curiosity to apply ideas.
Smart on our own, smarter networked?
Only live life in real time
Lots of 'path' already worn
Data Science Summary
Go(ing) mainstream
Wide variety applications
Curiosity gives edge
Next push of the web?
From start-ups to existing businesses, all need the skill set; the education market is adapting to skill up the workplace.
Picture of a cat = curiosity
Data Mining
Types - techniques
Examples: Statistics - Text categorisation - SOM
Summary
Types - Techniques
Granularity
WWW
Blog Post
Specific sentiment
Picture: small, medium, large, showing different levels of granularity of data.
What hypothesis are you trying to test?
Let's go and see what each is useful for.
Statistics
Simple is beautiful
Real time may be the best indicator
Show live site stats
Need to get a screenshot
Text Categorisation
Show me the code
Data lifecycle
Assumptions
Scaling
Got Chrome or Firefox?
Code: open files
Story: show the classes of the data lifecycle: clean, make wise, UI, API, RDF.
Example, choices made: two words, limit 50, frequency. PLAYING GOT image. Assumption: try to crowdsource everything; getting started, restart once started.
Use CouchDB to show the top 50. May change two words, or limit to 100? Trade-off with speed.
We know what the answer will look like? Just getting there.
Not always aware of the choices made: frequency of matching, weights attached.
'Rule': be consistent
Could be better, but it is quantum leaps better than what we have.
Learn by doing, i.e. learn by accident!
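The choices listed in the notes above (two-word phrases, ranked by frequency, limited to the top 50) can be sketched in a few lines of Python. This is only an illustration, not the code opened during the talk, which the transcript does not reproduce. The function name `top_two_word_phrases`, the simple tokenizer and the sample posts are assumptions, and an in-memory counter stands in for the CouchDB view mentioned in the notes.

```python
import re
from collections import Counter


def top_two_word_phrases(texts, limit=50):
    """Count adjacent two-word phrases across all texts; return the most frequent.

    `limit` is the cut-off discussed in the notes (50, or maybe 100):
    a larger limit keeps more phrases at the cost of more work downstream.
    """
    counts = Counter()
    for text in texts:
        # Hypothetical tokenizer: lowercase words, apostrophes kept.
        words = re.findall(r"[a-z']+", text.lower())
        # Pair each word with its successor to form two-word phrases.
        counts.update(zip(words, words[1:]))
    return [(" ".join(pair), n) for pair, n in counts.most_common(limit)]


# Illustrative input only (not data from the talk).
posts = [
    "Liverpool play in red",
    "Everton play in blue",
    "Liverpool used to play in blue",
]
ranked = top_two_word_phrases(posts, limit=3)
for phrase, n in ranked:
    print(phrase, n)
```

Changing "two words" to single words or three-word phrases, or raising the limit, only touches the pairing line and the `limit` argument, which is the speed trade-off the notes refer to.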
'play god slide'
BIG
BIG made from small
SOM - Autonomous learning
SOM: SELF-ORGANISING MAPS
Dr. Andrew Starkey Blue Flow Ltd
Aberdeen University Spin Out
http://www.blue-flow.com/
Liverpool play in red
Liverpool have a red strip
Liverpool used to play in blue
Liverpool in a red strip
Liverpool known for their red strip
Everton play in blue
Everton have a blue strip
Everton known as the bitter blue
Everton have a horrible blue strip
Everton don't like their blue colour
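To show roughly how a self-organising map could group sentences like the ones above without being told the categories, here is a minimal sketch in pure Python. It is not Blue Flow's implementation; the bag-of-words encoding, the one-dimensional map of two nodes, the learning-rate and neighbourhood schedules and every function name are assumptions made for illustration.

```python
import math
import random
import re


def vectorise(sentences):
    """Turn sentences into bag-of-words count vectors over a shared vocabulary."""
    tokens = [re.findall(r"[a-z']+", s.lower()) for s in sentences]
    vocab = sorted({w for t in tokens for w in t})
    return vocab, [[t.count(w) for w in vocab] for t in tokens]


def distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_node(nodes, vector):
    """Index of the node whose weights are closest to the vector."""
    return min(range(len(nodes)), key=lambda i: distance(nodes[i], vector))


def train_som(vectors, n_nodes=2, epochs=50, seed=0):
    """Train a one-dimensional SOM (a line of n_nodes) and return its weights."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    nodes = [[rng.random() for _ in range(dim)] for _ in range(n_nodes)]
    for epoch in range(epochs):
        progress = epoch / epochs
        rate = 0.5 * (1 - progress)                        # learning rate decays
        radius = max(n_nodes / 2 * (1 - progress), 0.1)    # neighbourhood shrinks
        for v in rng.sample(vectors, len(vectors)):        # random visiting order
            winner = best_node(nodes, v)
            for i, node in enumerate(nodes):
                # Nodes near the winner on the map are pulled towards the sample too.
                influence = math.exp(-((i - winner) ** 2) / (2 * radius ** 2))
                for k in range(dim):
                    node[k] += rate * influence * (v[k] - node[k])
    return nodes


sentences = [
    "Liverpool play in red",
    "Liverpool have a red strip",
    "Liverpool used to play in blue",
    "Liverpool in a red strip",
    "Liverpool known for their red strip",
    "Everton play in blue",
    "Everton have a blue strip",
    "Everton known as the bitter blue",
    "Everton have a horrible blue strip",
    "Everton don't like their blue colour",
]
vocab, vectors = vectorise(sentences)
nodes = train_som(vectors)
for sentence, vector in zip(sentences, vectors):
    print(best_node(nodes, vector), sentence)
```

Each sentence is printed with the index of the node it lands on. Which sentences share a node depends on the random start and the schedules, so the grouping should be inspected rather than assumed; "Liverpool used to play in blue" is the kind of sentence that can reasonably fall on either side.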
DM summary
No one technique on its own, but a combination
Future more human
Emergence Platform Cloudspaces PtoP
Personal data (VRM)
Dave Winer: not so much data for and against, but data to be used to make what we need.
PHP - dataweb
40% web CMS leading OS
40% value from data
Evolution of the language - LINQ (.NET) example https://github.com/dahlia/phunctional
Speakers at the conference on the future of the language: our job is to persuade them towards data science, i.e. this direction.
Summary
Data Web here
Personalised start to everything
Society science
Life = Information
Thank you
Q & A
Feedback https://joind.in/4955
http://lanyrd.com/2012/php-uk-conference/sptkm/
Contact
James Littlejohn
@aboynejames [email protected]
+44 7521580938