Big Data Web, Science, Mining

PHPUK

Data Web, Data Science, Data Mining

BIG

Welcome, and thanks for the invite. Background: assumed you have read my profile.

First talk. As an entrepreneur, being thrown in at the deep end is always good: it makes sure you learn to swim fast.

Agenda

BIG DATA WEB

BIG DATA SCIENCE

BIG DATA MINING

SUMMARY

10 min: set the scene

10 min: what is data science, plus a review of characters in the industry: what they are saying, what is being learnt, open source

20 min: hands-on code

10 min: Q&A

.com

It started with a dot; physicists will tell you it was a big bang!

The data story began. The .com era was commercial: transaction focus, e-commerce automation, mechanical.

After the burst, the thinking continued along a continuum to Web 2.0.

Web2.0

Moore's Law

Economics

Bandwidth

SOCIAL: open/share

Send it to friends and family; share; the openness story; build applications.

Infrastructure economics. Pace of data bandwidth.

Facebook's Zuckerberg: the open/share ratio is growing faster than Moore's Law.

What happens as we cycle through this and it speeds up? Data web, Web 2.0 squared, Web 3.0 . . .

Web 2.0 + mobile

Cloud computing

Data Science

Data web

Open and share accelerate (privacy debate: won't go there).

How is cloud different from Moore's Law? It is that plus more: Hadoop, more to go in the cloud; don't want to pay per hour, want what I need; NoSQL, data portability, etc.

Data science- what does it all mean?

In practice

Existing data

Always working

Every webpage personalized

DataWeb Summary

Data - expanding at a fast rate

Economic free cloud

Personalization real time

Science applied to society

Recap and draw conclusions

Data Science

What is data science?

Data lifecycle

Case studies

Like the state of physics in 1800 (Chairman, Google)

Community rallying around data science: Strata Conf, Structure, local meetups

How does data live?

Characters in the industry I've been reading about; useful links to posts to get started.

What is data science?

Combines three areas

Engineering

Mathematics, Statistics, ML

Communication: PowerPoint to infographic, product, API

Add a three-hats graphic: yellow hard hat, professor's hat and marketing hat! Dave McClure!

Data lifecycle

Comes from?

Data conditioning

Scale

Tell a story

Intelligence

Data flow

Clean: keep up to date, include new? (Big problem: if the data with the answer is not included, it doesn't matter how smart your DM is!)

Algorithm magic

Present: communicate, API, portable, feedback loop, etc.
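The lifecycle stages in these notes (data flow in, clean, algorithm, present) can be sketched as a chain of small functions. This is only an illustration under stated assumptions: the function names, the sample records and the frequency count standing in for the "algorithm magic" step are all hypothetical and not taken from the talk.

```python
import json
from collections import Counter

def collect():
    # Data flow: hypothetical raw records as they might arrive, mess included.
    return [" Red ", "blue", "", "RED", None, "Blue", "red"]

def clean(records):
    # Conditioning: drop empty or missing values, trim and lower-case the rest.
    return [r.strip().lower() for r in records if r and r.strip()]

def analyse(records):
    # "Algorithm magic" stand-in (an assumption): a plain frequency count.
    return Counter(records)

def present(summary):
    # Present: serialise the result so an API or UI could consume it.
    return json.dumps(dict(summary.most_common()), indent=2)

report = present(analyse(clean(collect())))
print(report)
```

If a record never reaches collect(), no later stage can recover it, which is the "big problem" flagged in the note above.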

Case Studies

Range of perspectives

Cloudera

Bitly

LinkedIn

e-commerce

Range of businesses: infrastructure (Hadoop, Cloudera), business (LinkedIn), e-commerce (Amazon), health, everything; LL, me. Link into data mining.

Cloudera

Jeff Hammerbacher

http://jeffhammerbacher.com/

Video http://www.cloudera.com/?resource=orbitz-ideas-jeff-hammerbacher-evolving-new-analytical-platform-apache-hadoop

Enterprise side Dataspaces

Infrastructure stack

Bitly

Hilary Mason

http://www.hilarymason.com/

Video http://www.youtube.com/watch?v=KWszSUm-x2Y

Links across lots of services

Cross source view of world

LinkedIn

Monica Rogati

http://www.linkedin.com/in/mrogati

Video http://www.forbes.com/sites/danwoods/2011/11/27/linkedins-monica-rogati-on-what-is-a-data-scientist/

Core part of product team

e-commerce

eBay.com keynote Saturday morning

Amazon.com - John Rauser http://www.forbes.com/sites/danwoods/2011/10/07/amazons-john-rauser-on-what-is-a-data-scientist/

Heart of discovery - probability to purchase

Amazon and eBay talk in tomorrow's keynote

Me OSDS

Vision

Wisdom of Crowds

Big made from small

Yahoo meetup. James Surowiecki's book The Wisdom of Crowds: prediction markets; choices are better when you bet with money. What if we replace betting with money with betting with your life? Need to measure life?
Set a hypothesis test. Need curiosity to apply ideas.

Smart on our own, smarter networked?

Only live life in real time

Lots of 'path' already worn

Data Science Summary

Go(ing) mainstream

Wide variety applications

Curiosity gives edge

Next push of the web?

From start-ups to existing businesses, everyone needs the skill set; the education market is adapting to skill up the workplace.

Picture of a cat = curiosity

Data Mining

Types - techniques

Examples: Statistics - Text categorisation - SOM

Summary

Types - Techniques

Granularity

WWW

Blog Post

Specific Sentiment

Picture: small, medium, large, showing different levels of data granularity

What hypothesis are you trying to test?

Let's go and see what each is useful for.

Statistics

Simple is beautiful

Real time may be the best indicator

Show live site stats

Need to get screen shot

Text Categorisation

Show me the code

Data lifecycle

Assumptions

Scaling

Got Chrome or Firefox?

Code: open files

Story: show the class for the data lifecycle: clean, make wise, UI, API, RDF

Example of choices made: two words, limit 50, frequency. PLAYING GOT image. Assumption: try to crowdsource everything; getting started, restart once started. Use CouchDB to show the top 50. May change two words, or the limit to 100? Trade-off with speed.
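The "two words, limit 50, frequency" choices can be sketched as counting adjacent word pairs and keeping the most frequent ones. This is a minimal sketch and not the code shown in the talk: the function name, the tokenising rule and the sample posts are assumptions, and an in-memory Counter stands in for the CouchDB view mentioned in the notes.

```python
import re
from collections import Counter

def top_two_word_phrases(texts, limit=50):
    """Count adjacent word pairs across all texts; return the most frequent."""
    counts = Counter()
    for text in texts:
        # Cleaning choice (an assumption): lower-case, letters and apostrophes only.
        words = re.findall(r"[a-z']+", text.lower())
        # Pair each word with its successor to form two-word phrases.
        counts.update(zip(words, words[1:]))
    return [(" ".join(pair), n) for pair, n in counts.most_common(limit)]

# Hypothetical sample posts, for illustration only.
posts = [
    "Liverpool play in red",
    "Liverpool have a red strip",
    "Everton play in blue",
    "Everton have a blue strip",
]

top = top_two_word_phrases(posts, limit=3)
for phrase, count in top:
    print(count, phrase)
```

Raising the limit to 100 or moving from two words to three changes only the arguments and the pairing step; the cost is more phrases to count and store, which is the speed trade-off noted above.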

We know what the answer will look like? Just getting there.

Not always aware of the choices made: frequency of matching, weights attached

'Rule' be consistent

Could be better, but it is a quantum leap better than what we have

Learn by doing, i.e. learn by accident!

'play god slide'

BIG

BIG made from small

SOM - Autonomous learning

SOM SELF ORGANISING MAPS

Dr. Andrew Starkey Blue Flow Ltd

Aberdeen University Spin Out

http://www.blue-flow.com/

Liverpool play in red

Liverpool have a red strip

Liverpool used to play in blue

Liverpool in a red strip

Liverpool known for their red strip

Everton play in blue

Everton have a blue strip

Everton known as the bitter blue

Everton have a horrible blue strip

Everton don't like their blue colour
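To make the idea concrete, the ten sentences above can be fed to a very small self-organising map. What follows is a generic one-dimensional Kohonen-style sketch in plain Python, not Blue Flow's implementation: the bag-of-words encoding, the four-node map, the learning rate, the epoch count and the random seed are all assumptions chosen for illustration.

```python
import math
import random

SENTENCES = [
    "Liverpool play in red",
    "Liverpool have a red strip",
    "Liverpool used to play in blue",
    "Liverpool in a red strip",
    "Liverpool known for their red strip",
    "Everton play in blue",
    "Everton have a blue strip",
    "Everton known as the bitter blue",
    "Everton have a horrible blue strip",
    "Everton don't like their blue colour",
]

def vectorise(sentences):
    # Bag-of-words: one dimension per distinct word, 1.0 if the word is present.
    tokenised = [s.lower().split() for s in sentences]
    vocab = sorted({w for words in tokenised for w in words})
    vectors = [[1.0 if w in words else 0.0 for w in vocab] for words in tokenised]
    return vocab, vectors

def best_matching_unit(weights, vector):
    # The node whose weight vector is closest (squared Euclidean) to the input.
    def dist2(node):
        return sum((a - b) ** 2 for a, b in zip(weights[node], vector))
    return min(range(len(weights)), key=dist2)

def train_som(vectors, n_nodes=4, epochs=200, seed=0):
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_nodes)]
    for epoch in range(epochs):
        frac = epoch / epochs
        rate = 0.5 * (1.0 - frac)                          # learning rate decays
        radius = max(0.5, (n_nodes / 2.0) * (1.0 - frac))  # neighbourhood shrinks
        for vector in rng.sample(vectors, len(vectors)):   # shuffled pass
            bmu = best_matching_unit(weights, vector)
            for node in range(n_nodes):
                # Nodes near the winner on the 1-D map are pulled along with it.
                influence = math.exp(-((node - bmu) ** 2) / (2.0 * radius ** 2))
                for i in range(dim):
                    weights[node][i] += rate * influence * (vector[i] - weights[node][i])
    return weights

vocab, vectors = vectorise(SENTENCES)
weights = train_som(vectors)
mapping = {s: best_matching_unit(weights, v) for s, v in zip(SENTENCES, vectors)}
for sentence, node in mapping.items():
    print(node, sentence)
```

No labels are supplied: sentences that share words tend to land on the same or neighbouring nodes. A sentence such as "Liverpool used to play in blue" mixes vocabulary from both groups, so where it lands depends on the settings.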

DM summary

No one technique on its own, but a combination

Future more human

Emergence Platform Cloudspaces PtoP

Personal data (VRM)

Dave Winer: not so much data for and against; it is to be used to make what we need.

PHP - dataweb

40% web CMS leading OS

40% value from data

Evolution of language - LINQ .NET example https://github.com/dahlia/phunctional

Speakers at the conference on the future of the language; our job is to persuade them towards data science, i.e. this direction.

Summary

Data Web here

Personalised start to everything

Society science

Life = Information

Thank you

Q & A

Feedback https://joind.in/4955

http://lanyrd.com/2012/php-uk-conference/sptkm/

Contact

James Littlejohn

@aboynejames

+44 7521580938