What is big data? - O'Reilly Mediaassets.en.oreilly.com/1/event/70/Opening Remarks_ The Harsh...
What is big data?
Big Data’s a nebulous term, like cloud computing. It’s not really clear what it means.
A series of emerging technologies to create, manipulate, and manage “very large data sets.”
Dan Kusnetzky, What is “Big Data?,” ZDNet (Feb. 16, 2010), http://www.zdnet.com/blog/virtualization/what-is-big-data/1708.
Dan Kusnetzky described it as the tools to manage very large amounts of information.
A movement to bring large-scale data analysis capabilities to the public by providing access to existing data sets, along with the ability to use this data in exciting new ways.
http://www.data.gov/.
Data.gov has a more utopian definition.
Datasets that grow so large that they become awkward to work with using on-hand database management tools.
http://en.wikipedia.org/wiki/Big_data
And Wikipedia agrees on this idea of “new tools needed.”
My definition.
Here’s my working definition
Large amounts of information
Public and private
Easily linked and collected
Stored just because we can
Analyzed by algorithms
In near real time
Applied to business
Usable by everyone
Fed back into the system
Sure, big data is about large amounts of information. But increasingly, that’s both public and private: enterprise data warehouses connected to maps, social networks, government data, and so on. In fact, it’s because this data has become easily collected (through sensor networks, people, and a computerized society) and can be linked (by someone’s email, or a barcode, etc.) that this much data exists.
Then, it’s about storing stuff just in case, because storage is free. It’s about letting machines chew on it to find hidden patterns, and producing results in or near real time.
Next, it’s about using this stuff for business. Everyone’s a quant. We’re much more data-literate than we were. And finally, all these conclusions feed back into the system as a set of new data, or improvements to the collection tools, or better algorithms.
Why now?
[Chart: 2007 to 2010; labels: Revenue, Widgets, Things, Stuff]
In the past, when we collected information, we had a priori knowledge of how we’d use it. When we put information into databases, we knew what it was for. We knew we were collecting quarterly sales figures by store, by sales rep, and by product. Storage was expensive, data warehouses took time to manage, and what lived in our databases had structure.
From Elasticsoul on Flickr (http://www.flickr.com/photos/elasticsoul/19940431)
Big Data, on the other hand, is about unstructured information we collect on faith. We drink from the firehose. We don’t know how we’ll use it yet. We store it because we think it’ll be useful later. We have good reason to think so:
In 2020 a two-disk, 2.5” drive will store over 14 TB
and will cost $40.
Magnetic disk areal storage density doubles annually. This estimate assumes that hard drives continue to progress at their current pace; it was first reported by PhysOrg, according to http://en.wikipedia.org/wiki/Mark_Kryder
It’s going to be really cheap. Imagine an iPad that can do BI work. So the cost of storing and analyzing is so low, we often assume we may as well keep everything.
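The doubling claim is just compound growth, and it's easy to play with. Here's a minimal sketch in Python, assuming a hypothetical starting capacity and a yearly growth factor you choose yourself. The function names are illustrative (they come from no library), and the inputs in the usage lines are made up for the example, not a forecast.

```python
def projected_capacity_tb(start_tb: float, years: int, annual_growth: float) -> float:
    """Compound a starting capacity forward by a fixed yearly growth factor.

    annual_growth=2.0 means "doubles every year" (an assumption, not a law).
    """
    if years < 0:
        raise ValueError("years must be non-negative")
    return start_tb * annual_growth ** years


def years_to_reach(start_tb: float, target_tb: float, annual_growth: float) -> int:
    """Count whole years of compounding until capacity meets or exceeds target_tb."""
    if start_tb <= 0 or annual_growth <= 1:
        raise ValueError("need a positive start and a growth factor above 1")
    years = 0
    capacity = start_tb
    while capacity < target_tb:
        capacity *= annual_growth
        years += 1
    return years


if __name__ == "__main__":
    # Hypothetical inputs: a 1 TB drive whose capacity doubles each year.
    print(projected_capacity_tb(1.0, 3, 2.0))
    print(years_to_reach(1.0, 14.0, 2.0))
```

Try a slower growth factor (say 1.4) and the same target takes noticeably longer to reach, which is why the "current pace" assumption matters so much.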
Why it’s an advantage
“Any sufficiently advanced technology is indistinguishable from magic.”
Arthur C. Clarke, Profiles of The Future, 1961 (Clarke’s third law)
Advancement is in the eye of the beholder
For traditional, non-technical business, big data is magic.
Nobody is immune.
http://www.flickr.com/photos/saucysalad/3640865387
A quick visit to a music store
http://www.flickr.com/photos/uggboy/4158150814
a DVD rental outlet,
http://www.flickr.com/photos/maladjusted/5207565912
or a travel agent will confirm this—if you can still find one.
http://www.flickr.com/photos/bobjagendorf/5130753552
Companies that aren’t using data to transform themselves will soon be the walking dead, unable to anticipate their markets. The web’s household names got where they are today by mining the information that their users generate and turning it into business advantage.
Why isn’t Blockbuster Netflix? They had data on what people watch and where they live. They just didn’t think the postal service was a substitute for retail outlets.
Big Data has already transformed many industries forever.
But it’s not just about new businesses. It’s about doing the boring things in new ways.
Finance & banking: “Quant”
Pharmaceuticals: Genomics expert
Energy: Quantitative geologist
Civic planning: Traffic pattern analyst
National defense: Simulations operator
...
“Data scientist”
Many industries have had employees who worked with big data for decades. But until recently, they thought of themselves in terms of their industry. They didn’t realize they were part of a discipline that reached beyond the borders of their specific vertical. Today, we call these people “data scientists,” and the good ones are able to move between industries easily.
[Chart: Revenue/Employee (000s) for Amazon Q410, Barnes & Noble Q410, Netflix Q409, Blockbuster Q409, Dropbox Q211, Groupon Q211]
Revenue per capita, augmenting humans with data
[Chart: Market cap/employee, August 2011, for Amazon, Barnes & Noble, Netflix, Blockbuster, Dropbox, Groupon]
Who would you rather be?
Now, the leader is the one who knows what questions to ask
“What information consumes is rather obvious: it consumes the attention of its recipients. Hence a wealth of information creates a poverty of attention, and a need to allocate that attention efficiently among the overabundance of information sources that might consume it.”
Herbert A. Simon, in Computers, Communications and the Public Interest, pages 40-41, Martin Greenberger, ed., The Johns Hopkins Press, 1971.
What would an MBA look like in a data-filled world?
Once, a leader convinced others in the absence of data.
Once, a leader was someone who could convince people to do things in the absence of data.
Now, a leader knows what questions to ask.
In an era of technology we'll be judging companies by their ability to augment people with technology.
Welcome to JumpStart.