What is big data? - O'Reilly Remarks_ The Harsh Light...



<ul><li><p>What is big data?</p><p>Big Data's a nebulous term, like cloud computing. It's not really clear what it means.</p></li>
<li><p>A series of emerging technologies to create, manipulate, and manage very large data sets.</p><p>Dan Kusnetzky, What is Big Data?, ZDNet (Feb. 16, 2010), http://www.zdnet.com/blog/virtualization/what-is-big-data/1708.</p><p>Dan Kusnetzky described it as the tools to manage very large amounts of information.</p></li>
<li><p>A movement to bring large-scale data analysis capabilities to the public by providing access to existing data sets, along with the ability to use this data in exciting new ways.</p><p>http://www.data.gov/.</p><p>Data.gov has a more utopian definition.</p></li>
<li><p>Datasets that grow so large that they become awkward to work with using on-hand database management tools.</p><p>http://en.wikipedia.org/wiki/Big_data</p><p>And Wikipedia agrees that new tools are needed.</p></li>
<li><p>My definition.</p><p>Here's my working definition.</p></li>
<li><p>Large amounts of information</p><p>Public and private</p><p>Easily linked and collected</p><p>Stored just because we can</p><p>Analyzed by algorithms</p><p>In near real time</p><p>Applied to business</p><p>Usable by everyone</p><p>Fed back into the system</p><p>Sure, big data is about large amounts of information. But increasingly, that's both public and private: enterprise data warehouses connected to maps, social networks, government data, and so on. In fact, it's because this data has become easily collected (through sensor networks, people, and a computerized society) and can be linked (by someone's email, or a barcode, etc.) that this much data exists. Then, it's about storing stuff just in case, because storage is nearly free. It's about letting machines chew on it to find hidden patterns, and producing results in or near real time. Finally, it's about using this stuff for business. Everyone's a quant. We're much more data-literate than we used to be. 
And finally, all these conclusions feed back into the system as a set of new data, or improvements to the collection tools, or better algorithms.</p></li>
<li><p>Why now?</p></li>
<li><p>[Chart: Revenue, Widgets, Things, and Stuff by year, 2007 to 2010]</p><p>In the past, when we collected information, we had a priori knowledge of how we'd use it. When we put information into databases, we knew what it was for. We knew we were collecting quarterly sales figures by store, by sales rep, and by product. Storage was expensive, data warehouses took time to manage, and what lived in our databases had structure.</p></li>
<li><p>From Elasticsoul on Flickr (http://www.flickr.com/photos/elasticsoul/19940431)</p><p>Big Data, on the other hand, is about unstructured information we collect on faith. We drink from the firehose. We don't know how we'll use it yet. We store it because we think it'll be useful later. We have good reason to think so:</p></li>
<li><p>In 2020 a two-disk, 2.5-inch drive will store over 14 TB </p><p>and will cost $40.</p><p>Magnetic disk areal storage density doubles annually. This estimate assumes that hard drives continue to progress at their current pace; it was first reported on PhysOrg, according to http://en.wikipedia.org/wiki/Mark_Kryder</p><p>It's going to be really cheap. Imagine an iPad that can do BI work. The cost of storing and analyzing data is so low that we often assume we may as well keep everything.</p></li>
<li><p>Why it's an advantage</p></li>
<li><p>Any sufficiently advanced technology is indistinguishable from magic. Arthur C. 
Clarke, Profiles of the Future, 1961 (Clarke's third law)</p></li>
<li><p>Advancement is in the eye of the beholder</p></li>
<li><p>For traditional, non-technical business, big data is magic.</p></li>
<li><p>Nobody is immune.</p></li>
<li><p>http://www.flickr.com/photos/saucysalad/3640865387</p><p>A quick visit to a music store</p></li>
<li><p>http://www.flickr.com/photos/uggboy/4158150814</p><p>a DVD rental outlet, </p></li>
<li><p>http://www.flickr.com/photos/maladjusted/5207565912</p><p>or a travel agent will confirm this, if you can still find one. </p></li>
<li><p>http://www.flickr.com/photos/bobjagendorf/5130753552</p><p>Companies that aren't using data to transform themselves will soon be the walking dead, unable to anticipate their markets. The web's household names got where they are today by mining the information that their users generate and turning it into business advantage.</p></li>
<li><p>Why isn't Blockbuster Netflix? They had data on what people watch and where they live. They just didn't think the postal service was a substitute for retail outlets.</p><p>Big Data has already transformed many industries forever. </p></li>
<li><p>But it's not just about new businesses. It's about doing the boring things in new ways.</p></li>
<li><p>Finance &amp; banking: Quant</p><p>Pharmaceuticals: Genomics expert</p><p>Energy: Quantitative geologist</p><p>Civic planning: Traffic pattern analyst</p><p>National defense: Simulations operator</p><p>...</p><p>Data scientist</p><p>Many industries have had employees who worked with big data for decades. 
But until recently, they thought of themselves in terms of their industry; they didn't realize they were part of a discipline that reached beyond the borders of their specific vertical. Today, we call these people data scientists, and the good ones are able to move between industries easily.</p></li>
<li><p>[Chart: Revenue/Employee (000s) for Amazon (Q4 2010), Barnes &amp; Noble (Q4 2010), Netflix (Q4 2009), Blockbuster (Q4 2009), Dropbox (Q2 2011), and Groupon (Q2 2011)]</p><p>Revenue per capita, augmenting humans with data</p></li>
<li><p>[Chart: Market cap/employee, August 2011, for Amazon, Barnes &amp; Noble, Netflix, Blockbuster, Dropbox, and Groupon]</p><p>Who would you rather be?</p></li>
<li><p>Now, the leader is the one who knows what questions to ask</p></li>
<li><p>(Herbert A. Simon, in Computers, Communications and the Public Interest, pages 40-41, Martin Greenberger, ed., The Johns Hopkins Press, 1971.)</p><p>What information consumes is rather obvious: it consumes the attention of its recipients. Hence a wealth of information creates a poverty of attention, and a need to allocate that attention efficiently among the overabundance of information sources that might consume it.</p><p>Now, the leader is the one who knows what questions to ask.</p></li>
<li><p>What would an MBA look like in a data-filled world?</p></li>
<li><p>Once, a leader convinced others in the absence of data.</p><p>Once, a leader was someone who could convince people to do things in the absence of data.</p></li>
<li><p>Now, a leader knows what questions to ask.</p><p>Now, the leader is the one who knows what questions to ask.</p></li>
<li><p>In an era of technology, we'll be judging companies by their ability to augment people with technology.</p></li>
<li><p>Welcome to JumpStart.</p></li></ul>