Two Years of Big Data (Zwei Jahre Big Data)
![Page 1: Zwei jahrebigdata](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c65ae74a795940598b4580/html5/thumbnails/1.jpg)
Big Data: The Second Year.
Joerg Blumtritt
![Page 2: Zwei jahrebigdata](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c65ae74a795940598b4580/html5/thumbnails/2.jpg)
![Page 3: Zwei jahrebigdata](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c65ae74a795940598b4580/html5/thumbnails/3.jpg)
![Page 4: Zwei jahrebigdata](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c65ae74a795940598b4580/html5/thumbnails/4.jpg)
![Page 5: Zwei jahrebigdata](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c65ae74a795940598b4580/html5/thumbnails/5.jpg)
![Page 6: Zwei jahrebigdata](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c65ae74a795940598b4580/html5/thumbnails/6.jpg)
The Future of Market Research
![Page 7: Zwei jahrebigdata](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c65ae74a795940598b4580/html5/thumbnails/7.jpg)
![Page 8: Zwei jahrebigdata](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c65ae74a795940598b4580/html5/thumbnails/8.jpg)
Hardware
Traditional
• exotic hardware
• big central servers
• SAN
• RAID
• hardware reliability
• expensive
• limited scalability
Big Data
• commodity hardware
• racks of pizza boxes
• Ethernet
• JBOD
• unreliable hardware
• cost effective
• scales further
![Page 9: Zwei jahrebigdata](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c65ae74a795940598b4580/html5/thumbnails/9.jpg)
Software
Traditional
• monolithic
• centralized storage
• RDBMS
• schema first
• proprietary
Big Data
• distributed
• storage & compute nodes
• raw data
• open source
![Page 10: Zwei jahrebigdata](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c65ae74a795940598b4580/html5/thumbnails/10.jpg)
Quantification
Volume, Velocity, Variety
Data Science
![Page 11: Zwei jahrebigdata](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c65ae74a795940598b4580/html5/thumbnails/11.jpg)
1. Volume
– Very large data sets
– Data Center → Data Warehouse → Internet Scale
– Typical dimensions: billions or trillions of records, millions or billions of variables
– e.g. Twitter: > 400 M Tweets per day
– Technologies: MapReduce, HDFS, Project Voldemort
... the first V
![Page 12: Zwei jahrebigdata](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c65ae74a795940598b4580/html5/thumbnails/12.jpg)
Map-Reduce
http://hadoop.apache.org/docs/r0.18.3/mapred_tutorial.html#Example%3A+WordCount+v2.0
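The WordCount example linked above is the usual first MapReduce program. As a rough single-process sketch of the same idea (not Hadoop's Java API; the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are chosen here for illustration), the three stages could look like this in Python:

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would do."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence on an iterable of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
```

On a cluster the map and reduce calls would run on many nodes in parallel and the shuffle would move data over the network; here everything runs in one process to keep the structure visible.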
![Page 13: Zwei jahrebigdata](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c65ae74a795940598b4580/html5/thumbnails/13.jpg)
1. Volume
2. Velocity
– Very fast data streams
– Sensor data, smartphones, social media
– Typical dimensions: 15k-300k/s
– Real-time inputs / real-time outputs
– Stream/event processing
– Technologies: Storm, S4, Esper, HBase, Kafka
... the second V
![Page 14: Zwei jahrebigdata](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c65ae74a795940598b4580/html5/thumbnails/14.jpg)
Storm
http://engineering.twitter.com/2011/08/storm-is-coming-more-details-and-plans.html
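Stream processing keeps a result up to date as each event arrives instead of recomputing it over stored data. The following is a minimal sketch of that idea, a sliding-window counter per key. It does not use Storm's API; the class name `SlidingWindowCounter` is hypothetical, and it assumes events arrive with non-decreasing timestamps.

```python
from collections import Counter, deque


class SlidingWindowCounter:
    """Counts events per key over the last `window_seconds` of stream time."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()   # (timestamp, key) pairs in arrival order
        self.counts = Counter()

    def add(self, timestamp, key):
        """Ingest one event, then expire everything older than the window."""
        self.events.append((timestamp, key))
        self.counts[key] += 1
        self._expire(timestamp)

    def _expire(self, now):
        # Assumes non-decreasing timestamps, so the oldest event is at the left.
        while self.events and self.events[0][0] <= now - self.window_seconds:
            _, old_key = self.events.popleft()
            self.counts[old_key] -= 1
            if self.counts[old_key] == 0:
                del self.counts[old_key]

    def top(self, n=3):
        """Return the n most frequent keys in the current window."""
        return self.counts.most_common(n)
```

Each `add` call does a small, bounded amount of work, which is what lets such a counter keep pace with a fast input stream.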
![Page 15: Zwei jahrebigdata](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c65ae74a795940598b4580/html5/thumbnails/15.jpg)
1. Volume
2. Velocity
3. Variety / Variability
– Manifold and highly variable data structures
– Data marketplaces, e.g. Datasift, GNIP, Enigma.io
– No schema / NoSQL
– Distributed storage
– Immutability
... and the last V
![Page 16: Zwei jahrebigdata](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c65ae74a795940598b4580/html5/thumbnails/16.jpg)
{"created_at":"Sat Apr 13 08:07:34 +0000 2013", "id":322984390491774976, "id_str":"322984390491774976", "text":"getr\u00e4umt, ich h\u00e4tte \u00fcber den Skandal geblogt, dass wir immernoch geschirrsp\u00fchlen, genau wie zu Car\u00eames Zeiten.", "source":"\u003ca href=\"http:\/\/twitter.com\/download\/android\" rel=\"nofollow\"\u003eTwitter for Android\u003c\/a\u003e", "truncated":false, "in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":10177792,"id_str":"10177792", "name":"Joerg Blumtritt", "screen_name":"jbenno", "location":"Stockdorf", "url":"http:\/\/slow-media.net", "description":"I just coined the word panfuturistic because it sounds cool. http:\/\/memeticturn.com\/declaration-of-liquid-culture", "protected":false,"followers_count":2671,"friends_count":1599,"listed_count":141,"created_at":"Mon Nov 12 11:16:15 +0000 2007", "favourites_count":3582,"utc_offset":3600, "time_zone":"Berlin", "geo_enabled":true,"verified":false,"statuses_count":30140,"lang":"en", "contributors_enabled":false,"is_translator":false,"profile_background_color":"FFFFFF", "profile_background_image_url":"http:\/\/a0.twimg.com\/profile_background_images\/816896285\/688fcbc8df9391dfd71012d06ca34002.jpeg", "profile_background_image_url_https":"https:\/\/si0.twimg.com\/profile_background_images\/816896285\/688fcbc8df9391dfd71012d06ca34002.jpeg", "profile_background_tile":false,"profile_image_url":"http:\/\/a0.twimg.com\/profile_images\/3315156408\/db719e7db02772e468179545fb06e7f9_normal.jpeg", "profile_image_url_https":"https:\/\/si0.twimg.com\/profile_images\/3315156408\/db719e7db02772e468179545fb06e7f9_normal.jpeg", "profile_banner_url":"https:\/\/si0.twimg.com\/profile_banners\/10177792\/1365261531", "profile_link_color":"0000FF", "profile_sidebar_border_color":"FFFFFF", "profile_sidebar_fill_color":"E0FF92", "profile_text_color":"000000", 
"profile_use_background_image":true,"default_profile":false,"default_profile_image":false, "following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweet_count":0,"favorite_count":0,"favorited":false,"retweeted":false,"lang":"de"}
![Page 17: Zwei jahrebigdata](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c65ae74a795940598b4580/html5/thumbnails/17.jpg)
Instead of fixing the consistency of the data up front in the structure, a function is defined that checks every record against the given criteria:
function IsConsistent(Record, Schema) as Boolean
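The signature above only names the idea. One possible reading, sketched in Python, treats the schema as a mapping from field name to a check function. That layout, the name `TWEET_SCHEMA` and its three rules are assumptions made for illustration; the slide does not specify them.

```python
def is_consistent(record, schema):
    """Return True if `record` passes every check in `schema`.

    `schema` maps a field name to a predicate. This layout is an
    assumption for illustration, not something the slide defines.
    """
    for field, check in schema.items():
        if field not in record or not check(record[field]):
            return False
    return True


# Hypothetical criteria for a tweet record like the one shown earlier.
TWEET_SCHEMA = {
    "id": lambda v: isinstance(v, int) and v > 0,
    "text": lambda v: isinstance(v, str) and len(v) > 0,
    "lang": lambda v: isinstance(v, str) and len(v) == 2,
}
```

Fields that the schema does not mention are ignored, so records can gain new attributes without breaking the check. The raw data stays untouched and the criteria can change later without a migration.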
![Page 18: Zwei jahrebigdata](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c65ae74a795940598b4580/html5/thumbnails/18.jpg)
| Operation | SQL |
| --- | --- |
| Create | INSERT |
| Read (Retrieve) | SELECT |
| Update (Modify) | UPDATE |
| Delete (Destroy) | DELETE |

"mutable"
"Each event happens at a particular time and is always true"
• Just Create + Read (C+R); nothing ever gets "updated"
• Records are stored as files. Each record is a new file.
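The two points above can be sketched as a small append-only store. This is a minimal illustration under assumed conventions (one JSON file per record, random file names, a `timestamp` field); the class name `ImmutableStore` is hypothetical and no particular product is implied.

```python
import json
import tempfile
import time
import uuid
from pathlib import Path


class ImmutableStore:
    """Append-only store: every record becomes a new file; no update, no delete."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def create(self, event, timestamp=None):
        """Write one event as a new file and return its path."""
        record = {
            "timestamp": time.time() if timestamp is None else timestamp,
            "event": event,
        }
        path = self.root / f"{uuid.uuid4().hex}.json"
        # Mode "x" fails if the file already exists, so nothing is overwritten.
        with open(path, "x", encoding="utf-8") as fh:
            json.dump(record, fh)
        return path

    def read_all(self):
        """Return every stored record, oldest first."""
        records = [
            json.loads(p.read_text(encoding="utf-8"))
            for p in self.root.glob("*.json")
        ]
        return sorted(records, key=lambda r: r["timestamp"])


# Usage: a changed follower count is a second event, not an update of the first.
store = ImmutableStore(tempfile.mkdtemp())
store.create({"user": "jbenno", "followers": 2671}, timestamp=1)
store.create({"user": "jbenno", "followers": 2672}, timestamp=2)
```

The current value is derived by reading the events and taking the latest one, so the full history stays available.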
"immutable"
![Page 19: Zwei jahrebigdata](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c65ae74a795940598b4580/html5/thumbnails/19.jpg)
[Diagram labels: Data Stream; All Data; Precomputed View (Batch Mode); Precomputed realtime view; Query]
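The diagram itself is not preserved in this transcript. Assuming it shows the common pattern in which a query combines a batch view computed over all data with a realtime view fed by the data stream, a minimal sketch could look as follows. The function and class names, and the `topic` field, are hypothetical.

```python
from collections import Counter


def build_batch_view(all_data):
    """Batch mode: recompute the view from scratch over all stored records."""
    return Counter(record["topic"] for record in all_data)


class RealtimeView:
    """Incrementally updated view over records the batch run has not seen yet."""

    def __init__(self):
        self.counts = Counter()

    def update(self, record):
        """Fold one record from the data stream into the view."""
        self.counts[record["topic"]] += 1


def query(topic, batch_view, realtime_view):
    """Answer a query by merging the batch view with the realtime view."""
    return batch_view[topic] + realtime_view.counts[topic]
```

In this reading the batch view is slow to build but complete, and the realtime view only has to cover the short gap since the last batch run.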
![Page 20: Zwei jahrebigdata](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c65ae74a795940598b4580/html5/thumbnails/20.jpg)
Quantification
Volume, Velocity, Variety
Data Science
![Page 21: Zwei jahrebigdata](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c65ae74a795940598b4580/html5/thumbnails/21.jpg)
known knowns, known unknowns, unknown unknowns
"data puking" (dashboards)
"analysis throwing" (modelling)
"data democracy" (Big Data)
Avinash Kaushik
"As we know, there are known knowns. There are things we know we know. We also know there are known unknowns. That is to say: we know there are some things we do not know. But there are also unknown unknowns, the ones we don't know we don't know."
Donald Rumsfeld
![Page 22: Zwei jahrebigdata](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c65ae74a795940598b4580/html5/thumbnails/22.jpg)
Data Science
![Page 23: Zwei jahrebigdata](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c65ae74a795940598b4580/html5/thumbnails/23.jpg)
![Page 24: Zwei jahrebigdata](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c65ae74a795940598b4580/html5/thumbnails/24.jpg)
![Page 25: Zwei jahrebigdata](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c65ae74a795940598b4580/html5/thumbnails/25.jpg)
• Text comparison of party programmes
• Cosine distance between vectors
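Cosine distance compares two texts by the angle between their term vectors: for vectors $a$ and $b$ the similarity is $\frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}$ and the distance is one minus that value. A minimal sketch with plain word counts follows. The whitespace tokenizer and the function names are simplifying assumptions; the slides do not say how the programmes were preprocessed or weighted.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term frequencies; a deliberately naive tokenizer."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cosine_distance(text_a, text_b):
    """0.0 for identical word distributions, 1.0 for no shared words."""
    return 1.0 - cosine_similarity(term_vector(text_a), term_vector(text_b))
```

Because the measure depends only on the direction of the vectors, a long and a short programme with the same word proportions come out as close to each other.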
![Page 26: Zwei jahrebigdata](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c65ae74a795940598b4580/html5/thumbnails/26.jpg)
[Chart: counts (0 to 1500) by hour of day for Fri 8 March, Sat 9 March and Sun 10 March; series: DSDS, Tatort]
![Page 27: Zwei jahrebigdata](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c65ae74a795940598b4580/html5/thumbnails/27.jpg)
Persona
http://twitter.com/FlaviaReil/statuses/308321057499144193
http://twitter.com/froschmann1968/statuses/308321920200364034
http://twitter.com/VeronikaTangen/statuses/308322141676388352
http://twitter.com/froschmann1968/statuses/308322188501602304
http://twitter.com/QWallyTy/statuses/308322522863128576
http://twitter.com/Duftlavendel/statuses/308322911444406272
http://twitter.com/kakakiri/statuses/308323144836456448
http://twitter.com/Chake/statuses/308323468179566592
http://twitter.com/RegulaAeppli/statuses/308323570386350083
http://twitter.com/Imissmycat1/statuses/308323602342764544
http://twitter.com/WorldNewsGerman/statuses/308323834749140995
http://twitter.com/Zoran2010/statuses/308324446035386368
![Page 28: Zwei jahrebigdata](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c65ae74a795940598b4580/html5/thumbnails/28.jpg)
[Chart legend: male, female, n.a.]
![Page 29: Zwei jahrebigdata](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c65ae74a795940598b4580/html5/thumbnails/29.jpg)
http://www.jasondavies.com/parallel-sets/
http://www.nytimes.com/interactive/2012/05/17/business/dealbook/how-the-facebook-offering-compares.html?_r=0
http://www.senchalabs.org/philogl/PhiloGL/examples/winds/
![Page 30: Zwei jahrebigdata](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c65ae74a795940598b4580/html5/thumbnails/30.jpg)
Quantification
Volume, Velocity, Variety
Data Science
D3
![Page 31: Zwei jahrebigdata](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c65ae74a795940598b4580/html5/thumbnails/31.jpg)
![Page 32: Zwei jahrebigdata](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c65ae74a795940598b4580/html5/thumbnails/32.jpg)
![Page 33: Zwei jahrebigdata](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c65ae74a795940598b4580/html5/thumbnails/33.jpg)
![Page 34: Zwei jahrebigdata](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c65ae74a795940598b4580/html5/thumbnails/34.jpg)
Quantified Self
![Page 35: Zwei jahrebigdata](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c65ae74a795940598b4580/html5/thumbnails/35.jpg)
![Page 36: Zwei jahrebigdata](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c65ae74a795940598b4580/html5/thumbnails/36.jpg)
![Page 37: Zwei jahrebigdata](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c65ae74a795940598b4580/html5/thumbnails/37.jpg)
![Page 38: Zwei jahrebigdata](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c65ae74a795940598b4580/html5/thumbnails/38.jpg)
![Page 39: Zwei jahrebigdata](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c65ae74a795940598b4580/html5/thumbnails/39.jpg)
![Page 40: Zwei jahrebigdata](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c65ae74a795940598b4580/html5/thumbnails/40.jpg)
![Page 41: Zwei jahrebigdata](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c65ae74a795940598b4580/html5/thumbnails/41.jpg)
![Page 42: Zwei jahrebigdata](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c65ae74a795940598b4580/html5/thumbnails/42.jpg)
![Page 43: Zwei jahrebigdata](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c65ae74a795940598b4580/html5/thumbnails/43.jpg)
![Page 44: Zwei jahrebigdata](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c65ae74a795940598b4580/html5/thumbnails/44.jpg)
![Page 45: Zwei jahrebigdata](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c65ae74a795940598b4580/html5/thumbnails/45.jpg)
![Page 46: Zwei jahrebigdata](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c65ae74a795940598b4580/html5/thumbnails/46.jpg)
"Digital Darwinism is the evolution of consumer behavior when society & technology evolve faster than the ability to adapt."
Brian Solis
![Page 47: Zwei jahrebigdata](https://reader033.fdocuments.net/reader033/viewer/2022051515/54c65ae74a795940598b4580/html5/thumbnails/47.jpg)
{"name": "Joerg Blumtritt",
 "jobs": [
  {"title": "Strategy Consultant", "startdate": "2005", "enddate": null},
  {"title": "Chairman", "company": "Arbeitsgemeinschaft Social Media e.V.", "startdate": "2008", "enddate": null}],
 "email": "[email protected]", "twitter": "@jbenno",
 "blogs": ["http://beautifuldata.net", "http://slow-media.net", "http://kuirjeo.net", "http://memeticturn.net"],
 "website": "http://mediagnosis.de",
 "image": "http://slow-media.net/wp-content/uploads/jb_creeper.jpg",
 "bio": "http://beautifuldata.net/Joerg-blumtritt/"
}