InfoShare2015
![Page 1: InfoShare2015](https://reader033.fdocuments.net/reader033/viewer/2022052915/58f085e61a28abfd438b4629/html5/thumbnails/1.jpg)
© 2013 Acxiom Corporation. All Rights Reserved.
How to tame an elephant in a corporation: Hadoop in practice
11.06.2015 – Jakub Wszolek ([email protected])
twitter.com/jwszol
![Page 2: InfoShare2015](https://reader033.fdocuments.net/reader033/viewer/2022052915/58f085e61a28abfd438b4629/html5/thumbnails/2.jpg)
NoSQL
![Page 3: InfoShare2015](https://reader033.fdocuments.net/reader033/viewer/2022052915/58f085e61a28abfd438b4629/html5/thumbnails/3.jpg)
The data era

| Name | Data volume |
| --- | --- |
| New York Stock Exchange | 1 TB of new data per day |
| Ancestry.com (genealogy site) | 2.5 PB of data |
| Facebook | 1 PB of data |
| Allegro.pl | Auction number 1,600,000,000 (8 May 2011) |
![Page 4: InfoShare2015](https://reader033.fdocuments.net/reader033/viewer/2022052915/58f085e61a28abfd438b4629/html5/thumbnails/4.jpg)
Job market
![Page 5: InfoShare2015](https://reader033.fdocuments.net/reader033/viewer/2022052915/58f085e61a28abfd438b4629/html5/thumbnails/5.jpg)
Hadoop
• Hadoop framework
• Practically unlimited scalability
• Distributed environment
• Fast analysis of large data volumes
• Dedicated applications
• trend discovery
• statistical analysis
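The slide does not name a programming model, so the following is only an illustrative sketch of the map, shuffle, and reduce phases that classic Hadoop MapReduce jobs follow, run in a single Python process. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are hypothetical and are not taken from the talk.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as happens between the map and reduce phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in-process; on a cluster the work is spread across nodes."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    sample = ["big data big cluster", "data node"]  # hypothetical input
    print(word_count(sample))
```

Because each map call sees only one line and each reduce call sees only one key, both phases can in principle run on many machines at once, which is where the scalability mentioned on the slide comes from.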
![Page 6: InfoShare2015](https://reader033.fdocuments.net/reader033/viewer/2022052915/58f085e61a28abfd438b4629/html5/thumbnails/6.jpg)
Hadoop ecosystem
![Page 7: InfoShare2015](https://reader033.fdocuments.net/reader033/viewer/2022052915/58f085e61a28abfd438b4629/html5/thumbnails/7.jpg)
Client cluster
• VM vs. physical environments
• Data volume
• bigdata.myAcxiom.com – access layer
![Page 8: InfoShare2015](https://reader033.fdocuments.net/reader033/viewer/2022052915/58f085e61a28abfd438b4629/html5/thumbnails/8.jpg)
Client cluster
• HUE
• Dedicated administrative solutions
![Page 9: InfoShare2015](https://reader033.fdocuments.net/reader033/viewer/2022052915/58f085e61a28abfd438b4629/html5/thumbnails/9.jpg)
Ingestion/extraction process
[Diagram; surviving labels: Ingestion, Extract]
![Page 10: InfoShare2015](https://reader033.fdocuments.net/reader033/viewer/2022052915/58f085e61a28abfd438b4629/html5/thumbnails/10.jpg)
Automation
• Oozie scheduler
• In-house supporting tools
-Hadoop Java Framework
-Python
-Shell script
• Extensive reporting system
![Page 11: InfoShare2015](https://reader033.fdocuments.net/reader033/viewer/2022052915/58f085e61a28abfd438b4629/html5/thumbnails/11.jpg)
Analysis - R
• RevR + RStudio
• Data science
• Trend analysis, advanced clustering
• Building predictive models
• Classifiers
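The talk does its analysis in R and does not say which clustering method is used. As a minimal illustration of what clustering involves, the sketch below implements plain k-means (Lloyd's algorithm) on one-dimensional data in Python. The function name `kmeans_1d`, the starting centers, and the sample values are all assumptions made for this example.

```python
from typing import List, Tuple


def kmeans_1d(
    values: List[float], centers: List[float], iterations: int = 10
) -> Tuple[List[float], List[int]]:
    """Lloyd's algorithm on 1-D data with caller-supplied initial centers."""
    centers = list(centers)  # copy so the caller's list is not modified
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda c: abs(v - centers[c]))
            for v in values
        ]
        # Update step: move each center to the mean of its members.
        for c in range(len(centers)):
            members = [v for v, label in zip(values, labels) if label == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = sum(members) / len(members)
    return centers, labels


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]  # hypothetical measurements
    print(kmeans_1d(data, centers=[0.0, 5.0]))
```

Passing the initial centers explicitly keeps the run deterministic; practical implementations usually pick starting centers at random and repeat the run several times.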
![Page 12: InfoShare2015](https://reader033.fdocuments.net/reader033/viewer/2022052915/58f085e61a28abfd438b4629/html5/thumbnails/12.jpg)
Typical problems
• Sharing cluster resources among many users
• Fair Scheduler - http://hadoop.apache.org/docs/r1.2.1/fair_scheduler.html
• Edge node – access management
• Partitioning large tables
• Suboptimal queries (HQL)
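The partitioning and query bullets are related: a query that filters on the partition column can skip every other partition, while the same query on an unpartitioned table has to read everything. The sketch below imitates that effect with in-memory Python data. It is not Hive code, and the `(event_date, customer_id, amount)` schema and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical row layout: (event_date, customer_id, amount)
Row = Tuple[str, str, float]


def build_partitions(rows: List[Row]) -> Dict[str, List[Row]]:
    """Group rows by event_date, mimicking one storage directory per partition value."""
    partitions: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        partitions[row[0]].append(row)
    return dict(partitions)


def total_full_scan(rows: List[Row], day: str) -> Tuple[float, int]:
    """Unpartitioned case: every row is read to answer a one-day query."""
    total = sum(amount for date, _, amount in rows if date == day)
    return total, len(rows)  # (result, rows read)


def total_pruned(partitions: Dict[str, List[Row]], day: str) -> Tuple[float, int]:
    """Partitioned case: only the matching partition is read."""
    selected = partitions.get(day, [])
    return sum(amount for _, _, amount in selected), len(selected)


if __name__ == "__main__":
    rows = [
        ("2015-06-10", "a", 10.0),
        ("2015-06-10", "b", 5.0),
        ("2015-06-11", "a", 7.5),
        ("2015-06-12", "c", 1.0),
    ]
    print(total_full_scan(rows, "2015-06-11"))
    print(total_pruned(build_partitions(rows), "2015-06-11"))
```

Both functions return the same total, but the second count shows how many rows were actually touched. The benefit disappears when a query does not filter on the partition column, which is one common way an HQL query ends up suboptimal.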
![Page 13: InfoShare2015](https://reader033.fdocuments.net/reader033/viewer/2022052915/58f085e61a28abfd438b4629/html5/thumbnails/13.jpg)
Questions?
Thank you!