Hug Milano September 2014: Hadoop Summit Europe Impressions

Transcript of Hug Milano September 2014: Hadoop Summit Europe Impressions

1

Hadoop user group Italy

Alberto Ghedin

2

European Hadoop Summit April 2014

3

European Hadoop Summit April 2014

https://www.youtube.com/watch?v=Fz-GnjOZAmQ

8

Hadoop Summit – Putting wings on the elephant

9

Hadoop Summit – Putting wings on the elephant

10

Hadoop Summit - Impala

● Massively parallel processing (MPP) SQL query engine
● Runs its own daemons in the cluster
● Does not use MapReduce
● Does not materialize intermediate steps
● Uses native machine instructions as much as possible
● Keeps intermediate data in memory
● Does not support UPDATE

11

Hadoop Summit - Tez

12

Hadoop Summit – Why Tez?

● MR
  – Heavy use of temporary files and writes to HDFS
● Expressive APIs
  – No need to persist intermediate steps
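
The contrast on this slide can be illustrated with a small sketch. This is neither Tez nor MapReduce code: the two toy stages and the function names (`run_with_materialization`, `run_pipelined`) are assumptions made up for illustration. The first variant persists the intermediate result to a temporary file, much as a chain of MR jobs would write to HDFS between steps; the second passes records straight from one stage to the next.

```python
import json
import os
import tempfile


def tokenize(lines):
    """Stage 1: split each line into lowercase words."""
    for line in lines:
        for word in line.lower().split():
            yield word


def count(words):
    """Stage 2: count occurrences of each word."""
    totals = {}
    for word in words:
        totals[word] = totals.get(word, 0) + 1
    return totals


def run_with_materialization(lines):
    """MR-style chain: stage 1 output is written to disk before stage 2 reads it."""
    fd, path = tempfile.mkstemp(suffix=".json")
    try:
        with os.fdopen(fd, "w") as handle:
            json.dump(list(tokenize(lines)), handle)  # intermediate write
        with open(path) as handle:
            words = json.load(handle)  # intermediate read
        return count(words)
    finally:
        os.remove(path)


def run_pipelined(lines):
    """DAG-style chain: stage 2 consumes stage 1 directly, nothing is persisted."""
    return count(tokenize(lines))
```

Both functions return the same counts; the difference is only the extra write and read between stages, which is the overhead the slide attributes to MR.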

13

Hadoop Summit – Tez API

● DAG: execution
● Node: logic + resources
● Edge: data transfer

The graph must be acyclic because of the fault-tolerance mechanism.
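
To make the acyclicity requirement concrete, here is a minimal sketch of a graph of nodes and edges that refuses to produce an execution order when it contains a cycle. The `Dag` class and its methods are hypothetical names for illustration and do not reproduce the Tez API. One plausible reading of the slide is that, in an acyclic graph, every node has a well-defined set of upstream producers, so a failed step can be recomputed from its inputs.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Dag:
    """Toy DAG: nodes stand for logic, edges for data transfer between nodes."""
    vertices: set = field(default_factory=set)
    edges: list = field(default_factory=list)  # (producer, consumer) pairs

    def add_edge(self, producer, consumer):
        self.vertices.update((producer, consumer))
        self.edges.append((producer, consumer))

    def execution_order(self):
        """Return a topological order (Kahn's algorithm); raise on a cycle."""
        indegree = {v: 0 for v in self.vertices}
        for _, consumer in self.edges:
            indegree[consumer] += 1
        # Nodes with no pending inputs can run first.
        ready = deque(sorted(v for v, d in indegree.items() if d == 0))
        order = []
        while ready:
            vertex = ready.popleft()
            order.append(vertex)
            for producer, consumer in self.edges:
                if producer == vertex:
                    indegree[consumer] -= 1
                    if indegree[consumer] == 0:
                        ready.append(consumer)
        if len(order) != len(self.vertices):
            raise ValueError("graph contains a cycle")
        return order
```

For example, after `add_edge("map", "reduce")` and `add_edge("reduce", "sink")`, `execution_order()` yields the three nodes in that order, while adding an edge from `"sink"` back to `"map"` makes it raise `ValueError`.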

14

Hadoop Summit – Tez API

● Nodes

15

Hadoop Summit – Tez API

● Edges
  – Data movement:
    ● One-to-one
    ● Broadcast
    ● Scatter-gather
  – Scheduling
    ● Sequential
    ● Concurrent
  – Data source property
    ● Persisted
    ● Persisted reliable
    ● Ephemeral

16

Hadoop Summit – Examples

● MR
  – Data movement:
    ● Scatter-gather
  – Scheduling
    ● Sequential
  – Data source property
    ● Persisted
● Streaming
  – Scheduling
    ● Concurrent
  – Data source property
    ● Ephemeral
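
The two examples can be written down as edge configurations. The sketch below models the three edge properties from the slides as Python enums; the class names (`DataMovement`, `Scheduling`, `DataSource`, `EdgeProperty`) and the helper `consumer_can_start_early` are illustrative assumptions, not the Tez Java API. The interpretation of "concurrent" as "the consumer runs while the producer is still running" is also an assumption stated here for clarity.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DataMovement(Enum):
    ONE_TO_ONE = "one-to-one"
    BROADCAST = "broadcast"
    SCATTER_GATHER = "scatter-gather"


class Scheduling(Enum):
    SEQUENTIAL = "sequential"
    CONCURRENT = "concurrent"


class DataSource(Enum):
    PERSISTED = "persisted"
    PERSISTED_RELIABLE = "persisted-reliable"
    EPHEMERAL = "ephemeral"


@dataclass(frozen=True)
class EdgeProperty:
    scheduling: Scheduling
    data_source: DataSource
    # The slide lists no data movement for the streaming case, so it is optional.
    data_movement: Optional[DataMovement] = None


# MR example from the slide: scatter-gather, sequential, persisted.
MR_SHUFFLE = EdgeProperty(
    Scheduling.SEQUENTIAL, DataSource.PERSISTED, DataMovement.SCATTER_GATHER
)

# Streaming example from the slide: concurrent, ephemeral.
STREAMING = EdgeProperty(Scheduling.CONCURRENT, DataSource.EPHEMERAL)


def consumer_can_start_early(edge):
    """Assumed meaning: a concurrent edge lets the consumer overlap with the producer."""
    return edge.scheduling is Scheduling.CONCURRENT
```

Under these assumptions `consumer_can_start_early(STREAMING)` is true and `consumer_can_start_early(MR_SHUFFLE)` is false, which mirrors the difference between the two columns on the slide.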

17

Hadoop Summit – Hive on Tez

18

Hadoop Summit – Hive on Tez

19

Hadoop Summit – Hive on Tez

MR

TEZ

20

Hadoop Summit – Hive on Tez

21

Hadoop Summit – Hive on Tez

22

Hadoop Summit – Pig on Tez

23

Hadoop Summit – Pig on Spark

+ = Spork

24

Greetings

Q&A

@AlbertoGhedo

[email protected]