Apache Avro

Transcript of Apache Avro

Page 1: Apache Avro

Apache Avro

Page 2: Apache Avro

Overview 1

Data serialization system and data exchange format

Addresses a weakness of Hadoop Writables: lack of language portability

Sharing data across programs and languages

Language-independent schema (written in JSON)

Code generation is optional
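
To make the "schema in JSON" point concrete, here is a minimal sketch of what an Avro record schema can look like. The record name `StringPair` and its fields are illustrative assumptions, not taken from the slides, and the snippet only uses Python's standard `json` module to show that the schema is plain JSON that any language can read without generated code.

```python
import json

# Hypothetical example schema; the record and field names are illustrative.
SCHEMA_JSON = """
{
  "type": "record",
  "name": "StringPair",
  "doc": "A pair of strings.",
  "fields": [
    {"name": "left", "type": "string"},
    {"name": "right", "type": "string"}
  ]
}
"""

# An Avro schema is ordinary JSON, so a generic JSON parser can load it.
schema = json.loads(SCHEMA_JSON)
field_names = [field["name"] for field in schema["fields"]]
print(schema["name"], field_names)
```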

Page 3: Apache Avro

Overview 2

Supports schema evolution

Data files support compression and are splittable

Rich data types and schema

Page 4: Apache Avro

Avro Data Types and Schemas 1: Primitive Types

null

boolean

int

long

float

double

bytes

string
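
As an illustration of how compact the primitive encodings are, the sketch below hand-rolls the variable-length zigzag encoding that the Avro specification describes for `int` and `long` values. The function names are assumptions made for this example; this is not the Avro library's API.

```python
def zigzag(n: int) -> int:
    """Map a signed 64-bit value to an unsigned one: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    return (n << 1) ^ (n >> 63)


def encode_long(n: int) -> bytes:
    """Encode a signed integer as a zigzag variable-length byte sequence."""
    z = zigzag(n) & 0xFFFFFFFFFFFFFFFF
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        z >>= 7
    out.append(z)  # final byte, continuation bit clear
    return bytes(out)


def decode_long(data: bytes) -> int:
    """Decode one zigzag variable-length integer from the start of data."""
    z = 0
    shift = 0
    for byte in data:
        z |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    return (z >> 1) ^ -(z & 1)  # undo the zigzag mapping


print(encode_long(64).hex())  # 8001
```

Small magnitudes, positive or negative, take a single byte, which is why the scheme suits counts and lengths.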

Page 5: Apache Avro

Avro Data Types and Schemas 2: Complex Types

array

map

record

enum

fixed

union
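
The six complex types listed above can all appear in one schema. The sketch below is a hypothetical record (`WeatherReading` and every field name are invented for illustration) that uses each of them, with a small helper that reports which kind of type each field declares.

```python
import json

# Hypothetical schema using every complex type: record, array, map,
# enum, fixed, and union. All names are illustrative assumptions.
SCHEMA_JSON = """
{
  "type": "record",
  "name": "WeatherReading",
  "fields": [
    {"name": "station", "type": "string"},
    {"name": "temperatures", "type": {"type": "array", "items": "int"}},
    {"name": "tags", "type": {"type": "map", "values": "string"}},
    {"name": "quality",
     "type": {"type": "enum", "name": "Quality",
              "symbols": ["GOOD", "SUSPECT", "BAD"]}},
    {"name": "checksum", "type": {"type": "fixed", "name": "MD5", "size": 16}},
    {"name": "comment", "type": ["null", "string"], "default": null}
  ]
}
"""


def kind(type_decl):
    """Return the Avro type kind for a field's type declaration."""
    if isinstance(type_decl, list):   # a JSON array declares a union
        return "union"
    if isinstance(type_decl, dict):   # a JSON object declares a complex type
        return type_decl["type"]
    return type_decl                  # a bare string names a primitive type


schema = json.loads(SCHEMA_JSON)
kinds = {field["name"]: kind(field["type"]) for field in schema["fields"]}
print(kinds)
```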

Page 6: Apache Avro

Avro Data Types and Schemas 3: Java Mappings

Generic Java mapping

Specific Java mapping

Reflect Java mapping

Page 7: Apache Avro

In-memory Serialization and Deserialization
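
The slide for this topic carried no text, so the following is only a sketch of the idea under stated assumptions: a two-field record of strings (the hypothetical `StringPair` with fields `left` and `right`) is written to an in-memory buffer and read back. It hand-rolls the binary encoding as described in the Avro specification (a string is a zigzag varint length followed by UTF-8 bytes; a record is its fields concatenated in schema order) instead of calling the Avro library.

```python
import io

FIELDS = ["left", "right"]  # field order of the hypothetical StringPair schema


def write_long(out, n):
    """Write n as a zigzag variable-length integer."""
    z = ((n << 1) ^ (n >> 63)) & 0xFFFFFFFFFFFFFFFF
    while z > 0x7F:
        out.write(bytes([(z & 0x7F) | 0x80]))
        z >>= 7
    out.write(bytes([z]))


def read_long(inp):
    """Read one zigzag variable-length integer."""
    z = 0
    shift = 0
    while True:
        byte = inp.read(1)[0]
        z |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    return (z >> 1) ^ -(z & 1)


def serialize(record):
    """Encode the record's string fields in schema order."""
    out = io.BytesIO()
    for name in FIELDS:
        payload = record[name].encode("utf-8")
        write_long(out, len(payload))
        out.write(payload)
    return out.getvalue()


def deserialize(data):
    """Decode a record previously produced by serialize."""
    inp = io.BytesIO(data)
    record = {}
    for name in FIELDS:
        length = read_long(inp)
        record[name] = inp.read(length).decode("utf-8")
    return record


encoded = serialize({"left": "L", "right": "R"})
print(encoded)  # b'\x02L\x02R'
print(deserialize(encoded))
```

Note that no field names or type tags appear in the output bytes, which is why the reader needs the schema to decode the data.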

Page 8: Apache Avro

Specific API (avro-tools)

Page 9: Apache Avro

Data files

Header containing the schema

Avro objects stored in blocks

Sync marker after each block

Compact binary format
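
The pieces listed above fit together roughly as follows. This is a sketch of the object container file layout as described in the Avro specification (magic bytes, a metadata map holding the schema, a 16-byte sync marker, then data blocks each followed by the same marker). It assumes the uncompressed "null" codec and a single block, the helper names are invented for illustration, and it is not a substitute for the library's data file writer.

```python
import json
import os

MAGIC = b"Obj\x01"  # four-byte file signature


def encode_long(n):
    """Zigzag variable-length encoding used for counts and lengths."""
    z = ((n << 1) ^ (n >> 63)) & 0xFFFFFFFFFFFFFFFF
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_bytes(payload):
    """Length-prefixed byte sequence (also used for UTF-8 strings)."""
    return encode_long(len(payload)) + payload


def build_data_file(schema, encoded_objects):
    """Return (file_bytes, sync_marker) for one block of pre-encoded objects."""
    sync = os.urandom(16)  # random marker, unique to this file
    meta = {
        "avro.schema": json.dumps(schema).encode("utf-8"),
        "avro.codec": b"null",
    }
    out = bytearray(MAGIC)
    # Metadata map: one block of key/value pairs, terminated by a zero count.
    out += encode_long(len(meta))
    for key, value in meta.items():
        out += encode_bytes(key.encode("utf-8")) + encode_bytes(value)
    out += encode_long(0)
    out += sync
    # Data block: object count, byte size, the objects, then the sync marker.
    body = b"".join(encoded_objects)
    out += encode_long(len(encoded_objects)) + encode_long(len(body))
    out += body + sync
    return bytes(out), sync


schema = {"type": "record", "name": "StringPair",
          "fields": [{"name": "left", "type": "string"},
                     {"name": "right", "type": "string"}]}
data, sync = build_data_file(schema, [b"\x02L\x02R"])
print(len(data), data[:4])
```

Because the marker recurs after every block, a reader that starts at an arbitrary offset can scan forward to the next marker and resume at a block boundary, which is what makes the files splittable.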

Page 10: Apache Avro

Data files

Page 11: Apache Avro

Portability

Page 12: Apache Avro

Portability

Page 13: Apache Avro

Schema resolution (Projection)
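
A rough sketch of what projection means: the reader supplies its own schema, fields it does not list are dropped, and fields the writer never wrote are filled from the reader's defaults. The example works on already-decoded dictionaries with hypothetical field names; the real resolution rules operate on the writer's and reader's schemas together and also cover aliases and type promotion, which this sketch leaves out.

```python
def resolve(record, reader_schema):
    """Project a decoded record onto the reader's schema."""
    result = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in record:
            result[name] = record[name]        # field known to both sides
        elif "default" in field:
            result[name] = field["default"]    # new field: use reader default
        else:
            raise ValueError(f"no value or default for field {name!r}")
    return result                              # writer-only fields are dropped


# Hypothetical reader schema: keeps "right", drops "left", adds "description".
reader_schema = {
    "type": "record",
    "name": "StringPair",
    "fields": [
        {"name": "right", "type": "string"},
        {"name": "description", "type": "string", "default": ""},
    ],
}

written = {"left": "L", "right": "R"}
print(resolve(written, reader_schema))  # {'right': 'R', 'description': ''}
```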

Page 14: Apache Avro

Sort Order

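Every Avro type except record has a fixed ordering rule; a record's order can be set per field with the order attribute

Comparison works directly on the binary-encoded byte streams, without deserialization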
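
To illustrate how the per-field `order` attribute shapes record ordering, the sketch below compares records field by field in schema order, honoring `ascending` (the default), `descending`, and `ignore`. The schema and data are hypothetical, and the comparison runs on decoded Python values for readability, whereas Avro itself can perform the same comparison on the encoded bytes.

```python
from functools import cmp_to_key

# Hypothetical schema: sort by "left" descending and ignore "right".
SCHEMA = {
    "type": "record",
    "name": "StringPair",
    "fields": [
        {"name": "left", "type": "string", "order": "descending"},
        {"name": "right", "type": "string", "order": "ignore"},
    ],
}


def compare(a, b, schema=SCHEMA):
    """Three-way comparison of two records following the schema's field orders."""
    for field in schema["fields"]:
        order = field.get("order", "ascending")
        if order == "ignore":
            continue                          # field does not affect ordering
        x, y = a[field["name"]], b[field["name"]]
        result = (x > y) - (x < y)            # -1, 0, or 1
        if result != 0:
            return -result if order == "descending" else result
    return 0                                  # equal on all compared fields


pairs = [
    {"left": "a", "right": "1"},
    {"left": "c", "right": "2"},
    {"left": "b", "right": "3"},
]
ordered = sorted(pairs, key=cmp_to_key(compare))
print([p["left"] for p in ordered])  # ['c', 'b', 'a']
```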

Page 15: Apache Avro

Avro MapReduce

Avro provides several APIs for running MapReduce jobs on Avro data

Page 16: Apache Avro

Avro MapReduce

Page 17: Apache Avro

Avro MapReduce

Page 18: Apache Avro

Avro MapReduce

Page 19: Apache Avro

Avro MapReduce

Page 20: Apache Avro

Avro MapReduce

Page 21: Apache Avro

Avro MapReduce

Page 22: Apache Avro

Avro MapReduce

Page 23: Apache Avro

Avro Sorting MapReduce

Page 24: Apache Avro

Avro Sorting MapReduce