dc area spark interactive nov 2015

Jean-Daniel Cryans, on behalf of the Kudu team. Kudu: a new storage engine for fast analytics on fast data

Transcript of dc area spark interactive nov 2015

Page 1: dc area spark interactive nov 2015

© Cloudera, Inc. All rights reserved.

Jean-Daniel Cryans, on behalf of the Kudu team

Kudu: a new storage engine for fast analytics on fast data

Page 2: dc area spark interactive nov 2015

Myself

• Software Engineer at Cloudera
• On the Kudu team for 2 years
• Apache HBase committer and PMC member since 2008
• Previously at StumbleUpon

Page 3: dc area spark interactive nov 2015

Motivation and Goals

Why build Kudu?

Page 4: dc area spark interactive nov 2015

Motivating Questions

• Are there user problems that we can't address because of gaps in Hadoop ecosystem storage technologies?
• Are we positioned to take advantage of advancements in the hardware landscape?

Page 5: dc area spark interactive nov 2015

Questions to the crowd

• What is Apache Parquet good at?
• What is Apache HBase good at?

Page 6: dc area spark interactive nov 2015

Current Storage Landscape in Hadoop

HDFS excels at:
• Efficiently scanning large amounts of data
• Accumulating data with high throughput

HBase excels at:
• Efficiently finding and writing individual rows
• Making data mutable

Gaps exist when these properties are needed simultaneously

Page 7: dc area spark interactive nov 2015

Kudu Design Goals

• High throughput for big scans (columnar storage and replication)
  Goal: within 2x of Parquet
• Low latency for short accesses (primary key indexes and quorum replication)
  Goal: 1 ms read/write on SSD
• Database-like semantics (initially single-row ACID)
• Relational data model
  • SQL query
  • "NoSQL"-style scan/insert/update (Java client)

Page 8: dc area spark interactive nov 2015

Questions to the crowd

• How much RAM do typical servers have today?
• What about in 2 years?
• How many of you have hybrid SSD/HDD servers?
• Who's currently running, planning to run, or would like to run with Non-Volatile Memory Express (NVMe) drives?

Page 9: dc area spark interactive nov 2015

Changing Hardware Landscape

• Spinning disk -> solid state storage
  • NAND flash: up to 450k read and 250k write IOPS, about 2 GB/sec read and 1.5 GB/sec write throughput, at a price of less than $3/GB and dropping
  • 3D XPoint memory (1000x faster than NAND, cheaper than RAM)
• RAM is cheaper and more abundant:
  • 64 -> 128 -> 256 GB over the last few years
• Takeaway 1: The next bottleneck is CPU, and current storage systems weren't designed with CPU efficiency in mind.
• Takeaway 2: Column stores are feasible for random access.

Page 10: dc area spark interactive nov 2015

Kudu Usage

• Table has a SQL-like schema
  • Finite number of columns (unlike HBase/Cassandra)
  • Types: BOOL, INT8, INT16, INT32, INT64, FLOAT, DOUBLE, STRING, BINARY, TIMESTAMP
  • Some subset of columns makes up a possibly-composite primary key
  • Fast ALTER TABLE
• Java and C++ "NoSQL"-style APIs
  • Insert(), Update(), Delete(), Scan()
• Integrations with MapReduce, Spark, and Impala
  • More to come!
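The usage model above (a fixed schema, a composite primary key, and insert/update/delete/scan operations) can be sketched with a toy in-memory table. This is a minimal illustration in plain Scala, not the Kudu client API: the names `ToyTable`, `Key`, `Row` and `Table` are hypothetical, and the choice of `(host, metric, timestamp)` as the key simply borrows the example that appears on the "Tables and Tablets" slide.

```scala
// Toy in-memory model of a table with a fixed schema and a composite
// primary key. Illustration only: this is NOT the Kudu client API.
import scala.collection.immutable.TreeMap

object ToyTable {
  // Hypothetical schema: (host, metric, timestamp) is the primary key,
  // `value` is the only non-key column.
  final case class Key(host: String, metric: String, timestamp: Long)
  final case class Row(key: Key, value: Double)

  // Keep rows ordered by primary key so scans walk keys in order.
  implicit val keyOrdering: Ordering[Key] =
    Ordering.by((k: Key) => (k.host, k.metric, k.timestamp))

  final case class Table(rows: TreeMap[Key, Row] = TreeMap.empty[Key, Row]) {
    // Insert is rejected if the primary key already exists.
    def insert(row: Row): Either[String, Table] =
      if (rows.contains(row.key)) Left("key already present")
      else Right(copy(rows = rows + (row.key -> row)))

    // Update is rejected if the primary key does not exist.
    def update(row: Row): Either[String, Table] =
      if (rows.contains(row.key)) Right(copy(rows = rows + (row.key -> row)))
      else Left("key not found")

    def delete(key: Key): Table = copy(rows = rows - key)

    // Scan returns rows in primary-key order, filtered by a predicate.
    def scan(pred: Row => Boolean): List[Row] =
      rows.valuesIterator.filter(pred).toList
  }
}
```

Because every row is addressed by its primary key, both insert and update need to know whether that key is already present, which is the lookup cost discussed later under trade-offs.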

Page 11: dc area spark interactive nov 2015

Use cases and architectures

Page 12: dc area spark interactive nov 2015

Kudu Use Cases

Kudu is best for use cases requiring a simultaneous combination of sequential and random reads and writes

● Time Series
  ○ Examples: Stream market data; fraud detection & prevention; risk monitoring
  ○ Workload: Inserts, updates, scans, lookups
● Machine Data Analytics
  ○ Examples: Network threat detection
  ○ Workload: Inserts, scans, lookups
● Online Reporting
  ○ Examples: Operational Data Store (ODS)
  ○ Workload: Inserts, updates, scans, lookups

Page 13: dc area spark interactive nov 2015

Questions to the crowd

• How do you manage incoming data when running reports on data stored in Parquet?

Page 14: dc area spark interactive nov 2015

Real-Time Analytics in Hadoop Today

Fraud Detection in the Real World = Storage Complexity

Considerations:
● How do I handle failure during this process?
● How often do I reorganize incoming streaming data into a format appropriate for reporting?
● When reporting, how do I see data that has not yet been reorganized?
● How do I ensure that important jobs aren't interrupted by maintenance?

[Diagram labels: Incoming Data (Messaging System); HBase; "Have we accumulated enough data?"; Reorganize HBase file into Parquet; Parquet File; Impala on HDFS (New Partition, Most Recent Partition, Historic Data); Reporting Request]

• Wait for running operations to complete
• Define new Impala partition referencing the newly written Parquet file

Page 15: dc area spark interactive nov 2015

Real-Time Analytics in Hadoop with Kudu

Improvements:
● One system to operate
● No cron jobs or background processes
● Handle late arrivals or data corrections with ease
● New data available immediately for analytics or operations

[Diagram labels: Incoming Data (Messaging System); Storage in Kudu (Historical and Real-time Data); Reporting Request]

Page 16: dc area spark interactive nov 2015

How it works

Page 17: dc area spark interactive nov 2015

Tables and Tablets

• Table is horizontally partitioned into tablets
  • Range or hash partitioning
  • PRIMARY KEY (host, metric, timestamp) DISTRIBUTE BY HASH(timestamp) INTO 100 BUCKETS
• Each tablet has N replicas (3 or 5), with Raft consensus
  • Allow read from any replica, plus leader-driven writes with low MTTR
• Tablet servers host tablets
  • Store data on local disks (no HDFS)
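Hash partitioning as in DISTRIBUTE BY HASH(timestamp) INTO 100 BUCKETS can be pictured as mapping each key to one of a fixed number of buckets. The sketch below is only an illustration of that idea: the slide does not say which hash function Kudu uses, so the JVM `Long.hashCode` used here is an assumption, and `HashBuckets` and `bucketFor` are hypothetical names.

```scala
object HashBuckets {
  // Assumed hash: java.lang.Long.hashCode. The real hash function used by
  // Kudu is not shown on the slide; this stands in for illustration.
  def bucketFor(timestamp: Long, numBuckets: Int): Int = {
    require(numBuckets > 0, "numBuckets must be positive")
    // floorMod keeps the result in [0, numBuckets) even for negative hashes.
    java.lang.Math.floorMod(java.lang.Long.hashCode(timestamp), numBuckets)
  }
}
```

With a scheme like this, rows with consecutive timestamps land in different buckets, so a burst of recent writes is spread over many tablets rather than concentrated on one.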

Page 18: dc area spark interactive nov 2015

Questions to the crowd

• How many failures can a quorum of 3 nodes tolerate?
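For a majority-based consensus protocol such as Raft, the arithmetic behind this question can be written down directly: progress requires a majority of replicas, so the number of tolerated failures is whatever is left over. The following is a small sketch of that arithmetic; `Quorum`, `majority` and `tolerated` are hypothetical names.

```scala
object Quorum {
  // Smallest number of replicas that forms a majority.
  def majority(replicas: Int): Int = replicas / 2 + 1

  // Replicas that can fail while a majority is still available.
  def tolerated(replicas: Int): Int = replicas - majority(replicas)
}
```

For the replica counts mentioned on the previous slide, `Quorum.tolerated(3)` evaluates to 1 and `Quorum.tolerated(5)` to 2.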

Page 19: dc area spark interactive nov 2015

Page 20: dc area spark interactive nov 2015

Questions to the crowd

• What are the main advantages of using an LSM tree when inserting/mutating rows, like in HBase and Cassandra?
• In which situations can column stores read data faster than row stores?

Page 21: dc area spark interactive nov 2015

Kudu trade-offs

• Random updates will be slower
  • HBase model allows random updates without incurring a disk seek
  • Kudu requires a key lookup before update and a Bloom filter lookup before insert, and may incur seeks
• Single-row reads may be slower
  • Columnar design is optimized for scans
  • Especially slow at reading a row that has had many recent updates (e.g. YCSB "zipfian")
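The Bloom filter lookup before insert can be illustrated with a toy filter: before accepting a new key, the store asks the filter whether the key might already exist, and only keys that "might" exist need a real lookup. This is a generic sketch of the data structure, not Kudu's implementation; `ToyBloom`, its sizing parameters, and the use of Scala's `MurmurHash3` with different seeds are all assumptions made for illustration.

```scala
import scala.collection.immutable.BitSet
import scala.util.hashing.MurmurHash3

object ToyBloom {
  final case class Bloom(bits: BitSet, size: Int, hashes: Int) {
    // One bit position per hash seed, each in [0, size).
    private def positions(key: String): Seq[Int] =
      (0 until hashes).map { seed =>
        java.lang.Math.floorMod(MurmurHash3.stringHash(key, seed), size)
      }

    def add(key: String): Bloom =
      copy(bits = positions(key).foldLeft(bits)(_ + _))

    // false means "definitely absent"; true means "possibly present".
    def mightContain(key: String): Boolean =
      positions(key).forall(bits.contains)
  }

  def empty(size: Int, hashes: Int): Bloom = Bloom(BitSet.empty, size, hashes)
}
```

A negative answer lets an insert skip the key lookup entirely, while a positive answer still has to be confirmed against the stored keys, which is where a seek may be incurred.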

Page 22: dc area spark interactive nov 2015

Benchmarks

Page 23: dc area spark interactive nov 2015

TPC-H (Analytics benchmark)

• 75 tablet servers (TS) + 1 master cluster
• 12 (spinning) disks each, enough RAM to fit dataset
• Using Kudu 0.5.0, Impala 2.2 with Kudu support, CDH 5.4
• TPC-H Scale Factor 100 (100 GB)
• Example query:

SELECT n_name, sum(l_extendedprice * (1 - l_discount)) AS revenue
FROM customer, orders, lineitem, supplier, nation, region
WHERE c_custkey = o_custkey
  AND l_orderkey = o_orderkey
  AND l_suppkey = s_suppkey
  AND c_nationkey = s_nationkey
  AND s_nationkey = n_nationkey
  AND n_regionkey = r_regionkey
  AND r_name = 'ASIA'
  AND o_orderdate >= date '1994-01-01'
  AND o_orderdate < '1995-01-01'
GROUP BY n_name
ORDER BY revenue desc;

Page 24: dc area spark interactive nov 2015

- Kudu outperforms Parquet by 31% (geometric mean) for RAM-resident data
- Parquet likely to outperform Kudu for HDD-resident data (larger IO requests)
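A geometric mean is the usual way to summarize per-query speedup ratios, because it treats a 2x speedup and a 2x slowdown symmetrically. The sketch below shows the calculation only; it does not reproduce the benchmark, and any ratios passed to it are the caller's own. `GeoMean` and `geometricMean` are hypothetical names.

```scala
object GeoMean {
  // Geometric mean of positive ratios, computed in log space to avoid
  // overflow when many values are multiplied together.
  def geometricMean(xs: Seq[Double]): Double = {
    require(xs.nonEmpty && xs.forall(_ > 0), "need at least one positive value")
    math.exp(xs.map(x => math.log(x)).sum / xs.size)
  }
}
```

For example, with made-up ratios, `GeoMean.geometricMean(Seq(2.0, 8.0))` is 4.0 up to floating-point rounding, whereas the arithmetic mean of the same values would be 5.0.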

Page 25: dc area spark interactive nov 2015

What about Apache Phoenix?

• 10 node cluster (9 worker, 1 master)
• HBase 1.0, Phoenix 4.3
• TPC-H LINEITEM table only (6B rows)

Time (sec), plotted on a log scale from 0.01 to 10000:

                     Phoenix   Kudu    Parquet
Load                 2152      1918    155
TPCH Q1              219       13.2    9.3
COUNT(*)             76        1.7     1.4
COUNT(*) WHERE…      131       0.7     1.5
single-row lookup    0.04      0.15    1.37

Page 26: dc area spark interactive nov 2015

What about NoSQL-style random access? (YCSB)

• YCSB 0.5.0-snapshot
• 10 node cluster (9 worker, 1 master)
• HBase 1.0
• 100M rows, 10M ops

Page 27: dc area spark interactive nov 2015

Apache Spark

Page 28: dc area spark interactive nov 2015

Motivational Quotes

• "seeing kudu as a great fit for Spark Streaming workloads"
• "One unexpected benefit I am seeing from Kudu is the filter performance on non-key columns - subsec on 2 million rows of meetup data... (on small ec2 instance)"
  • Kudu user in our Slack channel, 11/03/15

Page 29: dc area spark interactive nov 2015

Spark Streaming

[Diagram labels: Live ingestion; Get latest data; Build models]

Page 30: dc area spark interactive nov 2015

Spark SQL

• Great fit for DataFrames
  • Kudu natively supports types
• Can replace Parquet storage
  • Kudu's goal is to be almost as fast as Parquet
  • Predicate pushdown + lazy materialization
  • Plus makes it possible to mutate data
• A better alternative to other "NoSQL" storage engines
  • Aims to offer sequential consistency, second only to strict consistency
  • Consistency can be relaxed to enable faster reading of stale data
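Predicate pushdown combined with lazy materialization can be pictured on a toy columnar layout: the filter is evaluated against one column only, and the other columns are read just for the row positions that matched. This is a conceptual sketch in plain Scala, not how Kudu or Spark SQL implement it; `ColumnScan`, `Columns` and `valuesForHost` are hypothetical names.

```scala
object ColumnScan {
  // Hypothetical columnar table: one vector per column, all the same length.
  final case class Columns(host: Vector[String], value: Vector[Double])

  // Evaluate the predicate on the `host` column only, then materialize
  // `value` just for the matching row positions.
  def valuesForHost(t: Columns, host: String): Vector[Double] = {
    val matching = t.host.indices.filter(i => t.host(i) == host)
    matching.map(i => t.value(i)).toVector
  }
}
```

The fewer rows the predicate selects, the less of the `value` column has to be touched, which is one situation where a column store can read faster than a row store.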

Page 31: dc area spark interactive nov 2015

Code snippets

// Create a context bound to the Kudu master
val kuduContext = new KuduContext(sparkContext, kuduMaster)

// Read column c1 of a Kudu table as an RDD
val kuduRdd = kuduContext.kuduRDD(tableName, "c1").map(r => {
  val row = r._2
  val c1 = row.getLong(0)
  // etc…
})

// Write from a stream, one Kudu client per partition
someStream.kuduForeachPartition(kuduContext, (it, kuduClient, asyncClient) => {
  val table = kuduClient.openTable(tableName)
  // etc…
})

// Expose a Kudu table to Spark SQL
sqlContext.load("org.kududb.spark",
  Map("kudu.table" -> "example_table",
      "kudu.master" -> kuduMaster)).registerTempTable("example_table")

Page 32: dc area spark interactive nov 2015

Demo

Page 33: dc area spark interactive nov 2015

Demo

• Code currently at https://github.com/tmalaska/SparkOnKudu/

Gamer data points:
• Producer sends data points to Kafka
• Spark pulls from Kafka
• Spark loads base data from Kudu
• Aggregates are stored back into Kudu
• Live queries come from Impala

Page 34: dc area spark interactive nov 2015

Demo

Page 35: dc area spark interactive nov 2015

Getting started

Page 36: dc area spark interactive nov 2015

Project status

• Public Beta released September 28th, 2015, version 0.5.0
  • Not ready for production
  • No security
  • Feedback/JIRAs/patches welcome
• Next release this month (0.6.0):
  • Mac OS X support for single-node deployment
  • Initial Apache Spark integration
  • Lots of small fixes and improvements
• GA sometime next year (hopefully!)

Page 37: dc area spark interactive nov 2015

Getting started as a user

• http://getkudu.io
• kudu-[email protected]
• http://getkudu-slack.herokuapp.com/
• Quickstart VM
  • Easiest way to get started
  • Impala and Kudu in an easy-to-install VM
• CSD and Parcels
  • For installation on a Cloudera Manager-managed cluster

Page 38: dc area spark interactive nov 2015

Getting started as a developer

• http://github.com/cloudera/kudu
  • All commits go here first
• Public Gerrit: http://gerrit.cloudera.org
  • All code reviews happen here
• Public JIRA: http://issues.cloudera.org
  • Includes bugs going back to 2013. Come see our dirty laundry!
• kudu-[email protected]
• Apache 2.0 license open source
  • Contributions are welcome and encouraged!

Page 39: dc area spark interactive nov 2015

http://getkudu.io/
@getkudu