Apache Accumulo and Cloudera

Post on 27-Jan-2015


Apache Accumulo and Cloudera
Hadoop-DC, July 2013
Joey Echeverria | Director, Federal FTS
joey@cloudera.com | @fwiffo

©2013 Cloudera, Inc. All Rights Reserved.

HADOOP 101

Operating Systems

• Manage and schedule machine resources
  • CPU
  • RAM
  • Memory
• Provide abstractions and APIs
  • Files = stream of bytes
  • Process = instructions + private memory space

Distributed Operating System

• Same thing, but over a cluster of networked servers
• Additional concerns:
  • Inter-process and inter-machine communication
  • Data locality
  • Data availability
  • Data processing availability

Hadoop

• De facto distributed operating system
  • Apache HDFS
  • Apache MapReduce and Apache YARN

Ecosystem

(Diagram of ecosystem projects, grouped as: Key Value Stores, High Level Batch Languages, Low Latency SQL Engine, Graph Processing)

Cloudera  


CDH History

CDH1: *HDFS, *MR, *Hive, *Pig

CDH2: *HDFS, *MR, *Hive, *Pig

CDH3: *HDFS, *MR, *Hive, *Pig, *Flume, *HBase, Hue, *Mahout, *Oozie, *Sqoop, *Whirr, *Zookeeper, *Avro

CDH4: *HDFS, *MR, *YARN, *Hive, *Pig, *Flume, *HBase, Hue, *Mahout, *Oozie, *Sqoop, *Whirr, *Zookeeper, *Avro, DataFu, HCatalog, Impala, *Solr, *BigTop, Sentry

ACCUMULO 101 AND 201

BigTable  


Accumulo Data Model

• Multi-dimensional sorted map

row id -> [ family -> [ qualifier -> [ visibility -> [ timestamp -> value ] ] ] ]
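The nested-map view of the data model can be sketched in a few lines of Python. This only illustrates the shape of the map and is not Accumulo's API: the `put`, `get_latest`, and `scan_rows` helpers and the sample cells are hypothetical.

```python
# Illustrative model only; not the Accumulo client API.
table = {}

def put(row, family, qualifier, visibility, timestamp, value):
    """Store a value at row -> family -> qualifier -> visibility -> timestamp."""
    cell = (table.setdefault(row, {})
                 .setdefault(family, {})
                 .setdefault(qualifier, {})
                 .setdefault(visibility, {}))
    cell[timestamp] = value

def get_latest(row, family, qualifier, visibility):
    """Return the value with the highest timestamp for one cell, or None."""
    versions = (table.get(row, {})
                     .get(family, {})
                     .get(qualifier, {})
                     .get(visibility, {}))
    if not versions:
        return None
    return versions[max(versions)]

def scan_rows():
    """Return row ids in sorted order, as a sorted map would iterate them."""
    return sorted(table)

# Hypothetical sample data.
put("row2", "family", "qualifier", "public", 1, "a")
put("row1", "family", "qualifier", "public", 1, "old")
put("row1", "family", "qualifier", "public", 2, "new")
```

Each level of nesting corresponds to one dimension of the key, and a cell can hold several timestamped versions of a value.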

Accumulo Storage Model

• key -> value
• key = <row id><column><timestamp>
• column = <family><qualifier><visibility>
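The flattened key layout can be sketched as a sorted list of (key, value) pairs. This is a rough model, not Accumulo code: the `Key` tuple, `sort_key`, and `scan_row` names are made up for illustration. It assumes the ordering described in the Accumulo documentation, where timestamps sort in descending order so the newest version of a cell comes first.

```python
import bisect
from typing import NamedTuple

class Key(NamedTuple):
    # key = <row id><column><timestamp>, column = <family><qualifier><visibility>
    row: str
    family: str
    qualifier: str
    visibility: str
    timestamp: int

def sort_key(key):
    """Order by row and column ascending, then timestamp descending."""
    return (key.row, key.family, key.qualifier, key.visibility, -key.timestamp)

def scan_row(entries, row):
    """Return the (key, value) pairs for one row from a sorted entry list."""
    rows = [key.row for key, _ in entries]
    lo = bisect.bisect_left(rows, row)
    hi = bisect.bisect_right(rows, row)
    return entries[lo:hi]

# Hypothetical sample entries, kept sorted the way a tablet would store them.
entries = sorted(
    [
        (Key("r2", "f", "q", "public", 1), "x"),
        (Key("r1", "f", "q", "public", 1), "old"),
        (Key("r1", "f", "q", "public", 2), "new"),
    ],
    key=lambda entry: sort_key(entry[0]),
)
```

Because the entries are kept sorted, all cells for a row are contiguous and a row lookup becomes a binary search followed by a sequential read.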

Key structure:

Key: Row ID | Column (Family | Qualifier | Visibility) | Timestamp
Value

Other Concerns

• Write-ahead log
• Tablet server failure handling
• Versioning
• Iterators
• Cell-level security

PROJECT HISTORY

Pre-Apache

Apache  


Relationship to Hadoop Releases

• 1.3.x -> Hadoop 0.20.2
• 1.4.x -> Hadoop 0.20.2, Hadoop 0.20.203
• 1.5.x -> Hadoop 1.0.4, Hadoop 2.0.4-alpha

Accumulo and Cloudera Releases

• Accumulo 1.3.x, 1.4.x, and 1.5.x all work with CDH3
• Accumulo 1.5.x should work with CDH4…
  • Limited testing

ANNOUNCEMENT

CLOUDERA SUPPORT OF APACHE ACCUMULO ON CDH4

DEMO

System Logs

• Id
  • Unique id for an action
• Timestamp
  • Time the action occurred
• Actor
  • User or system performing the action
• Action
  • The action taken
• Object
  • The object of the action
• Info
  • Free form information (e.g. success/failure, attribute value, etc.)

Actions

• created_user
• deleted_user
• set_password
• logged_in
• logged_out
• read
• modified

Roles

• system
  • Any user on the system
• admin
  • Administrators
• audit
  • Auditors

Accumulo Data Model

Row ID | Family | Qualifier | Visibility | Timestamp | Value
<ts>-<id> | <actor> | <action>:<object> | | | <info>
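As a rough illustration of this layout, the following Python sketch maps one log record onto a cell. The `Cell` tuple and the `log_to_cell` function are hypothetical names, not part of any Accumulo API. The slide leaves the visibility unspecified, so the sketch takes it as a caller-supplied argument.

```python
from typing import NamedTuple

class Cell(NamedTuple):
    row: str         # <ts>-<id>
    family: str      # <actor>
    qualifier: str   # <action>:<object>
    visibility: str  # assumption: supplied by the caller, e.g. a role name
    value: str       # <info>

def log_to_cell(ts, log_id, actor, action, obj, info, visibility):
    """Build the cell for one log record using the layout from the slide."""
    return Cell(
        row=f"{ts}-{log_id}",
        family=actor,
        qualifier=f"{action}:{obj}",
        visibility=visibility,
        value=info,
    )

# Sample record taken from the demo data in this deck.
cell = log_to_cell("201307241537", 2, "sean", "logged_in", "host",
                   "succeeded", "system")
```

Leading the row id with the timestamp keeps log entries in time order within the sorted map, so a scan over a range of row ids reads a time window.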

DEMO

Logs Demo

Row key | Column | Visibility | Value
201307241535-1 | root:created_user:sean | audit | succeeded
201307241535-1 | root:set_password:sean | admin&audit | password
201307241537-2 | sean:logged_in:host | system | succeeded
201307241538-3 | sean:read:/tmp/a | audit | succeeded
201307241539-4 | sean:modified:/tmp/a | audit | failed
201307241540-5 | sean:logged_out:host | system | succeeded
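To show how the visibility column filters what each role sees, here is a simplified Python sketch of evaluating a visibility expression against a user's authorizations. It is an illustration under assumptions, not Accumulo's implementation: the `visible` and `scan` functions are hypothetical, an empty expression is assumed to be visible to everyone, and in this sketch `&` binds tighter than `|`.

```python
import re

def visible(expression, authorizations):
    """Return True if the authorizations satisfy the visibility expression."""
    tokens = re.findall(r"[A-Za-z0-9_]+|[&|()]", expression)
    if not tokens:
        return True  # assumption: an empty expression is visible to everyone
    pos = 0

    def parse_or():
        nonlocal pos
        result = parse_and()
        while pos < len(tokens) and tokens[pos] == "|":
            pos += 1
            rhs = parse_and()  # always parse the right side to consume tokens
            result = result or rhs
        return result

    def parse_and():
        nonlocal pos
        result = parse_term()
        while pos < len(tokens) and tokens[pos] == "&":
            pos += 1
            rhs = parse_term()
            result = result and rhs
        return result

    def parse_term():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            result = parse_or()
            pos += 1  # skip the closing ")"
            return result
        return token in authorizations

    return parse_or()

# Rows from the demo table: (row key, column, visibility, value).
rows = [
    ("201307241535-1", "root:created_user:sean", "audit", "succeeded"),
    ("201307241535-1", "root:set_password:sean", "admin&audit", "password"),
    ("201307241537-2", "sean:logged_in:host", "system", "succeeded"),
    ("201307241538-3", "sean:read:/tmp/a", "audit", "succeeded"),
    ("201307241539-4", "sean:modified:/tmp/a", "audit", "failed"),
    ("201307241540-5", "sean:logged_out:host", "system", "succeeded"),
]

def scan(rows, authorizations):
    """Return only the rows whose visibility the authorizations satisfy."""
    return [row for row in rows if visible(row[2], authorizations)]
```

For example, `scan(rows, {"system"})` returns only the login and logout entries, while the `set_password` entry needs both `admin` and `audit`.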

VERSIONS REDUX

Recap

• Accumulo 1.3.x, 1.4.x, and 1.5.x all work with CDH3
• Accumulo 1.5.x should work with CDH4

Cloudera Support

• Naturally, Cloudera has tested and packaged Accumulo 1.5…
• But 1.5 is rather bleeding edge…
• So, we instead backported Hadoop 2.0 support from 1.5 onto 1.4.3

ECOSYSTEM INTEGRATION

Apache  Nutch  


Apache  Pig  


DEMO

NEXT STEPS

Recap

• What’s available today
  • Beta release of Accumulo 1.4.3 on CDH4.3
  • Beta release of Accumulo 1.4.3 Pig integration
• Semi-private beta
  • Contact me (joey@cloudera.com) if you’re interested in trying out the bits

Future Ideas (not promises ;)

• Cloudera Manager integration
• Flume integration
• Sqoop integration
• Hive integration
• Impala integration

What next?

• Download Hadoop!
  • CDH available at www.cloudera.com
  • Cloudera provides pre-loaded VMs
    • https://ccp.cloudera.com/display/SUPPORT/Cloudera+QuickStart+VM
• Reach out to me (joey@cloudera.com) if you want to try out the Accumulo beta
• Instructions to replicate the demos pending

My personal preference

• Cloudera Manager
  • https://ccp.cloudera.com/display/SUPPORT/Downloads
  • Free up to unlimited nodes!

Shout Out

• Jason Trost
  • @jason_trost
  • covert.io blog posts
    • http://www.covert.io/post/18414889381/accumulo-nutch-and-gora
    • http://www.covert.io/post/18605091231/accumulo-and-pig

Questions?

• Contact me!
  • Joey Echeverria
  • joey@cloudera.com
  • @fwiffo
• We’re hiring!
