EVALUATING AND DEPLOYING SQL-ON-HADOOP TOOLS

   

Bay Area Big Data Meetup, March 24, 2016
Shreyas Subramanya, @shreyas_subra

 

Big Data / Hadoop Ecosystem

Hadoop / Spark distribution vendors
• Hortonworks
• Cloudera
• IBM
• MapR
• Pivotal
• Databricks (Spark)

Other categories: Other Apache Projects, SQL Solutions, Third Party Apps

Evaluating SQL-on-Hadoop Tools

1. Differentiators
2. Relevant features
3. Performance
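
For the performance criterion, one possible approach is to time the same query against each candidate engine and compare medians. The sketch below is only an illustration: the function names are made up for this example, and the in-memory SQLite runner is a stand-in (an assumption, not a SQL-on-Hadoop tool) so that the harness runs anywhere. In practice each runner would wrap a client for the tool under test.

```python
import sqlite3
import statistics
import time
from typing import Callable, Dict, List

# A "runner" is any callable that executes a SQL string on one engine.
QueryRunner = Callable[[str], object]


def time_query(run_query: QueryRunner, sql: str, repeats: int = 5) -> float:
    """Return the median wall-clock seconds over `repeats` runs of `sql`."""
    samples: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query(sql)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def make_sqlite_runner() -> QueryRunner:
    """Stand-in engine (assumption): in-memory SQLite, NOT a Hadoop tool."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?)",
        [(i, i * 0.5) for i in range(10_000)],
    )
    return lambda sql: conn.execute(sql).fetchall()


def compare(runners: Dict[str, QueryRunner], sql: str) -> Dict[str, float]:
    """Time the same query on every engine; results are keyed by engine name."""
    return {name: time_query(run, sql) for name, run in runners.items()}


if __name__ == "__main__":
    results = compare(
        {"sqlite-stand-in": make_sqlite_runner()},
        "SELECT COUNT(*), SUM(amount) FROM events",
    )
    # Fastest engine first.
    for name, seconds in sorted(results.items(), key=lambda kv: kv[1]):
        print(f"{name}: {seconds * 1000:.2f} ms (median)")
```

Using the median rather than the mean keeps a single slow warm-up run from dominating the comparison.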


Trying Out Different Tools

Typical Approaches and Options
1. Download virtual machines
2. Manually install on physical machines (or on EC2)
3. Use cloud-based trial versions from vendors

Each Option Has its Own Challenges

 

Typical Approaches and Options
• Download virtual machines
• Manually install on physical machines (or on EC2)
• Use cloud-based trial versions from vendors

Challenges and Pain Points
• Need a fairly beefy laptop to try multiple products
• VM images can be huge
• Manual installation steps could be tricky with dependency management and lack of hardware
• Reusability and portability (moving to production)
• Security, cost, and scaling

A DevOps Model for Agility & Scale

• Performance & Scale
• Integrated Environment
• Multi-Node Cluster
• Single Container

Docker Containers

• Each Docker image is a package of a complete runtime environment, including your software, libraries and other tools
• Docker containers run as separate processes in user space on the host, sharing the kernel
• Eliminates environment inconsistencies
• Easy distribution, reduces development-to-deployment times

Source: https://www.docker.com/what-docker

Diagram: Virtual machines vs. Containers

Single Docker Container

• Install Docker Toolbox for your Mac or Linux machine
• Download a Docker image from Docker Hub
• Or build your own Docker image with a simple set of instructions
• Run it!
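
The steps above map onto the `docker pull`, `docker build`, and `docker run` commands. As a rough sketch, the helpers below only assemble those command lines so they can be inspected before anything is executed. The helper names, the image name `example/drill-embedded:latest`, and the container name `drill-demo` are placeholders invented for this illustration, not images or names from the talk.

```python
import shlex
from typing import List


def docker_pull_cmd(image: str) -> List[str]:
    """argv for downloading an image from a registry such as Docker Hub."""
    return ["docker", "pull", image]


def docker_build_cmd(tag: str, context_dir: str = ".") -> List[str]:
    """argv for building an image from the Dockerfile found in `context_dir`."""
    return ["docker", "build", "-t", tag, context_dir]


def docker_run_cmd(image: str, name: str, interactive: bool = True) -> List[str]:
    """argv for starting a container that is removed again when it exits."""
    argv = ["docker", "run", "--rm", "--name", name]
    if interactive:
        argv.append("-it")  # attach a terminal, e.g. for an interactive SQL shell
    argv.append(image)
    return argv


if __name__ == "__main__":
    image = "example/drill-embedded:latest"  # hypothetical image name
    for argv in (docker_pull_cmd(image), docker_run_cmd(image, "drill-demo")):
        print(shlex.join(argv))
```

To actually run a command, the argv list could be passed to `subprocess.run(argv, check=True)` on a machine where Docker is installed.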

Example: Drill Embedded (Demo)
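
The demo itself is not captured in this transcript. As a stand-in, here is a minimal sketch of how a query might be sent to a locally running Drill instance through its REST endpoint. It assumes Drill's web server is reachable on its default port 8047 on localhost and that the bundled `cp.`employee.json`` sample data is available; the function names are made up for this example.

```python
import json
import urllib.request
from typing import Any, Dict

# Assumption: Drill's web server listens on its default port on this machine.
DRILL_URL = "http://localhost:8047/query.json"


def build_drill_request(sql: str, url: str = DRILL_URL) -> urllib.request.Request:
    """Build the POST request that carries one SQL query as a JSON body."""
    payload = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_drill_query(sql: str, url: str = DRILL_URL, timeout: float = 30.0) -> Dict[str, Any]:
    """Send the query and return the decoded JSON response."""
    with urllib.request.urlopen(build_drill_request(sql, url), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    sql = "SELECT full_name, position_title FROM cp.`employee.json` LIMIT 5"
    try:
        result = run_drill_query(sql)
    except OSError as exc:  # URLError and connection failures derive from OSError
        print(f"Drill not reachable: {exc}")
    else:
        for row in result.get("rows", []):
            print(row)
```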

Multinode-multiuser systems

• Clustering and orchestration
• Resource, container management
• Networking
• Storage
• Application management (versioning, upgrades)
• Template (infrastructure as code)
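
To make the "template (infrastructure as code)" item concrete, here is one way a cluster description could be held as versioned data and expanded into per-node specs. This is a sketch only: the class names, fields, and the image name `example/sql-engine:1.0` are all hypothetical and do not reflect the format of any of the frameworks or products named in this talk.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class NodeRole:
    """One kind of node in the cluster (hypothetical schema)."""
    name: str
    image: str       # container image every node of this role runs
    count: int       # how many nodes of this role to create
    memory_gb: int   # memory reserved per node


@dataclass(frozen=True)
class ClusterTemplate:
    """A versioned cluster description that can live in source control."""
    name: str
    version: str
    roles: List[NodeRole]

    def expand(self) -> List[Dict[str, object]]:
        """Turn the template into one concrete spec per node."""
        nodes: List[Dict[str, object]] = []
        for role in self.roles:
            for index in range(role.count):
                nodes.append({
                    "hostname": f"{self.name}-{role.name}-{index}",
                    "image": role.image,
                    "memory_gb": role.memory_gb,
                })
        return nodes

    def total_memory_gb(self) -> int:
        """Memory the whole cluster asks the resource manager for."""
        return sum(role.count * role.memory_gb for role in self.roles)


if __name__ == "__main__":
    template = ClusterTemplate(
        name="sqleval",
        version="1.0",
        roles=[
            NodeRole("master", "example/sql-engine:1.0", count=1, memory_gb=8),
            NodeRole("worker", "example/sql-engine:1.0", count=3, memory_gb=16),
        ],
    )
    for node in template.expand():
        print(node)
    print("total memory (GB):", template.total_memory_gb())
```

Keeping the template as plain data means upgrades become a version bump in the template rather than manual changes on each node.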

Clustering frameworks

• Puppet, Chef, Ansible (orchestration)
• Docker Swarm
• Kubernetes
• Mesos
• Amazon ECS (CloudFormation)
• BlueData

Simplifying Big Data Deployment

BlueData EPIC Software Platform
• IOBoost™ - Extreme performance and scalability
• ElasticPlane™ - Self-service, multi-tenant clusters
• DataTap™ - In-place access to enterprise data stores

Diagram labels:
• Marketing, R&D, Sales, Manufacturing, Support
• BI/Analytics Tools
• Local HDFS, Remote HDFS, NFS, Gluster, CEPH, Object Store

Deploying SQL-on-Hadoop Tools

• Amazon-like environment on-premises
• Many big data applications are available out of the box
• Bring your own apps

Single Physical Node (Demo)

Running Different SQL Tools (Demo)
• Spark SQL
• Drill
• Impala

Automated orchestration in BlueData

• Docker image
• Deployment specification (metadata)
• Glue scripts
• Registration

Authoring in BlueData

Key Takeaways and Next Steps

• Evaluating SQL-on-Hadoop tools can be challenging
• Docker containers can help simplify deployment
• BlueData enables a DevOps model for Big Data apps

✓ Spin up instant clusters using Docker images
✓ Evaluate multiple Big Data tools and frameworks
✓ Multi-tenant deployment, from dev/test to prod
✓ Enterprise-grade security, scalability, performance

THANK YOU

www.bluedata.com
Try BlueData EPIC Lite for Free: bluedata.com/free