EVALUATING AND DEPLOYING SQL-ON-HADOOP TOOLS
Bay Area Big Data Meetup March 24, 2016 Shreyas Subramanya @shreyas_subra
Big Data / Hadoop Ecosystem
Hadoop / Spark distribution vendors
• Hortonworks • Cloudera • IBM • MapR • Pivotal • Databricks (Spark)
Other Apache Projects SQL Solutions Third Party Apps
Evaluating SQL-on-Hadoop Tools
1. Differentiators 2. Relevant features 3. Performance
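For the performance criterion, one simple way to compare tools is to time the same query against each one several times and compare medians. The sketch below is only an illustration: `time_query` and `compare_tools` are hypothetical helper names, and each runner is assumed to be a callable you supply that submits the query to one tool and waits for the result.

```python
import statistics
import time
from typing import Callable, Dict, List, Tuple


def time_query(run_query: Callable[[], object], repeats: int = 5) -> float:
    """Return the median wall-clock seconds over `repeats` runs of run_query."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    samples: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()  # hypothetical callable: submits one query and blocks until done
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def compare_tools(
    runners: Dict[str, Callable[[], object]], repeats: int = 5
) -> List[Tuple[str, float]]:
    """Time each tool's runner; return (name, median_seconds), fastest first."""
    results = [(name, time_query(run, repeats)) for name, run in runners.items()]
    return sorted(results, key=lambda pair: pair[1])
```

The median is used rather than the mean so that a single cold-cache run does not dominate the comparison.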
SQL
Trying Out Different Tools
Typical Approaches and Options
1. Download virtual machines
2. Manually install on physical machines (or on EC2)
3. Use cloud-based trial versions from vendors
Each Option Has its Own Challenges
Typical Approaches and Options
• Download virtual machines
• Manually install on physical machines (or on EC2)
• Use cloud-based trial versions from vendors
Challenges and Pain Points
• Need a fairly beefy laptop to try multiple products
• VM images can be huge
• Manual installation steps could be tricky with dependency management and lack of hardware
• Reusability and portability (moving to production)
• Security, cost, and scaling
A DevOps Model for Agility & Scale
Performance & Scale
Integrated Environment
Multi-Node Cluster
Single Container
Docker Containers
• Each Docker image is a package of a complete runtime environment, including your software, libraries and other tools
• Docker containers run as separate processes in user space on the host, sharing the kernel
• Eliminates environment inconsistencies
• Easy distribution, reduces development-to-deployment times
Source: https://www.docker.com/what-docker
Virtual machines Containers
Single Docker Container
• Install Docker Toolbox for your Mac or Linux machine
• Download a Docker image from Docker Hub
• Or build your own Docker image with a simple set of instructions
• Run it!
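The download and run steps above can be sketched as a small helper that assembles the `docker pull` and `docker run` command lines. This is a minimal sketch, not the demo's actual commands: the image name `apache/drill`, the container name, and the `/bin/bash` command are assumptions, and the helper function names are hypothetical. Port 8047 is Drill's default web port.

```python
import shlex
from typing import Dict, List, Optional


def docker_pull_command(image: str) -> List[str]:
    """Build the argv that downloads an image from Docker Hub."""
    return ["docker", "pull", image]


def docker_run_command(
    image: str,
    name: Optional[str] = None,
    ports: Optional[Dict[int, int]] = None,
    command: Optional[List[str]] = None,
    interactive: bool = True,
) -> List[str]:
    """Build the argv for `docker run`; the container is removed on exit (--rm)."""
    argv = ["docker", "run", "--rm"]
    if interactive:
        argv += ["-i", "-t"]  # keep stdin open and allocate a terminal
    if name:
        argv += ["--name", name]
    for host_port, container_port in sorted((ports or {}).items()):
        argv += ["-p", f"{host_port}:{container_port}"]  # host:container mapping
    argv.append(image)
    argv += list(command or [])
    return argv


if __name__ == "__main__":
    image = "apache/drill"  # assumption: substitute the image you want to try
    print(shlex.join(docker_pull_command(image)))
    print(shlex.join(docker_run_command(
        image, name="drill", ports={8047: 8047}, command=["/bin/bash"])))
```

Printing the commands instead of executing them keeps the sketch safe to run on a machine without Docker; pass the same lists to `subprocess.run` to execute them.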
Example: Drill Embedded (Demo)
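Once an embedded Drill instance is running, queries can be submitted over its REST interface as well as from the shell. The sketch below is an assumption about how one might do that, not a record of the demo: it assumes Drill is reachable at `localhost:8047` (the default web port), and the helper names `build_query_request` and `run_query` are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict

# Assumption: a local embedded Drill with its web port published on 8047.
DRILL_URL = "http://localhost:8047/query.json"


def build_query_request(sql: str, url: str = DRILL_URL) -> urllib.request.Request:
    """Build a POST request carrying one SQL statement for Drill's REST API."""
    payload = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(sql: str, url: str = DRILL_URL, timeout: float = 30.0) -> Dict[str, Any]:
    """Send the query and return the decoded JSON response."""
    request = build_query_request(sql, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, ``run_query("SELECT full_name, salary FROM cp.`employee.json` LIMIT 5")`` queries the sample employee file that ships on Drill's classpath; the matching records are returned under the `rows` key of the response.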
A DevOps Model for Agility & Scale
Performance & Scale
Integrated Environment
Multi-Node Cluster
Single Container
Multinode-multiuser systems
• Clustering and orchestration
• Resource, container management
• Networking
• Storage
• Application management (versioning, upgrades)
• Template (infrastructure as code)
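To illustrate the template (infrastructure as code) idea, the sketch below describes a cluster as data and expands it into one definition per container. Everything here is hypothetical: `RoleTemplate`, `expand_cluster`, the field names, and the image name are invented for illustration and do not correspond to any particular framework's format.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class RoleTemplate:
    """Hypothetical description of one role in a cluster template."""
    image: str      # container image to run for this role
    count: int      # number of containers of this role
    memory_gb: int  # memory to reserve per container


def expand_cluster(cluster_name: str, roles: Dict[str, RoleTemplate]) -> List[dict]:
    """Expand a role-level template into one node definition per container."""
    nodes: List[dict] = []
    for role, template in roles.items():
        if template.count < 1:
            raise ValueError(f"role {role!r} needs at least one node")
        for index in range(template.count):
            nodes.append({
                "hostname": f"{cluster_name}-{role}-{index}",
                "role": role,
                "image": template.image,
                "memory_gb": template.memory_gb,
            })
    return nodes


if __name__ == "__main__":
    template = {
        "master": RoleTemplate(image="example/sql-engine:1.0", count=1, memory_gb=8),
        "worker": RoleTemplate(image="example/sql-engine:1.0", count=3, memory_gb=16),
    }
    for node in expand_cluster("eval", template):
        print(node)
```

Keeping the template as plain data means the same definition can be versioned, reviewed, and reused from a laptop trial through to a larger cluster.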
Clustering frameworks
• Puppet, Chef, Ansible (orchestration)
• Docker Swarm
• Kubernetes
• Mesos
• Amazon ECS (CloudFormation)
• BlueData
Simplifying Big Data Deployment
IOBoost™ - Extreme performance and scalability
ElasticPlane™ - Self-service, multi-tenant clusters
DataTap™ - In-place access to enterprise data stores
BlueData EPIC Software Platform
Marketing R&D Sales Manufacturing Support
BI/Analytics Tools
NFS Gluster Object Store Remote HDFS CEPH
Local HDFS
Deploying SQL-on-Hadoop Tools
• Amazon-like environment on-premises
• Many big data applications are available out of the box
• Bring your own apps
Single Physical Node (Demo)
Running Different SQL Tools (Demo)
• Spark SQL
• Drill
• Impala
Automated orchestration in BlueData
• Docker image
• Deployment specification (metadata)
• Glue scripts
• Registration
Authoring in BlueData
Key Takeaways and Next Steps
• Evaluating SQL-on-Hadoop tools can be challenging
• Docker containers can help simplify deployment
• BlueData enables a DevOps model for Big Data apps
✓ Spin up instant clusters using Docker images
✓ Evaluate multiple Big Data tools and frameworks
✓ Multi-tenant deployment, from dev/test to prod
✓ Enterprise-grade security, scalability, performance
THANK YOU
www.bluedata.com Try BlueData EPIC Lite for Free: bluedata.com/free