Accelerating the Hadoop data stack with Apache Ignite, Spark and Bigtop
-
Author
in-memory-computing-summit -
Category
Technology
-
view
1.557 -
download
1
Embed Size (px)
Transcript of Accelerating the Hadoop data stack with Apache Ignite, Spark and Bigtop
Nikita IVANOV Founder, PMC
Konstantin BoudnikVP Open Source Development, WANdisco Member of the Apache Software FoundationAccelerating the Hadoop data stack with Apache Ignite, Spark and Bigtophttp://ignite.apache.org
Nikita IvanovCTO, GridGain SystemsMember of the PMC for Apache Ignitehttp://bigtop.apache.org
2014 GridGain Systems, Inc.Apache Bigtop primerA project, environment, and a philosophy to:Define and create software stacks (think Debian)Deploy and validate actual software in the real worldConfiguration managementGuarantees of consistency and compatibilityEmpirical vs. Rationaldon't rely on someone's hearsaydon't assume an environment: control it
Apache Bigdata stackBigtop is the cutting edge of Apache Bigdata stackDelivers:A ready data processing stackDev. env. for anyone to create their ownFramework for easy integration/deployment/validationIt works on my laptop isn't cool anymore0.x release series was focused on Hadoop ecosystem
Let's get serious about IMCBigtop boards more & more IMC(-like) componentsProvides transitional tech for legacy MR-based usersHDFS accelerationMR accelerationUses RAM as inter-component data mediaCrossing component boundaries w/o leaving RAMAdvanced clustering and service models
Connecting the stackBigtop Data Fabric Core:Works with HDFS/RDBMS/MR/Hive/Hbase/Spark/Storm/SQLCluster memory is a natural media to exchange dataA usecase:Kafka --> Data Fabric --> HBase --> Data Fabric --> SQL querying --> Spark --> A service Singlethon --> Data Fabric --> RDBMS/FS
Connecting the ...
Live DemoDeploy Apache Ignite (incubating)Run MR Pi on YARNRun same MR Pi on Apache Ignite:Only client config needs to be changedRun MR TeragenRun MR Terasort on YARN vs Ignite IGFSRun Spark queries w/ & w/o Ignite caching
Results reviewWorkloadWithout IgniteWith IgniteMapReduce:CPU-bound (Pi)MapReduceIO-bound:(TeraGen/Sort)Spark SQL
In-Memory Data FabricStrategic Approach to IMCSupports Applications of various types and languagesOpen Source Apache 2.0Simple Java APIs1 JAR DependencyHigh Performance & ScaleAutomatic Fault ToleranceManagement/MonitoringRuns on Commodity HardwareSupports existing & new data sourcesNo need to rip & replace
2014 GridGain Systems, Inc. 2014 GridGain Systems, Inc.Plug and Play installation10x to 100x AccelerationIn-Memory Native MapReduceIn-Process Data ColocationIgniteFS In-Memory File SystemRead-Through from HDFSWrite-Through to HDFS Sync and Async PersistenceIn-Memory Hadoop Accelerator
2014 GridGain Systems, Inc.In-Memory Hadoop Accelerator
Zero Code ChangeIn-Memory Native PerformanceUse existing MR codeUse existing Pig/Hive queriesNo Name NodeEager Push Scheduling
2014 GridGain Systems, Inc.Spark Integration IGFS & Shared RDD
2014 GridGain Systems, Inc.ANY Questions?Thank you for joining us. Follow the conversation.http://bigtop.apache.orghttp://ignite.apache.org
#apacheignite
#apachebigtopKonstantin BoudnikVP Open Source Development, WANdisco Member of the Apache Software Foundation
Nikita IvanovCTO, GridGain SystemsMember of the PMC for Apache Ignite
2014 GridGain Systems, Inc.