Accelerating the Hadoop data stack with Apache Ignite, Spark and Bigtop

of 13 /13
© 2014 GridGain Systems, Inc. KONSTANTIN BOUDNIK VP Open Source Development, WANdisco Member of the Apache Software Foundation Accelerating the Hadoop data stack with Apache Ignite, Spark and Bigtop http:/ /ignite.apache.or g NIKITA IVANOV CTO, GridGain Systems Member of the PMC for Apache Ignite http:/ / bigtop .apache.org

Embed Size (px)

Transcript of Accelerating the Hadoop data stack with Apache Ignite, Spark and Bigtop

Nikita IVANOV Founder, PMC

Konstantin BoudnikVP Open Source Development, WANdisco Member of the Apache Software FoundationAccelerating the Hadoop data stack with Apache Ignite, Spark and Bigtophttp://ignite.apache.org

Nikita IvanovCTO, GridGain SystemsMember of the PMC for Apache Ignitehttp://bigtop.apache.org

2014 GridGain Systems, Inc.Apache Bigtop primerA project, environment, and a philosophy to:Define and create software stacks (think Debian)Deploy and validate actual software in the real worldConfiguration managementGuarantees of consistency and compatibilityEmpirical vs. Rationaldon't rely on someone's hearsaydon't assume an environment: control it

Apache Bigdata stackBigtop is the cutting edge of Apache Bigdata stackDelivers:A ready data processing stackDev. env. for anyone to create their ownFramework for easy integration/deployment/validationIt works on my laptop isn't cool anymore0.x release series was focused on Hadoop ecosystem

Let's get serious about IMCBigtop boards more & more IMC(-like) componentsProvides transitional tech for legacy MR-based usersHDFS accelerationMR accelerationUses RAM as inter-component data mediaCrossing component boundaries w/o leaving RAMAdvanced clustering and service models

Connecting the stackBigtop Data Fabric Core:Works with HDFS/RDBMS/MR/Hive/Hbase/Spark/Storm/SQLCluster memory is a natural media to exchange dataA usecase:Kafka --> Data Fabric --> HBase --> Data Fabric --> SQL querying --> Spark --> A service Singlethon --> Data Fabric --> RDBMS/FS

Connecting the ...

Live DemoDeploy Apache Ignite (incubating)Run MR Pi on YARNRun same MR Pi on Apache Ignite:Only client config needs to be changedRun MR TeragenRun MR Terasort on YARN vs Ignite IGFSRun Spark queries w/ & w/o Ignite caching

Results reviewWorkloadWithout IgniteWith IgniteMapReduce:CPU-bound (Pi)MapReduceIO-bound:(TeraGen/Sort)Spark SQL

In-Memory Data FabricStrategic Approach to IMCSupports Applications of various types and languagesOpen Source Apache 2.0Simple Java APIs1 JAR DependencyHigh Performance & ScaleAutomatic Fault ToleranceManagement/MonitoringRuns on Commodity HardwareSupports existing & new data sourcesNo need to rip & replace

2014 GridGain Systems, Inc. 2014 GridGain Systems, Inc.Plug and Play installation10x to 100x AccelerationIn-Memory Native MapReduceIn-Process Data ColocationIgniteFS In-Memory File SystemRead-Through from HDFSWrite-Through to HDFS Sync and Async PersistenceIn-Memory Hadoop Accelerator

2014 GridGain Systems, Inc.In-Memory Hadoop Accelerator

Zero Code ChangeIn-Memory Native PerformanceUse existing MR codeUse existing Pig/Hive queriesNo Name NodeEager Push Scheduling

2014 GridGain Systems, Inc.Spark Integration IGFS & Shared RDD

2014 GridGain Systems, Inc.ANY Questions?Thank you for joining us. Follow the conversation.http://bigtop.apache.orghttp://ignite.apache.org

#apacheignite

#apachebigtopKonstantin BoudnikVP Open Source Development, WANdisco Member of the Apache Software Foundation

Nikita IvanovCTO, GridGain SystemsMember of the PMC for Apache Ignite

2014 GridGain Systems, Inc.