Hortonworks roadshow

Click here to load reader

  • date post

    10-May-2015
  • Category

    Documents

  • view

    1.355
  • download

    3

Embed Size (px)

Transcript of Hortonworks roadshow

  • 1.Big Business Value fromBig Data and Hadoop Hortonworks Inc. 2012 Page 1

2. Topics The Big Data Explosion: Hype or Reality Introduction to Apache Hadoop The Business Case for Big Data Page 2 Hortonworks Inc. 2012 3. Big Data: Hype or Reality? Hortonworks Inc. 2012Page 3 4. What is Big Data?What is Big Data?Page 4 Hortonworks Inc. 2012 5. Big Data: Changing The Game for Organizations Transactions + InteractionsPetabytesBIG DATA Mobile Web+ Observations SentimentUser Click StreamSMS/MMS = BIG DATA Speech to TextSocial Interactions & FeedsTerabytes WEBWeb logs Spatial & GPS Coordinates A/B testingSensors / RFID / DevicesBehavioral Targeting GigabytesCRM Business Data Feeds Dynamic Pricing Segmentation External DemographicsSearch Marketing Customer TouchesUser Generated ContentERP Megabytes Affiliate Networks Purchase detailSupport ContactsHD Video, Audio, Images Dynamic Funnels Purchase recordOffer detailsOffer historyProduct/Service Logs Payment recordIncreasing Data Variety and Complexity Hortonworks Inc. 2012 6. Next Generation Data Platform DriversOrganizations will need to become more data driven to compete Business Enable new business models & drive faster growth (20%+)Drivers Find insights for competitive advantage & optimal returnsData continues to grow exponentially TechnicalData is increasingly everywhere and in many formats DriversLegacy solutions unfit for new requirements growth FinancialCost of data systems, as % of IT spend, continues to grow DriversCost advantages of commodity hardware & open source Hortonworks Inc. 2012 7. What is a Data Driven Business? DEFINITIONBetter use of available data in the decision making process RULEKey metrics derived from data should be tied to goals PROVEN RESULTSFirms that adopt Data-Driven Decision Making have output andproductivity that is 5-6% higher than what would be expectedgiven their investments and usage of information technology*1110010100001010011101010100010010100100101001001000010010001001000001000100000100010010010001000010111000010010001000101001001011110101001000100100101001010010011111001010010100011111010001001010000010010001010010111101010011001001010010001000111* Strength in Numbers: How Does Data-Driven Decisionmaking Affect Firm Performance? Brynjolfsson, Hitt and Kim (April 22, 2011)Page 7 Hortonworks Inc. 2012 8. Path to Becoming Big Data Driven4 Key Considerations for a Data Driven Business1.Large web properties were born this way, you may have to adapt a strategy2.Start with a project tied to a key objective or KPI Dont OVER engineer3.Make sure your Big Data strategy fits your organization and grow it over time4.Dont do big data just to do big data you can get lost in all that data Simply put, because of big data, managers can measure, and hence know, radically more about their businesses, and directly translate that knowledge into improved decision making & performance.- Erik Brynjolfsson and Andrew McAfee Page 8 Hortonworks Inc. 2012 9. Wide Range of New TechnologiesIn Memory StorageNoSQL BigTable MPPIn Memory DBCloudHadoop HBasePig Apache MapReduceYARNScale-out StoragePage 9 Hortonworks Inc. 2012 10. A Few Definitions of New TechnologiesNoSQLA broad range of new database management technologies mostlyfocused on low latency access to large amounts of data.MPP or in database analyticsWidely used approach in data warehouses that incorporatesanalytic algorithms with the data and then distribute processingCloudThe use of computing resources that are delivered as a serviceover a networkHadoopOpen source data management software that combines massivelyparallel computing and highly-scalable distributed storage Page 10 Hortonworks Inc. 2012 11. Big Data: Its About Scale & Structure SCALE (storage & processing) STRUCTURETraditional MPP Hadoop EDW NoSQLDatabaseAnalytics Distribution Standards and structuredgovernance Loosely structured Limited, no data processing processing Processing coupled w/ data Structured data types Multi and unstructuredA JDBC connector to Hive does not make you big dataPage 11 Hortonworks Inc. 2012 12. Hadoop Market: Growing & Evolving Big data outranks virtualization as #1 trend driving spendingIndustry UnstructuredData, $6 initiatives Applications, Hadoop, $14 $12 Barclays CIO survey April 2012NoSQL, $2 Enterprise Overall market at $100B Applications, $5 Hadoop 2nd only to RDBMS in potential Storage for DB, $9 Estimates put market growth > 40% CAGR RDBMS, $27 Server for DB, IDC expects Big Data tech and services $12market to grow to $16.9B in 2015Data IntegrationBusiness According to JPMC 50% of Big Data& Quality, $4Intelligence, $13market will be influenced by Hadoop Source: Big Data II New Themes, New Opportunities,BofA Merrill Lynch, March 2012 Page 12 Hortonworks Inc. 2012 13. Hadoop: Poised for Rapid Growth Hadoop is in the chasm Ecosystem evolvingcustomers relative % Enterprises endure 1-3 year adoption cycleThe CHASMInnovators,Early Early Late majority,Laggards,technology adopters,majority, conservativesSkepticsenthusiasts visionaries pragmatiststimeCustomers wantCustomers wanttechnology & performancesolutions & convenience Source: Geoffrey Moore - Crossing the Chasm Page 13 Hortonworks Inc. 2012 14. Whats Needed to Drive Success? Enterprise tooling to become a complete data platform Open deployment & provisioning Higher quality data loading Monitoring and management APIs for easy integrationwww.hortonworks.com/moore Ecosystem needs support & development Existing infrastructure vendors need to continue to integrate Apps need to continue to be developed on this infrastructure Market needs to rally around core Apache Hadoop To avoid further splintering/market distraction To accelerate adoptionPage 14 Hortonworks Inc. 2012 15. What is Hadoop? Hortonworks Inc. 2012 Page 15 16. What is Apache Hadoop?Open source data management software that combines massively parallel computingand highly-scalable distributed storage to provide high performance computationacross large amounts of dataPage 16 Hortonworks Inc. 2012 17. Big Data Driven Means a New Data PlatformFacilitate data captureCollect data from all sources - structured and unstructured dataAt all speeds batch, asynchronous, streaming, real-timeComprehensive data processingTransform, refine, aggregate, analyze, reportSimplified data exchangeInteroperate with enterprise data systems for query and processingShare data with analytic applicationsData between analystEasy to operate platformProvision, monitor, diagnose, manage at scaleReliability, availability, affordability, scalability, interoperabilityPage 17 Hortonworks Inc. 2012 18. Enterprise Data Platform RequirementsA data platform is an integrated set of components thatallows you to capture, process and share data in anyformat, at scaleCapture Store and integrate data sources in real time and batch Process and manage data of any size and includes tools to Process sort, filter, summarize and apply basic functions to the dataExchange Open and promote the exchange of data with new & existing external applicationsOperate Includes tools to manage and operate and gain insight into performance and assure availability Page 18 Hortonworks Inc. 2012 19. Apache HadoopOpen Source data management with scale-out storage & processingProcessingStorageMap ReduceHDFS Distributed across nodes Splits a task across processors near the data & assembles results Natively redundant Self-Healing, High Bandwidth Name node tracks locationsClustered Storage Page 19 Hortonworks Inc. 2012 20. Apache Hadoop CharacteristicsScalable Efficiently store and process petabytes of data Linear scale driven by additional processing and storageReliable Redundant storage Failover across nodes and racksFlexible Store all types of data in any format Apply schema on analysis and sharing of the dataEconomical Use commodity hardware Open source software guards against vendor lock-inPage 20 Hortonworks Inc. 2012 21. Tale of the Tape RelationalVS.Hadoop Required on writeschemaRequired on read Reads are fastspeedWrites are fast Standards and structuredgovernance Loosely structured Limited, no data processing processing Processing coupled with dataStructured data types Multi and unstructuredInteractive OLAP AnalyticsData Discovery Complex ACID Transactions best fit use Processing unstructured dataOperational Data StoreMassive Storage/Processing Page 21 Hortonworks Inc. 2012 22. What is a Hadoop Distribution TalendWebHDFS SqoopFlumeA complimentary set HCatalogof open source HBasePig Hivetechnologies that MapReduceHDFSmake up a completeAmbariOozie HAdata platformZooKeeper Tested and pre-packaged to ease installation and usage Collects the right versions of the components that all have different release cycles and ensures they work together Hortonworks Inc. 2012 23. Apache Hadoop Related ProjectsTalend WebHDFSSqoopFlumeHCatalog HBase Capture Pig Hive MapReduceHDFS Ambari OozieHAZooKeeperProcessWebHDFSExchange REST API that supports the complete FileSystem interface for HDFS. Move data in and out and delete from HDFS Perform file and directory functionsOperate webhdfs://:/PATH Standard and included in version 1.0 of Hadoop Page 23 Hortonworks Inc. 2012 24. Apache Hadoop Related Projects Talend WebHDFS SqoopFlumeHCatalog HBase Capture Pig Hive MapReduceHDFSAmbariOozieHAZooKeeperProcessApache SqoopExchange Sqoop is a set of tools that allow non-Hadoop data stores to interact with traditional relational databases and data warehouses. A series of connectors have been created to support explicit sources such as Oracle & TeradataOperate It moves data in and out of Hadoop SQ-OOP: SQL to HadoopPage 24 Hortonworks Inc. 2012 25. Apache Hadoop Related Projects TalendWebHDFS Sqoop Flume HCatalogHBase CapturePig Hive MapReduce HDFS AmbariOozieHA ZooKeeperProcessApache FlumeExchange Distributed service for efficiently collecting, aggregating, and moving stre