Hortonworks Eric Baldeschwieler – CEO twitter: @jeric14 (@hortonworks) © Hortonworks Inc. 2011 Architecting the Future of Big Data June 29, 2011

Hortonworks

Eric Baldeschwieler – CEO
twitter: @jeric14 (@hortonworks)

© Hortonworks Inc. 2011

Architecting the Future of Big Data

June 29, 2011


About Hortonworks

• Mission: Revolutionize and commoditize the storage and processing of big data via open source

• Vision: Half of the world’s data will be stored in Apache Hadoop within five years

• Strategy: Grow the Apache Hadoop Ecosystem by making Apache Hadoop easier to consume, profit by providing training, support and certification

• An independent company
• Focused on making Apache Hadoop great
• Hold nothing back, Apache Hadoop will be complete


Credentials

• Technical: key architects and committers from Yahoo! Hadoop engineering team

−Highest concentration of Apache Hadoop committers
−Contributed >70% of the code in Hadoop, Pig and ZooKeeper
−Delivered every major/stable Apache Hadoop release since 0.1
−History of driving innovation across entire Apache Hadoop stack
−Experience managing world’s largest deployment

• Business operations: team of highly successful open source veterans

−Led by Rob Bearden, former COO of SpringSource & JBoss

• Investors: backed by Benchmark Capital and Yahoo!

−Benchmark was key investor in Red Hat, MySQL, SpringSource, Twitter & eBay


Hortonworks and Yahoo!

• Yahoo! is a development partner
−Leverage large Yahoo! development, testing & operations team
  −More than 1,000 active & sophisticated users of Apache Hadoop
  −Access to the Yahoo! grid for testing large workloads
  −Only organization that has delivered a stable release of Apache Hadoop
−Yahoo! will continue to contribute Apache Hadoop code too!

• Yahoo! is a customer
−Hortonworks provides level 3 support and training to Yahoo!
−Yahoo! deploys Apache Hadoop releases across its 42,000-node grid

• Yahoo! is an investor


Current State of Adoption

Enterprise Adoption

• Early adopters
• Technology is hard to install, manage & use
• Technology lacks enterprise robustness
• Requires significant investment in technical staff or consulting
• Hard to find & hire experienced developer & operations talent

Vendor Ecosystem Adoption

• Early in vendor adoption lifecycle
• Hadoop is hard to integrate and extend
• Hard to find & hire experienced developer & operations talent

Technology & Knowledge Gaps Prevent Apache Hadoop from Reaching Full Potential

Customers are asking their vendors for help with Hadoop!

“We’re seeing Hadoop in all of our Fortune 2000 data accounts”


Hortonworks Role & Opportunity

Vendor Ecosystem Adoption

Enterprise Adoption

Bridge the Gap! Grow Market

Sell training and support via Partners

Fundamental shift in enterprise data architecture strategy

• Apache Hadoop becomes standard for managing new types & scale of data
• New applications & solutions will be created to leverage data in Apache Hadoop
• Creates massive big data technology and services opportunity for ecosystem


Hortonworks Objectives

• Make Apache Hadoop projects easier to install, manage & use
−Regular sustaining releases
−Compiled code for each project (e.g. RPMs)
−Testing at scale

• Make Apache Hadoop more robust
−Performance gains
−High availability
−Administration & monitoring

• Make Apache Hadoop easier to integrate & extend
−Open APIs for extension & experimentation

All done within Apache Hadoop community

• Develop collaboratively with community
• Complete transparency
• All code contributed back to Apache

Anyone should be able to easily deploy the Hadoop projects directly from Apache


Technology Roadmap

Phase 1 – Making Apache Hadoop Accessible

• Release the most stable version of Hadoop ever
• Release directly usable code via Apache (RPMs, .debs…)
• Frequent sustaining releases off of the stable branches

2011

Phase 2 – Next Generation Apache Hadoop

• Address key product gaps (HBase support, HA, Management…)
• Enable community & partner innovation via modular architecture & open APIs
• Work with community to define integrated stack

2012 (Alphas starting Oct 2011)


Phase 2 - Next Generation Apache Hadoop

• Core
−HDFS Federation
−Next Gen MapReduce
−New Write Pipeline (HBase support)
−HA (no SPOF) and Wire compatibility

• Data - HCatalog 0.3
−Pig, Hive, MapReduce and Streaming as clients
−HDFS and HBase as storage systems
−Performance and storage improvements

• Management & Ease of use
−All components fully tested and deployable as a stack
−Stack installation and centralized config management
−REST and GUI for user tasks


Phase 2 – Core - MapReduce

• Complete rewrite of the resource management layer

• Performance and Scale improvements

• 6,000+ nodes / 100,000 concurrent tasks

• Supports better availability and fail-over

• Supports new frameworks beyond MapReduce

Phase 2 – Core – HDFS Federation

• Multiple independent Namenodes and Namespace Volumes in a cluster

− Scalability (6K nodes, 100K clients, 120PB disk), Workload isolation support
− Client side mount tables for Global Namespace

• Block storage as a generic shared storage service

− DataNodes store blocks for all Namespace volumes – no partitioning
− Non-HDFS namespaces (HBase, MR tmp and others) can share the same storage

[Figure: HDFS Federation. Namespace layer: Namenodes NN-1 … NN-k … NN-n serving namespaces NS1 … NS k … Foreign NS n. Block storage layer: Block Pools (Pool 1 … Pool k … Pool n) on Common Storage spread across Datanode 1, Datanode 2 … Datanode m, with a Balancer.]
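The "client side mount tables" idea can be illustrated with a small sketch. This is not Hadoop's implementation, only a toy model of the concept: the client holds a table mapping path prefixes to Namenodes and picks the longest matching prefix. The function name `resolve`, the table layout, and the hostnames are all hypothetical.

```python
from typing import Dict

# Hypothetical mount table: path prefix -> Namenode that owns that namespace volume.
MOUNT_TABLE: Dict[str, str] = {
    "/user": "nn1.example.com",
    "/data": "nn2.example.com",
    "/data/logs": "nn3.example.com",
}


def resolve(mount_table: Dict[str, str], path: str) -> str:
    """Return the Namenode for the longest mount point that covers `path`."""
    best = None
    for mount_point in mount_table:
        prefix = mount_point.rstrip("/")
        # Match whole path components only, so "/data" does not cover "/database".
        if path == prefix or path.startswith(prefix + "/"):
            if best is None or len(prefix) > len(best.rstrip("/")):
                best = mount_point
    if best is None:
        raise KeyError(f"no mount point covers {path}")
    return mount_table[best]


if __name__ == "__main__":
    for p in ("/user/alice/job.out", "/data/raw/part-0", "/data/logs/2011/06"):
        print(p, "->", resolve(MOUNT_TABLE, p))
```

Because the lookup happens entirely on the client, no Namenode in this model needs to know about the others, which is consistent with the "multiple independent Namenodes" point on the slide.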


Phase 2 – Core – HDFS Write Pipeline

• Limitations of HDFS write pipeline in 0.20
−Broken Flush, Sync, Append
−Node failures can cause data loss for slow writers

• Hadoop.Next
−Flush, Sync, and Append support
−New replicas are added dynamically on failures

[Figure: a Client writing through a pipeline of three DataNodes (DN), with Flush and Ack messages.]
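The point about new replicas being added dynamically on failures can be sketched with a toy in-memory model. It is an illustration under simplifying assumptions, not the HDFS protocol: the `DataNode` class, `write_packet`, and the replay-from-client recovery step are all hypothetical (the slides do not say how a new replica is brought up to date).

```python
from typing import List


class DataNode:
    """Toy stand-in for a DataNode holding the packets of one block."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.healthy = True
        self.packets: List[str] = []

    def receive(self, packet: str) -> None:
        if not self.healthy:
            raise OSError(f"{self.name} is down")
        self.packets.append(packet)


def write_packet(pipeline: List[DataNode], spares: List[DataNode],
                 packet: str, written: List[str]) -> None:
    """Send one packet to every node; swap in a spare for any node that fails."""
    for i, node in enumerate(pipeline):
        try:
            node.receive(packet)
        except OSError:
            if not spares:
                raise  # no replacement available, the write fails
            replacement = spares.pop(0)
            # Assumption: bring the new replica up to date by replaying
            # everything already written, then deliver the current packet.
            for old in written:
                replacement.receive(old)
            replacement.receive(packet)
            pipeline[i] = replacement
    written.append(packet)


# Demo: three-node pipeline, one spare, one failure between two writes.
dn1, dn2, dn3, dn4 = (DataNode(n) for n in ("dn1", "dn2", "dn3", "dn4"))
pipeline, spares, written = [dn1, dn2, dn3], [dn4], []
write_packet(pipeline, spares, "p1", written)
dn2.healthy = False
write_packet(pipeline, spares, "p2", written)
print([n.name for n in pipeline])
```

In this model a slow writer keeps its replica count after a failure, in contrast to the 0.20 behaviour listed above, where the pipeline would simply shrink.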


Phase 2 – Data – HCatalog

[Figure: HCatalog sits between the clients (MapReduce, Pig, Hive, Streaming) and the storage systems (HDFS, HBase); a legend marks components as Phase 1 or Phase 2.]

• Shared schema and data model
• Data can be shared between tool users
• Data located by table rather than file
• Clients independent of storage details
−format, compression, …
• Only one adaptor for new formats
−not one per tool
• Notifications when new data is available
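A minimal sketch of "data located by table rather than file" and "only one adaptor for new formats" follows. It does not use the HCatalog API; the `Catalog` class, `TableInfo`, the `ADAPTORS` registry, and the in-memory `STORAGE` dictionary standing in for HDFS are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class TableInfo:
    location: str          # where the data lives (hidden from clients)
    storage_format: str    # e.g. "tsv" or "csv"
    columns: List[str]     # shared schema


class Catalog:
    """Toy table registry: clients ask for a table name, not a file path."""

    def __init__(self) -> None:
        self._tables: Dict[str, TableInfo] = {}

    def register(self, name: str, info: TableInfo) -> None:
        self._tables[name] = info

    def lookup(self, name: str) -> TableInfo:
        return self._tables[name]


# One adaptor per storage format, shared by every client tool.
ADAPTORS: Dict[str, Callable[[str], List[str]]] = {
    "tsv": lambda line: line.split("\t"),
    "csv": lambda line: line.split(","),
}

# Stand-in for the storage system: location -> raw lines.
STORAGE: Dict[str, List[str]] = {
    "/warehouse/clicks/part-0": ["alice\t3", "bob\t5"],
}


def read_table(catalog: Catalog, name: str) -> List[Dict[str, str]]:
    """Read rows by table name; format and location come from the catalog."""
    info = catalog.lookup(name)
    parse = ADAPTORS[info.storage_format]
    return [dict(zip(info.columns, parse(line))) for line in STORAGE[info.location]]


catalog = Catalog()
catalog.register("clicks", TableInfo("/warehouse/clicks/part-0", "tsv", ["user", "count"]))
rows = read_table(catalog, "clicks")
print(rows)
```

Supporting a new format in this sketch means adding one entry to `ADAPTORS`; callers of `read_table` do not change, which mirrors the "not one per tool" point.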


Confidential Information

Hortonworks Value

For Enterprises

• Make Apache Hadoop easier to consume

• Extend to broader developer audience

• Foster vibrant technology and services ecosystem

• Access to Hortonworks’ technical expertise

For Vendors

• Create larger market for Apache Hadoop technology and services

• Simplify process for supporting Hadoop

• Access to Hortonworks’ technical expertise

For Community

• Ensure Apache Hadoop remains unified and strong

• Expand value provided by core Apache projects

• Foster additional participation & contributions from ecosystem


Hortonworks Differentiation

• Unmatched domain expertise
−Delivered every major release of Apache Hadoop to date
−Critical mass of committers

• Community leadership role
−Setting direction for core projects

• Yahoo! commitment and backing
−Access to 1,000+ Hadoop engineers, Yahoo! grid

• Absolute dedication to Apache & open source
−Focused on making Apache Hadoop the standard

• Focus on delivering significant value to technology vendors
−ISVs, OEMs, Systems Integrators and other service providers


Thank You.
