Bigger Data For Your Budget
Transcript of Bigger Data For Your Budget
V Dave Porter
Dave Porter – SproutCore Architect, [email protected]
Bigger Data For Your Budget
CANADIAN HEADQUARTERS: 152 West Hastings Street, Vancouver BC, V6B 1G8
UNITED STATES OFFICE: 3414 Peachtree Road, #1600, Atlanta Georgia, 30326-1164
UNITED KINGDOM OFFICE: 3000 Hillswood Drive, Hillswood Business Park, Chertsey KT16 0RS, UK
How to turn your Big Data into Big Insights without breaking the bank
John Kreisa, VP Marketing, Hortonworks
Dave Porter, SproutCore Architect, Appnovation Technologies
Speakers
Appnovation is one of the world’s TOP OPEN SOURCE DEVELOPMENT SHOPS.
LOCATIONS
VANCOUVER OFFICE: 152 West Hastings Street, Vancouver BC, V6B 1G8
ATLANTA OFFICE: 3414 Peachtree Road, #1600, Atlanta Georgia, 30326-1164
LONDON OFFICE: 3000 Hillswood Drive, Hillswood Business Park, Chertsey KT16 0RS, UK
Bigger Data For Your Budget
Databases
Server logs
Raw transactional data
Human-Quality Input
WHAT IS BIG DATA?
Website Traffic Patterns
Financial Transactions
Science
People
WHERE IS IT COMING FROM?
Curing Cancer
Beating XDR-TB
Finding Earth 2.0 in Outer Space
Seeing Deeper Into Your Business
THE PROMISE OF BIG DATA
Retail Inventory System
WHAT CAN BIG DATA DO FOR ME?
Retail Inventory System
Overnight Batch Cycle
WHAT CAN BIG DATA DO FOR ME?
Retail Inventory System
Hourly Cycle
WHAT CAN BIG DATA DO FOR ME?
Collecting & Storing
Processing & Analyzing
THE BIG DATA CHALLENGES
Collecting & Storing…on expensive hardware
Processing & Analyzing…with expensive software
THE BIG DATA CHALLENGES
Bigger Data For Your Budget
Open Source Software,
Running on Commodity Hardware.
BIGGER DATA FOR YOUR BUDGET
Gnomes … with flashlights (and notepads)
HADOOP: BIGGER DATA FOR YOUR BUDGET
© Hortonworks Inc. 2013
A Brief History of Apache Hadoop
Timeline: 2004, 2006, 2008, 2010, 2012, 2013
Focus on INNOVATION. 2005: Yahoo! creates team under E14 to work on Hadoop
Focus on OPERATIONS. 2008: Yahoo team extends focus to operations to support multiple projects & growing clusters
STABILITY. 2011: Hortonworks created to focus on "Enterprise Hadoop". Starts with 24 key Hadoop engineers from Yahoo
Milestones: Apache Project Established; Yahoo! begins to operate at scale; Enterprise Hadoop; Hortonworks Data Platform
Hortonworks Snapshot
• We distribute the only 100% Open Source Enterprise Hadoop Distribution: Hortonworks Data Platform
• We engineer, test & certify HDP for enterprise usage
• We employ the core architects, builders and operators of Apache Hadoop
• We drive innovation within Apache Software Foundation projects
• We are uniquely positioned to deliver the highest quality of Hadoop support
• We enable the ecosystem to work better with Hadoop
Develop Distribute Support
We develop, distribute and support the ONLY 100% open source Enterprise Hadoop distribution
Endorsed by Strategic Partners
Headquarters: Palo Alto, CA
Employees: 180+ and growing
Investors: Benchmark, Index, Yahoo
Hortonworks Process for Enterprise Hadoop
Upstream Community Projects: Apache Hadoop, Apache Pig, Apache Hive, Apache HBase, Apache HCatalog, Apache Ambari, Other Apache Projects
Upstream cycle: Design & Develop, Test & Patch, Release
Downstream Enterprise Product: Hortonworks Data Platform
Downstream cycle: Design & Develop, Distribute, Integrate & Test, Package & Certify
Between upstream and downstream: Stable Project Releases, Fixed Issues
No Lock-in: Integrated, tested & certified distribution lowers risk by ensuring close alignment with Apache projects
Virtuous cycle: development & issue fixes are done upstream, and stable project releases flow downstream
Enhancing the Core of Apache Hadoop
Deliver high-scale storage & processing with enterprise-ready platform services
Unique Focus Areas:
• Bigger, faster, more flexible: Continued focus on speed & scale and enabling near-real-time apps
• Tested & certified at scale: Run ~1300 system tests on large Yahoo clusters for every release
• Enterprise-ready services: High availability, disaster recovery, snapshots, security, …
Hortonworkers are the architects, operators, and builders of core Hadoop
HADOOP CORE: Distributed Storage & Processing
PLATFORM SERVICES: Enterprise Readiness
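The slide names "Distributed Storage & Processing" without showing what the processing looks like. Hadoop's classic processing model is MapReduce, so the sketch below walks through the map, shuffle, and reduce steps in a single Python process. It is an illustration only and does not use Hadoop's API; the function names and the sample blocks are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(line: str) -> List[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework would between phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(blocks: List[List[str]]) -> Dict[str, int]:
    """Run map over every block, then shuffle and reduce the results.

    Each inner list stands in for a block of data held on one node
    (hypothetical layout, chosen for illustration).
    """
    pairs: List[Tuple[str, int]] = []
    for block in blocks:
        for line in block:
            pairs.extend(map_phase(line))
    return reduce_phase(shuffle(pairs))


if __name__ == "__main__":
    sample_blocks = [["big data big insights"], ["big budget"]]
    print(word_count(sample_blocks))
```

On a real cluster the map calls run in parallel on the machines that already hold each block, which is what lets the same logic scale out on commodity hardware.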
Data Services for Full Data Lifecycle
Provide data services to store, process & access data in many ways
Unique Focus Areas:
• Apache HCatalog: Metadata services for consistent table access to Hadoop data
• Apache Hive: Explore & process Hadoop data via SQL & ODBC-compliant BI tools
Hortonworks enables Hadoop data to be accessed via existing tools & systems
DATA SERVICES: Store, Process and Access Data
HADOOP CORE: Distributed Storage & Processing
PLATFORM SERVICES: Enterprise Readiness
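To make "explore & process Hadoop data via SQL" concrete, the sketch below runs the kind of aggregate query an analyst might send to Hive. It uses Python's built-in sqlite3 module purely as a local stand-in so the example runs anywhere; it does not connect to Hive, and Hive's SQL dialect differs in places. The table name, columns, and sample rows are assumptions made for this example.

```python
import sqlite3
from typing import List, Tuple


def top_pages(rows: List[Tuple[str, str]], limit: int = 2) -> List[Tuple[str, int]]:
    """Return the most-visited pages from (page, visitor) log rows.

    sqlite3 stands in for a SQL-on-Hadoop engine here; the schema is hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE web_logs (page TEXT, visitor TEXT)")
    conn.executemany("INSERT INTO web_logs VALUES (?, ?)", rows)
    query = """
        SELECT page, COUNT(*) AS hits
        FROM web_logs
        GROUP BY page
        ORDER BY hits DESC, page ASC
        LIMIT ?
    """
    result = conn.execute(query, (limit,)).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    sample_rows = [
        ("/home", "a"), ("/home", "b"), ("/cart", "a"),
        ("/help", "c"), ("/cart", "d"), ("/home", "c"),
    ]
    for page, hits in top_pages(sample_rows):
        print(page, hits)
```

The point of the slide is that the query text is ordinary SQL, so existing BI tools and analyst skills carry over to data stored in Hadoop.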
Operational Services for Ease of Use
Include complete operational services for productive operations & management
Unique Focus Area:
• Apache Ambari: Provision, manage & monitor a cluster; complete REST APIs to integrate with existing operational tools; job & task visualizer to diagnose issues
Only Hortonworks provides a complete open source Hadoop management tool
OPERATIONAL SERVICES: Manage & Operate at Scale
DATA SERVICES: Store, Process and Access Data
HADOOP CORE: Distributed Storage & Processing
PLATFORM SERVICES: Enterprise Readiness
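As a rough idea of how an existing operational tool might call Ambari's REST API, the sketch below builds (but does not send) a request for one cluster's details. Ambari serves its API under /api/v1 and uses HTTP basic authentication; the host name, cluster name, credentials, and port used here are placeholders assumed for this example, and a real deployment may differ.

```python
import base64
import urllib.request


def build_cluster_request(
    host: str, cluster: str, user: str, password: str, port: int = 8080
) -> urllib.request.Request:
    """Build a GET request for a cluster resource on an Ambari server.

    All argument values are supplied by the caller; nothing is sent here.
    """
    url = f"http://{host}:{port}/api/v1/clusters/{cluster}"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Basic {token}")
    return request


if __name__ == "__main__":
    # Hypothetical host, cluster, and credentials, for illustration only.
    req = build_cluster_request("ambari.example.com", "demo", "admin", "secret")
    print(req.get_method(), req.full_url)
    # Sending it would be: urllib.request.urlopen(req), which returns JSON.
```

A monitoring script could send such a request on a schedule and feed the JSON response into whatever dashboard or alerting system is already in place.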
Deployable Across a Range of Options
Only Hortonworks allows you to deploy seamlessly across any deployment option
• Linux & Windows
• Azure, Rackspace & other clouds
• Virtual platforms
• Big data appliances
HORTONWORKS DATA PLATFORM (HDP)
OPERATIONAL SERVICES: Manage & Operate at Scale
DATA SERVICES: Store, Process and Access Data
HADOOP CORE: Distributed Storage & Processing
PLATFORM SERVICES: Enterprise Readiness
Deployment options: OS, Cloud, VM, Appliance
HDP: Enterprise Hadoop Distribution
Hortonworks Data Platform (HDP): Enterprise Hadoop
• The ONLY 100% open source and complete distribution
• Enterprise grade, proven and tested at scale
• Ecosystem endorsed to ensure interoperability
HORTONWORKS DATA PLATFORM (HDP)
OPERATIONAL SERVICES: Manage & Operate at Scale
DATA SERVICES: Store, Process and Access Data
HADOOP CORE: Distributed Storage & Processing
PLATFORM SERVICES: Enterprise Readiness
Deployment options: OS, Cloud, VM, Appliance
Existing Data Architecture
APPLICATIONS: Business Analytics, Custom Applications, Enterprise Applications
DATA SYSTEMS: TRADITIONAL REPOS (RDBMS, EDW, MPP)
DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); OLTP, POS SYSTEMS
DEV & DATA TOOLS: BUILD & TEST
OPERATIONAL TOOLS: MANAGE & MONITOR
An Emerging Data Architecture
APPLICATIONS: Business Analytics, Custom Applications, Enterprise Applications
DATA SYSTEMS: TRADITIONAL REPOS (RDBMS, EDW, MPP); HORTONWORKS DATA PLATFORM
DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); New Sources (web logs, email, sensor data, social media); OLTP, POS SYSTEMS; MOBILE DATA
DEV & DATA TOOLS: BUILD & TEST
OPERATIONAL TOOLS: MANAGE & MONITOR
Interoperating With Your Tools
APPLICATIONS
DATA SYSTEMS: TRADITIONAL REPOS; HORTONWORKS DATA PLATFORM
DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); New Sources (web logs, email, sensor data, social media); OLTP, POS SYSTEMS; MOBILE DATA
DEV & DATA TOOLS
OPERATIONAL TOOLS
Tools shown: Viewpoint, Microsoft Applications
Hadoop Patterns of Use
Big Data: Transactions, Interactions, Observations
HORTONWORKS DATA PLATFORM
Refine, Explore, Enrich
Business Case
Operational Data Refinery
Refine, Explore, Enrich
Collect data and apply a known algorithm to it in trusted operational process
1. Capture: Capture all data
2. Process: Parse, cleanse, apply structure & transform
3. Exchange: Push to existing data warehouse for use with existing analytic tools
DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); New Sources (web logs, email, sensor data, social media)
DATA SYSTEMS: HORTONWORKS DATA PLATFORM; TRADITIONAL REPOS (RDBMS, EDW, MPP)
APPLICATIONS: Business Analytics, Custom Applications, Enterprise Applications
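The three refinery steps (capture, process, exchange) can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not how HDP implements the pattern: the log format, field names, and the choice of CSV as the hand-off format to the warehouse are all invented for this example.

```python
import csv
import io
from typing import Dict, Iterable, List, Optional

# Hypothetical raw point-of-sale log lines: timestamp, store, SKU, quantity.
RAW_LOG = [
    "2013-05-01T10:00:00 store-12 SKU-1 3",
    "garbage line",
    "2013-05-01T10:05:00 store-12 SKU-2 1",
]

FIELDS = ["timestamp", "store", "sku", "quantity"]


def capture(source: Iterable[str]) -> List[str]:
    """Step 1: keep every raw line, including ones that may not parse."""
    return list(source)


def parse(line: str) -> Optional[Dict[str, object]]:
    """Apply structure to one line, or return None if it is malformed."""
    parts = line.split()
    if len(parts) != 4 or not parts[3].isdigit():
        return None
    timestamp, store, sku, quantity = parts
    return {"timestamp": timestamp, "store": store, "sku": sku,
            "quantity": int(quantity)}


def process(raw_lines: Iterable[str]) -> List[Dict[str, object]]:
    """Step 2: parse and cleanse, dropping lines that fail to parse."""
    parsed = (parse(line) for line in raw_lines)
    return [record for record in parsed if record is not None]


def exchange(records: List[Dict[str, object]]) -> str:
    """Step 3: serialize structured records for loading into a warehouse."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


if __name__ == "__main__":
    print(exchange(process(capture(RAW_LOG))))
```

Keeping the raw lines in the capture step means a malformed record can be re-parsed later if the cleansing rules change, without going back to the source system.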
Big Data Exploration & Visualization
Refine, Explore, Enrich
Collect data and perform iterative investigation for value
1. Capture: Capture all data
2. Process: Parse, cleanse, apply structure & transform
3. Exchange: Explore and visualize with analytics tools supporting Hadoop
DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); New Sources (web logs, email, sensor data, social media)
DATA SYSTEMS: HORTONWORKS DATA PLATFORM; TRADITIONAL REPOS (RDBMS, EDW, MPP)
APPLICATIONS: Business Analytics
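"Iterative investigation" here means asking one question of the captured data, looking at the answer, and then asking another. The sketch below shows that loop in miniature by summarizing the same records along different dimensions. The event fields and values are made up for this example; a real exploration would run through an analytics tool against data in Hadoop.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical captured events; field names are assumptions for illustration.
EVENTS: List[Dict[str, object]] = [
    {"source": "web", "region": "BC", "amount": 20},
    {"source": "mobile", "region": "BC", "amount": 5},
    {"source": "web", "region": "GA", "amount": 10},
    {"source": "mobile", "region": "GA", "amount": 15},
]


def summarize(events: List[Dict[str, object]], dimension: str) -> Dict[str, int]:
    """Total the 'amount' field for each distinct value of one dimension."""
    totals: Counter = Counter()
    for event in events:
        totals[str(event[dimension])] += int(event["amount"])
    return dict(totals)


if __name__ == "__main__":
    # First question: which channel brings in more?
    print(summarize(EVENTS, "source"))
    # Follow-up question on the same data: does region matter?
    print(summarize(EVENTS, "region"))
```

Because all of the data was captured up front, the follow-up question needs no new collection step, only a different grouping.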
Application Enrichment
Refine, Explore, Enrich
Collect data, analyze and present salient results for online apps
1. Capture: Capture all data
2. Process: Parse, cleanse, apply structure & transform
3. Exchange: Incorporate data directly into applications
DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); New Sources (web logs, email, sensor data, social media)
DATA SYSTEMS: HORTONWORKS DATA PLATFORM; NOSQL; TRADITIONAL REPOS (RDBMS, EDW, MPP)
APPLICATIONS: Custom Applications, Enterprise Applications
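One way to picture enrichment is a batch job that analyzes collected data and writes its results to a key-value store, which the online application then reads at request time. The sketch below assumes a "customers also bought" feature and uses a plain dictionary in place of the NoSQL store; the basket data, function names, and the feature itself are hypothetical and not taken from the deck.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def build_recommendations(orders: List[List[str]], top_n: int = 2) -> Dict[str, List[str]]:
    """Batch step: for each product, find the items most often bought with it."""
    co_purchases: Dict[str, Counter] = defaultdict(Counter)
    for basket in orders:
        items = set(basket)
        for item in items:
            for other in items:
                if other != item:
                    co_purchases[item][other] += 1
    recommendations = {}
    for item, counts in co_purchases.items():
        # Sort by descending count, then by name so ties are deterministic.
        ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
        recommendations[item] = [name for name, _ in ranked[:top_n]]
    return recommendations


def render_product_page(product: str, store: Dict[str, List[str]]) -> Dict[str, object]:
    """Online step: the application looks up precomputed results by key."""
    return {"product": product, "also_bought": store.get(product, [])}


if __name__ == "__main__":
    sample_orders = [
        ["tent", "stove"],
        ["tent", "stove", "lamp"],
        ["tent", "lamp"],
        ["stove"],
    ]
    store = build_recommendations(sample_orders)  # dict stands in for NoSQL
    print(render_product_page("tent", store))
```

The heavy analysis happens offline, so the application only pays for a single key lookup per page view.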
Next Steps
Hortonworks.com/sandbox
Hortonworks.com/hadoop-training
@Appnovation
[email protected] [email protected]
@hortonworks
@hortonworks_U
Appnovation.com/Blog
Thank You For Your Participation!