A Big Data Journey: Bringing Open Source to Finance
Transcript of A Big Data Journey: Bringing Open Source to Finance
Slim Baltagi & Rick Fath
Closing Keynote: Big Data Executive Summit
Chicago 11/28/2012
© 2012 CME Group. All rights reserved
Agenda
PART I – Hadoop at CME: Our Practical Experience
PART II - Bringing Hadoop to the Enterprise:
Challenges & Opportunities
PART I – Hadoop at CME: Our Practical Experience
Rick Fath
PART I - Hadoop at CME: Our Practical Experience
1. What’s CME Group Inc.?
2. Big Data & CME Group: a natural fit!
3. Drivers for Hadoop adoption at CME Group
4. Key Big Data projects at CME Group
5. Key Learnings
1. What's CME Group Inc.?
CME Group Overview
• #1 futures exchange in the U.S. and globally by 2011 volume
• 2011 revenue of $3.3bn
• 11.8 million contracts per day
• Strong record of growth, both organically and through acquisitions:
  – Dow Jones Indexes
  – S&P Indexes
  – BM&FBOVESPA
  – CME Clearing Europe
  – CME Europe Ltd
Combination is greater than the sum of its parts.
Forging Partnerships to Expand Distribution, Build 24-Hour Liquidity, and Add New Customers
Partnerships include:
• Equity investments
• Trade matching services
• Joint product development
• Order routing linkages
• Product licensing
• Joint marketing
• European clearing services
• Developing capabilities globally
• Expanding upon global benchmark products
• Positioned well within key strategic closed markets
• Recently announced application to the FSA for CME Europe Ltd., with expected launch in mid-2013
2. Big Data & CME Group: a natural fit!
• CME Group is a Big Data factory: 11 million
contracts a day
• Real-time and historical data
• Growing partnerships around the globe
• Meeting new regulatory requirements (CFTC,
SEC).
• Trading shift from floor to electronic.
• Algorithmic trading improvements, high
volumes, low latency and historical trends.
3. Drivers for Hadoop adoption at CME Group
• Existing data solutions at CME do not scale for Big Data.
• Reduce TCO: storage, support, data quality.
• Reduce duplication of data (derive data on demand quickly).
• Grow the business with new insights on new and old data.
• Enable business users (ad hoc queries, defining new datasets).
• Inflexible archival storage: data can't be queried.
• Reduce dataset silos.
• Make data more accessible to the business.
• Offer new Market Data services.
• Maximize investment in existing IT assets.
4. Key Big Data projects at CME Group
1. CME Group's BI solution: now uses a Data Warehouse appliance, replacing the Oracle database with Oracle Exadata.
2. CME Group's Historical Market Data Platform: now uses Hadoop (HDFS) to store market data and Hive to query it, instead of SAN and Oracle.
3. CME Group's Metrics & Monitoring Platform: now uses HBase for message storage and processing, instead of Oracle and SAN.
4. More Hadoop projects at CME are in the pipeline.
5. Key Learnings:
• Embrace open source: "We are not the first ones to solve this problem!"
• Start with a specific problem or opportunity, so your organization can learn, evolve, and make mistakes on a small scale.
• Know the costs (enterprise Hadoop distribution, hardware (server/network), DR/backup plan).
• Leverage the Hadoop community and network with companies doing similar work.
• Leverage the Hadoop ecosystem.
• Promote education in your organization.
• Define an enterprise data strategy.
• Hadoop pushed us to re-organize around data: the Hadoop, EDW, and BI & Analytics teams are now one consolidated team.
• Capture all data first, then figure out new opportunities.
• Better leverage MPP tools for their intended use.
• Strive for a "single source of truth".
PART II - Bringing Hadoop to the Enterprise:
Challenges & Opportunities
Slim Baltagi
PART II - Bringing Hadoop to the Enterprise
1. What is Hadoop, what isn't it, and what can it help you do?
2. What are the operational concerns and risks?
3. What organizational changes to expect?
4. What are the observed Hadoop trends?
1. What is Apache Hadoop?
• Hadoop: a software framework that transforms a cluster of commodity hardware into services that:
  • reliably store big data via its Hadoop Distributed File System (HDFS);
  • scalably process big data in parallel via its data processing framework (MapReduce).
Hadoop overcomes the traditional limitations of storage and compute:
• Commodity vs. specialized hardware; open source vs. commercial software; any data type vs. structured databases.
• Reliable: Hadoop automatically maintains multiple copies of data and automatically redeploys computing tasks on failure.
• Scalable: Hadoop provides linear scalability, from a few servers to thousands, and from terabytes to petabytes.
• Economical: Hadoop leverages cheaper servers as the platform for storage, instead of SAN and databases.
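The storage-and-compute split above can be sketched with the classic MapReduce pattern. The snippet below is a minimal, local simulation in Python, not the Hadoop API itself: the mapper and reducer have the same shape as Hadoop Streaming scripts, and the trade records and contract symbols are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical raw trade records, one per line: "symbol,quantity"
RAW_TRADES = ["ES,5", "CL,2", "ES,3", "GC,1", "CL,4"]

def mapper(line):
    """Map phase: emit (key, value) pairs; here (symbol, quantity)."""
    symbol, qty = line.split(",")
    yield symbol, int(qty)

def reducer(symbol, quantities):
    """Reduce phase: aggregate all values seen for one key."""
    return symbol, sum(quantities)

def run_job(lines):
    """Simulate the shuffle/sort step between map and reduce locally."""
    pairs = sorted(kv for line in lines for kv in mapper(line))
    return dict(
        reducer(key, (v for _, v in group))
        for key, group in groupby(pairs, key=itemgetter(0))
    )

print(run_job(RAW_TRADES))  # {'CL': 6, 'ES': 8, 'GC': 1}
```

On a real cluster, the map and reduce functions run as separate tasks on many nodes, with Hadoop handling the shuffle, sorting, data replication, and failure recovery described above.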
• The Hadoop ecosystem includes a number of related Apache open source projects.
• Apache Hadoop software is open source, free, and developed by a global community.
• Hadoop is also offered in commercial distributions.
• Apache Hadoop is driving innovation and creating a new market.
What Apache Hadoop isn't
• Hadoop isn't just a new open source technology; it is a disruptive one and a paradigm shift.
• Hadoop (HDFS) isn't a dumping ground from which to pull data chunks into existing data warehouses!
• Hadoop is not a silver bullet that will solve all your big data problems.
What Hadoop can help you do
• Hadoop helps you manage big data in cost-effective and scalable ways.
• Capture and store all raw data, in any format and size, without a pre-defined schema (schema-on-read rather than schema-on-write).
• Directly observe and experiment on raw big data.
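The schema-on-read idea can be sketched as follows: raw records are stored untouched, and a schema is applied only when the data is read, so the caller decides the structure at query time. This Python sketch uses hypothetical field names and records; it is illustrative, not an actual market-data pipeline.

```python
import csv
import io

# Raw data is captured as-is; no schema is declared at write time.
RAW = "2012-11-28,ES,1417.25,5\n2012-11-28,CL,88.07,2\n"

def read_with_schema(raw, fieldnames):
    """Apply a schema at read time: the caller supplies the column names."""
    return list(csv.DictReader(io.StringIO(raw), fieldnames=fieldnames))

# One consumer reads the bytes as trades...
trades = read_with_schema(RAW, ["date", "symbol", "price", "qty"])
# ...another consumer could later re-read the same stored bytes with a
# different schema, without rewriting the data.
print(trades[0]["symbol"])  # ES
```

With schema-on-write (a traditional database), the columns above would have to be fixed before the first byte was loaded; with schema-on-read, capture and interpretation are decoupled.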
• Build a new, agile Enterprise Data Warehouse without being forced to predict upfront which data will be important and how it will be queried.
• Uncover business opportunities and strategies never possible before, by correlating data sets and querying structured and unstructured data together.
• Query stored big data without the need to move chunks of it.
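As an illustration of the "correlating data sets" point, the sketch below links a structured table of trade volumes with unstructured free-text messages by scanning the text for contract symbols. All names and records are hypothetical; on a Hadoop cluster this kind of join would more typically be expressed in Hive or as a MapReduce job.

```python
# Structured data: contract symbol -> traded volume (hypothetical)
TRADES = {"ES": 8, "CL": 6}

# Unstructured data: free-text support messages (hypothetical)
MESSAGES = [
    "Customer asked about ES margin requirements",
    "Latency complaint while trading CL overnight",
    "General billing question",
]

def correlate(trades, messages):
    """Link each message to any contract symbol it mentions."""
    hits = {}
    for msg in messages:
        for symbol in trades:
            if symbol in msg.split():
                hits.setdefault(symbol, []).append(msg)
    return hits

linked = correlate(TRADES, MESSAGES)
print(sorted(linked))  # ['CL', 'ES']
```

The value is in the combination: volume figures alone say nothing about customer sentiment, and the messages alone say nothing about which contracts matter most.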
2. What are the operational concerns & risks?
• Hadoop requires fundamental changes in the way business and technology work together.
• Hadoop development, deployment, and maintenance are inherently collaborative.
• Hadoop requires a broad set of new skills, and nobody has them all!
• How do you select a Hadoop distribution?
• A growing cluster means growing support fees. How do you lower future maintenance fees?
• How do you efficiently integrate with existing IT assets?
• Future project candidates: one super-cluster or individual clusters?
• Deriving insight from your big data requires mature processes and skills that most companies lack.
• Hadoop requires new roles: Hadoop cluster administrator, integrator of Hadoop with existing BI systems, data analyst, data scientist, and more.
3. What organizational changes to expect?
• Organizing people within the enterprise to deal with Big Data is a challenge.
• Establish a Big Data Competency Center (BDCC):
  • a multi-disciplinary, collaborative central team;
  • coordinates the use of the technology, the data assets, and big data analytics;
  • supports decision-making by business analysts and executives.
• With data increasingly considered a strategic asset, do companies need a dedicated Chief Data Officer (CDO)?
• With Big Data, will such a role become even more visible and crucial?
• What specific Big Data responsibilities might a CDO have?
4. What are the observed Hadoop trends?
• Apache Hadoop continues to get better and is rapidly gaining adoption in many enterprises.
• Hadoop is rapidly maturing as the foundation of Big Data solutions and as an enterprise-viable platform.
• More Hadoop training, books, and support are available.
• All the big vendors now support Hadoop.
• More Hadoop solutions are available for agile ETL and interactive/exploratory business intelligence.
• Hadoop now supports programming paradigms other than MapReduce.
• Hadoop is no longer confined to batch processing; it is being extended with near real-time query and analytics.
• Convergence towards the Hadoop data warehouse.
Thank you!
Questions?
Futures trading is not suitable for all investors, and involves the risk of loss. Futures are a
leveraged investment, and because only a percentage of a contract’s value is required to trade,
it is possible to lose more than the amount of money deposited for a futures position. Therefore,
traders should only use funds that they can afford to lose without affecting their lifestyles. And
only a portion of those funds should be devoted to any one trade because they cannot expect to
profit on every trade.
The Globe Logo, CME®, Chicago Mercantile Exchange®, and Globex® are trademarks of
Chicago Mercantile Exchange Inc. CBOT® and the Chicago Board of Trade® are trademarks of
the Board of Trade of the City of Chicago. NYMEX, New York Mercantile Exchange, and
ClearPort are trademarks of New York Mercantile Exchange, Inc. COMEX is a trademark of
Commodity Exchange, Inc. CME Group is a trademark of CME Group Inc. All other trademarks
are the property of their respective owners.
The information within this presentation has been compiled by CME Group for general purposes
only. CME Group assumes no responsibility for any errors or omissions. Although every attempt
has been made to ensure the accuracy of the information within this presentation, CME Group
assumes no responsibility for any errors or omissions. Additionally, all examples in this
presentation are hypothetical situations, used for explanation purposes only, and should not be
considered investment advice or the results of actual market experience.
All matters pertaining to rules and specifications herein are made subject to and are superseded
by official CME, CBOT, NYMEX and CME Group rules. Current rules should be consulted in all
cases concerning contract specifications.