A Big Data Journey: Bringing Open Source to Finance

Slim Baltagi & Rick Fath
Closing Keynote: Big Data Executive Summit, Chicago, 11/28/2012

Transcript of A Big Data Journey: Bringing Open Source to Finance

Page 1: A Big Data Journey: Bringing Open Source to Finance

Slim Baltagi & Rick Fath
Closing Keynote: Big Data Executive Summit
Chicago, 11/28/2012

Page 2

© 2012 CME Group. All rights reserved

Agenda

PART I – Hadoop at CME: Our Practical Experience
PART II – Bringing Hadoop to the Enterprise: Challenges & Opportunities

Page 3

PART I – Hadoop at CME: Our Practical Experience

Rick Fath

Page 4

PART I – Hadoop at CME: Our Practical Experience

1. What’s CME Group Inc.?
2. Big Data & CME Group: a natural fit!
3. Drivers for Hadoop adoption at CME Group
4. Key Big Data projects at CME Group
5. Key Learnings

Page 5

1. What’s CME Group Inc.?

Page 6

CME Group Overview

• #1 futures exchange in the U.S. and globally by 2011 volume
• 2011 revenue of $3.3bn
• 11.8 million contracts per day
• Strong record of growth, both organically and through acquisitions:
  – Dow Jones Indexes
  – S&P Indexes
  – BM&FBOVESPA
  – CME Clearing Europe
  – CME Europe Ltd

Combination is greater than the sum of its parts

Page 7

Forging Partnerships to Expand Distribution, Build 24-Hour Liquidity, and Add New Customers

(Slide graphic label: CBOT Black Sea Wheat)

Partnerships include:
• Equity investments
• Trade matching services
• Joint product development
• Order routing linkages
• Product licensing
• Joint marketing
• European clearing services

• Developing capabilities globally
• Expanding upon global benchmark products
• Positioned well within key strategic closed markets
• Recently announced application to the FSA for CME Europe Ltd. – expected launch mid-2013

Page 8

2. Big Data & CME Group: a natural fit!

• CME Group is a Big Data factory: 11 million contracts a day.
• Real-time and historical data.
• Growing partnerships around the globe.
• Meeting new regulatory requirements (CFTC, SEC).
• Trading shift from floor to electronic.
• Algorithmic trading improvements: high volumes, low latency and historical trends.

Page 9

3. Drivers for Hadoop adoption at CME

• Existing data solutions at CME do not scale for Big Data.
• Reduce TCO: storage, support, data quality.
• Reduce duplication of data (derive data on demand, fast).
• Grow the business with new insights on new and old data.
• Enable business users (ad hoc queries, defining new datasets).

Page 10

• Inflexible archival storage: data can’t be queried.
• Reduce dataset silos.
• Make data more accessible to the business.
• Offer new Market Data services.
• Maximize investment in existing IT assets.

Page 11

4. Key Big Data projects at CME Group

1. CME Group’s BI solution: now uses a Data Warehouse Appliance; replaced the Oracle DB with Oracle Exadata.
2. CME Group’s Historical Market Data Platform: now uses Hadoop (HDFS) to store and Hive to query market data, instead of SAN and Oracle.
3. CME Group’s Metrics & Monitoring Platform: now uses HBase for message storage and processing, instead of Oracle and SAN.
4. More Hadoop projects at CME in the pipeline.
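The HBase choice in project 3 can be motivated with a toy sketch: HBase keeps rows sorted by row key, so time-ordered messages keyed by (source, timestamp) support fast range scans. This is a pure-Python stand-in, not the HBase API; the key design, gateway names and messages are invented for illustration, and real HBase adds much more (column families, regions, versioning).

```python
from bisect import bisect_left, bisect_right

# Toy stand-in for an HBase table: rows kept sorted by key, so a time
# range for one source is a contiguous slice (keys and data invented).
rows = sorted([
    (("gateway1", 1000), "msg A"),
    (("gateway1", 1005), "msg B"),
    (("gateway2", 1001), "msg C"),
    (("gateway1", 1010), "msg D"),
])

def scan(source, t_start, t_stop):
    # Range scan over the sorted keys, like an HBase scan with
    # start and stop rows.
    keys = [k for k, _ in rows]
    lo = bisect_left(keys, (source, t_start))
    hi = bisect_right(keys, (source, t_stop))
    return [v for _, v in rows[lo:hi]]

print(scan("gateway1", 1000, 1005))  # ['msg A', 'msg B']
```

The point of the sorted key is that a monitoring query ("all messages from gateway1 in this window") never touches other sources' rows.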

Page 12

5. Key Learnings:

• Embrace open source: “We are not the first ones to solve this problem!”
• Start with a specific problem/opportunity, to allow your organization to learn, evolve, and make mistakes on a small scale.
• Know the costs (enterprise Hadoop distribution, hardware (server/network), DR/backup plan).
• Leverage the Hadoop community and network with companies doing similar work.

Page 13

• Leverage the Hadoop ecosystem.
• Promote education in your organization.
• Define an enterprise data strategy.
• Hadoop pushed us to re-organize around data: the Hadoop, EDW, and BI & Analytics teams are now one consolidated team.
• Capture all data first, then figure out new opportunities.
• Better leverage MPP tools for their intended use.
• Strive for a “Single Source of Truth”.

Page 14

PART II – Bringing Hadoop to the Enterprise: Challenges & Opportunities

Slim Baltagi

Page 15

PART II – Bringing Hadoop to the Enterprise

1. What Hadoop is, what it isn’t, and what it can help you do
2. What are the operational concerns and risks?
3. What organizational changes to expect?
4. What are the observed Hadoop trends?

Page 16

1. What is Apache Hadoop?

• Hadoop: a software framework that transforms a cluster of commodity hardware into services that:
  • Reliably store big data, via the Hadoop Distributed File System (HDFS).
  • Scalably process big data in parallel, via its data processing framework (MapReduce).
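The split between stored blocks and parallel processing can be illustrated with a toy map/shuffle/reduce pass in plain Python. This is only a sketch of the MapReduce paradigm, not Hadoop's actual Java API; the input strings stand in for records read from HDFS blocks.

```python
from collections import defaultdict

# Toy corpus standing in for records stored across HDFS blocks
# (hypothetical sample data, not CME market data).
blocks = [
    "trade quote trade",
    "quote settle trade",
]

def map_phase(record):
    # Map: emit a (key, 1) pair for every word in one input record.
    return [(word, 1) for word in record.split()]

def shuffle(mapped):
    # Shuffle: group all emitted values by key, as the framework
    # does between the map and reduce phases.
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate the grouped values per key.
    return {key: sum(values) for key, values in groups.items()}

counts = reduce_phase(shuffle([map_phase(b) for b in blocks]))
print(counts)  # {'trade': 3, 'quote': 2, 'settle': 1}
```

In real Hadoop, each map task runs on the node holding its block and the shuffle moves data over the network; the structure of the computation is the same.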

Page 17

Hadoop overcomes the traditional limitations of storage and compute:

• Commodity vs. specialized hardware; open source vs. commercial software; any data type vs. structured databases.
• Reliable: Hadoop automatically maintains multiple copies of data and automatically redeploys computing tasks on failure.
• Scalable: Hadoop provides linear scalability, from a few servers to thousands, from terabytes to petabytes.
• Economical: Hadoop uses cheaper servers as the platform for storage, instead of SAN and databases.
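The "multiple copies" point can be sketched in a few lines: with a replication factor of 3, each block lives on several nodes, so one node failure costs no data. The node and block names below are made up, and round-robin placement is a simplification of HDFS's actual rack-aware placement policy.

```python
# Minimal sketch of HDFS-style block replication
# (node/block names are hypothetical).
REPLICATION = 3
nodes = ["node1", "node2", "node3", "node4"]
blocks = ["blk_001", "blk_002", "blk_003"]

# Place each block's replicas round-robin across the cluster.
placement = {
    blk: [nodes[(i + r) % len(nodes)] for r in range(REPLICATION)]
    for i, blk in enumerate(blocks)
}

# Simulate losing one node: every block still has surviving replicas,
# which is why Hadoop can redeploy tasks instead of losing data.
failed = "node2"
survivors = {blk: [n for n in reps if n != failed]
             for blk, reps in placement.items()}
print(all(survivors.values()))  # True: no block was lost
```

Losing a node costs at most one replica per block; the framework then re-replicates from the survivors in the background.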

Page 18

• The Hadoop ecosystem includes a number of related Apache open source projects.
• Apache Hadoop software is open source and free, and is developed by a global community.
• Hadoop is also offered in commercial distributions.
• Apache Hadoop is driving innovation and creating a new market.

Page 19

What Apache Hadoop isn’t

• Hadoop isn’t just a new open source technology; it is a disruptive one, and a paradigm shift.
• Hadoop (HDFS) isn’t a dumping ground from which to pull data chunks into existing data warehouses!
• Hadoop is not a silver bullet for all your big data problems.

Page 20

What Hadoop can help you do

• Hadoop helps you manage big data in cost-effective and scalable ways.
• Capture and store all raw data, in any format and size, without a pre-defined schema (no schema on write; schema on read).
• Directly observe and experiment on raw big data.
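"Schema on read" can be shown with a short Python sketch: raw lines are stored untouched, and structure is imposed only when a query runs. The field names and records below are hypothetical, not CME's actual market data format.

```python
import json

# Raw records are captured as-is; no schema is enforced at write time.
raw_log = [
    '{"symbol": "ES", "px": 1410.25, "qty": 5}',
    '{"symbol": "CL", "px": 85.1}',  # qty missing -- still stored
    'not json at all',               # malformed -- still stored
]

def read_with_schema(lines):
    # Apply the schema on read: parse what fits the current question,
    # defaulting or skipping the rest.
    for line in lines:
        try:
            rec = json.loads(line)
        except ValueError:
            continue  # a different read-time "schema" could keep these
        yield rec.get("symbol"), rec.get("px"), rec.get("qty", 0)

rows = list(read_with_schema(raw_log))
print(rows)  # [('ES', 1410.25, 5), ('CL', 85.1, 0)]
```

A later query can reinterpret the same raw lines differently (for example, keeping the malformed ones for a data-quality report) without rewriting the stored data.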

Page 21

• Build a new, agile Enterprise Data Warehouse without being forced to predict upfront which data will be important and how to query it.
• Uncover new business opportunities and strategies never possible before: correlating data sets, querying structured and unstructured data.
• Query stored big data without the need to move chunks of it.
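The last point, querying data in place, rests on moving the computation to the data. A minimal sketch of the idea (partition contents and node names are invented): each node scans only its local partition and ships back a small partial result, so raw data never crosses the network.

```python
# Sketch of "move computation to the data": each node scans its own
# partition locally and returns only a small aggregate, never raw rows
# (node names and rows are hypothetical).
partitions = {
    "node1": [("ES", 5), ("CL", 2), ("ES", 1)],
    "node2": [("ES", 4), ("GC", 7)],
}

def local_scan(rows, symbol):
    # Runs where the data lives; returns one number, not the rows.
    return sum(qty for sym, qty in rows if sym == symbol)

# Only the per-node partial sums cross the network.
partials = [local_scan(rows, "ES") for rows in partitions.values()]
total = sum(partials)
print(total)  # 10
```

Shipping a function and collecting per-node aggregates is cheap; shipping terabytes of rows to a central database is not, which is the contrast the slide is drawing.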

Page 22

2. What are the operational concerns & risks?

• Hadoop requires fundamental changes in the way business & technology work together.
• Hadoop development, deployment and maintenance are inherently collaborative.
• Hadoop requires a broad new skill set, and nobody has all of it!

Page 23

• How to select a Hadoop distribution?
• A growing cluster means growing support fees: how to lower future maintenance fees?
• How to efficiently integrate with existing IT assets?
• Future project candidates (one super cluster vs. individual clusters?).

Page 24

• Deriving insight from your big data requires mature processes and skills that most companies lack.
• Hadoop requires new roles: Hadoop cluster administrator, integrator of Hadoop with existing BI systems, data analyst, data scientist …

Page 25

3. What organizational challenges to expect?

• Organizing people within the enterprise to deal with Big Data is a challenge.
• Establish a Big Data Competency Center (BDCC):
  • A multi-disciplinary and collaborative central team.
  • Coordinates the use of the technology, the data assets and big data analytics.
  • Supports decision making by business analysts and executives.

Page 26

• With data being considered more and more a strategic asset, do companies need a dedicated Chief Data Officer (CDO)?
• With Big Data, will such a role become even more visible and crucial?
• What specific Big Data responsibilities might a CDO include?

Page 27

4. What are the observed Hadoop trends?

• Apache Hadoop continues to get better and is rapidly gaining adoption in many enterprises.
• Hadoop is rapidly maturing as the foundation of Big Data solutions and as an enterprise-viable platform.
• More Hadoop training, books and support are available.
• All big vendors now support Hadoop.

Page 28

• More Hadoop solutions are available for agile ETL and interactive/exploratory business intelligence.
• Hadoop now supports programming paradigms other than MapReduce.
• Hadoop is no longer confined to batch processing, but is being extended with near real-time query and analytics.
• Convergence towards the Hadoop data warehouse.

Page 29

Thank you!

Page 30

Questions?

Page 31

Futures trading is not suitable for all investors, and involves the risk of loss. Futures are a leveraged investment, and because only a percentage of a contract’s value is required to trade, it is possible to lose more than the amount of money deposited for a futures position. Therefore, traders should only use funds that they can afford to lose without affecting their lifestyles. And only a portion of those funds should be devoted to any one trade, because they cannot expect to profit on every trade.

The Globe Logo, CME®, Chicago Mercantile Exchange®, and Globex® are trademarks of Chicago Mercantile Exchange Inc. CBOT® and the Chicago Board of Trade® are trademarks of the Board of Trade of the City of Chicago. NYMEX, New York Mercantile Exchange, and ClearPort are trademarks of New York Mercantile Exchange, Inc. COMEX is a trademark of Commodity Exchange, Inc. CME Group is a trademark of CME Group Inc. All other trademarks are the property of their respective owners.

The information within this presentation has been compiled by CME Group for general purposes only. CME Group assumes no responsibility for any errors or omissions. Although every attempt has been made to ensure the accuracy of the information within this presentation, CME Group assumes no responsibility for any errors or omissions. Additionally, all examples in this presentation are hypothetical situations, used for explanation purposes only, and should not be considered investment advice or the results of actual market experience.

All matters pertaining to rules and specifications herein are made subject to and are superseded by official CME, CBOT, NYMEX and CME Group rules. Current rules should be consulted in all cases concerning contract specifications.