The Only Operational Database Technology for Mission-Critical Big Data Applications

Paul Preuveneers – Principal Technologist
Lee Pollington – Principal Consultant

This document is CONFIDENTIAL and its circulation and use are RESTRICTED. © 2012 KPMG LLP, a UK limited liability partnership, is a subsidiary of KPMG Europe LLP and a member firm of the KPMG network of independent member firms affiliated with KPMG International, a Swiss cooperative. All rights reserved.


Agenda

• Big Data and MarkLogic
• What is MarkLogic?
• MarkLogic in Financial Services
• MarkLogic Integration Points (Connectors / Toolkits)


• Volume: Petabyte / Exabyte; billions of items; social media; machine data; data processes producing data
• Velocity: 10Ks of transactions per second; in & out; streams; bulk processing
• Complexity: patterns; inference; unstructured; disparate events; relationships
• Variety: varied sources; varied data types
• Variability: changing data types
• Value: value from decision support; value from operational efficiencies


Agenda

• Big Data and MarkLogic
• What is MarkLogic?
• MarkLogic in Financial Services
• MarkLogic Integration Points (Connectors / Toolkits)


What is MarkLogic Server?

• Special Purpose DBMS for poly-structured information, with enterprise expectations
  • ACID transactions
  • Backup, Full/Partial Replication, Distributed Transactions
• Search Engine Kernel, with enterprise expectations
  • Full text
  • Faceted navigation, at massive scale
  • Boolean, proximity, stemming, tokenization, decompounding, case, diacritics, language…
• Application Server
  • HTTP (including RESTful)
  • XCC Java/.NET
  • WebDAV


What makes MarkLogic DBMS Special?

• Not Relational (RDBMS)
• XML
  • The Only Data Model Required
  • Schema Agnostic
  • Text a First-class Citizen among Data Types
  • XQuery/XSLT
• Optimized Search Engine Algorithms
• Very Low DBA Overhead (0.5 FTE / 100 hosts)
• 5-Minute Install
• 5-Minute Scale-Out
• Database and Search Engine are the same


What makes MarkLogic Search Special?

• Transactional: Enterprise Scale (no index latency)
• Unicode (Internationalization)
• Multiple Query Types
  • Analytics: Aggregation, Facets & Ranges, Co-occurrence, Geospatial
  • Text Search: Boolean, Stemming, Word Lexicons, Dictionary & Thesauri
  • Alerting: Profiles, Alerts, Filters, Tipping, Selectors, “Triggers” …
  • Powerful Search Combination (e.g. Text + Analytics + Alerting)
• Processing Near the Data (fast search, low bandwidth)
  • Database and Search Engine are the same


Search: Universal Index

[Figure: a two-column index of Term and Term List. Terms shown: “accelerating”, “creation”, “content”, “application”, “agility”, <article>, <article> / <title>, product: MarkLogic. Each term list is a sorted run of document IDs (123, 127, 129, 152, 344, 791 . . . ; 122, 125, 126, 129, 130, 167 . . . ; 123, 126, 130, 142, 143, 167 . . . ; 123, 130, 131, 135, 162, 177 . . . ; 126, 130, 167, 212, 219, 377 . . .). Document References: 126, 130, 167, … Range Indexes and Geospatial indexes are shown alongside the term lists.]
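
One way to read the term lists on this slide: each indexed term points at a sorted list of document IDs, and a multi-term query can be resolved by intersecting those lists. The Python sketch below is an illustration only and not MarkLogic's implementation. The ID lists are copied from the slide, but the pairing of each list with a particular term is an assumption, as are the names `TERM_LISTS`, `intersect_sorted` and `and_query`.

```python
from functools import reduce

# Sorted document-ID lists taken from the slide.
# ASSUMPTION: which term owns which list is not recoverable from the slide.
TERM_LISTS = {
    "accelerating": [123, 127, 129, 152, 344, 791],
    "creation": [122, 125, 126, 129, 130, 167],
    "content": [123, 126, 130, 142, 143, 167],
    "application": [123, 130, 131, 135, 162, 177],
    "agility": [126, 130, 167, 212, 219, 377],
}


def intersect_sorted(a, b):
    """Merge-intersect two sorted ID lists in O(len(a) + len(b))."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def and_query(terms, index=TERM_LISTS):
    """Return the IDs of documents that contain every term."""
    lists = sorted((index.get(t, []) for t in terms), key=len)
    if not lists:
        return []
    # Start from the shortest list so intermediate results stay small.
    return reduce(intersect_sorted, lists)


print(and_query(["creation", "content", "agility"]))  # [126, 130, 167]
```

With this assumed pairing, the three-term query yields 126, 130, 167, the same values the slide shows under Document References.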


MarkLogic Can Scale

• Scale Up: Typically 1 TB+ XML per Server
• Scale Out: Low Hundreds(++) of Servers in a Cluster
• Commodity Hardware
  • 2-CPU x 6-core/hyperthreaded
  • 32+ GB RAM
  • 3x disk: local mount with failover
• OS
  • Linux RHEL 5
  • Solaris 10
  • Windows 2003/8 (XP/Vista/7 for Dev)


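Shared-Nothing Cluster

[Figure: an AppServer tier of E hosts (E Host 1, E Host 2, E Host 3) above a Data tier of D hosts (D Host 4, D Host 5, D Host 6 … D Host k). The data is split into partitions (partition1, partition2, partition3, partition4 … partitionm). Labels: Same Code-base; HA&DR.]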
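
The E-host / D-host split on this slide can be pictured as scatter-gather: an evaluator routes each document to exactly one data host, then fans a query out to every data host and merges the partial results. The Python sketch below only illustrates that idea. The class names are hypothetical, and placing documents by a CRC32 hash of the URI is an assumption chosen to keep the example deterministic, not a description of how MarkLogic assigns documents to partitions.

```python
import zlib


class DataHost:
    """A D host: owns its own partition of documents and shares nothing."""

    def __init__(self, name):
        self.name = name
        self.docs = {}  # uri -> text, local to this host

    def search(self, word):
        # Each host answers only from its own partition.
        return [uri for uri, text in self.docs.items() if word in text.split()]


class EvaluatorHost:
    """An E host: holds no data, only routes writes and merges reads."""

    def __init__(self, data_hosts):
        self.data_hosts = data_hosts

    def _owner(self, uri):
        # ASSUMPTION: hash-based placement, for illustration only.
        return self.data_hosts[zlib.crc32(uri.encode()) % len(self.data_hosts)]

    def insert(self, uri, text):
        self._owner(uri).docs[uri] = text

    def search(self, word):
        hits = []
        for host in self.data_hosts:  # scatter to every partition
            hits.extend(host.search(word))
        return sorted(hits)  # gather and merge


cluster = EvaluatorHost([DataHost("D4"), DataHost("D5"), DataHost("D6")])
cluster.insert("/trades/1.xml", "swap trade with ACME")
cluster.insert("/trades/2.xml", "fx trade with GLOBEX")
cluster.insert("/research/1.xml", "policy note")
print(cluster.search("trade"))  # ['/trades/1.xml', '/trades/2.xml']
```

Because every document lives on exactly one data host, adding hosts adds capacity without any shared storage, which is the property the slide title names.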


Agenda

• Big Data and MarkLogic
• What is MarkLogic?
• MarkLogic in Financial Services
• MarkLogic Integration Points (Connectors / Toolkits)


Financial Services Solutions

Highly Transactional

• Operational Data Store / Trade Store

Content Aggregation & Discovery

• ISDA Contract Analysis (Electronic & Paper)
• Document Analysis (e.g. Sales Process, Financial Directives)
• Situational Awareness
• Customer On-Boarding

Content Publishing

• Research / Policy Authoring & Distribution


Operational Data Store / Trade Store

What is it?

- High Volume Trades (Derivatives, Equities, FX etc.) in siloes
- Mostly represented in XML (e.g. FpML, FIXML)
- Point-in-time queries (e.g. exposure by counterparty)
- Risk Management (understand exposure, auditing)

Why are we good at it?

- High Performance with Native XML compared to RDBMS
- We are a transactional DB (ACID + business continuity)
- Less hardware required / commodity servers
- No shredding of XML (lowers risk of corruption)
- Can aggregate over multiple schemas
- Easily accommodate new schemas, changes in schema
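
To make "exposure by counterparty" and "aggregate over multiple schemas" concrete, the Python sketch below sums notional amounts per counterparty across trade documents of two different shapes, querying the XML in place rather than shredding it into tables. It is an illustration using Python's standard library, not MarkLogic or XQuery. The element names (`swap`, `fxTrade`, `counterparty`, `party`, `notional`, `amount`) are invented for the example and are not real FpML or FIXML, and the `PATHS` table and function name are likewise hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical trade documents in two different schemas.
TRADES = [
    "<swap><counterparty>ACME</counterparty><notional>1000000</notional></swap>",
    "<fxTrade><party role='counterparty'>ACME</party><amount>250000</amount></fxTrade>",
    "<swap><counterparty>GLOBEX</counterparty><notional>500000</notional></swap>",
]

# Per-schema paths: root tag -> (counterparty path, amount path).
# Supporting a new schema means adding one entry here, not a new table design.
PATHS = {
    "swap": ("counterparty", "notional"),
    "fxTrade": ("party[@role='counterparty']", "amount"),
}


def exposure_by_counterparty(docs):
    """Sum trade amounts per counterparty across all known schemas."""
    totals = defaultdict(float)
    for xml_text in docs:
        root = ET.fromstring(xml_text)
        counterparty_path, amount_path = PATHS[root.tag]
        counterparty = root.findtext(counterparty_path)
        totals[counterparty] += float(root.findtext(amount_path))
    return dict(totals)


print(exposure_by_counterparty(TRADES))
# {'ACME': 1250000.0, 'GLOBEX': 500000.0}
```

The documents stay whole throughout; only the lookup paths differ per schema, which is the point the slide makes about accommodating new or changed schemas.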


Operational Data Store / Trade Store

Example: JP Morgan Chase ODS

- Live for 12+ months
- 2.25 million OTC Derivatives (450+ million documents)
- Strategic platform mandated for core transaction processing
- Short-listed for Best Investment Banking Initiative at The Banking Technology Awards 2011
- Agile onboarding of new Derivatives products
- Huge reduction in time to process FO XML messages
- 20 Sybase systems replaced with 3-Node MarkLogic cluster


It's a Trade Processing Story

- Started with Derivatives
- Natural fit with documents
- Complex instruments, “low volume” instruments
- It’s a trade workflow engine
- Enterprise Service Bus / Component architecture
- New products
- Modifications to existing products
- Securities had a new challenge for us


ISDA Contract Analysis

What is it?

- Swaps / Derivatives Contracts
- Risk Management (understand exposure)
- Effect of Change (e.g. credit rating, termination events)

Why are we good at it?

- Contracts are combination data/text
- Front-end solutions like Exari use Word for contract authoring but output structured XML
- Good query functions for filtering and aggregation of exposure as well as other what-if scenarios

Where do we need help?

- If in paper form, OCR and enrichment is required. This is hard, time-consuming and costly (up to $150 per doc for managed service)
- Most contracts are in paper form (90+ percent)


Document Analysis (e.g. Sales Process, Financial Directives)

What is it?

- Making sense of poly-structured data (avoid BIG fines)
- Extracting patterns and trends (e.g. did we say the right thing to our customer at the right time? / PPI mis-selling)
- Developing value calculations in hard-to-handle formats (i.e. aggregating and unlocking the calculations in Excel)

Why are we good at it?

- Good conversion tools for PDF, MS Office etc.
- Great full-text search to analyse converted documents
- Inclusion of external content where applicable (RSS, Social Media, Web Sites)
- Group individual Excel spreadsheets for powerful analysis

Where do we need help?

- Enrichment often requires substantial domain expertise


Situational Awareness

What is it?

- Trading Decision Support
- Amalgamation of internal/external poly-structured data
- Heavy geospatial element
- Analysis across datasets (vessels, pipes, weather, RSS)

Why are we good at it?

- Quick take-up of new sets of data
- ML is good at geospatial queries
- ML is good at incorporating external data (web, RSS etc.)
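
As an illustration of the kind of geospatial query this slide has in mind, the Python sketch below finds which vessels lie within a given radius of a point, using the standard haversine great-circle formula. It is a generic technique swapped in for illustration and does not use any MarkLogic API. The vessel names, coordinates and function names are made up for the example.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def within_radius(items, center, radius_km):
    """Names of items whose position is within radius_km of center."""
    lat0, lon0 = center
    return [
        item["name"]
        for item in items
        if haversine_km(lat0, lon0, item["lat"], item["lon"]) <= radius_km
    ]


# Hypothetical vessel positions (invented data).
VESSELS = [
    {"name": "Vessel A", "lat": 51.95, "lon": 4.05},
    {"name": "Vessel B", "lat": 29.70, "lon": -95.00},
]

PORT = (51.92, 4.48)  # an assumed point of interest
print(within_radius(VESSELS, PORT, radius_km=50))  # ['Vessel A']
```

In a situational-awareness application the same filter would be combined with other datasets (weather, pipelines, news feeds), which is where a single query engine over mixed data becomes useful.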


Customer On-Boarding

What is it?

- Content Aggregation from multiple CMS
- KYC / Holistic view of customer (good communication)
- Avoid duplication of effort (faster on-boarding)
- Rapid search and retrieval

Why are we good at it?

- Feature-rich, fast search at volume
- 30 Digits allows us to extract from multiple CMS
- Flexible metadata-handling (dynamic facets)
- Able to apply security model from underlying CMS

Where do we need help?

- Lots of content is image-based / requires OCR and data enrichment


Research / Policy Authoring & Distribution

What is it?

- Template-driven authoring
- Ensuring consistency, validation and component re-use
- Dynamic Publishing (VISA, Morgan Stanley, Citigroup)

Why are we good at it?

- Easy template creation and maintenance
- Great integration with MS Office
- Componentisation and versioning easy in ML
- Dynamic assembly based on role/geography etc.


Thank You – Questions?