GOTO 2016

Monolithic Batch Goes Microservice Streaming

A story about one transformation
Charles Tye & Anton Polyakov

Who Are We?

Anton Polyakov
Head of Application Development
2 years in Nordea

Charles Tye
Head of Core Services & Risk IT
17 years in Nordea

What We Do

Develop solutions for Market Risk, Credit Risk, Liquidity Risk, Stress Testing and Messaging, together with around 70 other people from all over the world.

Market Risk: The High-Level View

Quantify potential losses and exposures. Do many small risks add up to a big risk? Can risks combine in unusual and unexpected ways?

Market Risk: Line of Defence

Protect Nordea and our customers. Daily internal reporting, and external reporting to regulators.

An independent function: analysis of and insight into the sources of risk, control of risk, and management of capital.

Examples of Risk Analysis: Value at Risk

Look at the last 2 years of market history and take the average of the worst 1% of outcomes: simulate what would happen if the same thing happened again today. The measure is highly non-linear, but there is a requirement to drill in and find the drivers.
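The measure above can be sketched in a few lines. This is an illustrative toy, not Nordea's code; the function name and sample data are ours.

```python
# Minimal sketch of "average of the worst 1% of outcomes" over a
# 2-year (~500 trading day) history of simulated P&L.
def tail_risk(pnl, tail=0.01):
    """Average of the worst `tail` fraction of outcomes (losses are negative)."""
    worst = sorted(pnl)                 # ascending: biggest losses first
    k = max(1, int(len(worst) * tail))  # worst 1% of 500 days = 5 outcomes
    return sum(worst[:k]) / k

# roughly 2 years of daily simulated outcomes
outcomes = [-10.0, -8.0, 2.0, 3.0] + [1.0] * 496
print(tail_risk(outcomes))  # -3.0: average of the five worst days
```

Sorting the full history is what makes drill-down awkward: the answer depends on the whole distribution, not on a running total.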

Examples of Risk Analysis: Stress Scenarios

Black Swan worst-case scenarios: unexpected outcomes from future events. Example: Brexit. Simulate what happens if it occurs.

An Interesting Technology Problem

In risk analysis everything has to be included, so you must know when you are complete. Risk does not sum over hierarchies, so drill-down is non-trivial. Traditional OLAP aggregate-and-increment doesn't work (think 10,000,000,000,000).

What we need instead: reactive near-real-time calculations, streaming data, fast corrections and what-if analysis, and interactive sub-second queries on huge data sets.

Challenge No. 1

Find the seams. Break it up into reusable components. Replace a piece at a time.


Challenge No. 2

Develop a new service. Integrate it into the legacy system. Reconcile the output. Find and fix legacy bugs. Fight complification.


Challenge No. 3

Batch is synchronous state transfer. Is that the only way to achieve consistency? Consistency is seriously hard to combine with streaming.

An event-sourced, streaming approach is more robust, more scalable and faster, especially for recovery, but it comes at a cost.

Challenge No. 4

Legacy SQL was slow. Replace it with in-memory aggregation:

- Partition and horizontally scale out across commodity hardware. Terabyte-scale hardware brings tougher challenges due to NUMA limitations; some cubes are already > 200 GB, with larger ones planned.
- Aggregate billions of scenarios in-memory and pre-compute total vectors over hierarchies (linear measures).
- Compute non-linear measures lazily.
- Serve reactive and continuous queries.
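The split between pre-computed linear totals and lazily computed non-linear measures can be sketched as follows. This is a toy model with invented names and data, not the production aggregator; it also shows why risk does not sum over hierarchies.

```python
from collections import defaultdict

def rollup(leaves, parent_of):
    """Pre-compute total scenario P&L vectors up the hierarchy.
    Vector addition is linear, so totals can be maintained incrementally."""
    n = len(next(iter(leaves.values())))
    totals = defaultdict(lambda: [0.0] * n)
    for node, vec in leaves.items():
        while node is not None:
            totals[node] = [a + b for a, b in zip(totals[node], vec)]
            node = parent_of.get(node)
    return dict(totals)

def var(vec, tail=0.5):
    """Non-linear measure, computed lazily on an already-aggregated vector."""
    worst = sorted(vec)
    k = max(1, int(len(worst) * tail))
    return sum(worst[:k]) / k

leaves = {"deskA": [-5.0, 1.0], "deskB": [2.0, -3.0]}
totals = rollup(leaves, {"deskA": "bank", "deskB": "bank"})
# totals["bank"] == [-3.0, -2.0]; var(totals["bank"]) == -3.0,
# while var(deskA) + var(deskB) == -8.0: the measure does not sum.
```

Only the cheap linear sums are stored per hierarchy node; the expensive non-linear measure runs on demand, on whichever node the user drills into.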

Solution: Microservices! (Well, almost)

- Single responsibility: replace pieces of legacy from the inside out
- Self-contained, with business functional boundaries
- Independent and rapid development: the team owns the whole stack
- Organisationally scalable: horizontally scale your teams
- Flexible and maintainable: evolve the architecture
- Smart endpoints and dumb pipes
- Innovation and short lifecycles

The Problem

Business: a multi-model Market Risk calculator for the Nordea portfolio. VaR on different organization levels, with 5-6 different models in parallel.

IT: 7,000 CPU-hours of grid calculation. More than 4,000 SQL jobs. A graph with more than 10,000 edges. A nightly batch flow.


How did it look? Well, you know: 10 years of development. In SQL. No refactoring (who needs it?).


Precisely, how did it look?

Logical architecture: a monolithic, staged app.


Now, a little complication: it is slo-o-o-ow, and it is fat, so it breaks. Can it be made parallel?

So what to do? We probably all know the answer (since we are at this track): find logically isolated blocks, keep an eye on the non-functional aspects, think about how the blocks communicate, and think about what happens if something dies.

Not quite classical microservices... or is it?

produce → enrich → aggregate. Request/response is not feasible: synchronous interaction takes too long, and some results are expensive to reproduce.

So we need a middleware which glues services together, caches important results, and serves as a coordinator and work distributor.


Scale out with:

- Fast pub/sub
- Queues and sets (pull and dedup)
- Distributed locks

Locks? Who needs locks?

Pub/sub messaging as notifier

Producer → Enricher → Aggregator → consumer. Each stage writes to its own store, and Redis pub/sub notifies the next stage that new data is available.
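A minimal in-process model of the notifier pattern, under the assumption (from the slide) that messages carry no payload and data travels via the store. Class and variable names are ours; the real system uses Redis pub/sub.

```python
# Toy "pub/sub as notifier": the published message is only a key telling
# subscribers that new data is waiting in the shared store.
class Notifier:
    def __init__(self):
        self.subscribers = {}

    def subscribe(self, channel, callback):
        self.subscribers.setdefault(channel, []).append(callback)

    def publish(self, channel, key):
        for cb in self.subscribers.get(channel, []):
            cb(key)

store = {}      # stands in for the per-service store
results = []

def enricher(key):
    # pull the data from the store, not from the message
    results.append(store[key] + " enriched")

bus = Notifier()
bus.subscribe("trades", enricher)
store["t1"] = "trade t1"
bus.publish("trades", "t1")   # notify only; payload stays in the store
```

Keeping payloads out of the messages means a lost or duplicated notification is cheap: the store remains the source of truth.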



There are two main problems in distributed messaging:

2) Guarantee that each message is only delivered once
1) Guarantee message order
2) Guarantee that each message is only delivered once

Queues with atomic operations

The Producer writes to the store and publishes on Redis pub/sub; each Enricher pulls work from an incoming queue into a processing queue with a single atomic operation, so a message cannot be lost between being picked up and being processed.
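The incoming/processing pair is the classic reliable-queue pattern (Redis implements the atomic move natively, e.g. RPOPLPUSH/LMOVE). Here is a plain-Python model of the idea, with invented names, to show the crash-safety argument:

```python
from collections import deque

class ReliableQueue:
    """Toy reliable queue: pull atomically moves a message to `processing`,
    so a worker crash between pull and ack leaves the message recoverable."""
    def __init__(self):
        self.incoming = deque()
        self.processing = deque()

    def push(self, msg):
        self.incoming.appendleft(msg)

    def pull(self):
        if not self.incoming:
            return None
        msg = self.incoming.pop()
        self.processing.appendleft(msg)  # one step: never "in flight" only
        return msg

    def ack(self, msg):
        self.processing.remove(msg)      # done; unacked messages survive

q = ReliableQueue()
q.push("risk-event-1")
m = q.pull()   # now in processing; safe even if the worker dies here
q.ack(m)       # work finished: removed from processing
```

On recovery, a supervisor re-queues anything left in `processing`, which is exactly why dedup (next slide) matters.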



Sets and hashmaps are all good for dedup

In an eventually consistent world, dedup is your best friend. Recovery can cause the Enricher to insert the same result multiple times, but writing to the store with HSET keeps the state consistent: repeated inserts of the same key are idempotent.
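The idempotent-write idea can be modelled in a few lines. This mimics the semantics of Redis HSET (which reports whether the field was new) without a Redis server; the class and keys are illustrative.

```python
class DedupStore:
    """Toy hash store with HSET-like semantics: keyed writes are idempotent."""
    def __init__(self):
        self.h = {}

    def hset(self, key, value):
        # Like Redis HSET's return value: truthy only for a new field.
        is_new = key not in self.h
        self.h[key] = value
        return is_new

s = DedupStore()
first = s.hset("trade:1", "enriched")    # True: first insert
replay = s.hset("trade:1", "enriched")   # False: recovery replayed the event
# the store still holds exactly one consistent value for trade:1
```

Because every write is keyed, a replayed event changes nothing: multiple inserts due to recovery still leave a consistent state.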

So how to scale out?

Logically: each Enricher filters the events meant for it from Redis pub/sub. Concurrently: multiple Enrichers and Aggregators steal work from the shared queues, coordinated with RedLock plus a TTL.
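Why the TTL matters can be shown with a single-node toy lock. Note this is not the full RedLock algorithm (which acquires a quorum across several Redis nodes); it only models the expiry behaviour, with names and timings of our choosing.

```python
import time

class TtlLock:
    """Toy lock with expiry: a worker that dies holding the lock
    does not block the others forever."""
    def __init__(self, ttl):
        self.ttl = ttl
        self.owner = None
        self.expires = 0.0

    def acquire(self, worker, now=None):
        now = time.monotonic() if now is None else now
        if self.owner is None or now >= self.expires:
            self.owner, self.expires = worker, now + self.ttl
            return True
        return False

    def release(self, worker):
        if self.owner == worker:
            self.owner = None

lock = TtlLock(ttl=5.0)
a = lock.acquire("worker-1", now=0.0)   # True: acquired
b = lock.acquire("worker-2", now=1.0)   # False: still held
c = lock.acquire("worker-2", now=6.0)   # True: TTL expired, lock taken over
```

The TTL turns a crashed worker into a bounded delay instead of a deadlock, which is what makes work-stealing safe.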


Demo

Producer → Enricher → Aggregator → consumer, each with its own store, connected through Redis pub/sub, incoming and processing queues, and RedLock + TTL.

The Result and What We Learned

Success!

- Aggregate and produce risk: 5 hours → 30 mins
- Corrections: 40 mins → 1 second
- Earlier deliveries: more time to manage the risks
- Faster recovery from problems
- Happy risk managers

What we learned:

- It is important (and painful) to integrate new services into the existing system
- Consistency is hard to combine with streaming (subject of another talk, maybe)
- When distributing, remember the first law of distributed objects architecture (do you remember it?)

First Law of Distributed Object Design: "Don't distribute your objects."

And of course...