COBOL to Apache Spark

Oct 28, 2017 Ville Misaki System Strategy Department, Rakuten Card Co., Ltd.

Page 1: COBOL to Apache Spark

Oct 28, 2017

Ville Misaki

System Strategy Department, Rakuten Card Co., Ltd.

Page 2: COBOL to Apache Spark

Ville Misaki

Senior Software Engineer

Technology Strategy Group, System Strategy Department, Rakuten Card Co., Ltd.

Career

15+ years; 3 years at Rakuten

In Finland, the Netherlands, Japan

Java (EE), Perl, C++, web systems, relational databases, performance optimization & security

Page 3: COBOL to Apache Spark

Oracle OpenWorld 2017

Case Study: Credit Card Core System with Exalogic, Exadata, Oracle Cloud Machine (CON4994)

JavaOne 2017

Java EE 7 with Apache Spark for the World’s Largest Credit Card Core Systems (CON4998)

Page 4: COBOL to Apache Spark

Part 1 – Perfect Design

1. About Rakuten Card

2. Background

3. Platform Migration

4. Data Migration

5. Software Migration

Part 2 – Harsh Reliability

6. Performance

7. Apache Spark

8. Judgement Day

9. Into the Future

Page 5: COBOL to Apache Spark

Page 6: COBOL to Apache Spark

Unified brand, ecosystems around the world.

Page 7: COBOL to Apache Spark

Top-level credit card company in Japan

Core of the Rakuten ecosystem.

Ranked 3rd by total transaction volume in 2016.

Growing rapidly.

Page 8: COBOL to Apache Spark

Page 9: COBOL to Apache Spark

Core Systems

Web Systems

External Systems

Intra Systems

Page 10: COBOL to Apache Spark

Mainframe

Old architecture – >20 years

High cost

Limited capacity and performance

Low maintainability

Vendor lock-in

Limited security

For more details, check session “From Mainframe to Java EE” at 16:00 today

Page 11: COBOL to Apache Spark

Phase of the improvement – 3.0

1.0 Initial phase: Outsource based, just started. Vendor lock-in.

2.0 In-house development: In-house development, differentiate with lower costs and faster delivery.

3.0 Standardization: Standardized system architecture, both for hardware and software.

Achieved

Current Standard Architecture

Page 12: COBOL to Apache Spark

Page 13: COBOL to Apache Spark

Core Systems

Old: Mainframe

New: Oracle Exalogic + Exadata + ZFS Servers

Page 14: COBOL to Apache Spark

App Server

Old: COBOL

New: WebLogic Server. Financial de-facto standard. Java EE compliant. Matured, from 1997.

Database

Old: Network DB

New: Oracle Database. Financial de-facto standard. ISO/IEC 9075 SQL compliant. Matured, from 1983.

Page 15: COBOL to Apache Spark

Page 16: COBOL to Apache Spark

ISAM, VSAM, NDB => Copy & Convert => Oracle Database

Page 17: COBOL to Apache Spark

Data Conversion

Network database to relational database

ISAM/VSAM data to relational database

Legacy Japanese character set to Unicode

Fix data inconsistencies

Scale

Terabytes of live production data

In less than 24 hours
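
The character set conversion listed above can be sketched in a few lines of Java. The slide does not name the legacy Japanese character set, so Shift_JIS is used here purely as a stand-in, and the class and method names are hypothetical. The decoder is set to report errors rather than substitute characters, which is one way to surface data inconsistencies during a migration rather than hide them.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

final class LegacyTextConverter {
    // Assumption: Shift_JIS stands in for the unnamed legacy character set.
    private static final Charset LEGACY = Charset.forName("Shift_JIS");

    /** Decodes legacy bytes strictly: invalid input raises an error instead of becoming '?'. */
    static String toUnicode(byte[] legacyBytes) {
        CharsetDecoder decoder = LEGACY.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return decoder.decode(ByteBuffer.wrap(legacyBytes)).toString();
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("Inconsistent legacy text", e);
        }
    }

    public static void main(String[] args) {
        // Three double-byte katakana characters (U+30AB, U+30FC, U+30C9).
        byte[] raw = {(byte) 0x83, 0x4A, (byte) 0x81, 0x5B, (byte) 0x83, 0x68};
        System.out.println(toUnicode(raw).length()); // number of decoded characters
    }
}
```

Rejected records could then be routed to a manual fix list, which is an assumption about process, not something the slide states.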

Page 18: COBOL to Apache Spark

Offline migration

Freeze data during migration

Full migration – not incremental

Customers mostly unaffected

Data & System migration

At the same time

Cannot be split into phases

Cached

Page 19: COBOL to Apache Spark

ISAM, VSAM, NDB

Replication

Mirror (ISAM, VSAM, NDB)

Copy & Convert

Oracle Database

Page 20: COBOL to Apache Spark

Page 21: COBOL to Apache Spark

Req.

Source code

Application

Platform

Hardware

Reimplement

Convert

Emulate

Page 22: COBOL to Apache Spark

Reimplement
Pro
• Optimal performance
• Low maintenance cost
Con
• Expensive
• Takes a long time
• Risky
• Difficult to test

Emulate
Pro
• Development unchanged
• Easy to test
• Easy to migrate
Con
• Development unchanged
• Low performance
• Future questionable

Convert
Pro
• Flexible cost vs. schedule
• Case-by-case fixes
• Easy to test
Con
• Legacy code remains
• Low performance points need to be addressed

Requirements?

Page 23: COBOL to Apache Spark

Reimplement
Pro
• Optimal performance
• Low maintenance cost
Con
• Expensive
• Takes a long time
• Risky
• Difficult migration

Emulate
Pro
• Development unchanged
• Easy to test
• Easy to migrate
Con
• Development unchanged
• Low performance
• Future questionable

Convert
Pro
• Flexible cost vs. schedule
• Case-by-case fixes
• Easy to test
Con
• Legacy code remains
• Low performance points need to be addressed

2x Performance, No regression, Minimal downtime

Page 24: COBOL to Apache Spark

Page 25: COBOL to Apache Spark

Japanese COBOL source code => Customized source code converter => Java source code

Convert from Japanese COBOL to Java EE

Keep original core business logic

Page 26: COBOL to Apache Spark

Old: Japanese COBOL

Japanese source code

Almost abandoned

No books, no community

New: “Dual Source Architecture”

Java: From Web Systems, For New Logic

COBOL: From Old System, converted to Java

Ease of migration, resource re-use

Introduce power of Java EE

Introduce converter from YPS to Java

Page 27: COBOL to Apache Spark

New Logic (Java EE): Java => Compile

Legacy Logic (Mainframe): Japanese COBOL => Convert to COBOL => COBOL => Convert to Java (Converter) => Java => Compile

Java Byte Code => Build => WAR => Deploy => Application Server (Java EE)

Two sources, single binary

Easy to operate

Page 28: COBOL to Apache Spark

Architecture diagram (Old vs. New). Labels: Operation terminal, Web browser, Rich clients, External customers, Intranet, BIG-IP, Façade, Real-time Servers (WebLogic), Batch Servers (Spark & Java), Scheduler, Core Business Logic APIs, Exadata, Mail, Form, External, Intra.

Page 29: COBOL to Apache Spark

Part 1 – Perfect Design

1. About Rakuten Card

2. Background

3. Platform Migration

4. Data Migration

5. Software Migration

Part 2 – Harsh Reliability

6. Performance

7. Apache Spark

8. Judgement Day

9. Into the Future

Page 30: COBOL to Apache Spark

Page 31: COBOL to Apache Spark

vs.

Page 32: COBOL to Apache Spark

vs.

Page 33: COBOL to Apache Spark

Diagram labels: Start, Slow, Slow

Batches are run as networks

Hierarchical

Critical path

Time window
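
To make the critical path idea concrete, here is a minimal sketch that treats a batch network as a dependency graph and computes the earliest finish time of each job. The job names and durations are invented for illustration, and the class `BatchNetwork` is hypothetical; nothing here is taken from the actual scheduler.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class BatchNetwork {
    private final Map<String, Integer> minutes = new HashMap<>();
    private final Map<String, List<String>> dependsOn = new HashMap<>();
    private final Map<String, Integer> finishCache = new HashMap<>();

    /** Registers a job with its duration and the jobs that must finish before it starts. */
    void addJob(String name, int durationMinutes, String... predecessors) {
        minutes.put(name, durationMinutes);
        dependsOn.put(name, Arrays.asList(predecessors));
        finishCache.clear();
    }

    /** Earliest finish: the job's own duration, started after its slowest predecessor. */
    int earliestFinish(String job) {
        if (!minutes.containsKey(job)) {
            throw new IllegalArgumentException("Unknown job: " + job);
        }
        Integer cached = finishCache.get(job);
        if (cached != null) {
            return cached;
        }
        int start = 0;
        for (String predecessor : dependsOn.get(job)) {
            start = Math.max(start, earliestFinish(predecessor));
        }
        int finish = start + minutes.get(job);
        finishCache.put(job, finish);
        return finish;
    }

    /** Length of the critical path: the latest earliest-finish over all jobs. */
    int criticalPathMinutes() {
        int longest = 0;
        for (String job : minutes.keySet()) {
            longest = Math.max(longest, earliestFinish(job));
        }
        return longest;
    }

    public static void main(String[] args) {
        BatchNetwork net = new BatchNetwork();   // hypothetical jobs and durations
        net.addJob("load", 30);
        net.addJob("billing", 120, "load");
        net.addJob("points", 20, "load");
        net.addJob("report", 15, "billing", "points");
        System.out.println(net.criticalPathMinutes()); // 165: load, billing, report
    }
}
```

In this example, speeding up "points" does not shorten the run at all, because it is not on the critical path. Only the slow jobs on that path decide whether the network fits its time window.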

Page 34: COBOL to Apache Spark

Automatic code conversion

COBOL program flow emulated in Java

COBOL-like data structures in Java

DB access logic

Business logic built on network DB

NDB and RDB are good at different tasks

Page 35: COBOL to Apache Spark

COBOL vs. Java

Goto statement – imitation is complex

Sub-program calls – heavy

No local variables – tight coupling

No libraries – copy & paste code

Few shared data structures – copy & paste definition

No shared enum/constant – magic numbers
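
Why imitating goto is complex can be seen in a small sketch. Java has no goto, so one common imitation is a loop around a switch, where each COBOL paragraph becomes a case and every GO TO becomes an assignment to a "next paragraph" variable. This is a generic pattern, not necessarily what the converter in this talk generates, and the paragraph names are hypothetical.

```java
final class GotoEmulation {
    // Each COBOL paragraph becomes one enum constant.
    private enum Paragraph { READ_NEXT, VALIDATE, ACCUMULATE, DONE }

    /** Sums the non-negative values, written the way jump-based code would be emulated. */
    static int sumValid(int[] input) {
        int index = -1;
        int total = 0;
        Paragraph next = Paragraph.READ_NEXT;
        while (next != Paragraph.DONE) {
            switch (next) {
                case READ_NEXT:
                    index++;
                    // GO TO VALIDATE, or GO TO DONE at end of input
                    next = index < input.length ? Paragraph.VALIDATE : Paragraph.DONE;
                    break;
                case VALIDATE:
                    // GO TO ACCUMULATE, or skip the record with GO TO READ_NEXT
                    next = input[index] >= 0 ? Paragraph.ACCUMULATE : Paragraph.READ_NEXT;
                    break;
                case ACCUMULATE:
                    total += input[index];
                    next = Paragraph.READ_NEXT;
                    break;
                default:
                    throw new IllegalStateException("Unknown paragraph: " + next);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumValid(new int[] {5, -1, 7})); // 12
    }
}
```

A plain for loop with an if would do the same work in three lines. The dispatcher form is what remains when the original control flow has to be preserved mechanically.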

Page 36: COBOL to Apache Spark

COBOL data structures

Fixed length – hard-coded

String-based

Data block inside program

Often thousands of fields

Hierarchical fields

Content is joined/split automatically

Variable namespace under each parent

Even five levels deep
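
The points above can be illustrated with a minimal sketch of a COBOL-like record in Java: one fixed-length character buffer, with every field being a window onto it. Because a group field and its children share the same characters, writing the group splits the content into the children, and reading the group joins them again. The class `CobolRecord` and the date layout are hypothetical, not the framework from the talk.

```java
import java.util.Arrays;

final class CobolRecord {
    private final char[] buffer;   // the whole record, fixed length, space-filled

    CobolRecord(int length) {
        buffer = new char[length];
        Arrays.fill(buffer, ' ');
    }

    /** A field is only an offset and a length over the shared buffer. */
    final class Field {
        private final int offset;
        private final int length;

        Field(int offset, int length) {
            this.offset = offset;
            this.length = length;
        }

        String get() {
            return new String(buffer, offset, length);
        }

        /** Left-justified, space-padded, truncated on the right, like an alphanumeric MOVE. */
        void set(String value) {
            for (int i = 0; i < length; i++) {
                buffer[offset + i] = i < value.length() ? value.charAt(i) : ' ';
            }
        }
    }

    Field field(int offset, int length) {
        if (offset < 0 || length < 0 || offset + length > buffer.length) {
            throw new IllegalArgumentException("Field outside record");
        }
        return new Field(offset, length);
    }

    public static void main(String[] args) {
        CobolRecord record = new CobolRecord(8);
        Field date = record.field(0, 8);    // group field
        Field year = record.field(0, 4);    // children overlap the group
        Field month = record.field(4, 2);
        date.set("20171028");
        System.out.println(year.get());     // 2017
        month.set("11");
        System.out.println(date.get());     // 20171128
    }
}
```

The hard-coded offsets and lengths are the point: with thousands of fields per program, every one of them is a position in a string rather than a typed value.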

Page 37: COBOL to Apache Spark

Page 38: COBOL to Apache Spark

Logic optimized for NDB

Read sequentially

Data pre-sorted

Data pre-formatted

Emulate in RDB

Uphill battle

Task                NDB    RDB
Search              Slow   Fast
Sequential Access   Fast   Slow
Sorting             Slow   Fast
Formatting          Fast   Slow

Page 39: COBOL to Apache Spark

New system must be faster

Time until launch: 1 year

Page 40: COBOL to Apache Spark

Options?

Redesign and re-implement from scratch

Not feasible

Optimize framework

Limited effectiveness

Parallelize batches

Elastic brute-force

Page 41: COBOL to Apache Spark

Page 42: COBOL to Apache Spark

Time

Sequential

Parallel

Page 43: COBOL to Apache Spark

Bootstrap

Scheduler

Cluster Node (×6)

Shared Memory

Page 44: COBOL to Apache Spark

1. Making business logic parallel

Independent processing

2. I/O

Data transferred over network

3. Data ordering

Shuffles

Page 45: COBOL to Apache Spark

Problem: input data rows are not independent!

Red flags

Fields not initialized for each row

Code forks early (header & data?)

Legacy code analysis

Refactor

Fields to local variables

Extract data structures

Initialize data for each row

Run & see

Diagram: rows 1, 2, 3 – Reference?
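
The first red flag, fields not initialized for each row, can be reproduced in a few lines. In the sketch below the legacy-style processor keeps a status field across rows, so a row's result depends on which rows came before it. The refactored version uses a local variable initialized per row, which makes rows independent and safe to run in parallel. The row format and the names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class RowIndependence {
    /** Converted-COBOL style: working storage is shared by all rows. */
    static final class LegacyProcessor {
        private String status = "OK";          // set once, never reset between rows

        String process(String row) {
            if (row.startsWith("E")) {
                status = "ERROR";              // only some rows assign the field
            }
            return row + ":" + status;         // later rows inherit the old value
        }
    }

    /** Refactored: everything the row needs is a local, initialized for each row. */
    static String processIndependently(String row) {
        String status = row.startsWith("E") ? "ERROR" : "OK";
        return row + ":" + status;
    }

    /** Independent rows can be processed on any thread (or node) in any order. */
    static List<String> runParallel(List<String> rows) {
        return rows.parallelStream()
                .map(RowIndependence::processIndependently)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        LegacyProcessor legacy = new LegacyProcessor();
        for (String row : Arrays.asList("A1", "E2", "A3")) {
            System.out.println(legacy.process(row));   // A3 comes out as A3:ERROR
        }
        System.out.println(runParallel(Arrays.asList("A1", "E2", "A3")));
    }
}
```

Whether the carried-over value is a bug or intended behavior is exactly what the legacy code analysis has to establish before such a refactoring is safe.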

Page 46: COBOL to Apache Spark

1. Group related rows together

2. Process header rows separately

3. Modify business logic

Page 47: COBOL to Apache Spark

Group related rows together

Custom data reader

Multiple rows behave like one row

Process each group row in a loop, on the same node

Pro

Business logic not modified

Con

Relationships may be too complex

Groups may grow too big

ID   Data
1    …
1    …
2    …
3    …
3    …
4    …
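
A custom data reader of this kind can be sketched as an iterator that wraps the row source and hands out whole groups. The sketch assumes rows arrive already sorted by ID, as in the table above, and represents each row as a two-element array of ID and data. Both are simplifying assumptions, and `GroupingReader` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

final class GroupingReader implements Iterator<List<String[]>> {
    private final Iterator<String[]> rows;   // each row is {id, data}
    private String[] pending;                // first row of the next group, or null at the end

    GroupingReader(Iterator<String[]> rows) {
        this.rows = rows;
        this.pending = rows.hasNext() ? rows.next() : null;
    }

    @Override
    public boolean hasNext() {
        return pending != null;
    }

    /** Returns all consecutive rows sharing one ID as a single unit of work. */
    @Override
    public List<String[]> next() {
        if (pending == null) {
            throw new NoSuchElementException();
        }
        List<String[]> group = new ArrayList<>();
        String id = pending[0];
        do {
            group.add(pending);
            pending = rows.hasNext() ? rows.next() : null;
        } while (pending != null && pending[0].equals(id));
        return group;
    }
}
```

For the six rows in the table this yields four groups of sizes 2, 1, 2 and 1. Each group can then be processed by the unmodified business logic in a loop on one node, while different groups run in parallel.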

Page 48: COBOL to Apache Spark

Process header rows separately

Run business logic for header rows first

Collect result in NavigableMap

Run business logic for data rows

Initialize data from previous header

floorKey(dataRowIndex)

Pro

Minimal changes to business logic

Con

Relationships may be too complex

ID   Type   Data
1    Head   …
1    Data   …
1    Data   …
2    Head   …
2    Data   …
3    Head   …
3    Data   …
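
The header lookup can be sketched with `java.util.TreeMap`, the standard `NavigableMap` implementation. `floorKey(k)` returns the greatest key less than or equal to `k`, which for a data row index is the index of the nearest preceding header. The class `HeaderLookup` and the use of a plain string as "header state" are assumptions for illustration.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

final class HeaderLookup {
    // Key: row index of a header row. Value: state produced by the header pass.
    private final NavigableMap<Long, String> headerStateByRowIndex = new TreeMap<>();

    /** First pass: record the result of running the business logic on a header row. */
    void recordHeader(long rowIndex, String state) {
        headerStateByRowIndex.put(rowIndex, state);
    }

    /** Second pass: any data row can find its header without seeing the rows in between. */
    String stateFor(long dataRowIndex) {
        Long headerIndex = headerStateByRowIndex.floorKey(dataRowIndex);
        if (headerIndex == null) {
            throw new IllegalStateException("No header precedes row " + dataRowIndex);
        }
        return headerStateByRowIndex.get(headerIndex);
    }

    public static void main(String[] args) {
        HeaderLookup lookup = new HeaderLookup();
        lookup.recordHeader(0, "head-1");   // rows 1 and 2 are data for ID 1
        lookup.recordHeader(3, "head-2");   // row 4 is data for ID 2
        lookup.recordHeader(5, "head-3");   // row 6 is data for ID 3
        System.out.println(lookup.stateFor(2)); // head-1
        System.out.println(lookup.stateFor(4)); // head-2
    }
}
```

Once the header pass is complete the map is read-only, so the data rows can be processed in any order and on any node that has a copy of it.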

Page 49: COBOL to Apache Spark

Modify business logic

Row relationship could be removed if it is

Unintentional (a bug)

There for an unnecessary optimization

Carrying data that could be retrieved otherwise

Pro

High chance for good performance

Con

High chance for new bugs

Page 50: COBOL to Apache Spark

Input and output data must be shared

Network storage

How long does it take to copy 200 GB?

Diagram labels: Transfer, Process, Heavy Process
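
As a rough answer to the 200 GB question, the arithmetic can be written down directly. The link speeds below are assumptions (the slide gives none), the units are decimal, and protocol overhead and storage throughput are ignored, so real transfers would be slower.

```java
final class TransferEstimate {
    /** Seconds to move a volume at a sustained line rate: gigabytes * 8 bits / (gigabits per second). */
    static double seconds(double gigabytes, double gigabitsPerSecond) {
        return gigabytes * 8.0 / gigabitsPerSecond;
    }

    public static void main(String[] args) {
        // Assumed link speeds, for illustration only.
        System.out.println(seconds(200, 1));    // 1600.0 seconds, about 27 minutes
        System.out.println(seconds(200, 10));   // 160.0 seconds, under 3 minutes
    }
}
```

If the processing itself takes only a few minutes, a transfer of this size can easily dominate the run, which is why light batches gain little from being distributed.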

Page 51: COBOL to Apache Spark

Sequential batches rely on ordering

Tricky to keep in Spark

Safe operations: map, filter, zip

Unsafe operations: join, group, sort

Diagram labels: Process stages separated by Shuffle steps
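
One way to keep ordering across an unsafe operation is to tag every row with its original position before the operation and sort on that tag afterwards. The sketch below shows the idea in plain Java, with `Collections.shuffle` standing in for a Spark shuffle. It is an illustration of the general technique, not the approach confirmed by the talk. In Spark the analogous steps would be `zipWithIndex` followed by a sort on the index, and that sort is itself a shuffle with its own cost.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

final class OrderRestore {
    /** A row together with its position in the original sequential input. */
    static final class Indexed {
        final long index;
        final String row;

        Indexed(long index, String row) {
            this.index = index;
            this.row = row;
        }
    }

    /** Tags each row with its input position before any reordering can happen. */
    static List<Indexed> tag(List<String> rows) {
        List<Indexed> tagged = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            tagged.add(new Indexed(i, rows.get(i)));
        }
        return tagged;
    }

    /** Restores the original order from rows that arrive in any order. */
    static List<String> restore(List<Indexed> anyOrder) {
        List<Indexed> sorted = new ArrayList<>(anyOrder);
        sorted.sort(Comparator.comparingLong(t -> t.index));
        List<String> rows = new ArrayList<>();
        for (Indexed t : sorted) {
            rows.add(t.row);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Indexed> tagged = tag(Arrays.asList("a", "b", "c", "d"));
        Collections.shuffle(tagged, new Random(42));   // stand-in for a join, group or sort
        System.out.println(restore(tagged));           // [a, b, c, d]
    }
}
```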

Page 52: COBOL to Apache Spark

Good for

Heavy processing

Independent input data records

One input, multiple outputs

Unordered data

Not so great for

Little processing

Dependencies between data records

Merging multiple data sources

Page 53: COBOL to Apache Spark

Page 54: COBOL to Apache Spark

Page 55: COBOL to Apache Spark

Diagram labels: Data; 1, 2, 3; Saturday, Sunday, Monday

Page 56: COBOL to Apache Spark

vs.

Page 57: COBOL to Apache Spark

Page 58: COBOL to Apache Spark

Next Phase

1.0 Initial phase: Outsource based, just started. Vendor lock-in.

2.0 In-house development: In-house development, differentiate with lower costs and faster delivery.

3.0 Standardization: Standardized system architecture, both for hardware and software.

4.0 Data Optimized: Overwhelming differentiation, with enabling architecture for customer centric service.

Achieved

Next

Current Standard Architecture

Page 59: COBOL to Apache Spark