Migrating from Oracle to Espresso

Migrating from Oracle to Espresso David Max Senior Software Engineer LinkedIn


Page 1:

Migrating from Oracle to Espresso

David Max, Senior Software Engineer

LinkedIn

Page 2:

About LinkedIn New York Engineering

• Located in Empire State Building

• Approximately 100 engineers and 1000 employees total

• Multiple teams: front end, back end, and data science

New York Engineering

Page 3:

About Me

• Software Engineer at LinkedIn NYC since 2015

• Content Ingestion team

• Office Hours – Thursday 11:30-12:00

David Max, Senior Software Engineer

LinkedIn: www.linkedin.com/in/davidpmax/

Page 4:

What is Content Ingestion?

Content Ingestion

Babylonia

Page 5:

Content Ingestion

Babylonia

Page 6:

Content Ingestion

Babylonia

Page 7:

Content Ingestion

Babylonia

url: https://www.youtube.com/watch?v=MS3c9hz0bRg

title: "SATURN 2017 Keynote: Software is Details"

image: https://i.ytimg.com/vi/MS3c9hz0bRg/hqdefault.jpg?sqpoaymwEYCKgBEF5IVfKriqkDCwgBFQAAiEIYAXAB&rs=AOn4CLClwjQlBmMeoRCePtHaThN-qXRHqg

Page 8:

Content Ingestion

Babylonia

Page 9:

What is Content Ingestion?

Content Ingestion

Babylonia

• Extracts metadata from web pages

• Source of Truth for 3rd party content

• Also contains metadata for some public 1st party content

• Used by LinkedIn services for sharing, decorating, and embedding content

• Data also feeds into content understanding and relevance models

Page 10:

Babylonia Datasets

Database → ETL → HDFS

Content Ingestion

Babylonia Data Change Events

Page 11:

Downstream and Upstream Datasets

Database → ETL → HDFS

Near Line

Offline

Data Change Events

Content Ingestion

Babylonia

Page 12:

Babylonia use of Oracle (before migration)

• Schema – Metadata extracted from each URL is stored in individual rows

• Client – Babylonia is the main (but not the only) client that directly executes queries on the Oracle DB

• Rest.li – Most online interaction with the dataset in Oracle goes through Babylonia’s Rest.li API

• RDBMS – Relational Database Management System

• Databus – Platform for streaming data change events to near line consumers

• Offline – ETL to HDFS for offline consumers

Page 13:

What is Espresso?

Espresso is LinkedIn’s strategic distributed, fault-tolerant NoSQL database that powers many of LinkedIn’s services

• ~100 clusters in use*

• ~420TB of SoT data*

• ~2 million qps at peak load*

* as of August 1, 2017

Page 14:

What is Espresso?

• Document – A table is a container for documents of the same schema (defined in Avro)

• Keys – Documents are indexed by key fields, which are defined in the table schema

• NoSQL – Non-relational

• Distributed – A single database can be distributed over a cluster of machines

• Scalable – Able to scale clusters horizontally by adding more nodes
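The document model above can be sketched with a toy in-memory table. This is an illustration only, not Espresso's API: the `ToyTable` class, the `ContentMetadata` schema, and its field names are all hypothetical. It shows a table holding documents of one Avro-style schema, indexed by a key field.

```python
# Hypothetical Avro-style record schema; the field names are illustrative only.
CONTENT_SCHEMA = {
    "type": "record",
    "name": "ContentMetadata",
    "fields": [
        {"name": "url", "type": "string"},
        {"name": "title", "type": ["null", "string"], "default": None},
    ],
}


class ToyTable:
    """In-memory stand-in for a document table: one schema, documents indexed by key."""

    def __init__(self, schema, key_field):
        self.schema = schema
        self.key_field = key_field
        self._docs = {}

    def put(self, doc):
        # Reject documents that carry fields the schema does not define.
        allowed = {f["name"] for f in self.schema["fields"]}
        unknown = set(doc) - allowed
        if unknown:
            raise ValueError(f"fields not in schema: {sorted(unknown)}")
        self._docs[doc[self.key_field]] = dict(doc)

    def get(self, key):
        # Returns None when no document is stored under the key.
        return self._docs.get(key)


table = ToyTable(CONTENT_SCHEMA, key_field="url")
table.put({"url": "https://example.com/a", "title": "Example"})
```

A real cluster would also partition documents across nodes by key; that part is left out of the sketch.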

Page 15:

Why Migrate?

• Integration – Espresso support is integrated with other tools and systems at LinkedIn

• Rest.li – Espresso’s API is based on Rest.li, which makes it easier to treat Espresso endpoints like other LinkedIn Rest.li endpoints

• Schema Evolution – Supported with zero downtime and no coordination with DBA teams

• Maintenance – Babylonia’s Oracle tables required periodic jobs to be run that involved downtime for each server

• Cost – Oracle is more expensive to run

• Strategy – Espresso is the preferred platform at LinkedIn for data of this type

• Support – The Espresso team is part of LinkedIn

Page 16:

Data Formats (Oracle)

Oracle Database

ETL → HDFS

Near Line

Offline

Oracle Databus Events

Content Ingestion

Babylonia

Rest.li Endpoints

Oracle Row

Pegasus Object

Pegasus Data

Oracle Row

Oracle Row

Oracle Row

• Complex transformation between Oracle format and Pegasus format

Page 17:

Pegasus and Avro

Pegasus Schema

Avro Schema

Java Objects

Java Objects

• Both can be used to generate Java objects with very similar interfaces

• Pegasus schema can be used to auto-generate the Avro schema

• Pegasus and Avro schema definitions are very similar
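To illustrate how close the two schema formats are, here is a minimal sketch of a translation for flat records with primitive field types. It is not the Rest.li schema generator; `pegasus_to_avro` and the example schema are hypothetical. It assumes that a Pegasus field marked `"optional": true` maps to an Avro union with `null` and a `null` default.

```python
def pegasus_to_avro(pdsc):
    """Translate a simplified Pegasus record schema (a dict) into an Avro record schema."""
    fields = []
    for field in pdsc["fields"]:
        out = {"name": field["name"]}
        if field.get("optional", False):
            # Optional Pegasus field -> nullable Avro field with a null default.
            out["type"] = ["null", field["type"]]
            out["default"] = None
        else:
            out["type"] = field["type"]
            if "default" in field:
                out["default"] = field["default"]
        fields.append(out)
    avro = {"type": "record", "name": pdsc["name"], "fields": fields}
    if "namespace" in pdsc:
        avro["namespace"] = pdsc["namespace"]
    return avro


# Hypothetical Pegasus schema; names are illustrative only.
EXAMPLE_PDSC = {
    "type": "record",
    "name": "ContentMetadata",
    "namespace": "com.example",
    "fields": [
        {"name": "url", "type": "string"},
        {"name": "title", "type": "string", "optional": True},
    ],
}

avro_schema = pegasus_to_avro(EXAMPLE_PDSC)
```

Nested records, maps, arrays, and unions are deliberately out of scope here.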

Page 18:

Data Formats (Espresso)

Espresso Database

ETL → HDFS

Near Line

Offline

Espresso Brooklin Events

Content Ingestion

Babylonia

Rest.li Endpoints

Espresso Avro

Pegasus Object

Pegasus Data

Espresso Avro

Espresso Avro

Espresso Avro

• Simple transformation between Avro format and Pegasus format

Page 19:

Why Migrate? Schema Evolution

Oracle

• ALTER TABLE

• Not tied to code deployment – need to coordinate with DBAs

• Schema change involves server downtime

• In practice, developers go to great lengths to avoid the hassle

• Schema accumulates tech debt

Espresso

• Document schema auto-registration

• Schema changes are registered automatically as part of the Babylonia deployment process

• Backwards compatibility is enforced – existing data does not need to be transformed

• Avro schema is a more natural fit with the Rest.li Pegasus schema
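A minimal sketch of what a backward-compatibility check can look like, assuming a much simplified rule set: a new schema may add a field only if it has a default, and may not change the type of an existing field. The function name is hypothetical and real Avro schema resolution has more rules (type promotion, aliases, and so on).

```python
def is_backward_compatible(old_schema, new_schema):
    """Return True if data written with old_schema stays readable with new_schema.

    Simplified rules (an assumption, not the full Avro resolution rules):
    - a field added in new_schema must carry a default
    - a field present in both schemas must keep the same type
    - removing a field is allowed (the reader ignores it)
    """
    old_fields = {f["name"]: f for f in old_schema["fields"]}
    for field in new_schema["fields"]:
        old = old_fields.get(field["name"])
        if old is None:
            if "default" not in field:
                return False
        elif old["type"] != field["type"]:
            return False
    return True


v1 = {"type": "record", "name": "Doc", "fields": [{"name": "url", "type": "string"}]}
v2_ok = {"type": "record", "name": "Doc", "fields": [
    {"name": "url", "type": "string"},
    {"name": "title", "type": ["null", "string"], "default": None},
]}
v2_bad = {"type": "record", "name": "Doc", "fields": [
    {"name": "url", "type": "string"},
    {"name": "title", "type": "string"},  # no default: old documents cannot be read
]}
```

Because every accepted change keeps old documents readable, existing data can stay as it is.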

Page 20:

Goals for Migration Process

• Zero downtime

• Transparent to Rest.li clients

• Give offline and nearline consumers time to migrate

• Validate each step

• Mirroring in real time

Page 21:

Pre-Migration State of Babylonia

Oracle Database

ETL → HDFS

Near Line

Offline

Oracle Databus Events

Content Ingestion

Babylonia

Page 22:

Pre-Migration State of Babylonia

Oracle Database

Oracle Databus Events

Rest.li Endpoints

Other Services

Rest.li Calls

Page 23:

Pre-Migration Cleanup

Oracle Database

Oracle Databus Events

Rest.li Endpoints

Other Services

Rest.li Calls

• Identify code that is tightly coupled to the database

• Decide which code should be reimplemented for Espresso, and which code should be decoupled or eliminated

• Reduce number of code paths to migrate

The easiest lines of code to migrate are the lines of code that don’t exist

Page 24:

Bootstrap Espresso Database

Oracle Database

ETL → HDFS

Offline Convert Job

Espresso Database

Espresso Bulk Loader

Avro Data File

Page 25:

Bootstrap Espresso Database

Oracle Database

ETL → HDFS

Espresso Database
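The convert step of the bootstrap can be sketched as a pure function over rows from the ETL snapshot. The column names (`URL`, `TITLE`, `DESCRIPTION`) and the record shape are hypothetical. Writing the result to an Avro data file and handing it to the bulk loader are left out.

```python
def convert_row(row):
    """Map one ETL'd Oracle row (column name -> value) to an Avro-shaped record."""
    return {
        "url": row["URL"],
        "title": row.get("TITLE"),
        "description": row.get("DESCRIPTION"),
    }


def convert_snapshot(rows):
    """Convert every row that has a key; return (records, skipped_count)."""
    records, skipped = [], 0
    for row in rows:
        if not row.get("URL"):
            # A row without a key cannot become a document; count it for validation.
            skipped += 1
            continue
        records.append(convert_row(row))
    return records, skipped


snapshot = [
    {"URL": "https://example.com/a", "TITLE": "A", "DESCRIPTION": None},
    {"URL": None, "TITLE": "orphan"},
]
records, skipped = convert_snapshot(snapshot)
```

Counting skipped rows gives a simple number to check against the source table, in the spirit of validating each step.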

Page 26:

Shadow Read Validation

Databus Listener, Shadow Read Validation

Oracle Database

Oracle Databus Events

Espresso Database

Databus Listener
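A minimal sketch of this stage, using plain dictionaries as stand-ins for the two databases. The event shape (`key`, `value`, with `value` of `None` meaning delete) and the function names are assumptions for illustration. The listener mirrors Oracle changes into Espresso; reads are still served from Oracle and compared against Espresso on the side.

```python
import logging

logger = logging.getLogger("shadow_read")

oracle_store = {}    # stand-in for the Oracle source of truth
espresso_store = {}  # stand-in for the Espresso mirror
mismatches = []      # keys whose shadow read disagreed with Oracle


def on_databus_event(event):
    """Apply one change event (hypothetical shape: {'key', 'value'}) to the mirror."""
    if event["value"] is None:
        espresso_store.pop(event["key"], None)
    else:
        espresso_store[event["key"]] = event["value"]


def read_with_shadow(key):
    """Serve the read from Oracle; record a mismatch if Espresso disagrees."""
    primary = oracle_store.get(key)
    shadow = espresso_store.get(key)
    if primary != shadow:
        mismatches.append(key)
        logger.warning("shadow read mismatch for %s", key)
    return primary


oracle_store["u1"] = {"title": "A"}
on_databus_event({"key": "u1", "value": {"title": "A"}})
```

Callers never see the Espresso result at this stage, so a mismatch costs nothing beyond a log line.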

Page 27:

Direct Writes to Espresso

Oracle Database

Oracle Databus Events

Espresso Database

Databus Listener

Shadow Read Validation

Direct Write

Page 28:

Resolving Write Conflicts

Oracle Databus Events

Espresso Database

Databus Listener

Direct Write

• Dual Write Conflict – Databus Listener and Babylonia updating same record

• Migration Control – optional field added to the schema indicating which process wrote the record: Bulk Loader, Databus Listener, or Babylonia
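One way such a field could be used is sketched below. The slides do not state the resolution rule, so the precedence here is an assumption: a direct Babylonia write outranks a Databus Listener write, which outranks the Bulk Loader. The field name `migrationControl` and the helper functions are hypothetical.

```python
from enum import IntEnum


class Writer(IntEnum):
    """Process that last wrote a record; a higher value wins (assumed ordering)."""
    BULK_LOADER = 1
    DATABUS_LISTENER = 2
    BABYLONIA = 3


def should_apply(existing, incoming_writer):
    """Return True if a write from incoming_writer may overwrite the existing record."""
    if existing is None:
        return True
    current = existing.get("migrationControl")
    if current is None:
        # The field is optional; treat unmarked records as overwritable.
        return True
    return incoming_writer >= Writer[current]


def write(store, key, value, writer):
    """Write value under key unless a higher-precedence writer already owns the record."""
    if not should_apply(store.get(key), writer):
        return False
    store[key] = {**value, "migrationControl": writer.name}
    return True


store = {}
write(store, "u1", {"title": "bootstrapped"}, Writer.BULK_LOADER)
write(store, "u1", {"title": "fresh"}, Writer.BABYLONIA)
late = write(store, "u1", {"title": "stale"}, Writer.DATABUS_LISTENER)
```

Under this assumed rule, a listener event that arrives after a direct write to the same record is dropped instead of overwriting it.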

Page 29:

Espresso New SoT

Oracle Database

Oracle Databus Events

Espresso Database

Direct Read/Write

Dual Writes

Espresso Brooklin Events

Deprecated

Page 30:

Oracle Turnoff

Espresso Database

Direct Read/Write

Espresso Brooklin Events
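The stages walked through above can be summarized as a small routing table. This is one reading of the slides, not the actual Babylonia implementation: the phase names and the `write_targets` and `read_source` helpers are hypothetical.

```python
from enum import Enum


class Phase(Enum):
    PRE_MIGRATION = 1   # Oracle only
    MIRRORING = 2       # Databus Listener copies Oracle changes; shadow reads validate
    DIRECT_WRITES = 3   # Babylonia also writes directly to Espresso
    ESPRESSO_SOT = 4    # Espresso is source of truth; dual writes keep Oracle current
    ORACLE_OFF = 5      # Oracle turned off


def write_targets(phase):
    """Stores that Babylonia writes to directly in each phase, primary first."""
    if phase in (Phase.PRE_MIGRATION, Phase.MIRRORING):
        return ["oracle"]
    if phase is Phase.DIRECT_WRITES:
        return ["oracle", "espresso"]
    if phase is Phase.ESPRESSO_SOT:
        return ["espresso", "oracle"]
    return ["espresso"]


def read_source(phase):
    """Store that serves reads in each phase."""
    return "oracle" if phase.value < Phase.ESPRESSO_SOT.value else "espresso"
```

Keeping Oracle written during the `ESPRESSO_SOT` phase is what gives offline and nearline consumers time to move to the Espresso-based feeds.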

Page 31:

Thank you