
  • #ibmedge © 2016 IBM Corporation

    Session #2442: Flash-Optimized Apache Spark: Expanding In-Memory Analytics into Flash

    Bernie Wu, Levyx

    Randy Swanberg, IBM

    9/21/16

  • #ibmedge

    Please Note:

    •  IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice and at IBM’s sole discretion.

    •  Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision.

    •  The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract.

    •  The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.

    •  Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.


  • #ibmedge

    Agenda

    •  Apache Spark
    •  OpenPOWER
    •  Spark on OpenPOWER
    •  CAPI Flash Technology
    •  Levyx
       –  Technology overview
       –  Capabilities and use-cases
       –  Levyx on OpenPOWER with CAPI Flash
    •  Summary / Questions / Follow-up

  • #ibmedge

    Apache Spark

  • #ibmedge

    Apache Spark

    Fast and general engine for large-scale data processing

    [Diagram: Spark SQL, Spark Streaming, MLlib, and GraphX libraries on top of the Spark Core API, which is accessible from R, Scala, SQL, Python, and Java]

  • #ibmedge

    Apache Spark

    •  Unified Analytics Platform
       –  Combine streaming, graph, machine learning, and SQL analytics on a single platform
       –  Simplified, multi-language programming model
       –  Interactive and batch
    •  In-Memory Design
       –  Pipelines multiple iterations over a single copy of data in memory
       –  Superior performance
       –  Natural successor to MapReduce

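    The in-memory design point can be illustrated with a toy model, assuming a hypothetical `CountingSource` that stands in for disk-based storage and counts full reads. An iterative job that re-reads its input on every pass (as a chain of MapReduce jobs would) touches storage once per iteration, while one that keeps a single copy in memory reads it once. This is plain Python, not the Spark API; in Spark the analogous step is calling `cache()` or `persist()` on a dataset that is reused across iterations.

```python
class CountingSource:
    """Hypothetical stand-in for disk/HDFS storage that counts full reads."""

    def __init__(self, values):
        self._values = list(values)
        self.reads = 0

    def read(self):
        self.reads += 1
        return list(self._values)  # each read returns a fresh copy, like loading from storage


def iterate(source, iterations, cache):
    """Toy iterative job: move an estimate halfway toward the data mean on each pass."""
    data = source.read() if cache else None  # cached: load one copy up front
    estimate = 0.0
    for _ in range(iterations):
        batch = data if cache else source.read()  # uncached: re-read on every pass
        mean = sum(batch) / len(batch)
        estimate -= 0.5 * (estimate - mean)
    return estimate


uncached, cached = CountingSource([1, 2, 3, 4]), CountingSource([1, 2, 3, 4])
iterate(uncached, 10, cache=False)
iterate(cached, 10, cache=True)
print(uncached.reads, cached.reads)  # 10 1
```

    Both runs compute the same estimate; only the number of trips to storage differs, which is the cost the in-memory design removes for iterative workloads such as machine learning.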

  • #ibmedge

    OpenPOWER

  • #ibmedge

    OpenPOWER, a Catalyst for Open Innovation

    Market Shifts
    •  Moore’s law no longer delivers sufficient performance gains
    •  Numerous IT consumption models
    •  Growing workload demands
    •  Mature open software ecosystem

    OpenPOWER Strategy
    •  Accelerated innovation through collaboration of partners
    •  Amplified capabilities driving industry performance leadership
    •  Vibrant ecosystem through open development

    Industry Adoption, Open Choice
    •  Cloud computing
    •  Hyperscale and large-scale datacenters
    •  High-performance computing and analytics
    •  Domestic IT agendas

  • #ibmedge

    Performance of Spark on POWER: 7-Node S812LC 10-core vs. 7-Node E5-2690 v3 12-core

    •  1.7X System-to-System Advantage
    •  2X Core-to-Core Advantage
    •  1.5X Price Performance Advantage

    (Each comparison is charted for Machine Learning, SQL, and Graph workloads.)
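    The 1.7X system-level and 2X per-core figures are consistent with each other under one assumption: that each POWER node contributes the stated 10 cores and each x86 node the stated 12 cores. The slide does not say how the per-core figure was derived, so the sketch below is only a consistency check that normalizes the system speedup by total core count.

```python
def per_core_advantage(system_speedup, cores_a, cores_b):
    """Convert a whole-system speedup of A over B into a per-core speedup.

    A delivers system_speedup times B's throughput using cores_a cores,
    while B uses cores_b cores, so per-core throughput scales by cores_b / cores_a.
    """
    return system_speedup * cores_b / cores_a


power_cores = 7 * 10  # 7 nodes x 10 cores (assumed per node)
x86_cores = 7 * 12    # 7 nodes x 12 cores (assumed per node)
print(round(per_core_advantage(1.7, power_cores, x86_cores), 2))  # 2.04
```

    If the x86 nodes were instead dual-socket (24 cores each), the same arithmetic would give a much larger per-core ratio, so the 2X figure suggests the comparison counted 12 cores per node.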

  • #ibmedge

    OpenPOWER Technology: Coherent Accelerator Processor Interface (CAPI)

    [Diagram: POWER8 processor (CAPP, PCIe) linked over CAPI to an FPGA that hosts accelerator functions 0 through n above the IBM-supplied POWER Service Layer]

    Typical I/O Model Flow: device-driver (DD) call → copy or pin source data → MMIO notify accelerator → acceleration → poll/interrupt completion → copy or unpin result data → return from DD completion

    Flow with a Coherent Model: shared-memory notify accelerator → acceleration → shared-memory completion

    ✓  Virtual addressing and data caching
    ✓  Easier programming model
    ✓  Enables applications not possible with conventional I/O
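    The difference between the two flows can be sketched as a software analogy, assuming two hypothetical functions that double a list of numbers. In the I/O model the data is copied into a separate device buffer and the result is copied back; in the coherent model the "accelerator" (here a Python thread) works directly on the caller's memory and signals completion through a shared flag. This is not the CAPI programming interface, only an illustration of why dropping the copy/pin and copy-back steps simplifies the programming model.

```python
import threading


def accelerate_io_model(host_buf, stats):
    """I/O model: copy in, compute on a device buffer, copy results back."""
    device_buf = list(host_buf)           # copy source data to the device
    stats["copied"] += len(device_buf)
    for i, x in enumerate(device_buf):    # acceleration on the device copy
        device_buf[i] = x * 2
    host_buf[:] = device_buf              # copy result data back to the host
    stats["copied"] += len(device_buf)


def accelerate_coherent(host_buf, stats):
    """Coherent model: the worker shares the host's memory, so stats stays untouched."""
    done = threading.Event()              # completion flag visible to both sides

    def worker():
        for i, x in enumerate(host_buf):  # operate in place on shared memory
            host_buf[i] = x * 2
        done.set()                        # signal completion

    t = threading.Thread(target=worker)
    t.start()                             # "notify accelerator"
    done.wait()                           # host waits on the shared completion flag
    t.join()


io_buf, shared_buf = [1, 2, 3], [1, 2, 3]
io_stats, coherent_stats = {"copied": 0}, {"copied": 0}
accelerate_io_model(io_buf, io_stats)
accelerate_coherent(shared_buf, coherent_stats)
print(io_buf, io_stats["copied"])            # [2, 4, 6] 6
print(shared_buf, coherent_stats["copied"])  # [2, 4, 6] 0
```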

  • #ibmedge

    CAPI Attached Flash Optimization

    •  Attach IBM FlashSystem to POWER8 via CAPI
    •  Applications issue read/write commands through APIs, eliminating 97% of the code path length
    •  Saves 10+ cores per 1M IOPS

    Conventional path: Application → read/write syscall → pin buffers, translate, map DMA, start I/O → interrupt, unmap, unpin, iodone scheduling

    20K instructions reduced to
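    The "cores saved" claim follows from simple arithmetic: the CPU capacity consumed by the I/O path is IOPS multiplied by the instructions executed per I/O, divided by the instructions one core retires per second. The sketch below shows that relationship. Only the 97% reduction comes from the slide; the path length and per-core instruction rate are hypothetical inputs, not measurements, because the slide's reduced instruction count is cut off.

```python
def cores_for_io(iops, instructions_per_io, instructions_per_core_per_sec):
    """Cores kept busy just executing the I/O code path."""
    return iops * instructions_per_io / instructions_per_core_per_sec


def cores_saved(iops, instructions_per_io, reduction, instructions_per_core_per_sec):
    """Cores freed when the per-I/O path length shrinks by `reduction` (0..1)."""
    before = cores_for_io(iops, instructions_per_io, instructions_per_core_per_sec)
    after = cores_for_io(iops, instructions_per_io * (1 - reduction),
                         instructions_per_core_per_sec)
    return before - after


# Hypothetical inputs: 1M IOPS, a 10,000-instruction path, 1e9 instructions/s per core.
print(round(cores_saved(1_000_000, 10_000, 0.97, 1e9), 1))  # 9.7
```

    Because the result scales linearly with IOPS and path length, a 97% cut leaves only 3% of the original per-I/O CPU cost, whatever the absolute numbers are.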

  • #ibmedge

    CAPI Flash Configurations

    •  External Flash Configuration: Power S822L / S812L with IBM FlashSystem 900; up to 56TB of extended memory with one POWER8 server plus CAPI-attached flash
    •  Integrated Flash Configuration (NEW): Power S822L / S812L / S822LC; up to 8TB of super-fast storage tier on one POWER8 server

    [Charts: "IOPS per Hardware Thread" and "Latency (microseconds)", each comparing Conventional, CAPI-I, and CAPI-E; "Average Relative IOPS per CPU Thread" comparing Fibre Channel, NVMe, CAPI Fibre Channel, and CAPI NVMe, with bars labeled 0.6X, 1X, 2.6X, and 3.7X]

  • #ibmedge

    CAPI Flash Solution Use Cases

    Memory Expansion
    •  Application is constrained by single-system memory capacity; typical growth is through additional compute nodes.
    •  CAPI Flash APIs offer highly efficient flash access and increased total capacity at a better $ / throughput ratio.

    Data Cache
    •  Application uses in-memory caches for data storage and is typically constrained by the ratio of memory to underlying storage.
    •  CAPI Flash APIs offer access to much larger ephemeral or persistent data in flash, freeing up RAM.

    Fast Storage
    •  Application is constrained by the I/O overhead and throughput of existing storage infrastructure.
    •  CAPI Flash APIs offer extremely high I/O per CPU thread with low latency.
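    The data-cache use case can be sketched as a two-tier cache, assuming a hypothetical `TieredCache` in which a small LRU in RAM holds hot entries and evicted entries are demoted to a larger flash tier instead of being discarded. A plain dict stands in for the flash tier here; a real deployment would go through the CAPI Flash APIs, which are not shown.

```python
from collections import OrderedDict


class TieredCache:
    """Hypothetical two-tier cache: bounded LRU in RAM, overflow demoted to a flash tier."""

    def __init__(self, ram_capacity):
        self.ram_capacity = ram_capacity
        self.ram = OrderedDict()  # most recently used entries at the end
        self.flash = {}           # stand-in for a larger flash-backed store

    def put(self, key, value):
        self.flash.pop(key, None)  # keep a single authoritative copy
        self.ram[key] = value
        self.ram.move_to_end(key)
        self._evict()

    def get(self, key):
        if key in self.ram:                 # RAM hit: refresh recency
            self.ram.move_to_end(key)
            return self.ram[key]
        if key in self.flash:               # flash hit: promote back to RAM
            value = self.flash.pop(key)
            self.ram[key] = value
            self._evict()
            return value
        raise KeyError(key)

    def _evict(self):
        # Demote least recently used entries to flash instead of discarding them.
        while len(self.ram) > self.ram_capacity:
            old_key, old_value = self.ram.popitem(last=False)
            self.flash[old_key] = old_value


cache = TieredCache(ram_capacity=2)
for k, v in [("a", 1), ("b", 2), ("c", 3)]:
    cache.put(k, v)
print(list(cache.ram), list(cache.flash))                  # ['b', 'c'] ['a']
print(cache.get("a"), list(cache.ram), list(cache.flash))  # 1 ['c', 'a'] ['b']
```

    The RAM tier stays bounded while the working set can grow into the flash tier, which is the "freeing up RAM" effect the slide describes.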


  • #ibmedge

    Levyx Overview

    •  Mission: provide software that cost-effectively maximizes performance and minimizes latency for Big Data and other database server platforms
    •  Founded in 2013; headquartered in Irvine, CA
    •  Reza Sadri, CEO
       –  Entrepreneur, PhD CS, database specialization
    •  Tony Givargis, CTO
       –  UC Irvine Professor, PhD CS, embedded systems
    •  Series “A” led by OCA Ventures
    •  Patent-pending indexing technology
    •  Cloud, OEM, SI/SP partnerships

  • #ibmedge

    Levyx Key-Value Storage Layer Bridges the Gap

    [Diagram: software above hardware comprising multi-core processors, flash SSDs, and NVMs]

    Hardware-agnostic storage layer designed to optimize data-focused software and the latest hardware

  • #ibmedge

    Levyx Products

    •  Helium-DB Storage Engine
       –  World’s fastest key-value store for Big Data analytics and operational databases
       –  In-memory speeds or better, with persistence
    •  LevyxSpark: Apache Spark + Helium
       –  Storage-optimized and accelerated open-source Spark for real-time, high-I/O-performance applications
       –  Full Spark SQL query pushdown (join, group-by, filter, etc.) and acceleration to machine-code speeds
       –  Node consolidation with a combined memory-flash storage layer
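    Query pushdown means evaluating operators such as filters and group-by aggregations inside the storage layer, so that only results, not raw rows, flow up to the query engine. The sketch below illustrates the idea with a hypothetical `ToyStore`; it is not LevyxSpark's or Spark's data source API, and it counts records crossing the storage boundary to make the saving visible.

```python
from collections import defaultdict


class ToyStore:
    """Hypothetical storage layer; counts how many records it hands back to the caller."""

    def __init__(self, rows):
        self.rows = rows
        self.rows_returned = 0

    def scan(self):
        # No pushdown: every stored row is shipped up to the query engine.
        for row in self.rows:
            self.rows_returned += 1
            yield row

    def filter_group_sum(self, predicate, key, value):
        # Pushdown: filter and aggregate inside the storage layer,
        # returning one record per group instead of every raw row.
        totals = defaultdict(int)
        for row in self.rows:
            if predicate(row):
                totals[row[key]] += row[value]
        self.rows_returned += len(totals)
        return dict(totals)


def engine_side_group_sum(store, predicate, key, value):
    """The same query evaluated in the engine after a full scan."""
    totals = defaultdict(int)
    for row in store.scan():
        if predicate(row):
            totals[row[key]] += row[value]
    return dict(totals)


def is_large(row):
    return row["qty"] > 2


trades = [{"sym": "A", "qty": 10}, {"sym": "B", "qty": 5},
          {"sym": "A", "qty": 7}, {"sym": "C", "qty": 1}]
s1, s2 = ToyStore(trades), ToyStore(trades)
print(engine_side_group_sum(s1, is_large, "sym", "qty"), s1.rows_returned)  # {'A': 17, 'B': 5} 4
print(s2.filter_group_sum(is_large, "sym", "qty"), s2.rows_returned)        # {'A': 17, 'B': 5} 2
```

    Both paths return the same answer; with pushdown, the data moved out of storage shrinks from the table size to the result size.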

  • #ibmedge

    Example Use Cases

    •  Financial Services
       –  Electronic trading workflow: streaming analytics, compliance, risk management, algorithmic/ML-based trading
    •  Cybersecurity
       –  Logging and event management, correlation
       –  User behavior analytics/ML
    •  IoT
       –  Edge and datacenter real-time and batch analytics/operational databases
    •  E-commerce/Adtech
       –  Real-time bidding analytics

  • #ibmedge

    © Copyright 2013-2016 Levyx Inc.

    Helium: World’s Fastest Key-Value Store, Pluggable DB Storage Engine

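    One common way to make a storage engine pluggable is to define a small key-value contract that the database layer calls without knowing the implementation behind it. The sketch below illustrates that pattern with a hypothetical `KVEngine` interface (put, get, delete, ordered range scan) and a simple in-memory implementation; the names and method set are assumptions for illustration, not Helium's actual API.

```python
from abc import ABC, abstractmethod
from bisect import bisect_left, insort
from typing import Iterator, Optional, Tuple


class KVEngine(ABC):
    """Hypothetical pluggable storage-engine interface (not Helium's actual API)."""

    @abstractmethod
    def put(self, key: bytes, value: bytes) -> None: ...

    @abstractmethod
    def get(self, key: bytes) -> Optional[bytes]: ...

    @abstractmethod
    def delete(self, key: bytes) -> None: ...

    @abstractmethod
    def scan(self, start: bytes, end: bytes) -> Iterator[Tuple[bytes, bytes]]: ...


class SortedMemoryEngine(KVEngine):
    """Simple in-memory engine that keeps keys sorted for ordered range scans."""

    def __init__(self) -> None:
        self._keys: list = []
        self._data: dict = {}

    def put(self, key, value):
        if key not in self._data:
            insort(self._keys, key)
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def delete(self, key):
        if key in self._data:
            del self._data[key]
            self._keys.pop(bisect_left(self._keys, key))

    def scan(self, start, end):
        # Yield (key, value) pairs with start <= key < end, in key order.
        i = bisect_left(self._keys, start)
        while i < len(self._keys) and self._keys[i] < end:
            yield self._keys[i], self._data[self._keys[i]]
            i += 1


def count_range(engine: KVEngine, start: bytes, end: bytes) -> int:
    """Upper-layer code written only against the interface, so engines can be swapped."""
    return sum(1 for _ in engine.scan(start, end))


db = SortedMemoryEngine()
for k in (b"user:3", b"user:1", b"order:9", b"user:2"):
    db.put(k, b"...")
print(count_range(db, b"user:", b"user;"))  # 3
```

    A flash-optimized engine would implement the same four methods, and code such as `count_range` would run against it unchanged.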

  • #ibmedge

    Optimization Tool

    Patent-pending Multi-core


    Ultra-low latency indexing engine