BlueSSD: Distributed Flash Store for Big Data Analytics

Sang Woo Jun, Ming Liu, Kermin Fleming, Arvind
Computer Science and Artificial Intelligence Laboratory, MIT


Page 1: BlueSSD: Distributed Flash Store for Big Data  Analytics

BlueSSD: Distributed Flash Store for Big Data Analytics
Sang Woo Jun, Ming Liu, Kermin Fleming, Arvind
Computer Science and Artificial Intelligence Laboratory
MIT

Page 2: BlueSSD: Distributed Flash Store for Big Data  Analytics

Introduction – Flash Storage

• Low latency, high density
• Throughput per chip is fixed
  • Many chips are organized into multiple busses that can work concurrently
  • High throughput is achieved with more busses
• Read/write speed difference, limited write lifetime
  • Not the main focus… yet

Page 3: BlueSSD: Distributed Flash Store for Big Data  Analytics

Flash Deployment Goals

• High Capacity / Low Unit Cost
  • COREFU - Share distributed storage over commodity network
  • TBs of storage at <1ms latency, 1GB/s throughput at high distribution
• High Throughput / Low Latency
  • FusionIO - Maximum performance using many busses/chips and PCIe
  • 100s of GB at 100s of us latency, 3GB/s throughput

Page 4: BlueSSD: Distributed Flash Store for Big Data  Analytics

BlueSSD – Best of Both Worlds

• Shared distributed storage over faster custom network to accelerate big data analytics

• PCIe
  • 8x PCIe 2.0 (~1GB/s)
• Inter-FPGA SERDES
  • Low latency sideband network (<1us, ~1GB/s)
  • Automatic network/flow control synthesis

Page 5: BlueSSD: Distributed Flash Store for Big Data  Analytics

The Physical System (Old)

Sideband Link (~1GB/s)

Flash Board (~80MB/s)

PCIe (~1GB/s)

Page 6: BlueSSD: Distributed Flash Store for Big Data  Analytics

The Physical System (Now-4 Nodes)

Page 7: BlueSSD: Distributed Flash Store for Big Data  Analytics

System Configuration

• 6 Xilinx ML605 Development Boards + Hosts
• 4 Custom Flash Boards
  • 4 busses with 8 chips, 16GB per board
• 2 Xilinx XM104 Connector Expansion Boards
• 5 SMA Connections

[Diagram labels: two hub nodes (FPGA + XM104); four storage nodes (FPGA1 to FPGA4), each with a Custom Flash Board; Host PC; PCIe and SMA links]

The ML605 only has one SMA port, requiring hubs

Page 8: BlueSSD: Distributed Flash Store for Big Data  Analytics

System Configuration

• Single software host can access all nodes
• All nodes have identical memory maps of the entire address space
• Requests are redirected to nodes that have the data

[Diagram labels: two hub FPGAs with XM104 boards; FPGA1 to FPGA4, each with a Custom Flash Board; Host PC; PCIe and SMA links]

Page 9: BlueSSD: Distributed Flash Store for Big Data  Analytics

Network Flash Controller

[Diagram labels: Host PC, PCIe, Client Interface (Requests, Data), Address Mapping, Flash Controller, Flash Board, SMA, Remote Node]

Page 10: BlueSSD: Distributed Flash Store for Big Data  Analytics

Network Hub

• Programmatically define high-level connections
• N-to-N crossbar-like network is generated

[Diagram labels: ML605 boards, XM104, SMA; FPGA1 to FPGA4 on each side of the hub]
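As a rough illustration of what "N-to-N crossbar-like" implies, the sketch below enumerates the directed channels a hub would need among a set of named endpoints. The real network is synthesized in hardware; the function name `crossbar_channels` is hypothetical, and the endpoint labels are simply borrowed from the diagram.

```python
from itertools import permutations

def crossbar_channels(endpoints):
    """Return every ordered (source, destination) pair of distinct endpoints."""
    return list(permutations(endpoints, 2))

# Endpoint labels borrowed from the slide's diagram.
nodes = ["FPGA1", "FPGA2", "FPGA3", "FPGA4"]
channels = crossbar_channels(nodes)

# N endpoints need N * (N - 1) directed channels: 4 * 3 = 12 here.
print(len(channels))
```

The quadratic growth in channel count is one reason generating the network from a high-level description is attractive compared with wiring each connection by hand.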

Page 11: BlueSSD: Distributed Flash Store for Big Data  Analytics

Software

• FUSE provides a file system abstraction
  • Custom FUSE module interfaces with FPGA
  • The entire storage can be accessed as a single regular file
• Currently running SQLite off-the-shelf
  • How to benchmark?

[Diagram: software stack, top to bottom: SQLite, stdio / File System, FUSE, PCIe Driver, FPGA]
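Because the storage appears as a single regular file, one possible answer to the benchmarking question is a plain random page-read test against that file. The sketch below is only an assumption about how such a test could look: the page size, the function name `random_page_reads`, and the path passed in are all hypothetical, and on an ordinary file system the OS page cache will inflate the numbers.

```python
import os
import random
import time

PAGE_SIZE = 8192  # assumed page size in bytes; the slides do not state it

def random_page_reads(path, num_reads, page_size=PAGE_SIZE, seed=0):
    """Read random page-aligned blocks; return (mean latency in us, MB/s)."""
    rng = random.Random(seed)
    fd = os.open(path, os.O_RDONLY)
    try:
        num_pages = os.fstat(fd).st_size // page_size
        if num_pages == 0:
            raise ValueError("file is smaller than one page")
        total_bytes = 0
        start = time.perf_counter()
        for _ in range(num_reads):
            page = rng.randrange(num_pages)
            # os.pread reads at an explicit offset without moving the file position.
            total_bytes += len(os.pread(fd, page_size, page * page_size))
        # Guard against a zero interval on very coarse clocks.
        elapsed = max(time.perf_counter() - start, 1e-9)
    finally:
        os.close(fd)
    return elapsed / num_reads * 1e6, total_bytes / elapsed / 1e6
```

Calling `random_page_reads("/path/to/mounted/file", 1000)` (path hypothetical) would give figures comparable in kind to the latency and throughput columns reported later, though not necessarily measured the same way the authors did.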

Page 12: BlueSSD: Distributed Flash Store for Big Data  Analytics

Storage Structure

• Focusing on read-intensive workloads
  • Writes are done offline, no coherence issues
  • Address is striped across FPGAs
• Concurrent writes will require more than coherence
  • SQLite assumes exclusive access to storage
  • If we are to have more than one file, file system metadata will need to be synchronized
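Striping the address space across FPGAs, with every node holding the same map, can be sketched as a pure function from a global page number to an owning node. The stripe granularity and the names `locate` and `route` are assumptions for illustration; the slides do not give the actual mapping.

```python
def locate(page, num_nodes, stripe_pages=1):
    """Map a global page number to (node index, page number local to that node)."""
    stripe = page // stripe_pages          # which stripe the page falls in
    node = stripe % num_nodes              # stripes rotate round-robin over nodes
    local_page = (stripe // num_nodes) * stripe_pages + page % stripe_pages
    return node, local_page

def route(page, this_node, num_nodes):
    """Decide whether a request is served locally or redirected to another node."""
    node, local_page = locate(page, num_nodes)
    kind = "local" if node == this_node else "remote"
    return kind, node, local_page

# With 4 nodes and one-page stripes, consecutive pages land on consecutive nodes.
for page in range(6):
    print(page, locate(page, num_nodes=4))
```

Because the function depends only on the page number and the node count, every node computes the same answer, which is what lets any node redirect a request without consulting a central directory.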

Page 13: BlueSSD: Distributed Flash Store for Big Data  Analytics

Performance Measurement

Nodes      Page Read Latency (us)
1          104
2          141
4          (<180?)
COREFU     600
FusionIO   68

Nodes      Throughput (MB/s)
1          85
2          170
4          (340?)
COREFU*    1500
FusionIO   3000

Throughput bottlenecked by custom flash card
*COREFU performance at 32 nodes
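The throughput figures are consistent with simple linear scaling from the single-node measurement, which is presumably where the tentative 4-node value comes from. The check below assumes nothing beyond that linear model; `projected_throughput` is a hypothetical helper name.

```python
SINGLE_NODE_MB_S = 85  # measured 1-node throughput from the table above

def projected_throughput(nodes, per_node_mb_s=SINGLE_NODE_MB_S):
    """Linear projection: each added node contributes one flash board's throughput."""
    return nodes * per_node_mb_s

for n in (1, 2, 4):
    print(n, projected_throughput(n))
```

The projection reproduces the measured 170 MB/s at 2 nodes and the tentative 340 MB/s at 4 nodes.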

Page 14: BlueSSD: Distributed Flash Store for Big Data  Analytics

Scalability

• Latency increase is small enough to accommodate 16+ FPGAs
• Single SMA cable can carry the throughput of 10+ flash boards
  • More should be possible with good topology
  • Different story if flash boards are faster (link compression?)
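The "10+" figure follows from dividing the link bandwidth by the per-board throughput. The sketch below treats the ~1GB/s sideband link as 1000 MB/s and uses the measured 85 MB/s per board; both are approximations taken from earlier slides, and `boards_per_link` is a hypothetical helper.

```python
def boards_per_link(link_mb_s, board_mb_s):
    """Number of flash boards whose combined throughput fits on one link."""
    return link_mb_s // board_mb_s

LINK_MB_S = 1000   # ~1GB/s sideband link, taken as 1000 MB/s
BOARD_MB_S = 85    # measured single-board throughput

print(boards_per_link(LINK_MB_S, BOARD_MB_S))
```

If the flash boards were several times faster, the same division would leave room for only a handful of boards per cable, which is the "different story" the slide refers to.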

Page 15: BlueSSD: Distributed Flash Store for Big Data  Analytics

Future Work (1)

• Bring up the 4 node system
• Bring up the 8 node system
  • 8 more ML605 boards have been requested from Xilinx
• More capacity + throughput

Page 16: BlueSSD: Distributed Flash Store for Big Data  Analytics

Future Work (2)

• Offload computation to FPGA
  • Do computation near storage
• Relational algebra processor
  • Complex analytics?
• Looking for interesting applications

Page 17: BlueSSD: Distributed Flash Store for Big Data  Analytics

Future Work (3)

• Multiple concurrent writers
  • Software level transaction management
  • Hardware level pseudo-filesystem is probably required

Page 18: BlueSSD: Distributed Flash Store for Big Data  Analytics

The End

• Thank you!