Unifying the Data Center Caching Layer - Feasible? Profitable?

Liana V. Rodriguez, Alexis Gonzalez, Pratik Poudel, Raju Rangaswami, and Jason Liu

Transcript of Unifying the Data Center Caching Layer - Feasible? Profitable?

Page 1: Unifying the Data Center Caching Layer - Feasible? Profitable?

Unifying the Data Center Caching Layer - Feasible? Profitable?

Liana V. Rodriguez, Alexis Gonzalez, Pratik Poudel, Raju Rangaswami, and Jason Liu

Page 2: Unifying the Data Center Caching Layer - Feasible? Profitable?

Background

❖ Cloud data centers use different storage types (block, file, object, key-value)

❖ Caches typically target one storage type, one store instance, or simply the workloads running on a single host or VM

❖ New devices (3D XPoint, flash-based SSD technology) create new requirements for cache optimization

❖ High-performance network fabrics (40 GigE, RoCE, InfiniBand, etc.) are making remote data access more efficient

[Figure: the data center stack, layered as Applications (Big Data, Machine Learning, Enterprise Apps), Libraries (NoSQL, Spark, TensorFlow, MapReduce, PyTorch), Distributed Caching with Block, File, Object, and Key-Value interfaces (Memcached, Redis, EC-Cache, Alluxio), and Storage Systems (HDFS, NFS, Azure Blob, Google FS, Ceph, S3).]

Page 3: Unifying the Data Center Caching Layer - Feasible? Profitable?

Outline

❖ Background

❖ Motivation

❖ Caching as a Service (CaaS)

❖ Preliminary Study

❖ Discussion and Future Work

Page 4: Unifying the Data Center Caching Layer - Feasible? Profitable?

Motivation

[Figure: a cloud data center with applications App1-App5 spread across Host 1, which runs a file system (HDFS), and Host 2, which runs a key-value store, illustrating cache fragmentation.]

Cache Fragmentation: Cache resources should be shared across storage systems and applications to decrease resource fragmentation within the data center.

Page 5: Unifying the Data Center Caching Layer - Feasible? Profitable?

Motivation

[Same cache-fragmentation figure as Page 4, now with a table of write intensity per store type.]

Cache Writes: Write-dominant workloads are common in today's cloud data centers.

Store Type    Writes
Block         71%
File          9.8%
Object        40.5%
Key-value     22.5%

Page 6: Unifying the Data Center Caching Layer - Feasible? Profitable?

Motivation

[Same figure and table as Pages 4-5, now with a unified caching layer drawn between the applications and the cache devices.]

Cache Devices: Different devices imply different cost-performance optimizations in the caching layer.

Page 7: Unifying the Data Center Caching Layer - Feasible? Profitable?

Motivation

Throughput increases 10x with a local cache and 7x with a remote cache when the cache hit rate increases from 90% to 99%, using a storage back-end latency of 1 ms. Similar improvements are seen with larger (10 ms and 100 ms) storage latencies.
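
The relationship between hit rate and throughput can be made concrete with a back-of-envelope model. The sketch below is an illustration only: it assumes throughput is inversely proportional to the average access latency and borrows the per-hit latencies from the preliminary-study model later in the talk; it is not the exact methodology behind the 10x/7x figures.

```python
# Back-of-envelope model (an assumption for illustration, not the talk's exact setup):
# throughput is treated as inversely proportional to the average access latency,
#   avg = hit_rate * hit_latency + (1 - hit_rate) * backend_latency.

def avg_latency_us(hit_rate: float, hit_latency_us: float, backend_latency_us: float) -> float:
    """Expected per-request latency in microseconds for a given cache hit rate."""
    return hit_rate * hit_latency_us + (1.0 - hit_rate) * backend_latency_us

def throughput_gain(hit_latency_us: float, backend_latency_us: float,
                    lo: float = 0.90, hi: float = 0.99) -> float:
    """Relative throughput gain when the hit rate rises from lo to hi."""
    return (avg_latency_us(lo, hit_latency_us, backend_latency_us)
            / avg_latency_us(hi, hit_latency_us, backend_latency_us))

BACKEND_US = 1000.0  # 1 ms storage back-end latency

# Hit latencies below are taken from the preliminary-study model (Pages 17-18).
for name, hit_us in [("local SSD cache hit (5.3 us)", 5.3),
                     ("remote cache hit (13.3 us)", 13.3)]:
    print(f"{name}: ~{throughput_gain(hit_us, BACKEND_US):.1f}x")
```

This serial model already captures most of the effect, since the miss fraction drops by 10x; the exact 10x and 7x gains on the slide additionally depend on queue depth and the measurement setup.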

Page 8: Unifying the Data Center Caching Layer - Feasible? Profitable?

Motivation

Throughput increases 1.24x with a 20 GB remote cache at QD=8, and 18.7x with a 100 GB remote cache at QD=8.

Page 9: Unifying the Data Center Caching Layer - Feasible? Profitable?

Caching as a Service (CaaS)

A general distributed caching layer that abstracts and exposes all the available cache resources in the cloud for all types of storage systems.

Page 10: Unifying the Data Center Caching Layer - Feasible? Profitable?

Architecture

[Figure: workload generators (VMs, containers, applications, etc.) issue all storage I/O directly to the back-end storage.]

Page 11: Unifying the Data Center Caching Layer - Feasible? Profitable?

Architecture

[Figure: a CaaS Client with translation layers is interposed between the workload generators (VMs, containers, applications, etc.) and the back-end storage, capturing all storage I/O.]

Page 12: Unifying the Data Center Caching Layer - Feasible? Profitable?

Architecture

[Figure: the CaaS Client now talks, through the CaaS API, to a set of CaaS Data Servers and the CaaS Coordination Service; the back-end storage sits behind the client.]

Page 13: Unifying the Data Center Caching Layer - Feasible? Profitable?

Architecture

[Figure: the complete CaaS architecture. Workload generators send all storage I/O to the CaaS Client through per-storage-type translation layers; the client uses the CaaS API to reach the CaaS Coordination Service and Data Servers, and only CaaS misses and write-backs reach the back-end storage.]

Page 14: Unifying the Data Center Caching Layer - Feasible? Profitable?

CaaS API

❖ The CaaS API is designed around the concepts of Store and Clice (cache slice).

❖ Stores: Unit of access-protected cache provisioning

❖ Clices: Unit of CaaS data access

❖ CaaS Clients register with the Coordination Service

❖ CaaS Clients request clice locations using on-demand lookup calls

❖ CaaS Clients access data from CaaS Data Servers using read and writeClean/writeDirty calls

CaaS Coordination/Data Server API

Page 15: Unifying the Data Center Caching Layer - Feasible? Profitable?

CaaS API

[Repeats the Store/Clice and Coordination/Data Server API bullets from Page 14, and adds the client-side write-back call.]

CaaS Client API

❖ Write-back calls atomically write inter-dependent dirty data to the storage back-end
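
To make the calls above concrete, here is a minimal interface sketch assembled only from the operations named on these slides (register, lookup, read, writeClean/writeDirty, write-back). All type names, signatures, and parameters are hypothetical illustrations, not the actual CaaS implementation.

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass(frozen=True)
class CliceId:
    """Identifies a clice (cache slice) within an access-protected Store."""
    store: str   # the provisioned Store the clice belongs to
    clice: int   # clice index within the store

@dataclass(frozen=True)
class CliceLocation:
    """Location of a clice, as returned by the Coordination Service."""
    data_server: str  # hypothetical address, e.g. "caas-ds-03:7000"

class CoordinationService(Protocol):
    def register(self, client_id: str, store: str) -> None:
        """Register a CaaS Client for access to a provisioned Store."""
    def lookup(self, clice: CliceId) -> CliceLocation:
        """On-demand lookup of the data server currently holding a clice."""

class DataServer(Protocol):
    def read(self, clice: CliceId, offset: int, length: int) -> bytes:
        """Read cached data; on a miss the caller fills the clice from the back-end."""
    def write_clean(self, clice: CliceId, offset: int, data: bytes) -> None:
        """Insert data that already exists in the storage back-end."""
    def write_dirty(self, clice: CliceId, offset: int, data: bytes) -> None:
        """Buffer new data that must eventually be written back."""
    def writeback(self, clices: list[CliceId]) -> None:
        """Atomically flush a set of inter-dependent dirty clices to the back-end."""
```

A per-storage-type translation layer (block, file, object, key-value) would map its native requests onto calls like these, looking up clice locations on demand and filling read misses from the back-end store.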

Page 16: Unifying the Data Center Caching Layer - Feasible? Profitable?

Experimental Setup

❖ 363 block storage workloads
  ■ User home/project directories; web-based servers; webpage; online course management system; hardware monitoring; source control; web staging; terminal; web/SQL; media; test web; firewall/web proxy; VMware VMs

❖ 40 Gbit Ethernet network latency of 4 µs; back-end storage latency (AWS EBS) of 2 ms

❖ ARC replacement algorithm in both the SSD-based local cache and CaaS

❖ Cache simulation used to obtain the number of read hits

❖ Writes in the local cache are simulated as misses, while writes in CaaS are simulated as hits

❖ Write-back fault tolerance in CaaS simulated using primary-backup replication with one leader and one follower

Page 17: Unifying the Data Center Caching Layer - Feasible? Profitable?

Preliminary Study Model

Local Cache:

[Figure: the application reads from a local SSD cache with a read latency of 5.3 µs; writes and misses go to the back-end storage (EBS) with a latency of 2 ms.]

Page 18: Unifying the Data Center Caching Layer - Feasible? Profitable?

Preliminary Study Model

CaaS reads:

[Figure: the application reads through the CaaS Client from a CaaS Data Server with a read latency of 13.3 µs; misses go to the back-end storage (EBS) with a latency of 2 ms.]

CaaS writes:

[Figure: the application writes through the CaaS Client to two CaaS Data Servers (primary-backup) with a write latency of 28 µs; misses go to the back-end storage (EBS) with a latency of 2 ms. The local cache model from Page 17 is shown alongside for comparison.]
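
The per-request latencies above can be folded into a simple expected-latency model. The sketch below is an assumption-laden illustration, weighting reads and writes by an arbitrarily chosen hit rate and write fraction; it is not the study's exact accounting.

```python
# Expected I/O latency model built from the per-request latencies on Pages 17-18.
# The read hit rate and write fraction used below are illustrative, not from the study.

US, MS = 1.0, 1000.0  # microseconds and milliseconds, expressed in microseconds

def local_cache_latency_us(read_hit_rate: float, write_fraction: float) -> float:
    """Local SSD cache: read hits cost 5.3 us; read misses and all writes pay the 2 ms EBS latency."""
    read = read_hit_rate * 5.3 * US + (1.0 - read_hit_rate) * 2 * MS
    write = 2 * MS  # writes are simulated as misses in the local cache
    return (1.0 - write_fraction) * read + write_fraction * write

def caas_latency_us(read_hit_rate: float, write_fraction: float) -> float:
    """CaaS: read hits cost 13.3 us, misses 2 ms; writes are replicated cache hits at 28 us."""
    read = read_hit_rate * 13.3 * US + (1.0 - read_hit_rate) * 2 * MS
    write = 28 * US  # primary-backup replicated write into the cache
    return (1.0 - write_fraction) * read + write_fraction * write

# Example with a 95% read hit rate and 10% writes (illustrative values only).
local = local_cache_latency_us(0.95, 0.10)
caas = caas_latency_us(0.95, 0.10)
print(f"local: {local:.0f} us, CaaS: {caas:.0f} us, improvement: {1 - caas / local:.0%}")
```

The 31% and 36% improvements reported on the following slides come from replaying the 363 block workloads, whose write fractions and per-workload hit rates vary widely.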

Page 19: Unifying the Data Center Caching Layer - Feasible? Profitable?

Preliminary Study Results

Write latency: up to 90%

Read latency: up to 6%

Page 20: Unifying the Data Center Caching Layer - Feasible? Profitable?

Preliminary Study Results

I/O Latency 31% improvement

Page 21: Unifying the Data Center Caching Layer - Feasible? Profitable?

Preliminary Study Results

I/O Latency 36% improvement

Page 22: Unifying the Data Center Caching Layer - Feasible? Profitable?

Future Work

❖ Writeable caching: design a caching layer that provides the required fault-tolerance and consistency guarantees.

❖ Cache unit size: select the clice size(s) in CaaS to optimize for both cache fragmentation and metadata overhead.

❖ Cache management: design an allocation scheme and the corresponding eviction strategies.

❖ Data placement and load balancing: policies for data placement and load-balancing decisions across CaaS data servers.

❖ Service Level Agreements (SLAs): a distributed cache resource allocation algorithm that guarantees a user-defined level of Quality of Service (QoS).

Page 23: Unifying the Data Center Caching Layer - Feasible? Profitable?


Thanks!

❖ [email protected]

❖ http://sylab-srv.cs.fiu.edu