SNIA Scality DSI April 2014 · Traditional File Storage not ready and designed to support Internet...
Key criteria when building an ExaScale Data Center
Philippe Nicolas
Scality
2 2014 Data Storage Innovation Conference. © Scality. All Rights Reserved.
Agenda
Internet impact
ExaScale DC
Scality RING
Summary
A Rolling and Unstoppable Ball…
WORLDWIDE VOLUME OF DATA IS EXPECTED TO GROW BY
50% EACH YEAR
DATA PRODUCED BY GOOGLE
24 PB ADDED DAILY
100 hours of video uploaded to YouTube every minute
12.5M new photos uploaded every hour
4.2B likes, posts and comments per day
170PB HDFS on 40K+ servers @ Yahoo!
The Internet changed everything!
The Internet introduced new challenges that are impossible to solve with traditional approaches:
100s of millions of users, 10s-100s of PB of data and billions of files to store, serve and compute
What do all these companies have in common?
Radical new Software Approaches
Traditional file storage was neither designed for nor ready to support the Internet age (fast and massive growth, always on, remote access)
Limitations in size and numbers (total capacity, file size, number of files, both global and per directory)
Google FS, Haystack at Facebook, 170PB HDFS (40k+ servers) at Yahoo!
Metadata wide-consistency is problematic
“Horizontal” consistency (volume, FS), recovery model
File system structure and disk layout, file sharing protocols “too old”, scalability limits
Cost control (HW & SW)
Limited SW license (open source)
New ideas coming from academia, research and “recent” IT companies
Research papers from GOOG about BigTable and GFS, or from AMZN about Dynamo
Common ExaScale DC Characteristics
Scalability: Performance and Capacity
Multi-Geo: Multiple points of presence and service
Advanced Data Protection: Replication and Erasure Coding, (Geo-)Dispersed
Access Methods: http, file and block modes; open, standard, proprietary... protocols
Ecosystem: Large application portfolio validated on the platform
Scality’s Mission
Their DC Their App. YOUR Data
Their DC YOUR App. YOUR Data
YOUR DC YOUR App. YOUR Data
Unified Scale-Out Storage Software, from primary to long-term storage,
with the Cloud's advantages, investment protection and readiness for the future
Scality RING
The Data Center Data Storage Platform
x86
Ring Topology, P2P, End-to-End Parallelism
Object Storage (DATA, MD)
Replication, ARC Erasure Coding, Geo Redundancy
Tiering Management
MESA Distributed DB
Object: HTTP/REST, S3, Swift & CDMI
File: FUSE, NFS, CIFS, AFP & HDFS
Block: iSCSI, Cinder
Security: Kerberos ACL, SSL…
Hardware Agnostic
Massively Scalable Architecture
x86 servers (1U, 2U…4U) with CPU, RAM, Ethernet and DAS, running Linux
Racks (10…40U)
Clusters
Data Center Storage with Servers
From Server Farm to Storage Pool
Scale-Out, Shared-Nothing, P2P Technology & RING Topology
Data Replication and Erasure Coding (Scality ARC)
Object, File (Scality SOFS) and Block Access
[Diagram: several Linux OS servers pooled into one Scality RING exposing Object, File and Block access]
Distributed Architecture
From Servers to Storage Nodes
RING Topology, P2P Architecture
Limitless Scale-Out Storage based on a Shared-Nothing model
Fully Distributed Storage (Data and Metadata)
Servers (min. 6) running storage nodes (ex: 6/server, total = 36)
36 storage nodes projected on a ring
1 node manages 1/36th of the key space
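The 36-node example can be made concrete with a small consistent-hashing sketch. This is a generic illustration and not Scality's implementation: the 32-bit key space, the SHA-1 based `key_of` function, the evenly spaced node positions and the `serverX-nodeY` names are all assumptions introduced here.

```python
import hashlib
from bisect import bisect_left

KEY_BITS = 32                  # illustrative key-space size (assumption)
KEY_SPACE = 2 ** KEY_BITS
SERVERS = 6                    # minimum server count quoted on the slide
NODES_PER_SERVER = 6           # slide example: 6 storage nodes per server


def build_ring(servers=SERVERS, nodes_per_server=NODES_PER_SERVER):
    """Place every storage node at an evenly spaced position on the ring."""
    total = servers * nodes_per_server
    step = KEY_SPACE // total
    ring = []
    for i in range(total):
        # Interleave servers so that neighbouring positions sit on different servers.
        server = i % servers
        node = i // servers
        ring.append((i * step, f"server{server}-node{node}"))
    return ring


def key_of(name):
    """Map an object name to a position in the key space (hypothetical scheme)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:KEY_BITS // 8], "big")


def owner(ring, key):
    """Return the node owning `key`: the first node at or after it, wrapping around."""
    positions = [pos for pos, _ in ring]
    idx = bisect_left(positions, key) % len(ring)
    return ring[idx][1]


ring = build_ring()
share = (KEY_SPACE // len(ring)) / KEY_SPACE   # fraction of keys per node

print(len(ring), "nodes, each owning about", round(share * 100, 2), "% of the key space")
print("photo.jpg is stored on", owner(ring, key_of("photo.jpg")))
```

With 6 servers of 6 nodes each, every node ends up responsible for roughly 1/36th of the keys, which is the property the slide describes. Interleaving servers around the ring is a design choice of this sketch, so that adjacent key ranges land on different physical machines.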
Advanced Data Protection
Local or Stretched Cluster (Geo-Dispersed)
Replication (small files, “small” config.)
Erasure Code (Scality ARC)
[Diagram: Replication 4x versus Scality ARC(14,4); labels: data inputs, data, parities, 14 fragments, 4 chunks]
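The trade-off between the two protection schemes can be worked out with simple arithmetic. The sketch below is an illustration under stated assumptions: "Replication 4x" is read as 4 full copies in total, ARC(14,4) is read as 14 data fragments plus 4 parity chunks, and the code is assumed to be able to rebuild data from any 14 of the 18 pieces. The function names are hypothetical.

```python
from fractions import Fraction


def replication_profile(copies):
    """Raw-to-usable capacity ratio and tolerated losses for N full copies."""
    return {
        "overhead": Fraction(copies, 1),
        "tolerated_losses": copies - 1,    # one surviving copy is enough
    }


def erasure_profile(data_fragments, parity_fragments):
    """Same figures for an erasure code with k data and m parity fragments.

    Assumes any k of the k + m fragments are sufficient to rebuild the data.
    """
    total = data_fragments + parity_fragments
    return {
        "overhead": Fraction(total, data_fragments),
        "tolerated_losses": parity_fragments,
    }


rep = replication_profile(4)        # assumption: 4 copies in total
arc = erasure_profile(14, 4)        # assumption: 14 data + 4 parity

for name, profile in (("replication 4x", rep), ("ARC(14,4)", arc)):
    print(f"{name}: {float(profile['overhead']):.2f}x raw capacity, "
          f"survives {profile['tolerated_losses']} lost pieces")
```

Under these assumptions, erasure coding needs 18/14 (about 1.29) units of raw capacity per unit of data and survives 4 lost pieces, while 4 full copies need 4 units and survive 3 lost copies. This is consistent with the slide reserving replication for small files and small configurations.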
Access Methods and Standards
Open to de-facto & industry standards
Access Methods exposed by Connectors
Local and Remote access
Global Namespace
Optimized Object API: Scality HTTP/REST API, S3*, Swift & CDMI
Flexible File Interfaces: FUSE, NFS, CIFS, AFP, FTP & HDFS
Block interface: iSCSI & Cinder
[Diagram labels: NFS, SOFS FUSE, S3, CDMI, Swift, Internet, SOFS Block, HDFS]
* Scality RS2 (Full and Light version)
Scality File Storage Services
FUSE-based Parallel Network File Access
Industry & de-facto File Sharing Protocols (NFS, CIFS…)
CDMI server and Computing in-place for Hadoop
Scality Scale Out File System (SOFS) back-end layer
Data Protection by Replication or Erasure Coding (ARC)
[Diagram: FUSE, CIFS, AFP, FTP, HDFS, CDMI and NFS connectors on top of the Scality Scale Out File System, protected by Replication or Erasure Coding]
Scality SOFS via FUSE
Parallel Network File Access
FUSE-based access
POSIX compliant
Direct data access, no gateway
Highly Scalable: 2^32 volumes (= FS), 2^24 namespaces
Very High Capacity and Billions of Files: 2^64 (about 1.8 × 10^19) addressable files per Volume
Sparse files to support large files
Leverages Scality MESA Internal Distributed DB
[Diagram labels: Mount Point, Volume (File System), 2^64 files, 2^32 Volumes, 2^24 Namespaces]
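The powers of two quoted for SOFS are easier to compare once converted to decimal orders of magnitude. The short sketch below only does that conversion; the constant names and the `sci` helper are introduced here for illustration.

```python
FILES_PER_VOLUME = 2 ** 64   # addressable files per volume
VOLUMES = 2 ** 32            # volumes (file systems)
NAMESPACES = 2 ** 24         # namespaces


def sci(n):
    """Format an integer in scientific notation with two significant digits."""
    return f"{n:.1e}"


for label, value in (
    ("files per volume", FILES_PER_VOLUME),
    ("volumes", VOLUMES),
    ("namespaces", NAMESPACES),
):
    print(f"{label}: {value} (~{sci(value)})")
```

So 2^64 is roughly 1.8 × 10^19 files per volume, 2^32 is about 4.3 billion volumes, and 2^24 is about 16.8 million namespaces.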
Parallel Network File Access
Global Namespace – ONE Single View of Content
1000s-10000s Compute Nodes
Parallel Data Access To Deliver High Throughput
10s-1000s Storage Servers running Linux
100s-10 000s HDD
Performance Benchmark
Mixed workloads with small and large files
Performance almost linear with addition of storage
45k Read / 42k Write (4KB)
8GB/s Read (1MB)
17GB/s Read (1GB)
40 ms average time to file download on HDD
5 ms maximum time to information lookup
1 rack of storage (0.7PB raw)
Data reconstruction 1TB in 16 ! while serving data
Downloadable from www.scality.com
Scality Kinetic Model
MetaData RING Farm: x86 servers or VM w. SSD
File system, Directories, Small objects, Cache
MDC
Seagate Kinetic Pools
Redundant Geo Metadata farms (available also at Kinetic sites)
Data Center #0 Data Center #1 Data Center #2
MetaData access
Direct Data access (Object / File / Block)
Erasure coding and/or replication
Scality OpenStack Playbook
Seamless integration between the OpenStack Compute cluster and Scality RING via Object (Swift) or Block (Cinder) APIs
OpenStack to benefit from the Scality RING petabyte-scale unified storage platform
Scality to leverage multiple OpenStack projects: Dashboard, Telemetry, Identity, Savanna (Hadoop) and Manila (File Services)
Key contribution: Open Source kernel block driver via HTTP
Commodity HW
App. Data
Services
Ecosystem – Path to Success
Scality RING Data Storage Platform
S3 & CDMI API
Associations Foundations
Use Cases Vertical IT
Strategic Partners
Software Validation
Scality Technologies for ExaScale DC
Massively Scalable Architecture
Shared-Nothing, Scale-out, P2P Technology & RING Topology
Multiple Local and Geo deployments
Billions of Files and ExaBytes of Data
Very High Resiliency
Data Replication and Erasure Coding (Scality ARC)
Application connection with flexible Access Methods
HTTP/REST, S3, CDMI, NFS, CIFS, AFP, FTP, HDFS, Scality SOFS and Block
Deployment Options
Data Center, Multi-sites, Stretched DC
Value Proposition
No NAS Silos, Storage Arrays or Storage Networking
Software-Defined Storage with x86 COTS
High Performance, Capacity and Reliability
Simple IT Operations with Fully Automated solutions and No Storage Management
Fast and Easy Integration with Standard Access Methods and Interfaces
Very low TCO
Active References
Scality Technical Vision
Storage Back-end Local, Distant or Multi-Geo RINGs
Scale Out Access Layer
Unified Storage Interface (File, Block, Object and HDFS, Local and Remote)
Tenant A Tenant B Tenant C Tenant D
Email Media Cloud Compute
Flash/SSD High Perf.
Transactional
Real-time Policy & QoS
Big Data Analytics
Content Indexing
Metering
Auth.
Statistics
Security
Customer IT Infrastructure
Discovery
Note: Underlined items are roadmap
1 year
SATA Standard Storage
100 years
Erasure Code SATA & Ethernet
Long-Term Storage
Scality – Quick Facts
Founded 2009
Experienced management team
HQ in San Francisco, global reach
~70 employees, ~30 engineers in Paris
24 x 7 support team
3 US patents
$35M in 3 rounds
500% annual growth
Industry Associations
“Aggressive use of a scale-out architecture
like that enabled by Scality's RING
architecture will become more
prevalent, as IT organizations develop
best practices that boost storage asset
use, reduce operational overhead,
and meet high data availability
expectations.”
Questions & Answers