OceanStore: An Architecture for Global-Scale Persistent Storage

Post on 22-Dec-2015



• Motivation

• Feature

• Application

• Specific Components
- Secure Naming
- Access Control
- Data Location and Routing
- Update
- Deep Archival Storage
- Introspection

• Conclusion

http://oceanstore.cs.berkeley.edu

Motivation

OceanStore provides persistent storage for ubiquitous computing:

• Secure information

• Durable information

• Automatic and reliable archiving of information

• Geographically distributed data and caches

Target scale: 10^10 users × 10,000 files/user = 10^14 files

Features

• Data Utility Model
- user
- service provider / responsible party

• Untrusted Infrastructure
- privacy, integrity, and robustness

• Nomadic Data / Promiscuous Caching
- floating replicas

• Deep Archival Storage
- archival form of data object
- self-verifying data

• Introspection

[Figure: introspection cycle linking computation, observation, and optimization]

Applications

• Groupware and personal information management tools
- challenge: concurrent updates from many people
- solution: flexible update mechanism

• Digital libraries and repositories for scientific data
- challenge: massive quantities of storage, reliability, complicated management
- solution: deep archival storage + seamless data migration

• New streaming applications
- challenge: data aggregation and dissemination
- solution: uniform infrastructure

Secure Naming

Fundamental unit: persistent object

GUID: secure hash, 160 bits, location independent

Uniqueness + unforgeability + verification

• AGUID (active data): SHA-1(human-readable name + owner’s public key)

• VGUID (archival data): SHA-1(data)

• NodeID (server): SHA-1(public key of server)

Directory objects securely map human-readable names to GUIDs
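The three identifiers above can be sketched with a standard SHA-1 implementation. This is only an illustration: the function names, the UTF-8 encoding of the name, and the plain concatenation of name and key are assumptions, not details given on the slide.

```python
import hashlib


def aguid(name, owner_public_key):
    """Active GUID: SHA-1 over the human-readable name plus the owner's public key."""
    # Assumption: the name is UTF-8 encoded and simply concatenated with the key bytes.
    return hashlib.sha1(name.encode("utf-8") + owner_public_key).hexdigest()


def vguid(data):
    """Archival GUID: SHA-1 over the archived content itself."""
    return hashlib.sha1(data).hexdigest()


def node_id(server_public_key):
    """NodeID: SHA-1 over the server's public key."""
    return hashlib.sha1(server_public_key).hexdigest()


def verify_archival(guid, data):
    """Anyone holding a VGUID can check that returned data is the data it names."""
    return vguid(data) == guid
```

Because a VGUID is derived from the content, a reader can detect data substituted by an untrusted server; binding the owner's key into the AGUID keeps two owners who pick the same name from colliding.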

GUIDs: Secure Pointers

[Figure: a name plus key yields an Active GUID, which Global Object Resolution maps to a floating replica (active object) holding active data, commit logs, a checkpoint GUID, archival GUIDs, a signature, RP keys, ACLs, and metadata. The archival GUIDs lead, again through Global Object Resolution, to erasure-coded archival copies or snapshots and to an inactive object carrying an archival GUID and signature.]

Access Control

• Reader restriction: data is encrypted, and the key is distributed only to authorized readers

• Writer restriction: all writes must be signed, so that they can be checked against an ACL

Data Location and Routing

Two levels:

• Fast probabilistic search for the “routing cache”: attenuated Bloom filters
- first Bloom filter: a record of the objects contained locally on the current node
- i-th Bloom filter: the union of the Bloom filters of all nodes at distance i from the current node along any path
- fully distributed, constant amount of storage per server
- locality is provided by the introspection mechanism

• Slow, guaranteed global search: Plaxton mesh
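As a rough illustration of the fast probabilistic level, the sketch below stores each Bloom filter as an integer bitmask and keeps one filter per distance. The filter size `M`, the number of hash functions `K`, and the salted SHA-1 bit positions are hypothetical choices made for the example; the slides do not specify them.

```python
import hashlib

M = 1024  # bits per filter (hypothetical)
K = 3     # hash functions per GUID (hypothetical)


def bit_positions(guid):
    """Derive K bit positions from salted SHA-1 digests of the GUID."""
    return [
        int.from_bytes(hashlib.sha1(f"{i}:{guid}".encode()).digest()[:4], "big") % M
        for i in range(K)
    ]


def make_filter(guids):
    """Bloom filter over a collection of GUIDs, stored as an integer bitmask."""
    bits = 0
    for guid in guids:
        for pos in bit_positions(guid):
            bits |= 1 << pos
    return bits


def may_contain(bits, guid):
    """True if every bit for the GUID is set (false positives are possible)."""
    return all((bits >> pos) & 1 for pos in bit_positions(guid))


def union(filters):
    """The filter for distance i is the bitwise OR of the filters of nodes at that distance."""
    merged = 0
    for f in filters:
        merged |= f
    return merged


def first_match_depth(attenuated, guid):
    """Smallest distance whose filter may contain the GUID, or None if no level matches."""
    for depth, bits in enumerate(attenuated):
        if may_contain(bits, guid):
            return depth
    return None
```

A query that finds no matching level would fall back to the slow, guaranteed global search. A match is only a hint, since Bloom filters can report false positives but never false negatives.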

Global Algorithm

• Nodes: NodeID

• Data objects: GUID

– Each object has a root node: f(ObjectID) = RootID, randomly mapped

– The root node is responsible for storing the object’s location

– Publish process: deposit a pointer at every hop along the path to the root node

• Plaxton mesh

• Incremental suffix-based routing
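Incremental suffix-based routing can be sketched as follows: each hop moves to a node that shares at least one more trailing digit with the destination ID. This is a simplified simulation that assumes global knowledge of all node IDs and equal-length IDs; a real Plaxton mesh keeps per-node neighbor tables and prefers nearby neighbors. The `NODES` list reuses the NodeIDs from the deck's mesh figure, and the tie-breaking rule is an arbitrary choice for the example.

```python
NODES = [
    "43FE", "13FE", "ABFE", "1290", "239E", "73FE", "423E", "79FE", "23FE",
    "73FF", "555E", "035E", "44FE", "9990", "F990", "993E", "04FE",
]


def shared_suffix_len(a, b):
    """Number of trailing digits that two IDs have in common."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


def route(start, dest, nodes):
    """Hops from start to dest; every hop matches more trailing digits of dest.

    Assumes dest is one of the nodes, so a next hop always exists.
    """
    path = [start]
    current = start
    while current != dest:
        need = shared_suffix_len(current, dest) + 1
        candidates = [n for n in nodes if shared_suffix_len(n, dest) >= need]
        # Arbitrary tie-break: resolve as few extra digits as possible, then sort by ID.
        current = min(candidates, key=lambda n: (shared_suffix_len(n, dest), n))
        path.append(current)
    return path
```

For example, `route("1290", "43FE", NODES)` resolves the suffixes `E`, `FE`, `3FE`, and `43FE` in turn. During publishing, each hop on such a path would also store a pointer to the object's location.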

Plaxton Mesh: Incremental suffix-based routing

[Figure: mesh of nodes joined by links labeled 1 to 4. NodeIDs shown: 0x43FE, 0x13FE, 0xABFE, 0x1290, 0x239E, 0x73FE, 0x423E, 0x79FE, 0x23FE, 0x73FF, 0x555E, 0x035E, 0x44FE, 0x9990, 0xF990, 0x993E, 0x04FE.]

[Figure: Object Location: Randomization and Locality]

Fault-tolerant Routing

• Multiple roots of each object using salted hash

• Additional neighbor links & neighbor link repair

• Repeat publishing process to repair location pointers

• Detect failures via soft-state probe packets

• Dynamic insertion & deletion
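The first bullet, multiple roots via a salted hash, can be illustrated by hashing the GUID together with a small salt. The salt format, the number of roots, and the helper names below are hypothetical; the slide only states that salted hashes are used.

```python
import hashlib


def root_ids(guid, num_roots=3):
    """Derive several root IDs for one object by hashing the GUID with different salts."""
    # Assumption: the salt is a small integer appended after a colon.
    return [
        hashlib.sha1(f"{guid}:{salt}".encode()).hexdigest()
        for salt in range(num_roots)
    ]


def first_live_root(guid, live_roots, num_roots=3):
    """Return the first derived root that is still reachable, or None if all have failed."""
    for rid in root_ids(guid, num_roots):
        if rid in live_roots:
            return rid
    return None
```

With several independent roots, losing any single root node does not make the object unlocatable; a lookup retries with the next salt.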

Update Model

Update message format:

Timestamp | Client ID | {Pred1, Update1} | {Pred2, Update2} | {Pred3, Update3} | Client Signature

Conflict resolution

• Predicate-action pairs

• Write restriction

• All updates are submitted to the Inner Ring servers, which use a Byzantine agreement protocol to choose the final commit order

• The responsible party decides the inner ring

• The Plaxton mesh is used to disseminate the commit order to secondary-tier replicas
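A minimal sketch of how predicate-action pairs might be evaluated once an update's position in the commit order is fixed: the predicates are tried in order, the action of the first one that holds is applied and the update commits, otherwise the update aborts. The dictionary-based object and the version-check predicate are illustrative assumptions; signatures, encryption, and agreement are left out.

```python
def apply_update(obj, pairs):
    """Apply the action paired with the first predicate that holds.

    Returns (new_object, committed). If no predicate holds, the object is
    returned unchanged and the update aborts.
    """
    for predicate, action in pairs:
        if predicate(obj):
            return action(obj), True
    return obj, False


def append_if_version(expected_version, block):
    """Hypothetical update: append a block only if the object is at the expected version."""
    def predicate(obj):
        return obj["version"] == expected_version

    def action(obj):
        # Build a new object rather than mutating the old version.
        return {"version": obj["version"] + 1, "blocks": obj["blocks"] + [block]}

    return [(predicate, action)]
```

Two clients that both submit `append_if_version(1, ...)` against the same object illustrate conflict resolution: whichever update is ordered first commits, and the second aborts because the version has already advanced.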

Flexible update: supports a range of consistency semantics (e.g. ACID)

Untrusted infrastructure: updates are limited to operations that work over ciphertext

Performance:
- network bandwidth requirements
- client-side latency

OceanStore Update

Deep Archival Storage

• Archival data in erasure-coded fragments
- Erasure codes produce n fragments, any m of which are sufficient to reconstruct the data (m < n). The rate is r = m/n, and the storage overhead is 1/r.

• OceanStore equivalent of stable store

• Archival fragments are generated by the Inner Ring

• Fragments are self-verifying
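To make the n, m, and rate definitions concrete, here is about the simplest possible erasure code: two data halves plus one XOR parity fragment (n = 3, m = 2, r = 2/3, overhead 1.5). It is a toy stand-in chosen for brevity, not the code OceanStore uses, and it assumes even-length input and at most one lost fragment.

```python
def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data):
    """Split data into two halves and add an XOR parity fragment (n = 3, m = 2)."""
    if len(data) % 2:
        raise ValueError("this sketch assumes even-length data")
    half = len(data) // 2
    a, b = data[:half], data[half:]
    return [a, b, xor_bytes(a, b)]


def decode(fragments):
    """Rebuild the data from any two of the three fragments; the lost one is None."""
    a, b, parity = fragments
    if a is None:
        a = xor_bytes(b, parity)
    elif b is None:
        b = xor_bytes(a, parity)
    return a + b


def storage_overhead(m, n):
    """Overhead is 1/r, where the rate r = m/n."""
    return n / m
```

Any single fragment can be lost without losing data. Codes with larger n and m follow the same accounting: for example, `storage_overhead(16, 32)` is 2.0 while tolerating the loss of any 16 fragments.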

Deep Archival Storage - Update

Deep Archival Storage - Self Verifying Data

Hash tree over the encoded fragments (as drawn in the figure):

Encoded fragments: F1 F2 F3 F4
Fragment hashes: H1 H2 H3 H4
Pair hashes: H12 H34
Tree hash: H14
Hash of the data: Hd
Root: B-GUID, above H14 and Hd

Contents of each stored item:

Fragment 1: H2, H34, Hd, F1 (fragment data)
Fragment 2: H1, H34, Hd, F2 (fragment data)
Fragment 3: H4, H12, Hd, F3 (fragment data)
Fragment 4: H3, H12, Hd, F4 (fragment data)
Data: H14, data
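The fragment layout above is enough for a receiver to check a fragment against the B-GUID without contacting anyone else. The sketch below rebuilds the path from a fragment to the root. It assumes SHA-1, plain concatenation of child hashes in left-to-right order, and a B-GUID computed over H14 followed by Hd; the function names and the packet layout are illustrative.

```python
import hashlib


def h(*parts):
    """SHA-1 over the concatenation of the given byte strings."""
    return hashlib.sha1(b"".join(parts)).digest()


def build(fragments, data):
    """Return (bguid, packets) for exactly four fragments, following the figure."""
    f1, f2, f3, f4 = fragments
    h1, h2, h3, h4 = (h(f) for f in fragments)
    h12, h34 = h(h1, h2), h(h3, h4)
    h14 = h(h12, h34)
    hd = h(data)
    bguid = h(h14, hd)
    # Each packet lists sibling hashes bottom-up, flagged True when the sibling
    # sits to the left, followed by Hd and the fragment data.
    packets = [
        ([(h2, False), (h34, False)], hd, f1),
        ([(h1, True), (h34, False)], hd, f2),
        ([(h4, False), (h12, True)], hd, f3),
        ([(h3, True), (h12, True)], hd, f4),
    ]
    return bguid, packets


def verify(bguid, packet):
    """Recompute the path from the fragment to the root and compare with the B-GUID."""
    siblings, hd, fragment = packet
    node = h(fragment)
    for sibling, sibling_is_left in siblings:
        node = h(sibling, node) if sibling_is_left else h(node, sibling)
    return h(node, hd) == bguid
```

A corrupted or substituted fragment changes the recomputed root, so a client can discard it and fetch another fragment instead of trusting the server that supplied it.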

Introspection

• Monitoring and adaptation of the routing substrate
– Optimization of the Plaxton mesh
– Adaptation of the second-tier multicast tree

• Continuous monitoring of access patterns
– Clustering algorithms to discover object relationships
   • Clustered prefetching: demand-fetching related objects
   • Proactive prefetching: get data there before it is needed
– Time-series analysis of user and data motion

• Continuous testing and repair of information
– Slow sweep through all information to make sure there are sufficient erasure-coded fragments
– Continuously reevaluate risk and redistribute data
– Diagnosis and repair of routing and location infrastructure

Conclusions

• OceanStore: everyone’s data, one big utility
– Global utility model for persistent data storage

• OceanStore properties:
– Provides security, privacy, and integrity
– Provides extreme durability
– Lower maintenance cost through continuous adaptation, self-diagnosis, and repair
– Large-scale system has good statistical properties

OceanStore vs OSD

Differences:
- OceanStore: persistent storage infrastructure, untrusted infrastructure, passive data objects (objects cannot be too active over ciphertext)
- OSD: active/dynamic objects, trust model

Common issues:
- data security (privacy, integrity, reliability)
- authentication and authorization
- naming and routing
- data consistency
- caching
- maintenance-free operation
- applications