Google - Bigtable
Transcript of Google - Bigtable
1
Bigtable: A Distributed Storage System for Structured Data
Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber
Google, Inc.
2
Index
Introduction
Data Model
API
Building Blocks
Implementation
Refinements
Real Applications
Conclusions
3
Introduction
1. Motivation
2. What is a Bigtable?
3. Why not a DBMS?
4
Introduction : Motivation
Lots of structured data at Google
◦Web pages, geographic information, user data, mail
Millions of machines
Different projects/applications
5
Introduction : Why not a DBMS?
A DBMS provides more than Google needs
Google required a database with wide scalability, wide applicability, high performance, and high availability
Low-level storage optimizations help performance significantly
Cost would be very high
◦Most DBMSs require very expensive infrastructure
6
Introduction : What is a Bigtable?
Bigtable is a distributed storage system for managing structured data
Achieved several goals
◦Wide applicability, scalability, high performance
Scalable
◦Terabytes of in-memory data
◦Petabytes of disk-based data
◦Millions of reads/writes per second, efficient scans
Self-managing
◦Servers can be added/removed dynamically
◦Servers adjust to load imbalance
7
Data Model
1. Row
2. Column families
3. Timestamps
8
Data Model : Row
The row keys in a table are arbitrary strings
Data is maintained in lexicographic order by row key
A row range is called a “tablet”, which is the unit of distribution and load balancing
Rows within a tablet are sorted by row key
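The row ordering above can be illustrated with a minimal sketch. It assumes an in-memory std::map as a stand-in for a table; the names Rows and ScanRowRange, and the sample row keys used with them, are hypothetical and not part of Bigtable.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for one table. std::map keeps its keys in
// lexicographic (byte-wise) order, which mirrors how row keys are ordered.
using Rows = std::map<std::string, std::string>;

// Returns the row keys in the half-open range [start, end), in sorted order.
// A tablet corresponds to one such contiguous row range.
std::vector<std::string> ScanRowRange(const Rows& rows,
                                      const std::string& start,
                                      const std::string& end) {
  assert(start <= end);
  std::vector<std::string> keys;
  for (auto it = rows.lower_bound(start);
       it != rows.end() && it->first < end; ++it) {
    keys.push_back(it->first);
  }
  return keys;
}
```

Because the keys are sorted, rows that share a prefix (for example reversed host names such as com.cnn.www) sit next to each other, so a short range scan touches only a few tablets.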
9
Data Model : Column Families
Column keys are grouped into sets called “column families”
A column family is the basic unit of access control
A column key is named using the syntax “family:qualifier”
Access control and disk/memory accounting are performed at the column-family level
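As a small illustration of the “family:qualifier” syntax, the sketch below splits a column key at its first colon. ColumnKey and ParseColumnKey are hypothetical names, and the rule of rejecting keys with an empty family is an assumption made for this example.

```cpp
#include <cassert>
#include <string>

// Hypothetical holder for the two parts of a column key.
struct ColumnKey {
  std::string family;
  std::string qualifier;
};

// Splits "family:qualifier" at the first ':'. Returns false when the key has
// no ':' or an empty family. The qualifier is allowed to be empty.
bool ParseColumnKey(const std::string& key, ColumnKey* out) {
  assert(out != nullptr);
  const std::string::size_type colon = key.find(':');
  if (colon == std::string::npos || colon == 0) return false;
  out->family = key.substr(0, colon);
  out->qualifier = key.substr(colon + 1);
  return true;
}
```

Access checks and accounting would then be keyed on the family part alone, which is why the family is the unit the slide singles out.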
10
Data Model : Timestamps
Each cell in a Bigtable can contain multiple versions of the same data
Versions are sorted in decreasing timestamp order
Timestamps are 64-bit integers
◦Real time in microseconds, or assigned by the client application
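A versioned cell can be sketched as a map from timestamp to value that is ordered newest first. This is only an illustration under that assumption; the Cell class and its method names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <string>

// One cell holding several versions. std::greater makes the map iterate in
// decreasing timestamp order, so the newest version comes first.
class Cell {
 public:
  void Put(std::int64_t timestamp_micros, const std::string& value) {
    versions_[timestamp_micros] = value;
  }

  // Most recent version. The cell must not be empty.
  const std::string& Latest() const {
    assert(!versions_.empty());
    return versions_.begin()->second;
  }

  // Newest version whose timestamp is <= t. Returns false if there is none.
  bool ReadAsOf(std::int64_t t, std::string* out) const {
    auto it = versions_.lower_bound(t);  // first key k with k <= t
    if (it == versions_.end()) return false;
    *out = it->second;
    return true;
  }

  std::size_t VersionCount() const { return versions_.size(); }

 private:
  std::map<std::int64_t, std::string, std::greater<std::int64_t>> versions_;
};
```

Keeping the newest version at the front means the common read of the current value needs no search.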
11
Data Model : Example
[Figure: example table with its rows, columns, column families, and timestamps labeled]
12
API
The Bigtable API provides functions to
◦Create/delete tables and column families
◦Change table and column family metadata
◦Look up values from individual rows
◦Iterate over a subset of the data
Supports single-row transactions
Can be used with MapReduce (HBase)
13
API : Example
Uses a Scanner to iterate over all anchors in a particular row
Table *T = OpenOrDie("/bigtable/web/webtable");
14
Building Blocks
Uses the distributed Google File System (GFS) to store log and data files
A Bigtable cluster typically operates in a shared pool of machines
Depends on a cluster management system
The Google SSTable file format is used internally to store Bigtable data
Relies on a highly available and persistent distributed lock service called Chubby
15
Building Blocks : GFS & SSTable & Chubby
Google File System:
◦Grew out of an earlier Google effort, “BigFiles”
◦Selected for its high data throughput
16
Building Blocks : GFS & SSTable & Chubby
SSTable:
◦Provides a persistent, ordered map from keys to values
◦Contains a sequence of blocks, located through a block index
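The idea of an ordered map stored as blocks plus an index can be sketched as follows. This is a simplified in-memory model, not the real SSTable file format: the class name SSTableSketch, the fixed entries-per-block layout, and the choice of indexing each block by its last key are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using Entry = std::pair<std::string, std::string>;  // key, value
using Block = std::vector<Entry>;                   // entries sorted by key

// Simplified, immutable table: sorted blocks plus an index that records the
// last key of every block.
class SSTableSketch {
 public:
  // sorted_entries must be sorted by key with unique keys; block_size > 0.
  SSTableSketch(const std::vector<Entry>& sorted_entries,
                std::size_t block_size) {
    assert(block_size > 0);
    assert(std::is_sorted(sorted_entries.begin(), sorted_entries.end()));
    for (std::size_t i = 0; i < sorted_entries.size(); i += block_size) {
      const std::size_t end = std::min(i + block_size, sorted_entries.size());
      blocks_.emplace_back(sorted_entries.begin() + i,
                           sorted_entries.begin() + end);
      index_.push_back(blocks_.back().back().first);
    }
  }

  // Binary-searches the index for the only block that could hold the key,
  // then searches inside that block.
  bool Get(const std::string& key, std::string* value) const {
    auto idx = std::lower_bound(index_.begin(), index_.end(), key);
    if (idx == index_.end()) return false;
    const Block& block = blocks_[idx - index_.begin()];
    auto it = std::lower_bound(
        block.begin(), block.end(), key,
        [](const Entry& e, const std::string& k) { return e.first < k; });
    if (it == block.end() || it->first != key) return false;
    *value = it->second;
    return true;
  }

  std::size_t BlockCount() const { return blocks_.size(); }

 private:
  std::vector<Block> blocks_;
  std::vector<std::string> index_;  // last key of each block
};
```

With the index held in memory, a lookup needs to touch only one block, which is the property that makes a single disk read per lookup plausible.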
17
Building Blocks : GFS & SSTable & Chubby
Chubby is used to:
◦ensure that there is at most one active master at any time
◦store the bootstrap location of Bigtable data
◦discover tablet servers and finalize tablet server deaths
◦store Bigtable schema information (the column family information for each table)
18
Implementation
1. Tablet Location
2. Tablet Assignment
3. Tablet Serving
19
Implementation
Three major components
◦A library that is linked into every client
◦One master server
◦Many tablet servers
20
Implementation : Tablet Location
Uses a three-level hierarchy, analogous to that of a B+-tree, to store tablet location information (at most three levels)
The first level is a file stored in Chubby that contains the location of the root tablet
21
Implementation : Tablet Location
Root tablet
◦First tablet in the METADATA table
◦Never split, to ensure that the tablet location hierarchy has no more than three levels
METADATA tablet
◦Stores the location of a tablet under a row key that is an encoding of the tablet’s table identifier and its end row
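Keying each tablet by its table identifier and end row lets a single ordered lookup find the tablet that covers a given row. The sketch below models only that lookup in memory. The names Metadata, MetadataKey, and LocateTablet, the kMaxRow sentinel for a table's last tablet, and the server names in the usage are assumptions for illustration, not the actual METADATA encoding.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Key: (table identifier, end row of the tablet).
// Value: hypothetical address of the tablet server holding that tablet.
using MetadataKey = std::pair<std::string, std::string>;
using Metadata = std::map<MetadataKey, std::string>;

// Assumption for this sketch: the last tablet of a table uses this sentinel
// as its end row so that it sorts after every ordinary row key.
const char kMaxRow[] = "\xff";

// Finds the tablet whose row range contains `row`: the first entry of the
// same table whose end row is >= row.
bool LocateTablet(const Metadata& metadata, const std::string& table,
                  const std::string& row, std::string* server) {
  assert(server != nullptr);
  auto it = metadata.lower_bound(MetadataKey(table, row));
  if (it == metadata.end() || it->first.first != table) return false;
  *server = it->second;
  return true;
}
```

Because the end row is part of the key, no start row has to be stored: each tablet implicitly begins just after the previous tablet's end row.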
22
Implementation : Tablet Assignment
Master server
◦Assigns tablets to tablet servers
◦Detects the addition and expiration of tablet servers
◦Balances tablet-server load
◦Handles schema changes such as table and column family creations
Tablet server
◦Manages a set of tablets (ten to a thousand tablets per tablet server)
◦Handles read/write requests to the tablets
◦Splits tablets that have grown too large
23
Implementation : Tablet Serving
Updates are committed to a commit log that stores redo records
Recently committed updates are stored in memory in a memtable
Older updates are stored in a sequence of SSTables
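The write and read paths above can be sketched with in-memory containers. This is a simplified model: TabletSketch, its method names, and the use of std::map for both the memtable and the frozen SSTables are assumptions, and real redo records, persistence, and recovery are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

using SortedMap = std::map<std::string, std::string>;

class TabletSketch {
 public:
  // Write path: append a redo record to the commit log first, then apply
  // the update to the in-memory memtable.
  void Write(const std::string& key, const std::string& value) {
    commit_log_.push_back(std::make_pair(key, value));
    memtable_[key] = value;
  }

  // Freezes the current memtable into a new immutable SSTable stand-in and
  // starts an empty memtable.
  void FlushMemtable() {
    if (memtable_.empty()) return;
    sstables_.push_back(memtable_);
    memtable_.clear();
  }

  // Read path: the memtable holds the newest data, so check it first, then
  // the SSTables from newest to oldest.
  bool Read(const std::string& key, std::string* value) const {
    assert(value != nullptr);
    auto hit = memtable_.find(key);
    if (hit != memtable_.end()) {
      *value = hit->second;
      return true;
    }
    for (auto table = sstables_.rbegin(); table != sstables_.rend(); ++table) {
      auto it = table->find(key);
      if (it != table->end()) {
        *value = it->second;
        return true;
      }
    }
    return false;
  }

  std::size_t LogSize() const { return commit_log_.size(); }

 private:
  std::vector<std::pair<std::string, std::string>> commit_log_;  // redo records
  SortedMap memtable_;                                           // recent updates
  std::vector<SortedMap> sstables_;                              // oldest first
};
```

Searching from newest to oldest is what lets a later write shadow an earlier one without rewriting the older SSTable.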
24
Refinements
1. Locality groups
2. Compression
3. Caching for read performance
4. Bloom filters
5. Commit-log implementation
25
Refinements
Locality groups
◦Clients can group multiple column families together into a locality group
Compression
◦Small portions of an SSTable can be read without decompressing the entire file
◦Encodes at 100-200 MB/s
◦Decodes at 400-1000 MB/s
◦10-to-1 reduction in space
26
Refinements
Caching for read performance
◦Tablet servers use two levels of caching: the Scan Cache and the Block Cache
Bloom filters
◦Clients can request that Bloom filters be created for the SSTables in a particular locality group
Commit-log implementation
◦Mutations for different tablets are co-mingled in the same physical log file
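A Bloom filter answers "might this SSTable contain the key?" without reading the SSTable, so a negative answer saves a disk access. The sketch below is a generic Bloom filter, not Bigtable's implementation. The class name, the bit and hash counts in the usage, and the double-hashing scheme built on std::hash are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

class BloomFilterSketch {
 public:
  BloomFilterSketch(std::size_t num_bits, std::size_t num_hashes)
      : bits_(num_bits, false), num_hashes_(num_hashes) {
    assert(num_bits > 0 && num_hashes > 0);
  }

  void Add(const std::string& key) {
    for (std::size_t i = 0; i < num_hashes_; ++i) bits_[Index(key, i)] = true;
  }

  // false: the key was definitely never added.
  // true: the key may have been added (false positives are possible).
  bool MayContain(const std::string& key) const {
    for (std::size_t i = 0; i < num_hashes_; ++i) {
      if (!bits_[Index(key, i)]) return false;
    }
    return true;
  }

 private:
  // Derives the i-th bit position from two base hashes (double hashing).
  std::size_t Index(const std::string& key, std::size_t i) const {
    const std::size_t h1 = std::hash<std::string>()(key);
    const std::size_t h2 = std::hash<std::string>()(key + "#salt") | 1;
    return (h1 + i * h2) % bits_.size();
  }

  std::vector<bool> bits_;
  std::size_t num_hashes_;
};
```

The filter never reports a stored key as absent, so using it to skip SSTables cannot lose data. It only occasionally causes an unnecessary read.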
27
Real Applications
1. Google Analytics
2. Personalized Search
28
Real Applications
Google Analytics
◦Uses two tables: the raw click table (~200 TB) and the summary table (~20 TB)
◦Uses MapReduce
Personalized Search
◦Stores user history
◦Uses MapReduce
29
Conclusions
Bigtable clusters have been in production use at Google since April 2005
Provides high performance and high availability
Google found significant advantages in building its own storage solution
Apache HBase is based on Bigtable
30
Thank you!