Cloud Storage That Achieves Both Read and Write Performance (OS-117-24)

  • Date posted

    15-Jan-2015
  • Category

    Technology

description

Presentation slides from the April OS/ARC workshop, 2011/04/14.

Transcript of Cloud Storage That Achieves Both Read and Write Performance (OS-117-24)

  • 1. 11.4.14 - mycassandra - 1

  • 2. NoSQL, Key-Value Store (KVS), Document-oriented DB, GraphDB: memcached, Google Bigtable, Amazon Dynamo, Amazon SimpleDB, Apache Cassandra, Voldemort, Ringo, Vpork, MongoDB, CouchDB, Tokyo Cabinet/Tokyo Tyrant, Flare, ROMA, kumofs, Kai, Redis, Hadoop HBase, Hypertable, Yahoo! PNUTS, Scalaris, Dynomite, ThruDB, Neo4j, IBM ObjectGrid, Oracle Coherence, Velocity; (join, transaction)

  • 3-4. key/value vs. multi-dimensional map vs. document vs. graph; strong vs. weak; row vs. column; master/slave vs. decentralized

  • 5. write/read: Bigtable, Cassandra, HBase: Log-Structured Merge Tree [P. O'Neil 96]; MySQL, Sherpa: B+-Tree [R. Bayer 72]

  • 6. [Figures: Write-Heavy and Read-Heavy workloads, write-optimized vs. read-optimized systems. Source: Yahoo! Cloud Serving Benchmark, SOCC 10]

  • 7. 1. MyCassandra; 2. MyCassandra Cluster; read-optimized, read/write-optimized, write-optimized

  • 8. Apache Cassandra: N = 3, node IDs on a Consistent Hashing ring. [Figure: ring with nodes A, F, N, Q, V, Z; a request arrives at a proxy node; hash(key) = Q selects the primary node, with secondary 1 and secondary 2 as secondary nodes; key, values]

  • 9. Google Bigtable: sequential write I/O, always writable, write-lock. [Figure: Cassandra write: Memtable (map, Memory), Commit Log and SSTable (Disk), async]

  • 10. Google Bigtable: key, Memtable, value, SSTable, value, I/O. [Figure: Cassandra read: Memtable (Map, Memory), Commit Log and SSTable (Disk, I/O)]

  • 11. 1. MyCassandra: read-optimized, write-optimized

  • 12. MyCassandra: [Figure: Cassandra (Consistent Hashing, Gossip Protocol) with the storage engines Bigtable, MySQL (InnoDB, MyISAM, Memory), Redis]

  • 13. MyCassandra: Cassandra; JDBC API / stored procedure; key-value store; MyCassandra node; 6

  • 14. 2. MyCassandra Cluster: read/write-optimized

  • 15. sync, async => Quorum Protocol: (replicas written) + (replicas read) > (number of replicas) => mem

  • 16. W, R, RW: MyCassandra nodes that are write-optimized (W), read-optimized (R), or read/write-optimized (RW); gossip protocol; 1. (key); 2. N-1, N = 3. [Figure: Consistent Hashing ring of node IDs labeled R, RW, W, with gossip]

  • 17. host, node, storage: options (1), (2), (3); IDs [Amazon Dynamo, SOSP 07]. [Figure: Fault Tolerance (FT) vs. space for (1), (2), (3); labels: 1 storage / 1 node / 1 host, virtual node, 1 node / host, k nodes / host, k storages / node, 1 storage / node]

  • 18. Write: replicas = 3, quorum = 2, W:RW:R = 1:1:1. [Figure: Client; 1) Proxy; 2) W, RW, ACK, ACK; 3a) W; 3b) R, RW, R, ACK]. Latency: max(W, RW)

  • 19. Read: replicas = 3, quorum = 2, W:RW:R = 1:1:1. [Figure: Client, Proxy; 1); 2) R, RW; 3a); 3b) or W: W, RW, R; 4) Proxy (Cassandra read repair)]. Latency: max(R, RW)

  • 20. MyCassandra Cluster: 6 × 3 = 18 nodes / 6 (W:R:RW = 6:6:6); Cassandra: 6 / 6; replicas = 3, write quorum = read quorum = 2; storage engines: Bigtable (W), MySQL / InnoDB (R), Redis (RW); benchmark: YCSB (Yahoo! Cloud Serving Benchmark) [SOCC 10]. Steps: 1. MyCassandra/Cassandra: 6, YCSB Client: 1; 2. 1KB values (100 [Bytes] × 10 [columns]) + key, 1,000; 3.; 4. YCSB; 5. YCSB Stat

  • 21. YCSB: 4 workloads (columns: Workload, Application Example, Operation Ratio, Record Selection)
    Write-Only: Log; Read: 0%, Write: 100%; Zipfian
    Write-Heavy: Session Store; Read: 50%, Write: 50%; Zipfian
    Read-Heavy: Photo tagging; Read: 95%, Write: 5%; Zipfian
    Read-Only: Cache; Read: 100%, Write: 0%; Zipfian

  • 22. [Figure: avg. write latency (ms) and avg. read latency (ms) for the Write-Only, Write-Heavy, Read-Heavy, and Read-Only workloads; legend: Cassandra, MyCassandra Cluster; label: MySQL + Redis. Annotations: 11.5~23.5% (avg. write latency); 88.5%, 85.2%, 88.5%, 49.7% (avg. read latency)]

  • 23. [Figure: max. qps for 40 clients (query/sec), Cassandra vs. MyCassandra Cluster, for [write:read] = [100:0], [50:50], [5:95], [0:100] (Write-Only, Write-Heavy, Read-Heavy, Read-Only). Annotations: 0.99, 6.53, 0.62, 1.49; Read-Heavy: 6.53]

  • 24. (1): HDD vs. SSD. [Figure: throughput (qps) of Cassandra and MyCassandra Cluster on HDD and SSD]. HDD: Western Digital; SSD: Crucial. IOZone benchmark (HDD / SSD): sequential write 86,277 / 96,401 qps; sequential read 108,914 / 216,099 qps; random write 2,485 / 29,045 qps; random read 926 / 21,751 qps

  • 25. Read-Heavy: 88.5%, 6.53; Write-Heavy: Cassandra

  • 26. (1/2) Write-Heavy; MySQL; write-optimized, write-heavy. [Figure: Cassandra vs. MyCassandra Cluster: write latency, read latency, throughput]

  • 27. (2/2) Amazon EC2; 1/N

  • 28. FD-Tree: Tree Indexing on Flash Disks, VLDB 10: B+tree + LSM-tree, SSD; MySQL: RDBMS; Anvil, SOSP 09; Cloudy, VLDB 10; Dynamo, SOSP 07; vs. MyCassandra

  • 29. Cassandra vs. 1. MyCassandra vs. 2. MyCassandra Cluster
    data model: multi-dimensional map (Column Family)
    throughput: write / write or read / write and read
    latency: low / lower in case / lower
    persistence: yes / yes or no (memory) / yes
    consistency: weak (eventual, quorum)
    replication: sync / async
    data partition: row
    node organization: decentralized
    throughput, latency

  • 30. 1); 2) MySQL + memcached: MyCassandra Cluster; Read-Heavy, Write-Heavy
    Table columns: movie-id, name, thumb-name, tag, count
    704122313, movieA, EY37lHk5bgU, (sport, soccer, FIFA, ...), 169,374
    704122314, movieB, Zk3BSYMWjzQ, (music, jazz, ...), 472,803
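
Slides 15, 18, and 19 pair the Quorum Protocol with the latency expressions max(W, RW) for writes and max(R, RW) for reads. The following is a minimal Python sketch of that arithmetic, not code from MyCassandra. It assumes that "=3, =2" on the slides means 3 replicas and a quorum of 2, and that each key has one replica on each node type. The function names and the per-node latencies are hypothetical values chosen only for illustration.

```python
def quorum_latency(latencies, quorum):
    """Return the time at which `quorum` replicas have answered.

    `latencies` maps a replica label to its response time in ms.
    The request completes when the quorum-th fastest replica answers.
    """
    if not 1 <= quorum <= len(latencies):
        raise ValueError("quorum must be between 1 and the number of replicas")
    return sorted(latencies.values())[quorum - 1]


def quorums_overlap(write_quorum, read_quorum, replicas):
    """Quorum condition: every read set shares a replica with every write set."""
    return write_quorum + read_quorum > replicas


# Hypothetical per-node latencies in ms (assumed, not measurements from the slides).
# W = write-optimized, R = read-optimized, RW = read/write-optimized node.
write_ms = {"W": 0.3, "RW": 0.4, "R": 5.0}
read_ms = {"W": 8.0, "RW": 0.5, "R": 1.0}

# With 3 replicas and a quorum of 2, a write finishes once the two fastest
# writers (W and RW here) have acknowledged; a read finishes once the two
# fastest readers (R and RW here) have answered.
write_done = quorum_latency(write_ms, quorum=2)
read_done = quorum_latency(read_ms, quorum=2)

print(f"write latency: {write_done} ms, read latency: {read_done} ms")
print("quorum condition holds:", quorums_overlap(2, 2, 3))
```

Under these assumed latencies the slow replica in each direction (R for writes, W for reads) is never on the critical path, which is one way to read the max(W, RW) and max(R, RW) labels on slides 18 and 19.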