Presented by Nanditha Thinderu
cis.csuohio.edu/~sschung/cis611/Nandithacis611termpaper.pdf
• Enterprise systems are highly distributed and heterogeneous, which makes administration a complex task.
• Application Performance Management (APM) tools were developed to retrieve information about failure rates and resource utilization.
• An APM platform must monitor big data with a tight resource budget and fast response times.
• APM refers to monitoring and managing enterprise software systems.
• The two approaches are:
  • Black-box approach
  • API-based approach
• By capturing every method invocation in an enterprise system, APM tools can generate a vast amount of data.
• APM data consists of a metric name, a value, and a timestamp.
• In a storage system, the queries fall into two major types:
  • Single-value lookups to retrieve the most current value
  • Small scans for retrieving system health information
APM record schema: Metric Name, Value, Min, Max, Timestamp, Duration
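As a sketch of this record shape and the two query types above, assuming a simple sorted in-memory layout (all names here are illustrative, not the paper's code):

```python
from bisect import bisect_left, insort
from collections import namedtuple

# Hypothetical record mirroring the schema above (field names are illustrative).
Metric = namedtuple("Metric", "name value min max timestamp duration")

class MetricStore:
    """Toy in-memory store keyed by (metric name, timestamp)."""

    def __init__(self):
        self._rows = []  # kept sorted by (name, timestamp)

    def insert(self, m):
        insort(self._rows, ((m.name, m.timestamp), m))

    def latest(self, name):
        # Single-value lookup: most recent value for a metric.
        i = bisect_left(self._rows, ((name, float("inf")),))
        if i and self._rows[i - 1][0][0] == name:
            return self._rows[i - 1][1]
        return None

    def scan(self, name, start, end):
        # Small scan: all data points for `name` with timestamp in [start, end].
        i = bisect_left(self._rows, ((name, start),))
        out = []
        while i < len(self._rows) and self._rows[i][0] <= (name, end):
            out.append(self._rows[i][1])
            i += 1
        return out
```

Keeping the rows sorted by (name, timestamp) is what makes both query types cheap, which is why the benchmarked stores that support range scans organize keys similarly.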
• The Yahoo! Cloud Serving Benchmark (YCSB) is designed for the evaluation of key-value stores using APM properties.
• We define five workloads (R, RW, W, RS, RSW), as APM data is append-only.
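The five workload mixes can be sketched as weighted operation choices; the R, RW, W, and RS ratios below follow these slides, while the exact RSW split is inferred and should be treated as an assumption:

```python
import random

# Operation mix per workload. R, RW, W, RS ratios appear in the slides;
# RSW's exact read/scan split is an illustrative assumption.
WORKLOADS = {
    "R":   {"read": 0.95, "write": 0.05},
    "RW":  {"read": 0.50, "write": 0.50},
    "W":   {"read": 0.01, "write": 0.99},
    "RS":  {"read": 0.47, "scan": 0.47, "write": 0.06},
    "RSW": {"read": 0.25, "scan": 0.25, "write": 0.50},
}

def next_op(workload, rng=random):
    """Draw the next operation type according to the workload's mix."""
    ops, weights = zip(*WORKLOADS[workload].items())
    return rng.choices(ops, weights=weights)[0]
```

A workload generator like this drives each store's client with the chosen operation, which is essentially what YCSB's core workload executor does.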
• It comprises a data generator, a workload generator, and drivers for several key-value stores.
• The goal was not only a pure performance comparison but also a broad overview of the available solutions.
• The data stores used can be classified into three categories:
  • Key-value stores: Project Voldemort and Redis
  • Extensible record stores: HBase and Cassandra
  • Scalable relational stores: MySQL Cluster and VoltDB
• We used HBase v0.90.4 running on top of Hadoop v0.20.205.0.
• Since HBase uses HDFS, it also requires the installation and configuration of Hadoop.
• Tables in HBase can be accessed through its API.
• We used the recent 1.0.0-rc2 version and the default RandomPartitioner, which distributes the data across the nodes randomly.
• We implemented a Cassandra YCSB client, which requires just one column family to store all fields, each field corresponding to a column.
• It is a decentralized system and employs consistent hashing for distributing the values across the nodes.
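Consistent hashing, as used by Cassandra (and Voldemort), can be sketched in a few lines; the hash function and virtual-node count below are illustrative choices, not the systems' actual parameters:

```python
import hashlib
from bisect import bisect

def _hash(key):
    # Any stable hash works for the sketch; MD5 is used here for simplicity.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    """Minimal consistent-hash ring: each node owns several points ("virtual
    nodes") on a circle, and a key belongs to the next node clockwise."""

    def __init__(self, nodes, vnodes=64):
        self._ring = sorted((_hash(f"{n}#{i}"), n)
                            for n in nodes for i in range(vnodes))
        self._points = [h for h, _ in self._ring]

    def node_for(self, key):
        # Successor point on the circle, wrapping around at the end.
        i = bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[i][1]
```

The useful property is that removing a node only remaps the keys that node owned; keys on the surviving nodes stay put, which is what keeps rebalancing cheap.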
• We used 0.90.1 with the embedded BerkeleyDB storage; since a Voldemort driver was already implemented, configuration was easy for the most part.
• It is a highly scalable storage system with a simpler design compared to relational databases.
• We used version 2.4.2, as the cluster version was in an unstable state and could not run a complete test.
• We updated the default Redis YCSB client to use a SharedJedisPool.
• For data storage, the client uses a hash map as well as a sorted set.
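The hash-plus-sorted-set layout can be emulated in plain Python to show why both structures are needed: the hash serves point lookups, the sorted set serves scans. A real client would use Redis commands such as HSET/HGETALL and ZADD/ZRANGEBYSCORE; the class below is an illustrative stand-in:

```python
from bisect import bisect_left, insort

class RedisLikeStore:
    """Emulation of the Redis client's layout: one hash per record for
    single-key reads, plus one sorted index of (score, key) for scans."""

    def __init__(self):
        self._hashes = {}   # key -> field dict   (Redis hash)
        self._index = []    # sorted (score, key) (Redis sorted set)

    def insert(self, key, fields, score):
        self._hashes[key] = dict(fields)
        insort(self._index, (score, key))

    def read(self, key):
        # Point lookup: hash access only.
        return self._hashes.get(key)

    def scan(self, start_score, count):
        # Range scan: walk the sorted index, then fetch each record.
        i = bisect_left(self._index, (start_score,))
        return [self._hashes[k] for _, k in self._index[i:i + count]]
```

Without the sorted index, scans would require touching every key, which is why the client maintains both structures on every insert.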
• We used VoltDB v2.1.3 and the default configuration.
• A YCSB client driver for VoltDB that connects to all servers was implemented.
• We used MySQL v5.5.17 and InnoDB as the storage engine.
• An RDBMS YCSB client was implemented that connects to the databases using JDBC.
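A minimal relational YCSB-style client might look like the sketch below, with Python's sqlite3 standing in for a JDBC connection. The `usertable`/`field0` names follow YCSB's conventions, but the class itself is hypothetical:

```python
import sqlite3

class SqlClient:
    """Relational stand-in for the JDBC client: the same three operations
    (insert, read, scan) expressed as SQL statements."""

    def __init__(self, conn):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS usertable "
            "(ycsb_key TEXT PRIMARY KEY, field0 TEXT)")

    def insert(self, key, value):
        self.conn.execute(
            "INSERT OR REPLACE INTO usertable VALUES (?, ?)", (key, value))

    def read(self, key):
        row = self.conn.execute(
            "SELECT field0 FROM usertable WHERE ycsb_key = ?", (key,)).fetchone()
        return row[0] if row else None

    def scan(self, start_key, count):
        rows = self.conn.execute(
            "SELECT field0 FROM usertable WHERE ycsb_key >= ? "
            "ORDER BY ycsb_key LIMIT ?", (start_key, count)).fetchall()
        return [r[0] for r in rows]
```

The primary-key index gives the relational stores the same cheap point lookups and ordered scans that the key-value stores provide natively.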
• Workload R is the most read-intensive, with 95% reads and only 5% writes. We present latencies and throughput using a logarithmic scale.
• Redis has the highest throughput.
• HBase has the highest read latency.
• Cassandra has the highest write latency.
• In the second experiment, workload RW is used, which has 50% writes.
• VoltDB achieves the highest throughput for one node, which is slightly lower compared to workload R.
• In write latency, HBase and MySQL show important differences compared to workload R.
• Workload W is the one closest to the APM use case.
• It has a 99% write rate.
• The throughput results are similar to workload RW.
• For read latency, the apparent change is the high latency of HBase.
• HBase's write latency has also increased significantly.
• Workload RS has 47% read, 47% scan, and 6% write operations.
• MySQL has the best throughput for a single node.
• Cassandra and HBase obtain a linear increase in throughput with the number of nodes.
• This workload has 50% writes; the other 50% are reads, half of which are scans.
• Most of the results are similar to workload RS.
• In this experiment we used 8 nodes of each system.
• The results are calculated for workload R.
• We observe varying latencies for the different key-value stores.
• The write latencies show a similar development for Cassandra, Voldemort, and Redis.
• The most storage-efficient system is HBase.
• Redis and VoltDB are omitted, as they do not store data on disk.
• Cassandra also stores the data efficiently.
• The disk usage can be reduced by compression.
• A series of tests was conducted on cluster D.
• The throughput increases for all systems with higher read ratios.
• Project Voldemort has the best read latency.
• HBase has a low write latency and is best for workload RW.
• Cassandra: it achieves the highest throughput for the maximum number of nodes, and its performance is best for high write rates.
• HBase: its throughput is lowest for one node but increases linearly with the number of nodes. It has low write latency; however, its read latency is much higher than that of the other systems.
• Project Voldemort: at low loads, its read and write latencies are similar and stable.
• MySQL: it achieved high throughput for a single node; however, its performance does not scale with the number of nodes.
• Redis: it has high throughput, exceeding all other systems for read-intensive workloads, but its latencies degrade for both read and write operations at scale.
• VoltDB: performance is high for a single instance, but it never achieved a throughput increase with more than one node.
• We optimized each system for our workload and tested it with a number of open connections four times higher than the number of cores in the host CPUs.
• Higher numbers of connections led to congestion and slowed the systems down considerably, while lower numbers did not fully utilize the systems.
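The connection-sizing rule above can be written as a small helper; the parameter names are illustrative:

```python
import os

def connection_budget(cores_per_host, hosts=1, factor=4):
    """Open-connection count as used in the benchmark: 4x the total core
    count (factor and the host/core parameters are just for illustration)."""
    return factor * cores_per_host * hosts

# e.g. sizing against the local machine's core count:
local_budget = connection_budget(os.cpu_count() or 1)
```

Going above this budget congests the servers; going below it leaves them underutilized, so the factor is effectively a saturation tuning knob.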
• This configuration resulted in an average request-processing latency that was much higher than in previously published performance measurements.
• Since our use case does not have the strict latency requirements common in online applications and similar environments, the latencies in most results are still adequate.