AN EVALUATION OF KEY-VALUE STORES IN
SCIENTIFIC APPLICATIONS
A Thesis Presented to
the Faculty of the Department of Computer Science
University of Houston
In Partial Fulfillment
of the Requirements for the Degree
Master of Science
By
Sonia Shirwadkar
May 2017
AN EVALUATION OF KEY-VALUE STORES IN
SCIENTIFIC APPLICATIONS
Sonia Shirwadkar
APPROVED:
Dr. Edgar Gabriel, Chairman
Dept. of Computer Science, University of Houston
Dr. Weidong Shi
Dept. of Computer Science, University of Houston
Dr. Dan Price
Honors College, University of Houston
Dean, College of Natural Sciences and Mathematics
Acknowledgments
“No one who achieves success does so without the help of others. The wise acknowledge
this help with gratitude.” - Alfred North Whitehead
Although I have a long way to go before I am wise, I would like to take this opportunity
to express my deepest gratitude to all the people who have helped me in this journey.
First and foremost, I would like to thank Dr. Gabriel for being a great advisor. I
appreciate the time, effort and ideas that you have invested to make my graduate experience
productive and stimulating. The joy and enthusiasm you have for research was contagious
and motivational for me, even during tough times. You have been an inspiring teacher and
mentor and I would like to thank you for the patience, kindness and humor that you have
shown. Thank you for guiding me at every step and for the incredible understanding you
showed when I came to you with my questions. It has indeed been a privilege working with
you.
I would like to thank Dr. Shi and Dr. Price for agreeing to serve as my committee members.
I truly appreciate the time and effort you spent in reviewing my thesis and providing
valuable feedback.
A special thanks to my PSTL lab-mates Shweta, Youcef, Tanvir, and Raafat. You have
contributed immensely to my personal and professional time at the University of Houston.
The last nine months have been a joy mainly because of the incredible work environment
in the lab. Thank you for being great friends and for all the encouragement that you have
given me.
A big thanks to Hope Queener and Jason Marsack at the College of Optometry for
teaching me the value of team-work and work ethics. I truly enjoyed working with you.
I have been extremely fortunate to have the constant support, guidance, and faith of
my friends. A big thank you to all my friends in India, for constantly motivating me to
follow my dreams. Thank you for the late-night calls, care packages, and all the love that
you have given me in the time that I have been away from home. I would like to thank my
friends Omkar, Tejus, Sneha, Sonal, Aditya, and Shweta for being my family away from
home. I will forever be grateful for the constant assurance and encouragement that you
gave me. I would also like to thank my friends, classmates, and roommates here in Houston
for all their help and support.
A special thanks to all my teachers. I would not be here if not for the wisdom that you
have shared. You have empowered me to chase my dreams. Each one of you has taught
me important life lessons that have always guided me. I will be eternally grateful to have
been your student.
Last but by no means least, I would like to thank my family for always being there
for me. I would like to start by thanking my Mom and Dad for their unconditional love and
support. A very big thank you to Kaka and Kaku for all their love, concern and advice.
You all have taught me the beauty of hard-work and perseverance and this thesis would
never have been possible without you.
Finally, I would like to thank Parikshit for being my greatest source of motivation.
You inspire me every day to be a better version of myself, and I would never have made it
without you.
AN EVALUATION OF KEY-VALUE STORES IN
SCIENTIFIC APPLICATIONS
An Abstract of a Thesis
Presented to
the Faculty of the Department of Computer Science
University of Houston
In Partial Fulfillment
of the Requirements for the Degree
Master of Science
By
Sonia Shirwadkar
May 2017
Abstract

Big data analytics is a rapidly evolving multidisciplinary field that involves the use of com-
puting capacity, tools, techniques, and theories to solve scientific and engineering problems.
With the big data boom, scientific applications now have to analyze huge volumes of data.
NoSQL [1] databases are gaining popularity for these types of applications due to their scal-
ability and flexibility. There are various types of NoSQL databases available in the market
today [2], including key-value databases. Key-value databases [3] are the simplest NoSQL
databases where every single item is stored as a key-value pair. In-memory key-value stores
are specialized key-value databases that maintain data in main memory instead of the disk.
Hence, they are well suited for applications with high frequencies of alternating read and
write cycles.
The focus of this thesis is to analyze popular in-memory key-value stores and com-
pare their performance. We have performed the comparisons based on parameters like
in-memory caching support, supported programming languages, scalability, and utilization
from parallel applications. Based on the initial comparisons, we evaluated two key-value
stores in detail, namely Memcached [4] and Redis [5]. To perform extensive analysis of these
two data stores, a set of micro-benchmarks has been developed and evaluated for both
Memcached and Redis. Tests were performed to evaluate the scalability, responsiveness,
and data-load handling capacity; Redis outperformed Memcached in all test cases.
To further analyze the in-memory caching ability of Redis, we integrated it as a caching
layer into an air quality simulation [6] based on Hadoop [7] MapReduce [8] which calculates
the eight-hour rolling average of ozone concentration at various sites in Houston, TX. Our
aim was to compare the performance of the original air-quality application that uses the
disk for data storage, to our application that uses in-memory caching. Initial results show
that there is no performance gain achieved by integrating Redis as a caching layer. Further
optimization and configuration of the code are reserved for future work.
Contents

1 Introduction
  1.1 Brief Overview of Key-Value Data Stores
  1.2 Goals of this Thesis
  1.3 Organization of this Document

2 Background
  2.1 In-memory Key-value Stores
    2.1.1 Redis
    2.1.2 Memcached
    2.1.3 Riak
    2.1.4 Hazelcast
    2.1.5 MICA (Memory-store with Intelligent Concurrent Access)
      2.1.5.1 Parallel Data Access
      2.1.5.2 Network Stack
      2.1.5.3 Key-value Data Structures
    2.1.6 Aerospike
    2.1.7 Comparison of Key-Value Stores
  2.2 Brief Overview of Message Passing Interface (MPI)
  2.3 Brief Overview of MapReduce Programming and Hadoop Eco-system
    2.3.1 Integration of Key-Value Stores in Hadoop

3 Analysis and Results
  3.1 MPI Micro-benchmark
    3.1.1 Description of the Micro-benchmark Applications
      3.1.1.1 Technical Data
    3.1.2 Comparison of Memcached and Redis using our Micro-benchmark
      3.1.2.1 Varying the Number of Client Processes
        3.1.2.1.1 Using Values of Size 1 KB
        3.1.2.1.2 Using Values of Size 32 KB
      3.1.2.2 Varying the Number of Server Instances
      3.1.2.3 Varying the Size of the Value
      3.1.2.4 Observations and Final Conclusions
  3.2 Air-quality Simulation Application
  3.3 Integration of Redis in Hadoop
    3.3.1 Technical Data
  3.4 Results and Comparison

4 Conclusions and Outlook

Bibliography
List of Figures

1.1 Key-value pairs
2.1 Redis Cluster
2.2 Redis in a Master-Slave Architecture
2.3 Memcached Architecture
2.4 Riak Ring Architecture
2.5 Hazelcast In-memory Computing Architecture
2.6 Hazelcast Architecture
2.7 MICA Approach
2.8 Aerospike Architecture
2.9 Word Count Using Hadoop MapReduce
3.1 Time Taken to Store and Retrieve Data When the Number of Client Processes is Varied
3.2 Time Taken to Retrieve Data When the Number of Client Processes is Varied
3.3 Time Taken to Store and Retrieve Data When the Number of Servers is Varied
3.4 Time Taken to Store and Retrieve Data When the Value Size is Varied
3.5 Customized RecordWriter to Read in Data from Redis
3.6 Customized RecordReader to Write Data to Redis
3.7 Comparison of Execution Times (in minutes) for Air-quality Applications Using HDFS and Redis
List of Tables

2.1 Summary of features of key-value stores
3.1 Time taken to store and retrieve data when the number of client processes is varied
3.2 Time taken to store and retrieve data when the number of client processes is varied
3.3 Time taken to store and retrieve data when the number of servers is varied
3.4 Time taken to store and retrieve data when the size of the value is varied
3.5 Time taken to execute the original air-quality application
3.6 Time taken to execute the air-quality application using Redis
Chapter 1
Introduction
Traditionally, science has been divided into theoretical and applied/experimental branches.
Scientific computing (or computational science), though closely related to the theoretical
side, also has features related to the experimental domain. Computational science has
now become the third pillar of science, and scientists increasingly employ scientific comput-
ing tools and techniques to solve many problems in the fields of science and engineering.
Problems as diverse as designing the wing of an airplane and predicting the weather are
being solved using scientific computing methodologies. However, the data generated in
such problems is in the range of hundreds of gigabytes, while some applications even deal
with terabytes of data. “Big Data” is a term that is generally used to describe such a
collection of data which is huge in size and yet growing exponentially with time. The New
York Stock Exchange generates terabytes of new trade data per day. Social media sites like
Facebook ingest and generate around 500+ terabytes of data per day. A single jet engine
can generate 10+ terabytes of data in 30 minutes of flight time; with thousands of flights
scheduled per day, the data generated reaches several petabytes [9]. Many of these
applications are either real-time or have requirements to provide results in a timely man-
ner. Traditionally, such applications were executed using specialized hardware along with
conventional data storage and retrieval methods. However, as the scale of data increased,
the need for larger and more scalable data storage methods grew, which is why large data
centers came into use. The massive datasets collected are so large and complex that
traditional data-management tools cannot store or process them efficiently,
mainly because they do not scale with the data.
Data being produced can be structured, unstructured, or semi-structured. Relational
databases are bounded by their schema and hence, pose a limitation on the type of data
that can be entered into the database. They cannot accommodate the volume, velocity, and
variety of the data being produced. Also, the data being collected cannot simply be discarded
because larger datasets can be analyzed to generate more accurate correlations, which
may lead to more concrete decision-making resulting in greater operational efficiencies and
profits. In the early 2000s, the volumes of data being handled by organizations like Google
started outgrowing the capacities of the legacy RDBMS software. The exponential growth
of the web also contributed to this data explosion and gradually businesses all around began
facing the issue of managing increasingly large volumes of data. While Internet giants such
as Amazon, Facebook, and Google may have been the first to truly struggle with the “big
data problem”, enterprises across industries were struggling to manage massive quantities
of data, or data entering systems at a high velocity, or both. It wasn’t long before data
scientists and engineers designed a new system to meet the increasing data-management
demands. As a result, the term “NoSQL” was introduced to describe the data-management
systems that retained some RDBMS-like qualities but went beyond the limitations of
traditional SQL-based databases.
A NoSQL-database environment is a non-relational database system optimized for hori-
zontal scaling onto a large, distributed network of nodes. It enables rapid, ad-hoc organiza-
tion and analysis of massive amounts of diverse data types. NoSQL is a whole new way of
thinking about databases. The easiest way to think of NoSQL is as a database which does
not adhere to the traditional relational database management system (RDBMS) structure
and sometimes it is also referred to as ‘not only SQL’. It is not built on tables and does
not necessarily employ SQL to manipulate data. NoSQL databases also commonly do not
provide full ACID (atomicity, consistency, isolation, durability) [10] guarantees. NoSQL
also helps ensure availability of data even in the face of hardware failures. If one or more
database servers (nodes) go down, the other nodes in the system are able to continue
with operations without data loss, thereby showing true fault tolerance. When deployed
properly, NoSQL databases enable high performance while also guaranteeing availability.
This is immensely beneficial because system updates, modifications, and maintenance can
be carried out without having to take the database offline. As NoSQL databases do not
strictly adhere to the ACID properties, they provide real location independence. This
means that read and write operations to a database can be performed regardless of where
that I/O operation physically occurs, with the operation being propagated out from that
location, so that it is available to users and machines at other sites. Such functionality is
very difficult to architect for relational databases. NoSQL databases guarantee eventual
consistency of the data across all nodes.
A NoSQL-data model can support use cases that don’t fit well into a RDBMS. A NoSQL
database is able to accept all types of data (structured, semi-structured, or unstructured)
much more easily than a relational database, which relies on a predefined schema. NoSQL
systems are designed so that they can be easily integrated into new cloud-computing archi-
tectures that have emerged over the past decade to allow massive computations to be run
inexpensively and efficiently. Data organized in NoSQL systems can be analyzed to gain
insights about previously unknown patterns and trends with minimal coding and without
the need for data scientists and additional infrastructure. This makes operational big-data
workloads much easier to manage, cheaper, and faster to implement. Each organization
has different requirements of its NoSQL database, and as a result there are various
NoSQL data stores on the market from different vendors, including Amazon, Google, etc.,
to handle big data. However, NoSQL databases can be broadly categorized as follows:
• Key-value store: In a key-value store, the data consists of an indexed key and a value,
hence the name.
• Document database: Expands on the basic idea of key-value stores where “docu-
ments” contain more complex data and each document is assigned a unique key.
• Column store: Instead of storing data in rows as an RDBMS does, these databases
store data tables as sections of columns of data.
• Graph database: Based on graph theory, these databases are designed for data whose
relations are well-represented as a graph and have elements which are interconnected.
This thesis focuses on key-value stores and on in-memory key-value stores in particular.
1.1 Brief Overview of Key-Value Data Stores
A key-value store is a simple database that uses an associative array (map or dictionary)
as the fundamental data structure in which each key is associated with one value. In each
key-value pair, the key can be in the form of a string such as a filename, URI, or hash,
while the value, on the other hand, can be any kind of data. The value is stored as a
BLOB (Binary Large OBject). The value is essentially binary data and can be anything
from numbers, strings, counters, JSON, XML, and HTML to binaries, images, and short
videos. As a result, key-value stores require minimal upfront database design and are faster
to deploy. Also, since data is referenced by keys, there is no need to index the data to
improve performance. However, since the type of the values is not known to the store, you cannot filter or
control what is returned from a request based on the value. Key-value stores provide a way
to store, retrieve, and update data using get, put, and delete commands. The simplicity of
this model makes a key-value store fast, easy to use, scalable, portable, and flexible. Figure
1.1 shows a collection of keys and the values associated with them. These key-value pairs
are then ultimately stored in a key-value database configured to store and retrieve data in
an efficient manner.
Figure 1.1: Key-value pairs
As seen in Figure 1.1, the data is stored in the form of key-value pairs. The key needs to
be unique throughout the dataset since it serves as the index for the value into the datastore.
Key-value databases are designed so as to enable efficient storage and retrieval of key-value
pairs. Typically, key-value datastores are implemented using hash-tables, since retrieval
from hash-tables can be done in O(1) time if the hash-table is implemented properly. Key-
value stores can use consistency models ranging from eventual consistency to serializability.
Some maintain the data in memory (known as in-memory key-value stores) while others
employ storage devices to maintain the data. There are many types of key-value stores
available today, but this thesis focuses on in-memory key-value stores and specifically on
two in-memory key-value databases, namely Memcached and Redis.
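To make the key-value model concrete, the following small C program sketches an in-process key-value store built on a hash table with chained buckets, supporting the put and get operations described above (delete is analogous). It is purely illustrative, is not the implementation used by any of the stores evaluated in this thesis, and the example key and value are hypothetical.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 1024

typedef struct entry {
    char *key;              /* unique key; serves as the index for the value */
    char *value;            /* opaque value; a real store treats this as a BLOB */
    struct entry *next;     /* chaining resolves hash collisions */
} entry;

static entry *buckets[NBUCKETS];

/* Simple string hash (djb2); production stores use stronger hash functions. */
static unsigned long hash(const char *key) {
    unsigned long h = 5381;
    while (*key)
        h = ((h << 5) + h) + (unsigned char)*key++;
    return h % NBUCKETS;
}

/* put: insert a new key or overwrite an existing one (average O(1)). */
static void put(const char *key, const char *value) {
    unsigned long b = hash(key);
    for (entry *e = buckets[b]; e != NULL; e = e->next) {
        if (strcmp(e->key, key) == 0) {
            free(e->value);
            e->value = strdup(value);
            return;
        }
    }
    entry *e = malloc(sizeof *e);
    e->key = strdup(key);
    e->value = strdup(value);
    e->next = buckets[b];
    buckets[b] = e;
}

/* get: return the stored value, or NULL if the key is absent (average O(1)). */
static const char *get(const char *key) {
    for (entry *e = buckets[hash(key)]; e != NULL; e = e->next)
        if (strcmp(e->key, key) == 0)
            return e->value;
    return NULL;
}

int main(void) {
    put("user:1001", "{\"name\": \"Jane\", \"city\": \"Houston\"}");
    printf("get(user:1001) -> %s\n", get("user:1001"));
    return 0;
}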
1.2 Goals of this Thesis
Over the years, traditional databases have been the go-to solution for all data storage and
analysis requirements. Although traditional databases have been the tried and tested way
to store data, in recent years, we have seen a tremendous shift in the status quo and NoSQL
databases have emerged as the solution for all “big data” applications. This is because
traditional databases are unable to keep up with the “volume, velocity and variety” of the
data currently being generated. There are many types of NoSQL databases each designed
with a specific purpose and target group in mind. Key-value stores are one such type of
NoSQL database and have found widespread use due to their simplicity and their ability to
be easily integrated into any environment with minimal effort. In-memory key-value stores
are a special kind of key-value store that retain data in RAM. They are now increasingly
being used in enterprise applications to improve application performance by enhancing the
speed with which data is written/read. The goal of this thesis is to evaluate and compare
the various in-memory key-value stores currently available. This evaluation is done in three
phases. In the first phase, we evaluate and compare popular and widely-used in-memory
key-value stores available in the market today. In the second phase, we evaluate and
compare in detail, the performance of two in-memory key-value stores, namely Memcached
and Redis using a micro-benchmark that we have developed using C and the OpenMPI
library. In the final phase, we integrate Redis into an application performing large-scale
data analysis so as to analyze if in-memory key-value stores enhance the performance of
the application. The application that we have used is an air-quality simulation developed
using Hadoop MapReduce, which analyzes an air-quality dataset of 48.5 GB, containing
measurements of pollutants from various sensors spread all over Texas. This application
analyzes the given dataset to calculate the eight-hour rolling average of air-quality in sites
across Houston, TX.
1.3 Organization of this Document
The rest of the thesis is organized as follows. Chapter 2 discusses the details of various
widely-used in-memory key-value stores. It also outlines the details of the OpenMPI library
and the Hadoop framework. In Chapter 3, we describe in detail the Open MPI micro-
benchmark with which we evaluate and compare the performance of Memcached and Redis. We
also discuss the details of the Hadoop MapReduce air-quality simulation application and
evaluate the performance results after integrating Redis into this application. In Chapter
4, we present the conclusions of this work.
Chapter 2
Background
In the previous chapter, we discussed briefly the limitations of traditional relational databases
and how they fall short when dealing with huge volumes of data. Relational databases offer
many powerful data management tools and techniques. However, a majority of applica-
tions today only require basic functionality to store and retrieve data by primary key and
do not require the complex querying and management features offered by RDBMSs. En-
terprise level relational databases require sophisticated hardware and trained professionals
for day-to-day operations which increases the cost of maintaining applications using these
databases. Also, the available replication strategies are limited and typically choose con-
sistency over availability. Despite improvements being made, it is still difficult to scale-out
databases or use smart partitioning schemes for load balancing. To overcome the limita-
tions discussed earlier, NoSQL databases were proposed as the solution. Various types of
NoSQL databases are now increasingly being used for large-scale data analytics applica-
tions. Key-value stores are one such type of NoSQL databases which are widely used in
production environments for their performance and simplicity. In-memory key-value stores
are a specialized form of key-value stores, and they will be the main focus of this thesis. In
this chapter, we describe in detail some widely used in-memory key-value stores available
in the market today. We will also briefly explain the Open MPI library and the Hadoop
framework, which we have used to develop benchmarks and applications for our evaluation
and results.
2.1 In-memory Key-value Stores
Key-value stores are the simplest form of NoSQL databases. A key-value store allows
you to store data, indexed by unique keys. The value is just a blob and the database is
usually not concerned about the content or type of the value. In other words, key-value
stores don’t have a type-defined schema, but a client-defined semantics for understanding
what the values are. Key-value stores tend to have great performance, because the access
pattern in key-value stores can be optimized. A benefit of this approach is that
it is very simple to build a key-value store. Also, applications using key-value stores are
easily scalable. In-memory key-value stores are a specialized form of key-value stores which
are highly optimized so as to allow extremely fast read/writes from/to the database. In-
memory key-value databases store the data in main memory (RAM), so that any request
to read/write data can be serviced by just accessing the RAM instead of the disk. For
this reason, such key-value stores are now increasingly being incorporated
as caching layers in time-sensitive data analytics applications. In the following subsections
we describe and compare the details of some widely used in-memory key-value stores.
2.1.1 Redis
Redis [5] is a very popular open-source (BSD licensed), in-memory, key-value data store.
Redis is widely used because of its great performance, adaptability, rich set of data
structures, and simple API. According to its creator, Salvatore Sanfilippo, Redis is an
“in-memory data structure store used as a database, cache, and message broker” [5]. This
is because Redis provides support for storing not only string values but also complex data
structures like hashes, lists, and sets. Redis also has support for replication, Lua scripting
[11], Least Recently Used (LRU) eviction [12] and different levels of on-disk persistence.
On-disk persistence means that apart from maintaining data in the memory, there is an
option to also persist the data by either dumping the data to the disk periodically or by
appending each write command to a log file. Persistence can optionally be disabled if the
application just requires a high-performance, in-memory caching mechanism.
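As a point of reference, the short C program below stores and retrieves one string value through hiredis, a commonly used Redis client library for C. This is only a minimal sketch and not the micro-benchmark developed for this thesis (described in Chapter 3); the host, port, and key name are placeholder assumptions.

#include <stdio.h>
#include <hiredis/hiredis.h>

int main(void) {
    /* Connect to a Redis server; host and port are placeholders. */
    redisContext *c = redisConnect("127.0.0.1", 6379);
    if (c == NULL || c->err) {
        fprintf(stderr, "connection error\n");
        return 1;
    }

    /* Store a plain string value under a key. */
    redisReply *reply = (redisReply *)redisCommand(c, "SET %s %s", "sensor:482011039:o3", "0.042");
    freeReplyObject(reply);

    /* Retrieve the value again. */
    reply = (redisReply *)redisCommand(c, "GET %s", "sensor:482011039:o3");
    printf("GET -> %s\n", reply->str);
    freeReplyObject(reply);

    /* Redis can also hold richer structures, e.g. appending to a list. */
    reply = (redisReply *)redisCommand(c, "RPUSH %s %s", "readings", "0.042");
    freeReplyObject(reply);

    redisFree(c);
    return 0;
}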
The architecture of any Redis application is simple and consists of two main processes -
Redis client and Redis Server. The client and server processes can be in the same computer
or in two different computers. The server is responsible for storing data in memory and
handling all read/write requests from the client. The client can be the Redis console client
(provided by RedisLabs) or any other application developed using Redis-client libraries
(available for a wide variety of programming languages). For trivial applications, which
require basic caching facilities, one instance of a Redis server will suffice. However, most
production-level applications will require more than one instance of a Redis server. “Redis
Cluster” (available since version 3.0) is a fairly new feature of Redis which involves running
multiple instances of the Redis server on machines in the cluster. The basic structure of
Redis deployed in a cluster is as follows:
Figure 2.1: Redis Cluster
In a cluster, all the server instances are connected to each other and together, they
maintain meta-data about the state of the network. There may be more than one instance
of the server running on one physical machine. The servers communicate using a customized
and highly optimized version of the gossip protocol [13]. Client applications connect to
these server instances and issue read/write requests. A requesting client application can
be of two types:
• Dummy-client requests: The client forwards its request to any one of the nodes. The re-
quest is then redirected to the appropriate server where the requested data is present.
• Smart-client requests: The client is responsible for locating the correct node on
which the data is located and issues requests to the respective node directly.
The details of how the above connections are created and maintained are hidden from the
end-user applications by Redis-client libraries. Redis can be deployed in a cluster in a variety
of ways, but the most common is a sharded master-slave configuration so as to enable
replication of data. The logical structure of such an architecture is as follows:
Figure 2.2: Redis in a Master-Slave Architecture
The slave nodes are exact replicas of the master nodes which ensures that the required
data will be available even if a particular master node goes down. The Redis cluster
manager tries to allocate slaves and masters such that the replicas are in different physical
servers. “Redis Cluster” also provides many other useful features like adding and removing
nodes while applications are running, resharding of keys across nodes in the cluster, and multi-
key operations (e.g., using wildcard characters to retrieve key-value pairs). Redis, whether
running on a standalone machine or in a cluster, greatly improves the performance of
applications. Due to a diverse set of useful features and also because of its performance
and ease of use, Redis, today, is one of the leading key-value stores being used in the
industry and academia.
2.1.2 Memcached
Memcached [4] is an open-source, high-performance, distributed, in-memory key-value store
which is used as a caching layer in many applications that deal with huge volumes of data.
Memcached is used for data caching in LiveJournal, Slashdot, Wikipedia, and other high-
traffic sites [14]. According to Brad Fitzpatrick, the creator of Memcached, the primary
motivation for creating Memcached was to improve the load performance of dynamic web-
sites by caching individual objects on dynamic web pages. The main idea behind Mem-
cached is to collect the main memory available in all machines connected in a network and
pool it together so that the collective main memory capacity appears as one cohe-
sive unit to applications using Memcached. This means that Memcached does not require
extremely powerful servers to execute. Memcached can be run on commodity hardware,
connected together in a network. Nodes can be added/removed from the network with-
out any adverse effects. Also, the effective total amount of RAM made available to client
applications is larger and can easily be adjusted to suit application requirements.
Memcached is designed to have a client-server architecture. Memcached server instances
are run over nodes in the network, wherever memory is available, and each server listens
on a user-defined IP and port. The main memory from all running Memcached server
instances forms a single, common memory pool, and client applications use this memory
pool to store and retrieve data. Multiple Memcached server instances can run on a single
physical machine. The basic structure of Memcached in action in the network is as follows:
Figure 2.3: Memcached Architecture
In Figure 2.3 [4], we have three Memcached server instances running in the cluster.
The keyspace is divided among the server instances such that each Memcached server is
responsible for a particular set of key-value pairs. To store/retrieve a key-value pair, client
applications are supposed to send requests to the correct Memcached instance. This is
done by logically considering each Memcached server itself to be a bucket in a hash table.
To store/retrieve a key, the client calculates the hash of the key, which points to the correct
Memcached instance. Each Memcached instance, in turn, holds a hash table of its assigned
key-value pairs. The client application can then store/retrieve the key-value pair. Thus,
Memcached acts as a two-layer global hash table. End-users need not be worried about
the details about how to connect to the correct Memcached instances. There are many
Memcached client libraries available in a wide variety of languages like C, C++, Perl, Java,
PHP, Ruby, and Python. These libraries abstract away the internal details and present a
simplified API which can then be used by applications. In Figure 2.3, if the application
requests the key ‘foo’ (using a client library), the client library calculates the hash value of
the key to locate the server which will process the request (in this case, ‘foo’ is present in
server 2). The request is then forwarded to the correct server (server 2). The server then
responds to the client library by searching for ‘foo’ in its local hash table and returning it
to the user.
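The client-side behavior just described can be sketched with libmemcached, a widely used Memcached client library for C. The hostnames, port, and the key 'foo' are placeholders; the library hashes the key over the configured server list, so every client using the same list agrees on which server owns the key.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <libmemcached/memcached.h>

int main(void) {
    memcached_st *memc = memcached_create(NULL);

    /* Register the Memcached server instances that form the memory pool. */
    memcached_server_add(memc, "node1", 11211);
    memcached_server_add(memc, "node2", 11211);

    const char *key = "foo";
    const char *value = "bar";

    /* Store the pair; the library picks the owning server by hashing the key. */
    memcached_return_t rc = memcached_set(memc, key, strlen(key),
                                          value, strlen(value),
                                          (time_t)0, (uint32_t)0);
    if (rc != MEMCACHED_SUCCESS)
        fprintf(stderr, "set failed: %s\n", memcached_strerror(memc, rc));

    /* Retrieve the pair; a NULL result would indicate a cache miss. */
    size_t len;
    uint32_t flags;
    char *result = memcached_get(memc, key, strlen(key), &len, &flags, &rc);
    if (result != NULL) {
        printf("get(foo) -> %.*s\n", (int)len, result);
        free(result);
    }

    memcached_free(memc);
    return 0;
}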
Each server instance is independent of the others, and they do not communicate with
each other. Also, the data inside the servers is maintained on a least recently used basis
to make room for new items. In case a server fails or if the requested data is not present
in the cache, requests to the server result in a cache miss, which the application may then
handle appropriately. Memcached clients have to be configured appropriately to deal with
node failures. If no effort is taken in this direction, requests for keys assigned to a failed
Memcached instance simply result in cache misses. Memcached is designed for fast access
to data by using optimized memory allocation algorithms, avoiding locking objects so as to
avoid waits, fetching multiple keys at the same time, etc. Due to its compactness, simplicity,
and high performance, Memcached is widely used as a caching layer in many applications
that require high-speed access to data [15].
2.1.3 Riak
Riak key-value store (known as Riak KV) [16] is a highly resilient key-value database. It
is highly optimized to be available and scalable while running on a cluster of commodity
hardware. Riak also provides in-memory caching by integrating Redis as the caching
layer into its key-value database. This helps reduce latency and improves application
performance. Riak stores data as a combination of keys and values, where the value can
be anything ranging from JSON, XML, HTML to binaries, images etc. Keys are binary
values which are used to uniquely identify a value. An application using Riak is part of a
client-server request-response model. Client applications are responsible for connecting to a
Riak server and making read or write requests. User applications wanting to leverage Riak,
need not delve into the details of how to communicate with Riak servers. They can simply
make use of simple APIs provided by client libraries, available for many programming
languages like Java, Ruby, Python, PHP, Erlang, .NET, Node.js, C, Clojure, Go, Perl,
Scala, and R. A Riak server is responsible for satisfying incoming client requests and can
function as a stand-alone instance or can be grouped together to form a Riak cluster. All
the Riak instances in a cluster work together, by pooling together their individual hardware
resources to provide a global view of the database to client applications. They communicate
with each other to provide data availability and partition tolerance.
Riak, working in a cluster, has a peer-to-peer architecture in which all the nodes can
fulfill read and write requests. All nodes have the same set of functionalities, which is
why there is no single point of failure in the architecture. Riak’s architecture is arranged
in the form of a “Ring”. Nodes in the cluster are assigned logical partitions and these
partitions are all considered as part of the same hash space (In Figure 2.4 [16], node 0
is responsible for all green partitions while all orange partitions are handled by node 1
and so on). Each partition is a logical entity that is managed by a separate process. This
process is responsible for storing data and serving incoming read and write requests. Since the
workload is distributed among multiple processes, Riak is extremely scalable. A physical
machine in the network may have one or more partitions stored locally. Depending on
the replication factor (say N), replicas of data stored in one partition are also stored in
the “next N partitions” of the hash space. Nodes in the cluster communicate with each
other by exchanging a data structure known as “Ring state”. At any given point of time
each node in the cluster knows the state of the entire cluster. A client can request a
particular piece of information from any node in the cluster. If a node receives a request
for data that is not present locally, it forwards the request to the proper node by consulting
the ring state. The ring architecture explained above can be logically depicted as follows:
Figure 2.4: Riak Ring Architecture
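The partition placement described above can be illustrated with a small, self-contained C sketch. This is not Riak's code (Riak itself hashes keys with SHA-1 into a 160-bit space and tracks partition ownership through the ring state); it only demonstrates the idea of mapping a key to a primary partition and placing replicas on the next N-1 partitions of the ring. The partition count, replication factor, hash function, and key are all illustrative assumptions.

#include <stdio.h>

#define NPARTITIONS 64   /* the ring is divided into a fixed number of partitions */
#define NREPLICAS    3   /* replication factor N */

/* Toy 32-bit FNV-1a hash standing in for Riak's 160-bit SHA-1 hash space. */
static unsigned int hash(const char *key) {
    unsigned int h = 2166136261u;
    while (*key) {
        h ^= (unsigned char)*key++;
        h *= 16777619u;
    }
    return h;
}

/* A key lands on the partition its hash falls into; replicas go to the
   next N-1 partitions clockwise around the ring. */
static void partitions_for_key(const char *key, int out[NREPLICAS]) {
    int primary = (int)(hash(key) % NPARTITIONS);
    for (int i = 0; i < NREPLICAS; i++)
        out[i] = (primary + i) % NPARTITIONS;
}

int main(void) {
    int p[NREPLICAS];
    partitions_for_key("site:houston:o3", p);
    printf("key stored on partitions %d, %d, and %d\n", p[0], p[1], p[2]);
    return 0;
}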
Riak is an eventually consistent database [17]. Data is evenly dis-
tributed among all nodes in the cluster, and if a node goes down, key-value pairs are
redistributed in an efficient manner. When a particular node goes down, a neighboring
node will take over its responsibilities. When the failed node returns, the updates received
by the neighboring node are handed back to it. This ensures that data is always available.
Riak also guarantees eventually consistent replicas of the data, meaning that while data
is always available, not all replicas may have the most recent update at the exact same
time. Due to its simple architecture, high performance, and well-documented client li-
brary APIs, Riak has found widespread use in many corporations like Uber, Alert Logic,
Zephyr, and Rovio.
2.1.4 Hazelcast
Hazelcast [18] is an open source, in-memory data store written in Java. According to the
documentation, “Hazelcast is an In-Memory Data Grid (IMDG) and allows for data to
be evenly distributed among the nodes of a computer cluster and is designed to scale up
to hundreds of thousands of nodes”. While in-memory key-value stores like Redis started
providing cluster support only after a few initial versions, Hazelcast was developed from
the ground up with the intention to leverage distributed computer architectures.
Hazelcast’s architecture can be described as peer-to-peer. There are no masters and slaves,
and hence there is no single point of failure. All nodes store an equal amount of data and
do an equal amount of processing. The oldest node in the cluster is the de-facto leader and
manages cluster membership by determining which node is responsible for which particular
chunk of data. As new nodes join or drop out, the cluster re-balances accordingly. Each
server instance runs in a separate Java Virtual Machine [19] and there may be more than
one server instance running on a single physical machine. Hazelcast supports a client-
server request-response design. Client applications making data requests are serviced by
Hazelcast server instances running on nodes in the cluster. User applications do not need to
delve into the details of connecting to Hazelcast servers and making requests. There are a
wide variety of client libraries that enable user applications to communicate with Hazelcast
instances distributed on nodes in the network. Client libraries are provided for popular
programming languages like Java, C++, .NET, Node.js, Python, and Scala. Figure 2.5
[20] depicts the communication mechanism between the client and server applications.
Figure 2.5: Hazelcast In-memory Computing Architecture
The client application makes requests to the Hazelcast server, which then fulfills them.
The communication pattern between the client and the servers can be one of the following:
• Embedded topology
The client application, the data and the Hazelcast instance all reside on the same
node and share a single JVM. The client and the server communicate with each other
directly.
• Client plus member topology
The client application and the Hazelcast instances are not tightly coupled and may
reside on different nodes of the cluster. They communicate with each other over the
network.
The two topologies listed above are depicted below [20].
(a) Embedded Topology (b) Client plus Member Topology
Figure 2.6: Hazelcast Architecture
Although the embedded topology is comparatively simple and there are no extra nodes
to manage or maintain, the client plus member topology is mostly preferred. This is be-
cause it provides greater flexibility in terms of cluster mechanics. Member JVMs can
be taken down and restarted without affecting the application. The client plus member
topologies isolate the application code from cluster-related events. Hazelcast client ap-
plications can be either a “native client” or a “lite client”. A native client maintains a
connection to any one node in the cluster and is redirected appropriately by that node
when making requests. A lite client maintains data about each and every node in the
cluster and makes requests to the correct Hazelcast instance. The Hazelcast instances share
the keyspace such that any one instance is not over-burdened. In case of node crashes,
Hazelcast also provides recovery and fail-over capabilities. Hazelcast is an open source
library which is easily distributed in the form of a JAR file without the need to install
any software. It supports in-built data structures like maps, queues, multimaps and also
allows for the creation of custom data structures. Hazelcast is used in many enterprise
applications and has a huge client base that includes American Express, Deutsche Bank,
Dominos Pizza and JC Penny.
2.1.5 MICA (Memory-store with Intelligent Concurrent Access)
MICA [21] is “a scalable in-memory key-value store that handles 65.6 to 76.9 million key-
value operations per second using a single general-purpose multi-core system” [21]. MICA
can be integrated into applications using a request-response, client-server model. MICA is
installed across nodes in the cluster and client applications can connect to these instances
to make requests. The requesting client needs to know which server instance to contact. To
serve multiple client requests efficiently, MICA is designed for high single-node throughput
and low end-to-end latency. MICA also strives to achieve consistent performance across
workloads, and can handle small, and variable-length key-value items while still running on
commodity hardware. To achieve all the above performance gains, MICA makes key design
decisions regarding parallel data access, the network stack, and key-value data structures.
The following sub-sections describe these design choices in detail.
2.1.5.1 Parallel Data Access
To enable truly parallel access to data, MICA creates one or more data partitions (“shards”)
per CPU core and stores key-value items in a partition determined by their key. An
item’s partition is determined by using a 64-bit hash of the item’s key calculated by the
client application. Sometimes, such partitioning may lead to skewed workloads wherein a
particular partition is being used more often than others. In this case, MICA exploits CPU
caches and packet burst I/O to disproportionately speed more loaded partitions, nearly
eliminating the penalty from skewed workloads. MICA can operate in EREW (Exclusive
Read Exclusive Write) or CREW (Concurrent Read Exclusive Write) modes. EREW
assigns a single CPU core to each partition for all operations. The absence of concurrent
access to partitions removes the need for synchronization and inter-core communication,
making MICA scale linearly with CPU cores. CREW allows any core to read partitions, but
only a single core can write. This combines the benefit of concurrent read and exclusive
write; the former allows all cores to process read requests, while the latter still reduces
expensive cache-line transfer.
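The heart of the EREW scheme, mapping a key's 64-bit hash to a partition and the partition to the single core that owns it, can be sketched in a few lines of C. This is an illustration of the idea only, not MICA's actual code; the hash function, partition count, and core count are arbitrary assumptions.

#include <stdint.h>
#include <stdio.h>

#define NUM_CORES       8
#define SHARDS_PER_CORE 2
#define NUM_PARTITIONS  (NUM_CORES * SHARDS_PER_CORE)

/* Toy 64-bit hash (splitmix64 finalizer) standing in for the 64-bit key hash
   that MICA clients compute themselves. */
static uint64_t hash64(uint64_t x) {
    x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27; x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

int main(void) {
    uint64_t key = 42;
    int partition = (int)(hash64(key) % NUM_PARTITIONS);
    /* In EREW mode every partition is owned by exactly one core, so requests
       for this key are always handled by the same core without locking. */
    int core = partition % NUM_CORES;
    printf("key %llu -> partition %d -> core %d\n",
           (unsigned long long)key, partition, core);
    return 0;
}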
2.1.5.2 Network Stack
MICA uses Intel’s DPDK [22] instead of standard socket I/O. This allows the user-level
server software to control NICs (Network Interface Cards) and transfer packet data with
minimal overhead. This is done because the key-value pairs to be sent over the network
are usually small compared to traditional TCP/IP packets. Also, TCP/IP
features like congestion control and error correction are not strictly required for this particular
case. By bypassing socket I/O, MICA avoids any additional network features that are not
required and hence avoids delays. For NUMA (non-uniform memory access) systems [23],
the data is partitioned such that the CPU core and the NIC only access packet buffers
stored in their respective NUMA domains. Each key-value pair to be transmitted is an
individual packet; to further increase transmission speeds, MICA uses bursty I/O. MICA
also ensures that no CPU core is overloaded with requests by using processor affinity to
determine which CPU is responsible for which partition of data. Requests for keys are then
forwarded accordingly by the client.
2.1.5.3 Key-value Data Structures
MICA can be used either for storing data (no existing items can be removed without an
explicit client request) or for caching data (existing items may be removed to reclaim space
for new items). MICA uses separate memory allocators for cache and store semantics.
MICA uses a circular log for caching. New data is appended to the log and existing data
is modified in place. The oldest items at the head of the log are evicted to make space for
newer entries when the cache is full. Although the natural eviction is FIFO, MICA can
provide LRU eviction by reinserting any requested items at the tail. In store mode, MICA
uses a lossy concurrent hash index to index stored items. Both the above data structures
exploit cache semantics to provide fast writes and simple memory management. Each
MICA partition consists of a single circular log and lossy concurrent hash index.
Figure 2.7 [24] clearly depicts MICA’s in-memory key-value store approach. It also shows
how a client request is forwarded to the server and how each design decision discussed
above plays a part in enhancing the performance.
Figure 2.7: MICA Approach
MICA is entirely written using the C programming language and it has a client library
in C. Applications that want to leverage MICA as a key-value store or cache can use this
client library to make requests to MICA instances installed on a cluster. Although MICA
has a set of impressive features, it is not as widely used as its counterparts. The
reasons for this include limited documentation as well as a lack of client libraries in other
programming languages.
2.1.6 Aerospike
Aerospike is a distributed, scalable NoSQL database. It is developed from the ground up
keeping clustering and persistence in mind. Its architecture comprises the following
layers [25]:
• Application layer
All end-user applications fall in this layer.
• Client layer
This layer consists of a set of client libraries written in a variety of languages like
C, Java, C#/.NET, Go, Perl, and Python. These client libraries are responsible for
monitoring the cluster on which Aerospike is installed and forwarding application
requests to the correct node.
• Clustering and distribution layer
This layer manages cluster communications and automates fail-over, replication,
cross-data center synchronization, and intelligent re-balancing and data migration.
• Data storage layer
This layer reliably stores data in DRAM and Flash for fast retrieval.
Figure 2.8: Aerospike Architecture
Aerospike uses a shared-nothing architecture, where every node in the Aerospike cluster
is identical, all nodes are peers and there is no single point of failure. Data is distributed
evenly and randomly across all nodes within the cluster. Nodes within the cluster com-
municate with each other using a “heartbeat call” to monitor inter-node connectivity and
to maintain meta-data about the cluster state. When a node is added or removed from
the cluster, data is automatically redistributed among the nodes. Aerospike also allows
for replication of data so as to ensure reliability and availability even if a node goes down.
Replication is done on geographically separated nodes so as to ensure maximum availabil-
ity. Any changes to the main data partition are also immediately reflected in the replicas.
On cluster startup, Aerospike configures policy containers called namespaces (similar to RDBMS
databases). Namespaces are divided into sets (similar to RDBMS tables) and records (sim-
ilar to RDBMS rows). Each record has a unique indexed key, and one or more bins (similar
to RDBMS columns) that contain the record values. Applications can read or write this
data by making requests using Aerospike client libraries. When data is to be stored, the
client library computes a hash to determine which node the data is to be stored on and
forwards the request accordingly. Similarly, to read a particular key-value pair, the hash
of the key is calculated by the client library and the request is forwarded to that node
accordingly. If a node goes down, the client libraries communicate with the replicas until
the node comes back up again. Aerospike maintains secondary indices of data in memory for faster
retrieval. One major feature of Aerospike is that the data can be persisted onto SSD (Solid
State Drive) storage. This hybrid model enables faster fetching of data as compared to
traditional HDD (Hard Disk Drive) storage. Aerospike also supports data types, queries
and User Defined Functions (UDF). Aerospike has steadily gained recognition for being a
high-performing, scalable key-value store and is being used by organizations like Kayak,
AppNexus, Adform and Yashi.
2.1.7 Comparison of Key-Value Stores
In the previous sections, we briefly described the salient features of some widely used in-
memory key-value stores. In this section, we will compare them so as to pick the ones that
we would like to further analyze. The comparison is done on the following factors:
• Programming languages
The aim is to select a database which has client libraries in widely-used major pro-
gramming languages. This ensures that the key-value store can be easily integrated
into scientific and big-data applications.
• Hadoop and HPC support
We want to select a database which can be easily integrated into Hadoop and High
Performance Computing environments (in our case we aim for Open MPI support).
This is because we will be analyzing the key-value store using an Open MPI micro-
benchmark and a Hadoop application.
• In-memory storage
Our aim is to analyze key-value stores which can be integrated as a caching layer in
compute intensive applications to see if we observe any performance benefits. Hence,
we look for a key-value store that maintains data in memory.
• Storage on files or databases
We also would ideally like the key-value database to persist data onto secondary
storage so that data is not lost.
• Access from remote locations
We plan to install the key-value store onto a cluster and then access the database
remotely using client applications, which is why easy remote access is important for
us.
• Support for parallel storage and operations
Ideally, we want data operations to be performed in parallel. The key-value store
should be able to run in a cluster and should be able to process multiple incoming
simultaneous requests. Unrelated data requests should not block operations and data
operations should be performed as soon as possible.
• Open Source
From a financial perspective, we aim to select key-value stores that are open source.
Table 2.1 gives a summary of the relevant features of all the key-value stores discussed
above:
Table 2.1: Summary of features of key-value stores
Comparing the features of all the above in-memory key-value stores, we found Redis to
be the best fit. Riak fulfills all of the above requirements but its in-memory key-value store
internally uses Redis, so we decided not to move forward with it. Similarly, Aerospike
also has some promising features, but it requires a Solid State Drive (SSD) as the backing
store, which, we believe, largely restricts its scope. Hazelcast and MICA do not have the
option to back data onto a secondary storage medium, which is why we did not select
them. Although Memcached also does not allow backing of data onto secondary storage,
based on surveys [26], we observed that Redis and Memcached are the most widely used
key-value stores. Hence, we decided to select Redis and Memcached for further analysis.
In the next sections, we examine details of the Message Passing Interface (MPI) used
in parallel computing and the Hadoop framework, which is typically used for analysis on a
cluster of commodity hardware. We will also examine the ways in which in-memory key-
value stores can possibly be integrated into these environments so as to offer performance
improvements.
2.2 Brief Overview of Message Passing Interface (MPI)
Traditionally, computational problems were solved using serial algorithms where instructions
were executed one after the other. In parallel computing, a problem is broken down into
discrete parts that can be executed concurrently by compute resources that communicate
and co-ordinate with each other to produce the desired results. Parallel computing is thus
used to either solve problems that are too large to be solved by a single compute resource
or to solve problems faster than a single compute resource. The compute resources can
be either a single computer with multiple cores or a set of computers connected through
a network. If the compute resource is a single multi-core computer, then communication
is done by reading or writing to shared memory. However, for a distributed architecture,
communication is done using sockets, message passing, or Remote Procedure Calls (RPC).
Generally, shared memory systems are easy to program while distributed memory systems
are difficult to program. This is largely because of the inherent complexity of designing and
coordinating concurrent tasks, a lack of portable algorithms, standardized environments,
and software development toolkits. There are constant innovations in microprocessor ar-
chitecture and as a result, parallel software developed keeping a particular architecture
in mind soon becomes outdated, which ultimately undermines the efforts taken to design
that particular parallel software. Hence, there is a need for a standard library that enables
programmers to develop portable, high-performance, parallel applications. MPI stands for
Message Passing Interface [27] and it is a standard that is created and maintained by the
MPI Forum, an open group consisting of parallel computing experts from the industry
as well as academia. The MPI standard provides an Application Programming Interface
(API) [28] that is used for portable, high-performance inter-process communication (IPC)
[29] via message passing.
On most operating systems, an “MPI process” usually corresponds to the operating
system’s concept of a process and processes working together to solve a particular problem
are part of a group so as to enable communication between them. MPI is designed to be
implemented as middleware, meaning that upper-level applications invoke MPI functions to
perform message passing without actually going into the details of how exactly communica-
tion takes place. MPI defines a high-level API and it abstracts away the actual underlying
communication methods used to transfer messages between processes. This abstraction is
done to hide the complexity of inter-process communication from the upper-level applica-
tion and also to make the application portable across different environments. A properly
written MPI application is meant to be source-compatible across a wide variety of plat-
forms and network types. MPI exposes APIs for point-to-point communication (e.g., send
and receive) and also for other communication patterns, such as collective communication.
A collective operation is an operation where multiple processes are involved in a single
communication. Reliable broadcast, which involves one MPI process sending a message to
all other MPI processes in the group, is an example of a collective operation. There are many
implementations of the MPI standard targeted for a wide variety of platforms, operating
systems, and network types. Some implementations are open source while others are closed
source. Open MPI, as its name implies, is an open source implementation of MPI and is
widely used in many high-performance computing environments. We have developed a
micro-benchmark using OpenMPI to analyze the performance of Redis and Memcached.
The details of this benchmark are given in the next chapter.
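To make these concepts concrete, the following minimal example (an illustration only, not the micro-benchmark developed in this thesis) shows a point-to-point send/receive between two MPI processes followed by a collective broadcast to all processes in the group:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, token = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Point-to-point communication: rank 0 sends a value to rank 1. */
    if (rank == 0 && size > 1) {
        token = 42;
        MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    /* Collective communication: rank 0 broadcasts the value to every rank. */
    MPI_Bcast(&token, 1, MPI_INT, 0, MPI_COMM_WORLD);
    printf("rank %d of %d sees token %d\n", rank, size, token);

    MPI_Finalize();
    return 0;
}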
2.3 Brief Overview of MapReduce Programming and Hadoop
Eco-system
In recent years, there has been a deluge of data that is both large in volume and highly varied. Traditional data analysis tools are not equipped to handle the magnitude and variety of the data being generated, and that is where Hadoop [7] comes in. “The Apache Hadoop software library is a framework
that allows for the distributed processing of large data sets across clusters of computers us-
ing simple programming models. It is designed to scale up from single servers to thousands
of machines, each offering local computation and storage. Rather than rely on hardware to
deliver high-availability, the library itself is designed to detect and handle failures at the
application layer, so delivering a highly-available service on top of a cluster of computers,
each of which may be prone to failures.” [30]. In Hadoop, data storage and data analysis are both performed on the same set of nodes, which allows Hadoop to improve the performance of large-scale computations by exploiting the principle of locality [31]. In addition, a Hadoop cluster is comparatively inexpensive because it is built from commodity hardware. Together, these properties have made Hadoop-based frameworks the de-facto standard for storing and processing big data.
The Hadoop framework consists of three main components:
• HDFS: Hadoop Distributed File System (HDFS) [30] is a distributed file system
which is used to store very large files.
• MapReduce Framework: The MapReduce [30] module is responsible for carrying out
distributed analysis tasks by implementing the MapReduce paradigm.
• YARN: Yet Another Resource Negotiator (YARN) [32] is the resource manager for the framework and is responsible for managing and allocating resources to applications as and when required.
The origins of the Hadoop framework are largely inspired by the Google File System
[33] and MapReduce paradigm [8] introduced in 2004. These concepts laid the foundation
for the Hadoop framework and by 2009 Hadoop came to be widely used as a large-scale
data-analysis platform. In this model, the total computational requirements of a Hadoop
application are divided among nodes in the cluster, and the data to be processed is stored in
HDFS. HDFS divides the file into blocks and stores those blocks onto nodes in the cluster.
HDFS also provides fault tolerance by storing replicas of file chunks in the cluster and the
default replica count is three (which may be configured according to the requirements of
the application). To provide this fault tolerance, HDFS stores the first replica on the same rack where the original data is present, so that the failure of a node can be overcome quickly and processing can continue. Another replica is stored on a separate rack so that data remains available for analysis even in the event of a rack failure.
The MapReduce style of programming is exceptionally flexible and can be used to
solve a wide-array of data analytics problems. A Hadoop cluster consists of computational
nodes which can share workloads and take advantage of a very large aggregate bandwidth
across the cluster. Hadoop clusters typically consist of a few master nodes, which control
the storage and processing systems in Hadoop, and many slave nodes, which store all of the cluster's data and are also where the data gets processed. MapReduce involves the processing of a sequence of operations on distributed data sets. The data consists of key-value pairs,
and the computations have only two phases: a map phase and a reduce phase. The
key concept here is divide and conquer. A typical MapReduce application will have the
following phases:
• During the Map phase, input data is split into a large number of fragments, each of
which is assigned to a map task.
• These map tasks are distributed across the cluster.
• Each map task processes the key-value pairs from its assigned fragment and produces
a set of intermediate key-value pairs.
• The intermediate data set is sorted by key, and the sorted data is partitioned into a
number of fragments that matches the number of reduce tasks. This phase is known
as the sort and shuffle phase.
• During the Reduce phase, each reduce task processes the data fragment that was
assigned to it and produces an output key-value pair.
• These reduce tasks are also distributed across the cluster and write their output to
HDFS when finished.
To put this in perspective, we can make use of a basic word-count example. The word
count operation takes place in two stages - a mapper phase and a reducer phase. In the
mapper phase the input text/document is tokenized into words and a key value pair is
formed with these words such that the key is the word itself and the value is ‘1’. All
the values corresponding to a key go to one reducer; in the reduce phase, the values for each key are grouped together and summed. This process can be visualized
better as seen in Figure 2.9 [34].
Figure 2.9: Word Count Using Hadoop MapReduce
In a MapReduce application, both the map and reduce functions are distributed. When
a MapReduce application is launched, many copies of the program are started across the cluster of machines. One of the copies is called the master and it controls
the rest of the copies - the workers. The master is responsible for distributing the data
across the workers and ensuring that all the workers are engaged in successful completion of
tasks. In case of any failure, automatic re-scheduling of tasks across the available workers is
done. The intermediate key-value pairs generated by the map function are distributed across
the multiple workers which run the reduce function. The intermediate values are sorted
and then merged by the reduce function which emits them as output. This distribution
of resources is handled by the YARN module of the Hadoop framework. In the next
subsection, we briefly describe our reasoning behind integrating an in-memory key-value
store into a Hadoop application and the potential benefits that we may gain.
2.3.1 Integration of Key-Value Stores in Hadoop
The input, temporary results and the output of a MapReduce application are read/written
from/to the disk via HDFS. Even though HDFS is optimized to handle huge loads, the disk still tends to slow down performance. Although a majority of MapReduce applications
are meant to be executed in batch-processing mode, there are some applications that may
require quick delivery of intermediate results. Scientific applications fall in this category
and hence, this thesis aims to introduce an in-memory key-value store that will act as the
primary backing store for MapReduce applications instead of HDFS. This is done with the
intention of improving the overall performance of the application by reducing the time to
read/write results. To achieve this, we studied, analyzed and compared the features of
many key-value stores widely used today. Our aim was to find a key-value store which
had the ability to retain data in the main memory so as to reduce retrieval time, support
parallel computing and Hadoop applications and which preferably, is also open source.
The key-value stores that we analyzed were discussed earlier in this chapter, and from them we selected the ones that best suit our needs.
In the next chapter, we will compare the performance of Memcached and Redis using a
micro-benchmark. We will also discuss the working of the air-quality simulation application
in detail and how integrating an in-memory cache into this application may yield performance benefits.
Chapter 3
Analysis and Results
The previous chapter gave an overview of various in-memory key-value stores, OpenMPI
and the Hadoop framework. After evaluating some widely used in-memory key-value stores,
we were most interested in evaluating the performance of Redis and Memcached in detail.
To perform this analysis, we have developed a micro-benchmark application. Also, we
were interested in integrating an in-memory key-value database into a compute intensive
application to evaluate if we gain any performance benefits. For this analysis, we have used
an air-quality simulation that generates the eight-hourly air-quality average around sites
in Houston.
In the initial part of this chapter, we describe the micro-benchmark application in detail
and present our results and observations. We then describe the air-quality application
in detail and present our strategy for incorporating an in-memory cache into a Hadoop
application. We then conclude this chapter with our results and findings.
3.1 MPI Micro-benchmark
To compare the performance offered by Memcached to that offered by Redis, we have
developed two micro-benchmark applications using C and the MPI library, one each for
Memcached and Redis. The main intention behind developing these two micro-benchmarks
was to do an initial performance analysis of Memcached and Redis. The micro-benchmark
is a C program that establishes a basic communication setup between Memcached/Redis
servers running in a cluster and the respective client applications. The micro-benchmarks
have been developed so that a user can easily specify configurations using only command
line arguments and input files. The parameters that the user can influence are as follows:
• Number of Servers
The number of Memcached/Redis servers to be used and their respective hostnames
are passed in an input text file to the program. These servers then work together to
handle incoming client requests.
• Number of Clients
The number of client processes making requests to the server can be specified us-
ing command-line arguments. The number of client processes storing data can be
configured separately from the number of clients retrieving data.
• Number of key-value pairs to be stored and retrieved
The total number of key-value pairs to be stored and retrieved can be indicated using
command-line arguments.
• Individual value size
The size of individual values can be specified using command-line arguments.
In our analysis, the main properties that we want to evaluate are the scalability, reliability, and load-balancing ability of Memcached and Redis. By varying input parameters to
the benchmarks, we have evaluated and compared both Memcached and Redis to test
for the above conditions. In the next section, we present details of the micro-benchmark
application and our findings.
3.1.1 Description of the Micro-benchmark Applications
Although we have developed two micro-benchmark applications, one each for Memcached
and Redis, the two are very similar and only differ in parts that require communication
and synchronization with either Memcached or Redis. We now give details of the Memcached benchmark application; later, we describe only the sections of the Redis benchmark that differ.
In the previous chapter, we explained that Memcached client libraries can be integrated
into user applications to make requests to the server to store, retrieve or modify a particu-
lar key-value pair. Memcached has a variety of client libraries for programming languages
like C, C++, Java, or C# .NET. Since we are using MPI and the C programming language
for our benchmark application, we have used libMemcached as our client library. libMem-
cached is an open source C/C++ client library for the Memcached server which has been
designed to be light on memory usage, thread safe, and provide full access to server side
methods. Our MPI micro-benchmark applications make requests to Memcached server instances with the help of APIs exposed by libMemcached. The cluster that we have used
for our evaluations is the crill cluster at the University of Houston and the details of this
cluster are provided later on in this chapter.
Our MPI benchmark application acts as a client and sends requests to Memcached servers.
We initially start out with validating the input parameters and initializing the MPI en-
vironment. Once everything has been set up, we establish a connection to the required number of Memcached servers by using host-names from a given input text file. In the fol-
lowing sample code, each line from the input file is fetched and interpreted as a host-name
with which a connection is to be established.
while ((readLen = getline(&line, &length, fp)) != -1)
{
    /* Strip the trailing newline so that only the host name remains. */
    line[readLen - 1] = '\0';
    /* Append this host (default Memcached port 11211) to the server list
       and register the updated list with the client handle. */
    servers = memcached_server_list_append(servers, line, 11211, &rc);
    rc = memcached_server_push(memc, servers);
}
Once the connections have been established, key-value pairs are stored onto the Memcached
servers. Depending on the number of instances of the client application to be executed and
the number of key-value pairs to be stored/retrieved, the keyspace is divided equally among
the MPI processes. Each MPI process is responsible for handling its subset of the keyspace,
independent of the other MPI processes. For example, if 4 MPI processes are given the
task of storing 20 key-value pairs, each process will generate and store 5 key-value pairs
onto Memcached servers. Out of these 4 MPI processes, if only 2 processes are given the
responsibility to retrieve key-value pairs, then each retrieving client will be responsible
for fetching 10 key-value pairs. Special care has been taken to avoid duplicate keys in
the dataset by using a combination of the current MPI process’ rank and offset of the
current key within the subset of data assigned to the current instance. Values are just
alpha-numerical strings that are generated using a random function. These key-value pairs
are then later retrieved one by one and the amount of time taken to store and retrieve the
key-value pairs is noted down. MPI barrier statements are inserted between the generation and storage of the key-value pairs and their retrieval, because the storage and retrieval phases are executed and timed one after the other. The following code section shows the relevant code used to store key-value pairs on the Memcached servers.
/* Determine this process' rank and the total number of MPI processes. */
MPI_Comm_rank(MPI_COMM_WORLD, &taskid);
MPI_Comm_size(MPI_COMM_WORLD, &numtasks);

/* Divide the keyspace evenly among the MPI processes. */
nKeyValPairs = atoi(argv[2]);
nSubsetSize  = nKeyValPairs / numtasks;
keyMin = taskid * nSubsetSize;
keyMax = ((taskid + 1) * nSubsetSize) - 1;

start = MPI_Wtime();
while (keyMin <= keyMax)
{
    /* The key is the globally unique index; the value is a random string. */
    sprintf(key, "%d", keyMin);
    gen_random(value, valueSize);
    /* Store the pair on the Memcached servers (no expiration, no flags). */
    rc = memcached_set(memc, key, strlen(key), value, strlen(value),
                       (time_t)0, (uint32_t)0);
    keyMin++;
}
end = MPI_Wtime();
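The retrieval phase mirrors the storage phase. The following is a sketch of what the corresponding retrieval loop could look like; it is illustrative only and assumes the same variables (memc, taskid, numtasks, nSubsetSize) set up as in the storage code above:

int keyLo = taskid * nSubsetSize;           /* same partitioning as above */
int keyHi = ((taskid + 1) * nSubsetSize) - 1;
size_t valueLen;
uint32_t flags;
char *fetched;

MPI_Barrier(MPI_COMM_WORLD);                /* all ranks finish storing first */
start = MPI_Wtime();
while (keyLo <= keyHi)
{
    sprintf(key, "%d", keyLo);
    fetched = memcached_get(memc, key, strlen(key), &valueLen, &flags, &rc);
    /* A NULL result indicates a cache miss or an error. */
    if (fetched != NULL)
        free(fetched);                      /* libMemcached returns malloc'd data */
    keyLo++;
}
end = MPI_Wtime();                          /* end - start is the retrieval time */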
The working of the micro-benchmark application for Redis is also very similar to the one
described above and for brevity, we skip the code sections for the Redis micro-benchmark.
As our Redis client library, we have used Hiredis. Hiredis is a compact, minimalistic C client library for the Redis server. It is the C client library recommended by Redis Labs, and it provides synchronous and asynchronous APIs along with low-level access to the Redis protocol.
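For reference, a minimal Hiredis sketch of the equivalent SET and GET round trips is shown below (an illustration only, not the actual benchmark code; the host name and port are placeholders):

#include <stdio.h>
#include <string.h>
#include <hiredis/hiredis.h>

int main(void)
{
    /* Connect to a single Redis server instance. */
    redisContext *ctx = redisConnect("127.0.0.1", 6379);
    if (ctx == NULL || ctx->err) {
        fprintf(stderr, "Connection error\n");
        return 1;
    }

    /* Store one key-value pair and read it back, mirroring the
       memcached_set/memcached_get calls used in the Memcached benchmark. */
    redisReply *reply = redisCommand(ctx, "SET %s %s", "key_0", "some_value");
    if (reply) freeReplyObject(reply);

    reply = redisCommand(ctx, "GET %s", "key_0");
    if (reply != NULL && reply->type == REDIS_REPLY_STRING)
        printf("key_0 -> %s\n", reply->str);
    if (reply) freeReplyObject(reply);

    redisFree(ctx);
    return 0;
}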
Thus, using these two benchmarks, we performed measurements to analyze and compare
the performance of Memcached and Redis. In the next section, we give technical details of
the hardware and software resources used.
3.1.1.1 Technical Data
For the analysis of our benchmark we have used the crill cluster at the University of
Houston. The crill cluster consists of 16 nodes with four 12-core AMD Opteron (Magny
Cours) processors each (48 cores per node, 768 cores total), 64 GB of main memory and
two dual-port InfiniBand HCAs per node. The cluster has a PVFS2 (v2.8.2) parallel file
system with 15 I/O servers and a stripe size of 1 MB. The file system is mounted onto the
compute nodes over the second InfiniBand network interconnect of the cluster. The cluster
utilizes SLURM as a resource manager. For development we have used the OpenMPI
library (version 2.0.1), Memcached (version 1.4.20), Redis (version 3.2.8), Libmemcached
(version 1.0.18), and Hiredis (version 1.0.0).
In the next few sections, we explain in detail the process that we have used to analyze and
compare the performance of Memcached and Redis using the benchmark applications.
3.1.2 Comparison of Memcached and Redis using our Micro-benchmark
Integrating a database into a mission critical application is often a huge decision and
organizations typically invest a lot of effort in selecting one that suits their needs. Any
such analysis of a database is incomplete without taking into consideration how well it
performs in terms of speed. The amount of time taken to store and retrieve data is one
of the main parameters affecting the efficiency of a database. Hence, our benchmarks
focus mainly on the time taken to store and retrieve a pre-determined amount of data.
However, there can be many factors that affect how fast data is stored and retrieved from
the database. The major parameters that we are concerned with are as follows:
• Responsiveness.
To test for responsiveness, we vary the number of processes storing and retrieving
data to/from the database servers. We believe that this experiment will give us an
idea of how well a server handles parallel requests coming in from multiple client
applications. Ideally, even as the number of parallel client requests increases, the
database should stay responsive. This will ensure that even if clients work together
to complete a single huge task, the performance is not hampered.
• Scalability.
To test for scalability, we vary the number of Memcached/Redis server instances
running in the cluster. This will help us gain insights about how well a database
performs load balancing. We expect that, as the number of servers increases, the data is distributed evenly among the growing number of servers. Hence the time taken by an individual server to search for a data item and return it to the client should also go down, which will in turn lead to lower execution times.
• Functionality in case of varying data load.
This experiment is aimed at understanding how well a database performs irrespective
of the size of data to be stored/fetched. To do this, we incrementally vary the size
of the value to be stored and retrieved from the database. We expect that as data
sizes increase, the execution times will also increase. The main aim of this experiment is to verify that both Memcached and Redis perform well despite increasing data loads.
We believe that analyzing Memcached and Redis based on the above three criteria will
give us an overall understanding of their performance. It will also help quantify the overall
performance levels of the two databases. We have executed our micro-benchmarks on the
crill cluster, keeping in mind the above parameters. In the next few sections, we will
examine and compare the results that we have observed.
3.1.2.1 Varying the Number of Client Processes
In this analysis, our main aim is to observe the performance of Memcached and Redis in
the face of parallel data requests. To do this, we gradually increase the number of client
processes making requests to the servers, while keeping all other aspects of the application
fixed. This experiment has been performed in two parts. In the first part, we vary the
number of client processes while keeping the value size fixed at 1 KB. In the second part
we vary the clients and keep the value size fixed at 32 KB. The reasoning for this two-part
evaluation is explained in the following subsections.
3.1.2.1.1 Using Values of Size 1 KB
For this case, we have generated, stored and retrieved 100,000 key-value pairs where each
key is 20 characters long and each value is of size 1 KB. We have used eight Memcached
and Redis server instances. The number of MPI processes is varied from 1 to 64 in steps of
powers of 2. The processes work together to store and retrieve the data. We have recorded
three readings for the storage and retrieval times and reported the minimum. The minimum storage and retrieval times observed in each case are given in Table 3.1:
Table 3.1: Time taken to store and retrieve data when number of client processes is varied.
Figure 3.1 shows a comparison of the data storage and retrieval times for Memcached
and Redis.
Figure 3.1: Time Taken to Store and Retrieve Data When the Number of Client Processes is Varied (left: minimum time to store data; right: minimum time to retrieve data; both in seconds, plotted against the number of client processes, for Memcached and Redis).
As observed in Figure 3.1, for both Memcached and Redis the time taken to store and retrieve data decreases as the number of processes increases. However, towards the end, the performance of Memcached
is significantly worse than Redis. This leads us to conclude that Memcached is unable to
keep up as the number of simultaneous client requests increases beyond a certain threshold.
Also, when we compare the performance of Memcached and Redis, we can clearly see that
Redis gives better storage and retrieval times as compared to Memcached.
3.1.2.1.2 Using Values of Size 32 KB
In production-level applications, data size typically exceeds 1 KB. Hence, to get an idea of
how Memcached and Redis would perform while integrated with a regular application, we
decided to generate, store and retrieve 100,000 key-value pairs where each key is 20 char-
acters long and each value is of size 32 KB. For this analysis, we have used 16 Memcached
and Redis server instances. As in the previous case, the number of processes is varied from
1 to 64 in steps of powers of 2. We have recorded three readings for storage and retrieval
times and reported the minimum. The minimum storage and retrieval times observed in
each case are given in Table 3.2:
Table 3.2: Time taken to store and retrieve data when number of client processes is varied.
Figure 3.2 compares the storage and retrieval times for Memcached and Redis when the data size is 32 KB and the number of client processes is varied.
Figure 3.2: Time Taken to Store and Retrieve Data When the Number of Client Processes is Varied (32 KB values; left: minimum time to store data; right: minimum time to retrieve data, both in seconds, for Memcached and Redis).
As seen in Figure 3.2, the time taken to store and retrieve 100,000 key-value pairs (each
value of 32 KB) follows the same pattern as the one where we used 1 KB values. How-
ever, in this case Memcached performs significantly worse than Redis. While storing and
retrieving data, we observed that, despite initial spikes, the storage and retrieval times gradually decrease overall as the number of clients increases. Also, when the value size
was increased to 32 KB, we noticed a considerable amount of cache misses for Memcached.
The reason for these misses is the fact that Memcached does not back data to a secondary
store and it is purely an in-memory key-value store.
Thus, for both of the above cases (data of size 1 KB and 32 KB), we conclude that Redis
is better at handling parallel client requests. Redis also performs better than Memcached
while storing and retrieving data. We observed that Redis was more reliable than Mem-
cached and that it strives to achieve data availability in most cases irrespective of data size.
In the next section, we will present the results observed while testing for the scalability of
both databases.
3.1.2.2 Varying the Number of Server Instances
We now run the second experiment by varying the number of Memcached and Redis server
instances running in the cluster. As part of this experiment, we generate, store and retrieve
100,000 key-value pairs with each key of 20 characters and each value of size 1 KB. We run
this experiment using 16 MPI processes and all 16 of them will share the load of storing
and fetching the data. The number of server instances used are 1, 2, 4, 8, 12, and 16. We
have recorded three readings for storage and retrieval times and reported the minimum.
The minimum storage and retrieval times observed in each case are given in Table 3.3:
Table 3.3: Time taken to store and retrieve data when the number of servers is varied.
Figure 3.3 compares the performance of both databases when the number of servers is varied.
Figure 3.3: Time Taken to Store and Retrieve Data When the Number of Servers is Varied (left: minimum time to store data; right: minimum time to retrieve data, both in seconds, plotted against the number of servers, for Memcached and Redis).
From the above graphs, we can see a downward trend in the time taken to store and
retrieve the data. This reflects positively on the scalability and load-balancing abilities of both Memcached and Redis. However, in this case too, we found that Redis
out-performs Memcached.
3.1.2.3 Varying the Size of the Value
The previous two cases focused mainly on analyzing the responsiveness and scalability of
Memcached and Redis. In this case, we subject both Memcached and Redis to increasing
levels of data load and analyze how well they perform regular functions like storing and
retrieving data. For this case we have generated, stored and retrieved 100,000 key-value
pairs where each key is 20 characters long. We have used 16 Memcached and Redis server
instances and 16 MPI processes. All the MPI processes are equally responsible for handling
the load. The value size is varied from 1 KB to 64 KB in steps of powers of 2. We have
recorded three readings for the storage and retrieval times and reported the minimum. The minimum storage and retrieval times observed in each case are given in Table 3.4:
Table 3.4: Time taken to store and retrieve data when the size of the value is varied.
Figure 3.4 shows the difference in data storage and retrieval times for Memcached and Redis when the data load is varied.
Figure 3.4: Time Taken to Store and Retrieve Data when the Value Size is Varied (left: minimum time to store data; right: minimum time to retrieve data, both in seconds, plotted against the value size in bytes, for Memcached and Redis).
In this experiment, we observed that, as the size of individual values was increased, both
Memcached and Redis gradually started taking more time to store and retrieve the data.
Figure 3.4 clearly shows a linear relationship between the size of the data and the storage
and retrieval times. For values of sizes up to 1 KB, both Memcached and Redis perform
reasonably well while storing and retrieving data. However we see that, as the data size
is increased beyond 1 KB, the execution times double with each step. In this experiment
too, we observed that Redis performs better than Memcached while storing and retrieving
data. Also, in case of Memcached, we observed that, as the data size increased, the number
of data misses also increased.
3.1.2.4 Observations and Final Conclusions
In this manner, we have performed a comprehensive analysis of Memcached and Redis
using our OpenMPI micro-benchmark. We analyzed both key-value stores so as to gain an idea of how responsive they are to varying data loads and a varying number of client re-
quests. We also performed experiments to test the scalability of these two databases. Both
key-value stores performed fairly well in the test cases. However, as the number of client
requests and the volume of data to be stored/fetched increased, the difference in perfor-
mance between the two databases became apparent. Redis out-performed Memcached in
all our test cases. Also, we noted that Redis was generally more reliable than Memcached
in terms of data availability. This observation can be attributed to the fact that, contrary
to Redis, Memcached does not have any option to back data to secondary storage. As a
result, incorporating Memcached as an in-memory key-value cache into an application may
lead to more cache misses as the volume of data and incoming requests increases. Taking
into consideration the above results, we conclude that Redis is a much better candidate to
incorporate into applications as an in-memory caching mechanism. In the next section, we
present the details of the air-quality simulation application and the method that we have
used to integrate Redis into this application.
3.2 Air-quality Simulation Application
In the previous section, we analyzed and compared two in-memory key-value stores, namely
Memcached and Redis. After analysis we concluded that Redis outperformed Memcached
in most instances. In this section, we integrate Redis as a caching layer into a data analysis
application to see if it gives any significant performance benefits. The application that we
are using is a Hadoop MapReduce application that is responsible for calculating the eight-
hour rolling average of air-quality data gathered around sites in Houston [6]. We are using
a dataset that contains information about pollutants measured by various sensors placed all
across Texas, from 2009 to 2013. We are using a total of five input files, one for each year.
The total size of the dataset is 48.5 GB and all the input files are stored in HDFS. Each
input file is a comma-separated list of information. Each line consists of the following fields:
year, month, day, hour, min, region, parameter id, parameter name, site, cams, value, and
flag. The problem that we tried to solve is to compute the eight-hour rolling average of O3
concentration in the air around sites in Houston, TX. This problem is broken down into
two parts. In the first part, for every site in Houston, we calculate the hourly average.
In the second part we combine the hourly averages to calculate the eight-hourly averages.
Using Hadoop MapReduce we can solve this problem using two MapReduce jobs.
The first MapReduce job computes the average of O3 concentration around sites in Houston
for every hour. The data present in the input directory is divided into blocks and given
as input to the mapper. It outputs (key,value) pairs which are then used by the reducer
to perform the required aggregation. The key emitted by the mapper is a combination
of siteId, year, day of the year, and the hour. Only data points having the valid flag
set, belonging to sites in Houston, with parameter name as O3 and whose pollutant value
is not null are considered as valid data points for our measurement. The corresponding
pollutant concentration is emitted by the Mapper as the value. The Reducer gets as input
a subset of keys, and each key is associated with a list of values. For each key, the sum of
values and the number of values associated with that key are computed. If the frequency
count for a given hour is above a certain threshold (e.g., greater than five in our case), the
corresponding hourly average is computed. If the frequency is less, a dummy value (“-1”
in our case) is emitted so as to indicate that the value is inconsequential. The second
MapReduce job calculates the eight-hour rolling average of O3 concentration around sites
in Houston. The mapper receives as input, the hourly averages computed in the previous
MapReduce job. The mapper emits eight keys that indicate the eight consecutive hours
starting from the hour indicated in the input key and the average pollutant concentration
value corresponding to the base hour. Special care has been taken to ensure that the hours
emitted by the mapper roll over after 24 hours. Every instance of the reducer receives as input a list of average O3 concentration values associated with a particular hour. For
every hour, the sum of the averages and a frequency count is computed similar to the earlier
MapReduce job. If the total number of valid entries for a given hour is above a certain
threshold (greater than six in our case), the corresponding eight-hour rolling average is
computed. If the frequency count is less, a dummy value (“NA” in our case) is emitted by
the reducer to indicate an inconsequential entry.
Thus, using the above two MapReduce jobs, we have calculated the eight-hourly averages
of O3 concentration. The next section describes our reasoning for integrating Redis as a
caching layer into this application, and gives details of how this was achieved.
3.3 Integration of Redis in Hadoop
In the previous section, we described the air-quality MapReduce application in detail and
pointed out that input data to the application comes from HDFS and the output data is
written to HDFS. However, intermediate data, like the data passed from the first MapRe-
duce job to the second MapReduce job as well as the data passed on from the Mapper to
the Reducer is also written to HDFS. We believe that introducing an in-memory key-value
store as a caching layer may boost the performance of this application, because data will
be read in from RAM and not from the disk. To test this hypothesis, we have decided
to incorporate Redis as an in-memory cache in the air-quality application. To do this, we
have customized the data input source and output destinations to suit our requirements.
In the previous chapter, we discussed that, when a MapReduce job starts, each input file
is divided into splits and each of these splits is assigned to an instance of the Mapper.
Each split is further divided into records of key-value pairs which are then processed by
the Mapper. The ’InputFormat’ class is responsible for configuring how contiguous chunks
of input are generated from blocks in HDFS (or other sources). This class also provides
a ’RecordReader’ class that generates key-value pairs from each individual split. Hadoop
provides a set of standard InputFormat classes, but in our case, we use our own Input-
Format and RecordReader classes so as to read in data from Redis. Similarly, to write
our data to Redis instead of HDFS, we need to provide our own implementation of the
RecordWriter class.
In the new application, we will still have two MapReduce jobs where the first job calculates
the hourly averages and the second calculates the eight-hourly average. The flow of the
new application will be as follows:
• The Mapper of the first MapReduce job reads data from the input file stored in
HDFS and emits (key, value) pairs which are then used by the reducer to perform
the required aggregation.
• The Reducer calculates and emits the corresponding hourly averages to Redis instead
of HDFS. To write to Redis, we use our own customized RecordWriter as follows:
Figure 3.5: Customized RecordWriter to Write Data to Redis
• The Mapper of the second job reads in the hourly averages from Redis and emits
eight keys that indicate the eight consecutive hours starting from the hour indicated
in the input key and the input average value corresponding to the base hour. The
data is emitted to Redis instead of HDFS. To read the input data from Redis, we implement our own
RecordReader as follows:
Figure 3.6: Customized RecordReader to Read Data from Redis
• Finally, the Reducer of the second MapReduce job, reads in the output of the previous
step from Redis and calculates and emits the final eight hourly average to HDFS.
To integrate Redis into our Hadoop application, we make use of a Java Redis client library
called Jedis which is the officially recommended Java client by Redis Labs. We have
then implemented customized RecordReader and RecordWriter classes to read/write data
to/from Redis using Jedis.
This concludes the description of the air-quality simulation application and of our own customized version using Redis. In the next section, we present the details of
the hardware and software resources used for our analysis.
3.3.1 Technical Data
The Whale cluster located at the University of Houston is used to perform analyses for the
research work. It has 57 Appro 1522H nodes (whale-001 to whale-057). Each node has two
2.2 GHz quad-core AMD Opteron processors (8 cores total) with 16 GB main memory and
Gigabit Ethernet. The cluster uses a 144 port 4xInfiniBand DDR Voltaire Grid Director
ISR 2012 switch and two 48-port HP GE switches for the network interconnect. For
the storage, a 4 TB NFS /home file system and a 7 TB HDFS file system (using triple
replication) is used. For development we have used Hadoop (version 2.7.2), Redis (version
3.2.8), and Jedis (version 2.8).
In the next section we compare the performance of both air-quality simulation applications
described previously and present our conclusion.
3.4 Results and Comparison
In the previous section, we discussed in detail the Hadoop air-quality application and also
our customized implementation with in-memory caching. In this section, we analyze the
performance of the two applications with respect to the time taken to complete execution.
We then compare the execution times of both applications to see if integrating Redis as a
caching layer provides any benefits.
To perform our analysis, we have used the whale cluster at the University of Houston. For
our analysis, we have varied the number of reducers from 1 to 20 in steps of 5. We have
executed both applications three times on the whale cluster and reported the minimum
of the three. The results that we observed for the original air-quality application are in
Table 3.5:
Table 3.5: Time taken to execute original air-quality application
No. of Reducers Execution time
1 5min, 9sec
5 3min, 33sec
10 3min
15 2min, 42sec
20 2min, 41sec
The execution times that we observed for the air-quality application in which we integrated
Redis are in Table 3.6:
Table 3.6: Time taken to execute the air-quality application using Redis
No. of Reducers Execution time
1 5min, 49sec
5 3min, 43sec
10 3min, 43sec
15 2min, 45sec
20 2min, 44sec
Figure 3.7 will enable us to understand the execution timings better. In the graph, we
have compared the total execution time taken by both air-quality applications.
Figure 3.7: Comparison of Execution Times (in minutes) for Air-quality Applications Using HDFS and Redis (execution time in minutes plotted against the number of reducers).
Contrary to our expectations, we observed that integrating Redis into our application did
not provide any added performance benefits. In fact, the total time taken by the application
using in-memory caching is higher than that of the original application. We believe that
the delay is being introduced due to the fact that we are using a single Redis hash to store
data. As a result, this becomes a bottleneck when a client tries to write multiple key-value
pairs to the database. When a client wants to write data to Redis, it connects to a Redis
server instance and requests access to the hash. The client then waits until it receives
a response from the server before sending the next request. Essentially all requests from
a single client are serialized and delay is introduced in completing the requests. When
we use more than one client, these delays accumulate and we see poor performance. To
solve this problem, Redis provides an advanced feature called pipelining [35]. Using Redis
pipelining it is possible to send multiple commands to the server without waiting for replies
from the server. This essentially means that a client buffers up a bunch of commands and
ships them to the server in one go. The benefit here is that we save network round trip
time for every command. However, due to the lack of proper documentation for Jedis and due to time constraints, we could not explore pipelining of client requests (a brief sketch of the idea is given at the end of this section), but we wish to continue investigating this option. We believe that introducing pipelining will give better
results, and we will see the true benefits of using Redis as a caching layer in scientific
applications. With this, we conclude the analysis and result section of this thesis and in
the next chapter, we conclude this thesis by summarizing our analysis, observations and
findings.
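As a brief illustration of the pipelining idea discussed above, the following sketch uses Hiredis, the C client from our micro-benchmark (the air-quality application itself uses Jedis, so this is not code from that application): a batch of SET commands is buffered locally and the replies are collected afterwards, so that only one network round trip is paid per batch instead of one per command.

#include <hiredis/hiredis.h>

void pipelined_store(redisContext *ctx, int nPairs)
{
    redisReply *reply;
    int i;

    /* Buffer all SET commands on the client side; nothing is sent yet. */
    for (i = 0; i < nPairs; i++)
        redisAppendCommand(ctx, "SET key:%d value:%d", i, i);

    /* Reading the replies flushes the buffered commands to the server and
       then consumes one reply per queued command. */
    for (i = 0; i < nPairs; i++) {
        if (redisGetReply(ctx, (void **)&reply) == REDIS_OK && reply != NULL)
            freeReplyObject(reply);
    }
}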
Chapter 4
Conclusions and Outlook
In recent years, the industry as well as academia has faced an unprecedented data explo-
sion and performing analyses on these large datasets is becoming increasingly common.
Data analysis is performed so as to find previously unknown correlations between datasets; at the same time, there is a tremendous need to make proper use of the available
computing resources. Also, traditional RDBMS databases are unable to keep up with the
huge volume of data that is being generated. To complicate matters further, data being
generated is obtained from various sources and may be structured or unstructured. NoSQL
databases overcome many of the shortcomings of RDBMS systems and have emerged as
a solution to store and analyze big data. There are many types of NoSQL databases and
lately key-value NoSQL databases are being increasingly used due to their simplicity and
ease of use. In-memory key-value stores are a special kind of key-value database that re-
tain data in main memory instead of on secondary storage. This is done so as to speed-up
access to data. As a result, they are being used in compute intensive applications as an
intermediate caching layer to store intermediate and final results. This ensures faster read
times and hence enhances the performance of the application. The main focus of this thesis
is to analyze and compare the various in-memory key-value stores available in the market
today.
We have analyzed popular in-memory key-value stores like Memcached, Redis, Riak, Hazel-
cast, Aerospike, and MICA. We have then compared them based on features like in-memory
caching, support for multiple parallel requests, open-source licensing, and ease of access from remote locations. Based on our analysis, we were most interested in studying Redis and Memcached
in detail. To do this we have developed a micro-benchmark using C and the OpenMPI
library so as to analyze and compare Memcached and Redis. Based on our analysis, we
concluded that Redis was more scalable and reliable as compared to Memcached. Also,
we noticed Redis to be more resilient in the face of large data requests. Based on this
observation, we concluded Redis to be the better of the two.
To test how well Redis performs as an in-memory cache, we have integrated it into a Hadoop
MapReduce application that measures the eight hourly average of air-quality around sites
in Houston. We have used a 48.5 GB dataset that contains data collected from various
sites in Texas from 2009 to 2013. This task has been achieved in two parts using two
MapReduce jobs. The first job is responsible for calculating hourly averages and the second
job calculates the final eight hourly averages. The main aim was to compare the execution
times of this application with a similar Hadoop application that does not use in-memory
caching. Although, we observed promising results for the second part of the application we
observed that integrating Redis as a caching layer did not offer any performance benefits.
However, we believe that this problem can be solved using an advanced feature called Redis
pipelining and we wish to explore this further.
In the future, we are interested in benchmarking other in-memory key-value stores like Riak.
We also want to integrate Memcached as a caching layer into a data analysis application
to observe its performance in a real-world scenario. Further, we would like to explore other components of the NoSQL eco-system so as to improve the analytical abilities of big data applications.
Bibliography
[1] Rick Cattell. Scalable SQL and NoSQL data stores. SIGMOD Rec., 39(4):12–27, May 2011.
[2] Ameya Nayak, Anil Poriya, and Dikshay Poojary. Type of NoSQL databases and its comparison with relational databases. International Journal of Applied Information Systems, 5(4), March 2013. Published by Foundation of Computer Science, New York, USA.
[3] Key-value database - Wikipedia. https://en.wikipedia.org/wiki/Key-value_database. [Online; accessed 16-Mar-2017].
[4] Brad Fitzpatrick. Distributed caching with memcached. Linux J., 2004(124):5–, August 2004.
[5] Redis. https://redis.io/. [Online; accessed 23-Dec-2016].
[6] Haripriya Ayyalasomayajula, Edgar Gabriel, Peggy Lindner, and Daniel Price. Air quality simulations using big data programming models. In Big Data Computing Service and Applications (BigDataService), 2016 IEEE Second International Conference on, pages 182–184. IEEE, 2016.
[7] Konstantin Shvachko, Hairong Kuang, Sanjay Radia, and Robert Chansler. The Hadoop distributed file system. In Proceedings of the 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), MSST ’10, pages 1–10, Washington, DC, USA, 2010. IEEE Computer Society.
[8] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified data processing on large clusters. Commun. ACM, 51(1):107–113, January 2008.
[9] Introduction to big data: Types, characteristics & benefits. http://www.guru99.com/what-is-big-data.html. [Online; accessed 04-Nov-2016].
[10] ACID - Wikipedia. https://en.wikipedia.org/wiki/ACID. [Online; accessed 21-February-2017].
[11] The programming language Lua. https://www.lua.org/. [Online; accessed 27-Dec-2016].
[12] Using Redis as an LRU cache - Redis. https://redis.io/topics/lru-cache. [Online; accessed 24-Dec-2016].
[13] Gossip protocol - Wikipedia. https://en.wikipedia.org/wiki/Gossip_protocol. [Online; accessed 11-Jan-2017].
[14] Memcached - a distributed memory object caching system. https://memcached.org/. [Online; accessed 27-Nov-2016].
[15] Rajesh Nishtala, Hans Fugal, Steven Grimm, Marc Kwiatkowski, Herman Lee, Harry C. Li, Ryan McElroy, Mike Paleczny, Daniel Peek, Paul Saab, David Stafford, Tony Tung, and Venkateshwaran Venkataramani. Scaling Memcache at Facebook. In Proceedings of the 10th USENIX Conference on Networked Systems Design and Implementation, NSDI ’13, pages 385–398, Berkeley, CA, USA, 2013. USENIX Association.
[16] Riak KV Enterprise technical overview. http://info.basho.com/rs/721-DGT-611/images/RiakKV%20Enterprise%20Technical%20Overview-6page.pdf. [Online; accessed 01-Feb-2017].
[17] Consistent hashing - Wikipedia. https://en.wikipedia.org/wiki/Consistent_hashing. [Online; accessed 31-Dec-2016].
[18] An architect’s view of Hazelcast IMDG - hazelcast.com. https://hazelcast.com/resources/architects-view-hazelcast/. [Online; accessed 02-Feb-2017].
[19] Java virtual machine - Wikipedia. https://en.wikipedia.org/wiki/Java_virtual_machine. [Online; accessed 28-Feb-2017].
[20] Hazelcast documentation. http://docs.hazelcast.org/docs/3.3/manual/pdf/hazelcast-documentation-3.3.5.pdf. [Online; accessed 03-Feb-2017].
[21] Hyeontaek Lim, Dongsu Han, David G. Andersen, and Michael Kaminsky. MICA: A holistic approach to fast in-memory key-value storage. In 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI 14), pages 429–444, Seattle, WA, 2014. USENIX Association.
[22] Data Plane Development Kit. http://dpdk.org/. [Online; accessed 03-Mar-2017].
[23] Non-uniform memory access - Wikipedia. https://en.wikipedia.org/wiki/Non-uniform_memory_access. [Online; accessed 07-Mar-2017].
[24] MICA: A holistic approach to fast in-memory key-value storage. http://www.slideserve.com/schuyler/mica-a-holistic-approach-to-fast-in-memory-key-value-storage. [Online; accessed 04-Feb-2017].
[25] Aerospike architecture. http://www.aerospike.com/docs/architecture. [Online; accessed 04-Mar-2017].
[26] DB-Engines ranking - popularity ranking of key-value stores. https://db-engines.com/en/ranking/key-value+store. [Online; accessed 03-Feb-2017].
[27] MPI: A Message-Passing Interface standard. http://mpi-forum.org/docs/mpi-3.1/mpi31-report.pdf. [Online; accessed 13-Dec-2016].
[28] Application programming interface - Wikipedia. https://en.wikipedia.org/wiki/Application_programming_interface. [Online; accessed 02-March-2017].
[29] Inter-process communication - Wikipedia. https://en.wikipedia.org/wiki/Inter-process_communication. [Online; accessed 26-Feb-2017].
[30] Apache Hadoop. http://hadoop.apache.org/. [Online; accessed 10-Feb-2017].
[31] Locality of reference - Wikipedia. https://en.wikipedia.org/wiki/Locality_of_reference. [Online; accessed 21-February-2017].
[32] Apache Hadoop YARN. https://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-site/YARN.html. [Online; accessed 17-Apr-2017].
[33] Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. The Google file system. In Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles, SOSP ’03, pages 29–43, New York, NY, USA, 2003. ACM.
[34] Hadoop word count example. https://cs.calvin.edu/courses/cs/374/exercises/12/lab/. [Online; accessed 12-Dec-2016].
[35] Redis pipelining. https://redis.io/topics/pipelining. [Online; accessed 12-Apr-2017].
[36] M. Berezecki, E. Frachtenberg, M. Paleczny, and K. Steele. Many-core key-value store. In Proceedings of the 2011 International Green Computing Conference and Workshops, IGCC ’11, pages 1–8, Washington, DC, USA, 2011. IEEE Computer Society.
[37] Berk Atikoglu, Yuehai Xu, Eitan Frachtenberg, Song Jiang, and Mike Paleczny. Workload analysis of a large-scale key-value store. In Proceedings of the 12th ACM SIGMETRICS/PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems, SIGMETRICS ’12, pages 53–64, New York, NY, USA, 2012. ACM.
[38] Shared memory hash table - Vishesh Handa’s blog. http://vhanda.in/blog/2012/07/shared-memory-hash-table/. [Online; accessed 01-Dec-2016].
[39] Tom White. Hadoop: The Definitive Guide. O’Reilly Media, Inc., second edition, October 2010.
[40] Srinath Perera and Thilina Gunarathne. Hadoop MapReduce Cookbook. Packt Publishing, first edition, February 2013.
[41] Introduction to MapReduce and Hadoop. http://people.csail.mit.edu/matei/talks/2010/amp_mapreduce.pdf. [Online; accessed 17-Apr-2017].
[42] Edgar Gabriel. COSC 6374 Parallel Computation, Fall 2015. http://www2.cs.uh.edu/~gabriel/courses/cosc6374_f15/index.shtml. [Online; accessed 17-Apr-2017].
[43] Edgar Gabriel. COSC 6339 Big Data Analytics, Spring 2015. http://www2.cs.uh.edu/~gabriel/courses/cosc6339_s15/index.shtml. [Online; accessed 17-Apr-2017].
[44] RDBMS, 2016. [Online; accessed 28-November-2016].
[45] Emilio Coppa. Hadoop architecture overview. http://ercoppa.github.io/HadoopInternals/HadoopArchitectureOverview.html. [Online; accessed 17-Apr-2017].
[46] Open MPI: Open source high performance computing. https://www.open-mpi.org/. [Online; accessed 17-Apr-2017].
[47] Jeffrey M. Squyres. The architecture of open source applications (volume 2): Open MPI. http://www.aosabook.org/en/openmpi.html, 2015. [Online; accessed 17-Apr-2017].