AN APPROACH FOR SECURE AND LEAKAGE RESILIENT SEARCH OVER ENCRYPTED
NOSQL DATABASES IN A PUBLIC CLOUD
by
MOHAMMAD AHMADIAN
M.S. University of Central Florida, 2014
M.S. Amirkabir University of Technology, 2009
A Proposal submitted in partial fulfilment of the requirements
for the degree of Doctor of Philosophy
in the Department of Electrical Engineering and Computer Science
in the College of Engineering and Computer Science
at the University of Central Florida
Orlando, Florida
Fall Term
2016
Major Professor: Dan C. Marinescu
© 2016 Mohammad Ahmadian
ABSTRACT
Processing the vast volume of data generated by web and mobile applications requires a scalable
and flexible data management system. Database-as-a-Service (DBaaS) is a new paradigm offered by
cloud computing that promises cost-effective and efficient database functionality. However,
outsourcing data storage to clouds significantly changes the threat landscape and adds a new
dimension to data security. While many traditional data processing threats remain, DBaaS
introduces new challenges such as confidentiality violations and information leakage caused by
privileged malicious insiders. We consider the problem of building a secure DBaaS on top of a
public cloud infrastructure where the Cloud Service Provider (CSP) is not completely trusted by
the data owner. We present a high-level description of several architectures that combine recent
cryptographic primitives to achieve our goal. In this thesis a novel searchable security scheme
is proposed to support secure query processing in the presence of a malicious cloud insider
without disclosing sensitive information. A comprehensive database security scheme comprises
more than just encryption, and this thesis therefore addresses information leakage prevention
as a key challenge. The main contributions of our work are:
i. A searchable security scheme for non-relational databases of the cloud DBaaS; ii. Leakage
minimization in the untrusted cloud. The analysis of experiments that employ a set of established
cryptographic techniques to protect databases and minimize information leakage shows that the
performance of our solution is bounded by communication cost and not by cryptographic computation.
To Ghazal & my dear parents.
ACKNOWLEDGMENTS
I would like to express my sincere gratitude to my advisor Prof. Dan C. Marinescu for the
continuous support of my Ph.D. study and research, for his patience, motivation, enthusiasm, and
immense knowledge. His guidance helped me throughout the research and the writing of this thesis.
Moreover, I would like to thank the rest of my thesis committee: Prof. Joseph Brennan, Dr. Mark
Heinrich, and Dr. Pawel Wocjan, for their encouragement and insightful comments.
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1: INTRODUCTION AND MOTIVATION
1.1 Cloud Relational Databases
1.1.1 Searchable Security Scheme For RDBMS
1.1.2 Cloud Data Storage And Management Components
1.2 Cloud NoSQL Databases
1.2.1 Data Models For NoSQL Databases
1.2.2 Searchable Security Scheme For NoSQL Databases
1.3 Leakage Proof Data Processing In Public Cloud
1.3.1 Cryptosystems For Outsourced Data Store
1.4 Roadmap
CHAPTER 2: RELATED WORK
CHAPTER 3: RESEARCH OBJECTIVES AND APPROACH
3.1 Research Objectives
3.2 Motivation
3.3 Threat Model
3.4 JSON And BSON
3.5 Problem Statement
CHAPTER 4: CURRENT WORK AND PRELIMINARY RESULTS
4.1 SecureNoSQL
4.1.1 The Proposed Construction: SecureNoSQL Scheme
4.1.2 Security Plan
4.1.3 Processing Queries On Encrypted Data
4.1.4 Measurements And Experimental Results
4.2 Leakage Prevention In DBaaS
4.2.1 Problem Statement
CHAPTER 5: CONCLUSION
5.1 Work In Progress And Tasks Time Table
5.2 Future Work
LIST OF REFERENCES
LIST OF FIGURES
4.1 Architecture of SecureNoSQL.
4.2 The high-level structure of the security plan.
4.3 Structure and description of Collection: (a) The chart outlines the structure of a
collection, containing the name of the collection and the names of all fields, which are
considered meta-data and thus should be protected with a proper cryptographic module, together
with the pointer to a cryptomodule, the encryption key, and the initialization vector used for
the encryption of the items. (b) The description of a collection and its security parameters in
the designed JSON-based language. In this specific case the Advanced Encryption Standard in
deterministic (AES-DET) mode with a 128-bit key and an initialization vector (IV) is assigned
to encrypt the name of the collection and the field names.
4.4 Structure and description of Cryptographic modules: (a) Security plan with the second
section, the cryptographic modules, expanded. The attributes included for each module are: name,
type, key size, key, input and output size. (b) The OPE encryption including the cryptosystems
and their attributes. The proxy applies these modules using the key-value pairs (KVP).
4.5 Structure and description of Data element: (a) The chart outlines the structure of the data
elements, containing attributes such as name, type and value, and then introduces the security
parameters for each data element. (b) The data element section of a sample database,
represented in the designed notation. A data item has 7 fields: id, name, salary, balance, ccn,
ssn, and email. The id, name, email and salary are required fields.
4.6 Structure and description of Mapping cryptographic modules to the Data element: (a) Security
plan with the fourth section expanded. This section establishes a correspondence between the
data fields and the cryptographic modules used to encrypt and decrypt them. (b) The mapping
section of the schema for a sample database with 7 fields. For example, the id and the name will
be encrypted with OPE 128 bit and AES-DET, respectively.
4.7 SecureNoSQL applied to: (a) The key-value data model; Key_1, ..., Key_n are all encrypted
using the cryptographic module z while the corresponding values, Value_1, ..., Value_n, are
encrypted with cryptographic modules 1, 2, ..., n, respectively. (b) The document store data
model; the meta-data such as the collection name are encrypted, as are the attributes, with
their assigned cryptographic modules.
4.8 The validation process of input data against the security plan on the client side.
4.9 Security plan designed for sample input: (a) Data element section of the sample security
plan. (b) Output of JSON data validation for the sample database.
4.10 The query db.customers.find({salary:{$gt:5000}, balance:{$lt:2000}}) received from an
application. (a) The parsing tree of the query. (b) The cryptographic modules applied to the
data elements according to the schema definition.
4.11 Query processing time in milliseconds (ms) for the unencrypted database and for the
encrypted databases when the 32-bit keys are encrypted as 64, 128, 256 and 512-bit integers.
5.1 Estimated work plan and timeline
LIST OF TABLES
2.1 Information leakage management methods comparison
4.1 Overhead of encryption upon security level
4.2 Sample queries and their corresponding encrypted version
5.1 List of publications
CHAPTER 1: INTRODUCTION AND MOTIVATION
Cloud computing is an appealing alternative for data processing but, at the same time, it raises
serious concerns regarding the security of sensitive data. Is it feasible to outsource computation
without revealing private information? This thesis investigates this question. Web and mobile
applications based on cloud services are ubiquitous, and Database as a Service (DBaaS) is one of
the most important cloud services for data storage. Outsourcing the storage and processing of
private data to a third party is a high-risk decision that makes applications vulnerable to
unauthorized access by external attackers or by malicious cloud insiders. Security and privacy in
the cloud environment are a critical concern for cloud users. The majority of cloud service
providers (CSPs) support features that allow system administrators to deploy a basic level of
security controls for hosted datasets. Nevertheless, there seems to be no accepted foolproof
solution to prevent unauthorized access by malicious insiders who have unlimited access to the
entire system. The security and privacy threats associated with cloud computing negatively affect
all cloud services and act as an inhibitor for potential cloud customers. Many cloud users have
sensitive data related to their enterprise, so any unauthorized data access can devastate their
business. We propose a solution that satisfies the security requirements of applications using
DBaaS for database functionality, either as relational (RDBMS) or non-relational (NoSQL)
database management systems.
In the proposed searchable security scheme there are three interested parties: the data owner, the
cloud service provider and the user’s applications. This thesis assumes that all three parties
interact in a public cloud environment. The proposed scheme is easily adaptable to hybrid and
community cloud environments, where the security risk is lower than in the public cloud. Most
CSPs, such as Amazon Web Services (AWS), Google AppEngine and Microsoft Azure, provide
full-featured RDBMS and NoSQL database systems. We first examine the security requirements of
DBaaS. Then, we report on our scheme, called “SecureNoSQL”, for query processing over encrypted
NoSQL databases; it is the first such scheme for NoSQL databases.
1.1 Cloud Relational Databases
RDBMS are widely used by most organizations to support data management for many applications.
Many IT professionals are trained to implement applications using RDBMS. For these reasons CSPs
have offered RDBMS as a service since the early stages of their development. Nowadays, a cloud
RDBMS such as the Relational Database Service (RDS) offered by Amazon Web Services (AWS)
provides cost-efficient database functionality. AWS RDS provides six popular database engines to
choose from: Amazon Aurora, Oracle, Microsoft SQL Server, PostgreSQL, MySQL and MariaDB.
1.1.1 Searchable Security Scheme For RDBMS
RDBMS are used by operational database systems for On-Line Transaction Processing (OLTP).
Cloud computing adopted the RDBMS, equipped it with more features and delivers it as a fully
managed and integrated service. Therefore, the cloud RDBMS is ideally suited for complex
query-intensive analytic workloads. In cloud DBaaS the application developer plays a more
important role than in on-premise computation because the cloud eliminates the need for database
administrators; this can be seen as another reason for the popularity of DBaaS. Major CSPs such
as Amazon Web Services, Microsoft Azure and Google Cloud Platform offer a broad range of cloud
storage and data management services that help organizations move faster from on-premise
computing to cloud computing.
1.1.2 Cloud Data Storage And Management components
Cloud storage is a cost-effective and scalable service that allows customers to store and access
data anywhere and anytime using the Internet. In the cloud storage model, the CSP is considered a
third party that provides a reliable storage service for users who pay on a per-use basis. CSPs
store multiple copies of data redundantly across different geographical locations to reduce
access time and facilitate disaster recovery. Although cloud storage is cost-effective, it poses
significant security and privacy risks. Once data is in cloud storage, its owner no longer
controls where it is stored or how it is protected against unauthorized access. For instance, AWS
offers an array of flexible and affordable data management services including Simple Storage
Service (S3), SimpleDB, RDS, Elastic Compute Cloud (EC2) and DynamoDB.
Amazon Simple Storage Service: AWS S3 uses a simple data model consisting of two types of
storage entities: objects and buckets. Objects, like files, contain data and metadata, but
objects are not organized in a hierarchy and every object exists at the same level. A bucket is a
logical unit of storage used to store objects. From the security viewpoint, S3 only provides a
rule-based access control mechanism that either grants or denies access permission to S3 objects
or buckets. Access control alone does not protect S3 data against a malicious insider. Data in a
bucket can be encrypted to protect it from both insider and outsider threats.
Amazon Elastic Compute Cloud: EC2 provides virtual servers on demand that a user can manage like
physical machines. EC2 instances can be created through the API or the management console. AWS
has defined a unit for measuring the processing power of an EC2 instance to ensure that its
performance remains consistent over time. AWS offers a variety of EC2 instance types that provide
different levels of performance and resources, with corresponding differences in pricing. EC2
uses the public key of the key pair associated with the AWS account to secure login, so that only
someone with the corresponding private key can access the EC2 instance. In addition, the traffic
of an EC2 instance can be managed with security groups, which are essentially collections of
rules.
1.2 Cloud NoSQL Databases
The name NoSQL given to the storage model discussed in this thesis is misleading. Michael
Stonebraker notes that “blinding performance depends on removing overhead. Such overhead has
nothing to do with SQL, but instead revolves around traditional implementations of ACID
transactions, multi-threading, and disk management” [46]. The “soft-state” approach in the design
of NoSQL databases allows data to be inconsistent and transfers to the application developer the
task of implementing only the subset of the ACID properties required by a specific application.
NoSQL ensures that data will be “eventually consistent” at some future point in time, instead of
enforcing consistency at the time when a transaction is “committed”. Data partitioning among
multiple storage servers and data replication are also tenets of the NoSQL philosophy; they
increase availability, reduce the response time, and enhance scalability.
Scalability and availability are critical requirements for E-commerce, social networks and other
applications dealing with very large data sets. Companies heavily involved in cloud computing
discovered early on that traditional RDBMS cannot handle the massive amount of data and the
real-time demands of on-line applications critical for their business model. The RDBMS schema is
of little use for such applications, and conversion to NoSQL databases seems a much better
approach.
Big data and mobile applications are the two most important growth areas of cloud computing. Big
data growth can be viewed as a three-dimensional phenomenon; it implies an increased volume
of data, requires an increased processing speed to produce more results, and, at the same time, it
involves a diversity of data sources and data types [37]. A delicate balance between data security
and privacy on the one hand and efficiency of database access on the other is critical for such
applications. Many cloud services used by these applications operate under tight latency
constraints; moreover, these applications have to deal with extremely high data volumes and are
expected to provide reliable services for very large communities of users. Nowadays NoSQL
databases are widely supported by cloud service providers. Their advantages over traditional
databases are critical for big data applications.
Amazon DynamoDB: AWS offers DynamoDB, a fully managed, fast and flexible NoSQL database
service that provides fast performance and consistent scalability. DynamoDB supports both the
document and the key-value store models, which are very flexible data models; this makes
DynamoDB a strong choice for mobile, web, gaming and Internet of Things (IoT) applications. The
AWS Management Console or the Amazon DynamoDB Application Program Interface (API) can be used to
scale capacity up or down without downtime or performance degradation.
1.2.1 Data Models For NoSQL Databases
In recent years more than 120 NoSQL databases have been created, including CouchDB, Neo4j,
VaultDB, MongoDB, Cassandra, and BigTable¹, and all of these are referred to by the umbrella term
NoSQL. They are classified based on their data models. Choosing a proper data model has an
extremely important influence on the performance and scalability of the data store. Since our
work is tightly connected to the NoSQL data models, we give a brief definition of each model for
the sake of precision.
¹ For a complete list, refer to http://www.nosql-database.org/
Key-value stores: This simple data model resembles an associative map or a dictionary where a
key uniquely identifies the value. The value can be a primitive data type such as a string or an
integer, an array, or an object. This model is effective for storing distributed data; thus, it
is highly scalable, and this motivates its use by cloud data management systems. Systems such
as Bigtable [14], CouchDB², DynamoDB [44], MemcacheDB³ and Redis⁴ use this model. This
model is not suitable for applications demanding relations or structures.
Column-family stores: In this model the data are stored in a column-oriented style and the dataset
comprises several rows; each row is indexed by a unique key, the so-called primary key. Each row
is composed of a set of column families, and different rows can have different column families.
As in key-value stores, the row key resembles the key, and the set of column families resembles
the value identified by the row key. However, each column family further acts as a key for the one
or more columns that it holds, whereas each column consists of a key-value pair. Hadoop HBase
directly implements the Google Bigtable concepts, whereas Amazon SimpleDB and DynamoDB
contain only a set of column name-value pairs in each row, without having column families.
SimpleDB and DynamoDB are therefore sometimes classified as key-value stores. Typically, the data
belonging to a row is stored together on the same server node. Cassandra provides the additional
functionality of super-columns, which are formed by grouping various columns together. Cassandra
can store a single row across multiple server nodes using composite partition keys. In
column-family stores, the configuration of column families is typically performed during
start-up. A column family in different rows can contain different columns. A prior definition of
columns is not required and any data type can be stored in this data model. In general,
column-family stores provide more powerful indexing and querying than key-value stores because
they are based on column families and columns in addition to row keys. As with key-value stores,
any logic requiring relations must be implemented in the client application.
Document stores: In this model the internal structure of the stored data is visible to the
database, whereas in a key-value store the data are opaque to the database. The database engine
uses meta-data to create a higher level of granularity and delivers a richer experience for
modern programming techniques. Document-oriented databases use a key to locate a document inside
the data store. Most document stores use JSON or BSON (Binary JSON). Document stores are suited
for applications where the input data can be represented in a document format. A document can
contain complex data structures such as nested objects. A document store allows documents to be
grouped into collections. A document in a collection should have a unique key. Unlike an RDBMS,
where every row in a table follows the same schema, each document in a document store may have a
different structure. Document stores provide the capability of indexing documents based on the
primary key as well as on the contents of the documents. Like key-value stores, they are
inefficient in multiple-key transactions involving cross-document operations.
² http://couchdb.apache.org
³ http://www.Memcached.org
⁴ http://redis.io
Graph Databases: This data model, based on graphs, can be used to represent complex structures
and highly connected data often encountered in real-world applications. In graph databases, the
nodes and edges have individual properties consisting of key-value pairs. Graph databases are
a good alternative for social networking applications, pattern recognition, dependency analysis
and recommendation systems. Some graph databases such as Neo4J⁵ support ACID⁶ properties.
Graph data stores are not as efficient as other NoSQL data stores and do not scale well
horizontally when related nodes are distributed to different servers.
⁵ http://neo4j.com
⁶ ACID (Atomicity, Consistency, Isolation, Durability) properties guarantee that transactions are
processed reliably.
1.2.2 Searchable Security Scheme For NoSQL databases
Data security in a cloud platform is critical for the applications running on public clouds
because multiple virtual machines (VMs) often share the same physical platform [50, 51, 52].
Classic cryptographic primitives can protect data while in storage, but even encrypted data has
to be decrypted for processing. This is particularly troubling when searching databases
containing personal information such as healthcare or financial records, because the entire
plaintext database is then exposed to attack. This motivates us to investigate methods for
searching encrypted NoSQL databases. Though general computations with encrypted data are
theoretically feasible using the algorithms for Fully Homomorphic Encryption (FHE) [24], this is
by no means a practical solution at this time. Existing algorithms for homomorphic encryption
increase the processing time of encrypted data by many orders of magnitude compared with the
processing of plaintext data. A recent implementation of FHE [28] requires about six minutes per
batch; after optimization this time drops to almost one second for computing a simple operation
on encrypted data [20]. Other related methods are Learning With Errors (LWE) [7], lattice-based
encryption [39, 10], and Attribute-Based Encryption [26].
1.3 Leakage Proof Data Processing In Public Cloud
Encryption is a common practice to protect the privacy of data and queries, but encrypted data
and queries remain vulnerable to information leakage in a cloud platform. A database can be
encrypted by the data owner before outsourcing it to the cloud in such a way that client queries
can still be processed on the transformed data. However, encryption does not hide all information
about the encrypted data: for instance, the collection name (or table name in an RDBMS) and the
names, number and lengths of the fields involved in a query often reveal information about the
encrypted data. Moreover, a cloud insider can infer sensitive information from a sequence of
queries. This type of attack on an encrypted database is classified as information leakage. An
outsourced encrypted data set should leak as little sensitive information as possible. An
acceptable level of security for searchable encryption can be achieved with the Oblivious RAM
(ORAM) [25, 40, 34] method. The major problem of ORAM is its inefficiency: it incurs a high
computational cost and intense communication between client and server.
We will argue that any query is an object with several features. Therefore, any query can be
considered a point in an n-dimensional feature space. We then use a linear classifier with a
training data set to extract implicit information from the encrypted dataset. Every query is
distinct from others in terms of measurable features, such as the length of the query string, the
number of fields involved, the number of objects, the operations between objects, the aggregate
functions, the domain of the query and the timing information. These features form a fingerprint
by which each unique query can be identified. Furthermore, the fingerprint of each specific
client can be obtained with high confidence from the combination of the fingerprints of its most
periodically issued queries. In this research work we will formulate the information leakage
from encrypted data sets; then we will define metrics and a cost coefficient for leakage
prevention solutions in order to measure their performance.
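The idea of a query as a point in a feature space can be sketched as follows. This is only an illustration under stated assumptions: the four features chosen, the function names `query_features` and `linear_score`, and the weights and bias are hypothetical and are not the feature set or classifier of this work. The first query is the one used in Figure 4.10; the second is an invented example.

```python
import re

# Operators treated as aggregates in this sketch (an assumption, not an exhaustive list).
AGGREGATES = {"$sum", "$avg", "$min", "$max", "$group"}

def query_features(query: str) -> list:
    """Turn a MongoDB-style query string into a small numeric feature vector."""
    # Field names: identifiers followed by ':' that are not '$' operators.
    fields = re.findall(r"(?<![\$\w])(\w+)\s*:", query)
    # Operators: tokens that start with '$', such as $gt or $lt.
    operators = re.findall(r"\$\w+", query)
    aggregates = [op for op in operators if op in AGGREGATES]
    # Feature order: query length, number of fields, number of operators, number of aggregates.
    return [len(query), len(fields), len(operators), len(aggregates)]

def linear_score(features, weights, bias):
    """Decision value w . x + b of a linear classifier."""
    return sum(w * x for w, x in zip(weights, features)) + bias

q1 = "db.customers.find({salary:{$gt:5000}, balance:{$lt:2000}})"
q2 = 'db.customers.find({name:"Bob"})'
f1, f2 = query_features(q1), query_features(q2)

# Hypothetical weights: separate range queries from simple equality lookups.
weights, bias = [0.0, 1.0, 1.0, 0.0], -2.5
is_range_query = [linear_score(f, weights, bias) > 0 for f in (f1, f2)]
```

In practice the weights would be learned from a labelled training set of observed queries rather than chosen by hand, and timing information would be added as a further feature.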
1.3.1 Cryptosystems For Outsourced Data Store
Data in the cloud can be in one of three states: stored, in transit, or in process. Developers
of web applications need efficient tools to protect sensitive information from a third party,
including the CSP. To maintain security and privacy, any comprehensive data security
mechanism must take into account the protection criteria for data in each of these states.
The communication channels can be secured by using the standard HTTP over Secure Socket
Layer (SSL) communication protocol. Most CSPs provide an API for their web services that enables
developers to use both standard HTTP and its secure version, HTTPS. The security requirements of
data in transit can be fully satisfied by using HTTPS for communication with the cloud. In
addition, the endpoint authentication feature of the SSL protocol makes it possible to ensure
that clients are communicating with an authentic cloud server.
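As an illustration of endpoint authentication on the client side, the following minimal sketch uses the Python standard library, where the protocol is provided under its current name TLS. The helper `open_verified_connection` and its parameters are hypothetical names introduced here; the default context is the part that enforces certificate and host name verification.

```python
import socket
import ssl

def open_verified_connection(host: str, port: int = 443) -> ssl.SSLSocket:
    """Open a TLS connection that fails unless the server certificate is valid for `host`."""
    context = ssl.create_default_context()
    sock = socket.create_connection((host, port), timeout=10)
    # server_hostname enables host name checking against the server certificate.
    return context.wrap_socket(sock, server_hostname=host)

# The default context already requires a valid certificate and a matching host name.
context = ssl.create_default_context()
```

A client that disables either check gives up the endpoint authentication property described above, even though the channel is still encrypted.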
The basic idea is to encrypt the data before uploading it to the cloud. However, with
conventional encryption the data must be decrypted by the cloud server before it can be
processed. In other words, the data owner has to disclose the decryption key to the server so
that it can decrypt the data before performing any required operation. The problem is that when
the decryption key is compromised, data confidentiality is lost. Therefore, the cloud computing
model requires a new set of cryptosystems. Encryption schemes that support operations on
encrypted data are called homomorphic encryption schemes; they have a very wide range of
applications in cloud computing. In a nutshell, a fully homomorphic encryption scheme is a
cryptosystem that allows the evaluation of arbitrarily complex operations on encrypted data.
A cloud developer is responsible for ensuring that the data in cloud storage is protected by
authentication based on the user’s credentials. Moreover, for highly sensitive data, the risk of
illegitimate access should be considered. For instance, the data should be protected from a
malevolent insider who may gain access to it. Thus, for protection purposes, sensitive
information should be encrypted before being uploaded to the cloud. Any type of encryption can be
used, since there is no required data format for cloud storage.
Random (RND). In an RND-type encryption scheme, a message is encrypted with a key and a random
initialization vector (IV). This scheme is called probabilistic, since encrypting the same
message with the same key yields a different ciphertext each time. This randomness provides the
highest level of security. The randomness property is achievable with different encryption
algorithms. The Advanced Encryption Standard (AES) in Cipher Block Chaining (CBC) mode [19] is
used for RND encryption. AES is a symmetric block cipher with a key size of 128, 192 or 256 bits
and a block size of 128 bits. RND-type schemes are semantically secure against chosen-plaintext
attacks and hide all information about the plaintext other than its length. As a result, an RND
scheme does not allow any efficient computation on the ciphertext. Equation 1.1 describes the
encryption and decryption of a block cipher in CBC mode.
\[
\begin{aligned}
C_1 &= E_k(P_1 \oplus IV), & P_1 &= IV \oplus D_k(C_1) \\
C_j &= E_k(P_j \oplus C_{j-1}), & P_j &= C_{j-1} \oplus D_k(C_j), \quad j = 2, \dots, n
\end{aligned}
\tag{1.1}
\]
where E_k is the encryption algorithm, D_k is the decryption algorithm, k is the secret key, P_j
is a block of plaintext data and C_j is a block of ciphered data.
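The chaining in Equation 1.1 can be sketched in a few lines. To keep the example self-contained, AES is replaced here by a toy 128-bit Feistel cipher built from HMAC-SHA256; this substitute, and the names `E`, `D`, `cbc_encrypt` and `cbc_decrypt`, are assumptions made for illustration and must not be used to protect real data. Padding is omitted, so inputs are whole 16-byte blocks.

```python
import hashlib
import hmac

BLOCK = 16  # block size in bytes (128 bits, as for AES)

def _round(key: bytes, i: int, half: bytes) -> bytes:
    # Round function of the toy Feistel cipher: truncated HMAC-SHA256.
    return hmac.new(key, bytes([i]) + half, hashlib.sha256).digest()[:BLOCK // 2]

def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def E(key: bytes, block: bytes) -> bytes:
    """Toy block cipher E_k (stand-in for AES): four Feistel rounds."""
    left, right = block[:8], block[8:]
    for i in range(4):
        left, right = right, _xor(left, _round(key, i, right))
    return left + right

def D(key: bytes, block: bytes) -> bytes:
    """Inverse D_k of the toy block cipher: the same rounds in reverse order."""
    left, right = block[:8], block[8:]
    for i in reversed(range(4)):
        left, right = _xor(right, _round(key, i, left)), left
    return left + right

def cbc_encrypt(key: bytes, iv: bytes, blocks: list) -> list:
    out, prev = [], iv
    for p in blocks:
        prev = E(key, _xor(p, prev))       # C_j = E_k(P_j xor C_{j-1}), with C_0 = IV
        out.append(prev)
    return out

def cbc_decrypt(key: bytes, iv: bytes, blocks: list) -> list:
    out, prev = [], iv
    for c in blocks:
        out.append(_xor(D(key, c), prev))  # P_j = C_{j-1} xor D_k(C_j)
        prev = c
    return out

key = b"demo key, not for real use"
plaintext = [b"A" * BLOCK, b"B" * BLOCK]
iv_a, iv_b = bytes(BLOCK), bytes([1]) * BLOCK   # an RND scheme would draw these at random
ct_a = cbc_encrypt(key, iv_a, plaintext)
ct_b = cbc_encrypt(key, iv_b, plaintext)
```

The same message under the same key gives different ciphertexts for the two IVs, which is the probabilistic behaviour described above; in a real deployment each IV would come from a cryptographic random source such as `os.urandom`.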
Deterministic (DET). A DET encryption scheme is a cryptosystem which always produces the
same ciphertext for the same pair of plaintext and key. Block ciphers in Electronic Code Book
(ECB) mode, or in a chained mode with a constant initialization vector, are deterministic. A
deterministic encryption scheme leaks which ciphertexts correspond to equal plaintexts. The AES
encryption scheme in ECB mode is used for DET encryption over document-oriented NoSQL databases.
This DET scheme enables the server to process pipeline aggregation stages such as group, count,
retrieving distinct values and equality match⁷ on the fields within an embedded document. The
embedded document can maintain the link with the primary document through the application of DET
encryption. Equation 1.2 displays the encryption and decryption operations in a DET scheme.
\[
C_j = E_k(P_j), \qquad P_j = D_k(C_j), \quad j = 1, \dots, n
\tag{1.2}
\]
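The following sketch shows why deterministic ciphertexts let an untrusted server answer equality and group/count queries without the key. As a stand-in for AES in ECB mode it uses a toy single-block Feistel cipher built from HMAC-SHA256; that substitute, the zero-byte padding, and the names `det_encrypt` and `det_decrypt` are assumptions for illustration only, and the sample values are invented.

```python
from collections import Counter
import hashlib
import hmac

def _f(key: bytes, i: int, half: bytes) -> bytes:
    # Round function of the toy cipher: truncated HMAC-SHA256.
    return hmac.new(key, bytes([i]) + half, hashlib.sha256).digest()[:8]

def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def det_encrypt(key: bytes, text: str) -> bytes:
    """C = E_k(P): the same plaintext and key always give the same ciphertext."""
    block = text.encode().ljust(16, b"\0")   # toy padding to one 16-byte block
    if len(block) != 16:
        raise ValueError("toy cipher handles values of at most 16 bytes")
    left, right = block[:8], block[8:]
    for i in range(4):
        left, right = right, _xor(left, _f(key, i, right))
    return left + right

def det_decrypt(key: bytes, block: bytes) -> str:
    """P = D_k(C): run the Feistel rounds in reverse and strip the padding."""
    left, right = block[:8], block[8:]
    for i in reversed(range(4)):
        left, right = _xor(right, _f(key, i, left)), left
    return (left + right).rstrip(b"\0").decode()

key = b"demo key, not for real use"
cities = ["Orlando", "Tampa", "Orlando", "Miami"]
encrypted_column = [det_encrypt(key, c) for c in cities]   # what the server stores

# Server side, without the key: group/count and equality match on ciphertexts.
group_counts = Counter(encrypted_column)
token = det_encrypt(key, "Orlando")                        # sent by the client
matches = [i for i, c in enumerate(encrypted_column) if c == token]
```

The same property is the leak: the server learns that rows 0 and 2 hold equal values and sees the frequency of every value, even though it cannot decrypt them.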
⁷ Equality matches over specific common fields in an embedded document will select documents in the
collection where the embedded document contains the specified fields with the specified values.
Order-Preserving Encryption (OPE). OPE carries the order relation between plaintext data
elements over to their ciphertext values. Because the ciphertexts reveal the order of the underlying
plaintexts, OPE supports a lower degree of security. Even Modular Order-Preserving Encryption
(MOPE) [38], an extension of basic OPE intended to improve security, leaks information. In
exchange, OPE allows efficient inequality comparisons on encrypted data elements and thus supports
range queries, comparison, Min() and Max() on the ciphertext. We use the algorithm introduced in
[6] and implemented in [4] for the cloud environment. Equation 1.3 shows that the order relation of
the plaintext is preserved in the ciphertext.
∀x, y | x, y ∈ Data Domain x < y =⇒ OPEk(x) < OPEk(y) (1.3)
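The property in Equation 1.3 can be sketched with a toy construction: a secret, strictly increasing table over a small integer domain. This is not the algorithm of [6]; it is an assumed stand-in that only demonstrates how a range query runs on ciphertext. The function names and the seed-as-key convention are hypothetical.

```python
import bisect
import random


def ope_keygen(seed: int, domain_size: int, max_gap: int = 1000) -> list:
    """Secret strictly increasing table: table[x] is the ciphertext of x."""
    rng = random.Random(seed)
    table, current = [], 0
    for _ in range(domain_size):
        current += rng.randint(1, max_gap)  # positive gap keeps the order strict
        table.append(current)
    return table


def ope_encrypt(table: list, x: int) -> int:
    return table[x]


def ope_decrypt(table: list, c: int) -> int:
    i = bisect.bisect_left(table, c)
    if i == len(table) or table[i] != c:
        raise ValueError("not a valid ciphertext")
    return i


table = ope_keygen(seed=2016, domain_size=200)
ages = [34, 19, 72, 45, 27]
stored = [ope_encrypt(table, a) for a in ages]  # held by the server

# Range query 25 <= age <= 50: the proxy encrypts only the two bounds.
low, high = ope_encrypt(table, 25), ope_encrypt(table, 50)
hits = [c for c in stored if low <= c <= high]  # server side, ciphertext only
result = sorted(ope_decrypt(table, c) for c in hits)
oldest = ope_decrypt(table, max(stored))        # Max() on ciphertext
```

The server never learns the ages, but it does learn their relative order, which is the leakage discussed above.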
Additive Homomorphic Encryption (AHOM). AHOM is a scheme that allows the server to
conduct computations on ciphertext; the final result is decrypted at the proxy. In spite
of sustained research efforts [24, 8] on Fully Homomorphic Encryption (FHE), there is no
efficient FHE scheme, except for limited operations. We applied the Paillier [41] scheme, which
supports additive operations as shown by Equation 1.4. Here m1, m2 are the messages to be
encrypted, where m1, m2 ∈ Zn, and r1, r2 are randomly selected with r1, r2 ∈ Z∗n. In other words,
the product of two ciphertexts decrypts to the sum of their corresponding plaintexts.
Dk(Ek(m1, r1) × Ek(m2, r2) mod n²) = m1 + m2 (mod n) (1.4)
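Equation 1.4 can be checked numerically with a textbook Paillier sketch. The primes below are far too small for real use and are chosen only for illustration; the generator g = n + 1 is a common simplification and an assumption here, not necessarily the parameter choice of the implementation described in this proposal.

```python
import math
import random


def keygen(p: int, q: int):
    """Textbook Paillier key generation from two primes p and q."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    g = n + 1                       # simple, commonly used generator
    mu = pow(lam, -1, n)            # valid for g = n + 1 (needs Python 3.8+)
    return (n, g), (lam, mu)


def encrypt(public, m: int, r: int) -> int:
    """E_k(m, r) = g^m * r^n mod n^2, with m in Z_n and r in Z*_n."""
    n, g = public
    n2 = n * n
    return (pow(g, m, n2) * pow(r, n, n2)) % n2


def decrypt(public, private, c: int) -> int:
    """D_k(c) = L(c^lambda mod n^2) * mu mod n, where L(u) = (u - 1) / n."""
    n, _ = public
    lam, mu = private
    u = pow(c, lam, n * n)
    return ((u - 1) // n) * mu % n


def random_unit(n: int, rng: random.Random) -> int:
    """Pick r uniformly from Z*_n."""
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            return r


rng = random.Random(7)
public, private = keygen(1009, 1013)   # toy primes, illustration only
n = public[0]
m1, m2 = 1234, 5678
c1 = encrypt(public, m1, random_unit(n, rng))
c2 = encrypt(public, m2, random_unit(n, rng))

# Server side: multiply ciphertexts modulo n^2 without any key.
c_sum = (c1 * c2) % (n * n)
# Proxy side: decrypt the aggregated result.
total = decrypt(public, private, c_sum)
```

Unlike DET, encrypting the same message twice with different r gives different ciphertexts, so the server can sum a column without learning which entries are equal.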
Definition 1 (Information leakage)
Information leakage is the ability of an attacker to infer sensitive information either through
multiple database searches or through statistical analysis of cloud database queries. In a nutshell,
information leakage can be defined as using a combination of data, meta-data and queries that are
classified at a lower level L1 to extract information that is classified at a higher level L2.
In this research, we restrict our discussion to secure query processing, particularly over encrypted
NoSQL databases, with minimum information leakage. The key part of SecureNoSQL is the evaluation
of a set of operations on the encrypted databases. Moreover, novel algorithms designed to prevent
information leakage from data or queries are added to SecureNoSQL. We also introduce
a novel descriptive language based on the JSON⁸ notation which enables users to generate a
security plan. The security plan is a useful tool that lets data owners manage security parameters
without getting involved in the details. Every security plan has four sections: the collection,
the data elements, the cryptographic modules and the mapping between them.
The present design supports concurrent queries; however, the corresponding concurrency
experiments require a network of multiple servers and clients. At this moment,
such a configuration and hardware setup is not available. Thus, for some experiments of this
research we have used EC2 instances, which is consistent with the final goal of this study. Since a
standard Database Management System (DBMS) is used in this work, concurrent
queries over encrypted distributed datasets are supported automatically, without extra cost.
1.4 Roadmap
We discuss all of the approaches and solutions outlined above in the rest of this proposal, which
is organized as follows. Chapter 2 reviews the latest related work on secure query processing and
information leakage prevention. Chapter 3 presents the research objectives, motivation, threat
model, JSON and BSON, and finally the problem statement.
All the experiments with the prototype systems are presented in Chapter 4. We propose two schemes for
secure query processing over encrypted data sets and information leakage management. The
organization and structure of the security plan, and the notation of the descriptive language used
to generate it, are discussed in Section 4.1. Afterwards, the mechanism for information leakage
prevention is discussed in Section 4.2. Finally, Chapter 5 concludes this proposal with the timetable
of in-progress and completed tasks as well as the published and under-review papers.
⁸ JSON (JavaScript Object Notation) is a lightweight text-based syntax for storing and exchanging data
objects consisting of key-value pairs. It is used primarily to transmit data between a server and a web
application. JSON's popularity is due to the fact that it is self-describing and easy for humans and
machines to understand.
CHAPTER 2: RELATED WORK
High scalability and distribution are the most important requirements for processing the large
volume of data created mostly by humans or connected devices. DBaaS is extensively used
for data processing and meets both requirements. Furthermore, DBaaS enables
users to use a database without running their own server. In a DBaaS setup, the CSP takes
responsibility for maintaining the hardware and the software. The cost of the service is proportional
to the usage of resources. Although the easy launch of a database through a web-based console is an
alluring option, DBaaS brings a series of new security risks which need to be addressed. Some of the
studies on DBaaS focus on information leakage caused by sharing physical infrastructure among
multiple virtual machines. The study conducted by Ristenpart et al. [43] showed that the Infrastructure
as a Service (IaaS) model is susceptible to information leakage despite the isolation of virtual
machines. A method called “Advanced Cloud Protection System (ACPS)” for secure virtualization in
the cloud environment, proposed by Lombardi et al. [36], mitigates security risks from external
attackers, assuming the cloud is trustworthy.
The performance and efficiency of DBaaS have been extensively studied in the literature [27,
18, 17]. Techniques to improve workload balancing between clients and server, and a graph-based
partitioning algorithm that improves performance and obtains almost linear elastic scale-out,
are introduced in [18]. Furthermore, a new benchmark framework compares the DBaaS performance
offered by various CSPs [17].
The first system for SQL-aware query processing over encrypted databases was CryptDB [42]. CryptDB
provides data confidentiality for relational databases. However, CryptDB cannot perform queries
over data encrypted with different keys. One important application of searching on encrypted
data [11, 45, 48] is in cloud computing, where clients outsource their storage and computation.
In [11] a practical searchable security scheme is introduced which can search encrypted data
sets in sub-linear time by using different types of indices; however, it is not practical
for NoSQL data sets, which are designed to scale to millions of users doing updates
simultaneously [13].
NoSQL databases suffer from a lack of proper data protection mechanisms because these
databases have been designed to support high performance and scalability requirements. In
order to protect personal and sensitive information, a privacy and security preserving mechanism is
required in big data platforms. The integration of privacy-aware access control features into existing
big data systems is discussed in [30]. The evolution of big data systems from the perspective of an
information security application is studied in [23, 47]. A cloud-based monitoring and threat detection
system is proposed in [16] to make critical infrastructure systems secure. Security in
DBaaS has been studied by several research projects [42, 29, 48, 31]. In all of these works,
cryptosystems are applied to encrypt databases before they are outsourced to the CSP; in the same
way, queries are encrypted and processed on the server. This is a practical general approach for
protecting sensitive data at the off-site data store. For example, in [42] CryptDB is introduced for
processing queries over encrypted relational databases. Similarly, SecureNoSQL is proposed
for processing queries over encrypted NoSQL databases in a cloud platform. The system supports
access to an encrypted MongoDB¹ document-store database. SecureNoSQL is a secure proxy that
allows applications to access and process queries on the encrypted datasets. The proxy receives
queries from clients, extracts the elements of the query, applies security parameters to them, and,
finally, forwards them to the cloud database server. After an encrypted query is processed by the
database, the proxy receives the results, decrypts them and forwards them to the client. SecureNoSQL
is an open infrastructure that is easily extended with new encryption modules. To implement the
leakage prevention algorithms, the construction of SecureNoSQL has been further developed for the
study discussed in this research. The leakage prevention mechanism is also implemented inside
SecureNoSQL. We have implemented a number of cryptosystems for different types of queries, and
now we describe the characteristics of these cryptosystems and their applications.
¹ MongoDB is a document-oriented NoSQL database which eschews the traditional table-based
relational database structure in favor of JSON-like documents with dynamic schema.
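The proxy workflow described above (receive a query, encrypt its elements, forward it, decrypt the response) can be sketched as follows. This is a minimal illustration under strong assumptions: ToyProxy and ToyServer are hypothetical classes, the in-memory server stands in for MongoDB, and the cipher is a toy deterministic XOR keystream that is not secure and only stands in for a DET module.

```python
import hashlib


def _keystream(key: bytes, label: bytes, n: int) -> bytes:
    out, counter = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + label + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:n]


def toy_det(key: bytes, label: bytes, data: bytes) -> bytes:
    """Toy deterministic, self-inverse cipher (insecure stand-in for DET)."""
    return bytes(a ^ b for a, b in zip(data, _keystream(key, label, len(data))))


class ToyServer:
    """Stands in for the cloud database: sees only encrypted names and values."""

    def __init__(self):
        self.docs = []

    def insert(self, doc: dict) -> None:
        self.docs.append(doc)

    def find(self, query: dict) -> list:
        return [d for d in self.docs if all(d.get(k) == v for k, v in query.items())]


class ToyProxy:
    def __init__(self, key: bytes):
        self.key = key

    def _enc(self, label: bytes, text: str) -> str:
        return toy_det(self.key, label, text.encode()).hex()

    def _dec(self, label: bytes, token: str) -> str:
        return toy_det(self.key, label, bytes.fromhex(token)).decode()

    def encrypt_doc(self, doc: dict) -> dict:
        # Field names (meta-data) and values are both encrypted.
        return {self._enc(b"field", k): self._enc(k.encode(), v) for k, v in doc.items()}

    def decrypt_doc(self, doc: dict) -> dict:
        plain = {}
        for enc_name, enc_value in doc.items():
            name = self._dec(b"field", enc_name)
            plain[name] = self._dec(name.encode(), enc_value)
        return plain

    def query(self, server: ToyServer, plain_query: dict) -> list:
        encrypted_query = self.encrypt_doc(plain_query)      # rewrite the query
        encrypted_hits = server.find(encrypted_query)        # processed remotely
        return [self.decrypt_doc(d) for d in encrypted_hits]  # decrypt response


proxy, server = ToyProxy(b"proxy-secret"), ToyServer()
for doc in [{"name": "alice", "dept": "sales"}, {"name": "bob", "dept": "hr"}]:
    server.insert(proxy.encrypt_doc(doc))
results = proxy.query(server, {"dept": "sales"})
```

The key never leaves the proxy, so the server matches the query purely on ciphertext equality.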
Information leakage in a single untrusted server is studied in [49], and the statistical measurement
of information leakage is investigated in [15]. The weakness of the k-anonymity solution for protection
against identity disclosure is remedied by introducing t-closeness in [33], which requires the
distribution of sensitive attributes in each equivalence class to be close to the global distribution of
those attributes.
To protect sensitive data from an untrusted CSP, existing crypto-primitives that require the
decryption key for processing are not applicable; consequently, the search for
cryptosystems that allow processing over ciphertext has become an appealing research track. Most
research has focused on Homomorphic Encryption, which allows computations to be carried out over
encrypted data [24]. Another cryptosystem, one that relaxes the security notion, is Order-Preserving
Encryption (OPE), introduced in [6] and implemented in [4] for the cloud platform. An untrusted CSP
can still extract information from encrypted data. The majority of research works in the
literature assume that applying cryptographic techniques provides adequate protection in the
untrusted cloud platform, but this assumption is not entirely true. Information leakage from
encrypted data in the cloud is a plausible risk, and very few works address it. The research
reported in this thesis targets leakage-free query processing over very large scale encrypted
datasets. The ultimate goal is to minimize information leakage with efficient solutions; therefore, a
diversity of techniques is utilized.
Table 2.1: Information leakage management methods comparison
Method | Description | Context | Advantage | Downside | Reference
Oblivious Cross-Tags (OXT) | Searchable symmetric encryption | Searches for a set of keywords | Practical | (1) Multiple rounds of interaction; (2) Pre-processing | Cash et al. [12]
Extended-OXT | Searchable symmetric encryption | Searches for a set of keywords | Extends OXT to: (1) Substring; (2) Wildcards and Phrase; (3) Substring | (1) Multiple rounds of interaction; (2) Pre-processing | Faber et al. [21]
CryptDB | Secure query processing | SQL-aware database | Efficient | Leakage from encrypted data | Popa et al. [42]
SecureNoSQL | Leakage resilient query processing over encrypted database | NoSQL database | Covers: (1) search over encrypted NoSQL databases; (2) Leakage prevention | Requires extra hardware resources for Proxy | Current work *
* The paper related to this work is currently under review.
CHAPTER 3: RESEARCH OBJECTIVES AND APPROACH
3.1 Research Objectives
The primary research interests of this work lie at the intersection of cloud computing and
information security, in an area known as secure outsourcing of computation to a third party. We seek
to understand the security and privacy needs of both individual users and large
organizations in a public cloud environment. The security of users’ data in cloud computing, a large
scale distributed computational platform, is a demanding challenge that affects all users. The
evidence shows that the importance of information security in the cloud is increasing as more on-line
systems move into the cloud. In general, our research vision is to design security schemes
that enable cloud users to receive the productivity and computational benefits of cloud
DBaaS without compromising security and privacy.
3.2 Motivation
The principal research challenge is to answer the question, “Is it possible to delegate processing
of your data without getting your private information revealed?” In other words, the goal of my
research is to resolve the conflict between the availability of data on a public-access cloud and
providing the required security level. With classic encryption, the cloud server needs to decrypt
the data with the secret decryption key before being able to process it; however, this process
reveals users’ private information to an adversary or malicious insider. Resolving this issue requires
a multidisciplinary approach that ties computer science and mathematics to application-specific
knowledge such as finite fields. In summary, my research objective is to design a secure solution
for cloud-based on-line applications that addresses the corresponding security requirements.
The key contributions of this research have an impact on cloud-based large-scale database systems,
on-line transaction processing (OLTP) and web applications.
Technology research analyses indicate that a large number of enterprises are using cloud DBaaS
from major CSPs. The number of websites hosted on AWS increased from 6.8M in September
2012 to 11.6M in May 2013, a 71% upsurge [1] [35]. Furthermore, a 67% annual growth rate
is predicted for DBaaS by 2019. Undoubtedly, given the cloud threat model, an efficient
security scheme is required for the high volume of data stored and processed in the cloud. Threats
to cloud computing can be analyzed from multiple viewpoints; this work investigates them from the
adversarial perspective, a holistic, multifaceted procedure that considers the security of the whole
system end-to-end. The adversarial threat analysis starts with thinking like a hacker and continues
with preparing a corresponding countermeasure. The model identifies two classes of threats, external
and internal attackers. Both classes are addressed by the proposed solution. The
two major threats are described as follows.
3.3 Threat Model
A threat model describes the threats against a system. The threat model of cloud computing can
be analyzed from multiple viewpoints. In this work we investigate this issue from the adversarial
perspective. The adversarial threat model for DBaaS is a holistic process based on end-to-end
security. The model identifies two classes of threats, external and internal attackers.
External attacker: An attacker from outside the cloud environment might obtain unauthorized
access to the data by applying techniques or tools to monitor the communication between clients
and cloud servers. External attackers, in most cases, face a more complex task because they must
bypass firewalls, intrusion detection systems and other defensive tools without any authorization.
Cloud malicious insiders: An internal attacker has the primary advantages of being within the
protected area of the cloud and having access to resources. A major side effect of hosting a database
in the cloud is unauthorized access to data by cloud insiders, who are referred to as malicious
insiders. More specifically, certain employees or contractors of the CSP have access to the servers,
software and hardware and, therefore, to users’ data. The data protection efforts of the
CSP could be bypassed by a malicious insider. Encrypted datasets accompanied by a secure proxy
construction such as SecureNoSQL guarantee that malicious insiders never obtain the decryption
keys. The proxy encrypts and decrypts data, queries and responses between clients and the cloud. The
proxy construction ensures that a malicious insider cannot explicitly access the sensitive information;
however, there is still a risk of information leakage from the ciphered datasets. The malicious insider
can exploit the leaked information to organize more extensive attacks that amplify the leakage. In a
traditional single-purpose on-site server, a malicious insider has access to only a few databases as
sources for a data inference attack, but a cloud server has access to millions of datasets belonging
to a large variety of enterprises. With an initial brute-force inference attack the adversary can extract
implicit information. Analyzing the cloud infrastructure entirely from the viewpoint of information
leakage is a novel idea of this work.
3.4 JSON And BSON
JavaScript Object Notation (JSON) is an open standard format which can be used to transmit data
objects consisting of key-value pairs in a self-describing manner. JSON was primarily used as a
main data format to interchange data between servers. JSON supports all the basic data types of the
JavaScript programming language [9]. JSON is a simple, lightweight and efficient data structure,
and these features make it an appealing option for database vendors. Thus, several NoSQL
document store databases such as MongoDB, CouchDB and Google Cloud DataStore adopt JSON
as their primary data representation for storage and indexing. There is a binary extension of JSON,
known as Binary JSON (BSON), that is used by most document databases to represent
JSON documents in a binary-encoded format in back-end processing. BSON extends the JSON
model to provide additional data types and it is more efficient than JSON. In fact, BSON provides
users the ease of use and flexibility of JSON together with the speed and efficiency of a lightweight
binary format. In a document database, the data, the query and the query response are all represented
in the JSON format, so document databases are also referred to as JSON databases.
In this work we use JSON to create a new concept called the security plan. In particular, the
security plan is a document that contains a hierarchical collection of key-value pairs describing
the data elements, the parameters of the cryptosystems and the mapping between the two. Every
security plan document includes four top-level sections represented as key-value pairs (see Section 4.1).
3.5 Problem Statement
The data owner has a database containing sensitive information and wants to encrypt it, upload
it to a cloud and give search permission to a group of users using DBaaS. The data owner wants
to keep the data and the users’ queries private from the CSP. Users should be able to retrieve from
the encrypted database all documents that satisfy the specific conditions posed by their queries. An
additional privacy requirement, critical for some applications such as stock market data, is to hide
any information about the access pattern from a cloud insider.
The proposed solution requires only one interaction per query, with minimum communication
between the user’s application and the DBaaS server. The work of the DBaaS server for processing a
requested query over the encrypted database still remains linear in the size of the database. We
address both the confidentiality and the leakage prevention requirements.
• We propose a descriptive language based on JSON notation that enables users to create a
security plan for a database, describe security parameters and assign proper cryptographic
primitives to the data elements.
• A multi-key, multi-level mechanism. The lifetime of an encryption key is shorter than that of
an encryption module, so keys are subject to change more frequently than encryption parameters.
Furthermore, a key is assigned to a single data element, while an encryption algorithm can be
applied to several data elements with several keys. This separation allows more efficient
enforcement of security policy and of key management.
• We design an effective procedure for validating requests against the security plan in
SecureNoSQL. It evaluates all requests locally first, rather than forwarding large numbers of
invalid key-value pairs to the remote cloud server. This mechanism avoids an unnecessary increase
in the workload and response time of the remote cloud server.
• Support for comprehensive, flexible protection. The solution is open-ended; users can add
new customized cryptographic modules simply by using the designed descriptive language.
• A balanced system with a security level-proportional overhead. The overhead of the scheme is
proportional to the desired level of security.
• SecureNoSQL addresses information leakage from fully or partially encrypted databases
in the cloud. A malicious insider could potentially pool all databases and extract sensitive
information by correlating the various hosted databases. We propose a novel algorithm
that minimizes information leakage in the untrusted cloud.
The details of the SecureNoSQL proxy are discussed in Chapter 4.
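The local validation step listed among the contributions above can be sketched as follows. The flat PLAN_FIELDS mapping is a hypothetical, simplified view of the data elements in a security plan, and validate_query and forward_if_valid are illustrative names rather than the actual SecureNoSQL interface.

```python
# Hypothetical, simplified view of the data elements declared in a plan.
PLAN_FIELDS = {"name": str, "age": int, "dept": str}


def validate_query(query: dict, plan_fields: dict) -> list:
    """Return a list of problems; an empty list means the query is valid."""
    errors = []
    for field, value in query.items():
        expected = plan_fields.get(field)
        if expected is None:
            errors.append(f"unknown field: {field}")
        elif isinstance(value, bool) or not isinstance(value, expected):
            # bool is rejected explicitly because it is a subclass of int.
            errors.append(f"wrong type for {field}: expected {expected.__name__}")
    return errors


def forward_if_valid(query: dict, plan_fields: dict, send) -> dict:
    """Reject invalid requests at the proxy; forward only valid ones."""
    errors = validate_query(query, plan_fields)
    if errors:
        return {"ok": False, "errors": errors}   # no round trip to the cloud
    return {"ok": True, "result": send(query)}


forwarded = []


def fake_send(query: dict) -> str:
    """Stands in for the remote cloud server in this sketch."""
    forwarded.append(query)
    return "processed"


good = forward_if_valid({"age": 30}, PLAN_FIELDS, fake_send)
bad = forward_if_valid({"salary": 1000, "age": "thirty"}, PLAN_FIELDS, fake_send)
```

Only the first request reaches fake_send; the second is answered locally with two error messages, so the remote server's workload is not affected by malformed requests.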
CHAPTER 4: CURRENT WORK AND PRELIMINARY RESULTS
Our research started with exploring the characteristics of cloud architecture and cloud services,
followed by a study of classic and modern cryptography. The research results were published in
a paper entitled “Application of order-preserving encryption (OPE) among multiple organizations
in hyper cloud environment” [4]. In that work, we report a novel solution for efficient query
processing on encrypted data. The security scheme we proposed adds overhead due to the increased
size of the ciphered data. Processing the transformed data increases time and space within an
acceptable range. This part of our research delivers an encryption solution through a straightforward
relaxation of standard security notions such as indistinguishability against chosen-plaintext
attack, which is unachievable for a practical OPE scheme. As a result, a security notion is proposed in
the spirit of pseudo-random functions and related primitives, such that the OPE scheme is
“as random as possible” subject to the order-preserving constraint.
Afterwards, our research continued with the design of a scheme for read-intensive large scale
databases. In this scheme classic cryptography is used for securing geographic information
databases for location-based systems [3]. This work concentrates on the security of distributed large
scale databases with a high rate of read and a low rate of write operations. More specifically, a wide
variety of applications, ranging from social networks to military applications, use location
information for delivering different services. Moreover, smart phones and hand-held devices are
increasingly being used for mobile transactions. These devices are mostly GPS-enabled and can
provide location information to service providers. In some cases, the geographical information
of clients is integrated with location-based applications as an authentication factor to enhance
security. Yet, since it is easy for attackers to forge location information, the security of geographical
information is a critical issue. The features of geographical databases were discussed, and an effective
security scheme was proposed accordingly for mobile devices with limited resources.
Our research in progress explores security solutions for Big Data in cloud
computing. Many applications in areas as diverse as computational science and engineering,
computational finance and economics, mobile computing, and social media require access to very large
NoSQL databases stored on computer clouds. NoSQL databases are preferred to relational
databases for such Big Data applications due to their faster response time and scalability. Considering
the importance of Big Data in research communities, my investigations are extended to the field of
security of cloud Big Data. A security scheme for Big Data applications is developed based on
encryption of data in JavaScript Object Notation (JSON) format. The results of the experiments
carried out on very large NoSQL data stores were inspected for different types of queries.
A related topic in my research is grouped homomorphic operations over encrypted data stores
and leakage prevention on the untrusted cloud server. We explored the problem of minimizing
information leakage from encrypted databases in the cloud environment. A fundamental issue in
the cloud environment is to preserve data security. Data encryption is the basic means by which
sensitive information can be protected from intruders, malicious insiders and external attacks.
However, users need to interact with the encrypted data stores through queries. Analyzing queries
enables the cloud to illegitimately gain knowledge about the underlying sensitive information, which
is considered information leakage. On the other hand, preserving the security of queries against the
cloud is a fundamental issue for data owners. Studies in the literature focus on data and query privacy;
however, information leakage from encrypted data sets and queries has received very little attention.
Therefore, a diversity of techniques will be utilized, including: analysis of the information
leakage of different cryptosystems on large encrypted data sets, implementation, experimental
setup, theoretical analysis, and simulation in the cloud.
4.1 SecureNoSQL
This section introduces a construction, denoted SecureNoSQL, a framework that incorporates
the data confidentiality and information leakage prevention algorithms. SecureNoSQL supports
secure query processing for web and mobile applications that use DBaaS. Two different system
organizations can fulfill our design objectives. The first
one, shown in Figure 4.1, is suitable when all database users belong to the same organization. In that
case the proxy runs on a trusted server behind a firewall, so that the communication between the
clients and the proxy is secure. In the second case, clients are unrelated to one another and access the
system through public lines. Then either each client’s software includes a copy of the proxy and
only encrypted data is transmitted over public lines, or the Secure Sockets Layer (SSL) protocol is
used to establish a secure connection to the proxy. Figure 4.1 illustrates the high-level architecture of
SecureNoSQL as a secure proxy between users’ applications and the cloud NoSQL database server.
[Figure 4.1 diagram. Labels recovered from the figure: Application layer (Data Owner, Client 1 to Client n; Data, Query, Security plan, Query, Response); SecureNoSQL Proxy (Security Layer, Data Integrity, Leakage prevention); Cloud NoSQL database (Query Language, Data Model, Storage Engine, Data store, Replica set).]
Figure 4.1: Architecture of SecureNoSQL.
4.1.1 The Proposed Construction: SecureNoSQL Scheme
SecureNoSQL is based on the general principles of NoSQL database products. We introduce a new
concept, denoted as the security plan, which is a JSON description of the data elements, the
meta-data and the parameter configuration of the cryptosystems. In the proposed solution, a
descriptive language is introduced to generate and read the security plan automatically. JSON, as
the dominant format in NoSQL databases, is selected as the format to express the security plan. We
use a subset of JSON notation readable by humans and machines. Document databases, such as
MongoDB, store documents inside a collection in JSON representation, in a similar way as an RDBMS
stores records inside tables. A query and the corresponding response are also represented in JSON
format; therefore, the governing format in a document database is JSON. Additionally, there is a
binary extension of JSON, known as BSON, which is used by document-oriented databases for
efficient encoding and decoding. The JSON query model is a functional, declarative notation, designed
especially for working with large volumes of structured, semi-structured and unstructured JSON
documents. The data owner develops the security plan, which maps each chosen
crypto-primitive, with its specific parameters, to a particular data element.
The schema in a NoSQL database is flexible, which allows different documents corresponding to the
same object to have a different number of attributes. On the other hand, in order to create
comprehensive protection for all data elements in the database, a full list of attributes is required to
assign the proper level of protection. Therefore, we define a logical operator denoted as the Super
Document, which is the union of all attributes over the different versions of documents related to the
same object. As described in Equation 4.1, each pair 〈ki, vi〉 represents an attribute and the Super
Document represents all attributes of a specific object.
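\begin{equation}
\begin{aligned}
d_1 &= \big\langle \langle k_1, v_1 \rangle, \langle k_2, v_2 \rangle, \ldots, \langle k_i, v_i \rangle \big\rangle \\
d_2 &= \big\langle \langle k_1, v_1 \rangle, \langle k_2, v_2 \rangle, \ldots, \langle k_j, v_j \rangle \big\rangle \\
&\;\;\vdots \\
d_n &= \big\langle \langle k_1, v_1 \rangle, \langle k_2, v_2 \rangle, \ldots, \langle k_l, v_l \rangle \big\rangle \\
\text{Super Document } D &= \bigcup_{i=1}^{n} d_i
\end{aligned}
\tag{4.1}
\end{equation}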
In addition, a match function M(di, dj) is required to determine whether any two given documents
di, dj can be merged. Two documents can be merged provided that they share the same
attribute from an identifying class or a group of attributes from a semi-identity class.
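A minimal sketch of the Super Document and the match function follows. It makes two assumptions that the text does not fix: the Super Document is represented by the set of attribute names, and the semi-identity test requires every attribute in the given group to agree. The attribute names and sample documents are hypothetical.

```python
def super_document(versions: list) -> set:
    """Union of the attributes of all document versions (Equation 4.1)."""
    attributes = set()
    for doc in versions:
        attributes.update(doc.keys())
    return attributes


def _agree(d_i: dict, d_j: dict, name: str) -> bool:
    return name in d_i and name in d_j and d_i[name] == d_j[name]


def match(d_i: dict, d_j: dict, identifying: list, semi_identity: list) -> bool:
    """M(d_i, d_j): True if the two documents describe the same object."""
    # One shared identifying attribute is enough.
    if any(_agree(d_i, d_j, name) for name in identifying):
        return True
    # Otherwise the whole semi-identity group must agree (assumption).
    return bool(semi_identity) and all(_agree(d_i, d_j, n) for n in semi_identity)


d1 = {"ssn": "123-45-6789", "name": "Ann", "zip": "32816"}
d2 = {"ssn": "123-45-6789", "phone": "555-0100"}
d3 = {"name": "Bob", "zip": "32816", "birth": "1990"}
d4 = {"name": "Ann", "zip": "32816", "email": "ann@example.org"}

identifying, semi_identity = ["ssn"], ["name", "zip"]
mergeable = [d for d in (d2, d3, d4) if match(d1, d, identifying, semi_identity)]
D = super_document([d1] + mergeable)
```

Here d2 merges with d1 through the shared identifying attribute and d4 through the semi-identity group, while d3 is left out, so the security plan would need protection rules for every attribute in D.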
4.1.2 Security Plan
The security plan identifies the mechanism that is applied to maintain the security of the data
elements in a database. It also determines how to interpret queries issued by a specific
user’s applications. As can be seen in Figure 4.2, we organized the security plan in four
subdivisions, which enable us to describe security rules efficiently, not only for data elements but
also for meta-data such as the field name (key) and the collection name. These subdivisions are the
building blocks of the security plan, which elaborates how those rules are enforced over the given
data. The structure of the subdivisions is presented as follows:
1. Collection. The first section includes the name of a collection and a reference to the
encryption module used to encrypt the name of the collection and the names of the fields (meta-data).
2. Cryptographic modules. The second section lists the cryptographic modules for encrypting
the fields of the database entries in the query.
3. Data elements. The third section lists the properties of each data field, including the data
type. The data type determines the cryptographic modules to be applied to each field.
4. Mapping cryptographic modules to the fields. The fourth section specifies the cryptographic
modules used to encrypt the values of the fields. This information is used by the proxy to encrypt
and decrypt the data fields.
[Figure 4.2 diagram: Security Plan, divided into four sections: Collection; Cryptographic modules; Data elements; Mapping cryptographic modules to the fields.]
Figure 4.2: The high level structure of the security plan.
Collection: A collection is defined as a group of NoSQL documents and is the equivalent of a
table in a relational database. A collection has properties, such as its name, which need to be protected
by encryption. The structure of a collection is illustrated and described in Figure 4.3. For
clarification, refer to the listing in Figure 4.3b, which shows how to secure a sample collection using
our descriptive language.
Key-value pairs (KVP) are the primary data model for a NoSQL database. The key is used as
an index to access the associated value of the data pointed to by the reference ref. The initialization
vector (IV) is a fixed-size, random input to the encryption operation of the cryptographic module.
Additionally, a collection exists within a single database. Documents within a collection can have
different fields.
Collection
    name
    encryption
        ref
        key
        iv
    fieldName
        encryption
            ref
            key
            iv
(a)
{
  "$collection": {
    "name": "Personnel",
    "$encryption": {
      "$ref": "/AES-DET",
      "key": "02468acebdf135790369cf258be147ad",
      "iv": "2468ace0"
    }
  },
  "$fieldName": {
    "$encryption": {
      "$ref": "/AES-DET",
      "key": "0123456789abcdef0123456789abcdef",
      "iv": "ffeeddcc"
    }
  }
}
(b)
Figure 4.3: Structure and description of Collection: (a) The chart outlines the structure of a collection,
containing the name of the collection and the names of all fields, which are considered meta-data and thus
should be protected with a proper cryptographic module. The pointer to a cryptomodule, the encryption
key, and the initialization vector are used for the encryption of the items. (b) The description of a
collection and its security parameters in the designed JSON-based language. In this specific case the
Advanced Encryption Standard in deterministic (AES-DET) mode with a 128-bit key and an initialization
vector (IV) is assigned to encrypt the name of the collection and the field names.
Typically, all documents in a collection are related to one another.
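As a minimal sketch (not the implementation used in this work), the collection description of Figure 4.3b can be loaded and its encryption parameters inspected in a few lines of Python. The helper name `encryption_params` and the assumption that keys are hex-encoded are introduced here for illustration only.

```python
import json

# Collection description copied from Figure 4.3b.
SECURITY_PLAN_COLLECTION = json.loads("""
{"$collection": {"name": "Personnel",
                 "$encryption": {"$ref": "/AES-DET",
                                 "key": "02468acebdf135790369cf258be147ad",
                                 "iv": "2468ace0"}},
 "$fieldName": {"$encryption": {"$ref": "/AES-DET",
                                "key": "0123456789abcdef0123456789abcdef",
                                "iv": "ffeeddcc"}}}
""")


def encryption_params(section):
    """Return (module reference, key size in bits, iv) for one plan section.

    Hypothetical helper: the key is assumed to be a hex-encoded string.
    """
    enc = section["$encryption"]
    key_bits = len(bytes.fromhex(enc["key"])) * 8
    return enc["$ref"], key_bits, enc["iv"]


for name in ("$collection", "$fieldName"):
    ref, key_bits, iv = encryption_params(SECURITY_PLAN_COLLECTION[name])
    print(f"{name}: module={ref} key={key_bits} bits iv={iv}")
```

Both entries in the sample resolve to the AES-DET module with a 128-bit key, which agrees with the caption of Figure 4.3.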
Cryptographic modules. There are various encryption algorithms for different applications, each
with its own strengths and weaknesses. The choice of a particular cryptosystem depends on the
security policy of the application. Criteria for algorithm selection include the security against
theoretical attacks, the cost of implementation, and performance, in particular whether encryption
and decryption can be parallelized on a pool of CPUs such as that offered by cloud computing. Other
factors that may be involved in the selection of an algorithm are the memory requirements and the
integration in the overall system design. In the proposed format, the Cryptographic modules section
introduces all encryption modules and their parameters, such as key, key size, initialization vector,
and output size. The structure of this section is depicted in Figure 4.4a, and the listing in
Figure 4.4b shows the second section of the security plan for the previous example.
Cryptographic modules
  Module #1: name, type, keySize, key, inputSize, outputSize
  Module #2: name, type, keySize, key, inputSize, outputSize
(a)
{"OPE": {
   "properties": {
     "encryptionMethod": {"type": "string", "enum": ["OPE"]},
     "keySize": {"type": "integer", "minimum": 64, "maximum": 4096, "default": 128},
     "key": {"type": "string", "pattern": "^([0-9a-fA-F]{2})+$"},
     "inputSize": {"type": "integer", "minimum": 8, "maximum": 128, "default": 32},
     "outputSize": {"type": "integer", "minimum": 64, "default": 128}
   },
   "required": ["key", "encryptionMethod"],
   "additionalProperties": false
}}
(b)
Figure 4.4: Structure and description of Cryptographic modules: (a) Security plan with the second section, the cryptographic modules, expanded. The attributes included for each module are: name, type, key size, key, input size, and output size. (b) The description of the OPE module and its attributes. The proxy applies these modules using the key-value pairs (KVP).
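The constraints in Figure 4.4b can be checked mechanically before a module is accepted into a security plan. The following is a minimal sketch in plain Python, assuming the limits transcribed from the listing; the function name `validate_ope_module` is hypothetical, and a real deployment would more likely rely on a JSON Schema validator.

```python
import re

# Constraints transcribed from the OPE module description in Figure 4.4b.
OPE_CONSTRAINTS = {
    "keySize": (64, 4096),     # (minimum, maximum) in bits
    "inputSize": (8, 128),
    "outputSize": (64, None),  # the listing gives no maximum
}
HEX_KEY = re.compile(r"^([0-9a-fA-F]{2})+$")
REQUIRED = ("key", "encryptionMethod")
ALLOWED = set(OPE_CONSTRAINTS) | set(REQUIRED)


def validate_ope_module(module):
    """Return a list of problems; an empty list means the module is accepted."""
    problems = [f"missing {name}" for name in REQUIRED if name not in module]
    # "additionalProperties": false means unknown attributes are rejected.
    problems += [f"unexpected {name}" for name in module if name not in ALLOWED]
    if module.get("encryptionMethod", "OPE") != "OPE":
        problems.append("encryptionMethod must be OPE")
    if "key" in module and not HEX_KEY.match(str(module["key"])):
        problems.append("key must be an even-length hex string")
    for name, (low, high) in OPE_CONSTRAINTS.items():
        if name in module:
            value = module[name]
            too_big = high is not None and isinstance(value, int) and value > high
            if not isinstance(value, int) or value < low or too_big:
                problems.append(f"{name} out of range")
    return problems


print(validate_ope_module({"encryptionMethod": "OPE", "key": "0a1b", "keySize": 128}))
print(validate_ope_module({"key": "xyz", "keySize": 32}))
```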
Our proof of concept uses the parametric Order Preserving Encryption (OPE) and the Advanced
Encryption Standard (AES) modules. The system is open-ended: users can add the cryptosystems
best suited to the security requirements of their application. In our design the definitions of the
cryptographic modules are separated from the pairs of encryption key and initialization value,
following the so-called key separation principle [22]. This security practice is based on the
observation that users have long- and short-term security policies. The cryptographic modules are
less likely to change, while the key and the initialization value change frequently.
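The separation between long-lived module definitions and short-lived key material can be illustrated with two independent tables. This is only a sketch under assumed names (`CryptoModule`, `KeyMaterial`, `rotate_key` are hypothetical and the replacement key is made up); it shows that rotating a key does not touch the module definition.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CryptoModule:
    """Long-term policy: which cryptosystem is used and with what key size."""
    name: str
    method: str
    key_size: int


@dataclass(frozen=True)
class KeyMaterial:
    """Short-term secrets: rotated far more often than the module itself."""
    key: str
    iv: str


modules = {"AES-DET": CryptoModule("AES-DET", "AES", 128)}
key_store = {"AES-DET": KeyMaterial("0123456789abcdef0123456789abcdef", "ffeeddcc")}


def rotate_key(store, module_name, new_key, new_iv):
    """Return a new key store with fresh material for one module."""
    updated = dict(store)
    updated[module_name] = replace(store[module_name], key=new_key, iv=new_iv)
    return updated


# Illustrative replacement values, not taken from the security plan.
rotated = rotate_key(key_store, "AES-DET", "00112233445566778899aabbccddeeff", "01020304")
print(modules["AES-DET"], rotated["AES-DET"])
```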
The data elements. The third section of the security plan covers the data elements and their
properties. Figure 4.5 presents the structure and description of the Data elements section of the
security plan; the listing in Figure 4.5b gives the JSON description of the data elements for the
previous example. To ensure the desired level of security, this section should describe all
sensitive data elements of the database.
Data elements
  Field #1: name, type, value
  Field #2: name, type, value
(a)
{"id": {"type": "integer"},
 "name": {"type": "string"},
 "salary": {"type": "integer"},
 "balance": {"type": "integer"},
 "ccn": {"type": "integer"},
 "ssn": {"type": "integer"},
 "email": {"type": "string"},
 "required": ["id", "name", "email", "salary"]}
(b)
Figure 4.5: Structure and description of Data elements: (a) The chart outlines the structure of the Data elements section, containing the attributes of each data element, such as name, type, and value. The security parameters of each data element are then introduced. (b) The Data elements section of a sample database, represented in the designed notation. A data item has 7 fields: id, name, salary, balance, ccn, ssn, and email. The id, name, email, and salary are required fields.
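The format matching implied by the Data elements section can be sketched as a type and presence check. The table below mirrors the sample of Figure 4.5b; the function name `validate_document` and the error messages are assumptions made for this illustration.

```python
# Field types and required fields mirroring the sample in Figure 4.5b.
DATA_ELEMENTS = {
    "id": int, "name": str, "salary": int, "balance": int,
    "ccn": int, "ssn": int, "email": str,
}
REQUIRED_FIELDS = ("id", "name", "email", "salary")


def validate_document(doc):
    """Return a list of format errors; an empty list means the document matches."""
    errors = []
    for field in REQUIRED_FIELDS:
        if field not in doc:
            errors.append(f"{field}: required field is missing")
    for field, value in doc.items():
        expected = DATA_ELEMENTS.get(field)
        if expected is None:
            errors.append(f"{field}: not declared in the security plan")
        elif not isinstance(value, expected) or isinstance(value, bool):
            # bool is a subclass of int in Python, so it is rejected explicitly.
            errors.append(f"{field}: expected {expected.__name__}")
    return errors


print(validate_document({"id": 1, "name": "A", "email": "a@example.org", "salary": 17000}))
print(validate_document({"id": "1", "name": "A"}))
```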
Mapping cryptographic modules to the fields. The last section of the security plan specifies the
cryptographic module for each sensitive data field. Figure 4.6a shows the structure of the mapping,
and the listing in Figure 4.6b shows the corresponding JSON format for a sample application.
Mapping cryptographic modules to the fields
  Field #1: Cryptographic module m
  Field #2: Cryptographic module n
(a)
{"id": {"$ref": "#/definitions/ope128"},
 "name": {"$ref": "#/definitions/AES-DET"},
 "email": {"$ref": "#/definitions/AES-DET"},
 "salary": {"$ref": "#/definitions/ope256"},
 "ssn": {"$ref": "#/definitions/ope256"},
 "ccn": {"$ref": "#/definitions/ope256"},
 "balance": {"$ref": "#/definitions/ope256"}}
(b)
Figure 4.6: Structure and description of Mapping cryptographic modules to the Data elements: (a) Security plan with the fourth section expanded. This section establishes a correspondence between the data fields and the cryptographic modules used to encrypt and decrypt them. (b) The mapping section of the schema for a sample database with 7 fields. For example, the id and the name will be encrypted with 128-bit OPE and AES-DET, respectively.
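Each entry of the mapping in Figure 4.6b is a local reference of the form "#/definitions/<module>". A minimal sketch of how such references could be resolved is given below; the contents of `DEFINITIONS` (in particular the key sizes) are assumed from the module names, and `resolve_ref` is a hypothetical helper, not part of the described system.

```python
# Hypothetical definitions table; key sizes are inferred from the module names.
DEFINITIONS = {
    "definitions": {
        "ope128": {"encryptionMethod": "OPE", "keySize": 128},
        "ope256": {"encryptionMethod": "OPE", "keySize": 256},
        "AES-DET": {"encryptionMethod": "AES-DET", "keySize": 128},
    }
}

# Mapping section copied from Figure 4.6b.
MAPPING = {
    "id": {"$ref": "#/definitions/ope128"},
    "name": {"$ref": "#/definitions/AES-DET"},
    "email": {"$ref": "#/definitions/AES-DET"},
    "salary": {"$ref": "#/definitions/ope256"},
    "ssn": {"$ref": "#/definitions/ope256"},
    "ccn": {"$ref": "#/definitions/ope256"},
    "balance": {"$ref": "#/definitions/ope256"},
}


def resolve_ref(root, ref):
    """Follow a local reference such as '#/definitions/ope128' inside root."""
    if not ref.startswith("#/"):
        raise ValueError(f"only local references are supported: {ref}")
    node = root
    for part in ref[2:].split("/"):
        node = node[part]
    return node


def modules_for_fields(root, mapping):
    """Return {field name: resolved module definition}."""
    return {field: resolve_ref(root, entry["$ref"]) for field, entry in mapping.items()}


for field, module in modules_for_fields(DEFINITIONS, MAPPING).items():
    print(field, "->", module["encryptionMethod"], module["keySize"])
```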
As outlined in Section 1, the method presented in this work can be easily extended to the other
NoSQL data models discussed in Section 2. Figure 4.7 shows how this extension from the KV to
the document store model can be carried out.
(a) Key-value model: Key1, ..., Keyn [cryptographic module z]; Value1, ..., Valuen [cryptographic modules 1, ..., n]
(b) Document store model: Collection name [cryptographic module x]; Document ID [cryptographic module y]; Key1, ..., Keyn [cryptographic module z]; Value1, ..., Valuen [cryptographic modules 1, ..., n]
Figure 4.7: SecureNoSQL applied to: (a) The key-value data model; Key1, ..., Keyn are all encrypted using the cryptographic module z, while the corresponding values Value1, ..., Valuen are encrypted with cryptographic modules 1, 2, ..., n, respectively. (b) The document store data model; the meta-data, such as the collection name, is encrypted, as are the attributes, with the assigned cryptographic modules.
Query and data validation. The proxy validates the data and the query, given as JSON-formatted
input, against the reference security plan. It then enforces the assigned crypto-primitives and
generates a new query that respects NoSQL query semantics; in this process it applies to each field
the cryptographic module specified in the mapping section of the schema. Finally, the proxy forwards
the new encrypted query/data to the NoSQL database server. Figure 4.8 depicts the schema validation
process.
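The step in which the proxy attaches the assigned module and key to every field can be sketched as follows. This is an illustration only: the function name `annotate` is hypothetical, the mapping follows Figure 4.6b, and the keys are the strings printed in Figure 4.9b. The result has the same shape as the validation output shown in that figure; no encryption is performed at this stage of the sketch.

```python
import json

# Assumed inputs: module names per field (Figure 4.6b) and keys (Figure 4.9b).
FIELD_TO_MODULE = {"id": "ope128", "name": "AES-DET", "balance": "ope256"}
KEY_STORE = {
    "ope128": "ADBDBC3B439DB495A81DA1BE56ACA",
    "AES-DET": "00112233445566778899aAbBcCdDeEfF",
    "ope256": "A75C644DF2E4EFE5328BB35E3C636",
}


def annotate(document, field_to_module, key_store):
    """Attach the assigned module name and key to every field of a document."""
    annotated = {}
    for field, value in document.items():
        module = field_to_module.get(field)
        if module is None:
            raise KeyError(f"no cryptographic module mapped to field {field!r}")
        annotated[field] = {
            "$encryption": {"encryptionMethod": module, "key": key_store[module]},
            "value": value,
        }
    return annotated


sample = {"id": 1, "name": "Mohammad Ahmadian", "balance": 1320}
print(json.dumps(annotate(sample, FIELD_TO_MODULE, KEY_STORE), indent=1))
```

Rejecting unmapped fields, rather than passing them through in plaintext, is a design choice of this sketch; it errs on the side of not forwarding unprotected data.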
JSON Data/Query + Security plan
  -> Validation of data elements (format matching)
  -> Extraction of encryption parameters
  -> Applying cryptomodules to the data and metadata
  -> Forward encrypted Data/Query to cloud NoSQL server
  -> NoSQL server
Figure 4.8: The validation process of input data against the security plan on the client side.
As an illustration, consider the listing in Figure 4.9a as input data; running the validation
process on it generates the output shown in Figure 4.9b. The output of the validation process is a
single file that contains descriptive information for the data and meta-data in the designed format
and is ready to execute on SecureNoSQL. As noted earlier in Section 3.5, the data overhead of the
proposed scheme is proportional to the desired security level, which is explicitly expressed in the
security plan of each database. Table 4.1 compares the data overhead for several parameter choices
of the crypto-primitives.
Table 4.1: Overhead of encryption as a function of the security level
Database     Plain   OPE64   OPE128   OPE256   OPE512
Size (MB)    170     430     508      662      1000
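The sizes in Table 4.1 translate directly into expansion factors relative to the plain database. The short computation below uses only the numbers from the table; the function name is an illustrative choice.

```python
# Database sizes in MB, copied from Table 4.1.
SIZES_MB = {"Plain": 170, "OPE64": 430, "OPE128": 508, "OPE256": 662, "OPE512": 1000}


def expansion_factors(sizes, baseline="Plain"):
    """Return the size of each encrypted database relative to the baseline."""
    base = sizes[baseline]
    return {name: round(size / base, 2) for name, size in sizes.items() if name != baseline}


for name, factor in expansion_factors(SIZES_MB).items():
    print(f"{name}: {factor}x the plain database")
```

The factors grow from about 2.5x for OPE64 to about 5.9x for OPE512, which quantifies the statement that the data overhead increases with the desired security level.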
4.1.3 Processing Queries On Encrypted Data
According to the proposed scheme, in order to process queries over encrypted data, the queries must
be translated into their encrypted version with respect to the security plan; this task is carried
out by our secure proxy. The security plan provides the cryptographic modules to be applied
to the different fields of a query. Figure 4.10 displays the processing and rewriting of a sample query.
{"id": 1,
 "name": "Mohammad Ahmadian",
 "email": "[email protected]",
 "salary": 17000,
 "ssn": 433042664,
 "ccn": "47162552387",
 "balance": 1320}
(a)
{"id": {"$encryption": {"encryptionMethod": "ope128",
                        "key": "ADBDBC3B439DB495A81DA1BE56ACA"},
        "value": 1},
 "name": {"$encryption": {"encryptionMethod": "AES-DET",
                          "key": "00112233445566778899aAbBcCdDeEfF"},
          "value": "Mohammad Ahmadian"},
 "email": {"$encryption": {"encryptionMethod": "AES-DET",
                           "key": "00112233445566778899aAbBcCdDeEfF"},
           "value": "[email protected]"},
 "balance": {"$encryption": {"encryptionMethod": "ope256",
                             "key": "A75C644DF2E4EFE5328BB35E3C636"},
             "value": 1320}}
(b)
Figure 4.9: Security plan applied to a sample input: (a) Sample input data matching the data element section of the sample security plan. (b) Output of JSON data validation for the sample database.
To illustrate query encryption, Table 4.2 lists some sample queries and their encrypted versions
after the security plan is enforced. As can be seen, data elements and immediate values are
encrypted, yet the output remains consistent with NoSQL semantics.
(a) and( ≥(salary, 5000), ≤(balance, 2000) )
(b) and( ≥(9mnGu8Q2VDstE+T9jFw2wQ==, 3986410786398723978941641627711702), ≤(5pgAxn6BF08WtM7zyuYaKg==, 161374267674800082431533686937402) )
Figure 4.10: The query db.customers.find({salary:{$gt:5000}, balance:{$lt:2000}}) received from an application. (a) The parse tree of the query. (b) The cryptographic modules applied to the data elements according to the schema definition.
Table 4.2: Sample queries and their corresponding encrypted versions
1  Query:     db.customers.find({ssn:936136916})
   Encrypted: db["k/IevnbanDMQHNkb9cRgUg=="].find({"5pgAxn6BF08WtM7zyuYaKg==":74172405478441908041711118833862143778})
2  Query:     db.customers.find({balance:{$gte:5084610},balance:{$lte:9911843}})
   Encrypted: db["k/IevnbanDMQHNkb9cRgUg=="].find({"3iXpo2l8xZpW7J7TezFdeA==":{$gte:402982988013604629517872370128473753},"3iXpo2l8xZpW7J7TezFdeA==":{$lte:785596355698717592780268633369454231}})
3  Query:     db.customers.aggregate([{$group:{_id:null,minBalance:{$min:"$balance"}}}])
   Encrypted: db["k/IevnbanDMQHNkb9cRgUg=="].aggregate([{$group:{_id:null,EncMinBalance:{$min:"$3iXpo2l8xZpW7J7TezFdeA=="}}}])
4  Query:     db.customers.aggregate([{$group:{_id:null,maxBalance:{$max:"$balance"}}}])
   Encrypted: db["k/IevnbanDMQHNkb9cRgUg=="].aggregate([{$group:{_id:null,EncmaxBalance:{$max:"$3iXpo2l8xZpW7J7TezFdeA=="}}}])
5  Query:     db.customers.find({$or:[{salary:{$gt:516046}},{balance:{$lt:285462}}]})
   Encrypted: db["k/IevnbanDMQHNkb9cRgUg=="].find({$or:[{"9mnGu8Q2VDstE+T9jFw2wQ==":{$gt:40994186216785746613193244129885849}},{"3iXpo2l8xZpW7J7TezFdeA==":{$lt:22657430453144634679791167652174833}}]})
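The rewriting illustrated in Table 4.2 leaves operators such as $or, $gt and $lt untouched and replaces only field names and literal values. The sketch below reproduces that structure with stand-ins: `enc_name` is an HMAC-based deterministic placeholder (it is not AES-DET), and `toy_ope` is a strictly increasing affine map (it is not a secure OPE scheme). Both functions, their keys and their constants are assumptions made purely for illustration.

```python
import hashlib
import hmac


def enc_name(name, key=b"demo-name-key"):
    """Deterministic stand-in for AES-DET on field names (NOT real AES)."""
    return hmac.new(key, name.encode(), hashlib.sha256).hexdigest()[:16]


def toy_ope(value, slope=7919, offset=104729):
    """Strictly increasing map on integers; a toy stand-in for OPE, not secure."""
    return slope * value + offset


def rewrite(query):
    """Rewrite a MongoDB-style filter: encrypt names and literals, keep operators."""
    if isinstance(query, list):
        return [rewrite(item) for item in query]
    if isinstance(query, dict):
        rewritten = {}
        for key, val in query.items():
            new_key = key if key.startswith("$") else enc_name(key)
            rewritten[new_key] = rewrite(val)
        return rewritten
    if isinstance(query, int):
        return toy_ope(query)
    return query


plain = {"$or": [{"salary": {"$gt": 516046}}, {"balance": {"$lt": 285462}}]}
encrypted = rewrite(plain)
print(encrypted)
```

Because the stand-in for OPE is monotone, a comparison such as salary > 516046 holds on plaintext exactly when the corresponding comparison holds on the transformed values, which is why the server can evaluate range predicates without decrypting.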
4.1.4 Measurements And Experimental Results
The experiments that measure the query time must be carefully designed. To obtain the average
query processing time, each experiment has to be repeated several times. We noticed a significant
reduction of the database management response time after the first execution of a query, a sign that
MongoDB is optimized and caches the results of the most recent queries. A solution is to disable
the cache or, if this is not feasible, to clear the cache before repeating the query. Another important
observation is that modern processors have a 64-bit architecture and are optimized for operations
on 64-bit integers. This explains why, for three of the five types of queries, Q2 (range), Q3
(equality), and Q4 (logical), the database response time is slightly shorter for the encrypted
database than for the unencrypted one when the keys are 32-bit integers.
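The measurement procedure described above (repeat each query, and clear the cache before every repetition) can be sketched as a small timing harness. The callables `run_query` and `clear_cache` are placeholders to be supplied by the experimenter; how the cache of a particular database is actually cleared is outside the scope of this sketch.

```python
import statistics
import time


def average_query_time(run_query, repetitions=10, clear_cache=None):
    """Return (mean, standard deviation) of the wall-clock time in seconds.

    At least two repetitions are needed for the standard deviation.
    """
    samples = []
    for _ in range(repetitions):
        if clear_cache is not None:
            clear_cache()  # start every repetition from a cold cache
        start = time.perf_counter()
        run_query()
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples), statistics.stdev(samples)


# Usage with a dummy workload standing in for a database query.
mean, stdev = average_query_time(lambda: sum(range(10_000)), repetitions=5)
print(f"mean={mean:.6f}s stdev={stdev:.6f}s")
```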
[Bar chart. x-axis: query type (Comparison, Equality, Range, Logical, Aggregation); y-axis: query processing time (microseconds), ticks from 300 to 700; series: 32 bit, 64 bit, 128 bit, 256 bit, 512 bit.]
Figure 4.11: Query processing time in milliseconds (ms) for the unencrypted database and for the encrypted databases when the 32-bit keys are encrypted as 64, 128, 256 and 512-bit integers.
Our measurements show that the response time of the NoSQL database management system on
encrypted data depends on the type of the query. The shortest and longest database response times
occur for Q1 (comparison) and Q5 (aggregation), respectively; between these two extremes the
time for the unencrypted database almost doubles, but the time for the encrypted databases increases
by only 70–80%. As expected, the query processing time for a given type of query increases when the
key length increases from 64 to 128, 256, and 512 bits, but only slightly, by less than 5%. Also as
expected, the OPE encryption time increases significantly with the size of the encryption space; it
increases almost tenfold when the size of the encrypted output increases from 64 bits to 1024 bits,
and it is about 10 ms for 256 bits. The decryption time is considerably smaller; it increases only
slightly, from 0.11 ms to 0.17 ms, when the size of the encrypted key increases from 64 bits to 1024 bits.
The secure proxy is an important element of the proposed architecture; therefore, the potential
attacks that could affect the proxy should also be taken into consideration. In general, the two major
possible attacks on the proxy are Denial of Service (DoS) and unauthorized access. In a DoS attack,
the attacker sends so much network traffic to the proxy that the system is not capable of processing
it within the expected time frame. Successful DoS attacks can turn the proxy into a bottleneck of the
system. In unauthorized access attacks, attackers use the proxy to mask their connections while
attacking different targets. To improve the security of the proxy against DoS attacks and reduce
their impact, there are different solutions, including blocking undesired packets or using
multiple proxies with load balancers. Moreover, to prevent unauthorized access attacks, access to
the proxy must require appropriate authorization. User authentication based on group
membership, with different authorizations per group, is a practical solution.
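Two of the countermeasures mentioned above, spreading requests over several proxies and authorizing by group membership, can be combined in a few lines. The sketch below is illustrative only: the group table, the class name `ProxyPool` and the round-robin policy are assumptions, not components of the described system.

```python
from itertools import cycle

# Hypothetical group-to-operation table; the proposal does not define one.
GROUP_PERMISSIONS = {
    "analysts": {"find"},
    "admins": {"find", "insert", "update"},
}


class ProxyPool:
    """Round-robin dispatcher in front of several secure proxies."""

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("at least one proxy is required")
        self._proxies = cycle(proxies)

    def route(self, group, operation):
        """Return the proxy that should serve an authorized request."""
        if operation not in GROUP_PERMISSIONS.get(group, set()):
            raise PermissionError(f"group {group!r} may not run {operation!r}")
        return next(self._proxies)


pool = ProxyPool(["proxy-1", "proxy-2"])
print(pool.route("analysts", "find"), pool.route("admins", "insert"))
```

Unauthorized requests are rejected before a proxy is selected, so they do not consume a slot in the rotation.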
4.2 Leakage Prevention In DBaaS
Encryption is a common practice to protect the privacy of data and queries, but encrypted data and
queries are still vulnerable to information leakage in a cloud platform. A database can be encrypted
by the data owner before being outsourced to the cloud in such a way that client queries can still be
processed on the transformed data. However, encryption does not hide all information about
the encrypted data. For instance, the collection name (or table name in an RDBMS), the field names,
the number of fields involved in a query, and their lengths often reveal sensitive information about
the encrypted data. Moreover, a cloud insider can infer sensitive information from a sequence of
queries. This type of attack on an encrypted database belongs to the information leakage class. An
outsourced encrypted data set should leak as little sensitive information as possible. An acceptable
level of security for searchable encryption can be achieved with the proposed scheme. To study
information leakage in the DBaaS model, we choose the NoSQL database model with its flexible schema.
In the NoSQL data model, a database is represented as a collection of documents C = {d1, d2, ..., dn},
and a document is modeled as a set of key-value pairs {ki, vi}, each of which represents
an attribute of an object.
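To make the notion of leakage concrete, the sketch below computes what an observer without any key can still read off a collection C of encrypted documents: the number of fields per document and the ciphertexts that repeat under deterministic encryption. The function name, the field labels and the ciphertext strings are invented for this illustration, and the summary is not the leakage metric proposed in this work.

```python
from collections import Counter


def observable_profile(encrypted_docs):
    """Summarize what is visible in a collection without any decryption key."""
    fields_per_document = Counter(len(doc) for doc in encrypted_docs)
    pair_frequency = Counter(
        (field, value) for doc in encrypted_docs for field, value in doc.items()
    )
    # Deterministic encryption maps equal plaintexts to equal ciphertexts,
    # so repeated (field, ciphertext) pairs reveal equality of hidden values.
    repeated = {pair: count for pair, count in pair_frequency.items() if count > 1}
    return {
        "fields_per_document": dict(fields_per_document),
        "repeated_ciphertexts": repeated,
    }


# Made-up encrypted collection: field names and values are opaque strings.
docs = [
    {"f1": "c9a1", "f2": "77b0"},
    {"f1": "c9a1", "f2": "1e42", "f3": "aa05"},
    {"f1": "03dd", "f2": "77b0"},
]
profile = observable_profile(docs)
print(profile)
```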
4.2.1 Problem Statement
We assume that the data is fully or partially encrypted before being outsourced to the CSP. However,
fully or partially encrypted databases in the cloud are at risk of information leakage in the
presence of a malicious cloud insider, who could potentially pool all databases and extract sensitive
information from correlations between the various hosted databases. This work characterizes the most
common sources of information leakage from encrypted NoSQL databases. We propose and analyze
a secure query processing system with minimum information leakage in an untrusted cloud.
A metric to quantify the information leakage is also introduced. This work is currently in
progress, and the experimental results will be presented in the dissertation.
CHAPTER 5: CONCLUSION
We presented a novel searchable security scheme over encrypted NoSQL databases that protects
sensitive information in the presence of two important threats confronting database-backed
applications. The proposed scheme meets all design objectives with respect to three principles:
i. Running queries efficiently over encrypted data using a novel JSON-aware encryption
strategy. The evaluation on a large trace of queries from a variety of databases running on the cloud
DBaaS shows that SecureNoSQL can support search operations over encrypted NoSQL data. The
throughput penalty of SecureNoSQL is modest, resulting in a 14–25% reduction in query processing
performance as compared to the plain database. ii. The application security plan, a novel notion
introduced in this work, automates the configuration of security parameters to enforce the security
policy on the database and the relevant queries. Intuitively, the lifetime of an encryption key is
shorter than that of an encryption algorithm, and we expect key changes to happen more frequently
than changes of the cryptosystem itself. By using the designed descriptive language, the data owner
conveys the security parameters to the secure proxy with minimum effort. iii. Our security analysis
shows that SecureNoSQL protects the most sensitive attributes of a collection with highly secure
encryption schemes for a variety of applications. Furthermore, the server application is kept
unmodified and the user is never involved in the complexity of the security measures.
The secure proxy is a critical component of the system; it is multi-threaded, and its cache
management is non-trivial. The management of the security attributes is rather involved. On the other
hand, a proxy integrated in the client-side software can be light-weight and considerably simpler.
We are currently implementing the two versions of the proxy. Experimental results for multiple
large datasets with up to one million documents show that SecureNoSQL is rather efficient. Our
approach can be extended to a multi-proxy structure for big data applications. We are now
implementing a mechanism based on PAXOS [32, 37] for maintaining the consistency of the hash values
database in the proxies' datasets. Outsourcing encrypted data sets to a third party such as a
cloud environment provides a good level of security; however, encrypted queries and data are still
vulnerable to data leakage in the cloud platform. Encryption does not hide all information
about the encrypted data, and this is a new area for research and investigation in future work. We
introduced novel techniques to protect encrypted data sets and prevent a malicious insider from
discovering implicit information, especially through cross-referencing attacks. The proposed method
introduces a data overhead that is proportional to the desired security level.
5.1 Work In Progress And Tasks Time Table
My research work in progress is on leakage prevention for both plaintext and ciphertext databases
hosted by DBaaS. We propose a solution to this problem that uses data encryption as the primary
approach to protect sensitive information from intruders and malicious insiders. In the rest
of this research, we will implement the proposed algorithm on a real-world cloud service and on NoSQL
databases hosted by DBaaS. The tasks completed so far are shown as blue bars and the tasks
in progress as red bars in Figure 5.1. The titles of all papers resulting from this research
work are listed in Table 5.1.
[Gantt chart. Time axis: months 1 to 12 of 2013, 2014, 2015 and 2016. Rows: Study; Published papers (Paper 1, Paper 2); Submitted papers (Paper 3, Paper 4, Paper 5); Ready to submit (Paper 6); Revision of papers (Revising the Papers).]
Figure 5.1: Estimated work plan and timeline
Table 5.1: List of publications
Paper 1
  Title: Security of Applications Involving Multiple Organizations-OPE in Hybrid Cloud Environments [4]
  Authorship: M.Ahmadian, A.Paya, D.Marinescu
  Journal or Conference: IEEE 28th International Parallel & Distributed Processing
  Status: Published (2014)
Paper 2
  Title: A security scheme for geographic information databases in location based systems [3]
  Authorship: M.Ahmadian, J.Kho., D.Marinescu
  Journal or Conference: IEEE SoutheastCon
  Status: Published (2015)
Paper 3
  Title: SecureNoSQL: An approach to secure search on encrypted NoSQL databases in public cloud [5]
  Authorship: M.Ahmadian, F.Plochan, Z.Roessler, D.Marinescu
  Journal or Conference: International Journal of Information Management (IJIM)
  Status: Published (2017)
Paper 4
  Title: An Analysis of Information Leakage due to Insider and some Outsider Attackers in Computer Clouds
  Authorship: M.Ahmadian, D.Marinescu
  Journal or Conference: Journal of Information Security and Applications
  Status: Under review
Paper 5
  Title: Secure Query Processing in Cloud NoSQL [2]
  Authorship: M.Ahmadian
  Journal or Conference: IEEE International Conference on Consumer Electronics
  Status: Published (2017)
Paper 6
  Title: On information leakage in cloud database services
  Authorship: M.Ahmadian, D.Marinescu
  Journal or Conference: Transaction of sustainable computation
  Status: Under review
5.2 Future Work
The current research will be continued in the following directions:
• Using multiple proxies in order to deal with a huge number of clients,
• Developing an efficient, fully homomorphic encryption for unlimited operations over the
encrypted data,
• Developing an encryption key management mechanism that periodically assigns new keys to the
cryptosystems in order to obtain higher levels of security.
LIST OF REFERENCES
[1] Amazon web services growth unrelenting. (last accessed 3rd May, 2016).
[2] M. Ahmadian. Secure query processing in cloud NoSQL. In 2017 IEEE International Conference on Consumer Electronics (ICCE), Las Vegas, USA, Jan. 2017.
[3] M. Ahmadian, J. Khodabandehloo, and D. Marinescu. A security scheme for geographic information databases in location based systems. IEEE SoutheastCon, pages 1–7, April 2015.
[4] M. Ahmadian, A. Paya, and D. Marinescu. Security of applications involving multiple organizations and order preserving encryption in hybrid cloud environments. IEEE International conf. on Parallel Distributed Processing Symposium Workshops (IPDPSW), pages 894–903, May 2014.
[5] M. Ahmadian, F. Plochan, Z. Roessler, and D. C. Marinescu. SecureNoSQL: An approach for secure search of encrypted nosql databases in the public cloud. International Journal of Information Management, 37(2):63–74, 2017.
[6] A. Boldyreva, N. Chenette, Y. Lee, and A. O'Neill. Order-preserving symmetric encryption. In Advances in Cryptology–EUROCRYPT 2009, pages 224–241. Springer, 2009.
[7] Z. Brakerski and V. Vaikuntanathan. Fully homomorphic encryption from ring-lwe and security for key dependent messages. Advances in Cryptology–CRYPTO, pages 505–524, 2011.
[8] Z. Brakerski and V. Vaikuntanathan. Efficient fully homomorphic encryption from (standard) lwe. SIAM Journal on Computing, 43(2):831–871, 2014.
[9] T. Bray. The javascript object notation (json) data interchange format. 2014.
[10] D. Cash, D. Hofheinz, E. Kiltz, and C. Peikert. Bonsai trees, or how to delegate a lattice basis. Journal of cryptology, 25(4):601–639, 2012.
[11] D. Cash, J. Jaeger, S. Jarecki, C. Jutla, H. Krawczyk, M.-C. Rosu, and M. Steiner. Dynamic searchable encryption in very-large databases: Data structures and implementation. Network and Distributed System Security Symposium (NDSS14), 2014.
[12] D. Cash, S. Jarecki, C. Jutla, H. Krawczyk, M.-C. Rosu, and M. Steiner. Highly-scalable searchable symmetric encryption with support for boolean queries. Advances in Cryptology–CRYPTO 2013, pages 353–373, 2013.
[13] R. Cattell. Scalable sql and nosql data stores. ACM SIGMOD Record, 39(4):12–27, 2011.
[14] F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber. Bigtable: A distributed storage system for structured data. ACM Transactions on Computer Systems (TOCS), 26(2):4, 2008.
[15] K. Chatzikokolakis, T. Chothia, and A. Guha. Statistical measurement of information leakage. In International Conference on Tools and Algorithms for the Construction and Analysis of Systems, pages 390–404. Springer, 2010.
[16] R. Chow, P. Golle, M. Jakobsson, E. Shi, J. Staddon, R. Masuoka, and J. Molina. Controlling data in the cloud: outsourcing computation without outsourcing control. Proc. of the ACM workshop on Cloud computing security, pages 85–90, 2009.
[17] B. F. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, and R. Sears. Benchmarking cloud serving systems with ycsb. In Proceedings of the 1st ACM symposium on Cloud computing, pages 143–154. ACM, 2010.
[18] C. Curino, E. P. Jones, R. A. Popa, N. Malviya, E. Wu, S. Madden, H. Balakrishnan, and N. Zeldovich. Relational cloud: A database-as-a-service for the cloud. 2011.
[19] J. Daemen and V. Rijmen. Aes proposal: Rijndael. 1999.
[20] L. Ducas and D. Micciancio. Fhew: Bootstrapping homomorphic encryption in less than a second. Advances in Cryptology–EUROCRYPT 2015, pages 617–640, 2015.
[21] S. Faber, S. Jarecki, H. Krawczyk, Q. Nguyen, M. Rosu, and M. Steiner. Rich queries on encrypted data: Beyond exact matches. In European Symposium on Research in Computer Security, pages 123–145. Springer, 2015.
[22] F. Galiegue and K. Zyp. Json schema: core definitions and terminology. Internet Engineering Task Force (IETF), 2013.
[23] J. Gantz and D. Reinsel. The digital universe in 2020: Big data, bigger digital shadows, and biggest growth in the far east. IDC iView: IDC Analyze the Future, 2007:1–16, 2012.
[24] C. Gentry. A fully homomorphic encryption scheme. PhD thesis, Stanford University, 2009.
[25] O. Goldreich and R. Ostrovsky. Software protection and simulation on oblivious rams. Journal of the ACM (JACM), 43(3):431–473, 1996.
[26] S. Gorbunov, V. Vaikuntanathan, and H. Wee. Attribute-based encryption for circuits. Proc. of the Forty-fifth Annual ACM Symposium on Theory of Computing, pages 545–554, 2013.
[27] H. Hacigumus, B. Iyer, and S. Mehrotra. Providing database as a service. In Data Engineering, 2002. Proceedings. 18th International Conference on, pages 29–38. IEEE, 2002.
[28] S. Halevi and V. Shoup. Algorithms in helib. CRYPTO–Advances in Cryptology, pages 554–571, 2014.
[29] H. Hu, J. Xu, C. Ren, and B. Choi. Processing private queries over untrusted data cloud through privacy homomorphism. In Data Engineering (ICDE), 2011 IEEE 27th International Conference on, pages 601–612. IEEE, 2011.
[30] M. Islam and M. Islam. An approach to provide security to unstructured big data. 8th International Conf. on Software, Knowledge, Information Management and Applications (SKIMA), pages 1–5, Dec 2014.
[31] M. Kuzu, M. S. Islam, and M. Kantarcioglu. Distributed search over encrypted big data. In Proceedings of the 5th ACM Conference on Data and Application Security and Privacy, CODASPY ’15, pages 271–278, New York, NY, USA, 2015. ACM.
[32] L. Lamport. Paxos made simple. ACM Sigact News, 32(4):18–25, 2001.
[33] N. Li, T. Li, and S. Venkatasubramanian. t-closeness: Privacy beyond k-anonymity and l-diversity. In IEEE 23rd International Conference on Data Engineering, pages 106–115, 2007.
[34] C. Liu, L. Zhu, M. Wang, and Y.-a. Tan. Search pattern leakage in searchable encryption: Attacks and new construction. Information Sciences, 265:176–188, 2014.
[35] H. Liu. Amazon data center size. published March, 13, 2012.
[36] F. Lombardi and R. D. Pietro. Secure virtualization for cloud computing. Journal of Network and Computer Applications, 34:1113–1122, 2011. Advanced Topics in Cloud Computing.
[37] D. C. Marinescu. Cloud computing: theory and practice. Newnes, 2013.
[38] C. Mavroforakis, N. Chenette, A. O’Neill, G. Kollios, and R. Canetti. Modular order-preserving encryption, revisited. Proc. of the 2015 ACM SIGMOD International Conf. on Management of Data, pages 763–777, 2015.
[39] D. Micciancio. Lattice-based cryptography. Encyclopedia of Cryptography and Security, pages 713–715, 2011.
[40] R. Ostrovsky. Efficient computation on oblivious rams. In Proceedings of the twenty-second annual ACM symposium on Theory of computing, pages 514–523. ACM, 1990.
[41] P. Paillier. Public-key cryptosystems based on composite degree residuosity classes. In Advances in Cryptology–EUROCRYPT ’99, pages 223–238. Springer, 1999.
[42] R. A. Popa, C. M. S. Redfield, N. Zeldovich, and H. Balakrishnan. Cryptdb: Protecting confidentiality with encrypted query processing. Proc. of the Twenty-Third ACM Symposium on Operating Systems Principles, pages 85–100, 2011.
[43] T. Ristenpart, E. Tromer, H. Shacham, and S. Savage. Hey, you, get off of my cloud: exploring information leakage in third-party compute clouds. In Proceedings of the 16th ACM conference on Computer and communications security, pages 199–212. ACM, 2009.
[44] S. Sivasubramanian. Amazon dynamodb: A seamlessly scalable non-relational database service. Proc. of ACM SIGMOD Int. Conf. on Management of Data, pages 729–730, 2012.
[45] D. X. Song, D. Wagner, and A. Perrig. Practical techniques for searches on encrypted data. Proc. IEEE Symposium on Security and Privacy, pages 44–55, 2000.
[46] M. Stonebraker. Sql databases v. nosql databases. Commun. ACM, 53(4):10–11, Apr. 2010.
[47] C. Tankard. Big data security. Network security, 2012(7):5–8, 2012.
[48] S. Tu, M. F. Kaashoek, S. Madden, and N. Zeldovich. Processing analytical queries over encrypted data. Proc. of the VLDB Endowment, 6(5):289–300, 2013.
[49] S. E. Whang and H. Garcia-Molina. Managing information leakage. 2010.
[50] L. Xu, C. Jiang, J. Wang, J. Yuan, and Y. Ren. Information security in big data: Privacy and data mining. Access, IEEE, 2:1149–1176, 2014.
[51] L. Xu, X. Zhang, X. Wu, and W. Shi. Abss: An attribute-based sanitizable signature for integrity of outsourced database with public cloud. Proc. of the 5th ACM Conf. on Data and Application Security and Privacy, pages 167–169, 2015.
[52] X. Yu and Q. Wen. A view about cloud data security from data life cycle. International Conf. on Computational Intelligence and Software Engineering (CiSE), pages 1–4, Dec 2010.