IEEE TRANSACTIONS ON CLOUD COMPUTING, VOL. 2, NO. 2, … › people › fpierazzi › ... · master...

13
Performance and Cost Evaluation of an Adaptive Encryption Architecture for Cloud Databases Luca Ferretti, Fabio Pierazzi, Michele Colajanni, and Mirco Marchetti Abstract—The cloud database as a service is a novel paradigm that can support several Internet-based applications, but its adoption requires the solution of information confidentiality problems. We propose a novel architecture for adaptive encryption of public cloud databases that offers an interesting alternative to the tradeoff between the required data confidentiality level and the flexibility of the cloud database structures at design time. We demonstrate the feasibility and performance of the proposed solution through a software prototype. Moreover, we propose an original cost model that is oriented to the evaluation of cloud database services in plain and encrypted instances and that takes into account the variability of cloud prices and tenant workloads during a medium-term period. Index Terms—Cloud database, confidentiality, encryption, adaptivity, cost model Ç 1 INTRODUCTION T HE cloud computing paradigm is successfully converg- ing as the fifth utility [1], but this positive trend is par- tially limited by concerns about information confidentiality [2] and unclear costs over a medium-long term [3], [4]. We are interested in the database as a service para- digm (DBaaS) [5] that poses several research challenges in terms of security and cost evaluation from a tenant’s point of view. Most results concerning encryption for cloud-based services [6], [7] are inapplicable to the data- base paradigm. Other encryption schemes that allow the execution of SQL operations over encrypted data either have performance limits [8] or require the choice of which encryption scheme must be adopted for each database column and SQL operation [9]. These latter proposals are fine when the set of queries can be statically determined at design time, while we are interested in other common scenarios where the workload may change after the data- base design. In this paper, we propose a novel architec- ture for adaptive encryption of public cloud databases that offers a proxy-free alternative to the system described in [10]. The proposed architecture guarantees in an adaptive way the best level of data confidentiality for any database workload, even when the set of SQL queries dynamically changes. The adaptive encryption scheme, which was initially proposed for applications not referring to the cloud, encrypts each plain column to mul- tiple encrypted columns, and each value is encapsulated in different layers of encryption, so that the outer layers guarantee higher confidentiality but support fewer com- putation capabilities with respect to the inner layers. The outer layers are dynamically adapted at runtime when new SQL operations are added to the workload. Although this adaptive encryption architecture is attrac- tive because it does not require to define at design time which database operations are allowed on each column, it poses novel issues in terms of applicability to a cloud con- text, and doubts about storage and network costs. We inves- tigate each of these issues and we reach three original conclusions in terms of prototype implementation, perfor- mance evaluation, and cost evaluation. We initially design the first proxy-free architecture for adaptive encryption of cloud databases that does not limit the availability, elasticity and scalability of a plain cloud database because multiple clients can issue concurrent oper- ations without passing through some centralized compo- nent as in alternative architectures [10]. Then, we evaluate the performance of encrypted database services by assum- ing the standard TPC-C benchmark as the workload and by considering different network latencies. Thanks to this testbed, we show that most performance overheads of adaptively encrypted cloud databases are masked by net- work latencies that are typical of a geographically distrib- uted cloud scenario. Finally, we propose the first analytical cost estimation model for evaluating cloud database costs in plaintext and encrypted configurations from a tenant’s point of view over a medium-term period. This model also considers the vari- ability of cloud prices and of the database workload during the evaluation period, and allows a tenant to observe how adaptive encryption influences the costs related to storage and network usage of a database service. By applying the model to several cloud provider offers and related prices, the tenant can choose the best compromise between the data confidentiality level and consequent costs in his period of interest. This paper is structured as following. Section 2 exam- ines related solutions for data confidentiality and cost estimation in cloud database services, and compares them against our proposal. Section 3 describes the proposed adaptive encryption architecture for cloud database The authors are with the Department of Engineering “Enzo Ferrari”, Uni- versity of Modena and Reggio Emilia, Modena, Italy. E-mail: {luca.ferretti, fabio.pierazzi, michele.colajanni, mirco.marchetti}@unimore.it. Manuscript received 15 Sept. 2013; revised 7 Mar. 2014; accepted 18 Mar. 2014. Date of publication 1 Apr. 2014; date of current version 30 July 2014. Recommended for acceptance by I. Bojanova, R.C.H. Hua, O. Rana, and M. Parashar. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference the Digital Object Identifier below. Digital Object Identifier no. 10.1109/TCC.2014.2314644 IEEE TRANSACTIONS ON CLOUD COMPUTING, VOL. 2, NO. 2, APRIL-JUNE 2014 143 2168-7161 ß 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

Transcript of IEEE TRANSACTIONS ON CLOUD COMPUTING, VOL. 2, NO. 2, … › people › fpierazzi › ... · master...

Page 1: IEEE TRANSACTIONS ON CLOUD COMPUTING, VOL. 2, NO. 2, … › people › fpierazzi › ... · master key is the encryption key of the encrypted metadata, and is known by legitimate

Performance and Cost Evaluation of an AdaptiveEncryption Architecture for Cloud Databases

Luca Ferretti, Fabio Pierazzi, Michele Colajanni, and Mirco Marchetti

Abstract—The cloud database as a service is a novel paradigm that can support several Internet-based applications, but its adoption

requires the solution of information confidentiality problems. We propose a novel architecture for adaptive encryption of public cloud

databases that offers an interesting alternative to the tradeoff between the required data confidentiality level and the flexibility of the

cloud database structures at design time. We demonstrate the feasibility and performance of the proposed solution through a software

prototype. Moreover, we propose an original cost model that is oriented to the evaluation of cloud database services in plain and

encrypted instances and that takes into account the variability of cloud prices and tenant workloads during a medium-term period.

Index Terms—Cloud database, confidentiality, encryption, adaptivity, cost model

Ç

1 INTRODUCTION

THE cloud computing paradigm is successfully converg-ing as the fifth utility [1], but this positive trend is par-

tially limited by concerns about information confidentiality[2] and unclear costs over a medium-long term [3], [4].

We are interested in the database as a service para-digm (DBaaS) [5] that poses several research challengesin terms of security and cost evaluation from a tenant’spoint of view. Most results concerning encryption forcloud-based services [6], [7] are inapplicable to the data-base paradigm. Other encryption schemes that allow theexecution of SQL operations over encrypted data eitherhave performance limits [8] or require the choice of whichencryption scheme must be adopted for each databasecolumn and SQL operation [9]. These latter proposals arefine when the set of queries can be statically determinedat design time, while we are interested in other commonscenarios where the workload may change after the data-base design. In this paper, we propose a novel architec-ture for adaptive encryption of public cloud databasesthat offers a proxy-free alternative to the systemdescribed in [10]. The proposed architecture guaranteesin an adaptive way the best level of data confidentialityfor any database workload, even when the set of SQLqueries dynamically changes. The adaptive encryptionscheme, which was initially proposed for applications notreferring to the cloud, encrypts each plain column to mul-tiple encrypted columns, and each value is encapsulatedin different layers of encryption, so that the outer layersguarantee higher confidentiality but support fewer com-putation capabilities with respect to the inner layers. The

outer layers are dynamically adapted at runtime whennew SQL operations are added to the workload.

Although this adaptive encryption architecture is attrac-tive because it does not require to define at design timewhich database operations are allowed on each column, itposes novel issues in terms of applicability to a cloud con-text, and doubts about storage and network costs. We inves-tigate each of these issues and we reach three originalconclusions in terms of prototype implementation, perfor-mance evaluation, and cost evaluation.

We initially design the first proxy-free architecture foradaptive encryption of cloud databases that does not limitthe availability, elasticity and scalability of a plain clouddatabase because multiple clients can issue concurrent oper-ations without passing through some centralized compo-nent as in alternative architectures [10]. Then, we evaluatethe performance of encrypted database services by assum-ing the standard TPC-C benchmark as the workload and byconsidering different network latencies. Thanks to thistestbed, we show that most performance overheads ofadaptively encrypted cloud databases are masked by net-work latencies that are typical of a geographically distrib-uted cloud scenario.

Finally, we propose the first analytical cost estimationmodel for evaluating cloud database costs in plaintext andencrypted configurations from a tenant’s point of view overa medium-term period. This model also considers the vari-ability of cloud prices and of the database workload duringthe evaluation period, and allows a tenant to observe howadaptive encryption influences the costs related to storageand network usage of a database service. By applying themodel to several cloud provider offers and related prices,the tenant can choose the best compromise between thedata confidentiality level and consequent costs in his periodof interest.

This paper is structured as following. Section 2 exam-ines related solutions for data confidentiality and costestimation in cloud database services, and compares themagainst our proposal. Section 3 describes the proposedadaptive encryption architecture for cloud database

� The authors are with the Department of Engineering “Enzo Ferrari”, Uni-versity of Modena and Reggio Emilia, Modena, Italy. E-mail: {luca.ferretti,fabio.pierazzi, michele.colajanni, mirco.marchetti}@unimore.it.

Manuscript received 15 Sept. 2013; revised 7 Mar. 2014; accepted 18 Mar.2014. Date of publication 1 Apr. 2014; date of current version 30 July 2014.Recommended for acceptance by I. Bojanova, R.C.H. Hua, O. Rana, andM. Parashar.For information on obtaining reprints of this article, please send e-mail to:[email protected], and reference the Digital Object Identifier below.Digital Object Identifier no. 10.1109/TCC.2014.2314644

IEEE TRANSACTIONS ON CLOUD COMPUTING, VOL. 2, NO. 2, APRIL-JUNE 2014 143

2168-7161� 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

Page 2: IEEE TRANSACTIONS ON CLOUD COMPUTING, VOL. 2, NO. 2, … › people › fpierazzi › ... · master key is the encryption key of the encrypted metadata, and is known by legitimate

services. Section 4 proposes the analytical cost model forthe estimation of database service costs in a medium termwhere it is likely that workloads and cloud prices change.Section 5 presents experimental evaluations for differentnetwork scenarios, workload models and number of cli-ents. Section 6 reports the results of the cost modelapplied to real cloud database providers over a threeyears horizon that is a typical view for tenant’s invest-ments. Section 7 outlines main conclusions and possibledirections for further research.

2 RELATED WORK

Improving the confidentiality of information stored in clouddatabases represents an important contribution to the adop-tion of the cloud as the fifth utility because it addressesmost user concerns. Our proposal is characterized by twomain contributions to the state of the art: architecture andcost model.

Although data encryption seems the most intuitive solu-tion for confidentiality, its application to cloud databaseservices is not trivial, because the cloud database must beable to execute SQL operations directly over encrypted datawithout accessing any decryption key. Na€ıve solutionsencrypt the whole database through some standard encryp-tion algorithms that do not allow to execute any SQL opera-tion directly on the cloud. As a consequence, the tenant hastwo alternatives: download the entire database, decrypt it,execute the query and, if the operation modifies the data-base, encrypt and upload the new data; decrypt temporarilythe cloud database, execute the query, and re-encrypt it.The former solution is affected by huge communication andcomputation overheads, and consequent costs that wouldmake cloud database services quite inconvenient; the lattersolution does not guarantee data confidentiality because thecloud provider obtains decryption keys.

The right alternative is to execute SQL operations directlyon the cloud database, without giving decryption keys tothe provider. An initial solution presented in [5] is based ondata aggregation techniques [8], that associate plaintextmetadata to sets of encrypted data. However, plaintextmetadata may leak sensitive information and data aggrega-tion introduces unnecessary network overheads.

The use of fully homomorphic encryption [11] would guar-antee the execution of any operation over encrypted data,but existing implementations are affected by huge compu-tational costs to the extent that the execution of SQL opera-tions over a cloud database would become impractical.Other encryption algorithms characterized by acceptablecomputational complexity support a subset of SQL opera-tors [12], [13], [14]. For example, an encryption algorithmmay support the order comparison command [12], but nota search operator [14]. The drawback related to these feasi-ble encryption algorithms is that in a medium-long termhorizon, the database administrator cannot know at designtime which database operations will be required over eachdatabase column. This issue is in part addressed in [10] byproposing an adaptive encryption architecture that isfounded on an intermediate and trusted proxy. This ten-ant’s component, which mediates all the interactionsbetween the clients and a possibly untrusted DBMS server,is fine for a locally distributed architecture but it cannot be

applied to a cloud context. Indeed, any centralized compo-nent at the tenant side reduces the scalability and avail-ability that are among the most important features ofcloud services. A solution to this problem was presentedin [9]: the proposed architecture allows multiple clients toissue concurrent SQL operations to an encrypted databasewithout any intermediate trusted server, but it assumesthat the set of SQL operations does not change after thedatabase design. A first idea to integrate adaptive encryp-tion schemes with a proxy-free architecture was proposedby the same authors in [15]. This paper develops the initialdesign through a prototype implementation, novel experi-mental results and an original cost model.

Indeed, besides data confidentiality, unclear costs are amain concern for cloud tenants. To this purpose, we proposean analytical cost model and a usage estimation methodol-ogy that allow a tenant to estimate the costs deriving fromcloud database services characterized by plain, encryptedand adaptively encrypted databases over a medium-termhorizon during which it is likely that both the databaseworkload and the cloud prices change. This model isanother original contribution of this paper because previousresearch focuses on the costs of cloud computing from aprovider’s perspective [16], [17]. For example, the authors in[16] outline the problems related to the cost estimation of acloud data center, such as servers, power consumption, andinfrastructures, but they do not propose an analytical costestimation model. CloudSim [18] can help a provider to esti-mate performance and resource consumptions of one ormultiple cloud data center alternatives.

This paper has a focus on database services and takes anopposite direction by evaluating the cloud service costsfrom a tenant’s point of view. This approach is quite origi-nal because related papers evaluate the pros and cons ofporting scientific applications to a cloud platform, such as[4] focusing on specific astronomy software and a specificcloud provider (Amazon), and [3] presenting a composablecost estimation model for some classes of scientific applica-tions. Besides the focus on a different context (scientific ver-sus database applications), the proposed model can beapplied to any cloud database service provider, and it takesinto account that over a medium-term period the databaseworkload and the cloud prices may vary.

3 ARCHITECTURE DESIGN

The proposed system supports adaptive encryption for pub-lic cloud database services, where distributed and concur-rent clients can issue direct SQL operations. By avoiding anarchitecture based on intermediate servers [10] between theclients and the cloud database, the proposed solution guar-antees the same level of scalability and availability of thecloud service. Fig. 1 shows a scheme of the proposed archi-tecture where each client executes an encryption engine thatmanages encryption operations. This software module isaccessed by external user applications through the encrypteddatabase interface. The proposed architecture manages fivetypes of information:

� plain data represent the tenant information;

� encrypted data are the encrypted version of the plaindata, and are stored in the cloud database;

144 IEEE TRANSACTIONS ON CLOUD COMPUTING, VOL. 2, NO. 2, APRIL-JUNE 2014

Page 3: IEEE TRANSACTIONS ON CLOUD COMPUTING, VOL. 2, NO. 2, … › people › fpierazzi › ... · master key is the encryption key of the encrypted metadata, and is known by legitimate

� plain metadata represent the additional informationthat is necessary to execute SQL operations onencrypted data;

� encrypted metadata are the encrypted version of theplain metadata, and are stored in the cloud database;

� master key is the encryption key of the encryptedmetadata, and is known by legitimate clients.

All data and metadata stored in the cloud database areencrypted. Any application running on a legitimate clientcan transparently issue SQL operations (e.g., SELECT,INSERT, UPDATE and DELETE) to the encrypted clouddatabase through the encrypted database interface. Datatransferred between the user application and the encryptionengine are not encrypted, whereas information is alwaysencrypted before sending it to the cloud database. When anapplication issues a new SQL operation, the encrypted data-base interface contacts the encryption engine that retrievesthe encrypted metadata and decrypts them with the masterkey. To improve performance, the plain metadata arecached locally by the client. After obtaining the metadata,the encryption engine is able to issue encrypted SQL state-ments to the cloud database, and then to decrypt the results.The results are returned to the user application through theencrypted database interface.

As in related literature, the proposed architecture guar-antees data confidentiality in a security model in which: thenetwork is untrusted; tenant users are trusted, that is, theydo not reveal information about plain data, plain metadata,and the master key; the cloud provider administrators aredefined semi-honest or honest-but-curious [19], that is, theydo not modify tenant’s data and results of SQL operations,but they may access tenant’s information stored in the clouddatabase. The remaining part of this section describes theadaptive encryption schemes (Section 3.1), the encryptedmetadata stored in the cloud database (Section 3.2), and themain operations for the management of the encrypted clouddatabase (Section 3.3).

3.1 Adaptive Encryption Schemes

We consider SQL-aware encryption algorithms that guaran-tee data confidentiality and allow the cloud database engineto execute SQL operations over encrypted data. As eachalgorithm supports a specific subset of SQL operators, werefer to the following encryption schemes.

� Random (Rand): it is the most secure encryptionbecause it does not reveal any information about theoriginal plain value (IND-CPA) [20], [21]. It does notsupport any SQL operator, and it is used only fordata retrieval.

� Deterministic (Det): it deterministically encrypts data,so that equality of plaintext data is preserved. It sup-ports the equality operator.

� Order Preserving Encryption (Ope) [12]: it preserves inthe encrypted values the numerical order of the orig-inal unencrypted data. It supports the comparisonSQL operators (i.e., ¼; < ;�; > ;�).

� Homomorphic Sum (Sum) [13]: it is homomorphicwith respect to the sum operation, so that the multi-plication of encrypted integers is equal to the sum ofplaintext integers. It supports the sum operatorbetween integer values.

� Search: it supports equality check on full strings (i.e.,the LIKE operator).

� Plain: it does not encrypt data, but it is useful to sup-port all SQL operators on non confidential data.

If each column of the database was encrypted with onlyone algorithm, then the database administrator would haveto decide at design time which operations must be sup-ported on each database column. However, this solution isimpractical for scenarios in which the database workloadchanges over time. As an example, if we consider a databasesupporting a web application for which features or securityupdates are released, data encryption prevents the applica-tion of any update that introduces new SQL operations thatwere not considered at database design time. Similarly,encryption prevents the execution of data analytics on theencrypted database and of user-defined queries that do notbelong to a fixed workload (e.g., because the database isqueried directly by tenant employees). This issue can beaddressed through adaptive encryption schemes that sup-port at runtime SQL operations while preserving the highestpossible data confidentiality level on the encrypted data. Tothis purpose, the encryption algorithms are organized intostructures called onions, where each onion is composed byan ordered set of encryption algorithms, called (encryption)layers [10]. Outer layers guarantee higher level of data confi-dentiality and support fewer operations on encrypted data.Hence, each onion supports a specific set of operations. Wedesign the following onions:

� Onion-Eq: it supports the equality operator, and inte-grates Plain, Det and Rand layers.

� Onion-Ord: it supports the comparison operators(i.e., ¼; < ;�; > ;�), and integrates Plain, Ope andRand layers.

� Onion-Sum: it supports the sum operator, and inte-grates Plain, Sum and Rand layers.

� Onion-Search: it supports the string equality operator(LIKE), and integrates the Plain, Search and Randlayers.

� Onion-Single-Layer: this is a special type of onion thatsupports only one encryption layer.

Each plaintext column is converted into one or moreencrypted columns, each one corresponding to an onion.Each plaintext value is encrypted through all the layers of

Fig. 1. Encrypted cloud database architecture.

FERRETTI ET AL.: PERFORMANCE AND COST EVALUATION OF AN ADAPTIVE ENCRYPTION ARCHITECTURE FOR CLOUD DATABASES 145

Page 4: IEEE TRANSACTIONS ON CLOUD COMPUTING, VOL. 2, NO. 2, … › people › fpierazzi › ... · master key is the encryption key of the encrypted metadata, and is known by legitimate

its onions. For example, the plaintext values associated withOnion-Eq are encrypted with Det, then the Det value isencrypted with Rand. The most external layer of an onion iscalled actual layer, which corresponds to its strongestencryption algorithm. The cloud database can only see theactual layer of the onions, and has no access to inner layersnor to plaintext data. The first time that a new SQL opera-tion is requested on a column, the outer layer of the appro-priate onion is dynamically removed at runtime throughthe adaptive layer removal mechanism that exposes a layersupporting the requested operations. This layer becomesthe new actual layer of the onion in the encrypted database.The layer removal mechanism is designed to ensure that thecloud provider can never access plaintext data. A detaileddescription is in Section 3.3.

Fig. 2 shows an example of the onions and layers struc-tures by considering two plaintext columns having datatypes int and varchar. The integer column is encrypted withOnion-Eq, Onion-Ord, and Onion-Sum, and the string col-umn is encrypted with Onion-Eq and Onion-Search. Eachonion represents a column in the encrypted database struc-ture. The actual layers of all the onions are set to Rand, thatguarantees the best data confidentiality level but it does notallow computations on encrypted data. When an equalitycheck is requested on the integer column the adaptive layerremoval mechanism removes the Rand layer of Onion-Eq,thus leaving Det as its new actual layer.

3.2 Metadata Structure

Metadata include all information that allows a legitimate cli-ent knowing the master key to execute SQL operations overan encrypted database. They are organized and stored attable-level granularity to reduce communication overheadfor retrieval, and to improve management of concurrentSQL operations [22]. We define all metadata informationassociated with a table as table metadata. Let us describe thestructure of a table metadata by referring to Fig. 3.

Table metadata include the correspondence between theplain table name and the encrypted table name because eachencrypted table name is randomly generated. Moreover, foreach column of the original plain table they also include aset of column metadata containing the name and the datatype of the corresponding plain column (e.g., integer, var-char, datetime). Each set of column metadata is associatedwith as many sets of onion metadata as the number of onionsassociated with the column. Onion metadata describe allthe encryption information about an onion and its layers,

hence they are organized in a data structure that containsthe following attributes:

� the encrypted name is the name of the encrypted col-umn (i.e., the onion) in the encrypted cloud database;

� the actual encryption layer is the name of the mostexternal layer of the encrypted data (e.g., Rand)stored in the column;

� the field confidentiality denotes which set of keysmust be used to encrypt a column data, becauseonly columns that share the same encryption keyscan be joined; we identify three types of field confi-dentiality parameters: self denotes a private set ofkeys for the column, multi-column identifies thesharing of the same set of keys among two columns,database imposes the same set of keys on all columnsof the same data type.

� the onion parameter identifies the type of onion thatis used to encrypt data (e.g., Onion-Eq).

Each set of onion metadata is associated with as manysets of layer metadata as the number of layers required by theonion type. Each set of layer metadata includes an encryp-tion key and a label identifying the corresponding encryp-tion algorithm. The set of encryption keys for each onion isgenerated according to the field confidentiality parameterimposed on each encrypted column.

3.3 Encrypted Database Management

We now describe the three main operations involved in theencrypted database management: database creation, execu-tion of SQL operations, and adaptive layer removal.

3.3.1 Database Creation

In the setup phase, the database administrator generates amaster key, and uses it to initialize the architecture metadata.The master key is then distributed to legitimate clients.Table creation requires the insertion of a new row in themetadata table. For each table creation, the administratoradds a column by specifying the column name, data type andconfidentiality parameters. These last data are the most impor-tant for this paper because they include the set of onions asso-ciated with the column, the starting layer denoting the actuallayer at creation time, and the field confidentiality of eachonion. If the administrator does not specify the confidential-ity parameters of a column, they are automatically chosenby the encryption engine with respect to some tenant’s pol-icy. Typically, the default policy assumes that each columnis associated with all the compatible onions, and the starting

Fig. 2. Example of onion structures.

Fig. 3. Metadata structure.

146 IEEE TRANSACTIONS ON CLOUD COMPUTING, VOL. 2, NO. 2, APRIL-JUNE 2014

Page 5: IEEE TRANSACTIONS ON CLOUD COMPUTING, VOL. 2, NO. 2, … › people › fpierazzi › ... · master key is the encryption key of the encrypted metadata, and is known by legitimate

layer of each onion is set to the strongest encryption algo-rithm. For example, integer columns are encrypted bydefault with Onion-Eq, Onion-Ord and Onion-Sum usingRand as the actual encryption layer (see Fig. 2).

3.3.2 Execution of SQL Operations

When a user/application wants to execute an operation onthe cloud database, the client encryption engine analyzesthe SQL command structure and identifies which tables,columns and SQL operators are involved. The client issuesa request for the table metadata for each involved table,and decrypts the metadata with the master key. Then, theclient determines whether the SQL operators are sup-ported by the actual layers of the onions associated withthe involved columns. If required, the client issues arequest for layer removal in order to support the SQLoperators at runtime. By using the information stored inthe table metadata, the client is able to encrypt the parame-ters of the SQL operations: tables and columns names, andconstant values. The client issues this new statement calledencrypted SQL operation to the cloud database which trans-parently executes it over encrypted data. The encryptedresults are decrypted using information contained in themetadata.

3.3.3 Adaptive Layer Removal

The adaptive layer removal is the process that dynamicallyremoves the external layer of an onion in order to adap-tively support SQL operations issued by legitimate clients.

Let us describe the details of the adaptive layer removalmechanism by referring to the following example. Weconsider a table T with columns id of type int and nameof type string, and a tenant client preparing to issue thefollowing statement to the encrypted cloud database:SELECT � FROM T WHERE id < 10. The client encryptionengine analyzes the SQL statement, and identifies that theoperation id < 10 has to be executed on the encrypteddatabase. Then, the client reads the metadata and checkswhether there is the Onion-Ord attribute associated withthe column id because this is the only onion supportingthe operator < . If the actual layer of Onion-Ord associ-ated with id is set to Rand, then the client dynamicallyinvokes a stored procedure on the cloud database thatremoves at runtime the Rand layer of Onion-Ord of thecolumn id, thus leaving the Ope layer exposed. The clientcan now encrypt the SELECT query that contains the oper-ation id < 10 and issue the encrypted query to theencrypted database, that executes it on the Ope layer ofOnion-Ord. Any new SQL operation involving an ordercomparison on the column id does not require to invokeagain the layer removal procedure because the actuallayer of Onion-Ord is Ope.

The cloud database can execute the adaptive layerremoval if and only if a legitimate client invokes the storedprocedure and gives to it the decryption key of the mostexternal encryption layer. As each layer has a differentencryption key, the data remain encrypted and the cloudprovider cannot access plaintext data. For security reasons,we also assume that the adaptive layer removal mechanismdoes never expose the Plain layer of an onion.

4 COST ESTIMATION OF CLOUD DATABASE

SERVICES

We consider a tenant that is interested in estimating the costof porting his database to a cloud platform. This porting is astrategic decision that must evaluate confidentiality issuesand related costs over a medium-long term. For these rea-sons, we propose a model that includes the overhead ofencryption schemes and the variability of database work-load and cloud prices. The proposed model is generalenough to be applied to the most popular cloud databaseservices, such as Amazon Relational Database Service [23],EnterpriseDB [24], Windows Azure SQL Database [25], andRackspace Cloud Database [26].

4.1 Cost Model

The cost of a cloud database service can be estimated as afunction of three main parameters:

Cost ¼ fðTime; Pricing; UsageÞ; (1)

where:

� Time identifies the time interval T for which the ten-ant requires the service.

� Pricing refers to the prices of the cloud provider forsubscription and resource usage; they typically tendto diminish during T [27].

� Usage denotes the total amount of resources used bythe tenant; it typically increases during T .

In order to detail the Pricing attribute, it is important tospecify that cloud providers adopt two subscription poli-cies: the on-demand policy allows a tenant to pay-per-useand to withdraw his subscription anytime; the reservationpolicy requires the tenant to commit in advance for a reser-vation period. Hence, we distinguish between billing coststhat depend on resource usage and reservation costs denotingadditional fees for commitment in exchange for lower pay-per-use prices [28]. Billing costs are billed periodically tothe tenant every billing period TB. Moreover, if the tenantadopts the reservation policy, the cloud provider requiresthe payment of the reservation cost at the beginning of eachreservation period TR. An example of the relationship amongT (three years), TR (one year) and TB (one month) is repre-sented in Fig. 4.

Pricing is the set of reservation prices and billing prices. Res-ervation prices fRrg refer to reservation periods, wherer ¼ ½1; . . . ; NR� and NR ¼ T=TR is the number of reservationperiods. Billing prices refer to billing periods b ¼ ½1; . . . ;NB�, where NB ¼ T=TB. Our model takes into account bill-ing prices for all the resources considered by the main pro-viders [28], [29], [30], [31], [32]: uptime price fphb g, storageprice fpsbg and network price fpnb g.

Fig. 4. Example of relationship among estimation (T ), reservation (TR)and billing (TB) periods.

FERRETTI ET AL.: PERFORMANCE AND COST EVALUATION OF AN ADAPTIVE ENCRYPTION ARCHITECTURE FOR CLOUD DATABASES 147

Page 6: IEEE TRANSACTIONS ON CLOUD COMPUTING, VOL. 2, NO. 2, … › people › fpierazzi › ... · master key is the encryption key of the encrypted metadata, and is known by legitimate

Usage represents the amount of resources consumed bythe tenant. It is defined as the set of uptime usage fhbg, storageusage fsbg and network usage fnbg.

The total cost of the cloud database service C can be esti-mated through the following equation:

C ¼XNR

r¼1

Rr þXNB

b¼1

�H�hb; p

hb

�þ S

�sb; p

sb

�þN

�nb; p

nb

��; (2)

where Hðhb; phb Þ is the uptime billing function, Sðsb; psbÞ is the

storage billing function and Nðnb; pnb Þ is the network billing

function. We observe that Rr represents fixed costs that donot vary with respect to Usage, while H;S and N repre-sent costs that vary with respect to database uptime, stor-age and network usage. Moreover, all prices may varyduring the estimation period T due to price adjustmentsapplied by the cloud provider. We observe that differentcloud providers apply different billing criteria, thus wecannot detail the billing functions without losing general-ity. We propose detailed price models for popular cloudproviders in Section 4.2.

4.2 Cloud Pricing Models

Popular cloud database providers adopt two different bill-ing functions, that we call linear L and tiered T . Let us con-sider a generic resource x. We define xb as its usage at theb-th billing period and pxb as its price. If the billing functionis linear, it can be computed as:

L�xb; p

xb

�¼ xb � pxb : (3)

If the billing function is tiered, the cloud provider appliesdifferent prices to different ranges of resource usage. Let usdefine Z as the number of tiers, and ½x1; . . . ; xZ�1� as the setof thresholds that define all the tiers. The price is modeledas a piecewise function:

pxb ðxÞ ¼pxb;1; x � x1

pxb;zþ1; xz < x � xzþ1; 1 � z < Z � 1pxb;Z; x � xZ�1;

8<: (4)

where ½pxb;1; . . . ; pxb;Z � represents the set of prices associatedwith the tiers. If the resource usage is lower than the firstthreshold (xb � x1), then the billing function is defined as:

T�xb; p

xb

�¼ xb � pxb;1: (5)

Otherwise, we denote as ~xz the highest threshold that islower than the usage xb. Then the billing function can bedefined as:

T�xb; p

xb

�¼ ðxb � ~xzÞ � pxb;zþ1 þ

Xz�1

j¼0

ðxjþ1 � xjÞ � pxb;jþ1: (6)

The uptime and the storage billing functions ofAmazonRDS [23] are linear, while the network usage is atiered billing function. On the other hand, the uptime billingfunctions of AzureSQL [25] is linear, while the storage andnetwork billing functions are tiered.

4.3 Usage Estimation

While the uptime (hb) is easily measurable, it is more diffi-cult to estimate the usage of storage (sb) and network (nb)because they depend on the database structure, the work-load and the use of encryption. We now propose a method-ology for the estimation of storage and network usage. Forclarity, we define sp; se; sa as the storage usage in the plain-text, encrypted, and adaptively encrypted databases for onebilling period. Similarly, np; ne; na represent network usagein the three configurations.

We assume that the tenant knows the database structureand the query workload, and that each column a 2 A storesra values. By denoting as vpa the average storage size of eachplaintext value stored in column a, we estimate the storageof the plaintext database sp as:

sp ¼Xa2A

�ra � vpa

�: (7)

This equation considers only the storage usage of tenantdata, and it does not take into account disk usage of thedatabase server, the operating system, and other necessarysoftware. This assumption holds because we consider cloudDBaaS services. An estimate of storage size for Infrastruc-ture as a Service should also include all these factors.

We define the query workload W as the set fðQ;vÞg,where each couple ðQ;vÞ represents a query Q and the fre-quency vwith which the query is executed. We assume thatv is normalized, hence it is expressed as a rational numberin the range ð0; 1� and the sum of all v is equal to 1.

The tenant can estimate the average network consump-tion of a query on the plaintext database through the follow-ing equation:

np ¼ k �X

ðQ;vÞ2W

�v �Xa2Q

vpa

�; (8)

where a 2 Q represents an attribute that is retrieved by thequery Q, and k is a corrective factor that takes into accountnetwork overheads caused by communication protocols.We do not model the value of k because it depends on net-work and applications protocols used by the cloud database(e.g., a PostgreSQL ODBC connection over SSL). Hence wepropose that the tenant evaluates it experimentally for aspecific software configuration. An experimental evaluationof k for our prototype is presented in Section 6.1.

Encrypting the content of a database increments its stor-age size because encryption algorithms expand the plaintextdata. We define F as the set of SQL-aware encryption algo-rithms available in the system [33] and f 2 F as one encryp-tion algorithm. It is then possible to compute the encryptedsize vea of the attribute a encrypted with the algorithm f as:

vea ¼ Ef

�vpa�; (9)

where Ef is a function that depends on the algorithm f.Table 1 summarizes the Ef functions for the encryptionalgorithms currently implemented in our prototype.

If the tenant does not use adaptive strategies (that is, eachcolumn is encrypted through only one SQL-aware encryp-tion algorithm) he can estimate the storage size se and thenetwork consumption ne of the encrypted database by

148 IEEE TRANSACTIONS ON CLOUD COMPUTING, VOL. 2, NO. 2, APRIL-JUNE 2014

Page 7: IEEE TRANSACTIONS ON CLOUD COMPUTING, VOL. 2, NO. 2, … › people › fpierazzi › ... · master key is the encryption key of the encrypted metadata, and is known by legitimate

replacing vpa with vea in Equations (7) and (8). If we consideradaptive encryption techniques, the storage overheadincreases, because multiple layers of encryption are stackedover each plaintext database column. To model storage andnetwork overheads for adaptively encrypted databases, wedefine u as the ordered set of encryption algorithms thatrepresents an onion. As an example, we may representOnion-Eq by using uE ¼ hfD;fRi, where fD;fR 2 F repre-sent Deterministic and Random encryption algorithms.

We can estimate the adaptively encrypted storage size vaaof attribute a through onion u as:

vaa ¼ F�vpa; u

�; (10)

where F is defined as the composition of the functions Ef

for all the algorithms f included in the ordered set u. Forexample, if we consider Onion-Eq uE :

F�vpa; uE

�¼ EfR EfD ¼ EfR

�EfD

�vpa��: (11)

Since each plaintext column a may be associated withmultiple onion-encrypted columns, we define Qa as a setof onions associated with a. As an example, let us estimatethe storage size of a plaintext bigint column encryptedthrough the set of onions Qa ¼ fuE; uO; uSg (Onion-Eq,Onion-Ord and Onion-Sum). Since the storage size of abigint is vpa ¼ 8 bytes, the corresponding encrypted storagesize can be computed as:

Fð8; uEÞ þ Fð8; uOÞ þ Fð8; uSÞ¼ Fð8; hfD;fRiÞ þ Fð8; hfO;fRiÞ þ Fð8; hfS;fRiÞ¼ 16þ 24þ 272 ¼ 312:

The tenant can estimate the storage size of the adaptivelyencrypted database sa by using Equations (7) and (11):

sa ¼Xa2A

ra �Xu2Qa

F�vpa; u

� !: (12)

Network consumption is related to the storage size ofadaptively encrypted data retrieved by the queries. Retriev-ing adaptively encrypted data related to a plaintext columna requires to choose one of the onions associated with a.Our design choice is to minimize the network overhead byselecting the onion with minimal storage overhead. Hence,we define F$

as:

F$�vpa�¼ min

�F�vpa; u

�j u 2 Qa

: (13)

The tenant can estimate the network consumption of theadaptively encrypted database na by replacing vpa withF$ðvpaÞ in Equation (8).

5 PERFORMANCE EVALUATION

This section aims to verify whether the overheads of adap-tive encryption represent an acceptable compromisebetween performance and data confidentiality for the ten-ants of cloud database services. To this purpose, we designa suite of performance tests that allow us to evaluate theimpact of encryption and adaptive encryption on responsetimes and throughput for different network latencies andfor increasing numbers of concurrent clients. The TPC-Cstandard benchmark is used as the workload model forthe database services. The experiments are carried out inEmulab [34], which provides us with a set of machines in acontrolled environment. Each client machine runs thePython client prototype of our architecture on a pc3000machine having a single 3 GHz processor, 2 GB of RAMand two 10,000 RPM 146 GB SCSI disks. The databaseserver is PostgreSQL 9.1 running on a d710 machine hav-ing a quad-core Xeon 2.4 GHz processor, 12 GB of RAMand a 7,200 RPM 500 GB SATA disk.

The current version of the prototype supports the mainSQL operations (SELECT, DELETE, INSERT and UPDATE)and the WHERE clause. We consider three TPC-C compli-ant databases having 10 warehouses:

� Plaintext (PLAIN) is based on plaintext data.

� Encrypted (ENC) refers to a statically encrypted data-base where each column is encrypted at design timewith only one encryption algorithm.

� Adaptively encrypted (ADAPT) refers to an encrypteddatabase in which each column is encrypted with allthe onions supported by its data type (Section 3.3).

In the ENC and ADAPT configurations each column isset to the highest encryption layer that supports the SQLoperations of the TPC-C workload. During each TPC-Ctest lasting for 300 seconds, we monitor the number ofexecuted TPC-C transactions, and the response times ofall the SQL operations from the standard TPC-C work-load. We repeat the test for each database configuration(PLAIN, ENC and ADAPT) for increasing number of cli-ents (from 5 to 20), and for increasing network latencies(from 0 to 120 ms). To guarantee data consistency thethree databases use repeatable read (snapshot) isolationlevel [35].

The experiments aim to evaluate the overhead caused bystatic and adaptive encryption in terms of system through-put and response time. In Figs. 5 and 6, we report the num-ber of committed TPC-C transactions per minute executedon the three cloud database configurations for 5 and 20concurrent clients, respectively. We can appreciate that inboth cases, and in all other results not reported for spacereasons, the throughput of the ENC database is close to thatof the PLAIN database. Moreover, as the network latencyincreases, even the performance of the ADAPT databasetends to that of the other two configurations, and it is quite

TABLE 1Functions Ef for Different Encryption Algorithms

FERRETTI ET AL.: PERFORMANCE AND COST EVALUATION OF AN ADAPTIVE ENCRYPTION ARCHITECTURE FOR CLOUD DATABASES 149

Page 8: IEEE TRANSACTIONS ON CLOUD COMPUTING, VOL. 2, NO. 2, … › people › fpierazzi › ... · master key is the encryption key of the encrypted metadata, and is known by legitimate

close to them for latencies higher than 60 ms, which are real-istic for typical cloud database scenarios. This is anextremely positive result because it demonstrates that adap-tive encryption can be realistically used for cloud databaseservices.

In Figs. 7 and 8, we consider the ENC and ADAPT con-figurations and we report the average encryption timesrequired by each SQL operation composing the TPC-Cworkload. The results are grouped on the basis of the fivetransactions of the TPC-C workload. Most SQL operationshave similar costs, but in the ADAPT configuration (Fig. 8)some operations require much higher encryption timeswith respect to ENC (Fig. 7). Further analyses demonstratethat the two peaks in ENC are related to SQL operationsrequiring Ope encryption, and that the majority of peaks inADAPT are related to INSERT operations combined withOpe encryption. This is due to the encryption time of Opethat is two or three orders of magnitude higher than that ofRand and Det [10]. In the ADAPT configuration, every inte-ger column is associated by default with Onion-Eq andOnion-Ord, which has an Ope layer. Hence, every insertion

of a value into an integer column requires an Ope encryp-tion, and this causes a significant increase of the overallencryption overhead. On the other hand, in the ENC config-uration the Ope layer is associated only with the columns inwhich the TPC-C workload requires support for the ordercomparison operators.

We now investigate the impact of network latency onresponse time with regard to the overhead caused by staticand adaptive encryption. We evaluate the response times ofthe most popular SELECT, DELETE, INSERT and UPDATEoperations of the TPC-C workload for different numbers ofclients. In Figs. 9 and 10 we report the response times over-heads of the SELECT and INSERT operations for 10 clientsas function of increasing network latencies. Overheads ofthe ENC and ADAPT configurations are measured withrespect to the PLAIN database response time. In Fig. 9, weobserve that the overhead of the SELECT query tends to bemasked for latencies higher than 60 ms in both the ADAPTand ENC configurations. This is an important conclusionbecause the results of the SELECT query are representativeof the performance related to the DELETE and UPDATEoperations. On the other hand, the INSERT operation is crit-ical in terms of overhead. As shown in Fig. 10, the ADAPTconfiguration has response times overheads much higher

Fig. 6. TPC-C throughput with 20 clients.

Fig. 7. Encryption times of the SQL operations composing the TPC-Cworkload (ENC configuration).

Fig. 8. Encryption times of the SQL operations composing the TPC-Cworkload (ADAPT configuration).

Fig. 9. SELECT response time overhead—10 clients.

Fig. 10. INSERT response time overhead—10 clients.

Fig. 5. TPC-C throughput with 5 clients.

150 IEEE TRANSACTIONS ON CLOUD COMPUTING, VOL. 2, NO. 2, APRIL-JUNE 2014

Page 9: IEEE TRANSACTIONS ON CLOUD COMPUTING, VOL. 2, NO. 2, … › people › fpierazzi › ... · master key is the encryption key of the encrypted metadata, and is known by legitimate

than those related to the ENC configuration for any networklatency. This result has a twofold motivation: the highernumber of encryptions required by the adaptive architec-ture to encrypt all parameters of the different onions; thehigh computational cost characterizing the Ope encryptionas shown in previous histograms.

All experimental results show that network latencieshigher than 60 ms, which are typical of most cloud databaseenvironments, make the adaptive encryption overheadalmost negligible when considering the overall set of opera-tions of the TPC-C benchmark. However, in the ADAPTconfiguration, for some SQL operations involving the Opeencryption or for the encryption of a high number of param-eters through several encryption layers (e.g., INSERT), theimpact on the response time is visible even for networklatencies higher than 120 ms.

If the workload is characterized by many INSERT opera-tions, we can conclude that it is a tenant’s duty to solve thetradeoff between accepting adaptive encryption overheadsand paying the costs related to an entire database re-encryption when workload changes in statically encrypteddatabases. It is likely that this tradeoff can be solved on thebasis of the expected variability of the workload. Possibleimprovements can be achieved by parallel encryption algo-rithms that can leverage multi-threading over differentcores, but this research is out of the scope of this paper.

On the positive hand, we observe that the presentedADAPT configuration represents a worst case scenariothat is fully adaptive, because all database columns areencrypted with all the onions supported by its data type.On the other hand, the ENC configuration represents abest case scenario that is completely static, because theuser manually defines the single encryption scheme touse on each database column. We observe that a tenantmay choose a partially adaptive configuration in which asubset of columns are encrypted with adaptive strategiesand others are statically encrypted. As a consequence,performance of adaptive encryption for many realisticworkloads fall between the ENC and ADAPT scenariospresented in this section.

6 COST EVALUATION

In this section we demonstrate the feasibility of the pro-posed cost model by applying it to PLAIN, ENC andADAPT configurations (see Section 5) in real cloud databaseservices. We initially validate the usage estimation method-ology presented in Section 4.3. We then analyze how costsvary for different cloud providers and resource usages. Wefinally evaluate tenant’s costs over a medium-term periodequal to three years by considering realistic resource usageincrements and cloud price reductions.

6.1 Validation of the Usage Estimation

To validate the usage estimation model, we perform severalexperiments based on the TPC-C benchmark.

First of all, we validate the storage usage estimationmodel. We deploy nine TPC-C compliant databases ofthree different sizes: 1, 5 and 10 warehouses (the numberof warehouses is the TPC-C parameter that influences theinitial database size). For each size, we generate three data-base configurations: PLAIN, ENC and ADAPT. Results aresummarized in Table 2. Estimated storage of PLAIN, ENCand ADAPT are calculated by using the analytical modelpresented in Section 4.3. For each estimated value, wereport the estimation error with respect to the measureddatabase size. Errors are expressed as a percentage. Weobserve that the proposed model always overestimates thedatabase size. However, errors show that estimations areclose to measured sizes. For PLAIN databases, the error isalways below 2%, while for ENC and ADAPT databasesthe error is always between 5% and 6%.

We then validate the network usage estimation model.We deploy PLAIN, ENC and ADAPT TPC-C compliantdatabases, each having 10 warehouses. We observe thatnetwork consumption is invariant with respect to the num-ber of warehouses, because it only depends on encryptionand query workload. We measure the network usage of thePLAIN database, and we obtain an average of 7,162 Bytesper transaction. By using Equation (8), we estimatenp ¼ k � 548. Hence, we determine k ¼ 13:07. Then we usethis value of k to determine the estimated network usage ofENC and ADAPT configurations. We compare these valueswith the experimentally measured network usages. Resultsare summarized in Table 3. Estimations are quite accurate,since we achieve errors of �1:2% and �1:4% for the ENCand ADAPT configurations, respectively.

The validation demonstrates the efficacy of the proposedanalytical usage estimation methodology in the TPC-Cworkload. Costs evaluations proposed in the following sec-tions are based on the same usage estimations.

6.2 Analysis of Cloud Database Costs

We analyze cloud database costs with respect to differentcloud provider offers and different storage and networkusages. We consider a billing period equal to one month,and 24/7 availability (730 uptime hours per month).

We initially estimate the monthly costs of a cloud data-base service in the PLAIN, ENC and ADAPT configurationswith respect to a plaintext storage usage of 100 GB and aplaintext network usage of 100 GB. In Table 4, we report theresults for the following cloud instances: Small, Large, andHigh Memory: Double Extra Large from Amazon RDS [28];Premium P1 and Premium P2 from SQL Azure [31].

TABLE 2Validation of Storage Usage of TPC-C Compliant Databases

TABLE 3Validation of Outgoing Network Usage

of TPC-C Compliant Databases

FERRETTI ET AL.: PERFORMANCE AND COST EVALUATION OF AN ADAPTIVE ENCRYPTION ARCHITECTURE FOR CLOUD DATABASES 151

Page 10: IEEE TRANSACTIONS ON CLOUD COMPUTING, VOL. 2, NO. 2, … › people › fpierazzi › ... · master key is the encryption key of the encrypted metadata, and is known by legitimate

The left part of Table 4 reports the storage prices ps, thenetwork prices pn, the total uptime costs H, and the annualreservation prices R reported as monthly costs. We observethat the storage and network prices do not change for differ-ent instances of the same cloud provider, whereas the reser-vation cost R and the uptime cost H may vary dependingon the chosen instance. The right part of Table 4 reports theestimations of the billing costs. For each instance, we esti-mate the monthly cost C of the PLAIN, ENC and ADAPTconfigurations, expressed in USD [$], and the influenceof storage cost S and of network cost N on the billing cost C(i.e., S=C and N =C) that are expressed as percentages. Theresults from Table 4 show the impact of static and adaptivedatabase encryption on the monthly billing costs for differ-ent instance classes.

It is interesting to observe that the cost differencesbetween plain and encrypted cloud databases tend to

remain constant for different instances of the same cloudprovider, because their unit prices ps and pn do not vary.On the other hand, higher reservation and uptime costsrelated to more powerful instances result in a reducedimpact of S and N on C. Since encryption affects onlystorage and network usages, the cost overhead due toencryption is partially masked in the case of higher reser-vation and uptime prices. For example, the cost overheadbetween the PLAIN and ADAPT configurations of Ama-zon RDS Large instance is 20:2%, whereas the sameoverhead in the Double Extra Large instance is only8:7%.

We now analyze how different network and storageusages affect the costs of PLAIN, ENC and ADAPT clouddatabases. In Table 5, we estimate the monthly costs ofAmazon RDS Large [28] and SQL Azure Premium P1 [31]for different combinations of plaintext storage and network

TABLE 4Cost Evaluation of Cloud Database Services for One Billing Period (Month)

with 24/7 Uptime, 100 GB of Plaintext Storage Size and 100 GB of Plaintext Network Usage

TABLE 5Monthly Costs Increases for Different Combinations of Plaintext Storage and Network Usages

152 IEEE TRANSACTIONS ON CLOUD COMPUTING, VOL. 2, NO. 2, APRIL-JUNE 2014

Page 11: IEEE TRANSACTIONS ON CLOUD COMPUTING, VOL. 2, NO. 2, … › people › fpierazzi › ... · master key is the encryption key of the encrypted metadata, and is known by legitimate

usages. The cloud prices related to these two instances arereported in the left part of Table 4. In Table 5 we report thebilling cost C of the PLAIN configuration, expressed in USD[$], and the estimated monthly cost overhead C% in the ENCand ADAPT configurations, expressed as percentage. Theresults of this table show that in Amazon RDS C% is alwayslower than 60 and 90 percent in the ENC and ADAPT con-figurations respectively, whereas in SQL Azure it is alwayslower than 30% (ENC) and 50% (ADAPT). In SQL AzurePremium P1 the values of C% are lower than those of Ama-zon RDS Large because in P1 the fixed costs H and R arehigher and the unit prices ps and pn are lower. This conclu-sion is valid even for other instances characterized by highreservation and uptime costs, which are not reported forspace reasons.

Moreover, we observe that for plaintext storage sizesequal to 1,000 GB, C% remains stable for increasing plaintextnetwork consumptions. This result is motivated by the factthat for higher database sizes, the storage cost S becomesdominant with respect to the network cost N , especially inthe ADAPT configuration. This stability of cost overheadsfor almost any network usage is a quite interesting resultbecause in real database workloads the network usage isusually characterized by unpredictable variability. This con-clusion can reduce the tenant concerns about the monthlybill for cloud services.

6.3 Cost Evaluation over a Medium-Term Period

We now consider an application of the proposed cost modelin a realistic scenario in which a tenant wants to estimatethe costs of moving his e-commerce database to the cloudfor the upcoming three years. In particular, we consider a

scenario in which the storage and network usages tend toincrease because of higher numbers of database operationsand/or customers, and the cloud prices tend to decreaseover a three years horizon. In this DYNAMIC scenario, weassume that the initial month for the cost estimation isMarch 2014. The tenant initial database size is 200 GB, andthe cloud database service is subject to the TPC-C workload[36], and the workload intensity in terms of committedtransactions per day is characterized by the trend shown inFig. 11. The average network usage determined by thisworkload is approximately 2 GB/day, with a peak of about15 GB/day during the holiday season beginning at Thanks-giving. The cloud prices are related to an Amazon RDSLarge instance [23], and we assume that the tenant canleverage the likely cloud price reductions. By assuming thatin the period of interest the Amazon RDS price reductionsof the past three years [27] repeat their trend in the upcom-ing three years, we represent the expected uptime (phb ), net-work (pnb ) and storage (psb) prices in Fig. 12.

The first analysis refers to a monthly workload increaseequal to 2% with respect to the trend shown in Fig. 11. Weobserve that the TPC-C standard defines a relation betweenthe number of committed transactions and the storagegrowth over time. According to this relation and the consid-ered workload, Fig. 13 reports the expected storage usagefor the upcoming three years referring to the PLAIN, ENCand ADAPT configurations. As expected, the storage usagesof the ENC and ADAPT configurations are higher than thatof the PLAIN configuration, and this difference tends toincrease. It is interesting to evaluate the final effect caused

Fig. 11. Workload trend in terms of committed transactions per day (andmonth) during a year for a typical e-commerce workload.

Fig. 12. Expected price reductions for Amazon RDS from March 2014 toFebruary 2017.

Fig. 13. Expected storage usage in the DYNAMIC (þ2%) scenario for theupcoming three years.

Fig. 14. Billing costs of the DYNAMIC (þ2%) scenario in the three yearsperiod.

FERRETTI ET AL.: PERFORMANCE AND COST EVALUATION OF AN ADAPTIVE ENCRYPTION ARCHITECTURE FOR CLOUD DATABASES 153

Page 12: IEEE TRANSACTIONS ON CLOUD COMPUTING, VOL. 2, NO. 2, … › people › fpierazzi › ... · master key is the encryption key of the encrypted metadata, and is known by legitimate

by the opposite contributions related to price reductionsand workload increase. By reporting the monthly billingcosts of PLAIN, ENC and ADAPT databases in Fig. 14, wecan observe that, despite a workload increase equal to 2 per-cent per month, the billing costs of ADAPT configurationremain approximately stable.

If we extend the cost evaluation of the three years periodto different monthly workload increases, then we can evalu-ate the so called break even point for each database configura-tion, that is, the value of monthly workload increase atwhich the annual costs of the cloud database service remainapproximately constant. In this evaluation we refer to theprice reductions reported in Fig. 12. By computing the breakeven point, the tenant can estimate whether the monthlyworkload increase will cause increments or decrements ofthe annual costs. Since PLAIN, ENC and ADAPT configura-tion determine different resource usages, the break evenpoint is different for each configuration.

An analysis of the break even points is presented inTable 6, in which we report the annual and triennial costswith respect to the PLAIN, ENC and ADAPT configura-tions and different monthly workload increases: 2%; 4%and 9%. Lines of the table with a gray background showannual costs that are almost constant for the three differ-ent configurations, hence they identify the monthly work-load increases that represent the break event points forADAPT, ENC and PLAIN. We observe that the breakeven points of the two encrypted configurations corre-spond to monthly workload increases that are lower thanthe break even point of the plaintext configuration. Thisoccurs because, even if the three configurations are sub-ject to the same workload, the actual resource usages ofENC and ADAPT are higher due to encryption and adap-tivity overheads. Just as a term of comparison, in Table 6we report also the STATIC scenario in which we assumethat during the three years period the storage size of thedatabase remains equal to 200 GB, the network usagedepends only on the trend of Fig. 11, and cloud prices donot vary with respect to the Amazon RDS Large prices atthe end of February 2014 (second line in Table 4). Wehighlight that the differences between costs estimations ofthe STATIC and DYNAMIC scenarios are higher forencrypted configurations, especially if adaptive encryp-tion strategies are used. A tenant can leverage the breakeven point analysis to quantify the annual and triennialsavings or major expenses when the monthly workloadincrease is below or over the break even point.

7 CONCLUSIONS

There are two main tenant concerns that may prevent theadoption of the cloud as the fifth utility: data confidentiality

and costs. This paper addresses both issues in the case ofcloud database services. These applications have not yetreceived adequate attention by the academic literature, butthey are of utmost importance if we consider that almost allimportant services are based on one or multiple databases.

We address the data confidentiality concerns by propos-ing a novel cloud database architecture that uses adaptiveencryption techniques with no intermediate servers. Thisscheme provides tenants with the best level of confidential-ity for any database workload that is likely to change in amedium-term period. We investigate the feasibility and per-formance of the proposed architecture through a large set ofexperiments based on a software prototype subject to theTPC-C standard benchmark. Our results demonstrate thatthe network latencies that are typical of cloud databaseenvironments hide most overheads related to static andadaptive encryption.

Moreover, we propose a model and a methodology thatallow a tenant to estimate the costs of plain and encryptedcloud database services even in the case of workload andcloud price variations in a medium-term horizon. By apply-ing the model to actual cloud provider prices, we can deter-mine the encryption and adaptive encryption costs for dataconfidentiality. Future research could evaluate the proposedor alternative architectures for multi-user key distributionschemes and under different threat model hypotheses.

REFERENCES

[1] R. Buyya, C. S. Yeo, S. Venugopal, J. Broberg, and I. Brandic,“Cloud computing and emerging it platforms: Vision, hype, andreality for delivering computing as the 5th utility,” Future Genera-tion Comput. Syst., vol. 25, no. 6, pp. 599–616, 2009.

[2] T. Mather, S. Kumaraswamy, and S. Latif, Cloud Security and Pri-vacy: An Enterprise Perspective on Risks and Compliance. Sebastopol,CA, USA: O’Reilly Media, Inc., 2009.

[3] H.-L. Truong and S. Dustdar, “Composable cost estimation andmonitoring for computational applications in cloud computingenvironments,” Procedia Comput. Sci., vol. 1, no. 1, pp. 2175–2184,2010.

[4] E. Deelman, G. Singh, M. Livny, B. Berriman, and J. Good, “Thecost of doing science on the cloud: The montage example,” inProc. ACM/IEEE Conf. Supercomputing, 2008, pp. 1–12.

[5] H. Hacig€um€us, B. Iyer, and S. Mehrotra, “Providing databaseas a service,” in Proc. 18th IEEE Int. Conf. Data Eng., Feb. 2002,pp. 29–38.

[6] G. Wang, Q. Liu, and J. Wu, “Hierarchical attribute-based encryp-tion for fine-grained access control in cloud storage services,” inProc. 17th ACM Conf. Comput. Commun. Security, 2010, pp. 735–737.

[7] Google. (2014, Mar.). Google Cloud Platform Storage with server-side encryption [Online]. Available: http://googlecloudplatform.blogspot.it/2013/08/google-cloud-storage-now-provides.html.

[8] H.Hacig€um€us, B. Iyer, C. Li, and S.Mehrotra, “Executing SQL overencrypted data in the database-service-provider model,” in Proc.ACMSIGMODInt’lConf.Manage.Data, Jun. 2002, pp. 216–227.

[9] L. Ferretti, M. Colajanni, and M. Marchetti, “Distributed, concur-rent, and independent access to encrypted cloud databases,” IEEETrans. Parallel Distrib. Syst., vol. 25, no. 2, pp. 437–446, Feb. 2014.

TABLE 6Costs of the Cloud Database Service during the Three Years Period in STATIC and DYNAMIC Scenarios

154 IEEE TRANSACTIONS ON CLOUD COMPUTING, VOL. 2, NO. 2, APRIL-JUNE 2014

Page 13: IEEE TRANSACTIONS ON CLOUD COMPUTING, VOL. 2, NO. 2, … › people › fpierazzi › ... · master key is the encryption key of the encrypted metadata, and is known by legitimate

[10] R. A. Popa, C. M. S. Redfield, N. Zeldovich, and H. Balakrishnan,“CryptDB: Protecting confidentiality with encrypted query proc-essing,” in Proc. 23rd ACM Symp. Operating Systems Principles, Oct.2011, pp. 85–100.

[11] C. Gentry, “Fully homomorphic encryption using ideallattices,” in Proc. 41st ACM Symp. Theory Comput., May. 2009,pp. 169–178.

[12] A. Boldyreva, N. Chenette, and A. O’Neill, “Order-preservingencryption revisited: Improved security analysis and alternativesolutions,” in Proc. 31st Annu. Int. Conf. Adv. Cryptology, Aug.2011, pp. 578–595.

[13] P. Paillier, “Public-key cryptosystems based on composite degreeresiduosity classes,” in Proc. 17th Int. Conf. Theory Appl. Crypto-graphic Tech., May 1999, pp. 223–238.

[14] D. Song, D. Wagner, and A. Perrig, “Practical techniques forsearches on encrypted data,” in Proc. IEEE Symp. Security Privacy,May 2000, pp. 44–55.

[15] L. Ferretti, F. Pierazzi, M. Colajanni, and M. Marchetti, “Securityand confidentiality solutions for public cloud database services,”presented at the 7th Int. Conf. Emerging Security Information,Systems and Technology, Barcelona, Spain, Aug. 2013.

[16] A. Greenberg, J. Hamilton, D. A. Maltz, and P. Patel, “The costof a cloud: Research problems in data center networks,” ACMSIGCOMM Comput. Commun. Rev., vol. 39, no. 1, pp. 68–73,Jan. 2008.

[17] L. Popa, S. Ratnasamy, G. Iannaccone, A. Krishnamurthy, andI. Stoica, “A cost comparison of data center networkarchitectures,” in Proc. ACM Int. Conf. Emerging Netw. Experi-ments Technol., 2010, pp. 16:1–16:12.

[18] R. N. Calheiros, R. Ranjan, A. Beloglazov, C. A. De Rose, and R.Buyya, “Cloudsim: A toolkit for modeling and simulation of cloudcomputing environments and evaluation of resource provisioningalgorithms,” Software: Practice and Experience, vol. 41, no. 1, pp. 23–50, 2011.

[19] O. Goldreich, Foundations of Cryptography: Volume 2, Basic Applica-tions. Cambridge, U.K.: Cambridge Univ. Press, 2004.

[20] J. Daemen and V. Rijmen, The Design of Rijndael: AES – TheAdvanced Encryption Standard. New York, NY, USA: Springer,2002.

[21] B. Schneier, “Description of a new variable-length key, 64-bitblock cipher (blowfish),” in Proc. Cambridge Security Workshop FastSoftw. Encryption, Dec. 1993, pp. 191–204.

[22] L. Ferretti, M. Colajanni, and M. Marchetti, “Supporting securityand consistency for cloud database,” in Proc. 4th Int. Symp. Cyber-space Safety Security, Dec. 2012, pp. 179–193.

[23] Amazon Web Services. (2014, Mar.). Amazon relational databaseservice [Online]. Available: http://aws.amazon.com/rds.

[24] EnterpriseDB. (2014, Mar.). Postgres plus cloud database [Online].Available: http://enterprisedb.com/cloud-database.

[25] Microsoft. (2014, Mar.). Windows azure SQL database[Online]. Available: http://www.windowsazure.com/en-us/services/data-management.

[26] Rackspace. (2014, Mar.). Rackspace cloud database [Online].Available: http://www.rackspace.com/cloud/databases.

[27] Amazon. (2014, Mar.). Amazon web services blog [Online]. Avail-able: http://aws.typepad.com/aws/price-reduction/.

[28] Amazon RDS Pricing. (2014, Mar.). Amazon relational databasepricing [Online]. Available: http://aws.amazon.com/rds/pricing.

[29] EnterpriseDB. (2014, Mar.). Postgres plus cloud database pricing[Online]. Available: http://www.enterprisedb.com/cloud-data-base/pricing-amazon.

[30] Microsoft. (2014, Mar.). Windows azure SQL database web &business pricing [Online]. Available: http://www.windowsa-zure.com/en-us/pricing/details/sql-database/#service-webandbusiness.

[31] Microsoft (2014, Mar.). Windows azure SQL database premiumpricing [Online]. Available: http://www.windowsazure.com/en-us/pricing/details/sql-database/#service-premium.

[32] Rackspace. (2014, Mar.). Rackspace cloud database pricing[Online]. Avaialble: http://www.rackspace.com/cloud/data-bases/pricing.

[33] L. Ferretti, M. Colajanni, and M. Marchetti, “Access controlenforcement of query-aware encrypted cloud databases,” in Proc.Fifth IEEE Int. Conf. Cloud Comput. Technol. Sci., Dec. 2013,pp. 717–722.

[34] B. White, J. Lepreau, L. Stoller, R. Ricci, S. Guruprasad, M. New-bold, M. Hibler, C. Barb, and A. Joglekar, “An integrated experi-mental environment for distributed systems and networks,” inProc. 5th USENIX Conf. Operating Syst. Design Implementation, Dec.2002, pp. 255–270.

[35] A. Fekete, D. Liarokapis, E. O’Neil, P. O’Neil, and D. Shasha,“Making snapshot isolation serializable,” ACM Trans. DatabaseSystems, vol. 30, no. 2, pp. 492–528, 2005.

[36] TPC. (2014, Mar.). Tpc-c on-line transaction processing bench-mark [Online]. Available: http://www.tpc.org/tpcc.

Luca Ferretti received the master’s degree incomputer engineering from the University ofModena and Reggio Emilia, Italy, in 2012, wherehe is currently working toward the PhD degree atthe International Doctorate School in Informationand Communication Technologies (ICT). Hisresearch focuses on information security, andcloud architectures and services.

Fabio Pierazzi received the master’s degree incomputer engineering from the University ofModena and Reggio Emilia, Italy, in 2013, wherehe is currently working toward the PhD degree atthe International Doctorate School in Informationand Communication Technologies (ICT). Hisresearch interests include security analytics andcloud architectures.

Michele Colajanni received the master’s degreein computer science from the University of Pisa,and the PhD degree in computer engineeringfrom the University of Roma in 1992. Since 2000,he has been a full professor of computer engi-neering at the University of Modena and ReggioEmilia. He manages the InterdepartmentResearch Center on Security and Safety (CRIS),and the Master in “Information Security: Technol-ogy and Law.” His research interests includesecurity of large scale systems, performance and

prediction models, Web and cloud systems.

Mirco Marchetti received the PhD degree inInformation and Communication Technologies(ICT) in 2009. He holds a postdoc position at theInterdepartment Research Center on Securityand Safety (CRIS) at the University of Modenaand Reggio Emilia. He is interested in intrusiondetection, cloud security, and in all aspects ofinformation security.

" For more information on this or any other computing topic,please visit our Digital Library at www.computer.org/publications/dlib.

FERRETTI ET AL.: PERFORMANCE AND COST EVALUATION OF AN ADAPTIVE ENCRYPTION ARCHITECTURE FOR CLOUD DATABASES 155