
A Fine-Grained Adaptive Middleware Framework for Parallel Mobile Hybrid Cloud Applications

Reza Shiftehfar
Department of Computer Science
U. of Illinois at Urbana-Champaign
Email: [email protected]

Kirill Mechitov
Department of Computer Science
U. of Illinois at Urbana-Champaign
Email: [email protected]

Gul Agha
Department of Computer Science
U. of Illinois at Urbana-Champaign
Email: [email protected]

Abstract— Mobile Cloud Computing (MCC) overcomes mobile device limitations by delegating tasks to more capable cloud spaces. Existing mobile offloading solutions generally rely on full virtual machine migration, which is coarse-grained and costly, or on implementing code offloading as part of the application logic, which greatly increases application complexity and the associated software development costs. Some recent solutions implement fine-grained offloading, but pause the local mobile application while waiting for the offloaded code's results. This leads to sequential execution, wastes local mobile resources, and ignores the potential elasticity of the cloud environment. We have developed the IMCM framework to support parallel mobile application offloading to multiple cloud spaces. IMCM is fine-grained, supporting application distribution at the granularity of individual components; it is adaptive, addressing the dynamicity in run-time conditions and end-user contexts; and it is fully parallel, supporting both a parallel application model and simultaneous execution on the mobile device and multiple private and public cloud spaces. Our evaluation results show that IMCM can improve the performance of computationally intensive mobile applications by a factor of over 50, while masking the underlying complexity of mobile-to-cloud code offloading.

I. INTRODUCTION

Mobile devices have become ubiquitous, but they are still constrained by their limited resources. Compared to laptops and desktops, mobile devices typically have weaker hardware, more restricted network access, and more limited access to energy. These limitations have created an increasing gap between the demand for more complex applications and the availability of the required hardware resources [11]. Cloud computing has the potential to overcome mobile device constraints and to address the ever-increasing complexity of modern mobile applications. Cloud computing provides elastic on-demand access to virtually unlimited resources at an affordable price. The elastic resources allow weaker devices to run more demanding applications by outsourcing storage or computation needs to cloud spaces. To achieve this, certain parts of a mobile application have to be selected, sent to a remote cloud space, executed, and the results brought back to the mobile device. This process is known as code offloading and has been widely studied within the context of distributed systems and grid computing [2], [15], [19].

Current practical solutions for providing the code offloading capability for mobile-cloud applications rely on either hard-coding the offloading decisions as part of the developed program or using full Virtual Machine (VM) migration to make an exact copy of the running application within the cloud space. The former has the advantage of being fine-grained, well-tuned, and potentially self-adapting based on run-time parameters, but it requires programmers to rewrite their mobile application in an offloadable format. This places a significant burden on application developers, requires structural changes to existing applications, and requires continuous maintenance as mobile applications evolve over time. The latter approach, on the other hand, is based on the assumption that running the same code on a faster machine improves application performance. It has the advantage of not creating additional work for developers, but is highly coarse-grained. Virtual machines are large components, and moving them around is very expensive even within a local area network (LAN) [9].

Application component distribution between the mobile device and cloud resources must be flexible, to satisfy different user expectations, and adaptive, to address dynamic run-time context changes. This requires open systems that interact with the environment while addressing application constraints, user expectations, and hardware limitations. Moreover, despite some theoretical support for opportunistic parallelism, most existing code offloading solutions pause local mobile execution while waiting for the offloaded code's result, leading to semi-sequential applications [4], [6], [10]. With modern mobile devices benefiting from fast, powerful multi-core processors, new offloading solutions are required that support fully parallel applications.

These considerations motivate us to develop a fine-grained adaptive solution that minimizes the required manual changes to applications, avoids creating additional work for programmers, and allows fine-grained adaptive distribution of application components. Our overall goal is to bridge the gap between mobile application development, cloud computing, and dynamic adaptive code offloading while satisfying both application and end-user requirements. Our main design objective is to help mobile-cloud application programmers focus on developing their application logic without worrying about component distribution, which is performed transparently and dynamically at run-time. We propose a framework that masks all the complexity of offloading mobile application code to multiple cloud spaces. We model a mobile-cloud application as a composition of self-contained autonomous actor components. Our framework is fine-grained, supporting application configuration and distribution at the granularity of individual components; adaptive, addressing the dynamicity in environmental conditions and end-user contexts; and transparent, masking the underlying complexity of mobile-to-cloud code offloading. It supports component distribution in a hybrid cloud environment consisting of one or several public and private cloud spaces. Finally, and most importantly, it provides a new code offloading model that supports parallel program execution, where application components located on the mobile device and different cloud spaces execute independently but concurrently.

This paper makes the following contributions:

• We propose a new mobile code offloading model that supports fully-parallel program execution, and present a proof-of-concept implementation: the Illinois Mobile-Cloud computing Manager (IMCM).

• We highlight the advantages of the parallel execution of mobile applications on cloud spaces and demonstrate the value of simultaneous local mobile and remote cloud application execution in this framework.

• We study the impact of various run-time parameters on the effectiveness of application offloading and devise an adaptive solution to dynamically manage component distribution.

• Empirical evaluation results for a suite of benchmark mobile applications show a speedup factor between 9 and 56 times over sequential execution on a mobile device.

II. RELATED WORK

Code offloading is not a new idea; it has been used widely in grid computing, where processes are migrated within the same computing environment to balance load between different machines [2], [15], [19]. However, the modern offloading era started when virtualization became popular, allowing cloud vendors to run arbitrary applications from different customers. In recent years, with the popularity of mobile devices and the availability of affordable public cloud resources, code offloading has been extended to mobile devices, and Mobile-Cloud Computing was introduced to overcome mobile limitations [11], [16]. Systems benefiting from MCC usually take one of two approaches: rely on programmers to manually partition the program and specify how to offload parts of an application to remote servers, or use full virtual machine migration, in which an entire process or an entire OS is migrated to the cloud space [17]. The former requires significant manual work, and the latter is too expensive [9].

In order to avoid both manual work and expensive data transfer, automatic partitioning solutions [8], [13], [14] and fine-grained code offloading solutions are both required. MAUI [6] combines the two and enables fine-grained, energy-aware offloading of mobile code to a remote server, using a combination of virtual machine migration and automatic code partitioning. However, it only supports sequential execution, where the mobile device is paused while waiting for the offloaded code's result. It also supports only a single remote server, and requires manual annotation of methods by the programmer as well as offline static analysis of the source code before execution.

CloneCloud [4] avoids manual work and enables unmodified mobile applications to be offloaded. It supports offloading multiple methods at the same time, but requires an exact clone of the mobile device on the cloud. Despite its theoretical support for opportunistic parallelism, it leads to sequential execution in practice, as the phone's execution is paused whenever local code accesses the migrated state. Its application partitioner is also static and needs to pre-process the application code in an offline mode. It considers limited input/environmental conditions in this offline pre-processing and needs to be bootstrapped for every new application built.

ThinkAir [10] provides mobile code offloading while allowing on-demand VM creation and resource allocation. However, all VM creation happens behind a single remote server and is masked from the mobile device. In fact, its main focus is on VM load-balancing rather than on mobile-cloud application offloading. It supports opportunistic parallelism, which results in limited practical parallelism.

COS [9] combines VM migration with application-level migration to achieve fine-grained load balancing. However, it focuses on load-balancing of VMs within a cloud space in order to improve overall cloud performance, and does not consider the performance of individual applications.

Our framework focuses on improving individual application performance while addressing the dynamic run-time environment, end-user context, and application behavior. Unlike previous research, our system supports offloading to multiple remote locations, a concurrent application model, and simultaneous execution on both the mobile device and remote cloud resources.

III. CLOUD APPLICATION AND INFRASTRUCTURE MODEL

In order to formulate the application component offloading problem, a comprehensive mobile-hybrid-cloud application model is needed. This section summarizes our view of the cloud, cloud applications, and mobile-cloud applications.

A. Cloud Model

Over time, cloud services have moved from the model of using public cloud spaces to private clouds, and recently to a hybrid model combining both [20]. Cloud infrastructure is traditionally provided by large organizations, and is thus referred to as the public cloud. However, storing data on third-party machines suffers from a potential lack of control and transparency, in addition to legal implications [3]. To address this, cryptographic methods are used to encrypt the data stored in the public cloud while decryption keys are disclosed only to authorized users. However, these solutions inevitably introduce heavy computational overhead for the continuous encryption and decryption of data, the distribution of decryption keys to authorized users, and the management of data when fine-grained data access control is desired [18]. Cryptographic methods do not scale well, have significant overhead, are expensive to maintain, and are often slow, especially in widely geographically distributed environments such as the cloud. Moreover, they take a traditional data-centric view of the cloud, limited to storing data and providing services for accessing it.

In modern mobile-cloud applications, resources stored in the cloud contain more than just data. These resources contain part of the application code, so that an access operation means execution of that code inside the cloud. Certificate-based authorization systems fail to address this type of application, as an encrypted piece of code within the cloud cannot be executed without decryption, which reveals its content to the cloud provider. As a result, companies gradually moved toward building their own private clouds [3]. However, owning a private datacenter is not as efficient, reliable, or scalable as using public ones. Thus, in recent years, a combination of private and public cloud spaces has been used, which benefits from all the advantages of the public cloud while keeping confidential or sensitive data and algorithms in-house [7]. Unlike previous mobile-cloud solutions that consider only a single remote location for offloading [4], [6], [10], our model considers a hybrid cloud space consisting of one or several private and public cloud spaces, and allows concurrent application component offloading and execution on all of them.

B. Cloud Application Model

In order to replace the traditional data-centric view of the cloud with a more general data/computation-centric view, the currently popular service-oriented architecture [12], which provides services on data stored in the cloud to external users, needs to be replaced with a new architecture that dynamically and transparently leverages cloud resources to address end-user mobile device limitations. An elastic application development environment allows components storing data or performing computations to be transparently distributed between private clouds, public clouds, and the end-user device. When such an application is launched, an elasticity manager monitors the environment and the resource requirements of the different application components, and makes decisions about component distribution between the mobile device and the different cloud spaces based on run-time parameters, application behavior, and user expectations. This allows mobile applications to adapt to different workloads, performance goals, energy limitations, and network latencies. In order to avoid creating additional work for application developers, the unnecessary details of the distribution and movement of application components should be masked.

In order to reach the maximum level of parallelism without the hassle of the traditional multi-threading model, modern cloud-based applications avoid the shared-memory model, which is unnatural for developers and leads to error-prone, non-scalable programs [12]. Instead of relying on global variables and shared state, modern cloud-based applications restrict the interaction between components to communication via messages. This approach to cloud application development aligns with the actor model of computation [1], which sees distributed components, called actors, as autonomous objects operating concurrently and asynchronously. In response to a received message, an actor can make local decisions, create new actors, send more messages, or change its behavior to respond differently to the next received message [1]. Compared to the traditional shared-memory model, actors are a better fit for highly dynamic applications operating in open and challenging environments. Actors may be created and destroyed dynamically; they can change their behaviors and migrate to different physical locations. The model provides natural concurrency, resiliency, elasticity, decentralization, extensibility, location transparency, and transparent migration, which ease the process of scaling up or out, a critical requirement for cloud-based applications. As a result, our view of a mobile-cloud application consists of actors distributed between the local mobile device and different cloud spaces.
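The actor semantics described above (private state, a mailbox, asynchronous message sends, one message processed at a time) can be sketched in a few lines. This is an illustrative toy, not IMCM code; the `Actor` and `Counter` classes and their method names are our own.

```python
# Minimal actor sketch: each actor owns private state and a mailbox,
# and processes messages one at a time on its own thread.
import queue
import threading

class Actor:
    """An autonomous component that interacts only via messages."""
    def __init__(self):
        self.mailbox = queue.Queue()
        self.running = True
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def send(self, message):
        self.mailbox.put(message)          # asynchronous, non-blocking send

    def _run(self):
        while self.running:
            message = self.mailbox.get()   # one message at a time
            if message is None:            # poison pill stops the actor
                self.running = False
            else:
                self.receive(message)

    def receive(self, message):
        raise NotImplementedError

class Counter(Actor):
    """Example actor: keeps a private count, reports it on request."""
    def __init__(self, reply_queue):
        self.count = 0
        self.reply_queue = reply_queue
        super().__init__()

    def receive(self, message):
        if message == "inc":
            self.count += 1
        elif message == "get":
            self.reply_queue.put(self.count)

replies = queue.Queue()
c = Counter(replies)
for _ in range(3):
    c.send("inc")
c.send("get")
print(replies.get())  # 3 — messages are processed in order
c.send(None)
```

Because no state is shared, such components can be placed on the phone or in a cloud space without changing their logic; only the message transport differs, which is what makes migration transparent.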

IV. APPROACH

Mobile-cloud computing relies on the code offloading process to benefit from available remote cloud servers. Running applications in a VM and migrating the entire VM to a more resourceful machine allows benefiting from offloading without the processes even knowing of the migration. However, VMs are usually large, and migrating them is costly even within a local area network [9]. An alternative that avoids such coarse-grained data transfer is to migrate application components. The offloading process then consists of deciding which parts to offload, in addition to migrating them, executing them on remote servers, and bringing back the results. The proposed actor-based mobile-cloud application model provides natural application partitioning and masks the component migration process. The only remaining piece is finding the appropriate components to offload, and this section focuses on making such optimal offloading decisions with respect to the target goal, application behavior, and run-time parameters.

A. Offloading Decision for a Sequential Application to a Single Remote Server

Without considering the cost of the offloading process and its effect on application behavior, the speedup resulting from running the same code on a more resourceful machine can be defined as the ratio of available resources:

$$\mathit{Speedup} = \frac{S_s}{S_m} = \frac{F_{server} \cdot C_{server} \cdot X_{server}}{F_{mobile} \cdot C_{mobile}} \tag{1}$$

where $S_s$ and $S_m$ are the speeds, $F_{server}$ and $F_{mobile}$ the processor frequencies, and $C_{server}$ and $C_{mobile}$ the numbers of cores of the server and the mobile device. $X_{server}$ is the additional speedup resulting from the availability of additional resources on the remote server, e.g., caches, memory, and potentially more aggressive pipelining. Equation 1 states that offloading is always beneficial as long as there is a more resourceful server. However, it ignores the resources required for the offloading process and the effect of offloading on application behavior. Only if the resources required for the offloading process are small, the network connection is fast, and the amount of transferred data is small can a speedup close to Equation 1 be achieved in practice.
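As an illustration, Equation 1 is a pure resource ratio and can be computed directly. The function name and the sample numbers below are hypothetical, chosen only to show the formula's behavior under its ideal, cost-free assumption.

```python
# Equation 1 as a function: ideal speedup from raw resource ratios,
# ignoring all offloading costs. Sample numbers are hypothetical.

def ideal_speedup(f_server, c_server, x_server, f_mobile, c_mobile):
    """S_s / S_m per Equation 1."""
    return (f_server * c_server * x_server) / (f_mobile * c_mobile)

# e.g. an 8-core 2.6 GHz server with extra-resource factor 1.5
# versus a 2-core 1.0 GHz phone:
print(ideal_speedup(2.6, 8, 1.5, 1.0, 2))  # ≈ 15.6
```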


For most applications, the resources required for the offloading process cannot be ignored.

Extending Equation 1 to include the cost of offloading depends strongly on the target offloading goal. Offloading goals can vary significantly based on the application or user, ranging from maximizing application performance (e.g., games, vision-based applications) to minimizing energy consumption on the mobile device (e.g., background applications). This paper focuses on the goal of maximizing application performance and leaves the remaining goals to future work.

Maximizing application performance, or minimizing total execution time, provides real-time applications with higher-quality computation in the same amount of time, leading to a smoother and better experience for users. Assuming a small application with an amount $w$ of offloadable work, the goal is to decide whether or not to offload. Following the model of [11], we can summarize the problem as below:

$$\frac{w}{S_m} > \frac{d_i}{B} + \frac{w}{S_s} \;\rightarrow\; w \cdot \left(\frac{1}{S_m} - \frac{1}{S_s}\right) > \frac{d_i}{B} \tag{2}$$

where $S_m$ and $S_s$ are the speeds of the mobile device and remote server processors, $B$ is the network connection bandwidth, and $d_i$ is the size of the data to be transferred. The left side of this inequality is the total time required to execute work $w$ on the mobile device, while the right side captures the time required to transfer the data to a remote server and execute the work there. It only makes sense to offload when the left side is larger than the right side. Note that this equation ignores many parameters, such as communication latency, the time required to bring back the result, etc. Equation 2 also shows that the effect of $S_s$ is of second order: an infinitely fast server ($S_s = \infty$) does not lead to an always-offload decision if the other parameters are not proportional.
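Equation 2 translates directly into a decision predicate. The sketch below uses hypothetical units and values; a real offloading manager would obtain $w$, $d_i$, and $B$ from run-time profiling rather than constants.

```python
# Equation 2 as a decision rule: offload iff local execution time
# exceeds transfer time plus remote execution time.
# All values below are hypothetical and only illustrate the trade-off.

def should_offload(w, s_m, s_s, d_i, bandwidth):
    local_time = w / s_m                       # left side of Equation 2
    remote_time = d_i / bandwidth + w / s_s    # right side of Equation 2
    return local_time > remote_time

# 100 units of work, server 10x faster, 5 MB over a 1 MB/s link:
print(should_offload(w=100, s_m=1.0, s_s=10.0, d_i=5.0, bandwidth=1.0))   # True

# Same work, but a slow 0.05 MB/s link makes transfer dominate:
print(should_offload(w=100, s_m=1.0, s_s=10.0, d_i=5.0, bandwidth=0.05))  # False
```

The second call also illustrates the second-order role of the server speed noted above: even with a much faster server, a slow link flips the decision.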

Although we focused on the goal of maximizing application performance, the offloading decision for the goal of minimizing mobile device energy consumption is similar for sequential applications. In sequential execution, the mobile device remains in an idle state, consuming energy while waiting for the results of the offloaded code. Consequently, the time required for application execution on the remote server is proportional to the mobile device's energy usage [4], [6], [10]. However, this effect is limited to sequential applications, where only one of the mobile device or the remote server executes code at any time.

Use of Equation 2 leads to a pause-offload-resume model in which the system pauses before executing any part, checks Equation 2, and decides whether or not to offload. If the decision is to offload, the mobile application is paused, the data transferred to the remote server, the code executed on the remote server, the results brought back to the mobile device, and the mobile application resumed [6]. However, in communication-intensive applications, offloading single components one at a time results in significant remote communication. When components are on the same device, communication is relatively fast and goes through a shared memory space. But when they are placed on different machines, communication goes through multiple network devices and becomes costly. As a result, components that communicate extensively should be offloaded together. The problem of deciding on offloading multiple parts of an application can be formulated as a graph partitioning problem where nodes are application components, with a weight equal to the amount of their computation, and edges are the communications between them, with a weight equal to the amount of transferred data. In such a graph, the offloading decision amounts to finding the minimum-cost cut partitioning the graph between the mobile device and the remote server [4], [10]. Note that application execution is still considered sequential, and only one component is executed at any time.
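The graph-partitioning formulation above can be made concrete with a brute-force search over a toy component graph. This is only feasible for a handful of components (real partitioners use min-cut algorithms); all component names and weights below are hypothetical, and the `pinned` parameter is our own addition to model unoffloadable components such as the UI.

```python
# Brute-force two-location partitioning: node weights are computation,
# edge weights are communication, and the cost of a placement is the
# (slowdown-scaled) compute cost plus the weight of the cut.
from itertools import product

def best_partition(comp, edges, mobile_slowdown, pinned=()):
    """comp: {component: work}; edges: {(u, v): data transferred}.

    Components on the mobile device pay `mobile_slowdown` times their
    work; every edge crossing the cut pays its communication weight.
    `pinned` components must stay on the mobile device.
    """
    nodes = sorted(comp)
    best_cost, best_assign = float("inf"), None
    for choice in product(["mobile", "server"], repeat=len(nodes)):
        assign = dict(zip(nodes, choice))
        if any(assign[n] != "mobile" for n in pinned):
            continue  # respect placement constraints
        cost = sum(comp[n] * (mobile_slowdown if assign[n] == "mobile" else 1)
                   for n in nodes)
        cost += sum(d for (u, v), d in edges.items() if assign[u] != assign[v])
        if cost < best_cost:
            best_cost, best_assign = cost, assign
    return best_cost, best_assign

# Hypothetical three-component app; the UI must stay on the phone.
comp = {"ui": 1, "filter": 8, "render": 6}
edges = {("ui", "filter"): 2, ("filter", "render"): 1}
cost, assign = best_partition(comp, edges, mobile_slowdown=5, pinned=("ui",))
print(cost, assign)  # the two heavy components end up on the server
```

Note that "filter" and "render" are offloaded together: splitting them would add their communication edge to the cut, which is exactly the "extensively communicating components move together" effect described above.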

In order to avoid the sequential program execution resulting from the previous graph-based partitioning approach, CloneCloud [4] and ThinkAir [10] support opportunistic parallelism. When a component is offloaded, the remaining code on the mobile device continues its execution as long as the offloaded state is not accessed. As soon as the local code tries to access the state of the offloaded part, local execution is blocked, and it resumes only when the offloaded code's result is received. Despite the theoretical potential for parallel execution, this model still leads to sequential execution in practice. In most applications, shared program state is constantly accessed by different parts, and mobile code execution remains blocked most of the time. Moreover, it can consider only a single remote location for offloading. This is one of the main drawbacks of using a shared-state program model and a natural result of sequential applications. When parallelism is considered, the mobile device and remote server can execute code simultaneously, in addition to multiple remote servers working concurrently.

B. Offloading Decision for Parallel Applications to a Hybrid Cloud Environment

Deciding on an optimized offloading plan for parallel applications in a hybrid cloud environment requires considering the application type, the available resources at different remote machines, and the effect of offloading on future application behavior. As in the previous sections, the target offloading goal is maximizing application performance, or minimizing total application execution time. We still have a graph G(V,E) where vertices represent application components and edges represent the communications between them. The goal is to partition the graph between the mobile device and the different cloud resources in a way that minimizes total execution time. Total execution time consists of the time required to execute the application code plus the time required for remote components to communicate and exchange data with each other. Fully parallel execution refers to both parallel execution on multiple remote locations and simultaneous local and remote execution. In other words, the mobile device and the different cloud spaces execute their components simultaneously. As a result, total application execution time is the maximum time required for any of the mobile or remote spaces to finish executing the program code for all of its assigned components. Since local communication between components located on the same machine is relatively fast, we can ignore local communication and consider only communication between components placed at different locations.


Note that different locations can communicate simultaneously, so the total time required for communication is equal to the maximum communication time over the different locations. Using the notation of Table I, the offloading goal can be summarized as follows:

TABLE I: Notations used in parallel offloading model

  Notation            Description
  B(L)                Connection bandwidth out of location L
  CommAtLoc(L)        Communication time from components on location L to all other locations
  Cores(L)            Number of cores available at location L
  ∆                   Time interval of running the elasticity manager
  Exec(i, l)          Execution time of component i ∈ [1, N] at location l ∈ [0, M]
  ExecAtLoc(L)        Execution time for all components on location L
  JobCount(i)         Number of requests processed by component i during the time interval ∆
  Loc(i, t)           Location of component i at time t
  LocAllowed(i, t)    Set of locations at which component i is allowed to be placed at time t; LocAllowed(i, t) ⊆ [0, M]
  LocEQ(L1, L2)       Checks whether two given locations are identical: returns 1 if L1 = L2, otherwise 0
  MaxAppPerf          Maximum application performance
  MinAppExec          Minimum application execution time
  ProfComm(i, j)      Profiled amount of communication between components i and j during the time interval ∆

$$\max(\mathit{MaxAppPerf}) = \min(\mathit{MinAppExec}) = \min\Big( \max_{0 \le L \le M}\mathit{ExecAtLoc}(L) + \max_{0 \le L \le M}\mathit{CommAtLoc}(L) \Big) \tag{3}$$

A mobile application consists of $N$ components, and each component $i \in [1, N]$ is located at $\mathit{Loc}(i, t)$ at time $t$. Having $M$ different cloud spaces results in $\mathit{Loc}(i, t) \in [0, M]$, where 0 represents the local mobile device and $[1, M]$ corresponds to the different cloud spaces. Assuming that we know the application component distribution between the local mobile device and the hybrid cloud spaces at time $t_1$, our goal is to find the optimal component distribution for the next time interval $t_2$ such that application performance is maximized. Thus, the different parts of Equation 3 can be extended as follows:

$$\max_{0 \le L \le M}\mathit{ExecAtLoc}(L) = \max_{0 \le L \le M}\left( \frac{1}{\mathit{Cores}(L)} \cdot \sum_{i=1}^{N} \mathit{LocEQ}(L, \mathit{Loc}(i, t_2)) \cdot \mathit{Exec}(i, \mathit{Loc}(i, t_2)) \cdot \mathit{JobCount}(i) \right) \tag{4}$$

Note that both $\mathit{Exec}(i, L)$ and $\mathit{JobCount}(i)$ are provided by the monitoring system and are the results of previous profiling of the application. The $\mathit{LocEQ}$ term restricts the sum to the execution time of only the components running on location L. Similarly, the second part of Equation 3 can be extended as below:

max_{0≤L≤M}(CommAtLoc(L)) =
max_{0≤L≤M}( (1 / B(L)) · Σ_{i=1}^{N} Σ_{j=1}^{N} { LocEQ(L, Loc(i, t2)) · (1 − LocEQ(L, Loc(j, t2))) · ProfComm(i, j) } )   (5)

As mentioned before, this equation gives the maximum time required for any one location to send out all of its communications. LocEQ(L, Loc(i, t2)) selects only the components that will be located at Location L at time t2, and (1 − LocEQ(L, Loc(j, t2))) captures only the remote communication leaving location L. Solving these equations yields a set of Loc(i, t2) values: the optimized locations of the application components for the next time interval ∆. However, not all components of an application are offloadable, so a few constraints must be added to the above optimization problem. As we are considering a hybrid cloud consisting of multiple private and public cloud spaces, application developers or users can specify additional constraints on how different components may be offloaded to different locations. These constraints can address privacy concerns, for example by never offloading sensitive or confidential components to public cloud spaces. The required constraints can be written as follows:

subject to the constraints:

Loc(i, t1) ∈ LocAllowed(i, t1)   ∀ i ∈ [1, N]
Loc(i, t2) ∈ LocAllowed(i, t2)   ∀ i ∈ [1, N]
Σ_{i=1}^{N} Σ_{j=1}^{N} { LocEQ(L, Loc(i, t2)) · (1 − LocEQ(L, Loc(j, t1))) } ≤ α · Cores(L)   ∀ L ∈ [0, M]   (6)

The last constraint prevents flooding a remote server that shows good initial performance with too many components at once: the number of components that can be offloaded to each remote server is limited to a factor α of the number of cores available on that server. An α in the range of 2 to 8 is consistent with our evaluation results, which show that the best performance is achieved when 2 to 8 actors are assigned to each core. If, after one round of component migration, the target remote server still has enough resources and execution times remain fast, another round of actors can be migrated to that location. In most cases, LocAllowed(i, t1) = LocAllowed(i, t2), as privacy constraints do not change often. However, the user or the run-time environment has the option of adjusting privacy requirements at run-time whenever needed.
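The optimization described by Equations 3 through 6 can be sketched as a brute-force search over feasible placements. The sketch below is illustrative only: all inputs are hypothetical stand-ins for the profiled quantities (Exec, JobCount, ProfComm, Cores, B), and a real elasticity manager would use an incremental search rather than enumerating every placement.

```python
from itertools import product

ALPHA = 4  # migration cap factor; the 2-8 range matched our evaluation


def best_placement(prev_loc, allowed, exec_time, job_count,
                   prof_comm, cores, bandwidth):
    """Brute-force solver for the placement problem of Equations 3-6.

    prev_loc[i]     -- location of component i at t1
    allowed[i]      -- LocAllowed(i, t2), a set of permitted locations
    exec_time[i][l] -- profiled Exec(i, l)
    job_count[i]    -- profiled JobCount(i)
    prof_comm[i][j] -- profiled ProfComm(i, j)
    cores[L], bandwidth[L] -- Cores(L) and B(L) per location
    """
    n = len(prev_loc)
    locations = sorted(cores)  # 0 = mobile device, 1..M = cloud spaces

    def objective(loc):
        # Equation 4: slowest location's execution time ...
        exec_max = max(
            sum(exec_time[i][loc[i]] * job_count[i]
                for i in range(n) if loc[i] == L) / cores[L]
            for L in locations)
        # Equation 5: ... plus slowest location's outgoing communication
        comm_max = max(
            sum(prof_comm[i][j]
                for i in range(n) if loc[i] == L
                for j in range(n) if loc[j] != L) / bandwidth[L]
            for L in locations)
        return exec_max + comm_max

    def feasible(loc):
        # LocAllowed privacy constraint
        if any(loc[i] not in allowed[i] for i in range(n)):
            return False
        # Equation 6 migration cap: at most ALPHA * Cores(L) components
        # newly placed on each location per interval
        return all(
            sum(1 for i in range(n) if loc[i] == L and prev_loc[i] != L)
            <= ALPHA * cores[L]
            for L in locations)

    return min((loc for loc in product(locations, repeat=n) if feasible(loc)),
               key=objective)
```

For two components whose profiled execution is ten times faster on a 4-core server and whose mutual communication is negligible, the sketch moves both to the server; pinning one component to the mobile device via its `allowed` set keeps it local while the other is still offloaded.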

V. EVALUATION

This section presents the experimental results for evaluating our proposed framework. To make the results comparable and tie them to our offloading goal of maximizing application performance, we measure effectiveness as the speedup gained over sequential local execution on the mobile device. Our selected corpus consists of applications covering different types of programs: CPU-intensive, communication-intensive, I/O-intensive, and combined. In subsection V-C, we investigate the effect of different application parameters on the offloading decision. In subsection V-D, we evaluate the performance of the proposed middleware framework.


A. Experimental Setup

Our equipment includes a Samsung Google Nexus S as the mobile device and a MacBook Pro laptop as the remote offloading server. Table II summarizes the specifications of this equipment. The mobile device and the remote server are on the same WiFi network.

TABLE II: Specifications of the equipment used for evaluation

                Remote Server           Mobile Device
System          MacBook Pro (Retina)    Samsung Google Nexus S
OS              Mac OS X 10.9.4         Android 4.1.2
VM              JVM (JRE 1.6)           DalvikVM
Processor       Intel Core i7           ARM Cortex-A8
Proc. speed     2.3 GHz                 1 GHz
No. of cores    4                       1
L2 cache        256 KB/core             256 KB
L3 cache        6 MB                    -
Memory          16 GB                   512 MB

Our mobile-cloud application model is based on the actor model of computation, which offers natural parallelism for the developed applications. Many actor programming languages have been developed over the years to support different applications. Despite some small differences, most of these languages provide the main standard actor semantics, including encapsulation, fair scheduling, location transparency, locality of reference, and transparent migration. For our experiments, we chose SALSA [21] as the programming language, mainly due to its adherence to standard actor semantics. SALSA provides good support for parallel and distributed programming: its support for code and data mobility and asynchronous message passing makes programming for distributed systems natural, and its coordination model is attractive for parallel programming, where multiple CPUs need to coordinate and communicate efficiently. SALSA depends on Java and hence inherits Java's portability across different platforms [21]. With some modifications, we were able to make SALSA work on Android mobile devices running DalvikVM. SALSA provides lightweight actors, which make it highly scalable; limited scalability is one of the main shortcomings of some older actor languages. A further advantage of lightweight actors is the speed and ease of actor migration between devices: our experiments showed that SALSA actors can be created on, or migrated between, machines on the same WiFi network in 100-200 ms. This fast migration eases the process of mobile-cloud application offloading.

The base case in our evaluation is the time required for local sequential execution of the application on the mobile device; execution speedups relative to this base are used to compare different scenarios. To account for randomness, we repeat each experiment five times and verify the statistical significance of the observed execution times with non-parametric Mann-Whitney U-tests. Unless stated otherwise, the test is two-tailed and the significance level is α = 0.01.
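The significance check above can be sketched in pure Python. The timing samples below are made-up illustrations, and the critical value is read from standard small-sample U tables:

```python
def mann_whitney_u(xs, ys):
    """Two-sample Mann-Whitney U statistic (pure-Python sketch).

    Counts, over all pairs, how often a value from xs exceeds one
    from ys (ties count 1/2), and returns the smaller of U and its
    complement, which is what small-sample tables are indexed by.
    """
    u = sum(1.0 if x > y else 0.5 if x == y else 0.0
            for x in xs for y in ys)
    return min(u, len(xs) * len(ys) - u)


# Hypothetical execution times (seconds) for five local and five remote runs.
local = [9.8, 10.1, 10.4, 9.9, 10.2]
remote = [1.1, 1.3, 1.2, 1.0, 1.4]

# For n1 = n2 = 5, standard tables give a two-tailed critical value of
# U = 0 at alpha = 0.01, so the difference is significant only when the
# two samples do not overlap at all.
u = mann_whitney_u(local, remote)  # 0.0 here: every local run is slower
```

In practice a statistics package (e.g. SciPy's `mannwhitneyu`) would also report a p-value; the point here is only the shape of the test we apply to the repeated measurements.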

B. Program Corpus

Table III lists the programs used in the evaluation together with their main characteristics. The benchmark programs are selected to cover different application behaviors: computation-intensive, communication-intensive, and I/O-intensive. In addition, a multi-behavior application is included that combines these characteristics. To avoid a bias towards specific strengths of our approach, and to foster comparability, we mostly use examples similar to those in prior work on mobile-cloud computation offloading. The NQueen program is a computation-intensive application that places N queens on an N×N chessboard so that no two queens threaten each other [10]. The Heat program is a communication-intensive application that iteratively simulates heat transfer in a two-dimensional grid [9]; our implementation allows specifying the desired level of communication, and both medium and high levels are studied. The Trap program is a computation-intensive application that calculates a definite integral by approximating the region under the graph as a trapezoid and calculating its area. The Virus program reads file streams from disk and scans for the signature of a given virus [4], [10]. The Rotate program is an I/O-intensive application that reads an image from disk, rotates it in memory, and writes it back to disk. Similarly, the ExSort program is an I/O-intensive application that sorts the contents of a large file using an external sort algorithm in a limited amount of memory. Finally, the Image program combines I/O, CPU, and communication characteristics by detecting and recognizing all faces in a given picture using a large dataset of known faces [4]-[6], [10]. Since the processing of each picture is sequential, multiple images are processed simultaneously to add parallelism. To save space, only part of the results are presented in this paper.

C. Influence of App. Parameters on Offloading Decision

This section discusses how different application and execution properties influence the offloading decision, answering research questions RQ1-RQ3.

RQ1: What influence do the (a) cost of the offloading process, (b) application type, and (c) run-time parameters have on the mobile-cloud offloading decision?

Table III shows the speedup results for the different applications together with their main characteristics. While the raw speedup column ignores the cost of the offloading process, the offload speedup column gives a more realistic view of mobile-cloud offloading by including the time required for the offloading process. Note that the rows of the table represent applications with significantly different behavior, architecture, and characteristics, and should not be compared with each other. Comparing the raw speedup and offload speedup columns shows the effect of offloading cost on the achieved speedup. The offloading cost includes the resources required to make the offloading decision, offload the application code to the remote server, and bring back the result. Ignoring the cost of the offloading process, Equation 1 predicts the speedup resulting


TABLE III: Programs used to evaluate our framework. The application characteristic columns show the dominant behavior of each application, the raw speedup column gives the maximum speedup gained by running the application on a more resourceful machine excluding offloading time, and the offload speedup column gives the maximum speedup resulting from offloading, including the offloading overhead.

Experiment  Description                                           Comp.      Comm.    I/O read   I/O write  Raw Speedup  Offload Speedup
NQueen      Places N queens on an N*N board                       intensive  -        -          -          73           56
Image       Detects & recognizes all faces in a photo             intensive  limited  limited    -          91           44
Trap        Uses trapezoidal rule to calculate definite integral  intensive  limited  -          -          30           21
Virus       Scans a file stream for a specific virus signature    -          -        intensive  -          28           21
Rotate      Reads, rotates & saves an image to disk               -          -        intensive  intensive  28           9
ExSort      External sort of the contents of a file               intensive  -        intensive  intensive  46           36
Heat1       Simulates heat exchange on a board                    limited    medium   -          -          31           29
Heat2       Simulates heat exchange on a board                    limited    high     -          -          14           14

from running the same code on a faster machine. Assuming X_server = 7 for our experimental setup, the expected speedup is:

Speedup = S_s / S_m = (2.3 · 4 · 7) / (1.0 · 1) ≈ 64   (7)

The raw speedup column shows that a speedup of 64 times or even higher is possible. However, when a large amount of data must be offloaded (as in the Rotate application), the offloading speedup reached in practice is significantly lower. Moreover, the result depends strongly on the application type and behavior. A computation-intensive application with a high degree of parallelism (e.g., NQueen) can benefit from all the additional resources available on the remote server and reach a high offloading speedup. Extensive I/O operations or communication between components limits an application's ability to benefit from the additional computational resources at the remote server and reduces the achieved speedup (e.g., Rotate and Heat).
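The ideal-speedup estimate of Equation 7 is easy to recompute for other hardware; a small sketch, where `x_server` is the per-core scaling factor assumed above:

```python
def expected_raw_speedup(server_ghz, server_cores, x_server, mobile_ghz):
    """Equation 7: ideal speedup of fully parallel code on the server
    relative to sequential execution on the single-core mobile device."""
    return (server_ghz * server_cores * x_server) / mobile_ghz


# Table II setup: 2.3 GHz x 4 cores x X_server = 7, vs. 1 GHz -> about 64x
ideal = expected_raw_speedup(2.3, 4, 7, 1.0)
```

As the table shows, this is an upper bound: offloading cost, I/O, and communication pull the achieved speedup below it.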

To decide whether offloading w amount of computation to a remote server is beneficial in our experimental setup, Equation 2 can be instantiated with the values from Table II:

w · ( 1 / (1024 MHz) − 1 / (2.3 · 1024 MHz · 4 · 7) ) > d_i / B   (8)

Rearranging the equation yields B_min ≥ 1040 · d_i / w as the minimum bandwidth required for the offloading decision to reduce total application execution time. The condition depends on the ratio d_i / w and can only hold when that ratio is small enough. In other words, application offloading is beneficial for a large amount of computation (w) and a low amount of transferred data (d_i). For values in between, the decision depends on the available bandwidth (B), and an elasticity manager must evaluate the condition based on run-time parameters. For the NQueen problem, a single integer value has to be transferred both for the input (N) and for the final result, so d_i is very small; at the same time, the problem is computation-intensive and requires a large amount of computation (large w). As a result, any type of network connection provides enough bandwidth, and offloading always improves application performance. Note that the code of the NQueen solver is assumed to be available on the remote server, and network latency is ignored. In the case of the Image application, even assuming the remote server is arbitrarily fast (S_s = ∞), the offloading decision depends on w, d_i, and B. If the detection of faces in the initial image, the extraction of features for every detected face, and the comparison against the database are all offloaded, the entire initial image must be transferred to the remote server and the amount of communicated data (d_i) is large; it is then only beneficial to offload if B is large enough. On the other hand, if the initial face detection is performed locally and only the extracted features are transferred, d_i is much smaller; consequently, offloading the remaining parts is beneficial even over slower network connections. This highlights the importance of considering the combination of all parameters when deciding on offloading. Different parts of an application can become offloading candidates at different times, and an elasticity manager is required to dynamically decide on offloading based on run-time parameters.
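The run-time decision just described can be sketched as a simple predicate. This is a hypothetical Python rendering of Equation 8, with the Table II numbers as defaults; the units of the data size and bandwidth only need to match each other:

```python
def should_offload(w_cycles, d_transfer, bandwidth,
                   s_mobile=1.024e9,          # Nexus S: ~1 GHz
                   s_server=2.3 * 1.024e9,    # MacBook Pro: 2.3 GHz
                   cores=4, x_server=7):
    """Equation 8: offload iff the computation time saved exceeds
    the time needed to transfer the data over the network.

    w_cycles   -- amount of computation to offload (CPU cycles)
    d_transfer -- data to transfer (e.g. bytes)
    bandwidth  -- available bandwidth (same data unit per second)
    """
    time_saved = w_cycles * (1.0 / s_mobile
                             - 1.0 / (s_server * cores * x_server))
    return time_saved > d_transfer / bandwidth


# NQueen-like: huge computation, tiny transfer -> offload on any link.
# Rotate-like: small computation, large transfer -> keep it local.
```

An elasticity manager would re-evaluate this predicate per component as the profiled w, d_i, and measured B change at run time.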

RQ2: How significant is the influence of problem size (amount of work) on mobile-cloud offloading?

Figures 1 and 2 show the offloading speedup for different amounts of work for the NQueen and Image applications. The results show that a larger amount of work makes the application more computation-intensive, reduces the relative importance of the fixed work required for the offloading process, and increases the achieved speedup. While the initial offloading speedup of the NQueen problem is almost 1 (for N = 8) due to the small amount of computation required, increasing N exponentially increases the amount of work to be performed and, with it, the resulting speedup. The Image problem is a multi-behavior application with an initial speedup larger than 1, owing to the amount of computation required to process even a single image. For this problem, increasing the amount of work means increasing the number of images to be processed, which results in a linear increase in speedup.

RQ3: What influence does the application's degree of parallelism have on mobile-cloud offloading?

Equation 7 predicts the ideal speedup resulting from offloading when the computation is large enough, the code has a degree of parallelism roughly comparable to the available resources, and a negligible amount of resources is used for the offloading process itself. Without benefiting from parallelism, running the same code on a more resourceful machine provides only limited speedup (the sequential remote execution graphs of Figures 1 and 2). This speedup comes mostly from the remote server's faster CPU, additional caches, and


Fig. 1: Speedup summary for local and remote execution of the NQueen application for different amounts of work (different problem sizes)

Fig. 2: Speedup summary for local and remote execution of the Image Processing application for different amounts of work (different numbers of images to process)

more memory. However, the additional available processing units are not used. As mentioned, for practical applications the resources required for the offloading process are negligible compared to those required for a large computation. If the computation is not large enough, even a high degree of parallelism does not provide significant additional speedup. However, when the amount of computation is large enough, a higher degree of parallelism significantly improves performance, and the benefit of the additional processing resources becomes visible.

Figure 3 shows the relationship between the application's degree of parallelism and the speedup resulting from offloading. While increasing the parallelism degree does not improve performance on a mobile device with a single core, on a more resourceful remote server it allows better utilization of resources and increases application performance. While sequential execution of the NQueen problem on a faster system yields a speedup of 14 times, increasing the parallelism degree raises the speedup to 55. The performance improvement from increasing the program's parallelism degree is limited by the availability of resources: at a certain parallelism degree the resources become saturated, and further increases have the reverse, negative effect (Figure 5). Considering the null hypothesis that remote sequential execution is as effective as remote parallel execution, the Mann-Whitney U-test shows that all differences for the various problem sizes and parallelism degrees are significant. Consequently, the

Fig. 3: Speedup summary for local and remote execution of the NQueen problem with different degrees of parallelism

null hypothesis is rejected.

D. Effectiveness of the Proposed Middleware Framework

This section discusses the performance of the proposed IMCM middleware framework, answering research questions RQ4-RQ6.

RQ4: Is IMCM's parallel local-and-remote execution offloading more effective than existing sequential (or pseudo-parallel) execution offloading solutions?

While offloading computation to a more resourceful system can improve overall application performance, the mobile device's local resources are wasted while it sits idle waiting for the result of the offloaded code. With mobile devices becoming more powerful, this wasted computational power can be put to better use. Our framework supports simultaneous local and remote application execution and uses local mobile resources to execute other parts of the application while waiting for offloaded results. Figure 4 shows the speedup differences between processing different numbers of images using only the remote server versus simultaneous execution on both the local device and the remote server. Since the processing of a single image is sequential, for a small amount of work (a small number of pictures to process) the total execution time is dominated by the time required for the local mobile device to process its share. This results in remote server starvation and wasted resources, as there are no more jobs for it to process. However, as the amount of work increases, there is always enough work for the remote server, and the advantage of using both the local device and the remote server for code execution becomes visible. Figure 5 shows the same effect as a function of the application's parallelism degree. We mentioned earlier that a higher degree of parallelism increases the flexibility of the application and results in higher offloading speedup. However, this holds only if enough computational resources are available. As seen in the graph, increasing the parallelism degree (number of workers) initially yields higher speedup, but after a certain point the effect is reversed. Having a higher degree of parallelism than available resources over-saturates the resources, adds the overhead of managing


Fig. 4: Speedup summary for remote execution vs. local+remote execution of the image processing problem with different problem sizes (different numbers of images)

Fig. 5: Speedup summary for remote execution (x remote workers) vs. local+remote execution (1 local + x remote workers) of the image processing problem with different numbers of remote workers

all those workers, and reduces overall speedup. Our results show that the parallelism degree at which an application reaches its highest speedup is proportional to the number of available processing cores. The differences between any two worker counts, for both remote and simultaneous local-and-remote execution, are significant (α = 0.01). Thus, the null hypothesis that there is no significant difference between image processing runs with different numbers of workers can be rejected.
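A heuristic consistent with this observation sizes the worker pool at each location as a small multiple of its core count; the factor below is an assumption drawn from the 2-8 actors-per-core range reported earlier:

```python
def pick_worker_counts(cores_per_location, actors_per_core=4):
    """Choose a parallelism degree proportional to the available cores,
    avoiding the over-saturation effect seen in Figure 5."""
    return {loc: actors_per_core * n
            for loc, n in cores_per_location.items()}


# e.g. a 1-core mobile device plus a 4-core server
# -> {"mobile": 4, "server": 16}
workers = pick_worker_counts({"mobile": 1, "server": 4})
```

The best factor is workload-dependent, so a deployed system would tune `actors_per_core` from profiling rather than fix it statically.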

RQ5: How effective is the IMCM elasticity manager in detecting application run-time environmental parameters and offloading appropriate application components?

Despite the significant performance speedup that offloading to more resourceful systems can provide, manual configuration of component placement between the local mobile device and remote servers is not practical: the ideal component distribution depends on several factors that can change dynamically during execution. An elasticity manager is therefore required to monitor environmental changes and find an optimal offloading plan. Figure 6 shows the results for manual placement of application components versus automatic component management by the IMCM elasticity manager, which solves Equations 5 and 6. The implemented elasticity manager uses previously profiled execution times of the different components at various locations to find the optimal location for every component in the next interval. We currently do not use profiled execution times from previous executions of the application. As

Fig. 6: Speedup summary for local execution (base case) vs. remote execution (ideal case) vs. execution with the elasticity manager (fully automatic management) of the image processing problem with different problem sizes (different numbers of images to process)

Fig. 7: Overhead resulting from the elasticity manager for the image processing problem with different problem sizes (different numbers of images to process)

a result, there is an initial lag between the start of an application and the optimal placement of its components, caused by the time required to collect enough profiling data. Consequently, as the problem size and the resulting total application execution time increase, the gap between ideal component placement and automatic distribution narrows.

RQ6: What is the performance overhead of the IMCM automatic elasticity manager?

While offloading appropriate components to a remote server can potentially improve application performance, a costly elasticity manager that profiles run-time and application parameters and computes an optimal distribution plan can lower overall performance. Figure 7 shows the overhead of our implemented elasticity manager. The results show that running the profiler and elasticity manager in the background causes a 1-5% speedup decrease on average. Against the 9-60x speedup gained from offloading applications, the IMCM elasticity manager's overhead is insignificant. Moreover, as the problem size increases, the benefit of offloading becomes more dominant and the elasticity manager's overhead becomes even less important.

VI. LIMITATIONS AND FUTURE WORK

We ensure the conclusion validity of our evaluation by checking the statistical significance of the measured execution times with a robust non-parametric test at a high significance level (α = 0.01). One threat to the construct validity of our experiment is the use of performance speedup as the effectiveness metric. As the amount of work increases, the gap between local mobile execution and the other forms of execution grows; this reflects the improved performance and can be used to evaluate one application under different settings. However, the amount of work performed by the different applications varies significantly, and different applications have different behavior, architecture, and characteristics; thus, speedups cannot be compared across applications. The external validity of our evaluation is threatened by our focused corpus. Although most programs were selected according to benchmarks used in previous work, the corpus does not constitute a random sample of programs; consequently, our results may generalize poorly. A larger study would mitigate this risk and is planned as future work.

Although we have mentioned different component distribution plans for parallel execution that optimize application performance and mobile energy consumption, this paper focuses only on improving mobile application performance through code offloading. We are currently extending the framework to support optimizing mobile energy consumption and to allow dynamic adjustment of the application's target goal. A major challenge for energy optimization is profiling the detailed consumption of individual application components: while the execution time of each component can be recorded individually using the system clock, the mobile device reports only lump-sum energy consumption, and breaking down the total energy among components remains a challenge.

VII. CONCLUSION

In this paper we proposed the IMCM middleware framework for transparent, automatic code offloading from mobile devices to hybrid cloud spaces. The framework is fine-grained, supporting application configuration and distribution at the granularity of individual components, and adaptive, addressing the dynamicity of run-time conditions and end-user contexts. It further supports component distribution in a hybrid cloud environment consisting of one or more public and private cloud spaces. Finally, it provides a new code offloading model that supports parallel program execution, where application components located on the mobile device and in different cloud spaces execute independently but concurrently. Evaluation results show that the offloading benefit depends on application behavior, offloading cost, and run-time parameters, and ranges between 9 and 56 times.

VIII. ACKNOWLEDGMENTS

The authors would like to thank the members of the Open Systems Laboratory (OSL) group at the University of Illinois at Urbana-Champaign for their valuable discussion and insight. This publication was made possible in part by sponsorship from the Air Force Research Laboratory and the Air Force Office of Scientific Research under agreement FA8750-11-2-0084.

REFERENCES

[1] G. Agha and C. Hewitt. Concurrent programming using actors: Exploiting large-scale parallelism. In Foundations of Software Technology and Theoretical Computer Science, pages 19-41. Springer, 1985.

[2] R. Balan, J. Flinn, M. Satyanarayanan, S. Sinnamohideen, and H.-I. Yang. The case for cyber foraging. In Proceedings of the 10th Workshop on ACM SIGOPS European Workshop, pages 87-92. ACM, 2002.

[3] R. Chow, P. Golle, M. Jakobsson, E. Shi, J. Staddon, R. Masuoka, and J. Molina. Controlling data in the cloud: outsourcing computation without outsourcing control. In Proceedings of the 2009 ACM Workshop on Cloud Computing Security, pages 85-90. ACM, 2009.

[4] B.-G. Chun, S. Ihm, P. Maniatis, M. Naik, and A. Patti. CloneCloud: elastic execution between mobile device and cloud. In Proceedings of the Sixth Conference on Computer Systems, pages 301-314. ACM, 2011.

[5] B.-G. Chun and P. Maniatis. Dynamically partitioning applications between weak devices and clouds. In Proceedings of the 1st ACM Workshop on Mobile Cloud Computing & Services: Social Networks and Beyond, page 7. ACM, 2010.

[6] E. Cuervo, A. Balasubramanian, D.-k. Cho, A. Wolman, S. Saroiu, R. Chandra, and P. Bahl. MAUI: making smartphones last longer with code offload. In Proceedings of the 8th International Conference on Mobile Systems, Applications, and Services, pages 49-62. ACM, 2010.

[7] R. K. Grewal and P. K. Pateriya. A rule-based approach for effective resource provisioning in hybrid cloud environment. In New Paradigms in Internet Computing, pages 41-57. Springer, 2013.

[8] G. C. Hunt and M. L. Scott. The Coign automatic distributed partitioning system. In OSDI, volume 99, pages 187-200, 1999.

[9] S. Imai, T. Chestna, and C. A. Varela. Elastic scalable cloud computing using application-level migration. In Proceedings of the 2012 IEEE/ACM Fifth International Conference on Utility and Cloud Computing, pages 91-98. IEEE Computer Society, 2012.

[10] S. Kosta, A. Aucinas, P. Hui, R. Mortier, and X. Zhang. ThinkAir: Dynamic resource allocation and parallel execution in the cloud for mobile code offloading. In INFOCOM, 2012 Proceedings IEEE, pages 945-953. IEEE, 2012.

[11] K. Kumar, J. Liu, Y.-H. Lu, and B. Bhargava. A survey of computation offloading for mobile systems. Mobile Networks and Applications, 18(1):129-140, 2013.

[12] K. Kumar and Y.-H. Lu. Cloud computing for mobile users: Can offloading computation save energy? Computer, 43(4):51-56, 2010.

[13] M. Neubauer and P. Thiemann. From sequential programs to multi-tier applications by program transformation. In ACM SIGPLAN Notices, volume 40, pages 221-232. ACM, 2005.

[14] R. Newton, S. Toledo, L. Girod, H. Balakrishnan, and S. Madden. Wishbone: Profile-based partitioning for sensornet applications. In NSDI, volume 9, pages 395-408, 2009.

[15] B. D. Noble, M. Satyanarayanan, D. Narayanan, J. E. Tilton, J. Flinn, and K. R. Walker. Agile application-aware adaptation for mobility, volume 31. ACM, 1997.

[16] M. Rahman, J. Gao, and W.-T. Tsai. Energy saving in mobile cloud computing. In Cloud Engineering (IC2E), 2013 IEEE International Conference on, pages 285-291. IEEE, 2013.

[17] M. Satyanarayanan, P. Bahl, R. Caceres, and N. Davies. The case for VM-based cloudlets in mobile computing. Pervasive Computing, IEEE, 8(4):14-23, 2009.

[18] R. Shiftehfar, K. Mechitov, and G. Agha. Towards a flexible fine-grained access control system for modern cloud applications. In Cloud Computing (CLOUD), 2014 7th IEEE International Conference on, pages –. IEEE, 2014.

[19] J. P. Sousa and D. Garlan. Aura: an architectural framework for user mobility in ubiquitous computing environments. In Software Architecture, pages 29-43. Springer, 2002.

[20] S. Subashini and V. Kavitha. A survey on security issues in service delivery models of cloud computing. Journal of Network and Computer Applications, 34(1):1-11, 2011.

[21] C. Varela and G. Agha. Programming dynamically reconfigurable open systems with SALSA. ACM SIGPLAN Notices, 36(12):20-34, 2001.