Transcript of GIFT: A Coupon Based Throttle-and-Reward Mechanism for Fair and Efficient I/O Bandwidth Management on Parallel Storage Systems
Rohan Garg, Tirthak Patel, Devesh Tiwari
The Key Idea Behind GIFT
…but first, some background
Data-intensive Parallel Applications
[Figure: applications alternate between Compute Phases and I/O Phases; Compute Nodes (OSCs) in the Compute System issue I/O to the Parallel Storage System]
[Figure: parallel storage system architecture: the SION network connects compute nodes to OSSes/OSTs and MDSes/MDTs via redundant controllers (CTRL A and CTRL B), HBAs, and network links]
Object Storage Targets (OSTs)
[Figure: GeoScience application example: isovalues on compressed simulation data with bounding error (32 bits, 3200x2400x42, 1.4 GB); error bounds of 0.25, 0.5, 1.0, and 2.0 bits yield 10.8 MB, 21.6 MB, 43.3 MB, and 86.5 MB, respectively]
One application performs I/O concurrently to multiple OSTs.
Parallel applications can cause unmanaged and unpredictable I/O interference!
Inefficient I/O bandwidth utilization
[Figure: bandwidth allocations to apps A–D at times t1 and t2 under a Traditional scheme and under GIFT]
GIFT’s coupon-based I/O bandwidth allocation appears appealing, but…
What are the favorable characteristics? What are the challenges?
GIFT Enablers
Repetitive runs, low periodicity, predictable I/O: HPC applications run repeatedly, are frequent, and exhibit similar I/O behavior across different runs.
Parallel applications suffer from non-synchronous I/O progress, leading to bandwidth waste.
[Figure: per-OST bandwidth shares (0–100%) showing bandwidth wasted when MPI processes of the same application finish I/O at different times]
There is significant variation in I/O finish time among MPI processes of the same application.
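The waste can be made concrete with a small back-of-the-envelope calculation. The sketch below is illustrative only (the function name and numbers are made up, not from the GIFT paper): it treats each OST's share as reserved until the application's slowest process finishes.

```python
# Illustrative only: quantify bandwidth lost to non-synchronous I/O progress.
# An app's I/O phase ends only when its slowest process finishes, so bandwidth
# reserved on the faster OSTs sits idle in the meantime.
def effective_fraction(per_ost_times):
    """per_ost_times: time each OST needs to serve this app's I/O share."""
    useful = sum(per_ost_times)                         # busy bandwidth-time
    reserved = max(per_ost_times) * len(per_ost_times)  # held until slowest ends
    return useful / reserved

# Two OSTs finish in 10 time units, one needs 20: a third of the reserved
# bandwidth-time contributes nothing to synchronous progress.
print(round(effective_fraction([10, 10, 20]), 2))  # → 0.67
```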
GIFT Challenges
The need for synchronous I/O progress in parallel applications poses new challenges in maintaining efficiency and fairness in I/O bandwidth allocation.
Let’s look at some bandwidth allocation policies and compare them.
[Figure: example bandwidth allocations (0–100% B/W) of apps A–E across OST 1, OST 2, and OST 3 under POFS, BSIP, and MBW]
Per-OST Fair Share (POFS): fair, but I/O progress is not synchronous, and bandwidth is wasted.
Basic Synchronous I/O Progress (BSIP): fair and synchronous, but bandwidth is still wasted.
Minimum Bandwidth Wastage (MBW): synchronous with no bandwidth waste, but not fair.
GIFT balances all three goals: fairness, synchronous I/O progress, and minimal bandwidth wastage.
Three Key Ingredients
1. Fairness: GIFT breaks away from instantaneous fairness and instead maintains fairness over a long time window. It uses a barter system for unfair treatment: compute hours are awarded to compensate for unfair I/O bandwidth allocation, drawn from a "system compute hour regret budget".
2. Synchronous I/O Progress: GIFT’s initial allocation is the same as the BSIP scheme, and any subsequent readjustments preserve this property.
3. Minimize B/W Wastage: GIFT designs a "throttle-and-reward" mechanism that picks "throttle-friendly" applications, issues them coupons to reduce bandwidth waste at a given time, and rewards them later (i.e., redeems their coupons).
GIFT Workflow
At every decision instance:
- Perform BSIP bandwidth allocation
- Determine throttle-friendly applications (whom to throttle?)
- Issue coupons to throttled applications
- Redeem coupons (which coupons to redeem?)
- Allocate bandwidth optimally (how much to throttle and expand?)
- Increase or decrease the redemption rate
Identifying Throttle-Friendly Applications
- Careful design leads to a minimal system regret budget (compute hours given out due to unfair treatment in the long term).
- Throttle-friendly apps can also be expanded if deemed beneficial.
- The set of throttle-friendly applications changes over time.
N is the length of the receding window; τ is the minimum redemption rate required for an app to be throttle-eligible.
[Figure: timeline over the receding window: initial redemption, coupons issued, coupons redeemed]
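As a sketch of how the τ threshold might gate eligibility over the receding window (the helper name, window representation, and default values are assumptions for illustration, not GIFT’s actual code):

```python
# Hypothetical sketch: an app stays throttle-eligible only if GIFT has managed
# to redeem at least a fraction tau of the coupon value issued to it over the
# last N decision instances (the receding window).
from collections import deque

def throttle_eligible(history, N=8, tau=0.5):
    """history: per-instance (coupon_issued, coupon_redeemed) values."""
    window = deque(history, maxlen=N)   # keep only the last N instances
    issued = sum(i for i, _ in window)
    redeemed = sum(r for _, r in window)
    if issued == 0:
        return True                     # never throttled: nothing is owed yet
    return redeemed / issued >= tau

print(throttle_eligible([(15, 9), (0, 6), (10, 0)]))  # 15/25 = 0.6 >= 0.5 → True
```

An app whose coupons keep going unredeemed drops below τ and stops being throttled, which is what keeps the system regret budget small.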
Careful Coupon Redemption
GIFT redeems coupons only when doing so does not require throttling other applications.
[Figure: three allocation snapshots on OST 1 and OST 2. Initially, OST 1 serves A (38%), B (25%), and C (38%), and OST 2 serves B, D, E, and F (25% each). When spare B/W becomes available and A holds an outstanding coupon worth 15%, the spare bandwidth could be divided equally (A and C at 33% each), but GIFT does not do this; instead, GIFT partially redeems A’s coupon (A at 42%, C at 33%) without throttling C.]
One may argue that if spare I/O bandwidth is available, applications would have naturally been allocated that I/O bandwidth. So, how does GIFT reduce wasted bandwidth?
GIFT vs. BSIP: a three-instance example
[Figure: bandwidth shares (0–100% B/W) on OST 1 and OST 2 at decision instances k1, k2, and k3.
Instance k1: BSIP allocates A (50%) and B (50%) on OST 1, and B (50%) on OST 2. GIFT instead issues a coupon worth 15% B/W on one OST to app A: A gets 35% and B gets 65% on OST 1, letting B also use 65% on OST 2.
Instance k2: BSIP allocates A (38%), B (25%), and C (38%) on OST 1, and B, D, E, F (25% each) on OST 2. GIFT redeems app A’s coupon with 9% B/W on one OST: A (42%), B (25%), C (33%).
Instance k3: GIFT redeems app A’s coupon with the remaining 6% B/W on one OST: A (39%), B (25%), C (36%).]
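The coupon bookkeeping in this walkthrough can be sketched as a small ledger (the class and method names are assumed for illustration; this is not GIFT’s implementation):

```python
# Hypothetical coupon ledger: tracks bandwidth owed to throttled apps and
# redeems it only from spare bandwidth, never by throttling someone else.
class CouponLedger:
    def __init__(self):
        self.owed = {}   # app -> outstanding coupon value (fraction of one OST)

    def issue(self, app, amount):
        self.owed[app] = self.owed.get(app, 0.0) + amount

    def redeem(self, app, spare):
        """Grant as much of app's coupon as spare bandwidth allows."""
        grant = min(self.owed.get(app, 0.0), spare)
        self.owed[app] = self.owed.get(app, 0.0) - grant
        return grant

ledger = CouponLedger()
ledger.issue("A", 0.15)                    # instance k1: A throttled by 15%
print(round(ledger.redeem("A", 0.09), 2))  # instance k2: 9% spare → 0.09
print(round(ledger.redeem("A", 0.10), 2))  # instance k3: only 6% still owed → 0.06
```

Replaying the slide’s numbers: after issuing a 15% coupon at k1, GIFT redeems 9% at k2 and the remaining 6% at k3, leaving nothing outstanding.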
Optimal I/O Bandwidth Allocation
How much to throttle, and whom to expand by how much? This is formulated as a linear programming optimization problem:

    maximize  Σ_{j ∈ J} Σ_{i ∈ S_j} b_i

where J is the set of all OSTs, S_j is the set of all apps on OST j, and b_i is the B/W allocation of app i, subject to the constraints:
- All I/O requests of an application issued across all OSTs should get the same B/W (for synchronous I/O progress).
- The final B/W allocation should be fair.
- All OSTs are constrained by their full capacity.
[Figure: example placement of apps A–D across OSTs j = 1, 2, 3]
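This LP can be sketched with an off-the-shelf solver. The version below uses scipy.optimize.linprog on an assumed toy app-to-OST placement, and omits the fairness floor constraints for brevity; it keeps the two structural properties named above (one b_i per app across all its OSTs, and per-OST capacity rows).

```python
# Hedged sketch of GIFT's bandwidth-allocation LP (placement and names assumed,
# fairness constraints omitted). Each app i gets a single bandwidth fraction
# b_i applied on every OST it touches (synchronous I/O progress); each OST's
# total allocation is capped at 100%; the objective maximizes the summed
# allocation served across all OSTs.
from scipy.optimize import linprog

apps = ["A", "B", "C", "D"]
osts = {1: ["A", "D"], 2: ["A", "B", "C"], 3: ["A", "D"]}  # assumed placement
idx = {a: k for k, a in enumerate(apps)}

# Objective: maximize sum_j sum_{i in S_j} b_i  ->  minimize its negation.
c = [0.0] * len(apps)
for members in osts.values():
    for a in members:
        c[idx[a]] -= 1.0

# Capacity: allocations on each OST must not exceed 100%.
A_ub, b_ub = [], []
for members in osts.values():
    row = [0.0] * len(apps)
    for a in members:
        row[idx[a]] = 1.0
    A_ub.append(row)
    b_ub.append(1.0)

res = linprog(c, A_ub=A_ub, b_ub=b_ub,
              bounds=[(0.0, 1.0)] * len(apps), method="highs")
print({a: round(res.x[idx[a]], 3) for a in apps})
```

In GIFT’s full formulation, fairness would appear as additional constraints on each b_i; the capacity rows above correspond only to the "full capacity" constraint.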
GIFT: A Coupon Based Throttle-and-Reward Mechanism for Fair and Efficient I/O Bandwidth Management on Parallel Storage Systems
Tirthak Patel, Northeastern University
Rohan Garg, Nutanix
Devesh Tiwari, Northeastern University
Abstract
Large-scale parallel applications are highly data-intensive and perform terabytes of I/O routinely. Unfortunately, on a large-scale system where multiple applications run concurrently, I/O contention negatively affects system efficiency and causes unfair bandwidth allocation among applications. To address these challenges, this paper introduces GIFT, a principled dynamic approach to achieve fairness among competing applications and improve system efficiency.
1 Introduction
Problem Space and Gaps in Existing Approaches. Increase in computing power has enabled scientists to expedite the scientific discovery process, but scientific applications produce more and more analysis and checkpoint data, worsening their I/O bottleneck [7, 45]. Many applications spend 15-40% of their execution time performing I/O, which is expected to increase for exascale systems [12, 15, 22, 31, 53, 55]. Unfortunately, multiple concurrent applications on a large-scale system lead to severe I/O contention, limiting the usability of future HPC systems [11, 45].
Recognizing the importance of the problem, there have been numerous efforts to mitigate I/O contention from both I/O throughput and fairness perspectives [13, 14, 17, 25, 37, 42, 75, 76, 78, 88, 89]. Unfortunately, ensuring fairness and maximizing throughput are conflicting objectives, and it is challenging to strike a balance between them under I/O contention. For parallel HPC applications, the side-effect of I/O contention is further amplified because of the need for synchronous I/O progress. HPC applications are inherently tightly synchronized; during an I/O phase, MPI processes of an HPC application must wait for all processes to finish their I/O before resuming computation (i.e., synchronous I/O progress among MPI processes is required) [28, 31, 39, 57, 90].
MPI processes of an HPC application perform parallel I/O access to multiple back-end storage targets (e.g., an array of disks) concurrently. These back-end storage targets are shared among concurrently running applications and have different degrees of sharing over time and hence, a varying level of contention. A varying level of I/O contention at the shared back-end parallel storage system makes different MPI processes progress at different rates and hence, leads to non-synchronous I/O progress. In Sec. 2, we quantify non-synchronous I/O progress as a key source of inefficiency in shared parallel storage systems. It results in (1) wastage of compute cycles on compute nodes, and (2) reduction in effective system I/O bandwidth (i.e., the bandwidth that contributes toward synchronous I/O progress), since full bandwidth is not utilized toward synchronous I/O progress.
Recent works have noted that non-synchronous I/O progress degrades application and system performance on modern supercomputers like Mira, Edison, Cori, and Titan [9, 31, 32, 39, 69, 83]. Thus, there is an emerging interest in improving the quality-of-service (QoS) of parallel storage systems [24, 80, 86]. Previous works have proposed rule-based or ad-hoc bandwidth allocation strategies for HPC storage [14, 17, 23, 36, 42, 88, 89]. However, existing approaches do not systematically implement synchronous I/O progress to balance the competing objectives: improving effective system I/O bandwidth and improving fairness.
To bridge this solution gap, this paper describes GIFT, a coupon-based bandwidth allocation approach to ensure synchronous I/O progress of HPC applications while maximizing I/O bandwidth utilization and ensuring fairness among concurrent applications on parallel storage systems.
Summary of the GIFT Approach. GIFT introduces two key ideas: (1) Relaxing the fairness window: GIFT breaks away from the traditional concept of instantaneous fairness at each I/O request, and instead, ensures fairness over multiple I/O phases and runs of an application. This opportunity is enabled by exploiting the observation that HPC applications have multiple I/O phases during a run and are highly repetitive, often exhibiting similar behavior across runs; and (2) Throttle-and-reward approach for I/O bandwidth allocation: GIFT opportunistically throttles the I/O bandwidth of certain applications at times in an attempt to improve the overall effective system I/O bandwidth (i.e., it minimizes the wasted I/O bandwidth that does not contribute toward synchronous I/O progress). GIFT’s throttle-and-reward approach intelligently exploits instantaneous opportunities to improve effective system I/O bandwidth. Further, relaxing the fairness window enables GIFT to reward the "throttled" application at a later point to ensure fairness.
More GIFT Design and Implementation Details (it’s in the paper!)
- Mathematical formulation of throttle-friendly application selection
- Balancing system regret budget vs. stability of throttling decisions
- Details of the bandwidth allocation optimization solution
- Design parameters and their impact
- GIFT prototype implementation details
Evaluation and Analysis
Experimental Methodology
A FUSE-based prototype is used for testbed-based evaluation. The testbed evaluation uses job characteristics from the Stampede2, Mira, and Theta supercomputers: number of nodes, compute time, amount of data I/O, I/O interval, job inter-arrival time, backfilling scheduling strategy, etc. Refer to the paper for more details and the simulation-based setup.
Competing Strategies
- Per-OST Fair Share (POFS)
- Basic Synchronous I/O Progress (BSIP)
- Minimum B/W Waste (MBW)
Other selective-throttle/expand-focused heuristics:
- Throttle Randomly (RND)
- Throttle Small App (TSA)
- Throttle Most Frequent App (TMF)
- Expand Small App (ESA)
GIFT improves system I/O bandwidth, mean app I/O time, and runtime.
The GIFT real-system prototype improves system bandwidth by more than 15% and app I/O time by more than 10%, compared to POFS.
GIFT’s fairness is comparable to BSIP and much fairer than MBW.
POFS is the baseline for fairness. The average I/O time degradation for degraded apps is only 1.2% with GIFT.
Simulation-based results confirm the real-system prototype results.
Simulation results show even larger improvements because of (1) a longer time window and (2) a larger system scale. GIFT can even improve the overall system throughput.
GIFT is not inherently biased against certain types of I/O behaviors: applications with different I/O behaviors observe an improvement with GIFT.
GIFT needs to award outstanding compute node-hours for coupons that are not redeemed; it can bound these hours at a low level even under pessimistic scenarios. GIFT’s system regret budget needed to award outstanding hours is low.
Figure 7: GIFT’s implementation provides improvement for both application- and system-level objectives (higher is better). Panels: (a) Mean App I/O Time, (b) Mean App Runtime, (c) Effective System I/O B/W, (d) System Throughput.
Scheduling Policies. We evaluate GIFT against seven competing I/O scheduling policies: Per-OST Fair Share (POFS), Basic Synchronous I/O Progress (BSIP), Minimum Bandwidth Wastage (MBW), Throttle Small Applications (TSA), Expand Small Applications (ESA), Throttle Most Frequent Applications (TMF), and Throttle Randomly (RND). POFS, BSIP, and MBW are implemented as discussed in Sec. 2. TSA attempts to increase the effective system bandwidth by throttling small applications, while ESA attempts to improve the system throughput by increasing the bandwidth allocation for longer-running, smaller applications that generally do small I/O [2, 4, 5]. We also compare against other simple, intuitive strategies such as TMF and RND, which pick the "most frequently appearing" and "random" applications for bandwidth throttling, respectively. POFS is used as the baseline policy.
Objective Metrics. Application I/O Time is the amount of time spent in I/O by an application during its run. Application Run Time is the run time of the application. Effective System Bandwidth is the average effective I/O bandwidth during the run of an application set, defined as overall system bandwidth minus the wasted bandwidth (Sec. 2). System Throughput is the number of jobs completed per unit time.
GIFT’s real-system implementation provides better application- and system-level performance. First, our results show that GIFT outperforms all competing techniques significantly. Fig. 7 (a)-(d) show that GIFT performs better for mean application I/O time, mean application runtime, effective system bandwidth, and system throughput, respectively. The mean application I/O time with GIFT is 10% better than with POFS, and 3.5% better than the next best technique, BSIP. Interestingly, when applications are throttled based on their characteristics (TSA, ESA, and TMF), or are arbitrarily throttled (RND), the performance remains similar to that of BSIP. This shows that naïve, rule-based techniques cannot match the performance delivered by the GIFT approach.
GIFT also improves the effective system bandwidth by more than 17% compared to POFS and other techniques, except MBW. Expectedly, MBW improves the effective system bandwidth the most because it solely focuses on this metric. Next, we note that by compromising fairness one could design techniques that solely focus on improving system throughput (e.g., favor small jobs). GIFT does not compromise fairness,
Figure 8: GIFT implementation bounds outstanding node-hours using application- and system-level redemption rate thresholds.
and it neither directly manipulates nor aims to improve the system job throughput, but by virtue of reducing I/O bandwidth waste and mean application I/O time, GIFT yields a 2% improvement in system throughput. We note that even a small improvement in system throughput leads to large monetary savings in the operational cost of HPC systems [18, 71, 84].
Next, we recall that GIFT gives out compute node-hours as regret, but this is minimal compared to the system throughput improvement it enables (2% savings in total compute node-hours). Fig. 8 shows that GIFT gave out less than 0.06% of total compute node-hours from the system regret budget in a more-than-two-day-long experimental run; this result shows that application- and system-level redemption rate thresholds keep the system regret budget under control. Even if one were to award outstanding node-hours every day, GIFT would give out only 0.12% of node-hours, which is much smaller than the gains in system throughput (2%); this trend is also later supported by simulation results.
Next, we discuss the effectiveness of GIFT in terms of fairness. First, recall that the design of GIFT introduces two ideas: (1) opportunistically rewarding applications, and (2) compensating unfairness in I/O performance via additional compute hours. These ideas do not naturally align with the traditional notion of fairness, where a scheme tends to distribute the "benefits" equally among all applications and the "currency" of fairness measurement remains the same. In contrast, GIFT is designed to distribute the benefit opportunistically among applications because, as discussed earlier, distributing the benefits equally among all applications leads to benefit (system bandwidth) wastage due to non-synchronous I/O progress. GIFT achieves fairness by compensating I/O unfairness with compute resources. Therefore, GIFT’s performance cannot be directly compared with POFS to establish its fairness effectiveness. Nevertheless, we provide this comparison for completeness and to demonstrate that GIFT is not unfair.
GIFT is open-sourced at https://github.com/GoodwillComputingLab/GIFT
Where is my gift in all this?