New Algorithms for Planning Bulk Transfer via Internet and Shipping Networks
Brian Cho, Indranil Gupta
University of Illinois at Urbana-Champaign
Motivation: Ad-hoc Data Processing
• Data-intensive research on Open Cirrus
  – Federated cloud: diverse geographic locations
  – Data scale of TBs
• Limited wide-area bandwidth is a big bottleneck: transfer over the internet can take days or weeks [Garfinkel 07]
• Success story: Washington Post
  – Hillary Clinton White House schedule
    • Released as 17,481 pages of non-searchable PDF images
    • Converted to searchable text and delivered to the newsroom within the same news cycle
  – Done within 26 hours with Amazon AWS
    • Pay for bandwidth and compute usage
Bulk Transfer Options
• Internet Transfer
  – Grid: [GridFTP]
  – PlanetLab: [CoBlitz 06]
• Disk Shipping Transfer
  – [Jim Gray 03]
  – [PostManet 04]
  – [DOT 06]
  – Amazon AWS Import/Export
• Pandora (People and networks moving data around)
  – First ever solution to transfer data cooperatively between multiple sources with internet and shipping edges
  – Produces optimal transfer plans that obey time deadlines and minimize dollar cost
  – Better than internet-only and shipping-only strategies
Option 1: Internet Transfer
• Sites: Data Source (Illinois), Data Source (CMU), Computation Provider (Amazon)
• Bandwidth: 5-20 Mbps, so 1 TB takes 5-20 days
• Link costs shown: $0.10 per GB and No Cost
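The 5-20 day figure follows from dividing the data volume by the link rate. A minimal Python sketch, assuming decimal units (1 TB = 10^12 bytes, 1 GB = 10^9 bytes) and a link that stays fully utilized for the whole transfer; the function names are illustrative, not from the talk.

```python
SECONDS_PER_DAY = 86_400


def internet_transfer_days(terabytes: float, mbps: float) -> float:
    """Days to push `terabytes` through a link running at `mbps` megabits/s (assumes full utilization)."""
    bits = terabytes * 1e12 * 8        # decimal TB -> bits
    seconds = bits / (mbps * 1e6)      # Mb/s -> bits/s
    return seconds / SECONDS_PER_DAY


def internet_transfer_cost(terabytes: float, dollars_per_gb: float) -> float:
    """Metered cost for moving `terabytes` at a flat per-GB rate."""
    return terabytes * 1000 * dollars_per_gb


if __name__ == "__main__":
    for rate in (5, 20):
        print(f"1 TB at {rate} Mbps: {internet_transfer_days(1, rate):.1f} days")
    print(f"1 TB at $0.10/GB: ${internet_transfer_cost(1, 0.10):.2f}")
```

At 5 Mbps this gives about 18.5 days and at 20 Mbps about 4.6 days, consistent with the slide's rounded 5-20 day range; at $0.10 per GB, 1 TB costs $100.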
Option 2: Disk Shipping Transfer
• Sites: Data Source (Illinois), Data Source (CMU), Computation Provider (Amazon)
• Disk interface: 40 MB/s
• Shipping rates on the three links shown (per disk):
  – Overnight: $60, Two-Day: $30, Ground: $10
  – Overnight: $50, Two-Day: $25, Ground: $5
  – Overnight: $40, Two-Day: $15, Ground: $5
• At the computation provider: $0.02 per GB (loading) and $80 per disk (handling)
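Shipping and handling are charged per disk while loading is charged per GB, so the cost per GB delivered depends on how full each disk is. The sketch below combines those terms under stated assumptions: a 2 TB disk capacity (the capacity used on the decomposed-edge slide), the $0.02 per GB and $80 per disk provider charges shown above, and the 40 MB/s disk interface. All function and parameter names are hypothetical.

```python
import math

DISK_CAPACITY_TB = 2.0    # assumption: per-disk capacity, matching the 2 TB edges later in the talk
DISK_INTERFACE_MB_S = 40  # disk interface throughput from this slide (megabytes/s)


def disks_needed(terabytes: float) -> int:
    """Number of disks required to hold `terabytes`."""
    return math.ceil(terabytes / DISK_CAPACITY_TB)


def disk_shipment_cost(terabytes, ship_per_disk, load_per_gb=0.02, handling_per_disk=80.0):
    """Per-disk shipping + per-GB loading + per-disk handling for one shipment."""
    n = disks_needed(terabytes)
    return n * ship_per_disk + terabytes * 1000 * load_per_gb + n * handling_per_disk


def disk_interface_hours(terabytes):
    """Hours to move `terabytes` through the 40 MB/s disk interface."""
    return terabytes * 1e12 / (DISK_INTERFACE_MB_S * 1e6) / 3600


for tb in (1.0, 2.0):
    cost = disk_shipment_cost(tb, ship_per_disk=10.0)  # Ground at $10 per disk
    print(f"{tb} TB: ${cost:.0f} (${cost / (tb * 1000):.3f}/GB), "
          f"{disk_interface_hours(tb):.1f} h at the interface")
```

Under these assumptions a half-full disk sent by ground costs $110 (about $0.11 per GB) and a full one $130 (about $0.065 per GB), and pushing 2 TB through the interface takes roughly 13.9 hours.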
Cooperative Transfer Solutions
• Good solutions
  – Meet deadlines
  – Minimize dollar cost
• Complexity
  – Global scale
  – Many strategies
  – Collaboration helps
• How to find the best solution?
(Figure: Open Cirrus sites)
Example: Minimize Dollar Cost
• Data Source A: 0.8 TB; Data Source B: 1.2 TB; destination: Cloud Service Provider
• Internet link, No Cost: 15 Days
• Ground shipment, $5: 5 Days
• At the provider: Loading $40, Handling $80, 14 hours
• Total Cost: $125; Total Time: 20 Days
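The slide's totals are itemized sums. The sketch below reads the labels as a sequential plan, which is an interpretation rather than something the slide states: A sends 0.8 TB to B over the free internet link (15 days), B sends one disk holding all 2 TB by ground ($5, 5 days), and the provider loads it (2,000 GB at $0.02 per GB = $40, about 14 hours at 40 MB/s) and charges one $80 handling fee.

```python
# Itemized reading of the "Minimize Dollar Cost" example.
# Each stage: (description, dollars, hours). The stage roles are an assumed interpretation.
GB_PER_TB = 1000
DISK_MB_PER_S = 40

load_hours = 2 * 1e12 / (DISK_MB_PER_S * 1e6) / 3600  # 2 TB through the disk interface

PLAN = [
    ("A -> B over the internet (0.8 TB)", 0.0, 15 * 24),
    ("B -> provider by ground (1 disk)", 5.0, 5 * 24),
    ("loading 2 TB at $0.02/GB", 2 * GB_PER_TB * 0.02, load_hours),
    ("handling (1 disk)", 80.0, 0.0),
]

total_cost = sum(cost for _, cost, _ in PLAN)
total_hours = sum(hours for _, _, hours in PLAN)
# Average rate needed to move 0.8 TB in 15 days, in Mb/s.
implied_mbps = 0.8e12 * 8 / (15 * 86_400) / 1e6

print(f"cost ${total_cost:.0f}, time {total_hours / 24:.1f} days, link about {implied_mbps:.1f} Mbps")
```

This reproduces the $125 total; the elapsed time comes to about 20.6 days, so the slide's 20 days appears to round off the 14-hour load. The implied 4.9 Mbps sits at the low end of the 5-20 Mbps range quoted for internet transfer.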
Example: Meet Deadline (3 days) while Minimizing Dollar Cost
• Data Source A: 0.8 TB; Data Source B: 1.2 TB; destination: Cloud Service Provider
• Overnight shipment, $40: 1 Day
• Overnight shipment, $50: 1 Day
• Disk interface times: 6 hours and 14 hours
• At the provider: Loading $40, Handling $80
• Total Cost: $210; Total Time: 3 Days
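The cost is $40 + $50 + $40 + $80 = $210 with a single handling fee, which suggests both sources' data reach the provider on one disk. The sketch below assumes one reading of the slide: A copies its 0.8 TB to disk (about 6 hours at 40 MB/s), ships it overnight to B ($40), B ships overnight to the provider ($50), and the provider loads 2 TB (about 14 hours). Treating every stage as strictly sequential gives a conservative elapsed time; the stage roles and the `evaluate` helper are hypothetical.

```python
DEADLINE_HOURS = 3 * 24

# (stage, dollars, hours); the stage roles are an assumed reading of the slide.
STAGES = [
    ("copy 0.8 TB onto disk at source A", 0.0, 6),
    ("overnight A -> B", 40.0, 24),
    ("overnight B -> provider", 50.0, 24),
    ("load 2 TB at provider ($0.02/GB)", 40.0, 14),
    ("handling, one disk", 80.0, 0),
]


def evaluate(stages, deadline_hours):
    """Total cost, and elapsed time with stages run back to back (a conservative estimate)."""
    cost = sum(c for _, c, _ in stages)
    hours = sum(h for _, _, h in stages)
    return cost, hours, hours <= deadline_hours


cost, hours, ok = evaluate(STAGES, DEADLINE_HOURS)
print(f"cost ${cost:.0f}, {hours} hours, meets deadline: {ok}")
```

Under this reading the plan finishes in 68 hours, inside the 72-hour deadline, at the slide's $210.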
Outline
• Motivation
• Problem Formulation
  – Graph Model
  – Flow Over Time
• Solution: Pandora
• Experimental Results
• Conclusion
Graph Model: Internet Links
• Each site (Site A, Site B) has inet_out and inet_in vertices
• Incoming/outgoing BW: edges between each site and its inet_in/inet_out vertices
• Internet link: Capacity (Mb/s), Cost ($/GB), Transit time (almost instantaneous)
Graph Model: Shipment Links
• Each site (Site A, Site B) adds a ship_in vertex alongside inet_out and inet_in
• Incoming/outgoing BW: as for internet links
• Disk interface: BW, e.g., 40 MB/s; Cost: Loading ($/GB)
• Internet link: Capacity (Mb/s), Cost ($/GB), Transit time (almost instantaneous)
• Shipment link: Capacity (almost infinite), Cost: Shipping and Handling ($/Disk), Transit time (Hrs)
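A minimal Python sketch of this graph model, assuming each site contributes a site vertex plus the inet_out, inet_in, and ship_in vertices named on the slides, that the disk interface is an edge from ship_in to the site, and that all capacities are converted to GB per hour. The `Edge` and `Network` classes, their field names, and the 10 Mb/s and $40 values in the usage lines are hypothetical; the 40 MB/s interface and $0.02 per GB loading figures come from the earlier slides.

```python
from dataclasses import dataclass, field


def mbps_to_gb_per_hour(mbps: float) -> float:
    """Megabits/s -> GB/hour (decimal units)."""
    return mbps * 1e6 / 8 * 3600 / 1e9


def mb_per_s_to_gb_per_hour(mb_per_s: float) -> float:
    """Megabytes/s -> GB/hour (decimal units)."""
    return mb_per_s * 1e6 * 3600 / 1e9


@dataclass
class Edge:
    src: str
    dst: str
    capacity_gbph: float        # GB/hour; float("inf") for shipment links
    cost_per_gb: float = 0.0    # linear cost, e.g. internet or loading charges
    cost_per_disk: float = 0.0  # shipping and handling, charged per disk
    transit_hours: float = 0.0  # ~0 for internet links


@dataclass
class Network:
    edges: list = field(default_factory=list)

    def add_site(self, site, up_mbps, down_mbps, disk_mb_per_s, load_per_gb):
        # Outgoing and incoming bandwidth of the site's internet connection.
        self.edges.append(Edge(site, f"{site}.inet_out", mbps_to_gb_per_hour(up_mbps)))
        self.edges.append(Edge(f"{site}.inet_in", site, mbps_to_gb_per_hour(down_mbps)))
        # Disk interface for arriving shipments, with a per-GB loading charge (assumed direction).
        self.edges.append(Edge(f"{site}.ship_in", site,
                               mb_per_s_to_gb_per_hour(disk_mb_per_s), cost_per_gb=load_per_gb))

    def add_internet_link(self, a, b, mbps, cost_per_gb):
        self.edges.append(Edge(f"{a}.inet_out", f"{b}.inet_in",
                               mbps_to_gb_per_hour(mbps), cost_per_gb=cost_per_gb))

    def add_shipment_link(self, a, b, cost_per_disk, transit_hours):
        # Shipment capacity is treated as unbounded ("almost infinite" on the slide).
        self.edges.append(Edge(a, f"{b}.ship_in", float("inf"),
                               cost_per_disk=cost_per_disk, transit_hours=transit_hours))


net = Network()
for s in ("A", "B"):
    net.add_site(s, up_mbps=10, down_mbps=10, disk_mb_per_s=40, load_per_gb=0.02)  # 10 Mb/s is hypothetical
net.add_internet_link("A", "B", mbps=10, cost_per_gb=0.0)
net.add_shipment_link("A", "B", cost_per_disk=40.0, transit_hours=24)  # hypothetical overnight link
print(len(net.edges))
```

Keeping per-GB and per-disk costs as separate fields mirrors the slides' distinction between linear internet/loading charges and per-disk shipping charges, which is what later forces the fixed-cost edges.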
Data Transfer Over Time
• Goal: Meet time deadline T while minimizing dollar cost C
• Hard problem on a graph with both Internet and Shipment links
  – NP-Hard
  – Formal problem and proof in paper
• Solution: Pandora computes optimal and approximate solutions
Solution: Pandora Overview
• Transform into static time-expanded network
  – Decomposition of shipping edges
• Solve min-cost flow on static network
  – Mixed Integer Program
  – Optimizations to reduce computation time
Time-expanded Network
• Intuitively, incorporate time into the graph to create an extended graph representation
• Make T copies of each vertex, where T = deadline
• Draw edges according to transit time
• Draw holdover edges
• [Ford Fulkerson 58]
• Disk shipment represented as a time-expanded network
(Figure: shipment edges with transit times τ = 1 and τ = 3 in a time-expanded network with T = 5; horizontal axis: time)
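A minimal Python sketch of the construction steps listed above, assuming integer transit times in the same units as T and time steps numbered 0 to T-1. The vertex names and the two A-to-B shipment options (τ = 1 and τ = 3, echoing the figure) are illustrative; this is not the paper's code.

```python
from collections import Counter


def time_expand(vertices, edges, T):
    """Build a time-expanded network with T time steps (0..T-1).

    edges: list of (u, v, transit) with integer transit times.
    Returns a list of ((u, t), (v, t_arrive), kind) triples.
    """
    expanded = []
    for u, v, tau in edges:
        # One copy per departure time whose arrival t + tau stays inside the horizon.
        for t in range(T - tau):
            expanded.append(((u, t), (v, t + tau), "transit"))
    for v in vertices:
        # Holdover edges let data wait at a vertex from one time step to the next.
        for t in range(T - 1):
            expanded.append(((v, t), (v, t + 1), "holdover"))
    return expanded


# Illustrative input: two shipment options from A to B with transit times 1 and 3, T = 5.
net = time_expand(["A", "B"], [("A", "B", 1), ("A", "B", 3)], T=5)
print(Counter(kind for _, _, kind in net))
```

With T = 5 the τ = 1 option gets 4 departure copies and the τ = 3 option gets 2, plus 4 holdover edges per vertex. The edge count grows linearly with T, which is why the size of the resulting program grows with the deadline.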
Decomposed Shipping Edges
• Decompose shipping edges into fixed-cost edges, each with:
  1. Transit time
  2. Fixed cost
  3. Capacity
(Figure: decomposed edges with cost = $130, $110, and $100, each with capacity = 2 TB)
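One way to read the decomposition: a shipping service priced per disk becomes a set of parallel edges, each carrying at most one disk's worth of data and charging its fixed cost only if it carries any flow. The sketch below assumes that reading, uses $100 and 2 TB from one of the figure's edges as sample values, and uses hypothetical helper names. Because every decomposed edge has the same capacity, filling the cheapest edges first is optimal here; that shortcut does not hold in general.

```python
import math


def decompose_shipping_edge(u, v, transit, fixed_cost, disk_tb, max_tb):
    """Replace one per-disk shipping edge with parallel fixed-cost edges, one per disk.

    Each edge carries at most `disk_tb` and costs `fixed_cost` if it is used at all.
    """
    n_disks = math.ceil(max_tb / disk_tb)
    return [{"src": u, "dst": v, "transit": transit,
             "fixed_cost": fixed_cost, "capacity_tb": disk_tb}
            for _ in range(n_disks)]


def cost_to_send(flow_tb, edges):
    """Dollar cost of sending flow_tb: a fixed cost for every edge (disk) that carries flow."""
    remaining, cost = flow_tb, 0.0
    for e in sorted(edges, key=lambda e: e["fixed_cost"]):  # cheapest disks first
        if remaining <= 0:
            break
        cost += e["fixed_cost"]
        remaining -= min(remaining, e["capacity_tb"])
    if remaining > 0:
        raise ValueError("not enough disk capacity on this shipping edge")
    return cost


edges = decompose_shipping_edge("A", "B", transit=1, fixed_cost=100.0, disk_tb=2.0, max_tb=6.0)
for tb in (0.5, 2.0, 2.5, 6.0):
    print(tb, "TB ->", cost_to_send(tb, edges))
```

The resulting cost is a step function of the flow (0.5 TB and 2 TB both cost $100, 2.5 TB costs $200), which is exactly what a linear min-cost flow cannot express and what the binary variables in the MIP capture.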
Solution: Min-cost Flow Calculation using Mixed-Integer Program
• Fixed-cost edges make min-cost flow calculation NP-Hard
• Mixed-Integer Program (MIP)
  – Binary variable y_e defined on fixed-cost edges
• Goal: Minimize dollar cost
• Subject to
  – Capacity constraints (flow_e ≤ capacity_e ∙ y_e)
  – Conservation of flow
  – Demands of sources and sink
• Proof of NP-Hardness and formal MIP in paper
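The bullets describe a fixed-charge network flow problem. Written out in the standard textbook form, with notation introduced here for illustration (this is a sketch, not the paper's formal MIP): on the time-expanded network G = (V, E), f_e is the flow on edge e, c_e its per-GB cost, u_e its capacity, F ⊆ E the fixed-cost edges with fixed cost k_e and binary variable y_e, and b_v the supply at v (positive at the source copies, negative at the sink, zero elsewhere). δ⁺(v) and δ⁻(v) are the edges leaving and entering v.

```latex
\begin{aligned}
\min \quad & \sum_{e \in E} c_e f_e \;+\; \sum_{e \in F} k_e \, y_e \\
\text{s.t.} \quad & 0 \le f_e \le u_e \, y_e && \forall e \in F \\
& 0 \le f_e \le u_e && \forall e \in E \setminus F \\
& \sum_{e \in \delta^{+}(v)} f_e \;-\; \sum_{e \in \delta^{-}(v)} f_e = b_v && \forall v \in V \\
& y_e \in \{0, 1\} && \forall e \in F
\end{aligned}
```

The first constraint is the slide's flow_e ≤ capacity_e ∙ y_e: an edge can carry flow only if its fixed cost is paid.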
Optimizations: Overview
• Size of MIP grows linearly with deadline T
  – Worst-case running time grows exponentially with T
• Reduce size of the MIP
  – Reduce number of shipment edges
  – Δ-condensed time-expanded networks
• More optimizations in paper
Optimizations: Reduce Number of Shipment Edges
• Can remove redundant shipment edges
• Example:
  – An overnight shipment sent any time before 4pm arrives at the destination at 8am
(Figure: overnight shipments sent at noon, 1pm, 2pm, 3pm, or 4pm all arrive at 8am)
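A sketch of one pruning rule consistent with this example, assuming that waiting at the sender is free (holdover edges carry no cost or capacity limit): among departures of the same service with the same arrival time and price, only the latest is needed, because data ready earlier can wait and leave then. Times are in hours, and the helper name and $40 price are illustrative.

```python
def prune_shipments(shipments):
    """shipments: iterable of (depart_hour, arrive_hour, cost) for one service on one route.

    Keeps only the latest departure for each (arrive_hour, cost) pair. Assumes holdover at
    the sender is free, so an earlier departure with the same arrival and price is redundant.
    """
    latest = {}
    for depart, arrive, cost in shipments:
        key = (arrive, cost)
        if key not in latest or depart > latest[key]:
            latest[key] = depart
    return sorted((depart, arrive, cost) for (arrive, cost), depart in latest.items())


# Overnight departures at noon..4pm on day 0 (hours 12-16), all arriving 8am next day (hour 32).
overnight = [(h, 32, 40.0) for h in range(12, 17)]
print(prune_shipments(overnight))  # five shipment edges collapse to one
```

Each removed edge is one fewer binary variable in the MIP, which matters because the fixed-cost edges are what make the problem hard.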
Optimization: Δ-condensed Time-expanded Network
• Each batch of Δ consecutive time units is condensed into one virtual time unit
• Solution has
  – Minimum cost
  – Deadline approximation depending on Δ
• More details in paper
• [Fleischer Skutella 07]
(Figure: example with Δ = 2)
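A minimal sketch of the condensing step. It assumes transit times are rounded up to whole virtual units so that no condensed edge arrives earlier than the real shipment would; that is one plausible rule chosen for illustration, not necessarily the paper's exact construction. The edges and the 96-hour horizon in the usage lines are hypothetical.

```python
import math


def condense(edges, T, delta):
    """Condense time by a factor of delta.

    edges: list of (u, v, transit) in real time units; T: real-time horizon.
    Each virtual unit covers `delta` real units; transit times are rounded UP
    so a condensed edge never arrives earlier than the real shipment would.
    """
    T_virtual = math.ceil(T / delta)
    condensed = [(u, v, math.ceil(tau / delta)) for u, v, tau in edges]
    return T_virtual, condensed


# Hypothetical edges in hours: an overnight shipment, a 5-hour leg, and an instantaneous internet hop.
T_v, cond = condense([("A", "B", 24), ("A", "B", 5), ("B", "t", 0)], T=96, delta=2)
print(T_v, cond)
```

With Δ = 2 the 96-hour horizon becomes 48 layers, roughly halving the number of vertex and edge copies, while the 5-hour leg is treated as 6 hours. Rounding of this kind is one source of the deadline approximation.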
Experimental Setup
• Trace-driven
  – Wrote scripts to communicate with FedEx web services: queried package rates and arrival times at the destination
  – Internet BW from PlanetLab measurements
• GNU Linear Programming Kit (GLPK)
Experimental Results: 8 sources, 0.25 TB per node, Heterogeneous BW
• Direct Internet
  – Cost: $200
  – Time: 280 hrs
  – Cannot take advantage of heterogeneous bandwidth
• Direct Overnight
  – Cost: $1,500
  – Time: 38 hrs
  – Cannot fill disks to capacity
(Figure: sources 1-8, each holding 0.25 TB, sending to sink t; edge width proportional to BW)
(Figure: Pandora's plan for the same 8 sources and sink t; labeled flows of 1.92 TB, 0.14 TB, 0.06 TB, and 0.08 TB)
• Pandora, Deadline = 96 hrs
  – Cost: $183
  – Time: < 96 hrs
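The Direct Internet cost is consistent with a flat per-GB calculation: 8 sources × 0.25 TB = 2 TB = 2,000 GB. The sketch assumes the $0.10 per GB rate from the earlier Amazon example; the experiment's actual rates come from its traces, so this is a sanity check rather than a reproduction.

```python
sources, tb_per_source = 8, 0.25
dollars_per_gb = 0.10  # assumption: rate carried over from the earlier Amazon example
total_gb = sources * tb_per_source * 1000
direct_internet_cost = total_gb * dollars_per_gb
pandora_cost = 183.0   # reported result for the 96-hour deadline
saving = 1 - pandora_cost / direct_internet_cost
print(f"{total_gb:.0f} GB -> ${direct_internet_cost:.2f}; Pandora saves {saving:.1%}")
```

Against that $200 baseline, Pandora's $183 is 8.5% cheaper while finishing in under 96 hours instead of 280.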
Experimental Results: Optimizations
• Reducing shipment edges decreases computation time
• Using Δ-condensed time-expanded networks decreases computation time
  – Deadlines met in our experiments
(Figure panels: 2 sources, 1 source)
Conclusion
• First ever solution to transfer data cooperatively between multiple sources with internet and shipping edges
• Produces optimal transfer plans that obey time deadlines and minimize dollar cost
  – Better than internet-only and shipping-only strategies
• Reasonable computation time by using optimizations