GreenHadoop: Leveraging Green Energy in Data-Processing Frameworks
Íñigo Goiri, Kien Le, Thu D. Nguyen, Jordi Guitart, Jordi Torres, and Ricardo Bianchini

Motivation
• Datacenters consume large amounts of energy
• Energy cost is not the only problem
– Brown sources: coal, natural gas…
• Connect datacenters to green sources
– Solar panels, wind turbines…
– Green datacenter
– Early examples in the field

Green datacenter
• Energy sources
– Solar/wind: variable over time
– Electrical grid: backup
• Mitigation approaches are not ideal
– Batteries and net metering
• We need to match the energy demand to the supply
[Figure: power over time, comparing solar power with the workload's load]

Delaying load within time bounds
• Delaying some jobs is OK (respecting time bounds)
[Figure: two charts of nodes and power over time showing jobs J1, J2, and J3]

Scheduling data-processing workloads in green datacenters
• Data-processing jobs
– Each task operates on a chunk of data
– Data distributed among servers
• Simple workflow: MapReduce
– Map tasks: process input data
– Reduce tasks: merge maps’ outputs
Challenges
• Match MapReduce workload with green energy availability
– No information on #nodes, length, power…
• Conserve energy while ensuring data availability
[Figure: MapReduce workflow, with Map tasks feeding Reduce tasks through a shuffle phase]

Overview of GreenHadoop
• Predict solar energy availability
• May delay jobs but must meet time bounds
– Maximize green energy use
– If not enough green energy, minimize brown electricity cost
– Brown energy cost + peak brown power cost
• Deactivate idle servers while keeping data available
• Divided into two parts
1. Computation scheduling
2. Data management
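
The brown electricity cost named in the overview has two parts: a charge for the energy consumed and a charge for the peak power drawn. The sketch below is only an illustration of that sum, not GreenHadoop's actual cost model; the function name, its parameters, the prices, and the simplified peak charge (highest average power in any slot) are all assumptions.

```python
def brown_electricity_cost(brown_kwh, price_per_kwh, slot_hours, peak_price_per_kw):
    """Return (energy_cost, peak_cost, total) for a schedule of brown energy use.

    All names and the billing model here are illustrative assumptions.
    """
    # Energy charge: each slot's brown energy times that slot's price.
    energy_cost = sum(kwh * price for kwh, price in zip(brown_kwh, price_per_kwh))
    # Peak charge: billed on the highest average brown power drawn in any slot.
    peak_kw = max((kwh / slot_hours for kwh in brown_kwh), default=0.0)
    peak_cost = peak_kw * peak_price_per_kw
    return energy_cost, peak_cost, energy_cost + peak_cost


# Hypothetical prices in $/kWh: off-peak, on-peak, on-peak, off-peak.
prices = [0.08, 0.13, 0.13, 0.08]
# The same 6 kWh of brown energy, drawn in one on-peak slot or spread off-peak.
concentrated = brown_electricity_cost([0.0, 6.0, 0.0, 0.0], prices, 1.0, 5.0)
spread = brown_electricity_cost([3.0, 0.0, 0.0, 3.0], prices, 1.0, 5.0)
```

With these made-up numbers, spreading the load into off-peak slots lowers both the energy charge and the peak charge, which is why a scheduler that may delay jobs has room to cut cost.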

1. Computation scheduling
• Estimate the energy required by jobs (EWMA)
[Figure: queue of waiting jobs, Job1 to Job6]
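
An exponentially weighted moving average (EWMA) blends each new measurement with the previous estimate. The sketch below shows the generic update rule only; the class name, the `alpha` value, and the choice to track energy in kWh are assumptions, and the slides do not say which weight GreenHadoop uses.

```python
class EnergyEstimator:
    """Hypothetical helper: EWMA of the energy consumed by completed jobs."""

    def __init__(self, alpha=0.5):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha      # weight given to the newest measurement (assumed value)
        self.estimate = None    # no estimate until the first observation

    def observe(self, measured_kwh):
        """Fold one measurement into the running estimate and return it."""
        if self.estimate is None:
            self.estimate = measured_kwh
        else:
            self.estimate = (self.alpha * measured_kwh
                             + (1.0 - self.alpha) * self.estimate)
        return self.estimate


estimator = EnergyEstimator(alpha=0.5)
history = [estimator.observe(kwh) for kwh in (10.0, 20.0, 30.0)]
```

A larger `alpha` makes the estimate react faster to recent jobs; a smaller one smooths out noise.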

1. Computation scheduling
• Assign green energy first
• Predict energy availability (weather forecast)
[Figure: power over time starting from now, with off-peak, on-peak, and off-peak price periods; queue of waiting jobs, Job1 to Job6]

1. Computation scheduling
• Assign cheap brown energy
[Figure: power over time starting from now, marking the previous peak and the off-peak, on-peak, and off-peak price periods; queue of waiting jobs, Job1 to Job6]

1. Computation scheduling
• Assign expensive energy
• Current power → Active servers
[Figure: power over time starting from now, marking the active servers and the off-peak, on-peak, and off-peak price periods; queue of waiting jobs, Job1 to Job6]

1. Computation scheduling
• As time goes by… the number of active servers changes
[Figure: power and active servers over time, starting from now]
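
The order described in these slides (green energy first, then cheap brown energy, then expensive brown energy) can be sketched as a greedy assignment over time slots. This is a simplified illustration under stated assumptions, not the scheduler from the talk: the function name, the per-slot capacity, the prices, and the idea that the slot list ends at the jobs' time bound are all hypothetical, and peak power charges are ignored.

```python
def assign_energy(required_kwh, green_kwh, brown_price, capacity_kwh):
    """Greedily spread required_kwh over the slots before the time bound.

    green_kwh[t]   predicted green energy in slot t
    brown_price[t] brown electricity price in slot t
    capacity_kwh   most energy the cluster can use in one slot (assumed constant)
    Returns (green_used, brown_used, unassigned_kwh).
    """
    slots = range(len(green_kwh))
    green_used = [0.0 for _ in slots]
    brown_used = [0.0 for _ in slots]
    remaining = required_kwh

    # Step 1: use green energy wherever it is predicted to be available.
    for t in slots:
        take = min(remaining, green_kwh[t], capacity_kwh)
        green_used[t] = take
        remaining -= take

    # Step 2: cover the rest with brown energy, cheapest slots first
    # (sorted() is stable, so equally priced slots are filled earliest first).
    for t in sorted(slots, key=lambda t: brown_price[t]):
        take = min(remaining, capacity_kwh - green_used[t])
        brown_used[t] = take
        remaining -= take

    return green_used, brown_used, remaining


# Four slots: two on-peak slots now, two off-peak slots later (made-up numbers).
green, brown, unassigned = assign_energy(
    required_kwh=10.0,
    green_kwh=[0.0, 4.0, 2.0, 0.0],
    brown_price=[0.13, 0.13, 0.08, 0.08],
    capacity_kwh=5.0,
)
```

In this example the work that green energy cannot cover is pushed into the later, cheaper slots instead of running on expensive brown energy right away. A non-zero `unassigned` value would mean the time bound cannot be met within the given capacity.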

2. Data management
• Deactivate servers to save energy
– Some data might become unavailable
• Prior solution: covering subset [Leverich’09]
– Set of servers always running has ALL data
[Figure: covering subset, with replicas of files 1 to 8 spread across servers]
• Our approach
– Only required data has to be available
– We usually require fewer active servers
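
To see why covering only the required data tends to need fewer servers than covering all data, the sketch below runs a simple greedy set-cover heuristic twice. The heuristic is swapped in purely for illustration and is not the method from the talk; the function name, server names, and file placement are assumptions.

```python
def greedy_cover(placement, files):
    """Pick servers until every file in `files` is on some chosen server.

    placement maps a server name to the set of file ids it stores.
    This is a generic greedy set-cover heuristic, used here only as an example.
    """
    uncovered = set(files)
    chosen = []
    while uncovered:
        # Take the server holding the most still-uncovered files
        # (ties go to the lowest server name).
        best = max(sorted(placement), key=lambda s: len(placement[s] & uncovered))
        gained = placement[best] & uncovered
        if not gained:
            raise ValueError(f"no server stores files {sorted(uncovered)}")
        chosen.append(best)
        uncovered -= gained
    return chosen


# Hypothetical placement of files 1 to 8 on five servers.
placement = {
    "s1": {1, 7, 2},
    "s2": {4, 3, 5, 6},
    "s3": {4, 6},
    "s4": {2, 3, 8, 4},
    "s5": {3, 6, 7},
}
all_files = set().union(*placement.values())
covering_subset = greedy_cover(placement, all_files)     # must hold ALL data
required_only = greedy_cover(placement, {1, 4, 5, 6})     # only what queued jobs read
```

With this placement, keeping every file available takes three servers, while keeping only the four required files available takes two.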
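
2. Data management
[Figure: five servers and the files they store, each file marked as required or non-required, with server states Active, Decommission, and Down. Server 1: files 1, 7, 2; Server 2: files 4, 3, 5, 6; Server 3: files 4, 6; Server 4: files 2, 3, 8, 4; Server 5: files 3, 6, 7. Running queue: JobA (file 4), JobB (file 5), JobC (files 1, 6)]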

2. Data management
• GreenHadoop (computation) requires only 2 servers
[Figure: Servers 1 to 5 and their files by state (Active, Decommission, Down), with required and non-required files marked; running queue: JobA (file 4), JobB (file 5), JobC (files 1, 6)]

2. Data management
• Move required files to Active servers
[Figure: Servers 1 to 5 and their files by state (Active, Decommission, Down); required file 1 is replicated onto Server 3; running queue: JobA (file 4), JobB (file 5), JobC (files 1, 6)]
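
The step of moving required files to Active servers can be sketched as a small planning function: find the required files that no active server holds, then pick a source server for each. This is an illustrative sketch, not HDFS or GreenHadoop code; the function name, the server names, the placement, and the choice of active servers are assumptions loosely modeled on the figure.

```python
def plan_replication(placement, active, required):
    """Map each required file missing from the active servers to a source server.

    placement maps a server name to the set of file ids it stores.
    A value of None means no inactive server holds the file either.
    """
    # Files already reachable on the servers that stay active.
    available = set()
    for server in active:
        available |= placement[server]

    plan = {}
    for file_id in sorted(set(required) - available):
        holders = sorted(s for s in placement
                         if file_id in placement[s] and s not in active)
        plan[file_id] = holders[0] if holders else None
    return plan


# Hypothetical placement; s2 and s3 are assumed to be the two active servers.
placement = {
    "s1": {1, 7, 2},
    "s2": {4, 3, 5, 6},
    "s3": {4, 6},
    "s4": {2, 3, 8, 4},
    "s5": {3, 6, 7},
}
# Queue reads files 4, 5, 1, and 6: only file 1 is missing from the active servers.
first_plan = plan_replication(placement, ["s2", "s3"], {4, 5, 1, 6})

# After file 1 is copied to s3 and the queue changes to need file 8 instead of 4.
placement["s3"].add(1)
second_plan = plan_replication(placement, ["s2", "s3"], {5, 1, 6, 8})
```

Once every file in a plan has been copied, the source server no longer holds anything the queue needs and can be powered down. When the queue changes, the plan is recomputed; a system could either copy the newly missing file or reactivate a server that stores it.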

2. Data management
• A decommissioned server can be sent to Down
[Figure: Servers 1 to 5 and their files by state (Active, Decommission, Down), with required and non-required files marked and file 1 now also on Server 3; running queue: JobA (file 4), JobB (file 5), JobC (files 1, 6)]

2. Data management
• Jobs to be executed change → Required files change
[Figure: Servers 1 to 5 and their files by state (Active, Decommission, Down), with required and non-required files re-marked; running queue: JobA (file 4), JobB (file 5), JobC (files 1, 6), JobD (file 8)]

2. Data management
• Make missing data available
[Figure: Servers 1 to 5 and their files by state (Active, Decommission, Down), with Server 4 shown changing state; running queue: JobB (file 5), JobC (file 1), JobD (file 8)]

2. Data management
• GreenHadoop (computation) requires 3 servers
[Figure: Servers 1 to 5 and their files by state (Active, Decommission, Down); running queue: JobB (file 5), JobC (file 1), JobD (file 8)]

Evaluation methodology
• Cluster with 16 Xeon servers
– Hadoop and Hadoop turning off idle servers (EAHadoop)
– GreenHadoop: green energy, brown electricity cost
• Energy profile
– NJ electricity pricing (on/off peak and peak cost)
– Solar farm energy availability (14 PV panels)
– Five pairs of days (combinations of high and low days)
• Workload
– Derived from Facebook [Zaharia’09]
– Jobs with up to 37 GB, 600 tasks, and 6 hours in length
– Internal time bound of one day

Energy prediction vs actual
[Figure: predicted vs. actual solar energy (kWh, 0.0 to 2.0) per hour from 6:00 AM to 7:00 PM]
[Figure: prediction error (%, 0 to 40) vs. hours ahead (0 to 48), annotated with rain, thunderstorm, and cloud cover]

GreenHadoop for Facebook & high-high days
• 31% more green
• 39% cost savings
[Figure: power over time showing green consumed, brown consumed, brown price, green predicted, and green produced; annotations: 30 kWh, 59 kWh, $8.00; 39 kWh, 25 kWh, $6.06 (-24%)]

GreenHadoop for Facebook
[Figure: "Different pairs of days": green energy increase and cost savings (%, 0 to 40) for High-High, High-Low, Low-High, Low-Low, and Very Low days]
[Figure: "Effect of parameters in GreenHadoop": green energy increase and cost savings (%, 0 to 40) for EAHadoop, Green, Green & Brown Energy, and Green & Brown Energy & Brown Peak]

Other results
• Workload intensity (datacenter utilization)
• High-priority jobs
• Shorter time bounds
• Data availability
• Workload variations
• Consistent green energy increases and cost savings

Conclusions
• Data-processing scheduler for green datacenters
• Predicts green energy availability
• Increases the use of green energy
• Reduces brown electricity costs
• Manages data availability
• We are building Parasol
– Solar-powered μdatacenter
– Poster session