Project Matsu: Elastic Clouds for Disaster Relief
Project Matsu: Large Scale On-Demand Image Processing for Disaster Relief
Collin Bennett, Robert Grossman, Yunhong Gu, and Andrew Levine
Open Cloud Consortium
June 21, 2010
www.opencloudconsortium.org
Project Matsu Goals
• Provide persistent data resources and elastic computing to assist in disasters:
– Make imagery available for disaster relief workers
– Elastic computing for large scale image processing
– Change detection for temporally different and geospatially identical image sets
• Provide a resource for standards and interoperability studies of large data clouds
Part 1: Open Cloud Consortium
• 501(c)(3) not-for-profit corporation
• Supports the development of standards, interoperability frameworks, and reference implementations.
• Manages testbeds: Open Cloud Testbed and Intercloud Testbed.
• Manages cloud computing infrastructure to support scientific research: Open Science Data Cloud.
• Develops benchmarks.
OCC Members
• Companies: Aerospace, Booz Allen Hamilton, Cisco, InfoBlox, Open Data Group, Raytheon, Yahoo
• Universities: CalIT2, Johns Hopkins, Northwestern Univ., University of Illinois at Chicago, University of Chicago
• Government agencies: NASA
• Open Source Projects: Sector Project
Operates Clouds
• 500 nodes
• 3000 cores
• 1.5+ PB
• Four data centers
• 10 Gbps
• Target to refresh 1/3 each year.
• Open Cloud Testbed
• Open Science Data Cloud
• Intercloud Testbed
• Project Matsu: Cloud-based Disaster Relief Services
Open Science Data Cloud
• Astronomical data
• Biological data (Bionimbus)
• Networking data
• Image processing for disaster relief
Focus of OCC Large Data Cloud Working Group
Cloud Storage Services
Cloud Compute Services (MapReduce, UDF, & other programming frameworks)
Table-based Data Services
Relational-like Data Services
• Developing APIs for this framework.
Tools and Standards
• Apache Hadoop/MapReduce
• Sector/Sphere large data cloud
• Open Geospatial Consortium
– Web Map Service (WMS)
• OCC tools are open source (matsu-project)– http://code.google.com/p/matsu-project/
Part 2: Technical Approach
• Hadoop – Lead: Andrew Levine
• Hadoop with Python Streams – Lead: Collin Bennett
• Sector/Sphere – Lead: Yunhong Gu
Implementation 1: Hadoop & MapReduce
Andrew Levine
Image Processing in the Cloud - Mapper
Step 1: Input to Mapper
Mapper Input Key: Bounding Box
Mapper Input Value: image + Timestamp
Step 2: Processing in Mapper
The mapper resizes and/or cuts up the original image into pieces, producing one tile per bounding box, e.g. (minx = -135.0, miny = 45.0, maxx = -112.5, maxy = 67.5)
Step 3: Mapper Output
Mapper Output Key: Bounding Box
Mapper Output Value: image tile + Timestamp (one record per tile)
Image Processing in the Cloud - Reducer
Step 1: Input to Reducer
Reducer Input Key: Bounding Box, e.g. (minx = -45.0, miny = -2.8125, maxx = -43.59375, maxy = -2.109375)
Reducer Input Value: the set of image tiles for that bounding box, each with a timestamp
Step 2: Process difference in Reducer
Assemble the images based on timestamps and compare; the result is a delta of the two images.
Step 3: Reducer Output
All images go to different map layers (one set of images per layer) for display in WMS: the Timestamp 1 set, the Timestamp 2 set, and the Delta set.
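The reducer's change-detection step can be sketched as follows. This is a minimal illustration, not the project's actual code: the tile representation (a flat list of grayscale pixel values) and the per-pixel absolute difference are assumptions, since the deck does not specify the comparison used.

```python
# Hypothetical sketch of the reducer step: for one bounding box, assemble
# tiles by timestamp and emit a per-pixel delta of the two most recent tiles.
# Tiles are represented here as flat lists of grayscale values (assumption).

def reduce_tiles(bounding_box, tiles):
    """tiles: list of (timestamp, pixels) pairs sharing one bounding box."""
    ordered = sorted(tiles)                          # assemble by timestamp
    (t1, old), (t2, new) = ordered[-2], ordered[-1]  # two most recent tiles
    delta = [abs(a - b) for a, b in zip(new, old)]   # per-pixel difference
    return (bounding_box, t1, t2, delta)

bbox = (-45.0, -2.8125, -43.59375, -2.109375)
tiles = [(20100112, [10, 10, 10, 10]),
         (20100113, [10, 50, 10, 90])]
print(reduce_tiles(bbox, tiles))
```

In the real job the delta tiles would be written out as a third WMS layer alongside the two timestamped layers.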
Implementation 2: Hadoop & Python Streams
Collin Bennett
Preprocessing Step
• All images (in a batch to be processed) are combined into a single file.
• Each line contains the image’s byte array transformed to pixels (raw bytes don’t seem to work well with the one-line-at-a-time Hadoop streaming paradigm).

geolocation \t timestamp | tuple size ; image width ; image height ; comma-separated list of pixels

The tuple size, image width, and image height fields are metadata needed to process the image in the reducer.
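The record layout above can be sketched with a small helper. The function name and the sample values are illustrative, not from the project code; a tuple size of 1 assumes one value per pixel (e.g. grayscale).

```python
def make_record(geolocation, timestamp, pixels, width, height):
    """Serialize one image as a single Hadoop-streaming-friendly line:
    geolocation \t timestamp | tuple_size ; width ; height ; pixel list"""
    tuple_size = 1  # one value per pixel, e.g. grayscale (assumption)
    pixel_list = ",".join(str(p) for p in pixels)
    return "%s\t%s|%d;%d;%d;%s" % (
        geolocation, timestamp, tuple_size, width, height, pixel_list)

# One 2x2 image becomes one line of the combined batch file:
line = make_record("-45.0,-2.8125", "20100112T1200Z", [10, 20, 30, 40], 2, 2)
print(line)
```

Everything before the tab is the key; everything after it is the value that the reducer later unpacks.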
Map and Shuffle
• We can use the identity mapper; all of the work for mapping was done in the pre-process step.
• The Map/Shuffle key is the geolocation.
• In the reducer, the timestamp will be the 1st field of each record when splitting on ‘|’.
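A hedged sketch of how a streaming reducer might unpack these records. The function names are illustrative; in a real Hadoop streaming job the lines would arrive on stdin already sorted by key, so the in-memory grouping here only stands in for that behavior.

```python
def parse_value(value):
    """Split a record value on '|': timestamp first, then image metadata."""
    timestamp, payload = value.split("|", 1)
    tuple_size, width, height, pixel_csv = payload.split(";", 3)
    pixels = [int(p) for p in pixel_csv.split(",")]
    return timestamp, int(width), int(height), pixels

def streaming_reducer(lines):
    """Group tab-separated (geolocation, value) lines by geolocation key.
    Hadoop streaming delivers lines sorted by key; a dict emulates that."""
    groups = {}
    for line in lines:
        key, value = line.rstrip("\n").split("\t", 1)
        groups.setdefault(key, []).append(parse_value(value))
    return groups

# Two records for the same geolocation at different timestamps:
records = ["-45.0,-2.8125\t20100112|1;2;2;10,20,30,40",
           "-45.0,-2.8125\t20100113|1;2;2;10,50,30,90"]
for key, images in streaming_reducer(records).items():
    print(key, [ts for ts, _, _, _ in images])
```

Once grouped, the per-key image sets feed the same change-detection comparison described for the Hadoop implementation.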
Implementation 3: Sector/Sphere
Yunhong Gu
Sector Distributed File System
• Sector aggregates hard disk storage across commodity computers
– With a single namespace, file-system-level reliability (using replication), and high availability
• Sector does not split files
– A single image will not be split; therefore, when it is being processed, the application does not need to read data from other nodes over the network
– As an option, a directory can also be kept together on a single node
Sphere UDF
• Sphere allows a User Defined Function (UDF) to be applied to each file (whether it contains a single image or multiple images)
• Existing applications can be wrapped in a Sphere UDF
• In many situations, the Sphere streaming utility accepts a data directory and an application binary as inputs:
./stream -i haiti -c ossim_foo -o results
For More Information