CSci 5707, Fall 2013
MapReduce vs. Parallel DBMS
Hamid Safizadeh, Otelia Buffington
University of Minnesota
MapReduce Idea
Mapping
map (k1, v1) → list (k2, v2)
Reducing
reduce (k2, list(v2)) → list (v2)
Pseudo-code for counting the number of occurrences of each word in a large collection of documents
Jeffrey Dean and Sanjay Ghemawat, MapReduce: Simplified Data Processing on Large Clusters, OSDI '04
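The slide's pseudo-code did not survive as text, so the following is a minimal Python sketch of the same word-count idea, not the authors' original listing. The names map_fn, reduce_fn, and run_mapreduce are hypothetical, and the shuffle step that a real framework performs across machines is simulated here with an in-memory dictionary.

```python
from collections import defaultdict


def map_fn(doc_name, text):
    """map(k1, v1) -> list(k2, v2): emit (word, 1) for every word."""
    return [(word.lower(), 1) for word in text.split()]


def reduce_fn(word, counts):
    """reduce(k2, list(v2)) -> list(v2): sum the counts for one word."""
    return [sum(counts)]


def run_mapreduce(documents):
    # Shuffle step: group intermediate values by key. A real framework
    # does this across machines; a dict stands in for it here.
    grouped = defaultdict(list)
    for name, text in documents.items():
        for key, value in map_fn(name, text):
            grouped[key].append(value)
    return {key: reduce_fn(key, values)[0] for key, values in grouped.items()}


# Two tiny made-up documents, purely for illustration.
docs = {"d1": "the quick brown fox", "d2": "the lazy dog the end"}
word_counts = run_mapreduce(docs)
print(word_counts)
```

The user writes only map_fn and reduce_fn; everything inside run_mapreduce is what the framework would normally provide.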
MapReduce Example
Calculation of the number of occurrences of each word
http://aimotion.blogspot.com/2010/08/mapreduce-with-mongodb-and-python.html
MapReduce Architecture
Execution overview
Jeffrey Dean and Sanjay Ghemawat, MapReduce: Simplified Data Processing on Large Clusters, OSDI '04
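The execution-overview figure is not reproduced in this transcript. As a rough single-process sketch of the flow described in the cited paper (input divided into M splits, map output partitioned into R regions by a hash of the key, each reduce task sorting and grouping its region), one might write the following. The values of M and R, the helper names, and the use of zlib.crc32 as the partitioning hash are all assumptions made for illustration.

```python
import zlib
from collections import defaultdict

M = 3  # number of input splits / map tasks (assumed value)
R = 2  # number of reduce tasks / output partitions (assumed value)


def partition(key, r=R):
    # Deterministic hash(key) mod R. crc32 is used because Python's
    # built-in str hash is randomized per process.
    return zlib.crc32(key.encode("utf-8")) % r


def map_task(split):
    """Run a word-count map over one split, bucketing output by partition."""
    buckets = [[] for _ in range(R)]
    for line in split:
        for word in line.split():
            buckets[partition(word)].append((word, 1))
    return buckets


def reduce_task(pairs):
    """Sort by key, group, and sum the values for one partition."""
    grouped = defaultdict(int)
    for word, count in sorted(pairs):
        grouped[word] += count
    return dict(grouped)


lines = ["a b a", "c a", "b b", "c", "a c b"]
splits = [lines[i::M] for i in range(M)]        # 1. split the input
map_outputs = [map_task(s) for s in splits]     # 2. map phase
outputs = []
for r in range(R):                              # 3. shuffle and reduce
    pairs = [p for out in map_outputs for p in out[r]]
    outputs.append(reduce_task(pairs))

# Each key lands in exactly one partition, so merging cannot collide.
merged = {k: v for part in outputs for k, v in part.items()}
print(merged)
```

In a real cluster the map and reduce tasks run on different workers and the buckets are files on local disk; here they are plain lists so the data flow stays visible.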
MapReduce or Parallel DBMS
Pavlo, A., Paulson, E., Rasin, A., Abadi, D.J., DeWitt, D.J., Madden, S., and Stonebraker, M., “A comparison of approaches to large-scale data analysis”, ACM SIGMOD International Conference, 2009 (http://database.cs.brown.edu/projects/mapreduce-vs-dbms)
Dean, J., and Ghemawat, S., “MapReduce: A flexible data processing tool”, Communications of the ACM, Vol. 53, 2010 (DOI: 10.1145/1629175.1629198)
MapReduce Design Properties
Heterogeneous Systems
Processing and combining data from a wide variety of storage systems (such as relational databases, file systems, etc.)
Fault Tolerance
Providing fine-grained fault tolerance for large jobs (a failure in the middle of a multi-hour execution does not require restarting the job from scratch)
Complex Functions
Simple Map and Reduce functions have straightforward SQL equivalents
Offering a better framework for some complicated tasks
Jeffrey Dean and Sanjay Ghemawat, MapReduce: A Flexible Data Processing Tool, Communications of the ACM, Vol. 53, 2010
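To illustrate the point that simple Map and Reduce functions have straightforward SQL equivalents, here is a small sketch of a grouped sum written both ways. The table visits(source_ip, ad_revenue), its rows, and the function names are hypothetical and chosen only for this example.

```python
from collections import defaultdict

# Hypothetical rows of a table visits(source_ip, ad_revenue).
visits = [
    ("10.0.0.1", 1.5),
    ("10.0.0.2", 0.5),
    ("10.0.0.1", 2.0),
]

# SQL equivalent, for comparison:
#   SELECT source_ip, SUM(ad_revenue) FROM visits GROUP BY source_ip;


def map_fn(row):
    source_ip, ad_revenue = row
    # The emitted key plays the role of the GROUP BY column.
    return [(source_ip, ad_revenue)]


def reduce_fn(source_ip, revenues):
    # The reduce body plays the role of the aggregate, here SUM(...).
    return sum(revenues)


# Stand-in for the framework's shuffle: group values by key.
groups = defaultdict(list)
for row in visits:
    for key, value in map_fn(row):
        groups[key].append(value)

revenue_by_ip = {ip: reduce_fn(ip, vals) for ip, vals in groups.items()}
print(revenue_by_ip)
```

For a query this simple the SQL is shorter; the slide's second bullet concerns tasks where the reduce body is arbitrary code that is awkward to express as a SQL aggregate.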
MapReduce Design Properties
Performance
Loading data: Startup overhead for MapReduce
Reading data: Full scan over large data files
Merging results: A MapReduce as the next consumer
Cost
Hardware: Network workstations
Software: Open source (Hadoop)
Communication: Network system
Jeffrey Dean and Sanjay Ghemawat, MapReduce: A Flexible Data Processing Tool, Communications of the ACM, Vol. 53, 2010
Companies Using Hadoop
Facebook, Yahoo!, Google, Amazon, Twitter