Map-Reduce Design Patterns - GitHub Pages
![Page 1: Map-Reduce Design Patterns - GitHub Pages](https://reader031.fdocuments.net/reader031/viewer/2022012101/6169e9f711a7b741a34cc35f/html5/thumbnails/1.jpg)
Venkatesh Vinayakarao (Vv)
Map-Reduce Design Patterns
http://vvtesh.co.in
Chennai Mathematical Institute
https://vvtesh.sarahah.com/
Finding patterns is the essence of wisdom. – Dennis Prager
![Page 2: Map-Reduce Design Patterns - GitHub Pages](https://reader031.fdocuments.net/reader031/viewer/2022012101/6169e9f711a7b741a34cc35f/html5/thumbnails/2.jpg)
I have a story for you!
Patterns here, Patterns there,
Patterns, Patterns everywhere…
![Page 3: Map-Reduce Design Patterns - GitHub Pages](https://reader031.fdocuments.net/reader031/viewer/2022012101/6169e9f711a7b741a34cc35f/html5/thumbnails/3.jpg)
Are beauty and quality objective?
• Can we agree some things are beautiful and some are not?
![Page 4: Map-Reduce Design Patterns - GitHub Pages](https://reader031.fdocuments.net/reader031/viewer/2022012101/6169e9f711a7b741a34cc35f/html5/thumbnails/4.jpg)
Christopher Alexander Asked… What is a good design?
Are beauty and quality objective?
What makes a good architectural design?
![Page 5: Map-Reduce Design Patterns - GitHub Pages](https://reader031.fdocuments.net/reader031/viewer/2022012101/6169e9f711a7b741a34cc35f/html5/thumbnails/5.jpg)
Good Design
Beauty can be objectively measured.
Example: Symmetry is good
Cultural Anthropology: Within a culture, individuals agree what is good design, what is beautiful.
![Page 6: Map-Reduce Design Patterns - GitHub Pages](https://reader031.fdocuments.net/reader031/viewer/2022012101/6169e9f711a7b741a34cc35f/html5/thumbnails/6.jpg)
Patterns
• Good design structures had similarities between them.
• Alexander called these similarities patterns.
• "Each pattern describes a problem which occurs over and over again in our environment, and then describes the core of the solution to that problem, in such a way that you can use this solution a million times over..."
Christopher Alexander, A Pattern Language: Towns/Buildings/Construction, 1977
A pattern is a solution to a problem in a context.
![Page 7: Map-Reduce Design Patterns - GitHub Pages](https://reader031.fdocuments.net/reader031/viewer/2022012101/6169e9f711a7b741a34cc35f/html5/thumbnails/7.jpg)
A question
• Can you tell me one design that is absolutely symmetrical?
Equivalent ideas exist in software design
![Page 8: Map-Reduce Design Patterns - GitHub Pages](https://reader031.fdocuments.net/reader031/viewer/2022012101/6169e9f711a7b741a34cc35f/html5/thumbnails/8.jpg)
Good Design
• What according to you are the two biggest factors that determine a good/bad design?
![Page 9: Map-Reduce Design Patterns - GitHub Pages](https://reader031.fdocuments.net/reader031/viewer/2022012101/6169e9f711a7b741a34cc35f/html5/thumbnails/9.jpg)
Good and Bad Design
• What are the commonalities in what is viewed as good (and what is viewed as bad)?
  • A software system that is easy to maintain is considered good
  • A fragile software system is considered bad
  • A software system that is easy to understand is considered good
  • Obfuscated “spaghetti code” is considered bad
![Page 10: Map-Reduce Design Patterns - GitHub Pages](https://reader031.fdocuments.net/reader031/viewer/2022012101/6169e9f711a7b741a34cc35f/html5/thumbnails/10.jpg)
Quiz
• Have you come across software patterns? Can you give one example?
![Page 11: Map-Reduce Design Patterns - GitHub Pages](https://reader031.fdocuments.net/reader031/viewer/2022012101/6169e9f711a7b741a34cc35f/html5/thumbnails/11.jpg)
Singleton Pattern
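public class CMI {
    // Eagerly created single instance, shared by all callers.
    private static final CMI instance = new CMI();

    // Private constructor prevents instantiation from outside the class.
    private CMI() {
    }

    public static CMI getInstance() {
        return instance;
    }
}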
![Page 12: Map-Reduce Design Patterns - GitHub Pages](https://reader031.fdocuments.net/reader031/viewer/2022012101/6169e9f711a7b741a34cc35f/html5/thumbnails/12.jpg)
Factory Pattern
Source: https://dzone.com/articles/creational-design-pattern-series-factory-method-pa
![Page 13: Map-Reduce Design Patterns - GitHub Pages](https://reader031.fdocuments.net/reader031/viewer/2022012101/6169e9f711a7b741a34cc35f/html5/thumbnails/13.jpg)
For More on Design Patterns
We shall now look at some Map-Reduce design patterns.
![Page 14: Map-Reduce Design Patterns - GitHub Pages](https://reader031.fdocuments.net/reader031/viewer/2022012101/6169e9f711a7b741a34cc35f/html5/thumbnails/14.jpg)
Recap
[Figure: Map, Shuffle and Sort, Reduce]
Map-Reduce Model
Hadoop Architecture
![Page 15: Map-Reduce Design Patterns - GitHub Pages](https://reader031.fdocuments.net/reader031/viewer/2022012101/6169e9f711a7b741a34cc35f/html5/thumbnails/15.jpg)
Map-Reduce Processing
Mapper output:
< Hello, 1>
< World, 1>
< Bye, 1>
< World, 1>
< Hello, 1>
< Hadoop, 1>
< Goodbye, 1>
< Hadoop, 1>
After Shuffle and Sort:
< Bye, 1>
< Hadoop, 1>
< Hadoop, 1>
< Hello, 1>
< Hello, 1>
< Goodbye, 1>
< World, 1>
< World, 1>
Reduce output:
< Bye, 1>
< Hadoop, 2>
< Hello, 2>
< Goodbye, 1>
< World, 2>
Combiner can be used to summarize locally per mapper, before the Partitioner assigns keys to reducers.
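The word-count flow on this slide can be sketched in plain Java, without any Hadoop classes. This is only an in-memory illustration: the class name `WordCountSketch`, its method names, and the two input lines are assumptions chosen to reproduce the key-value pairs shown above.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory sketch of word count; no Hadoop classes are used.
final class WordCountSketch {

    // Map phase: emit a <word, 1> pair for every token in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // Shuffle and sort: group all values by key, with keys kept in sorted order.
    static TreeMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the values collected for one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs the three phases in sequence over all input lines.
    static TreeMap<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        TreeMap<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(pairs).entrySet()) {
            counts.put(e.getKey(), reduce(e.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // Assumed input: one line per mapper.
        System.out.println(run(Arrays.asList(
                "Hello World Bye World",
                "Hello Hadoop Goodbye Hadoop")));
    }
}
```

Because `reduce` is a plain sum, the same function could also serve as the combiner, which is why word count benefits from local summarization per mapper.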
![Page 16: Map-Reduce Design Patterns - GitHub Pages](https://reader031.fdocuments.net/reader031/viewer/2022012101/6169e9f711a7b741a34cc35f/html5/thumbnails/16.jpg)
Input Splits
Note that a remote read may be required at block boundaries.
![Page 17: Map-Reduce Design Patterns - GitHub Pages](https://reader031.fdocuments.net/reader031/viewer/2022012101/6169e9f711a7b741a34cc35f/html5/thumbnails/17.jpg)
A Hadoop Map-Reduce Developer
• Writes the “map” code
• Writes the “reduce” code
• Submits the map and reduce code to the Hadoop framework.
The job is submitted to the Job Tracker, which asks the Namenode which datanodes have the data.
![Page 18: Map-Reduce Design Patterns - GitHub Pages](https://reader031.fdocuments.net/reader031/viewer/2022012101/6169e9f711a7b741a34cc35f/html5/thumbnails/18.jpg)
Submitting a Map-Reduce Job
hadoop jar /usr/joe/wordcount.jar org.myorg.WordCount /usr/joe/wordcount/input /usr/joe/wordcount/output
![Page 19: Map-Reduce Design Patterns - GitHub Pages](https://reader031.fdocuments.net/reader031/viewer/2022012101/6169e9f711a7b741a34cc35f/html5/thumbnails/19.jpg)
Mapper
See https://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html for details.
![Page 20: Map-Reduce Design Patterns - GitHub Pages](https://reader031.fdocuments.net/reader031/viewer/2022012101/6169e9f711a7b741a34cc35f/html5/thumbnails/20.jpg)
Reducer
![Page 21: Map-Reduce Design Patterns - GitHub Pages](https://reader031.fdocuments.net/reader031/viewer/2022012101/6169e9f711a7b741a34cc35f/html5/thumbnails/21.jpg)
Create a Job
![Page 22: Map-Reduce Design Patterns - GitHub Pages](https://reader031.fdocuments.net/reader031/viewer/2022012101/6169e9f711a7b741a34cc35f/html5/thumbnails/22.jpg)
Submit Job to Hadoop
![Page 23: Map-Reduce Design Patterns - GitHub Pages](https://reader031.fdocuments.net/reader031/viewer/2022012101/6169e9f711a7b741a34cc35f/html5/thumbnails/23.jpg)
Output
![Page 24: Map-Reduce Design Patterns - GitHub Pages](https://reader031.fdocuments.net/reader031/viewer/2022012101/6169e9f711a7b741a34cc35f/html5/thumbnails/24.jpg)
Readings
![Page 25: Map-Reduce Design Patterns - GitHub Pages](https://reader031.fdocuments.net/reader031/viewer/2022012101/6169e9f711a7b741a34cc35f/html5/thumbnails/25.jpg)
How Will You Implement These With Map Reduce?
• Min/Max
• Average
• Count
• Median
• Filtering
• Top 10
• Convert key-values to hierarchy
• Partitioning
• Sorting
![Page 26: Map-Reduce Design Patterns - GitHub Pages](https://reader031.fdocuments.net/reader031/viewer/2022012101/6169e9f711a7b741a34cc35f/html5/thumbnails/26.jpg)
Summarization Pattern
A partitioner controls how the keys of the intermediate map output are grouped, that is, which reducer receives each key.
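As a rough illustration of what a partitioner does, the sketch below assigns a key to one of `numReducers` partitions by hashing. The class and method names are hypothetical; the arithmetic follows what Hadoop's default hash-based partitioner is generally documented to do, but treat it as a sketch rather than the framework's code.

```java
// Hash-based partitioning sketch; plain Java, no Hadoop classes.
final class HashPartitionSketch {

    // Masks off the sign bit so the result is never negative,
    // then maps the hash onto the range [0, numReducers).
    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        for (String key : new String[] {"Hello", "World", "Hadoop"}) {
            System.out.println(key + " -> reducer " + partitionFor(key, 3));
        }
    }
}
```

Equal keys always land in the same partition, which is what allows a single reducer to see every value for a given key.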
![Page 27: Map-Reduce Design Patterns - GitHub Pages](https://reader031.fdocuments.net/reader031/viewer/2022012101/6169e9f711a7b741a34cc35f/html5/thumbnails/27.jpg)
Min/Max/Count
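One possible shape for this summarization is sketched below in plain Java. The mapper turns each record into a small (min, max, count) tuple, and the reducer merges tuples per key. The names `MinMaxCountSketch`, `Stats`, and `accept` are illustrative assumptions, not taken from the slides.

```java
import java.util.Map;
import java.util.TreeMap;

// In-memory sketch of the min/max/count summarization; no Hadoop classes.
final class MinMaxCountSketch {

    // Value type carried through map, combine and reduce.
    static final class Stats {
        final long min;
        final long max;
        final long count;

        Stats(long min, long max, long count) {
            this.min = min;
            this.max = max;
            this.count = count;
        }

        // Associative and commutative, so it can act as combiner and reducer alike.
        Stats merge(Stats other) {
            return new Stats(Math.min(min, other.min),
                             Math.max(max, other.max),
                             count + other.count);
        }
    }

    // Map: a single value is its own min and max, with a count of one.
    static Stats map(long value) {
        return new Stats(value, value, 1);
    }

    // Shuffle plus reduce: fold one mapped record into the running summary of its key.
    static void accept(Map<String, Stats> byKey, String key, long value) {
        byKey.merge(key, map(value), Stats::merge);
    }

    public static void main(String[] args) {
        Map<String, Stats> byKey = new TreeMap<>();
        accept(byKey, "user-a", 3);
        accept(byKey, "user-a", 9);
        accept(byKey, "user-b", 7);
        byKey.forEach((k, s) ->
                System.out.println(k + " min=" + s.min + " max=" + s.max + " count=" + s.count));
    }
}
```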
![Page 28: Map-Reduce Design Patterns - GitHub Pages](https://reader031.fdocuments.net/reader031/viewer/2022012101/6169e9f711a7b741a34cc35f/html5/thumbnails/28.jpg)
Average
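An average cannot simply be averaged again in a combiner, because the mean of partial means is generally not the overall mean. A common workaround, sketched below in plain Java under assumed names (`AverageSketch`, `SumCount`), is to carry a (sum, count) pair through the pipeline and divide only at the end.

```java
// In-memory sketch of a combiner-friendly average; no Hadoop classes.
final class AverageSketch {

    // Partial result that is safe to merge in any order.
    static final class SumCount {
        final double sum;
        final long count;

        SumCount(double sum, long count) {
            this.sum = sum;
            this.count = count;
        }

        // Combiner and reducer step: add sums and counts separately.
        SumCount merge(SumCount other) {
            return new SumCount(sum + other.sum, count + other.count);
        }

        // Final step, done once in the reducer.
        double average() {
            return sum / count;
        }
    }

    // What one mapper (plus its combiner) would emit for its local values.
    static SumCount partial(double... values) {
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        return new SumCount(sum, values.length);
    }

    public static void main(String[] args) {
        SumCount mapper1 = partial(1, 2, 3); // local mean is 2
        SumCount mapper2 = partial(10);      // local mean is 10
        // Merging (sum, count) pairs gives 16 / 4, not the mean of the two local means.
        System.out.println(mapper1.merge(mapper2).average());
    }
}
```

In the example, averaging the two local means would give 6, while the merged pair gives the correct 4.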
![Page 29: Map-Reduce Design Patterns - GitHub Pages](https://reader031.fdocuments.net/reader031/viewer/2022012101/6169e9f711a7b741a34cc35f/html5/thumbnails/29.jpg)
Flex Your Brain!
• How will you compute the median?
Refer to Chapter 2 of MR Design Patterns Book.
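As one possible answer, the sketch below avoids holding every value in memory by having mappers and combiners emit value-to-count histograms, which the reducer merges and then walks in sorted order. This is an illustrative approach under assumed names (`MedianSketch`, `histogram`, `merge`, `median`) and should not be read as the book's exact code.

```java
import java.util.Map;
import java.util.TreeMap;

// In-memory sketch of a histogram-based median; no Hadoop classes.
final class MedianSketch {

    // Mapper or combiner output: how often each value occurred locally.
    static TreeMap<Integer, Long> histogram(int... values) {
        TreeMap<Integer, Long> h = new TreeMap<>();
        for (int v : values) {
            h.merge(v, 1L, Long::sum);
        }
        return h;
    }

    // Combiner or reducer step: add up the counts of two histograms.
    static TreeMap<Integer, Long> merge(TreeMap<Integer, Long> a, TreeMap<Integer, Long> b) {
        TreeMap<Integer, Long> out = new TreeMap<>(a);
        b.forEach((value, count) -> out.merge(value, count, Long::sum));
        return out;
    }

    // Reducer: walk the values in sorted order until the middle position(s) are reached.
    static double median(TreeMap<Integer, Long> h) {
        long total = 0;
        for (long c : h.values()) {
            total += c;
        }
        if (total == 0) {
            throw new IllegalArgumentException("empty histogram");
        }
        long lowerIdx = (total - 1) / 2; // 0-based index of the lower middle element
        long upperIdx = total / 2;       // 0-based index of the upper middle element
        Integer lower = null;
        Integer upper = null;
        long seen = 0;
        for (Map.Entry<Integer, Long> e : h.entrySet()) {
            long next = seen + e.getValue();
            if (lower == null && lowerIdx < next) {
                lower = e.getKey();
            }
            if (upperIdx < next) {
                upper = e.getKey();
                break;
            }
            seen = next;
        }
        return (lower + upper) / 2.0;
    }

    public static void main(String[] args) {
        TreeMap<Integer, Long> merged = merge(histogram(1, 9), histogram(2, 2));
        System.out.println(median(merged));
    }
}
```

The histogram stays small whenever the number of distinct values is much smaller than the number of records, which is the case this approach targets.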
![Page 30: Map-Reduce Design Patterns - GitHub Pages](https://reader031.fdocuments.net/reader031/viewer/2022012101/6169e9f711a7b741a34cc35f/html5/thumbnails/30.jpg)
Counting Mappers
Global counters are maintained by the Job Tracker, so use them sparingly.
![Page 31: Map-Reduce Design Patterns - GitHub Pages](https://reader031.fdocuments.net/reader031/viewer/2022012101/6169e9f711a7b741a34cc35f/html5/thumbnails/31.jpg)
Filtering
No reducer required.
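Filtering needs no reducer because each record can be judged on its own. The plain-Java sketch below models a map-only job: every input split is scanned independently and matching records are written straight to the output. The names `FilterSketch` and `mapOnlyFilter`, and the sample log lines, are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// In-memory sketch of a map-only filter; no Hadoop classes.
final class FilterSketch {

    // Each inner list stands for one input split handled by one mapper.
    // A record is emitted unchanged if it passes the predicate, otherwise dropped.
    static List<String> mapOnlyFilter(List<List<String>> splits, Predicate<String> keep) {
        List<String> output = new ArrayList<>();
        for (List<String> split : splits) {
            for (String record : split) {
                if (keep.test(record)) {
                    output.add(record);
                }
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<List<String>> splits = Arrays.asList(
                Arrays.asList("INFO start", "ERROR disk full"),
                Arrays.asList("ERROR timeout", "INFO done"));
        System.out.println(mapOnlyFilter(splits, r -> r.startsWith("ERROR")));
    }
}
```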
![Page 32: Map-Reduce Design Patterns - GitHub Pages](https://reader031.fdocuments.net/reader031/viewer/2022012101/6169e9f711a7b741a34cc35f/html5/thumbnails/32.jpg)
Filter Example
![Page 33: Map-Reduce Design Patterns - GitHub Pages](https://reader031.fdocuments.net/reader031/viewer/2022012101/6169e9f711a7b741a34cc35f/html5/thumbnails/33.jpg)
Top 10 Pattern
• How will you determine the top 10 numbers in petabytes of numbers?
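A common way to approach this, sketched below in plain Java, is to let every mapper keep only its local top K in a small min-heap and send just those K candidates to a single reducer, which repeats the same step. The names `TopKSketch`, `localTopK`, and `topK` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// In-memory sketch of the top-K pattern; no Hadoop classes.
final class TopKSketch {

    // Mapper side (and reducer side): keep at most k values in a min-heap,
    // evicting the smallest whenever the heap grows past k.
    static List<Long> localTopK(Iterable<Long> values, int k) {
        PriorityQueue<Long> heap = new PriorityQueue<>();
        for (long v : values) {
            heap.add(v);
            if (heap.size() > k) {
                heap.poll();
            }
        }
        List<Long> out = new ArrayList<>(heap);
        out.sort(Comparator.reverseOrder());
        return out;
    }

    // Each split yields at most k candidates; one reducer picks the global top k from them.
    static List<Long> topK(List<List<Long>> splits, int k) {
        List<Long> candidates = new ArrayList<>();
        for (List<Long> split : splits) {
            candidates.addAll(localTopK(split, k));
        }
        return localTopK(candidates, k);
    }

    public static void main(String[] args) {
        List<List<Long>> splits = Arrays.asList(
                Arrays.asList(5L, 1L, 9L),
                Arrays.asList(7L, 3L, 8L));
        System.out.println(topK(splits, 2));
    }
}
```

The reducer only ever sees K values per mapper, so the data shuffled across the network stays tiny regardless of the input size.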
![Page 34: Map-Reduce Design Patterns - GitHub Pages](https://reader031.fdocuments.net/reader031/viewer/2022012101/6169e9f711a7b741a34cc35f/html5/thumbnails/34.jpg)
Structure to Hierarchy
How to store this in RDBMS?