Advanced Software Engineering PROJECT. 1. MapReduce Join (2 students) Focused on performance...


Page 1: Advanced Software Engineering PROJECT. 1. MapReduce Join (2 students)  Focused on performance analysis on different implementation of join processors.

Advanced Software Engineering

PROJECT

Page 2:

1. MapReduce Join (2 students)

- Focus on performance analysis of different implementations of join processors in MapReduce.
- Homogenization: add information about the source of the data in the map phase, then do the join in the reduce phase.
- Map-Reduce-Merge: a new primitive called merge is added to process the join separately.
- Other implementation: the map-reduce execution plan for joins generated by Hive.
- Generate 10+ figures/tables for comparisons.
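The homogenization approach can be illustrated with a minimal single-process sketch. It assumes two hypothetical tables, `users` and `orders`, and simulates the shuffle with a dictionary; it is not tied to Hadoop's API and only shows where the source tag is added and where the join happens.

```python
from collections import defaultdict

def map_phase(records, source):
    """Map: tag every (key, value) record with the table it came from."""
    for key, value in records:
        yield key, (source, value)

def shuffle(tagged_pairs):
    """Group tagged values by join key, standing in for the framework's shuffle."""
    groups = defaultdict(list)
    for key, tagged in tagged_pairs:
        groups[key].append(tagged)
    return groups

def reduce_phase(key, tagged_values, left="users", right="orders"):
    """Reduce: split values by source tag and emit the per-key cross product."""
    left_rows = [v for s, v in tagged_values if s == left]
    right_rows = [v for s, v in tagged_values if s == right]
    return [(key, l, r) for l in left_rows for r in right_rows]

def reduce_side_join(users, orders):
    tagged = list(map_phase(users, "users")) + list(map_phase(orders, "orders"))
    result = []
    for key, values in sorted(shuffle(tagged).items()):
        result.extend(reduce_phase(key, values))
    return result

# Hypothetical input tables, made up for illustration.
users = [(1, "alice"), (2, "bob"), (3, "carol")]
orders = [(1, "book"), (1, "pen"), (3, "lamp"), (4, "ink")]
joined = reduce_side_join(users, orders)
print(joined)
```

Keys that appear in only one table (2 and 4 here) produce no output, so this sketch behaves as an inner join. All tagged records cross the shuffle, which is one of the costs a performance comparison could measure.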

Page 3:

2. Social Network Structure Analysis (3-4 students)

- Learn existing classification and clustering algorithms.
- Use both Google+ and Twitter social circle data:
  http://snap.stanford.edu/data/egonets-Gplus.html
  http://snap.stanford.edu/data/egonets-Twitter.html
- Build a distributed computing platform on M/R or Spark.
- Make use of Mahout/MLlib tools for data analysis, to discover the unique characteristics of each social network.
- Generate 10+ figures/tables for comparisons.
- Bonus: compare M/R and Spark.
- Never use off-the-shelf software!!!

Page 4:

3. Distributed Learning-to-Rank Systems (3-4 students)

- Learn existing pointwise, pairwise, and listwise learning-to-rank algorithms.
- Use Microsoft Learning to Rank Datasets: http://research.microsoft.com/en-us/projects/mslr/
- Build a distributed computing platform on either M/R, Storm, or Spark.
- Implement at least 3 algorithms.
- Generate 10+ figures/tables for comparisons.
- Bonus: compare M/R and Spark.
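The difference between the three families of learning-to-rank algorithms lies mainly in the loss function. The sketch below is only an illustration for a single query. The concrete choices (squared error, hinge loss with a margin of 1, and a ListNet-style top-one cross-entropy) are assumptions picked for brevity, not the algorithms the project prescribes.

```python
import math

def pointwise_loss(scores, labels):
    """Pointwise: each document is scored on its own (mean squared error)."""
    return sum((s - y) ** 2 for s, y in zip(scores, labels)) / len(scores)

def pairwise_loss(scores, labels, margin=1.0):
    """Pairwise: penalise pairs where the more relevant document does not
    outscore the less relevant one by at least `margin` (hinge loss)."""
    total, pairs = 0.0, 0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if labels[i] > labels[j]:
                total += max(0.0, margin - (scores[i] - scores[j]))
                pairs += 1
    return total / pairs if pairs else 0.0

def softmax(xs):
    m = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def listwise_loss(scores, labels):
    """Listwise: compare the whole ranked list at once, here as cross-entropy
    between top-one probability distributions (ListNet-style)."""
    p_true = softmax(labels)
    p_pred = softmax(scores)
    return -sum(t * math.log(p) for t, p in zip(p_true, p_pred))

# Hypothetical relevance labels and model scores for one query.
labels = [2, 1, 0]
good_scores = [2.0, 1.0, 0.0]  # ranks documents in the correct order
bad_scores = [0.0, 1.0, 2.0]   # ranks documents in reverse order

print(pointwise_loss(good_scores, labels), pointwise_loss(bad_scores, labels))
print(pairwise_loss(good_scores, labels), pairwise_loss(bad_scores, labels))
print(listwise_loss(good_scores, labels), listwise_loss(bad_scores, labels))
```

In all three cases the correctly ordered scores give a lower loss than the reversed ones. What changes is the unit of comparison: a single document, a pair of documents, or the full list.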

Page 5:

Mechanism

- Working in groups: 2, or 3-4 students, with clear roles.
- Email me ([email protected]) by this Friday (Dec 19):
  - Team leader, team members
  - Topic
- Deadline: 16 Jan 2015!
- Deliverable: project report in Chinese
  - Introduction (motivation, WHY?)
  - Your proposal (HOW?)
  - Performance Evaluation
  - Conclusion
- Presentation

Page 6:

Suggested Arrangement

- Week 1: Define your roles and start literature research.
- Weeks 2 and 3: Propose solutions.
- Weeks 4 and 5: Implement and obtain results.
- Finally, spend a few days writing your report.

Page 7:

Attention!!

- Not only an ENGINEERING project.
- Train your research thinking:
  - What have others done?
  - What are the research gaps?
  - How to improve?
- Performance?
  - Accuracy, throughput, latency, etc.
  - Compare to existing approaches.
- Make use of open-source frameworks.
- What is YOUR CONTRIBUTION?

Page 8:

IEEE Xplore: http://ieeexplore.ieee.org/

Page 9:

http://dl.acm.org

Page 10:

Social Network Analysis

Advanced Software Engineering

Page 11:

Key Players

How to identify key/central nodes in a network
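One simple way to approach this question is degree centrality: a node counts as more central the more neighbours it has. The sketch below is a minimal illustration on a made-up undirected edge list. The measure is chosen here as an example, and the node names are hypothetical.

```python
from collections import defaultdict

def degree_centrality(edges):
    """Fraction of the other nodes that each node is directly connected to."""
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}

# Hypothetical friendship edges, made up for illustration.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("d", "e")]
centrality = degree_centrality(edges)
key_player = max(centrality, key=centrality.get)
print(key_player, centrality[key_player])
```

Degree centrality only looks at direct neighbours. Measures such as betweenness or closeness also take the position of a node on paths through the network into account.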

Page 19:

Cohesion

How to characterize a network’s structure

Page 25:

Example

- Facebook: 5.8 million users (2009), average 5.73 degrees of separation, maximum 12 degrees.
- Twitter:
  - 5.2 billion relationships, average 4.67 degrees
  - 50% of users are only 4 steps away
  - Almost everyone is fewer than 5 steps away
  - For any 1,500 random users, 3.435 steps
- Erdős Number: collaborative distance through paper co-authoring.
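Figures such as the average and maximum degrees of separation can be computed with a breadth-first search from every node. The sketch below assumes a small, connected, unweighted graph given as a hypothetical adjacency dictionary. On networks of the size quoted above, one would sample source nodes or distribute the searches instead.

```python
from collections import deque

def bfs_distances(adj, source):
    """Number of hops from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def separation_stats(adj):
    """Average and maximum shortest-path length over all reachable ordered pairs."""
    total, pairs, longest = 0, 0, 0
    for source in adj:
        for target, d in bfs_distances(adj, source).items():
            if target != source:
                total += d
                pairs += 1
                longest = max(longest, d)
    if pairs == 0:
        return 0.0, 0
    return total / pairs, longest

# Hypothetical chain of acquaintances a - b - c - d, made up for illustration.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
average, maximum = separation_stats(adj)
print(average, maximum)
```

The same search started from a single fixed node gives the distance of every other node to it, which is how a collaborative distance like the Erdős number is defined.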

Page 26:

Experiment: Forwarding Letters in the US

Page 27:

Example: Social Evolution data set by MIT Media Lab

- 80 undergraduates with smart devices, moving around the campus.
- Collects phone usage and student locations from October 2008 to June 2009.
- Phone usage:
  - 3.15 million records of Bluetooth scans
  - 3.63 million scans of WLAN access points
  - 61,100 call records
  - 47,700 logged SMS events
- Students provide offline, self-reported answers related to their health habits, diet and exercise, weight changes, and political opinions during the presidential election campaign.

Page 28:

Contact graph: only links with more than 2,000 contacts between two students are shown. Bigger nodes indicate higher betweenness centrality values for the corresponding participants. Thicker edges indicate higher contact frequency between the connected nodes.
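A graph like this can be derived in two steps: drop the links at or below the contact threshold, then compute betweenness centrality on what remains. The sketch below uses Brandes' algorithm on an unweighted, undirected graph. The contact counts and participant names are made up, and nothing here is taken from the actual data set or from the tool used to draw the figure.

```python
from collections import defaultdict, deque

def filter_edges(contact_counts, threshold):
    """Keep only pairs whose contact count is greater than the threshold."""
    return [pair for pair, count in contact_counts.items() if count > threshold]

def betweenness(edges):
    """Unnormalised betweenness centrality (Brandes' algorithm)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    score = {node: 0.0 for node in adj}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        sigma = {node: 0 for node in adj}
        dist = {node: -1 for node in adj}
        preds = {node: [] for node in adj}
        sigma[s], dist[s] = 1, 0
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, visiting the farthest nodes first.
        delta = {node: 0.0 for node in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {node: value / 2 for node, value in score.items()}

# Hypothetical contact counts between four participants.
contact_counts = {
    ("a", "b"): 5000,
    ("b", "c"): 3000,
    ("c", "d"): 2500,
    ("a", "d"): 1200,  # at or below the threshold, so not shown
}
shown = filter_edges(contact_counts, 2000)
centrality = betweenness(shown)
print(sorted(centrality.items()))
```

In this toy graph the filtered links form the chain a - b - c - d, so `b` and `c` sit on the shortest paths between the other participants and would be drawn as the bigger nodes.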