Algorithms for Data Science
Anil Maheshwari
[email protected]
School of Computer Science, Carleton University
Outline
Topics
Required Background
Blended Teaching
Course Evaluation
Help Me
Topics
Course Title
Algorithms for Data Science
Input Algorithm Output
01000011101000010111001001101010 ⇒ ? 01100101011101000110111101101110
Topics
1. Probability Basics: Linearity of Expectation, Concentration Bounds, Balls and Bins, Hashing.
2. Data Streaming - Count-Min Sketch (CMS), Estimating Frequency Moments, What is Trending?
3. Locality-Sensitive Hashing
4. Geometric Approximation (Nearest Neighbors, Core Sets)
5. Dimensionality Reduction
6. Online Algorithms: Bipartite Matching, Adwords
7. Regret Minimization (Multiplicative Weight Algorithm)
8. Graph Partitioning
9. Linear Algebra: Eigenvalues + SVD
10. Recommendation Systems
11. Randomized Linear Algebra
12. . . .
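To give a flavour of the data streaming topic, here is a minimal sketch of a Count-Min Sketch, assuming the standard construction: a small table of counters with one hash function per row, where the frequency estimate of an item is the minimum of its counters. The class name, the default `width` and `depth`, and the use of salted BLAKE2 hashes as a stand-in for pairwise-independent hash functions are illustrative assumptions, not the course's implementation.

```python
import hashlib


class CountMinSketch:
    """Illustrative Count-Min Sketch for non-negative frequency counts."""

    def __init__(self, width=256, depth=4):
        # width = counters per row, depth = number of rows (hash functions).
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # Assumption: a BLAKE2 hash salted with the row number stands in
        # for a family of pairwise-independent hash functions.
        digest = hashlib.blake2b(
            str(item).encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        # Increment one counter in every row.
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions can only inflate a counter, so the minimum over the
        # rows never underestimates the true frequency.
        return min(
            self.table[row][self._index(item, row)]
            for row in range(self.depth)
        )


stream = ["a"] * 5 + ["b"] * 3 + ["c"]
cms = CountMinSketch(width=64, depth=4)
for x in stream:
    cms.add(x)
print(cms.estimate("a"), cms.estimate("b"), cms.estimate("c"))
```

The memory used is `width * depth` counters regardless of the stream length. Each estimate is at least the true count and at most the total number of items added.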
Our Focus
Warnings: First Offering - Online - Plans May Derail!
Office may become inaccessible
We will try to understand
• Algorithmic Techniques
• Key Ideas
• Learn a bit of Math
• High-level details
Required Background
Background
• Basic Data Structures: Lists, Stacks, BST, Hashing, Searching and Sorting
See Pat Morin’s https://opendatastructures.org/
• Basic Probability Theory: Expectation, Variance, Concentration Bounds
Michiel Smid’s COMP 2804 Notes http://cglab.ca/~michiel/DiscreteStructures/
Joe Blitzstein https://projects.iq.harvard.edu/stat110/youtube
• Linear Algebra: Eigenvalues and Eigenvectors, Null Space, Positive Definite Matrices, QΛQ^T, SVD
Gilbert Strang
https://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/video-lectures/
• Algorithmic Techniques: Pseudocode, Correctness, Analysis (Recurrences + O, Ω-notation), Analysis of (randomized) Quicksort. Algorithms for Connectivity, Strong Connectivity, MST, SSSP, Graph Traversal. Techniques: D&C, Greedy, Dynamic Programming.
Blended Teaching
Online Course
• Lectures (with recordings) on Tuesdays & Fridays from 06:30 AM EST.
• Timing may change when Daylight Saving Time begins in March.
• E-mail: [email protected]
• Post queries during the lectures using the Zoom chat.
References
Useful References
1. Foundations of Data Science by Blum, Hopcroft and Kannan. https://www.cs.cornell.edu/jeh/book.pdf
2. Mining of Massive Data Sets http://www.mmds.org/
3. Notes on Algorithm Design on my web page: http://people.scs.carleton.ca/~maheshwa/
4. Numerous research articles.
5. Pointers to useful Videos.
Course Evaluation
Evaluation Parameters
Principles
• Open-ended course by design
• Contents evolve
• Enjoyable learning experience
• Use Zoom chat effectively
Evaluation Components
• Assignments
• Programming Assignments
• Tests
• On the spot quizzes!
Help Me
PLEASE, PLEASE, PLEASE, PLEASE
1. Post your questions on Zoom
2. Post useful links/references/ideas
3. Take initiative to learn
4. E-mail now if you want to learn some specific algorithmic technique(s)
5. Seek Help, Have Fun & Smile
Season - SUMMER
Season - FALL
Season - WINTER