Thanks to Jure Leskovec, Anand Rajaraman, Jeff...
Transcript of Thanks to Jure Leskovec, Anand Rajaraman, Jeff...
![Page 1: Thanks to Jure Leskovec, Anand Rajaraman, Jeff Ullmanpapaggel/courses/eecs6414/docs/lectures/01-i… · Data Visualization (~2 lectures) Value of visualization, visual variables,](https://reader033.fdocuments.net/reader033/viewer/2022050602/5faa082b9502a823586d600e/html5/thumbnails/1.jpg)
Thanks to Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University - http://www.mmds.org
![Page 2: Thanks to Jure Leskovec, Anand Rajaraman, Jeff Ullmanpapaggel/courses/eecs6414/docs/lectures/01-i… · Data Visualization (~2 lectures) Value of visualization, visual variables,](https://reader033.fdocuments.net/reader033/viewer/2022050602/5faa082b9502a823586d600e/html5/thumbnails/2.jpg)
![Page 3: Thanks to Jure Leskovec, Anand Rajaraman, Jeff Ullmanpapaggel/courses/eecs6414/docs/lectures/01-i… · Data Visualization (~2 lectures) Value of visualization, visual variables,](https://reader033.fdocuments.net/reader033/viewer/2022050602/5faa082b9502a823586d600e/html5/thumbnails/3.jpg)
what is data analysis?
a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering
useful information, informing conclusions, and supporting decision-making
![Page 4: Thanks to Jure Leskovec, Anand Rajaraman, Jeff Ullmanpapaggel/courses/eecs6414/docs/lectures/01-i… · Data Visualization (~2 lectures) Value of visualization, visual variables,](https://reader033.fdocuments.net/reader033/viewer/2022050602/5faa082b9502a823586d600e/html5/thumbnails/4.jpg)
![Page 5: Thanks to Jure Leskovec, Anand Rajaraman, Jeff Ullmanpapaggel/courses/eecs6414/docs/lectures/01-i… · Data Visualization (~2 lectures) Value of visualization, visual variables,](https://reader033.fdocuments.net/reader033/viewer/2022050602/5faa082b9502a823586d600e/html5/thumbnails/5.jpg)
Data contains value and knowledge
![Page 6: Thanks to Jure Leskovec, Anand Rajaraman, Jeff Ullmanpapaggel/courses/eecs6414/docs/lectures/01-i… · Data Visualization (~2 lectures) Value of visualization, visual variables,](https://reader033.fdocuments.net/reader033/viewer/2022050602/5faa082b9502a823586d600e/html5/thumbnails/6.jpg)
But to extract the knowledge data needs to be
Stored
Managed
Analyzed emphasis on this class
Visualized emphasis on this class
Data Analytics ≈ Data Mining ≈ Big Data ≈ Predictive Analytics ≈ Data Science
![Page 7: Thanks to Jure Leskovec, Anand Rajaraman, Jeff Ullmanpapaggel/courses/eecs6414/docs/lectures/01-i… · Data Visualization (~2 lectures) Value of visualization, visual variables,](https://reader033.fdocuments.net/reader033/viewer/2022050602/5faa082b9502a823586d600e/html5/thumbnails/7.jpg)
Growing market revenue of Big Data in billion U.S. dollars from the year 2011 to 2027
https://www.edureka.co/blog/what-is-big-data/
![Page 8: Thanks to Jure Leskovec, Anand Rajaraman, Jeff Ullmanpapaggel/courses/eecs6414/docs/lectures/01-i… · Data Visualization (~2 lectures) Value of visualization, visual variables,](https://reader033.fdocuments.net/reader033/viewer/2022050602/5faa082b9502a823586d600e/html5/thumbnails/8.jpg)
Given lots of data Discover patterns and models that are:
Valid: hold on new data with some certainty
Useful: should be possible to act on the item
Unexpected: non-obvious to the system
Understandable: humans should be able to interpret the pattern
![Page 9: Thanks to Jure Leskovec, Anand Rajaraman, Jeff Ullmanpapaggel/courses/eecs6414/docs/lectures/01-i… · Data Visualization (~2 lectures) Value of visualization, visual variables,](https://reader033.fdocuments.net/reader033/viewer/2022050602/5faa082b9502a823586d600e/html5/thumbnails/9.jpg)
Descriptive methods
Find human-interpretable patterns that describe the data
Example: Clustering (e.g., find communities of interest)
Predictive methods
Use some variables to predict unknown or future values of other variables
Example: Recommendations (e.g., suggest new friends in a social network)
![Page 10: Thanks to Jure Leskovec, Anand Rajaraman, Jeff Ullmanpapaggel/courses/eecs6414/docs/lectures/01-i… · Data Visualization (~2 lectures) Value of visualization, visual variables,](https://reader033.fdocuments.net/reader033/viewer/2022050602/5faa082b9502a823586d600e/html5/thumbnails/10.jpg)
A risk with “Data analysis” is that an analyst can “discover” patterns that are meaningless
Statisticians call it Bonferroni’s principle:
Roughly, if you look in more places for interesting patterns than your amount of data will support, you are bound to find crap
![Page 11: Thanks to Jure Leskovec, Anand Rajaraman, Jeff Ullmanpapaggel/courses/eecs6414/docs/lectures/01-i… · Data Visualization (~2 lectures) Value of visualization, visual variables,](https://reader033.fdocuments.net/reader033/viewer/2022050602/5faa082b9502a823586d600e/html5/thumbnails/11.jpg)
Data analysis overlaps with: Databases: Large data, simple queries
Machine learning: Large data, complex models
CS Theory: (Randomized) Algorithms Different cultures: To a DB person, data analysis is an extreme form of
analytic processing – queries that examine large amounts of data Result is the query answer
To a ML person, data analysis is the inference of models Result is the parameters of the model
Machine Learning
CSTheory
Data Analysis
Database systems
![Page 12: Thanks to Jure Leskovec, Anand Rajaraman, Jeff Ullmanpapaggel/courses/eecs6414/docs/lectures/01-i… · Data Visualization (~2 lectures) Value of visualization, visual variables,](https://reader033.fdocuments.net/reader033/viewer/2022050602/5faa082b9502a823586d600e/html5/thumbnails/12.jpg)
This class stresses more on
Data analysis of network data (graph model)
Data analysis of high-dimensional data
Data visualization principles & examples
![Page 13: Thanks to Jure Leskovec, Anand Rajaraman, Jeff Ullmanpapaggel/courses/eecs6414/docs/lectures/01-i… · Data Visualization (~2 lectures) Value of visualization, visual variables,](https://reader033.fdocuments.net/reader033/viewer/2022050602/5faa082b9502a823586d600e/html5/thumbnails/13.jpg)
We will refer to different types of data:
Data is high dimensional
Data is a graph
Data is infinite/never-ending
Data is labeled
We will refer to different models of computation:
Distributed (MapReduce)
Streams and online algorithms
Single machine in-memory
![Page 14: Thanks to Jure Leskovec, Anand Rajaraman, Jeff Ullmanpapaggel/courses/eecs6414/docs/lectures/01-i… · Data Visualization (~2 lectures) Value of visualization, visual variables,](https://reader033.fdocuments.net/reader033/viewer/2022050602/5faa082b9502a823586d600e/html5/thumbnails/14.jpg)
Scalability
Streaming
Context
Quality
Usage
![Page 15: Thanks to Jure Leskovec, Anand Rajaraman, Jeff Ullmanpapaggel/courses/eecs6414/docs/lectures/01-i… · Data Visualization (~2 lectures) Value of visualization, visual variables,](https://reader033.fdocuments.net/reader033/viewer/2022050602/5faa082b9502a823586d600e/html5/thumbnails/15.jpg)
How do you want that data?
![Page 16: Thanks to Jure Leskovec, Anand Rajaraman, Jeff Ullmanpapaggel/courses/eecs6414/docs/lectures/01-i… · Data Visualization (~2 lectures) Value of visualization, visual variables,](https://reader033.fdocuments.net/reader033/viewer/2022050602/5faa082b9502a823586d600e/html5/thumbnails/16.jpg)
![Page 17: Thanks to Jure Leskovec, Anand Rajaraman, Jeff Ullmanpapaggel/courses/eecs6414/docs/lectures/01-i… · Data Visualization (~2 lectures) Value of visualization, visual variables,](https://reader033.fdocuments.net/reader033/viewer/2022050602/5faa082b9502a823586d600e/html5/thumbnails/17.jpg)
Website http://www.eecs.yorku.ca/~papaggel/courses/eecs6414/
Piazza Q&A website: Available from the website
https://piazza.com/yorku.ca/winter2020/eecs6414
You need to register with your yorku.ca email
Please participate and help each other!
e-mail for personal issues: [email protected]
![Page 18: Thanks to Jure Leskovec, Anand Rajaraman, Jeff Ullmanpapaggel/courses/eecs6414/docs/lectures/01-i… · Data Visualization (~2 lectures) Value of visualization, visual variables,](https://reader033.fdocuments.net/reader033/viewer/2022050602/5faa082b9502a823586d600e/html5/thumbnails/18.jpg)
Project-focused course
![Page 19: Thanks to Jure Leskovec, Anand Rajaraman, Jeff Ullmanpapaggel/courses/eecs6414/docs/lectures/01-i… · Data Visualization (~2 lectures) Value of visualization, visual variables,](https://reader033.fdocuments.net/reader033/viewer/2022050602/5faa082b9502a823586d600e/html5/thumbnails/19.jpg)
But, you need to:identify a problemfind dataprepare data for analysis create visualizations for data explorationuncover insightscommunicate critical findingscreate data-driven solutions
+ team-work (up to 3 people)
No final exam, no assignments
![Page 20: Thanks to Jure Leskovec, Anand Rajaraman, Jeff Ullmanpapaggel/courses/eecs6414/docs/lectures/01-i… · Data Visualization (~2 lectures) Value of visualization, visual variables,](https://reader033.fdocuments.net/reader033/viewer/2022050602/5faa082b9502a823586d600e/html5/thumbnails/20.jpg)
Need for data collectionNeed for data storageNeed for data analysisNeed for data visualization
Collection Storage Analysis Visualization
…but, more of an iterative process than a sequence
![Page 21: Thanks to Jure Leskovec, Anand Rajaraman, Jeff Ullmanpapaggel/courses/eecs6414/docs/lectures/01-i… · Data Visualization (~2 lectures) Value of visualization, visual variables,](https://reader033.fdocuments.net/reader033/viewer/2022050602/5faa082b9502a823586d600e/html5/thumbnails/21.jpg)
www.kaggle.com
~25,000 datasets
![Page 22: Thanks to Jure Leskovec, Anand Rajaraman, Jeff Ullmanpapaggel/courses/eecs6414/docs/lectures/01-i… · Data Visualization (~2 lectures) Value of visualization, visual variables,](https://reader033.fdocuments.net/reader033/viewer/2022050602/5faa082b9502a823586d600e/html5/thumbnails/22.jpg)
Text Data Multivariate DataNetwork Data
![Page 23: Thanks to Jure Leskovec, Anand Rajaraman, Jeff Ullmanpapaggel/courses/eecs6414/docs/lectures/01-i… · Data Visualization (~2 lectures) Value of visualization, visual variables,](https://reader033.fdocuments.net/reader033/viewer/2022050602/5faa082b9502a823586d600e/html5/thumbnails/23.jpg)
Milestone Weight
Project proposal 10%
Project midterm report 20%
Project midterm in-class presentation 10%
Project final report 40%
Project final in-class presentation 20%
+ project report in research paper format+ demo (if applicable)
![Page 24: Thanks to Jure Leskovec, Anand Rajaraman, Jeff Ullmanpapaggel/courses/eecs6414/docs/lectures/01-i… · Data Visualization (~2 lectures) Value of visualization, visual variables,](https://reader033.fdocuments.net/reader033/viewer/2022050602/5faa082b9502a823586d600e/html5/thumbnails/24.jpg)
… a number oflectures
![Page 25: Thanks to Jure Leskovec, Anand Rajaraman, Jeff Ullmanpapaggel/courses/eecs6414/docs/lectures/01-i… · Data Visualization (~2 lectures) Value of visualization, visual variables,](https://reader033.fdocuments.net/reader033/viewer/2022050602/5faa082b9502a823586d600e/html5/thumbnails/25.jpg)
High-dimensional Data Analysis (~3 lectures)Frequent itemsets (a-priori algorithm, PCY algorithm, SON algorithm); finding similar items (shingling, min-hashing, locality-sensitive hashing (LSH)); High dimensionality clustering (BFR algorithm, CURE algorithm)
Network Analysis (~5 lectures)basic graph theory, network measurements, network models, community detection, link analysis & link prediction
Data Visualization (~2 lectures)Value of visualization, visual variables, cognition and perception, colors, pre-attentive vs attentive processing, visual metaphors, taxonomy of visualization, visualizations of qualitative and quantitative data
![Page 26: Thanks to Jure Leskovec, Anand Rajaraman, Jeff Ullmanpapaggel/courses/eecs6414/docs/lectures/01-i… · Data Visualization (~2 lectures) Value of visualization, visual variables,](https://reader033.fdocuments.net/reader033/viewer/2022050602/5faa082b9502a823586d600e/html5/thumbnails/26.jpg)
Data Analysis and Visualization
Visualization Tools &
Interaction
Data
Mining
Probability & Statistics
Exploratory Data
Analysis
Graph Mining
![Page 27: Thanks to Jure Leskovec, Anand Rajaraman, Jeff Ullmanpapaggel/courses/eecs6414/docs/lectures/01-i… · Data Visualization (~2 lectures) Value of visualization, visual variables,](https://reader033.fdocuments.net/reader033/viewer/2022050602/5faa082b9502a823586d600e/html5/thumbnails/27.jpg)
Current interest in DAVYou are currently working on an interesting DAV project
Continuous interest in DAVYou worked on an interesting DAV project before (BSc thesis, MSc thesis, etc.) and would like to further expand it
Potential interest in DAVYou are interested to work on a DAV project and looking for inspirations
![Page 28: Thanks to Jure Leskovec, Anand Rajaraman, Jeff Ullmanpapaggel/courses/eecs6414/docs/lectures/01-i… · Data Visualization (~2 lectures) Value of visualization, visual variables,](https://reader033.fdocuments.net/reader033/viewer/2022050602/5faa082b9502a823586d600e/html5/thumbnails/28.jpg)
Data Analytics
+ tools for data analytics
![Page 29: Thanks to Jure Leskovec, Anand Rajaraman, Jeff Ullmanpapaggel/courses/eecs6414/docs/lectures/01-i… · Data Visualization (~2 lectures) Value of visualization, visual variables,](https://reader033.fdocuments.net/reader033/viewer/2022050602/5faa082b9502a823586d600e/html5/thumbnails/29.jpg)
+ tools for visualization of high-dimensional data
Data Visualization
![Page 30: Thanks to Jure Leskovec, Anand Rajaraman, Jeff Ullmanpapaggel/courses/eecs6414/docs/lectures/01-i… · Data Visualization (~2 lectures) Value of visualization, visual variables,](https://reader033.fdocuments.net/reader033/viewer/2022050602/5faa082b9502a823586d600e/html5/thumbnails/30.jpg)
Item Comment
Classes Mon @ 16:00-19:00
Classroom Accolade East Building 013 (ACE 013)
Course group 3
Credits 3
Websitehttp://www.eecs.yorku.ca/~papaggel/c
ourses/eecs6414/
Office hourDrop anytime by my office (LAS3050)
or by appointment
![Page 31: Thanks to Jure Leskovec, Anand Rajaraman, Jeff Ullmanpapaggel/courses/eecs6414/docs/lectures/01-i… · Data Visualization (~2 lectures) Value of visualization, visual variables,](https://reader033.fdocuments.net/reader033/viewer/2022050602/5faa082b9502a823586d600e/html5/thumbnails/31.jpg)
Algorithms
Basic data structures, dynamic programming, …
Basic probability & linear algebra
Moments, typical distributions, MLE, …
Programming
Your choice, but Python/C++/Java will be very useful
It’s going to be fun and hard work.