Lecture 2 Stats - GitHub Pages · Lecture 2: Statistical Inference and Exploratory Data Analysis...
Transcript of Lecture 2 Stats - GitHub Pages · Lecture 2: Statistical Inference and Exploratory Data Analysis...
![Page 1: Lecture 2 Stats - GitHub Pages · Lecture 2: Statistical Inference and Exploratory Data Analysis Theodoros Rekatsinas 1. 2 Announcements • Waiting list: you receive invitations](https://reader035.fdocuments.net/reader035/viewer/2022070113/605b5d5db743c9517c13dca9/html5/thumbnails/1.jpg)
CS639:DataManagementfor
DataScienceLecture2:StatisticalInferenceandExploratoryDataAnalysis
TheodorosRekatsinas1
![Page 2: Lecture 2 Stats - GitHub Pages · Lecture 2: Statistical Inference and Exploratory Data Analysis Theodoros Rekatsinas 1. 2 Announcements • Waiting list: you receive invitations](https://reader035.fdocuments.net/reader035/viewer/2022070113/605b5d5db743c9517c13dca9/html5/thumbnails/2.jpg)
2
Announcements• Waitinglist:youreceiveinvitationstoregisterand
youhavetwodaystoreply.
• Piazza:youneedtoregistertoengageindiscussionsandreceiveannouncements
• Announcements:updatetheclasswebsite;announcementswillbepostedthere
![Page 3: Lecture 2 Stats - GitHub Pages · Lecture 2: Statistical Inference and Exploratory Data Analysis Theodoros Rekatsinas 1. 2 Announcements • Waiting list: you receive invitations](https://reader035.fdocuments.net/reader035/viewer/2022070113/605b5d5db743c9517c13dca9/html5/thumbnails/3.jpg)
3
Firstassignment(P0)• CreateaGitHubaccountandclonethegithub
repositoryoftheclass.
• DeploytheclassVM(instructionsintheslidesofLecture1)
![Page 4: Lecture 2 Stats - GitHub Pages · Lecture 2: Statistical Inference and Exploratory Data Analysis Theodoros Rekatsinas 1. 2 Announcements • Waiting list: you receive invitations](https://reader035.fdocuments.net/reader035/viewer/2022070113/605b5d5db743c9517c13dca9/html5/thumbnails/4.jpg)
Today’sLecture
1. QuickRecap:Thedatascienceworkflow
2. StatisticalInference
3. ExploratoryDataAnalysis• Activity:EDAinJupyter notebook
4
![Page 5: Lecture 2 Stats - GitHub Pages · Lecture 2: Statistical Inference and Exploratory Data Analysis Theodoros Rekatsinas 1. 2 Announcements • Waiting list: you receive invitations](https://reader035.fdocuments.net/reader035/viewer/2022070113/605b5d5db743c9517c13dca9/html5/thumbnails/5.jpg)
1.QuickRecap:TheDSWorkflow
5
Section1
![Page 6: Lecture 2 Stats - GitHub Pages · Lecture 2: Statistical Inference and Exploratory Data Analysis Theodoros Rekatsinas 1. 2 Announcements • Waiting list: you receive invitations](https://reader035.fdocuments.net/reader035/viewer/2022070113/605b5d5db743c9517c13dca9/html5/thumbnails/6.jpg)
Onedefinitionofdatascience
6
Section2
Datascienceisabroadfieldthatreferstothecollectiveprocesses,theories,concepts,toolsandtechnologies thatenablethereview,analysisandextractionofvaluableknowledgeandinformationfromrawdata.
Source:Techopedia
![Page 7: Lecture 2 Stats - GitHub Pages · Lecture 2: Statistical Inference and Exploratory Data Analysis Theodoros Rekatsinas 1. 2 Announcements • Waiting list: you receive invitations](https://reader035.fdocuments.net/reader035/viewer/2022070113/605b5d5db743c9517c13dca9/html5/thumbnails/7.jpg)
Datascienceworkflow
7
Section2
https://cacm.acm.org/blogs/blog-cacm/169199-data-science-workflow-overview-and-challenges/fulltext
Whatiswronghere?
![Page 8: Lecture 2 Stats - GitHub Pages · Lecture 2: Statistical Inference and Exploratory Data Analysis Theodoros Rekatsinas 1. 2 Announcements • Waiting list: you receive invitations](https://reader035.fdocuments.net/reader035/viewer/2022070113/605b5d5db743c9517c13dca9/html5/thumbnails/8.jpg)
Datascienceworkflow
8
Section2
Datascienceisnot(only)abouthacking!
![Page 9: Lecture 2 Stats - GitHub Pages · Lecture 2: Statistical Inference and Exploratory Data Analysis Theodoros Rekatsinas 1. 2 Announcements • Waiting list: you receive invitations](https://reader035.fdocuments.net/reader035/viewer/2022070113/605b5d5db743c9517c13dca9/html5/thumbnails/9.jpg)
9
Yourmind-setshouldbe“statisticalthinkingintheageofbig-data”
![Page 10: Lecture 2 Stats - GitHub Pages · Lecture 2: Statistical Inference and Exploratory Data Analysis Theodoros Rekatsinas 1. 2 Announcements • Waiting list: you receive invitations](https://reader035.fdocuments.net/reader035/viewer/2022070113/605b5d5db743c9517c13dca9/html5/thumbnails/10.jpg)
2.StatisticalInference
10
Section2
![Page 11: Lecture 2 Stats - GitHub Pages · Lecture 2: Statistical Inference and Exploratory Data Analysis Theodoros Rekatsinas 1. 2 Announcements • Waiting list: you receive invitations](https://reader035.fdocuments.net/reader035/viewer/2022070113/605b5d5db743c9517c13dca9/html5/thumbnails/11.jpg)
Whatyouwilllearnaboutinthissection
1. UncertaintyandRandomnessinData
2. ModelingData
3. SamplesandDistributions
11
Section2
![Page 12: Lecture 2 Stats - GitHub Pages · Lecture 2: Statistical Inference and Exploratory Data Analysis Theodoros Rekatsinas 1. 2 Announcements • Waiting list: you receive invitations](https://reader035.fdocuments.net/reader035/viewer/2022070113/605b5d5db743c9517c13dca9/html5/thumbnails/12.jpg)
• Datarepresentsthetraces ofreal-worldprocesses.• Thecollectedtracescorrespondtoasample ofthoseprocesses.
• Thereisrandomness anduncertainty inthedatacollectionprocess.
• Theprocessthatgeneratesthedataisstochastic (random).• Example:Let’stossacoin!Whatwilltheoutcomebe?Headsortails?Therearemanyfactorsthatmakeacointossastochasticprocess.
• Thesamplingprocessintroducesuncertainty.• Example:ErrorsduetosensorpositionduetoerrorinGPS,errorsduetotheanglesoflasertraveletc.
12
Section2
UncertaintyandRandomness
![Page 13: Lecture 2 Stats - GitHub Pages · Lecture 2: Statistical Inference and Exploratory Data Analysis Theodoros Rekatsinas 1. 2 Announcements • Waiting list: you receive invitations](https://reader035.fdocuments.net/reader035/viewer/2022070113/605b5d5db743c9517c13dca9/html5/thumbnails/13.jpg)
• Datarepresentsthetraces ofreal-worldprocesses.
• Partofthedatascienceprocess:Weneedtomodel thereal-world.
• Amodelisafunction fθ(x)• x:inputvariables(canbeavector)• θ:modelparameters
13
Section2
Models
![Page 14: Lecture 2 Stats - GitHub Pages · Lecture 2: Statistical Inference and Exploratory Data Analysis Theodoros Rekatsinas 1. 2 Announcements • Waiting list: you receive invitations](https://reader035.fdocuments.net/reader035/viewer/2022070113/605b5d5db743c9517c13dca9/html5/thumbnails/14.jpg)
• Datarepresentsthetraces ofreal-worldprocesses.
• Thereisrandomness anduncertainty inthedatacollectionprocess.
• Amodelisafunction fθ(x)• x:inputvariables(canbeavector)• θ:modelparameters
• Modelsshouldrelyonprobabilitytheorytocaptureuncertaintyandrandomness!
14
Section2
ModelingUncertaintyandRandomness
![Page 15: Lecture 2 Stats - GitHub Pages · Lecture 2: Statistical Inference and Exploratory Data Analysis Theodoros Rekatsinas 1. 2 Announcements • Waiting list: you receive invitations](https://reader035.fdocuments.net/reader035/viewer/2022070113/605b5d5db743c9517c13dca9/html5/thumbnails/15.jpg)
15
Section2
ModelingExample
![Page 16: Lecture 2 Stats - GitHub Pages · Lecture 2: Statistical Inference and Exploratory Data Analysis Theodoros Rekatsinas 1. 2 Announcements • Waiting list: you receive invitations](https://reader035.fdocuments.net/reader035/viewer/2022070113/605b5d5db743c9517c13dca9/html5/thumbnails/16.jpg)
16
Section2
ModelingExample
Themodelcorrespondstoalinearfunction
![Page 17: Lecture 2 Stats - GitHub Pages · Lecture 2: Statistical Inference and Exploratory Data Analysis Theodoros Rekatsinas 1. 2 Announcements • Waiting list: you receive invitations](https://reader035.fdocuments.net/reader035/viewer/2022070113/605b5d5db743c9517c13dca9/html5/thumbnails/17.jpg)
17
Section2
PopulationandSamples
• Populationiscompletesetoftraces/datapoints.• USpopulation314Million,worldpopulationis7billionforexample• Allvoters,allthings
• Sampleisasubsetofthecompleteset(orpopulation).• Howweselectthesampleintroducesbiasesintothedata
• Populationèsampleèmathematicalmodel
![Page 18: Lecture 2 Stats - GitHub Pages · Lecture 2: Statistical Inference and Exploratory Data Analysis Theodoros Rekatsinas 1. 2 Announcements • Waiting list: you receive invitations](https://reader035.fdocuments.net/reader035/viewer/2022070113/605b5d5db743c9517c13dca9/html5/thumbnails/18.jpg)
18
Section2
PopulationandSamples
• Example:EmailssentbypeopleintheCSdept.inayear.
• Method1:1/10ofallemailsovertheyearrandomlychosen
• Method2:1/10ofpeoplerandomlychosen;alltheiremailovertheyear
• Botharereasonablesampleselectionmethodforanalysis.
• Howeverestimationspdfs(probabilitydistributionfunctions)oftheemailssentbyapersonforthetwosampleswillbedifferent.
![Page 19: Lecture 2 Stats - GitHub Pages · Lecture 2: Statistical Inference and Exploratory Data Analysis Theodoros Rekatsinas 1. 2 Announcements • Waiting list: you receive invitations](https://reader035.fdocuments.net/reader035/viewer/2022070113/605b5d5db743c9517c13dca9/html5/thumbnails/19.jpg)
19
Section2
BacktoModels
• Abstractionofarealworldprocess
• Howtobuildamodel?
• Probabilitydistributionfunctions(pdfs)arebuildingblocksofstatisticalmodels.
![Page 20: Lecture 2 Stats - GitHub Pages · Lecture 2: Statistical Inference and Exploratory Data Analysis Theodoros Rekatsinas 1. 2 Announcements • Waiting list: you receive invitations](https://reader035.fdocuments.net/reader035/viewer/2022070113/605b5d5db743c9517c13dca9/html5/thumbnails/20.jpg)
20
Section2
ProbabilityDistributions
• Normal,uniform,Cauchy,t-,F-,Chi-square,exponential,Weibull,lognormal,etc.
• Theyareknownascontinuousdensityfunctions
• Foraprobabilitydensityfunction,ifweintegratethefunctiontofindtheareaunderthecurveitis1,allowingittobeinterpretedasprobability.
• Further,jointdistributions,conditionaldistributionsandmanymore.
![Page 21: Lecture 2 Stats - GitHub Pages · Lecture 2: Statistical Inference and Exploratory Data Analysis Theodoros Rekatsinas 1. 2 Announcements • Waiting list: you receive invitations](https://reader035.fdocuments.net/reader035/viewer/2022070113/605b5d5db743c9517c13dca9/html5/thumbnails/21.jpg)
21
Section2
FittingaModel
• Fittingamodelmeansestimatingtheparametersofthemodel.• Whatdistribution,whatarethevaluesofmin,max,mean,stddev,etc.
• Itinvolvesalgorithmssuchasmaximumlikelihoodestimation(MLE)andoptimizationmethods.
• Example:y=β1+β2∗𝑥è y=7.2+4.5*x
![Page 22: Lecture 2 Stats - GitHub Pages · Lecture 2: Statistical Inference and Exploratory Data Analysis Theodoros Rekatsinas 1. 2 Announcements • Waiting list: you receive invitations](https://reader035.fdocuments.net/reader035/viewer/2022070113/605b5d5db743c9517c13dca9/html5/thumbnails/22.jpg)
3.ExploratoryDataAnalysis
22
Section3
![Page 23: Lecture 2 Stats - GitHub Pages · Lecture 2: Statistical Inference and Exploratory Data Analysis Theodoros Rekatsinas 1. 2 Announcements • Waiting list: you receive invitations](https://reader035.fdocuments.net/reader035/viewer/2022070113/605b5d5db743c9517c13dca9/html5/thumbnails/23.jpg)
Whatyouwilllearnaboutinthissection
1. IntrotoExploratoryDataAnalysis(EDA)
2. Activity:EDAinJupyter
23
Section3
![Page 24: Lecture 2 Stats - GitHub Pages · Lecture 2: Statistical Inference and Exploratory Data Analysis Theodoros Rekatsinas 1. 2 Announcements • Waiting list: you receive invitations](https://reader035.fdocuments.net/reader035/viewer/2022070113/605b5d5db743c9517c13dca9/html5/thumbnails/24.jpg)
24
Section3
Activity
• Notebooklinkprovidedonwebsite.