What is Probability and Statistics and Why Should You Care?

CS 3130: Probability and Statistics for Engineers

August 26, 2014

What is Probability?

Definition: Probability theory is the study of the mathematical rules that govern random events.

But what is randomness?

Informally, a random event is an event in which we do not know the outcome without observing it.

Probability tells us what we can say about such events, given our assumptions about the possible outcomes.
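
To make the role of assumptions concrete, here is a minimal sketch in Python. It assumes a fair six-sided die, so every outcome is equally likely; the die, the event "roll is even", and all function names are illustrative choices, not part of the lecture. Under that assumption the probability of an event is the fraction of outcomes in it, and a simulation of many rolls should give a frequency close to that fraction.

```python
from fractions import Fraction
import random

# Assumption (not from the lecture): a fair six-sided die,
# so each outcome is equally likely.
OUTCOMES = [1, 2, 3, 4, 5, 6]


def exact_probability(event):
    """Probability of an event under the equally-likely assumption."""
    favorable = [o for o in OUTCOMES if event(o)]
    return Fraction(len(favorable), len(OUTCOMES))


def simulated_frequency(event, trials, seed=0):
    """Fraction of simulated rolls in which the event occurred."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if event(rng.choice(OUTCOMES)))
    return hits / trials


def is_even(outcome):
    return outcome % 2 == 0


if __name__ == "__main__":
    print("exact:    ", exact_probability(is_even))
    print("simulated:", simulated_frequency(is_even, trials=100_000))
```

The exact value comes from the assumption alone; the simulated value is what we would observe, and it only approaches the exact value as the number of trials grows.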

What is Statistics?

Definition: Statistics is the application of probability to the collection, analysis, and description of random data.

Statistics is used to:
- Design experiments
- Summarize data
- Make conclusions about the world
- Explore complex data

Applications of Probability and Statistics

Computer Science:
- Machine Learning
- Data Mining
- Artificial Intelligence
- Simulation
- Image Processing
- Computer Graphics
- Visualization
- Software Testing
- Algorithms

Electrical Engineering:
- Signal Processing
- Telecommunications
- Information Theory
- Control Theory
- Instrumentation, Sensors
- Hardware/Electronics Testing

Applications of Probability and Statistics

General:
- Gambling (not recommended)
- Stock Market Analysis
- Politics
- Sports
- Demographics
- Medicine
- Economics
- All Sciences!!

Alan Turing: Connecting CS and Probability

- “Father of Computer Science”
- Most famous for:
  - Computability, Turing machine
  - Stored-program computer
  - Turing test
  - WWII cryptanalysis
- Wrote a dissertation on probability theory!
- Turing used probability and statistics to crack Enigma

Application: Machine Learning

Machine Learning builds statistical models of data in order to recognize complex patterns and to make decisions based on these observations.

Examples:
- Classification (recognition of faces or handwriting)
- Prediction (stock market, elections)

Application: Randomized Algorithms

- Some algorithms benefit from using random steps rather than deterministic ones
- Example: primality testing
  - Testing for all possible divisors is slow for large numbers
  - Instead test a random selection of candidates (practical tests such as Miller-Rabin pick random bases rather than random divisors)
  - Can be confident of primality up to a certain degree
- Example: stochastic optimization methods
  - Optimizations can get “stuck” in the wrong answer, depending on how they are initialized
  - Re-run the algorithm with several random initializations
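
The primality example can be sketched in Python as follows. Note the substitution: the slide speaks of sampling divisors, while this sketch uses the Miller-Rabin test, a standard randomized primality test that samples random bases instead. The function name, the `rounds` parameter, and the short list of small primes are illustrative assumptions, not taken from the lecture. A prime always passes; a composite slips through a single round with probability at most 1/4, so `rounds` independent rounds leave an error chance of at most 4^(-rounds).

```python
import random
from typing import Optional


def is_probable_prime(n: int, rounds: int = 20,
                      rng: Optional[random.Random] = None) -> bool:
    """Miller-Rabin test: False means composite, True means probably prime."""
    rng = rng or random.Random()
    if n < 2:
        return False
    # Cheap deterministic check against a few small primes.
    for p in (2, 3, 5, 7, 11, 13):
        if n % p == 0:
            return n == p
    # Write n - 1 as 2^s * d with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = rng.randrange(2, n - 1)      # random base in [2, n - 2]
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue                     # this base reveals nothing
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break                    # this base reveals nothing
        else:
            return False                 # a is a witness: n is composite
    return True


if __name__ == "__main__":
    print(is_probable_prime(2**61 - 1))                 # a known Mersenne prime
    print(is_probable_prime((2**31 - 1) * (2**61 - 1))) # a product, so composite
```

This is the sense in which we "can be confident of primality up to a certain degree": the answer "composite" is always right, and the answer "probably prime" is wrong only with a probability we control by choosing the number of rounds.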

Application: Computer Graphics

- Ray tracing models light photons bouncing around a scene
- Impossible to model every photon
- Monte Carlo ray tracing simulates a random selection of photons

Image by Steve Parker (U of U)
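
The Monte Carlo idea behind this can be illustrated with a much smaller problem than a full renderer. The sketch below is an assumption-laden toy, not the method behind the image: it estimates the integral of cos(theta) over all directions in a hemisphere, a quantity that appears when summing light arriving at a surface and whose exact value is pi. Instead of integrating over every direction, it averages over randomly chosen ones. The function name and the choice of uniform direction sampling are illustrative.

```python
import math
import random


def estimate_cosine_integral(num_samples: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the integral of cos(theta) over a hemisphere."""
    rng = random.Random(seed)
    # Directions are sampled uniformly over the hemisphere, whose solid
    # angle is 2*pi, so the sampling density is 1 / (2*pi).
    pdf = 1.0 / (2.0 * math.pi)
    total = 0.0
    for _ in range(num_samples):
        # For uniformly distributed directions on a hemisphere,
        # cos(theta) is itself uniform on [0, 1).
        cos_theta = rng.random()
        total += cos_theta / pdf   # integrand divided by sampling density
    return total / num_samples


if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000):
        print(n, estimate_cosine_integral(n), "exact:", math.pi)
```

The estimate is itself random; its error shrinks roughly like one over the square root of the number of samples, which is why more sampled "photons" give a less noisy picture.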

Application: Visualization

- Scientific data contains uncertainty
- Visualizations can be misleading as to “truth”
- Current research focuses on how to visualize uncertainty

Johnson and Sanderson, IEEE Comp. Graph. and App., 2003

Application: Medical Image Analysis

- Must deal with noisy image data
- Example: finding an anatomical structure in a 3D image
- Often includes statistical analysis of resulting data

Fletcher et al, NeuroImage, 2010

“Big Data” and “Analytics”

- The amount of digital data is exploding!
- Big data analysis is statistics + scalable CS.
- Examples: social media, internet purchases, news articles, scientific data, medical data

Source: IDC/EMC Digital Universe Study

[Figure: chart of data volumes in Exabytes (10^18 bytes), dated Feb. 2011, comparing: all digital info; new digital info per year; all human documents in 40,000 years; all spoken words in all lives; amount human minds can store in 1 year]

Every two days we create as much data as we did from the beginning of mankind until 2003!

Sources: Lesk, Berkeley SIMS, Landauer, EMC, TechCrunch, Smart Planet (slide by Chris Johnson)

How Much is an Exabyte?

1 Exabyte = 1000 Petabytes = could hold approximately 500,000,000,000,000 pages of standard printed text

It takes one tree to produce 94,200 pages of a book

Thus it would take about 5,307,855,626 trees to store an Exabyte of data (500,000,000,000,000 pages / 94,200 pages per tree)

In 2005, there were 400,246,300,201 trees on Earth

We could store about 75 Exabytes of data using all the trees on the entire planet.

Sources: http://www.whatsabyte.com/ and http://wiki.answers.com (slide by Chris Johnson)

How many trees does it take to print out an Exabyte?
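
As a check on the arithmetic, the chain of divisions can be recomputed from the three input figures quoted on the slide (pages per Exabyte, pages per tree, trees on Earth in 2005). This is a small sketch; the variable names are illustrative and the results depend only on those three quoted inputs.

```python
# Input figures as quoted on the slide.
PAGES_PER_EXABYTE = 500_000_000_000_000
PAGES_PER_TREE = 94_200
TREES_ON_EARTH_2005 = 400_246_300_201

# Trees needed to print one Exabyte of text.
trees_per_exabyte = PAGES_PER_EXABYTE / PAGES_PER_TREE

# Exabytes that could be printed using every tree on Earth.
exabytes_on_all_trees = TREES_ON_EARTH_2005 / trees_per_exabyte

if __name__ == "__main__":
    print(f"trees per Exabyte:     {trees_per_exabyte:,.0f}")
    print(f"Exabytes on all trees: {exabytes_on_all_trees:.2f}")
```

With these inputs the division comes out to roughly 5.3 billion trees per Exabyte, and about 75 Exabytes for all the trees counted in 2005.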

The Scientific Method

[Diagram: a cycle of make hypothesis, gather data, compute statistics, draw conclusion]

1. Define the question
2. Background research, observation
3. Formulate a hypothesis
4. Design and run an experiment
5. Analyze the results

Experimental measurements are noisy (randomness).

Statistics is critical in the last two steps!
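
A minimal sketch of what "noisy measurements" means in practice, in Python. Everything here is assumed for illustration: the true value, the Gaussian noise model, the sample size, and the function names are not from the lecture. The experiment is simulated by adding random noise to a fixed true value; the analysis step then summarizes the measurements by their mean and by the standard error of that mean, which quantifies how far the mean is likely to be from the truth.

```python
import math
import random
import statistics


def simulate_measurements(true_value, noise_sd, n, seed=0):
    """Step 4 (run an experiment): n readings of true_value plus Gaussian noise."""
    rng = random.Random(seed)
    return [rng.gauss(true_value, noise_sd) for _ in range(n)]


def mean_and_standard_error(samples):
    """Step 5 (analyze): sample mean and the standard error of that mean."""
    mean = statistics.mean(samples)
    std_err = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, std_err


if __name__ == "__main__":
    readings = simulate_measurements(true_value=10.0, noise_sd=0.5, n=50)
    mean, std_err = mean_and_standard_error(readings)
    print(f"estimate: {mean:.3f} +/- {std_err:.3f}")
```

No single reading equals the true value, but the mean of many readings is usually close to it, and the standard error says how close we should expect it to be.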

Data Science

[Diagram: pipeline from enormous noisy data to working data to draw conclusion, with steps labeled data squashing, data mining, and model uncertainty and bound error]

1. Process/Squash enormous available data

2. Mine working data (calculate many statistics)

3. Analyze the results / Draw conclusions

Every step is subject to noise and involves statistics.

What statistics can and cannot do!

What You Should Do Now

1. Check out the class web page: www.cs.utah.edu/~jeffp/teaching/cs3130.html

2. Download the book (start reading Ch 1 & 2)

3. Download and install R on your machine (take a look at R tutorial)