Empirical Research Methods in Computer Science Lecture 1, Part 1 October 12, 2005 Noah Smith...
Empirical Research Methods in Computer Science
Lecture 1, Part 1
October 12, 2005
Noah Smith
http://nlp.cs.jhu.edu/~nasmith/erm
Empiricism
empeiros: experienced (peira = trial or test)
cf. rationalism
Exploration & Experiment
Exploratory Data Analysis (lecture ≈5): explore, visualize, summarize, model
Hypothesis Testing (lectures 1, 2): experiment, confirm, yes/no?
Computer What?
Theory: Algorithms, Computation
Practice: Software Engineering, Application Areas
Systems: OS, Architecture
Who cares?
1. anyone who wants to do research
2. anyone who wants to follow research (i.e., read papers)
3. anyone who wants to be able to make smart decisions / draw conclusions
4. anyone who likes thinking critically
Basic Research Questions
int foo() { ...}
Why bother?
int foo() { ...}
int foo() { ...}
int foo() { ...}
int foo() { ...}
int foo() { ...}
int foo() { ...}
Variation → Statistics
int foo() { ...}
determinism isn’t good enough any more!
Statistics, in this Course
Nonparametric tests
Sampling
Later: Parametric tests (when and why)
Warning
Theory (complexity analysis, etc.) is important, too!
Many phenomena aren’t surprising if you know your math.
Goals
Know how to look for the interesting experiments
Know how to construct experiments
Know how to analyze the results
Be critical of all claims
Develop an aesthetic for good empirical work!
Empiricism is FUN!
Especially in computer science!
Basic Course Information
instructors: Noah and David, {n,d}[email protected]
Wednesdays 4-5:15 pm
no class Thanksgiving week
homeworks (65%); final exam (30%)
About Us
Combined 19 years of experience in CS; 36 years programming
Autodidact empiricists
Research interests in statistical modeling and machine learning (Eisner/Yarowsky lab)
NEB 332
Plan
Hypothesis testing, statistics (2)
Case study: runtime (2)
Exploratory data analysis (1)
Parametric testing, modeling (1-2)
Statistical analysis of computer programs (1)
MO
Come to class. Send us feedback anytime.
What do you want to know? Bring us papers.
Empirical Research Methods in Computer Science
Lecture 1, Part 2
October 12, 2005
David Smith
Terminological Prelude
Populations: population distributions; “All possible files”. How big?
Samples: sampling distributions; “Files on my system”
Statistics: functions of data; “Size of my files”
Models: parameters
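To make the terms concrete, here is a minimal sketch in Python. It treats the files under one directory as the sample and computes a few statistics (functions of the data) on their sizes. The function names `file_sizes` and `summarize` and the choice of directory are assumptions made for illustration; they are not from the lecture.

```python
import os
import statistics


def file_sizes(root):
    """Return the sizes in bytes of all regular files under root (the sample)."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.isfile(path) and not os.path.islink(path):
                sizes.append(os.path.getsize(path))
    return sizes


def summarize(sizes):
    """Compute a few statistics, i.e. functions of the sampled data."""
    return {
        "n": len(sizes),
        "mean": statistics.mean(sizes),
        "median": statistics.median(sizes),
    }


if __name__ == "__main__":
    # "Files on my system", restricted here to the current directory.
    sample = file_sizes(".")
    if sample:
        print(summarize(sample))
```

The population ("all possible files") is never observed; only the sample is, which is why the later slides ask how much a statistic would vary from sample to sample.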
And now for some data
Abnormality
The Bootstrap
Simulates the sampling distribution
Proposed by Efron in 1979
Anticipated by permutation tests, jackknife, cross-validation
From original sample of size n, draw B samples of size n with replacement and calculate the statistic on each
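The resampling recipe above can be sketched in a few lines of Python. This is an illustrative sketch, not code from the lecture: the function name `bootstrap`, the default B = 1000, the fixed seed, and the example data are all assumptions.

```python
import random
import statistics


def bootstrap(sample, statistic, B=1000, seed=0):
    """Return B bootstrap replicates of `statistic`.

    Each replicate is the statistic computed on n draws with replacement
    from the original sample, where n = len(sample).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(sample)
    replicates = []
    for _ in range(B):
        resample = rng.choices(sample, k=n)  # sampling with replacement
        replicates.append(statistic(resample))
    return replicates


# Made-up data, for illustration only.
data = [3, 8, 1, 12, 5, 7, 2, 40, 6, 4]
reps = bootstrap(data, statistics.mean)
```

The list `reps` approximates the sampling distribution of the mean; a histogram of it is what the following slides plot.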
Sampling Distributions
(figure: sampling distributions, each with its mean labeled μ)
Bootstrapping the Mean
What’s Going On?
Why is bootstrap distribution normal?
Remember, this is a mean
Linearity of Expectation
Central Limit Theorem
Closed form standard error for means
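For a mean, the closed-form standard error is s/√n, where s is the sample standard deviation and n the sample size. The sketch below compares that formula with the standard deviation of bootstrap means. The data are synthetic and skewed (drawn from a log-normal purely for illustration), and the function names, B, and seeds are assumptions.

```python
import math
import random
import statistics

# Synthetic, heavy-tailed sample; not data from the lecture.
_rng = random.Random(1)
sample = [_rng.lognormvariate(0.0, 1.0) for _ in range(200)]


def closed_form_se(xs):
    """Standard error of the mean: s / sqrt(n)."""
    return statistics.stdev(xs) / math.sqrt(len(xs))


def bootstrap_se(xs, B=2000, seed=0):
    """Standard deviation of B bootstrap means (resampling with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    means = [sum(rng.choices(xs, k=n)) / n for _ in range(B)]
    return statistics.stdev(means)


if __name__ == "__main__":
    print("closed form:", closed_form_se(sample))
    print("bootstrap:  ", bootstrap_se(sample))
```

The two numbers should roughly agree. The bootstrap earns its keep for statistics that have no such closed form.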
More Heavy Tails
Sampling Still Normal
Bivariate Data
Compression Performance
Bootstrapping Correlation
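With bivariate data, the bootstrap resamples whole (x, y) pairs so that the dependence between the two variables is preserved. The following is a minimal sketch under that assumption; the helper names and the example pairs are invented for illustration and are not the lecture's data.

```python
import math
import random


def pearson_r(pairs):
    """Pearson correlation of a list of (x, y) pairs.

    Returns NaN when either variable has zero variance.
    """
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


def bootstrap_correlation(pairs, B=1000, seed=0):
    """Bootstrap replicates of r, resampling pairs jointly with replacement."""
    rng = random.Random(seed)
    n = len(pairs)
    reps = (pearson_r(rng.choices(pairs, k=n)) for _ in range(B))
    # Drop the rare degenerate resamples where r is undefined.
    return [r for r in reps if not math.isnan(r)]


# Made-up (original size, compressed size) pairs, for illustration only.
pairs = [(10, 4), (20, 9), (35, 12), (50, 30), (80, 33), (120, 70), (200, 90), (400, 260)]
r_reps = bootstrap_correlation(pairs)
```

Unlike the mean, the correlation coefficient is bounded in [-1, 1], so its bootstrap distribution need not look normal.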
Error, Confidence, Testing
Standard error from sampling distribution
Confidence intervals: bounding error probability (e.g. to 5%)
Hypothesis testing: how likely is a particular statistic under our assumptions?
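One simple way to turn a bootstrap distribution into a confidence interval is the percentile method: sort the replicates and read off the α/2 and 1 − α/2 quantiles. This is only one of several bootstrap interval constructions, and the slides do not say which one the course uses. The function name, defaults, and data below are assumptions.

```python
import random
import statistics


def percentile_interval(sample, statistic, B=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval with nominal coverage 1 - alpha."""
    rng = random.Random(seed)
    n = len(sample)
    reps = sorted(statistic(rng.choices(sample, k=n)) for _ in range(B))
    lo = reps[int((alpha / 2) * B)]          # lower alpha/2 quantile
    hi = reps[int((1 - alpha / 2) * B) - 1]  # upper 1 - alpha/2 quantile
    return lo, hi


# Made-up measurements, for illustration only.
data = [12, 15, 9, 20, 14, 11, 18, 13, 16, 10]
low, high = percentile_interval(data, statistics.mean)
```

With alpha = 0.05 the interval is built so that the error probability is bounded at roughly 5%, in the sense of the slide above.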
Hypothesis Testing
One-sample: “Are these data normal/Poisson/…?”
Two-sample: “Are these two samples from the same distribution?”
Paired-sample: “Is this technique better than that?”
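As an illustration of the paired-sample case, here is a sketch of a nonparametric sign-flip permutation test. It assumes both techniques were scored on the same items and that higher scores are better. The function name, B, and the add-one p-value estimate are choices made for this sketch, not something the slides specify.

```python
import random


def paired_permutation_test(a, b, B=10000, seed=0):
    """One-sided p-value for "a tends to score higher than b" on paired items.

    Under the null hypothesis the two techniques are interchangeable, so the
    sign of each per-item difference is equally likely to be + or -.
    """
    if len(a) != len(b):
        raise ValueError("paired samples must have the same length")
    rng = random.Random(seed)
    diffs = [x - y for x, y in zip(a, b)]
    observed = sum(diffs)
    count = 0
    for _ in range(B):
        # Randomly flip the sign of each difference and total them up.
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if flipped >= observed:
            count += 1
    # Add-one estimate keeps the p-value strictly above zero.
    return (count + 1) / (B + 1)
```

A small p-value says the observed total difference would be unusual if the two techniques were really interchangeable.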
Your First Assignment
Data compression
Three-way tradeoff: Compression, Speed, Loss
Degenerate cases (cat, echo '', …)
Unknown distribution of input
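The three-way tradeoff and the two degenerate cases can be made concrete with a small measurement harness. This is a sketch only: the assignment's actual requirements are not in the slides, and the function names, the choice of zlib as the "real" compressor, and the synthetic input are all assumptions.

```python
import time
import zlib


def cat(data):
    """Degenerate case: copy the input unchanged (no compression, no loss)."""
    return data


def echo_nothing(data):
    """Degenerate case: emit nothing (smallest output, total loss)."""
    return b""


def identity(data):
    """'Decompressor' paired with the two degenerate cases."""
    return data


def measure(compress, decompress, data):
    """Report compression ratio, elapsed time, and whether the round trip is lossless."""
    start = time.perf_counter()
    out = compress(data)
    elapsed = time.perf_counter() - start
    return {
        "ratio": len(out) / len(data),  # output size / input size; data must be non-empty
        "seconds": elapsed,
        "lossless": decompress(out) == data,
    }


if __name__ == "__main__":
    data = b"the quick brown fox " * 500  # synthetic, highly repetitive input
    for name, c, d in [
        ("cat", cat, identity),
        ("echo ''", echo_nothing, identity),
        ("zlib", zlib.compress, zlib.decompress),
    ]:
        print(name, measure(c, d, data))
```

Because the distribution of input is unknown, numbers from a single synthetic input like this one say little on their own; they would need to be gathered over a sample of inputs and analyzed with the methods above.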