R for Statistics
Rafał Kulik
Department of Mathematics and Statistics
University of Ottawa
Statistical Society of Ottawa, 23 September 2011
R for Statistics SSO meeting
Plan
• What is R?
• Basic Syntax: Vectors and Functions.
• Getting data into and out of R.
• Simple Data Analysis.
• Statistical tests.
• Linear Regression.
• Simulation.
• Bootstrap.
• Random Matrices.
• Kernel smoothing in time series.
What is R?
• Software for computation, simulation, and data manipulation.
• It is a free, open-source implementation of the S language, on which the commercial S-PLUS is also based.
• Can be used on Windows, Macintosh, Unix, and Linux.
• Started by R. Gentleman and R. Ihaka from the University of Auckland.
• R is maintained by the R development team. Its webpage is www.r-project.org
• You may want to download Tinn-R, a free and simple replacement for the code editor provided by R-Gui (graphical user interface).
• The first issue of The R Journal appeared in 2009.
Basic Syntax: Vectors and Functions
• Writing data manually as a vector:
    a=c(2,4,7,6,5)
    b=c(1,3,4,6,8,10)
• Elements of the vector can be printed as a[3], a[2:4].
• Basic operations on vectors.
• You may pass a vector as the argument of a function, e.g. log(a).
• The operation a+b does not work as intended: a and b have different lengths, so R recycles the shorter vector and issues a warning. Extract e.g. b=b[2:6]. Now it works.
• Note that a*b and a/b are coordinatewise.
• However, max(a,b) is not coordinatewise: it returns a single number. For the coordinatewise maximum you have to use pmax(a,b).
• Type help(name) to learn how to use the function name.
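The points on this slide can be tried directly at the R prompt. A minimal sketch, reusing the vectors a and b from the slide; the names b5, s, p, m_all and m_each are introduced here only for illustration.

```r
a <- c(2, 4, 7, 6, 5)
b <- c(1, 3, 4, 6, 8, 10)

a[3]       # third element: 7
a[2:4]     # elements 2 to 4: 4 7 6
log(a)     # functions are applied element by element

# a + b would recycle the shorter vector and issue a warning,
# because the lengths (5 and 6) differ; trim b first.
b5 <- b[2:6]           # 3 4 6 8 10
s  <- a + b5           # coordinatewise sum: 5 8 13 14 15
p  <- a * b5           # coordinatewise product: 6 16 42 48 50

m_all  <- max(a, b5)   # one number, the overall maximum: 10
m_each <- pmax(a, b5)  # coordinatewise maximum: 3 4 7 8 10
```

Keeping the trimmed vector under a new name (b5) instead of overwriting b makes it easy to rerun the example from the top.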
Getting data into and out of R
• Enter longer sequences using the scan() function. Type f=scan() and start typing. Hit ENTER once to start a new line, hit it twice to finish.
• You may read data from a file: yuan=scan("C:/docs/yuan.txt") or swiss=scan("C:/docs/swiss.txt",nlines=500).
• Export data: write(data,"C:/docs/data.txt"). Do not assign the result of write() to a data set name, since write() returns nothing useful.
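A small round trip shows how write() and scan() fit together. This is only a sketch: the numbers are made up, and tempfile() is used as a stand-in for the C:/docs paths on the slide so that the example runs on any machine.

```r
x <- c(6.83, 6.82, 6.84, 6.81, 6.80, 6.79)   # made-up numbers, not real rates
path <- tempfile(fileext = ".txt")            # hypothetical file location

write(x, path, ncolumns = 1)   # one value per line
y  <- scan(path)               # read everything back
y3 <- scan(path, nlines = 3)   # read only the first 3 lines
```

With one value per line, nlines controls exactly how many observations are read.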
Example:
Merge yuan and swiss together, write them as file currency.¹
    currency=matrix(0,length(yuan),2)
    currency[,1]=yuan; currency[,2]=swiss
    write(currency,"C:/currency.txt")             # NOT GOOD: written column by column, 5 values per line
    currency=t(currency)
    write(currency,"C:/currency.txt")             # NOT GOOD: pairs in order, but still 5 values per line
    write(currency,"C:/currency.txt",ncolumns=2)  # GOOD !!! one (yuan, swiss) pair per line
¹ See Tables for matrix operations.
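Why the transpose is needed can be seen on a tiny matrix. The sketch below uses two hypothetical three-element vectors in place of yuan and swiss, and a temporary file in place of C:/currency.txt.

```r
yuan_demo  <- c(1, 2, 3)      # hypothetical stand-ins for yuan and swiss
swiss_demo <- c(10, 20, 30)
currency <- cbind(yuan_demo, swiss_demo)   # 3 x 2 matrix

path <- tempfile(fileext = ".txt")
# write() walks through the matrix column by column, so transpose first
# and ask for 2 values per line to get one (yuan, swiss) pair per line.
write(t(currency), path, ncolumns = 2)
lines_out <- readLines(path)

# Reading back: fill the matrix row by row.
back <- matrix(scan(path, quiet = TRUE), ncol = 2, byrow = TRUE)
```

Here lines_out holds the three text lines of the file, and back recovers the original 3 x 2 layout.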
Simple Data Analysis
We already have the data yuan and swiss. They describe exchange rates of the Chinese Yuan and the Swiss Franc, respectively, vs. the US Dollar. Before you start with statistical analysis, note that financial data are not stationary. If $S_t$, $t = 1, \ldots, n$, describes the value of a financial instrument at time $t$, the typical transformation is
$$R_t = \log\left(\frac{S_t}{S_{t-1}}\right), \qquad t = 2, \ldots, n.$$
We will write a script which computes log-returns for yuan and swiss, then performs a basic statistical analysis. This will be stored in the file Returns.R.
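The log-return transformation can be checked on a few numbers before applying it to real data. A minimal sketch with made-up prices; diff(log(S)) is an equivalent shortcut, not something used on the slides.

```r
S <- c(100, 102, 101, 105)   # made-up prices, for illustration only
n <- length(S)

R1 <- log(S[2:n] / S[1:(n - 1)])   # the formula above, written out
R2 <- diff(log(S))                 # same thing: log(S_t) - log(S_{t-1})
```

Writing 2:n and 1:(n - 1) instead of fixed indices avoids hard-coding the sample size. The parentheses in 1:(n - 1) matter, because 1:n - 1 means (1:n) - 1.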
Script Returns.R
    yuanreturns=yuan[2:500]/yuan[1:499]
    yuanreturns=log(yuanreturns)
    swissreturns=log(swiss[2:500]/swiss[1:499])
    print(summary(yuanreturns))    # Print summary statistics
    print(summary(swissreturns))
    plot(yuanreturns)              # Point plot
    plot(swissreturns)
This is not a good script, since the first plot is invisible: the second plot replaces it immediately. You may add the command par() to place both plots in one window. You may also make the pictures much nicer.
    yuanreturns=yuan[2:500]/yuan[1:499]
    yuanreturns=log(yuanreturns)
    swissreturns=log(swiss[2:500]/swiss[1:499])
    print("Summary statistics for YUAN")
    print(summary(yuanreturns))
    print("Summary statistics for SWISS")
    print(summary(swissreturns))
    par(mfrow=c(1,2))              # two plots side by side
    plot(yuanreturns,xlab="Time",type="l",col="blue")
    plot(swissreturns,main="SWISS vs. USD",ylab="SWISS log-returns")
The script still has a lot of disadvantages: data set names have to be replaced manually, and everything is printed out at the same time. We will write a function which will perform a simple data analysis for a given data set.
Function DataAnalysis
    DataAnalysis<-function(data,gr=TRUE){
      print(summary(data))
      if(gr){
        readline(prompt = "Press <Enter> to continue...")
        par(mfrow=c(1,1))
        plot(data,type="l",col="blue")           # time plot
        abline(mean(data),0,col="red")           # horizontal line at the mean
        readline(prompt = "Press <Enter> to continue...")
        boxplot(data)
        readline(prompt = "Press <Enter> to continue...")
        hist(data,breaks=floor(sqrt(length(data))),prob=TRUE)
        curve(dnorm(x,mean(data),sd(data)),add=TRUE,col="red")   # fitted normal density
        readline(prompt = "Press <Enter> to continue...")
        qqnorm(data); qqline(data,col="red")     # normal QQ plot
      }
    }
Statistical tests
• There is just one command with many possible specifications: t.test(x,y).
• Variables:
    • x - data set, y - second data set, default=NULL;
    • alternative=c("two.sided","less","greater");
    • paired=FALSE;
    • conf.level=0.95.
• Usage:
    • t.test(x,alternative="less");
    • results=t.test(x,y,paired=TRUE,conf.level=0.97).
• Results of performing the t.test function are stored in the object results, which is an object of class "htest".
• a=t.test(yuanreturns); a$statistic; a$statistic[[1]]
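To see what the returned object contains, one can run the test on simulated data and pull out individual components. This is a sketch under stated assumptions: the data are simulated rather than yuanreturns, the seed is arbitrary, and the names res, tstat, pval, ci and by_hand are introduced only for illustration.

```r
set.seed(1)                          # arbitrary seed, for reproducibility
x <- rnorm(50, mean = 0.5, sd = 1)   # simulated data, not yuanreturns

res <- t.test(x, alternative = "greater", conf.level = 0.97)

class(res)                    # "htest"
names(res)                    # components available via $
tstat <- res$statistic[[1]]   # the t statistic as a plain number
pval  <- res$p.value
ci    <- res$conf.int         # one-sided 97% confidence interval

# The statistic can be checked by hand: (mean - 0) / (sd / sqrt(n))
by_hand <- mean(x) / (sd(x) / sqrt(length(x)))
```

For a one-sided alternative the confidence interval is one-sided as well, so its upper end here is Inf.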
Linear Regression
Having data $(X_t, Y_t)$, $t = 1, \ldots, n$, from the model
$$Y_t = \beta_0 + \beta_1 X_t + \varepsilon_t, \qquad t = 1, \ldots, n,$$
we want to estimate the parameters $\beta_0$ and $\beta_1$ using the mean squared error criterion.
• The basic function for the linear regression model is lm(y~x).
• Write lm(y~x-1) to fit a regression line without intercept.
• The output is a linear model object. If lmresults=lm(y~x), then use:
    • lmresults$coefficients to get the intercept and slope;
    • lmresults$residuals to get the residuals;
    • lmresults$fitted.values to get the fitted values.
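The components listed above can be inspected on simulated data, where the true coefficients are known. A minimal sketch: the data, the seed, and the names fit, fit0, beta, b0 and b1 are assumptions made for this illustration.

```r
set.seed(2)
x <- seq(0, 10, length.out = 100)
y <- 1 + 2 * x + rnorm(100, sd = 0.5)   # simulated: beta0 = 1, beta1 = 2

fit  <- lm(y ~ x)
beta <- fit$coefficients                # named vector: (Intercept), x

# Closed-form least squares estimates, for comparison with lm()
b1 <- cov(x, y) / var(x)
b0 <- mean(y) - b1 * mean(x)

fit0 <- lm(y ~ x - 1)                   # no intercept: a single coefficient
```

Fitted values and residuals always add up to the observed y, which is a quick sanity check on any fit.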
Function Linear Regression
    LinearRegression<-function(x,y) {
      cat("Would you like to fit with intercept? (Y/N) ")
      answer <- readline()
      pr=switch(answer,y=,Y=TRUE,FALSE)      # TRUE if the answer is y or Y
      pr1=switch(answer,n=,N=TRUE,FALSE)     # TRUE if the answer is n or N
      if(!pr && !pr1) stop("Please answer Y or N")
      if(pr){lmresults=lm(y~x)}              # with intercept
      if(pr1){lmresults=lm(y~x-1)}           # without intercept
      par(mfrow=c(1,2))
      plot(x,y,xlab=""); abline(lmresults)   # data with the fitted line
      plot(x,lmresults$residuals,ylab="Residuals")
      readline(prompt = "Press <Enter> to continue...")
      print(lmresults)
    }
Simulation
• Simulation of basic random variables: b=rnorm(6).²
    BrownianMotion<-function(n=1000,m=10) {
      print("Simulate one path of BM")
      par(mfrow=c(1,1))
      l=3*sqrt(n)                            # plotting range: three standard deviations at time n
      plot(cumsum(rnorm(n)),type="l",col="red",ylim=c(-l,l),xlab="")
      readline(prompt = "Press <Enter> to continue...")
      print("Simulation of"); print(m); print("paths of BM")
      for (s in 1:m){ points(cumsum(rnorm(n)),type="l") }
    }
² See Tables for other simulation functions.
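Bootstrap
• Given a sample $X_1, \ldots, X_n$, let $T(X_1, \ldots, X_n) = \Theta_n$ be an estimator of a quantity $\theta$.
• A confidence interval for $\theta$ typically has the following form
$$\left[\Theta_n - z_{\alpha/2}\,\mathrm{se}(\Theta_n),\ \Theta_n + z_{\alpha/2}\,\mathrm{se}(\Theta_n)\right],$$
where $z_{\alpha/2}$ is the $1-\alpha/2$ quantile of $N(0,1)$.
• Example 1: $X_i \sim N(\mu, \sigma^2)$ and $\Theta_n = \bar{X}_n$, then $\mathrm{se}(\Theta_n) = \sigma/\sqrt{n}$ and we can easily estimate $\sigma$ by its sample version.
• Example 2: $X_i \sim F$ and $\Theta_n = \mathrm{med}(X_1, \ldots, X_n)$, then $\mathrm{se}(\Theta_n) = \,???$.
• Solution: use bootstrap to get $\widehat{\mathrm{se}}(\Theta_n)$, an estimate of $\mathrm{se}(\Theta_n)$.
• A bootstrap confidence interval constructed in this way works well when the distribution of $\Theta_n$ is approximately normal.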
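Bootstrap - ctd.
    n=1000; m=700; k=500          # sample size, resample size, number of resamples
    X=rnorm(n,1,1)
    TX=median(X)
    TXboot=1:k                    # placeholder, overwritten in the loop
    for (s in 1:k) {
      Xstar=sample(X,m,replace=TRUE)
      TXboot[s]=median(Xstar)
    }
    print(sd(TXboot))             # bootstrap estimate of the standard error
    qqnorm(TXboot)                # check approximate normality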
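The bootstrap standard error can then be plugged into the normal-type confidence interval for the median. The sketch below is one possible way to do this, not the author's code: it resamples n points rather than m = 700, uses replicate() instead of a loop, and the seed and the names boot, se_hat and ci are chosen here for illustration.

```r
set.seed(3)
n <- 1000; k <- 500
X <- rnorm(n, mean = 1, sd = 1)

theta_hat <- median(X)
# Bootstrap replicates of the median; resamples of full size n are an
# assumption made here (the slide resamples m = 700 points).
boot <- replicate(k, median(sample(X, n, replace = TRUE)))
se_hat <- sd(boot)

alpha <- 0.05
z  <- qnorm(1 - alpha / 2)        # 1 - alpha/2 quantile of N(0, 1)
ci <- c(theta_hat - z * se_hat, theta_hat + z * se_hat)
```

The interval is only as good as the normal approximation, so a qqnorm(boot) check is a natural companion to it.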
Random Matrices
Let A be an n × n matrix with independent N(0, 1) entries. Then the real symmetric matrix B = A + Aᵀ belongs to the Gaussian Orthogonal Ensemble (the Gaussian Unitary Ensemble is its complex Hermitian counterpart). Its eigenvalues follow a particular pattern: after scaling by sqrt(2n), their histogram approaches Wigner's semicircle law on [-2, 2].
    n = 5000
    A = array(rnorm(n*n),c(n,n))
    B = (A+t(A))/sqrt(2*n)
    Eigens <- eigen(B, symmetric=TRUE)
    Eigenvalues <- Eigens$values
    hist(Eigenvalues,xlab="Eigenvalues",freq=TRUE)
Note: the RMTstat package deals with different aspects of random matrices.
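A smaller version of the experiment runs in a fraction of a second and lets one overlay the semicircle density. This is a sketch with assumptions: n = 300 instead of 5000, an arbitrary seed, and plotting restricted to interactive sessions.

```r
set.seed(4)
n <- 300                                  # much smaller than 5000, to run fast
A <- matrix(rnorm(n * n), n, n)
B <- (A + t(A)) / sqrt(2 * n)             # symmetric, off-diagonal variance 1/n
ev <- eigen(B, symmetric = TRUE, only.values = TRUE)$values

if (interactive()) {
  hist(ev, breaks = 30, prob = TRUE, xlab = "Eigenvalues", main = "")
  # Semicircle density sqrt(4 - x^2) / (2 * pi) on [-2, 2]
  curve(sqrt(pmax(4 - x^2, 0)) / (2 * pi), from = -2, to = 2,
        add = TRUE, col = "red")
}
```

The option only.values = TRUE skips the eigenvectors, which is where most of the time goes for large n.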
Kernel Smoothing in Time Series
• Recall the swiss data $X_t$, $t = 1, \ldots, n$. We could think about a model
$$X_t = f(t/n) + \varepsilon_t, \qquad t = 1, \ldots, n.$$
• We can estimate $f$ via a kernel estimator, e.g.
$$\hat{f}_h(x) = \sum_{t=1}^{n} \ell_t(x) X_t,$$
where
$$\ell_t(x) = \frac{K\left(\frac{x - t/n}{h}\right)}{\sum_{s=1}^{n} K\left(\frac{x - s/n}{h}\right)}.$$
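The weights in this estimator are non-negative and sum to one, so the estimate at each point is a weighted average of the data. A minimal sketch on simulated data; the test function sin(2*pi*x), the seed, the bandwidth, and the names grid, weights and fhat are assumptions made for this illustration.

```r
set.seed(5)
n <- 200
grid <- (1:n) / n
X <- sin(2 * pi * grid) + rnorm(n, sd = 0.3)   # simulated data, f = sin(2*pi*x)

K <- function(u) dnorm(u)                      # Gaussian kernel

# Weights l_t(x) for a single point x; they are non-negative and sum to 1.
weights <- function(x, h) {
  k <- K((x - grid) / h)
  k / sum(k)
}
fhat <- function(x, h) sum(weights(x, h) * X)

h <- 0.05
fhatvec <- sapply(grid, fhat, h = h)           # estimate on the whole grid

if (interactive()) {
  plot(grid, X, xlab = "Time", ylab = "Data")
  lines(grid, fhatvec, col = "blue")
}
```

Passing h as an explicit argument, rather than reading it from the enclosing environment, makes it easy to compare several bandwidths.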
    my.data=swiss; n=length(my.data); ggrid=1:n/n
    H=c(0.025,0.05,0.1)                      # bandwidths
    K<-function(x) { dnorm(x) }              # Gaussian kernel
    # fhat uses the current bandwidth h set by the loop below
    fhat<-function(x){ sum(my.data*K((x-ggrid)/h))/sum(K((x-ggrid)/h)) }
    par(mfrow=c(1,3))
    fhatvec=1:n
    for (h in H){
      temp=1
      for (i in ggrid) { fhatvec[temp]=fhat(i); temp=temp+1 }
      plot(ggrid,my.data,xlab="Time",ylab="Data",type="p")
      points(ggrid,fhatvec,type="l",col="blue")
    }
Note: for other smoothing techniques, e.g. wavelets, see waved, wavelets.
Some commands

Table 1: Basic functions
sqrt()                           Square root.
abs()                            Absolute value.
sin(), cos(), tan()              Trigonometric functions.
exp()                            Exponential function.
log(), log2(), log10(), logb()   Logarithmic functions.

Table 2: Basic vector functions
sum(), prod()                    Sum and product of all elements.
cumsum(), cumprod()              Cumulative sums and products.
length()                         Length of a vector.
sort(), rev(sort())              Sort a vector in increasing and decreasing order.

Table 3: Sequences
seq(a,b,by=x)                    Sequence from a to b with the step x. Default x = 1.
seq(a,b,length.out=n)            Sequence from a to b of length n.
rep(a,n)                         Repeat the value a n times.
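A few of the functions from Tables 2 and 3 in action. The variable names are chosen here only for illustration.

```r
s1  <- seq(0, 1, by = 0.25)        # 0.00 0.25 0.50 0.75 1.00
s2  <- seq(0, 1, length.out = 5)   # the same five points
s3  <- rep(7, 3)                   # 7 7 7
cs  <- cumsum(1:5)                 # 1 3 6 10 15
srt <- rev(sort(c(3, 1, 2)))       # decreasing order: 3 2 1
```

Use by when the step matters and length.out when the number of points matters.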
Table 4: Matrix operations
dim()                            Dimension.
%*%                              Matrix multiplication.
t()                              Matrix transpose.
det()                            Matrix determinant.
solve()                          Inverse.
eigen()                          Eigenvalues.

Table 5: Graphics
curve(expr)                      expr is an expression in x, written as e.g. x^2.
abline(a,b)                      Adds a line with intercept a and slope b.
points()                         Overlaying plots.

Table 6: Distributions
dname()                          Density (probability) function.
pname()                          Distribution function P(X ≤ x).
qname()                          Quantiles.
rname()                          Simulation.
"name"                           binom, pois, unif, exp, gamma, t, norm
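The naming scheme of Table 6 is easiest to remember from a concrete case. The sketch below uses the normal family, plus one binomial call to show that the same prefixes carry over; variable names and the seed are chosen here for illustration.

```r
d <- dnorm(0)        # density of N(0,1) at 0, equal to 1/sqrt(2*pi)
p <- pnorm(1.96)     # P(X <= 1.96), about 0.975
q <- qnorm(0.975)    # 0.975 quantile, about 1.96
set.seed(6)
r <- rnorm(5)        # five simulated N(0,1) values

# The same prefixes work for the other families, e.g. binomial:
pb <- pbinom(3, size = 10, prob = 0.5)   # P(X <= 3) for Binomial(10, 0.5)
```

The p and q functions are inverses of each other for continuous distributions, so pnorm(qnorm(u)) returns u.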
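Table 7: Basic Statistics
mean(), median()                 Sample mean and median.
sd(), var()                      Sample standard deviation and variance.
min(), max()                     Minimum and maximum.
summary()                        Summary statistics.
quantile()                       Sample quantiles.
cor(), cov()                     Correlation and covariance.
hist()                           Histogram.
boxplot()                        Boxplot.
stem()                           Stem-and-leaf plot.
ecdf()                           Empirical distribution function.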