Subject: Statistics

Paper: Advanced R

Module: Programming in R: Part 2

Principal investigator: Prof. Bhaswati Ganguli, University of Calcutta

Paper co-ordinator: Dr. Abhra Sarkar, Duke University

Content writer: Ms. Rimli Sengupta

Content writer: Ms. Moumita Chatterjee, West Bengal State University, Kolkata

Content reviewer: Dr. Santu Ghosh, Shri Ramachandra University

Copyright University Grants Commission

To define user-defined functions

- The basic functions that R provides are often insufficient to solve the problem at hand. In that case you can always define your own function, known as a "user-defined function", by writing a programme.
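
As a minimal sketch of what a user-defined function looks like, the following computes the coefficient of variation of a numeric vector. The function name `coef_of_variation` and the data in `marks` are hypothetical, chosen only for illustration.

```r
# Hypothetical helper: coefficient of variation of a numeric vector
coef_of_variation <- function(x) {
  sd(x) / mean(x)
}

marks <- c(12, 15, 11, 18, 14)  # illustrative data
coef_of_variation(marks)
```

Once defined, the function can be called like any built-in R function.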

Why do we need to programme?

To create libraries

- Another reason is scale: if you want to do the same thing as discussed previously but on a larger scale, that is, to create a library containing a series of functions, then programming is required almost everywhere.

To execute simulation studies

- Simulation studies are an important part of statistics nowadays. In a simulation study a programme has to be repeated a large number of times, and several programming techniques are used to minimize both your effort and the time required.

An Example

Here, the problem is to find the distribution of the sample median for data from a Cauchy distribution.

We simulate 10000 samples, each of size 10, and then calculate the median of each of the 10000 samples.

Example

Have a look at the histogram produced by Code 1. It shows the distribution of the sample median.

Make your programme as generic as possible

Look at Code 2. Here, the objective is to compute the mean of each of 200 simulated samples, each of size 20. The first version works.

But after some time has passed, it may be hard to tell which number is the sample size and which is the simulation size.

Now look at the programme next to it. There the sample size and the simulation size are specified separately, which makes the programme easier to understand.

Programming Style

Indent your programme

Have a look at the pair of programmes given under Code 3. They are actually the same: indenting the first one gives the second. The second one

- looks nicer

- helps us to avoid a lot of errors

Give meaningful variable names

- It is always recommended to give a proper name to each of the variables involved in your programme.

- This helps us to understand the programme at a glance, even after a substantial amount of time.

- Look at the two sets of code in Code 4.

- The first set is valid, but not user friendly.

- The second set would help even someone who has not written the programme, and is only using it, to understand it.

Use matrices for faster computation

- The use of matrices is an important part of programming style, because it helps your programme run much faster. For example, look at the two sets of code in Code 5. They give the same result, as the same logic is being used.

- But the second one is faster than the first.

- So, where possible, replace loops with matrix operations in order to have faster computation.

Add comments using the # sign.

- Here comes another useful aspect of programming style.

- You can always add comments with a "hash" sign (#). These help you to recall the objective behind the code.

- You can add whatever you want, because R does not execute anything that comes after the "hash" sign on a line.

- But don't overuse it. Too many comments can clutter the programme.

- Look at Code 6 for a better understanding.

Loop structure

- So, let's look at the structure of some of the control structures that we often come across.

- We will consider "for" and "while" loops, and the "if" and "if else" conditional statements, here.

- Let's consider them one by one.

Structure of loops in R

for loop

Consider a "for" loop.

- Here we specify the index variable and the range of values it takes within parentheses after the word "for".

- Then we write the R code to be repeated within curly braces. The braces are optional only when the body is a single expression.

- The final structure would be as follows:

for (i in 1:10) {
  ... R code ...
}
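
A minimal sketch of this structure in use: the loop below accumulates the sum of the squares of the first 10 integers. The variable name `total` is hypothetical.

```r
# Sum of the squares of the first 10 integers, accumulated in a for loop
total <- 0
for (i in 1:10) {
  total <- total + i^2
}
total  # same value as sum((1:10)^2)
```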

while loop

Consider a "while" loop.

- This is used primarily when we want a piece of code to keep running as long as a certain condition holds.

- The condition is a logical one.

- We need to specify the logical condition within parentheses written after the word "while".

- Then we write the R code within curly braces, which are optional only when the body is a single expression.

- So, the final structure would be as follows:

while (logical condition) {
  ... R code ...
}
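
As an illustration, the following sketch keeps doubling a value for as long as it is below 100 and counts how many doublings were needed. The names `value` and `steps` are hypothetical.

```r
# Keep doubling a value while it is below 100
value <- 1
steps <- 0
while (value < 100) {
  value <- value * 2
  steps <- steps + 1
}
value  # first power of 2 that is not below 100
steps  # number of doublings needed
```

The condition is checked before every pass, so the body never runs if the condition is false at the start.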

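if statement

Consider the "if" statement. Strictly speaking it is a conditional statement rather than a loop.

- Like "while", it tests a logical condition, but it runs the code at most once instead of repeatedly.

- The structure is almost the same as that of the "while" loop.

- So, here also, the structure is as follows:

if (logical condition) {
  ... R code ...
}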
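
A short sketch of this structure: the code inside the braces runs once, and only when the condition is true. The variable `x` and its starting value are illustrative.

```r
x <- -3
if (x < 0) {
  x <- -x  # runs once, only because the condition is TRUE
}
x
```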

if else statement

Consider the "if else" statement.

- The "if else" statement is an extension of the "if" statement.

- The "else" part provides the R code to be run when the logical condition does not hold.

- The structure is as follows. Note that "else" must be on the same line as the closing brace of the "if" part; otherwise, at the top level, R treats the "if" as complete and reports an error at "else".

if (logical condition) {
  ... R code ...
} else {
  ... R code ...
}
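
A minimal sketch of this structure, wrapped in a small function that labels a p-value at the 5% level. The function name `decision` and the labels it returns are hypothetical.

```r
# Hypothetical helper: label a p-value at the 5% level
decision <- function(p.value) {
  if (p.value < 0.05) {
    "reject H0"
  } else {
    "do not reject H0"
  }
}

decision(0.01)
decision(0.20)
```

Exactly one of the two branches runs for each call.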

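"stop" and "break" commands

- Suppose your programme might fall into an infinite loop, or

- suppose you want to change something in your programme while it is still running.

- Then the "break" and "stop" commands can be written into the code to stop execution: "break" leaves the enclosing loop, and "stop" halts the programme with an error message. A programme that is already running can be interrupted from the keyboard with Esc (in the R GUI or RStudio) or Ctrl+C (in a terminal).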
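
The following sketch shows both commands inside code. The `repeat` loop would run forever without `break`; `tryCatch()` is used here only so that the example can continue after `stop()` and show its message. The names `count` and `msg` and the error text are hypothetical.

```r
# break: leave a loop as soon as a condition is met
count <- 0
repeat {
  count <- count + 1
  if (count >= 5) {
    break
  }
}
count

# stop(): halt with an error message; tryCatch() captures the error
# so that the script can carry on and display the message
msg <- tryCatch(
  stop("sample size must be positive"),
  error = function(e) conditionMessage(e)
)
msg
```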

Problem

Write a programme that compares the power of the two-sample t-test with that of the Wilcoxon and Kolmogorov-Smirnov tests when the underlying data are normal.

- First, open an R script, name it example, and save the script in the R home directory.

- Add the purpose of the programme and the modification date as comments.

- Here the purpose is to simulate the power of the two-sample t-test vs various non-parametric alternatives, and the date of modification is 1/6/15.

- So, it will look like below:

#### R programme for simulating the power of the two sample t test vs various
#### non-parametric alternatives.
#### Created : 1/6/15.

- Next, specify the sample size and the number of simulations.

- Here, the simulation size is 200 and the sample size is 10, which we write as:

sim.size = 200
sample.size = 10

- First, specify the means of the two populations.

- Set the first population mean to zero and run the simulation for a range of values of the difference in means.

- Here delta is the difference between the two population means, and it takes a sequence of 50 values between -2 and 2.

This will be specified through the following code:

mu1 = 0
delta = seq(-2,2, length=50)

- Here, we specify the seed value, in order to get exactly the same set of random numbers in future as well.

This can be done with the following:

set.seed(231)
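
A quick way to see the effect of the seed is to draw the same random numbers twice, as in the sketch below. The names `first.draw` and `second.draw` are hypothetical.

```r
set.seed(231)
first.draw <- rnorm(5)

set.seed(231)            # resetting the same seed ...
second.draw <- rnorm(5)  # ... reproduces the same numbers

identical(first.draw, second.draw)
```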

Finally, applying everything that we have gathered so far from this module, our programme will look like the one given in Code 7.

Necessary R codes corresponding to this module

Moumita Chatterjee

Code 1

Consider the following R code. Here, we simulate 10000 samples, each of size 10. Next, we calculate the median of each of the 10000 samples. Finally, we draw the histogram, which gives us the distribution of the sample median.

n <- 10
nsim <- 10000
theta.hat <- double(nsim)
for (i in 1:nsim) {
  x <- rcauchy(n)
  theta.hat[i] <- median(x)
}
mean(theta.hat^2)

## [1] 0.3400206

# proc.time()[1] is the user CPU time of the R session so far
cat("Calculation took", proc.time()[1], "seconds.\n")

## Calculation took 3.12 seconds.

hist(theta.hat, freq = FALSE, breaks = 100)
# Normal density with variance estimated by mean(theta.hat^2)
curve(dnorm(x, sd = sqrt(mean(theta.hat^2))), add = TRUE)
# Normal density with the asymptotic variance 1 / (4 n f(0)^2)
curve(dnorm(x, sd = sqrt(1 / (4 * n * dcauchy(0)^2))), add = TRUE, col = "red")

[Figure: Histogram of theta.hat. x-axis: theta.hat; y-axis: Density.]
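
The same simulation can be written more compactly. The sketch below is an alternative, not the module's own code: it uses `replicate()` in place of the explicit loop and `system.time()` to time only the simulation step. The seed value 1 and the name `elapsed` are assumptions added for illustration.

```r
n <- 10
nsim <- 10000
set.seed(1)  # arbitrary seed, assumed here only for reproducibility

# Time only the simulation step
elapsed <- system.time(
  theta.hat <- replicate(nsim, median(rcauchy(n)))
)

length(theta.hat)   # one median per simulated sample
elapsed["elapsed"]  # wall-clock seconds for the simulation
```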

Code 2

Consider the two programmes below. Both are intended to compute the same thing, but look at where the difference lies.

Instead of

samp.mean <- rep(0,200)
for (i in 1:200){
  samp.mean[i] <- mean(rnorm(20))
}

A more generic alternative is:

sample.size <- 20
simulation.size <- 200
# One sample mean per simulation, so a vector of length simulation.size
samp.mean <- rep(0, simulation.size)
for (i in 1:simulation.size){
  samp.mean[i] <- mean(rnorm(sample.size))
}
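
Going one step further than the module's code, the two sizes can be made arguments of a function, so that the same programme serves any combination of them. This is a sketch under that assumption; the function name `simulate_means` and the result names are hypothetical.

```r
# Hypothetical wrapper: means of simulated standard normal samples
simulate_means <- function(simulation.size = 200, sample.size = 20) {
  samp.mean <- numeric(simulation.size)
  for (i in seq_len(simulation.size)) {
    samp.mean[i] <- mean(rnorm(sample.size))
  }
  samp.mean
}

means.default <- simulate_means()
means.large <- simulate_means(simulation.size = 1000, sample.size = 50)
length(means.default)
length(means.large)
```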

Code 3

Look at the following code. It is all about indenting a programme. The statements sit at different levels of nesting: the inner loop runs inside the outer loop, and the assignment runs inside the inner loop. So they should not all start at the same distance from the margin.

Instead of

sample.size <- 20
simulation.size <- 200
samp.mean <- matrix(rep(0,4000),nrow=200,ncol=20)
for (i in 1:simulation.size){
for (j in 1:sample.size)
{
samp.mean[i,j] <- mean(rnorm(mean=j,sd=1, n=sample.size))
}
}

Use

sample.size <- 20
simulation.size <- 200
samp.mean <- matrix(rep(0,4000),nrow=200,ncol=20)
for (i in 1:simulation.size){
  for (j in 1:sample.size){
    samp.mean[i,j] <- mean(rnorm(mean=j,sd=1, n=sample.size))
  }
}

Code 4

Have a look at the following two sets of code.

Instead of

m <- 200
n <- 20
x <- matrix(rep(0,4000),nrow=200,ncol=20)
for (i in 1:m){
  for (j in 1:n){
    x[i,j] <- mean(rnorm(mean=j,sd=1, n=20))
  }
}

This is valid but not user friendly.

Use the following:

# use simulation size instead of m
simulation_size <- 200
# use sample size instead of n
sample_size <- 20
# use sample mean instead of x
sample_mean <- matrix(rep(0,4000),nrow=200,ncol=20)
for (i in 1:simulation_size){
  for (j in 1:sample_size){
    sample_mean[i,j] <- mean(rnorm(mean=j,sd=1, sample_size))
  }
}

Code 5

Look at the following two sets of code. The first one uses two loops: for each value of "i", R runs through the different values of "j", adds i and j, and stores the result in A. This element-by-element work makes the programme a bit slow.

A <- matrix(0,500,500)
for (i in 1:500)
  for (j in 1:500)
    A[i,j] <- i + j

Instead of this, consider the second one. A matrix is built whose (i, j) element is i, so its transpose has j as its (i, j) element; adding the matrix to its transpose gives the desired result. This takes much less time, and the difference grows as the number of elements increases.

I.mat <- matrix(seq(1,500), nrow=500, ncol=500)
A <- I.mat + t(I.mat)
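
The claim about speed can be checked directly. The sketch below times both versions with `system.time()` and confirms that they give the same matrix; it also shows `outer()`, a base R function not used in the module, which builds the same table of sums in one call. The names `size`, `A_loop`, `A_mat`, `A_outer`, `loop_time` and `mat_time` are hypothetical, and the timings depend on the machine.

```r
size <- 500

# Version 1: double loop
loop_time <- system.time({
  A_loop <- matrix(0, size, size)
  for (i in 1:size) {
    for (j in 1:size) {
      A_loop[i, j] <- i + j
    }
  }
})

# Version 2: matrix plus its transpose
mat_time <- system.time({
  I.mat <- matrix(seq(1, size), nrow = size, ncol = size)
  A_mat <- I.mat + t(I.mat)
})

# Version 3: outer() builds the same table of sums directly
A_outer <- outer(1:size, 1:size, "+")

all(A_loop == A_mat)    # the loop and matrix versions agree
all(A_mat == A_outer)   # and so does outer()
loop_time["elapsed"]
mat_time["elapsed"]
```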

Code 6

Look at the following R code. Here, we have added comments for the specification of the simulation size, the sample size and the number of repetitions, and for the calculation of the sample mean.

# Add comments using the # sign.

simulation.size <- 200 # Set simulation size
samp.size <- 20 # Set sample size
num.times <- 10 # Set number of repetitions

# One row per simulation, one column per repetition
samp.mean <- matrix(0, nrow=simulation.size, ncol=num.times)
for (i in 1:simulation.size)
{
  for (j in 1:num.times){
    # Calculate the sample mean for i,j th observation
    samp.mean[i,j] <- mean(rnorm(mean=j,sd=1, n=samp.size))
  }
}

Code 7

Let us write a programme which will compare the power of the two-sample t-test with that of the Wilcoxon and Kolmogorov-Smirnov tests when the underlying data are normal.

#### R programme for simulating the power of the two sample t test vs various
#### non-parametric alternatives
#### 1/6/15

sim.size <- 200
sample.size <- 10
set.seed(231)
mu1 <- 0
delta <- seq(-2,2, length=50)

# Estimated power of each test, one entry per value of delta
pow.ttest <- rep(0,length(delta))
pow.wtest <- rep(0,length(delta))
pow.kstest <- rep(0,length(delta))

# Rejection indicators, one entry per simulated pair of samples
pt.test <- rep(0,sim.size)
pw.test <- rep(0,sim.size)
pks.test <- rep(0,sim.size)

for (j in 1:length(delta)){
  mu2 <- mu1 + delta[j]
  for (i in 1:sim.size){
    # Generate ith sample
    samp1 <- rnorm(mean=mu1,sample.size)
    samp2 <- rnorm(mean=mu2,sample.size)

    # Perform ith set of tests
    test1 <- t.test(samp1, samp2,alternative = c("two.sided"))
    pt.test[i] <- (test1$p.value < 0.05)

    test2 <- wilcox.test(samp1, samp2,alternative = c("two.sided"),exact = TRUE)
    pw.test[i] <- (test2$p.value < 0.05)

    test3 <- ks.test(samp1, samp2,alternative = c("two.sided"),exact = TRUE)
    pks.test[i] <- (test3$p.value < 0.05)
  }
  pow.ttest[j] <- sum(pt.test)/sim.size # Calculate powers for jth setting
  pow.wtest[j] <- sum(pw.test)/sim.size
  pow.kstest[j] <- sum(pks.test)/sim.size
} # End of j loop
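
As a compact variation on the same idea, the inner loop can be wrapped in a function that returns the estimated power for one value of the mean difference. This is only a sketch, not the module's programme: it covers the t-test alone, uses 9 values of delta instead of 50 to keep it quick, and the function name `sim_power_ttest` and the result name `pow` are hypothetical.

```r
# Hypothetical helper: estimated power of the two-sample t-test
# for one value of the difference in population means
sim_power_ttest <- function(delta, sim.size = 200, sample.size = 10) {
  rejected <- replicate(sim.size, {
    samp1 <- rnorm(sample.size, mean = 0)
    samp2 <- rnorm(sample.size, mean = delta)
    t.test(samp1, samp2)$p.value < 0.05
  })
  mean(rejected)  # proportion of rejections = estimated power
}

set.seed(231)
delta <- seq(-2, 2, length.out = 9)
pow <- sapply(delta, sim_power_ttest)

data.frame(delta = delta, power = pow)
```

The estimated power is lowest near delta = 0, where it is close to the 5% level, and rises towards 1 as the means move apart. The result can be drawn with plot(delta, pow, type = "b").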