Data Analysis Using R: 3. Graphical Analyses

Data Analysis Using R: 3. Graphical Analyses Tuan V. Nguyen Garvan Institute of Medical Research, Sydney, Australia


Page 1: Data Analysis Using R: 3. Graphical Analyses

Data Analysis Using R: 3. Graphical Analyses

Tuan V. Nguyen

Garvan Institute of Medical Research,

Sydney, Australia

Page 2: Data Analysis Using R: 3. Graphical Analyses

Overview

• Data
• Barchart
• Histogram
• Stripchart
• Boxplot
• Scatter plot

Page 3: Data Analysis Using R: 3. Graphical Analyses

Data

• Body composition data measured by dual energy X-ray absorptiometry

• 43 men and women, aged between 11 and 28
• Variable names:
– id
– age
– sex
– dur
– weight
– height
– lm (lean mass)
– pclm (percent lean mass)
– fm (fat mass)
– pcfm (percent fat mass)
– bmc (bone mineral content)

Page 4: Data Analysis Using R: 3. Graphical Analyses

Reading data into R

setwd("c:/works/stats")

bc <- read.table("comp.txt", header=TRUE)

attach(bc)

names(bc)

[1] "id" "age" "sex" "dur" "weight" "height" "lm" "pclm"

[9] "fm" "pcfm" "bmc"
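attach() keeps the slides short, but it puts column names such as lm on the search path alongside functions of the same name. A minimal sketch of the same step without attach(), assuming the file layout shown on the slides; the inline text is only a stand-in for comp.txt and holds the first three rows from the View data slide:

```r
# Stand-in for comp.txt: header plus the first three rows shown on the "View data" slide
txt <- "id age sex dur weight height lm pclm fm pcfm bmc
1 15 M 5 39 148 32.96 84.50 4.86 12.5 1.33
2 16 M 8 45 162 38.16 84.80 4.15 9.2 1.89
3 11 M 4 23 132 18.51 80.50 2.99 13.0 0.74"

# read.table() accepts text= in place of a file name
bc <- read.table(text = txt, header = TRUE)

str(bc)                   # column types and a preview of the values
mean(bc$lm)               # refer to a column explicitly with $
with(bc, mean(weight))    # or evaluate an expression inside the data frame
```

With the real file, only the read.table() call changes; the $ and with() forms work the same way.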

Page 5: Data Analysis Using R: 3. Graphical Analyses

View data

bc

   id age sex dur weight height    lm  pclm   fm pcfm  bmc
1   1  15   M   5     39    148 32.96 84.50 4.86 12.5 1.33
2   2  16   M   8     45    162 38.16 84.80 4.15  9.2 1.89
3   3  11   M   4     23    132 18.51 80.50 2.99 13.0 0.74
4   4  19   M   9     46    159 35.92 78.10 6.73 14.6 1.59
5   5  19   M   6     56    166 46.63 83.00 5.61 10.2 2.56
6   6  22   M  12     50    152 42.13 84.00 3.93  8.1 2.12
7   7  16   M   8     53    170 45.23 85.00 5.15  9.8 2.21
8   8  12   M   5     35    151 25.26 72.20 9.02 25.6 0.95
9   9  21   M   8     46    166 39.44 85.70 4.64 10.1 2.00
10 10  15   M   6     45    165 38.47 85.50 3.92  8.9 1.70
11 11  13   M   5     32    142 25.50 79.70 4.26 13.9 0.99
12 12  20   M   6     40    153 32.70 82.00 4.66 12.0 1.38
...
40 40  12   M  10     39    155 33.00 84.60 3.50  9.2 1.43
41 41  15   M   6     45    154 36.00 80.00 5.33 12.5 1.52
42 42  22   M   7     46    157 38.50 84.00 4.63 10.3 1.86
43 43  25   M  13     45    162 37.35 83.00 4.34 10.0 1.70

Page 6: Data Analysis Using R: 3. Graphical Analyses

Counting: barplot

freq <- table(sex)
barplot(freq)
barplot(freq, horiz=TRUE, main="Sex distribution")

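[Figure: two barplots of the counts in freq: a vertical one with bars F and M, and a horizontal one titled "Sex distribution"; the count axis runs from 0 to 30.]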

Page 7: Data Analysis Using R: 3. Graphical Analyses

Counting by group : barplot

agegroup <- cut(age, 3)
agesex <- table(sex, agegroup)
barplot(agesex)

[Figure: stacked barplot of sex within the age groups (11,16.7], (16.7,22.3] and (22.3,28]; count axis from 0 to 25.]
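cut(age, 3) splits the observed age range into three equal-width intervals, which is why the group boundaries are fractional. A small sketch of the alternative with explicit cut points; the break values and labels here are assumptions chosen for illustration, and the ages are those of the first 12 subjects on the View data slide:

```r
# Ages of the first 12 subjects shown on the "View data" slide
age <- c(15, 16, 11, 19, 19, 22, 16, 12, 21, 15, 13, 20)

# Explicit, round cut points (illustrative choice, not from the slides).
# Intervals are right-closed by default: (10,15], (15,20], (20,30]
agegroup <- cut(age, breaks = c(10, 15, 20, 30),
                labels = c("11-15", "16-20", "21-30"))

counts <- table(agegroup)
counts
barplot(counts, xlab = "Age group")
```

Round boundaries are usually easier to report than the ones cut() derives from the data range.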

Page 8: Data Analysis Using R: 3. Graphical Analyses

Counting by group : barplot

agegroup <- cut(age, 3)
agesex <- table(sex, agegroup)
barplot(agesex, xlab="Age group")
barplot(agesex, beside=TRUE, xlab="Age group")

[Figure: two barplots with x-axis label "Age group" over the groups (11,16.7], (16.7,22.3] and (22.3,28]: stacked bars (count axis 0 to 25) and side-by-side bars (count axis 0 to 15).]

Page 9: Data Analysis Using R: 3. Graphical Analyses

Distribution of data: Histogram

par(mfrow=c(2,2))
hist(age)
hist(age, breaks=20)
hist(age, breaks=40)
hist(age, breaks=50)

[Figure: 2 x 2 grid of histograms, each titled "Histogram of age", with age on the x-axis and Frequency on the y-axis, drawn with the default breaks and with breaks = 20, 40 and 50.]
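When breaks is a single number, hist() treats it as a suggestion and rounds the bin edges to convenient values, so the number of bars can differ from the number requested. A brief sketch of how to inspect the bins actually used and how to fix the edges exactly; the bin edges chosen here are an assumption for illustration, and the ages are the first 12 from the View data slide:

```r
# Ages of the first 12 subjects shown on the "View data" slide
age <- c(15, 16, 11, 19, 19, 22, 16, 12, 21, 15, 13, 20)

# plot = FALSE returns the histogram object without drawing it
h <- hist(age, breaks = 20, plot = FALSE)
h$breaks                 # bin edges R actually chose
length(h$breaks) - 1     # number of bins, not necessarily 20

# A vector of breaks is used exactly as given (edges here are illustrative)
h_fixed <- hist(age, breaks = seq(10, 30, by = 5), plot = FALSE)
h_fixed$counts           # counts in (10,15], (15,20], (20,25], (25,30]
```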

Page 10: Data Analysis Using R: 3. Graphical Analyses

Distribution of data: Histogram

par(mfrow=c(2,2))
hist(age)
hist(weight)
hist(lm)
hist(fm)

[Figure: 2 x 2 grid of histograms titled "Histogram of age", "Histogram of weight", "Histogram of lm" and "Histogram of fm", each with Frequency on the y-axis.]

Page 11: Data Analysis Using R: 3. Graphical Analyses

Distribution of data: plot(density)

hist(lm, main="Distribution of lean mass")
plot(density(lm), main="Distribution of lean mass")

[Figure: histogram (y-axis Frequency) and kernel density plot (y-axis Density) of lm, both titled "Distribution of lean mass"; the density plot is annotated "N = 43 Bandwidth = 2.607".]
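The bandwidth printed under a density plot controls how smooth the curve is, and density() picks it automatically unless told otherwise. A short sketch of how the adjust argument rescales it; the value 0.5 is an arbitrary choice for illustration, and lm_kg holds the lean mass of the first 12 subjects on the View data slide (renamed here to avoid clashing with the lm() function):

```r
# Lean mass (kg) of the first 12 subjects shown on the "View data" slide
lm_kg <- c(32.96, 38.16, 18.51, 35.92, 46.63, 42.13,
           45.23, 25.26, 39.44, 38.47, 25.50, 32.70)

d_default <- density(lm_kg)                # automatic bandwidth
d_narrow  <- density(lm_kg, adjust = 0.5)  # half the automatic bandwidth

d_default$bw
d_narrow$bw

# Overlay the two estimates: a smaller bandwidth gives a bumpier curve
plot(d_default, main = "Distribution of lean mass")
lines(d_narrow, lty = 2)
```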

Page 12: Data Analysis Using R: 3. Graphical Analyses

Normal distribution? qqnorm

• qqnorm(lm)

[Figure: "Normal Q-Q Plot" of lm, with Theoretical Quantiles (-2 to 2) on the x-axis and Sample Quantiles (20 to 45) on the y-axis.]
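A Q-Q plot is easier to read with a reference line, and a formal test can complement the visual check. A minimal sketch, assuming a Shapiro-Wilk test is an acceptable companion (the slides use only qqnorm); lm_kg holds the lean mass of the first 12 subjects on the View data slide:

```r
# Lean mass (kg) of the first 12 subjects shown on the "View data" slide
lm_kg <- c(32.96, 38.16, 18.51, 35.92, 46.63, 42.13,
           45.23, 25.26, 39.44, 38.47, 25.50, 32.70)

qqnorm(lm_kg)   # sample quantiles against normal quantiles
qqline(lm_kg)   # reference line through the first and third quartiles

# Shapiro-Wilk test: a small p-value suggests departure from normality
sw <- shapiro.test(lm_kg)
sw$p.value
```

Points that stay close to the line are consistent with a normal distribution; with small samples neither the plot nor the test is decisive.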

Page 13: Data Analysis Using R: 3. Graphical Analyses

Continuity of data: stripchart

stripchart(lm, xlab="Lean mass; kg")

[Figure: strip chart of lm with x-axis label "Lean mass; kg", from 20 to 45.]

Page 14: Data Analysis Using R: 3. Graphical Analyses

Summary of continuous data: boxplot

[Figure: boxplots of fm (axis 4 to 12) and lm (axis 20 to 45).]

boxplot(fm)
boxplot(lm)

       Min.  1st Qu.  Median   Mean  3rd Qu.    Max.
LM    18.51    31.91   35.92  35.65    40.14   46.63
FM    2.990    4.250   5.270  6.500    8.795  12.800
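The numbers a boxplot draws can be read back from the object that boxplot() returns, which makes the link to summary() explicit. A small sketch using the fat mass of the first 12 subjects on the View data slide (fm_kg is a name chosen here, not from the slides):

```r
# Fat mass (kg) of the first 12 subjects shown on the "View data" slide
fm_kg <- c(4.86, 4.15, 2.99, 6.73, 5.61, 3.93,
           5.15, 9.02, 4.64, 3.92, 4.26, 4.66)

summary(fm_kg)

# plot = FALSE returns the statistics without drawing the plot
b <- boxplot(fm_kg, plot = FALSE)
b$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
b$out     # points more than 1.5 box lengths beyond the box, drawn individually
```

The hinges are close to, but not always identical to, the quartiles that summary() reports, because the two use slightly different definitions.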

Page 15: Data Analysis Using R: 3. Graphical Analyses

Summary of data by group: boxplot

boxplot(fm ~ sex)
boxplot(lm ~ sex)

[Figure: boxplots by sex (F, M), captioned "Lean mass by sex" (axis 20 to 45) and "Fat mass by sex" (axis 4 to 12).]

Page 16: Data Analysis Using R: 3. Graphical Analyses

Analysis of association: scatter plot

plot(lm ~ age)
plot(lm ~ age, pch=16)

[Figure: two scatter plots of lm (y-axis, 20 to 45) against age (x-axis, 15 to 25): default open circles, and filled circles with pch=16.]

Page 17: Data Analysis Using R: 3. Graphical Analyses

Analysis of association: scatter plot

line <- lm(lm ~ age)

plot(lm ~ age, pch=16)

abline(line)

[Figure: scatter plot of lm (y-axis, 20 to 45) against age (x-axis, 15 to 25) with the fitted regression line.]
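The call lm(lm ~ age) works because R looks for a function when a name is called, so the lm() function and the lm column do not collide, but the double meaning is easy to misread. A minimal sketch with the column renamed, which also shows how to read off the fitted line; lm_kg and fit are names chosen here, and the values are the first 12 subjects from the View data slide:

```r
# Age and lean mass (kg) of the first 12 subjects shown on the "View data" slide
age   <- c(15, 16, 11, 19, 19, 22, 16, 12, 21, 15, 13, 20)
lm_kg <- c(32.96, 38.16, 18.51, 35.92, 46.63, 42.13,
           45.23, 25.26, 39.44, 38.47, 25.50, 32.70)

fit <- lm(lm_kg ~ age)      # simple linear regression of lean mass on age

coef(fit)                   # intercept and slope of the line abline() draws
summary(fit)$r.squared      # proportion of variance explained

plot(lm_kg ~ age, pch = 16)
abline(fit)                 # add the fitted line to the scatter plot
```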

Page 18: Data Analysis Using R: 3. Graphical Analyses

Analysis of association by group: scatter plot

plot(lm ~ age, pch=ifelse(sex=="M", "M", "F"), xlab="Age", ylab="Kg")

[Figure: scatter plot of lean mass (y-axis label "Kg", 20 to 45) against age (x-axis label "Age", 15 to 25), with each subject plotted as the letter M or F.]

Page 19: Data Analysis Using R: 3. Graphical Analyses

Analysis of multiple associations

data <- data.frame(age, weight, lm, fm, bmc)
pairs(data)

[Figure: scatter plot matrix of age, weight, lm, fm and bmc.]
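A scatter plot matrix shows the shape of each relationship; the matching correlation matrix puts a number on each panel. A short sketch on three of the variables, using the first 12 subjects from the View data slide (d and r are names chosen here):

```r
# Age, weight and lean mass of the first 12 subjects shown on the "View data" slide
d <- data.frame(
  age    = c(15, 16, 11, 19, 19, 22, 16, 12, 21, 15, 13, 20),
  weight = c(39, 45, 23, 46, 56, 50, 53, 35, 46, 45, 32, 40),
  lm     = c(32.96, 38.16, 18.51, 35.92, 46.63, 42.13,
             45.23, 25.26, 39.44, 38.47, 25.50, 32.70)
)

pairs(d)           # one scatter plot per pair of variables

r <- cor(d)        # Pearson correlations, same layout as the plot matrix
round(r, 2)
```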

Page 20: Data Analysis Using R: 3. Graphical Analyses

Analysis of multiple associations: a fancier graph

# Upper-panel function for pairs(): prints the absolute correlation,
# scaled by its size, plus significance stars from cor.test()
matrix.cor <- function(x, y, digits=2, prefix="", cex.cor)
{
    usr <- par("usr"); on.exit(par(usr = usr))   # restore the plot coordinates on exit
    par(usr = c(0, 1, 0, 1))
    r <- abs(cor(x, y))
    txt <- format(c(r, 0.123456789), digits=digits)[1]
    txt <- paste(prefix, txt, sep="")
    # use cex.cor if supplied; otherwise size the text to fit the panel
    cex <- if (missing(cex.cor)) 0.8/strwidth(txt) else cex.cor

    test <- cor.test(x, y)
    # borrowed from printCoefmat
    Signif <- symnum(test$p.value, corr = FALSE, na = FALSE,
                     cutpoints = c(0, 0.001, 0.01, 0.05, 0.1, 1),
                     symbols = c("***", "**", "*", ".", " "))

    text(0.5, 0.5, txt, cex = cex * r)
    text(.8, .8, Signif, cex=cex, col=2)
}

pairs(data,lower.panel=panel.smooth, upper.panel=matrix.cor)

Page 21: Data Analysis Using R: 3. Graphical Analyses

Results

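[Figure: scatter plot matrix of age, weight, lm, fm and bmc, with smoothed scatter plots in the lower panels and absolute correlations with significance stars in the upper panels. Correlation labels recovered from the figure, in extraction order: 0.48**, 0.36*, 0.095, 0.56***, 0.88***, 0.11, 0.85***, 0.36*, 0.86***, 0.16.]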
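Each number and set of stars in the upper panels comes from a correlation test on one pair of variables. A minimal sketch of that calculation for a single pair; p_stars is a hypothetical helper name introduced here, the cutpoints and symbols are those used on the slide, and the data are the first 12 subjects from the View data slide, so the result will not match the 43-subject figure:

```r
# Age and lean mass (kg) of the first 12 subjects shown on the "View data" slide
age   <- c(15, 16, 11, 19, 19, 22, 16, 12, 21, 15, 13, 20)
lm_kg <- c(32.96, 38.16, 18.51, 35.92, 46.63, 42.13,
           45.23, 25.26, 39.44, 38.47, 25.50, 32.70)

test <- cor.test(age, lm_kg)   # Pearson correlation by default
test$estimate                  # the correlation coefficient
test$p.value                   # p-value for the null hypothesis of zero correlation

# Hypothetical helper: map p-values to the star codes used on the slide
p_stars <- function(p) {
  as.character(symnum(p, corr = FALSE, na = FALSE,
                      cutpoints = c(0, 0.001, 0.01, 0.05, 0.1, 1),
                      symbols = c("***", "**", "*", ".", " ")))
}

p_stars(test$p.value)
```

Three stars mean p ≤ 0.001, two mean p ≤ 0.01, one means p ≤ 0.05, and a dot means p ≤ 0.1.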

Page 22: Data Analysis Using R: 3. Graphical Analyses

Summary

• R is a very powerful package for graphical analysis

• First step in data analysis: graphical analysis

• Look for
– Distributions
– Differences
– Associations