Chapter 8 - Graphics
Transcript of Chapter 8 - Graphics
Chapter 8 - GraphicsExercises - Alternative with SIMD
Brian Fogarty24 August 2018
ContentsEXERCISE I - BAR PLOTS 1
ANSWERS FOR EXERCISE I 1Question 1.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1Question 1.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2Question 1.3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3Question 1.4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4Question 1.5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
EXERCISE II - HISTOGRAMS 6
ANSWERS FOR EXERCISE II 6Question 2.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6Question 2.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7Question 2.3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
EXERCISE III - SCATTERPLOTS 9
ANSWERS FOR EXERCISE III 10Question 3.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10Question 3.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10Question 3.3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11Question 3.4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12Question 3.5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13Question 3.6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14Question 3.7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15Question 3.8 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16Question 3.9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
EXERCISE I - BAR PLOTS
Using the 2012 survey about smoking and drug use amongst English pupils (2012smokedrugs.dta), createthe following bar plots using ggplot2 code.
1. Create a bar plot with grey bars using the recoded version of lifewant from Chapter 5, which wecalled lifewant2. Discuss what it shows you.
2. Create the same bar plot but now with red bars.
3. Create a bar plot with lifewant2 and free. You should label the values of free as 0=’0. Not FreeLunch’;1=’1. Free Lunch’. Discuss what it shows you.
4. Create the same bar plot but now with grey shade bars.
1
5. Create the same bar plot but now using dodging.
ANSWERS FOR EXERCISE I
setwd("C:/QSSD/Exercises/Chapter 8 - Exercises")getwd()
[1] "C:/QSSD/Exercises/Chapter 8 - Exercises"library(foreign)drugs <- read.dta("2012smokedrugs.dta", convert.factors=FALSE)
Question 1.1
library(car)
Loading required package: carDatadrugs$lifewant2 <- recode(drugs$lifewant, "1='1. Strongly Agree';2='2. Agree';
3='3. Neither Agree/Disagree';4='4. Disagree';5='5. Strongly Disagree'")
drugs1 <- subset(drugs, !is.na(lifewant2))
library(ggplot2)x11()ggplot(data = drugs1) +
geom_bar(mapping = aes(lifewant2))
2
0
1000
2000
1. Strongly Agree 2. Agree 3. Neither Agree/Disagree 4. Disagree 5. Strongly Disagree
lifewant2
coun
t
The bar plot shows that the modal response is that pupils “strongly agree” that they have the life they want.Meanwhile, the smallest category are pupils responding “strongly disagree” that they have the life they want.
Question 1.2
x11()ggplot(data = drugs1) +
geom_bar(mapping = aes(lifewant2), fill="red")
3
0
1000
2000
1. Strongly Agree 2. Agree 3. Neither Agree/Disagree 4. Disagree 5. Strongly Disagree
lifewant2
coun
t
Question 1.3
drugs$free1 <- recode(drugs$free, "0='0. Not Free Lunch';1='1. Free Lunch'")table(drugs$free1)
0. Not Free Lunch 1. Free Lunch6162 1194
drugs1 <- subset(drugs, !is.na(lifewant2) & !is.na(free1))
x11()ggplot(data = drugs1) +
geom_bar(mapping = aes(lifewant2, fill=free1))
4
0
1000
2000
1. Strongly Agree 2. Agree3. Neither Agree/Disagree4. Disagree5. Strongly Disagree
lifewant2
coun
t free1
0. Not Free Lunch
1. Free Lunch
The bar plot shows that whether a student receives a free lunch or not makes up roughly the same portion ofresponses for each category.
Question 1.4
x11()ggplot(data = drugs1) +
geom_bar(mapping = aes(lifewant2, fill=free1)) +scale_fill_grey()
5
0
1000
2000
1. Strongly Agree 2. Agree3. Neither Agree/Disagree4. Disagree5. Strongly Disagree
lifewant2
coun
t free1
0. Not Free Lunch
1. Free Lunch
Question 1.5
x11()ggplot(data = drugs1) +
geom_bar(mapping = aes(lifewant2, fill=free1), position="dodge")
6
0
500
1000
1500
2000
1. Strongly Agree 2. Agree3. Neither Agree/Disagree4. Disagree5. Strongly Disagree
lifewant2
coun
t free1
0. Not Free Lunch
1. Free Lunch
EXERCISE II - HISTOGRAMS
Using the 2016 SIMD (simd.csv), create the following histograms using ggplot2 code.
1. Create a histogram with using the variable pct_employment_deprived, the percentage of people in adatazone receiving financial support from the government due to being unable to work, and set thebinwidth to 1. Discuss what it shows you.
2. Create the same histogram but now with red bars.
3. Create a histogram with pct_employment_deprived and urban as the second variable. Discuss whatit shows you.
ANSWERS FOR EXERCISE II
Question 2.1
simd <- read.csv("simd.csv")
simd <- subset(simd, !is.na(pct_depress))
x11()
7
ggplot(data=simd) +geom_histogram(mapping = aes(pct_employment_deprived), binwidth=1)
0
200
400
0 20 40
pct_employment_deprived
coun
t
The histogram shows that most datazones are clustered around 5% percent of employment deprivation.
Question 2.2
x11()ggplot(data=simd) +
geom_histogram(mapping = aes(pct_employment_deprived), binwidth=1, fill="red")
8
0
200
400
0 20 40
pct_employment_deprived
coun
t
Question 2.3
library(car)simd$urban <- recode(simd$urban, "0='Rural';1='Urban'",as.factor=TRUE)class(simd$urban)
[1] "factor"x11()ggplot(data=simd) +
geom_histogram(mapping = aes(pct_employment_deprived, fill=urban),binwidth=1)
9
0
200
400
0 20 40
pct_employment_deprived
coun
t urban
Rural
Urban
The histogram shows that urban datazones tend to have percentages of employment deprivation than ruraldatazones. The datazones with the highest percentages of employment deprivation mostly appear to beurban datazones.
EXERCISE III - SCATTERPLOTS
Using the 2016 SIMD data (simd.csv), create the following scatterplots using ggplot2 code.
1. Create a scatterplot with black points with pct_school_attend on the x-axis, pct_employment_deprivedon the y-axis, and include position="jitter". Discuss what it shows you.
2. Create the same scatterplot, but with red instead of black data points.
3. Create the same scatterplot using black points and alpha=1/10 to illustrate points that overlay oneanother. Discuss what you find.
4. Create a scatterplot with pct_school_attend on the x-axis, pct_employment_deprived on the y-axis,and urban as the third variable. Use scale_colour_grey() to differentiate the urban/rural data points.Discuss what it shows you.
5. Create the same scatterplot, but with the default colours to differentiate the urban/rural data points.
6. Create the same scatterplot, but specify "darkblue" and "orange" as the colours to differentiate theurban/rural data points.
7. Create the same scatterplot, but with default symbols to differentiate the urban/rural data points.
10
8. Create a similar scatterplot using urban as the faceting variable. This will create two scatterplots -one for rural datazones and one for urban datazones.
9. Place the scatterplots from questions 4 through 7 onto one page using the grid.arrange() function.
ANSWERS FOR EXERCISE III
Question 3.1
x11()ggplot(data=simd) +
geom_point(mapping=aes(x=pct_school_attend,y=pct_employment_deprived),position="jitter")
0
20
40
50 60 70 80 90 100
pct_school_attend
pct_
empl
oym
ent_
depr
ived
The scatterplot shows that as the percentage of school attendance increases, the percentage of employmentdeprivation decreases in Scottish datazones.
Question 3.2
x11()ggplot(data=simd) +
geom_point(mapping=aes(x=pct_school_attend,y=pct_employment_deprived),position="jitter", colour="red")
11
0
20
40
50 60 70 80 90 100
pct_school_attend
pct_
empl
oym
ent_
depr
ived
Question 3.3
x11()ggplot(data=simd) +
geom_point(mapping=aes(x=pct_school_attend,y=pct_employment_deprived),position="jitter", alpha=1/10)
12
0
20
40
50 60 70 80 90 100
pct_school_attend
pct_
empl
oym
ent_
depr
ived
This scatterplot shows a cluster of datazones between 85-95% of school attendance and between 0-8% ofemployment deprivation.
Question 3.4
x11()ggplot(data=simd) +
geom_point(mapping=aes(x=pct_school_attend,y=pct_employment_deprived,colour=urban), position="jitter") +
scale_colour_grey()
13
0
20
40
50 60 70 80 90 100
pct_school_attend
pct_
empl
oym
ent_
depr
ived
urban
Rural
Urban
This scatterplot shows a mix of urban and rural datazones with high percentages of school attendanceand low percentages of employment deprivation. However, the datazones with low percentages of schoolattendance and high percentages of employment deprivation tend to be overwhelmingly urban datazones.
Question 3.5
x11()ggplot(data=simd) +
geom_point(mapping=aes(x=pct_school_attend,y=pct_employment_deprived,colour=urban), position="jitter")
14
0
20
40
50 60 70 80 90 100
pct_school_attend
pct_
empl
oym
ent_
depr
ived
urban
Rural
Urban
Question 3.6
x11()ggplot(data=simd) +
geom_point(mapping=aes(x=pct_school_attend,y=pct_employment_deprived,colour=urban), position="jitter") +
scale_colour_manual(values=c("darkblue","orange"))
15
0
20
40
50 60 70 80 90 100
pct_school_attend
pct_
empl
oym
ent_
depr
ived
urban
Rural
Urban
Question 3.7
x11()ggplot(data=simd) +
geom_point(mapping=aes(x=pct_school_attend,y=pct_employment_deprived,shape=urban), position="jitter")
16
0
20
40
50 60 70 80 90 100
pct_school_attend
pct_
empl
oym
ent_
depr
ived
urban
Rural
Urban
Question 3.8
x11()ggplot(data=simd) +
geom_point(mapping=aes(x=pct_school_attend,y=pct_employment_deprived),position="jitter") +
facet_wrap(~urban)
17
Rural Urban
50 60 70 80 90 100 50 60 70 80 90 100
0
20
40
pct_school_attend
pct_
empl
oym
ent_
depr
ived
Question 3.9
p1 <- ggplot(data=simd) +geom_point(mapping=aes(x=pct_school_attend,y=pct_employment_deprived,
colour=urban)) +scale_colour_grey()
p2 <- ggplot(data=simd) +geom_point(mapping=aes(x=pct_school_attend,y=pct_employment_deprived,
colour=urban))
p3 <- ggplot(data=simd) +geom_point(mapping=aes(x=pct_school_attend,y=pct_employment_deprived,
colour=urban)) +scale_colour_manual(values=c("darkblue","orange"))
p4 <- ggplot(data=simd) +geom_point(mapping=aes(x=pct_school_attend,y=pct_employment_deprived,
shape=urban))
library(gridExtra)x11()grid.arrange(p1, p2, p3, p4, ncol=2, nrow=2)
18
0
20
40
50 60 70 80 90 100
pct_school_attend
pct_
empl
oym
ent_
depr
ived
urban
Rural
Urban
0
20
40
50 60 70 80 90 100
pct_school_attend
pct_
empl
oym
ent_
depr
ived
urban
Rural
Urban
0
20
40
50 60 70 80 90 100
pct_school_attend
pct_
empl
oym
ent_
depr
ived
urban
Rural
Urban
0
20
40
50 60 70 80 90 100
pct_school_attend
pct_
empl
oym
ent_
depr
ived
urban
Rural
Urban
19