Rearranging and manipulating data - Evolutionary...

An introduction to R · Noémie Becker & Benedikt Holtmann · Winter Semester 16/17 · Rearranging and manipulating data

Page 1: Rearranging and manipulating data - Evolutionary Biology evol.bio.lmu.de/_statgen/Rcourse/ws1617/slides/Day7_DataManipula… · Get an overview str() provides an overview of an object

An introduction to R

Noémie Becker & Benedikt Holtmann

Winter Semester 16/17

Rearranging and manipulating data

Course outline – Day 7

Course outline

•  Review – Checking and cleaning data
•  Rearranging and manipulating data
•  Reshaping data
•  Combining datasets
•  Making new variables
•  Subsetting data
•  Summarising data

Review – Cleaning and checking data

Import data

Set working directory using setwd()

    setwd("~/Desktop")

Import data using the read.table() and read.csv() functions

    myData <- read.csv(file = "datafile.csv",
                       header = TRUE, sep = ",", strip.white = TRUE, na.strings = "")

Review – Cleaning and checking data

Get an overview

    str(datafile)

    'data.frame': 769 obs. of 12 variables:
     $ Snail.ID: int  1 1 1 1 1 1 1 1 1 1 ...
     $ Sex     : Factor w/ 4 levels "female","male",..: 2 2 4 2 2 2 2 2 2 2 ...
     $ Size    : Factor w/ 2 levels "large","small": 2 2 2 2 2 2 2 2 2 2 ...
     $ Feeding : logi  FALSE FALSE FALSE FALSE FALSE TRUE ...
     $ Distance: num  0.17 0.87 0.22 0.13 0.36 0.84 0.69 0.6 0.85 0.59 ...
     $ Depth   : num  1.66 1.26 1.43 1.46 1.21 1.56 1.62 162 1.96 1.93 ...
     $ Temp    : int  21 21 18 19 21 21 20 20 19 19 …

Review – Cleaning and checking data

Get an overview

    summary(datafile)

       Snail.ID          Sex          Size        Feeding           Distance
     Min.   : 1.00   female:384   large:383   Mode :logical   Min.   :0.0000
     1st Qu.: 4.00   male  :385   small:385   FALSE:503       1st Qu.:0.2800
     Median : 8.50                NA's :  1   TRUE :266       Median :0.5100
     Mean   : 8.49                            NA's :0         Mean   :0.5125
     3rd Qu.:12.00                                            3rd Qu.:0.7500
     Max.   :16.00                                            Max.   :1.0000
     ... Continues

Review – Cleaning and checking data

Get an overview

    head(x)
    tail(x)

      Snail.ID Sex  Size  Feeding Distance Depth Temp
    1        1 male small FALSE       0.17  1.66   21
    2        1 male small FALSE       0.87  1.26   21
    3        1 male small FALSE       0.22  1.43   18
    4        1 male small FALSE       0.13  1.46   19
    5        1 male small FALSE       0.36  1.21   21
    6        1 male small TRUE        0.84  1.56   21

Review – Cleaning and checking data

Get an overview

str()      provides an overview of an object
summary()  returns a basic statistical summary for variables
head()     returns the first records of an object
tail()     returns the last records of an object
sort()     sorts a vector or factor into ascending or descending order
order()    takes a set of vectors as arguments and returns the ordering that sorts them recursively by each vector, breaking ties by looking at successive vectors in the argument list
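
The difference between sort() and order() can be sketched as follows. The vectors size and label are hypothetical toy data, not part of the course files.

```r
# Hypothetical toy vectors (assumption: not taken from the course data)
size  <- c(3, 1, 2, 1)
label <- c("c", "b", "a", "a")

# sort() returns the sorted values themselves
sorted_size <- sort(size)
sorted_desc <- sort(size, decreasing = TRUE)

# order() returns the positions that would sort `size`;
# ties in `size` are broken by `label`
idx <- order(size, label)

# The index can be used to reorder whole rows of a data frame
toy <- data.frame(size, label)
toy_sorted <- toy[idx, ]
print(toy_sorted)
```

Using the index from order() on a data frame keeps each row intact, which sort() on a single column would not do.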

Rearranging and manipulating data

Built-in data

•  Many packages come with built-in datasets
•  To save memory, datasets are not loaded until they are referenced the first time
•  The function data() lists the built-in datasets of all loaded packages
•  Built-in data is usually used for the examples that you can find in the help files
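
A short sketch of how built-in data can be explored. It assumes only the datasets package that ships with R; mtcars and iris are two of its datasets and are not used elsewhere in this course.

```r
# List the datasets shipped with the "datasets" package
available <- data(package = "datasets")
head(available$results[, c("Item", "Title")])

# Built-in datasets can be used directly by name
str(mtcars)
head(iris)
```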

Rearranging and manipulating data

Reshaping data

We will use data on fish abundance

    Fish_survey <- read.csv("Fish_survey.csv", header = TRUE)

      Site   Month   Transect Trout Perch Stickleback
    1 River1 January        1    10     5          28
    2 River1 January        2     0    13          42
    3 River1 January        3     8    19           9
    4 River2 January        1     3     5          72
    5 River2 January        2     2     9          33
    6 River2 January        3    15    24          65
    ...

Rearranging and manipulating data

Reshaping data

Reshaping data using the package tidyr

    library(tidyr)

To make one single column including all three species you can use the function gather()

Rearranging and manipulating data

Reshaping data

Example gather()

    Fish_survey_long <- gather(Fish_survey, Species, Abundance, 4:6)

      Site   Month   Transect Species Abundance
    1 River1 January        1 Trout          10
    2 River1 January        2 Trout           0
    3 River1 January        3 Trout           8
    4 River2 January        1 Trout           3
    5 River2 January        2 Trout           2
    6 River2 January        3 Trout          15
    ...

Rearranging and manipulating data

Reshaping data

To convert the data back into a format with separate columns for each species, use the function spread()

Example spread()

    Fish_survey_wide <- spread(Fish_survey_long, Species, Abundance)
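
The round trip from wide to long and back can be tried on a small example. This is a sketch that assumes tidyr is installed; the data frame fish is a hypothetical miniature of the fish survey, not the course file.

```r
library(tidyr)

# Hypothetical miniature version of the fish survey (assumption)
fish <- data.frame(
  Site     = c("River1", "River1", "River2"),
  Transect = c(1, 2, 1),
  Trout    = c(10, 0, 3),
  Perch    = c(5, 13, 5),
  stringsAsFactors = FALSE
)

# Wide to long: columns 3 and 4 are stacked into Species/Abundance
fish_long <- gather(fish, Species, Abundance, 3:4)

# Long to wide again: one column per species
fish_wide <- spread(fish_long, Species, Abundance)

print(fish_long)
print(fish_wide)
```

Newer tidyr versions also provide pivot_longer() and pivot_wider() for the same tasks; gather() and spread() still work.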

Rearranging and manipulating data

Reshaping data

Reshaping data using the package reshape2

    library(reshape2)

Instead of gather() the reshape2 package uses the function melt()

Caution: Do not confuse the reshape2 library with the reshape function!

Rearranging and manipulating data

Reshaping data

Reshaping data using the package reshape2

Example melt()

    Fish_survey_long <- melt(Fish_survey,
                             id.vars = c("Site", "Month", "Transect"),
                             measure.vars = c("Trout", "Perch", "Stickleback"),
                             variable.name = "Species", value.name = "Abundance")

Rearranging and manipulating data

Reshaping data

Reshaping data using the package reshape2

Similarly, instead of spread() the reshape2 package uses the function dcast()

Example dcast()

    Fish_survey_wide <- dcast(Fish_survey_long,
                              Site + Month + Transect ~ Species,
                              value.var = "Abundance")
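
The same round trip with reshape2, as a sketch. It assumes reshape2 is installed; fish is again a hypothetical miniature data frame, not the course file.

```r
library(reshape2)

# Hypothetical miniature version of the fish survey (assumption)
fish <- data.frame(
  Site     = c("River1", "River1", "River2"),
  Transect = c(1, 2, 1),
  Trout    = c(10, 0, 3),
  Perch    = c(5, 13, 5),
  stringsAsFactors = FALSE
)

# Wide to long: id.vars stay as they are, measure.vars are stacked
fish_long <- melt(fish,
                  id.vars = c("Site", "Transect"),
                  measure.vars = c("Trout", "Perch"),
                  variable.name = "Species", value.name = "Abundance")

# Long to wide: left of ~ identifies a row, right of ~ becomes the columns
fish_wide <- dcast(fish_long, Site + Transect ~ Species,
                   value.var = "Abundance")

print(fish_wide)
```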

Rearranging and manipulating data

Combining datasets

To combine datasets we will use the package dplyr

    install.packages("dplyr")
    library(dplyr)

Rearranging and manipulating data

Combining datasets

To combine datasets we will use the package dplyr

Import datasets

    Fish_survey_long <- read.csv("Fish_survey_long.csv", header = TRUE, stringsAsFactors = FALSE)
    Water_data <- read.csv("Water_data.csv", header = TRUE, stringsAsFactors = FALSE)
    GPS_location <- read.csv("GPS_data.csv", header = TRUE, stringsAsFactors = FALSE)

Rearranging and manipulating data

Combining datasets

Why not just use cbind()?

•  Datasets need to have the same number of rows
•  Rows need to be in the same order because rows are matched by position

    X1 X2
    A  1
    B  1
    A  2
    B  2

    X1 X3
    A  T
    A  F
    B  F
    B  T

    X1 X4
    A  1
    A  2
    A  3
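
Both problems can be reproduced with the three small tables above. This is a sketch in base R; the object names a, b and d are chosen here for illustration.

```r
# The three small tables from the slide (object names are assumptions)
a <- data.frame(X1 = c("A", "B", "A", "B"), X2 = c(1, 1, 2, 2))
b <- data.frame(X1 = c("A", "A", "B", "B"), X3 = c(TRUE, FALSE, FALSE, TRUE))
d <- data.frame(X1 = c("A", "A", "A"), X4 = 1:3)

# Same number of rows: cbind() runs, but rows are paired by position,
# so the two X1 columns disagree in rows 2 and 3
ab <- cbind(a, b)
mismatch <- as.character(ab[[1]]) != as.character(ab[[3]])
print(mismatch)

# Different number of rows: cbind() stops with an error
result <- tryCatch(cbind(a, d), error = function(e) "error")
print(result)
```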

Rearranging and manipulating data

Combining datasets

We can join datasets by using the columns they share:

    Fish survey    Water characteristics    GPS
    Site           Site                     Site
    Month          Month                    Transect
    Transect       Water temp.              Latitude
    Species        O2-content               Longitude

Rearranging and manipulating data

Combining datasets

Functions to combine datasets in dplyr

left_join(a, b, by = "x1")    Joins matching rows from b to a
right_join(a, b, by = "x1")   Joins matching rows from a to b
inner_join(a, b, by = "x1")   Returns all rows from a where there are matching values in b
full_join(a, b, by = "x1")    Joins data and returns all rows and columns
semi_join(a, b, by = "x1")    All rows in a that have a match in b, keeping just columns from a
anti_join(a, b, by = "x1")    All rows in a that do not have a match in b
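
The six joins can be compared on two small tables. This sketch assumes dplyr is installed; a and b are hypothetical data frames that share the key column x1.

```r
library(dplyr)

# Hypothetical tables sharing the key column x1 (assumption)
a <- data.frame(x1 = c("A", "B", "C"), x2 = 1:3, stringsAsFactors = FALSE)
b <- data.frame(x1 = c("A", "B", "D"), x3 = c(TRUE, FALSE, TRUE),
                stringsAsFactors = FALSE)

left  <- left_join(a, b, by = "x1")   # keys A, B, C; x3 is NA for C
right <- right_join(a, b, by = "x1")  # keys A, B, D; x2 is NA for D
inner <- inner_join(a, b, by = "x1")  # keys A, B only
full  <- full_join(a, b, by = "x1")   # keys A, B, C, D
semi  <- semi_join(a, b, by = "x1")   # rows A, B with the columns of a only
anti  <- anti_join(a, b, by = "x1")   # row C only

print(full)
```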

Rearranging and manipulating data

Combining datasets

1. Join water characteristics to fish abundance data using inner_join()

    Fish_and_Water <- inner_join(Fish_survey_long, Water_data,
                                 by = c("Site", "Month"))

Rearranging and manipulating data

Combining datasets

Check the new data frame

    str(Fish_and_Water)
    head(Fish_and_Water)

    'data.frame': 72 obs. of 7 variables:
     $ Site           : Factor w/ 2 levels "River1","River2": 1 1 1 2 2 2 1 1 1 2 ...
     $ Month          : chr  "January" "January" "January" "January" ...
     $ Transect       : int  1 2 3 1 2 3 1 2 3 1 ...
     $ Species        : Factor w/ 3 levels "Perch","Stickleback",..: 3 3 3 3 3 3 3 3 3 3 ...
     $ Abundance      : int  10 0 8 3 2 15 ...
     $ Mean_water_temp: num  3.6 3.6 3.6 6.2 6.2 6.2 2.3 2.3 2.3 8 ...
     $ Mean_O2_content: num  12.6 12.6 12.6 12 12 12 9.8 9.8 9.8 12.3 ...

Rearranging and manipulating data

Combining datasets

2. Add GPS locations to the new Fish_and_Water dataset using inner_join()

    Fish_survey_combined <- inner_join(Fish_and_Water, GPS_location,
                                       by = c("Site", "Transect"))

Check if it worked:

    str(Fish_survey_combined)
    head(Fish_survey_combined)

Rearranging and manipulating data

Adding new variables

We will use data on bird behaviour

    Bird_Behaviour <- read.csv("Bird_Behaviour.csv", header = TRUE,
                               stringsAsFactors = FALSE)

    # Get an overview
    str(Bird_Behaviour)

    X1 X2
    A  1
    B  1
    A  2
    B  2

    X1 X2 X3
    A  1  T
    B  1  F
    A  2  T
    B  2  F

Rearranging and manipulating data

Adding new variables

Three ways of adding a new variable (log of FID)

Using $

    Bird_Behaviour$log_FID <- log(Bird_Behaviour$FID)

Using the [] operator

    Bird_Behaviour[, "log_FID"] <- log(Bird_Behaviour$FID)

Using mutate() from the dplyr package

    Bird_Behaviour <- mutate(Bird_Behaviour, log_FID = log(FID))
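
That the three approaches give the same result can be checked on a small example. This sketch assumes dplyr is installed; birds is a hypothetical stand-in for Bird_Behaviour.

```r
library(dplyr)

# Hypothetical stand-in for the bird data (assumption)
birds <- data.frame(Ind = c("PD1", "PD1", "PD2"),
                    FID = c(5, 2, 10),
                    stringsAsFactors = FALSE)

# 1. Using $
via_dollar <- birds
via_dollar$log_FID <- log(via_dollar$FID)

# 2. Using the [] operator
via_bracket <- birds
via_bracket[, "log_FID"] <- log(via_bracket$FID)

# 3. Using mutate(); column names can be used without the data frame prefix
via_mutate <- mutate(birds, log_FID = log(FID))

print(via_mutate)
```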

Rearranging and manipulating data

Adding new variables

Adding a new variable

    head(Bird_Behaviour)

      Ind Species           Sex    Year FID Disturbance Fledglings log_FID
    1 PD1 Passer_domesticus male   2013   5           8          1 1.6094379
    2 PD1 Passer_domesticus male   2014   2          40          4 0.6931472
    3 PD1 Passer_domesticus male   2015   8          30          4 2.0794415
    4 PD2 Passer_domesticus female 2013  10          35          3 2.3025851
    5 PD2 Passer_domesticus female 2014  10          15          0 2.3025851
    6 PD2 Passer_domesticus female 2015   6           6          2 1.7917595

Rearranging and manipulating data

Adding new variables

Split one column into two using separate() from the tidyr package

    Bird_Behaviour <- separate(Bird_Behaviour, Species,
                               c("Genus", "Species"), sep = "_", remove = TRUE)

    X1 X2
    A  1_1
    B  1_2
    A  2_1
    B  2_2

    X1 X2.1 X2.2
    A  1    1
    B  1    2
    A  2    1
    B  2    2

Rearranging and manipulating data

Adding new variables

Split one column into two using separate() from the tidyr package

    head(Bird_Behaviour)

      Ind Genus  Species     Sex    Year FID ...
    1 PD1 Passer domesticus  male   2013   5 ...
    2 PD1 Passer domesticus  male   2014   2 ...
    3 PD1 Passer domesticus  male   2015   8 ...
    4 PD2 Passer domesticus  female 2013  10 ...
    5 PD2 Passer domesticus  female 2014  10 ...
    6 PD2 Passer domesticus  female 2015   6 ...

Rearranging and manipulating data

Adding new variables

Combine two columns using unite() from the tidyr package

    Bird_Behaviour <- unite(Bird_Behaviour, "Genus_Species",
                            c(Genus, Species), sep = "_", remove = TRUE)

    X1 X2
    A  1_1
    B  1_2
    A  2_1
    B  2_2

    X1 X2.1 X2.2
    A  1    1
    B  1    2
    A  2    1
    B  2    2

Rearranging and manipulating data

Adding new variables

Combine two columns using unite() from the tidyr package

    head(Bird_Behaviour)

      Ind Genus_Species     Sex    Year FID ...
    1 PD1 Passer_domesticus male   2013   5 ...
    2 PD1 Passer_domesticus male   2014   2 ...
    3 PD1 Passer_domesticus male   2015   8 ...
    4 PD2 Passer_domesticus female 2013  10 ...
    5 PD2 Passer_domesticus female 2014  10 ...
    6 PD2 Passer_domesticus female 2015   6 ...
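
Splitting and re-combining a column can be tried on a small example. This sketch assumes tidyr is installed; birds is a hypothetical two-row data frame.

```r
library(tidyr)

# Hypothetical two-row example (assumption)
birds <- data.frame(Ind = c("PD1", "PM1"),
                    Species = c("Passer_domesticus", "Passer_montanus"),
                    stringsAsFactors = FALSE)

# Split Species at the underscore into Genus and Species
split_birds <- separate(birds, Species, c("Genus", "Species"),
                        sep = "_", remove = TRUE)

# Paste the two columns back together
joined_birds <- unite(split_birds, "Genus_Species", c(Genus, Species),
                      sep = "_", remove = TRUE)

print(split_birds)
print(joined_birds)
```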

Rearranging and manipulating data

Subsetting data

Subsetting data

•  Using the [] operator
•  Using subset()
•  Subsetting with functions from the dplyr package
    •  slice()
    •  filter()
    •  sample_frac()
    •  sample_n()
    •  select()

Rearranging and manipulating data

Subsetting data

Subsetting using the [] operator

Examples:

    Bird_Behaviour[, 1:4]                 # selects the first 4 columns
    Bird_Behaviour[c(2, 3), ]             # selects rows 2 and 3
    Bird_Behaviour[1:3, 1:4]              # selects the rows 1 to 3 and columns 1 to 4
    Bird_Behaviour[c(1:3, 6), c(1:4, 8)]  # selects the rows 1 to 3 and 6, and the columns 1 to 4 and 8

Rearranging and manipulating data

Subsetting data

Subsetting using the [] and $ operators

Examples:

    Bird_Behaviour[Bird_Behaviour$Sex == "male", ]  # selects all rows with males
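
A runnable sketch of the [] examples in base R. The data frame birds is a hypothetical stand-in for Bird_Behaviour.

```r
# Hypothetical stand-in for the bird data (assumption)
birds <- data.frame(Ind = c("PD1", "PD2", "PM1", "PM2"),
                    Sex = c("male", "female", "male", "female"),
                    FID = c(5, 10, 8, 12),
                    stringsAsFactors = FALSE)

first_two_cols <- birds[, 1:2]               # columns by position
rows_2_3       <- birds[c(2, 3), ]           # rows by position
males          <- birds[birds$Sex == "male", ]  # rows by logical condition

# A single column comes back as a vector unless drop = FALSE is set
fid_vector <- birds[, "FID"]
fid_frame  <- birds[, "FID", drop = FALSE]

print(males)
```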

Rearranging and manipulating data

Subsetting data

Subsetting using subset()

    subset(x, subset, select, ...)

Argument   Description
x          The object from which to extract the subset
subset     A logical expression that describes the set of rows to return
select     An expression indicating which columns to return

Rearranging and manipulating data

Subsetting data

Examples subset():

    subset(Bird_Behaviour, FID < 10)
    # selects all rows with FID smaller than 10 m

    subset(Bird_Behaviour, FID < 10 & Sex == "male")
    # selects all rows for males with FID smaller than 10 m

    subset(Bird_Behaviour, FID > 10 | FID < 15, select = c(Ind, Sex, Year))
    # selects all rows that have a value of FID greater than 10 or less than 15
    # (every non-missing FID meets this condition, so no rows are dropped).
    # We keep only the Ind, Sex and Year columns
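
The subset() calls can be tried on a small example in base R. The data frame birds is a hypothetical stand-in for Bird_Behaviour, and the thresholds are chosen for illustration only.

```r
# Hypothetical stand-in for the bird data (assumption)
birds <- data.frame(Ind = c("PD1", "PD2", "PM1", "PM2"),
                    Sex = c("male", "female", "male", "female"),
                    FID = c(5, 10, 8, 12),
                    stringsAsFactors = FALSE)

small_fid  <- subset(birds, FID < 10)
small_male <- subset(birds, FID < 10 & Sex == "male")

# "&" narrows the range; select keeps only the named columns
mid_fid <- subset(birds, FID > 5 & FID < 12, select = c(Ind, Sex))

# "|" with overlapping ranges keeps every row
all_rows <- subset(birds, FID > 10 | FID < 15)

print(mid_fid)
```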

Rearranging and manipulating data

Subsetting data

Review of logical operators

Check out ?base::Logic and ?Comparison to learn more

Operator   Description
>          greater than
>=         greater than or equal to
<          less than
<=         less than or equal to
==         equal to
!=         not equal to
x & y      x and y
x | y      x or y
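
The operators work element by element on vectors, and missing values propagate. A small sketch with a hypothetical vector fid:

```r
# Hypothetical vector with one missing value (assumption)
fid <- c(5, 10, NA, 12)

greater  <- fid > 5              # FALSE TRUE NA TRUE
both     <- fid > 5 & fid < 12   # FALSE TRUE NA FALSE
either   <- fid < 6 | fid > 11   # TRUE FALSE NA TRUE

# which() returns the positions that are TRUE and skips NA
positions <- which(fid > 5)

print(both)
print(positions)
```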

Rearranging and manipulating data

Subsetting rows in dplyr

Subsetting by rows using slice() and filter()

Examples slice() and filter():

    Bird_Behaviour.slice <- slice(Bird_Behaviour, 3:5)
    # selects rows 3-5

    Bird_Behaviour.filter <- filter(Bird_Behaviour, FID < 5)
    # selects rows that meet certain criteria

Rearranging and manipulating data

Subsetting rows in dplyr

Taking a random sample of rows using sample_frac() and sample_n()

Examples sample_frac() and sample_n():

    Bird_Behaviour.50 <- sample_frac(Bird_Behaviour, size = 0.5,
                                     replace = FALSE)
    # takes randomly 50% of the rows

    Bird_Behaviour_50Rows <- sample_n(Bird_Behaviour, 50,
                                      replace = FALSE)
    # takes randomly 50 rows

Rearranging and manipulating data

Subsetting columns in dplyr

Subsetting by columns using select()

Examples select():

    Bird_Behaviour_col <- select(Bird_Behaviour, Ind, Sex, Fledglings)
    # selects the columns Ind, Sex, and Fledglings

    Bird_Behaviour_reduced <- select(Bird_Behaviour, -Disturbance)
    # excludes the variable Disturbance
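
The dplyr subsetting functions can be tried together on a small example. This sketch assumes dplyr is installed; birds is a hypothetical stand-in for Bird_Behaviour, and set.seed() is added here only to make the random samples repeatable.

```r
library(dplyr)

# Hypothetical stand-in for the bird data (assumption)
birds <- data.frame(Ind = c("PD1", "PD2", "PM1", "PM2"),
                    Sex = c("male", "female", "male", "female"),
                    FID = c(5, 10, 8, 12),
                    stringsAsFactors = FALSE)

rows_2_3 <- slice(birds, 2:3)          # rows by position
close    <- filter(birds, FID < 10)    # rows by condition

set.seed(1)                            # repeatable random samples
half <- sample_frac(birds, size = 0.5, replace = FALSE)
two  <- sample_n(birds, 2, replace = FALSE)

cols   <- select(birds, Ind, Sex)      # keep named columns
no_fid <- select(birds, -FID)          # drop one column

print(close)
```

Newer dplyr versions also offer slice_sample() for random samples; sample_frac() and sample_n() still work.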

Rearranging and manipulating data

Summarising data

Summarizing data with dplyr

Get the overall mean for FID using summarise() and mean()

    summarise(Bird_Behaviour, mean.FID = mean(FID))

      mean.FID
    1 11.82639

Rearranging and manipulating data

Summarising data

Summarizing data with dplyr

We can add other measurements to this:

    summarise(Bird_Behaviour,
              mean.FID = mean(FID),   # mean
              min.FID = min(FID),     # minimum
              max.FID = max(FID),     # maximum
              med.FID = median(FID),  # median
              sd.FID = sd(FID),       # standard deviation
              var.FID = var(FID),     # variance
              n.FID = n())            # sample size

Rearranging and manipulating data

Summarising data

Summarizing data with dplyr

We can add other measurements to this:

      mean.FID min.FID max.FID med.FID   sd.FID var.FID n.FID
    1 11.82639       1      30      10 8.082036 65.3193   144

Rearranging and manipulating data

Summarising data

Summarizing data with dplyr

Get summaries for each species

Before you calculate summaries, you have to apply the group_by() function

    Bird_Behaviour_by_Species <- group_by(Bird_Behaviour, Species)

Rearranging and manipulating data

Summarising data

Summarizing data with dplyr

After we applied the group_by() function, we can get summaries for each species

    Summary.species <- summarise(Bird_Behaviour_by_Species,
                                 mean.FID = mean(FID),   # mean
                                 min.FID = min(FID),     # minimum
                                 max.FID = max(FID),     # maximum
                                 med.FID = median(FID),  # median
                                 sd.FID = sd(FID),       # standard deviation
                                 var.FID = var(FID),     # variance
                                 n.FID = n())            # sample size

Rearranging and manipulating data

Summarising data

Summarizing data with dplyr

    as.data.frame(Summary.species)

      Species           mean.FID min.FID max.FID med.FID sd.FID var.FID n.FID
    1 Fringilla_coelebs    20.44       5      30      21   6.31   39.83    48
    2 Passer_domesticus     6.10       1      10       7   3.12    9.71    48
    3 Passer_montanus       8.94       1      20       8   5.61   31.51    48
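
Grouped summaries can be reproduced on a small example. This sketch assumes dplyr is installed; birds is a hypothetical data frame with two made-up species labels, not the course data.

```r
library(dplyr)

# Hypothetical data with two made-up species labels (assumption)
birds <- data.frame(Species = c("sp_a", "sp_a", "sp_b", "sp_b"),
                    FID = c(4, 6, 10, 20),
                    stringsAsFactors = FALSE)

# Step 1: declare the grouping; the data themselves are unchanged
by_species <- group_by(birds, Species)

# Step 2: summarise() now returns one row per group
species_summary <- summarise(by_species,
                             mean.FID = mean(FID),
                             sd.FID = sd(FID),
                             n.FID = n())

# Convert to a plain data frame for printing
species_summary <- as.data.frame(species_summary)
print(species_summary)
```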

Rearranging and manipulating data

Which R functions did we learn?

gather()       takes multiple columns and collapses them into key-value pairs
spread()       spreads a key-value pair across multiple columns
melt()         reshapes wide format to long format
dcast()        reshapes long format to wide format
inner_join()   joins data and returns all rows from x where there are matching values in y, and all columns from x and y
separate()     separates a single column into multiple columns
unite()        pastes multiple columns into one
subset()       returns subsets which meet certain conditions
slice()        selects rows by position
filter()       extracts rows that meet logical criteria
sample_frac()  randomly selects a fraction of rows

Rearranging and manipulating data

Which R functions did we learn?

sample_n()     randomly selects n rows
select()       selects columns by name or helper function
summarise()    summarises multiple values to a single value
mean()         computes the arithmetic mean
min()          returns the minimum of the input values
max()          returns the maximum of the input values
median()       computes the median
sd()           computes the standard deviation
var()          computes the variance
n()            returns the number of rows
group_by()     takes an existing table and converts it into a grouped table where operations are performed "by group"
View()         invokes a spreadsheet-style data viewer on a matrix-like object