Introduction | Applying RR | Discussion & conclusion
Reproducible Research in Ecology with R: distribution of threatened mammals in Equatorial Guinea
María V. Jimenez-Franco ([email protected]), Chele Martínez-Martí, Jose F. Calvo, Jose A. Palazon
Department of Ecology and Hydrology, University of Murcia (Spain)
10 July 2013
María V. Jimenez-Franco et al., Reproducible Research in Ecology with R: . . ., Univ Murcia
1 Introduction
2 Applying RR
3 Discussion & conclusion
Problems | Solution? | Our ecological study
Our problem
Scientific studies:
Announce a result
Convince readers that the result is correct
Do scientific studies allow readers to repeat and extend the analytical process?
This situation sounds familiar to many of us. Don't you agree?
RR: a solution?
What is Reproducible Research (RR)?
The ability to repeat the calculations used to analyze the data and obtain the computational results
Why is RR so important?
Describe the results and provide a protocol clear enough to allow successful repetition and extension of papers
Coordinate different researchers
Learn a new protocol or analysis (teaching: students and beginners)
RR in R
Why is RR so important when we use R?
The method involves complex steps:
Preprocessing the data (standardizing them)
Building the models to test their efficacy
Building figures/graphics/maps with the main results
A wide range of software tools and packages (often combined in unusual or novel ways)
Data sets are often analyzed many times, with modifications to the methods and parameters, until the final results are produced
Reproducible electronic document: a document that compiles the methods (software components and the precise details of their use) together with the results in a standardized form
Aims of the ecological study:
1. to estimate the probabilities of occupancy (ψ) and detection (p) of large mammals in Equatorial Guinea based on ecological and social covariates, and
2. to map species-specific occurrence probability to identifypriority areas for conservation of large mammals.
Study area: Equatorial Guinea
The rectangular Rio Muni region covers 26 000 km²
We divided the study area into 225 sample units of 5 × 5 km
The units were defined within the hunting area (the final study area is 21.6% of the region)
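As a quick consistency check on the figures above, the sampled area and its share of the region can be computed in base R. This sketch assumes the 225 units are non-overlapping 5 × 5 km squares and that the 21.6% refers to the 26 000 km² Rio Muni region; the variable names are illustrative only.

```r
# Worked check of the study-area arithmetic (assumes 225 non-overlapping
# 5 x 5 km squares inside the 26 000 km2 Rio Muni region)
n_units     <- 225
unit_side   <- 5        # km
region_area <- 26000    # km2

study_area <- n_units * unit_side^2           # total sampled area in km2
share      <- 100 * study_area / region_area  # percentage of the region

study_area          # 5625
round(share, 1)     # 21.6
```

So 225 × 25 km² = 5625 km², and 5625 / 26 000 ≈ 21.6%, which matches the percentage quoted on the slide.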
Study species
Golden cat (Caracal aurata)
Leopard (Panthera pardus)
Forest buffalo (Syncerus caffer)
Forest elephant (Loxodonta cyclotis)
Mandrill (Mandrillus sphinx)
Gorilla (Gorilla gorilla)
Chimpanzee (Pan troglodytes)
Ecological methods: Research Team A
[Workflow diagram. Team A: hunter interviews, sampling of species presence/absence data. Team B: GRASS, GIS data and environmental maps. Team C: unmarked, site occupancy models.]
Conduct hunter interviews in the 225 sample units
Between April 13 and October 16, 2010
To record presence/absence data for the study species
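Occupancy models need the presence/absence records organized as a site × survey matrix. The sketch below assumes that each interview within a sample unit is treated as one repeated survey (the slides do not say how repeated surveys were defined) and uses invented values. It also computes the naive occupancy estimate, which tends to underestimate ψ when detection is imperfect.

```r
# Hypothetical detection histories: rows = sample units, columns = interviews
# (1 = species reported present, 0 = not reported, NA = no interview).
# The values are invented for illustration only.
y <- matrix(c(1, 0, 1,
              0, 0, 0,
              0, 1, NA,
              0, 0, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("unit", 1:4), paste0("int", 1:3)))

# A unit counts as "detected" if the species was reported in any interview
detected <- apply(y, 1, function(h) any(h == 1, na.rm = TRUE))

# Naive occupancy ignores imperfect detection, so it tends to underestimate psi
naive_psi <- mean(detected)
naive_psi   # 0.5
```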
Ecological methods: Research Team B
Landscape characteristics for each 5 × 5 km sample unit (elevation, ruggedness and forest land use with tree cover above 60%)
Human influence (density of human settlements in each 5 × 5 km sample unit)
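Before modelling, covariates such as these are usually standardized (the preprocessing step mentioned in the introduction). A minimal base-R sketch with a hypothetical covariate table; the column names, units and values are invented for illustration and are not the study's data.

```r
# Hypothetical covariate table, one row per 5 x 5 km sample unit
# (names, units and values invented for illustration)
covs <- data.frame(
  elevation   = c(120, 450, 800, 300),     # m
  ruggedness  = c(2.1, 5.4, 9.8, 3.3),     # arbitrary index
  forest      = c(0.70, 0.85, 0.95, 0.40), # hypothetical forest-cover fraction
  settlements = c(0.8, 0.2, 0.0, 1.5)      # hypothetical settlements per km2
)

# Standardize each covariate to mean 0 and standard deviation 1 so that
# coefficients are on comparable scales and model fitting is better behaved
covs_std <- as.data.frame(scale(covs))

round(colMeans(covs_std), 10)   # all 0
apply(covs_std, 2, sd)          # all 1
```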
Ecological methods: Research Team C
Establish single-season site occupancy models
to estimate species occurrence (e.g., the number of occupied sites) as a function of site-level covariates,
using a logit link on the U covariates associated with site i
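The single-season model can be sketched with its standard likelihood: a site is occupied with probability ψ_i and then detected in each survey with probability p, or it is unoccupied and never detected. Below, that likelihood is written in base R and fitted with optim() on simulated data. The constant p, the single covariate, and the simulation settings are assumptions for illustration; in the study itself the models were fitted with the unmarked package.

```r
# Minimal single-season occupancy model fitted by maximum likelihood in base R.
# Teaching sketch, not the unmarked implementation: constant p,
# logit(psi_i) = b0 + b1 * x_i, and no missing surveys.
occu_nll <- function(par, y, x) {
  psi <- plogis(par[1] + par[2] * x)  # occupancy probability per site
  p   <- plogis(par[3])               # detection probability (constant)
  det <- rowSums(y)                   # number of detections per site
  J   <- ncol(y)                      # surveys per site
  # Probability of the observed history given that the site is occupied ...
  lik_occ <- p^det * (1 - p)^(J - det)
  # ... plus the chance that an all-zero history comes from an empty site
  lik <- psi * lik_occ + (1 - psi) * (det == 0)
  -sum(log(lik))                      # negative log-likelihood
}

# Simulate data with known parameters (b0 = -0.5, b1 = 1, p = 0.6)
set.seed(1)
n_sites <- 225; J <- 3
x <- rnorm(n_sites)                                       # standardized covariate
z <- rbinom(n_sites, 1, plogis(-0.5 + 1 * x))             # true occupancy state
y <- matrix(rbinom(n_sites * J, 1, 0.6 * z), n_sites, J)  # detection histories

fit <- optim(c(0, 0, 0), occu_nll, y = y, x = x, method = "BFGS")
est <- c(b0 = fit$par[1], b1 = fit$par[2], p = plogis(fit$par[3]))
round(est, 2)
```

Recovering values close to the simulated ones is a quick sanity check before fitting the same structure to real interview data.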
Ecological methods: Research Team C
Model: $\operatorname{logit}(\psi_i) = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_U x_{iU}$, where $\beta_0$ is the intercept (constant term) and $\beta_1, \dots, \beta_U$ are the regression coefficients for the covariates
R package unmarked (Fiske and Chandler, 2011)
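The logit-link model above can be evaluated directly: build the linear predictor from the covariates, then back-transform it with the inverse logit. The coefficients, covariate names and values below are hypothetical, not estimates from the study.

```r
# Linear predictor on the logit scale for U = 3 covariates, back-transformed
# to psi. Coefficients and covariate values are hypothetical.
beta <- c(b0 = -0.4, elevation = 0.8, forest = 1.2, settlements = -0.9)

X <- cbind(1,                            # intercept column
           elevation   = c(-1, 0, 1.5),  # standardized covariates,
           forest      = c(0.5, 0, 1),   # one row per site i
           settlements = c(1, 0, -0.5))

eta <- drop(X %*% beta)   # logit(psi_i) = b0 + sum_u b_u * x_iu
psi <- plogis(eta)        # inverse logit: 1 / (1 + exp(-eta))
round(psi, 3)
```

For the second site all covariates are 0, so its occupancy probability is just the back-transformed intercept, plogis(-0.4).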
Our procedural problems | Our solution | Results of RR in our study
Needs
1. Information had to be fed back among the research teams:
We refitted the models to include some new species and exchanged some covariates (forest area instead of river area)
Needs
2. Share information:
To estimate the average probability of occupancy for each species over the whole study area
To draw occurrence maps for each species based on the site occupancy models with the covariates
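Once a model's coefficients are known, an occurrence map is the model evaluated cell by cell over the covariate maps, and the study-area average is the mean over cells. A base-R sketch, assuming small hypothetical covariate grids stored as matrices and invented coefficients; the study itself worked with GIS layers and the raster package.

```r
# Hypothetical covariate "maps" stored as matrices (one cell per sample unit),
# standardized values invented for illustration
forest      <- matrix(c(1.0,  0.5, -0.2,
                        0.8,  0.0, -1.0,
                        0.3, -0.5, -1.5), nrow = 3, byrow = TRUE)
settlements <- matrix(c(-1.0, -0.5, 0.0,
                        -0.8,  0.2, 1.0,
                         0.0,  0.5, 1.5), nrow = 3, byrow = TRUE)

# Hypothetical fitted coefficients of one occupancy model
b0 <- -0.2; b_forest <- 1.1; b_settl <- -0.7

# Cell-by-cell occupancy probability: this matrix is the occurrence "map"
psi_map <- plogis(b0 + b_forest * forest + b_settl * settlements)

# Average probability of occupancy over the whole (toy) study area
mean_psi <- mean(psi_map)
round(mean_psi, 3)
# image(psi_map) would draw a quick, unprojected preview of the map
```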
Our study problems
We applied model averaging to the best occupancy models obtained, using the covariate maps.
$\psi = \dfrac{\psi_{M1} w_{M1} + \psi_{M2} w_{M2} + \dots + \psi_{Mn} w_{Mn}}{\sum_{i=1}^{n} w_{Mi}}$
R package raster (Hijmans and Van Etten, 2012)
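The model-averaging formula above is a weighted mean of the per-model predictions. The slides do not say how the weights w_Mi were obtained; this sketch assumes Akaike weights, and the AIC values and ψ predictions are hypothetical.

```r
# Model-averaged occupancy following the formula above, ASSUMING the weights
# w_Mi are Akaike weights (the slides do not state how they were derived).
akaike_weights <- function(aic) {
  delta <- aic - min(aic)   # AIC difference from the best model
  w <- exp(-0.5 * delta)
  w / sum(w)                # normalized so the weights sum to 1
}

model_average <- function(psi, w) {
  # psi: one column per model, one row per site or map cell; w: model weights
  drop(psi %*% w) / sum(w)
}

# Hypothetical AIC values and per-site psi predictions from three models
aic <- c(M1 = 210.3, M2 = 211.1, M3 = 214.8)
psi <- cbind(M1 = c(0.62, 0.35, 0.10),
             M2 = c(0.58, 0.40, 0.15),
             M3 = c(0.70, 0.30, 0.05))

w <- akaike_weights(aic)
psi_avg <- model_average(psi, w)
round(w, 3)
round(psi_avg, 3)
```

Dividing by the sum of the weights keeps the function faithful to the formula even if the weights supplied are not already normalized.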
Our study solution
We need to document the analytical process in order to finish the study properly
Our study solution
[Workflow diagram. Team A: hunter interviews, sampling of species presence/absence data. Team B: GRASS, GIS data and environmental maps, covariate maps. Team C: unmarked, site occupancy models. The model averaging technique (raster) produces the occupancy probability model maps. The whole workflow is enclosed in Reproducible Research.]
Our study solution
We implemented RR using the markdown language and the R package knitr to:
Run and include the R code of the statistical analyses and spatial data
Include explanations of the analyses
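A minimal sketch of this knitr workflow: an R Markdown file that mixes explanation with a code chunk is written to a temporary directory and knitted to markdown, so the output holds the text, the code and its result together. The file names and chunk content are hypothetical, and the knitting step only runs if knitr is installed.

```r
# Minimal reproducible-document sketch. File names and content are hypothetical.
rmd_lines <- c(
  "# Probability of occupancy",
  "",
  "Model-averaged occupancy for a toy set of three models:",
  "",
  "```{r}",
  "w   <- c(0.5, 0.3, 0.2)",
  "psi <- c(0.62, 0.58, 0.70)",
  "sum(psi * w) / sum(w)",
  "```"
)

rmd_file <- file.path(tempdir(), "occupancy.Rmd")
writeLines(rmd_lines, rmd_file)

if (requireNamespace("knitr", quietly = TRUE)) {
  # knit() evaluates the chunk and writes text + code + printed result to .md
  md_file <- knitr::knit(rmd_file,
                         output = file.path(tempdir(), "occupancy.md"),
                         quiet = TRUE, envir = new.env())
  cat(readLines(md_file), sep = "\n")
}
```

Rerunning the same file after changing the data or the model regenerates every number in the document, which is the point of a reproducible electronic document.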
Our reproducible research electronic document
This document describes the theory and the procedures used
It was written in markdown using the R package knitr
Our document index:
Abstract
Introduction
Study area
Data and information (data and ad hoc functions)
Values of probability of occupancy (for each species)
Functions for simple writing: example
Data and information
How to calculate the probability of occupancy (ψ) through model averaging
Maps with the covariates for the models
Functions
Function modAve
Function proceso
Function mplot
Simple writing: example
Values of probability of occupancy (ψ)
Golden cat (GC)
Leopard (L)
Elephant (E)
Buffalo (B)
Gorilla (G)
Chimpanzee (CH)
Mandrill (M)
Simple reading: example
Discussion | Conclusions
Principles
A good beginning makes a good end
Put your best foot forward
Work options
LaTeX with Sweave (Leisch, 2002) is a useful tool for producing automatic documents.
Advantages of markdown:
Allows you to write in an easy-to-read, easy-to-write plain text format, then convert it to structurally valid XHTML (or HTML)
Useful for beginners: e.g., this presentation
Can be used with different programs: bash (GRASS), R, awk, Python, Perl, . . .
RR is important to
Coordinate different research teams.
Standardize the analytical process so that it can easily be reused in future applications of this and similar studies.
Obtain a reproducible electronic document for a better understanding of the analyses.
The reproducible electronic document
Can be easily read and understood by the same authors, even after a long time.
Can be reused or modified for similar studies, and is therefore useful for other researchers.
Is a useful tool that facilitates learning and working with R procedures. This process of compiling the methods in a document could be applied not only by ecologists and researchers in other scientific areas but also by beginners and students in their bachelor's and master's degrees.
Preparing the first document may take some time.
Markdown is a very suitable and straightforward way to make this document.
Acknowledgements
Panthera and Conservation International for funding and supporting the field work in Equatorial Guinea.
A. Mang for assistance in field work and local hunters for collaborating in the interviews.
A. Royle for support with the site occupancy models.
M. V. Jimenez-Franco is supported by an FPU grant from the Spanish Ministerio de Educacion y Ciencia (reference AP2009-2073).
References
Fiske, I., Chandler, R. (2011). unmarked: An R package for fitting hierarchical models of wildlife occurrence and abundance. J. Stat. Softw. 43(10), 1–23.
Hijmans, R.J., Van Etten, J. (2012). Geographic analysis and modeling with raster data. URL http://cran.r-project.org/web/packages/raster/raster.pdf
Martínez-Martí, C. (2011). The leopard (Panthera pardus) and the golden cat (Caracal aurata) in Equatorial Guinea: A national assessment of status, distribution and threat. Annual report submitted to Panthera/Conservation International.
Leisch, F. (2002). Sweave: Dynamic generation of statistical reports using literate data analysis. In W. Härdle and B. Rönz (Eds.), Compstat 2002 - Proceedings in Computational Statistics, pp. 575–580. Physica Verlag, Heidelberg. ISBN 3-7908-1517-9.
R Core Team (2013). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org/
Xie, Y. (2013). knitr: A general-purpose package for dynamic report generation in R. R package version 1.1, URL http://yihui.name/knitr
Thanks!
Some questions?