Reproducible Research in Ecology with R: distribution of...



Source slides: edii.uclm.es/~useR-2013/slides/131.pdf

Introduction | Applying RR | Discussion & conclusion

Reproducible Research in Ecology with R: distribution of threatened mammals in Equatorial Guinea

María V. Jiménez-Franco ([email protected]), Chele Martínez-Martí, José F. Calvo, José A. Palazón

Department of Ecology and Hydrology, University of Murcia (Spain)

10 July 2013

María V. Jiménez-Franco et al. | Reproducible Research in Ecology with R: ... | Univ. Murcia


1 Introduction

2 Applying RR

3 Discussion & conclusion


Problems | Solution? | Our ecological study

Our problem

Scientific studies:

Announce a result

Convince readers that the result is correct

Do scientific studies allow readers to repeat and extend the analytical process?

This type of situation sounds familiar to many of us. Don’t you agree?


RR: a solution?

What is Reproducible Research (RR)?

The ability to repeat the calculations for analyzing the data and obtaining the computational results

Why is RR so important?

Describe the results and provide a clear enough protocol to allow successful repetition and extension of papers

Coordinate different researchers

Learn a new protocol or analysis (teaching: students and beginners)


RR in R

Why is RR so important when we use R?

The method involves complex steps:

Preprocessing of the data (standardizing them)

Building the models to test their efficacy

Building figures/graphics/maps with the main results

A wide range of software tools and packages (often combined in unusual or novel ways)

Data sets are often analyzed many times, with modifications to the methods and parameters, until the final results are produced

Reproducible electronic document: a document that compiles the methods (software components and the precise details of their use) together with the results in a standardized form


The ecological study aims

1. to estimate the probabilities of occupancy (ψ) and detection (p) of large mammals in Equatorial Guinea based on ecological and social covariates, and

2. to map species-specific occurrence probability to identify priority areas for conservation of large mammals.


Study area: Equatorial Guinea

26 000 km² rectangular-shaped Río Muni region

We divided the study area into 225 sample units of 5 × 5 km

Sample units were defined within the hunting area (the final study area is 21.6% of the region)
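The 21.6% figure is consistent with the grid arithmetic. A quick check, sketched in plain Python purely for illustration (the study itself worked in R and GRASS); the variable names are hypothetical:

```python
# Check the share of the Rio Muni region covered by the sample units.
region_km2 = 26_000      # approximate area of the region (from the slide)
n_units = 225            # number of sample units
unit_side_km = 5         # each unit is 5 x 5 km

sampled_km2 = n_units * unit_side_km ** 2
share = sampled_km2 / region_km2

print(f"{sampled_km2} km2 sampled, {share:.1%} of the region")
```

That is 5625 km² in total, which rounds to 21.6% of 26 000 km².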


Study species

Golden cat (Caracal aurata)

Leopard (Panthera pardus)

Forest elephant (Loxodonta cyclotis)

Forest buffalo (Syncerus caffer)

Mandrill (Mandrillus sphinx)

Gorilla (Gorilla gorilla)

Chimpanzee (Pan troglodytes)


Ecological methods: Research Team A

[Workflow diagram. Team A: sampling, presence/absence species data (hunter interviews). Team B: GIS from data and environmental maps (GRASS). Team C: site occupancy models (unmarked).]

Conduct hunter interviews in the 225 sample units

Between April 13 and October 16, 2010

To record presence/absence data for the study species


Ecological methods: Research Team B


Landscape characteristics (elevation, ruggedness, and land use as forest with more than 60% tree cover, for each 5 × 5 km sample unit)

Human influence (density of human settlements in each 5 × 5 km site)


Ecological methods: Research Team C


Establish single-season site occupancy models

To estimate species occurrence (e.g., number of occupied sites) as a function of site-level covariates

Occupancy is modeled through a logit link function of U covariates associated with site i
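For readers new to occupancy models, the following is a minimal sketch of the standard single-season likelihood for one site, written in plain Python for illustration only. It assumes a constant detection probability p and treats each interview as a replicate survey; the slides do not state these details, and the function name and example histories are hypothetical. The study itself fitted its models with the R package unmarked.

```python
from math import prod

def site_likelihood(history, psi, p):
    """Likelihood of one site's detection history (1 = detected, 0 = not)."""
    # Probability of this exact history if the site is occupied.
    occupied_part = prod(p if y == 1 else 1.0 - p for y in history)
    # An unoccupied site can only produce an all-zero history.
    all_zero = 1.0 if sum(history) == 0 else 0.0
    return psi * occupied_part + (1.0 - psi) * all_zero

# Hypothetical histories for two sites over three interviews.
print(site_likelihood([0, 1, 0], psi=0.6, p=0.5))
print(site_likelihood([0, 0, 0], psi=0.6, p=0.5))
```

The second history has the larger likelihood because a site with no detections may be either unoccupied or occupied but missed. This is why occupancy (ψ) and detection (p) are estimated jointly.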


Model:

$$\operatorname{logit}(\psi_i) = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_U x_{iU}$$

where β0 is the intercept (constant term) and β1, ..., βU are the regression coefficients for each covariate

R package unmarked (Fiske and Chandler, 2011)
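To read such a model, the logit is inverted to recover the occupancy probability of a site. A minimal sketch in plain Python, for illustration only: the coefficients and covariate values are hypothetical, not estimates from the study, and the function name is an assumption.

```python
from math import exp

def occupancy_probability(intercept, coefficients, covariates):
    """Invert logit(psi) = b0 + sum(b_u * x_u) to get psi for one site."""
    linear_predictor = intercept + sum(
        b * x for b, x in zip(coefficients, covariates)
    )
    return 1.0 / (1.0 + exp(-linear_predictor))

# Hypothetical coefficients for two standardized covariates
# (e.g. forest cover and settlement density); not values from the study.
psi = occupancy_probability(0.4, [1.2, -0.8], [0.5, 1.0])
print(round(psi, 3))
```

Here the linear predictor is 0.4 + 0.6 - 0.8 = 0.2, which maps to an occupancy probability of about 0.55.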


Our procedural problems | Our solution | Results of RR in our study

Needs


1. Feedback among the research teams was needed:

We refitted the models to include some new species

We exchanged some covariates (forest area instead of river area)


2. Share information:

To estimate the average probability of occupancy for each species across the whole study area

To draw the occurrence maps for each species based on site occupancy models with the covariates


Our study problems


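We applied the model averaging technique to the best occupancy models obtained, using the covariate maps.

$$\psi = \frac{\psi_{M_1} w_{M_1} + \psi_{M_2} w_{M_2} + \dots + \psi_{M_n} w_{M_n}}{\sum_{i=1}^{n} w_{M_i}}$$

where ψ_{M_i} is the occupancy predicted by model M_i and w_{M_i} is the weight of that model

R package raster (Hijmans and Van Etten, 2012)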
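The averaging formula is applied cell by cell to the predicted occupancy maps. The sketch below uses plain Python nested lists as stand-ins for raster layers; it is an illustration only, not the authors' code. The function name, the 2 × 2 maps, and the weights are hypothetical, and the slides do not say how the weights were obtained (AIC weights are a common choice).

```python
def model_average(psi_maps, weights):
    """Cell-by-cell weighted mean of occupancy maps, one map per model."""
    total_weight = sum(weights)
    n_rows, n_cols = len(psi_maps[0]), len(psi_maps[0][0])
    return [
        [
            sum(w * m[r][c] for m, w in zip(psi_maps, weights)) / total_weight
            for c in range(n_cols)
        ]
        for r in range(n_rows)
    ]

# Hypothetical 2 x 2 occupancy maps predicted by two candidate models,
# with hypothetical model weights.
map_m1 = [[0.2, 0.4], [0.6, 0.8]]
map_m2 = [[0.4, 0.4], [0.2, 1.0]]
averaged = model_average([map_m1, map_m2], weights=[3.0, 1.0])
print(averaged)
```

With weights 3 and 1, the top-left cell becomes (3 × 0.2 + 1 × 0.4) / 4 = 0.25. Cells where both models agree, such as the top-right one, are unchanged.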


Our study solution

We need to document the analytical process in order to finish the study properly


[Workflow diagram, extended. The three teams' steps (Team A: sampling, presence/absence species data, hunter interviews; Team B: GIS from data and environmental maps, GRASS; Team C: site occupancy models, unmarked) now also include covariate maps, probability-of-occupancy model maps, and the model averaging technique (raster), all framed as Reproducible Research.]


We applied RR using the Markdown language and the R package knitr to:

Run and include the R code for the statistical analyses and spatial data

Include explanations of the analyses


Our reproducible research electronic document

This document describes the theory and the procedures used

It was written in Markdown using the R package knitr

Our document index:

Abstract

Introduction

Study area

Data and information (data and ad hoc functions)

Values of probability of occupancy (for each species)


Functions for simple writing: example

Data and information

How to calculate the probability of occupancy (ψ) through model averaging

Maps with the covariates for the models

Functions

Function modAve

Function proceso

Function mplot


Simple writing: example

Values of probability of occupancy (ψ)

Golden cat (GC)

Leopard (L)

Elephant (E)

Buffalo (B)

Gorilla (G)

Chimpanzee (CH)

Mandrill (M)


Simple reading: example


Discussion | Conclusions

Principles

A good beginning makes a good end

Put your best foot forward


Work options

LaTeX with Sweave (Leisch, 2002) is a useful tool for producing documents automatically.

Advantages of markdown:

Allows you to write using an easy-to-read, easy-to-write plain text format, then convert it to structurally valid XHTML (or HTML)

Useful for beginners: e.g., this presentation

Can be used with different programs: bash (GRASS), R, awk, Python, Perl, ...


RR is important to

Coordinate different research teams.

Homogenize the analytical process so it is easy to reuse in future applications of this and similar studies.

Obtain a reproducible electronic document for a better understanding of the analyses.


The reproducible electronic document

Can be read and understood easily by the same authors after a long period of time.

Can be reused or modified for other similar studies, so it is also useful for other researchers.

A useful tool that facilitates learning and working with R procedures. This process of compiling the methods in a document can be applied not only by ecologists and researchers in other scientific areas but also by beginners and students in their undergraduate and master's degrees.

Producing the first document can take some time.

Markdown is a very suitable and straightforward way to make this document.


Acknowledgements

Panthera and Conservation International for funding and supporting the field work in Equatorial Guinea.

A. Mang for assistance in field work and local hunters for collaborating with the interviews.

A. Royle for support with the site occupancy models.

M. V. Jiménez-Franco is supported by an FPU grant from the Spanish Ministerio de Educación y Ciencia (reference AP2009-2073).


References

Fiske, I., Chandler, R. (2011). unmarked: An R package for fitting hierarchical models of wildlife occurrence and abundance. J. Stat. Softw. 43(10), 1–23.

Hijmans, R.J., Van Etten, J. (2012). Geographic analysis and modeling with raster data. URL http://cran.r-project.org/web/packages/raster/raster.pdf

Martínez-Martí, C. (2011). The leopard (Panthera pardus) and the golden cat (Caracal aurata) in Equatorial Guinea: A national assessment of status, distribution and threat. Annual report submitted to Panthera/Conservation International.

Leisch, F. (2002). Sweave: Dynamic generation of statistical reports using literate data analysis. In W. Härdle and B. Rönz (eds.), Compstat 2002 - Proceedings in Computational Statistics, pp. 575-580. Physica Verlag, Heidelberg. ISBN 3-7908-1517-9.

R Core Team (2013). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org/

Xie, Y. (2013). knitr: A general-purpose package for dynamic report generation in R. R package version 1.1, URL http://yihui.name/knitr


Thanks!

Any questions?
