Interacting with the REDCap API using the REDCapR Package Thomas Wilson, Will Beasley, David Bard University of Oklahoma Health Sciences Center Pediatrics Dept, Biomedical & Behavioral Methodology Core (BBMC) REDCap Con Sept 23, 2014

Interacting with the REDCap API using the REDCapR Package

Thomas Wilson, Will Beasley, David Bard

University of Oklahoma Health Sciences Center

Pediatrics Dept, Biomedical & Behavioral Methodology Core (BBMC)

REDCap Con, Sept 23, 2014

Accessing REDCap Data

1. Manual Import & Export (eg, through CSV files)

– Requires human interaction every time.

2. Dynamic Data Pull (pull data from an external system)

3. REDCap’s API (application programming interface)

– The API lets programs (e.g., R, SAS, Python) interact with REDCap directly, without a human in the loop.

4. REDCapR (an R package) calls REDCap’s API

– Provides functions that wrap around calls to the API.
– Write 1 line of R code instead of ~40 lines.

Python Interaction Packages

1. PyCap "PyCap is an interface to the REDCap Application Programming Interface (API). PyCap is designed to be a minimal interface exposing all required and optional API parameters." (sburns.org/PyCap)

2. django-redcap "Utilities for porting REDCap projects to and from Django models." (github.com/cbmi/django-redcap)

R Interaction Packages

1. redcapAPI This is the most active fork of Jeffrey Horner’s ‘redcap’ package, now developed by Benjamin Nutter. A complete list of forks can be found through GitHub. (github.com/nutterb/redcapAPI)

2. REDCapR A similar package that also streamlines API calls from R to REDCap. (github.com/OuhscBbmc/REDCapR)

Required pieces of information for API

1. The URL of the REDCap server.

2. A “token”, which is a hash that combines:
– The specific REDCap project (within the REDCap server).
– The specific user.
– The user’s password.

Security

• Could spend 4 hours discussing security details.
– Consult REDCap IT staff and/or our team.

• Use a private GitHub repository. (free for academics)

• Be careful with REDCap tokens. (ie, passwords)

• Get PHI into REDCap & SQL as early as possible.
– We regularly receive CSVs & XLSXs from partners.
– DB files aren’t accidentally copied or emailed.
– And try to store derivative datasets in REDCap & SQL instead of on the file server.
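One way to be careful with tokens is to keep them out of the script itself. The sketch below is only an illustration and is not part of REDCapR: it assumes the token has been stored in an environment variable, and both the variable name REDCAP_API_TOKEN and the helper read_token_from_env are hypothetical.

```r
# Hypothetical helper (not a REDCapR function): read an API token from an
# environment variable so that it never appears in version-controlled code.
read_token_from_env <- function(env_name = "REDCAP_API_TOKEN") {
  token <- Sys.getenv(env_name, unset = "")
  if (!nzchar(token)) {
    stop("The environment variable `", env_name, "` is not set.")
  }
  token
}

# Demonstration with a made-up value. A real token would be set outside of
# the script (for example in a user-level .Renviron file that is never committed).
Sys.setenv(REDCAP_API_TOKEN = strrep("0", 32))
token <- read_token_from_env()
nchar(token)
```

The helper stops with an error when the variable is missing, so a script fails before it ever contacts the server with an empty token.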

R is a free software environment for statistical computing and graphics.

R compiles and runs on a wide variety of UNIX platforms, Windows and MacOS.

For more information: www.r-project.org

About R

What is this REDCapR you speak of?

REDCapR is an R package developed to streamline API calls from R to REDCap by encapsulating various functions.

REDCapR

“Necessity is the mother of invention” (English proverb)

REDCapR was born out of the necessity of breaking one large data call from REDCap, which is prone to timing out, into multiple small calls to REDCap. From the user perspective, the data call has the look of one call. From REDCap’s perspective, the data call is multiple smaller calls that are later assembled.

Created to help with the Maternal Infant and Early Childhood Home Visiting (MIECHV) evaluation.

REDCapR History

Our current MIECHV investigation uses two REDCap projects:
• Recruiting: 84,000 records and 204 fields (17 million EAV rows)
• Community Survey: 1,500 records and 2,330 fields (3.5 million EAV rows)

• We were constantly timing out the operations and tying up the server.

• Timeouts make run times unpredictable and repeatedly cause an unintended denial of service (DoS) for our own people.
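The EAV row counts quoted for the two projects are consistent with records × fields. The short check below assumes one EAV row per record-field pair, which is an upper bound if empty values are not stored; the helper name eav_rows is hypothetical.

```r
# Hypothetical helper: upper bound on EAV rows, assuming one row per
# record-field combination.
eav_rows <- function(records, fields) {
  records * fields
}

recruiting <- eav_rows(records = 84000, fields = 204)   # about 17 million
community  <- eav_rows(records = 1500,  fields = 2330)  # about 3.5 million

format(c(recruiting = recruiting, community = community), big.mark = ",")
```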

Motivation for REDCapR

REDCapR Installation

### Read short intro at
# https://github.com/OuhscBbmc/REDCapR

### Choice 1: Either install the stable version from CRAN
install.packages("REDCapR")

### Choice 2: Or install the development version from GitHub
install.packages("devtools")
devtools::install_github(repo="OuhscBbmc/REDCapR")

### Load the 'REDCapR' package into R's memory
# so the functions are more easily accessible.
library(REDCapR)

create_batch_glossary
redcap_column_sanitize
redcap_download_file_oneshot
redcap_project
redcap_read
redcap_read_oneshot
redcap_upload_file_oneshot
redcap_write
redcap_write_oneshot
retrieve_token
validate_for_write

REDCapR Functions

Data extraction:

redcap_read_oneshot: Reads/exports records from a REDCap project.

redcap_read: Reads/exports records from a REDCap project in subsets, and stacks them together before returning a data.frame.

REDCapR Data Extraction

Usage

### Sample Code
redcap_read(batch_size = 100L, interbatch_delay = 0.5,
  redcap_uri, token, records = NULL, records_collapsed = NULL,
  fields = NULL, fields_collapsed = NULL,
  export_data_access_groups = FALSE, raw_or_label = "raw",
  verbose = TRUE, cert_location = NULL)

Several arguments of the redcap_read function are discussed next, but not all of them are required. The function can be called with a statement as simple as:

redcap_read(redcap_uri, token)

redcap_read

batch_size: The maximum number of subject records a single batch should contain. The default is 100.

interbatch_delay: The number of seconds the function will wait before requesting a new subset from REDCap. The default is 0.5 seconds.

redcap_uri: The URI of the REDCap project. Required.

Note: In computing, a uniform resource identifier (URI) is a string of characters used to identify a name of a web resource. Such identification enables interaction with representations of the web resource over a network (typically the World Wide Web) using specific protocols.

token: The user-specific string that serves as the password for a project. Required.

records: An array, where each element corresponds to the ID of a desired record. Optional.

fields: An array, where each element corresponds to a desired project field. Optional.

export_data_access_groups: A boolean value that specifies whether or not to export the “redcap_data_access_group” field when data access groups are utilized in the project. Default is FALSE.

raw_or_label: A string (either ‘raw’ or ‘label’) that specifies whether to export the raw coded values or the labels for the options of multiple choice fields. Default is ‘raw’.

cert_location: If present, this string should point to the location of cert files required for SSL verification. If the value is missing or NULL, the server’s identity will be verified using a recent CA bundle from the cURL website. Optional.

Details

Specifically, redcap_read internally uses multiple calls to redcap_read_oneshot to select and return data. Initially, only the primary key is queried through the REDCap API. The resulting long list of IDs is then split into partitions, whose sizes are determined by the batch_size parameter. REDCap is then queried for all variables of each subset’s subjects. This is repeated for each subset, and the pieces are combined into a unified data.frame before returning. The function allows a delay between calls, which lets the server attend to other users’ requests.
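The batching steps described above can be sketched in base R. This is an illustration of the idea only, not REDCapR's actual implementation: the server is replaced by a small in-memory table, and the names fake_project, fetch_ids, fetch_records, partition_ids, and read_batched are all hypothetical.

```r
# Stand-in for the REDCap project (an assumption so the demo runs offline).
fake_project <- data.frame(
  record_id = 1:250,
  age       = rep(c(10L, 11L, 79L, 61L, 58L), times = 50)
)

# Stand-in for the first call: ask only for the primary key.
fetch_ids <- function() {
  fake_project$record_id
}

# Stand-in for one small call: all variables for one subset of subjects.
fetch_records <- function(ids) {
  fake_project[fake_project$record_id %in% ids, , drop = FALSE]
}

# Split the long list of IDs into partitions of at most `batch_size`.
partition_ids <- function(ids, batch_size = 100L) {
  if (length(ids) == 0L) return(list())
  unname(split(ids, ceiling(seq_along(ids) / batch_size)))
}

# Query each partition, pause between calls, and stack the pieces.
read_batched <- function(batch_size = 100L, interbatch_delay = 0.5) {
  batches <- partition_ids(fetch_ids(), batch_size)
  pieces  <- vector("list", length(batches))
  for (i in seq_along(batches)) {
    pieces[[i]] <- fetch_records(batches[[i]])
    if (i < length(batches)) Sys.sleep(interbatch_delay) # Let the server attend to others.
  }
  do.call(rbind, pieces)
}

ds <- read_batched(batch_size = 100L, interbatch_delay = 0)
nrow(ds)
```

From the caller's side this still looks like one call, while the stand-in server sees three smaller requests of 100, 100, and 50 records.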

Data import:

redcap_write_oneshot: writes data to REDCap all at once

redcap_write: writes data to REDCap in subsets

REDCapR Data Import

Usage

### Sample Code
redcap_write(ds_to_write, batch_size = 10L, interbatch_delay = 0.5,
  redcap_uri, token, verbose = TRUE)

This function shares many arguments with redcap_read. The new argument, ds_to_write, is the R data.frame that is going to be imported into a REDCap project.

redcap_write

Exporting records (less secure)

### Declare the address of the server and
# your token (ie, hash of project_id, username, password)
uri <- "https://bbmc.ouhsc.edu/redcap/api/"
token <- "9A81268476645C4E5F03428B8AC3AA7B"

### Call the server
result_read <- redcap_read(redcap_uri=uri, token=token)

### Extract the dataset from the results
ds <- result_read$data
ds

  record_id first_name age
1         1     Nutmeg  10
2         2     Tumtum  11
3         3     Marcus  79
4         4      Trudy  61
5         5   John Lee  58

Comparison against Minimal

### Call the server
rawCsvText <- RCurl::postForm(
  uri = uri,
  token = token,
  content = 'record',
  format = 'csv',
  type = 'flat',
  .opts = RCurl::curlOptions(ssl.verifypeer=FALSE)
)

### Convert raw text into a data.frame
ds <- read.csv(text=rawCsvText, stringsAsFactors=FALSE)

### Call the server
result <- redcap_read(redcap_uri=uri, token=token)

### Pull out the dataset from the results
ds <- result$data

Comparison without batching

redcap_read_oneshot <- function( redcap_uri, token, records=NULL, records_collapsed="",
                                 fields=NULL, fields_collapsed="",
                                 export_data_access_groups=FALSE, raw_or_label='raw',
                                 verbose=TRUE, cert_location=NULL ) {
  start_time <- Sys.time()

  if( missing(redcap_uri) )
    stop("The required parameter `redcap_uri` was missing from the call to `redcap_read_oneshot()`.")

  if( missing(token) )
    stop("The required parameter `token` was missing from the call to `redcap_read_oneshot()`.")

  if( nchar(records_collapsed)==0 )
    records_collapsed <- ifelse(is.null(records), "", paste0(records, collapse=",")) #This is an empty string if `records` is NULL.

  if( nchar(fields_collapsed)==0 )
    fields_collapsed <- ifelse(is.null(fields), "", paste0(fields, collapse=",")) #This is an empty string if `fields` is NULL.

  export_data_access_groups_string <- ifelse(export_data_access_groups, "true", "false")

  if( missing( cert_location ) | is.null(cert_location) | (length(cert_location)==0))
    cert_location <- system.file("cacert.pem", package="httr")

  if( !base::file.exists(cert_location) )
    stop(paste0("The file specified by `cert_location`, (", cert_location, ") could not be found."))

  config_options <- list(cainfo=cert_location, sslversion=3)

  post_body <- list(
    token = token,
    content = 'record',
    format = 'csv',
    type = 'flat',
    rawOrLabel = raw_or_label,
    exportDataAccessGroups = export_data_access_groups_string,
    records = records_collapsed,
    fields = fields_collapsed
  )

  result <- httr::POST(
    url = redcap_uri,
    body = post_body,
    config = config_options
  )

  status_code <- result$status
  success <- (status_code==200L)
  raw_text <- httr::content(result, "text")
  elapsed_seconds <- as.numeric(difftime( Sys.time(), start_time, units="secs"))

  if( success ) {
    try (
      ds <- read.csv(text=raw_text, stringsAsFactors=FALSE), #Convert the raw text to a dataset.
      silent = TRUE #Don't print the warning in the try block. Print it below, where it's under the control of the caller.
    )
    outcome_message <- paste0(format(nrow(ds), big.mark=",", scientific=FALSE, trim=TRUE),
                              " records and ",
                              format(length(ds), big.mark=",", scientific=FALSE, trim=TRUE),
                              " columns were read from REDCap in ",
                              round(elapsed_seconds, 2), " seconds. The http status code was ",
                              status_code, ".")
    raw_text <- ""
  } else {
    ds <- data.frame() #Return an empty data.frame
    #outcome_message <- paste0("Reading the REDCap data was not successful. The error message was:\n", geterrmessage())
    outcome_message <- paste0("Reading the REDCap data was not successful. The error message was:\n", raw_text)
  }

  if( verbose )
    message(outcome_message)

  return( list(
    data = ds,
    success = success,
    status_code = status_code,
    # status_message = status_message,
    outcome_message = outcome_message,
    records_collapsed = records_collapsed,
    fields_collapsed = fields_collapsed,
    elapsed_seconds = elapsed_seconds,
    raw_text = raw_text
  ) )
}

### Call the server
result <- redcap_read_oneshot(redcap_uri=uri, token=token)

### Pull out the dataset from the results
ds <- result$data

That’s a lot of code to copy for every project.

Double this amount of code to batch.
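The list returned by the function above carries more than the dataset (success, status_code, outcome_message, and so on). A minimal sketch of checking those elements before using the data follows. It uses mocked result lists so it runs without a server; the helper name extract_dataset and the mock values are assumptions made for illustration.

```r
# Hypothetical helper: return the dataset only if the call succeeded.
extract_dataset <- function(result) {
  if (!isTRUE(result$success)) {
    stop("The REDCap read failed (http status ", result$status_code, "): ",
         result$outcome_message)
  }
  result$data
}

# Mocked results with the same element names as the returned list above.
mock_success <- list(
  data            = data.frame(record_id = 1:2,
                               first_name = c("Nutmeg", "Tumtum"),
                               stringsAsFactors = FALSE),
  success         = TRUE,
  status_code     = 200L,
  outcome_message = "2 records and 2 columns were read from REDCap."
)
mock_failure <- list(
  data            = data.frame(),
  success         = FALSE,
  status_code     = 403L,
  outcome_message = "Reading the REDCap data was not successful."
)

ds <- extract_dataset(mock_success)
failure_text <- tryCatch(
  extract_dataset(mock_failure),
  error = function(e) conditionMessage(e)
)
```

Failing loudly at this point keeps an empty data.frame from flowing silently into later analysis steps.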

Perks of REDCapR (part 1)

1. Batching: makes smaller calls to the server, and combines the results to appear as if only one call was made.

– Avoids server timeouts.
– Can pause between calls, to avoid tying up the server.

2. Translates: resolves differences between the API and R.

– eg, R stores IDs as a vector c(10, 20, 30), while the API needs a string "10,20,30"

3. Validates: proactively looks for common mistakes.

– Helps catch errors sooner.
– Better error messages b/c it’s closer to the error’s source.

4. Subset: easier to avoid retrieving an entire dataset.
– Fewer rows.
– Fewer columns.
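The translation and validation perks can be illustrated with a few lines of base R. This is a sketch in the spirit of the package's source rather than its exact code; the helper names collapse_for_api and check_required are hypothetical.

```r
# Hypothetical helper: turn an R vector into the comma-separated string
# that the API expects. NULL becomes an empty string.
collapse_for_api <- function(values) {
  if (is.null(values)) return("")
  paste0(values, collapse = ",")
}

# Hypothetical validation: fail early, with a message that names the
# missing piece, before any request is sent.
check_required <- function(redcap_uri, token) {
  if (missing(redcap_uri)) stop("The required parameter `redcap_uri` was missing.")
  if (missing(token)) stop("The required parameter `token` was missing.")
  invisible(TRUE)
}

records_collapsed <- collapse_for_api(c(10, 20, 30))  # "10,20,30"
fields_collapsed  <- collapse_for_api(c("record_id", "first_name", "age"))
```

The same collapsing step serves both rows (record IDs) and columns (field names), which is what makes requesting a subset convenient.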

Perks of REDCapR (part 2)

1. SSL: provides extra transport security, by default.

– Assumes responsibility for updating certificates.

2. Unit & Integration Tested: 100+ checks before release.
– Corner cases are being added every month.

3. Wider Adoption: Library is used across multiple projects.
– More assurances than evolving code that’s copied & pasted.
– Builds on experience within and between libraries (eg, PyCap Python package and redcap R package).

Future Directions

• Attaching data labels to the variable names and values

• Extracting Calendar Events

• Cloning Projects

To contribute

https://github.com/OuhscBbmc/REDCapR

Contributors:
William H. Beasley
David E. Bard
Thomas N. Wilson
John J. Aponte
Rollie Parrish
Benjamin Nutter
Andrew R. Peters

Thanks to Funders

HRSA/ACF D89MC23154

OUHSC CCAN Independent Evaluation of the State of Oklahoma Competitive Maternal, Infant, and Early Childhood Home Visiting (MIECHV) Project.

Evaluates MIECHV expansion and enhancement of Evidence-based Home Visitation programs in four Oklahoma counties.