FOT-Net Final Event
First FOT-Net Data Workshop
Amsterdam, 18-19 March 2014
Research Data Exchange (RDE) and Safety Pilot Model Deployment Data
Dale Thompson
USDOT / ITS Joint Program Office
U.S. Department of Transportation, ITS Joint Program Office
Outline
The Research Data Exchange (RDE)
□ Mission
□ Structure
□ Statistics and Usage
Featured Data Environment: Safety Pilot Model Deployment (SPMD)
□ Overview of the SPMD
□ Hosting the SPMD Data
Future Data Environment - SHRP 2 Naturalistic Driving Study (NDS)
□ Overview of SHRP2 NDS
□ Anticipated Use Cases and Challenges of NDS Data
The RDE: a central transportation data repository for researchers and application developers
The Data Capture and Management program's mission is to provide a variety of data-related services that support the development, testing, and demonstration of multi-modal transportation mobility applications
The RDE is a transportation data sharing system that promotes sharing of archived and real-time data from multiple sources and multiple modes
The RDE provides the ability for users to download data and appropriate documentation, create research projects and collaborate with other users, and comment on data sets
Source: http://its-rde.net/
The RDE employs the concept of a Data Environment to structure the various data sets
The RDE organizes data using a data environment / data set / data file hierarchy
A Data Environment is a collection of data sets obtained under the same test or experiment
Data Sets represent a logical arrangement of files that convey a central concept or idea about an aspect of a data collection exercise
Data Sets contain Data Files, which are archived collections of data elements and can be text, zip, binary, or other file types
Source: http://its-rde.net/
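The environment / data set / data file hierarchy described on this slide can be pictured as three nested record types. The following is only an illustrative sketch: the class names, fields, and example values are assumptions made for this example and do not reflect the RDE's actual schema or software.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataFile:
    """An archived collection of data elements (text, zip, binary, ...)."""
    name: str
    file_type: str
    size_bytes: int


@dataclass
class DataSet:
    """A logical arrangement of files about one aspect of a collection exercise."""
    name: str
    files: List[DataFile] = field(default_factory=list)

    def total_bytes(self) -> int:
        return sum(f.size_bytes for f in self.files)


@dataclass
class DataEnvironment:
    """A collection of data sets obtained under the same test or experiment."""
    name: str
    data_sets: List[DataSet] = field(default_factory=list)

    def total_bytes(self) -> int:
        return sum(ds.total_bytes() for ds in self.data_sets)


# Hypothetical example values, not taken from the RDE.
example_env = DataEnvironment(
    name="Example Environment",
    data_sets=[
        DataSet("Example Set A", [DataFile("a1.csv", "text", 1_000),
                                  DataFile("a2.zip", "zip", 2_500)]),
        DataSet("Example Set B", [DataFile("b1.bin", "binary", 4_000)]),
    ],
)
```

Summing sizes from the bottom of the hierarchy upward (`example_env.total_bytes()`) is one simple way to report the volume of an environment.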
The RDE currently houses 11 Data Environments with data from different locations throughout the US
States from which data have been collected include California, Florida, Michigan, Minnesota, Oregon, Virginia, and Washington
The number of data sets in each of these 11 environments ranges from 2 to 37
The number of data files per data set ranges from a few to well over 100
Source: http://its-rde.net/
The Safety Pilot Model Deployment is a data environment recently added to the RDE
SPMD is an exploration of the real-world effectiveness of connected vehicle safety applications in multi-modal driving conditions
A one-day sample of SPMD data is captured in this environment
□ This provides users with a snapshot of the output from the implementation of connected vehicle technology
This environment contains
□ 5 data sets
□ mobility data elements collected from approximately 3000 vehicles
□ weather and infrastructure related data elements
Source: USDOT
Hyper-accurate, hyper-frequent data posed a series of challenges in uploading the SPMD data
Some of the challenges faced in making the SPMD data publicly available include:
□ Data governance
□ Distribution rights
□ Personally identifiable information (PII)
□ Size of the data sets and data files
Understanding the data governance structure amongst involved entities is integral to acquiring data for distribution
Two of the more direct constraints on data distribution are the inclusion of data that may compromise the initial goal of the exercise, and the presence of PII
Source: http://www.safetypilot.us/
PII had to be removed from the SPMD data while maintaining meaningfulness of the data
To protect participants’ identities, the RDE team removed every data element that contains PII from all data files
Data elements that could be paired with other publicly available data were also deleted
Vehicle trajectories, with points collected at 10 Hz, revealed the identity of participants; therefore
□ Sanitization algorithms were developed to truncate trajectories to mask trip origins and destinations
□ The algorithms were also applied to dependent / related data elements
[Figure: map comparing complete trajectories with truncated trajectories. Map source: Google Maps]
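The slide does not describe the sanitization algorithms themselves. As a rough illustration of the general idea of masking trip origins and destinations, the sketch below drops every point that lies within a fixed travelled distance of either end of a trip. The function names, the `trim_m` parameter, and the distance-based rule are all assumptions made for this example, not the RDE team's method.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in decimal degrees

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(a: Point, b: Point) -> float:
    """Great-circle distance between two points, in metres."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def truncate_trajectory(points: List[Point], trim_m: float) -> List[Point]:
    """Keep only points at least trim_m of travelled distance from both ends.

    Trips shorter than 2 * trim_m are suppressed entirely.
    """
    if len(points) < 2:
        return []
    # Cumulative distance travelled at each point along the trip.
    cumulative = [0.0]
    for prev, cur in zip(points, points[1:]):
        cumulative.append(cumulative[-1] + haversine_m(prev, cur))
    total = cumulative[-1]
    return [p for p, d in zip(points, cumulative)
            if trim_m <= d <= total - trim_m]
```

A rule based on travelled distance is simple, but a trip that loops back past its starting point could still release points close to the origin, so a real process would need additional checks. Any dependent data elements recorded at the dropped timestamps would have to be removed in the same pass.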
Connected vehicle data is an emerging area, subject to “Big Data” opportunities and challenges
The SPMD data environment was structured as 5 data sets, with a total sanitized volume of approximately 24 GB (largest file about 10 GB) for a 24-hour period
The original un-sanitized data set was approximately 50 GB
The challenge of working with such large data sets is two-fold
□ Extracting and sanitizing the data is computationally expensive
□ Large files had to be carefully broken into more manageable segments for easy download
Source: http://www.safetypilot.us/
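Breaking a large file into download-sized segments can be sketched as below. This is a minimal byte-oriented example under stated assumptions: the function names, the `.partNNNN` naming scheme, and the segment size are hypothetical, and the slide does not say how the RDE team actually segmented its files.

```python
from pathlib import Path
from typing import List


def split_file(src: Path, out_dir: Path, segment_bytes: int) -> List[Path]:
    """Write src as consecutive segments of at most segment_bytes each."""
    if segment_bytes <= 0:
        raise ValueError("segment_bytes must be positive")
    out_dir.mkdir(parents=True, exist_ok=True)
    parts: List[Path] = []
    with src.open("rb") as f:
        index = 0
        while True:
            # Each read holds one full segment in memory.
            chunk = f.read(segment_bytes)
            if not chunk:
                break
            part = out_dir / f"{src.name}.part{index:04d}"
            part.write_bytes(chunk)
            parts.append(part)
            index += 1
    return parts


def join_files(parts: List[Path], dest: Path) -> None:
    """Concatenate segments, in order, back into a single file."""
    with dest.open("wb") as out:
        for part in parts:
            out.write(part.read_bytes())
```

Splitting on raw byte boundaries can cut a record in half, so for row-oriented text data a line-aware split would be the safer design; the byte-level version is shown only because it is the shortest to verify.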
The RDE team will continue to post additional data sets while leveraging efforts of similar data sharing entities
Data sets being pursued for RDE hosting include data from:
□ Dynamic Mobility Applications
□ Applications for the Environment: Real Time Information Synthesis (AERIS)
□ Road Weather Management Program
Entities that the RDE team is looking to partner with, to share not only data but also strategies and insights on distributing data:
□ FOT-Net Data
□ Research Data Alliance
The RDE team will be adding the SHRP2 Naturalistic Driving Study (NDS) data in the coming months
Designed to investigate ordinary driving under real-world conditions, with the aim of learning about driver decisions
Widespread demographics among the study’s 3100 participants
Two-year timeframe for extensive data collection
Widespread geography of test sites around the US: Tampa, FL; Bloomington, IN; Durham, NC; Buffalo, NY; State College, PA; Seattle, WA.
Source: https://insight.shrp2nds.us/docs/shrp2_background.pdf
There is a wide variety of data available from the study
Driver Assessment Data: visual perception, medical history, reaction time, driving knowledge, etc.
Vehicle Data: vehicle make and model, and how vehicle is equipped (with sensors, for example)
Driving Data: video images from various perspectives in the vehicle, vehicle kinematics, and other measures such as seat belt use, steering wheel angle, alcohol presence, and radar to identify external near-field objects
Crash Data: interview Q&As, police crash reports
Roadway Data: roadway geometry, speed limit signs, intersection location and characteristics, etc. (these data are obtained in an effort separate from the collection of the driving data)
In making NDS data accessible via the RDE, the procedure followed will be informed by that of SPMD
The challenges faced when distributing SPMD data will also arise when distributing NDS data
These challenges include:
□ Data governance
□ Distribution rights
□ Personally identifiable information
□ Size of the data sets and data files
The RDE team will employ lessons learned from posting the SPMD data to the RDE, while being cognizant of the nuances of the NDS data that will lead to changes in the developed approach
RDE Policy Issues
The RDE is a public-facing research resource that hosts large volumes of potentially sensitive data from multiple sources
It required development of policies and procedures in a number of areas typical of other websites:
□ Authorities and membership management
□ Accessibility
□ Terms of use
To create the RDE, the team also confronted a range of unique policy issues in these areas:
□ Data ownership
□ Data security
□ Data privacy
Data Ownership
Issue: The RDE may host data that has been provided by different sources:
□ Federal contractors
□ State and other public agencies
□ Universities
□ Private individuals or businesses
Relevant RDE Goal: Foster and support research in transportation operations by a wide variety of stakeholders
Challenge: Balance rights of various providers (with different institutional structures and needs) against needs for wide access and use
Response:
□ Sign agreements with each data contributor
□ Offer RDE content to the public under an open source license that requires attribution (Creative Commons Attribution-ShareAlike 3.0 Unported)
Data Security
Issue: The RDE contains several terabytes of data, scaling up to petabytes.
Relevant RDE Goals:
□ Offer reliable and cost-effective access to huge data sets
□ Comply with Federal Information Security Management Act (FISMA)
Challenge: Develop a business model for on-site Departmental hosting or certify external server host
Response:
□ Launch version 1.0 on website contractor servers
▪ Enforce security training
▪ (Insert additional info on IndraSoft certification here)
□ Transition to FedRAMP-certified cloud-based host (Amazon Web Services or similar)
Data Privacy
Issue: The RDE contains GPS traces from vehicles.
Relevant RDE Goals:
□ Provide maximum research value from available data
□ Protect identity of vehicle users
Challenge: Develop an approach to reliably de-identify GPS traces
Response:
□ Launch with GPS traces only from public agency vehicles on agency business
□ Develop processes for:
▪ GPS trace de-identification by minimal truncation
▪ Validation of the de-identification methods
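One possible form of validation is to check that no released point still lies close to the true origin or destination of the original trace. The sketch below is only an assumption about what such a check might look like: the function name, the `radius_m` threshold, and the use of projected x/y coordinates in metres are all hypothetical, and the slide does not specify the validation process actually developed.

```python
import math
from typing import List, Sequence, Tuple

XY = Tuple[float, float]  # projected coordinates in metres (assumed)


def find_exposed_points(original: Sequence[XY],
                        released: Sequence[XY],
                        radius_m: float) -> List[XY]:
    """Return released points within radius_m of the original trip's
    first point (origin) or last point (destination)."""
    if not original:
        return []
    origin, destination = original[0], original[-1]
    return [p for p in released
            if math.dist(p, origin) < radius_m
            or math.dist(p, destination) < radius_m]
```

An empty result means the released trace passes this particular check; it does not by itself show that a trace cannot be re-identified by other means, such as pairing with other publicly available data.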
Near-Term Steps: Data Federation
Issue: Data federation entails providing access through the RDE to data sets not owned or managed by the RDE Team
Relevant RDE Goals:
□ Protect data rights of providers
□ Protect privacy of vehicle users in data sets
□ Ensure overall system security
Challenge: Develop a flexible system of agreements that can be instituted between the RDE Team and federated sites
Response: TBD
Questions
Dale Thompson
USDOT
ITS Joint Program Office
202-493-0259