The Process of Data Ingestion in ÆKOS

The Process of Data Ingestion in ÆKOS. Andrew Graham and Matt Schneider, TERN Ecoinformatics Data Analysts. Logos used with consent. Content of this presentation except logos is released under TERN Attribution Licence Data Licence v1.0.

Transcript of The Process of Data Ingestion in ÆKOS

Page 1: The Process of Data Ingestion in ÆKOS

The Process of Data Ingestion in ÆKOS

Andrew Graham and Matt Schneider
TERN Ecoinformatics Data Analysts

Logos used with consent. Content of this presentation except logos is released under TERN Attribution Licence Data Licence v1.0

Page 2: The Process of Data Ingestion in ÆKOS

Introduction

The Data Analyst Role with TERN Ecoinformatics

• Analysis of source data and methods
• ÆKOS system development and domain modelling
• Contextual description of the data
• Publication of data into ÆKOS

Page 3: The Process of Data Ingestion in ÆKOS

The AEKOS Framework

1. Upper Context: Party, Project, Scope, etc.

2. Domain Model (Ontology): Observed entities, their features and relationships

3. Description Model: Methods and definitions

4. Indexing Model: Search and federation

Page 4: The Process of Data Ingestion in ÆKOS

Upper Context

Provides context for Datasets:

• Contact details
• High level objectives of program
• Licensing details and conditions of use
• Statement of scope
• Alignment with national metadata standards (ANDS)
• Statement of curation processes applied to data

Page 5: The Process of Data Ingestion in ÆKOS

Understanding Field Sampling

[Figure: Schematic view of sampling configuration]

Page 6: The Process of Data Ingestion in ÆKOS

Methodological work-flow

[Work-flow diagram. Boxes recovered from the slide, in extraction order:]

• Study Location Selection
• Study Location Visit
• Study Location Establishment
• Sampling Unit Selection
• Vegetation Assessment
• Physical Assessment
• Landscape Assessment
• Soil Assessment
• Fire Evidence
• Surface Cover
• Disturbance Evidence
• Vertebrate Evidence
• Climate Evidence
• Species Assessment
• Species Life Stage
• Vegetation Assemblage
• Voucher Collection
• Canopy Age-class
• Canopy Assessment
• Structural Formation
• Overstorey Measurement

Page 7: The Process of Data Ingestion in ÆKOS

Authored Method Descriptions

• Start with published method manuals

• Enrich existing method descriptions (protocols) with external web links and other resources

• Clarify questions about methods

• Divide the protocol into smaller method descriptions

Page 8: The Process of Data Ingestion in ÆKOS

Authored Method Descriptions

• Use a consistent format across datasets to allow comparison

• Direct linkage between the data value and the specific method of measurement

• Allows rapid assessment of suitability of data for re-use

• Eventually a method catalogue for researchers
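The "direct linkage between the data value and the specific method of measurement" can be pictured with a small sketch. This is only an illustration, not the ÆKOS implementation: the class names, field names, the `cover-estimate` identifier and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MethodDescription:
    """One small, authored method description (hypothetical structure)."""
    method_id: str
    title: str
    source_manual: str  # published manual the description was derived from


@dataclass(frozen=True)
class ObservedValue:
    """A data value that carries a link to the method used to measure it."""
    feature: str
    value: str
    method_id: str


# Hypothetical method catalogue keyed by method identifier.
CATALOGUE = {
    "cover-estimate": MethodDescription(
        method_id="cover-estimate",
        title="Visual cover estimate",
        source_manual="Example survey manual",
    ),
}


def method_for(observation: ObservedValue, catalogue: dict) -> MethodDescription:
    """Follow the link from a data value to its method description."""
    return catalogue[observation.method_id]


observation = ObservedValue("cover/abundance", "25-50%", "cover-estimate")
linked_method = method_for(observation, CATALOGUE)
```

With every value carrying a method identifier, a re-user can look up how a value was measured before deciding whether it suits their purpose.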

Page 9: The Process of Data Ingestion in ÆKOS

Definition of source datasets

Analysis and definition of source data types:

• Observation data
• Taxonomic concepts (a specific type of ref. data)
• Reference data (i.e. lookup tables)
• Images and other artefacts

Page 10: The Process of Data Ingestion in ÆKOS

Mapping to the ÆKOS Domain Model

[Domain model diagram. Labels recovered from the slide:]

Entities: Study Location, Sampling Unit, Study Location Visit, Spatial Point, Species Organism Group, Voucher Specimen, Landscape

Attribute groups:

• mudmap, comment
• visit date, observers, disturbance
• datum, x coord, y coord
• identifier, marker, type
• Voucher Specimen: determined identity, accession No., determiner
• field identity, life form, cover/abundance, life stage, phenology, dominance
• Landscape: slope, aspect, landform pattern

Relationships: selects, represents, contains, represented by
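As a rough illustration of what mapping source records onto these entities might look like, the sketch below models a few of them as Python dataclasses. It is an assumption-laden sketch, not the ÆKOS model itself: the slide does not state which attributes belong to which entity or how the entities nest, so the containment chain, the attribute placement and all sample values here are guesses made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SpatialPoint:
    datum: str
    x_coord: float
    y_coord: float


@dataclass
class SpeciesOrganismGroup:
    field_identity: str
    life_form: str
    cover_abundance: str


@dataclass
class SamplingUnit:
    identifier: str
    # Assumed: a sampling unit "contains" species organism groups.
    organism_groups: List[SpeciesOrganismGroup] = field(default_factory=list)


@dataclass
class StudyLocationVisit:
    visit_date: str
    observers: List[str]
    # Assumed: a visit "contains" sampling units.
    sampling_units: List[SamplingUnit] = field(default_factory=list)


@dataclass
class StudyLocation:
    comment: str
    point: SpatialPoint
    visits: List[StudyLocationVisit] = field(default_factory=list)


# Illustrative values only; none come from a real dataset.
group = SpeciesOrganismGroup("Eucalyptus sp.", "tree", "25-50%")
unit = SamplingUnit("quadrat-1", [group])
visit = StudyLocationVisit("2013-02-05", ["A. Observer"], [unit])
location = StudyLocation("example site", SpatialPoint("GDA94", 138.6, -34.9), [visit])
```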

Page 11: The Process of Data Ingestion in ÆKOS

Indexing

Enrichment of data with common indexes:

• Project level traits
• Data management traits
• Ecological process traits (disturbance and land-use)
• Measurement details
• Species taxonomy
• Vegetation Assemblage (e.g. NVIS Major Veg. Groups)
• Jurisdictional and Bio-geographic boundaries
• Spatially derived features (e.g. distance from road, slope, aspect, etc.)
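A minimal sketch of index enrichment, assuming a record with `x` and `y` coordinates and a boundary layer reduced to bounding boxes. The region names, the box coordinates and the `enrich` function are hypothetical; real jurisdictional or bio-geographic boundaries would be polygons handled by a spatial library.

```python
def enrich(record, regions):
    """Return a copy of the record with index traits attached.

    The source values are left untouched; indexes are added alongside them.
    """
    indexes = {}
    for name, (min_x, min_y, max_x, max_y) in regions.items():
        if min_x <= record["x"] <= max_x and min_y <= record["y"] <= max_y:
            indexes["region"] = name
            break
    return {**record, "indexes": indexes}


# Hypothetical bounding boxes: (min_x, min_y, max_x, max_y).
REGIONS = {
    "REGION_A": (0, 0, 10, 10),
    "REGION_B": (10, 0, 20, 10),
}

record = {"id": 1, "value": "25-50%", "x": 5, "y": 5}
enriched = enrich(record, REGIONS)
outside = enrich({"id": 2, "value": "5%", "x": 50, "y": 50}, REGIONS)
```

Keeping the indexes in a separate key means searches can use them without the observed values being altered.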

Page 12: The Process of Data Ingestion in ÆKOS

Federated Taxonomy

Page 13: The Process of Data Ingestion in ÆKOS

The AEKOS Ingestion “DSL”

Screen cap of Eclipse...

• Source data query
• Vocabulary management
• Method description
• Mapping to the common model
• Populate indexes
• Upper context authoring
• Sandbox testing

Page 14: The Process of Data Ingestion in ÆKOS

Data Work-flow

• Point of truth is always the source database
• Data values are not changed
• Data issues fed back to Data Providers
• Automatic data refresh mechanism developed
• Corrections made in source database and fed back to AEKOS on next “push”
• Just new records and edits after the first load
• Update frequency defined for each dataset
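The "just new records and edits after the first load" behaviour can be sketched as a simple filter on a modification date. This is an assumption about how such a refresh could work, not a description of the actual ÆKOS mechanism: the `modified` field, the function name and the dates are hypothetical.

```python
from datetime import date


def records_to_push(source_rows, last_push=None):
    """Select rows for the next push.

    First load (last_push is None): every row.
    Later pushes: only rows added or edited after the previous push.
    Values are passed through unchanged; the source stays the point of truth.
    """
    if last_push is None:
        return list(source_rows)
    return [row for row in source_rows if row["modified"] > last_push]


# Illustrative source rows only.
source = [
    {"id": 1, "value": "A", "modified": date(2013, 1, 10)},
    {"id": 2, "value": "B", "modified": date(2013, 2, 5)},
    {"id": 3, "value": "C", "modified": date(2013, 3, 1)},
]

first_load = records_to_push(source)
refresh = records_to_push(source, last_push=date(2013, 2, 1))
```

A correction made in the source database updates that row's modification date, so it is picked up on the next push without anyone editing the published copy directly.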

Page 15: The Process of Data Ingestion in ÆKOS

Quality Assurance

ÆKOS QA and review:

• Team review domain modelling of every dataset ingested
• “Sandbox” test ingestion before publishing to ÆKOS
• Review of method description by other team members
• Internal code validation and error checking

Page 16: The Process of Data Ingestion in ÆKOS

Quality Assurance

Data Providers QA:

• Review method descriptions
• Review upper context

Portal feedback:

• Review data content in the portal
• Use the portal and suggest enhancements and changes
• Look and feel
• Index traits
• Data accuracy and representation
• Feedback survey and email facility on portal

Page 17: The Process of Data Ingestion in ÆKOS

Thank you

Contact Details

Data Analyst – Matt Schneider [email protected]

Data Analyst – Andrew Graham [email protected]

Website www.aekos.org.au