Information Systems for Ecological Research John Porter – University of Virginia Scalable...


Page 1: Information Systems for Ecological Research John Porter – University of Virginia Scalable Information Networks for the Environment - Oct. 30, 2001.

Information Systems for Ecological Research

John Porter – University of Virginia

Scalable Information Networks for the Environment - Oct. 30, 2001

Page 2:

WHY have Ecological Databases?

New Science
Long Term
  • Long-term studies depend on databases to retain project history
Synthesis
  • Use of data for a purpose other than the one for which it was collected
  • Regional and global studies requiring data from a large array of sampling locations
Integrated, multidisciplinary projects
  • Depend on databases to facilitate sharing of data

Page 3:

Scientists have been successfully conducting research for centuries without databases. We need to focus on information systems that will let us expand our scientific horizons and realize the full potential of our research – not just “business as usual”.

Page 4:

Roadmap

Who will be the users of Ecological Information Systems? – What are their needs?
An Idealized Ecological Information Environment
System Needs for Development of an Idealized Information Environment

Page 5:

Users

Scientists
Policy Makers
Conservation and Development Organizations
Students
  • Graduate
  • Undergraduate
  • K-12
Recreational Users

Page 6:

The Ecological Information Challenge

Can we make information available to ecologists:
  • in ways they can locate the information they need?
  • with information in forms they can readily use?
How can we assure that the information is current and accurate?

Page 7:

Examples - Population Ecologist

Long-term time series of population size and composition (his or her own data)
Climatological data for the study site(s)
Comparative population data from other locations or species
Habitat change information
Predator community composition and abundance

Page 8:

Example - Ecosystem Modeler

Specific information on the area being modeled
  • Climate
  • Species composition & growth rates
  • Soil character

Page 9:

Example - Global Change Researcher

Global-scale datasets
  • Satellite-derived products
  • GIS data layers
Integrated data products
  • Comparable data from a large number of sites, worldwide
    – From international monitoring programs
    – From assembly of information sources collected at local scales

Page 10:

Example: Policy Maker

Environmental policy decisions require data that are regional or national
  • World
  • Regional
  • National
  • Local
Data need to be accessible and understandable by non-scientists

Page 11:

How can the needs of all these types of users be met?

What types of systems do we need?

Page 12:

Database Characteristics: “Deep” vs “Wide”

“Deep”
  • Relatively few kinds of data
  • Large numbers of observations
  • Sophisticated query and analysis tools
“Wide”
  • Many different types of data
  • Smaller number of observations of each type
  • Few analysis tools

Page 13:

Ways of Obtaining Needed Data

“Bring out yer Dead*”
  • Extract aggregated data from the literature
  • Make “educated guesses” about the content of poorly documented or fragmentary data
Examples
  • Digitizing graphs in published papers to extract point values
  • Piecing together documentation on the meaning of columns of a spreadsheet based on various publications that used the data

*With apologies to “Monty Python and the Holy Grail”

Page 14:

Ways of Obtaining Needed Data

“U-Haul”
  • Obtain well-documented, but eclectic, data from information systems
  • Analyze and process the data to obtain needed forms of data
Examples
  • Get primary production data from 3 LTER sites, each in a different form. Write separate programs to read in each dataset and create an integrated version.
  • Get specimen lists for a study area from 5 museums. Break into taxonomic groups and tally the number of species in each group.
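
The first “U-Haul” example can be sketched in Python. This is only an illustration, not the method used at any LTER site: the three text layouts, the field names (`anpp_g_m2`, `NPP_KG_M2`), the units and every value are invented for the sketch. Each source gets its own small reader, and each reader converts to one common record form (grams per square meter) so the results can be merged.

```python
import csv
import io

# Hypothetical raw exports: each "site" uses its own layout and units.
SITE_A = "year,anpp_g_m2\n1999,412.5\n2000,388.0\n"                 # CSV with header, g/m^2
SITE_B = "2000\t3.95\n2001\t4.10\n"                                  # tab-separated, no header, Mg/ha
SITE_C = "YEAR=2000;NPP_KG_M2=0.401\nYEAR=2001;NPP_KG_M2=0.377\n"    # key=value pairs, kg/m^2


def read_site_a(text):
    """Site A already reports g/m^2, so only type conversion is needed."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {"site": "A", "year": int(row["year"]),
               "anpp_g_m2": float(row["anpp_g_m2"])}


def read_site_b(text):
    """Site B reports Mg/ha; 1 Mg/ha equals 100 g/m^2."""
    for year, value in csv.reader(io.StringIO(text), delimiter="\t"):
        yield {"site": "B", "year": int(year),
               "anpp_g_m2": float(value) * 100.0}


def read_site_c(text):
    """Site C reports kg/m^2 in KEY=VALUE form; 1 kg/m^2 equals 1000 g/m^2."""
    for line in text.splitlines():
        fields = dict(item.split("=") for item in line.split(";"))
        yield {"site": "C", "year": int(fields["YEAR"]),
               "anpp_g_m2": float(fields["NPP_KG_M2"]) * 1000.0}


def integrate():
    """Run every site-specific reader and merge the records, ordered by year then site."""
    records = []
    for reader, text in ((read_site_a, SITE_A),
                         (read_site_b, SITE_B),
                         (read_site_c, SITE_C)):
        records.extend(reader(text))
    return sorted(records, key=lambda r: (r["year"], r["site"]))


if __name__ == "__main__":
    for record in integrate():
        print(record)
```

The effort sits in the per-site readers: every new source form means another program, which is the cost the “U-Haul” approach carries.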

Page 15:

Ways of Obtaining Needed Data

“Fast Food”
  • Use pre-integrated data from an “integrated” or “value added” database
  • Specify the data you need and the system provides it in the form you request
Examples
  • A climatological graph comparing sites from the LTER ClimDB system
  • KU-Species Analyst
  • ORNL Primary Productivity CD

Page 16:

An Idealized Information Environment

[Diagram labels: Researchers; Individual datasets; Project or Site-Based Systems; National/Regional Systems; “Value-Added” or “Integrated” Infobases]

Page 17:

Examples and major system features

Page 18:

Individual Datasets

[Diagram labels: Metadata; Data]

Page 19:

Characteristics of Individual Datasets

Most not originally intended for integration
Esoteric
  • Extremely diverse types of data
  • High variability in topics, methods and contents
  • Metadata of highly variable quality
Data in a wide variety of forms
  • ASCII text
  • Spreadsheets
  • Databases

Page 20:

Tools Needed for Individual Datasets

Tools for Data Entry
  • Relational databases, spreadsheets, text editors
Tools for Quality Assurance & Control
  • Relational databases, statistical packages
Tools for Capturing Primary Metadata
  • Methods
  • Glitches
In some cases these needs can be met using a site or project-based system
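
A quality-control tool for an individual dataset can be very small. The Python sketch below is an assumption-laden illustration, not a tool named in the talk: the column name `air_temp_c` and the acceptable limits are hypothetical. It flags missing, non-numeric and out-of-range values by row number so they can be reviewed and recorded alongside the other “glitches” in the primary metadata.

```python
def check_rows(rows, limits):
    """Return (row_number, column, problem) tuples for values that fail basic checks.

    rows   -- list of dicts, one per observation (values may be strings)
    limits -- dict mapping a column name to its (low, high) acceptable range
    """
    problems = []
    for number, row in enumerate(rows, start=1):
        for column, (low, high) in limits.items():
            value = row.get(column)
            if value is None or value == "":
                problems.append((number, column, "missing"))
                continue
            try:
                numeric = float(value)
            except (TypeError, ValueError):
                problems.append((number, column, "not numeric"))
                continue
            if not low <= numeric <= high:
                problems.append((number, column, "out of range"))
    return problems


if __name__ == "__main__":
    # Hypothetical observations and limits, for illustration only.
    sample = [{"air_temp_c": "21.5"}, {"air_temp_c": ""}, {"air_temp_c": "99"}]
    print(check_rows(sample, {"air_temp_c": (-40.0, 50.0)}))
```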

Page 21:

Site & Project Systems

Page 22:

Characteristics of Site & Project Systems

“Wide” databases
  • Wide variety of information types
  • Relatively few analysis and query tools
  • Data usually in (or near) original forms
Have metadata
  • Forms and contents may vary between systems
    – Structured (DBMS and structured text)
    – Unstructured (variable, ASCII text)
  • Usually provide browse and free-text search capabilities

Page 23:

Tools Needed for Site & Project Systems

Metadata Management Systems
  • Structured metadata (either relational database, XML or other structured text)
Data Access Systems
  • WWW servers, often linked to relational databases
Quality Assurance Systems
  • Review, error-checking programs
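
As a rough illustration of what “structured metadata” buys a site system, the Python sketch below writes a metadata record as XML and then checks it for required elements. The element names (`title`, `creator`, `methods`, `temporal_coverage`, `spatial_coverage`) are hypothetical and do not follow any particular metadata standard; only the standard-library `xml.etree.ElementTree` module is used.

```python
import xml.etree.ElementTree as ET

# Hypothetical list of elements a site might require in every record.
REQUIRED_FIELDS = ("title", "creator", "methods",
                   "temporal_coverage", "spatial_coverage")


def build_metadata(fields):
    """Serialize a dict of metadata fields as a flat <dataset> XML document."""
    root = ET.Element("dataset")
    for name, value in fields.items():
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")


def missing_fields(xml_text):
    """List required elements that are absent or empty in a metadata record."""
    root = ET.fromstring(xml_text)
    missing = []
    for name in REQUIRED_FIELDS:
        element = root.find(name)
        if element is None or not (element.text or "").strip():
            missing.append(name)
    return missing


if __name__ == "__main__":
    record = build_metadata({"title": "Example trapping survey",
                             "creator": "A. Researcher",
                             "methods": ""})
    print(record)
    print(missing_fields(record))
```

Because the record is structured, the same check can run over every dataset at a site, which is not practical with free-form ASCII documentation.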

Page 24:

Regional & National

Page 25:

Characteristics of Regional & National

“Wide” databases
  • Wide array of data types, few tools
  • A few are “Deep” databases that focus on a single type of data (e.g. USGS map databases)
Metadata
  • Often follows some standard
Some only provide links to data held by projects or individuals - National “Clearinghouses”

Page 26:

Tools Needed for Regional & National

(Similar to Project Systems)
Metadata Management Systems
  • Structured metadata (either relational database or XML)
Data Access Systems
  • WWW servers, often linked to relational databases
Quality Assurance Systems
  • Review, error-checking programs

Page 27:

Integrated “Value Added” Systems

Example: KU-Species Analyst

Page 28:

Integrated “Value Added” Systems

Climate database integrates data from a number of sites

Page 29:

Characteristics: Integrated “Value Added” Systems

“Deep” databases
  • Specialized query tools
  • Deal with specific types of data
Can produce data in specialized forms
Draw data from one or more projects or national databases

Page 30:

Tools Needed: Integrated “Value Added” Systems

Tools to “harvest” the data from a variety of sources
Tools to integrate that data into a unified whole
  • Often relational databases
Tools for query, output of specialized data products and graphics
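
The harvest, integrate and query steps can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration under stated assumptions, not a description of how ClimDB or any other system works: the table layout, the site codes `SITE1` and `SITE2`, and all temperature values are invented.

```python
import sqlite3


def build_climate_db(harvested):
    """Load harvested observations into one relational table.

    harvested -- dict mapping a site code to a list of (ISO date, air temperature in C)
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE climate ("
        "site TEXT, obs_date TEXT, air_temp_c REAL, "
        "PRIMARY KEY (site, obs_date))"
    )
    for site, rows in harvested.items():
        # INSERT OR REPLACE lets a repeated harvest overwrite earlier values.
        conn.executemany(
            "INSERT OR REPLACE INTO climate VALUES (?, ?, ?)",
            [(site, obs_date, temp) for obs_date, temp in rows],
        )
    conn.commit()
    return conn


def monthly_means(conn, month):
    """Return (site, mean air temperature) pairs for a 'YYYY-MM' month, one row per site."""
    cursor = conn.execute(
        "SELECT site, ROUND(AVG(air_temp_c), 2) FROM climate "
        "WHERE substr(obs_date, 1, 7) = ? GROUP BY site ORDER BY site",
        (month,),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    # Hypothetical harvested data, for illustration only.
    harvested = {
        "SITE1": [("2001-07-01", 26.0), ("2001-07-02", 28.0)],
        "SITE2": [("2001-07-01", 18.0), ("2001-07-02", 19.0), ("2001-08-01", 20.0)],
    }
    print(monthly_means(build_climate_db(harvested), "2001-07"))
```

Once the data share one table, a cross-site comparison is a single query rather than a separate program per site.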

Page 31:

NEONs

What roles would a NEON site play?

[Diagram levels: Integrated; Regional; Site; Researchers]

Page 32:

What are the database requirements for each of the elements of the idealized information environment?

Page 33:

Key Elements needed at each level

Site/Project
  • Metadata – in structured forms, preferably standards-based
National or Network
  • Consistent keyword vocabularies
  • Standards for metadata content
“Value Added”
  • Domain expertise
  • Need for structured metadata
  • Standards for data products

Page 34:

Making Links Work

[Diagram levels: Integrated; National; Site; Researchers]

Controlled vocabularies: needed for identifying needed data
Spatio-temporal references
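
One way to picture how controlled vocabularies and spatio-temporal references make links work is a catalog search, sketched below in Python. Everything here is hypothetical: the synonym table, the catalog entry fields (`keywords`, `bbox`, `start_year`, `end_year`) and the bounding-box convention (west, south, east, north in decimal degrees) are assumptions made for the sketch.

```python
# Hypothetical controlled vocabulary: free-text variants map to one preferred term.
SYNONYMS = {
    "npp": "primary production",
    "net primary productivity": "primary production",
    "rainfall": "precipitation",
}


def normalize_keyword(keyword):
    """Map a keyword to its preferred term, or return it lower-cased if unknown."""
    key = keyword.strip().lower()
    return SYNONYMS.get(key, key)


def boxes_overlap(a, b):
    """True if two (west, south, east, north) boxes share any area or edge."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def find_datasets(catalog, keyword, bbox, years):
    """Return ids of catalog entries matching the term, region and year range."""
    term = normalize_keyword(keyword)
    first, last = years
    hits = []
    for entry in catalog:
        terms = {normalize_keyword(k) for k in entry["keywords"]}
        in_time = entry["start_year"] <= last and first <= entry["end_year"]
        if term in terms and boxes_overlap(entry["bbox"], bbox) and in_time:
            hits.append(entry["id"])
    return hits
```

Without the shared vocabulary, a search for “net primary productivity” would miss a dataset whose owner tagged it “NPP”; without the spatial and temporal references, the search could not be restricted to a study region or period.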

Page 35:

Needed - Interfaces

[Diagram levels: Integrated; National; Site; Researchers]

Documented/standards-based interfaces: needed to transfer data and metadata

Page 36:

“Missing Pieces”

Domain expertise applied to developing “integrated” or “value added” systems
Methods for transferring attribution (credit) along with the data and metadata

Page 37:

Summary

There are diverse needs for ecological data
Meeting those needs will require a variety of interlinked information systems
  • And the tools, technologies and standards that make the links functional

Page 38:

Roles

Development of a functional information infrastructure for ecology demands the involvement:
  • of scientists with expertise in ecology & related disciplines who are willing to participate in system development
  • of individuals with technical expertise who can work with those scientists

Page 39:

Example - LTER ClimDB

The LTER Climate Committee needed to develop the standards for database content and needed output forms in consultation with LTER Information Managers
IMs were then able to create a system that met those needs

Baker, K.B., B.J. Benson, D.L. Henshaw, D. Blodgett, J.H. Porter, and S.G. Stafford. 2000. Evolution of a Multisite Network Information System: The LTER Information Management Paradigm. BioScience 50(11):963-978.

Page 40:

Useful References

http://www.ecoinformatics.org

Baker, K.B., B.J. Benson, D.L. Henshaw, D. Blodgett, J.H. Porter, and S.G. Stafford. 2000. Evolution of a Multisite Network Information System: The LTER Information Management Paradigm. BioScience 50(11):963-978.

Michener, W.K., and J. Brunt. 2000. Ecological Data: Design, Processing and Management. Blackwell Science Ltd., London.

Olson, R.J., J.M. Briggs, J.H. Porter, G.R. Mah, and S.G. Stafford. 1999. Managing Data from Multiple Disciplines, Scales, and Sites to Support Synthesis and Modeling. Remote Sensing of Environment 70:99-107.