
Synthesizing and Spatially Linking Disparate Datasets

Catherine Crosier (1,2), Gregory J. Newman (1), and Thomas J. Stohlgren (1,2)

(1) Natural Resource Ecology Laboratory, Colorado State University, Fort Collins, Colorado 80523-1499

(2) Fort Collins Science Center, US Geological Survey, Fort Collins, Colorado 80525

Acknowledgements

DATA SOURCE | DATA TYPE
1. Biota of North America Program | County species list
2. U.S. Geological Survey | Multi-scale plot data
3. U.S. Forest Service | Multi-scale plot data (2 national programs)
4. Graduate students | Multi-scale plot data
5. San Luis Valley GIS Authority | Point, line, and polygon data
6. Larimer County | Point, line, and polygon data (noxious weeds)
7. Center for the Environmental Management of Military Lands (CEMML) | Vegetation monitoring transects
8. The Nature Conservancy | Polygon, point, and transect data
9. Colorado Natural Heritage Program | Polygon and point data
10. State of Colorado | Quarter quad data for 20 species; DOT weed GIS data
11. CSU Research Scientists | Point data and control data
12. National Park Service | Park species lists and GIS data
13. U.S. Fish and Wildlife Service | Refuge manager survey results
14. LTER | Plot data
15. Bureau of Land Management | Point, polygon, and control data

Problem: Lack of knowledge

1. Ecological studies often are conducted over short temporal scales in specific areas.

2. Species occurrence and abundance data do not exist for many public land units (e.g., Fish and Wildlife Refuges), which makes sound management decisions difficult.

Solution: Synthesize smaller studies

1. Addresses additional and more complex questions

2. Capitalizes on resources already expended

3. Creates probable species lists and distribution maps

Introduction

Figure 2: Dataset Bias

1. Obtain datasets from partners (Table 1).

2. Synthesize pre-existing datasets in a geodatabase (Figure 1), including multiple-scale (i.e., 1 m² to entire county) information on non-native vascular plant species in Colorado obtained from individuals, organizations, and agencies at all levels (i.e., federal to local).

3. Use on-line GIS technology to disseminate information on non-native species locations to land managers and the public in a user-friendly and easily understood manner (Figure 1).

Objectives

Table 1: Datasets Used, Pending

Figure 3: Species Added

Extensive surveys to obtain species distribution data are expensive and time-consuming. In addition, many land managers do not know what species occur within their management boundaries, let alone adjacent to them. Yet there is a wealth of data collected by government agencies and non-governmental organizations that, when synthesized, maximizes the use of existing information on species locations without incurring large costs. We obtained datasets on non-native species, including species lists, plot data, transect data, point data, and individual species polygon data, that were collected using various methods at different spatial scales to meet different objectives. We then created a geodatabase with ESRI's ArcGIS and Microsoft Access 2000 that relates these datasets spatially and hierarchically and allows us to use them in concert. When combined and spatially linked, the data improve the completeness of each individual dataset. For example, species reported in a specific area by one dataset were not reported by a second dataset for the same location. A synthesized, spatially linked database allows development of predictive models and maps for the distribution and abundance of non-native species. These products can be used by land managers at all levels to aid in the early detection, management, and control of non-native species.

Keywords: Data synthesis, Data synergy, Spatial linking, Non-native species, Ecological databases

Abstract

1. Create three linked geodatabases to synthesize datasets using nested locations (i.e., plot located in a park located in a county) and standardized NRCS plant codes (Figure 1). The three geodatabases are general enough to accommodate plot data; GIS point, line, and polygon data; and species lists for large geographic units.

2. Compare datasets at smaller scales to those at larger scales where they are nested. Datasets include 63 county lists, two National Park unit lists, 493 plots, and 15,666 GIS points, lines, and polygons.
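The nested-location comparison described in these two steps can be illustrated with a minimal Python sketch. It is not the ArcGIS/Access geodatabase used for the poster; the location names, the parent table, the records, and the function names are all hypothetical, and the plant codes are illustrative only. Each record is rolled up to every unit that contains it, and the rolled-up set is then compared with the unit's own species list.

```python
from collections import defaultdict

# Hypothetical nesting: each location names the unit that contains it
# (None marks the top of the hierarchy).
PARENT = {
    "plot_17": "park_A",
    "park_A": "county_X",
    "county_X": None,
}

# Hypothetical occurrence records: (location, standardized plant code).
RECORDS = [
    ("plot_17", "CADR"),
    ("plot_17", "BRTE"),
    ("park_A", "CIAR4"),
    ("county_X", "BRTE"),
]


def ancestors(location, parent=PARENT):
    """Yield the location itself and every unit that contains it."""
    while location is not None:
        yield location
        location = parent.get(location)


def species_by_unit(records, parent=PARENT):
    """Roll every record up to all enclosing units."""
    found = defaultdict(set)
    for location, code in records:
        for unit in ancestors(location, parent):
            found[unit].add(code)
    return found


def new_records(unit, own_list, records, parent=PARENT):
    """Codes observed in or under a unit that are missing from its own list."""
    return species_by_unit(records, parent)[unit] - set(own_list)


# Codes that nested data would add to the hypothetical county list.
additions = new_records("county_X", {"BRTE"}, RECORDS)
```

Keying every record to a single location and deriving the enclosing units from the parent table means a plot record never has to be copied into park or county tables by hand.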

Methods

• Dataset synthesis indicates biases in individual datasets. For example, the county dataset is based on herbarium records that are biased towards specific areas (Figure 2).

• Two hundred seventy-four new species records (including 35 Colorado noxious weeds and 99 unique species) were added to 47 of 48 counties containing nested locations (Figure 3).

• Cardaria draba (hoary cress) was added to six county lists. In addition, Cardaria draba was not listed for parks and other areas within some counties that do list it, indicating that managers of those units should be vigilant for this species (Figure 4).
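The two-way benefit reported in these results can be shown as a pair of set differences. This is a minimal sketch with made-up lists and illustrative plant codes; the function name and dictionary keys are assumptions, not part of the original geodatabase.

```python
def compare_lists(county_list, nested_list):
    """Compare a county species list with the list for a unit nested inside it."""
    county, nested = set(county_list), set(nested_list)
    return {
        # Found in the nested unit but absent from the county list:
        # candidate new county records.
        "add_to_county": sorted(nested - county),
        # Listed for the county but not yet recorded in the nested unit:
        # species the unit's manager may want to watch for.
        "watch_in_unit": sorted(county - nested),
    }


# Hypothetical lists for one county and one park inside it.
result = compare_lists(county_list=["BRTE", "CADR"], nested_list=["BRTE", "CIAR4"])
```

With these inputs, `result["add_to_county"]` holds the code seen only in the park and `result["watch_in_unit"]` holds the code seen only at county level.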

Results

These results indicate that data synergy can be used to improve the quality of individual datasets. Bias in county datasets can be reduced by supplementing them with data from other sources. Land managers can use synthesized non-native species datasets to set priorities for early detection and prevention efforts. In addition, the geodatabase can be used to create predictive spatial models for hotspots of invasion in the state or in a specific land management unit and to create distribution surfaces for individual species. These models can then be served on-line to distribute non-native species information to land managers at all levels (Figure 5).

Conclusions and Future Steps

San Luis Valley GIS/GPS

Figure 4: Dataset Synergy

Figure 1: Data Management

Figure 5: On-line Interface