Pinus albicaulis) in Waterton Lakes National Park using logistic...

11
Mapping the distribution of whitebark pine ( Pinus albicaulis) in Waterton Lakes National Park using logistic regression and classification tree analysis G.J. McDermid and I.U. Smith Abstract. Accurate spatial information on the distribution of whitebark pine, a keystone species in alpine environments across western Canada, is critical for the planning of conservation activities designed to ameliorate the damaging effects of blister rust, mountain pine beetle, and interspecific competition. We compared classification tree analysis and logistic regression analysis to explore their relative abilities to model whitebark pine presence and absence with medium-spatial-resolution satellite and topographic variables across a complex study site in Waterton Lakes National Park, Alberta. Both techniques were found to be effective, generating map products of roughly equal thematic quality (91% overall accuracy; kappa = 0.76). However, the logistic model was valuable in its ability to predict ratio-level probability surfaces, whereas the classification tree approach was simpler, faster, and found to generate a slightly more balanced model from an individual class accuracy perspective. End users selecting between the two techniques should make choices that balance flexibility with simplicity while always taking care to exercise sound modeling practices. Résumé. L’information exacte dans l’espace sur la distribution du pin à écorce blanche, une espèce clef dans les régions alpines de l’ouest Canadien, est essentielle pour les activités de conservation qui ont pour but d’améliorer les effets nocifs provoqués par la rouille vésiculeuse du pin blanc, le dendroctone du pin ponderosa, et la compétition interspécifique. Nous avons comparé l’analyse basée sur un arbre de décision avec l’analyse basée sur la r ression logistique de façon à définir la capacité de chaque méthode à schématiser la présence ou l’absence du pin à écorce blanche à résolution spatiale moyenne, et avec des variables topographiques dans la région du Parc National des Lacs-Waterton. Les deux méthodes se sont avérées efficaces, permettant de créer des cartes de distribution d’une qualité thématique similaire (avec une précision globale de 91 % et un coefficient kappa de 0,76). Par contre, le modèle basé sur la régression logistique était int essante de par sa capacité à générer des surfaces de probabilités à une echelle de mesure de type rapport, tandis que l’analyse basée sur un arbre de décision était plus simple, plus rapide, et permettait de créer un modèle de classification plus équilibré au niveau de l’exactitude de la catégorisation de chaque classe. Finalement les utilisateurs qui ont le choix entre ces deux techniques doivent choisir la plus simple et la plus flexible, tout en exerçant de bonnes méthodes de schématisation. 366 Introduction Whitebark pine (Pinus albicaulis) is a five-needle pine species that occurs in high-elevation environments across western Canada and the United States (Critchfield and Little, 1966). Widely regarded as a keystone species (Mattson et al., 2001), whitebark pine (WBP) plays a central role in wildlife habitat and watershed protection in the timberline forests that it inhabits (Farnes, 1990; Tomback et al., 2001; Copeland et al., 2007). The seeds of WBP are large and nutritious (Lanner and Gilbert, 1994) and represent an important food source for a variety of species, including the grizzly bear (Mattson and Merrill, 2002), red squirrel (Mattson et al., 2001), and Clark’s nutcracker (Hutchins and Lanner, 1982). So valuable is WBP as a food resource that when whitebark seeds are abundant, grizzly bears eat virtually nothing else (Craighead et al., 1982), and the presence of a healthy cone crop is thought be a key factor governing the interaction between grizzly bears and humans (Mattson et al., 1992; Mattson and Reinhart, 1997). Unfortunately, however, WBP is declining throughout much of its range, being adversely impacted by a variety of factors, including exotic white pine blister rust (Zeglen, 2002; McKinney and Tomback, 2007), epidemics of mountain pine beetle (Perkins and Roberts, 2003; Waring and Six, 2005), 356 © 2008 CASI Can. J. Remote Sensing, Vol. 34, No. 4, pp. 356–366, 2008 Received 31 December 2007. Accepted 18 June 2008. Published on the Canadian Journal of Remote Sensing Web site at http://pubs.nrc-cnrc.gc.ca/cjrs on 12 September 2008. G.J. McDermid 1 and I.U. Smith. Foothills Facility for Remote Sensing and GIScience, Department of Geography, University of Calgary, Calgary, AB T2N 1N4, Canada. 1 Corresponding author (e-mail: [email protected]).

Transcript of Pinus albicaulis) in Waterton Lakes National Park using logistic...

Page 1: Pinus albicaulis) in Waterton Lakes National Park using logistic ...people.ucalgary.ca/~mcdermid/Docs/Articles/McDermid-Smith_2008.pdf · Mapping the distribution of whitebark pine

Mapping the distribution of whitebark pine(Pinus albicaulis) in Waterton Lakes

National Park using logistic regression andclassification tree analysis

G.J. McDermid and I.U. Smith

Abstract. Accurate spatial information on the distribution of whitebark pine, a keystone species in alpine environmentsacross western Canada, is critical for the planning of conservation activities designed to ameliorate the damaging effects ofblister rust, mountain pine beetle, and interspecific competition. We compared classification tree analysis and logisticregression analysis to explore their relative abilities to model whitebark pine presence and absence withmedium-spatial-resolution satellite and topographic variables across a complex study site in Waterton Lakes National Park,Alberta. Both techniques were found to be effective, generating map products of roughly equal thematic quality (91%overall accuracy; kappa = 0.76). However, the logistic model was valuable in its ability to predict ratio-level probabilitysurfaces, whereas the classification tree approach was simpler, faster, and found to generate a slightly more balanced modelfrom an individual class accuracy perspective. End users selecting between the two techniques should make choices thatbalance flexibility with simplicity while always taking care to exercise sound modeling practices.

Résumé. L’information exacte dans l’espace sur la distribution du pin à écorce blanche, une espèce clef dans les régionsalpines de l’ouest Canadien, est essentielle pour les activités de conservation qui ont pour but d’améliorer les effets nocifsprovoqués par la rouille vésiculeuse du pin blanc, le dendroctone du pin ponderosa, et la compétition interspécifique. Nousavons comparé l’analyse basée sur un arbre de décision avec l’analyse basée sur la r ression logistique de façon à définir lacapacité de chaque méthode à schématiser la présence ou l’absence du pin à écorce blanche à résolution spatiale moyenne,et avec des variables topographiques dans la région du Parc National des Lacs-Waterton. Les deux méthodes se sont avéréesefficaces, permettant de créer des cartes de distribution d’une qualité thématique similaire (avec une précision globale de91 % et un coefficient kappa de 0,76). Par contre, le modèle basé sur la régression logistique était int essante de par sacapacité à générer des surfaces de probabilités à une echelle de mesure de type rapport, tandis que l’analyse basée sur unarbre de décision était plus simple, plus rapide, et permettait de créer un modèle de classification plus équilibré au niveau del’exactitude de la catégorisation de chaque classe. Finalement les utilisateurs qui ont le choix entre ces deux techniquesdoivent choisir la plus simple et la plus flexible, tout en exerçant de bonnes méthodes de schématisation.

366

Introduction

Whitebark pine (Pinus albicaulis) is a five-needle pinespecies that occurs in high-elevation environments acrosswestern Canada and the United States (Critchfield and Little,1966). Widely regarded as a keystone species (Mattson et al.,2001), whitebark pine (WBP) plays a central role in wildlifehabitat and watershed protection in the timberline forests that itinhabits (Farnes, 1990; Tomback et al., 2001; Copeland et al.,2007). The seeds of WBP are large and nutritious (Lanner andGilbert, 1994) and represent an important food source for avariety of species, including the grizzly bear (Mattson and

Merrill, 2002), red squirrel (Mattson et al., 2001), and Clark’snutcracker (Hutchins and Lanner, 1982). So valuable is WBPas a food resource that when whitebark seeds are abundant,grizzly bears eat virtually nothing else (Craighead et al., 1982),and the presence of a healthy cone crop is thought be a keyfactor governing the interaction between grizzly bears andhumans (Mattson et al., 1992; Mattson and Reinhart, 1997).Unfortunately, however, WBP is declining throughout much ofits range, being adversely impacted by a variety of factors,including exotic white pine blister rust (Zeglen, 2002;McKinney and Tomback, 2007), epidemics of mountain pinebeetle (Perkins and Roberts, 2003; Waring and Six, 2005),

356 © 2008 CASI

Can. J. Remote Sensing, Vol. 34, No. 4, pp. 356–366, 2008

Received 31 December 2007. Accepted 18 June 2008. Published on the Canadian Journal of Remote Sensing Web site athttp://pubs.nrc-cnrc.gc.ca/cjrs on 12 September 2008.

G.J. McDermid1 and I.U. Smith. Foothills Facility for Remote Sensing and GIScience, Department of Geography, University of Calgary,Calgary, AB T2N 1N4, Canada.

1Corresponding author (e-mail: [email protected]).

Page 2: Pinus albicaulis) in Waterton Lakes National Park using logistic ...people.ucalgary.ca/~mcdermid/Docs/Articles/McDermid-Smith_2008.pdf · Mapping the distribution of whitebark pine

climate change (Koteen, 1999), and competition fromshade-tolerant tree species (Keane and Arno, 1993) bolsteredby more than a century of fire suppression (Hallett and Walker,2000). Concerns regarding the future of WBP, includingconsideration for formal listing as a threatened species, haveled to a variety of conservation initiatives, including fieldsurveys (Campbell and Antos, 2000), genetic inventories(Beardmore et al., 2006; Bower and Aitken, 2006), and an arrayof proactive intervention programs (Schoettle and Sniezko,2007).

Although WBP is known to exist throughout much of theCanadian Cordillera south of the Peace River, detailedinformation regarding the occurrence of the species at the patchor stand level is seriously lacking (Wilson and Stuart-Smith,2002). Previous work in the Kootenay, Yoho, and Lake Louiseregions of Alberta and British Columbia (Wilson and Marcoux,2005) has investigated the use of logistic regression models forexplaining patterns of WBP occurrence using topographicvariables derived from a digital elevation model (DEM).However, the effort was only able to account for 9% of theobserved variance, revealing, perhaps, the limitation ofmoderate-scale topographic datasets for supporting this type ofdemanding stand-level modeling work. Other studies havedemonstrated the potential of spectral variables from remotesensing for capturing patterns related to the distribution ofspecific vegetation species. For example, Kokaly et al. (2003)used hyperspectral Airborne Visible/Infrared ImagingSpectrometer (AVIRIS) data to compile a spectral library for anumber of vegetation cover types in Yellowstone National Park,including WBP, and used a spectral feature-fitting algorithmand expert system rules to generate vegetation maps withaccuracy levels of 74% overall. Goodwin et al. (2005) usedanalysis of variance to document statistical differences betweenspecies in an Australian eucalypt forest using 80 cm CompactAirborne Spectrographic Imager-2 (CASI-2) imagery andshowed that broad species groups could be separated withsupervised maximum likelihood strategies. Hamada et al.(2007) employed hierarchical clustering procedures to classifytamarisk communities in riparian habitats in southernCalifornia using high spatial and spectral resolution airbornedata. However, each of these studies relied on highly detailedairborne imagery and employed methods that would be difficultto duplicate over large areas.

Other researchers have shown that detailed vegetationpatterns can be modeled effectively withmoderate-spatial-resolution satellite imagery using datasetscomprised of combined spectral and topographic explanatoryvariables. For example, Wulder et al. (2006) usedmultitemporal Landsat Enhanced Thematic Mapper Plus(ETM+) imagery and a variety of topographic descriptors tomodel red-attack mountain pine beetle damage with anaccuracy of 86%. Lawrence and Wright (2001) achievedsimilar results mapping vegetation communities in theYellowstone ecosystem with combined TM–DEM explanatoryvariables. The potential of combined datasets for modelingpatterns of WPB occurrence was demonstrated by

Landenburger et al. (2006), who achieved overall accuraciesapproaching 90% in the Yellowstone region. However, thereremains a paucity of examples of this type of work in theremote sensing literature, particularly in the Canadian context,and we require additional research efforts designed toilluminate specific strategies for mapping WBP, and othervegetation communities of special concern, across the complexhabitats in which they occur. Of particular interest is the role ofspecific modeling techniques designed to handle thepresence–absence information that typically comprises theresponse variables available for this type of work. Twotechniques, binary logistic regression and classification treeanalysis, show particular promise. Logistic regression is a formof generalized linear modeling designed to handle the variancepatterns normally generated by binary response variables(Crawley, 2002). The procedure is attractive in that it generatesa continuous probability surface that represents a flexible,ratio-level information product that can be interpreted in avariety of manners. However, the model construction andcriticism process is time-consuming and requires specializedstatistical prowess. Classification tree analysis (CTA) is analternative statistical procedure designed to identify rule-baseddecision trees that predict the most probable classificationcategories using a series of recursive binary divisions (Breimanet al., 1984; Venables and Ripley, 1997). Although thelow-level, categorical output is not as desirable as theprobabilistic surface provided by logistic regression analysis,CTA is attractive in its ability to handle large numbers ofdiverse explanatory variables quickly and efficiently, andprevious studies (e.g., Vayssiers et al., 2000; Miller andFranklin, 2002) have demonstrated its broad effectivenessacross a variety of experimental conditions.

The objectives of the research reported in this paper weretwo-fold. The first was to compare the abilities of logisticregression and CTA to model the distribution of WBP withspectral and topographic variables across a diverse environmentin the northern portion of the species’ range. We posed thefollowing questions: Can medium-spatial-resolution satelliteimagery be combined with topographic descriptors derivedfrom a DEM to map the patch-level occurrence of WBP withreasonable accuracies? Would one of logistic regressionanalysis or CTA stand out as a superior strategy for modelingWBP presence and absence? The second objective of this workwas to contribute to ongoing conservation efforts by mappingthe distribution of WBP across Waterton Lakes National Park,where the species is experiencing high rates of blister rustinfection. A high-quality map of stand-level WBP occurrencedoes not presently exist in the area and would provide both avaluable tool for guiding proactive intervention programs and asolid foundation for conducting future studies designed toclarify the role of the species in the Waterton ecosystem. Ingeneral, we hope that this effort will help establish a foundationthat leads to subsequent interagency efforts aimed at mappingthe distribution of WBP across its entire Canadian range.

© 2008 CASI 357

Canadian Journal of Remote Sensing / Journal canadien de télédétection

Page 3: Pinus albicaulis) in Waterton Lakes National Park using logistic ...people.ucalgary.ca/~mcdermid/Docs/Articles/McDermid-Smith_2008.pdf · Mapping the distribution of whitebark pine

MethodsStudy area

Waterton Lakes National Park is located in the extremesouthwest corner of Alberta, Canada (Figure 1). The park islocated 270 km south of the city of Calgary and is bordered bythe province of British Columbia to the west and GlacierNational Park, Montana, to the south. Waterton Lakes NationalPark is a unique environment in that it is located on the frontranges of the Rocky Mountains, where the mountains meet theprairies. The park itself is relatively small (approximately500 km2) but is home to an exceptional diversity of plants andfauna, with 45 unique vegetation types identified in a recentParks Canada ecological land classification (Achuff et al., 2002).The landscape transitions from relatively flat grasslands in theeast to steep alpine peaks as the elevation rises from 1300 m toover 2500 m. Whitebark pine is a major component ofhigh-elevation forests in the study area, where it experiencesblister rust infection rates approaching 70% (Smith et al., 2008).

Datasets

The chief data sources for this project were (i) a Landsat 7ETM+ image from 7 July 2001, (ii) a digital elevation model,and (iii) georeferenced point data representing the presence andabsence of WBP. In addition, we used 1-m spatial resolutiondigital orthophotographs and 1 : 20 000 scale ecological landclassification maps to confirm the validity of WBPpresence–absence data where necessary (described later in thissection). The orthophotographs were derived from 1 : 40 000scale true colour aerial photography acquired by Horizons, Inc.(Rapid City, S. Dak.) for the Waterton–Glacier InternationalPeace Park area in August of 2004 (Horizons Job 12352).

The Landsat 7 ETM+ image (scene 41/26) contained sixspectral bands (bands 1–5 and 7) at a spatial resolution of 30 m.The image was converted to top-of-atmosphere reflectance andthen rescaled to eight-bit values. The tasseled-cap transformation(TCT) of Crist and Cicone (1984) was used to generate the TCTvariables brightness, greenness, and wetness, a reduced list oforthogonal spectral variables designed to facilitate subsequentmodeling efforts. In addition, a principal components analysis(PCA) was performed to guard against the possibility that theTCT would not be suitable for the high-elevation environmentprevalent in the study area. The normalized difference vegetationindex (NDVI) of Deering (1978) was also generated to fill outthe final spectral variable set.

A topographic–positional descriptor set consisting of thefollowing five digital explanatory variables was extracted froma coregistered DEM of the study area: elevation, slope, easting,northness, and eastness. Northness and eastness are both lineartransformations of the circular variable aspect and werecalculated as the cosine (northness) or the sine (eastness) ofaspect. Easting, a relative measure of longitude, was alsoincluded in the variable set. The final list of spectral andtopographic explanatory variables available for modeling issummarized in Table 1.

The distribution of WBP across the Waterton Lakes NationalPark study area was characterized by presence and absencepoints obtained from a combination of field surveys and aerialphotograph calls cross-referenced with Parks Canadaecological land classification maps. Presence points werecompiled from five separate field campaigns conductedbetween 1994 and 2006 (Table 2). From these efforts, 145georeferenced global positioning system (GPS) locationsindicating the presence of whitebark pine were obtained. Wetook special care to control for methodological differencesobserved between individual field programs, includinginconsistencies in the sample units surveyed, which rangedfrom locations of individual trees to the centroids of 100 msample plots. This issue was addressed through the creation ofspectral objects (described later in the paper) designed toprovide larger units of analysis that would increase thelikelihood that each presence entity contained at least oneconfirmed WBP individual.

WBP absence points were made up of 141 confirmed fieldplots acquired in the summer of 2005 using a stratified randomsampling method, in which 30 m sample plots were selectedacross locations constrained to within 250 m of a trail or accessfeature. The absence dataset was then supplemented with anadditional 349 randomly generated sample points designed tomatch the 0.98 points/km2 sample density achieved within the250 m access zone. Each randomly generated point wascarefully scrutinized with 1 m digital orthophotography andcross-checked with Parks Canada ecological land classificationmaps. The final WBP point sample dataset was comprised of145 presence locations (all confirmed with field surveys) and490 absence locations (141 field confirmed and 349 aerialphotograph confirmed).

358 © 2008 CASI

Vol. 34, No. 4, August/août 2008

Figure 1. Location of Waterton Lakes National Park in Alberta,Canada.

Page 4: Pinus albicaulis) in Waterton Lakes National Park using logistic ...people.ucalgary.ca/~mcdermid/Docs/Articles/McDermid-Smith_2008.pdf · Mapping the distribution of whitebark pine

Modeling data

To minimize the spatial uncertainty arising from hand-heldGPS locations and the varying plot sizes associated withindividual field campaigns, we created new object-basedsample units with Definiens Professional 5.0 using spectralTCT variables segmented with scale = 2, smoothness = 0.5, andcolour = 0.9. The parameters were selected iteratively withvisual interpretation for the purpose of creating small,spectrally homogeneous units of analysis that were roughly thesize of small tree stands, within which spectral and topographicexplanatory variables could be summarized. The resultingentities and their associated attributes defined the dataset uponwhich subsequent statistical modeling activities took place.The final dataset was split randomly into training (85%) andvalidation (15%) sets using Huberty’s (1994) heuristic, witheach set containing the same proportion of absence to presenceobjects: a ratio of approximately 3.4 to 1.

Logistic regression analysis

Simple linear regression assumes that variance is constantand that the errors are normally distributed (Crawley, 2002).However, binary and proportional data often display ∩-shapeddistribution patterns that violate these assumptions and call forthe use of binomial-family generalized linear models with alogit link. The logit transformation takes the form

logit =

ln

p

q(1)

where p is the probability of presence of whitebark pine, q isthe probability of absence of whitebark pine, and p + q = 1. Inthis way, presence and absence observations are combined inthe construction of a two-vector response variable, andregression analysis calculates changes in the logit variable,

© 2008 CASI 359

Canadian Journal of Remote Sensing / Journal canadien de télédétection

Variable Spatial resolution (m)

SpectralLandsat 7 ETM+ imagery (scene 41/26, 7 July 2001), bands 1–5 and 7 30Principal components analysis results (PC 1 – PC 6) 30Tasseled cap transformation results (brightness, greenness, wetness) 30NDVI 30

TopographicDigital elevation model of WLNP (elevation) 30Easting 30Eastness 30Northness 30Slope 30

Table 1. Summary of the candidate spectral and topographic variables available to model thedistribution of whitebark pine in the Waterton Lakes National Park (WLNP) study area.

Organization Date collectedCoordinaterecording device Purpose of survey Coordinate information

No. ofpoints

B. Wilson, SelkirkCollege andParks Canada

Summer 2005and 2006

Hand-held GPSunits

Identify blister rust resistanttrees

Coordinates represent location ofindividual tree

45

B. Wilson, SelkirkCollege andParks Canada

Summer 2004and 2005

Hand-held GPSunits

Collect information onwhitebark pine regeneration

Coordinates represent centre of100 m plots in which at leastone whitebark tree is present

38

C. Wong,University ofBritish Columbia

Summer 2005 Hand-held GPSunits

Research on whitebark pinestand dynamics as part ofPh.D. project

Coordinates represent centre of30 m plots in which one ormore whitebark pine are present

15

Parks Canada Summer 2003 Hand-held GPSunits

Document extent of whitebarkpine blister rust

Coordinates represent beginning orend of transects; at least onewhitebark pine is present within5 m of these coordinates

38

Parks Canada 1994–1996 Hand-held GPSunits

Collect predominant vegetationinformation to aid increation of ecosystem landclassification (ELC) maps

Coordinates represent centre of20 m plots within which at leastone whitebark was identified

9

Total 145

Table 2. Summary of the whitebark pine presence points used in the analyses.

Page 5: Pinus albicaulis) in Waterton Lakes National Park using logistic ...people.ucalgary.ca/~mcdermid/Docs/Articles/McDermid-Smith_2008.pdf · Mapping the distribution of whitebark pine

rather than the individual dependents themselves. The modelcalculates the probability of WBP presence (p) as

px xk k

=+ − + + +

1

1 0 1 1exp[ ( )]α α α�

(2)

where α0–αk are the regression coefficients of k explanatoryvariables (x). Unlike ordinary least squares regression, theexplanatory variables can be continuous or discrete, normallydistributed or not (Hosmer and Lemeshow, 2000).

We constructed nine candidate models using ecologicallyfeasible combinations of explanatory variables independent atthe r < 0.7 level and examined for significance using the Waldstatistic (Norusis, 2005). The significance of each model wascriticized using Hosmer and Lemeshow goodness-of-fit tests(Hosmer and Lemeshow, 2000) and evaluated against each otherusing Akaike’s information criteria (AIC). AIC measures therelative “distance” of each model from the truth, calculated as

AIC LL= − +2 2k (3)

where LL is the logarithm-likelihood function, and k is thenumber of parameters estimated in the model (Burnham andAnderson 2002). Lower AIC values indicate a better fittingmodel with fewer estimated parameters, following the principleof parsimony. AIC was adjusted for sample size through thecalculation of AICc:

AICc AIC= + +− −

[ ( )]

( )

2 1

1

k k

n k(4)

where n is the sample size. We determined the rank of eachmodel by calculating the difference (∆i) between the AICcvalue of the ith model and that of the best model (AICcmin):

∆i i= −AICc AICcmin (5)

Once identified, the best model was applied spatially inArcGIS to generate a continuous probability surface rangingbetween 0 and 1 for all objects in the study area. We used theindependent test dataset to create receiver operatingcharacteristics (ROC) curves (Hanley 1989), confusionmatrices, and standard categorical accuracy-assessmentstatistics by thresholding the probability surface at the 0.4 level,a level selected empirically to maximize overall accuracy andone that approximates the level used previously by otherauthors (e.g., Wulder et al., 2005).

Classification tree analysis

Classification and regression tree analyses represent anincreasingly popular set of statistical techniques designed topredict categorical (classification tree) or continuous(regression tree) response variables through binary recursivepartitioning, wherein feature space is divided up using arecursive series of binary splits based on the thresholding of

individual predictor variables (Venables and Ripley, 1997).The result is a multibranched tree diagram that defines a seriesof decision rules which are well suited for implementing in aGIS environment. A number of splitting criteria are available(Zambon et al., 2006), though Pal and Mather (2003) havesuggested that the choice has little effect on overall accuracylevels. CTA offers several important advantages over logisticregression analysis, including a greatly simplified, one-stepvariable selection procedure and simple, easy-to-interpretgraphical outputs. The procedure is also nonparametric andhandles explanatory variables at any data level, though thecategorical output is decidedly less flexible than the probabilitydensity surface generated by logistic regression analysis.

We used the class probability splitting algorithm contained inS-Plus 14.0 to generate a decision tree comprised of 16 terminalnodes. The analysis segregates the data based on probability,with the algorithm comparing all possible values for everypredictor variable, and selecting splits that maximize the“purity” of the child nodes: minimizing deviance whilemaximizing homogeneity (Venables and Ripley, 1997). Splittingalong a given branch is halted when the number of observationsis less than 10 or the node features less than 1% of the totaldeviance of the entire tree. The maximal 16-node tree waspruned to 10 nodes using cross-validation and a standardcost-complexity measure (Crawley, 2002) As in logisticregression analysis, the final model was implemented withinArcMap to generate binary values of 1 (presence) or 0 (absence)for every object in the study area. The predicted classes werethen compared against known values for the 95 objects in theindependent validation dataset and used to generate confusionmatrices and standard accuracy-assessment statistics.

ResultsLogistic regression analysis

The rank, AICc, ∆i values, and other relevant statistics for thenine candidate logistic regression models are summarized inTable 3. As expected, each model contained a blend of spectraland topographic predictors, verifying once again the superiorperformance of combined multisource data for this kind ofanalysis. The best model (model 1) contained six explanatoryvariables, with positive coefficients displayed for elevation andNDVI, and negative coefficients for northness and twoadditional spectral variables, namely ETM+ band 2 and PCA 2.The topographic–positional relationships were logical,reflecting the increased occurrence of WBP at high elevationand warm aspect environments documented elsewhere in theliterature (e.g., Arno and Hoff, 1989). The negative coefficientof the easting variable, a simple measure of longitude, indicatedthat WBP presence decreased as one moved farther east acrossthe study area. The inclusion of the three spectral variablesreflected the associations one would expect between WBP andgreen vegetation, namely positive for NDVI and negative forETM+ band 2 (chlorophyll absorption) and PCA 2, interpretedhere to represent land cover dominated by conifer forest –

360 © 2008 CASI

Vol. 34, No. 4, August/août 2008

Page 6: Pinus albicaulis) in Waterton Lakes National Park using logistic ...people.ucalgary.ca/~mcdermid/Docs/Articles/McDermid-Smith_2008.pdf · Mapping the distribution of whitebark pine

treeline (including WBP habitat) in the ranges selected by themodel. The topographic variables of easting, elevation, andnorthness appeared in each of the candidate regression models,indicating the importance of these topographic controls onWPB distribution. Other highly ranked models containedreduced or alternate combinations of spectral variables (PCA 2was excluded from model 2, ranked second; brightness wasincluded in model 9, ranked third).

The final WBP presence map, as predicted by the bestlogistic regression model probability surface, thresholded at the0.4 level is shown in Figure 2. The area under the ROC curveswas 0.952 and 0.950 for the training and validation datasets,respectively, falling well within the 0.9–1.0 range suggested bySwets (1988) to indicate “very good” levels of discrimination.The overall accuracy of the map was 90% (kappa = 0.76), withthe pattern of errors (95% producer’s accuracy and 72% user’saccuracy for WPB presence; 89% producer’s accuracy and 89%user’s accuracy for WBP absence) suggesting an overpredictionof WBP presence, with almost 30% error of commission amongindependent test objects.

Classification tree analysis

The 10-node classification tree generated by CTA is shown inFigure 3. The variables qualifying for the final model (elevation,easting, northness, NDVI, ETM+ band 2, and PCA 2) wereidentical to those observed in the best logistic regression model,suggesting a reassuring consistency in variable selectionbetween the two techniques. The final WBP presence map, aspredicted by CTA, is shown in Figure 4. The overall accuracy of

the map was 91% (kappa = 0.76), with the pattern of errors (77%producer’s accuracy and 85% user’s accuracy for WBPpresence; 96% producer’s accuracy and 93% user’s accuracy forWBP absence) suggesting a moderate underprediction of WBPpresence of about 20% among independent test objects.

DiscussionThe comparative accuracies of the final CTA and logistic

regression models, as defined by standard categorical accuracyassessment statistics from the independent test dataset, are shownin Table 4. The overall accuracy of CTA and logistic regressionmap products is the same at 0.76, a result confirmed by a kappaZ-test score of 0, suggesting (obviously) no significant difference(Foody, 2004). Although the overall thematic quality of the mapsis the same, there were substantial differences among errorpatterns at the individual class level. The logistic model had ahigher producer’s accuracy of WBP presence than the CTA model(77%–95%), but lower user’s accuracy (72%–85%), suggestingthat the logistic model thresholded at the 0.4 level overpredictedthe presence of WBP. The extent of the difference is revealedgraphically in Figure 5, in which the logistic regression model isshown to predict areas below 1500 m as containing WPBpresence; well beneath the lowest observed presence of the speciescontained in the field data (1781 m).

We increased the threshold of the logistic regression modelto the 0.8 probability level, a flexibility that illustrates one ofthe distinct advantages of this modeling strategy over CTA, inan attempt to reduce the documented overprediction of WBPpresence and explore the impact of threshold adjustments on

© 2008 CASI 361

Canadian Journal of Remote Sensing / Journal canadien de télédétection

Model Variables (coefficients)

Rank and summary statistics

2LL k AIC AICc ∆i Rank

1 Easting (–0.00013), elevation (0.00683), northness (–0.01485),NDVI (0.20719), PCA 2 (–0.1839), band 2 (–0.346),constant (83.6334)

201.11 7 215.11 215.32 0.00 1

2 Easting (–0.00017), elevation (0.00740), northness (–0.01597),NDVI (0.5646), band 2 (–0.1489), constant (101.072)

235.17 6 247.17 247.32 32.00 2

3 Easting (–0.00019), elevation (0.00732), northness (–0.01225),NDVI (0.09589), constant (108.180)

254.38 5 264.38 264.49 49.17 6

4 Easting (–0.00018), elevation (0.00751), northness (–0.01396),NDVI (0.08972), band 5 (–0.0135), constant (105.264)

251.78 6 263.78 263.94 48.62 7

5 Easting (–0.00018), elevation (0.00598), northness (–0.01744),greenness (0.0649), brightness (–0.0322), constant (121.907)

258.13 6 270.13 270.28 54.97 8

6 Easting (–0.00016), elevation (0.00613), northness (–0.01693),greenness (–0.1226), brightness (0.1159), band 3 (–0.534),constant (102.170)

241.38 7 255.38 255.59 40.28 4

7 Easting (–0.00021), elevation (0.00274), northness (–0.01275),eastness (–0.0055), slope (–0.320), constant (149.148)

296.26 6 308.26 308.41 93.10 9

8 Easting (–0.00019), elevation (0.00701), northness (–0.01275),eastness (–0.00596), NDVI (0.09538), constant (111.149)

248.15 6 260.15 260.31 44.99 5

9 Easting (–0.00016), elevation (0.00770), northness (–0.01621),NDVI (0.09113), brightness (–0.0313), constant (93.308)

238.12 6 250.12 250.28 34.96 3

Note: AIC, Akaike’s information criteria; AICc, sample size corrected AIC; k, number of parameters estimated in the model; LL, logarithm-likelihoodfunction; ∆i, difference between the AICc value of the ith model and that of the best model.

Table 3. Rank and summary statistics for the nine candidate variables generated through logistic regression analyses.

Page 7: Pinus albicaulis) in Waterton Lakes National Park using logistic ...people.ucalgary.ca/~mcdermid/Docs/Articles/McDermid-Smith_2008.pdf · Mapping the distribution of whitebark pine

362 © 2008 CASI

Vol. 34, No. 4, August/août 2008

Figure 2. Whitebark pine (WBP) distribution as predicted using logistic regression modeling. The probability densitysurface was thresholded at 0.4 to create a binary presence–absence map.

Figure 3. Final classification tree, highlighting the three rules predicting the presence of WBPin the Waterton Lakes National Park study area.

Page 8: Pinus albicaulis) in Waterton Lakes National Park using logistic ...people.ucalgary.ca/~mcdermid/Docs/Articles/McDermid-Smith_2008.pdf · Mapping the distribution of whitebark pine

thematic accuracy. Although the resulting statistics (Table 4)did reveal better user’s accuracy for WBP presence (88% forthe 0.8 threshold versus 72% for the 0.4 threshold), thecorresponding producer’s accuracy suffered substantially (68%for the 0.8 threshold versus 95% for the 0.4 threshold).Although the kappa Z-test scores suggest no significant overalldifferences between the three models, there are clear disparitiesin class accuracies, the most “balanced” of which seems to bethe CTA product. Some of the issues documented in the logisticregression model could be due to the fact that the regressionapproach assesses all the variables simultaneously in theselection process, as opposed to CTA, which partitions the data

sequentially, starting with the best predictor variable(elevation, in this case) first and following through logicallywith the best remaining predictors. This multiple step approachto classification could lead to more effective use of theindependent variables and subsequently more balanced models.

Of the two modeling methods, CTA was considerably fasterand more user-friendly to apply than binary logistic regression.The machine learning nature of the variable selection process iseasy and efficient, and the CTA graphical output is logical andaccessible. Although practitioners are advised to guard againstthe habit of introducing illogical variables into the analysis andrelying exclusively on the ability of CTA to select relevant

© 2008 CASI 363

Canadian Journal of Remote Sensing / Journal canadien de télédétection

Figure 4. Whitebark pine distribution as predicted using classification tree analysis.

CTA Logistic threshold 0.4 Logistic threshold 0.8

Accuracy WBP Non-WBP WBP Non-WBP WBP Non-WBP

User’s (%) 85.00 93.33 72.41 98.48 88.24 91.03Producer’s (%) 77.27 95.89 95.45 89.04 68.18 97.26Overall (%) 91.58 90.53 90.53Kappa (%) 76 76 71Kappa Z-test score 0.0 (CTA to the 0.4 logistic model); 0.4214 (CTA to the 0.8 logistic model);

0.4218 (0.4 logistic model to the 0.8 logistic model)

Table 4. Comparative accuracy statistics for the final CTA and logistic regression maps thresholded at the0.4 and 0.8 levels.

Page 9: Pinus albicaulis) in Waterton Lakes National Park using logistic ...people.ucalgary.ca/~mcdermid/Docs/Articles/McDermid-Smith_2008.pdf · Mapping the distribution of whitebark pine

predictors, the approach represents an attractive alternative tothe more demanding logistic regression modeling procedure.However, the ratio-level probability surface generated by theregression model is clearly the more flexible and potentiallyuseful output of the two. Though end users must be made awareof the trade-offs inherent in selecting liberal or conservativethreshold values, the ability to adapt the WBP likelihoodsurface to suit individual needs represents a key advantage.

ConclusionClassification tree analysis (CTA) and logistic regression

analysis were both found to be effective strategies for modelingthe distribution of whitebark pine (WBP) across a diverse studyarea in Waterton Lakes National Park, combining spectral andtopographic predictor variables to generate map products ofroughly equal thematic quality. The logistic model wasvaluable in its ability to predict ratio-level probability surfacesfor WBP presence, but the CTA product was found to be morebalanced from a class-accuracy perspective and simpler toproduce. Practitioners selecting between the two approachesare encouraged to balance the needs of flexibility (logisticregression) with those of speed and simplicity (CTA), though it

will always remain important to exercise sound modelingpractices.

Medium-spatial-resolution optical satellite data and digitalelevation models (DEMs) provided a solid foundation fromwhich to conduct detailed stand-level mapping of WBP inwestern Canada, with overall accuracies exceeding 90% in thisexperiment. It should be noted, however, that these results (aswith all such empirical techniques) may vary considerably atdifferent sites or spatial extents. Presence and absence pointdata from a variety of sources can be combined if sufficient careis taken to minimize spatial uncertainty arising from disparatefield campaigns. Future work should be aimed at facilitatingmulti-agency efforts to map WBP across its entire westernCanada range to support ongoing conservation activitiesassociated with this keystone species.

AcknowledgementsWe gratefully acknowledge the cooperation and support of

the partner projects that supported this research both financiallyand through the sharing of data products. The FoothillsResearch Institute Grizzly Bear Research Program provided thedigital and field data that contributed to the explanatory

364 © 2008 CASI

Vol. 34, No. 4, August/août 2008

Figure 5. Comparison of the logistic regression model (grey) and classification tree analysis (CTA) model (red),illustrating the overprediction of WBP presence.

Page 10: Pinus albicaulis) in Waterton Lakes National Park using logistic ...people.ucalgary.ca/~mcdermid/Docs/Articles/McDermid-Smith_2008.pdf · Mapping the distribution of whitebark pine

variables and absence points used in this analysis. Cyndi Smithfrom Parks Canada donated the materials that provided theWBP presence points across the Waterton Lakes National Parkstudy area. Dr. Brendan Wilson provided key initial insightsand ongoing intellectual support.

ReferencesAchuff, P.L., McNeil, R.L., Coleman, M.L., Wallis, C., and Wershler, C. 2002.

Ecological land classification of Waterton Lakes National Park, Alberta.Vol. I: integrated resource description. Parks Canada, Waterton Park, Alta.221 pp.

Arno, S.F., and Hoff, R.J. 1989. Silvics of whitebark pine (Pinus albicaulis).USDA Forestry Service Intermountain Research Station, Ogden, Utah.General Technical Report INT-253.

Beardmore, T., Loo, J., McAfee, B., Malouin, C., and Simpson, D. 2006. Asurvey of tree species of concern in Canada: the role for geneticconservation. Forestry Chronicle, Vol. 82, No. 3, pp. 351–363.

Bower, A.D., and Aitken, S.N. 2006. Geographic and seasonal variation incold hardiness of whitebark pine. Canadian Journal of Forest Research,Vol. 36, No. 7, pp. 1842–1850.

Breiman, L., Friedman, J., Olshen, R., and Stone, C. 1984. Classification andregression trees. Wadsworth, Belmont, Calif.

Bricklemyer, R.S., Lawrence, R.L., Miller, P.R., and Battogtokh, N. 2006.Predicting tillage practices and agricultural soil disturbance in north centralMontana with Landsat imagery. Agriculture Ecosystems & Environment,Vol. 114, No. 2–4, pp. 210–216.

Burnham, K.P., and Anderson, D.R. 2002. Model selection and multimodelinference: a practical information–theoretic approach. 2nd ed. Springer,New York.

Campbell, E.M., and Antos, J.A. 2000. Distribution and severity of white pineblister rust and mountain pine beetle on whitebark pine in BritishColumbia. Canadian Journal of Forest Research, Vol. 30, No. 7,pp. 1051–1059.

Copeland, J.P., Peek, J.M., Groves, C.R., Melquist, N.E., McKelvey, K.S.,McDaniel, G.W., Long, C.D., and Harris, C.E. 2007. Seasonal habitatassociations of the wolverine in central Idaho. Journal of WildlifeManagement, Vol. 71, No. 7, pp. 2201–2212.

Craighead, J.J., Sumner, J.S., and Scaggs, G.B. (Editors). 1982. A definitivesystem for analysis of grizzly bear habitat and other wilderness resources.University of Montana Foundation, Missoula, Mont. Wildlife–WildlandsInstitute Monograph 1.

Crawley, M.J. 2002. Statistical computing: an introduction to data analysisusing S-Plus. Wiley, New York.

Crist, E.P., and Cicone, R.C. 1984. Application of the tasselled cap concept tosimulated Thematic Mapper data. Photogrammetric Engineering & RemoteSensing, Vol. 50, No. 3, pp. 343–352.

Critchfield, W.B., and Little, E.L., Jr. 1966. Geographic distribution of thepines of the world. USDA Forestry Service, Washington, D.C.Miscellaneous Publication 991. 97 pp.

Deering, D.W. 1978. Rangeland reflectance characteristics measured byaircraft and spacecraft sensors. Texas A&M University, College State, Tex.

Farnes, P.E. 1990. SNOTEL and snow course data: describing the hydrology ofwhitebark pine ecosystems. In Proceedings of the Symposium on Whitebark

Pine Ecosystems: Ecology and Management of a High-Mountain Resource,29–31 March 1989, Bozeman, Mont. Compiled by W.C. Schmidt and K.J.MacDonald. USDA Forestry Service Intermountain Research Station,Ogden, Utah. General Technical Report INT-270, pp. 302–304.

Foody, G.M. 2004. Thematic map comparison: Evaluating the statisticalsignificance of differences in classification accuracy. PhotogrammetricEngineering & Remote Sensing, Vol. 70, No. 5, pp. 627–633.

Goodwin, N., Turner, R., and Merton, R. 2005. Classifying Eucalyptus forestswith high spatial and spectral resolution imagery: an investigation ofindividual species and vegetation communities. Australian Journal ofBotany, Vol. 53, No. 4, pp. 337–345.

Hallett, D., and Walker, R.C. 2000. Paleoecology and its application to fire andvegetation management in the Kootenay National Park, British Columbia.Journal of Paleolimnology, Vol. 24, pp. 401–414.

Hamada, Y., Stow, D.A., Coulter, L.L., Jafolla, J.C., and Hendricks, L.W.2007. Detecting Tamarisk species (Tamarix spp.) in riparian habitats ofsouthern California using high spatial resolution hyperspectral imagery.Remote Sensing of Environment, Vol. 109, No. 2, pp. 237–248.

Hanley, J.A. 1989. Receiver operating characteristic (ROC) methodology —the state of the art. Critical Reviews in Diagnostic Imaging, Vol. 29, No. 3,pp. 307–335.

Hosmer, D., and Lemeshow, S.. 2000. Applied logistic regression. 2nd ed.Wiley, New York.

Huberty, H.J. 1994. Applied discriminant analysis. Wiley-Interscience, NewYork.

Hutchins, H.E., and Lanner, R.M. 1982. The central role of Clark’s nutcrackerin the dispersal and establishment of whitebark pine. Oecologia, Vol. 55,No. 2, pp. 192–201.

Keane, R.E., and Arno, S.F. 1993. Rapid decline of whitebark pine in westernMontana: evidence from 20-year remeasurements. Western Journal ofApplied Forestry, Vol. 8, pp. 44–47.

Kokaly, R.F., Despain, D.G., Clark, R.N., and Livo, K.E. 2003. Mappingvegetation in Yellowstone National Park using spectral feature analysis ofAVIRIS data. Remote Sensing of Environment, Vol. 84, No. 3, pp. 437–456.

Koteen, L.E. 1999. Climate change, whitebark pine, and grizzly bears in thegreater Yellowstone ecosystem. American Zoologist, Vol. 39, No. 5,pp. 113A–114A.

Landenburger, L., Lawrence, R.L., Podruzny, S., and Schwartz, C. 2006.Mapping whitebark pine distribution in the Greater YellowstoneEcosystem. In Proceedings of the American Society for Photogrammetryand Remote Sensing Annual Conference, 1–5 May 2006, Reno, Nev.American Society for Photogrammetry and Remote Sensing (ASPRS),Bethesda, Md.

Lanner, R.M., and Gilbert, B.K. 1994. Nutritive value of whitebark pine seedsand the question of their variable dormancy. In Proceedings, InternationalWorkshop on Subalpine Stone Pines and Their Environment: The Status ofOur Knowledge. Edited by W.C. Schmidt and F.-K. Holtmeier. USDAForestry Service Intermountain Research Station, Ogden, Utah. GeneralTechnical Report INT-309.

Lawrence, R.L., and Wright, A. 2001. Rule-based classification systems usingclassification and regression tree (CART) analysis. PhotogrammetricEngineering & Remote Sensing, Vol. 67, No. 10, pp. 1137–1142.

Mattson, D.J., and Merrill, T. 2002. Extirpations of grizzly bears in thecontiguous United States, 1850–2000. Conservation Biology, Vol. 16,No. 4, pp. 1123–1136.

© 2008 CASI 365

Canadian Journal of Remote Sensing / Journal canadien de télédétection

Page 11: Pinus albicaulis) in Waterton Lakes National Park using logistic ...people.ucalgary.ca/~mcdermid/Docs/Articles/McDermid-Smith_2008.pdf · Mapping the distribution of whitebark pine

Mattson, D.J., and Reinhart, D.P. 1997. Excavation of red squirrel middens bygrizzly bears in the whitebark pine zone. Journal of Applied Ecology,Vol. 34, No. 4, pp. 926–940.

Mattson, D.J., Blanchard, B.M., and Knight, R.R. 1992. Yellowstone grizzlybear mortality, human habituation, and whitebark pine seed crops. Journalof Wildlife Management, Vol. 56, No. 3, pp. 432–442.

Mattson, D.J., Kendall, K.C., and Reinhart, D.P. 2001. Whitebark pine, grizzlybears, and red squirrels. In Whitebark pine communities: ecology andrestoration. Edited by D.F. Tomback, S.F. Arno, and R.E. Keane. IslandPress, Washington, D.C.

McKinney, S.T., and Tomback, D.F. 2007. The influence of white pine blisterrust on seed dispersal in whitebark pine. Canadian Journal of ForestResearch, Vol. 37, No. 6, pp. 1044–1057.

Miller, J., and Franklin, J. 2002. Modeling the distribution of four vegetationalliances using generalized linear models and classification trees withspatial dependence. Ecological Modelling, Vol. 157, No. 2–3, pp. 227–247.

Norusis, M. 2005. SPSS 13.0 advanced statistical procedures companion.Prentice Hall, New York.

Pal, M., and Mather, P.M. 2003. An assessment of the effectiveness of decisiontree methods for land cover classification. Remote Sensing of Environment,Vol. 86, No. 4, pp. 554–565.

Perkins, D.L., and Roberts, D.W. 2003. Predictive models of whitebark pinemortality from mountain pine beetle. Forest Ecology and Management,Vol. 174, No. 1–3, pp. 495–510.

Schoettle, A.W., and Sniezko, R.A. 2007. Proactive intervention to sustainhigh-elevation pine ecosystems threatened by white pine blister rust.Journal of Forest Research, Vol. 12, No. 5, pp. 327–336.

Smith, C.M., Wilson, B.C., Rasheed, S., Walker, R.C., Carolin, T., andShepherd, B. 2008. Whitebark pine and white pine blister rust in the RockyMountains of Canada and northern Montana. Canadian Journal of ForestResearch, Vol. 38, No. 5, pp. 982–995.

Swets, J.A. 1988. Measuring the accuracy of diagnostic systems. Science(Washington, D.C.), Vol. 240, No. 4857, pp. 1285–1293.

Tomback, D.F., Arno, S.F., and Keane, R.E. 2001. The compelling case formanagement intervention. In Whitebark pine communities: ecology andrestoration. Edited by D.F. Tomback, S.F. Arno, and R.E. Keane. IslandPress, Washington, D.C.

Vayssieres, M.P., Plant, R.E., and Allen-Diaz, B.H. 2000. Classification trees:an alternative non-parametric approach for predicting species distributions.Journal of Vegetation Science, Vol. 11, No. 5, pp. 679–694.

Venables, W.N., and Ripley, B.D. 1997. Modern applied statistics with S-PLUS.2nd ed. Springer, New York.

Waring, K.M., and Six, D.L. 2005. Distribution of bark beetle attacks afterwhitebark pine restoration treatments: a case study. Western Journal ofApplied Forestry, Vol. 20, No. 2, pp. 110–116.

Wilson, B.C., and Marcoux, H. 2005. A draft whitebark pine restoration planfor the Rocky Mountain national parks. Cordilleran Ecological Research,Winlaw, B.C. Cordilleran Ecological Research Report KNP03-00.

Wilson, B.C., and Stuart-Smith, G.J. 2002. Whitebark pine conservation forthe Canadian Rocky Mountain national parks. Cordilleran EcologicalResearch, Winlaw, B.C. Cordilleran Ecological Research ReportKNP01-01.

Wulder, M.A., White, J.C., Bentz, B., Alvarez, M.F., and Coops, N.C. 2006.Estimating the probability of mountain pine beetle red-attack damage.Remote Sensing of Environment, Vol. 101, No. 2, pp. 150–166.

Zambon, M., Lawrence, R., Bunn, A., and Powell, S. 2006. Effect ofalternative splitting rules on image processing using classification treeanalysis. Photogrammetric Engineering & Remote Sensing, Vol. 72, No. 1,pp. 25–30.

Zeglen, S. 2002. Whitebark pine and white pine blister rust in BritishColumbia, Canada. Canadian Journal of Forest Research, Vol. 32, No. 7,pp. 1265–1274.

366 © 2008 CASI

Vol. 34, No. 4, August/août 2008