Structuring Data to Facilitate Analysis Jerry J. Vaske Jay Beaman Colorado State University Warner...

Structuring Data to Facilitate Analysis

Jerry J. Vaske
Jay Beaman

Colorado State University
Warner College of Natural Resources
Human Dimensions of Natural Resources
Fort Collins, CO

Workshop at the 2008 Pathways to Success Conference: Integrating Human Dimensions into Fish & Wildlife Management

Workshop Foundation

Workshop Objectives

• Illustrate strategy for:

– Facilitating analysis of 2006 National Survey of Fishing, Hunting, and Wildlife-Associated Recreation (FHWAR)

– Increasing the usability of FHWAR data for management, planning & policy

• Compare two types of data structures:

– Flat files

– Relational Entities

Traditional Flat File

Rows = Respondents

Columns = Variables

Flat Files – Journal Article Example

Every journal article has:

• One or more authors

• Title

• Journal name

• Specifics about date of publication: year, volume number, issue number, page numbers

• Potentially keywords

Flat File Data Structure for Journal Articles

Potential Issues with Flat Files

• Problem

– Diefenbach et al. (2005) article had 7 co-authors

– 7 columns (variables) necessary to accommodate all authors’ last names

– 19 of 26 articles in flat file had only 1 or 2 authors

– 67% of author fields empty

– If first names were included, even more fields would be empty

• Solution: relational database

Relational Databases

• Definition

– Set of tables containing data for predefined categories

– Data stored in separate files (tables) that are linked

• Terminology

– Table = Entity (E)

– Rows (tuples) in table = information about an object (e.g., journal article or respondent)

– Columns (attributes) = variables

– Two types of relations (R):
  1. Set of tuples: a table with attributes (these R’s store data)
  2. Algebraic: Person ID in Table A = Person ID in Table B (these relations use data stored in entities)

Relational Data Structure for Journal Articles

Article Entity: Article ID, Journal ID, Article title, Year, Issue, Pages

Journal Entity: Journal ID, Journal name, Publisher info

Author Entity: Author ID, Last name, First name, Contact info

Keyword Entity: Keyword ID (e.g., attitudes, norms)

(R1) Relational Table: Article ID, Author ID

(R2) Relation: Journal ID

(R3) Relational Table: Article ID, Keyword ID
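The entities and relations listed above can be sketched as a small relational schema. The example below is only an illustration using Python's built-in sqlite3 module; the table names, column names, and column types are assumptions chosen to mirror the slide and are not taken from the workshop files.

```python
import sqlite3

# Hypothetical table and column names modeled on the entities named on the slide.
SCHEMA = """
CREATE TABLE journal (
    journal_id     INTEGER PRIMARY KEY,
    journal_name   TEXT,
    publisher_info TEXT
);
CREATE TABLE article (
    article_id INTEGER PRIMARY KEY,
    journal_id INTEGER REFERENCES journal(journal_id),  -- R2: article to journal
    title      TEXT,
    year       INTEGER,
    issue      TEXT,
    pages      TEXT
);
CREATE TABLE author (
    author_id    INTEGER PRIMARY KEY,
    last_name    TEXT,
    first_name   TEXT,
    contact_info TEXT
);
CREATE TABLE keyword (
    keyword_id INTEGER PRIMARY KEY,
    keyword    TEXT
);
-- R1: one row per (article, author) pair
CREATE TABLE article_author (
    article_id INTEGER REFERENCES article(article_id),
    author_id  INTEGER REFERENCES author(author_id),
    PRIMARY KEY (article_id, author_id)
);
-- R3: one row per (article, keyword) pair
CREATE TABLE article_keyword (
    article_id INTEGER REFERENCES article(article_id),
    keyword_id INTEGER REFERENCES keyword(keyword_id),
    PRIMARY KEY (article_id, keyword_id)
);
"""


def build_database():
    """Create an in-memory database holding the four entities and two link tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def table_names(conn):
    """Return the names of all tables in the database, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]
```

In this sketch R2 needs no table of its own because each article belongs to one journal, so the Journal ID column in the article table carries the relation. R1 and R3 are separate tables because an article can have many authors and many keywords.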

Comparison Flat File vs. Relational Database

Flat file

Relational Database

R1 = table linking multiple authors (AuthorID) to a given article (ArticleID)

ArticleID   AuthorID
2059        314
2059        59
2059        233

The Author entity and R1 (author–article relation) can have any number of rows, so all authors of an article can be identified.
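As a rough illustration of how R1 is used, the sketch below looks up every author of one article with a join. The ArticleID and AuthorID values come from the slide; the author names are placeholders, and the author_order column and article 2060 are assumptions added only so the example has something to show.

```python
import sqlite3


def build_example():
    """Build a tiny Author entity plus the R1 link table in memory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE author (author_id INTEGER PRIMARY KEY, last_name TEXT);
        -- author_order is a hypothetical column, not shown on the slide
        CREATE TABLE article_author (
            article_id   INTEGER,
            author_id    INTEGER,
            author_order INTEGER
        );
    """)
    # Placeholder names: the slide gives IDs only.
    conn.executemany(
        "INSERT INTO author VALUES (?, ?)",
        [(314, "Author_314"), (59, "Author_59"), (233, "Author_233")],
    )
    conn.executemany(
        "INSERT INTO article_author VALUES (?, ?, ?)",
        [
            (2059, 314, 1),
            (2059, 59, 2),
            (2059, 233, 3),
            (2060, 59, 1),  # hypothetical single-author article
        ],
    )
    return conn


def authors_of(conn, article_id):
    """Return the last names of all authors of one article, in author order."""
    rows = conn.execute(
        "SELECT a.last_name "
        "FROM article_author AS r "
        "JOIN author AS a ON a.author_id = r.author_id "
        "WHERE r.article_id = ? "
        "ORDER BY r.author_order",
        (article_id,),
    )
    return [row[0] for row in rows]
```

An article with seven authors would simply contribute seven rows to the link table, and an article with one author contributes one row, so no author fields are left empty.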

FHWAR Flat File Example

• Fishing, Hunting & Wildlife-Associated Recreation (FHWAR)

• National Survey

– Conducted about every 5 years

– 1955: first survey

– 2006: most recent survey

• Data on hunters, anglers, wildlife watchers:

– Sportsperson expenditures

– Species sought in different states

• Data collection costs (1991–2006): $55 million (in 2008 $)

• 1991–2006 data comparable within limits but not integrated

2006 – FHWAR Flat File Data

Data distributed on CD containing 3 ASCII text files:

1. Screening data

2. Sportsperson (hunting & fishing) data

3. Wildlife Watcher data

Data file          # of Records   # of Variables
Screening          144,509        56
Sportsperson       21,942         3,765
Wildlife Watcher   11,285         772

FHWAR Flat File – Analytical Issues

• Important issues

– 4,500+ variables with obscure names (e.g., NCU_STD1)

– 200 pages of documentation

– Census conversion programs do not create variable labels or value labels

• Major issues

– Data compression

– Conceptual complexity

Analytical Issues Affecting Use

• Data compression

– No hunters hunt in all 50 states (at most 8 in 2006 data)

– To avoid numerous empty cells, data are compressed (e.g., the values for 3 vars are combined into a single var)

– For example: “days” of participation is combined with an “activity” (e.g., big game or small game hunting) in a given “state” (in the order states are mentioned)

– Compressed vars cannot be directly analyzed by SAS or SPSS

• Conceptual complexity

– When uncompressed to blocks of 50 states ≈ 20,000 variables

– Difficult to visualize analysis strategy

– Flat FHWAR files hide data structure
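The idea of decompressing slot-based flat-file variables into one row per person and state can be sketched as follows. This is a simplified illustration, not the FHWAR layout: the field names state_1..state_8 and days_1..days_8 are hypothetical, and a real record also carries the activity dimension.

```python
def decompress(record, n_slots=8):
    """Turn one wide record with numbered state/days slots into long rows.

    The slot names (state_k, days_k) are hypothetical stand-ins for the
    compressed flat-file variables described above.
    """
    rows = []
    for k in range(1, n_slots + 1):
        state = record.get(f"state_{k}", "")
        if not state:
            continue  # empty slot: this person reported fewer states
        days = record.get(f"days_{k}") or 0
        if days > 0:
            rows.append(
                {"person_id": record["person_id"], "state": state, "days": days}
            )
    return rows


# Hypothetical respondent who hunted in two states.
example = {
    "person_id": 1,
    "state_1": "CO", "days_1": 5,
    "state_2": "WY", "days_2": 3,
}
long_rows = decompress(example)
```

Each output row names its state explicitly, so an analysis can filter on the state value instead of having to know which numbered slot a state happened to land in.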

Relational File Structure Illustration

Entity                  Based on flat file
1. PERSON               Screening
2. SPORTSPERSON         Sportsperson
3. HUNTING_ACTIVITY     Sportsperson
4. TRIP_EXPENDITURES    Sportsperson

Four entities ≈ half of the 2006 FHWAR flat file data

PERSON Entity

• 6 control variables (e.g., Person_Weight)

• 10 demographic variables (e.g., Age, Sex)

• 8 hunting variables (e.g., Hunted_2005)

• 8 fishing variables (e.g., Fished_2005)

• 6 residential wildlife watching variables (e.g., Home_Observe_2005)

• 5 non-residential wildlife watching variables (e.g., Trip_Watch_2005)

SPORTSPERSON Entity

• 6 control variables (e.g., Person_ID, Sportsperson_Weight)

• 11 demographic variables (e.g., Age, Sex)

• 15 national summary variables (e.g., Hunted_2006)

PERSON variables in SPORTSPERSON could be “obtained” from PERSON but are also included in SPORTSPERSON to simplify analyses
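To show what “obtaining” PERSON variables would involve, the sketch below joins two toy tables on the shared person ID. The column names follow the variable names on the slides, but the table layout and all data values are invented for illustration.

```python
import sqlite3


def demographics_for_sportspersons():
    """Join toy SPORTSPERSON rows to their PERSON demographics by person ID."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE person (
            person_id INTEGER PRIMARY KEY, age INTEGER, sex INTEGER
        );
        CREATE TABLE sportsperson (
            person_id INTEGER PRIMARY KEY,
            sportsperson_weight REAL,
            hunted_2006 INTEGER
        );
        -- Invented values: three screened persons, two of them sportspersons
        INSERT INTO person VALUES (1, 34, 1), (2, 51, 2), (3, 27, 1);
        INSERT INTO sportsperson VALUES (1, 1200.5, 1), (3, 980.0, 0);
    """)
    rows = conn.execute("""
        SELECT s.person_id, p.age, p.sex, s.hunted_2006
        FROM sportsperson AS s
        JOIN person AS p ON p.person_id = s.person_id
        ORDER BY s.person_id
    """).fetchall()
    conn.close()
    return rows
```

Storing the demographic variables directly in SPORTSPERSON, as the slide describes, trades some duplication for not having to run this join before every analysis.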

TRIP_EXPENDITURES Entity

TRIP_EXPENDITURES entity reduces 844 compressed vars to 10 vars

Variable                    Description
Person_ID                   Unique person ID
Sportsperson_Weight         Sportsperson weight
Spender                     Person in TRIP_EXPENDITURES entity
State_of_Residence          State of residence
Location_Trip_Spending      State of spending
Spend_State_of_Residence    Expenditure in state of residence
Fish_Hunt                   Fishing or hunting expenditure
Fish_Hunt_Type              Fishing or hunting type
Trip_Expend_Category        Sportsperson expenditure categories
Dollars                     Amount spent in dollars

HUNTING_ACTIVITY Entity

Variable                  Description
Person_ID                 Unique person ID
Sportsperson_Weight       Sportsperson weight
Hunter                    Person in HUNTING_ACTIVITY entity
Table_Cell_Description    Description of tables (e.g., relation to FH3 variables)
Sub_Table_ID              Sportsperson sub-table identifier
State_of_Residence        State of residence (start Wave 3)
In_State_of_Residence     Participation in state of residence
Activity_Location         Geographic location for activity (USA or a State)
Private_Public            Activity on private or any public land
Fish_Hunt_Type            Fishing or hunting type
Response_Unit             Participation = 1, Days = 2, Trips = 3
Response                  Participation (1 = Yes), # of days, or # of trips

HUNTING_ACTIVITY entity reduces 840 compressed vars to 12 vars

Variable: Sub_Table_ID

HUNTING_ACTIVITY entity

A collection of state-level sub-tables to facilitate analysis

Visualizing the 4 Entities

Person (Screening data)
– Control variables: Person ID, Person Weight
– Demographics (11)
– Hunting (8): Hunted Ever Hunt Intentions
– Fishing (8)
– Wildlife Watching: Residential (6), Trips (5)

Sportsperson (Sportsperson data)
– Control variables: Person ID, Sportsperson Weight
– Demographics (11)
– National summary “species” variables (15): Hunted 2006, Big game hunted, Days big game hunted, Trips hunting big game

Hunting Activity (Sportsperson data)
– Person ID, Sportsperson Weight, Sub Table ID, Response Unit, Response

Trip Expenditure (Sportsperson data)
– Person ID, Sportsperson Weight, Spending categories, Dollars

Summary

• About 1,750 flat file variables reduced to < 60

• Obscure variable names replaced with intuitive names

• Compressed flat file variables cannot be directly used in SPSS or SAS; variables in relational entities can be used in analysis

• Details in Beaman & Vaske (2008)

Entity Data Files

Entity               SAS filename                  SPSS filename
PERSON               Person.sas7bdat               Person.sav
SPORTSPERSON         Sportsperson.sas7bdat         Sportsperson.sav
HUNTING_ACTIVITY     Hunting_Activity.sas7bdat     Hunting_Activity.sav
TRIP_EXPENDITURES    Trip_Expenditures.sas7bdat    Trip_Expenditures.sav

(http://welcome.warnercnr.colostate.edu/~jerryv/)

To simplify analyses, 2 additional entities:
– Hunting_Activity_and_Demographics
– Trip_Expenditures_and_Demographics

SAS Code & SPSS Syntax

Figure number   SAS code        SPSS syntax
Figure 3                        Figure_3_Syntax.sps
Figure 4        Figure_4.sas    Figure_4_Syntax.sps
Figure 5                        Figure_5_Syntax.sps
Figure 6        Figure_6.sas
Figure 7                        Figure_7_Syntax.sps

Figure numbers based on Beaman & Vaske (2008)

http://welcome.warnercnr.colostate.edu/~jerryv/

Example – Hypothesis

Average days of elk hunting varies between Colorado and Wyoming and by hunter’s sex

Flat File to Entity for Hypothesis

    Data FHWAR6.Hunt_BGspecies_States ;
      Length Person_ID 5 Sportsperson_Weight 4 Sex 3
             State_of_Residence Activity_Location Fish_Hunt_Type
             Response_Unit Response 4 ;
      Set FHWAR6.fh3 (rename = (sex = xsex)) ;
      Keep Person_ID Sportsperson_Weight Sex State_of_Residence In_State
           Response Activity_Location Fish_Hunt_Type Response_Unit ;

      Person_ID = PersonID ;
      Sportsperson_Weight = spwgt ;
      Sex = xsex ;
      State_of_Residence = put (resstate, $st2num2.) ;

      * Array stores info to identify state when decompressing ;
      Array a1( 2, 8 ) HUNTSTD1-HUNTSTD8 STDAYSHD1-STDAYSHD8 ;

      * Array stores info to associate species with variables ;
      Array gam1( 9 ) g1-g9 ;
      Retain g1 1 g2 2 g3 3 g4 4 g5 5 g6 6 g7 7 g8 40 g9 41 ;
      Array a7( 2, 9, 8 ) bgame1d1--bgdifday9d8 ;

      * m = response type, j = species, k = state slot ;
      Do m = 1 To 2 ;
        Do j = 1 To 9 ;
          Do k = 1 To 8 ;
            * Skip state slots that are empty ;
            If a1( 1, k ) = ' ' Then Goto End7 ;
            Fish_Hunt_Type = gam1(j) ;
            If m = 1 Then Do ; Response_Unit = 1 ; End ;
            Else Do ; Response_Unit = 2 ; End ;
            Response = a7(m, j, k) ;
            Activity_Location = put(a1( 1, k ), $st2num2. ) ;
            If Activity_Location = State_of_Residence Then In_State = 1 ;
            Else In_State = 0 ;
            * Outputs data for hypothesis ;
            If Response > 0 Then Output ;
          End7: End ;
        End ;
      End ;
    run ;

SAS Entity to SPSS Entity

Get SAS Data = 'C:\Hunt_BGspecies_States.sas7bdat'.

Add Value labels

Save Outfile = 'C:\Hunt_BGspecies_States.sav'.

Testing Hypothesis with Relational Entity

Using the entity built for the hypothesis:

    * Opens data.
    GET File = 'C:\Hunt_BGspecies_States.sav'.
    * Weights data.
    WEIGHT BY Sportsperson_Weight.
    * CO hunters (8) and WY hunters (56).
    Select if (Activity_Location = 8 or Activity_Location = 56).
    * Elk hunters.
    Select if (Fish_Hunt_Type = 2).
    * Days of participation.
    Select if (Response_Unit = 2).
    * ANOVA.
    UNIANOVA Response BY Sex Activity_Location.

Using the HUNTING_ACTIVITY entity file:

    GET File = 'C:\FHWAR\Hunting_Activity.sav'.
    Select if (Sub_Table_ID = 10).
    WEIGHT BY Sportsperson_Weight.
    Select if (Activity_Location = 8 or Activity_Location = 56).
    Select if (Fish_Hunt_Type = 2).
    Select if (Response_Unit = 2).
    UNIANOVA Response BY Sex Activity_Location.
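For readers without SPSS, the selection and weighting steps can be approximated in a few lines of Python. The sketch below only computes the weighted mean days in each state-by-sex cell; it does not perform the ANOVA itself. The row keys reuse the entity's variable names, while the function name and the sample rows are invented for illustration.

```python
from collections import defaultdict


def weighted_cell_means(rows, states=(8, 56), hunt_type=2, response_unit=2):
    """Weighted mean of Response for each (Activity_Location, Sex) cell.

    Applies the same three selections as the syntax above, then weights
    each response by Sportsperson_Weight.
    """
    totals = defaultdict(lambda: [0.0, 0.0])  # [weighted sum, sum of weights]
    for row in rows:
        if row["Activity_Location"] not in states:
            continue
        if row["Fish_Hunt_Type"] != hunt_type:
            continue
        if row["Response_Unit"] != response_unit:
            continue
        cell = totals[(row["Activity_Location"], row["Sex"])]
        cell[0] += row["Sportsperson_Weight"] * row["Response"]
        cell[1] += row["Sportsperson_Weight"]
    return {key: wsum / w for key, (wsum, w) in totals.items() if w > 0}


# Invented rows: two elk hunters in state 8, one in state 56, one elsewhere.
sample_rows = [
    {"Sex": 1, "Activity_Location": 8, "Fish_Hunt_Type": 2,
     "Response_Unit": 2, "Response": 4, "Sportsperson_Weight": 1.0},
    {"Sex": 1, "Activity_Location": 8, "Fish_Hunt_Type": 2,
     "Response_Unit": 2, "Response": 8, "Sportsperson_Weight": 3.0},
    {"Sex": 2, "Activity_Location": 56, "Fish_Hunt_Type": 2,
     "Response_Unit": 2, "Response": 5, "Sportsperson_Weight": 2.0},
    {"Sex": 1, "Activity_Location": 30, "Fish_Hunt_Type": 2,
     "Response_Unit": 2, "Response": 9, "Sportsperson_Weight": 1.0},
]
cell_means = weighted_cell_means(sample_rows)
```

Because every row already names its state, species, and response unit, the three selections are plain equality tests, which is the practical benefit of the relational layout.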

Results

Conclusions

• Analyses that are difficult to perform with flat file data are possible with relational structure

• Restructuring all of 2006 FHWAR data as well as data from 1991, 1996, & 2001 would:

– Yield similar analysis capabilities

– Allow for trend analysis

– Create new practical opportunities for state agencies

Practical Opportunity

• State agencies have accurate records of license sales (e.g., hunting only, fishing only, combos)

• With potentially 100s of licenses, permits, & stamps sold, it is not practical to ask about specific licenses in a flat file

• Moving to relational structure for obtaining license data has advantages …

Advantages of Relational License Data

1. Can ask about actual state license sales
– All state license info can be “pre-stored” in one entity
– Size of entity would not impact other data entities

2. Questions about specific license cost not necessary; correct information pre-stored

3. Establishing relationship between state-specific license sales & FHWAR data provides foundation for benchmarking / calibrating meaningful estimates based on FHWAR

From Analysis to Data Collection

• Entity-based models:
– facilitate analyses
– can also enhance data collection

• Currently working with software company Techneos (www.techenos.com) to implement pilot models that yield more consistent and accurate data collection

Questions?