1. IPS 2010 documentation on the ESDS website and the ... · Working with the IPS in SPSS 16:...


International Passenger Survey (IPS) workshop: 29 June 2012

This session will use the 2010 International Passenger Survey and covers:

1. IPS 2010 documentation on the ESDS website and the format of the IPS data

2. Working with the IPS in SPSS 16: frequencies, weighting data, filtering, recoding, cross-tabulations in SPSS, missing data in SPSS

3. Answer an example research question: What are the characteristics of UK residents arriving in the UK bringing in large numbers of cigarettes (more than 400)?


1. IPS documentation and data files

1.1 Information about the IPS on the ESDS website: www.esds.ac.uk

Go to the ESDS website (www.esds.ac.uk) and type “International Passenger” in the “Simple search for data” box to the right of the screen.

Scroll down the results page until you see the 2010 data, and click on the Study Description Documentation link.

You will see the following page for the IPS 2010:

Note that there is a link near the top right of this page for downloading the data (but don’t press this now as we have already downloaded the data for you to use in this workshop).

Scroll down this page. This webpage contains a lot of information about this survey and at the bottom there are a number of files under the title ‘Documentation:’.


Exercise 1:

Find the following information about IPS 2010 from this page or the documentation at the bottom of the page:

Which country is covered by this survey? (look at the section on ‘Coverage’ on the webpage)

..................................................................................................................................................................

What is the population covered by the IPS? (Hint: try control-F and then type “population”)

..................................................................................................................................................................

The IPS 2010 data come in 4 data files called airmiles, alcohol, qregtown and qcontact. Which datasets contain the variables:

Money spent on spirits? ..................................................................................

Towns stayed in overnight? ..................................................................................

Nationality? ..................................................................................

What is the ‘port’ code for Southampton? (Hint: look in the Excel file for UK port codes) ........................

What is the ‘port’ code for Gatwick? ..................................................................................

What is the weighting variable called and which dataset(s) does it appear in? (Hint: try searching the main page and, if you don’t find anything there, search the documentation at the bottom of the page for information about the weight(s), using control-F and ‘weight’.)

..................................................................................................................................................................

1.2 The structure of IPS files

When you download the data from ESDS, it comes in 4 files – airmiles, alcohol, qregtown and qcontact. The contents of these are shown in the IPS 2010 page on the ESDS website: (see the next page for a screenshot)


IPS data are deposited at the UK Data Archive every 3 months.

In some years (e.g. 2009), you will find the data in 16 files: there are 4 datasets for each quarter (airmiles, alcohol, qregtown and qcontact for quarter 1, and so on for each quarter) so if you download the IPS 2009 from ESDS, you will see 16 datasets.

In other years, you will see only 4 files: in 2010 the data were reissued as an annual dataset – so each of the four files (airmiles, alcohol, qregtown and qcontact) contains the variables shown above for the whole of 2010.


2. Working with the data in SPSS

This section covers:

• Viewing data and outputs in SPSS 16

• Creating one-way frequencies

• Weighting the data

• Doing a cross-tabulation

• Filtering the data

2.1 The data in SPSS 16

Open SPSS version 16 (Start, All Programs, Site Licenced Applications, Statistics, SPSS16 Network Version). Then open the data (using the standard windows open button, or File>Open): the file we will use is called qcont2010cust.sav.

There are two ways to view the data: in the Data View or the Variable View tabs - the tabs to switch between the two are at the bottom left of the screen.

Ensure the Variable View tab (bottom left of screen) is clicked. In Variable View:

• Each row represents something that varies between respondents (known as a variable) and each column provides information about the variable including the name, label and coding information in the Values column.

Click on the Data View tab (bottom left of screen). In Data View:

• Each column represents a variable in the survey. This is often a response to a question or derived from answers to a question or several questions.

• Each row represents an individual respondent
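The same file can also be opened, and its variables listed, from a Syntax window (File> New> Syntax). This is only a minimal sketch: it assumes the file is in the current working directory, which this workbook does not state, so replace the file name with the full path if SPSS cannot find it.

```spss
* Open the workshop data file (location assumed, adjust the path if needed).
GET FILE='qcont2010cust.sav'.

* List variable names, labels and value labels, as shown in Variable View.
DISPLAY DICTIONARY.
```

To run it, highlight the commands in the Syntax window and use Run> Selection.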


In both views, you can use the drop-down menus at the top of the page (i.e. File, Edit, View, Data, Analyze, Graphs etc.) to manipulate the data and to do analyses. The menus used in this section are:

• Analyze: Descriptive Statistics: Frequencies…

• Data: Weight cases…

• Analyze: Descriptive Statistics: Crosstabs…

• Data: Select cases…

When you conduct an analysis, you see the results in the Output window. You can move between the Output window and the data in Data View or Variable View windows, by clicking on the tab in the Taskbar (the bar that is always at the bottom of the screen).

2.2 To create a one-way frequency table of a variable

From the menu bar at the top of the page, use:

Analyze> Descriptive Statistics> Frequencies

We will look at the main purpose for the visit: Purpose.

Use the Analyze drop-down menu to open the Frequencies dialogue box. When you get to the Frequencies screen: Select the variable for the main purpose for the visit: Purpose from the list and move it into the right hand box. See below for how this should look. When you have finished, click on ‘OK’.


Another Output window will now open. You should see the frequency table shown below displayed in this window:

Main purpose for visit code

                                                                                  Valid  Cumulative
                                                            Frequency  Percent  Percent     Percent
Valid    Holiday/pleasure                                       56922     17.8     47.7        47.7
         Visit family (priority)                                23208      7.3     19.4        67.1
         Visit friends                                           4793      1.5      4.0        71.1
         Getting married                                          137       .0       .1        71.3
         Play amateur sport                                      1302       .4      1.1        72.4
         Watch sport                                             1021       .3       .9        73.2
         Personal shopping                                        876       .3       .7        73.9
         Cruise 0-2 nights ashore - UK                            406       .1       .3        74.3
         Cruise 0-2 nights ashore - For                           527       .2       .4        74.7
         Business; Work                                         17362      5.4     14.5        89.3
         Visit trade fair                                         452       .1       .4        89.7
         Conference 20+ people                                   1991       .6      1.7        91.3
         Definite job to go to                                    372       .1       .3        91.6
         International commuter                                   110       .0       .1        91.7
         Looking for work                                         136       .0       .1        91.8
         Au Pair                                                   17       .0       .0        91.9
         Formal course (check residence and definition)           985       .3       .8        92.7
         Medical treatment                                        219       .1       .2        92.9
         Accompany / join                                         293       .1       .2        93.1
         OTHER                                                   1489       .5      1.2        94.4
         Overnight transit                                       1860       .6      1.6        95.9
         Same day transit                                        4325      1.4      3.6        99.5
         Military or embassy (serving on duty)                    331       .1       .3        99.8
         Merchant navy (joining or leaving ship)                   99       .0       .1        99.9
         Airline crew (positioning)                               107       .0       .1       100.0
         Unacc schoolchild (16 or under, school to parents)        15       .0       .0       100.0
         Coding query                                               4       .0       .0       100.0
         Total                                                 119359     37.3    100.0
Missing  System                                                200259     62.7
Total                                                          319618    100.0

These frequencies are the number of responses of each type given in the 2010 IPS. So, 56,922 of those interviewed in the IPS 2010 were travelling to or from the UK on holiday and 136 were travelling to look for work.
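The menu steps in this section correspond to a single syntax command. As a sketch, assuming the variable is named Purpose exactly as it appears in the variable list, the following typed into a Syntax window should produce the same table:

```spss
* One-way frequency table of the main purpose for the visit.
FREQUENCIES VARIABLES=Purpose
  /ORDER=ANALYSIS.
```

In the output, Percent is calculated over all cases including the missing ones, while Valid Percent is calculated over the non-missing cases only, which is why the two columns differ.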


2.3 To weight the data

Use: Data> Weight Cases…

Using the drop-down menus, go to the Weight Cases screen and add the weight fweight, then click ‘OK’.

Redo the frequency table for Purpose and you will see that the numbers are much larger than before and the percentages have changed slightly.

The previous frequency table showed the numbers of visitors interviewed for the survey. Using the weight has given you estimates of numbers for all visits, not just those in the sample. The weighted results are the ones that you would report in your research.

Main purpose for visit code

                                                                                  Valid  Cumulative
                                                            Frequency  Percent  Percent     Percent
Valid    Holiday/pleasure                                    46861026     23.5     52.0        52.0
         Visit family (priority)                             16236606      8.1     18.0        70.1
         Visit friends                                        3434845      1.7      3.8        73.9
         Getting married                                        92487       .0       .1        74.0
         Play amateur sport                                    957439       .5      1.1        75.0
         Watch sport                                           722814       .4       .8        75.8
         Personal shopping                                     747756       .4       .8        76.7
         Cruise 0-2 nights ashore - UK                         494768       .2       .5        77.2
         Cruise 0-2 nights ashore - For                        320055       .2       .4        77.6
         Business; Work                                      12487633      6.3     13.9        91.4
         Visit trade fair                                      240486       .1       .3        91.7
         Conference 20+ people                                1008784       .5      1.1        92.8
         Definite job to go to                                 255270       .1       .3        93.1
         International commuter                                 83654       .0       .1        93.2
         Looking for work                                      110163       .1       .1        93.3
         Au Pair                                                13690       .0       .0        93.3
         Formal course (check residence and definition)        600970       .3       .7        94.0
         Medical treatment                                     136407       .1       .2        94.2
         Accompany / join                                      208369       .1       .2        94.4
         OTHER                                                1259870       .6      1.4        95.8
         Overnight transit                                     975668       .5      1.1        96.9
         Same day transit                                     2478158      1.2      2.8        99.6
         Military or embassy (serving on duty)                 202953       .1       .2        99.8
         Merchant navy (joining or leaving ship)                57727       .0       .1        99.9
         Airline crew (positioning)                             72573       .0       .1       100.0
         Unacc schoolchild (16 or under, school to parents)      8220       .0       .0       100.0
         Coding query                                             837       .0       .0       100.0
         Total                                               90069225     45.1    100.0
Missing  System                                             109660767     54.9
Total                                                       199729992    100.0
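In syntax, weighting and rerunning the table might look like the sketch below. It assumes the weight variable is named fweight, as given in this section.

```spss
* Switch on the survey weight, then rerun the frequency table.
WEIGHT BY fweight.
FREQUENCIES VARIABLES=Purpose
  /ORDER=ANALYSIS.
```

The weight stays on for every later analysis until it is changed. Running WEIGHT OFF. returns to the unweighted sample counts.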

2.4 To create a cross-tabulation

Use: Analyze> Descriptive Statistics> Crosstabs…

We will look at purpose of visit by gender to see whether, for example, equal proportions of men and women travel mainly for business. To create the cross-tabulation using the drop-down menus:

• Click on: Analyze, Descriptive Statistics, Crosstabs…

• Select Purpose (Main purpose of visit) as your row variable and Sex (the gender of the respondent) as your column variable (see below for how the Crosstabs box should look)

• Click on the Cells… button and select row percentages (you can leave observed ticked too, to keep frequencies)

• Click Continue to return to the Crosstabs dialogue box and press OK to run


You should see the following cross-tab in the output window:

We can interpret the cells on the first row to mean that 52.1% of people who travelled mainly for holiday or pleasure were men and 47.9% were women, while among those who travelled mainly to watch sport (on the 6th row), 83.8% were men and 16.2% were women.
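A syntax sketch of the same cross-tabulation follows, assuming the variable names Purpose and Sex used above. COUNT keeps the observed frequencies and ROW adds the row percentages.

```spss
* Purpose of visit (rows) by sex (columns), with counts and row percentages.
CROSSTABS
  /TABLES=Purpose BY Sex
  /FORMAT=AVALUE TABLES
  /CELLS=COUNT ROW
  /COUNT ROUND CELL.
```

Row percentages answer "of those travelling for this purpose, what share were men or women?". Replacing ROW with COLUMN would instead give the share of men, and of women, travelling for each purpose.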

2.5 Filtering: Selecting parts of the data

Use: Data> Select Cases…

If we are interested in a research question that refers to only a part of the population, we can filter out all the parts of the data that we are not interested in. This means that our analyses use responses by some respondents and not others. An example might be that we only want to look at older adults, so we select only people aged 65 and over and filter out everyone else. Filtering does not delete data from the data set; it just excludes those cases from analyses while the filter is on.

!!! Don’t forget to remove the filter when you want to use all the data again !!!

To select only people who are aged 65 and over, we use the drop-down menu:

• Click on: Data, Select Cases… to see the following box:


• Select If condition is satisfied and click on the button If…

• To select people aged 65 and over, you must select Age=8 (because the Age variable has been grouped so that 1=0-15 years, …, 8=65 years and over). So click on Age and use the arrow to move it to the box at the top, then type in =8 (see above for how this looks), and press Continue. This selects only those cases for which the respondent was aged 65 or over.

• Press OK

You can see that the filtering has worked by looking at the data in Data View (see below). The rows with lines through the numbers will not be included in analyses.

You should also do a frequency table of Age to check that the filtering is doing what you think it is doing.

Now try looking again at the purpose for visit (Purpose) and see how the frequencies for older people are different from the general population.

!!! Now remove the filter via Data> Select Cases… and choosing All cases. !!!
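The filtering steps in this section can be sketched in syntax as follows. The name filter_$ is the temporary variable SPSS itself creates when Select Cases is used. The sketch assumes the grouped age variable is named Age with code 8 for 65 years and over, as described above.

```spss
* Keep only respondents in age group 8 (65 years and over).
USE ALL.
COMPUTE filter_$=(Age=8).
VARIABLE LABELS filter_$ 'Age=8 (FILTER)'.
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMATS filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE.

* Check the filter, then look at purpose of visit for this group.
FREQUENCIES VARIABLES=Age Purpose
  /ORDER=ANALYSIS.

* Remove the filter so that later analyses use all cases again.
FILTER OFF.
USE ALL.
EXECUTE.
```

While the filter is on, the frequency table of Age should show only category 8, which is a quick check that the selection did what was intended.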


3. Research question: Among UK residents, what are the characteristics of people bringing over 400 cigarettes into the UK?

This exercise uses the 2010 dataset: alcohol2010.sav. Open alcohol2010.sav (you can close qcont2010cust.sav if you used it in the last section of this workbook).

a) Documentation: Use the documentation on the ESDS website to find out which questions were asked on the IPS about tobacco/cigarettes brought into the UK. (See Section 1.1 of this workbook for instructions about how to find the IPS 2010 webpage. Then look at the questionnaire to see which questions were asked.)

b) Select only UK residents arriving in the UK

Select only UK residents arriving in the UK (select if Flow=4|Flow=8). Check that the filtering has worked as you expected by looking at the data in Data View or, better, by looking at a frequency table of Flow.

c) Number of cigarettes brought into UK by UK residents

Run a frequency table for ‘Number of cigarettes’ (nocig) to see how many cigarettes were brought in by UK resident respondents to the survey in 2010.

What does the frequency table show?

• How many respondents brought no cigarettes? And 200 cigarettes? And 16,000?

• Did anyone bring in 99999 cigarettes?

• Is there a lot of missing data in the variable nocig? (What do you think the difference is between the ‘system missing’ and those with the code 99999?)

d) Weight the data

Add a weight to the data to make the numbers representative of the numbers of UK resident passengers entering the UK by air or sea in 2010. The weight is called fweight. (See Section 2.3 for how to weight data in SPSS.) Run the frequency of nocig again and notice how the numbers and percentages have changed.

e) Create a variable for number of cigarettes grouped: no cigarettes, 400 cigarettes or less and over 400 cigarettes

We must create a recoded ‘number of cigarettes’ variable (e.g. “nocig2”) such that 0 = no cigarettes (but not missing), 1 = brings in 400 cigarettes or fewer, 2 = brings in more than 400 cigarettes. We need to make sure we exclude those who have a missing-value code on this variable (=99999) but who are not already system missing. In the Transform menu, use Recode into Different Variables… (Never use Recode into Same Variables, as this overwrites the variable in its original form, so you can’t see what was in it or check that the recoding has worked.)

Now do a cross-tab of the new variable with the old to check that the recoding has worked as you expected.

Look at the data using the Variable View. Your new variable will be at the bottom of the list of variables. Add labels for your new variable: 0=‘no cigarettes’, 1=‘400 cigarettes or less’, 2=‘more than 400 cigarettes’.
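The labels can also be attached in syntax rather than typed into the Values column. This sketch assumes the recoded variable was named nocig2 and uses the 0/1/2 coding described above.

```spss
* Attach value labels to the recoded cigarettes variable.
VALUE LABELS nocig2
  0 'no cigarettes'
  1 '400 cigarettes or less'
  2 'more than 400 cigarettes'.
```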


f) Among UK residents, what proportion of men vs women brought over 400 cigarettes into the UK in 2010? What about the proportions brought in by scheduled vs charter flights? What about private flights?

Do a cross-tab of the new ‘Number of cigarettes’ variable by sex and of the new ‘Number of cigarettes’ variable by flightyp (Type of flight).

Syntax for Section 3:

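* b) Select only UK residents arriving in the UK.
USE ALL.
COMPUTE filter_$=(Flow=4|Flow=8).
VARIABLE LABEL filter_$ 'Flow=4|Flow=8 (FILTER)'.
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMAT filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE.

* c) Unweighted frequencies of number of cigarettes.
FREQUENCIES VARIABLES=nocig /ORDER=ANALYSIS.

* d) Weight the data and rerun the frequencies.
WEIGHT BY fweight.
FREQUENCIES VARIABLES=nocig /ORDER=ANALYSIS.

* e) Recode into 0 = none, 1 = 1 to 400, 2 = more than 400, with 99999 set to missing.
RECODE nocig (0=0) (99999=SYSMIS) (SYSMIS=SYSMIS) (1 thru 400=1) (401 thru Highest=2) INTO nocig2.
VARIABLE LABELS nocig2 'Number of cigarettes (recoded)'.
EXECUTE.

* Check the recoding against the original variable.
CROSSTABS /TABLES=nocig BY nocig2 /FORMAT=AVALUE TABLES /CELLS=COUNT /COUNT ROUND CELL.

* f) Recoded number of cigarettes by sex and by type of flight, with column percentages.
CROSSTABS /TABLES=nocig2 BY Sex /FORMAT=AVALUE TABLES /CELLS=COUNT COLUMN /COUNT ROUND CELL.
CROSSTABS /TABLES=nocig2 BY flightyp /FORMAT=AVALUE TABLES /CELLS=COUNT COLUMN /COUNT ROUND CELL.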