
Telephone Collection as Part of a Multimode Survey

Mark Pierzchala and Debra Wright, Mathematica Policy Research, Inc. Claire Wilson, Insight Policy Research

Paul Guerino, Education Statistics Services Institute

Abstract

We review the complementary attributes of telephone collection vis-à-vis other modes. We discuss issues involved in fielding a multi-mode telephone survey including survey design, instrumentation, survey operations and management, data quality and comparability, and survey costs. Experiences from three MPR surveys, with CATI/CAPI, Paper/Web/CATI, and Web/CATI modes, illustrate essential points.

Introduction

The viability of phone-only surveys has been called into question due to declining response rates and increased costs. Challenges include difficulty in finding and contacting respondents, an increased number of phone attempts, reduced land-line coverage as cell phone use grows, perceived barriers such as the Federal Trade Commission’s “Do Not Call Registry”, sample frame limitations, and expectations of being paid for participation. Additionally, as response to telephone surveys continues to fall, making it increasingly expensive to contact and complete interviews, many surveys simply do not have the budget to pursue potential respondents using a telephone-only methodology. Emerging technologies such as the Web and email, and long-standing approaches such as mail and field efforts, offer opportunities to overcome these limitations.

Phone Surveys Transition to Multimode Surveys

The use of other data collection modes such as mail, Web, and in-person interviewing, when used as a supplement to telephone interviewing, can compensate for the potential shortfalls of CATI and help survey organizations control costs. Mixed-mode surveys offer multiple channels for reaching sample persons who may be difficult to contact or reluctant to participate by telephone. In addition, multi-mode studies may help increase response rates, or at least slow declines in response rates. While the greatest improvement in response rates has been observed when modes are offered sequentially (Shettle & Mooney, 1999; Dillman, et al., 2001; Goho, 2002), one recent study showed an improved response rate using a mixed-mode approach of mail and Web compared to mail or Web only (Sax, Gilmartin & Bryant, 2003). Research indicating that individuals have mode preferences (Groves & Kahn, 1979) also suggests that offering respondents a choice of modes could yield a higher response rate than using only one mode.

No one mode is likely to replace the phone-only survey. Web-only surveys may have lower response rates and can introduce biases associated with respondent self-selection. Field-only surveys are often prohibitively expensive. Paper-only surveys can take several contacts to achieve an acceptable response rate and can require extensive data cleaning after collection. Given these sole-mode limitations, we see a continued transition to multimode surveys where the phone has a major role.

Summary of Three Multimode Surveys at MPR

This paper is based on experiences with three multimode surveys conducted by Mathematica Policy Research, Inc. In the 2004 National Beneficiary Survey (NBS) conducted for the Social Security Administration (SSA), the primary form of data collection was CATI with CAPI follow-up. The 2003 National Survey of Recent College Graduates (NSRCG), conducted for the National Science Foundation (NSF), was conducted as a mail survey with CATI and Web options. Finally, in the Kauffman Firm Survey (KFS), sponsored by the Ewing Marion Kauffman Foundation, MPR surveyed principals of newly formed businesses, trying to contact them first by Web then following up with CATI. The phone mode is common to all three surveys but is used differently in each. Because we generally stage the cheapest mode(s) first, CATI is the first mode in NBS and the follow-up mode in NSRCG and KFS. In each of these surveys the phone mode collected the most cases, though large numbers of responses were achieved with the other modes. The phone mode is a personal, persistent, and relatively inexpensive mode through which interviewers persuade sample members to cooperate. A basic summary of the key characteristics of the three surveys follows.

Table 1: Basic Contrast and Comparison of Three Multimode Surveys at MPR

| | 2004 NBS | 2003 NSRCG | KFS |
|---|---|---|---|
| Sponsor | Social Security Administration | National Science Foundation | Ewing Marion Kauffman Foundation |
| Paper | | First | |
| Web | | Second | First |
| CATI | First | First for call-in, Second for call-out | Second |
| CAPI | Second | | |
| Percent completes by mode | CATI: 80%, CAPI: 20% | Paper: 30.7%, Web: 21.5%, CATI: 47.7% | Web: 20.5%, CATI: 79.5% |
| Kinds of data | Factual / opinion | Factual / opinion (few) | Factual / economic |
| Data use | Descriptive | Descriptive | Modeling |
| Survey type | Repeated, cross sectional / longitudinal | Repeated, cross sectional | Longitudinal |
| Population (all in USA) | SSA Disability Beneficiaries | Recent Bachelors or Masters degree recipients | Businesses formed in 2004 |
| Frame | List from SSA | List developed from university data | List from Dun and Bradstreet™ |
| Avg. Length (mins) | 45 | 30-35 | 35-40 |
| History | New survey | Several previous surveys | New survey |

The three surveys are diverse and give a good range of contrasts along several dimensions. They allow us to compare the conduct of these three surveys against mixed mode methodologies as discussed in de Leeuw (2005), Dillman (2000), and others. This gives MPR a way to assess its methods, and a chance to comment on the utility of the latest methods through a review of these very practical applications.

Survey Design

Before deciding on a multi-mode over a telephone-only design, several factors should be considered. What is the sample for the survey? What is the subject matter of the survey? How complex is the survey instrument? What is the survey budget? If a multi-mode method is appropriate, the researcher must evaluate which data collection modes offer the best complement to the telephone mode and how the modes should be staged. Multiple modes can be offered simultaneously, providing sample persons a choice of response modes, or they can be staged sequentially with telephone as either a starting mode or as a follow-up mode to the other data collection modes.

Considerations for Choosing Modes

In terms of survey population, the researcher should consider the coverage that each mode provides. While mail and in-person interviewing will provide nearly complete coverage when combined with telephone for most populations, coverage still remains an issue for Web surveys (Couper, 2000) of general populations. Although Web coverage is expected to continue to increase, at this time, a supplemental Web mode is a better option for populations who are likely to have access to the Internet, such as business or student samples. Additionally, self-administered modes may not be ideal for populations or surveys that are likely to require guidance or assistance from interviewers. In such situations, alternate modes used in conjunction with telephone should also be interviewer administered. An advantage to in-person administration as a supplement to telephone is that field interviewers can find and secure the participation of hard-to-reach populations such as older, low-income, or minority populations that can be under-represented in telephone and mail surveys.

The subject matter of the survey may make some modes more desirable than others. Evidence suggests that items collecting sensitive information are the most subject to mode effects; therefore, researchers should carefully consider whether the survey contains series of items that could elicit socially desirable responses. Since self-administered modes may produce different estimates of sensitive or illicit behaviors than interviewer-administered modes (Aquilino, 1994; Tourangeau & Smith, 1996), a mix of telephone and mail or Web may cause data comparability problems. Additionally, there is evidence that face-to-face interviews can yield different estimates of sensitive behavior than telephone interviews (Aquilino, 1994; Holbrook, Green, & Krosnick, 2003). This suggests that multi-mode approaches may not be ideal for surveys composed largely of sensitive items. However, for surveys that may have a subset of sensitive items, telephone and face-to-face or telephone and self-administered modes can be made more comparable by the use of a self-administered mode for such series (e.g., handing out or mailing a questionnaire which covers these items or turning the laptop over to the respondent in a CAPI survey). Even between mail and Web surveys there may be response effects for sensitive items. Research has shown that Web surveys may lead to fewer socially desirable responses than mail, particularly among adolescents (McCabe, et al., 2002; Link & Mokdad, 2004; Wright, et al., 1998).

Instrument complexity should also influence mode selection. A strength of a dynamic mode is that it can handle complicated skip patterns. If the survey instrument is particularly complex, computer-assisted modes such as CAPI and Web are better suited to supplement CATI than paper. CAPI may be a particularly good option as a supplemental mode when the sample population requires greater assistance or persuasion.

Finally, costs will impact the selection of modes. Since self-administered surveys are generally less expensive than CATI or CAPI surveys, incorporating a self-administered option may offer a way to reduce data collection costs. See the section on costs for a suggested model for weighing the costs of multi-mode surveys.

Staging of Modes

Modes can be staged and combined in a way that takes advantage of their relative strengths while compensating for other weaknesses. A general formula for sequentially combining data collection modes is to start with the least costly and progress to the more expensive mode(s).

A sequential multi-mode approach was used to complete two rounds of the NBS. Due to instrument length and complex skip patterns and fills, a mail survey version was not a feasible option for this study. Additionally, since this population was more likely to require the assistance and encouragement of an interviewer to complete the survey, a self-administered interview was less than ideal. To control costs, MPR attempted to interview by telephone all sample persons for whom a valid telephone number was available. CAPI interviews were conducted with people who could not be interviewed by phone because they could not be located, had a disability that prevented them from responding by phone, or refused to participate by phone. When field interviewers were able to locate sample members who had not been reachable by phone, they urged the sample member to call the survey’s toll-free number to complete the interview via telephone.

The KFS employs a sequential Web/CATI data collection methodology. In this study, the Web mode is first offered (without mention of phone) to take advantage of the low cost of the Web survey and the fact that most businesses have Internet access. Businesses that did not respond over the Web were contacted by telephone and encouraged to complete the survey over the telephone. The phone mode also promotes Web completions through voice-mail messages and direct reminders from interviewers to sample members.

Multiple modes can be offered simultaneously, as a way to allow sample persons a choice of ways to respond. For the NSRCG, a follow-up mailing to non-respondents included an invitation to complete the survey on-line. Sample persons were also given the option of calling a toll-free number to complete the survey by telephone. CATI follow-up of nonrespondents was used to supplement Web and mail response and to retrieve data from items skipped in the mail or Web survey.

Instrumentation

For many mixed-mode surveys there is little, if any, difference between modes in either question wording or format (for example between two computer-assisted modes such as CATI and CAPI). For surveys including more disparate modes, there is often tension between designing questions that are cognitively similar across modes, and designing questions that take advantage of the capabilities of each mode. As Dillman and Christian (2003) suggest, the features of each mode often shape question wording and layout and affect design considerations. CATI, CAPI, Paper, and Web surveys differ in several fundamental ways. Many of these design differences are the result of differences in presentation, manner of responding, segmentation of the questionnaire, the dynamic versus passive nature of the questionnaire, administration (self versus interviewer), pace of interview, the medium of the interview, and the training of the person recording the responses. Table 2 summarizes these factors using de Leeuw’s (2005) categorization of media-related and transmission-related factors. Additionally, interviewer effects can interact with the other factors.

Table 2: Attributes of Modes of Surveys That Affect Cognition and Response

| Factor Group | Modal Aspect | CATI | Paper | CAPI | Web |
|---|---|---|---|---|---|
| Transmission Factors | Presentation * | Aural | Visual | Aural / Visual | Visual |
| Transmission Factors | Transmission | Spoken | Written | Spoken / Typed | Typed |
| Transmission Factors | Segmentation | Segmented | None | Segmented | Varies |
| Transmission Factors | Dynamic / Passive * | Dynamic | Passive | Dynamic | Dynamic |
| Transmission Factors | Administration * | Interviewer | Self | Interviewer / Self | Self |
| Transmission Factors | Pace | Respondent / Interviewer | Respondent | Respondent / Interviewer | Respondent / Computer |
| Media Factors | Medium | Phone | Paper | In-person / Screen | Browser |

Disparate Modes

Because of their impact on instrument design, we call two modes of administration disparate when they differ along one or more of these three dimensions: 1) Aural vs. Visual, 2) Self-administered vs. Interviewer-administered, and 3) Dynamic vs. Passive. For example, paper and Web modes are disparate because the first is passive and the second is dynamic. Web and CATI modes are even more disparate because they differ along two dimensions (self- vs. interviewer-administered and aural vs. visual). CATI and paper would be the most disparate since these two differ along all three dimensions. In our experience, adhering to a single design becomes increasingly complex the more disparate the modes. For example, the Web and CATI modes differ in administration (interviewer vs. self) and in presentation (aural vs. visual), but are both dynamic (computer-assisted). It may be necessary to make wording changes to make the text appropriate for a self-administered survey; questions formatted for a CATI mode may not be optimally formatted for Web.
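The definition of disparate modes can be read as a simple count of differing dimensions. The sketch below is only an illustration of that reading; the `Mode` class, its field names, and the `disparity` function are hypothetical and are not part of any MPR system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mode:
    """A survey mode described by the three dimensions used in the text."""
    name: str
    presentation: str    # "aural" or "visual"
    administration: str  # "interviewer" or "self"
    routing: str         # "dynamic" or "passive"


# Attribute values follow the examples given in the paragraph above.
CATI = Mode("CATI", "aural", "interviewer", "dynamic")
PAPER = Mode("Paper", "visual", "self", "passive")
WEB = Mode("Web", "visual", "self", "dynamic")


def disparity(a: Mode, b: Mode) -> int:
    """Count the dimensions (0 to 3) on which two modes differ."""
    dimensions = ("presentation", "administration", "routing")
    return sum(getattr(a, dim) != getattr(b, dim) for dim in dimensions)


print(disparity(PAPER, WEB))   # 1: passive vs. dynamic
print(disparity(WEB, CATI))    # 2: administration and presentation
print(disparity(CATI, PAPER))  # 3: all three dimensions
```

A count of zero means the two modes are not disparate in the sense used here.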

Differences in mode attributes have historically led to particular question design standards within each mode. For example, in self-administered modes such as paper and Web, while respondents can leave items blank, Don’t Know (DK) and Refusal (RF) options are generally not displayed, although these options are available for CATI interviewers. In CATI, because questions are presented aurally, items with long response options are generally broken up into several items. On the other hand, for modes that are visually presented (paper and Web) long lists are possible (Dillman 2000). Because of these inherent differences, designing a multi-mode survey so that it operates optimally in each mode is likely to result in a number of differences in question design and layout.

Unimode and Generalized Mode Design Approaches

Dillman (2000) has suggested that multimode surveys should strive for a unimode design to minimize mode effects. This design approach uses the same question forms and wording across modes where possible even though this may be sub-optimal for one of the modes. For example, he suggests offering a don’t know response option on a mail or Web survey to make it compatible with CATI. As he states, such a design may require instrument developers to abandon standard conventions or practices regarding the best format for each mode. In other words, this approach makes modes consistent but does not take full advantage of each mode’s respective capabilities.

De Leeuw (2005) suggested that a unimode approach is particularly important when the survey will have one primary mode of data collection, with the other modes serving an auxiliary or complementary function. She suggests that when one mode is primary, the researcher should strive for a unimode approach such that the auxiliary modes conform to the specifications of the primary mode in order to minimize mode effects. In contrast, when all modes will be given equal weight (for example a CATI/Web survey, where contact with sample members was initiated both by phone and email), de Leeuw suggests a generalized mode design, where the goal is to achieve cognitive equivalence rather than literal uniformity of questions across modes. This design approach employs different question forms across modes if necessary in order to achieve this cognitive equivalence. This approach allows the researcher to capitalize on the unique capabilities of each mode, but presents another set of challenges. Specifically, researchers adopting a generalized mode design need to understand how modal differences affect the way that respondents process the question, and how they select an answer.

MPR Instrumentation Experience

The NBS and KFS studies followed a unimode approach while the NSRCG used a generalized mode design. The NBS is naturally a unimode design since CATI and CAPI modes are not disparate. For example, while CATI and CAPI interviews differ in their medium (phone vs. in-person), they are both interviewer administered and computer-assisted or dynamic in nature. Though Web and CATI modes are disparate, the KFS achieves a unimode design through some compromises in the design of the instrument from each mode’s design optimum. The NSRCG displays a great amount of disparity in its design. Almost 50% of the items have a different form for at least one of the modes and some items have three different forms (Pierzchala, Wright, Wilson, Guerino, 2004).

To illustrate a unimode design, we consider the KFS, a Web and CATI survey. It collects monetary values similarly across modes using a traditional CATI methodology. In this method, the respondent is first asked for a direct value. If the CATI respondent refuses or answers don’t know, the instrument branches to categories of monetary ranges. DK and RF are not offered in the Web mode, but an EMPTY value in the Web mode is treated like a DK or RF response. In the Web mode, the request for a direct value appears first on its own browser page. The monetary categories appear on a subsequent browser screen only if needed (Figure 1 below). Since both modes apply dynamic routing and since there is no limitation of paper space, the collection of money amounts can be the same on Web as in CATI. For both modes, respondents who supply direct values are never aware that alternative monetary categories are available. A passive paper mode (which does not exist for the KFS) would probably be constructed with only the categorical responses.

Figure 1: Collecting Monetary Amounts in KFS

If the respondent leaves the box above empty the following browser screen appears.
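The routing described for monetary amounts can be sketched as a small function shared by both modes. This is an illustrative sketch only, not the KFS instrument code: the range labels, the function name `collect_amount`, and the returned dictionary layout are assumptions, and the direct answer is assumed to be a numeric string.

```python
from typing import Dict, Optional

# Hypothetical range categories; the actual KFS categories are not given here.
RANGES = ["$500 or less", "$501 to $1,000", "$1,001 to $5,000", "More than $5,000"]

# In CATI the interviewer can record DK or RF; in Web an empty box is
# treated the same way.
NONRESPONSE = {"", "DK", "RF"}


def collect_amount(direct_answer: str,
                   range_answer: Optional[str] = None) -> Dict[str, object]:
    """Return the direct value if given, otherwise fall back to a range."""
    answer = direct_answer.strip().upper()
    if answer not in NONRESPONSE:
        # A direct value was supplied, so the range screen is never shown.
        value = float(answer.replace("$", "").replace(",", ""))
        return {"amount": value, "range": None}
    # No direct value: the follow-up range question is presented.
    if range_answer in RANGES:
        return {"amount": None, "range": range_answer}
    return {"amount": None, "range": None}


print(collect_amount("12,500"))              # direct value, no range needed
print(collect_amount("", "$501 to $1,000"))  # empty Web box, range chosen
print(collect_amount("rf"))                  # refusal with no range given
```

Because the same function serves both modes, a respondent who gives a direct value never sees the range categories in either one.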

Another example of unimode design in the KFS is the reconciliation of ownership percentages where there are two or more owners. The collection of ownership information, e.g., number and names of owners, amount invested, and percent ownership, occurs in several widely separated places in the instrument. In addition to ownership by the principals, there can be investor-owners such as friends, relatives, government agencies, and venture capitalists. An important goal of the instrument is to ensure that ownership percentages add to 100%. In CATI mode, it is easy to employ edits within the instrument to catch ownership problems as they occur. However, in the Web mode, given that navigation is more limited, the respondent-operator of the instrument is untrained, and the desire to not dissuade the respondent from continuing, it was not feasible to employ CATI-style consistency edits. Instead, if there is a reconciliation problem in the Web mode, two summary screens are presented after all ownership data are collected. The first announces the problem; the second provides a way for the respondent to reconcile the problem. The images in Figure 2 below show the implementation of Web methodology in CATI mode. The interviewer corrects the percentages, or adds or deletes listed owners in the reconciliation screen. The Web respondent has a browser version of the same screen.

Figure 2: Reconciling Ownership Allocations in KFS

The screen above announces the problem, the screen below allows reconciliation.

The interviewer can make corrections directly to the allocation as shown below.
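The two-screen reconciliation can be sketched as a check that runs once all ownership data are collected, followed by a correction step. The function names, owner labels, and rounding tolerance below are hypothetical illustrations, not the KFS implementation.

```python
from typing import Dict, Optional


def ownership_problem(percentages: Dict[str, float],
                      tolerance: float = 0.01) -> Optional[str]:
    """Return a message for the first screen, or None if shares add to 100%."""
    total = sum(percentages.values())
    if abs(total - 100.0) <= tolerance:
        return None
    return f"Ownership percentages add to {total:g}%, not 100%."


def reconcile(percentages: Dict[str, float],
              corrections: Dict[str, Optional[float]]) -> Dict[str, float]:
    """Apply second-screen corrections: a number sets a share, None deletes an owner."""
    updated = dict(percentages)
    for owner, share in corrections.items():
        if share is None:
            updated.pop(owner, None)
        else:
            updated[owner] = share
    return updated


owners = {"Owner A": 60.0, "Owner B": 30.0}
print(ownership_problem(owners))          # announces the problem
fixed = reconcile(owners, {"Owner B": 40.0})
print(ownership_problem(fixed))           # None: shares now add to 100%
```

Running the check only after all ownership items are collected mirrors the Web approach described above, where in-line consistency edits were judged infeasible.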

The NSRCG instrument, with three disparate modes, exemplifies a generalized mode design for paper, Web, and CATI. The paper instrument was developed first, separately, and it provided the starting point for developing the CATI mode. Separate CATI specifications were developed that would maintain comparability with previous rounds of the survey that were run as CATI-only surveys. The Web version of the instrument was a hybrid of both the paper and CATI modes. In this situation a unimode design was impossible to achieve because the survey’s history worked against this approach.

An example from the NSRCG where the generalized mode design was used is an item that asks the respondent to choose a description of the employer from a long list of response categories. In paper and Web it was rendered as a “Mark one answer” item, but it required significant adaptation for CATI because of the complexity of the response options, as shown in Figure 3 below. The original question wording and the response format did not translate well into an interviewer-administered mode.

Figure 3: Question B14 as Displayed in Paper, Web, and CATI Modes

[Paper and Web panels: images not reproduced]

CATI

B14_1. Was your principal employer during the week of October 1, 2003. . .
1. a private company or individual (go to B14_3)
2. a government agency at any level (go to B14_4)
3. were you self-employed (go to B14_2)
4. did you work for some other type of employer (go to B14_specify)
DK, RF

B14_2. Were you. . .
1. Self-employed in your own NOT INCORPORATED business, professional practice, or farm
2. Self-employed in your own INCORPORATED business, professional practice, or farm
DK, RF

B14_3. Was that . . .
1. A private for-profit organization or individual paying your wages, salary or commissions
2. A private not-for-profit, tax-exempt, or charitable organization
DK, RF

B14_4. Was that. . .
IF EMPLOYER WAS A SCHOOL: [State schools, colleges, universities are “state government” and schools run by local school districts are “local government”].
1. Local government, such as city or county government
2. State government
3. U.S. military service, active duty or commissioned corps, such as USPHS or NOAA
4. U.S. government as a civilian employee
DK, RF

Specification

To achieve a unimode design, specifying all modes at the same time is critical. The researcher must consider how to handle question wording and response formats in each mode, the use of probes, allowing for don’t know, refuse or empty responses, the use of data definition validation in computer-assisted modes, and the application of consistency edits. Adequate time and resources must be budgeted for testing in all modes, to ensure that each mode works well as a stand-alone instrument and does not deviate in unspecified ways from the other modes. Finally, before deciding on de Leeuw’s generalized mode design, the researcher should consider whether equivalence in design can be adequately maintained across the modes.

The Impact of Instrumentation on Data Quality

The researcher must also consider the impact of a mixed-mode design on data quality and comparability including overall response rates, nonresponse bias, and item nonresponse. For example, when questions are adapted to suit the mode in which they are administered, differences in question format may lead to response effects that can produce different response estimates by mode (Dillman, 2000; Dillman, et al, 1996; McCabe, et al, 2002). Sources of variation between modes (e.g., visual versus aural presentation, interviewer-administered versus self-administered, etc.) can also impact responses and introduce bias (Dillman, et al, 2001). These issues are discussed more fully later in the section on questionnaire design and data quality.

Survey Operations and Management

Telephone survey operations and management must be adapted for multi-mode data collection. Contact methods, frequency of contacts, case management, data retrieval, and interviewer training are all affected by using other modes with the phone.

Making Contact, Locating, and Communicating

The use of multiple channels of communication, such as phone, mail, email and in-person visits, can increase the chances of finding and contacting sample persons. Coordinating communications across various channels, however, can be complex. In our experience, multimode surveys use more channels of communication for both contact and administration resulting in more communication attempts.

In the NBS, in-field locating offered another means to follow-up on leads or partial address information for cases that had not been located using traditional means. However, to avoid making too many contacts, rules were established regarding the number of calls that would be placed prior to sending a case to the field.

In the NSRCG, contact was made via mail, e-mail, and telephone. MPR sent advance letters, paper questionnaires, reminder postcards, and later in the field period, incentive letters. Graduates who did not respond to the initial paper questionnaire mailing were sent another questionnaire packet with a letter providing information about the Web option. All nonrespondents to the second mailing were followed up by CATI. E-mail reminders were sent to sample members with available email addresses; they included a link to the survey website to encourage recipients to log on and respond.

Just as communication with sample members can be initiated through a variety of channels, survey operations centers need to be able to receive communications from sample persons in a variety of ways. Toll-free numbers should be included on mailing materials as a means of encouraging sample persons to call in. In addition, providing sample members with an e-mail address allows them to contact the call center at their convenience if they wish to ask a question, provide an address update, or specify good times to call. For example, in the KFS an e-mail address was provided that was used by over 200 sample members as a way to validate MPR as a legitimate policy research company.

Integrated Data Storage and Retrieval

Integration of database systems that take advantage of the full range of technical and design capabilities is key to ensuring that multi-mode studies such as those described above operate efficiently. For example, the ability to write all data into a single database as they are collected provides the flexibility to follow-up partially completed cases from multiple modes. In both the NSRCG and the KFS, because data from the Web mode (and in the NSRCG the paper mode) are entered into the same database as data collected via CATI, the telephone mode can be used for data collection as well as for encouraging Web response. For partially complete Web or paper cases, interviewers can quickly complete the interview over the telephone by jumping to the appropriate items in CATI. In the NSRCG, this capability also facilitates within-record data retrieval for critical data items, allowing interviewers to skip right to the critical items when collecting missing data for partially completed cases.
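With all modes writing to one record, finding what an interviewer still needs to ask reduces to scanning that record for unanswered items. The sketch below assumes a flat record with hypothetical item names and a hypothetical set of critical items; it ignores skip routing, which a real instrument would apply first.

```python
from typing import Dict, List, Optional

# Hypothetical item names in questionnaire order, and a hypothetical
# subset flagged as critical for data retrieval.
ITEM_ORDER = ["A1", "A2", "B1", "B2", "C1"]
CRITICAL_ITEMS = {"A2", "C1"}


def items_to_collect(record: Dict[str, Optional[str]],
                     critical_only: bool = False) -> List[str]:
    """List unanswered items, in order, for CATI follow-up of a partial case."""
    return [
        item for item in ITEM_ORDER
        if record.get(item) is None
        and (not critical_only or item in CRITICAL_ITEMS)
    ]


# A partially completed Web case stored in the shared database.
web_partial = {"A1": "yes", "A2": None, "B1": "3"}

print(items_to_collect(web_partial))                      # ['A2', 'B2', 'C1']
print(items_to_collect(web_partial, critical_only=True))  # ['A2', 'C1']
```

The first call corresponds to completing a partial interview by phone; the second corresponds to retrieving only the critical items.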

Case Management

In CATI/CAPI surveys such as the NBS, communication between the field and the operations center is critical. There are means to transfer updated information from locating staff to field interviewers, from field interviewers to CATI management, and from CATI management to field interviewers. The sample person is able to call in and complete an interview based on a field interviewer’s prompting. CATI and locator notes are available to field interviewers so that they are aware of what has transpired with each case. To avoid duplicating data collection efforts on cases, field interviewers are informed if a case has been completed in CATI. Information on case statuses is sent back to a central system so that completed cases can be included in integrated management reports.

Adapting systems for tracking the flow of cases and their status between modes during the field period is critical. This requires an expanded status code system and rules for sequencing codes through call attempts and their respective outcomes. Status codes for cases that are in transition are important for keeping track and following up with sample persons who say they will complete the survey using a particular mode or who have partially completed an interview using a particular mode. For example, in the NSRCG, special codes indicated which cases were Web partial completes and which were received by mail. Other codes indicated current telephone status as well as which cases were missing critical data items and required a follow-up phone call. MPR’s Sample Management System (SMS) was used for locating, mailing questionnaires and letters, producing reports, and sending email reminders.

Call scheduling is another aspect of case management that requires careful consideration when fielding a multimode survey. The call scheduling component of the case management system must not only account for the outcome status of contact attempts in multimode surveys, it must be able to turn off access to cases in one mode that have just been accessed or completed in another mode. There are also times when a CATI respondent expresses a preference for the Web and in response the case management system must release the case immediately.
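The scheduling rule described above (stop calling a case once it is completed in another mode, or once the respondent asks to switch to the Web) can be sketched with a small status table. The status codes and class names here are hypothetical; they are not the codes used in MPR's Sample Management System.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical status codes under which a case may still be called.
CALLABLE_STATUSES = {"pending", "web_partial", "needs_critical_items"}


@dataclass
class Case:
    case_id: str
    status: str = "pending"
    completed_mode: Optional[str] = None


class CaseManager:
    """Tracks case status across modes and builds the CATI call queue."""

    def __init__(self) -> None:
        self.cases: Dict[str, Case] = {}

    def add(self, case_id: str, status: str = "pending") -> None:
        self.cases[case_id] = Case(case_id, status)

    def record_complete(self, case_id: str, mode: str) -> None:
        # A completion in any mode turns off access in every other mode.
        case = self.cases[case_id]
        case.status = "complete"
        case.completed_mode = mode

    def release_to_web(self, case_id: str) -> None:
        # The CATI respondent asked to respond by Web: stop calling for now.
        self.cases[case_id].status = "released_to_web"

    def call_queue(self) -> List[str]:
        return sorted(c.case_id for c in self.cases.values()
                      if c.status in CALLABLE_STATUSES)


manager = CaseManager()
manager.add("1001")
manager.add("1002", status="web_partial")
manager.add("1003")
manager.record_complete("1001", mode="web")  # completed on the Web
manager.release_to_web("1003")               # prefers the Web
print(manager.call_queue())                  # ['1002']
```

A production system would also need a rule for returning released cases to the call queue if no Web response arrives, which this sketch omits.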

Role of Interviewers

The role of telephone interviewers must also be adapted for multi-mode surveys. Interviewers must understand the survey methodology and the interplay of modes. When making follow-up calls to cases to retrieve missing information, interviewers should be trained to jump to missing items and collect only the necessary information. They must also leave extensive notes so that incomplete cases that are assigned to field follow-up are well documented for field interviewers. Field interviewers must be willing to send incomplete cases to the survey operations center, as necessary, in order to facilitate completion of more cases using the less expensive CATI mode.

CATI interviewers attempt to complete a case while they have the sample person on the phone. Sometimes, however, it may be more effective to persuade a sample member to complete in another mode. Offering increased incentives to encourage response in a less costly mode, for example by Web, may be done via letters but also by interviewers leaving messages on answering machines or offering it to the respondent directly.

It is important to use caution when contacting sample persons using subsequent modes, because failure to respond to the initial mode of a multi-mode survey is often a veiled refusal. This makes follow-up with a subsequent mode more like refusal conversion than an initial contact. Therefore, interviewers who are assigned to different follow-up modes usually get the hardest cases and may suffer morale problems. It may be necessary to provide special training, bonuses, or other incentives to encourage good interviewer performance (Groves & Tarnai, 1988).

In summary, there are numerous details associated with efficient multimode survey management that must be considered, developed, and tested before production. While initial startup can be time consuming, once systems are integrated, programming and management effort for the remainder of the data collection period can be reduced. As de Leeuw (2005) noted, the goal of multi-mode data collection is to have the best affordable mix of modes, balancing response rates, data quality, total survey error, costs, and respondent preference. In these surveys, the phone was the main mode because of the persistence this mode allows at a reasonable cost.

Data Quality and Comparability

Differences in question format and other sources of variation between modes (e.g., visual vs. aural presentation, interviewer-administered vs. self-administered, dynamic vs. passive) can impact responses, introducing potential bias and producing different response estimates by mode. Therefore, measures should be in place to examine data quality and comparability across modes. Several measures of data quality are appropriate to mixed-mode surveys. These include both unit-level and item-level measures. We use data from the 2003 NSRCG to look at several indicators of data quality.

Unit-Level Measures of Data Quality

Common measures of overall data quality by mode include completion rate, rate of partial break-offs, completion times, and rates of missing data (McCabe, et al 2002; Holbrook, et al 2003; Fricker, et al 2005). On measures such as item nonresponse and consistency, computer-assisted modes should contribute to higher data quality since programmed skips keep the respondent on route, consistency edits promote accurate reporting, and dynamic fills help the respondent answer questions that refer to earlier responses. One would also expect less missing data in interviewer-administered modes since interviewers can prompt for responses if the respondent declines to answer. On the other hand, in Web or paper modes, respondents can typically leave items blank.

Item Data Comparability

Differences in values of items between modes may be due either to self-selection into a mode or to the modes themselves. For example, if employed people tend to use the Web more than unemployed people, you may find that the average income of Web respondents is higher than that of respondents reporting in CATI. On the other hand, if self-selection into the mode has no relationship to employment status, then a difference in income between modes might be due to the way the question is asked in each mode. This is what we mean by “mode effect”. Generally, mode effects are most prevalent when questions are subjective, sensitive, vague, or cognitively demanding and are less likely when questions require simple, factual answers.

As mentioned above, unimode instrument design is particularly difficult in modes that differ along aural vs. visual, self-administered vs. interviewer-administered, and dynamic vs. passive dimensions. Therefore, we would expect to be most likely to find potential mode differences between the most disparate modes, CATI and paper (with Web somewhere in between), where instrumentation has followed a more generalized design. For the purpose of illustrating data comparisons between CATI and paper modes, we focus on two indicators: non-differentiation (the number of identical or nearly identical responses in a series using the same scale) and social desirability.

Little research has compared CATI to Web and paper modes. However, Fricker et al. (2005) report that Web and mail surveys may encourage non-differentiation in response, particularly when items in a set of related questions are displayed in a grid. In a CATI survey such a series is likely to be asked as a series of yes/no items that may produce more variation in response. We would therefore expect to find less variation in response to a series of similar items in Web and paper than in CATI.

As mentioned above, there is considerable evidence that respondents tend to present themselves in socially desirable ways when an interviewer is present (Aquilino, 1994; Tourangeau & Smith, 1996). Given this tendency, we would expect that for any sensitive items in the survey, estimates derived from data collected by the Web and paper could differ from estimates derived using CATI data.

Mode Effect versus Self Selection into a Mode

To determine if there is a mode effect it is necessary to separate out any self-selection effects through randomization. Two ways to (try to) separate self-selection effects from mode effects are to conduct a formal experiment in which sampled cases are randomly assigned to alternate modes of data collection, or to analyze a dataset constructed from post-collection randomized matching. This latter method consists of randomly matching respondents from different modes based on frame variables.

A well-conducted experiment is better than post-collection matching analysis. Its strength is in the pre-experiment randomized selection of respondents into groups, e.g., treatment and control. However, in a real-world operational survey, it can be very difficult to conduct such an experiment. An alternative approach is post-collection randomized matching (de Leeuw, 2005). Matching should be considered a bias-reducing technique. In this method an analysis dataset is constructed based on matching records between the modes. The matching is based on frame variables using either exact matching of categorical variables or some other method such as propensity score matching. After the matched dataset is constructed, standard statistical techniques can be used to detect mode effects. Matching methods can be applied continuously, at low cost, and without disrupting operations.

For the NSRCG, exact matching was used on 5 categorical frame variables to produce an analytical dataset for CATI / paper mode comparisons. These are Age, Gender, Degree, Field of Study, and Race. See Appendix A for a summary.
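As a rough illustration of exact matching on categorical frame variables, the sketch below groups records by their frame-variable values and randomly pairs CATI and paper records within each group. The record layout, the function name `exact_match`, and the toy records are assumptions made for illustration; this is not the procedure MPR actually ran.

```python
import random
from collections import defaultdict

# The five categorical frame variables named in the text (field names are assumed).
FRAME_VARS = ("age", "gender", "degree", "field", "race")

def exact_match(records, mode_a, mode_b, seed=0):
    """Randomly pair records of two modes that agree on every frame variable."""
    rng = random.Random(seed)
    cells = defaultdict(lambda: {mode_a: [], mode_b: []})
    for rec in records:
        if rec["mode"] in (mode_a, mode_b):
            key = tuple(rec[v] for v in FRAME_VARS)
            cells[key][rec["mode"]].append(rec)
    pairs = []
    for key in sorted(cells):
        a, b = cells[key][mode_a], cells[key][mode_b]
        rng.shuffle(a)
        rng.shuffle(b)
        pairs.extend(zip(a, b))  # zip drops the surplus records of the larger group
    return pairs

def make(mode, age, gender, n):
    """Build n hypothetical records that differ only in mode, age, and gender."""
    return [{"mode": mode, "age": age, "gender": gender, "degree": "Bachelor's",
             "field": "Science", "race": "White"} for _ in range(n)]

# Hypothetical toy records, not NSRCG data.
records = (make("CATI", "23-25", "Male", 3) + make("Paper", "23-25", "Male", 2)
           + make("CATI", "26-30", "Female", 1) + make("Web", "26-30", "Female", 4))
pairs = exact_match(records, "CATI", "Paper")
print(len(pairs), "matches,", 2 * len(pairs), "analytical records")
```

Each match contributes two analytical records, one per mode, which is why a matched dataset has twice as many records as matches.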

Choice of Variables to Investigate Mode Effects

We selected two variables for this analysis that would allow us to test for the item-level mode effects outlined above. Question B33 (salary) was evaluated for social desirability effects in the interviewer-administered CATI mode. Question C10 (importance of job factors) was evaluated for non-differentiation.

Items that have different forms between modes were of particular interest. The salary question (B33) is slightly different in the CATI mode from the paper mode in that Don’t Know and Refusal are followed up with categorical items representing salary ranges (similar to the expense question of the KFS in Figure 1). Question C10 was the same in all modes.

While researching the statistical basis for mode comparisons, we learned, through internal MPR statistical consulting, of an important and profound limitation in the selection of variables. Routing of respondents to a subset of items based on their response to earlier items is common in surveys. An item that is sometimes on and sometimes off route cannot usually be statistically evaluated for mode effects because the routing mechanism creates a subset of the matched dataset that can destroy the randomness of the matching. (This is true for formal experiments as well.) Due to this constraint, only items that were asked of everyone (with a few exceptions) were suitable for this analysis. For some items, we can impute a reasonable value for the items not on route. For example, for the salary item (B33) we imputed a value of 0 for unemployed respondents to carry out a test of average salary by mode.

Data Analysis

While there is a rich descriptive literature on mode effects, we found little mention of the specific statistical approaches and tests that were used. Thus we spent much time determining appropriate data structures and tests in order to execute the comparisons in a statistically rigorous manner. Table 3 summarizes the methods for the 2 example items.

Table 3: Questions used in an example mode effect analysis for the 2003 NSRCG

| Question / Description | Question Format | Potential Mode Effect | Data Preparation / Statistical Test(s) |
|---|---|---|---|
| B33. Annual salary of job; asked only of the employed. | Numeric in all modes. CATI mode followed up DK/RF with salary categories. | Social desirability leads to higher salary in CATI mode. | Impute 0 for unemployed; impute the relatively few CATI categorical responses to numeric using the category midpoint; t-test for average equality. |
| C10. Importance of 9 factors when thinking about a job; always asked. | Same in all modes: Very important, Somewhat important, Somewhat unimportant, Not important. | Expect more non-differentiation in paper (more of the same answer) than in CATI. | Count number of each choice; compute indicator variable if sum of any choice is 9; chi-square test for distribution sameness. |

B33: Salary

A comparison of mean salaries for the CATI/Paper matched dataset indicates that CATI respondents reported significantly higher salaries, with a mean of $31,637, than paper respondents, with a mean of $28,924 (t=2.33, p=.020). This suggests that respondents to the CATI mode gave more socially desirable responses than paper respondents on the sensitive question of salary and is consistent with other evidence that interviewer presence leads to socially desirable reporting (Aquilino, 1994; Tourangeau & Smith, 1996).
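The data preparation and test for B33 can be sketched roughly as follows. The record layout, the helper names, and the toy responses are hypothetical, and the Welch (unequal-variance) form of the t statistic is an assumption, since the text says only that a t-test for equality of averages was used.

```python
import math
from statistics import mean, variance

def prepare_salary(resp):
    """Return a numeric salary following the data-preparation rules in Table 3."""
    if not resp["employed"]:
        return 0.0                      # off-route item: impute 0
    if resp.get("salary") is not None:
        return float(resp["salary"])
    low, high = resp["salary_range"]    # CATI follow-up category after DK/RF
    return (low + high) / 2.0           # impute the category midpoint

def welch_t(x, y):
    """Welch two-sample t statistic and its approximate degrees of freedom."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df

# Hypothetical toy responses, not NSRCG data.
cati = [{"employed": True, "salary": 40000},
        {"employed": True, "salary": None, "salary_range": (30000, 40000)},
        {"employed": False},
        {"employed": True, "salary": 52000}]
paper = [{"employed": True, "salary": 31000},
         {"employed": False},
         {"employed": True, "salary": 45000},
         {"employed": True, "salary": 36000}]

cati_salaries = [prepare_salary(r) for r in cati]
paper_salaries = [prepare_salary(r) for r in paper]
t_stat, df = welch_t(cati_salaries, paper_salaries)
print(f"t = {t_stat:.2f}, df = {df:.1f}")
```

A positive t statistic here means the CATI mean exceeds the paper mean, which is the direction the social desirability hypothesis predicts.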

C10 Non-differentiation: Factors When Considering Employment

To test for non-differentiation across modes, we flagged cases in which the respondent chose the same response for all 9 sub-items. Respondents to the paper mode (12%) were significantly more likely to give identical answers to all items in the scale (χ² = 6.5818, df = 1, p = 0.01).
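A minimal sketch of the flagging step and a 2x2 chi-square test is shown below. The answer coding, the function names, and the toy responses are hypothetical, and the uncorrected Pearson statistic is an assumption; the toy sample is far too small for a real test and only demonstrates the mechanics.

```python
import math

def straightlined(answers):
    """True when every sub-item in the series received the same response."""
    return len(set(answers)) == 1

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square, df = 1
    return chi2, p_value

# Hypothetical answers to the 9 sub-items, coded 1 (very important) to 4 (not important).
paper = [[1] * 9, [2] * 9, [1, 2, 1, 1, 3, 1, 2, 1, 1], [1, 1, 2, 2, 1, 1, 1, 4, 1]]
cati = [[1] * 9, [1, 2, 1, 3, 1, 1, 2, 1, 1], [2, 1, 1, 1, 3, 1, 1, 2, 2],
        [1, 1, 1, 2, 1, 1, 1, 1, 3]]

paper_flagged = sum(straightlined(r) for r in paper)
cati_flagged = sum(straightlined(r) for r in cati)
chi2, p_value = chi_square_2x2(paper_flagged, len(paper) - paper_flagged,
                               cati_flagged, len(cati) - cati_flagged)
print(f"paper flagged: {paper_flagged}, CATI flagged: {cati_flagged}, "
      f"chi-square = {chi2:.3f}")
```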

These analyses were executed with unweighted data, an appropriate method for the detection of mode effects. However, in order to judge the effect on survey estimates, we would need to weight the data with base weights from sample selection.

Survey Costs

Multimode surveys may or may not be more costly than a phone-only survey. We developed a model to compare multimode costs to CATI-only costs. It is based in concept on the experience of the three MPR surveys and borrows its form from a model presented in Groves (1989, p. 527). He defines a cost model for a single-mode survey from a centralized telephone facility as:

C = F + S(·) + M(·) + I(·) + A(·) + D(·), where

C = total cost of the survey

F = summary of fixed costs

S(·) = total costs for supervisory activities

M(·) = total costs for monitoring activities

I(·) = total costs for interviewer salaries and training

A(·) = total costs for clerical and administrative salaries

D(·) = total costs for device support, hardware and communication costs

Multimode Cost Model

We adapted the cost model above into a multimode cost model.

CA = F + Fa + SΔ(·) + MΔ(·) + IΔ(·) + A(·) + Aa(·) + D(·) + Da(·) + Pa(·) + Za(·)

CA = adapted cost of a multimode survey

Fa = additional fixed costs

SΔ = changed supervisory costs

MΔ = changed monitoring costs

IΔ = changed interviewing costs

Pa = additional costs of paper

Aa = additional administrative costs

Da = additional systems costs

Za = additional costs of incentives

Terms in the multimode cost model are sometimes additional terms and sometimes changed terms from the original model. The adapted model is constructed so that if only the CATI mode is used the result is the original cost model. The model above is useful for a mature multimode capability within an organization. The subscript a (lowercase) marks additional terms and the subscript Δ (delta) marks terms that change in the multimode cost model compared to the CATI-only alternative.

An additional cost always represents increased costs. A changed cost may be an increased or a decreased cost with respect to the CATI-only alternative. The choice of the descriptor additional or changed reflects the authors' experiences in conducting multimode surveys with a telephone mode.

The terms are related to each other through values of parameters that are not explicitly listed. These include response rates, salary and hourly rates of various levels of staff, effectiveness of various channels of communication with the respondent, and levels of incentives per mode. For example, a higher rate of response for paper and Web modes should reduce the cost of the CATI component, but not always linearly (see below).

Description of Adapted Terms

CA - Adapted cost of a multimode survey: The cost of a multimode survey may be higher or lower than its CATI-only counterpart. Fixed costs will certainly be higher but the degree to which respondents use lower-cost modes may more than make up this cost.

Fa - Additional fixed costs: These include additional costs of specification, survey management, modifications to the CAI instrument itself, the implementation of a multimode management system, and other items.

SΔ - Changed supervisory costs: Costs of survey center CATI supervision should decrease whenever there are fewer CATI cases. For a CATI/CAPI survey, fewer hours of CATI supervision will be replaced with hours of CAPI supervision.

MΔ - Changed monitoring costs: This cost should increase or decrease directly with the number of interviewing hours. This term may explicitly disappear for the Web, paper, or CAPI components but be replaced with other downstream costs included elsewhere.

IΔ - Changed interviewing costs: This cost is related to the number of CATI cases to be attempted, but a smaller number of such cases will not necessarily reduce this cost linearly. See the sub-section on non-linear rate reduction below.

Aa - Additional administrative costs: A multimode survey in a complex project normally involves a broader group of survey professionals from different departments to field the survey. There may be additional administrative costs associated with project supervision, benefits, and other fees charged to the project.

Da - Additional systems costs: These include costs of providing multiple platforms and environments for the various modes such as secure Web servers and laptops, communication with the field, laptop management systems, provision of multimode survey management systems and reports, data merging, and so forth. These costs are reduced considerably if the modes can be accommodated in the same electronic instrument and database. They are further reduced to the extent that this multimode capability becomes part of the established survey infrastructure.

Pa - Additional costs of paper: If there is a paper mode there will be the costs of producing the paper questionnaire, mail, data entry (including hand edit and programming), and any re-contact with the respondent to clear up problems. This cost may pay off if enough paper questionnaires are received to offset the cost per case of the CATI mode. Given the advent of the Web mode and its advantages, the authors conclude that a paper mode should only be used if it will result in an overall response rate increase.

Za - Additional costs of incentives: In the multimode context incentives may be variable in order to encourage the respondent to the best and cheapest mode, the Web mode. A good-faith Web respondent completes the survey virtually without cost beyond fixed costs. MPR found it advantageous to offer an increased incentive for the Web mode in the 2003 NSRCG, the additional cost of the variable incentive being more than made up by reduction in number of call attempts and interviewer time to complete by CATI.

Comparing Alternatives

The 2 cost models can be used together to judge the relative costs of the alternative approaches. This is done by comparing the (hopefully) saved costs in the changeable terms (S to SΔ, M to MΔ, and I to IΔ) to the always-positive additional costs. Using terms from both models, the cost reduction CostReduced is defined as follows:

CostReduced = [S(·) + M(·) + I(·)] - [SΔ(·) + MΔ(·) + IΔ(·)]

From the multimode cost model, the additional costs CostAdditions are defined as:

CostAdditions = Fa + Aa(·)+ Da(·) + Pa(·) + Za(·)

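If incentives are the same for all modes, the last term of CostAdditions, Za(·), drops out. The cost differential CostDifference is then defined as the difference between the additional costs and the reduced costs, which equals CA - C. It can be either positive or negative; a negative value means the multimode design costs less than the CATI-only alternative.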

CostDifference = CostAdditions - CostReduced
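The comparison can be sketched in a few lines of code. The class and field names below are assumptions (the `_delta` fields stand for the changed terms and the `_a` fields for the additional terms), and the dollar amounts are hypothetical placeholders rather than figures from any MPR survey.

```python
from dataclasses import dataclass

@dataclass
class CatiOnlyCosts:
    S: float  # supervisory costs
    M: float  # monitoring costs
    I: float  # interviewing costs

@dataclass
class MultimodeCosts:
    S_delta: float  # changed supervisory costs
    M_delta: float  # changed monitoring costs
    I_delta: float  # changed interviewing costs
    F_a: float      # additional fixed costs
    A_a: float      # additional administrative costs
    D_a: float      # additional systems costs
    P_a: float      # additional costs of paper
    Z_a: float      # additional costs of incentives

def cost_reduced(cati, multi):
    """Savings in the changeable terms relative to the CATI-only design."""
    return (cati.S + cati.M + cati.I) - (multi.S_delta + multi.M_delta + multi.I_delta)

def cost_additions(multi):
    """Sum of the always-positive additional terms."""
    return multi.F_a + multi.A_a + multi.D_a + multi.P_a + multi.Z_a

def cost_difference(cati, multi):
    """Positive: multimode costs more. Negative: multimode costs less."""
    return cost_additions(multi) - cost_reduced(cati, multi)

# Hypothetical dollar amounts for illustration only.
cati = CatiOnlyCosts(S=50_000, M=20_000, I=400_000)
multi = MultimodeCosts(S_delta=40_000, M_delta=15_000, I_delta=350_000,
                       F_a=20_000, A_a=5_000, D_a=15_000, P_a=0, Z_a=10_000)
difference = cost_difference(cati, multi)
print("CostDifference =", difference)
```

The terms F, A(·), and D(·) appear unchanged in both models, so they cancel and do not need to be represented.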

Non-linear Rate Reduction in the Multimode Alternative

As mentioned above, completing cases in a less expensive mode does not always reduce costs linearly in the CATI mode. Consider the interviewing cost term I(·) in an example CATI-only survey with a sample size of 10,000. Ignoring training costs, a simplified value (hours only) for I(·) at 2 hours per case would be 20,000 hours. Say an alternative design would consist of a staged Web and CATI design where 3,000 cases are done on the Web before CATI starts. The changed IΔ(·) term would have a higher number of hours per case, for example 2.5 hours average over the remaining 7,000 cases, yielding 17,500 interviewing hours as opposed to a more naïve expectation of 14,000 hours. The reason for this is that the Web mode will have picked off the easiest 3,000 cases, those that are located and compliant. Additionally, if you don’t prepare your interviewers correctly for the follow-up of what can be considered passive refusals, you could possibly find that the rate increase in hours per case would result in a higher number of interviewing hours over the CATI-only mode.
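The arithmetic of this example can be written out as a short sketch. The variable names are illustrative, and the break-even rate at the end is a derived quantity that follows from the figures above rather than a number stated in the example.

```python
def interviewing_hours(cases, hours_per_case):
    """Simplified interviewing term: hours only, training ignored."""
    return cases * hours_per_case

sample_size = 10_000
web_completes = 3_000

cati_only = interviewing_hours(sample_size, 2.0)   # CATI-only design
remaining = sample_size - web_completes            # cases left for CATI follow-up
naive = interviewing_hours(remaining, 2.0)         # assumes the rate stays at 2 hours
adjusted = interviewing_hours(remaining, 2.5)      # harder residual cases

# Hours per case at which the staged design uses as many hours as CATI-only.
break_even_rate = cati_only / remaining

print(f"CATI-only: {cati_only:,.0f} hours")
print(f"Naive multimode: {naive:,.0f} hours")
print(f"Adjusted multimode: {adjusted:,.0f} hours")
print(f"Break-even rate: {break_even_rate:.2f} hours per case")
```

Under these figures, the staged design only exceeds the CATI-only interviewing hours if the residual cases average more than about 2.86 hours each.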

It is worth noting what the cost model does not take into account. Such a model does not include increased or decreased value along the lines of other survey quality indicators such as timeliness or item-level data problems. Also, a study may require a minimum response rate, no matter how achieved. A survey sponsor may require multiple modes even if this is not the correct choice from a pure cost/quality perspective. The costs of a survey are a major quality consideration, but not the only one.

Conclusions

The three multimode MPR surveys fit well into survey-taking paradigms as described by de Leeuw, Dillman, Groves, and others, but often with twists. MPR worked through many implementation issues along the lines of instrumentation, survey design, survey operations, data quality and comparability, and survey costs. In doing so, MPR has been able to define generalized concepts, systems, approaches, and specifications that enable it to more easily accommodate the diverse world of multimode surveys. Main conclusions follow:

Instrumentation

Modes are disparate if they differ along at least one of the three major dimensions of aural/visual, self-/interviewer administered, or dynamic/passive. Unimode specification (Dillman 2000), such as that used for NBS and KFS, should be used where possible. It is cheaper, easier, faster to field, and methodologically more consistent than the alternative but is more difficult to achieve the more disparate the modes. It is possible to accommodate generalized mode design (de Leeuw, 2005), if necessary, as was done for the 2003 NSRCG. However, given the number of differences in items across modes (approaching 50% of the items) it was a far more expensive and drawn out process than the unimode approach. In either situation a simultaneous multimode specification is much preferred over successive specification.

Survey Design

Staging of modes proceeded from cheapest to most expensive in the three MPR surveys. An exception to this staging principle could be considered if an alternative staging could result in a higher response rate, especially among special groups. An exception might also be made based on data quality considerations. The timing of the staging of modes and the interplay between the modes are not always obvious.

Survey Operations

Survey operations are more complex in a multimode study. Survey operations staff must appropriately adapt the CATI-only survey procedures. Call scheduling parameters are not as easy to set and it is harder to anticipate workload. Interviewers often find themselves attempting to persuade respondents who are passive refusals, but they frequently motivate completion in other modes. The organization should therefore find appropriate ways to evaluate interviewers’ worth in a multimode context.

Data Quality and Comparability

Analysis of mode effects must separate out effects of selection into a particular mode. Formal operational experiments are theoretically better than post-collection randomized matching techniques, but are difficult to carry out and are not often funded. Post-collection assessment may be the only possibility for many surveys. Any search for mode effects should follow along theoretical lines. Data from items that are routed cannot usually be analyzed for mode effect because the routing destroys the randomization for both formal experiments and matching techniques.

Survey Costs

A multimode survey may or may not be more expensive than its CATI-only counterpart. There will be additional costs to field a multimode survey, but these may be offset if enough respondents shift to a less expensive mode, especially the Web mode. The more the survey-taking organization can generalize database and survey management systems and approaches to multimode surveys, the less difficult and less expensive it is to conduct them. Avoid using a paper mode unless it results in a higher response rate.

Telephone Collection as Part of a Multimode Survey

In the three MPR example multimode surveys, the use of the telephone would be justified solely in terms of its contribution to response rate. In the NBS, it had the additional advantage that it was the least expensive mode. In the NSRCG and the KFS, the use of the phone was necessary to achieve a reasonable response rate despite its higher cost, but was also used to prompt completions in other modes. In these two latter surveys the phone also had the secondary role of being used to complete partial Web or paper cases and (for the NSRCG) to execute within-record critical item data retrieval.

References

Aquilino, W. S. (1994). Interview Mode Effects in Surveys of Drug and Alcohol Use. Public Opinion Quarterly, 58, 210-240.

Couper, M. P. (2000). Web Surveys: A Review of Issues and Approaches. Public Opinion Quarterly, 64, 464-494.

De Leeuw, E. D. (2005). To Mix or Not to Mix Data Collection Modes in Surveys. Journal of Official Statistics, 21, 2, 233-255.

Dillman, D. A., and Christian, L. M. (2003). Survey Mode as a Source of Instability in Responses across Surveys. Paper presented at the Workshop on Stability of Methods for Collecting, Analyzing, and Managing Panel data, American Academy of Arts and Sciences, Cambridge, MA, March 26-28.

Dillman, D. A., Phelps, G., Tortora, R., Swift, K., Kohrell, J. & Berck, J. (2001). Response Rate and Measurement Differences in Mixed Mode Surveys Using Mail, Telephone, Interactive Voice Response and the Internet. Draft Paper.

Dillman, D. A. (2000). Mail and Internet Surveys: The Tailored Design Method. New York: John Wiley & Sons.

Dillman, D. A. & Tarnai, J. (1988). Administrative Issues in Mixed Mode Surveys. In Groves, R. M. (Ed.), Telephone Survey Methodology (pp. 509-528).

Dillman, D. A., Sangster, R. L., Tarnai, J. & Rockwood, T. (1996). Understanding differences in people’s answers to telephone and mail surveys. In M. T. Braverman and J. K. Slater (Eds.) Advances in Survey Research: New Directions for Evaluation Series (Vol. 70, pp. 45-62). San Francisco: Jossey-Bass.

Dillman, D. A., Brown, T. L., Carlson, J. E., Carpenter, E. H., Lorenz, F. O., Mason, R., Saltiel, J., and Sangster, R. L. (1995). Effects of Category Order on Answers in Mail and Telephone Surveys. Rural Sociology, 60, 674-687.

Fricker, S., Galesic, M., Tourangeau, R., Yan, T. (2005). An Experimental Comparison of Web and Telephone Surveys. Public Opinion Quarterly, 69, 370-392.

Goho, J. (2002). Mixed Mode Effects in a Community College Graduate Survey. Paper presented at the 42nd Annual Forum of the Association for Institutional Research, Toronto, Canada.

Groves, R.M. (1989). Survey Errors and Survey Costs. New York: John Wiley.

Groves, R.M. & Kahn, R.L. (1979). Surveys by Telephone: A National Comparison with Personal Interviews. New York: Academic.

Holbrook, A. L., Green, M. C., Krosnick, J. A. (2003). Telephone Interviewing Versus Face-to-Face: Interviewing of National Probability Samples with Long Questionnaires. Public Opinion Quarterly, 67, 79-125.

Link, M. W., & Mokdad, A. (2004). Are Web and Mail Modes Feasible Options for the Behavioral Risk Factor Surveillance System? In S. Cohen & J. M. Lepkowski (Eds.), 8th Conference on Health Survey Research Methods (pp. 149-154).

McCabe, S., Boyd, C. J., Couper, M. P., Crawford, S., D’Arcy, H. (2002). Mode Effects for Collecting Alcohol and Other Drug Use Data: Web and U.S. Mail. Journal of Studies on Alcohol 63: 755-761.

Pierzchala, M., Wright, D., Wilson, C., Guerino, P. (2004). Instrument Design for a Blaise Multi-Mode Web, CATI, and Paper Survey. Paper presented at the 2004 International Blaise User’s Conference, May 2005 (at http://www.blaiseusers.org\IBUCPDFS/2004/24.pdf).

Sax, L. J., Gilmartin, S. K., & Bryant, A. N. (2003). Assessing Response Rates and Nonresponse Bias in Web and Paper Surveys. Research in Higher Education, 44, 409-432.

Schwarz, N., Strack, F., Hippler, H. J., Bishop, G. (1991). The Impact of Administration Mode on Response Effects in Survey Measurement. Applied Cognitive Psychology, 5, 193-212.

Shettle, C. & Mooney, G. (1999). Evaluation of using monetary incentives in a government survey, Journal of Official Statistics, 15, 231-250.

Tourangeau, R., and Smith, T. W. (1996). Asking Sensitive Questions: The Impact of Data Collection Mode, Question Format, and Question Context. Public Opinion Quarterly, 60, 275-304.

Wilson, D., Wright, D., Barton, T., and Guerino, P. (2005). Data Quality in a Mixed Mode Survey. Paper presented at AAPOR 2005.

Wright, D., Supple, A., and Aquilino, W. S. (1998). A comparison of computer-assisted and paper-and-pencil self-administered questionnaires in a survey on smoking, alcohol, and drug use. Public Opinion Quarterly, 62, 331-354.

Appendix A: Some Details on Mode Comparisons

The NSRCG data used for comparisons were taken from the Blaise database before data cleaning in order to eliminate the effects of any imputation on the comparisons.

The population of interest for these mode comparisons is a subset of the NSRCG completes that excludes cases that did not have a choice of modes (for a variety of reasons) and cases where data were collected in 2 or more modes. The records available for matching are called eligible records below. Tables 4a and 4b summarize the sub-setting process.

Table 4a: Records Eligible for Matching by Mode After Sub-Setting

| Mode | Number | Percent |
|---|---|---|
| CATI | 3,827 | 62.2 |
| Web | 1,236 | 20.9 |
| Paper | 1,090 | 17.7 |
| Total | 6,153 | 100.0 |

Table 4b: Respondent Characteristics by Mode, Eligible Cases for Matching

Cells show the row percent with the count in parentheses.

| Characteristic | Phone | Web | Paper | Total |
|---|---|---|---|---|
| Age (Chi-Square: 75.247***, df = 12, Cramer’s V: 0.0782) | | | | |
| Unknown | 61.75 (373) | 18.54 (112) | 19.70 (119) | 100 (604) |
| <22 | 57.14 (40) | 32.86 (23) | 10.00 (7) | 100 (70) |
| 23-25 | 62.29 (1647) | 20.84 (551) | 16.87 (446) | 100 (2644) |
| 26-30 | 63.88 (1068) | 21.17 (354) | 14.95 (250) | 100 (1672) |
| 31-40 | 62.44 (507) | 18.35 (149) | 19.21 (156) | 100 (812) |
| >40 | 54.70 (192) | 13.39 (47) | 31.91 (112) | 100 (351) |
| Gender (Chi-Square: 114.714***, df = 2, Cramer’s V: 0.1365) | | | | |
| Male | 63.83 (2065) | 23.06 (746) | 13.11 (424) | 100 (3235) |
| Female | 60.38 (1762) | 16.79 (490) | 22.82 (666) | 100 (2918) |
| Degree (Chi-Square: 29.239***, df = 2, Cramer’s V: 0.0689) | | | | |
| Bachelor’s | 64.13 (2921) | 19.34 (881) | 16.53 (753) | 100 (4555) |
| Master’s | 56.70 (906) | 22.22 (355) | 21.09 (337) | 100 (1598) |
| Race (Chi-Square: 81.574***, df = 4, Cramer’s V: 0.0814) | | | | |
| White | 58.64 (1721) | 20.85 (612) | 20.51 (602) | 100 (2935) |
| Asian | 59.60 (745) | 25.04 (313) | 15.36 (192) | 100 (1250) |
| Minority | 69.16 (1361) | 15.80 (311) | 15.04 (296) | 100 (1968) |
| Field (Chi-Square: 1037.387***, df = 6, Cramer’s V: 0.2903) | | | | |
| Unknown | 68.57 (96) | 15.71 (22) | 15.71 (22) | 100 (140) |
| Science | 66.92 (2602) | 13.36 (714) | 14.71 (572) | 100 (3888) |
| Engineering | 61.90 (1129) | 24.89 (454) | 13.21 (241) | 100 (1824) |
| Health | 0 (0) | 15.28 (46) | 84.72 (255) | 100 (301) |
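For readers who want to see how the chi-square and Cramer’s V statistics in Table 4b relate to the cell counts, the sketch below recomputes both from the Gender rows. The function names are illustrative, and the standard Pearson chi-square and Cramer’s V formulas are assumed; the results should come out close to the reported 114.714 and 0.1365.

```python
import math

def chi_square(table):
    """Pearson chi-square statistic and total count for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n

def cramers_v(table):
    """Cramer's V: sqrt(chi2 / (n * (min(rows, cols) - 1)))."""
    chi2, n = chi_square(table)
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (n * k))

# Gender by mode counts from Table 4b: rows Male, Female; columns Phone, Web, Paper.
gender_by_mode = [[2065, 746, 424],
                  [1762, 490, 666]]

chi2, n = chi_square(gender_by_mode)
v = cramers_v(gender_by_mode)
print(f"chi-square = {chi2:.3f}, n = {n}, Cramer's V = {v:.4f}")
```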

The theory of matching supports only pair-wise comparisons of data. Thus three datasets were produced for the analysis as summarized in Table 5.

Table 5: Summary of Matched Analytical Datasets

| Pair-wise matching combination | Matches | Number analytical records |
|---|---|---|
| CATI / paper | 757 | 1,514 |
| Paper / Web | 569 | 1,138 |
| Web / CATI | 1,063 | 2,126 |

Tables 6a through 6e compare the percent breakdown by category between the dataset of all eligible cases and the matched cases for CATI / paper.

Table 6a: By Age Category

| Age | Unknown | 22 or less | 23 to 25 | 26 to 30 | 31 to 40 | Over 40 |
|---|---|---|---|---|---|---|
| All Eligible | 9.8 | 1.1 | 43.0 | 27.17 | 13.2 | 5.7 |
| CATI / paper | 9.8 | 0.7 | 49.8 | 22.2 | 11.8 | 5.8 |

Table 6b: By Gender

| Gender | Male | Female |
|---|---|---|
| All Eligible | 52.6 | 47.4 |
| CATI / paper | 43.5 | 56.5 |

Table 6c: By Degree

| Degree | Bachelors | Masters |
|---|---|---|
| All Eligible | 74.0 | 26.0 |
| CATI / paper | 77.3 | 22.7 |

Table 6d: By Field, Very high level

| Field | Unknown | Science | Engineering | Health |
|---|---|---|---|---|
| All Eligible | 2.3 | 63.2 | 29.6 | 4.9 |
| CATI / paper | 2.4 | 69.5 | 28.2 | - |

Table 6e: By Race

| Race | White | Asian | Minority |
|---|---|---|---|
| All Eligible | 47.8 | 20.3 | 32.0 |
| CATI / paper | 52.0 | 18.0 | 30.0 |

Acknowledgements

The following people from MPR provided valuable support: Elizabeth Stuart and Amang Sukasih on statistical matching techniques, Sameena Salvucci for a Quality Assurance review and advice on data analysis, and Sara Skidmore for assistance with the literature search.
