Cigarette smoking and self-reported health in China
[Accepted, forthcoming, China Economic Review]
Steven T. YEN a,*, W. Douglass SHAW b, and Yan YUAN c
a Department of Agricultural Economics 308D Morgan Hall
University of Tennessee Knoxville, TN 37996-4518, USA
b Department of Agricultural Economics
Texas A&M University TAMU 2124 / Blocker Building
College Station, TX 77843-2124, USA
c The Research Institute of Economics and Management Southwestern University of Finance and Economics
Chengdu, China
The authors thank Thomas McGuire, Richard Dunn and Ximing Wu for their comments on an earlier version of this manuscript, and Paul Jakus, Mary Riddel, and V. Kerry Smith for valuable comments on a related paper on smoking behavior. Shaw is also Research Fellow, Hazard Reduction and Recovery Center at A&M, and acknowledges support from the W-2133 USDA/Hatch Project. An earlier version of this paper was presented at the 2009 Far East and South Asia Meeting of the Econometric Society, Tokyo, August 3-5. * Corresponding author.
E-mail address: [email protected] (S.T. Yen).
Abstract
The effect of cigarette smoking on self-reported or self-assessed health (SAH) has been
considered in several studies, with some surprising results, but smoking behavior has received
less attention in countries like China than in the United States and various European
countries. In this manuscript the variation in an ordinal endogenous SAH variable is modeled
with an ordinal endogenous cigarette smoking variable, using the copula approach to
accommodate skewness in the error distribution. The treatment approach avoids several selection
issues that could bias empirical estimates. The empirical model is estimated for a random sample
of adult males from nine Chinese provinces in the 2006 China Health and Nutrition Survey. The
results for our sample suggest that heavy smokers are more likely to report excellent health.
Governments and health policy makers might target heavy smokers with the message that
quitting does result in benefits, keeping in mind that smokers' own assessment of their
health is itself a function of several factors.
Keywords: China; Self-assessed health; Smoking
JEL Classification: I10, C31
1. Introduction
Are cigarette smokers more likely to state that their health is good or bad than non-
smokers? They can answer either way, as the relationship is complicated. People who believe
themselves to be healthy may know about the dangers of smoking, and have stopped, or never
started smoking. Some healthy people may also simply believe that they can handle the potential
negative effects of smoking, and continue smoking even when they know the risks. People who
assess themselves as less healthy may have received the message about the dangers of smoking and
stopped, but many sick people may not have stopped, for various reasons.
Perceptions about the risks of disease or death likely play a role here, as does addiction.
We investigate the complicated relationship between self-assessed health (SAH) and
smoking, using micro-level data for a sample of Chinese men. Cigarette smoking behavior in
lower-income countries including China has received less attention from economists (Lance,
Akin, Dow, & Loh, 2004) than in other countries, although studies exist for ethnic
Chinese populations in Taiwan (Liu & Hsieh, 1995; Hsieh, 1998) and Hong Kong (Ho, Lam, Fielding, &
Janus, 2003). Examining this relationship may help health officials who are
trying to communicate the risks of smoking to people in China. Identifying the
characteristics of smokers and their health status will help officials learn what more they might
do to communicate knowledge that people can use to gauge their own health and consider
smoking risks.
Cigarette smoking is the single most preventable cause of death in the world today.
Worldwide it kills one person every six seconds, causes 1 in 10 deaths among adults, and claims
more than five million lives annually (Mathers & Loncar, 2006; WHO, 2009). Smoking not only
causes premature deaths but also leads to several diseases which may not necessarily kill a
person but affect health, such as chronic bronchitis, mucus hypersecretion, bladder cancer, and
peptic ulcer disease (Samet, 2001). Yet, cigarette or tobacco use remains common throughout the
world, with more than a quarter of the adult population smoking in many countries.
With more than 320 million smokers consuming 30% of the world’s cigarette production,
China is the largest producer and consumer of tobacco (Mackay, 1997; WHO, 2009). Per capita
cigarette consumption in China rose dramatically from the early 1970s to the early 1990s, and
deaths due to second-hand smoke are estimated to be over 100,000 per year (China Ministry of
Health, 2007). In China, smoking is largely an activity for males. Estimates from the China
Health and Nutrition Survey (CHNS) suggest that 59.6% of men and 5.1% of women were
current smokers in 1993, and the percentages decreased to 48.9% and 3.2%, respectively, by
2003 (Qian et al., 2010). Statistics from the 2006 CHNS, the sample used in the current study,
suggest that 53.3% of men were current smokers, compared to 3.7% of women. Because of the low
prevalence of smoking among women, our paper investigates the demand for cigarettes in China
by men.
It is expensive to obtain medical professionals’ assessments of a subject’s health
condition, or to do extensive objective tests of overall health on a patient. Therefore, it is
common to rely on survey questionnaires or interviews that simply ask individuals to rate or
assess their own health on a scale, yielding SAH. The question typically is phrased as, “On a
scale from 1 (poor) to 4 (excellent), how would you rate your health?” When respondents can see
the question (in mail, internet, or in-person surveys that use visual devices) they might be
instructed to circle one of the four or more discrete (ordinal) responses, i.e., numbers on a Likert
scale. Sometimes the SAH is simply a binary variable indicating excellent/good health versus
mediocre or poor health, but the data set we use offers information on a 1 to 4 scale. We posit
that the SAH variable is endogenous, and explore how it varies with several variables, including
cigarette smoking. Smoking is a choice that people make, and can be a variable that is
endogenous in our model, so we link the SAH to an ordinal smoking participation equation. We
use a very general econometric model that accommodates dependent and potentially skewed
error distributions using the copula approach.
Below we briefly review related literature on the SAH measure and studies of smoking
behavior. Then we describe the data and report sample statistics in section 3. The empirical
model is featured in section 4, followed by estimation results in section 5. The final section
concludes.
2. Empirical literature
2.1 Self-assessed health
SAH, as a health measure, has been the topic of many previous studies (Au, Crossley, &
Shellhorn, 2005; Baker, Stabile, & Deri, 2004; Butler, Burkhauser, Mitchell, & Pincus, 1987;
Cai & Kalb, 2006; Campolieti, 2002; Case & Paxson, 2005; Dwyer & Mitchell, 1999; Etilé &
Milcent, 2006; Idler, 2003; Idler & Benyamini, 1997; Moore & Zhu, 2000; Shmueli, 2002; van
Doorslaer & Jones, 2003). Many researchers have examined factors that lead to heterogeneity in
SAH or in reporting it. Reported health might depend on personal characteristics such as gender
and education levels, while “true” health, which is perhaps never observable, does not (Groot,
2000). Exogeneity of SAH has been rejected in many recent studies of various behaviors related
to health status. We focus below on the link between the SAH and cigarette smoking.
2.2 Cigarette smoking and health
The literature on cigarette smoking is much too extensive to address here (see
Viscusi, 1992, for a detailed discussion). Researchers have dealt with a host of issues for over
twenty-five years, but much of the in-depth micro-level analysis has been done in the United
States (US) or in a European country. Many smoking studies conducted in developing countries
rely on aggregated and not micro-level data (e.g., Chapman & Richardson, 1990), and aggregate
data have been used to examine the demand for cigarettes in China (e.g., Mao & Xiang, 1997;
Xu, Hu, & Keeler, 1998). These studies are subject to the limitations and empirical concerns
normally associated with aggregated data: explanatory variables are often highly collinear and
there can be substantial simultaneity. In addition, aggregate data generally do not contain
detailed demographic information. Modest overall smoking rates in poorer societies often
disguise high incidences for certain subgroups for whom consumption is highly concentrated,
and many studies also fail to incorporate price variation below the national level, which can be
considerable in developing countries (Lance et al., 2004).
Micro-level data have been used recently in studies of cigarette demand in China (Lance
et al., 2004; Mao & Xiang, 1997) and smoking status in Taiwan (Hsieh, 1998; Liu & Hsieh,
1995) and Hong Kong (Ho et al., 2003). Important issues in microdata modeling of smoking are
summarized in Sloan, Smith, and Taylor (2003), and earlier, by Jones (1994). The issues include
whether smokers are addicts, and if so, whether the addiction is rational (Becker, Grossman, &
Murphy, 1994; Chaloupka, 1991; Chaloupka & Warner, 2000), and, as in the Taiwan studies, the
role of risk perceptions in the decision to smoke (which builds on earlier work by Viscusi
(1990)). As reported and perceived risks are not available in our data, the many studies
that deal with perceived risks are not reviewed here.
Some studies have explored the effect of smoking on SAH (suggesting that smoking causes
variation in the SAH), some the reverse (the SAH causes variation in smoking behavior), while
others have explored simultaneity in the relationship. In an early exploration, Blaylock and
Blisard (1992) consider the possibility of simultaneity between health status, the decision to be a
current smoker, to quit, and the quantity of cigarettes consumed. The SAH variable used by these
authors is binary (good or not good), as are the smoking participation and quit decision variables.
Using a sample of women from the Continuing Survey of Food Intakes by Individuals in the US,
they find SAH did not influence the probability of smoking or quitting.
Jones (1994) noted that the smoking-health relationship is potentially obscured by many
unobservable variables, such as an individual's level
of stress. For example, an individual with a genetic predisposition toward anxiety might feel
stressed and smoke because of that personal trait, which may be quite difficult to measure, but
which may be correlated with an observed health measure. Using a sample of British adults from
the Health and Lifestyle Survey (HALS), Jones (1994) finds that individuals with poor or fair
SAH are less likely to have quit smoking than those in better health.1
Shmueli (1996) questioned Jones’ (1994) result because people who quit
might have done so at different times, causing unobserved heterogeneity in SAH. Using a
sample of the elderly from Israel, he finds that the effect of SAH (binary) on smoking varies
with the four-year cohort considered. Income is used as an identifying instrument. For a
period of 1966 to 1985 (for only the last group of smokers from 1981 to 1985), present health
was an exogenous variable with a negative effect, with sicker (and not healthier) smokers
quitting smoking.
Jones (1996) notes, in a response, that a variable indicating a disability may in fact pick
up a curative effect sought by Shmueli (1996); he also questions the use of income as an
instrument, for income is likely correlated with the quitting decision. Using additional data from
a follow-up survey to the 1984-85 HALS and allowing for endogeneity in SAH, Jones (1996)
confirms his earlier finding. However, he notes that those who experienced serious injury or
illness at the end of the period of analysis are indeed more likely to quit smoking.
Ho et al. (2003) examine a cross section of women and men surveyed in Hong Kong in
1 Jones (1994) and Sloan et al. (2003) considered more complicated decisions to stop smoking, but such modeling requires panel data, which we do not have.
the mid-1990s. They noted that, at that time, the proportion of former smokers among those who
had smoked at some point was higher than that in Mainland China. Most of their sample report
being in good or very good health, but 5% of men and about 8% of women report being in poor
health. Using odds ratio models, these authors consider the role of smoking in reporting poor or
very poor health, adjusted for age, alcohol consumption, exercise (in the past 30 days),
education, marital status, and place of birth. Those who had never smoked had better perceived
health than those currently smoking, but those who had quit (had smoked before) had the worst
perceived health, for both genders.
Most recently, Contoyannis and Jones (2004) consider the role of several “lifestyle”
behaviors in SAH. The lifestyle behaviors that may affect health include diet (a breakfast
indicator), smoking (current or not), exercise (binary), sleep, alcohol consumption, and stress.
Their data again come from the original HALS used by Jones (1994), supplemented with a
follow-up source of panel data. They take advantage of exogenous variables from the follow-up
data to model the lifestyle variables from the earlier data, and use lifestyle variables to explain
SAH. Allowing for unobserved heterogeneity, they find that non-smoking has a large and
positive effect on the probability of reporting excellent or good health (SAH = 1). We conclude,
based on the above literature and debates, that the jury is still out on how smoking affects the
SAH, that this may ultimately be an empirical outcome that varies across samples, and most
importantly, that findings for our sample will contribute to knowledge about this for men in
China.
3. Data and sample
Our data come from the 2006 China Health and Nutrition Survey (CHNS). The survey
was designed to examine the effects of the health, nutrition, and family planning programs
implemented by national and local governments and to see how the social and economic
transformation of the Chinese society is affecting the health and nutritional status of its
population. CHNS is a longitudinal survey, conducted in 1989, 1991, 1993, 1997, 2000, 2004 and 2006,
which covers Guangxi Zhuang and eight other provinces with substantial variations in geography,
economic development, public resources, and health indicators. The nine provinces accounted
for approximately 42% of China’s population in 2006. The surveys collected data on consumer
goods as well as detailed information on measures of health outcomes, such as height, weight,
blood pressure, activities of daily living, SAH status, morbidity, physical function limitations,
and disease history.
A multistage, random cluster process was used to draw the sample surveyed in each of
the provinces. Counties in the nine provinces were stratified by income level (low, middle, and
high) and a weighted sampling scheme was used to randomly select four counties (one in low,
two in middle, and one in high-income levels) from each province. In addition, the provincial
capital and a lower income city were selected. Villages and townships within the counties and
urban and suburban neighborhoods within the cities were selected randomly. Currently, there are
about 4,400 households in the survey, covering some 19,000 individuals. Further details and
updates of the survey are described elsewhere (CHNS, 2007).
A sample of males from the 2006 CHNS adult survey is used for this study. Other years
of the CHNS are not used because of problems that may arise when estimating limited dependent
variable models with short panels; see Greene (2004), who suggests that ignoring panel
approaches might be best for such data. To focus the comparison on current smokers and
those who have never smoked (“never smokers”), former smokers are excluded from the sample.
After eliminating observations with missing values, 2,908 men remained in the final sample.
Definitions of variables and their descriptive statistics are presented in Table 1. The first
endogenous variable used in this analysis is an indicator of cigarette smoking behavior. The
survey contains information on the number of cigarettes smoked per day, and therefore cigarette
consumption could be modeled as an integer-count variable. However, because the number of
cigarette packs smoked clusters in these data, with pile-ups at 0.5, 1, 1.5 and 2
packs, we recode the cigarette pack quantity into an ordinal variable representing the decision to
smoke or not and how much (with values from 1 to 4). Ordinal response models are found to
perform better in modeling cigarette demand with such clustered data (Kasteridis, Munkin, &
Yen, 2010). The other endogenous variable in the analysis, SAH, is collected as an ordinal
variable, ranging from poor (coded as 1) to excellent (coded as 4). The average SAH is about
2.7, with the largest frequency falling in category 3 (good health).
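The recoding of cigarette quantity into a four-level ordinal variable can be illustrated with a minimal sketch. The text identifies category 1 as non-smokers and category 2 as 1 to 10 cigarettes per day; the upper cutoffs used here for categories 3 and 4 (20 cigarettes, i.e., one pack) are hypothetical and chosen only for illustration, as is the function name `smoking_category`.

```python
def smoking_category(cigarettes_per_day, cutoffs=(0, 10, 20)):
    """Map a daily cigarette count to an ordinal category 1..4.

    cutoffs holds the upper bounds of categories 1, 2 and 3; counts above
    the last cutoff fall in category 4. Only the first two bounds (0 and 10)
    follow the text; the bound of 20 is an assumed value for illustration.
    """
    if cigarettes_per_day < 0:
        raise ValueError("cigarette count must be non-negative")
    for category, upper in enumerate(cutoffs, start=1):
        if cigarettes_per_day <= upper:
            return category
    return len(cutoffs) + 1


# Counts at the observed pile-ups (0.5, 1, 1.5 and 2 packs of 20) plus zero.
counts = [0, 5, 10, 20, 30, 40]
categories = [smoking_category(c) for c in counts]
```

Collapsing the counts this way discards within-category variation, but it avoids imposing a count distribution on data that pile up at half-pack multiples.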
Table 2 reports two-way frequencies between SAH and smoking, and illustrates the
difficulty in spotting a simple relationship between these two variables, although there are a few
suggestive cells of high frequency [e.g. non-smokers (smoking = 1) and those in good health
(SAH = 3); heavy smokers (smoking = 4) and those in excellent health (SAH = 4)]. Several cell
frequencies appear puzzling, such as the high count of moderate smokers (smoking = 3) in good
health. Further exploration of this frequency distribution requires a formal modeling
procedure.
Continuous explanatory variables include age, income, and household size, which have
been shown to be important in health-related studies (e.g. Deshazo & Cameron, 2005). A
common problem in cross-sectional studies is a lack of variation in cigarette prices. As our data
include people from many provinces and regions, community-level prices were collected for
local markets and merged to the sample. Community-level prices in poorer, developing countries
can provide legitimate cross-sectional variation that identifies demand (Deaton, 1997). The
extraneous price data exhibit some variation, with a mean of RMB 4.85 and a standard deviation
of RMB 2.75 per pack. Binary explanatory variables include categories of education, which are
found to play a role in cigarette smoking (e.g., Viscusi, 1995), and the region of residence,
marital status, and some indicators of alcohol drinking behavior.2
4. An ordinal health model with an ordinal endogenous treatment
A treatment effect model accommodates the non-random selection of individuals into the
“treated” state and avoids statistical bias in empirical estimates caused by such non-random
selection. Most sample selection and treatment effect models have heretofore been estimated
with Gaussian (jointly normal) error disturbances. Normality of the error disturbances generally
cannot be justified by economic theory and misspecification of the distribution can lead to
inconsistent empirical estimates and misleading inference. To avoid biases caused by such
distributional misspecification, we develop an ordered probability treatment effect model with a
more flexible error distribution. Below we specify the model, first building its likelihood
function without a specific distributional assumption for the error terms. Following this the error
distribution is specified by the copula approach (Nelsen, 2006), which requires a marginal
cumulative distribution function (cdf) (henceforth, margin) for each error term and a copula
function which links the margins. In all that follows, observation subscripts are suppressed for
brevity. The model consists of an ordinal treatment equation for cigarette smoking (y1)
$$y_1 = j \quad \text{if } \mu_{j-1} \le z'\alpha + u_1 < \mu_j, \quad j = 1, \dots, J \tag{1}$$
and an ordinal outcome equation for SAH (y2) which includes observed cigarette level as a
regressor:
$$y_2 = k \quad \text{if } \xi_{k-1} \le x'\beta + \delta y_1 + u_2 < \xi_k, \quad k = 1, \dots, K \tag{2}$$
2 An anonymous reviewer suggested control for occupations, as being in certain types of jobs may be associated with smoking for social reasons, or because of stress, or [we add] because of correlations with risk preferences. However, we explored this for six types of occupations and the only significant one was for those who were professional or administrative managers, which was in turn strongly linked to being employed for wages.
where z and x are vectors of explanatory variables, α and β are conformable vectors of
parameters, and the µ’s and ξ’s are threshold parameters such that $\mu_0 = -\infty$, $\mu_1 = 0$, $\mu_J = \infty$,
$\xi_0 = -\infty$, $\xi_1 = 0$, $\xi_K = \infty$, and $\mu_2, \dots, \mu_{J-1}$ and $\xi_2, \dots, \xi_{K-1}$ are estimable. The random errors u1, u2
are bivariate (not necessarily normally) distributed with zero means, unitary variances, and a
correlation structure specified below. Besides a different error distribution, the specification
above is an extension of the conventional treatment effect model (Barnow, Cain, & Goldberger,
1980) in that the treatment variable y1 is ordinal (vs. binary) and the outcome variable (y2) is also
ordinal rather than continuous. The likelihood function for an independent sample is
$$L = \prod_{\text{all}} \prod_{j=1}^{J} \prod_{k=1}^{K} \left\{ \Pr(y_1 = j, y_2 = k) \right\}^{1(y_1 = j,\, y_2 = k)} \tag{3}$$
where “all” indexes the product over sample observations, 1(A) is a binary indicator function
which equals 1 if event A holds, and the bivariate probabilities (likelihood contributions) are
$$\begin{aligned}
\Pr(y_1 = j, y_2 = k) ={}& F(\mu_j - z'\alpha,\ \xi_k - x'\beta - \delta y_1) - F(\mu_{j-1} - z'\alpha,\ \xi_k - x'\beta - \delta y_1) \\
&- F(\mu_j - z'\alpha,\ \xi_{k-1} - x'\beta - \delta y_1) + F(\mu_{j-1} - z'\alpha,\ \xi_{k-1} - x'\beta - \delta y_1), \\
& j = 1, \dots, J;\ k = 1, \dots, K
\end{aligned} \tag{4}$$
such that $F(v_1, v_2) = \Pr(V_1 \le v_1, V_2 \le v_2)$ is a bivariate cdf for standardized random variables V1
and V2 with margins $F_1(v_1) = \Pr(V_1 \le v_1)$ and $F_2(v_2) = \Pr(V_2 \le v_2)$.
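The rectangle probability in Eq. (4) can be sketched for any bivariate cdf F. The sketch below is illustrative only: the cdf `independent_normal_cdf` is a stand-in (independent standard normal errors), the threshold and index values are hypothetical, and the outcome index is shifted by δ times y1 evaluated at y1 = j, which is our reading of how the observed smoking level enters each cell.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def independent_normal_cdf(v1, v2):
    """Stand-in bivariate cdf F(v1, v2): independent standard normal errors."""
    return _STD_NORMAL.cdf(v1) * _STD_NORMAL.cdf(v2)


def cell_probability(F, mu, xi, j, k, z_alpha, x_beta, delta):
    """Pr(y1 = j, y2 = k) as the rectangle probability in Eq. (4).

    mu and xi are full threshold lists indexed 0..J and 0..K, with
    mu[0] = xi[0] = -inf and mu[J] = xi[K] = +inf. z_alpha stands for
    z'alpha and x_beta for x'beta; y1 is set to j inside the cell.
    """
    shift = x_beta + delta * j
    hi1, lo1 = mu[j] - z_alpha, mu[j - 1] - z_alpha
    hi2, lo2 = xi[k] - shift, xi[k - 1] - shift
    return F(hi1, hi2) - F(lo1, hi2) - F(hi1, lo2) + F(lo1, lo2)


INF = math.inf
mu = [-INF, 0.0, 0.8, 1.6, INF]  # hypothetical thresholds, J = 4
xi = [-INF, 0.0, 1.0, 2.1, INF]  # hypothetical thresholds, K = 4

# Joint probability table for one hypothetical observation.
table = [
    [
        cell_probability(independent_normal_cdf, mu, xi, j, k,
                         z_alpha=0.3, x_beta=0.9, delta=-0.2)
        for k in range(1, 5)
    ]
    for j in range(1, 5)
]
total = sum(sum(row) for row in table)
```

Because the four terms telescope over k and then over j, the sixteen cell probabilities sum to one for any valid F, which provides a simple check on an implementation of the likelihood contributions.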
4.1 The copulas
To accommodate skewness in the distribution of the error terms u1, u2, each bivariate cdf in Eq.
(4) is specified with the copula approach (Nelsen, 2006).3 We consider two leading
3 A copula, denoted $C(v_1, v_2) = C[F_1(v_1), F_2(v_2)]$, is a dependence function that can be used
to generate joint distributions of random variables V1 and V2 with specific margins $F_1(v_1)$ and
$F_2(v_2)$.
specifications: the Frank and Gaussian (bivariate normal) copulas (Nelsen, 2006). The Frank
copula was used in a slightly different context, in a recent paper in this journal by Yen, Yuan,
and Liu (2009). Here we present the Gaussian copula, defined as
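$$\begin{aligned}
C(F_1, F_2; \theta) &= \Psi[\Phi^{-1}(F_1), \Phi^{-1}(F_2); \theta] \\
&= \int_{-\infty}^{\Phi^{-1}(F_1)} \int_{-\infty}^{\Phi^{-1}(F_2)} \frac{1}{2\pi (1 - \theta^2)^{1/2}} \exp\left[ -\frac{s^2 - 2\theta s t + t^2}{2(1 - \theta^2)} \right] ds\, dt
\end{aligned} \tag{5}$$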
where $\Phi^{-1}$ is the inverse univariate standard normal cdf and Ψ is the standard bivariate normal
cdf with correlation parameter θ in [–1, 1]. This is the copula used by Lee (1983)
in developing sample selectivity models with continuous but non-normal distributions.
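A minimal numerical sketch of the Gaussian copula in Eq. (5) follows. It uses only the Python standard library; the bivariate normal cdf is evaluated by Simpson's rule on a one-dimensional integral, which is an implementation choice made here for illustration and not necessarily the method used for the estimates in this paper. The function names and the step count are assumptions.

```python
import math
from statistics import NormalDist

_N = NormalDist()


def bivariate_normal_cdf(a, b, theta, steps=2000):
    """Psi(a, b; theta) via Simpson's rule applied to
    int_{-inf}^{a} phi(s) * Phi((b - theta * s) / sqrt(1 - theta^2)) ds,
    with the lower limit truncated at -8 and the upper limit capped at 8.
    steps must be even.
    """
    if not -1.0 < theta < 1.0:
        raise ValueError("theta must lie strictly inside (-1, 1)")
    lower, upper = -8.0, min(a, 8.0)
    if upper <= lower:
        return 0.0
    scale = math.sqrt(1.0 - theta * theta)

    def integrand(s):
        return _N.pdf(s) * _N.cdf((b - theta * s) / scale)

    h = (upper - lower) / steps
    total = integrand(lower) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(lower + i * h)
    return total * h / 3.0


def gaussian_copula(p1, p2, theta):
    """C(F1, F2; theta) = Psi[Phi^{-1}(F1), Phi^{-1}(F2); theta], Eq. (5).

    p1 and p2 must lie strictly between 0 and 1, since
    NormalDist.inv_cdf rejects the endpoints.
    """
    return bivariate_normal_cdf(_N.inv_cdf(p1), _N.inv_cdf(p2), theta)


independent = gaussian_copula(0.3, 0.6, 0.0)   # reduces to the product p1 * p2
dependent = gaussian_copula(0.5, 0.5, 0.5)     # positive dependence
```

With θ = 0 the copula collapses to the product of the margins, and at the medians the value equals 1/4 + arcsin(θ)/(2π), a standard result for the bivariate normal that is convenient for checking the integration.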
4.2 The margins
Most alternatives to the Gaussian copula admit skewness in the random errors even with
symmetric margins. Additional skewness can be accommodated by using skewed margins. We
consider two forms of margins for F1 and F2. The first is the benchmark univariate Gaussian cdf
and the other is the generalized log-Burr cdf for standardized random variable ui with skewness
κi (Burr, 1942; Lawless, 2003)
$$F_i(u_i; \kappa_i) = 1 - (1 + \kappa_i e^{u_i})^{-1/\kappa_i}, \quad -\infty < u_i < \infty, \quad i = 1, 2. \tag{6}$$
The generalized log-Burr distribution includes the logistic (κi = 1) and extreme value (κi → 0)
distributions as special cases; it can deliver very different probabilities even within a moderate
range of skewness.
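The two special cases of Eq. (6) can be checked numerically with a short sketch. The helper names are hypothetical, and the restriction κ > 0 in the code is an assumption made so that the power is well defined; the extreme-value case is approached with a small positive κ.

```python
import math


def log_burr_cdf(u, kappa):
    """Generalized log-Burr cdf of Eq. (6): 1 - (1 + kappa * e^u)^(-1/kappa)."""
    if kappa <= 0:
        raise ValueError("kappa must be positive in this sketch")
    return 1.0 - (1.0 + kappa * math.exp(u)) ** (-1.0 / kappa)


def logistic_cdf(u):
    """Logistic cdf, the kappa = 1 special case."""
    return 1.0 / (1.0 + math.exp(-u))


def extreme_value_cdf(u):
    """Extreme-value cdf, the limit as kappa approaches 0."""
    return 1.0 - math.exp(-math.exp(u))


u = 0.7
gap_logistic = abs(log_burr_cdf(u, 1.0) - logistic_cdf(u))
gap_extreme = abs(log_burr_cdf(u, 1e-6) - extreme_value_cdf(u))

# The same index value maps to quite different probabilities as kappa varies.
spread = [log_burr_cdf(1.0, kappa) for kappa in (0.1, 0.5, 1.0, 2.0)]
```

The list `spread` is decreasing in κ for a fixed index value, which is one way to see why the choice of margin matters for the predicted probabilities.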
To demonstrate the copula approach in the present context, for a model with Gaussian
copula and generalized log-Burr margins (henceforth, Gaussian-Burr model), our preferred
specification, the first probability on the right-hand side of Eq. (4) is obtained by substituting
$F_1(u_1; \kappa_1)$ and $F_2(u_2; \kappa_2)$ from Eq. (6) into the probability in Eq. (4):
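$$\begin{aligned}
F(\mu_j - z'\alpha,\ \xi_k - x'\beta - \delta y_1) = \Psi\{ & \Phi^{-1}[1 - (1 + \kappa_1 \exp(\mu_j - z'\alpha))^{-1/\kappa_1}], \\
& \Phi^{-1}[1 - (1 + \kappa_2 \exp(\xi_k - x'\beta - \delta y_1))^{-1/\kappa_2}];\ \theta \}.
\end{aligned} \tag{7}$$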
The specific forms of the remaining probabilities in the likelihood contribution Eq. (4) and for
alternative copulas can be derived in like manner, with the threshold parameters
µ’s and ξ’s changed accordingly.
4.3 Average treatment effects
An important reason for estimating a treatment effect model is calculation of average
treatment effects (ATEs). Unlike conventional models in which a treatment is specified as an
exogenous discrete variable, an endogenous treatment effect model accommodates endogeneity
of the treatment and non-random selection of individuals into the treated states, thus eliminating
potential sample selection bias and providing consistent estimates of the treatment effects and the
effects of other explanatory variables. Eqs. (1) and (2) give the marginal probabilities
$$\Pr(y_1 = h) = F_1(\mu_h - z'\alpha; \kappa_1) - F_1(\mu_{h-1} - z'\alpha; \kappa_1) \tag{8}$$
$$\Pr(y_2 = k) = F_2(\xi_k - x'\beta - \delta y_1) - F_2(\xi_{k-1} - x'\beta - \delta y_1). \tag{9}$$
Using the joint probability in Eq. (4) and the marginal probability in Eq. (8), we have the
conditional probability
$$\Pr(y_2 = k \mid y_1 = h) = \Pr(y_1 = h, y_2 = k) / \Pr(y_1 = h). \tag{10}$$
Using Eq. (10), the treatment effects can be calculated as
$$TE_{khg} = \Pr(y_2 = k \mid y_1 = h) - \Pr(y_2 = k \mid y_1 = g), \quad \text{for all } h > g,\ k = 1, \dots, K \tag{11}$$
which is the effect of being in smoking category h (relative to category g) on the
probability of being in the kth SAH category for all k. All ATEs are calculated by averaging the
component effects across the sample.
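The calculation in Eqs. (10) and (11) can be sketched given, for each observation, a table of joint probabilities from Eq. (4). The tables below are made-up numbers for two hypothetical observations with J = 2 smoking categories and K = 3 health categories; the function names are likewise illustrative.

```python
def conditional_probs(joint):
    """Eq. (10): turn joint[h][k] = Pr(y1 = h+1, y2 = k+1) into Pr(y2 | y1).

    Each row sum equals the marginal Pr(y1 = h).
    """
    cond = []
    for row in joint:
        marginal = sum(row)
        cond.append([p / marginal for p in row])
    return cond


def treatment_effects(joint, h, g):
    """Eq. (11): TE_{khg} for k = 1..K, category h versus g (1-based)."""
    cond = conditional_probs(joint)
    return [ph - pg for ph, pg in zip(cond[h - 1], cond[g - 1])]


def average_treatment_effects(joints, h, g):
    """Average the observation-level effects across the sample."""
    effects = [treatment_effects(joint, h, g) for joint in joints]
    n = len(effects)
    return [sum(e[k] for e in effects) / n for k in range(len(effects[0]))]


# Made-up joint probability tables (rows: y1 = 1, 2; columns: y2 = 1, 2, 3).
joint_a = [[0.10, 0.20, 0.10], [0.15, 0.25, 0.20]]
joint_b = [[0.20, 0.20, 0.10], [0.10, 0.15, 0.25]]

ate = average_treatment_effects([joint_a, joint_b], h=2, g=1)
```

Since each conditional distribution sums to one, the effects for a given pair (h, g) sum to zero across the K health categories: a higher probability of one health level must be offset by lower probabilities of the others.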
4.4 Marginal effects of explanatory variables
Drawing on the marginal probability in Eq. (9) and conditional probability in Eq. (10),
we calculate the marginal effects of exogenous variables on SAH probabilities by differentiating
(or differencing, in the case of a binary explanatory variable) the marginal probabilities
$\Pr(y_2 = 1)$ and $\Pr(y_2 = 4)$, and conditional probabilities $\Pr(y_2 = 1 \mid y_1 = 1)$, $\Pr(y_2 = 1 \mid y_1 = 4)$,
$\Pr(y_2 = 4 \mid y_1 = 1)$, and $\Pr(y_2 = 4 \mid y_1 = 4)$. For statistical inference, standard errors of all
marginal effects (and ATEs) are calculated by the delta method (Serfling, 1980).
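The delta method approximates the variance of a smooth function g of the estimated parameters by G'VG, where G is the gradient of g and V is the parameter covariance matrix. The sketch below uses a central-difference gradient, which is an assumption made for brevity (analytical derivatives would serve equally well); the logistic probability, parameter estimates and covariance matrix are made-up values.

```python
import math


def numerical_gradient(g, params, eps=1e-6):
    """Central-difference gradient of a scalar function g at params."""
    grad = []
    for i in range(len(params)):
        up, down = list(params), list(params)
        up[i] += eps
        down[i] -= eps
        grad.append((g(up) - g(down)) / (2.0 * eps))
    return grad


def delta_method_se(g, params, cov):
    """Standard error of g(params): sqrt(G' V G), with G the gradient of g."""
    grad = numerical_gradient(g, params)
    n = len(params)
    variance = sum(
        grad[i] * cov[i][j] * grad[j] for i in range(n) for j in range(n)
    )
    return math.sqrt(variance)


def prob_example(params):
    """Hypothetical probability: a logistic cdf evaluated at x = 1.5."""
    intercept, slope = params
    return 1.0 / (1.0 + math.exp(-(intercept + slope * 1.5)))


estimates = [-0.4, 0.3]                      # made-up parameter estimates
covariance = [[0.04, 0.01], [0.01, 0.09]]    # made-up covariance matrix
se = delta_method_se(prob_example, estimates, covariance)
```

For a linear function the approximation is exact, which gives a quick check: the function 2a + 3b with the covariance matrix above has variance 4(0.04) + 12(0.01) + 9(0.09) = 1.09.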
5. Empirical results
First, we explore a number of alternative modeling options. We estimate a Gaussian
binary treatment (smoking) effect model, assuming that SAH is a continuous variable (Barnow et
al., 1980). We also estimate an ordinary least-squares model with exogenous smoking, and two
ordered probit models with exogenous smoking (with the Gaussian and generalized log-Burr
distributions). These latter two models are restricted forms of the preferred Gaussian-Burr and
Frank-Burr models (by imposing zero error correlation). These alternative empirical models lead
to different results than those reported below. The most notable difference is insignificance of
the exogenous smoking variable in the last three models. The Gaussian binary treatment effect
model, though not directly comparable to the probability models considered here, did produce a
significant treatment effect of binary smoking but notably different results on some of the
explanatory variables (e.g., significance of age and its squared term in the SAH equation).
Complete results of these preliminary analyses are not presented due to space considerations but
are available upon request.
Our next task is to choose among alternative copulas and margins. As the models with
different copulas and margins are nonnested, model selection is accomplished with a nonnested
specification test procedure. Specifically, let ri and si be the maximum log likelihood contributions
of sample observation i for two competing specifications and define differences di = ri – si for i =
1,…,n with sample mean $\bar{d}$ and standard deviation $s_d$. Then, under the null hypothesis of no
difference between the two models, Vuong’s (1989, Eqs. (3.1), (4.2), (5.6)) standard normal statistic
is $z = n^{1/2} \bar{d} / s_d \sim N(0,1)$. The test results suggest that the Frank and Gaussian copulas perform
equally well, and that the generalized log-Burr margin fits the data better when either copula is
used. In sum, the Gaussian-Burr and Frank-Burr models perform equally well (z = 0.29, p-value
= 0.77), and both are preferable to the Gaussian-Gaussian (bivariate Gaussian) and
Frank-Gaussian models, all with z > 2.0 and p-value < 0.04. Most important is the finding that
the Gaussian-Gaussian model, used extensively in empirical studies with sample selection, is
rejected at the 5% level of significance. The Gaussian-Burr and Frank-Burr models produce
fairly close empirical estimates, in terms of average treatment effects and marginal effects of
explanatory variables, and we focus on results of the former for the remainder of the analysis.4
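The nonnested test statistic described above is simple to compute from observation-level log likelihood contributions. In the sketch below the contributions are made-up numbers, the function names are hypothetical, and the sample standard deviation (divisor n - 1) is used for $s_d$, which is an assumption; with large n the choice of divisor is immaterial.

```python
import math
from statistics import NormalDist, mean, stdev


def two_sided_p(z):
    """Two-sided p-value for a standard normal statistic."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def vuong_z(loglik_r, loglik_s):
    """z = sqrt(n) * mean(d) / sd(d), with d_i = r_i - s_i."""
    d = [r - s for r, s in zip(loglik_r, loglik_s)]
    n = len(d)
    return math.sqrt(n) * mean(d) / stdev(d)


# Made-up log likelihood contributions for two competing specifications.
loglik_r = [-1.0, -1.2, -0.8, -1.1]
loglik_s = [-1.1, -1.0, -0.9, -1.3]

z = vuong_z(loglik_r, loglik_s)
p = two_sided_p(z)
```

As a check on the numbers reported in the text, `two_sided_p(0.29)` is approximately 0.77, matching the p-value given for the comparison of the two preferred models.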
5.1 Maximum-likelihood estimates
Maximum-likelihood estimates for the Gaussian-Burr model are presented in Table 3.
The estimate of the error correlation is positive and significant at the 1% level. Both threshold
parameters are positive and significant at the 1% level; a negative threshold parameter estimate
would have suggested misspecification of the models. The skewness parameter for the cigarette
equation is not significantly different from zero at the 10% level of significance (suggesting the
extreme-value margin) but is significantly different from one (rejecting the logistic margin). The
skewness parameter for the SAH equation shows the opposite: it is significantly different
from zero at the 1% level of significance (rejecting the extreme-value margin) but is not
significantly different from (and is extremely close to) unity, suggesting the logistic distribution
is appropriate.
4 The full set of results for all econometric specifications is available from the authors.
Table 3 contains many interesting coefficients, but moving straight to the variable
relating to the focus of the paper, note that the effect of cigarette smoking, holding other factors
constant, is negative and significant on SAH at the 1% level. The statistical significance of
smoking in the SAH equation suggests that a misspecified model, such as the exogenous model referred to above,
can disguise the important effects of smoking on SAH, and highlights the importance of
accommodating endogeneity of smoking. We consider the effects of smoking on
SAH more carefully by turning to the average treatment effects.
5.2 Average treatment effects
Table 4 reports the ATEs, i.e., the average changes in the probability of each SAH level
for various smoking treatments. With only a few exceptions, the ATEs are significant at the
1% level. These results indicate the average effect of one level of smoking relative to another
level, on the probability of falling into a given SAH category. For example, in the upper left-
hand corner, the –5.71 estimate indicates that relative to being a non-smoker (smoking = 1), a
person who smokes 1 to 10 cigarettes (smoking = 2) is 5.71% less likely to be in poor self-
assessed health (SAH = 1). The diagonal elements show the effects between adjacent smoking
categories, so here, for example, a category 3 smoker is 2.52% less likely to be in poor self-
reported health than a category 2 smoker. Similarly, in the lower right-hand corner, a person who
smokes very heavily (smoking = 4) has a 21.64% higher chance of being in excellent self-
reported health (SAH = 4) than a category 3 smoker, and a 28.70% chance of being in the
excellent SAH category than a non-smoker.
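One way to read Table 4 is sketched below. It assumes that each ATE is the difference between sample-average predicted probabilities under two smoking levels, so that a contrast between any two levels can be recovered from the two contrasts against non-smokers, and so that the effects for a given treatment sum to zero across the four SAH categories. The dictionary holds the Table 4 estimates relative to non-smokers; the helper names are illustrative.

```python
# ATEs in percentage points relative to non-smokers (smoking = 1), from Table 4.
# Outer key: smoking level; inner key: SAH category.
ATE_VS_NONSMOKER = {
    2: {1: -5.71, 2: -7.19, 3: 5.84, 4: 7.06},
    3: {1: -8.23, 2: -13.58, 3: 7.05, 4: 14.76},
    4: {1: -10.33, 2: -22.18, 3: 3.82, 4: 28.70},
}


def contrast(level_high, level_low, sah):
    """ATE of level_high relative to level_low, derived from the non-smoker contrasts."""
    high = ATE_VS_NONSMOKER[level_high][sah]
    low = 0.0 if level_low == 1 else ATE_VS_NONSMOKER[level_low][sah]
    return round(high - low, 2)


def sum_over_sah(level):
    """Sum of the ATEs across SAH categories; near zero because probabilities sum to one."""
    return round(sum(ATE_VS_NONSMOKER[level].values()), 2)


# Poor health (SAH = 1): adjacent-category effects implied by the first column.
print(contrast(3, 2, 1))   # category 3 versus category 2
print(contrast(4, 3, 1))   # category 4 versus category 3
print([sum_over_sah(level) for level in (2, 3, 4)])
```

For SAH = 1 the derived contrasts agree with the reported entries of Table 4 up to rounding, which is a useful consistency check when reading the other columns.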
Our result showing a negative effect of smoking on the probability of reporting poor
health (SAH = 1) is similar to the findings by Ho et al. (2003), i.e., that those who had never
smoked had better perceived health than those currently smoking. However, at higher SAH
levels, our ATE results clearly suggest that the level of smoking increases the SAH status. For
our sample of Chinese men, those who smoke more report being healthier than those who smoke
less. Non-smokers may have quit smoking because they were in poor health, or they may have
never started smoking because of poor health. Another possibility is that heavy smokers are
falsely reporting their health as being good or excellent because they have adapted to
whatever negative effects smoking actually has on their health, and so report their health status
quite differently from those who never smoked or who fully understand the possible negative effects
of smoking. We cannot control for past health using panel data in the manner that Shmueli (1996)
did, but our results from the ATEs are consistent with his.
It is very likely that some smokers in our sample have quit smoking (former smokers)
because they had problems with their health. It is thus worthwhile to investigate the SAH
specifically for current smokers and for former smokers. We accomplish this by estimating a
special case of the same (Gaussian-Burr) model as presented in this paper, but with a binary
quitting treatment approach (J = 2 in Eq. (1)). We create a subsample of current and former
smokers, excluding non-smokers from this part of the analysis.5 The ATEs for quitting smoking,
also presented in Table 4, in fact do suggest that quitting has negative effects on the probabilities
of reporting low SAH (poor and fair) and a positive effect on the probability of reporting excellent
health. This highlights the importance of comparing not only current to non-smokers, but also
considering former smokers in their relation to current smokers.
We cannot read more specific causal stories into the ATEs because details on the underlying
nature of health status or the timing of starting or quitting smoking are not available in the data
used here. For example, with the exception of a variable indicating hypertension, we do not
know whether individuals who rate themselves as unhealthy do so because they have one or
5 We thank one of the reviewers for suggesting this analysis. Parameter estimates for the current and former smoker sample are available from the authors upon request.
more of the specific smoking-related diseases (cardiovascular disease, one of the associated
cancers, or non-fatal diseases mentioned in the introduction). The relationships between smoking
and the explanatory variables are next explored by examining the marginal effects, while
controlling for the SAH measure.
5.3 Marginal effects
The marginal effects of exogenous variables on the probabilities of falling into particular
SAH categories, conditioned on the smoking levels, are calculated at the sample means of all
explanatory variables, and the results are presented in Table 5. The large number of explanatory
variables in the model prohibits discussion of the marginal effects of every variable in this paper,
so we focus on the variables most important for the theme here; the conditional probabilities reflect
the strength of effects for non-smokers and heavy smokers. To begin, age has the usual effect on
health: older people are more (less) likely to be in poor (excellent) health, and the magnitudes
differ only slightly (relative to their standard errors) between non-smokers and heavy smokers.
Being in a larger household increases the chance of reporting
excellent health. There is evidence of regional differences, with residents in Shandong Province,
for instance, being healthier than residents in Liaoning (and many other provinces). Residing in
Guangxi has the opposite effect.
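The idea of a marginal effect evaluated at the sample means can be sketched as follows. This is a simplified illustration only: it uses a single-equation ordered logit rather than the joint Gaussian-Burr model, and the cut points, the index value at the sample means, and the income coefficient are all hypothetical numbers chosen for the example, not estimates from this paper.

```python
import math


def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))


def logistic_pdf(z):
    p = logistic_cdf(z)
    return p * (1.0 - p)


def category_probs(index, cuts):
    """P(y = k) for k = 1..K in an ordered logit with linear index x'b."""
    cumulative = [logistic_cdf(c - index) for c in cuts] + [1.0]
    return [hi - lo for lo, hi in zip([0.0] + cumulative[:-1], cumulative)]


def marginal_effects(beta_j, index, cuts):
    """dP(y = k)/dx_j = [f(c_{k-1} - x'b) - f(c_k - x'b)] * beta_j, with f = 0 at the ends."""
    density = [0.0] + [logistic_pdf(c - index) for c in cuts] + [0.0]
    return [(density[k] - density[k + 1]) * beta_j for k in range(len(cuts) + 1)]


CUTS = [0.0, 2.3, 4.8]    # hypothetical cut points for four ordered categories
INDEX_AT_MEANS = 2.9      # hypothetical value of x'b at the sample means
BETA_INCOME = 0.08        # hypothetical coefficient on income / 10000

effects = marginal_effects(BETA_INCOME, INDEX_AT_MEANS, CUTS)
# Reported in the same units as Table 5 (probabilities multiplied by 100).
print([round(100.0 * e, 2) for e in effects])
```

With a positive coefficient, the effect on the lowest category is negative and the effect on the highest category is positive, and the effects sum to zero across categories. The conditional effects in Table 5 additionally depend on the smoking level through the joint distribution of the two error terms.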
Government agencies naturally can influence income via tax and other policies, and this
may affect health for men in China. According to the marginal effects, higher household income
reduces the probability of self-reporting poor health (SAH = 1) and increases the probability of
being in excellent health (SAH = 4), both conditional and unconditional on smoking status.
Wealthier men report being healthier, but the marginal effect of income on the probability of
being in excellent health is larger for non-smokers (0.82 percentage points) than for heavy
smokers (0.50 percentage points). We also note that the income effects on the probability of poor
health, conditioned on smoking status (non-smoker or heavy smoker), are not significant.
Government policies can affect education levels, which are often found to play a role in
understanding health risks. Several different qualitative education categories are included in the
specification of the models, but for Chinese men, there is no significant marginal effect of
education on health. In other settings, the relationship between education and SAH can be significant. For
example, D'Hombres et al. (2010) find that tertiary (primary) levels of education in their sample
of households from eight transition countries of the former Soviet Union lead to higher (lower)
levels of SAH, as compared to secondary levels. Di Novi (2010) reports a positive effect of
college education on SAH as well in the US. It is possible that higher levels of education
(high school or more) are required before one sees a strong positive influence of years of
schooling on SAH. Thus, the insignificance that we find may be due to the fact that the
majority of our sample of men are educated at the junior high school level or less (79%), and only
6% have some college education or more.
Of considerable interest here are the drinking and physical activity variables. Physical
activity has a positive marginal effect on being in the excellent health category, and oddly, so
does drinking occasionally, up to drinking almost daily. Here again, interpretation is limited
because health is self-reported, and adaptation and perceptions could be factors in that reporting.
Note, however, that “frequently” here does not necessarily indicate abuse of alcohol, as moderate
drinking could be one drink per day. Contoyannis and Jones (2004) consider drinking behavior
described as “prudent” as a lifestyle that is also modeled as an endogenous variable in their
analysis, but find no strongly significant inverted relationship to social class: there is only a
hint that social class negatively influences drinking alcohol.
6. Conclusions
In this paper we have developed and estimated a model of the variation in SAH that
formally incorporates an ordinal endogenous smoking variable, applied to a sample of
Chinese men from nine provinces. The importance of this is, first and foremost, that the
estimated model of variation in SAH, itself assumed to be endogenous, also allows for
the possibility that smoking is an endogenous variable. Exogeneity of the smoking variable
has been maintained in many earlier attempts to explain behaviors. The treatment approach
we apply, in general, can alleviate some problems stemming from selection issues. Indeed,
results of our preliminary analysis demonstrate that failure to accommodate endogeneity of
smoking can disguise an important effect of smoking on SAH.
Though there have been some important studies of smoking in China, Hong Kong and
Taiwan, we add to the empirical literature on smoking in a developing country. Results based on
a one-year sample from the CHNS support the notion that health, as indicated by SAH, is
better for heavier smokers in this sample of Chinese men, after controlling for age and other
socio-demographic effects. Sick people in our sample may have quit smoking, which is what was found by Shmueli
(1996) for a sample of Israelis. Agencies that send messages about smoking risks may need to
work harder to reach those with lower education levels, as are predominant in our sample. The
CHNS data do not contain information on stopping and starting smoking over time, nor do we
have details on specific diseases of the sample units, so our interpretation is somewhat limited to
what can be drawn from this cross sectional analysis, using sub-categories such as current and
“former” smokers. However, our research here suggests that heavy smokers be targeted by
policy-makers, who should communicate to them that quitting smoking benefits
their health, even if they have smoked for a significant period of time. Policy-makers also need to
keep in mind that self-reported health for this group may have a complicated relationship to other factors.
As mentioned above, we neither use other years from the CHNS nor adopt a panel-data
technique because of econometric issues with a short panel and the type of limited dependent
variable models we have (Greene, 2004). Also at the technical/econometric level, the copula
approach we implement here allows for specification of the treatment effect model with more
flexible distributions for the error terms than those used in conventional treatment effect models.
The flexible model provides a convenient way to test other specifications against the
more conventional model. The Gaussian-Gaussian model, used in much of the sample selection
literature, is rejected.
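The role of the Gaussian copula can be illustrated with a small sketch of how a copula turns two ordinal margins into joint cell probabilities. The example is an assumption-laden illustration rather than the estimated model: it uses the unconditional marginal frequencies from Table 2 in place of the covariate-dependent generalized log-Burr margins, it plugs in the dependence estimate θ = 0.362 from Table 3 purely as an example value, and it evaluates the bivariate normal distribution function by Simpson's rule using only the standard library. All function names are illustrative.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def bivariate_normal_cdf(a, b, rho, steps=2000):
    """P(X <= a, Y <= b) for standard normals with correlation rho (Simpson's rule)."""
    lower = -8.0  # truncation point; the mass below it is negligible
    if a <= lower:
        return 0.0
    scale = math.sqrt(1.0 - rho * rho)
    width = (a - lower) / steps  # steps must be even for Simpson's rule

    def integrand(x):
        return STD_NORMAL.pdf(x) * STD_NORMAL.cdf((b - rho * x) / scale)

    total = integrand(lower) + integrand(a)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * integrand(lower + i * width)
    return total * width / 3.0


def gaussian_copula(u, v, rho):
    """C(u, v) = Phi2(Phi^{-1}(u), Phi^{-1}(v); rho), with the boundary cases handled."""
    if u <= 0.0 or v <= 0.0:
        return 0.0
    if u >= 1.0:
        return min(v, 1.0)
    if v >= 1.0:
        return u
    return bivariate_normal_cdf(STD_NORMAL.inv_cdf(u), STD_NORMAL.inv_cdf(v), rho)


def cell_probabilities(cum_u, cum_v, rho):
    """Joint probabilities of two ordinal outcomes from their cumulative margins."""
    grid_u = [0.0] + list(cum_u)
    grid_v = [0.0] + list(cum_v)
    c = [[gaussian_copula(u, v, rho) for v in grid_v] for u in grid_u]
    return [[c[j][k] - c[j - 1][k] - c[j][k - 1] + c[j - 1][k - 1]
             for k in range(1, len(grid_v))] for j in range(1, len(grid_u))]


# Cumulative marginal shares built from the row and column totals of Table 2.
CUM_SMOKING = [1224 / 2908, 1849 / 2908, 2704 / 2908, 1.0]
CUM_SAH = [168 / 2908, 1024 / 2908, 2462 / 2908, 1.0]
THETA = 0.362  # dependence estimate from Table 3, used here only as an example value

cells = cell_probabilities(CUM_SMOKING, CUM_SAH, THETA)
for row in cells:
    print([round(100.0 * p, 2) for p in row])
```

Setting the dependence parameter to zero reproduces the product of the margins, which is the sense in which the copula separates the choice of marginal distributions from the dependence between the two equations.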
A few final caveats pertain to the results we present here. First, we use
smoking as a variable that may affect health, formally treated as an endogenous variable, but
recognize that there may also be a fully simultaneous relationship between the SAH and smoking
(e.g., Blaylock & Blisard, 1992). A fully simultaneous relationship is not pursued here as a
simultaneous-equation model with dual observed discrete (binary or ordinal) variables is not
identified (Schmidt, 1981). Second, unlike some other models of smoking behavior, we have not
explored the role of several possible additional influences on smoking decisions. For example,
we have not included information about the subjective smoking-related mortality or morbidity
risks of the smokers and non-smokers (e.g., Viscusi, 1990), because the data set for this part of
China does not contain it. Third, we do not have indicators of perceived addiction, which may be
quite important in determining whether people currently smoke. However, to the extent that
subjective risks of harm or addiction are themselves endogenous variables that are functions of
other explanatory variables, such as gender and age, these other factors, which become
instruments, are indeed used in our models. Thus, whether our model being “incomplete” matters
in the usual econometric sense is not assured, but is a question worthy of further investigation.
Finally, a number of the regressors we do use, such as drinking and physical activity,
may be potentially endogenous. While the lack of viable instruments does not allow further
exploration of potential endogeneity of these variables, results of an alternative model without
these variables produce few discernable differences in the treatment effects and marginal effects
of other explanatory variables, based on the current sample. Further studies might consider
endogeneity of these additional variables, perhaps when a longer panel data sample becomes
available.
References
Au, D. W. H., Crossley, T. F., & Shellhorn, M. (2005). The effect of health changes and long-
term health on the work activity of older Canadians. Health Economics, 14(10), 999–
1018.
Baker, M., Stabile, M., & Deri, C. (2004). What do self-reported, objective measures of health
measure? Journal of Human Resources, 39(4), 1067–1093.
Barnow, B. S., Cain, G. G., & Goldberger, A. S. (1980). Issues in the analysis of selectivity bias.
In E. W. Stromsdorfer, & G. Farkas (Eds.), Evaluation studies review annual, 5 (pp. 43–
59). Beverley Hills: Sage.
Becker, G. S., Grossman, M., & Murphy, K. M. (1994). An empirical analysis of cigarette
addiction. American Economic Review, 84(3), 396–418.
Blaylock, J. R., & Blisard, W. N. (1992). Self-evaluated health status and smoking behaviour.
Applied Economics, 24(4), 429–435.
Burr, I. W. (1942). Cumulative frequency functions. The Annals of Mathematical Statistics,
13(2), 215–232.
Butler, J. S., Burkhauser, R. V., Mitchell, J. M., & Pincus, T. P. (1987). Measurement error in
self-reported health variables. Review of Economics and Statistics, 69(4), 644–650.
Cai, L., & Kalb, G. (2006). Health status and labour force participation: Evidence from
Australia. Health Economics, 15(3), 241–261.
Campolieti, M. (2002). Disability and the labor force participation of older men in Canada.
Labour Economics, 9(3), 405–432.
Case, A. C., & Paxson, C. (2005). Sex differences in morbidity and mortality. Demography,
42(2), 189–214.
Chaloupka, F. J. (1991). Rational addictive behavior and cigarette smoking. Journal of Political
Economy, 99(4), 722–742.
Chaloupka, F. J., & Warner, K. E. (2000). The economics of smoking. In A. J. Culyer, & J. P.
Newhouse (Eds.), Handbook of Health Economics, Vol. 1 (pp. 1539–1627). Amsterdam:
Elsevier.
Chapman, S., & Richardson, J. (1990). Tobacco excise and declining consumption: The case of
Papua New Guinea. American Journal of Public Health, 80(5), 537–540.
China Health and Nutrition Survey (CHNS) (2007). Design and methods. Chapel Hill, NC:
Carolina Population Center. http://www.cpc.unc.edu/projects/china/design/design.html
(Accessed April 1, 2008).
China Ministry of Health. (2007). China tobacco control report. Beijing, May.
Contoyannis, P., & Jones, A. M. (2004). Socio-economic status, health and lifestyle. Journal of
Health Economics, 23(5), 965–995.
Deaton, A. (1997). The analysis of household surveys. Baltimore: Johns Hopkins University
Press.
DeShazo, J. R., & Cameron, T. A. (2005). The effect of health status on willingness to pay for
morbidity and mortality risk reductions. California Center for Population Research. On-
Line Working Paper Series. Paper CCPR-050-05. URL
http://repositories.cdlib.org/ccpr/olwp/CCPR-050-05/ (Accessed March 8, 2009).
D'Hombres, B., Rocco, L., Suhrcke, M., & McKee, M. (2010). Does social capital determine
health? Evidence from eight transition countries. Health Economics, 19(1), 56–74.
Di Novi, C. (2010). The influence of traffic-related pollution on individuals' life-style: results from
the BRFSS. Health Economics. in press. DOI: 10.1002/hec.1550.
Dwyer, D. S., & Mitchell, O. S. (1999). Health problems as determinants of retirement: Are self-
rated measures endogenous? Journal of Health Economics, 18(2), 173–193.
Etilé, F., & Milcent, C. (2006). Income-related reporting heterogeneity in self-assessed health:
Evidence from France. Health Economics, 15(9), 965–981.
Greene, W. (2004). The behavior of the maximum likelihood estimator of limited dependent
variable models in the presence of fixed effects. Econometrics Journal, 7(1), 98–119.
Groot, W. (2000). Adaptation and scale reference bias in self-assessments of quality of life.
Journal of Health Economics, 19(3), 403–420.
Ho, S. Y., Lam, T. H., Fielding, R., & Janus, E. D. (2003). Smoking and perceived health in
Hong Kong Chinese. Social Science & Medicine, 57(9), 1761–1770.
Hsieh, C. (1998). Health risk and the decision to quit smoking. Applied Economics, 30(6), 795–
804.
Idler, E. L. (2003). Discussion: Gender differences in self-rated health, in mortality, and in the
relationship between the two. Gerontologist, 43(3), 372–375.
Idler, E. L., & Benyamini, Y. (1997). Self-rated health and mortality: A review of twenty-seven
community studies. Journal of Health and Social Behavior, 38(1), 21–37.
Jones, A. M. (1994). Health, addiction, social interaction and the decision to quit smoking.
Journal of Health Economics, 13(1), 93–110.
Jones, A. M. (1996). Smoking cessation and health: A response. Journal of Health Economics,
15(6), 755–759.
Kasteridis, P. P., Munkin, M. K., & Yen, S. T. (2010). A binary-ordered probit model of
cigarette demand. Applied Economics, 42(4), 413–426.
Lance, P. M., Akin, J. S., Dow, W. H., & Loh, C. P. (2004). Is cigarette smoking in poorer
nations highly sensitive to price? Evidence from Russia and China. Journal of Health
Economics, 23(1), 173–189.
Lawless, J. F. (2003). Statistical models and methods for lifetime data. New York: John Wiley &
Sons.
Lee, L-F. (1983). Generalized econometric models with selectivity. Econometrica, 51(2),
507–513.
Liu, J., & Hsieh, C. (1995). Risk perception and smoking behaviour: Empirical evidence from
Taiwan. Journal of Risk and Uncertainty, 11(2), 139–157.
Mackay, J. (1997). Beyond the clouds – Tobacco smoking in China. Journal of the American
Medical Association, 278(18), 1531–1532.
Mao, Z., & Xiang, J. (1997). Demand for cigarettes and factors affecting demand: A cross-
sectional survey. Chinese Healthcare Industry Management, 5, 227–229.
Mathers, C. D., & Loncar, D. (2006). Projections of global mortality and burden of disease from
2002 to 2030. PLoS Medicine, 3(11), e442.
Moore, M. J., & Zhu, C. W. (2000). Passive smoking and health care: Health perceptions myth
vs. health care reality. Journal of Risk and Uncertainty, 21(2–3), 283–310.
Nelsen, R. B. (2006). An introduction to copulas, (2nd Ed.). New York: Springer.
Qian, J., Cai, M., Gao, J., Tang, S., Xu, L., & Critchley, J. A. (2010). Trends in smoking and
quitting in China from 1993 to 2003: National Health Service Survey data. Bulletin of the
World Health Organization, 88, in press. DOI: 10.2471/BLT.09.064709.
Samet, J. M. (2001). The risks of passive and active smoking. In P. Slovic (Ed.), Smoking: Risk,
perception and policy (pp. 3–28). London: Sage Publications.
Schmidt, P. (1981). Constraints on the parameters in simultaneous Tobit and probit models. In C.
F. Manski, & D. McFadden (Eds.), Structural analysis of discrete data with econometric
applications (Chap. 12, pp. 422–434). Cambridge, MA: MIT Press.
Serfling, R. J. (1980). Approximation theorems of mathematical statistics. New York: John
Wiley & Sons.
Shmueli, A. (1996). Smoking cessation and health: A comment. Journal of Health Economics,
15(6), 751–754.
Shmueli, A. (2002). Reporting heterogeneity in the measurement of health and health-related
quality of life. Pharmacoeconomics, 20(6), 405–412.
Sloan, F., Smith, V. K., & Taylor, D. H. (2003). The smoking puzzle: Information, risk
perception, and choice. Cambridge: Harvard University Press.
van Doorslaer, E., & Jones, A. M. (2003). Inequalities in self-reported health: Validation of a
new approach to measurement. Journal of Health Economics, 22(1), 61–87.
Viscusi, W. K. (1990). Do smokers underestimate risks? Journal of Political Economy, 98(6),
1253–1269.
Viscusi, W. K. (1992). Smoking: Making the risky decision. New York: Oxford University Press.
Viscusi, W. K. (1995). Cigarette taxation and the social consequences of smoking. In J. M.
Poterba (Ed.), Tax policy and the economy (pp. 51–101). Cambridge, MA: MIT Press.
Vuong, Q. H. (1989). Likelihood ratio tests for model selection and nonnested hypotheses.
Econometrica, 57(2), 307–333.
WHO. (2009). WHO report on the global tobacco epidemic, 2009: Implementing smoke-free
environments. Geneva: World Health Organisation. URL
http://www.who.int/tobacco/mpower/en/ (Accessed April 23, 2010).
Xu, X., Hu, T., & Keeler, T. (1998). Optimal cigarette taxation: Theory and estimation. Working
Paper, University of California at Berkeley.
Yen, S. T., Yan, Y., & Liu, X. (2009). Alcohol consumption by men in China: A non-Gaussian
censored system approach. China Economic Review, 20(2), 162–173.
Table 1
Variable definitions and sample statistics (n = 2908)
Variable Definition Mean
Endogenous variables (ordinal)
Smoking Cigarettes smoked per day: recoded as 1 = nonsmoker; 2 = 1–10 cigarettes, 3 = 11–20 cigarettes, 4 = > 20 cigarettes 2.01 (1.00)
SAH Self-assessed (reported) health status: 1 = poor, 2 = fair, 3 = good, 4 = excellent 2.74 (0.78)
Continuous explanatory variables
Cigarette price Local free-market cigarette prices (RMB per pack) 4.85 (2.75)
Age Age in years 47.76 (15.02)
HH income Household income in RMB per year 12818.26 (18373.03)
HH size Household size 2.18 (1.50)
Binary explanatory variables (1 = yes; 0 = no)
< Primary No school or did not graduate from primary school 0.22
Primary Graduated from primary school (reference) 0.20
Junior high Graduated from junior high school 0.37
Senior high Graduated from senior high school 0.14
≥ College Some college or more 0.06
Employed Employed for wages 0.25
Self-employed Self-employed 0.43
Unemployed Unemployed 0.05
Home maker Home maker (reference) 0.05
Retired Retired 0.12
Student Student, unable to work, or others 0.10
Heilongjiang Resides in Heilongjiang Province 0.12
Jiangsu Resides in Jiangsu Province 0.12
Shandong Resides in Shandong Province 0.10
Henan Resides in Henan Province 0.12
Hubei Resides in Hubei Province 0.10
Hunan Resides in Hunan Province 0.11
29
Guangxi Resides in Guangxi Zhuang Autonomous Region 0.11
Guizhou Resides in Guizhou Province 0.11
Liaoning Resides in Liaoning Province (reference) 0.11
Urban Resides in central city 0.41
Married Married 0.88
Divorced Divorced, separated, or widowed 0.02
Single Never married (reference) 0.10
Drink frequently Drinks almost everyday 0.21
Drink less frequently Drinks 1–4 times a week 0.25
Drink occasionally Drinks 1–2 times a month 0.09
Drink rarely Drinks < 1 time a month 0.46
Minority National minority 0.09
Tea Normally drinks tea 0.45
Active Participates in physical activities 0.54
Hypertension Hypertension 0.09
Standard deviations in parentheses.
Table 2
Two-way frequency distributions of smoking and SAH status
SAH
Smoking 1 2 3 4 Total
1 86 354 590 194 1224
(2.96) (12.17) (20.29) (6.67)
2 35 180 311 99 625
(1.20) (6.19) (10.69) (3.40)
3 37 250 435 133 855
(1.27) (8.60) (14.96) (4.57)
4 10 72 102 20 204
(0.34) (2.48) (3.51) (0.69)
Total 168 856 1438 446 2908
Relative frequencies (%) in parentheses.
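The relative frequencies in Table 2 can be reproduced directly from the cell counts, and the same counts give the distribution of SAH within each smoking category. The short sketch below does both; it is purely descriptive and makes no adjustment for covariates or for the endogeneity of smoking, so it should not be read as a treatment effect. The helper names are illustrative.

```python
# Cell counts from Table 2: smoking level -> counts for SAH = 1, 2, 3, 4.
COUNTS = {
    1: [86, 354, 590, 194],
    2: [35, 180, 311, 99],
    3: [37, 250, 435, 133],
    4: [10, 72, 102, 20],
}
TOTAL = sum(sum(row) for row in COUNTS.values())


def relative_frequency(smoking, sah):
    """Cell count as a percentage of the full sample (the figures in parentheses)."""
    return 100.0 * COUNTS[smoking][sah - 1] / TOTAL


def share_within_smoking(smoking, sah):
    """Percentage of a smoking category that reports a given SAH level."""
    return 100.0 * COUNTS[smoking][sah - 1] / sum(COUNTS[smoking])


print(TOTAL)
print(round(relative_frequency(1, 1), 2))
for level in COUNTS:
    print(level, [round(share_within_smoking(level, sah), 1) for sah in (1, 2, 3, 4)])
```

Comparing these raw within-category shares with the ATEs in Table 4 shows how much the picture can change once covariates and the endogeneity of smoking are accommodated.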
Table 3
Maximum-likelihood estimates of ordinal probability models with ordinal treatment: Gaussian
copula with generalized log-Burr margins
Smoking SAH
Variable Estimate S.E. Estimate S.E.
Constant –1.334*** 0.287 4.756*** 0.421
Income / 10000 0.041*** 0.015 0.078*** 0.024
Age / 10 0.896*** 0.124 –0.240 0.209
Age² / 1000 –0.916*** 0.127 –0.157 0.203
HH size 0.022 0.017 0.081*** 0.028
Heilongjiang –0.458*** 0.101 0.158 0.170
Jiangsu 0.015 0.119 0.235 0.157
Shandong –0.223** 0.100 0.439*** 0.166
Henan –0.282*** 0.098 –0.123 0.161
Hubei –0.095 0.101 –0.177 0.164
Hunan 0.254*** 0.094 0.295* 0.163
Guangxi –0.021 0.101 –0.517*** 0.166
Guizhou 0.092 0.105 0.068 0.166
< Primary 0.144* 0.075 –0.069 0.119
Junior high –0.052 0.064 0.083 0.102
Senior high –0.127 0.084 0.061 0.131
≥ College –0.191* 0.106 –0.034 0.170
Employed 0.051 0.124 0.005 0.189
Self-employed 0.180 0.116 0.160 0.178
Unemployed 0.100 0.157 0.185 0.232
Retired –0.248* 0.132 –0.046 0.197
Student –0.065 0.132 –0.281 0.190
Married 0.164 0.109 0.203 0.167
Divorced –0.236 0.185 –0.253 0.271
Urban –0.115** 0.059 0.169* 0.093
Minority –0.212** 0.102 0.114 0.160
Hypertension 0.191** 0.078
Cigarette price –0.040 0.011
Tea –0.030 0.075
Active 0.244*** 0.080
Drink frequently 0.521*** 0.101
Drink less freq. 0.336*** 0.092
Drink occasionally 0.330** 0.131
Cigarettes –0.627*** 0.158
µ2, ξ2 0.687*** 0.035 2.276*** 0.117
µ3, ξ3 1.859*** 0.114 4.770*** 0.251
κi 0.115 0.084 1.001*** 0.143
θ 0.362*** 0.094
Log likelihood –6574.464
Asterisks *** indicate statistical significance at the 1% level, ** at the 5% level, and * at the
10% level.
Table 4
Average treatment effects of smoking and quitting on the probabilities of health: Gaussian-Burr model
SAH = 1 SAH = 2 SAH = 3 SAH = 4
Smoking 1 2 3 1 2 3 1 2 3 1 2 3
Effects of smoking
2 –5.71*** –7.19*** 5.84*** 7.06***
(2.13) (1.39) (1.78) (1.72)
3 –8.23*** –2.52*** –13.58*** –6.39*** 7.05*** 1.22*** 14.76*** 7.70***
(2.73) (0.61) (2.96) (1.58) (1.53) (0.40) (4.19) (2.48)
4 –10.33*** –4.62*** –2.10*** –22.18*** –15.00*** –8.60*** 3.82** –2.02 –3.24 28.70*** 13.94*** 21.64***
(2.91) (0.81) (0.22) (4.56) (3.19) (1.62) (1.67) (3.27) (2.92) (8.84) (4.66) (7.14)
Effects of quitting
–5.99*** –13.71*** 1.45 18.25*
(2.29) (5.30) (2.73) (10.17)
All probabilities are multiplied by 100. Asymptotic standard errors are in parentheses. Asterisks *** indicate statistical significance at
the 1% level, ** at the 5% level, and * at the 10% level.
Table 5
Marginal effects of selected explanatory variables on the probabilities of health: Gaussian copula with
generalized log-Burr margins
Probability of SAH = 1 conditional on / Probability of SAH = 4 conditional on
Variable None Smoking = 1 Smoking = 4 None Smoking = 1 Smoking = 4
Continuous explanatory variables
Income / 10000 –0.20*** –0.29 –0.23 1.42*** 0.82*** 0.50*
(0.07) (0.21) (0.22) (0.48) (0.29) (0.30)
Age / 10 1.00*** 1.85*** 1.86*** –7.06*** –5.04*** –4.30***
(0.18) (0.24) (0.33) (0.95) (0.56) (0.50)
HH size –0.21*** –0.33** –0.30* 1.47*** 0.93*** 0.68**
(0.08) (0.14) (0.16) (0.54) (0.34) (0.30)
Cigarette price –0.08* –0.13 0.18* 0.33**
(0.04) (0.33) (0.10) (0.17)
Binary explanatory variables
Heilongjiang –0.39 –1.50** –1.91*** 2.92 4.27** 6.10***
(0.44) (0.67) (0.67) (3.02) (2.02) (1.97)
Shandong –0.96** –2.12*** –2.20*** 8.67*** 7.56*** 7.34***
(0.41) (0.66) (0.67) (3.18) (2.31) (2.00)
Hunan –0.69* –0.86 –0.62 5.64* 2.79 1.23
(0.39) (0.70) (0.73) (3.29) (2.07) (1.64)
Guangxi 1.78*** 3.12*** 3.45*** –7.89*** –4.92*** –4.19***
(0.64) (1.04) (1.24) (2.79) (1.62) (1.41)
Urban –0.43* –0.99** –1.11*** 3.08* 2.68** 2.77***
(0.25) (0.41) (0.42) (1.68) (1.16) (1.04)
Minority –0.28 –0.88 –1.07* 2.11 2.50 3.18*
(0.38) (0.61) (0.56) (3.00) (2.12) (1.88)
Hypertension 0.38* 0.61** –0.81** –1.46*
(0.20) (0.30) (0.40) (0.75)
Active –0.64*** –1.15*** –1.14*** 4.40*** 3.08*** 2.57***
(0.22) (0.38) (0.40) (1.52) (1.03) (0.85)
Drink frequently –1.31*** –2.36*** –2.33*** 9.62*** 6.85*** 5.68***
(0.28) (0.43) (0.55) (2.14) (1.51) (1.12)
Drink less freq. –0.91*** –1.64*** –1.66*** 5.91*** 4.08*** 3.43***
(0.26) (0.43) (0.50) (1.76) (1.19) (0.95)
Drink occasionally –0.90*** –1.62*** –1.64*** 5.81** 4.01** 3.38**
(0.34) (0.58) (0.62) (2.51) (1.76) (1.43)
All marginal effects on probabilities are multiplied by 100. Asymptotic standard errors are in
parentheses. Asterisks *** indicate statistical significance at the 1% level, ** at the 5% level, and * at
the 10% level. Marginal effects for insignificant variables are not presented.