Exploration of the Relationship Between Children's Asthma Rates and Indoor Air
Pollution Using Housing Complaints as Proxy Indicator
Applied Data Science Prof. Stanislav Sobolevsky
Team members: Alexis SotoColorado Alejandro Rey Porcel Cindy Yuning Liu
Tania Vara Mazariegos
I. Introduction:
Indoor air pollution has a significant impact on public health because people spend about 90% of
their time indoors, and the air in their homes affects them most (Tox Town, 2015). According to the
Environmental Protection Agency (2015), indoor pollutant levels can be 2 to 5 times higher than
outdoor levels. Currently, there are two main challenges in measuring indoor air pollution.
First, collecting air quality information from every home threatens household privacy.
Second, it is not feasible to purchase the expensive sensor equipment that would be required. As a
result, there is currently no practical indicator or monitoring system for indoor air pollution.
This project examines whether housing complaints, through their relationship with asthma rates,
can serve as a proxy indicator for the presence and severity of indoor air pollution. Such an
indicator would help public health officials and city planners identify the areas or communities
that suffer from bad housing conditions.
II. Why Choose Housing Complaints as a Proxy Indicator?
Asthma rates are not constant across geographic areas and demographic groups.
Environmental and social factors such as indoor air pollution contribute to increased asthma
(Akinbami, 2009). Improved housing conditions, which in turn mean improved indoor air
quality, have been shown to reduce asthma incidents (Beck, 2013). Because there is a known
relationship between housing conditions (indoor air pollution) and asthma, we can use data on
housing complaints and asthma hospitalizations to get a picture of indoor air quality. The
research uses only the types of housing complaints that would affect indoor air quality. We
also assume some level of homogeneity in housing conditions within the same community. In
addition, only the children's asthma hospitalization rate is used as a proxy for air pollution,
because children tend to be more sensitive to changes in the air environment.
III. Usability of this Research:
The main users of this research will be the New York City Department of Health and Mental
Hygiene and the Department of Housing Preservation and Development. The index will be a tool
that allows public health officials to target specific communities for education and landlord
compliance. It should be considered an initial attempt and a foundation for measuring indoor air
pollution, to be improved in future research. At this moment patient home addresses are
not available, so we cannot target communities smaller than a Zip Code.
IV. Hypothesis:
1. Null Hypothesis: No correlation between housing complaints of different types and asthma
incidents.
2. Alternative Hypothesis: There is a correlation between housing complaints of different types
and asthma incidents.
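As a rough illustration of how the null hypothesis could be tested for one complaint type, the sketch below converts a sample Pearson correlation into a t statistic with n - 2 degrees of freedom. The function name, the example correlation of 0.5, and the rounded critical value are assumptions made for illustration; only the count of 175 Zip Codes comes from this report.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for testing H0: rho = 0, given sample correlation r and n pairs."""
    return r * math.sqrt((n - 2) / (1 - r * r))

# n = 175 Zip Codes is the sample size reported in this study;
# r = 0.5 is a hypothetical correlation, not a result from the data.
t = correlation_t_statistic(0.5, 175)

# Approximate two-sided 5% critical value for 173 degrees of freedom (assumed, rounded).
T_CRITICAL = 1.97
print(f"t = {t:.2f}, reject H0: {abs(t) > T_CRITICAL}")
```

The null hypothesis would be rejected for a complaint type when the absolute t statistic exceeds the critical value.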
V. Research Data Description:
A. Housing Complaints:
We use the New York City 311 open data. Housing complaints from all of New York
City were collected for 2010 to 2013. All housing complaint types were analyzed, but only
complaints related to indoor air quality were considered relevant to the analysis. The types of
housing complaints considered were:
■ Complaints Related to Mold
    ■ Mold
    ■ Damp Spot
    ■ Failure to Retain Water/Improper Drainage
    ■ Sewer
    ■ Slow Leak
    ■ Plumbing Work-Improper
    ■ Plumbing Work-Defective
■ Complaints Related to Indoor Temperature
    ■ Boiler Deactivate
    ■ Heat
    ■ Heat Related
■ Complaints Related to Indoor Air Quality
    ■ Failure to Maintain
    ■ Pest
    ■ Gas
    ■ Ventilation System
    ■ Vent/Exhaust Illegal
B. Hospitalization Rate of Asthma: We use children's asthma hospitalization rate data for
2010 to 2013 from the New York State Department of Health.
C. Population Data by Zip Code: The U.S. Census Bureau only offers population data at the
census tract level, so we use an open-source dataset that offers population data at the Zip Code
level.
VI. Analysis and Modeling:
1. Pearson Correlation: We find that different complaint types have different correlation coefficients with
asthma incidents. The complaint types most highly correlated with asthma rates are
non-construction, heating, appliance, boilers, elevators, electrical, paint/plaster, building and
plumbing. Among them, we picked the complaint types that could plausibly cause asthma for
our regression model.
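The correlation step can be sketched as follows. This is a minimal illustration, not the code used in the study: the function name and the per-Zip-Code values are hypothetical, and only two complaint types are shown.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-Zip-Code counts (illustrative values, not the study data).
asthma_cases = [12, 40, 95, 150, 310]
complaints = {
    "heating": [300, 900, 2500, 4100, 8000],
    "elevators": [50, 10, 40, 20, 30],
}

# Rank complaint types by the strength of their correlation with asthma cases.
ranking = sorted(
    ((name, pearson_r(counts, asthma_cases)) for name, counts in complaints.items()),
    key=lambda item: abs(item[1]),
    reverse=True,
)
for name, r in ranking:
    print(f"{name}: r = {r:.3f}")
```

Sorting by absolute value keeps strongly negative correlations near the top as well, which matters when the goal is to shortlist candidate predictors rather than to assume a direction.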
2. Summary Statistics
In total, there are 175 Zip Codes. The mean number of asthma cases per Zip Code in New York is
148, but the count varies widely, from a minimum of 2 cases to a maximum of 804 cases.
On average per Zip Code, there are 42 boiler complaints, 685 general construction complaints,
4430 heating complaints, 1986 paint complaints and 2347 plumbing complaints.
Visualization per Zip Code:
The boroughs of The Bronx and Brooklyn have the highest asthma rates in children.
The complaints about heating are concentrated in Inwood, Fort George, Washington Heights and
Harlem in Manhattan, and also in several Zip Codes around The Bronx and Brooklyn.
The complaints about paint and plaster are concentrated principally in Zip Codes around
The Bronx.
The complaints about plumbing are concentrated in several Zip Codes around The Bronx and
Brooklyn.
The visualisation of asthma incidents and housing complaints shows a spatial correlation pattern.
3. Regression Model:
To find the relationship between children's asthma cases and housing complaints, a
regression was conducted. We used backward stepwise regression because it yields the
best-performing combination of housing complaints for predicting asthma cases. The
regression showed that the combination of heating, plumbing, and paint/plaster has the best
R-squared, at 0.845. This means that those three housing complaint types can be used to predict
how many children's asthma hospitalizations will occur, as shown in the table below.
These results are based only on the number of complaints about each housing condition; they do
not indicate which factor has the most impact in causing an asthma episode in a child.
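A minimal sketch of backward stepwise selection is given below. The report does not state which elimination criterion was used, so this example assumes adjusted R-squared: a predictor is dropped whenever removing it does not lower adjusted R-squared. All function names and the synthetic data are hypothetical and are not the study's code or results.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def adjusted_r2(columns, y):
    """Fit y ~ intercept + columns by ordinary least squares; return adjusted R^2."""
    n, k = len(y), len(columns)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k + 1)] for p in range(k + 1)]
    xty = [sum(r[p] * y[i] for i, r in enumerate(rows)) for p in range(k + 1)]
    beta = solve(xtx, xty)
    mean_y = sum(y) / n
    ss_res = sum((y[i] - sum(b * v for b, v in zip(beta, rows[i]))) ** 2 for i in range(n))
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    return 1 - (ss_res / ss_tot) * (n - 1) / (n - k - 1)

def backward_stepwise(predictors, y):
    """Drop one predictor at a time while doing so does not lower adjusted R^2."""
    kept = dict(predictors)
    best = adjusted_r2(list(kept.values()), y)
    while len(kept) > 1:
        trials = {
            name: adjusted_r2([c for other, c in kept.items() if other != name], y)
            for name in kept
        }
        candidate = max(trials, key=trials.get)
        if trials[candidate] < best:
            break  # every possible removal makes the fit worse
        best = trials[candidate]
        del kept[candidate]
    return list(kept), best

# Synthetic data: asthma depends on heating, paint and plumbing but not on boiler.
rng = random.Random(42)
names = ("boiler", "heating", "paint", "plumbing")
predictors = {name: [rng.uniform(0, 10) for _ in range(60)] for name in names}
asthma = [
    3 * h + 1.5 * p + 2 * pl + rng.gauss(0, 1)
    for h, p, pl in zip(predictors["heating"], predictors["paint"], predictors["plumbing"])
]
kept, best = backward_stepwise(predictors, asthma)
print(kept, round(best, 3))
```

Other common criteria, such as removing the predictor with the largest p-value, follow the same loop structure with a different scoring function.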
VII. Conclusion:
The research shows that the boroughs of The Bronx and Brooklyn have the highest rates of
asthma among children and the highest numbers of plumbing, paint and heating complaints.
The results demonstrate that there is a correlation between children's asthma and related
housing complaints. More specific studies are needed to explore what type of relationship is
occurring. Future studies should use patient addresses, so that the housing conditions of each
patient can be studied specifically. Using specific addresses would also allow factors such as
building age to be included in the analysis. Discovering a proxy indicator for asthma is of
great importance for addressing the high asthma rates in certain communities and groups. This
is not just a public health issue but also a social inequality issue and an economic issue,
because the groups who suffer from asthma are mostly minorities and the economically
disadvantaged.
VIII. Appendix: Procedures:
I. Data Munging:
The data we used was contained in 3 datasets:
Asthma rates per Zip Code: This dataset has information on children who were hospitalized, per
Zip Code. The number of children was given as a rate per 100,000 inhabitants.
311 complaints data: This is a very large dataset listing every complaint in the year, with the
type of complaint and the Zip Code. We aggregated it by Zip Code, counted each type of
complaint, and joined the result with the asthma rates per Zip Code.
Population per Zip Code: Because the number of children hospitalized due to asthma was given as
a rate, we needed the population to obtain the actual number of children hospitalized per Zip Code.
Finally we got one dataset containing the Zip Code, the number of complaints per type
(each type in one column) and the number of children who were hospitalized.
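The munging steps above can be sketched as follows. The field names, Zip Codes and values are hypothetical stand-ins rather than the real dataset schemas, and the conversion assumes the rate is expressed per 100,000 of the same population that the population file reports.

```python
from collections import Counter, defaultdict

# Hypothetical records; field names and values are illustrative only.
complaints = [
    {"zip": "10453", "type": "HEATING"},
    {"zip": "10453", "type": "HEATING"},
    {"zip": "10453", "type": "PLUMBING"},
    {"zip": "11212", "type": "PAINT - PLASTER"},
]
asthma_rate_per_100k = {"10453": 500.0, "11212": 250.0}
population = {"10453": 80000, "11212": 84000}

# 1. Count complaints per Zip Code and complaint type.
counts = defaultdict(Counter)
for record in complaints:
    counts[record["zip"]][record["type"]] += 1

# 2. Convert each rate per 100,000 into an estimated number of cases.
# 3. Join counts and cases on Zip Code, one column per complaint type.
types = sorted({record["type"] for record in complaints})
table = []
for zip_code in sorted(asthma_rate_per_100k):
    if zip_code not in population:
        continue  # no population figure, so the rate cannot be converted
    cases = asthma_rate_per_100k[zip_code] * population[zip_code] / 100_000
    row = {"zip": zip_code, "asthma_cases": cases}
    row.update({t: counts[zip_code][t] for t in types})
    table.append(row)

for row in table:
    print(row)
```

Zip Codes that have a rate but no population figure are skipped here; whether the study dropped or imputed such rows is not stated.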
II. Pearson Correlation
III. Procedures “Regression”:
1. Divide the data into training and validation sets
2. Use the function regress for
3. Now we do the backward regression, which takes out x1 (Boiler) and x2 (General construction & Plumbing).
4. The previous code also gives us the final combination of housing complaints: x3 (Heating), x4 (Paint/Plaster), and x5 (Plumbing).
5. Regression Fit Line: The line showed a very good fit for predicted versus actual asthma rate.
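Steps 1, 2 and 5 can be illustrated with the sketch below. To keep it short it fits a single predictor instead of the three selected ones, uses synthetic data, and assumes a 70/30 split; the report does not state the split ratio, and the function names are hypothetical.

```python
import random

def fit_line(x, y):
    """Least-squares slope and intercept for y ~ intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sum(
        (a - mean_x) ** 2 for a in x
    )
    return slope, mean_y - slope * mean_x

def r_squared(actual, predicted):
    """Coefficient of determination of predictions against actual values."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

rng = random.Random(0)
# Synthetic (heating complaints, asthma cases) pairs for 175 hypothetical Zip Codes.
data = []
for _ in range(175):
    heating = rng.uniform(0, 9000)
    data.append((heating, 0.03 * heating + rng.gauss(0, 20)))

# Step 1: shuffle, then hold out 30% of the Zip Codes for validation (assumed ratio).
rng.shuffle(data)
cut = int(0.7 * len(data))
train, validation = data[:cut], data[cut:]

# Step 2: fit on the training rows only.
slope, intercept = fit_line([h for h, _ in train], [c for _, c in train])

# Step 5: compare predicted with actual cases on the held-out rows.
predicted = [intercept + slope * h for h, _ in validation]
validation_r2 = r_squared([c for _, c in validation], predicted)
print(f"validation R^2 = {validation_r2:.3f}")
```

Scoring on held-out Zip Codes gives a fairer picture of predictive quality than the R-squared of the training fit, which tends to be optimistic.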
IX. References:
“The 2010 US Census Population By Zip Code (Totally Free).” The Splitwise Blog, 2013.
http://blog.splitwise.com/2013/09/18/the2010uscensuspopulationbyzipcodetotallyfree/.
Akinbami, L. J., J. E. Moorman, P. L. Garbe, and E. J. Sondik. “Status of Childhood Asthma in
the United States, 1980-2007.” Pediatrics 123, no. Supplement (January 2009).
doi:10.1542/peds.2008-2233C.
“Basic Information.” Basic Information. Accessed November 13, 2015.
http://www3.epa.gov/air/basic.html#indoor.
Beck, A. F., J. M. Simmons, H. S. Sauers, K. Sharkey, M. Alam, C. Jones, and R. S. Kahn.
“Connecting At-Risk Inpatient Asthmatics to a Community-Based Program to Reduce
Home Environmental Risks: Care System Redesign Using Quality Improvement
Methods.” Hospital Pediatrics 3, no. 4 (January 2013): 326–34.
doi:10.1542/hpeds.2013-0047.
“Housing Complaints | NYC Open Data.” NYC Open Data. Accessed November 10, 2015.
https://nycopendata.socrata.com/socialservices/housingcomplaints/i3j2v52s.
“Indoor Air Pollution Worse Than Outdoor.” Dr Axe, 2010.
http://draxe.com/indoorairpollutionworsethanoutdoor/.
“Tox Town Indoor Air Text Version.” Tox Town Indoor Air Text Version. Accessed
November 12, 2015. http://toxtown.nlm.nih.gov/text_version/locations.php?id=136.