BR Assignment Report Full
Uploaded by: thanobol-cenphakdee
8/6/2019 BR Assignment Report Full
Housing Price Prediction Model Business Research Assignment
Full time MBA 2009 Utrecht
Date of submission: 17th November 2009
Word count: 900 words (excluding Appendix)
FTMBA09, UB Number: 09028224
Table of Contents
Executive Summary
Introduction
Objective
Data and Methodology
Data Analysis
Linear Multiple Regression Analysis
Conclusion
Recommendations
Appendix
Table of illustrations
Tables
Table 1: Descriptive statistics of each variable from district A and B
Table 2: Correlations table
Table 3: Multiple Regression Analysis of Price, H_Size, Age, District, H_Dist and Age_Dist
Charts
Chart 1: Box plot of price in district A and B
Chart 2: Histogram of residual value from regression model 3
Chart 3: Residual value plot against predicted value from regression model 3
Chart 4: Scatter plot, Y axis = Price, X axis = H_Size, Z axis = Age separated by district
Housing Price Prediction Model November 17, 2009
Real Estate Association
Executive Summary
This report develops a reliable housing price prediction model that forecasts selling prices in Districts A and B using multiple linear regression. The model explains 88.6% of the total variation in price within the relevant range of house size and house age.
Introduction
Objective
To develop a regression model as a tool for predicting the selling price of residential properties in both districts of the city.
Data and Methodology
Several real estate agents and property assessors were interviewed to identify the major explanatory variables that might affect property prices. The following independent variables were considered:
Quantitative variables: H_Size (house size in square feet), L_Size (lot size in acres), Age (house age in years), Attract (attractiveness rating of the property from 0 to 100, the higher the better), P_Tax (property tax of the prior year in dollars), N_Rooms (number of bedrooms in the house)
Qualitative variable: District (the district in the city: 0 for district A, 1 for district B)
The data consist of 625 properties sold in the past 3 months. We used multiple linear regression with a dummy variable (District) and two interaction terms (H_Dist = H_Size*District, Age_Dist = Age*District) to find the forecasting model that best describes the relationship between the independent variables and Price (the dependent variable) in each district.
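The dummy variable and interaction terms described above can be sketched as follows. This is a minimal illustration, not the SPSS procedure used in the report: the `Property` class, the `design_row` helper and the example values are hypothetical, and only the variable names come from the text.

```python
from dataclasses import dataclass


@dataclass
class Property:
    """Hypothetical container for one sold property (illustration only)."""
    h_size: float   # house size in square feet
    age: float      # house age in years
    district: int   # dummy variable: 0 = district A, 1 = district B


def design_row(p: Property) -> dict:
    """Return the regressors for one property, including interaction terms."""
    return {
        "H_Size": p.h_size,
        "Age": p.age,
        "District": p.district,
        # Interaction terms are zero in district A and equal to the
        # original variable in district B.
        "H_Dist": p.h_size * p.district,
        "Age_Dist": p.age * p.district,
    }


# Made-up example values, not taken from the data set.
row_a = design_row(Property(h_size=2000.0, age=10.0, district=0))
row_b = design_row(Property(h_size=2000.0, age=10.0, district=1))
print(row_a)
print(row_b)
```

Because the interaction terms vanish when District = 0, their coefficients measure how much the H_Size and Age slopes differ in district B relative to district A.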
Data Analysis
Chart 1: Box plot of price in district A and B
The median property price in district B is higher than in district A, and neither district shows any outliers, which supports using the price data without exclusions.
District A (District = 0)
District B (District = 1)
Table 1: Descriptive statistics of each variable from district A and B
Table 1 shows that the average housing price in district B (USD 453,980.94) is higher than in district A (USD 226,174.77). Consequently, the average property tax is also higher in district B (USD 5,300.90) than in district A (USD 1,655.65). Additionally, the average house size and lot size in district B are 4,055.05 square feet and 1.4568 acres respectively, larger than those in district A, which are 2,032.47 square feet and 0.6608 acres respectively. The average age of a house in district B is 47.28 years, older than in district A, where it is 12.57 years. Average attractiveness and number of bedrooms are not significantly different between the two districts.
Table 2: Correlations table
As shown in Table 2, Attract and N_Rooms have no significant relationship with Price. However, H_Size, L_Size, Age and P_Tax are significantly correlated with each other, which indicates multicollinearity among the independent variables.
Linear Multiple Regression Analysis
Table 3: Multiple Regression Analysis of Price, H_Size, Age, District, H_Dist and Age_Dist
Model 1:
Price = 50215.092 + 94.322 H_Size - 1241.796 Age + 6.994 H_Dist + 1087.526 Age_Dist
p values: (0.000) (0.000) (0.000) (0.023) (0.001)
R2 = 0.886, Adjusted R2 = 0.885
Std. Error of the Estimate = 46669.901
Model 2:
Price = 47888.970 + 95.212 H_Size - 1211.297 Age + 4.856 H_Dist + 1029.883 Age_Dist + 8885.618 District
p values: (0.000) (0.000) (0.000) (0.384) (0.004) (0.646)
R2 = 0.886, Adjusted R2 = 0.885
Std. Error of the Estimate = 46699.619
Model 3:
Price = 42025.525 + 98.102 H_Size - 1212.041 Age + 1025.601 Age_Dist + 22962.084 District
p values: (0.000) (0.000) (0.000) (0.004) (0.031)
R2 = 0.886, Adjusted R2 = 0.885
Std. Error of the Estimate = 46690.584
All models have the same R2 and adjusted R2. Model 1 has the lowest Std. Error of the Estimate. However, model 1 violates the principle of marginality: it is not practical to stipulate and fit a model that includes interaction terms but eliminates the main effect of the dummy variable. Model 2 is dropped because the H_Dist and District terms are not statistically significant at the 5% level (p values of 0.384 and 0.646 for the coefficients of the H_Dist and District terms). Therefore, model 3 is the most appropriate forecasting model. Firstly, the p values of all coefficients show that every regressor has a significant effect on Price. Secondly, even though the VIFs of Age and Age_Dist are greater than 10, which indicates collinearity between them, this is acceptable because Age_Dist is the interaction term of Age and District. Lastly, residual analysis of model 3 shows no evidence that the assumptions of normality, constant variance and independence of errors are violated (see Charts 2 and 3).
Chart 2: Histogram of residual value from regression model 3
Chart 3: Residual value plot against predicted value from regression model 3
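For context on the VIF threshold of 10 mentioned above: the variance inflation factor of a regressor is 1 / (1 - R2_j), where R2_j is the R2 obtained when that regressor is regressed on all the other regressors. The sketch below assumes this standard definition; the function names are illustrative and no VIF values from the report are reproduced.

```python
def vif(r2_j: float) -> float:
    """Variance inflation factor for a regressor whose auxiliary R2 is r2_j."""
    if not 0.0 <= r2_j < 1.0:
        raise ValueError("r2_j must lie in [0, 1)")
    return 1.0 / (1.0 - r2_j)


def r2_from_vif(v: float) -> float:
    """Auxiliary R2 implied by a given VIF (inverse of vif)."""
    if v < 1.0:
        raise ValueError("a VIF cannot be smaller than 1")
    return 1.0 - 1.0 / v


# A VIF of 10 means the other regressors explain about 90% of the
# variation in this regressor.
print(r2_from_vif(10.0))
```

An interaction term such as Age_Dist is built directly from Age, so a high VIF for this pair is expected by construction.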
Appendix
1 Define objective
To develop a regression model as a tool for predicting the selling price of residential properties in both districts of the city.
2 Specify model
Using linear multiple regression model (1 dependent variable and many independent variables)
3 Collect data
The data consists of 625 properties sold in the past 3 months both in District A and District B.
3.1 Dependent variable
Price (House selling price in USD)
3.2 Initial independent variables
Quantitative
H_Size (House size in square feet)
L_Size (Lot size in acres)
Age (House age in years)
Attract (An attractiveness rating of the property ranging from 0 to 100, the higher the better)
P_Tax (Property tax of the prior year in dollars)
N_Rooms (Number of bedrooms in the house)
Qualitative
District (The district in the city: 0 for district A, 1 for district B)
4 Descriptive Data Analysis
Figure 1: Box plot of Price
There are no outliers in Price. The median price in District B is higher than in District A.
Figure 2: Box plot of all quantitative variables (panels: H_Size, L_Size, Age, Attract, P_Tax, N_Rooms)
There are no outliers in any of the independent variables. H_Size, L_Size, Age and P_Tax show the same box plot pattern.
District A (District = 0)
District B (District = 1)
Figure 3: Descriptive statistics of each variable from district A and B
The average housing price in district B (USD 453,980.94) is higher than in district A (USD 226,174.77). Consequently, the average property tax is also higher in district B (USD 5,300.90) than in district A (USD 1,655.65). Additionally, the average house size and lot size in district B are 4,055.05 square feet and 1.4568 acres respectively, larger than those in district A, which are 2,032.47 square feet and 0.6608 acres respectively. The average age of a house in district B is 47.28 years, older than in district A, where it is 12.57 years. Average attractiveness and number of bedrooms are not significantly different between the two districts.
Figure 4: Correlation between each of variables in both districts
Attract and N_Rooms have no significant relationship with Price. However, H_Size, L_Size, Age and P_Tax are significantly correlated with each other, which indicates multicollinearity among the independent variables.
5 Estimate unknown parameters and evaluate model
We have one qualitative independent variable, District. Therefore, we add District as a dummy variable to the multiple linear regression model and create 2 interaction terms (H_Dist = H_Size*District, Age_Dist = Age*District). Attract and N_Rooms are eliminated because they have no relationship with Price (from the correlation analysis). We decided not to add P_Tax because the price must be known before the tax is paid, so it is not suitable as an input to a price prediction model.
Figure 5: Statistical results from SPSS
R2 = 0.886, Adjusted R2 = 0.885 and Standard Error of the Estimate = 46733.169
F test (Overall test)
H0: all slope coefficients = 0
F = 799.862, p value = 0.000, which is less than 0.05 (5% significance level)
We reject the null hypothesis. We are 95% confident that there is a significant linear relationship between the independent variables and the dependent variable.
T test (Individual test)
We fail to reject H0: β2 = 0, t = 0.334, p value = 0.739, which is more than 0.05. At the 5% significance level there is no evidence of a linear relationship between L_Size and Price.
Now, we eliminate L_Size from the initial model.
Figure 6: Statistical results from SPSS
R2 = 0.886, Adjusted R2 = 0.885 and Standard Error of the Estimate = 46699.169
F test (Overall test)
H0: all slope coefficients = 0
F = 961.192, p value = 0.000, which is less than 0.05 (5% significance level)
We reject the null hypothesis. We are 95% confident that there is a significant linear relationship between the independent variables and the dependent variable.
T test (Individual test)
We fail to reject H0: β4 = 0, t = 0.872, p value = 0.384, which is more than 0.05. At the 5% significance level there is no evidence of a linear relationship between H_Dist and Price.
Even though we fail to reject H0: β3 = 0, t = 0.460, p value = 0.646, which is more than 0.05, we keep District in our model because it is the dummy variable. It is not practical to stipulate and fit a model that includes interaction terms but eliminates the main effect of the dummy variable.
Now, we eliminate H_Dist from the model.
Figure 7: Statistical results from SPSS
R2 = 0.886, Adjusted R2 = 0.885 and Standard Error of the Estimate = 46690.584
F test (Overall test)
H0: all slope coefficients = 0
F = 1201.765, p value = 0.000, which is less than 0.05 (5% significance level)
We reject the null hypothesis. We are 95% confident that there is a significant linear relationship between the independent variables and the dependent variable.
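As a rough cross-check of the overall F test, the F statistic and adjusted R2 can be recomputed from R2, the sample size and the number of regressors. The sketch below assumes the standard formulas F = (R2 / k) / ((1 - R2) / (n - k - 1)) and Adjusted R2 = 1 - (1 - R2)(n - 1) / (n - k - 1), with n = 625 properties and k = 4 regressors for the final model. The function names are illustrative. Because R2 is only reported to three decimals, the recomputed F is close to, but not identical with, the SPSS value.

```python
def f_statistic(r2: float, n: int, k: int) -> float:
    """Overall F statistic for a regression with k regressors and n cases."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R2, which penalises R2 for the number of regressors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


N_PROPERTIES = 625   # sample size stated in the report
R2_REPORTED = 0.886  # rounded R2 reported for every model

# k = 4 regressors in the final model: H_Size, Age, Age_Dist, District.
print(f_statistic(R2_REPORTED, N_PROPERTIES, 4))
print(adjusted_r2(R2_REPORTED, N_PROPERTIES, 4))
```

The same functions with k = 5 and k = 6 give values near the F statistics reported for the two earlier models, which is consistent with those models having five and six regressors.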
T test (Individual test)
We reject all null hypotheses (H0: β0 = 0, H0: β1 = 0, H0: β2 = 0, H0: β3 = 0, H0: β4 = 0). All p values are less than 0.05. We are 95% confident that there is a significant linear relationship between each independent variable and Price.
6 Prediction model
Price = 42025.525 + 98.102 H_Size - 1212.041 Age + 1025.601 Age_Dist + 22962.084 District
Prediction equation for District A (District = 0):
Price = 42025.525 + 98.102 H_Size - 1212.041 Age
Prediction equation for District B (District = 1):
Price = 64987.609 + 98.102 H_Size - 186.44 Age
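The prediction model can be applied with a few lines of code. This is a sketch under stated assumptions, not the report's own tooling: the function name `predict_price` is hypothetical, and the coefficient signs (negative on Age, positive elsewhere) are an assumption. They were chosen because they reproduce the District B intercept (42025.525 + 22962.084 = 64987.609), the District B Age slope (1212.041 - 1025.601 = 186.44) and, approximately, the district average prices from the descriptive statistics.

```python
# Model 3 coefficients as reported; signs are an assumption (see lead-in).
INTERCEPT = 42025.525
B_H_SIZE = 98.102
B_AGE = -1212.041
B_AGE_DIST = 1025.601
B_DISTRICT = 22962.084


def predict_price(h_size: float, age: float, district: int) -> float:
    """Predicted selling price in USD for district 0 (A) or 1 (B)."""
    if district not in (0, 1):
        raise ValueError("district must be 0 (A) or 1 (B)")
    return (
        INTERCEPT
        + B_H_SIZE * h_size
        + B_AGE * age
        + B_AGE_DIST * age * district  # interaction term Age_Dist
        + B_DISTRICT * district
    )


# Evaluate at the district averages reported in the descriptive statistics.
print(predict_price(2032.47, 12.57, 0))  # compare with USD 226,174.77
print(predict_price(4055.05, 47.28, 1))  # compare with USD 453,980.94
```

As stated in the executive summary, the model should only be used within the relevant range of house size and house age observed in the data.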