Analisis Regresi Linier Dan Logistik
8/9/2019 Analisis Regresi Linier Dan Logistik
26/05/1
Analisis Regresi Linier dan Logistik (Linear and Logistic Regression Analysis)
By:
Nurita Andayani
Introduction
• Difference between chi-square and regression: the chi-square test of independence determines whether a statistical relationship exists between two variables. It tells us whether there is such a relationship, but not what that relationship is. Regression and correlation analyses show how to determine both the nature and the strength of a relationship between two variables.
• Regression analysis is a body of statistical methods dealing with the formulation of mathematical models that depict relationships among variables, and the use of these modeled relationships for the purpose of prediction and other statistical inferences.
• The word “regression” was first used in its present technical context by Sir Francis Galton, who analyzed the heights of sons and the average heights of their parents.
Models
• The independent or controlled variable is also called the predictor variable and is denoted by x. The effect or response variable is denoted by y.
• If the relation between y and x is exactly a straight line, then the variables are connected by the formula:
y = α + β x
where α indicates the intercept of the line with the y axis and β represents the slope of the line, or the change in y per unit change in x.
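[Figure: the line y = α + β x plotted against x, showing the intercept α on the y axis, the slope β per unit (1) increase in x, and an observed value yi set against the line value α + β xi at x = xi]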
Statistical Model
Yi = α + βxi + ei , i = 1, …, n
Where :
a) x1, x2, …, xn are the set values of the controlled variable x that the experimenter has selected for the study.
b) e1, e2, …, en are the unknown error components that are superimposed on the true linear relation. These are unobservable random variables, which we assume are independently and normally distributed with a mean of zero and an unknown variance of σ².
c) The parameters α and β, which together locate the straight line, are unknown.
Basic Notations
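$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$
$$S_x^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2$$
$$S_y^2 = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - n\bar{y}^2$$
$$S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}$$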
Example
• Zippy Cola is studying the effect of its latest advertising campaign. People chosen at random were called and asked how many cans of Zippy Cola they had bought in the past week and how many Zippy Cola advertisements they had either read or seen in the past week.
X (number of ads)    3   7   4   2   0   4   1   2
Y (cans purchased)  11  18   9   4   7   6   3   8
Least Squares Regression Line
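• Least squares regression line:
$$\hat{y} = \hat{\alpha} + \hat{\beta} x$$
• Least squares estimate of α:
$$\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$$
• Least squares estimate of β:
$$\hat{\beta} = \frac{S_{xy}}{S_x^2}$$
• The residual sum of squares, or the sum of squares due to error, is:
$$SSE = \sum_{i=1}^{n} (y_i - \hat{\alpha} - \hat{\beta} x_i)^2 = S_y^2 - \hat{\beta}^2 S_x^2$$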
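As a rough illustration, the least squares formulas can be applied to the Zippy Cola data from the example slide. This is only a sketch in plain Python: the function name `least_squares` and the variable names are assumptions made here, not part of the slides.

```python
# Zippy Cola data from the example slide.
ads = [3, 7, 4, 2, 0, 4, 1, 2]       # x: advertisements read or seen
cans = [11, 18, 9, 4, 7, 6, 3, 8]    # y: cans purchased


def least_squares(x, y):
    """Return (alpha_hat, beta_hat, sse) for the fitted line y = alpha + beta * x.

    Hypothetical helper written for this illustration only.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)                         # S_x^2
    s_yy = sum((yi - y_bar) ** 2 for yi in y)                         # S_y^2
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))   # S_xy
    beta_hat = s_xy / s_xx                  # slope estimate
    alpha_hat = y_bar - beta_hat * x_bar    # intercept estimate
    sse = s_yy - beta_hat ** 2 * s_xx       # residual sum of squares
    return alpha_hat, beta_hat, sse


alpha_hat, beta_hat, sse = least_squares(ads, cans)
print(f"y_hat = {alpha_hat:.3f} + {beta_hat:.3f} x, SSE = {sse:.3f}")
```

For these eight observations the fitted slope is about 1.71 cans per additional advertisement, with an intercept of about 3.33.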
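Properties of the Least Squares Estimators
a) The least squares estimators are unbiased; that is, $E(\hat{\alpha}) = \alpha$ and $E(\hat{\beta}) = \beta$.
b) $\mathrm{Var}(\hat{\alpha}) = \sigma^2\left(\frac{1}{n} + \frac{\bar{x}^2}{S_x^2}\right)$ and $\mathrm{Var}(\hat{\beta}) = \frac{\sigma^2}{S_x^2}$.
c) The distributions of $\hat{\alpha}$ and $\hat{\beta}$ are normal with means of α and β, respectively; the standard deviations are the square roots of the variances given in b).
d) s² = SSE/(n − 2) is an unbiased estimator of σ², that is, $E(s^2) = \sigma^2$. Also, (n − 2)s²/σ² is distributed as χ² with d.f. = n − 2, and it is independent of $\hat{\alpha}$ and $\hat{\beta}$.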
Inference about α
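$$H_0: \alpha = \alpha_0 \quad \text{vs} \quad H_1: \alpha \neq \alpha_0$$
The test is based on
$$t = \frac{\hat{\alpha} - \alpha_0}{s\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_x^2}}}, \qquad \text{d.f.} = n - 2$$
p% confidence interval for α:
$$CI: \hat{\alpha} \pm t_{(1-p)/2}\; s\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_x^2}}$$
where $t_{(1-p)/2}$ is the upper (1 − p)/2 point of the t distribution with n − 2 d.f., with p written as a proportion.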
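The test statistic and confidence interval for α can be sketched on the Zippy Cola data as follows. The hypothesised value `ALPHA_0 = 0` and the 90% level are assumptions chosen for illustration, and the critical value 1.943 is the tabulated upper 5% point of the t distribution with 6 d.f., entered by hand rather than computed.

```python
import math

# Zippy Cola data from the example slide.
ads = [3, 7, 4, 2, 0, 4, 1, 2]
cans = [11, 18, 9, 4, 7, 6, 3, 8]
n = len(ads)

x_bar = sum(ads) / n
y_bar = sum(cans) / n
s_xx = sum((x - x_bar) ** 2 for x in ads)
s_yy = sum((y - y_bar) ** 2 for y in cans)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ads, cans))

beta_hat = s_xy / s_xx
alpha_hat = y_bar - beta_hat * x_bar
sse = s_yy - beta_hat ** 2 * s_xx
s = math.sqrt(sse / (n - 2))             # estimate of sigma

# Standard error of alpha_hat: s * sqrt(1/n + x_bar^2 / S_x^2).
se_alpha = s * math.sqrt(1 / n + x_bar ** 2 / s_xx)

ALPHA_0 = 0.0                            # assumed null value, for illustration
t_stat = (alpha_hat - ALPHA_0) / se_alpha

T_TABLE_90 = 1.943                       # table value: upper 5% point, t with 6 d.f.
ci_low = alpha_hat - T_TABLE_90 * se_alpha
ci_high = alpha_hat + T_TABLE_90 * se_alpha

print(f"t = {t_stat:.3f} with {n - 2} d.f.")
print(f"90% CI for alpha: ({ci_low:.3f}, {ci_high:.3f})")
```

Here |t| is below the table value and the interval contains 0, so at this level the data do not rule out α = 0.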
Checks on The Straight Line Model
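$$y_i = (\hat{\alpha} + \hat{\beta} x_i) + (y_i - \hat{\alpha} - \hat{\beta} x_i)$$
Observed y value = value explained by linear relation + residual, or deviation from linear relation
$$S_y^2 = \hat{\beta}^2 S_x^2 + SSE$$
Total SS of y = SS explained by linear relation + residual SS (unexplained)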
Anova for checking regression model
Source       Sum of Squares   d.f.    Mean Squares        F
Regression   SSR              1       MSR = SSR/1         MSR/MSE
Error        SSE              n – 2   MSE = SSE/(n – 2)
Total        SST              n – 1
Inference for regression model
$$H_0: \beta = 0 \quad \text{vs} \quad H_1: \beta \neq 0$$
Rejection region (with significance level α):
$$R: F > F_{\alpha;\,(1,\,n-2)}$$
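The ANOVA quantities and the F test can be illustrated on the Zippy Cola data. This is a sketch only: the 5% significance level is an assumption, and the critical value 5.99 is the tabulated upper 5% point of F with (1, 6) d.f., entered by hand.

```python
# Zippy Cola data from the example slide.
ads = [3, 7, 4, 2, 0, 4, 1, 2]
cans = [11, 18, 9, 4, 7, 6, 3, 8]
n = len(ads)

x_bar = sum(ads) / n
y_bar = sum(cans) / n
s_xx = sum((x - x_bar) ** 2 for x in ads)
s_yy = sum((y - y_bar) ** 2 for y in cans)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ads, cans))
beta_hat = s_xy / s_xx

sst = s_yy                       # total sum of squares, d.f. = n - 1
ssr = beta_hat ** 2 * s_xx       # explained by the linear relation, d.f. = 1
sse = sst - ssr                  # residual (unexplained), d.f. = n - 2

msr = ssr / 1
mse = sse / (n - 2)
f_stat = msr / mse

F_TABLE_05 = 5.99                # table value: upper 5% point of F(1, 6)
reject_h0 = f_stat > F_TABLE_05

print(f"SSR = {ssr:.3f}, SSE = {sse:.3f}, SST = {sst:.3f}")
print(f"F = {f_stat:.3f}, reject H0 (beta = 0): {reject_h0}")
```

With F of roughly 9.75 against a table value of 5.99, H0: β = 0 is rejected at the assumed 5% level.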
The coefficient of determination
• The sample coefficient of determination is developed from the relationship between two kinds of variation: variation of the Y values in a data set around:
– The fitted regression line
– Their own mean
$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$
$$0 \le R^2 \le 1 \quad \text{or} \quad 0\% \le R^2 \le 100\%$$
R² = 1 (100%): perfectly fitted regression line. R² = 0: unfitted regression model.
The correlation coefficient
• The correlation coefficient (r) indicates the direction of the relationship between the two variables X and Y.
• If an inverse relationship exists (that is, if Y decreases as X increases), then r will fall between 0 and −1.
• If there is a direct relationship (if Y increases as X increases), then r will be a value within the range 0 to 1.
$$r = \frac{S_{xy}}{\sqrt{S_x^2 \cdot S_y^2}}$$
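As an illustration, the coefficient of determination and the correlation coefficient can both be computed for the Zippy Cola data. The sketch below also checks the standard identity r² = R² for simple linear regression; the variable names are assumptions made here.

```python
import math

# Zippy Cola data from the example slide.
ads = [3, 7, 4, 2, 0, 4, 1, 2]
cans = [11, 18, 9, 4, 7, 6, 3, 8]
n = len(ads)

x_bar = sum(ads) / n
y_bar = sum(cans) / n
s_xx = sum((x - x_bar) ** 2 for x in ads)
s_yy = sum((y - y_bar) ** 2 for y in cans)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ads, cans))
beta_hat = s_xy / s_xx

# Coefficient of determination: R^2 = SSR / SST.
ssr = beta_hat ** 2 * s_xx
sst = s_yy
r_squared = ssr / sst

# Correlation coefficient: r = S_xy / sqrt(S_x^2 * S_y^2).
r = s_xy / math.sqrt(s_xx * s_yy)

print(f"R^2 = {r_squared:.4f}, r = {r:.4f}")
```

The positive r reflects a direct relationship: people who saw more advertisements tended to buy more cans, with about 62% of the variation in purchases explained by the fitted line.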
Exercise
• PUSKESMAS “PANCORAN MAS” (a community health center) wants to know the relationship between patients' age and blood pressure. Ten patients were sampled, with the following results:
Age              38   36   72   42   68   63   49   56   60   55
Blood pressure  115  118  160  140  152  149  145  147  155  150
a) Build the regression model.
b) If a patient's age is 40, predict the blood pressure.
c) Test the regression model you have built.
d) Test whether the parameters α = 0 and β = 0.
e) Construct 90% confidence intervals for α and β.
f) Compute the coefficients of determination and correlation, and explain what they mean.
What is Logistic Regression?
– A form of regression that allows the prediction of discrete variables by a mix of continuous and discrete predictors.
– Addresses the same questions that discriminant function analysis and multiple regression do, but with no distributional assumptions on the predictors (the predictors do not have to be normally distributed, linearly related or have equal variance in each group).
What is Logistic Regression?
– Logistic regression is often used because the relationship between a discrete variable and a predictor is non-linear.
• Example from the text: the probability of heart disease changes very little with a ten-point difference among people with low blood pressure, but a ten-point change can mean a drastic change in the probability of heart disease in people with high blood pressure.
Assumptions
• Absence of multicollinearity
• No outliers
• Independence of errors – assumes a between-subjects design. There are other forms if the design is within subjects.
Background
– Odds – like probability. Odds are usually written in a form such as “4 to 1 against”, which is equivalent to 1 out of five, or .20 probability, or 20% chance, etc.
• The problem with probabilities is that they are
non-linear
• Going from .10 to .20 doubles the probability, but
going from .80 to .90 barely increases the
probability.
Background
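• Odds ratio – the probability over 1 – the probability: the probability of winning over the probability of losing. A .20 probability equates to an odds ratio of .20/.80 = .25. (Strictly, p/(1 – p) is the odds; the term “odds ratio” is more commonly reserved for the ratio of two odds.)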
Background
• Logit – this is the natural log of p/(1 – p), the quantity these slides call an odds ratio; since p/(1 – p) is strictly the odds, the logit is a log odds. The logit scale is linear and functions much like a z-score scale.
Background
LOGITS ARE CONTINUOUS, LIKE Z SCORES
p = 0.50, then logit = 0
p = 0.70, then logit = 0.85
p = 0.30, then logit = -0.85
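The three logit values above can be checked directly from the definition, logit = ln(p / (1 − p)). The helper name `logit` below is an assumption made for this small sketch.

```python
import math


def logit(p):
    """Natural log of p / (1 - p), for a probability strictly between 0 and 1."""
    return math.log(p / (1 - p))


for p in (0.30, 0.50, 0.70):
    print(f"p = {p:.2f}, logit = {logit(p):+.4f}")
```

The values are symmetric around p = 0.50: probabilities equally far above and below one half give logits of equal size and opposite sign.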
Plain old regression
• Y = A BINARY RESPONSE (DV)
– 1 = POSITIVE RESPONSE (success), with probability P
– 0 = NEGATIVE RESPONSE (failure), with probability Q = (1 – P)
• MEAN(Y) = P, observed proportion of successes
• VAR(Y) = PQ, maximized when P = .50; variance depends on mean (P)
• XJ = ANY TYPE OF PREDICTOR: continuous, dichotomous, polytomous
Plain old regression
$$Y \mid X = B_0 + B_1 X_1 + \varepsilon$$
• and it is assumed that errors are normally distributed, with mean = 0 and constant variance (i.e., homogeneity of variance)
Plain old regression
• an expected value is a mean, so
$$E(Y \mid X) = B_0 + B_1 X_1$$
• The predicted value equals the proportion of observations for which Y|X = 1; P is the probability of Y = 1 (a success) given X, and Q = 1 – P (a failure) given X.
$$\hat{Y} = \hat{P}(Y = 1 \mid X)$$
An alternative – the ogive function
• An ogive function is a curved s-shaped
function and the most common is the
logistic function which looks like:
The logistic function
$$\hat{Y}_i = \frac{e^{u}}{1 + e^{u}}$$
• Where Y-hat is the estimated probability that the ith case is in a category and u is the regular linear regression equation:
$$u = A + B_1 X_1 + B_2 X_2 + \dots + B_K X_K$$
The logistic function
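$$\hat{\pi}_i = \frac{e^{b_0 + b_1 X_1}}{1 + e^{b_0 + b_1 X_1}}$$
Here $\hat{\pi}_i$ is the estimated probability for the ith case, the quantity written Y-hat earlier.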
The logistic function
• Change in probability is not constant (linear) with constant changes in X
• This means that the probability of a success (Y = 1) given the predictor variable (X) is a non-linear function, specifically a logistic function
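A small numerical sketch makes the non-constant change visible. The coefficients `B0 = -4.00` and `B1 = 0.05` and the chosen X values are illustrative assumptions, not estimates from any data in these slides.

```python
import math

B0 = -4.00   # assumed regression constant, for illustration only
B1 = 0.05    # assumed slope, for illustration only


def logistic(x, b0=B0, b1=B1):
    """Probability of success e^u / (1 + e^u) with u = b0 + b1 * x."""
    u = b0 + b1 * x
    return math.exp(u) / (1 + math.exp(u))


# The same 10-unit increase in X, taken at a low and at a high starting point.
change_low = logistic(40) - logistic(30)
change_high = logistic(90) - logistic(80)

print(f"X: 30 -> 40 changes P by {change_low:.4f}")
print(f"X: 80 -> 90 changes P by {change_high:.4f}")
```

Under these assumed coefficients the same step in X moves the probability almost three times as much near the middle of the curve as it does in the lower tail.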
Logistic Function
• Constant regression constant with different slopes
– v2: b0 = -4.00, b1 = 0.05 (middle)
– v3: b0 = -4.00, b1 = 0.15 (top)
– v4: b0 = -4.00, b1 = 0.025 (bottom)
[Figure: curves V2, V3 and V4 plotted against V1; horizontal axis from 30 to 100, vertical axis from 0.0 to 1.0]
Logistic Function
• Constant slopes with different regression constants
– v2: b0 = -3.00, b1 = 0.05 (top)
– v3: b0 = -4.00, b1 = 0.05 (middle)
– v4: b0 = -5.00, b1 = 0.05 (bottom)
[Figure: curves V2, V3 and V4 plotted against V1; horizontal axis from 30 to 100, vertical axis from 0.0 to 1.0]
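The two families of curves can be tabulated from the coefficients listed on these slides. In the sketch below the grid of X values (30 to 100 in steps of 10) is an assumption read off the figure axes, and the helper names are hypothetical.

```python
import math


def logistic(x, b0, b1):
    """Probability e^u / (1 + e^u) with u = b0 + b1 * x."""
    u = b0 + b1 * x
    return math.exp(u) / (1 + math.exp(u))


# Coefficients (b0, b1) as listed on the two slides.
different_slopes = {"v2": (-4.00, 0.05), "v3": (-4.00, 0.15), "v4": (-4.00, 0.025)}
different_constants = {"v2": (-3.00, 0.05), "v3": (-4.00, 0.05), "v4": (-5.00, 0.05)}

grid = list(range(30, 101, 10))   # assumed X values, matching the axis range


def tabulate(curves):
    """Evaluate every curve on the grid, rounded to three decimals."""
    return {
        name: [round(logistic(x, b0, b1), 3) for x in grid]
        for name, (b0, b1) in curves.items()
    }


slope_table = tabulate(different_slopes)
constant_table = tabulate(different_constants)

for title, table in (("different slopes", slope_table),
                     ("different constants", constant_table)):
    print(title)
    for name, values in table.items():
        print(f"  {name}: {values}")
```

On this grid the ordering matches the slide labels: a larger slope or a larger regression constant puts the curve on top.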
The Logit
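• By algebraic manipulation, the logistic regression equation can be written in terms of an odds ratio for success:
$$\frac{\hat{P}(Y = 1 \mid X_i)}{1 - \hat{P}(Y = 1 \mid X_i)} = \frac{\hat{\pi}_i}{1 - \hat{\pi}_i} = \exp(b_0 + b_1 X_{1i})$$
where $\hat{\pi}_i$ abbreviates $\hat{P}(Y = 1 \mid X_i)$.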
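The algebraic manipulation can be sketched in three lines. The shorthand $u = b_0 + b_1 X_1$ and the symbol $\hat{P}$ for the estimated probability of success are notational assumptions made here for brevity.

```latex
\begin{align*}
\hat{P} &= \frac{e^{u}}{1 + e^{u}}, \qquad u = b_0 + b_1 X_1 \\
1 - \hat{P} &= \frac{(1 + e^{u}) - e^{u}}{1 + e^{u}} = \frac{1}{1 + e^{u}} \\
\frac{\hat{P}}{1 - \hat{P}} &= \frac{e^{u}}{1 + e^{u}} \cdot \frac{1 + e^{u}}{1} = e^{u} = \exp(b_0 + b_1 X_1)
\end{align*}
```

The common denominator cancels, which is why the ratio of success to failure depends on the predictors only through the exponential of the linear term.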
The Logit
• Odds ratios range from 0 to positive infinity
• Odds ratio: P/Q is an odds ratio; less than 1 means less than .50 probability, greater than 1 means greater than .50 probability
The Logit
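• Finally, taking the natural log of both sides, we can write the equation in terms of logits (log-odds). For a single predictor:
$$\ln\left(\frac{\hat{P}(Y = 1 \mid X)}{1 - \hat{P}(Y = 1 \mid X)}\right) = \ln\left(\frac{\hat{\pi}}{1 - \hat{\pi}}\right) = b_0 + b_1 X_1$$
where $\hat{\pi}$ abbreviates $\hat{P}(Y = 1 \mid X)$.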
The Logit
• For multiple predictors
$$\ln\left(\frac{\hat{\pi}}{1 - \hat{\pi}}\right) = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_k X_k$$
The Logit
• Log-odds are a linear function of the predictors
• The regression coefficients go back to their old interpretation (kind of)
– Intercept b0: the expected value of the logit (log-odds) when X = 0
– Slope b1: called a “logit difference”; the amount the logit (log-odds) changes with a one-unit change in X, that is, the amount the logit changes in going from X to X + 1
Conversion
• EXP(logit) or e^logit = odds ratio
• Probability = odds ratio / (1 + odds ratio)
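The two conversion steps can be chained in a few lines. In this sketch the coefficients `B0`, `B1` and the value `x = 60` are illustrative assumptions, and the helper names are hypothetical.

```python
import math


def logit_to_odds(logit):
    """EXP(logit): the quantity these slides call the odds ratio, P / (1 - P)."""
    return math.exp(logit)


def odds_to_probability(odds):
    """Probability = odds / (1 + odds)."""
    return odds / (1 + odds)


B0, B1 = -4.00, 0.05        # assumed coefficients, for illustration only
x = 60                      # assumed predictor value

logit = B0 + B1 * x         # log-odds are linear in the predictor
odds = logit_to_odds(logit)
probability = odds_to_probability(odds)

print(f"logit = {logit:.2f}, odds = {odds:.4f}, probability = {probability:.4f}")
```

A logit of 0 maps to odds of 1 and a probability of .50, which gives a quick sanity check on any conversion.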
THANK YOU
GOOD LUCK