Statistical Arbitrage by Pair Trading using Clustering and ...
Statistical Arbitrage
Ying Chen, Leonardo Bachega, Yandong Guo, Xing Liu
March, 2010
Outline
Overview of the project
Implementation issues
Data adjustment mistakes
Stock classification
Future work
Framework
Raw historical data from WRDS
Data pre-processing (Python scripts): adjusted stock price series + indices
Market model: PCA eigenportfolios; ETFs for industry sectors (diagram inputs: 60-day returns, 252-day returns)
    $R_i = \beta_i F + \tilde{R}_i$
Residual process model: residuals as increments of an AR process (diagram input: current stock prices)
Compute s-scores
Signal trade orders
Back-testing simulations (MATLAB scripts)
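The slides name the "residuals as increments of an AR process" and "compute s-scores" steps without spelling them out. The sketch below shows one common way to do this, and it is an assumption rather than the authors' code: the residuals are summed into a cumulative process, an AR(1) model is fitted by ordinary least squares, and the s-score is the distance of the latest value from the fitted equilibrium mean in units of the equilibrium standard deviation. The function name `s_score` and the choice to return `None` when the fit is not mean-reverting are hypothetical.

```python
import math

def s_score(residuals):
    """Sketch of an s-score from regression residuals (assumed AR(1) model).

    Returns None when there are too few points or the fitted AR(1)
    coefficient is not in (0, 1), i.e. no mean reversion is detected.
    """
    if len(residuals) < 4:
        return None
    # Cumulative residual process X_k: the residuals are its increments.
    x, total = [], 0.0
    for r in residuals:
        total += r
        x.append(total)
    prev, nxt = x[:-1], x[1:]
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_next = sum(nxt) / n
    var_prev = sum((p - mean_prev) ** 2 for p in prev)
    if var_prev == 0.0:
        return None
    cov = sum((p - mean_prev) * (q - mean_next) for p, q in zip(prev, nxt))
    # OLS fit of X_{k+1} = a + b * X_k + noise.
    b = cov / var_prev
    a = mean_next - b * mean_prev
    if not 0.0 < b < 1.0:
        return None
    noise = [q - (a + b * p) for p, q in zip(prev, nxt)]
    var_noise = sum(e * e for e in noise) / (n - 2)
    m = a / (1.0 - b)                                # equilibrium mean
    sigma_eq = math.sqrt(var_noise / (1.0 - b * b))  # equilibrium std. dev.
    if sigma_eq == 0.0:
        return None
    return (x[-1] - m) / sigma_eq
```

Because the score is a ratio of quantities in the same units, rescaling all residuals by a constant leaves it unchanged, which makes it comparable across stocks.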
Implementation Issues
Delist tomorrow
    Criterion: detect from tomorrow’s outstanding shares
    In the portfolio: close the transaction
    Not in the portfolio: do not consider it for trading, but still include it in the PCA calculation
Today’s price == 0 in the middle
    Do not consider it for the PCA calculation or for trading
    In the portfolio: keep it
Implementation Issues (Cont’d)
Market Cap < 1B
    If already in the portfolio: keep it and consider it for trading
    If not: do not consider it for the PCA calculation or for trading
Stocks picked to calculate the eigenportfolio
    Today’s price != 0
    Previous 252 days have nonzero prices
    Market Cap > 1B or already in the portfolio
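The three selection rules for the eigenportfolio universe can be sketched as a single predicate. This is a minimal illustration, not the authors' script: the function name, the argument layout (a price list with today's price last), and the decision to reject stocks with fewer than 252 prior days of history are all assumptions.

```python
def eligible_for_pca(price_history, market_cap, in_portfolio,
                     window=252, cap_threshold=1e9):
    """Return True if a stock passes the universe rules listed above.

    price_history: daily prices, oldest first; the last entry is today.
    Stocks without `window` prior days of history are rejected (assumption).
    """
    if len(price_history) < window + 1:
        return False
    today = price_history[-1]
    previous = price_history[-(window + 1):-1]
    # Rule 1: today's price must be nonzero.
    if today == 0:
        return False
    # Rule 2: the previous 252 days must all have nonzero prices.
    if any(p == 0 for p in previous):
        return False
    # Rule 3: market cap above 1B, or the stock is already held.
    return market_cap > cap_threshold or in_portfolio
```

For example, a stock with a full nonzero history and a 0.5B market cap is rejected unless it is already in the portfolio.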
[Figure: Fund Value. Horizontal axis: day; vertical axis: dollars (x 10^8). Annotated dates: 12 Dec 1994 and 04 Nov 2008.]
Data Adjustment Mistakes
Dividend adjustment

$P_t^{new} = P_t^{old} \cdot \frac{P_{t_0} - DIV}{P_{t_0}}, \quad (t < t_0)$

DATE      PRC    SHROUT  DIVAMT  Adjusted Price  Yahoo Adjusted
20081009  10.08  57428   .       0.98851         4.07
20081010  8.77   145887  .       0.33855         3.54
20081013  5.44   145887  5.23    0.21            5.44
20081014  5.45   145887  .       5.4501          5.45
20081015  5.14   145887  .       5.1401          5.14
20081016  5.34   145887  .       5.3401          5.34
20081017  5.33   145887  .       5.3301          5.33
20081020  6.14   145887  .       6.1401          6.14
20081021  5.96   145887  .       5.9601          5.96
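As a worked illustration of the dividend adjustment, the sketch below scales every price before the dividend date $t_0$ by $(P_{t_0} - DIV) / P_{t_0}$ and leaves later prices untouched. Applying the factor only to dates strictly before $t_0$ is an assumption, as are the function and parameter names.

```python
def adjust_for_dividend(prices, ex_index, dividend):
    """Scale prices before index `ex_index` by (P_t0 - DIV) / P_t0.

    prices: daily prices, oldest first.
    ex_index: position of the dividend date t0 in `prices` (assumption:
              prices at and after t0 are left unchanged).
    """
    p_t0 = prices[ex_index]
    if p_t0 <= 0:
        raise ValueError("price on the dividend date must be positive")
    factor = (p_t0 - dividend) / p_t0
    return [p * factor if t < ex_index else p
            for t, p in enumerate(prices)]

# Rows 20081010 to 20081014 of the table above.
adjusted = adjust_for_dividend([8.77, 5.44, 5.45], ex_index=1, dividend=5.23)
```

With these inputs the first price becomes roughly 0.3385, which agrees with the "Adjusted Price" column for 20081010. A dividend nearly as large as the price itself produces a very small scale factor, which is the kind of case the table highlights.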
Data Adjustment Plan
Dividend adjustment
Split detection and adjustment using CFACPR and CFACSHR

DATE      PRC     VOL    SHROUT  DIVAMT  FACPR  FACSHR  CFACPR  CFACSHR
20090807  0.26    26066  23346   .       .      .       0.05    0.05
20090810  -7.1    176    1167    0       -0.95  -0.95   1       1
20090811  5.975   1937   1167    .       .      .       1       1
20090812  6.3499  3406   1167    .       .      .       1       1
20090813  4.78    26123  1167    .       .      .       1       1
20090814  4.2999  27486  1167    .       .      .       1       1
20090817  4.05    1658   1167    .       .      .       1       1
20090818  4.3     6042   1167    .       .      .       1       1
20090819  4.06    10015  1167    .       .      .       1       1
20090820  3.7972  8805   1167    .       .      .       1       1
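The planned split adjustment can be sketched as follows. The sketch assumes the usual CRSP conventions: a price is adjusted by dividing it by CFACPR, shares outstanding are adjusted by multiplying by CFACSHR, and a negative PRC marks a bid/ask average rather than a trade price, so its absolute value is used. The function names are hypothetical.

```python
def split_adjusted_price(prc, cfacpr):
    """Return the split-adjusted price, or None if the factor is unusable.

    Assumes CRSP conventions: adjusted price = |PRC| / CFACPR, where a
    negative PRC denotes a bid/ask average.
    """
    if cfacpr == 0:
        return None
    return abs(prc) / cfacpr

def split_adjusted_shares(shrout, cfacshr):
    """Return split-adjusted shares outstanding (SHROUT * CFACSHR)."""
    return shrout * cfacshr

# Rows 20090807 and 20090810 of the table above.
before = split_adjusted_price(0.26, 0.05)   # about 5.2
after = split_adjusted_price(-7.1, 1)       # 7.1
```

On the table's data, the 20090807 price of 0.26 becomes about 5.2, which is on the same scale as the prices after the 20090810 event, and 23346 shares times 0.05 gives about 1167, the SHROUT reported from 20090810 onward.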
Stock Classification
Using GICS (Global Industry Classification Standard) in CRSP
10 Sectors, 24 Industry Groups, 67 Industries and 147 Sub-Industries
8-digit code XXXXXXXX: the first 2 digits give the Sector, the first 4 the Industry Group, the first 6 the Industry, and all 8 the Sub-Industry
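Since each classification level is a prefix of the 8-digit code, grouping stocks by sector or industry reduces to slicing the code. The helper below is a hypothetical sketch that assumes the standard layout (2, 4, 6 and 8 digits for sector, industry group, industry and sub-industry).

```python
def split_gics_code(code):
    """Split an 8-digit GICS code into its four hierarchy levels.

    Assumes each level is a prefix of the code: 2 digits for the sector,
    4 for the industry group, 6 for the industry, 8 for the sub-industry.
    """
    code = str(code)
    if len(code) != 8 or not code.isdigit():
        raise ValueError("expected an 8-digit GICS code")
    return {
        "sector": code[:2],
        "industry_group": code[:4],
        "industry": code[:6],
        "sub_industry": code,
    }

levels = split_gics_code("45102010")
```

Two stocks then belong to the same sector when their `sector` prefixes match, which is one way to pair each stock with a sector ETF.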
Stock Classification (Cont’d)
PCA eigenportfolio weights normalization
Basic principle: find the most important eigenvectors (15 in the paper) and normalize them by the corresponding standard deviation of each stock’s returns.
PCA algorithm by the author
Suppose X is an n x p matrix containing n samples and p features.
Original algorithm: calculate the eigen-decomposition of the correlation matrix:
$X^T X = Q D Q^{-1}$
The matrix Q consists of the eigenvectors of the correlation matrix, and D is the diagonal matrix of its eigenvalues.
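To make the eigenportfolio step concrete, the sketch below builds the correlation matrix from a small returns table and extracts only its leading eigenvector. It uses power iteration instead of a full eigen-decomposition so that it needs no external library; this is a substitute technique chosen for illustration, and in practice a routine such as `numpy.linalg.eigh` would typically be used to obtain all eigenvectors. All function names are hypothetical, and the `divide_by_sigma` flag simply switches between the paper's normalization and the unnormalized weights.

```python
import math

def correlation_matrix(returns):
    """returns: n rows (days), each a list of p stock returns.

    Returns (corr, stds): the p x p correlation matrix and the sample
    standard deviation of each stock.
    """
    n, p = len(returns), len(returns[0])
    means = [sum(row[j] for row in returns) / n for j in range(p)]
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in returns) / (n - 1))
            for j in range(p)]
    z = [[(row[j] - means[j]) / stds[j] for j in range(p)] for row in returns]
    corr = [[sum(zr[i] * zr[j] for zr in z) / (n - 1) for j in range(p)]
            for i in range(p)]
    return corr, stds

def leading_eigenvector(matrix, iterations=500):
    """Power iteration for the dominant eigenpair of a symmetric PSD matrix.

    Caveat: the uniform start vector must not be orthogonal to the
    dominant eigenvector, otherwise the iteration fails.
    """
    p = len(matrix)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(p))
                     for i in range(p))
    return eigenvalue, v

def eigenportfolio_weights(eigenvector, stds, divide_by_sigma):
    """Weights as the eigenvector itself, or divided by each stock's std."""
    if divide_by_sigma:
        return [q / s for q, s in zip(eigenvector, stds)]
    return list(eigenvector)
```

Calling `correlation_matrix` on a window of returns, then `leading_eigenvector` on the result, gives the first eigenportfolio; the remaining ones would require a full decomposition.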
PCA discussion
Question: should the eigenvector be divided by sigma, the sample standard deviation?
Answer: No. (This differs from the paper.)
PCA discussion
The meaning of the “risk factor” F
    F should represent the overall performance of the market.
    The behavior of F should act as the “market return”.
What can PCA do?
    PCA is mathematically defined as an orthogonal linear transformation that transforms the data to a new coordinate system such that the greatest variance by any projection of the data comes to lie on the first coordinate (called the first principal component), the second greatest variance on the second coordinate, and so on.
    PCA is theoretically the optimum transform for given data in least-squares terms.
PCA discussion
Derivation
Notation: $F = EX$
    F: m x n matrix, represents the eigenportfolios
    E: m x p matrix, the first m most important eigenvectors
    X: p x n matrix, contains the stock returns
    m: 15 in the paper
    n: the number of days (samples)
    p: the number of stocks
PCA discussion
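Derivation
The i-th row of the eigenportfolio matrix is
$F_i = E_i X$
Its variance should be maximized under the constraint that
$E_i E_i^T = 1$
With $\Sigma$ denoting the correlation matrix of the returns and $\lambda$ a Lagrange multiplier, the expression
$E_i \Sigma E_i^T - \lambda (E_i E_i^T - 1)$
is to be maximized, which gives
$E_i (\Sigma - \lambda I_p) = 0$
That is to say, the weighting factors should be the eigenvectors themselves rather than the eigenvectors divided by the standard deviation. (The experimental result is the same without dividing.)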
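The conclusion of the derivation can be checked numerically in two dimensions. The sketch below is an illustration under assumed inputs, not part of the original experiment: it scans unit-norm weight vectors for a 2 x 2 correlation matrix with a made-up correlation of 0.6 and reports the one with the largest variance. The function names are hypothetical.

```python
import math

def portfolio_variance(weights, sigma):
    """Quadratic form w * Sigma * w^T for a weight vector w."""
    p = len(weights)
    return sum(weights[i] * sigma[i][j] * weights[j]
               for i in range(p) for j in range(p))

def best_unit_weights_2d(sigma, steps=3600):
    """Brute-force search over unit vectors (cos a, sin a) in 2 dimensions.

    Only half a circle is scanned because w and -w give the same variance.
    """
    best_angle, best_var = 0.0, -math.inf
    for k in range(steps):
        angle = math.pi * k / steps
        w = [math.cos(angle), math.sin(angle)]
        var = portfolio_variance(w, sigma)
        if var > best_var:
            best_angle, best_var = angle, var
    return [math.cos(best_angle), math.sin(best_angle)], best_var

# Hypothetical correlation matrix with correlation 0.6 between two stocks.
weights, variance = best_unit_weights_2d([[1.0, 0.6], [0.6, 1.0]])
```

The search lands on equal weights of about 0.7071 each with variance about 1.6. These are the leading eigenvector and eigenvalue of the matrix, as the condition $E_i (\Sigma - \lambda I_p) = 0$ predicts.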
Experiment result
[Figure: Top 50 eigenvalues of the correlation matrix of market returns, computed on May 1, 2007, estimated using a 1-year window and a universe of 1590 stocks.]
[Figure: Value of the first eigenvector. Vertical axis: value of eigenvectors.]
Future work
Data adjustment
Experiment on ETF
    Compare ETF with PCA
Take into account
    Transaction fee, interest, dividend
    Volume
THANK YOU