Prediction the stock market with genetic programming

39
Predicting the Stock Market with Genetic Programming David Moskowitz, Ph.D. Infoblazer LLC Data Scientists in Stamford Westport January 25, 2016

Transcript of Prediction the stock market with genetic programming

Page 1: Prediction the stock market with genetic programming

Predicting the Stock Market with Genetic Programming

David Moskowitz, Ph.D.

Infoblazer LLC

Data Scientists in Stamford Westport

January 25, 2016

Page 2: Prediction the stock market with genetic programming

Disclaimer

• The following is my opinion only

• It is not the opinion of my employer

• It is not related to any work done at my employer

You should not make any decision, financial, investments, trading or otherwise, based on any of the information presented without undertaking independent due diligence and consultation with a professional broker or competent financial advisor. You understand that you are using any and all information presented at your own risk.

Page 3: Prediction the stock market with genetic programming

Agenda

• What is Genetic Programming?

• Time Series Prediction

• Stock Market Prediction

• Other Issues• Modularity

• Linear GP

• Genetic Algorithms

• Demonstrations

Page 4: Prediction the stock market with genetic programming

What is Genetic Programming?

• Get a computer to do something without telling it how to do it

• Breeds a population of computer programs that improve over time

• Evolution• Genetic Operators• Survival of the Fittest

• Stochastic component• Non-Greedy• Creativity• Novel solutions

• Size and shape of solution unknown

• Limited only by what can be represented as a computer program

(Koza et al., 2006, p. 11)

Page 5: Prediction the stock market with genetic programming

Example: Design of a Satellite Boom

• Designed using a genetic algorithm

• 20,000+% improvement in frequency averaged energy levels

(Keane, 1996)

Page 6: Prediction the stock market with genetic programming

History

• Visionaries• Samuel 1959 – Goal of AI• Turing 1948 – Evolutionary search, gene combination, survival of the fittest

• Evolutionary Algorithms , 1962-• mutation , populations,

• Genetic Algorithms, 1973-• John Holland• Crossover

• Genetic Programming, 1989-• John Koza• Best way to represent a computer program is a computer program

Page 7: Prediction the stock market with genetic programming

How GP Works

• Preparatory Steps• Primitives

• Fitness Function(s)

• Initialize Population

• Evolve Population• Calculate population fitness

• Select next generation

• Termination Condition

(Poli et al.,2008, p. 2)

Page 8: Prediction the stock market with genetic programming

GP Representation

• LISP

• (max (+ x x) (+ x (* 3 y)))

• max(x+x,x+3*y)

MAX

X

+ +

*X X

3 y

Page 9: Prediction the stock market with genetic programming

GP Operations

• Probabilistically select an operation

• Crossover• Switch two nodes on different individuals

• Mutation• Randomly modify an individual (node)

• Reproduction• Copy parent, as is, to next generation

Page 10: Prediction the stock market with genetic programming

GP Selection

• Need to select one or two individuals for genetic operations

• Selection is probabilistic

• Fitness Proportional Selection

• Tournament Selection

Page 11: Prediction the stock market with genetic programming

Crossover

+

+

X y

3

*

+

y 1

/

X 2

+

+

X y

3

*

+

y 1

/

X 2

Parents

Children

Page 12: Prediction the stock market with genetic programming

Mutation*

+

y 1

/

X 2

Parent

Child

*

+

y 1

*

2 MIN

X y

Page 13: Prediction the stock market with genetic programming

Demo – Symbolic Regression

• Curve Fitting

• 𝑥3 − 𝑥2 + 𝑥 − 4

• Prove that GP works

-200000

0

200000

400000

600000

800000

1000000

1200000

0 10 20 30 40 50 60 70 80 90

Y

XDemo a, b

Page 14: Prediction the stock market with genetic programming

Chaotic Series

• Look random, but are deterministic

• Highly dependent on initial conditions• Difficult to predict

𝑌𝑡 = 𝑠𝑖𝑛 𝑥 − 130 + 𝑥 + 130 x=0: 𝑌 0 = 0.9

X>0: 𝑌 𝑡+1 = 4𝑌𝑡(1 − 𝑌𝑡)

0

0.2

0.4

0.6

0.8

1

1.2

0 10 20 30 40 50 60 70 80 90

Y

X

9

10

11

12

13

14

15

16

17

0 10 20 30 40 50 60 70 80 90

Y

X

Page 15: Prediction the stock market with genetic programming

Demo- Chaotic Series Symbolic Regression

Demo c, d, e, f

Page 16: Prediction the stock market with genetic programming

Regime Change

• Goal is to uncover underlying data generating process• This can change over time

-15

-10

-5

0

5

10

0

14

28

42

56

70

84

98

11

2

12

6

14

0

15

4

16

8

18

2

19

6

Y

X

0<=x<70: 𝑌𝑡 = 𝑠𝑖𝑛 𝑥 + 𝑥70<=x<130: 𝑌𝑡 = 𝑐𝑜𝑠 𝑥 − 𝑥

130<=x<200: 𝑌𝑡 = 𝑠𝑖𝑛 𝑥 − 130 + 𝑥 − 130

-3

-2

-1

0

1

2

3

0

20

40

60

80

10

0

12

0

14

0

16

0

18

0

20

0

22

0

24

0

26

0

28

0

30

0

32

0

34

0

36

0

38

0

40

0

Y

X

x<200, x>=300: 𝑌 𝑡+1 = 4𝑌𝑡(1 − 𝑌𝑡)

200<=x<300: 𝑌 𝑡+1 = 1.8708𝑌𝑡 − 𝑌𝑡−1

S&P 500 index close price during the stock market

crash of 2008 (Yahoo, 2013).

𝑌𝑡 = 𝑓 𝑊𝑇𝐹 ?

Page 17: Prediction the stock market with genetic programming

Demo- Symbolic Regression Regime Change

Demo g

Page 18: Prediction the stock market with genetic programming

Time Series Prediction

• Train on past values

• Predict future values

• Retrain periodically

Page 19: Prediction the stock market with genetic programming

Demo- Chaotic Series Prediction

Demo h, I, j

Page 20: Prediction the stock market with genetic programming

Market Prediction

• S&P 500 Long-Flat (Invest-don’t invest)

• Ignore transaction costs

• Ignore out of market returns

• Predictors• S&P 500 Price

• S&P 500 Volume

• 250-day MA normalized0

0.2

0.4

0.6

0.8

1

1.2

1.4

0

500

1000

1500

2000

2500

S&P 500 Index Series

Index Value 250 Day Moving Average 250-day MA Normalized

Page 21: Prediction the stock market with genetic programming

GP is a Perfect Match for Market Prediction

• “The interrelationships among the relevant variables is unknown or poorly understood (or where it is suspected that the current understanding may possibly be wrong).”

• “Finding the size and shape of the ultimate solution is a major part of the problem.”

• “Significant amounts of test data are available in computer-readable form.”

• “There are good simulators to test the performance of tentative solutions to a problem, but poor methods to directly obtain good solutions.”

• “Conventional mathematical analysis does not, or cannot, provide analytic solutions”

• “An approximate solution is acceptable (or is the only result that is ever likely to be obtained)”

• “Small improvements in performance are routinely measured (or easily measurable) and highly prized.”

(Poli et al.,2008, pp. 111-113)

Page 22: Prediction the stock market with genetic programming

Primitive Set

• Hundreds of indicators• Ex. (http://www.investopedia.com/active-trading/technical-indicators/)

• Include common technical analysis indicators• Momentum- compare to recent average

• Breakout- compare to recent minimum/maximum

• Ex. Buy if current price risen by 2% over minimum price last 30 days

• Prefer low level functions

• Better results possible with higher level , packaged indicators?

Page 23: Prediction the stock market with genetic programming

Primitive set

• Functions• Add• Subtract• Multiply• Divide• Gt• Lt• And• Or• Not• offsetValue• ifElseBoolean• movingAverage• periodMaximum• periodMinimum• AbsoluteDifference

• Terminals• randomInteger(low high)

• randomDouble(low high)

• True

• False

• offsetValue(0)

• Hundreds of other technical indicators(http://www.investopedia.com/active-trading/technical-indicators/)

Page 24: Prediction the stock market with genetic programming

Training Approaches

• Train-Predict-Retrain

• Train-Test-Predict

• Multiple Runs

Train M

Test

N

Predict

1

Train

Predict

M

N (1)

Demo k, l, m

Page 25: Prediction the stock market with genetic programming

Experiment- Market Prediction

• Investment decisions in S&P 500 Index

• Modeled after (Chen et al., 2008), 1988-2004

• Long-Flat decisions

• Normalized by 250-day moving average

• Fitness = investment gain

0

200

400

600

800

1000

1200

1400

1600

19

88

-01

-04

19

88

-08

-16

19

89

-03

-31

19

89

-11

-10

19

90

-06

-27

19

91

-02

-08

19

91

-09

-24

19

92

-05

-07

19

92

-12

-18

19

93

-08

-04

19

94

-03

-17

19

94

-10

-31

19

95

-06

-15

19

96

-01

-29

19

96

-09

-11

19

97

-04

-25

19

97

-12

-08

19

98

-07

-24

19

99

-03

-10

19

99

-10

-21

20

00

-06

-06

20

01

-01

-19

20

01

-09

-04

20

02

-04

-25

20

02

-12

-06

20

03

-07

-24

20

04

-03

-09

20

04

-10

-21

20

05

-06

-07

S&P 500

0.7

0.8

0.9

1

1.1

1.2

1.3

19

88

-12

-27

19

89

-08

-10

19

90

-03

-26

19

90

-11

-06

19

91

-06

-21

19

92

-02

-04

19

92

-09

-17

19

93

-05

-03

19

93

-12

-14

19

94

-07

-29

19

95

-03

-14

19

95

-10

-25

19

96

-06

-10

19

97

-01

-22

19

97

-09

-05

19

98

-04

-22

19

98

-12

-03

19

99

-07

-21

20

00

-03

-03

20

00

-10

-16

20

01

-06

-01

20

02

-01

-22

20

02

-09

-05

20

03

-04

-22

20

03

-12

-03

20

04

-07

-21

20

05

-03

-04

20

05

-10

-17

20

06

-06

-02

Normalized

Page 26: Prediction the stock market with genetic programming

Experiment- Market Prediction

• Training-validation-prediction approach (T-V-P)

• Training-prediction approach (T-P)• Training - 1989-1998

• Prediction- 1999-2004

(Chen et al., 2008, p. 110)

Page 27: Prediction the stock market with genetic programming

Results T-V-P w/Trans Cost

Method Mean Std. Dev. Min Max 95% CI

# beating

benchmark

1999-2000

Buy & Hold 0.0751

GP 0.0434 0.0664 -0.1917 0.1197 [0.0250 ... 0.0618] 5/50

ADF 0.0309 0.0798 -0.3054 0.0845 [0.0088 ... 0.0530] 3/50

ADT 0.0510 0.0519 -0.1974 0.1042 [0.0366 ... 0.0654] 5/50

2001-2002

Buy & Hold -0.3144

GP -0.3693 0.1306 -0.8087 -0.2885 [-0.4055 ... -0.3331] 1/50

ADF -0.3347 0.0887 -0.7290 -0.1777 [-0.3593 ... -0.3102] 2/50

ADT -0.3697 0.1390 -0.7450 -0.0134 [-0.4082 ... -0.3312] 1/50

2003-2004

Buy & Hold 0.3332

GP 0.2945 0.0497 0.1432 0.3291 [0.2807 ... 0.3083] 0/50

ADF 0.3139 0.0390 0.1170 0.3539 [0.3031 ... 0.3247] 1/50

ADT 0.3247 0.0150 0.2349 0.3522 [0.3205 ... 0.3289] 2/50

(Moskowitz, 2016, p. 119)

Page 28: Prediction the stock market with genetic programming

Results T-V-P wo/Trans Cost

Method Mean Std. Dev. Min Max 95% CI

# beating

benchmark

1999-2000

Buy & Hold 0.0751

GP 0.1494 0.1088 -0.0438 0.4525 [0.1192 ... 0.1795] 35/50

ADF 0.1418 0.1238 -0.0399 0.5112 [0.1075 ... 0.1761] 35/50

ADT 0.1567 0.1099 -0.0068 0.4796 [0.1262 ... 0.1871] 37/50

2001-2002

Buy & Hold -0.3144

GP -0.3121 0.0573 -0.4081 -0.0348 [-0.3280 ... -0.2962] 17/50

ADF -0.3023 0.0848 -0.5153 0.0196 [-0.3258 ... -0.2788] 18/50

ADT -0.2843 0.0635 -0.3924 -0.1245 [-0.3020 ... -0.2667] 32/50

2003-2004

Buy & Hold 0.3332

GP 0.3045 0.0929 0.0463 0.5045 [0.2788 ... 0.3303] 15/50

ADF 0.3395 0.1171 -0.0016 0.5597 [0.3070 ... 0.3719] 22/50

ADT 0.3329 0.1202 0.0775 0.6443 [0.2996 ... 0.3663] 29/50

(Moskowitz, 2016, p. 122)

Page 29: Prediction the stock market with genetic programming

Results T-P w/Trans Cost

(Moskowitz, 2016, p. 120)

Method Mean Std. Dev. Min Max 95% CI

# beating

benchmark

1999-2000

Buy & Hold 0.0634

ADT 0.0018 0.1372 -0.4290 0.2388 [-0.0362 ... 0.1010] 17/50

DyFor GP -0.0157 0.1101 -0.2690 0.1426 [-0.0463 ... 0.0639] 15/50

2001-2002

Buy & Hold -0.3339

ADT -0.1364 0.1014 -0.2964 0.1601 [-0.1645 ... -0.0631] 50/50

DyFor GP -0.1018 0.0819 -0.2810 0.0817 [-0.1245 ... -0.0426] 50/50

2003-2004

Buy & Hold 0.2970

ADT 0.1035 0.0653 -0.0603 0.2529 [0.0854 ... 0.1507] 0/50

DyFor GP 0.0489 0.0723 -0.1780 0.2156 [0.0289 ... 0.1012] 0/50

1999-2004

Buy & Hold -0.0189

ADT -0.0349 0.1933 -0.5395 0.4592 [-0.0884 ... 0.0187] 24/50

DyFor GP -0.0698 0.1413 -0.3597 0.2136 [-0.1089 ... -0.0306] 15/50

Page 30: Prediction the stock market with genetic programming

Results T-P wo/Trans CostMethod Mean Std. Dev. Min Max 95% CI

# beating

benchmark

1999-2000

Buy & Hold 0.0634

ADT 0.0788 0.1071 -0.1106 0.3576 [0.0491 ... 0.1562] 27/50

DyFor GP 0.0807 0.1323 -0.1904 0.3408 [0.0440 ... 0.1763] 26/50

2001-2002

Buy & Hold -0.3339

ADT -0.0524 0.1026 -0.2674 0.1521 [-0.0808 ... 0.0218] 50/50

DyFor GP -0.0594 0.0862 -0.2314 0.1020 [-0.0833 ... 0.0029] 50/50

2003-2004

Buy & Hold 0.2970

ADT 0.1246 0.0782 -0.0132 0.3739 [0.1029 ... 0.1811] 2/50

DyFor GP 0.1233 0.0702 -0.0297 0.2783 [0.1038 ... 0.1740] 0/50

1999-2004

Buy & Hold -0.0189

ADT 0.1683 0.2005 -0.1946 0.6959 [0.1128 ... 0.2239] 39/50

DyFor GP 0.1568 0.1887 -0.2618 0.5762 [0.1045 ... 0.2091] 41/50

(Moskowitz, 2016, p. 124)

Page 31: Prediction the stock market with genetic programming

ADT vs DyFor GP vs Buy and Hold

750

850

950

1050

1150

1250

1350

1450

1550

1/4/1999 1/4/2000 1/4/2001 1/4/2002 1/4/2003 1/4/2004

With Transaction Costs, 50 run mean

index value Dyfor.y_predicted ADT Sliding.y_predicted

750

850

950

1050

1150

1250

1350

1450

1550

1/4/1999 1/4/2000 1/4/2001 1/4/2002 1/4/2003 1/4/2004

Without Transaction Costs, 50 run mean

index value Dyfor.y_predicted ADT Sliding.y_predicted

ADT: +0.1683%DyFor GP: +0.1568%B&H: -0.0189%

ADT: -0.349%DyFor GP: -0.698%B&H: -0.0189%

(Moskowitz, 2016, p. 208) (Moskowitz, 2016, p. 213)

Page 32: Prediction the stock market with genetic programming

Advanced GP

• Modularity• Automatically Defined Functions

• Strongly-typed GP• Closure

• Advanced techniques• Looping• Memory store• Lambdas• Recursion• Time series regimes (Moskowitz, 2016)• Design patterns (Moskowitz, 2016)

(Koza, 1994, p. 74)

Page 33: Prediction the stock market with genetic programming

Genetic Algorithms

• Non-differentiable / nonlinear optimization problem

• Search for parameters, rules

• Size and shape prescribed

• Bit, Numeric, or other representations

• Ex. Minimize 𝑥2 − 50𝑦 + 𝑧3, x={0-31},y={0-31},z={0-15}

10011 11000 101100110 11010 1100

10010 11010 110000111 11000 1011

192 − 50 ∗ 24 + 113=49262 − 50 ∗ 26 + 123=464

182 − 50 ∗ 26 + 123=752

72 − 50 ∗ 24 + 113=180

Page 34: Prediction the stock market with genetic programming

Linear GP

• Sequence of imperative instructions

• Register-based operations

• Machine code, GPU Instructions

(Poli et al.,2008, pp. 61-65)

Page 35: Prediction the stock market with genetic programming

Demo- Symbolic Regression Regime Change

• Regime determining branch

• Regime specific functions

• Implements template method design pattern

Demo o

Page 36: Prediction the stock market with genetic programming

Next Steps

• MATLAB (GA only)

• JGAP (Java)

• DEAP (Python)

• Roll your own

• Evolutionary Signals

Page 37: Prediction the stock market with genetic programming

Not So Shameless Plug

• Evolutionary Signals • Develop Buy/Sell signals using genetic programming

• Minimal financial knowledge and no programming experience required

• Goals• Much larger number of predictor series

• Crowdsource models

• Earn revenue from high performing models

• Open beta testing

• Visit www.gpsignals.com for more information

Page 38: Prediction the stock market with genetic programming

Questions?

• Thank you!

• Contact info:LinkedIn: infoblazer

@infojester

Page 39: Prediction the stock market with genetic programming

References• Chen, S. H., Kuo, T. W., & Hoi, K. M. (2008). Genetic Programming and Financial Trading:

How Much About “What We Know.” In Handbook of financial engineering (pp. 99–154). Springer US. doi:10.1007/978-0-387-76682-9

• Keane, A. J. (1996). THE DESIGN OF A SATELLITE BOOM WITH ENHANCED VIBRATION PERFORMANCE USING GENETIC ALGORITHM TECHNIQUES. The Journal of the Acoustical Society of America, 99(4), 2599–2603.

• Koza, J. R. (1994). Genetic programming II: automatic discovery of reusable programs. MIT press.

• Koza, J. ., Keane, M. A., Streeter, M. J., Mydlowec, W., Yu, J., & Lanza, G. (2006). Genetic programming IV: Routine human-competitive machine intelligence (Vol. 5). Springer.

• Moskowitz, D. (2016). Automatically Defined Templates for Improved Prediction of Non-stationary, Nonlinear Time Series in Genetic Programming. Doctoral dissertation. Nova Southeastern University. Retrieved from http://nsuworks.nova.edu/gscis_etd/953.

• Poli, R., Langdon, W. B., McPhee, N. F., & Koza, J. R. (2008). A field guide to genetic programming. Lulu. com.