Financial Data Mining with Genetic Programming:
a Survey and Look Forward
Nicolas NAVET – INRIA, France – …@loria.fr
Shu-Heng CHEN – AIECON/NCCU – …@nccu.edu.tw
ISI 2007 – 08/23/2007
Genetic programming
- Generate a population of random programs
- Evaluate their quality (“fitness”)
- Create better programs by applying genetic operators, e.g.
  - mutation
  - combination (“crossover”)
GP is the process of evolving a population of computer programs, which are candidate solutions, according to evolutionary principles.
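The generate / evaluate / vary loop above can be sketched as follows. This is a minimal illustration, not the authors' system: it assumes a toy symbolic-regression target (x*x + x) so that it runs without market data, and the primitive set, tournament selection, elitism, mutation rate and depth cap are all illustrative assumptions.

```python
import operator
import random

# Hypothetical primitive set for a toy problem: binary functions and terminals.
FUNCS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}
TERMS = ["x", 1.0]
MAX_DEPTH = 6  # assumed cap that limits tree growth ("bloat")


def random_tree(depth):
    """Grow a random program as nested tuples (function, left, right)."""
    if depth == 0 or random.random() < 0.3:
        return random.choice(TERMS)
    name = random.choice(list(FUNCS))
    return (name, random_tree(depth - 1), random_tree(depth - 1))


def evaluate(tree, x):
    if isinstance(tree, tuple):
        name, left, right = tree
        return FUNCS[name](evaluate(left, x), evaluate(right, x))
    return x if tree == "x" else tree


def depth(tree):
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(depth(tree[1]), depth(tree[2]))


def paths(tree, path=()):
    """Yield the path (sequence of child indices) of every node."""
    yield path
    if isinstance(tree, tuple):
        yield from paths(tree[1], path + (1,))
        yield from paths(tree[2], path + (2,))


def get(tree, path):
    for i in path:
        tree = tree[i]
    return tree


def put(tree, path, sub):
    """Return a copy of tree with the node at path replaced by sub."""
    if not path:
        return sub
    i = path[0]
    return tree[:i] + (put(tree[i], path[1:], sub),) + tree[i + 1:]


def crossover(a, b):
    """One-offspring crossover: graft a random subtree of b into a."""
    donor = get(b, random.choice(list(paths(b))))
    return put(a, random.choice(list(paths(a))), donor)


def mutate(tree):
    """Replace a random subtree by a freshly generated one."""
    return put(tree, random.choice(list(paths(tree))), random_tree(2))


def fitness(tree):
    """Sum of squared errors against the toy target x*x + x (lower is better)."""
    return sum((evaluate(tree, x) - (x * x + x)) ** 2 for x in range(-5, 6))


def evolve(pop_size=150, generations=20, seed=1):
    random.seed(seed)
    pop = [random_tree(4) for _ in range(pop_size)]
    for _ in range(generations):
        scored = [(fitness(t), t) for t in pop]  # evaluate once per generation

        def pick():
            """Tournament selection of size 3."""
            return min(random.sample(scored, 3), key=lambda s: s[0])[1]

        nxt = [min(scored, key=lambda s: s[0])[1]]  # elitism: keep the best
        while len(nxt) < pop_size:
            child = crossover(pick(), pick())
            if random.random() < 0.2:
                child = mutate(child)
            if depth(child) <= MAX_DEPTH:
                nxt.append(child)
        pop = nxt
    return min(pop, key=fitness)
```

Calling `evolve()` returns the best tree found; with elitism the best fitness can never get worse from one generation to the next.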
In GP, programs are represented by trees
Example trading system: buy if abs(Close(t)/0.7748) < Close(t−218)
[Figure: the tree of this rule, with functions as internal nodes and terminals as leaves]
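A minimal sketch of how such a rule tree can be stored and evaluated on a price series. The nested-tuple encoding, the `("close", lag)` terminal, the protected division and the helper names `evaluate` and `buy_days` are illustrative assumptions, not the representation used in the original experiments.

```python
# The slide's rule  abs(Close(t)/0.7748) < Close(t-218)  written as a tree of
# nested tuples: internal nodes are functions, leaves are terminals.
RULE = ("<", ("abs", ("/", ("close", 0), 0.7748)), ("close", 218))

FUNCTIONS = {
    "<": lambda a, b: a < b,
    "abs": abs,
    "/": lambda a, b: a / b if b != 0 else 1.0,  # "protected" division (assumption)
}


def evaluate(node, closes, t):
    """Evaluate a rule tree on the list of closing prices at day index t."""
    if isinstance(node, tuple):
        if node[0] == "close":            # terminal Close(t - lag)
            return closes[t - node[1]]
        args = [evaluate(child, closes, t) for child in node[1:]]
        return FUNCTIONS[node[0]](*args)
    return node                           # numeric constant terminal


def buy_days(rule, closes, max_lag=218):
    """Indices t (with enough history) on which the rule signals 'buy'."""
    return [t for t in range(max_lag, len(closes)) if evaluate(rule, closes, t)]
```

On a flat price series the rule never fires, since Close(t)/0.7748 is always larger than Close(t−218).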
Typical genetic operator: standard crossover
Standard crossover: exchange two randomly chosen sub-trees between the parents
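Standard subtree crossover can be sketched on nested-tuple trees as below. The tree encoding (tuples `(function, child, ...)`, leaves as strings) and the function names are assumptions made for illustration.

```python
import random

# Trees are nested tuples (function, child, ...); leaves are strings or numbers.


def subtree_paths(tree, path=()):
    """Yield the path (tuple of child indices) of every node in the tree."""
    yield path
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from subtree_paths(child, path + (i,))


def get_subtree(tree, path):
    for i in path:
        tree = tree[i]
    return tree


def replace_subtree(tree, path, new):
    """Return a copy of tree where the node at path is replaced by new."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace_subtree(tree[i], path[1:], new),) + tree[i + 1:]


def standard_crossover(parent1, parent2, rng=random):
    """Swap one randomly chosen subtree of each parent; return two offspring."""
    p1 = rng.choice(list(subtree_paths(parent1)))
    p2 = rng.choice(list(subtree_paths(parent2)))
    s1, s2 = get_subtree(parent1, p1), get_subtree(parent2, p2)
    return replace_subtree(parent1, p1, s2), replace_subtree(parent2, p2, s1)
```

Because the two sub-trees are exchanged, the offspring together contain exactly the nodes of the two parents; only their arrangement changes.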
Strong points of GP
- Solutions are produced in symbolic form that can be analyzed by humans
- GP does not assume a predefined size and shape: it creates both the functional form and the parameters’ values
- “Ability to produce a large number of different, yet meaningful hypotheses … that are non-intuitive and sometimes provocative” [Kei02]
G.P. in the financial domain
1. Knowledge discovery: results are scarce
   - Agent-based modeling: study the evolution of a population of decision rules
   - Testing the EMH in real and artificial markets
2. Financial trading:
   - Composing portfolios
   - Evolving the structure of NNs used for prediction
   - Predicting price evolution
   - Discovering trading rules
Discovering trading rules: the big picture
1) Creation of the trading rules using GP (training interval)
2) Selection of the best resulting strategies, with further selection on unseen data (validation interval): one strategy is chosen for out-of-sample evaluation
3) Performance evaluation of that strategy (out-of-sample interval)
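The three-interval procedure can be sketched as below. The 50/25/25 chronological split, the `top_k` pre-selection standing in for "the best resulting strategies" of a GP run, and the placeholder `score(rule, data)` performance function are all assumptions for illustration.

```python
def split_intervals(series, train_frac=0.5, valid_frac=0.25):
    """Chronological split into training, validation and out-of-sample parts."""
    n = len(series)
    a = int(n * train_frac)
    b = int(n * (train_frac + valid_frac))
    return series[:a], series[a:b], series[b:]


def select_and_evaluate(candidates, score, series, top_k=10):
    """candidates: trading rules; score(rule, data) -> performance (higher is better)."""
    train, valid, test = split_intervals(series)
    # 1) Best rules on the training interval (stands in for the GP run itself).
    finalists = sorted(candidates, key=lambda r: score(r, train), reverse=True)[:top_k]
    # 2) Further selection on unseen validation data: a single rule is kept.
    chosen = max(finalists, key=lambda r: score(r, valid))
    # 3) Performance evaluation of that one rule on the out-of-sample interval.
    return chosen, score(chosen, test)
```

Only the single chosen rule is ever scored on the out-of-sample interval, so that interval does not take part in the selection.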
Improvements ahead of us (1/2)
1. Rigorous assessment of the GP outcomes: controlling the data-mining bias!
2. Selecting the right time series: markets can be efficient
3. Reducing the variability of the results from one GP run to another
4. Re-thinking the data-division scheme for the training, validation and testing periods
Improvements ahead of us (2/2)
5. Pre-processing the data?!?
6. Re-thinking fitness functions: GP-friendly, sensitivity- and risk-adjusted, …
7. Embedding more domain-specific knowledge: the GP function set is still very primitive …
1. Rigorous assessment of the GP outcomes
GP’s outcomes on the training interval (1/2)
Assume an “inefficient” solution leads to a profitable trade with probability 0.5
Probability that an inefficient system achieves at least a given success rate for a given number of trades:

Success rate   10 trades   50 trades   100 trades
60%            0.38        0.1         0.03
70%            0.17        3·10⁻³      4·10⁻⁵
Guideline: penalize or discard systems with few trades
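The table above follows from a binomial tail: if each trade of an inefficient system is profitable independently with probability 0.5, the chance of reaching at least a given success rate is a sum of binomial terms. A minimal sketch (independence between trades is the assumption here):

```python
from math import comb


def tail_probability(n_trades, success_pct, p=0.5):
    """P(an 'inefficient' system wins at least success_pct% of n_trades),
    each trade being profitable independently with probability p."""
    k_min = (success_pct * n_trades + 99) // 100  # ceil, computed in integers
    return sum(comb(n_trades, k) * p**k * (1 - p) ** (n_trades - k)
               for k in range(k_min, n_trades + 1))


for pct in (60, 70):
    row = [tail_probability(n, pct) for n in (10, 50, 100)]
    print(f"{pct}%: " + "  ".join(f"{v:.2g}" for v in row))
```

The printed rows match the slide's figures up to rounding.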
GP’s outcomes on the training interval (2/2)
Probability that at least one inefficient system achieves a success rate ≥ 70%, for a given number of solutions tested:

Solutions tested   10 trades   50 trades   100 trades
100                1           0.28        0.004
1000               1           0.96        0.038
50000              1           1           0.85

NB: in a typical GP run, 50000 solutions are tested and the average number of trades is usually small …
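The multiple-testing effect can be checked with the same binomial tail: if N inefficient solutions are tested, the chance that at least one of them looks good is 1 − (1 − q)^N, where q is the single-system probability. The sketch treats the N solutions as independent, which is an assumption (solutions within one GP run are correlated).

```python
from math import comb


def tail_probability(n_trades, success_pct, p=0.5):
    """P(one inefficient system wins at least success_pct% of n_trades)."""
    k_min = (success_pct * n_trades + 99) // 100
    return sum(comb(n_trades, k) * p**k * (1 - p) ** (n_trades - k)
               for k in range(k_min, n_trades + 1))


def prob_at_least_one(n_solutions, n_trades, success_pct=70, p=0.5):
    """Probability that at least one of n_solutions independent inefficient
    systems reaches success_pct% winning trades."""
    q = tail_probability(n_trades, success_pct, p)
    return 1 - (1 - q) ** n_solutions
```

For example, `prob_at_least_one(50000, 100)` is about 0.86, i.e. even with 100 trades per system a 50000-solution run is very likely to contain a lucky one.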
GP’s outcomes on the testing period [ChNa07]
Compare GP with several variants of:
- Random search algorithms: “Zero-Intelligence Strategies” (ZIS)
- Random trading behaviors: “Lottery Trading” (LT)
Statistical hypothesis testing:
- Null: GP does not outperform ZIS
- Null: GP does not outperform LT
Issue: how best to constrain randomness?
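A minimal sketch of an LT benchmark and of the null "GP does not outperform LT". It assumes that LT is in the market on each day with the same probability as the GP rule's observed exposure, and it uses a one-sided Monte Carlo p-value; both the LT definition and the test statistic are assumptions, not necessarily those of [ChNa07].

```python
import random


def strategy_return(positions, log_returns):
    """Cumulative log return of a long/flat strategy: positions[t] in {0, 1}
    is held over day t's log return."""
    return sum(pos * r for pos, r in zip(positions, log_returns))


def lottery_positions(n_days, p_in_market, rng):
    """Hypothetical 'lottery trading': be in the market on each day with
    probability p_in_market, independently of the data."""
    return [1 if rng.random() < p_in_market else 0 for _ in range(n_days)]


def lt_p_value(gp_positions, log_returns, n_sim=1000, seed=0):
    """One-sided Monte Carlo p-value for H0: 'GP does not outperform LT'."""
    rng = random.Random(seed)
    gp_ret = strategy_return(gp_positions, log_returns)
    p_in = sum(gp_positions) / len(gp_positions)  # match GP's market exposure
    at_least_as_good = sum(
        strategy_return(lottery_positions(len(log_returns), p_in, rng),
                        log_returns) >= gp_ret
        for _ in range(n_sim))
    return (1 + at_least_as_good) / (1 + n_sim)
```

A p-value below 0.05 would reject the null at the 95% confidence level used in the slides.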
2. Selecting the Right Time Series
Experiments [CIEF2007]: does low entropy imply better profitability of GP-induced trading rules?
NYSE US 100 stocks, daily data from 2000 to 2006
Experimental setup
- Entropy rate estimator: Kontoyannis et al. 1998
- Log price changes: $r_t = \ln\left(\frac{p_t}{p_{t-1}}\right)$
- Discretization: $\{r_t\} \in \mathbb{R} \to \{A_t\} \in \mathbb{N}$, e.g. 3, 4, 1, 0, 2, 6, 2, …
- Alphabet of size 8, with an equal number of values in each bin: the maximum theoretical entropy is $\log_2 8 = 3$ bits
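The pipeline above (log price changes, equal-frequency discretization into 8 symbols, match-length entropy-rate estimate) can be sketched as follows. The estimator shown is one common increasing-window form of the Kontoyannis et al. (1998) match-length estimator, H ≈ [(1/n) Σ Λ_i / log2(i+1)]⁻¹; the exact variant and edge handling used in [CIEF2007] may differ, and tie-breaking in the binning is an assumption.

```python
import math


def log_returns(prices):
    """r_t = ln(p_t / p_{t-1}) for consecutive prices."""
    return [math.log(cur / prev) for prev, cur in zip(prices, prices[1:])]


def discretize(returns, n_bins=8):
    """Equal-frequency discretization: each return is replaced by the index
    (0..n_bins-1) of its quantile bin, so every bin holds ~len/n_bins values.
    Ties are broken by position (an assumption)."""
    order = sorted(range(len(returns)), key=lambda i: returns[i])
    symbols = [0] * len(returns)
    for rank, i in enumerate(order):
        symbols[i] = rank * n_bins // len(returns)
    return symbols


def match_length(seq, i):
    """Lambda_i: 1 + length of the longest match between the string starting
    at position i and a string starting at some earlier position j < i."""
    best = 0
    for j in range(i):
        length = 0
        while i + length < len(seq) and seq[j + length] == seq[i + length]:
            length += 1
        best = max(best, length)
    return best + 1


def entropy_rate(seq):
    """Increasing-window match-length estimate, in bits per symbol:
    H = [ (1/n) * sum_i Lambda_i / log2(i + 1) ]^(-1)."""
    terms = [match_length(seq, i) / math.log2(i + 1) for i in range(1, len(seq))]
    return len(terms) / sum(terms)
```

Usage would be `entropy_rate(discretize(log_returns(closing_prices)))`. Long repeated patterns give long matches and hence a low estimate; the naive matching is quadratic, which is acceptable for series of a few thousand daily observations.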
Entropy of NYSE US 100 stocks, period 2000-2006
[Figure: histogram of the estimated entropy of the 100 stocks (x-axis: entropy, 2.66 to 2.80; y-axis: density)]
NB: a normal distribution with the same mean and standard deviation is plotted for comparison.
- Mean = Median = 2.75
- Max = 2.79
- Min = 2.68
- Random sequence from Boost's generator = 2.96
- Random sequence from the C library's rand() = 2.77!
Entropy is high but price time series are not random!
[Figure: two histograms of entropy (x-axis: 2.65 to 2.85; y-axis: density), one for the original time series and one for the randomly shuffled time series]
Stocks in the distribution’s tails

Lowest entropy time series
Symbol   Entropy
TWX      2.677
EMC      2.694
C        2.712
JPM      2.716
GE       2.723

Highest entropy time series
Symbol   Entropy
OXY      2.789
VLO      2.787
MRO      2.785
BAX      2.78
WAG      2.776
Autocorrelation analysis
Up to lag 100, there are 2.7 times more autocorrelations outside the 99% confidence bands for the lowest-entropy stocks than for the highest-entropy stocks.
[Figure panels: autocorrelations of a low-complexity stock (C) and of a high-complexity stock (OXY)]
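Counting autocorrelations outside the 99% bands can be sketched as below. The ±2.576/√n band is the usual large-sample band under a white-noise null; whether the original study used exactly this band (rather than, e.g., Bartlett's formula) is an assumption.

```python
import math


def autocorrelations(x, max_lag=100):
    """Sample autocorrelation rho_k for k = 1..max_lag (biased estimator)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    var = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / var
            for k in range(1, max_lag + 1)]


def count_outside_bands(x, max_lag=100, z=2.576):
    """Number of lags whose autocorrelation lies outside +/- z/sqrt(n)
    (z = 2.576 gives a two-sided 99% band under a white-noise null)."""
    bound = z / math.sqrt(len(x))
    return sum(abs(r) > bound for r in autocorrelations(x, max_lag))
```

Applied to the log returns of each stock, the ratio of the counts for the low- and high-entropy groups gives the kind of comparison reported above; for white noise, about 1 lag in 100 is expected outside the band.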
BDS tests: are daily log price changes i.i.d.?

Lowest entropy time series
m   δ   TWX     EMC     C       JPM     GE
2   1   18.06   14.21   13.9    11.82   11.67
3   1   22.67   19.54   18.76   16.46   16.34
5   1   34.18   29.17   28.12   26.80   24.21

Highest entropy time series
m   δ   OXY     VLO     MRO     BAX     WAG
2   1   5.66    4.17    6.69    8.13    7.45
3   1   6.61    5.35    9.40    11.11   8.89
5   1   9.04    6.88    13.08   15.31   11.17

The null hypothesis that log price changes are i.i.d. is always rejected at the 1% level but, whatever the BDS parameters, the rejection is much stronger for the low-entropy stocks.
Results: surprisingly …
On high-entropy stocks:
- GP is always profitable
- LT is never better than GP (95% confidence level)
- GP outperforms LT 2 times out of 5 (95% C.L.)
On low-entropy stocks:
- GP is never better than LT (95% C.L.)
- LT outperforms GP 2 times out of 5 (95% C.L.)
Explanations (1/2)
GP is not good when the training period is very different from the out-of-sample period, e.g.:
[Figure: price series from 2000 to 2006 of a typical low-complexity stock (EMC) and a typical high-complexity stock (MRO)]
Explanations (2/2)
The 2 cases where GP outperforms LT: the training period is quite similar to the out-of-sample period
[Figure: price series from 2000 to 2006 of BAX and WAG]
4. Re-thinking the data division scheme
Data division scheme
- There is multiple evidence that GP performs poorly when the training interval differs from the out-of-sample interval …
- What is needed: a characterization of market conditions, i.e. a similarity measure
- Re-learning triggered when the similarity or the performance falls below a threshold
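The slide does not specify a similarity measure, so the sketch below uses one plausible choice, the two-sample Kolmogorov-Smirnov distance between the return distributions of the training window and a recent window, and triggers re-learning when similarity or recent performance drops below a threshold. The measure, the thresholds and the function names are all assumptions.

```python
def ks_distance(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between the
    empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def needs_relearning(train_returns, recent_returns, recent_perf,
                     min_similarity=0.8, min_perf=0.0):
    """Hypothetical trigger: re-run GP when the recent market looks too
    different from the training period or the rule stops performing."""
    similarity = 1.0 - ks_distance(train_returns, recent_returns)
    return similarity < min_similarity or recent_perf < min_perf
```

Any other characterization of market conditions (volatility regime, trend strength, entropy) could be plugged in place of `ks_distance`.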
6. Re-thinking fitness functions
Rethinking fitness functions
(fitness-landscape figure from [LaPo02])
- Issue 1: some fitness functions induce a “difficult” landscape for GP → GP-friendly fitness
- Issue 2: a few lucky trades alone may lead to an outstanding return → risk-adjusted fitness
- Issue 3: solutions located on peaks of the fitness landscape are not robust out-of-sample → sensitivity-adjusted fitness
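The slide names the two remedies but gives no formulas. A minimal sketch, assuming a Sharpe-like ratio as the risk-adjusted fitness and, as the sensitivity-adjusted fitness, the average fitness over randomly perturbed copies of a rule's numeric constants (±5% here, an arbitrary choice):

```python
import random
import statistics


def risk_adjusted_fitness(daily_returns):
    """Sharpe-like ratio: mean daily return divided by its standard deviation.
    A few lucky trades raise the mean but also the dispersion."""
    sd = statistics.pstdev(daily_returns)
    if sd == 0:
        return 0.0  # ratio undefined for a constant return series
    return statistics.fmean(daily_returns) / sd


def sensitivity_adjusted_fitness(fitness, params, rel_noise=0.05,
                                 n_samples=20, seed=0):
    """Average fitness over randomly perturbed copies of the rule's constants,
    so that isolated sharp peaks of the landscape are penalized."""
    rng = random.Random(seed)
    samples = [fitness([p * (1 + rng.uniform(-rel_noise, rel_noise)) for p in params])
               for _ in range(n_samples)]
    return statistics.fmean(samples)
```

With these definitions, a rule whose total return comes from one lucky day scores lower than a rule earning the same total steadily, and a rule sitting on a narrow spike of the landscape scores lower than one on a broad plateau.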
7. Embedding more domain-specific knowledge
Embedding more domain-specific knowledge
- The choice of the function/terminal sets is crucial, yet there are no guidelines; 2 risks:
  - extraneous functions
  - required functions not available
- As yet, GP uses a very primitive language
- Enrich the primitive set with volume, indexes, bid/ask spread, …
- Enrich the function set with cross-correlation, predictability measures, …
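As an illustration of enriching the primitive set, the sketch below adds a lagged cross-correlation function and extra data terminals. The registry layout (name mapped to arity and implementation) and the terminal names `volume`, `index` and `bid_ask_spread` are hypothetical, not taken from any particular GP library.

```python
import math


def cross_correlation(x, y, lag=0):
    """Sample correlation between x[t] and y[t - lag] over their overlap
    (lag >= 0); returns 0.0 when either window is constant."""
    xs, ys = x[lag:], y[:len(y) - lag]
    n = min(len(xs), len(ys))
    xs, ys = xs[:n], ys[:n]
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


# Hypothetical enriched function set: name -> (arity, implementation).
FUNCTION_SET = {
    "add": (2, lambda a, b: a + b),
    "sub": (2, lambda a, b: a - b),
    "lt": (2, lambda a, b: a < b),
    "xcorr": (2, cross_correlation),  # new domain-specific primitive
}

# Hypothetical enriched terminal set: series the evolved programs may read.
TERMINAL_SET = ("close", "volume", "index", "bid_ask_spread")
```

Returning 0.0 on constant windows keeps the primitive total, so evolved programs never fail on degenerate inputs.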
References (1/2)
[ChKuHo06] S.-H. Chen, T.-W. Kuo and K.-M. Hoi, “Genetic Programming and Financial Trading: How Much about ‘What we Know’”, 4th NTU International Conference on Economics, Finance and Accounting, April 2006.
[ChNa06] S.-H. Chen and N. Navet, “Pretests for genetic-programming evolved trading programs: ‘zero-intelligence’ strategies and lottery trading”, Proc. ICONIP’2006, Hong Kong, October 2006.
[ChNa07] S.-H. Chen and N. Navet, “Failure of Genetic-Programming Induced Trading Strategies: Distinguishing between Efficient Markets and Inefficient Algorithms”, Chapter 8 in Evolutionary Computation in Economics and Finance: Volume 2, Springer, ISBN 3540728201, 2007.
[NaCh07] N. Navet and S.-H. Chen, “Entropy rate and profitability of technical analysis: experiments on the NYSE US 100 stocks”, 6th International Conference on Computational Intelligence in Economics & Finance (CIEF2007), Salt Lake City, USA, July 2007.
[Kab02] M. Kaboudan, “GP Forecasts of Stock Prices for Profitable Trading”, Evolutionary Computation in Economics and Finance, Kluwer, 2002.
References (2/2)
[SaTe02] M. Santini and A. Tettamanzi, “Genetic Programming for Financial Series Prediction”, Proceedings of EuroGP’2001, 2001.
[BhPiZu02] S. Bhattacharyya, O. V. Pictet and G. Zumbach, “Knowledge-Intensive Genetic Discovery in Foreign Exchange Markets”, IEEE Transactions on Evolutionary Computation, vol. 6, no. 2, April 2002.
[LaPo02] W. B. Langdon and R. Poli, “Foundations of Genetic Programming”, Springer-Verlag, 2002.
[Kab00] M. Kaboudan, “Genetic Programming Prediction of Stock Prices”, Computational Economics, vol. 16, 2000.
[Wag03] L. Wagman, “Stock Portfolio Evaluation: An Application of Genetic-Programming-Based Technical Analysis”, Genetic Algorithms and Genetic Programming at Stanford 2003, 2003.
[Dem05] I. Dempsey, “Constant Generation for the Financial Domain using Grammatical Evolution”, Proceedings of the 2005 Workshops on Genetic and Evolutionary Computation, pp. 350-353, Washington, June 25-26, 2005.
[Kei02] M. Keijzer, “Scientific Discovery using Genetic Programming”, PhD Thesis, DTU, Lyngby, Denmark, 2002.