Random Artificial Incorporation of Noise in a Learning Classifier System Environment
Ryan J. Urbanowicz, Nicholas A. Sinnott-Armstrong, and Jason H. Moore
Dartmouth Medical School
GECCO Dublin, Ireland - 2011
Genetic Epidemiology
• Association Study (Case/Control)
• Single Nucleotide Polymorphism (SNP)
• Allele & Genotype
Subject #1
-- AGGTCA ---- AGGTCA --
Subject #2
-- AGGTCA ---- AGCTCA --
Subject #3
-- AGCTCA ---- AGCTCA --
Two alleles (G and C)
Three genotypes (GG, GC, CC)
Encode genotypes (0, 1, 2)
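A tiny sketch (illustrative, not from the slides) of this encoding step; the function name and minor-allele choice are assumptions:

```python
# Minimal sketch: encode a SNP genotype as the count of the minor allele,
# yielding the three states 0, 1, 2 used throughout the talk.
def encode_genotype(genotype: str, minor_allele: str = "C") -> int:
    """Map a two-character genotype string, e.g. 'GC', to 0/1/2."""
    return sum(1 for allele in genotype if allele == minor_allele)

print(encode_genotype("GG"), encode_genotype("GC"), encode_genotype("CC"))  # 0 1 2
```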
“One SNP at a time approach”
“Complex systems approach”
Epistasis
[Diagram: the "one SNP at a time" approach tests SNP1, SNP2, and SNP3 against Disease independently; the "complex systems" approach maps the three SNPs jointly to Disease (epistasis).]
Main Effects
Penetrance table for a purely epistatic two-locus model (SNP 1 genotypes AA/Aa/aa, SNP 2 genotypes BB/Bb/bb), with marginal penetrances at the edges:

            AA(.25)  Aa(.5)  aa(.25) | Marginal
  BB(.25)      0        1       0    |   0.5
  Bb(.5)       1        0       1    |   0.5
  bb(.25)      0        1       0    |   0.5
  Marginal    0.5      0.5     0.5

All marginal penetrances equal 0.5, so neither SNP shows a main effect. By contrast, a single-SNP main effect looks like:

            AA(.25)  Aa(.5)  aa(.25)
  SNP X        0        0       1
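As a worked check (illustrative code, not from the slides), each marginal penetrance is the genotype-frequency-weighted average of the penetrance values over the other SNP:

```python
# Marginal penetrance of each SNP 1 genotype, weighted by SNP 2 genotype frequency.
freqs = {"BB": 0.25, "Bb": 0.50, "bb": 0.25}   # SNP 2 genotype frequencies
penetrance = {                                  # rows: SNP 2; columns: SNP 1 (AA, Aa, aa)
    "BB": [0, 1, 0],
    "Bb": [1, 0, 1],
    "bb": [0, 1, 0],
}
for col, name in enumerate(["AA", "Aa", "aa"]):
    marginal = sum(freqs[row] * penetrance[row][col] for row in penetrance)
    print(name, marginal)  # each prints 0.5: no main effect at SNP 1
```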
Genetic Heterogeneity
[Figure: a sample population partitioned into subgroups G1–G8, each of which may associate a different genetic model with the same disease.]
• Evidence of GH in…
– Autism
– Schizophrenia
– Breast Cancer
– Alzheimer disease
– Tuberous sclerosis
– Cystic Fibrosis
– Asthma
– And many, many others…
Learning Classifier Systems
Urbanowicz, R.J. & Moore, J.H. (2009). Learning Classifier Systems: A Complete Introduction, Review, and Roadmap. Journal of Artificial Evolution and Applications.
[Diagram: the canonical Michigan-style LCS learning cycle. Detectors encode the Environment's current instance; classifiers in the Population [P] (Classifier_n = Condition : Action :: Parameter(s)) that match it form the Match Set [M]; a Prediction Array drives Action Selection; Effectors perform the chosen Action and the Environment returns a Reward, which Credit Assignment distributes over the Action Set [A] (and, in multi-step problems, the previous [A]_{t-1}). Covering and the Genetic Algorithm make up the Discovery Component; matching and action selection make up the Performance Component; credit assignment makes up the Reinforcement Component, per the adopted Learning Strategy.]
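A minimal sketch of the matching and covering steps at the heart of this cycle (illustrative Python; the names, the supervised action argument, and the wildcard probability are assumptions, not the authors' implementation):

```python
import random

WILDCARD = "#"

def matches(condition, instance):
    """A condition matches when every non-# position equals the instance's value."""
    return all(c == WILDCARD or c == x for c, x in zip(condition, instance))

def cover(instance, action, p_wild=0.5):
    """Covering: build a rule from the instance, generalizing positions at random."""
    condition = [WILDCARD if random.random() < p_wild else x for x in instance]
    return {"condition": condition, "action": action, "accuracy": 1.0}

def form_match_set(population, instance, action):
    """Collect all matching classifiers; invoke covering if none match."""
    match_set = [r for r in population if matches(r["condition"], instance)]
    if not match_set:
        new_rule = cover(instance, action)
        population.append(new_rule)
        match_set.append(new_rule)
    return match_set

pop = []
m = form_match_set(pop, [0, 2, 1, 0, 1, 1, 0, 2, 2, 0], action=1)  # triggers covering
```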
Application domains:
• Autonomous Robotics
• Complex Adaptive Systems
• Function Approximation
• Classification
• Data Mining
Effective Generalization
• Effective generalization: maximizing rule generality while preserving accuracy (testing acc. = training acc.).
• Examples of LCS generalization mechanisms:
– Generalization Hypothesis (Wilson 1995)
– Action Set GA & Subsumption (Wilson 1998)
– Hierarchical Selection operator (Bacardit & Garrell 2002)
– Windowing (Bacardit et al. 2004)
– Minimum Description Length (Bacardit & Garrell 2007)
– Ensemble LCS (Gao et al. 2005)
• Noisy problem domains:
– Over-fitting becomes a particularly important problem.
– Classification noise: < 100% testing accuracy is possible.
– Attribute noise: attributes that contribute nothing to testing accuracy.
Example general rules (10 attributes, genotypes encoded 0/1/2, # = don't care):
1 0 # # # 0 0 # # # - 1
0 2 # # 1 # # # # # - 0
Hypothesis
• Given: in a noisy problem, LCSs with accuracy-based fitness will tend to over-fit (learn structure idiosyncratic to the training dataset).
• Consider: datasets with a small sample size are particularly susceptible to this (online learning repeatedly revisits the same samples).
• If: we probabilistically incorporate variable noise into the incoming training instances, then in every epoch of learning the Michigan LCS is exposed to a randomly permuted version of the original dataset, artificially inflating the effective sample size.
• Hypothesis: the incorporation of low levels of random classification noise will discourage over-fitting and promote effective generalization.
RAIN: Random Artificial Incorporation of Noise
[Diagram: a training instance from the Environment (0210110220, class 1) is randomly permuted (to 0210120220: one attribute's genotype flipped) before being presented to the Population [P] for Match Set [M] and Correct Set [C] formation.]
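A minimal sketch of the permutation step shown above (illustrative; the per-attribute noise model and state set are assumptions consistent with the example):

```python
import random

def rain_permute(instance, p_current, states=(0, 1, 2)):
    """With probability p_current per attribute, replace the genotype with a
    different random state before the instance is presented to the LCS."""
    noisy = list(instance)
    for i, value in enumerate(noisy):
        if random.random() < p_current:
            noisy[i] = random.choice([s for s in states if s != value])
    return noisy

# e.g. rain_permute([0,2,1,0,1,1,0,2,2,0], 0.01) usually returns the instance
# unchanged, occasionally flipping one attribute, as in the example above.
```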
Temporal Models
• Pm = maximum permutation probability
• Pc = current permutation probability
• Im = maximum iteration
• Ic = current iteration
Four schedules for Pc as a function of Ic (plausible forms are sketched below):
• Uniform
• Linear
• Inverse Linear
• Gaussian
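The slides name these four schedules without giving formulas; the forms below are plausible assumptions for how Pc could vary with Ic:

```python
import math

def p_current(model, p_max, i_cur, i_max, sigma=0.2):
    """Assumed schedule forms for Pc; only the schedule names come from the slides."""
    t = i_cur / i_max                      # training progress in [0, 1]
    if model == "uniform":
        return p_max                       # constant noise level
    if model == "linear":
        return p_max * t                   # noise ramps up over training
    if model == "inverse_linear":
        return p_max * (1.0 - t)           # noise ramps down over training
    if model == "gaussian":
        return p_max * math.exp(-((t - 0.5) ** 2) / (2 * sigma ** 2))  # peaks mid-run
    raise ValueError(model)
```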
Power Estimation
Example rule population (10 attributes; # = don't care):
1 0 # # # 0 0 # # # - 1
0 2 # # 1 # # # # # - 0
# # # 1 0 2 1 # # # - 0
# # 1 0 # # # # # # - 0
0 1 # # # # # # 2 # - 1
# # 0 0 # # # 0 0 1 - 1
1 2 0 0 1 # # # # 2 - 1
1 1 1 # # # # # # # - 0
2 # 2 # # 1 # 0 # # - 0
# 1 # # # # # # # # - 1
# # 1 1 # # # 2 # # - 0
Per-attribute count of '#' across the population: 5 5 5 6 8 8 9 8 9 9. Attributes specified most often (the lowest counts) are the candidate predictive attributes.
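The counts line can be reproduced directly from the population above (illustrative code; rules transcribed from the slide):

```python
rules = [
    "10###00###", "02##1#####", "###1021###", "##10######",
    "01######2#", "##00###001", "12001####2", "111#######",
    "2#2##1#0##", "#1########", "##11###2##",
]
# Tally how often each attribute position is generalized ('#') across the rules;
# frequently specified attributes (low counts) are the candidate predictive ones.
counts = [sum(r[i] == "#" for r in rules) for i in range(10)]
print(counts)  # [5, 5, 5, 6, 8, 8, 9, 8, 9, 9]
```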
Model 1 (penetrance of SNP 1 × SNP 2):

            SNP 2
            2  1  0
  SNP 1  2  0  1  0
         1  1  0  1
         0  0  1  0

Model 2 (penetrance of SNP 3 × SNP 4):

            SNP 4
            2  1  0
  SNP 3  2  0  0  0
         1  0  0  0
         0  0  0  1

Each simulated dataset embeds Model 1 in attributes 1–2 and Model 2 in attributes 3–4; the remaining attributes are noise.
Power at the CV level: the underlying model must be detected in > 50% of the 10 CV runs.
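A small sketch of that tally (illustrative; the data layout is an assumption):

```python
def cv_level_success(cv_detections):
    """True when more than half of the 10 CV runs detect the model's attributes."""
    return sum(cv_detections) > len(cv_detections) / 2

def power(replicates):
    """Fraction of replicate datasets whose CV runs clear the > 50% threshold."""
    return sum(cv_level_success(r) for r in replicates) / len(replicates)

print(power([[True] * 7 + [False] * 3, [True] * 4 + [False] * 6]))  # 0.5
```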
Targeted RAIN
• Idea: strategically and automatically avoid destructively adding noise to attributes that are more likely to be important to classification.
• Probabilistically targets attributes that are more frequently generalized (rather than specified).
• Pc = Pm (no temporal schedule).
• Two implementations (weight lists generated differently):
– Targeted Generality (TG)
– Targeted Fitness-Weighted Generality (TFWG)
• Noise generation (same for both implementations; see the sketch below):
– First epoch: no noise.
– Weight list recalculated at the end of each epoch.
– Subtract the minimum weight in the list from all values in the list.
– Determine the number of attributes to be permuted (random < Pm).
– Choose each attribute by roulette wheel selection.
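A sketch of the TG weight list and roulette-wheel attribute choice, following the steps above (illustrative; the exact weight definition and per-slot draw are assumptions):

```python
import random

def tg_weights(population, n_attrs):
    """Targeted Generality: weight each attribute by how often it is
    generalized ('#') across the rule population, minus the list minimum."""
    weights = [sum(rule["condition"][i] == "#" for rule in population)
               for i in range(n_attrs)]
    low = min(weights)
    return [w - low for w in weights]

def choose_attributes(weights, p_max):
    """For each attribute slot, permute with probability p_max; pick which
    attribute to permute by roulette-wheel selection over the weights."""
    chosen, total = [], sum(weights)
    for _ in range(len(weights)):
        if random.random() < p_max and total > 0:
            spin, acc = random.uniform(0, total), 0.0
            for i, w in enumerate(weights):
                acc += w
                if spin <= acc:
                    chosen.append(i)
                    break
    return chosen
```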
Experimental Evaluation
• UCS
– Iterations = [50000, 100000, 200000, 500000]
– Micro pop. size = 1600
– Other parameters at their defaults
– Tracked: training acc., testing acc., generality, macro pop. size, run time
– Power to find both or a single underlying model
– Pm = 0.001, 0.01, 0.05, 0.1
• Each Dataset
– Main-effect free
– 2 × two-locus epistatic interactions
– 20 attributes
– Balanced
– Minor allele frequencies = 0.2
– Heritability = 0.2
– Mix ratio = 50:50
– Sample sizes = [200, 400, 800, 1600]
– 20 replicates
• 80 simulated datasets × 10-fold CV = 800 runs of UCS
Conclusions & Future Work
• Incorporation of RAIN with equal attribute probability is ineffective.
• Targeted RAIN reduced over-fitting (a significant decrease in training accuracy without reducing testing accuracy).
• Improvements in power (not statistically significant) suggest that RAIN may improve UCS's ability to identify predictive attributes.
• Future work:
– Try RAIN on datasets with much larger numbers of attributes.
– Combine targeted RAIN with temporal models.
– Explore a larger range of Pm values.
– Implement RAIN with an adaptive Pm.
Acknowledgements
Jason Moore & Nicholas A. Sinnott-Armstrong
Funding Support
NIH: AI59694, LM009012, LM010098
William H. Neukom 1964 Institute for Computational Science at Dartmouth College
Quaternary Rule Representation
[#, 0, 1, 2]
Each attribute in a rule condition takes one of four states: a specific genotype value (0, 1, or 2) or the wildcard #.