New Generative Adversarial Networks (GANs) for data imbalance · 2020. 4. 21. · 2. In Insurance...
Generative Adversarial Networks (GANs) for data imbalance
Armando Vieira & Javier Rodriguez Zaurin
Data imbalance in insurance: why does it matter?
1. Myths Around Insurance Modeling
Statistics are there for a reason...
▪ Data follows Poisson / overdispersed Poisson / Gamma distributions
▪ There are well-defined quantities we can always compute:
  □ mean
  □ standard deviation
  □ skewness
▪ Data is stationary (i.e. the rate at which events occur is constant)
▪ Events are independent (i.e. the occurrence of one event does not affect the probability that a second event will occur)
▪ Pooling solves the issues
2. The mean is meaningless
In insurance, only two types of clients matter:
❏ Clients that never claim
❏ Clients with large claims
By focusing on the extremes we can better model the important factors
3. Imbalanced Datasets
Techniques to handle imbalanced data
SMOTE + Tomek
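
As a rough illustration of what SMOTE + Tomek does, here is a minimal standard-library sketch, not the implementation used for the talk. SMOTE creates synthetic minority rows by interpolating between a minority row and one of its nearest minority neighbours; Tomek-link cleaning then removes majority rows that are mutual nearest neighbours of a minority row. The function names, the toy data, and the choice to drop only the majority member of each link are assumptions made for this example; other variants drop both members.

```python
import math
import random


def nearest_neighbors(point, candidates, k):
    """Return the k candidates closest to `point` (Euclidean), excluding `point` itself."""
    others = [c for c in candidates if c is not point]
    return sorted(others, key=lambda c: math.dist(point, c))[:k]


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority rows by interpolating towards a random neighbour."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbor = rng.choice(nearest_neighbors(base, minority, k))
        t = rng.random()  # position along the segment base -> neighbor
        synthetic.append(tuple(b + t * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


def remove_tomek_links(rows, labels, majority_label=0):
    """Drop the majority row of every Tomek link (mutual nearest neighbours of opposite class)."""
    def nn_index(i):
        return min((j for j in range(len(rows)) if j != i),
                   key=lambda j: math.dist(rows[i], rows[j]))

    nearest = [nn_index(i) for i in range(len(rows))]
    drop = set()
    for i, j in enumerate(nearest):
        if nearest[j] == i and labels[i] != labels[j]:
            drop.add(i if labels[i] == majority_label else j)
    keep = [i for i in range(len(rows)) if i not in drop]
    return [rows[i] for i in keep], [labels[i] for i in keep]


# Toy data (hypothetical): label 0 = no claim (majority), label 1 = claim (minority).
majority = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2), (0.9, 0.9), (0.4, 0.0)]
minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.2)]

synthetic = smote(minority, n_new=3, k=2)
rows = majority + minority + synthetic
labels = [0] * len(majority) + [1] * (len(minority) + len(synthetic))
rows, labels = remove_tomek_links(rows, labels)
```

Because every synthetic row is a convex combination of two existing minority rows, it always lies on the segment between them; this is also why SMOTE cannot produce minority examples outside the region the minority class already covers.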
GANs: Generative Adversarial Networks
GANs
GANs have been very effective for images, but what about tabular data?
Conditional GAN
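
For reference, the conditional GAN objective is usually written as below. This is the standard textbook form, not a formula taken from the slides. Here $x$ is a real record, $z$ is noise, and $y$ is the conditioning label; reading $y$ as the claim / no-claim class is an assumption for this setting.

$$
\min_G \max_D \; V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x \mid y)\big] + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z \mid y) \mid y)\big)\big]
$$

Conditioning on $y$ is what makes the model useful for imbalance: the generator can be asked for samples of the minority class specifically.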
CycleGAN
Horse ↔ Zebra
or ...
No Claim ↔ Claim
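
The idea of translating between two domains and back can be summarised by the standard CycleGAN cycle-consistency loss, shown here as a reference form rather than the exact loss used in the talk. The notation is an assumption: $X$ is the no-claim domain, $Y$ the claim domain, $G : X \to Y$ and $F : Y \to X$ are the two generators.

$$
\mathcal{L}_{\text{cyc}}(G, F) = \mathbb{E}_{x \sim p_X}\big[\lVert F(G(x)) - x \rVert_1\big] + \mathbb{E}_{y \sim p_Y}\big[\lVert G(F(y)) - y \rVert_1\big]
$$

This term is added to the two adversarial losses, so a no-claim record mapped to a claim record and back should return close to where it started.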
CycleGAN for fraud detection + sparse Autoencoders
Topology preserving CycleGAN
Results
| Method | Precision | Recall | F1-score |
|---|---|---|---|
| No augmentation | 0.88 | 0.77 | 0.79 |
| SMOTE | 0.94 | 0.79 | 0.85 |
| ADASYN | 0.79 | 0.76 | 0.77 |
| cGAN | 0.90 | 0.85 | 0.87 |
| cycleGAN | 0.92 | 0.83 | 0.87 |
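
As a reminder of how the three metrics relate, the F1-score of a single class is the harmonic mean of its precision and recall. The helper below is a small sketch with a hypothetical function name; the slides do not say how the reported scores were averaged (per class, per fold, or otherwise), so they need not match this formula exactly.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall taken from the cGAN row of the results.
cgan_f1 = f1_score(0.90, 0.85)
```

For the cGAN row this gives roughly 0.874, in line with the reported 0.87.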
Implementation Details
▪ Data has to be converted into a dense representation
▪ Domains have to be consistently related, i.e. retain the same semantic features
▪ Loads of tricks (weight clipping, regularizers, training schedule)
▪ Consistency over time (i.e. stationarity)
▪ Avoid mode collapse
▪ Enforce diversity with conditional batch normalization and noise
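
One possible reading of the "dense representation" point is sketched below: each categorical value is mapped to a small dense vector (an embedding) and each numeric column is scaled to [0, 1], so that every record becomes a fixed-length vector of floats. The slides do not specify the encoding actually used; the column names, the embedding size, and the random initialisation are assumptions, and in a real model the embeddings would be learned rather than fixed.

```python
import random


def build_embedding_table(categories, dim, seed=0):
    """Assign each distinct category a small random dense vector (a stand-in for a learned embedding)."""
    rng = random.Random(seed)
    return {c: [rng.gauss(0.0, 0.1) for _ in range(dim)] for c in sorted(set(categories))}


def min_max_scale(values):
    """Scale numeric values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for constant columns
    return [(v - lo) / span for v in values]


def encode_rows(rows, cat_key, num_key, dim=4):
    """Turn each record into one dense vector: category embedding followed by the scaled numeric value."""
    table = build_embedding_table([r[cat_key] for r in rows], dim)
    scaled = min_max_scale([r[num_key] for r in rows])
    return [table[r[cat_key]] + [s] for r, s in zip(rows, scaled)]


# Hypothetical policy records.
policies = [
    {"vehicle_type": "van", "premium": 420.0},
    {"vehicle_type": "car", "premium": 310.0},
    {"vehicle_type": "van", "premium": 980.0},
]
dense = encode_rows(policies, "vehicle_type", "premium")
```

Records that share a category share the same embedding components, which is one way to keep the semantic features of the two domains aligned.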
How to train GANs
▪ Small (decreasing) learning rate
▪ Different optimizers for encoder and generator
▪ Train the generator multiple times for each discriminator update
▪ Batch normalization and Instance Batch Normalization
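
Two of the points above, a decreasing learning rate and several generator updates per discriminator update, can be sketched as a simple update plan. This is only an illustration of the schedule, with no actual networks involved; the base learning rate, the decay factor, and the ratio of two generator steps per discriminator step are hypothetical values, not settings from the talk.

```python
def learning_rate(step, base_lr=2e-4, decay=0.999):
    """Exponentially decaying learning rate: small to begin with and shrinking every step."""
    return base_lr * decay ** step


def update_plan(n_steps, generator_steps=2):
    """List (step, network, lr) tuples: one discriminator update, then several generator updates."""
    plan = []
    for step in range(n_steps):
        lr = learning_rate(step)
        plan.append((step, "discriminator", lr))
        plan.extend((step, "generator", lr) for _ in range(generator_steps))
    return plan


plan = update_plan(n_steps=3)
```

In a real training loop each entry of the plan would correspond to one optimizer step on the named network, with separate optimizers for each network as the list above suggests.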
Conclusions:
▪ Focus on the extremes (less data, better results)
▪ Use augmentation to alleviate data imbalance
▪ GANs can be effective under some conditions