© 2020 Verisk Analytics, Inc. All rights reserved.
What Machine Learning Can Do for You
Shane De Zilwa, Ph.D.
Talmor Meir, Ph.D.
44 Zettabytes of Data in 2020
That is 40 times more bytes than there are stars in the observable universe.
Outline
❑ Analytics of big data
❑ Translating data into business value
❑ Predictive models & Data science techniques
Data Analytics
• Cross Domain
• Forecasting
• Artificial Intelligence
• Telematics
• Neural Network
• Time Series
• Audio Analytics
• Classification
• Social Network Analysis
• Internal Analytics: Sales/Marketing, HR
• Deep Machine Learning
• Data Exploration
• Lifetime Value Analysis
• Trends & Patterns
The 3V’s of Big Data
01 Volume: How much data is there?
• Transactions
• Sensors
• Terabyte
02 Velocity: How quickly is data accessed?
• Batch
• Real Time
• Periodic
• Stream
03 Variety: How many different types of sources are there?
• Structured
• Semi Structured
• Unstructured
Data Types
Structured
▪ Data having a defining structure
▪ e.g., Database
Semi-Structured
▪ Data with some defining pattern that does not conform to a table-like structure
▪ e.g., emails, smart phone photos
Unstructured
▪ Data that has no inherent structure
▪ e.g., video, social media text, images
Increasing Growth
Translating Data into Business Value
Cross Industry Standard Process for Data Mining (CRISP-DM)
• Business Understanding
• Data Understanding
• Data Preparation
• Modeling
• Evaluation
• Implementation
Data (at the center of the cycle)
Policy Life-Cycle
Start → Data Collection → Data Validation → Rating → Issue Policy: 3 to 4 weeks
• Automation
• Data Science
• Natural Language Processing
• Electronic Medical Records
• Connected Technology
• Advanced Modeling
Start → Data Collection → Data Validation → Rating → Issue Policy: 1 to 2 weeks (shortened time to policy)
In the Beginning…
• Predictive Models were built on structured data
• Credit Scoring was one of the first commercial applications
• Analytic Approaches included Supervised Models
Verisk Life Insurance Analytics
# | Customer_ID | State | # of Cards | Card Balances, $ | Card Limits, $ | Utilization, % | Age of Oldest Card, months | Time since delinquency, months | # of 60+ Day Delinquencies in last 12 months | Performance
1 | A12345 | CA | 2 | 8,000 | 10,000 | 80 | 60 | 5 | 1 | Bad
2 | B34567 | UT | 4 | 1,000 | 10,000 | 10 | 120 | 24 | 0 | Good
3 | B56789 | NV | 4 | 3,000 | 5,000 | 60 | 12 | N/A | 0 | Bad
4 | C12345 | NV | 3 | 6,000 | 12,000 | 50 | 60 | 36 | 1 | Good
5 | D45678 | NY | 1 | 3,000 | 10,000 | 30 | 72 | 48 | 0 | Good
6 | D67890 | MA | 2 | 5,000 | 20,000 | 25 | 180 | N/A | 0 | Good
7 | E23456 | TX | 6 | 6,000 | 8,000 | 75 | 120 | 24 | 2 | Bad
… | … | … | … | … | … | … | … | … | … | …
Predictive variables: the attribute columns. Target: Performance.
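A supervised model learns a mapping from predictive variables to the target. As a minimal sketch, assuming only the seven rows shown on this slide and a single predictor (utilization), the code below fits a one-split decision stump. The helper name `fit_stump` and the choice of a stump are illustrative assumptions, not the scoring method used in commercial credit models.

```python
# Toy rows copied from the slide's table: (balance $, limit $, performance).
ROWS = [
    (8000, 10000, "Bad"),
    (1000, 10000, "Good"),
    (3000, 5000, "Bad"),
    (6000, 12000, "Good"),
    (3000, 10000, "Good"),
    (5000, 20000, "Good"),
    (6000, 8000, "Bad"),
]


def utilization(balance, limit):
    """Utilization in percent: balance as a share of the credit limit."""
    return 100.0 * balance / limit


def fit_stump(rows):
    """Hypothetical helper: find the utilization threshold that best
    separates Good from Bad with the rule 'predict Bad above threshold'."""
    points = [(utilization(b, l), perf) for b, l, perf in rows]
    values = sorted({u for u, _ in points})
    # Candidate thresholds are midpoints between neighbouring values.
    candidates = [(lo + hi) / 2 for lo, hi in zip(values, values[1:])]
    best_threshold, best_accuracy = None, -1.0
    for t in candidates:
        correct = sum((u > t) == (perf == "Bad") for u, perf in points)
        accuracy = correct / len(points)
        if accuracy > best_accuracy:
            best_threshold, best_accuracy = t, accuracy
    return best_threshold, best_accuracy


threshold, accuracy = fit_stump(ROWS)
print(f"Predict Bad when utilization > {threshold:.1f}% "
      f"(training accuracy {accuracy:.0%})")
```

Seven rows are far too few to trust the learned threshold. A realistic model would use many predictive variables and would be evaluated on records held out from training.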
Natural Language Processing
• The natural first evolution from structured data
was to Natural Language Processing (NLP)
also known as Text Mining
Text Mining sources:
• Scientific Papers
• Web Content
• Social media posts
• Emails
• Regulatory Filings
• Insurance Forms
• Medical Records
Natural Language Processing – Bag of Words Approaches
• Initial approaches used a “Bag of Words” approach
– Disregarded grammar and word order
• Useful for document classification
• But, for information extraction, word order and grammar are important
I love you only
only I love you
I only love you
I love only you
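The four sentences above carry different meanings, yet a bag-of-words representation cannot tell them apart. The sketch below, which assumes simple whitespace tokenization and lowercasing, shows that all four reduce to the same word counts.

```python
from collections import Counter

SENTENCES = [
    "I love you only",
    "only I love you",
    "I only love you",
    "I love only you",
]


def bag_of_words(text):
    """Count each lowercased word; position in the sentence is discarded."""
    return Counter(text.lower().split())


bags = [bag_of_words(s) for s in SENTENCES]
all_identical = all(bag == bags[0] for bag in bags)

print(bags[0])
print("All four bags identical:", all_identical)
```

This is why the approach works for document classification, where topic words matter most, but falls short for information extraction.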
Natural Language Processing – Rule Based Approaches
• Context
– Proximity to keywords
– Regular Expressions
– Journalistic practices
A Raleigh man, Robert Stevens, is charged with insurance fraud and attempting to obtain property by false pretense after investigators with the North Carolina Department of Insurance say he filed a fraudulent insurance claim. Department of Insurance criminal investigator, Paul Hernandez accuses Robert Stevens, 37, of 1480 Mountain Road, Raleigh, of filing a fraudulent insurance claim with Mutual Insurance following a motor vehicle accident in Durham on Nov. 15, 2016, according to a release from the department. Brian Smith, the attorney representing Stevens, could not be reached for comment.
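The rule-based ideas on this slide can be sketched with Python's `re` module. The patterns below are illustrative assumptions written for this one passage: one relies on the journalistic convention "Name, age, of address, city", one matches abbreviated-month dates, and one uses proximity to the keyword "accuses". Real rule sets need many more patterns and careful testing.

```python
import re

# Excerpt from the news passage above.
TEXT = (
    "Department of Insurance criminal investigator, Paul Hernandez accuses "
    "Robert Stevens, 37, of 1480 Mountain Road, Raleigh, of filing a "
    "fraudulent insurance claim with Mutual Insurance following a motor "
    "vehicle accident in Durham on Nov. 15, 2016, according to a release "
    "from the department."
)

# Journalistic convention: "<Name>, <age>, of <street address>, <city>".
PERSON = re.compile(
    r"(?P<name>[A-Z][a-z]+(?: [A-Z][a-z]+)+), (?P<age>\d{1,3}), "
    r"of (?P<address>[^,]+), (?P<city>[A-Z][a-z]+)"
)

# Abbreviated-month dates such as "Nov. 15, 2016".
DATE = re.compile(r"(?:Jan|Feb|Aug|Sept|Oct|Nov|Dec)\. \d{1,2}, \d{4}")

# Keyword proximity: the two-word capitalized name right before "accuses".
ACCUSER = re.compile(r"(?P<accuser>[A-Z][a-z]+ [A-Z][a-z]+) accuses")

person = PERSON.search(TEXT).groupdict()
date = DATE.search(TEXT).group(0)
accuser = ACCUSER.search(TEXT).group("accuser")

print(person)
print(date)
print(accuser)
```

Rules like these are precise on text that follows the expected convention and brittle on text that does not, for example a name with a middle initial or a date written in another style.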
Natural Language Processing – Machine Learning
Instead of providing the machine with
rules, you provide it with many labeled
examples and let the machine learn the
patterns itself.
Example articles (openings of the overlapping text boxes on the slide):
• A Raleigh man, Robert Stevens, is charged with insurance fraud and attempting to obtain property by false pretense after investigators with the North Carolina Department of Insurance say he filed a fraudulent insurance claim. Department of Insurance criminal investigator, Paul Hernandez accuses Robert Stevens, 37, of 1480 Mountain Road, Raleigh, of filing a fraudulent insurance claim with Statefarm Insurance following a motor vehicle accident in Durham on Nov. 15, 2016, according to a release from the department. Brian Smith, the attorney representing Stevens, could not be reached for comment.
• FBI investigators charged Brad Philips with insurance related …
• Deepak Patel of Oakland California was arrested for …
• Tom Birch accuses his former lawyer, Jane Brown, of stealing the life …
• James Yang, 45, of Detroit, pleaded guilty to six counts …
• A Miami woman, Vanessa Smith, is in court for …
• A New York man pleaded guilty in federal court to scheming to defraud several insurance companies out of millions of dollars by taking out life insurance policies on a brother who had died. Jason Yang, 38, pleaded guilty to mail and wire fraud, the U.S. attorney said in a statement. Yang took out at least 18 life insurance policies in his brother’s name, which combined carried total coverage limits in excess of $10 million between May 2015 and June 2017. But Yang's brother had died in China in 2014, prosecutors said. Yang also took steps to make it appear as though his brother was alive by opening and using bank accounts in his brother’s name and renewing his brother’s driver’s license.
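To illustrate learning from labeled examples rather than hand-written rules, here is a minimal sketch of a Naive Bayes text classifier in plain Python. The slides do not name an algorithm, so Naive Bayes is a swapped-in choice; the training sentences and the labels "fraud" and "other" are invented for illustration, and the task shown is document classification, which is simpler than the entity extraction the slide depicts.

```python
import math
import re
from collections import Counter, defaultdict

# Invented toy training set of (text, label) pairs. Not real data.
EXAMPLES = [
    ("man charged with insurance fraud after filing a fraudulent claim", "fraud"),
    ("woman pleaded guilty to mail and wire fraud over life insurance policies", "fraud"),
    ("investigators accuse driver of filing a fraudulent insurance claim", "fraud"),
    ("insurer reports record quarterly profit and raises dividend", "other"),
    ("new office opens downtown to serve policy holders", "other"),
    ("company announces partnership to improve customer service", "other"),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(examples):
    """Count words per label and documents per label."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the label with the highest log-probability score,
    using add-one (Laplace) smoothing and skipping unseen words."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total_docs)
        for word in tokenize(text):
            if word in vocab:
                score += math.log(
                    (word_counts[label][word] + 1) / (total_words + len(vocab))
                )
        scores[label] = score
    return max(scores, key=scores.get)


model = train(EXAMPLES)
prediction = predict(model, "James Yang pleaded guilty to six counts of insurance fraud")
print(prediction)
```

No rule mentions the word "fraud" explicitly; the model picks up that association from the labeled examples. With only six training sentences the result is fragile, which is why the approach needs many labeled examples in practice.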
Definitions
Artificial Intelligence: Any technique that enables computers to mimic human intelligence, using logic, if-then rules, decision trees, and machine learning (including deep learning).
Machine Learning: A subset of AI that includes abstruse statistical techniques that enable machines to improve at tasks with experience. The category includes deep learning.
Deep Learning: The subset of machine learning composed of algorithms that permit software to train itself to perform tasks, like speech and image recognition, by exposing multilayered neural networks to vast amounts of data.
Deep Learning came to the forefront in Computer Vision
Trying to emulate Human learning
ImageNet Timeline
https://qz.com/1034972/the-data-that-changed-the-direction-of-ai-research-and-possibly-the-world/
Speech Analytics
Automated Speech Recognition (ASR)
Converting speech to text enables subsequent mining
Voice Attributes
e.g., gender, age, emotion, health
Ensemble Model
Combine ‘what was said’ with ‘how it was said’
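One simple way to combine the two signals is a weighted average of the scores from a text model and a voice-attribute model. The sketch below is an assumption for illustration only: the function name `ensemble_score`, the score range, and the weights are hypothetical, and the slides do not say how the ensemble is actually built.

```python
def ensemble_score(text_score, voice_score, text_weight=0.5):
    """Weighted average of two model scores, each assumed to lie in [0, 1].

    text_score  -- hypothetical score from the 'what was said' model
    voice_score -- hypothetical score from the 'how it was said' model
    """
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must be between 0 and 1")
    return text_weight * text_score + (1.0 - text_weight) * voice_score


# Example with made-up scores: the text model is trusted three times as much.
combined = ensemble_score(text_score=0.8, voice_score=0.4, text_weight=0.75)
print(round(combined, 3))
```

In practice the weights would be learned from data or replaced by a second-stage model, rather than set by hand.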
Voice Analytics
Voice Signal → Pre-processing → Rules-based, Machine-learning based, and Computer-vision based models → Ensemble
What Can Machine Learning Do for You?
Life UW
Structured: Demographic, Lifestyle, Credit
Unstructured
• Text: Medical Records, APSs
• Voice: Recorded statements, tele-interviews
• Images: Medical Scans, Social Media feeds
Machine Learning Is Not a Magic Bullet
• Lots of data
• Representative datasets
• Explainability
• Palatability
• Test, test, test
• Beware the unexpected!!
• Big Data
• Deep Learning
• Artificial Intelligence
• Neural Networks
• Computer Vision
• Natural Language Processing
• Supervised Learning
• Speech Processing
CONCLUSION:
Machine Learning can help you