ASK-the-Expert: Active learning based knowledge discovery using the expert
Kamalika Das, Data Sciences Group
NASA Ames Research Center
ML Workshop, August 2017
Problem
• Identify safety events in flight operational data
• Unsupervised anomaly detection
• SME review of anomalies
[Figure: unsupervised anomaly detection produces statistical flight anomalies, each marked OS or NOS.]
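Unsupervised anomaly detection
• Lack of definition of ‘safety’ incident
• One-class SVM based anomaly detection
[Figure: two panels on axes x1 and x2 illustrating one-class SVM anomaly detection; Θ is marked in the figure.]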
S. Das, B. Matthews, A. Srivastava, N. Oza. 2010. Multiple kernel learning for heterogeneous anomaly detection: algorithm and aviation safety case study. In Proceedings of the 16th ACM SIGKDD (KDD '10). 47-56.
State of the art
Existing system (pipeline diagram):
• Data collection: FAA facilities (TRACON, ARTCC)
• Data processing: data filter; feature selection and normalization; data merge; calculate flight separation and turn-to-final features
• MKAD (unsupervised anomaly detection): splits flights into nominals and anomalies
• Labels on the anomalies identify operationally significant events
Proposed approach
Active learning with rationales framework (diagram):
• Input features → MKAD → anomalies and nominals
• Active learner (2-class classification/ranking algorithm) applies an active learning strategy to the anomalies and sends the SME an instance for labeling
• SME returns a label and a rationale, which are used for training
• Output: operationally significant anomalies and uninteresting anomalies
Active learning framework
[Diagram: the statistical anomalies are flights described by features f1, f2, f3, …, fn plus an extra column f*. Bootstrap samples seed a labeled pool (labels O or N); the remaining flights form an unlabeled pool (labels shown as ?). The active learner picks a flight x* from the unlabeled pool as the sample to label, and the SME returns its label y* (OS in the example) together with a rationale (loss of separation in the example).]
Active learner:
• Model: 2-class multiple kernel SVM
• Active learning strategy: most likely positive
• Automated feature construction: multiple kernel learning + decision tree construction
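The query loop on this slide can be sketched as follows. This is a minimal illustration, not the tool's implementation: the deck's model is a 2-class multiple kernel SVM, whereas the scorer below is a nearest-centroid stand-in chosen only to keep the sketch dependency-free. Rationale handling is omitted, and every name and the toy data are hypothetical.

```python
import random

def centroid(rows):
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def positive_score(x, labeled):
    """Stand-in scorer (assumption): higher means 'more likely OS'.

    Requires at least one positive and one negative labeled example,
    which the bootstrap samples are assumed to provide.
    """
    pos = [f for f, y in labeled if y == 1]
    neg = [f for f, y in labeled if y == 0]
    return sq_dist(x, centroid(neg)) - sq_dist(x, centroid(pos))

def most_likely_positive(unlabeled, labeled):
    """Index of the unlabeled flight with the highest positive score."""
    return max(range(len(unlabeled)),
               key=lambda i: positive_score(unlabeled[i], labeled))

def active_learning(labeled, unlabeled, oracle, budget):
    """Query the oracle (the SME) up to `budget` times, growing the labeled pool."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(min(budget, len(unlabeled))):
        i = most_likely_positive(unlabeled, labeled)
        x = unlabeled.pop(i)              # flight x*: the sample to label
        labeled.append((x, oracle(x)))    # the SME supplies the label y*
    return labeled

# Toy data (hypothetical): NOS flights cluster near (0, 0), OS flights near (1, 1).
random.seed(0)
pool = [[random.gauss(c, 0.1), random.gauss(c, 0.1)] for c in [0] * 20 + [1] * 5]
oracle = lambda x: int(x[0] + x[1] > 1.0)
bootstrap = [([0.0, 0.0], 0), ([1.0, 1.0], 1)]
result = active_learning(bootstrap, pool, oracle, budget=5)
queried = result[len(bootstrap):]
```

Because the strategy always asks about the flight the current model rates most likely to be positive, the expert's time in this toy run goes to the flights near the OS cluster first.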
ASK-the-Expert tool: architecture
K. Das, I. Avrekh, B. Matthews, M. Sharma, N. Oza. 2017. ASK-the-Expert: Active learning based knowledge discovery using the expert. In Proceedings of ECML-PKDD 2017. To be published.
Annotator component
Coordinator component
Multiple kernel support vector machine
• 2-class SVM objective:
• Decision function:
• Multiple kernel 2-class SVM: classifying between operationally significant (OS) and uninteresting (NOS) flights
[Figure: each flight time series (rows 1, 2, 3, …, m) is described by a feature set f1, f2, …, fn. One kernel is computed per feature, with kernel weights η1, …, ηn, and the kernels are combined as a weighted average of all feature kernels.]
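For reference, the textbook soft-margin forms of the two equations named on this slide, together with the weighted kernel combination, can be written as below. The notation (w, b, ξ_i, C, α_i, φ, N training flights, K_m for the kernel on feature m) is assumed here and may differ from the slide's own.

```latex
% 2-class soft-margin SVM objective (standard form, notation assumed)
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{N}\xi_i
\quad \text{s.t.}\quad y_i\bigl(w^{\top}\phi(x_i)+b\bigr)\ge 1-\xi_i,\ \ \xi_i\ge 0

% Decision function in kernel form
f(x)=\operatorname{sign}\Bigl(\sum_{i=1}^{N}\alpha_i\,y_i\,K(x_i,x)+b\Bigr)

% Multiple kernel: weighted average of the per-feature kernels
K(x_i,x_j)=\sum_{m=1}^{n}\eta_m\,K_m(x_i,x_j)
```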
Rationale feature construction
• How to set weights η1, η2, …, ηn s.t. ηm ≥ 0 and ∑ηm = 1
• SimpleMKL algorithm
  – Modified objective function
  – Alternates between optimizing classifier margin and weights of kernels
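To make the weight constraints concrete, here is a small sketch of a combined kernel whose weights are non-negative and sum to 1. It is not SimpleMKL: the weights are simply clipped and rescaled rather than optimized, and each feature is treated as a scalar with an RBF kernel, whereas the flight features in the deck are time series. Function names, the kernel choice, and the data are assumptions.

```python
import math

def rbf_kernel_1d(a, b, gamma=1.0):
    """RBF kernel on a single scalar feature."""
    return math.exp(-gamma * (a - b) ** 2)

def normalize_weights(raw):
    """Clip negative weights to zero and rescale so the weights sum to 1."""
    clipped = [max(0.0, w) for w in raw]
    total = sum(clipped)
    if total == 0:
        raise ValueError("at least one kernel weight must be positive")
    return [w / total for w in clipped]

def combined_kernel(x, z, eta, gamma=1.0):
    """Weighted average of one kernel per feature: sum_m eta_m * K_m(x, z)."""
    return sum(w * rbf_kernel_1d(a, b, gamma) for w, a, b in zip(eta, x, z))

def gram_matrix(X, eta, gamma=1.0):
    """Combined-kernel Gram matrix for a list of feature vectors."""
    return [[combined_kernel(xi, xj, eta, gamma) for xj in X] for xi in X]

# Hypothetical raw weights and three flights with three scalar features each.
eta = normalize_weights([2.0, 1.0, 1.0])
X = [[0.0, 1.0, 2.0], [0.5, 1.0, 0.0], [3.0, -1.0, 2.0]]
K = gram_matrix(X, eta)
```

With weights on the simplex and each per-feature kernel equal to 1 on identical inputs, the diagonal of the combined Gram matrix is also 1, which is a quick sanity check on the weighting.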
Rationale feature construction
• Decision tree induction
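The slide names decision tree induction without giving details. One plausible reading, assumed here, is that a shallow tree turns a rationale into a threshold on an existing feature. The sketch below fits a depth-one tree (a decision stump) by minimizing weighted Gini impurity; the data and function names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 for an empty list)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_threshold(values, labels):
    """Return the split t minimizing weighted Gini for the rule `value > t`.

    Returns None when all values are identical and no split exists.
    """
    pairs = sorted(zip(values, labels))
    distinct = sorted({v for v, _ in pairs})
    best_t, best_impurity = None, float("inf")
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2.0
        left = [y for v, y in pairs if v <= t]
        right = [y for v, y in pairs if v > t]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if impurity < best_impurity:
            best_t, best_impurity = t, impurity
    return best_t

# Hypothetical maximum-overshoot values with OS (1) / NOS (0) labels.
overshoot = [0.1, 0.2, 0.3, 0.9, 1.1, 1.4]
label = [0, 0, 0, 1, 1, 1]
threshold = best_threshold(overshoot, label)
```

On this toy data the chosen split falls between the largest NOS value and the smallest OS value, so the rule `overshoot > threshold` separates the two groups.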
Data
Original features:
• Latitude
• Longitude
• Altitude
• Ground speed
• Horizontal separation
• Vertical separation
• Aircraft size
• Turn-to-final (TTF) parameters:
  – Maximum overshoot
  – Speed at TTF
  – Distance at TTF
  – Angle at TTF
  – Altitude difference at TTF
• Nearest neighboring (NN) flight info:
  – NN flight on same runway
  – NN flight on parallel runway
  – NN flight part of the same flow
[Figure: altitude, vertical separation, and horizontal separation of flights relative to the runway.]
Rationale features
“Loss of separation”
• Horizontal separation < 3 miles AND vertical separation < 1000 ft AND nearest neighboring flight is not on parallel runways and not part of the same flow
“Large overshoot”
• Maximum overshoot is greater than a threshold based on values of flights with positive labels
“Unusual flight path”
• Overall deviation from expected (average) trajectory of all landing flights on that runway
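The first two rationales can be expressed as boolean features on a flight record. The sketch below is illustrative only: the `Flight` type and its field names are hypothetical, and the overshoot threshold is passed in as a parameter because the slide does not say how it is derived from the positively labeled flights.

```python
from dataclasses import dataclass

@dataclass
class Flight:
    # Field names are hypothetical; units follow the slide (miles, feet).
    horizontal_sep_miles: float
    vertical_sep_ft: float
    nn_on_parallel_runway: bool
    nn_in_same_flow: bool
    max_overshoot: float

def loss_of_separation(f: Flight) -> bool:
    """True when both separations are below the limits and the nearest
    neighboring flight is neither on a parallel runway nor in the same flow."""
    return (f.horizontal_sep_miles < 3.0
            and f.vertical_sep_ft < 1000.0
            and not f.nn_on_parallel_runway
            and not f.nn_in_same_flow)

def large_overshoot(f: Flight, threshold: float) -> bool:
    """True when the maximum overshoot exceeds the supplied threshold."""
    return f.max_overshoot > threshold

# Two hypothetical flights with identical separations.
a = Flight(2.5, 800.0, False, False, 1.2)
b = Flight(2.5, 800.0, True, False, 0.2)
features_a = (loss_of_separation(a), large_overshoot(a, 1.0))
features_b = (loss_of_separation(b), large_overshoot(b, 1.0))
```

Flight `b` has the same separations as `a` but its nearest neighbor is on a parallel runway, so only `a` triggers the loss-of-separation feature.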
[Figure: expected trajectory versus actual trajectory from begin point to landing point, showing deviation from expected path; the separation limits are marked as vertical separation < 1000 ft and horizontal separation < 3 miles.]
Experimental setup
• Dataset: 30 NM airspace around Denver International Airport for Aug 2014
  – Training set: ~2400 flights
  – Statistical anomalies: 153
  – OS flights: 24
• 2-fold cross validation with 10 random bootstraps for each fold
Performance analysis
• Metrics: precision@5 and precision@10
• Most-likely positive strategy
[Figure: learning curves for different active learning strategies.]
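Precision@k, the metric used on this slide, is the fraction of the k top-ranked flights that are truly OS. A minimal sketch, with hypothetical scores and labels:

```python
def precision_at_k(scores, labels, k):
    """Fraction of the k highest-scored flights whose label is 1 (OS).

    Divides by k even when fewer than k flights are supplied.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    top = ranked[:k]
    return sum(y for _, y in top) / k

# Hypothetical classifier scores and ground-truth labels for ten flights.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
p5 = precision_at_k(scores, labels, 5)    # 3 of the top 5 are OS
p10 = precision_at_k(scores, labels, 10)  # 4 of the top 10 are OS
```

The metric looks only at the head of the ranking, so it scores how useful the first few flights shown to a reviewer are rather than overall accuracy.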
Performance analysis
75% savings in labeling effort
[Figure: learning curves for most likely positive strategy with and without rationales.]
M. Sharma, K. Das, M. Bilgic, B. Matthews, D. Nielsen, N. Oza. 2016. Active Learning with Rationales for Identifying Operationally Significant Anomalies in Aviation. In Proceedings of ECML-PKDD 2016. pp. 209-225.
Performance analysis
[Table: comparison of the number of labeled flights required by various strategies to achieve a target performance measure. ‘n/a’ means that the target performance cannot be achieved by a method even with 45 labeled flights.]
Performance benefits
• Generalization
  – Two different test data sets: July 2014 and July 2015
  – Average improvement in precision@5: ~30%
  – Average improvement in precision@10: ~65%
• Review time
  – Up to 75% reduction in review time for same target performance
Summary
• Up to 75% reduction in SME review time
• Method and tool are agnostic to domain
  – Can be tailored to work in any domain suffering from lack of labeled data
Acknowledgement
• This work is supported by Center Innovation Fund (CIF) 2017 award
• Team:
  – Nikunj Oza, NASA Ames Research Center
  – Bryan Matthews, SGT Inc.
  – Illya Avrekh, SGT Inc.
  – Manali Sharma, PhD Student, Illinois Institute of Technology
  – Sayeri Lala, Undergraduate Student, Massachusetts Institute of Technology
Thank You