NSA SKYNET

download NSA SKYNET

of 15

Transcript of NSA SKYNET

  • 8/9/2019 NSA SKYNET

    1/40

  • 8/9/2019 NSA SKYNET

    2/40

  • 8/9/2019 NSA SKYNET

    3/40

  • 8/9/2019 NSA SKYNET

    4/40

  • 8/9/2019 NSA SKYNET

    5/40

  • 8/9/2019 NSA SKYNET

    6/40

  • 8/9/2019 NSA SKYNET

    7/40

  • 8/9/2019 NSA SKYNET

    8/40

  • 8/9/2019 NSA SKYNET

    9/40

  • 8/9/2019 NSA SKYNET

    10/40

  • 8/9/2019 NSA SKYNET

    11/40

  • 8/9/2019 NSA SKYNET

    12/40

  • 8/9/2019 NSA SKYNET

    13/40

  • 8/9/2019 NSA SKYNET

    14/40

  • 8/9/2019 NSA SKYNET

    15/40

  • 8/9/2019 NSA SKYNET

    16/40

  • 8/9/2019 NSA SKYNET

    17/40

  • 8/9/2019 NSA SKYNET

    18/40

  • 8/9/2019 NSA SKYNET

    19/40

  • 8/9/2019 NSA SKYNET

    20/40

  • 8/9/2019 NSA SKYNET

    21/40

  • 8/9/2019 NSA SKYNET

    22/40

    Given a handful of courier selectors, can we find othersthat “behave similarly” by analyzing GSM metadata?

    It’s worth noting that:• we are looking for

    different people usingphones in similar ways• without using any call

    chaining techniquesfrom known selectors

    • by scanning throughall selectors seen inPakistan that have notleft Af/Pak (~55M)

    TOP SECRET//COMINT//REL TO USA, FVEY

    TOP SECRET//COMINT//REL TO USA, FVEY

  • 8/9/2019 NSA SKYNET

    23/40

  • 8/9/2019 NSA SKYNET

    24/40

    This presentation describes our search forAQSL couriers using behavioral profiling

    Preliminary SIGINT Findings

    Cross Validation Experimenton AQSL Couriers

    Behavioral Feature Extraction

    TOP SECRET//COMINT//REL TO USA, FVEY

    TOP SECRET//COMINT//REL TO USA, FVEY

  • 8/9/2019 NSA SKYNET

    25/40

    Counting unique UCELLIDs shows that courierstravel more often than typical Pakistani selectors

    TOP SECRET//COMINT//REL TO USA, FVEY

    TOP SECRET//COMINT//REL TO USA, FVEY

    TOP SECRET//COMINT//REL TO USA FVEY

  • 8/9/2019 NSA SKYNET

    26/40

    By examining multiple features at once, we can see someindicative behaviors of our courier selectors

    TOP SECRET//COMINT//REL TO USA, FVEY

    TOP SECRET//COMINT//REL TO USA, FVEY

  • 8/9/2019 NSA SKYNET

    27/40

    TOP SECRET//COMINT//REL TO USA FVEY

  • 8/9/2019 NSA SKYNET

    28/40

    Now, we’ll describe a cross validation experimenton the AQSL selectors that we were provided

    Cross Validation Experimenton AQSL Couriers

    TOP SECRET//COMINT//REL TO USA, FVEY

    TOP SECRET//COMINT//REL TO USA, FVEY

    TOP SECRET//COMINT//REL TO USA FVEY

  • 8/9/2019 NSA SKYNET

    29/40

    Our initial detector uses the centroid of the AQSLcouriers to “find other selectors like these”

    AQSL Cross-ValidationExperiment• 7 MSISDN/IMSI pairs• Hold each pair out

    and score them whentraining the centroidon the rest

    • Assume that random

    draws of Pakistaniselectors arenontargets

    • How well do we do?

    TOP SECRET//COMINT//REL TO USA, FVEY

    TOP SECRET//COMINT//REL TO USA, FVEY

    TOP SECRET//COMINT//REL TO USA FVEY

  • 8/9/2019 NSA SKYNET

    30/40

    Our initial detector uses the centroid of the AQSLcouriers to “find other selectors like these”

    AQSL Cross-ValidatExperiment• Initial experiments

    showed EER in

    10-20% range• Here, performance

    much worse againthese nontargets:

    • Seen in Pakistan• Not seen outside

    Af/Pak• Not FVEY selecto

    TOP SECRET//COMINT//REL TO USA, FVEY

    TOP SECRET//COMINT//REL TO USA, FVEY

    TOP SECRET//COMINT//REL TO USA FVEY

  • 8/9/2019 NSA SKYNET

    31/40

    Statistical algorithms are able to find the couriers at verylow false alarm rates, if we’re allowed to miss half of them

    Random ForestClassifier • 7 MSISDN/IMSI pairs• Hold each pair out and

    then try to find them af learning how to distingremaining couriers froother Pakistanis(using 100k random selectors here)

    • Assume that randomdraws of Pakistaniselectors are nontarge

    • 0.18% False Alarm Ra50% Miss Rate

    better

    TOP SECRET//COMINT//REL TO USA, FVEY

    TOP SECRET//COMINT//REL TO USA, FVEY

    TOP SECRET//COMINT//REL TO USA FVEY

  • 8/9/2019 NSA SKYNET

    32/40

    We’ve been experimenting with several errormetrics on both small and large test sets

    100k Test Selectors 55M Test Selectors

    Training Data Classifier Features

    False AlarmRate at 50%

    Miss Rate

    MeanReciprocal

    Rank

    TaskedSelectors in

    Top 500

    TaskedSelectors in

    Top 100

    None Random None 50%1/23k

    (simulated)

    0.64

    (active/Pak)

    0.13

    (active/Pak)

    KnownCouriers

    CentroidAll 20% 1/18k

    Outgoing

    43% 1/27k

    RandomForest

    0.18% 1/9.9 5 1+ AnchorySelectors

    Random Forest:• 0.18% false alarm rate at 50% miss rate• 7x improvement over random performance when

    evaluating its tasked precision at 100

    TOP SECRET//COMINT//REL TO USA, FVEY

    TOP SECRET//COMINT//REL TO USA, FVEY

    TOP SECRET//COMINT//REL TO USA, FVEY

  • 8/9/2019 NSA SKYNET

    33/40

    To get more training data we scraped selectors from S2I11Anchory reports containing keyword “courier”

    Anchory Selectors• Searched for reports

    containing “S2I11”

    AND “courier”• Filtered out non-mobile

    numbers and keptselectors with“interesting” travelpatterns seen inSmartTracker

    TOP SECRET//COMINT//REL TO USA, FVEY

    TOP SECRET//COMINT//REL TO USA, FVEY

    TOP SECRET//COMINT//REL TO USA, FVEY

  • 8/9/2019 NSA SKYNET

    34/40

    Adding selectors from Anchory reports to the training datareduced the false alarm rates even further

    Anchory Selectors• Searched for reports

    containing “S2I11”

    AND “courier”• Filtered out non-mob

    numbers and keptselectors with“interesting” travelpatterns seen inSmartTracker

    TOP SECRET//COMINT//REL TO USA, FVEY

    TOP SECRET//COMINT//REL TO USA, FVEY

    TOP SECRET//COMINT//REL TO USA, FVEY

  • 8/9/2019 NSA SKYNET

    35/40

    We’ve been experimenting with several errormetrics on both small and large test sets

    100k Test Selectors 55M Test Selectors

    Training Data Classifier Features

    False AlarmRate at 50%

    Miss Rate

    MeanReciprocal

    Rank

    TaskedSelectors in

    Top 500

    TaskedSelectors in

    Top 100

    None Random None 50%1/23k

    (simulated)

    0.64

    (active/Pak)

    0.13

    (active/Pak)

    KnownCouriers

    CentroidAll 20% 1/18k

    Outgoing

    43% 1/27k

    RandomForest

    0.18% 1/9.9 5 1+ AnchorySelectors

    0.008% 1/14 21 6

    Random Forest trained on Known Couriers + Anchory Selectors:• 0.008% false alarm rate at 50% miss rate• 46x improvement over random performance when

    evaluating its tasked precision at 100

    ,

    TOP SECRET//COMINT//REL TO USA, FVEY

    TOP SECRET//COMINT//REL TO USA, FVEY

  • 8/9/2019 NSA SKYNET

    36/40

    Now, we’ll investigate some findings after running theseclassifiers on +55M Pakistani selectors via MapReduce

    Preliminary SIGINT Findings

    ,

    TOP SECRET//COMINT//REL TO USA, FVEY

  • 8/9/2019 NSA SKYNET

    37/40

  • 8/9/2019 NSA SKYNET

    38/40

  • 8/9/2019 NSA SKYNET

    39/40

    TOP SECRET//COMINT//REL TO USA, FVEY

  • 8/9/2019 NSA SKYNET

    40/40

    Preliminary results indicate that we’re on theright track, but much remains

    Cross Validation Experiment: – Random Forest classifier operating at

    0.18% false alarm rate at 50% miss – Enhancing training data with Anchory

    selectors reduced that to 0.008% – Mean Reciprocal Rank is ~1/10

    Preliminary SIGINT Findings:

    – Behavioral features helped discoversimilar selectors with “courier-like”travel patterns

    – High number of tasked selectors atthe top is hopefully indicative of thedetector performing well “in the wild”