
Social Networks and Surveillance: Evaluating Suspicion by Association

Ryan P. Layfield, Dr. Bhavani Thuraisingham

Dr. Latifur Khan, Dr. Murat Kantarcioglu

The University of Texas at Dallas

{layfield, bxt043000, lkhan, muratk}@utdallas.edu

Overview

Introduction
►Our Goal
►System Design
►Social Networks
►Threat Detection
►Correlation Analysis

The Experiment
►Setup
►Current Results
►Issues
►Future Work

Introduction

Automated message surveillance is essential to communication monitoring
►Widespread use of electronic communication
►Exponential data growth
►Impossible to sift through all ‘by hand’

Going beyond basic surveillance
►Identifying groups rather than individuals
►Monitoring conversations rather than messages

Our Goal

Design new techniques and apply existing algorithms to…
►Create a machine-understandable model of existing social networks
►Identify abnormal conversations and behavior
►Monitor a given communications system in real time
►Continuously learn and adapt to a dynamic environment

System Design

Three major components:
►Social Network Modeler
►Initial Activity Detector
►Correlated Activity Investigator

Social Networks

Individuals engaged in suspicious or undesirable behavior rarely act alone

We can infer that those associated with a person positively identified as suspicious have a high probability of being either:
►Accomplices (participants in suspicious activity)
►Witnesses (observers of suspicious activity)

Making these assumptions, we create a context of association between users of a communication network

Social Networks

Within our model:
►Every node is a unique user
►Every message creates or strengthens a link between nodes

Over time, the network changes
►Frequent communication leads to stronger links
►Intermittent messaging implies weakening social ties

The strength of a link indicates how strongly two individuals are associated

From this data, we can theoretically identify
►Hubs
►Groups
►Liaisons
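The model above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not say how much a message strengthens a link or how quickly ties weaken, so the `boost` and `decay` parameters, the `tick` time step, and the class and method names are all hypothetical. Ranking users by total link strength is one crude way to surface hub candidates, not necessarily the method used in the system.

```python
from collections import defaultdict


class SocialNetwork:
    """Undirected weighted graph: nodes are users, link weights are tie strengths."""

    def __init__(self, boost=1.0, decay=0.9):
        self.boost = boost  # ASSUMPTION: strength added to a link per message
        self.decay = decay  # ASSUMPTION: multiplicative weakening per time step
        self.links = defaultdict(float)  # frozenset({a, b}) -> strength

    def record_message(self, sender, recipients):
        """Create or strengthen the link between the sender and each recipient."""
        for recipient in recipients:
            if recipient != sender:
                self.links[frozenset((sender, recipient))] += self.boost

    def tick(self):
        """Advance one time step: every link weakens unless new messages refresh it."""
        for pair in self.links:
            self.links[pair] *= self.decay

    def strength(self, a, b):
        """Current strength of the association between two users (0.0 if none)."""
        return self.links.get(frozenset((a, b)), 0.0)

    def hubs(self, top=1):
        """Rank users by total link strength as a crude hub indicator."""
        totals = defaultdict(float)
        for pair, weight in self.links.items():
            for user in pair:
                totals[user] += weight
        return sorted(totals, key=totals.get, reverse=True)[:top]
```

Calling `record_message` for each observed message and `tick` once per time window gives link strengths that grow with frequent communication and fade with intermittent messaging.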


Threat Detection

Every message sent is scrutinized in the interest of identifying suspicious communication
►Keyword analysis
►Prior context (i.e. previous message content)

When a detection algorithm yields a strong result, a token is created
►The token is created at the origin and passed to the recipient(s)
►Existing tokens, if any, are cloned instead

The result is a web that potentially reflects the dissemination of suspicious information
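One possible reading of the token mechanism is sketched below. It assumes that a flagged message from a sender with no tokens creates a new token at that sender, and that recipients always receive clones; the keyword-matching detector, the `Token` fields, and the `TokenTracker` class are hypothetical stand-ins, since the slides do not describe the actual detection algorithm or token structure.

```python
import itertools
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Token:
    token_id: int                    # unique per token instance
    topic_id: int                    # shared by an origin token and all its clones
    holder: str                      # user currently holding this token
    parent_id: Optional[int] = None  # token this one was cloned from, if any


def is_suspicious(text, keywords):
    """Stand-in detector: flags a message containing any watched keyword."""
    return bool(set(text.lower().split()) & keywords)


class TokenTracker:
    def __init__(self, keywords):
        self.keywords = set(keywords)
        self.held = defaultdict(list)  # user -> tokens held
        self.edges = []                # (sender, recipient, topic_id): the 'web'
        self._ids = itertools.count(1)

    def process(self, sender, recipients, text):
        """Handle one message; return the clones passed to recipients."""
        if not is_suspicious(text, self.keywords):
            return []
        sources = list(self.held[sender])
        if not sources:
            # No existing token: create a new one at the origin.
            new_id = next(self._ids)
            origin = Token(new_id, new_id, sender)
            self.held[sender].append(origin)
            sources = [origin]
        passed = []
        for source in sources:
            for recipient in recipients:
                clone = Token(next(self._ids), source.topic_id,
                              recipient, source.token_id)
                self.held[recipient].append(clone)
                self.edges.append((sender, recipient, source.topic_id))
                passed.append(clone)
        return passed
```

The `edges` list accumulates the web of token hand-offs, so following one `topic_id` through it traces how a flagged topic spread between users.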

Correlation Analysis

Future messages with similar suspicious topics are not always identifiable with the same ‘initial’ techniques
►Quick replies
►Pronoun use
►Assumption that recipient is aware of topic

If a token is present at the sender when a message is sent:
►The message the token is associated with and the new message are analyzed
►If analysis yields a strong match, the token is further cloned and passed to the recipient
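The correlation step can be illustrated as follows. The similarity measure here is plain word-set overlap (Jaccard similarity), chosen only as a simple stand-in; the slides do not specify the analysis at this point, and the function names and the `threshold` value are assumptions.

```python
def word_overlap(text_a, text_b):
    """Jaccard similarity between the word sets of two messages (stand-in measure)."""
    a = set(text_a.lower().split())
    b = set(text_b.lower().split())
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def forward_tokens(sender_token_texts, new_text, recipients, threshold=0.2):
    """Decide which of the sender's tokens should be cloned to the recipients.

    sender_token_texts: texts of the messages associated with the tokens
    the sender currently holds.
    Returns (recipient, origin_text) pairs, one per clone to create.
    """
    clones = []
    for origin_text in sender_token_texts:
        # Compare the message behind the token with the new outgoing message.
        if word_overlap(origin_text, new_text) >= threshold:
            clones.extend((recipient, origin_text) for recipient in recipients)
    return clones
```

A reply that would not trigger keyword detection on its own can still carry the token forward when it shares enough vocabulary with the message that created the token.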

The Experiment

Rare words shared between two or more messages are candidates for keyword analysis, but they are not always easily separated from ‘noise’

Noise within text-based messages comes in a variety of forms
►Misspelled words
►Unusual word choice
►Incompatible variations of the same language (e.g. British vs. American English)
►Unexpected language

However, we do not want to eliminate potential keywords
►Document names
►Terminology specific to a subject
►‘Buzz’ words

The Experiment

We proposed an experiment that attempts to eliminate false positives due to noisy data while strengthening and expanding our correlation techniques

Tools
►Running word ‘rank’ database
►Implementation of word set theory infrastructure
►JAMA Matrix Library (Singular Value Decomposition)

Our Approach
►Apply SVD noise filtering based on 100 messages
►Analyze word frequency correlation between current message and prior suspicious messages
►Generate a score based on the results

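Construct a matrix based on the last 100 messages
►Each entry is c = count(w, m), the count of word w in message m
►One axis indexes messages; the other indexes words, ordered from more common to less common

Decompose and rebuild
►Eliminate ‘weak’ singular values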
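The decompose-and-rebuild step corresponds to a truncated SVD reconstruction. The sketch below uses NumPy rather than the JAMA library named in the slides, purely for brevity; the `keep` parameter is hypothetical, since the slides do not say how many singular values are retained or how ‘weak’ is defined.

```python
import numpy as np


def svd_filter(counts, keep):
    """Rebuild a word-by-message count matrix from its `keep` largest singular values."""
    # Thin SVD: counts == u @ diag(s) @ vt, with s sorted in descending order.
    u, s, vt = np.linalg.svd(np.asarray(counts, dtype=float), full_matrices=False)
    s_filtered = s.copy()
    s_filtered[keep:] = 0.0  # eliminate the 'weak' singular values
    # Scaling the columns of u by s_filtered is equivalent to u @ diag(s_filtered).
    return (u * s_filtered) @ vt
```

With `keep` equal to the full number of singular values the input is reproduced unchanged (up to floating-point error); smaller values keep only the dominant co-occurrence patterns and suppress isolated counts.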

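Setup

‘Raw’ total score for word $w_i$:

$$\mathrm{score}(w_i) = \frac{\mathrm{count}(w_i, m_j) \;\square\; \mathrm{count}(w_i, m_k)}{\mathrm{rank}(w_i)}$$

►The counts are pulled from messages $j$ and $k$
►The rank is pulled from the ‘running’ word database
►$\square$ marks the operator joining the two counts, which is not legible in the source

Message-pair score, compared against a predefined fixed threshold:

$$\sum_{w_i \in W_j \cap W_k} \mathrm{score}(w_i)$$

►Counts only the intersection of words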
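A minimal sketch of the scoring step follows. Several details are assumptions and are flagged in the code: the operator combining the two counts is not legible in the slides, so addition is used here; the comparison direction against the threshold is taken to be "greater than"; and `default_rank` and `min_rank` are hypothetical parameters for words missing from, or ranked near zero in, the running word database.

```python
from collections import Counter


def correlation_score(message_j, message_k, rank, default_rank=1.0, min_rank=0.1):
    """Sum per-word scores over the words shared by two messages.

    rank: mapping from word to its value in the running word 'rank' database.
    """
    counts_j = Counter(message_j.lower().split())
    counts_k = Counter(message_k.lower().split())
    shared = counts_j.keys() & counts_k.keys()  # only the intersection counts
    total = 0.0
    for word in shared:
        # ASSUMPTION: the two counts are added; the source operator is unknown.
        combined = counts_j[word] + counts_k[word]
        # ASSUMPTION: clamp tiny ranks so rare words cannot divide by zero.
        total += combined / max(rank.get(word, default_rank), min_rank)
    return total


def is_correlated(message_j, message_k, rank, threshold):
    """ASSUMPTION: a pair is flagged when its score exceeds the fixed threshold."""
    return correlation_score(message_j, message_k, rank) > threshold
```

Because the rank sits in the denominator, a shared uncommon word contributes more than a shared common one, which is the weighting the approach relies on.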

Current Results

Method is not currently accurate

Large fluctuations
►Correlation easily swayed by plethora of common words
►Uncommon words not given enough weight

Current Results

[Chart: Accuracy of Results over 900 Messages. Series: True Positives, False Positives, True Negatives, False Negatives]

1000 messages evaluated, first 100 used to seed word ranks.

Issues

Word frequencies fluctuate wildly at the beginning of the experiment (0.0 – 10.0+)

Extreme cost for current construction methods and computation

Filtering context limited to recent global history

Affected by large bodies of text

Future Work

Tap potential of existing matrix for further analysis

Adaptive filtering feedback algorithms

Speed improvements to accommodate real-time streams

Flexible communication platform monitoring

Addition of pipe architecture for modular threat detection and correlation
