The credit for creating these slides belongs to … · 2020-05-26 · Contributions A new approach...

The credit for creating these slides belongs to Fall 2014 CS 521/621 students. Student names have been removed per FERPA regulations.

Checking App Behavior Against App Descriptions

Is it Malicious?

● Is an app that sends a text message to a premium number to raise money suspicious? Maybe, but on Android, this is a legitimate payment method for unlocking game features.

● Is an app that tracks your current position malicious? Not if it is a navigation app, a trail tracker, or a map application.

● Is an application that takes all of your contacts and sends them to some server malicious? This is what WhatsApp, one of the world’s most popular mobile messaging applications, does upon initialization.

Research Questions

● How do we know an application does what it claims to do?

● How do we identify malware in an app market?

● How do we group together similar applications based on their description?

Key Idea

● By clustering Android apps by their description topics, we can identify outliers in each cluster by examining their API usage.

● Applications that are similar, in terms of app description, should also behave similarly.

● If an app makes an API call that other apps in the same cluster do not make, we can mark that app as an outlier.
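
The key idea can be illustrated with a minimal sketch. This is not CHABADA's actual detector: the helper `unusual_apis`, the `max_share` threshold, and the sample cluster are all hypothetical, chosen only to show what "an API call that other apps in the same cluster do not make" looks like in code.

```python
from collections import Counter

def unusual_apis(cluster, max_share=0.25):
    """Map each app to the APIs it calls that at most `max_share` of the cluster's apps call.

    `cluster` maps an app name to the set of sensitive APIs it uses.
    The 0.25 threshold is an arbitrary assumption for illustration.
    """
    n = len(cluster)
    # Count, for every API, how many apps in the cluster use it.
    usage = Counter(api for apis in cluster.values() for api in set(apis))
    flagged = {}
    for app, apis in cluster.items():
        rare = sorted(a for a in set(apis) if usage[a] / n <= max_share)
        if rare:
            flagged[app] = rare
    return flagged

# Hypothetical "navigation" cluster; app names and API lists are made up.
navigation = {
    "maps_a": {"getLastKnownLocation", "requestLocationUpdates"},
    "maps_b": {"getLastKnownLocation"},
    "trail_c": {"getLastKnownLocation", "requestLocationUpdates"},
    "route_d": {"getLastKnownLocation", "sendTextMessage"},
}

print(unusual_apis(navigation))
```

In this toy cluster every app reads the location, so location access is unremarkable, while the single app that sends text messages stands out.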

Contributions

A new approach called CHABADA (CHecking App Behavior Against Descriptions of Apps) to detect anomalies (mismatches between advertised and implemented behavior).

● Applied on a set of 22,500+ Android applications

● Identified the main topics for each application from their descriptions

● Sorted applications by related topics into 32 clusters (different from Google Play Store categories)

● Ranked the applications in each cluster by how abnormal their usage of sensitive APIs is

● Correctly identified 56% of malicious applications, without requiring known malware patterns

Example - GPS App

1. CHABADA starts with a collection of “good” apps.

2. It identifies the main topics (“weather”, “navigation”, ...).

3. It then clusters apps by related topics (Figure 1).

4. In each cluster, it identifies the APIs each app accesses (sensitive APIs only) (Figure 2).

5. It identifies outliers with respect to API usage and produces, for each cluster, a list of applications ranked by abnormality.

Example - GPS App

Figure 1

Example - GPS App

Figure 2

Clustering Apps by Description

● Obtained a total of 22,521 apps across all categories in the Google Play Store.

● Used NLP to reduce app descriptions to only English, functional base words.

● Fed the processed descriptions into LDA to form 30 “topics”, clusters of words commonly found together.

● Used the k-means clustering algorithm to group applications into 32 clusters.
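
The k-means step can be sketched as follows. This is a plain Lloyd's-algorithm sketch, not the authors' implementation. It assumes the LDA step has already produced a topic-affinity vector for each app; the vectors below (3 topics instead of 30), the app names, and the choice of `k=2` are hypothetical.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels

# Hypothetical per-app topic affinities (what an LDA step might output).
apps = {
    "nav_1": [0.9, 0.1, 0.0],
    "nav_2": [0.8, 0.2, 0.0],
    "weather_1": [0.1, 0.9, 0.0],
    "weather_2": [0.0, 0.8, 0.2],
}

names = list(apps)
labels = kmeans([apps[n] for n in names], k=2)

clusters = {}
for name, label in zip(names, labels):
    clusters.setdefault(label, []).append(name)
print(sorted(clusters.values()))
```

Clustering on topic affinities rather than raw words means two apps can land in the same cluster even when their descriptions share few exact terms.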

Clusters and Assigned Names

Identifying Outliers by APIs

● Extracted all API calls from the APK file via static analysis.

● Filtered to only sensitive APIs, which are those regulated by user-approved app permissions.

● Used machine learning to recognize abnormal API usage in each topic cluster.

● Trained an OC-SVM on a subset of each topic cluster, then used it to identify outlier applications.
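
The feature-extraction and ranking steps can be sketched as below. Note the substitution: instead of an OC-SVM, this sketch scores each app by its mean Hamming distance to the other apps in the cluster, a much simpler stand-in used only to keep the example dependency-free. The `SENSITIVE_APIS` list, the cluster contents, and both helper names are hypothetical.

```python
# Assumed list of permission-governed APIs; the real list is derived from
# Android's permission mapping and is far longer.
SENSITIVE_APIS = ["getLastKnownLocation", "sendTextMessage", "getDeviceId"]

def sensitive_vector(api_calls, sensitive_apis):
    """Binary feature vector: 1 if the app calls the sensitive API, else 0.

    Non-sensitive calls are dropped, mirroring the filtering step.
    """
    return [1 if api in api_calls else 0 for api in sensitive_apis]

def rank_by_abnormality(cluster, sensitive_apis):
    """Rank apps, most abnormal first, by mean Hamming distance to the rest.

    Stand-in for the OC-SVM; requires at least two apps in the cluster.
    """
    vectors = {app: sensitive_vector(calls, sensitive_apis) for app, calls in cluster.items()}
    scores = {}
    for app, vec in vectors.items():
        others = [v for name, v in vectors.items() if name != app]
        scores[app] = sum(sum(a != b for a, b in zip(vec, o)) for o in others) / len(others)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical "travel" cluster: app name -> all API calls found statically.
travel = {
    "guide_a": {"getLastKnownLocation", "loadUrl"},
    "guide_b": {"getLastKnownLocation"},
    "guide_c": {"getLastKnownLocation", "setText"},
    "guide_d": {"getLastKnownLocation", "getDeviceId", "sendTextMessage"},
}

ranking = rank_by_abnormality(travel, SENSITIVE_APIS)
print(ranking[0])
```

An OC-SVM plays the same role as the distance score here: it learns what "normal" API usage looks like for a cluster from examples of one class only, so no malware samples are needed for training.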

Example - Outlier API Usage

London Restaurants App; outlier API calls are shown in bold.

Evaluation

Does this technique effectively identify anomalies?

● Ran CHABADA on the 32 clusters and obtained the top 5 outliers for each cluster: 160 outliers in total

● Manually examined the sensitive behavior of these apps and classified each as:

○ Malicious - unadvertised behavior against user’s interest

○ Dubious - unadvertised behavior not necessarily against user’s interest

○ Benign - properly advertised behavior

Evaluation

Can this technique identify malicious Android applications?

● Collected 1,200 known malicious Android apps and searched for their corresponding descriptions, using title and package identifier

● Manually checked the 188 matches found and removed the non-English descriptions, resulting in 172 malicious apps

● Trained OC-SVM only on benign apps, thereby simulating a situation where every malware attack is novel

● Ran OC-SVM model as a classifier on benign apps and known malicious apps in each cluster
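
Scoring such a classifier run can be sketched as follows. The helper `detection_rates` and the sample labels are hypothetical and are not the study's data; the sketch only shows how the share of malicious apps flagged and the share of benign apps passed would be computed from per-app predictions.

```python
def detection_rates(predictions, truth):
    """Return (share of malicious apps flagged, share of benign apps not flagged).

    `predictions` maps app -> True if the classifier flagged it as an outlier.
    `truth` maps app -> True if the app is known to be malicious.
    """
    malicious = [app for app in truth if truth[app]]
    benign = [app for app in truth if not truth[app]]
    flagged_malicious = sum(predictions[app] for app in malicious)
    passed_benign = sum(not predictions[app] for app in benign)
    return flagged_malicious / len(malicious), passed_benign / len(benign)

# Made-up ground truth and classifier output for six apps.
truth = {"m1": True, "m2": True, "b1": False, "b2": False, "b3": False, "b4": False}
predictions = {"m1": True, "m2": False, "b1": False, "b2": True, "b3": False, "b4": False}

malware_found, benign_passed = detection_rates(predictions, truth)
print(malware_found, benign_passed)
```

Reporting both numbers matters: flagging every app would catch all malware but would also mark every benign app as an outlier.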

Evaluation

Can this technique identify malicious Android applications without clustering?

● Trained OC-SVM only on sensitive APIs and NLP-preprocessed words from the description

● All applications form one big cluster

Evaluation

Can this technique identify malicious Android applications using given categories?

● Clustered applications based on their categories in Google Play Store

● Repeated the experiment with the resulting 30 clusters

Discussion Questions

Would this tool be useful or worthwhile for Google to employ to find malware before it enters the app store?

What type of enhancements could be added to CHABADA in order for it to more successfully detect threats?

Why are benign applications identified as outliers by this technique?

What would be the effect of clustering the apps into more or fewer clusters (via the k-means algorithm)?

All of these approaches rely on static analysis. If dynamic analysis were used, what further information would it give the researchers?

Could this technique be applied to different application ecosystems? What would be the challenges of bringing this approach to the iOS App Store? Could it be used to identify abnormal Windows programs?

Bibliography

All information, data, and sample code taken from the article: