

Journal of Management Information Systems

ISSN: 0742-1222 (Print) 1557-928X (Online) Journal homepage: http://www.tandfonline.com/loi/mmis20

Identifying and Profiling Key Sellers in Cyber Carding Community: AZSecure Text Mining System

Weifeng Li, Hsinchun Chen & Jay F. Nunamaker Jr.

To cite this article: Weifeng Li, Hsinchun Chen & Jay F. Nunamaker Jr. (2016) Identifying and Profiling Key Sellers in Cyber Carding Community: AZSecure Text Mining System, Journal of Management Information Systems, 33:4, 1059-1086, DOI: 10.1080/07421222.2016.1267528

To link to this article: http://dx.doi.org/10.1080/07421222.2016.1267528

Published online: 10 Feb 2017.


Identifying and Profiling Key Sellers in Cyber Carding Community: AZSecure Text Mining System

WEIFENG LI, HSINCHUN CHEN, AND JAY F. NUNAMAKER JR.

WEIFENG LI ([email protected]; corresponding author) is a doctoral student in the Department of Management Information Systems and a research associate in the Artificial Intelligence Lab at the University of Arizona. His research interests include social media analytics, natural language processing, machine learning, and security informatics. His work has appeared in various conferences and workshops, including the International Conference on Information Systems, the Workshop on Information Technology and Systems, and the IEEE Conference on Intelligence and Security Informatics.

HSINCHUN CHEN ([email protected]) is University of Arizona Regents Professor and Thomas R. Brown Chair in Management and Technology in the Management Information Systems Department at the Eller College of Management. He joined the National Science Foundation (NSF) as program director of the Smart and Connected Health Program in September 2014. He received his Ph.D. in information systems from New York University. He is director of the Artificial Intelligence Lab, where he developed the COPLINK system, which has been cited as a national model for public safety information sharing and analysis, and has been adopted in more than 3,500 law enforcement and intelligence agencies. He is the author or editor of 20 books, 25 book chapters, 280 journal papers, and 150 refereed conference articles covering digital library, data/text/web mining, business analytics, security informatics, and health informatics. He is editor in chief of Security Informatics. He has received over 90 grants totaling more than $40 million in research funding from the NSF, National Institutes of Health, National Library of Medicine, Department of Defense, Department of Justice, Central Intelligence Agency, Department of Homeland Security, and other agencies.

JAY F. NUNAMAKER JR. ([email protected]) is Regents and Soldwedel Professor of MIS, Computer Science and Communication and director of the Center for the Management of Information and the National Center for Border Security and Immigration at the University of Arizona. He received his Ph.D. in operations research and systems engineering from Case Institute of Technology. He has held a professional engineer's license since 1965. He was inducted into the Design Science Hall of Fame and received the LEO Award for Lifetime Achievement from the Association for Information Systems. He was featured in the July 1997 issue of Forbes Magazine on technology as one of eight key innovators in information technology. He specializes in the fields of system analysis and design, collaboration technology, and deception detection. The commercial product GroupSystems ThinkTank, based on his research, is often referred to as the gold standard for structured collaboration systems. He founded the MIS Department at the University of Arizona in 1974 and served as department head for 18 years.

Color versions of one or more of the figures in the article can be found online at www.tandfonline.com/mmis

Journal of Management Information Systems / 2016, Vol. 33, No. 4, pp. 1059–1086.

Copyright © Taylor & Francis Group, LLC

ISSN 0742–1222 (print) / ISSN 1557–928X (online)

DOI: 10.1080/07421222.2016.1267528

ABSTRACT: The past few years have witnessed millions of credit/debit cards flowing through the underground economy and ultimately causing significant financial loss. Examining key underground economy sellers has both practical and academic significance for cybercrime forensics and criminology research. Drawing on social media analytics, we have developed the AZSecure text mining system for identifying and profiling key sellers. The system identifies sellers using sentiment analysis of customer reviews and profiles sellers using topic modeling of advertisements. We evaluated the AZSecure system on eight international underground economy forums. The system significantly outperformed all benchmark machine-learning methods on identifying advertisement threads, classifying customer review sentiments, and profiling seller characteristics, with an average F-measure of about 80 percent to 90 percent. In our case study, we identified the famous carder, Rescator, who was affiliated with the Target breach, and captured important seller characteristics in terms of product type, payment options, and contact channels. Our research leverages social media analytics to probe into the underground economy in order to help law enforcement target key sellers and prevent future fraud. It also contributes to our understanding of the use of information technology in detecting deception in online systems.

KEY WORDS AND PHRASES: carding community, cybersecurity, deep learning, fraud detection, online deception, social media analytics, topic modeling, underground economy.

Carding is the deceptive process of stealing, reselling, and ultimately using large volumes of payment information to commit fraud [40]. Carding has increasingly caused significant economic and societal loss. The number of confirmed carding incidents went from 14 in 2004 to 1,367 in 2013 [50]. Hundreds of millions of carding victims have been exposed to potential financial fraud. For instance, over 40 million credit/debit cards were leaked from Target in 2013; one-third of American households were affected by the leak of 83 million accounts at JPMorgan Chase in August 2014; and 56 million payment card records were stolen from Home Depot in 2014. In 2015, Carbanak, a carding advanced persistent threat (APT), was reported to have caused over 100 banks to suffer losses up to $1 billion [26]. Although carding involves a sequence of sophisticated deceptive processes, the international online underground economy has commoditized carding activities by providing a platform for exchanging carding-related products and services worldwide [23]. Specifically, the underground economy, which is often housed on carding forums, allows carders to easily advertise or acquire attack malware and stolen cards [19, 21]. Consequently, the underground economy has put card owners at greater risk of financial fraud.

Lately, the identification and profiling of key underground economy sellers have gained increasing traction in both academia and law enforcement [41]. Peretti [40] argued that "prosecuting and punishing" key sellers is a key solution to data breaches [p. 407]. Holt [21] called for law enforcement to "target" key sellers that are "reputable


and trustworthy" and "gather information in order to develop cases" [p. 175]. The recent arrest and conviction of several key sellers has rescued millions of cards from further dissemination and prevented myriad cases of potential financial fraud [48, 49]. However, the identification of key sellers in the underground economy has been challenging, considering a number of deceptive sellers who do not provide buyers with promised products and services [20]. While many English linguistic cues have been found to detect deception [18], the multilingual nature of the international underground economy poses great challenges to the application of these techniques. As underground economy forums allow buyers to comment on their purchase experience, we are motivated to make use of the customer reviews to tell deceptive sellers apart. Furthermore, little has been done to profile key sellers in terms of their specialties, which inspires us to capture sellers' characteristics from their advertisements.

Using a design science approach [37], we propose a text mining-based system for identifying and profiling the key sellers from carding forums. Our system is capable of effectively ranking sellers based on their quality and extracting sellers' characteristics. The proposed system uses two types of textual traces in carding forums: advertisements and customer reviews. While advertisements reflect major seller characteristics, including product or service descriptions, payment options, and contact channels, customer reviews reveal seller quality. The system leverages (1) deep learning-based sentiment analysis to evaluate seller quality based on customer reviews, and (2) topic modeling to profile sellers based on their advertisements. To the best of our knowledge, our system is the first to contribute to carding crime forensics by providing a means for large-scale cyber surveillance of carders, thereby alleviating law enforcement investigation efforts.

Literature Review

We provide a review of prior work from the following related research areas to form the basis of our study.

Underground Economy

The underground economy [20] is a vast international online black market for exchanging crime-related products and services, including vulnerabilities, malware, stolen data, host services, cash-out services, and spammers, to name only a few. Carders actively participate in the online underground economy to acquire tools or services for their malicious activities [23]. Sellers' participation in the carding forums, as well as their interactions with buyers, has resulted in two critical textual traces: advertisements and customer reviews.

Advertisements

Sellers rely heavily on advertisements as the major way of promoting products or services [17, 40]. Advertisements usually contain a thorough description of a product or service, the accepted payment options along with the prices, and the contact channels


[17]. The advertisement is a reflection of the seller's characteristics, from which we can profile the seller. For example, by categorizing the advertisement threads, we can determine the products or services in which the carder specializes. Advertisements (as shown in Figure 1) are relatively easy to distinguish from other forum posts for three reasons. First, the advertisements from the same seller use similar language and descriptions across forums [22]. This is because the seller posts the same advertisement multiple times to reach potential customers. To expedite this process, many sellers employ spamming scripts to automate advertisement posting [16, 17]. Second, the advertisements for the same product or service contain the same set of lexicons referring to a certain product or service feature, payment option, or contact channel. Third, advertisements all have an aesthetic appearance that regular posts usually do not have. To attract customers, advertisements often use capitalization, multicolored text, ASCII flares, and repeated sales pitches across multiple lines [17].
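The stylistic cues above lend themselves to simple numeric features. The sketch below is an illustrative feature extractor, not the paper's actual thread classification component; the cue set, the regular expression for ASCII flares, and the sample advertisement text are assumptions made for the example.

```python
import re

def ad_style_features(post: str) -> dict:
    """Extract simple stylistic cues that often distinguish underground-economy
    advertisements from regular forum posts (illustrative heuristics only)."""
    lines = [ln for ln in post.splitlines() if ln.strip()]
    letters = [c for c in post if c.isalpha()]
    # Heavy capitalization is a common attention-grabbing device in ads.
    caps_ratio = sum(c.isupper() for c in letters) / max(len(letters), 1)
    # ASCII "flares": decorative runs of punctuation such as ***** or =====.
    flares = len(re.findall(r"[*=~#\-]{4,}", post))
    # Repeated sales pitches: identical lines appearing more than once.
    repeated = len(lines) - len(set(lines))
    return {"caps_ratio": caps_ratio, "flare_runs": flares, "repeated_lines": repeated}

ad = "***** FRESH DUMPS FOR SALE *****\nICQ: 123456\n***** FRESH DUMPS FOR SALE *****"
print(ad_style_features(ad))
```

Features like these could feed any of the benchmark classifiers discussed later (e.g., SVM or naive Bayes) alongside lexical features.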

Customer Reviews

Because heterogeneous deception sensors [38] are lacking in computer-mediated communications, deception is pervasive in online social media [15]. It is especially so in the carding forums due to the absence of stringent regulations and the asymmetry of information [20]. Customer reviews serve as a crucial mechanism for building trust between buyers and sellers [22, 23, 24, 42]. Buyers leave comments regarding their

Figure 1. Illustration of Underground Economy. In the Screenshot, Rescator is Promoting His Dump Shop by Listing the Prices, Highlighting the Dumps, and So On


experiences with the sellers or the products and services beneath the advertisement post [21]. These comments may, in turn, improve prospective buyers' perceptions of the seller [25]. Exploratory research has found that customer reviews are critical for prospective buyers in assessing the seller quality [22, 32]. Therefore, sellers rely on positive comments to gain reputation, trust, and credibility [42]. For investigators, customer reviews collectively reflect the seller quality. Computational analysis of customer reviews would allow us to assess the sellers based on their quality and identify key sellers.

Since underground economies are hosted on carding forums [16, 17], we can leverage social media analytics to analyze textual traces in underground economy forums. Consequently, we review selected prior hacker social media analytics literature.

Hacker Social Media Analytics

We review research investigating hacker communities using hacker social media data and social media analytics techniques. The review of prior research is organized using a taxonomy of four dimensions (Table 1): research objective, data source, analysis approach, and features. In particular, textual features include attributes derived from the message body that have semantic meaning [43], and structural features include attributes from communication mechanisms, such as the number of posts, number of replies, and friendship connections [30].

Research Objectives

Three major themes of objectives can be identified in past research. The first theme is the general exploration of the hacker community for enriching our understanding of the hacker community [21, 23, 39, 42, 53, 56]. Prior researchers are interested in exploring the operational mechanisms of hacker communities [53] and social norms [42]. The second theme is the analysis of hacker organizations [24, 31, 55]. This line of research mainly tests the generalizability of criminal organization theory in the context of online hacker communities. The third, emerging theme focuses on the investigation of key hackers based on their reputation [2, 54]. This line of research relates to the identification of key sellers; however, there is a lack of evidence supporting the correlation between reputation scores and the seller quality.

Data Sources

There are two major sources of data: social network services data and hacker community data. Social network services data have been used to investigate hacker social organizations [24]. For example, Holt and Strumsky [24] experimented on LiveJournal, a Russia-based social network service, to build a social network characterizing key hacker interactions. As hackers increasingly congregate in dedicated hacker communities to


Table 1. A Taxonomy of Prior Hacker Social Media Analytics Research

Study | Objective | Data | Analysis Method | Manual Features | Textual Features | Structural Features
Benjamin et al. [9] | Identifying evidence of potential threats | HCD | IR | No | Yes | Yes
Odabas et al. [39] | Theorizing and analyzing the underground economy | HCD | CA | Yes | Yes | No
Zhang et al. [55] | Classifying hackers | HCD | CA | No | Yes | No
Abbasi et al. [2] | Identifying expert hackers and their specialties | HCD | SNA | No | Yes | Yes
Lau et al. [29] | Mining cybercriminal network | SNS, HCD | LDA | No | Yes | No
Zhang and Li [54] | Identifying characters of reputable hackers | HCD | SNA | No | No | Yes
Yip et al. [53] | Understanding forum-based underground economy mechanisms | HCD | CA, SNA | Yes | Yes | Yes
Holt [21] | Examining social dynamics in cybercrime markets | HCD | CA | Yes | Yes | No
Holt et al. [24] | Exploring Russian hacker social networks | SNS | SNA | No | No | Yes
Wang et al. [51] | Correlation between hacker discussions and attacks | HCD, VDB | LR | No | No | Yes
Zhao et al. [56] | Identifying adversaries | SNS | MB | No | No | Yes
Motoyama et al. [36] | Characterizing underground forums | HCD | SNA | No | No | Yes
Chu et al. [13] | Examining the creation and sale of malware | HCD | CA | Yes | Yes | No
Holt and Lampke [23] | Examining the nature of stolen data markets | HCD | CA | Yes | Yes | No
Lu et al. [31] | Studying hacker community organization | SD | SNA | Yes | Yes | Yes
Radianti [42] | Examining hacker social behavior leading to black market continuity | HCD | CA | Yes | Yes | No

Notes: CA = content analysis; CR = Cox regression; HCD = hacker community data; IR = information retrieval; LDA = latent Dirichlet allocation; LR = linear regression; MB = model building; SD = secondary data; SNA = social network analysis; SNS = social network services; VDB = vulnerability database.


share hacking knowledge and experiences, find collaborators, and acquire malware, there has been a trend to examine hacker community data [33]. For example, Yip et al. [53] showed that the forum-based underground economy had its unique operational mechanisms to sustain the black market. Following this stream of work, our research focuses on the forum-based underground economy where most advertisements and customer reviews reside.

Analysis Techniques

Three types of social media analytics techniques have been used: social network analysis, regression analysis, and content analysis. Widely adopted in hacker organization research, social network analysis has characterized the properties of hacker networks with centrality measures [2, 24, 31, 36, 54]. Yip et al. [53] further compared hacker networks to canonical social network models (e.g., preferential attachment) and discovered the uniqueness of the hacker networks. Regression analysis has been mostly leveraged to test the generalizability of theories in hacker communities. For example, Wang et al. [51] studied the relationship between the volume of cyber attacks and the number of attack-related threads. Content analysis employed grounded-theory methodology by manually examining hacker community texts [9, 39]. The findings include the operational mechanisms [21, 53], black market dynamics [13, 23], and social behavior [42]. The biggest drawback of these studies has been the lack of scalability and consistency. The automation of content analysis, including text classification and topic modeling, is needed in light of the large and diverse international hacker community. Text classification techniques, such as the Maximum Entropy classifier, have helped group similar documents together in many research contexts [34]. Topic modeling techniques, such as latent Dirichlet allocation, can automatically explore underlying topics from hacker community discussions.

Features

Both textual features and structural features have been used. While structural features are useful in representing communication activities [44], textual features are informative in carrying the semantics of the communication [1]. One major shortcoming of past textual feature analysis has been scalability. In contrast to the hundreds of thousands of records that structural feature analysis could handle, prior textual feature analysis has been able to process only several hundred records through manual coding. On the other hand, many significant insights regarding hacker communities, such as the underground economy dynamics [13, 23, 39] and hacker behaviors [42], were first discovered by analyzing textual features (i.e., content analysis). The lack of scalability undermines the significance of the findings and the generalizability of the method. While prior studies leveraging textual features have demonstrated their potential in capturing rich hacker semantics, there is a need for interpreting the semantics of textual features on a large scale. Another drawback of


past textual feature analysis is the language barrier. The international hacker community has members speaking languages from all over the world; however, most of the past research focuses only on English [2, 55]. The language barrier is a big challenge for textual feature analysis on international hacker communities because multilingual textual analysis techniques are lacking. It is important to handle the multilingual problem either by developing language-specific textual analysis techniques or by using a reliable machine translator.

To summarize, several major conclusions can be drawn from the prior literature. First, focusing on key hackers has become an emerging direction for understanding hacker communities. Second, there is a trend to use hacker community data as hackers increasingly congregate in hacker communities. Third, significant findings from textual feature analysis necessitate a scalable method for content analysis that can interpret the semantics in hacker discussions.

Text Mining for Content Analysis

We further review two promising types of text mining techniques that could automate content analysis: sentiment analysis and topic modeling. Sentiment analysis can quantify customer reviews to measure seller quality, while topic modeling can profile sellers based on the seller characteristics inherent in advertisements.

Sentiment Analysis

Sentiment analysis is a class of text mining techniques for determining the subjective information of a text. Sentiment analysis has been applied to a variety of user-generated content (UGC) contexts to inform business intelligence (BI). For example, Lau et al. [28] applied sentiment analysis to financial news to inform decision support for mergers and acquisitions; Bollen et al. [11] performed sentiment analysis on Twitter to predict stock prices; Aggarwal and Singh [4] tested the influence of the sentiment of blog posts on venture capitalists' decision-making process; Archak et al. [7] demonstrated that the sentiment of product reviews is a determinant of consumers' purchase decisions. Among all UGC contexts, online customer reviews have served as the most popular research testbed [47]. This validates our intention to apply sentiment analysis to the underground economy customer reviews.

From a methodological perspective, there are generally two types of approaches: machine-learning–based and dictionary-based. Machine-learning–based techniques train classifiers based on a set of predefined features extracted from the text. Commonly used classifiers include support vector machine (SVM) and naive Bayes (NB). Dictionary-based techniques rely on sentiment dictionaries incorporated with scoring rules to determine the sentiment scores in text. The sentiment dictionary includes mappings of words to sentiment scores, while the scoring rules guide the calculation of the sentiment score from the constituent words in the text. An emerging sentiment analysis technique that has outperformed existing methods is deep learning.
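As a concrete illustration of the dictionary-based approach, the following sketch scores a review with a toy lexicon and a single negation rule. The lexicon entries, their scores, and the scoring rule are invented for illustration; real systems rely on large, validated sentiment dictionaries and richer rules.

```python
# Minimal dictionary-based sentiment scorer (illustrative toy lexicon).
LEXICON = {"good": 1.0, "great": 2.0, "legit": 1.5,
           "bad": -1.0, "ripper": -2.0, "scam": -2.0}
NEGATORS = {"not", "never", "no"}

def sentiment_score(review: str) -> float:
    """Sum lexicon scores over tokens; a negator flips the next sentiment word."""
    score, negate = 0.0, False
    for tok in review.lower().split():
        word = tok.strip(".,!?")
        if word in NEGATORS:
            negate = True       # flip polarity of the following sentiment word
            continue
        if word in LEXICON:
            score += -LEXICON[word] if negate else LEXICON[word]
        negate = False
    return score

print(sentiment_score("Seller is legit, great dumps"))    # positive score
print(sentiment_score("Not good, this guy is a ripper"))  # negative score
```

A positive total suggests a satisfied buyer, a negative total a complaint; aggregating such scores over a seller's reviews gives a crude quality signal.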

1066 LI, CHEN, AND NUNAMAKER

Deep-learning–based sentiment analysis is a hybrid approach with both machine-learning–based properties and dictionary-based properties [12, 46].

Deep-learning–based sentiment analysis includes two building blocks: word vectorization and the recursive neural network (RNN) [46]. Word vectorization builds a word-vector language model to represent each word as a low-dimensional continuous-valued sentiment vector. The RNN composites word sentiment vectors into sentence sentiment vectors using the parse trees of sentences, where words are the leaves and the phrases are the vertexes. Starting from the bottom subtree, the RNN recursively computes the parent sentiment vectors from its children's sentiment vectors. The root sentiment vector represents the sentiment score of the entire sentence. At each subtree, the child vectors $\vec{c}_1$ and $\vec{c}_2$ are composited into the parent vector $\vec{p}$ using the equation $\vec{p} = \tanh\left(W\left[\vec{c}_1; \vec{c}_2\right]\right)$, where $\left[\vec{c}_1; \vec{c}_2\right]$ denotes the concatenation of the two child vectors. In this equation, $W$ is the compositionality matrix that contains weights for compositing child vectors, and $\tanh(\cdot)$ introduces nonlinearity into the RNN to ensure model complexity. The deep-learning–based method proves to outperform state-of-the-art sentiment models by 5 percent and reaches 85 percent accuracy [46].
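The composition step can be sketched in a few lines of NumPy. The vector dimension, the random weights, and the word vectors below are illustrative placeholders; in the actual model, the compositionality matrix and the word vectors are learned from labeled parse trees [46].

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                                        # sentiment-vector dimension (illustrative)
W = rng.standard_normal((d, 2 * d)) * 0.1    # compositionality matrix (untrained stand-in)

def compose(c1: np.ndarray, c2: np.ndarray) -> np.ndarray:
    """One RNN composition step: p = tanh(W [c1; c2]).
    [c1; c2] stacks the two child vectors; tanh adds nonlinearity."""
    return np.tanh(W @ np.concatenate([c1, c2]))

# Leaves are word vectors; composing bottom-up along the parse tree
# yields phrase vectors and, at the root, the sentence vector.
v_great, v_seller = rng.standard_normal(d), rng.standard_normal(d)
phrase = compose(v_great, v_seller)          # vector for the phrase "great seller"
print(phrase.shape, float(phrase.min()), float(phrase.max()))
```

Because tanh squashes each component into (-1, 1), parent vectors stay on the same scale as their children, which is what lets the recursion run to arbitrary tree depth.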

Topic Modeling

Topic modeling refers to a class of statistical techniques for discovering the underlying topics in a corpus [10]. Latent Dirichlet allocation (LDA) is a prominent technique widely used for automated content analysis on social media data. LDA characterizes each document in the corpus as a mixture of topics, which are distributions over words with certain words having high probability [10]. While all the documents share the same set of topics, each document has different weights for the topics. Markov chain Monte Carlo (MCMC) sampling methods are often used to fit the LDA model. While LDA has gained increasing appreciation as a legitimate content analysis method, little research has leveraged this technique for hacker community analysis. Therefore, we turn to studies using LDA for business intelligence (BI), a discipline to which cyber-security intelligence belongs. We thus review and summarize prior studies from five perspectives: data, problem type, preprocessing procedure, topic interpretation, and evaluation method.

First, social media data have been widely modeled using LDA, including electronic communication [52], enterprise blogging posts [45], search engine queries [3], and stock recommendations [6]. Second, LDA can solve both confirmatory problems and exploratory problems. Confirmatory problems seek to match the inferred LDA topics against a predefined set of topics from the research context. For example, Singh et al. [45] used LDA to categorize enterprise blogs to determine whether they were work-related, managerial, or technical; Bao and Datta [8] applied LDA to discover the predefined risk types concealed in 10-K forms. Exploratory problems seek to decompose the collection of documents into a list of themes. For instance, Aral et al. [5] decomposed a collection of stock recommendations to inform the analysis of buyers'


choices; Abhishek et al. [3] leveraged the output topics to interpret the semantics of the words; and Wu [52] measured information diversity based on output topics. Third, the preprocessing procedure is widely performed [5, 10]. It usually includes removing annotations, tokenizing the sentences, lemmatizing terms, removing stop words, and so on. Fourth, it is a common practice to qualitatively interpret the topics from their distributions over the vocabulary. In particular, the distribution over the vocabulary is difficult for humans to conceive and interpret because of its massiveness. The most commonly adopted procedure is to infer the topic based on its top key words [5, 8, 45]. Fifth, as an unsupervised learning model, LDA lacks robust external validation methods for evaluating its results, so most studies evaluate their LDA models using internal validation [5, 45]. The major internal validation method is perplexity, a language modeling evaluation metric [8, 10]. Perplexity is a measure of the topic model's ability to predict unobserved documents. However, as an internal validation metric, the perplexity value can be used to compare different models, but it cannot measure how close a model is to perfect classification.
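To make the MCMC fitting concrete, here is a minimal collapsed Gibbs sampler for LDA on a toy corpus. It is an illustrative sketch, not the implementation used in this study; the hyperparameters, iteration count, and toy word ids are assumptions.

```python
import numpy as np

def lda_gibbs(docs, n_topics, vocab_size, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Tiny collapsed Gibbs sampler for LDA (illustrative, not optimized).
    docs: list of documents, each a list of word ids in [0, vocab_size)."""
    rng = np.random.default_rng(seed)
    ndk = np.zeros((len(docs), n_topics))     # document-topic counts
    nkw = np.zeros((n_topics, vocab_size))    # topic-word counts
    nk = np.zeros(n_topics)                   # total tokens per topic
    z = []                                    # topic assignment of every token
    for d, doc in enumerate(docs):            # random initialization
        zd = rng.integers(n_topics, size=len(doc))
        z.append(zd)
        for w, k in zip(doc, zd):
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                   # remove the current assignment
                ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
                # full conditional p(z_i = k | all other assignments)
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + vocab_size * beta)
                k = rng.choice(n_topics, p=p / p.sum())
                z[d][i] = k
                ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    theta = (ndk + alpha) / (ndk + alpha).sum(axis=1, keepdims=True)  # doc-topic mix
    phi = (nkw + beta) / (nkw + beta).sum(axis=1, keepdims=True)      # topic-word dists
    return theta, phi

# Toy corpus: word ids 0-2 co-occur and 3-5 co-occur, suggesting two topics.
docs = [[0, 1, 2, 0, 1], [2, 0, 1, 2], [3, 4, 5, 3], [4, 5, 3, 4, 5]]
theta, phi = lda_gibbs(docs, n_topics=2, vocab_size=6)
print(np.round(theta, 2))
```

Each row of theta is a document's topic mixture and each row of phi a topic's word distribution; inspecting a topic's top-probability words in phi is the "top key words" interpretation step described above.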

Research Gaps and Questions

Based on the review of prior literature on the underground economy and hacker social media analytics, we present two research gaps.

First, prior research rarely approaches the underground economy through the lens of its participants, especially the key sellers. Most underground economy studies have instead focused on the social-economic features of the underground economy. However, key sellers play pivotal roles in cyber carding crimes and thus are critical for cyber forensics. The identification and profiling of key sellers can potentially provide actionable intelligence to law enforcement and, consequently, reduce potential financial fraud.

Second, few studies have systematically applied advanced text mining techniques to textual features in the underground economy. Moreover, we are not aware of any prior application of advanced text mining techniques, such as sentiment analysis and topic modeling, to carding forum discussions. Texts such as advertisements and customer reviews have shown great potential for revealing seller quality and characteristics in prior literature. There is a need for techniques that can handle the rich textual features in carding forums.

To address these research gaps, this study seeks to answer the following questions:

● How can we develop a scalable text mining-based system for identifying and profiling key sellers from the underground economy forums?

● How effective is it to leverage advanced text mining techniques to (1) identify key sellers using deep-learning–based sentiment analysis of customer reviews, and (2) profile key sellers using topic modeling of advertisements?


Research Design

We propose the development of a text mining-based system called AZSecure for identifying and profiling key sellers based on advertisements and customer reviews in the underground economy. In particular, the system seeks to (1) identify key sellers using deep-learning–based sentiment analysis of customer reviews, and (2) profile key sellers using topic modeling of advertisements. We aim to compare our proposed system and techniques against popular benchmark techniques, including support vector machine, naive Bayes, k-Nearest Neighbour (kNN), and N-gram models. The evaluation will assess the effectiveness of our system for the identification and profiling of key sellers. We also present a case study to further demonstrate the utility and applicability of our proposed system in a real cyber forensics setting.

Overview

We present the AZSecure system (Figure 2) for identifying and profiling the top sellers in the underground economy. The system contains two steps: the collection step and the analytics step. The collection step extracts advertisements and groups them based on the product and service type. The collection step has two components: thread crawling and thread classification. Thread crawling identifies underground economy-related threads from a comprehensive collection of a carding forum. Thread classification determines whether they are advertisement threads or not, and if so, what product or service they are promoting. The analytics step analyzes the advertisements and associated customer reviews to evaluate and profile the seller. This step has two components: seller rating and top seller profiling. The seller rating evaluates seller quality based on customer reviews. Top seller profiling extracts seller characteristics from advertisements. As discussed in the literature review, the scope of the research focuses on two

Figure 2. The AZSecure Text Mining Research System

THE AZSECURE TEXT MINING SYSTEM 1069

groups of sellers: malware sellers and stolen data sellers. Nonetheless, our system can generalize to sellers with other specialties. In the remainder of this section, we elaborate on each component of the system.

Thread Crawling

The thread crawling component aims to generate an inclusive subset of underground-economy-related threads and rule out irrelevant threads, thereby allowing the computation-intensive thread classification component to run more efficiently. If a carding forum user is involved in the underground economy, the user must have either (1) joined a thread containing underground economy keywords, or (2) joined a thread containing other users involved in the underground economy. Based on this intuition, we can find an inclusive subset of underground-economy-related threads by following users involved in threads containing underground economy keywords. The snowball sampling approach has been used in previous exploratory cyber-security research to collect discussions from the underground economy [22]. Inspired by this technique, we use a breadth-first-search-based snowball method to retrieve potentially relevant threads (Algorithm 1).

Starting with a set of seed underground economy keywords, we first retrieve the threads containing these keywords and the users in these threads. These users then become new seeds to find more relevant threads. This iterative process continues until all threads by hacker forum users interested in the underground economy are collected [14].

Algorithm 1. Thread Crawling Using Snowball Sampling

Input:  Ω = {all threads}; X = {all users}; Φ = {underground-economy-related keywords}
Output: X* = {underground-economy-related users}; Θ = {thread | thread.author ∈ X*}

X* := {};
// collect all the threads that contain underground-economy-related keywords
Θq := {θ | θ ∈ Ω, θ contains some φ ∈ Φ};   // Θq is the queue of threads to be examined
Θc := {};                                   // Θc is the history of traversed threads
while Θq is not empty do
    θ := Θq.head;                           // get the next thread in the queue
    if θ ∈ Θc then continue;                // skip if the thread has been traversed
    Θc.add(θ);                              // add the examined thread to the history
    Y := {y | y ∈ X, y is involved in θ};   // find the users in the thread
    for each y ∈ Y do
        if y ∉ X* then
            X*.add(y);                      // add the user to the underground-economy-related user collection
            Θq.add({θ' | θ'.author = y});   // add the user's other threads to the task queue
// collect the threads of underground-economy-related users
Θ := {thread | thread ∈ Ω and thread.author ∈ X*};
return X*, Θ;
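The breadth-first snowball traversal above can be sketched in Python. The dictionary-based thread representation (id, text, author, participants fields) is an illustrative stand-in for a real forum collection, not the authors' implementation:

```python
from collections import deque

def snowball_crawl(threads, seed_keywords):
    """BFS snowball sampling over forum threads: seed with keyword-matching
    threads, then expand through each thread's participants to their other
    threads, as in Algorithm 1."""
    # seed queue: threads containing any underground-economy keyword
    queue = deque(t for t in threads
                  if any(k in t["text"].lower() for k in seed_keywords))
    traversed, users = set(), set()
    while queue:
        thread = queue.popleft()
        if thread["id"] in traversed:        # skip already-examined threads
            continue
        traversed.add(thread["id"])
        for user in thread["participants"]:  # follow every involved user
            if user not in users:
                users.add(user)
                # enqueue the user's other threads for examination
                queue.extend(t for t in threads if t["author"] == user)
    # collect the threads of underground-economy-related users
    relevant = [t for t in threads if t["author"] in users]
    return users, relevant

forum = [  # tiny illustrative forum
    {"id": 1, "text": "selling dumps cheap", "author": "a", "participants": ["a", "b"]},
    {"id": 2, "text": "hello world", "author": "b", "participants": ["b"]},
    {"id": 3, "text": "off topic", "author": "c", "participants": ["c"]},
]
users, relevant = snowball_crawl(forum, ["dumps"])
print(sorted(users), [t["id"] for t in relevant])
```

Note how thread 2 is pulled in only because user "b" participated in the keyword-matching thread 1, which is exactly the snowball expansion the algorithm relies on.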


Thread Classification

We use thread classification to identify advertisements for malware and stolen data from the crawled threads. The maximum entropy classifier (MaxEnt) has shown excellent performance in multiclass classification over other text classifiers due to its advanced feature selection technique [34]. MaxEnt learns the likelihood of each textual pattern's systematic appearance in a certain class. In MaxEnt, each feature fi(pattern, class) represents the relationship between a certain textual pattern and a certain class, and the feature weight θi represents the likelihood of the textual pattern's systematic presence in the class. Further, the conditional probability of the textual data uses the entropy model: p(class | data) ∝ exp(Σi θi fi). By maximizing this entropy, MaxEnt weights the relevance of each textual pattern under each class. Therefore, MaxEnt allows us to emphasize certain features that are strongly related to a certain class and accommodate the dependencies among the features. In addition to the ranking-based feature selection that is inherent to MaxEnt, we incorporate three categories of knowledge-based features informed by prior literature [16, 17]: topical features, highlight features, and hyperlink features (Table 2). Topical features are textual features that semantically relate to underground economy topics. Monetary lexicons are a major type of underground economy topical feature. Domain-specific lexicons are words that reflect the product or service type of the advertisements. Lexical measures capture the fact that advertisements are usually longer than regular discussions, where detailed descriptions are rarely needed. Highlight features are the layout details that depict the decoration of the advertisements, including page layout, multicolor text, and font style (e.g., bold, italic). Hyperlink features capture the contact channels of these sellers, including external links and e-mail addresses.
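As a concrete illustration, a maximum entropy classifier corresponds to multinomial logistic regression over text features. The sketch below uses scikit-learn's LogisticRegression with unigram and bigram counts; the three-document corpus and its labels are made up for the example, not the paper's training data:

```python
# Minimal MaxEnt (multinomial logistic regression) text-classification
# sketch using scikit-learn; the tiny corpus below is illustrative only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

docs = [
    "selling fresh cc dumps shop wmz",         # carding advertisement
    "new keylogger v1.0 program for sale",     # malware advertisement
    "what is the best tutorial for beginners", # non-advertisement
]
labels = ["carding", "malware", "irrelevant"]

maxent = make_pipeline(
    CountVectorizer(ngram_range=(1, 2)),  # unigram + bigram features
    LogisticRegression(max_iter=1000),    # maximum-entropy classifier
)
maxent.fit(docs, labels)
print(maxent.predict(["cc dumps for sale in my shop"])[0])
```

In a real pipeline the count features would be augmented with the knowledge-based topical, highlight, and hyperlink features of Table 2.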

Table 2. Features for Thread Classification

Category  | Cue                      | Example                         | Type
----------|--------------------------|---------------------------------|---------------------------------------------------------
Topical   | Monetary lexicons        | "$," "Ruble," "wmz"             | Malware, Carding, Other products or services
Topical   | Domain-specific lexicons | "cc," "program," "v1.0," "shop" | Malware, Carding, Other products or services
Topical   | Lexical measures         | Length of the thread            | Malware, Carding, Other products or services, Irrelevant
Highlight | Layout                   | "<center>"                      | Malware, Carding
Highlight | Color                    | "#FF0000"                       | Malware, Carding
Highlight | Font style               | "<strong>"                      | Malware, Carding
Hyperlink | External URLs            | "shop: http://octavian.su"      | Carding
Hyperlink | E-mail                   | "@"                             | Malware, Carding


Seller Rating

We measure seller quality by quantifying customer reviews [17, 21]. As mentioned in the literature review, the review comments replying to an advertisement thread are a major channel for prospective buyers to evaluate the quality of a seller [7]. Sentiment analysis techniques have been used to evaluate the subjective information in customer reviews. Therefore, we use the sentiment analysis score as a measure of a buyer's evaluation of seller quality. The seller rating component processes customer reviews through four steps: machine translation, word vectorization, recursive neural network, and score aggregation. We present our seller rating component in Algorithm 2.

We highlight the major steps in the seller rating component as follows. The thread content is automatically translated from the original language to English via Google Translate because much of the content is multilingual, which is incompatible with deep-learning-based sentiment analysis. This step first segments the content sentence by sentence, then detects the language of each original sentence, and finally translates the sentence to English. Then, each word is vectorized into a five-dimensional sentiment vector using SentiTreebank, a word vector dictionary trained on customer reviews [46]. Each dimension represents an ordered sentiment degree, with the first dimension being the most negative and the fifth dimension

Algorithm 2. Seller Rating Algorithm

Input:  x = seller to be rated; Θ = {θ | θ is an advertisement of x};
        Ω = {Ωi | Ωi = {ω | ω replies to θi}}; W = compositionality matrix
Output: Γ = {γi | γi is the rating for θi}; r = overall rating of the seller

for each advertisement θi ∈ Θ do
    find the set of replies Ωi for the advertisement;
    for each reply ω ∈ Ωi do
        if ω is not in English then
            ω := translate(ω);
        break ω into a collection of sentences S = {s1, s2, ..., sn};
        for each sentence sj ∈ S do
            sj := word_vectorization(sj | SentiTreebank);
            T := binary_parse_tree(sj);
            Q := queue({root nodes of bottom subtrees in T});
            while Q is not empty do
                n := the first node in Q;
                composite the parent vector pn := tanh(W [c1; c2]), where c1 and c2 are the children of n;
                if the sibling of n has a sentiment vector then
                    add the parent of n to Q;
        calculate the rating of the reply γij := largest(p_root);
    calculate the rating of the advertisement γi := Σj γij;
calculate the rating of the seller r := average(Γ);
return Γ, r;


being the most positive. The value of each dimension reflects the probability of being the corresponding sentiment degree. Next, the recursive neural network step parses the sentence into a binary tree and composites sentiment vectors recursively. From the bottom of the tree, we recursively combine the two child sentiment vectors into a single parent sentiment vector for each subtree. We evaluate the sentiment by averaging the probability of the root node sentiment vector over the sentiment spectrum. Finally, the sentiment-averaging step averages the post-level sentiment scores for each advertisement and further averages the advertisement-level sentiment scores for each seller.
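The recursive composition step can be sketched with NumPy. The weight matrix and leaf vectors below are random stand-ins for trained SentiTreebank parameters, and a left-to-right fold substitutes for a learned binary parse tree:

```python
import numpy as np

def compose(c1, c2, W):
    """Combine two 5-dim child sentiment vectors into a parent vector:
    p = tanh(W [c1; c2])."""
    return np.tanh(W @ np.concatenate([c1, c2]))

def rate_sentence(leaf_vectors, W):
    """Fold the leaf vectors bottom-up (a left-to-right fold stands in for
    a learned binary parse tree) and return the winning sentiment class,
    0 = most negative ... 4 = most positive."""
    node = leaf_vectors[0]
    for leaf in leaf_vectors[1:]:
        node = compose(node, leaf, W)
    return int(np.argmax(node))

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(5, 10))     # stand-in compositionality matrix
leaves = [rng.random(5) for _ in range(4)]  # mock per-word sentiment vectors
print(rate_sentence(leaves, W))
```

The same composition rule is applied at every interior node of the parse tree; the sentence-level score is then read off the root vector.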

Top Seller Profiling

As suggested by prior literature [17], we profile the top sellers in terms of seller characteristics, including product/service, payment options, and contact channels. Based on the intuition that the relationships among topics, documents, and words are similar to those among seller characteristics, sellers, and advertisements, we derive our top seller profiling component from latent Dirichlet allocation (LDA) to extract seller characteristics. In particular, we build the seller characteristic model based on the following assumptions: (1) the advertisement is represented as a collection of words; (2) the characteristic is treated as a distribution over words; (3) the seller is portrayed by the collection of advertisements; and (4) the seller is characterized by a mixture of characteristics, in which we are interested. We define the corresponding generative process in Algorithm 3 accordingly.

This process depicts the imaginary process through which the advertisements were generated. Following McCallum and Mallet [35], we fit our model using collapsed Gibbs sampling. Based on the fitted seller characteristic model, we first interpret each characteristic with its top keywords, then extract the posterior seller characteristic proportion p(θs | w), and finally profile the seller with the major characteristics. We summarize the major steps of the top seller profiling component in Algorithm 4.

Algorithm 3. Top Seller Advertisement Generative Process

Step 1. For each seller s, choose a seller characteristic proportion θs ~ Dirichlet(α), where θs = (θs1, ..., θsT), θst is the probability of seller s having characteristic t, and α is the hyperprior parameter.

Step 2. For each characteristic z, choose a word distribution φz ~ Dirichlet(β), where φz = (φz1, ..., φzW), φzw is the probability of word w in characteristic z, and β is the hyperprior parameter.

Step 3. For each word w in the advertisements of seller s,
    a. Choose a characteristic zw ~ Multinomial(θs).
    b. Choose the word ws ~ Multinomial(φz).
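For illustration, the generative process of Algorithm 3 can be simulated directly in NumPy. The toy vocabulary and sizes are made up; the hyperpriors follow the α = 0.01 and β = 0.5 settings reported in the evaluation:

```python
import numpy as np

rng = np.random.default_rng(42)
vocab = ["cc", "dump", "shop", "wmz", "bitcoin", "icq", "jabber", "email"]
T, V, n_words = 3, len(vocab), 20   # characteristics, vocab size, ad length
alpha, beta = 0.01, 0.5             # Dirichlet hyperpriors

# Step 1: seller characteristic proportion theta_s ~ Dirichlet(alpha)
theta = rng.dirichlet([alpha] * T)
# Step 2: per-characteristic word distribution phi_z ~ Dirichlet(beta)
phi = rng.dirichlet([beta] * V, size=T)
# Step 3: draw each word by first drawing its characteristic
ad = []
for _ in range(n_words):
    z = rng.choice(T, p=theta)      # z_w ~ Multinomial(theta_s)
    w = rng.choice(V, p=phi[z])     # w_s ~ Multinomial(phi_z)
    ad.append(vocab[w])
print(" ".join(ad))
```

With the small α, a simulated seller concentrates on one or two characteristics, which matches the intuition that a seller advertises a few product lines through a few channels.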


Evaluation

To evaluate the key technical components of our proposed system, we conducted three experiments comparing them against several benchmark methods.

Testbed

The experiments and case study were conducted on a research testbed encompassing eight forum-based underground economies (Table 3). Due to the sensitivity of the underground economy, we censored the forum names. Forums 5 through 8 were recommended by our cyber-security expert, who examined the same forums in prior studies. Forums 1 through 4 were found by following the web links posted by users in the four aforementioned forums. All the forums were then validated by our cyber-security expert to ensure data integrity. A majority of these forums are Russian carding forums because East European hackers are heavily involved in cyber carding. These carding forums are often equipped with sophisticated anticrawling measures. To collect them, we leveraged multiple countersecurity measures, including automated

Algorithm 4. Top Seller Profiling Algorithm

Input:  D = {ds | ds is the collection of advertisements of top seller s}
Output: C = {Cs | Cs is the characteristics of seller s}

for each seller s do
    preprocess ds using the unigram model: ds := preprocess(ds);
fit the seller characteristic model to D using collapsed Gibbs sampling;
for each characteristic z do
    pick the top-k words to interpret seller characteristic φz;
for each seller s do
    extract the posterior seller characteristic proportion p(θs | w);
    pick the top-p seller characteristics φz;
    use the interpreted characteristics to profile the seller: Cs := {interpretation of each φz};
return C;

Table 3. Summary of the Carding Forums

Forum | Date range           | Members | Threads | Posts   | Language
1     | 1/1/2003–8/29/2015   | 35,406  | 90,520  | 622,560 | Russian
2     | 12/17/2010–6/23/2015 | 4,168   | 14,228  | 23,094  | English
3     | 6/2/2007–11/17/2015  | 27,607  | 70,302  | 679,893 | English/Russian
4     | 8/7/2013–3/11/2016   | 4,972   | 4,834   | 38,335  | English
5     | 7/20/2002–1/3/2015   | 13,572  | 49,617  | 395,530 | Russian
6     | 4/15/2009–9/4/2015   | 3,818   | 12,869  | 43,073  | Russian
7     | 2/26/2005–9/3/2015   | 7,761   | 16,194  | 157,106 | Russian
8     | 6/13/2007–9/3/2015   | 4,850   | 48,947  | 62,316  | Russian


authentication, cookie installation, concealing the request origin, and so forth. Most of these forums are longitudinal in nature, with thousands of members and tens of thousands of posts and threads. All metadata were extracted, including thread title, original post content, user, date, post sequence, and so on.

Experiment 1: Thread Classification

Experimental Setup

The objective of the first experiment was to assess the effectiveness of the proposed thread classification technique and the benchmark techniques in determining the type of advertisement threads. Each thread was to be classified into one of four groups: malware advertisement, stolen data advertisement, other advertisement, and non-advertisement. The first two groups of advertisement threads would lead us to malware sellers and stolen data sellers. One thousand threads were randomly chosen and manually categorized into the aforementioned four groups. The benchmark methods consisted of state-of-the-art text classifiers, including SVM, NB, and kNN. All the benchmark methods used the best parameter combinations optimized on the research testbed to allow the best comparison against the proposed thread classification. Both our proposed thread classification technique and the benchmark methods were trained on the holistic feature set. Finally, all methods were evaluated using a fivefold cross-validation setting to avoid overfitting. Standard evaluation metrics including precision, recall, and the F-measure were used: precision measures the correctness of the identified instances matching their true class; recall measures the completeness of the identified instances with respect to the desired category; and the F-measure assesses the overall performance by calculating the harmonic mean of precision and recall. Testing set results from each fold were aggregated to test the statistical significance of the techniques' performance.
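The three metrics can be computed directly; the gold labels and predictions below are made-up examples, not experimental data:

```python
def prf(gold, pred, positive):
    """Precision, recall, and F-measure for one target class."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return precision, recall, f_measure

gold = ["malware", "malware", "other", "carding"]  # hand-labeled classes
pred = ["malware", "other", "other", "malware"]    # classifier output
print(prf(gold, pred, "malware"))  # -> (0.5, 0.5, 0.5)
```

Here one true malware thread is found, one is missed (hurting recall), and one non-malware thread is wrongly flagged (hurting precision), so all three metrics come out to 0.5.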

Results

Table 4 shows the experimental results for categorizing malware and stolen data advertisements. The proposed MaxEnt-based method had the best performance, with an F-measure of 69.17 percent in categorizing malware advertisements and an F-measure of 84.88 percent in categorizing stolen data advertisements. Furthermore, our proposed method reached 90.20 percent precision in categorizing malware advertisements and 97.33 percent precision in categorizing stolen data advertisements, suggesting that most of the extracted advertisements were correct hits. Among the benchmark methods, SVM had the best overall performance, achieving 49.13 percent on the F-measure in categorizing malware advertisements and 77.22 percent on the F-measure in categorizing stolen data advertisements. Overall, our proposed MaxEnt-based thread classifier outperformed


the best benchmark method by 20.04 percent on the F-measure in classifying malware advertisements and 7.66 percent in classifying stolen data advertisements.

Table 5 shows the p-values for the pairwise t-tests conducted on the classification evaluation for both malware advertisements and stolen data advertisements. In extracting malware advertisements, the MaxEnt-based thread classifier significantly outperformed the benchmark methods on most experimental metrics. Specifically, it significantly outperformed all the benchmark methods on recall and the F-measure, and it significantly outperformed NB and kNN on precision. In extracting stolen data advertisements, our proposed method significantly outperformed all the benchmark methods on the F-measure, NB and kNN on precision, and NB and SVM on recall.

Discussion

In Experiment 1, we assessed the system's effectiveness in identifying the malware sellers and stolen data sellers. There are several interesting findings. First, the overall performance in identifying the malware sellers was not as good as that in identifying the stolen data sellers. We believe the variation in malware advertisement content might be the cause: less text was observed in malware advertisements, probably because malware writers wanted to prevent imitation and potential competition. Therefore, the

Table 4. Thread Classification Performance (%)

                        Malware                        Stolen data
Techniques | Precision | Recall | F-measure | Precision | Recall | F-measure
MaxEnt     | 90.20     | 56.10  | 69.17     | 97.33     | 75.26  | 84.88
NB         | 37.01     | 57.32  | 44.98     | 58.45     | 85.57  | 69.46
SVM        | 87.50     | 34.15  | 49.13     | 100.00    | 62.89  | 77.22
kNN        | 41.56     | 39.02  | 40.25     | 64.86     | 74.23  | 69.23

Table 5. P-Values for Pairwise t-tests on Precision, Recall, and the F-Measure forMaxEnt-based Method over the Benchmark Methods

                      Malware (H1a)                                  Stolen data (H1b)
          | MaxEnt vs. NB | MaxEnt vs. SVM | MaxEnt vs. kNN | MaxEnt vs. NB | MaxEnt vs. SVM | MaxEnt vs. kNN
Precision | < 0.001***    | 0.1309         | < 0.001***     | < 0.001***    | 0.176          | < 0.001***
Recall    | 0.1785        | 0.0082**       | 0.0170*        | 0.074         | 0.0221*        | 0.5114
F-measure | < 0.001***    | 0.0141*        | < 0.001***     | < 0.001***    | 0.0441*        | 0.0033**

Notes: Asterisks represent significance level: *: p < 0.05, **: p < 0.01, ***: p < 0.001.


amount of text was insufficient for the classifiers to discriminate effectively. Second, SVM achieved the highest precision and lowest recall in classifying stolen data advertisements. This suggests that SVM suffered from overfitting: SVM failed to identify many stolen data advertisements, but what it captured was mostly correct. On the other hand, our proposed method is capable of balancing precision and recall, resulting in a higher F-measure. Third, we notice that MaxEnt performed far better than the benchmark methods. This is attributed to MaxEnt's advanced feature selection technique, which emphasizes the features most relevant to each class. To demonstrate this point, we examined the impact of N-gram features on the overall performance (Figure 3). When given the unigram feature set, the overall performance of MaxEnt is comparable to SVM. As we introduced more features by allowing for higher-order N-gram features, MaxEnt was able to consistently select the most relevant features, whereas NB and SVM suffered from overfitting.

Experiment 2: Seller Rating

Experimental Setup

The objective of the second experiment was to assess the efficacy of our proposed seller rating technique, the recursive neural network (RNN), and the benchmark methods in determining the sentiment orientation of customer reviews. The customer reviews were to be classified into either a positive sentiment or a negative sentiment. Four hundred customer reviews were randomly chosen and manually categorized into the two sentiment orientations. The benchmark methods consisted of machine-learning-based techniques, including SVM and NB, and a dictionary-based technique, the well-known SentiWordNet (SWN). SVM and NB were trained on a feature set containing N-grams and part-of-speech (POS) tags. Similarly, we optimized the benchmark methods to allow the best possible

Figure 3. Impact of N-gram Features on the Overall Thread Classification Performance: (a) Malware Advertisement Classification; (b) Stolen Data Advertisement Classification


comparison. As in Experiment 1, all methods were tested using a fivefold cross-validation setting to avoid overfitting. We again used precision, recall, and the F-measure for evaluation.

Results

Table 6 shows the experimental results for determining the sentiment orientation of customer reviews. The proposed deep-learning-based seller rating technique (RNN) generally outperformed all the benchmark methods, with 96.12 percent on the F-measure in determining positive review comments and 90.14 percent on the F-measure in determining negative review comments. Among the benchmark methods, the best F-measure for positive sentiment was 93.31 percent, by SVM, and the best F-measure for negative sentiment was 82.91 percent, by NB. Overall, our proposed deep-learning-based seller rating technique outperformed the best benchmark method by 2.81 percent on positive reviews and 7.23 percent on negative reviews. Table 7 shows the p-values for the pairwise t-tests evaluating RNN against the baseline methods. RNN significantly outperformed the benchmark methods on most experimental metrics. In particular, RNN significantly outperformed all the benchmark methods on the F-measure. RNN also significantly outperformed SVM on precision and NB and SWN on recall.

Table 6. Seller Rating Performance (%)

                    Positive sentiment                Negative sentiment
Techniques | Precision | Recall | F-measure | Precision | Recall | F-measure
RNN        | 94.20     | 98.11  | 96.12     | 95.05     | 85.71  | 90.14
NB         | 94.12     | 90.57  | 92.31     | 79.51     | 86.61  | 82.91
SVM        | 89.58     | 97.36  | 93.31     | 92.13     | 73.21  | 81.59
SWN        | 93.40     | 63.45  | 75.57     | 48.29     | 88.39  | 62.46

Notes: Bolded numbers are the best performance.

Table 7. P-Values for Pairwise t-tests on Precision, Recall, and the F-Measure forDeep Learning-based Method over the Benchmark Methods

                              H2
          | RNN > NB   | RNN > SVM | RNN > SWN
Precision | 0.9866     | 0.0124*   | 0.399
Recall    | < 0.001*** | 0.4927    | < 0.001***
F-measure | 0.0016**   | 0.00582** | < 0.001***


Discussion

In Experiment 2, we assessed the system's efficacy in determining the sentiment orientations (i.e., positive or negative) of customer reviews. The overall performance on our testbed was better than on previous testbeds, such as blogs [4] and product reviews [7]. This is because texts on the aforementioned testbeds had relatively vague sentiment orientations, while much of the feedback in our testbed was made up of comments with clear sentiment terms, such as "good," "good seller," or "invalid." To evade investigation, Russian hackers tend to leave brief review comments with simple terms, such as "прекрасный" ("prekrasnyy," meaning "excellent"), "отличный" ("otlichnyy," meaning "great"), and "теряю деньги" ("teryayu den'gi," meaning "lost money"), which significantly lowered the translation difficulty caused by ambiguity. Google Translate's negative side effects were thus reduced to a minimum. The machine-learning-based techniques had better overall performance (i.e., the F-measure) than the dictionary-based approach. The reason is that machine-learning-based techniques can learn new patterns from data, while the scoring rules in the dictionary-based technique are predefined and may not apply to our testbed. Overall, our proposed technique is effective in evaluating customer reviews and further rating the sellers.

Experiment 3: Top Seller Profiling

Experimental Setup

The objective of the third experiment was to assess the effectiveness of the proposed LDA-based top seller profiling technique and the benchmark methods in modeling seller advertisements. The most typical evaluation for topic models is perplexity, which measures the model's ability to predict unobserved documents [8, 10]. In particular, a good model learned on the training set should give better predictions on the testing set. Better models have lower perplexity, suggesting less uncertainty about the unobserved documents. Perplexity given test data D was computed as follows:

Perplexity(D) = exp( − Σ_{d=1}^{D} log p(w_d) / Σ_{d=1}^{D} N(d) ),

where p(w_d) is the probability of document d and N(d) is the length of document d. We compared our proposed model against the benchmark methods, including unigram, bigram, and trigram models. All techniques were run using a tenfold cross-validation setting. Our LDA-based top seller profiling technique was parameterized with 1,500 iterations of collapsed Gibbs sampling and 100 seller characteristics. Hyperparameters were set to α = 0.01 and β = 0.5, as suggested in Blei [10]. The benchmark N-gram models were trained with the commonly adopted Good–Turing smoothing.
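Given per-document log-likelihoods, the formula above reduces to a few lines; the numbers below are illustrative. As a sanity check, a model that assigns every token probability 1/2 has perplexity exactly 2:

```python
import math

def perplexity(log_probs, lengths):
    """Perplexity(D) = exp(-sum_d log p(w_d) / sum_d N(d)),
    where log_probs[d] = log p(w_d) and lengths[d] = N(d)."""
    return math.exp(-sum(log_probs) / sum(lengths))

log_probs = [-120.0, -95.5, -143.2]  # made-up document log-likelihoods
lengths = [15, 12, 18]               # document lengths N(d) in tokens
print(round(perplexity(log_probs, lengths), 2))
```

Because the exponent is the negative average per-token log-likelihood, perplexity can be read as the effective branching factor the model faces per word; lower is better.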


Results

Table 8 shows the experimental results and t-test results. The proposed LDA-based seller profiling technique outperformed all the benchmark methods. It achieved the lowest perplexity (1,968.08), suggesting strong certainty in modeling unobserved documents (i.e., sellers). In contrast, the unigram model performed poorly, with a perplexity of 19,250.02. Results from the paired t-tests suggested that our proposed LDA-based top seller profiling technique significantly outperformed all the benchmark methods.

Discussion

In this experiment, the LDA-based top seller profiling technique had the best performance: LDA was effective in modeling seller advertisements. It outperformed the benchmark N-gram models because it models characteristics as distributions over words and sellers as distributions over characteristics, while N-gram models only profile sellers with frequent words. The perplexity value of our proposed model was comparable to Blei's [10], suggesting a plausible model. The unigram model was much worse than the others due to its independence assumption.

Case Study: The Cyber-Forensics Setting

To demonstrate the value of our system, we conducted a case study to examine its application in a real cyber-forensics setting. In particular, we show how we address the following questions when working with real carding communities: Who created the malware used to conduct the cyber carding crime? Who sold the stolen card data? What are the characteristics of top sellers in the carding community? We chose the top three forums (in terms of members and activities) in our research testbed as the case study data set. Each forum was processed individually so that we could compare the results across forums and further assess the generalizability of the system.

Table 8. Comparison of Models on Holdout Perplexity

          | Unigram    | Bigram     | Trigram    | LDA
Mean      | 19,250.02  | 4,388.08   | 3,682.49   | 1,968.08
Std. dev. | 2,119.24   | 653.05     | 684.41     | 306.07
p-value   | < 0.001*** | < 0.001*** | < 0.001*** | —

Notes: Asterisks represent significance level: *: p < 0.05, **: p < 0.01, ***: p < 0.001.


Key Seller Identification

We used the seller rating component of our system to identify the key sellers. The three best-rated sellers and the three worst-rated sellers for each of the three forums are listed in Table 9. Seller screen names are partially anonymized to avoid possible complications.

Three findings can be drawn from comparing these malware and stolen data sellers across all three forums. First, some forums had better-rated sellers than others. For example, stolen data sellers in Forum 1 were not as highly rated as those in the other two forums, while stolen data sellers in Forum 3 were the most highly rated among the three forums. Second, malware sellers generally tended to have higher ratings than the stolen data sellers in all three forums. This phenomenon is attributable to the nature of the goods being sold: the quality of malware is easier to determine than that of carding information, and it is not easy for carders to guarantee the validity of each carding record due to the banking industry's remedial actions. Third, all three forums had equally fraudulent malware sellers and stolen data sellers. This means that although different regulatory measures had been enforced in each forum, fraudulent sellers existed universally across these underground economies.

Top Seller Profiling

The top seller profiling component was trained on malware and stolen data advertisements separately. As is common practice in prior literature [5, 8, 45], the top

Table 9. Top 3 Best/Worst Malware and Stolen Data Sellers for Each Forum

                    Top 3 Best                       Top 3 Worst
         Malware         Stolen data          Malware         Stolen data
Rank   | User | Score  | User     | Score   | User | Score  | User | Score

Forum 1
1      | L**G | 5      | I**]     | 3.6     | N**g | 1.8    | I**s | 2.3
2      | V**U | 4.5    | A**s     | 3.5     | K**a | 2      | P**A | 2.3
3      | G**l | 4      | D**R»    | 3.4     | D**i | 2      | S**8 | 2.4

Forum 3
1      | H**l | 4      | Rescator | 4.4     | N**0 | 2      | F**4 | 1.3
2      | B**t | 4      | F**x     | 4       | 1**4 | 2      | S**3 | 1.3
3      | S**r | 4      | R**c     | 4       | M**D | 2      | L**u | 1.3

Forum 5
1      | P**t | 5      | B**r     | 4       | R**t | 1.5    | R**y | 1
2      | D**n | 4      | B**1     | 4       | W**0 | 1.6    | M**n | 1
3      | D**f | 4      | S**c     | 4       | G**n | 2      | J**a | 2

Notes: Asterisks are used to anonymize seller screen names. Bolded entries are key sellers on whom we provide more details.


10 seller characteristics were reviewed for each seller, and the top keywords were extracted to interpret each characteristic. The resulting seller characteristics were categorized into a taxonomy comprising three topical groups: product or service type, payment options, and contact channels. Our example relates to one of the top carding sellers, Rescator from Forum 3, who was famous for distributing stolen data from the Target breach [27]. Table 10 shows the top 5 seller characteristics of Rescator. Based on the top keywords from each seller characteristic, we profiled this particular seller in terms of product or service type, payment options, and contact channels: Rescator sold mostly CC (credit card) and dumps (card magnetic strip) data through a shop that required a deposit and registration, or through e-mail/icq/jabber, and accepted mainly webmoney, Bitcoin, and lesspay. To verify the findings from top seller profiling regarding Rescator, we compared the profile against the seller's advertisement in the forum. We show the essence of the advertisement in Figure 4a. The findings from the original advertisement generally matched our profiling. Furthermore, we were able to find Rescator's shop through the link in the advertisement (Figure 4b), where we found millions of stolen payment cards for sale.

Table 10. Top Seller Characteristics of Rescator

#  | Top key words                                                                 | Interpretation
5  | shop, wmz, icq, webmoney, price, dump                                         | Product: CCs, dumps (valid, verified);
6  | валид (valid), чекер (checker), карты (cards), баланс (balance), карт (cards) | Payment: wmz, webmoney, bitcoin, lesspay;
8  | shop, good, CCs, bases, update, cards, bitcoin, webmoney, validity, lesspay   | Contact: shop, register, deposit, e-mail, icq, jabber
11 | dollars, dumps, deposit, payment, sell, online, verified                      |
16 | e-mail, shop, register, icq, account, jabber                                  |


Figure 4. Illustration of Seller Characteristics: (a) Seller Characteristics Reflected in Rescator's Advertisement; (b) Rescator's Carding Shop


Conclusions

In this study, we proposed the AZSecure text mining-based system for identifying and profiling top sellers in the underground economy. Three experiments were performed to evaluate the efficacy of our system in identifying threads of relevance, classifying customer feedback sentiment, and profiling seller characteristics. Our proposed system outperformed the benchmark methods, reaching an average F-measure of about 80 percent to 90 percent. A case study was provided to illustrate the utility and applicability of our proposed system in a real cyber-forensics setting. We identified the best-rated and the least-reputable malware sellers and stolen data sellers in three carding forums, discovered both similarities and differences between malware sellers and stolen data sellers and across different forums, and delivered an accurate profile of the well-known carder, Rescator.

The contribution of our research is manifold. First, we proposed to study the underground economy through the new lens of its participants, especially the key sellers. Key sellers play a pivotal role in cyber carding crime by providing the critical products and services on which that crime depends. Knowing the key sellers and observing their behaviors enriches our understanding of the underground economy and may allow us to further contain potential crimes. Second, we developed advanced text mining techniques to analyze multilingual textual traces in the underground economy. As hackers increasingly congregate in hacker communities, the question of how to use hacker community data to inform cyber-security intelligence remains open [33]. Our research gives an example of leveraging social media analytics to probe into the underground economy in order to target key sellers and possibly prevent future fraud. Third, we developed a novel system capable of identifying and profiling key international underground economy sellers and conducted experiments to evaluate its effectiveness. A holistic feature set and advanced text mining techniques were leveraged to help identify and profile key sellers from carding forum discussions. Experiments demonstrated the effectiveness of our proposed system as compared with benchmark methods. Fourth, we built a hacker community data set encompassing eight major carding forums. As firsthand carding community data, this data set will benefit researchers in their academic exploration and practitioners in their crime investigations.

Future directions for this research are as follows. First, we will consider distinguishing the authenticity of customer reviews. We treated each review equally, assuming that customer feedback was a true reflection of reality; this may not hold if feedback manipulation exists. Second, we will investigate the correlation between price or reputation and seller quality. For example, does a high-quality seller necessarily benefit from higher prices and a better reputation? Third, we will further study buyers' advertisements, in which buyers ask for a specific product or service. These advertisements are also valuable for understanding carder behavior. Fourth, we intend to develop language-specific sentiment analysis that allows us to better capture the semantics of text in Eastern European languages, especially Russian.
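The evaluations above are reported as F-measures averaged across classes. As a reminder of how per-class and macro-averaged F1 scores are computed from a classifier's predictions, here is a minimal, self-contained sketch; the thread labels ("sell" for seller advertisements vs. "other") and the label vectors are purely hypothetical, not drawn from the paper's data.

```python
from collections import Counter

def f1_scores(y_true, y_pred):
    """Per-class F1 and the macro-averaged F1 over all observed labels."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # p was predicted but the true class was t
            fn[t] += 1          # t was missed
    per_class = {}
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        per_class[c] = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    macro = sum(per_class.values()) / len(labels)
    return per_class, macro

# Hypothetical gold labels and predictions for six forum threads
true = ["sell", "sell", "other", "sell", "other", "other"]
pred = ["sell", "other", "other", "sell", "other", "sell"]
per_class, macro = f1_scores(true, pred)  # macro F1 here is 2/3
```

Macro-averaging, used here, weights every class equally regardless of its frequency, which matters in forum data where relevant (seller) threads are typically a small minority.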

THE AZSECURE TEXT MINING SYSTEM 1083

Acknowledgments: This work was supported in part by the National Science Foundation under Grant nos. SES-1314631 and DUE-1303362.

REFERENCES

1. Abbasi, A.; Chen, H.; and Nunamaker, J.F. Jr. Stylometric identification in electronic markets: Scalability and robustness. Journal of Management Information Systems, 25, 1 (2008), 49–78.

2. Abbasi, A.; Li, W.; Benjamin, V.; Hu, S.; and Chen, H. Descriptive analytics: Investigating expert cybercriminals in web forums. In M. den Hengst, M. Israël, and D. Zeng (eds.). Proceedings of the IEEE Joint Intelligence and Security Informatics Conference. The Hague: IEEE, 2014, pp. 56–63.

3. Abhishek, V.; Gong, J.; and Li, B. Examining the impact of contextual ambiguity on search advertising keyword performance: A topic model approach. Available at SSRN: https://ssrn.com/abstract=2404081 or http://dx.doi.org/10.2139/ssrn.2404081

4. Aggarwal, R., and Singh, H.H. Differential influence of blogs across different stages of decision making: The case of venture capitalists. MIS Quarterly, 37, 4 (2013), 1093–1112.

5. Aral, S.; Ipeirotis, P.; and Taylor, S. Content and context: Identifying the impact of qualitative information on consumer choice (March 12, 2011). Available at SSRN: https://ssrn.com/abstract=1784376 or http://dx.doi.org/10.2139/ssrn.1784376

6. Aral, S., and Walker, D. Creating social contagion through viral product design: A randomized trial of peer influence in networks. Management Science, 57, 9 (2011), 1623–1639.

7. Archak, N.; Ghose, A.; and Ipeirotis, P.G. Deriving the pricing power of product features by mining consumer reviews. Management Science, 57, 8 (2011), 1485–1509.

8. Bao, Y., and Datta, A. Simultaneously discovering and quantifying risk types from textual risk disclosures. Management Science, 60, 6 (2014), 1371–1391.

9. Benjamin, V.A.; Li, W.; Holt, T.J.; and Chen, H. Exploring threats and vulnerabilities in hacker web forums, IRC and carding shops. In Proceedings of the IEEE International Conference on Intelligence and Security Informatics. Baltimore, MD, 2015, pp. 85–90.

10. Blei, D.M. Probabilistic topic models. Communications of the ACM, 55, 4 (2012), 77–84.

11. Bollen, J.; Mao, H.; and Zeng, X. Twitter mood predicts the stock market. Journal of Computational Science, 2, 1 (2011), 1–8.

12. Chen, D.; Socher, R.; Manning, C.D.; and Ng, A.Y. Learning new facts from knowledge bases with neural tensor networks and semantic word vectors (March 16, 2013). Available at arXiv: https://arxiv.org/abs/1301.3618

13. Chu, B.; Holt, T.J.; and Ahn, G.J. Examining the Creation, Distribution, and Function of Malware On Line. 2010. Available at: www.ncjrs.gov/App/Publications/abstract.aspx?ID=252143

14. Chung, W.; Chen, H.; and Nunamaker, J.F. Jr. A visual framework for knowledge discovery on the web: An empirical study of business intelligence exploration. Journal of Management Information Systems, 21, 4 (2005), 57–84.

15. Derrick, D.C.; Meservy, T.O.; Jenkins, J.L.; Burgoon, J.K.; and Nunamaker, J.F. Jr. Detecting deceptive chat-based communication using typing behavior and message cues. ACM Transactions on Management Information Systems, 4, 2 (2013), 1–21.

16. Fallmann, H.; Wondracek, G.; and Platzer, C. Covertly probing underground economy marketplaces. In C. Kreibich and M. Jahnke (eds.). Detection of Intrusions and Malware, and Vulnerability Assessment. DIMVA 2010. Lecture Notes in Computer Science, vol. 6201. Bonn: Springer, 2010, pp. 101–110.

17. Fossi, M.; Johnson, E.; Turner, D.; et al. Symantec report on the underground economy. Journal of Financial Services Technology, 3, 1 (2009), 77–82.

18. Fuller, C.M.; Biros, D.P.; Burgoon, J.; and Nunamaker, J.F. Jr. An examination and validation of linguistic constructs for studying high-stakes deception. Group Decision and Negotiation, 22, 1 (2013), 117–134.

19. De Graaf, D.; Shosha, A.; and Gladyshev, P. BREDOLAB: Shopping in the cybercrime underworld. In M. Rogers and K.C. Seigfried-Spellar (eds.). Digital Forensics and Cyber Crime. ICDF2C 2012. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol. 114. Lafayette: Springer, 2012, pp. 302–313.

20. Herley, C., and Florêncio, D. Nobody sells gold for the price of silver: Dishonesty, uncertainty and the underground economy. In T. Moore, D. Pym, and C. Ioannidis (eds.). Economics of Information Security and Privacy. Boston: Springer, 2010, pp. 33–53.

21. Holt, T.J. Examining the forces shaping cybercrime markets online. Social Science Computer Review, 31, 2 (September 2012), 165–177.

22. Holt, T.J. Exploring the social organisation and structure of stolen data markets. Global Crime, 14, 2–3 (2013), 155–174.

23. Holt, T.J., and Lampke, E. Exploring stolen data markets online: Products and market forces. Criminal Justice Studies, 23, 1 (2010), 33–50.

24. Holt, T.J.; Strumsky, D.; Smirnova, O.; and Kilger, M. Examining the social networks of malware writers and hackers. International Journal of Cyber Criminology, 6, 1 (2012), 891–903.

25. Jensen, M.L.; Averbeck, J.M.; Zhang, Z.; and Wright, K.B. Credibility of anonymous online product reviews: A language expectancy perspective. Journal of Management Information Systems, 30, 1 (2013), 293–324.

26. Kaspersky Lab. Carbanak APT: The great bank robbery. Securelist, 2015. https://securelist.com/files/2015/02/Carbanak_APT_eng.pdf

27. Krebs, B. The Target breach, by the numbers. KrebsonSecurity, 2014. http://krebsonsecurity.com/2014/05/the-target-breach-by-the-numbers/

28. Lau, R.Y.K.; Liao, S.S.Y.; Wong, K.F.; and Chiu, D.K.W. Web 2.0 environmental scanning and adaptive decision support for business mergers and acquisitions. MIS Quarterly, 36, 4 (2012), 1239–1268.

29. Lau, R.Y.K.; Xia, Y.; and Ye, Y. A probabilistic generative model for mining cybercriminal networks from online social media. IEEE Computational Intelligence Magazine, 9, 1 (2014), 31–43.

30. Li, X.; Chen, H.; Zhang, Z.; Li, J.; and Nunamaker, J.F. Jr. Managing knowledge in light of its evolution process: An empirical study on citation network-based patent classification. Journal of Management Information Systems, 26, 1 (2009), 129–154.

31. Lu, Y.; Polgar, M.; Luo, X.; and Cao, Y. Social network analysis of a criminal hacker community. Journal of Computer Information Systems, 51, 2 (2010), 31–41.

32. Ma, X.; Khansa, L.; Deng, Y.; and Kim, S.S. Impact of prior reviews on the subsequent review process in reputation systems. Journal of Management Information Systems, 30, 3 (2013), 279–310.

33. Mahmood, M.A.; Siponen, M.; Straub, D.; and Rao, H.R. Moving toward black hat research in information systems security: An editorial introduction to the special issue. MIS Quarterly, 34, 3 (2010), 431–433.

34. Manning, C.D., and Klein, D. Optimization, maxent models, and conditional estimation without magic. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology: Tutorials, Volume 5. Edmonton, Canada: Association for Computational Linguistics, 2003, pp. 8–8.

35. McCallum, A.K. MALLET: A machine learning for language toolkit. 2002. Available at: http://mallet.cs.umass.edu

36. Motoyama, M.; McCoy, D.; Levchenko, K.; Savage, S.; and Voelker, G.M. An analysis of underground forums. In Proceedings of the 2011 ACM SIGCOMM Conference on Internet Measurement Conference. New York: ACM Press, 2011, pp. 71–80.

37. Nunamaker, J.F. Jr.; Chen, M.; and Purdin, T.D.M. Systems development in information systems research. Journal of Management Information Systems, 7, 5 (1990), 89–106.

38. Nunamaker, J.F. Jr.; Derrick, D.C.; Elkins, A.C.; Burgoon, J.K.; and Patton, M.W. Embodied conversational agent-based kiosk for automated interviewing. Journal of Management Information Systems, 28, 1 (2011), 17–48.

39. Odabas, M.; Breiger, R.; and Holt, T.J. Toward an economic sociology of online hacker communities. In Twenty-Seventh Annual Meeting. London, England: Society for the Advancement of Socio-Economics, 2015.

40. Peretti, K. Data breaches: What the underground world of carding reveals. Santa Clara Computer and High Technology Law Journal, 25 (2008), 375–413.

41. Png, I.P.L., and Wang, Q.-H. Information security: Facilitating user precautions vis-à-vis enforcement against attackers. Journal of Management Information Systems, 26, 2 (September 2009), 97–121.

42. Radianti, J. A study of a social behavior inside the online black markets. In R. Savola, M. Takesue, R. Falk, and M. Popescu (eds.). 2010 Fourth International Conference on Emerging Security Information, Systems and Technologies. Venice: IEEE, 2010, pp. 189–194.

43. Romano, N.C. Jr.; Donovan, C.; Chen, H.; and Nunamaker, J.F. Jr. A methodology for analyzing web-based qualitative data. Journal of Management Information Systems, 19, 4 (2003), 213–246.

44. Sack, W. Conversation map: An interface for very-large-scale conversations. Journal of Management Information Systems, 17, 3 (2000), 73–92.

45. Singh, P.V.; Sahoo, N.; and Mukhopadhyay, T. How to attract and retain readers in enterprise blogging? Information Systems Research, 25, 1 (March 2014), 35–52.

46. Socher, R.; Perelygin, A.; Wu, J.Y.; et al. Recursive deep models for semantic compositionality over a sentiment treebank. In D. Yarowsky, T. Baldwin, A. Korhonen, K. Livescu, and S. Bethard (eds.). Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP). Seattle, WA, 2013, pp. 1631–1642.

47. Taboada, M.; Brooke, J.; and Tofiloski, M. Lexicon-based methods for sentiment analysis. Computational Linguistics, 37, 2 (2011), 267–307.

48. United States Department of Homeland Security. U.S. Secret Service arrests one of the world's most prolific traffickers of stolen financial information. 2014. Available at: https://www.dhs.gov/news/2014/07/07/us-secret-service-arrests-one-worlds-most-prolific-traffickers-stolen-financial

49. United States Department of Justice. Five indicted in New Jersey for largest known data breach conspiracy. 2013. Available at: https://www.justice.gov/opa/pr/five-indicted-new-jersey-largest-known-data-breach-conspiracy

50. Verizon and Verizon Business. 2014 data breach investigations report. Verizon Business Journal (2014), 1–60.

51. Wang, Q.; Yue, W.; and Hui, K. Do hacker forums contribute to security attacks? In M.J. Shaw, D. Zhang, and W.T. Yue (eds.). E-Life: Web-Enabled Convergence of Commerce, Work, and Social Life. Lecture Notes in Business Information Processing, vol. 108. Shanghai: Springer, 2011, pp. 143–152.

52. Wu, L. Social network effects on productivity and job security: Evidence from the adoption of a social networking tool. Information Systems Research, 24, 1 (2013), 30–51.

53. Yip, M.; Shadbolt, N.; and Webber, C. Why forums? An empirical analysis into the facilitating factors of carding forums. In H. Davis, H. Halpin, and A. Pentland (eds.). Proceedings of the 5th Annual ACM Web Science Conference. Paris, France, 2013, pp. 453–462.

54. Zhang, X., and Li, C. Survival analysis on hacker forums. In K.R. Lang and W.T. Yue (eds.). SIGBPS Workshop on Business Processes and Services. Milan, Italy, 2013, pp. 106–110.

55. Zhang, X.; Tsang, A.; Yue, W.T.; and Chau, M. The classification of hackers by knowledge exchange behaviors. Information Systems Frontiers, 17, 6 (2015), 1239–1251.

56. Zhao, Z.; Ahn, G.; Hu, H.; and Mahi, D. SocialImpact: Systematic analysis of underground social dynamics. In S. Foresti, M. Yung, and F. Martinelli (eds.). Computer Security – ESORICS 2012. ESORICS 2012. Lecture Notes in Computer Science, vol. 7459. Pisa: Springer, 2012, pp. 877–894.
