Research Article

Automatically Traceback RDP-Based Targeted Ransomware Attacks

ZiHan Wang,1 ChaoGe Liu,1 Jing Qiu,2 ZhiHong Tian,2 Xiang Cui,2 and Shen Su2

1 Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China
2 Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou, China

Correspondence should be addressed to ChaoGe Liu (liuchaoge@iie.ac.cn), Jing Qiu (qiujing@gzhu.edu.cn), and ZhiHong Tian (tianzhihong@gzhu.edu.cn)

Received 13 July 2018; Revised 24 October 2018; Accepted 22 November 2018; Published 6 December 2018

Guest Editor: Vishal Sharma

Hindawi, Wireless Communications and Mobile Computing, Volume 2018, Article ID 7943586, 13 pages, https://doi.org/10.1155/2018/7943586

Copyright © 2018 ZiHan Wang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

While various ransomware defense systems have been proposed to deal with traditional randomly spread ransomware attacks (based on their distinctive high-noise behaviors on hosts and networks), none of them considers ransomware attacks that precisely target specific hosts, e.g., via the common Remote Desktop Protocol (RDP). To address this problem, we propose a systematic method to fight such specifically targeted ransomware by trapping attackers in a network deception environment and then using traceback techniques to identify attack sources. In particular, we developed various monitors in the proposed deception environment to gather traceable clues about attackers, and we further designed an analysis system that automatically extracts and analyzes the collected clues. Our evaluations show that the proposed method can trap the adversary in the deception environment and significantly improve the efficiency of clue analysis. Furthermore, it also helps us trace back RDP-based ransomware attackers and ransomware makers in practical applications.

1. Introduction

Ransomware first emerged in the late 1980s [1, 2] and has resurfaced since 2013 [3]. Recently, several widespread ransomware attacks have caused significant damage to a large number of user systems and businesses on the Internet. Symantec reported a 250% increase in new crypto ransomware families between 2013 and 2014 [2]. In May 2017, WannaCry spread across more than 150 countries and 200,000 computers in just a few days and severely disrupted many businesses and personal systems [4, 5]. In addition, specifically targeted ransomware such as Crysis has disrupted many small and large enterprises across the globe; e.g., Trend Micro observed that the Crysis family specifically targeted businesses in Australia and New Zealand in September 2016. The number of such targeted ransomware attacks doubled in January 2017 compared with late 2016 [6]. Moreover, the lack of focus on security has left IoT (Internet of Things) devices vulnerable; they have been the target of 10% of all ransomware attacks. Researchers predict that IoT ransomware attacks are likely to increase to around 25% to 30% of all ransomware cases [7].

Because traditional ransomware typically spreads randomly, without specific targets, via network scanning or host probing, it can be easily detected by monitoring abnormal behaviors in host activities such as file system operations and network traffic [1, 3, 8]. Recently, more and more ransomware attacks have aimed at specific targets. The Kaspersky Security Bulletin indicated that targeted attacks became one of the main propagation methods for several widespread ransomware families in 2017 [9, 10]. For instance, an attacker using Crysis ransomware first logs in to a victim's host through a brute-force attack on the common Remote Desktop Protocol (RDP) and then spreads the ransomware. Such a targeted ransomware attack usually has a clear command-and-control structure and aims at resource exploitation and resource theft on these targets, while generating fairly limited noise on hosts and networks, which makes it hard to detect.

Existing ransomware defense methods (designed for dealing with randomly spread attacks) usually protect a host by blocking the spreading of ransomware attacks (in nearly real time) based on the signatures generated by ransomware detection solutions. However, because targeted ransomware attacks have different characteristics with less notable patterns, these traditional blocking-based defense systems become much less effective against them.

To address this issue, we propose to utilize advanced defense schemes to protect important hosts under targeted ransomware attacks. In this paper, we utilize cyber deception technology to protect critical systems through attack guidance, drawing attackers away from the protected systems. While cyber deception technology helps us protect important targets (such as in dealing with Advanced Persistent Threats (APTs) [11, 12]), it cannot by itself help us trace back attack sources. We therefore further design specific techniques to trace back RDP-based ransomware attacks and identify the original attack sources, as the main deterrent to ransomware attackers.

Our deception environment simulates an actual user system in three layers, with multiple monitors that observe key system operations related to login, network communication, the clipboard, processes, shared folders, and the file system. It collects traceable clues and helps us detect RDP ransomware attacks. Because traditional traceback methods usually require security experts to manually analyze a large number of collected clues, it is difficult for them to achieve fast responses. Therefore, we develop an automatic analysis system that works on traceable clues by taking advantage of natural language processing and machine learning techniques.

To evaluate our system, we invited 122 volunteers to take part in a simulated RDP-based ransomware attack. The proposed system was able to capture traceable clues through the proposed deception environment. It can also analyze the clues automatically and effectively; the convergence rate of the analysis system reaches about 98%. Moreover, we demonstrated that it helps us trace back RDP-based attack sources in practical applications.

In summary, this paper makes the following contributions:

(i) We propose a systematic method to deter RDP-based ransomware by identifying attackers: it traps ransomware attackers in a cyber deception environment and uses an automatic analysis system to obtain traceable clues and identify attack sources.

(ii) We build a deception environment to trap RDP-based ransomware attackers by simulating a user environment in three layers: a network layer, a host layer, and a file system layer. The environment helps us discover attacker behaviors and collects attacker-related information.

(iii) We develop an automatic analysis system that uses natural language processing and machine learning techniques to automatically recognize effective clues for tracing back ransomware attack sources.

(iv) We designed two practical experiments to test the system against RDP-based ransomware attacks and ransomware makers and demonstrated the feasibility of the proposed system.

The remainder of the paper is structured as follows. In Section 2, we briefly present background and related work. In Section 3, we describe the methodology of our systematic method. In Section 4, we present the implementation of our deception environment prototype. In Section 5, we describe the details of the clue analysis system. In Section 6, we discuss the evaluation setup and results. We conclude this paper in Section 7.

2. Background and Related Work

2.1. Related Work on Ransomware Defense. Ransomware is a type of malware that manipulates a user's system to extort money. It operates in many different ways, e.g., simply locking a user's desktop or encrypting an entire file system. Recent rampant ransomware attacks have called for effective ransomware defense solutions, and several studies have proposed countermeasures against this attack [14, 15].

Some of these solutions are proposed to deal with all types of ransomware [1, 16–20]. For example, Kharraz presented a dynamic analysis system called UNVEIL. The system analyzes and detects ransomware attacks by modeling ransomware behaviors. It focuses on the observation of three elements, namely, I/O data buffer entropy, access patterns, and file system activities [1]. Others are type-specific solutions that deal with only one type, such as crypto-ransomware [21–25]. For example, Scaife presented an early-warning detection system that alerts users during suspicious file activities [21]. Utilizing a set of behavior indicators, the detection system can halt a process that appears to tamper with a large amount of user data. Furthermore, it is claimed that the system can stop a ransomware execution with a median loss of only 10 files. Similarly, some studies tackle the detection of specific ransomware families only [26–28]. For example, Maltester is a family-specific technique proposed by Cabaj to detect Cryptowall infections [27]. It employs dynamic analysis along with honeypot technology to analyze the network behavior and detect the infection chain.

These solutions can be categorized into prevention and detection. However, these two kinds of countermeasures have the following disadvantages. First, many prevention measures require services to be disabled, which is likely to affect service functionality. For example, Prakash suggested several prevention measures, including disabling macros in Office documents and restricting access permissions on the "Temp" and "Appdata" folders [29]. Second, a detection system often finds it difficult to conceal itself and perform its functions against ransomware attacks that precisely target specific hosts, e.g., via the common Remote Desktop Protocol (RDP). Finally, while these countermeasures can be used to detect or block specific ransomware attacks, they cannot fundamentally inhibit the spread of ransomware. Traceability technology, in contrast, can fundamentally inhibit the spread of ransomware by tracing attacks back to their sources.

2.2. RDP-Based Ransomware Attacks. Traditional ransomware spreads randomly across the Internet on a large scale in executable files, development kits, macro files, and other malicious programs, with various dissemination methods, including phishing emails, watering hole attacks, vulnerability attacks, server intrusion, and supply chain pollution. These methods use different ways to trick a victim into launching such programs. Among these dissemination methods, phishing email is the most widely used. However, according to Kaspersky's 2017 ransomware report, the number of targeted ransomware attacks based on RDP is growing rapidly.

Figure 1: Data collection and analysis process of the whole prototype. Stages shown: (1) ensnare attackers into the deception environment, (2) monitor, with a ransomware-detection decision (yes/no), (3) extraction, (4) analysis, and (5) result.

Recently, more and more ransomware criminals have gained access through RDP services and then installed ransomware manually. These attackers use a brute-force method to acquire usernames and passwords on a target machine with an active RDP service [9]. For instance, one of the typical families, Crysis, a copycat of Locky, not only aims at common businesses but also targets healthcare service providers [9]. Crysis gains admin-level privileges by stealing passwords and credentials. In addition, during an RDP session, the attacker uses both the clipboard and shared folders to upload files to the remote host and can then install the ransomware manually.

2.3. Cyber Deception Technology. Because many critical systems are well known and always on, it is difficult to protect them from potential network attacks:

(i) Attackers can use zero-day vulnerabilities, highly antagonistic malicious code, or other resources to break the defense system.

(ii) Because humans are always the weakest link in defense systems, attackers can use social engineering to identify system weaknesses and penetrate the defense.

(iii) Attackers can repeatedly explore the potential vulnerabilities on a target system to identify its weaknesses.

However, when an attacker aims at a specific target, e.g., by exploiting its RDP service, traditional passive defense methods are usually less effective. Therefore, we need advanced active solutions, such as cyber deception, to deal with attacks that have few observable features.

The earliest use of cyber deception technology is the honeypot. A honeypot detects attacks by deploying, in the service network, a series of systems or resources that carry no real business. When a trap is accessed, it indicates an attack. A honeypot system generally waits for attacks passively and does not play the role of misleading and confusing attackers. What is more, a honeypot system has no real business and lacks high-interaction characteristics, so it may be easily identified by attackers. Compared with the traditional honeypot system, a cyber deception system can be deployed more conveniently, its environment is more realistic, and it can be linked with existing defense products. It can provide more effective defense against APT attacks, ransomware attacks, intranet attacks, and other threats. A Gartner report in 2015 [30] pointed out the market prospects of deception-based security defense technology and predicted that 10% of organizations would use deception tools (or tactics) to counter cyber-attacks in 2018. Compared with the traditional passive defense approach, cyber deception technology is an active defense approach and can be applied to all stages of network attacks. We can use this technology to trap RDP-based attackers, detect targeted attacks, and deter ransomware attackers by precisely identifying them.

Trap RDP-Based Ransomware Attackers. A targeted ransomware attack generally has three steps: detection, infiltration, and execution [31]. However, traditional security solutions are unable to cope with the internal translation phase. In addition, traditional honeypot technology (often used to fight network attacks) generally does not focus on tracing back to attackers. Cyber deception technology, however, can lure the attacker into a surveillance environment and consume his time and energy with bait information.

Detect RDP-Based Ransomware Attacks. Once the attacker obtains the correct username and password combination, he usually returns multiple times within a short period to try to infect the compromised host [6]. In one particular case, Crysis was deployed six times on an endpoint within a span of 10 minutes. As a result, by monitoring the cyber deception environment, we can detect RDP-based ransomware attacks in time and determine the attacker's behavior through the environment monitor.

Deter the Ransomware Attacker. Deterring ransomware attackers can be approached in two ways. First, if an attacker realizes he is entrapped, this becomes a deterrent. Second, if the attacker is exposed to the deception environment and remains within the view of the defense surveillance, the monitor can collect traceable clues that the attacker accidentally releases (e.g., IP addresses, paths, and nickname strings). The exposure of clues that attackers intended to keep hidden can be a powerful deterrent to other attackers.

3. Methodology

In this section, we describe our method of tracing back RDP-based ransomware attackers. Figure 1 summarizes the data collection and analysis process of the entire prototype. First, we implement a deception environment to trap attackers. Second, we monitor RDP-based ransomware attacks and collect information when they occur. Third, we extract effective clues from the monitor information. Fourth, we use automatic analysis to screen a large number of clues for tracing back the attacker. Finally, we generate a report to trace back the RDP-based ransomware attacker. We refer readers to Sections 4 and 5 for the detailed implementation of this prototype.

3.1. Deception Environment. Generally, the ransomware attack execution stage has two steps: login and spread [31]. Building a deception environment is nontrivial in practice, because it must make the ransomware attacker believe that it belongs to a real user and that the user data is worth attacking. Because advanced attackers always exploit static features of certain analysis systems before they launch attacks [32], an intuitive approach to address such reconnaissance is to build the user environment in such a way that the user data is valid, real, and nondeterministic. In addition, the environment serves as an "enticing target" to encourage ransomware attackers. We elaborate on how to generate an artificial, realistic, and enticing user environment for RDP-based ransomware in Section 4.

RDP-based attackers commonly upload malicious programs in the following ways before spreading ransomware: (1) the attacker downloads malicious programs from the Internet; (2) programs are transferred through FTP, SCP, or other transport protocols; (3) programs are uploaded through the clipboard; (4) programs are uploaded through a shared folder. The clipboard and shared folders are most commonly used to transfer programs in RDP-based ransomware attacks because they are simple and convenient. However, both are easy to monitor with our proposed system.

3.2. Environment Monitor. After the attacker logs in to the environment, a shared folder and the clipboard on the remote PC are typically used to transfer ransomware programs from the attacker's machine. To collect as much attacker information as possible without being noticed by the attacker, this paper proposes three monitor layers: the network layer, the host layer, and the file layer. We elaborate on how to configure the monitor system for the deception environment in Section 4.

The Network Layer Monitor. The network layer monitor detects a remote connection and collects information including the remote IP addresses, remote ports, status codes of ports, keyboard layout, and so on. When the RDP-based attacker logs in to the host, the monitor can obtain this information and detect the attack without the attacker's knowledge.

The Host Layer Monitor. We propose to detect changes in, for example, processes and the clipboard by monitoring the host layer. The host layer monitor can gather information about the attacker's behavior and use of these system applications in the deception environment. For instance, as the clipboard resides in the system-level heap space, any application in the system has access to it. RDP-based ransomware attackers often take advantage of the clipboard to exchange data between applications. Moreover, the monitor might obtain clues left by attackers who use the clipboard locally, as the Windows system shares the clipboard by default during an RDP session.

The File Layer Monitor. By monitoring the file layer, we can identify ransomware attacks through file changes. Furthermore, the monitor can gather local traceable information by monitoring files in the shared folder. For instance, a shared folder on the remote PC is often used to transfer ransomware from the attacker's machine during the RDP session. In addition, for a more convenient and quick attack, the attacker often mounts the entire local disk to the remote computer. As a result, by monitoring shared folders, we can detect newly added shared folders in real time and capture a large amount of path information automatically.

3.3. Clue Extraction. Through the environment monitors, the system can gather a lot of information left by attackers, such as login information, communication information, clipboard content, folder paths, and portable executable (PE) files. Many traceable clues can be extracted from it, including but not restricted to IP addresses, keyboard layouts, compile paths, and file paths. In order to analyze these clues quickly, we divide them into two categories: string clues and path clues. These clues are then submitted to the automatic analysis system. We elaborate on the types of clues that the proposed system can extract in Section 5.1.
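The split into string clues and path clues can be illustrated with a minimal sketch. The paper does not say how the two categories are told apart, so the regular expression and the function names `categorize_clue` and `split_clues` below are assumptions made only for illustration: anything that looks like a Windows drive path or a UNC path is treated as a path clue, and everything else as a string clue.

```python
import re

# Hypothetical rule (not from the paper): a clue is a "path clue" if it starts
# with a drive letter ("C:\") or a UNC prefix ("\\") and contains no characters
# that are illegal in Windows paths.
WINDOWS_PATH_RE = re.compile(r'^(?:[A-Za-z]:\\|\\\\)[^<>"|?*\r\n]*$')


def categorize_clue(raw: str) -> str:
    """Return 'path' for Windows-style paths, otherwise 'string'."""
    return "path" if WINDOWS_PATH_RE.match(raw.strip()) else "string"


def split_clues(raw_clues):
    """Group raw clues into the two categories handed to the analysis system."""
    grouped = {"string": [], "path": []}
    for clue in raw_clues:
        grouped[categorize_clue(clue)].append(clue.strip())
    return grouped
```

A real extractor would likely need further rules (for example, for IP addresses or keyboard layout identifiers), which this sketch leaves in the generic string category.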

3.4. Automatic Analysis. According to our investigation, current traceback tools mostly rely on manual clue analysis. However, we usually have to deal with a large number of clues with no semantic correlation. Because such manual traceback analysis usually takes a lot of time and effort, we propose an automatic analysis system; we elaborate on how to analyze clues automatically in Section 5.2.

4. Implementation of the Deception Environment

As the Windows platform is the main target of ransomware, we choose Windows for the proof-of-concept implementation. In this section, we describe the implementation details of a Windows-based deception environment prototype. We elaborate on how the deception environment traps ransomware attackers and how the monitor detects RDP-based attacks and collects traceable clues. The entire system implementation process is shown in Figure 2.

4.1. At the Network Layer

Figure 2: RDP-based ransomware attack traceback system process. Stages shown: (1) construct deception environment; (2) environment monitor (login monitor and communication monitor at the network layer; clipboard monitor and process monitor at the host layer; shared folder monitor and file monitor at the file layer, with ransomware detection); (3) clue extraction (login clues, remote host clues, clipboard clues, compile clues, path clues); (4) automated analysis (users and programs analysis, network address analysis, language identification, traceable strings identification, auxiliary traceable clues); (5) result.

4.1.1. The Login Monitor. The login monitor is used to detect attacks in real time and collect the attacker's login information. On the Windows platform, Win32 is an environment subsystem that provides an API for operating system services and functions to control all user inputs and outputs. The login monitor relies on Windows APIs to gain access to the system and runs with privileges to access its own areas of memory. It uses Winsock 2.0, which provides access to networks and can also use protocols other than the TCP/IP suite. The login monitor takes network requests and sends those requests to the Winsock 2.0 SPI (Service Provider Interface) by calling the main Winsock 2.0 file, Ws2_32.dll, which provides access to transport service providers and namespace providers. The IP Helper API makes it possible to get and modify network configuration settings for the local host. It consists of the DLL file iphlpapi.dll and includes functions that can retrieve information about protocols such as TCP and UDP [33]. As a result, the login monitor can directly access the data buffers involved in transmission control protocols, routing tables, network interfaces, and network protocol statistics.
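As a rough illustration of what the login monitor looks for, the sketch below extracts remote endpoints connected to the RDP port from a TCP connection table. It deliberately swaps the IP Helper API for the text output of `netstat -an`, which is easier to show in a few lines; the function name `parse_rdp_connections` and the assumed column layout (`Proto`, local address, foreign address, state) are illustrative assumptions, not the authors' implementation.

```python
RDP_PORT = 3389  # default RDP listening port


def parse_rdp_connections(netstat_output: str, port: int = RDP_PORT):
    """Return (remote_ip, remote_port, state) for TCP rows whose local port is `port`."""
    connections = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed row shape: TCP <local addr:port> <remote addr:port> <state>
        if len(fields) < 4 or fields[0].upper() != "TCP":
            continue
        local, remote, state = fields[1], fields[2], fields[3]
        local_port = local.rsplit(":", 1)[-1]
        # Skip other services and the passive listening socket itself.
        if local_port != str(port) or state.upper() == "LISTENING":
            continue
        remote_ip, _, remote_port = remote.rpartition(":")
        connections.append((remote_ip, int(remote_port), state))
    return connections
```

On a live host the input could come from `subprocess.run(["netstat", "-an"], capture_output=True, text=True).stdout`; a monitor built on the IP Helper API would read the same table without spawning a process.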

4.1.2. The Communication Monitor. The communication monitor is responsible for capturing network traffic. By locating the TCP packets in the network traffic that contain the interactive configuration information of an RDP login, the RDP connection information can be obtained as the attacker's personal information. The main packet characteristics are as follows: (1) the packet's name is often ClientData; (2) the packet usually appears after the TCP three-way handshake; (3) the packet is positioned near the front of the session; (4) the amount of data is significantly larger.
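To make the kind of information carried by the ClientData packet concrete, the following sketch decodes the leading fields of an RDP Client Core Data block. The field layout (header type 0xC001, then version, desktop size, color depth, SAS sequence, keyboard layout, client build, and a 32-byte UTF-16LE client name) follows our reading of the public MS-RDPBCGR specification and should be treated as an assumption rather than as the paper's code. Locating the block inside the surrounding TPKT, X.224, and MCS layers is not shown, and `parse_client_core_data` is a hypothetical helper name.

```python
import struct

CS_CORE = 0xC001  # user data header type for Client Core Data (per MS-RDPBCGR)
# type, length, version, width, height, colorDepth, SASSequence,
# keyboardLayout, clientBuild, clientName (32 bytes, UTF-16LE)
_CORE_FMT = "<HHIHHHHII32s"
_CORE_SIZE = struct.calcsize(_CORE_FMT)


def parse_client_core_data(block: bytes) -> dict:
    """Decode traceable fields from the start of a Client Core Data block."""
    if len(block) < _CORE_SIZE:
        raise ValueError("block too short for Client Core Data")
    (block_type, _length, _version, width, height, _depth, _sas,
     keyboard_layout, client_build, raw_name) = struct.unpack_from(_CORE_FMT, block)
    if block_type != CS_CORE:
        raise ValueError("not a Client Core Data block")
    # The client name is null-padded; keep only the part before the first NUL.
    name = raw_name.decode("utf-16-le", errors="replace").split("\x00", 1)[0]
    return {
        "desktop": (width, height),
        "keyboard_layout": keyboard_layout,
        "client_build": client_build,
        "client_name": name,
    }
```

The keyboard layout identifier and the client host name are the fields most relevant to traceback, since they describe the attacker's own machine rather than the deception host.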

4.2. At the Host Layer

4.2.1. The Deception Host. According to our observation, the main methods used by attackers to log in to a remote host are (1) direct login with a weak password and (2) login with an access account added through a vulnerability. As a result, we deliberately set weak passwords for the administrator accounts of the deception environment and leave common vulnerabilities, such as EternalBlue, in the environment to attract attackers. To guide attackers to upload ransomware using only the clipboard and shared folders, we block external traffic transfers from outside the environment and close common transfer ports (e.g., ports 20, 21, 80, and 443).

4.2.2. The Clipboard Monitor. The clipboard monitor can obtain clues in real time by monitoring the clipboard's changes. It uses the Clipboard Viewer to listen to message changes in the clipboard without affecting its contents. The Clipboard Viewer is a mechanism that can get and display the contents of the clipboard. As Windows applications are message-driven, the key to the monitor is responding to and processing clipboard change messages. When the content changes, Windows sends the WM_DRAWCLIPBOARD message to the first window of the Clipboard Viewer Chain. After each Clipboard Viewer window responds to and processes the message, it must send the message to the next window according to the handle of the next window in its saved linked list. The clipboard monitor can obtain the clipboard's new contents by using the "GetClipboardData" Windows API through the window. When a subsequent copy or cut operation is executed, the data in the clipboard is rewritten. The clipboard monitor therefore listens in real time and writes the information to a log file, which is updated whenever the monitor receives a clipboard change notice. When the log file is updated, to prevent it from being detected by the attacker or encrypted by ransomware, the monitor sends it to a secure host and completely erases it from the environment.
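The logging behavior described above (record every change, ignore repeats) can be sketched independently of the Windows message loop. The example below replaces the message-driven Clipboard Viewer chain with simple polling, which is a different and cruder technique, and takes the clipboard reader as an injected callable. `ClipboardLogger` and `read_clipboard` are hypothetical names; on Windows the callable would presumably wrap GetClipboardData.

```python
import time
from typing import Callable, List, Optional, Tuple


class ClipboardLogger:
    """Record clipboard contents each time they differ from the last reading."""

    def __init__(self, read_clipboard: Callable[[], str]):
        self._read = read_clipboard          # hypothetical reader, injected by the caller
        self._last: Optional[str] = None
        self.entries: List[Tuple[float, str]] = []  # (timestamp, content)

    def poll(self) -> bool:
        """Read the clipboard once; log and return True only if it changed."""
        current = self._read()
        if current == self._last:
            return False
        self._last = current
        self.entries.append((time.time(), current))
        return True
```

A deployment along the lines of the paper would forward each new entry to the secure host and delete the local copy, a step omitted here.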

4.2.3. The Process Monitor. To run a program on a Windows system, a new process must be created. The monitor obtains the PE file run by the attacker by monitoring the environment's processes. It first records the state of processes commonly used in the deception environment before an RDP-based ransomware attack. After the login monitor detects such an attack, it monitors the system for newly created processes through the Windows API "CreateToolhelp32Snapshot", which takes status snapshots of all processes in real time, including the process identifier (PID). When a suspicious process has started, the monitor recognizes it by the PID and looks up the process's running path with the help of "GetModuleFileNameEx". Finally, the monitor finds the suspicious program's PE file through the path and copies it to the secure host.
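The baseline-versus-snapshot comparison at the heart of the process monitor can be sketched as a dictionary difference. The snapshot representation (a mapping from PID to executable path) and the function name `new_processes` are assumptions for illustration; collecting the snapshots with CreateToolhelp32Snapshot and GetModuleFileNameEx is not shown.

```python
from typing import Dict


def new_processes(baseline: Dict[int, str], current: Dict[int, str]) -> Dict[int, str]:
    """Return {pid: path} for processes absent from the baseline.

    A PID that exists in both snapshots but now maps to a different path is
    also reported, because Windows can reuse PIDs after a process exits.
    """
    return {pid: path for pid, path in current.items() if baseline.get(pid) != path}
```

The returned paths are the candidates whose PE files would be copied to the secure host.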

4.3. At the File Layer

4.3.1. Deception Files. Deception files are constructed with two goals. First, we need to make the attacker believe that the deception environment is a real user's host. Second, we need to make the attacker believe that there are resources in the environment that are worth attacking.

To simulate a realistic environment, we deploy large numbers of different types of files on it, for example, images, audio files, database files, and documents that can be accessed in a Windows session. Based on Amin Kharraz's research [1], we created four file categories that ransomware always tries to find and encrypt: documents (*.txt, *.doc(x), *.ppt(x), *.xls(x), *.pdf, and *.py), keys and licenses (*.key, *.pem, *.crt, and *.cer), file archives (*.zip, *.rar), and media (*.jp(e)g, *.mp3, and *.avi). We obtain these files in three ways. First, we create files with valid headers and content using standard libraries (e.g., python-docx, python-pptx, pdfkit, and OpenSSL). Second, using Google search syntax and crawler technology, we download a large number of files from the Internet. Third, we collect a number of nonconfidential documents from the hosts of 20 volunteers to emulate actual user environments. When we assign user files to the deception environment, the path length is generated randomly. Each folder may have a randomly generated set of subfolders. For each folder, a subset of extensions is randomly selected. Furthermore, each directory name is generated based on meaningful words. Consequently, we generate paths and extensions for user files, giving them variable file depth and meaningful content.
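The randomized layout described above (random depth, random subfolders, a random subset of extensions per folder, and directory names drawn from meaningful words) might be sketched as follows. The word list, the size limits, and the function name `build_decoy_tree` are illustrative assumptions, and the files contain only placeholder bytes, whereas the paper's decoys carry valid headers and content.

```python
import random
from pathlib import Path

# Hypothetical vocabulary and extension pool, chosen only for this sketch.
WORDS = ["finance", "reports", "projects", "photos", "contracts", "backup", "clients", "archive"]
EXTENSIONS = [".txt", ".docx", ".pptx", ".xlsx", ".pdf", ".py", ".key", ".pem", ".zip", ".jpg", ".mp3"]


def build_decoy_tree(root: Path, rng: random.Random, max_depth: int = 3,
                     max_subfolders: int = 3, files_per_folder: int = 4) -> list:
    """Create a randomized folder tree under `root` and return the file paths."""
    created = []

    def populate(folder: Path, depth: int) -> None:
        folder.mkdir(parents=True, exist_ok=True)
        # Each folder only uses a random subset of the extension pool.
        allowed = rng.sample(EXTENSIONS, k=rng.randint(1, 4))
        for index in range(files_per_folder):
            path = folder / f"{rng.choice(WORDS)}_{index}{rng.choice(allowed)}"
            path.write_bytes(b"placeholder")  # real decoys need valid file content
            created.append(path)
        if depth < max_depth:
            for name in rng.sample(WORDS, k=rng.randint(0, max_subfolders)):
                populate(folder / name, depth + 1)

    populate(root, 0)
    return created
```

Passing a seeded `random.Random` makes a given layout reproducible, while a fresh seed per deployment keeps different deception hosts from sharing a recognizable static structure.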

To make the simulated environment more valuable, we deploy bait information on it, such as databases, false code comments, digital certificates, administrator passwords, SSH keys, VPN keys, browser history, passwords, ARP records, and DNS records. When the bait information is obtained by an attacker, it may entice the attacker to attack the deception environment.

4.3.2. The File Monitor. The file monitor detects ransomware by monitoring file type changes and file entropy changes, a method proposed in 2016 [21]. The type of data stored in a file can be described by the order and position of specific byte values unique to that file type. Since files generally retain their file types and formatting in the deception environment, the bulk modification of such files should be considered suspicious. When the monitor sees this type of change, we can infer that a ransomware attack has occurred.

Entropy expresses the randomness of the characters in a string. The higher the entropy value, the stronger the randomness. The Shannon entropy of an array of bytes can be computed as the sum

$$e = \sum_{i=0}^{255} P_{B_i} \log_2 \frac{1}{P_{B_i}} \tag{1}$$

for $P_{B_i} = F_i / \mathit{totalbytes}$ and $F_i$ the number of instances of byte value $i$ in the array. The entropy value is a number from 0 to 8, and a value of 8 represents a byte array whose byte values are completely uniformly distributed. Since the probability of each byte occurring in encrypted ciphertext is basically the same, the entropy value of ciphertext will be close to the upper limit. Because ransomware always encrypts a large number of files, when we detect that a file changes to a high-entropy file within a short period of time and its file type also changes, we assume that the file is subject to a ransomware attack.
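Equation (1) can be computed directly from byte counts. The following is a minimal sketch; the helper `looks_encrypted` and its threshold of 7.5 bits per byte are illustrative assumptions and are not values taken from the paper.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0), as in (1)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # F_i for every byte value i that occurs
    # P * log2(1 / P) with P = F_i / total
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def looks_encrypted(before: bytes, after: bytes, threshold: float = 7.5) -> bool:
    """Hypothetical rule: the content moved from below to above the threshold."""
    return shannon_entropy(before) < threshold <= shannon_entropy(after)


plain = b"quarterly report draft " * 40
uniform = bytes(range(256)) * 4  # stands in for ciphertext: every byte value equally likely
```

A real file monitor would combine such an entropy check with the file type check described above, since compressed archives and media files also have high entropy.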

4.3.3. The Shared Folder Monitor. By traversing the disk storage in real time, the shared folder monitor discovers updates of the shared folders as they happen. It locally obtains information about the attacker’s files, which the attacker often does not notice, thus revealing some unexpected traceable clues. The monitor can access a list of paths to the attacker’s shared folders.

As we originally observed, folders shared through Remote Desktop often have a path in the remote host with the prefix “\\tsclient”. When the monitor traverses the storage and reaches this prefix, it uses “FindFirstFile” to find the first file. It then uses “FindNextFile” with the returned handle to find the next file. When the resulting entry is a folder, the monitor continues to traverse all files under that folder. Initially, the monitor tried to get the full file names and file contents by traversing the new shared folder. However, during the actual experimentation, we found that as the number of files in storage grows, the monitor takes far more time and resources to get all the file contents than to get just the file paths. A full traversal is also more likely to alert the attacker. Therefore, the monitor only obtains the paths of the files that the attacker shares on its host, with the help of “GetFileName”. Moreover, in order to prevent encryption by the ransomware, the mounted disk monitor directly transfers the acquired list of shared file paths to another secure host.
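The paths-only traversal can be sketched as follows. This is an illustration under stated assumptions: it uses Python’s `os.walk` as a portable stand-in for the “FindFirstFile”/“FindNextFile” loop, and the function name `list_shared_paths` is hypothetical. On the deception host, the root would be the redirected drive under the “\\tsclient” prefix.

```python
import os
from typing import List


def list_shared_paths(root: str) -> List[str]:
    """Collect the paths of all files under root without opening any file.

    Only directory entries are read, which keeps the traversal cheap and
    avoids transferring file contents over the redirected drive.
    """
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            paths.append(os.path.join(dirpath, name))
    return sorted(paths)
```

The resulting list is what would be forwarded to the secure host for analysis.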

5. Clue Extraction and Analysis

Through the deception environment, we trap the ransomware attacker and collect a lot of information that may contain many traceable clues. However, such traceable clues are often not visually observable and are complex in nature. In addition, much of the collected information is not helpful in tracing back ransomware attackers. Therefore, to assist in the analysis of the monitor information and to extract the main effective traceable clues, in this section we describe how to extract clues and how to analyze the extracted clues automatically.

5.1. Clue Extraction. We mainly obtain the following kinds of clues from the extraction: remote login information (IP addresses), network traffic, clipboard contents (pictures and texts), shared folder information (path strings), and ransomware samples (compile time and compile paths). The shared folder path clues can be obtained directly from the monitor. However, clipboard clues, compilation clues, and remote host clues are often not visually observable and are complex in nature. As a result, the extraction module mainly focuses on the extraction of remote host clues, clipboard string clues, and compilation clues.

Remote Host Clue Extraction. The IP address, port number, and folder path clues can be obtained directly from the login information and folder paths. In a PCAP capture of the network communication, the TCP packets that exchange configuration information are usually named “ClientData”. We extract the client name field from the “ClientData” packet to obtain the attacker’s hostname. In addition, the KeyboardLayout field indicates the default keyboard layout of the attacking host; e.g., the Chinese Simplified layout number is 0x0004, and the American English keyboard layout number is 0x0409. The remote user’s idiomatic language (the mother tongue) can be found from the keyboard layout, which helps infer the attacker’s nationality.

Clipboard Clue Extraction. The main data formats available to the clipboard monitor are Windows Bitmap, GDI file, ANSI characters, Unicode characters, and WAV audio data. We mainly aim at extracting traceable clues of the character types. The extraction module extracts character clues from the clipboard in various formats by checking the “DataType” value of the GetClipboardData API.

Compilation Clue Extraction. For all the Windows RDP-based ransomware samples that we examined, we empirically observed that the most commonly used format is the PE file, especially *.exe and *.dll. Some PE files contain compilation information, and this information does not change when the program is migrated. As a result, it is a good way to obtain the creator’s information. A PE file mainly consists of five major components: DOS MZ header, DOS stub, PE header, section headers, and section content. Each component contains a great deal of information, but only a small part of it helps identify the creator, and some identifying information must be extracted from the content of each section. The clues in PE files that can be extracted to trace back the attacker include file name, PE file type, compiler version, compilation path, compile time, last modified time, last open time, IP address, URL, domain name, language, string, wide character, and so on. We extract most of the clues with PEView [34]. However, since it cannot directly obtain the compilation path, we use the pefile [35] tool to extract paths by locating them in the PE structure.

Since the extracted clues come in different encoding formats, the extraction system converts all of the obtained data to UTF-8 and saves it in an SQLite database, which facilitates observation and gives the subsequent analysis a unified format. Before being submitted to the analysis system, the extracted clues are divided into two categories: string clues and path clues.

5.2. Automatic Analysis. At this point, the clues extracted by the system mainly include string clues and path clues. The number of string clues is large, and they are mixed with a large number of unidentifiable strings. The number of path clues is also very large, and they have no semantic correlation, so it is difficult to identify traceable clues manually. We therefore focus on how to automate the identification of traceable clues in path clues.

5.2.1. Users and Programs Analysis. In the analysis of the path clues, we first propose to obtain attacker clues by identifying context-related segments on the same path. For instance, each user has a separate user folder, which is located in the “Users” folder under drive C. As a result, the system can obtain the user name on the attacking host from the folder name under the “Users” folder (e.g., C:\Users\Dell). The “Program Files” folder usually contains the names of the software programs installed on the machine (e.g., C:\Program Files\Microsoft Visual Studio 11.0). What is more, the QQ account number is always located in the “QQ\QQfile” folder (e.g., D:\QQ\QQfile\86******086\FileRecv).

In this way, the analysis system can quickly and accurately get host names, email accounts, program names, social software account numbers, and other traceable clues that are carried in the attacker’s file paths. However, such user information makes up only a small part of the overall clues and is acquired by chance. Therefore, we conduct further analysis on this basis.
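The context-related rules above can be sketched in a few lines. This is only an illustration: the function name `extract_path_clues`, the clue keys, and the exact folder names matched are assumptions, and the sample paths in the usage lines are hypothetical.

```python
from pathlib import PureWindowsPath
from typing import Dict


def extract_path_clues(path: str) -> Dict[str, str]:
    """Read the folder that follows a well-known parent folder as a clue."""
    parts = PureWindowsPath(path).parts
    lowered = [p.lower() for p in parts]
    clues = {}
    # Stop one element early so that parts[i + 1] always exists.
    for i, part in enumerate(lowered[:-1]):
        if part == "users":
            clues["user"] = parts[i + 1]
        elif part in ("program files", "program files (x86)"):
            clues["program"] = parts[i + 1]
        elif part == "qqfile":
            clues["qq_account"] = parts[i + 1]
    return clues


user_clue = extract_path_clues(r"C:\Users\Dell\Desktop\note.txt")
program_clue = extract_path_clues(r"C:\Program Files\Microsoft Visual Studio 11.0\VC\bin")
```

`PureWindowsPath` parses Windows paths on any operating system, so the analysis host does not need to run Windows.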

5.2.2. Account Analysis. Through the analysis of APT1 and some other attribution reports, we find that the mapping between the attacker and a physical-world identity can be better obtained by analyzing the accounts left by the attacker. This information includes but is not limited to the location of the IP address, the spelling and registration of the domain name, the IP address corresponding to the URL, and the domain name of the mailbox account. Because it is difficult to identify this information effectively in a large number of string and path clues, the analysis system automatically identifies IP addresses, domain names, URLs, and mailbox accounts by regular expression matching. Then, with the help of threat intelligence and big data technology, more relevant clues are obtained.
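As an illustration of the regular expression matching step, the sketch below extracts IPv4 addresses, mailbox accounts, and URLs from a clue string. The patterns are simplified assumptions written for this example; they are not the expressions used by the analysis system, and the sample text is hypothetical.

```python
import re
from typing import Dict, List

_OCTET = r"(?:25[0-5]|2[0-4]\d|1?\d?\d)"
IPV4_RE = re.compile(r"\b(?:" + _OCTET + r"\.){3}" + _OCTET + r"\b")
EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")
URL_RE = re.compile(r"\bhttps?://[^\s\"'<>]+")


def extract_accounts(text: str) -> Dict[str, List[str]]:
    """Return every IPv4 address, e-mail address, and URL found in text."""
    return {
        "ipv4": IPV4_RE.findall(text),
        "email": EMAIL_RE.findall(text),
        "url": URL_RE.findall(text),
    }


sample = "contact admin@example.org or visit http://example.com/a from 192.168.1.10"
found = extract_accounts(sample)
```

The extracted values would then be looked up in threat intelligence sources, as described above.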

5.2.3. Language Identification. The languages found in the clues often help to determine an attacker’s idiomatic language. Because there are a large number of languages in different countries and some languages are highly similar, we use automatic analysis to identify the language of the clues. We tested the accuracy of two language identification toolkits using entire path information in four different languages, and we found the “langid.py” toolkit to be more accurate overall than the “langdetect” toolkit. The comparison results are shown in Figure 3. langid.py is a language identification toolkit developed by Lui and Baldwin at the University of Melbourne [36]. It combines a naive Bayes classifier with cross-domain feature selection to provide domain-independent language identification.

5.2.4. Traceable Strings Identification. When traceable strings are needed, traditional string analysis methods usually use Named Entity Recognition (NER). However, the clues to be analyzed mainly consist of strings and paths. The strings before and after a path separator have few semantic correlations. What is more, the strings between path separators and the remaining strings to be analyzed are mostly semantically unrelated because of their limited length. This paper therefore proposes an algorithm that can quickly and automatically identify the traceable strings in strings and paths.

In order to filter out meaningful traceable clues related to the attacker’s identity, the path clues and string clues are split into strings and then screened for common words and gibberish in the following steps. Figure 4 shows the process of the automatic traceable clue identification system.

[Figure 3: chart of the numbers of correct and incorrect recognitions by the langid toolkit and the langdetect toolkit for English, Chinese, Russian, and Korean path clues.]

Figure 3: Accuracy of two language identification tools. In tests using the same path clues in known languages, the accuracy of the langid toolkit for English, Chinese, Russian, and Korean was 88%, 100%, 99%, and 99.0%, respectively, while the accuracy of the langdetect toolkit was 73%, 0.85%, 40%, and 4.60%, respectively.

[Figure 4: flowchart with the stages Path Clues, Path Split, String Clues, Sentence Tokenize, Word Tokenize, Stop Words, Sign Split, Alphabetic Case Characteristics, Strings, Common Words Detect, Gibberish Detect, and Result, grouped under Traceable Strings Identification.]

Figure 4: Process of automatically identifying traceable clues.

Make Stop Words. The system splits each path by the path delimiter. The separated path strings that are common to multiple computers have no identifying effect, so we take the file string names that are common to 20 normal user computers as stop words. The system then removes the stop words from the strings after each split.
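The stop-word step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the `min_hosts` parameter are hypothetical, and two toy hosts stand in for the 20 normal user computers.

```python
from collections import Counter
from pathlib import PureWindowsPath
from typing import Iterable, List, Set


def path_segments(path: str) -> List[str]:
    """Split a Windows path into its segments, dropping the drive anchor."""
    p = PureWindowsPath(path)
    return [s for s in p.parts if s != p.anchor]


def build_stop_words(paths_per_host: Iterable[Iterable[str]], min_hosts: int = 2) -> Set[str]:
    """Segments seen on at least min_hosts distinct hosts become stop words."""
    host_counts = Counter()
    for paths in paths_per_host:
        seen = set()
        for path in paths:
            seen.update(s.lower() for s in path_segments(path))
        host_counts.update(seen)  # each host counts a segment at most once
    return {seg for seg, n in host_counts.items() if n >= min_hosts}


def remove_stop_words(path: str, stop_words: Set[str]) -> List[str]:
    """Keep only the segments of path that are not stop words."""
    return [s for s in path_segments(path) if s.lower() not in stop_words]


hosts = [
    [r"C:\Users\alice\Documents\plan.docx"],
    [r"C:\Users\bob\Documents\botnet\notes.txt"],
]
stop_words = build_stop_words(hosts)
remaining = remove_stop_words(r"C:\Users\bob\Documents\botnet\notes.txt", stop_words)
```

Counting each segment once per host, rather than once per path, keeps a folder that is frequent on a single machine from being treated as common.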

Segmentation. After removing the stop words, we separate sentences and words for each segment of the path with a tokenizer (e.g., NLTK, NER) [37]. However, as the segments of a path are not semantically related, this segmentation is less effective. Based on our extensive research on path information, users often prefer to use symbols such as underscores to split file names. In addition, they like to use capitalization in multiword strings to facilitate reading. As a result, the system uses the common separator symbols in file names and the alphabetic case characteristics to segment the strings again.
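The second segmentation pass, by separator symbols and by alphabetic case, can be sketched with two regular expressions. The patterns and the function name `segment` are assumptions made for this illustration; the sketch handles ASCII letters and digits only, so segments in other scripts would need separate treatment.

```python
import re
from typing import List

# Anything that is not an ASCII letter or digit is treated as a separator symbol.
_SPLIT_RE = re.compile(r"[^A-Za-z0-9]+")
# An acronym before a capitalized word, a (capitalized) word, an acronym, or digits.
_CASE_RE = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+")


def segment(name: str) -> List[str]:
    """Split a path segment on symbols first, then on case changes."""
    tokens = []
    for chunk in _SPLIT_RE.split(name):
        tokens.extend(_CASE_RE.findall(chunk))
    return tokens
```

For example, `segment("VirtualDesktop")` yields the two words `Virtual` and `Desktop`, which can then be checked against the dictionary in the next step.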

Common Words Removal. After segmentation, the system obtains a lot of repeated strings. Most of them are commonly used words and do not help in identifying ransomware attackers. As a result, we filter out the common words by comparing each string with a dictionary. If the string can be matched in the dictionary, the system marks the common word property of the string as false; otherwise, it marks the property as true.

Gibberish Detect. Based on our analysis and observation, we found that there is a large amount of garbled information in path clues. Most of these strings are generated randomly. People can recognize this randomness manually, but it is difficult for a program to recognize it automatically. As a result, we propose detecting gibberish by training a Markov chain model with English texts. For each string X, the probability of the (i+1)th character depends only on the ith character:

$$P(X_{i+1} = x \mid X_1 = x_1, X_2 = x_2, \ldots, X_i = x_i) = P(X_{i+1} = x \mid X_i = x_i) \tag{2}$$

Each letter is related to the letters immediately before and after it. This relationship can be expressed by 2-grams. For example, the string “Ransomware” yields the pairs Ra, an, ns, ..., e[space]. The analysis system records how often characters appear next to each other in a collection of gibberish and of commonly used normal phrases and vocabulary. It normalizes the counts after reading through the training data and then measures the probability of generating a string by multiplying the probabilities of the pairs of adjacent characters in the string [38]. From the training data, the system also computes the average transition probability for gibberish strings and for normal strings, respectively. This probability measures how likely the string is according to the data observed by the model. When we test the string “Ransom”, the system computes

$$\mathit{prob}[\text{'Ransom'}] = \mathit{prob}[\text{'R'}][\text{'a'}] \times \mathit{prob}[\text{'a'}][\text{'n'}] \times \cdots \times \mathit{prob}[\text{'m'}][\text{' '}] \tag{3}$$

If the input string is gibberish, it will pass through some pairs with very low counts in the training phase and hence have a low probability. The system then looks at the probability per character for a few known normal strings and a few examples of known gibberish and picks a threshold between the highest probability of the gibberish strings and the lowest probability of the normal strings:

$$\mathit{threshold} = \frac{\min(\mathit{normal\_probs}) + \max(\mathit{gibberish\_probs})}{2} \tag{4}$$

When we analyze a string “X”, if $\mathit{prob}[\text{'X'}] > \mathit{threshold}$, the analysis system views the string as a normal string. If $\mathit{prob}[\text{'X'}] \le \mathit{threshold}$, the analysis system marks the string as “False”. The system then removes the strings marked “False” as gibberish.
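The training and thresholding steps in (2) to (4) can be sketched as follows. This is a toy illustration, not the system’s implementation: the alphabet, the add-one smoothing, the use of the geometric mean as the per-character probability, the one-line training corpus, and all function names are assumptions made for this example.

```python
import math
from typing import Dict, Iterable, List

ALPHABET = "abcdefghijklmnopqrstuvwxyz "
Model = Dict[str, Dict[str, float]]


def normalize(text: str) -> List[str]:
    """Lower-case the text and keep only characters of the model alphabet."""
    return [c for c in text.lower() if c in ALPHABET]


def train(corpus_lines: Iterable[str]) -> Model:
    """Count adjacent character pairs and turn them into log probabilities."""
    # Every pair starts at 1 (add-one smoothing) so unseen pairs keep a small probability.
    counts = {a: {b: 1 for b in ALPHABET} for a in ALPHABET}
    for line in corpus_lines:
        chars = normalize(line)
        for a, b in zip(chars, chars[1:]):
            counts[a][b] += 1
    model = {}
    for a, row in counts.items():
        total = sum(row.values())
        model[a] = {b: math.log(n / total) for b, n in row.items()}
    return model


def avg_transition_prob(text: str, model: Model) -> float:
    """Geometric mean of the pair probabilities, i.e. probability per transition."""
    chars = normalize(text)
    pairs = list(zip(chars, chars[1:]))
    if not pairs:
        return 0.0
    return math.exp(sum(model[a][b] for a, b in pairs) / len(pairs))


def pick_threshold(normal: Iterable[str], gibberish: Iterable[str], model: Model) -> float:
    """Midpoint between the least likely normal and the most likely gibberish string, as in (4)."""
    normal_probs = [avg_transition_prob(s, model) for s in normal]
    gibberish_probs = [avg_transition_prob(s, model) for s in gibberish]
    return (min(normal_probs) + max(gibberish_probs)) / 2


def is_normal(text: str, model: Model, threshold: float) -> bool:
    return avg_transition_prob(text, model) > threshold


model = train(["ransomware attackers encrypt files and demand a ransom"])
threshold = pick_threshold(["ransom", "attack"], ["zxqjvk", "qqxzj"], model)
```

Averaging per transition instead of using the raw product in (3) keeps long strings from being penalized for their length alone; a usable model would be trained on a much larger English corpus.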


Table 1: Traceable strings recognized result [13].

String      Common Words Recognized   Gibberish Recognized   Final Recognized
Wh****ter   True                      True                   True
gandcrab    True                      True                   True
kate z      True                      True                   True
vwrtjty     True                      False                  False
program     False                     True                   False

Identification Result. After the above analysis, the automated analysis system outputs the strings for which both the common word tag and the gibberish tag are True; this is the list of traceable clues [13]. Table 1 is reproduced from Z. H. Wang et al. (2018) [under the Creative Commons Attribution License/public domain].

When the string is a normal word such as “program”, the common words recognition marks it “False”. The gibberish recognition accepts only non-randomly generated identifier strings; it rejects strings that do not conform to the spelling patterns of common words (e.g., vwrtjty). A string is considered to have auxiliary traceability only if both of the analysis values are true. We rely extensively on string inversion to verify the accuracy of the system. We find that most of the traceable strings on the volunteers’ storage can be recognized. Table 1 shows the recognition results of several typical strings.
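The way the two tags combine into the final result can be sketched as follows. The function name `traceable_strings`, the one-word dictionary, and the stand-in gibberish check are hypothetical and serve only to reproduce the logic of Table 1.

```python
from typing import Callable, Iterable, List, Set


def traceable_strings(
    candidates: Iterable[str],
    dictionary: Set[str],
    looks_normal: Callable[[str], bool],
) -> List[str]:
    """Keep the strings whose common word tag and gibberish tag are both True."""
    result = []
    for s in candidates:
        common_tag = s.lower() not in dictionary  # True when s is NOT a dictionary word
        gibberish_tag = looks_normal(s)           # True when s is NOT gibberish
        if common_tag and gibberish_tag:
            result.append(s)
    return result


# Stand-in for the Markov chain check: only "vwrtjty" is treated as gibberish here.
kept = traceable_strings(
    ["gandcrab", "vwrtjty", "program"],
    dictionary={"program"},
    looks_normal=lambda s: s != "vwrtjty",
)
```

With these inputs only “gandcrab” survives, matching the corresponding rows of Table 1.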

6. Evaluation

In this paper, we evaluate the proposed method with two experiments. The goal of the first experiment is to demonstrate that the proposed method can help trace back RDP-based ransomware attackers and capture their private information. The goal of the second experiment is to demonstrate that the method can also automatically recognize clues in ransomware samples and help trace back ransomware makers.

6.1. RDP-Based Ransomware Attacker

6.1.1. Clue Capture and Analysis. We used Windows 7 virtual machines to deploy the deception environment and assess its effectiveness. While Windows 7 is not required, it was chosen because of its wide range of applications and because it is one of the main targets of ransomware. We invited 122 professional volunteers to help with the experiment and provided them with an experimental fleet of 12 virtual hosts. The login information, clipboard content, shared folder paths, and uploaded PE files of most of the volunteers were successfully captured by the monitor system.

We chose the path information collected from 12 volunteers’ computers and recorded the rate of convergence after each step of the analysis. Figure 5 shows how the number of remaining traceable clues changes as the data is processed by each step of the analysis system [13]. Figure 5 is reproduced from Z. H. Wang et al. (2018) [under the Creative Commons Attribution License/public domain].

[Figure 5: chart titled “Analysis Data Change Figure”; x-axis: analysis process (source, stopword, tokenize, gibberish, result); y-axis: data quantity; one series per volunteer (volunteer1 to volunteer12).]

Figure 5: The data volume changes in the 12 volunteers’ traceable clues through the analysis system. The screening rate can reach 98.34% [13].

6.2. Traceback Attackers. Next, we carried out a detailed analysis of the information left over from one of the volunteers’ attacks. From the data produced by the automatic analysis of this attacker, we can infer that the attacker’s real identity may have the following characteristics: (1) the attacker is likely to come from China; (2) the attacker may be security or security-related personnel; (3) the attacker may work in the Chinese Academy of Sciences and have relations with a large security manufacturer in China. The conclusion is shown in Figure 6, which presents the results from the analysis system in five sections.

Username. Through automatic analysis, a total of three user names from the attacker’s host were extracted from a large number of clues. From the “Dell” user name, it can be assumed that the attacker was using a Dell host.

Programs. The automatic analysis system identified several typical software programs installed on the attacker’s host. For instance, “QQ” is a real-time social tool widely used in China. “AliPay” is an online payment tool developed by Alibaba and widely used in China. “Sogou input” is a Chinese input method developed by the Chinese company Sogou. “MeiTu” is a photo beautification program developed by a Chinese company and widely used in China. In addition, “Sinfor” and “360” are both well-known security manufacturers in China and have a large number of security-related products. “Visual Studio” is a common development tool. We find that much Chinese-made software widely used in China was installed on the attacker’s host, as well as some development and security products that are installed less often by ordinary users.

[Figure 6: diagram of the analysis results in five groups: UserName (Admin, Default, DELL), KeyboardLayout (CH-SIMPLIFIED), Programs (AliPay, QQ, SogouInput, MeiTu, Sinfor, 360, VisualStudio), Account (two QQ numbers and two masked e-mail accounts), and TraceableStrings (botnet, **e, **t, A***team, Wh***ter, freebuf, visumantrag).]

Figure 6: Analysis of an attacker using the automatic analysis system, which displays traceable information about the attacker in five aspects: host user name, program software, keyboard layout, account information, and traceable strings.

Keyboard Layout. Through network communication analysis, we find that the attacker’s keyboard layout is Simplified Chinese, from which it can be inferred that the attacker’s native language is probably Chinese. This is consistent with the software installed on the host.

Account Number. The automatic analysis system identified two QQ accounts and two e-mail accounts. The e-mail accounts are Chinese Academy of Sciences business accounts, which supports the above assumption that the attacker is from China and is likely to work in the Chinese Academy of Sciences. Through the registration information of the QQ account (as shown in Figure 7), we can find the following information: (1) the attacker could be a male aged 32; (2) he does Internet-related work, likely with 360 Security; (3) he is located in Haidian District, Beijing, China. In the registration information of the “353***208” account, the mail, work, and other fields are empty, and the nickname is a special English string, “Wh***ter”; it can be presumed that this account is likely for private use.

Traceable Strings. The automatic analysis system sifts out the strings from the monitors that may be traceable. “Freebuf” is a security information exchange website commonly used by Chinese security personnel. “Botnet” is a secondary attack method that is often used by attackers and is often a main research direction of security researchers. “**e” and “**t” are acronyms for departments under the Chinese Academy of Sciences, while “A***team” is a security research team in the “**e” department. “Visumantrag” is often used when processing visas from various countries. “Wh***ter” matches the nickname of the QQ account “353***208” and is likely to be the attacker’s regular nickname.

Figure 7: The registration infographic of QQ account 372***82.

6.3. Ransomware Maker Automatic Analysis. In order to verify that the analysis system can be used to trace back ransomware makers, we obtained more than 8000 ransomware samples from VirusTotal [39] and extracted the Program Data Base (PDB) path from the samples (as shown in Figure 8). The result shows that although a large number of attackers deliberately erase the compilation information, legacy PDB path information can still be found in a large number of ransomware samples.

[Figure 8: bar chart of the number of samples: ransomware with a PDB path (878) and ransomware without a PDB path (7197).]

Figure 8: Of the 8075 real ransomware samples, the Program Data Base (PDB) path could be extracted from 11%, while the authors of the remaining 89% may have chosen to hide the PDB path during compilation.

Table 2: Same identifier for different samples [13].

Sample Hash                          Same Identity Strings
68dd613973f8a***7befe0f4b461d258     jqxxpphoaonkje7z2fa
ce569b9020079***9ae0f090266a60cc     jqxxpphoaonkje7z2fa
0c084aef41969***1f427a1b65e1b146     jqxxpphoaonkje7z2fa
4fa82d297e712***e1fdf312c044814a     jqxxpphoaonkje7z2fa
36da0286d42cd***306fba99239c8238     jqxxpphoaonkje7z2fa
821beaf40744be***ba862dbc6d90f70c    jqxxpphoaonkje7z2fa
bc839bcb88***7422bbd2fa812dd3        jqxxpphoaonkje7z2fa
de02391b6a9f1***0762a6f49cf32501     jqxxpphoaonkje7z2fa

What is more, the analysis system is used to automatically analyze the PDB paths of more than 800 different ransomware samples. We found identity strings that recur in the analysis results of different samples. One of the typical results is shown in Table 2 [13]. Table 2 is reproduced from Z. H. Wang et al. (2018) [under the Creative Commons Attribution License/public domain]. We used VirusTotal to validate the samples and found that they were all Cobra family ransomware. As a result, it could be assumed that these samples were made by the same ransomware maker. When one sample is traced back, the conclusion about the ransomware maker also applies to the other samples. When different traceable information is obtained from different samples, the identity of the ransomware maker can be described by integrating the information.

Language Identification. When we carry out language recognition on the PDB paths of 878 samples, the result is as shown in Figure 9. The system automatically identifies the languages of the paths, which mainly contain 11 languages, of which English accounts for the largest proportion.

Actual Case Analysis. The automatic identification system helped us identify the traceable strings in the PDB path. Take the example of a sample PDB path (the Chinese has been translated into English) with an MD5 value of “60c6a92afb***6d0c16f7”:

“ELIAOSTUDIOUACsimulated an improved version of the 360 anti-virus program 1117 Bale1*41**2****0(**1) 9818 programWriteSystem32ReleaseVirtualDesktoppdb”

[Figure 9: chart of the share of each identified language, labeled with language codes (zh, en, nl, nn, sl, nb, de, it, hu, mt, pl, fi, et, da, es).]

Figure 9: Of the 878 ransomware sample PDB paths, English accounted for 510 (58.09%), and the remaining identified languages were Chinese 31 (3.53%), Dutch 40 (4.56%), Norwegian Nynorsk 4 (0.46%), Slovenian 5 (0.57%), Norwegian Bokmal 1 (0.11%), German 58 (6.61%), Italian 3 (0.34%), Hungarian 35 (3.99%), Maltese 18 (2.05%), Polish 5 (0.57%), Finnish 151 (17.20%), Luxembourgish 3 (0.34%), Danish 10 (1.14%), and Spanish 4 (0.46%).

We can infer from the automatic analysis result that the ransomware maker is likely from China. The following evidence supports this inference.

(1) Language Identification. Identifying the language of the path suggests that the attacker may have come from China. By understanding the Chinese strings in the path, we find that the text describes an improvement made to bypass the detection of the 360 security software, which has a large market in China.

(2) IP Address. The system identified an IP address in the PDB path. With the help of a threat intelligence database, we found that the IP address came from Xinjiang, China (Figure 10).


Figure 10: Partial infographic of IP 1*41**2****0 in the threat intelligence library.

(3) Traceable Strings. The extracted identifier string “LIAO” resembles Chinese pinyin and is most likely a Chinese surname. To sum up, we speculate that the ransomware maker is most likely from China.

7. Conclusion

In this paper, we focus on tracing back RDP-based ransomware. Recently, more and more ransomware attackers have been using RDP attacks to spread ransomware with impunity due to the use of strong cryptography. We propose tracing back the attack sources to deter RDP-based ransomware with the proposed deception environment. It collects traceable clues and performs automatic analysis by using natural language processing and machine learning techniques. The evaluation shows that it is able to trap attackers and collect the traceable clues they leave in a deception environment. With automatic clue identification, it can reduce the amount of traceable clues to about 2% of the collected data. We use this method to analyze two practical cases and show its effectiveness. By tracing back ransomware attackers, we can provide a strong deterrent to stifle the development of ransomware.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Additional Points

This manuscript provides additional deception environment monitors and analytical methods by using more volunteers and samples. It proves the validity of the methods through specific traceback cases (i.e., tracing back the ransomware attacker and tracing back the ransomware maker).

Disclosure

An earlier version of this paper was presented at the IEEE Third International Conference on Data Science in Cyberspace, Guangzhou, China, 18-21 June 2018. The authors’ initial conference paper focused mainly on the effectiveness of the deception environment monitors and the screening rate of the analysis system.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors would like to thank VirusTotal for providing ransomware samples and the volunteers from the University of Chinese Academy of Sciences. This work is supported in part by the National Key Research and Development Plan (Grant no. 2018YFB0803504) and the National Natural Science Foundation of China (61871140, 61572153, U1636215, 61572492, and 61672020).

References

[1] A Kharraz ldquoUNVEIL A Large-Scale Automated Approachto Detecting Ransomwarerdquo in Proceedings of the 25th USENIXSecurity Symposium (USENIX Security 16) pp 757ndash772USENIX Association 2016

[2] K. Savage, P. Coogan, and H. Lau, "The evolution of ransomware," Tech. Rep., 2015.

[3] A. Kharraz, W. Robertson, D. Balzarotti, L. Bilge, and E. Kirda, "Cutting the Gordian Knot: A Look Under the Hood of Ransomware Attacks," in Detection of Intrusions and Malware, and Vulnerability Assessment, vol. 9148 of Lecture Notes in Computer Science, pp. 3-24, Springer International Publishing, Cham, 2015.

[4] S. Mohurle and M. Patil, "A brief study of Wannacry Threat: Ransomware Attack 2017," International Journal of Advanced Research in Computer Science, vol. 8, no. 5, 2017.

[5] X. Yu, Z. Tian, J. Qiu, and F. Jiang, "A Data Leakage Prevention Method Based on the Reduction of Confidential and Context Terms for Smart Mobile Devices," Wireless Communications and Mobile Computing, vol. 2018, Article ID 5823439, 11 pages, 2018.

[6] J. Yaneza, "Brute Force RDP Attacks Plant CRYSIS Ransomware," https://blog.trendmicro.com/trendlabs-security-intelligence/brute-force-rdp-attacks-plant-crysis-ransomware.

[7] "10% of Ransomware Attacks on SMBs Targeted IoT Devices," https://www.darkreading.com/application-security/10--of-ransomware-attacks-on-smbs-targeted-iot-devices-/d/d-id/1329817.

[8] Q. Tan, Y. Gao, J. Shi, X. Wang, B. Fang, and Z. H. Tian, "Towards a Comprehensive Insight into the Eclipse Attacks of Tor Hidden Services," IEEE Internet of Things Journal, 2018.

[9] "Kaspersky Security Bulletin: STORY OF THE YEAR 2017," 2017, https://media.kasperskycontenthub.com/wp-content/uploads/sites/43/2018/03/07164824/KSB_Story_of_the_Year_Ransomware_FINAL_eng.pdf.

[10] Z. Tian, Y. Cui, L. An et al., "A Real-Time Correlation of Host-Level Events in Cyber Range Service for Smart Campus," IEEE Access, vol. 6, pp. 35355-35364, 2018.

[11] R. Ross et al., Managing Information Security Risk: Organisation, Mission, and Information System View, National Institute of Standards and Technology, 2011, http://csrc.nist.gov/publications/nistpubs/800-39/SP800-39-final.pdf.

[12] Y. Wang, Z. Tian, H. Zhang, S. Su, and W. Shi, "A Privacy Preserving Scheme for Nearest Neighbor Query," Sensors, vol. 18, no. 8, p. 2440, 2018.

[13] Z. H. Wang, X. Wu, C. G. Liu, Q. X. Liu, and J. L. Zhang, "RansomTracer: Exploiting Cyber Deception for Ransomware Tracing," in Proceedings of the IEEE Third International Conference on Data Science in Cyberspace, pp. 227-234, 2018.

[14] B. A. S. Al-rimy, M. A. Maarof, and S. Z. M. Shaid, "Ransomware threat success factors, taxonomy, and countermeasures: A survey and research directions," Computers & Security, vol. 74, pp. 144-166, 2018.

[15] F. Jiang, Y. Fu, B. B. Gupta et al., "Deep Learning based Multi-channel intelligent attack detection for Data Security," IEEE Transactions on Sustainable Computing, 2018.

[16] N. Andronio, S. Zanero, and F. Maggi, "HelDroid: dissecting and detecting mobile ransomware," in Research in Attacks, Intrusions, and Defenses, vol. 9404 of Lecture Notes in Computer Science, pp. 382-404, Springer, 2015.

[17] F. Mercaldo, V. Nardone, A. Santone, and C. A. Visaggio, "Ransomware Steals Your Phone. Formal Methods Rescue It," in Formal Techniques for Distributed Objects, Components, and Systems: 36th IFIP WG 6.1 International Conference, FORTE 2016, held as part of the 11th International Federated Conference on Distributed Computing Techniques, DisCoTec 2016, E. Albert and I. Lanese, Eds., pp. 212-221, Springer International Publishing, 2016.

[18] D. Sgandurra, L. Muñoz-González, R. Mohsen, and E. C. Lupu, "Automated Dynamic Analysis of Ransomware: Benefits, Limitations and use for Detection," https://arxiv.org/abs/1609.03020.

[19] S. Song, B. Kim, and S. Lee, "The Effective Ransomware Prevention Technique Using Process Monitoring on Android Platform," Mobile Information Systems, vol. 2016, Article ID 2946735, 9 pages, 2016.

[20] T. Yang, Y. Yang, K. Qian, D. C.-T. Lo, Y. Qian, and L. Tao, "Automated detection and analysis for android ransomware," in Proceedings of the 17th IEEE International Conference on High Performance Computing and Communications, IEEE 7th International Symposium on Cyberspace Safety and Security, and IEEE 12th International Conference on Embedded Software and Systems, HPCC-ICESS-CSS 2015, pp. 1338-1343, USA, August 2015.

[21] N. Scaife, H. Carter, P. Traynor, and K. R. B. Butler, "CryptoLock (and Drop It): Stopping Ransomware Attacks on User Data," in Proceedings of the 36th IEEE International Conference on Distributed Computing Systems, ICDCS 2016, pp. 303-312, Japan, June 2016.

[22] M. M. Ahmadian and H. R. Shahriari, "2entFOX: A framework for high survivable ransomwares detection," in Proceedings of the 13th International ISC Conference on Information Security and Cryptology, ISCISC 2016, pp. 79-84, Iran, September 2016.

[23] M. M. Ahmadian, H. R. Shahriari, and S. M. Ghaffarian, "Connection-monitor & connection-breaker: A novel approach for prevention and detection of high survivable ransomwares," in Proceedings of the 12th International ISC Conference on Information Security and Cryptology, ISCISC 2015, pp. 79-84, Iran, September 2015.

[24] D. Kim, W. Soh, and S. Kim, "Design of Quantification Model for Prevent of Cryptolocker," Indian Journal of Science and Technology, vol. 8, no. 19, 2015.

[25] C. Moore, "Detecting Ransomware with Honeypot Techniques," in Proceedings of the 2016 Cybersecurity and Cyberforensics Conference (CCC), pp. 77-81, Amman, Jordan, August 2016.

[26] F. Mbol, J. Robert, and A. Sadighian, "An Efficient Approach to Detect TorrentLocker Ransomware in Computer Systems," in Cryptology and Network Security, S. Foresti and G. Persiano, Eds., vol. 10052 of Lecture Notes in Computer Science, pp. 532-541, Springer International Publishing, Cham, 2016.

[27] K. Cabaj, P. Gawkowski, K. Grochowski, and D. Osojca, "Network activity analysis of CryptoWall ransomware," Przegląd Elektrotechniczny, vol. 91, no. 11, pp. 201-204, 2015.

[28] J. Chen, Z. Tian, X. Cui, L. Yin, and X. Wang, "Trust Architecture and Reputation Evaluation for Internet of Things," Journal of Ambient Intelligence & Humanized Computing, vol. 2, pp. 1-9, 2018.

[29] C. Le Guernic and A. Legay, "Ransomware and the Legacy Crypto API. Paper presented at the Risks and," in Risks and Security of Internet and Systems: 11th International Conference, CRiSIS 2016, Roscoff, France, 2017.

[30] L. Pingree, Emerging Technology Analysis: Deception Techniques and Technologies Create Security Technology Business Opportunities, Gartner, 2015.

[31] "Ransomware and Businesses," https://www.symantec.com/content/dam/symantec/docs/security-center/white-papers/ransomware-and-businesses-16-en.pdf, 2016.

[32] A. Kharraz and E. Kirda, "Redemption: Real-Time Protection Against Ransomware at End-Hosts," in Proceedings of the International Symposium on Research in Attacks, Intrusions, and Defenses, pp. 98-119, 2017.

[33] F. Bergstrand, J. Bergstrand, and H. Gunnarsson, Localization of Spyware in Windows Environments.

[34] "PEView," https://www.aldeid.com/wiki/PEView.

[35] "python-pefile," https://pypi.python.org/pypi/pefile.

[36] M. Lui and T. Baldwin, "langid.py: An off-the-shelf language identification tool," in Proceedings of the ACL 2012 System Demonstrations, Association for Computational Linguistics, pp. 25-30, 2012.

[37] S. Bird and E. Loper, "NLTK: the natural language toolkit," in Proceedings of the ACL 2004 on Interactive poster and demonstration sessions, Association for Computational Linguistics, Barcelona, Spain, July 2004.

[38] "Rrenaud, Gibberish-Detector," https://github.com/rrenaud/Gibberish-Detector/blob/master/README.rst.

[39] "VirusTotal," http://virustotal.com.



2 Wireless Communications and Mobile Computing

patterns, these traditional blocking-based defense systems become much less effective for these targeted attacks.

To address this issue, we propose to utilize advanced defense schemes to protect important hosts under targeted ransomware attacks. In this paper, we utilize cyber deception technology to protect critical systems through attack guidance, drawing attackers away from the protected systems. While cyber deception technology helps protect important targets (for example, when dealing with Advanced Persistent Threats (APTs) [11, 12]), it cannot by itself trace back attack sources. We therefore further design specific techniques to trace back RDP-based ransomware attacks and identify the original attack sources, which serves as the main deterrent against ransomware attackers.

Our deception environment simulates an actual user system in three layers, with multiple monitors that observe key system operations related to login, network communication, the clipboard, processes, shared folders, and the file system. It collects traceable clues and helps us detect RDP-based ransomware attacks. Because traditional traceback methods usually require security experts to manually analyze a large number of collected clues, it is difficult for them to achieve fast responses. Therefore, we develop an automatic analysis system that processes traceable clues using natural language processing and machine learning techniques.

To evaluate our system, we invited 122 volunteers to take part in a simulated RDP-based ransomware attack. The proposed system was able to capture traceable clues through the deception environment and to analyze them automatically and effectively. The convergence rate of the analysis system reaches about 98%. Moreover, we demonstrated that it helps us trace back RDP-based attack sources in practical applications.

In summary, this paper makes the following contributions:

(i) We propose a systematic method to deter RDP-based ransomware by identifying attackers. The method traps ransomware attackers in a cyber deception environment and uses an automatic analysis system to obtain traceable clues and identify attack sources.

(ii) We build a deception environment to trap RDP-based ransomware attackers by simulating a user environment in three layers: a network layer, a host layer, and a file system layer. The environment helps us discover attacker behaviors and collects attacker-related information.

(iii) We develop an automatic analysis system that uses natural language processing and machine learning techniques to automatically recognize effective clues for tracing back ransomware attack sources.

(iv) We design two practical experiments, on RDP-based ransomware attackers and on ransomware makers, and demonstrate the feasibility of the proposed system.

The remainder of the paper is structured as follows. In Section 2, we briefly present background and related work. In Section 3, we describe the methodology of our systematic method. In Section 4, we present the implementation of our deception environment prototype. In Section 5, we describe the details of the clue analysis system. In Section 6, we discuss the evaluation setup and results. We conclude this paper in Section 7.

2. Background and Related Work

2.1. Related Work on Ransomware Defense. Ransomware is a type of malware that manipulates a user's system to extort money. It operates in many different ways, e.g., simply locking a user's desktop or encrypting an entire file system. Recent rampant ransomware attacks have called for effective ransomware defense solutions, and several solutions have been proposed in studies on ransomware countermeasures [14, 15].

Some of these solutions are designed to deal with all types of ransomware [1, 16-20]. For example, Kharraz et al. presented a dynamic analysis system called UNVEIL. The system analyzes and detects ransomware attacks by modeling ransomware behaviors. It focuses on the observation of three elements, namely, I/O data buffer entropy, access patterns, and file system activities [1]. Other solutions are type-specific and deal with only one type of ransomware, such as crypto-ransomware [21-25]. For example, Scaife et al. presented an early-warning detection system that alerts users during suspicious file activities [21]. Utilizing a set of behavior indicators, the detection system can halt a process that appears to tamper with a large amount of user data. Furthermore, it is claimed that the system can stop a ransomware execution with a median loss of only 10 files. Similarly, some studies tackle the detection of specific ransomware families only [26-28]. For example, Maltester is a family-specific technique proposed by Cabaj et al. to detect CryptoWall infections [27]. It employs dynamic analysis along with honeypot technology to analyze the network behavior and detect the infection chain.

These solutions can be categorized into prevention and detection. However, these two kinds of countermeasures have the following disadvantages. First, many prevention measures require services to be disabled, which is likely to affect service functionality. For example, Prakash suggested several prevention measures, including disabling macros in office documents and restricting access permissions on the "Temp" and "Appdata" folders [29]. Second, a detection system often finds it difficult to conceal itself and perform its functions against ransomware attacks that precisely aim at specific hosts, e.g., using the common Remote Desktop Protocol (RDP). Finally, while these countermeasures can be used to detect or block specific ransomware attacks, they cannot fundamentally inhibit the spread of ransomware. Traceability technology, in contrast, can fundamentally inhibit the spread of ransomware by tracing attacks back to their sources.

2.2. RDP-Based Ransomware Attacks. Traditional ransomware spreads randomly and on a large scale across the Internet in executable files, development kits, macro files, and other malicious programs, using various dissemination methods including phishing emails, watering hole attacks, vulnerability attacks, server intrusion, and supply chain pollution. These methods use different ways to trick a victim into launching such programs. Among them, phishing email is the most widely used. However, according to Kaspersky's 2017 ransomware report, the number of targeted ransomware attacks based on RDP is growing rapidly.

Figure 1: Data collection and analysis process of the whole prototype: (1) ensnare the attacker in the deception environment, (2) monitor until ransomware is detected (yes/no), (3) extraction, (4) analysis, (5) result.

Recently, more and more ransomware criminals have gained access to hosts through RDP services and then installed ransomware manually. These attackers use brute force to acquire usernames and passwords on a target machine with an active RDP service [9]. For instance, Crysis, one of the typical families and a copycat of Locky, not only aims at common businesses but also targets healthcare service providers [9]. Crysis gains admin-level privileges by stealing passwords and credentials. In addition, during an RDP session, the attacker uses both the clipboard and shared folders to upload files to the remote host and can then install ransomware manually.

2.3. Cyber Deception Technology. Because many critical systems are well known and always on, it is difficult to protect them from potential network attacks:

(i) Attackers can use zero-day vulnerabilities, highly antagonistic malicious code, or other resources to break the defense system.

(ii) Because humans are always the weakest link in defense systems, attackers can use social engineering to identify system weaknesses and penetrate the defense.

(iii) Attackers can repeatedly explore the potential vulnerabilities on a target system to identify its weaknesses.

Moreover, when an attacker aims at a specific target, e.g., by exploiting its RDP service, traditional passive defense methods are usually less effective. Therefore, we need advanced active solutions, such as cyber deception, to deal with such attacks, which have fewer observable features.

The earliest form of cyber deception technology is the honeypot. A honeypot detects attacks by deploying, in the service network, a series of systems or resources that carry no real business; any access to such a trap indicates an attack. However, a honeypot system generally waits for attacks passively and does not mislead or confuse attackers. Moreover, because it carries no real business and is not highly interactive, it is easy for attackers to identify. Compared with the traditional honeypot system, a cyber deception system can be deployed more conveniently, its environment is more realistic, and it can be linked with existing defense products. It can provide more effective defenses against APT attacks, ransomware attacks, intranet attacks, and other threats. A Gartner report in 2015 [30] pointed out the market prospects of deception-based security defense technology and predicted that 10% of organizations would use deception tools (or tactics) to counter cyber-attacks by 2018. Compared with the traditional passive defense approach, cyber deception technology is an active defense approach and can be applied to all stages of network attacks. We can use this technology to trap RDP-based attackers, detect targeted attacks, and deter ransomware attackers by precisely identifying them.

Trap RDP-Based Ransomware Attackers. A targeted ransomware attack generally has three steps: detection, infiltration, and execution [31]. However, traditional security solutions are unable to cope with the internal translation phase. In addition, traditional honeypot technology (often used to fight network attacks) generally does not focus on tracing back to attackers. In contrast, cyber deception technology can lure the attacker into a surveillance environment and consume his time and energy with bait information.

Detect RDP-Based Ransomware Attacks. Once the attacker obtains the correct username and password combination, he usually returns multiple times within a short period to try to infect the compromised host [6]. In one particular case, Crysis was deployed six times on an endpoint within a span of 10 minutes. As a result, by monitoring the cyber deception environment, we can detect RDP-based ransomware attacks in time and determine the attacker's behavior through the environment monitor.

Deter the Ransomware Attacker. Deterring ransomware attackers can be approached in two ways. First, if an attacker realizes that he is entrapped, this in itself becomes a deterrent. Second, if the attacker is exposed to the deception environment and remains within view of the defense surveillance, the monitor can collect traceable clues that the attacker accidentally releases (e.g., IP addresses, paths, and nickname strings). Exposing these clues, which attackers intend to keep hidden, can be a powerful deterrent to other attackers.

3. Methodology

In this section, we describe our method of tracing back RDP-based ransomware attackers. Figure 1 summarizes the data collection and analysis process of the entire prototype. First, we implement a deception environment to trap attackers. Second, we monitor RDP-based ransomware attacks and collect information when they occur. Third, we extract effective clues from the monitor information. Fourth, we use automatic analysis to screen a large number of clues for tracing back the attacker. Finally, we generate a report to trace back the RDP-based ransomware attacker. We refer readers to Sections 4 and 5 for the detailed implementation of this prototype.

3.1. Deception Environment. Generally, the ransomware attack execution stage has two steps: login and spread [31]. Building a deception environment is nontrivial in practice because it must make the ransomware attacker believe that it belongs to a real user and that the user data is worth attacking. Because advanced attackers often probe for the static features of certain analysis systems before they launch attacks [32], an intuitive approach to counter such reconnaissance is to build the user environment in such a way that the user data is valid, real, and nondeterministic. In addition, the environment serves as an "enticing target" to encourage ransomware attackers. We elaborate on how to generate an artificial, realistic, and enticing user environment for RDP-based ransomware in Section 4.

RDP-based attackers commonly upload malicious programs in the following ways before spreading ransomware: (1) the attacker downloads malicious programs from the Internet; (2) programs are transferred through FTP, SCP, or other transport protocols; (3) programs are uploaded through the clipboard; (4) programs are uploaded through a shared folder. The clipboard and shared folders are most commonly used to transfer programs in RDP-based ransomware attacks because they are simple and convenient. However, both are easy for our proposed system to monitor.

3.2. Environment Monitor. After logging in to the environment, the attacker typically uses a shared folder and the clipboard on the remote PC to transfer ransomware programs from the attacker's machine. To collect as much information about the attacker as possible while avoiding the attacker's observation, this paper proposes three monitor layers: the network layer, the host layer, and the file layer. We elaborate on how to configure the monitor system for the deception environment in Section 4.

The Network Layer Monitor. The network layer monitor detects a remote connection and collects information including the remote IP addresses, remote ports, status codes of ports, keyboard layout, and so on. When the RDP-based attacker logs in to the host, the monitor can obtain information and detect the attack without the attacker's knowledge.

The Host Layer Monitor. We propose to detect changes in, for example, processes and the clipboard by monitoring the host layer. The host layer monitor can gather information about the attacker's behavior and use of these system applications in the deception environment. For instance, because the clipboard resides in system-level heap space, any application in the system has access to it. RDP-based ransomware attackers often take advantage of the clipboard to pass data between applications. Moreover, because Windows shares the clipboard by default during an RDP session, the monitor may also capture clues that attackers leave when using the clipboard on their local machines.

The File Layer Monitor. By monitoring the file layer, we can identify ransomware attacks through file changes. Furthermore, the monitor can gather traceable information about the attacker's local machine by monitoring files in the shared folder. For instance, a shared folder on the remote PC is often used to transfer ransomware from the attacker's machine during the RDP session. In addition, for a more convenient and quicker attack, the attacker often mounts an entire local disk on the remote computer. As a result, by monitoring shared folders, we can detect newly added shared folders in real time and automatically capture a large amount of path information.

3.3. Clue Extraction. The environment monitors gather a large amount of information left by attackers, such as login information, communication information, clipboard content, folder paths, and portable executable (PE) files. Many traceable clues can be extracted from this information, including but not restricted to the IP address, keyboard layout, compile path, and file path. In order to analyze these clues quickly, we divide them into two categories: string clues and path clues. These clues are then submitted to the automatic analysis system. We elaborate on the types of clues that the proposed system can extract in Section 5.1.

3.4. Automatic Analysis. According to our investigation, current traceback tools mostly analyze clues manually. However, we usually have to deal with a large number of clues with no semantic correlation. Because such manual traceback analysis usually takes a lot of time and effort, we propose an automatic analysis system, and we elaborate on how to analyze clues automatically in Section 5.2.

4. Implementation of the Deception Environment

As the Windows platform is the main target of ransomware, we choose Windows for the proof-of-concept implementation. In this section, we describe the implementation details of a Windows-based deception environment prototype. We elaborate on how the deception environment traps ransomware attackers and on how the monitors detect the RDP-based attack and collect traceable clues. The entire system implementation process is shown in Figure 2.

4.1. At the Network Layer

4.1.1. The Login Monitor. The login monitor is used to detect attacks in real time and collect the attacker's login information. On the Windows platform, Win32 is an environment subsystem that provides an API for operating system services and functions to control all user inputs and outputs. The login monitor relies on Windows APIs to gain access to the system and runs with the privileges needed to access the relevant areas of memory. It uses Winsock 2.0 to access the network; Winsock 2.0 also supports protocols other than the TCP/IP suite. The login monitor takes network requests and sends those requests to the Winsock 2.0 SPI (Service Provider Interface) by calling the main Winsock 2.0 file, Ws2_32.dll, which provides access to transport service providers and namespace providers. The IP Helper API makes it possible to get and modify network configuration settings for the local host. It is implemented in the DLL file iphlpapi.dll and includes functions that can retrieve information about protocols such as TCP and UDP [33]. As a result, the login monitor can directly access the data buffers involved in transmission control protocols, routing tables, network interfaces, and network protocol statistics.

Figure 2: RDP-based ransomware attack traceback system process. (1) Construct the cyber deception environment. (2) Environment monitor: network layer (login monitor, communication monitor), host layer (clipboard monitor, process monitor), and file layer (shared folder monitor, file monitor), with ransomware detection. (3) Clue extraction: login clues, remote host clues, clipboard clues, compile clues, and path clues. (4) Automated analysis: users and programs analysis, network address analysis, language identification, traceable strings identification, and auxiliary traceable clues. (5) Result.

4.1.2. The Communication Monitor. The communication monitor is responsible for capturing network traffic. By locating the TCP packets that carry the interactive configuration information of an RDP login, it obtains the RDP connection information, which reveals personal information about the attacker. The target packet has the following main characteristics: (1) its name is often ClientData; (2) it usually appears after the TCP three-way handshake; (3) it is positioned near the front of the session; (4) its amount of data is significantly larger than that of the surrounding packets.
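The packet characteristics listed above can be turned into a simple selection heuristic. The following is a minimal sketch, not the authors' implementation: it assumes packets have already been parsed into a hypothetical `Packet` record, and the `window` and `min_payload` values are illustrative guesses rather than values taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    """Hypothetical, already-parsed view of one TCP segment."""
    from_client: bool
    syn: bool = False
    ack: bool = False
    payload_len: int = 0


def find_client_data_candidate(packets, window=10, min_payload=200):
    """Return the index of the likely ClientData packet, or None.

    Heuristic: find the end of the TCP three-way handshake, then look at
    the first `window` packets after it and pick the client packet with
    the largest payload of at least `min_payload` bytes.
    """
    start = None
    for i in range(len(packets) - 2):
        a, b, c = packets[i:i + 3]
        if (a.from_client and a.syn and not a.ack
                and not b.from_client and b.syn and b.ack
                and c.from_client and c.ack and not c.syn):
            start = i + 3  # first packet after SYN, SYN+ACK, ACK
            break
    if start is None:
        return None
    early = [
        (i, p) for i, p in enumerate(packets[start:start + window], start)
        if p.from_client and p.payload_len >= min_payload
    ]
    if not early:
        return None
    return max(early, key=lambda item: item[1].payload_len)[0]
```

In a real deployment the selected packet would still have to be decoded to read fields such as the keyboard layout; this sketch only narrows down which packet to decode.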

4.2. At the Host Layer

4.2.1. The Deception Host. According to our observation, the main methods used by attackers to log in to a remote host are (1) direct login with a weak password and (2) login with an access account added through a vulnerability. As a result, we deliberately protect the administrator account of the deception environment with a weak password and leave common vulnerabilities, such as EternalBlue, in the environment to attract attackers. To guide attackers to upload ransomware using only the clipboard and shared folders, we block traffic transfers from outside the environment and close common transfer ports (e.g., ports 20, 21, 80, and 443).

4.2.2. The Clipboard Monitor. The clipboard monitor obtains clues in real time by monitoring changes to the clipboard. It uses the Clipboard Viewer mechanism, which can get and display the contents of the clipboard, to listen for clipboard changes without affecting the contents. As Windows applications are message-driven, the key to the monitor is responding to and processing clipboard change messages. When the content changes, the system sends the WM_DRAWCLIPBOARD message to the first window of the clipboard viewer chain. After each clipboard viewer window responds to and processes the message, it must pass the message on to the next window in the chain, using the handle of the next window that it saved. The clipboard monitor obtains the clipboard's new contents through its window by calling the "GetClipboardData" Windows API. When a subsequent copy or cut operation is executed, the data in the clipboard is overwritten. The clipboard monitor therefore listens in real time and writes the information to a log file, which is updated whenever the monitor receives a clipboard change notification. To prevent the log file from being detected by the attacker or encrypted by ransomware, the monitor sends each update to a secure host and completely erases the file from the environment.

4.2.3. The Process Monitor. To run a program on a Windows system, a new process must be created. The process monitor obtains the PE file run by the attacker by monitoring the processes in the environment. It first records the state of the processes commonly running in the deception environment before an RDP-based ransomware attack. After the login monitor detects such an attack, it watches for newly created processes using the Windows API "CreateToolhelp32Snapshot", which takes status snapshots of all processes in real time, including the process identifier (PID) of each. When a suspicious process starts, the monitor recognizes it by its PID and looks up the process's running path with the help of "GetModuleFileNameEx". Finally, the monitor finds the suspicious program's PE file through the path and copies it to the secure host.
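The baseline-and-diff logic of the process monitor can be illustrated independently of the Windows APIs. The sketch below is an assumption-laden simplification: each snapshot is supplied as a plain dictionary from PID to executable path (on Windows it would be filled from "CreateToolhelp32Snapshot" and "GetModuleFileNameEx"), the helper names `new_processes` and `collect_samples` are hypothetical, and a local directory stands in for the secure host.

```python
import os
import shutil


def new_processes(baseline, current):
    """Return the pid -> path entries of `current` that were not in `baseline`.

    A PID that now maps to a different executable is also reported,
    since PIDs can be reused after a process exits.
    """
    return {pid: path for pid, path in current.items()
            if baseline.get(pid) != path}


def collect_samples(suspicious, secure_dir):
    """Copy the executable of each suspicious process into `secure_dir`.

    `secure_dir` is a stand-in for the secure host used in the paper.
    Returns the list of destination paths that were written.
    """
    os.makedirs(secure_dir, exist_ok=True)
    copied = []
    for pid, path in sorted(suspicious.items()):
        if os.path.isfile(path):
            dest = os.path.join(secure_dir, f"{pid}_{os.path.basename(path)}")
            shutil.copy2(path, dest)
            copied.append(dest)
    return copied
```

Prefixing each copy with the PID keeps samples apart when two suspicious processes share the same file name.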

4.3. At the File Layer

4.3.1. Deception Files. Deception files are constructed with two goals. First, we need to make the attacker believe that the deception environment is a real user's host. Second, we need to make the attacker believe that there are resources in the environment that are worth attacking.

To simulate a realistic environment, we deploy large numbers of different types of files on it, for example, images, audio files, database files, and documents that can be accessed in a Windows session. Based on Amin Kharraz's research [1], we created four file categories that ransomware always tries to find and encrypt: documents (*.txt, *.doc(x), *.ppt(x), *.xls(x), *.pdf, and *.py), keys and licenses (*.key, *.pem, *.crt, and *.cer), file archives (*.zip, *.rar), and media (*.jp(e)g, *.mp3, and *.avi). We obtain these files in three ways. First, we create files with valid headers and content using standard libraries (e.g., python-docx, python-pptx, pdfkit, and OpenSSL). Second, using Google search syntax and crawler technology, we download a large number of files from the Internet. Third, we collect a number of nonconfidential documents from the hosts of 20 volunteers to emulate actual user environments. When we assign user files to the deception environment, the path length is generated randomly. Each folder may have a random set of subfolders, and for each folder a subset of extensions is randomly selected. Furthermore, each directory name is generated from meaningful words. Consequently, we generate paths and extensions for user files that give them variable file depth and meaningful content.
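The randomized layout described above (random path depth, directory names built from meaningful words, and a random subset of extensions per folder) can be sketched as follows. This is an illustrative sketch only: the word list, the extension list, the `max_depth` default, and the function name `plan_decoy_paths` are assumptions, and the function only plans relative paths; the file contents would be produced separately, for example with the libraries named in the text.

```python
import random

# Assumed vocabulary for directory and file names (not from the paper).
WORDS = ["finance", "reports", "projects", "backup",
         "clients", "invoices", "photos", "contracts"]
# Extensions drawn from the four categories listed in the text.
EXTENSIONS = [".txt", ".docx", ".pptx", ".xlsx", ".pdf", ".py",
              ".key", ".pem", ".zip", ".jpg", ".mp3"]


def plan_decoy_paths(rng, n_files, max_depth=4):
    """Plan `n_files` relative decoy paths with random depth and names.

    Each distinct folder gets its own randomly chosen subset of
    extensions, and every file placed in that folder uses one of them.
    """
    folder_exts = {}
    paths = []
    for _ in range(n_files):
        depth = rng.randint(1, max_depth)
        folder = "/".join(rng.choice(WORDS) for _ in range(depth))
        if folder not in folder_exts:
            k = rng.randint(1, len(EXTENSIONS))
            folder_exts[folder] = rng.sample(EXTENSIONS, k)
        ext = rng.choice(folder_exts[folder])
        # The running index keeps file names unique within the plan.
        paths.append(f"{folder}/{rng.choice(WORDS)}_{len(paths)}{ext}")
    return paths
```

Passing a seeded `random.Random` instance makes a plan reproducible for testing, while an unseeded one gives each deployment a different layout, which matches the goal of a nondeterministic environment.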

To make the simulated environment more valuable, we deploy bait information on it, such as databases, false code comments, digital certificates, administrator passwords, SSH keys, VPN keys, browser history, passwords, ARP records, and DNS records. When an attacker obtains the bait information, it may trick the attacker into attacking the deception environment.

4.3.2. The File Monitor. The file monitor detects ransomware by monitoring file type changes and file entropy changes, a method proposed in 2016 [21]. The type of data stored in a file can be described by the order and position of specific byte values that are unique to a file type. Since files generally retain their file types and formatting in the deception environment, the bulk modification of such files should be considered suspicious. When the monitor sees this kind of change, we can infer that a ransomware attack has occurred.
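A file type check based on leading byte signatures can be sketched as below. This is a minimal illustration rather than the monitor used in the paper: the signature table covers only a few well-known formats, the helper names are hypothetical, and the `min_changed` threshold for "bulk" modification is an assumed value.

```python
# A few well-known leading byte signatures ("magic numbers").
MAGIC_SIGNATURES = {
    b"%PDF": "pdf",
    b"PK\x03\x04": "zip",          # also the container of docx/pptx/xlsx
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
}


def sniff_type(header: bytes) -> str:
    """Guess a file type from its first bytes, or return "unknown"."""
    for magic, name in MAGIC_SIGNATURES.items():
        if header.startswith(magic):
            return name
    return "unknown"


def count_type_changes(before: dict, after: dict) -> int:
    """Count files whose recognized type differs between two scans.

    `before` and `after` map a file path to the leading bytes read from
    that file at each scan. Files of unknown type in the first scan are
    ignored, as are files missing from the second scan.
    """
    changed = 0
    for path, old_header in before.items():
        new_header = after.get(path)
        if new_header is None:
            continue
        old_type = sniff_type(old_header)
        if old_type != "unknown" and sniff_type(new_header) != old_type:
            changed += 1
    return changed


def bulk_modification_suspected(before, after, min_changed=3) -> bool:
    """Flag a scan pair when at least `min_changed` files changed type."""
    return count_type_changes(before, after) >= min_changed
```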

Entropy expresses the randomness of each character in a string: the higher the entropy value, the stronger the randomness. The Shannon entropy of an array of bytes can be computed as the sum

$$e = \sum_{i=0}^{255} P_{B_i} \log_2 \frac{1}{P_{B_i}} \tag{1}$$

for $P_{B_i} = F_i / \mathit{totalbytes}$, where $F_i$ is the number of instances of byte value $i$ in the array. The entropy value is a number from 0 to 8, and a value of 8 represents a byte array with a perfectly uniform distribution of byte values. Since each byte value occurs with roughly the same probability in encrypted ciphertext, the entropy of ciphertext is close to this upper limit. Because ransomware always encrypts a large number of files, when we detect that a file changes into a high-entropy file within a short period of time and its file type also changes, we assume that the file is under a ransomware attack.
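The byte-level Shannon entropy defined in (1) is straightforward to compute. The sketch below follows the formula directly; the cutoff of 7.5 bits per byte used in `looks_encrypted` is an assumed illustrative value, not a threshold reported in the paper.

```python
import math
from collections import Counter

# Assumed cutoff for "close to the upper limit of 8" (illustrative only).
HIGH_ENTROPY_THRESHOLD = 7.5


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of `data` in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # F_i for every byte value that occurs
    # P_Bi * log2(1 / P_Bi), with P_Bi = F_i / total
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def looks_encrypted(data: bytes, threshold: float = HIGH_ENTROPY_THRESHOLD) -> bool:
    """Flag content whose entropy is at or above the chosen threshold."""
    return shannon_entropy(data) >= threshold
```

Note that compressed formats such as ZIP archives or JPEG images also have high entropy, which is one reason the text combines the entropy signal with a file type change rather than using entropy alone.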

4.3.3. The Shared Folder Monitor. By traversing the disk storage in real time, the shared folder monitor discovers updates to the shared folders as they occur. It obtains the contents of the attacker’s files locally, which the attacker often does not notice, thus revealing some unexpected traceable clues. The monitor can access a list of paths to the attacker’s shared folders.

As we originally observed, folders shared through Remote Desktop often have a path on the remote host with the prefix “\\tsclient”. When the monitor traverses the storage to this prefix, it uses “FindFirstFile” to find the first file. It then uses “FindNextFile” with the returned handle to find the next file. When a returned entry is a folder, the monitor continues to traverse all files under that folder. Initially, the monitor tried to get the full file names and file contents by traversing each new shared folder. However, during the actual experimentation, we found that, as the number of files in storage grows, the monitor takes far more time and resources to get all the file contents than to get just the file paths. Full traversals are also more likely to alert the attacker. Therefore, the monitor only obtains the paths of the files that the attacker shares on its host, with the help of “GetFileName”. Moreover, in order to prevent encryption by the ransomware, the mounted disk monitor directly transfers the acquired list of shared file paths to another secure host.
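The path-only traversal can be sketched as follows. The sketch uses Python’s `os.walk` in place of the Win32 calls named above, and the function name `list_shared_paths` is hypothetical; on a deception host the root would be a redirected share under the “\\tsclient” prefix, which is assumed here rather than shown.

```python
import os


def list_shared_paths(root):
    """Collect file paths under root without opening any file content."""
    found = []
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            found.append(os.path.join(folder, name))
    return sorted(found)
```

Because no file is opened, the cost grows with the number of directory entries rather than with the size of the shared data, which matches the reason given above for collecting paths only.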

5. Clue Extraction and Analysis

Through the deception environment, we trap the ransomware attacker and collect a large amount of information that may contain many traceable clues. However, such traceable clues are often not visually observable and are complex in nature. In addition, much of the collected information is not helpful in tracing back ransomware attackers. Therefore, to assist in analyzing the monitor information and extracting the effective traceable clues, this section describes how to extract clues and how to analyze them automatically after extraction.

5.1. Clue Extraction. We mainly obtain the following kinds of clues from the extraction: remote login information (IP addresses), network traffic, clipboard contents (pictures and texts), shared folder information (path strings), and ransomware samples (compile time and compile paths). The shared folder path clues can be obtained directly from the monitor. However, clipboard clues, compilation clues, and remote host clues are often not visually observable and are complex in nature. As a result, the extraction module mainly focuses on the extraction of remote host clues, clipboard string clues, and compilation clues.

Remote Host Clue Extraction. The IP address, port number, and folder path clues can be obtained directly from the login information and folder paths. In a PCAP capture of the network communication, the TCP packets that exchange configuration information are usually named “ClientData”. We extract the client name field from the “ClientData” packet to obtain the attacker’s hostname. In addition, the KeyboardLayout field indicates the default keyboard layout of the attacking host; e.g., the Chinese Simplified layout number is 0x0004, and the American English keyboard layout number is 0x0409. The keyboard layout indicates the remote user’s idiomatic language (the mother tongue), from which the attacker’s nationality can be inferred.

Clipboard Clue Extraction. The main data formats available to the clipboard monitor are Windows Bitmap, GDI file, ANSI characters, Unicode characters, and WAV audio data. We mainly aim at extracting traceable clues of character types. The monitor extracts character clues from the clipboard in various formats by checking the “DataType” value of the GetClipboardData API.

Compilation Clue Extraction. For all the Windows RDP-based ransomware samples that we examined, we empirically observed that the most commonly used format is the PE file, especially *.exe and *.dll. Some PE files carry compilation information, and this information does not change when the program is migrated. As a result, it is a good way to obtain information about the creator. A PE file mainly consists of five major components: DOS MZ header, DOS stub, PE header, section headers, and section content. Each component contains a great deal of information, but very little of it can be used directly to identify the creator, and some identification information needs to be extracted from the content of each section. In this paper, the following clues are extracted from PE files to trace back the attacker: file name, PE file type, compiler version, compilation path, compile time, last modified time, last open time, IP address, URL, domain name, language, string, wide character, and so on. We extract most of the clues with PEView [34]. However, since it cannot directly obtain the compilation path, we use the pefile [35] tool to extract paths by locating them in the PE structure.
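To make the two compilation clues concrete, the following Python sketch reads the compile time and the compilation (PDB) path from raw PE bytes. It is a heuristic, standard-library-only stand-in and does not use the pefile API: it relies on two documented layout facts, namely that the COFF TimeDateStamp sits 8 bytes after the “PE\0\0” signature located through the offset at 0x3C, and that a CodeView PDB 7.0 record consists of the marker “RSDS”, a 16-byte GUID, a 4-byte age, and a NUL-terminated path. The function names are hypothetical.

```python
import re
import struct

# "RSDS" + 16-byte GUID + 4-byte age, followed by a NUL-terminated PDB path.
RSDS_RECORD = re.compile(rb"RSDS.{20}([^\x00]+)\x00", re.DOTALL)


def compile_timestamp(blob: bytes):
    """Return the COFF TimeDateStamp (seconds since 1970), or None if not a PE file."""
    if len(blob) < 0x40 or blob[:2] != b"MZ":
        return None
    (pe_offset,) = struct.unpack_from("<I", blob, 0x3C)
    if blob[pe_offset:pe_offset + 4] != b"PE\x00\x00" or len(blob) < pe_offset + 12:
        return None
    # COFF header: Machine (2 bytes), NumberOfSections (2 bytes), TimeDateStamp (4 bytes).
    (stamp,) = struct.unpack_from("<I", blob, pe_offset + 8)
    return stamp


def find_pdb_paths(blob: bytes):
    """Scan the raw bytes for CodeView records and return the embedded PDB paths."""
    return [m.group(1).decode("utf-8", errors="replace") for m in RSDS_RECORD.finditer(blob)]
```

A byte scan can match stray “RSDS” sequences in packed or unusual binaries, so a structure-aware parser that walks the debug directory is the safer choice when one is available.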

Since the extracted clues come in different encoding formats, the extraction system converts all the obtained data into the UTF-8 format and saves it in an SQLite database, which facilitates observation and unifies the subsequent analysis. Before being submitted to the analysis system, the extracted clues are divided into two categories: string clues and path clues.

5.2. Automatic Analysis. At this point, the clues extracted by the system mainly include string clues and path clues. The number of string clues is large, and they are mixed with a large number of unidentifiable strings. Because the number of path clues is also very large and they have no semantic correlation, it is difficult to identify traceable clues manually. We therefore focus on how to automate the identification of traceable clues for path clues.

5.2.1. Users and Programs Analysis. In the analysis of the path clues, we first propose to obtain attacker clues by identifying the features of context-related segments on the same path. For instance, each user has a separate user folder, and it is located in the “Users” folder under Drive C. As a result, the system can obtain the user name on the attacking host from the folder name under the “Users” folder (e.g., C:\Users\Dell\). The “Program Files” folder usually contains the names of the software programs installed on the machine (e.g., C:\Program Files\Microsoft Visual Studio 11.0\). What is more, the QQ account number is always located in the “QQ\QQfile” folder (e.g., D:\QQ\QQfile\86******086\FileRecv\).
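The context rules above translate naturally into pattern matching on the path. The following Python sketch is one possible way to do it; the regular expressions, the digit-length bound for QQ numbers, and the function name `analyze_path` are assumptions for illustration only.

```python
import re

# Patterns for the three context rules described in the text (illustrative only).
USER_RE = re.compile(r"^[A-Za-z]:\\Users\\([^\\]+)", re.IGNORECASE)
PROGRAM_RE = re.compile(r"^[A-Za-z]:\\Program Files(?: \(x86\))?\\([^\\]+)", re.IGNORECASE)
QQ_RE = re.compile(r"\\QQfile\\(\d{5,12})\\", re.IGNORECASE)


def analyze_path(path):
    """Return the user name, program name, or QQ number carried by a Windows path."""
    clues = {}
    for label, pattern in (("user", USER_RE), ("program", PROGRAM_RE), ("qq", QQ_RE)):
        match = pattern.search(path)
        if match:
            clues[label] = match.group(1)
    return clues
```

For example, `analyze_path(r"C:\Users\Dell\Desktop\a.txt")` yields the user name “Dell”, the same clue that the text derives from the “Users” folder.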

In this way, the analysis system can quickly and accurately get host names, email accounts, program names, social software numbers, and other traceable clues that are carried in the attacker’s file paths. However, such user information accounts for only a small share of the overall clues and is acquired by chance. Therefore, we conduct further analysis on this basis.

5.2.2. Account Analysis. Through the analysis of APT1 and some other attribution reports, we find that the mapping between the attacker and a physical-world identity can be better obtained by analyzing the accounts left by the attacker. This information includes, but is not limited to, the location of the IP address, the spelling and registration of the domain name, the IP address corresponding to the URL, and the domain name of the mailbox account. Because it is difficult to identify this information effectively in a large number of strings and path clues, the analysis system automatically identifies IP addresses, domain names, URLs, and mailbox accounts by regular expression matching. Then, with the help of threat intelligence and big data technology, more relevant clues are obtained.
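Regular expression matching of this kind can be sketched as follows. The patterns are simplified assumptions written for this example (for instance, they cover IPv4 only and do not validate top-level domains), and the function name `extract_accounts` is hypothetical.

```python
import re

# Simplified patterns; production rules would be stricter.
OCTET = r"(?:25[0-5]|2[0-4]\d|1?\d?\d)"
IPV4_RE = re.compile(r"\b(?:" + OCTET + r"\.){3}" + OCTET + r"\b")
EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")
URL_RE = re.compile(r"\bhttps?://[^\s<>]+")
DOMAIN_RE = re.compile(r"\b(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}\b")


def extract_accounts(text):
    """Return the IP addresses, mailbox accounts, URLs, and domain names found in text."""
    return {
        "ip": IPV4_RE.findall(text),
        "email": EMAIL_RE.findall(text),
        "url": URL_RE.findall(text),
        "domain": DOMAIN_RE.findall(text),
    }
```

The domain pattern also matches the host part of e-mail addresses and URLs, which is usually desirable here because those domains are themselves traceable clues.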

5.2.3. Language Identification. The languages used on a host often help to determine an attacker’s idiomatic language. However, because there are many languages in different countries and some languages are highly similar, we use an automatic analysis system to identify the language of the clues. We tested the accuracy of two language identification toolkits using entire path information in four different languages and found the “langid.py” toolkit to be more accurate overall than the “langdetect” toolkit. The comparison results are shown in Figure 3. The langid.py toolkit is a language identification toolkit developed by Lui and Baldwin at the University of Melbourne [36]. It combines a naive Bayes classifier with cross-domain feature selection to provide domain-independent language identification.

5.2.4. Traceable Strings Identification. When traceable strings are needed, traditional string analysis methods usually use Named Entity Recognition (NER). However, the clues to be analyzed mainly include strings and paths. Strings before and after a path separator have few semantic correlations. What is more, the strings between the path separators and the remaining strings to be analyzed are mostly semantically unrelated because of their limited length. This paper therefore proposes an algorithm that can quickly and automatically identify the traceable strings in strings and paths.

In order to filter out meaningful traceable clues related to the attacker’s identity, the path clues and string clues are split into strings and checked against common words and gibberish in the following steps. Figure 4 shows the process of the automatic traceable clue identification system.

Figure 3: Accuracy of two language identification tools. In tests using the same path clues of known language, the accuracy of the langid toolkit for English, Chinese, Russian, and Korean was 88%, 100%, 99%, and 99.0%, respectively, while the accuracy of the langdetect toolkit was 73%, 0.85%, 40%, and 4.60%, respectively.

Figure 4: Process of automatically identifying traceable clues.

Make Stop Words. The system splits each path by the path delimiter. Separated path strings that are common to multiple computers have no identifying effect, so we take the file string names that are common to 20 normal user computers as stop words. The system then removes the stop words from the strings after each split.
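The stop-word step can be sketched as below. The baseline here is a list of path collections, one per normal computer; the `min_hosts` cutoff and both function names are assumptions for the example, since the text only states that strings common to the 20 baseline computers are used.

```python
from collections import Counter


def build_stop_words(baseline_hosts, min_hosts=2):
    """Path segments seen on at least min_hosts baseline machines become stop words."""
    seen = Counter()
    for paths in baseline_hosts:
        segments = set()
        for path in paths:
            segments.update(seg.lower() for seg in path.split("\\") if seg)
        seen.update(segments)  # count each segment once per machine
    return {seg for seg, hosts in seen.items() if hosts >= min_hosts}


def strip_stop_words(path, stop_words):
    """Split a path by the delimiter and drop the segments that are stop words."""
    return [seg for seg in path.split("\\") if seg and seg.lower() not in stop_words]
```

Counting each segment once per machine, rather than once per path, keeps a single host with many similar paths from turning its own private folder names into stop words.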

Segmentation. After removing the stop words, we separate sentences and words for each segment of the path with a tokenizer (e.g., NLTK, NER) [37]. However, as the segments of a path are not semantically related, this segmentation is less effective. Based on our extensive study of path information, users often prefer symbols such as underscores to split file names. In addition, they also like to use capitalization in multiword strings to facilitate reading. As a result, the system uses the common separator symbols in the names and the alphabetic case characteristics to segment the strings again.
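A possible implementation of this second segmentation pass, written as a short Python sketch, first splits on separator symbols and then on case and letter-digit boundaries. The exact rules and the function name `segment` are assumptions; the text does not specify them.

```python
import re

# Runs of capitals not followed by lowercase (acronyms), capitalized or lowercase words, digit runs.
WORD_RE = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def segment(name):
    """Split a file or folder name on symbols first, then on case and digit boundaries."""
    tokens = []
    for part in re.split(r"[^A-Za-z0-9]+", name):
        tokens.extend(WORD_RE.findall(part))
    return tokens
```

For instance, `segment("WriteSystem32_VirtualDesktop")` returns `["Write", "System", "32", "Virtual", "Desktop"]`.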

Common Words Removal. After segmentation, the system obtains many repeated strings. Most of them are commonly used words and do not help in identifying ransomware attackers. As a result, we filter out the common words by comparing each string with a dictionary. If the string can be matched in the dictionary, the system marks the common word property of the string as false; otherwise, it marks it as true.

Gibberish Detect. Based on our analysis and observation, we found that there is a large amount of garbled information in path clues. Most of these strings are generated randomly. People can recognize this randomness manually, but it is difficult for a program to recognize it automatically. As a result, we propose detecting gibberish by training a Markov chain model on English texts. For each string $X$, the probability of the $(i+1)$-th character in $X$ depends only on the $i$-th character:

$$P(X_{i+1} = x \mid X_1 = x_1, X_2 = x_2, \ldots, X_i = x_i) = P(X_{i+1} = x \mid X_i = x_i) \tag{2}$$

Each letter is related to the letters immediately before and after it. This relationship can be expressed by 2-grams. For example, with regard to the string “Ransomware ”, the 2-grams are Ra, an, ns, ..., e[space]. The analysis system records how often characters appear next to each other in a collection of gibberish and some commonly used normal phrases or vocabulary. It normalizes the counts after reading through the training data and then measures the probability of generating a string by multiplying the probabilities of the pairs of adjacent characters in the string [38]. From the training data, the system computes the average transition probability for gibberish and for normal strings, respectively. This probability measures how likely the string is according to the data observed by the model. When we test the string “Ransom”, it computes

$$\mathit{prob}[\text{'Ransom'}] = \mathit{prob}[\text{'R'}][\text{'a'}] \times \mathit{prob}[\text{'a'}][\text{'n'}] \times \cdots \times \mathit{prob}[\text{'m'}][\text{' '}] \tag{3}$$

If the input string is gibberish, it passes through some pairs with very low counts in the training phase and hence has a low probability. The system then looks at the probability per character for a few known normal strings and a few examples of known gibberish and picks a threshold between the highest probability among the gibberish and the lowest probability among the normal strings:

$$\mathit{threshold} = \frac{\min(\mathit{normal\_probs}) + \max(\mathit{gibberish\_probs})}{2} \tag{4}$$

When we analyze a string ‘X’, if $\mathit{prob}[\text{'X'}] > \mathit{threshold}$, the analysis system views the string as a normal string and marks it as ‘True’. If $\mathit{prob}[\text{'X'}] \le \mathit{threshold}$, the analysis system marks the string as ‘False’. The system then removes the strings marked ‘False’ as gibberish.
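The training, scoring, and threshold steps of Equations (2)-(4) can be put together in a compact Python sketch. Several choices below are assumptions rather than details given in the text: the 27-symbol alphabet, add-one smoothing of the pair counts, the use of the geometric mean of the transition probabilities as the per-character score, and the toy corpus and calibration strings.

```python
import math

ALPHABET = "abcdefghijklmnopqrstuvwxyz "


def normalize(text):
    """Lowercase the text and keep only letters and spaces."""
    return [c for c in text.lower() if c in ALPHABET]


def train(corpus_lines):
    """Count adjacent character pairs and turn each row into log-probabilities."""
    # Every pair starts at 1 (add-one smoothing) so unseen pairs keep a small probability.
    counts = {a: {b: 1 for b in ALPHABET} for a in ALPHABET}
    for line in corpus_lines:
        chars = normalize(line)
        for a, b in zip(chars, chars[1:]):
            counts[a][b] += 1
    return {
        a: {b: math.log(n / sum(row.values())) for b, n in row.items()}
        for a, row in counts.items()
    }


def avg_transition_prob(text, model):
    """Geometric mean of the pair probabilities, i.e., probability per transition."""
    chars = normalize(text)
    pairs = list(zip(chars, chars[1:]))
    if not pairs:
        return 0.0
    return math.exp(sum(model[a][b] for a, b in pairs) / len(pairs))


def pick_threshold(model, normal, gibberish):
    """Midpoint between the weakest normal string and the strongest gibberish, as in Equation (4)."""
    normal_probs = [avg_transition_prob(s, model) for s in normal]
    gibberish_probs = [avg_transition_prob(s, model) for s in gibberish]
    return (min(normal_probs) + max(gibberish_probs)) / 2


def is_normal(text, model, threshold):
    return avg_transition_prob(text, model) > threshold


# Toy training data and calibration strings, for illustration only.
CORPUS = [
    "the ransomware program encrypts user files",
    "the attacker shares a folder over remote desktop",
    "the program was compiled from a source folder",
] * 10
NORMAL = ["program", "folder"]
GIBBERISH = ["zxqvkj", "qjxkzv"]

model = train(CORPUS)
threshold = pick_threshold(model, NORMAL, GIBBERISH)
```

Averaging in log space avoids the numerical underflow that the plain product of Equation (3) would cause on long strings, and it makes scores comparable across strings of different lengths. With a corpus this small, many ordinary words would still fall below the threshold, so a realistic model needs a much larger body of English text.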


Table 1: Traceable strings recognized result [13].

String       Common Words Recognized   Gibberish Recognized   Final Recognized
Wh****ter    True                      True                   True
gandcrab     True                      True                   True
kate z       True                      True                   True
vwrtjty      True                      False                  False
program      False                     True                   False

Identification Result. After the above analysis, the automated analysis system outputs the strings for which both the common word tag and the gibberish tag are True; these strings form the list of traceable clues [13]. Table 1 is reproduced from Z. H. Wang et al. (2018) [under the Creative Commons Attribution License/public domain].

When the string is a normal word, such as “program”, the common words recognition marks it “False”. Gibberish recognition marks as “True” only strings that are not randomly generated; it rejects strings that do not conform to the spelling patterns of common words (e.g., vwrtjty). The string is considered to have auxiliary traceability only if both of the analysis values are true. We rely extensively on string inversion to verify the accuracy of the system. We found that most of the traceable strings in the volunteers’ storage can be recognized. Table 1 shows the recognition results of several typical strings.
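The way the two tags combine can be sketched in Python as follows. The function name `identify_traceable`, the small dictionary, and the stand-in gibberish check used in the usage note are hypothetical; any gibberish classifier that returns True for normal-looking strings could be passed in.

```python
def identify_traceable(tokens, dictionary, looks_normal):
    """Return (common word tag, gibberish tag, final tag) for every token.

    common word tag: True when the token is NOT a dictionary word.
    gibberish tag:   True when the token does NOT look randomly generated.
    final tag:       True only when both tags are True.
    """
    results = {}
    for token in tokens:
        common_tag = token.lower() not in dictionary
        gibberish_tag = bool(looks_normal(token))
        results[token] = (common_tag, gibberish_tag, common_tag and gibberish_tag)
    return results
```

Called with the tokens “gandcrab”, “vwrtjty”, and “program”, a dictionary containing “program”, and a check that rejects only “vwrtjty”, the function yields the same three tag rows as the corresponding entries of Table 1.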

6. Evaluation

In this paper, we evaluate the proposed method with two experiments. The goal of the first experiment is to demonstrate that the proposed method can help trace back RDP-based ransomware attackers and capture their private information. The goal of the second experiment is to demonstrate that the method can also automatically recognize clues in ransomware samples and help trace back ransomware makers.

6.1. RDP-Based Ransomware Attacker

6.1.1. Clue Capture and Analysis. We used Windows 7 virtual machines to deploy the deception environment and assess its effectiveness. While Windows 7 is not required, it was chosen because of its wide range of applications and because it is one of the main targets of ransomware. We invited 122 professional volunteers to help with the experiment and provided them with an experimental fleet of 12 virtual hosts. The monitor system successfully captured most of the volunteers’ login information, clipboard content, shared folder paths, and uploaded PE files.

We chose the path information collected from 12 volunteers’ computers and recorded the rate of convergence after each step of the analysis. Figure 5 shows how many traceable clues remain after each step of processing by the analysis system [13]. Figure 5 is reproduced from Z. H. Wang et al. (2018) [under the Creative Commons Attribution License/public domain].

Figure 5: The change in data volume of the traceable clues of 12 volunteers through the analysis system (x-axis: analysis process, with the steps source, stopword, tokenize, gibberish, and result; y-axis: data quantity). The screening rate can reach 98.34% [13].

6.2. Traceback Attackers. Next, we carried out a detailed analysis of the information left over from one of the volunteers’ attacks. From the data produced by the automatic analysis, we can infer that the attacker’s real identity may have the following characteristics: (1) the attacker is likely to come from China; (2) the attacker may work in security or a related field; (3) the attacker may work in the Chinese Academy of Sciences and have relations with a large security manufacturer in China. The conclusion is shown in Figure 6, which presents the results from the analysis system in five sections.

Username. Through automatic analysis, a total of three user names from the attacker’s host were extracted from a large number of clues. From the “Dell” user name, it can be assumed that the attacker was using a Dell host.

Programs. The automatic analysis system identified several typical software programs installed on the attacker’s host. For instance, “QQ” is a real-time social tool widely used in China. “AliPay” is an online payment tool developed by Alibaba and widely used in China. “Sogou input” is a Chinese input method software developed by the Chinese company Sogou. “MeiTu” is a photo beautification software developed by a Chinese company and widely used in China. In addition, “Sinfor” and “360” are both well-known security manufacturers in China and have a large number of security-related products. “Visual Studio” is a common development software. We found that much Chinese-made software widely used in China was installed on the attacker’s host, as well as some development and security products that are less often installed by ordinary users.

Figure 6: Results of analyzing an attacker with the automatic analysis system, which displays traceable information about the attacker in five aspects: host user name, program software, keyboard layout, account information, and traceable strings.

Keyboard Layout. Through network communication analysis, we find that the attacker’s keyboard layout is Simplified Chinese, from which it can be inferred that the attacker’s native language is probably Chinese. This is consistent with the features of the software installed on the host.

Account Number. The system automatically identified two QQ accounts and two e-mail accounts. The e-mail accounts are Chinese Academy of Sciences business accounts, which supports the above assumption that the attacker comes from China and is likely to work in the Chinese Academy of Sciences. Through the registration information of one QQ account (as shown in Figure 7), we can find the following information: (1) the attacker could be a male aged 32; (2) he does Internet-related work, likely with 360 Security; (3) he is located in Haidian District, Beijing, China. In the registration information of the “353***208” account, the mail, work, and other fields are empty, and the nickname is a special English string, “Wh***ter”; it can be presumed that this account is likely for private use.

Traceable Strings. The automatic analysis system sifts potentially traceable strings from the monitor data. “Freebuf” is a security information exchange website commonly used by Chinese security personnel. “Botnet” is a secondary attack method that is often used by attackers and is often a main research direction for security researchers. “**e” and “**t” are acronyms for departments under the Chinese Academy of Sciences, while “A***team” is a security research team in the “**e” department. “Visumantrag” is often used when processing visas from various countries. “Wh***ter” matches the nickname of the QQ account “353***208” and is likely to be the attacker’s regular nickname.

Figure 7: The registration infographic of QQ account 372***82.

6.3. Ransomware Maker Automatic Analysis. In order to verify that the analysis system can be used to trace back the ransomware maker, we obtained more than 8000 ransomware samples from VirusTotal [39] and extracted the Program Data Base (PDB) path from the samples (as shown in Figure 8). It shows that, although a large number of attackers deliberately erase the compilation information, legacy PDB path information can still be found in a large number of ransomware samples.

Table 2: Same identifier for different samples [13].

Sample Hash    Same Identity Strings
68dd613973f8a***7befe0f4b461d258ce569b9020079***9ae0f090266a60cc
jqxxpphoaonkje7z2fa0c084aef41969***1f427a1b65e1b1464fa82d297e712***e1fdf312c044814a36da0286d42cd***306fba99239c8238821beaf40744be***ba862dbc6d90f70cbc839bcb88***7422bbd2fa812dd3de02391b6a9f1***0762a6f49cf32501

Figure 8: Of the 8075 real ransomware samples, 878 (11%) contained an extractable Program Data Base (PDB) path, while the authors of the remaining 7197 (89%) may have chosen to hide the PDB path during compilation.

What is more, the analysis system was used to automatically analyze the PDB paths of more than 800 different ransomware samples. We found that identical identity strings appear in the analysis results of different samples. One of the typical results is shown in Table 2 [13]. Table 2 is reproduced from Z. H. Wang et al. (2018) [under the Creative Commons Attribution License/public domain]. We used VirusTotal to validate the samples and found that they were all Cobra family ransomware. As a result, it could be assumed that these samples were made by the same ransomware maker. When one sample is traced back, the conclusion about the ransomware maker also applies to the other samples. When different traceable information is obtained from different samples, the identity of the ransomware maker can be described by integrating the information.

Language Identification. When we run language recognition on the PDB paths of the 878 samples, we obtain the result shown in Figure 9. The system automatically identifies the languages of the paths, which mainly comprise 11 languages, of which English accounts for the largest proportion.

Figure 9: Of the 878 ransomware sample PDB paths, English accounted for 510 (58.09%), and the remaining detected languages were Chinese 31 (3.53%), Dutch 40 (4.56%), Norwegian Nynorsk 4 (0.46%), Slovenian 5 (0.57%), Norwegian Bokmål 1 (0.11%), German 58 (6.61%), Italian 3 (0.34%), Hungarian 35 (3.99%), Maltese 18 (2.05%), Polish 5 (0.57%), Finnish 151 (17.20%), Luxembourgish 3 (0.34%), Danish 10 (1.14%), and Spanish 4 (0.46%).

Actual Case Analysis. The automatic identification system helped us identify the traceable strings in the PDB path. Take as an example a sample PDB path (the Chinese has been translated into English) with an MD5 value of “60c6a92afb***6d0c16f7”:

“ELIAOSTUDIOUACsimulated an improved version of the 360 anti-virus program 1117 Bale1*41**2****0(**1) 9818 programWriteSystem32ReleaseVirtualDesktoppdb”

We can infer from the automatic analysis result that the ransomware maker is likely from China. This inference is supported by the following points.

(1) Language Identification. Identifying the language of the path suggests that the attacker may have come from China. Reading the Chinese strings in the path, we find that the text describes an improvement to bypass the detection of 360 security software, which has a large market in China.

(2) IP Address. The system identified an IP address in the PDB path. With the help of a threat intelligence database, we found that the IP address came from Xinjiang, China (Figure 10).


Figure 10: Partial infographic of IP 1*41**2****0 in the threat intelligence library.

(3) Traceable Strings. The extracted identifier string “LIAO” resembles Chinese pinyin and is most likely a Chinese surname. To sum up, we speculate that the ransomware maker is most likely from China.

7. Conclusion

In this paper, we focus on tracing back RDP-based ransomware. Recently, more and more ransomware attackers are using RDP attacks to spread ransomware with impunity because of the use of strong cryptography. We propose tracing back the attack sources to deter RDP-based ransomware with the proposed deception environment. It collects traceable clues and performs automatic analysis by using natural language processing and machine learning techniques. The evaluation shows that it is able to trap attackers and collect the traceable clues they leave in a deception environment. With automatic clue identification, it can reduce the amount of traceable clues to about 2%. We use this method to analyze two practical cases and show its effectiveness. By tracing back to ransomware attackers, we can provide a strong deterrent to stifle the development of ransomware.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Additional Points

This manuscript provides additional deception environment monitors and analytical methods by using more volunteers and samples. It proves the validity of the methods through specific traceback cases (i.e., tracing back the ransomware attacker and tracing back the ransomware maker).

Disclosure

An earlier version of this paper was presented at the IEEE Third International Conference on Data Science in Cyberspace, Guangzhou, China, 18-21 June 2018. The authors’ initial conference paper focused mainly on the effectiveness of deception environment monitors and the screening rate of the analysis system.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors would like to thank VirusTotal for providing ransomware samples and the volunteers from the University of Chinese Academy of Sciences. This work is supported in part by the National Key Research and Development Plan (Grant no. 2018YFB0803504) and the National Natural Science Foundation of China (61871140, 61572153, U1636215, 61572492, and 61672020).

References

[1] A Kharraz ldquoUNVEIL A Large-Scale Automated Approachto Detecting Ransomwarerdquo in Proceedings of the 25th USENIXSecurity Symposium (USENIX Security 16) pp 757ndash772USENIX Association 2016

[2] K Savage P Coogan and H Lau ldquoThe evolution of ran-somwarerdquo Tech Rep 2015

[3] A Kharraz W Robertson D Balzarotti L Bilge and EKirda ldquoCutting the Gordian Knot A Look Under the Hood ofRansomware Attacksrdquo in Detection of Intrusions and Malwareand Vulnerability Assessment vol 9148 of Lecture Notes inComputer Science pp 3ndash24 Springer International PublishingCham 2015

[4] S Mohurle and M Patil ldquoA brief study of Wannacry ThreatRansomware Attack 2017rdquo International Journal of AdvancedResearch in Computer Science vol 8 no 5 2017

[5] X Yu Z Tian J Qiu and F Jiang ldquoA Data Leakage PreventionMethod Based on the Reduction of Confidential and ContextTerms for SmartMobileDevicesrdquoWireless Communications andMobile Computing vol 2018 Article ID 5823439 11 pages 2018

[6] J Yaneza ldquoBrute Force RDP Attacks Plant CRYSIS Ransom-warerdquo httpsblogtrendmicrocomtrendlabs-security-intelli-gencebrute-force-rdp-attacks-plant-crysis-ransomware

[7] ldquo10 of Ransomware Attacks on SMBs Targeted IoT Devicesrdquohttpswwwdarkreadingcomapplication-security10ndashof-ran-somware-attacks-on-smbs-targeted-iot-devices-dd-id1329817

[8] Q Tan Y Gao J Shi X Wang B Fang and Z H TianldquoTowards a Comprehensive Insight into the Eclipse Attacks ofTor Hidden Servicesrdquo IEEE Internet of Things Journal 2018

[9] ldquoKaspersky Security Bulletin STORY OF THE YEAR 2017rdquo2017 httpsmediakasperskycontenthubcomwp-contentup-loadssites4320180307164824KSB Story of the Year Ran-somware FINAL engpdf

[10] Z Tian Y Cui L An et al ldquoA Real-Time Correlation of Host-Level Events in Cyber Range Service for Smart Campusrdquo IEEEAccess vol 6 pp 35355ndash35364 2018

[11] R Ross et al Managing Information Security Risk Organi-sation Mission and Information System View National Insti-tute of Standards and Technology 2011 httpcsrcnistgovpublicationsnistpubs800-39SP800-39-finalpdf

[12] Y Wang Z Tian H Zhang S Su and W Shi ldquoA PrivacyPreserving Scheme for Nearest Neighbor Queryrdquo Sensors vol18 no 8 p 2440 2018

[13] Z H Wang X Wu C G Liu Q X Liu and J L ZhangldquoRansomTracer Exploiting Cyber Deception for RansomwareTracingrdquo in Proceedings of the IEEEThird International Confer-ence on Data Science in Cyberspace pp 227ndash234 2018

[14] B A S Al-rimyMAMaarof and S ZM Shaid ldquoRansomwarethreat success factors taxonomy and countermeasures A

Wireless Communications and Mobile Computing 13

survey and research directionsrdquo Computers amp Security vol 74pp 144ndash166 2018

[15] F. Jiang, Y. Fu, B. B. Gupta et al., "Deep Learning based Multi-channel intelligent attack detection for Data Security," IEEE Transactions on Sustainable Computing, 2018.

[16] N. Andronio, S. Zanero, and F. Maggi, "HelDroid: dissecting and detecting mobile ransomware," in Research in Attacks, Intrusions, and Defenses, vol. 9404 of Lecture Notes in Computer Science, pp. 382–404, Springer, 2015.

[17] F. Mercaldo, V. Nardone, A. Santone, and C. A. Visaggio, "Ransomware Steals Your Phone. Formal Methods Rescue It," in Formal Techniques for Distributed Objects, Components, and Systems: 36th IFIP WG 6.1 International Conference, FORTE 2016, held as part of the 11th International Federated Conference on Distributed Computing Techniques, DisCoTec 2016, E. Albert and I. Lanese, Eds., pp. 212–221, Springer International Publishing, 2016.

[18] D. Sgandurra, L. Muñoz-González, R. Mohsen, and E. C. Lupu, "Automated Dynamic Analysis of Ransomware: Benefits, Limitations and use for Detection," https://arxiv.org/abs/1609.03020.

[19] S. Song, B. Kim, and S. Lee, "The Effective Ransomware Prevention Technique Using Process Monitoring on Android Platform," Mobile Information Systems, vol. 2016, Article ID 2946735, 9 pages, 2016.

[20] T. Yang, Y. Yang, K. Qian, D. C.-T. Lo, Y. Qian, and L. Tao, "Automated detection and analysis for android ransomware," in Proceedings of the 17th IEEE International Conference on High Performance Computing and Communications, IEEE 7th International Symposium on Cyberspace Safety and Security and IEEE 12th International Conference on Embedded Software and Systems, HPCC-ICESS-CSS 2015, pp. 1338–1343, USA, August 2015.

[21] N. Scaife, H. Carter, P. Traynor, and K. R. B. Butler, "CryptoLock (and Drop It): Stopping Ransomware Attacks on User Data," in Proceedings of the 36th IEEE International Conference on Distributed Computing Systems, ICDCS 2016, pp. 303–312, Japan, June 2016.

[22] M. M. Ahmadian and H. R. Shahriari, "2entFOX: A framework for high survivable ransomwares detection," in Proceedings of the 13th International ISC Conference on Information Security and Cryptology, ISCISC 2016, pp. 79–84, Iran, September 2016.

[23] M. M. Ahmadian, H. R. Shahriari, and S. M. Ghaffarian, "Connection-monitor & connection-breaker: A novel approach for prevention and detection of high survivable ransomwares," in Proceedings of the 12th International ISC Conference on Information Security and Cryptology, ISCISC 2015, pp. 79–84, Iran, September 2015.

[24] D. Kim, W. Soh, and S. Kim, "Design of Quantification Model for Prevent of Cryptolocker," Indian Journal of Science and Technology, vol. 8, no. 19, 2015.

[25] C. Moore, "Detecting Ransomware with Honeypot Techniques," in Proceedings of the 2016 Cybersecurity and Cyberforensics Conference (CCC), pp. 77–81, Amman, Jordan, August 2016.

[26] F. Mbol, J. Robert, and A. Sadighian, "An Efficient Approach to Detect TorrentLocker Ransomware in Computer Systems," in Cryptology and Network Security, S. Foresti and G. Persiano, Eds., vol. 10052 of Lecture Notes in Computer Science, pp. 532–541, Springer International Publishing, Cham, 2016.

[27] K. Cabaj, P. Gawkowski, K. Grochowski, and D. Osojca, "Network activity analysis of CryptoWall ransomware," Przegląd Elektrotechniczny, vol. 91, no. 11, pp. 201–204, 2015.

[28] J. Chen, Z. Tian, X. Cui, L. Yin, and X. Wang, "Trust Architecture and Reputation Evaluation for Internet of Things," Journal of Ambient Intelligence & Humanized Computing, vol. 2, pp. 1–9, 2018.

[29] C. Le Guernic and A. Legay, "Ransomware and the Legacy Crypto API. Paper presented at the Risks and," in Risks and Security of Internet and Systems: 11th International Conference, CRiSIS 2016, Roscoff, France, 2017.

[30] L. Pingree, Emerging Technology Analysis: Deception Techniques and Technologies Create Security Technology Business Opportunities, Gartner, 2015.

[31] "Ransomware and Businesses," https://www.symantec.com/content/dam/symantec/docs/security-center/white-papers/ransomware-and-businesses-16-en.pdf, 2016.

[32] A. Kharraz and E. Kirda, "Redemption: Real-Time Protection Against Ransomware at End-Hosts," in Proceedings of the International Symposium on Research in Attacks, Intrusions, and Defenses, pp. 98–119, 2017.

[33] F. Bergstrand, J. Bergstrand, and H. Gunnarsson, Localization of Spyware in Windows Environments.

[34] "PEView," https://www.aldeid.com/wiki/PEView.

[35] "python-pefile," https://pypi.python.org/pypi/pefile.

[36] M. Lui and T. Baldwin, "langid.py: An off-the-shelf language identification tool," in Proceedings of the ACL 2012 System Demonstrations, Association for Computational Linguistics, pp. 25–30, 2012.

[37] S. Bird and E. Loper, "NLTK: the natural language toolkit," in Proceedings of the ACL 2004 on Interactive poster and demonstration sessions, Association for Computational Linguistics, Barcelona, Spain, July 2004.

[38] "Rrenaud Gibberish-Detector," https://github.com/rrenaud/Gibberish-Detector/blob/master/README.rst.

[39] "VirusTotal," http://virustotal.com.


[Figure 1: Data collection and analysis process of the whole prototype. Steps shown: ensnare into the deception environment, monitor, ransomware detection (yes/no), extraction, analysis, result.]

use different ways to trick a victim into launching such programs. Among these dissemination methods, phishing email is the most widely used. However, according to Kaspersky's 2017 ransomware report, the number of targeted ransomware attacks based on RDP is growing rapidly.

Recently, more and more ransomware criminals have spread ransomware through RDP services and then installed it manually. These attackers use a brute-force method to acquire usernames and passwords on a target machine with an active RDP service [9]. For instance, Crysis, one of the typical families and a copycat of Locky, not only aims at common businesses but also targets healthcare service providers [9]. Crysis gains admin-level privileges by stealing passwords and credentials. In addition, during an RDP session, the attacker uses both the clipboard and shared folders to upload files to the remote host. The attacker can then install the ransomware manually.

2.3. Cyber Deception Technology. Because many critical systems are known and always on, it is difficult to protect them from potential network attacks:

(i) Attackers can use zero-day vulnerabilities, highly antagonistic malicious code, or other resources to break the defense system.

(ii) Because humans are always the weakest link in defense systems, attackers can use social engineering to identify system weaknesses and penetrate the defense.

(iii) Attackers can repeatedly explore the potential vulnerabilities of a target system to identify its weaknesses.

Moreover, when an attacker aims at a specific target, e.g., by exploiting its RDP service, traditional passive defense methods are usually less effective. Therefore, we need advanced active solutions, such as cyber deception, to deal with attacks that have few observable features.

The earliest form of cyber deception technology is the honeypot. A honeypot detects attacks by deploying, in the service network, a series of systems or resources that carry no real business; any access to such a trap indicates an attack. A honeypot system generally waits for attacks passively and plays no role in misleading or confusing attackers. Moreover, because a honeypot system has no real business and lacks high-interaction characteristics, it may be easily identified by attackers. Compared with the traditional honeypot system, a cyber deception system can be deployed more conveniently, its environment is more realistic, and it can be linked with existing defense products. It can provide more effective defenses against APT attacks, ransomware attacks, intranet attacks, and other threats. A Gartner report in 2015 [30] pointed out the market prospects of deception-based security defense technology and predicted that 10% of organizations will use deception tools (or tactics) to counter cyber-attacks in 2018. Compared with the traditional passive defense approach, cyber deception technology is an active defense approach that can be applied to all stages of a network attack. We can use this technology to trap RDP-based attackers, detect targeted attacks, and deter ransomware attackers by precisely identifying them.

Trap RDP-Based Ransomware Attackers. A targeted ransomware attack generally has three steps: detection, infiltration, and execution [31]. However, traditional security solutions are unable to cope with the internal translation phase. In addition, traditional honeypot technology (often used to fight network attacks) generally does not focus on tracing back to attackers. By contrast, cyber deception technology can lure the attacker into a surveillance environment and consume his time and energy with bait information.

Detect RDP-Based Ransomware Attacks. Once the attacker obtains the correct username and password combination, he usually returns multiple times within a short period to try to infect the compromised host [6]. In one particular case, Crysis was deployed six times on an endpoint within a span of 10 minutes. As a result, by monitoring the cyber deception environment, we can detect RDP-based ransomware attacks in time and determine the attacker's behavior through the environment monitor.

Deter the Ransomware Attacker. Deterring ransomware attackers can be approached in two ways. First, if an attacker realizes he is entrapped, that realization itself becomes a deterrent. Second, if the attacker is exposed to the deception environment and remains within view of the defense surveillance, the monitor can collect traceable clues that the attacker accidentally releases (e.g., IP addresses, paths, and nickname strings). Exposing clues that attackers try to keep hidden can be a powerful deterrent to other attackers.

3. Methodology

In this section, we describe our method of tracing back RDP-based ransomware attackers. Figure 1 summarizes the data collection and analysis process of the entire prototype. First, we implement a deception environment to trap attackers. Second, we monitor RDP-based ransomware attacks and collect information when they occur. Third, we extract effective clues from the monitor information. Fourth, we use automatic analysis to screen a large number of clues for tracing back the attacker. Finally, we generate a report to trace back the RDP-based ransomware attacker. We refer readers to Sections 4 and 5 for the detailed implementation of this prototype.

3.1. Deception Environment. Generally, the execution stage of a ransomware attack has two steps: login and spread [31]. Building a deception environment is nontrivial in practice because it must make the ransomware attacker believe that the environment belongs to a real user and that the user data is worth attacking. Because advanced attackers exploit static features of certain analysis systems before they launch attacks [32], an intuitive approach to address such reconnaissance is to build the user environment in such a way that the user data is valid, real, and nondeterministic. In addition, the environment serves as an "enticing target" to encourage ransomware attackers. We elaborate on how to generate an artificial, realistic, and enticing user environment for RDP-based ransomware in Section 4.

RDP-based attackers commonly upload malicious programs in the following ways before spreading ransomware: (1) the attacker downloads malicious programs from the Internet; (2) programs are transferred through FTP, SCP, or other transport protocols; (3) programs are uploaded through the clipboard; (4) programs are uploaded through a shared folder. The clipboard and shared folders are the most commonly used channels in RDP-based ransomware attacks because they are simple and convenient. However, both are easy to monitor with our proposed system.

3.2. Environment Monitor. After the attacker logs in to the environment, a shared folder and the clipboard on the remote PC are typically used to transfer ransomware programs from the attacker's machine. To collect more information about the attacker while avoiding the attacker's observation, this paper proposes three monitor layers: the network layer, the host layer, and the file layer. We elaborate on how to configure the monitor system for the deception environment in Section 4.

The Network Layer Monitor. The network layer monitor detects a remote connection and collects information including the remote IP addresses, remote ports, status codes of ports, keyboard layout, and so on. When the RDP-based attacker logs in to the host, the monitor can obtain this information and detect the attack without the attacker's knowledge.

The Host Layer Monitor. We propose to detect changes in, for example, processes and the clipboard by monitoring the host layer. The host layer monitor can gather information about the attacker's behavior and use of system applications in the deception environment. For instance, because the clipboard resides in system-level heap space, any application in the system has access to it, and RDP-based ransomware attackers often take advantage of the clipboard to move data between applications. Moreover, because Windows shares the clipboard by default during an RDP session, the monitor may also obtain clues that attackers leave when they use the clipboard locally.

The File Layer Monitor. By monitoring the file layer, we can identify ransomware attacks from file changes. Furthermore, the monitor can gather local traceable information by monitoring files in the shared folder. For instance, a shared folder on the remote PC is often used to transfer ransomware from the attacker's machine during the RDP session. In addition, for a more convenient and quick attack, the attacker often mounts the entire local disk to the remote computer. As a result, by monitoring shared folders, we can detect newly added shared folders in real time and capture a large amount of path information automatically.

3.3. Clue Extraction. Through the environment monitors, the system can gather a lot of information left by attackers, such as login information, communication information, clipboard content, folder paths, and portable executable (PE) files. Many traceable clues can be extracted from this information, including but not restricted to IP address, keyboard layout, compile path, and file path. To analyze these clues quickly, we divide them into two categories: string clues and path clues. These clues are then submitted to the automatic analysis system. We elaborate on the types of clues that the proposed system can extract in Section 5.1.

3.4. Automatic Analysis. According to our investigation, current traceback tools mostly rely on manual clue analysis. However, we usually have to deal with a large number of clues that have no semantic correlation. Because such manual traceback analysis takes a lot of time and effort, we propose an automatic analysis system; we elaborate on how to analyze clues automatically in Section 5.2.

4. Implementation of the Deception Environment

As the Windows platform is the main target of ransomware, we choose Windows for the proof-of-concept implementation. In this section, we describe the implementation details of a Windows-based deception environment prototype. We elaborate on how the deception environment traps ransomware attackers and how the monitors detect RDP-based attacks and collect traceable clues. The entire system implementation process is shown in Figure 2.

4.1. At the Network Layer

4.1.1. The Login Monitor. The login monitor is used to detect attacks in real time and collect the attacker's login information. On the Windows platform, Win32 is an environment subsystem that provides an API for operating system services and functions to control all user inputs and outputs. The login monitor relies on Windows APIs to gain access to the system and runs with privileges to access its own areas of memory. It uses Winsock 2.0 to access networks and to use protocols other than the TCP/IP suite. The login monitor takes network requests and sends them to the Winsock 2.0 SPI (Service Provider Interface) by calling the main Winsock 2.0 file, Ws2_32.dll, which provides access to transport service providers and namespace providers. The IP Helper API makes it possible to get and modify network configuration settings for the local host. It consists of the DLL file iphlpapi.dll and includes functions that can retrieve information about protocols such as TCP and UDP [33]. As a result, the login monitor can directly access the data buffers involved in transmission control protocols, routing tables, network interfaces, and network protocol statistics.

[Figure 2: RDP-based ransomware attack traceback system process. Steps shown: 1. construct deception environment (cyber deception environment); 2. environment monitor (network layer: login monitor, communication monitor; host layer: clipboard monitor, process monitor; file layer: shared folder monitor, file monitor; ransomware detection); 3. clue extraction (login clues, remote host clues, clipboard clues, compile clues, path clues); 4. automated analysis (users and programs analysis, network address analysis, language identification, traceable strings identification, auxiliary traceable clues); 5. result.]

4.1.2. The Communication Monitor. The communication monitor is responsible for capturing network traffic. By locating the TCP packets in the network traffic that contain the interactive configuration information of an RDP login, the RDP connection information can be obtained as the attacker's personal information. The main packet characteristics are as follows: (1) the packet's name is often ClientData; (2) the packet usually appears after the TCP three-way handshake; (3) the packet is positioned near the front of the session; (4) the amount of data it carries is significantly larger.

4.2. At the Host Layer

4.2.1. The Deception Host. According to our observation, the main methods used by attackers to log in to a remote host are (1) direct login with a weak password and (2) login with an access account added through a vulnerability. As a result, we deliberately set weak passwords for the administrator accounts of the deception environment and leave common vulnerabilities, such as EternalBlue, in the environment to attract attackers. To guide attackers to upload ransomware using only the clipboard and shared folders, we block traffic transfers from outside the environment and close common transfer ports (e.g., ports 20, 21, 80, and 443).

4.2.2. The Clipboard Monitor. The clipboard monitor obtains clues in real time by monitoring changes to the clipboard. It uses the Clipboard Viewer, a mechanism that can get and display the contents of the clipboard, to listen for changes without affecting the clipboard's contents. As Windows applications are message-driven, the key to the monitor is responding to and processing clipboard change messages. When the content changes, a WM_DRAWCLIPBOARD message is triggered and sent to the first window of the Clipboard Viewer Chain. After each Clipboard Viewer window responds to and processes the message, it must pass the message to the next window, using the handle of the next window saved in its linked list. The clipboard monitor obtains the clipboard's new contents through its window by calling the "GetClipboardData" Windows API. When a subsequent copy or cut operation is executed, the data in the clipboard are rewritten. The clipboard monitor therefore listens in real time and writes the information to a log file, which is updated whenever the monitor receives a clipboard change notice. When the log file is updated, the monitor sends it to a secure host and completely erases it from the environment, to prevent it from being detected by the attacker or encrypted by the ransomware.

4.2.3. The Process Monitor. To run a program on a Windows system, a new process must be created. The monitor obtains the PE file run by the attacker by monitoring the processes in the environment. It first records the state of the processes commonly used in the deception environment before an RDP-based ransomware attack. After the login monitor detects such an attack, the process monitor watches the system for newly created processes: the Windows API "CreateToolhelp32Snapshot" takes status snapshots of all processes in real time, including each process identifier (PID). When a suspicious process has started, the monitor recognizes it by its PID and looks up the process's running path with the help of "GetModuleFileNameEx". Finally, the monitor finds the suspicious program's PE file through the path and copies it to the secure host.
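The snapshot comparison described above can be sketched in a few lines. This is a minimal illustration, not the prototype's code: it assumes the snapshots have already been collected (for example, through CreateToolhelp32Snapshot) and are represented as dictionaries mapping a PID to an executable path. The function name `new_processes` and the sample paths are hypothetical.

```python
from typing import Dict

def new_processes(baseline: Dict[int, str], current: Dict[int, str]) -> Dict[int, str]:
    """Return processes present in `current` but not in `baseline`.

    Both arguments map a PID to the executable path of that process.
    A reused PID whose path has changed is also reported.
    """
    return {
        pid: path
        for pid, path in current.items()
        if baseline.get(pid) != path
    }

# Hypothetical snapshots taken before and after a detected RDP login.
baseline = {4: "System", 812: r"C:\Windows\explorer.exe"}
current = {
    4: "System",
    812: r"C:\Windows\explorer.exe",
    2044: r"C:\Users\Public\payload.exe",
}
suspicious = new_processes(baseline, current)
print(suspicious)
```

Comparing the path as well as the PID avoids missing a new program that happens to reuse the PID of a process that exited between snapshots.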

4.3. At the File Layer

4.3.1. Deception Files. Deception files are constructed with two goals. First, we need to make the attacker believe that the deception environment is a real user's host. Second, we need to make the attacker believe that there are resources in the environment that are worth attacking.

To simulate a realistic environment, we deploy large numbers of different types of files on it, for example, images, audio files, database files, and documents that can be accessed in a Windows session. Based on Amin Kharraz's research [1], we created four file categories that ransomware always tries to find and encrypt: documents (*.txt, *.doc(x), *.ppt(x), *.xls(x), *.pdf, and *.py), keys and licenses (*.key, *.pem, *.crt, and *.cer), file archives (*.zip, *.rar), and media (*.jp(e)g, *.mp3, and *.avi). We obtain these files in three ways. First, we create files with valid headers and content using standard libraries (e.g., python-docx, python-pptx, pdfkit, and OpenSSL). Second, using Google search syntax and crawler technology, we download a large number of files from the Internet. Third, we collect a number of non-confidential documents from the hosts of 20 volunteers to emulate actual user environments. When we assign user files to the deception environment, the path length is generated randomly. Each folder may have a random set of subfolders, and for each folder a subset of extensions is randomly selected. Furthermore, each directory name is generated from meaningful words. Consequently, we generate paths and extensions for user files, giving them variable file depth and meaningful content.
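The random layout described above (random depth, random subfolders, a random subset of extensions per folder, and directory names drawn from meaningful words) can be sketched as follows. This is only an illustration under stated assumptions: the word list, the extension list, the limits, and the function name `build_tree` are hypothetical, and the files contain placeholder bytes rather than the valid headers and content that the prototype generates with document libraries.

```python
import os
import random
import tempfile

# Hypothetical vocabulary and extension pool for the sketch.
WORDS = ["finance", "reports", "projects", "backup", "photos", "contracts"]
EXTENSIONS = [".txt", ".docx", ".xlsx", ".pdf", ".pem", ".zip", ".jpg"]

def build_tree(root, rng, max_depth=3, max_subfolders=3, max_files=4):
    """Create a random folder tree under `root` and return the created file paths."""
    created = []

    def populate(folder, depth):
        # Each folder gets its own random subset of extensions.
        exts = rng.sample(EXTENSIONS, rng.randint(1, len(EXTENSIONS)))
        for i in range(rng.randint(1, max_files)):
            name = f"{rng.choice(WORDS)}_{i}{rng.choice(exts)}"
            path = os.path.join(folder, name)
            with open(path, "wb") as fh:
                fh.write(b"placeholder content\n")
            created.append(path)
        if depth < max_depth:
            # Subfolder names are drawn from meaningful words.
            for name in rng.sample(WORDS, rng.randint(0, max_subfolders)):
                sub = os.path.join(folder, name)
                os.makedirs(sub, exist_ok=True)
                populate(sub, depth + 1)

    populate(root, 1)
    return created

with tempfile.TemporaryDirectory() as tmp:
    files = build_tree(tmp, random.Random(7))
    print(len(files), "files created")
```

Passing an explicit `random.Random` instance keeps a generated layout reproducible for testing while still allowing a different seed per deployment.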

To make the simulated environment more valuable, we deploy bait information on it, such as databases, false code comments, digital certificates, administrator passwords, SSH keys, VPN keys, browser history passwords, ARP records, and DNS records. When an attacker obtains the bait information, it may trick the attacker into attacking the deception environment.

4.3.2. The File Monitor. The file monitor can detect ransomware by monitoring file type changes and file entropy changes, a method proposed in 2016 [21]. The type of data stored in a file can be described by the order and position of specific byte values unique to a file type. Since files generally retain their file types and formatting in the deception environment, the bulk modification of such files should be considered suspicious. When the monitor sees this type of change, we can infer that a ransomware attack has occurred.

Entropy expresses the randomness of the characters in a string: the higher the entropy value, the stronger the randomness. The Shannon entropy of an array of bytes can be computed as the sum

$$e = \sum_{i=0}^{255} P_{B_i} \log_2 \frac{1}{P_{B_i}} \tag{1}$$

for $P_{B_i} = F_i / \mathit{totalbytes}$, where $F_i$ is the number of instances of byte value $i$ in the array. The entropy value is a number from 0 to 8, and a value of 8 represents a byte array whose values are completely uniformly distributed. Since each byte value occurs with basically the same probability in encrypted ciphertext, the entropy of an encrypted file will be close to this upper limit. Because ransomware always encrypts a large number of files, when we detect that a file changes to a high-entropy file within a short period of time and also changes its file type, we assume that the file is subject to a ransomware attack.
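Equation (1) translates directly into code. The sketch below computes the Shannon entropy of a byte string and checks the two extreme cases mentioned in the text; the function name `shannon_entropy` is our own, and any detection threshold applied to the result would be a deployment choice that the text does not specify.

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # F_i for every byte value i that occurs
    # P_Bi * log2(1 / P_Bi), with P_Bi = F_i / total
    return sum((n / total) * math.log2(total / n) for n in counts.values())

uniform = bytes(range(256)) * 16   # every byte value equally likely
constant = b"A" * 4096             # a single repeated byte value
print(shannon_entropy(uniform), shannon_entropy(constant))
```

A uniformly distributed array reaches the upper limit of 8, while an array holding a single repeated value has entropy 0, which matches the range stated above.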

4.3.3. The Shared Folder Monitor. By traversing the disk storage in real time, the shared folder monitor discovers updates to the shared folders as they happen. It obtains information about the attacker's local files, which the attacker often does not notice, thus revealing some unexpected traceable clues. The monitor can access a list of paths in the attacker's shared folders.

As we originally observed, folders shared through Remote Desktop often have a path on the remote host with the prefix "\\tsclient". When the monitor traverses the storage to this prefix, it uses "FindFirstFile" to find the first file and then uses "FindNextFile" with the returned handle to find the next file. When a returned entry is a folder, the monitor continues to traverse all files under that folder. Initially, the monitor tried to get the full file names and file contents by traversing the new shared folder. However, during actual experimentation, we found that, as the number of files in storage grows, the monitor takes far more time and resources to get all the file contents than just the file paths. Full traversal is also more likely to alert the attacker. Therefore, the monitor only obtains the file paths that the attacker shares on its host, with the help of "GetFileName". Moreover, to prevent encryption by the ransomware, the mounted disk monitor directly transfers the acquired shared file path list to another secure host.
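The path-only traversal can be illustrated with a short, portable sketch. It swaps the FindFirstFile/FindNextFile calls used by the prototype for Python's `os.walk`, and it runs against a temporary directory standing in for a mounted share; the function name `list_shared_paths` and the sample file names are hypothetical.

```python
import os
import tempfile

def list_shared_paths(root):
    """Walk `root` and return sorted file paths without opening any file."""
    paths = []
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            paths.append(os.path.join(folder, name))
    return sorted(paths)

# Demonstration on a temporary directory standing in for a mounted share.
with tempfile.TemporaryDirectory() as share:
    os.makedirs(os.path.join(share, "tools", "builds"))
    for rel in ("notes.txt", os.path.join("tools", "builds", "sample.exe")):
        with open(os.path.join(share, rel), "w") as fh:
            fh.write("x")
    found = [os.path.relpath(p, share) for p in list_shared_paths(share)]
    print(found)
```

Because only directory entries are read, the cost grows with the number of files rather than with their total size, which is the trade-off the paragraph above describes.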

5. Clue Extraction and Analysis

Through the deception environment, we trap the ransomware attacker and collect a lot of information that may contain many traceable clues. However, such traceable clues are often not visually observable and are complex in nature. In addition, many of the above clues contain information that is not helpful in tracing back ransomware attackers. Therefore, to assist in the analysis of the monitor information and extract the main effective traceable clues, in this section we describe how to extract clues and how to analyze traceable clues automatically after extraction.

5.1. Clue Extraction. We mainly obtain the following kinds of clues from the extraction: remote login information (IP addresses), network traffic, clipboard contents (pictures and texts), shared folder information (path strings), and ransomware samples (compile time and compile paths). The shared folder path clues can be obtained directly from the monitor. However, clipboard clues, compile clues, and remote host clues are often not visually observable and are complex in nature. As a result, the extraction module mainly focuses on the extraction of remote host clues, clipboard string clues, and compilation clues.

Remote Host Clue Extraction. The IP address, port numbers, and folder path clues can be obtained directly from the login information and folder paths. In a PCAP capture of the network communication, the TCP packets that exchange configuration information are usually named "ClientData". We extract the client name field from the "ClientData" packet to obtain the attacker's hostname. In addition, the KeyboardLayout field indicates the default keyboard layout of the attacking host; e.g., the Chinese Simplified layout number is 0x0004 and the American English keyboard layout number is 0x0409. The remote user's idiomatic language (the mother tongue) can be determined from the keyboard layout and used to infer the attacker's nationality.

Clipboard Clue Extraction. The main data formats available to the clipboard monitor are Windows Bitmap, GDI file, ANSI characters, Unicode characters, and WAV audio data. We mainly aim at extracting traceable clues of the character types. The monitor extracts character clues in various formats from the clipboard by judging the "DataType" value of the GetClipboardData API.

Compilation Clue Extraction. For all the Windows RDP-based ransomware samples that we examined, we empirically observed that the most commonly used format is the PE file, especially *.exe and *.dll. Some PE files carry compilation information that does not change when the program is moved, which makes it a good way to obtain the creator's information. A PE file mainly consists of five major components: the DOS MZ header, DOS stub, PE header, section headers, and section content. Each component contains a great deal of information, but very little of it can be used directly to identify the creator, and some identification information needs to be extracted from the content of each section. In this paper, many clues can be extracted from PE files to trace back the attacker: file name, PE file type, compiler version, compilation path, compile time, last modified time, last open time, IP address, URL, domain name, language, string, wide character, and so on. We extract most of the clues with PEView [34]. However, since it cannot directly obtain the compilation path, we use the pefile [35] tool to extract paths by locating them in the PE structure.
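As a small illustration of how one such clue sits in the PE structure, the sketch below reads the compile time (the COFF TimeDateStamp field) using only the Python standard library. It is not the PEView or pefile workflow used in this paper, and it does not extract the compilation path. The function name `pe_compile_time` is our own, and the byte buffer is a hand-built stub rather than a real sample.

```python
import struct
from datetime import datetime, timezone

def pe_compile_time(data: bytes) -> datetime:
    """Return the COFF TimeDateStamp of a PE image as a UTC datetime."""
    if data[:2] != b"MZ":
        raise ValueError("missing DOS MZ header")
    # e_lfanew at offset 0x3C holds the file offset of the PE header.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # COFF header after the signature: Machine (2), NumberOfSections (2), TimeDateStamp (4).
    (timestamp,) = struct.unpack_from("<I", data, pe_offset + 8)
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)

# Hand-built stub (not a runnable program) used only to exercise the parser.
stub = bytearray(0x80)
stub[0:2] = b"MZ"
struct.pack_into("<I", stub, 0x3C, 0x40)
stub[0x40:0x44] = b"PE\x00\x00"
struct.pack_into("<HHI", stub, 0x44, 0x014C, 1, 1500000000)
print(pe_compile_time(bytes(stub)))
```

The timestamp is written by the linker and can be altered afterwards, so it is best treated as one clue to be cross-checked against the others listed above.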

Since the extracted clues come in different encoding formats, to facilitate observation and unify subsequent analysis, the extraction system converts all obtained data to UTF-8 and saves it in an SQLite database. Before being submitted to the analysis system, the extracted clues are divided into two categories: string clues and path clues.
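The normalization step can be sketched with Python's built-in `sqlite3` module. The table layout, the column names, and the `store_clue` helper below are assumptions made for illustration; the sketch also assumes that the source encoding of each clue is known from where it was captured (for example, Unicode clipboard text is UTF-16LE).

```python
import sqlite3

def store_clue(conn, category, raw, encoding):
    """Decode `raw` from its source encoding and store it as text."""
    text = raw.decode(encoding, errors="replace")
    conn.execute(
        "INSERT INTO clues (category, value) VALUES (?, ?)", (category, text)
    )

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE clues ("
    "id INTEGER PRIMARY KEY, category TEXT NOT NULL, value TEXT NOT NULL)"
)
# Two clues captured in different encodings end up as uniform text.
store_clue(conn, "path", r"C:\Users\Dell".encode("utf-16-le"), "utf-16-le")
store_clue(conn, "string", "пример".encode("cp1251"), "cp1251")
conn.commit()
rows = conn.execute("SELECT category, value FROM clues ORDER BY id").fetchall()
print(rows)
```

SQLite stores text in UTF-8 by default, so decoding each clue to a Python string before insertion is enough to unify the encodings.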

5.2. Automatic Analysis. At this point, the clues extracted by the system mainly include string clues and path clues. The number of string clues is large, and they are mixed with many unidentifiable strings. Because the number of path clues is also very large, with no semantic correlation, it is difficult to identify traceable clues manually. So we focus on how to automate the identification of traceable clues for path clues.

5.2.1. Users and Programs Analysis. In the analysis of the path clues, we first propose to obtain attacker clues by identifying context-related segments on the same path. For instance, each user has a separate user folder located in the "Users" folder under drive C. As a result, the system can obtain the user name on the attacking host from the folder name under the "Users" folder (e.g., C:\Users\Dell). The "Program Files" folder usually contains the names of the software programs installed on the machine (e.g., C:\Program Files\Microsoft Visual Studio 11.0). What is more, the QQ account number is always located in the "QQ\QQfile" folder (e.g., D:\QQ\QQfile\86******086\FileRecv).

In this way, the analysis system can quickly and accurately get host names, email accounts, program names, social software numbers, and other traceable clues that are carried in the attacker's file paths. However, such user information accounts for only a small share of the overall clues and is acquired by chance. Therefore, we conduct further analysis on this basis.
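A minimal sketch of this context-based lookup is given below. It covers only the "Users" and "Program Files" cases from the examples; the function name `analyze_path`, the returned keys, and the sample paths are assumptions made for illustration.

```python
from pathlib import PureWindowsPath

def analyze_path(path: str) -> dict:
    """Extract user and program names from one Windows path clue."""
    parts = PureWindowsPath(path).parts  # e.g. ('C:\\', 'Users', 'Dell', ...)
    clues = {}
    # The segment that follows a known folder name carries the clue.
    for i, part in enumerate(parts[:-1]):
        key = part.lower()
        if key == "users":
            clues["user"] = parts[i + 1]
        elif key in ("program files", "program files (x86)"):
            clues["program"] = parts[i + 1]
    return clues

print(analyze_path(r"C:\Users\Dell\Desktop\tool.exe"))
print(analyze_path(r"C:\Program Files\Microsoft Visual Studio 11.0\VC"))
```

`PureWindowsPath` parses Windows-style paths on any platform, so the analysis host does not need to run Windows.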

5.2.2. Account Analysis. Through the analysis of APT1 and some other attribution reports, we find that the mapping between the attacker and a physical-world identity can be better obtained by analyzing the account information left by the attacker. This information includes, but is not limited to, the location of the IP address, the spelling and registration of the domain name, the IP address corresponding to a URL, and the domain name of a mailbox account. Because it is difficult to identify this information effectively in a large number of string and path clues, the analysis system automatically identifies IP addresses, domain names, URLs, and mailbox accounts by regular expression matching. Then, with the help of threat intelligence and big data technology, more relevant clues are obtained.
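The regular expression matching step can be sketched as follows. The patterns are deliberately simple illustrations of our own, not the expressions used by the prototype; bare domain names are left out for brevity, and the sample string uses reserved documentation addresses.

```python
import re

# Illustrative patterns only; production rules would need to be stricter.
PATTERNS = {
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "url": re.compile(r"\bhttps?://[^\s\"'<>]+"),
    "ipv4": re.compile(
        r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
    ),
}

def find_accounts(text):
    """Return every match of each pattern found in one clue string."""
    return {name: pattern.findall(text) for name, pattern in PATTERNS.items()}

sample = "contact admin@example.com via http://example.org/pay from 192.0.2.10"
print(find_accounts(sample))
```

The matches would then be passed to the threat intelligence lookups mentioned above, which are outside the scope of this sketch.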

523 Language Identification User languages often help todetermine an attackers idiomatic language but because ofa large number of languages in different countries andthe high similarity of some languages we use automaticanalysis systems to identify the language of clues We testthe accuracy of two language identification toolkits usingentire path information in four different languages We havefound ldquolangidpyrdquo toolkit to be overall more accurate thanldquolangdetectrdquo toolkit The comparison results are shown inFigure 3 The langidpy is a language identification toolkitdeveloped by Lui and Baldwin at the University ofMelbourne[36] It combines a naive Bayes classifier with cross-domainfeature selection to provide domain-independent languageidentification

5.2.4. Traceable Strings Identification. When traceable strings are needed, traditional string analysis methods usually use Named Entity Recognition (NER). However, the clues to be analyzed mainly consist of strings and paths. Strings before and after a path separator have few semantic correlations. What is more, the strings between path separators and the remaining strings to be analyzed are mostly semantically unrelated because of their limited length. This paper therefore proposes an algorithm that can quickly and automatically identify the traceable strings in strings and paths.

In order to filter out meaningful traceable clues related to the attacker's identity, the path clues and string clues are split into strings, and common words and gibberish are identified in the following steps. Figure 4 shows the process of the automatic traceable clue identification system.

Figure 3: Accuracy of two language identification tools. In tests using the same path clues of known language, the accuracy of the langid toolkit for English, Chinese, Russian, and Korean was 88%, 100%, 99%, and 99.0%, respectively, while the accuracy of the langdetect toolkit was 73%, 0.85%, 40%, and 4.60%, respectively. (Bar chart; series: langid toolkit correct recognition, langid toolkit incorrect recognition, langdetect toolkit correct recognition, and langdetect toolkit incorrect recognition for each language.)

Figure 4: Process of automatically identifying traceable clues. (Flowchart; stages shown: path clues, path split, stop words, string clues, sentence tokenize, word tokenize, sign split, alphabetic case characteristics, strings, common words detect, gibberish detect, traceable strings identification, and result.)

Make Stop Words. The system splits the path by the path delimiter. Path strings that are common to multiple computers have no identifying effect, so we take the file string names that are common to 20 normal user computers as stop words. The system then removes the stop words from the strings after each split.
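The stop-word step can be sketched as follows. The helper names and the `min_hosts` parameter are assumptions for this example; the paper only states that segments common to 20 normal user computers are used as stop words.

```python
from collections import Counter
from pathlib import PureWindowsPath

def split_path(path):
    """Split a Windows path into segments, dropping the drive or share anchor."""
    pure = PureWindowsPath(path)
    return list(pure.parts[1:] if pure.anchor else pure.parts)

def build_stop_words(paths_by_host, min_hosts):
    """Return lowercase segments that occur on at least min_hosts reference hosts."""
    hosts_per_segment = Counter()
    for paths in paths_by_host.values():
        # Count each segment once per host, however often it appears there.
        seen = {seg.lower() for p in paths for seg in split_path(p)}
        hosts_per_segment.update(seen)
    return {seg for seg, n in hosts_per_segment.items() if n >= min_hosts}

def remove_stop_words(path, stop_words):
    """Keep only the segments of path that are not stop words."""
    return [seg for seg in split_path(path) if seg.lower() not in stop_words]

if __name__ == "__main__":
    reference = {
        "pc1": [r"C:\Users\alice\Desktop\a.txt"],
        "pc2": [r"C:\Users\bob\Desktop\b.txt", r"C:\Windows\System32"],
    }
    stop_words = build_stop_words(reference, min_hosts=2)
    print(remove_stop_words(r"C:\Users\Dell\Desktop\botnet", stop_words))
```

Counting hosts rather than raw occurrences keeps a folder name that is repeated many times on a single machine from becoming a stop word.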

Segmentation. After removing the stop words, we split each path segment into sentences and words with a tokenizer (e.g., NLTK, NER) [37]. However, because the segments of a path are not semantically related, this segmentation is less effective. Based on our extensive study of path information, users often prefer symbols such as underscores to split file names. In addition, they like to use capitalization in multiword strings to facilitate reading. As a result, the system uses these common separator symbols and alphabetic case characteristics to segment the strings again.
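A minimal sketch of this second segmentation pass, assuming ASCII names, is shown below. The function name `segment` and the exact regular expressions are illustrative choices, not the authors' implementation; non-ASCII characters (for example Chinese folder names) are treated as separators here and would need separate handling.

```python
import re

def segment(name):
    """Split a name on symbols, then on alphabetic case and digit boundaries."""
    tokens = []
    # First pass: split on any non-alphanumeric symbol (underscore, dash, dot...).
    for chunk in re.split(r"[^A-Za-z0-9]+", name):
        # Second pass: acronyms, capitalized or lowercase words, and digit runs.
        tokens.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", chunk))
    return tokens

if __name__ == "__main__":
    print(segment("VirtualDesktop_v2"))
    print(segment("my-RDPTool.exe"))
```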

Common Words Removal. After segmentation, the system obtains many repeated strings. Most of them are commonly used words and do not help in identifying ransomware attackers. As a result, we filter out the common words by comparing each string with a dictionary. If the string matches a dictionary entry, the system marks the common-word property of the word as false; otherwise, it marks it as true.
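The tagging rule can be written in a few lines. In this sketch the dictionary is a small hypothetical word set passed in by the caller; the paper does not say which dictionary the system uses.

```python
def tag_common_words(tokens, dictionary):
    """Map each token to its common-word tag: False for dictionary words, True otherwise."""
    vocabulary = {word.lower() for word in dictionary}
    return {token: token.lower() not in vocabulary for token in tokens}

if __name__ == "__main__":
    # Hypothetical mini-dictionary for demonstration only.
    print(tag_common_words(["program", "gandcrab"], {"program", "file", "desktop"}))
```

With this convention a tag of True means "not a common word", which matches the values reported for "program" and "gandcrab" in Table 1.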

Gibberish Detect. Based on our analysis and observation, we found that there is a large amount of garbled information in path clues. Most of these strings are generated randomly. People can recognize this randomness manually, but it is difficult for a program to recognize it automatically. As a result, we propose detecting gibberish by training a Markov chain model with English texts. For each string X, the probability of the (i+1)-th character given the preceding characters is

$$P(X_{i+1} = x \mid X_1 = x_1, X_2 = x_2, \ldots, X_i = x_i) = P(X_{i+1} = x \mid X_i = x_i) \tag{2}$$

Each letter is related to the letters immediately before and after it. This relationship can be expressed by 2-grams. For example, the string "Ransomware" yields the 2-grams "Ra", "an", "ns", ..., "e[space]". The analysis system records how often characters appear next to each other in a collection of gibberish and some commonly used normal phrases or vocabulary. It normalizes the counts after reading through the training data and then measures the probability of generating a string from these normalized counts by multiplying the probabilities of the pairs of adjacent characters in the string [38]. From the training data, the system also computes the average transition probability of gibberish and of normal strings, respectively. This probability measures how likely the string is according to the data observed by the model. When we test the string "Ransom", it computes

$$\mathrm{prob}[\text{'Ransom'}] = \mathrm{prob}[\text{'R'}][\text{'a'}] \times \mathrm{prob}[\text{'a'}][\text{'n'}] \times \cdots \times \mathrm{prob}[\text{'m'}][\text{' '}] \tag{3}$$

If the input string is gibberish, it will pass through some pairs with very low counts in the training phase and hence have a low probability. The system then looks at the amount of probability per character for a few known normal strings and a few examples of known gibberish, and it picks a threshold between the highest probability among the gibberish examples and the lowest probability among the normal strings:

$$\mathit{threshold} = \frac{\min(\mathit{normal\_probs}) + \max(\mathit{gibberish\_probs})}{2} \tag{4}$$

When we analyze a string 'X', if prob['X'] > threshold, the analysis system views the string as a normal string. If prob['X'] ≤ threshold, the analysis system tags the string as 'False'. The system then removes the strings with the 'False' tag as gibberish.
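The gibberish test of (2)-(4) can be sketched as below. This is a simplified illustration, not the authors' code: the 27-symbol alphabet, the add-one smoothing, the function names, and the tiny training corpus are assumptions, and the score is the geometric mean of the transition probabilities so that strings of different lengths are comparable ("probability per character").

```python
import math

ALPHABET = "abcdefghijklmnopqrstuvwxyz "  # assumed symbol set: letters and space

def bigrams(text):
    """Return adjacent character pairs after lowercasing and dropping other symbols."""
    chars = [c for c in text.lower() if c in ALPHABET]
    return list(zip(chars, chars[1:]))

def train(corpus_lines, smoothing=1.0):
    """Count character transitions and normalize each row to log probabilities."""
    counts = {a: {b: smoothing for b in ALPHABET} for a in ALPHABET}
    for line in corpus_lines:
        for a, b in bigrams(line):
            counts[a][b] += 1
    model = {}
    for a, row in counts.items():
        total = sum(row.values())
        model[a] = {b: math.log(c / total) for b, c in row.items()}
    return model

def avg_transition_prob(text, model):
    """Geometric mean of the transition probabilities along the string."""
    pairs = bigrams(text)
    if not pairs:
        return 0.0  # too short to score; treated as gibberish in this sketch
    return math.exp(sum(model[a][b] for a, b in pairs) / len(pairs))

def pick_threshold(normal, gibberish, model):
    """Midpoint between the weakest normal string and the strongest gibberish string."""
    normal_probs = [avg_transition_prob(s, model) for s in normal]
    gibberish_probs = [avg_transition_prob(s, model) for s in gibberish]
    return (min(normal_probs) + max(gibberish_probs)) / 2

def is_normal(text, model, threshold):
    """True if the string scores above the threshold, i.e. is not gibberish."""
    return avg_transition_prob(text, model) > threshold

if __name__ == "__main__":
    model = train(["ransom ware attack remote desktop"] * 20)  # toy corpus
    threshold = pick_threshold(["ransom", "remote"], ["xqzjvy", "qqqzzz"], model)
    for candidate in ("ransom", "vwrtjty"):
        print(candidate, is_normal(candidate, model, threshold))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, which is the product in (3) for long strings.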


Table 1: Traceable strings recognized result [13].

String       Common words recognized   Gibberish recognized   Final recognized
Wh****ter    True                      True                   True
gandcrab     True                      True                   True
kate z       True                      True                   True
vwrtjty      True                      False                  False
program      False                     True                   False

Identification Result. After the above analysis, the automated analysis system outputs the strings for which both the common-word tag and the gibberish tag are True; this is the list of traceable clues [13]. Table 1 is reproduced from Z. H. Wang et al. (2018) [under the Creative Commons Attribution License/public domain].

When the string is a normal word such as "program", common-word recognition marks it "False". Gibberish recognition marks a string "True" only if it is a non-randomly generated identifier string, and it marks strings that do not conform to the spelling patterns of common words (e.g., vwrtjty) "False". A string is considered to have auxiliary traceability only if both of the analysis values are true. We rely extensively on string inversion to verify the accuracy of the system and find that most of the traceable strings on the volunteers' storage can be recognized. Table 1 shows the recognition results of several typical strings.

6. Evaluation

In this paper, we evaluate the proposed method with two experiments. The goal of the first experiment is to demonstrate that the proposed method can help trace back to RDP-based ransomware attackers and capture their private information. The goal of the second experiment is to demonstrate that the method can also automatically recognize clues in ransomware samples and help trace back to ransomware makers.

6.1. RDP-Based Ransomware Attacker

6.1.1. Clue Capture and Analysis. We used Windows 7 virtual machines to deploy the deception environment and assess its effectiveness. While Windows 7 is not required, it was chosen because of its wide range of applications and because it is one of the main targets of ransomware. We invited 122 professional volunteers to help with the experiment and provided them with an experimental fleet of 12 virtual hosts. Most of the volunteers' login information, clipboard content, shared folder paths, and uploaded PE files were successfully captured by the monitor system.

We chose the path information of 12 computers collected from the volunteers and recorded the rate of convergence after each step of the analysis. Figure 5 shows how the number of traceable clues changes as the data is processed by each step of the analysis system [13]. Figure 5 is reproduced from Z. H. Wang et al. (2018) [under the Creative Commons Attribution License/public domain].

Figure 5: The data volume changes in 12 volunteers' traceable clues through the analysis system. The screening rate can reach 98.34% [13]. (Line chart titled "Analysis Data Change Figure"; x-axis: analysis process, with the steps source, stopword, tokenize, gibberish, and result; y-axis: data quantity; one series per volunteer, volunteer1 to volunteer12.)

6.2. Traceback Attackers. Next, we carried out a detailed analysis of the information left by one of the volunteers' attacks. From the automatically analyzed data of this attacker, we can infer that the attacker's real identity may have the following characteristics: (1) the attacker is likely to come from China; (2) the attacker may work in security or a related field; (3) the attacker may work in the Chinese Academy of Sciences and have relations with a large security manufacturer in China. The conclusion is shown in Figure 6, which presents the results from the analysis system in five sections.

Username. Through automatic analysis, a total of three user names from the attacker's host were extracted from a large number of clues. From the "Dell" user name, it can be assumed that the attacker was using a Dell host.

Programs. The automatic analysis system identified several typical software programs installed on the attacker's host. For instance, "QQ" is a widely used real-time social tool in China. "AliPay" is an online payment tool developed by Alibaba and widely used in China. "Sogou input" is a Chinese input method developed by the Chinese company Sogou. "MeiTu" is photo beautification software developed by a Chinese company and widely used in China. In addition, "Sinfor" and "360" are both well-known security manufacturers in China with a large number of security-related products. "Visual Studio" is common development software. We find that much Chinese-made software widely used in China was installed on the attacker's host, as well as some development and security products that are less often installed by ordinary users.

Figure 6: Analysis of an attacker using the automatic analysis system, which displays traceable information about the attacker in five aspects: host user name, program software, keyboard layout, account information, and traceable strings.

Keyboard Layout. Through network communication analysis, we find that the attacker's keyboard layout is Simplified Chinese, from which it can be inferred that the attacker's native language is probably Chinese. This is consistent with the software installed on the host.

Account Number. The system automatically identified two QQ accounts and two e-mail accounts. The e-mail accounts are Chinese Academy of Sciences business accounts, which supports the above assumption that the attacker is from China and is likely to work in the Chinese Academy of Sciences. Through the registration information of the QQ account (as shown in Figure 7), we can find the following information: (1) the attacker could be a male aged 32; (2) he does Internet-related work, likely with 360 Security; (3) he is located in Haidian District, Beijing, China. In the registration information of the "353***208" account, the mail, work, and other fields are empty, and the nickname is a special English string, "Wh***ter"; it can be presumed that the account is likely for private use.

Traceable Strings. The automatic analysis system sifts strings from the monitor that may be traceable. "Freebuf" is a security information exchange website commonly used by Chinese security personnel. "Botnet" is a secondary attack method that is often used by attackers and is often a main research direction for security researchers. "**e" and "**t" are acronyms for departments under the Chinese Academy of Sciences, while "A***team" is a security research team in the "**e" department. "Visumantrag" is often used when processing visas from various countries. "Wh***ter" matches the nickname of the QQ account "353***208" and is likely to be the attacker's regular nickname.

Figure 7: The registration infographic of QQ account 372***82.

6.3. Ransomware Maker Automatic Analysis. In order to verify that the analysis system can be used to trace back the ransomware maker, we obtained more than 8000 ransomware samples from VirusTotal [39] and extracted the Program Data Base (PDB) path from the samples (as shown in Figure 8). It shows that, despite a large number of attackers deliberately erasing the compilation information, legacy PDB path information can still be found in a large number of ransomware samples.

Table 2: Same identifier for different samples [13].

Sample hash                          Same identity string
68dd613973f8a***7befe0f4b461d258     jqxxpphoaonkje7z2fa
ce569b9020079***9ae0f090266a60cc     jqxxpphoaonkje7z2fa
0c084aef41969***1f427a1b65e1b146     jqxxpphoaonkje7z2fa
4fa82d297e712***e1fdf312c044814a     jqxxpphoaonkje7z2fa
36da0286d42cd***306fba99239c8238     jqxxpphoaonkje7z2fa
821beaf40744be***ba862dbc6d90f70c    jqxxpphoaonkje7z2fa
bc839bcb88***7422bbd2fa812dd3        jqxxpphoaonkje7z2fa
de02391b6a9f1***0762a6f49cf32501     jqxxpphoaonkje7z2fa

Figure 8: Of the 8075 real ransomware samples, 11% (878) yielded a Program Data Base (PDB) path, while the authors of the remaining 89% (7197) may have chosen to hide the PDB path during compilation. (Bar chart; series: ransomware with PDB path and ransomware without a PDB path.)
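One way to pull a PDB path out of a sample without a full PE parser is to search the raw bytes for the CodeView "RSDS" debug record, which stores a 16-byte GUID and a 4-byte age followed by the null-terminated PDB path. The sketch below takes that approach; it is an assumption for illustration, since the paper does not describe its extraction routine (the reference list points to PEView and python-pefile). The function name and the 260-byte path limit are likewise illustrative.

```python
import re

# "RSDS" signature, 16-byte GUID + 4-byte age, then a null-terminated path ending in .pdb.
RSDS_RE = re.compile(rb"RSDS.{20}([^\x00]{1,260}?\.[pP][dD][bB])\x00", re.DOTALL)

def extract_pdb_paths(data):
    """Return every PDB path found in CodeView RSDS records inside raw file bytes."""
    # Paths compiled on non-English systems may use other encodings;
    # undecodable bytes are replaced rather than raising an error.
    return [m.group(1).decode("utf-8", errors="replace") for m in RSDS_RE.finditer(data)]

if __name__ == "__main__":
    # Synthetic record for demonstration; not taken from a real sample.
    record = b"RSDS" + bytes(16) + (1).to_bytes(4, "little") + rb"E:\work\demo\Release\demo.pdb" + b"\x00"
    print(extract_pdb_paths(b"MZ" + bytes(64) + record))
```

A byte-level scan can also hit data that merely resembles a debug record, so results from such a scan are best cross-checked against the PE debug directory.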

What is more, the analysis system is used to automatically analyze the PDB paths of more than 800 different ransomware samples. We found multiple identity strings shared across the analysis results of different samples. One typical result is shown in Table 2 [13]. Table 2 is reproduced from Z. H. Wang et al. (2018) [under the Creative Commons Attribution License/public domain]. We used VirusTotal to validate the samples and found that they were all Cobra family ransomware. As a result, it can be assumed that these samples were made by the same ransomware maker. When one sample is traced back, the conclusion about the ransomware maker also applies to the other samples. When different traceable information is extracted from different samples, the identity of the ransomware maker can be described by integrating that information.

Language Identification. When we apply language identification to the PDB paths of 878 samples, the result is as shown in Figure 9. The system automatically identifies the languages of the paths, which mainly comprise 11 languages, of which English accounts for the largest proportion.
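Linking samples through shared identity strings amounts to inverting the mapping from samples to their traceable strings. The sketch below assumes that mapping is already available from the analysis system; the function name, the `min_samples` parameter, and the sample names and strings in the demo are hypothetical.

```python
from collections import defaultdict

def shared_identifiers(strings_by_sample, min_samples=2):
    """Return each traceable string found in at least min_samples samples, with those samples."""
    owners = defaultdict(set)
    for sample, strings in strings_by_sample.items():
        for value in set(strings):
            owners[value].add(sample)
    return {value: sorted(samples) for value, samples in owners.items()
            if len(samples) >= min_samples}

if __name__ == "__main__":
    # Hypothetical sample names and strings for demonstration only.
    demo = {
        "sample_a": ["builderx", "release"],
        "sample_b": ["builderx"],
        "sample_c": ["otherdev"],
    }
    print(shared_identifiers(demo))
```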

Figure 9: Of the 878 ransomware sample PDB paths, English accounted for 510 (58.09%), and the remaining detected languages were Chinese 31 (3.53%), Dutch 40 (4.56%), Norwegian Nynorsk 4 (0.46%), Slovenian 5 (0.57%), Norwegian Bokmal 1 (0.11%), German 58 (6.61%), Italian 3 (0.34%), Hungarian 35 (3.99%), Maltese 18 (2.05%), Polish 5 (0.57%), Finnish 151 (17.20%), Luxembourgish 3 (0.34%), Danish 10 (1.14%), and Spanish 4 (0.46%).

Actual Case Analysis. The automatic identification system helped us identify the traceable strings in the PDB path. Take as an example a sample PDB path (the Chinese has been translated into English) with an MD5 value of "60c6a92afb***6d0c16f7":

"ELIAOSTUDIOUACsimulated an improved version of the 360 anti-virus program 1117 Bale1*41**2****0(**1) 9818 programWriteSystem32ReleaseVirtualDesktoppdb"

We can infer from the automatic analysis result that the ransomware maker is likely from China. This is supported by the following evidence.

(1) Language Identification. Identifying the language of the path suggests that the attacker may have come from China. From the Chinese strings in the path, we find that the text describes an improvement to bypass the detection of 360 security software, which has a large market in China.

(2) IP Address. The system identified an IP address in the PDB path. With the help of a threat intelligence database, it is found that the IP came from Xinjiang, China (Figure 10).


Figure 10: Partial infographic of IP 1*41**2****0 in the threat intelligence library.

(3) Traceable Strings. The extracted identifier string "LIAO" resembles Chinese pinyin and is most likely a Chinese surname. To sum up, we speculate that the ransomware maker is most likely from China.

7. Conclusion

In this paper, we focus on tracing back RDP-based ransomware. Recently, more and more ransomware attackers are using RDP attacks to spread ransomware with impunity due to the use of strong cryptography. We propose tracing back the attack sources to deter RDP-based ransomware with the proposed deception environment. It collects traceable clues and performs automatic analysis by using natural language processing and machine learning techniques. The evaluation shows that it is able to trap attackers and collect traceable clues left by attackers in a deception environment. With automatic clue identification, it can reduce the amount of traceable clues to about 2%. We use this method to analyze two practical cases and show its effectiveness. By tracing back to ransomware attackers, we can provide a strong deterrent to stifle the development of ransomware.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Additional Points

This manuscript provides additional deception environment monitors and analytical methods by using more volunteers and samples. It proves the validity of the methods through specific traceback cases (i.e., tracing back the ransomware attacker and tracing back the ransomware maker).

Disclosure

An earlier version of this paper was presented at the International Conference IEEE Third International Conference on Data Science in Cyberspace, Guangzhou, China, 18-21 June 2018. The authors' initial conference paper focused mainly on the effectiveness of deception environment monitors and the screening rate of the analysis system.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors would like to thank VirusTotal for providing ransomware samples and the volunteers from the University of Chinese Academy of Sciences. This work is supported in part by the National Key Research and Development Plan (Grant no. 2018YFB0803504) and the National Natural Science Foundation of China (61871140, 61572153, U1636215, 61572492, and 61672020).

References

[1] A. Kharraz, "UNVEIL: A Large-Scale, Automated Approach to Detecting Ransomware," in Proceedings of the 25th USENIX Security Symposium (USENIX Security 16), pp. 757–772, USENIX Association, 2016.

[2] K. Savage, P. Coogan, and H. Lau, "The evolution of ransomware," Tech. Rep., 2015.

[3] A. Kharraz, W. Robertson, D. Balzarotti, L. Bilge, and E. Kirda, "Cutting the Gordian Knot: A Look Under the Hood of Ransomware Attacks," in Detection of Intrusions and Malware, and Vulnerability Assessment, vol. 9148 of Lecture Notes in Computer Science, pp. 3–24, Springer International Publishing, Cham, 2015.

[4] S. Mohurle and M. Patil, "A brief study of Wannacry Threat: Ransomware Attack 2017," International Journal of Advanced Research in Computer Science, vol. 8, no. 5, 2017.

[5] X. Yu, Z. Tian, J. Qiu, and F. Jiang, "A Data Leakage Prevention Method Based on the Reduction of Confidential and Context Terms for Smart Mobile Devices," Wireless Communications and Mobile Computing, vol. 2018, Article ID 5823439, 11 pages, 2018.

[6] J. Yaneza, "Brute Force RDP Attacks Plant CRYSIS Ransomware," https://blog.trendmicro.com/trendlabs-security-intelligence/brute-force-rdp-attacks-plant-crysis-ransomware.

[7] "10% of Ransomware Attacks on SMBs Targeted IoT Devices," https://www.darkreading.com/application-security/10--of-ransomware-attacks-on-smbs-targeted-iot-devices-/d/d-id/1329817.

[8] Q. Tan, Y. Gao, J. Shi, X. Wang, B. Fang, and Z. H. Tian, "Towards a Comprehensive Insight into the Eclipse Attacks of Tor Hidden Services," IEEE Internet of Things Journal, 2018.

[9] "Kaspersky Security Bulletin: STORY OF THE YEAR 2017," 2017, https://media.kasperskycontenthub.com/wp-content/uploads/sites/43/2018/03/07164824/KSB_Story_of_the_Year_Ransomware_FINAL_eng.pdf.

[10] Z. Tian, Y. Cui, L. An et al., "A Real-Time Correlation of Host-Level Events in Cyber Range Service for Smart Campus," IEEE Access, vol. 6, pp. 35355–35364, 2018.

[11] R. Ross et al., Managing Information Security Risk: Organisation, Mission and Information System View, National Institute of Standards and Technology, 2011, http://csrc.nist.gov/publications/nistpubs/800-39/SP800-39-final.pdf.

[12] Y. Wang, Z. Tian, H. Zhang, S. Su, and W. Shi, "A Privacy Preserving Scheme for Nearest Neighbor Query," Sensors, vol. 18, no. 8, p. 2440, 2018.

[13] Z. H. Wang, X. Wu, C. G. Liu, Q. X. Liu, and J. L. Zhang, "RansomTracer: Exploiting Cyber Deception for Ransomware Tracing," in Proceedings of the IEEE Third International Conference on Data Science in Cyberspace, pp. 227–234, 2018.

[14] B. A. S. Al-rimy, M. A. Maarof, and S. Z. M. Shaid, "Ransomware threat success factors, taxonomy, and countermeasures: A survey and research directions," Computers & Security, vol. 74, pp. 144–166, 2018.

[15] F. Jiang, Y. Fu, B. B. Gupta et al., "Deep Learning based Multi-channel intelligent attack detection for Data Security," IEEE Transactions on Sustainable Computing, 2018.

[16] N. Andronio, S. Zanero, and F. Maggi, "HelDroid: dissecting and detecting mobile ransomware," in Research in Attacks, Intrusions, and Defenses, vol. 9404 of Lecture Notes in Computer Science, pp. 382–404, Springer, 2015.

[17] F. Mercaldo, V. Nardone, A. Santone, and C. A. Visaggio, "Ransomware Steals Your Phone. Formal Methods Rescue It," in Formal Techniques for Distributed Objects, Components, and Systems: 36th IFIP WG 6.1 International Conference, FORTE 2016, held as part of the 11th International Federated Conference on Distributed Computing Techniques, DisCoTec 2016, E. Albert and I. Lanese, Eds., pp. 212–221, Springer International Publishing, 2016.

[18] D. Sgandurra, L. Muñoz-González, R. Mohsen, and E. C. Lupu, "Automated Dynamic Analysis of Ransomware: Benefits, Limitations and use for Detection," https://arxiv.org/abs/1609.03020.

[19] S. Song, B. Kim, and S. Lee, "The Effective Ransomware Prevention Technique Using Process Monitoring on Android Platform," Mobile Information Systems, vol. 2016, Article ID 2946735, 9 pages, 2016.

[20] T. Yang, Y. Yang, K. Qian, D. C.-T. Lo, Y. Qian, and L. Tao, "Automated detection and analysis for android ransomware," in Proceedings of the 17th IEEE International Conference on High Performance Computing and Communications, IEEE 7th International Symposium on Cyberspace Safety and Security, and IEEE 12th International Conference on Embedded Software and Systems, HPCC-ICESS-CSS 2015, pp. 1338–1343, USA, August 2015.

[21] N. Scaife, H. Carter, P. Traynor, and K. R. B. Butler, "CryptoLock (and Drop It): Stopping Ransomware Attacks on User Data," in Proceedings of the 36th IEEE International Conference on Distributed Computing Systems, ICDCS 2016, pp. 303–312, Japan, June 2016.

[22] M. M. Ahmadian and H. R. Shahriari, "2entFOX: A framework for high survivable ransomwares detection," in Proceedings of the 13th International ISC Conference on Information Security and Cryptology, ISCISC 2016, pp. 79–84, Iran, September 2016.

[23] M. M. Ahmadian, H. R. Shahriari, and S. M. Ghaffarian, "Connection-monitor & connection-breaker: A novel approach for prevention and detection of high survivable ransomwares," in Proceedings of the 12th International ISC Conference on Information Security and Cryptology, ISCISC 2015, pp. 79–84, Iran, September 2015.

[24] D. Kim, W. Soh, and S. Kim, "Design of Quantification Model for Prevent of Cryptolocker," Indian Journal of Science and Technology, vol. 8, no. 19, 2015.

[25] C. Moore, "Detecting Ransomware with Honeypot Techniques," in Proceedings of the 2016 Cybersecurity and Cyberforensics Conference (CCC), pp. 77–81, Amman, Jordan, August 2016.

[26] F. Mbol, J. Robert, and A. Sadighian, "An Efficient Approach to Detect TorrentLocker Ransomware in Computer Systems," in Cryptology and Network Security, S. Foresti and G. Persiano, Eds., vol. 10052 of Lecture Notes in Computer Science, pp. 532–541, Springer International Publishing, Cham, 2016.

[27] K. Cabaj, P. Gawkowski, K. Grochowski, and D. Osojca, "Network activity analysis of CryptoWall ransomware," Przegląd Elektrotechniczny, vol. 91, no. 11, pp. 201–204, 2015.

[28] J. Chen, Z. Tian, X. Cui, L. Yin, and X. Wang, "Trust Architecture and Reputation Evaluation for Internet of Things," Journal of Ambient Intelligence & Humanized Computing, vol. 2, pp. 1–9, 2018.

[29] C. Le Guernic and A. Legay, "Ransomware and the Legacy Crypto API. Paper presented at the Risks and," in Risks and Security of Internet and Systems: 11th International Conference, CRiSIS 2016, Roscoff, France, 2017.

[30] L. Pingree, Emerging Technology Analysis: Deception Techniques and Technologies Create Security Technology Business Opportunities, Gartner, 2015.

[31] "Ransomware and Businesses," https://www.symantec.com/content/dam/symantec/docs/security-center/white-papers/ransomware-and-businesses-16-en.pdf, 2016.

[32] A. Kharraz and E. Kirda, "Redemption: Real-Time Protection Against Ransomware at End-Hosts," in Proceedings of the International Symposium on Research in Attacks, Intrusions, and Defenses, pp. 98–119, 2017.

[33] F. Bergstrand, J. Bergstrand, and H. Gunnarsson, Localization of Spyware in Windows Environments.

[34] "PEView," https://www.aldeid.com/wiki/PEView.

[35] "python-pefile," https://pypi.python.org/pypi/pefile.

[36] M. Lui and T. Baldwin, "langid.py: An off-the-shelf language identification tool," in Proceedings of the ACL 2012 System Demonstrations, Association for Computational Linguistics, pp. 25–30, 2012.

[37] S. Bird and E. Loper, "NLTK: the natural language toolkit," in Proceedings of the ACL 2004 on Interactive Poster and Demonstration Sessions, Association for Computational Linguistics, Barcelona, Spain, July 2004.

[38] "Rrenaud, Gibberish-Detector," https://github.com/rrenaud/Gibberish-Detector/blob/master/README.rst.

[39] "VirusTotal," http://virustotal.com.


from the monitor information. Fourth, we use automatic analysis to screen a large number of clues for tracing back the attacker. Finally, we generate a report to trace back the RDP-based ransomware attacker. We refer readers to Sections 4 and 5 for the detailed implementation of this prototype.

3.1. Deception Environment. Generally, the ransomware attack execution stage has two steps: login and spread [31]. Building a deception environment is nontrivial in practice because it must make the ransomware attacker believe that it belongs to a real user and that the user data is worth attacking. Because advanced attackers always exploit static features of certain analysis systems before they launch attacks [32], an intuitive approach to address such reconnaissance is to build the user environment in such a way that the user data is valid, real, and nondeterministic. In addition, the environment serves as an "enticing target" to encourage ransomware attackers. We elaborate on how to generate an artificial, realistic, and enticing user environment for RDP-based ransomware in Section 4.

RDP-based attackers commonly upload malicious programs in the following ways before spreading ransomware: (1) the attacker downloads malicious programs from the Internet; (2) programs are transferred through FTP, SCP, or other transport protocols; (3) programs are uploaded through the clipboard; (4) programs are uploaded through a shared folder. The clipboard and shared folders are most commonly used to transfer programs in RDP-based ransomware attacks because they are simple and convenient. However, both are easy for our proposed system to monitor.

3.2. Environment Monitor. After logging in to the environment, the attacker typically uses a shared folder and the clipboard on the remote PC to transfer ransomware programs from the attacker machine. To collect more information about the attacker without being observed, this paper proposes three monitor layers: the network layer, the host layer, and the file layer. We elaborate on how to configure the monitor system for the deception environment in Section 4.

The Network Layer Monitor. The network layer monitor detects a remote connection and collects information including the remote IP addresses, remote ports, status codes of ports, keyboard layout, and so on. When the RDP-based attacker logs in to the host, the monitor can obtain this information and detect the attack without the attacker's knowledge.

The Host Layer Monitor. We propose to detect changes in items such as processes and clipboards by monitoring the host layer. The host layer monitor can gather information about the attacker's behavior and their use of system applications in the deception environment. For instance, as the clipboard is in the system-level heap space, any application in the system has access to it. RDP-based ransomware always takes advantage of the clipboard to interact between applications. Moreover, the monitor might obtain clues left by attackers who use the clipboard locally, as the Windows system shares the clipboard by default during the RDP session.

The File Layer Monitor. By monitoring the file layer, we can identify ransomware attacks by file changes. Furthermore, the monitor can gather local traceable information by monitoring files in the shared folder. For instance, a shared folder on the remote PC is always used to transfer ransomware from the attacker machine during the RDP session. In addition, for a more convenient and quick attack, the attacker often mounts the entire local disk to the remote computer. As a result, by monitoring the shared folder, we can detect newly added shared folders in real time and capture a large amount of path information automatically.
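Detecting newly added folders and files can be approximated by comparing directory snapshots taken at intervals. The sketch below shows only that comparison on an ordinary directory; it is an assumption for illustration and does not reproduce the paper's monitor, which would also need to enumerate the drives redirected into the RDP session. All function names are hypothetical.

```python
import os
import tempfile

def snapshot(root):
    """Return the set of paths (relative to root) that currently exist under root."""
    seen = set()
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            seen.add(os.path.relpath(os.path.join(dirpath, name), root))
    return seen

def diff_snapshots(before, after):
    """Return the sorted paths that appeared between two snapshots."""
    return sorted(after - before)

def demo():
    """Create a folder and a file in a temporary directory and report what is new."""
    with tempfile.TemporaryDirectory() as root:
        before = snapshot(root)
        os.makedirs(os.path.join(root, "share", "tools"))
        with open(os.path.join(root, "share", "tools", "payload.exe"), "wb"):
            pass
        return diff_snapshots(before, snapshot(root))

if __name__ == "__main__":
    print(demo())
```

Polling is the simplest option to reason about; an event-based file-system notification mechanism would reduce the delay between a change and its detection.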

3.3. Clue Extraction. Through the environment monitors, the system can gather a lot of information left by attackers, such as login information, communication information, clipboard content, folder paths, and portable executable (PE) files. Many traceable clues can be extracted here, including but not restricted to IP addresses, keyboard layouts, compile paths, and file paths. In order to analyze these clues quickly, we divide them into two categories: string clues and path clues. These clues are then submitted to the automatic analysis system. We elaborate on the types of clues that the proposed system can extract in Section 5.1.

3.4. Automatic Analysis. According to our investigation, current traceback tools mostly analyze clues manually. However, we usually have to deal with a large number of clues with no semantic correlation. Because such manual traceback analysis usually takes a lot of time and effort, we propose an automatic analysis system, and we elaborate on how to analyze clues automatically in Section 5.2.

4. Implementation of the Deception Environment

As the Windows platform is the main target of ransomware, we chose Windows for the proof-of-concept implementation. In this section, we describe the implementation details of a Windows-based deception environment prototype: how the deception environment traps ransomware attackers, and how the monitors detect the RDP-based attack and collect traceable clues. The entire system implementation process is shown in Figure 2.

4.1. At the Network Layer

4.1.1. The Login Monitor. The login monitor is used to detect attacks in real time and collect the attacker's login information. On the Windows platform, Win32 is an environment subsystem that provides an API for operating system services and functions to control all user inputs and outputs. The login monitor relies on Windows APIs to gain access to the system and runs with privileges to access its own areas of memory. It uses Winsock 2.0 to access the network; Winsock 2.0 also supports protocols other than the TCP/IP suite. The login monitor takes network requests and sends them to the Winsock 2.0 SPI (Service Provider Interface) by calling the main Winsock 2.0 file, Ws2_32.dll, which provides access to transport service providers and namespace providers. The IP Helper API makes it possible to get and modify network configuration settings for the local host. It consists of the DLL file iphlpapi.dll and includes functions that can retrieve information about protocols such as TCP and UDP [33]. As a result, the login monitor can directly access the data buffers involved in transmission control protocols, routing tables, network interfaces, and network protocol statistics.

Figure 2: RDP-based ransomware attack traceback system process. The process has five steps: (1) construct the deception environment, (2) environment monitor (network, host, and file layers), (3) clue extraction, (4) automated analysis, and (5) result.

4.1.2. The Communication Monitor. The communication monitor is responsible for capturing network traffic. By locating the TCP packets in the network traffic that contain the interactive configuration information of an RDP login, the RDP connection information can be obtained as the attacker's personal information. The main packet characteristics are as follows: (1) the packet's name is often ClientData; (2) the packet usually follows the TCP three-way handshake; (3) the packet appears near the front of the session; (4) the amount of data it carries is significantly larger.

4.2. At the Host Layer

4.2.1. The Deception Host. According to our observation, the main methods used by attackers to log in to a remote host are (1) direct login with a weak password and (2) login through an access account added by exploiting a vulnerability. As a result, we deliberately set weak passwords for the administrator accounts of the deception environment and leave common vulnerabilities, such as EternalBlue, in the environment to attract attackers. To guide attackers to upload ransomware using only the clipboard and shared folders, we block traffic transfers from outside the environment and close common transfer ports (e.g., ports 20, 21, 80, and 443).

4.2.2. The Clipboard Monitor. The clipboard monitor can obtain clues in real time by monitoring the clipboard's changes. It uses the Clipboard Viewer to listen for changes in the clipboard without affecting its contents. The Clipboard Viewer is a mechanism that can get and display the contents of the clipboard. As Windows applications are message-driven, the key to the monitor is responding to and processing clipboard change messages. When the content changes, Windows sends the WM_DRAWCLIPBOARD message to the first window of the Clipboard Viewer Chain. After each Clipboard Viewer window responds to and processes the message, it must pass the message to the next window according to the handle of the next window in its saved linked list. The clipboard monitor can obtain the clipboard's new contents by calling the "GetClipboardData" Windows API through the window. When a subsequent copy or cut operation is executed, the data in the clipboard are rewritten. The clipboard monitor therefore listens in real time and writes each change to the log file as it occurs. The log file is updated whenever the clipboard monitor receives a clipboard change notice. To prevent the log file from being detected by the attacker or encrypted by ransomware, the monitor sends it to a secure host after each update and completely erases it from the environment.

4.2.3. The Process Monitor. To run a program on a Windows system, a new process must be created. The monitor obtains the PE file run by the attacker by monitoring the processes in the environment. It first records the state of the processes commonly used in the deception environment before an RDP-based ransomware attack. After the login monitor detects such an attack, the process monitor watches the environment for newly created processes: through the Windows API "CreateToolhelp32Snapshot", it takes status snapshots of all processes in real time, which include the process identifier (PID). When a suspicious process has started, the monitor recognizes it by the PID and looks up the process's running path with the help of "GetModuleFileNameEx". Finally, the monitor finds the suspicious program's PE file through the path and copies it to the secure host.

4.3. At the File Layer

4.3.1. Deception Files. Deception files are constructed with two goals. First, we need to make the attacker believe that the deception environment is a real user's host. Second, we need to make the attacker believe that there are resources in the environment that are worth attacking.

To simulate a realistic environment, we deploy large numbers of different types of files on it, for example, images, audio files, database files, and documents that can be accessed in a Windows session. Based on the research of Kharraz et al. [1], we created four file categories that ransomware typically tries to find and encrypt: documents (*.txt, *.doc(x), *.ppt(x), *.xls(x), *.pdf, and *.py), keys and licenses (*.key, *.pem, *.crt, and *.cer), file archives (*.zip, *.rar), and media (*.jp(e)g, *.mp3, and *.avi). We obtain these files in three ways. First, we create files with valid headers and content using standard libraries (e.g., python-docx, python-pptx, pdfkit, and OpenSSL). Second, using Google search syntax and crawler technology, we download a large number of files from the Internet. Third, we collect a number of nonconfidential documents from the hosts of 20 volunteers to emulate actual user environments. When we assign user files to the deception environment, the path length is generated randomly. Each folder may have a random set of subfolders, and for each folder a subset of extensions is randomly selected. Furthermore, each directory name is generated from meaningful words. Consequently, the generated user files have variable depth, varied extensions, and meaningful content.
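The path-generation step described above can be illustrated with a minimal sketch. It only plans decoy paths and does not create files. The word list, the root folder, the `max_depth` default, and the function name `plan_decoy_paths` are assumptions made for illustration and are not taken from the authors' implementation.

```python
import random
from pathlib import PureWindowsPath

# Extension groups follow the four categories listed in the text.
CATEGORIES = {
    "documents": [".txt", ".doc", ".docx", ".ppt", ".pptx",
                  ".xls", ".xlsx", ".pdf", ".py"],
    "keys": [".key", ".pem", ".crt", ".cer"],
    "archives": [".zip", ".rar"],
    "media": [".jpg", ".jpeg", ".mp3", ".avi"],
}
ALL_EXTENSIONS = [ext for group in CATEGORIES.values() for ext in group]

# Hypothetical vocabulary for "meaningful" folder and file names.
WORDS = ["project", "budget", "report", "client",
         "backup", "invoice", "holiday", "contract"]


def plan_decoy_paths(root, n_files, max_depth=4, seed=None):
    """Return n_files decoy paths with random depth and per-folder extensions."""
    rng = random.Random(seed)
    folder_extensions = {}  # folder tuple -> extension subset chosen for it
    paths = []
    for _ in range(n_files):
        depth = rng.randint(1, max_depth)  # random path length
        folders = tuple(rng.choice(WORDS) for _ in range(depth))
        if folders not in folder_extensions:
            # Each folder gets its own random subset of extensions.
            k = rng.randint(1, len(ALL_EXTENSIONS))
            folder_extensions[folders] = rng.sample(ALL_EXTENSIONS, k)
        ext = rng.choice(folder_extensions[folders])
        name = f"{rng.choice(WORDS)}_{rng.randint(1, 99)}{ext}"
        paths.append(PureWindowsPath(root, *folders, name))
    return paths
```

Passing a fixed `seed` makes the layout reproducible, which is convenient when the same decoy tree has to be redeployed on several virtual hosts.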

To make the simulated environment more valuable, we deploy bait information on it, such as databases, false code comments, digital certificates, administrator passwords, SSH keys, VPN keys, browser history, passwords, ARP records, and DNS records. When an attacker obtains the bait information, it may entice the attacker to attack the deception environment.

4.3.2. The File Monitor. The file monitor can detect ransomware by monitoring changes in file type and file entropy, a method proposed in 2016 [21]. A file's type can be described by the order and position of specific byte values unique to that type. Since files generally retain their file types and formatting in the deception environment, the bulk modification of such files should be considered suspicious. When the monitor sees this type of change, we can infer that a ransomware attack has occurred.

Entropy expresses the randomness of the characters in a string: the higher the entropy value, the stronger the randomness. The Shannon entropy of an array of bytes can be computed as the sum

$$e = \sum_{i=0}^{255} P_{B_i} \log_2 \frac{1}{P_{B_i}} \tag{1}$$

for $P_{B_i} = F_i / \mathit{totalbytes}$, with $F_i$ the number of instances of byte value $i$ in the array. The entropy value ranges from 0 to 8, and a value of 8 represents a byte array with a completely uniform distribution. Since the probability of each byte occurring in encrypted ciphertext is basically the same, the entropy of an encrypted file will be close to the upper limit. Because ransomware typically encrypts a large number of files, when we detect that a file changes to a high-entropy file in a short period of time and also changes its file type, we assume that the file is subject to a ransomware attack.
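Equation (1) translates directly into a few lines of code. The following is a minimal sketch of the entropy part of the check only; the threshold constant `HIGH_ENTROPY` and the helper `looks_encrypted` are hypothetical, and the paper does not state the cut-off it uses.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte array in bits per byte (0 to 8), as in (1)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # F_i for every byte value i that occurs
    # P * log2(1 / P) with P = F_i / total, summed over observed byte values.
    return sum((f / total) * math.log2(total / f) for f in counts.values())


# Hypothetical cut-off: values close to 8 suggest ciphertext.
HIGH_ENTROPY = 7.5


def looks_encrypted(before: bytes, after: bytes,
                    threshold: float = HIGH_ENTROPY) -> bool:
    """True if a file moved from below the threshold to at or above it."""
    return (shannon_entropy(before) < threshold
            and shannon_entropy(after) >= threshold)
```

Already-compressed formats (for example *.zip or *.jpg) have high entropy before any attack, which is one reason the text pairs the entropy test with the file type test instead of relying on entropy alone.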

4.3.3. The Shared Folder Monitor. By traversing the disk storage in real time, the shared folder monitor discovers updates to the shared folders as they happen. It obtains information about the files on the attacker's local machine, which the attacker often does not notice, thus revealing some unexpected traceable clues. The monitor can access a list of paths to the attacker's shared folders.

As we observed, folders shared through Remote Desktop usually appear on the remote host under a path with the prefix "\\tsclient". When the monitor reaches this prefix while traversing the storage, it uses "FindFirstFile" to find the first file and then uses "FindNextFile" with the returned handle to find the next file. When the result is a folder, the monitor continues to traverse all files under that folder. Initially, the monitor tried to get the full file names and file contents by traversing the new shared folder. However, during the actual experiments, we found that as the number of files in storage grows, the monitor takes far more time and resources to get all the file contents than to get just the file paths. A full traversal is also more likely to alert the attacker. Therefore, the monitor only obtains the paths of the files that the attacker shares from its host, with the help of "GetFileName". Moreover, in order to prevent encryption by the ransomware, the monitor directly transfers the acquired list of shared file paths to another secure host.
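The path-only traversal can be sketched in a portable way. The snippet below uses `os.walk` as a stand-in for the "FindFirstFile"/"FindNextFile" loop described in the text, so it is an illustration of the idea and not the authors' Win32 code; the function name `list_shared_paths` is hypothetical.

```python
import os


def list_shared_paths(root):
    """Collect file paths under root without opening any file."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Only the path is recorded; file contents are never read,
            # which keeps the traversal fast and less conspicuous.
            paths.append(os.path.join(dirpath, name))
    return paths
```

On a Windows deception host, `root` would be a redirected drive below the "\\tsclient" prefix. The resulting list is what would be shipped to the secure host.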

5. Clue Extraction and Analysis

Through the deception environment, we trap the ransomware attacker and collect a lot of information that may contain many traceable clues. However, such traceable clues are often not visually observable and are complex in nature. In addition, much of the collected information is not helpful in tracing back ransomware attackers. Therefore, to assist in the analysis of the monitor information and extract the main effective traceable clues, in this section we describe how to extract clues and how to analyze the extracted clues automatically.

5.1. Clue Extraction. We mainly obtain the following kinds of clues from the extraction: remote login information (IP addresses), network traffic, clipboard contents (pictures and texts), shared folder information (path strings), and ransomware samples (compile time and compile paths). The shared folder path clues can be obtained directly from the monitor. However, clipboard clues, compile clues, and remote host clues are often not visually observable and are complex in nature. As a result, the extraction module mainly focuses on the extraction of remote host clues, clipboard string clues, and compilation clues.

Remote Host Clue Extraction. The IP address, port number, and folder path clues can be obtained directly from the login information and folder paths. In a captured PCAP file, the TCP packets that exchange configuration information are usually named "ClientData". We extract the client name field from the "ClientData" packet to obtain the attacker's hostname. In addition, the KeyboardLayout field indicates the default keyboard layout of the attacking host; e.g., the Chinese Simplified layout number is 0x0004 and the American English keyboard layout number is 0x0409. From the keyboard layout, we can infer the remote user's idiomatic language (the mother tongue) and, from it, the attacker's likely nationality.

Clipboard Clue Extraction. The main data formats available to the clipboard monitor are Windows Bitmap, GDI file, ANSI characters, Unicode characters, and WAV audio data. We mainly aim at extracting traceable clues of the character types. The extraction module extracts character clues from the clipboard in the various formats by checking the "DataType" value of the GetClipboardData API.

Compilation Clue Extraction. For all Windows RDP-based ransomware samples that we examined, we empirically observed that the most commonly used format is the PE file, especially *.exe and *.dll. Some PE files carry compilation information, and this information does not change when the program is moved. It is therefore a good way to obtain the creator's information. A PE file mainly consists of five major components: DOS MZ header, DOS stub, PE header, section headers, and section content. Each component contains a great deal of information, but only a small part of it can be used to identify the creator, and some identification information needs to be extracted from the content of each section. In this paper, the clues extracted from PE files to trace back the attacker include file name, PE file type, compiler version, compilation path, compile time, last modified time, last open time, IP address, URL, domain name, language, string, wide character, and so on. We extract most of the clues with PEView [34]. However, since it cannot directly obtain the compilation path, we use the pefile [35] tool to extract paths by locating them in the PE structure.

Since the extracted clues come in different encoding formats, the extraction system converts all of the data to UTF-8 and saves them in an SQLite database, which facilitates observation and gives the subsequent analysis a unified format. Before being submitted to the analysis system, the extracted clues are divided into two categories: string clues and path clues.

5.2. Automatic Analysis. At this point, the clues extracted by the system mainly include string clues and path clues. The number of string clues is large, and they are mixed with a large number of unidentifiable strings. Because the number of path clues is also very large, with no semantic correlation, it is difficult to identify traceable clues manually. We therefore focus on how to automate the identification of traceable clues for path clues.

5.2.1. Users and Programs Analysis. In the analysis of the path clues, we first propose to obtain attacker clues by identifying context-related segments within the same path. For instance, each user has a separate user folder, and it is located in the "Users" folder under drive C. As a result, the system can obtain the user name on the attacking host from the name of the folder under the "Users" folder (e.g., C:\Users\Dell). The "Program Files" folder usually contains the names of the software programs installed on the machine (e.g., C:\Program Files\Microsoft Visual Studio 11.0). What is more, the QQ account number is always located in the "QQ\QQfile" folder (e.g., D:\QQ\QQfile\86******086\FileRecv).

In this way, the analysis system can quickly and accurately obtain host names, email accounts, program names, social software account numbers, and other traceable clues that are carried in the attacker's file paths. However, such user information makes up only a small share of the overall clues and is obtained by chance. Therefore, we conduct further analysis on this basis.
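The context-related rules above amount to looking for a marker folder and taking the component that follows it. A minimal sketch, assuming the three markers named in the text ("Users", "Program Files", and "QQfile"); the marker table, the clue labels, and the function name are illustrative choices, and the QQ number in the usage example is made up.

```python
from pathlib import PureWindowsPath

# Marker folder (lowercase) -> label of the clue stored in the next component.
# "program files (x86)" is an added assumption for 64-bit hosts.
MARKERS = {
    "users": "user",
    "program files": "program",
    "program files (x86)": "program",
    "qqfile": "qq_account",
}


def path_context_clues(path):
    """Return the clues implied by well-known parent folders in a path."""
    parts = PureWindowsPath(path).parts
    lowered = [part.lower() for part in parts]
    clues = {}
    for marker, label in MARKERS.items():
        if marker in lowered:
            idx = lowered.index(marker)
            if idx + 1 < len(parts):  # the marker must have a child component
                clues[label] = parts[idx + 1]
    return clues
```

For example, `path_context_clues(r"C:\Users\Dell\Desktop\notes.txt")` yields a `user` clue of `Dell`, mirroring the C:\Users\Dell example in the text.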

5.2.2. Account Analysis. Through the analysis of APT1 and some other attribution reports, we find that the mapping between the attacker and a physical-world identity can be better obtained by analyzing the account information left by the attacker. This information includes but is not limited to the location of the IP address, the spelling and registration of the domain name, the IP address corresponding to a URL, and the domain name of a mailbox account. Because it is difficult to identify this information effectively in a large number of string and path clues, the analysis system automatically identifies IP addresses, domain names, URLs, and mailbox accounts by regular-expression matching. Then, with the help of threat intelligence and big data technology, more relevant clues are obtained.
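As an illustration of the regular-expression matching step, the sketch below scans a clue string for IPv4 addresses, e-mail addresses, URLs, and domain names. The patterns are deliberately simple assumptions written for this example; the paper does not list the expressions it uses, and production patterns would need to handle more edge cases (IPv6, internationalized domains, and so on).

```python
import re

# Dotted-quad IPv4 address with each octet limited to 0-255.
IPV4 = re.compile(
    r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b")
# Simplified mailbox pattern: local part, "@", domain, alphabetic TLD.
EMAIL = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")
# http/https URL up to the next whitespace or quote.
URL = re.compile(r"\bhttps?://[^\s\"']+", re.IGNORECASE)
# Dot-separated labels ending in an alphabetic TLD.
DOMAIN = re.compile(
    r"\b(?:[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?\.)+[A-Za-z]{2,}\b")


def find_account_clues(text):
    """Return all IP, e-mail, URL, and domain matches found in text."""
    return {
        "ip": IPV4.findall(text),
        "email": EMAIL.findall(text),
        "url": URL.findall(text),
        "domain": DOMAIN.findall(text),
    }
```

The domain pattern also fires inside e-mail addresses and URLs, which is convenient here because the mailbox domain is itself one of the clues the text mentions.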

5.2.3. Language Identification. User languages often help to determine an attacker's idiomatic language, but because of the large number of languages in different countries and the high similarity of some languages, we use an automatic analysis system to identify the language of the clues. We tested the accuracy of two language identification toolkits using entire path information in four different languages and found the "langid.py" toolkit to be more accurate overall than the "langdetect" toolkit. The comparison results are shown in Figure 3. langid.py is a language identification toolkit developed by Lui and Baldwin at the University of Melbourne [36]. It combines a naive Bayes classifier with cross-domain feature selection to provide domain-independent language identification.

5.2.4. Traceable Strings Identification. To find traceable strings, traditional string analysis methods usually use Named Entity Recognition (NER). However, the clues to be analyzed here mainly consist of strings and paths. The strings before and after a path separator have few semantic correlations. What is more, the strings between path separators and the remaining strings to be analyzed are mostly semantically unrelated because of their limited length. This paper therefore proposes an algorithm that can quickly and automatically identify the traceable strings in string and path clues.

In order to filter out meaningful traceable clues related to the attacker's identity, the path clues and string clues are split into strings and then screened for common words and gibberish in the following steps. Figure 4 shows the process of the automatic traceable clue identification system.

Make Stop Words. The system splits each path by the path delimiter. The resulting path strings that are common to multiple computers have no identifying effect, so we take the file and folder names that are common to 20 normal user computers as stop words. The system then removes the stop words from the strings after each split.

Figure 3: Accuracy of two language identification tools (bar chart of correct and incorrect recognitions by the langid and langdetect toolkits for English, Chinese, Russian, and Korean). In tests using the same known-language path clues, the accuracy of the langid toolkit for English, Chinese, Russian, and Korean was 88%, 100%, 99%, and 99.0%, respectively, while the accuracy of the langdetect toolkit was 73%, 0.85%, 40%, and 4.60%, respectively.

Figure 4: Process of automatically identifying traceable clues. Path clues go through path split and stop word removal; string clues go through sentence and word tokenization; the resulting strings are split by signs and alphabetic case characteristics and then passed to common word detection and gibberish detection to produce the result.
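The stop word step can be sketched as a frequency count over baseline machines. In this minimal example a path component becomes a stop word when it occurs on at least `min_machines` baseline machines; that parameter, its default (all machines), and the function names are assumptions, since the paper only states that names common to 20 normal user computers are used.

```python
from collections import Counter
from pathlib import PureWindowsPath


def build_stop_words(baseline_machines, min_machines=None):
    """baseline_machines: one list of paths per normal user computer."""
    if min_machines is None:
        min_machines = len(baseline_machines)  # default: present everywhere
    seen_on = Counter()
    for paths in baseline_machines:
        components = set()
        for path in paths:
            components.update(p.lower() for p in PureWindowsPath(path).parts)
        seen_on.update(components)  # count each component once per machine
    return {c for c, n in seen_on.items() if n >= min_machines}


def remove_stop_words(path, stop_words):
    """Split a path by its delimiter and drop components that are stop words."""
    return [p for p in PureWindowsPath(path).parts
            if p.lower() not in stop_words]
```

With two baseline machines that both contain `C:\Users\<name>\Desktop\...`, the components `Users` and `Desktop` become stop words, while the user-specific folder names survive and move on to segmentation.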

Segmentation. After removing the stop words, we separate sentences and words for each segment of the path with a tokenizer (e.g., NLTK) [37]. However, as the segments of the path are not semantically related, this segmentation is less effective. Based on our extensive study of path information, users often prefer to use symbols such as underscores to split file names. In addition, they also like to use capitalization in multiword strings to facilitate reading. As a result, the system segments the strings again using the common separator symbols in the names and the alphabetic case characteristics.
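The second segmentation pass (separator symbols, then alphabetic case) can be written with two regular expressions. This is a minimal sketch under the assumption that any non-alphanumeric character acts as a separator; the function name `segment` is hypothetical.

```python
import re

# Split on anything that is not a letter or digit (underscore, dot, dash, ...).
SEPARATORS = re.compile(r"[^A-Za-z0-9]+")
# Within a piece: an all-caps run, a (capitalized) lowercase word, or digits.
CASE_TOKENS = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")


def segment(name):
    """Split a file or folder name by symbols and by case changes."""
    tokens = []
    for piece in SEPARATORS.split(name):
        tokens.extend(CASE_TOKENS.findall(piece))
    return tokens
```

For instance, `segment("SogouInput_v2")` returns `["Sogou", "Input", "v", "2"]`, so a multiword name written in mixed case is broken into pieces that the dictionary check can look up individually.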

Common Words Removal. After segmentation, the system obtains a large number of strings, many of them repeated. Most of them are commonly used words and do not help in identifying ransomware attackers. As a result, we filter out the common words by comparing each string with a dictionary. If the string can be matched in the dictionary, the system marks the common word property of the string as False; otherwise, it marks it as True.

Gibberish Detect. Based on our analysis and observation, we found that there is a large amount of garbled information in path clues. Most of these strings are generated randomly; people can recognize this randomness manually, but it is difficult for a program to recognize it automatically. As a result, we propose detecting gibberish with a Markov chain model trained on English texts. For each string X, the probability of the (i+1)-th character depends only on the i-th character:

$$P(X_{i+1} = x \mid X_1 = x_1, X_2 = x_2, \ldots, X_i = x_i) = P(X_{i+1} = x \mid X_i = x_i). \tag{2}$$

Each letter is related to the letters immediately before and after it. This relationship can be expressed by 2-grams; for example, the string "Ransomware" yields Ra, an, ns, ..., e[space]. From a collection of gibberish and some commonly used normal phrases or vocabulary, the analysis system records how often characters appear next to each other. It normalizes the counts after reading through the training data and then measures the probability of generating a string by multiplying the probabilities of the pairs of adjacent characters in the string [38]. The training data also yield the average transition probability for gibberish strings and for normal strings, respectively. This probability measures how likely the string is according to the data observed by the model. When we test the string "Ransom", it computes

$$\mathrm{prob}[\text{'Ransom'}] = \mathrm{prob}[\text{'R'}][\text{'a'}] \times \mathrm{prob}[\text{'a'}][\text{'n'}] \times \cdots \times \mathrm{prob}[\text{'m'}][\text{' '}]. \tag{3}$$

If the input string is gibberish, it will pass through some pairs with very low counts in the training phase and hence have a low probability. The system then looks at the probability per character for a few known normal strings and a few examples of known gibberish and picks a threshold between the highest probability among the gibberish strings and the lowest probability among the normal strings:

$$\mathit{threshold} = \frac{\min(\mathit{normal\_probs}) + \max(\mathit{gibberish\_probs})}{2}. \tag{4}$$

When we analyze a string X, if prob[X] > threshold, the analysis system views the string as a normal string and marks it as "True". If prob[X] ≤ threshold, the analysis system marks the string as "False". The system then removes the strings marked "False" as gibberish.
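The gibberish check of (2) to (4) can be sketched as a character-bigram model. The code below is a toy illustration and not the authors' implementation: the alphabet (lowercase letters plus space), the additive smoothing value, the use of the geometric mean of the transition probabilities as the per-character probability, and all function names are assumptions. Unlike (3), it does not append a trailing space to the tested string.

```python
import math

ALPHABET = "abcdefghijklmnopqrstuvwxyz "


def normalize(text):
    """Lowercase the text and keep only characters in ALPHABET."""
    return [c for c in text.lower() if c in ALPHABET]


def train(corpus, smoothing=0.01):
    """Estimate log transition probabilities from adjacent character pairs."""
    counts = {a: {b: smoothing for b in ALPHABET} for a in ALPHABET}
    chars = normalize(corpus)
    for a, b in zip(chars, chars[1:]):
        counts[a][b] += 1
    log_prob = {}
    for a, row in counts.items():
        total = sum(row.values())  # normalize each row to probabilities
        log_prob[a] = {b: math.log(n / total) for b, n in row.items()}
    return log_prob


def avg_transition_prob(text, log_prob):
    """Geometric mean of the transition probabilities along the string."""
    chars = normalize(text)
    pairs = list(zip(chars, chars[1:]))
    if not pairs:
        return 0.0
    return math.exp(sum(log_prob[a][b] for a, b in pairs) / len(pairs))


def pick_threshold(normal, gibberish, log_prob):
    """Midpoint between the weakest normal and strongest gibberish string (4)."""
    normal_probs = [avg_transition_prob(s, log_prob) for s in normal]
    gibberish_probs = [avg_transition_prob(s, log_prob) for s in gibberish]
    return (min(normal_probs) + max(gibberish_probs)) / 2


def is_normal(text, log_prob, threshold):
    """True for strings scoring above the threshold, False for gibberish."""
    return avg_transition_prob(text, log_prob) > threshold
```

Averaging over the transitions keeps long strings from being penalized merely for their length, which a raw product of probabilities would do. A usable model needs a much larger English training text than any toy sentence; with too little data, character pairs that never occur in training dominate the scores.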

Table 1: Traceable strings recognized result [13].

String      Common Words Recognized   Gibberish Recognized   Final Recognized
Wh***ter    True                      True                   True
gandcrab    True                      True                   True
kate z      True                      True                   True
vwrtjty     True                      False                  False
program     False                     True                   False

Identification Result. After the above analysis, the automated analysis system outputs the strings for which both the common word tag and the gibberish tag are True; this is the list of traceable clues [13]. Table 1 is reproduced from Z. H. Wang et al. (2018) [under the Creative Commons Attribution License/public domain].

When the string is a normal word, such as "program", the common words recognition marks it as "False". Gibberish recognition marks as True only strings that are not randomly generated; strings that do not conform to the spelling patterns of common words (e.g., vwrtjty) are marked as False. A string is considered to have auxiliary traceability only if both analysis values are True. We relied extensively on string inversion to verify the accuracy of the system and found that most of the traceable strings on the volunteers' storage can be recognized. Table 1 shows the recognition results for several typical strings.

6. Evaluation

In this paper, we evaluate the proposed method with two experiments. The goal of the first experiment is to demonstrate that the proposed method can help trace back to RDP-based ransomware attackers and capture their private information. The goal of the second experiment is to demonstrate that the method can also automatically recognize clues in ransomware samples and help trace back to ransomware makers.

6.1. RDP-Based Ransomware Attacker

6.1.1. Clue Capture and Analysis. We used Windows 7 virtual machines to deploy the deception environment and assess its effectiveness. While Windows 7 is not required, it was chosen because of its wide range of applications and because it is one of the main targets of ransomware. We invited 122 professional volunteers to help with the experiment and provided them with an experimental fleet of 12 virtual hosts. The monitor system successfully captured the login information, clipboard content, shared folder paths, and uploaded PE files of most of the volunteers.

We selected the path information collected from 12 volunteers' computers and recorded the rate of convergence after each step of the analysis. Figure 5 shows how many traceable clues remain after each step of processing by the analysis system [13]. Figure 5 is reproduced from Z. H. Wang et al. (2018) [under the Creative Commons Attribution License/public domain].

Figure 5: The data volume changes in 12 volunteers' traceable clues through the analysis system (x-axis: analysis process, with the steps source, stopword, tokenize, gibberish, and result; y-axis: data quantity; one line per volunteer). The screening rate can reach 98.34% [13].

6.2. Traceback Attackers. Next, we carried out a detailed analysis of the information left over from one of the volunteers' attacks. From the data produced by the automatic analysis of this attacker, we can infer that the attacker's real identity may have the following characteristics: (1) the attacker is likely to come from China; (2) the attacker may work in security or a related field; (3) the attacker may work at the Chinese Academy of Sciences and have relations with a large security manufacturer in China. The conclusion is shown in Figure 6, which presents the results from the analysis system in five sections.

Username. Through automatic analysis, a total of three user names from the attacker's host were extracted from a large number of clues. From the "Dell" user name, it can be assumed that the attacker was using a Dell host.

Programs. The automatic analysis system identified several typical software programs installed on the attacker's host. For instance, "QQ" is a widely used real-time social tool in China. "AliPay" is an online payment tool developed by Alibaba and widely used in China. "Sogou input" is a Chinese input method developed by the Chinese company Sogou. "MeiTu" is photo beautification software developed by a Chinese company and widely used in China. In addition, "Sinfor" and "360" are both well-known security manufacturers in China with a large number of security-related products. "Visual Studio" is common development software. We found that much Chinese-made software widely used in China was installed on the attacker's host, as well as some development and security products that ordinary users install less often.

Figure 6: Analysis of an attacker using the automatic analysis system, displaying traceable information about the attacker in five aspects: host user name, program software, keyboard layout, account information, and traceable strings.

Keyboard Layout. Through network communication analysis, we find that the attacker's keyboard layout is Simplified Chinese, from which it can be inferred that the attacker's native language is probably Chinese. This is consistent with the software installed on the host.

Account Number. Among the account numbers automatically identified by the system, we found two QQ accounts and two e-mail accounts. The e-mail accounts are Chinese Academy of Sciences business accounts, which supports the above assumption that the attacker is from China and is likely to work at the Chinese Academy of Sciences. Through the registration information of the QQ account (as shown in Figure 7), we can find the following information: (1) the attacker could be a male aged 32; (2) the attacker does Internet-related work, likely with 360 Security; (3) the attacker is located in Haidian District, Beijing, China. In the registration information of the "353***208" account, the mail, work, and other fields are empty, and the nickname is a distinctive English string, "Wh***ter"; it can be presumed that this account is likely for private use.

Traceable Strings. The automatic analysis system sifts out the strings from the monitor that may be traceable. "Freebuf" is a security information exchange website commonly used by Chinese security personnel. "Botnet" is a secondary attack method that is often used by attackers and is a common main research direction for security researchers. "**e" and "**t" are acronyms for departments under the Chinese Academy of Sciences, while "A***team" is a security research team in the "**e" department. "Visumantrag" is often used when processing visas from various countries. "Wh***ter" matches the nickname of the QQ account "353***208" and is likely to be the attacker's regular nickname.

Figure 7: The registration infographic of QQ account 372***82.

6.3. Ransomware Maker Automatic Analysis. In order to verify that the analysis system is able to trace back the ransomware maker, we obtained more than 8000 ransomware samples from VirusTotal [39] and extracted the Program Data Base (PDB) path from the samples (as shown in Figure 8). The result shows that although a large number of attackers deliberately erase the compilation information, legacy PDB path information can still be found in a large number of ransomware samples.

Table 2: Same identifier for different samples [13]. Columns: Sample Hash; Same Identity Strings.
68dd613973f8a***7befe0f4b461d258ce569b9020079***9ae0f090266a60cc
jqxxpphoaonkje7z2fa0c084aef41969***1f427a1b65e1b1464fa82d297e712***e1fdf312c044814a36da0286d42cd***306fba99239c8238821beaf40744be***ba862dbc6d90f70cbc839bcb88***7422bbd2fa812dd3de02391b6a9f1***0762a6f49cf32501

Figure 8: Of the 8075 real ransomware samples, the PDB path could be extracted from 11% (878 samples), while the authors of the remaining 89% (7197 samples) may have chosen to hide the PDB path during compilation.

What is more, the analysis system is used to automatically analyze the PDB paths of more than 800 different ransomware samples. We found multiple identity strings in the analysis results of different samples. One of the typical results is shown in Table 2 [13]. Table 2 is reproduced from Z. H. Wang et al. (2018) [under the Creative Commons Attribution License/public domain]. We used VirusTotal to validate the samples and found that they were all Cobra family ransomware. As a result, it can be assumed that these samples were made by the same ransomware maker. When one sample is traced back, the other samples lead to the same conclusion about the ransomware maker. When different traceable information is obtained from different samples, the identity of the ransomware maker can be described by integrating that information.

Language Identification. When we run language identification on the PDB paths of 878 samples, the result is as shown in Figure 9. The system automatically identifies the language of each path; the paths mainly contain 11 languages, of which English accounts for the largest proportion.

Figure 9: Of the 878 ransomware sample PDB paths, English accounted for 510 (58.09%), and the remaining monitored languages were Chinese 31 (3.53%), Dutch 40 (4.56%), Norwegian Nynorsk 4 (0.46%), Slovenian 5 (0.57%), Norwegian Bokmal 1 (0.11%), German 58 (6.61%), Italian 3 (0.34%), Hungarian 35 (3.99%), Maltese 18 (2.05%), Polish 5 (0.57%), Finnish 151 (17.20%), Luxembourgish 3 (0.34%), Danish 10 (1.14%), and Spanish 4 (0.46%).

Actual Case Analysis. The automatic identification system helped us identify the traceable strings in the PDB path. Take the example of a sample PDB path (the Chinese has been translated into English) with an MD5 value of “60c6a92afb***6d0c16f7”:

“ELIAOSTUDIOUACsimulated an improved version of the 360 anti-virus program 1117 Bale1*41**2****0(**1) 9818 programWriteSystem32ReleaseVirtualDesktoppdb”

We can infer from the automatic analysis result that the ransomware maker is likely from China. This inference is supported by the following.

(1) Language Identification. Identifying the language of the path suggests that the attacker may have come from China. Reading the Chinese strings in the path, we find that the text describes an improvement intended to bypass the detection of 360 security software, which has a large market in China.

(2) IP Address. The system identified an IP address in the PDB path. With the help of a threat intelligence database, we found that the IP came from Xinjiang, China (Figure 10).

Figure 10: Partial infographic of IP 1*41**2****0 in the threat intelligence library.

(3) Traceable Strings. The extracted identifier string “LIAO” resembles Chinese pinyin and is most likely a Chinese surname. To sum up, we speculate that the ransomware maker is most likely from China.

7. Conclusion

In this paper, we focus on tracing back RDP-based ransomware. Recently, more and more ransomware attackers are using RDP attacks to spread ransomware with impunity due to the use of strong cryptography. We propose tracing back the attack sources to deter RDP-based ransomware with the proposed deception environment. It collects traceable clues and performs automatic analysis by using natural language processing and machine learning techniques. The evaluation shows that it is able to trap attackers and collect the traceable clues they leave in a deception environment. With automatic clue identification, it can narrow the amount of traceable clues down to about 2%. We use this method to analyze two practical cases and show its effectiveness. By tracing back to ransomware attackers, we can provide a strong deterrent to stifle the development of ransomware.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Additional Points

This manuscript provides additional deception environment monitors and analytical methods by using more volunteers and samples. It proves the validity of the methods through specific traceback cases (i.e., tracing back the ransomware attacker and tracing back the ransomware maker).

Disclosure

An earlier version of this paper was presented at the IEEE Third International Conference on Data Science in Cyberspace, Guangzhou, China, 18-21 June 2018. The authors' initial conference paper focused mainly on the effectiveness of deception environment monitors and the screening rate of the analysis system.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors would like to thank VirusTotal for providing ransomware samples and the volunteers from the University of Chinese Academy of Sciences. This work is supported in part by the National Key Research and Development Plan (Grant no. 2018YFB0803504) and the National Natural Science Foundation of China (61871140, 61572153, U1636215, 61572492, and 61672020).

References

[1] A. Kharraz, “UNVEIL: A Large-Scale, Automated Approach to Detecting Ransomware,” in Proceedings of the 25th USENIX Security Symposium (USENIX Security 16), pp. 757–772, USENIX Association, 2016.

[2] K. Savage, P. Coogan, and H. Lau, “The evolution of ransomware,” Tech. Rep., 2015.

[3] A. Kharraz, W. Robertson, D. Balzarotti, L. Bilge, and E. Kirda, “Cutting the Gordian Knot: A Look Under the Hood of Ransomware Attacks,” in Detection of Intrusions and Malware, and Vulnerability Assessment, vol. 9148 of Lecture Notes in Computer Science, pp. 3–24, Springer International Publishing, Cham, 2015.

[4] S. Mohurle and M. Patil, “A brief study of Wannacry Threat: Ransomware Attack 2017,” International Journal of Advanced Research in Computer Science, vol. 8, no. 5, 2017.

[5] X. Yu, Z. Tian, J. Qiu, and F. Jiang, “A Data Leakage Prevention Method Based on the Reduction of Confidential and Context Terms for Smart Mobile Devices,” Wireless Communications and Mobile Computing, vol. 2018, Article ID 5823439, 11 pages, 2018.

[6] J. Yaneza, “Brute Force RDP Attacks Plant CRYSIS Ransomware,” https://blog.trendmicro.com/trendlabs-security-intelligence/brute-force-rdp-attacks-plant-crysis-ransomware.

[7] “10% of Ransomware Attacks on SMBs Targeted IoT Devices,” https://www.darkreading.com/application-security/10--of-ransomware-attacks-on-smbs-targeted-iot-devices/d/d-id/1329817.

[8] Q. Tan, Y. Gao, J. Shi, X. Wang, B. Fang, and Z. H. Tian, “Towards a Comprehensive Insight into the Eclipse Attacks of Tor Hidden Services,” IEEE Internet of Things Journal, 2018.

[9] “Kaspersky Security Bulletin: STORY OF THE YEAR 2017,” 2017, https://media.kasperskycontenthub.com/wp-content/uploads/sites/43/2018/03/07164824/KSB_Story_of_the_Year_Ransomware_FINAL_eng.pdf.

[10] Z. Tian, Y. Cui, L. An et al., “A Real-Time Correlation of Host-Level Events in Cyber Range Service for Smart Campus,” IEEE Access, vol. 6, pp. 35355–35364, 2018.

[11] R. Ross et al., Managing Information Security Risk: Organisation, Mission, and Information System View, National Institute of Standards and Technology, 2011, http://csrc.nist.gov/publications/nistpubs/800-39/SP800-39-final.pdf.

[12] Y. Wang, Z. Tian, H. Zhang, S. Su, and W. Shi, “A Privacy Preserving Scheme for Nearest Neighbor Query,” Sensors, vol. 18, no. 8, p. 2440, 2018.

[13] Z. H. Wang, X. Wu, C. G. Liu, Q. X. Liu, and J. L. Zhang, “RansomTracer: Exploiting Cyber Deception for Ransomware Tracing,” in Proceedings of the IEEE Third International Conference on Data Science in Cyberspace, pp. 227–234, 2018.

[14] B. A. S. Al-rimy, M. A. Maarof, and S. Z. M. Shaid, “Ransomware threat success factors, taxonomy, and countermeasures: A survey and research directions,” Computers & Security, vol. 74, pp. 144–166, 2018.

[15] F. Jiang, Y. Fu, B. B. Gupta et al., “Deep Learning based Multi-channel intelligent attack detection for Data Security,” IEEE Transactions on Sustainable Computing, 2018.

[16] N. Andronio, S. Zanero, and F. Maggi, “HelDroid: dissecting and detecting mobile ransomware,” in Research in Attacks, Intrusions, and Defenses, vol. 9404 of Lecture Notes in Computer Science, pp. 382–404, Springer, 2015.

[17] F. Mercaldo, V. Nardone, A. Santone, and C. A. Visaggio, “Ransomware Steals Your Phone. Formal Methods Rescue It,” in Formal Techniques for Distributed Objects, Components, and Systems: 36th IFIP WG 6.1 International Conference, FORTE 2016, held as part of the 11th International Federated Conference on Distributed Computing Techniques, DisCoTec 2016, E. Albert and I. Lanese, Eds., pp. 212–221, Springer International Publishing, 2016.

[18] D. Sgandurra, L. Muñoz-González, R. Mohsen, and E. C. Lupu, “Automated Dynamic Analysis of Ransomware: Benefits, Limitations and use for Detection,” https://arxiv.org/abs/1609.03020.

[19] S. Song, B. Kim, and S. Lee, “The Effective Ransomware Prevention Technique Using Process Monitoring on Android Platform,” Mobile Information Systems, vol. 2016, Article ID 2946735, 9 pages, 2016.

[20] T. Yang, Y. Yang, K. Qian, D. C.-T. Lo, Y. Qian, and L. Tao, “Automated detection and analysis for android ransomware,” in Proceedings of the 17th IEEE International Conference on High Performance Computing and Communications, IEEE 7th International Symposium on Cyberspace Safety and Security, and IEEE 12th International Conference on Embedded Software and Systems, HPCC-ICESS-CSS 2015, pp. 1338–1343, USA, August 2015.

[21] N. Scaife, H. Carter, P. Traynor, and K. R. B. Butler, “CryptoLock (and Drop It): Stopping Ransomware Attacks on User Data,” in Proceedings of the 36th IEEE International Conference on Distributed Computing Systems, ICDCS 2016, pp. 303–312, Japan, June 2016.

[22] M. M. Ahmadian and H. R. Shahriari, “2entFOX: A framework for high survivable ransomwares detection,” in Proceedings of the 13th International ISC Conference on Information Security and Cryptology, ISCISC 2016, pp. 79–84, Iran, September 2016.

[23] M. M. Ahmadian, H. R. Shahriari, and S. M. Ghaffarian, “Connection-monitor & connection-breaker: A novel approach for prevention and detection of high survivable ransomwares,” in Proceedings of the 12th International ISC Conference on Information Security and Cryptology, ISCISC 2015, pp. 79–84, Iran, September 2015.

[24] D. Kim, W. Soh, and S. Kim, “Design of Quantification Model for Prevent of Cryptolocker,” Indian Journal of Science and Technology, vol. 8, no. 19, 2015.

[25] C. Moore, “Detecting Ransomware with Honeypot Techniques,” in Proceedings of the 2016 Cybersecurity and Cyberforensics Conference (CCC), pp. 77–81, Amman, Jordan, August 2016.

[26] F. Mbol, J. Robert, and A. Sadighian, “An Efficient Approach to Detect TorrentLocker Ransomware in Computer Systems,” in Cryptology and Network Security, S. Foresti and G. Persiano, Eds., vol. 10052 of Lecture Notes in Computer Science, pp. 532–541, Springer International Publishing, Cham, 2016.

[27] K. Cabaj, P. Gawkowski, K. Grochowski, and D. Osojca, “Network activity analysis of CryptoWall ransomware,” Przegląd Elektrotechniczny, vol. 91, no. 11, pp. 201–204, 2015.

[28] J. Chen, Z. Tian, X. Cui, L. Yin, and X. Wang, “Trust Architecture and Reputation Evaluation for Internet of Things,” Journal of Ambient Intelligence & Humanized Computing, vol. 2, pp. 1–9, 2018.

[29] C. Le Guernic and A. Legay, “Ransomware and the Legacy Crypto API,” in Risks and Security of Internet and Systems: 11th International Conference, CRiSIS 2016, Roscoff, France, 2017.

[30] L. Pingree, Emerging Technology Analysis: Deception Techniques and Technologies Create Security Technology Business Opportunities, Gartner, 2015.

[31] “Ransomware and Businesses,” https://www.symantec.com/content/dam/symantec/docs/security-center/white-papers/ransomware-and-businesses-16-en.pdf, 2016.

[32] A. Kharraz and E. Kirda, “Redemption: Real-Time Protection Against Ransomware at End-Hosts,” in Proceedings of the International Symposium on Research in Attacks, Intrusions, and Defenses, pp. 98–119, 2017.

[33] F. Bergstrand, J. Bergstrand, and H. Gunnarsson, Localization of Spyware in Windows Environments.

[34] “PEView,” https://www.aldeid.com/wiki/PEView.

[35] “python-pefile,” https://pypi.python.org/pypi/pefile.

[36] M. Lui and T. Baldwin, “langid.py: An off-the-shelf language identification tool,” in Proceedings of the ACL 2012 System Demonstrations, Association for Computational Linguistics, pp. 25–30, 2012.

[37] S. Bird and E. Loper, “NLTK: the natural language toolkit,” in Proceedings of the ACL 2004 on Interactive Poster and Demonstration Sessions, Association for Computational Linguistics, Barcelona, Spain, July 2004.

[38] “Rrenaud, Gibberish-Detector,” https://github.com/rrenaud/Gibberish-Detector/blob/master/README.rst.

[39] “VirusTotal,” http://virustotal.com.


Figure 2: RDP-based ransomware attack traceback system process. [Diagram stages: (1) construct deception environment (cyber deception environment with network, host, and file layers); (2) environment monitor (login, communication, clipboard, process, shared folder, and file monitors; ransomware detection); (3) clue extraction (login, remote host, clipboard, compile, and path clues); (4) automated analysis (users and programs analysis, network address analysis, language identification, traceable strings identification); (5) result (auxiliary traceable clues).]

IP Helper API makes it possible to get and modify network configuration settings for the local host. It is implemented in the DLL file iphlpapi.dll and includes functions that retrieve information about protocols such as TCP and UDP [33]. As a result, the login monitor can directly access the data buffers involved in transmission control protocols, routing tables, network interfaces, and network protocol statistics.

4.1.2. The Communication Monitor. The communication monitor is responsible for capturing network traffic. By locating the TCP packets in the network traffic that contain the interactive configuration information of an RDP login, the RDP connection information can be obtained as the attacker's personal information. The main packet characteristics are as follows: (1) the packet's name is often ClientData; (2) the packet usually appears after the TCP three-way handshake; (3) the packet is positioned near the front of the session; (4) the amount of data is significantly larger.

4.2. At Host Layer

4.2.1. The Deception Host. According to our observation, the main methods used by attackers to log in to a remote host are (1) direct login with a weak password and (2) login through an access account added by exploiting a vulnerability. As a result, we deliberately set weak passwords for the administrator accounts of the deception environment and leave common vulnerabilities, such as EternalBlue, in the environment to attract attackers. To guide attackers to upload ransomware using only the clipboard and shared folders, we block traffic transfers from outside the environment and close common transfer ports (e.g., ports 20, 21, 80, and 443).

4.2.2. The Clipboard Monitor. The clipboard monitor obtains clues in real time by monitoring changes to the clipboard. It uses the Clipboard Viewer to listen for clipboard change messages without affecting the clipboard's contents. The Clipboard Viewer is a mechanism that can get and display the contents of the clipboard. As Windows applications are message-driven, the key to the monitor is responding to and processing clipboard change messages. When the content changes, the WM_DRAWCLIPBOARD message is triggered and sent to the first window of the Clipboard Viewer Chain. After each Clipboard Viewer window responds to and processes the message, it must pass the message to the next window according to the handle of the next window saved in its linked list. The clipboard monitor can obtain the clipboard's new contents through the window by using the “GetClipboardData” Windows API. When a subsequent copy or cut operation is executed, the data in the clipboard are rewritten. The clipboard monitor therefore listens in real time and writes the information to a log file, which is updated whenever the monitor receives a clipboard change notice. Each time the log file is updated, the monitor sends it to a secure host and completely erases it from the environment, to prevent it from being detected by the attacker or encrypted by ransomware.

4.2.3. The Process Monitor. To run a program on a Windows system, a new process must be created. The monitor obtains the PE file run by the attacker by monitoring the processes in the environment. It first records the state of the processes commonly used in the deception environment before an RDP-based ransomware attack. After the login monitor detects such an attack, it watches for newly created processes through the Windows API “CreateToolhelp32Snapshot”, which takes status snapshots of all processes in real time, including the process identifier (PID). When a suspicious process has started, the monitor recognizes it by the PID and looks up the process's running path with the help of “GetModuleFileNameEx”. Finally, the monitor finds the suspicious program's PE file through the path and copies it to the secure host.
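The baseline-then-compare idea behind the process monitor can be sketched without any Windows-specific calls. The following minimal Python sketch assumes that each snapshot has already been collected as a mapping from PID to executable path (in the paper this information comes from “CreateToolhelp32Snapshot” and “GetModuleFileNameEx”); the function name and the sample data are hypothetical.

```python
def new_processes(baseline, current):
    """Return the entries of `current` that are absent from `baseline`.

    Both arguments map PID -> executable path. Comparing the path as well
    as the PID also catches a PID that was reused by a different program.
    """
    return {pid: path for pid, path in current.items()
            if baseline.get(pid) != path}


# Hypothetical snapshots taken before and after an RDP login is detected.
baseline = {4: "System", 812: r"C:\Windows\explorer.exe"}
current = {4: "System", 812: r"C:\Windows\explorer.exe",
           2044: r"C:\Users\Public\enc.exe"}
suspicious = new_processes(baseline, current)
```

Each path in `suspicious` would then be the candidate PE file to copy to the secure host.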

4.3. At the File Layer

4.3.1. Deception Files. Deception files are constructed with two goals. First, we need to make the attacker believe that the deception environment is a real user's host. Second, we need to make the attacker believe that there are resources in the environment that are worth attacking.

To simulate a realistic environment, we deploy large numbers of different types of files on it, for example, images, audio files, database files, and documents that can be accessed in a Windows session. Based on Amin Kharraz's research [1], we created four file categories that ransomware always tries to find and encrypt: documents (*.txt, *.doc(x), *.ppt(x), *.xls(x), *.pdf, and *.py), keys and licenses (*.key, *.pem, *.crt, and *.cer), file archives (*.zip, *.rar), and media (*.jp(e)g, *.mp3, and *.avi). We obtain these files in three ways. First, we create files with valid headers and content using standard libraries (e.g., python-docx, python-pptx, pdfkit, and OpenSSL). Second, using Google search syntax and crawler technology, we download a large number of files from the Internet. Third, we collect a number of nonconfidential documents from the hosts of 20 volunteers to emulate actual user environments. When we assign user files to the deception environment, the path length is generated randomly. Each folder may have a random set of subfolders, and for each folder a subset of extensions is randomly selected. Furthermore, each directory name is generated from meaningful words. Consequently, we generate paths and extensions for user files, giving them variable file depth and meaningful content.
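The randomized layout described above (random depth, random subfolders, a random subset of extensions per folder, names built from meaningful words) can be sketched as follows. This is only an illustration under stated assumptions: the word list, the parameter names, and their default values are hypothetical and are not taken from the paper, and the sketch only produces path strings without creating any files.

```python
import random

# Extensions from the four categories listed in the text.
EXTENSIONS = [".txt", ".docx", ".pptx", ".xlsx", ".pdf", ".py",
              ".key", ".pem", ".crt", ".cer", ".zip", ".rar",
              ".jpg", ".mp3", ".avi"]
# Hypothetical vocabulary for meaningful folder and file names.
WORDS = ["finance", "reports", "projects", "invoices", "backup",
         "photos", "contracts", "meeting", "budget", "archive"]


def generate_paths(root, rng, max_depth=4, max_subfolders=3, files_per_folder=3):
    """Return a list of deception file paths below `root`."""
    paths = []

    def fill(folder, depth):
        # Each folder gets its own random subset of extensions.
        exts = rng.sample(EXTENSIONS, k=rng.randint(1, 4))
        for _ in range(files_per_folder):
            name = "_".join(rng.sample(WORDS, k=2))
            paths.append(folder + "\\" + name + rng.choice(exts))
        if depth < max_depth:
            # A random (possibly empty) set of subfolders varies the depth.
            for sub in rng.sample(WORDS, k=rng.randint(0, max_subfolders)):
                fill(folder + "\\" + sub, depth + 1)

    fill(root, 1)
    return paths


demo_paths = generate_paths(r"C:\Users\Alice\Documents", random.Random(7))
```

Passing a seeded `random.Random` instance keeps a generated layout reproducible, which may help when the same deception environment has to be rebuilt.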

To make the simulated environment more valuable, we deploy bait information on it, such as databases, false code comments, digital certificates, administrator passwords, SSH keys, VPN keys, browser history, passwords, ARP records, and DNS records. When an attacker obtains the bait information, it may entice the attacker to attack the deception environment.

4.3.2. The File Monitor. The file monitor detects ransomware by monitoring changes in file type and file entropy, a method proposed in 2016 [21]. The type of data stored in a file can be described by the order and position of specific byte values unique to that file type. Since files generally retain their file types and formatting in the deception environment, the bulk modification of such files should be considered suspicious. When the monitor sees this type of change, we can infer that a ransomware attack has occurred.

Entropy expresses the randomness of each character in a string: the higher the entropy value, the stronger the randomness. The Shannon entropy of an array of bytes can be computed as the sum

$$e = \sum_{i=0}^{255} P_{B_i} \log_2 \frac{1}{P_{B_i}} \tag{1}$$

for $P_{B_i} = F_i / \mathit{totalbytes}$ and $F_i$ the number of instances of byte value $i$ in the array. The entropy value is a number from 0 to 8, where a value of 8 represents a byte array with a completely uniform distribution. Since each byte value occurs with essentially the same probability in encrypted ciphertext, its entropy will be close to the upper limit. Because ransomware always encrypts a large number of files, when we detect that a file changes to a high-entropy file in a short period of time and its file type also changes, we assume that the file is subject to a ransomware attack.
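As a worked illustration of Equation (1), the sketch below computes the Shannon entropy of a byte array and combines it with a crude file type check. The threshold of 7.5 bits per byte, the 4-byte header comparison, and the function names are assumptions made for this example only; they are not values reported in the paper or in [21].

```python
import math
from collections import Counter


def shannon_entropy(data):
    """Shannon entropy of a byte string in bits per byte (0 to 8)."""
    if not data:
        return 0.0
    total = len(data)
    # Byte values that never occur contribute nothing to the sum,
    # so only the observed values are visited.
    return sum((n / total) * math.log2(total / n)
               for n in Counter(data).values())


def looks_encrypted(before, after, threshold=7.5, header_len=4):
    """Flag a rewrite whose header changed and whose entropy became high."""
    header_changed = before[:header_len] != after[:header_len]
    entropy_after = shannon_entropy(after)
    return (header_changed
            and entropy_after >= threshold
            and entropy_after > shannon_entropy(before))


plain = b"%PDF-1.4 " + b"plain text content " * 200
uniform = bytes(range(256)) * 16   # every byte value equally frequent
flagged = looks_encrypted(plain, uniform)
```

A uniformly distributed array reaches the upper limit of 8, while a constant array has entropy 0, which matches the range stated above.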

4.3.3. The Shared Folder Monitor. By traversing the disk storage in real time, the shared folder monitor discovers updates to the shared folders as they occur. It obtains the contents of the attacker's local files, which the attacker often overlooks, thus revealing some unexpected traceable clues. The monitor can access a list of paths to the attacker's shared folders.

As we originally observed, folders shared through Remote Desktop often have a path on the remote host with the prefix “\\tsclient”. When the monitor traverses the storage to this prefix, it uses “FindFirstFile” to find the first file. It then uses “FindNextFile” with the returned handle to find the next file. When the result is a folder, it continues to traverse all files under that folder. Initially, the monitor tried to get the full file names and file contents by traversing the new shared folder. However, during actual experimentation, we found that, as the number of files in storage grows, the monitor takes far more time and resources to get all the file contents than to get just the file paths. A full traversal is also more likely to alert the attacker. Therefore, the monitor only obtains the paths of the files that the attacker shares on its host, with the help of “GetFileName”. Moreover, to prevent encryption by the ransomware, the monitor directly transfers the acquired list of shared file paths to another secure host.
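The path-only traversal can be illustrated with a short sketch. Note that it swaps in Python's standard `os.walk` for the “FindFirstFile”/“FindNextFile” calls named in the text, and that the function name is hypothetical; in the deception environment the root would be a redirected share such as a folder under the “tsclient” prefix.

```python
import os


def list_shared_paths(root):
    """Collect the full paths of all files below `root`.

    Only directory entries are read; no file is opened, which keeps the
    traversal cheap compared with fetching file contents.
    """
    found = []
    for folder, _subfolders, files in os.walk(root):
        for name in files:
            found.append(os.path.join(folder, name))
    return sorted(found)
```

The returned list is what would be shipped to the secure host, so that the clue survives even if the ransomware later encrypts the environment.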

5. Clue Extraction and Analysis

Through the deception environment, we trap the ransomware attacker and collect a large amount of information that may contain many traceable clues. However, such traceable clues are often not visually observable and are complex in nature. In addition, much of the collected information is not helpful in tracing back ransomware attackers. Therefore, to assist in the analysis of the monitor information and extract the main effective traceable clues, in this section we describe how to extract clues and how to analyze the extracted clues automatically.

5.1. Clue Extraction. We mainly obtain the following kinds of clues from the extraction: remote login information (IP addresses), network traffic, clipboard contents (pictures and texts), shared folder information (path strings), and ransomware samples (compile time and compile paths). The shared folder path clues can be obtained directly from the monitor. However, clipboard clues, compile clues, and remote host clues are often not visually observable and are complex in nature. As a result, the extraction module mainly focuses on the extraction of remote host clues, clipboard string clues, and compilation clues.

Remote Host Clue Extraction. The IP address, port numbers, and folder path clues can be directly obtained from the login information and folder paths. TCP packets that exchange configuration information in a network communication PCAP capture are usually named “ClientData”. We extract the client name field from the “ClientData” packet to obtain the attacker's hostname. In addition, the KeyboardLayout field indicates the default keyboard layout of the attacking host; e.g., the Chinese Simplified layout number is 0x0004 and the American English keyboard layout number is 0x0409. The remote user's idiomatic language (the mother tongue) can be found from the keyboard layout to infer the attacker's nationality.

Clipboard Clue Extraction. The main data formats available to the clipboard monitor are Windows Bitmap, GDI file, ANSI characters, Unicode characters, and WAV audio data. We mainly aim at extracting traceable clues of the character types. The module extracts character clues from the clipboard in various formats by checking the GetClipboardData API's “DataType” value.

Compilation Clue Extraction. For all Windows RDP-based ransomware samples that we examined, we empirically observed that the most commonly used format is the PE file, especially *.exe and *.dll. Some PE files carry compilation information, and this information does not change when the program is moved to another host. As a result, it is a good way to obtain the creator's information. A PE file mainly consists of five major components: DOS MZ header, DOS stub, PE header, section headers, and section content. Each component contains a great deal of information. Only a small part of it can be used to identify the creator, and some identifying information must be extracted from the content of each section. In this paper, many clues in the PE files can be extracted to trace back the attacker: file name, PE file type, compiler version, compilation path, compile time, last modified time, last open time, IP address, URL, domain name, language, string, wide character, and so on. We extract most of the clues with PEView [34]. However, since it cannot directly obtain the compilation path, we use the pefile [35] tool to extract paths by locating them in the PE structure.
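To illustrate where a compilation path sits inside a PE file, the sketch below scans raw bytes for a CodeView “RSDS” debug record, which stores a 16-byte GUID, a 4-byte age, and then the NUL-terminated PDB path. This is a deliberately simplified stand-in and not the paper's method: the paper locates the path through the PE structure with pefile, whereas this example uses only a byte-pattern search and may miss records in other debug formats. The function name is hypothetical.

```python
import re

# 'RSDS' signature, 16-byte GUID, 4-byte age, then a NUL-terminated path.
RSDS_RECORD = re.compile(rb"RSDS.{20}([^\x00]{1,260})\x00", re.DOTALL)


def find_pdb_paths(pe_bytes):
    """Return the PDB paths found in CodeView RSDS records of `pe_bytes`."""
    paths = []
    for match in RSDS_RECORD.finditer(pe_bytes):
        raw = match.group(1)
        # Keep only candidates that really look like a PDB file name.
        if raw.lower().endswith(b".pdb"):
            paths.append(raw.decode("utf-8", errors="replace"))
    return paths


# Synthetic bytes that imitate the layout of a debug record.
sample = (b"MZ" + bytes(64) + b"RSDS" + bytes(16) + (1).to_bytes(4, "little")
          + rb"C:\Users\Dell\proj\Release\demo.pdb" + b"\x00" + b"tail")
pdb_paths = find_pdb_paths(sample)
```

Decoding with `errors="replace"` is a pragmatic choice here, since paths written on non-English systems may use a legacy code page rather than UTF-8.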

Since the extracted clues come in different encoding formats, the extraction system converts all of the obtained data into the UTF-8 format and saves it in an SQLite database, which facilitates observation and gives the subsequent analysis a unified format. Before being submitted to the analysis system, the extracted clues are divided into two categories: string clues and path clues.

5.2. Automatic Analysis. At this point, the clues extracted by the system mainly include string clues and path clues. The number of string clues is large, and they are mixed with a large number of unidentifiable strings. Because the number of path clues is also very large and they have no semantic correlation, it is difficult to identify traceable clues manually. We therefore focus on how to automate the identification of traceable clues in path clues.

5.2.1. Users and Programs Analysis. In the analysis of the path clues, we first propose to obtain attacker clues by identifying context-related segments within the same path. For instance, each user has a separate user folder, which is located in the “Users” folder under drive C. As a result, the system can obtain the user name on the attacking host by reading the folder name under the “Users” folder (e.g., C:\Users\Dell\). The “Program Files” folder usually contains the names of the software programs installed on the machine (e.g., C:\Program Files\Microsoft Visual Studio 11.0\). What is more, the QQ account number is always located in the “QQ\QQfile” folder (e.g., D:\QQ\QQfile\86******086\FileRecv\).

In this way, the analysis system can quickly and accurately get host names, email accounts, program names, social software numbers, and other traceable clues that are carried in the attacker's file paths. However, such user information accounts for a small share of the overall clues and is acquired by chance. Therefore, we conduct further analysis on this basis.
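The context rules described in this subsection amount to reading the path segment that follows a well-known folder name. A minimal sketch is given below; the rule labels, the regular expressions, and the sample paths are hypothetical, and the QQ folder layout in particular is an assumption for illustration.

```python
import re

# Each rule captures the path segment that follows a well-known folder.
CONTEXT_RULES = {
    "user_name": re.compile(r"\\Users\\([^\\]+)", re.IGNORECASE),
    "program": re.compile(r"\\Program Files(?: \(x86\))?\\([^\\]+)", re.IGNORECASE),
    "qq_account": re.compile(r"\\QQfile\\([^\\]+)", re.IGNORECASE),
}


def extract_context_clues(path):
    """Return a dict of clue label -> segment found in a Windows path."""
    clues = {}
    for label, pattern in CONTEXT_RULES.items():
        match = pattern.search(path)
        if match:
            clues[label] = match.group(1)
    return clues


user_clue = extract_context_clues(r"C:\Users\Dell\Desktop\notes.txt")
program_clue = extract_context_clues(r"C:\Program Files\Microsoft Visual Studio 11.0\VC\bin")
```

Further rules of the same shape could be added for other applications that keep an account identifier in a fixed folder position.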

5.2.2. Account Analysis. Through the analysis of APT1 and some other attribution reports, we find that the mapping between the attacker and a physical-world identity can be better obtained by analyzing the account information left by the attacker. This information includes, but is not limited to, the location of the IP address, the spelling and registration of the domain name, the IP address corresponding to the URL, and the domain name of the mailbox account. Because it is difficult to identify this information effectively in a large number of string and path clues, the analysis system automatically identifies IP addresses, domain names, URLs, and mailbox accounts by regular expression matching. Then, with the help of threat intelligence and big data technology, more relevant clues are obtained.
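The regular expression matching step can be sketched as follows. The patterns are simplified assumptions written for this illustration (they cover IPv4 addresses, e-mail accounts, and http/https URLs only) and are not the expressions used by the authors.

```python
import re

OCTET = r"(?:25[0-5]|2[0-4]\d|1?\d?\d)"
# Lookarounds stop the pattern from matching inside a longer number.
IPV4 = re.compile(r"(?<![\d.])(?:" + OCTET + r"\.){3}" + OCTET + r"(?![\d.])")
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")
URL = re.compile(r'https?://[^\s<>"]+')


def find_accounts(text):
    """Return the IPv4 addresses, e-mail accounts, and URLs found in text."""
    emails = EMAIL.findall(text)
    return {
        "ipv4": IPV4.findall(text),
        "email": emails,
        # The mailbox domain is itself a clue (e.g., an employer's domain).
        "email_domain": [address.split("@", 1)[1] for address in emails],
        "url": URL.findall(text),
    }


clue_text = "login from 10.0.0.5 mail alice@example.org see http://example.com/a 999.1.1.1"
accounts = find_accounts(clue_text)
```

The matched values would then be looked up in threat intelligence sources, as described above, to obtain location and registration details.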

5.2.3. Language Identification. User languages often help to determine an attacker's idiomatic language, but because of the large number of languages in different countries and the high similarity of some languages, we use automatic analysis to identify the language of the clues. We test the accuracy of two language identification toolkits using entire path information in four different languages. We found the “langid.py” toolkit to be more accurate overall than the “langdetect” toolkit. The comparison results are shown in Figure 3. langid.py is a language identification toolkit developed by Lui and Baldwin at the University of Melbourne [36]. It combines a naive Bayes classifier with cross-domain feature selection to provide domain-independent language identification.

5.2.4. Traceable Strings Identification. When traceable strings are needed, traditional string analysis methods usually use Named Entity Recognition (NER). However, the clues to be analyzed mainly consist of strings and paths. Strings before and after a path separator have few semantic correlations. What is more, the strings between path separators and the remaining strings to be analyzed are mostly semantically unrelated due to their limited length. This paper therefore proposes an algorithm that can quickly and automatically analyze the traceable strings in strings and paths.

To filter out meaningful traceable clues related to the attacker's identity, the path clues and string clues are split into strings and identified by common words and gibberish in the following steps. Figure 4 shows the process of the automatic traceable clue identification system.

Figure 3: Accuracy of two language identification tools. In tests using the same known-language path clues, the accuracy of the langid toolkit for English, Chinese, Russian, and Korean was 88%, 100%, 99%, and 99.0%, respectively, while the accuracy of the langdetect toolkit was 73%, 0.85%, 40%, and 4.60%, respectively.

Figure 4: Process of automatically identifying traceable clues (stages shown: path split, stop words, sentence tokenize, word tokenize, sign split, alphabetic case characteristics, common words detect, gibberish detect, result).

Make Stop Words. The system splits each path by the path delimiter. Path strings that are common to multiple computers have no identifying effect, so we take the file-name strings that are common to 20 normal user computers as stop words. The system then removes the stop words from the strings after each split.
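The stop-word step can be sketched as follows. This is only an illustration under stated assumptions: the function names and the `min_hosts` parameter (how many baseline computers must share a path component before it counts as a stop word) are hypothetical and do not come from the paper.

```python
import re
from collections import Counter

SEPARATORS = re.compile(r"[\\/]+")


def split_path(path):
    """Split a Windows or POSIX path into lowercase components."""
    return [part.lower() for part in SEPARATORS.split(path) if part]


def build_stop_words(baseline_hosts, min_hosts):
    """Return the path components seen on at least `min_hosts` baseline computers."""
    seen_on = Counter()
    for paths in baseline_hosts:
        # Count each component once per host, however often it occurs there.
        components = {part for path in paths for part in split_path(path)}
        seen_on.update(components)
    return {part for part, hosts in seen_on.items() if hosts >= min_hosts}


def remove_stop_words(path, stop_words):
    """Split one captured path and drop the components that are stop words."""
    return [part for part in split_path(path) if part not in stop_words]
```

With the 20 baseline computers mentioned in the text, `min_hosts` would be set to however many of them must agree; the paper does not state that number.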

Segmentation. After removing the stop words, we separate sentences and words for each segment of the path with a tokenizer (e.g., NLTK, NER) [37]. However, as the segments of a path are not semantically related, this segmentation is less effective. Based on our extensive study of path information, users often prefer to use symbols such as underscores to split file names. In addition, they also like to use capitalization in multiword strings to facilitate reading. As a result, the system uses the common delimiter symbols in file names and alphabetic case characteristics to segment the strings again.
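A minimal sketch of this second segmentation pass is given below. The symbol set and the case-boundary rule are assumptions chosen for illustration; the paper does not list the exact delimiters it uses.

```python
import re

# Symbols that users commonly place between words in file names (assumed set).
SYMBOLS = re.compile(r"[_\-.\s]+")
# A boundary lies between a lowercase letter or digit and an uppercase letter
# ("WriteSystem"), or before the last capital of an acronym ("UACBypass").
CASE_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")


def segment(name):
    """Split a file or folder name on symbols, then on alphabetic case changes."""
    tokens = []
    for chunk in SYMBOLS.split(name):
        # Insert a space at each case boundary and split on it.
        tokens.extend(CASE_BOUNDARY.sub(" ", chunk).split())
    return tokens
```

For example, `segment("VirtualDesktop.pdb")` yields the three tokens `Virtual`, `Desktop`, and `pdb`.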

Common Words Removal. After segmentation, the system obtains many repeated strings. Most of them are commonly used words and do not help in identifying ransomware attackers. As a result, we filter out the common words by comparing each string with a dictionary. If the string can be matched in the dictionary, the system marks the common-word property of the string as False; otherwise, as True.
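The dictionary comparison can be sketched in a few lines. The function name and the way the dictionary is passed in are assumptions; a real deployment would load a full word list rather than the handful of words used for illustration.

```python
def tag_common_words(strings, dictionary):
    """Map each string to its common-word tag.

    Following the convention in the text, a string found in the dictionary
    is tagged False (not useful for tracing); any other string is tagged True.
    """
    words = {word.lower() for word in dictionary}
    return {s: s.lower() not in words for s in strings}
```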

Gibberish Detect. Based on our analysis and observation, we found that there is a large amount of garbled information in path clues. Most of these strings are generated randomly. People can recognize this randomness manually, but it is difficult for a program to recognize it automatically. As a result, we propose detecting gibberish by training a Markov chain model with English texts. For each string X, the probability of the (i+1)-th character depends only on the i-th character:

$$P(X_{i+1} = x \mid X_1 = x_1, X_2 = x_2, \ldots, X_i = x_i) = P(X_{i+1} = x \mid X_i = x_i) \tag{2}$$

Each letter is related to the letters immediately before and after it. This relationship can be expressed by 2-grams. For example, the string "Ransomware" yields Ra, an, ns, ..., e[space]. The analysis system records how often characters appear next to each other in a collection of gibberish and some commonly used normal phrases or vocabulary. It normalizes the counts after reading through the training data and then measures the probability of generating a string by multiplying the probabilities of the pairs of adjacent characters in the string [38]. From the training data, the system computes the average transition probability for gibberish and for normal strings, respectively. This probability measures how likely the string is according to the data observed by the model. When we test the string "Ransom", it computes

$$\mathit{prob}[\text{'Ransom'}] = \mathit{prob}[\text{'R'}][\text{'a'}] \times \mathit{prob}[\text{'a'}][\text{'n'}] \times \cdots \times \mathit{prob}[\text{'m'}][\text{' '}] \tag{3}$$

If the input string is gibberish, it will pass through some pairs with very low counts in the training phase and hence have a low probability. The system then looks at the probability per character for a few known normal strings and a few examples of known gibberish, and picks a threshold between the highest gibberish probability and the lowest normal-string probability:

$$\mathit{threshold} = \frac{\min(\mathit{normal\_probs}) + \max(\mathit{gibberish\_probs})}{2} \tag{4}$$

When we analyze a string 'X', if $\mathit{prob}[\text{'X'}] > \mathit{threshold}$, the analysis system views the string as a normal string. If $\mathit{prob}[\text{'X'}] \le \mathit{threshold}$, the analysis system tags the string as 'False'. The system then removes the strings tagged 'False' as gibberish.
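The Markov-chain test of equations (2) to (4) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the alphabet (lowercase letters plus space), the add-one smoothing, and the use of the geometric mean of the transition probabilities as the "probability per character" are choices made for the example, and all function names are hypothetical.

```python
import math

ALPHABET = "abcdefghijklmnopqrstuvwxyz "


def bigrams(text):
    """Yield adjacent character pairs after lowercasing and dropping other symbols."""
    chars = [c for c in text.lower() if c in ALPHABET]
    return zip(chars, chars[1:])


def train(corpus_lines):
    """Return log P(next | current) for every character pair, with add-one smoothing."""
    counts = {a: {b: 1 for b in ALPHABET} for a in ALPHABET}
    for line in corpus_lines:
        for a, b in bigrams(line):
            counts[a][b] += 1
    model = {}
    for a, row in counts.items():
        total = sum(row.values())
        model[a] = {b: math.log(n / total) for b, n in row.items()}
    return model


def avg_transition_prob(text, model):
    """Geometric mean of the transition probabilities along the string."""
    pairs = list(bigrams(text))
    if not pairs:
        return 0.0  # nothing to score; treated as gibberish
    return math.exp(sum(model[a][b] for a, b in pairs) / len(pairs))


def pick_threshold(normal, gibberish, model):
    """Equation (4): midpoint between the worst normal and the best gibberish score."""
    normal_probs = [avg_transition_prob(s, model) for s in normal]
    gibberish_probs = [avg_transition_prob(s, model) for s in gibberish]
    return (min(normal_probs) + max(gibberish_probs)) / 2


def is_normal(text, model, threshold):
    """True for strings scored above the threshold, False for gibberish."""
    return avg_transition_prob(text, model) > threshold
```

Averaging per transition keeps long and short strings comparable, which a raw product as in equation (3) would not; the model is only as good as the English text it is trained on.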

Table 1: Traceable strings recognized result [13].

String       Common words recognized   Gibberish recognized   Final recognized
Wh****ter    True                      True                   True
gandcrab     True                      True                   True
kate z       True                      True                   True
vwrtjty      True                      False                  False
program      False                     True                   False

Identification Result. After the above analysis, the automated analysis system outputs the strings for which both the common-word tag and the gibberish tag are True; this is the list of traceable clues [13]. Table 1 is reproduced from Z. H. Wang et al. (2018) [under the Creative Commons Attribution License/public domain].

When the string is a normal word such as "program", common-word recognition tags it False. Gibberish recognition tags as True only strings that are not randomly generated; strings that do not conform to the spelling patterns of common words (e.g., "vwrtjty") are tagged False. A string is considered to have auxiliary traceability only if both analysis values are True. We rely extensively on string inversion to verify the accuracy of the system. It is found that most of the traceable strings on the volunteers' storage can be recognized. Table 1 shows the recognition results for several typical strings.

6. Evaluation

In this paper, we evaluate the proposed method with two experiments. The goal of the first experiment is to demonstrate that the proposed method can help trace back to RDP-based ransomware attackers and capture their private information. The goal of the second experiment is to demonstrate that the method can also automatically recognize clues in ransomware samples and help trace back to ransomware makers.

6.1. RDP-Based Ransomware Attacker

6.1.1. Clue Capture and Analysis. We used Windows 7 virtual machines to deploy the deception environment and assess its effectiveness. While Windows 7 is not required, it was chosen because of its wide range of applications and because it is one of the main targets of ransomware. We invited 122 professional volunteers to help with the experiment and provided them with an experimental fleet of 12 virtual hosts. Most of the volunteers' login information, clipboard content, shared folder paths, and uploaded PE files were successfully captured by the monitor system.

We chose the path information collected from 12 volunteers' computers and recorded the rate of convergence after each step of the analysis. Figure 5 shows how the number of remaining traceable clues changes as the data is processed by each step of the analysis system [13]. Figure 5 is reproduced from Z. H. Wang et al. (2018) [under the Creative Commons Attribution License/public domain].

Figure 5: The data volume changes in 12 volunteers' traceable clues through the analysis system (data quantity after each analysis step: source, stopword, tokenize, gibberish, result). The screening rate can reach 98.34% [13].

6.2. Traceback Attackers. Next, we carried out a detailed analysis of the information left over from one of the volunteers' attacks. From the data produced by the automatic analysis, we can infer that the attacker's real identity may have the following characteristics: (1) the attacker is likely to come from China; (2) the attacker may work in security or a related field; (3) the attacker may work in the Chinese Academy of Sciences and have relations with a large security manufacturer in China. The conclusion is shown in Figure 6, which presents the results from the analysis system in five sections.

Username. Through automatic analysis, a total of three user names from the attacker's host were extracted from a large number of clues. From the "Dell" user name, it can be assumed that the attacker was using a Dell host.

Programs. The automatic analysis system identified several typical software programs installed on the attacker's host. For instance, "QQ" is a real-time social tool widely used in China. "AliPay" is an online payment tool developed by Alibaba and widely used in China. "Sogou input" is a Chinese input method developed by the Chinese company Sogou. "MeiTu" is a photo beautification program developed by a Chinese company and widely used in China. In addition, "Sinfor" and "360" are both well-known security manufacturers in China with a large number of security-related products. "Visual Studio" is a common development tool. It is found that much Chinese-made software widely used in China was installed on the attacker's host, as well as some development and security products that are installed less often by ordinary users.

Figure 6: Analysis of an attacker using the automatic analysis system, displaying traceable information about the attacker in five aspects: host user name, program software, keyboard layout, account information, and traceable strings.

Keyboard Layout. Through network communication analysis, we find that the attacker's keyboard layout is Simplified Chinese, from which it can be inferred that the attacker's native language is probably Chinese. This is consistent with the software installed on the host.

Account Number. Among the account numbers identified automatically by the system, we found two QQ accounts and two e-mail accounts. The e-mail accounts are Chinese Academy of Sciences business accounts, which supports the above assumption that the attacker is from China and is likely to work in the Chinese Academy of Sciences. Through the registration information of the QQ account (as shown in Figure 7), we can find the following information: (1) the attacker could be a male aged 32; (2) he does Internet-related work, likely with 360 Security; (3) he is located in Haidian District, Beijing, China. In the registration information of the "353***208" account, the mail, work, and other fields are empty, and the nickname is a special English string, "Wh***ter"; it can be presumed that this account is likely for private use.

Traceable Strings. The automatic analysis system sifts strings from the monitor that may be traceable. "Freebuf" is a security information exchange website commonly used by Chinese security personnel. "Botnet" is a secondary attack method that is often used by attackers and is often a main research direction of security researchers. "**e" and "**t" are acronyms for departments under the Chinese Academy of Sciences, while "A***team" is a security research team in the "**e" department. "Visumantrag" is often used when processing visas from various countries. "Wh***ter" matches the nickname of the QQ account "353***208" and is likely to be the attacker's regular nickname.

Figure 7: The registration infographic of QQ account 372***82.

6.3. Ransomware Maker Automatic Analysis. In order to verify that the analysis system can be used to trace back the ransomware maker, we obtained more than 8000 ransomware samples from VirusTotal [39] and extracted the Program Data Base (PDB) path from the samples (as shown in Figure 8). It shows that, despite a large number of attackers deliberately erasing the compilation information, legacy PDB path information can still be found in a large number of ransomware samples.

Table 2: Same identifier for different samples [13].

Sample Hash    Same Identity Strings
68dd613973f8a***7befe0f4b461d258ce569b9020079***9ae0f090266a60cc
jqxxpphoaonkje7z2fa0c084aef41969***1f427a1b65e1b1464fa82d297e712***e1fdf312c044814a36da0286d42cd***306fba99239c8238821beaf40744be***ba862dbc6d90f70cbc839bcb88***7422bbd2fa812dd3de02391b6a9f1***0762a6f49cf32501

Figure 8: Of the 8075 real ransomware samples, the Program Data Base (PDB) path could be extracted from 11% (878 samples with a PDB path, 7197 without), while the authors of the remaining 89% may have chosen to hide the PDB path during compilation.

What is more, the analysis system is used to automatically analyze the PDB paths of more than 800 different ransomware samples. We found the same identity strings in the analysis results of different samples. One of the typical results is shown in Table 2 [13]. Table 2 is reproduced from Z. H. Wang et al. (2018) [under the Creative Commons Attribution License/public domain]. We used VirusTotal to validate the samples and found that they were all Cobra family ransomware. As a result, it can be assumed that these samples were made by the same ransomware maker. When one sample is traced back, the conclusion about the ransomware maker also applies to the other samples. When different traceable information is extracted from different samples, the identity of the ransomware maker can be described by integrating that information.

Language Identification. When we carry out language recognition on the PDB paths of 878 samples, the result is as shown in Figure 9. The system automatically identifies the languages of the paths, which mainly comprise 11 languages, of which English accounts for the largest proportion.

Figure 9: Of the 878 ransomware sample PDB paths, English accounted for 510 (58.09%), and the remaining monitored languages were Chinese 31 (3.53%), Dutch 40 (4.56%), Norwegian Nynorsk 4 (0.46%), Slovenian 5 (0.57%), Norwegian Bokmål 1 (0.11%), German 58 (6.61%), Italian 3 (0.34%), Hungarian 35 (3.99%), Maltese 18 (2.05%), Polish 5 (0.57%), Finnish 151 (17.20%), Luxembourgish 3 (0.34%), Danish 10 (1.14%), and Spanish 4 (0.46%).

Actual Case Analysis. The automatic identification system helped us identify the traceable strings in the PDB path. Take as an example a sample PDB path (the Chinese has been translated into English) with an MD5 value of "60c6a92afb***6d0c16f7":

"ELIAOSTUDIOUACsimulated an improved version of the 360 anti-virus program 1117 Bale1*41**2****0(**1) 9818 programWriteSystem32ReleaseVirtualDesktoppdb"

We can infer from the automatic analysis result that the ransomware maker is likely from China. This is supported by the following evidence.

(1) Language Identification. Identifying the language of the path suggests that the attacker may have come from China. By reading the Chinese strings in the path, we find that the text describes an improvement to bypass the detection of 360 security software, which has a large market in China.

(2) IP Address. The system identified an IP address in the PDB path. With the help of a threat intelligence database, it is found that the IP address is located in Xinjiang, China (Figure 10).

Figure 10: Partial infographic of IP 1*41**2****0 in the threat intelligence library.

(3) Traceable Strings. The extracted identifier string "LIAO" resembles Chinese pinyin and is most likely a Chinese surname. To sum up, we speculate that the ransomware maker is most likely from China.

7. Conclusion

In this paper, we focus on tracing back RDP-based ransomware. Recently, more and more ransomware attackers are using RDP attacks to spread ransomware with impunity due to the use of strong cryptography. We propose tracing back the attack sources to deter RDP-based ransomware with the proposed deception environment. It collects traceable clues and performs automatic analysis by using natural language processing and machine learning techniques. The evaluation shows that it is able to trap attackers and collect traceable clues left by attackers in a deception environment. With automatic clue identification, it can reduce the amount of traceable clues to about 2% of the original volume. We use this method to analyze two practical cases and show its effectiveness. By tracing back to ransomware attackers, we can provide a strong deterrent to stifle the development of ransomware.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Additional Points

This manuscript provides additional deception environment monitors and analytical methods by using more volunteers and samples. It proves the validity of the methods through specific traceback cases (i.e., tracing back the ransomware attacker and tracing back the ransomware maker).

Disclosure

An earlier version of this paper was presented at the IEEE Third International Conference on Data Science in Cyberspace, Guangzhou, China, 18-21 June 2018. The authors' initial conference paper focused mainly on the effectiveness of deception environment monitors and the screening rate of the analysis system.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors would like to thank VirusTotal for providing ransomware samples and the volunteers from the University of Chinese Academy of Sciences. This work is supported in part by the National Key Research and Development Plan (Grant no. 2018YFB0803504) and the National Natural Science Foundation of China (61871140, 61572153, U1636215, 61572492, and 61672020).

References

[1] A. Kharraz, "UNVEIL: A Large-Scale, Automated Approach to Detecting Ransomware," in Proceedings of the 25th USENIX Security Symposium (USENIX Security 16), pp. 757–772, USENIX Association, 2016.

[2] K. Savage, P. Coogan, and H. Lau, "The evolution of ransomware," Tech. Rep., 2015.

[3] A. Kharraz, W. Robertson, D. Balzarotti, L. Bilge, and E. Kirda, "Cutting the Gordian Knot: A Look Under the Hood of Ransomware Attacks," in Detection of Intrusions and Malware, and Vulnerability Assessment, vol. 9148 of Lecture Notes in Computer Science, pp. 3–24, Springer International Publishing, Cham, 2015.

[4] S. Mohurle and M. Patil, "A brief study of Wannacry Threat: Ransomware Attack 2017," International Journal of Advanced Research in Computer Science, vol. 8, no. 5, 2017.

[5] X. Yu, Z. Tian, J. Qiu, and F. Jiang, "A Data Leakage Prevention Method Based on the Reduction of Confidential and Context Terms for Smart Mobile Devices," Wireless Communications and Mobile Computing, vol. 2018, Article ID 5823439, 11 pages, 2018.

[6] J. Yaneza, "Brute Force RDP Attacks Plant CRYSIS Ransomware," https://blog.trendmicro.com/trendlabs-security-intelligence/brute-force-rdp-attacks-plant-crysis-ransomware.

[7] "10% of Ransomware Attacks on SMBs Targeted IoT Devices," https://www.darkreading.com/application-security/10–of-ransomware-attacks-on-smbs-targeted-iot-devices-/d/d-id/1329817.

[8] Q. Tan, Y. Gao, J. Shi, X. Wang, B. Fang, and Z. H. Tian, "Towards a Comprehensive Insight into the Eclipse Attacks of Tor Hidden Services," IEEE Internet of Things Journal, 2018.

[9] "Kaspersky Security Bulletin: STORY OF THE YEAR 2017," 2017, https://media.kasperskycontenthub.com/wp-content/uploads/sites/43/2018/03/07164824/KSB_Story_of_the_Year_Ransomware_FINAL_eng.pdf.

[10] Z. Tian, Y. Cui, L. An et al., "A Real-Time Correlation of Host-Level Events in Cyber Range Service for Smart Campus," IEEE Access, vol. 6, pp. 35355–35364, 2018.

[11] R. Ross et al., Managing Information Security Risk: Organisation, Mission, and Information System View, National Institute of Standards and Technology, 2011, http://csrc.nist.gov/publications/nistpubs/800-39/SP800-39-final.pdf.

[12] Y. Wang, Z. Tian, H. Zhang, S. Su, and W. Shi, "A Privacy Preserving Scheme for Nearest Neighbor Query," Sensors, vol. 18, no. 8, p. 2440, 2018.

[13] Z. H. Wang, X. Wu, C. G. Liu, Q. X. Liu, and J. L. Zhang, "RansomTracer: Exploiting Cyber Deception for Ransomware Tracing," in Proceedings of the IEEE Third International Conference on Data Science in Cyberspace, pp. 227–234, 2018.

[14] B. A. S. Al-rimy, M. A. Maarof, and S. Z. M. Shaid, "Ransomware threat success factors, taxonomy, and countermeasures: A survey and research directions," Computers & Security, vol. 74, pp. 144–166, 2018.

[15] F. Jiang, Y. Fu, B. B. Gupta et al., "Deep Learning based Multi-channel intelligent attack detection for Data Security," IEEE Transactions on Sustainable Computing, 2018.

[16] N. Andronio, S. Zanero, and F. Maggi, "HelDroid: dissecting and detecting mobile ransomware," in Research in Attacks, Intrusions, and Defenses, vol. 9404 of Lecture Notes in Computer Science, pp. 382–404, Springer, 2015.

[17] F. Mercaldo, V. Nardone, A. Santone, and C. A. Visaggio, "Ransomware Steals Your Phone. Formal Methods Rescue It," in Formal Techniques for Distributed Objects, Components, and Systems: 36th IFIP WG 6.1 International Conference, FORTE 2016, held as part of the 11th International Federated Conference on Distributed Computing Techniques, DisCoTec 2016, E. Albert and I. Lanese, Eds., pp. 212–221, Springer International Publishing, 2016.

[18] D. Sgandurra, L. Muñoz-González, R. Mohsen, and E. C. Lupu, "Automated Dynamic Analysis of Ransomware: Benefits, Limitations and use for Detection," https://arxiv.org/abs/1609.03020.

[19] S. Song, B. Kim, and S. Lee, "The Effective Ransomware Prevention Technique Using Process Monitoring on Android Platform," Mobile Information Systems, vol. 2016, Article ID 2946735, 9 pages, 2016.

[20] T. Yang, Y. Yang, K. Qian, D. C.-T. Lo, Y. Qian, and L. Tao, "Automated detection and analysis for android ransomware," in Proceedings of the 17th IEEE International Conference on High Performance Computing and Communications, IEEE 7th International Symposium on Cyberspace Safety and Security, and IEEE 12th International Conference on Embedded Software and Systems, HPCC-ICESS-CSS 2015, pp. 1338–1343, USA, August 2015.

[21] N. Scaife, H. Carter, P. Traynor, and K. R. B. Butler, "CryptoLock (and Drop It): Stopping Ransomware Attacks on User Data," in Proceedings of the 36th IEEE International Conference on Distributed Computing Systems, ICDCS 2016, pp. 303–312, Japan, June 2016.

[22] M. M. Ahmadian and H. R. Shahriari, "2entFOX: A framework for high survivable ransomwares detection," in Proceedings of the 13th International ISC Conference on Information Security and Cryptology, ISCISC 2016, pp. 79–84, Iran, September 2016.

[23] M. M. Ahmadian, H. R. Shahriari, and S. M. Ghaffarian, "Connection-monitor & connection-breaker: A novel approach for prevention and detection of high survivable ransomwares," in Proceedings of the 12th International ISC Conference on Information Security and Cryptology, ISCISC 2015, pp. 79–84, Iran, September 2015.

[24] D. Kim, W. Soh, and S. Kim, "Design of Quantification Model for Prevent of Cryptolocker," Indian Journal of Science and Technology, vol. 8, no. 19, 2015.

[25] C. Moore, "Detecting Ransomware with Honeypot Techniques," in Proceedings of the 2016 Cybersecurity and Cyberforensics Conference (CCC), pp. 77–81, Amman, Jordan, August 2016.

[26] F. Mbol, J. Robert, and A. Sadighian, "An Efficient Approach to Detect TorrentLocker Ransomware in Computer Systems," in Cryptology and Network Security, S. Foresti and G. Persiano, Eds., vol. 10052 of Lecture Notes in Computer Science, pp. 532–541, Springer International Publishing, Cham, 2016.

[27] K. Cabaj, P. Gawkowski, K. Grochowski, and D. Osojca, "Network activity analysis of CryptoWall ransomware," Przegląd Elektrotechniczny, vol. 91, no. 11, pp. 201–204, 2015.

[28] J. Chen, Z. Tian, X. Cui, L. Yin, and X. Wang, "Trust Architecture and Reputation Evaluation for Internet of Things," Journal of Ambient Intelligence & Humanized Computing, vol. 2, pp. 1–9, 2018.

[29] C. Le Guernic and A. Legay, "Ransomware and the Legacy Crypto API: Paper presented at the Risks and," in Risks and Security of Internet and Systems: 11th International Conference, CRiSIS 2016, Roscoff, France, 2017.

[30] L. Pingree, Emerging Technology Analysis: Deception Techniques and Technologies Create Security Technology Business Opportunities, Gartner, 2015.

[31] "Ransomware and Businesses," https://www.symantec.com/content/dam/symantec/docs/security-center/white-papers/ransomware-and-businesses-16-en.pdf, 2016.

[32] A. Kharraz and E. Kirda, "Redemption: Real-Time Protection Against Ransomware at End-Hosts," in Proceedings of the International Symposium on Research in Attacks, Intrusions, and Defenses, pp. 98–119, 2017.

[33] F. Bergstrand, J. Bergstrand, and H. Gunnarsson, Localization of Spyware in Windows Environments.

[34] "PEView," https://www.aldeid.com/wiki/PEView.

[35] "python-pefile," https://pypi.python.org/pypi/pefile.

[36] M. Lui and T. Baldwin, "langid.py: An off-the-shelf language identification tool," in Proceedings of the ACL 2012 System Demonstrations, Association for Computational Linguistics, pp. 25–30, 2012.

[37] S. Bird and E. Loper, "NLTK: the natural language toolkit," in Proceedings of the ACL 2004 on Interactive Poster and Demonstration Sessions, Association for Computational Linguistics, Barcelona, Spain, July 2004.

[38] "Rrenaud Gibberish-Detector," https://github.com/rrenaud/Gibberish-Detector/blob/master/README.rst.

[39] "VirusTotal," http://virustotal.com.


we created four file categories that ransomware always tries to find and encrypt: documents (*.txt, *.doc(x), *.ppt(x), *.xls(x), *.pdf, and *.py), keys and licenses (*.key, *.pem, *.crt, and *.cer), file archives (*.zip, *.rar), and media (*.jp(e)g, *.mp3, and *.avi). We obtain these files in three ways. First, we create files with valid headers and content using standard libraries (e.g., python-docx, python-pptx, pdfkit, and OpenSSL). Second, using Google search syntax and crawler technology, we download a large number of files from the Internet. Third, we collect a number of non-confidential documents from the hosts of 20 volunteers to emulate actual user environments. When we assign user files to the deception environment, the path length is generated randomly. Each folder may contain a random set of subfolders. For each folder, a subset of extensions is randomly selected. Furthermore, each directory name is generated from meaningful words. Consequently, we generate paths and extensions for user files, giving them variable file depth and meaningful content.

To make the simulated environment more valuable, we deploy bait information in it, such as databases, false code comments, digital certificates, administrator passwords, SSH keys, VPN keys, browser history passwords, ARP records, and DNS records. When the bait information is obtained by an attacker, it may trick the attacker into attacking the deception environment.
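The random layout of decoy user files can be sketched as follows. This is an illustration only: the word list, the size limits, and the function name `build_decoy_tree` are hypothetical, and the files are written as placeholders, whereas the text describes files with valid headers and content.

```python
import random
from pathlib import Path

# Extension groups taken from the four file categories named in the text.
EXTENSIONS = {
    "documents": [".txt", ".docx", ".pptx", ".xlsx", ".pdf", ".py"],
    "keys": [".key", ".pem", ".crt", ".cer"],
    "archives": [".zip", ".rar"],
    "media": [".jpg", ".mp3", ".avi"],
}
# Hypothetical vocabulary for meaningful folder and file names.
WORDS = ["finance", "reports", "backup", "projects", "invoices", "photos", "contracts", "archive"]


def build_decoy_tree(root, rng, max_depth=3, max_subdirs=2, max_files=3):
    """Create a randomized tree of placeholder decoy files under `root`."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    created = []
    all_exts = [ext for group in EXTENSIONS.values() for ext in group]
    # Each folder gets its own random subset of extensions.
    exts = rng.sample(all_exts, k=rng.randint(1, 4))
    for _ in range(rng.randint(1, max_files)):
        name = f"{rng.choice(WORDS)}_{rng.randint(1, 999)}{rng.choice(exts)}"
        path = root / name
        path.write_bytes(b"placeholder")  # a real deployment writes valid content
        created.append(path)
    if max_depth > 0:
        # A random number of subfolders, each named after a meaningful word.
        for word in rng.sample(WORDS, k=rng.randint(0, max_subdirs)):
            created += build_decoy_tree(root / word, rng, max_depth - 1, max_subdirs, max_files)
    return created
```

Passing a seeded `random.Random` instance makes a generated layout reproducible, which helps when the same environment must be rebuilt after an attack.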

4.3.2. The File Monitor. The file monitor detects ransomware by monitoring file type changes and file entropy changes, a method proposed in 2016 [21]. The type of data stored in a file can be described by the order and position of specific byte values unique to a file type. Since files generally retain their file types and formatting in the deception environment, the bulk modification of such files should be considered suspicious. When the monitor sees this type of change, we can infer that a ransomware attack has occurred.

Entropy expresses the randomness of the characters in a string: the higher the entropy value, the stronger the randomness. The Shannon entropy of an array of bytes can be computed as the sum

$$e = \sum_{i=0}^{255} P_{B_i} \log_2 \frac{1}{P_{B_i}} \tag{1}$$

for $P_{B_i} = F_i / \mathit{totalbytes}$, where $F_i$ is the number of instances of byte value $i$ in the array. The entropy value is a number from 0 to 8; a value of 8 represents a byte array with a completely uniform distribution. Since the probability of each byte value occurring in encrypted ciphertext is basically the same, the entropy value of ciphertext will be close to the upper limit. Because ransomware always encrypts a large number of files, when we detect that a file changes to a high-entropy file in a short period of time and also changes its file type, we assume that the file is subject to a ransomware attack.
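Equation (1) and the combined check can be sketched as follows. The entropy function follows the formula directly; the `looks_encrypted` helper, its `magic` signature argument, and the 7.5 threshold are assumptions for illustration and are not values given in the paper.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte, from 0.0 to 8.0, as in equation (1)."""
    if not data:
        return 0.0
    total = len(data)
    # P_Bi = F_i / totalbytes, so log2(1 / P_Bi) = log2(total / F_i).
    return sum((n / total) * math.log2(total / n) for n in Counter(data).values())


def looks_encrypted(before: bytes, after: bytes, magic: bytes, threshold: float = 7.5) -> bool:
    """Flag a rewrite that loses the expected file signature and becomes high-entropy."""
    type_changed = before.startswith(magic) and not after.startswith(magic)
    return type_changed and shannon_entropy(after) >= threshold
```

A uniformly distributed byte array reaches the upper limit of 8, while a file of one repeated byte scores 0; compressed formats also score high, which is why the check above also requires the file signature to change.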

4.3.3. The Shared Folder Monitor. By traversing the disk storage in real time, the shared folder monitor discovers updates to the shared folders as they occur. It obtains the contents of the attacker's local files, which often goes unnoticed by the attacker, thus revealing some unexpected traceable clues. The monitor can access a list of paths to the attacker's shared folders.

As we originally observed, folders shared through Remote Desktop often have a path in the remote host with the prefix "tsclient". When the monitor's traversal of the storage reaches this prefix, it uses "FindFirstFile" to find the first file. It then uses "FindNextFile" with the returned handle to find the next file. When the resulting entry is a folder, it continues to traverse all files under that folder. Initially, the monitor tried to get the full file names and file contents by traversing the new shared folder. However, during actual experimentation, it was found that as the number of files in storage grows, the monitor takes far more time and resources to get all the file contents than just the file paths, and traversing everything is more likely to alert the attacker. Therefore, the monitor only obtains the file paths that the attacker shares from its host, with the help of "GetFileName". Moreover, in order to prevent encryption by the ransomware, the mounted disk monitor directly transfers the acquired list of shared file paths to another secure host.
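The path-only traversal can be sketched as follows. The monitor described in the text uses the Win32 calls FindFirstFile and FindNextFile; this sketch swaps in Python's `os.scandir` as a portable stand-in, and the function name is hypothetical. On a Remote Desktop session the root would be the redirected share (for example `\\tsclient\C`), which is an assumption about the deployment.

```python
import os


def list_shared_paths(root):
    """Yield every file path under `root` without opening any file."""
    stack = [root]
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)  # descend into the folder later
                    else:
                        yield entry.path
        except OSError:
            continue  # share disconnected or access denied; skip this folder
```

Because only directory entries are read, the cost grows with the number of files rather than with their size, which matches the reason given in the text for collecting paths instead of contents.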

5. Clue Extraction and Analysis

Through the deception environment, we trap the ransomware attacker and collect a lot of information that may contain many traceable clues. However, such traceable clues are often not visually observable and are complex in nature. In addition, much of the collected information is not helpful in tracing back ransomware attackers. Therefore, in order to assist in the analysis of the monitor information and extract the main effective traceable clues, in this section we describe how to extract clues and how to analyze the extracted traceable clues automatically.

5.1. Clue Extraction. We mainly obtain several kinds of clues from the extraction, including remote login information (IP addresses), network traffic, clipboard contents (pictures and texts), shared folder information (path strings), and ransomware samples (compile time and compile paths). The shared folder path clues can be obtained directly from the monitor. However, clipboard clues, compilation clues, and remote host clues are often not visually observable and are complex in nature. As a result, the extraction module mainly focuses on the extraction of remote host clues, clipboard string clues, and compilation clues.

Remote Host Clue Extraction. The IP address, port numbers, and folder path clues can be directly obtained from the login information and folder paths. In a network communication PCAP capture, the TCP packets that exchange configuration information are usually named "ClientData". We extract the client name field from the "ClientData" packet to obtain the attacker's hostname. In addition, the KeyboardLayout field indicates the default keyboard layout of the attacking host; e.g., the Chinese Simplified layout number is 0x0004 and the American English keyboard layout number is 0x0409. The remote user's idiomatic language (the mother tongue) can be found from the keyboard layout, which helps infer the attacker's nationality.

Clipboard Clue Extraction. The main data formats available to the clipboard monitor are Windows Bitmap, GDI file, ANSI characters, Unicode characters, and WAV audio data. We mainly aim at extracting traceable clues of character types. The monitor extracts character clues from the clipboard in various formats by checking the "DataType" value of the GetClipboardData API.

Compilation Clue Extraction. For all Windows RDP-based ransomware samples that we examined, we empirically observed that the most commonly used format is the PE file, especially *.exe and *.dll. Some PE files carry compilation information, and this information does not change when the program is moved between hosts. As a result, it is a good way to obtain the creator's information. A PE file mainly consists of five major components: DOS MZ header, DOS stub, PE header, section headers, and section content. Each component contains a great deal of information, but very little of it can be used to identify the creator, and some identifying information needs to be extracted from the content of each section. In this paper, the clues in PE files that can be extracted to trace back the attacker include file name, PE file type, compiler version, compilation path, compile time, last modified time, last open time, IP address, URL, domain name, language, string, wide character, and so on. We extract most of the clues with PEView [34]. However, since it cannot directly obtain the compilation path, we use the pefile [35] tool to extract paths by locating them in the PE structure.
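To make the compilation path clue concrete, the sketch below looks for CodeView "RSDS" records, which store a 16-byte GUID, a 4-byte age, and then the null-terminated PDB path. This is a simplified byte scan using only the standard library, not the pefile-based extraction used in this paper; it does not parse the PE debug directory, so a stray "RSDS" byte sequence could produce a false hit. The function name and the sample path in the usage note are hypothetical.

```python
RSDS_SIGNATURE = b"RSDS"
GUID_SIZE = 16
AGE_SIZE = 4


def find_pdb_paths(blob: bytes) -> list[str]:
    """Return PDB paths found in CodeView RSDS records inside raw file bytes."""
    paths = []
    start = 0
    while True:
        idx = blob.find(RSDS_SIGNATURE, start)
        if idx == -1:
            break
        # The path follows the signature, the GUID, and the age field.
        name_start = idx + len(RSDS_SIGNATURE) + GUID_SIZE + AGE_SIZE
        name_end = blob.find(b"\x00", name_start)
        if name_end == -1:
            break
        raw = blob[name_start:name_end]
        if raw:
            paths.append(raw.decode("utf-8", errors="replace"))
        start = idx + len(RSDS_SIGNATURE)
    return paths
```

For a file whose debug record holds a path such as `C:\Users\Dell\proj\Release\a.pdb`, the function returns that path as a single list element, which can then be passed to the path analysis.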

Since the extracted clues come in different encoding formats, to facilitate observation and to unify subsequent analysis, the extraction system converts all the obtained data into the UTF-8 format and saves it in a SQLite database. Before being submitted to the analysis system, the extracted clues are divided into two categories: string clues and path clues.
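A minimal sketch of this storage step is given below, assuming the source encoding of each clue is known when it is captured. The table layout, column names, and function names are hypothetical and only illustrate decoding to text and saving it in SQLite, which stores text as UTF-8 by default.

```python
import sqlite3


def open_clue_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a clue database with one table for both clue categories."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS clues ("
        " id INTEGER PRIMARY KEY,"
        " kind TEXT NOT NULL CHECK (kind IN ('string', 'path')),"
        " value TEXT NOT NULL)"
    )
    return conn


def store_clue(conn: sqlite3.Connection, kind: str, raw: bytes, source_encoding: str) -> None:
    """Decode a raw clue from its source encoding and save it as text."""
    text = raw.decode(source_encoding, errors="replace")
    conn.execute("INSERT INTO clues (kind, value) VALUES (?, ?)", (kind, text))
    conn.commit()
```

Keeping the clue category in a `kind` column lets the analysis system select string clues and path clues separately.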

5.2. Automatic Analysis. At this point, the clues extracted by this system mainly include string clues and path clues. The number of string clues is large, and they are mixed with a large number of unidentifiable strings. Because the number of path clues is also very large and they have no semantic correlation, it is difficult to identify traceable clues manually. So we focus on how to automate the identification of traceable clues for path clues.

5.2.1. Users and Programs Analysis. In the analysis of the path clues, we first propose to obtain attacker clues by identifying context-related segments within the same path. For instance, each user has a separate user folder, located in the "Users" folder under drive C. As a result, the system can obtain the user name on the attacking host from the folder name under the "Users" folder (e.g., C:\Users\Dell\). The "Program Files" folder usually contains the names of the software programs installed on the machine (e.g., C:\Program Files\Microsoft Visual Studio 11.0). What is more, the QQ account number is always located in the "QQ\QQfile" folder (e.g., D:\QQ\QQfile\86******086\FileRecv).
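The context-based rules above can be sketched with a few regular expressions. The patterns and the function name are assumptions made for illustration; they cover only the three folder conventions mentioned in the text, and the QQ number in the test data is made up.

```python
import re

# Each pattern captures the path segment that follows a well-known parent folder.
USER_RE = re.compile(r"^[A-Za-z]:\\Users\\([^\\]+)", re.IGNORECASE)
PROGRAM_RE = re.compile(r"^[A-Za-z]:\\Program Files(?: \(x86\))?\\([^\\]+)", re.IGNORECASE)
QQ_RE = re.compile(r"\\QQfile\\([^\\]+)", re.IGNORECASE)


def extract_path_clues(paths):
    """Return user names, program names, and QQ account folders found in paths."""
    clues = {"users": set(), "programs": set(), "qq_accounts": set()}
    rules = (("users", USER_RE), ("programs", PROGRAM_RE), ("qq_accounts", QQ_RE))
    for path in paths:
        for key, pattern in rules:
            match = pattern.search(path)
            if match:
                clues[key].add(match.group(1))
    return clues
```

Applied to `C:\Users\Dell\Desktop\a.txt`, the sketch reports `Dell` as a user name; default profile folders would still need to be filtered out in practice.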

In this way, the analysis system can quickly and accurately get host names, email accounts, program names, social software numbers, and other traceable clues that are carried in the attacker's file paths. However, such user information makes up only a small part of the overall clues and is acquired by chance. Therefore, we conduct further analysis on this basis.

5.2.2. Account Analysis. Through the analysis of APT1 and some other attribution reports, we find that the mapping between the attacker and a physical-world identity can be better obtained by analyzing the account numbers left by the attacker. This information includes, but is not limited to, the location of the IP address, the spelling and registration of the domain name, the IP address corresponding to a URL, and the domain name of the mailbox account. Because it is difficult to identify this information effectively in a large number of string and path clues, the analysis system automatically identifies IP addresses, domain names, URLs, and mailbox accounts by regular expression matching. Then, with the help of threat intelligence and big data technology, more relevant clues are obtained.
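The regular expression matching step might look like the sketch below. The patterns are simplified assumptions, not the expressions used by the analysis system; to limit false positives from file names such as "a.txt", domain names are taken only from matched URLs and mailbox accounts rather than from a free-standing domain pattern.

```python
import re
from urllib.parse import urlsplit

OCTET = r"(?:25[0-5]|2[0-4]\d|1?\d?\d)"
IPV4_RE = re.compile(r"\b" + OCTET + r"(?:\." + OCTET + r"){3}\b")
EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")
URL_RE = re.compile(r"\bhttps?://[^\s\"'<>]+", re.IGNORECASE)


def extract_accounts(text: str) -> dict:
    """Find IPv4 addresses, URLs, mailbox accounts, and their domain names."""
    ips = IPV4_RE.findall(text)
    urls = URL_RE.findall(text)
    emails = EMAIL_RE.findall(text)
    domains = {urlsplit(u).hostname for u in urls}
    domains |= {e.split("@", 1)[1] for e in emails}
    # Drop empty hosts and hosts that are plain IP addresses.
    domains = {d for d in domains if d and not IPV4_RE.fullmatch(d)}
    return {
        "ips": sorted(set(ips)),
        "urls": sorted(set(urls)),
        "emails": sorted(set(emails)),
        "domains": sorted(domains),
    }
```

The extracted values could then be looked up in a threat intelligence source, which is outside the scope of this sketch.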

5.2.3. Language Identification. The languages used on a host often help to determine an attacker's idiomatic language. However, because there are a large number of languages across different countries and some languages are highly similar, we use an automatic analysis system to identify the language of the clues. We tested the accuracy of two language identification toolkits using entire path information in four different languages and found the "langid.py" toolkit to be more accurate overall than the "langdetect" toolkit. The comparison results are shown in Figure 3. langid.py is a language identification toolkit developed by Lui and Baldwin at the University of Melbourne [36]. It combines a naive Bayes classifier with cross-domain feature selection to provide domain-independent language identification.

5.2.4. Traceable Strings Identification. When traceable strings need to be identified, traditional string analysis methods usually use Named Entity Recognition (NER). However, the clues to be analyzed mainly consist of strings and paths. The strings before and after a path separator have few semantic correlations. What is more, the strings between path separators and the remaining strings to be analyzed are mostly semantically unrelated due to their limited length. So this paper proposes an algorithm that can quickly and automatically identify the traceable strings in strings and paths.

In order to filter out meaningful traceable clues related to the attacker's identity, the path clues and string clues are split into strings and then screened for common words and gibberish in the following steps. Figure 4 shows the process of the automatic traceable clue identification system.

Figure 3: Accuracy of two language identification tools (bar chart of correct and incorrect recognitions by the langid and langdetect toolkits for English, Chinese, Russian, and Korean). In tests using the same known-language path clues, the accuracy of the langid toolkit for English, Chinese, Russian, and Korean was 88%, 100%, 99%, and 99.0%, respectively, while the accuracy of the langdetect toolkit was 73%, 0.85%, 40%, and 4.60%, respectively.

Figure 4: Process of automatically identifying traceable clues (flowchart with the stages Path Clues, Path Split, Stop Words, String Clues, Sentence Tokenize, Word Tokenize, Strings, Sign Split, Alphabetic Case Characteristics, Common Words Detect, Gibberish Detect, and Result).

Make Stop Words. The system splits each path by the path delimiter. Separated path strings that are common to multiple computers have no identifying effect, so we take the file name strings that are common to 20 normal user computers as stop words. The system then removes the stop words from the strings after each split.
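The stop-word step can be sketched as follows. How many of the reference computers must share a segment before it counts as a stop word is not specified above, so the `min_hosts` parameter is an assumption, as are the function names and the lower-casing of segments.

```python
import re
from collections import Counter


def split_path(path: str) -> list[str]:
    """Split a path into its segments on backslashes or forward slashes."""
    return [seg for seg in re.split(r"[\\/]+", path) if seg]


def build_stop_words(paths_by_host: dict, min_hosts: int = 2) -> set:
    """Segments that occur on at least min_hosts reference computers."""
    host_counts = Counter()
    for paths in paths_by_host.values():
        # Count each segment once per host, regardless of how often it occurs there.
        segments = {seg.lower() for p in paths for seg in split_path(p)}
        host_counts.update(segments)
    return {seg for seg, n in host_counts.items() if n >= min_hosts}


def remove_stop_words(path: str, stop_words: set) -> list[str]:
    """Keep only the segments of path that are not stop words."""
    return [seg for seg in split_path(path) if seg.lower() not in stop_words]
```

With two reference hosts that both contain `C:\Users\...\Documents`, the segments `c:`, `users`, and `documents` become stop words, and only host-specific segments survive.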

Segmentation. After removing the stop words, we separate sentences and words for each segment of the path with a tokenizer (e.g., NLTK, NER) [37]. However, as the segments of a path are not semantically related, this segmentation is less effective. Based on our extensive study of path information, users often prefer to use symbols such as underscores to split file names. In addition, they also like to use capitalization in multiword strings to facilitate reading. As a result, the system uses the common separator symbols in file names and alphabetic case characteristics to segment the strings again.
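A possible form of the symbol and case based segmentation is shown below. The set of separator symbols and the handling of digits and non-ASCII runs are assumptions of this sketch; other ASCII symbols are simply dropped.

```python
import re

SEPARATOR_RE = re.compile(r"[\s_\-.()\[\]]+")
# Upper-case runs, capitalized or lower-case words, digit runs, non-ASCII runs.
TOKEN_RE = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+|[^\x00-\x7f]+")


def split_identifier(segment: str) -> list[str]:
    """Split a path segment on separator symbols and on changes of letter case."""
    tokens = []
    for part in SEPARATOR_RE.split(segment):
        tokens.extend(TOKEN_RE.findall(part))
    return tokens
```

For example, `RansomTracer_finalReport-v2` is split into `Ransom`, `Tracer`, `final`, `Report`, `v`, and `2`.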

Common Words Removal. After segmentation, the system obtains many repeated strings. But most of them are commonly used words and do not help in identifying ransomware attackers. As a result, we filter out the common words by comparing each string with a dictionary. If the string can be matched in the dictionary, the system marks the common word property of the string as false; otherwise, it marks it as true.
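The tagging rule can be written down in a few lines. The tiny word set below is a hypothetical stand-in for a full dictionary, and the function name is chosen for this sketch only.

```python
# Hypothetical mini-dictionary; a real system would load a full word list.
COMMON_WORDS = {"program", "file", "desktop", "document", "new", "folder"}


def tag_common_words(tokens, dictionary=COMMON_WORDS):
    """Map each token to its common-word tag: False for dictionary words, True otherwise."""
    return {token: token.lower() not in dictionary for token in tokens}
```

Tokens tagged True remain candidates for the gibberish check, while tokens tagged False are discarded as ordinary vocabulary.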

Gibberish Detect. Based on our analysis and observation, we found that there is a large amount of garbled information in path clues. Most of these strings are generated randomly; people can recognize this randomness manually, but it is difficult for a program to recognize it automatically. As a result, we propose detecting gibberish by training a Markov chain model with English texts. For each string $X$, the probability of the $(i+1)$-th character of $X$ depends only on the $i$-th character:

$$P(X_{i+1} = x \mid X_1 = x_1, X_2 = x_2, \ldots, X_i = x_i) = P(X_{i+1} = x \mid X_i = x_i) \tag{2}$$

Each letter is related to the letters immediately before and after it. This relationship can be expressed by 2-grams. For example, the string "Ransomware" yields the pairs Ra, an, ns, ..., e[space]. The analysis system records how often characters appear next to each other in a collection of gibberish and some commonly used normal phrases or vocabulary. It normalizes the counts after reading through the training data and then measures the probability of generating a string from these statistics by multiplying the probabilities of the pairs of adjacent characters in the string [38]. From the training data, the system computes the average transition probability for gibberish and for normal strings, respectively. This probability measures how likely the string is according to the data observed by the model. When we test the string "Ransom", it computes

$$\mathrm{prob}[\text{'Ransom'}] = \mathrm{prob}[\text{'R'}][\text{'a'}] \times \mathrm{prob}[\text{'a'}][\text{'n'}] \times \cdots \times \mathrm{prob}[\text{'m'}][\text{' '}] \tag{3}$$

If the input string is gibberish, it will pass through some pairs with very low counts in the training phase and hence have a low probability. The system then looks at the probability per character for a few known normal strings and a few examples of known gibberish, and picks a threshold between the highest probability among the gibberish examples and the lowest probability among the normal strings:

$$\mathit{threshold} = \frac{\min(\mathit{normal\_probs}) + \max(\mathit{gibberish\_probs})}{2} \tag{4}$$

When we analyze a string 'X', if $\mathrm{prob}[\text{'X'}] > \mathit{threshold}$, the analysis system views the string as a normal string. If $\mathrm{prob}[\text{'X'}] \le \mathit{threshold}$, the analysis system marks the string as 'False'. The system then removes the strings of the 'False' type as gibberish.
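The training, scoring, and threshold steps can be put together in a small sketch. It is an illustration under several assumptions rather than the system's actual implementation: the alphabet is limited to lower-case letters and the space, unseen pairs receive a pseudo-count (`smoothing`) so that no probability is zero, and the product of pair probabilities is computed in log space and averaged per pair so that strings of different lengths are comparable. No trailing space is appended to the tested string.

```python
import math

ALPHABET = "abcdefghijklmnopqrstuvwxyz "


def bigrams(text):
    """Adjacent character pairs of text, restricted to ALPHABET."""
    chars = [c for c in text.lower() if c in ALPHABET]
    return list(zip(chars, chars[1:]))


def train(lines, smoothing=1.0):
    """Estimate log transition probabilities from training lines."""
    counts = {a: {b: smoothing for b in ALPHABET} for a in ALPHABET}
    for line in lines:
        for a, b in bigrams(line):
            counts[a][b] += 1
    model = {}
    for a, row in counts.items():
        total = sum(row.values())          # normalize each row of counts
        model[a] = {b: math.log(n / total) for b, n in row.items()}
    return model


def avg_transition_prob(text, model):
    """Geometric mean of the pair probabilities along text (0.0 if no pairs)."""
    pairs = bigrams(text)
    if not pairs:
        return 0.0
    log_sum = sum(model[a][b] for a, b in pairs)
    return math.exp(log_sum / len(pairs))


def pick_threshold(normal, gibberish, model):
    """Midpoint between the weakest normal string and the strongest gibberish."""
    normal_probs = [avg_transition_prob(s, model) for s in normal]
    gibberish_probs = [avg_transition_prob(s, model) for s in gibberish]
    return (min(normal_probs) + max(gibberish_probs)) / 2


def is_normal(text, model, threshold):
    """True if text scores above the threshold, i.e. is not treated as gibberish."""
    return avg_transition_prob(text, model) > threshold
```

After training on ordinary English text and picking the threshold from a few labeled examples, strings such as "vwrtjty" fall below the threshold because they pass through character pairs that were rarely or never seen during training.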


Table 1: Traceable strings recognized result [13].

| String | Common Words Recognized | Gibberish Recognized | Final Recognized |
| --- | --- | --- | --- |
| Wh****ter | True | True | True |
| gandcrab | True | True | True |
| kate z | True | True | True |
| vwrtjty | True | False | False |
| program | False | True | False |

Identification Result. After the above analysis, the automated analysis system outputs the strings for which both the common word tag and the gibberish tag are True, which form the list of traceable clues [13]. Table 1 is reproduced from Z. H. Wang et al. (2018) [under the Creative Commons Attribution License/public domain].

When the string is a normal word, such as "program", the common words recognition marks it "False". Gibberish recognition marks as "True" only identifier strings that were not randomly generated; it marks as "False" strings that do not conform to the spelling patterns of common words (e.g., vwrtjty). A string is considered to have auxiliary traceability only if both of the analysis values are true. We rely extensively on string inversion to verify the accuracy of the system and find that most of the traceable strings on the volunteers' storage can be recognized. Table 1 shows the recognition results for several typical strings.

6. Evaluation

In this paper, we evaluate the proposed method with two experiments. The goal of the first experiment is to demonstrate that the proposed method can help trace back RDP-based ransomware attackers and capture their private information. The goal of the second experiment is to demonstrate that the method can also automatically recognize clues in ransomware samples and help trace back ransomware makers.

6.1. RDP-Based Ransomware Attacker

6.1.1. Clue Capture and Analysis. We used Windows 7 virtual machines to deploy the deception environment and assess its effectiveness. While Windows 7 is not required, it was chosen because of its wide range of applications and because it is one of the main targets of ransomware. We invited 122 professional volunteers to help with the experiment and provided them with an experimental fleet of 12 virtual hosts. Most of the volunteers' login information, clipboard content, shared folder paths, and uploaded PE files were successfully captured by the monitor system.

We chose the path information collected from 12 volunteers' computers and recorded the rate of convergence after each step of the analysis. Figure 5 shows how the number of remaining traceable clues changes as the data is processed by each step of the analysis system [13]. Figure 5 is reproduced from Z. H. Wang et al. (2018) [under the Creative Commons Attribution License/public domain].

Figure 5: The data volume changes in 12 volunteers' traceable clues through the analysis system (chart titled "Analysis Data Change Figure"; x-axis: analysis process with the stages source, stopword, tokenize, gibberish, and result; y-axis: data quantity; one series per volunteer, volunteer1 to volunteer12). The screening rate can reach 98.34% [13].

6.2. Traceback Attackers. Next, we carried out a detailed analysis of the information left over from one of the volunteers' attacks. From the data produced by the automatic analysis of an attacker, we can infer that the attacker's real identity may have the following characteristics: (1) the attacker is likely to come from China; (2) the attacker may work in security or a related field; (3) the attacker may work at the Chinese Academy of Sciences and have relations with a large security manufacturer in China. The conclusion is shown in Figure 6, which presents the results from the analysis system in five sections.

Username. Through automatic analysis, a total of three user names from the attacker's host were extracted from a large number of clues. From the "Dell" user, it can be assumed that the attacker was using a Dell host.

Programs. The automatic analysis system identified several typical software programs installed on the attacker's host. For instance, "QQ" is a widely used real-time social tool in China. "AliPay" is an online payment tool developed by Alibaba and widely used in China. "Sogou input" is a Chinese input method software developed by the Chinese company Sogou. "MeiTu" is a photo beautification software developed by a Chinese company and widely used in China. In addition, "Sinfor" and "360" are both well-known security manufacturers in China with a large number of security-related products. "Visual Studio" is a common development software. We found that much Chinese-made software widely used in China was installed on the attacker's host, as well as some development and security products that are less often installed by ordinary users.

Figure 6: Analysis of an attacker using the automatic analysis system, displaying traceable information about the attacker in five aspects: host user name (UserName), program software (Programs), keyboard layout (KeyboardLayout), account information (Account), and traceable strings (TraceableStrings).

Keyboard Layout. Through network communication analysis, we find that the attacker's keyboard layout is Simplified Chinese, from which it can be inferred that the attacker's native language is probably Chinese. This is consistent with the software installed on the host.

Account Number. The system automatically identified two QQ accounts and two e-mail accounts. The e-mail accounts are Chinese Academy of Sciences business accounts, which can confirm the above assumption that the attacker is from China and is likely to work at the Chinese Academy of Sciences. Through the registration information of the QQ account (as shown in Figure 7), we can find the following information: (1) the attacker could be a male aged 32; (2) he does Internet-related work, likely with 360 Security; (3) he is located in Haidian District, Beijing, China. In the registration information of the "353***208" account, the mail, work, and other fields are empty, and the nickname is a special English string, "Wh***ter"; it can be presumed that this account is likely for private use.

Traceable Strings. The automatic analysis system sifts out strings from the monitor data that may be traceable. "Freebuf" is a security information exchange website commonly used by Chinese security personnel. "Botnet" is a secondary attack method that is often used by attackers and is often a main research direction for security researchers. "**e" and "**t" are acronyms for departments under the Chinese Academy of Sciences, while "A***team" is a security research team in the "**e" department. "Visumantrag" is often used when processing visas from various countries. "Wh***ter" matches the nickname of the QQ account "353***208" and is likely to be the attacker's regular nickname.

Figure 7: The registration infographic of QQ account 372***82.

6.3. Ransomware Maker Automatic Analysis. In order to verify that the analysis system can be used to trace back ransomware makers, we obtained more than 8000 ransomware samples from VirusTotal [39] and extracted the Program Data Base (PDB) path from the samples (as shown in Figure 8). This shows that, although a large number of attackers deliberately erase the compilation information, legacy PDB path information can still be found in a large number of ransomware samples.

Table 2: Same identifier for different samples [13]. Columns: Sample Hash; Same Identity Strings.

68dd613973f8a***7befe0f4b461d258ce569b9020079***9ae0f090266a60cc

jqxxpphoaonkje7z2fa0c084aef41969***1f427a1b65e1b1464fa82d297e712***e1fdf312c044814a36da0286d42cd***306fba99239c8238821beaf40744be***ba862dbc6d90f70cbc839bcb88***7422bbd2fa812dd3de02391b6a9f1***0762a6f49cf32501

Figure 8: Of the 8075 real ransomware samples, 11% (878 samples) allowed the Program Data Base (PDB) path to be extracted, while the authors of the remaining 89% (7197 samples) may have chosen to hide the PDB path during compilation.

What is more, the analysis system was used to automatically analyze the PDB paths of more than 800 different ransomware samples. We found multiple identity strings shared across the analysis results of different samples. One of the typical results is shown in Table 2 [13]. Table 2 is reproduced from Z. H. Wang et al. (2018) [under the Creative Commons Attribution License/public domain]. We used VirusTotal to validate the samples and found that they were all Cobra family ransomware. As a result, it can be assumed that these samples were made by the same ransomware maker. When one sample is traced back, the conclusion about the ransomware maker also applies to the other samples. When different traceable information is obtained from different samples, the identity of the ransomware maker can be described by integrating this information.

Language Identification. When we carried out language recognition on the PDB paths of 878 samples, the result was as shown in Figure 9. The system automatically identified the languages of the paths, which mainly comprise 11 languages, of which English accounts for the largest proportion.

Figure 9: Languages identified in the 878 ransomware sample PDB paths. English accounted for 510 (58.09%), and the remaining identified languages were Chinese 31 (3.53%), Dutch 40 (4.56%), Norwegian Nynorsk 4 (0.46%), Slovenian 5 (0.57%), Norwegian Bokmal 1 (0.11%), German 58 (6.61%), Italian 3 (0.34%), Hungarian 35 (3.99%), Maltese 18 (2.05%), Polish 5 (0.57%), Finnish 151 (17.20%), Luxembourgish 3 (0.34%), Danish 10 (1.14%), and Spanish 4 (0.46%).

Actual Case Analysis. The automatic identification system helped us identify the traceable strings in the PDB path. Take as an example the following sample PDB path (the Chinese has been translated into English), with an MD5 value of "60c6a92afb***6d0c16f7":

"E:\LIAO\STUDIO\UAC\simulated an improved version of the 360 anti-virus program 1117 Bale1*41**2****0(**1) 9818 program\WriteSystem32\Release\VirtualDesktop.pdb"

We can infer from the automatic analysis result that the ransomware maker is likely from China. This is supported by the following points.

(1) Language Identification. Identifying the language of the path suggests that the attacker may have come from China. By reading the Chinese strings in the path, we find that the text refers to an improvement meant to bypass detection by 360 security software, which has a large market in China.

(2) IP Address. The system identified an IP address in the PDB path. With the help of a threat intelligence database, we found that the IP came from Xinjiang, China (Figure 10).


Figure 10: Partial infographic of IP 1*41**2****0 in the threat intelligence library.

(3) Traceable Strings. The extracted identifier string "LIAO" resembles Chinese pinyin and is most likely a Chinese surname. To sum up, we speculate that the ransomware maker is most likely from China.

7. Conclusion

In this paper, we focus on tracing back RDP-based ransomware. Recently, more and more ransomware attackers have been using RDP attacks to spread ransomware with impunity due to the use of strong cryptography. We propose tracing back the attack sources to deter RDP-based ransomware with the proposed deception environment. It collects traceable clues and performs automatic analysis by using natural language processing and machine learning techniques. The evaluation shows that it is able to trap attackers and collect traceable clues left by attackers in a deception environment. With automatic clue identification, it can reduce the amount of traceable clues to about 2% of the collected data. We use this method to analyze two practical cases and show its effectiveness. By tracing back ransomware attackers, we can provide a strong deterrent to stifle the development of ransomware.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Additional Points

This manuscript provides additional deception environment monitors and analytical methods by using more volunteers and samples. It proves the validity of the methods through specific traceback cases (i.e., tracing back the ransomware attacker and tracing back the ransomware maker).

Disclosure

An earlier version of this paper was presented at the IEEE Third International Conference on Data Science in Cyberspace, Guangzhou, China, 18-21 June 2018. The authors' initial conference paper focused mainly on the effectiveness of deception environment monitors and the screening rate of the analysis system.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors would like to thank VirusTotal for providing ransomware samples and the volunteers from the University of Chinese Academy of Sciences. This work is supported in part by the National Key Research and Development Plan (Grant no. 2018YFB0803504) and the National Natural Science Foundation of China (61871140, 61572153, U1636215, 61572492, and 61672020).

References

[1] A. Kharraz, "UNVEIL: A Large-Scale, Automated Approach to Detecting Ransomware," in Proceedings of the 25th USENIX Security Symposium (USENIX Security 16), pp. 757–772, USENIX Association, 2016.

[2] K. Savage, P. Coogan, and H. Lau, "The evolution of ransomware," Tech. Rep., 2015.

[3] A. Kharraz, W. Robertson, D. Balzarotti, L. Bilge, and E. Kirda, "Cutting the Gordian Knot: A Look Under the Hood of Ransomware Attacks," in Detection of Intrusions and Malware, and Vulnerability Assessment, vol. 9148 of Lecture Notes in Computer Science, pp. 3–24, Springer International Publishing, Cham, 2015.

[4] S. Mohurle and M. Patil, "A brief study of Wannacry Threat: Ransomware Attack 2017," International Journal of Advanced Research in Computer Science, vol. 8, no. 5, 2017.

[5] X. Yu, Z. Tian, J. Qiu, and F. Jiang, "A Data Leakage Prevention Method Based on the Reduction of Confidential and Context Terms for Smart Mobile Devices," Wireless Communications and Mobile Computing, vol. 2018, Article ID 5823439, 11 pages, 2018.

[6] J. Yaneza, "Brute Force RDP Attacks Plant CRYSIS Ransomware," https://blog.trendmicro.com/trendlabs-security-intelligence/brute-force-rdp-attacks-plant-crysis-ransomware.

[7] "10% of Ransomware Attacks on SMBs Targeted IoT Devices," https://www.darkreading.com/application-security/10--of-ransomware-attacks-on-smbs-targeted-iot-devices/d/d-id/1329817.

[8] Q. Tan, Y. Gao, J. Shi, X. Wang, B. Fang, and Z. H. Tian, "Towards a Comprehensive Insight into the Eclipse Attacks of Tor Hidden Services," IEEE Internet of Things Journal, 2018.

[9] "Kaspersky Security Bulletin: STORY OF THE YEAR 2017," 2017, https://media.kasperskycontenthub.com/wp-content/uploads/sites/43/2018/03/07164824/KSB_Story_of_the_Year_Ransomware_FINAL_eng.pdf.

[10] Z. Tian, Y. Cui, L. An, et al., "A Real-Time Correlation of Host-Level Events in Cyber Range Service for Smart Campus," IEEE Access, vol. 6, pp. 35355–35364, 2018.

[11] R. Ross et al., Managing Information Security Risk: Organisation, Mission, and Information System View, National Institute of Standards and Technology, 2011, http://csrc.nist.gov/publications/nistpubs/800-39/SP800-39-final.pdf.

[12] Y. Wang, Z. Tian, H. Zhang, S. Su, and W. Shi, "A Privacy Preserving Scheme for Nearest Neighbor Query," Sensors, vol. 18, no. 8, p. 2440, 2018.

[13] Z. H. Wang, X. Wu, C. G. Liu, Q. X. Liu, and J. L. Zhang, "RansomTracer: Exploiting Cyber Deception for Ransomware Tracing," in Proceedings of the IEEE Third International Conference on Data Science in Cyberspace, pp. 227–234, 2018.

[14] B. A. S. Al-rimy, M. A. Maarof, and S. Z. M. Shaid, "Ransomware threat success factors, taxonomy, and countermeasures: A survey and research directions," Computers & Security, vol. 74, pp. 144–166, 2018.

[15] F Jiang Y Fu B B Gupta et al ldquoDeep Learning based Multi-channel intelligent attack detection for Data Securityrdquo IEEETransactions on Sustainable Computing 2018

[16] N Andronio S Zanero and F Maggi ldquoHelDroid dissectingand detecting mobile ransomwarerdquo in Research in AttacksIntrusions and Defenses vol 9404 of Lecture Notes in ComputerScience pp 382ndash404 Springer 2015

[27] K. Cabaj, P. Gawkowski, K. Grochowski, and D. Osojca, “Network activity analysis of CryptoWall ransomware,” Przegląd Elektrotechniczny, vol. 91, no. 11, pp. 201–204, 2015.

[28] J. Chen, Z. Tian, X. Cui, L. Yin, and X. Wang, “Trust Architecture and Reputation Evaluation for Internet of Things,” Journal of Ambient Intelligence & Humanized Computing, vol. 2, pp. 1–9, 2018.

[29] C. Le Guernic and A. Legay, “Ransomware and the Legacy Crypto API. Paper presented at the Risks and,” in Risks and Security of Internet and Systems, 11th International Conference, CRiSIS 2016, Roscoff, France, 2017.

[30] L. Pingree, Emerging Technology Analysis: Deception Techniques and Technologies Create Security Technology Business Opportunities, Gartner, 2015.

[31] “Ransomware and Businesses,” https://www.symantec.com/content/dam/symantec/docs/security-center/white-papers/ransomware-and-businesses-16-en.pdf, 2016.

[32] A. Kharraz and E. Kirda, “Redemption: Real-Time Protection Against Ransomware at End-Hosts,” in Proceedings of the International Symposium on Research in Attacks, Intrusions, and Defenses, pp. 98–119, 2017.

[33] F. Bergstrand, J. Bergstrand, and H. Gunnarsson, Localization of Spyware in Windows Environments.

[34] “PEView,” https://www.aldeid.com/wiki/PEView.

[35] “python-pefile,” https://pypi.python.org/pypi/pefile.

[36] M. Lui and T. Baldwin, “langid.py: An off-the-shelf language identification tool,” in Proceedings of the ACL 2012 system demonstrations, Association for Computational Linguistics, pp. 25–30, 2012.

[37] S. Bird and E. Loper, “NLTK: the natural language toolkit,” in Proceedings of the ACL 2004 on Interactive poster and demonstration sessions, Association for Computational Linguistics, Barcelona, Spain, July 2004.

[38] “Rrenaud Gibberish-Detector,” https://github.com/rrenaud/Gibberish-Detector/blob/master/README.rst.

[39] “VirusTotal,” http://virustotal.com.


tongue) can be found by the keyboard layout to infer the attacker’s nationality.

Clipboard Clue Extraction. The main data formats available to the clipboard monitor are Windows Bitmap, GDI file, ANSI characters, Unicode characters, and WAV audio data. We mainly aim at extracting traceable clues of the character types. The monitor extracts character clues from the clipboard in various formats by judging the “DataType” value of the GetClipboardData API.

Compilation Clue Extraction. For all Windows RDP-based ransomware samples that we examined, we empirically observed that the most commonly used format is the PE file, especially *.exe and *.dll. Some PE files carry compilation information, and this information does not change when the program is moved to another host. As a result, it is a good way to obtain information about the creator. A PE file mainly consists of five major components: the DOS MZ header, DOS stub, PE header, section headers, and section content. Each component contains a great deal of information, but very little of it can be used directly to identify the creator, and some identification information needs to be extracted from the content of each section. In this paper, the clues that can be extracted from PE files to trace back the attacker include the file name, PE file type, compiler version, compilation path, compile time, last modified time, last open time, IP address, URL, domain name, language, string, wide character, and so on. We extract most of the clues with PEView [34]. However, since it cannot directly obtain the compilation path, we use the pefile [35] tool to extract paths by locating them in the PE structure.
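To illustrate where a compilation path can live inside a PE file, the following minimal sketch scans raw bytes for CodeView “RSDS” debug records, which store a NUL-terminated PDB path after a 16-byte GUID and a 4-byte age. This is not the pefile-based extraction used in this paper: it is a standard-library heuristic that skips the debug directory entirely, so it may report false positives on arbitrary data. The function name `find_pdb_paths` and the sample path are hypothetical.

```python
import struct

def find_pdb_paths(data):
    """Return PDB paths found in CodeView 'RSDS' records inside raw PE bytes."""
    paths = []
    pos = data.find(b"RSDS")
    while pos != -1:
        # Record layout: 4-byte signature, 16-byte GUID, 4-byte age,
        # then a NUL-terminated PDB path.
        start = pos + 4 + 16 + 4
        end = data.find(b"\x00", start)
        if end > start:
            paths.append(data[start:end].decode("utf-8", errors="replace"))
        pos = data.find(b"RSDS", pos + 4)
    return paths

# Synthetic bytes standing in for a PE file (hypothetical path, not a real sample).
record = (b"RSDS" + bytes(16) + struct.pack("<I", 1)
          + b"C:\\Users\\Dell\\demo\\Release\\demo.pdb\x00")
sample = b"MZ" + bytes(64) + record + bytes(8)
print(find_pdb_paths(sample))
```

A production extractor would instead walk the PE debug directory, which is what a PE parsing library does, so that only genuine debug records are read.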

Since the extracted clues come in different encoding formats, the extraction system converts all of the obtained data into UTF-8 and saves it in an SQLite database, which facilitates observation and gives the subsequent analysis a unified format. Before being submitted to the analysis system, the extracted clues are divided into two categories: string clues and path clues.
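The storage step can be sketched as follows. The table layout, the function names, and the rule that a clue containing a path separator is a path clue are all assumptions made for illustration; the paper does not specify them. The sketch also assumes that each monitor reports the source encoding of the bytes it captured.

```python
import sqlite3

def open_store():
    """Create an in-memory SQLite store for clues (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE clues ("
        "id INTEGER PRIMARY KEY, category TEXT NOT NULL, value TEXT NOT NULL)"
    )
    return conn

def store_clue(conn, raw, source_encoding):
    """Decode a captured clue to text, classify it, and save it."""
    # Python strings are written to SQLite TEXT columns as UTF-8 by default.
    text = raw.decode(source_encoding, errors="replace")
    # Assumed rule: anything containing a path separator is a path clue.
    category = "path" if ("\\" in text or "/" in text) else "string"
    conn.execute(
        "INSERT INTO clues (category, value) VALUES (?, ?)", (category, text)
    )
    return category

conn = open_store()
store_clue(conn, "C:\\Users\\Dell".encode("utf-16-le"), "utf-16-le")
store_clue(conn, b"freebuf", "ascii")
rows = conn.execute("SELECT category, value FROM clues ORDER BY id").fetchall()
print(rows)
```

Note that under this simple rule a URL would also be stored as a path clue; a real system would need a finer classification.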

5.2. Automatic Analysis. At this point, the clues extracted by the system mainly include string clues and path clues. The number of string clues is large, and they are mixed with a large number of unidentifiable strings. Because the number of path clues is also very large and they have no semantic correlation, it is difficult to identify traceable clues manually. So we focus on how to automate the identification of traceable clues for path clues.

5.2.1. Users and Programs Analysis. In the analysis of the path clues, we first obtain attacker clues by identifying context-related segments on the same path. For instance, each user has a separate user folder located in the “Users” folder under drive C. As a result, the system can obtain the user name at the attacking host from the folder name under the “Users” folder (e.g., C:\Users\Dell). The “Program Files” folder usually contains the names of the software programs installed on the machine (e.g., C:\Program Files\Microsoft Visual Studio 11.0). What is more, the QQ account number is always located in the “QQ\QQfile” folder (e.g., D:\QQ\QQfile\86******086\FileRecv).

In this way, the analysis system can quickly and accurately obtain host names, email accounts, program names, social software numbers, and other traceable clues that are carried in the attacker’s file paths. However, such user information accounts for only a small part of the overall clues and is acquired by chance. Therefore, we conduct further analysis on this basis.
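The context-based extraction described above can be sketched with a few regular expressions. The patterns, the function name `extract_path_clues`, and the example paths (including the QQ number) are hypothetical and only cover the three folder conventions mentioned in the text.

```python
import re

# Each pattern captures the folder name that follows a well-known parent folder.
USER_RE = re.compile(r"\\Users\\([^\\]+)", re.IGNORECASE)
PROGRAM_RE = re.compile(r"\\Program Files(?: \(x86\))?\\([^\\]+)", re.IGNORECASE)
QQ_RE = re.compile(r"\\QQ\\QQfile\\(\d+)", re.IGNORECASE)

def extract_path_clues(path):
    """Return the user name, program name, and QQ number found in a Windows path."""
    clues = {}
    for key, pattern in (("user", USER_RE), ("program", PROGRAM_RE), ("qq", QQ_RE)):
        match = pattern.search(path)
        if match:
            clues[key] = match.group(1)
    return clues

print(extract_path_clues(r"C:\Users\Dell\Desktop\notes.txt"))
print(extract_path_clues(r"C:\Program Files\Microsoft Visual Studio 11.0\VC\bin"))
print(extract_path_clues(r"D:\QQ\QQfile\123456789\FileRecv\report.doc"))
```

Because the match depends on a fixed parent folder, this approach only finds clues when the attacker's paths happen to follow these conventions, which is why the remaining analyses are needed.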

5.2.2. Account Analysis. Through the analysis of APT1 and some other attribution reports, we find that the mapping between the attacker and a physical-world identity can be better obtained by analyzing the accounts left by the attacker. This information includes, but is not limited to, the location of the IP address, the spelling and registration of the domain name, the IP address corresponding to a URL, and the domain name of the mailbox account. Because it is difficult to identify this information effectively in a large number of string and path clues, the analysis system automatically identifies IP addresses, domain names, URLs, and mailbox accounts by regular-expression matching. Then, with the help of threat intelligence and big data technology, more relevant clues are obtained.
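A minimal sketch of the regular-expression matching step is given below. The exact patterns used by the analysis system are not given in the paper, so these are illustrative assumptions: IPv4 only, http/https URLs only, and domain names derived from the matched URLs and mailbox accounts rather than matched in free text (which would confuse file names such as demo.pdb with domains).

```python
import re
from urllib.parse import urlsplit

OCTET = r"(?:25[0-5]|2[0-4]\d|1?\d?\d)"
IPV4_RE = re.compile(r"(?<![\d.])(?:" + OCTET + r"\.){3}" + OCTET + r"(?![\d.])")
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
URL_RE = re.compile(r"https?://[^\s\"'<>]+")

def extract_accounts(text):
    """Find IP addresses, URLs, mailbox accounts, and their domain names."""
    urls = URL_RE.findall(text)
    emails = EMAIL_RE.findall(text)
    ips = IPV4_RE.findall(text)
    # Domains come from URL hosts and from the part after '@' in mailboxes.
    hosts = {urlsplit(u).hostname for u in urls}
    hosts |= {e.split("@", 1)[1] for e in emails}
    domains = sorted(h.lower() for h in hosts if h and not IPV4_RE.fullmatch(h))
    return {"ip": ips, "url": urls, "email": emails, "domain": domains}

clue = "visit http://example.com/a?b=1 or mail admin@example.org from 192.0.2.10"
print(extract_accounts(clue))
```

The extracted values would then be looked up in a threat intelligence source, a step that is outside the scope of this sketch.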

5.2.3. Language Identification. User languages often help to determine an attacker’s habitual language. However, because of the large number of languages in different countries and the high similarity of some languages, we use automatic analysis to identify the language of clues. We tested the accuracy of two language identification toolkits using entire path information in four different languages and found the “langid.py” toolkit to be overall more accurate than the “langdetect” toolkit. The comparison results are shown in Figure 3. langid.py is a language identification toolkit developed by Lui and Baldwin at the University of Melbourne [36]. It combines a naive Bayes classifier with cross-domain feature selection to provide domain-independent language identification.

5.2.4. Traceable Strings Identification. When traceable strings are needed, traditional string analysis methods usually use Named Entity Recognition (NER). However, the clues to be analyzed mainly include strings and paths. Strings before and after a path separator have few semantic correlations. What is more, the strings between the path separators and the remaining strings to be analyzed are mostly semantically unrelated due to their limited length. So this paper proposes an algorithm that can quickly and automatically identify the traceable strings in strings and paths.

In order to filter out meaningful traceable clues related to the attacker’s identity, the path clues and string clues are split into strings and then screened for common words and gibberish in the following steps. Figure 4 shows the process of the automatic traceable clue identification system.

[Figure 3 (bar chart): correct and incorrect recognition counts of the langid and langdetect toolkits for English, Chinese, Russian, and Korean path clues.]

Figure 3: Accuracy of Two Language Identification Tools. In tests using the same known-language path clues, the accuracy of the langid toolkit for English, Chinese, Russian, and Korean was 88%, 100%, 99%, and 99.0%, respectively, while the accuracy of the langdetect toolkit was 73%, 0.85%, 40%, and 4.60%, respectively.

[Figure 4 (flowchart): Path Clues, Path Split, Stop Words; String Clues, Sentence Tokenize, Word Tokenize; Sign Split, Alphabetic Case Characteristics, Strings; Traceable Strings Identification with Common words detect and Gibberish detect; Result.]

Figure 4: Process of automatically identifying traceable clues.

Make Stop Words. The system splits each path by the path delimiter. Path strings that are common to multiple computers have no identifying effect, so we take the file and folder name strings that are common to 20 normal user computers as stop words. The system then removes the stop words from the strings after each split.
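The stop-word step can be sketched as follows. The paper does not say whether a string must appear on all 20 reference computers or only on several of them, so the `min_hosts` parameter is an assumption (it defaults to all reference machines); the function names and example paths are hypothetical.

```python
from collections import Counter

def split_path(path):
    """Split a path on either separator and drop empty segments."""
    return [seg for seg in path.replace("/", "\\").split("\\") if seg]

def build_stop_words(reference_hosts, min_hosts=None):
    """reference_hosts holds one list of paths per normal user computer."""
    if min_hosts is None:
        min_hosts = len(reference_hosts)
    counts = Counter()
    for paths in reference_hosts:
        # Count each segment at most once per computer.
        seen = {seg.lower() for path in paths for seg in split_path(path)}
        counts.update(seen)
    return {seg for seg, n in counts.items() if n >= min_hosts}

def remove_stop_words(path, stop_words):
    """Keep only the path segments that are not stop words."""
    return [seg for seg in split_path(path) if seg.lower() not in stop_words]

hosts = [
    [r"C:\Users\alice\Documents\a.txt"],
    [r"C:\Users\bob\Documents\b.txt"],
]
stop_words = build_stop_words(hosts)
print(remove_stop_words(r"C:\Users\Dell\Documents\botnet\plan.doc", stop_words))
```

With only two toy reference machines the stop list is tiny; the 20 real machines used in this paper would yield a much larger list of system and application folder names.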

Segmentation. After removing the stop words, we separate sentences and words for each segment of the path with a tokenizer (e.g., NLTK, NER) [37]. However, as the segments of a path are not semantically related, this segmentation is less effective. Based on our extensive study of path information, users often prefer symbols such as underscores to split file names. In addition, they also like to use capitalization in multiword strings to facilitate reading. As a result, the system uses the common separator symbols and alphabetic case characteristics to segment the strings again.
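The second segmentation pass, which splits on separator symbols and on changes of alphabetic case, might look like the sketch below. The set of separator symbols and the function name `split_identifier` are assumptions, and the pattern handles ASCII letters and digits only, so non-Latin path segments would need separate treatment.

```python
import re

# Symbols that users commonly place between words in file names (assumed set).
SYMBOLS_RE = re.compile(r"[\s_\-.()\[\]]+")
# An acronym before a capitalized word, a (capitalized) word, an acronym, or digits.
CASE_RE = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+")

def split_identifier(segment):
    """Split a path segment on symbols first, then on alphabetic case changes."""
    tokens = []
    for part in SYMBOLS_RE.split(segment):
        tokens.extend(CASE_RE.findall(part))
    return tokens

print(split_identifier("WriteSystem32"))
print(split_identifier("RDPScanner"))
print(split_identifier("my_botnet-v2.exe"))
```

Splitting on case recovers words such as “Write” and “System” that a sentence tokenizer would leave glued together.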

Common Words Removal. After segmentation, the system obtains a lot of repeated strings. But most of them are commonly used words and do not help in identifying ransomware attackers. As a result, we filter out the common words by comparing each string with a dictionary. If the string is matched in the dictionary, the system marks the common-word property of the string as False; otherwise it marks it as True.

Gibberish Detect. Based on our analysis and observation, we found that there is a large amount of garbled information in path clues. Most of these strings are generated randomly. People can recognize this randomness manually, but it is difficult for a program to recognize it automatically. As a result, we propose detecting gibberish by training a Markov chain model on English text. For each string X, the probability of the (i+1)-th character depends only on the i-th character:

$$P(X_{i+1} = x \mid X_1 = x_1, X_2 = x_2, \ldots, X_i = x_i) = P(X_{i+1} = x \mid X_i = x_i) \tag{2}$$

Each letter is related to the letters immediately before and after it. This relationship can be expressed by 2-grams. For example, the string “Ransomware” yields the 2-grams “Ra”, “an”, “ns”, ..., “e[space]”. The analysis system records how often characters appear next to each other in a collection of gibberish and some commonly used normal phrases or vocabulary. It normalizes the counts after reading through the training data and then measures the probability of generating a string, based on these statistics, by multiplying the probabilities of the pairs of adjacent characters in the string [38]. From the training data, the system also computes the average transition probability for gibberish strings and for normal strings, respectively. This probability measures how likely the string is according to the data observed by the model. When we test the string “Ransom”, it computes

$$\mathit{prob}[\text{'Ransom'}] = \mathit{prob}[\text{'R'}][\text{'a'}] \times \mathit{prob}[\text{'a'}][\text{'n'}] \times \cdots \times \mathit{prob}[\text{'m'}][\text{' '}] \tag{3}$$

If the input string is gibberish, it will pass through some pairs with very low counts in the training phase and hence have low probability. The system then looks at the probability per character for a few known normal strings and a few examples of known gibberish and picks a threshold between the highest probability among the gibberish strings and the lowest probability among the normal strings:

$$\mathit{threshold} = \frac{\min(\mathit{normalprobs}) + \max(\mathit{gibberishprobs})}{2} \tag{4}$$

When we analyze a string ‘X’, if $\mathit{prob}[\text{'X'}] > \mathit{threshold}$, the analysis system views the string as a normal string. If $\mathit{prob}[\text{'X'}] \le \mathit{threshold}$, the analysis system marks the string as ‘False’. The system then removes the strings marked ‘False’ as gibberish.
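The gibberish test can be sketched with a small character-bigram model in the spirit of the Gibberish-Detector approach [38]. This is a simplified illustration under several assumptions: the alphabet is restricted to lowercase letters and the space, every bigram count starts from a smoothing value so that unseen pairs keep a nonzero probability, and the score is the per-character (geometric mean) transition probability rather than the raw product in (3), so that strings of different lengths are comparable. All function names are hypothetical.

```python
import math

ALPHABET = "abcdefghijklmnopqrstuvwxyz "
INDEX = {c: i for i, c in enumerate(ALPHABET)}

def bigrams(text):
    """Yield adjacent character pairs, ignoring characters outside the alphabet."""
    chars = [c for c in text.lower() if c in INDEX]
    return zip(chars, chars[1:])

def train(lines, smoothing=10):
    """Count bigrams and normalize each row into log transition probabilities."""
    k = len(ALPHABET)
    counts = [[smoothing] * k for _ in range(k)]
    for line in lines:
        for a, b in bigrams(line):
            counts[INDEX[a]][INDEX[b]] += 1
    log_prob = []
    for row in counts:
        total = sum(row)
        log_prob.append([math.log(c / total) for c in row])
    return log_prob

def avg_transition_prob(text, log_prob):
    """Geometric mean of the transition probabilities along the string."""
    total, n = 0.0, 0
    for a, b in bigrams(text):
        total += log_prob[INDEX[a]][INDEX[b]]
        n += 1
    return math.exp(total / (n or 1))

def pick_threshold(normal, gibberish, log_prob):
    """Midpoint between the weakest normal string and the strongest gibberish."""
    normal_probs = [avg_transition_prob(s, log_prob) for s in normal]
    gibberish_probs = [avg_transition_prob(s, log_prob) for s in gibberish]
    return (min(normal_probs) + max(gibberish_probs)) / 2

def is_normal(text, log_prob, threshold):
    return avg_transition_prob(text, log_prob) > threshold
```

In use, `train` would be called on a large body of English text, `pick_threshold` on a handful of labeled normal and gibberish strings, and `is_normal` on each candidate string; the threshold is only meaningful when the normal examples score higher than the gibberish examples.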

Table 1: Traceable strings recognized result [13].

| String | Common Words Recognized | Gibberish Recognized | Final Recognized |
| --- | --- | --- | --- |
| Wh****ter | True | True | True |
| gandcrab | True | True | True |
| kate z | True | True | True |
| vwrtjty | True | False | False |
| program | False | True | False |

Identification Result. After the above analysis, the automated analysis system outputs the strings for which both the common-word tag and the gibberish tag are True; this is the list of traceable clues [13]. Table 1 is reproduced from Z. H. Wang et al. (2018) [under the Creative Commons Attribution License/public domain].

When the string is a normal word such as “program”, the common-words recognition marks it as “False”. Gibberish recognition marks a string as True only if it is a non-randomly generated identifier string; strings that do not conform to the spelling patterns of common words (e.g., vwrtjty) are marked False. A string is considered to have auxiliary traceability only if both of the analysis values are True. We rely extensively on string inversion to verify the accuracy of the system. We found that most of the traceable strings on the volunteers’ storage can be recognized. Table 1 shows the recognition results for several typical strings.
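The way the two tags combine can be sketched in a few lines. Here the dictionary is a toy set and the gibberish tag is passed in as a precomputed value taken from Table 1 rather than produced by a trained model; the function name `tag_string` is hypothetical.

```python
def tag_string(s, dictionary, gibberish_tag):
    """Combine the two tags; a string is traceable only if both are True."""
    # common_tag is True when the string is NOT an ordinary dictionary word.
    common_tag = s.lower() not in dictionary
    return {
        "common": common_tag,
        "gibberish": gibberish_tag,  # True when the string does not look random
        "final": common_tag and gibberish_tag,
    }

dictionary = {"program", "file", "desktop"}  # toy dictionary for illustration
for s, gib in (("gandcrab", True), ("vwrtjty", False), ("program", True)):
    print(s, tag_string(s, dictionary, gib))
```

The three rows printed correspond to the “gandcrab”, “vwrtjty”, and “program” rows of Table 1.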

6. Evaluation

In this paper, we evaluate the proposed method with two experiments. The goal of the first experiment is to demonstrate that the proposed method can help trace back to the RDP-based ransomware attackers and capture their private information. The goal of the second experiment is to demonstrate that the method can also automatically recognize clues in ransomware samples and help trace back to ransomware makers.

6.1. RDP-Based Ransomware Attacker

6.1.1. Clue Capture and Analysis. We used Windows 7 virtual machines to deploy the deception environment and assess its effectiveness. While Windows 7 is not required, it was chosen because of its wide range of applications and because it is one of the main targets of ransomware. We invited 122 professional volunteers to help with the experiment and provided them with an experimental fleet of 12 virtual hosts. Most of the volunteers’ login information, clipboard content, shared folder paths, and uploaded PE files were successfully captured by the monitor system.

We chose the path information of 12 computers collected from the volunteers and recorded the rate of convergence after each step of the analysis. Figure 5 shows how the number of remaining traceable clues changes as the data are processed by each step of the analysis system [13]. Figure 5 is reproduced from Z. H. Wang et al. (2018) [under the Creative Commons Attribution License/public domain].

[Figure 5 (line chart, “Analysis Data Change Figure”): data quantity (y-axis) for volunteer1 through volunteer12 at each stage of the analysis process (x-axis): source, stopword, tokenize, gibberish, result.]

Figure 5: The data volume changes in 12 volunteers’ traceable clues through the analysis system. The screening rate can reach 98.34% [13].

6.2. Traceback Attackers. Next, we carried out a detailed analysis of the information left over from one of the volunteers’ attacks. From the automatic analysis of the attacker’s data, we can infer that the attacker’s real identity may have the following characteristics: (1) The attacker is likely to come from China. (2) The attacker may be security or related personnel. (3) The attacker may work in the Chinese Academy of Sciences and have relations with a large security manufacturer in China. The conclusion is shown in Figure 6, which presents the results from the analysis system in five sections.

Username. Through automatic analysis, a total of three user names from the attacker’s host were extracted from a large number of clues. From the “Dell” user name, it can be assumed that the attacker was using a Dell host.

Programs. The automatic analysis system identified several typical software programs installed on the attacker’s host. For instance, “QQ” is a widely used real-time social tool in China. “AliPay” is an online payment tool developed by Alibaba and widely used in China. “Sogou input” is a Chinese input method software developed by the Chinese company Sogou. “MeiTu” is a photo beautification software developed by a Chinese company and widely used in China. In addition, “Sinfor” and “360” are both well-known security manufacturers in China and have a large number of security-related products. “Visual Studio” is a common development software. We found that much Chinese-made software widely used in China was installed on the attacker’s host, as well as some development and security products that are installed less often by ordinary users.

[Figure 6 (diagram): five panels labeled UserName, KeyboardLayout, Programs, Account, and TraceableStrings.]

Figure 6: Analysis of an attacker using the automatic analysis system, displaying traceable information about the attacker in five aspects: host user name, program software, keyboard layout, account information, and traceable strings.

Keyboard Layout. Through network communication analysis, we find that the attacker’s keyboard layout is Simplified Chinese, from which it can be inferred that the attacker’s native language is probably Chinese. This is consistent with the features of the software installed on the host.

Account Number. Among the automatically identified accounts, the system found two QQ accounts and two e-mail accounts. The e-mail accounts are Chinese Academy of Sciences business accounts, which supports the above assumption that the attacker is from China and is likely to work in the Chinese Academy of Sciences. Through the registration information of the QQ account (as shown in Figure 7), we can find the following information: (1) the attacker could be a male aged 32; (2) he does Internet-related work, likely with 360 Security; (3) he is located in Haidian District, Beijing, China. In the registration information of the “353***208” account, the mail, work, and other fields are empty and the nickname is a special English string, “Wh***ter”; it can be presumed that this account is likely for private use.

Traceable Strings. The automatic analysis system sifts strings from the monitor that may be traceable. “Freebuf” is a security information exchange website commonly used by Chinese security personnel. “Botnet” is a secondary attack method that is often used by attackers and is often taken by security researchers as a main research direction. “**e” and “**t” are acronyms for departments under the Chinese Academy of Sciences, while “A***team” is a security research team in the “**e” department. “Visumantrag” is often used when processing visas for various countries. “Wh***ter” matches the nickname of the QQ account “353***208” and is likely to be the attacker’s regular nickname.

Figure 7: The registration infographic of QQ account 372***82.

6.3. Ransomware Maker Automatic Analysis. In order to verify that the analysis system can be used to trace back the ransomware maker, we obtained more than 8000 ransomware samples from VirusTotal [39] and extracted the Program Data Base (PDB) path from the samples (as shown in Figure 8). It shows that, despite the deliberate erasure of compilation information by a large number of attackers, legacy PDB path information can still be found in a large number of ransomware samples.

Table 2: Same identifier for different samples [13].

Columns: Sample Hash; Same Identity Strings
68dd613973f8a***7befe0f4b461d258ce569b9020079***9ae0f090266a60cc
jqxxpphoaonkje7z2fa0c084aef41969***1f427a1b65e1b1464fa82d297e712***e1fdf312c044814a36da0286d42cd***306fba99239c8238821beaf40744be***ba862dbc6d90f70cbc839bcb88***7422bbd2fa812dd3de02391b6a9f1***0762a6f49cf32501

[Figure 8 (bar chart): number of ransomware samples with a PDB path (878) and without a PDB path (7197).]

Figure 8: Of the 8075 real ransomware samples, 11% were able to extract the Program Data Base (PDB) path, while the authors of the remaining 89% may have chosen to hide the PDB path during compilation.

What is more, the analysis system is used to automatically analyze the PDB paths of more than 800 different ransomware samples. We found the same identity strings in the analysis results of different samples. One of the typical results is shown in Table 2 [13]. Table 2 is reproduced from Z. H. Wang et al. (2018) [under the Creative Commons Attribution License/public domain]. We used VirusTotal to validate the samples and found that they were all Cobra family ransomware. As a result, it could be assumed that these samples were made by the same ransomware maker. When one sample is traced back, the conclusion about the ransomware maker also applies to the other samples. When different traceable information is obtained from different samples, the identity of the ransomware maker can be described by integrating the information.

Language Identification. When we carry out language recognition on the PDB paths of the 878 samples, the result is as shown in Figure 9. The system automatically identifies the languages of the paths, which mainly contain 11 languages, of which English accounts for the largest proportion.
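As a quick consistency check, the per-language counts listed in the Figure 9 caption can be converted into shares of the 878 PDB paths, and the 878 paths can be compared with the 8075 samples of Figure 8. The snippet below only recomputes numbers already reported in the captions; the variable names are arbitrary.

```python
# Per-language counts as listed in the Figure 9 caption.
counts = {
    "English": 510, "Chinese": 31, "Dutch": 40, "Norwegian Nynorsk": 4,
    "Slovenian": 5, "Norwegian Bokmal": 1, "German": 58, "Italian": 3,
    "Hungarian": 35, "Maltese": 18, "Polish": 5, "Finnish": 151,
    "Luxembourgish": 3, "Danish": 10, "Spanish": 4,
}
total = sum(counts.values())
shares = {lang: round(100 * n / total, 2) for lang, n in counts.items()}

# Share of all collected samples that still carried a PDB path (Figure 8).
pdb_share = round(100 * total / 8075)

print(total, shares["English"], shares["Chinese"], pdb_share)
```

The counts sum to the 878 paths analyzed, and English alone accounts for well over half of them.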

[Figure 9 (pie chart): shares of the languages identified in the PDB paths; legend: zh, en, nl, nn, sl, nb, de, it, hu, mt, pl, fi, et, da, es.]

Figure 9: Of the 878 ransomware sample PDB paths, English accounted for 510 (58.09%), and the remaining monitored languages were Chinese 31 (3.53%), Dutch 40 (4.56%), Norwegian Nynorsk 4 (0.46%), Slovenian 5 (0.57%), Norwegian Bokmål 1 (0.11%), German 58 (6.61%), Italian 3 (0.34%), Hungarian 35 (3.99%), Maltese 18 (2.05%), Polish 5 (0.57%), Finnish 151 (17.20%), Luxembourgish 3 (0.34%), Danish 10 (1.14%), and Spanish 4 (0.46%).

Actual Case Analysis. The automatic identification system helped us identify the traceable strings in the PDB path. Take the example of a sample PDB path (the Chinese has been translated into English) with an MD5 value of “60c6a92afb***6d0c16f7”:

“ELIAOSTUDIOUACsimulated an improved version of the 360 anti-virus program 1117 Bale1*41**2****0(**1) 9818 programWriteSystem32ReleaseVirtualDesktoppdb”

We can infer from the automatic analysis result that the ransomware maker is likely from China. This is supported by the following evidence.

(1) Language Identification. Identifying the language of the path suggests that the attacker may have come from China. From the Chinese strings in the path, we find that the text refers to an improvement intended to bypass detection by 360 security software, which has a large market in China.

(2) IP Address. The system identified an IP address in the PDB path. With the help of a threat intelligence database, we found that the IP came from Xinjiang, China (Figure 10).

Figure 10: Partial infographic of IP 1*41**2****0 in the threat intelligence library.

(3) Traceable Strings. The extracted identifier string “LIAO” resembles Chinese pinyin and is most likely a Chinese surname. To sum up, we speculate that the ransomware maker is most likely from China.

7. Conclusion

In this paper, we focus on tracing back RDP-based ransomware. Recently, more and more ransomware attackers are using RDP attacks to spread ransomware with impunity due to the use of strong cryptography. We propose tracing back the attack sources with the proposed deception environment in order to deter RDP-based ransomware. The environment collects traceable clues and performs automatic analysis by using natural language processing and machine learning techniques. The evaluation shows that it is able to trap attackers and collect the traceable clues they leave in the deception environment. With automatic clue identification, it can reduce the amount of candidate traceable clues to about 2%. We use this method to analyze two practical cases and show its effectiveness. By tracing back to ransomware attackers, we can provide a strong deterrent to stifle the development of ransomware.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Additional Points

This manuscript provides additional deception environment monitors and analytical methods by using more volunteers and samples. It proves the validity of the methods through specific traceback cases (i.e., traceback of the ransomware attacker and traceback of the ransomware maker).

Disclosure

An earlier version of this paper was presented at the IEEE Third International Conference on Data Science in Cyberspace, Guangzhou, China, 18-21 June 2018. The authors’ initial conference paper focused mainly on the effectiveness of deception environment monitors and the screening rate of the analysis system.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors would like to thank VirusTotal for providing ransomware samples and the volunteers from the University of Chinese Academy of Sciences. This work is supported in part by the National Key Research and Development Plan (Grant no. 2018YFB0803504) and the National Natural Science Foundation of China (61871140, 61572153, U1636215, 61572492, and 61672020).

References

[1] A. Kharraz, “UNVEIL: A Large-Scale, Automated Approach to Detecting Ransomware,” in Proceedings of the 25th USENIX Security Symposium (USENIX Security 16), pp. 757–772, USENIX Association, 2016.

[2] K. Savage, P. Coogan, and H. Lau, “The evolution of ransomware,” Tech. Rep., 2015.

[3] A. Kharraz, W. Robertson, D. Balzarotti, L. Bilge, and E. Kirda, “Cutting the Gordian Knot: A Look Under the Hood of Ransomware Attacks,” in Detection of Intrusions and Malware, and Vulnerability Assessment, vol. 9148 of Lecture Notes in Computer Science, pp. 3–24, Springer International Publishing, Cham, 2015.

[4] S. Mohurle and M. Patil, “A brief study of Wannacry Threat: Ransomware Attack 2017,” International Journal of Advanced Research in Computer Science, vol. 8, no. 5, 2017.

[5] X. Yu, Z. Tian, J. Qiu, and F. Jiang, “A Data Leakage Prevention Method Based on the Reduction of Confidential and Context Terms for Smart Mobile Devices,” Wireless Communications and Mobile Computing, vol. 2018, Article ID 5823439, 11 pages, 2018.

[6] J. Yaneza, “Brute Force RDP Attacks Plant CRYSIS Ransomware,” https://blog.trendmicro.com/trendlabs-security-intelligence/brute-force-rdp-attacks-plant-crysis-ransomware/.

[7] “10% of Ransomware Attacks on SMBs Targeted IoT Devices,” https://www.darkreading.com/application-security/10--of-ransomware-attacks-on-smbs-targeted-iot-devices-/d/d-id/1329817.

[8] Q. Tan, Y. Gao, J. Shi, X. Wang, B. Fang, and Z. H. Tian, “Towards a Comprehensive Insight into the Eclipse Attacks of Tor Hidden Services,” IEEE Internet of Things Journal, 2018.

[9] “Kaspersky Security Bulletin: STORY OF THE YEAR 2017,” 2017, https://media.kasperskycontenthub.com/wp-content/uploads/sites/43/2018/03/07164824/KSB_Story_of_the_Year_Ransomware_FINAL_eng.pdf.

[10] Z. Tian, Y. Cui, L. An et al., “A Real-Time Correlation of Host-Level Events in Cyber Range Service for Smart Campus,” IEEE Access, vol. 6, pp. 35355–35364, 2018.

[11] R. Ross et al., Managing Information Security Risk: Organisation, Mission, and Information System View, National Institute of Standards and Technology, 2011, http://csrc.nist.gov/publications/nistpubs/800-39/SP800-39-final.pdf.

[12] Y. Wang, Z. Tian, H. Zhang, S. Su, and W. Shi, “A Privacy Preserving Scheme for Nearest Neighbor Query,” Sensors, vol. 18, no. 8, p. 2440, 2018.

[13] Z. H. Wang, X. Wu, C. G. Liu, Q. X. Liu, and J. L. Zhang, “RansomTracer: Exploiting Cyber Deception for Ransomware Tracing,” in Proceedings of the IEEE Third International Conference on Data Science in Cyberspace, pp. 227–234, 2018.

[14] B. A. S. Al-rimy, M. A. Maarof, and S. Z. M. Shaid, “Ransomware threat success factors, taxonomy, and countermeasures: A survey and research directions,” Computers & Security, vol. 74, pp. 144–166, 2018.

[15] F. Jiang, Y. Fu, B. B. Gupta et al., “Deep Learning based Multi-channel intelligent attack detection for Data Security,” IEEE Transactions on Sustainable Computing, 2018.

[16] N. Andronio, S. Zanero, and F. Maggi, “HelDroid: dissecting and detecting mobile ransomware,” in Research in Attacks, Intrusions, and Defenses, vol. 9404 of Lecture Notes in Computer Science, pp. 382–404, Springer, 2015.

[17] F. Mercaldo, V. Nardone, A. Santone, and C. A. Visaggio, “Ransomware Steals Your Phone. Formal Methods Rescue It,” in Formal Techniques for Distributed Objects, Components, and Systems: 36th IFIP WG 6.1 International Conference, FORTE 2016, held as part of the 11th International Federated Conference on Distributed Computing Techniques, DisCoTec 2016, E. Albert and I. Lanese, Eds., pp. 212–221, Springer International Publishing, 2016.

[18] D. Sgandurra, L. Muñoz-González, R. Mohsen, and E. C. Lupu, “Automated Dynamic Analysis of Ransomware: Benefits, Limitations and use for Detection,” https://arxiv.org/abs/1609.03020.

[19] S. Song, B. Kim, and S. Lee, “The Effective Ransomware Prevention Technique Using Process Monitoring on Android Platform,” Mobile Information Systems, vol. 2016, Article ID 2946735, 9 pages, 2016.

[20] T. Yang, Y. Yang, K. Qian, D. C.-T. Lo, Y. Qian, and L. Tao, “Automated detection and analysis for android ransomware,” in Proceedings of the 17th IEEE International Conference on High Performance Computing and Communications, IEEE 7th International Symposium on Cyberspace Safety and Security, and IEEE 12th International Conference on Embedded Software and Systems, HPCC-ICESS-CSS 2015, pp. 1338–1343, USA, August 2015.

[21] N. Scaife, H. Carter, P. Traynor, and K. R. B. Butler, “CryptoLock (and Drop It): Stopping Ransomware Attacks on User Data,” in Proceedings of the 36th IEEE International Conference on Distributed Computing Systems, ICDCS 2016, pp. 303–312, Japan, June 2016.

[22] M. M. Ahmadian and H. R. Shahriari, “2entFOX: A framework for high survivable ransomwares detection,” in Proceedings of the 13th International ISC Conference on Information Security and Cryptology, ISCISC 2016, pp. 79–84, Iran, September 2016.

[23] M. M. Ahmadian, H. R. Shahriari, and S. M. Ghaffarian, “Connection-monitor & connection-breaker: A novel approach for prevention and detection of high survivable ransomwares,” in Proceedings of the 12th International ISC Conference on Information Security and Cryptology, ISCISC 2015, pp. 79–84, Iran, September 2015.

[24] D. Kim, W. Soh, and S. Kim, “Design of Quantification Model for Prevent of Cryptolocker,” Indian Journal of Science and Technology, vol. 8, no. 19, 2015.

[25] C. Moore, “Detecting Ransomware with Honeypot Techniques,” in Proceedings of the 2016 Cybersecurity and Cyberforensics Conference (CCC), pp. 77–81, Amman, Jordan, August 2016.

[26] F. Mbol, J. Robert, and A. Sadighian, “An Efficient Approach to Detect TorrentLocker Ransomware in Computer Systems,” in Cryptology and Network Security, S. Foresti and G. Persiano, Eds., vol. 10052 of Lecture Notes in Computer Science, pp. 532–541, Springer International Publishing, Cham, 2016.

[27] K Cabaj P Gawkowski K Grochowski and D Osojca ldquoNet-work activity analysis of CryptoWall ransomwarerdquo PrzeglądElektrotechniczny vol 91 no 11 pp 201ndash204 2015

[28] J Chen Z Tian X Cui L Yin and X Wang ldquoTrust Architec-ture and Reputation Evaluation for Internet of Thingsrdquo Journalof Ambient Intelligence amp Humanized Computing vol 2 pp 1ndash92018

[29] C Le Guernic and A Legay ldquoRansomware and the LegacyCrypto API Paper presented at the Risks andrdquo in Risks andSecurity of Internet and Systems 11th International ConferenceCRiSIS 2016 Roscoff France 2017

[30] L Pingree Emerging Technology Analysis Deception Techniquesand Technologies Create Security Technology Business Opportu-nities Gartner 2015

[31] ldquoRansomware and Businessesrdquo httpswwwsymanteccomcontentdamsymantecdocssecurity-centerwhite-papersran-somware-and-businesses-16-enpdf 2016

[32] A Kharraz and E Kirda ldquoRedemption Real-Time ProtectionAgainst Ransomware at End-Hostsrdquo in Proceedings of theInternational Symposium on Research in Attacks Intrusions andDefenses pp 98ndash119 2017

[33] F Bergstrand J Bergstrand and H Gunnarsson Localizationof Spyware in Windows Environments

[34] ldquoPEViewrdquo httpswwwaldeidcomwikiPEView[35] ldquopython-pefilerdquo httpspypipythonorgpypipefile[36] M Lui and T Baldwin ldquolangid py An off-the-shelf language

identification toolrdquo in Proceedings of the ACL 2012 systemdemonstrations Association for Computational Linguistics pp25ndash30 2012

[37] S Bird and E Loper ldquoNLTK the natural language toolkitrdquoin Proceedings of the ACL 2004 on Interactive poster anddemonstration sessions Association for Computational Linguis-tics Barcelona Spain July 2004

[38] ldquoRrenaud Gibberish-Detectorrdquo httpsgithubcomrrenaudGibberish-DetectorblobmasterREADMErst

[39] ldquoVirusTotalrdquo httpvirustotalcom



Figure 3: Accuracy of two language identification tools (bar chart; categories: English, Chinese, Russian, Korean; series: langid toolkit correct recognition, langid toolkit incorrect recognition, langdetect toolkit correct recognition, langdetect toolkit incorrect recognition). In tests using the same known-language path clues, the accuracy of the langid toolkit for English, Chinese, Russian, and Korean was 88%, 100%, 99%, and 99.0%, respectively, while the accuracy of the langdetect toolkit was 73%, 0.85%, 40%, and 4.60%, respectively.

Figure 4: Process of automatic identifying of traceable clues (flowchart stages: Path Clues, String Clues, Path Split, Sign Split, Stop Words, Sentence Tokenize, Word Tokenize, Alphabetic Case Characteristics, Common Words Detect, Gibberish Detect, Traceable Strings Identification, Result).

computers as stop words. Then it removes the stop words from the strings after each split.

Segmentation. After removing the stop words, we separate sentences and words for each segment of the path with a tokenizer (e.g., NLTK, NER) [37]. However, because the segments of a path are not semantically related, this segmentation is less effective. Based on our extensive research on path information, users often use symbols such as underscores to split file names, and they also use capitalization in multiword strings to make them easier to read. As a result, the system segments the strings again using the common separators found in the document and the alphabetic case characteristics.
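The second segmentation pass described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `split_identifier`, the separator set, and the case-splitting regular expression are all assumptions, and only ASCII letters and digits are kept.

```python
import re
from typing import List


def split_identifier(segment: str) -> List[str]:
    """Split one path segment on separator symbols, then on case changes.

    Hypothetical helper: the separator set and the regex are illustrative.
    """
    words: List[str] = []
    # Pass 1: split on common separators (underscore, hyphen, dot, whitespace).
    for part in re.split(r"[_\-.\s]+", segment):
        if not part:
            continue
        # Pass 2: split on alphabetic case. An upper-case run not followed by
        # a lower-case letter is an acronym; otherwise take an optional
        # capital plus lower-case letters; digit runs stand alone.
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return words


print(split_identifier("WriteSystem32_ReleaseBuild"))
```

Splitting on symbols first keeps the case rule simple, because each remaining part is then a single run of letters and digits.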

Common Words Removal. After segmentation, the system obtains many repeated strings, but most of them are commonly used words that do not help identify ransomware attackers. We therefore filter out the common words by comparing each string with a dictionary. If the string matches a dictionary entry, the system sets the common-word property of the string to False; otherwise, it sets the property to True.
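A minimal sketch of this dictionary check follows. The tiny word set `COMMON_WORDS` and the function name `common_word_tag` are hypothetical; the paper does not say which dictionary is used, and a real system would load a full word list.

```python
# Hypothetical miniature dictionary; a real system would load a full word list.
COMMON_WORDS = {"program", "release", "desktop", "system", "users"}


def common_word_tag(token, dictionary=COMMON_WORDS):
    """Return False for dictionary words and True otherwise.

    This follows the convention in the text: True means the token is not a
    common word and may therefore still be a traceable clue.
    """
    return token.lower() not in dictionary


for token in ["program", "gandcrab"]:
    print(token, common_word_tag(token))
```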

Gibberish Detect. Based on our analysis and observation, we found a large amount of garbled information in path clues. Most of these strings are generated randomly; people can recognize this randomness manually, but it is difficult for a program to recognize it automatically. We therefore propose detecting gibberish by training a Markov chain model on English texts. For each string X, the probability of the (i+1)-th character of X depends only on the i-th character:

$$P(X_{i+1} = x \mid X_1 = x_1, X_2 = x_2, \ldots, X_i = x_i) = P(X_{i+1} = x \mid X_i = x_i) \tag{2}$$

Each letter is related to the letters immediately before and after it. This relationship can be expressed by 2-grams. For example, the string “Ransomware” yields the 2-grams Ra, an, ns, ..., e[space]. The analysis system records how often characters appear next to each other in a collection of gibberish and some commonly used normal phrases or vocabulary. It normalizes the counts after reading through the training data and then measures the probability of generating a string by multiplying the probabilities of the pairs of adjacent characters in the string [38]. From the training data, the system also computes the average transition probability for gibberish strings and for normal strings, respectively. This probability measures how likely the string is according to the data observed by the model. When we test the string “Ransom”, it computes

$$\mathrm{prob}[\text{'Ransom'}] = \mathrm{prob}[\text{'R'}][\text{'a'}] \times \mathrm{prob}[\text{'a'}][\text{'n'}] \times \cdots \times \mathrm{prob}[\text{'m'}][\text{' '}] \tag{3}$$
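The character-pair model behind (2) and (3) can be sketched as below. This is an illustration under stated assumptions, not the cited detector's code: the alphabet, the add-one smoothing, the toy corpus, and all function names are hypothetical, and no end-of-string space is appended. The score is the geometric mean of the pair probabilities, which is the product in (3) normalized by the number of transitions so that strings of different lengths are comparable.

```python
import math

ALPHABET = "abcdefghijklmnopqrstuvwxyz "  # assumption: lowercase letters plus space


def normalize(text):
    """Lowercase the text and keep only characters in ALPHABET."""
    return [c for c in text.lower() if c in ALPHABET]


def train_bigram_model(lines, smoothing=1.0):
    """Count adjacent character pairs and turn each row into log-probabilities."""
    counts = {a: {b: smoothing for b in ALPHABET} for a in ALPHABET}
    for line in lines:
        chars = normalize(line)
        for a, b in zip(chars, chars[1:]):
            counts[a][b] += 1
    model = {}
    for a, row in counts.items():
        total = sum(row.values())
        model[a] = {b: math.log(n / total) for b, n in row.items()}
    return model


def avg_transition_prob(text, model):
    """Geometric mean of the pair probabilities along the string."""
    chars = normalize(text)
    pairs = list(zip(chars, chars[1:]))
    if not pairs:
        return 0.0
    return math.exp(sum(model[a][b] for a, b in pairs) / len(pairs))


# Hypothetical toy corpus; a usable model needs far more English text.
CORPUS = [
    "ransomware encrypts user files and demands a ransom payment",
    "the attacker logs in through the remote desktop protocol",
    "the monitor records the shared folder and the clipboard",
]
MODEL = train_bigram_model(CORPUS)

for s in ["ransom", "vwrtjty"]:
    print(s, avg_transition_prob(s, MODEL))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied.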

If the input string is gibberish, it passes through some character pairs with very low counts in the training phase and hence has a low probability. The system then looks at the probability per character for a few known normal strings and a few examples of known gibberish, and picks a threshold midway between the highest probability among the gibberish strings and the lowest probability among the normal strings:

$$\mathrm{threshold} = \frac{\min(\mathrm{normal\_probs}) + \max(\mathrm{gibberish\_probs})}{2} \tag{4}$$

When we analyze a string ‘X’, if $\mathrm{prob}[\text{'X'}] > \mathrm{threshold}$, the analysis system views it as a normal string. If $\mathrm{prob}[\text{'X'}] \le \mathrm{threshold}$, the analysis system marks the string as ‘False’. The system then removes the strings marked ‘False’ as gibberish.
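The threshold rule in (4) and the decision that follows it can be written out as a short sketch. The probability values below are made-up numbers for illustration only, and the function names are assumptions.

```python
def pick_threshold(normal_probs, gibberish_probs):
    """Midpoint between the weakest normal string and the strongest gibberish."""
    return (min(normal_probs) + max(gibberish_probs)) / 2


def gibberish_tag(prob, threshold):
    """True means the string looks normal; False means it is treated as gibberish."""
    return prob > threshold


# Hypothetical per-character probabilities, not measured values.
normal_probs = [0.08, 0.05, 0.06]
gibberish_probs = [0.01, 0.02]

threshold = pick_threshold(normal_probs, gibberish_probs)
print(threshold, gibberish_tag(0.06, threshold), gibberish_tag(0.015, threshold))
```

The midpoint only separates the two classes cleanly when the lowest normal score is above the highest gibberish score in the calibration set.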


Table 1: Traceable strings recognized result [13].

String       Common Words Recognized   Gibberish Recognized   Final Recognized
Wh****ter    True                      True                   True
gandcrab     True                      True                   True
kate z       True                      True                   True
vwrtjty      True                      False                  False
program      False                     True                   False

Identification Result. After the above analysis, the automated analysis system outputs the strings for which both the common word tag and the gibberish tag are True; this output is the list of traceable clues [13]. Table 1 is reproduced from Z. H. Wang et al. (2018) [under the Creative Commons Attribution License/public domain].

When the string is a normal word, such as “program”, common words recognition marks it “False”. Gibberish recognition marks a string True only if it is a non-randomly generated identifier string; strings that do not conform to the spelling patterns of common words (e.g., vwrtjty) are marked False. A string is considered to have auxiliary traceability only if both analysis values are True. We rely extensively on string inversion to verify the accuracy of the system. We found that most of the traceable strings on the volunteers' storage can be recognized. Table 1 shows the recognition results for several typical strings.
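The final decision is a logical AND of the two tags. The sketch below replays three rows of Table 1 with the tag values copied from the table; the function name `is_traceable` is an assumption.

```python
def is_traceable(common_word_tag, gibberish_tag):
    """A string is kept as a traceable clue only if both tags are True."""
    return common_word_tag and gibberish_tag


# (common words tag, gibberish tag) pairs taken from Table 1.
TABLE_1_ROWS = {
    "gandcrab": (True, True),
    "vwrtjty": (True, False),
    "program": (False, True),
}

for string, (cw, gib) in TABLE_1_ROWS.items():
    print(string, is_traceable(cw, gib))
```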

6. Evaluation

In this paper, we evaluate the proposed method with two experiments. The goal of the first experiment is to demonstrate that the proposed method can help trace back to RDP-based ransomware attackers and capture their private information. The goal of the second experiment is to demonstrate that the method can also automatically recognize clues in ransomware samples and help trace back to ransomware makers.

6.1. RDP-Based Ransomware Attacker

6.1.1. Clue Capture and Analysis. We used Windows 7 virtual machines to deploy the deception environment and assess its effectiveness. Windows 7 is not required; it was chosen because of its wide range of applications and because it is one of the main targets of ransomware. We invited 122 professional volunteers to help with the experiment and provided them with an experimental fleet of 12 virtual hosts. Most of the volunteers' login information, clipboard content, shared folder paths, and uploaded PE files were successfully captured by the monitor system.

We chose the path information collected from 12 volunteers' computers and recorded the rate of convergence after each step of the analysis. Figure 5 shows how the number of traceable clues changes as the data is processed by each step of the analysis system [13]. Figure 5 is reproduced from Z. H. Wang et al. (2018) [under the Creative Commons Attribution License/public domain].

Figure 5: The data volume changes in 12 volunteers' traceable clues through the analysis system (line chart; x-axis: analysis process, with steps source, stopword, tokenize, gibberish, result; y-axis: data quantity, 0 to 50; one series per volunteer, volunteer1 to volunteer12). The screening rate can reach 98.34% [13].

6.2. Traceback Attackers. Next, we carried out a detailed analysis of the information left over from one of the volunteers' attacks. From the data produced by the automatic analysis of this attacker, we can infer that the attacker's real identity may have the following characteristics: (1) the attacker is likely to come from China; (2) the attacker may work in security or a related field; (3) the attacker may work in the Chinese Academy of Sciences and have relations with a large security manufacturer in China. The conclusion is shown in Figure 6, which presents the results from the analysis system in five sections.

Username. Through automatic analysis, a total of three user names from the attacker's host were extracted from a large number of clues. From the “Dell” user name, it can be assumed that the attacker was using a Dell host.

Programs. The automatic analysis system identified several typical software programs installed on the attacker's host. For instance, “QQ” is a real-time social tool widely used in China; “AliPay” is an online payment tool developed by Alibaba and widely used in China; “Sogou input” is a Chinese input method developed by the Chinese company Sogou; and “MeiTu” is photo beautification software developed by a Chinese company and widely used in China. In addition, “Sinfor” and “360” are both well-known security manufacturers in China with a large number of safety-related products, and “Visual Studio” is common development software. We found that much Chinese-made software widely used in China was installed on the attacker's host, as well as some development and security products that ordinary users install less often.

Figure 6: Analysis of an attacker using the automatic analysis system, displaying traceable information about the attacker in five aspects: host user name, program software, keyboard layout, account information, and traceable strings. (Diagram panels: UserName: Admin, Default, DELL; KeyboardLayout: CH-SIMPLIFIED; Programs: AliPay, QQ, SogouInput, MeiTu, Sinfor, 360, VisualStudio; Account: 353***208, 372***582, and two masked e-mail accounts; TraceableStrings: botnet, **e, **t, A***team, Wh***ter, freebuf, visumantrag.)

Keyboard Layout. Through network communication analysis, we found that the attacker's keyboard layout is Simplified Chinese, from which it can be inferred that the attacker's native language is probably Chinese. This is consistent with the software installed on the host.

Account Number. The system automatically identified two QQ accounts and two e-mail accounts. The e-mail accounts are Chinese Academy of Sciences business accounts, which confirms the above assumption that the attacker is from China and is likely to work in the Chinese Academy of Sciences. From the registration information of the QQ account (as shown in Figure 7), we can find the following information: (1) the attacker could be a male aged 32; (2) he does Internet-related work, likely with 360 Security; (3) he is located in Haidian District, Beijing, China. In the registration information of the “353***208” account, the mail, work, and other fields are empty, and the nickname is a special English string, “Wh***ter”; it can be presumed that the account is likely for private use.

Traceable Strings. The automatic analysis system sifts strings that may be traceable from the monitor data. “Freebuf” is a security information exchange website commonly used by Chinese security personnel. “Botnet” is a secondary attack method that is often used by attackers and is a main research direction for security researchers. “**e” and “**t” are acronyms for departments under the Chinese Academy of Sciences, while “A***team” is a security research team in the “**e” department. “Visumantrag” is often used when processing visas from various countries. “Wh***ter” matches the nickname of the QQ account “353***208” and is likely to be its regular nickname.

Figure 7: The registration infographic of QQ account 372***82.

6.3. Ransomware Maker Automatic Analysis. To verify that the analysis system can be used to trace back the ransomware maker, we obtained more than 8000 ransomware samples from VirusTotal [39] and extracted the Program Data Base (PDB) path from the samples (as shown in Figure 8). The results show that, although a large number of attackers deliberately erase the compilation information, legacy PDB path information can still be found in a large number of ransomware samples.

Table 2: Same identifier for different samples [13].
Columns: Sample Hash; Same Identity Strings.
68dd613973f8a***7befe0f4b461d258ce569b9020079***9ae0f090266a60cc
jqxxpphoaonkje7z2fa0c084aef41969***1f427a1b65e1b1464fa82d297e712***e1fdf312c044814a36da0286d42cd***306fba99239c8238821beaf40744be***ba862dbc6d90f70cbc839bcb88***7422bbd2fa812dd3de02391b6a9f1***0762a6f49cf32501

Figure 8: Of the 8075 real ransomware samples, 878 (11%) yielded a Program Data Base (PDB) path, while the authors of the remaining 7197 (89%) may have chosen to hide the PDB path during compilation. (Bar chart; x-axis: number, scaled 0 to 100; series: ransomware without a PDB path, ransomware with PDB path.)
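One way to recover a PDB path from a sample is sketched below. It is a byte-level heuristic and not necessarily the extraction used in the paper, which cites PEView [34] and python-pefile [35] without showing code. It relies on the layout of a CodeView PDB 7.0 debug record: the ASCII signature `RSDS`, a 16-byte GUID, a 4-byte age, and then the null-terminated PDB file name. The function name `extract_pdb_paths` is hypothetical.

```python
def extract_pdb_paths(data):
    """Scan raw bytes for CodeView 'RSDS' records and return their PDB paths.

    Heuristic sketch: it does not parse PE headers, so a stray 'RSDS' byte
    sequence could in principle produce a false match.
    """
    paths = []
    start = 0
    while True:
        idx = data.find(b"RSDS", start)
        if idx == -1:
            break
        name_start = idx + 4 + 16 + 4  # signature + GUID + age
        name_end = data.find(b"\x00", name_start)
        if name_end > name_start:
            # Paths may use a non-UTF-8 code page; undecodable bytes are replaced.
            text = data[name_start:name_end].decode("utf-8", errors="replace")
            if text.lower().endswith(".pdb"):
                paths.append(text)
        start = idx + 4
    return paths
```

In practice the function would be called on the bytes of a sample file, for example `extract_pdb_paths(open(path, "rb").read())`. A sample whose author stripped the debug directory simply yields an empty list.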

What is more, the analysis system was used to automatically analyze the PDB paths of more than 800 different ransomware samples. We found multiple identity strings shared across the analysis results of different samples. One typical result is shown in Table 2 [13]. Table 2 is reproduced from Z. H. Wang et al. (2018) [under the Creative Commons Attribution License/public domain]. We used VirusTotal to validate the samples and found that they were all Cobra family ransomware. As a result, it can be assumed that these samples were made by the same ransomware maker. When one sample is traced back, the conclusion about the ransomware maker also applies to the other samples. When different traceable information is extracted from different samples, the identity of the ransomware maker can be described by integrating that information.
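Linking samples through shared identity strings amounts to inverting a map from sample to extracted identifiers. The sketch below uses hypothetical hashes and identifier strings, not the values in Table 2, and the function name is an assumption.

```python
from collections import defaultdict


def group_by_identifier(sample_identifiers):
    """Map each identifier to the samples containing it, keeping shared ones only."""
    groups = defaultdict(set)
    for sample_hash, identifiers in sample_identifiers.items():
        for ident in identifiers:
            groups[ident].add(sample_hash)
    return {ident: hashes for ident, hashes in groups.items() if len(hashes) > 1}


# Hypothetical analysis output: sample hash -> traceable strings from its PDB path.
ANALYSIS = {
    "hash_a": {"devname", "tmp"},
    "hash_b": {"devname"},
    "hash_c": {"other"},
}

print(group_by_identifier(ANALYSIS))
```

Clues extracted from any sample in a group can then be pooled to describe the same maker, as the paragraph above suggests.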

Language Identification. When we perform language recognition on the PDB paths of 878 samples, the result is as shown in Figure 9. The system automatically identifies the languages in the paths, which mainly contain 11 languages, of which English accounts for the largest proportion.
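The percentages reported with Figure 9 follow directly from the per-language counts in its caption. The short check below recomputes them; the counts are copied from the caption, while the variable names are assumptions.

```python
# Per-language counts as listed in the Figure 9 caption.
COUNTS = {
    "English": 510, "Chinese": 31, "Dutch": 40, "Norwegian Nynorsk": 4,
    "Slovenian": 5, "Norwegian Bokmal": 1, "German": 58, "Italian": 3,
    "Hungarian": 35, "Maltese": 18, "Polish": 5, "Finnish": 151,
    "Luxembourgish": 3, "Danish": 10, "Spanish": 4,
}

TOTAL = sum(COUNTS.values())
SHARES = {lang: round(100 * n / TOTAL, 2) for lang, n in COUNTS.items()}

print(TOTAL)
for lang, share in sorted(SHARES.items(), key=lambda kv: -kv[1]):
    print(lang, share)
```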

Figure 9: Of the 878 ransomware sample PDB paths, English accounted for 510 (58.09%), and the remaining detected languages were Chinese 31 (3.53%), Dutch 40 (4.56%), Norwegian Nynorsk 4 (0.46%), Slovenian 5 (0.57%), Norwegian Bokmal 1 (0.11%), German 58 (6.61%), Italian 3 (0.34%), Hungarian 35 (3.99%), Maltese 18 (2.05%), Polish 5 (0.57%), Finnish 151 (17.20%), Luxembourgish 3 (0.34%), Danish 10 (1.14%), and Spanish 4 (0.46%). (Chart labels: zh, en, nl, nn, sl, nb, de, it, hu, mt, pl, fi, et, da, es.)

Actual Case Analysis. The automatic identification system helped us identify the traceable strings in the PDB path. Take as an example a sample PDB path (the Chinese has been translated into English) with an MD5 value of “60c6a92afb***6d0c16f7”:

“ELIAOSTUDIOUACsimulated an improved version of the 360 anti-virus program 1117 Bale1*41**2****0(**1) 9818 programWriteSystem32ReleaseVirtualDesktoppdb”

We can infer from the automatic analysis result that the ransomware maker is likely from China. This inference is supported by the following:

(1) Language Identification. Identifying the language of the path suggests that the attacker may have come from China. From the Chinese strings in the path, we find that the text describes an improvement intended to bypass detection by 360 security software, which has a large market in China.

(2) IP address. The system identified an IP address in the PDB path. With the help of a threat intelligence database, we found that the IP came from Xinjiang, China (Figure 10).

(3) Traceable Strings. The extracted identifier string “LIAO” resembles Chinese pinyin and is most likely a Chinese surname. To sum up, we speculate that the ransomware maker is most likely from China.

Figure 10: Partial infographic of IP 1*41**2****0 in the threat intelligence library.
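Spotting an IP address embedded in a path, as in item (2), can be sketched with a regular expression plus validation by the standard `ipaddress` module. The pattern, the function name, and the example path are assumptions for illustration; the example uses an address from a documentation range rather than the masked address in the sample.

```python
import ipaddress
import re

# Four dot-separated groups of 1 to 3 digits, not glued to other digits or dots.
IPV4_CANDIDATE = re.compile(r"(?<![\d.])(\d{1,3}(?:\.\d{1,3}){3})(?![\d.])")


def find_ipv4(text):
    """Return the dotted-quad substrings of text that are valid IPv4 addresses."""
    found = []
    for match in IPV4_CANDIDATE.finditer(text):
        candidate = match.group(1)
        try:
            ipaddress.IPv4Address(candidate)  # rejects octets above 255
        except ValueError:
            continue
        found.append(candidate)
    return found


# Hypothetical path; version-like fragments such as 9.8.18 are not matched.
EXAMPLE = r"D:\work\build 203.0.113.7(v1) 9.8.18\Release\demo.pdb"
print(find_ipv4(EXAMPLE))
```

An address found this way would still need a lookup in a threat intelligence source before any location claim is made.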

7. Conclusion

In this paper, we focus on tracing back RDP-based ransomware. Recently, more and more ransomware attackers are using RDP attacks to spread ransomware with impunity due to the use of strong cryptography. We propose tracing back the attack sources to deter RDP-based ransomware with the proposed deception environment. It collects traceable clues and performs automatic analysis by using natural language processing and machine learning techniques. The evaluation shows that it is able to trap attackers and collect traceable clues left by attackers in a deception environment. With automatic clue identification, it can reduce the amount of traceable clues to about 2%. We use this method to analyze two practical cases and show its effectiveness. By tracing back to ransomware attackers, we can provide a strong deterrent to stifle the development of ransomware.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Additional Points

This manuscript provides additional deception environment monitors and analytical methods by using more volunteers and samples. It proves the validity of the methods through specific traceback cases (i.e., tracing back the ransomware attacker and tracing back the ransomware maker).

Disclosure

An earlier version of this paper was presented at the IEEE Third International Conference on Data Science in Cyberspace, Guangzhou, China, 18-21 June 2018. The authors' initial conference paper focused mainly on the effectiveness of deception environment monitors and the screening rate of the analysis system.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors would like to thank VirusTotal for providing ransomware samples and the volunteers from the University of Chinese Academy of Sciences. This work is supported in part by the National Key Research and Development Plan (Grant no. 2018YFB0803504) and the National Natural Science Foundation of China (61871140, 61572153, U1636215, 61572492, and 61672020).

References

[1] A. Kharraz, “UNVEIL: A Large-Scale, Automated Approach to Detecting Ransomware,” in Proceedings of the 25th USENIX Security Symposium (USENIX Security 16), pp. 757–772, USENIX Association, 2016.

[2] K. Savage, P. Coogan, and H. Lau, “The evolution of ransomware,” Tech. Rep., 2015.

[3] A. Kharraz, W. Robertson, D. Balzarotti, L. Bilge, and E. Kirda, “Cutting the Gordian Knot: A Look Under the Hood of Ransomware Attacks,” in Detection of Intrusions and Malware, and Vulnerability Assessment, vol. 9148 of Lecture Notes in Computer Science, pp. 3–24, Springer International Publishing, Cham, 2015.

[4] S. Mohurle and M. Patil, “A brief study of Wannacry Threat: Ransomware Attack 2017,” International Journal of Advanced Research in Computer Science, vol. 8, no. 5, 2017.

[5] X. Yu, Z. Tian, J. Qiu, and F. Jiang, “A Data Leakage Prevention Method Based on the Reduction of Confidential and Context Terms for Smart Mobile Devices,” Wireless Communications and Mobile Computing, vol. 2018, Article ID 5823439, 11 pages, 2018.

[6] J. Yaneza, “Brute Force RDP Attacks Plant CRYSIS Ransomware,” https://blog.trendmicro.com/trendlabs-security-intelligence/brute-force-rdp-attacks-plant-crysis-ransomware.

[7] “10% of Ransomware Attacks on SMBs Targeted IoT Devices,” https://www.darkreading.com/application-security/10--of-ransomware-attacks-on-smbs-targeted-iot-devices-/d/d-id/1329817.

[8] Q. Tan, Y. Gao, J. Shi, X. Wang, B. Fang, and Z. H. Tian, “Towards a Comprehensive Insight into the Eclipse Attacks of Tor Hidden Services,” IEEE Internet of Things Journal, 2018.

[9] “Kaspersky Security Bulletin: STORY OF THE YEAR 2017,” 2017, https://media.kasperskycontenthub.com/wp-content/uploads/sites/43/2018/03/07164824/KSB_Story_of_the_Year_Ransomware_FINAL_eng.pdf.

[10] Z. Tian, Y. Cui, L. An et al., “A Real-Time Correlation of Host-Level Events in Cyber Range Service for Smart Campus,” IEEE Access, vol. 6, pp. 35355–35364, 2018.

[11] R. Ross et al., Managing Information Security Risk: Organisation, Mission, and Information System View, National Institute of Standards and Technology, 2011, http://csrc.nist.gov/publications/nistpubs/800-39/SP800-39-final.pdf.

[12] Y. Wang, Z. Tian, H. Zhang, S. Su, and W. Shi, “A Privacy Preserving Scheme for Nearest Neighbor Query,” Sensors, vol. 18, no. 8, p. 2440, 2018.

[13] Z. H. Wang, X. Wu, C. G. Liu, Q. X. Liu, and J. L. Zhang, “RansomTracer: Exploiting Cyber Deception for Ransomware Tracing,” in Proceedings of the IEEE Third International Conference on Data Science in Cyberspace, pp. 227–234, 2018.

[14] B. A. S. Al-rimy, M. A. Maarof, and S. Z. M. Shaid, “Ransomware threat success factors, taxonomy, and countermeasures: A survey and research directions,” Computers & Security, vol. 74, pp. 144–166, 2018.

[15] F. Jiang, Y. Fu, B. B. Gupta et al., “Deep Learning based Multi-channel intelligent attack detection for Data Security,” IEEE Transactions on Sustainable Computing, 2018.

[16] N. Andronio, S. Zanero, and F. Maggi, “HelDroid: dissecting and detecting mobile ransomware,” in Research in Attacks, Intrusions, and Defenses, vol. 9404 of Lecture Notes in Computer Science, pp. 382–404, Springer, 2015.

[17] F. Mercaldo, V. Nardone, A. Santone, and C. A. Visaggio, “Ransomware Steals Your Phone. Formal Methods Rescue It,” in Formal Techniques for Distributed Objects, Components, and Systems: 36th IFIP WG 6.1 International Conference, FORTE 2016, held as part of the 11th International Federated Conference on Distributed Computing Techniques, DisCoTec 2016, E. Albert and I. Lanese, Eds., pp. 212–221, Springer International Publishing, 2016.

[18] D. Sgandurra, L. Muñoz-González, R. Mohsen, and E. C. Lupu, “Automated Dynamic Analysis of Ransomware: Benefits, Limitations and use for Detection,” https://arxiv.org/abs/1609.03020.

[19] S. Song, B. Kim, and S. Lee, “The Effective Ransomware Prevention Technique Using Process Monitoring on Android Platform,” Mobile Information Systems, vol. 2016, Article ID 2946735, 9 pages, 2016.

[20] T. Yang, Y. Yang, K. Qian, D. C.-T. Lo, Y. Qian, and L. Tao, “Automated detection and analysis for android ransomware,” in Proceedings of the 17th IEEE International Conference on High Performance Computing and Communications, IEEE 7th International Symposium on Cyberspace Safety and Security, and IEEE 12th International Conference on Embedded Software and Systems, HPCC-ICESS-CSS 2015, pp. 1338–1343, USA, August 2015.

[21] N. Scaife, H. Carter, P. Traynor, and K. R. B. Butler, “CryptoLock (and Drop It): Stopping Ransomware Attacks on User Data,” in Proceedings of the 36th IEEE International Conference on Distributed Computing Systems, ICDCS 2016, pp. 303–312, Japan, June 2016.

[22] M. M. Ahmadian and H. R. Shahriari, “2entFOX: A framework for high survivable ransomwares detection,” in Proceedings of the 13th International ISC Conference on Information Security and Cryptology, ISCISC 2016, pp. 79–84, Iran, September 2016.

[23] M. M. Ahmadian, H. R. Shahriari, and S. M. Ghaffarian, “Connection-monitor & connection-breaker: A novel approach for prevention and detection of high survivable ransomwares,” in Proceedings of the 12th International ISC Conference on Information Security and Cryptology, ISCISC 2015, pp. 79–84, Iran, September 2015.

[24] D. Kim, W. Soh, and S. Kim, “Design of Quantification Model for Prevent of Cryptolocker,” Indian Journal of Science and Technology, vol. 8, no. 19, 2015.

[25] C. Moore, “Detecting Ransomware with Honeypot Techniques,” in Proceedings of the 2016 Cybersecurity and Cyberforensics Conference (CCC), pp. 77–81, Amman, Jordan, August 2016.

[26] F. Mbol, J. Robert, and A. Sadighian, “An Efficient Approach to Detect TorrentLocker Ransomware in Computer Systems,” in Cryptology and Network Security, S. Foresti and G. Persiano, Eds., vol. 10052 of Lecture Notes in Computer Science, pp. 532–541, Springer International Publishing, Cham, 2016.

[27] K. Cabaj, P. Gawkowski, K. Grochowski, and D. Osojca, “Network activity analysis of CryptoWall ransomware,” Przegląd Elektrotechniczny, vol. 91, no. 11, pp. 201–204, 2015.

[28] J. Chen, Z. Tian, X. Cui, L. Yin, and X. Wang, “Trust Architecture and Reputation Evaluation for Internet of Things,” Journal of Ambient Intelligence & Humanized Computing, vol. 2, pp. 1–9, 2018.

[29] C. Le Guernic and A. Legay, “Ransomware and the Legacy Crypto API. Paper presented at the Risks and,” in Risks and Security of Internet and Systems: 11th International Conference, CRiSIS 2016, Roscoff, France, 2017.

[30] L. Pingree, Emerging Technology Analysis: Deception Techniques and Technologies Create Security Technology Business Opportunities, Gartner, 2015.

[31] “Ransomware and Businesses,” https://www.symantec.com/content/dam/symantec/docs/security-center/white-papers/ransomware-and-businesses-16-en.pdf, 2016.

[32] A. Kharraz and E. Kirda, “Redemption: Real-Time Protection Against Ransomware at End-Hosts,” in Proceedings of the International Symposium on Research in Attacks, Intrusions, and Defenses, pp. 98–119, 2017.

[33] F. Bergstrand, J. Bergstrand, and H. Gunnarsson, Localization of Spyware in Windows Environments.

[34] “PEView,” https://www.aldeid.com/wiki/PEView.

[35] “python-pefile,” https://pypi.python.org/pypi/pefile.

[36] M. Lui and T. Baldwin, “langid.py: An off-the-shelf language identification tool,” in Proceedings of the ACL 2012 System Demonstrations, Association for Computational Linguistics, pp. 25–30, 2012.

[37] S. Bird and E. Loper, “NLTK: the natural language toolkit,” in Proceedings of the ACL 2004 on Interactive Poster and Demonstration Sessions, Association for Computational Linguistics, Barcelona, Spain, July 2004.

[38] “Rrenaud, Gibberish-Detector,” https://github.com/rrenaud/Gibberish-Detector/blob/master/README.rst.

[39] “VirusTotal,” http://virustotal.com.


Page 9: Automatically Traceback RDP-Based Targeted Ransomware …downloads.hindawi.com/journals/wcmc/2018/7943586.pdfResearchArticle Automatically Traceback RDP-Based Targeted Ransomware Attacks

Wireless Communications and Mobile Computing 9

Table 1: Traceable strings recognized result [13].

String        CommonWordsRecognized   GibberishRecognized   FinalRecognized
Wh****ter     True                    True                  True
gandcrab      True                    True                  True
kate z        True                    True                  True
vwrtjty       True                    False                 False
program       False                   True                  False

Identification Result. After the above analysis, the automated analysis system outputs the strings for which both the common-word tag and the gibberish tag are True; these strings form the list of traceable clues [13]. Table 1 is reproduced from Z. H. Wang et al. (2018) [under the Creative Commons Attribution License/public domain].

When the string is an ordinary word such as "program", common-word recognition tags it False. Gibberish recognition tags a string True only when it looks like a non-randomly generated identifier, so it rejects strings that do not conform to the spelling patterns of common words (e.g., "vwrtjty"). A string is considered to have auxiliary traceability only if both analysis values are True. We rely extensively on string inversion to verify the accuracy of the system. We found that most of the traceable strings on the volunteers' storage can be recognized. Table 1 shows the recognition results for several typical strings.
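The two-tag decision rule summarized in Table 1 can be sketched as follows. This is a minimal illustration and not the authors' implementation: the word list `COMMON_WORDS` and the vowel-ratio heuristic in `gibberish_tag` are assumptions introduced here as stand-ins for the stop-word list and the gibberish detector that the paper cites.

```python
# Hypothetical stand-in for a common-word / stop-word list.
COMMON_WORDS = {"program", "system", "windows", "user", "file", "desktop"}

VOWELS = set("aeiou")


def common_word_tag(s: str) -> bool:
    """True when the string is NOT an ordinary common word."""
    return s.lower() not in COMMON_WORDS


def gibberish_tag(s: str, min_vowel_ratio: float = 0.2) -> bool:
    """True when the string looks like a deliberately chosen identifier.

    Toy heuristic (an assumption, not the cited detector): a string with
    too few vowels among its letters is treated as random gibberish.
    """
    letters = [c for c in s.lower() if c.isalpha()]
    if not letters:
        return False
    vowels = sum(c in VOWELS for c in letters)
    return vowels / len(letters) >= min_vowel_ratio


def is_traceable(s: str) -> bool:
    """A string is kept as a clue only if both tags are True."""
    return common_word_tag(s) and gibberish_tag(s)


if __name__ == "__main__":
    for candidate in ["gandcrab", "kate z", "vwrtjty", "program"]:
        print(candidate, common_word_tag(candidate),
              gibberish_tag(candidate), is_traceable(candidate))
```

Requiring both tags keeps the rule conservative: a string is discarded as soon as either check marks it as uninformative.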

6. Evaluation

In this paper, we evaluate the proposed method with two experiments. The goal of the first experiment is to demonstrate that the proposed method can help trace back to RDP-based ransomware attackers and capture their private information. The goal of the second experiment is to demonstrate that the method can also automatically recognize clues in ransomware samples and help trace back to ransomware makers.

6.1. RDP-Based Ransomware Attacker

6.1.1. Clue Capture and Analysis. We used Windows 7 virtual machines to deploy the deception environment and assess its effectiveness. While Windows 7 is not required, it was chosen because of its wide range of applications and because it is one of the main targets of ransomware. We invited 122 professional volunteers to help with the experiment and provided them with an experimental fleet of 12 virtual hosts. The monitor system successfully captured most of the volunteers' login information, clipboard content, shared folder paths, and uploaded PE files.

We selected the path information collected from 12 of the volunteers' computers and recorded the rate of convergence after each step of the analysis. Figure 5 shows how many candidate clues remain after each processing step of the analysis system [13]. Figure 5 is reproduced from Z. H. Wang et al. (2018) [under the Creative Commons Attribution License/public domain].

[Figure 5: chart titled "Analysis Data Change Figure"; x-axis: Analysis Process (source, stopword, tokenize, gibberish, result); y-axis: Data quantity (0 to 50); one series per volunteer, volunteer1 to volunteer12.]

Figure 5: Changes in the data volume of the 12 volunteers' traceable clues through the analysis system. The screening rate can reach 98.34% [13].
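The convergence measurement behind Figure 5 can be illustrated with a short sketch that records how many candidates survive each stage and then computes the screening rate. The stage names follow the figure, but the filters, the sample strings, and the helper names `run_stages` and `screening_rate` are hypothetical choices made for this example.

```python
from typing import Callable, Dict, List, Tuple

Stage = Tuple[str, Callable[[str], bool]]


def run_stages(items: List[str], stages: List[Stage]) -> Tuple[List[str], Dict[str, int]]:
    """Apply each keep-filter in order and record the surviving count."""
    counts = {"source": len(items)}
    for name, keep in stages:
        items = [x for x in items if keep(x)]
        counts[name] = len(items)
    return items, counts


def screening_rate(n_source: int, n_result: int) -> float:
    """Percentage of source items that were filtered out."""
    if n_source <= 0:
        raise ValueError("n_source must be positive")
    return 100.0 * (n_source - n_result) / n_source


# Hypothetical filters standing in for the real analysis steps.
STOPWORDS = {"program", "files", "users"}
stages: List[Stage] = [
    ("stopword", lambda s: s not in STOPWORDS),
    ("gibberish", lambda s: any(c in "aeiou" for c in s)),
]

source = ["program", "files", "users", "vwrtjty", "gandcrab"]
result, counts = run_stages(source, stages)
rate = screening_rate(counts["source"], len(result))
print(counts, result, rate)
```

Recording the count after every stage, rather than only at the end, is what makes a per-step plot such as Figure 5 possible.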

6.2. Traceback Attackers. Next, we carried out a detailed analysis of the information left over from one of the volunteers' attacks. From the automatic analysis of this attacker's data, we can infer that the attacker's real identity may have the following characteristics: (1) the attacker is likely to come from China; (2) the attacker may work in security or a related field; (3) the attacker may work at the Chinese Academy of Sciences and have relations with a large security manufacturer in China. The conclusion is shown in Figure 6, which presents the results from the analysis system in five sections.

Username. Through automatic analysis, a total of three user names from the attacker's host were extracted from a large number of clues. From the "Dell" user name, it can be assumed that the attacker was using a Dell host.

Programs. The automatic analysis system identified several typical software programs installed on the attacker's host. For instance, "QQ" is a real-time social tool widely used in China; "AliPay" is an online payment tool developed by Alibaba and widely used in China; "Sogou input" is a Chinese input method developed by the Chinese company Sogou; and "MeiTu" is photo beautification software developed by a Chinese company and widely used in China. In addition, "Sinfor" and "360" are both well-known security manufacturers in China with a large number of security-related products, and "Visual Studio" is common development software. We found that much Chinese-made software widely used in China was installed on the attacker's host, as well as some development and security products that ordinary users install less often.

[Figure 6: diagram of the analysis output in five groups. UserName: Admin, Default, DELL. KeyboardLayout: CH-SIMPLIFIED. Programs: AliPay, QQ, SogouInput, MeiTu, Sinfor, 360, VisualStudio. Account: 353***208, 372***582, and two masked e-mail addresses. TraceableStrings: botnet, **e, **t, A***team, Wh***ter, freebuf, visumantrag.]

Figure 6: Analysis of an attacker using the automatic analysis system, which displays traceable information about the attacker in five aspects: host user name, program software, keyboard layout, account information, and traceable strings.

Keyboard Layout. Through network communication analysis, we found that the attacker's keyboard layout is Simplified Chinese, from which it can be inferred that the attacker's native language is probably Chinese. This is consistent with the software installed on the host.

Account Number. The system automatically identified two QQ accounts and two e-mail accounts. The e-mail accounts are Chinese Academy of Sciences business accounts, which supports the above assumption that the attacker is from China and is likely to work at the Chinese Academy of Sciences. Through the registration information of the QQ account (as shown in Figure 7), we can find the following information: (1) the attacker could be a male aged 32; (2) he does Internet-related work, likely with 360 Security; (3) he is located in Haidian District, Beijing, China. In the registration information of the "353***208" account, the mail, work, and other fields are empty, and the nickname is a distinctive English string, "Wh***ter"; it can be presumed that this account is likely for private use.

Figure 7: The registration infographic of QQ account 372***82.

Traceable Strings. The automatic analysis system sifts out strings from the monitor that may be traceable. "Freebuf" is a security information exchange website commonly used by Chinese security personnel. "Botnet" is a secondary attack method that is often used by attackers and is a common research topic for security researchers. "**e" and "**t" are acronyms for departments under the Chinese Academy of Sciences, while "A***team" is a security research team in the "**e" department. "Visumantrag" is often used when processing visas from various countries. "Wh***ter" matches the nickname of the QQ account "353***208" and is likely to be the attacker's regular nickname.

6.3. Ransomware Maker Automatic Analysis. In order to verify that the analysis system can be used to trace back the ransomware maker, we obtained more than 8000 ransomware samples from VirusTotal [39] and extracted the Program Database (PDB) path from the samples (as shown in Figure 8). The result shows that, although a large number of attackers deliberately erase compilation information, legacy PDB path information can still be found in a large number of ransomware samples.

Table 2: Same identifier for different samples [13].
Columns: Sample Hash; Same Identity Strings.
68dd613973f8a***7befe0f4b461d258ce569b9020079***9ae0f090266a60cc
jqxxpphoaonkje7z2fa0c084aef41969***1f427a1b65e1b1464fa82d297e712***e1fdf312c044814a36da0286d42cd***306fba99239c8238821beaf40744be***ba862dbc6d90f70cbc839bcb88***7422bbd2fa812dd3de02391b6a9f1***0762a6f49cf32501

[Figure 8: bar chart; axis labelled "number" (0 to 100); series: ransomware with PDB path (878) and ransomware without a PDB path (7197).]

Figure 8: Of the 8075 real ransomware samples, a PDB path could be extracted from 11%, while the authors of the remaining 89% may have chosen to hide the PDB path during compilation.
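The PDB path extraction step described above can be sketched without third-party dependencies. The paper cites PEView and python-pefile for PE inspection; the byte scan below is a simplified stand-in chosen for this example, not the authors' tool. It relies on the CodeView "RSDS" debug record layout (4-byte signature, 16-byte GUID, 4-byte age, then a null-terminated path), and the function name `extract_pdb_path` and the sample bytes are hypothetical.

```python
from typing import Optional

RSDS = b"RSDS"
HEADER_LEN = 4 + 16 + 4  # signature + GUID + age


def extract_pdb_path(data: bytes) -> Optional[str]:
    """Return the first plausible PDB path found in a PE image, or None.

    Heuristic scan: it does not parse the PE debug directory, so it can
    miss records or match unrelated bytes in unusual files.
    """
    pos = data.find(RSDS)
    while pos != -1:
        start = pos + HEADER_LEN
        end = data.find(b"\x00", start)
        if end > start:
            try:
                path = data[start:end].decode("utf-8")
            except UnicodeDecodeError:
                path = ""
            if path.lower().endswith(".pdb"):
                return path
        pos = data.find(RSDS, pos + 1)
    return None


# Synthetic example bytes (not a real sample).
fake_image = (
    b"MZ" + b"\x00" * 64
    + RSDS + bytes(16) + (1).to_bytes(4, "little")
    + b"C:\\build\\demo\\Release\\demo.pdb\x00"
    + b"trailing bytes"
)
print(extract_pdb_path(fake_image))
```

A sample whose author stripped debug information simply yields `None`, which corresponds to the "without a PDB path" group in Figure 8.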

Moreover, we used the analysis system to automatically analyze the PDB paths of more than 800 different ransomware samples. We found identical identity strings in the analysis results of different samples. One typical result is shown in Table 2 [13]. Table 2 is reproduced from Z. H. Wang et al. (2018) [under the Creative Commons Attribution License/public domain]. We used VirusTotal to validate the samples and found that they were all Cobra family ransomware. As a result, it could be assumed that these samples were made by the same ransomware maker. Once one sample is traced back, the other samples lead to the same ransomware maker. When different traceable information is extracted from different samples, the identity of the ransomware maker can be described by integrating that information.
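Linking samples through a shared identity string amounts to building an inverted index from identifier to sample. The sketch below shows one way to do this; the function name, the sample labels, and the identifier strings are hypothetical and are not taken from Table 2.

```python
from collections import defaultdict
from typing import Dict, List, Set


def shared_identifiers(sample_ids: Dict[str, Set[str]],
                       min_samples: int = 2) -> Dict[str, List[str]]:
    """Map each identifier seen in at least `min_samples` samples
    to the sorted list of samples that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for sample, identifiers in sample_ids.items():
        for identifier in identifiers:
            index[identifier].add(sample)
    return {
        identifier: sorted(samples)
        for identifier, samples in index.items()
        if len(samples) >= min_samples
    }


# Hypothetical analysis output: sample label -> identity strings in its PDB path.
analysis = {
    "sample_a": {"build_user_x", "projname"},
    "sample_b": {"build_user_x"},
    "sample_c": {"other_user"},
}
links = shared_identifiers(analysis)
print(links)
```

Samples grouped under one identifier are only candidates for a common author; the paper additionally checks the malware family with VirusTotal before drawing that conclusion.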

Language Identification. The result of applying language recognition to the PDB paths of the 878 samples is shown in Figure 9. The system automatically identifies the language of each path; the paths mainly contain 11 languages, of which English accounts for the largest proportion.
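The per-language proportions can be reproduced from the counts with a small tally. In the sketch below the labels are hard-coded from the counts reported in the Figure 9 caption; in practice they would come from a language identifier such as the langid.py tool cited by the paper, which is not called here. The function name `language_shares` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def language_shares(labels: Iterable[str]) -> Dict[str, Tuple[int, float]]:
    """Return {language: (count, percentage rounded to 2 decimals)}."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        lang: (n, round(100.0 * n / total, 2))
        for lang, n in counts.most_common()
    }


# Counts as reported in the Figure 9 caption.
FIGURE_9_COUNTS = {
    "English": 510, "Chinese": 31, "Dutch": 40, "Norwegian Nynorsk": 4,
    "Slovenian": 5, "Norwegian Bokmal": 1, "German": 58, "Italian": 3,
    "Hungarian": 35, "Maltese": 18, "Polish": 5, "Finnish": 151,
    "Luxembourgish": 3, "Danish": 10, "Spanish": 4,
}

labels = [lang for lang, n in FIGURE_9_COUNTS.items() for _ in range(n)]
shares = language_shares(labels)
for lang, (n, pct) in shares.items():
    print(f"{lang}: {n} ({pct}%)")
```

Short path strings give a language identifier very little text to work with, so the smaller categories are best read as approximate.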

[Figure 9: chart of the languages identified in the PDB paths; legend codes: zh, en, nl, nn, sl, nb, de, it, hu, mt, pl, fi, et, da, es.]

Figure 9: Of the 878 ransomware sample PDB paths, English accounted for 510 (58.09%), and the remaining identified languages were Chinese 31 (3.53%), Dutch 40 (4.56%), Norwegian Nynorsk 4 (0.46%), Slovenian 5 (0.57%), Norwegian Bokmål 1 (0.11%), German 58 (6.61%), Italian 3 (0.34%), Hungarian 35 (3.99%), Maltese 18 (2.05%), Polish 5 (0.57%), Finnish 151 (17.20%), Luxembourgish 3 (0.34%), Danish 10 (1.14%), and Spanish 4 (0.46%).

Actual Case Analysis. The automatic identification system helped us identify the traceable strings in the PDB path. Take as an example a sample PDB path (the Chinese has been translated into English) with an MD5 value of "60c6a92afb***6d0c16f7":

"ELIAOSTUDIOUACsimulated an improved version of the 360 anti-virus program 1117 Bale1*41**2****0(**1) 9818 programWriteSystem32ReleaseVirtualDesktoppdb"

We can infer from the automatic analysis result that the ransomware maker is likely from China. The following evidence supports this inference.

(1) Language Identification. Identifying the language of the path suggests that the attacker may have come from China. Reading the Chinese strings in the path, we find that the text describes an improvement intended to bypass detection by 360 security software, which has a large market in China.

(2) IP Address. The system identified an IP address in the PDB path. With the help of a threat intelligence database, we found that the IP came from Xinjiang, China (Figure 10).

(3) Traceable Strings. The extracted identifier string "LIAO" resembles Chinese pinyin and is most likely a Chinese surname.

To sum up, we speculate that the ransomware maker is most likely from China.

Figure 10: Partial infographic of IP 1*41**2****0 in the threat intelligence library.
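Pulling an IP address and candidate identifier components out of a PDB path, as in the case above, can be sketched with the standard library. The path used here is invented for illustration (it uses a documentation-range address), and the helper names `find_ipv4` and `path_components` are hypothetical; the paper does not describe its extraction rules at this level of detail.

```python
import re
from ipaddress import IPv4Address
from typing import List

# Four dot-separated groups of 1 to 3 digits, not embedded in a longer dotted number.
IPV4_RE = re.compile(r"(?<![\d.])(\d{1,3}(?:\.\d{1,3}){3})(?![\d.])")


def find_ipv4(text: str) -> List[str]:
    """Return substrings of `text` that parse as valid IPv4 addresses."""
    found = []
    for match in IPV4_RE.finditer(text):
        candidate = match.group(1)
        try:
            IPv4Address(candidate)  # rejects octets above 255
        except ValueError:
            continue
        found.append(candidate)
    return found


def path_components(path: str) -> List[str]:
    """Split a Windows or POSIX style path into non-empty components."""
    return [part for part in re.split(r"[\\/]", path) if part]


# Hypothetical PDB path, not the one from the sample discussed above.
pdb_path = r"E:\SOMEUSER\proj\203.0.113.7 build 11.17\Release\demo.pdb"
print(find_ipv4(pdb_path))
print(path_components(pdb_path))
```

Version-like fragments such as "11.17" are ignored because they do not have four numeric groups, which keeps such strings from being reported as addresses.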

7. Conclusion

In this paper, we focus on tracing back RDP-based ransomware. Recently, more and more ransomware attackers are using RDP attacks to spread ransomware, and they do so with impunity because of the use of strong cryptography. We propose tracing back the attack sources to deter RDP-based ransomware with the proposed deception environment. It collects traceable clues and performs automatic analysis by using natural language processing and machine learning techniques. The evaluation shows that it is able to trap attackers and collect the traceable clues they leave in a deception environment. With automatic clue identification, it can reduce the amount of traceable clues to about 2% of the collected data. We use this method to analyze two practical cases and show its effectiveness. By tracing back to ransomware attackers, we can provide a strong deterrent to stifle the development of ransomware.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Additional Points

This manuscript provides additional deception environment monitors and analytical methods by using more volunteers and samples. It proves the validity of the methods through specific traceback cases (i.e., tracing back the ransomware attacker and tracing back the ransomware maker).

Disclosure

An earlier version of this paper was presented at the IEEE Third International Conference on Data Science in Cyberspace, Guangzhou, China, 18–21 June 2018. The authors' initial conference paper focused mainly on the effectiveness of deception environment monitors and the screening rate of the analysis system.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors would like to thank VirusTotal for providing ransomware samples and the volunteers from the University of Chinese Academy of Sciences. This work is supported in part by the National Key Research and Development Plan (Grant no. 2018YFB0803504) and the National Natural Science Foundation of China (61871140, 61572153, U1636215, 61572492, and 61672020).

References

[1] A. Kharraz, "UNVEIL: A Large-Scale, Automated Approach to Detecting Ransomware," in Proceedings of the 25th USENIX Security Symposium (USENIX Security 16), pp. 757–772, USENIX Association, 2016.

[2] K. Savage, P. Coogan, and H. Lau, "The evolution of ransomware," Tech. Rep., 2015.

[3] A. Kharraz, W. Robertson, D. Balzarotti, L. Bilge, and E. Kirda, "Cutting the Gordian Knot: A Look Under the Hood of Ransomware Attacks," in Detection of Intrusions and Malware, and Vulnerability Assessment, vol. 9148 of Lecture Notes in Computer Science, pp. 3–24, Springer International Publishing, Cham, 2015.

[4] S. Mohurle and M. Patil, "A brief study of Wannacry Threat: Ransomware Attack 2017," International Journal of Advanced Research in Computer Science, vol. 8, no. 5, 2017.

[5] X. Yu, Z. Tian, J. Qiu, and F. Jiang, "A Data Leakage Prevention Method Based on the Reduction of Confidential and Context Terms for Smart Mobile Devices," Wireless Communications and Mobile Computing, vol. 2018, Article ID 5823439, 11 pages, 2018.

[6] J. Yaneza, "Brute Force RDP Attacks Plant CRYSIS Ransomware," https://blog.trendmicro.com/trendlabs-security-intelligence/brute-force-rdp-attacks-plant-crysis-ransomware.

[7] "10% of Ransomware Attacks on SMBs Targeted IoT Devices," https://www.darkreading.com/application-security/10--of-ransomware-attacks-on-smbs-targeted-iot-devices/d/d-id/1329817.

[8] Q. Tan, Y. Gao, J. Shi, X. Wang, B. Fang, and Z. H. Tian, "Towards a Comprehensive Insight into the Eclipse Attacks of Tor Hidden Services," IEEE Internet of Things Journal, 2018.

[9] "Kaspersky Security Bulletin: STORY OF THE YEAR 2017," 2017, https://media.kasperskycontenthub.com/wp-content/uploads/sites/43/2018/03/07164824/KSB_Story_of_the_Year_Ransomware_FINAL_eng.pdf.

[10] Z. Tian, Y. Cui, L. An et al., "A Real-Time Correlation of Host-Level Events in Cyber Range Service for Smart Campus," IEEE Access, vol. 6, pp. 35355–35364, 2018.

[11] R. Ross et al., Managing Information Security Risk: Organisation, Mission, and Information System View, National Institute of Standards and Technology, 2011, http://csrc.nist.gov/publications/nistpubs/800-39/SP800-39-final.pdf.

[12] Y. Wang, Z. Tian, H. Zhang, S. Su, and W. Shi, "A Privacy Preserving Scheme for Nearest Neighbor Query," Sensors, vol. 18, no. 8, p. 2440, 2018.

[13] Z. H. Wang, X. Wu, C. G. Liu, Q. X. Liu, and J. L. Zhang, "RansomTracer: Exploiting Cyber Deception for Ransomware Tracing," in Proceedings of the IEEE Third International Conference on Data Science in Cyberspace, pp. 227–234, 2018.

[14] B. A. S. Al-rimy, M. A. Maarof, and S. Z. M. Shaid, "Ransomware threat success factors, taxonomy, and countermeasures: A survey and research directions," Computers & Security, vol. 74, pp. 144–166, 2018.

[15] F. Jiang, Y. Fu, B. B. Gupta et al., "Deep Learning based Multi-channel intelligent attack detection for Data Security," IEEE Transactions on Sustainable Computing, 2018.

[16] N. Andronio, S. Zanero, and F. Maggi, "HelDroid: dissecting and detecting mobile ransomware," in Research in Attacks, Intrusions, and Defenses, vol. 9404 of Lecture Notes in Computer Science, pp. 382–404, Springer, 2015.

[17] F. Mercaldo, V. Nardone, A. Santone, and C. A. Visaggio, "Ransomware Steals Your Phone. Formal Methods Rescue It," in Formal Techniques for Distributed Objects, Components, and Systems: 36th IFIP WG 6.1 International Conference, FORTE 2016, held as part of the 11th International Federated Conference on Distributed Computing Techniques, DisCoTec 2016, E. Albert and I. Lanese, Eds., pp. 212–221, Springer International Publishing, 2016.

[18] D. Sgandurra, L. Muñoz-González, R. Mohsen, and E. C. Lupu, "Automated Dynamic Analysis of Ransomware: Benefits, Limitations and use for Detection," https://arxiv.org/abs/1609.03020.

[19] S. Song, B. Kim, and S. Lee, "The Effective Ransomware Prevention Technique Using Process Monitoring on Android Platform," Mobile Information Systems, vol. 2016, Article ID 2946735, 9 pages, 2016.

[20] T. Yang, Y. Yang, K. Qian, D. C.-T. Lo, Y. Qian, and L. Tao, "Automated detection and analysis for android ransomware," in Proceedings of the 17th IEEE International Conference on High Performance Computing and Communications, IEEE 7th International Symposium on Cyberspace Safety and Security, and IEEE 12th International Conference on Embedded Software and Systems, HPCC-ICESS-CSS 2015, pp. 1338–1343, USA, August 2015.

[21] N. Scaife, H. Carter, P. Traynor, and K. R. B. Butler, "CryptoLock (and Drop It): Stopping Ransomware Attacks on User Data," in Proceedings of the 36th IEEE International Conference on Distributed Computing Systems, ICDCS 2016, pp. 303–312, Japan, June 2016.

[22] M. M. Ahmadian and H. R. Shahriari, "2entFOX: A framework for high survivable ransomwares detection," in Proceedings of the 13th International ISC Conference on Information Security and Cryptology, ISCISC 2016, pp. 79–84, Iran, September 2016.

[23] M. M. Ahmadian, H. R. Shahriari, and S. M. Ghaffarian, "Connection-monitor & connection-breaker: A novel approach for prevention and detection of high survivable ransomwares," in Proceedings of the 12th International ISC Conference on Information Security and Cryptology, ISCISC 2015, pp. 79–84, Iran, September 2015.

[24] D. Kim, W. Soh, and S. Kim, "Design of Quantification Model for Prevent of Cryptolocker," Indian Journal of Science and Technology, vol. 8, no. 19, 2015.

[25] C. Moore, "Detecting Ransomware with Honeypot Techniques," in Proceedings of the 2016 Cybersecurity and Cyberforensics Conference (CCC), pp. 77–81, Amman, Jordan, August 2016.

[26] F. Mbol, J. Robert, and A. Sadighian, "An Efficient Approach to Detect TorrentLocker Ransomware in Computer Systems," in Cryptology and Network Security, S. Foresti and G. Persiano, Eds., vol. 10052 of Lecture Notes in Computer Science, pp. 532–541, Springer International Publishing, Cham, 2016.

[27] K. Cabaj, P. Gawkowski, K. Grochowski, and D. Osojca, "Network activity analysis of CryptoWall ransomware," Przegląd Elektrotechniczny, vol. 91, no. 11, pp. 201–204, 2015.

[28] J. Chen, Z. Tian, X. Cui, L. Yin, and X. Wang, "Trust Architecture and Reputation Evaluation for Internet of Things," Journal of Ambient Intelligence & Humanized Computing, vol. 2, pp. 1–9, 2018.

[29] C. Le Guernic and A. Legay, "Ransomware and the Legacy Crypto API," in Risks and Security of Internet and Systems: 11th International Conference, CRiSIS 2016, Roscoff, France, 2017.

[30] L. Pingree, Emerging Technology Analysis: Deception Techniques and Technologies Create Security Technology Business Opportunities, Gartner, 2015.

[31] "Ransomware and Businesses," https://www.symantec.com/content/dam/symantec/docs/security-center/white-papers/ransomware-and-businesses-16-en.pdf, 2016.

[32] A. Kharraz and E. Kirda, "Redemption: Real-Time Protection Against Ransomware at End-Hosts," in Proceedings of the International Symposium on Research in Attacks, Intrusions, and Defenses, pp. 98–119, 2017.

[33] F. Bergstrand, J. Bergstrand, and H. Gunnarsson, Localization of Spyware in Windows Environments.

[34] "PEView," https://www.aldeid.com/wiki/PEView.

[35] "python-pefile," https://pypi.python.org/pypi/pefile.

[36] M. Lui and T. Baldwin, "langid.py: An off-the-shelf language identification tool," in Proceedings of the ACL 2012 System Demonstrations, Association for Computational Linguistics, pp. 25–30, 2012.

[37] S. Bird and E. Loper, "NLTK: the natural language toolkit," in Proceedings of the ACL 2004 on Interactive Poster and Demonstration Sessions, Association for Computational Linguistics, Barcelona, Spain, July 2004.

[38] "Rrenaud, Gibberish-Detector," https://github.com/rrenaud/Gibberish-Detector/blob/master/README.rst.

[39] "VirusTotal," http://virustotal.com.


[6] J Yaneza ldquoBrute Force RDP Attacks Plant CRYSIS Ransom-warerdquo httpsblogtrendmicrocomtrendlabs-security-intelli-gencebrute-force-rdp-attacks-plant-crysis-ransomware

[7] ldquo10 of Ransomware Attacks on SMBs Targeted IoT Devicesrdquohttpswwwdarkreadingcomapplication-security10ndashof-ran-somware-attacks-on-smbs-targeted-iot-devices-dd-id1329817

[8] Q Tan Y Gao J Shi X Wang B Fang and Z H TianldquoTowards a Comprehensive Insight into the Eclipse Attacks ofTor Hidden Servicesrdquo IEEE Internet of Things Journal 2018

[9] ldquoKaspersky Security Bulletin STORY OF THE YEAR 2017rdquo2017 httpsmediakasperskycontenthubcomwp-contentup-loadssites4320180307164824KSB Story of the Year Ran-somware FINAL engpdf

[10] Z Tian Y Cui L An et al ldquoA Real-Time Correlation of Host-Level Events in Cyber Range Service for Smart Campusrdquo IEEEAccess vol 6 pp 35355ndash35364 2018

[11] R Ross et al Managing Information Security Risk Organi-sation Mission and Information System View National Insti-tute of Standards and Technology 2011 httpcsrcnistgovpublicationsnistpubs800-39SP800-39-finalpdf

[12] Y Wang Z Tian H Zhang S Su and W Shi ldquoA PrivacyPreserving Scheme for Nearest Neighbor Queryrdquo Sensors vol18 no 8 p 2440 2018

[13] Z H Wang X Wu C G Liu Q X Liu and J L ZhangldquoRansomTracer Exploiting Cyber Deception for RansomwareTracingrdquo in Proceedings of the IEEEThird International Confer-ence on Data Science in Cyberspace pp 227ndash234 2018

[14] B A S Al-rimyMAMaarof and S ZM Shaid ldquoRansomwarethreat success factors taxonomy and countermeasures A

Wireless Communications and Mobile Computing 13

survey and research directionsrdquo Computers amp Security vol 74pp 144ndash166 2018

[15] F Jiang Y Fu B B Gupta et al ldquoDeep Learning based Multi-channel intelligent attack detection for Data Securityrdquo IEEETransactions on Sustainable Computing 2018

[16] N Andronio S Zanero and F Maggi ldquoHelDroid dissectingand detecting mobile ransomwarerdquo in Research in AttacksIntrusions and Defenses vol 9404 of Lecture Notes in ComputerScience pp 382ndash404 Springer 2015

[17] F Mercaldo V Nardone A Santone and C A VisaggioldquoRansomware Steals Your Phone Formal Methods Rescue Itrdquoin Formal Techniques For Distributed Objects Components AndSystems 36th IFIPWG 61 International Conference FORTE2016 held as part of the 11th International Federated ConferenceOn Distributed Computing Techniques DisCoTec 2016 E Albertand I Lanese Eds pp 212ndash221 Springer International Publish-ing 2016

[18] D Sgandurra L Muplusmnoz-Gonzszliglez R Mohsen and E C LupuldquoAutomated Dynamic Analysis of Ransomware Benefits Limi-tations and use for Detectionrdquo httpsarxivorgabs160903020

[19] S Song B Kim and S Lee ldquoThe Effective RansomwarePrevention Technique Using Process Monitoring on AndroidPlatformrdquo Mobile Information Systems vol 2016 Article ID2946735 9 pages 2016

[20] T Yang Y Yang K Qian D C-T Lo Y Qian and L TaoldquoAutomated detection and analysis for android ransomwarerdquoin Proceedings of the 17th IEEE International Conference onHigh Performance Computing and Communications IEEE 7thInternational Symposium on Cyberspace Safety and Security andIEEE 12th International Conference on Embedded Software andSystems HPCC-ICESS-CSS 2015 pp 1338ndash1343 USA August2015

[21] N Scaife H Carter P Traynor andK R B Butler ldquoCryptoLock(and Drop It) Stopping Ransomware Attacks on User Datardquo inProceedings of the 36th IEEE International Conference on Dis-tributed Computing Systems ICDCS 2016 pp 303ndash312 JapanJune 2016

[22] M M Ahmadian and H R Shahriari ldquo2entFOX A frameworkfor high survivable ransomwares detectionrdquo in Proceedings ofthe 13th International ISC Conference on Information Securityand Cryptology ISCISC 2016 pp 79ndash84 Iran September 2016

[23] M M Ahmadian H R Shahriari and S M GhaffarianldquoConnection-monitoramp connection-breakerA novel approachfor prevention and detection of high survivable ransomwaresrdquoin Proceedings of the 12th International ISC Conference onInformation Security and Cryptology ISCISC 2015 pp 79ndash84Iran September 2015

[24] D Kim W Soh and S Kim ldquoDesign of Quantification Modelfor Prevent of Cryptolockerrdquo Indian Journal of Science andTechnology vol 8 no 19 2015

[25] CMoore ldquoDetectingRansomwarewithHoneypotTechniquesrdquoin Proceedings of the 2016 Cybersecurity and CyberforensicsConference (CCC) pp 77ndash81 Amman Jordan August 2016

[26] F Mbol J Robert and A Sadighian ldquoAn Efficient Approachto Detect TorrentLocker Ransomware in Computer Systemsrdquoin Cryptology and Network Security S Foresti and G PersianoEds vol 10052 of Lecture Notes in Computer Science pp 532ndash541 Springer International Publishing Cham 2016

[27] K Cabaj P Gawkowski K Grochowski and D Osojca ldquoNet-work activity analysis of CryptoWall ransomwarerdquo PrzeglądElektrotechniczny vol 91 no 11 pp 201ndash204 2015

[28] J Chen Z Tian X Cui L Yin and X Wang ldquoTrust Architec-ture and Reputation Evaluation for Internet of Thingsrdquo Journalof Ambient Intelligence amp Humanized Computing vol 2 pp 1ndash92018

[29] C Le Guernic and A Legay ldquoRansomware and the LegacyCrypto API Paper presented at the Risks andrdquo in Risks andSecurity of Internet and Systems 11th International ConferenceCRiSIS 2016 Roscoff France 2017

[30] L Pingree Emerging Technology Analysis Deception Techniquesand Technologies Create Security Technology Business Opportu-nities Gartner 2015

[31] ldquoRansomware and Businessesrdquo httpswwwsymanteccomcontentdamsymantecdocssecurity-centerwhite-papersran-somware-and-businesses-16-enpdf 2016

[32] A Kharraz and E Kirda ldquoRedemption Real-Time ProtectionAgainst Ransomware at End-Hostsrdquo in Proceedings of theInternational Symposium on Research in Attacks Intrusions andDefenses pp 98ndash119 2017

[33] F Bergstrand J Bergstrand and H Gunnarsson Localizationof Spyware in Windows Environments

[34] ldquoPEViewrdquo httpswwwaldeidcomwikiPEView[35] ldquopython-pefilerdquo httpspypipythonorgpypipefile[36] M Lui and T Baldwin ldquolangid py An off-the-shelf language

identification toolrdquo in Proceedings of the ACL 2012 systemdemonstrations Association for Computational Linguistics pp25ndash30 2012

[37] S Bird and E Loper ldquoNLTK the natural language toolkitrdquoin Proceedings of the ACL 2004 on Interactive poster anddemonstration sessions Association for Computational Linguis-tics Barcelona Spain July 2004

[38] ldquoRrenaud Gibberish-Detectorrdquo httpsgithubcomrrenaudGibberish-DetectorblobmasterREADMErst

[39] ldquoVirusTotalrdquo httpvirustotalcom


Wireless Communications and Mobile Computing 11

Table 2: Same identifier for different samples [13].

Sample Hash    Same Identity Strings
68dd613973f8a***7befe0f4b461d258ce569b9020079***9ae0f090266a60cc
jqxxpphoaonkje7z2fa0c084aef41969***1f427a1b65e1b1464fa82d297e712***e1fdf312c044814a36da0286d42cd***306fba99239c8238821beaf40744be***ba862dbc6d90f70cbc839bcb88***7422bbd2fa812dd3de02391b6a9f1***0762a6f49cf32501

Figure 8: Of the 8075 real ransomware samples, a Program Database (PDB) path could be extracted from 11%, while the authors of the remaining 89% may have chosen to hide the PDB path during compilation. (Bar chart: 878 samples with a PDB path, 7197 without.)

samples from VirusTotal [39] and extracted the Program Database (PDB) path from the samples (as shown in Figure 8). It shows that, although a large number of attackers deliberately erase compilation information, legacy PDB path information can still be found in a large number of ransomware samples.
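The extraction step can be illustrated with a minimal sketch. The paper cites python-pefile [35] for PE parsing; the code below does not use it. It assumes a simpler heuristic instead: scanning the raw bytes for a CodeView "RSDS" record, which stores a 4-byte signature, a 16-byte GUID, a 4-byte age, and then the null-terminated PDB path. The function name `find_pdb_path` is hypothetical, and this is not the authors' implementation.

```python
from typing import Optional

RSDS_SIGNATURE = b"RSDS"
HEADER_LEN = 4 + 16 + 4  # signature + GUID + age precede the path


def find_pdb_path(data: bytes) -> Optional[str]:
    """Return the PDB path of the first plausible RSDS record, or None.

    Heuristic byte scan (an assumption of this sketch); a robust tool
    would walk the PE debug directory instead.
    """
    pos = data.find(RSDS_SIGNATURE)
    while pos != -1:
        start = pos + HEADER_LEN
        end = data.find(b"\x00", start)  # the path is null-terminated
        if end > start:
            raw = data[start:end]
            # Accept only candidates that look like a PDB file name.
            if raw.lower().endswith(b".pdb"):
                return raw.decode("utf-8", errors="replace")
        pos = data.find(RSDS_SIGNATURE, pos + 1)
    return None
```

A sample whose author stripped the debug information simply yields `None`, which corresponds to the "without a PDB path" group in Figure 8.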

What is more, the analysis system was used to automatically analyze the PDB paths of more than 800 different ransomware samples. We found multiple identity strings in the analysis results of different samples. One of the typical results is shown in Table 2 [13]. Table 2 is reproduced from Z. H. Wang et al. (2018) [under the Creative Commons Attribution License/public domain]. We used VirusTotal to validate the samples and found that they were all Cobra family ransomware. As a result, it can be assumed that these samples were made by the same ransomware maker. When one sample is traced back, the other samples lead to the same conclusion about the ransomware maker. When different traceable information is obtained from different samples, the identity of the ransomware maker can be described by integrating that information.
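The idea of linking samples through a shared identity string can be sketched as a simple grouping step. The record layout, the function name `group_by_identifier`, and the example hashes below are hypothetical; the paper does not describe how its analysis system stores these results.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def group_by_identifier(
    records: Iterable[Tuple[str, Set[str]]]
) -> Dict[str, List[str]]:
    """Map each identity string to the samples that contain it.

    `records` holds (sample_hash, identity_strings) pairs. Only strings
    seen in at least two samples are kept, since those are the ones that
    suggest a common maker.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for sample_hash, identifiers in records:
        for ident in identifiers:
            groups[ident].append(sample_hash)
    return {
        ident: sorted(hashes)
        for ident, hashes in groups.items()
        if len(hashes) > 1
    }


if __name__ == "__main__":
    # Hypothetical, shortened hashes and identifiers for illustration only.
    demo = [
        ("aaa111", {"builder_x", "tmp"}),
        ("bbb222", {"builder_x"}),
        ("ccc333", {"other"}),
    ]
    print(group_by_identifier(demo))
```

Clues traced from any one sample in a group can then be merged with clues from the others to build a fuller profile of the maker.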

Language Identification. When we perform language identification on the PDB paths of the 878 samples, the result is as shown in Figure 9. The system automatically identifies the language of each path; the paths mainly contain 11 languages, of which English accounts for the largest proportion.
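As a small worked check, the percentage shares can be recomputed from the per-language counts reported in Figure 9. The dictionary below copies those counts; the function name `language_shares` is an assumption of this sketch.

```python
from typing import Dict

# Per-language counts of PDB paths as reported in Figure 9.
FIGURE9_COUNTS: Dict[str, int] = {
    "English": 510, "Chinese": 31, "Dutch": 40, "Norwegian Nynorsk": 4,
    "Slovenian": 5, "Norwegian Bokmal": 1, "German": 58, "Italian": 3,
    "Hungarian": 35, "Maltese": 18, "Polish": 5, "Finnish": 151,
    "Luxembourgish": 3, "Danish": 10, "Spanish": 4,
}


def language_shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Return each language's share of all paths, in percent (2 decimals)."""
    total = sum(counts.values())
    return {lang: round(100 * n / total, 2) for lang, n in counts.items()}


if __name__ == "__main__":
    for lang, share in language_shares(FIGURE9_COUNTS).items():
        print(f"{lang}: {share}%")
```

The counts sum to the 878 samples that had a PDB path, which is consistent with the figure for that group in Figure 8.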

Figure 9: Of the 878 ransomware sample PDB paths, English accounted for 510 (58.09%), and the remaining detected languages were Chinese 31 (3.53%), Dutch 40 (4.56%), Norwegian Nynorsk 4 (0.46%), Slovenian 5 (0.57%), Norwegian Bokmål 1 (0.11%), German 58 (6.61%), Italian 3 (0.34%), Hungarian 35 (3.99%), Maltese 18 (2.05%), Polish 5 (0.57%), Finnish 151 (17.20%), Luxembourgish 3 (0.34%), Danish 10 (1.14%), and Spanish 4 (0.46%).

Actual Case Analysis. The automatic identification system helped us identify the traceable strings in the PDB path. Take as an example a sample PDB path (the Chinese has been translated into English) with an MD5 value of "60c6a92afb***6d0c16f7":

"E:\LIAO\STUDIO\UAC\simulated an improved version of the 360 anti-virus program 1117 Bale\1*41**2****0(**1) 9818 program\WriteSystem32\Release\VirtualDesktop.pdb"

We can infer from the automatic analysis result that the ransomware maker is likely from China. This inference is supported by the following:

(1) Language Identification. The language of the path is identified, which suggests that the attacker may have come from China. Reading the Chinese strings in the path, we find that the text describes an improvement intended to bypass detection by 360 security software, which has a large market in China.

(2) IP address. The system identified an IP address in the PDB path. With the help of a threat intelligence database, the IP address was found to originate from Xinjiang, China (Figure 10).

Figure 10: Partial infographic of IP 1*41**2****0 in the threat intelligence library.

(3) Traceable Strings. The extracted identifier string "LIAO" resembles Chinese pinyin and is most likely a Chinese surname.

To sum up, we speculate that the ransomware maker is most likely from China.
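The three kinds of clue used in this case (path language, embedded IP address, identifier strings) can be pulled out of a PDB path with a few lines of standard-library code. This is a rough sketch under stated assumptions, not the authors' analysis system: the function name `extract_clues`, the stop list of generic directory names, and the example path are all hypothetical. The non-ASCII check is only a crude stand-in for the language identification tool the paper cites [36].

```python
import re
from pathlib import PureWindowsPath
from typing import Dict, List, Union

# Dotted-quad IPv4 pattern with each octet limited to 0-255.
_OCTET = r"(?:25[0-5]|2[0-4]\d|1?\d?\d)"
IPV4_RE = re.compile(rf"(?<![\d.])(?:{_OCTET}\.){{3}}{_OCTET}(?![\d.])")

# Hypothetical stop list of directory names that carry no identity.
GENERIC_DIRS = {"release", "debug", "x64", "x86", "bin", "obj", "src",
                "users", "documents", "desktop", "projects"}


def extract_clues(pdb_path: str) -> Dict[str, Union[List[str], bool]]:
    """Collect candidate traceable clues from one PDB path."""
    parts = PureWindowsPath(pdb_path).parts
    directories = parts[1:-1]  # skip the drive anchor and the file name
    identifiers = [
        p for p in directories
        if p.isalpha() and p.lower() not in GENERIC_DIRS
    ]
    return {
        "ip_addresses": IPV4_RE.findall(pdb_path),
        "identifiers": identifiers,
        # Non-ASCII text hints at a non-English path worth language analysis.
        "has_non_ascii": any(ord(ch) > 127 for ch in pdb_path),
    }


if __name__ == "__main__":
    # Made-up path; 203.0.113.7 is a documentation-only address.
    example = r"E:\ALICE\STUDIO\demo 203.0.113.7\Release\Demo.pdb"
    print(extract_clues(example))
```

Each extracted value is only a candidate: an IP address still has to be looked up in a threat intelligence source, and an identifier still has to be judged by an analyst, as in the case above.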

7. Conclusion

In this paper, we focus on tracing back RDP-based ransomware. Recently, more and more ransomware attackers have been using RDP attacks to spread ransomware, and they do so with impunity owing to the use of strong cryptography. We propose tracing back the attack sources to deter RDP-based ransomware by means of the proposed deception environment. It collects traceable clues and analyzes them automatically using natural language processing and machine learning techniques. The evaluation shows that it is able to trap attackers and collect the traceable clues they leave in a deception environment. With automatic clue identification, it can narrow the amount of traceable clues down to about 2%. We used this method to analyze two practical cases and showed its effectiveness. By tracing back to ransomware attackers, we can provide a strong deterrent to stifle the development of ransomware.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Additional Points

This manuscript provides additional deception environment monitors and analytical methods, and uses more volunteers and samples. It demonstrates the validity of the methods through specific traceback cases (i.e., tracing back the ransomware attacker and tracing back the ransomware maker).

Disclosure

An earlier version of this paper was presented at the IEEE Third International Conference on Data Science in Cyberspace, Guangzhou, China, 18-21 June 2018. The authors' initial conference paper focused mainly on the effectiveness of the deception environment monitors and the screening rate of the analysis system.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors would like to thank VirusTotal for providing ransomware samples and the volunteers from the University of Chinese Academy of Sciences. This work is supported in part by the National Key Research and Development Plan (Grant no. 2018YFB0803504) and the National Natural Science Foundation of China (61871140, 61572153, U1636215, 61572492, and 61672020).

References

[1] A. Kharraz, "UNVEIL: A Large-Scale, Automated Approach to Detecting Ransomware," in Proceedings of the 25th USENIX Security Symposium (USENIX Security 16), pp. 757–772, USENIX Association, 2016.

[2] K. Savage, P. Coogan, and H. Lau, "The evolution of ransomware," Tech. Rep., 2015.

[3] A. Kharraz, W. Robertson, D. Balzarotti, L. Bilge, and E. Kirda, "Cutting the Gordian Knot: A Look Under the Hood of Ransomware Attacks," in Detection of Intrusions and Malware, and Vulnerability Assessment, vol. 9148 of Lecture Notes in Computer Science, pp. 3–24, Springer International Publishing, Cham, 2015.

[4] S. Mohurle and M. Patil, "A brief study of Wannacry Threat: Ransomware Attack 2017," International Journal of Advanced Research in Computer Science, vol. 8, no. 5, 2017.

[5] X. Yu, Z. Tian, J. Qiu, and F. Jiang, "A Data Leakage Prevention Method Based on the Reduction of Confidential and Context Terms for Smart Mobile Devices," Wireless Communications and Mobile Computing, vol. 2018, Article ID 5823439, 11 pages, 2018.

[6] J. Yaneza, "Brute Force RDP Attacks Plant CRYSIS Ransomware," https://blog.trendmicro.com/trendlabs-security-intelligence/brute-force-rdp-attacks-plant-crysis-ransomware/.

[7] "10% of Ransomware Attacks on SMBs Targeted IoT Devices," https://www.darkreading.com/application-security/10--of-ransomware-attacks-on-smbs-targeted-iot-devices/d/d-id/1329817.

[8] Q. Tan, Y. Gao, J. Shi, X. Wang, B. Fang, and Z. H. Tian, "Towards a Comprehensive Insight into the Eclipse Attacks of Tor Hidden Services," IEEE Internet of Things Journal, 2018.

[9] "Kaspersky Security Bulletin: STORY OF THE YEAR 2017," 2017, https://media.kasperskycontenthub.com/wp-content/uploads/sites/43/2018/03/07164824/KSB_Story_of_the_Year_Ransomware_FINAL_eng.pdf.

[10] Z. Tian, Y. Cui, L. An et al., "A Real-Time Correlation of Host-Level Events in Cyber Range Service for Smart Campus," IEEE Access, vol. 6, pp. 35355–35364, 2018.

[11] R. Ross et al., Managing Information Security Risk: Organisation, Mission, and Information System View, National Institute of Standards and Technology, 2011, http://csrc.nist.gov/publications/nistpubs/800-39/SP800-39-final.pdf.

[12] Y. Wang, Z. Tian, H. Zhang, S. Su, and W. Shi, "A Privacy Preserving Scheme for Nearest Neighbor Query," Sensors, vol. 18, no. 8, p. 2440, 2018.

[13] Z. H. Wang, X. Wu, C. G. Liu, Q. X. Liu, and J. L. Zhang, "RansomTracer: Exploiting Cyber Deception for Ransomware Tracing," in Proceedings of the IEEE Third International Conference on Data Science in Cyberspace, pp. 227–234, 2018.

[14] B. A. S. Al-rimy, M. A. Maarof, and S. Z. M. Shaid, "Ransomware threat success factors, taxonomy, and countermeasures: A survey and research directions," Computers & Security, vol. 74, pp. 144–166, 2018.

[15] F. Jiang, Y. Fu, B. B. Gupta et al., "Deep Learning based Multi-channel intelligent attack detection for Data Security," IEEE Transactions on Sustainable Computing, 2018.

[16] N. Andronio, S. Zanero, and F. Maggi, "HelDroid: dissecting and detecting mobile ransomware," in Research in Attacks, Intrusions, and Defenses, vol. 9404 of Lecture Notes in Computer Science, pp. 382–404, Springer, 2015.

[17] F. Mercaldo, V. Nardone, A. Santone, and C. A. Visaggio, "Ransomware Steals Your Phone. Formal Methods Rescue It," in Formal Techniques for Distributed Objects, Components, and Systems: 36th IFIP WG 6.1 International Conference, FORTE 2016, held as part of the 11th International Federated Conference on Distributed Computing Techniques, DisCoTec 2016, E. Albert and I. Lanese, Eds., pp. 212–221, Springer International Publishing, 2016.

[18] D. Sgandurra, L. Muñoz-González, R. Mohsen, and E. C. Lupu, "Automated Dynamic Analysis of Ransomware: Benefits, Limitations and use for Detection," https://arxiv.org/abs/1609.03020.

[19] S. Song, B. Kim, and S. Lee, "The Effective Ransomware Prevention Technique Using Process Monitoring on Android Platform," Mobile Information Systems, vol. 2016, Article ID 2946735, 9 pages, 2016.

[20] T. Yang, Y. Yang, K. Qian, D. C.-T. Lo, Y. Qian, and L. Tao, "Automated detection and analysis for android ransomware," in Proceedings of the 17th IEEE International Conference on High Performance Computing and Communications, IEEE 7th International Symposium on Cyberspace Safety and Security, and IEEE 12th International Conference on Embedded Software and Systems, HPCC-ICESS-CSS 2015, pp. 1338–1343, USA, August 2015.

[21] N. Scaife, H. Carter, P. Traynor, and K. R. B. Butler, "CryptoLock (and Drop It): Stopping Ransomware Attacks on User Data," in Proceedings of the 36th IEEE International Conference on Distributed Computing Systems, ICDCS 2016, pp. 303–312, Japan, June 2016.

[22] M. M. Ahmadian and H. R. Shahriari, "2entFOX: A framework for high survivable ransomwares detection," in Proceedings of the 13th International ISC Conference on Information Security and Cryptology, ISCISC 2016, pp. 79–84, Iran, September 2016.

[23] M. M. Ahmadian, H. R. Shahriari, and S. M. Ghaffarian, "Connection-monitor & connection-breaker: A novel approach for prevention and detection of high survivable ransomwares," in Proceedings of the 12th International ISC Conference on Information Security and Cryptology, ISCISC 2015, pp. 79–84, Iran, September 2015.

[24] D. Kim, W. Soh, and S. Kim, "Design of Quantification Model for Prevent of Cryptolocker," Indian Journal of Science and Technology, vol. 8, no. 19, 2015.

[25] C. Moore, "Detecting Ransomware with Honeypot Techniques," in Proceedings of the 2016 Cybersecurity and Cyberforensics Conference (CCC), pp. 77–81, Amman, Jordan, August 2016.

[26] F. Mbol, J. Robert, and A. Sadighian, "An Efficient Approach to Detect TorrentLocker Ransomware in Computer Systems," in Cryptology and Network Security, S. Foresti and G. Persiano, Eds., vol. 10052 of Lecture Notes in Computer Science, pp. 532–541, Springer International Publishing, Cham, 2016.

[27] K. Cabaj, P. Gawkowski, K. Grochowski, and D. Osojca, "Network activity analysis of CryptoWall ransomware," Przegląd Elektrotechniczny, vol. 91, no. 11, pp. 201–204, 2015.

[28] J. Chen, Z. Tian, X. Cui, L. Yin, and X. Wang, "Trust Architecture and Reputation Evaluation for Internet of Things," Journal of Ambient Intelligence & Humanized Computing, vol. 2, pp. 1–9, 2018.

[29] C. Le Guernic and A. Legay, "Ransomware and the Legacy Crypto API," in Risks and Security of Internet and Systems: 11th International Conference, CRiSIS 2016, Roscoff, France, 2017.

[30] L. Pingree, Emerging Technology Analysis: Deception Techniques and Technologies Create Security Technology Business Opportunities, Gartner, 2015.

[31] "Ransomware and Businesses," https://www.symantec.com/content/dam/symantec/docs/security-center/white-papers/ransomware-and-businesses-16-en.pdf, 2016.

[32] A. Kharraz and E. Kirda, "Redemption: Real-Time Protection Against Ransomware at End-Hosts," in Proceedings of the International Symposium on Research in Attacks, Intrusions, and Defenses, pp. 98–119, 2017.

[33] F. Bergstrand, J. Bergstrand, and H. Gunnarsson, Localization of Spyware in Windows Environments.

[34] "PEView," https://www.aldeid.com/wiki/PEView.

[35] "python-pefile," https://pypi.python.org/pypi/pefile.

[36] M. Lui and T. Baldwin, "langid.py: An off-the-shelf language identification tool," in Proceedings of the ACL 2012 System Demonstrations, Association for Computational Linguistics, pp. 25–30, 2012.

[37] S. Bird and E. Loper, "NLTK: the natural language toolkit," in Proceedings of the ACL 2004 on Interactive Poster and Demonstration Sessions, Association for Computational Linguistics, Barcelona, Spain, July 2004.

[38] "Rrenaud Gibberish-Detector," https://github.com/rrenaud/Gibberish-Detector/blob/master/README.rst.

[39] "VirusTotal," http://virustotal.com.


12 Wireless Communications and Mobile Computing

Figure 10 Partial infographic of IP 1lowast41lowastlowast2lowastlowastlowastlowast0 in the threatintelligence library

(3) Traceable Strings The extracted identifier stringldquoLIAOrdquo resembles a Chinese pinyin and is most likely a Chi-nese surname To sum up we speculate that the ransomwaremaker is most likely from China

7 Conclusion

In this paper we focus on tracing back RDP-based ran-somware Recently more and more ransomware attackers areusing RDP attacks to spread ransomware with impunity dueto the use of strong cryptography We propose tracing backthe attack sources to deter RDP-based ransomware with theproposed deception environment It collects traceable cluesand performs automatic analysis by using natural languageprocessing and machine learning techniques The evaluationshows that it is able to trap attackers and collect traceableclues left by attackers in a deception environment Withautomatic clue identification it can converge the amount oftraceable clues to about 2 We use this method to analyzetwo practical cases and show its effectiveness By tracing backto ransomware attackers we can provide a strong deterrent tostifle the development of ransomware

Data Availability

The data used to support the findings of this study areavailable from the corresponding author upon request

Additional Points

This manuscript provides additional deception environmentmonitors and analytical methods by using more volunteersand samples It proves the validity of the methods throughspecific traceback cases (ie traceback the ransomwareattacker and traceback the ransomware maker)

Disclosure

An earlier version of this paper was presented at the Interna-tional Conference IEEE Third International Conference onData Science in Cyberspace Guangzhou China 18-21 June2018The authorsrsquo initial conference paper focused mainly onthe effectiveness of deception environment monitors and thescreening rate of the analysis system

Conflicts of Interest

The authors declare that they have no conflicts of interest

Acknowledgments

The authors would like to thank VirusTotal for providingransomware samples and the volunteers from the Universityof Chinese Academy of Sciences This work is supportedin part by the National Key research and DevelopmentPlan (Grant no 2018YFB0803504) and the National NaturalScience Foundation of China (61871140 61572153 U163621561572492 and 61672020)

References

[1] A Kharraz ldquoUNVEIL A Large-Scale Automated Approachto Detecting Ransomwarerdquo in Proceedings of the 25th USENIXSecurity Symposium (USENIX Security 16) pp 757ndash772USENIX Association 2016

[2] K Savage P Coogan and H Lau ldquoThe evolution of ran-somwarerdquo Tech Rep 2015

[3] A Kharraz W Robertson D Balzarotti L Bilge and EKirda ldquoCutting the Gordian Knot A Look Under the Hood ofRansomware Attacksrdquo in Detection of Intrusions and Malwareand Vulnerability Assessment vol 9148 of Lecture Notes inComputer Science pp 3ndash24 Springer International PublishingCham 2015

[4] S Mohurle and M Patil ldquoA brief study of Wannacry ThreatRansomware Attack 2017rdquo International Journal of AdvancedResearch in Computer Science vol 8 no 5 2017

[5] X Yu Z Tian J Qiu and F Jiang ldquoA Data Leakage PreventionMethod Based on the Reduction of Confidential and ContextTerms for SmartMobileDevicesrdquoWireless Communications andMobile Computing vol 2018 Article ID 5823439 11 pages 2018

[6] J Yaneza ldquoBrute Force RDP Attacks Plant CRYSIS Ransom-warerdquo httpsblogtrendmicrocomtrendlabs-security-intelli-gencebrute-force-rdp-attacks-plant-crysis-ransomware

[7] ldquo10 of Ransomware Attacks on SMBs Targeted IoT Devicesrdquohttpswwwdarkreadingcomapplication-security10ndashof-ran-somware-attacks-on-smbs-targeted-iot-devices-dd-id1329817

[8] Q Tan Y Gao J Shi X Wang B Fang and Z H TianldquoTowards a Comprehensive Insight into the Eclipse Attacks ofTor Hidden Servicesrdquo IEEE Internet of Things Journal 2018

[9] ldquoKaspersky Security Bulletin STORY OF THE YEAR 2017rdquo2017 httpsmediakasperskycontenthubcomwp-contentup-loadssites4320180307164824KSB Story of the Year Ran-somware FINAL engpdf

[10] Z Tian Y Cui L An et al ldquoA Real-Time Correlation of Host-Level Events in Cyber Range Service for Smart Campusrdquo IEEEAccess vol 6 pp 35355ndash35364 2018

[11] R Ross et al Managing Information Security Risk Organi-sation Mission and Information System View National Insti-tute of Standards and Technology 2011 httpcsrcnistgovpublicationsnistpubs800-39SP800-39-finalpdf

[12] Y Wang Z Tian H Zhang S Su and W Shi ldquoA PrivacyPreserving Scheme for Nearest Neighbor Queryrdquo Sensors vol18 no 8 p 2440 2018

[13] Z H Wang X Wu C G Liu Q X Liu and J L ZhangldquoRansomTracer Exploiting Cyber Deception for RansomwareTracingrdquo in Proceedings of the IEEEThird International Confer-ence on Data Science in Cyberspace pp 227ndash234 2018

[14] B A S Al-rimyMAMaarof and S ZM Shaid ldquoRansomwarethreat success factors taxonomy and countermeasures A

Wireless Communications and Mobile Computing 13

survey and research directionsrdquo Computers amp Security vol 74pp 144ndash166 2018

[15] F Jiang Y Fu B B Gupta et al ldquoDeep Learning based Multi-channel intelligent attack detection for Data Securityrdquo IEEETransactions on Sustainable Computing 2018

[16] N Andronio S Zanero and F Maggi ldquoHelDroid dissectingand detecting mobile ransomwarerdquo in Research in AttacksIntrusions and Defenses vol 9404 of Lecture Notes in ComputerScience pp 382ndash404 Springer 2015

[17] F Mercaldo V Nardone A Santone and C A VisaggioldquoRansomware Steals Your Phone Formal Methods Rescue Itrdquoin Formal Techniques For Distributed Objects Components AndSystems 36th IFIPWG 61 International Conference FORTE2016 held as part of the 11th International Federated ConferenceOn Distributed Computing Techniques DisCoTec 2016 E Albertand I Lanese Eds pp 212ndash221 Springer International Publish-ing 2016

[18] D Sgandurra L Muplusmnoz-Gonzszliglez R Mohsen and E C LupuldquoAutomated Dynamic Analysis of Ransomware Benefits Limi-tations and use for Detectionrdquo httpsarxivorgabs160903020

[19] S. Song, B. Kim, and S. Lee, “The Effective Ransomware Prevention Technique Using Process Monitoring on Android Platform,” Mobile Information Systems, vol. 2016, Article ID 2946735, 9 pages, 2016.

[20] T. Yang, Y. Yang, K. Qian, D. C.-T. Lo, Y. Qian, and L. Tao, “Automated detection and analysis for android ransomware,” in Proceedings of the 17th IEEE International Conference on High Performance Computing and Communications, IEEE 7th International Symposium on Cyberspace Safety and Security, and IEEE 12th International Conference on Embedded Software and Systems, HPCC-ICESS-CSS 2015, pp. 1338–1343, USA, August 2015.

[21] N. Scaife, H. Carter, P. Traynor, and K. R. B. Butler, “CryptoLock (and Drop It): Stopping Ransomware Attacks on User Data,” in Proceedings of the 36th IEEE International Conference on Distributed Computing Systems, ICDCS 2016, pp. 303–312, Japan, June 2016.

[22] M. M. Ahmadian and H. R. Shahriari, “2entFOX: A framework for high survivable ransomwares detection,” in Proceedings of the 13th International ISC Conference on Information Security and Cryptology, ISCISC 2016, pp. 79–84, Iran, September 2016.

[23] M. M. Ahmadian, H. R. Shahriari, and S. M. Ghaffarian, “Connection-monitor & connection-breaker: A novel approach for prevention and detection of high survivable ransomwares,” in Proceedings of the 12th International ISC Conference on Information Security and Cryptology, ISCISC 2015, pp. 79–84, Iran, September 2015.

[24] D. Kim, W. Soh, and S. Kim, “Design of Quantification Model for Prevent of Cryptolocker,” Indian Journal of Science and Technology, vol. 8, no. 19, 2015.

[25] C. Moore, “Detecting Ransomware with Honeypot Techniques,” in Proceedings of the 2016 Cybersecurity and Cyberforensics Conference (CCC), pp. 77–81, Amman, Jordan, August 2016.

[26] F. Mbol, J. Robert, and A. Sadighian, “An Efficient Approach to Detect TorrentLocker Ransomware in Computer Systems,” in Cryptology and Network Security, S. Foresti and G. Persiano, Eds., vol. 10052 of Lecture Notes in Computer Science, pp. 532–541, Springer International Publishing, Cham, 2016.

[27] K. Cabaj, P. Gawkowski, K. Grochowski, and D. Osojca, “Network activity analysis of CryptoWall ransomware,” Przegląd Elektrotechniczny, vol. 91, no. 11, pp. 201–204, 2015.

[28] J. Chen, Z. Tian, X. Cui, L. Yin, and X. Wang, “Trust Architecture and Reputation Evaluation for Internet of Things,” Journal of Ambient Intelligence & Humanized Computing, vol. 2, pp. 1–9, 2018.

[29] C. Le Guernic and A. Legay, “Ransomware and the Legacy Crypto API,” in Risks and Security of Internet and Systems: 11th International Conference, CRiSIS 2016, Roscoff, France, 2017.

[30] L. Pingree, Emerging Technology Analysis: Deception Techniques and Technologies Create Security Technology Business Opportunities, Gartner, 2015.

[31] “Ransomware and Businesses,” https://www.symantec.com/content/dam/symantec/docs/security-center/white-papers/ransomware-and-businesses-16-en.pdf, 2016.

[32] A. Kharraz and E. Kirda, “Redemption: Real-Time Protection Against Ransomware at End-Hosts,” in Proceedings of the International Symposium on Research in Attacks, Intrusions, and Defenses, pp. 98–119, 2017.

[33] F. Bergstrand, J. Bergstrand, and H. Gunnarsson, Localization of Spyware in Windows Environments.

[34] “PEView,” https://www.aldeid.com/wiki/PEView.

[35] “python-pefile,” https://pypi.python.org/pypi/pefile.

[36] M. Lui and T. Baldwin, “langid.py: An off-the-shelf language identification tool,” in Proceedings of the ACL 2012 System Demonstrations, Association for Computational Linguistics, pp. 25–30, 2012.

[37] S. Bird and E. Loper, “NLTK: the natural language toolkit,” in Proceedings of the ACL 2004 on Interactive poster and demonstration sessions, Association for Computational Linguistics, Barcelona, Spain, July 2004.

[38] “Rrenaud Gibberish-Detector,” https://github.com/rrenaud/Gibberish-Detector/blob/master/README.rst.

[39] “VirusTotal,” http://virustotal.com.
