Automatically Traceback RDP-Based Targeted Ransomware...
Research Article
Automatically Traceback RDP-Based Targeted Ransomware Attacks
ZiHan Wang,1 ChaoGe Liu,1 Jing Qiu,2 ZhiHong Tian,2 Xiang Cui,2 and Shen Su2
1 Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China
2 Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou, China
Correspondence should be addressed to ChaoGe Liu; liuchaoge@iie.ac.cn, Jing Qiu; qiujing@gzhu.edu.cn, and ZhiHong Tian; tianzhihong@gzhu.edu.cn
Received 13 July 2018; Revised 24 October 2018; Accepted 22 November 2018; Published 6 December 2018
Guest Editor: Vishal Sharma
Copyright © 2018 ZiHan Wang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
While various ransomware defense systems have been proposed to deal with traditional randomly spread ransomware attacks (based on their unique high-noise behaviors at hosts and on networks), none of them considers ransomware attacks that precisely aim at specific hosts, e.g., using the common Remote Desktop Protocol (RDP). To address this problem, we propose a systematic method to fight such specifically targeted ransomware by trapping attackers via a network deception environment and then using traceback techniques to identify attack sources. In particular, we developed various monitors in the proposed deception environment to gather traceable clues about attackers, and we further designed an analysis system that automatically extracts and analyzes the collected clues. Our evaluations show that the proposed method can trap the adversary in the deception environment and significantly improve the efficiency of clue analysis. Furthermore, it also helps us trace back RDP-based ransomware attackers and ransomware makers in practical applications.
1. Introduction
Ransomware first emerged in the late 1980s [1, 2] and has resurfaced since 2013 [3]. Recently, several widespread ransomware attacks have caused significant damage to a large number of user systems and businesses on the Internet. Symantec reported a 250% increase in new crypto ransomware families between 2013 and 2014 [2]. In May 2017, WannaCry spread across more than 150 countries and 200,000 computers in just a few days and severely disrupted many businesses and personal systems [4, 5]. In addition, specifically targeted ransomware like Crysis disrupted many small and large enterprises across the globe; e.g., Trend Micro observed that the Crysis family specifically targeted businesses in Australia and New Zealand in September 2016. The number of such targeted ransomware attacks doubled in January 2017 compared with late 2016 [6]. What is more, the lack of focus on security has left IoT (Internet of Things) devices vulnerable, and they have been the target of 10% of all ransomware attacks. Researchers predict that IoT ransomware attacks are likely to increase to around 25% to 30% of all ransomware cases [7].
Because traditional ransomware was typically spread randomly without specific targets, via network scanning or host probing, it can be easily detected by monitoring abnormal behaviors in host activities, such as file system operations and network traffic [1, 3, 8]. Recently, more and more ransomware attacks have aimed at specific targets. The Kaspersky Security Bulletin indicated that targeted attacks became one of the main propagation methods for several widespread ransomware families in 2017 [9, 10]. For instance, an attacker using Crysis ransomware first logs in to a victim's host and spreads the ransomware via a brute-force attack on the common Remote Desktop Protocol (RDP). Such a targeted ransomware attack usually has a clear command-and-control structure and aims at resource exploitation and resource theft on these targets while generating fairly limited noise on hosts and networks, which makes it hard to detect.
Existing ransomware defense methods (designed for dealing with randomly spread attacks) usually protect a host by blocking the spreading of ransomware attacks (in nearly real time) based on the signatures generated by ransomware detection solutions. However, because targeted ransomware attacks have different characteristics and less notable patterns, these traditional blocking-based defense systems become much less effective against them.
Hindawi, Wireless Communications and Mobile Computing, Volume 2018, Article ID 7943586, 13 pages, https://doi.org/10.1155/2018/7943586
To address this issue, we propose to utilize advanced defense schemes to protect important hosts under targeted ransomware attacks. In this paper, we utilize cyber deception technology to protect critical systems through attack guidance, drawing attackers away from these protected systems. While cyber deception technology helps us protect important targets (such as in dealing with the Advanced Persistent Threat (APT) [11, 12]), it cannot by itself help us trace back attack sources. We therefore further design specific techniques to trace back RDP-based ransomware attacks and identify the original attack sources, as the main deterrence against ransomware attackers.
Our deception environment simulates an actual user system in three layers, with multiple monitors to observe key system operations related to login, network communication, clipboard, process, shared folder, and file system. It collects traceable clues and helps us detect the RDP ransomware attack. Because traditional traceback methods usually require security experts to manually analyze a large amount of collected clues, it is difficult for them to achieve fast responses. Therefore, we develop an automatic analysis system to work on traceable clues by taking advantage of natural language processing and machine learning techniques.
To evaluate our system, we invited 122 volunteers to take part in a simulated RDP-based ransomware attack. The proposed system was able to capture traceable clues through the proposed deception environment. It can also automatically analyze the clues effectively: the convergence rate of the analysis system reaches about 98%. Moreover, we demonstrated that it helps us trace back RDP-based attack sources in practical applications.
In summary, this paper makes the following contributions:
(i) We propose a systematic method to deter RDP-based ransomware by identifying attackers; it traps ransomware attackers via a cyber deception environment and uses an automatic analysis system to obtain traceable clues and identify attack sources.
(ii) We build a deception environment to trap RDP-based ransomware attackers by simulating a user environment in three layers: a network layer, a host layer, and a file system layer. The environment helps us discover attacker behaviors and collects attacker-related information.
(iii) We develop an automatic analysis system with natural language processing and machine learning techniques to automatically recognize effective clues for tracing back ransomware attack sources.
(iv) We designed two practical experiments, on RDP-based ransomware attacks and on ransomware makers, and demonstrated the feasibility of the proposed system.
The remainder of the paper is structured as follows. In Section 2, we briefly present background and related work. In Section 3, we describe the methodology of our systematic method. In Section 4, we present the implementation of our deception environment prototype. In Section 5, we describe the details of the clue analysis system. In Section 6, we discuss the evaluation setup and results. We conclude this paper in Section 7.
2. Background and Related Work
2.1. Related Work on Ransomware Defense. Ransomware is a type of malware that manipulates a user system to extort money. It operates in many different ways, e.g., simply locking a user's desktop or encrypting an entire file system. Recent rampant ransomware attacks have called for effective ransomware defense solutions, and several solutions have been proposed to counter this attack [14, 15].
Some of these solutions are proposed to deal with all types of ransomware [1, 16–20]. For example, Kharraz presented a dynamic analysis system called UNVEIL. The system analyzes and detects ransomware attacks by modeling ransomware behaviors. It focuses on the observation of three elements, namely, I/O data buffer entropy, access patterns, and file system activities [1]. Others are type-specific solutions that deal with only one type, such as crypto-ransomware [21–25]. For example, Scaife presented an early-warning detection system that alerts users during suspicious file activities [21]. Utilizing a set of behavior indicators, the detection system can halt a process that appears to tamper with a large amount of user data. Furthermore, it is claimed that the system can stop a ransomware execution with a median loss of only 10 files. Similarly, some studies tackle the detection of specific ransomware families only [26–28]. For example, Maltester is a family-specific technique proposed by Cabaj to detect Cryptowall infections [27]. It employs dynamic analysis along with honeypot technology to analyze the network behavior and detect the infection chain.
These solutions can be categorized into prevention and detection. However, these two kinds of countermeasures have the following disadvantages. First, many prevention measures require services to be disabled, which is likely to affect service functionality. For example, Prakash suggested several prevention measures, including disabling macros in office documents and restricting access permissions on the "Temp" and "Appdata" folders [29]. Second, it is often difficult for a detection system to conceal itself and perform its functions against ransomware attacks that precisely aim at specific hosts, e.g., using the common Remote Desktop Protocol (RDP). Finally, while these countermeasures can be used to detect or block specific ransomware attacks, they cannot fundamentally inhibit the spread of ransomware. Traceability technology, in contrast, can fundamentally inhibit the spread of ransomware by tracing back to attack sources.
2.2. RDP-Based Ransomware Attacks. Traditional ransomware spreads randomly across the Internet on a large scale in executable files, development kits, macro files, and other malicious programs, with various dissemination methods including phishing emails, watering-hole attacks, vulnerability attacks, server intrusion, and supply chain pollution. These attacks use different ways to trick a victim into launching such programs. Among these dissemination methods, phishing emails are the most widely used. However, according to Kaspersky's 2017 ransomware report, the number of targeted ransomware attacks based on RDP is growing rapidly.
Figure 1: Data collection and analysis process of the whole prototype. Stages ① to ⑤: ensnare the attacker into the deception environment; monitor and detect ransomware (yes/no); extraction; analysis; result.
Recently, more and more ransomware criminals have gained access through RDP services and then installed ransomware manually. These attackers use a brute-force method to acquire usernames and passwords on a target machine with an active RDP service [9]. For instance, Crysis, one of the typical families and a copycat of Locky, not only aims at common businesses but also targets healthcare service providers [9]. Crysis gains access to admin-level privileges by stealing passwords and credentials. In addition, during an RDP session, the attacker uses both the clipboard and shared folders to upload files to the remote host. The attacker can then install the ransomware manually.
2.3. Cyber Deception Technology. Because many critical systems are well known and always on, it is difficult to protect them from potential network attacks:
(i) Attackers can use zero-day vulnerabilities, highly adversarial malicious code, or other resources to break the defense system.
(ii) Because humans are always the weakest link in defense systems, attackers can use social engineering to identify system weaknesses and penetrate the defense.
(iii) Attackers can repeatedly explore the potential vulnerabilities on a target system to identify its weaknesses.
Moreover, when an attacker aims at a specific target, e.g., exploiting its RDP service, traditional passive defense methods are usually less effective. Therefore, we need to use advanced active solutions, such as cyber deception, to deal with such attacks with less observable features.
The earliest form of cyber deception technology is the honeypot. A honeypot detects attacks by deploying, in the service network, a series of systems or resources that carry no real business; any access to such a trap indicates an attack. A honeypot system generally waits for attacks passively and does not play the role of misleading and confusing attackers. What is more, because the honeypot system carries no real business and is not highly interactive, it may be easily identified by attackers. Compared with the traditional honeypot system, a cyber deception system can be deployed more conveniently, its environment is more realistic, and it can be linked with existing defense products. It can provide more effective solutions for defending against APT attacks, ransomware attacks, intranet attacks, and other threats. A Gartner report in 2015 [30] pointed out the market prospects of deception-based security defense technology and predicted that 10% of organizations would use deception tools (or tactics) to counter cyber-attacks in 2018. Compared with the traditional passive defense approach, cyber deception technology is an active defense approach and can be applied to all stages of network attacks. We can use this technology to trap RDP-based attackers, detect targeted attacks, and deter ransomware attackers by precisely identifying them.
Trap RDP-Based Ransomware Attackers. A targeted ransomware attack generally has three steps: detection, infiltration, and execution [31]. However, traditional security solutions are unable to cope with the internal translation phase. In addition, traditional honeypot technology (often used to fight network attacks) generally does not focus on tracing back to attackers. Cyber deception technology, in contrast, can lure the attacker into a surveillance environment and consume his time and energy with bait information.
Detect RDP-Based Ransomware Attacks. Once the attacker obtains the correct username and password combination, he usually returns multiple times within a short period to try to infect the compromised host [6]. In one particular case, Crysis was deployed six times on an endpoint within a span of 10 minutes. As a result, by monitoring in the cyber deception environment, we can detect RDP-based ransomware attacks in time and determine the attacker's behavior through the environment monitor.
Deter the Ransomware Attacker. Deterring ransomware attackers can be approached in two ways. First, if an attacker realizes that he is entrapped, that realization itself becomes a deterrent. Second, if the attacker is exposed to the deception environment and remains within view of the defense surveillance, the monitor can collect traceable clues that the attacker accidentally releases (e.g., IP address, path, nickname strings). Exposing these clues, which attackers try to keep hidden, can be a powerful deterrent to other attackers.
3. Methodology
In this section, we describe our method of tracing back RDP-based ransomware attackers. Figure 1 summarizes the data collection and analysis process of the entire prototype. First, we implement a deception environment to trap attackers. Second, we monitor RDP-based ransomware attacks and collect information when they occur. Third, we extract effective clues from the monitor information. Fourth, we use automatic analysis to screen a large number of clues for tracing back the attacker. Finally, we generate a report to trace back the RDP-based ransomware attacker. We refer readers to Sections 4 and 5 for the detailed implementation of this prototype.
3.1. Deception Environment. Generally, the ransomware attack execution stage has two steps: login and spread [31]. Building a deception environment is nontrivial in practice because it must make the ransomware attacker believe that it belongs to a real user and that the user data is worth attacking. Because advanced attackers often check for static features of certain analysis systems before they launch attacks [32], an intuitive approach to address such reconnaissance is to build the user environment in such a way that the user data is valid, real, and nondeterministic. In addition, the environment serves as an "enticing target" to encourage ransomware attackers. We elaborate on how to generate an artificial, realistic, and enticing user environment for RDP-based ransomware in Section 4.
RDP-based attackers commonly upload malicious programs in the following ways before spreading ransomware: (1) the attacker downloads malicious programs from the Internet; (2) programs are transferred through FTP, SCP, or other transport protocols; (3) programs are uploaded through the clipboard; (4) programs are uploaded through a shared folder. The clipboard and shared folders are most commonly used to transfer programs in RDP-based ransomware attacks because they are simple and convenient. However, both are easy for our proposed system to monitor.
3.2. Environment Monitor. After logging in to the environment, the attacker typically uses a shared folder and the clipboard on the remote PC to transfer ransomware programs from the attacker's machine. To collect as much information about the attacker as possible while avoiding the attacker's observation, this paper proposes three monitor layers: the network layer, the host layer, and the file layer. We elaborate on how to configure the monitor system for the deception environment in Section 4.
The Network Layer Monitor. The network layer monitor detects a remote connection and collects information including the remote IP addresses, remote ports, status codes of ports, keyboard layout, and so on. When the RDP-based attacker logs in to the host, the monitor can obtain this information and detect the attack without the attacker's knowledge.
The Host Layer Monitor. We propose to detect changes in, for example, processes and the clipboard by monitoring the host layer. The host layer monitor can gather information about the attacker's behavior and use of these system applications in the deception environment. For instance, as the clipboard resides in the system-level heap space, any application in the system has access to it. RDP-based ransomware attackers often take advantage of the clipboard to pass data between applications. Moreover, because the Windows system shares the clipboard by default during an RDP session, the monitor might obtain clues left by attackers who use the clipboard on their local machine.
The File Layer Monitor. By monitoring the file layer, we can identify ransomware attacks by file changes. Furthermore, the monitor can gather local traceable information by monitoring files in the shared folder, because a shared folder on the remote PC is commonly used to transfer ransomware from the attacker's machine during the RDP session. In addition, for a more convenient and quick attack, the attacker often mounts an entire local disk to the remote computer. As a result, by monitoring shared folders, we can detect newly added shared folders in real time and capture a large amount of path information automatically.
3.3. Clue Extraction. Through the environment monitors, we can gather a lot of information left by attackers, such as login information, communication information, clipboard content, folder paths, and Portable Executable (PE) files. Many traceable clues can be extracted from it, including but not restricted to IP address, keyboard layout, compile path, and file path. In order to analyze these clues quickly, we divide them into two categories: string clues and path clues. These clues are then submitted to the automatic analysis system. We elaborate on the types of clues that the proposed system can extract in Section 5.1.
3.4. Automatic Analysis. According to our investigation, current traceback tools mostly rely on manual clue analysis. However, we usually have to deal with a large amount of clues with no semantic correlation. Because such manual traceback analysis usually takes a lot of time and effort, we propose an automatic analysis system; we elaborate on how to analyze clues automatically in Section 5.2.
4. Implementation of the Deception Environment
As the Windows platform is the main target of ransomware, we choose Windows for the proof-of-concept implementation. In this section, we describe the implementation details of a Windows-based deception environment prototype: how the deception environment traps ransomware attackers and how the monitors detect the RDP-based attack and collect traceable clues. The entire system implementation process is shown in Figure 2.
4.1. At the Network Layer
4.1.1. The Login Monitor. The login monitor is used to detect attacks in real time and collect the attacker's login information. On the Windows platform, Win32 is an environment subsystem that provides an API for operating system services and functions to control all user inputs and outputs. The login monitor relies on Windows APIs to gain access to the system and runs with privileges to access its own areas of memory. It uses Winsock 2.0 to access the network; Winsock 2.0 also supports protocols other than the TCP/IP suite. The login monitor takes network requests and sends those requests to the Winsock 2.0 SPI (Service Provider Interface) by calling the main Winsock 2.0 file, Ws2_32.dll, which provides access to transport service providers and namespace providers. The IP Helper API makes it possible to get and modify network configuration settings for the local host. It consists of the DLL file iphlpapi.dll and includes functions that can retrieve information about protocols such as TCP and UDP [33]. As a result, the login monitor can directly access the data buffers involved in transmission control protocols, routing tables, network interfaces, and network protocol statistics.
Figure 2: RDP-based ransomware attack traceback system process. (1) Construct deception environment (cyber deception environment). (2) Environment monitor: network layer (login monitor, communication monitor), host layer (clipboard monitor, process monitor), file layer (shared folder monitor, file monitor), with ransomware detection. (3) Clue extraction: login clues, remote host clues, clipboard clues, compile clues, path clues. (4) Automated analysis: users and programs analysis, network address analysis, language identification, traceable strings identification, auxiliary traceable clues. (5) Result.
4.1.2. The Communication Monitor. The communication monitor is responsible for capturing network traffic. By locating the TCP packets in the network traffic that contain the interactive configuration information of an RDP login, the RDP connection information can be obtained as the attacker's personal information. The target packet has the following main characteristics: (1) it is usually named ClientData; (2) it usually appears right after the TCP three-way handshake; (3) it is located near the front of the data stream; (4) it carries a significantly larger amount of data.
4.2. At the Host Layer
4.2.1. The Deception Host. According to our observation, the main methods used by attackers to log in to a remote host are (1) direct login with a weak password and (2) login with an access account added through a vulnerability. As a result, we deliberately set weak passwords for the administrator accounts of the deception environment and leave common vulnerabilities, such as EternalBlue, in the environment to attract attackers. To guide attackers to upload ransomware using only the clipboard and shared folders, we block traffic transfers from outside the environment and close common transfer ports (e.g., ports 20, 21, 80, and 443).
4.2.2. The Clipboard Monitor. The clipboard monitor obtains clues in real time by monitoring changes to the clipboard. It registers as a clipboard viewer to listen for clipboard change messages without affecting the clipboard contents. A clipboard viewer is a mechanism that can get and display the contents of the clipboard. As Windows applications are message-driven, the key to the monitor is responding to and processing clipboard change messages. When the content changes, the WM_DRAWCLIPBOARD message is triggered and sent to the first window of the clipboard viewer chain. After each clipboard viewer window responds to and processes the message, it must pass the message to the next window in the chain, using the handle of the next window that it saved. The clipboard monitor can obtain the clipboard's new contents by calling the "GetClipboardData" Windows API from its window. When a subsequent copy or cut operation is executed, the data in the clipboard are rewritten. The clipboard monitor therefore listens in real time and writes the information to a log file, which is updated whenever the monitor receives a clipboard change notice. To prevent the log file from being detected by the attacker or encrypted by ransomware, the monitor sends each update to a secure host and completely erases it from the environment.
4.2.3. The Process Monitor. To run a program on a Windows system, a new process must be created. The monitor obtains the PE file run by the attacker by monitoring the processes in the environment. It first records the state of the processes commonly running in the deception environment before an RDP-based ransomware attack. After the login monitor detects such an attack, the process monitor watches for newly created processes: the Windows API "CreateToolhelp32Snapshot" takes status snapshots of all processes in real time, including the process identifier (PID). When a suspicious process has started, the monitor recognizes it by the PID and looks up the process's running path with the help of "GetModuleFileNameEx". Finally, the monitor finds the suspicious program's PE file through the path and copies it to the secure host.
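The baseline-versus-current comparison described above can be sketched in a few lines. This is only an illustration under stated assumptions: a snapshot is modeled as a plain mapping from PID to executable path, and the names `Snapshot` and `new_processes` are hypothetical. How a snapshot is actually captured (the text names CreateToolhelp32Snapshot and GetModuleFileNameEx) is left out.

```python
from typing import Dict, List, Tuple

# Assumption: a snapshot maps PID -> executable path. Capturing it is
# platform-specific and not shown here.
Snapshot = Dict[int, str]


def new_processes(baseline: Snapshot, current: Snapshot) -> List[Tuple[int, str]]:
    """Return (pid, path) pairs that were not in the baseline, sorted by PID.

    A PID that exists in both snapshots but now points to a different
    executable is also reported, since PIDs can be reused.
    """
    return sorted(
        (pid, path)
        for pid, path in current.items()
        if baseline.get(pid) != path
    )
```

Each reported path would then be the candidate PE file to copy to the secure host.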
4.3. At the File Layer
4.3.1. Deception Files. Deception files are constructed with two goals. First, we need to make the attacker believe that the deception environment is a real user's host. Second, we need to make the attacker believe that there are resources in the environment that are worth attacking.
To simulate a realistic environment, we deploy large numbers of different types of files on it, for example, images, audio files, database files, and documents that can be accessed in a Windows session. Based on Amin Kharraz's research [1], we created four file categories that ransomware always tries to find and encrypt: documents (*.txt, *.doc(x), *.ppt(x), *.xls(x), *.pdf, and *.py), keys and licenses (*.key, *.pem, *.crt, and *.cer), file archives (*.zip, *.rar), and media (*.jp(e)g, *.mp3, and *.avi). We obtain these files in three ways. First, we create files with valid headers and content using standard libraries (e.g., python-docx, python-pptx, pdfkit, and OpenSSL). Second, using Google search syntax and crawler technology, we download a large number of files from the Internet. Third, we collect a number of nonconfidential documents from the hosts of 20 volunteers to emulate actual user environments. When we assign user files to the deception environment, the path length is generated randomly. Each folder may have a random set of subfolders. For each folder, a subset of extensions is randomly selected. Furthermore, each directory name is generated from meaningful words. Consequently, we generate paths and extensions for user files, giving them variable file depth and meaningful content.
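The randomized layout described above (random path depth, directory names drawn from meaningful words, and a random subset of extensions per folder) might be planned as in the following sketch. It only produces relative path strings and does not create any files. The word list, the function name `plan_decoy_paths`, and the parameter defaults are assumptions made for illustration; they are not taken from the prototype.

```python
import random
from pathlib import PurePosixPath
from typing import Dict, List, Optional

# Extensions follow the four categories listed above.
EXTENSIONS = [".txt", ".docx", ".pptx", ".xlsx", ".pdf", ".py",
              ".key", ".pem", ".crt", ".cer", ".zip", ".rar",
              ".jpg", ".mp3", ".avi"]
# Hypothetical vocabulary for directory and file names.
WORDS = ["finance", "reports", "projects", "backup",
         "contracts", "photos", "invoices", "archive"]


def plan_decoy_paths(n_files: int, max_depth: int = 4,
                     seed: Optional[int] = None) -> List[str]:
    """Plan relative paths for decoy files with random depth and extensions."""
    rng = random.Random(seed)
    folder_exts: Dict[PurePosixPath, List[str]] = {}
    paths = []
    for i in range(n_files):
        depth = rng.randint(1, max_depth)               # random path length
        folder = PurePosixPath(*(rng.choice(WORDS) for _ in range(depth)))
        if folder not in folder_exts:                   # per-folder extension subset
            folder_exts[folder] = rng.sample(EXTENSIONS, k=rng.randint(2, 5))
        name = f"{rng.choice(WORDS)}_{i}{rng.choice(folder_exts[folder])}"
        paths.append(str(folder / name))
    return paths
```

Passing a fixed `seed` makes a layout reproducible, which can help when the same decoy tree has to be rebuilt after an attack.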
To make the simulated environment more valuable, we deploy bait information on it, such as databases, fake code comments, digital certificates, administrator passwords, SSH keys, VPN keys, browser history and passwords, ARP records, and DNS records. When an attacker obtains the bait information, it may entice him to attack the deception environment.
4.3.2. The File Monitor. The file monitor detects ransomware by monitoring file type changes and file entropy changes, a method proposed in 2016 [21]. The type of data stored in a file can be described by the order and position of specific byte values unique to that file type. Since files generally retain their file types and formatting in the deception environment, the bulk modification of such files should be considered suspicious. When the monitor sees this kind of change, we can infer that a ransomware attack has occurred.
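One simple way to approximate the file type check is to compare the leading "magic" bytes of a file before and after a write. The sketch below is a simplification and not the detector of [21]: the signature table is a small illustrative subset, and the names `sniff_type` and `type_changed` are hypothetical.

```python
from typing import Optional

# A few well-known file signatures; illustrative, not exhaustive.
MAGIC_SIGNATURES = {
    b"%PDF-": "pdf",
    b"PK\x03\x04": "zip",        # also the container of .docx/.pptx/.xlsx
    b"\xff\xd8\xff": "jpeg",
    b"Rar!\x1a\x07": "rar",
}


def sniff_type(header: bytes) -> Optional[str]:
    """Return a type name if the leading bytes match a known signature."""
    for magic, name in MAGIC_SIGNATURES.items():
        if header.startswith(magic):
            return name
    return None


def type_changed(before: bytes, after: bytes) -> bool:
    """True if the file had a recognised type that no longer matches."""
    old = sniff_type(before)
    return old is not None and sniff_type(after) != old
```

Files whose original type is not recognised are ignored here, so this check alone would miss plain-text files.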
Entropy expresses the randomness of the characters in a string: the higher the entropy value, the stronger the randomness. The Shannon entropy of an array of bytes can be computed as the sum
$$e = \sum_{i=0}^{255} P_{B_i} \log_2 \frac{1}{P_{B_i}} \tag{1}$$
for $P_{B_i} = F_i / \mathit{totalbytes}$, where $F_i$ is the number of instances of byte value $i$ in the array. The entropy value ranges from 0 to 8; a value of 8 means that the byte values in the array are distributed completely uniformly. Since each byte value occurs with essentially the same probability in encrypted ciphertext, the entropy of an encrypted file will be close to the upper limit. Because ransomware always encrypts a large number of files, when we detect that a file changes to a high-entropy file within a short period of time and its file type also changes, we assume that the file is subject to a ransomware attack.
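Equation (1) translates directly into code. The following is a minimal sketch; the threshold constant is an assumption chosen for illustration, since the text does not give a numeric cut-off for "high entropy".

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # F_i: occurrences of each byte value i
    # e = sum_i P_i * log2(1 / P_i) with P_i = F_i / total;
    # byte values that never occur contribute nothing to the sum.
    return sum((f / total) * math.log2(total / f) for f in counts.values())


# Hypothetical cut-off for "close to the upper limit".
HIGH_ENTROPY_THRESHOLD = 7.5


def looks_encrypted(data: bytes) -> bool:
    """True if the entropy of the data is at or above the assumed threshold."""
    return shannon_entropy(data) >= HIGH_ENTROPY_THRESHOLD
```

Note that compressed formats such as ZIP or JPEG are also close to 8 bits per byte, which is one reason to combine the entropy check with the file type check instead of using it alone.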
4.3.3. The Shared Folder Monitor. By traversing the disk storage in real time, the shared folder monitor discovers updates to the shared folders in real time. It obtains the attacker's local file information, which the attacker often does not notice, thus revealing some unexpected traceable clues. The monitor can access a list of paths in the attacker's shared folders.
As we observed, folders shared through Remote Desktop usually appear on the remote host under a path with the prefix "\\tsclient". When the monitor's traversal reaches this prefix, it uses "FindFirstFile" to find the first file and then uses "FindNextFile" with the returned handle to find the next file. When the result is a folder, it continues to traverse all files under that folder. Initially, the monitor tried to get the full file names and file contents by traversing the new shared folder. However, during actual experimentation, we found that as the number of files in storage grows, the monitor takes far more time and resources to get all the file contents than just the file paths, and such a full traversal is more likely to alert the attacker. Therefore, the monitor only obtains the paths of the files that the attacker shares from his host, with the help of "GetFileName". Moreover, in order to prevent encryption by the ransomware, the shared folder monitor directly transfers the acquired list of shared file paths to another secure host.
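The path-only traversal can be illustrated with the standard library instead of the FindFirstFile/FindNextFile calls named above. This is a sketch under that substitution; `list_shared_paths` is a hypothetical name, and on the deception host the root would be a redirected drive such as a path under \\tsclient.

```python
import os
from typing import List


def list_shared_paths(root: str) -> List[str]:
    """Collect the paths of all files under `root` without opening any file."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.append(os.path.join(dirpath, name))
    return sorted(found)
```

Because only directory entries are read, the cost grows with the number of files and not with their sizes, which matches the reason given above for collecting paths only.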
5. Clue Extraction and Analysis
Through the deception environment, we trap the ransomware attacker and collect a lot of information that may contain many traceable clues. However, such traceable clues are often not visually observable and are complex in nature. In addition, much of the collected information is not helpful in tracing back ransomware attackers. Therefore, to assist in the analysis of the monitor information and extract the effective traceable clues, in this section we describe how to extract clues and how to analyze them automatically after extraction.
5.1. Clue Extraction. We mainly obtain the following kinds of clues from the extraction: remote login information (IP addresses), network traffic, clipboard contents (pictures and texts), shared folder information (path strings), and ransomware samples (compile time and compile paths). The shared folder path clues can be obtained directly from the monitor. However, clipboard clues, compile clues, and remote host clues are often not visually observable and are complex in nature. As a result, the extraction module mainly focuses on the extraction of remote host clues, clipboard string clues, and compilation clues.
Remote Host Clue Extraction. The IP address, port numbers, and folder path clues can be directly obtained from the login information and folder paths. In a captured network communication PCAP file, the TCP packet that carries the client's configuration information is usually named "ClientData". We extract the client name field from the "ClientData" packet to obtain the attacker's hostname. In addition, the KeyboardLayout field indicates the default keyboard layout of the attacking host; e.g., the Chinese Simplified layout number is 0x0004, and the American English keyboard layout number is 0x0409. The remote user's idiomatic language (the mother tongue) can be found from the keyboard layout, which helps to infer the attacker's nationality.
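As an illustration of the field extraction above, the following sketch parses the client name and keyboard layout from a Client Core Data block. The byte layout is an assumption based on the public RDP specification (header, version, desktop size, color depth, SAS sequence, keyboardLayout, clientBuild, then a 32-byte UTF-16LE clientName); the function name is hypothetical and the sample block is synthetic, not captured traffic.

```python
import struct

# Assumed little-endian layout of the block up to and including clientBuild:
# type, length, version, width, height, colorDepth, SASSequence,
# keyboardLayout, clientBuild (24 bytes in total).
CORE_PREFIX = struct.Struct("<HHIHHHHII")
CS_CORE = 0xC001        # assumed block type of Client Core Data
CLIENT_NAME_BYTES = 32  # assumed size of the clientName field

def parse_client_core(block):
    """Return (client_name, keyboard_layout) from a Client Core Data block."""
    fields = CORE_PREFIX.unpack_from(block, 0)
    block_type, keyboard_layout = fields[0], fields[7]
    if block_type != CS_CORE:
        raise ValueError("not a Client Core Data block")
    start = CORE_PREFIX.size
    raw_name = block[start:start + CLIENT_NAME_BYTES]
    # The name is null-padded UTF-16LE; keep the part before the first null.
    client_name = raw_name.decode("utf-16-le").split("\x00", 1)[0]
    return client_name, keyboard_layout

# Synthetic example block (all values are made up for the demonstration).
name_field = "DELL-PC".encode("utf-16-le").ljust(CLIENT_NAME_BYTES, b"\x00")
sample = CORE_PREFIX.pack(CS_CORE, 56, 0x00080004, 1024, 768,
                          0xCA01, 0xAA03, 0x0409, 2600) + name_field
name, layout = parse_client_core(sample)
print(name, hex(layout))
```

Real blocks carry further fields after the client name and a length that should be validated; the sketch ignores both.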
Clipboard Clue Extraction. The main data formats available to the clipboard monitor are Windows Bitmap, GDI file, ANSI characters, Unicode characters, and WAV audio data. We mainly aim at extracting traceable clues of the character types. The extraction module extracts character clues from the clipboard in various formats by checking the "DataType" value of the GetClipboardData API.
Compilation Clue Extraction. For all Windows RDP-based ransomware samples that we examined, we empirically observed that the most commonly used format is the PE file, especially *.exe and *.dll. Some PE files contain compilation information, and this information does not change when the program is moved between hosts. As a result, it is a good way to obtain the creator's information. A PE file mainly consists of five major components: DOS MZ header, DOS stub, PE header, section headers, and section content. Each component contains a great deal of information. However, only a small part of it can be used to identify the creator, and some identification information needs to be extracted from the content of each section. In this paper, the clues that can be extracted from PE files to trace back the attacker include file name, PE file type, compiler version, compilation path, compile time, last modified time, last open time, IP address, URL, domain name, language, string, wide character, and so on. We extract most of the clues with PEView [34]. However, since it cannot directly obtain the compilation path, we use the pefile [35] tool to extract the path by locating it in the PE structure.
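To make the compilation path clue concrete, the sketch below recovers a PDB path by scanning raw bytes for a CodeView "RSDS" record. This is a simplified heuristic, not the pefile-based extraction used in the paper: it assumes the record layout of signature, 16-byte GUID, 4-byte age, and a null-terminated path, and it does not walk the PE debug directory. The function name and the sample bytes are hypothetical.

```python
import re

# "RSDS" + 16-byte GUID + 4-byte age + null-terminated PDB path (assumed layout).
RSDS_RECORD = re.compile(rb"RSDS.{20}([^\x00]+)\x00", re.DOTALL)

def find_pdb_paths(data):
    """Return every PDB path found in a CodeView RSDS record inside data."""
    return [match.group(1).decode("utf-8", errors="replace")
            for match in RSDS_RECORD.finditer(data)]

# Synthetic bytes standing in for a PE file (not a real sample).
fake_pe = (b"MZ" + b"\x00" * 64
           + b"RSDS" + bytes(16) + (1).to_bytes(4, "little")
           + b"C:\\build\\Release\\sample.pdb\x00"
           + b"\x00" * 8)
print(find_pdb_paths(fake_pe))
```

A byte scan can produce false positives on packed or unusual files, so a structure-aware parser remains the more reliable choice.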
Since the extracted clues come in different encoding formats, to facilitate observation and to unify subsequent analysis, the extraction system converts all of the obtained data into the UTF-8 encoding and saves it in an SQLite database. Before being submitted to the analysis system, the extracted clues are divided into two categories: string clues and path clues.
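The normalization and storage step can be sketched as follows. The table name, column names, and helper names are hypothetical, and the source encoding of each clue is assumed to be known at extraction time; SQLite stores text as UTF-8 by default.

```python
import sqlite3

def to_text(raw, encoding):
    """Decode raw clue bytes from their source encoding into a Python string."""
    return raw.decode(encoding, errors="replace")

def store_clues(conn, clues):
    """Store (category, raw_bytes, source_encoding) clues as text rows."""
    conn.execute("CREATE TABLE IF NOT EXISTS clues (category TEXT, value TEXT)")
    conn.executemany(
        "INSERT INTO clues (category, value) VALUES (?, ?)",
        [(category, to_text(raw, encoding)) for category, raw, encoding in clues],
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
store_clues(conn, [
    ("path", "C:\\Users\\Dell".encode("utf-16-le"), "utf-16-le"),
    ("string", "勒索".encode("gbk"), "gbk"),
])
rows = conn.execute("SELECT category, value FROM clues ORDER BY rowid").fetchall()
print(rows)
```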
5.2. Automatic Analysis. At this point, the clues extracted by the system mainly include string clues and path clues. The number of string clues is large, and they are mixed with a large number of unidentifiable strings. Because the number of path clues is also very large and the paths have no semantic correlation, it is difficult to identify traceable clues manually. We therefore focus on how to automate the identification of traceable clues for path clues.
5.2.1. Users and Programs Analysis. In the analysis of the path clues, we first propose to obtain attacker clues by identifying context-related segments within the same path. For instance, each user has a separate user folder located in the "Users" folder under drive C. As a result, the system can obtain the user name on the attacking host from the folder name under the "Users" folder (e.g., C:\Users\Dell). The "Program Files" folder usually contains the names of the software programs installed on the machine (e.g., C:\Program Files\Microsoft Visual Studio 11.0). What is more, the QQ account number is always located in the "QQ\QQfile" folder (e.g., D:\QQ\QQfile\86******086\FileRecv).
In this way, the analysis system can quickly and accurately obtain host names, email accounts, program names, social software account numbers, and other traceable clues that are carried in the attacker's file paths. However, such user information accounts for only a small share of the overall clues and is acquired by chance. Therefore, we conduct further analysis on this basis.
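The context-based extraction of user and program names can be sketched with two regular expressions. The patterns and the function name are illustrative assumptions rather than the system's actual rules, and the sample paths are made up.

```python
import re

# The folder directly under "Users" is taken as a user name; the folder
# directly under "Program Files" is taken as a program or vendor name.
USER_RE = re.compile(r"^[A-Za-z]:\\Users\\([^\\]+)", re.IGNORECASE)
PROGRAM_RE = re.compile(r"^[A-Za-z]:\\Program Files(?: \(x86\))?\\([^\\]+)",
                        re.IGNORECASE)

def extract_path_clues(paths):
    """Return (sorted user names, sorted program names) found in the paths."""
    users, programs = set(), set()
    for path in paths:
        user = USER_RE.match(path)
        if user:
            users.add(user.group(1))
        program = PROGRAM_RE.match(path)
        if program:
            programs.add(program.group(1))
    return sorted(users), sorted(programs)

sample_paths = [
    r"C:\Users\Dell\Desktop\notes.txt",
    r"C:\Program Files\Microsoft Visual Studio 11.0\Common7\IDE\devenv.exe",
    r"C:\Program Files (x86)\Tencent\QQ\Bin\QQ.exe",
]
print(extract_path_clues(sample_paths))
```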
5.2.2. Account Analysis. Through the analysis of APT1 and some other attribution reports, we find that the mapping between the attacker and a physical-world identity can be better obtained by analyzing the account numbers left by the attacker. This information includes, but is not limited to, the location of the IP address, the spelling and registration of the domain name, the IP address corresponding to a URL, and the domain name of the mailbox account. Because it is difficult to identify this information effectively in a large number of string and path clues, the analysis system automatically identifies IP addresses, domain names, URLs, and mailbox accounts by regular expression matching. Then, with the help of threat intelligence and big data technology, more relevant clues are obtained.
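A minimal sketch of the regular expression matching step is given below. The patterns are simplified assumptions written for illustration (bare domain names are omitted because a simple pattern for them would also match file names such as "report.doc"), and the sample text uses reserved example addresses.

```python
import re

OCTET = r"(?:25[0-5]|2[0-4]\d|1?\d?\d)"
PATTERNS = {
    "ip": re.compile(r"\b(?:" + OCTET + r"\.){3}" + OCTET + r"\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "url": re.compile(r"\bhttps?://[^\s\"'<>]+"),
}

def find_accounts(text):
    """Return every IP address, e-mail address, and URL found in text."""
    return {kind: pattern.findall(text) for kind, pattern in PATTERNS.items()}

sample_text = "mail alice@example.org from 192.0.2.10 via http://example.com/a"
print(find_accounts(sample_text))
```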
5.2.3. Language Identification. User languages often help to determine an attacker's idiomatic language. However, because of the large number of languages in different countries and the high similarity of some languages, we use an automatic analysis system to identify the language of clues. We tested the accuracy of two language identification toolkits using entire path information in four different languages and found the "langid.py" toolkit to be more accurate overall than the "langdetect" toolkit. The comparison results are shown in Figure 3. langid.py is a language identification toolkit developed by Lui and Baldwin at the University of Melbourne [36]. It combines a naive Bayes classifier with cross-domain feature selection to provide domain-independent language identification.
5.2.4. Traceable Strings Identification. When traceable strings need to be identified, traditional string analysis methods usually use Named Entity Recognition (NER). However, the clues to be analyzed mainly consist of strings and paths. The strings before and after a path separator have few semantic correlations. What is more, the strings between path separators and the remaining strings to be analyzed are mostly semantically unrelated due to their limited length. Therefore, this paper proposes an algorithm that can quickly and automatically identify the traceable strings in strings and paths.
In order to filter out meaningful traceable clues related to the attacker's identity, the path clues and string clues are split into strings and then screened for common words and gibberish in the following steps. Figure 4 shows the process of the automatic traceable clue identification system.
Figure 3: Accuracy of two language identification tools. In tests using the same path clues of known language, the accuracy of the langid toolkit for English, Chinese, Russian, and Korean was 88%, 100%, 99%, and 99.0%, respectively, while the accuracy of the langdetect toolkit was 73%, 0.85%, 40%, and 4.60%, respectively. [Bar chart over ENGLISH, KOREAN, RUSSIAN, and CHINESE showing correct and incorrect recognition counts for the langid and langdetect toolkits.]

Figure 4: Process of automatically identifying traceable clues. [Flowchart components: Path Clues, String Clues, Path Split, Stop Words, Sentence Tokenize, Word Tokenize, Sign Split, Alphabetic Case Characteristics, Strings, Common Words Detect, Gibberish Detect, Traceable Strings Identification, Result.]

Make Stop Words. The system splits each path by the path delimiter. Separated path strings that are common to multiple computers have no identifying effect, so we take the file and folder name strings that are common to 20 normal user computers as stop words. The system then removes the stop words from the strings after each split.
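The stop word step can be sketched as follows. The helper names, the `min_hosts` parameter, and the two-host reference set are hypothetical; the paper builds its list from 20 normal user computers.

```python
from collections import Counter

def split_path(path):
    """Split a Windows-style path into its non-empty segments."""
    return [segment for segment in path.replace("/", "\\").split("\\") if segment]

def build_stop_words(reference_hosts, min_hosts):
    """Return the segments that occur on at least min_hosts reference hosts."""
    host_counts = Counter()
    for paths in reference_hosts:
        seen_on_host = set()
        for path in paths:
            seen_on_host.update(segment.lower() for segment in split_path(path))
        host_counts.update(seen_on_host)  # each host counts a segment once
    return {segment for segment, n in host_counts.items() if n >= min_hosts}

def remove_stop_words(path, stop_words):
    """Return the segments of path that are not stop words."""
    return [s for s in split_path(path) if s.lower() not in stop_words]

host_a = [r"C:\Users\alice\Documents\a.txt"]
host_b = [r"C:\Users\bob\Documents\b.txt"]
stop_words = build_stop_words([host_a, host_b], min_hosts=2)
print(remove_stop_words(r"C:\Users\Dell\Documents\freebuf\plan.txt", stop_words))
```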
Segmentation. After removing the stop words, we separate sentences and words for each segment of the path with a tokenizer (e.g., NLTK, NER) [37]. However, as the segments of a path are not semantically related, this segmentation is less effective. Based on our extensive study of path information, users often prefer to use symbols such as underscores to split file names. In addition, they also like to use capitalization in multiword strings to facilitate reading. As a result, the system uses the common separator symbols in file names and the alphabetic case characteristics to segment the strings again.
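The second segmentation pass, by separator symbols and by letter case, can be sketched with two regular expressions. This is an illustrative assumption about the rule, not the system's exact implementation; in particular, the sketch treats every non-ASCII character as a separator.

```python
import re

def segment(name):
    """Split a name on symbols, then on lower-to-upper case boundaries."""
    tokens = []
    for part in re.split(r"[^A-Za-z0-9]+", name):
        # Acronym runs (e.g. "RDP"), capitalized or lower-case words, digit runs.
        tokens.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return tokens

print(segment("SogouInput_setup"))
print(segment("VisualStudio2012"))
```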
Common Words Removal. After segmentation, the system obtains many repeated strings. However, most of them are commonly used words and do not help in identifying ransomware attackers. As a result, we filter out the common words by comparing each string with a dictionary. If the string can be matched in the dictionary, the system marks the common word property of the string as false; otherwise, it marks it as true.
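The dictionary comparison can be sketched as below. The tiny word set is a hypothetical stand-in for a full dictionary, and the function name is illustrative; the tag follows the convention above, where a dictionary match yields False.

```python
def tag_common_words(tokens, dictionary):
    """Map each token to True if it is NOT a dictionary word, else False."""
    return {token: token.lower() not in dictionary for token in tokens}

# Hypothetical miniature dictionary; a real run would load a full word list.
dictionary = {"program", "file", "desktop", "release"}
tags = tag_common_words(["program", "gandcrab"], dictionary)
print(tags)
```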
Gibberish Detect. Based on our analysis and observation, we found that there is a large amount of garbled information in path clues. Most of these strings are generated randomly; people can recognize this randomness manually, but it is difficult for a program to recognize it automatically. As a result, we propose detecting gibberish by training a Markov chain model with English texts. For each string X, the probability of the (i+1)th character in X depends only on the ith character:

\[ P(X_{i+1} = x \mid X_1 = x_1, X_2 = x_2, \ldots, X_i = x_i) = P(X_{i+1} = x \mid X_i = x_i) \tag{2} \]
Each letter is related to the letters immediately before and after it, and this relationship can be expressed by 2-grams. For example, the string "Ransomware" yields the 2-grams Ra, an, ns, ..., e[space]. The analysis system records how often characters appear next to each other in a collection of gibberish and some commonly used normal phrases or vocabulary. It normalizes the counts after reading through the training data and then measures the probability of generating a string by multiplying the probabilities of the pairs of adjacent characters in the string [38]. From the training data, the system also computes the average transition probability for gibberish and for normal strings, respectively. This probability measures how likely the string is according to the data observed by the model. When we test the string "Ransom", it computes
\[ \mathit{prob}[\text{'Ransom'}] = \mathit{prob}[\text{'R'}][\text{'a'}] \times \mathit{prob}[\text{'a'}][\text{'n'}] \times \cdots \times \mathit{prob}[\text{'m'}][\text{' '}] \tag{3} \]
If the input string is gibberish, it will pass through some pairs with very low counts in the training phase and hence have a low probability. The system then looks at the average probability per character for a few known normal strings and a few known gibberish strings and picks a threshold between the highest probability among the gibberish strings and the lowest probability among the normal strings:
\[ \mathit{threshold} = \frac{\min(\mathit{normal\_probs}) + \max(\mathit{gibberish\_probs})}{2} \tag{4} \]
When we analyze a string 'X', if prob['X'] > threshold, the analysis system views the string as a normal string. If prob['X'] ≤ threshold, the analysis system marks the string as 'False'. The system then removes the strings of the 'False' type as gibberish.
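The gibberish detection steps above can be sketched as a small bigram model. This is a minimal sketch under stated assumptions: the add-one smoothing, the 27-symbol alphabet, the use of the geometric mean of transition probabilities as the per-character score, and the three training sentences are illustrative choices (in the spirit of [38]), not the paper's trained model.

```python
import math

ALPHABET = "abcdefghijklmnopqrstuvwxyz "

def bigrams(text):
    """Return adjacent character pairs, keeping only letters and spaces."""
    chars = [c for c in text.lower() if c in ALPHABET]
    return list(zip(chars, chars[1:]))

def train(lines, smoothing=1.0):
    """Count character transitions and normalize each row to log probabilities."""
    counts = {a: {b: smoothing for b in ALPHABET} for a in ALPHABET}
    for line in lines:
        for a, b in bigrams(line):
            counts[a][b] += 1
    model = {}
    for a, row in counts.items():
        total = sum(row.values())
        model[a] = {b: math.log(n / total) for b, n in row.items()}
    return model

def avg_transition_prob(text, model):
    """Geometric mean of the transition probabilities along the string."""
    pairs = bigrams(text)
    if not pairs:
        return 0.0
    return math.exp(sum(model[a][b] for a, b in pairs) / len(pairs))

def pick_threshold(normal, gibberish, model):
    """Midpoint between the weakest normal and the strongest gibberish score."""
    normal_probs = [avg_transition_prob(s, model) for s in normal]
    gibberish_probs = [avg_transition_prob(s, model) for s in gibberish]
    return (min(normal_probs) + max(gibberish_probs)) / 2

def is_normal(text, model, threshold):
    """True for strings scored above the threshold, False for gibberish."""
    return avg_transition_prob(text, model) > threshold

# Tiny illustrative corpus; a real model needs far more English text.
corpus = [
    "the attacker used remote desktop to spread ransomware",
    "the monitor records the shared folder paths",
    "security researchers analyze the ransomware samples",
]
model = train(corpus)
threshold = pick_threshold(["ransomware"], ["vwrtjty"], model)
print(is_normal("ransomware", model, threshold), is_normal("vwrtjty", model, threshold))
```

Averaging in log space avoids numerical underflow on long strings and keeps scores comparable across strings of different lengths, which a raw product of probabilities would not.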
Table 1: Traceable strings recognized result [13].

| String | Common Words Recognized | Gibberish Recognized | Final Recognized |
| --- | --- | --- | --- |
| Wh****ter | True | True | True |
| gandcrab | True | True | True |
| kate z | True | True | True |
| vwrtjty | True | False | False |
| program | False | True | False |
Identification Result. After the above analysis, the automated analysis system outputs the strings for which both the common word tag and the gibberish tag are True; this is the list of traceable clues [13]. Table 1 is reproduced from Z. H. Wang et al. (2018) [under the Creative Commons Attribution License/public domain].
When the string is a normal word such as "program", the common words recognition marks it "False". The gibberish recognition marks as "True" only strings that are not randomly generated; it marks as "False" the strings that do not conform to the spelling patterns of common words (e.g., vwrtjty). A string is considered to have auxiliary traceability only if both analysis values are true. We rely extensively on string inversion to verify the accuracy of the system. We found that most of the traceable strings on the volunteers' storage can be recognized. Table 1 shows the recognition results of several typical strings.
6 Evaluation
In this paper, we evaluate the proposed method with two experiments. The goal of the first experiment is to demonstrate that the proposed method can help trace back RDP-based ransomware attackers and capture their private information. The goal of the second experiment is to demonstrate that the method can also automatically recognize clues in ransomware samples and help trace back ransomware makers.
6.1. RDP-Based Ransomware Attacker
6.1.1. Clue Capture and Analysis. We used Windows 7 virtual machines to deploy the deception environment and assess its effectiveness. While Windows 7 is not required, it was chosen because of its wide range of applications and because it is one of the main targets of ransomware. We invited 122 professional volunteers to help with the experiment and provided them with an experimental fleet of 12 virtual hosts. Most of the volunteers' login information, clipboard content, shared folder paths, and uploaded PE files were successfully captured by the monitor system.
We chose the path information collected from the computers of 12 volunteers and recorded the rate of convergence after each step of the analysis. Figure 5 shows how the number of remaining traceable clues changes as the data is processed by each step of the analysis system [13]. Figure 5 is reproduced from Z. H. Wang et al. (2018) [under the Creative Commons Attribution License/public domain].
Figure 5: The data volume changes in 12 volunteers' traceable clues through the analysis system. The screening rate can reach 98.34% [13]. [Line chart "Analysis Data Change Figure": data quantity versus analysis process step (source, stopword, tokenize, gibberish, result), one line per volunteer (volunteer1 to volunteer12).]
6.2. Traceback Attackers. Next, we carried out a detailed analysis of the information left over from one of the volunteers' attacks. From the data produced by the automatic analysis of this attacker, we can infer that the attacker's real identity may have the following characteristics: (1) the attacker is likely to come from China; (2) the attacker may be security or related personnel; (3) the attacker may work in the Chinese Academy of Sciences and have relations with a large security manufacturer in China. The conclusion is shown in Figure 6, which presents the results from the analysis system in five sections.
Username. Through automatic analysis, a total of three user names from the attacker's host were extracted from a large number of clues. From the "Dell" user name, it can be assumed that the attacker was using a Dell host.
Programs. The automatic analysis system identified several typical software programs installed on the attacker's host. For instance, "QQ" is a widely used real-time social tool in China; "AliPay" is an online payment tool developed by Alibaba and widely used in China; "Sogou input" is a Chinese input method software developed by the Chinese company Sogou; and MeiTu is a photo beautification software developed by a Chinese company and widely used in China. In addition, "Sinfor" and "360" are both well-known security manufacturers in China and have a large number of security-related products. "Visual Studio" is a common development software. We found that much Chinese-made software widely used in China was installed on the attacker's host, as well as some development and security products that are less often installed by ordinary users.

Figure 6: Analysis of an attacker using the automatic analysis system, displaying traceable information about the attacker in five aspects: host user name, program software, keyboard layout, account information, and traceable strings. [Panels: UserName, KeyboardLayout, Programs, Account, TraceableStrings.]
Keyboard Layout. Through network communication analysis, we find that the attacker's keyboard layout is Simplified Chinese, from which it can be inferred that the attacker's native language is probably Chinese. This is consistent with the features of the software installed on the host.
Account Number. Among the account numbers automatically identified by the system, we found two QQ accounts and two e-mail accounts. The e-mail accounts are Chinese Academy of Sciences business accounts, which can confirm the above assumption that the attacker is from China and is likely to work in the Chinese Academy of Sciences. Through the registration information of the QQ account (as shown in Figure 7), we can find the following information: (1) the attacker could be a male aged 32; (2) the attacker does Internet-related work, likely with 360 Security; (3) the attacker is located in Haidian District, Beijing, China. In the registration information of the "353***208" account, the mail, work, and other fields are empty, and the nickname is a special English string, "Wh***ter"; it can be presumed that this account is likely for private use.
Traceable Strings. The automatic analysis system sifts out strings from the monitor that may be traceable. "Freebuf" is a security information exchange website commonly used by Chinese security personnel. "Botnet" is a secondary attack method that is often used by attackers and is often taken by security researchers as a main research direction. "**e" and "**t" are acronyms for departments under the Chinese Academy of Sciences, while "A***team" is a security research team in the "**e" department. "Visumantrag" is often used when processing visas from various countries. "Wh***ter" matches the nickname of the QQ account "353***208" and is likely to be the attacker's regular nickname.

Figure 7: The registration infographic of QQ account 372***82.
6.3. Ransomware Maker Automatic Analysis. In order to verify that the analysis system is usable for tracing back the ransomware maker, we obtained more than 8000 ransomware samples from VirusTotal [39] and extracted the Program Data Base (PDB) path from the samples (as shown in Figure 8). The result shows that, despite the deliberate erasure of compilation information by a large number of attackers, legacy PDB path information can still be found in a large number of ransomware samples.

Table 2: Same identifier for different samples [13].
Columns: Sample Hash; Same Identity Strings.
68dd613973f8a***7befe0f4b461d258ce569b9020079***9ae0f090266a60cc
jqxxpphoaonkje7z2fa0c084aef41969***1f427a1b65e1b1464fa82d297e712***e1fdf312c044814a36da0286d42cd***306fba99239c8238821beaf40744be***ba862dbc6d90f70cbc839bcb88***7422bbd2fa812dd3de02391b6a9f1***0762a6f49cf32501

Figure 8: Of the 8075 real ransomware samples, 11% were able to extract the Program Data Base (PDB) path, while the remaining 89% of authors may have chosen to hide the PDB path during compilation. [Bar chart of sample counts: 878 ransomware samples with a PDB path and 7197 without a PDB path.]
What is more, the analysis system was used to automatically analyze the PDB paths of more than 800 different ransomware samples. We found that multiple samples share the same identity strings in their analysis results. One of the typical results is shown in Table 2 [13]. Table 2 is reproduced from Z. H. Wang et al. (2018) [under the Creative Commons Attribution License/public domain]. We used VirusTotal to validate the samples and found that they were all Cobra family ransomware. As a result, it could be assumed that these samples were made by the same ransomware maker. When one sample is traced back, the conclusion about the ransomware maker also applies to the other samples. When different traceable information is obtained from different samples, the identity of the ransomware maker can be described by integrating that information.
Language Identification. When we apply language recognition to the PDB paths of the 878 samples, the result is as shown in Figure 9. The system automatically identifies the languages of the paths, which mainly comprise 11 languages, of which English accounts for the largest proportion.
Figure 9: Of the 878 ransomware sample PDB paths, English accounted for 510 (58.09%), and the remaining languages were Chinese 31 (3.53%), Dutch 40 (4.56%), Norwegian Nynorsk 4 (0.46%), Slovenian 5 (0.57%), Norwegian Bokmal 1 (0.11%), German 58 (6.61%), Italian 3 (0.34%), Hungarian 35 (3.99%), Maltese 18 (2.05%), Polish 5 (0.57%), Finnish 151 (17.20%), Luxembourgish 3 (0.34%), Danish 10 (1.14%), and Spanish 4 (0.46%).

Actual Case Analysis. The automatic identification system helped us identify the traceable strings in the PDB path. Take as an example the PDB path of a sample with an MD5 value of "60c6a92afb***6d0c16f7" (the Chinese has been translated into English):
"ELIAOSTUDIOUACsimulated an improved version of the 360 anti-virus program 1117 Bale1*41**2****0(**1) 9818 programWriteSystem32ReleaseVirtualDesktoppdb"
We can infer from the automatic analysis result that the ransomware maker is likely from China. This is supported by the following points.
(1) Language Identification. Identifying the language of the path suggests that the attacker may have come from China. By reading the Chinese strings in the path, we find that the text describes an improvement made to bypass detection by the 360 security software, which has a large market in China.
(2) IP Address. The system identified an IP address in the PDB path. With the help of a threat intelligence database, we found that the IP address is located in Xinjiang, China (Figure 10).
Figure 10: Partial infographic of IP 1*41**2****0 in the threat intelligence library.
(3) Traceable Strings. The extracted identifier string "LIAO" resembles Chinese pinyin and is most likely a Chinese surname. To sum up, we speculate that the ransomware maker is most likely from China.
7 Conclusion
In this paper, we focus on tracing back RDP-based ransomware. Recently, more and more ransomware attackers have been using RDP attacks to spread ransomware with impunity due to the use of strong cryptography. We propose tracing back the attack sources to deter RDP-based ransomware with the proposed deception environment. It collects traceable clues and performs automatic analysis by using natural language processing and machine learning techniques. The evaluation shows that it is able to trap attackers and collect the traceable clues they leave in a deception environment. With automatic clue identification, it can reduce the amount of traceable clues to about 2% of the collected data. We use this method to analyze two practical cases and show its effectiveness. By tracing back ransomware attackers, we can provide a strong deterrent to stifle the development of ransomware.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Additional Points
This manuscript provides additional deception environment monitors and analytical methods and uses more volunteers and samples. It demonstrates the validity of the methods through specific traceback cases (i.e., tracing back the ransomware attacker and tracing back the ransomware maker).
Disclosure
An earlier version of this paper was presented at the IEEE Third International Conference on Data Science in Cyberspace, Guangzhou, China, 18-21 June 2018. The authors' initial conference paper focused mainly on the effectiveness of the deception environment monitors and the screening rate of the analysis system.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
The authors would like to thank VirusTotal for providing ransomware samples and the volunteers from the University of Chinese Academy of Sciences. This work is supported in part by the National Key Research and Development Plan (Grant no. 2018YFB0803504) and the National Natural Science Foundation of China (61871140, 61572153, U1636215, 61572492, and 61672020).
References
[1] A. Kharraz, "UNVEIL: A Large-Scale, Automated Approach to Detecting Ransomware," in Proceedings of the 25th USENIX Security Symposium (USENIX Security 16), pp. 757–772, USENIX Association, 2016.
[2] K. Savage, P. Coogan, and H. Lau, "The evolution of ransomware," Tech. Rep., 2015.
[3] A. Kharraz, W. Robertson, D. Balzarotti, L. Bilge, and E. Kirda, "Cutting the Gordian Knot: A Look Under the Hood of Ransomware Attacks," in Detection of Intrusions and Malware, and Vulnerability Assessment, vol. 9148 of Lecture Notes in Computer Science, pp. 3–24, Springer International Publishing, Cham, 2015.
[4] S. Mohurle and M. Patil, "A brief study of Wannacry Threat: Ransomware Attack 2017," International Journal of Advanced Research in Computer Science, vol. 8, no. 5, 2017.
[5] X. Yu, Z. Tian, J. Qiu, and F. Jiang, "A Data Leakage Prevention Method Based on the Reduction of Confidential and Context Terms for Smart Mobile Devices," Wireless Communications and Mobile Computing, vol. 2018, Article ID 5823439, 11 pages, 2018.
[6] J. Yaneza, "Brute Force RDP Attacks Plant CRYSIS Ransomware," https://blog.trendmicro.com/trendlabs-security-intelligence/brute-force-rdp-attacks-plant-crysis-ransomware.
[7] "10% of Ransomware Attacks on SMBs Targeted IoT Devices," https://www.darkreading.com/application-security/10–of-ransomware-attacks-on-smbs-targeted-iot-devices-/d/d-id/1329817.
[8] Q. Tan, Y. Gao, J. Shi, X. Wang, B. Fang, and Z. H. Tian, "Towards a Comprehensive Insight into the Eclipse Attacks of Tor Hidden Services," IEEE Internet of Things Journal, 2018.
[9] "Kaspersky Security Bulletin: STORY OF THE YEAR 2017," 2017, https://media.kasperskycontenthub.com/wp-content/uploads/sites/43/2018/03/07164824/KSB_Story_of_the_Year_Ransomware_FINAL_eng.pdf.
[10] Z. Tian, Y. Cui, L. An et al., "A Real-Time Correlation of Host-Level Events in Cyber Range Service for Smart Campus," IEEE Access, vol. 6, pp. 35355–35364, 2018.
[11] R. Ross et al., Managing Information Security Risk: Organisation, Mission, and Information System View, National Institute of Standards and Technology, 2011, http://csrc.nist.gov/publications/nistpubs/800-39/SP800-39-final.pdf.
[12] Y. Wang, Z. Tian, H. Zhang, S. Su, and W. Shi, "A Privacy Preserving Scheme for Nearest Neighbor Query," Sensors, vol. 18, no. 8, p. 2440, 2018.
[13] Z. H. Wang, X. Wu, C. G. Liu, Q. X. Liu, and J. L. Zhang, "RansomTracer: Exploiting Cyber Deception for Ransomware Tracing," in Proceedings of the IEEE Third International Conference on Data Science in Cyberspace, pp. 227–234, 2018.
[14] B. A. S. Al-rimy, M. A. Maarof, and S. Z. M. Shaid, "Ransomware threat success factors, taxonomy, and countermeasures: A survey and research directions," Computers & Security, vol. 74, pp. 144–166, 2018.
[15] F. Jiang, Y. Fu, B. B. Gupta et al., "Deep Learning based Multi-channel intelligent attack detection for Data Security," IEEE Transactions on Sustainable Computing, 2018.
[16] N. Andronio, S. Zanero, and F. Maggi, "HelDroid: dissecting and detecting mobile ransomware," in Research in Attacks, Intrusions, and Defenses, vol. 9404 of Lecture Notes in Computer Science, pp. 382–404, Springer, 2015.
[17] F. Mercaldo, V. Nardone, A. Santone, and C. A. Visaggio, "Ransomware Steals Your Phone. Formal Methods Rescue It," in Formal Techniques for Distributed Objects, Components, and Systems: 36th IFIP WG 6.1 International Conference, FORTE 2016, held as part of the 11th International Federated Conference on Distributed Computing Techniques, DisCoTec 2016, E. Albert and I. Lanese, Eds., pp. 212–221, Springer International Publishing, 2016.
[18] D. Sgandurra, L. Muñoz-González, R. Mohsen, and E. C. Lupu, "Automated Dynamic Analysis of Ransomware: Benefits, Limitations and use for Detection," https://arxiv.org/abs/1609.03020.
[19] S. Song, B. Kim, and S. Lee, "The Effective Ransomware Prevention Technique Using Process Monitoring on Android Platform," Mobile Information Systems, vol. 2016, Article ID 2946735, 9 pages, 2016.
[20] T. Yang, Y. Yang, K. Qian, D. C.-T. Lo, Y. Qian, and L. Tao, "Automated detection and analysis for android ransomware," in Proceedings of the 17th IEEE International Conference on High Performance Computing and Communications, IEEE 7th International Symposium on Cyberspace Safety and Security, and IEEE 12th International Conference on Embedded Software and Systems, HPCC-ICESS-CSS 2015, pp. 1338–1343, USA, August 2015.
[21] N. Scaife, H. Carter, P. Traynor, and K. R. B. Butler, "CryptoLock (and Drop It): Stopping Ransomware Attacks on User Data," in Proceedings of the 36th IEEE International Conference on Distributed Computing Systems, ICDCS 2016, pp. 303–312, Japan, June 2016.
[22] M. M. Ahmadian and H. R. Shahriari, "2entFOX: A framework for high survivable ransomwares detection," in Proceedings of the 13th International ISC Conference on Information Security and Cryptology, ISCISC 2016, pp. 79–84, Iran, September 2016.
[23] M. M. Ahmadian, H. R. Shahriari, and S. M. Ghaffarian, "Connection-monitor & connection-breaker: A novel approach for prevention and detection of high survivable ransomwares," in Proceedings of the 12th International ISC Conference on Information Security and Cryptology, ISCISC 2015, pp. 79–84, Iran, September 2015.
[24] D. Kim, W. Soh, and S. Kim, "Design of Quantification Model for Prevent of Cryptolocker," Indian Journal of Science and Technology, vol. 8, no. 19, 2015.
[25] C. Moore, "Detecting Ransomware with Honeypot Techniques," in Proceedings of the 2016 Cybersecurity and Cyberforensics Conference (CCC), pp. 77–81, Amman, Jordan, August 2016.
[26] F. Mbol, J. Robert, and A. Sadighian, "An Efficient Approach to Detect TorrentLocker Ransomware in Computer Systems," in Cryptology and Network Security, S. Foresti and G. Persiano, Eds., vol. 10052 of Lecture Notes in Computer Science, pp. 532–541, Springer International Publishing, Cham, 2016.
[27] K. Cabaj, P. Gawkowski, K. Grochowski, and D. Osojca, "Network activity analysis of CryptoWall ransomware," Przegląd Elektrotechniczny, vol. 91, no. 11, pp. 201–204, 2015.
[28] J. Chen, Z. Tian, X. Cui, L. Yin, and X. Wang, "Trust Architecture and Reputation Evaluation for Internet of Things," Journal of Ambient Intelligence & Humanized Computing, vol. 2, pp. 1–9, 2018.
[29] C. Le Guernic and A. Legay, "Ransomware and the Legacy Crypto API," in Risks and Security of Internet and Systems: 11th International Conference, CRiSIS 2016, Roscoff, France, 2017.
[30] L. Pingree, Emerging Technology Analysis: Deception Techniques and Technologies Create Security Technology Business Opportunities, Gartner, 2015.
[31] "Ransomware and Businesses," https://www.symantec.com/content/dam/symantec/docs/security-center/white-papers/ransomware-and-businesses-16-en.pdf, 2016.
[32] A. Kharraz and E. Kirda, "Redemption: Real-Time Protection Against Ransomware at End-Hosts," in Proceedings of the International Symposium on Research in Attacks, Intrusions, and Defenses, pp. 98–119, 2017.
[33] F. Bergstrand, J. Bergstrand, and H. Gunnarsson, Localization of Spyware in Windows Environments.
[34] “PEView,” https://www.aldeid.com/wiki/PEView.
[35] “python-pefile,” https://pypi.python.org/pypi/pefile.
[36] M. Lui and T. Baldwin, “langid.py: An off-the-shelf language identification tool,” in Proceedings of the ACL 2012 System Demonstrations, Association for Computational Linguistics, pp. 25–30, 2012.
[37] S. Bird and E. Loper, “NLTK: the natural language toolkit,” in Proceedings of the ACL 2004 on Interactive Poster and Demonstration Sessions, Association for Computational Linguistics, Barcelona, Spain, July 2004.
[38] “Rrenaud Gibberish-Detector,” https://github.com/rrenaud/Gibberish-Detector/blob/master/README.rst.
[39] “VirusTotal,” http://virustotal.com.
2 Wireless Communications and Mobile Computing
patterns, these traditional blocking-based defense systems become much less effective for these targeted attacks.
To address this issue, we propose to utilize advanced defense schemes to protect important hosts under targeted ransomware attacks. In this paper, we utilize cyber deception technology to help protect critical systems through attack guidance, drawing attackers away from these protected systems. While cyber deception technology helps us protect important targets (such as in dealing with the Advanced Persistent Threat (APT) [11, 12]), it cannot help us trace back attack sources. To address this issue, we further design specific techniques to trace back RDP-based ransomware attacks and identify the original attack sources, which serves as the main deterrent to ransomware attackers.
Our deception environment simulates an actual user system in three layers, with multiple monitors that observe key system operations related to login, network communication, clipboard, process, shared folder, and file system. It collects traceable clues and helps us detect the RDP ransomware attack. Because traditional traceback methods usually require security experts to manually analyze a large number of collected clues, it is difficult for them to achieve fast responses. Therefore, we develop an automatic analysis system that works on traceable clues by taking advantage of natural language processing and machine learning techniques.
To evaluate our system, we invited 122 volunteers to take part in a simulated RDP-based ransomware attack. The proposed system was able to capture traceable clues through the proposed deception environment. It can also automatically analyze the clues effectively. The convergence rate of the analysis system reaches about 98%. Moreover, we demonstrated that it helps us trace back RDP-based attack sources in practical applications.
In summary, this paper makes the following contributions:
(i) We propose a systematic method to deter RDP-based ransomware by identifying attackers: it traps ransomware attackers via a cyber deception environment and uses an automatic analysis system to obtain traceable clues and identify attack sources.
(ii) We build a deception environment to trap RDP-based ransomware attackers by simulating a user environment in three layers: a network layer, a host layer, and a file system layer. The environment helps us discover attacker behaviors and collects attacker-related information.
(iii) We develop an automatic analysis system with natural language processing and machine learning techniques to automatically recognize effective clues for tracing back ransomware attack sources.
(iv) We designed two practical experiments to test RDP-based ransomware attacks and ransomware makers, and demonstrated the feasibility of the proposed system.
The remainder of the paper is structured as follows. In Section 2, we briefly present background and related work. In Section 3, we describe the methodology of our systematic method. In Section 4, we present the implementation of our deception environment prototype. In Section 5, we describe the details of the clue analysis system. In Section 6, we discuss the evaluation setup and results. We conclude this paper in Section 7.
2 Background and Related Work
2.1 Related Work on Ransomware Defense. Ransomware is a type of malware that manipulates a user system to extort money. It operates in many different ways, e.g., simply locking a user’s desktop or encrypting an entire file system. Recent rampant ransomware attacks have called for effective ransomware defense solutions. Among the studies that tackle ransomware counteraction, several solutions have been proposed to confront this attack [14, 15].
Some of these solutions are proposed to deal with all types of ransomware [1, 16–20]. For example, Kharraz presented a dynamic analysis system called UNVEIL. The system analyzes and detects ransomware attacks by modeling ransomware behaviors. It focuses on the observation of three elements, namely, I/O data buffer entropy, access patterns, and file system activities [1]. Others are type-specific solutions that deal with only one type, such as crypto-ransomware [21–25]. For example, Scaife presented an early-warning detection system that alerts users during suspicious file activities [21]. Utilizing a set of behavior indicators, the detection system can halt a process that appears to tamper with a large amount of user data. Furthermore, it is claimed that the system can stop a ransomware execution with a median loss of only 10 files. Similarly, some studies tackle the detection of specific ransomware families only [26–28]. For example, Maltester is a family-specific technique proposed by Cabaj to detect CryptoWall infections [27]. It employs dynamic analysis along with honeypot technology to analyze the network behavior and detect the infection chain.
These solutions can be categorized into prevention and detection. However, these two kinds of countermeasures have the following disadvantages. First, many prevention measures require many services to be disabled, which is likely to affect service functionality. For example, Prakash suggested several prevention measures, including disabling macros in office documents and restricting access permissions on the “Temp” and “Appdata” folders [29]. Second, it is often difficult for a detection system to conceal itself and perform its functions against ransomware attacks that precisely aim at specific hosts, e.g., using the common Remote Desktop Protocol (RDP). Finally, while these countermeasures can be used to detect or block specific ransomware attacks, they cannot fundamentally inhibit the spread of ransomware. Traceability technology, in contrast, can fundamentally inhibit the spread of ransomware by tracing attacks back to their sources.
2.2 RDP-Based Ransomware Attacks. Traditional ransomware spreads randomly across the Internet on a large scale in executable files, development kits, macro files, and other malicious programs, using various dissemination methods including phishing emails, watering hole attacks, vulnerability attacks, server intrusion, and supply chain pollution. These methods use different ways to trick a victim into launching such programs. Among these dissemination methods, phishing email is the most widely used. However, according to Kaspersky’s 2017 ransomware report, the number of targeted ransomware attacks based on RDP is growing rapidly.
Figure 1: Data collection and analysis process of the whole prototype: (1) ensnare to the deception environment, (2) monitor and detect ransomware, (3) extraction, (4) analysis, (5) result.
Recently, more and more ransomware criminals have gained access through RDP services and then installed ransomware manually. These attackers use a brute-force method to acquire usernames and passwords on a target machine with an active RDP service [9]. For instance, Crysis, one of the typical families and a copycat of Locky, not only aims at common businesses but also targets healthcare service providers [9]. Crysis gains admin-level privileges by stealing passwords and credentials. In addition, during an RDP session, the attacker uses both the clipboard and shared folders to upload files to a remote host. The attacker can then install ransomware manually.
2.3 Cyber Deception Technology. Because many critical systems are known and always on, it is difficult to protect them from potential network attacks:
(i) Attackers can use zero-day vulnerabilities, highly antagonistic malicious code, or other resources to break the defense system.
(ii) Because humans are always the weakest link in defense systems, attackers can use social engineering to identify system weaknesses and penetrate the defense.
(iii) Attackers can repeatedly explore the potential vulnerabilities on a target system to identify its weaknesses.
However, when an attacker aims at a specific target, e.g., by exploiting its RDP service, traditional passive defense methods are usually less effective. Therefore, we need advanced active solutions, such as cyber deception, to deal with attacks that have fewer observable features.
The earliest form of cyber deception technology is the honeypot. A honeypot detects attacks by deploying, in the service network, a series of systems or resources that do not carry real business. When a trap is accessed, it indicates an attack. A honeypot system generally waits for attacks passively and does not play the role of misleading and confusing attackers. What is more, because a honeypot system does not carry real business and is not highly interactive, it may easily be identified by attackers. Compared with the traditional honeypot system, a cyber deception system can be deployed more conveniently, its environment is more realistic, and it can be linked with existing defense products. It can provide more effective defense against APT attacks, ransomware attacks, intranet attacks, and other threats. A Gartner report in 2015 [30] pointed out the market prospects of deception-based security defense technology and predicted that 10% of organizations would use deception tools (or tactics) to counter cyber-attacks in 2018. Compared with the traditional passive defense approach, cyber deception technology is an active defense approach and can be applied to all stages of network attacks. We can use this technology to trap the RDP-based attacker, detect targeted attacks, and deter ransomware attackers by precisely identifying them.
Trap RDP-Based Ransomware Attackers. A targeted ransomware attack generally has three steps: detection, infiltration, and execution [31]. However, traditional security solutions are unable to cope with the internal translation phase. In addition, traditional honeypot technology (often used to fight network attacks) generally does not focus on tracing back to attackers. Cyber deception technology, however, can lure the attacker into a surveillance environment and consume his time and energy with bait information.
Detect RDP-Based Ransomware Attacks. Once the attacker obtains the correct username and password combination, he usually returns multiple times within a short period to try to infect the compromised host [6]. In one particular case, Crysis was deployed six times on an endpoint within a span of 10 minutes. As a result, by monitoring in the cyber deception environment, we can detect RDP-based ransomware attacks in time and determine the attacker’s behavior through the environment monitor.
Deter the Ransomware Attacker. Deterring ransomware attackers can be approached in two ways. First, if an attacker realizes he is entrapped, this itself becomes a deterrent. Second, if the attacker is exposed to the deception environment and remains within view of the defense surveillance, the monitor can collect traceable clues that the attacker accidentally releases (e.g., IP address, path, nickname, strings). Exposing these clues, which attackers try to hide, can be a powerful deterrent to other attackers.
3 Methodology
In this section, we describe our method of tracing back RDP-based ransomware attackers. Figure 1 summarizes the data collection and analysis process of the entire prototype. First, we implement a deception environment to trap attackers. Second, we monitor RDP-based ransomware attacks and collect information when they occur. Third, we extract effective clues from the monitor information. Fourth, we use automatic analysis to screen a large number of clues for tracing back the attacker. Finally, we generate a report to trace back the RDP-based ransomware attacker. We refer readers to Sections 4 and 5 for the detailed implementation of this prototype.
3.1 Deception Environment. Generally, the ransomware attack execution stage has two steps: login and spread [31]. Building a deception environment is nontrivial in practice because it must make the ransomware attacker believe that it belongs to a real user and that the user data is worth attacking. Because advanced attackers always probe for static features of analysis systems before they launch attacks [32], an intuitive approach to address such reconnaissance is to build the user environment in such a way that the user data is valid, real, and nondeterministic. In addition, the environment serves as an “enticing target” to encourage ransomware attackers. We elaborate on how to generate an artificial, realistic, and enticing user environment for RDP-based ransomware in Section 4.
RDP-based attackers commonly upload malicious programs in the following ways before spreading ransomware: (1) the attacker downloads malicious programs from the Internet; (2) programs are transferred through FTP, SCP, or other transport protocols; (3) programs are uploaded through the clipboard; (4) programs are uploaded through a shared folder. The clipboard and shared folders are most commonly used to transfer programs in RDP-based ransomware attacks because they are simple and convenient. However, both are easy for our proposed system to monitor.
3.2 Environment Monitor. After the attacker logs in to the environment, a shared folder and the clipboard on the remote PC are commonly used to transfer ransomware programs from the attacker’s machine. To collect as much information about the attacker as possible without being observed, this paper proposes three monitor layers: the network layer, the host layer, and the file layer. We elaborate on how to configure the monitor system for the deception environment in Section 4.
The Network Layer Monitor. The network layer monitor detects a remote connection and collects information including the remote IP addresses, remote ports, port status codes, keyboard layout, and so on. When the RDP-based attacker logs in to the host, the monitor can obtain this information and detect the attack without the attacker’s knowledge.
The Host Layer Monitor. We propose to detect changes in, for example, processes and the clipboard by monitoring the host layer. The host layer monitor can gather information about the attacker’s behavior and use of these system applications in the deception environment. For instance, as the clipboard resides in the system-level heap space, any application in the system has access to it. RDP-based ransomware attackers often take advantage of the clipboard to exchange data between applications. Moreover, the monitor might obtain clues left by attackers who use the clipboard locally, as the Windows system shares the clipboard by default during the RDP session.
The File Layer Monitor. By monitoring the file layer, we can identify ransomware attacks by file changes. Furthermore, the monitor can gather local traceable information by monitoring files in the shared folder. For instance, a shared folder on the remote PC is commonly used to transfer ransomware from the attacker’s machine during the RDP session. In addition, for a more convenient and quick attack, the attacker often mounts the entire local disk to the remote computer. As a result, by monitoring shared folders, we can detect newly added shared folders in real time and capture a large amount of path information automatically.
3.3 Clue Extraction. Through the environment monitors, we can gather a lot of information left by attackers, such as login information, communication information, clipboard content, folder paths, and portable executable (PE) files. Many traceable clues can be extracted here, including but not restricted to IP address, keyboard layout, compile path, and file path. In order to analyze these clues quickly, we divide them into two categories: string clues and path clues. These clues are then submitted to the automatic analysis system. We elaborate on the types of clues that the proposed system can extract in Section 5.1.
3.4 Automatic Analysis. According to our investigation, current traceback tools mostly rely on manual clue analysis. However, we usually have to deal with a large number of clues with no semantic correlation. Because such manual traceback analysis usually takes a lot of time and effort, we propose an automatic analysis system; we elaborate on how to analyze clues automatically in Section 5.2.
4 Implementation of the Deception Environment
As the Windows platform is the main target of ransomware, we choose Windows for the proof-of-concept implementation. In this section, we describe the implementation details of a Windows-based deception environment prototype. We elaborate on how the deception environment traps ransomware attackers and how the monitor detects the RDP-based attack and collects traceable clues. The entire system implementation process is shown in Figure 2.
4.1 At the Network Layer
4.1.1 The Login Monitor. The login monitor is used to detect attacks in real time and collect the attacker’s login information. On the Windows platform, Win32 is an environment subsystem that provides an API for operating system services and functions to control all user inputs and outputs. The login monitor relies on Windows APIs to gain access to the system and run with privileges to access its own areas of memory. It uses Winsock 2.0 to get access to networks and can use protocols other than the TCP/IP suite. The login monitor takes network requests and sends those requests to the Winsock 2.0 SPI (Service Provider Interface) by calling the main Winsock 2.0 file, Ws2_32.dll. It provides access to transport service providers and namespace providers. The IP Helper API makes it possible to get and modify network configuration settings for the local host. It consists of the DLL file iphlpapi.dll and includes functions that can retrieve information about protocols such as TCP and UDP [33]. As a result, the login monitor can directly access the data buffers involved in transmission control protocols, routing tables, network interfaces, and network protocol statistics.
Figure 2: RDP-based ransomware attack traceback system process: (1) construct deception environment (cyber deception environment); (2) environment monitor, with ransomware detection, at the network layer (login monitor, communication monitor), host layer (clipboard monitor, process monitor), and file layer (shared folder monitor, file monitor); (3) clue extraction (login clues, remote host clues, clipboard clues, compile clues, path clues); (4) automated analysis (users and programs analysis, network address analysis, language identification, traceable strings identification, auxiliary traceable clues); (5) result.
4.1.2 The Communication Monitor. The communication monitor is responsible for capturing network traffic. By locating the TCP packets in the network traffic that contain the interactive configuration information of an RDP login, the RDP connection information can be obtained as the attacker’s personal information. The main packet characteristics are as follows: (1) the packet’s name is often ClientData; (2) the packet is usually located right after the TCP three-way handshake; (3) the packet appears near the front of the session; (4) the amount of data is significantly larger.
4.2 At the Host Layer
4.2.1 The Deception Host. According to our observation, the main methods used by attackers to log in to a remote host are (1) direct login through a weak password and (2) login through an access account added by exploiting a vulnerability. As a result, we deliberately set weak passwords for the administrator accounts of the deception environment and leave common vulnerabilities, such as EternalBlue, in the environment to attract attackers. To guide attackers to upload ransomware using only the clipboard and shared folders, we block traffic transfers from outside the environment and close common transfer ports (e.g., ports 20, 21, 80, and 443).
4.2.2 The Clipboard Monitor. The clipboard monitor obtains clues in real time by monitoring changes to the clipboard. It uses the Clipboard Viewer mechanism to listen for clipboard changes without affecting the clipboard contents. A Clipboard Viewer is a mechanism that can get and display the contents of the clipboard. As Windows applications are message-driven, the key to the monitor is responding to and processing clipboard change messages. When the content changes, the WM_DRAWCLIPBOARD message is sent to the first window of the Clipboard Viewer Chain. After each Clipboard Viewer window responds to and processes the message, it must pass the message on to the next window, using the handle of the next window kept in its saved linked list. The clipboard monitor obtains the clipboard’s new contents through its window by calling the “GetClipboardData” Windows API. When a subsequent copy or cut operation is executed, the data in the clipboard are rewritten. The clipboard monitor therefore listens in real time and writes the information to a log file, which is updated whenever the monitor receives a clipboard change notice. Each time the log file is updated, the monitor sends it to a secure host and completely erases it from the environment, to prevent it from being detected by the attacker or encrypted by ransomware.
4.2.3 The Process Monitor. To run a program on a Windows system, a new process must be created. The monitor obtains the PE file run by the attacker by monitoring the processes in the environment. It first records the state of the processes commonly used in the deception environment before an RDP-based ransomware attack. After the login monitor detects such an attack, it monitors the system for newly created processes: the Windows API “CreateToolhelp32Snapshot” takes status snapshots of all processes in real time, including each process identifier (PID). When a suspicious process has started, the monitor recognizes it by its PID and looks up the process’s running path with the help of “GetModuleFileNameEx”. Finally, the monitor finds the suspicious program’s PE file through the path and copies it to the secure host.
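The baseline-versus-snapshot comparison described above can be illustrated with a minimal Python sketch. It is not the prototype’s code: the function name `new_processes` and the sample snapshots are hypothetical, and the snapshots are plain dictionaries rather than results of the Windows snapshot API.

```python
from typing import Dict


def new_processes(baseline: Dict[int, str], current: Dict[int, str]) -> Dict[int, str]:
    """Return processes present in `current` but not in `baseline`.

    Both arguments map a PID to the executable path of that process.
    A PID reused by a different executable also counts as new.
    """
    return {
        pid: path
        for pid, path in current.items()
        if baseline.get(pid) != path
    }


# Hypothetical snapshots: one recorded before the RDP login, one after it.
before = {4: "System", 812: r"C:\Windows\explorer.exe"}
after = {
    4: "System",
    812: r"C:\Windows\explorer.exe",
    3344: r"C:\Users\Public\payload.exe",
}
suspicious = new_processes(before, after)
```

Each path in `suspicious` would then be the candidate PE file to copy to the secure host.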
4.3 At the File Layer
4.3.1 Deception Files. Deception files are constructed with two goals. First, we need to make the attacker believe that the deception environment is a real user’s host. Second, we need to make the attacker believe that there are resources in the environment that are worth attacking.
To simulate a realistic environment, we deploy large numbers of different types of files on it, for example, images, audio files, database files, and documents that can be accessed in a Windows session. Based on Amin Kharraz’s research [1], we created four file categories that ransomware always tries to find and encrypt: documents (*.txt, *.doc(x), *.ppt(x), *.xls(x), *.pdf, and *.py), keys and licenses (*.key, *.pem, *.crt, and *.cer), file archives (*.zip, *.rar), and media (*.jp(e)g, *.mp3, and *.avi). We obtain these files in three ways. First, we create files with valid headers and content using standard libraries (e.g., python-docx, python-pptx, pdfkit, and OpenSSL). Second, using Google search syntax and crawler technology, we download a large number of files from the Internet. Third, we collect a number of nonconfidential documents from the hosts of 20 volunteers to emulate actual user environments. When we assign user files for the deception environment, the path length is generated randomly. Each folder may have a random set of subfolders. For each folder, a subset of extensions is randomly selected. Furthermore, each directory name is generated from meaningful words. Consequently, we generate paths and extensions for user files, giving them variable file depth and meaningful content.
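As a rough illustration of the randomized layout described above, the following Python sketch generates decoy file paths with random depth, folder names drawn from a word list, and random extensions. The word list, the extension list, and the function `decoy_paths` are assumptions made for this example; the sketch also simplifies the per-folder extension subsets to a single random choice per file.

```python
import random
from pathlib import PureWindowsPath
from typing import List

# Hypothetical word list and extension set; the paper does not publish its own.
WORDS = ["reports", "finance", "projects", "backup", "contracts", "photos", "archive"]
EXTENSIONS = [".txt", ".docx", ".pptx", ".xlsx", ".pdf", ".pem", ".zip", ".jpg"]


def decoy_paths(root: str, count: int, max_depth: int,
                rng: random.Random) -> List[PureWindowsPath]:
    """Generate `count` decoy file paths with random depth, folders, and extensions."""
    paths = []
    for index in range(count):
        depth = rng.randint(1, max_depth)                  # random path length
        folders = [rng.choice(WORDS) for _ in range(depth)]  # meaningful folder names
        name = f"{rng.choice(WORDS)}_{index}{rng.choice(EXTENSIONS)}"
        paths.append(PureWindowsPath(root, *folders, name))
    return paths


# A seeded generator keeps the layout reproducible for testing.
sample = decoy_paths(r"C:\Users\Alice\Documents", count=5, max_depth=4,
                     rng=random.Random(0))
```

In a real deployment one would pass an unseeded `random.Random()` so that no two deception hosts share the same layout.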
To make the simulated environment appear more valuable, we deploy bait information on it, such as databases, false code comments, digital certificates, administrator passwords, SSH keys, VPN keys, browser history passwords, ARP records, and DNS records. When an attacker obtains the bait information, it may trick the attacker into attacking the deception environment.
4.3.2 The File Monitor. The file monitor detects ransomware by monitoring changes in file type and file entropy, a method proposed in 2016 [21]. The type of data stored in a file can be described by the order and position of specific byte values unique to that file type. Since files generally retain their file types and formatting in the deception environment, the bulk modification of such files should be considered suspicious. When the monitor sees this type of change, we can infer that a ransomware attack has occurred.
Entropy expresses the randomness of the characters in a string: the higher the entropy value, the stronger the randomness. The Shannon entropy of an array of bytes can be computed as the sum
$$e = \sum_{i=0}^{255} P_{B_i} \log_2 \frac{1}{P_{B_i}} \tag{1}$$
for $P_{B_i} = F_i / \mathit{totalbytes}$, where $F_i$ is the number of instances of byte value $i$ in the array. The entropy value is a number from 0 to 8, where a value of 8 represents a byte array with a completely uniform distribution. Since each byte value occurs with essentially the same probability in encrypted ciphertext, the entropy of ciphertext is close to the upper limit. Because ransomware always encrypts a large number of files, when we detect that a file changes to a high-entropy file within a short period of time and its file type also changes, we assume that the file is subject to a ransomware attack.
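Equation (1) translates directly into a few lines of Python. The sketch below is an illustration only; the helper `looks_encrypted` and its threshold of 7.5 bits per byte are assumptions for the example, not values taken from the paper.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # F_i for every byte value i that occurs
    # P * log2(1 / P) with P = F_i / total, summed over observed byte values
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def looks_encrypted(data: bytes, threshold: float = 7.5) -> bool:
    """Flag data whose entropy is near the 8-bit upper limit (threshold is hypothetical)."""
    return shannon_entropy(data) >= threshold


uniform = bytes(range(256)) * 16   # every byte value equally frequent
constant = b"A" * 4096             # a single repeated byte value
```

A monitor would combine such a check with the file type comparison, since compressed formats such as archives also have high entropy without being encrypted by ransomware.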
4.3.3 The Shared Folder Monitor. By traversing the disk storage in real time, the shared folder monitor discovers updates of the shared folders as they happen. It gains access to the attacker’s local files, which the attacker often does not notice, thus revealing some unexpected traceable clues. The monitor can access a list of paths in the attacker’s shared folders.
As we originally observed, folders shared through Remote Desktop usually appear on the remote host under a path with the prefix “\\tsclient”. When the monitor’s traversal reaches this prefix, it uses “FindFirstFile” to find the first file. It then uses “FindNextFile” with the returned handle to find the next file. When an entry is a folder, the monitor continues to traverse all files under that folder. Initially, the monitor tried to get the full file names and file contents by traversing the new shared folder. However, during actual experimentation, we found that, as the number of files in storage grows, the monitor takes far more time and resources to get all the file contents than to get just the file paths. Traversing everything is also more likely to alert the attacker. Therefore, the monitor only obtains the paths of the files that the attacker shares from its host, with the help of “GetFileName”. Moreover, in order to prevent encryption by the ransomware, the shared folder monitor directly transfers the acquired list of shared file paths to another secure host.
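The paths-only traversal can be sketched in Python with `os.walk`, which plays the role that the FindFirstFile/FindNextFile loop plays in the prototype. The function `list_shared_paths` is a hypothetical name, and a temporary directory stands in for a redirected drive so that the example runs on any platform.

```python
import os
import tempfile
from typing import List


def list_shared_paths(root: str) -> List[str]:
    """Collect file paths under `root` recursively without opening any file."""
    found = []
    for folder, _subfolders, files in os.walk(root):
        for name in files:
            found.append(os.path.join(folder, name))
    return sorted(found)


# Demo: a temporary directory stands in for the attacker's redirected drive.
with tempfile.TemporaryDirectory() as share:
    os.makedirs(os.path.join(share, "tools", "keys"))
    for relative in ("readme.txt", os.path.join("tools", "keys", "id.pem")):
        with open(os.path.join(share, relative), "w") as handle:
            handle.write("placeholder")
    listed = [os.path.relpath(path, share) for path in list_shared_paths(share)]
```

Because only directory entries are read, the cost grows with the number of files rather than with their total size, which matches the motivation given above for collecting paths instead of contents.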
5 Clue Extraction and Analysis
Through the deception environment, we trap the ransomware attacker and collect a lot of information that may contain many traceable clues. However, such traceable clues are often not directly observable and are complex in nature. In addition, much of the collected information is not helpful in tracing back ransomware attackers. Therefore, to assist in the analysis of the monitor information and extract the effective traceable clues, in this section we describe how to extract clues and how to analyze traceable clues automatically after extraction.
5.1 Clue Extraction. We mainly obtain the following kinds of clues from the extraction: remote login information (IP addresses), network traffic, clipboard contents (pictures and texts), shared folder information (path strings), and ransomware samples (compile time and compile paths). The shared folder path clues can be obtained directly from the monitor. However, clipboard clues, compile clues, and remote host clues are often not directly observable and are complex in nature. As a result, the extraction module mainly focuses on the extraction of remote host clues, clipboard string clues, and compilation clues.
Remote Host Clue Extraction. The IP address, port numbers, and folder path clues can be directly obtained from the login information and folder paths. TCP packets that carry the configuration information exchanged in a network communication PCAP capture are usually named “ClientData”. We extract the client name field from the “ClientData” packet to obtain the attacker’s hostname. In addition, the KeyboardLayout field indicates the default keyboard layout of the attacking host, e.g., the Chinese Simplified layout number is 0x0004 and the American English keyboard layout number is 0x0409. From the keyboard layout, we can find the remote user’s habitual language (the mother tongue) and infer the attacker’s nationality.
Clipboard Clue Extraction. The main data formats available to the clipboard monitor are Windows Bitmap, GDI file, ANSI characters, Unicode characters, and WAV audio data. We mainly aim at extracting traceable clues of the character types. The monitor extracts character clues from the clipboard in various formats by checking the “DataType” value of the GetClipboardData API.
Compilation Clue Extraction. For all Windows RDP-based ransomware samples that we examined, we empirically observed that the most commonly used format is the PE file, especially *.exe and *.dll. Some PE files contain compilation information, and this information does not change when the program is moved. As a result, it is a good way to obtain the creator’s information. A PE file mainly consists of five major components: the DOS MZ header, the DOS stub, the PE header, the section headers, and the section content. Each component contains a great deal of information, but very little of it can be used to identify the creator, and some identification information needs to be extracted from the content of each section. In this paper, many clues that can be extracted from PE files are used to trace back the attacker: file name, PE file type, compiler version, compilation path, compile time, last modified time, last open time, IP address, URL, domain name, language, string, wide character, and so on. We extract most of the clues with PEView [34]. However, since it cannot directly obtain the compilation path, we use the pefile [35] tool to extract the path by locating it in the PE structure.
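To make the compile-time clue concrete, the following Python sketch reads the TimeDateStamp field from the COFF header of a PE image using only the standard library. It is an illustration rather than the PEView or pefile workflow used in the paper; the function name `pe_compile_time` and the synthetic header bytes are assumptions for this example.

```python
import struct
from datetime import datetime, timezone


def pe_compile_time(image: bytes) -> datetime:
    """Return the COFF TimeDateStamp of a PE image as a UTC datetime."""
    if len(image) < 0x40 or image[:2] != b"MZ":
        raise ValueError("not a DOS/PE image")
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)   # e_lfanew
    if image[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("PE signature not found")
    # COFF header layout: Machine (2 bytes), NumberOfSections (2 bytes),
    # then TimeDateStamp (4 bytes), so the stamp sits 8 bytes after the signature start.
    (timestamp,) = struct.unpack_from("<I", image, pe_offset + 8)
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)


# Synthetic 0x60-byte header built only for this demonstration.
fake = bytearray(0x60)
fake[0:2] = b"MZ"
struct.pack_into("<I", fake, 0x3C, 0x40)          # PE header starts at 0x40
fake[0x40:0x44] = b"PE\x00\x00"
struct.pack_into("<I", fake, 0x48, 1483228800)    # 2017-01-01 00:00:00 UTC
compiled = pe_compile_time(bytes(fake))
```

The timestamp is only a clue: the field can be zeroed or forged by whoever builds the sample, so it should be weighed together with the other compilation clues.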
Since the extracted clues come in different encoding formats, to facilitate observation and give subsequent analysis a unified form, the extraction system converts all obtained data into UTF-8 and saves it in an SQLite database. Before being submitted to the analysis system, the extracted clues are divided into two categories: string clues and path clues.
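A minimal sketch of this normalization step, assuming each raw clue arrives together with the name of its source encoding, might look as follows. The table layout, the function names, and the backslash heuristic used to separate path clues from string clues are all hypothetical choices for the example.

```python
import sqlite3
from typing import Iterable, Tuple


def normalise(raw: bytes, encoding: str) -> str:
    """Decode raw clue bytes from their source encoding into text."""
    return raw.decode(encoding, errors="replace")


def classify(clue: str) -> str:
    """Crude heuristic: anything containing a backslash is treated as a path clue."""
    return "path" if "\\" in clue else "string"


def store_clues(db: sqlite3.Connection, clues: Iterable[Tuple[bytes, str]]) -> None:
    """Decode each (raw bytes, encoding) pair and store it with its category."""
    db.execute("CREATE TABLE IF NOT EXISTS clues (category TEXT, value TEXT)")
    texts = [normalise(raw, encoding) for raw, encoding in clues]
    db.executemany("INSERT INTO clues VALUES (?, ?)",
                   [(classify(text), text) for text in texts])
    db.commit()


db = sqlite3.connect(":memory:")
store_clues(db, [
    ("C:\\Users\\Dell\\Desktop".encode("utf-16-le"), "utf-16-le"),
    ("\u4f60\u597d".encode("gbk"), "gbk"),   # two Chinese characters in GBK
])
stored = db.execute("SELECT category, value FROM clues ORDER BY rowid").fetchall()
```

Python’s `sqlite3` module writes text values as UTF-8 by default, so decoding to `str` before insertion is enough to unify the encodings.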
5.2. Automatic Analysis. At this point, the clues extracted by the system mainly include string clues and path clues. The number of string clues is large, and they are mixed with a large number of unidentifiable strings. Because the number of path clues is also very large, with no semantic correlation, it is difficult to identify traceable clues manually. We therefore focus on how to automate the identification of traceable clues for path clues.
5.2.1. Users and Programs Analysis. In the analysis of the path clues, we first obtain attacker clues by identifying context-related segments within the same path. For instance, each user has a separate user folder located in the "Users" folder under drive C. As a result, the system can obtain the user name on the attacking host from the folder name under the "Users" folder (e.g., C:\Users\Dell). The "Program Files" folder usually contains the names of the software programs installed on the machine (e.g., C:\Program Files\Microsoft Visual Studio 11.0). What is more, the QQ account number is always located in the "QQ\QQfile" folder (e.g., D:\QQ\QQfile\86******086\FileRecv).
In this way, the analysis system can quickly and accurately obtain host names, email accounts, program names, social software account numbers, and other traceable clues carried in the attacker's file paths. However, such user information accounts for only a small share of the overall clues and is acquired by chance. Therefore, we conduct further analysis on this basis.
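The context-based extraction described in this subsection can be sketched with three regular expressions, one per folder convention. The pattern set and the function name `extract_path_clues` are illustrative assumptions; the actual rules used by the analysis system are not given in the paper.

```python
import re

# One pattern per folder convention; each captures the path segment
# that directly follows the well-known parent folder.
PATTERNS = {
    "user_name": re.compile(r"\\Users\\([^\\]+)", re.IGNORECASE),
    "program": re.compile(r"\\Program Files(?: \(x86\))?\\([^\\]+)", re.IGNORECASE),
    "qq_account": re.compile(r"\\QQ\\QQfile\\([^\\]+)", re.IGNORECASE),
}


def extract_path_clues(paths):
    """Return the distinct values found for each clue kind, in order."""
    found = {kind: [] for kind in PATTERNS}
    for path in paths:
        for kind, pattern in PATTERNS.items():
            for value in pattern.findall(path):
                if value not in found[kind]:
                    found[kind].append(value)
    return found
```

Applied to the three example paths in this subsection, the sketch yields the user name "Dell", the program "Microsoft Visual Studio 11.0", and the masked QQ account string.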
5.2.2. Account Analysis. Through the analysis of APT1 and some other attribution reports, we find that the mapping between the attacker and a physical-world identity can be better obtained by analyzing the accounts left by the attacker. This information includes, but is not limited to, the location of the IP address, the spelling and registration of the domain name, the IP address corresponding to a URL, and the domain name of the mailbox account. Because it is difficult to identify this information effectively among a large number of string and path clues, the analysis system automatically identifies IP addresses, domain names, URLs, and mailbox accounts by regular expression matching. Then, with the help of threat intelligence and big data technology, more relevant clues are obtained.
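A minimal sketch of the regular matching step is shown here. The four patterns are simplified assumptions chosen for readability (they are not the expressions used by the authors), and `identify_accounts` is a hypothetical name; IPv4 candidates are additionally validated with the standard `ipaddress` module so that strings such as 999.1.1.1 are discarded.

```python
import ipaddress
import re

URL_RE = re.compile(r"\bhttps?://[^\s\"'<>]+", re.IGNORECASE)
EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")
IPV4_RE = re.compile(r"(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?![\d.])")
DOMAIN_RE = re.compile(r"\b(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}\b")


def identify_accounts(text: str) -> dict:
    """Collect URL, e-mail, IPv4, and domain candidates from one clue."""
    ips = []
    for candidate in IPV4_RE.findall(text):
        try:
            ipaddress.IPv4Address(candidate)  # rejects octets above 255
        except ValueError:
            continue
        ips.append(candidate)
    return {
        "url": URL_RE.findall(text),
        "email": EMAIL_RE.findall(text),
        "ip": ips,
        "domain": sorted(set(DOMAIN_RE.findall(text))),
    }
```

The domain pattern also matches file names with an extension (for example a name ending in ".pdb"), so in practice its output would need filtering against a list of valid top-level domains before being sent to threat intelligence lookups.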
5.2.3. Language Identification. The languages a user works in often help to determine an attacker's habitual language. Because there are many languages across different countries and some languages are highly similar, we use an automatic analysis system to identify the language of clues. We tested the accuracy of two language identification toolkits using entire path information in four different languages and found the "langid.py" toolkit to be more accurate overall than the "langdetect" toolkit. The comparison results are shown in Figure 3. langid.py is a language identification toolkit developed by Lui and Baldwin at the University of Melbourne [36]. It combines a naive Bayes classifier with cross-domain feature selection to provide domain-independent language identification.
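As a rough illustration of why path text carries a language signal, the following sketch guesses among the four test languages from the Unicode script of the letters in a path. This is a deliberately simple stand-in and not the naive Bayes approach of langid.py: it cannot tell apart languages that share a script, and the script-to-language mapping (for example, treating all CJK ideographs as Chinese) is an assumption made only for this example.

```python
import unicodedata
from collections import Counter

# Assumed mapping from Unicode character-name prefixes to language codes.
SCRIPT_TO_LANG = {"CJK": "zh", "HANGUL": "ko", "CYRILLIC": "ru", "LATIN": "en"}


def guess_path_language(path: str) -> str:
    """Guess a language code from the scripts of the letters in a path."""
    counts = Counter()
    for ch in path:
        if not ch.isalpha():
            continue
        name = unicodedata.name(ch, "")
        for script, lang in SCRIPT_TO_LANG.items():
            if name.startswith(script):
                counts[lang] += 1
                break
    # Windows paths almost always contain Latin letters (drive letter,
    # "Users"), so any non-Latin script is treated as the stronger signal.
    non_latin = {lang: n for lang, n in counts.items() if lang != "en"}
    if non_latin:
        return max(non_latin, key=non_latin.get)
    return "en" if counts["en"] else "unknown"
```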
5.2.4. Traceable Strings Identification. When traceable strings are needed, traditional string analysis methods usually use Named Entity Recognition (NER). However, the clues to be analyzed mainly consist of strings and paths. The strings before and after a path separator have few semantic correlations. What is more, the strings between path separators and the remaining strings to be analyzed are mostly semantically unrelated because of their limited length. This paper therefore proposes an algorithm that can quickly and automatically identify the traceable strings in strings and paths.
In order to filter out meaningful traceable clues related to the attacker's identity, the path clues and string clues are split into strings and checked against common words and gibberish in the following steps. Figure 4 shows the process of the automatic traceable clues identification system.
Figure 3: Accuracy of two language identification tools. In tests using the same path clues of known language, the accuracy of the langid toolkit for English, Chinese, Russian, and Korean was 88%, 100%, 99%, and 99.0%, respectively, while the accuracy of the langdetect toolkit was 73%, 0.85%, 40%, and 4.60%, respectively.
Figure 4: Process of automatically identifying traceable clues.
Make Stop Words. The system splits each path by the path delimiter. Path segments that are common to multiple computers have no identifying effect, so we take the file and folder name strings that are common to 20 normal user computers as stop words. The system then removes the stop words from the strings after each split.
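The stop word step can be sketched as follows. The paper builds its list from 20 normal user computers; how many of those hosts must share a segment is not stated, so the `min_hosts` parameter and all function names here are assumptions for illustration.

```python
import re
from collections import Counter


def split_path(path: str):
    """Split a path on backslashes or slashes, dropping empty segments."""
    return [seg for seg in re.split(r"[\\/]+", path) if seg]


def build_stop_words(paths_by_host: dict, min_hosts: int) -> set:
    """Collect path segments that occur on at least `min_hosts` hosts."""
    host_counts = Counter()
    for paths in paths_by_host.values():
        # A set per host, so each host votes at most once per segment.
        segments = {seg.lower() for p in paths for seg in split_path(p)}
        host_counts.update(segments)
    return {seg for seg, n in host_counts.items() if n >= min_hosts}


def remove_stop_words(path: str, stop_words: set):
    """Return the segments of `path` that are not stop words."""
    return [seg for seg in split_path(path) if seg.lower() not in stop_words]
```

With reference hosts that all contain C:\Users\...\Documents, the segments "c:", "users", and "documents" become stop words, and only host-specific segments such as a user name survive.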
Segmentation. After removing the stop words, we separate sentences and words for each segment of the path with a tokenizer (e.g., NLTK NER) [37]. However, as the segments of the path are not semantically related, this segmentation is less effective. Based on our extensive study of path information, users often prefer to use symbols such as underscores to split file names. In addition, they also like to use capitalization in multiword strings to facilitate reading. As a result, the system uses the separator symbols commonly found in file names and alphabetic case characteristics to segment the strings again.
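The second segmentation pass, based on separator symbols and alphabetic case, might look like the sketch below. The symbol set and the case-boundary rule are assumptions chosen for the example; the paper does not list the exact characters it uses.

```python
import re

# Assumed separator symbols commonly seen in file names.
SYMBOLS = re.compile(r"[\s_\-.,()\[\]{}+=#@!~&]+")
# Zero-width boundaries: lower/digit -> Upper, and the end of an
# acronym followed by a capitalized word (e.g. "HTTP|Server").
CASE_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")


def segment(name: str):
    """Split a file or folder name on symbols, then on case changes."""
    tokens = []
    for chunk in SYMBOLS.split(name):
        tokens.extend(t for t in CASE_BOUNDARY.split(chunk) if t)
    return tokens
```

For example, a name such as "WriteSystem32" is split into "Write" and "System32", and "botnet_notes-v2.txt" into "botnet", "notes", "v2", and "txt". Splitting on zero-width matches requires Python 3.7 or later.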
Common Words Removal. After segmentation, the system obtains many repeated strings. Most of them are commonly used words and do not help in identifying ransomware attackers. As a result, we filter out the common words by comparing each string with a dictionary. If the string matches a dictionary entry, the system marks the common word property of the string as False; otherwise, it marks it as True.
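The dictionary comparison amounts to a set lookup. In this sketch the dictionary is any iterable of words supplied by the caller (which word list the authors used is not stated), and `tag_common_words` is a hypothetical name; the tag follows the convention of this section, where False means "common word, not useful".

```python
def load_dictionary(words) -> set:
    """Normalize an iterable of dictionary words into a lookup set."""
    return {w.strip().lower() for w in words if w.strip()}


def tag_common_words(tokens, dictionary: set) -> dict:
    """Map each token to its common word tag.

    True  -> not in the dictionary, kept as a possible traceable clue.
    False -> ordinary dictionary word, filtered out.
    """
    return {tok: tok.lower() not in dictionary for tok in tokens}
```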
Gibberish Detect. Based on our analysis and observation, we found that there is a large amount of garbled information in path clues. Most of these strings are generated randomly. People can recognize this randomness manually, but it is difficult for a program to recognize it automatically. As a result, we propose detecting gibberish by training a Markov chain model on English texts. For each string X, the probability of the character that follows the i-th character in X is

$$P(X_{i+1} = x \mid X_1 = x_1, X_2 = x_2, \ldots, X_i = x_i) = P(X_{i+1} = x \mid X_i = x_i) \tag{2}$$
Each letter is related to the letters immediately before and after it. This relationship can be expressed by 2-grams. For example, the string "Ransomware" yields the 2-grams Ra, an, ns, ..., e[space]. The analysis system records how often characters appear next to each other in a collection of gibberish and of commonly used normal phrases and vocabulary. It normalizes the counts after reading through the training data and then measures the probability of generating a string based on this digest by multiplying the probabilities of the pairs of adjacent characters in the string [38]. From the training data, the average transition probability can be computed for gibberish and for normal strings, respectively. This probability then measures how likely the string is according to the data observed by the model. When we test the string "Ransom", it computes

$$\mathrm{prob}[\text{'Ransom'}] = \mathrm{prob}[\text{'R'}][\text{'a'}] \times \mathrm{prob}[\text{'a'}][\text{'n'}] \times \cdots \times \mathrm{prob}[\text{'m'}][\text{' '}] \tag{3}$$
If the input string is gibberish, it will pass through some pairs with very low counts in the training phase and hence have a low probability. The system then looks at the amount of probability per character for a few known normal strings and a few examples of known gibberish and picks a threshold between the highest probability among the gibberish examples and the lowest probability among the normal strings:

$$\mathit{threshold} = \frac{\min(\mathit{normal\_probs}) + \max(\mathit{gibberish\_probs})}{2} \tag{4}$$
When we analyze the string 'X', if prob['X'] > threshold, the analysis system views the string as a normal string. If prob['X'] ≤ threshold, the analysis system marks the string as 'False'. The system then removes the strings of type 'False' as gibberish.
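The Markov chain detector of equations (2) to (4) can be sketched as follows, in the spirit of the detector cited as [38]. Several details are assumptions made for this example: the alphabet is limited to lowercase letters and the space, counts are smoothed with a constant (`smoothing`), and the score is the geometric mean of the transition probabilities (the "probability per character") rather than the raw product, so that strings of different length are comparable.

```python
import math

ACCEPTED = "abcdefghijklmnopqrstuvwxyz "


def bigrams(text: str):
    """Adjacent character pairs after dropping unsupported characters."""
    chars = [c for c in text.lower() if c in ACCEPTED]
    return list(zip(chars, chars[1:]))


def train(lines, smoothing: float = 1.0) -> dict:
    """Count 2-gram transitions and normalize each row to log-probabilities."""
    counts = {a: {b: smoothing for b in ACCEPTED} for a in ACCEPTED}
    for line in lines:
        for a, b in bigrams(line):
            counts[a][b] += 1
    log_prob = {}
    for a, row in counts.items():
        total = sum(row.values())
        log_prob[a] = {b: math.log(n / total) for b, n in row.items()}
    return log_prob


def avg_transition_prob(text: str, log_prob: dict) -> float:
    """Geometric mean of the transition probabilities in `text`."""
    pairs = bigrams(text)
    if not pairs:
        return 0.0
    total = sum(log_prob[a][b] for a, b in pairs)
    return math.exp(total / len(pairs))


def pick_threshold(normal, gibberish, log_prob: dict) -> float:
    """Equation (4): midpoint between the two groups of scores."""
    normal_probs = [avg_transition_prob(s, log_prob) for s in normal]
    gibberish_probs = [avg_transition_prob(s, log_prob) for s in gibberish]
    return (min(normal_probs) + max(gibberish_probs)) / 2


def is_normal(text: str, log_prob: dict, threshold: float) -> bool:
    """True if the string scores above the threshold (not gibberish)."""
    return avg_transition_prob(text, log_prob) > threshold
```

In use, `train` would be given a large English corpus, and `pick_threshold` a handful of known normal strings and known gibberish; `is_normal` then supplies the gibberish tag for each candidate string.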
Table 1: Traceable strings recognized result [13].

String       Common Words Recognized   Gibberish Recognized   Final Recognized
Wh****ter    True                      True                   True
gandcrab     True                      True                   True
kate z       True                      True                   True
vwrtjty      True                      False                  False
program      False                     True                   False
Identification Result. After the above analysis, the automated analysis system outputs the strings for which both the common word tag and the gibberish tag are True, which form the list of traceable clues [13]. Table 1 is reproduced from Z. H. Wang et al. (2018) [under the Creative Commons Attribution License/public domain].
When the string is a normal word such as "program", common words recognition marks it "False". Gibberish recognition marks as True only identifier strings that were not randomly generated; strings that do not conform to the spelling patterns of common words (e.g., vwrtjty) are not recognized. A string is considered to have auxiliary traceability only if both analysis values are True. We rely extensively on string inversion to verify the accuracy of the system. It is found that most of the traceable strings on the volunteers' storage can be recognized. Table 1 shows the recognition results of several typical strings.
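The final decision is a logical AND of the two tags. The sketch below takes the two checks as caller-supplied predicates (hypothetical parameters `is_common_word` and `is_gibberish`), so it does not depend on any particular dictionary or model.

```python
def final_tags(tokens, is_common_word, is_gibberish):
    """Return (token, common_tag, gibberish_tag, final_tag) rows.

    Following Table 1, a tag is True when the token passes that check,
    i.e. it is NOT a common word / NOT gibberish.
    """
    rows = []
    for tok in tokens:
        common_tag = not is_common_word(tok)
        gibberish_tag = not is_gibberish(tok)
        rows.append((tok, common_tag, gibberish_tag, common_tag and gibberish_tag))
    return rows
```

With stand-in predicates that flag "program" as a common word and "vwrtjty" as gibberish, the rows for "gandcrab", "vwrtjty", and "program" match the corresponding rows of Table 1.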
6. Evaluation

In this paper, we evaluate the proposed method with two experiments. The goal of the first experiment is to demonstrate that the proposed method can help trace back to RDP-based ransomware attackers and capture their private information. The goal of the second experiment is to demonstrate that the method can also automatically recognize clues in ransomware samples and help trace back to ransomware makers.

6.1. RDP-Based Ransomware Attacker

6.1.1. Clue Capture and Analysis. We used Windows 7 virtual machines to deploy the deception environment and assess its effectiveness. While Windows 7 is not required, it was chosen because of its wide range of applications and because it is one of the main targets of ransomware. We invited 122 professional volunteers to help with the experiment and provided them with an experimental fleet of 12 virtual hosts. Most of the volunteers' login information, clipboard content, shared folder paths, and uploaded PE files were successfully captured by the monitor system.
We chose the path information of 12 computers collected from volunteers and recorded the rate of convergence after each step of the analysis. Figure 5 shows how the number of remaining traceable clues changes as the data is processed by each step of the analysis system [13]. Figure 5 is reproduced from Z. H. Wang et al. (2018) [under the Creative Commons Attribution License/public domain].
Figure 5: The data volume changes in 12 volunteers' traceable clues through the analysis system (data quantity at each stage of the analysis process: source, stopword, tokenize, gibberish, result; one series per volunteer, volunteer1 to volunteer12). The screening rate can reach 98.34% [13].
6.2. Traceback Attackers. Next, we carried out a detailed analysis of the information left over from one of the volunteers' attacks. From the data produced by the automatic analysis of this attacker, we can infer that the attacker's real identity may have the following characteristics: (1) the attacker is likely to come from China; (2) the attacker may work in security or a related field; (3) the attacker may work at the Chinese Academy of Sciences and have relations with a large security manufacturer in China. The conclusion is shown in Figure 6, which presents the results from the analysis system in five sections.
Username. Through automatic analysis, a total of three user names from the attacker's host were extracted from a large number of clues. From the "Dell" user name, it can be assumed that the attacker was using a Dell host.
Programs. The automatic analysis system identified several typical software programs installed on the attacker's host. For instance, "QQ" is a widely used real-time social tool in China. "AliPay" is an online payment tool developed by Alibaba and widely used in China. "Sogou input" is a Chinese input method developed by the Chinese company Sogou. "MeiTu" is photo beautification software developed by a Chinese company and widely used in China. In addition, "Sinfor" and "360" are both well-known security manufacturers in China and have a large number of security-related products. "Visual Studio" is common development software. It is found that much Chinese-made software widely used in China was installed on the attacker's host, as well as some development and security products that are installed less often by ordinary users.
Figure 6: Analysis of an attacker using the automatic analysis system, displaying traceable information about the attacker in five aspects: host user name, program software, keyboard layout, account information, and traceable strings.
Keyboard Layout. Through network communication analysis, we find that the attacker's keyboard layout is Simplified Chinese, from which it can be inferred that the attacker's native language is probably Chinese. This is consistent with the software installed on the host.
Account Number. Among the account numbers identified automatically by the system, we found two QQ accounts and two e-mail accounts. The e-mail accounts are Chinese Academy of Sciences business accounts, which supports the above assumption that the attacker is from China and is likely to work at the Chinese Academy of Sciences. Through the registration information of the QQ account (as shown in Figure 7), we can find the following information: (1) the attacker could be a male aged 32; (2) the attacker does Internet-related work, likely with 360 Security; (3) the attacker is located in Haidian District, Beijing, China. In the registration information of the "353***208" account, the mail, work, and other fields are empty and the nickname is a special English string, "Wh***ter"; it can be presumed that the account is likely for private use.
Traceable Strings. The automatic analysis system sifts out strings from the monitor that may be traceable. "Freebuf" is a security information exchange website commonly used by Chinese security personnel. "Botnet" is a secondary attack method that is often used by attackers and is often a main research direction for security researchers. "**e" and "**t" are acronyms for departments under the Chinese Academy of Sciences, while "A***team" is a security research team in the "**e" department. "Visumantrag" is often used when processing visas from various countries. "Wh***ter" matches the nickname of the QQ account "353***208" and is likely to be its regular nickname.
Figure 7: The registration infographic of QQ account 372***82.
6.3. Ransomware Maker Automatic Analysis. In order to verify that the analysis system can be used to trace back the ransomware maker, we obtained more than 8000 ransomware samples from VirusTotal [39] and extracted the Program Data Base (PDB) path from the samples (as shown in Figure 8). This shows that, although a large number of attackers deliberately erase compilation information, residual PDB path information can still be found in a large number of ransomware samples.
Figure 8: Of the 8075 real ransomware samples, the Program Data Base (PDB) path could be extracted from 11% (878 samples), while the authors of the remaining 89% (7197 samples) may have chosen to hide the PDB path during compilation.
Table 2: Same identifier for different samples [13].
Sample Hash    Same Identity Strings
68dd613973f8a***7befe0f4b461d258ce569b9020079***9ae0f090266a60cc
jqxxpphoaonkje7z2fa0c084aef41969***1f427a1b65e1b1464fa82d297e712***e1fdf312c044814a36da0286d42cd***306fba99239c8238821beaf40744be***ba862dbc6d90f70cbc839bcb88***7422bbd2fa812dd3de02391b6a9f1***0762a6f49cf32501
What is more, the analysis system is used to automatically analyze the PDB paths of more than 800 different ransomware samples. We found multiple identity strings in the analysis results of different samples. One of the typical results is shown in Table 2 [13]. Table 2 is reproduced from Z. H. Wang et al. (2018) [under the Creative Commons Attribution License/public domain]. We used VirusTotal to validate the samples and found that they were all Cobra family ransomware. As a result, it could be assumed that these samples were made by the same ransomware maker. When one sample is traced back, the conclusion about the ransomware maker also applies to the other samples. When different traceable information is obtained from different samples, the identity of the ransomware maker can be described by integrating the information.
Language Identification. When we apply language identification to the PDB paths of the 878 samples, the result is as shown in Figure 9. The system automatically identifies the language of each path. The paths mainly contain 11 languages, of which English accounts for the largest proportion.
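The shares quoted in the Figure 9 caption can be recomputed from the per-language counts given there. The short check below uses only those counts and the sample totals stated in this section; the variable names are illustrative.

```python
# Per-language counts of PDB paths, as listed in the Figure 9 caption.
counts = {
    "English": 510, "Chinese": 31, "Dutch": 40, "Norwegian Nynorsk": 4,
    "Slovenian": 5, "Norwegian Bokmal": 1, "German": 58, "Italian": 3,
    "Hungarian": 35, "Maltese": 18, "Polish": 5, "Finnish": 151,
    "Luxembourgish": 3, "Danish": 10, "Spanish": 4,
}

total = sum(counts.values())  # number of samples with a PDB path
shares = {lang: round(100 * n / total, 2) for lang, n in counts.items()}

# Share of all 8075 collected samples that still carry a PDB path.
pdb_share = round(100 * total / 8075, 2)
```

The counts sum to 878, English comes out at 58.09%, Finnish at 17.2%, and the PDB share at 10.87%, which the text rounds to 11%.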
Figure 9: Of the 878 ransomware sample PDB paths, English accounted for 510 (58.09%), and the remaining identified languages were Chinese 31 (3.53%), Dutch 40 (4.56%), Norwegian Nynorsk 4 (0.46%), Slovenian 5 (0.57%), Norwegian Bokmål 1 (0.11%), German 58 (6.61%), Italian 3 (0.34%), Hungarian 35 (3.99%), Maltese 18 (2.05%), Polish 5 (0.57%), Finnish 151 (17.20%), Luxembourgish 3 (0.34%), Danish 10 (1.14%), and Spanish 4 (0.46%).
Actual Case Analysis. The automatic identification system helped us identify the traceable strings in the PDB path. Take as an example a sample PDB path (the Chinese has been translated into English) with an MD5 value of "60c6a92afb***6d0c16f7":
"ELIAOSTUDIOUACsimulated an improved version of the 360 anti-virus program 1117 Bale1*41**2****0(**1) 9818 programWriteSystem32ReleaseVirtualDesktoppdb"
We can infer from the automatic analysis result that the ransomware maker is likely from China. This is supported by the following points.
(1) Language Identification. By identifying the language of the path, we can speculate that the attacker may have come from China. By reading the Chinese strings in the path, we find that the text describes an improvement to bypass detection by 360 security software, which has a large market in China.
(2) IP Address. The system identified an IP address in the PDB path. With the help of a threat intelligence database, it is found that the IP came from Xinjiang, China (Figure 10).
Figure 10: Partial infographic of IP 1*41**2****0 in the threat intelligence library.
(3) Traceable Strings. The extracted identifier string "LIAO" resembles Chinese pinyin and is most likely a Chinese surname. To sum up, we speculate that the ransomware maker is most likely from China.
7. Conclusion

In this paper, we focus on tracing back RDP-based ransomware. Recently, more and more ransomware attackers have been using RDP attacks to spread ransomware with impunity due to the use of strong cryptography. We propose tracing back the attack sources to deter RDP-based ransomware with the proposed deception environment. It collects traceable clues and performs automatic analysis by using natural language processing and machine learning techniques. The evaluation shows that it is able to trap attackers and collect traceable clues left by attackers in a deception environment. With automatic clue identification, it can reduce the amount of traceable clues to about 2% of the original volume. We use this method to analyze two practical cases and show its effectiveness. By tracing back to ransomware attackers, we can provide a strong deterrent to stifle the development of ransomware.
Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Additional Points

This manuscript provides additional deception environment monitors and analytical methods and uses more volunteers and samples. It proves the validity of the methods through specific traceback cases (i.e., tracing back the ransomware attacker and tracing back the ransomware maker).

Disclosure

An earlier version of this paper was presented at the IEEE Third International Conference on Data Science in Cyberspace, Guangzhou, China, 18-21 June 2018. The authors' initial conference paper focused mainly on the effectiveness of deception environment monitors and the screening rate of the analysis system.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors would like to thank VirusTotal for providing ransomware samples and the volunteers from the University of Chinese Academy of Sciences. This work is supported in part by the National Key Research and Development Plan (Grant no. 2018YFB0803504) and the National Natural Science Foundation of China (61871140, 61572153, U1636215, 61572492, and 61672020).
References

[1] A. Kharraz, "UNVEIL: A Large-Scale, Automated Approach to Detecting Ransomware," in Proceedings of the 25th USENIX Security Symposium (USENIX Security 16), pp. 757–772, USENIX Association, 2016.
[2] K. Savage, P. Coogan, and H. Lau, "The evolution of ransomware," Tech. Rep., 2015.
[3] A. Kharraz, W. Robertson, D. Balzarotti, L. Bilge, and E. Kirda, "Cutting the Gordian Knot: A Look Under the Hood of Ransomware Attacks," in Detection of Intrusions and Malware, and Vulnerability Assessment, vol. 9148 of Lecture Notes in Computer Science, pp. 3–24, Springer International Publishing, Cham, 2015.
[4] S. Mohurle and M. Patil, "A brief study of Wannacry Threat: Ransomware Attack 2017," International Journal of Advanced Research in Computer Science, vol. 8, no. 5, 2017.
[5] X. Yu, Z. Tian, J. Qiu, and F. Jiang, "A Data Leakage Prevention Method Based on the Reduction of Confidential and Context Terms for Smart Mobile Devices," Wireless Communications and Mobile Computing, vol. 2018, Article ID 5823439, 11 pages, 2018.
[6] J. Yaneza, "Brute Force RDP Attacks Plant CRYSIS Ransomware," https://blog.trendmicro.com/trendlabs-security-intelligence/brute-force-rdp-attacks-plant-crysis-ransomware.
[7] "10% of Ransomware Attacks on SMBs Targeted IoT Devices," https://www.darkreading.com/application-security/10--of-ransomware-attacks-on-smbs-targeted-iot-devices-/d/d-id/1329817.
[8] Q. Tan, Y. Gao, J. Shi, X. Wang, B. Fang, and Z. H. Tian, "Towards a Comprehensive Insight into the Eclipse Attacks of Tor Hidden Services," IEEE Internet of Things Journal, 2018.
[9] "Kaspersky Security Bulletin: STORY OF THE YEAR 2017," 2017, https://media.kasperskycontenthub.com/wp-content/uploads/sites/43/2018/03/07164824/KSB_Story_of_the_Year_Ransomware_FINAL_eng.pdf.
[10] Z. Tian, Y. Cui, L. An et al., "A Real-Time Correlation of Host-Level Events in Cyber Range Service for Smart Campus," IEEE Access, vol. 6, pp. 35355–35364, 2018.
[11] R. Ross et al., Managing Information Security Risk: Organisation, Mission and Information System View, National Institute of Standards and Technology, 2011, http://csrc.nist.gov/publications/nistpubs/800-39/SP800-39-final.pdf.
[12] Y. Wang, Z. Tian, H. Zhang, S. Su, and W. Shi, "A Privacy Preserving Scheme for Nearest Neighbor Query," Sensors, vol. 18, no. 8, p. 2440, 2018.
[13] Z. H. Wang, X. Wu, C. G. Liu, Q. X. Liu, and J. L. Zhang, "RansomTracer: Exploiting Cyber Deception for Ransomware Tracing," in Proceedings of the IEEE Third International Conference on Data Science in Cyberspace, pp. 227–234, 2018.
[14] B. A. S. Al-rimy, M. A. Maarof, and S. Z. M. Shaid, "Ransomware threat success factors, taxonomy, and countermeasures: A survey and research directions," Computers & Security, vol. 74, pp. 144–166, 2018.
[15] F. Jiang, Y. Fu, B. B. Gupta et al., "Deep Learning based Multi-channel intelligent attack detection for Data Security," IEEE Transactions on Sustainable Computing, 2018.
[16] N. Andronio, S. Zanero, and F. Maggi, "HelDroid: dissecting and detecting mobile ransomware," in Research in Attacks, Intrusions, and Defenses, vol. 9404 of Lecture Notes in Computer Science, pp. 382–404, Springer, 2015.
[17] F. Mercaldo, V. Nardone, A. Santone, and C. A. Visaggio, "Ransomware Steals Your Phone. Formal Methods Rescue It," in Formal Techniques for Distributed Objects, Components, and Systems: 36th IFIP WG 6.1 International Conference, FORTE 2016, held as part of the 11th International Federated Conference on Distributed Computing Techniques, DisCoTec 2016, E. Albert and I. Lanese, Eds., pp. 212–221, Springer International Publishing, 2016.
[18] D. Sgandurra, L. Muñoz-González, R. Mohsen, and E. C. Lupu, "Automated Dynamic Analysis of Ransomware: Benefits, Limitations and use for Detection," https://arxiv.org/abs/1609.03020.
[19] S. Song, B. Kim, and S. Lee, "The Effective Ransomware Prevention Technique Using Process Monitoring on Android Platform," Mobile Information Systems, vol. 2016, Article ID 2946735, 9 pages, 2016.
[20] T. Yang, Y. Yang, K. Qian, D. C.-T. Lo, Y. Qian, and L. Tao, "Automated detection and analysis for android ransomware," in Proceedings of the 17th IEEE International Conference on High Performance Computing and Communications, IEEE 7th International Symposium on Cyberspace Safety and Security, and IEEE 12th International Conference on Embedded Software and Systems, HPCC-ICESS-CSS 2015, pp. 1338–1343, USA, August 2015.
[21] N. Scaife, H. Carter, P. Traynor, and K. R. B. Butler, "CryptoLock (and Drop It): Stopping Ransomware Attacks on User Data," in Proceedings of the 36th IEEE International Conference on Distributed Computing Systems, ICDCS 2016, pp. 303–312, Japan, June 2016.
[22] M. M. Ahmadian and H. R. Shahriari, "2entFOX: A framework for high survivable ransomwares detection," in Proceedings of the 13th International ISC Conference on Information Security and Cryptology, ISCISC 2016, pp. 79–84, Iran, September 2016.
[23] M. M. Ahmadian, H. R. Shahriari, and S. M. Ghaffarian, "Connection-monitor & connection-breaker: A novel approach for prevention and detection of high survivable ransomwares," in Proceedings of the 12th International ISC Conference on Information Security and Cryptology, ISCISC 2015, pp. 79–84, Iran, September 2015.
[24] D. Kim, W. Soh, and S. Kim, "Design of Quantification Model for Prevent of Cryptolocker," Indian Journal of Science and Technology, vol. 8, no. 19, 2015.
[25] C. Moore, "Detecting Ransomware with Honeypot Techniques," in Proceedings of the 2016 Cybersecurity and Cyberforensics Conference (CCC), pp. 77–81, Amman, Jordan, August 2016.
[26] F. Mbol, J. Robert, and A. Sadighian, "An Efficient Approach to Detect TorrentLocker Ransomware in Computer Systems," in Cryptology and Network Security, S. Foresti and G. Persiano, Eds., vol. 10052 of Lecture Notes in Computer Science, pp. 532–541, Springer International Publishing, Cham, 2016.
[27] K. Cabaj, P. Gawkowski, K. Grochowski, and D. Osojca, "Network activity analysis of CryptoWall ransomware," Przegląd Elektrotechniczny, vol. 91, no. 11, pp. 201–204, 2015.
[28] J. Chen, Z. Tian, X. Cui, L. Yin, and X. Wang, "Trust Architecture and Reputation Evaluation for Internet of Things," Journal of Ambient Intelligence & Humanized Computing, vol. 2, pp. 1–9, 2018.
[29] C. Le Guernic and A. Legay, "Ransomware and the Legacy Crypto API," in Risks and Security of Internet and Systems: 11th International Conference, CRiSIS 2016, Roscoff, France, 2017.
[30] L. Pingree, Emerging Technology Analysis: Deception Techniques and Technologies Create Security Technology Business Opportunities, Gartner, 2015.
[31] "Ransomware and Businesses," https://www.symantec.com/content/dam/symantec/docs/security-center/white-papers/ransomware-and-businesses-16-en.pdf, 2016.
[32] A. Kharraz and E. Kirda, "Redemption: Real-Time Protection Against Ransomware at End-Hosts," in Proceedings of the International Symposium on Research in Attacks, Intrusions, and Defenses, pp. 98–119, 2017.
[33] F. Bergstrand, J. Bergstrand, and H. Gunnarsson, Localization of Spyware in Windows Environments.
[34] "PEView," https://www.aldeid.com/wiki/PEView.
[35] "python-pefile," https://pypi.python.org/pypi/pefile.
[36] M. Lui and T. Baldwin, "langid.py: An off-the-shelf language identification tool," in Proceedings of the ACL 2012 System Demonstrations, Association for Computational Linguistics, pp. 25–30, 2012.
[37] S. Bird and E. Loper, "NLTK: the natural language toolkit," in Proceedings of the ACL 2004 on Interactive Poster and Demonstration Sessions, Association for Computational Linguistics, Barcelona, Spain, July 2004.
[38] "Rrenaud Gibberish-Detector," https://github.com/rrenaud/Gibberish-Detector/blob/master/README.rst.
[39] "VirusTotal," http://virustotal.com.
Figure 1: Data collection and analysis process of the whole prototype.
use different ways to trick a victim into launching such programs. Among these dissemination methods, phishing email is the most widely used. However, according to Kaspersky's 2017 ransomware report, the number of targeted ransomware attacks based on RDP is growing rapidly.
Recently, more and more ransomware criminals have spread ransomware by breaking into RDP services and then installing the ransomware manually. These attackers use a brute-force method to acquire usernames and passwords on a target machine with an active RDP service [9]. For instance, one of the typical families, Crysis, a copycat of Locky, not only aims at common businesses but also targets healthcare service providers [9]. Crysis gains access to admin-level privileges by stealing passwords and credentials. In addition, during an RDP session, the attacker uses both the clipboard and shared folders to upload files to a remote host. The attacker can then install the ransomware manually.
2.3 Cyber Deception Technology
Because many critical systems are well known and always on, it is difficult to protect them from potential network attacks:
(i) Attackers can use zero-day vulnerabilities, highly antagonistic malicious code, or other resources to break the defense system.
(ii) Because humans are always the weakest link in defense systems, attackers can use social engineering to identify system weaknesses and penetrate the defense.
(iii) Attackers can repeatedly explore the potential vulnerabilities on a target system to identify its weaknesses.
However, when an attacker aims at a specific target, e.g., by exploiting its RDP service, traditional passive defense methods are usually less effective. Therefore, we need advanced active solutions, such as cyber deception, to deal with attacks that have few observable features.
The earliest form of cyber deception technology is the honeypot. A honeypot detects attacks by deploying, in the service network, a series of systems or resources that carry no real business. Any access to such a trap indicates an attack. A honeypot system generally waits for attacks passively and does not mislead or confuse attackers. Moreover, because the honeypot system carries no real business and is not highly interactive, attackers may identify it easily. Compared with the traditional honeypot system, a cyber deception system can be deployed more conveniently, its environment is more realistic, and it can be linked with existing defense products. It can provide more effective solutions for defending against APT attacks, ransomware attacks, intranet attacks, and other threats. A Gartner report in 2015 [30] pointed out the market prospect of deception-based security defense technology and predicted that 10% of organizations would use deception tools (or tactics) to counter cyber-attacks in 2018. Compared with the traditional passive defense approach, cyber deception technology is an active defense approach that can be applied to all stages of a network attack. We can use this technology to trap the RDP-based attacker, detect targeted attacks, and deter ransomware attackers by precisely identifying them.
Trap RDP-Based Ransomware Attackers. A targeted ransomware attack generally has three steps: detection, infiltration, and execution [31]. However, traditional security solutions are unable to cope with the internal translation phase. In addition, traditional honeypot technology (often used to fight network attacks) generally does not focus on tracing back to attackers. Cyber deception technology, by contrast, can lure the attacker into a surveillance environment and consume his time and energy with bait information.
Detect RDP-Based Ransomware Attacks. Once the attacker obtains the correct username and password combination, he usually returns multiple times within a short period to try to infect the compromised host [6]. In one particular case, Crysis was deployed six times on an endpoint within a span of 10 minutes. As a result, by monitoring the cyber deception environment, we can detect RDP-based ransomware attacks in time and determine the attacker's behavior through the environment monitor.
Deter the Ransomware Attacker. Deterring ransomware attackers can be approached in two ways. First, if an attacker realizes he is entrapped, that realization is itself a deterrent. Second, if the attacker is exposed to the deception environment and remains within view of the defense surveillance, the monitor can collect traceable clues that the attacker releases accidentally (e.g., IP address, path, nickname strings). Exposing the clues that attackers try to hide can be a powerful deterrent to other attackers.
3 Methodology
In this section, we describe our method of tracing back RDP-based ransomware attackers. Figure 1 summarizes the data collection and analysis process of the entire prototype. First, we implement a deception environment to trap attackers. Second, we monitor RDP-based ransomware attacks and collect information when they occur. Third, we extract effective clues from the monitor information. Fourth, we use automatic analysis to screen a large number of clues for tracing back the attacker. Finally, we generate a report to trace back the RDP-based ransomware attacker. We refer readers to Sections 4 and 5 for the detailed implementation of this prototype.
3.1 Deception Environment
Generally, the ransomware attack execution stage has two steps: login and spread [31]. Building a deception environment is nontrivial in practice because it must make the ransomware attacker believe that it belongs to a real user and that the user data is worth attacking. Because advanced attackers always exploit static features based on certain analysis systems before they launch attacks [32], an intuitive approach to address such reconnaissance is to build the user environment in such a way that the user data is valid, real, and nondeterministic. In addition, the environment serves as an "enticing target" to encourage ransomware attackers. We elaborate on how to generate an artificial, realistic, and enticing user environment for RDP-based ransomware in Section 4.
RDP-based attackers commonly upload malicious programs in the following ways before spreading ransomware: (1) the attacker downloads malicious programs from the Internet; (2) programs are transferred through FTP, SCP, or other transport protocols; (3) programs are uploaded through the clipboard; (4) programs are uploaded through a shared folder. The clipboard and shared folders are most commonly used to transfer programs in RDP-based ransomware attacks because they are simple and convenient. However, both are easy for our proposed system to monitor.
3.2 Environment Monitor
After the attacker logs in to the environment, a shared folder and the clipboard on the remote PC are commonly used to transfer ransomware programs from the attacker's machine. To collect more information about the attacker without being observed, this paper proposes three monitor layers: the network layer, the host layer, and the file layer. We elaborate on how to configure the monitor system for the deception environment in Section 4.
The Network Layer Monitor. The network layer monitor detects a remote connection and collects information including the remote IP addresses, remote ports, status codes of ports, keyboard layout, and so on. When the RDP-based attacker logs in to the host, the monitor can obtain this information and detect the attack without the attacker's knowledge.
The Host Layer Monitor. We propose to detect changes in, for example, processes and the clipboard by monitoring the host layer. The host layer monitor can gather information about the attacker's behavior and use of system applications in the deception environment. For instance, as the clipboard resides in the system-level heap space, any application in the system has access to it. RDP-based ransomware attackers commonly take advantage of the clipboard to exchange data between applications. Moreover, the monitor may obtain clues left by attackers who use the clipboard locally, as the Windows system shares the clipboard by default during the RDP session.
The File Layer Monitor. By monitoring the file layer, we can identify ransomware attacks through file changes. Furthermore, the monitor can gather local traceable information by monitoring files in the shared folder. For instance, a shared folder on the remote PC is commonly used to transfer ransomware from the attacker's machine during the RDP session. In addition, for a more convenient and quicker attack, the attacker often mounts the entire local disk to the remote computer. As a result, by monitoring shared folders, we can detect newly added shared folders in real time and capture a large amount of path information automatically.
3.3 Clue Extraction
The environment monitors can gather a lot of information left by attackers, such as login information, communication information, clipboard content, folder paths, and portable executable (PE) files. Many traceable clues can be extracted from this information, including but not restricted to IP address, keyboard layout, compile path, and file path. In order to analyze these clues quickly, we divide them into two categories: string clues and path clues. These clues are then submitted to the automatic analysis system. We elaborate on the types of clues that the proposed system can extract in Section 5.1.
3.4 Automatic Analysis
According to our investigation, current traceback tools mostly analyze clues manually. However, we usually have to deal with a large number of clues with no semantic correlation. Because such manual traceback analysis usually takes a lot of time and effort, we propose an automatic analysis system; we elaborate on how to analyze clues automatically in Section 5.2.
4 Implementation of the Deception Environment
As the Windows platform is the main target of ransomware, we choose Windows for the proof-of-concept implementation. In this section, we describe the implementation details of a Windows-based deception environment prototype. We elaborate on how the deception environment traps ransomware attackers and how the monitor detects the RDP-based attack and collects traceable clues. The entire system implementation process is shown in Figure 2.
4.1 At the Network Layer
Figure 2: RDP-based ransomware attack traceback system process. [Diagram: (1) construct deception environment; (2) environment monitor, with ransomware detection (network layer: login monitor and communication monitor; host layer: clipboard monitor and process monitor; file layer: shared folder monitor and file monitor); (3) clue extraction (login clues, remote host clues, clipboard clues, compile clues, path clues); (4) automated analysis (users and programs analysis, network address analysis, language identification, traceable strings identification); (5) result (auxiliary traceable clues).]
4.1.1 The Login Monitor
The login monitor is used to detect attacks in real time and collect the attacker's login information. On the Windows platform, Win32 is an environment subsystem that provides an API for operating system services and functions to control all user inputs and outputs. The login monitor relies on Windows APIs to gain access to the system and runs with privileges to access its own areas of memory. It uses Winsock 2.0 to get access to networks and can use protocols other than the TCP/IP suite. The login monitor takes network requests and sends those requests to the Winsock 2.0 SPI (Service Provider Interface) by calling the main Winsock 2.0 file, Ws2_32.dll, which provides access to transport service providers and namespace providers. The IP Helper API makes it possible to get and modify network configuration settings for the local host. It consists of the DLL file iphlpapi.dll and includes functions that can retrieve information about protocols such as TCP and UDP [33]. As a result, the login monitor can directly access the data buffers involved in transmission control protocols, routing tables, network interfaces, and network protocol statistics.
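As a rough illustration of the login monitor's detection step, the following Python sketch lists remote endpoints connected to the local RDP port. It is not the Winsock/IP Helper implementation described above: as a stand-in, it parses text in the format printed by `netstat -an`, and the port constant, function name, and sample addresses are assumptions made for this example.

```python
import re

RDP_PORT = 3389  # default RDP listening port (assumed unchanged on the host)

# Matches lines such as:
#   TCP    10.0.0.5:3389    203.0.113.7:51544    ESTABLISHED
_LINE = re.compile(
    r"^\s*TCP\s+(?P<local>\S+):(?P<lport>\d+)\s+"
    r"(?P<remote>\S+):(?P<rport>\d+)\s+(?P<state>\S+)\s*$"
)

def rdp_sessions(netstat_output, port=RDP_PORT):
    """Return (remote_ip, remote_port, state) for connections to the local RDP port."""
    sessions = []
    for line in netstat_output.splitlines():
        m = _LINE.match(line)
        # Skip the listening socket itself; keep every real peer connection.
        if m and int(m["lport"]) == port and m["state"] != "LISTENING":
            sessions.append((m["remote"], int(m["rport"]), m["state"]))
    return sessions

# Hypothetical sample output; addresses come from documentation ranges.
sample = """
  TCP    0.0.0.0:3389       0.0.0.0:0            LISTENING
  TCP    10.0.0.5:3389      203.0.113.7:51544    ESTABLISHED
  TCP    10.0.0.5:49712     198.51.100.9:443     TIME_WAIT
"""
print(rdp_sessions(sample))
```

A production monitor would query the TCP table directly through the IP Helper API rather than parse command output; the parsing approach is used here only to keep the sketch self-contained.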
4.1.2 The Communication Monitor
The communication monitor is responsible for capturing network traffic. By locating the TCP packets in the network traffic that contain the interactive configuration information of an RDP login, the RDP connection information can be obtained as the attacker's personal information. The main packet characteristics are as follows: (1) the packet's name is often ClientData; (2) the packet usually follows the TCP three-way handshake; (3) the packet appears near the front of the capture; (4) the amount of data it carries is significantly larger.
4.2 At the Host Layer
4.2.1 The Deception Host
According to our observation, the main methods used by attackers to log in to a remote host are (1) direct login with a weak password and (2) login through an access account added by exploiting a vulnerability. As a result, we deliberately protect the administrator account of the deception environment with a weak password and leave common vulnerabilities, such as EternalBlue, in the environment to attract attackers. To guide attackers to upload ransomware using only the clipboard and shared folders, we block external traffic transfers from outside the environment and close common transfer ports (e.g., ports 20, 21, 80, and 443).
4.2.2 The Clipboard Monitor
The clipboard monitor can obtain clues in real time by monitoring the clipboard's changes. It uses the Clipboard Viewer to listen to message changes in the clipboard without affecting its contents. The Clipboard Viewer is a mechanism that can get and display the contents of the clipboard. As Windows applications are message-driven, the key to the monitor is responding to and processing clipboard change messages. When the content changes, the WM_DRAWCLIPBOARD message is sent to the first window of the Clipboard Viewer chain. After each Clipboard Viewer window responds to and processes the message, it must pass the message to the next window according to the handle of the next window saved in its linked list. The clipboard monitor can obtain the clipboard's new contents by calling the "GetClipboardData" Windows API through the window. When a subsequent copy or cut operation is executed, the data in the clipboard is rewritten. As a result, the clipboard monitor listens in real time and writes real-time information to the log file. The log file is updated whenever the clipboard monitor receives a clipboard change notice. When the log file is updated, to prevent it from being detected by the attacker or encrypted by ransomware, the monitor sends it to a secure host and completely erases it from the environment.
4.2.3 The Process Monitor
To run a program on a Windows system, a new process must be created. The monitor obtains the PE file run by the attacker by monitoring the processes in the environment. It first records the state of the processes commonly running in the deception environment before an RDP-based ransomware attack. After the login monitor detects such an attack, the process monitor watches for newly created processes in the environment: through the Windows API "CreateToolhelp32Snapshot", it takes status snapshots of all processes in real time, which include the process identifier (PID). When a suspicious process has started, the monitor recognizes it by the PID and looks up the process's running path with the help of "GetModuleFileNameEx". Finally, the monitor finds the suspicious program's PE file through the path and copies it to the secure host.
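The baseline-and-snapshot comparison can be sketched independently of the Windows API. In the Python sketch below, each snapshot is assumed to be a dictionary mapping a PID to the image path of the process; on Windows, such a mapping would be filled from "CreateToolhelp32Snapshot" and "GetModuleFileNameEx". The function name and sample data are hypothetical.

```python
def new_processes(baseline, snapshot):
    """Return {pid: image_path} for processes that were not in the baseline.

    A PID that now maps to a different image path is also reported,
    because the operating system may reuse PIDs after a process exits.
    """
    return {
        pid: path
        for pid, path in snapshot.items()
        if baseline.get(pid) != path
    }

# Hypothetical snapshots taken before and after a detected RDP login.
baseline = {4: "System", 812: r"C:\Windows\explorer.exe"}
snapshot = {
    4: "System",
    812: r"C:\Windows\explorer.exe",
    3380: r"\\tsclient\C\tools\payload.exe",
}
suspicious = new_processes(baseline, snapshot)
print(suspicious)
```

Each reported path would then be handed to the step that copies the PE file to the secure host.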
4.3 At the File Layer
4.3.1 Deception Files
Deception files are constructed with two goals. First, we need to make the attacker believe that the deception environment is a real user's host. Second, we need to make the attacker believe that there are resources in the environment that are worth attacking.
To simulate a realistic environment, we deploy large numbers of different types of files on it, for example, images, audio files, database files, and documents that can be accessed in a Windows session. Based on Amin Kharraz's research [1], we created four file categories that ransomware always tries to find and encrypt: documents (*.txt, *.doc(x), *.ppt(x), *.xls(x), *.pdf, and *.py), keys and licenses (*.key, *.pem, *.crt, and *.cer), file archives (*.zip, *.rar), and media (*.jp(e)g, *.mp3, and *.avi). We obtain these files in three ways. First, we create files with valid headers and content using standard libraries (e.g., python-docx, python-pptx, pdfkit, and OpenSSL). Second, using Google search syntax and crawler technology, we download a large number of files from the Internet. Third, we collect a number of nonconfidential documents from the hosts of 20 volunteers to emulate actual user environments. When we assign user files to the deception environment, the path length is generated randomly. Each folder may have a random set of subfolders. For each folder, a subset of extensions is randomly selected. Furthermore, each directory name is generated from meaningful words. Consequently, we generate paths and extensions for user files, giving them variable file depth and meaningful content.
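The random layout rules above (random depth, random subfolders, a random subset of extensions per folder, directory names from meaningful words) can be illustrated with a minimal Python sketch. The word list, the limits, and the placeholder file body are assumptions made for this example; the prototype fills files with valid content produced by the libraries and sources listed above.

```python
import os
import random
import tempfile

# Hypothetical vocabulary and extension pool for the illustration.
WORDS = ["finance", "reports", "projects", "backup", "contracts", "photos", "invoices"]
EXTENSIONS = [".txt", ".docx", ".pptx", ".xlsx", ".pdf", ".key", ".pem", ".zip", ".jpg", ".mp3"]

def build_tree(root, rng, max_depth=3, max_subdirs=3, max_files=4):
    """Create a random decoy folder tree under `root`; return the created file paths."""
    created = []
    # Each folder gets its own randomly selected subset of extensions.
    exts = rng.sample(EXTENSIONS, rng.randint(1, 4))
    for i in range(rng.randint(1, max_files)):
        name = f"{rng.choice(WORDS)}_{i}{rng.choice(exts)}"
        path = os.path.join(root, name)
        with open(path, "wb") as fh:
            fh.write(b"placeholder content\n")  # stand-in for a valid file body
        created.append(path)
    if max_depth > 0:
        # Each folder may have a random set of subfolders with meaningful names.
        for sub in rng.sample(WORDS, rng.randint(0, max_subdirs)):
            subdir = os.path.join(root, sub)
            os.makedirs(subdir, exist_ok=True)
            created += build_tree(subdir, rng, max_depth - 1, max_subdirs, max_files)
    return created

with tempfile.TemporaryDirectory() as tmp:
    files = build_tree(tmp, random.Random(7))
    print(len(files), "decoy files created")
```

Passing a seeded `random.Random` instance makes a layout reproducible for testing, while an unseeded instance gives each deception host a different tree.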
To make the simulated environment more valuable, we deploy bait information on it, such as databases, false code comments, digital certificates, administrator passwords, SSH keys, VPN keys, browser history passwords, ARP records, and DNS records. When an attacker obtains the bait information, it may entice the attacker to attack the deception environment.
4.3.2 The File Monitor
The file monitor can detect ransomware by monitoring file type changes and file entropy changes, a method proposed in 2016 [21]. The type of data stored in a file can be described by the order and position of specific byte values unique to a file type. Since files generally retain their file types and formatting in the deception environment, the bulk modification of such files should be considered suspicious. When the monitor sees this type of change, we can infer that a ransomware attack has occurred.
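A file type check of this kind can be sketched by comparing the leading "magic" bytes of a file before and after a modification. The Python sketch below is a simplified stand-in for the type detection used in [21]; the signature table covers only a few well-known formats, and the function names are hypothetical.

```python
# Leading byte signatures of a few common formats.
MAGIC = {
    b"%PDF-": "pdf",
    b"PK\x03\x04": "zip",          # also the container of docx/pptx/xlsx
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
}

def sniff_type(data):
    """Return a coarse file type from the leading bytes, or 'unknown'."""
    for magic, name in MAGIC.items():
        if data.startswith(magic):
            return name
    return "unknown"

def type_changed(before, after):
    """True when a file with a recognized type no longer has that type."""
    old = sniff_type(before)
    return old != "unknown" and old != sniff_type(after)

original = b"%PDF-1.4\n%fake body"
encrypted = bytes([0x9C, 0x11, 0xE7, 0x5A]) + b"ciphertext"
print(type_changed(original, encrypted))
```

A single changed file is weak evidence; as stated above, it is the bulk modification of files that should be treated as suspicious.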
Entropy expresses the randomness of the characters in a string: the higher the entropy value, the stronger the randomness. The Shannon entropy of an array of bytes can be computed as the sum
$$e = \sum_{i=0}^{255} P_{B_i} \log_2 \frac{1}{P_{B_i}} \tag{1}$$
for $P_{B_i} = F_i / \mathit{totalbytes}$, where $F_i$ is the number of instances of byte value $i$ in the array. The entropy value is a number from 0 to 8; a value of 8 represents a byte array with a completely uniform distribution. Since each byte value occurs with basically the same probability in encrypted ciphertext, the entropy value of ciphertext will be close to the upper limit. Because ransomware always encrypts a large number of files, when we detect that a file changes into a high-entropy file within a short period of time and also changes its file type, we assume that the file is subject to a ransomware attack.
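Equation (1) translates directly into a few lines of Python. The sketch below is illustrative only; the function name is hypothetical, and the alarm threshold that a monitor would apply to the result is not specified in the text and is therefore left out.

```python
import math
import os
from collections import Counter

def shannon_entropy(data):
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0), as in (1)."""
    if not data:
        return 0.0
    total = len(data)
    # P_Bi = F_i / totalbytes; byte values with F_i = 0 contribute nothing.
    return sum((n / total) * math.log2(total / n) for n in Counter(data).values())

print(shannon_entropy(b"aaaaaaaa"))         # a single repeated byte value
print(shannon_entropy(bytes(range(256))))   # every byte value exactly once
print(shannon_entropy(os.urandom(4096)))    # random data, similar to ciphertext
```

The first two inputs hit the lower and upper limits of the range, while random bytes, like ciphertext, land close to the upper limit.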
4.3.3 The Shared Folder Monitor
By traversing the disk storage in real time, the shared folder monitor discovers updates of the shared folders as they occur. It obtains information about files that reside locally on the attacker's machine, which the attacker often does not notice, thus revealing some unexpected traceable clues. The monitor can access a list of paths to the attacker's shared folders.
As we originally observed, folders shared through Remote Desktop often have a path on the remote host with the prefix "\\tsclient". When the monitor traverses the storage to this prefix, it uses "FindFirstFile" to find the first file. It then uses "FindNextFile" to find the next file with the returned handle. When the resulting handle refers to a folder, it continues to traverse all files under that folder. Initially, the monitor tried to get the full file names and file contents by traversing the new shared folder. However, during actual experimentation, we found that as the number of files in storage grows, the monitor takes far more time and resources to get all the file contents than to get just the file paths. Full traversals are also more likely to alert the attacker. Therefore, the monitor only obtains the paths of the files that the attacker shares from its host, with the help of "GetFileName". Moreover, in order to prevent encryption by the ransomware, the mounted disk monitor directly transfers the acquired shared file path list to another secure host.
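The path-only traversal can be sketched in Python with `os.walk`, which lists directory entries without opening any file. This is an illustration under assumptions rather than the prototype's "FindFirstFile"/"FindNextFile" loop: the function name is hypothetical, and on the deception host the root would be a redirected drive such as `\\tsclient\C`, whereas the demonstration below uses a temporary directory.

```python
import os
import tempfile

def list_shared_paths(root):
    """Yield the path of every file under `root` without reading file contents."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.join(dirpath, name)

# Demonstration on a temporary directory standing in for a shared drive.
with tempfile.TemporaryDirectory() as share:
    os.makedirs(os.path.join(share, "Users", "Dell", "Desktop"))
    with open(os.path.join(share, "Users", "Dell", "Desktop", "note.txt"), "w") as fh:
        fh.write("demo")
    collected = sorted(list_shared_paths(share))
    print(len(collected), "path(s) collected")
```

Because only directory listings are requested, the cost grows with the number of entries rather than with the volume of file data, which matches the observation above.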
5 Clue Extraction and Analysis
Through the deception environment, we trap the ransomware attacker and collect a lot of information that may contain many traceable clues. However, such traceable clues are often not visually observable and are complex in nature. In addition, much of the collected information is not helpful in tracing back ransomware attackers. Therefore, to assist in the analysis of the monitor information and to extract the main effective traceable clues, in this section we describe how to extract clues and how to analyze them automatically after extraction.
5.1 Clue Extraction
We obtain several kinds of clues from the extraction, including remote login information (IP addresses), network traffic, clipboard contents (pictures and texts), shared folder information (path strings), and ransomware samples (compile time and compile paths). The shared folder path clues can be obtained directly from the monitor. However, clipboard clues, compile clues, and remote host clues are often not visually observable and are complex in nature. As a result, the extraction module mainly focuses on the extraction of remote host clues, clipboard string clues, and compilation clues.
Remote Host Clue Extraction. The IP address, port numbers, and folder path clues can be directly obtained from the login information and folder paths. TCP packets that carry the interactive configuration information in a network communication PCAP file are usually named "ClientData". We extract the client name field from the "ClientData" packet to obtain the attacker's hostname. In addition, the KeyboardLayout field indicates the default keyboard layout of the attacking host; e.g., the Chinese Simplified layout number is 0x0004, and the American English keyboard layout number is 0x0409. The remote user's idiomatic language (the mother tongue) can be inferred from the keyboard layout, which in turn hints at the attacker's nationality.
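To illustrate how the two fields might be read, the Python sketch below parses a Client Core Data block with `struct`. The field offsets (keyboard layout at byte 16, client name at bytes 24 to 55 as null-padded UTF-16LE) and the block type 0xC001 reflect our reading of the RDP connection sequence specification (MS-RDPBCGR) and should be verified against it; locating the block inside a captured packet is out of scope here, and the sample block, function name, and layout table are assumptions for this example.

```python
import struct

# A few Windows language identifiers; 0x0004 and 0x0409 are the values cited in the text.
KEYBOARD_LAYOUTS = {
    0x0004: "Chinese (Simplified)",
    0x0409: "English (United States)",
    0x0804: "Chinese (Simplified, PRC)",
    0x0419: "Russian",
    0x0412: "Korean",
}

def parse_client_core(block):
    """Return (client_name, keyboard_layout) from a Client Core Data block.

    Assumed layout (little-endian): type u16, length u16, version u32,
    desktopWidth u16, desktopHeight u16, colorDepth u16, SASSequence u16,
    keyboardLayout u32, clientBuild u32, clientName 32 bytes UTF-16LE.
    """
    block_type, _length = struct.unpack_from("<HH", block, 0)
    if block_type != 0xC001:
        raise ValueError("not a Client Core Data block")
    (keyboard_layout,) = struct.unpack_from("<I", block, 16)
    name = block[24:56].decode("utf-16-le").split("\x00", 1)[0]
    return name, keyboard_layout

# Hypothetical, truncated block built only for this demonstration.
sample_block = struct.pack(
    "<HHIHHHHII", 0xC001, 56, 0x00080004, 1024, 768, 0xCA01, 0xAA03, 0x0409, 2600
) + "DESKTOP-TEST".encode("utf-16-le").ljust(32, b"\x00")

name, layout = parse_client_core(sample_block)
print(name, hex(layout), KEYBOARD_LAYOUTS.get(layout, "unknown"))
```

The keyboard layout is only a hint: it reflects a host setting, which an attacker can change or which may belong to a compromised intermediate machine.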
Clipboard Clue Extraction. The main data formats available to the clipboard monitor are Windows Bitmap, GDI file, ANSI characters, Unicode characters, and WAV audio data. We mainly aim at extracting traceable clues of the character types. The system extracts character clues from the clipboard in their various formats by checking the "DataType" value of the GetClipboardData API.
Compilation Clue Extraction. For all Windows RDP-based ransomware samples that we examined, we empirically observed that the most commonly used format is the PE file, especially *.exe and *.dll. Some PE files carry compilation information, and this information does not change when the program is migrated. As a result, it is a good way to obtain information about the creator. A PE file mainly consists of five major components: DOS MZ header, DOS stub, PE header, section headers, and section content. Each component contains a great deal of information, but only a small part of it helps identify the creator, and some identifying information must be extracted from the content of each section. The clues in a PE file that can be extracted to trace back the attacker include file name, PE file type, compiler version, compilation path, compile time, last modified time, last open time, IP address, URL, domain name, language, string, wide character, and so on. We extract most of the clues with PEView [34]. However, since it cannot directly obtain the compilation path, we use the pefile [35] tool to extract paths by locating them in the PE structure.
Since the extracted clues come in different encoding formats, to facilitate observation and give subsequent analysis a unified form, the extraction system converts all obtained data to the UTF-8 encoding and saves it in an SQLite database. Before being submitted to the analysis system, the extracted clues are divided into two categories: string clues and path clues.
5.2 Automatic Analysis
At this point, the clues extracted by the system mainly include string clues and path clues. The number of string clues is large, and they are mixed with many unidentifiable strings. Because the number of path clues is also very large, with no semantic correlation, it is difficult to identify traceable clues manually. We therefore focus on how to automate the identification of traceable clues in path clues.
5.2.1 Users and Programs Analysis
In the analysis of the path clues, we first propose to obtain attacker clues by identifying context-related segments on the same path. For instance, each user has a separate user folder, which is located in the "Users" folder under drive C. As a result, the system can obtain the user name at the attacking host from the name of the folder under the "Users" folder (e.g., C:\Users\Dell). The "Program Files" folder usually contains the names of the software programs installed on the machine (e.g., C:\Program Files\Microsoft Visual Studio 11.0). Moreover, the QQ account number is always located in the "QQ\QQfile" folder (e.g., D:\QQ\QQfile\86******086\FileRecv).
In this way, the analysis system can quickly and accurately get host names, email accounts, program names, social software numbers, and other traceable clues that are carried in the attacker's file paths. However, such user information makes up only a small part of the overall clues and is acquired by chance. Therefore, we conduct further analysis on this basis.
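The context rules described above can be sketched with `pathlib.PureWindowsPath`, which splits Windows paths on any platform. The function name and the example paths are hypothetical; only the "Users" and "Program Files" rules are shown, and further rules (such as the QQ folder) would follow the same pattern.

```python
from pathlib import PureWindowsPath

def path_clues(path):
    """Extract user and program names from the segments that follow known folders."""
    parts = PureWindowsPath(path).parts
    clues = {}
    for i, part in enumerate(parts[:-1]):
        key = part.lower()
        if key == "users":
            clues["user"] = parts[i + 1]          # folder directly under "Users"
        elif key in ("program files", "program files (x86)"):
            clues["program"] = parts[i + 1]       # folder directly under "Program Files"
    return clues

print(path_clues(r"C:\Users\Dell\Desktop\notes.txt"))
print(path_clues(r"C:\Program Files\Microsoft Visual Studio 11.0\VC\bin"))
```

Because the segment after the marker folder is taken verbatim, the result still has to pass the later filtering steps; for example, default accounts such as "Public" carry no identifying value.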
5.2.2 Account Analysis
Through the analysis of the APT1 report and some other attribution reports, we find that the mapping between the attacker and a physical-world identity can be better obtained by analyzing the account information left by the attacker. This information includes but is not limited to the location of the IP address, the spelling and registration of the domain name, the IP address corresponding to a URL, and the domain name of a mailbox account. Because it is difficult to identify this information effectively in a large number of strings and path clues, the analysis system automatically identifies IP addresses, domain names, URLs, and mailbox accounts by regular expression matching. Then, with the help of threat intelligence and big data technology, more relevant clues are obtained.
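The regular expression matching step might look like the following Python sketch. The patterns are deliberately simple assumptions made for illustration, not the expressions used by the analysis system; in particular, the domain pattern also matches file names such as `setup.exe`, so its output needs further filtering.

```python
import re

_OCTET = r"(?:25[0-5]|2[0-4]\d|1?\d?\d)"
IPV4 = re.compile(rf"\b(?:{_OCTET}\.){{3}}{_OCTET}\b")
EMAIL = re.compile(r"\b[\w.+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+\b")
URL = re.compile(r"\bhttps?://[^\s\"'<>]+", re.IGNORECASE)
DOMAIN = re.compile(r"\b(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}\b")

def find_accounts(text):
    """Return all IPv4 addresses, mailbox accounts, URLs, and domain names in `text`."""
    return {
        "ip": IPV4.findall(text),
        "email": EMAIL.findall(text),
        "url": URL.findall(text),
        "domain": DOMAIN.findall(text),
    }

# Hypothetical clue string; the names and address use reserved example values.
clue = "contact admin@example.org via http://example.com/pay from 203.0.113.7"
print(find_accounts(clue))
```

Each hit would then be enriched with external data, for example by looking up the geolocation of an IP address or the registration record of a domain name.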
5.2.3 Language Identification
The language of the clues often helps to determine an attacker's idiomatic language. However, because of the large number of languages used in different countries and the high similarity of some languages, we use an automatic analysis system to identify the language of clues. We tested the accuracy of two language identification toolkits using entire path information in four different languages and found the "langid.py" toolkit to be more accurate overall than the "langdetect" toolkit. The comparison results are shown in Figure 3. langid.py is a language identification toolkit developed by Lui and Baldwin at the University of Melbourne [36]. It combines a naive Bayes classifier with cross-domain feature selection to provide domain-independent language identification.
5.2.4 Traceable Strings Identification
When traceable strings are needed, traditional string analysis methods usually use Named Entity Recognition (NER). However, the clues to be analyzed mainly include strings and paths. Strings before and after a path separator have few semantic correlations. Moreover, the strings between path separators and the remaining strings to be analyzed are mostly semantically unrelated because of their limited length. This paper therefore proposes an algorithm that can quickly and automatically analyze the traceable strings in strings and paths.
In order to filter out meaningful traceable clues related to the attacker's identity, the path clues and string clues are split into strings and then classified by common-word and gibberish detection in the following steps. Figure 4 shows the process of the automatic traceable clue identification system.
Figure 3: Accuracy of two language identification tools. [Bar chart over English, Chinese, Russian, and Korean with four series: langid toolkit correct recognition, langid toolkit incorrect recognition, langdetect toolkit correct recognition, langdetect toolkit incorrect recognition.] In tests using the same path clues of known language, the accuracy of the langid toolkit for English, Chinese, Russian, and Korean was 88%, 100%, 99%, and 99.0%, respectively, while the accuracy of the langdetect toolkit was 73%, 0.85%, 40%, and 4.60%, respectively.
Figure 4: Process of automatic identification of traceable clues. [Diagram: path clues pass through path split and string clues through sentence tokenize, word tokenize, and sign split; the resulting strings go through stop words, alphabetic case characteristics, common words detect, and gibberish detect to produce the traceable strings identification result.]
Make Stop Words. The system splits each path by the path delimiter. Path segments that are common to multiple computers have no identifying effect, so we take the file and folder name strings that are common to 20 normal user computers as stop words. The system then removes the stop words from the strings after each split.
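The stop word step can be sketched as follows in Python. The function names, the `min_hosts` parameter, and the two reference hosts in the demonstration are assumptions for this example (the text uses 20 normal user computers), and the sketch assumes absolute Windows paths so that the first part is always the drive or share anchor.

```python
from collections import Counter
from pathlib import PureWindowsPath

def build_stop_words(host_paths, min_hosts):
    """Path segments seen on at least `min_hosts` reference hosts become stop words."""
    seen = Counter()
    for paths in host_paths:  # one list of absolute paths per reference host
        segments = {
            seg.lower()
            for p in paths
            for seg in PureWindowsPath(p).parts[1:]  # skip the drive anchor
        }
        seen.update(segments)  # each host counts a segment at most once
    return {seg for seg, n in seen.items() if n >= min_hosts}

def strip_stop_words(path, stop_words):
    """Split a path by the delimiter and drop the segments that are stop words."""
    return [s for s in PureWindowsPath(path).parts[1:] if s.lower() not in stop_words]

# Two hypothetical reference hosts standing in for the volunteer machines.
reference = [
    [r"C:\Users\alice\Documents\a.txt"],
    [r"C:\Users\bob\Documents\b.txt"],
]
stop_words = build_stop_words(reference, min_hosts=2)
print(strip_stop_words(r"C:\Users\kate_z\Documents\ransom_note.txt", stop_words))
```

Counting each segment once per host, rather than once per path, keeps a single host with many similar paths from turning its own folder names into stop words.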
Segmentation. After removing the stop words, we separate sentences and words for each segment of the path with a tokenizer (e.g., NLTK NER) [37]. However, as the segments of a path are not semantically related, this segmentation is less effective. Based on our extensive study of path information, users often prefer to use symbols such as underscores to split file names. In addition, they also like to use capitalization in multiword strings to facilitate reading. As a result, the system uses the common separator symbols in file names and the alphabetic case characteristics to segment the strings again.
Common Words Removal. After segmentation, the system obtains many repeated strings. However, most of them are commonly used words and do not help in identifying ransomware attackers. As a result, we filter out the common words by comparing each string with a dictionary. If the string can be matched in a dictionary, the system marks the common word property of the string as false; otherwise, it marks it as true.
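A minimal Python sketch of the second segmentation pass, splitting first on separator symbols and then on alphabetic case boundaries, is shown below. The separator set and the function name are assumptions for this example, and the pattern handles ASCII letters and digits only, so names in other scripts would need an extended pattern.

```python
import re

_SEPARATORS = re.compile(r"[_\-.\s]+")
# An acronym run, a capitalized or lowercase word, or a run of digits.
_CASE_TOKENS = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")

def segment(name):
    """Split a file or folder name on symbols, then on alphabetic case changes."""
    tokens = []
    for chunk in _SEPARATORS.split(name):
        tokens += _CASE_TOKENS.findall(chunk)
    return tokens

print(segment("MyRansomNote_v2"))   # ['My', 'Ransom', 'Note', 'v', '2']
print(segment("HTTPServer-backup")) # ['HTTP', 'Server', 'backup']
```

The resulting tokens are what the common word and gibberish checks operate on.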
Gibberish Detect. Based on our analysis and observation, we found that there is a large amount of garbled information in path clues. Most of these strings are generated randomly; people can recognize this randomness manually, but it is difficult for a program to recognize it automatically. As a result, we propose detecting gibberish by training a Markov chain model with English texts. For each string $X$, the probability of the character at position $i+1$ depends only on the character at position $i$:
$$P(X_{i+1} = x \mid X_1 = x_1, X_2 = x_2, \ldots, X_i = x_i) = P(X_{i+1} = x \mid X_i = x_i) \tag{2}$$
Each letter is related to the letters immediately before and after it. This relationship can be expressed by 2-grams; for example, the string "Ransomware" yields the pairs Ra, an, ns, ..., e[space]. The analysis system records how often characters appear next to each other in a collection of gibberish and of commonly used normal phrases or vocabulary. It normalizes the counts after reading through the training data and then measures the probability of generating a string by multiplying the probabilities of the pairs of adjacent characters in the string [38]. From the training data, the system computes the average transition probability for gibberish and for normal strings, respectively. This probability measures how likely the string is according to the data observed by the model. When we test the string "Ransom", it computes
$$\mathit{prob}[\text{'Ransom'}] = \mathit{prob}[\text{'R'}][\text{'a'}] \times \mathit{prob}[\text{'a'}][\text{'n'}] \times \cdots \times \mathit{prob}[\text{'m'}][\text{' '}] \tag{3}$$
If the input string is gibberish, it passes through some pairs with very low counts in the training phase and hence has a low probability. The system then looks at the probability per character for a few known normal strings and a few examples of known gibberish, and picks a threshold midway between the highest probability among the gibberish examples and the lowest probability among the normal strings:
$$\mathit{threshold} = \frac{\min(\mathit{normal\_probs}) + \max(\mathit{gibberish\_probs})}{2} \tag{4}$$
When we analyze a string 'X', if $\mathit{prob}[\text{'X'}] > \mathit{threshold}$, the analysis system views the string as a normal string. If $\mathit{prob}[\text{'X'}] \le \mathit{threshold}$, the analysis system marks the string as 'False'. The system then removes the strings marked 'False' as gibberish.
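The training, scoring, and threshold steps of (2) to (4) can be sketched in Python as follows. This is a minimal sketch under assumptions, not the implementation of [38]: it uses add-one smoothing, scores a string by the geometric mean of its transition probabilities (the per-character probability mentioned above), and trains on a tiny repeated sentence only so that the example is self-contained. All names and sample strings are hypothetical.

```python
import math

ALPHABET = "abcdefghijklmnopqrstuvwxyz "

def normalize(text):
    """Lowercase the text and keep only letters and spaces."""
    return [c for c in text.lower() if c in ALPHABET]

def train(corpus):
    """Return log transition probabilities log P(b | a), with add-one smoothing."""
    counts = {a: {b: 1 for b in ALPHABET} for a in ALPHABET}
    chars = normalize(corpus)
    for a, b in zip(chars, chars[1:]):
        counts[a][b] += 1
    model = {}
    for a, row in counts.items():
        total = sum(row.values())
        model[a] = {b: math.log(n / total) for b, n in row.items()}
    return model

def avg_transition_prob(text, model):
    """Geometric mean of the transition probabilities along the string."""
    chars = normalize(text)
    pairs = list(zip(chars, chars[1:]))
    if not pairs:
        return 0.0
    return math.exp(sum(model[a][b] for a, b in pairs) / len(pairs))

def pick_threshold(normal, gibberish, model):
    """Midpoint between the weakest normal string and the strongest gibberish, as in (4)."""
    lowest_normal = min(avg_transition_prob(s, model) for s in normal)
    highest_gibberish = max(avg_transition_prob(s, model) for s in gibberish)
    return (lowest_normal + highest_gibberish) / 2

# Toy training text; a real model needs a large English corpus.
corpus = ("the attacker uploads a ransomware program to the remote host "
          "and encrypts user documents ") * 50
model = train(corpus)
threshold = pick_threshold(["ransom", "program"], ["vwrtjty", "zxqkvj"], model)

for s in ["ransom", "vwrtjty"]:
    print(s, avg_transition_prob(s, model) > threshold)
```

Averaging per transition rather than taking the raw product of (3) keeps long and short strings comparable, which is what allows a single threshold to be used for all strings.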
Table 1: Traceable strings recognized result [13].
String       Common Words Recognized   Gibberish Recognized   Final Recognized
Wh****ter    True                      True                   True
gandcrab     True                      True                   True
kate z       True                      True                   True
vwrtjty      True                      False                  False
program      False                     True                   False
Identification Result. After the above analysis, the automated analysis system outputs the strings for which both the common-word tag and the gibberish tag are True; these form the list of traceable clues [13]. Table 1 is reproduced from Z. H. Wang et al. (2018) [under the Creative Commons Attribution License/public domain].
When the string is an ordinary word such as "program", common-word recognition marks it "False". Gibberish recognition marks as True only identifier strings that were not randomly generated; it rejects strings that do not conform to the spelling patterns of common words (e.g., vwrtjty). A string is considered to have auxiliary traceability only if both analysis values are True. We rely extensively on string inversion to verify the accuracy of the system, and we find that most of the traceable strings on the volunteers' storage can be recognized. Table 1 shows the recognition results for several typical strings.
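The rule that a string is traceable only when both tags are True can be written down directly. The sketch below simply re-applies that rule to the unmasked rows of Table 1; the names `final_tag` and `TABLE_1` are hypothetical, and the two input tags are copied from the table rather than computed.

```python
def final_tag(common_word_tag, gibberish_tag):
    """A string is kept as a traceable clue only if both tags are True."""
    return common_word_tag and gibberish_tag


# (common-word tag, gibberish tag) for the unmasked rows of Table 1
TABLE_1 = {
    "gandcrab": (True, True),
    "kate z": (True, True),
    "vwrtjty": (True, False),   # rejected by gibberish recognition
    "program": (False, True),   # rejected as an ordinary word
}

traceable = [s for s, (common, gib) in TABLE_1.items() if final_tag(common, gib)]
```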
6 Evaluation
In this paper, we evaluate the proposed method with two experiments. The goal of the first experiment is to demonstrate that the proposed method can help trace back to the RDP-based ransomware attackers and capture their private information. The goal of the second experiment is to demonstrate that the method can also automatically recognize clues in ransomware samples and help trace back to ransomware makers.
6.1 RDP-Based Ransomware Attacker
6.1.1 Clue Capture and Analysis. We used Windows 7 virtual machines to deploy the deception environment and assess its effectiveness. Windows 7 is not required; it was chosen because of its wide range of applications and because it is one of the main targets of ransomware. We invited 122 professional volunteers to help with the experiment and provided them with an experimental fleet of 12 virtual hosts. For most volunteers, the monitor system successfully captured the login information, clipboard content, shared folder path, and uploaded PE file.
We selected the path information collected from 12 volunteers' computers and recorded the rate of convergence after each step of the analysis. Figure 5 shows how many traceable clues remain after each processing step of the analysis system [13]. Figure 5 is reproduced from Z. H. Wang et al. (2018) [under the Creative Commons Attribution License/public domain].
Figure 5: The data volume changes in 12 volunteers' traceable clues through the analysis system. The screening rate can reach 98.34% [13]. (Line chart "Analysis Data Change Figure"; x-axis: analysis process steps source, stopword, tokenize, gibberish, result; y-axis: data quantity, 0 to 50; one series per volunteer, volunteer1 to volunteer12.)
6.2 Traceback Attackers. Next, we carried out a detailed analysis of the information left by one of the volunteers' attacks. From the automatically analyzed data we can infer that the attacker's real identity may have the following characteristics: (1) the attacker is likely to come from China; (2) the attacker may work in security or a related field; (3) the attacker may work in the Chinese Academy of Sciences and have relations with a large security manufacturer in China. The conclusion is shown in Figure 6, which presents the results from the analysis system in five sections.
Username. Through automatic analysis, a total of three user names from the attacker's host were extracted from a large number of clues. From the "Dell" user name it can be assumed that the attacker was using a Dell host.
Programs. The automatic analysis system identified several typical software programs installed on the attacker's host. For instance, "QQ" is a real-time social tool widely used in China; "AliPay" is an online payment tool developed by Alibaba and widely used in China; "Sogou input" is a Chinese input method developed by the Chinese company Sogou; and "MeiTu" is photo beautification software developed by a Chinese company and widely used in China. In addition, "Sinfor" and "360" are both well-known security manufacturers in China with a large number of security-related products, and "Visual Studio" is common development software. We find that much Chinese-made software widely used in China was installed on the attacker's host, as well as some development and security products that ordinary users install less often.
Figure 6: Analysis of an attacker with the automatic analysis system, displaying traceable information about the attacker in five aspects: host user name, program software, keyboard layout, account information, and traceable strings. (Diagram panels: UserName: Admin, Default, DELL; KeyboardLayout: CH-SIMPLIFIED; Programs: AliPay, QQ, SogouInput, MeiTu, Sinfor, 360, VisualStudio; Account: 353***208, 372***582, and two masked e-mail addresses; TraceableStrings: botnet, **e, **t, A***team, Wh***ter, freebuf, visumantrag.)
Keyboard Layout. Through network communication analysis we find that the attacker's keyboard layout is Simplified Chinese, from which it can be inferred that the attacker's native language is probably Chinese. This is consistent with the software installed on the host.
Account Number. The system automatically identified two QQ accounts and two e-mail accounts. The e-mail accounts are Chinese Academy of Sciences business accounts, which supports the above assumption that the attacker is from China and is likely to work in the Chinese Academy of Sciences. Through the registration information of the QQ account (as shown in Figure 7) we find the following: (1) the attacker could be a male aged 32; (2) he does Internet-related work, likely with 360 Security; (3) he is located in Haidian District, Beijing, China. In the registration information of the "353***208" account, the mail, work, and other fields are empty and the nickname is a special English string, "Wh***ter", so it can be presumed that the account is for private use.
Traceable Strings. The automatic analysis system sifts from the monitor data the strings that may be traceable. "Freebuf" is a security information exchange website commonly used by Chinese security personnel. "Botnet" is a secondary attack method often used by attackers and often chosen by security researchers as a main research direction. "**e" and "**t" are acronyms for departments under the Chinese Academy of Sciences, while "A***team" is a security research team in the "**e" department. "Visumantrag" is often used when processing visas for various countries. "Wh***ter" matches the nickname of the QQ account "353***208" and is likely to be the attacker's regular nickname.
Figure 7: The registration infographic of QQ account 372***82.
6.3 Ransomware Maker Automatic Analysis. To verify that the analysis system can help trace back the ransomware maker, we obtained more than 8000 ransomware samples from VirusTotal [39] and extracted the Program Database (PDB) path from the samples (as shown in Figure 8). The results show that, although many attackers deliberately erase compilation information, legacy PDB path information can still be found in a large number of ransomware samples.
Table 2: Same identifier for different samples [13].

Sample Hash                          Same Identity Strings
68dd613973f8a***7befe0f4b461d258
ce569b9020079***9ae0f090266a60cc
jqxxpphoaonkje7z2fa0c084aef41969***1f427a1b65e1b146
4fa82d297e712***e1fdf312c044814a
36da0286d42cd***306fba99239c8238
821beaf40744be***ba862dbc6d90f70c
bc839bcb88***7422bbd2fa812dd3
de02391b6a9f1***0762a6f49cf32501

Figure 8: Of the 8075 real ransomware samples, 11% (878) yielded a Program Database (PDB) path, while the authors of the remaining 89% (7197) may have chosen to hide the PDB path during compilation.
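The PDB path extraction described in Section 6.3 can be approximated with a byte scan. The reference list points to PEView [34] and python-pefile [35]; the sketch below does not use either and instead searches for the CodeView "RSDS" record (signature, 16-byte GUID, 4-byte age, NUL-terminated path). It is a heuristic under that assumption, it does not parse the PE debug directory, and the function name is hypothetical.

```python
import re
from typing import List

# 'RSDS' signature, 16-byte GUID, 4-byte age, then a NUL-terminated path ending in .pdb
RSDS_RECORD = re.compile(rb"RSDS[\s\S]{20}([^\x00]+?\.pdb)\x00", re.IGNORECASE)


def extract_pdb_paths(data: bytes) -> List[str]:
    """Return every PDB path found in CodeView RSDS records inside the raw bytes."""
    return [
        m.group(1).decode("utf-8", errors="replace")
        for m in RSDS_RECORD.finditer(data)
    ]
```

Reading a sample with `open(path, "rb").read()` and passing the bytes to `extract_pdb_paths` returns an empty list when the author stripped the debug information, which corresponds to the 89% of samples without a PDB path in Figure 8.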
Moreover, the analysis system was used to automatically analyze the PDB paths of more than 800 different ransomware samples. We found multiple identity strings shared across the analysis results of different samples; one typical result is shown in Table 2 [13]. Table 2 is reproduced from Z. H. Wang et al. (2018) [under the Creative Commons Attribution License/public domain]. We used VirusTotal to validate the samples and found that they were all Cobra family ransomware. It can therefore be assumed that these samples were made by the same ransomware maker: once one sample is traced back, the conclusion about the maker carries over to the other samples. When different traceable information is extracted from different samples, the identity of the ransomware maker can be described by integrating that information.
Language Identification. Applying language recognition to the PDB paths of the 878 samples gives the result shown in Figure 9. The system automatically identifies the languages of the paths, which mainly comprise 11 languages, of which English accounts for the largest proportion.
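The percentage shares reported with Figure 9 can be re-derived from the per-language counts given in its caption. The short check below only recomputes those shares; the dictionary name is hypothetical and the counts are copied from the caption.

```python
# Per-language counts of PDB paths, as listed in the Figure 9 caption
LANGUAGE_COUNTS = {
    "English": 510, "Chinese": 31, "Dutch": 40, "Norwegian Nynorsk": 4,
    "Slovenian": 5, "Norwegian Bokmal": 1, "German": 58, "Italian": 3,
    "Hungarian": 35, "Maltese": 18, "Polish": 5, "Finnish": 151,
    "Luxembourgish": 3, "Danish": 10, "Spanish": 4,
}

total = sum(LANGUAGE_COUNTS.values())  # should equal the 878 analyzed paths
shares = {lang: round(100 * n / total, 2) for lang, n in LANGUAGE_COUNTS.items()}
```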
Figure 9: Of the 878 ransomware sample PDB paths, English accounted for 510 (58.09%); the remaining detected languages were Chinese 31 (3.53%), Dutch 40 (4.56%), Norwegian Nynorsk 4 (0.46%), Slovenian 5 (0.57%), Norwegian Bokmal 1 (0.11%), German 58 (6.61%), Italian 3 (0.34%), Hungarian 35 (3.99%), Maltese 18 (2.05%), Polish 5 (0.57%), Finnish 151 (17.20%), Luxembourgish 3 (0.34%), Danish 10 (1.14%), and Spanish 4 (0.46%).
Actual Case Analysis. The automatic identification system helped us identify the traceable strings in the PDB path. Take as an example the PDB path of a sample with an MD5 value of "60c6a92afb***6d0c16f7" (the Chinese has been translated into English):
"ELIAOSTUDIOUACsimulated an improved version of the 360 anti-virus program 1117 Bale1*41**2****0(**1) 9818 programWriteSystem32ReleaseVirtualDesktoppdb"
We can infer from the automatic analysis result that the ransomware maker is likely from China. The following points support this inference.
(1) Language Identification. Identifying the language of the path suggests that the attacker may have come from China. Reading the Chinese strings in the path, we find that the text describes an improvement meant to bypass detection by 360 security software, which has a large market in China.
(2) IP Address. The system identified an IP address in the PDB path. With the help of a threat intelligence database, we find that the IP came from Xinjiang, China (Figure 10).
Figure 10: Partial infographic of IP 1*41**2****0 in the threat intelligence library.
(3) Traceable Strings. The extracted identifier string "LIAO" resembles Chinese pinyin and is most likely a Chinese surname. To sum up, we speculate that the ransomware maker is most likely from China.
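Pulling an IP address and candidate identifier strings out of a PDB path, as in points (2) and (3), can be sketched with a regular expression and a tokenizer. This is an illustrative sketch only: the separator set and the function name `path_clues` are assumptions, and the tokens it returns would still have to pass the common-word and gibberish filters described earlier.

```python
import re
from ipaddress import ip_address

# Four dot-separated groups of 1 to 3 digits, not embedded in a longer number
IPV4 = re.compile(r"(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?![\d.])")


def path_clues(pdb_path):
    """Split a PDB path into valid IPv4 addresses and the remaining tokens."""
    ips = []
    for candidate in IPV4.findall(pdb_path):
        try:
            ips.append(str(ip_address(candidate)))  # rejects e.g. 999.1.1.1
        except ValueError:
            pass
    # Remove IP-like runs first, then split on path separators and punctuation.
    remainder = IPV4.sub(" ", pdb_path)
    tokens = [t for t in re.split(r"[\\/:\s.()]+", remainder) if t]
    return {"ips": ips, "tokens": tokens}
```

For a hypothetical path such as `E:\dev\STUDIO\tool 192.0.2.10 build\Release\app.pdb`, the function returns the address `192.0.2.10` and the tokens `E`, `dev`, `STUDIO`, `tool`, `build`, `Release`, `app`, `pdb`.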
7 Conclusion
In this paper we focus on tracing back RDP-based ransomware. Recently, more and more ransomware attackers have been using RDP attacks to spread ransomware with impunity, owing to the use of strong cryptography. We propose tracing back the attack sources to deter RDP-based ransomware by means of the proposed deception environment. It collects traceable clues and performs automatic analysis using natural language processing and machine learning techniques. The evaluation shows that it is able to trap attackers and collect the traceable clues they leave in the deception environment. With automatic clue identification, it can reduce the amount of traceable clues to about 2% of the original volume. We use this method to analyze two practical cases and show its effectiveness. By tracing back to ransomware attackers, we can provide a strong deterrent that stifles the development of ransomware.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Additional Points
This manuscript provides additional deception environment monitors and analytical methods, using more volunteers and samples. It demonstrates the validity of the methods through specific traceback cases (i.e., tracing back the ransomware attacker and tracing back the ransomware maker).
Disclosure
An earlier version of this paper was presented at the IEEE Third International Conference on Data Science in Cyberspace, Guangzhou, China, 18-21 June 2018. The authors' initial conference paper focused mainly on the effectiveness of deception environment monitors and the screening rate of the analysis system.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
The authors would like to thank VirusTotal for providing ransomware samples and the volunteers from the University of Chinese Academy of Sciences. This work is supported in part by the National Key Research and Development Plan (Grant no. 2018YFB0803504) and the National Natural Science Foundation of China (61871140, 61572153, U1636215, 61572492, and 61672020).
References
[1] A. Kharraz, "UNVEIL: A Large-Scale, Automated Approach to Detecting Ransomware," in Proceedings of the 25th USENIX Security Symposium (USENIX Security 16), pp. 757–772, USENIX Association, 2016.
[2] K. Savage, P. Coogan, and H. Lau, "The evolution of ransomware," Tech. Rep., 2015.
[3] A. Kharraz, W. Robertson, D. Balzarotti, L. Bilge, and E. Kirda, "Cutting the Gordian Knot: A Look Under the Hood of Ransomware Attacks," in Detection of Intrusions and Malware, and Vulnerability Assessment, vol. 9148 of Lecture Notes in Computer Science, pp. 3–24, Springer International Publishing, Cham, 2015.
[4] S. Mohurle and M. Patil, "A brief study of Wannacry Threat: Ransomware Attack 2017," International Journal of Advanced Research in Computer Science, vol. 8, no. 5, 2017.
[5] X. Yu, Z. Tian, J. Qiu, and F. Jiang, "A Data Leakage Prevention Method Based on the Reduction of Confidential and Context Terms for Smart Mobile Devices," Wireless Communications and Mobile Computing, vol. 2018, Article ID 5823439, 11 pages, 2018.
[6] J. Yaneza, "Brute Force RDP Attacks Plant CRYSIS Ransomware," https://blog.trendmicro.com/trendlabs-security-intelligence/brute-force-rdp-attacks-plant-crysis-ransomware.
[7] "10% of Ransomware Attacks on SMBs Targeted IoT Devices," https://www.darkreading.com/application-security/10--of-ransomware-attacks-on-smbs-targeted-iot-devices-/d/d-id/1329817.
[8] Q. Tan, Y. Gao, J. Shi, X. Wang, B. Fang, and Z. H. Tian, "Towards a Comprehensive Insight into the Eclipse Attacks of Tor Hidden Services," IEEE Internet of Things Journal, 2018.
[9] "Kaspersky Security Bulletin: STORY OF THE YEAR 2017," 2017, https://media.kasperskycontenthub.com/wp-content/uploads/sites/43/2018/03/07164824/KSB_Story_of_the_Year_Ransomware_FINAL_eng.pdf.
[10] Z. Tian, Y. Cui, L. An et al., "A Real-Time Correlation of Host-Level Events in Cyber Range Service for Smart Campus," IEEE Access, vol. 6, pp. 35355–35364, 2018.
[11] R. Ross et al., Managing Information Security Risk: Organisation, Mission and Information System View, National Institute of Standards and Technology, 2011, http://csrc.nist.gov/publications/nistpubs/800-39/SP800-39-final.pdf.
[12] Y. Wang, Z. Tian, H. Zhang, S. Su, and W. Shi, "A Privacy Preserving Scheme for Nearest Neighbor Query," Sensors, vol. 18, no. 8, p. 2440, 2018.
[13] Z. H. Wang, X. Wu, C. G. Liu, Q. X. Liu, and J. L. Zhang, "RansomTracer: Exploiting Cyber Deception for Ransomware Tracing," in Proceedings of the IEEE Third International Conference on Data Science in Cyberspace, pp. 227–234, 2018.
[14] B. A. S. Al-rimy, M. A. Maarof, and S. Z. M. Shaid, "Ransomware threat success factors, taxonomy, and countermeasures: A survey and research directions," Computers & Security, vol. 74, pp. 144–166, 2018.
[15] F. Jiang, Y. Fu, B. B. Gupta et al., "Deep Learning based Multi-channel intelligent attack detection for Data Security," IEEE Transactions on Sustainable Computing, 2018.
[16] N. Andronio, S. Zanero, and F. Maggi, "HelDroid: dissecting and detecting mobile ransomware," in Research in Attacks, Intrusions, and Defenses, vol. 9404 of Lecture Notes in Computer Science, pp. 382–404, Springer, 2015.
[17] F. Mercaldo, V. Nardone, A. Santone, and C. A. Visaggio, "Ransomware Steals Your Phone. Formal Methods Rescue It," in Formal Techniques for Distributed Objects, Components, and Systems: 36th IFIP WG 6.1 International Conference, FORTE 2016, held as part of the 11th International Federated Conference on Distributed Computing Techniques, DisCoTec 2016, E. Albert and I. Lanese, Eds., pp. 212–221, Springer International Publishing, 2016.
[18] D. Sgandurra, L. Muñoz-González, R. Mohsen, and E. C. Lupu, "Automated Dynamic Analysis of Ransomware: Benefits, Limitations and use for Detection," https://arxiv.org/abs/1609.03020.
[19] S. Song, B. Kim, and S. Lee, "The Effective Ransomware Prevention Technique Using Process Monitoring on Android Platform," Mobile Information Systems, vol. 2016, Article ID 2946735, 9 pages, 2016.
[20] T. Yang, Y. Yang, K. Qian, D. C.-T. Lo, Y. Qian, and L. Tao, "Automated detection and analysis for android ransomware," in Proceedings of the 17th IEEE International Conference on High Performance Computing and Communications, IEEE 7th International Symposium on Cyberspace Safety and Security and IEEE 12th International Conference on Embedded Software and Systems, HPCC-ICESS-CSS 2015, pp. 1338–1343, USA, August 2015.
[21] N. Scaife, H. Carter, P. Traynor, and K. R. B. Butler, "CryptoLock (and Drop It): Stopping Ransomware Attacks on User Data," in Proceedings of the 36th IEEE International Conference on Distributed Computing Systems, ICDCS 2016, pp. 303–312, Japan, June 2016.
[22] M. M. Ahmadian and H. R. Shahriari, "2entFOX: A framework for high survivable ransomwares detection," in Proceedings of the 13th International ISC Conference on Information Security and Cryptology, ISCISC 2016, pp. 79–84, Iran, September 2016.
[23] M. M. Ahmadian, H. R. Shahriari, and S. M. Ghaffarian, "Connection-monitor & connection-breaker: A novel approach for prevention and detection of high survivable ransomwares," in Proceedings of the 12th International ISC Conference on Information Security and Cryptology, ISCISC 2015, pp. 79–84, Iran, September 2015.
[24] D. Kim, W. Soh, and S. Kim, "Design of Quantification Model for Prevent of Cryptolocker," Indian Journal of Science and Technology, vol. 8, no. 19, 2015.
[25] C. Moore, "Detecting Ransomware with Honeypot Techniques," in Proceedings of the 2016 Cybersecurity and Cyberforensics Conference (CCC), pp. 77–81, Amman, Jordan, August 2016.
[26] F. Mbol, J. Robert, and A. Sadighian, "An Efficient Approach to Detect TorrentLocker Ransomware in Computer Systems," in Cryptology and Network Security, S. Foresti and G. Persiano, Eds., vol. 10052 of Lecture Notes in Computer Science, pp. 532–541, Springer International Publishing, Cham, 2016.
[27] K. Cabaj, P. Gawkowski, K. Grochowski, and D. Osojca, "Network activity analysis of CryptoWall ransomware," Przegląd Elektrotechniczny, vol. 91, no. 11, pp. 201–204, 2015.
[28] J. Chen, Z. Tian, X. Cui, L. Yin, and X. Wang, "Trust Architecture and Reputation Evaluation for Internet of Things," Journal of Ambient Intelligence & Humanized Computing, vol. 2, pp. 1–9, 2018.
[29] C. Le Guernic and A. Legay, "Ransomware and the Legacy Crypto API," in Risks and Security of Internet and Systems: 11th International Conference, CRiSIS 2016, Roscoff, France, 2017.
[30] L. Pingree, Emerging Technology Analysis: Deception Techniques and Technologies Create Security Technology Business Opportunities, Gartner, 2015.
[31] "Ransomware and Businesses," https://www.symantec.com/content/dam/symantec/docs/security-center/white-papers/ransomware-and-businesses-16-en.pdf, 2016.
[32] A. Kharraz and E. Kirda, "Redemption: Real-Time Protection Against Ransomware at End-Hosts," in Proceedings of the International Symposium on Research in Attacks, Intrusions, and Defenses, pp. 98–119, 2017.
[33] F. Bergstrand, J. Bergstrand, and H. Gunnarsson, Localization of Spyware in Windows Environments.
[34] "PEView," https://www.aldeid.com/wiki/PEView.
[35] "python-pefile," https://pypi.python.org/pypi/pefile.
[36] M. Lui and T. Baldwin, "langid.py: An off-the-shelf language identification tool," in Proceedings of the ACL 2012 System Demonstrations, Association for Computational Linguistics, pp. 25–30, 2012.
[37] S. Bird and E. Loper, "NLTK: the natural language toolkit," in Proceedings of the ACL 2004 on Interactive Poster and Demonstration Sessions, Association for Computational Linguistics, Barcelona, Spain, July 2004.
[38] "Rrenaud, Gibberish-Detector," https://github.com/rrenaud/Gibberish-Detector/blob/master/README.rst.
[39] "VirusTotal," http://virustotal.com.
from the monitor information. Fourth, we use automatic analysis to screen a large number of clues for tracing back the attacker. Finally, we generate a report to trace back the RDP-based ransomware attacker. We refer readers to Sections 4 and 5 for the detailed implementation of this prototype.
3.1 Deception Environment. Generally, the ransomware attack execution stage has two steps: login and spread [31]. Building a deception environment is nontrivial in practice, because it must make the ransomware attacker believe that it belongs to a real user and that the user data is worth attacking. Because advanced attackers exploit the static features of certain analysis systems before they launch attacks [32], an intuitive approach to addressing such reconnaissance is to build the user environment so that the user data is valid, real, and nondeterministic. In addition, the environment serves as an "enticing target" to encourage ransomware attackers. We elaborate on how to generate an artificial, realistic, and enticing user environment for RDP-based ransomware in Section 4.
RDP-based attackers commonly upload malicious programs in the following ways before spreading ransomware: (1) the attacker downloads malicious programs from the Internet; (2) programs are transferred through FTP, SCP, or other transport protocols; (3) programs are uploaded through the clipboard; (4) programs are uploaded through a shared folder. The clipboard and shared folders are most commonly used to transfer programs in RDP-based ransomware attacks because they are simple and convenient. However, both are easy for our proposed system to monitor.
3.2 Environment Monitor. After the attacker logs in to the environment, a shared folder and the clipboard on the remote PC are typically used to transfer ransomware programs from the attacker's machine. To collect as much information about the attacker as possible without being observed, this paper proposes three monitor layers: the network layer, the host layer, and the file layer. We elaborate on how to configure the monitor system for the deception environment in Section 4.
The Network Layer Monitor. The network layer monitor detects a remote connection and collects information including the remote IP addresses, remote ports, port status codes, keyboard layout, and so on. When the RDP-based attacker logs in to the host, the monitor can obtain this information and detect the attack without the attacker's knowledge.
The Host Layer Monitor. We propose to detect changes in, for example, processes and the clipboard by monitoring the host layer. The host layer monitor can gather information about the attackers' behavior and their use of system applications in the deception environment. For instance, because the clipboard resides in system-level heap space, any application in the system has access to it, and RDP-based ransomware attacks routinely take advantage of the clipboard to exchange data between applications. Moreover, because Windows shares the clipboard by default during an RDP session, the monitor may obtain clues that the attacker left by using the clipboard locally.
The File Layer Monitor. By monitoring the file layer, we can identify ransomware attacks from file changes. Furthermore, the monitor can gather local traceable information by monitoring files in the shared folder: a shared folder on the remote PC is commonly used to transfer ransomware from the attacker's machine during the RDP session and, for a more convenient and quicker attack, the attacker often mounts the entire local disk on the remote computer. As a result, by monitoring shared folders we can detect newly added shared folders in real time and automatically capture a large amount of path information.
3.3 Clue Extraction. Through the environment monitors, the system can gather much of the information left by attackers, such as login information, communication information, clipboard content, folder paths, and portable executable (PE) files. Many traceable clues can be extracted from it, including but not restricted to IP addresses, keyboard layouts, compile paths, and file paths. In order to analyze these clues quickly, we divide them into two categories: string clues and path clues. These clues are then submitted to the automatic analysis system. We elaborate on the types of clues that the proposed system can extract in Section 5.1.
3.4 Automatic Analysis. According to our investigation, current traceback tools mostly require clues to be analyzed manually. However, we usually have to deal with a large number of clues with no semantic correlation. Because such manual traceback analysis takes a lot of time and effort, we propose an automatic analysis system; we elaborate on how to analyze clues automatically in Section 5.2.
4 Implementation of the Deception Environment
As the Windows platform is the main target of ransomware, we choose Windows for the proof-of-concept implementation. In this section we describe the implementation details of a Windows-based deception environment prototype: how the deception environment traps ransomware attackers, and how the monitor detects the RDP-based attack and collects traceable clues. The entire system implementation process is shown in Figure 2.
4.1 At the Network Layer
Figure 2: RDP-based ransomware attack traceback system process. (Five stages: 1. construct the cyber deception environment; 2. environment monitor, with the login and communication monitors at the network layer, the clipboard and process monitors at the host layer, and the shared folder and file monitors at the file layer, the last of which also feeds ransomware detection; 3. clue extraction, yielding login, remote host, clipboard, compile, and path clues; 4. automated analysis, comprising users and programs analysis, network address analysis, language identification, and traceable strings identification; 5. result, the auxiliary traceable clues.)
4.1.1 The Login Monitor. The login monitor is used to detect attacks in real time and collect the attacker's login information. On the Windows platform, Win32 is an environment subsystem that provides an API for operating system services and functions to control all user inputs and outputs. The login monitor relies on Windows APIs to gain access to the system and runs with privileges to access its own areas of memory. It uses Winsock 2.0 to access networks, including protocols other than the TCP/IP suite. The login monitor takes network requests and sends them to the Winsock 2.0 SPI (Service Provider Interface) by calling the main Winsock 2.0 file, Ws2_32.dll, which provides access to transport service providers and namespace providers. The IP Helper API makes it possible to get and modify network configuration settings for the local host. It consists of the DLL file iphlpapi.dll and includes functions that can retrieve information about protocols such as TCP and UDP [33]. As a result, the login monitor can directly access the data buffers involved in transmission control protocols, routing tables, network interfaces, and network protocol statistics.
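As a rough user-space stand-in for the login monitor of Section 4.1.1, the sketch below parses the text output of `netstat -ano` and reports established connections to the RDP port. It is not the IP Helper based implementation the paper describes: the default port 3389, the English state name `ESTABLISHED`, and the function name are assumptions.

```python
def rdp_sessions(netstat_output, rdp_port=3389):
    """Return remote endpoints of established TCP connections to the local RDP port."""
    sessions = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected layout: Proto, Local Address, Foreign Address, State, PID
        if len(fields) < 4 or fields[0] != "TCP":
            continue
        local, remote, state = fields[1], fields[2], fields[3]
        local_port = local.rsplit(":", 1)[-1]
        if local_port == str(rdp_port) and state == "ESTABLISHED":
            remote_ip, remote_port = remote.rsplit(":", 1)
            sessions.append(
                {"remote_ip": remote_ip, "remote_port": int(remote_port), "state": state}
            )
    return sessions
```

On a Windows host the input could come from `subprocess.run(["netstat", "-ano"], capture_output=True, text=True).stdout`; polling it periodically gives a coarse approximation of real-time login detection.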
4.1.2 The Communication Monitor. The communication monitor is responsible for capturing network traffic. By locating the TCP packets that contain the interactive configuration information of an RDP login, it obtains the RDP connection information as personal information about the attacker. The main packet characteristics are as follows: (1) the packet's name is often ClientData; (2) the packet is usually located after the TCP three-way handshake; (3) the data packet is positioned near the front of the session; (4) the amount of data is significantly larger.
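Once the ClientData packet has been located, fields such as the keyboard layout and the client host name can be read from its Client Core Data block. The sketch below assumes the field order of the TS_UD_CS_CORE structure in the RDP specification (MS-RDPBCGR) as the editor understands it and should be checked against that specification before use; locating the block inside the TCP stream is omitted, and the function name is hypothetical.

```python
import struct

CS_CORE = 0xC001  # block type of Client Core Data (assumed from MS-RDPBCGR)


def parse_client_core_data(block: bytes) -> dict:
    """Read the leading fields of a Client Core Data block (little-endian)."""
    if len(block) < 56:
        raise ValueError("block too short for Client Core Data")
    (block_type, length, version, width, height,
     color_depth, sas_sequence, keyboard_layout, client_build) = struct.unpack_from(
        "<HHIHHHHII", block, 0
    )
    if block_type != CS_CORE:
        raise ValueError("not a Client Core Data block")
    # clientName: 32 bytes of NUL-padded UTF-16LE following the 24-byte prefix
    client_name = block[24:56].decode("utf-16-le").split("\x00", 1)[0]
    return {
        "keyboard_layout": keyboard_layout,  # e.g. 0x0804 is Chinese (Simplified, PRC)
        "client_build": client_build,
        "client_name": client_name,
        "desktop": (width, height),
    }
```

A keyboard layout of 0x0804 would correspond to the Simplified Chinese layout mentioned in the evaluation.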
4.2 At the Host Layer
4.2.1 The Deception Host. According to our observation, the main methods attackers use to log in to a remote host are (1) direct login with a weak password and (2) login through an access account added by exploiting a vulnerability. We therefore deliberately give the administrator accounts of the deception environment weak passwords and leave common vulnerabilities, such as EternalBlue, in the environment to attract attackers. To guide attackers to upload ransomware using only the clipboard and shared folders, we block traffic transfers from outside the environment and close common transfer ports (e.g., ports 20, 21, 80, and 443).
4.2.2 The Clipboard Monitor. The clipboard monitor obtains clues in real time by monitoring changes to the clipboard. It uses the Clipboard Viewer to listen for changes in the clipboard without affecting its contents. The Clipboard Viewer is a mechanism that can get and display the contents of the clipboard. As Windows applications are message-driven, the key to the monitor is responding to and processing clipboard change messages. When the content changes, the WM_DRAWCLIPBOARD message is triggered and sent to the first window of the Clipboard Viewer Chain. After each Clipboard Viewer window responds to and processes the message, it must pass the message on to the next window, using the handle of the next window that it saved in its linked list. The clipboard monitor obtains the clipboard's new contents through its window by using the "GetClipboardData" Windows API. When a subsequent copy or cut operation is executed, the data in the clipboard are rewritten. The clipboard monitor therefore listens in real time and writes the information to a log file, which is updated whenever the monitor receives a clipboard change notice. When the log file is updated, the monitor sends it to a secure host and completely erases it from the environment, to prevent it from being detected by the attacker or encrypted by ransomware.
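The message-forwarding behavior of the Clipboard Viewer Chain can be illustrated with a small pure-Python model. This is a toy simulation, not Win32 code: the `Viewer` and `Clipboard` classes are hypothetical stand-ins for windows and the system clipboard, and `on_draw_clipboard` plays the role of handling WM_DRAWCLIPBOARD.

```python
class Viewer:
    """Stand-in for a Clipboard Viewer window in the chain."""

    def __init__(self, name, next_viewer=None):
        self.name = name
        self.next_viewer = next_viewer  # saved handle of the next window in the chain
        self.log = []

    def on_draw_clipboard(self, content):
        self.log.append(content)  # read-only: the clipboard content is not modified
        if self.next_viewer is not None:
            self.next_viewer.on_draw_clipboard(content)  # forward along the chain


class Clipboard:
    """Stand-in for the system clipboard and the head of the viewer chain."""

    def __init__(self):
        self.content = None
        self.first_viewer = None

    def add_viewer(self, name):
        # A newly registered viewer becomes the head and remembers the previous head.
        self.first_viewer = Viewer(name, self.first_viewer)
        return self.first_viewer

    def set_content(self, content):
        self.content = content  # a copy or cut overwrites the previous data
        if self.first_viewer is not None:
            self.first_viewer.on_draw_clipboard(content)
```

The model shows why every viewer must forward the message: a viewer that drops it silently cuts off all viewers registered before it.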
4.2.3 The Process Monitor. To run a program on a Windows system, a new process must be created. The monitor obtains the PE file run by the attacker by monitoring the processes in the environment. It first records the state of the processes commonly used in the deception environment before an RDP-based ransomware attack. After the login monitor detects such an attack, it monitors the environment for newly created processes: the Windows API "CreateToolhelp32Snapshot" takes status snapshots of all processes in real time, including each process identifier (PID). When a suspicious process starts, the monitor recognizes it by its PID and looks up the process's running path with the help of "GetModuleFileNameEx". Finally, the monitor finds the suspicious program's PE file through the path and copies it to the secure host.
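The comparison between the baseline snapshot and a later snapshot can be sketched as a dictionary difference. The snapshots here are plain `{pid: executable path}` mappings supplied by the caller; how they are collected (for example through the Windows APIs named above) is outside the sketch, and the function name is hypothetical.

```python
def new_processes(baseline, current):
    """Return {pid: path} for processes that are new or whose path changed.

    Comparing the path as well as the PID guards against a PID being
    reused by a different program after the baseline was taken.
    """
    return {
        pid: path
        for pid, path in current.items()
        if baseline.get(pid) != path
    }
```

Each path returned would then be the location from which the suspicious PE file is copied to the secure host.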
4.3 At the File Layer
4.3.1 Deception Files. Deception files are constructed with two goals. First, we need to make the attacker believe that the deception environment is a real user's host. Second, we need to make the attacker believe that there are resources in the environment that are worth attacking.
To simulate a realistic environment, we deploy large numbers of files of different types on it, for example, images, audio files, database files, and documents that can be accessed in a Windows session. Based on Amin Kharraz's research [1], we created four file categories that ransomware typically tries to find and encrypt: documents (*.txt, *.doc(x), *.ppt(x), *.xls(x), *.pdf, and *.py), keys and licenses (*.key, *.pem, *.crt, and *.cer), file archives (*.zip, *.rar), and media (*.jp(e)g, *.mp3, and *.avi). We obtain these files in three ways. First, we create files with valid headers and content using standard libraries (e.g., python-docx, python-pptx, pdfkit, and OpenSSL). Second, using Google search syntax and crawler technology, we download a large number of files from the Internet. Third, we collect a number of nonconfidential documents from the hosts of 20 volunteers to emulate actual user environments. When we assign user files to the deception environment, the path length is generated randomly. Each folder may randomly contain a set of subfolders, and for each folder a subset of extensions is randomly selected. Furthermore, each directory name is generated from meaningful words. Consequently, the paths and extensions we generate give the user files variable depth and meaningful content.
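A minimal sketch of the path generation step is shown below. It simplifies the description above (one extension is drawn per file instead of a subset per folder), and the word list, the `generate_paths` function, and its parameters are assumptions made for illustration only.

```python
import random
from pathlib import PureWindowsPath
from typing import List

# Hypothetical vocabulary for directory and file names.
WORDS = ["finance", "reports", "projects", "backup",
         "photos", "contracts", "invoices", "archive"]

# The four file categories listed above, with concrete extensions.
EXTENSIONS = {
    "documents": [".txt", ".docx", ".pptx", ".xlsx", ".pdf", ".py"],
    "keys": [".key", ".pem", ".crt", ".cer"],
    "archives": [".zip", ".rar"],
    "media": [".jpg", ".mp3", ".avi"],
}


def generate_paths(root: str, count: int, max_depth: int, seed: int) -> List[PureWindowsPath]:
    """Generate `count` file paths with random depth and meaningful names."""
    rng = random.Random(seed)  # seeded so a layout can be reproduced
    paths = []
    for index in range(count):
        depth = rng.randint(1, max_depth)
        folders = [rng.choice(WORDS) for _ in range(depth)]
        category = rng.choice(sorted(EXTENSIONS))
        extension = rng.choice(EXTENSIONS[category])
        file_name = f"{rng.choice(WORDS)}_{index}{extension}"
        paths.append(PureWindowsPath(root, *folders, file_name))
    return paths
```

The sketch only produces paths; the files themselves would still be created with the libraries mentioned above so that their headers and content are valid.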
To make the simulated environment appear more valuable, we deploy bait information on it, such as databases, false code comments, digital certificates, administrator passwords, SSH keys, VPN keys, browser history, passwords, ARP records, and DNS records. When an attacker obtains the bait information, it may entice the attacker to attack the deception environment.
4.3.2 The File Monitor. The file monitor detects ransomware by monitoring changes in file type and file entropy, a method proposed in 2016 [21]. The type of data stored in a file can be described by the order and position of specific byte values that are unique to the file type. Since files generally retain their file types and formatting in the deception environment, the bulk modification of such files should be considered suspicious. When the monitor sees this type of change, we can infer that a ransomware attack has occurred.
Entropy expresses the randomness of the characters in a string: the higher the entropy value, the stronger the randomness. The Shannon entropy of an array of bytes can be computed as the sum

\[ e = \sum_{i=0}^{255} P_{B_i} \log_2 \frac{1}{P_{B_i}} \tag{1} \]

for $P_{B_i} = F_i / \mathit{totalbytes}$ and $F_i$ the number of instances of byte value $i$ in the array. The entropy value is a number from 0 to 8, where a value of 8 represents a byte array with a completely uniform distribution. Since the probability of each byte value occurring in encrypted ciphertext is basically the same, the entropy of an encrypted file will be close to the upper limit. Because ransomware always encrypts a large number of files, when we detect that files change to high-entropy files within a short period of time and also change their file types, we assume that the files are subject to a ransomware attack.
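Equation (1) translates directly into a few lines of Python. In the sketch below, `shannon_entropy` follows the formula, while `looks_encrypted`, its 7.5 threshold, and the four-byte header comparison are illustrative assumptions rather than the values used by the file monitor.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte array in bits per byte (0 to 8)."""
    if not data:
        return 0.0
    total = len(data)
    # P = F_i / total, and P * log2(1 / P) is summed over observed byte values.
    return sum((count / total) * math.log2(total / count)
               for count in Counter(data).values())


def looks_encrypted(before: bytes, after: bytes, threshold: float = 7.5) -> bool:
    """Flag a file whose leading bytes changed and whose entropy is now high.

    The threshold and the 4-byte header comparison are assumptions.
    """
    header_changed = before[:4] != after[:4]
    return header_changed and shannon_entropy(after) >= threshold
```

A buffer containing every byte value equally often reaches the maximum of 8 bits per byte, while a buffer of one repeated byte has entropy 0.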
4.3.3 The Shared Folder Monitor. By traversing the disk storage in real time, the shared folder monitor discovers updates to the shared folders as they happen. It obtains information about the attacker's local files, which the attacker often does not notice, thus revealing some unexpected traceable clues. The monitor can access a list of paths to the attacker's shared folders.
As we originally observed, folders shared through Remote Desktop usually appear on the remote host under a path with the prefix "\\tsclient". When the monitor's traversal reaches this prefix, it uses "FindFirstFile" to find the first file and then uses "FindNextFile" with the returned handle to find the next file. When an entry is a folder, the monitor continues to traverse all files under that folder. Initially, the monitor tried to get the full file names and file contents by traversing each new shared folder. However, during actual experimentation we found that, as the number of files in storage grows, the monitor takes far more time and resources to get all the file contents than to get just the file paths, and full traversals are more likely to alert the attacker. Therefore, the monitor obtains only the paths of the files that the attacker shares from its host, with the help of "GetFileName". Moreover, to prevent encryption by the ransomware, the shared folder monitor directly transfers the acquired list of shared file paths to another secure host.
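The paths-only traversal can be sketched with the Python standard library in place of the Win32 calls. The function name `list_shared_paths` is hypothetical; on a deception host the root would be the redirected share, but any directory works for illustration.

```python
import os
from typing import List


def list_shared_paths(root: str) -> List[str]:
    """Collect the paths of all files under `root` without opening any file.

    Only directory listings are read, which keeps the traversal cheap
    compared with copying file contents.
    """
    found = []
    for folder, _subfolders, files in os.walk(root):
        for name in files:
            found.append(os.path.join(folder, name))
    return sorted(found)
```

The returned list is what would be sent to the secure host; the files themselves are never read.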
5 Clue Extraction and Analysis
Through the deception environment, we trap the ransomware attacker and collect a large amount of information that may contain many traceable clues. However, such traceable clues are often not directly observable and are complex in nature. In addition, much of the collected information is not helpful in tracing back ransomware attackers. Therefore, to assist in analyzing the monitor information and extracting the useful traceable clues, in this section we describe how to extract clues and how to analyze them automatically after extraction.
5.1 Clue Extraction. We mainly obtain the following kinds of clues from the extraction: remote login information (IP addresses), network traffic, clipboard contents (pictures and texts), shared folder information (path strings), and ransomware samples (compile time and compile paths). The shared folder path clues can be obtained directly from the monitor. However, clipboard clues, compilation clues, and remote host clues are often not directly observable and are complex in nature. As a result, the extraction module mainly focuses on extracting remote host clues, clipboard string clues, and compilation clues.
Remote Host Clue Extraction. The IP address, port number, and folder path clues can be obtained directly from the login information and folder paths. In a PCAP capture of the network communication, the TCP packets that exchange configuration information are usually named "ClientData". We extract the client name field from the "ClientData" packet to obtain the attacker's hostname. In addition, the KeyboardLayout field indicates the default keyboard layout of the attacking host; e.g., the Chinese Simplified layout number is 0x0004 and the American English keyboard layout number is 0x0409. The keyboard layout suggests the remote user's habitual language (the mother tongue), from which the attacker's nationality can be inferred.
Clipboard Clue Extraction. The main data formats available to the clipboard monitor are Windows Bitmap, GDI file, ANSI characters, Unicode characters, and WAV audio data. We mainly aim at extracting the traceable clues of the character types. The module extracts character clues from the clipboard in various formats by checking the "DataType" value of the GetClipboardData API.
Compilation Clue Extraction. For all Windows RDP-based ransomware samples that we examined, we empirically observed that the most commonly used format is the PE file, especially *.exe and *.dll. Some PE files carry compilation information, and this information does not change when the program is moved from host to host. As a result, it is a good way to obtain information about the creator. A PE file mainly consists of five major components: DOS MZ header, DOS stub, PE header, section headers, and section content. Each component contains a great deal of information, but very little of it can be used directly to identify the creator, and some identification information needs to be extracted from the content of each section. In this paper, many clues in the PE files can be extracted to trace back the attacker: file name, PE file type, compiler version, compilation path, compile time, last modified time, last open time, IP address, URL, domain name, language, string, wide character, and so on. We extract most of the clues with PEView [34]. However, since it cannot directly obtain the compilation path, we use the pefile [35] tool to extract paths by locating them in the PE structure.
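To make the two compilation clues concrete, the sketch below reads the compile time and the PDB path from raw bytes using only the standard library. It is not the PEView or pefile workflow used above: the function names are hypothetical, the timestamp is read from the COFF header that follows the PE signature, and the PDB path is found by scanning for the "RSDS" CodeView signature, which is a heuristic that could in principle hit unrelated bytes.

```python
import struct
from datetime import datetime, timezone
from typing import Optional


def pe_compile_time(data: bytes) -> Optional[datetime]:
    """Return the COFF TimeDateStamp of a PE image, or None if not a PE file."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return None
    # e_lfanew at offset 0x3C points to the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        return None
    if len(data) < pe_offset + 12:
        return None
    # Signature (4) + Machine (2) + NumberOfSections (2), then TimeDateStamp.
    (timestamp,) = struct.unpack_from("<I", data, pe_offset + 8)
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)


def pdb_path(data: bytes) -> Optional[str]:
    """Return the PDB path stored in an RSDS CodeView record, if present."""
    marker = data.find(b"RSDS")
    if marker == -1:
        return None
    # "RSDS" (4 bytes) + GUID (16 bytes) + age (4 bytes), then the path.
    start = marker + 4 + 16 + 4
    end = data.find(b"\x00", start)
    if end == -1:
        return None
    return data[start:end].decode("utf-8", errors="replace")
```

A parser such as pefile locates the debug directory through the data directories instead of scanning, which avoids false matches on large samples.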
Since the extracted clues come in different encoding formats, to facilitate inspection and give the subsequent analysis a unified format, the extraction system converts all of the data it obtains to UTF-8 and saves it in an SQLite database. Before being submitted to the analysis system, the extracted clues are divided into two categories: string clues and path clues.
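A minimal sketch of this storage step is given below. The table layout, the function names, and the rule that any clue containing a path separator is a path clue are assumptions for illustration; the source encoding is passed in explicitly because the clipboard format already indicates whether the data are ANSI or Unicode.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the clue database; SQLite stores TEXT as UTF-8 by default."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS clues ("
        " id INTEGER PRIMARY KEY,"
        " kind TEXT NOT NULL CHECK (kind IN ('string', 'path')),"
        " value TEXT NOT NULL)"
    )
    return conn


def save_clue(conn: sqlite3.Connection, raw: bytes, source_encoding: str) -> str:
    """Decode a raw clue, classify it, store it, and return its category."""
    text = raw.decode(source_encoding, errors="replace")
    # Assumption: a clue containing a path separator is treated as a path clue.
    kind = "path" if ("\\" in text or "/" in text) else "string"
    conn.execute("INSERT INTO clues (kind, value) VALUES (?, ?)", (kind, text))
    conn.commit()
    return kind
```

Decoding at the point of capture means that the analysis steps can treat every clue as ordinary Unicode text.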
5.2 Automatic Analysis. At this point, the clues extracted by the system mainly include string clues and path clues. The number of string clues is large, and they are mixed with many unidentifiable strings. Because the number of path clues is also very large and they have no semantic correlation, it is difficult to identify traceable clues manually. We therefore focus on how to automate the identification of traceable clues in path clues.
5.2.1 Users and Programs Analysis. In the analysis of the path clues, we first obtain attacker clues by identifying context-related segments within the same path. For instance, each user has a separate user folder located in the "Users" folder under drive C. As a result, the system can obtain a user name on the attacking host from the name of the folder under the "Users" folder (e.g., C:\Users\Dell). The "Program Files" folder usually contains the names of the software programs installed on the machine (e.g., C:\Program Files\Microsoft Visual Studio 11.0). What is more, the QQ account number is always located in the "QQ\QQfile" folder (e.g., D:\QQ\QQfile\86******086\FileRecv).
In this way, the analysis system can quickly and accurately get host names, e-mail accounts, program names, social software account numbers, and other traceable clues carried in the attacker's file paths. However, such user information makes up only a small part of the overall clues and is acquired by chance. Therefore, we conduct further analysis on this basis.
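The context rule described above amounts to reading the path segment that follows a known folder name. The sketch below is one way to express it; the function name `context_clues`, the returned keys, and the sample account number in the usage note are hypothetical.

```python
from pathlib import PureWindowsPath
from typing import Dict


def context_clues(path: str) -> Dict[str, str]:
    """Extract clues from segments that follow well-known folder names."""
    parts = PureWindowsPath(path).parts
    clues = {}
    for index, part in enumerate(parts[:-1]):
        key = part.lower()
        following = parts[index + 1]
        if key == "users":
            clues["user"] = following
        elif key in ("program files", "program files (x86)"):
            clues["program"] = following
        elif key == "qqfile":
            clues["qq_account"] = following
    return clues
```

For example, `context_clues(r"D:\QQ\QQfile\123456\FileRecv\a.doc")` yields the made-up account number `123456` under the key `qq_account`.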
5.2.2 Account Analysis. Through the analysis of the APT1 report and some other attribution reports, we find that the mapping between an attacker and a physical-world identity can be better established by analyzing the accounts left by the attacker. This information includes, but is not limited to, the location of the IP address, the spelling and registration of the domain name, the IP address corresponding to the URL, and the domain name of the mailbox account. Because it is difficult to identify this information effectively in a large number of string and path clues, the analysis system automatically identifies IP addresses, domain names, URLs, and mailbox accounts by regular expression matching. Then, with the help of threat intelligence and big data technology, more relevant clues are obtained.
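The regular matching step might look like the sketch below. The patterns are deliberately simple assumptions (IPv4 only, no bare domain names, no internationalized addresses) and are not the expressions used by the analysis system; the sample values in the usage note come from reserved documentation ranges.

```python
import re
from typing import Dict, List

# Four dot-separated octets in the range 0-255, not embedded in a longer number.
IPV4 = re.compile(
    r"(?<![\d.])(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}"
    r"(?:25[0-5]|2[0-4]\d|1?\d?\d)(?![\d.])"
)
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")
URL = re.compile(r"https?://[^\s\"'<>\\]+")


def find_accounts(text: str) -> Dict[str, List[str]]:
    """Return the IPv4 addresses, mailbox accounts, and URLs found in a clue."""
    return {
        "ipv4": IPV4.findall(text),
        "email": EMAIL.findall(text),
        "url": URL.findall(text),
    }
```

Applied to a clue such as `E:\src\192.0.2.44\build contact alice@example.org via http://example.com/a.php`, the function returns one match in each category, which can then be looked up in threat intelligence sources.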
5.2.3 Language Identification. The languages found in the clues often help to determine an attacker's habitual language, but because there are many languages in different countries and some languages are highly similar, we use automatic analysis to identify the language of the clues. We tested the accuracy of two language identification toolkits using entire path information in four different languages and found the "langid.py" toolkit to be more accurate overall than the "langdetect" toolkit. The comparison results are shown in Figure 3. langid.py is a language identification toolkit developed by Lui and Baldwin at the University of Melbourne [36]. It combines a naive Bayes classifier with cross-domain feature selection to provide domain-independent language identification.
5.2.4 Traceable Strings Identification. To find traceable strings, traditional string analysis methods usually use Named Entity Recognition (NER). However, the clues to be analyzed mainly consist of strings and paths. The strings before and after a path separator have few semantic correlations. What is more, the strings between path separators and the remaining strings to be analyzed are mostly semantically unrelated because of their limited length. This paper therefore proposes an algorithm that can quickly and automatically identify the traceable strings in strings and paths.
To filter out meaningful traceable clues related to the attacker's identity, the path clues and string clues are split into strings, which are then checked for common words and gibberish in the following steps. Figure 4 shows the process of the automatic traceable clue identification system.
Figure 3: Accuracy of two language identification tools (bar chart of correct and incorrect recognitions by the langid and langdetect toolkits for English, Korean, Russian, and Chinese). In tests using the same known-language path clues, the accuracy of the langid toolkit for English, Chinese, Russian, and Korean was 88%, 100%, 99%, and 99.0%, respectively, while the accuracy of the langdetect toolkit was 73%, 0.85%, 40%, and 4.60%, respectively.

Figure 4: Process of automatically identifying traceable clues (flowchart with the stages Path Clues, Path Split, Stop Words, String Clues, Sentence Tokenize, Word Tokenize, Strings, Sign Split, Alphabetic Case Characteristics, Common Words Detect, Gibberish Detect, and Result).

Make Stop Words. The system splits each path by the path delimiter. Path segments that are common to multiple computers have no identifying effect, so we take the file name strings that are common to 20 normal user computers as stop words. The system then removes the stop words from the strings after each split.
Segmentation. After removing the stop words, we separate the sentences and words in each segment of the path with a tokenizer (e.g., NLTK, NER) [37]. However, as the segments of a path are not semantically related, this segmentation is not very effective. Based on our extensive study of path information, users often prefer to use symbols such as underscores to split file names. In addition, they also like to use capitalization in multiword strings to facilitate reading. As a result, the system uses the common separator symbols and alphabetic case characteristics to segment the strings again.
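The stop word and segmentation steps can be sketched as follows. The separator set, the case-splitting pattern, and the sample stop words are assumptions for illustration; the system described above derives its stop words from 20 normal user computers and also applies a tokenizer first.

```python
import re
from typing import List, Set


def split_path(path: str) -> List[str]:
    """Split a path into its non-empty segments."""
    return [segment for segment in re.split(r"[\\/]+", path) if segment]


def remove_stop_words(segments: List[str], stop_words: Set[str]) -> List[str]:
    """Drop segments that are common to ordinary hosts (case-insensitive)."""
    return [s for s in segments if s.lower() not in stop_words]


def segment(text: str) -> List[str]:
    """Split on separator symbols first, then on alphabetic case changes."""
    tokens = []
    for chunk in re.split(r"[_\-.\s]+", text):
        # Acronym runs, capitalized or lowercase words, and digit runs.
        tokens.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", chunk))
    return tokens
```

With stop words such as `c:`, `users`, and `desktop`, the path `C:\Users\Dell\Desktop\RansomTracer_demoFile.txt` is reduced to the segments `Dell` and `RansomTracer_demoFile.txt`, and the latter is then split into `Ransom`, `Tracer`, `demo`, `File`, and `txt`.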
Common Words Removal. After segmentation, the system obtains many repeated strings. Most of them are commonly used words and do not help in identifying ransomware attackers. As a result, we filter out the common words by comparing each string with a dictionary. If the string is found in the dictionary, the system marks the common word property of the string as false; otherwise, it marks it as true.
Gibberish Detect. Based on our analysis and observation, we found that there is a large amount of garbled information in path clues. Most of these strings are generated randomly; people can recognize this randomness manually, but it is difficult for a program to recognize it automatically. As a result, we propose detecting gibberish by training a Markov chain model on English texts. For each string X, the probability of the (i+1)-th character depends only on the i-th character:

\[ P(X_{i+1} = x \mid X_1 = x_1, X_2 = x_2, \ldots, X_i = x_i) = P(X_{i+1} = x \mid X_i = x_i) \tag{2} \]
Each letter is related to the letters immediately before and after it. This relationship can be expressed with 2-grams. For example, the string "Ransomware" yields the pairs "Ra", "an", "ns", ..., "e[space]". The analysis system records how often characters appear next to each other in a collection of gibberish and some commonly used normal phrases or vocabulary. It normalizes the counts after reading through the training data and then measures the probability of generating a string by multiplying the probabilities of the pairs of adjacent characters in the string [38]. From the training data, the system also computes the average transition probability of gibberish strings and of normal strings, respectively. This probability measures how likely the string is according to the data observed by the model. When we test the string "Ransom", it computes

\[ prob[\text{'Ransom'}] = prob[\text{'R'}][\text{'a'}] \times prob[\text{'a'}][\text{'n'}] \times \cdots \times prob[\text{'m'}][\text{' '}] \tag{3} \]
If the input string is gibberish, it will pass through some pairs with very low counts in the training phase and hence have a low probability. The system then looks at the probability per character for a few known normal strings and a few known examples of gibberish and picks a threshold between the highest probability among the gibberish strings and the lowest probability among the normal strings:

\[ \mathit{threshold} = \frac{\min(\mathit{normal\_probs}) + \max(\mathit{gibberish\_probs})}{2} \tag{4} \]
When we analyze a string 'X', if prob['X'] > threshold, the analysis system views the string as a normal string and marks it 'True'. If prob['X'] ≤ threshold, the analysis system marks the string 'False'. The system then removes the strings marked 'False' as gibberish.
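Equations (2) to (4) can be sketched with a character bigram model. The tiny training text, the add-one smoothing, the restriction to lowercase letters and space, and the use of the geometric mean as the per-transition average are all assumptions made to keep the example self-contained; a practical model would be trained on a large English corpus.

```python
import math
from typing import Dict, Iterable, List

ALPHABET = "abcdefghijklmnopqrstuvwxyz "

# Hypothetical miniature corpus, repeated so observed pairs outweigh smoothing.
TRAINING_LINES = [
    "the ransomware attacker encrypts the files and demands a ransom",
    "remote desktop sessions are monitored in the deception environment",
    "the program stores strings and path clues in the database",
] * 20

Model = Dict[str, Dict[str, float]]


def normalize(text: str) -> List[str]:
    """Keep only lowercase letters and spaces."""
    return [ch for ch in text.lower() if ch in ALPHABET]


def train(lines: Iterable[str]) -> Model:
    """Count adjacent character pairs and convert each row to log probabilities."""
    counts = {a: {b: 1 for b in ALPHABET} for a in ALPHABET}  # add-one smoothing
    for line in lines:
        chars = normalize(line)
        for first, second in zip(chars, chars[1:]):
            counts[first][second] += 1
    model = {}
    for first, row in counts.items():
        total = sum(row.values())
        model[first] = {second: math.log(n / total) for second, n in row.items()}
    return model


def avg_transition_prob(text: str, model: Model) -> float:
    """Geometric mean of the transition probabilities along the string."""
    chars = normalize(text)
    pairs = list(zip(chars, chars[1:]))
    if not pairs:
        return 0.0
    log_sum = sum(model[first][second] for first, second in pairs)
    return math.exp(log_sum / len(pairs))


def pick_threshold(model: Model, normal: List[str], gibberish: List[str]) -> float:
    """Midpoint between the weakest normal string and the strongest gibberish."""
    normal_probs = [avg_transition_prob(s, model) for s in normal]
    gibberish_probs = [avg_transition_prob(s, model) for s in gibberish]
    return (min(normal_probs) + max(gibberish_probs)) / 2


def is_normal(text: str, model: Model, threshold: float) -> bool:
    """True for a normal-looking string, False for gibberish."""
    return avg_transition_prob(text, model) > threshold
```

Working with log probabilities avoids numerical underflow when long strings are scored, and averaging per transition keeps scores comparable across strings of different lengths.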
Table 1: Traceable strings recognized result [13].

String      Common Words Recognized   Gibberish Recognized   Final Recognized
Wh***ter    True                      True                   True
gandcrab    True                      True                   True
kate z      True                      True                   True
vwrtjty     True                      False                  False
program     False                     True                   False
Identification Result. After the above analysis, the automated analysis system outputs the strings for which both the common word tag and the gibberish tag are True; this is the list of traceable clues [13]. Table 1 is reproduced from Z. H. Wang et al. (2018) [under the Creative Commons Attribution License/public domain].
When the string is a normal word, such as "program", the common words recognition marks it "False". The gibberish recognition marks as "True" only strings that were not randomly generated; it marks as "False" strings that do not conform to the spelling patterns of common words (e.g., vwrtjty). A string is considered to have auxiliary traceability only if both of the analysis values are true. We rely extensively on string inversion to verify the accuracy of the system, and we find that most of the traceable strings on the volunteers' storage can be recognized. Table 1 shows the recognition results for several typical strings.
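The way the two tags combine can be sketched in a few lines. The dictionary contents and the `is_normal` callback are placeholders (hypothetical names); in the system described above the second tag would come from the Markov chain test.

```python
from typing import Callable, Iterable, List, Set


def common_word_tag(text: str, dictionary: Set[str]) -> bool:
    """False for an ordinary dictionary word, True otherwise."""
    return text.lower() not in dictionary


def identify(strings: Iterable[str], dictionary: Set[str],
             is_normal: Callable[[str], bool]) -> List[str]:
    """Keep the strings whose common word tag and gibberish tag are both True."""
    return [s for s in strings
            if common_word_tag(s, dictionary) and is_normal(s)]
```

A string such as "program" is dropped by the first tag and a string such as "vwrtjty" by the second, matching the corresponding rows of Table 1.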
6 Evaluation
In this paper, we evaluate the proposed method with two experiments. The goal of the first experiment is to demonstrate that the proposed method can help trace back RDP-based ransomware attackers and capture their private information. The goal of the second experiment is to demonstrate that the method can also automatically recognize clues in ransomware samples and help trace back ransomware makers.
6.1 RDP-Based Ransomware Attacker
6.1.1 Clue Capture and Analysis. We used Windows 7 virtual machines to deploy the deception environment and assess its effectiveness. While Windows 7 is not required, it was chosen because of its wide range of applications and because it is one of the main targets of ransomware. We invited 122 professional volunteers to help with the experiment and provided them with an experimental fleet of 12 virtual hosts. Most of the volunteers' login information, clipboard content, shared folder paths, and uploaded PE files were successfully captured by the monitor system.
We chose the path information collected from 12 volunteers' computers and recorded the rate of convergence after each step of the analysis. Figure 5 shows how the number of remaining traceable clues changes as the data are processed by each step of the analysis system [13]. Figure 5 is reproduced from Z. H. Wang et al. (2018) [under the Creative Commons Attribution License/public domain].
Figure 5: The data volume changes in 12 volunteers' traceable clues through the analysis system (line chart of data quantity against the analysis process steps source, stopword, tokenize, gibberish, and result, with one series per volunteer). The screening rate can reach 98.34% [13].
6.2 Traceback Attackers. Next, we carried out a detailed analysis of the information left over from one volunteer's attack. From the data produced by the automatic analysis of this attacker, we can infer that the attacker's real identity may have the following characteristics: (1) the attacker is likely to come from China; (2) the attacker may work in security or a related field; (3) the attacker may work at the Chinese Academy of Sciences and have relations with a large security manufacturer in China. The conclusion is shown in Figure 6, which presents the results from the analysis system in five sections.
Username. Through automatic analysis, a total of three user names on the attacker's host were extracted from a large number of clues. From the "Dell" user name, it can be assumed that the attacker was using a Dell host.
Programs. The automatic analysis system identified several typical software programs installed on the attacker's host. For instance, "QQ" is a widely used real-time social tool in China; "AliPay" is an online payment tool developed by Alibaba and widely used in China; "Sogou input" is Chinese input method software developed by the Chinese company Sogou; and "MeiTu" is photo beautification software developed by a Chinese company and widely used in China. In addition, "Sinfor" and "360" are both well-known security manufacturers in China with a large number of security-related products, and "Visual Studio" is common development software. We find that much Chinese-made software widely used in China was installed on the attacker's host, as well as some development and security products that ordinary users install less often.

Figure 6: Analysis of an attacker using the automatic analysis system, which displays traceable information about the attacker in five aspects: host user name, program software, keyboard layout, account information, and traceable strings.
Keyboard Layout. Through network communication analysis, we find that the attacker's keyboard layout is Simplified Chinese, from which it can be inferred that the attacker's native language is probably Chinese. This is consistent with the software installed on the host.
Account Number. The system automatically identified two QQ accounts and two e-mail accounts. The e-mail accounts are Chinese Academy of Sciences business accounts, which supports the above assumption that the attacker is from China and is likely to work at the Chinese Academy of Sciences. Through the registration information of the QQ account (as shown in Figure 7), we can find the following information: (1) the attacker could be a male aged 32; (2) the attacker does Internet-related work, likely with 360 Security; (3) the attacker is located in Haidian District, Beijing, China. In the registration information of the "353***208" account, the mail, work, and other fields are empty and the nickname is a distinctive English string, "Wh***ter", so it can be presumed that the account is likely for private use.
Figure 7: The registration infographic of QQ account 372***82.

Traceable Strings. The automatic analysis system sifts out strings from the monitors that may be traceable. "Freebuf" is a security information exchange website commonly used by Chinese security personnel. "Botnet" is a secondary attack method that is often used by attackers and is often a main research direction of security researchers. "**e" and "**t" are acronyms for departments under the Chinese Academy of Sciences, while "A***team" is a security research team in the "**e" department. "Visumantrag" is often used when processing visas from various countries. "Wh***ter" matches the nickname of the QQ account "353***208" and is likely to be the attacker's regular nickname.
6.3 Ransomware Maker Automatic Analysis. To verify that the analysis system can be used to trace back ransomware makers, we obtained more than 8000 ransomware samples from VirusTotal [39] and extracted the Program Data Base (PDB) path from the samples (as shown in Figure 8). The result shows that, although a large number of attackers deliberately erase the compilation information, legacy PDB path information can still be found in a large number of ransomware samples.

Table 2: Same identifier for different samples [13].
Sample Hash / Same Identity Strings
68dd613973f8a***7befe0f4b461d258ce569b9020079***9ae0f090266a60cc
jqxxpphoaonkje7z2fa0c084aef41969***1f427a1b65e1b1464fa82d297e712***e1fdf312c044814a36da0286d42cd***306fba99239c8238821beaf40744be***ba862dbc6d90f70cbc839bcb88***7422bbd2fa812dd3de02391b6a9f1***0762a6f49cf32501

Figure 8: Of the 8075 real ransomware samples, the Program Data Base (PDB) path could be extracted from 11% (878 samples), while the authors of the remaining 89% (7197 samples) may have chosen to hide the PDB path during compilation.
What is more, the analysis system was used to automatically analyze the PDB paths of more than 800 different ransomware samples. We found identity strings shared by the analysis results of different samples. One of the typical results is shown in Table 2 [13]. Table 2 is reproduced from Z. H. Wang et al. (2018) [under the Creative Commons Attribution License/public domain]. We used VirusTotal to validate the samples and found that they were all Cobra family ransomware. As a result, it could be assumed that these samples were made by the same ransomware maker. When one sample is traced back, the conclusion about the ransomware maker also applies to the other samples. When different traceable information is extracted from different samples, the identity of the ransomware maker can be described by integrating the information.
Language Identification. When we perform language recognition on the PDB paths of the 878 samples, the result is as shown in Figure 9. The system automatically identifies the languages of the paths, which mainly comprise 11 languages, of which English accounts for the largest proportion.
Figure 9: Of the 878 ransomware sample PDB paths, English accounted for 510 (58.09%), and the remaining identified languages were Chinese 31 (3.53%), Dutch 40 (4.56%), Norwegian Nynorsk 4 (0.46%), Slovenian 5 (0.57%), Norwegian Bokmål 1 (0.11%), German 58 (6.61%), Italian 3 (0.34%), Hungarian 35 (3.99%), Maltese 18 (2.05%), Polish 5 (0.57%), Finnish 151 (17.20%), Luxembourgish 3 (0.34%), Danish 10 (1.14%), and Spanish 4 (0.46%).

Actual Case Analysis. The automatic identification system helped us identify the traceable strings in the PDB path. Take as an example the PDB path of a sample (the Chinese has been translated into English) with an MD5 value of "60c6a92afb***6d0c16f7":

"ELIAOSTUDIOUACsimulated an improved version of the 360 anti-virus program 1117 Bale1*41**2****0(**1) 9818 programWriteSystem32ReleaseVirtualDesktoppdb"
From the automatic analysis result, we can infer that the ransomware maker is likely from China. This is supported by the following.

(1) Language Identification. Identifying the language of the path suggests that the attacker may have come from China. Reading the Chinese strings in the path, we find that the text describes an improvement intended to bypass detection by 360 security software, which has a large market in China.

(2) IP Address. The system identified an IP address in the PDB path. With the help of a threat intelligence database, we found that the IP address came from Xinjiang, China (Figure 10).
Figure 10: Partial infographic of IP 1*41**2****0 in the threat intelligence library.
(3) Traceable Strings. The extracted identifier string "LIAO" resembles Chinese pinyin and is most likely a Chinese surname. To sum up, we speculate that the ransomware maker is most likely from China.
7 Conclusion
In this paper, we focus on tracing back RDP-based ransomware. Recently, more and more ransomware attackers have been using RDP attacks to spread ransomware, acting with impunity because of their use of strong cryptography. We propose tracing back the attack sources to deter RDP-based ransomware with the proposed deception environment. The system collects traceable clues and performs automatic analysis using natural language processing and machine learning techniques. The evaluation shows that it is able to trap attackers and collect the traceable clues they leave in the deception environment. With automatic clue identification, it can reduce the amount of traceable clues to about 2% of the collected data. We use this method to analyze two practical cases and show its effectiveness. By tracing back ransomware attackers, we can provide a strong deterrent to stifle the development of ransomware.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Additional Points
This manuscript provides additional deception environment monitors and analytical methods and uses more volunteers and samples. It demonstrates the validity of the methods through specific traceback cases (i.e., tracing back the ransomware attacker and tracing back the ransomware maker).
Disclosure
An earlier version of this paper was presented at the IEEE Third International Conference on Data Science in Cyberspace, Guangzhou, China, 18-21 June 2018. The authors' initial conference paper focused mainly on the effectiveness of the deception environment monitors and the screening rate of the analysis system.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
The authors would like to thank VirusTotal for providing ransomware samples and the volunteers from the University of Chinese Academy of Sciences. This work is supported in part by the National Key Research and Development Plan (Grant no. 2018YFB0803504) and the National Natural Science Foundation of China (61871140, 61572153, U1636215, 61572492, and 61672020).
References
[1] A. Kharraz, "UNVEIL: A Large-Scale, Automated Approach to Detecting Ransomware," in Proceedings of the 25th USENIX Security Symposium (USENIX Security 16), pp. 757–772, USENIX Association, 2016.
[2] K. Savage, P. Coogan, and H. Lau, "The evolution of ransomware," Tech. Rep., 2015.
[3] A. Kharraz, W. Robertson, D. Balzarotti, L. Bilge, and E. Kirda, "Cutting the Gordian Knot: A Look Under the Hood of Ransomware Attacks," in Detection of Intrusions and Malware, and Vulnerability Assessment, vol. 9148 of Lecture Notes in Computer Science, pp. 3–24, Springer International Publishing, Cham, 2015.
[4] S. Mohurle and M. Patil, "A brief study of Wannacry Threat: Ransomware Attack 2017," International Journal of Advanced Research in Computer Science, vol. 8, no. 5, 2017.
[5] X. Yu, Z. Tian, J. Qiu, and F. Jiang, "A Data Leakage Prevention Method Based on the Reduction of Confidential and Context Terms for Smart Mobile Devices," Wireless Communications and Mobile Computing, vol. 2018, Article ID 5823439, 11 pages, 2018.
[6] J. Yaneza, "Brute Force RDP Attacks Plant CRYSIS Ransomware," https://blog.trendmicro.com/trendlabs-security-intelligence/brute-force-rdp-attacks-plant-crysis-ransomware.
[7] "10% of Ransomware Attacks on SMBs Targeted IoT Devices," https://www.darkreading.com/application-security/10--of-ransomware-attacks-on-smbs-targeted-iot-devices-/d/d-id/1329817.
[8] Q. Tan, Y. Gao, J. Shi, X. Wang, B. Fang, and Z. H. Tian, "Towards a Comprehensive Insight into the Eclipse Attacks of Tor Hidden Services," IEEE Internet of Things Journal, 2018.
[9] "Kaspersky Security Bulletin: STORY OF THE YEAR 2017," 2017, https://media.kasperskycontenthub.com/wp-content/uploads/sites/43/2018/03/07164824/KSB_Story_of_the_Year_Ransomware_FINAL_eng.pdf.
[10] Z. Tian, Y. Cui, L. An et al., "A Real-Time Correlation of Host-Level Events in Cyber Range Service for Smart Campus," IEEE Access, vol. 6, pp. 35355–35364, 2018.
[11] R. Ross et al., Managing Information Security Risk: Organisation, Mission, and Information System View, National Institute of Standards and Technology, 2011, http://csrc.nist.gov/publications/nistpubs/800-39/SP800-39-final.pdf.
[12] Y. Wang, Z. Tian, H. Zhang, S. Su, and W. Shi, "A Privacy Preserving Scheme for Nearest Neighbor Query," Sensors, vol. 18, no. 8, p. 2440, 2018.
[13] Z. H. Wang, X. Wu, C. G. Liu, Q. X. Liu, and J. L. Zhang, "RansomTracer: Exploiting Cyber Deception for Ransomware Tracing," in Proceedings of the IEEE Third International Conference on Data Science in Cyberspace, pp. 227–234, 2018.
[14] B. A. S. Al-rimy, M. A. Maarof, and S. Z. M. Shaid, "Ransomware threat success factors, taxonomy, and countermeasures: A survey and research directions," Computers & Security, vol. 74, pp. 144–166, 2018.
[15] F. Jiang, Y. Fu, B. B. Gupta et al., "Deep Learning based Multi-channel intelligent attack detection for Data Security," IEEE Transactions on Sustainable Computing, 2018.
[16] N. Andronio, S. Zanero, and F. Maggi, "HelDroid: dissecting and detecting mobile ransomware," in Research in Attacks, Intrusions, and Defenses, vol. 9404 of Lecture Notes in Computer Science, pp. 382–404, Springer, 2015.
[17] F. Mercaldo, V. Nardone, A. Santone, and C. A. Visaggio, "Ransomware Steals Your Phone. Formal Methods Rescue It," in Formal Techniques for Distributed Objects, Components, and Systems: 36th IFIP WG 6.1 International Conference, FORTE 2016, held as part of the 11th International Federated Conference on Distributed Computing Techniques, DisCoTec 2016, E. Albert and I. Lanese, Eds., pp. 212–221, Springer International Publishing, 2016.
[18] D. Sgandurra, L. Muñoz-González, R. Mohsen, and E. C. Lupu, "Automated Dynamic Analysis of Ransomware: Benefits, Limitations and use for Detection," https://arxiv.org/abs/1609.03020.
[19] S. Song, B. Kim, and S. Lee, "The Effective Ransomware Prevention Technique Using Process Monitoring on Android Platform," Mobile Information Systems, vol. 2016, Article ID 2946735, 9 pages, 2016.
[20] T. Yang, Y. Yang, K. Qian, D. C.-T. Lo, Y. Qian, and L. Tao, "Automated detection and analysis for android ransomware," in Proceedings of the 17th IEEE International Conference on High Performance Computing and Communications, IEEE 7th International Symposium on Cyberspace Safety and Security, and IEEE 12th International Conference on Embedded Software and Systems, HPCC-ICESS-CSS 2015, pp. 1338–1343, USA, August 2015.
[21] N. Scaife, H. Carter, P. Traynor, and K. R. B. Butler, "CryptoLock (and Drop It): Stopping Ransomware Attacks on User Data," in Proceedings of the 36th IEEE International Conference on Distributed Computing Systems, ICDCS 2016, pp. 303–312, Japan, June 2016.
[22] M. M. Ahmadian and H. R. Shahriari, "2entFOX: A framework for high survivable ransomwares detection," in Proceedings of the 13th International ISC Conference on Information Security and Cryptology, ISCISC 2016, pp. 79–84, Iran, September 2016.
[23] M. M. Ahmadian, H. R. Shahriari, and S. M. Ghaffarian, "Connection-monitor & connection-breaker: A novel approach for prevention and detection of high survivable ransomwares," in Proceedings of the 12th International ISC Conference on Information Security and Cryptology, ISCISC 2015, pp. 79–84, Iran, September 2015.
[24] D. Kim, W. Soh, and S. Kim, "Design of Quantification Model for Prevent of Cryptolocker," Indian Journal of Science and Technology, vol. 8, no. 19, 2015.
[25] C. Moore, "Detecting Ransomware with Honeypot Techniques," in Proceedings of the 2016 Cybersecurity and Cyberforensics Conference (CCC), pp. 77–81, Amman, Jordan, August 2016.
[26] F. Mbol, J. Robert, and A. Sadighian, "An Efficient Approach to Detect TorrentLocker Ransomware in Computer Systems," in Cryptology and Network Security, S. Foresti and G. Persiano, Eds., vol. 10052 of Lecture Notes in Computer Science, pp. 532–541, Springer International Publishing, Cham, 2016.
[27] K. Cabaj, P. Gawkowski, K. Grochowski, and D. Osojca, "Network activity analysis of CryptoWall ransomware," Przegląd Elektrotechniczny, vol. 91, no. 11, pp. 201–204, 2015.
[28] J. Chen, Z. Tian, X. Cui, L. Yin, and X. Wang, "Trust Architecture and Reputation Evaluation for Internet of Things," Journal of Ambient Intelligence & Humanized Computing, vol. 2, pp. 1–9, 2018.
[29] C. Le Guernic and A. Legay, "Ransomware and the Legacy Crypto API," in Risks and Security of Internet and Systems: 11th International Conference, CRiSIS 2016, Roscoff, France, 2017.
[30] L. Pingree, Emerging Technology Analysis: Deception Techniques and Technologies Create Security Technology Business Opportunities, Gartner, 2015.
[31] "Ransomware and Businesses," https://www.symantec.com/content/dam/symantec/docs/security-center/white-papers/ransomware-and-businesses-16-en.pdf, 2016.
[32] A. Kharraz and E. Kirda, "Redemption: Real-Time Protection Against Ransomware at End-Hosts," in Proceedings of the International Symposium on Research in Attacks, Intrusions, and Defenses, pp. 98–119, 2017.
[33] F. Bergstrand, J. Bergstrand, and H. Gunnarsson, Localization of Spyware in Windows Environments.
[34] "PEView," https://www.aldeid.com/wiki/PEView.
[35] "python-pefile," https://pypi.python.org/pypi/pefile.
[36] M. Lui and T. Baldwin, "langid.py: An off-the-shelf language identification tool," in Proceedings of the ACL 2012 System Demonstrations, pp. 25–30, Association for Computational Linguistics, 2012.
[37] S. Bird and E. Loper, "NLTK: the natural language toolkit," in Proceedings of the ACL 2004 on Interactive Poster and Demonstration Sessions, Association for Computational Linguistics, Barcelona, Spain, July 2004.
[38] "Rrenaud Gibberish-Detector," https://github.com/rrenaud/Gibberish-Detector/blob/master/README.rst.
[39] "VirusTotal," http://virustotal.com.
Figure 2: RDP-based ransomware attack traceback system process. The diagram shows five stages: (1) construct deception environment, (2) environment monitor, (3) clue extraction, (4) automated analysis, and (5) result. The cyber deception environment spans a network layer, a host layer, and a file layer, which are watched by the login, communication, clipboard, process, shared folder, and file monitors (the latter with ransomware detection). The extracted login, remote host, clipboard, compile, and path clues feed users and programs analysis, network address analysis, language identification, and traceable strings identification, which yield auxiliary traceable clues.
The IP Helper API makes it possible to get and modify network configuration settings for the local host. It consists of the DLL file iphlpapi.dll and includes functions that can retrieve information about protocols such as TCP and UDP [33]. As a result, the login monitor can directly access the data buffers involved in transmission control protocols, routing tables, network interfaces, and network protocol statistics.
4.1.2. The Communication Monitor. The communication monitor is responsible for capturing network traffic. By locating the TCP packets that carry the interactive configuration information of an RDP login, the monitor obtains the RDP connection information, which serves as the attacker's personal information. Such a packet has four main characteristics: (1) its name is often ClientData; (2) it usually appears right after the TCP three-way handshake; (3) it sits near the front of the data stream; and (4) its amount of data is significantly larger than that of neighboring packets.
4.2. At the Host Layer

4.2.1. The Deception Host. According to our observation, attackers log in to a remote host mainly by (1) logging in directly with a weak password or (2) adding an access account through a vulnerability and then logging in. As a result, we deliberately set weak passwords on the administrator accounts of the deception environment and leave common vulnerabilities, such as EternalBlue, in the environment to attract attackers. To guide attackers to upload ransomware using only the clipboard and shared folders, we block traffic transfers from outside the environment and close common transfer ports (e.g., ports 20, 21, 80, and 443).
4.2.2. The Clipboard Monitor. The clipboard monitor obtains clues in real time by monitoring the clipboard's changes. It uses the Clipboard Viewer to listen for changes in the clipboard without affecting its contents. The Clipboard Viewer is a mechanism that can get and display the contents of the clipboard. As Windows applications are message-driven, the key to the monitor is responding to and processing clipboard change messages. When the content changes, the WM_DRAWCLIPBOARD message is sent to the first window of the Clipboard Viewer Chain. After each Clipboard Viewer window responds to and processes the message, it must pass the message on to the next window, using the handle of the next window saved in its linked list. Through its window, the clipboard monitor obtains the clipboard's new contents by calling the "GetClipboardData" Windows API. When a subsequent copy or cut operation is executed, the data in the clipboard are rewritten. The clipboard monitor therefore listens in real time and writes the captured information to a log file, which is updated whenever the monitor receives a clipboard change notice. To prevent the updated log file from being detected by the attacker or encrypted by ransomware, the monitor sends it to a secure host and completely erases it from the environment.
4.2.3. The Process Monitor. To run a program on a Windows system, a new process must be created. The monitor obtains the PE file run by the attacker by monitoring the processes in the environment. It first records the state of the processes commonly used in the deception environment before an RDP-based ransomware attack. After the login monitor detects such an attack, the process monitor watches for newly created processes through the Windows API "CreateToolhelp32Snapshot", which takes status snapshots of all processes in real time, including the process identifier (PID). When a suspicious process starts, the monitor recognizes it by its PID and looks up the process's running path with the help of "GetModuleFileNameEx". Finally, the monitor finds the suspicious program's PE file through the path and copies it to the secure host.
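The baseline-versus-snapshot comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' monitor: the snapshots are plain dictionaries mapping PID to image path, and the names `ProcessInfo` and `new_processes` are hypothetical. A real monitor would fill these dictionaries from "CreateToolhelp32Snapshot" and "GetModuleFileNameEx".

```python
from typing import Dict, List, NamedTuple


class ProcessInfo(NamedTuple):
    pid: int
    image_path: str


def new_processes(baseline: Dict[int, str], snapshot: Dict[int, str]) -> List[ProcessInfo]:
    """Return processes present in `snapshot` but not in `baseline`.

    A PID that was reused for a different executable also counts as new,
    because the image path no longer matches the recorded one.
    """
    suspicious = []
    for pid, path in sorted(snapshot.items()):
        if baseline.get(pid) != path:
            suspicious.append(ProcessInfo(pid, path))
    return suspicious


# Hypothetical snapshots: one recorded before the RDP login, one taken after it.
before = {4: "System", 812: r"C:\Windows\explorer.exe"}
after = {4: "System", 812: r"C:\Windows\explorer.exe", 2040: r"C:\Users\Public\payload.exe"}
for proc in new_processes(before, after):
    print(proc.pid, proc.image_path)
```

Comparing (PID, path) pairs rather than PIDs alone is a design choice of this sketch; it avoids missing a process that happens to receive a recycled PID.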
4.3. At the File Layer

4.3.1. Deception Files. Deception files are constructed with two goals. First, we need to make the attacker believe that the deception environment is a real user's host. Second, we need to make the attacker believe that there are resources in the environment that are worth attacking.
To simulate a realistic environment, we deploy large numbers of different types of files on it, for example, images, audio files, database files, and documents that can be accessed in a Windows session. Based on Amin Kharraz's research [1], we created four file categories that ransomware always tries to find and encrypt: documents (*.txt, *.doc(x), *.ppt(x), *.xls(x), *.pdf, and *.py), keys and licenses (*.key, *.pem, *.crt, and *.cer), file archives (*.zip, *.rar), and media (*.jp(e)g, *.mp3, and *.avi). We obtain these files in three ways. First, we create files with valid headers and content using standard libraries (e.g., python-docx, python-pptx, pdfkit, and OpenSSL). Second, using Google search syntax and crawler technology, we download a large number of files from the Internet. Third, we collect a number of nonconfidential documents from the hosts of 20 volunteers to emulate actual user environments. When we assign user files to the deception environment, the path length is generated randomly, and each folder may contain a random set of subfolders. For each folder, a subset of extensions is randomly selected. Furthermore, each directory name is generated from meaningful words. Consequently, the generated paths and extensions give the user files variable depth and meaningful content.
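The path-generation rules in this paragraph (random depth, meaningful directory names, a random extension subset per folder) can be sketched as follows. The word list, the extension list, and the function name `generate_decoy_paths` are illustrative assumptions; the sketch only produces path names and does not create file contents.

```python
import random
from pathlib import PureWindowsPath

# Hypothetical vocabulary and extension pool, chosen for illustration only.
WORDS = ["finance", "reports", "project", "backup", "photos", "contracts", "archive", "invoices"]
EXTENSIONS = [".txt", ".docx", ".pptx", ".xlsx", ".pdf", ".py", ".key", ".pem", ".zip", ".jpg", ".mp3"]


def generate_decoy_paths(root, n_files, max_depth=4, seed=None):
    """Generate decoy file paths with random depth and word-based names."""
    rng = random.Random(seed)
    folder_exts = {}  # each folder keeps its own random subset of extensions
    paths = []
    for _ in range(n_files):
        depth = rng.randint(1, max_depth)
        folder = PureWindowsPath(root, *(rng.choice(WORDS) for _ in range(depth)))
        if folder not in folder_exts:
            folder_exts[folder] = rng.sample(EXTENSIONS, rng.randint(1, 4))
        stem = "_".join(rng.sample(WORDS, 2))
        paths.append(folder / (stem + rng.choice(folder_exts[folder])))
    return paths


for path in generate_decoy_paths(r"C:\Users\Alice\Documents", 5, seed=7):
    print(path)
```

Passing a seed makes the layout reproducible, which is convenient when the same deception environment has to be rebuilt on several virtual hosts.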
To make the simulated environment appear more valuable, we deploy bait information on it, such as databases, false code comments, digital certificates, administrator passwords, SSH keys, VPN keys, browser history, passwords, ARP records, and DNS records. When an attacker obtains the bait information, it may entice the attacker to attack the deception environment.

4.3.2. The File Monitor. The file monitor detects ransomware by monitoring file type changes and file entropy changes, a method proposed in 2016 [21]. The type of data stored in a file can be described by the order and position of specific byte values unique to that file type. Since files generally retain their file types and formatting in the deception environment, the bulk modification of such files should be considered suspicious. When the monitor sees this type of change, we can infer that a ransomware attack has occurred.
Entropy expresses the randomness of the characters in a string. The higher the entropy value, the stronger the randomness. The Shannon entropy of an array of bytes can be computed as the sum

$$ e = \sum_{i=0}^{255} P_{B_i} \log_2 \frac{1}{P_{B_i}} \tag{1} $$

for $P_{B_i} = F_i / \mathit{totalbytes}$, where $F_i$ is the number of instances of byte value $i$ in the array. The entropy value is a number from 0 to 8, and a value of 8 represents a byte array with a completely uniform distribution. Since each byte value occurs with nearly the same probability in encrypted ciphertext, the entropy of an encrypted file will be close to the upper limit. Because ransomware always encrypts a large number of files, when we detect that a file changes to a high-entropy file in a short period of time and also changes its file type, we assume that the file is subject to a ransomware attack.
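As a worked illustration of equation (1), the following Python sketch computes the Shannon entropy of a byte array and combines it with a simple file type check. The 7.5 threshold and the comparison of the first four bytes as a stand-in for file type detection are assumptions made for this example, not values taken from the paper.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte, following equation (1): sum of P * log2(1/P)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # F_i for every byte value that occurs
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def looks_encrypted(before: bytes, after: bytes, threshold: float = 7.5) -> bool:
    """Flag a file whose content became high-entropy and whose leading bytes changed.

    Comparing the first four bytes is a crude proxy for a file type change.
    """
    return shannon_entropy(after) >= threshold and after[:4] != before[:4]


plain = b"%PDF-1.4 quarterly report " * 100
cipher_like = bytes(range(256)) * 16  # perfectly uniform byte distribution
print(round(shannon_entropy(plain), 2), shannon_entropy(cipher_like))
print(looks_encrypted(plain, cipher_like))
```

A uniform distribution over all 256 byte values reaches the upper limit of 8 bits per byte, while ordinary text stays far below it, which is why the two conditions together separate encrypted output from normal edits.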
4.3.3. The Shared Folder Monitor. By traversing the disk storage in real time, the shared folder monitor discovers updates to the shared folders as they happen. It obtains information about the attacker's local files, which the attacker often does not notice, thus revealing some unexpected traceable clues. The monitor can access a list of paths to the attacker's shared folders.

As we originally observed, folders shared through Remote Desktop often appear on the remote host under a path with the prefix "\\tsclient". When the monitor's traversal reaches this prefix, it uses "FindFirstFile" to find the first file and then uses "FindNextFile" with the returned handle to find the next file. When the resulting entry is a folder, the monitor continues to traverse all files under that folder. Initially, the monitor tried to get the full file names and file contents by traversing the new shared folder. However, the experiments showed that, as the number of files in storage grows, the monitor takes far more time and resources to get all the file contents than to get just the file paths, and a full traversal is more likely to alert the attacker. Therefore, the monitor only obtains the paths of the files that the attacker shares from its host, with the help of "GetFileName". Moreover, to prevent encryption by the ransomware, the shared folder monitor directly transfers the acquired list of shared file paths to another secure host.
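The paths-only traversal can be sketched with Python's standard library. This is an assumption-laden stand-in for the "FindFirstFile"/"FindNextFile" loop: `os.walk` plays the role of the recursive traversal, and the function name `list_shared_paths` and its `limit` parameter are hypothetical.

```python
import os
from typing import Iterator, Optional


def list_shared_paths(root: str, limit: Optional[int] = None) -> Iterator[str]:
    """Yield file paths under `root` without opening any file.

    Only directory listings are read, which keeps the traversal cheap;
    `limit` optionally caps the number of paths collected.
    """
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if limit is not None and count >= limit:
                return
            yield os.path.join(dirpath, name)
            count += 1


# Hypothetical use on a redirected drive inside the RDP session:
# for path in list_shared_paths(r"\\tsclient\C", limit=10000):
#     send_to_secure_host(path)  # placeholder for the transfer step
```

Because the generator never reads file contents, its cost grows with the number of directory entries rather than with the volume of data on the attacker's drive.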
5. Clue Extraction and Analysis

Through the deception environment, we trap the ransomware attacker and collect a large amount of information that may contain many traceable clues. However, such traceable clues are often not visually observable and are complex in nature. In addition, many of the above clues contain information that is not helpful in tracing back ransomware attackers. Therefore, to assist in the analysis of the monitor information and to extract the effective traceable clues, this section describes how we extract clues and how we then analyze them with an automatic approach.

5.1. Clue Extraction. We mainly obtain several kinds of clues from the extraction, including remote login information (IP addresses), network traffic, clipboard contents (pictures and texts), shared folder information (path strings), and ransomware samples (compile time and compile paths). The shared folder path clues can be obtained directly from the monitor. However, clipboard clues, compile clues, and remote host clues are often not visually observable and are complex in nature. As a result, the extraction module mainly focuses on the extraction of remote host clues, clipboard string clues, and compilation clues.
Remote Host Clue Extraction. The IP address, port numbers, and folder path clues can be obtained directly from the login information and folder paths. In a network communication PCAP capture, the TCP packets that exchange configuration information are usually named "ClientData". We extract the client name field from the "ClientData" packet to obtain the attacker's hostname. In addition, the KeyboardLayout field indicates the default keyboard layout of the attacking host; e.g., the Chinese Simplified layout number is 0x0004 and the American English keyboard layout number is 0x0409. The keyboard layout reveals the remote user's idiomatic language (the mother tongue), from which the attacker's nationality can be inferred.
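To make the field extraction concrete, the sketch below parses the client name and keyboard layout from a Client Core Data block. The byte offsets follow our reading of the TS_UD_CS_CORE structure in the MS-RDPBCGR specification and should be treated as an assumption; the function name `parse_client_core` and the small layout table are hypothetical, and the table also lists the standard zh-CN locale identifier 0x0804 next to the 0x0004 value quoted above.

```python
import struct

CS_CORE = 0xC001  # header type of the Client Core Data block (assumed from MS-RDPBCGR)

# Illustrative subset of layout identifiers; 0x0004 is the value quoted in the text.
KEYBOARD_LAYOUTS = {0x0409: "en-US", 0x0804: "zh-CN", 0x0004: "zh-Hans", 0x0419: "ru-RU", 0x0412: "ko-KR"}


def parse_client_core(block: bytes) -> dict:
    """Extract keyboardLayout and clientName from a Client Core Data block.

    Assumed layout: header (4 bytes), version (4), desktopWidth (2),
    desktopHeight (2), colorDepth (2), SASSequence (2), keyboardLayout (4),
    clientBuild (4), clientName (32 bytes, UTF-16LE, null-padded).
    """
    if len(block) < 56:
        raise ValueError("block too short for Client Core Data")
    header_type, _length = struct.unpack_from("<HH", block, 0)
    if header_type != CS_CORE:
        raise ValueError("not a Client Core Data block")
    (keyboard_layout,) = struct.unpack_from("<I", block, 16)
    client_name = block[24:56].decode("utf-16-le").split("\x00", 1)[0]
    return {
        "client_name": client_name,
        "keyboard_layout": keyboard_layout,
        "language": KEYBOARD_LAYOUTS.get(keyboard_layout, "unknown"),
    }


# Synthetic block for demonstration; the host name is made up.
sample = struct.pack("<HHIHHHHII", CS_CORE, 216, 0x00080004, 1024, 768, 0xCA01, 0xAA03, 0x0804, 2600)
sample += "ATTACKER-PC".encode("utf-16-le").ljust(32, b"\x00")
print(parse_client_core(sample))
```

The keyboard layout is only a hint: an attacker can change it, so it is best weighed together with the other clues rather than used alone.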
Clipboard Clue Extraction. The main data formats available to the clipboard monitor are Windows Bitmap, GDI file, ANSI characters, Unicode characters, and WAV audio data. We mainly aim at extracting traceable clues of the character types. The extraction module extracts character clues from the clipboard in various formats by judging the "DataType" value of the GetClipboardData API.

Compilation Clue Extraction. For all Windows RDP-based ransomware samples that we examined, we empirically observed that the most commonly used format is the PE file, especially *.exe and *.dll. Some PE files carry compilation information, and this information does not change when the program is moved to another host. As a result, it is a good way to obtain the creator's information. A PE file mainly consists of five major components: DOS MZ header, DOS stub, PE header, section headers, and section content. Each component contains a great deal of information, but only a small part of it helps identify the creator, and some identification information needs to be extracted from the content of each section. In this paper, the clues extracted from PE files to trace back the attacker include the file name, PE file type, compiler version, compilation path, compile time, last modified time, last open time, IP address, URL, domain name, language, string, wide character, and so on. We extract most of the clues with PEView [34]. However, since it cannot directly obtain the compilation path, we use the pefile [35] tool to extract paths by locating them in the PE structure.

Since the extracted clues come in different encoding formats, the extraction system converts all obtained data into UTF-8 and saves them in an SQLite database, which facilitates observation and gives the subsequent analysis a unified format. Before being submitted to the analysis system, the extracted clues are divided into two categories: string clues and path clues.

5.2. Automatic Analysis. At this point, the clues extracted by the system mainly include string clues and path clues. The number of string clues is large, and they are mixed with a large number of unidentifiable strings. Because the number of path clues is also very large and they have no semantic correlation, it is difficult to identify traceable clues manually. So we focus on how to automate the identification of traceable clues for path clues.
5.2.1. Users and Programs Analysis. In the analysis of the path clues, we first obtain attacker clues by identifying context-related segments on the same path. For instance, each user has a separate user folder, which is located in the "Users" folder under drive C. As a result, the system can obtain the user name at the attacking host by reading the folder name under the "Users" folder (e.g., C:\Users\Dell). The "Program Files" folder usually contains the names of the software programs installed on the machine (e.g., C:\Program Files\Microsoft Visual Studio 11.0). What is more, the QQ account number is always located in the "QQ\QQfile" folder (e.g., D:\QQ\QQfile\86******086\FileRecv).
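A minimal sketch of this context-based extraction, assuming Windows-style paths with backslash separators: the regular expressions and the function name `extract_users_and_programs` are illustrative and cover only the "Users" and "Program Files" conventions mentioned above.

```python
import re

# Folder directly under <drive>:\Users is taken as a user name.
USER_RE = re.compile(r"^[A-Za-z]:\\Users\\([^\\]+)", re.IGNORECASE)
# Folder directly under Program Files (or Program Files (x86)) is taken as a program or vendor name.
PROGRAM_RE = re.compile(r"^[A-Za-z]:\\Program Files(?: \(x86\))?\\([^\\]+)", re.IGNORECASE)


def extract_users_and_programs(paths):
    """Collect user names and installed-program names from path clues."""
    users, programs = set(), set()
    for path in paths:
        match = USER_RE.match(path)
        if match:
            users.add(match.group(1))
        match = PROGRAM_RE.match(path)
        if match:
            programs.add(match.group(1))
    return users, programs


clues = [
    r"C:\Users\Dell\Desktop\notes.txt",
    r"C:\Program Files\Microsoft Visual Studio 11.0\VC\bin\cl.exe",
    r"C:\Program Files (x86)\Tencent\QQ\Bin\QQ.exe",
    r"D:\data\misc.bin",
]
print(extract_users_and_programs(clues))
```

Built-in profile folders such as "Public" or "Default" would also match the user pattern, so a practical version would likely filter them out with a stop list.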
In this way, the analysis system can quickly and accurately obtain host names, email accounts, program names, social software account numbers, and other traceable clues that are carried in the attacker's file paths. However, such user information makes up only a small share of the overall clues and is acquired by chance. Therefore, we conduct further analysis on this basis.

5.2.2. Account Analysis. Through the analysis of APT1 and some other attribution reports, we find that the mapping between the attacker and a physical-world identity can be better obtained by analyzing the accounts left by the attacker. This information includes, but is not limited to, the location of the IP address, the spelling and registration of the domain name, the IP address corresponding to a URL, and the domain name of a mailbox account. Because it is difficult to identify this information effectively in a large number of string and path clues, the analysis system automatically identifies IP addresses, domain names, URLs, and mailbox accounts by regular expression matching. Then, with the help of threat intelligence and big data technology, more relevant clues are obtained.
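The regular matching step can be sketched as below. The patterns are deliberately simple assumptions (IPv4 only, http/https URLs, a basic email shape), and the function name `find_accounts` is hypothetical; the paper does not give its actual expressions.

```python
import ipaddress
import re
from urllib.parse import urlparse

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
URL_RE = re.compile(r"""https?://[^\s"'<>]+""")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def find_accounts(text: str) -> dict:
    """Return the email addresses, URLs, IPv4 addresses, and domains found in `text`."""
    emails = EMAIL_RE.findall(text)
    urls = URL_RE.findall(text)
    ips = []
    for candidate in IPV4_RE.findall(text):
        try:
            ipaddress.IPv4Address(candidate)  # rejects values such as 999.1.1.1
        except ValueError:
            continue
        ips.append(candidate)
    domains = {email.split("@", 1)[1].lower() for email in emails}
    domains.update(host.lower() for host in (urlparse(u).hostname for u in urls) if host)
    return {"emails": emails, "urls": urls, "ipv4": ips, "domains": domains}


demo = "contact admin@example.com via http://example.org/pay?id=1 from 192.0.2.10, not 999.1.1.1"
print(find_accounts(demo))
```

Validating each dotted quad with the `ipaddress` module keeps version strings and other numeric look-alikes out of the IP clue list.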
5.2.3. Language Identification. The languages that appear in the clues often help to determine an attacker's idiomatic language. Because different countries use a large number of languages and some languages are highly similar, we use an automatic analysis system to identify the language of the clues. We tested the accuracy of two language identification toolkits using entire path information in four different languages and found the "langid.py" toolkit to be more accurate overall than the "langdetect" toolkit. The comparison results are shown in Figure 3. langid.py is a language identification toolkit developed by Lui and Baldwin at the University of Melbourne [36]. It combines a naive Bayes classifier with cross-domain feature selection to provide domain-independent language identification.

5.2.4. Traceable Strings Identification. To find traceable strings, traditional string analysis methods usually use Named Entity Recognition (NER). However, the clues to be analyzed mainly consist of strings and paths. The strings before and after a path separator have few semantic correlations. What is more, the strings between path separators and the remaining strings to be analyzed are mostly semantically unrelated because of their limited length. So this paper proposes an algorithm that can quickly and automatically identify the traceable strings in strings and paths.

To filter out the meaningful traceable clues related to the attacker's identity, the path clues and string clues are split into strings, which are then checked against common words and gibberish in the following steps. Figure 4 shows the process of the automatic traceable clue identification system.
Figure 3: Accuracy of two language identification tools (bar chart of correct and incorrect recognition counts for the langid and langdetect toolkits on English, Chinese, Russian, and Korean). In tests using the same known-language path clues, the accuracy of the langid toolkit for English, Chinese, Russian, and Korean was 88%, 100%, 99%, and 99.0%, respectively, while the accuracy of the langdetect toolkit was 73%, 0.85%, 40%, and 4.60%, respectively.

Figure 4: Process of automatically identifying traceable clues. Path clues pass through path split and stop words removal; string clues pass through sentence tokenization and word tokenization. The resulting strings are then split by signs and alphabetic case characteristics, checked by common words detection and gibberish detection, and output as the result.

Make Stop Words. The system splits each path by the path delimiter. Path segments that are common to multiple computers have no identifying effect, so we take the file and folder names that are common to 20 normal user computers as stop words. The system then removes the stop words from the strings after each split.
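The stop-word step can be sketched as follows. The helper names and the `min_hosts` parameter are assumptions for illustration; the text only states that names common to the 20 reference computers become stop words.

```python
from collections import Counter


def split_path(path):
    """Split a Windows or POSIX style path into its non-empty segments."""
    return [segment for segment in path.replace("/", "\\").split("\\") if segment]


def build_stop_words(host_paths, min_hosts):
    """Return segments that occur on at least `min_hosts` reference hosts.

    `host_paths` holds one list of paths per reference computer.
    """
    seen = Counter()
    for paths in host_paths:
        segments = {seg.lower() for path in paths for seg in split_path(path)}
        seen.update(segments)  # each host counts a segment at most once
    return {segment for segment, hosts in seen.items() if hosts >= min_hosts}


def remove_stop_words(path, stop_words):
    """Keep only the path segments that are not stop words."""
    return [seg for seg in split_path(path) if seg.lower() not in stop_words]


reference_hosts = [
    [r"C:\Users\alice\Desktop\a.txt"],
    [r"C:\Users\bob\Desktop\b.txt"],
]
stop_words = build_stop_words(reference_hosts, min_hosts=2)
print(remove_stop_words(r"C:\Users\kate_z\Desktop\gandcrab.exe", stop_words))
```

Counting each segment once per host, rather than once per path, prevents a single computer with many files from turning its own user name into a stop word.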
Segmentation. After removing the stop words, we separate sentences and words for each segment of the path with a tokenizer (e.g., NLTK, NER) [37]. However, as the segments of a path are not semantically related, this segmentation is less effective. Based on our extensive study of path information, users often prefer to use symbols such as underscores to split file names. In addition, they also like to use capitalization in multiword strings to facilitate reading. As a result, the system uses these common separator symbols and the alphabetic case characteristics to segment the strings again.
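The second segmentation pass, by separator symbols and letter case, can be sketched with two regular expressions. The symbol set and the function name `segment` are assumptions, and the sketch handles ASCII letters and digits only; other scripts would need their own tokenizer.

```python
import re

# Separator symbols commonly used inside file names (illustrative set).
SYMBOLS_RE = re.compile(r"[_\-.\s()\[\]{}+,;~@#$%&=]+")
# An acronym (run of capitals not followed by a lowercase letter),
# a capitalized or lowercase word, or a run of digits.
CASE_RE = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def segment(name):
    """Split a file or folder name by symbols, then by alphabetic case."""
    tokens = []
    for chunk in SYMBOLS_RE.split(name):
        tokens.extend(CASE_RE.findall(chunk))
    return tokens


print(segment("RansomTracer_final-v2"))
print(segment("myVPNKeys"))
```

Treating a run of capitals as one token keeps acronyms such as "VPN" intact instead of splitting them into single letters.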
Common Words Removal. After segmentation, the system obtains many repeated strings. Most of them are commonly used words and do not help in identifying ransomware attackers. As a result, we filter out the common words by comparing each string with a dictionary. If the string is found in the dictionary, the system sets the common-word property of the string to false; otherwise, it sets the property to true.
Gibberish Detect. Based on our analysis and observation, we found that there is a large amount of garbled information in path clues. Most of these strings are generated randomly. People can recognize this randomness manually, but it is difficult for a program to recognize it automatically. As a result, we propose detecting gibberish by training a Markov chain model with English texts. For each string X, the probability of the (i+1)th character depends only on the ith character:

$$ P(X_{i+1} = x \mid X_1 = x_1, X_2 = x_2, \ldots, X_i = x_i) = P(X_{i+1} = x \mid X_i = x_i) \tag{2} $$
Each letter is related to the letters immediately before and after it, and this relationship can be expressed by 2-grams. For example, the string "Ransomware" yields the pairs "Ra", "an", "ns", ..., "e[space]". By collecting gibberish and some commonly used normal phrases or vocabulary, the analysis system records how often characters appear next to each other. It normalizes the counts after reading through the training data and then measures the probability of generating a string by multiplying the probabilities of the pairs of adjacent characters in the string [38]. From the training data, the system computes the average transition probability for gibberish and for normal strings, respectively. This probability measures how likely the string is according to the data observed by the model. When we test the string "Ransom", it computes

$$ \mathrm{prob}[\text{'Ransom'}] = \mathrm{prob}[\text{'R'}][\text{'a'}] \times \mathrm{prob}[\text{'a'}][\text{'n'}] \times \cdots \times \mathrm{prob}[\text{'m'}][\text{' '}] \tag{3} $$
If the input string is gibberish, it passes through some pairs that have very low counts in the training phase and hence has a low probability. The system then looks at the amount of probability per character for a few known normal strings and a few examples of known gibberish and picks a threshold halfway between the highest probability among the gibberish examples and the lowest probability among the normal strings:

$$ \mathit{threshold} = \frac{\min(\mathit{normal\_probs}) + \max(\mathit{gibberish\_probs})}{2} \tag{4} $$
When we analyze a string X, if prob[X] > threshold, the analysis system views the string as a normal string and marks it as "True". If prob[X] ≤ threshold, the analysis system marks the string as "False". Then the system removes the strings marked "False" as gibberish.
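The following sketch shows one way to implement the character-bigram model and the threshold of equation (4). It is written in the spirit of the Gibberish-Detector project [38] but is not the authors' code: the alphabet, the additive smoothing, the tiny training corpus, and all function names are assumptions. The score is the geometric mean of the transition probabilities, i.e., the product of equation (3) normalized per character pair.

```python
import math

ALPHABET = "abcdefghijklmnopqrstuvwxyz "
INDEX = {char: i for i, char in enumerate(ALPHABET)}


def bigrams(text):
    """Yield adjacent character pairs, ignoring characters outside ALPHABET."""
    chars = [c for c in text.lower() if c in INDEX]
    return zip(chars, chars[1:])


def train(corpus_lines, smoothing=1.0):
    """Count character transitions and return a table of log probabilities."""
    size = len(ALPHABET)
    counts = [[smoothing] * size for _ in range(size)]
    for line in corpus_lines:
        for a, b in bigrams(line):
            counts[INDEX[a]][INDEX[b]] += 1
    return [[math.log(c / sum(row)) for c in row] for row in counts]


def avg_transition_prob(text, log_probs):
    """Geometric mean of the transition probabilities along `text`."""
    logs = [log_probs[INDEX[a]][INDEX[b]] for a, b in bigrams(text)]
    return math.exp(sum(logs) / (len(logs) or 1))


def pick_threshold(normal, gibberish, log_probs):
    """Equation (4): midpoint between the worst normal and the best gibberish score."""
    normal_probs = [avg_transition_prob(s, log_probs) for s in normal]
    gibberish_probs = [avg_transition_prob(s, log_probs) for s in gibberish]
    return (min(normal_probs) + max(gibberish_probs)) / 2


def is_normal(text, log_probs, threshold):
    return avg_transition_prob(text, log_probs) > threshold


# Tiny illustrative corpus; a real model would be trained on a large English text.
corpus = [
    "the ransomware attacker encrypts files on the remote host",
    "the monitor records the program name and the user name",
    "we trace the attacker with the clues left in the path",
] * 50
model = train(corpus)
threshold = pick_threshold(["ransom", "program"], ["vwrtjty", "zxqkjv"], model)
print(is_normal("ransom", model, threshold), is_normal("vwrtjty", model, threshold))
```

Working with log probabilities avoids numerical underflow on long strings, and averaging per pair makes scores comparable across strings of different lengths.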
Table 1: Traceable strings recognized result [13].

String       Common Words Recognized   Gibberish Recognized   Final Recognized
Wh****ter    True                      True                   True
gandcrab     True                      True                   True
kate z       True                      True                   True
vwrtjty      True                      False                  False
program      False                     True                   False
Identification Result. After the above analysis, the automated analysis system outputs the strings for which both the common word tag and the gibberish tag are True; this output is the list of traceable clues [13]. Table 1 is reproduced from Z. H. Wang et al. (2018) [under the Creative Commons Attribution License/public domain].

When the string is a normal word, such as "program", the common words recognition marks it as "False". Gibberish recognition marks a string as "True" only when it looks like a non-randomly generated identifier; it marks as "False" the strings that do not conform to the spelling patterns of common words (e.g., vwrtjty). A string is considered to have auxiliary traceability only if both of the analysis values are true. We rely extensively on string inversion to verify the accuracy of the system. It is found that most of the traceable strings on the volunteers' storage can be recognized. Table 1 shows the recognition results of several typical strings.
6 Evaluation
In this paper we evaluate the proposed method with twoexperiments The goal of the first experiment is to demon-strate that the proposed method can help trace back to theRDP-based ransomware attackers and capture their privateinformation The goal of the second experiment is to demon-strate that the method can also automatically recognize cluesin ransomware samples and help trace back to ransomwaremakers
61 RDP-Based Ransomware Attacker
611 Clue Capture and Analysis We usedWindows 7 systemvirtual machines to deploy the deception environment andassess the effectiveness of itWhileWindows 7 is not requiredit was chosen because of the wide range of applicationsand because it is one of the main targets of ransomwareWe invited 122 professional volunteers to help with theexperiment and provided them with an experimental fleet of12 virtual hosts Most of the volunteersrsquo login informationclipboard content shared folder path and uploaded PE filewere able to be successfully captured by the monitor system
We chose the path information collected from 12 volunteers' computers and recorded the rate of convergence after each step of the analysis. Figure 5 shows how the number of remaining traceable clues changes as the data pass through each step of the analysis system [13]. Figure 5 is reproduced from Z. H. Wang et al. (2018) [under the Creative Commons Attribution License/public domain].
[Figure 5 plot, titled “Analysis Data Change Figure”. x-axis: Analysis Process (source, stopword, tokenize, gibberish, result); y-axis: Data quantity (0 to 50); series: volunteer1 to volunteer12.]
Figure 5: The change in data volume of the traceable clues of 12 volunteers as they pass through the analysis system. The screening rate can reach 98.34% [13].
6.2 Traceback Attackers. Next, we carried out a detailed analysis of the information left over from one of the volunteers' attacks. From the data produced by the automatic analysis of this attacker, we can infer that the attacker's real identity may have the following characteristics: (1) the attacker is likely to come from China; (2) the attacker may work in security or a related field; (3) the attacker may work at the Chinese Academy of Sciences and have relations with a large security manufacturer in China. The conclusion is shown in Figure 6, which presents the results from the analysis system in five sections.
Username. Through automatic analysis, a total of three user names from the attacker's host were extracted from a large number of clues. From the “Dell” user name, it can be assumed that the attacker was using a Dell host.
Programs. The automatic analysis system identified several typical software programs installed on the attacker's host. For instance, “QQ” is a real-time social tool widely used in China; “AliPay” is an online payment tool developed by Alibaba and widely used in China; “Sogou input” is a Chinese input method developed by the Chinese company Sogou; “MeiTu” is photo beautification software developed by a Chinese company and widely used in China. In addition, “Sinfor” and “360” are both well-known security manufacturers in China with a large number of security-related products, and “Visual Studio” is common development software. We found that much Chinese-made software widely used in China was installed on the attacker's host, as well as some development and security products that ordinary users install less often.

[Figure 6 diagram with five panels. UserName: Admin, Default, DELL. KeyboardLayout: CH-SIMPLIFIED. Programs: AliPay, QQ, SogouInput, MeiTu, Sinfor, 360, VisualStudio. Account: 353***208, 372***582, ****e.ac.cn, ****t.ac.cn. TraceableStrings: botnet, **e, **t, A***team, Wh***ter, freebuf, visumantrag.]
Figure 6: Analysis of an attacker by the automatic analysis system, displaying traceable information about the attacker in five aspects: host user name, program software, keyboard layout, account information, and traceable strings.
Keyboard Layout. Through network communication analysis, we found that the attacker's keyboard layout is Simplified Chinese, from which it can be inferred that the attacker's native language is probably Chinese. This is consistent with the software installed on the host.
Account Number. The system automatically identified two QQ accounts and two e-mail accounts. The e-mail accounts are Chinese Academy of Sciences business accounts, which supports the above assumption that the attacker is from China and is likely to work at the Chinese Academy of Sciences. Through the registration information of the QQ account (as shown in Figure 7), we can find the following information: (1) the attacker could be a male aged 32; (2) the attacker does Internet-related work, likely with 360 Security; (3) the attacker is located in Haidian District, Beijing, China. In the registration information of the “353***208” account, the mail, work, and other fields are empty and the nickname is a distinctive English string, “Wh***ter”, so it can be presumed that this account is likely for private use.
Traceable Strings. The automatic analysis system sifts out strings from the monitor data that may be traceable. “Freebuf” is a security information exchange website commonly used by Chinese security personnel. “Botnet” is a secondary attack method that is often used by attackers and is a main research direction for many security researchers. “**e” and “**t” are acronyms for departments under the Chinese Academy of Sciences, while “A***team” is a security research team in the “**e” department. “Visumantrag” is often used when processing visas for various countries. “Wh***ter” matches the nickname of the QQ account “353***208” and is likely to be the attacker's regular nickname.

Figure 7: The registration infographic of QQ account 372***82.
6.3 Ransomware Maker Automatic Analysis. In order to verify that the analysis system can trace back the ransomware maker, we obtained more than 8000 ransomware samples from VirusTotal [39] and extracted the Program Data Base (PDB) path from the samples (as shown in Figure 8). The result shows that, although a large number of attackers deliberately erase compilation information, legacy PDB path information can still be found in a large number of ransomware samples.

Table 2: Same identifier for different samples [13].
Same Identity Strings: jqxxpphoaonkje7z
Sample Hash: 68dd613973f8a***7befe0f4b461d258, ce569b9020079***9ae0f090266a60cc, 2fa0c084aef41969***1f427a1b65e1b1464fa82d297e712***e1fdf312c044814a36da0286d42cd***306fba99239c8238821beaf40744be***ba862dbc6d90f70cbc839bcb88***7422bbd2fa812dd3de02391b6a9f1***0762a6f49cf32501

[Figure 8 bar chart. Ransomware with PDB path: 878; ransomware without a PDB path: 7197. Axis: number, 0 to 100.]
Figure 8: Of the 8075 real ransomware samples, 11% allowed the Program Data Base (PDB) path to be extracted, while the authors of the remaining 89% may have chosen to hide the PDB path during compilation.
What is more, the analysis system was used to automatically analyze the PDB paths of more than 800 different ransomware samples. We found identity strings shared by the analysis results of different samples. One typical result is shown in Table 2 [13]. Table 2 is reproduced from Z. H. Wang et al. (2018) [under the Creative Commons Attribution License/public domain]. We used VirusTotal to validate the samples and found that they were all Cobra family ransomware. As a result, it could be assumed that these samples were made by the same ransomware maker. When one sample is traced back, the other samples lead to the same conclusion about the ransomware maker. When different traceable information is extracted from different samples, the identity of the ransomware maker can be described by integrating that information.
Language Identification. When we apply language recognition to the PDB paths of the 878 samples, the result is as shown in Figure 9. The system automatically identifies the languages of the paths, which mainly contain 11 languages, of which English accounts for the largest proportion.
Actual Case Analysis. The automatic identification system helped us identify the traceable strings in the PDB path. Take as an example a sample PDB path (the Chinese has been translated into English) with an MD5 value of “60c6a92afb***6d0c16f7”:

“ELIAOSTUDIOUACsimulated an improved version of the 360 anti-virus program 1117 Bale1*41**2****0(**1) 9818 programWriteSystem32ReleaseVirtualDesktoppdb”

[Figure 9 pie chart of detected languages. zh 3.53%, en 58.09%, nl 4.56%, nn 0.46%, sl 0.57%, nb 0.11%, de 6.61%, it 0.34%, hu 3.99%, mt 2.05%, pl 0.57%, fi 17.20%, et 0.34%, da 1.14%, es 0.46%.]
Figure 9: Of the 878 ransomware sample PDB paths, English accounted for 510, and the remaining detected languages were Chinese 31 (3.53%), Dutch 40 (4.56%), Norwegian Nynorsk 4 (0.46%), Slovenian 5 (0.57%), Norwegian Bokmål 1 (0.11%), German 58 (6.61%), Italian 3 (0.34%), Hungarian 35 (3.99%), Maltese 18 (2.05%), Polish 5 (0.57%), Finnish 151 (17.20%), Luxembourgish 3 (0.34%), Danish 10 (1.14%), and Spanish 4 (0.46%).
We can infer from the automatic analysis result that the ransomware maker is likely from China. This inference is supported by the following points.
(1) Language Identification. Identifying the language of the path suggests that the attacker may have come from China. Reading the Chinese strings in the path, we find that the text refers to an improvement intended to bypass detection by 360 security software, which has a large market in China.
(2) IP Address. The system identified an IP address in the PDB path. With the help of a threat intelligence database, we found that the IP came from Xinjiang, China (Figure 10).
Figure 10: Partial infographic of IP 1*41**2****0 in the threat intelligence library.
(3) Traceable Strings. The extracted identifier string “LIAO” resembles Chinese pinyin and is most likely a Chinese surname. To sum up, we speculate that the ransomware maker is most likely from China.
7 Conclusion
In this paper, we focus on tracing back RDP-based ransomware. Recently, more and more ransomware attackers are using RDP attacks to spread ransomware with impunity due to the use of strong cryptography. We propose tracing back the attack sources to deter RDP-based ransomware with the proposed deception environment. It collects traceable clues and performs automatic analysis by using natural language processing and machine learning techniques. The evaluation shows that it is able to trap attackers and collect the traceable clues they leave in a deception environment. With automatic clue identification, it can reduce the amount of traceable clues to about 2%. We use this method to analyze two practical cases and show its effectiveness. By tracing back to ransomware attackers, we can provide a strong deterrent to stifle the development of ransomware.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Additional Points
This manuscript provides additional deception environment monitors and analytical methods by using more volunteers and samples. It demonstrates the validity of the methods through specific traceback cases (i.e., tracing back the ransomware attacker and tracing back the ransomware maker).
Disclosure
An earlier version of this paper was presented at the IEEE Third International Conference on Data Science in Cyberspace, Guangzhou, China, 18-21 June 2018. The authors' initial conference paper focused mainly on the effectiveness of deception environment monitors and the screening rate of the analysis system.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
The authors would like to thank VirusTotal for providing ransomware samples and the volunteers from the University of Chinese Academy of Sciences. This work is supported in part by the National Key Research and Development Plan (Grant no. 2018YFB0803504) and the National Natural Science Foundation of China (61871140, 61572153, U1636215, 61572492, and 61672020).
References
[1] A. Kharraz, “UNVEIL: A Large-Scale Automated Approach to Detecting Ransomware,” in Proceedings of the 25th USENIX Security Symposium (USENIX Security 16), pp. 757–772, USENIX Association, 2016.
[2] K. Savage, P. Coogan, and H. Lau, “The evolution of ransomware,” Tech. Rep., 2015.
[3] A. Kharraz, W. Robertson, D. Balzarotti, L. Bilge, and E. Kirda, “Cutting the Gordian Knot: A Look Under the Hood of Ransomware Attacks,” in Detection of Intrusions and Malware, and Vulnerability Assessment, vol. 9148 of Lecture Notes in Computer Science, pp. 3–24, Springer International Publishing, Cham, 2015.
[4] S. Mohurle and M. Patil, “A brief study of Wannacry Threat: Ransomware Attack 2017,” International Journal of Advanced Research in Computer Science, vol. 8, no. 5, 2017.
[5] X. Yu, Z. Tian, J. Qiu, and F. Jiang, “A Data Leakage Prevention Method Based on the Reduction of Confidential and Context Terms for Smart Mobile Devices,” Wireless Communications and Mobile Computing, vol. 2018, Article ID 5823439, 11 pages, 2018.
[6] J. Yaneza, “Brute Force RDP Attacks Plant CRYSIS Ransomware,” https://blog.trendmicro.com/trendlabs-security-intelligence/brute-force-rdp-attacks-plant-crysis-ransomware.
[7] “10% of Ransomware Attacks on SMBs Targeted IoT Devices,” https://www.darkreading.com/application-security/10--of-ransomware-attacks-on-smbs-targeted-iot-devices/d/d-id/1329817.
[8] Q. Tan, Y. Gao, J. Shi, X. Wang, B. Fang, and Z. H. Tian, “Towards a Comprehensive Insight into the Eclipse Attacks of Tor Hidden Services,” IEEE Internet of Things Journal, 2018.
[9] “Kaspersky Security Bulletin: STORY OF THE YEAR 2017,” 2017, https://media.kasperskycontenthub.com/wp-content/uploads/sites/43/2018/03/07164824/KSB_Story_of_the_Year_Ransomware_FINAL_eng.pdf.
[10] Z. Tian, Y. Cui, L. An et al., “A Real-Time Correlation of Host-Level Events in Cyber Range Service for Smart Campus,” IEEE Access, vol. 6, pp. 35355–35364, 2018.
[11] R. Ross et al., Managing Information Security Risk: Organisation, Mission, and Information System View, National Institute of Standards and Technology, 2011, http://csrc.nist.gov/publications/nistpubs/800-39/SP800-39-final.pdf.
[12] Y. Wang, Z. Tian, H. Zhang, S. Su, and W. Shi, “A Privacy Preserving Scheme for Nearest Neighbor Query,” Sensors, vol. 18, no. 8, p. 2440, 2018.
[13] Z. H. Wang, X. Wu, C. G. Liu, Q. X. Liu, and J. L. Zhang, “RansomTracer: Exploiting Cyber Deception for Ransomware Tracing,” in Proceedings of the IEEE Third International Conference on Data Science in Cyberspace, pp. 227–234, 2018.
[14] B. A. S. Al-rimy, M. A. Maarof, and S. Z. M. Shaid, “Ransomware threat success factors, taxonomy, and countermeasures: A survey and research directions,” Computers & Security, vol. 74, pp. 144–166, 2018.
[15] F. Jiang, Y. Fu, B. B. Gupta et al., “Deep Learning based Multi-channel intelligent attack detection for Data Security,” IEEE Transactions on Sustainable Computing, 2018.
[16] N. Andronio, S. Zanero, and F. Maggi, “HelDroid: dissecting and detecting mobile ransomware,” in Research in Attacks, Intrusions, and Defenses, vol. 9404 of Lecture Notes in Computer Science, pp. 382–404, Springer, 2015.
[17] F. Mercaldo, V. Nardone, A. Santone, and C. A. Visaggio, “Ransomware Steals Your Phone. Formal Methods Rescue It,” in Formal Techniques for Distributed Objects, Components, and Systems: 36th IFIP WG 6.1 International Conference, FORTE 2016, held as part of the 11th International Federated Conference on Distributed Computing Techniques, DisCoTec 2016, E. Albert and I. Lanese, Eds., pp. 212–221, Springer International Publishing, 2016.
[18] D. Sgandurra, L. Muñoz-González, R. Mohsen, and E. C. Lupu, “Automated Dynamic Analysis of Ransomware: Benefits, Limitations and use for Detection,” https://arxiv.org/abs/1609.03020.
[19] S. Song, B. Kim, and S. Lee, “The Effective Ransomware Prevention Technique Using Process Monitoring on Android Platform,” Mobile Information Systems, vol. 2016, Article ID 2946735, 9 pages, 2016.
[20] T. Yang, Y. Yang, K. Qian, D. C.-T. Lo, Y. Qian, and L. Tao, “Automated detection and analysis for android ransomware,” in Proceedings of the 17th IEEE International Conference on High Performance Computing and Communications, IEEE 7th International Symposium on Cyberspace Safety and Security, and IEEE 12th International Conference on Embedded Software and Systems, HPCC-ICESS-CSS 2015, pp. 1338–1343, USA, August 2015.
[21] N. Scaife, H. Carter, P. Traynor, and K. R. B. Butler, “CryptoLock (and Drop It): Stopping Ransomware Attacks on User Data,” in Proceedings of the 36th IEEE International Conference on Distributed Computing Systems, ICDCS 2016, pp. 303–312, Japan, June 2016.
[22] M. M. Ahmadian and H. R. Shahriari, “2entFOX: A framework for high survivable ransomwares detection,” in Proceedings of the 13th International ISC Conference on Information Security and Cryptology, ISCISC 2016, pp. 79–84, Iran, September 2016.
[23] M. M. Ahmadian, H. R. Shahriari, and S. M. Ghaffarian, “Connection-monitor & connection-breaker: A novel approach for prevention and detection of high survivable ransomwares,” in Proceedings of the 12th International ISC Conference on Information Security and Cryptology, ISCISC 2015, pp. 79–84, Iran, September 2015.
[24] D. Kim, W. Soh, and S. Kim, “Design of Quantification Model for Prevent of Cryptolocker,” Indian Journal of Science and Technology, vol. 8, no. 19, 2015.
[25] C. Moore, “Detecting Ransomware with Honeypot Techniques,” in Proceedings of the 2016 Cybersecurity and Cyberforensics Conference (CCC), pp. 77–81, Amman, Jordan, August 2016.
[26] F. Mbol, J. Robert, and A. Sadighian, “An Efficient Approach to Detect TorrentLocker Ransomware in Computer Systems,” in Cryptology and Network Security, S. Foresti and G. Persiano, Eds., vol. 10052 of Lecture Notes in Computer Science, pp. 532–541, Springer International Publishing, Cham, 2016.
[27] K. Cabaj, P. Gawkowski, K. Grochowski, and D. Osojca, “Network activity analysis of CryptoWall ransomware,” Przegląd Elektrotechniczny, vol. 91, no. 11, pp. 201–204, 2015.
[28] J. Chen, Z. Tian, X. Cui, L. Yin, and X. Wang, “Trust Architecture and Reputation Evaluation for Internet of Things,” Journal of Ambient Intelligence & Humanized Computing, vol. 2, pp. 1–9, 2018.
[29] C. Le Guernic and A. Legay, “Ransomware and the Legacy Crypto API,” in Risks and Security of Internet and Systems: 11th International Conference, CRiSIS 2016, Roscoff, France, 2017.
[30] L. Pingree, Emerging Technology Analysis: Deception Techniques and Technologies Create Security Technology Business Opportunities, Gartner, 2015.
[31] “Ransomware and Businesses,” https://www.symantec.com/content/dam/symantec/docs/security-center/white-papers/ransomware-and-businesses-16-en.pdf, 2016.
[32] A. Kharraz and E. Kirda, “Redemption: Real-Time Protection Against Ransomware at End-Hosts,” in Proceedings of the International Symposium on Research in Attacks, Intrusions, and Defenses, pp. 98–119, 2017.
[33] F. Bergstrand, J. Bergstrand, and H. Gunnarsson, Localization of Spyware in Windows Environments.
[34] “PEView,” https://www.aldeid.com/wiki/PEView.
[35] “python-pefile,” https://pypi.python.org/pypi/pefile.
[36] M. Lui and T. Baldwin, “langid.py: An off-the-shelf language identification tool,” in Proceedings of the ACL 2012 System Demonstrations, Association for Computational Linguistics, pp. 25–30, 2012.
[37] S. Bird and E. Loper, “NLTK: the natural language toolkit,” in Proceedings of the ACL 2004 on Interactive Poster and Demonstration Sessions, Association for Computational Linguistics, Barcelona, Spain, July 2004.
[38] “Rrenaud Gibberish-Detector,” https://github.com/rrenaud/Gibberish-Detector/blob/master/README.rst.
[39] “VirusTotal,” http://virustotal.com.
we created four file categories that ransomware always tries to find and encrypt: documents (*.txt, *.doc(x), *.ppt(x), *.xls(x), *.pdf, and *.py), keys and licenses (*.key, *.pem, *.crt, and *.cer), file archives (*.zip, *.rar), and media (*.jp(e)g, *.mp3, and *.avi). We obtain these files in three ways. First, we create files with valid headers and content using standard libraries (e.g., python-docx, python-pptx, pdfkit, and OpenSSL). Second, using Google search syntax and crawler technology, we download a large number of files from the Internet. Third, we collect a number of non-confidential documents from the hosts of 20 volunteers to emulate actual user environments. When we assign user files to the deception environment, the path length is generated randomly. Each folder may contain a random set of subfolders. For each folder, a subset of extensions is randomly selected. Furthermore, each directory name is generated from meaningful words. Consequently, we generate paths and extensions for user files, giving them variable file depth and meaningful content.
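The random layout described above can be illustrated with a short sketch. This is only an assumption-laden illustration, not the generator used in the paper: the word list `WORDS`, the function name `decoy_paths`, and the default limits are hypothetical, while the extensions are the ones named in the text.

```python
import random
from typing import List

# Extensions named in the text, flattened into one list.
EXTENSIONS = [".txt", ".doc", ".docx", ".ppt", ".pptx", ".xls", ".xlsx", ".pdf", ".py",
              ".key", ".pem", ".crt", ".cer", ".zip", ".rar",
              ".jpg", ".jpeg", ".mp3", ".avi"]
# Hypothetical vocabulary for meaningful directory and file names.
WORDS = ["reports", "finance", "projects", "backup",
         "photos", "contracts", "archive", "notes"]


def decoy_paths(rng: random.Random, max_depth: int = 4, max_files: int = 3) -> List[str]:
    """Return file paths for one randomly generated decoy folder."""
    depth = rng.randint(1, max_depth)                      # random path length
    folders = [rng.choice(WORDS) for _ in range(depth)]    # meaningful directory names
    exts = rng.sample(EXTENSIONS, k=rng.randint(1, max_files))  # random subset of extensions
    return ["\\".join(folders + [rng.choice(WORDS) + ext]) for ext in exts]
```

Passing a seeded `random.Random` instance keeps a generated layout reproducible, which is convenient when the same decoy tree has to be rebuilt on several virtual hosts.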
To make the simulated environment more valuable, we deploy bait information on it, such as databases, false code comments, digital certificates, administrator passwords, SSH keys, VPN keys, browser history passwords, ARP records, and DNS records. When an attacker obtains the bait information, it may entice the attacker to attack the deception environment.
4.3.2 The File Monitor. The file monitor detects ransomware by monitoring changes in file type and file entropy, a method proposed in 2016 [21]. The type of data stored in a file can be described by the order and position of specific byte values unique to that file type. Since files generally retain their types and formatting in the deception environment, the bulk modification of such files should be considered suspicious. When the monitor sees this type of change, we can infer that a ransomware attack has occurred.
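A file type check of this kind can be approximated by comparing leading byte signatures before and after a modification. The sketch below is a simplified stand-in and not the monitor from the paper: the signature table covers only a few well-known formats, and `sniff_type` and `type_changed` are hypothetical helper names.

```python
from typing import Optional

# A few well-known file signatures; a real monitor would use a much larger table.
SIGNATURES = {
    b"%PDF": "pdf",
    b"PK\x03\x04": "zip",       # also the container for .docx/.pptx/.xlsx
    b"\xff\xd8\xff": "jpeg",
    b"Rar!": "rar",
}


def sniff_type(head: bytes) -> Optional[str]:
    """Return a type name if the leading bytes match a known signature."""
    for magic, name in SIGNATURES.items():
        if head.startswith(magic):
            return name
    return None


def type_changed(before: bytes, after: bytes) -> bool:
    """True when a file that had a recognizable type no longer matches it."""
    old = sniff_type(before)
    return old is not None and sniff_type(after) != old
```

A single changed file is weak evidence on its own; the text treats bulk modification as the suspicious signal, so a monitor would count such changes over a short time window.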
Entropy expresses the randomness of the characters in a string: the higher the entropy value, the stronger the randomness. The Shannon entropy of an array of bytes can be computed as the sum

$$e = \sum_{i=0}^{255} P_{B_i} \log_2 \frac{1}{P_{B_i}} \qquad (1)$$

for $P_{B_i} = F_i / \mathit{totalbytes}$, where $F_i$ is the number of instances of byte value $i$ in the array. The entropy value is a number from 0 to 8, and a value of 8 means that the byte values in the array are completely uniformly distributed. Since each byte value occurs with essentially the same probability in encrypted ciphertext, the entropy of an encrypted file will be close to this upper limit. Because ransomware always encrypts a large number of files, when we detect that a file changes to a high-entropy file in a short period of time and also changes its file type, we assume that the file is subject to a ransomware attack.
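Equation (1) translates directly into code. The following is a minimal sketch; the helper `looks_encrypted` and its threshold of 7.5 bits per byte are assumptions made for illustration and do not come from the paper.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte, following Eq. (1); ranges from 0 to 8."""
    if not data:
        return 0.0
    total = len(data)
    # P_Bi * log2(1 / P_Bi) with P_Bi = F_i / total, summed over byte values that occur.
    return sum((f / total) * math.log2(total / f) for f in Counter(data).values())


def looks_encrypted(before: bytes, after: bytes, threshold: float = 7.5) -> bool:
    """Flag a jump to near-maximal entropy; the threshold is an assumed value."""
    return shannon_entropy(after) >= threshold > shannon_entropy(before)
```

Byte values that never occur contribute nothing to the sum, so iterating over the observed counts is equivalent to summing over all 256 values. Compressed formats such as ZIP or JPEG already have high entropy, which is one reason the text combines the entropy check with the file type check.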
4.3.3 The Shared Folder Monitor. By traversing the disk storage in real time, the shared folder monitor discovers updates to shared folders as they happen. It obtains the contents of the attacker's files locally, which the attacker often does not notice, thus revealing some unexpected traceable clues. The monitor can access a list of paths to the attacker's shared folders.
As we originally observed, folders shared through Remote Desktop often have a path on the remote host with the prefix “\\tsclient”. When the monitor's traversal reaches this prefix, it uses “FindFirstFile” to find the first file. It then uses “FindNextFile” with the returned handle to find the next file. When the result is a folder, it continues to traverse all files under that folder. Initially, the monitor tried to obtain the full file names and file contents by traversing the new shared folder. However, during experimentation we found that, as the number of files in storage grows, the monitor takes far more time and resources to get all the file contents than to get just the file paths. Traversing everything is also more likely to alert the attacker. Therefore, the monitor only obtains the file paths that the attacker shares on its host, with the help of “GetFileName”. Moreover, in order to prevent encryption by the ransomware, the mounted disk monitor directly transfers the acquired list of shared file paths to another secure host.
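The path-only traversal can be sketched in Python as follows. This is an illustration under stated assumptions: `os.walk` stands in for the Win32 FindFirstFile/FindNextFile loop described above, and `list_shared_paths` is a hypothetical helper. On a real host the root would be a redirected drive under the tsclient prefix.

```python
import os
from typing import List


def list_shared_paths(root: str) -> List[str]:
    """Collect file paths under root without opening any file."""
    paths = []
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            paths.append(os.path.join(folder, name))
    return sorted(paths)
```

Because no file is opened, the cost grows with the number of directory entries rather than with the size of the shared data, which reflects the trade-off reported in the text.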
5 Clue Extraction and Analysis
Through the deception environment, we trap the ransomware attacker and collect a large amount of information that may contain many traceable clues. However, such traceable clues are often not visually observable and are complex in nature. In addition, much of the collected information is not helpful in tracing back ransomware attackers. Therefore, to assist in the analysis of the monitor information and extract the main effective traceable clues, in this section we describe how to extract clues and how to analyze the extracted clues automatically.
5.1 Clue Extraction. We mainly obtain the following kinds of clues from the extraction: remote login information (IP addresses), network traffic, clipboard contents (pictures and texts), shared folder information (path strings), and ransomware samples (compile time and compile paths). The shared folder path clues can be obtained directly from the monitor. However, clipboard clues, compilation clues, and remote host clues are often not visually observable and are complex in nature. As a result, the extraction module mainly focuses on the extraction of remote host clues, clipboard string clues, and compilation clues.
Remote Host Clue Extraction. The IP address, port numbers, and folder path clues can be obtained directly from the login information and folder paths. In a network communication PCAP capture, the TCP packets that exchange configuration information are usually named “ClientData”. We extract the client name field from the “ClientData” packet to obtain the attacker's hostname. In addition, the KeyboardLayout field indicates the default keyboard layout of the attacking host; e.g., the Chinese Simplified layout number is 0x0004 and the American English keyboard layout number is 0x0409. The remote user's idiomatic language (mother tongue) can be inferred from the keyboard layout, which in turn hints at the attacker's nationality.
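The two fields mentioned above can be read with a few lines of `struct` code. The sketch below is an assumption-based illustration, not the authors' extractor: it assumes `block` already starts at the client core data block of the “ClientData” packet and that the fields sit at the fixed offsets described in the Microsoft RDP specification (a 4-byte header, then version, desktop size, color depth, SAS sequence, a 4-byte keyboard layout, a 4-byte client build, and a 32-byte UTF-16LE client name). These offsets should be checked against the specification before use. Only the two layout numbers given in the text are mapped; `parse_client_core` is a hypothetical name.

```python
import struct

# Layout numbers as given in the text; all other values are left unmapped.
LAYOUT_HINTS = {0x0004: "Chinese (Simplified)", 0x0409: "English (United States)"}


def parse_client_core(block: bytes) -> dict:
    """Extract the keyboard layout and client name from a client core data block."""
    if len(block) < 56:
        raise ValueError("block too short for the assumed field layout")
    # type, length, version, width, height, colorDepth, SASSequence,
    # keyboardLayout, clientBuild (24 bytes in total, little-endian)
    (_type, _length, _version, _width, _height, _depth, _sas,
     layout, _build) = struct.unpack_from("<HHIHHHHII", block, 0)
    # clientName: 32 bytes of UTF-16LE, NUL-terminated
    name = block[24:56].decode("utf-16-le", errors="replace").split("\x00", 1)[0]
    return {
        "client_name": name,
        "keyboard_layout": layout,
        "language_hint": LAYOUT_HINTS.get(layout, "unknown"),
    }
```

The language hint is only a weak signal, since a keyboard layout can be changed by the user, so it is best read together with the other clues.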
Clipboard Clue Extraction. The main data formats available to the clipboard monitor are Windows Bitmap, GDI file, ANSI characters, Unicode characters, and WAV audio data. We mainly aim at extracting traceable clues of character types. The monitor extracts character clues from the clipboard in various formats by checking the “DataType” value of the GetClipboardData API.
Compilation Clue Extraction. For all Windows RDP-based ransomware samples that we examined, we empirically observed that the most commonly used format is the PE file, especially *.exe and *.dll. Some PE files contain compilation information, and this information does not change when the program is moved to another host. As a result, it is a good way to obtain the creator's information. A PE file mainly consists of five major components: DOS MZ header, DOS stub, PE header, section headers, and section content. Each component contains a great deal of information, but very little of it can be used to identify the creator, and some identification information needs to be extracted from the content of each section. In this paper, many clues are extracted from PE files to trace back the attacker: file name, PE file type, compiler version, compilation path, compile time, last modified time, last open time, IP address, URL, domain name, language, string, wide character, and so on. We extract most of the clues with PEView [34]. However, since it cannot directly obtain the compilation path, we use the pefile [35] tool to extract paths by locating them in the PE structure.
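To make the compilation path clue concrete, the sketch below pulls a PDB path out of raw PE bytes. It deliberately swaps in a simpler technique than the pefile-based extraction named above: instead of walking the PE debug directory, it scans for the CodeView “RSDS” record (4-byte signature, 16-byte GUID, 4-byte age, then a NUL-terminated path). `extract_pdb_paths` is a hypothetical helper, and a byte scan like this can produce false positives that a structural parser would avoid.

```python
import re
from typing import List

# "RSDS" + 16-byte GUID + 4-byte age, followed by a NUL-terminated path ending in .pdb
_RSDS = re.compile(rb"RSDS.{20}([^\x00]{1,260}?\.[Pp][Dd][Bb])\x00", re.DOTALL)


def extract_pdb_paths(pe_bytes: bytes) -> List[str]:
    """Return every PDB path found in CodeView RSDS records of the given bytes."""
    return [m.group(1).decode("utf-8", errors="replace")
            for m in _RSDS.finditer(pe_bytes)]
```

Samples whose authors stripped the debug information simply yield an empty list.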
Since the extracted clues use different encoding formats, to facilitate observation and give subsequent analysis a unified format, the extraction system converts all the obtained data to UTF-8 and saves it in an SQLite database. Before being submitted to the analysis system, the extracted clues are divided into two categories: string clues and path clues.
5.2 Automatic Analysis. At this point, the clues extracted by the system mainly include string clues and path clues. The number of string clues is large, and they are mixed with a large number of unidentifiable strings. The number of path clues is also very large, and they have no semantic correlation, so it is difficult to identify traceable clues manually. We therefore focus on how to automate the identification of traceable clues in path clues.
5.2.1 Users and Programs Analysis. In the analysis of path clues, we first obtain attacker clues by identifying features of context-related segments on the same path. For instance, each user has a separate user folder located in the “Users” folder under drive C. As a result, the system can obtain the user name on the attacking host from the folder name under the “Users” folder (e.g., C:\Users\Dell). The “Program Files” folder usually contains the names of the software programs installed on the machine (e.g., C:\Program Files\Microsoft Visual Studio 11.0). What is more, the QQ account number is always located in the “QQ\QQfile” folder (e.g., D:\QQ\QQfile\86******086\FileRecv).
In this way, the analysis system can quickly and accurately obtain host names, e-mail accounts, program names, social software numbers, and other traceable clues carried in the attacker's file paths. However, such user information makes up only a small part of the overall clues and is acquired by chance. Therefore, we conduct further analysis on this basis.
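The context-based lookup in this subsection amounts to reading the path component that follows a known folder name. A minimal sketch follows; `path_clues` is a hypothetical helper, and the extra “Program Files (x86)” entry is an assumption added for illustration.

```python
from pathlib import PureWindowsPath
from typing import Dict


def path_clues(path: str) -> Dict[str, str]:
    """Return the component that follows each known marker folder in a Windows path."""
    parts = PureWindowsPath(path).parts
    clues: Dict[str, str] = {}
    for marker, following in zip(parts, parts[1:]):
        key = marker.lower()
        if key == "users":
            clues["user"] = following
        elif key in ("program files", "program files (x86)"):
            clues["program"] = following
        elif key == "qqfile":
            clues["qq_account"] = following
    return clues
```

`PureWindowsPath` parses Windows-style paths on any operating system, so the analysis host does not itself need to run Windows.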
5.2.2 Account Analysis. Through the analysis of APT1 and some other attribution reports, we find that the mapping between the attacker and a physical-world identity can be better obtained by analyzing the accounts left by the attacker. This information includes, but is not limited to, the location of the IP address, the spelling and registration of the domain name, the IP address corresponding to a URL, and the domain name of a mailbox account. Because it is difficult to identify this information effectively in a large number of string and path clues, the analysis system automatically identifies IP addresses, domain names, URLs, and mailbox accounts by regular expression matching. Then, with the help of threat intelligence and big data technology, more relevant clues are obtained.
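Regular expression matching of this kind can be sketched as follows. The patterns are simplified assumptions rather than the expressions used by the authors, and `find_accounts` is a hypothetical helper. They are meant to show the approach, not to cover every valid address format.

```python
import re
from typing import Dict, List

_OCTET = r"(?:25[0-5]|2[0-4]\d|1?\d?\d)"
IPV4 = re.compile(rf"(?<![\d.])(?:{_OCTET}\.){{3}}{_OCTET}(?![\d.])")
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")
URL = re.compile(r"https?://[^\s\"'<>]+")


def find_accounts(text: str) -> Dict[str, List[str]]:
    """Return IPv4 addresses, e-mail addresses, and URLs found in a clue string."""
    return {
        "ipv4": IPV4.findall(text),
        "email": EMAIL.findall(text),
        "url": URL.findall(text),
    }
```

The matches would then be passed to a threat intelligence lookup, as the text describes; that step depends on an external service and is not shown.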
5.2.3 Language Identification. User languages often help to determine an attacker's idiomatic language, but because there are many languages across countries and some languages are highly similar, we use an automatic analysis system to identify the language of clues. We tested the accuracy of two language identification toolkits using entire path information in four different languages and found the “langid.py” toolkit to be more accurate overall than the “langdetect” toolkit. The comparison results are shown in Figure 3. langid.py is a language identification toolkit developed by Lui and Baldwin at the University of Melbourne [36]. It combines a naive Bayes classifier with cross-domain feature selection to provide domain-independent language identification.
5.2.4 Traceable Strings Identification. When traceable strings are needed, traditional string analysis methods usually use Named Entity Recognition (NER). However, the clues to be analyzed mainly consist of strings and paths. Strings before and after a path separator have few semantic correlations. What is more, the strings between path separators and the remaining strings to be analyzed are mostly semantically unrelated because of their limited length. This paper therefore proposes an algorithm that can quickly and automatically identify the traceable strings in strings and paths.
In order to filter out meaningful traceable clues related tothe attackerrsquos identity the path clues and string clues are splitinto strings and identified by common words and gibberishin the following steps Figure 4 shows the automatic traceableclues identification system process
Figure 3: Accuracy of two language identification tools. In tests using the same path clues of known language, the accuracy of the langid toolkit for English, Chinese, Russian, and Korean was 88%, 100%, 99%, and 99.0%, respectively, while the accuracy of the langdetect toolkit was 73%, 0.85%, 40%, and 4.60%, respectively.

Figure 4: Process of automatic identification of traceable clues.

Make Stop Words. The system splits each path by the path delimiter. Path strings that are common to multiple computers have no identifying effect, so we take the file and folder names that are common to 20 normal user computers as stop words. The system then removes the stop words from the strings after each split.
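The stop-word step can be sketched as follows. The function names and the `min_hosts` parameter are hypothetical; the text only says that names common to 20 normal user computers are used, so how many hosts a name must appear on is left as an explicit assumption.

```python
import re
from collections import Counter


def split_path(path: str) -> list[str]:
    """Split a Windows or POSIX path into its non-empty segments."""
    return [seg for seg in re.split(r"[\\/]+", path) if seg]


def build_stop_words(paths_per_host: list[list[str]], min_hosts: int) -> set[str]:
    """Collect segments that appear on at least `min_hosts` different hosts."""
    host_counts = Counter()
    for paths in paths_per_host:
        # Count each segment once per host, ignoring letter case.
        seen = {seg.lower() for p in paths for seg in split_path(p)}
        host_counts.update(seen)
    return {seg for seg, n in host_counts.items() if n >= min_hosts}


def remove_stop_words(path: str, stop_words: set[str]) -> list[str]:
    """Return the segments of `path` that are not stop words."""
    return [seg for seg in split_path(path) if seg.lower() not in stop_words]
```

With two reference hosts that both contain `C:\Users\...\Desktop\...`, the segments `C:`, `Users`, and `Desktop` become stop words, and only host-specific segments such as a user name survive.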
Segmentation. After removing the stop words, we separate sentences and words for each segment of the path with a tokenizer (e.g., NLTK, NER) [37]. However, as the segments of a path are not semantically related, this segmentation is less effective. Based on our extensive study of path information, users often use symbols such as underscores to split file names. They also like to use capitalization in multiword strings to facilitate reading. As a result, the system segments the strings again using the common separator symbols in file names and the alphabetic case characteristics.
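A minimal sketch of the symbol and letter-case segmentation is shown below. The function name and the exact patterns are assumptions, and the sketch handles ASCII names only; non-ASCII characters are treated as separators.

```python
import re


def segment(name: str) -> list[str]:
    """Split a file or folder name on symbols, then on letter-case changes."""
    words = []
    for part in re.split(r"[^A-Za-z0-9]+", name):
        # Case-based split: "StudioProjects" -> ["Studio", "Projects"];
        # runs of capitals and runs of digits are kept together.
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return words
```

For example, `segment("Visual_StudioProjects-2018")` yields the four pieces `Visual`, `Studio`, `Projects`, and `2018`, which can then be checked one by one in the later steps.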
Common Words Removal. After segmentation, the system obtains many repeated strings, but most of them are commonly used words and do not help in identifying ransomware attackers. We therefore filter out the common words by comparing each string with a dictionary. If the string can be matched in the dictionary, the system marks the common word property of the string as False; otherwise, it marks it as True.
Gibberish Detect. Based on our analysis and observation, we found that path clues contain a large amount of garbled information. Most of these strings are generated randomly; people can recognize this randomness manually, but it is difficult for a program to recognize it automatically. We therefore propose detecting gibberish by training a Markov chain model with English texts. For each string X, the probability of the (i+1)-th character depends only on the i-th character:

$$P(X_{i+1} = x \mid X_1 = x_1, X_2 = x_2, \ldots, X_i = x_i) = P(X_{i+1} = x \mid X_i = x_i) \tag{2}$$
Each letter is related to the letters immediately before and after it. This relationship can be expressed by 2-grams. For example, the string “Ransomware” yields the pairs Ra, an, ns, ..., e[space]. The analysis system records how often characters appear next to each other in a collection of gibberish and commonly used normal phrases or vocabulary. It normalizes the counts after reading through the training data and then measures the probability of generating a string by multiplying the probabilities of the pairs of adjacent characters in the string [38]. From the training data, the system computes the average transition probability for gibberish and for normal strings, respectively. This probability measures how likely the string is according to the data observed by the model. When we test the string “Ransom”, it computes

$$\mathit{prob}[\text{'Ransom'}] = \mathit{prob}[\text{'R'}][\text{'a'}] \times \mathit{prob}[\text{'a'}][\text{'n'}] \times \cdots \times \mathit{prob}[\text{'m'}][\text{' '}] \tag{3}$$
If the input string is gibberish, it passes through some pairs with very low counts in the training phase and hence has a low probability. The system then looks at the probability per character for a few known normal strings and a few known gibberish strings and picks a threshold between the highest probability among the gibberish strings and the lowest probability among the normal strings:

$$\mathit{threshold} = \frac{\min(\mathit{normal\_probs}) + \max(\mathit{gibberish\_probs})}{2} \tag{4}$$
When we analyze a string X, if prob[X] > threshold, the analysis system views the string as a normal string and marks its gibberish property as True. If prob[X] ≤ threshold, the analysis system marks the string as False. The system then removes the strings marked False as gibberish.
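The Markov chain scoring of equations (2) to (4) can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's implementation: the alphabet is restricted to lowercase letters and the space, unseen pairs receive add-one smoothing, and the per-string score is the geometric mean of the transition probabilities (one reading of the "average transition probability"). All function names are hypothetical.

```python
import math

ALPHABET = "abcdefghijklmnopqrstuvwxyz "


def normalize(text: str) -> str:
    """Keep only lowercase letters and spaces."""
    return "".join(c for c in text.lower() if c in ALPHABET)


def train(corpus: str) -> dict:
    """Estimate log P(next | current) from adjacent-character counts."""
    # Every pair starts at 1 (add-one smoothing, an assumption) so unseen pairs are rare, not impossible.
    counts = {a: {b: 1 for b in ALPHABET} for a in ALPHABET}
    text = normalize(corpus)
    for a, b in zip(text, text[1:]):
        counts[a][b] += 1
    model = {}
    for a, row in counts.items():
        total = sum(row.values())
        model[a] = {b: math.log(n / total) for b, n in row.items()}
    return model


def avg_transition_prob(s: str, model: dict) -> float:
    """Geometric mean of the transition probabilities along the string."""
    text = normalize(s)
    pairs = list(zip(text, text[1:]))
    if not pairs:
        return 0.0
    return math.exp(sum(model[a][b] for a, b in pairs) / len(pairs))


def pick_threshold(model: dict, normal: list[str], gibberish: list[str]) -> float:
    """Midpoint between the weakest normal string and the strongest gibberish string, as in (4)."""
    normal_probs = [avg_transition_prob(s, model) for s in normal]
    gibberish_probs = [avg_transition_prob(s, model) for s in gibberish]
    return (min(normal_probs) + max(gibberish_probs)) / 2


def is_gibberish(s: str, model: dict, threshold: float) -> bool:
    """A string at or below the threshold is treated as gibberish."""
    return avg_transition_prob(s, model) <= threshold
```

Averaging per character pair rather than using the raw product keeps long strings from being penalized simply for their length, which is why the threshold can be shared across strings of different sizes.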
Table 1: Traceable strings recognized result [13].

String       Common Words Recognized   Gibberish Recognized   Final Recognized
Wh****ter    True                      True                   True
gandcrab     True                      True                   True
kate z       True                      True                   True
vwrtjty      True                      False                  False
program      False                     True                   False
Identification Result. After the above analysis, the automated analysis system outputs the strings for which both the common word tag and the gibberish tag are True; this is the list of traceable clues [13]. Table 1 is reproduced from Z. H. Wang et al. (2018) [under the Creative Commons Attribution License/public domain].

When the string is a normal word such as “program”, common words recognition marks it False. Gibberish recognition marks as True only non-randomly generated identifier strings; it marks as False strings that do not conform to the spelling patterns of common words (e.g., vwrtjty). A string is considered to have auxiliary traceability only if both analysis values are True. We rely extensively on string inversion to verify the accuracy of the system and find that most of the traceable strings on the volunteers' storage can be recognized. Table 1 shows the recognition results for several typical strings.
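The way the two tags combine into the final result can be sketched as below. The tiny `COMMON_WORDS` set stands in for a full dictionary, and the vowel-based `no_vowels` check is only a placeholder for a trained gibberish detector; both are hypothetical and chosen so the example stays self-contained.

```python
# Hypothetical stand-in for a full dictionary of common words.
COMMON_WORDS = {"program", "desktop", "release"}


def no_vowels(s: str) -> bool:
    """Placeholder gibberish check: a string without vowels is treated as random."""
    return not any(v in s.lower() for v in "aeiou")


def tag(string: str, is_gibberish=no_vowels) -> dict:
    """Return the two tags and the final result in the sense of Table 1."""
    common_tag = string.lower() not in COMMON_WORDS  # True = not a dictionary word
    gibberish_tag = not is_gibberish(string)         # True = not randomly generated
    return {
        "common_words": common_tag,
        "gibberish": gibberish_tag,
        "final": common_tag and gibberish_tag,
    }
```

Under these assumptions the sketch reproduces three rows of Table 1: `gandcrab` passes both checks, `vwrtjty` fails the gibberish check, and `program` fails the common-word check.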
6. Evaluation

In this paper, we evaluate the proposed method with two experiments. The goal of the first experiment is to demonstrate that the proposed method can help trace back to RDP-based ransomware attackers and capture their private information. The goal of the second experiment is to demonstrate that the method can also automatically recognize clues in ransomware samples and help trace back to ransomware makers.

6.1. RDP-Based Ransomware Attacker

6.1.1. Clue Capture and Analysis. We used Windows 7 virtual machines to deploy the deception environment and assess its effectiveness. While Windows 7 is not required, it was chosen because of its wide range of applications and because it is one of the main targets of ransomware. We invited 122 professional volunteers to help with the experiment and provided them with an experimental fleet of 12 virtual hosts. Most of the volunteers' login information, clipboard content, shared folder paths, and uploaded PE files were successfully captured by the monitor system.

We chose the path information collected from 12 volunteers' computers and recorded the rate of convergence after each step of the analysis. Figure 5 shows how the number of traceable clues changes as the data is processed by each step of the analysis system [13]. Figure 5 is reproduced from Z. H. Wang et al. (2018) [under the Creative Commons Attribution License/public domain].
Figure 5: Change in the data volume of 12 volunteers' traceable clues through the analysis system (x-axis: analysis process, with the steps source, stopword, tokenize, gibberish, and result; y-axis: data quantity). The screening rate can reach 98.34% [13].
6.2. Traceback Attackers. Next, we carried out a detailed analysis of the information left over from one of the volunteers' attacks. From the data produced by the automatic analysis of this attacker, we can infer that the attacker's real identity may have the following characteristics: (1) the attacker is likely to come from China; (2) the attacker may work in security or a related field; (3) the attacker may work in the Chinese Academy of Sciences and have relations with a large security manufacturer in China. The conclusion is shown in Figure 6, which presents the results from the analysis system in five sections.

Username. Through automatic analysis, a total of three user names from the attacker's host were extracted from a large number of clues. From the “Dell” user name, it can be assumed that the attacker was using a Dell host.
Programs. The automatic analysis system identified several typical software programs installed on the attacker's host. For instance, “QQ” is a widely used real-time social tool in China; “AliPay” is an online payment tool developed by Alibaba and widely used in China; “Sogou input” is a Chinese input method developed by the Chinese company Sogou; and “MeiTu” is photo beautification software developed by a Chinese company and widely used in China. In addition, “Sinfor” and “360” are both well-known security manufacturers in China with a large number of security-related products, and “Visual Studio” is common development software. We find that much Chinese-made software widely used in China was installed on the attacker's host, as well as some development and security products that are less often installed by ordinary users.

Figure 6: Analysis of an attacker using the automatic analysis system. Traceable information about the attacker is displayed in five aspects: host user name, program software, keyboard layout, account information, and traceable strings.
Keyboard Layout. Through network communication analysis, we find that the attacker's keyboard layout is Simplified Chinese, from which it can be inferred that the attacker's native language is probably Chinese. This is consistent with the software installed on the host.

Account Number. The system automatically identified two QQ accounts and two e-mail accounts. The e-mail accounts are Chinese Academy of Sciences business accounts, which supports the above assumption that the attacker is from China and is likely to work in the Chinese Academy of Sciences. Through the registration information of the QQ account (as shown in Figure 7), we can find the following information: (1) the attacker could be a male aged 32; (2) the attacker does Internet-related work, likely with 360 Security; (3) the attacker is located in Haidian District, Beijing, China. In the registration information of the “353***208” account, the mail, work, and other fields are empty, and the nickname is a special English string, “Wh***ter”; it can be presumed that this account is likely for private use.
Traceable Strings. The automatic analysis system sifts strings from the monitor that may be traceable. “Freebuf” is a security information exchange website commonly used by Chinese security personnel. “Botnet” is a secondary attack method that is often used by attackers and is often a main research direction of security researchers. “**e” and “**t” are acronyms for departments under the Chinese Academy of Sciences, while “A***team” is a security research team in the “**e” department. “Visumantrag” is often used when processing visas from various countries. “Wh***ter” matches the nickname of the QQ account “353***208” and is likely to be the attacker's regular nickname.

Figure 7: The registration infographic of QQ account 372***82.
6.3. Ransomware Maker Automatic Analysis. In order to verify that the analysis system can be used to trace back the ransomware maker, we obtained more than 8000 ransomware samples from VirusTotal [39] and extracted the Program Database (PDB) path from the samples (as shown in Figure 8). The results show that, although a large number of attackers deliberately erase the compilation information, legacy PDB path information can still be found in a large number of ransomware samples.

Table 2: Same identifier for different samples [13]. Columns: Sample Hash; Same Identity Strings.
68dd613973f8a***7befe0f4b461d258ce569b9020079***9ae0f090266a60cc
jqxxpphoaonkje7z2fa0c084aef41969***1f427a1b65e1b1464fa82d297e712***e1fdf312c044814a36da0286d42cd***306fba99239c8238821beaf40744be***ba862dbc6d90f70cbc839bcb88***7422bbd2fa812dd3de02391b6a9f1***0762a6f49cf32501

Figure 8: Of the 8075 real ransomware samples, 11% (878) contained an extractable Program Database (PDB) path, while the authors of the remaining 89% (7197) may have chosen to hide the PDB path during compilation.
What is more, the analysis system was used to automatically analyze the PDB paths of more than 800 different ransomware samples. We found the same identity strings in the analysis results of different samples. One typical result is shown in Table 2 [13]. Table 2 is reproduced from Z. H. Wang et al. (2018) [under the Creative Commons Attribution License/public domain]. We used VirusTotal to validate the samples and found that they were all Cobra family ransomware. It can therefore be assumed that these samples were made by the same ransomware maker. When one sample is traced back, the conclusion about the ransomware maker also applies to the other samples. When different traceable information is extracted from different samples, the identity of the ransomware maker can be described by integrating the information.
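Linking samples through shared identity strings amounts to inverting a sample-to-strings mapping, as in the following sketch. The function name and the input layout (one set of traceable strings per sample hash) are assumptions for illustration.

```python
from collections import defaultdict


def group_by_identity(sample_strings: dict[str, set[str]]) -> dict[str, list[str]]:
    """Map each traceable string to the samples whose PDB path contains it."""
    groups = defaultdict(list)
    for sample_hash, strings in sample_strings.items():
        for s in strings:
            groups[s].append(sample_hash)
    # Only strings shared by at least two samples link samples to a common maker.
    return {s: sorted(hashes) for s, hashes in groups.items() if len(hashes) > 1}
```

Each returned group is a candidate set of samples from the same maker; as in the text, such a link would still need to be validated, for example against family labels.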
Language Identification. The result of language identification on the PDB paths of 878 samples is shown in Figure 9. The system automatically identifies the languages of the paths, which mainly comprise 11 languages, of which English accounts for the largest proportion.
Figure 9: Of the 878 ransomware sample PDB paths, English accounted for 510 (58.09%), and the remaining detected languages were Chinese 31 (3.53%), Dutch 40 (4.56%), Norwegian Nynorsk 4 (0.46%), Slovenian 5 (0.57%), Norwegian Bokmål 1 (0.11%), German 58 (6.61%), Italian 3 (0.34%), Hungarian 35 (3.99%), Maltese 18 (2.05%), Polish 5 (0.57%), Finnish 151 (17.20%), Luxembourgish 3 (0.34%), Danish 10 (1.14%), and Spanish 4 (0.46%).

Actual Case Analysis. The automatic identification system helped us identify the traceable strings in the PDB path. Take as an example a sample PDB path (the Chinese has been translated into English) with an MD5 value of “60c6a92afb***6d0c16f7”:

“ELIAOSTUDIOUACsimulated an improved version of the 360 anti-virus program 1117 Bale1*41**2****0(**1) 9818 programWriteSystem32ReleaseVirtualDesktoppdb”
From the automatic analysis result, we can infer that the ransomware maker is likely from China, based on the following evidence.

(1) Language Identification. Identifying the language of the path suggests that the attacker may come from China. From the Chinese strings in the path, we find that the text describes an improvement intended to bypass detection by 360 security software, which has a large market in China.

(2) IP Address. The system identified an IP address in the PDB path. With the help of a threat intelligence database, we found that the IP address is located in Xinjiang, China (Figure 10).

Figure 10: Partial infographic of IP 1*41**2****0 in the threat intelligence library.

(3) Traceable Strings. The extracted identifier string “LIAO” resembles Chinese pinyin and is most likely a Chinese surname. To sum up, we speculate that the ransomware maker is most likely from China.
7. Conclusion

In this paper, we focus on tracing back RDP-based ransomware. Recently, more and more ransomware attackers are using RDP attacks to spread ransomware with impunity due to the use of strong cryptography. We propose tracing back the attack sources to deter RDP-based ransomware with the proposed deception environment. It collects traceable clues and performs automatic analysis by using natural language processing and machine learning techniques. The evaluation shows that it is able to trap attackers and collect traceable clues left by attackers in a deception environment. With automatic clue identification, it can reduce the amount of traceable clues to about 2%. We use this method to analyze two practical cases and show its effectiveness. By tracing back to ransomware attackers, we can provide a strong deterrent to stifle the development of ransomware.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Additional Points

This manuscript provides additional deception environment monitors and analytical methods by using more volunteers and samples. It demonstrates the validity of the methods through specific traceback cases (i.e., tracing back the ransomware attacker and tracing back the ransomware maker).

Disclosure

An earlier version of this paper was presented at the IEEE Third International Conference on Data Science in Cyberspace, Guangzhou, China, 18-21 June 2018. The authors' initial conference paper focused mainly on the effectiveness of deception environment monitors and the screening rate of the analysis system.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors would like to thank VirusTotal for providing ransomware samples and the volunteers from the University of Chinese Academy of Sciences. This work is supported in part by the National Key Research and Development Plan (Grant no. 2018YFB0803504) and the National Natural Science Foundation of China (61871140, 61572153, U1636215, 61572492, and 61672020).
References
[1] A. Kharraz, “UNVEIL: A Large-Scale Automated Approach to Detecting Ransomware,” in Proceedings of the 25th USENIX Security Symposium (USENIX Security 16), pp. 757–772, USENIX Association, 2016.
[2] K. Savage, P. Coogan, and H. Lau, “The evolution of ransomware,” Tech. Rep., 2015.
[3] A. Kharraz, W. Robertson, D. Balzarotti, L. Bilge, and E. Kirda, “Cutting the Gordian Knot: A Look Under the Hood of Ransomware Attacks,” in Detection of Intrusions and Malware, and Vulnerability Assessment, vol. 9148 of Lecture Notes in Computer Science, pp. 3–24, Springer International Publishing, Cham, 2015.
[4] S. Mohurle and M. Patil, “A brief study of Wannacry Threat: Ransomware Attack 2017,” International Journal of Advanced Research in Computer Science, vol. 8, no. 5, 2017.
[5] X. Yu, Z. Tian, J. Qiu, and F. Jiang, “A Data Leakage Prevention Method Based on the Reduction of Confidential and Context Terms for Smart Mobile Devices,” Wireless Communications and Mobile Computing, vol. 2018, Article ID 5823439, 11 pages, 2018.
[6] J. Yaneza, “Brute Force RDP Attacks Plant CRYSIS Ransomware,” https://blog.trendmicro.com/trendlabs-security-intelligence/brute-force-rdp-attacks-plant-crysis-ransomware.
[7] “10% of Ransomware Attacks on SMBs Targeted IoT Devices,” https://www.darkreading.com/application-security/10--of-ransomware-attacks-on-smbs-targeted-iot-devices/d/d-id/1329817.
[8] Q. Tan, Y. Gao, J. Shi, X. Wang, B. Fang, and Z. H. Tian, “Towards a Comprehensive Insight into the Eclipse Attacks of Tor Hidden Services,” IEEE Internet of Things Journal, 2018.
[9] “Kaspersky Security Bulletin: STORY OF THE YEAR 2017,” 2017, https://media.kasperskycontenthub.com/wp-content/uploads/sites/43/2018/03/07164824/KSB_Story_of_the_Year_Ransomware_FINAL_eng.pdf.
[10] Z. Tian, Y. Cui, L. An et al., “A Real-Time Correlation of Host-Level Events in Cyber Range Service for Smart Campus,” IEEE Access, vol. 6, pp. 35355–35364, 2018.
[11] R. Ross et al., Managing Information Security Risk: Organisation, Mission and Information System View, National Institute of Standards and Technology, 2011, http://csrc.nist.gov/publications/nistpubs/800-39/SP800-39-final.pdf.
[12] Y. Wang, Z. Tian, H. Zhang, S. Su, and W. Shi, “A Privacy Preserving Scheme for Nearest Neighbor Query,” Sensors, vol. 18, no. 8, p. 2440, 2018.
[13] Z. H. Wang, X. Wu, C. G. Liu, Q. X. Liu, and J. L. Zhang, “RansomTracer: Exploiting Cyber Deception for Ransomware Tracing,” in Proceedings of the IEEE Third International Conference on Data Science in Cyberspace, pp. 227–234, 2018.
[14] B. A. S. Al-rimy, M. A. Maarof, and S. Z. M. Shaid, “Ransomware threat success factors, taxonomy, and countermeasures: A survey and research directions,” Computers & Security, vol. 74, pp. 144–166, 2018.
[15] F. Jiang, Y. Fu, B. B. Gupta et al., “Deep Learning based Multi-channel intelligent attack detection for Data Security,” IEEE Transactions on Sustainable Computing, 2018.
[16] N. Andronio, S. Zanero, and F. Maggi, “HelDroid: dissecting and detecting mobile ransomware,” in Research in Attacks, Intrusions, and Defenses, vol. 9404 of Lecture Notes in Computer Science, pp. 382–404, Springer, 2015.
[17] F. Mercaldo, V. Nardone, A. Santone, and C. A. Visaggio, “Ransomware Steals Your Phone. Formal Methods Rescue It,” in Formal Techniques for Distributed Objects, Components, and Systems: 36th IFIP WG 6.1 International Conference, FORTE 2016, held as part of the 11th International Federated Conference on Distributed Computing Techniques, DisCoTec 2016, E. Albert and I. Lanese, Eds., pp. 212–221, Springer International Publishing, 2016.
[18] D. Sgandurra, L. Muñoz-González, R. Mohsen, and E. C. Lupu, “Automated Dynamic Analysis of Ransomware: Benefits, Limitations and use for Detection,” https://arxiv.org/abs/1609.03020.
[19] S. Song, B. Kim, and S. Lee, “The Effective Ransomware Prevention Technique Using Process Monitoring on Android Platform,” Mobile Information Systems, vol. 2016, Article ID 2946735, 9 pages, 2016.
[20] T. Yang, Y. Yang, K. Qian, D. C.-T. Lo, Y. Qian, and L. Tao, “Automated detection and analysis for android ransomware,” in Proceedings of the 17th IEEE International Conference on High Performance Computing and Communications, IEEE 7th International Symposium on Cyberspace Safety and Security, and IEEE 12th International Conference on Embedded Software and Systems, HPCC-ICESS-CSS 2015, pp. 1338–1343, USA, August 2015.
[21] N. Scaife, H. Carter, P. Traynor, and K. R. B. Butler, “CryptoLock (and Drop It): Stopping Ransomware Attacks on User Data,” in Proceedings of the 36th IEEE International Conference on Distributed Computing Systems, ICDCS 2016, pp. 303–312, Japan, June 2016.
[22] M. M. Ahmadian and H. R. Shahriari, “2entFOX: A framework for high survivable ransomwares detection,” in Proceedings of the 13th International ISC Conference on Information Security and Cryptology, ISCISC 2016, pp. 79–84, Iran, September 2016.
[23] M. M. Ahmadian, H. R. Shahriari, and S. M. Ghaffarian, “Connection-monitor & connection-breaker: A novel approach for prevention and detection of high survivable ransomwares,” in Proceedings of the 12th International ISC Conference on Information Security and Cryptology, ISCISC 2015, pp. 79–84, Iran, September 2015.
[24] D. Kim, W. Soh, and S. Kim, “Design of Quantification Model for Prevent of Cryptolocker,” Indian Journal of Science and Technology, vol. 8, no. 19, 2015.
[25] C. Moore, “Detecting Ransomware with Honeypot Techniques,” in Proceedings of the 2016 Cybersecurity and Cyberforensics Conference (CCC), pp. 77–81, Amman, Jordan, August 2016.
[26] F. Mbol, J. Robert, and A. Sadighian, “An Efficient Approach to Detect TorrentLocker Ransomware in Computer Systems,” in Cryptology and Network Security, S. Foresti and G. Persiano, Eds., vol. 10052 of Lecture Notes in Computer Science, pp. 532–541, Springer International Publishing, Cham, 2016.
[27] K. Cabaj, P. Gawkowski, K. Grochowski, and D. Osojca, “Network activity analysis of CryptoWall ransomware,” Przegląd Elektrotechniczny, vol. 91, no. 11, pp. 201–204, 2015.
[28] J. Chen, Z. Tian, X. Cui, L. Yin, and X. Wang, “Trust Architecture and Reputation Evaluation for Internet of Things,” Journal of Ambient Intelligence & Humanized Computing, vol. 2, pp. 1–9, 2018.
[29] C. Le Guernic and A. Legay, “Ransomware and the Legacy Crypto API. Paper presented at the Risks and,” in Risks and Security of Internet and Systems: 11th International Conference, CRiSIS 2016, Roscoff, France, 2017.
[30] L. Pingree, Emerging Technology Analysis: Deception Techniques and Technologies Create Security Technology Business Opportunities, Gartner, 2015.
[31] “Ransomware and Businesses,” https://www.symantec.com/content/dam/symantec/docs/security-center/white-papers/ransomware-and-businesses-16-en.pdf, 2016.
[32] A. Kharraz and E. Kirda, “Redemption: Real-Time Protection Against Ransomware at End-Hosts,” in Proceedings of the International Symposium on Research in Attacks, Intrusions, and Defenses, pp. 98–119, 2017.
[33] F. Bergstrand, J. Bergstrand, and H. Gunnarsson, Localization of Spyware in Windows Environments.
[34] “PEView,” https://www.aldeid.com/wiki/PEView.
[35] “python-pefile,” https://pypi.python.org/pypi/pefile.
[36] M. Lui and T. Baldwin, “langid.py: An off-the-shelf language identification tool,” in Proceedings of the ACL 2012 System Demonstrations, Association for Computational Linguistics, pp. 25–30, 2012.
[37] S. Bird and E. Loper, “NLTK: the natural language toolkit,” in Proceedings of the ACL 2004 on Interactive Poster and Demonstration Sessions, Association for Computational Linguistics, Barcelona, Spain, July 2004.
[38] “Rrenaud Gibberish-Detector,” https://github.com/rrenaud/Gibberish-Detector/blob/master/README.rst.
[39] “VirusTotal,” http://virustotal.com.
tongue) can be found from the keyboard layout to infer the attacker's nationality.

Clipboard Clue Extraction. The main data formats available to the clipboard monitor are Windows Bitmap, GDI file, ANSI characters, Unicode characters, and WAV audio data. We mainly aim at extracting traceable clues of character types. The monitor extracts character clues from the clipboard in various formats by checking the “DataType” value of the GetClipboardData API.

Compilation Clue Extraction. For all Windows RDP-based ransomware samples that we examined, we empirically observed that the most commonly used format is the PE file, especially *.exe and *.dll. Some PE files contain compilation information, and this information does not change when the program is migrated. As a result, it is a good way to obtain the creator's information. A PE file mainly consists of five major components: DOS MZ header, DOS stub, PE header, section headers, and section content. Each component contains a great deal of information, but very little of it can be used to identify the creator, and some identification information needs to be extracted from the content of each section. In this paper, many clues in PE files can be extracted to trace back the attacker: file name, PE file type, compiler version, compilation path, compile time, last modified time, last open time, IP address, URL, domain name, language, string, wide character, and so on. We extract most of the clues with PEView [34]. However, since it cannot directly obtain the compilation path, we use the pefile [35] tool to extract paths by locating them in the PE structure.
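As a rough illustration of where the compilation path lives, the sketch below scans raw file bytes for a CodeView “RSDS” debug record, which stores a 16-byte GUID, a 4-byte age, and then the NUL-terminated PDB path. This byte scan is a simpler heuristic swapped in for illustration; it does not walk the PE debug directory as a structure-aware tool such as pefile would, and the function name and length limit are assumptions.

```python
import re

# 'RSDS' signature, 16-byte GUID, 4-byte age, then the NUL-terminated PDB path.
RSDS = re.compile(rb"RSDS.{20}([^\x00]{1,260})\x00", re.DOTALL)


def find_pdb_paths(data: bytes) -> list[str]:
    """Return candidate PDB paths found in the raw bytes of a PE file."""
    paths = []
    for match in RSDS.finditer(data):
        raw = match.group(1)
        # Keep only records whose path actually names a .pdb file.
        if raw.lower().endswith(b".pdb"):
            paths.append(raw.decode("utf-8", errors="replace"))
    return paths
```

A sample whose author stripped the debug information simply yields an empty list, which matches the observation that only part of the samples expose a PDB path.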
Since the extracted clues use different encoding formats, to facilitate observation and unify subsequent analysis, the extraction system converts all extracted data to UTF-8 and saves it in an SQLite database. Before being submitted to the analysis system, the extracted clues are divided into two categories: string clues and path clues.
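The normalization and storage step might be sketched as follows. The table layout, the function names, and the rule that any clue containing a path separator counts as a path clue are assumptions; the sketch also assumes the monitor knows the source encoding of each clue (for example, ANSI versus Unicode clipboard data).

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the clue database; SQLite stores TEXT as UTF-8 by default."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS clues (kind TEXT NOT NULL, value TEXT NOT NULL)")
    return conn


def store_clue(conn: sqlite3.Connection, raw: bytes, source_encoding: str) -> None:
    """Decode one clue from its source encoding and save it as a path or string clue."""
    text = raw.decode(source_encoding, errors="replace")
    # Assumed rule: anything containing a path separator is a path clue.
    kind = "path" if ("\\" in text or "/" in text) else "string"
    conn.execute("INSERT INTO clues (kind, value) VALUES (?, ?)", (kind, text))
```

The separator rule is crude (a URL would also be stored as a path clue), so a real system would likely classify clues by the monitor that produced them instead.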
5.2. Automatic Analysis. At this point, the clues extracted by the system mainly include string clues and path clues. The number of string clues is large, and they are mixed with a large number of unidentifiable strings. Because the number of path clues is also very large and they have no semantic correlation, it is difficult to identify traceable clues manually. We therefore focus on how to automate the identification of traceable clues in path clues.
5.2.1. Users and Programs Analysis. In the analysis of path clues, we first obtain attacker clues by identifying context-related segments on the same path. For instance, each user has a separate user folder located in the “Users” folder under drive C. As a result, the system can obtain the user name on the attacking host from the name of the folder under the “Users” folder (e.g., C:\Users\Dell). The “Program Files” folder usually contains the names of the software programs installed on the machine (e.g., C:\Program Files\Microsoft Visual Studio 11.0). What is more, the QQ account number is always located in the “QQ\QQfile” folder (e.g., D:\QQ\QQfile\86******086\FileRecv).
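The context-based lookup described above can be sketched with the standard library's Windows path parser. The function name, the returned keys, and the inclusion of “Program Files (x86)” are illustrative assumptions; the account number in the usage note is a made-up placeholder.

```python
from pathlib import PureWindowsPath


def analyze_path(path: str) -> dict:
    """Pick out the user name, program name, and QQ account carried in one path."""
    parts = PureWindowsPath(path).parts
    lowered = [p.lower() for p in parts]
    found = {}
    # The interesting value is always the segment right after a known folder.
    for i, part in enumerate(lowered[:-1]):
        following = parts[i + 1]
        if part == "users":
            found["user"] = following
        elif part in ("program files", "program files (x86)"):
            found["program"] = following
        elif part == "qqfile":
            found["qq_account"] = following
    return found
```

For example, `analyze_path(r"D:\QQ\QQfile\123456789\FileRecv\a.doc")` returns the hypothetical account number `123456789` under the key `qq_account`.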
In this way, the analysis system can quickly and accurately get host names, e-mail accounts, program names, social software account numbers, and other traceable clues that are carried in the attacker's file paths. However, such user information makes up a small share of the overall clues and is acquired by chance. Therefore, we conduct further analysis on this basis.
522 Account Analysis Through the analysis of APT1 andsome other attribution reports we find that the mappingbetween the attacker and the physical world identity can bebetter obtained by analyzing the account number left by theattacker This information includes but is not limited to thelocation of the IP address the spelling and registration of thedomain name the URL corresponding IP address and thedomain name of the mailbox account Because it is difficultto identify this information effectively in a large numberof strings and path clues the analysis system automaticallyidentifies the IP address domain name URL and mailboxaccount by regular matching Then with the help of threatintelligence and big data technology more relevant clues areobtained
5.2.3. Language Identification. The language a user works in often helps to determine an attacker's habitual language. Because there are many languages across different countries and some of them are highly similar, we use automatic analysis to identify the language of clues. We tested the accuracy of two language identification toolkits using entire path information in four different languages and found the "langid.py" toolkit to be more accurate overall than the "langdetect" toolkit. The comparison results are shown in Figure 3. langid.py is a language identification toolkit developed by Lui and Baldwin at the University of Melbourne [36]. It combines a naive Bayes classifier with cross-domain feature selection to provide domain-independent language identification.
5.2.4. Traceable Strings Identification. Traditional string analysis methods usually rely on Named Entity Recognition (NER) to find traceable strings. However, the clues to be analyzed here mainly consist of strings and paths. The strings before and after a path separator have little semantic correlation. Moreover, the strings between path separators and the remaining strings to be analyzed are mostly semantically unrelated because of their limited length. This paper therefore proposes an algorithm that quickly and automatically identifies the traceable strings in string and path clues.
To filter out meaningful traceable clues related to the attacker's identity, the path clues and string clues are split into strings and then screened for common words and gibberish in the following steps. Figure 4 shows the process of the automatic traceable clue identification system.
Figure 3: Accuracy of two language identification tools. In tests using the same path clues of known language, the accuracy of the langid toolkit for English, Chinese, Russian, and Korean was 88%, 100%, 99%, and 99.0%, respectively, while the accuracy of the langdetect toolkit was 73%, 0.85%, 40%, and 4.60%, respectively. (Bar chart; series: langid correct, langid incorrect, langdetect correct, langdetect incorrect.)
Figure 4: Process of automatically identifying traceable clues. (Flowchart; stages: path clues and string clues, path split, stop words, sentence tokenize, word tokenize, sign split, alphabetic case characteristics, common words detect, gibberish detect, result.)
Make Stop Words. The system splits each path at the path delimiter. Path segments that are common to multiple computers have no identifying effect, so we take the file and folder names that are common to 20 normal user computers as stop words. The system then removes the stop words from the strings obtained after each split.
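The stop-word step can be sketched as follows. This is an assumption-laden illustration rather than the system's code: the function names and the `min_hosts` parameter are hypothetical, and the three-host baseline stands in for the 20 normal user computers mentioned in the text.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Set


def split_path(path: str) -> List[str]:
    """Split a Windows or POSIX style path at its delimiters."""
    return [seg for seg in re.split(r"[\\/]+", path) if seg]


def build_stop_words(baseline: Dict[str, Iterable[str]], min_hosts: int) -> Set[str]:
    """Return lower-cased segments seen on at least `min_hosts` baseline hosts."""
    host_counts: Counter = Counter()
    for paths in baseline.values():
        seen_on_host = {seg.lower() for p in paths for seg in split_path(p)}
        host_counts.update(seen_on_host)
    return {seg for seg, n in host_counts.items() if n >= min_hosts}


def remove_stop_words(path: str, stop_words: Set[str]) -> List[str]:
    """Keep only the segments that are not stop words (original case kept)."""
    return [seg for seg in split_path(path) if seg.lower() not in stop_words]


if __name__ == "__main__":
    # Hypothetical baseline of three clean machines.
    baseline = {
        "pc1": [r"C:\Users\alice\Desktop"],
        "pc2": [r"C:\Users\bob\Desktop\a"],
        "pc3": [r"C:\Users\carol\Documents"],
    }
    stop_words = build_stop_words(baseline, min_hosts=3)
    print(remove_stop_words(r"C:\Users\Dell\BotnetNotes", stop_words))
```

Counting each segment once per host, rather than once per path, keeps a single machine with many similar paths from dominating the stop-word list.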
Segmentation. After removing the stop words, we separate sentences and words in each segment of the path with a tokenizer (e.g., NLTK NER) [37]. However, because the segments of a path are not semantically related, this segmentation is not very effective. Based on our study of path information, users often use symbols such as underscores to separate words in file names. They also tend to capitalize words in multiword strings to make them easier to read. The system therefore uses these common separator symbols and the alphabetic case characteristics to segment the strings again.
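The symbol and case based segmentation can be illustrated with a short Python sketch. The regular expression and the name `segment` are assumptions for this example; the sketch handles ASCII letters and digits only and treats every other character, including non-Latin text, as a separator.

```python
import re
from typing import List

# Runs of capitals not followed by lower case (acronyms), capitalized or
# lower-case words, and digit runs.
WORD_RE = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def segment(name: str) -> List[str]:
    """Split at non-alphanumeric symbols first, then at case changes."""
    tokens: List[str] = []
    for chunk in re.split(r"[^A-Za-z0-9]+", name):
        tokens.extend(WORD_RE.findall(chunk))
    return tokens


if __name__ == "__main__":
    for name in ("VirtualDesktop", "my_secret-file2018", "WriteSystem32"):
        print(name, "->", segment(name))
```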
Common Words Removal. After segmentation, the system obtains many recurring strings. Most of them are commonly used words that do not help in identifying ransomware attackers. We therefore filter out common words by comparing each string with a dictionary. If the string matches a dictionary entry, the system marks the common word property of the string as false; otherwise, it marks it as true.
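Note that the tag is inverted relative to what its name might suggest: a dictionary word gets False, and everything else gets True. A minimal sketch, assuming a tiny stand-in word list (`DEMO_DICTIONARY` and `common_word_tag` are hypothetical names; a real run would load a full dictionary):

```python
from typing import Set

# Tiny stand-in word list; an actual run would load a full dictionary.
DEMO_DICTIONARY: Set[str] = {"program", "desktop", "release", "system", "write"}


def common_word_tag(token: str, dictionary: Set[str] = DEMO_DICTIONARY) -> bool:
    """Return False when the token is a dictionary word, True otherwise."""
    return token.lower() not in dictionary


if __name__ == "__main__":
    for token in ("program", "gandcrab"):
        print(token, common_word_tag(token))
```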
Gibberish Detect. Based on our analysis and observation, we found a large amount of garbled information in path clues. Most of these strings are generated randomly. People can recognize this randomness by eye, but it is difficult for a program to recognize it automatically. We therefore propose detecting gibberish with a Markov chain model trained on English text. For each string X, the probability that the (i+1)-th character of X is x satisfies
$$P(X_{i+1} = x \mid X_1 = x_1, X_2 = x_2, \ldots, X_i = x_i) = P(X_{i+1} = x \mid X_i = x_i) \tag{2}$$
Each letter is related to the letters immediately before and after it, and this relationship can be expressed by 2-grams. For example, the string "Ransomware" yields the 2-grams Ra, an, ns, ..., e[space]. The analysis system records how often characters appear next to each other in a collection of gibberish and some commonly used normal phrases or vocabulary. It normalizes the counts after reading through the training data and then measures the probability of generating a string by multiplying the probabilities of the pairs of adjacent characters in the string [38]. From the training data, the system also computes the average transition probability for gibberish and for normal strings, respectively. This probability measures how likely the string is according to the data observed by the model. When we test the string "Ransom", it computes
$$\mathit{prob}[\text{'Ransom'}] = \mathit{prob}[\text{'R'}][\text{'a'}] \times \mathit{prob}[\text{'a'}][\text{'n'}] \times \cdots \times \mathit{prob}[\text{'m'}][\text{' '}] \tag{3}$$
If the input string is gibberish, it passes through some pairs with very low counts in the training phase and hence has a low probability. The system then looks at the average probability per character for a few known normal strings and a few known gibberish strings, and picks a threshold midway between the highest probability among the gibberish strings and the lowest probability among the normal strings:
$$\mathit{threshold} = \frac{\min(\mathit{normal\_probs}) + \max(\mathit{gibberish\_probs})}{2} \tag{4}$$
When we analyze a string X, if prob['X'] > threshold, the analysis system regards it as a normal string (True). If prob['X'] ≤ threshold, the analysis system tags the string as False. The system then removes the strings tagged False as gibberish.
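The gibberish test of equations (2) to (4) can be sketched in plain Python. This is an illustrative reading in the spirit of the detector cited as [38], not the authors' code: all function names are hypothetical, the alphabet is restricted to lower-case letters and space, add-one smoothing is an assumption, and the score is the geometric mean of the transition probabilities (the product in equation (3) normalized by the number of 2-grams). The four-sentence corpus only shows the mechanics; a usable model needs a large English text.

```python
import math
from typing import Iterable, Iterator, List, Sequence, Tuple

ALPHABET = "abcdefghijklmnopqrstuvwxyz "
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def bigrams(text: str) -> Iterator[Tuple[int, int]]:
    """Yield index pairs of adjacent characters, ignoring unknown characters."""
    ids = [INDEX[ch] for ch in text.lower() if ch in INDEX]
    return zip(ids, ids[1:])


def train(corpus: Iterable[str], smoothing: float = 1.0) -> List[List[float]]:
    """Count 2-grams and return row-normalized log transition probabilities."""
    size = len(ALPHABET)
    counts = [[smoothing] * size for _ in range(size)]
    for line in corpus:
        for a, b in bigrams(line):
            counts[a][b] += 1
    return [[math.log(c / sum(row)) for c in row] for row in counts]


def avg_transition_prob(text: str, model: Sequence[Sequence[float]]) -> float:
    """Geometric mean of the transition probabilities along the string."""
    pairs = list(bigrams(text))
    if not pairs:
        return 0.0  # fewer than two known characters: nothing to score
    return math.exp(sum(model[a][b] for a, b in pairs) / len(pairs))


def pick_threshold(model, normal: Iterable[str], gibberish: Iterable[str]) -> float:
    """Midpoint of the lowest normal score and highest gibberish score, eq. (4)."""
    normal_probs = [avg_transition_prob(s, model) for s in normal]
    gibberish_probs = [avg_transition_prob(s, model) for s in gibberish]
    return (min(normal_probs) + max(gibberish_probs)) / 2


def gibberish_tag(text: str, model, threshold: float) -> bool:
    """True for strings scored as normal, False for strings scored as gibberish."""
    return avg_transition_prob(text, model) > threshold


if __name__ == "__main__":
    # Toy corpus, far too small for real use.
    corpus = [
        "ransomware attackers often use remote desktop",
        "the analysis system records adjacent characters",
        "normal english words follow common spelling patterns",
        "security researchers trace the source of an attack",
    ]
    model = train(corpus)
    threshold = pick_threshold(model, ["remote desktop", "research"], ["zxqvkj", "vwrtjty"])
    for candidate in ("ransom", "vwrtjty"):
        print(candidate, round(avg_transition_prob(candidate, model), 4),
              gibberish_tag(candidate, model, threshold))
```

Working with log probabilities avoids numerical underflow on long strings, and averaging per 2-gram makes scores comparable across strings of different lengths.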
Table 1: Traceable strings recognized result [13].
String      Common words recognized   Gibberish recognized   Final recognized
Wh****ter   True                      True                   True
gandcrab    True                      True                   True
kate z      True                      True                   True
vwrtjty     True                      False                  False
program     False                     True                   False
Identification Result. After the above analysis, the automated analysis system outputs the strings for which both the common word tag and the gibberish tag are True; this is the list of traceable clues [13]. Table 1 is reproduced from Z. H. Wang et al. (2018) [under the Creative Commons Attribution License/public domain].
When the string is a normal word such as "program", common words recognition marks it as False. Gibberish recognition marks as True only identifier strings that are not randomly generated; it marks as False strings that do not conform to the spelling patterns of common words (e.g., vwrtjty). A string is considered to have auxiliary traceability only if both analysis values are True. We rely extensively on string inversion to verify the accuracy of the system and find that most of the traceable strings on the volunteers' storage can be recognized. Table 1 shows the recognition results for several typical strings.
6. Evaluation
In this paper, we evaluate the proposed method with two experiments. The goal of the first experiment is to demonstrate that the proposed method can help trace back RDP-based ransomware attackers and capture their private information. The goal of the second experiment is to demonstrate that the method can also automatically recognize clues in ransomware samples and help trace back ransomware makers.
6.1. RDP-Based Ransomware Attacker
6.1.1. Clue Capture and Analysis. We used Windows 7 virtual machines to deploy the deception environment and assess its effectiveness. Windows 7 is not required, but it was chosen because it supports a wide range of applications and is one of the main targets of ransomware. We invited 122 professional volunteers to help with the experiment and provided them with an experimental fleet of 12 virtual hosts. The monitor system successfully captured most of the volunteers' login information, clipboard content, shared folder paths, and uploaded PE files.
We chose the path information collected from the computers of 12 volunteers and recorded the rate of convergence after each step of the analysis. Figure 5 shows how many traceable clues remain after each step of processing by the analysis system [13]. Figure 5 is reproduced from Z. H. Wang et al. (2018) [under the Creative Commons Attribution License/public domain].
Figure 5: Changes in the data volume of the traceable clues of 12 volunteers as they pass through the analysis system. The screening rate can reach 98.34% [13]. (Line chart; x-axis: analysis process (source, stopword, tokenize, gibberish, result); y-axis: data quantity; one series per volunteer, volunteer1 to volunteer12.)
6.2. Traceback Attackers. Next, we carried out a detailed analysis of the information left by one of the volunteers' attacks. From the automatic analysis of this attacker's data, we can infer that the attacker's real identity may have the following characteristics: (1) the attacker is likely to come from China; (2) the attacker may work in security or a related field; (3) the attacker may work at the Chinese Academy of Sciences and have relations with a large security manufacturer in China. The conclusion is shown in Figure 6, which presents the results from the analysis system in five sections.
Username. Through automatic analysis, a total of three user names on the attacker's host were extracted from a large number of clues. From the "Dell" user name, it can be assumed that the attacker was using a Dell host.
Figure 6: Analysis of an attacker using the automatic analysis system, displaying traceable information about the attacker in five aspects: host user name, program software, keyboard layout, account information, and traceable strings.
Programs. The automatic analysis system identified several typical software programs installed on the attacker's host. For instance, "QQ" is a real-time social tool widely used in China; "AliPay" is an online payment tool developed by Alibaba and widely used in China; "Sogou input" is a Chinese input method developed by the Chinese company Sogou; and "MeiTu" is photo beautification software developed by a Chinese company and widely used in China. In addition, "Sinfor" and "360" are both well-known security manufacturers in China with a large number of security-related products, and "Visual Studio" is common development software. We find that much Chinese-made software widely used in China was installed on the attacker's host, as well as some development and security products that ordinary users install less often.
Keyboard Layout. Through network communication analysis, we find that the attacker's keyboard layout is Simplified Chinese, from which it can be inferred that the attacker's native language is probably Chinese. This is consistent with the software installed on the host.
Account Number. The system automatically identified two QQ accounts and two e-mail accounts. The e-mail accounts are Chinese Academy of Sciences business accounts, which supports the above assumption that the attacker is from China and likely works at the Chinese Academy of Sciences. Through the registration information of the QQ account (as shown in Figure 7), we can find the following information: (1) the attacker could be a male aged 32; (2) the attacker does Internet-related work, likely with 360 Security; (3) the attacker is located in Haidian District, Beijing, China. In the registration information of the "353***208" account, the mail, work, and other fields are empty and the nickname is a distinctive English string, "Wh***ter", so it can be presumed that the account is likely for private use.
Figure 7: The registration infographic of QQ account 372***82.
Traceable Strings. The automatic analysis system sifts from the monitor data the strings that may be traceable. "Freebuf" is a security information exchange website commonly used by Chinese security personnel. "Botnet" is a secondary attack method that is often used by attackers and is a main research direction for many security researchers. "**e" and "**t" are acronyms for departments under the Chinese Academy of Sciences, while "A***team" is a security research team in the "**e" department. "Visumantrag" is often used when processing visas for various countries. "Wh***ter" matches the nickname of the QQ account "353***208" and is likely to be the attacker's regular nickname.
Table 2: Same identifier for different samples [13]. Columns: Sample Hash; Same Identity Strings.
68dd613973f8a***7befe0f4b461d258
ce569b9020079***9ae0f090266a60cc
jqxxpphoaonkje7z2fa0c084aef41969***1f427a1b65e1b146
4fa82d297e712***e1fdf312c044814a
36da0286d42cd***306fba99239c8238
821beaf40744be***ba862dbc6d90f70c
bc839bcb88***7422bbd2fa812dd3
de02391b6a9f1***0762a6f49cf32501
Figure 8: Of the 8075 real ransomware samples, a Program Data Base (PDB) path could be extracted from 11% (878 samples), while the authors of the remaining 89% (7197 samples) may have chosen to hide the PDB path during compilation.
6.3. Ransomware Maker Automatic Analysis. To verify that the analysis system can trace back ransomware makers, we obtained more than 8000 ransomware samples from VirusTotal [39] and extracted the Program Data Base (PDB) path from the samples (as shown in Figure 8). The result shows that, although a large number of attackers deliberately erase compilation information, residual PDB path information can still be found in a large number of ransomware samples.
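The paper does not describe how the PDB paths were extracted, and the PE tools cited as [34, 35] may have been involved. As a simplified stand-in, the sketch below scans raw file bytes for the CodeView "RSDS" debug record, in which a 4-byte signature, a 16-byte GUID, and a 4-byte age are followed by the NUL-terminated PDB path. The function name `extract_pdb_paths` and the 260-byte path limit are assumptions; a byte scan can miss older debug formats and does not validate the PE structure.

```python
import re
from typing import List

# CodeView "RSDS" record: 4-byte signature, 16-byte GUID, 4-byte age,
# then the NUL-terminated PDB path.
RSDS_RE = re.compile(rb"RSDS[\x00-\xff]{20}([^\x00]{1,260}?\.[pP][dD][bB])\x00")


def extract_pdb_paths(data: bytes) -> List[str]:
    """Return every PDB path found after an RSDS signature in the raw bytes."""
    return [
        match.group(1).decode("utf-8", errors="replace")
        for match in RSDS_RE.finditer(data)
    ]


if __name__ == "__main__":
    # Synthetic bytes standing in for a PE file with one debug record.
    record = (
        b"RSDS" + bytes(range(16)) + (1).to_bytes(4, "little")
        + b"C:\\work\\demo\\Release\\demo.pdb\x00"
    )
    print(extract_pdb_paths(b"MZ" + b"\x00" * 32 + record))
```

The extracted path can then be fed into the same path-splitting and string identification steps used for the path clues.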
Moreover, the analysis system was used to automatically analyze the PDB paths of more than 800 different ransomware samples. We found multiple identity strings that recur in the analysis results of different samples. One typical result is shown in Table 2 [13]. Table 2 is reproduced from Z. H. Wang et al. (2018) [under the Creative Commons Attribution License/public domain]. We used VirusTotal to validate the samples and found that they were all Cobra family ransomware. It can therefore be assumed that these samples were made by the same ransomware maker. When one sample is traced back, the conclusion about the ransomware maker also applies to the other samples. When different traceable information is obtained from different samples, the identity of the ransomware maker can be described by integrating that information.
Language Identification. When we apply language identification to the PDB paths of 878 samples, the result is as shown in Figure 9. The system automatically identifies the languages of the paths, which mainly comprise 11 languages, of which English accounts for the largest proportion.
Figure 9: Of the 878 ransomware sample PDB paths, English accounted for 510 (58.09%), and the remaining detected languages were Chinese 31 (3.53%), Dutch 40 (4.56%), Norwegian Nynorsk 4 (0.46%), Slovenian 5 (0.57%), Norwegian Bokmal 1 (0.11%), German 58 (6.61%), Italian 3 (0.34%), Hungarian 35 (3.99%), Maltese 18 (2.05%), Polish 5 (0.57%), Finnish 151 (17.20%), Luxembourgish 3 (0.34%), Danish 10 (1.14%), and Spanish 4 (0.46%).
Actual Case Analysis. The automatic identification system helped us identify the traceable strings in the PDB path. Take as an example a sample PDB path (the Chinese has been translated into English) with an MD5 value of "60c6a92afb***6d0c16f7":
"ELIAOSTUDIOUACsimulated an improved version of the 360 anti-virus program 1117 Bale1*41**2****0(**1) 9818 programWriteSystem32ReleaseVirtualDesktoppdb"
From the automatic analysis result, we can infer that the ransomware maker is likely from China. The following points support this inference.
(1) Language Identification. Identifying the language of the path suggests that the attacker may have come from China. From the Chinese strings in the path, we find that the text describes an improvement made to bypass detection by 360 security software, which has a large market in China.
(2) IP Address. The system identified an IP address in the PDB path. With the help of a threat intelligence database, we found that the IP address is from Xinjiang, China (Figure 10).
Figure 10: Partial infographic of IP 1*41**2****0 in the threat intelligence library.
(3) Traceable Strings. The extracted identifier string "LIAO" resembles Chinese pinyin and is most likely a Chinese surname. To sum up, we speculate that the ransomware maker is most likely from China.
7. Conclusion
In this paper, we focus on tracing back RDP-based ransomware. Recently, more and more ransomware attackers have been using RDP attacks to spread ransomware with impunity, owing to the use of strong cryptography. We propose tracing back the attack sources to deter RDP-based ransomware using the proposed deception environment. It collects traceable clues and performs automatic analysis using natural language processing and machine learning techniques. The evaluation shows that it is able to trap attackers and collect the traceable clues they leave in a deception environment. With automatic clue identification, it can reduce the volume of traceable clues to about 2% of the original. We use this method to analyze two practical cases and show its effectiveness. By tracing back to ransomware attackers, we can provide a strong deterrent to stifle the development of ransomware.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Additional Points
This manuscript provides additional deception environment monitors and analytical methods, using more volunteers and samples. It demonstrates the validity of the methods through specific traceback cases (i.e., tracing back the ransomware attacker and tracing back the ransomware maker).
Disclosure
An earlier version of this paper was presented at the IEEE Third International Conference on Data Science in Cyberspace, Guangzhou, China, 18-21 June 2018. The authors' initial conference paper focused mainly on the effectiveness of the deception environment monitors and the screening rate of the analysis system.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
The authors would like to thank VirusTotal for providing ransomware samples and the volunteers from the University of Chinese Academy of Sciences. This work is supported in part by the National Key Research and Development Plan (Grant no. 2018YFB0803504) and the National Natural Science Foundation of China (61871140, 61572153, U1636215, 61572492, and 61672020).
References
[1] A. Kharraz, "UNVEIL: A Large-Scale, Automated Approach to Detecting Ransomware," in Proceedings of the 25th USENIX Security Symposium (USENIX Security 16), pp. 757–772, USENIX Association, 2016.
[2] K. Savage, P. Coogan, and H. Lau, "The evolution of ransomware," Tech. Rep., 2015.
[3] A. Kharraz, W. Robertson, D. Balzarotti, L. Bilge, and E. Kirda, "Cutting the Gordian Knot: A Look Under the Hood of Ransomware Attacks," in Detection of Intrusions and Malware, and Vulnerability Assessment, vol. 9148 of Lecture Notes in Computer Science, pp. 3–24, Springer International Publishing, Cham, 2015.
[4] S. Mohurle and M. Patil, "A brief study of Wannacry Threat: Ransomware Attack 2017," International Journal of Advanced Research in Computer Science, vol. 8, no. 5, 2017.
[5] X. Yu, Z. Tian, J. Qiu, and F. Jiang, "A Data Leakage Prevention Method Based on the Reduction of Confidential and Context Terms for Smart Mobile Devices," Wireless Communications and Mobile Computing, vol. 2018, Article ID 5823439, 11 pages, 2018.
[6] J. Yaneza, "Brute Force RDP Attacks Plant CRYSIS Ransomware," https://blog.trendmicro.com/trendlabs-security-intelligence/brute-force-rdp-attacks-plant-crysis-ransomware.
[7] "10% of Ransomware Attacks on SMBs Targeted IoT Devices," https://www.darkreading.com/application-security/10--of-ransomware-attacks-on-smbs-targeted-iot-devices-/d/d-id/1329817.
[8] Q. Tan, Y. Gao, J. Shi, X. Wang, B. Fang, and Z. H. Tian, "Towards a Comprehensive Insight into the Eclipse Attacks of Tor Hidden Services," IEEE Internet of Things Journal, 2018.
[9] "Kaspersky Security Bulletin: STORY OF THE YEAR 2017," 2017, https://media.kasperskycontenthub.com/wp-content/uploads/sites/43/2018/03/07164824/KSB_Story_of_the_Year_Ransomware_FINAL_eng.pdf.
[10] Z. Tian, Y. Cui, L. An, et al., "A Real-Time Correlation of Host-Level Events in Cyber Range Service for Smart Campus," IEEE Access, vol. 6, pp. 35355–35364, 2018.
[11] R. Ross et al., Managing Information Security Risk: Organisation, Mission, and Information System View, National Institute of Standards and Technology, 2011, http://csrc.nist.gov/publications/nistpubs/800-39/SP800-39-final.pdf.
[12] Y. Wang, Z. Tian, H. Zhang, S. Su, and W. Shi, "A Privacy Preserving Scheme for Nearest Neighbor Query," Sensors, vol. 18, no. 8, p. 2440, 2018.
[13] Z. H. Wang, X. Wu, C. G. Liu, Q. X. Liu, and J. L. Zhang, "RansomTracer: Exploiting Cyber Deception for Ransomware Tracing," in Proceedings of the IEEE Third International Conference on Data Science in Cyberspace, pp. 227–234, 2018.
[14] B. A. S. Al-rimy, M. A. Maarof, and S. Z. M. Shaid, "Ransomware threat success factors, taxonomy, and countermeasures: A survey and research directions," Computers & Security, vol. 74, pp. 144–166, 2018.
[15] F. Jiang, Y. Fu, B. B. Gupta, et al., "Deep Learning based Multi-channel intelligent attack detection for Data Security," IEEE Transactions on Sustainable Computing, 2018.
[16] N. Andronio, S. Zanero, and F. Maggi, "HelDroid: dissecting and detecting mobile ransomware," in Research in Attacks, Intrusions, and Defenses, vol. 9404 of Lecture Notes in Computer Science, pp. 382–404, Springer, 2015.
[17] F. Mercaldo, V. Nardone, A. Santone, and C. A. Visaggio, "Ransomware Steals Your Phone. Formal Methods Rescue It," in Formal Techniques for Distributed Objects, Components, and Systems: 36th IFIP WG 6.1 International Conference, FORTE 2016, held as part of the 11th International Federated Conference on Distributed Computing Techniques, DisCoTec 2016, E. Albert and I. Lanese, Eds., pp. 212–221, Springer International Publishing, 2016.
[18] D. Sgandurra, L. Muñoz-González, R. Mohsen, and E. C. Lupu, "Automated Dynamic Analysis of Ransomware: Benefits, Limitations and use for Detection," https://arxiv.org/abs/1609.03020.
[19] S. Song, B. Kim, and S. Lee, "The Effective Ransomware Prevention Technique Using Process Monitoring on Android Platform," Mobile Information Systems, vol. 2016, Article ID 2946735, 9 pages, 2016.
[20] T. Yang, Y. Yang, K. Qian, D. C.-T. Lo, Y. Qian, and L. Tao, "Automated detection and analysis for android ransomware," in Proceedings of the 17th IEEE International Conference on High Performance Computing and Communications, IEEE 7th International Symposium on Cyberspace Safety and Security, and IEEE 12th International Conference on Embedded Software and Systems, HPCC-ICESS-CSS 2015, pp. 1338–1343, USA, August 2015.
[21] N. Scaife, H. Carter, P. Traynor, and K. R. B. Butler, "CryptoLock (and Drop It): Stopping Ransomware Attacks on User Data," in Proceedings of the 36th IEEE International Conference on Distributed Computing Systems, ICDCS 2016, pp. 303–312, Japan, June 2016.
[22] M. M. Ahmadian and H. R. Shahriari, "2entFOX: A framework for high survivable ransomwares detection," in Proceedings of the 13th International ISC Conference on Information Security and Cryptology, ISCISC 2016, pp. 79–84, Iran, September 2016.
[23] M. M. Ahmadian, H. R. Shahriari, and S. M. Ghaffarian, "Connection-monitor & connection-breaker: A novel approach for prevention and detection of high survivable ransomwares," in Proceedings of the 12th International ISC Conference on Information Security and Cryptology, ISCISC 2015, pp. 79–84, Iran, September 2015.
[24] D. Kim, W. Soh, and S. Kim, "Design of Quantification Model for Prevent of Cryptolocker," Indian Journal of Science and Technology, vol. 8, no. 19, 2015.
[25] C. Moore, "Detecting Ransomware with Honeypot Techniques," in Proceedings of the 2016 Cybersecurity and Cyberforensics Conference (CCC), pp. 77–81, Amman, Jordan, August 2016.
[26] F. Mbol, J. Robert, and A. Sadighian, "An Efficient Approach to Detect TorrentLocker Ransomware in Computer Systems," in Cryptology and Network Security, S. Foresti and G. Persiano, Eds., vol. 10052 of Lecture Notes in Computer Science, pp. 532–541, Springer International Publishing, Cham, 2016.
[27] K. Cabaj, P. Gawkowski, K. Grochowski, and D. Osojca, "Network activity analysis of CryptoWall ransomware," Przegląd Elektrotechniczny, vol. 91, no. 11, pp. 201–204, 2015.
[28] J. Chen, Z. Tian, X. Cui, L. Yin, and X. Wang, "Trust Architecture and Reputation Evaluation for Internet of Things," Journal of Ambient Intelligence & Humanized Computing, vol. 2, pp. 1–9, 2018.
[29] C. Le Guernic and A. Legay, "Ransomware and the Legacy Crypto API," in Risks and Security of Internet and Systems: 11th International Conference, CRiSIS 2016, Roscoff, France, 2017.
[30] L. Pingree, Emerging Technology Analysis: Deception Techniques and Technologies Create Security Technology Business Opportunities, Gartner, 2015.
[31] "Ransomware and Businesses," https://www.symantec.com/content/dam/symantec/docs/security-center/white-papers/ransomware-and-businesses-16-en.pdf, 2016.
[32] A. Kharraz and E. Kirda, "Redemption: Real-Time Protection Against Ransomware at End-Hosts," in Proceedings of the International Symposium on Research in Attacks, Intrusions, and Defenses, pp. 98–119, 2017.
[33] F. Bergstrand, J. Bergstrand, and H. Gunnarsson, Localization of Spyware in Windows Environments.
[34] "PEView," https://www.aldeid.com/wiki/PEView.
[35] "python-pefile," https://pypi.python.org/pypi/pefile.
[36] M. Lui and T. Baldwin, "langid.py: An off-the-shelf language identification tool," in Proceedings of the ACL 2012 System Demonstrations, Association for Computational Linguistics, pp. 25–30, 2012.
[37] S. Bird and E. Loper, "NLTK: the natural language toolkit," in Proceedings of the ACL 2004 on Interactive Poster and Demonstration Sessions, Association for Computational Linguistics, Barcelona, Spain, July 2004.
[38] "Rrenaud, Gibberish-Detector," https://github.com/rrenaud/Gibberish-Detector/blob/master/README.rst.
[39] "VirusTotal," http://virustotal.com.
8 Wireless Communications and Mobile Computing88
117
99 107
12
0
11
1
73
1
44
5
27
116
66
103
ENGLISH KOREANRUSSIANCHINESE
langid toolkit correct recognition
langid toolkit incorrect recognitionlangdetect toolkit correct recognition
langdetect toolkit incorrect recognition
Figure 3 Accuracy of Two Language Identification Tools In testsusing the sameknown language path clues the accuracy of the langidtoolkit for English Chinese Russian and Korean was 88 10099 990 respectively while the accuracy of the langdet toolkitwas 73 085 40 and 460 respectively
StopWords
SentenceTokenize
Sign Split
Strings
Common wordsdetect
Alphabetic CaseCharacteristics
Result
Gibberish detect
StringClues
Word Tokenize
Traceable StringsIdentification
PathClues
Path Split
Figure 4 Process of automatic identifying of traceable clues
computers as stop words Then it removes the stop wordsfrom the strings after each split
Segmentation After removing the stop words we separatesentences and words for each segment of the path withtokenize (eg NLTK NER) [37] However as the segmentsof the path are not semantically related the segmentationis less effective Based on our extensive research on pathinformation users often prefer to use symbols such asunderscores to split file names In addition they also like touse capitalization in multiword strings to facilitate readingAs a result the system uses the common identifier in thedocument and alphabetic case characteristic to segment thestrings again
Common Words Removal After segmentation the systemobtains a lot of repeatable strings But most of them arecommonly used words and does not help in identifyingransomware attackers As a result we filter out the commonwords by comparing each string with a dictionary If thestring can be matched in a dictionary the system marks thecommon word property of the word as false otherwise astrue
Gibberish Detect Based on our analysis and observation wefound that there is a large amount of garbled informationin path clues Most of these strings are generated randomlyand people can recognize this randomness manually but itis difficult for a program to recognize it automatically Asa result we propose detecting the gibberish by training theMarkov chain model with English texts For each string Xthere is a probability that the i character in X is
P 119883119894+1
= 119909 | 1198831= 1199091 1198832= 1199092 119883
119894= 119909119894
= 119875 119883119894+1
= 119909 | 119883119894= 119909119894
(2)
Each letter is related to the upper and lower two lettersThis relationship can be expressed by 2-gram For examplewith regard to the string ldquoRansomwarerdquoRa anns e[space]The analysis system can record how often the charactersappear next to each other through the collection of gibberishand some commonly used normal phrases or vocabulary Itnormalizes the counts after reading through the training dataand then measures the probability of generating the stringbased on the digest by multiplying the probability of the pairof adjacent characters in the string [38]The training data canhelp statistics corresponding to gibberish and normal stringsrespectively the average transfer probability This probabilitythen measures the amount of possibilities assigned to thisstring according to the data as observed by the model Whenwe test the string ldquoRansomrdquo it computes
$$\mathit{prob}[\text{'Ransom'}] = \mathit{prob}[\text{'R'}][\text{'a'}] \times \mathit{prob}[\text{'a'}][\text{'n'}] \times \cdots \times \mathit{prob}[\text{'m'}][\text{' '}] \tag{3}$$
If the input string is gibberish, it will pass through some pairs with very low counts in the training phase and hence have low probability. The system then looks at the probability per character for a few known normal strings and a few examples of known gibberish, and picks a threshold midway between the highest probability among the gibberish strings and the lowest probability among the normal strings:
$$\mathit{threshold} = \frac{\min(\mathit{normal\_probs}) + \max(\mathit{gibberish\_probs})}{2} \tag{4}$$
When we analyze a string X, if prob[X] > threshold, the analysis system views the string as a normal string. If prob[X] ≤ threshold, the analysis system tags the string as ‘False’. The system then removes the strings with the ‘False’ tag as gibberish.
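The training, scoring, and threshold steps of equations (2) to (4) can be sketched as follows. This is not the authors' implementation: the add-one smoothing, the restriction to lowercase letters and the space character, the use of the per-pair geometric mean as the "probability per character", and the toy training corpus are all assumptions made so that the example is self-contained.

```python
import math

ALPHABET = "abcdefghijklmnopqrstuvwxyz "

def normalize(text):
    """Lowercase the text and keep only characters of the assumed alphabet."""
    return [c for c in text.lower() if c in ALPHABET]

def train(lines):
    """Count adjacent character pairs and normalize them to log-probabilities."""
    # Every pair starts at 1 (add-one smoothing, an assumption) so that
    # unseen pairs get a small nonzero probability.
    counts = {a: {b: 1 for b in ALPHABET} for a in ALPHABET}
    for line in lines:
        chars = normalize(line)
        for a, b in zip(chars, chars[1:]):
            counts[a][b] += 1
    return {
        a: {b: math.log(n / sum(row.values())) for b, n in row.items()}
        for a, row in counts.items()
    }

def avg_transition_prob(text, model):
    """Geometric mean of the transition probabilities along the string."""
    chars = normalize(text)
    pairs = list(zip(chars, chars[1:]))
    if not pairs:
        return 0.0
    return math.exp(sum(model[a][b] for a, b in pairs) / len(pairs))

def pick_threshold(normal, gibberish, model):
    """Equation (4): midpoint between the weakest normal and strongest gibberish."""
    normal_probs = [avg_transition_prob(s, model) for s in normal]
    gibberish_probs = [avg_transition_prob(s, model) for s in gibberish]
    return (min(normal_probs) + max(gibberish_probs)) / 2

def is_normal(text, model, threshold):
    return avg_transition_prob(text, model) > threshold

# Toy corpus and labelled examples (hypothetical, far smaller than real training data).
corpus = ["the ransomware program encrypts user files and demands a ransom"] * 5
model = train(corpus)
threshold = pick_threshold(["ransom", "program"], ["vwrtjty", "zxqkjv"], model)
```

Averaging in log space instead of multiplying raw probabilities, as equation (3) does, avoids penalizing long strings and numerical underflow; it is a design choice of this sketch rather than something stated in the text.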
Table 1: Traceable strings recognized result [13].
String       Common Words Recognized   Gibberish Recognized   Final Recognized
Wh****ter    True                      True                   True
gandcrab     True                      True                   True
kate z       True                      True                   True
vwrtjty      True                      False                  False
program      False                     True                   False
Identification Result. After the above analysis, the automated analysis system outputs the strings for which both the common word tag and the gibberish tag are True; this is the list of traceable clues [13]. Table 1 is reproduced from Z. H. Wang et al. (2018) [under the Creative Commons Attribution License/public domain].
When the string is a normal word such as “program”, the common words recognition will mark it “False”. Gibberish recognition accepts only non-randomly generated identifier strings; it rejects strings that do not conform to the spelling patterns of common words (e.g., “vwrtjty”). The string is considered to have auxiliary traceability only if both of the analysis values are true. We rely extensively on string inversion to verify the accuracy of the system. We found that most of the traceable strings on the volunteers’ storage can be recognized. Table 1 shows the recognition results of several typical strings.
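The combination rule can be checked against the rows of Table 1 with a short sketch. The tag values below are copied from Table 1; the names `TABLE_1_TAGS` and `final_tag` are hypothetical.

```python
# (common word tag, gibberish tag) pairs copied from Table 1.
TABLE_1_TAGS = {
    "gandcrab": (True, True),
    "kate z": (True, True),
    "vwrtjty": (True, False),
    "program": (False, True),
}

def final_tag(common_word_tag, gibberish_tag):
    """A string is traceable only if both analysis values are True."""
    return common_word_tag and gibberish_tag

traceable = [s for s, tags in TABLE_1_TAGS.items() if final_tag(*tags)]
```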
6. Evaluation
In this paper, we evaluate the proposed method with two experiments. The goal of the first experiment is to demonstrate that the proposed method can help trace back to RDP-based ransomware attackers and capture their private information. The goal of the second experiment is to demonstrate that the method can also automatically recognize clues in ransomware samples and help trace back to ransomware makers.
6.1. RDP-Based Ransomware Attacker
6.1.1. Clue Capture and Analysis. We used Windows 7 virtual machines to deploy the deception environment and assess its effectiveness. While Windows 7 is not required, it was chosen because of its wide range of applications and because it is one of the main targets of ransomware. We invited 122 professional volunteers to help with the experiment and provided them with an experimental fleet of 12 virtual hosts. Most of the volunteers’ login information, clipboard content, shared folder paths, and uploaded PE files were successfully captured by the monitor system.
We chose the path information collected from 12 volunteers’ computers and recorded the rate of convergence after each step of the analysis. Figure 5 shows how many traceable clues remain after each step of processing by the analysis system [13]. Figure 5 is reproduced from Z. H. Wang et al. (2018) [under the Creative Commons Attribution License/public domain].
[Figure 5 plot “Analysis Data Change Figure”: x-axis “Analysis Process” (source, stopword, tokenize, gibberish, result); y-axis “Data quantity” (0 to 50); one series per volunteer (volunteer1 to volunteer12).]
Figure 5: The data volume changes in the traceable clues of 12 volunteers through the analysis system. The screening rate can reach 98.34% [13].
6.2. Traceback Attackers. Next, we carried out a detailed analysis of the information left over from one of the volunteers’ attacks. From the data produced by the automatic analysis of this attacker, we can infer that the attacker’s real identity may have the following characteristics: (1) the attacker is likely to come from China; (2) the attacker may work in security or a related field; (3) the attacker may work in the Chinese Academy of Sciences and have relations with a large security manufacturer in China. The conclusion is shown in Figure 6, which presents the results from the analysis system in five sections.
Username. Through automatic analysis, a total of three user names from the attacker’s host were extracted from a large number of clues. From the “Dell” user name, it can be assumed that the attacker was using a Dell host.
Programs. The automatic analysis system identified several typical software programs installed on the attacker’s host. For instance, “QQ” is a widely used real-time social tool in China; “AliPay” is an online payment tool developed by Alibaba and widely used in China; “Sogou input” is a Chinese input method software developed by the Chinese company Sogou; “MeiTu” is a photo beautification software developed by a Chinese company and widely used in China. In addition, “Sinfor” and “360” are both well-known security manufacturers in China with a large number of security-related products. “Visual Studio” is a common development software. We found that much Chinese-made software widely used in China was installed on the attacker’s host, as well as some development and security products that are less often installed by ordinary users.
[Figure 6 diagram, five sections. UserName: Admin, Default, DELL. KeyboardLayout: CH-SIMPLIFIED. Programs: AliPay, QQ, SogouInput, MeiTu, Sinfor, 360, VisualStudio. Account: 353***208, 372***582, ****e.ac.cn, ****t.ac.cn. TraceableStrings: botnet, **e, **t, A***team, Wh***ter, freebuf, visumantrag.]
Figure 6: Analysis of an attacker using the automatic analysis system, displaying traceable information about the attacker in five aspects: host user name, program software, keyboard layout, account information, and traceable strings.
Keyboard Layout. Through network communication analysis, we found that the attacker’s keyboard layout is Simplified Chinese, from which it can be inferred that the attacker’s native language is probably Chinese. This is consistent with the software installed on the host.
Account Number. The system automatically identified two QQ accounts and two e-mail accounts. The e-mail accounts are Chinese Academy of Sciences business accounts, which supports the above assumption that the attacker is from China and is likely to work in the Chinese Academy of Sciences. Through the registration information of the QQ account (as shown in Figure 7), we can find the following information: (1) the attacker could be a male aged 32; (2) the attacker does Internet-related work, likely with 360 Security; (3) the attacker is located in Haidian District, Beijing, China. In the registration information of the “353***208” account, the mail, work, and other fields are empty and the nickname is a special English string, “Wh***ter”; it can be presumed that this account is likely for private use.
Traceable Strings. The automatic analysis system sifts out strings from the monitor that may be traceable. “Freebuf” is a security information exchange website commonly used by Chinese security personnel. “Botnet” is a secondary attack method that is often used by attackers and is often taken by security researchers as a main research direction. “**e” and “**t” are acronyms for departments under the Chinese Academy of Sciences, while “A***team” is a security research team in the “**e” department. “Visumantrag” is often used when processing visas from various countries. “Wh***ter” matches the nickname of the QQ account “353***208” and is likely to be the attacker’s regular nickname.
Figure 7: The registration infographic of QQ account 372***82.
Table 2: Same identifier for different samples [13].
Sample Hash        Same Identity Strings
68dd613973f8a***7befe0f4b461d258ce569b9020079***9ae0f090266a60cc
jqxxpphoaonkje7z2fa0c084aef41969***1f427a1b65e1b1464fa82d297e712***e1fdf312c044814a36da0286d42cd***306fba99239c8238821beaf40744be***ba862dbc6d90f70cbc839bcb88***7422bbd2fa812dd3de02391b6a9f1***0762a6f49cf32501
[Figure 8 chart: axis “number” (0 to 100); ransomware with PDB path: 878; ransomware without a PDB path: 7197.]
Figure 8: Of the 8075 real ransomware samples, the Program Data Base (PDB) path could be extracted from 11%, while the authors of the remaining 89% may have chosen to hide the PDB path during compilation.
6.3. Ransomware Maker Automatic Analysis. In order to verify that the analysis system is useful in tracing back the ransomware maker, we obtained more than 8000 ransomware samples from VirusTotal [39] and extracted the Program Data Base (PDB) path from the samples (as shown in Figure 8). The result shows that, despite many attackers’ deliberate erasure of compilation information, legacy PDB path information can still be found in a large number of ransomware samples.
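As a rough illustration of PDB path extraction, the sketch below scans raw bytes for printable ASCII runs ending in ".pdb". This is a simplified heuristic chosen so the example needs only the standard library; it is an assumption and not the extraction method used in the evaluation, which would more plausibly parse the PE debug directory (for example with the python-pefile package [35]). The function name and the synthetic byte string are hypothetical.

```python
import re

# Printable ASCII run (at most 256 bytes) followed by ".pdb", case-insensitive.
PDB_RE = re.compile(rb"[\x20-\x7e]{1,256}\.pdb", re.IGNORECASE)

def find_pdb_paths(data):
    """Return every ASCII string ending in '.pdb' found in the raw bytes."""
    return [m.group().decode("ascii") for m in PDB_RE.finditer(data)]

# Synthetic bytes loosely shaped like a CodeView record: a signature, 16 bytes
# of GUID, a 4-byte age, then a NUL-terminated path. Not a real sample.
fake_debug_record = (
    b"MZ\x90\x00" + b"RSDS" + bytes(16) + b"\x01\x00\x00\x00"
    + b"C:\\build\\Release\\sample.pdb\x00\x00"
)
paths = find_pdb_paths(fake_debug_record)
```

A limitation of this heuristic is that paths containing non-ASCII characters, such as Chinese directory names, are truncated or missed, so it should be read as a first-pass filter only.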
What is more, the analysis system is used to automatically analyze the PDB paths of more than 800 different ransomware samples. We found the same identity strings in the analysis results of different samples. One of the typical results is shown in Table 2 [13]. Table 2 is reproduced from Z. H. Wang et al. (2018) [under the Creative Commons Attribution License/public domain]. We used VirusTotal to validate the samples and found that they were all Cobra family ransomware. As a result, it could be assumed that these samples were made by the same ransomware maker. When one sample is traced back, the conclusion about the ransomware maker carries over to the other samples. When different traceable information is obtained from different samples, the identity of the ransomware maker can be described by integrating that information.
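Finding identity strings shared by several samples amounts to inverting a sample-to-strings mapping, as sketched below. The sample hashes and identifiers are placeholders invented for the example, and `group_by_identifier` is a hypothetical helper name.

```python
from collections import defaultdict

def group_by_identifier(sample_identifiers):
    """Map each identity string to the samples containing it (shared ones only)."""
    groups = defaultdict(set)
    for sample_hash, identifiers in sample_identifiers.items():
        for identifier in identifiers:
            groups[identifier].add(sample_hash)
    # Keep identifiers that occur in at least two different samples.
    return {i: hashes for i, hashes in groups.items() if len(hashes) > 1}

# Placeholder data: hashes and strings are made up for illustration.
extracted = {
    "hash_a": {"builder01", "release"},
    "hash_b": {"builder01"},
    "hash_c": {"otherdev"},
}
shared = group_by_identifier(extracted)
```

Samples that end up in the same group are candidates for a common maker, which can then be cross-checked against family labels as the text does with VirusTotal.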
Language Identification. When we apply language recognition to the PDB paths of the 878 samples, the result is as shown in Figure 9. The system automatically identifies the languages in the paths, which mainly comprise 11 languages, of which English accounts for the largest proportion.
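The percentage shares reported with Figure 9 follow directly from the per-language counts given in its caption, as the short check below shows. The counts are copied from the caption; the variable names are hypothetical.

```python
# Per-language sample counts as listed in the Figure 9 caption.
language_counts = {
    "English": 510, "Chinese": 31, "Dutch": 40, "Norwegian Nynorsk": 4,
    "Slovenian": 5, "Norwegian Bokmal": 1, "German": 58, "Italian": 3,
    "Hungarian": 35, "Maltese": 18, "Polish": 5, "Finnish": 151,
    "Luxembourgish": 3, "Danish": 10, "Spanish": 4,
}

total = sum(language_counts.values())
# Share of each language in percent, rounded to two decimals.
shares = {lang: round(100 * n / total, 2) for lang, n in language_counts.items()}
```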
[Figure 9 chart: share of each detected language (%). Legend: zh, en, nl, nn, sl, nb, de, it, hu, mt, pl, fi, et, da, es.]
Figure 9: Of the 878 ransomware sample PDB paths, English accounted for 510 (58.09%), and the remaining detected languages were Chinese 31 (3.53%), Dutch 40 (4.56%), Norwegian Nynorsk 4 (0.46%), Slovenian 5 (0.57%), Norwegian Bokmal 1 (0.11%), German 58 (6.61%), Italian 3 (0.34%), Hungarian 35 (3.99%), Maltese 18 (2.05%), Polish 5 (0.57%), Finnish 151 (17.20%), Luxembourgish 3 (0.34%), Danish 10 (1.14%), and Spanish 4 (0.46%).
Actual Case Analysis. The automatic identification system helped us identify the traceable strings in the PDB path. Take the example of a sample PDB path (the Chinese has been translated into English) with an MD5 value of “60c6a92afb***6d0c16f7”:
“ELIAOSTUDIOUACsimulated an improved version of the 360 anti-virus program 1117 Bale1*41**2****0(**1) 9818 programWriteSystem32ReleaseVirtualDesktoppdb”
We can infer from the automatic analysis result that the ransomware maker is likely from China. This inference is supported by the following evidence.
(1) Language Identification. Identifying the language of the path suggests that the attacker may have come from China. By reading the Chinese strings in the path, we find that the text describes an improvement intended to bypass detection by 360 security software, which has a large market in China.
(2) IP Address. The system identified an IP address in the PDB path. With the help of a threat intelligence database, we found that the IP address is located in Xinjiang, China (Figure 10).
Figure 10: Partial infographic of IP 1*41**2****0 in the threat intelligence library.
(3) Traceable Strings. The extracted identifier string “LIAO” resembles Chinese pinyin and is most likely a Chinese surname. To sum up, we speculate that the ransomware maker is most likely from China.
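Spotting an IP address inside a path string, as in item (2) above, could be sketched as follows. This assumes the address appears in dotted-decimal form; the regular expression, the function name `find_ipv4`, and the example path (which uses a reserved documentation address) are illustrative assumptions, not details taken from the analyzed sample.

```python
import ipaddress
import re

# Four dot-separated groups of 1-3 digits, not embedded in a longer dotted number.
IPV4_CANDIDATE = re.compile(r"(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?![\d.])")

def find_ipv4(path):
    """Return the valid IPv4 addresses that appear in a path string."""
    found = []
    for match in IPV4_CANDIDATE.finditer(path):
        try:
            found.append(str(ipaddress.IPv4Address(match.group())))
        except ValueError:
            pass  # e.g. an octet above 255
    return found

# Hypothetical path; 192.0.2.10 is a reserved documentation address.
addresses = find_ipv4(r"E:\work\build 192.0.2.10 (v1)\Release\demo.pdb")
```

Any address found this way would still need to be looked up in a threat intelligence source before drawing conclusions about location.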
7. Conclusion
In this paper, we focus on tracing back RDP-based ransomware. Recently, more and more ransomware attackers are using RDP attacks to spread ransomware with impunity due to the use of strong cryptography. We propose tracing back the attack sources to deter RDP-based ransomware with the proposed deception environment. It collects traceable clues and performs automatic analysis by using natural language processing and machine learning techniques. The evaluation shows that it is able to trap attackers and collect traceable clues left by attackers in a deception environment. With automatic clue identification, it can reduce the amount of traceable clues to about 2% of the original volume. We use this method to analyze two practical cases and show its effectiveness. By tracing back to ransomware attackers, we can provide a strong deterrent to stifle the development of ransomware.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Additional Points
This manuscript provides additional deception environment monitors and analytical methods by using more volunteers and samples. It proves the validity of the methods through specific traceback cases (i.e., tracing back the ransomware attacker and tracing back the ransomware maker).
Disclosure
An earlier version of this paper was presented at the IEEE Third International Conference on Data Science in Cyberspace, Guangzhou, China, 18–21 June 2018. The authors’ initial conference paper focused mainly on the effectiveness of deception environment monitors and the screening rate of the analysis system.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
The authors would like to thank VirusTotal for providing ransomware samples and the volunteers from the University of Chinese Academy of Sciences. This work is supported in part by the National Key Research and Development Plan (Grant no. 2018YFB0803504) and the National Natural Science Foundation of China (61871140, 61572153, U1636215, 61572492, and 61672020).
References
[1] A. Kharraz, “UNVEIL: A Large-Scale, Automated Approach to Detecting Ransomware,” in Proceedings of the 25th USENIX Security Symposium (USENIX Security 16), pp. 757–772, USENIX Association, 2016.
[2] K. Savage, P. Coogan, and H. Lau, “The evolution of ransomware,” Tech. Rep., 2015.
[3] A. Kharraz, W. Robertson, D. Balzarotti, L. Bilge, and E. Kirda, “Cutting the Gordian Knot: A Look Under the Hood of Ransomware Attacks,” in Detection of Intrusions and Malware, and Vulnerability Assessment, vol. 9148 of Lecture Notes in Computer Science, pp. 3–24, Springer International Publishing, Cham, 2015.
[4] S. Mohurle and M. Patil, “A brief study of Wannacry Threat: Ransomware Attack 2017,” International Journal of Advanced Research in Computer Science, vol. 8, no. 5, 2017.
[5] X. Yu, Z. Tian, J. Qiu, and F. Jiang, “A Data Leakage Prevention Method Based on the Reduction of Confidential and Context Terms for Smart Mobile Devices,” Wireless Communications and Mobile Computing, vol. 2018, Article ID 5823439, 11 pages, 2018.
[6] J. Yaneza, “Brute Force RDP Attacks Plant CRYSIS Ransomware,” https://blog.trendmicro.com/trendlabs-security-intelligence/brute-force-rdp-attacks-plant-crysis-ransomware.
[7] “10% of Ransomware Attacks on SMBs Targeted IoT Devices,” https://www.darkreading.com/application-security/10--of-ransomware-attacks-on-smbs-targeted-iot-devices-/d/d-id/1329817.
[8] Q. Tan, Y. Gao, J. Shi, X. Wang, B. Fang, and Z. H. Tian, “Towards a Comprehensive Insight into the Eclipse Attacks of Tor Hidden Services,” IEEE Internet of Things Journal, 2018.
[9] “Kaspersky Security Bulletin: STORY OF THE YEAR 2017,” 2017, https://media.kasperskycontenthub.com/wp-content/uploads/sites/43/2018/03/07164824/KSB_Story_of_the_Year_Ransomware_FINAL_eng.pdf.
[10] Z. Tian, Y. Cui, L. An et al., “A Real-Time Correlation of Host-Level Events in Cyber Range Service for Smart Campus,” IEEE Access, vol. 6, pp. 35355–35364, 2018.
[11] R. Ross et al., Managing Information Security Risk: Organisation, Mission, and Information System View, National Institute of Standards and Technology, 2011, http://csrc.nist.gov/publications/nistpubs/800-39/SP800-39-final.pdf.
[12] Y. Wang, Z. Tian, H. Zhang, S. Su, and W. Shi, “A Privacy Preserving Scheme for Nearest Neighbor Query,” Sensors, vol. 18, no. 8, p. 2440, 2018.
[13] Z. H. Wang, X. Wu, C. G. Liu, Q. X. Liu, and J. L. Zhang, “RansomTracer: Exploiting Cyber Deception for Ransomware Tracing,” in Proceedings of the IEEE Third International Conference on Data Science in Cyberspace, pp. 227–234, 2018.
[14] B. A. S. Al-rimy, M. A. Maarof, and S. Z. M. Shaid, “Ransomware threat success factors, taxonomy, and countermeasures: A survey and research directions,” Computers & Security, vol. 74, pp. 144–166, 2018.
[15] F. Jiang, Y. Fu, B. B. Gupta et al., “Deep Learning based Multi-channel intelligent attack detection for Data Security,” IEEE Transactions on Sustainable Computing, 2018.
[16] N. Andronio, S. Zanero, and F. Maggi, “HelDroid: dissecting and detecting mobile ransomware,” in Research in Attacks, Intrusions, and Defenses, vol. 9404 of Lecture Notes in Computer Science, pp. 382–404, Springer, 2015.
[17] F. Mercaldo, V. Nardone, A. Santone, and C. A. Visaggio, “Ransomware Steals Your Phone. Formal Methods Rescue It,” in Formal Techniques for Distributed Objects, Components, and Systems: 36th IFIP WG 6.1 International Conference, FORTE 2016, held as part of the 11th International Federated Conference on Distributed Computing Techniques, DisCoTec 2016, E. Albert and I. Lanese, Eds., pp. 212–221, Springer International Publishing, 2016.
[18] D. Sgandurra, L. Muñoz-González, R. Mohsen, and E. C. Lupu, “Automated Dynamic Analysis of Ransomware: Benefits, Limitations and use for Detection,” https://arxiv.org/abs/1609.03020.
[19] S. Song, B. Kim, and S. Lee, “The Effective Ransomware Prevention Technique Using Process Monitoring on Android Platform,” Mobile Information Systems, vol. 2016, Article ID 2946735, 9 pages, 2016.
[20] T. Yang, Y. Yang, K. Qian, D. C.-T. Lo, Y. Qian, and L. Tao, “Automated detection and analysis for android ransomware,” in Proceedings of the 17th IEEE International Conference on High Performance Computing and Communications, IEEE 7th International Symposium on Cyberspace Safety and Security and IEEE 12th International Conference on Embedded Software and Systems, HPCC-ICESS-CSS 2015, pp. 1338–1343, USA, August 2015.
[21] N. Scaife, H. Carter, P. Traynor, and K. R. B. Butler, “CryptoLock (and Drop It): Stopping Ransomware Attacks on User Data,” in Proceedings of the 36th IEEE International Conference on Distributed Computing Systems, ICDCS 2016, pp. 303–312, Japan, June 2016.
[22] M. M. Ahmadian and H. R. Shahriari, “2entFOX: A framework for high survivable ransomwares detection,” in Proceedings of the 13th International ISC Conference on Information Security and Cryptology, ISCISC 2016, pp. 79–84, Iran, September 2016.
[23] M. M. Ahmadian, H. R. Shahriari, and S. M. Ghaffarian, “Connection-monitor & connection-breaker: A novel approach for prevention and detection of high survivable ransomwares,” in Proceedings of the 12th International ISC Conference on Information Security and Cryptology, ISCISC 2015, pp. 79–84, Iran, September 2015.
[24] D. Kim, W. Soh, and S. Kim, “Design of Quantification Model for Prevent of Cryptolocker,” Indian Journal of Science and Technology, vol. 8, no. 19, 2015.
[25] C. Moore, “Detecting Ransomware with Honeypot Techniques,” in Proceedings of the 2016 Cybersecurity and Cyberforensics Conference (CCC), pp. 77–81, Amman, Jordan, August 2016.
[26] F. Mbol, J. Robert, and A. Sadighian, “An Efficient Approach to Detect TorrentLocker Ransomware in Computer Systems,” in Cryptology and Network Security, S. Foresti and G. Persiano, Eds., vol. 10052 of Lecture Notes in Computer Science, pp. 532–541, Springer International Publishing, Cham, 2016.
[27] K. Cabaj, P. Gawkowski, K. Grochowski, and D. Osojca, “Network activity analysis of CryptoWall ransomware,” Przegląd Elektrotechniczny, vol. 91, no. 11, pp. 201–204, 2015.
[28] J. Chen, Z. Tian, X. Cui, L. Yin, and X. Wang, “Trust Architecture and Reputation Evaluation for Internet of Things,” Journal of Ambient Intelligence & Humanized Computing, vol. 2, pp. 1–9, 2018.
[29] C. Le Guernic and A. Legay, “Ransomware and the Legacy Crypto API,” in Risks and Security of Internet and Systems: 11th International Conference, CRiSIS 2016, Roscoff, France, 2017.
[30] L. Pingree, Emerging Technology Analysis: Deception Techniques and Technologies Create Security Technology Business Opportunities, Gartner, 2015.
[31] “Ransomware and Businesses,” https://www.symantec.com/content/dam/symantec/docs/security-center/white-papers/ransomware-and-businesses-16-en.pdf, 2016.
[32] A. Kharraz and E. Kirda, “Redemption: Real-Time Protection Against Ransomware at End-Hosts,” in Proceedings of the International Symposium on Research in Attacks, Intrusions, and Defenses, pp. 98–119, 2017.
[33] F. Bergstrand, J. Bergstrand, and H. Gunnarsson, Localization of Spyware in Windows Environments.
[34] “PEView,” https://www.aldeid.com/wiki/PEView.
[35] “python-pefile,” https://pypi.python.org/pypi/pefile.
[36] M. Lui and T. Baldwin, “langid.py: An off-the-shelf language identification tool,” in Proceedings of the ACL 2012 System Demonstrations, Association for Computational Linguistics, pp. 25–30, 2012.
[37] S. Bird and E. Loper, “NLTK: the natural language toolkit,” in Proceedings of the ACL 2004 on Interactive poster and demonstration sessions, Association for Computational Linguistics, Barcelona, Spain, July 2004.
[38] “Rrenaud Gibberish-Detector,” https://github.com/rrenaud/Gibberish-Detector/blob/master/README.rst.
[39] “VirusTotal,” http://virustotal.com.
Wireless Communications and Mobile Computing 9
Table 1 Traceable strings recognized result [13]
String CommonWordsRecognized
GibberishRecognized
FinalRecognized
Whlowast lowast lowastlowastter True True Truegandcrab True True Truekate z True True Truevwrtjty True False Falseprogram False True False
Identification Result After the above analysis the automatedanalysis system outputs the strings for which both thecommon word tag and the gibberish tag are True which isthe list of traceable clues [13] Table 1 is reproduced fromZHWang et al (2018) [under the Creative Commons AttributionLicensepublic domain]
When the string is a normal word such as ldquoprogramrdquo thecommon words recognition will make it ldquoFalserdquo Gibberishrecognition can recognize the word only with non-randomlygenerated identifier strings Moreover it cannot identifythe words that do not conform to the spelling patterns ofcommon words (eg vwrtjty) The string is considered tohave auxiliary traceability only if both of the analysis valuesare true We rely extensively on string inversion to verify theaccuracy of the system It is found that most of the traceablestrings on the volunteer storage can be recognized Table 1shows the recognition results of several typical strings
6 Evaluation
In this paper we evaluate the proposed method with twoexperiments The goal of the first experiment is to demon-strate that the proposed method can help trace back to theRDP-based ransomware attackers and capture their privateinformation The goal of the second experiment is to demon-strate that the method can also automatically recognize cluesin ransomware samples and help trace back to ransomwaremakers
61 RDP-Based Ransomware Attacker
611 Clue Capture and Analysis We usedWindows 7 systemvirtual machines to deploy the deception environment andassess the effectiveness of itWhileWindows 7 is not requiredit was chosen because of the wide range of applicationsand because it is one of the main targets of ransomwareWe invited 122 professional volunteers to help with theexperiment and provided them with an experimental fleet of12 virtual hosts Most of the volunteersrsquo login informationclipboard content shared folder path and uploaded PE filewere able to be successfully captured by the monitor system
We choose 12 computersrsquo path the information collectedfrom volunteers and record the rate of convergence aftereach step of the analysis Figure 5 shows how the number oftraceable clues remains as each step of the data is processedby the analysis system [13] Figure 5 is reproduced from Z HWang et al (2018) [under the Creative Commons AttributionLicensepublic domain]
Analysis Data Change Figure
Dat
a qua
ntity
Analysis Process
50
40
30
20
10
0
source stopword tokenize gibberish result
volunteer1volunteer2volunteer3volunteer4volunteer5volunteer6
volunteer7volunteer8volunteer9volunteer10volunteer11volunteer12
Figure 5 The data volume changes in 12 volunteers traceable cluesthrough the analysis system The screening rate can reach 9834[13]
62 Traceback Attackers Next we carried out a detailedanalysis of the information left over from one of the volun-teers attacks By understanding the data from the automaticanalysis of an attacker we can infer that the attackersreal identity may have the following characteristics (1) Theattacker is likely to come from China (2) The attacker maybe a security and related personnel (3) The attacker maywork in the Chinese Academy of Sciences and have relationswith a large security manufacturer in China The conclusionis shown in Figure 6 It shows the results from the analysissystem in five sections
Username Through automatic analysis a total of three usernames from the attackers host were extracted from a largenumber of clues in which by using the ldquoDellrdquo user it can beassumed that the attacker was using the Dell host
Programs The automatic analysis system identified severaltypical software programs installed in the attackers hostFor instance ldquoQQrdquo is a widely used real-time social toolin China ldquoAliPayrdquo is an online payment tool developedby Alibaba and widely used in China ldquoSogou inputrdquo isan input method software for Chinese developed by ChinaSogou Company MeiTu is a photo beautification softwaredeveloped by a Chinese company and widely used in ChinaIn addition ldquoSinforrdquo and ldquo360rdquo are both well-known securitymanufacturers in China and have a large number of safety-related products ldquoVisual Studiordquo is a common developmentsoftware It is found thatmuchChinese-made softwarewidely
10 Wireless Communications and Mobile Computing
UserName
KeyboardLayout
Programs
Account
TraceableStrings
353lowastlowastlowast208
AdminDefault
CH-SIMPLI
FIED
AliPayQQ
SogouInput
MeiTu
Sinfor
360
VisualStudio
372lowastlowastlowast582
DELLbotnet
lowastlowaste
lowastlowastt
Alowastlowastlowastteam
Whlowastlowastlowastterfreebuf
visumantrag
lowastlowastlowastlowast
eaccnlowastlowastlowastlowast
taccn
Figure 6 Analyze an attacker using automatic analysis system to display traceable information about the attacker in five aspects host username program software keyboard layout account information and traceable string
used in China was installed on the attackers hosts as well assome development and security products that were installedless often by ordinary users
Keyboard LayoutThrough network communication analysiswe obtain the attackers keyboard layout is Simplified Chi-nese from which it can be inferred that the attackers nativelanguage is probably Chinese This is consistent with the hostinstalled software features
Account Number According to the system automaticallyidentified account number it found two QQ accounts andtwo e-mail accounts The e-mail accounts for the ChineseAcademy of Sciences business accounts can confirm theabove assumption that the attackers from China and it islikely to work in the Chinese Academy of Sciences Throughthe registration information of the QQ account (as shownin Figure 7) we can find the following information (1) theattacker could be a male aged 32 (2) Internet-related worklikely with 360 Security (3) located in Haidian DistrictBeijing China In the ldquo353lowast lowast lowast208rdquo account registrationinformation mail work and other information are emptynicknamed a special English string ldquoWhlowast lowast lowastterrdquo it can bepresumed that the account is likely to be a private use
Traceable Strings The automatic analysis system sifts stringsfrom the monitor that may be traceable ldquoFreebuf rdquo is asecurity information exchange website commonly used byChinese security personnel ldquoBootnetrdquo is a secondary attackmethod that is often used by attackers and is often used by
Figure 7 The registration infographic of QQ account 372lowast lowast lowast82
security researchers as the main research direction ldquolowastlowasterdquoand ldquolowastlowasttrdquo are acronyms for departments under the ChineseAcademy of Sciences while ldquoAlowast lowast lowastteamrdquo is a securityresearch team in the ldquolowastlowasterdquo department ldquoVisumantragrdquo isoften used when processing visas from various countries TheldquoWhlowast lowast lowastterrdquo matches the nickname of the QQ numberldquo353lowastlowastlowast208rdquo account and is likely to be its regular nickname
6.3 Ransomware Maker Automatic Analysis

In order to verify that the analysis system can trace back the ransomware maker, we obtained more than 8000 ransomware samples from VirusTotal [39] and extracted the Program Data Base (PDB) path from the samples (as shown in Figure 8). The result shows that, although a large number of attackers deliberately erase compilation information, legacy PDB path information can still be found in a large number of ransomware samples.

Table 2: Same identifier for different samples [13].
Sample Hash: 68dd613973f8a***7befe0f4b461d258ce569b9020079***9ae0f090266a60cc 2fa0c084aef41969***1f427a1b65e1b1464fa82d297e712***e1fdf312c044814a36da0286d42cd***306fba99239c8238821beaf40744be***ba862dbc6d90f70cbc839bcb88***7422bbd2fa812dd3de02391b6a9f1***0762a6f49cf32501
Same Identity Strings: jqxxpphoaonkje7z

Figure 8: Of the 8075 real ransomware samples, 11% (878) yielded a Program Data Base (PDB) path, while the authors of the remaining 89% (7197) may have chosen to hide the PDB path during compilation.
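The PDB path extraction step can be sketched roughly as follows. This is a minimal illustration, not the tool used in the evaluation: it assumes the sample carries a CodeView PDB 7.0 debug record (the bytes "RSDS", a 16-byte GUID, a 4-byte age, then a NUL-terminated path) and simply scans the raw file bytes for it instead of parsing the PE debug directory. The function name `extract_pdb_path` and the UTF-8 decoding are assumptions made for the example.

```python
import re
from typing import Optional

# CodeView PDB 7.0 record layout: b"RSDS" + 16-byte GUID + 4-byte age,
# followed by the NUL-terminated PDB path. DOTALL lets "." match any byte.
_RSDS_RECORD = re.compile(rb"RSDS.{20}([^\x00]{1,260})\x00", re.DOTALL)


def extract_pdb_path(data: bytes) -> Optional[str]:
    """Return the first PDB path found in raw PE bytes, or None.

    Hypothetical helper: a byte-level heuristic, so a stray "RSDS"
    sequence elsewhere in the file could produce a false positive.
    """
    match = _RSDS_RECORD.search(data)
    if match is None:
        return None
    # Assumption: the path is UTF-8; undecodable bytes are replaced.
    return match.group(1).decode("utf-8", errors="replace")
```

A sample for which the function returns `None` would fall into the "without a PDB path" group of Figure 8. A production pipeline would more likely walk the PE debug directory with a PE parser such as python-pefile [35].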
What is more, the analysis system was used to automatically analyze the PDB paths of more than 800 different ransomware samples. We found multiple identity strings shared across the analysis results of different samples. One typical result is shown in Table 2 [13]. Table 2 is reproduced from Z. H. Wang et al. (2018) [under the Creative Commons Attribution License/public domain]. We used VirusTotal to validate the samples and found that they were all Cobra family ransomware. As a result, it can be assumed that these samples were made by the same ransomware maker. When one sample is traced back, the conclusion about its maker carries over to the other samples. When different traceable information is extracted from different samples, the identity of the ransomware maker can be described by integrating that information.
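One way to surface identity strings shared by several samples is to tokenize each PDB path and index the tokens by sample. The sketch below is illustrative only: the tokenization rule, the minimum token length, and the names `path_tokens` and `shared_identifiers` are assumptions, and the paths in the usage note are invented rather than taken from the data set.

```python
import re
from collections import defaultdict


def path_tokens(pdb_path, min_len=4):
    """Split a PDB path on separators and keep lowercased tokens of useful length."""
    parts = re.split(r"[\\/:.\s]+", pdb_path)
    return {p.lower() for p in parts if len(p) >= min_len}


def shared_identifiers(paths_by_hash, min_samples=2):
    """Map each token to the sorted sample hashes whose PDB path contains it.

    Only tokens seen in at least `min_samples` samples are returned.
    """
    samples_by_token = defaultdict(set)
    for sample_hash, pdb_path in paths_by_hash.items():
        for token in path_tokens(pdb_path):
            samples_by_token[token].add(sample_hash)
    return {
        token: sorted(hashes)
        for token, hashes in samples_by_token.items()
        if len(hashes) >= min_samples
    }
```

For example, two hypothetical samples whose paths both contain the directory `jqxxpphoaonkje7z` would be grouped under that token. In practice common build words such as "release" or "debug" would also be shared, so a stop list would probably be needed before treating a token as an identity string.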
Language Identification. When we apply language recognition to the PDB paths of the 878 samples, the result is as shown in Figure 9. The system automatically identifies the language of each path; the paths mainly contain 11 languages, of which English accounts for the largest proportion.
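The shares reported with Figure 9 follow directly from the per-language counts. As a quick check, the snippet below recomputes them from the counts given in the figure caption; the variable names are ours.

```python
# Per-language counts of PDB paths, as listed in the Figure 9 caption.
counts = {
    "English": 510, "Chinese": 31, "Dutch": 40, "Norwegian Nynorsk": 4,
    "Slovenian": 5, "Norwegian Bokmal": 1, "German": 58, "Italian": 3,
    "Hungarian": 35, "Maltese": 18, "Polish": 5, "Finnish": 151,
    "Luxembourgish": 3, "Danish": 10, "Spanish": 4,
}

total = sum(counts.values())

# Percentage share of each language, rounded to two decimals.
shares = {lang: round(100 * n / total, 2) for lang, n in counts.items()}

for lang, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{lang}: {counts[lang]} ({share}%)")
```

The counts sum to the 878 samples with a PDB path, and English comes out at 58.09%.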
Figure 9: Of the 878 ransomware sample PDB paths, English accounted for 510 (58.09%), and the remaining detected languages were Chinese 31 (3.53%), Dutch 40 (4.56%), Norwegian Nynorsk 4 (0.46%), Slovenian 5 (0.57%), Norwegian Bokmal 1 (0.11%), German 58 (6.61%), Italian 3 (0.34%), Hungarian 35 (3.99%), Maltese 18 (2.05%), Polish 5 (0.57%), Finnish 151 (17.20%), Luxembourgish 3 (0.34%), Danish 10 (1.14%), and Spanish 4 (0.46%).

Actual Case Analysis. The automatic identification system helped us identify the traceable strings in the PDB path. Take as an example a sample PDB path (the Chinese has been translated into English) with an MD5 value of “60c6a92afb***6d0c16f7”:

“ELIAOSTUDIOUACsimulated an improved version of the 360 anti-virus program 1117 Bale1*41**2****0(**1) 9818 programWriteSystem32ReleaseVirtualDesktoppdb”
We can infer from the automatic analysis result that the ransomware maker is likely from China. The following evidence supports this inference:

(1) Language Identification. Identifying the language of the path suggests that the attacker may have come from China. Reading the Chinese strings in the path, we find that the text describes an improvement intended to bypass detection by 360 security software, which has a large market share in China.

(2) IP Address. The system identified an IP address in the PDB path. With the help of a threat intelligence database, we found that the IP address came from Xinjiang, China (Figure 10).
Figure 10: Partial infographic of IP 1*41**2****0 in the threat intelligence library.
(3) Traceable Strings. The extracted identifier string “LIAO” resembles Chinese pinyin and is most likely a Chinese surname. To sum up, we speculate that the ransomware maker is most likely from China.
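The IP address clue in step (2) could be located with a pattern match followed by validation. The sketch below is an assumption about how such a step might look, not the system's actual implementation; `find_ipv4` is a hypothetical name, and the example path in the usage note uses a documentation address (RFC 5737) rather than the masked address from the case. The geolocation lookup in a threat intelligence database is not shown.

```python
import ipaddress
import re

# Four dot-separated groups of 1-3 digits, not embedded in a longer
# dotted number (so "1.2.3.4.5" and "1.2.3" are both ignored).
_IPV4_CANDIDATE = re.compile(r"(?<![\d.])(\d{1,3}(?:\.\d{1,3}){3})(?![\d.])")


def find_ipv4(text):
    """Return the valid IPv4 addresses that appear in `text`, in order."""
    found = []
    for candidate in _IPV4_CANDIDATE.findall(text):
        try:
            # Rejects out-of-range octets such as 999.
            ipaddress.IPv4Address(candidate)
        except ValueError:
            continue
        found.append(candidate)
    return found
```

For instance, `find_ipv4(r"C:\dev\build 203.0.113.7\Release\tool.pdb")` yields the single embedded address, while version-like strings such as `1.2.3` are skipped.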
7 Conclusion
In this paper, we focus on tracing back RDP-based ransomware. Recently, more and more ransomware attackers have been using RDP attacks to spread ransomware with impunity, owing to the use of strong cryptography. We propose tracing back the attack sources to deter RDP-based ransomware with the proposed deception environment. It collects traceable clues and performs automatic analysis using natural language processing and machine learning techniques. The evaluation shows that it is able to trap attackers and collect the traceable clues they leave in a deception environment. With automatic clue identification, it can narrow the collected clues down to about 2%. We use this method to analyze two practical cases and show its effectiveness. By tracing back to ransomware attackers, we can provide a strong deterrent to stifle the development of ransomware.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Additional Points
This manuscript adds deception environment monitors and analytical methods and uses more volunteers and samples. It demonstrates the validity of the methods through specific traceback cases (i.e., tracing back the ransomware attacker and tracing back the ransomware maker).
Disclosure
An earlier version of this paper was presented at the IEEE Third International Conference on Data Science in Cyberspace, Guangzhou, China, 18-21 June 2018. The authors' initial conference paper focused mainly on the effectiveness of the deception environment monitors and the screening rate of the analysis system.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
The authors would like to thank VirusTotal for providing ransomware samples and the volunteers from the University of Chinese Academy of Sciences. This work is supported in part by the National Key Research and Development Plan (Grant no. 2018YFB0803504) and the National Natural Science Foundation of China (61871140, 61572153, U1636215, 61572492, and 61672020).
References
[1] A. Kharraz, “UNVEIL: A Large-Scale, Automated Approach to Detecting Ransomware,” in Proceedings of the 25th USENIX Security Symposium (USENIX Security 16), pp. 757–772, USENIX Association, 2016.
[2] K. Savage, P. Coogan, and H. Lau, “The evolution of ransomware,” Tech. Rep., 2015.
[3] A. Kharraz, W. Robertson, D. Balzarotti, L. Bilge, and E. Kirda, “Cutting the Gordian Knot: A Look Under the Hood of Ransomware Attacks,” in Detection of Intrusions and Malware, and Vulnerability Assessment, vol. 9148 of Lecture Notes in Computer Science, pp. 3–24, Springer International Publishing, Cham, 2015.
[4] S. Mohurle and M. Patil, “A brief study of Wannacry Threat: Ransomware Attack 2017,” International Journal of Advanced Research in Computer Science, vol. 8, no. 5, 2017.
[5] X. Yu, Z. Tian, J. Qiu, and F. Jiang, “A Data Leakage Prevention Method Based on the Reduction of Confidential and Context Terms for Smart Mobile Devices,” Wireless Communications and Mobile Computing, vol. 2018, Article ID 5823439, 11 pages, 2018.
[6] J. Yaneza, “Brute Force RDP Attacks Plant CRYSIS Ransomware,” https://blog.trendmicro.com/trendlabs-security-intelligence/brute-force-rdp-attacks-plant-crysis-ransomware/.
[7] “10% of Ransomware Attacks on SMBs Targeted IoT Devices,” https://www.darkreading.com/application-security/10--of-ransomware-attacks-on-smbs-targeted-iot-devices-/d/d-id/1329817.
[8] Q. Tan, Y. Gao, J. Shi, X. Wang, B. Fang, and Z. H. Tian, “Towards a Comprehensive Insight into the Eclipse Attacks of Tor Hidden Services,” IEEE Internet of Things Journal, 2018.
[9] “Kaspersky Security Bulletin: STORY OF THE YEAR 2017,” 2017, https://media.kasperskycontenthub.com/wp-content/uploads/sites/43/2018/03/07164824/KSB_Story_of_the_Year_Ransomware_FINAL_eng.pdf.
[10] Z. Tian, Y. Cui, L. An et al., “A Real-Time Correlation of Host-Level Events in Cyber Range Service for Smart Campus,” IEEE Access, vol. 6, pp. 35355–35364, 2018.
[11] R. Ross et al., Managing Information Security Risk: Organisation, Mission, and Information System View, National Institute of Standards and Technology, 2011, http://csrc.nist.gov/publications/nistpubs/800-39/SP800-39-final.pdf.
[12] Y. Wang, Z. Tian, H. Zhang, S. Su, and W. Shi, “A Privacy Preserving Scheme for Nearest Neighbor Query,” Sensors, vol. 18, no. 8, p. 2440, 2018.
[13] Z. H. Wang, X. Wu, C. G. Liu, Q. X. Liu, and J. L. Zhang, “RansomTracer: Exploiting Cyber Deception for Ransomware Tracing,” in Proceedings of the IEEE Third International Conference on Data Science in Cyberspace, pp. 227–234, 2018.
[14] B. A. S. Al-rimy, M. A. Maarof, and S. Z. M. Shaid, “Ransomware threat success factors, taxonomy, and countermeasures: A survey and research directions,” Computers & Security, vol. 74, pp. 144–166, 2018.
[15] F. Jiang, Y. Fu, B. B. Gupta et al., “Deep Learning based Multi-channel intelligent attack detection for Data Security,” IEEE Transactions on Sustainable Computing, 2018.
[16] N. Andronio, S. Zanero, and F. Maggi, “HelDroid: dissecting and detecting mobile ransomware,” in Research in Attacks, Intrusions, and Defenses, vol. 9404 of Lecture Notes in Computer Science, pp. 382–404, Springer, 2015.
[17] F. Mercaldo, V. Nardone, A. Santone, and C. A. Visaggio, “Ransomware Steals Your Phone. Formal Methods Rescue It,” in Formal Techniques for Distributed Objects, Components, and Systems: 36th IFIP WG 6.1 International Conference, FORTE 2016, held as part of the 11th International Federated Conference on Distributed Computing Techniques, DisCoTec 2016, E. Albert and I. Lanese, Eds., pp. 212–221, Springer International Publishing, 2016.
[18] D. Sgandurra, L. Muñoz-González, R. Mohsen, and E. C. Lupu, “Automated Dynamic Analysis of Ransomware: Benefits, Limitations and use for Detection,” https://arxiv.org/abs/1609.03020.
[19] S. Song, B. Kim, and S. Lee, “The Effective Ransomware Prevention Technique Using Process Monitoring on Android Platform,” Mobile Information Systems, vol. 2016, Article ID 2946735, 9 pages, 2016.
[20] T. Yang, Y. Yang, K. Qian, D. C.-T. Lo, Y. Qian, and L. Tao, “Automated detection and analysis for android ransomware,” in Proceedings of the 17th IEEE International Conference on High Performance Computing and Communications, IEEE 7th International Symposium on Cyberspace Safety and Security and IEEE 12th International Conference on Embedded Software and Systems, HPCC-ICESS-CSS 2015, pp. 1338–1343, USA, August 2015.
[21] N. Scaife, H. Carter, P. Traynor, and K. R. B. Butler, “CryptoLock (and Drop It): Stopping Ransomware Attacks on User Data,” in Proceedings of the 36th IEEE International Conference on Distributed Computing Systems, ICDCS 2016, pp. 303–312, Japan, June 2016.
[22] M. M. Ahmadian and H. R. Shahriari, “2entFOX: A framework for high survivable ransomwares detection,” in Proceedings of the 13th International ISC Conference on Information Security and Cryptology, ISCISC 2016, pp. 79–84, Iran, September 2016.
[23] M. M. Ahmadian, H. R. Shahriari, and S. M. Ghaffarian, “Connection-monitor & connection-breaker: A novel approach for prevention and detection of high survivable ransomwares,” in Proceedings of the 12th International ISC Conference on Information Security and Cryptology, ISCISC 2015, pp. 79–84, Iran, September 2015.
[24] D. Kim, W. Soh, and S. Kim, “Design of Quantification Model for Prevent of Cryptolocker,” Indian Journal of Science and Technology, vol. 8, no. 19, 2015.
[25] C. Moore, “Detecting Ransomware with Honeypot Techniques,” in Proceedings of the 2016 Cybersecurity and Cyberforensics Conference (CCC), pp. 77–81, Amman, Jordan, August 2016.
[26] F. Mbol, J. Robert, and A. Sadighian, “An Efficient Approach to Detect TorrentLocker Ransomware in Computer Systems,” in Cryptology and Network Security, S. Foresti and G. Persiano, Eds., vol. 10052 of Lecture Notes in Computer Science, pp. 532–541, Springer International Publishing, Cham, 2016.
[27] K. Cabaj, P. Gawkowski, K. Grochowski, and D. Osojca, “Network activity analysis of CryptoWall ransomware,” Przegląd Elektrotechniczny, vol. 91, no. 11, pp. 201–204, 2015.
[28] J. Chen, Z. Tian, X. Cui, L. Yin, and X. Wang, “Trust Architecture and Reputation Evaluation for Internet of Things,” Journal of Ambient Intelligence & Humanized Computing, vol. 2, pp. 1–9, 2018.
[29] C. Le Guernic and A. Legay, “Ransomware and the Legacy Crypto API. Paper presented at the Risks and,” in Risks and Security of Internet and Systems: 11th International Conference, CRiSIS 2016, Roscoff, France, 2017.
[30] L. Pingree, Emerging Technology Analysis: Deception Techniques and Technologies Create Security Technology Business Opportunities, Gartner, 2015.
[31] “Ransomware and Businesses,” https://www.symantec.com/content/dam/symantec/docs/security-center/white-papers/ransomware-and-businesses-16-en.pdf, 2016.
[32] A. Kharraz and E. Kirda, “Redemption: Real-Time Protection Against Ransomware at End-Hosts,” in Proceedings of the International Symposium on Research in Attacks, Intrusions, and Defenses, pp. 98–119, 2017.
[33] F. Bergstrand, J. Bergstrand, and H. Gunnarsson, Localization of Spyware in Windows Environments.
[34] “PEView,” https://www.aldeid.com/wiki/PEView.
[35] “python-pefile,” https://pypi.python.org/pypi/pefile.
[36] M. Lui and T. Baldwin, “langid.py: An off-the-shelf language identification tool,” in Proceedings of the ACL 2012 System Demonstrations, Association for Computational Linguistics, pp. 25–30, 2012.
[37] S. Bird and E. Loper, “NLTK: the natural language toolkit,” in Proceedings of the ACL 2004 on Interactive poster and demonstration sessions, Association for Computational Linguistics, Barcelona, Spain, July 2004.
[38] “Rrenaud, Gibberish-Detector,” https://github.com/rrenaud/Gibberish-Detector/blob/master/README.rst.
[39] “VirusTotal,” http://virustotal.com.
Figure 6: Analysis of an attacker by the automatic analysis system, which displays traceable information about the attacker in five aspects: host user name (UserName), installed programs (Programs), keyboard layout (KeyboardLayout), account information (Account), and traceable strings (TraceableStrings).
used in China was installed on the attacker's host, as well as some development and security products that ordinary users install less often.

Keyboard Layout. Through network communication analysis, we found that the attacker's keyboard layout is Simplified Chinese, from which it can be inferred that the attacker's native language is probably Chinese. This is consistent with the software installed on the host.
Account Number According to the system automaticallyidentified account number it found two QQ accounts andtwo e-mail accounts The e-mail accounts for the ChineseAcademy of Sciences business accounts can confirm theabove assumption that the attackers from China and it islikely to work in the Chinese Academy of Sciences Throughthe registration information of the QQ account (as shownin Figure 7) we can find the following information (1) theattacker could be a male aged 32 (2) Internet-related worklikely with 360 Security (3) located in Haidian DistrictBeijing China In the ldquo353lowast lowast lowast208rdquo account registrationinformation mail work and other information are emptynicknamed a special English string ldquoWhlowast lowast lowastterrdquo it can bepresumed that the account is likely to be a private use
Traceable Strings The automatic analysis system sifts stringsfrom the monitor that may be traceable ldquoFreebuf rdquo is asecurity information exchange website commonly used byChinese security personnel ldquoBootnetrdquo is a secondary attackmethod that is often used by attackers and is often used by
Figure 7 The registration infographic of QQ account 372lowast lowast lowast82
security researchers as the main research direction ldquolowastlowasterdquoand ldquolowastlowasttrdquo are acronyms for departments under the ChineseAcademy of Sciences while ldquoAlowast lowast lowastteamrdquo is a securityresearch team in the ldquolowastlowasterdquo department ldquoVisumantragrdquo isoften used when processing visas from various countries TheldquoWhlowast lowast lowastterrdquo matches the nickname of the QQ numberldquo353lowastlowastlowast208rdquo account and is likely to be its regular nickname
63 Ransomware Maker Automatic Analysis In order toverify that the analysis system is available in tracing back theransomwaremaker we obtainedmore than 8000 ransomware
Wireless Communications and Mobile Computing 11
Table 2 Same identifier for different samples [13]
Sample Hash Same Identity Strings68dd613973f8alowast lowast lowast7befe0f4b461d258ce569b9020079lowast lowast lowast9ae0f090266a60cc
jqxxpphoaonkje7z2fa0c084aef41969lowast lowast lowast1f427a1b65e1b1464fa82d297e712lowast lowast lowaste1fdf312c044814a36da0286d42cdlowast lowast lowast306fba99239c8238821beaf40744belowast lowast lowastba862dbc6d90f70cbc839bcb88lowast lowast lowast7422bbd2fa812dd3de02391b6a9f1lowast lowast lowast0762a6f49cf32501
8787197
0 100908070605040302010
number
Ransomware without a PDB pathRansomware with PDB path
Figure 8 Of the 8075 real ransomware samples 11 were able toextract the Program Data Base (PDB) path while the remaining89 of authors may have chosen to hide the PDB path duringcompilation
samples fromVirusTotal [39] and extracted the ProgramDataBase (PDB) path from the samples (as shown in Figure 8)It is shows that despite the large number of attackersdeliberate erasure of the compiled information legacy PDBpath information can still be found from a large number ofransomware samples
What ismore the analysis system is used to automaticallyanalyze more than 800 different ransomware samplesrsquo PDBpath We found multiple identity strings in the analysisresults of different samples One of the typical results isshown in Table 2 [13] Table 2 is reproduced from Z HWang et al (2018) [under the Creative Commons AttributionLicensepublic domain] We used VirusTotal to validatethe samples and found that they were all Cobra familyransomware As a result it could be assumed that thesesamples are made by the same ransomware maker Whenone sample is traced back other samples can also arrive atthe conclusion of the ransomware maker When differenttraceable information is analyzed from different samplesthe identity information of the ransomware maker can bedescribed by integrating the information
Language Identification When we carry on the languagerecognition to the PDB path of 878 samples the result is asFigure 9 The system automatically identifies the path whichmainly contains 11 languages of which English accounts forthe largest proportion
Actual Case Analysis The automatic identification systemhelped us identify the traceable strings in the PDB path Takethe example of a sample PDB path (the Chinese has been
353
5809
456
046
057
011
661034
399205057
1720
034114
046
zhennlnnslnbdeit
humtplfietdaes
Figure 9 Of the 878 ransomware sample PDB paths Englishaccounted for 510 and the remaining monitored languageswere Chinese 31(353) Dutch 40(456) Norwegian Nynorsk4(046) Slovenian 5(057) Norwegian Bokmal 1(011) Ger-man 58(661) Italian 3(034) Hungarian 35(399) Maltese18(205) Polish 5(057) Finnish 151(1720) Luxembourgish3(034) Danish 10(114) and Spanish 4(046)
translated into English) with anMD5 value of ldquo60c6a92afblowastlowastlowast6d0c16f7rdquo
ldquoELIAOSTUDIOUACsimulated an improved ver-sion of the 360 anti-virus program 1117 Bale1lowast41lowastlowast2lowastlowastlowastlowast0(lowastlowast1) 9818 programWriteSystem32ReleaseVirtualDesktoppdbrdquo
We can infer from the automatic analysis result that theransomware maker is likely from China We can prove thispoint by the following
(1) Language Identification This is done by identifyingthe path language which can be used to speculate that theattacker may have come from China By understanding theChinese strings in the path we find that the meaning of thistext is to improve to bypass the detection of 360 securitysoftware that has a large market in China
(2) IP address The system identified an IP address in thePDB path With the help of threat intelligence database it isfound that the IP came from Xinjiang China (Figure 10)
12 Wireless Communications and Mobile Computing
Figure 10 Partial infographic of IP 1lowast41lowastlowast2lowastlowastlowastlowast0 in the threatintelligence library
(3) Traceable Strings The extracted identifier stringldquoLIAOrdquo resembles a Chinese pinyin and is most likely a Chi-nese surname To sum up we speculate that the ransomwaremaker is most likely from China
7 Conclusion
In this paper we focus on tracing back RDP-based ran-somware Recently more and more ransomware attackers areusing RDP attacks to spread ransomware with impunity dueto the use of strong cryptography We propose tracing backthe attack sources to deter RDP-based ransomware with theproposed deception environment It collects traceable cluesand performs automatic analysis by using natural languageprocessing and machine learning techniques The evaluationshows that it is able to trap attackers and collect traceableclues left by attackers in a deception environment Withautomatic clue identification it can converge the amount oftraceable clues to about 2 We use this method to analyzetwo practical cases and show its effectiveness By tracing backto ransomware attackers we can provide a strong deterrent tostifle the development of ransomware
Data Availability
The data used to support the findings of this study areavailable from the corresponding author upon request
Additional Points
This manuscript provides additional deception environmentmonitors and analytical methods by using more volunteersand samples It proves the validity of the methods throughspecific traceback cases (ie traceback the ransomwareattacker and traceback the ransomware maker)
Disclosure
An earlier version of this paper was presented at the Interna-tional Conference IEEE Third International Conference onData Science in Cyberspace Guangzhou China 18-21 June2018The authorsrsquo initial conference paper focused mainly onthe effectiveness of deception environment monitors and thescreening rate of the analysis system
Conflicts of Interest
The authors declare that they have no conflicts of interest
Acknowledgments
The authors would like to thank VirusTotal for providingransomware samples and the volunteers from the Universityof Chinese Academy of Sciences This work is supportedin part by the National Key research and DevelopmentPlan (Grant no 2018YFB0803504) and the National NaturalScience Foundation of China (61871140 61572153 U163621561572492 and 61672020)
References
[1] A Kharraz ldquoUNVEIL A Large-Scale Automated Approachto Detecting Ransomwarerdquo in Proceedings of the 25th USENIXSecurity Symposium (USENIX Security 16) pp 757ndash772USENIX Association 2016
[2] K Savage P Coogan and H Lau ldquoThe evolution of ran-somwarerdquo Tech Rep 2015
[3] A Kharraz W Robertson D Balzarotti L Bilge and EKirda ldquoCutting the Gordian Knot A Look Under the Hood ofRansomware Attacksrdquo in Detection of Intrusions and Malwareand Vulnerability Assessment vol 9148 of Lecture Notes inComputer Science pp 3ndash24 Springer International PublishingCham 2015
[4] S Mohurle and M Patil ldquoA brief study of Wannacry ThreatRansomware Attack 2017rdquo International Journal of AdvancedResearch in Computer Science vol 8 no 5 2017
[5] X Yu Z Tian J Qiu and F Jiang ldquoA Data Leakage PreventionMethod Based on the Reduction of Confidential and ContextTerms for SmartMobileDevicesrdquoWireless Communications andMobile Computing vol 2018 Article ID 5823439 11 pages 2018
[6] J Yaneza ldquoBrute Force RDP Attacks Plant CRYSIS Ransom-warerdquo httpsblogtrendmicrocomtrendlabs-security-intelli-gencebrute-force-rdp-attacks-plant-crysis-ransomware
[7] ldquo10 of Ransomware Attacks on SMBs Targeted IoT Devicesrdquohttpswwwdarkreadingcomapplication-security10ndashof-ran-somware-attacks-on-smbs-targeted-iot-devices-dd-id1329817
[8] Q Tan Y Gao J Shi X Wang B Fang and Z H TianldquoTowards a Comprehensive Insight into the Eclipse Attacks ofTor Hidden Servicesrdquo IEEE Internet of Things Journal 2018
[9] ldquoKaspersky Security Bulletin STORY OF THE YEAR 2017rdquo2017 httpsmediakasperskycontenthubcomwp-contentup-loadssites4320180307164824KSB Story of the Year Ran-somware FINAL engpdf
[10] Z Tian Y Cui L An et al ldquoA Real-Time Correlation of Host-Level Events in Cyber Range Service for Smart Campusrdquo IEEEAccess vol 6 pp 35355ndash35364 2018
[11] R Ross et al Managing Information Security Risk Organi-sation Mission and Information System View National Insti-tute of Standards and Technology 2011 httpcsrcnistgovpublicationsnistpubs800-39SP800-39-finalpdf
[12] Y Wang Z Tian H Zhang S Su and W Shi ldquoA PrivacyPreserving Scheme for Nearest Neighbor Queryrdquo Sensors vol18 no 8 p 2440 2018
[13] Z H Wang X Wu C G Liu Q X Liu and J L ZhangldquoRansomTracer Exploiting Cyber Deception for RansomwareTracingrdquo in Proceedings of the IEEEThird International Confer-ence on Data Science in Cyberspace pp 227ndash234 2018
[14] B A S Al-rimyMAMaarof and S ZM Shaid ldquoRansomwarethreat success factors taxonomy and countermeasures A
Wireless Communications and Mobile Computing 13
survey and research directionsrdquo Computers amp Security vol 74pp 144ndash166 2018
[15] F Jiang Y Fu B B Gupta et al ldquoDeep Learning based Multi-channel intelligent attack detection for Data Securityrdquo IEEETransactions on Sustainable Computing 2018
[16] N Andronio S Zanero and F Maggi ldquoHelDroid dissectingand detecting mobile ransomwarerdquo in Research in AttacksIntrusions and Defenses vol 9404 of Lecture Notes in ComputerScience pp 382ndash404 Springer 2015
[17] F Mercaldo V Nardone A Santone and C A VisaggioldquoRansomware Steals Your Phone Formal Methods Rescue Itrdquoin Formal Techniques For Distributed Objects Components AndSystems 36th IFIPWG 61 International Conference FORTE2016 held as part of the 11th International Federated ConferenceOn Distributed Computing Techniques DisCoTec 2016 E Albertand I Lanese Eds pp 212ndash221 Springer International Publish-ing 2016
[18] D Sgandurra L Muplusmnoz-Gonzszliglez R Mohsen and E C LupuldquoAutomated Dynamic Analysis of Ransomware Benefits Limi-tations and use for Detectionrdquo httpsarxivorgabs160903020
[19] S Song B Kim and S Lee ldquoThe Effective RansomwarePrevention Technique Using Process Monitoring on AndroidPlatformrdquo Mobile Information Systems vol 2016 Article ID2946735 9 pages 2016
[20] T Yang Y Yang K Qian D C-T Lo Y Qian and L TaoldquoAutomated detection and analysis for android ransomwarerdquoin Proceedings of the 17th IEEE International Conference onHigh Performance Computing and Communications IEEE 7thInternational Symposium on Cyberspace Safety and Security andIEEE 12th International Conference on Embedded Software andSystems HPCC-ICESS-CSS 2015 pp 1338ndash1343 USA August2015
[21] N Scaife H Carter P Traynor andK R B Butler ldquoCryptoLock(and Drop It) Stopping Ransomware Attacks on User Datardquo inProceedings of the 36th IEEE International Conference on Dis-tributed Computing Systems ICDCS 2016 pp 303ndash312 JapanJune 2016
[22] M M Ahmadian and H R Shahriari ldquo2entFOX A frameworkfor high survivable ransomwares detectionrdquo in Proceedings ofthe 13th International ISC Conference on Information Securityand Cryptology ISCISC 2016 pp 79ndash84 Iran September 2016
[23] M M Ahmadian H R Shahriari and S M GhaffarianldquoConnection-monitoramp connection-breakerA novel approachfor prevention and detection of high survivable ransomwaresrdquoin Proceedings of the 12th International ISC Conference onInformation Security and Cryptology ISCISC 2015 pp 79ndash84Iran September 2015
[24] D Kim W Soh and S Kim ldquoDesign of Quantification Modelfor Prevent of Cryptolockerrdquo Indian Journal of Science andTechnology vol 8 no 19 2015
[25] CMoore ldquoDetectingRansomwarewithHoneypotTechniquesrdquoin Proceedings of the 2016 Cybersecurity and CyberforensicsConference (CCC) pp 77ndash81 Amman Jordan August 2016
[26] F Mbol J Robert and A Sadighian ldquoAn Efficient Approachto Detect TorrentLocker Ransomware in Computer Systemsrdquoin Cryptology and Network Security S Foresti and G PersianoEds vol 10052 of Lecture Notes in Computer Science pp 532ndash541 Springer International Publishing Cham 2016
[27] K Cabaj P Gawkowski K Grochowski and D Osojca ldquoNet-work activity analysis of CryptoWall ransomwarerdquo PrzeglądElektrotechniczny vol 91 no 11 pp 201ndash204 2015
[28] J Chen Z Tian X Cui L Yin and X Wang ldquoTrust Architec-ture and Reputation Evaluation for Internet of Thingsrdquo Journalof Ambient Intelligence amp Humanized Computing vol 2 pp 1ndash92018
[29] C Le Guernic and A Legay ldquoRansomware and the LegacyCrypto API Paper presented at the Risks andrdquo in Risks andSecurity of Internet and Systems 11th International ConferenceCRiSIS 2016 Roscoff France 2017
Wireless Communications and Mobile Computing 11
Table 2: Same identifier for different samples [13].
Sample Hash    Same Identity Strings
68dd613973f8a***7befe0f4b461d258ce569b9020079***9ae0f090266a60cc
jqxxpphoaonkje7z2fa0c084aef41969***1f427a1b65e1b1464fa82d297e712***e1fdf312c044814a36da0286d42cd***306fba99239c8238821beaf40744be***ba862dbc6d90f70cbc839bcb88***7422bbd2fa812dd3de02391b6a9f1***0762a6f49cf32501
[Figure 8: bar chart of sample counts. Ransomware with PDB path: 878; ransomware without a PDB path: 7197.]
Figure 8: Of the 8075 real ransomware samples, the Program Data Base (PDB) path could be extracted from 11%, while the authors of the remaining 89% may have chosen to hide the PDB path during compilation.
samples from VirusTotal [39] and extracted the Program Data Base (PDB) path from the samples (as shown in Figure 8). The result shows that, although a large number of attackers deliberately erase compilation information, legacy PDB path information can still be found in a large number of ransomware samples.
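The extraction step can be illustrated with a minimal sketch. It is not the authors' implementation: the paper cites python-pefile [35] for PE parsing, whereas this sketch uses only the Python standard library and assumes the sample carries a CodeView "RSDS" debug record (the 4-byte signature, a 16-byte GUID, a 4-byte age, then a null-terminated PDB path). The function name extract_pdb_path is hypothetical, and a raw byte scan of this kind can in principle hit a spurious "RSDS" sequence, so a real pipeline would locate the record through the PE debug directory instead.

```python
from typing import Optional

RSDS_SIGNATURE = b"RSDS"  # CodeView PDB 7.0 record signature


def extract_pdb_path(data: bytes) -> Optional[str]:
    """Return the first PDB path found in a CodeView RSDS record, or None.

    Assumed record layout: b"RSDS" + 16-byte GUID + 4-byte age
    + null-terminated path.
    """
    offset = data.find(RSDS_SIGNATURE)
    while offset != -1:
        path_start = offset + 4 + 16 + 4  # skip signature, GUID, and age
        path_end = data.find(b"\x00", path_start)
        if path_end > path_start:
            # Paths may be stored in a local code page; undecodable bytes
            # are replaced rather than raising.
            text = data[path_start:path_end].decode("utf-8", errors="replace")
            if text.lower().endswith(".pdb"):
                return text
        offset = data.find(RSDS_SIGNATURE, offset + 1)
    return None
```

Samples compiled with debug information stripped carry no such record, in which case the function returns None; this corresponds to the "without a PDB path" group in Figure 8.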
What is more, the analysis system was used to automatically analyze the PDB paths of more than 800 different ransomware samples. We found multiple identity strings shared across the analysis results of different samples. One typical result is shown in Table 2 [13]. Table 2 is reproduced from Z. H. Wang et al. (2018) [under the Creative Commons Attribution License/public domain]. We used VirusTotal to validate the samples and found that they were all Cobra family ransomware. As a result, it can be assumed that these samples were made by the same ransomware maker. When one sample is traced back, the other samples lead to the same ransomware maker. When different traceable information is extracted from different samples, it can be integrated to describe the identity of the ransomware maker.
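The idea of linking samples through a shared identity string can be sketched as an inverted index from each extracted string to the samples that contain it. This is only an illustration under stated assumptions: the input mapping, the function name group_by_shared_identifier, and the sample data in the usage note are hypothetical and do not come from the paper's analysis system.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_shared_identifier(
    sample_strings: Dict[str, Iterable[str]]
) -> Dict[str, List[str]]:
    """Map each identity string to the samples sharing it.

    sample_strings maps a sample hash to the identity strings extracted
    from that sample. Only strings seen in two or more samples are kept,
    since a string unique to one sample links nothing.
    """
    index = defaultdict(set)
    for sample_hash, strings in sample_strings.items():
        for identity in strings:
            index[identity].add(sample_hash)
    return {
        identity: sorted(hashes)
        for identity, hashes in index.items()
        if len(hashes) > 1
    }
```

For example, given {"hash_a": {"tokenX", "foo"}, "hash_b": {"tokenX"}}, the function returns {"tokenX": ["hash_a", "hash_b"]}, flagging the two samples as candidates for a common maker.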
Language Identification. When we applied language identification to the PDB paths of the 878 samples, we obtained the result shown in Figure 9. The system automatically identified the language of each path; the paths mainly contain 11 languages, of which English accounts for the largest proportion.
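The tally behind such a distribution can be sketched as follows. The paper cites langid.py [36], whose classify function returns a (language, score) pair; the sketch assumes any classifier of that shape can be plugged in. To stay self-contained it ships a toy stand-in, toy_classify, that only separates paths containing CJK ideographs from everything else. Both function names are hypothetical, and the toy classifier is not a substitute for a real language identifier.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Tuple


def toy_classify(text: str) -> Tuple[str, float]:
    """Stand-in classifier: 'zh' if any CJK Unified Ideograph occurs, else 'en'."""
    has_cjk = any("\u4e00" <= ch <= "\u9fff" for ch in text)
    return ("zh", 1.0) if has_cjk else ("en", 1.0)


def language_distribution(
    paths: Iterable[str],
    classify: Callable[[str], Tuple[str, float]] = toy_classify,
) -> Dict[str, Tuple[int, float]]:
    """Return {language: (count, percentage)} over the given PDB paths."""
    counts = Counter(classify(path)[0] for path in paths)
    total = sum(counts.values())
    return {
        lang: (n, round(100.0 * n / total, 2))
        for lang, n in counts.items()
    }
```

The percentage is simply the per-language count divided by the number of paths; for instance, 510 English paths out of 878 correspond to 58.09%.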
[Figure 9: pie chart of identified languages; legend: zh, en, nl, nn, sl, nb, de, it, hu, mt, pl, fi, et, da, es.]
Figure 9: Of the 878 ransomware sample PDB paths, English accounted for 510 (58.09%), and the remaining identified languages were Chinese 31 (3.53%), Dutch 40 (4.56%), Norwegian Nynorsk 4 (0.46%), Slovenian 5 (0.57%), Norwegian Bokmål 1 (0.11%), German 58 (6.61%), Italian 3 (0.34%), Hungarian 35 (3.99%), Maltese 18 (2.05%), Polish 5 (0.57%), Finnish 151 (17.20%), Luxembourgish 3 (0.34%), Danish 10 (1.14%), and Spanish 4 (0.46%).

Actual Case Analysis. The automatic identification system helped us identify the traceable strings in the PDB path. Take as an example the PDB path of a sample (the Chinese has been translated into English) with an MD5 value of "60c6a92afb***6d0c16f7":
"ELIAOSTUDIOUACsimulated an improved version of the 360 anti-virus program 1117 Bale1*41**2****0(**1) 9818 programWriteSystem32ReleaseVirtualDesktoppdb"
We can infer from the automatic analysis result that the ransomware maker is likely from China. The following evidence supports this inference:

(1) Language Identification. Identifying the language of the path suggests that the attacker may have come from China. Reading the Chinese strings in the path, we find that the text describes an improvement intended to bypass detection by 360 security software, which has a large market in China.

(2) IP Address. The system identified an IP address in the PDB path. With the help of a threat intelligence database, we found that the IP address is located in Xinjiang, China (Figure 10).
Figure 10: Partial infographic of IP 1*41**2****0 in the threat intelligence library.
(3) Traceable Strings. The extracted identifier string "LIAO" resembles Chinese pinyin and is most likely a Chinese surname. To sum up, we speculate that the ransomware maker is most likely from China.
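The IP address clue in item (2) can be illustrated with a short sketch that pulls IPv4 addresses out of a path string. It is a hedged example rather than the paper's analysis system: the function name extract_ipv4, the regular expression, and the sample path in the usage note are assumptions made for illustration, and the lookup against a threat intelligence database is outside its scope.

```python
import ipaddress
import re
from typing import List

# Four dot-separated groups of 1 to 3 digits, not embedded in a longer
# dotted number such as a five-part version string.
_IPV4_CANDIDATE = re.compile(r"(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?![\d.])")


def extract_ipv4(text: str) -> List[str]:
    """Return syntactically valid IPv4 addresses found in text, in order."""
    found = []
    for candidate in _IPV4_CANDIDATE.findall(text):
        try:
            ipaddress.IPv4Address(candidate)  # rejects octets above 255
        except ValueError:
            continue
        found.append(candidate)
    return found
```

Applied to a hypothetical path such as "E:\build\203.0.113.7 (test) 9818\Release\x.pdb", the function yields the single address 203.0.113.7, which could then be checked against a geolocation or threat intelligence source.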
7 Conclusion
In this paper, we focus on tracing back RDP-based ransomware. Recently, more and more ransomware attackers have been using RDP attacks to spread ransomware with impunity, owing to the use of strong cryptography. To deter RDP-based ransomware, we propose tracing back the attack sources with the proposed deception environment. It collects traceable clues and analyzes them automatically using natural language processing and machine learning techniques. The evaluation shows that it is able to trap attackers and collect the traceable clues they leave in the deception environment. With automatic clue identification, it can narrow the amount of traceable clues down to about 2%. We use this method to analyze two practical cases and show its effectiveness. By tracing back to ransomware attackers, we can provide a strong deterrent to stifle the development of ransomware.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Additional Points
This manuscript provides additional deception environment monitors and analytical methods by using more volunteers and samples. It demonstrates the validity of the methods through specific traceback cases (i.e., tracing back the ransomware attacker and tracing back the ransomware maker).
Disclosure
An earlier version of this paper was presented at the IEEE Third International Conference on Data Science in Cyberspace, Guangzhou, China, 18-21 June 2018. The authors' initial conference paper focused mainly on the effectiveness of the deception environment monitors and the screening rate of the analysis system.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
The authors would like to thank VirusTotal for providing ransomware samples and the volunteers from the University of Chinese Academy of Sciences. This work is supported in part by the National Key Research and Development Plan (Grant no. 2018YFB0803504) and the National Natural Science Foundation of China (61871140, 61572153, U1636215, 61572492, and 61672020).
References
[1] A. Kharraz, "UNVEIL: A Large-Scale Automated Approach to Detecting Ransomware," in Proceedings of the 25th USENIX Security Symposium (USENIX Security 16), pp. 757–772, USENIX Association, 2016.
[2] K. Savage, P. Coogan, and H. Lau, "The evolution of ransomware," Tech. Rep., 2015.
[3] A. Kharraz, W. Robertson, D. Balzarotti, L. Bilge, and E. Kirda, "Cutting the Gordian Knot: A Look Under the Hood of Ransomware Attacks," in Detection of Intrusions and Malware, and Vulnerability Assessment, vol. 9148 of Lecture Notes in Computer Science, pp. 3–24, Springer International Publishing, Cham, 2015.
[4] S. Mohurle and M. Patil, "A brief study of Wannacry Threat: Ransomware Attack 2017," International Journal of Advanced Research in Computer Science, vol. 8, no. 5, 2017.
[5] X. Yu, Z. Tian, J. Qiu, and F. Jiang, "A Data Leakage Prevention Method Based on the Reduction of Confidential and Context Terms for Smart Mobile Devices," Wireless Communications and Mobile Computing, vol. 2018, Article ID 5823439, 11 pages, 2018.
[6] J. Yaneza, "Brute Force RDP Attacks Plant CRYSIS Ransomware," https://blog.trendmicro.com/trendlabs-security-intelligence/brute-force-rdp-attacks-plant-crysis-ransomware.
[7] "10% of Ransomware Attacks on SMBs Targeted IoT Devices," https://www.darkreading.com/application-security/10--of-ransomware-attacks-on-smbs-targeted-iot-devices/d/d-id/1329817.
[8] Q. Tan, Y. Gao, J. Shi, X. Wang, B. Fang, and Z. H. Tian, "Towards a Comprehensive Insight into the Eclipse Attacks of Tor Hidden Services," IEEE Internet of Things Journal, 2018.
[9] "Kaspersky Security Bulletin: STORY OF THE YEAR 2017," 2017, https://media.kasperskycontenthub.com/wp-content/uploads/sites/43/2018/03/07164824/KSB_Story_of_the_Year_Ransomware_FINAL_eng.pdf.
[10] Z. Tian, Y. Cui, L. An et al., "A Real-Time Correlation of Host-Level Events in Cyber Range Service for Smart Campus," IEEE Access, vol. 6, pp. 35355–35364, 2018.
[11] R. Ross et al., Managing Information Security Risk: Organisation, Mission, and Information System View, National Institute of Standards and Technology, 2011, http://csrc.nist.gov/publications/nistpubs/800-39/SP800-39-final.pdf.
[12] Y. Wang, Z. Tian, H. Zhang, S. Su, and W. Shi, "A Privacy Preserving Scheme for Nearest Neighbor Query," Sensors, vol. 18, no. 8, p. 2440, 2018.
[13] Z. H. Wang, X. Wu, C. G. Liu, Q. X. Liu, and J. L. Zhang, "RansomTracer: Exploiting Cyber Deception for Ransomware Tracing," in Proceedings of the IEEE Third International Conference on Data Science in Cyberspace, pp. 227–234, 2018.
[14] B. A. S. Al-rimy, M. A. Maarof, and S. Z. M. Shaid, "Ransomware threat success factors, taxonomy, and countermeasures: A survey and research directions," Computers & Security, vol. 74, pp. 144–166, 2018.
[15] F. Jiang, Y. Fu, B. B. Gupta et al., "Deep Learning based Multi-channel intelligent attack detection for Data Security," IEEE Transactions on Sustainable Computing, 2018.
[16] N. Andronio, S. Zanero, and F. Maggi, "HelDroid: dissecting and detecting mobile ransomware," in Research in Attacks, Intrusions, and Defenses, vol. 9404 of Lecture Notes in Computer Science, pp. 382–404, Springer, 2015.
[17] F. Mercaldo, V. Nardone, A. Santone, and C. A. Visaggio, "Ransomware Steals Your Phone. Formal Methods Rescue It," in Formal Techniques for Distributed Objects, Components, and Systems: 36th IFIP WG 6.1 International Conference, FORTE 2016, held as part of the 11th International Federated Conference on Distributed Computing Techniques, DisCoTec 2016, E. Albert and I. Lanese, Eds., pp. 212–221, Springer International Publishing, 2016.
[18] D. Sgandurra, L. Muñoz-González, R. Mohsen, and E. C. Lupu, "Automated Dynamic Analysis of Ransomware: Benefits, Limitations and use for Detection," https://arxiv.org/abs/1609.03020.
[19] S. Song, B. Kim, and S. Lee, "The Effective Ransomware Prevention Technique Using Process Monitoring on Android Platform," Mobile Information Systems, vol. 2016, Article ID 2946735, 9 pages, 2016.
[20] T. Yang, Y. Yang, K. Qian, D. C.-T. Lo, Y. Qian, and L. Tao, "Automated detection and analysis for android ransomware," in Proceedings of the 17th IEEE International Conference on High Performance Computing and Communications, IEEE 7th International Symposium on Cyberspace Safety and Security, and IEEE 12th International Conference on Embedded Software and Systems, HPCC-ICESS-CSS 2015, pp. 1338–1343, USA, August 2015.
[21] N. Scaife, H. Carter, P. Traynor, and K. R. B. Butler, "CryptoLock (and Drop It): Stopping Ransomware Attacks on User Data," in Proceedings of the 36th IEEE International Conference on Distributed Computing Systems, ICDCS 2016, pp. 303–312, Japan, June 2016.
[22] M. M. Ahmadian and H. R. Shahriari, "2entFOX: A framework for high survivable ransomwares detection," in Proceedings of the 13th International ISC Conference on Information Security and Cryptology, ISCISC 2016, pp. 79–84, Iran, September 2016.
[23] M. M. Ahmadian, H. R. Shahriari, and S. M. Ghaffarian, "Connection-monitor & connection-breaker: A novel approach for prevention and detection of high survivable ransomwares," in Proceedings of the 12th International ISC Conference on Information Security and Cryptology, ISCISC 2015, pp. 79–84, Iran, September 2015.
[24] D. Kim, W. Soh, and S. Kim, "Design of Quantification Model for Prevent of Cryptolocker," Indian Journal of Science and Technology, vol. 8, no. 19, 2015.
[25] C. Moore, "Detecting Ransomware with Honeypot Techniques," in Proceedings of the 2016 Cybersecurity and Cyberforensics Conference (CCC), pp. 77–81, Amman, Jordan, August 2016.
[26] F. Mbol, J. Robert, and A. Sadighian, "An Efficient Approach to Detect TorrentLocker Ransomware in Computer Systems," in Cryptology and Network Security, S. Foresti and G. Persiano, Eds., vol. 10052 of Lecture Notes in Computer Science, pp. 532–541, Springer International Publishing, Cham, 2016.
[27] K. Cabaj, P. Gawkowski, K. Grochowski, and D. Osojca, "Network activity analysis of CryptoWall ransomware," Przegląd Elektrotechniczny, vol. 91, no. 11, pp. 201–204, 2015.
[28] J. Chen, Z. Tian, X. Cui, L. Yin, and X. Wang, "Trust Architecture and Reputation Evaluation for Internet of Things," Journal of Ambient Intelligence & Humanized Computing, vol. 2, pp. 1–9, 2018.
[29] C. Le Guernic and A. Legay, "Ransomware and the Legacy Crypto API. Paper presented at the Risks and," in Risks and Security of Internet and Systems: 11th International Conference, CRiSIS 2016, Roscoff, France, 2017.
[30] L. Pingree, Emerging Technology Analysis: Deception Techniques and Technologies Create Security Technology Business Opportunities, Gartner, 2015.
[31] "Ransomware and Businesses," https://www.symantec.com/content/dam/symantec/docs/security-center/white-papers/ransomware-and-businesses-16-en.pdf, 2016.
[32] A. Kharraz and E. Kirda, "Redemption: Real-Time Protection Against Ransomware at End-Hosts," in Proceedings of the International Symposium on Research in Attacks, Intrusions, and Defenses, pp. 98–119, 2017.
[33] F. Bergstrand, J. Bergstrand, and H. Gunnarsson, Localization of Spyware in Windows Environments.
[34] "PEView," https://www.aldeid.com/wiki/PEView.
[35] "python-pefile," https://pypi.python.org/pypi/pefile.
[36] M. Lui and T. Baldwin, "langid.py: An off-the-shelf language identification tool," in Proceedings of the ACL 2012 System Demonstrations, Association for Computational Linguistics, pp. 25–30, 2012.
[37] S. Bird and E. Loper, "NLTK: the natural language toolkit," in Proceedings of the ACL 2004 on Interactive Poster and Demonstration Sessions, Association for Computational Linguistics, Barcelona, Spain, July 2004.
[38] "Rrenaud Gibberish-Detector," https://github.com/rrenaud/Gibberish-Detector/blob/master/README.rst.
[39] "VirusTotal," http://virustotal.com.
International Journal of
AerospaceEngineeringHindawiwwwhindawicom Volume 2018
RoboticsJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Active and Passive Electronic Components
VLSI Design
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Shock and Vibration
Hindawiwwwhindawicom Volume 2018
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawiwwwhindawicom
Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Control Scienceand Engineering
Journal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom
Journal ofEngineeringVolume 2018
SensorsJournal of
Hindawiwwwhindawicom Volume 2018
International Journal of
RotatingMachinery
Hindawiwwwhindawicom Volume 2018
Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Navigation and Observation
International Journal of
Hindawi
wwwhindawicom Volume 2018
Advances in
Multimedia
Submit your manuscripts atwwwhindawicom
12 Wireless Communications and Mobile Computing
Figure 10 Partial infographic of IP 1lowast41lowastlowast2lowastlowastlowastlowast0 in the threatintelligence library
(3) Traceable Strings The extracted identifier stringldquoLIAOrdquo resembles a Chinese pinyin and is most likely a Chi-nese surname To sum up we speculate that the ransomwaremaker is most likely from China
7 Conclusion
In this paper we focus on tracing back RDP-based ran-somware Recently more and more ransomware attackers areusing RDP attacks to spread ransomware with impunity dueto the use of strong cryptography We propose tracing backthe attack sources to deter RDP-based ransomware with theproposed deception environment It collects traceable cluesand performs automatic analysis by using natural languageprocessing and machine learning techniques The evaluationshows that it is able to trap attackers and collect traceableclues left by attackers in a deception environment Withautomatic clue identification it can converge the amount oftraceable clues to about 2 We use this method to analyzetwo practical cases and show its effectiveness By tracing backto ransomware attackers we can provide a strong deterrent tostifle the development of ransomware
Data Availability
The data used to support the findings of this study areavailable from the corresponding author upon request
Additional Points
This manuscript provides additional deception environmentmonitors and analytical methods by using more volunteersand samples It proves the validity of the methods throughspecific traceback cases (ie traceback the ransomwareattacker and traceback the ransomware maker)
Disclosure
An earlier version of this paper was presented at the Interna-tional Conference IEEE Third International Conference onData Science in Cyberspace Guangzhou China 18-21 June2018The authorsrsquo initial conference paper focused mainly onthe effectiveness of deception environment monitors and thescreening rate of the analysis system
Conflicts of Interest
The authors declare that they have no conflicts of interest
Acknowledgments
The authors would like to thank VirusTotal for providingransomware samples and the volunteers from the Universityof Chinese Academy of Sciences This work is supportedin part by the National Key research and DevelopmentPlan (Grant no 2018YFB0803504) and the National NaturalScience Foundation of China (61871140 61572153 U163621561572492 and 61672020)
References
[1] A Kharraz ldquoUNVEIL A Large-Scale Automated Approachto Detecting Ransomwarerdquo in Proceedings of the 25th USENIXSecurity Symposium (USENIX Security 16) pp 757ndash772USENIX Association 2016
[2] K Savage P Coogan and H Lau ldquoThe evolution of ran-somwarerdquo Tech Rep 2015
[3] A Kharraz W Robertson D Balzarotti L Bilge and EKirda ldquoCutting the Gordian Knot A Look Under the Hood ofRansomware Attacksrdquo in Detection of Intrusions and Malwareand Vulnerability Assessment vol 9148 of Lecture Notes inComputer Science pp 3ndash24 Springer International PublishingCham 2015
[4] S Mohurle and M Patil ldquoA brief study of Wannacry ThreatRansomware Attack 2017rdquo International Journal of AdvancedResearch in Computer Science vol 8 no 5 2017
[5] X Yu Z Tian J Qiu and F Jiang ldquoA Data Leakage PreventionMethod Based on the Reduction of Confidential and ContextTerms for SmartMobileDevicesrdquoWireless Communications andMobile Computing vol 2018 Article ID 5823439 11 pages 2018
[6] J Yaneza ldquoBrute Force RDP Attacks Plant CRYSIS Ransom-warerdquo httpsblogtrendmicrocomtrendlabs-security-intelli-gencebrute-force-rdp-attacks-plant-crysis-ransomware
[7] ldquo10 of Ransomware Attacks on SMBs Targeted IoT Devicesrdquohttpswwwdarkreadingcomapplication-security10ndashof-ran-somware-attacks-on-smbs-targeted-iot-devices-dd-id1329817
[8] Q Tan Y Gao J Shi X Wang B Fang and Z H TianldquoTowards a Comprehensive Insight into the Eclipse Attacks ofTor Hidden Servicesrdquo IEEE Internet of Things Journal 2018
[9] ldquoKaspersky Security Bulletin STORY OF THE YEAR 2017rdquo2017 httpsmediakasperskycontenthubcomwp-contentup-loadssites4320180307164824KSB Story of the Year Ran-somware FINAL engpdf
[10] Z Tian Y Cui L An et al ldquoA Real-Time Correlation of Host-Level Events in Cyber Range Service for Smart Campusrdquo IEEEAccess vol 6 pp 35355ndash35364 2018
[11] R Ross et al Managing Information Security Risk Organi-sation Mission and Information System View National Insti-tute of Standards and Technology 2011 httpcsrcnistgovpublicationsnistpubs800-39SP800-39-finalpdf
[12] Y Wang Z Tian H Zhang S Su and W Shi ldquoA PrivacyPreserving Scheme for Nearest Neighbor Queryrdquo Sensors vol18 no 8 p 2440 2018
[13] Z H Wang X Wu C G Liu Q X Liu and J L ZhangldquoRansomTracer Exploiting Cyber Deception for RansomwareTracingrdquo in Proceedings of the IEEEThird International Confer-ence on Data Science in Cyberspace pp 227ndash234 2018
[14] B A S Al-rimyMAMaarof and S ZM Shaid ldquoRansomwarethreat success factors taxonomy and countermeasures A
Wireless Communications and Mobile Computing 13
survey and research directionsrdquo Computers amp Security vol 74pp 144ndash166 2018
[15] F Jiang Y Fu B B Gupta et al ldquoDeep Learning based Multi-channel intelligent attack detection for Data Securityrdquo IEEETransactions on Sustainable Computing 2018
[16] N Andronio S Zanero and F Maggi ldquoHelDroid dissectingand detecting mobile ransomwarerdquo in Research in AttacksIntrusions and Defenses vol 9404 of Lecture Notes in ComputerScience pp 382ndash404 Springer 2015
[17] F Mercaldo V Nardone A Santone and C A VisaggioldquoRansomware Steals Your Phone Formal Methods Rescue Itrdquoin Formal Techniques For Distributed Objects Components AndSystems 36th IFIPWG 61 International Conference FORTE2016 held as part of the 11th International Federated ConferenceOn Distributed Computing Techniques DisCoTec 2016 E Albertand I Lanese Eds pp 212ndash221 Springer International Publish-ing 2016
[18] D Sgandurra L Muplusmnoz-Gonzszliglez R Mohsen and E C LupuldquoAutomated Dynamic Analysis of Ransomware Benefits Limi-tations and use for Detectionrdquo httpsarxivorgabs160903020
[19] S Song B Kim and S Lee ldquoThe Effective RansomwarePrevention Technique Using Process Monitoring on AndroidPlatformrdquo Mobile Information Systems vol 2016 Article ID2946735 9 pages 2016
[20] T Yang Y Yang K Qian D C-T Lo Y Qian and L TaoldquoAutomated detection and analysis for android ransomwarerdquoin Proceedings of the 17th IEEE International Conference onHigh Performance Computing and Communications IEEE 7thInternational Symposium on Cyberspace Safety and Security andIEEE 12th International Conference on Embedded Software andSystems HPCC-ICESS-CSS 2015 pp 1338ndash1343 USA August2015
[21] N Scaife H Carter P Traynor andK R B Butler ldquoCryptoLock(and Drop It) Stopping Ransomware Attacks on User Datardquo inProceedings of the 36th IEEE International Conference on Dis-tributed Computing Systems ICDCS 2016 pp 303ndash312 JapanJune 2016
[22] M M Ahmadian and H R Shahriari ldquo2entFOX A frameworkfor high survivable ransomwares detectionrdquo in Proceedings ofthe 13th International ISC Conference on Information Securityand Cryptology ISCISC 2016 pp 79ndash84 Iran September 2016
[23] M M Ahmadian H R Shahriari and S M GhaffarianldquoConnection-monitoramp connection-breakerA novel approachfor prevention and detection of high survivable ransomwaresrdquoin Proceedings of the 12th International ISC Conference onInformation Security and Cryptology ISCISC 2015 pp 79ndash84Iran September 2015
[24] D Kim W Soh and S Kim ldquoDesign of Quantification Modelfor Prevent of Cryptolockerrdquo Indian Journal of Science andTechnology vol 8 no 19 2015
[25] CMoore ldquoDetectingRansomwarewithHoneypotTechniquesrdquoin Proceedings of the 2016 Cybersecurity and CyberforensicsConference (CCC) pp 77ndash81 Amman Jordan August 2016
[26] F Mbol J Robert and A Sadighian ldquoAn Efficient Approachto Detect TorrentLocker Ransomware in Computer Systemsrdquoin Cryptology and Network Security S Foresti and G PersianoEds vol 10052 of Lecture Notes in Computer Science pp 532ndash541 Springer International Publishing Cham 2016
[27] K Cabaj P Gawkowski K Grochowski and D Osojca ldquoNet-work activity analysis of CryptoWall ransomwarerdquo PrzeglądElektrotechniczny vol 91 no 11 pp 201ndash204 2015
[28] J Chen Z Tian X Cui L Yin and X Wang ldquoTrust Architec-ture and Reputation Evaluation for Internet of Thingsrdquo Journalof Ambient Intelligence amp Humanized Computing vol 2 pp 1ndash92018
[29] C Le Guernic and A Legay ldquoRansomware and the LegacyCrypto API Paper presented at the Risks andrdquo in Risks andSecurity of Internet and Systems 11th International ConferenceCRiSIS 2016 Roscoff France 2017
[30] L Pingree Emerging Technology Analysis Deception Techniquesand Technologies Create Security Technology Business Opportu-nities Gartner 2015
[31] ldquoRansomware and Businessesrdquo httpswwwsymanteccomcontentdamsymantecdocssecurity-centerwhite-papersran-somware-and-businesses-16-enpdf 2016
[32] A Kharraz and E Kirda ldquoRedemption Real-Time ProtectionAgainst Ransomware at End-Hostsrdquo in Proceedings of theInternational Symposium on Research in Attacks Intrusions andDefenses pp 98ndash119 2017
[33] F Bergstrand J Bergstrand and H Gunnarsson Localizationof Spyware in Windows Environments
[34] ldquoPEViewrdquo httpswwwaldeidcomwikiPEView[35] ldquopython-pefilerdquo httpspypipythonorgpypipefile[36] M Lui and T Baldwin ldquolangid py An off-the-shelf language
identification toolrdquo in Proceedings of the ACL 2012 systemdemonstrations Association for Computational Linguistics pp25ndash30 2012
[37] S Bird and E Loper ldquoNLTK the natural language toolkitrdquoin Proceedings of the ACL 2004 on Interactive poster anddemonstration sessions Association for Computational Linguis-tics Barcelona Spain July 2004
[38] ldquoRrenaud Gibberish-Detectorrdquo httpsgithubcomrrenaudGibberish-DetectorblobmasterREADMErst
[39] ldquoVirusTotalrdquo httpvirustotalcom
International Journal of
AerospaceEngineeringHindawiwwwhindawicom Volume 2018
RoboticsJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Active and Passive Electronic Components
VLSI Design
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Shock and Vibration
Hindawiwwwhindawicom Volume 2018
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawiwwwhindawicom
Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Control Scienceand Engineering
Journal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom
Journal ofEngineeringVolume 2018
SensorsJournal of
Hindawiwwwhindawicom Volume 2018
International Journal of
RotatingMachinery
Hindawiwwwhindawicom Volume 2018
Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Navigation and Observation
International Journal of
Hindawi
wwwhindawicom Volume 2018
Advances in
Multimedia
Submit your manuscripts atwwwhindawicom
Wireless Communications and Mobile Computing 13
survey and research directionsrdquo Computers amp Security vol 74pp 144ndash166 2018
[15] F Jiang Y Fu B B Gupta et al ldquoDeep Learning based Multi-channel intelligent attack detection for Data Securityrdquo IEEETransactions on Sustainable Computing 2018
[16] N Andronio S Zanero and F Maggi ldquoHelDroid dissectingand detecting mobile ransomwarerdquo in Research in AttacksIntrusions and Defenses vol 9404 of Lecture Notes in ComputerScience pp 382ndash404 Springer 2015
[17] F Mercaldo V Nardone A Santone and C A VisaggioldquoRansomware Steals Your Phone Formal Methods Rescue Itrdquoin Formal Techniques For Distributed Objects Components AndSystems 36th IFIPWG 61 International Conference FORTE2016 held as part of the 11th International Federated ConferenceOn Distributed Computing Techniques DisCoTec 2016 E Albertand I Lanese Eds pp 212ndash221 Springer International Publish-ing 2016
[18] D Sgandurra L Muplusmnoz-Gonzszliglez R Mohsen and E C LupuldquoAutomated Dynamic Analysis of Ransomware Benefits Limi-tations and use for Detectionrdquo httpsarxivorgabs160903020
[19] S Song B Kim and S Lee ldquoThe Effective RansomwarePrevention Technique Using Process Monitoring on AndroidPlatformrdquo Mobile Information Systems vol 2016 Article ID2946735 9 pages 2016
[20] T Yang Y Yang K Qian D C-T Lo Y Qian and L TaoldquoAutomated detection and analysis for android ransomwarerdquoin Proceedings of the 17th IEEE International Conference onHigh Performance Computing and Communications IEEE 7thInternational Symposium on Cyberspace Safety and Security andIEEE 12th International Conference on Embedded Software andSystems HPCC-ICESS-CSS 2015 pp 1338ndash1343 USA August2015
[21] N Scaife H Carter P Traynor andK R B Butler ldquoCryptoLock(and Drop It) Stopping Ransomware Attacks on User Datardquo inProceedings of the 36th IEEE International Conference on Dis-tributed Computing Systems ICDCS 2016 pp 303ndash312 JapanJune 2016
[22] M M Ahmadian and H R Shahriari ldquo2entFOX A frameworkfor high survivable ransomwares detectionrdquo in Proceedings ofthe 13th International ISC Conference on Information Securityand Cryptology ISCISC 2016 pp 79ndash84 Iran September 2016
[23] M M Ahmadian H R Shahriari and S M GhaffarianldquoConnection-monitoramp connection-breakerA novel approachfor prevention and detection of high survivable ransomwaresrdquoin Proceedings of the 12th International ISC Conference onInformation Security and Cryptology ISCISC 2015 pp 79ndash84Iran September 2015
[24] D Kim W Soh and S Kim ldquoDesign of Quantification Modelfor Prevent of Cryptolockerrdquo Indian Journal of Science andTechnology vol 8 no 19 2015
[25] CMoore ldquoDetectingRansomwarewithHoneypotTechniquesrdquoin Proceedings of the 2016 Cybersecurity and CyberforensicsConference (CCC) pp 77ndash81 Amman Jordan August 2016
[26] F Mbol J Robert and A Sadighian ldquoAn Efficient Approachto Detect TorrentLocker Ransomware in Computer Systemsrdquoin Cryptology and Network Security S Foresti and G PersianoEds vol 10052 of Lecture Notes in Computer Science pp 532ndash541 Springer International Publishing Cham 2016
[27] K Cabaj P Gawkowski K Grochowski and D Osojca ldquoNet-work activity analysis of CryptoWall ransomwarerdquo PrzeglądElektrotechniczny vol 91 no 11 pp 201ndash204 2015
[28] J Chen Z Tian X Cui L Yin and X Wang ldquoTrust Architec-ture and Reputation Evaluation for Internet of Thingsrdquo Journalof Ambient Intelligence amp Humanized Computing vol 2 pp 1ndash92018
[29] C Le Guernic and A Legay ldquoRansomware and the LegacyCrypto API Paper presented at the Risks andrdquo in Risks andSecurity of Internet and Systems 11th International ConferenceCRiSIS 2016 Roscoff France 2017
[30] L Pingree Emerging Technology Analysis Deception Techniquesand Technologies Create Security Technology Business Opportu-nities Gartner 2015
[31] ldquoRansomware and Businessesrdquo httpswwwsymanteccomcontentdamsymantecdocssecurity-centerwhite-papersran-somware-and-businesses-16-enpdf 2016
[32] A Kharraz and E Kirda ldquoRedemption Real-Time ProtectionAgainst Ransomware at End-Hostsrdquo in Proceedings of theInternational Symposium on Research in Attacks Intrusions andDefenses pp 98ndash119 2017
[33] F Bergstrand J Bergstrand and H Gunnarsson Localizationof Spyware in Windows Environments
[34] ldquoPEViewrdquo httpswwwaldeidcomwikiPEView[35] ldquopython-pefilerdquo httpspypipythonorgpypipefile[36] M Lui and T Baldwin ldquolangid py An off-the-shelf language
identification toolrdquo in Proceedings of the ACL 2012 systemdemonstrations Association for Computational Linguistics pp25ndash30 2012
[37] S Bird and E Loper ldquoNLTK the natural language toolkitrdquoin Proceedings of the ACL 2004 on Interactive poster anddemonstration sessions Association for Computational Linguis-tics Barcelona Spain July 2004
[38] ldquoRrenaud Gibberish-Detectorrdquo httpsgithubcomrrenaudGibberish-DetectorblobmasterREADMErst
[39] “VirusTotal,” http://virustotal.com.