AECID: A Self-learning Anomaly Detection Approach Based on ... · bility of anomaly detection...

AECID: A Self-learning Anomaly Detection Approach Based onLight-weight Log Parser Models

Markus Wurzenberger1, Florian Skopik1, Giuseppe Settanni1 and Roman Fiedler1

1AIT Austrian Instritute of Technology, Center for Digital Safety and Security, Vienna, [email protected]

Keywords: Anomaly Detection; Intrusion Detection System, Machine Learning; Log Analysis

Abstract: In recent years, new forms of cyber attacks with an unprecedented sophistication level have emerged. Addi-tionally, systems have grown to a size and complexity so that their mode of operation is barely understandableany more, especially for chronically understaffed security teams. The combination of ever increasing exploita-tion of zero day vulnerabilities, malware auto-generated from tool kits with varying signatures, and the stillproblematic lack of user awareness is alarming. As a consequence signature-based intrusion detection systems,which look for signatures of known malware or malicious behavior studied in labs, do not seem fit for futurechallenges. New, flexibly adaptable forms of intrusion detection systems (IDS), which require just minimalmaintenance and human intervention, and rather learn themselves what is considered normal in an infrastruc-ture, are a promising means to tackle today’s serious security situation. This paper introduces ÆCID, a newanomaly-based IDS approach, that incorporates many features motivated by recent research results, includingthe automatic classification of events in a network, their correlation, evaluation, and interpretation up to adynamically-configurable alerting system. Eventually, we foresee ÆCID to be a smart sensor for establishedSIEM solutions. Parts of ÆCID are open source and already included in Debian Linux and Ubuntu. Thispaper provides vital information on its basic design, deployment scenarios and application cases to support theresearch community as well as early adopters of the software package.

1 Introduction

In 1980, James Anderson was one of the firstresearchers who indicated the need of intrusion de-tection systems (IDS), contradicting the commonassumption of software developers and administra-tors, that their computer systems were running in‘friendly’, i.e. secure, environments (James P. Ander-son, 1980). Recent reports (Verizon, 2016) (Syman-tec, 2016) show that while companies started to investlarge amounts of resources into cyber security alreadyyears ago, many basic security issues unfortunatelyremain. On the one side, the increasing interconnec-tion of previously isolated infrastructures, and the cre-ation of systems of systems as well as the emergenceof the Internet of Things (IoT) (Atzori et al., 2010)tremendously increases the attack surface. On theother side of the scale, the awareness in non-securitycompany is still rather low – most successful attackstoday still use phishing, skimming and baiting (Ver-izon, 2016) and exploit human negligence to obtainunauthorized access to a company’s ICT network.

A wide variety of security solutions have been

proposed in recent years to cope with this challeng-ing situation. While some of them were able to re-lax the situation at least partially, research on intru-sion detection systems seems – due to rapidly chang-ing technologies and system design paradigms – tobe a never-ending story. Signature-based approaches(Mitchell and Chen, 2014), i.e., black-listing meth-ods, are still the de-facto standard applied today forsome good reasons: they are essentially easy to con-figure, can be centrally managed, i.e., do not needmuch customization for specific networks, yield a ro-bust and reliable detection for known attacks and pro-vide low false positive rates. While these are all greatbenefits for the application in today’s enterprise envi-ronments, there are, nevertheless, solid arguments towatch out for more sophisticated anomaly-based de-tection mechanisms (Liao et al., 2013), which shouldbe applied additionally to black-listing approaches forthe reasons explained as follows.

• The exploitation of new zero-day vulnerabilitiesare hardly detectable by blacklisting-approaches.Simply, there are no signatures to describe the in-dicators of an unknown exploit.

• Attackers can easily circumvent the detection ofMalware, once indicators are widely distributed.Simply re-compiling a Malware with small mod-ifications will change hash sums, names, IP ad-dresses of command and control servers and thelike – in the worst case, rendering all these datawhich is used to describe indicators useless.

• Eventually, many sophisticated attacks use socialengineering as an initial intrusion vector. Hereno technical vulnerabilities are exploited, hence,no indicators on a blacklist can appropriately de-scribe malicious behavior.

Especially the latter requires smart anomaly detec-tion approaches to reliably discover deviations froma desired system’s behavior as a consequence of anunusual utilization through an illegitimate user. Thisis the usual case when an adversary manages to stealuser credentials and is using these actually legitimatecredentials to illegitimately access a system. How-ever, an attacker will eventually utilize the system dif-ferently from the legitimate user, to reach his target,for instance running scans, searching shared directo-ries and trying to extend his influence to surround-ing systems at either unusual speed, at unusual times,taking unusual routes in the network, issuing actionswith unusual frequency, causing unusual data trans-fers at unusual bandwidth. This causes a series ofevents within an infrastructure which are picked upby anomaly-based approaches and used to trigger offalerts.

Certainly, even black-listing approaches can coversome of these cases. For instance, log-in attemptsoutside business hours is a standard case, which ev-ery well-configured IDS can detect. The point hereis, that with just using black-listing the security per-sonnel must think of a wide variety of potential at-tack cases ahead and how they manifest in the net-work, i.e., which kind of events they would trigger.This is not only a cumbersome never-ending task, butalso extremely error-prone. Eventually, there will al-ways be ways to exploit a system of which the de-signers and operators did not think in advance. Incontrast to that, the application of white-listing ap-proaches seems intriguing here: one needs to describethe ‘normal and desired system behavior’ (this meansto ‘white list’ what is known good) and everythingthat differs from this description is potentially classi-fied as hostile.

The effort is comparatively lower, which demon-strates impressively the advantage of an anomaly-based approach. However, while this seems quite at-tractive, the advantages come with a price. Often,high false positive rates, complex behavior models,potentially error-prone learning phases and the like

are just some of the drawbacks to consider. Deploy-ing, configuring and effectively operating an anomalydetection system is a highly non-trivial task.

Substantial effort has been invested to applyanomaly detection on network traffic in real-time(Liao et al., 2013). With these methods, irregularitiesof data flows can be effectively spotted by investigat-ing network packets and their meta data, such as flowsbetween previously decoupled systems or changes inthe interaction patterns and behavior. However, manyargue that in the future a reliable detection of anoma-lies will only be feasible on the host. The wide adop-tion of numerous tunneling technologies, end-to-endencryption, virtualization/containerization, and soft-ware defined networks makes it hard – if not impos-sible – to track the real system behavior by inspect-ing network traffic only. More sophisticated, but alsomore complex is the detection of anomalies based onactual system events. These system events are oftencaptured in system log files and are just waiting to beharnessed by more modern anomaly detection tech-niques.

In this paper, we have a close look on the applica-bility of anomaly detection approaches in IDS whichspecifically exploit semantically rich log data. In par-ticular the contribution of this paper is threefold:

• we discuss the design principles of a modern scal-able anomaly detection system to be applied inlarge-scale distributed systems;

• we outline the ÆCID approach, which is an ac-tual implementation based on the aforementioneddesign principles;

• we demonstrate the successful application of theÆCID approach specifically in non-enterprise ITenvironments, such as Industry 4.0 and cyber-physical systems.

The remainder of the paper is organized as fol-lows. Section 2 outlines important background andrelated work. Section 3 discusses important con-cepts for modern anomaly detection approaches ap-plied in IDS systems, specifically, the distribution ofthe parsing and un-supervised classification processover numerous nodes and continuous synchronizationamong them. In Section 4 the implementation of themost important features of the ÆCID system is de-scribed, followed by its mode of operation in Section5. Then, Section 6 highlights typical application casesof ÆCID and demonstrates where such a system issuperior over a standard signature-based solution. Fi-nally, Section 7 concludes the paper.

2 Background and Related work

Like other security tools, IDS aim to achieve ahigher level of security in ICT networks. Their pri-mary goal is to timely and rapidly detect invaders,so that it is possible to react quickly and reduce thecaused damage. In opposite to Intrusion Preven-tion Systems (IPS) that are able to actively defend anetwork against an attack, IDS are passive securitymechanisms, which allow to detect attacks withoutenabling any countermeasure automatically (Whit-man and Mattord, 2012), but informing administra-tors of the discovered intrusion through notificationprocedures.

(Sabahi and Movaghar, 2008; Liao et al., 2013;Garcıa-Teodoro et al., 2009) survey various methodsapplicable for intrusion detection. IDS can roughlybe categorized as host-based IDS (HIDS), network-based IDS (NIDS) and hybrid or cross-layer IDS(Vacca, 2013). Similarly to simple security solutionssuch as anti-viruses, HIDS are installed on every sys-tem (host) of the network that needs to be monitored.While HIDS deliver specific high-level informationabout an attack and allow comprehensive monitor-ing of a single host, they can be potentially disabledby, for example, a Denial of Service (DoS) attack,because if a system is once compromised also theHIDS is. NIDS monitor and analyze the network traf-fic crossing the network internal to an organization.For monitoring a network with an NIDS, one singlesensor-node can be sufficient and the functionality ofthis sensor is not effected if one system of the networkis compromised. A major drawback of NIDS is thatif the network is overloaded, a complete monitoringcannot be guaranteed, because some network pack-ets may be dropped to avoid congestion. Cross-layerand hybrid IDS merge different methods. Hybrid IDSusually provide a management framework that com-bines HIDS and NIDS to reduce their drawbacks andbenefit from their advantages. Cross-layer IDS aim tomaximize the available information while minimiz-ing the false alarm rate. To achieve this, various datasources, such as log data and network traffic data, areused for intrusion detection.

There exist generally three detection methods ap-plied in IDS: signature-based detection (SD), statefulprotocol analysis (SPA), and anomaly-based detection(AD) (Liao et al., 2013). SD and SPA can only de-tect previously known attack patterns using signaturesand rules that describe malicious events and thus arealso called black-listing approaches (Whitman andMattord, 2012) (Scarfone and Mell, 2007). AD ap-proaches are more flexible and able to detect noveland previously unknown attacks. They establish a

baseline of normal system behavior and therefore arealso called white-listing approaches (Garcıa-Teodoroet al., 2009).

The rapidly changing cyber threat landscape de-mand for flexible and self-adaptive IDS approaches.One solution are self-learning AD based approachesthat automatically learn and continuously adapt thenormal system behavior; this serves as ground truthto detect anomalies that expose attacks and especiallyinvaders. Generally, there are three ways to realizeself-learning AD (Goldstein and Uchida, 2016): su-pervised, semi-supervised, and unsupervised. Unsu-pervised methods do not require any labeled data andare able to learn distinguishing normal from malicioussystem behavior during the training phase. Semi-supervised methods are applied when the training setonly contains anomaly-free data; they are also knownas ‘one-class’ classification. Supervised methods re-quire a fully labeled training set containing both nor-mal and malicious data. These three methods donot necessitate active human intervention during thelearning process. While unsupervised self-learning isentirely independent from human influence, for theother two methods the user has to ensure that the train-ing data is anomaly free and correctly labeled. Con-sequently, the previously described methods can becategorized as unsupported self-learning approaches.

3 Motivation and Challenges

This section discusses the motivation behind theintroduction of the novel anomaly detection approachpresented in this paper, and summarizes the chal-lenges this approach aims to address. This workbuilds on our prior research on the topic, publishedin ((Friedberg et al., 2015)), and enhances the ap-proach therein presented by extending its features andimproving its flexibility.

Existing anomaly detection approaches that ana-lyze computer log data are mostly designed for foren-sic analysis. However, these techniques can comple-ment on-line Intrusion Detection System (IDS) andtimely detect sophisticated attacks, such as advancedpersistent threats (APT), and eventually mitigate theirimpact (Friedberg et al., 2015). Furthermore, con-trarily to available state-of-the-art IDS, which mostlyprocess network traffic data, a paradigm shift towardstextual log data should be considered. In fact, the cur-rent trend shows an increasing adoption of end-to-endencryption in network communications, to secure crit-ical information in transit. Hence, only clear text metainformation is available for the investigation, if onlynetwork traffic is inspected. Thus, alternative data

sources need to be leveraged to effectively performanomaly detection. To the best of our knowledge, noonline IDS is available, which processes log data toperform anomaly detection.

Anomaly-based IDS normally implement white-listing mechanisms; they are also called behaviorbased approaches because they compare computernetwork’s or system’s activities against a model thatcharacterizes the normal behavior. This reference isconsidered to be “anomaly-free” and is also known asground-truth. Today’s rapidly changing cyber threatlandscape accounts for flexible and self-adaptive IDSapproaches. Thus, to build a system behavior modelsemi-supervised self-learning methods can be ap-plied, whose main strength is the capability to detectunknown attacks without requiring attack signatures.

Leveraging self-learning and white-listing meth-ods it is possible to implement anomaly detection sys-tems that are independent from semantical interpre-tation of the processed log data, and from the syn-tax of the data. Since the system behavior modelcan be built automatically, the only requirement isthat the syntax of the log lines does not change overtime, as this would be considered as a deviation fromthe normal system behavior and therefore marked asanomaly. This is feasible because log data, contrar-ily to free text, follows a predefined structure. Nor-mally, a log message comprises static chunks, i.e.,constant strings that occur often in the log sequence(e.g., prepositions, commands, protocol names), andvariable chunks occurring less frequently (e.g., IP ad-dresses, host names, TCP port numbers). Moreover,the order in which different parts occur in a log line isdeterministic and does not vary.

Another benefit of self-learning and white-listingapproaches utilized for anomaly detection is their in-trinsic flexibility. They can be applied to process logsproduced by legacy systems and by appliance withsmall market shares (like those largely employed incyber physical systems (CPS)). Such systems oftenlack of vendor support and suffer from poor docu-mentation. Usually neither security solution providerssupply signatures for monitoring these systems, norvendors make patches available to keep the systemsup-to-date. Furthermore, modern networks, such asthe Internet of Things (IoT), comprise many devicesthat are often not powerful enough to allocate thelarge amounts of resources required to run anomalydetection tools. Thus, light-weight anomaly detec-tion mechanisms that can operate consuming a min-imal amount of memory and CPU are necessary. Oneway to meet this requirement is to foresee a decentral-ized architecture. Different light-weight processinginstances can be distributed across the infrastructure,

while a central control instance, running on a pow-erful machine, executes resource intensive operations(such as machine-learning functions) and allows anadministrator to control the distributed light-weightedinstances.

3.1 Detectable anomalies

In order for an anomaly detection method to be con-sidered effective, different types of anomalies have tobe recognizable by using it, with a certain level ofconfidence. In the following we list the main cate-gories of anomalies a modern anomaly detection toolshould be capable to reveal.

The simplest type of anomaly is represented byanomalous single events. On the one hand, thesecan be so-called outliers representing rarely occur-ring events, which appear so seldom that are not partof the normal system behavior model. On the otherhand, these anomalies can be violations of prohibitedparameter values; for example, a client access exe-cuted through a user agent normally not utilized in anetwork, therefore not white-listed. In case of black-listing approaches, user agents that are not allowedneed to be added (one-by-one) to the blacklist, andhence imply a high risk of incompleteness.

Anomalous event parameters are point anomaliessuch as IP addresses, port numbers or software ver-sions that are not white-listed and therefore are notpart of the normal system behavior. This type ofanomalies includes, for example, events that occuroutside of business hours, or are triggered by accountsof employees who are on vacation.

Anomalous single event frequencies are eventsusually considered normal, which occur with ananomalous frequency. For example, in case of datatheft, an anomalously high number of database ac-cesses from a single client would be recorded in thelog data, triggering an anomaly.

Anomalous event sequences are anomalies reveal-able by observing the dependency between relatedevents. Such dependency can be formalized by defin-ing correlation rules. A correlation rule describes aseries of events that have to occur in an ordered se-quence, within a given time-window, to be considerednon anomalous. To detect more complex anomalousprocesses, which may involve different systems ona network, multiple log lines need to be examined.After a particular log line type (recording a condi-tioning event) is observed, another specific log line(recording the expected implied event) has to occurwithin a predefined time slot. Otherwise, an anomalyis raised. Additionally, such correlation rules shouldbe definable so that once a given (conditioning) event

occurred, the algorithm checks if in a predefined timewindow previous to such event, another specific (im-plied) event has occurred. Correlation rules allow, forexample, the detection of access chain violations, e.g.,in course of an SQL-injection.

4 The ÆCID Approach

This section introduces ÆCID, a novel anomalydetection approach we propose to address the chal-lenges outlined in the previous section. We illustratehere the system architecture and we describe the twomain components ÆCID consists of: the AMiner andÆCID Central.

Figure 1 depicts the system architecture of ÆCID.ÆCID is designed to allow the deployment inhighly distributed environments; in fact, due to itslightweight implementation, an AMiner instance canbe installed, as vantage point, on any relevant node ofthe network; ÆCID Central is the component respon-sible of controlling and coordinating all the deployedAMiner instances.

The AMiner operates similarly to an HIDS sensor.It has to be installed on every host and network nodethat needs be monitored, or – if possible – on a cen-tralized logging storage which collects the log datagenerated by the monitored nodes. Each AMiner in-stance interprets the log messages acquired from thenode it is deployed on, following a specific model,called parser model, generated ad-hoc to represent thedifferent events being logged on that particular node.A tailored rule set identifies the events that are con-sidered legitimate on that system; an AMiner instancechecks every parsed log line against this rule set andreports any mismatch. Furthermore, each AMiner in-stance comprises a report generator that produces adetailed record of parsed and unparsed lines, alertsand triggered alarms. The reports can be sent eithervia the AECID Central Interface to the AECID Cen-tral, or through additional interfaces (e.g., via e-mail)to system administrators. The parser model in combi-nation with the rule set, characterize the normal sys-tem behavior, i.e., describe the type, structure, andcontent of log lines representing events allowed to oc-cur on the monitored system. Every log message vi-olating this behavioral model represents an anomaly.

While the AMiner performs lightweight opera-tions such as parsing log messages and comparingthem against a set of existing rules, ÆCID Centralprovides more advanced features, and therefore re-quires more computational resources than a singleAMiner instance. One of the main functions executed

by ÆCID Central is to learn the normal system behav-ior of every monitored system, and consequently con-figure the AMiner instance, running on that system,to detect any logged abnormal activity. To do this,ÆCID Central analyzes the logs received from eachAMiner instance, generates a tailored parser model(Parser Model Generator function) and a specific setof rules (Rule Generator function), and sends themto the AMiner instance, which adopts them to exam-ine future log messages. ÆCID Central provides thedifferent AMiner instances with self-learned parsermodels and rule sets, and adapts them, when the net-work infrastructure and/or the user behavior change.Hence, ÆCID Central needs to control and configureall the deployed AMiner instances; these operationsare performed through the AMiner Interface. More-over, a Control Interface allows a system administra-tor to communicate with ÆCID Central, adjust its set-tings, and setup the deployed AMiner instances. Ad-ditionally ÆCID Central leverages a Correlation En-gine that allows to analyze and associate events ob-served by different AMiner instances, with the pur-pose of white-listing events generated by complexprocesses involving diverse network nodes. ÆCIDCentral also provides a Web-based GUI, to exploreand review statistics as well as information on trig-gered alerts and alarms. If the log data collectedwithin a network infrastructure includes records ofcommunication events between network devices (e.g.,

ÆCID

ÆC

ID C

en

tral

IF

Parser Model

Rule Set

Report Generator

...

ÆCID CENTRAL

Control Interface

AM

ine

rs IF

Parser Model Generator

RuleGenerator

CorrelationEngine

ÆC

ID C

en

tral

IF

Parser Model

Rule Set

Report Generator

ÆC

ID C

en

tral

IF

Parser Model

Rule Set

Report Generator

Figure 1: ÆCID architecture.

“ntpd[“PID [INT]

“]: “

“ntpd exiting on signal “

“Listen and drop on “

“Listening on routing socket on fd # “

“Listen normally on “

“proto: precision = “

“peers refreshed“

“ntp_io: estimated max descriptors: 1024, initial socket boundary: 16“

“new interface(s) found: waking up resolver“

SIGNAL [INT]

FD [INT]“ “

INTERFACE [IF]“ “

ADDRESS [IPv6]

“ UDP 123“

ADDRESS [IPv4]

“ UDP 123“

FD [INT]“ for interface updates“

“:123“

FD [INT]“ “

INTERFACE [IF]“ “

ADDRESS [IPv6]

“ UDP 123“

ADDRESS [IPv4]

“ UDP 123“

“:123“

PREC [DDM]“ usec“

Figure 2: The graph describes the parser model for ntpd (Network Time Protocol) service logs. String under quotes are fixedelements or elements of a list. Oval entities symbolize model elements that allow variable values (an example of a parsedlog-line is provided in Figure 3).

packets’ headers observed via tcpdump), ÆCID canoperate as NIDS, and therefore be utilized as hybridIDS.

4.1 AMiner

The AMiner is the peripheral component of the ÆCIDsystem, which executes lightweight operations on logdata collected from the system it is deployed on.The current AMiner implementation – available opensource1 – is compatible with Python 2.6-2.7 and hasbeen tested on CentoOS 6.7 and Ubuntu Xenial 16.04.In the lightest available configuration, AMiner re-quires only 32 MB of memory to perform simple logline filtering. If stricter memory requirements are inplace, the AMiner can run on a separate (more power-ful) machine, and listen to the log stream coming fromthe monitored device. This makes it possible to pro-cess logs produced by devices with low computationalpower, legacy systems, and devices whose hardwaredoes not meet the AMiner’s installation requirements.

The following sections illustrate in detail how theAMiner parses the captured log messages and how itidentifies whether the logged event is to be consideredlegit or it indicates an anomaly.

4.1.1 Parser Model

The AMiner interprets every incoming log messageaccording to a specific parser model, which charac-terizes the events observed on the device or networkcomponent it monitors. The parser model representsa path entropy model that efficiently describes thewhite-listed, i.e. permitted, log lines. It describes the

1https://git.launchpad.net/logdata-anomaly-miner/

log model of the monitored system as a graph, specifi-cally an ordered tree (see Figure 2). The goal of usingthe parser model is to filter out as much redundant in-formation as possible before a detailed analysis of thelog line is performed. The parser model allows to ef-ficiently extract all the information contained in a logline, while retaining only a minimum amount of data.Thus, every branch of the graph includes fixed seg-ments, which represent constant strings always occur-ring in the same position of the log line, and variablesegments, which represent strings that differ from lineto line.

Log messages produced by different services havegenerally different syntaxes; for this reason, a parsermodel can only describe the set of possible log mes-sages produced by one single service. This impliesthat an AMiner instance will adopt a distinct parsermodel for each different service running on the sys-tem that it monitors.

There exist several components, namedModelElements, a parser can consist of. Themost frequently used are2:• AnyByteDataModelElement: Match anything till

the end of a log-atom.• DateTimeModelElement: Simple datetime pars-

ing using python datetime module.• DecimalIntegerValueModelElement: Parsing

integer values.• FirstMatchModelElement: Branch the model

taking the first branch matching the remaininglog-atom data.

• FixedDataModelElement: Match a fixed (con-stant) string.

2A more exhaustive list of model elementscan be found in the AMiner documentation at:https://git.launchpad.net/logdata-anomaly-miner/tree/source/root/usr/share/doc/aminer/ParsingModel.txt

Listing 1: Example log data of an ntpd service.0: Jun 14 16:17:12 ghive -ldap ntpd[16721]: Listen and drop on 0 v4wildcard 0.0.0.0 UDP 1231: Jun 14 16:17:12 ghive -ldap ntpd[16721]: Listen and drop on 1 v6wildcard :: UDP 1232: Jun 14 16:17:12 ghive -ldap ntpd[16721]: Listen normally on 2 lo 127.0.0.1 UDP 1233: Jun 14 16:17:12 ghive -ldap ntpd[16721]: Listen normally on 3 eth0 134.74.77.21 UDP 1234: Jun 14 16:17:12 ghive -ldap ntpd[16721]: Listen normally on 4 eth1 10.10.0.57 UDP 1235: Jun 14 16:17:12 ghive -ldap ntpd[16721]: Listen normally on 5 eth1 fe80::5652:ff:fe5a:f89f UDP 1236: Jun 14 16:17:12 ghive -ldap ntpd[16721]: Listen normally on 6 eth0 fe80::5652:ff:fe01:1aee UDP 1237: Jun 14 16:17:12 ghive -ldap ntpd[16721]: Listen normally on 7 lo ::1 UDP 1238: Jun 14 16:17:12 ghive -ldap ntpd[16721]: peers refreshed9: Jun 14 16:17:12 ghive -ldap ntpd[16721]: Listening on routing socket on fd #24 for interface updates

“ntpd[“16721

“]: “ “Listen normally on “3

“ “eth0

“ “134.74.77.21

“ UDP 123“

PIDDecimal Integer Model

File DescirptorDecimal Integer Model

IPFirst Match ModelIP Address Model

String 2Fixed Data Model

Service Name Fixed Data Model

String 1 Fixed Data Model

MessageFirst Match ModelFixed Data Model

Interface NameVariable Byte Model

(0..9a..z.)

String 3Fixed Data Model

PortFixed Word List Model

Figure 3: Example of log line parsing (cf. List 1 line number 3 and Figure 2).

• IpAddressDataModelElement: Match an IPv4address.

• SequenceModelElement: Match all the sub-elements exactly in the given order.

• FixedWordlistDataModelElement: Match oneof the fixed strings from a list.

• VariableByteDataModelElement: Match vari-able length data encoded within a given alphabet.

When the AMiner parses a log line, it usually ap-plies a preamble model to parse the beginning of theline; for example, in case of a syslog message, thepreamble corresponds to the timestamp and the hostname. Afterwards, a FirstMatchModelElement de-cides which service the log line reports about, andapplies the corresponding parsing model. For a bet-ter understanding of the parser, Figure 2 shows thegraph of the parser model for the log lines of anntpd service (see List 1). Figure 3 illustrates howthe line number 3 in List 1 is parsed. The figureshows how the different elements of the log line aremapped to the nodes of the graph that represents theparser of the ntpd service. For example, at eachfork of the graph a FirstMatchModelElement isapplied to find the correct branch, before the nextpart of the line can be parsed. The oval entitiesmark variable segments of the log line; these haveto follow specific rules, e.g., a PID parsed by theDecimalIntegerValueModelElement must consistonly of digits between 0 and 9. Non-oval entitiesmark fixed segments of the parsed log line that mustalways occur for a log line to match the same branchof the parser model.

4.1.2 Detecting Anomalies

There are two ways for the AMiner to reveal anoma-lies within the log messages: i) by observing devi-ations of the log lines from the parser model, ii) byidentifying log lines which do not follow certain pre-defined rules.

Thanks to the acquired knowledge on the nor-mal system behavior, formalized through the corre-sponding parser model, the most advantageous wayto reveal anomalies is to detect significant deviationsof the logged events from the normal system behav-ior model. Usually, an information system operatesonly in a contained number of system states. Thosestates, which occur while the system runs normally,i.e. when the system is not in maintenance or recov-ery mode, define the model for the normal system be-havior. Given that log data records occurring systemevents, a log message that does not match any avail-able path in the parser model graph is to be consid-ered anomalous, because it represents an unexpectedsystem event. Thus, the AMiner considers a log linenon-anomalous only if it matches one entire path ofthe parser model graph. Hence, every deviation fromthe graph’s paths indicates an anomalous event. Theanomaly can be caused either by a technical failure,by maintenance activities, or by an unused systemfunction that has been activated by an attack. TheAMiner white-lists all paths described by the graphdefined by the parser model, and raises an alert everytime a log line cannot be fully parsed.

Another way to detect anomalies is by definingwhite-listing rules, which are formulated in order to

Listing 2: Output of the parser shown in Figure 2 applied tolog line 3 from List 1Jun 14 16:17:12 ghive -ldap ntpd[16721]: Listennormally on 3 eth0 134.74.77.21 UDP 123/model/syslog/time: ‘Jun 14 16:17:12’/model/syslog/host: ‘ghive -ldap ’/model/services/ntpd/sname: ‘ntpd[’/model/services/ntpd/pid: ‘16721’/model/services/ntpd/s1: ‘]: ’/model/services/ntpd/msg/text: ‘Listen normallyon ’/model/services/ntpd/msg/descriptor: ‘3’/model/services/ntpd/msg/s2: ‘ ’/model/services/ntpd/msg/if: ‘eth0 ’/model/services/ntpd/msg/s3: ‘ ’/model/services/ntpd/msg/ipv4/ip: ‘134.74.77.21’/model/services/ntpd/msg/ipv4/port/: ‘ UDP 123’

allow only specific system events. The AMiner ex-tracts all paths occurring in a log line and the as-sociated parsed values as shown in List 2. White-listing rules can be defined by simply allowing onlyspecific values for a certain log line element, or spe-cific combination of different values. For example,in /model/services/ntpd/msg/ipv4/ip, it is pos-sible to white-list only a certain list of IP addresseswhich are allowed to occur. If a non-white-listed IPaddress is detected, an alert or an alarm can be raised,and consequently an e-mail message can automati-cally be sent to system administrators to notify themof the anomaly. The same concept can be applied fora combination of values; for example to allow thatspecific user names only appear together with certainIP or MAC addresses. A rule can also be configuredas a black-listing rule, to filter out known unwantedevents, whose impact is considered dangerous. Fi-nally, it is possible to formulate rules in a way to per-mit a range or a list of values.

Signature based IDS normally analyze log linesindividually, however, malicious network behavioroften manifests in an sequence of multiple log lines.Only by correlating such a sequence of events theanomaly can become apparent. For this reason, theAMiner can be configured to detect anomalies basedon statistics. For example, a specific event which nor-mally occurs 5 to 7 times per hour, will trigger analert if it suddenly occurs 10 times in one hour. Inthis case, the AMiner will detect changes in the dis-tribution of path occurrences. For example, assumingthat the occurrence of a path follows a normal distri-bution over a predefined time interval, a fluctuationof the mean and of the standard deviation will indi-cate an anomaly. This demonstrates that the AMineris able to detect not only anomalous single events, butalso anomalous event frequencies.

Once an anomaly is detected, the AMiner can beconfigured to handle alerts and alarms. Reports can besent via e-mail, periodically at a specified frequency,or every time a certain number of alerts is reached, orimmediately as soon as an alarm is triggered to timelyinform an administrator.

4.2 ÆCID Central

ÆCID Central is the intelligent component of ÆCID.It includes the parser generator, the rule generator,and a correlation engine, which allows to correlateevents observed by different AMiner instances. TheAMiner Interface enables the exchange of data be-tween ÆCID Central and the different AMiner in-stances deployed in the network, while a Control In-terface allows administrators to configure the wholesystem as well as to visualize analysis results.

4.2.1 Control Interface

The control interface allows system administrators tocommunicate with ÆCID Central and, through theAMiner Interface, to orchestrate the different AMinerinstances deployed in the network. This means thatthe administrator can change the settings of everyAMiner instance, including the parser model theyadopt, the set of rules that they check, as well as thedata sources that they parse (i.e., their input data).Furthermore, ÆCID Central is equipped with a graph-ical interface visualizing analytical statistics (includ-ing for example the number of occurrences of a cer-tain type of log line), along with information on de-tected anomalies and triggered alerts and alarms.

4.2.2 Parser Model Generator

The parser model generation is one of the self-learning functions provided by ÆCID. Via the con-trol interface the system administrator can activate theparser model generator separately for each AMinerinstance controlled by ÆCID Central. This is donewhen a new AMiner instance is deployed, new hard-ware or software components are added to the net-work, or a high number of false alerts occur, causedby deviations from the current parser model.

When the parser model generator is activated therespective AMiner instance reports every unparsedline to ÆCID Central. The lines are then analyzed toderive a new parser model, or to adapt an existing one.First, the parser model generator collects a certainnumber of lines; afterwards, it applies either a word-based clustering, similar to SLCT (Vaarandi, 2003),or a character-based clustering, similar to (Wurzen-berger et al., 2017), to group the log lines and detect

1

Client Firewall Web Server Database Server

2 3

456

Figure 4: The figure shows the log-in process to an online shop: (1) the client tries to log into an online shop on a web server,(2) a connection through the firewall occurs, (3) the Web server checks credentials through a database query, (4) the databasequery returns some result, (5) a response through the firewall: access acceptance or denial, (6) client receives the response.

static and variable parts of the log lines. Based onthe obtained alignments that describe the derived logline clusters, and considering the information aboutthe log line segments occurring as variable parts, thegraph describing the log line model can be built. Oncethe parser reaches a certain stability level, i.e. whenthe process is repeated for a new set of obtained loglines, and the parser does not need to be adapted any-more, ÆCID Central terminates the parser model gen-eration process. Finally, the generated parser is for-warded to the respective AMiner instance, and theAMiner starts filtering out unknown log lines and gen-erating alerts.

The parser model generator is also invoked toadapt an existing log line model. New paths can beadded to the graph describing a parser model, listssuch as FixedWordlistDataModelElementscan be adapted, or alphabets such asVariableByteDataModelElements can be up-dated.

4.2.3 Rule Generator

Another self-learning function provided by ÆCID isthe rule generation. Similarly to the parser modelgenerator, the rule generator can be activated for aspecific AMiner instance via the control interface.Once enabled, the selected AMiner instance will for-ward the parsed lines to the rule generator running onÆCID Central. Based on the parsed values the rulegenerator creates candidates for rules. It defines listsor intervals of values that are allowed to occur in aspecific path, or it defines rules enforcing that valuesof different paths are only allowed to occur in spe-cific combinations, e.g., specific usernames are onlyallowed to occur in combination with specific IP orMAC addresses.

Once the rule generator defines a rule candidate,the candidate has to be verified. One way to accom-plish this is to run a binomial test as shown in (Fried-berg et al., 2015), to evaluate if a tested rule candidateis stable or not. Once a rule candidate is verified andtherefore considered stable, the rule generator pushesit to the AMiner, which includes it into its rule set.

4.2.4 Correlation Engine

The correlation engine implemented in ÆCID Centralallows to detect network-wide anomalies by correlat-ing events observed by different AMiner instances.This makes it possible to detect deviations withincomplex processes that involve different services, andtherefore produce log messages on components mon-itored by different AMiner instances. Consider theexample depicted in Fig. 4; it shows the normal ac-cess chain of the log-in procedure to a Web-shop.If a user logs into the Web-shop, certain log lineswill be produced in a specific order by the firewall,the Web-server, the database server and the Web-server again, including specific values for the pathsof the parser. ÆCID is able to identify such event se-quences, and to generate corresponding models. Thisallows ÆCID to automatically derive relevant corre-lation rules and verify through them if the systembehavior is aligned with the generated model. AllAMiner instances monitoring the services involved inthe process will, in fact, forward to ÆCID Centralthe log events for which the correlation rule is beingevaluated (even if they are not individually consideredanomalous). ÆCID Central will then analyze the se-quence of events collected from the different AMinerinstances and verify their alignment to the model. Ifthe illustrative access chain mentioned above is vio-lated, because the database is accessed without a pre-vious access to the Web-server (this could be due to anSQL-Injection), ÆCID Central will identify an incon-sistency and therefore trigger an alarm. This is an ex-ample that demonstrate how ÆCID does not only de-tect anomalies that manifest in deviations of the nor-mal system behavior of a single device, but also com-plex anomalies that can only be detected when analyz-ing events occurring in diverse nodes of the network.It is important to notice that the correlation rules canalso be applied to a single AMiner instance to analyzecomplex processes running on one single node.

ÆCID CENTRAL

AMiner Instance #1

Up

da

tes

(Parser Models

Rule Set)

Un

par

sed

Lo

g lin

es +R

epor

t

Alerts & Alarms

ÆCID CENTRAL

AMiner Instance #1

Parser Mo

dels+

Rule Set

Un

par

sed

Lo

g lin

es

ÆCID CENTRAL

AMiner Instance #1

Un

pars

ed L

og

lines

Step 1 Step 2 Step 3

Figure 5: ÆCID Initialization Process.

5 System Deployment and Operation

This section illustrates the different topologies inwhich ÆCID can be deployed within a network, andpresents the phases of its operational work-flow.

The simplest way to deploy ÆCID is by employ-ing only one AMiner instance installed on a singlenode. This setup allows to exclusively monitor eventsproduced and logged by that single component. Al-ternatively, if a log collection mechanism is in placeon the node, which acquires log messages from otherremote nodes in the network (a la Syslog server), theAMiner can work in a centralized fashion, and an-alyze events occurring on distributed systems. Thedrawback of this configuration is that the parser mod-els, as well as the set of rules utilized by the AMinerfor the detection of anomalies, are statically definedand need to be manually configured. The lack ofan intelligent component (ÆCID central) implies, infact, that the administrators have to: i) define theparser model describing the structure of the eventsbeing logged by the monitored node, and ii) have athorough understanding of every event (occurring onevery monitored node) that is to be consider legiti-mate, and instantiate a corresponding set of withe-listing rules. This solution is applicable in case ofsmall-scale systems, whose computational power isnot sufficient to run ÆCID central, which performhighly recurrent operations, and are therefore simpleto characterize manually.

If the infrastructure to be monitored comprisesa large number of distributed nodes, with little re-sources and small computational power at their dis-

posal, ÆCID can be deployed following a star topol-ogy. In this setup, an AMiner instance is installed onevery distributed node, while a more powerful nodehosts ÆCID Central. Every AMiner is connected toÆCID Central and exchanges with it information re-garding parsed lines and discovered anomalies.

Figure 5 illustrates the three stages required toinitialize the ÆCID system when deployed in thistopology. When the AMiner instances are installedfor the first time on the nodes, they are not able toparse any of the log lines generated on the node, be-cause no parser model is defined yet; thus, they for-ward every unparsed line to ÆCID Central, whichlearns the structure of the log messages received fromeach node and automatically builds a dedicated parsermodel for each service generating logs on each node.Along with the parser models, ÆCID Central buildsthe behavioral model of the services running on ev-ery connected node, and generates a correspondingset of white-listing rules per node, which describe themodel, i.e., the normal behavior.

In the second step, ÆCID Central forwards thederived parser models and rule-sets to the respectiveAMiner instances; now the AMiner instances can fol-low the received parser models, analyze the incom-ing log messages and identify any suspicious eventby checking the adherence to the obtained rules.

The third step represents the fully operational sys-tem; the AMiner instances continue sending any un-parsed log line to ÆCID Central for further inspec-tion, and receive from ÆCID Central updates on theenabled parser models and set of rules. If correlationrules are defined, ÆCID Central (through its corre-

lation engine) analyzes the relevant log messages in-teracting with the involved AMiner instances, as de-scribed in the previous section.

Whenever an anomaly is revealed, the AMiner in-stance generates a notification report and, if neces-sary, sends it by email to a pre-configured list of re-cipients. Alerts and alarms are also reported to ÆCIDCentral, where they can be aggregated and visualized.

Finally, in case sufficient resources and computa-tional power are available on a single node, ÆCIDcan be deployed in its full-fledged setup on a stand-alone machine. In this scenario, ÆCID Central andthe AMiner instance operate on the same node. Theadvantage of this topology, compared to the first one,lies in the fact that the administrators do not need tomanually determine the parser models nor to definethe set of rules, because ÆCID Central automaticallygenerates them following the three-step approach de-scribed above and depicted in Figure 5.

6 Application Scenarios

The multi-layer light-weight detection approachpresented in this paper introduces a number of bene-fits that make its adoption attractive for a series of ap-plication scenarios. This section explores some of themost promising use cases, in which the employmentof ÆCID, stand-alone or in conjunction with other se-curity solutions, would be highly advantageous.

As extensively illustrated in the previous sections,ÆCID can be adopted to analyze events logged bysystems running on different layers of the OSI model.If applied on network traffic information, ÆCID canidentify and keep track of the communications estab-lished between different systems in the network, andperform a network interaction graph analysis. Ob-serving which network nodes interact with each-otherat which frequency, would allow ÆCID to build a“communication-behavior” model, and to promptlyidentify any divergence from such a model, whichcould indicate internal or external malicious attemptsto access network systems.

Similarly, ÆCID could be employed to analyzeevents recorded on the application layer, particularlywhen users authenticate on the numerous services de-ployed in an enterprise network. Authentication inter-action graph analysis can be performed using ÆCID,to monitor which users authenticate on which serviceswith which frequency. The “authentication-behavior”model established by ÆCID in this scenario wouldallow to reveal any unusual authentication attempt,pinpointing potential intrusions, illegitimate access tocritical resources, or erroneous authentication.

Moreover, ÆCID could serve as additional secu-rity layer, besides signature-based and other black-listing solutions, such as firewalls. In this setupÆCID would improve the overall detection capabil-ity by allowing the identification of previously un-known threats, and the verification of suspicious trig-gered alerts. The false positive rate (FPR) as well asthe number of false negatives (FN) would decreaseeffectively and a higher level of security would beachieved. The alarms triggered by ÆCID could thenbe fed into a Security Information and Event Man-agement (SIEM) system, which would correlate themwith the events generated by other sensors.

A further promising application area is CPS. CPSof the future will be the backbone of Industry 4.0, andwill operate following a self-adaptation paradigm,which foresees that the components of a system arecapable of configuring, protecting and healing them-selves when certain internal and/or external condi-tions demand to (Musil et al., 2017). The process ofself-adaptation follows four principal phases: moni-tor, analysis, plan, execute. Critical events, observedin the systems (in the monitor phase) and opportunelyexamined (in the analysis phase), would trigger spe-cific changes in the system configuration (through theexecute phase), which would follow suitable adapta-tion policies (evaluated in the plan phase). In the con-text of ICT security this approach could allow CPSbeing targeted by security threats to timely identifythe indicators of an attack and swiftly react to containits effects, reducing its impact. In this scenario, em-ploying ÆCID in the monitoring and analysis phase,would be an asset. Thanks to their light-weight na-ture, AMiner instances could be installed on numer-ous low-power components deployed across the CPS,which would not be able to run any traditional IDS.By recording system events they would allow to havean accurate and comprehensive overview of the se-curity situation of the entire CPS in real-time. Theanomalies identified by ÆCID Central would then beevaluated and, in line with predefined security poli-cies, trigger configuration changes in the monitoredCPS, that would contain the effect of the detectedthreat.

Finally, the system behavior model built by ÆCIDCentral, makes it possible for ÆCID to identify criti-cal security events previously unseen, which may po-tentially indicate the occurrence of a zero-day attack.Integrating ÆCID with a threat intelligence (TI) man-agement system would allow security operation cen-ters to greatly improve their incident handling capa-bility. Indicators of compromise (IoC), obtained byinspecting the anomalies revealed by ÆCID, could infact be combined and correlated with the intelligence

gathered from multiple data sources by TI manage-ment solutions (such as the tool proposed in (Settanniet al., 2017)). This correlation is fundamental to in-terpret the IoCs, confirm the occurrence of an attack,and identify possible mitigation strategies. Addition-ally, this integrated framework would allow to dynam-ically reconfigure any deployed monitoring system inorder to center their focus towards those critical as-sets vulnerable to the discovered threat. Eventually,this solution would also support the definition of at-tack signatures, which would be used to update anyblack-listing security solutions deployed in the infras-tructure, and guarantee a higher level of protection.

7 Conclusion and Outlook

New light-weight forms of IDS, capable of auto-matically comprehending and dynamically adapt tothe behavior of large-scale distributed systems, area promising means to tackle today’s advanced secu-rity threats. In this paper, we discussed the designprinciples a modern anomaly detection system shouldfollow to be effective. Furthermore, we introducedthe ÆCID approach, a light-weight detection mech-anism based on self-learning log parser models. Wepresented ÆCID’s system architecture, and we out-lined the advantages this approach provides in termsof detection effectiveness, deployment flexibility, andapplicability. Several use case scenarios demonstratehow ÆCID would greatly improve the security pos-ture of modern organization, allowing them to achievea higher degree of protection against advanced cyberthreats.

Parts of ÆCID are open source and are alreadyincluded in Debian Linux and Ubuntu; future workincludes the implementation of the system in opera-tional controlled environment, its validation in differ-ent complex application scenarios, and the evaluationof its performance on large-scale distributed systems.

ACKNOWLEDGEMENTS

This work was partly funded by the FFG project syn-ERGY (855457), European research project SemI40(ECSEL joint undertaking nr. 692466) and carriedout in course of a PhD thesis at the Vienna Univer-sity of Technology funded by the FFG project BAESE(852301).

REFERENCES

Atzori, L., Iera, A., and Morabito, G. (2010). The internetof things: A survey. Computer networks, 54(15).

Friedberg, I., Skopik, F., Settanni, G., and Fiedler, R.(2015). Combating advanced persistent threats: Fromnetwork event correlation to incident detection. Com-puters & Security, 48:35–57.

Garcıa-Teodoro, P., Dıaz-Verdejo, J., Macia-Fernandez, G.,and Vazquez, E. (2009). Anomaly-based networkintrusion detection: Techniques, systems and chal-lenges. Computers & Security, 28(1-2):18–28.

Goldstein, M. and Uchida, S. (2016). A Comparative Evalu-ation of Unsupervised Anomaly Detection Algorithmsfor Multivariate Data. PloS one, 11(4):e0152173.

James P. Anderson (1980). Computer security threat moni-toring and surveillance. Technical Report 17, James P.Anderson Company, Fort Washington, Pennsylvania.

Liao, H.-J., Richard Lin, C.-H., Lin, Y.-C., and Tung, K.-Y.(2013). Intrusion detection system: A comprehensivereview. Journal of Network and Computer Applica-tions, 36(1):16–24.

Mitchell, R. and Chen, I.-R. (2014). A survey of intrusiondetection techniques for cyber-physical systems. ACMComputing Surveys (CSUR), 46(4):55.

Musil, A., Musil, J., Weyns, D., Bures, T., Muccini, H.,and Sharaf, M. (2017). Patterns for self-adaptationin cyber-physical systems. In Multi-DisciplinaryEngineering for Cyber-Physical Production Systems,pages 331–368. Springer.

Sabahi, F. and Movaghar, A. (2008). Intrusion Detection:A Survey. pages 23–26. IEEE.

Scarfone, K. and Mell, P. (2007). Guide to intrusion de-tection and prevention systems (idps). NIST specialpublication, 800(2007):94.

Settanni, G., Shovgenya, Y., Skopik, F., Graf, R., Wurzen-berger, M., and Fiedler, R. (2017). Acquiring cyberthreat intelligence through security information corre-lation. In Cybernetics (CYBCONF), 2017 3rd IEEEInternational Conference on, pages 1–7. IEEE.

Symantec (2016). ISTR Internet Security Threat Report.Technical Report 21, Symantec Corporation.

Vaarandi, R. (2003). A data clustering algorithm for miningpatterns from event logs. In IP Operations & Man-agement, 2003.(IPOM 2003). 3rd IEEE Workshop on,pages 119–126. IEEE.

Vacca, J. R. (2013). Managing information security. Else-vier.

Verizon (2016). 2016 Data Breach Investigations Report.Technical report, Verizon.

Whitman, M. E. and Mattord, H. J. (2012). Principles ofinformation security. Course Technology, CengageLearning, Stamford, Conn., 4. ed., international ededition. OCLC: 930764051.

Wurzenberger, M., Skopik, F., Landauer, M., Greitbauer, P.,Fiedler, R., and Kastner, W. (2017). Incremental clus-tering for semi-supervised anomaly detection appliedon log data. In Proceedings of the 12th InternationalConference on Availability, Reliability and Security,page 31. ACM.

AECID: A Self-learning Anomaly Detection Approach Based on ... · bility of anomaly detection...

Documents

Transcript of AECID: A Self-learning Anomaly Detection Approach Based on ... · bility of anomaly detection...