Post on 09-Apr-2018
Self Learning NetworksJeff Apcar, Distinguished Services Engineer
BRKRST-2122
(Acknowledgements to JP Vasseur, PHD, Cisco Fellow and the SLN Team)
• What is a Self Learning Network?
• Security Trends In Advanced Malware
• Anomaly Detection
• SLN Architecture
• Graph Based Visibility
• The SLN User Interface
• Conclusion
Agenda
• SLN is fundamentally a hyper-distributed analytics platform
• Putting together analytics and networking
• The network learns without a human in the loop
• We have developed many sophisticated network functions
• But no real learning incorporated
• True technology disruption & unique competitive landscape ...
What Is Self Learning About?
It started with IoT Network Use CaseHarsh environments: instability of links, limited bandwidth, constrained nodes, stochastic networks (random probability distribution)
Still need for some determinism, tight SLA and hyper-scale networks
And the network needs to be adaptive: every single network is different !
From this SLN was incubatedDeployed IoT network, 800 nodes, April 2013
Some Deployment Scenarios
Predict network behavior and traffic patterns based on multivariable and time-based modeling
Automatically select and optimize network path in real-time, adapt QoS, based on Business SLAs
IWAN Path
Optimization
Detect of multi-layer subtle DoS attacks andAnomaly Detection
Auto learn new threats
Massively Distributed, Global, real-time protection
SLNInternet Behavioural
Analytics for Security
Predictive models for large scale networks, enable:
• High performance
• High Resiliency
• Detection of disruptive subtle DDoS attacks
IoT/IoE
• Fundamentally distributed, building models for visibility and detection at edge
• Mix of Machine Learning (ML) and Threat Intelligence
• Enrichment of context (using Identify Services Engine (ISE) integration)
• Ability to adapt to user feed-back (Reinforcement Learning)
• Advanced control handling networking complexity
SLN Architecture Principles For Security
• Multi-layered defense architectures no longer sufficient to prevent breaches caused by advanced malware ...
• No longer a question of “if” or “when” but “where” ...
• Many of the well-known assumptions are no longer true
• eg. Attacks come from the outside, deterministic, well understood
• Attacks are more and more “subtle” (Hard to detect ...)
• Signature-based architectures vulnerable to mutating attacks (polymorphic)
• Dramatic increase of the number of 0-day attacks
Why Predictive Analytics?
• Why learning ?
• The network is truly adaptive thanks to advanced analytics
• Why a paradigm shift ?
• Move from Trial-and-Error model to a proactive approach using models built using advanced analytics
• The hard part is not just the “analytics” but the underlying architecture for self-learning and the “how to”
What Is a Self Learning Network (SLN)?
ReconAttack
Delivery
Redirect
Exploit
Kit
Dropper
File
Comm &
Control
(C2)
Internal
Recon
Data
Exfiltration
Kill Chain Model For Information Security*
*Adapted by Lockheed Martin to model intrusions on a computer network
Research of targets, vulnerabilities, port scanning… Phishing, Spear Phishing, Watering Hole
Safe looking sites containing exploits
After “clicking”, software scans for
0-Day and known vulnerabilities
Malware gets access to target
Uses sophisticated evasive techniques
Determines environment (human/sandbox)
Connection between computer and attacker
Receive commands, download programs
Many C2 techniques can be used, fast flux etc…
Attempt to spread across other
systems and retrieve new credentials
Overt and convert channels to transfer
Data. Complex mechanisms used like
steganography, hiding inside protocols1 2
3
4
56
7
8
• Botnet groups of computers that run malware unknown to owners
• Ranges from thousands to millions of compromised hosts
• Exfiltration is unauthorised information transfer out of target network
• C&C (C2) servers become increasingly evasive
• Fast Flux Service Networks (FFSN), single or double Flux
• DGA-based malware (Domain Generation Algorithms)
• DNS Tunneling, P2P protocols, Anonymised services (Tor)
• Steganography, potentially combined with Cryptography
• Social media updates or email messages
• Mixed protocols , Timing Channels
Botnets & Exfiltration Techniques
C&C Server(s)
Distributed Denial-of-Service Attacks
C&C
Direct Attack
Originates from compromised
hosts or directly from attacker
Send spoofed request to a
server, response to target
Small spoofed request,
produces large reply
Reflection Attack Amplification Attack
Target
Botnet
C&C
Target
Botnet
Attacks can be a anyone or a combination of the above
C&C
Target
NTP
Server
MONLIST?
Attacker Attacker Attacker
Reflectors
• Infected host queries DNS for C&C server FQDN
• Hardcoded or from Domain Generation Algorithm (DGA)
• Authoritative C&C DNS controlled by Botnet master
• DNS reply has very short TTL (a few minutes)
• Botnet members used as C&C relays
• Cycles very quickly through C&C relay hosts due to TTL
• Rapidly-changing, optimised set of C&C endpoints
• Greatly reduces possibility of C&C server takedown
• Still possible to take down the C&C DNS server(s)
Example: Fast Flux DNS - Single Flux
Query: “botnet.tld”
“ns.botnet.tld”
Root DNS
Query: “cc.botnet.tld”
Fixed
C&C DNS
“10.1.1.1”
Query: “cc.botnet.tld”
“10.2.2.2”
TTL Timeout…
C&C Relay
(10.1.1.1)C&C Relay
(10.2.2.2)
Actual
C&CBotnet
Master
Example: Fast Flux DNS – Double Flux
Query: “56W-jes.tld”
“10.3.3.3”
Root DNS
Query: “cc.56W-jes.tld”
Actual C&C
DNS
“10.1.1.1”
“10.2.2.2”
TTL Timeout…
C&C Relay
(10.1.1.1)C&C Relay
(10.2.2.2)
Actual
C&CBotnet
Master
DNS Relay
(10.3.3.3)
DNS Relay
(10.4.4.4)
• Double flux adds changing DNS for C&C domain
• DNS Botnet Master updates low-TTL NS entries
• Through permissive DNS registrar
• Some botnet members are DNS relays, others are C&C relays
• Massively complicates botnet takedown
• DGA introduces further complexity
Generated
Domain
Query: “cc.56W-jes.tld”
Domains
C&C
Relay
DNS
Relay
Anomalies are patterns/data points that do not conform to expected behaviour
In algorithmic terms, they significant statistical deviations
Model what is normal behaviour
Normal behaviour changes over time and anomalies evolve (adaptive models required)
Recognising malicious behaviourthat adapts to look “normal”
Differentiate noise from anomalies
SLN allows detection of previously unknown malware complimentary to firewalls & IDS
SLN Anomaly Detection
Challenges
Collective AnomaliesA collection of an instance that is
abnormal in regards to an entire set
Categories Of Anomalies
Point AnomalyA specific instance is abnormal compared to other instances
Contextual AnomalyA data instance looks normal individually, but abnormal when taken in a specific context
Visualisation
SLN Centralised
Agent (SCA)
Da
tace
nte
r a
nd
clo
ud
Priva
te/P
ub
lic
Ne
two
rk
Ne
two
rk E
dg
e
(LA
N, W
PA
N)
Controller
infrastructureSLN Architecture
SCA
Public/Private
Internet
DLADLA
DLA Distributed
Learning Agent
• Granular data collection with knowledge extraction from
various sources (DPI, Routing…) to build a model
• Analytics and learning
• Edge Control Architecture (ECA): autonomous embedded
control, fast close loop, advanced networking control (police,
shaper, recoloring, redirect, ....)
• Application hosted devices (SDN)
• Orchestration and interaction with
remote learning agents (DLA)
• Advanced Visualisation
• Centralised policy
• Interaction with other security
components (ISE, Threat Intelligence)
• DLA has been designed for low footprint both in terms of memory and CPU
• Feature computation, ID & classification are performed locally
• Lightweight techniques emplyed with no significant impact on the edge device
The DLA Can Have Many Data Sources
DLA
NBAR
Netflow
DPI
Data &
Control
Planes
Local
State
Adaptive Firewall w/ AMP
• Component to SLN
• Enhanced context, ML+Threat Intelligence
• Edge Control
SCA Context Enrichment
SCAFire
Power
DLA
Identity Services Engine
ISE
Advanced Malware ProtectionThreat
Grid
Advanced Malware ProtectionEdge
AMP
DNS/IP BlacklistsTalos
Feed
Feed for SLN
Edge Control
Reprogramming the network fabric (install new
rules…) + close loop feed-back.
Username, domain, location, time
Edge Control: shape, police, drop,
redirect, ... Reroute, VLAN, ...
Edge Learning: models normal
traffic, graph-based anomalies.
Trigger for traffic mitigation
Controller
infrastructure
On-Premise Edge Control
SCA
Public/Private
Internet
DLADLA
DLA
Control Policy
Smart Traffic flagging
According to {Severity, Confidence,
Anomaly_Score}
Traffic segregation & selection
Network-centric control (shaping, policing,
divert/redirect)
Honeypot
(Forensic Analysis)
DSCP ReWrite
CBWFQDSCP ReWrite
CBWFQ
Shaping
• Graph structure usage in SLN:• Infer information on the roles of hosts
• Detect unusual change in graph structure
• Capturing and modeling the dynamic of
the flows in the network
What Is Graph-Based Visibility?Vertices (Nodes, hosts, sites, disks, router, laptop…)
Edge (Comms links, flows…)
SLN Usage Examples
Vertices Clusters Group of devices
Physical: Country, City, Building, DC, Floor,
Logical: DC Function, Organisational Group
Edges Traffic Flows Applications, protocols, bandwidth, seasonality
• Lack of visibility of flows between remote sites (clusters) is an issue
• Difficult to detect Malware/data exfiltration involving lateral movement
Visualising Anomaly Detection Process
Cluster
A
Cluster
B
Application Flows
Type Frequency Behaviour
SLN Process
Netflow,
DPI, Local
state
Constuct
statistical
model of
normal traffic
Deviations to
the model
Establish likely
root cause of
deviation
Extract
measurable
characteristics
Red Graph: Anomaly
Green Graph: Shell
Blue Graph: VoIP
Grey Graph: FTP/SCP
Visualising Likely/Unlikely Flows
Sydney
Chicago
Raleigh
Data
Centre
San
Diego Dallas
Data
Centre
New
York
Beijing
Belgium
• Static Versus Dynamic cluster computation
• ML algorithms are used to computed inter-
cluster relationship
• Colored graphs
• Simple property of likelihood
DLA
Green Graph: Shell
Blue Graph: VoIP
Grey Graph: FTP/SCP
Visualising Seasonality
Sydney
Chicago
Raleigh
Data
Centre
San
Diego Dallas
Data
Centre
New
York
Beijing
Belgium
AM PM AM PM AM PM AM PM AM PM AM PM AM PM
22 23 24 25 26 27 28
AnomalyAnomaly
• ML computes models of seasonality per application
DLA
Grey Graph: FTP/SCP
Visualising Behavioural Analytics
Sydney
Chicago
Raleigh
Data
Centre
San
Diego Dallas
Data
Centre
New
York
A vector of properties (models of behavioural analytics)
DLA
Host
Application
What Are Features and Vectors?Features Measureable characteristics of network state
Vectors Highly dimensional spaces made up of features
Sample of Features for
Dest & Source
Number of flows
Unique IP addresses
Unique ports
Entropy of ports
Entropy of DNS names
Proportion of HTTP ports
Proportion of DNS ports
Number of bytes
Number of DNS requests
Number of DNS replies
For hosts or
applications
Feature ConstructorFeature Vector
(Many dimensions)
Application Categoriesnetwork
office
other
p2p
proxy
remote
shell
social
tls
tunneling
voip
windows
www
auth
cloud-apps
cloud-storage
dns-tunneling
darknet
dns
database
endpoint
file-xfer
game
icmp
im
malware
media
ssh
exec
login
shell
Secure-telnet
telnet
gridftp
winmx
sftp
uucp-path
bftp
ftp-data
nfs
ftp
ftp-agent
sift-uft
tftp
gopher
Host Anomalies Using Feature Vectors
Feature Value
HTTP Flows MANY
Neighbours MANY
Feature Value
HTTP Bytes FEW
Neighbours FEWFeature Value
HTTP Flows MANY
Neighbours FEW
Hosts whose behaviour do not match known models
(cannot be explained) may point to an anomaly
Use Case Scenarios
Sydney
Chicago
Raleigh
Data
Centre
San
Diego Dallas
Data
Centre
Beijing
Detection of unlikely flow (e.g., ftp) between two specific clusters
Identification of the username, location, device via ISE
Identification of known threats (AS, IP@, Domains) !!
Edge Mitigation
Scenario 1 – Unlikely flows
Detection of a change in DNS usage (e.g., volume, content of
DNS requests) not compliant with the learned model
Or fake VoIP, stealing bit headers, ...
Detection of seasonality changes ...
Identification of known threats (AS, IP@, Domains) !!
Edge Mitigation
Scenario 2 – Change in DNS usage
DLA
Reinforcement learning
SLN Targeted Outcomes For The User
Normal behaviours Anomalous Behaviours
Who talks to whom? Detection of new applications where never used
Active applications between clusters Detection of abnormal behaviours (data
exfiltration)
Applications displaying seasonal behaviours Adaption: Is abnormal event of interest?
Additional application characterisation Upon detecting anomalies; Explain why? What
has changed?
Contextual data; usernames, domains… Ability to perform advanced control
(Shape, route, redirect…)
SLN adapting to user expectations
Affected Internal Cluster
Suspicious External Cluster in Hungary
3 flows, DNS, NFS and SUNRPC
Spiritual health website, but no HTTP flow?
Unexpected Edge (Flow)
Port scan between branch and corporate host
Two instances of targeted port scan
(Portmapper)
(Printing)
(HTTPS)
(HTTP)
(PPTP)
(DCE/RPC)
(DNS)
(FTP)
(Kerberos)
(RDP)
(Telnet)
(RFB)
(SSH)
1
2
Many characteristics of a scan can trigger detection:
• Variety of ports and/or destinations (network sweep)
• Short flows (e.g. SYN, SYN/ACK, RST)
• Incomplete L7 handshake (no application PDUs exchanged)
• Unusual applications/protocols spoken by scanning host
• Unusually high rate of new flows
SLN does not look for network scans
specifically, but detects them because they are
different from a host’s usual behavior
Port Scan
Rapid TCP DNS Requests
Same host performing TCP DNS request in
parallel with normal UDP request
Normal UDP DNS requests from host
Behaviour similar to DGA based malware
evading detection via DNS over TCP
DPI detects entropy of names etc…
Repeated SSH Login Atempts
Large number of new flows anomaly
Many SSH attempts (Port 22), short flows (not getting past login)
Could indicate brute force attack
Anomalous Connection on port 2201 detected as SSH
SLN leverages NBAR to recognise protocols on non-
standard ports
Unusual # of SNMP queries
Unusual number of packets per flow
Normal SNMP, short in duration
1364 SNMP packets indicates change of
behaviour on host (snmp walk?)
• SLN is a disruptive approach for malware detection using behavioral analytics, relying on dynamic learning, fully auto-adaptive
• Network data is analysed locally by SLN using advanced and lightweight analytics
• The router can perform local mitigation
• Lightweight and distributed architecture that is scalable
• Visualisation is key with simple understandable UI
Summary
Complete Your Online Session Evaluation
Don’t forget: Cisco Live sessions will be available for viewing on-demand after the event at CiscoLiveAPAC.com
• Give us your feedback and receive a Cisco 2016 T-Shirt by completing the Overall Event Survey and 5 Session Evaluations.– Directly from your mobile device on the Cisco
Live Mobile App
– By visiting the Cisco Live Mobile Site (URL)
– Visit any Cisco Live Internet Station located throughout the venue
– T-Shirts can be collected in the World of Solutions on Friday from 12:00pm - 2:00pm