Finding the needle in the haystack with ELK
Elasticsearch for Incident Handlers and Forensic Analysts
Whoami
S Working for the Belgian Government my own company
S Incident Handling
S Malware analysis
S Forensics (network + system)
S Open Source minded
S Creator of MISP – Malware Information Sharing Platform
S Creator of pystemon – pastebin monitoring tool
S Core organizer of the FOSDEM conference for many years
S Contact me: [email protected]
image by James Lumb
What tools do you use?
S Text logs
S notepad
S Grep
S awk / sed / cut
S MS Excel / OOo Calc
image by velorichard.wordpress.com
Optimizing
S grep -F PATTERN log.txt
S zgrep -F PATTERN log.txt
S zgrep -f patterns.txt -F log.txt
S find "$LOGS_DIR" -iname "*.gz" -print0 | parallel --gnu -0 -n1 -P8 zgrep -f patterns.txt -F > result-all.txt
S Fast for a single search, but no column lookup!
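The fixed-string search above can also be approximated in Python when a script is easier to extend than a shell pipeline. This is a minimal sketch, not a replacement for zgrep: the function name `search_gz_logs` is hypothetical, and unlike `find -iname` the file match is case-sensitive on most Linux filesystems.

```python
import gzip
from pathlib import Path


def search_gz_logs(log_dir, patterns):
    """Yield (file name, line) for every line containing any fixed-string pattern."""
    for path in sorted(Path(log_dir).rglob("*.gz")):
        # errors="replace" keeps the scan going on logs with odd encodings
        with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
            for line in handle:
                if any(pattern in line for pattern in patterns):
                    yield path.name, line.rstrip("\n")
```

Like grep, this only tells you that a line matched somewhere; it still has no notion of columns.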
Optimizing
S MySQL / MS Access
S Splunk
S free = 500 MB/day
S ELSA – Enterprise Log Search and Archive
S Limitation of the # of columns
S ${COMMERCIAL_TOOL}
Trick for Splunk Addicts
S Limit is 500 MB /day
S 3 license violations allowed per month
S Set the date to 00:01 AM
S Index as much as possible 24h/day for 3 days (while loops are your friend)
S Enjoy searching
Trick for all = ELK
S Elasticsearch Logstash Kibana
S Index as much as you want
S No limit on volume, speed or position of the moon
S Open Source, Free to use, commercial support
Configurations
S https://github.com/cvandeplas/ELK-forensics
S Repository with Logstash and Kibana configurations
S Mactime, BlueCoat, Mail IMSS, IWSVA, IIS, SuperTimeline, Plaso, …
S http://christophe.vandeplas.com/2014/06/setting-up-single-node-elk-in-20-minutes.html
S Our focus today: S Forensics and Incident Handling
S Batch-Import
How does it work?
Inputs
S Inputs & codecs
S Inputs: collectd, drupal_dblog, elasticsearch, eventlog, exec, file, ganglia, gelf, gemfire, generator, graphite, heroku, imap, invalid_input, irc, jmx, log4j, lumberjack, pipe, puppet_facter, rabbitmq, rackspace, redis, relp, s3, snmptrap, sqlite, sqs, stdin, stomp, syslog, tcp, twitter, udp, unix, varnishlog, websocket, wmi, xmpp, zenoss, zeromq
S Codecs: cloudtrail, collectd, compress_spooler, dots, edn, edn_lines, fluent, graphite, json, json_lines, json_spooler, line, msgpack, multiline, netflow, noop, oldlogstashjson, plain, rubydebug, spool
S Outputs
S Filters
Input Example
S I usually don’t use “file” as input
S Keeps a reference to the position in the file
S TCP socket is the easiest for me
S ncat log01.lab.internal 18001 < logfile.log
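When ncat is not available on the analysis machine, the same batch import over a TCP socket can be sketched in Python. This assumes a Logstash tcp input is already listening on the target port; the function name `send_log` and the chunk size are illustrative choices, not part of the original setup.

```python
import socket


def send_log(path, host, port, chunk_size=65536):
    """Stream a log file to a TCP listener and return the number of bytes sent."""
    sent = 0
    with socket.create_connection((host, port)) as conn, open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            conn.sendall(chunk)
            sent += len(chunk)
    return sent
```

With the host and port from the slide, the call would be `send_log("logfile.log", "log01.lab.internal", 18001)`.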
Outputs
S Inputs & codecs
S Outputs: boundary, circonus, cloudwatch, csv, datadog, datadog_metrics, elasticsearch, elasticsearch_http, elasticsearch_river, email, exec, file, ganglia, gelf, gemfire, google_bigquery, google_cloud_storage, graphite, graphtastic, hipchat, http, irc, jira, juggernaut, librato, loggly, lumberjack, metriccatcher, mongodb, nagios, nagios_nsca, null, opentsdb, pagerduty, pipe, rabbitmq, rackspace, redis, redmine, riak, riemann, s3, sns, solr_http, sqs, statsd, stdout, stomp, syslog, tcp, udp, websocket, xmpp, zabbix, zeromq
S Filters
Output Example
Filters
S Inputs & codecs
S Outputs
S Filters: advisor, alter, anonymize, checksum, cidr, cipher, clone, collate, csv, date, dns, drop, elapsed, elasticsearch, environment, extractnumbers, fingerprint, gelfify, geoip, grep, grok, grokdiscovery, i18n, json, json_encode, kv, metaevent, metrics, multiline, mutate, noop, prune, punct, railsparallelrequest, range, ruby, sleep, split, sumnumbers, syslog_pri, throttle, translate, unique, urldecode, useragent, uuid, wms, wmts, xml, zeromq
Filter Example
Grok
S Named regular expressions to match patterns/extract data.
S Logstash ships with lots of patterns: https://github.com/elasticsearch/logstash/tree/master/patterns
S Test app: http://grokdebug.herokuapp.com
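The idea behind grok, named regular expressions that pull fields out of a line, can be illustrated with Python's `re` module. This is only an analogy: the patterns below are simplified stand-ins written for this example, not the patterns Logstash ships, and the log line format is hypothetical.

```python
import re

# Simplified, hypothetical stand-ins for grok-style named patterns
IP = r"(?P<clientip>\d{1,3}(?:\.\d{1,3}){3})"
VERB = r"(?P<verb>[A-Z]+)"
REQUEST = r"(?P<request>\S+)"
STATUS = r"(?P<status>\d{3})"

LINE = re.compile(rf"^{IP} {VERB} {REQUEST} {STATUS}$")


def parse_line(line):
    """Return the named fields of a log line, or None when it does not match."""
    match = LINE.match(line)
    return match.groupdict() if match else None
```

Each named group becomes a field, which is the same effect a grok match has on a Logstash event.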
Testing complex Groks
Data Enrichment with Filters
S Extract fields: csv, grok, kv
S Extract date
S Modify using mutate
S Enrich with
S Geoip
S User-agent
S Urldecode
S Translate
S …
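What field extraction and URL decoding do to an event can be sketched in plain Python. This only mimics the spirit of the kv and urldecode filters; the function names, the `query` and `params` field names, and the exact decoding rules are assumptions made for this example, not Logstash behaviour.

```python
from urllib.parse import unquote_plus


def kv_extract(text, field_split="&", value_split="="):
    """Split 'a=1&b=2' style text into a dict, similar in spirit to the kv filter."""
    fields = {}
    for pair in text.split(field_split):
        if value_split in pair:
            key, value = pair.split(value_split, 1)
            fields[key] = value
    return fields


def enrich(event):
    """Return a copy of the event with URL-decoded query parameters added."""
    enriched = dict(event)
    params = kv_extract(event.get("query", ""))
    enriched["params"] = {key: unquote_plus(value) for key, value in params.items()}
    return enriched
```

The original `query` field is kept untouched, so the raw evidence stays searchable next to the decoded values.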
Geoip
User-Agent
Translate
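A translate-style enrichment is essentially a dictionary lookup that writes its result into a new field. The sketch below shows that idea in Python; the lookup table, the field names and the `translate` helper are hypothetical and do not reproduce the Logstash filter's options.

```python
# Hypothetical lookup table: HTTP status code -> human-readable label
STATUS_LABELS = {
    "200": "OK",
    "403": "Forbidden",
    "404": "Not Found",
}


def translate(event, field, destination, dictionary, fallback="unknown"):
    """Return a copy of the event with `destination` set from a lookup on `field`."""
    enriched = dict(event)
    enriched[destination] = dictionary.get(str(event.get(field)), fallback)
    return enriched
```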
Ruby as last resort
* There might be a better way to do this, but ruby and I are not really friends yet
Elasticsearch
S Wikipedia: Elasticsearch is a search server based on Lucene. It provides a distributed, multitenant-capable full-text search engine with a RESTful web interface and schema-free JSON documents. Elasticsearch is developed in Java and is released as open source under the terms of the Apache License.
S Very very fast
S Adding a node = easier than extremely easy
Elasticsearch
S Be cautious
S No security by default
S Auto-discovery, auto-distribution if other node is present
S Elastic HQ plugin
S cd /usr/share/elasticsearch/bin
S ./plugin -install royrusso/elasticsearch-HQ
Kibana
S Fancy GUI
S Extremely easy to build up a dashboard
S Gives a good overview of the data
S Powerful, but limited in capability
S For more: write a python script or use REST API
DO NOT PRESS
THIS BUTTON
Search syntax
S Apache Lucene Search syntax
S title:foo
S title:"foo bar"
S title:"foo bar" AND body:"quick fox"
S (title:"foo bar" AND body:"quick fox") OR title:fox
S title:foo -title:bar
S title:foo*bar
S time_taken:[10000 TO 999999999]
http://www.lucenetutorial.com/lucene-query-syntax.html
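The same Lucene syntax can be used outside Kibana by wrapping it in a `query_string` query and sending it to an index's `_search` endpoint. The sketch below only builds the request body; the helper name `lucene_query` and the `size` default are choices made for this example.

```python
import json


def lucene_query(query, size=10):
    """Wrap a Lucene query string in an Elasticsearch query_string search body."""
    return {"size": size, "query": {"query_string": {"query": query}}}


body = lucene_query('title:"foo bar" AND time_taken:[10000 TO 999999999]')
print(json.dumps(body, indent=2))
```

The printed JSON can then be POSTed with curl or any HTTP client from a script.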
Load dashboards
Filter
Performance
Performance goals
S Focus Incident Handling and Forensics
S Max speed of indexing
S Max speed of searching
S Searching may be slow while indexing is running
S No need for redundancy
S So don’t apply this advice to live production operations
Performance Logstash
S Memory setting (/etc/default/logstash)
S LS_HEAP_SIZE="500m"
S Command line flag
S -w or --filterworkers AMOUNT_OF_CORES (default: 1)
S Each extra filter slows it down
S Grok aka regex = slow
S Prefer csv, kv
S Use as few wildcards (* or +) as possible
S Geoip = slow but very practical
S User-agent = slow, often practical
Performance Elasticsearch
S Memory setting (/etc/default/elasticsearch)
S ES_HEAP_SIZE=12g => set to half of RAM (max 32 GB)
S Disable redundancy (/etc/elasticsearch/elasticsearch.yml)
S index.number_of_replicas: 0
S Shards for number of nodes (/etc/elasticsearch/elasticsearch.yml)
S index.number_of_shards: 1
S Increase memory buffer for indexing
S indices.memory.index_buffer_size: 50%
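The heap rule of thumb (half of RAM, capped at 32 GB) is simple arithmetic, shown here as a small helper. The function name is hypothetical, and the 12g value on the slide is assumed to correspond to a machine with 24 GB of RAM.

```python
def recommended_heap_gb(ram_gb, cap_gb=32):
    """Half of the machine's RAM, capped at cap_gb, as an ES_HEAP_SIZE-style value."""
    if ram_gb < 2:
        raise ValueError("this rule of thumb needs at least 2 GB of RAM")
    return f"{min(ram_gb // 2, cap_gb)}g"
```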
Perf. Elasticsearch Indexes
S Open index = memory usage + disk usage
S Closed index = disk usage only, so close an index when it is not needed
S New indexes per case
S Similar logs go in the same index, but use a field “host” to differentiate investigations
S system timelines: logstash-%{[case]}-%{[type]}
S mail logs: logstash-%{[case]}-%{[type]}-%{+YYYY.MM}
S proxy logs: logstash-%{[case]}-%{[type]}-%{+YYYY.MM.dd}
S curl -XPOST "localhost:9200/logstash-${case}*/_close"
S curl -XPOST "localhost:9200/logstash-${case}*/_open"
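The per-case naming scheme and the open/close calls can be scripted together. This is a sketch under assumptions: the helper names and the `granularity` argument are invented for this example, `base_url` assumes a local single-node setup, and newer Elasticsearch versions may restrict wildcard close requests.

```python
from datetime import date
from urllib.request import Request, urlopen


def index_name(case, log_type, day=None, granularity=None):
    """Build logstash-<case>-<type>[-<date>] following the naming scheme above."""
    name = f"logstash-{case}-{log_type}"
    if granularity == "month":
        name += day.strftime("-%Y.%m")
    elif granularity == "day":
        name += day.strftime("-%Y.%m.%d")
    return name


def set_case_state(case, action, base_url="http://localhost:9200"):
    """POST _open or _close for every index of a case; returns the HTTP status."""
    if action not in ("open", "close"):
        raise ValueError("action must be 'open' or 'close'")
    request = Request(f"{base_url}/logstash-{case}*/_{action}", data=b"", method="POST")
    with urlopen(request) as response:
        return response.status
```

Closing a finished case then becomes `set_case_state("case42", "close")`, where `case42` is a made-up case name.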
Performance Kibana
S Each block/graph is extra search
S So 10 graphs equals 10 simultaneous searches
1. First select small date/time window
2. Test your search on small data set
3. Add filters
4. Zoom out on date/time
5. Dig deeper
Keep in mind
S Logstash is (relatively) SLOW
S Finished? Close the index, do NOT delete it
S Or save the events as JSON files (Logstash file output plugin) and re-index them later
S Node++ = Speed++
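Re-indexing saved JSON events does not have to go through Logstash again; they can be turned into a body for the Elasticsearch `_bulk` endpoint. The sketch below only builds that body. The helper name is hypothetical, and Elasticsearch versions from the era of this talk also expect a document type (for example in the request URL), which is left out here.

```python
import json


def bulk_body(json_lines, index):
    """Turn saved JSON events (one per line) into an Elasticsearch _bulk request body."""
    parts = []
    for line in json_lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)  # fails early on corrupt lines
        parts.append(json.dumps({"index": {"_index": index}}))
        parts.append(json.dumps(event))
    # The bulk format is newline-delimited and must end with a newline
    return "\n".join(parts) + "\n"
```

Because the events are already parsed and enriched, this path skips the slow filter stage entirely.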
Forensic analysis
Plaso
S Plaso = the new log2timeline and more
S log2timeline.py win7-64-nfury-10.3.58.6.dump /path/to/disk/image
S psort.py -o elastic win7-64-nfury-10.3.58.6.dump
ELK-forensics
S https://github.com/cvandeplas/ELK-forensics
S Logstash configs
S Kibana dashboards
S Mactime, Log2timeline csv, BlueCoat, Mail IMSS, IWSVA, IIS
S More to come
Other interesting projects using Elasticsearch
S Moloch – Open Source large scale IPv4 full PCAP capturing, indexing and database system. https://github.com/aol/moloch
S Mozdef – PoC – automate IH process and facilitate real-time activities - https://github.com/jeffbryner/MozDef
S Suricata – Exports data in EVE format (JSON). Great to visualize malware activity from sandbox
Places to be?
S https://github.com/cvandeplas/ELK-forensics
S http://www.elasticsearch.org/overview/elkdownloads/
S http://logstash.net/
S https://groups.google.com/forum/#!forum/logstash-users