Splunk for cyber_threat



SOLUTIONS GUIDE

Splunk® for Cyber Threat Analysis
A Big Data Approach to Enterprise Security

Challenge of Discovering Known and Unknown Threats

In today’s cyber battlefield, a vast amount of information collected from the IT architecture is processed, aggregated and correlated to identify security incidents. This effort is largely a search for known threats, that is, incidents that have been pre-defined as security threats. The cyber analyst sets up behavioral rules that identify a given security incident and match it to an appropriate level of response. These rules are commonly present in the detection technology itself or may be implemented via a security information and event management (SIEM) technology.

From an enterprise security point of view, this methodology of aggregation and correlation is often targeted at the tier-1 data center level, which operates as the front-line defense of your IT security. The combination of human assets and technology falls under the broad term of CND (computer network defense) and has represented the baseline for all SecOps over the years.

While current technologies and methods are still somewhat effective in identifying breaches, attackers have changed their methodologies and have made the “what you know” proposition much more difficult to quantify. Compounding the issue is the explosion of unstructured data from increasingly complex technologies. This data often does not fit nicely into the structured world of SIEM, which can impose artificial restrictions on the collection of specific data types and provide little visibility into attack patterns and context.

In response to more sophisticated attacks, a new kind of cyber threat analyst has emerged, operating at the tier-3 level. This analyst functions as a “security intelligence analyst” and is often called upon to perform detailed analysis of a security incident. Rather than the point-in-time, predetermined analysis of the tier-1 analyst, the intelligence analyst must consider threats against a much larger pool of information, some machine generated and some human generated, over a significantly longer period of time. The unfortunate truth is that the pre-defined tools of the tier-1 analyst, which are designed to reduce the amount of data for analysis, are not suitable for the investigative needs of the security intelligence analyst.

A Big Data Approach to Discovering Unknown Threats

While Splunk can certainly address the tier-1 needs of reduction and correlation, Splunk was designed to support a new paradigm of data discovery. This shift rejects a data reduction strategy in favor of a data inclusion strategy. Data inclusion supports analysis of very large datasets through data indexing and MapReduce functionality pioneered by Google. As a result, Splunk can collect data from virtually any available data source without normalization at collection time and analyze security incidents using analytics and statistical analysis.
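The MapReduce idea mentioned above can be sketched in a few lines of Python. This is only an illustration of the general map-then-reduce pattern, not Splunk's implementation; the partition layout, the `src=` and `action=` event format, and the function names are all assumptions made for the example.

```python
from collections import Counter
from functools import reduce

# Hypothetical partitions of raw events, for example one per time range.
partitions = [
    ["src=10.0.0.5 action=deny", "src=10.0.0.9 action=allow"],
    ["src=10.0.0.5 action=deny", "src=10.0.0.5 action=allow"],
    ["src=10.0.0.9 action=deny"],
]

def map_partition(raw_events):
    """Map step: count events per source within a single partition."""
    counts = Counter()
    for raw in raw_events:
        # Fields are pulled from the raw text only when the question is asked.
        fields = dict(pair.split("=", 1) for pair in raw.split())
        counts[fields["src"]] += 1
    return counts

def reduce_counts(left, right):
    """Reduce step: merge two partial results into one."""
    return left + right

partials = [map_partition(p) for p in partitions]
totals = reduce(reduce_counts, partials, Counter())
print(totals.most_common())
```

Because each partition is summarized independently, the map step could run in parallel across many partitions, and only the small partial results need to be merged.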

Other Splunk functionality often leveraged for threat analysis includes:

Indexed data storage with automated field extraction. Splunk does not store data in a traditional schema-based row and column format: events are free to be interpreted as they are. This is especially important where an event presents ‘multi-value’ fields, that is, multiple values for the same field in the same event. This is common in data sources that track SMTP addresses, where the number of addresses in each event often varies. Using Splunk, each of these values would be extracted separately, regardless of how many the event contains.
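A minimal Python sketch of the multi-value idea follows. It is not Splunk's extraction engine: the key=value event format, the field names `from` and `to`, and the helper `extract_fields` are assumptions chosen for illustration.

```python
import re
from collections import defaultdict

# Hypothetical key=value pattern; real data sources would need their own rules.
FIELD_RE = re.compile(r'(\w+)=("[^"]*"|\S+)')

def extract_fields(raw_event):
    """Return {field: [values]} so that repeated fields keep every value."""
    fields = defaultdict(list)
    for key, value in FIELD_RE.findall(raw_event):
        fields[key].append(value.strip('"'))
    return dict(fields)

# One mail event with a variable number of recipients.
event = ('2012-03-01T10:15:00 action=delivered from=alice@example.com '
         'to=bob@example.com to=carol@example.com to=dave@example.com')
fields = extract_fields(event)
print(fields["to"])
```

A fixed row-and-column schema would need either one `to` column (losing recipients) or a separate table; keeping a list per field avoids deciding that in advance.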

Statistical analysis command language. Splunk offers a ‘search language’ rather than an SQL-style query language. While an SQL language is adequate for searching what you know (such as values in columns that are indexed), it is not adequate for handling ad-hoc queries, since it is a very structured language designed to blindly ‘dump’ the contents of a cell. In contrast, the Splunk search language offers much greater freedom in formulating questions on the fly, with a search-friendly interface that is focused more on acquiring answers than on formatting questions. Additionally, much of the search language is designed to manipulate the data, not just save it. For instance, the Splunk stats command can process a field any number of ways, such as averaging, first value, list, max, mean, mode, percentile, per-hour, range, standard deviation, sum and variance, just to name a few. The ability to ask nearly any conceivable question of the data rather than simply dumping the data is a key capability for threat analysis.
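To make the stats idea concrete, here is a small Python sketch of grouping events by one field and summarizing another. It is roughly the kind of question a search such as `... | stats count avg(bytes_out) max(bytes_out) by src_ip` would ask. The field names `src_ip` and `bytes_out`, the sample events and the helper `stats_by` are hypothetical.

```python
import statistics
from collections import defaultdict

def stats_by(events, value_field, by_field):
    """Group events by one field and summarize another, stats-style."""
    groups = defaultdict(list)
    for event in events:
        groups[event[by_field]].append(event[value_field])
    summary = {}
    for key, values in groups.items():
        summary[key] = {
            "count": len(values),
            "sum": sum(values),
            "avg": statistics.mean(values),
            "max": max(values),
            "range": max(values) - min(values),
            # Sample standard deviation needs at least two values.
            "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
        }
    return summary

events = [
    {"src_ip": "10.0.0.5", "bytes_out": 1200},
    {"src_ip": "10.0.0.5", "bytes_out": 1800},
    {"src_ip": "10.0.0.9", "bytes_out": 250000},
    {"src_ip": "10.0.0.9", "bytes_out": 310000},
    {"src_ip": "10.0.0.9", "bytes_out": 280000},
]
summary = stats_by(events, "bytes_out", "src_ip")
for ip, row in summary.items():
    print(ip, row)
```

An analyst could scan such a summary for hosts whose outbound volume is far above the rest, a question that was not defined as a rule in advance.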

Copyright © 2012 Splunk Inc. All rights reserved. Splunk Enterprise is protected by U.S. and international copyright and intellectual property laws. Splunk is a registered trademark or trademark of Splunk Inc. in the United States and/or other jurisdictions. All other marks and names mentioned herein may be trademarks of their respective companies. Item # SG-Splunk-Security-106

Add knowledge to make Splunk smarter. The Splunk function of tagging, when combined with the ability to scale to incredibly large datasets, allows threat analysts to classify data independent of its source. This can be as simple as classifying a particular IP address as ‘hostile,’ which can then feed an IP-hostile report or a classified-by-IP-address report that can be analyzed separately. Since tagging is performed at search time rather than at index time, you can view data by different time slices; this is especially important for handling “watch lists.” While these lists might change on a daily basis, the relevant data collected against them can extend back months. Splunk can also ‘learn’ IP address changes for malicious websites through correlation of DNS and NetFlow data.
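The difference between search-time and index-time tagging can be sketched as follows. This is an illustration only, not Splunk's tagging mechanism; the event fields, the `watch_list` structure and the function names are assumptions.

```python
from datetime import date

# Hypothetical events that were indexed months apart.
events = [
    {"day": date(2012, 1, 14), "src_ip": "203.0.113.7", "action": "allowed"},
    {"day": date(2012, 2, 2), "src_ip": "198.51.100.23", "action": "blocked"},
    {"day": date(2012, 3, 9), "src_ip": "203.0.113.7", "action": "allowed"},
]

def search(events, tags):
    """Attach tags when the search runs; stored events are left untouched."""
    for event in events:
        event_tags = tags.get(("src_ip", event["src_ip"]), set())
        yield {**event, "tags": set(event_tags)}

def hostile_report(events, tags):
    """Return every event whose source is currently tagged 'hostile'."""
    return [e for e in search(events, tags) if "hostile" in e["tags"]]

# Today's watch list: the tag applies to every matching event, old or new.
watch_list = {("src_ip", "203.0.113.7"): {"hostile"}}
report = hostile_report(events, watch_list)
print([e["day"].isoformat() for e in report])

# The watch list changes the next day; nothing has to be re-indexed.
watch_list[("src_ip", "198.51.100.23")] = {"hostile"}
print(len(hostile_report(events, watch_list)))
```

Because the stored events never carry the tag, a change to the watch list takes effect on months of history the next time the search runs.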

Add data for event context. Lookup tables provide another invaluable function to the threat analyst. Lookup tables allow repository data to be merged with event data. For example, a repository of human resources data such as name, phone number and physical location can form the lookup based on the MAC address of a computer. Since lookups can also be temporal in nature, an IDS event can be used to look up DHCP data to acquire the MAC address, which can then be used to look up the HR data. Thus, every IDS event from an internal node can be associated temporally with a name, contact and location. Splunk can dynamically create these tables based on event data and monitor them for any length of time. Data access procedures and processes can be monitored and given context without the manual effort involved in piecing together all the data that must be collected. In today’s environment, users are often assigned multiple devices. Using the above method and lookups to Active Directory or an HR database, a threat analyst would be able to ask the data to “Show all devices for ‘Bill’ across the IT architecture and determine process violations.”
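The IDS-to-DHCP-to-HR chain described above can be sketched in Python. The lease log layout, the HR records, and the helpers `mac_at` and `enrich` are hypothetical; the sketch also assumes the lease log is sorted by start time and that a lease stays valid until the next lease for the same IP begins.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical DHCP lease log: (lease start, IP, MAC), sorted by start time.
dhcp_leases = [
    (datetime(2012, 3, 1, 8, 0), "10.1.1.20", "00:1a:2b:3c:4d:5e"),
    (datetime(2012, 3, 1, 13, 0), "10.1.1.20", "00:aa:bb:cc:dd:ee"),
]
# Hypothetical HR repository keyed by MAC address.
hr_by_mac = {
    "00:1a:2b:3c:4d:5e": {"name": "Bill", "phone": "x1234", "location": "Bldg 2"},
    "00:aa:bb:cc:dd:ee": {"name": "Ana", "phone": "x5678", "location": "Bldg 7"},
}

def mac_at(ip, when):
    """Return the MAC that held `ip` at time `when`, or None if unknown."""
    leases = [(start, mac) for start, lease_ip, mac in dhcp_leases if lease_ip == ip]
    starts = [start for start, _ in leases]
    idx = bisect_right(starts, when) - 1  # latest lease starting at or before `when`
    return leases[idx][1] if idx >= 0 else None

def enrich(ids_event):
    """Merge the MAC and any matching HR record into an IDS event."""
    mac = mac_at(ids_event["src_ip"], ids_event["time"])
    return {**ids_event, "mac": mac, **hr_by_mac.get(mac, {})}

ids_event = {"time": datetime(2012, 3, 1, 9, 30), "src_ip": "10.1.1.20",
             "signature": "suspicious outbound connection"}
enriched = enrich(ids_event)
print(enriched["name"], enriched["location"])
```

The time-aware step matters because the same IP address can belong to different machines, and therefore different people, at different times of day.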

Accelerate forensic analysis across data types. Associated with lookup tables, workflow actions allow interactions between fields and other web sources. For instance, a workflow action might be created to perform a WHOIS on an IP address, or perhaps a click-on-demand function to request “port details” from the Internet Storm Center. Acquisition of third-party information in a timely fashion is another key to the success of the threat analyst.
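One way to picture a workflow action is as a URL template filled in with field values from the event being examined. The sketch below assumes exactly that; the template names and the example.com endpoints are placeholders, not real services or Splunk configuration.

```python
from urllib.parse import quote

# Hypothetical URL templates; these endpoints are placeholders, not real services.
WORKFLOW_ACTIONS = {
    "whois": "https://whois.example.com/lookup?ip={src_ip}",
    "port_details": "https://ports.example.com/port/{dest_port}",
}

def build_action_url(action, event):
    """Fill a workflow template with URL-encoded field values from an event."""
    template = WORKFLOW_ACTIONS[action]
    # Encode every value so that field contents cannot alter the URL structure.
    safe_fields = {key: quote(str(value), safe="") for key, value in event.items()}
    return template.format(**safe_fields)

event = {"src_ip": "203.0.113.7", "dest_port": 445}
print(build_action_url("whois", event))
print(build_action_url("port_details", event))
```

Encoding the field values before substitution is a deliberate choice here, since event data comes from untrusted sources.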

Collect data when you want without altering its format. Depending on vendor support for specific data types is the number one complaint of many security practitioners. Splunk is data agnostic. No normalization is required for Splunk to gather data. As long as the data is ASCII or UTF-8 compliant, Splunk will consume data much like a human consumes data: if it’s readable, it’s consumable. While this is very handy for bringing in any dataset that might be present during an investigation with a minimum of work, for the threat analyst it represents an ability to think outside the box by bringing all the data in the enterprise architecture to bear on a specific problem. COTS products often miss threats that only present themselves as abnormal patterns in normal IT data. Sometimes it takes ingenuity, creativity and out-of-the-box thinking when dealing with threats that can hide behind normal credentialed user activities. Splunk is the technology that facilitates such thinking.


Successful security intelligence analysts must be agile and adept at thinking “outside of the box.” Additionally, they must be capable of considering a wide range of data that often changes during the course of the investigation. Splunk is a platform designed to facilitate these requirements and provide the threat analyst the ability to use any and all IT data to accomplish their mission objectives such as:

Perform research on adversarial threats posed to various systems, technologies, operations or missions in appropriate intelligence sources

Analyze collected data to derive facts, inferences and projections concerning capabilities, intentions, attack approaches, and likelihood of various adversarial attacks under various situations

Use context to more accurately identify false positives and false negatives

Research resource allocations, motivations, tendencies, personalities and tolerance for detection, attribution and retribution that influence adversarial decisions

Contribute to profiling adversarial behavior with respect to identified system attacks in an operational mission context

Produce formal and informal reports, briefings, and perspectives of the behavior of adversaries against target systems, technologies, operations and missions

Free Download

Download Splunk for free. You’ll get a Splunk Enterprise license for 60 days and you can index up to 500 megabytes of data per day. You can convert to a perpetual Free license or purchase an Enterprise license by contacting [email protected].