Computer Forensics Hochschule Darmstadt, CASED … Hard disk drives, solid-state drives, USB sticks....
Transcript of Computer Forensics Hochschule Darmstadt, CASED … Hard disk drives, solid-state drives, USB sticks....
Chapter 8: On the Use of Hash Functions in
Computer Forensics
Harald Baier
Hochschule Darmstadt, CASED
WS 2011/2012
Harald Baier Hash Functions in Forensics / WS 2011/2012 1/41
Motivation
Foundations of Hash Functions
Use Cases of Hash Functions
Piecewise Hashing
Fuzzy Hashing
Harald Baier Hash Functions in Forensics / WS 2011/2012 2/41
Motivation
Motivation
Foundations of Hash Functions
Use Cases of Hash Functions
Piecewise Hashing
Fuzzy Hashing
Harald Baier Hash Functions in Forensics / WS 2011/2012 3/41
Motivation
Use Case: Prosecution
1. Police and prosecutors confronted with different storagemedia:
I Hard disk drives, solid-state drives, USB sticks.
I Mobile phones, SIM cards.
I Digital cameras, digital camcorders, SD cards.
I CDs, DVDs.
I RAM (dumps).
I ...
2. Amount of distrained data often exceeds 1 terabyte.
Harald Baier Hash Functions in Forensics / WS 2011/2012 4/41
Motivation
Different views of 1 terabyte
1 terabyte of digital text is (approximately) equal to:
1. 1 trillion characters: 1 character = 1 byte.
2. 220 million pages: 1 page = 5000 characters.
3. 21 years of printing time: 20 sheets per minute.
4. 1 million kg of paper: onesided printed.
5. Paper stack of 22 km height: bulk of 0.1 mm.
Harald Baier Hash Functions in Forensics / WS 2011/2012 5/41
Motivation
Finding relevant files resembles ...
Source: tu-harburg.de Source: beepworld.de
Harald Baier Hash Functions in Forensics / WS 2011/2012 6/41
Motivation
... or is it solved for suspect files?
Harald Baier Hash Functions in Forensics / WS 2011/2012 7/41
Foundations of Hash Functions
Motivation
Foundations of Hash Functions
Use Cases of Hash Functions
Piecewise Hashing
Fuzzy Hashing
Harald Baier Hash Functions in Forensics / WS 2011/2012 8/41
Foundations of Hash Functions
Definition and Security Applications
1. A hash function h is a function with two properties:I Compression: h : {0, 1}∗ −→ {0, 1}n.
I Ease of computation: Computation of h(m) is ’fast’ in practice.
2. Notation:I m is a ’document’ (e.g. a file, a device).
I h(m) its hash value or digest.
3. Sample security applications:I Storage of passwords.
I Electronic signatures (MAC, asymmetric signatures).
I Computer forensics.
Harald Baier Hash Functions in Forensics / WS 2011/2012 9/41
Foundations of Hash Functions
Hash Functions in Cryptography
For use in cryptography, we have to impose further conditions:
1. Preimage Resistance:I Let a hash value H ∈ {0, 1}n be given.
I It is infeasible in practice to find an input (i.e. a document m)with H = h(m).
2. Second Preimage Resistance:I Let a document m ∈ {0, 1}∗ be given.
I It is infeasible in practice to find a second document m′ withm 6= m′ and h(m) = h(m′).
3. Collision Resistance:I It is infeasible in practice to find any two documents m, m′
with m 6= m′ and h(m) = h(m′).
Harald Baier Hash Functions in Forensics / WS 2011/2012 10/41
Foundations of Hash Functions
Avalanche Effect
1. Let m and h(m) be given.If m is replaced by m′, h(m′) behaves pseudo randomly.=⇒ No control over the output, if the input is changed.
2. Consequences:I If only one bit in m is changed to get m′, the two outputs
h(m) and h(m′) look ’very’ different.
I Every bit in h(m′) changes with probability 50%,independently of the number of different bits in m′.
Harald Baier Hash Functions in Forensics / WS 2011/2012 11/41
Foundations of Hash Functions
Sample Cryptographic Hash Functions
Name MD5 SHA-1 SHA-256 SHA-512 RIPEMD-160n 128 160 256 512 160
Demo:1. Computation of hash values using openssl.
2. Avalanche effect.
3. Performance.
Harald Baier Hash Functions in Forensics / WS 2011/2012 12/41
Use Cases of Hash Functions
Motivation
Foundations of Hash Functions
Use Cases of Hash Functions
Piecewise Hashing
Fuzzy Hashing
Harald Baier Hash Functions in Forensics / WS 2011/2012 13/41
Use Cases of Hash Functions
Use Cases
1. Ensure authenticity and integrity during data acquisition.I Relevant for both dead and live analysis.
I Hash values must be protected:I Written down by hand in investigation notebook.
I Compute a digital signature over it.
2. Identify known files:I Whitelisting: Known to be good files.
I Blacklisting: Known to be bad files.
3. Bloom filters to efficiently identify a large set of files.
Relevant security property of the hash function:Second-preimage resistance.
Harald Baier Hash Functions in Forensics / WS 2011/2012 14/41
Use Cases of Hash Functions
Data Acquisition: Dead analysis
1. Investigation is performed on a copy.
2. Main goal of hashing: Preserve chain of custody.
3. Common approach for a storage medium:I Compute hash value h1 over the whole original volume.
I Write hash value h1 down in physical logbook.
I Make a 1-to-1 copy of the volume using dd.
I Compute hash value h2 over the copy.
I Write hash value h2 down in physical logbook.
I Work read-only on copy.
I To prove chain of custody, compute h2 again and check, ifh1 = h2 holds.
Harald Baier Hash Functions in Forensics / WS 2011/2012 15/41
Use Cases of Hash Functions
Example: Save first partition of a HDD
Chain of custody preserved
$ openssl dgst -sha1 /dev/hda1SHA1(/dev/hda1)= c1e4449bfd504b60e85447afaa3cd100f7e59dad
$ dd if=/dev/hda1 of=image-hda1.dd bs=512
$ openssl dgst -sha1 image-hda1.ddSHA1(image-hda1.dd)= c1e4449bfd504b60e85447afaa3cd100f7e59dad
[INVESTIGATE read-only image-hda1.dd]
$ openssl dgst -sha1 image-hda1.ddSHA1(image-hda1.dd)= c1e4449bfd504b60e85447afaa3cd100f7e59dad
Harald Baier Hash Functions in Forensics / WS 2011/2012 16/41
Use Cases of Hash Functions
Example: Save first partition of a HDD
Chain of custody destroyed
$ openssl dgst -sha1 /dev/hda1SHA1(/dev/hda1)= c1e4449bfd504b60e85447afaa3cd100f7e59dad
$ dd if=/dev/hda1 of=image-hda1.dd bs=512
$ openssl dgst -sha1 image-hda1.ddSHA1(image-hda1.dd)= c1e4449bfd504b60e85447afaa3cd100f7e59dad
[INVESTIGATE read-only image-hda1.dd]
$ openssl dgst -sha1 image-hda1.ddSHA1(image-hda1.dd)= 568c68d97baf93642962f509343b8b02d324e3d8
Harald Baier Hash Functions in Forensics / WS 2011/2012 17/41
Use Cases of Hash Functions
Hash Functions in Data Acquisition: Assessment
1. Key problem: If original and copy differ by at least 1 bit, thechain of custody is broken.
2. Solutions:I Compute hashes over smaller pieces of the input (e.g. file
system blocks).This leads to segment hashes.
I Perform a binary search over original and copy to finddifferences.
Harald Baier Hash Functions in Forensics / WS 2011/2012 18/41
Use Cases of Hash Functions
Whitelisting
1. Underlying Idea:I Generate a database of known to be good files.
I Let G denote this set.
I Identify automatically an unsuspicious file on base of its hashvalue, which matches a fingerprint of a file in G.
I Exclude a known to be good file from further investigation.
I Significant reduction of irrelevant data.
2. Examples of unsuspicious files:I System files of operating systems.
I Well-known benign applications like browsers, editors, ...
3. Widespread database:I Reference Data Set (RDS) of the National Software Reference
Library (NSRL), maintained by NISTHarald Baier Hash Functions in Forensics / WS 2011/2012 19/41
Use Cases of Hash Functions
NSRL-RDS
1. Website www.nsrl.nist.gov.
2. Current download version: RDS 2.34 from October 2011
3. Original software is kept to ensure court worthiness.
4. Propagated viaI CD.
I Download of iso-images from website via http.
5. RDS 2.34 comprises four iso-images of about 1.2 GB size.
6. Contains 69.887.164 files and 21.082.054 unique SHA-1 values(due to duplications across different software distributions)
7. Entries are arranged according to the SHA-1 values.
Harald Baier Hash Functions in Forensics / WS 2011/2012 20/41
Use Cases of Hash Functions
NSRL-RDS
Sample entries
$ less NSRLFile.txt
"SHA-1","MD5","CRC32","FileName","FileSize","ProductCode","OpSystemCode","SpecialCode"
"000000206738748EDD92C4E3D2E823896700F849","392126E756571EBF112CB1C1CDEDF926","EBD105A0","I05002T2.PFB",98865,3095,"WIN",""
"0000004DA6391F7F5D2F7FCCF36CEBDA60C6EA02","0E53C14A3E48D94FF596A2824307B492","AA6A7B16","00br2026.gif",2226,228,"WIN",""
"000000A9E47BD385A0A3685AA12C2DB6FD727A20","176308F27DD52890F013A3FD80F92E51","D749B562","femvo523.wav",42748,4887,"MacOSX",""
"00000142988AFA836117B1B572FAE4713F200567","9B3702B0E788C6D62996392FE3C9786A","05E566DF","J0180794.JPG",32768,18266,"358",""
"00000142988AFA836117B1B572FAE4713F200567","9B3702B0E788C6D62996392FE3C9786A","05E566DF","J0180794.JPG",32768,2322,"WIN",""
"00000142988AFA836117B1B572FAE4713F200567","9B3702B0E788C6D62996392FE3C9786A","05E566DF","J0180794.JPG",32768,2575,"WIN",""
"00000142988AFA836117B1B572FAE4713F200567","9B3702B0E788C6D62996392FE3C9786A","05E566DF","J0180794.JPG",32768,2583,"WIN",""
"00000142988AFA836117B1B572FAE4713F200567","9B3702B0E788C6D62996392FE3C9786A","05E566DF","J0180794.JPG",32768,3271,"WIN",""
"00000142988AFA836117B1B572FAE4713F200567","9B3702B0E788C6D62996392FE3C9786A","05E566DF","J0180794.JPG",32768,3282,"UNK",""
"SHA-1","MD5","CRC32","FileName","FileSize","ProductCode",\\"OpSystemCode","SpecialCode""000000206738748EDD92C4E3D2E823896700F849",\\"392126E756571EBF112CB1C1CDEDF926",\\"EBD105A0","I05002T2.PFB",98865,3095,"WIN",""
Harald Baier Hash Functions in Forensics / WS 2011/2012 21/41
Use Cases of Hash Functions
Whitelisting: Anti-detection
1. Task: Find a suspicious file s with h(s) = h(g):I Attacker has to solve the second preimage problem for some
g ∈ G.
I As of today hardly possible (even for MD5).
I If RDS is used, both SHA-1 and MD5 preimages have to befound.
2. Attack the whitelist:I Breach integrity and authenticity of whitelist.
I Example:I Attack the distribution channel.
I If RDS is downloaded via http, spoofing is possible.
Harald Baier Hash Functions in Forensics / WS 2011/2012 22/41
Use Cases of Hash Functions
Whitelisting: Assessment
1. General assessment:I Well-known and established process in computer forensics.
I If database is trusted, no false positives (positive = benign).
2. Possible bottleneck: Size of database.I Size of database is increasing.
I Currently RDS is about 6 gigabyte.
Harald Baier Hash Functions in Forensics / WS 2011/2012 23/41
Use Cases of Hash Functions
Blacklisting
1. Underlying idea:I Generate a database of known to be bad files.
I Let B denote this set.
I Find automatically a suspicious file on base of its fingerprint,which matches a fingerprint of a file in B.
2. Sample suspect files:I Malware.
I Encryption or steganographic software.
I Corporate secrets.
I IPR protected files.
I Child pornography.
Harald Baier Hash Functions in Forensics / WS 2011/2012 24/41
Use Cases of Hash Functions
Blacklisting: Evaluation
1. Anti-detection approach:I Let a suspicious file b ∈ B be given.
I Change some (irrelevant) bit of b to get b′.
I Consequence:I h(b′) is very different from h(b).
I b′ is not detected automatically.
2. Core problem:I Cryptographic requirements of a hash function and forensic
goals are complementary.
I A suspicious file similar to an element of B is not detected.
3. Fragments of elements of B are not identified, too.
Harald Baier Hash Functions in Forensics / WS 2011/2012 25/41
Piecewise Hashing
Motivation
Foundations of Hash Functions
Use Cases of Hash Functions
Piecewise Hashing
Fuzzy Hashing
Harald Baier Hash Functions in Forensics / WS 2011/2012 26/41
Piecewise Hashing
Goals
1. Overcome drawbacks of cryptographic hash functions in thecontext of computer forensics.
2. Main drawbacks are:I Data acquisition: Integrity of copy is destroyed, if some bits
change.
I White-/Blacklisting:I Suspect files similar to known to be bad files are not detected.
I Fragments are not detected (due to deletion, fragmentation).
3. Currently known approaches:I Segment hashes (also called block hashes): Tool dcfldd.
I Context-triggered piecewise hashes: Tool ssdeep.
I Similarity digests: Tool sdhash.
Harald Baier Hash Functions in Forensics / WS 2011/2012 27/41
Piecewise Hashing
Segment Hashes
1. Underlying idea:I Split input data (volume, file) in blocks of fixed length.
I Compute for each segment its cryptographic hash.
I Lookup in hash database for matches.
2. Original aim: Improve integrity of storage media.
Harald Baier Hash Functions in Forensics / WS 2011/2012 28/41
Piecewise Hashing
Segment Hashes: Example from NIST
1. Sample tool of Nicholas Harbour (since 2002):I dcfldd: An extension of dd.
I Department of Defense – Computer Forensics Laboratory.
I Provides MD5, SHA-1, SHA-2 family.
2. Evaluation by NIST (Douglas White, 2008):I Hashing of File Blocks: When Exact Matches are not Useful.
I NIST worked on Windows 2000 and XP OS files.
I Main result:I File-based data reduction leaves an average 30% of disk space
for human investigation.
I Incorporating block hashes reduces this to an average of 15%.
I Assist in recognising wiped media.
Harald Baier Hash Functions in Forensics / WS 2011/2012 29/41
Piecewise Hashing
Example: Save first partition of a HDD
Chain of custody preserved
$ dcfldd if=/dev/hda1 of=image-hda1.dd bs=512 hashwindow=4096 hash=sha256
0 - 4096: da0bd2b16c7cd5acb5695e9d81fb6d832cba85312d87e08d0c675e41b608de50
4096 - 8192: 281f4b8ac2dcda0f3fd9a0642a694fedf829d7567a531b1cfc8925f94eebe7a3
8192 - 12288: 1c05a3c7251b666c1ec4a2b689e25f95a92a311613ce685fdf7cbf41552290e5
12288 - 16384: 6c9c17f271f18587bcccf8f9c6154b4bff764664a3eb8ddf06881168c5c4698b
16384 - 20480: c61cd658e73450dfb0dfc9a1d83cdbddd162d9194d81f27f0516bb107280e841
[REMOVED]
$ dcfldd if=image-hda1.dd of=/dev/null bs=512 hashwindow=4096 hash=sha256
0 - 4096: da0bd2b16c7cd5acb5695e9d81fb6d832cba85312d87e08d0c675e41b608de50
4096 - 8192: 281f4b8ac2dcda0f3fd9a0642a694fedf829d7567a531b1cfc8925f94eebe7a3
8192 - 12288: 1c05a3c7251b666c1ec4a2b689e25f95a92a311613ce685fdf7cbf41552290e5
12288 - 16384: 6c9c17f271f18587bcccf8f9c6154b4bff764664a3eb8ddf06881168c5c4698b
16384 - 20480: c61cd658e73450dfb0dfc9a1d83cdbddd162d9194d81f27f0516bb107280e841
[REMOVED]
Harald Baier Hash Functions in Forensics / WS 2011/2012 30/41
Piecewise Hashing
Example: Save first partition of a HDD
Chain of custody partly destroyed
$ dcfldd if=/dev/hda1 of=image-hda1.dd bs=512 hashwindow=4096 hash=sha256
0 - 4096: da0bd2b16c7cd5acb5695e9d81fb6d832cba85312d87e08d0c675e41b608de50
4096 - 8192: 281f4b8ac2dcda0f3fd9a0642a694fedf829d7567a531b1cfc8925f94eebe7a3
8192 - 12288: 1c05a3c7251b666c1ec4a2b689e25f95a92a311613ce685fdf7cbf41552290e5
12288 - 16384: 6c9c17f271f18587bcccf8f9c6154b4bff764664a3eb8ddf06881168c5c4698b
16384 - 20480: c61cd658e73450dfb0dfc9a1d83cdbddd162d9194d81f27f0516bb107280e841
[REMOVED]
$ dcfldd if=image-hda1.dd of=/dev/null bs=512 hashwindow=4096 hash=sha256
0 - 4096: da0bd2b16c7cd5acb5695e9d81fb6d832cba85312d87e08d0c675e41b608de50
4096 - 8192: 281f4b8ac2dcda0f3fd9a0642a694fedf829d7567a531b1cfc8925f94eebe7a3
8192 - 12288: 1c05a3c7251b666c1ec4a2b689e25f95a92a311613ce685fdf7cbf41552290e5
12288 - 16384: f361bc439d0fc4215eba5523cba663c5371d47c358d68b2192602a29e972e143
16384 - 20480: c61cd658e73450dfb0dfc9a1d83cdbddd162d9194d81f27f0516bb107280e841
[REMOVED]
Harald Baier Hash Functions in Forensics / WS 2011/2012 31/41
Piecewise Hashing
Segment Hashes: Evaluation
1. Anti-Blacklisting is very easy:I Introduce an irrelevant byte in the first sector.
I All segment hashes differ from the stored segment hashes.
I Modified suspect file is not detected.
I Demo.
2. A good technique for whitelisting (see NIST results).
3. Size of segment hash database is large:I 4096 byte block size, SHA-1.
I size of hash databasesize of raw data = 20
4096 = 0.00488=⇒ 1 terabyte of raw data yields a 5 gigabyte hash database.
4. Hash database depends on the hashwindow size.
Harald Baier Hash Functions in Forensics / WS 2011/2012 32/41
Piecewise Hashing
Context Triggered Piecewise Hashes
1. Underlying idea:I Split input data (volume, file) in blocks of variable length.
I The end points of the blocks are determined by a rolling hash:I Its value only depends on the current context.
I A window (e.g. of size 7 bytes) slides over the input.
I Context = Bytes of input data in the current window.
I If rolling hash matches a trigger value, an end point is set.
I Rolling hash function is assumed to be pseudo random: PF .
I Compute for every segment its cryptographic hash.
I The sequence of these segment hashes is the context triggeredpiecewise hash (CTPH) of the input.
I Lookup in hash database for matches.
Harald Baier Hash Functions in Forensics / WS 2011/2012 33/41
Piecewise Hashing
Context Triggered Piecewise Hashes
1. Originally proposed for spam detection (spamsum by AndrewTridgell, 2002)
2. Ported to forensics by Jesse Kornblum, 2006: ssdeep.
Harald Baier Hash Functions in Forensics / WS 2011/2012 34/41
Piecewise Hashing
Piecewise Hashes: The algorithm
Harald Baier Hash Functions in Forensics / WS 2011/2012 35/41
Piecewise Hashing
CTPH: A sample tool
1. ssdeep (based on spamsum).
2. Window size is 7.
3. CTPH is a sequence of printable characters:I Only the least significant 6 bits (LS6B) of a segment hash are
considered.
I LS6B are encoded base64.
4. ssdeep decides about a match:I On base of the edit distance (changing, inserting, ...
characters) of two CTPHs.
I Edit distance is rescaled to a percentage match score.
5. Demo.
Harald Baier Hash Functions in Forensics / WS 2011/2012 36/41
Piecewise Hashing
CTPH: Sample Research Questions
1. Rolling hash:I Shall be efficient and pseudo random.
I Current implementation is fast, but not pseudo random.
I Task: Find a slightly slower, but pseudo random rolling hash.
2. Fragment detection:I Kornblum’s approach fails, if fragments are much smaller than
the original file.
I Task: Find a different approach.
3. Edit distance:I Kornblum’s approach addresses text files (due to spam
detection).
I Task: Find a more general approach, which also addressesimages, videos, ...
Harald Baier Hash Functions in Forensics / WS 2011/2012 37/41
Fuzzy Hashing
Motivation
Foundations of Hash Functions
Use Cases of Hash Functions
Piecewise Hashing
Fuzzy Hashing
Harald Baier Hash Functions in Forensics / WS 2011/2012 38/41
Fuzzy Hashing
Vision
1. Find a similarity preserving hash function.I Fuzzy hash function, denoted by f .
I m and m′ are ’similar’ =⇒ f(m) and f(m′) are ’similar’, too.
2. Find a general (forensic) metric to measure similarity:I Let d : {0, 1}∗ × {0, 1}∗ −→ R+
0 be such a metric.
I d shall be applicable to txt, doc, odt, jpg, bmp, devices, ...
3. f is a function with d(m1, m2) ≈ d(f(m1), f(m2)).
Harald Baier Hash Functions in Forensics / WS 2011/2012 39/41
Fuzzy Hashing
Fuzzy Hashing: Applications
1. Forensics (on the file level): Detect similar files.I Blacklisting:
I Detect manipulated suspicious files.
I Find fragments of suspicious data.
I Whitelisting: Find changed unsuspicious files.
2. Biometrics: Template protection.
3. Malware: Detect obfuscated malware (e.g. metamorphicmalware).
4. Junk mail detection.
Harald Baier Hash Functions in Forensics / WS 2011/2012 40/41
Fuzzy Hashing
Questions?
Harald Baier Hash Functions in Forensics / WS 2011/2012 41/41