Computer Forensics Hochschule Darmstadt, CASED … Hard disk drives, solid-state drives, USB sticks....

Post on 10-Jun-2018

230 views 0 download

Transcript of Computer Forensics Hochschule Darmstadt, CASED … Hard disk drives, solid-state drives, USB sticks....

Chapter 8: On the Use of Hash Functions in

Computer Forensics

Harald Baier

Hochschule Darmstadt, CASED

WS 2011/2012

Harald Baier Hash Functions in Forensics / WS 2011/2012 1/41

Motivation

Foundations of Hash Functions

Use Cases of Hash Functions

Piecewise Hashing

Fuzzy Hashing

Harald Baier Hash Functions in Forensics / WS 2011/2012 2/41

Motivation

Motivation

Foundations of Hash Functions

Use Cases of Hash Functions

Piecewise Hashing

Fuzzy Hashing

Harald Baier Hash Functions in Forensics / WS 2011/2012 3/41

Motivation

Use Case: Prosecution

1. Police and prosecutors confronted with different storagemedia:

I Hard disk drives, solid-state drives, USB sticks.

I Mobile phones, SIM cards.

I Digital cameras, digital camcorders, SD cards.

I CDs, DVDs.

I RAM (dumps).

I ...

2. Amount of distrained data often exceeds 1 terabyte.

Harald Baier Hash Functions in Forensics / WS 2011/2012 4/41

Motivation

Different views of 1 terabyte

1 terabyte of digital text is (approximately) equal to:

1. 1 trillion characters: 1 character = 1 byte.

2. 220 million pages: 1 page = 5000 characters.

3. 21 years of printing time: 20 sheets per minute.

4. 1 million kg of paper: onesided printed.

5. Paper stack of 22 km height: bulk of 0.1 mm.

Harald Baier Hash Functions in Forensics / WS 2011/2012 5/41

Motivation

Finding relevant files resembles ...

Source: tu-harburg.de Source: beepworld.de

Harald Baier Hash Functions in Forensics / WS 2011/2012 6/41

Motivation

... or is it solved for suspect files?

Harald Baier Hash Functions in Forensics / WS 2011/2012 7/41

Foundations of Hash Functions

Motivation

Foundations of Hash Functions

Use Cases of Hash Functions

Piecewise Hashing

Fuzzy Hashing

Harald Baier Hash Functions in Forensics / WS 2011/2012 8/41

Foundations of Hash Functions

Definition and Security Applications

1. A hash function h is a function with two properties:I Compression: h : {0, 1}∗ −→ {0, 1}n.

I Ease of computation: Computation of h(m) is ’fast’ in practice.

2. Notation:I m is a ’document’ (e.g. a file, a device).

I h(m) its hash value or digest.

3. Sample security applications:I Storage of passwords.

I Electronic signatures (MAC, asymmetric signatures).

I Computer forensics.

Harald Baier Hash Functions in Forensics / WS 2011/2012 9/41

Foundations of Hash Functions

Hash Functions in Cryptography

For use in cryptography, we have to impose further conditions:

1. Preimage Resistance:I Let a hash value H ∈ {0, 1}n be given.

I It is infeasible in practice to find an input (i.e. a document m)with H = h(m).

2. Second Preimage Resistance:I Let a document m ∈ {0, 1}∗ be given.

I It is infeasible in practice to find a second document m′ withm 6= m′ and h(m) = h(m′).

3. Collision Resistance:I It is infeasible in practice to find any two documents m, m′

with m 6= m′ and h(m) = h(m′).

Harald Baier Hash Functions in Forensics / WS 2011/2012 10/41

Foundations of Hash Functions

Avalanche Effect

1. Let m and h(m) be given.If m is replaced by m′, h(m′) behaves pseudo randomly.=⇒ No control over the output, if the input is changed.

2. Consequences:I If only one bit in m is changed to get m′, the two outputs

h(m) and h(m′) look ’very’ different.

I Every bit in h(m′) changes with probability 50%,independently of the number of different bits in m′.

Harald Baier Hash Functions in Forensics / WS 2011/2012 11/41

Foundations of Hash Functions

Sample Cryptographic Hash Functions

Name MD5 SHA-1 SHA-256 SHA-512 RIPEMD-160n 128 160 256 512 160

Demo:1. Computation of hash values using openssl.

2. Avalanche effect.

3. Performance.

Harald Baier Hash Functions in Forensics / WS 2011/2012 12/41

Use Cases of Hash Functions

Motivation

Foundations of Hash Functions

Use Cases of Hash Functions

Piecewise Hashing

Fuzzy Hashing

Harald Baier Hash Functions in Forensics / WS 2011/2012 13/41

Use Cases of Hash Functions

Use Cases

1. Ensure authenticity and integrity during data acquisition.I Relevant for both dead and live analysis.

I Hash values must be protected:I Written down by hand in investigation notebook.

I Compute a digital signature over it.

2. Identify known files:I Whitelisting: Known to be good files.

I Blacklisting: Known to be bad files.

3. Bloom filters to efficiently identify a large set of files.

Relevant security property of the hash function:Second-preimage resistance.

Harald Baier Hash Functions in Forensics / WS 2011/2012 14/41

Use Cases of Hash Functions

Data Acquisition: Dead analysis

1. Investigation is performed on a copy.

2. Main goal of hashing: Preserve chain of custody.

3. Common approach for a storage medium:I Compute hash value h1 over the whole original volume.

I Write hash value h1 down in physical logbook.

I Make a 1-to-1 copy of the volume using dd.

I Compute hash value h2 over the copy.

I Write hash value h2 down in physical logbook.

I Work read-only on copy.

I To prove chain of custody, compute h2 again and check, ifh1 = h2 holds.

Harald Baier Hash Functions in Forensics / WS 2011/2012 15/41

Use Cases of Hash Functions

Example: Save first partition of a HDD

Chain of custody preserved

$ openssl dgst -sha1 /dev/hda1SHA1(/dev/hda1)= c1e4449bfd504b60e85447afaa3cd100f7e59dad

$ dd if=/dev/hda1 of=image-hda1.dd bs=512

$ openssl dgst -sha1 image-hda1.ddSHA1(image-hda1.dd)= c1e4449bfd504b60e85447afaa3cd100f7e59dad

[INVESTIGATE read-only image-hda1.dd]

$ openssl dgst -sha1 image-hda1.ddSHA1(image-hda1.dd)= c1e4449bfd504b60e85447afaa3cd100f7e59dad

Harald Baier Hash Functions in Forensics / WS 2011/2012 16/41

Use Cases of Hash Functions

Example: Save first partition of a HDD

Chain of custody destroyed

$ openssl dgst -sha1 /dev/hda1SHA1(/dev/hda1)= c1e4449bfd504b60e85447afaa3cd100f7e59dad

$ dd if=/dev/hda1 of=image-hda1.dd bs=512

$ openssl dgst -sha1 image-hda1.ddSHA1(image-hda1.dd)= c1e4449bfd504b60e85447afaa3cd100f7e59dad

[INVESTIGATE read-only image-hda1.dd]

$ openssl dgst -sha1 image-hda1.ddSHA1(image-hda1.dd)= 568c68d97baf93642962f509343b8b02d324e3d8

Harald Baier Hash Functions in Forensics / WS 2011/2012 17/41

Use Cases of Hash Functions

Hash Functions in Data Acquisition: Assessment

1. Key problem: If original and copy differ by at least 1 bit, thechain of custody is broken.

2. Solutions:I Compute hashes over smaller pieces of the input (e.g. file

system blocks).This leads to segment hashes.

I Perform a binary search over original and copy to finddifferences.

Harald Baier Hash Functions in Forensics / WS 2011/2012 18/41

Use Cases of Hash Functions

Whitelisting

1. Underlying Idea:I Generate a database of known to be good files.

I Let G denote this set.

I Identify automatically an unsuspicious file on base of its hashvalue, which matches a fingerprint of a file in G.

I Exclude a known to be good file from further investigation.

I Significant reduction of irrelevant data.

2. Examples of unsuspicious files:I System files of operating systems.

I Well-known benign applications like browsers, editors, ...

3. Widespread database:I Reference Data Set (RDS) of the National Software Reference

Library (NSRL), maintained by NISTHarald Baier Hash Functions in Forensics / WS 2011/2012 19/41

Use Cases of Hash Functions

NSRL-RDS

1. Website www.nsrl.nist.gov.

2. Current download version: RDS 2.34 from October 2011

3. Original software is kept to ensure court worthiness.

4. Propagated viaI CD.

I Download of iso-images from website via http.

5. RDS 2.34 comprises four iso-images of about 1.2 GB size.

6. Contains 69.887.164 files and 21.082.054 unique SHA-1 values(due to duplications across different software distributions)

7. Entries are arranged according to the SHA-1 values.

Harald Baier Hash Functions in Forensics / WS 2011/2012 20/41

Use Cases of Hash Functions

NSRL-RDS

Sample entries

$ less NSRLFile.txt

"SHA-1","MD5","CRC32","FileName","FileSize","ProductCode","OpSystemCode","SpecialCode"

"000000206738748EDD92C4E3D2E823896700F849","392126E756571EBF112CB1C1CDEDF926","EBD105A0","I05002T2.PFB",98865,3095,"WIN",""

"0000004DA6391F7F5D2F7FCCF36CEBDA60C6EA02","0E53C14A3E48D94FF596A2824307B492","AA6A7B16","00br2026.gif",2226,228,"WIN",""

"000000A9E47BD385A0A3685AA12C2DB6FD727A20","176308F27DD52890F013A3FD80F92E51","D749B562","femvo523.wav",42748,4887,"MacOSX",""

"00000142988AFA836117B1B572FAE4713F200567","9B3702B0E788C6D62996392FE3C9786A","05E566DF","J0180794.JPG",32768,18266,"358",""

"00000142988AFA836117B1B572FAE4713F200567","9B3702B0E788C6D62996392FE3C9786A","05E566DF","J0180794.JPG",32768,2322,"WIN",""

"00000142988AFA836117B1B572FAE4713F200567","9B3702B0E788C6D62996392FE3C9786A","05E566DF","J0180794.JPG",32768,2575,"WIN",""

"00000142988AFA836117B1B572FAE4713F200567","9B3702B0E788C6D62996392FE3C9786A","05E566DF","J0180794.JPG",32768,2583,"WIN",""

"00000142988AFA836117B1B572FAE4713F200567","9B3702B0E788C6D62996392FE3C9786A","05E566DF","J0180794.JPG",32768,3271,"WIN",""

"00000142988AFA836117B1B572FAE4713F200567","9B3702B0E788C6D62996392FE3C9786A","05E566DF","J0180794.JPG",32768,3282,"UNK",""

"SHA-1","MD5","CRC32","FileName","FileSize","ProductCode",\\"OpSystemCode","SpecialCode""000000206738748EDD92C4E3D2E823896700F849",\\"392126E756571EBF112CB1C1CDEDF926",\\"EBD105A0","I05002T2.PFB",98865,3095,"WIN",""

Harald Baier Hash Functions in Forensics / WS 2011/2012 21/41

Use Cases of Hash Functions

Whitelisting: Anti-detection

1. Task: Find a suspicious file s with h(s) = h(g):I Attacker has to solve the second preimage problem for some

g ∈ G.

I As of today hardly possible (even for MD5).

I If RDS is used, both SHA-1 and MD5 preimages have to befound.

2. Attack the whitelist:I Breach integrity and authenticity of whitelist.

I Example:I Attack the distribution channel.

I If RDS is downloaded via http, spoofing is possible.

Harald Baier Hash Functions in Forensics / WS 2011/2012 22/41

Use Cases of Hash Functions

Whitelisting: Assessment

1. General assessment:I Well-known and established process in computer forensics.

I If database is trusted, no false positives (positive = benign).

2. Possible bottleneck: Size of database.I Size of database is increasing.

I Currently RDS is about 6 gigabyte.

Harald Baier Hash Functions in Forensics / WS 2011/2012 23/41

Use Cases of Hash Functions

Blacklisting

1. Underlying idea:I Generate a database of known to be bad files.

I Let B denote this set.

I Find automatically a suspicious file on base of its fingerprint,which matches a fingerprint of a file in B.

2. Sample suspect files:I Malware.

I Encryption or steganographic software.

I Corporate secrets.

I IPR protected files.

I Child pornography.

Harald Baier Hash Functions in Forensics / WS 2011/2012 24/41

Use Cases of Hash Functions

Blacklisting: Evaluation

1. Anti-detection approach:I Let a suspicious file b ∈ B be given.

I Change some (irrelevant) bit of b to get b′.

I Consequence:I h(b′) is very different from h(b).

I b′ is not detected automatically.

2. Core problem:I Cryptographic requirements of a hash function and forensic

goals are complementary.

I A suspicious file similar to an element of B is not detected.

3. Fragments of elements of B are not identified, too.

Harald Baier Hash Functions in Forensics / WS 2011/2012 25/41

Piecewise Hashing

Motivation

Foundations of Hash Functions

Use Cases of Hash Functions

Piecewise Hashing

Fuzzy Hashing

Harald Baier Hash Functions in Forensics / WS 2011/2012 26/41

Piecewise Hashing

Goals

1. Overcome drawbacks of cryptographic hash functions in thecontext of computer forensics.

2. Main drawbacks are:I Data acquisition: Integrity of copy is destroyed, if some bits

change.

I White-/Blacklisting:I Suspect files similar to known to be bad files are not detected.

I Fragments are not detected (due to deletion, fragmentation).

3. Currently known approaches:I Segment hashes (also called block hashes): Tool dcfldd.

I Context-triggered piecewise hashes: Tool ssdeep.

I Similarity digests: Tool sdhash.

Harald Baier Hash Functions in Forensics / WS 2011/2012 27/41

Piecewise Hashing

Segment Hashes

1. Underlying idea:I Split input data (volume, file) in blocks of fixed length.

I Compute for each segment its cryptographic hash.

I Lookup in hash database for matches.

2. Original aim: Improve integrity of storage media.

Harald Baier Hash Functions in Forensics / WS 2011/2012 28/41

Piecewise Hashing

Segment Hashes: Example from NIST

1. Sample tool of Nicholas Harbour (since 2002):I dcfldd: An extension of dd.

I Department of Defense – Computer Forensics Laboratory.

I Provides MD5, SHA-1, SHA-2 family.

2. Evaluation by NIST (Douglas White, 2008):I Hashing of File Blocks: When Exact Matches are not Useful.

I NIST worked on Windows 2000 and XP OS files.

I Main result:I File-based data reduction leaves an average 30% of disk space

for human investigation.

I Incorporating block hashes reduces this to an average of 15%.

I Assist in recognising wiped media.

Harald Baier Hash Functions in Forensics / WS 2011/2012 29/41

Piecewise Hashing

Example: Save first partition of a HDD

Chain of custody preserved

$ dcfldd if=/dev/hda1 of=image-hda1.dd bs=512 hashwindow=4096 hash=sha256

0 - 4096: da0bd2b16c7cd5acb5695e9d81fb6d832cba85312d87e08d0c675e41b608de50

4096 - 8192: 281f4b8ac2dcda0f3fd9a0642a694fedf829d7567a531b1cfc8925f94eebe7a3

8192 - 12288: 1c05a3c7251b666c1ec4a2b689e25f95a92a311613ce685fdf7cbf41552290e5

12288 - 16384: 6c9c17f271f18587bcccf8f9c6154b4bff764664a3eb8ddf06881168c5c4698b

16384 - 20480: c61cd658e73450dfb0dfc9a1d83cdbddd162d9194d81f27f0516bb107280e841

[REMOVED]

$ dcfldd if=image-hda1.dd of=/dev/null bs=512 hashwindow=4096 hash=sha256

0 - 4096: da0bd2b16c7cd5acb5695e9d81fb6d832cba85312d87e08d0c675e41b608de50

4096 - 8192: 281f4b8ac2dcda0f3fd9a0642a694fedf829d7567a531b1cfc8925f94eebe7a3

8192 - 12288: 1c05a3c7251b666c1ec4a2b689e25f95a92a311613ce685fdf7cbf41552290e5

12288 - 16384: 6c9c17f271f18587bcccf8f9c6154b4bff764664a3eb8ddf06881168c5c4698b

16384 - 20480: c61cd658e73450dfb0dfc9a1d83cdbddd162d9194d81f27f0516bb107280e841

[REMOVED]

Harald Baier Hash Functions in Forensics / WS 2011/2012 30/41

Piecewise Hashing

Example: Save first partition of a HDD

Chain of custody partly destroyed

$ dcfldd if=/dev/hda1 of=image-hda1.dd bs=512 hashwindow=4096 hash=sha256

0 - 4096: da0bd2b16c7cd5acb5695e9d81fb6d832cba85312d87e08d0c675e41b608de50

4096 - 8192: 281f4b8ac2dcda0f3fd9a0642a694fedf829d7567a531b1cfc8925f94eebe7a3

8192 - 12288: 1c05a3c7251b666c1ec4a2b689e25f95a92a311613ce685fdf7cbf41552290e5

12288 - 16384: 6c9c17f271f18587bcccf8f9c6154b4bff764664a3eb8ddf06881168c5c4698b

16384 - 20480: c61cd658e73450dfb0dfc9a1d83cdbddd162d9194d81f27f0516bb107280e841

[REMOVED]

$ dcfldd if=image-hda1.dd of=/dev/null bs=512 hashwindow=4096 hash=sha256

0 - 4096: da0bd2b16c7cd5acb5695e9d81fb6d832cba85312d87e08d0c675e41b608de50

4096 - 8192: 281f4b8ac2dcda0f3fd9a0642a694fedf829d7567a531b1cfc8925f94eebe7a3

8192 - 12288: 1c05a3c7251b666c1ec4a2b689e25f95a92a311613ce685fdf7cbf41552290e5

12288 - 16384: f361bc439d0fc4215eba5523cba663c5371d47c358d68b2192602a29e972e143

16384 - 20480: c61cd658e73450dfb0dfc9a1d83cdbddd162d9194d81f27f0516bb107280e841

[REMOVED]

Harald Baier Hash Functions in Forensics / WS 2011/2012 31/41

Piecewise Hashing

Segment Hashes: Evaluation

1. Anti-Blacklisting is very easy:I Introduce an irrelevant byte in the first sector.

I All segment hashes differ from the stored segment hashes.

I Modified suspect file is not detected.

I Demo.

2. A good technique for whitelisting (see NIST results).

3. Size of segment hash database is large:I 4096 byte block size, SHA-1.

I size of hash databasesize of raw data = 20

4096 = 0.00488=⇒ 1 terabyte of raw data yields a 5 gigabyte hash database.

4. Hash database depends on the hashwindow size.

Harald Baier Hash Functions in Forensics / WS 2011/2012 32/41

Piecewise Hashing

Context Triggered Piecewise Hashes

1. Underlying idea:I Split input data (volume, file) in blocks of variable length.

I The end points of the blocks are determined by a rolling hash:I Its value only depends on the current context.

I A window (e.g. of size 7 bytes) slides over the input.

I Context = Bytes of input data in the current window.

I If rolling hash matches a trigger value, an end point is set.

I Rolling hash function is assumed to be pseudo random: PF .

I Compute for every segment its cryptographic hash.

I The sequence of these segment hashes is the context triggeredpiecewise hash (CTPH) of the input.

I Lookup in hash database for matches.

Harald Baier Hash Functions in Forensics / WS 2011/2012 33/41

Piecewise Hashing

Context Triggered Piecewise Hashes

1. Originally proposed for spam detection (spamsum by AndrewTridgell, 2002)

2. Ported to forensics by Jesse Kornblum, 2006: ssdeep.

Harald Baier Hash Functions in Forensics / WS 2011/2012 34/41

Piecewise Hashing

Piecewise Hashes: The algorithm

Harald Baier Hash Functions in Forensics / WS 2011/2012 35/41

Piecewise Hashing

CTPH: A sample tool

1. ssdeep (based on spamsum).

2. Window size is 7.

3. CTPH is a sequence of printable characters:I Only the least significant 6 bits (LS6B) of a segment hash are

considered.

I LS6B are encoded base64.

4. ssdeep decides about a match:I On base of the edit distance (changing, inserting, ...

characters) of two CTPHs.

I Edit distance is rescaled to a percentage match score.

5. Demo.

Harald Baier Hash Functions in Forensics / WS 2011/2012 36/41

Piecewise Hashing

CTPH: Sample Research Questions

1. Rolling hash:I Shall be efficient and pseudo random.

I Current implementation is fast, but not pseudo random.

I Task: Find a slightly slower, but pseudo random rolling hash.

2. Fragment detection:I Kornblum’s approach fails, if fragments are much smaller than

the original file.

I Task: Find a different approach.

3. Edit distance:I Kornblum’s approach addresses text files (due to spam

detection).

I Task: Find a more general approach, which also addressesimages, videos, ...

Harald Baier Hash Functions in Forensics / WS 2011/2012 37/41

Fuzzy Hashing

Motivation

Foundations of Hash Functions

Use Cases of Hash Functions

Piecewise Hashing

Fuzzy Hashing

Harald Baier Hash Functions in Forensics / WS 2011/2012 38/41

Fuzzy Hashing

Vision

1. Find a similarity preserving hash function.I Fuzzy hash function, denoted by f .

I m and m′ are ’similar’ =⇒ f(m) and f(m′) are ’similar’, too.

2. Find a general (forensic) metric to measure similarity:I Let d : {0, 1}∗ × {0, 1}∗ −→ R+

0 be such a metric.

I d shall be applicable to txt, doc, odt, jpg, bmp, devices, ...

3. f is a function with d(m1, m2) ≈ d(f(m1), f(m2)).

Harald Baier Hash Functions in Forensics / WS 2011/2012 39/41

Fuzzy Hashing

Fuzzy Hashing: Applications

1. Forensics (on the file level): Detect similar files.I Blacklisting:

I Detect manipulated suspicious files.

I Find fragments of suspicious data.

I Whitelisting: Find changed unsuspicious files.

2. Biometrics: Template protection.

3. Malware: Detect obfuscated malware (e.g. metamorphicmalware).

4. Junk mail detection.

Harald Baier Hash Functions in Forensics / WS 2011/2012 40/41

Fuzzy Hashing

Questions?

Harald Baier Hash Functions in Forensics / WS 2011/2012 41/41