Chapter 3: [1.5ex] Similarity Preserving Hash Functions · Similarity Preserving Hash Functions...

45
Chapter 3: Similarity Preserving Hash Functions Frank Breitinger Hochschule Darmstadt, CASED SoSe 2012 Frank Breitinger Hash Functions in Forensics / SoSe 2012 1/45

Transcript of Chapter 3: [1.5ex] Similarity Preserving Hash Functions · Similarity Preserving Hash Functions...

Page 1: Chapter 3: [1.5ex] Similarity Preserving Hash Functions · Similarity Preserving Hash Functions Frank Breitinger ... I Similarity digests ... Frank Breitinger Hash Functions in Forensics

Chapter 3:

Similarity Preserving Hash Functions

Frank Breitinger

Hochschule Darmstadt, CASED

SoSe 2012

Frank Breitinger Hash Functions in Forensics / SoSe 2012 1/45

Page 2: Chapter 3: [1.5ex] Similarity Preserving Hash Functions · Similarity Preserving Hash Functions Frank Breitinger ... I Similarity digests ... Frank Breitinger Hash Functions in Forensics

Repetition

Similarity Preserving Hashing

Approaches and their Tools

Applications for Fuzzy Hashing

Peculiarities

Frank Breitinger Hash Functions in Forensics / SoSe 2012 2/45

Page 3: Chapter 3: [1.5ex] Similarity Preserving Hash Functions · Similarity Preserving Hash Functions Frank Breitinger ... I Similarity digests ... Frank Breitinger Hash Functions in Forensics

Repetition

Repetition

Similarity Preserving Hashing

Approaches and their Tools

Applications for Fuzzy Hashing

Peculiarities

Frank Breitinger Hash Functions in Forensics / SoSe 2012 3/45

Page 4: Chapter 3: [1.5ex] Similarity Preserving Hash Functions · Similarity Preserving Hash Functions Frank Breitinger ... I Similarity digests ... Frank Breitinger Hash Functions in Forensics

Repetition

Type: Similarity Preserving Hash Functions [1/2]

I Further property: similar inputs yield similar hash values.I Alignment Robustness.I Non-Propagation.I If possible: Also fulfill expectations for cryptographic hash

functions (partly).

I Field of Application:I Detection similar files during a forensic investigation (e.g.,

different versions of files).I Biometrics: Template protection.I Malware: Detect obfuscated malware (e.g., metamorphic

malware).I Junk mail detection.

Frank Breitinger Hash Functions in Forensics / SoSe 2012 4/45

Page 5: Chapter 3: [1.5ex] Similarity Preserving Hash Functions · Similarity Preserving Hash Functions Frank Breitinger ... I Similarity digests ... Frank Breitinger Hash Functions in Forensics

Repetition

Type: Similarity Preserving Hash Functions [2/2]

I Possible representatives:I Segment hashes (Tool dcfldd).I Context-triggered piecewise hashes (Tool ssdeep).I Similarity digests (Tool sdhash).

I Example:I ‘I don’t have any fear at home.’ vs

‘I don’t have any bear at home.’I Is there a match?

SHA-1: no match.ssdeep: similarity of 90%.

Frank Breitinger Hash Functions in Forensics / SoSe 2012 5/45

Page 6: Chapter 3: [1.5ex] Similarity Preserving Hash Functions · Similarity Preserving Hash Functions Frank Breitinger ... I Similarity digests ... Frank Breitinger Hash Functions in Forensics

Similarity Preserving Hashing

Repetition

Similarity Preserving Hashing

Approaches and their Tools

Applications for Fuzzy Hashing

Peculiarities

Frank Breitinger Hash Functions in Forensics / SoSe 2012 6/45

Page 7: Chapter 3: [1.5ex] Similarity Preserving Hash Functions · Similarity Preserving Hash Functions Frank Breitinger ... I Similarity digests ... Frank Breitinger Hash Functions in Forensics

Similarity Preserving Hashing

Problem

Let X ⊆ Σ∗ be the set of inputs, let dx denote a distance functionon X and let x1, x2 ∈ X.

In order to identify the similarity between two inputs x1 and x2 werun into two problems:

1. Both inputs (x1, x2) might be very ‘large’ and thus calculatingdx(x1, x2) is very time consuming.

2. If x1 is known and we want to compare it against x2, we needto have x1 available (disk space problem).

Frank Breitinger Hash Functions in Forensics / SoSe 2012 7/45

Page 8: Chapter 3: [1.5ex] Similarity Preserving Hash Functions · Similarity Preserving Hash Functions Frank Breitinger ... I Similarity digests ... Frank Breitinger Hash Functions in Forensics

Similarity Preserving Hashing

Solution

The aim is to use a compression function that obtain the similarityof the domain.

The Definition in the following is an own creation.

Frank Breitinger Hash Functions in Forensics / SoSe 2012 8/45

Page 9: Chapter 3: [1.5ex] Similarity Preserving Hash Functions · Similarity Preserving Hash Functions Frank Breitinger ... I Similarity digests ... Frank Breitinger Hash Functions in Forensics

Similarity Preserving Hashing

Naming

Several terms for the compression function: similarity preservinghashing, similarity digest, fuzzy hash function or similaritypreserving hash function.

We use the term approach for similarity preserving hashing whichconsists of a

similarity preserving hash function, which is a function / algorithm(which is denoted by h) to build a hash value /fingerprint and a

distance function, (denoted by dy) that outputs a similarity scorefor two hash values / fingerprints. The termfingerprint or hash value is due to cryptographic /traditional hash functions.

Frank Breitinger Hash Functions in Forensics / SoSe 2012 9/45

Page 10: Chapter 3: [1.5ex] Similarity Preserving Hash Functions · Similarity Preserving Hash Functions Frank Breitinger ... I Similarity digests ... Frank Breitinger Hash Functions in Forensics

Similarity Preserving Hashing

Solution: Possible candidates

Let h be called a similarity preserving hash function withh : X −→ Y and let dy denote a distance function on Y . Then apossible solution candidate is a setting of (Y, h, dy) with thefollowing properties:

1. |Y | < |X|.2. h is fast computable.

3. dy is fast computable.

Questions:

I Compare these properties to the ones of cryptographic hashfunctions. What is the difference?

I What is fast?

Frank Breitinger Hash Functions in Forensics / SoSe 2012 10/45

Page 11: Chapter 3: [1.5ex] Similarity Preserving Hash Functions · Similarity Preserving Hash Functions Frank Breitinger ... I Similarity digests ... Frank Breitinger Hash Functions in Forensics

Similarity Preserving Hashing

Valid solution candidates

A solution candidate is valid, if there is a εy:

∀x1, x2 : if dx(x1, x2) ≤ εx,

then dy(h(x1), h(x2)) ≤ εy

This issues is called correctness.

In words:Similar inputs yield similar outputs.

Frank Breitinger Hash Functions in Forensics / SoSe 2012 11/45

Page 12: Chapter 3: [1.5ex] Similarity Preserving Hash Functions · Similarity Preserving Hash Functions Frank Breitinger ... I Similarity digests ... Frank Breitinger Hash Functions in Forensics

Similarity Preserving Hashing

Quality of solution candidates

The quality of a solution candidate can be measured by itscompleteness which is defined as follows:

∀x1, x2 : if dy(h(x1), h(x2)) ≤ εy,

then dx(x1, x2) ≤ εx

In words:Similar outputs imply similar inputs.

Often completeness is a little bit too strict. Thus we introduce theterm p-completeness: Let p be a probability with 0 ≤ p ≤ 1.

∀x1, x2 : if dy(h(x1), h(x2)) ≤ εy,

then P (dx(x1, x2) ≤ εx) ≥ p

Frank Breitinger Hash Functions in Forensics / SoSe 2012 12/45

Page 13: Chapter 3: [1.5ex] Similarity Preserving Hash Functions · Similarity Preserving Hash Functions Frank Breitinger ... I Similarity digests ... Frank Breitinger Hash Functions in Forensics

Similarity Preserving Hashing

Main Goal

1. Overcome ‘drawbacks’ of cryptographic hash functions indifferent application contexts.

2. E.g., for computer forensics the main drawbacks are:I Data acquisition: Integrity of copy is destroyed, if some bits

change.

I White-/Blacklisting:I Suspect files similar to known-to-be-bad-files are not detected.

I Fragments are not detected (due to deletion, fragmentation).

Frank Breitinger Hash Functions in Forensics / SoSe 2012 13/45

Page 14: Chapter 3: [1.5ex] Similarity Preserving Hash Functions · Similarity Preserving Hash Functions Frank Breitinger ... I Similarity digests ... Frank Breitinger Hash Functions in Forensics

Similarity Preserving Hashing

Approaches

Currently known approaches:

I Segment hashes (also called block hashes): Tool dcfldd.

I Context-triggered piecewise hashes: Tool ssdeep.

I Similarity digests: Tool sdhash.

Frank Breitinger Hash Functions in Forensics / SoSe 2012 14/45

Page 15: Chapter 3: [1.5ex] Similarity Preserving Hash Functions · Similarity Preserving Hash Functions Frank Breitinger ... I Similarity digests ... Frank Breitinger Hash Functions in Forensics

Approaches and their Tools

Repetition

Similarity Preserving Hashing

Approaches and their Tools

Applications for Fuzzy Hashing

Peculiarities

Frank Breitinger Hash Functions in Forensics / SoSe 2012 15/45

Page 16: Chapter 3: [1.5ex] Similarity Preserving Hash Functions · Similarity Preserving Hash Functions Frank Breitinger ... I Similarity digests ... Frank Breitinger Hash Functions in Forensics

Approaches and their Tools

Segment Hashes including dcfldd

Frank Breitinger Hash Functions in Forensics / SoSe 2012 16/45

Page 17: Chapter 3: [1.5ex] Similarity Preserving Hash Functions · Similarity Preserving Hash Functions Frank Breitinger ... I Similarity digests ... Frank Breitinger Hash Functions in Forensics

Approaches and their Tools

Segment Hashes

1. Underlying idea:I Split input data (volume, file) in blocks of fixed length.

I Compute for each segment its cryptographic hash.

2. Original aim: Improve integrity of storage media.

Frank Breitinger Hash Functions in Forensics / SoSe 2012 17/45

Page 18: Chapter 3: [1.5ex] Similarity Preserving Hash Functions · Similarity Preserving Hash Functions Frank Breitinger ... I Similarity digests ... Frank Breitinger Hash Functions in Forensics

Approaches and their Tools

Segment Hashes: Example - dcfldd

$ dcfldd if=/dev/hda1 of=image-hda1.dd bs=512 hashwindow=4096 hash=sha256

0 - 4096: da0bd2b16c7cd5acb5695e9d81fb6d832cba85312d87e08d0c675e41b608de50

4096 - 8192: 281f4b8ac2dcda0f3fd9a0642a694fedf829d7567a531b1cfc8925f94eebe7a3

8192 - 12288: 1c05a3c7251b666c1ec4a2b689e25f95a92a311613ce685fdf7cbf41552290e5

12288 - 16384: 6c9c17f271f18587bcccf8f9c6154b4bff764664a3eb8ddf06881168c5c4698b

16384 - 20480: c61cd658e73450dfb0dfc9a1d83cdbddd162d9194d81f27f0516bb107280e841

[REMOVED]

Frank Breitinger Hash Functions in Forensics / SoSe 2012 18/45

Page 19: Chapter 3: [1.5ex] Similarity Preserving Hash Functions · Similarity Preserving Hash Functions Frank Breitinger ... I Similarity digests ... Frank Breitinger Hash Functions in Forensics

Approaches and their Tools

Segment Hashes: Example from NIST

1. Sample tool of Nicholas Harbour (since 2002):I dcfldd: An extension of dd.

I Department of Defense – Computer Forensics Laboratory.

I Provides MD5, SHA-1, SHA-2 family.

2. Evaluation by NIST (Douglas White, 2008):I Hashing of File Blocks: When Exact Matches are not Useful.

I NIST worked on Windows 2000 and XP OS files.

I Main result:I File-based data reduction leaves an average 30% of disk space

for human investigation.

I Incorporating block hashes reduces this to an average of 15%.

I Assist in recognising wiped media.

Frank Breitinger Hash Functions in Forensics / SoSe 2012 19/45

Page 20: Chapter 3: [1.5ex] Similarity Preserving Hash Functions · Similarity Preserving Hash Functions Frank Breitinger ... I Similarity digests ... Frank Breitinger Hash Functions in Forensics

Approaches and their Tools

Segment Hashes: Evaluation

1. Anti-Blacklisting is very easy:I Introduce an irrelevant byte in the first sector.

I All segment hashes differ from the stored segment hashes.

I Modified suspect file is not detected.

2. A good technique for whitelisting (see NIST results).

3. Size of segment hash database is large:I 4096 byte block size, SHA-1.

I size of hash value in bytessize of raw data in bytes = 20

4096 = 0.00488=⇒ 1 terabyte of raw data yields a 5 gigabyte hash database.

4. Hash database depends on the hashwindow size.

Frank Breitinger Hash Functions in Forensics / SoSe 2012 20/45

Page 21: Chapter 3: [1.5ex] Similarity Preserving Hash Functions · Similarity Preserving Hash Functions Frank Breitinger ... I Similarity digests ... Frank Breitinger Hash Functions in Forensics

Approaches and their Tools

Context Triggered Piecewise Hashes includingssdeep

Frank Breitinger Hash Functions in Forensics / SoSe 2012 21/45

Page 22: Chapter 3: [1.5ex] Similarity Preserving Hash Functions · Similarity Preserving Hash Functions Frank Breitinger ... I Similarity digests ... Frank Breitinger Hash Functions in Forensics

Approaches and their Tools

Context Triggered Piecewise Hashes

1. Underlying idea:I Split input data (volume, file) in blocks of variable length.

I Compute a hash value for each segment.

I The sequence of these segment hashes is the context triggeredpiecewise hash of the input.

2. Question: How do we achieve blocks of variable length?

Frank Breitinger Hash Functions in Forensics / SoSe 2012 22/45

Page 23: Chapter 3: [1.5ex] Similarity Preserving Hash Functions · Similarity Preserving Hash Functions Frank Breitinger ... I Similarity digests ... Frank Breitinger Hash Functions in Forensics

Approaches and their Tools

Context Triggered Piecewise Hashes (CTPH)

The end points of the blocks are determined by a pseudo randomfunction called rolling hash:

I Its value only depends on the current context.

I A window (e.g., of size 7 bytes) slides over the input.

I Context = Bytes of input data in the current window.

I If rolling hash matches a trigger value, an end point is set.

Frank Breitinger Hash Functions in Forensics / SoSe 2012 23/45

Page 24: Chapter 3: [1.5ex] Similarity Preserving Hash Functions · Similarity Preserving Hash Functions Frank Breitinger ... I Similarity digests ... Frank Breitinger Hash Functions in Forensics

Approaches and their Tools

CTPH: A sample tool

1. ssdeep (based on spamsum).

2. Window size is 7.

3. CTPH is a sequence of printable characters:I LS6B are encoded base64.

I Only the least significant 6 bits (LS6B) of a block hash areconsidered.

4. ssdeep decides about a match:I On base of the weighted edit distance (changing, inserting and

deleting characters) of two CTPHs.

I Edit distance is rescaled to a percentage match score.

Frank Breitinger Hash Functions in Forensics / SoSe 2012 24/45

Page 25: Chapter 3: [1.5ex] Similarity Preserving Hash Functions · Similarity Preserving Hash Functions Frank Breitinger ... I Similarity digests ... Frank Breitinger Hash Functions in Forensics

Approaches and their Tools

Piecewise Hashes: The algorithm

Frank Breitinger Hash Functions in Forensics / SoSe 2012 25/45

Page 26: Chapter 3: [1.5ex] Similarity Preserving Hash Functions · Similarity Preserving Hash Functions Frank Breitinger ... I Similarity digests ... Frank Breitinger Hash Functions in Forensics

Approaches and their Tools

Piecewise Hashes: The algorithm

Frank Breitinger Hash Functions in Forensics / SoSe 2012 26/45

Page 27: Chapter 3: [1.5ex] Similarity Preserving Hash Functions · Similarity Preserving Hash Functions Frank Breitinger ... I Similarity digests ... Frank Breitinger Hash Functions in Forensics

Approaches and their Tools

CTPH: Sample Research Questions

1. Rolling hash:I Shall be efficient and pseudo random.

I Current implementation is fast, but not pseudo random.

I Task: Find a slightly slower, but pseudo random rolling hash.

2. Fragment detection:I Kornblum’s approach fails, if fragments are much smaller than

the original file.

I Task: Find a different approach.

3. Edit distance:I Kornblum’s approach addresses text files (due to spam

detection).

I Task: Find a more general approach, which also addressesimages, videos, ...

Frank Breitinger Hash Functions in Forensics / SoSe 2012 27/45

Page 28: Chapter 3: [1.5ex] Similarity Preserving Hash Functions · Similarity Preserving Hash Functions Frank Breitinger ... I Similarity digests ... Frank Breitinger Hash Functions in Forensics

Approaches and their Tools

CTPH: Origin & Evaluation

1. Originally proposed for spam detection (spamsum by AndrewTridgell, 2002)

2. Ported to forensics by Jesse Kornblum, 2006: ssdeep.

3. Evaluation by different researchers:I Some publications to improve ssdeep.

I Performance.I Detection rate.I Rolling hash.

I Main conclusion: Fail a security analysis → not usable inforensics.

Frank Breitinger Hash Functions in Forensics / SoSe 2012 28/45

Page 29: Chapter 3: [1.5ex] Similarity Preserving Hash Functions · Similarity Preserving Hash Functions Frank Breitinger ... I Similarity digests ... Frank Breitinger Hash Functions in Forensics

Approaches and their Tools

Similarity Digests including sdhash

Frank Breitinger Hash Functions in Forensics / SoSe 2012 29/45

Page 30: Chapter 3: [1.5ex] Similarity Preserving Hash Functions · Similarity Preserving Hash Functions Frank Breitinger ... I Similarity digests ... Frank Breitinger Hash Functions in Forensics

Approaches and their Tools

Similarity Digests

1. Underlying idea:I Use statistical improbable features from input data (volume,

file) of 64 bytes.

I Generate a cryptographic hash (SHA-1) for each feature.

I Divide each hash value in 5 sub-hashes in order to set five bitswithin a Bloom filter.

Frank Breitinger Hash Functions in Forensics / SoSe 2012 30/45

Page 31: Chapter 3: [1.5ex] Similarity Preserving Hash Functions · Similarity Preserving Hash Functions Frank Breitinger ... I Similarity digests ... Frank Breitinger Hash Functions in Forensics

Approaches and their Tools

Similarity Digests: Example from Roussev

1. Sample tool of Vassil Roussev (since 2010):I sdhash.

I Very popular approach; NIST provides databases.

2. Evaluation by J. Beckingham / F. Breitinger / H. Baier(2012):

I Robust approach with several design errors.

I Performance is slower than ssdeep.I A parallelized version is coming.

Frank Breitinger Hash Functions in Forensics / SoSe 2012 31/45

Page 32: Chapter 3: [1.5ex] Similarity Preserving Hash Functions · Similarity Preserving Hash Functions Frank Breitinger ... I Similarity digests ... Frank Breitinger Hash Functions in Forensics

Approaches and their Tools

Similarity Digests: A sample tool

Preparation:

1. Select statistically improbable features on the basis of theirentropy.

I A ‘feature’ is a (substring-) sequence of 64 consecutive bytes.

Frank Breitinger Hash Functions in Forensics / SoSe 2012 32/45

Page 33: Chapter 3: [1.5ex] Similarity Preserving Hash Functions · Similarity Preserving Hash Functions Frank Breitinger ... I Similarity digests ... Frank Breitinger Hash Functions in Forensics

Approaches and their Tools

Similarity Digests: A sample tool

Feature selection:

2. Shannon entropy score H is calculated where P (Xi) is theempirical probability (i.e., the relative frequency) ofencountering ASCII code i within a feature.

H = −255∑i=0

P (Xi) · log2 (P (Xi))

3. Roussev uses a precedence rank Rprec. The least likelyfeatures measured by its entropy score gets the lowest rank.

Frank Breitinger Hash Functions in Forensics / SoSe 2012 33/45

Page 34: Chapter 3: [1.5ex] Similarity Preserving Hash Functions · Similarity Preserving Hash Functions Frank Breitinger ... I Similarity digests ... Frank Breitinger Hash Functions in Forensics

Approaches and their Tools

Similarity Digests: A sample tool

Identification of popular features:

4. A sliding window of a size 64 is going through all Rprec

values. At each position sdhash increments the Rpop score forthe leftmost feature with the lowest Rprec.

Frank Breitinger Hash Functions in Forensics / SoSe 2012 34/45

Page 35: Chapter 3: [1.5ex] Similarity Preserving Hash Functions · Similarity Preserving Hash Functions Frank Breitinger ... I Similarity digests ... Frank Breitinger Hash Functions in Forensics

Approaches and their Tools

Similarity Digests: A sample tool

Feature selection:

Frank Breitinger Hash Functions in Forensics / SoSe 2012 35/45

Page 36: Chapter 3: [1.5ex] Similarity Preserving Hash Functions · Similarity Preserving Hash Functions Frank Breitinger ... I Similarity digests ... Frank Breitinger Hash Functions in Forensics

Approaches and their Tools

Similarity Digests: A sample tool

Fingerprint generation:

5. Hash underlying byte sequences for all Rpop scores higher thana certain threshold using SHA-1.

I Split hash (160 bit) into 5 sub-hashes of 32 bits and use the11 least significant bits of each sub hash.

I E.g., 24645aa1 3b9b9d0a b6e2da45 89fcd42d 215de81c

Least significant 11 bits of each sub-hash:24645aa1 = aa1 = 1010 1010 0001

11 bits: 010 1010 0001

6. Depending on these 5 x 11 bits set 5 bits within a Bloom filter(size 256 bytes = 2048 bits = 211 bits).

I A Bloom filter is a bit vector / array.

Frank Breitinger Hash Functions in Forensics / SoSe 2012 36/45

Page 37: Chapter 3: [1.5ex] Similarity Preserving Hash Functions · Similarity Preserving Hash Functions Frank Breitinger ... I Similarity digests ... Frank Breitinger Hash Functions in Forensics

Approaches and their Tools

Bloom filter

I Empty Bloom filter is a bit array of m bits, all set to 0.I k different hash functions are defined (e.g., sub-hashes within

sdhash).I Each hash function maps or hashes some set element to one

of the m array positions with a uniform random distribution.

http://wikipedia.de

I To query for an element (test whether it is in the set), feed itto each of the k hash functions to get k array positions

Frank Breitinger Hash Functions in Forensics / SoSe 2012 37/45

Page 38: Chapter 3: [1.5ex] Similarity Preserving Hash Functions · Similarity Preserving Hash Functions Frank Breitinger ... I Similarity digests ... Frank Breitinger Hash Functions in Forensics

Applications for Fuzzy Hashing

Repetition

Similarity Preserving Hashing

Approaches and their Tools

Applications for Fuzzy Hashing

Peculiarities

Frank Breitinger Hash Functions in Forensics / SoSe 2012 38/45

Page 39: Chapter 3: [1.5ex] Similarity Preserving Hash Functions · Similarity Preserving Hash Functions Frank Breitinger ... I Similarity digests ... Frank Breitinger Hash Functions in Forensics

Applications for Fuzzy Hashing

Applications

1. Forensics (on the file level): Detect similar files.I Blacklisting:

I Detect manipulated suspicious files.

I Find fragments of suspicious data.

I Identify similar versions of files.

I Whitelisting: Find changed unsuspicious files?

Frank Breitinger Hash Functions in Forensics / SoSe 2012 39/45

Page 40: Chapter 3: [1.5ex] Similarity Preserving Hash Functions · Similarity Preserving Hash Functions Frank Breitinger ... I Similarity digests ... Frank Breitinger Hash Functions in Forensics

Applications for Fuzzy Hashing

Applications

2. Biometrics: Template protection.I Due to privacy only save hash value.I Why do we need similarity preserving hashing and not

cryptographic hashing (e.g., SHA-1)?

3. Malware Detection:I Detect obfuscated malware (e.g., metamorphic malware).

4. Junk mail detection.

Frank Breitinger Hash Functions in Forensics / SoSe 2012 40/45

Page 41: Chapter 3: [1.5ex] Similarity Preserving Hash Functions · Similarity Preserving Hash Functions Frank Breitinger ... I Similarity digests ... Frank Breitinger Hash Functions in Forensics

Peculiarities

Repetition

Similarity Preserving Hashing

Approaches and their Tools

Applications for Fuzzy Hashing

Peculiarities

Frank Breitinger Hash Functions in Forensics / SoSe 2012 41/45

Page 42: Chapter 3: [1.5ex] Similarity Preserving Hash Functions · Similarity Preserving Hash Functions Frank Breitinger ... I Similarity digests ... Frank Breitinger Hash Functions in Forensics

Peculiarities

Database comparison

Let n be the number of hash values entries within a database.

I Comparison time for cryptographic hash functions: O(log2 n).

I Comparison time for similarity preserving hashing: O(n).I Ordering is not possible.

I Fragment detection.

Frank Breitinger Hash Functions in Forensics / SoSe 2012 42/45

Page 43: Chapter 3: [1.5ex] Similarity Preserving Hash Functions · Similarity Preserving Hash Functions Frank Breitinger ... I Similarity digests ... Frank Breitinger Hash Functions in Forensics

Peculiarities

Drawbacks

1. Compression → size of the hash value.I Fixed size of few bits vs. variable of several thousand bytes

2. Ease of computation → creating a hash value.I Currently cryptographic hash functions are faster.

Frank Breitinger Hash Functions in Forensics / SoSe 2012 43/45

Page 44: Chapter 3: [1.5ex] Similarity Preserving Hash Functions · Similarity Preserving Hash Functions Frank Breitinger ... I Similarity digests ... Frank Breitinger Hash Functions in Forensics

Peculiarities

Whitelisting

Does it make sense to use non-cryptographic hash functions forwhitelisting?

Frank Breitinger Hash Functions in Forensics / SoSe 2012 44/45

Page 45: Chapter 3: [1.5ex] Similarity Preserving Hash Functions · Similarity Preserving Hash Functions Frank Breitinger ... I Similarity digests ... Frank Breitinger Hash Functions in Forensics

Questions?

http://www.glasbergen.com/

Frank Breitinger Hash Functions in Forensics / SoSe 2012 45/45