Boyer-Moore String Search Algorithm

Michael Sedlmair

Boyer, Robert S., and Moore, J. Strother. "A fast string searching algorithm." Communications of the ACM 20.10 (1977): 762-772.

Jacobs University, Bremen, May 10, 2017. Michael Sedlmair

Preliminaries: Assumptions
• Target audience: CS students, ~2nd term
• Prerequisites:
  - Java basics
• Not required:
  - Complexity theory (O-notation, etc.)
  - …

String search


F I N D I N A H A Y S T A C K N E E D L E I N A

Text txt (with N characters, here N = 24)

N E E D L E

Pattern pat (with M characters, here M = 6)

Naïve algorithms

How would you go about it?

F I N D I N A H A Y S T A C K N E E D L E I N A

N E E D L E

[Animation: the pattern is compared against the text at one alignment after another, shifting right by a single position after each mismatch, until all six characters match.]
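The naïve approach can be sketched in a few lines of Java. This is a minimal illustration, not code from the slides: the class name `NaiveSearch` and the `comparisons` counter are assumptions added to make the cost visible. Returning the text length N when nothing is found is a convention chosen here.

```java
public class NaiveSearch {
    /** Character comparisons made by the most recent call to search (illustrative counter). */
    static int comparisons;

    /** Returns the index of the first occurrence of pat in txt, or txt.length() if there is none. */
    static int search(String txt, String pat) {
        int N = txt.length();
        int M = pat.length();
        comparisons = 0;
        // try every alignment, one position at a time
        for (int i = 0; i <= N - M; i++) {
            int j = 0;
            // compare left to right until a mismatch or the end of the pattern
            while (j < M) {
                comparisons++;
                if (txt.charAt(i + j) != pat.charAt(j)) break;
                j++;
            }
            if (j == M) return i; // all M characters matched
        }
        return N; // not found
    }

    public static void main(String[] args) {
        int pos = search("FINDINAHAYSTACKNEEDLEINA", "NEEDLE");
        // prints "found at 15 after 23 comparisons"
        System.out.println("found at " + pos + " after " + comparisons + " comparisons");
    }
}
```

On the example text the pattern is tried at 16 alignments (0 through 15) before the match is found.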

Analysis

• What is the problem?
• Naïve search is very inefficient: it tries every alignment, moving the pattern by only one position each time
• What could we change?

F I N D I N A H A Y S T A C K N E E D L E I N A

N E E D L E

Better naïve algorithm

F I N D I N A H A Y S T A C K N E E D L E I N A

N E E D L E

‘A’ never occurs in the pattern, so any alignment that overlaps an ‘A’ in the text cannot match: skip!

Boyer-Moore Algorithm

Intuition
• On a mismatch, try to skip as many as M characters (unless we have a reason not to)
• Scan the pattern from right to left

F I N D I N A H A Y S T A C K N E E D L E I N A

N E E D L E

Boyer-Moore: Bad character rule

Upon mismatch, skip alignments until (a) the mismatch becomes a match, or (b) pat moves past the mismatched character.

F I N D I N A H A Y S T A C K N E E D L E I N A

N E E D L E

[Animation, alternating "Checking" and "Skipping" steps:]
• Checking: the last pattern character E is compared with N in the text: mismatch.
• Skipping: shift until the N of the pattern lies under that N. Condition (a)!
• Checking: E is compared with S: mismatch.
• Skipping: S does not occur in the pattern, so the pattern moves past it. Condition (b)!
• Checking: E matches E, then L is compared with N: mismatch.
• Skipping: shift until the N of the pattern lies under that N.
• Checking: all six characters match.
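The rule as stated can be turned into a small helper that computes the shift directly from the pattern, without any precomputed table. This is a sketch under assumptions: the class name `BadCharacterShift` and the method `shift` are hypothetical, `j` is the pattern index where the mismatch happened, and `bad` is the text character found there. It looks only to the left of `j`, so the result is always at least 1.

```java
public class BadCharacterShift {
    /**
     * Shift for a mismatch at pattern index j against text character bad.
     * Condition (a): align the nearest occurrence of bad to the left of j with the text.
     * Condition (b): if there is none, move the pattern completely past the bad character.
     */
    static int shift(String pat, int j, char bad) {
        for (int k = j - 1; k >= 0; k--) {
            if (pat.charAt(k) == bad) {
                return j - k; // condition (a): pat[k] ends up under the bad character
            }
        }
        return j + 1;         // condition (b): pattern start moves just past the bad character
    }

    public static void main(String[] args) {
        System.out.println(shift("NEEDLE", 5, 'N')); // 5: the N at index 0 moves under the text's N
        System.out.println(shift("NEEDLE", 5, 'S')); // 6: S is not in the pattern
        System.out.println(shift("NEEDLE", 4, 'N')); // 4
    }
}
```

Scanning the pattern at every mismatch costs extra time, which is why a precomputed lookup table is the usual choice in practice.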

Analysis
• Idea: If we know how much to skip, we can drastically reduce the steps needed
• Boyer-Moore
• Naïve

A simple Java implementation

Simple algorithm
• Only finds the first occurrence
• Based on remembering the rightmost occurrence of each character in the pattern

Q: How much to skip?

A: Precompute the index of the rightmost occurrence of each character c in the pattern pat (-1 if the character is not in the pattern).

    // create array for all chars
    // R = size of the alphabet (e.g. 128)
    right = new int[R];
    // initialize everything with -1
    for (int c = 0; c < R; c++)
        right[c] = -1;
    // store rightmost occurrence
    // M is the length of the pattern pat
    for (int j = 0; j < M; j++)
        right[pat.charAt(j)] = j;

Example for the pattern N E E D L E (indices 0 1 2 3 4 5). All entries start at -1 and are overwritten while the second loop runs from left to right:

    c    right[c]
    A    -1
    B    -1
    C    -1
    D     3
    E     5   (set to 1, then 2, then 5)
    …    -1
    L     4
    M    -1
    N     0
    …    -1
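The table construction can be tried out in isolation. The following is a minimal sketch: the class name `RightmostTable` and the method `build` are hypothetical, and it assumes every pattern character has a code below R (128 covers ASCII).

```java
import java.util.Arrays;

public class RightmostTable {
    /** Builds right[c] = index of the rightmost occurrence of c in pat, or -1 if c is absent. */
    static int[] build(String pat, int R) {
        int[] right = new int[R];
        Arrays.fill(right, -1);           // nothing seen yet
        for (int j = 0; j < pat.length(); j++) {
            right[pat.charAt(j)] = j;     // later occurrences overwrite earlier ones
        }
        return right;
    }

    public static void main(String[] args) {
        int[] right = build("NEEDLE", 128);
        for (char c : "ADELN".toCharArray()) {
            System.out.println(c + " " + right[c]);
        }
    }
}
```

For "NEEDLE" this prints -1 for A, 3 for D, 5 for E, 4 for L and 0 for N.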

(Simple) Java implementation

    public int search(String txt, String pat) {
        int N = txt.length();
        int M = pat.length();
        int skip;
        // loop over text
        for (int i = 0; i <= N - M; i += skip) {
            skip = 0;
            // loop over pattern, right to left
            for (int j = M - 1; j >= 0; j--) {
                // check if there is a mismatch
                if (pat.charAt(j) != txt.charAt(i + j)) {
                    skip = Math.max(1, j - right[txt.charAt(i + j)]);
                    break;
                }
            }
            // no mismatch: match found at i
            if (skip == 0) return i;
        }
        // not found
        return N;
    }

Variables
- txt: F I N D I N A H A Y S T A C K N E E D L E I N A
- pat: N E E D L E
- N: length of txt
- M: length of pat
- skip: how far the pattern moves after a mismatch

Walkthrough

Loop over pattern. Checking: mismatch?

F I N D I N A H A Y S T A C K N E E D L E I N A

N E E D L E

Skipping: compute skip value

    // compute skip value
    right[txt.charAt(i+j)]
    // (here) = 0
    // we can jump by M-1 chars
    // j = M-1
    // skip = j-0
    // (here) skip = 5

Skip at least by 1, in case the other term is nonpositive

. . . . . . E L E .

N E E D L E

    right[txt.charAt(i+j)]
    // = 5
    // j = M-3
    // skip = j-5 = M-3-5
    // skip = -2

Not jumping back! Math.max(1, -2) moves the pattern forward by 1 instead.

MATCH!

F I N D I N A H A Y S T A C K N E E D L E I N A

N E E D L E (aligned under the N E E D L E in the text)
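To run the search end to end, the table and the loop can be combined in one class. This is a sketch, not the lecture's reference code: the class name `BoyerMooreSimple`, the alphabet size R = 128, and the `skips` list that records each shift are assumptions added for illustration. Text characters with a code of 128 or above would cause an array index error.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BoyerMooreSimple {
    private static final int R = 128;              // assumed alphabet size (ASCII)
    private final String pat;
    private final int[] right = new int[R];
    final List<Integer> skips = new ArrayList<>(); // shifts taken by the last search (illustrative)

    BoyerMooreSimple(String pat) {
        this.pat = pat;
        Arrays.fill(right, -1);
        for (int j = 0; j < pat.length(); j++) right[pat.charAt(j)] = j;
    }

    /** Returns the index of the first occurrence of pat in txt, or txt.length() if there is none. */
    int search(String txt) {
        int N = txt.length();
        int M = pat.length();
        skips.clear();
        int skip;
        for (int i = 0; i <= N - M; i += skip) {
            skip = 0;
            for (int j = M - 1; j >= 0; j--) {     // right to left
                if (pat.charAt(j) != txt.charAt(i + j)) {
                    skip = Math.max(1, j - right[txt.charAt(i + j)]);
                    break;
                }
            }
            if (skip == 0) return i;               // match
            skips.add(skip);
        }
        return N;                                  // not found
    }

    public static void main(String[] args) {
        BoyerMooreSimple bm = new BoyerMooreSimple("NEEDLE");
        System.out.println(bm.search("FINDINAHAYSTACKNEEDLEINA")); // 15
        System.out.println(bm.skips);                              // [5, 6, 4]
        System.out.println(bm.search("XXXELEX"));                  // 7 (not found)
        System.out.println(bm.skips);                              // [1, 6]
    }
}
```

The second call shows the `Math.max(1, ...)` guard at work: the mismatch is at j = 3 against an E, `right['E']` is 5, so the raw value is -2 and the pattern moves forward by 1.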

Analysis (continued)
• Typical case: about N / M character comparisons
• Worst case: about N*M character comparisons

B B B B B B B B B B B B B B B B B B B B B B B B

A B B B B B
  A B B B B B
    etc.

In this worst case every alignment matches five B's from the right before failing at the A, and the pattern then moves by only one position.
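The difference between the typical and the worst case can be checked by counting comparisons. The sketch below is an illustration under assumptions: the class name `BadCharacterCost` and the method `comparisons` are hypothetical, and the alphabet is assumed to be ASCII (128 characters). It repeats the simple bad-character search and returns only the number of character comparisons made until the first match or the end of the text.

```java
import java.util.Arrays;

public class BadCharacterCost {
    /** Counts character comparisons of the simple bad-character search (first occurrence only). */
    static int comparisons(String txt, String pat) {
        int[] right = new int[128];               // assumed ASCII alphabet
        Arrays.fill(right, -1);
        for (int j = 0; j < pat.length(); j++) right[pat.charAt(j)] = j;

        int N = txt.length();
        int M = pat.length();
        int count = 0;
        int i = 0;
        while (i <= N - M) {
            int skip = 0;
            for (int j = M - 1; j >= 0; j--) {
                count++;                          // one comparison per loop iteration
                if (pat.charAt(j) != txt.charAt(i + j)) {
                    skip = Math.max(1, j - right[txt.charAt(i + j)]);
                    break;
                }
            }
            if (skip == 0) break;                 // match found, stop counting
            i += skip;
        }
        return count;
    }

    public static void main(String[] args) {
        String b8 = "BBBBBBBB";
        System.out.println(comparisons("FINDINAHAYSTACKNEEDLEINA", "NEEDLE")); // 10
        System.out.println(comparisons(b8 + b8 + b8, "ABBBBB"));               // 114
    }
}
```

For the all-B text with N = 24 and M = 6 there are N - M + 1 = 19 alignments with 6 comparisons each, which gives 114 and grows like N*M.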

Homework: Do it yourself!

Dale’s cone of experience
- what you heard today
- what you do yourself

Homework
• Implement “extended” Boyer-Moore that …
  - finds all occurrences of the pattern
  - make the algorithm not only remember the right-most occurrence, but all occurrences
• Analysis
  - text: lorem ipsum (http://www.lipsum.com/feed/html)
  - pattern: “nulla”
  - count steps needed
  - (not mandatory: compare to steps needed in simple Boyer Moore and naïve search)
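As a possible starting point for the first homework item only, the simple search can be changed to collect every match instead of returning at the first one. This is a hedged sketch, not a model solution: the class name `BoyerMooreAll` and the method `searchAll` are hypothetical, the alphabet is assumed to be ASCII, and the table still remembers only the rightmost occurrence. After a match the pattern is moved by one position, which is the safest choice but not the fastest.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BoyerMooreAll {
    /** Returns the start indices of all (possibly overlapping) occurrences of pat in txt. */
    static List<Integer> searchAll(String txt, String pat) {
        int[] right = new int[128];               // assumed ASCII alphabet
        Arrays.fill(right, -1);
        for (int j = 0; j < pat.length(); j++) right[pat.charAt(j)] = j;

        int N = txt.length();
        int M = pat.length();
        List<Integer> hits = new ArrayList<>();
        if (M == 0) return hits;                  // assumption: an empty pattern yields no hits

        int skip;
        for (int i = 0; i <= N - M; i += skip) {
            skip = 0;
            for (int j = M - 1; j >= 0; j--) {
                if (pat.charAt(j) != txt.charAt(i + j)) {
                    skip = Math.max(1, j - right[txt.charAt(i + j)]);
                    break;
                }
            }
            if (skip == 0) {                      // match: record it and keep going
                hits.add(i);
                skip = 1;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(searchAll("abracadabra", "abra")); // [0, 7]
        System.out.println(searchAll("aaaa", "aa"));          // [0, 1, 2]
    }
}
```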

Summary & Outlook
• Naïve search algorithm
  - inefficient
• Boyer-Moore algorithm
  - bad character rule
  - simple Java implementation
  - ~ N/M character comparisons
• Homework
  - extended Boyer-Moore implementation
• What’s next
  - Boyer-Moore: good suffix rule

Questions?
