String Matching in Hardware using the FM-Index



Page 1: String Matching in Hardware using the FM-Index

String Matching in Hardware using the FM-Index

Authors: Edward Fernandez, Walid Najjar, and Stefano Lonardi

Publisher: FCCM, 2011

Presenter: Jia-Wei,You

Date: 2012/4/11

Page 2: String Matching in Hardware using the FM-Index

Introduction

• String matching is the problem of searching for patterns in a long text.

• A recent breakthrough in this field is the FM-index, a data structure that synergistically combines the Burrows-Wheeler transform and the suffix array.

• The FM-index approach is compared against a brute-force approach and is shown to have a higher effective throughput. This is due to the higher number of character comparisons per cycle performed by the FM-index.

Page 3: String Matching in Hardware using the FM-Index

Burrows-Wheeler transform
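
The figure for this slide is not part of the transcript, so the following is only a minimal sketch of how the transform can be computed, not the authors' hardware method. It assumes the text ends with a unique terminator `$` that sorts before every other character, and it builds the suffix array by naive sorting, which is fine for a small example but not for long texts. The function names `suffix_array` and `bwt` are illustrative. The input is the example string Q from the I-table & C-table slide.

```python
def suffix_array(text):
    """Return the start positions of all suffixes of text in sorted order.

    Naive construction by sorting the suffixes directly; assumes text ends
    with a unique '$' that compares lower than every other character.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt(text):
    """Burrows-Wheeler transform: the character preceding each sorted suffix."""
    # For the suffix starting at 0, text[-1] wraps around to the terminator.
    return "".join(text[i - 1] for i in suffix_array(text))


if __name__ == "__main__":
    q = "GCTAATTAGGTACC$"
    print(bwt(q))                   # BWT(Q)
    print("".join(sorted(bwt(q))))  # SBWT(Q): the characters of BWT(Q), sorted
```

Sorting the characters of BWT(Q) gives the first column of the sorted suffixes, which is what the slides call SBWT(Q).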

Page 4: String Matching in Hardware using the FM-Index

I-table & C-table

Q = GCTAATTAGGTACC$

BWT(Q) = CTTTACAG$AGCGTA

SBWT(Q) = $AAAACCCGGGTTTT
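
The table figure itself did not survive in the transcript, so here is a hedged sketch of how the two tables could be built for the BWT(Q) string given above. It assumes the usual FM-index definitions: the C-table maps each character to the index of its first occurrence in SBWT(Q), and the I-table holds, for every position i, how many times each character occurs in the first i characters of BWT(Q). The name `build_tables` and the dictionary layout are illustrative choices, not the paper's hardware layout.

```python
def build_tables(bwt_text):
    """Build an assumed C-table and I-table from a BWT string.

    c_table[ch]    = index of the first occurrence of ch in sorted(bwt_text)
    i_table[ch][i] = number of occurrences of ch in bwt_text[:i]
    """
    alphabet = sorted(set(bwt_text))

    # C-table: running total of the counts of all smaller characters.
    c_table = {}
    total = 0
    for ch in alphabet:
        c_table[ch] = total
        total += bwt_text.count(ch)

    # I-table: one column of running occurrence counts per character.
    i_table = {ch: [0] * (len(bwt_text) + 1) for ch in alphabet}
    for i, sym in enumerate(bwt_text):
        for ch in alphabet:
            i_table[ch][i + 1] = i_table[ch][i] + (1 if sym == ch else 0)

    return c_table, i_table


if __name__ == "__main__":
    c_table, i_table = build_tables("CTTTACAG$AGCGTA")
    print(c_table)
    print(i_table["T"])
```

A full I-table costs one counter per character per text position; practical implementations often sample it, but that trade-off is outside what the transcript covers.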

Page 5: String Matching in Hardware using the FM-Index

Searching and locating

Pattern searching using the FM-index starts by initializing the top and bottom pointers to the first and last indices of the C-table respectively.

The pattern is processed one character at a time, beginning with its last character.

As each character is processed, the top and bottom pointers move to new suffix array indices, determined by the current character and the pointers' current positions.
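
The search steps above can be sketched in software as a standard FM-index backward search. This is an illustrative sketch under stated assumptions, not the paper's hardware design: the pointers start out spanning the whole suffix array as a half-open range [top, bottom), the occurrence table `occ` plays the role of the I-table (counts of each character in a prefix of the BWT), and the function name `fm_locate` is hypothetical. Everything is rebuilt on each call to keep the example self-contained.

```python
def fm_locate(text, pattern):
    """Return the sorted start positions of pattern in text via backward search.

    Assumes text ends with a unique '$' terminator that sorts first.
    """
    # Suffix array (naive) and BWT.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(bwt))

    # C-table: index of each character's first occurrence in the sorted BWT.
    c_table, total = {}, 0
    for ch in alphabet:
        c_table[ch] = total
        total += bwt.count(ch)

    # Occurrence table (I-table role): occ[ch][i] = count of ch in bwt[:i].
    occ = {ch: [0] for ch in alphabet}
    for sym in bwt:
        for ch in alphabet:
            occ[ch].append(occ[ch][-1] + (1 if sym == ch else 0))

    # Pointers initially cover the entire suffix array.
    top, bottom = 0, len(bwt)
    for ch in reversed(pattern):  # last pattern character first
        if ch not in c_table:
            return []
        top = c_table[ch] + occ[ch][top]
        bottom = c_table[ch] + occ[ch][bottom]
        if top >= bottom:  # empty range: the pattern does not occur
            return []

    # Every suffix array entry left in the range is a match position.
    return sorted(sa[top:bottom])


if __name__ == "__main__":
    q = "GCTAATTAGGTACC$"
    print(fm_locate(q, "TA"))
    print(fm_locate(q, "GGT"))
    print(fm_locate(q, "CCC"))
```

Each pattern character costs two table lookups regardless of the text length, which is why the number of steps depends on the pattern length rather than on the size of the text.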

Page 6: String Matching in Hardware using the FM-Index

Searching and locating (1/3)

Page 7: String Matching in Hardware using the FM-Index

Searching and locating (2/3)

Page 8: String Matching in Hardware using the FM-Index

Searching and locating (3/3)

Page 9: String Matching in Hardware using the FM-Index

Architecture

Page 10: String Matching in Hardware using the FM-Index

Performance (1/3)

Xilinx Virtex-6 (XC6VLX760)

262144 characters

Page 11: String Matching in Hardware using the FM-Index

Performance (2/3)

Page 12: String Matching in Hardware using the FM-Index

Performance (3/3)
