Regular Expressions
January 27, 2015
Linux and basic scripting course
Arzu Tugce Guler, a.t.guler@lumc.nl
What is it?
A specific kind of text pattern (a sequence of characters) that can be used for:
• pattern matching with strings
Splitting a text
o e.g. tryptic digestion: cut after R or K, unless the next residue is P
>seq = VGTKCCTKPESERMPCTEDYLSLILNR
>split(/(?!P)(?<=[RK])/, seq)
>> VGTK
>> CCTKPESER
>> MPCTEDYLSLILNR
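The split above is written in a Perl-like notation. A minimal Python sketch of the same idea could look like this, assuming Python 3.7 or later (where re.split also splits on zero-width matches). The helper name tryptic_digest is made up for this illustration.

```python
import re

# Zero-width cut site: just after R or K, but not when the next residue is P.
CUT_SITE = re.compile(r"(?<=[RK])(?!P)")

def tryptic_digest(sequence):
    """Split a protein sequence at tryptic cut sites (hypothetical helper)."""
    # The pattern also matches at the very end of a sequence that ends in
    # R or K, which leaves an empty string behind; drop such empty pieces.
    return [piece for piece in CUT_SITE.split(sequence) if piece]

peptides = tryptic_digest("VGTKCCTKPESERMPCTEDYLSLILNR")
for peptide in peptides:
    print(peptide)
```

The lookbehind and lookahead consume no characters, so no residue is lost at the cut sites.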
What is it?
Finding the text pattern in the input
o e.g. finding certain sequence patterns
Find E[IL]+T IN
Replacing text that matches the pattern with other text
o e.g. translating a DNA sequence to a peptide sequence
…
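As a rough sketch of finding and replacing, the Python snippet below searches for E[IL]+T and translates DNA codon by codon. The peptide and DNA strings are invented for this example, and the codon table is deliberately partial (only the five codons used here); none of this comes from the slides.

```python
import re

# Hypothetical peptide, invented for this example.
peptide = "MKELTAEILLTQ"

# Find: E, then one or more I or L, then T.
hits = re.findall(r"E[IL]+T", peptide)
print(hits)

# Replace: translate DNA codon by codon. Only the five codons used below are
# listed; a real translation needs the full 64-codon table.
CODONS = {"ATG": "M", "AAA": "K", "GAA": "E", "TTA": "L", "ACC": "T"}

def translate(dna):
    """Replace every three-letter codon with its amino acid (X if unknown)."""
    return re.sub(r"[ACGT]{3}", lambda m: CODONS.get(m.group(), "X"), dna)

protein = translate("ATGAAAGAATTAACC")
print(protein)
```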
What is it?
Matching patterns in strings using simple rules and symbols
Some background
Stems from mathematics and computer science theory.
• Mathematical expressions that describe "regular" languages (hence the name)
• Can be implemented using a deterministic finite automaton.
Some background: Finite Automaton
[Figure: state diagram with the STATES START, INCREASE and ACCEPT, connected by TRANSITIONS labelled + and #]
NFA and DFA
• In computer science, regular expressions are normally represented by either
a non-deterministic finite automaton (NFA) or a deterministic finite automaton (DFA).
• Every DFA is also an NFA, and every NFA can be translated into an
equivalent DFA.
• Since this is a short course, we are NOT going to proceed with them;
instead we represent regular expressions with informal/unbound state
diagrams for the sake of simplicity.
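For readers who want to see the idea anyway, here is a minimal sketch of a deterministic finite automaton written as a Python transition table. It is hand-built for the pattern AB+C; the state names and the function dfa_accepts are made up for this illustration and are not how a regex library is actually implemented.

```python
# Transition table: state -> {input symbol: next state}.
TRANSITIONS = {
    "start": {"A": "seen_A"},
    "seen_A": {"B": "seen_B"},
    "seen_B": {"B": "seen_B", "C": "accept"},
    "accept": {},
}

def dfa_accepts(text):
    """Return True if the whole text is accepted by the AB+C automaton."""
    state = "start"
    for symbol in text:
        # A missing transition means the automaton rejects the input.
        state = TRANSITIONS[state].get(symbol)
        if state is None:
            return False
    return state == "accept"

for candidate in ["ABC", "ABBBC", "AC", "ABCB"]:
    print(candidate, dfa_accepts(candidate))
```

Each character is read exactly once and there is never a choice between transitions, which is what makes the automaton deterministic.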
Symbols
[…] matches one of the characters or symbols in the character list
[A] matches A
[AT] matches A or T
Symbols
Character classes can be chained, one position at a time:
[AT] matches A or T
[AT]G, also written [AT][G], matches A or T followed by G
[AT][G][ATGC] matches A or T, followed by G, followed by A, T, G or C
All strings matched by [AT][G][ATGC]:
A G A
A G T
A G G
A G C
T G A
T G T
T G G
T G C
In CCGCGCTGATT the pattern matches TGA.
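The enumeration above can be checked mechanically. The following Python sketch tries every three-letter DNA string against [AT][G][ATGC] and then searches the longer sequence from the slide; the variable names are only illustrative.

```python
import re
from itertools import product

PATTERN = re.compile(r"[AT][G][ATGC]")

# Try all 64 three-letter DNA strings and keep those the pattern accepts.
candidates = ("".join(letters) for letters in product("ATGC", repeat=3))
accepted = [c for c in candidates if PATTERN.fullmatch(c)]
print(accepted)

# Search inside a longer sequence: the match may start anywhere.
hit = PATTERN.search("CCGCGCTGATT")
print(hit.group(), hit.start())
```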
Symbols
^ the beginning of the string
^[AT][G][ATGC]
CCGCGCTGATT  no match (TGA is not at the beginning)
AAGAT  no match
AGAT  matches AGA
TGACA  matches TGA
TGA  matches TGA
TGCGGTCGATT  matches TGC
Symbols
$ the end of the string
[AT][G][ATGC]$
CCGCGCTGATT  no match (TGA is not at the end)
AAGAT  no match
AGAT  no match
TGACA  no match
TGA  matches TGA
GGTCGATTTGC  matches TGC
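A short Python sketch of both anchors, run over the example sequences from the ^ and $ slides. The list and variable names are assumptions made for this illustration.

```python
import re

sequences = [
    "CCGCGCTGATT", "AAGAT", "AGAT", "TGACA",
    "TGA", "TGCGGTCGATT", "GGTCGATTTGC",
]

# ^ pins the pattern to the start of the string, $ pins it to the end.
starts_with = [s for s in sequences if re.search(r"^[AT][G][ATGC]", s)]
ends_with = [s for s in sequences if re.search(r"[AT][G][ATGC]$", s)]

print("start:", starts_with)
print("end:  ", ends_with)
```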
Symbols
+ one or more of the preceding pattern
AB+C
Matches:
A B C
A B B C
A B B B B C
Symbols
* zero or more of the preceding pattern
AB*C
Matches:
A C
A B C
A B B B C
Symbols
? zero or one of the preceding pattern
COLOU?R
Matches:
C O L O R
C O L O U R
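The three quantifiers +, * and ? can be compared side by side. This is a small Python sketch; the helper accepted and the candidate strings are invented for the illustration.

```python
import re

def accepted(pattern, candidates):
    """Return the candidates that the pattern matches in full."""
    return [c for c in candidates if re.fullmatch(pattern, c)]

# + needs at least one B, * also allows none, ? allows at most one U.
plus = accepted(r"AB+C", ["AC", "ABC", "ABBC", "ABBBBC"])
star = accepted(r"AB*C", ["AC", "ABC", "ABBBC", "ABCC"])
optional = accepted(r"COLOU?R", ["COLOR", "COLOUR", "COLOUUR"])

print(plus)
print(star)
print(optional)
```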
Symbols
. one of any character
.AT
Matches:
H A T
C A T
R A T
F A T
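A quick Python check of the dot, using made-up candidates. Note one Python-specific detail that is not on the slide: by default the dot does not match a newline character.

```python
import re

candidates = ["HAT", "CAT", "AT", "7AT", " AT", "\nAT"]

# The dot stands for exactly one character, so "AT" alone is too short.
# Digits and spaces count as characters; a newline does not (by default).
dot_hits = [c for c in candidates if re.fullmatch(r".AT", c)]
print(dot_hits)
```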
Symbols
| or
() is for precedence
TAT(A.|.A)T matches TATxyT, where x or y is A and the other one is any character (for DNA: A, T, G or C)
Matches:
T A T C A T
T A T A G T
T A T A A T
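To see why the parentheses matter, the sketch below compares the grouped pattern with the same pattern written without them. The candidate strings and variable names are invented for this illustration.

```python
import re

GROUPED = r"TAT(A.|.A)T"   # TAT, then (A. or .A), then T
UNGROUPED = r"TATA.|.AT"   # either TATA. or .AT, which is a different pattern

candidates = ["TATCAT", "TATAGT", "TATAAT", "TATCGT", "CAT"]
grouped_hits = [c for c in candidates if re.fullmatch(GROUPED, c)]
ungrouped_hits = [c for c in candidates if re.fullmatch(UNGROUPED, c)]

print(grouped_hits)
print(ungrouped_hits)
```

Without the parentheses, | splits the whole expression in two, so the leading TAT and the trailing T each end up in only one of the alternatives.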
Symbols
• 0-9 any digit from 0 to 9 (used inside brackets: [0-9])
• A-Z any uppercase letter from A to Z (used inside brackets: [A-Z])
• a-z any lowercase letter from a to z (used inside brackets: [a-z])
• [^…] matches any character other than those inside the brackets
• {n,m} matches at least n and at most m of the preceding pattern
• {n,} matches at least n of the preceding pattern
• {,m} matches at most m of the preceding pattern (not supported by every regex flavor)
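The ranges, the negated class and the counted repetitions listed above can be tried out with a few lines of Python. The helper accepted and all candidate strings are assumptions made for this sketch. Python's re module accepts the {,m} form, which is not true of every regex flavor.

```python
import re

def accepted(pattern, candidates):
    """Return the candidates that the pattern matches in full."""
    return [c for c in candidates if re.fullmatch(pattern, c)]

ranges = accepted(r"[0-9A-Za-z]", ["7", "Q", "q", "-"])
negated = accepted(r"[^0-9]", ["7", "Q", "-"])
between = accepted(r"A{2,3}", ["A", "AA", "AAA", "AAAA"])
at_least = accepted(r"A{2,}", ["A", "AA", "AAA", "AAAA"])
at_most = accepted(r"A{,2}", ["", "A", "AA", "AAA"])

for name, result in [("ranges", ranges), ("negated", negated),
                     ("between", between), ("at_least", at_least),
                     ("at_most", at_most)]:
    print(name, result)
```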
Symbols
[^PRK][^RK]{4,}[RK]
1. Any character except P, R and K
2. Followed by at least 4 characters that are neither R nor K
3. Followed by R or K
Tryptic peptide with no missed cleavage
Minimum 6 amino acids
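As a hedged illustration, the pattern can be used in Python to filter a list of peptides. The three peptides are the pieces of the tryptic digestion example from the first slide; the variable names are made up.

```python
import re

PEPTIDE = re.compile(r"[^PRK][^RK]{4,}[RK]")

peptides = ["VGTK", "CCTKPESER", "MPCTEDYLSLILNR"]

# fullmatch requires the whole peptide to fit the pattern.
kept = [p for p in peptides if PEPTIDE.fullmatch(p)]
print(kept)
```

VGTK is rejected because it is shorter than 6 residues, and CCTKPESER because it contains an internal K.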
Symbols
[0-9A-Z]{3,6}_DANRE
1. At least 3 and at most 6 occurrences of a digit or an uppercase letter
2. Followed by _DANRE
UniProt zebrafish entry
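A minimal Python sketch of this pattern in use. The entry names below are invented placeholders, not real UniProt entries.

```python
import re

ENTRY = re.compile(r"[0-9A-Z]{3,6}_DANRE")

# Hypothetical entry names, made up for this example.
names = ["ABC12_DANRE", "AB_DANRE", "ABC12_HUMAN", "ABCDEFG_DANRE"]

zebrafish = [n for n in names if ENTRY.fullmatch(n)]
print(zebrafish)
```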
Regular Expression flavors
• Basic regular expressions are normally supported by all utilities that support regular expressions.
• Sometimes support for extended regular expressions is needed for specific regular expressions.
Practicing Regex
An online tool for visually testing regex:
https://www.debuggex.com/