regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt...

80
regex regular expressions

Transcript of regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt...

Page 1: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

regexregular expressions

Page 2: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

Everywhere

grep -P perl style regexsed -r extended regexawk ~ /regex/

Page 3: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

Everywhere

Java, Perl, Python

JavaScript, Ruby

PHP:Programmable Hyperlinked PASTA

Page 4: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

Ctrl + F

search: “color”

(exact matches)

Page 5: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

Ctrl + F

colou?r

(metacharacters)

Page 6: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

Special Chars

\ ^ $? * +[ ] |{ } .

( )

Page 7: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

Literal Chars

\escape character

(to match special chars)

Page 8: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

Literal Chars

\

Slash\\dot\.

Page 9: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

Dot

.

matches any 1 character(very powerful)

Page 10: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

Dot

file-201311..\.txt

Page 11: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

Count Modifiers

? * +

allow for repetition

Page 12: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

Count Modifiers

?

0 or 1(optional)

Page 13: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

Count Modifiers

?

colou?r

color colour

Page 14: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

Count Modifiers

+

1 or more(at least 1)

Page 15: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

Count Modifiers

+

ap+le

aple apple apppleale apppp pple

Page 16: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

Count Modifiers

*

any amount(0 or more)

(includes zero)(did i mention zero?)

Page 17: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

Count Modifiers

*

ap*le

aple apple apppleale

Page 18: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

{ }

custom ranges :{3} {2,5}{4,} {,10}

Exact Counts

Page 19: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

{ }

ap{3}le

apppleaple apple ale

Exact Counts

Page 20: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

{ }

hel{2,3}o

hello helllohelo heo

Exact Counts

Page 21: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

{ }

? + *{0,1} {1,} {0,}

Exact Counts

Page 22: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

[ ]

groups set of chars:(treat them as a unit)

Character Sets

Page 23: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

[ ]

gr[ea]y

grey graygreey graay gry groy

Character Sets

Page 24: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

[ ]file-201[23]0101\.txt

file-20120101.txtfile-20130101.txtfile-20110101.txt

Character Sets

Page 25: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

mix w/ count modifiers

[abcdef0123456789]+a0c4f5f51azebra3000

Character Sets

Page 26: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

mix w/ count modifiers

[xo]{2,}xo ox oo xx oxoxo

x o

Character Sets

Page 27: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

[ ]

ranges using ‘-’:[a-z] [A-Z] [0-9]

Character Sets

Page 28: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

[ ]

[a-f0-9]{32}

Character Sets

Page 29: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

[ ]

[a-f0-9]{32}

MD5 strings

Character Sets

Page 30: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

[ ]

he[1!lL]{2}[oO0]

Character Sets

Page 31: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

[ ]

he[1!lL]{2}[oO0]

he!!o heL10 heLLOhe1lo he!l0

Character Sets

Page 32: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

Character Sets

[^ ]

negation sets:matches any NOT in the set

(get more than you think)

Page 33: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

Character Sets

[^ ]file-201[^01]0101\.txt

file-20100101.txtfile-20110101.txtfile-20120101.txtfile-20130101.txt

Page 34: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

Character Sets

[^ ]

he[^1!]{2}[^0]

hexxj help! he999 ...

Page 35: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

\d [0-9]\w [A-Za-z0-9_]\s [ \t\r\n\f]

Short hand

Page 36: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

file-20130[1-6]\d\d\.txt

hello,?\s*world

Short hand

Page 37: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

file-20130[1-6]\d\d\.txt(first half of 2013*)

hello,?\s*world

Short hand

Page 38: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

file-20130[1-6]\d\d\.txt(first half of 2013)

hello,?\s*worldhelloworld hello,world

hello world hello, world

Short hand

Page 39: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

Short hand

\w+@\w+\.\w+

Page 40: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

Short hand

\w+@\w+\.\w+

(very weak email regex)

Page 41: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

! Short hand

Negation Short hand\D [^0-9]\W [^A-Za-z0-9_]\S [^ \t\r\n\f]

Page 42: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

careful :[\d\D]*

! Short hand

Page 43: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

careful :[\d\D]*

(matches everything)

! Short hand

Page 44: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

Anchors

^ \b $

^ Start of the string$ End of the string\b “word boundaries”

Page 45: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

Anchors

^ $^\d+$

^[a-f\d]{32}$^\d{3}-\d{2}-\d{4}$

^14:17:05.*.*the end$

Page 46: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

Anchors

\b“whole words only”

\bword\b

\bhello, world\bwell, hello, world!

Page 47: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

Anchors

zero width

\bhello\b..\bworld\bwell, hello, world!

Page 48: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

Anchors

zero width

\bhello\b\bworld\b(syntax error)

Page 49: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

Alternate

|

OR several regexs“or”

Page 50: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

Alternate

|

cat|dog

cat dog

Page 51: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

Alternate

|

^\d+$|^[a-f\d]{10,}$

Page 52: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

Alternate

|

^\d+$|^[a-f\d]{10,}$1230843

af765cf75c

Page 53: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

Capture Groups

( )

“a regex within a regex”

mix w/ ? * + and |

Page 54: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

Capture Groups

( )

Get GetValue

Page 55: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

Capture Groups

( )

Get GetValue

Get|GetValue

Page 56: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

Capture Groups

( )

Get(Value)?

Page 57: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

Capture Groups

( )

Get(Value)?

Get GetValue

Page 58: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

Capture Groups

( )

(I am|We are) (not )?happy

Page 59: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

Capture Groups

( )

(I am|We are) (not )?happy

I am not happyWe are happy

Page 60: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

Capture Groups

( )

(\d{1,3}\.){3}\d{1,3}

Page 61: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

Capture Groups

( )

(\d{1,3}\.){3}\d{1,3}

(weak IP addr regex)

Page 62: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

Capture Groups

( )

“capture” meaning to be able to reference it later

Page 63: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

Backreferences

\1 \2 \3 ...refer to a matched group

(one|toe)-to-\1

Page 64: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

Backreferences

\1 \2 \3 ...refer to a matched group

(one|toe)-to-\1one-to-one toe-to-toeone-to-toe toe-to-one

Page 65: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

Backreferences

\1 \2 \3 ...

(x|X)(o|O)\1\2\1\2

Page 66: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

Backreferences

\1 \2 \3 ...

(x|X)(o|O)\1\2\1\2

xoxoxo XOXOXOxOxOxO XoXoXo

Page 67: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

Backreferences

\1 \2 \3 ...Non-matching vs optional

(x?)hi\1 hi xhix(x)?hi\1 hi xhix

Page 68: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

Lookarounds

Zero Width Assertions

Assert that pattern is followed or preceded by

something

Page 69: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

Lookahead

(?=...)(?!...)

Assert you will (not) be followed

Page 70: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

Lookahead

(?=...)“positive lookahead”

q(?=u)quiet quilt iraq

Page 71: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

Lookahead

(?!...)“negative lookahead”

q(?!u)quiet quilt iraq

Page 72: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

Lookahead

^(?=US|BR|CA)\w{2}$

Page 73: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

Lookahead

^(?=US|BR|CA)\w{2}$

US BR CABZ RA CH

Page 74: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

Lookahead

“6 letter word containing the word ‘cat’ ”

Page 75: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

Lookahead

“6 letter word containing the word ‘cat’ ”

\b\w{6}\b\w*cat\w*

Page 76: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

Lookahead

“6 letter word containing the word ‘cat’ ”

(?=\b\w{6}\b)\b\w*cat\w*\b

Page 77: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

Look Behind

(?<=...)(?<!...)

asserts that not preceded by something

Page 78: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

Look Behind

(?<=...)(?<!...)

(?<!a)bbab

Page 79: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

Look Behind

(?<=...)(?<!...)

(?<=a)babcb

Page 80: regular expressions · file-201[23]0101\.txt file-20120101.txt file-20130101.txt file-20110101.txt Character Sets

Look arounds

(?!...)assertion is not part of

the match