CPTG286K Programming - Perl Chapter 7: Regular Expressions.

Regular Expressions (aka regex)

• Regular expressions are patterns used to match against a string

• Regular expressions are contained between slashes

• The outcome is either a successful match or a failure to match

• Regexes also drive the substitution and split operations; join, the inverse of split, takes a plain glue string rather than a regex

Simple Uses of regex

while (<>) {            # similar to: grep "abc" filename
    if (/abc/) {        # regex /abc/ is matched against $_
        print;          # prints $_ if it contains abc
    }
}

• Replacing regex /abc/ with:
– /ab*c/ matches an a, followed by 0 or more b's, followed by a c; same as /ab{0,}c/
– /ab+c/ matches an a, followed by 1 or more b's, followed by a c; same as /ab{1,}c/
– /ab?c/ matches an a, followed by 0 or 1 b, followed by a c; same as /ab{0,1}c/

Quantifiers

Symbol Meaning

+ Match 1 or more times

* Match 0 or more times

? Match 0 or 1 time

{n} Match exactly n times

{n,} Match at least n times

{n,m} Match at least n but not more than m times
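The quantifiers in the table can be tried side by side. The following is a minimal sketch, not part of the original slides: the sample strings and the helper name matching are made up for illustration, and qr// (a compiled-pattern quote not covered on these slides) is used only to pass patterns around.

```perl
use strict;
use warnings;

# Hypothetical sample strings, chosen only for illustration.
my @samples = ('ac', 'abc', 'abbbc');

# Return the samples that a compiled pattern matches.
sub matching {
    my ($re) = @_;
    return grep { $_ =~ $re } @samples;
}

my @star  = matching(qr/^ab*c$/);     # zero or more b's
my @plus  = matching(qr/^ab+c$/);     # one or more b's
my @opt   = matching(qr/^ab?c$/);     # zero or one b
my @exact = matching(qr/^ab{3}c$/);   # exactly three b's

print "ab*c:    @star\n";    # ac abc abbbc
print "ab+c:    @plus\n";    # abc abbbc
print "ab?c:    @opt\n";     # ac abc
print "ab{3}c:  @exact\n";   # abbbc
```

The ^ and $ anchors force each pattern to cover the whole string, so the difference between the quantifiers is visible.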

Patterns

• Single-character patterns
– Character class
– Negated character class

• Grouping patterns
– Parentheses
– Multipliers
– Sequence and anchoring
– Alternation

Single-Character Patterns

• Specific single-character match: /a/
• Any non-newline character: /./
• Character class: /[valid_list]/
– /[0-9]/ # or \d, any single digit
– /[a-zA-Z0-9_]/ # or \w, any single word character
– /[ \r\t\n\f]/ # or \s, any single whitespace character
• Negated class: /[^valid_list]/
– /[^0-9]/ # or \D, any single non-digit
– /[^a-zA-Z0-9_]/ # or \W, any single non-word character
– /[^ \r\t\n\f]/ # or \S, any single non-whitespace character
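As a rough illustration of the shorthand classes, the sketch below checks a few single characters against \d, \w, and \s. The test characters and the hash name %classes_for are assumptions made for this example only.

```perl
use strict;
use warnings;

# For each hypothetical test character, record which shorthand classes match it.
my %classes_for;
for my $ch ('7', 'x', '_', ' ', '-') {
    my @hits;
    push @hits, '\d' if $ch =~ /\d/;   # digit
    push @hits, '\w' if $ch =~ /\w/;   # word character
    push @hits, '\s' if $ch =~ /\s/;   # whitespace
    $classes_for{$ch} = join(' ', @hits);
    print "'$ch' matches: ", ($classes_for{$ch} || 'none of the three'), "\n";
}
```

A digit belongs to both \d and \w, while the hyphen belongs to none of the three, so it would match \D, \W, and \S instead.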

Parenthesis grouping

• This grouping is used to “memorize” a pattern, so it can be referenced later

• A memorized pattern is referenced using a backslash and parenthesis grouping number

Examples:
/(a)(b)c\2d\1/; # matches abcbda
/a(.*)b\1c/;    # matches aFREDbFREDc but does not match aXXbXXXc
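The two backreference examples can be checked directly. This is a small sketch under the assumption that a pattern stored with qr// behaves like the literal form; the variable names are illustrative.

```perl
use strict;
use warnings;

# \1 must repeat exactly what the first parenthesis group matched.
my $repeat = qr/a(.*)b\1c/;

my %result;
for my $s ('aFREDbFREDc', 'aXXbXXXc') {
    $result{$s} = ($s =~ $repeat) ? 1 : 0;
    print "$s: ", ($result{$s} ? "match" : "no match"), "\n";
}

# Two groups, referenced in reverse order by \2 and \1.
my $mirror = ('abcbda' =~ /(a)(b)c\2d\1/) ? 1 : 0;
print "abcbda: ", ($mirror ? "match" : "no match"), "\n";
```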

Multiplier grouping

/x{5}/ # matches exactly 5 x’s

/x{5,10}/ # matches 5 to 10 x’s

/fo+ba?r*/ # matches f followed by one or more o's, a b,
           # an optional a, and zero or more r's

/fo{1,}ba{0,1}r{0,}/ # same as /fo+ba?r*/ using general multipliers

• By default, * and + groupings are greedy:
$_ = "Nuts sold here. Come here!";
/N.*here/  # matches "Nuts sold here. Come here" (greedy)
/N.*?here/ # matches "Nuts sold here" (non-greedy)
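To see exactly which text each form consumes, the sketch below wraps both patterns in parentheses and captures the matched portion. The variable names $greedy and $lazy are chosen for this example.

```perl
use strict;
use warnings;

my $text = "Nuts sold here. Come here!";

# In list context a match returns the captured groups.
my ($greedy) = $text =~ /(N.*here)/;    # .* grabs as much as possible
my ($lazy)   = $text =~ /(N.*?here)/;   # .*? grabs as little as possible

print "greedy: $greedy\n";   # Nuts sold here. Come here
print "lazy:   $lazy\n";     # Nuts sold here
```

Neither match includes the trailing punctuation, because the pattern ends at the literal text here.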

Anchor grouping

• \b requires a word boundary for a match
• \B requires NO word boundary for a match
• ^ matches the beginning of the string
• $ matches the end of the string
Examples:
/\bFred\b/; # matches Fred, not Frederick or alFred
/\bFred\B/; # matches Frederick, not Fred Flintstone
/^a/;       # matches strings beginning with a
/c$/;       # matches strings ending in c (before \n)
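The word-boundary examples can be tabulated with a short loop. This is only a sketch; the hash names %whole and %prefix are invented for the example.

```perl
use strict;
use warnings;

my (%whole, %prefix);
for my $s ('Fred', 'Frederick', 'alFred', 'Fred Flintstone') {
    $whole{$s}  = ($s =~ /\bFred\b/) ? 'yes' : 'no';   # Fred as a whole word
    $prefix{$s} = ($s =~ /\bFred\B/) ? 'yes' : 'no';   # Fred starting a longer word
    printf "%-16s whole word: %-3s  word prefix: %s\n",
        $s, $whole{$s}, $prefix{$s};
}
```

Only Frederick satisfies /\bFred\B/, because it is the only string where Fred starts a word and is immediately followed by another word character.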

Alternatives grouping

/al|bert|c/; # matches al or bert or c

/^x|y/; # x at beginning of line, or y anywhere

/^(x|y)/; # either x or y at beginning of line

/songbird|bluebird/; # songbird or bluebird

/(song|blue)bird/; # same, using parentheses

/(a|b)(c|d)/; # ac, ad, bc, or bd
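The difference between /^x|y/ and /^(x|y)/ shows up on a string such as ay. The sketch below uses made-up test strings and hash names to compare the two.

```perl
use strict;
use warnings;

my (%loose, %tight);
for my $s ('xa', 'ay', 'ab') {
    $loose{$s} = ($s =~ /^x|y/)   ? 1 : 0;   # x at start, OR y anywhere
    $tight{$s} = ($s =~ /^(x|y)/) ? 1 : 0;   # x or y, both only at start
    print "$s: /^x|y/ = $loose{$s}, /^(x|y)/ = $tight{$s}\n";
}

# Parentheses also remember which alternative was taken.
my ($kind) = 'bluebird' =~ /(song|blue)bird/;
print "kind of bird: $kind\n";   # blue
```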

Regex Grouping Precedence

• Arranged from highest to lowest precedence:

Name                     Representation
Parentheses              ( ) (?: )
Multipliers              ? + * {m,n} ?? +? *? {m,n}?
Sequence and Anchoring   abc ^ $ \A \Z (?= ) (?! )
Alternation              |

Example:
/a|b*/;     # interpreted as /a|(b*)/, not /(a|b)*/
/a|(?:b*)/; # same, but does not trigger memory to store into \1
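One way to observe the precedence rule is to anchor both groupings and test them on the same strings. This is a sketch with assumed names; the outer (?: ) and the ^ $ anchors are added here only so that each pattern must cover the whole string.

```perl
use strict;
use warnings;

my $alt_of_star = qr/^(?:a|b*)$/;   # how /a|b*/ actually groups
my $star_of_alt = qr/^(?:a|b)*$/;   # the grouping it does NOT mean

my (%first, %second);
for my $s ('a', 'bbb', 'abab') {
    $first{$s}  = ($s =~ $alt_of_star) ? 1 : 0;
    $second{$s} = ($s =~ $star_of_alt) ? 1 : 0;
    print "$s: a|(b*) = $first{$s}, (a|b)* = $second{$s}\n";
}
```

Only the second grouping accepts abab, since the first allows either a single a or a run of b's, never a mixture.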

The pattern binding =~ operator

• Use the =~ operator to bind a pattern to a scalar variable other than the default $_ variable

• To match the regex against $name read from the keyboard:

print "Proceed (y/Y)? ";      # produce prompt
chomp ($name = <STDIN>);      # chomp input
if ($name =~ /^[yY]/) {       # test both cases
    print "Proceeding.";      # display decision
}

Ignoring case & other delimiters

• Append an i to the regex to ignore case:
print "Proceed (y/Y)? ";      # produce prompt
chomp ($name = <STDIN>);      # chomp input
if ($name =~ /^y/i) {         # use either case
    print "Proceeding.";      # display decision
}

• To use a different delimiter:
– Place an m followed by a new character in place of slashes (e.g. a #)
print "Proceed (y/Y)? ";      # produce prompt
chomp ($name = <STDIN>);      # chomp input
if ($name =~ m#^y#i) {        # new # delimiter
    print "Proceeding.";      # display decision
}
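The prompt examples need keyboard input, so here is a non-interactive sketch of the same ideas. The helper is_yes and the sample path are hypothetical, introduced only for illustration.

```perl
use strict;
use warnings;

# Bind the pattern to an argument with =~ and ignore case with the i modifier.
sub is_yes {
    my ($answer) = @_;
    return ($answer =~ /^y/i) ? 1 : 0;
}

for my $answer ('yes', 'Y', 'no') {
    print "$answer: ", (is_yes($answer) ? "proceeding" : "stopping"), "\n";
}

# A custom delimiter avoids escaping slashes inside the pattern.
my $perl_path  = '/usr/bin/perl';
my $in_usr_bin = ($perl_path =~ m#^/usr/bin/#) ? 1 : 0;
print "in /usr/bin: $in_usr_bin\n";
```

Custom delimiters are most useful for patterns such as file paths, where the slash is part of the text being matched.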

Variable Interpolation

• A regex can be constructed from computed strings rather than literals:

$sentence = "Every good bird does fly.";
print "What should I look for? ";   # prompt
$what = <STDIN>;                    # read keyboard
chomp($what);                       # chomp input
if ($sentence =~ /$what/)           # e.g. input [bw]ird matches "bird"
{ print "I saw $what in $sentence\n"; }
else { print "Nope... didn't find it.\n"; }
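A non-interactive variant makes the interpolation easier to experiment with. In this sketch the list of search strings stands in for keyboard input and is purely illustrative; the parentheses around $what are added so the matched text can be reported.

```perl
use strict;
use warnings;

my $sentence = "Every good bird does fly.";

my %found;
for my $what ('[bw]ird', 'fl.', 'fish') {
    # The contents of $what are interpolated and treated as a regex.
    if ($sentence =~ /($what)/) {
        $found{$what} = $1;
        print "pattern $what matched '$1'\n";
    }
    else {
        $found{$what} = undef;
        print "pattern $what did not match\n";
    }
}
```

Because the interpolated text is treated as a pattern, characters such as [ ] and . keep their special meaning.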

Special Read-only Variables

• Upon a successful pattern match, $1, $2, $3… are set to values in \1, \2, \3…

• These read-only variables can be used in later parts of the program:

$_ = "This is a test";
/(\w+)\W+(\w+)/; # match first two words
# $1 is now "This" and
# $2 is now "is"

($first,$second) = /(\w+)\W+(\w+)/;
# $first is now "This" and $second is now "is"
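The same two capture styles apply to any pattern with groups. The sketch below pulls three numeric fields out of a date-like string; the string and the variable names are assumptions for this example.

```perl
use strict;
use warnings;

my $stamp = "2001-02-03";   # hypothetical input

# Style 1: a match in list context returns the captured groups.
my ($year, $month, $day) = $stamp =~ /(\d+)-(\d+)-(\d+)/;
print "list assignment: $year $month $day\n";

# Style 2: read $1, $2, $3 after testing that the match succeeded.
my $reordered = '';
if ($stamp =~ /(\d+)-(\d+)-(\d+)/) {
    $reordered = "$3/$2/$1";
}
print "reordered: $reordered\n";   # 03/02/2001
```

Testing the match before reading $1, $2, $3 matters because a failed match leaves them holding values from an earlier successful match.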

More Read-only Variables

• Use the $& variable to examine the part of the string matching a regex
• $` is the part of the string before the matching part
• $' is the part of the string after the matching part

$_ = "This is a sample string";
/sa.*le/; # matches "sample"
# $` is now "This is a "
# $& is now "sample"
# $' is now " string"

Substitutions

• Use the substitution operator: s/regex/new-string/

• Replacement strings can be variable interpolated

• Can use pattern characters in the regex, and special read-only variables

• Can use ignore case and custom delimiters

• Can use the pattern binding =~ operator
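Each bullet above can be illustrated with a one-line substitution. This is a minimal sketch; all strings and variable names are invented for the example.

```perl
use strict;
use warnings;

# Ignore case, bound to a variable with =~
my $greeting = "Hello, World";
$greeting =~ s/world/Perl/i;            # "Hello, Perl"

# Read-only capture variables in the replacement
my $pair = "first second";
$pair =~ s/(\w+) (\w+)/$2 $1/;          # "second first"

# Custom delimiter, handy when the text contains slashes
my $path = "/usr/bin/perl";
$path =~ s#^/usr/bin#/opt/bin#;         # "/opt/bin/perl"

# Variable interpolation in the replacement string
my $new  = "Wilma";
my $line = "hello Betty";
$line =~ s/Betty/$new/;                 # "hello Wilma"

print "$greeting\n$pair\n$path\n$line\n";
```

As written, each s/// replaces only the first match found in its string.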

Split Function

• The split function splits a string into fields delimited by a regex

$line = "merlyn::118:10:Randal:/home/merlyn:/usr/bin/perl";

@fields = split(/:/,$line); # split $line using : as delimiter

# @fields is now
# ("merlyn", "", "118", "10", "Randal", "/home/merlyn", "/usr/bin/perl")

Splitting in list context

$line = "merlyn::118:10:Randal:/home/merlyn:";

($name,$password,$uid,$gid,$gcos,$home,$shell) = split(/:/,$line); # split $line using : as delimiter

# $name is now "merlyn",
# $password is now "",
# $uid is now "118",
# $gid is now "10",
# $gcos is now "Randal",
# $home is now "/home/merlyn",
# $shell is now undef

The “Default” Split

$_ = "some string";

@words = split;

# nearly the same as @words = split(/\s+/, $_);
# where \s+ specifies 1 or more whitespace characters
# (the default split also skips leading whitespace)

# @words is now ("some","string")
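The default split and an explicit split on /\s+/ differ when the string starts with whitespace. The sketch below shows this with a padded string; the array names are chosen for the example.

```perl
use strict;
use warnings;

$_ = "  some   string ";   # note the leading and trailing spaces

my @by_default = split;             # default: leading whitespace is skipped
my @by_regex   = split(/\s+/, $_);  # regex: a leading empty field is produced

print scalar(@by_default), " fields from default split\n";   # 2
print scalar(@by_regex),   " fields from /\\s+/ split\n";    # 3
```

In both cases the trailing empty field is dropped, so only the leading whitespace makes a difference.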

Join Function

• The join function joins a list of values with a glue string between list elements

• The $line can be reconstructed from @fields using

$line = join(":", @fields); # glue string ":" is not a regex
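Split and join are often used as a pair: split a record, change one field, and join it back. The following sketch assumes the colon-separated record from the split slide; the replacement shell /bin/sh is a made-up value.

```perl
use strict;
use warnings;

my $line = "merlyn::118:10:Randal:/home/merlyn:/usr/bin/perl";

my @fields  = split(/:/, $line);    # 7 fields, the second one empty
my $rebuilt = join(":", @fields);   # glue the fields back together

print "round trip ok\n" if $rebuilt eq $line;

# Change one field (hypothetical new shell) and rebuild the record.
$fields[6] = "/bin/sh";
my $edited = join(":", @fields);
print "$edited\n";   # merlyn::118:10:Randal:/home/merlyn:/bin/sh
```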

Glue Ahead & Trailing Glue

$_ = "some string"; # initialize default string

@words = split; # perform default split

print "@words\n"; # show split result

$result = join("+","",@words); # glue ahead

print "$result\n"; # $result is "+some+string"

$output = join("\n", @words, ""); # trailing glue

print $output; # $output is "some\nstring\n"