Programming in Perl regular expressions and m,s operators Peter Verhás January 2002.


Pattern Matching Operator

expression =~ m/regexp/options;

$a = "apple";

print "yes!" if $a =~ m/pp/;

The result is true (1) or false (the empty string, which counts as 0 in numeric context).
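A minimal sketch of what the match operator returns in scalar context. The variable names $fruit, $hit and $miss are illustrative and do not come from the slides.

```perl
use strict;
use warnings;

my $fruit = "apple";

# In scalar context a successful match yields 1 ...
my $hit  = ( $fruit =~ m/pp/ );
# ... and a failed match yields the empty string, which is false.
my $miss = ( $fruit =~ m/xyz/ );

print "hit=[$hit] miss=[$miss]\n";    # prints: hit=[1] miss=[]
```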

M operator options

• g global search
• i case insensitive search
• m multi-line string
• s single line string
• o evaluate once only
• x extended regular expression

Now let us see what a regular expression is; then we will return to the fine points of the m operator.

Regular Expressions

• A regular expression is a string containing joker (wildcard) characters and joker expressions.

• We will look at examples to explain it.

Regular Expression to Verify Email (1)

@mail = ( '[email protected]', 'hab.akukk%mikkamakka@jeno', );

for( @mail ){
  if( /^.*\@\w+\..+$/ ){
    print "$_ seems to be a good eMail\n";
  }else{
    print "$_ bad address\n";
  }
}

OUTPUT:
[email protected] seems to be a good eMail
hab.akukk%mikkamakka@jeno bad address

NOTES:
• $_ is used as the default string to match against
• m is implied when / is used as the delimiter, so the condition is short for $_ =~ m/^.*\@\w+\..+$/
• @ would also work here instead of \@, but \@ is the safe choice because it can never be taken for an array

Regular Expression to Verify Email (2)

/^.*\@\w+\..+$/

• ^ at the start of the string
• .* zero or more of any character
  – * means zero or more of what stands before it
• \@ a single @ character
• \w+ one or more word characters (letters, digits or underscore)
  – + means one or more of what stands before it
• \. one . (dot) character
  – a special regexp character is escaped with \
• .+ one or more of any character
• $ until the end of the string
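The pattern is deliberately loose, and a short experiment shows what it accepts and rejects. The addresses below are hypothetical and chosen only for illustration; the %verdict hash is likewise an assumption of this sketch.

```perl
use strict;
use warnings;

# Hypothetical addresses, not taken from the slides.
my @candidates = (
    'user@example.com',          # text, @, word characters, dot, more text
    'user@localhost',            # no dot after the @ part
    'no-at-sign.example.com',    # no @ at all
    '@x.y',                      # nothing before the @ (.* may be empty)
);

my %verdict;
for my $address (@candidates) {
    $verdict{$address} = ( $address =~ /^.*\@\w+\..+$/ ) ? 'accepted' : 'rejected';
    print "${address} is $verdict{$address}\n";
}
```

The last address is accepted because .* may match nothing, so the pattern is a plausibility check rather than a full address validator.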

Search and Replace Example of Regular Expressions

$text = 'JavaScript is not used on island Java.';

$text =~ s/Java(?!Script)/Borneo/;

print $text;

OUTPUT:
JavaScript is not used on island Borneo.

NOTES:
• Operator s will be discussed later in detail
• (?! ) is a zero-length negative look forward, detailed later

Meta (joker) Character

• . any character except newline
• ^ start of string
• $ end of string
• \ escapes the next character
• \w any word character (letter, digit or underscore)
• \W any non-word character
• \s any white space
• \S any non-white space

These are only examples; there are other meta characters, see the Perl manual.
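A small sketch of \w and \W in action, under the assumption of a made-up sample string. It shows that \w covers digits and the underscore as well as letters.

```perl
use strict;
use warnings;

my $sample = "id_42 = ok";    # hypothetical sample string

# \w+ takes letters, digits and underscore, so it swallows "id_42" whole.
my ($word) = $sample =~ /(\w+)/;

# \W is the opposite class; the first such character here is the space.
my ($first_nonword) = $sample =~ /(\W)/;

print "word=[$word] nonword=[$first_nonword]\n";    # prints: word=[id_42] nonword=[ ]
```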

Parentheses (1)

$text = 'Hook is not used on island Java.';
$text =~ /(Ho(ok))\s(is?).*\3((l|s)(a|l))/;
print "$1 $2 $3 $4 $5 $6\n";
#
$text = 'Hook i not used on island Java.';
$text =~ /(Ho(ok))\s(is?).*\3((l|s)(a|l))/;
print "$1 $2 $3 $4 $5 $6\n";

OUTPUT:
Hook ok is la l a
Hook ok i sl s l

NOTES:
• Numbering is in the order of the opening parentheses
• \3 inside the expression refers back to the text matched by the third pair of parentheses

Parentheses without $n

$text = 'Hook is not used on island Java.';
$text =~ /(Ho(ok))\s(is?).*\3((?:l|s)(a|l))/;
print "$1 $2 $3 $4 $5 .$6.\n";
$text = 'Hook i not used on island Java.';
$text =~ /(Ho(ok))\s(is?).*\3((?:l|s)(a|l))/;
print "$1 $2 $3 $4 $5 .$6.\n";

OUTPUT:
Hook ok is la a ..
Hook ok i sl l ..

NOTES:
• (?: ) groups a sub-expression without creating a reference
• $6 is not set (it prints as an empty string), because only five numbered pairs of parentheses remain

Character classes

• List of characters between [ and ]
• Interval, e.g. [a-f]
• Negative character set [^a-f]
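The two forms of character class can be tried on a few words. This is a sketch only; the word list is hypothetical.

```perl
use strict;
use warnings;

my @tokens = ( 'cafe', 'face', 'zebra', 'BEEF' );    # hypothetical sample words

# Words built only from the letters a to f (the class is case sensitive).
my @only_a_to_f = grep { /^[a-f]+$/ } @tokens;

# Words containing at least one character outside a to f.
my @has_other = grep { /[^a-f]/ } @tokens;

print "@only_a_to_f\n";    # prints: cafe face
print "@has_other\n";      # prints: zebra BEEF
```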

Repetitions

• * zero or more times
• + one or more times
• ? zero or one time
• {n} exactly n times
• {n,} at least n times
• {n,m} at least n times, at most m times

NOTES:
There is {n,} but there is no {,m}. Why? (Hint: {0,m} works, but what could stand in for the missing upper bound in {n,???}?) Perl 5.34 and later also accept {,m}.
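The repetition counts can be compared side by side. In this sketch the input list is made up, and each pattern is anchored with ^ and $ so that the count applies to the whole string.

```perl
use strict;
use warnings;

my @inputs = ( 'b', 'ab', 'aab', 'aaab', 'aaaab' );    # hypothetical inputs

my @optional     = grep { /^a?b$/ }     @inputs;    # zero or one 'a'
my @exactly_two  = grep { /^a{2}b$/ }   @inputs;    # exactly two 'a'
my @at_least_two = grep { /^a{2,}b$/ }  @inputs;    # two or more 'a'
my @up_to_three  = grep { /^a{0,3}b$/ } @inputs;    # at most three 'a'

print "@optional\n";        # prints: b ab
print "@exactly_two\n";     # prints: aab
print "@at_least_two\n";    # prints: aab aaab aaaab
print "@up_to_three\n";     # prints: b ab aab aaab
```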

Greedy repetition

• Repetitions are greedy: they eat as many characters as possible
• A ? after the repetition makes it non-greedy: it eats as few characters as possible

$text = 'Hook is not used on island Java.';
$text =~ /(.*)is/;      #1
print "$1.\n";
$text =~ /(.*?)is/;     #2
print "$1.\n";
$text =~ /(.*?)is.*n/;  #3
print "$1.\n";

OUTPUT:
Hook is not used on .
Hook .
Hook .
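The difference between greedy and non-greedy repetition is easy to see on text with repeated delimiters. A minimal sketch, assuming a made-up markup string:

```perl
use strict;
use warnings;

my $markup = '<b>bold</b> and <i>italic</i>';    # hypothetical sample text

# Greedy: .+ runs to the end and backs off only to the LAST '>'.
my ($greedy) = $markup =~ /<(.+)>/;

# Non-greedy: .+? stops at the FIRST '>' that lets the match succeed.
my ($lazy) = $markup =~ /<(.+?)>/;

print "greedy=[$greedy]\n";    # prints: greedy=[b>bold</b> and <i>italic</i]
print "lazy=[$lazy]\n";        # prints: lazy=[b]
```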

Other extensions

• Other UNIX tools also use simpler, similar regular expressions

• Perl regular expressions are more powerful

List of some extensions on the next slides

Regular expression comment

(?# comment comes here)

• Use comments! Use comments! Use comments! Use comments! Use comments! Use comments! Use comments! Use comments! Use comments! Use comments! Use comments! Use comments!
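A short sketch of comments embedded in a pattern. The date string and the variable names are assumptions made for this example. Note that the comment text must not contain a closing parenthesis or the pattern delimiter.

```perl
use strict;
use warnings;

my $date = '2002-01-15';    # hypothetical sample value

# Each (?# ...) group is ignored by the matcher and only documents the pattern.
my ( $year, $month ) =
    $date =~ /^(\d{4})(?# four-digit year)-(\d{2})(?# two-digit month)/;

print "year=$year month=$month\n";    # prints: year=2002 month=01
```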

Regular Expression Parentheses

• (?: sub expression w/o $n)

(?: We have already met this construct in an earlier example, but this is the proper place to discuss it.)

Positive look forward

(?= subregexp)

$t = 'jamaica rum rum kingston rum';

$t =~ s/([aeoui])(?=\w)/uc($1)/ge;

print $t;

• OUTPUT:
jAmAIca rUm rUm kIngstOn rUm

Example: convert to upper case every vowel that is followed by another word character, that is, every vowel that does not stand at the end of a word.

Negative look forward

(?! subregexp)

$t = 'jamaica rum rum kingston rum';

$t =~ s/([aeoui])(?!\w)/uc($1)/ge;

print $t;

• OUTPUT:
jamaicA rum rum kingston rum

Example: convert to upper case every vowel that stands at the end of a word.
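Two further uses of the look forward constructs, given as a sketch with made-up sample values. The first combines (?= ) and (?! ) to group digits in threes; the second counts occurrences of "Java" that are not part of "JavaScript".

```perl
use strict;
use warnings;

# Put a comma after every digit that is followed by one or more complete
# groups of three digits, with no further digit after the last group.
my $amount = '1234567';                 # hypothetical sample value
( my $grouped = $amount ) =~ s/(\d)(?=(?:\d{3})+(?!\d))/$1,/g;

# Find "Java" only where "Script" does not follow.
my $text  = 'Java, JavaScript, Java';   # hypothetical sample text
my @plain = $text =~ /Java(?!Script)/g;
my $plain_count = scalar @plain;

print "$grouped $plain_count\n";        # prints: 1,234,567 2
```

Because both constructs are zero length, the digits and the word "Script" that are inspected are not consumed by the match.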

Option change inside the regular expression

(?imsx)

• This can be used inside the m/ or s/ operator.
• The g and o options (and e of the s operator) cannot be used this way
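A minimal sketch of switching on case insensitivity from inside the pattern. The sample string and variable names are assumptions. When the option is set inside a group, it applies only until that group closes.

```perl
use strict;
use warnings;

my $dish = "Apple pie";    # hypothetical sample string

# No option: the capital A does not match the lower-case pattern.
my $plain = ( $dish =~ /apple/ ) ? 1 : 0;

# (?i) at the start makes the rest of the pattern case insensitive.
my $inline = ( $dish =~ /(?i)apple/ ) ? 1 : 0;

# Inside a group the option ends with the group, so " Pie" is
# compared case sensitively and fails against " pie".
my $scoped = ( $dish =~ /((?i)apple) Pie/ ) ? 1 : 0;

print "$plain $inline $scoped\n";    # prints: 0 1 0
```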

Now we go back to operator m/ and discuss some details.

M operator array result

@k = "abbabaa" =~ m/(bb).+(a.)/;

print $#k; print ' ',$k[0],' ',$k[1],"\n";

OUTPUT:
1 bb aa

NOTES:
• Parts of the expression are enclosed in ( )
• In list context the match returns the substrings matched by these parts
• $1, $2 ... are the default variables where the substrings are also put

M operator option g (1)

@k = "abbabaa" =~ m/(b)(a)/g;

print $#k,' ',$k[0],' ',$k[1],' ',$k[2],' ',$k[3],"\n";

OUTPUT:
3 b a b a

NOTES:
With option g in list context the match returns the parenthesized substrings of every match, not only of the first one.

M operator option g (2)

$t = "abbabaa";
while( $t =~ m/(ab)(b|a)/g ){
  print pos($t)," $1 $2\n";
}

OUTPUT:
3 ab b
6 ab a
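In scalar context each evaluation with option g continues from where the previous match ended, and pos reports that position. The sketch below uses this to record where every occurrence starts and ends; the sample line and the @spans array are assumptions of the example.

```perl
use strict;
use warnings;

my $line = 'rum and more rum';    # hypothetical sample text
my @spans;

# Each pass through the loop finds the next occurrence after pos($line).
while ( $line =~ /rum/g ) {
    my $end   = pos($line);               # offset just past the match
    my $start = $end - length('rum');     # offset of its first character
    push @spans, "$start-$end";
}

print "@spans\n";    # prints: 0-3 13-16
```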

M operator option i

• Case insensitive match

print '.',"apple" =~ /AppLe/,".\n";
print '.',"apple" =~ /AppLe/i,".\n";

• prints

..
.1.

M operator options m and s

$t = "mah\na\nb";
while( $t =~ /(.?.)$/mg ){ print '.',$1; }
print ".\n";
while( $t =~ /(.?.)$/sg ){ print '.',$1; }
print ".\n";
while( $t =~ /(.?.)$/g ){ print '.',$1; }
print ".\n";

• OUTPUT:
.ah.a.b.
.
b.
.b.

• m lets $ match before every \n in the string, not only at the end
• s lets . match \n as well (otherwise . is any character but \n); the second result is split over two lines because $1 is "\nb"

M operator option o

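• Evaluate (compile) the regular expression only once to save processor time

$t = "al brab";
$a = 'al'; $b = 'rab';
&q;&p;
$b = 'fe';
&q;&p;
sub q { print ' q',$t =~ /$a\sb$b/o }
sub p { print ' p',$t =~ /$a\sb$b/ }

• prints

 q1 p1 q1 p

• q still matches after $b has changed, because with o its expression keeps the values the variables had on first use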

M operator option x

@k = "abbabaa" =~ m/(bb)  #exactly two 'b' get into $1
                    .+    #one or more any-character
                    (a.)  #a letter 'a' and exactly one any-character
                   /x;    #space and comment allowed
print $#k;
print ' ',$k[0],' ',$k[1],"\n";

OUTPUT:
1 bb aa

This option allows spaces and comments inside the regular expression to ease readability; a literal space must be written as '\ '.

Operator s

$text =~ s/regexp/replace/egimosx

• Options:
  – e replace is interpreted as an expression
  – g global search and replace
  – i case insensitive search
  – m string is treated as multi-line
  – o regular expression is evaluated only once
  – s string is treated as single-line
  – x extended syntax for the regexp

Global Search and Replace

$t = "abbab" ;

$t =~ s/ab/aa/g;

print $t;

OUTPUT:
aabaa

Option g replaces all occurrences of the search regular expression with the replacement string.
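The effect of the g and e options can be compared in a few lines. This is a sketch; the second sample sentence and all variable names are made up for the example.

```perl
use strict;
use warnings;

# Without g only the first occurrence is replaced.
( my $first_only = 'abbab' ) =~ s/ab/aa/;

# With g every occurrence is replaced; s returns the number of replacements.
my $all   = 'abbab';
my $count = ( $all =~ s/ab/aa/g );

# With e the replacement part is evaluated as Perl code for each match.
( my $doubled = '3 apples and 5 pears' ) =~ s/(\d+)/$1 * 2/ge;

print "$first_only $all $count\n";    # prints: aabab aabaa 2
print "$doubled\n";                   # prints: 6 apples and 10 pears
```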

m and s operators with different delimiters

• / is the default, but you can use
• ' to have a non-interpolated string
• other non-alphanumeric characters
• () {} [] with matching character pairs
  – in this case write s{search}{replace}

m and s operators with different delimiters example

$text = 'a@bba@bbabb';
@b = ('bba');
$text =~ s{@b}{q}g;
print "$text\n";
$text = 'a@bba@bbabb';
$text =~ s'@b'q'g;
print "$text\n";

OUTPUT:
a@q@qbb
aqbaqbabb

@b is evaluated (interpolated) in the first search but not in the second

Thank you for your kind attention.