Programming in Perl: regular expressions and m, s operators

Peter Verhás, January 2002

Pattern Matching Operator

expression =~ m/regexp/options;

$a = "apple";

print "yes!" if $a =~ m/pp/;

The result is true (1) or false (the empty string, which counts as 0 in numeric context).
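A minimal sketch of using the match result as a condition. The word list and variable names are illustrative and not from the original slides.

```perl
use strict;
use warnings;

# Sample strings chosen for illustration only.
my @words = ('apple', 'orange');

my @results;
for my $word (@words) {
    # In scalar context m// yields a true or a false value.
    my $found = ($word =~ m/pp/) ? 'yes' : 'no';
    push @results, "$word: $found";
}

print "$_\n" for @results;
```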

M operator options

• g global search
• i case insensitive search
• m multi-line string
• s single line string
• o evaluate once only
• x extended regular expression

Now let’s see what a regular expression is; then we will return to the finer points of the m operator.

Regular Expressions

• A regular expression is a string that describes a pattern using wildcard (joker) characters and wildcard expressions.

• We will look at examples to explain it.

Regular Expression to Verify Email (1)

@mail = ( '[email protected]', 'hab.akukk%mikkamakka@jeno', );

for( @mail ){
    if( /^.*\@\w+\..+$/ ){
        print "$_ seems to be a good eMail\n";
    }else{
        print "$_ bad address\n";
    }
}

OUTPUT:
[email protected] seems to be a good eMail
hab.akukk%mikkamakka@jeno bad address

NOTES:
$_ is used as the default string to match against
m/ is the default operator when only / is written
The full form is $_ =~ m/^.*@\w+\..+$/
@ would also work instead of \@, but \@ is safe

Regular Expression to Verify Email (2)

/^.*\@\w+\..+$/

• ^ at the start of the string
• .* zero or more of any character
  – * means zero or more of what stands before it
• \@ a single @ character
• \w+ one or more word characters (letters, digits, underscore)
  – + means one or more of what stands before it
• \. one . (dot) character
  – a special regexp character is escaped with \
• .+ one or more of any character
• $ until the end of the string
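The pattern dissected above can be tried on a few more inputs. This is only a sketch: the helper name looks_like_email and the test addresses are assumptions made for illustration, and the pattern is a loose plausibility check rather than a real address validator.

```perl
use strict;
use warnings;

# Same pattern as on the slide, wrapped in a helper (the name is an assumption).
sub looks_like_email {
    my ($address) = @_;
    return $address =~ /^.*\@\w+\..+$/ ? 1 : 0;
}

# Hypothetical test addresses.
for my $address ('user@example.com', 'user@localhost', 'no-at-sign.example.com') {
    printf "%-25s %s\n", $address,
        looks_like_email($address) ? 'accepted' : 'rejected';
}
```

The second address is rejected because nothing after the @ contains a dot, the third because it has no @ at all.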

Search and Replace Example of Regular Expressions

$text = 'JavaScript is not used on island Java.';

$text =~ s/Java(?!Script)/Borneo/;

print $text;

OUTPUT:
JavaScript is not used on island Borneo.

NOTES:
Operator s will be discussed later in detail
(?! ) is a zero-length negative look-ahead, detailed later

Meta (joker) Characters

• . any character but new line
• ^ start of string
• $ end of string
• \ escaping the next character
• \w any word character (letter, digit or underscore)
• \W any non-word character
• \s any white space
• \S any non-white space

These are only examples; there are other meta characters, see the Perl manual.
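A short sketch of several of these meta characters at work. The input line is hypothetical and the variable names are chosen for illustration.

```perl
use strict;
use warnings;

my $line = "id_42\tPerl 5";   # hypothetical input: word, tab, two more words

my ($first_word) = $line =~ /^(\w+)/;      # \w also covers digits and "_"
my ($last_word)  = $line =~ /(\S+)$/;      # non-whitespace run at the end
my @fields       = split /\s+/, $line;     # split on runs of white space
my $has_dot      = $line =~ /\./ ? 1 : 0;  # escaped dot: a literal "."

print "first: $first_word, last: $last_word, fields: ", scalar(@fields), "\n";
print "contains a dot: $has_dot\n";
```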

Parentheses (1)

$text = 'Hook is not used on island Java.';
$text =~ /(Ho(ok))\s(is?).*\3((l|s)(a|l))/;
print "$1 $2 $3 $4 $5 $6\n";
#
$text = 'Hook i not used on island Java.';
$text =~ /(Ho(ok))\s(is?).*\3((l|s)(a|l))/;
print "$1 $2 $3 $4 $5 $6\n";

OUTPUT:
Hook ok is la l a
Hook ok i sl s l

NOTES:
Numbering is in the order of the opening parentheses

Parentheses without $n

$text = 'Hook is not used on island Java.';
$text =~ /(Ho(ok))\s(is?).*\3((?:l|s)(a|l))/;
print "$1 $2 $3 $4 $5 .$6.\n";
$text = 'Hook i not used on island Java.';
$text =~ /(Ho(ok))\s(is?).*\3((?:l|s)(a|l))/;
print "$1 $2 $3 $4 $5 .$6.\n";

OUTPUT:
Hook ok is la a ..
Hook ok i sl l ..

NOTES:
(?: ) groups a sub-expression without creating a reference
$6 is undefined here and prints as an empty string, because only five groups capture

Character classes

• List of characters between [ and ]
• Interval, e.g. [a-f]
• Negative character set, e.g. [^a-f]
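The three forms of character class can be sketched as follows. The candidate strings and variable names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical candidates for a lower-case hexadecimal string.
my @candidates = ('ff00a3', 'FF00A3', 'gg00a3');

my %is_hex;
for my $candidate (@candidates) {
    # [0-9a-f] combines two intervals in a single class.
    $is_hex{$candidate} = $candidate =~ /^[0-9a-f]+$/ ? 1 : 0;
}

# A negated class matches any character NOT in the list.
my ($before_colon) = 'key:value' =~ /^([^:]+)/;

print "$_ => $is_hex{$_}\n" for @candidates;
print "before colon: $before_colon\n";
```

The upper-case candidate fails because the class lists only lower-case letters; matching is case sensitive unless the i option is used.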

Repetitions

• * zero or more times
• + one or more times
• ? zero or one time
• {n} exactly n times
• {n,} at least n times
• {n,m} at least n times, at most m times

NOTES:
There is {n,} but there is not {,m} (Perl 5.34 and later do accept {,m})
Why? (hint: {0,m} works, but {n,???}??)
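A brief sketch of the counted repetitions, including {0,m} as the portable way to say "at most m". The sample strings and variable names are made up for illustration.

```perl
use strict;
use warnings;

my $year_ok   = '2002' =~ /^[0-9]{4}$/ ? 1 : 0;  # exactly four digits
my $too_short = '202'  =~ /^[0-9]{4}$/ ? 1 : 0;  # only three digits
my $range_ok  = 'aaa'  =~ /^a{2,3}$/   ? 1 : 0;  # two or three times
my $range_bad = 'aaaa' =~ /^a{2,3}$/   ? 1 : 0;  # four is too many
my $at_least  = 'aaaa' =~ /^a{2,}$/    ? 1 : 0;  # two or more
my $at_most   = 'aa'   =~ /^a{0,3}$/   ? 1 : 0;  # {0,m}: at most three

print "$year_ok $too_short $range_ok $range_bad $at_least $at_most\n";
```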

Greedy repetition

• Repetitions are greedy, eat as many characters as possible

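$text = 'Hook is not used on island Java.';
$text =~ /(.*)is/;       #1
print "$1.\n";
$text =~ /(.*?)is/;      #2
print "$1.\n";
$text =~ /(.*?)is.*n/;   #3
print "$1.\n";

OUTPUT:
Hook is not used on .
Hook .
Hook .

A ? after the repetition (as in .*?) makes it non-greedy: it eats as few characters as possible.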

Other extensions

• Other UNIX tools also use simpler, similar regular expressions

• Perl regular expressions are more powerful

List of some extensions on the next slides

Regular expression comment

(?# comment comes here)

• Use comments!
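A small sketch of (?# ) comments inside a pattern. The date string and the variable names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical date string in year-month-day form.
my $date = '2002-01-15';

# Each (?# ...) is ignored by the matcher and ends at the first ")".
my ($year, $month, $day) =
    $date =~ /^(?# year)([0-9]{4})-(?# month)([0-9]{2})-(?# day)([0-9]{2})$/;

print "year=$year month=$month day=$day\n";
```

A comment of this kind cannot contain a closing parenthesis or the pattern delimiter.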

Regular Expression Parentheses

• (?: sub expression w/o $n)

(?: ) was already discussed when it came up in an earlier example, but this is the proper place for this construct.

Positive look forward

(?= subregexp)

$t = 'jamaica rum rum kingston rum';

$t =~ s/([aeoui])(?=\w)/uc($1)/ge;

print $t;

• OUTPUT:
jAmAIca rUm rUm kIngstOn rUm

Example: convert to upper case every vowel that is followed by another word character, i.e. that does not end a word.

Negative look forward

(?! subregexp)

$t = 'jamaica rum rum kingston rum';

$t =~ s/([aeoui])(?!\w)/uc($1)/ge;

print $t;

• OUTPUT:
jamaicA rum rum kingston rum

Example: convert to upper case every vowel that stands at the end of a word.

Option change inside the regular expression

(?imsx)

• This can be used inside the m/ or s/ operator.
• The o and g options cannot be set this way
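A sketch of switching on an option from inside the pattern. The sample names and the %hit hash are illustrative only.

```perl
use strict;
use warnings;

my @names = ('Perl', 'PERL', 'pearl');   # sample strings, for illustration

my %hit;
for my $name (@names) {
    # (?i) turns on case-insensitive matching from inside the pattern,
    # so no trailing i option is needed.
    $hit{$name} = $name =~ /^(?i)perl$/ ? 'match' : 'no match';
    print "$name: $hit{$name}\n";
}
```

This form is convenient when the pattern arrives as a string, for example from a configuration file, and the option has to travel with it.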

Now we go back to operator m/ and discuss some details.

M operator array result

@k = "abbabaa" =~ m/(bb).+(a.)/;

print $#k; print ' ',$k[0],' ',$k[1],"\n";

OUTPUT:
1 bb aa

NOTES:
Parts of the expression are enclosed in ( )
$1, $2 ... are the default variables where the matched substrings are put; in list context the match also returns them as a list

M operator option g (1)

@k = "abbabaa" =~ m/(b)(a)/g;

print $#k,' ',$k[0],' ',$k[1],' ',$k[2],' ',$k[3],"\n";

OUTPUT:
3 b a b a

NOTES:
In list context with option g, m returns the captured substrings of every match, not only of the first one

M operator option g (2)

$t = "abbabaa";

while( $t =~ m/(ab)(b|a)/g ){

print pos($t)," $1 $2\n";

}

OUTPUT:3 ab b

6 ab a

M operator option i

• Case insensitive match

print '.',"apple" =~ /AppLe/,".\n";
print '.',"apple" =~ /AppLe/i,".\n";

• prints
..
.1.

M operator options m and s

$t = "mah\na\nb";
while( $t =~ /(.?.)$/mg ){ print '.',$1; }
print ".\n";
while( $t =~ /(.?.)$/sg ){ print '.',$1; }
print ".\n";
while( $t =~ /(.?.)$/g ){ print '.',$1; }
print ".\n";

• OUTPUT:
.ah.a.b.
.
b.
.b.

m lets $ match before every \n in the string
s lets . match \n (otherwise . is any character but \n); the line break in the middle of the output is the \n captured in $1

M operator option o

• Evaluate the regular expression only once to save processor time

$t = "al brab";
$a = 'al'; $b = 'rab';
&q; &p;
$b = 'fe';
&q; &p;
sub q { print ' q',$t =~ /$a\sb$b/o }
sub p { print ' p',$t =~ /$a\sb$b/ }

• prints
 q1 p1 q1 p

q still matches after $b has changed, because with o its pattern was built only once

M operator option x

@k = "abbabaa" =~ m/(bb)   #exactly two 'b' characters, captured as group 1
                  .+     #one or more any-character
                  (a.)   #a letter 'a' and exactly one any-character
                 /x;     #space and comment allowed

print $#k;

print ' ',$k[0],' ',$k[1],"\n";

OUTPUT:1 bb aa

This option allows white space and comments inside the pattern to ease readability; a literal space has to be escaped as "\ " (backslash followed by a space).

Operator s

$text =~ s/regexp/replace/egimosx

• Options:
  – e replace is interpreted as an expression
  – g global search and replace
  – i case insensitive search
  – m string is treated as multi-line
  – o regular expression is evaluated only once
  – s string is treated as single-line
  – x extended syntax for the regexp
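A sketch of the e, g and i options on the s operator. The input string and the variable names are hypothetical.

```perl
use strict;
use warnings;

my $prices = 'tea 3, rum 12';   # hypothetical input

# e: the replacement is Perl code; g: apply it to every match.
(my $doubled = $prices) =~ s/([0-9]+)/$1 * 2/ge;

# i: the search ignores case.
(my $renamed = $prices) =~ s/RUM/wine/i;

print "$doubled\n";
print "$renamed\n";
```

The form (my $copy = $original) =~ s/.../.../ copies the string first, so the original stays unchanged.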

Global Search and Replace

$t = "abbab" ;

$t =~ s/ab/aa/g;

print $t;

OUTPUT:
aabaa

The g option replaces all occurrences of the search regular expression with the replacement string.

m and s operators with different delimiters

• / is the default, but you can use
• ' to have a non-interpolated string
• other non-alphanumeric characters
• () {} [] with matching character pairs
  – in this case s{search}{replace}

m and s operators with different delimiters example

$text = 'a@bba@bbabb';
@b = ('bba');
$text =~ s{@b}{q}g;
print "$text\n";
$text = 'a@bba@bbabb';
$text =~ s'@b'q'g;
print "$text\n";

OUTPUT:
a@q@qbb
aqbaqbabb

@b is evaluated in the first search but not in the second

Thank you for your kind attention.