Programming in Perl regular expressions and m,s operators

Programming in Perl regular expressions and m,s operators Peter Verhás January 2002.


Transcript of Programming in Perl regular expressions and m,s operators

Page 1: Programming in  Perl regular expressions and  m,s  operators

Programming in Perl
regular expressions and m,s operators

Peter Verhás
January 2002.

Page 2: Programming in  Perl regular expressions and  m,s  operators

Pattern Matching Operator

expression =~ m/regexp/options;

$a = "apple";

print "yes!" if $a =~ m/pp/;

The result is TRUE (1) or FALSE (the empty string, which counts as 0 in numeric context).
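A minimal sketch of what the operator returns in scalar context. The variable names are only illustrative and are not part of the original slides.

```perl
use strict;
use warnings;

my $fruit = "apple";

# Scalar context: 1 on success, the empty string on failure.
my $hit  = $fruit =~ m/pp/;
my $miss = $fruit =~ m/xyz/;

print "hit  = '$hit'\n";    # hit  = '1'
print "miss = '$miss'\n";   # miss = ''
```

This is why a failed match prints nothing at all when its result is printed directly.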

Page 3: Programming in  Perl regular expressions and  m,s  operators

M operator options

• g global search
• i case insensitive search
• m multi-line string
• s single line string
• o evaluate once only
• x extended regular expression

Now let’s see what a regular expression is; then we will return to the fine points of the m operator.

Page 4: Programming in  Perl regular expressions and  m,s  operators

Regular Expressions

• A regular expression is a string that contains joker (wildcard) characters and joker expressions.

• We will look at examples to explain it.

Page 5: Programming in  Perl regular expressions and  m,s  operators

Regular Expression to Verify Email (1)

@mail = ( '[email protected]', 'hab.akukk%mikkamakka@jeno', );

for( @mail ){
  if( /^.*\@\w+\..+$/ ){
    print "$_ seems to be a good eMail\n";
  }else{
    print "$_ bad address\n";
  }
}

OUTPUT:
[email protected] seems to be a good eMail
hab.akukk%mikkamakka@jeno bad address

NOTES:
$_ is used as default
m/ is default when / is used
$_ =~ m/^.*@\w+\..+$/
@ would also work instead of \@ but \@ is safe

Page 6: Programming in  Perl regular expressions and  m,s  operators

Regular Expression to Verify Email (2)

/^.*\@\w+\..+$/

• ^ at the start of the string
• .* zero or more any-character
  – * means zero or more of what stands before
• \@ a single @ character
• \w+ one or more word characters (letters, digits or underscore)
  – + means one or more of what stands before
• \. one . (dot) character
  – special regexp character is escaped with \
• .+ one or more any-character
• $ at the end of the string
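The same pattern can be sketched with the x option so that each piece sits apart. The first address below is a made-up placeholder chosen only for illustration, and the variable names are assumptions, not part of the slides.

```perl
use strict;
use warnings;

# Hypothetical sample addresses, for illustration only.
my @mail = ('someone@example.com', 'hab.akukk%mikkamakka@jeno');

my @verdict;
for my $address (@mail) {
    # start, anything, one @, word characters, a dot, anything, end
    my $ok = $address =~ /^ .* \@ \w+ \. .+ $/x;
    push @verdict, $ok ? 'good' : 'bad';
    print "$address is $verdict[-1]\n";
}
```

The second address fails because nothing after the @ supplies the dot that \. requires. Note that this is only a rough plausibility check, as on the slide, not a full address validator.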

Page 7: Programming in  Perl regular expressions and  m,s  operators

Search and Replace Example of Regular Expressions

$text = 'JavaScript is not used on island Java.';

$text =~ s/Java(?!Script)/Borneo/;

print $text;

OUTPUT:
JavaScript is not used on island Borneo.

NOTES:
Operator s will be discussed later in detail.
(?! ) is a zero-length negative look forward, detailed later.

Page 8: Programming in  Perl regular expressions and  m,s  operators

Meta (joker) Character

• . any character but new line
• ^ start of string
• $ end of string
• \ escaping the next character
• \w any word character (letter, digit or underscore)
• \W any non-word character
• \s any white space
• \S any non-white space

These are only examples; there are other meta characters, see the Perl manual.
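A small sketch of some of these meta characters at work. The sample string and variable names are invented for the illustration.

```perl
use strict;
use warnings;

my $line = "user_1 logged in";

my ($first_word) = $line =~ /^(\w+)/;   # \w also matches the digit and the underscore
my ($second)     = $line =~ /\s(\S+)/;  # first non-space run that follows a white space
my ($last_char)  = $line =~ /(.)$/;     # any single character right before the end

print "$first_word|$second|$last_char\n";   # user_1|logged|n
```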

Page 9: Programming in  Perl regular expressions and  m,s  operators

Parentheses (1)

$text = 'Hook is not used on island Java.';
$text =~ /(Ho(ok))\s(is?).*\3((l|s)(a|l))/;
print "$1 $2 $3 $4 $5 $6\n";
#
$text = 'Hook i not used on island Java.';
$text =~ /(Ho(ok))\s(is?).*\3((l|s)(a|l))/;
print "$1 $2 $3 $4 $5 $6\n";

OUTPUT:
Hook ok is la l a
Hook ok i sl s l

NOTES:
Numbering is in the order of the opening parentheses

Page 10: Programming in  Perl regular expressions and  m,s  operators

Parentheses without $n

$text = 'Hook is not used on island Java.';
$text =~ /(Ho(ok))\s(is?).*\3((?:l|s)(a|l))/;
print "$1 $2 $3 $4 $5 .$6.\n";
$text = 'Hook i not used on island Java.';
$text =~ /(Ho(ok))\s(is?).*\3((?:l|s)(a|l))/;
print "$1 $2 $3 $4 $5 .$6.\n";

OUTPUT:
Hook ok is la a ..
Hook ok i sl l ..

NOTES:
(?: ) groups a sub-expression without creating a reference
$6 is not set here, so it prints as an empty string

Page 11: Programming in  Perl regular expressions and  m,s  operators

Character classes

• List of characters between [ and ]
• Interval, e.g. [a-f]
• Negative character set [^a-f]
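A short sketch of an interval and a negative character set. The word list is made up for the example.

```perl
use strict;
use warnings;

# Keep the words that consist only of the letters a to f.
my @hex_like = grep { /^[a-f]+$/ } qw(face grape bead);

# Capture the first character of 'grape' that is NOT in a to f.
my ($outsider) = 'grape' =~ /([^a-f])/;

print "@hex_like / $outsider\n";   # face bead / g
```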

Page 12: Programming in  Perl regular expressions and  m,s  operators

Repetitions

• * zero or more times
• + one or more times
• ? zero or one time
• {n} exactly n times
• {n,} at least n times
• {n,m} at least n times, at most m times

NOTES:
There is {n,} but there is not {,m}
Why? (hint: {0,m} works, but {n,???}??)
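The counted repetitions can be tried out with a sketch like the following. The sample strings are invented, and the patterns are anchored with ^ and $ so that the counts are exact.

```perl
use strict;
use warnings;

my @samples = qw(ab aab aaab aaaab);

my @exactly_two  = grep { /^a{2}b$/ }   @samples;   # aab
my @two_or_more  = grep { /^a{2,}b$/ }  @samples;   # aab aaab aaaab
my @two_to_three = grep { /^a{2,3}b$/ } @samples;   # aab aaab

print "{2}:   @exactly_two\n";
print "{2,}:  @two_or_more\n";
print "{2,3}: @two_to_three\n";
```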

Page 13: Programming in  Perl regular expressions and  m,s  operators

Greedy repetition

• Repetitions are greedy: they eat as many characters as possible
• A ? after the repetition (e.g. .*?) makes it eat as few characters as possible

$text = 'Hook is not used on island Java.';
$text =~ /(.*)is/; #1
print "$1.\n";
$text =~ /(.*?)is/; #2
print "$1.\n";
$text =~ /(.*?)is.*n/; #3
print "$1.\n";

OUTPUT:
Hook is not used on .
Hook .
Hook .
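Another sketch of the same effect, on a made-up string with two tagged words, where the difference between the greedy and the non-greedy form is easy to see.

```perl
use strict;
use warnings;

my $html = '<b>one</b> and <b>two</b>';

# Greedy: .* runs to the LAST closing tag.
my ($greedy) = $html =~ /<b>(.*)<\/b>/;

# Non-greedy: .*? stops at the FIRST closing tag.
my ($lazy) = $html =~ /<b>(.*?)<\/b>/;

print "greedy: $greedy\n";   # greedy: one</b> and <b>two
print "lazy:   $lazy\n";     # lazy:   one
```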

Page 14: Programming in  Perl regular expressions and  m,s  operators

Other extensions

• Other UNIX tools also use simpler, similar regular expressions

• Perl regular expressions are more powerful

List of some extensions on the next slides

Page 15: Programming in  Perl regular expressions and  m,s  operators

Regular expression comment

(?# comment comes here)

• Use comments! Use comments! Use comments! Use comments! Use comments! Use comments! Use comments! Use comments! Use comments! Use comments! Use comments! Use comments!

Page 16: Programming in  Perl regular expressions and  m,s  operators

Regular Expression Parentheses

• (?: sub expression w/o $n)

(?: we have discussed it already beforehand as it came up in an example, but this is the proper place to discuss this construct.)

Page 17: Programming in  Perl regular expressions and  m,s  operators

Positive look forward

(?= subregexp)

$t = 'jamaica rum rum kingston rum';

$t =~ s/([aeoui])(?=\w)/uc($1)/ge;

print $t;

OUTPUT:
jAmAIca rUm rUm kIngstOn rUm

Example: convert to upper case every vowel that is followed by another word character, i.e. that does not end a word.

Page 18: Programming in  Perl regular expressions and  m,s  operators

Negative look forward

(?! subregexp)

$t = 'jamaica rum rum kingston rum';

$t =~ s/([aeoui])(?!\w)/uc($1)/ge;

print $t;

OUTPUT:
jamaicA rum rum kingston rum

Example: convert to upper case every vowel that is not followed by a word character, i.e. that stands at the end of a word.
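The two look forward constructs can be put side by side in a sketch like this one. The sample string is invented; the point is that the text inside (?= ) or (?! ) is only tested, never replaced.

```perl
use strict;
use warnings;

my $source = 'foobar foobaz';

# Positive look forward: replace "foo" only when "bar" follows.
(my $positive = $source) =~ s/foo(?=bar)/X/g;

# Negative look forward: replace "foo" only when "bar" does NOT follow.
(my $negative = $source) =~ s/foo(?!bar)/X/g;

print "$positive\n";   # Xbar foobaz
print "$negative\n";   # foobar Xbaz
```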

Page 19: Programming in  Perl regular expressions and  m,s  operators

Option change inside the regular expression

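(?imsx) (any subset of the letters i, m, s, x)

• This can be used inside the m/ or s/ operator.
• Options that belong to the operator rather than to the pattern, such as g and o, cannot be set this way.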
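A minimal sketch of switching on case insensitivity from inside the pattern instead of after the closing delimiter. The strings are illustrative only.

```perl
use strict;
use warnings;

my $plain  = 'APPLE' =~ /apple/     ? 1 : 0;   # no option: case matters
my $inline = 'APPLE' =~ /(?i)apple/ ? 1 : 0;   # (?i) inside the pattern

print "plain=$plain inline=$inline\n";   # plain=0 inline=1
```

This form is handy when the pattern arrives as a string from somewhere else and the options cannot be written after the operator.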

Now we go back to operator m/ and discuss some details.

Page 20: Programming in  Perl regular expressions and  m,s  operators

M operator array result

@k = "abbabaa" =~ m/(bb).+(a.)/;

print $#k; print ' ',$k[0],' ',$k[1],"\n";

OUTPUT:
1 bb aa

NOTES:
Parts of the expression are enclosed in ( ).
$1, $2 ... are the default variables where the substrings are put; in list context the operator also returns them as a list.

Page 21: Programming in  Perl regular expressions and  m,s  operators

M operator option g (1)

@k = "abbabaa" =~ m/(b)(a)/g;

print $#k,' ',$k[0],' ',$k[1],' ',$k[2],' ',$k[3],"\n";

OUTPUT:
3 b a b a

NOTES:
With option g in list context the operator returns the captured substrings of every match, not only of the first one.

Page 22: Programming in  Perl regular expressions and  m,s  operators

M operator option g (2)

$t = "abbabaa";

while( $t =~ m/(ab)(b|a)/g ){
  print pos($t)," $1 $2\n";
}

OUTPUT:
3 ab b
6 ab a
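In scalar context each evaluation of m//g continues from where the previous match ended, and pos() reports that position. A sketch on a made-up string:

```perl
use strict;
use warnings;

my $text = 'one two three';
my @found;

# Each pass of the loop finds the next word; pos() is the offset just after it.
while ($text =~ /(\w+)/g) {
    push @found, "$1 ends at " . pos($text);
}

print "$_\n" for @found;
# one ends at 3
# two ends at 7
# three ends at 13
```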

Page 23: Programming in  Perl regular expressions and  m,s  operators

M operator option i

• Case insensitive match

print '.',"apple" =~ /AppLe/,".\n";
print '.',"apple" =~ /AppLe/i,".\n";

• prints

..
.1.

Page 24: Programming in  Perl regular expressions and  m,s  operators

M operator options m and s

$t = "mah\na\nb";
while( $t =~ /(.?.)$/mg ){ print '.',$1; }
print ".\n";
while( $t =~ /(.?.)$/sg ){ print '.',$1; }
print ".\n";
while( $t =~ /(.?.)$/g ){ print '.',$1; }
print ".\n";

OUTPUT:
.ah.a.b.
.
b.
.b.

• m makes $ match before every \n in the string
• s makes . match \n as well (otherwise . is any character but \n)
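The effect of the two options can also be sketched on a three-line string. The string and the variable names are invented for the illustration.

```perl
use strict;
use warnings;

my $text = "first\nsecond\nthird";

# With m, $ matches at the end of every line; without it, only at the very end.
my @line_ends = $text =~ /(\w+)$/mg;
my @whole     = $text =~ /(\w+)$/g;

# With s, . may cross the line breaks; without it, the match fails.
my ($span)    = $text =~ /first(.+)third/s;
my $without_s = $text =~ /first(.+)third/ ? 1 : 0;

print "with m:    @line_ends\n";   # with m:    first second third
print "without m: @whole\n";       # without m: third
print "s spans ", length($span), " characters, no s matches: $without_s\n";
```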

Page 25: Programming in  Perl regular expressions and  m,s  operators

M operator option o

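• Evaluate the regular expression only once to save processor time

$t = "al brab";
$a = 'al'; $b = 'rab';
&q;&p;
$b = 'fe';
&q;&p;
sub q { print ' q',$t =~ /$a\sb$b/o }
sub p { print ' p',$t =~ /$a\sb$b/ }

• prints

 q1 p1 q1 p

The second call of q still matches because, with option o, the pattern keeps the value of $b it had when it was first evaluated.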

Page 26: Programming in  Perl regular expressions and  m,s  operators

M operator option x

@k = "abbabaa" =~ m/(bb) #exactly two 'b' characters get into $1
                    .+   #one or more any-character
                    (a.) #a letter 'a' and exactly one any-character
                   /x;   #space and comment allowed
print $#k;
print ' ',$k[0],' ',$k[1],"\n";

OUTPUT:
1 bb aa

This option allows spaces and comments inside the regular expression to ease readability (a literal space has to be written as '\ ').
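A further sketch of the x option, splitting an invented date string into its parts, followed by a pattern that needs a literal space.

```perl
use strict;
use warnings;

my ($year, $month, $day) = '2002-01-15' =~ m/
    (\d{4})   # year: four digits
    -
    (\d{2})   # month: two digits
    -
    (\d{2})   # day: two digits
/x;

# Under x a plain space is ignored, so the real space is written as "\ ".
my $spaced = 'a b' =~ / a \ b /x ? 1 : 0;

print "$year $month $day $spaced\n";   # 2002 01 15 1
```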

Page 27: Programming in  Perl regular expressions and  m,s  operators

Operator s

$text =~ s/regexp/replace/egimosx

• Options:
  – e replace is interpreted as expression
  – g global search and replace
  – i case insensitive search
  – m string is treated as multi-line
  – o regular expression is evaluated only once
  – s string is treated as single-line
  – x extended syntax for the regexp
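The difference that the e and g options make can be sketched on one made-up string by running the same substitution three ways.

```perl
use strict;
use warnings;

my $prices = 'apple 3, pear 5';

# e and g: the replacement is evaluated as code, for every match.
(my $doubled = $prices) =~ s/(\d+)/$1 * 2/ge;

# e only: evaluated as code, but for the first match only.
(my $first_only = $prices) =~ s/(\d+)/$1 * 2/e;

# g only: the replacement is an ordinary interpolated string.
(my $literal = $prices) =~ s/(\d+)/$1 * 2/g;

print "$doubled\n";      # apple 6, pear 10
print "$first_only\n";   # apple 6, pear 5
print "$literal\n";      # apple 3 * 2, pear 5 * 2
```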

Page 28: Programming in  Perl regular expressions and  m,s  operators

Global Search and Replace

$t = "abbab" ;

$t =~ s/ab/aa/g;

print $t;

OUTPUT:
aabaa

Option g replaces all occurrences of the search regular expression with the replacement string.

Page 29: Programming in  Perl regular expressions and  m,s  operators

m and s operators with different delimiters

• / is the default, but you can use
• ' to have a non-interpolated string
• Other non-alphanumeric characters
• () {} [] with matching character pairs
  – In this case s{search}{replace}

Page 30: Programming in  Perl regular expressions and  m,s  operators

m and s operators with different delimiters example

$text = 'a@bba@bbabb';
@b = ('bba');
$text =~ s{@b}{q}g;
print "$text\n";
$text = 'a@bba@bbabb';
$text =~ s'@b'q'g;
print "$text\n";

OUTPUT:
a@q@qbb
aqbaqbabb

@b is interpolated in the first search but not in the second
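A common reason to change the delimiter is a pattern that itself contains slashes. The following sketch, on an invented path, performs the same replacement with three different delimiters.

```perl
use strict;
use warnings;

my $path = '/usr/local/bin';

# Default delimiter: every slash in the pattern must be escaped.
(my $with_slashes = $path) =~ s/\/usr\/local/\/opt/;

# Matching pairs: no escaping needed.
(my $with_braces = $path) =~ s{/usr/local}{/opt};

# Any other non-alphanumeric character works as well.
(my $with_hash = $path) =~ s#/usr/local#/opt#;

print "$with_slashes $with_braces $with_hash\n";   # /opt/bin /opt/bin /opt/bin
```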

Page 31: Programming in  Perl regular expressions and  m,s  operators

Thank you for your kind attention.