Posted on 01-Jan-2016
Regular expressions
Used by several different UNIX commands, including ed, sed, awk and grep
A period ‘.’ matches any single character
.X. matches any X that is surrounded by any two characters
A caret ‘^’ matches the beginning of the line
^Bridgeport matches the characters Bridgeport only if they occur at the beginning of the line
A dollar sign ‘$’ matches the end of the line
Bridgeport$ matches the characters Bridgeport only if they are the very last characters on the line
.$ matches any single character at the end of the line
To match a character that has a special meaning (such as . or $) literally, precede it with a backslash ‘\’ to remove the special meaning
\.$ matches any line that ends with a period
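A minimal sketch of the period, the two anchors and the escaped period, assuming a POSIX shell and grep. The sample lines and the helper function names are made up for illustration.

```shell
#!/bin/sh
# '.', '^', '$' and '\.' demonstrated with grep on made-up sample lines.
export LC_ALL=C   # fixed locale so results do not depend on the user's settings

sample() {
  printf '%s\n' 'aXb marks the spot' 'X alone' 'Bridgeport is a city' \
    'I live in Bridgeport' '' 'The end.'
}

period_demo()    { sample | grep '.X.'; }           # X needs a character on each side
caret_demo()     { sample | grep '^Bridgeport'; }   # Bridgeport at the start of the line
dollar_demo()    { sample | grep 'Bridgeport$'; }   # Bridgeport at the end of the line
nonempty_count() { sample | grep -c '.$'; }         # lines with at least one character
period_end()     { sample | grep '\.$'; }           # \. is a literal period

period_demo      # aXb marks the spot
caret_demo       # Bridgeport is a city
dollar_demo      # I live in Bridgeport
nonempty_count   # 5
period_end       # The end.
```

Note that "X alone" is not matched by .X. because nothing precedes its X.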
^$ matches any line that contains no characters
[…] matches any one of the characters enclosed in the brackets
[tT]he matches a lowercase or uppercase t followed immediately by the characters he
[A-Z] matches an uppercase letter
[A-Za-z] matches an uppercase or lowercase letter
[^A-Z] matches any character except an uppercase letter
[^A-Za-z] matches any non-alphabetic character
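The empty-line pattern and the bracket expressions can be tried the same way. This sketch assumes grep and the C locale (where [A-Z] and [a-z] are plain ASCII ranges); the sample lines are invented.

```shell
#!/bin/sh
# ^$ and bracket expressions, shown with grep on made-up sample lines.
export LC_ALL=C   # in the C locale, [A-Z] and [a-z] are plain ASCII ranges

sample() {
  printf '%s\n' 'The cat' 'the dog' 'tHe bird' '' '42!' 'ABC'
}

blank_count()    { sample | grep -c '^$'; }          # empty lines
the_lines()      { sample | grep '[tT]he'; }         # The or the
upper_start()    { sample | grep '^[A-Z]'; }         # first character is uppercase
nonalpha_count() { sample | grep -c '[^A-Za-z]'; }   # contains a non-letter

blank_count      # 1
the_lines        # The cat / the dog
upper_start      # The cat / ABC
nonalpha_count   # 4
```

The empty line is not counted by [^A-Za-z], because a bracket expression must match one actual character.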
The asterisk * matches zero or more occurrences of the character (or bracketed set) that precedes it
X* matches zero, one, two, three, … capital X’s
XX* matches one or more capital X’s
.* matches zero or more occurrences of any character
e.*e matches all the characters from the first e on the line to the last one
[A-Za-z][A-Za-z]* matches any alphabetic character followed by zero or more alphabetic characters
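A short sketch of the asterisk, assuming grep and sed; the sample text is made up. The sed substitutions wrap whatever the pattern matched (& in the replacement), which makes the greedy behavior of .* visible.

```shell
#!/bin/sh
# The asterisk: zero or more of the preceding character; .* is greedy.
export LC_ALL=C

sample() { printf '%s\n' 'X' 'XXX' 'abc' 'hello'; }

one_or_more() { sample | grep 'XX*'; }    # lines with at least one X
zero_count()  { sample | grep -c 'X*'; }  # X* also matches zero X's, so every line
greedy()      { echo 'see the tree now' | sed 's/e.*e/[&]/'; }
word()        { echo '42 apples' | sed 's/[A-Za-z][A-Za-z]*/<&>/'; }

one_or_more   # X / XXX
zero_count    # 4
greedy        # s[ee the tree] now
word          # 42 <apples>
```

The zero_count result is why XX*, not X*, is the pattern for "at least one X".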
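[-0-9] matches a single dash or digit character (ORDER IS IMPORTANT: the dash must come first or last, otherwise it is read as a range)
[0-9-] is the same as [-0-9]
[^-0-9] matches any character except a digit or a dash
[]a-z] matches a right bracket or a lowercase letter (ORDER IS IMPORTANT: the ] must come first, otherwise it closes the brackets)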
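To see the effect of placing - and ] correctly, this sketch replaces every matched character with # using sed. The input string is made up for illustration.

```shell
#!/bin/sh
# Where - and ] go inside brackets changes what the brackets match.
export LC_ALL=C
input='x-1]Y'

dash_first() { echo "$input" | sed 's/[-0-9]/#/g'; }   # dash or digit
dash_last()  { echo "$input" | sed 's/[0-9-]/#/g'; }   # same set
not_dash()   { echo "$input" | sed 's/[^-0-9]/#/g'; }  # anything but dash or digit
bracket()    { echo "$input" | sed 's/[]a-z]/#/g'; }   # ] or lowercase letter

dash_first   # x##]Y
dash_last    # x##]Y
not_dash     # #-1##
bracket      # #-1#Y
```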
\{min,max\} matches a specific number of occurrences of the preceding regular expression
min specifies the minimum number of occurrences to be matched, and max specifies the maximum
w\{1,10\} matches from 1 to 10 consecutive w’s
[a-zA-Z]\{7\} matches exactly seven alphabetic characters
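A sketch of interval expressions with sed and grep, on made-up input. It also shows a point worth keeping in mind: [a-zA-Z]\{7\} matches seven letters, but a line containing a longer run of letters still contains those seven, so anchors are needed to require exactly seven on the whole line.

```shell
#!/bin/sh
# Interval expressions \{min,max\} in basic regular expressions.
export LC_ALL=C

bounded() { echo 'wwwwwwwwwwww' | sed 's/w\{1,10\}/<&>/'; }  # 12 w's in, at most 10 taken

sample() { printf '%s\n' 'abcdefg' 'abc def' 'Bridgeport'; }
seven_anywhere() { sample | grep '[a-zA-Z]\{7\}'; }     # seven letters in a row somewhere
seven_exact()    { sample | grep '^[a-zA-Z]\{7\}$'; }   # the whole line is seven letters

bounded          # <wwwwwwwwww>ww
seven_anywhere   # abcdefg / Bridgeport
seven_exact      # abcdefg
```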
X\{5,\} matches at least five consecutive X’s
\(…\) is used to save the characters matched by the enclosed regular expression
^\(.\) matches the first character on the line and stores it in register 1
There are nine registers, numbered 1 through 9
To retrieve what is stored in register n, use \n
Example: ^\(.\)\1 matches the first two characters on a line if they are the same character
^\(.\).*\1$ matches all lines in which the first character on the line is the same as the last. Note that .* matches all the characters in between
^\(...\)\(...\) stores the first three characters on the line in register 1 and the next three characters in register 2
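The open-ended interval and the saved registers can be checked with a few one-liners. This is a sketch on invented words, assuming grep and sed with basic regular expressions; in the sed replacement, \2\1 writes register 2 before register 1.

```shell
#!/bin/sh
# Saving matched text with \( \) and reusing it with \1 and \2.
export LC_ALL=C

five_plus()    { printf '%s\n' XXXX XXXXX XXXXXXX | grep 'X\{5,\}'; }
double_start() { printf '%s\n' aardvark apple ooze book | grep '^\(.\)\1'; }
same_ends()    { printf '%s\n' radar hello level | grep '^\(.\).*\1$'; }
swap()         { echo 'abcdefgh' | sed 's/^\(...\)\(...\)/\2\1/'; }

five_plus      # XXXXX / XXXXXXX
double_start   # aardvark / ooze
same_ends      # radar / level
swap           # defabcgh
```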
cut
$ who
bgeorge pts/16 Oct 5 15:01 (216.87.102.204)
abakshi pts/13 Oct 6 19:48 (216.87.102.220)
tphilip pts/11 Oct 2 14:10 (AC8C6085.ipt.aol.com)
$ who | cut -c1-8,18-
bgeorge Oct 5 15:01 (216.87.102.204)
abakshi Oct 6 19:48 (216.87.102.220)
tphilip Oct 2 14:10 (AC8C6085.ipt.aol.com)
$
cut is used to extract various fields of data from a data file or from the output of a command
Format: cut -cchars file
chars specifies which characters to extract from each line of file
Examples: -c5 (character 5), -c1,3,4 (characters 1, 3 and 4), -c10-15 (characters 10 through 15), -c5- (character 5 to the end of the line)
The -d and -f options are used with cut when the data is delimited by a particular character
Format: cut -ddchar -ffields file
dchar: the character that delimits the fields (default: tab character)
fields: the fields to be extracted from file
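A sketch of the -c forms listed above and of -d/-f, assuming a POSIX cut. The 16-letter line is a hypothetical input chosen so that each position is easy to read off; the colon-delimited record is one of the /etc/passwd lines shown in this section.

```shell
#!/bin/sh
# cut -c selects character positions; cut -d/-f selects delimited fields.
export LC_ALL=C   # single-byte ASCII, so -c positions are easy to predict

line='abcdefghijklmnop'   # hypothetical 16-character input line

chars() { printf '%s\n' "$line" | cut -c"$1"; }

chars 5        # e
chars 1,3,4    # acd
chars 10-15    # jklmno
chars 5-       # efghijklmnop

# Fields 1 and 5 of a colon-delimited record; cut keeps the delimiter between them
fields() { printf '%s\n' 'root:x:0:1:Super-User:/:/sbin/sh' | cut -d: -f1,5; }
fields         # root:Super-User
```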
$ cat /etc/passwd
root:x:0:1:Super-User:/:/sbin/sh
daemon:x:1:1::/:
bin:x:2:2::/usr/bin:
sys:x:3:3::/:
adm:x:4:4:Admin:/var/adm:
lp:x:71:8:Line Printer Admin:/usr/spool/lp:
uucp:x:5:5:uucp Admin:/usr/lib/uucp:
nuucp:x:9:9:uucp Admin:/var/spool/uucppublic:/usr/lib/uucp/uucico
listen:x:37:4:Network Admin:/usr/net/nls:
nobody:x:60001:60001:Nobody:/:
noaccess:x:60002:60002:No Access User:/:
oracle:*:101:67:DBA Account:/export/home/oracle:/bin/csh
webuser:*:102:102:Web User:/export/home/webuser:/bin/csh
abuzneid:x:103:100:Abdelshakour Abuzneid:/home/abuzneid:/sbin/csh
$
$ cut -d: -f1 /etc/passwd
root
daemon
bin
sys
adm
lp
uucp
nuucp
listen
nobody
noaccess
oracle
webuser
abuzneid
$
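In the phonebook file below, the name and the number are separated by a tab, the default delimiter, so no -d option is needed: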
$ cat phonebook
Edward 336-145
Alice 334-121
Sony 332-336
Robert 326-056
$ cut -f1 phonebook
Edward
Alice
Sony
Robert
$
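Because the tabs in the phonebook listing are invisible, this sketch rebuilds the data with printf '\t' so the separator is explicit. It also shows what cut -f does with a line that has no tab at all: POSIX cut passes such a line through whole unless -s is given.

```shell
#!/bin/sh
# cut -f splits on tabs by default. printf '\t' writes a real tab character.
phonebook() {
  printf 'Edward\t336-145\nAlice\t334-121\nSony\t332-336\nRobert\t326-056\n'
}

names()   { phonebook | cut -f1; }
numbers() { phonebook | cut -f2; }
# A line that contains no tab is passed through whole (unless -s is given):
spaced()  { printf 'Edward 336-145\n' | cut -f1; }

names     # Edward / Alice / Sony / Robert
numbers   # 336-145 / 334-121 / 332-336 / 326-056
spaced    # Edward 336-145
```

With space-separated data, cut -d' ' -f1 would be needed instead.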
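paste
The paste command merges corresponding lines from two or more files into single lines, separated by a tab by default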
Example:
$ cat students
Sue
Vara
Elvis
Luis
Eliza
$ cat sid
578426
452869
354896
455468
335123
$ paste students sid
Sue 578426
Vara 452869
Elvis 354896
Luis 455468
Eliza 335123
$
The -s option tells paste to paste together the lines of the same file, rather than corresponding lines from different files
To change the delimiter, the -d option is used
Examples:
$ paste -d '+' students sid
Sue+578426
Vara+452869
Elvis+354896
Luis+455468
Eliza+335123
$ paste -s students
Sue Vara Elvis Luis Eliza
$ ls | paste -d ' ' -s -
addr args list mail memo name nsmail phonebook programs roster sid students test tp twice user
$
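The students and sid example can be reproduced in a throwaway directory. This sketch assumes mktemp is available (it is on Linux and macOS, though it is not in POSIX); it makes the default tab delimiter explicit, which the transcripts above cannot show.

```shell
#!/bin/sh
# Rebuilds the students/sid example in a temporary directory.
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT   # remove the directory when the script ends

printf '%s\n' Sue Vara Elvis Luis Eliza > "$dir/students"
printf '%s\n' 578426 452869 354896 455468 335123 > "$dir/sid"

paste "$dir/students" "$dir/sid"          # Sue<TAB>578426, Vara<TAB>452869, ...
paste -d '+' "$dir/students" "$dir/sid"   # Sue+578426, ...
paste -s "$dir/students"                  # Sue<TAB>Vara<TAB>Elvis<TAB>Luis<TAB>Eliza
paste -s -d ' ' "$dir/students"           # Sue Vara Elvis Luis Eliza
```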
sed
sed (stream editor) is a program used for editing data
Unlike ed, sed cannot be used interactively
Format: sed command file
command: applied to each line of the specified file
file: if no file is specified, standard input is assumed
sed writes its output to standard output
The command s/Unix/UNIX/ is applied to every line in the file; it replaces the first Unix on each line with UNIX
sed makes no changes to the original input file
The command 's/Unix/UNIX/g' is applied to every line in the file; it replaces every Unix with UNIX (“g” means global)
With the -n option, sed prints nothing unless told to, so selected lines can be printed with the p command
Example: sed -n '1,2p' file prints the first two lines
Example: sed -n '/UNIX/p' file prints any line containing UNIX
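A sketch of the substitute and print commands on three made-up lines, assuming any POSIX sed. The first two commands differ only in the g flag.

```shell
#!/bin/sh
# s///, s///g and -n with p, applied to made-up input.
text() {
  printf '%s\n' 'Unix is Unix' 'Linux is not Unix' 'UNIX is a trademark'
}

text | sed 's/Unix/UNIX/'    # UNIX is Unix / Linux is not UNIX / UNIX is a trademark
text | sed 's/Unix/UNIX/g'   # UNIX is UNIX / Linux is not UNIX / UNIX is a trademark
text | sed -n '1,2p'         # Unix is Unix / Linux is not Unix
text | sed -n '/UNIX/p'      # UNIX is a trademark
```

Without -n, the p command would print each selected line twice: once for p and once by default.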
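Example: sed '1,2d' file deletes lines 1 and 2 (all other lines are printed)
Example: sed -n 'l' text prints all lines from text, showing nonprinting characters as \nn and tab characters as “>” (the notation varies between sed versions; GNU sed, for example, shows a tab as \t)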
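A sketch of the d and l commands. The expected output of l in the comment assumes GNU sed, which writes a tab as \t and marks the end of each line with $; older sed versions may use a different notation, as noted above.

```shell
#!/bin/sh
# d deletes the addressed lines; l makes non-printing characters visible.
text() { printf '%s\n' 'line one' 'line two' 'line three'; }

text | sed '1,2d'              # line three

# Output notation of l assumed to be GNU sed's.
printf 'a\tb\n' | sed -n 'l'   # a\tb$
```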
tr
The tr filter is used to translate characters from standard input
Format: tr from-chars to-chars
The result is written to standard output
Example: tr e x < file translates every “e” in file to “x” and writes the result to standard output
The octal representation of a character can be given to tr in the format \nnn
Example: tr : '\11' translates every : to a tab
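Since tr reads only standard input, this sketch pipes made-up text into it. The second helper uses the octal escape \11 for a tab, matching the table of octal values that follows.

```shell
#!/bin/sh
# tr reads only standard input, so the data is piped into it here.
to_x()    { echo 'excellent eel' | tr e x; }
to_tabs() { echo 'root:x:0' | tr : '\11'; }   # \11 is the octal code for tab

to_x      # xxcxllxnt xxl
to_tabs   # root<TAB>x<TAB>0
```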
Character Octal value
Bell 7
Backspace 10
Tab 11
New line 12
Linefeed 12
Form feed 14
Carriage return 15
Escape 33
Example: tr '[a-z]' '[A-Z]' < file translates all lowercase letters in file to their uppercase equivalents. The character ranges [a-z] and [A-Z] are enclosed in quotes to keep the shell from replacing them with the names of any single-character files a through z and A through Z in the current directory
To squeeze multiple consecutive occurrences of a character into one, the -s option is used
Example: tr -s ' ' ' ' < file squeezes multiple spaces into one space
The -d option is used to delete single characters from a stream of input
Format: tr -d from-chars
Example: tr -d ' ' < file deletes all spaces from the input stream
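A sketch of case translation, squeezing and deleting, on made-up strings. LC_ALL=C is set so the letter ranges are plain ASCII.

```shell
#!/bin/sh
# Case translation, squeezing repeats, and deleting characters with tr.
export LC_ALL=C

upper()   { echo 'hello world' | tr '[a-z]' '[A-Z]'; }     # quoted ranges
squeeze() { echo 'too    many   spaces' | tr -s ' ' ' '; } # runs of spaces -> one
nospace() { echo 'no spaces here' | tr -d ' '; }           # delete every space

upper     # HELLO WORLD
squeeze   # too many spaces
nospace   # nospaceshere
```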
grep
grep searches one or more files for a particular character pattern
Format: grep pattern files
Example: grep path .cshrc prints every line in the .cshrc file that contains the pattern ‘path’
Example: grep bin .cshrc .login .profile prints every line from any of the three files .cshrc, .login and .profile that contains the pattern “bin”
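Example: grep * smarts does not work as intended, because the shell replaces * with the names of all the files in the current directory before grep runs
Example: grep '*' smarts searches smarts for a literal *; the quotes keep the shell from expanding *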
(Figure: the arguments passed to grep are * and smarts)
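This sketch makes the shell's expansion visible by running echo in front of the unquoted command, inside a temporary directory. The extra file afile is hypothetical, added only so that * expands to more than one name; mktemp is assumed to be available.

```shell
#!/bin/sh
# Why * must be quoted when it is meant as a grep pattern.
export LC_ALL=C   # predictable order for the expanded file names
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

printf '%s\n' 'a * b' 'no star here' > smarts
touch afile   # hypothetical extra file, so * expands to more than one name

echo grep * smarts   # what grep would receive: grep afile smarts smarts
grep '*' smarts      # quoted: a literal *, prints: a * b
```

Unquoted, grep would take afile as the pattern and search the files smarts and smarts.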
sort
By default, sort takes each line of the specified input file and sorts it into ascending order
$ cat students
Sue
Vara
Elvis
Luis
Eliza
$ sort students
Eliza
Elvis
Luis
Sue
Vara
$
$ echo Ash >> students
$ echo Ash >> students
$ cat students
Sue
Vara
Elvis
Luis
Eliza
Ash
Ash
$ sort students
Ash
Ash
Eliza
Elvis
Luis
Sue
Vara
$
The -r option reverses the order of the sort
The -o option writes the sorted output to the named file instead of standard output
sort students > sorted_students has the same effect as sort students -o sorted_students
The -o option allows a file to be sorted and the output saved back into the same file
Example:
sort students -o students correct
sort students > students incorrect (the shell empties students before sort reads it)
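A sketch of -r and of the safe and unsafe ways to sort a file in place, using a temporary copy of the students data. It writes the options before the file name (sort -o file file), which is the POSIX ordering; POSIX allows the -o file to be one of the input files.

```shell
#!/bin/sh
# -r, and sorting a file in place with -o versus shell redirection.
export LC_ALL=C
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
printf '%s\n' Sue Vara Elvis Luis Eliza Ash Ash > "$dir/students"
cp "$dir/students" "$dir/copy"

sort -r "$dir/students"                   # Vara Sue Luis Elvis Eliza Ash Ash

# Safe: -o may name one of the input files.
sort -o "$dir/students" "$dir/students"
head -n 1 "$dir/students"                 # Ash

# Unsafe: the shell truncates copy before sort ever reads it.
sort "$dir/copy" > "$dir/copy"
wc -l < "$dir/copy"                       # 0
```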
The -n option tells sort to compare the first field as a number, so the data is sorted arithmetically
$ cat data
-10 11
15 2
-9 -3
2 13
20 22
3 1
$ sort data
-10 11
-9 -3
15 2
2 13
20 22
3 1
$
$ sort -n data
-10 11
-9 -3
2 13
3 1
15 2
20 22
$ sort +1n data
-9 -3
3 1
15 2
-10 11
2 13
20 22
$
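To sort by the second field, +1n should be used instead of -n; +1 says to skip the first field
+5n would mean to skip the first five fields on each line and then sort the data numerically
In POSIX-conforming versions of sort the +n form is obsolete; the same keys are written -k2n and -k6n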
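The three orderings of the data file can be reproduced with this sketch, which uses the POSIX key syntax -k2n in place of the older +1n. LC_ALL=C is set so the default, non-numeric sort compares plain bytes, as in the transcript above.

```shell
#!/bin/sh
# Character order vs numeric order vs numeric order on the second field.
export LC_ALL=C   # byte-by-byte order for the default sort
data() { printf '%s\n' '-10 11' '15 2' '-9 -3' '2 13' '20 22' '3 1'; }

by_text()   { data | sort; }
by_number() { data | sort -n; }
by_second() { data | sort -k2n; }   # POSIX spelling of the old sort +1n

by_text     # -10 11 / -9 -3 / 15 2 / 2 13 / 20 22 / 3 1
by_number   # -10 11 / -9 -3 / 2 13 / 3 1 / 15 2 / 20 22
by_second   # -9 -3 / 3 1 / 15 2 / -10 11 / 2 13 / 20 22
```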
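Example: -t: makes the colon the field separator, and +2n skips the first two fields and sorts numerically on the third (the user ID)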
$ sort -t: +2n /etc/passwd
root:x:0:1:Super-User:/:/sbin/sh
daemon:x:1:1::/:
bin:x:2:2::/usr/bin:
sys:x:3:3::/:
adm:x:4:4:Admin:/var/adm:
uucp:x:5:5:uucp Admin:/usr/lib/uucp:
nuucp:x:9:9:uucp Admin:/var/spool/uucppublic:/usr/lib/uucp/uucico
listen:x:37:4:Network Admin:/usr/net/nls:
lp:x:71:8:Line Printer Admin:/usr/spool/lp:
oracle:*:101:67:DBA Account:/export/home/oracle:/bin/csh
webuser:*:102:102:Web User:/export/home/webuser:/bin/csh
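abuzneid:x:103:100:Abdelshakour Abuzneid:/home/abuzneid:/sbin/csh
nobody:x:60001:60001:Nobody:/:
noaccess:x:60002:60002:No Access User:/: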
$
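A sketch of the same user-ID sort on four of the /etc/passwd lines shown above, written with the POSIX form -k3n (the equivalent of +2n) and trimmed to the user names with cut.

```shell
#!/bin/sh
# Sort colon-separated records numerically on the third field (the UID).
export LC_ALL=C
passwd_sample() {
  printf '%s\n' \
    'nobody:x:60001:60001:Nobody:/:' \
    'root:x:0:1:Super-User:/:/sbin/sh' \
    'lp:x:71:8:Line Printer Admin:/usr/spool/lp:' \
    'uucp:x:5:5:uucp Admin:/usr/lib/uucp:'
}

by_uid() { passwd_sample | sort -t: -k3n | cut -d: -f1; }
by_uid   # root / uucp / lp / nobody
```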
uniq
uniq is used to find duplicate lines in a file
Format: uniq in_file out_file
uniq copies in_file to out_file, removing any duplicate lines in the process
uniq considers lines to be duplicates only if they occur consecutively and match exactly
$ cat students
Sue
Vara
Elvis
Luis
Eliza
Ash
Ash
$ uniq students
Sue
Vara
Elvis
Luis
Eliza
Ash
$
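A sketch of uniq's "consecutive lines only" rule, on a made-up list where the duplicates are not all adjacent. Sorting first brings equal lines together, which is the usual way to remove every duplicate.

```shell
#!/bin/sh
# uniq only collapses adjacent duplicates; sort first to catch them all.
export LC_ALL=C
names() { printf '%s\n' Sue Ash Vara Ash Ash; }

adjacent_only() { names | uniq; }          # only the consecutive Ash Ash collapse
all_dups()      { names | sort | uniq; }   # sorting brings every Ash together

adjacent_only   # Sue / Ash / Vara / Ash
all_dups        # Ash / Sue / Vara
```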
The -d option lists only the duplicated lines, writing one copy of each
Example: