AWK: a text processing language
awk
• Created for Unix by Aho, Weinberger and Kernighan
• Basically an:
  ▫ interpreted
  ▫ text processing
  ▫ programming language
• Updated versions
  ▫ NAWK
    New awk
  ▫ GAWK
    Free Software Foundation's version
awk Basics
• Basic form:
  ▫ awk options 'selection criteria {action}' file(s)
• Can use regular expressions
• Files are read one line at a time with contents as fields
• Fields are numbered ($1, $2, etc.)
  ▫ Entire line is $0
• Can run standalone
• Can run as a program
• Uses whitespace (runs of blanks or tabs) as the default field separator
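As a small sketch of the basic form, the commands below run a selection criterion with an action, and then an action alone. The file people.txt and its contents are made up for this illustration.

```shell
# Hypothetical sample data, created only for this illustration.
dir=$(mktemp -d)
cat > "$dir/people.txt" <<'EOF'
Ann Lee Manager
Bob Ray Clerk
Cy Poe Clerk
EOF

# Selection criteria: 3rd field equals "Clerk". Action: print fields 2 and 1.
clerks=$(awk '$3 == "Clerk" { print $2, $1 }' "$dir/people.txt")
echo "$clerks"

# No selection criteria: the action runs on every line; $0 is the whole line.
all=$(awk '{ print $0 }' "$dir/people.txt")
echo "$all"

rm -rf "$dir"
```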
-f Option (stored awk programs)
• awk programs can be stored in a file
• awk -f awkfile datafile
  ▫ -f filename is the awk program
  ▫ datafile contains the data
Example
• Find the TAs in the personnel file
  ▫ The file is blank separated
    -F defines the delimiter
    Use "\ " to escape the blank (a blank after the \), followed by another blank before the program
  ▫ Note: the blank is the default separator anyway
  ▫ Title is in the 3rd field

    # cat personnel.data
    Tony Kombol Lecturer 800111222 704-687-1111
    Jinyue Xia TA 800111333 704-687-2222
    Hadi Hashemi TA 800111444 704-687-3333
    #
    # awk -F\  '$3 == "TA" { print }' personnel.data
    Jinyue Xia TA 800111333 704-687-2222
    Hadi Hashemi TA 800111444 704-687-3333
    #
Example
• To run an awk program
  ▫ personnel.data has the data
  ▫ findta.awk is the code
    Looks for TA (3rd field)
    Prints first name and telephone number (1st and 5th fields)
  ▫ Note: what small formatting problem is here?

    # awk -F\  -f findta.awk personnel.data
    TAs
    Jinyue704-687-2222
    Hadi704-687-3333
    Done

    # cat personnel.data
    Tony Kombol Lecturer 800111222 704-687-1111
    Jinyue Xia TA 800111333 704-687-2222
    Hadi Hashemi TA 800111444 704-687-3333

    # cat findta.awk
    BEGIN { print "TAs";}
    $3 == "TA" {print $1 $5}
    END { print "Done"}
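One possible answer to the note above: print $1 $5 concatenates the two fields with nothing between them. The sketch below shows two common ways to separate them, using the same personnel data as the slide. The ": " separator is just an illustrative choice.

```shell
dir=$(mktemp -d)
cat > "$dir/personnel.data" <<'EOF'
Tony Kombol Lecturer 800111222 704-687-1111
Jinyue Xia TA 800111333 704-687-2222
Hadi Hashemi TA 800111444 704-687-3333
EOF

# A comma in print inserts the output field separator (a blank by default).
with_comma=$(awk '$3 == "TA" { print $1, $5 }' "$dir/personnel.data")
echo "$with_comma"

# Alternatively, concatenate an explicit separator string.
with_colon=$(awk '$3 == "TA" { print $1 ": " $5 }' "$dir/personnel.data")
echo "$with_colon"

rm -rf "$dir"
```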
print and printf
• Output goes to std out
  ▫ can be redirected with > or |
    redirected name must be in quotes:
    print $2, $1 | "sort"
  ▫ the output of the print goes to the sort routine
• print is unformatted
• printf allows formatting
  ▫ %s: string
    %-20s
    20 char spaces, left justified (-)
  ▫ %d: integer
    %8d
    set aside 8 spaces for the number
  ▫ %f: floating point
    %4.8f
    At least 4 chars wide in total, with 8 digits to the right of the decimal point
  ▫ printf needs \n to start new line
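A short sketch of the format specifiers and of piping print output to sort. The item name, the numbers and the field widths are arbitrary values picked for the illustration.

```shell
# %-10s: left-justified string in 10 columns
# %5d:   integer right-justified in 5 columns
# %8.2f: at least 8 columns wide, 2 digits after the decimal point
line=$(echo "Widget 7 3.14159" |
  awk '{ printf "%-10s|%5d|%8.2f\n", $1, $2, $3 }')
echo "$line"

# The command that print pipes into must be a quoted string.
sorted=$(printf 'Tony Kombol\nJinyue Xia\nHadi Hashemi\n' |
  awk '{ print $2, $1 | "sort" }')
echo "$sorted"
```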
Number processing
• AWK supports basic computation
  ▫ +    addition
  ▫ -    subtraction
  ▫ *    multiplication
  ▫ /    division
  ▫ %    modulus
  ▫ ^    exponentiation
• Also supports:
  ▫ ++   add one to itself (prefix and postfix)
  ▫ +=   add and assign to self
  ▫ --   subtract one from self (prefix and postfix)
  ▫ -=   subtract from self
  ▫ *=   multiply self
  ▫ /=   divide self
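The operators listed above can be tried in a BEGIN block, which needs no input file. This is only a sketch; the starting value 7 is arbitrary.

```shell
result=$(awk 'BEGIN {
    n = 7
    print n + 2, n - 2, n * 2, n / 2, n % 2, n ^ 2   # basic operators
    n += 3; print n      # add and assign: n is now 10
    n++;    print n      # add one: n is now 11
    n *= 2; print n      # multiply self: n is now 22
    n /= 4; print n      # divide self: n is now 5.5
}')
echo "$result"
```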
Variables and Expressions
• awk is loosely typed
• do not need to declare variables
  ▫ x = 5
• do not need $ to access a variable (unlike the shell)
  ▫ print x
• strings are double quoted
  ▫ x = "This is a string"
• no string concatenation operator, done by context
  ▫ x = "string1"; y = "string2"
    print x y
    Space is required between the two names
• some conversions done automatically
  ▫ x = "56"; y = 43; z = "abc"
    print x y    # gives 5643; y converted to string
    print x + y  # gives 99; + converts x to a number
    print y + z  # gives 43; + converts z to the number 0
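The conversion rules above can be checked directly. This sketch simply runs the slide's three print statements inside a BEGIN block.

```shell
conv=$(awk 'BEGIN {
    x = "56"; y = 43; z = "abc"
    print x y      # concatenation: y is converted to a string
    print x + y    # addition: x is converted to a number
    print y + z    # a non-numeric string converts to 0
}')
echo "$conv"
```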
Comparison and Logical Operators
• awk supports string and numeric comparisons
  ▫ == is the equality operator
    = is for assignment
  ▫ < and > can be used on strings
    Beware of conversions when dealing with strings that consist of numbers
  ▫ ~ is used for regular expressions
    $2 ~ /[dh]og/
    field 2 contains hog or dog
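To illustrate the warning about strings that consist of numbers, the sketch below compares the same two input fields once as numbers and once as strings. Concatenating "" is one way to force a string comparison; the input line is invented for the example.

```shell
cmp=$(echo "10 9 dog" | awk '{
    print ($1 < $2)             # numeric comparison: 10 < 9 is false, prints 0
    print (($1 "") < ($2 ""))   # string comparison: "10" sorts before "9", prints 1
    if ($3 ~ /[dh]og/) print "field 3 matches"
}')
echo "$cmp"
```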
simple comparison
• Field 6 is number of years with organization
  ▫ Find those with more than 5 years

    # awk '$6 > 5 { print $2 ", " $1 ":" $6}' personnelyears.data
    Kombol, Tony:6
    Flintstone, Fred:10
    #

    # cat personnelyears.data
    Tony Kombol Lecturer 800111222 704-687-1111 6
    Jinyue Xia TA 800111333 704-687-2222 3
    Hadi Hashemi TA 800111444 704-687-3333 1
    Fred Flintstone RA 800123321 704-687-1212 10
    Barney Rubble URA 800112233 704-687-3344 4
    #
Regular Expression comparison example
• Find the TAs and RAs including the URAs

    # cat personnel.data
    Tony Kombol Lecturer 800111222 704-687-1111
    Jinyue Xia TA 800111333 704-687-2222
    Hadi Hashemi TA 800111444 704-687-3333
    Fred Flintstone RA 800123321 704-687-1212
    Barney Rubble URA 800112233 704-687-3344

    # awk '$3 ~ /[RT]A/ {print $1 " " $2 " " $5}' personnel.data
    Jinyue Xia 704-687-2222
    Hadi Hashemi 704-687-3333
    Fred Flintstone 704-687-1212
    Barney Rubble 704-687-3344
    #
BEGIN and END Sections
• BEGIN and END allow for some pre- and post-processing
  ▫ Both are optional
• General format:
  ▫ BEGIN { action }
    { action }
    END { action }
  ▫ BEGIN's actions are done before the processing of the datafile begins
    Good for headers, setup, etc.
  ▫ END's actions are done after the processing of the datafile ends
    Good for post processing, notes, etc.
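A minimal sketch of the three-part layout, using the personnelyears.data records from the earlier comparison example. The header text and the running total are illustrative additions, not part of the original slides.

```shell
dir=$(mktemp -d)
cat > "$dir/personnelyears.data" <<'EOF'
Tony Kombol Lecturer 800111222 704-687-1111 6
Jinyue Xia TA 800111333 704-687-2222 3
Hadi Hashemi TA 800111444 704-687-3333 1
Fred Flintstone RA 800123321 704-687-1212 10
Barney Rubble URA 800112233 704-687-3344 4
EOF

report=$(awk '
BEGIN { print "Name Years" }             # header, printed before any input is read
      { print $1, $6; total += $6 }      # runs once for every input line
END   { print "Total years:", total }    # summary, printed after the last line
' "$dir/personnelyears.data")
echo "$report"

rm -rf "$dir"
```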
another regular expression
• This is a more complex check using a file for the awk program
  ▫ Check to see the ID is 800......
    That is 800 followed by 6 characters

    # awk -f findbadid.awk personnelbad.data
    List of bad IDs follows
    Bad Id has a bad id:809123456
    End of list

    # cat personnelbad.data
    Tony Kombol Lecturer 800111222 704-687-1111 6
    Jinyue Xia TA 800111333 704-687-2222 3
    Hadi Hashemi TA 800111444 704-687-3333 1
    Fred Flintstone RA 800123321 704-687-1212 10
    Barney Rubble URA 800112233 704-687-3344 4
    Bad Id LX 809123456 704-687-8890 0

    # cat findbadid.awk
    BEGIN { print "List of bad IDs follows";}
    $4 !~ /^800....../ { print $1 " " $2 " has a bad id:" $4};
    END { print "End of list";}
    #
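The pattern /^800....../ accepts any six characters after 800 and does not limit the length. If one assumes that a valid ID is exactly nine digits (an assumption, not something the slides state), a stricter pattern could look like the sketch below. The last two test IDs are invented to show the difference.

```shell
ids='800111222
809123456
800ABCDEF
8001112223'

# Pattern from the slide: only the ID that does not start with 800 is reported.
loose=$(echo "$ids" | awk '$1 !~ /^800....../ { print }')
echo "$loose"

# Stricter variant: 800, then exactly six digits, then end of field.
strict=$(echo "$ids" |
  awk '$1 !~ /^800[0-9][0-9][0-9][0-9][0-9][0-9]$/ { print }')
echo "$strict"
```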
awk file example
    # cat grades.data
    Fred Ziffle:99:A
    Arnold Ziffle: 55: F
    Tara Boomdea: 85:B
    Neo:100:A
    Buffy Summers: 72:C
    Sheldon Cooper:67:D
    Zorbon Prentwist: 88 : B
    Zorbax Bottlewit:88:B
    Bad Grade: 33: A

    # cat ckgrades.awk
    BEGIN {print "Listing Bs\n"}
    $3 == "B" {print $0 }
    END {print "\nDone"}
    #

    # awk -F: -f ckgrades.awk grades.data
    Listing Bs

    Tara Boomdea: 85:B
    Zorbax Bottlewit:88:B

    Done
    #

Note: " B" (with a leading blank, as in the Zorbon Prentwist line) does not get matched
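One way to make the " B" record match is to let the field separator absorb the blanks around each colon. The sketch below uses a regular expression as the -F value; this is one possible approach, not necessarily the one the slides had in mind.

```shell
dir=$(mktemp -d)
cat > "$dir/grades.data" <<'EOF'
Fred Ziffle:99:A
Arnold Ziffle: 55: F
Tara Boomdea: 85:B
Neo:100:A
Buffy Summers: 72:C
Sheldon Cooper:67:D
Zorbon Prentwist: 88 : B
Zorbax Bottlewit:88:B
Bad Grade: 33: A
EOF

# The separator is a colon plus any blanks on either side of it,
# so the fields arrive without leading or trailing blanks.
bs=$(awk -F' *: *' '$3 == "B" { print $1, $2, $3 }' "$dir/grades.data")
echo "$bs"

rm -rf "$dir"
```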
Positional Parameters
• Parameters are usually used as the fields of each line
• A parameter can be passed to the awk program
  ▫ Used with a shell program
  ▫ Must be in quotes in the program
    The single quotes end the awk program text so the shell can substitute its own parameter
• e.g. Instead of
  ▫ $4 > 12
    4th parm in line is > 12
• use
  ▫ $4 > '$2'
    4th parm in line is > 2nd parm passed to the program:
    prog.awk 50 82
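A sketch of the quoting trick, with shell functions standing in for the prog.awk script. The function names, the data file and the -v variant are illustrative assumptions; -v is a standard awk option that hands a value to the program without splicing it into the program text.

```shell
dir=$(mktemp -d)
printf 'a b c 10\nd e f 60\ng h i 90\n' > "$dir/data.txt"

# over_quoted: the single quotes close before $2 and reopen after it,
# so the shell pastes its own 2nd argument into the awk program text.
over_quoted() { awk '$4 > '"$2"' { print $1 }' "$3"; }

# over_v: the same comparison, passing the value with awk -v instead.
over_v() { awk -v limit="$2" '$4 > limit { print $1 }' "$3"; }

# Called like "prog.awk 50 82", with the data file as a 3rd argument.
a=$(over_quoted 50 82 "$dir/data.txt")
b=$(over_v 50 82 "$dir/data.txt")
echo "$a $b"

rm -rf "$dir"
```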
Arrays
• awk supports arrays
  ▫ arrays do not need to be "declared"
    "declared" the minute they are used
• Arrays are associative
  ▫ index can be numeric or alphabetic
  ▫ thisday["Tue"] = "Tuesday";
    thisday[2] = "Tuesday";
    above are two array elements for the array thisday
    each references a separate string
    printf("thisday[\"Tue\"] is %s\n", thisday["Tue"]) ;
    printf("thisday[2] is %s\n", thisday[2]) ;
  ▫ Both will print "Tuesday" for the array referenced
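A common use of associative arrays is counting by key. The sketch below counts people per title; the input lines are a shortened, made-up version of the personnel data. The for-in loop visits indices in no guaranteed order, so the output is piped through sort.

```shell
counts=$(printf 'Tony Lecturer\nJinyue TA\nHadi TA\nFred RA\n' |
  awk '{ count[$2]++ }        # the element is created on first use
       END { for (t in count) print t, count[t] }' |
  sort)
echo "$counts"
```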
Built-in Variables
• awk has a set of built-in variables
  ▫ Some can be overridden

Variable   Function                                    Default
NR         Cumulative # of lines read                  -
FS         Input field separator                       space
OFS        Output field separator                      space
OFMT       Default floating point format               %.6g
RS         Record separator                            newline
NF         Number of fields in current line            -
FILENAME   Current input file                          -
ARGC       Number of arguments in command line         -
ARGV       Array containing list of arguments          -
ENVIRON    Assoc. array of all environment variables   -
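A brief sketch showing a few of these variables at work: FS and OFS are overridden in BEGIN, while NR and NF are read for each line. The colon-separated input is made up.

```shell
vars=$(printf 'a:b:c\nd:e\n' |
  awk 'BEGIN { FS = ":"; OFS = "-" }    # override input and output separators
       { print NR, NF, $1, $NF }')      # line number, field count, first and last field
echo "$vars"
```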
Functions
• awk has several built-in functions
  ▫ () are optional if no parms
    but using them is encouraged
  ▫ Arithmetic functions
  ▫ String functions
String Functions
• length()
  ▫ length of complete line
• length(x)
  ▫ length of x
• tolower(s)
  ▫ returns s as lower case
• toupper(s)
  ▫ returns s as upper case
• substr(str,m)
  ▫ returns string starting at m to end of string
• substr(str,m,n)
  ▫ returns string starting at m for n characters
• index(s1,s2)
  ▫ finds the position of s2 inside s1
• split(str,arr,ch)
  ▫ splits str into an array, the delimiter is ch
• system("cmd")
  ▫ executes a system (Linux) command and returns exit status
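The string functions above can be tried out in a BEGIN block. This is a small sketch; the sample string and the name parts are arbitrary.

```shell
strs=$(awk 'BEGIN {
    s = "Hello, awk"
    print length(s)          # number of characters in s
    print toupper(s)
    print tolower(s)
    print substr(s, 8)       # from position 8 to the end
    print substr(s, 1, 5)    # 5 characters starting at position 1
    print index(s, "awk")    # position where "awk" starts inside s
    n = split("a:b:c", parts, ":")
    print n, parts[2]        # number of pieces, then the second piece
}')
echo "$strs"
```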
If
• Syntax:
  ▫ if (cond true) {
        statements
    } else {
        statements
    }
  ▫ Notes:
    else is optional
    {} not needed for single statements
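A minimal sketch of if/else inside an action. The names, year counts and the "senior"/"junior" labels are invented for the illustration.

```shell
labels=$(printf 'Tony 6\nJinyue 3\nFred 10\n' |
  awk '{
      if ($2 > 5) {
          status = "senior"
      } else {
          status = "junior"
      }
      print $1, status
  }')
echo "$labels"
```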
For
• Syntax form 1:
  ▫ for ( startval ; condition ; control) statement
    C like in form
  ▫ Example:
    for ( k=1 ; k<9 ; k++ ) print k
• Syntax form 2:
  ▫ for ( var in array) statement
    Will set var to every index in the array
    Great for associative arrays
      Non numeric indices
      Gaps in array
    See ENVIRON example in previous slide
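Both loop forms in one small sketch. The array contents are made up to show a gap (indices 1 and 5) and a non-numeric index.

```shell
loops=$(awk 'BEGIN {
    for (k = 1; k < 9; k++) total += k       # form 1: k runs from 1 to 8
    print total
    a[1] = "x"; a[5] = "y"; a["Tue"] = "z"   # a gap and a non-numeric index
    for (i in a) n++                         # form 2: visits each index once
    print n
}')
echo "$loops"
```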
continue and break
• continue and break can be used in all loops
  ▫ for
  ▫ while
• break
  ▫ stops the loop
• continue
  ▫ stops processing statements in this iteration
  ▫ continues to next iteration
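A short sketch of both statements in one loop: even numbers are skipped with continue, and the loop is abandoned with break once k passes 7. The limits are arbitrary.

```shell
odds=$(awk 'BEGIN {
    for (k = 1; k <= 10; k++) {
        if (k % 2 == 0) continue    # skip the rest of this iteration
        if (k > 7) break            # leave the loop entirely
        s = s (s == "" ? "" : " ") k
    }
    print s
}')
echo "$odds"
```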
Resources
• Awk - A Tutorial and Introduction - by Bruce Barnett
  ▫ http://www.grymoire.com/Unix/Awk.html
• Awk Tutorial - Main Page
  ▫ http://robert.wsi.edu.pl/awk/