
CIS 218 – Advanced UNIX

(g)awk


Overview

• awk is a programming language

• awk uses syntax based on grep and sed for handling numbers and text

• awk provides field-level addressability, and can address text within a field (word) using substring functions

• awk works record by record (line by line by default), splitting each record into fields


awk command syntax

• There are two ways to execute an awk program/script:

– awk [-F field-separator] 'program' target-file

– awk [-F field-separator] -f program.file target-file

• From our discussion of sed, and Refrigerator Rule No. 5, I would hope you are firmly committed to the second form!
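A minimal sketch of the two invocation forms side by side. The file names (people.txt, names.awk) and the sample data are made up for illustration; both forms should produce the same output.

```shell
# Work in a scratch directory so nothing else is touched.
workdir=$(mktemp -d)
printf 'alice:30\nbob:25\n' > "$workdir/people.txt"

# Form 1: the program is given inline, in single quotes.
inline_out=$(awk -F: '{ print $1 }' "$workdir/people.txt")

# Form 2: the same program stored in a file and loaded with -f.
printf '%s\n' '{ print $1 }' > "$workdir/names.awk"
file_out=$(awk -F: -f "$workdir/names.awk" "$workdir/people.txt")

echo "$inline_out"
echo "$file_out"
rm -rf "$workdir"
```

Keeping the program in its own file makes it easier to comment, reuse, and debug than a long quoted one-liner.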


awk Variables

• There are a number of awk variables that are very useful:

– FS (The input field separator, defaults to white space)

– OFS (Output field separator, defaults to a single space; can be critical)

– NR (Number of records read so far, a sequential counter)

– NF (Number of fields in the current record)

– FILENAME (Name of the current target file)
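A short sketch of FS, OFS, NR and NF working together. The colon-separated records are hypothetical sample data.

```shell
# Hypothetical colon-separated input: name:city:age (second record is short).
records='ann:Oslo:41
bo:Rome'

# BEGIN sets the input (FS) and output (OFS) separators before any line is read.
# NR is the running record number; NF is the field count of the current record.
fields_out=$(printf '%s\n' "$records" |
  awk 'BEGIN { FS = ":"; OFS = "|" } { print NR, NF, $1 }')

echo "$fields_out"
```

The commas in the print statement are what trigger OFS; without them the values would be concatenated with nothing in between.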


awk Variables (cont.)

– $0 (The entire line as read from the target file)

– $n (Where n is the nth field in the record. This is how we get field level addressability in awk)

• nawk, gawk, etc. give us more variables; the two most significant are:

– ARGC (the count of the command line arguments)

– ARGV (an array of the command line arguments)
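The following sketch shows $0, $n, ARGC and ARGV in action. The arguments "one" and "two" are placeholders; because the second program has only a BEGIN block, awk never tries to open them as files.

```shell
# $0 is the whole record; $2 is its second field.
field_out=$(echo 'red green blue' | awk '{ print $0; print $2 }')

# Arguments after the program text land in ARGV[1] .. ARGV[ARGC-1].
# ARGV[0] holds the name of the awk command itself, so ARGC is 3 here.
argv_out=$(awk 'BEGIN {
    for (i = 1; i < ARGC; i++)
        print i, ARGV[i]
    print "ARGC=" ARGC
}' one two)

echo "$field_out"
echo "$argv_out"
```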


Parts of a program

• All programs are composed of one or more of the following three constructs:

– sequence (a series of instructions, one following the next, executed sequentially)

– selection (the ability of the code to decide which instructions to execute, conditional execution)

– iteration (adding looping so that selected code will be repeated over and over)


awk Program Format

• awk programs are composed of pattern {action} pairs (actions must be enclosed in French braces, i.e. curly braces { })

– a pattern without a corresponding action takes the default action, print $0

– an action without a corresponding pattern is applied to every line

– each input line is submitted to every pattern/action pair
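The three rules above can be sketched as follows. The fruit data is invented for illustration.

```shell
lines='apple 3
banana 7
cherry 5'

# Pattern only: lines whose second field exceeds 4 get the default action (print $0).
pattern_only=$(printf '%s\n' "$lines" | awk '$2 > 4')

# Action only: the action runs for every line.
action_only=$(printf '%s\n' "$lines" | awk '{ print $1 }')

# Two pairs: each line is offered to both, so "banana 7" triggers both actions.
both_pairs=$(printf '%s\n' "$lines" |
  awk '/an/ { print "has an: " $1 } $2 > 4 { print "big: " $1 }')

echo "$pattern_only"
echo "$action_only"
echo "$both_pairs"
```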


awk Program Format (cont.)

• Placement of the open French brace is critical

– Brace on the same line as the pattern: both actions are executed for lines matching the pattern

    pattern {
        action 1
        action 2
    }

– Brace on the line after the pattern: lines matching the pattern are printed, and both actions are performed on every line!

    pattern
    {
        action 1
        action 2
    }
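A small demonstration of the brace-placement trap, using two made-up input lines. In the second program the newline after /cat/ ends that rule, so the braces start a separate, pattern-less rule.

```shell
lines='cat
dog'

# Brace on the pattern's line: the action applies only to matching lines.
same_line=$(printf '%s\n' "$lines" | awk '/cat/ {
    print "matched: " $0
}')

# Brace on the next line: /cat/ gets the default print, and the
# braced action becomes its own rule that runs on every line.
next_line=$(printf '%s\n' "$lines" | awk '/cat/
{
    print "action on: " $0
}')

echo "$same_line"
echo "$next_line"
```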


Patterns

• In an awk program, the pattern is the selection tool that decides what actions are applied to which lines.

• Patterns can be:

– relational expressions

– regular expressions

– magic patterns


Relational Expression patterns

Symbol   Meaning                     Symbol   Meaning
<        Less than                   ==       equal to
<=       Less than or equal to       ~        contains the RE
>        Greater than                !~       doesn't contain RE
>=       Greater than or equal to    &&       logical and
!=       not equal to                ||       logical or
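A few of these operators used as patterns, as a sketch. The score records (name, subject, score) are hypothetical.

```shell
scores='amy math 91
ben art 78
cal math 64'

# Numeric comparison on a field.
high=$(printf '%s\n' "$scores" | awk '$3 >= 80 { print $1 }')

# && combines a string equality test with a numeric test.
math_low=$(printf '%s\n' "$scores" | awk '$2 == "math" && $3 < 70 { print $1 }')

# !~ selects records whose field does NOT match the RE; || is logical or.
not_math_or_top=$(printf '%s\n' "$scores" | awk '$2 !~ /^math$/ || $3 > 90 { print $1 }')

echo "$high"
echo "$math_low"
echo "$not_math_or_top"
```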


Regular Expression patterns

• Must be enclosed in slashes /RE/

• Anchors apply to the entire line if they are used as the only pattern

• Remember, you can use regular expressions in relational patterns with ~ and !~ to apply them to fields

• Both true regular expressions and fixed patterns can be used as REs in awk
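To illustrate how anchors behave, here is a sketch with two invented lines. The same ^ anchor selects a different line depending on whether it is applied to the whole line or to a single field.

```shell
data='root admin
admin root'

# As the only pattern, ^ anchors to the start of the whole line ($0).
line_anchor=$(printf '%s\n' "$data" | awk '/^root/')

# With ~, the same anchor applies to the start of one field only.
field_anchor=$(printf '%s\n' "$data" | awk '$2 ~ /^root/')

echo "$line_anchor"
echo "$field_anchor"
```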


Pre/Post Processing

• There are two magic patterns in awk:

– BEGIN {the associated action is performed before the target file is opened}

– END {the associated action is performed after the target file is successfully closed}

• Both are coded in UPPER CASE
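A minimal sketch of pre- and post-processing around a main action. The three numbers are sample input.

```shell
# BEGIN runs once before any input, the middle action once per line,
# and END once after the last line (NR still holds the final count).
summary=$(printf '10\n20\n30\n' |
  awk 'BEGIN { print "start" }
             { total += $1 }
       END   { print "lines=" NR, "total=" total }')

echo "$summary"
```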


# comments

• Like most scripting languages, # indicates a comment

• awk scripts should be well documented

• Comments should explain what you are doing and why.


print

• The print command is the simple output tool for awk, basically an "echo".

• You can direct print to send its data to a file with the > operator

• Generally print is used for simple output or debugging output
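A sketch of print, with and without commas, and with > redirection. The output file name and the variable name out are assumptions made for the example; inside awk the target of > is a string expression.

```shell
workdir=$(mktemp -d)

# A comma inserts the output field separator; juxtaposition concatenates.
joined=$(echo 'a 1' | awk '{ print $1, $2; print $1 $2 }')

# Redirect print to a file; the file name is passed in with -v as "out".
printf 'a 1\nb 2\n' | awk -v out="$workdir/names.txt" '{ print $1 > out }'
saved=$(cat "$workdir/names.txt")

echo "$joined"
echo "$saved"
rm -rf "$workdir"
```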


printf

• Similar in concept to the "C" language function. The format of a printf command is:

printf ("formatting string", variables)

• The format specifiers in the string correspond one for one, in order, to the variables in the list.

• Each format specifier is prefixed by %


printf (cont.)

• The formatting specifiers contain the following characters:

– - indicates that the data should be left justified

– n indicates the minimum width of the field

– .n indicates the maximum width of a string field (for floating-point formats, the number of digits after the decimal point)

"%-5s" indicates a string field, left justified, with a minimum width of 5 bytes
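The width and justification flags can be seen in a short sketch. The square brackets are only there to make the padding visible.

```shell
formatted=$(awk 'BEGIN {
    printf("[%-5s]\n", "ab")       # left justified, padded to width 5
    printf("[%5s]\n", "ab")        # right justified (the default)
    printf("[%.3s]\n", "abcdef")   # at most 3 characters of the string
    printf("[%7.2f]\n", 3.14159)   # width 7, 2 digits after the decimal point
}')

echo "$formatted"
```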


printf formatting characters

Format   Meaning                  Format   Meaning
%c       single ASCII character   %G       shortest of %E or %f
%d       decimal integer          %i       decimal integer
%e       scientific notation      %o       octal number
%E       SCIENTIFIC NOTATION      %s       string
%f       floating point           %x       hexadecimal (lc)
%g       shortest of %f or %e     %X       HEXADECIMAL
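A sketch exercising several of the formatting characters from the table. The values are arbitrary.

```shell
conversions=$(awk 'BEGIN {
    printf("%d %i\n", 42.9, 42.9)        # decimal integers: the fraction is dropped
    printf("%o %x %X\n", 255, 255, 255)  # octal, lowercase hex, uppercase hex
    printf("%c%c\n", "h", "i")           # a string argument supplies its first character
    printf("%s %.1f\n", "pi", 3.14159)   # string, then float with 1 decimal place
    printf("%e\n", 1234.5)               # scientific notation
}')

echo "$conversions"
```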


printf spacing characters

• There are two characters available to change the spacing of your text:

– \n inserts a newline character. You must use this if you want your output to occur on successive lines, because printf, unlike print, does not add one for you.

– \t inserts a tab character
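A brief sketch of both spacing characters; note how the first two printf calls end up on one line.

```shell
spaced=$(awk 'BEGIN {
    printf("a")         # no newline: the next output continues on this line
    printf("b\n")       # ends the first line
    printf("x\ty\n")    # \t puts a tab between x and y
}')

echo "$spaced"
```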


getline

• getline is used to read a line of input, for example from the keyboard

• It can also capture the results of a command ("command" | getline variable) but this form is seldom used

• Read from the keyboard using getline variable < "/dev/tty"

• If you don't supply a variable, awk will use $0, so in most cases you want to use a variable.
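Reading "/dev/tty" needs a person at the keyboard, so this sketch substitutes a file for the keyboard and also shows the command form. The file name answers.txt and the variable names are assumptions for the example.

```shell
workdir=$(mktemp -d)
printf 'first line\nsecond line\n' > "$workdir/answers.txt"

got=$(awk -v src="$workdir/answers.txt" 'BEGIN {
    # Same form as reading the keyboard, with a file in place of "/dev/tty".
    getline reply < src
    print "file: " reply

    # Command form: the first line of the command output lands in the variable.
    "echo hello" | getline greeting
    close("echo hello")
    print "cmd: " greeting
}')

echo "$got"
rm -rf "$workdir"
```

Because a variable is supplied in both calls, $0 is left untouched.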


rand() srand()

• The rand() function generates pseudo-random numbers in the range 0 to 1 (from 0 up to, but not including, 1).

• Given the same seed, it will always generate the same series of numbers.

• srand() is used to supply a new seed to rand().

• If you don’t supply srand() a value, it uses the current time as the seed.
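A sketch of seeding, using a hypothetical shell helper named roll that simulates three dice throws. With the same seed, two separate runs should give the same series; the actual values depend on the awk implementation.

```shell
roll() {
    # srand(seed) fixes the seed; rand() then yields the same series each run.
    awk -v seed="$1" 'BEGIN {
        srand(seed)
        for (i = 1; i <= 3; i++) {
            # Scale a value in [0,1) to a whole number from 1 to 6, like a die.
            printf("%d ", int(rand() * 6) + 1)
        }
        printf("\n")
    }'
}

first=$(roll 42)
second=$(roll 42)
echo "$first"
echo "$second"
```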


system()

• The system() function allows you to execute system commands within an awk script.

• You must enclose the system command in quotation marks.

• You cannot capture the output from the system() function within the script but you can capture the return code.
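A minimal sketch of capturing return codes, using the standard true and false utilities so that the commands themselves print nothing.

```shell
codes=$(awk 'BEGIN {
    ok = system("true")      # the command is passed as a quoted string
    bad = system("false")
    print "true returned " ok
    print "false returned " (bad != 0 ? "nonzero" : "zero")
}')

echo "$codes"
```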


length()

• The length([argument]) function returns the length of the argument in characters (bytes for plain ASCII text).

• If you give length() a number, it will return the number of digits in the number.

• If you don't give length() an argument, it will use $0 by default.
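The three cases above, sketched on a sample line:

```shell
lengths=$(echo 'hello world' | awk '{
    print length($1)      # length of the first field, "hello"
    print length(12345)   # number of digits in the number
    print length()        # no argument: length of $0
}')

echo "$lengths"
```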


index()

• The index(string,target) function returns the position of the first occurrence of the target within the string, or 0 if the target is not found.

• The index() function is often used to set the boundary for the substr() function.
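A sketch of index() setting the boundary for substr(), splitting a made-up address at the @ sign:

```shell
parts=$(echo 'user@example.com' | awk '{
    at = index($0, "@")            # position of the first "@"
    print at
    print substr($0, 1, at - 1)    # everything before the "@"
    print substr($0, at + 1)       # everything after it
    print index($0, "#")           # target not present
}')

echo "$parts"
```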


substr()

• The substr(string,start[,length]) function will return the part of the string beginning with start (positions are counted from 1) and continuing for length bytes.

• If you don't give it a length, it will return all the bytes between the start and the end of the string.
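Both forms of substr(), sketched on an arbitrary date string:

```shell
pieces=$(awk 'BEGIN {
    s = "2024-03-15"
    print substr(s, 1, 4)   # 4 bytes starting at position 1
    print substr(s, 6, 2)   # 2 bytes starting at position 6
    print substr(s, 9)      # no length: from position 9 to the end
}')

echo "$pieces"
```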


split()

• You will use split(string, array[, separator]) to divide a string into parts using separator to parse them, storing the resultant parts in the array (elements are numbered from 1). The function returns the number of parts.

• If you don't code a separator, the function will use the field separator to parse the string.
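A sketch of split() with and without an explicit separator. The array names colors and words are chosen for the example.

```shell
split_out=$(awk 'BEGIN {
    # Explicit separator: split on commas.
    n = split("red,green,blue", colors, ",")
    print n
    print colors[1], colors[3]

    # No separator: the field separator (white space by default) is used.
    m = split("one two  three", words)
    print m, words[3]
}')

echo "$split_out"
```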


if

• Besides using patterns, if gives us another way to perform selection

• The format of an if statement is if (condition) {verb(s)} [else {verb(s)}]

• If you have more than one verb, they must be enclosed in French braces.


if conditions

A < B       A is less than B
A <= B      A is less than or equal to B
A == B      A equals B (note 2 =)
A > B       A is greater than B
A >= B      A is greater than or equal to B
A != B      A is not equal to B
A ~ /RE/    A contains the regular expression RE


if

• A sample if
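One possible sample, offered as an illustrative sketch rather than the slide's own example. The scores and the pass mark of 60 are invented. The if branch has two verbs, so it needs braces; the else branch has one and does not.

```shell
grades=$(printf '85\n42\n' | awk '{
    if ($1 >= 60) {
        status = "pass"
        passed++
    } else
        status = "fail"
    print $1, status
} END { print "passed=" passed }')

echo "$grades"
```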


exit

• The input file is closed

• Control is transferred to the action associated with the END magic pattern if there is one

• Generally used as a bailout in case of catastrophic errors
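A sketch of exit as a bailout, with made-up input in which the word BAD stands for a catastrophic record. The third line is never processed, but the END action still runs.

```shell
exit_out=$(printf 'ok\nBAD\nnever seen\n' | awk '
    /BAD/ { print "bailing out at line " NR; exit }
          { print "processed " $0 }
    END   { print "END still runs" }')

echo "$exit_out"
```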


for loop

• This is a counted loop

• executes until the counter reaches the target value

• Increment (count up) or decrement (count down)

• also works with the elements of an array

• multiple verbs must be enclosed in { }


for loop example
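An illustrative sketch (not the original slide's example) showing a loop that counts up, one that counts down, and the array form. The array name seen is made up.

```shell
for_out=$(echo 'a b c' | awk '{
    # Count up through the fields.
    for (i = 1; i <= NF; i++)
        printf("%d=%s ", i, $i)
    printf("\n")

    # Count down.
    for (i = NF; i >= 1; i--)
        printf("%s", $i)
    printf("\n")

    # Array form: visit every index of the array (order is not guaranteed).
    seen["x"] = 1; seen["y"] = 1
    n = 0
    for (key in seen) n++
    print n
}')

echo "$for_out"
```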


while loop

• The while loop is an example of conditional execution

• The loop cycles as long as the condition specified is true

• A while loop always checks the condition before each pass, so the body may not execute at all

• multiple verbs must be enclosed in { }


while loop example
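An illustrative sketch (not the original slide's example): count how many times 100 can be halved before reaching 1. The second loop's condition is already false, so its body never runs.

```shell
while_out=$(awk 'BEGIN {
    n = 100; steps = 0
    while (n > 1) {
        n = int(n / 2)     # 100, 50, 25, 12, 6, 3, 1
        steps++
    }
    print steps

    # The condition is checked first, so this body is skipped entirely.
    while (n > 1) print "not reached"
}')

echo "$while_out"
```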


do/while

• Even though it has a while in it, this is an example of until logic: the condition is tested after the body, so the body always runs at least once.

• Until logic is shunned by conscientious coders.

• 'nuff said
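For reference, a minimal sketch of the behaviour in question: the condition is false from the start, yet the body still runs once.

```shell
do_out=$(awk 'BEGIN {
    n = 0
    do {
        print "body ran with n=" n
        n++
    } while (n < 0)    # false on the first test, but the body already ran
}')

echo "$do_out"
```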


break

• Used to exit from a loop

• Control is passed to the line following the end of the loop

• Causes an exit from the loop but NOT the awk script. If you want to bail out of the whole script, use the exit command.


break example
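An illustrative sketch (not the original slide's example): add up fields until a negative value is met. The numbers are invented; the print after the loop shows that the script carries on.

```shell
break_out=$(echo '4 8 -1 15 16' | awk '{
    for (i = 1; i <= NF; i++) {
        if ($i < 0)
            break          # leave the loop; the script itself continues
        sum += $i
    }
    print "stopped at field " i ", sum=" sum
}')

echo "$break_out"
```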


continue

• Causes awk to skip the rest of the body of the loop for the current value

• In a for loop the counter is incremented, and the next cycle of the loop is started

• In a while loop, the next iteration of the loop starts


continue example
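An illustrative sketch (not the original slide's example): the same invented numbers as a break demonstration might use, but here the negative value is skipped and the loop carries on with the remaining fields.

```shell
continue_out=$(echo '4 8 -1 15 16' | awk '{
    for (i = 1; i <= NF; i++) {
        if ($i < 0)
            continue       # skip this field, move on to the next one
        sum += $i
    }
    print "sum=" sum
}')

echo "$continue_out"
```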


next

• Causes the script to start over with the first pattern/action pair

• takes the next element (record) from standard input or the target file

• Like exit, this command affects the whole script


next example
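An illustrative sketch (not the original slide's example): comment lines and blank lines are discarded with next, so the second pattern/action pair never sees them. The input is made up.

```shell
next_out=$(printf '# header\nalpha\n\nbeta\n' | awk '
    /^#/ || NF == 0 { next }               # skip comments and empty lines
                    { count++; print count ": " $0 }')

echo "$next_out"
```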