Shell Scripting Awk (part1) Awk Programming Language standard unix language that is geared for text...


Shell Scripting: Awk (part 1)

Awk Programming Language

standard unix language that is geared for text processing and creating formatted reports

but it is very valuable to seismologists because it uses floating-point math, unlike integer-only bash, and is designed to work with columnar data

syntax similar to C and bash

one of the most useful unix tools at your command

considers text files as fields (columns) and records (lines)

performs floating & integer arithmetic and string operations

uses loops and conditionals

define your own functions (subroutines)

execute unix commands within the scripts and process the results

Versions

awk: original awk

nawk: new awk, dates to 1987

gawk: GNU awk has more powerful string functionality

the CERI unix system has all three. You want to use nawk. I suggest adding this line to your .cshrc file:

alias awk 'nawk'

in OS X, awk is already nawk so no changes are necessary

Command line functionality

you can call awk from the command line two ways:

awk [options] '{ commands }' variables infile(s)

awk -f scriptfile variables infile(s)

or you can create an executable awk script (the -f on the #! line tells nawk to read its commands from the script file)

%cat << EOF > test.awk
#!/usr/bin/nawk -f
some set of commands
EOF

%chmod 755 test.awk

%./test.awk

How it treats text

awk commands are applied to every record (line) of a file

it is designed to separate the data in each line into fields

essentially, each field becomes a member of an array so that the first field is $1, second field $2 and so on.

$0 refers to the entire record

Field Separator

the default field separator is one or more white spaces

$1 $2   $3 $4 $5 $6 $7    $8     $9     $10  $11
1  1918 9  22 9  54 49.29 -1.698 98.298 15.0 ehb

Notice that the fields may be integer, floating point (have a decimal point) or strings. Nawk is generally smart enough to figure out how to use them.
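
A minimal sketch of how the mixed field types behave, assuming a POSIX shell and any awk on the PATH. The record is the sample line above, piped in so that no data file is needed; the variable names record and out are only for illustration.

```shell
# Sample record copied from the slide above.
record='1 1918 9 22 9 54 49.29 -1.698 98.298 15.0 ehb'

# $2 is used as an integer, $7 takes part in floating point math,
# $11 is a plain string, and NF counts the fields in the record.
out=$(printf '%s\n' "$record" | awk '{ print $2, $7 + 1, $11, NF }')
printf '%s\n' "$out"
```

The same field can be treated as a number in one expression and as text in another; awk decides from how it is used.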

Field Separator

the field separator may be modified by resetting the FS built-in variable

Look at passwd file

%head -n1 /etc/passwd

root:x:0:1:Super-User:/:/sbin/sh

Separator is ":", so reset it.

%awk -F":" '{ print $1, $3}' /etc/passwd

root 0
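
The -F option is one way to reset FS; assigning FS in a BEGIN block should be equivalent. A small sketch, assuming a POSIX shell, with the passwd line from the slide piped in so the result does not depend on the local /etc/passwd (the variable names are illustrative):

```shell
# The passwd entry shown on the slide.
line='root:x:0:1:Super-User:/:/sbin/sh'

# Reset the field separator with the -F option ...
with_flag=$(printf '%s\n' "$line" | awk -F: '{ print $1, $3 }')

# ... or by assigning FS before the first record is read.
with_begin=$(printf '%s\n' "$line" | awk 'BEGIN { FS = ":" } { print $1, $3 }')

printf '%s\n' "$with_flag" "$with_begin"
```

The BEGIN form is convenient inside a script file, where there is no command line to carry the -F option.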

print

One of the most common commands used in awk scripts is print

awk ignores white space between the items of a command, so fields listed without a comma are joined together

%awk -F":" '{ print $1 $3}' /etc/passwd

root0

two solutions to this

%awk -F":" '{ print $1 " " $3}' /etc/passwd

%awk -F":" '{ print $1, $3}' /etc/passwd

root 0

any string or numeric text can be explicitly output using double quotes ("")

Assume a starting file like so:

1 1 1918 9 22 9 54 49.29 -1.698 98.298 15.0 0.0 0.0 ehb FEQ x

%awk '{print "latitude:",$9,"longitude:",$10,"depth:",$11}' SUMA.loc

latitude: -1.698 longitude: 98.298 depth: 15.0

latitude: 9.599 longitude: 92.802 depth: 30.0

latitude: 4.003 longitude: 94.545 depth: 20.0

• Unlike the shell, awk does not evaluate variables within strings.

• For example, a command to print fields 8 and 3 separated by a tab could not be written:

{print "$8\t$3" }

as it would print the literal text "$8", a tab, and "$3".

• Inside quotes, the dollar sign is not a special character. Outside, it corresponds to a field.
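
A quick way to see the difference, sketched with a made-up eight-field line (the letters a to h are placeholders, not real data) and assuming a POSIX shell:

```shell
# Hypothetical input: eight single-letter fields.
line='a b c d e f g h'

# Inside the quotes, $8 and $3 are just text.
literal=$(printf '%s\n' "$line" | awk '{ print "$8\t$3" }')

# Outside the quotes they are fields; only the tab is quoted.
fields=$(printf '%s\n' "$line" | awk '{ print $8 "\t" $3 }')

printf '%s\n' "$literal" "$fields"
```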

1 1 1918 9 22 9 54 49.29 -1.698 98.298 15.0 0.0 0.0 ehb FEQ x

you can specify a newline in two ways

%awk '{print "latitude:",$9; print "longitude:",$10}' SUMA.loc

%awk '{print "latitude:",$9 "\nlongitude:",$10}' SUMA.loc

latitude: -1.698

longitude: 98.298

A trick

If a field starts with a number and ends with letters, you can multiply the field by 1 to remove the letters.

%head test.tmp

1.5 2008/09/09 03:32:10 36.440N 89.560W 9.4

1.8 2008/09/08 23:11:39 36.420N 89.510W 7.1

1.7 2008/09/08 19:44:29 36.360N 89.520W 8.2

%awk '{print $4,$4*1}' test.tmp

36.440N 36.44

36.420N 36.42

36.360N 36.36
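
The same trick can be extended to turn hemisphere letters into signs. This is only a sketch: the convention that south and west are negative is my assumption, not something stated on the slide, and it uses the ~ pattern match operator. The input is the first line of test.tmp.

```shell
# First record of test.tmp from the slide.
line='1.5 2008/09/09 03:32:10 36.440N 89.560W 9.4'

out=$(printf '%s\n' "$line" | awk '{
    lat = $4 * 1                  # "36.440N" becomes 36.44
    lon = $5 * 1                  # "89.560W" becomes 89.56
    if ($4 ~ /S$/) lat = -lat     # assumed convention: south is negative
    if ($5 ~ /W$/) lon = -lon     # assumed convention: west is negative
    print lat, lon
}')
printf '%s\n' "$out"
```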

Selective execution

awk recognizes regular expressions and conditionals, which can be used to selectively execute awk procedures on certain records

%awk -F":" ' /root/ { print $1, $3}' /etc/passwd #reg expr

root 0

or within our example script

#!/usr/bin/nawk -f

/root/ { print $1}

if statements are also very useful

%awk -F":" ' {if ($1=="root") print $1, $3}' /etc/passwd

root 0

or within our example script

{ if ($1=="root") { print $1 } }

note, this particular if syntax is a bit different from your reading, which suggested

%awk -F":" ' $1=="root" {print $1, $3}' /etc/passwd

the syntax I use is more explicit and more like C or perl, so I essentially have to remember less syntax
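
To compare the three selection styles side by side, here is a sketch on a made-up passwd-like sample (the rooty entry is hypothetical, added to show that a bare regular expression matches anywhere in the line). It assumes a POSIX shell.

```shell
# Hypothetical passwd-style sample; only the first entry resembles the slide.
data='root:x:0:1
daemon:x:1:1
rooty:x:5:1'

# Regular expression: selects every record containing "root".
regex=$(printf '%s\n' "$data" | awk -F: '/root/ { print $1, $3 }')

# Explicit if statement: exact comparison on field 1.
ifform=$(printf '%s\n' "$data" | awk -F: '{ if ($1 == "root") print $1, $3 }')

# Pattern form from the reading: same test, written before the braces.
pattern=$(printf '%s\n' "$data" | awk -F: '$1 == "root" { print $1, $3 }')

printf '%s\n' "$regex" "$ifform" "$pattern"
```

The if form and the pattern form select the same records; the regular expression also picks up rooty.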

Floating Point Arithmetic

awk does floating point math!!!!!

it stores all variables as strings, but when math operators are applied, it converts the strings to floating point numbers if the string consists of numeric characters

the reading calls this stringy variables
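
A small comparison with shell integer arithmetic, assuming a POSIX shell. The three depths are the values printed in the SUMA.loc example earlier; the END block, which runs after the last record, is used here only to print the average.

```shell
# Shell arithmetic is integer only: the fraction is dropped.
shell_result=$(( 7 / 2 ))

# awk does the same division in floating point.
awk_result=$(awk 'BEGIN { print 7 / 2 }')

# Mean of the three depths from the earlier example output.
mean_depth=$(printf '%s\n' 15.0 30.0 20.0 |
    awk '{ total += $1 } END { print total / NR }')

printf '%s\n' "$shell_result" "$awk_result" "$mean_depth"
```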

Arithmetic Operators

Basic arithmetic is left to right associative, except for ^, which is right to left associative

+ : addition
- : subtraction
* : multiplication
/ : division
% : remainder or modulus
^ : exponent

other standard C programming operators
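
A short check of a few operators, as a sketch assuming a POSIX shell. Note that in the awk implementations I am aware of, a chain of ^ groups from the right, so 2 ^ 3 ^ 2 is 2 ^ 9, while subtraction groups from the left.

```shell
out=$(awk 'BEGIN {
    # remainder, exponent, chained exponent, chained subtraction
    print 7 % 3, 2 ^ 3, 2 ^ 3 ^ 2, 10 - 4 - 3
}')
printf '%s\n' "$out"
```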

Assignment Operators

= : set variable equal to value on right
+= : set variable equal to itself plus the value on right
-= : set variable equal to itself minus the value on right
*= : set variable equal to itself times the value on right
/= : set variable equal to itself divided by value on right
%= : set variable equal to the remainder of itself divided by the value on the right
^= : set variable equal to itself raised to the exponent on the right

Unary Operations

A unary expression contains one operand and one operator

++ : increment the operand by 1

if ++ occurs after, x++, the original value of the operand is used in the expression and then incremented

if ++ occurs before, ++x, the incremented value of the operand is used in the expression

-- : decrement the operand by 1
+ : unary plus maintains the value of the operand, x=+x
- : unary minus negates the value of the operand, -1*x=-x
! : logical negation returns 1 if the operand is false (zero or empty) and 0 if it is true
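
A sketch of the unary operators on an awk variable, assuming a POSIX shell; x, a and b are illustrative names.

```shell
out=$(awk 'BEGIN {
    x = 5
    a = x++     # a gets the old value 5, then x becomes 6
    b = ++x     # x becomes 7 first, then b gets 7
    print a, b, x, -x, !x, !0
}')
printf '%s\n' "$out"
```

Here !x is 0 because x is non-zero, and !0 is 1.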

Relational Operators

Returns 1 if true and 0 if false

!!! opposite of the bash test command, which returns 0 for true

All relational operators are left to right associative

< : test for less than
<= : test for less than or equal to
> : test for greater than
>= : test for greater than or equal to
== : test for equal to
!= : test for not equal

Boolean (Logical) Operators

Boolean operators return 1 for true and 0 for false

&& : logical AND; tests that both expressions are true

left to right associative

|| : logical OR ; tests that one or both of the expressions are true

left to right associative

! : logical negation; tests that expression is false

Unlike bash, the comparison and relational operators don’t have different syntax for strings and numbers.

i.e., == in awk rather than == or -eq using test
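
A sketch of how one set of operators covers both cases, assuming a POSIX shell. Fields that look like numbers are compared as numbers, while quoted string constants are compared character by character; the input line is made up for illustration.

```shell
out=$(printf '%s\n' '10 9 abc abc' | awk '{
    num = ($1 > $2)       # numeric fields: 10 > 9 is true
    str = ("10" > "9")    # string constants: "1" sorts before "9", so false
    eq  = ($3 == $4)      # string fields: identical text, so true
    print num, str, eq
}')
printf '%s\n' "$out"
```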

Comparison Operators

~ : pattern match

!~ : pattern does not match

&& : logical AND

|| : logical OR

== : equals (numeric or string)

!= : does not equal (numeric or string)
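
A sketch combining ~, !~ and && on coordinate fields, assuming a POSIX shell. The first record follows the test.tmp format shown earlier; the second record is hypothetical.

```shell
# Latitude and longitude with hemisphere letters.
data='36.440N 89.560W
12.100S 45.000E'

# Keep records whose latitude ends in N and whose longitude does not end in E.
out=$(printf '%s\n' "$data" | awk '$1 ~ /N$/ && $2 !~ /E$/ { print $1, $2 }')
printf '%s\n' "$out"
```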

Built-In Variables

FS: Field Separator

NR: record number is another useful built-in awk variable

it takes on the current line number, starting from 1

%awk -F":" ' {if (NR==1) print $1, $3}' /etc/passwd

root 0

this is useful when headers are present in a file
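
For the header case, a sketch that skips the first line instead of selecting it, assuming a POSIX shell. The header names are made up; the data rows reuse values from the earlier latitude/longitude/depth output.

```shell
# Hypothetical file contents: one header line followed by data.
data='lat lon depth
-1.698 98.298 15.0
9.599 92.802 30.0'

# NR > 1 is true for every record after the header.
out=$(printf '%s\n' "$data" | awk '{ if (NR > 1) print NR, $3 }')
printf '%s\n' "$out"
```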

RS : record separator specifies when the current record ends and the next begins

default is "\n" or newline

useful option is "", or a blank line

OFS : output field separator

default is " " or a single space

ORS : output record separator

default is a "\n" or newline
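
A sketch of RS and OFS working together, assuming a POSIX shell. The station blocks are invented for illustration. With RS set to "", each blank-line-separated block is one record, and newlines inside a block also separate fields.

```shell
# Hypothetical input: two blocks separated by a blank line.
data='station MEM
lat 35.1

station NYC
lat 40.7'

# One output line per block, fields joined by a comma instead of a space.
out=$(printf '%s\n' "$data" |
    awk 'BEGIN { RS = ""; OFS = "," } { print $2, $4 }')
printf '%s\n' "$out"
```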

NF : number of fields in the current record

think of this as awk looking ahead to the next RS to count the number of fields in advance

FILENAME : stores the current filename

OFMT : output format for numbers

example OFMT="%.6f" would make print show all non-integer numbers with six digits after the decimal point
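
A sketch of NF and OFMT, assuming a POSIX shell and made-up input. Because NF holds the field count, $NF refers to the last field of each record. OFMT is set to "%.2f" here rather than the "%.6f" of the slide just to keep the output short.

```shell
# NF differs from record to record; $NF is the last field.
counts=$(printf '%s\n' 'a b c' 'd e' | awk '{ print NF, $NF }')

# OFMT changes how print shows non-integer numbers; integers are unaffected.
formatted=$(awk 'BEGIN { OFMT = "%.2f"; print 3.14159, 42 }')

printf '%s\n' "$counts" "$formatted"
```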

Accessing shell variables in nawk

3 methods to access shell variables inside a nawk script ...

1. Assign the shell variables to awk variables after the body of the script, but before you specify the input file

VAR1=3
VAR2="Hi"

awk '{print v1, v2}' v1=$VAR1 v2=$VAR2 input_file

3 Hi

Note that I am sneaking in the concept of awk variables here (v1,v2)

There are a couple of constraints with this method

- Shell variables assigned using this method are not available in the BEGIN section

- If variables are assigned after a filename, they will not be available when processing that filename

awk '{print v1, v2}' v1=$VAR1 file1 v2=$VAR2 file2

In this case, v2 is not available to awk when processing file1.
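
Both constraints can be seen in one run. This is a sketch assuming a POSIX shell with mktemp available; the two one-line files are created in a temporary directory only for the demonstration, and the square brackets make an empty variable visible.

```shell
tmpdir=$(mktemp -d)
printf 'a\n' > "$tmpdir/file1"
printf 'b\n' > "$tmpdir/file2"

VAR1=3
VAR2=Hi

# v1 is assigned before file1 is read, v2 only before file2 is read;
# neither assignment has happened yet when BEGIN runs.
out=$(awk 'BEGIN { print "BEGIN:", "[" v1 "]" }
           { print $1, "[" v1 "]", "[" v2 "]" }' \
      v1="$VAR1" "$tmpdir/file1" v2="$VAR2" "$tmpdir/file2")

rm -rf "$tmpdir"
printf '%s\n' "$out"
```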

Also note: awk variables are referred to by just their name (no $ in front)

awk '{print v1, v2, NF, NR}' v1=$VAR1 file1 v2=$VAR2 file2

2. Use the -v switch to assign the shell variables to awk variables.

This works with nawk, but not with all flavours of awk.

nawk -v v1=$VAR1 -v v2=$VAR2 '{print v1, v2}' input_file
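
One practical difference from method 1 is that a variable set with -v already has its value in the BEGIN section. A sketch, assuming a POSIX shell and an awk that supports -v (nawk and gawk do):

```shell
VAR1=3

# No input file is needed: BEGIN runs before any input is read,
# and v1 is already set.
out=$(awk -v v1="$VAR1" 'BEGIN { print v1 * 2 }')
printf '%s\n' "$out"
```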

3. Protect the shell variables from awk by enclosing them with "'" (i.e. double quote - single quote - double quote).

awk '{print "'"$VAR1"'", "'"$VAR2"'"}' input_file
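
A sketch of what the quoting does, assuming a POSIX shell; the single input line x is only there so the command body runs once. The shell pastes the variable values into the awk program text before awk sees it, so awk receives ordinary string constants.

```shell
VAR1=3
VAR2="Hi there"

# After shell expansion, awk sees: {print "3", "Hi there"}
out=$(printf 'x\n' | awk '{print "'"$VAR1"'", "'"$VAR2"'"}')
printf '%s\n' "$out"
```

Because the values become part of the program text, a value containing a double quote or a backslash would change the awk program itself, so methods 1 and 2 are safer for arbitrary input.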