CIS52 – File Manipulation
1© 2001 John Urrutia. All rights reserved.
CIS52 – File Manipulation
File Manipulation Utilities, Regular Expressions, sed, awk

Overview
comm – comparison of sorted files
cut – output sections of lines in a file
find – find files that match a pattern
paste – merges records in files
pr – paginate files into pages
tr – translate or delete characters

Overview
regular expressions
sed – Stream Editor (batch file editor)
awk – Aho, Weinberger, Kernighan (pattern match)

The comm before the storm
Compares 2 sorted files
Results reported in 3 columns
  1st – records found only in file 1
  2nd – records found only in file 2
  3rd – records that match in both files
Options -1, -2 and -3 remove the corresponding columns

comm – cont.
Either file name can be replaced with - to read standard input
Example:
File1   File2
aa      bb
dd      cc
ee      dd
gg      ee
hh      ff

comm results
File1   File2   Both
aa
        bb
        cc
                dd
                ee
        ff
gg
hh
option -1
bb
cc
        dd
        ee
ff
option -2
aa
        dd
        ee
gg
hh
option -12
dd
ee
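
The column options can be tried directly on the two sample files. This is a minimal sketch, assuming a POSIX shell with mktemp available; the scratch directory and the variable names are illustrative only.

```shell
# Work in a scratch directory so nothing outside it is touched.
dir=$(mktemp -d)
printf '%s\n' aa dd ee gg hh > "$dir/File1"
printf '%s\n' bb cc dd ee ff > "$dir/File2"

# -12 suppresses columns 1 and 2, leaving only records found in both files.
both=$(comm -12 "$dir/File1" "$dir/File2")
# -23 suppresses columns 2 and 3, leaving records found only in File1.
only1=$(comm -23 "$dir/File1" "$dir/File2")

printf 'both:\n%s\n' "$both"
printf 'only in File1:\n%s\n' "$only1"
rm -rf "$dir"
```

Both inputs must already be sorted; otherwise comm reports misleading results.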

cut to the chase
Allows you to extract portions of each record in a file.
Delimits data in the file into fields or columns.
  Default delimiter is the tab character
  Can be changed by the -d option

cut cont.
cut {-b | -c | -f [-d char] [-s]} list [--output-delimiter=string]
  b – bytes
  c – characters (same as bytes)
  f – fields
  d – delimiter character
  s – display only records with delimiters

cut ! print
char – single byte used to delimit fields in a record
list – list of ranges of bytes, characters or fields to display
  Ranges are comma separated.
  1-7 first 7 characters in record
  1,7 first and seventh characters

cut ! print again
string – list of characters to substitute for the delimiters in the output.
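
The options above can be combined as in the following sketch. It assumes GNU cut (the --output-delimiter option is a GNU extension); the sample record and variable names are made up for illustration.

```shell
line='alpha:beta:gamma'

# -d sets the input delimiter, -f selects fields 1 and 3.
fields=$(printf '%s\n' "$line" | cut -d: -f1,3)
# --output-delimiter (GNU cut) replaces ':' in the output.
joined=$(printf '%s\n' "$line" | cut -d: -f1,3 --output-delimiter=' ')
# -s drops records that contain no delimiter at all.
kept=$(printf '%s\n' "$line" 'no delimiter here' | cut -d: -f2 -s)
# -c with a range: the first 5 characters of the record.
first5=$(printf '%s\n' "$line" | cut -c1-5)

printf '%s\n' "$fields" "$joined" "$kept" "$first5"
```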

cut - Example
[/@linux2 uid]$ cat file1
The quick brown fox eyed the jactitating dog
[/@linux2 uid]$ cut -f1,3,5,8 -d' ' file1
The brown eyed dog
[/@linux2 uid]$ cut -f1,4-6,8 -d' ' file1
The fox eyed the dog

find that pot of gold
find – selects all files that meet the selection criteria in the expression
  No action other than printing the path is taken unless it is specified (POSIX find assumes -print when no action is given)
  Sub-directories are scanned automatically
  The expression can be simple or complex

find me something
The criteria expression:
  ANDs operands separated by a space
  ORs operands separated by -o
  Processes left to right sequentially

find criteria continued
Actions
  -print prints the path of all files that meet the selection criteria
  -exec cmds \; executes the commands before the \;
  -ok same as -exec but must have a y from stdin before each command runs.

find criteria continued again
Evaluations
  -type specify a type of file (e.g. d for directory)
  -atime ±n accessed more than (+n) or less than (-n) n days ago.
  -mtime ±n modified more than (+n) or less than (-n) n days ago.
  -user uid owner of the file
  -nouser the owner uid of the file is not known to the system
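
The criteria and actions can be combined as in this sketch. It assumes a POSIX find; the -name test is not covered on the slides and is used here only to give the example something to match, and the scratch directory layout is invented.

```shell
dir=$(mktemp -d)
mkdir "$dir/sub"
touch "$dir/a.txt" "$dir/b.log" "$dir/sub/c.txt"

# Tests separated by a space are ANDed: regular files named *.txt.
# The sub-directory is searched automatically.
txt=$(find "$dir" -type f -name '*.txt' -print | sort)

# -o ORs two tests; the escaped parentheses group them so that
# -type f and -print apply to both alternatives.
either=$(find "$dir" -type f \( -name '*.txt' -o -name '*.log' \) -print | wc -l)

# -mtime -1 selects files modified less than one day ago;
# -exec runs the command once per match, with {} replaced by the path.
recent=$(find "$dir" -type f -mtime -1 -exec basename {} \; | sort)

printf '%s\n' "$txt" "$either" "$recent"
rm -rf "$dir"
```

Without the parentheses, the implicit AND binds more tightly than -o, so -print would apply only to the last alternative.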

paste tastes good
paste [options] [filelist]
  corresponding records from each file are merged into 1 record
  -s process filelist sequentially. All records of one file are joined before going to the next file
  -d [delimiter list] each character in turn delimits the merged records.

paste continued
[/@linux2 uid]$ cat file1
A
B
C
[/@linux2 uid]$ cat file2
1
2
3
[/@linux2 uid]$ cat file3
x
y
z

paste continued
[/@linux2 uid]$ paste file1 file2 file3
Output file
A 1 x
B 2 y
C 3 z
[/@linux2 uid]$ paste -s file1 file2 file3
Output file
A B C
1 2 3
x y z
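
The -d option is easier to see with a non-default delimiter list. A minimal sketch, reusing the three sample files from the slides in a scratch directory (the directory and variable names are illustrative):

```shell
dir=$(mktemp -d)
printf '%s\n' A B C > "$dir/file1"
printf '%s\n' 1 2 3 > "$dir/file2"
printf '%s\n' x y z > "$dir/file3"

# Each character of the -d list is used in turn:
# ',' between file1 and file2, ':' between file2 and file3.
merged=$(paste -d',:' "$dir/file1" "$dir/file2" "$dir/file3")

# -s joins all records of one file into a single record
# before moving on to the next file.
serial=$(paste -s -d',' "$dir/file1" "$dir/file2")

printf '%s\n' "$merged" "$serial"
rm -rf "$dir"
```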

pr – public relations--NOT
pr paginate file(s) for printing
Can specify page attributes
  Lines per page changed through the -l option
  For multiple files each starts a new page

pr – continued
pr paginate a file for printing
Creates a header and trailer
  Changed through the -h option
  Suppress through the -t option
Can create columns of data
  -nbr Number of columns per line
  -Sx Character used to separate columns

pr – continued
Can create numbers for each line
  -nck
  c – character data separator, default is tab character
  k – number of digits
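
A short sketch of the numbering and column options, assuming GNU pr; the input lines and variable names are invented for illustration, and exact spacing may differ between implementations.

```shell
# -t suppresses the header and trailer; -n numbers each line
# (by default a tab separates the number from the text).
numbered=$(printf '%s\n' alpha beta | pr -t -n)

# -n:3 uses ':' as the separator and 3 digits for the number.
custom=$(printf '%s\n' alpha beta | pr -t -n:3)

# -2 arranges the input in two columns, filled top to bottom.
columns=$(printf '%s\n' 1 2 3 4 5 6 | pr -t -2)

printf '%s\n' "$numbered" "$custom" "$columns"
```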

Regular Expressions
A set of characters that define the criteria used to identify a string within a record.
Used by vi, grep, sed, awk, and others.

tr – Translate this
tr [-c] [-d] [-s] [-t] set1 [set2]
Translate from set1 to set2
  c – complement of set1
  d – delete characters found in set1
  s – squeeze out duplicates
  t – truncate set1 to length of set2
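
Each option can be tried on a single line of text. A minimal sketch; the sample string and variable names are illustrative.

```shell
text='hello   world 42'

# Translate lower case (set1) to upper case (set2).
upper=$(printf '%s\n' "$text" | tr 'a-z' 'A-Z')     # HELLO   WORLD 42
# -d deletes every character found in set1.
novowel=$(printf '%s\n' "$text" | tr -d 'aeiou')    # hll   wrld 42
# -s squeezes runs of the same character into one.
squeezed=$(printf '%s\n' "$text" | tr -s ' ')        # hello world 42
# -c complements set1: delete everything except digits and newline.
digits=$(printf '%s\n' "$text" | tr -cd '0-9\n')     # 42

printf '%s\n' "$upper" "$novowel" "$squeezed" "$digits"
```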

Regular Expressions
Simple strings
  Bound by / … /
  Interpreted literally
  e.g. /e D/ matches exactly "e D"
    Taste Dee – OK
    Taste don't – not OK

Regular Expressions
The . special single sub character
  Matches any single character
  e.g. /.eny/ matches Aeny Beny Ceny
The [ char-range ] defines a character class
The [^ char-range ] defines the not-in-character class
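
These three forms can be checked with grep, which counts matching lines with -c. A minimal sketch; the word list is invented for illustration.

```shell
words=$(printf '%s\n' Aeny Beny eny Cent)

# . needs exactly one character before "eny": Aeny and Beny match, eny does not.
dot=$(printf '%s\n' "$words" | grep -c '.eny')
# [AB] matches one character from the class.
class=$(printf '%s\n' "$words" | grep -c '[AB]eny')
# [^A] matches one character that is NOT in the class: Beny and Cent.
negclass=$(printf '%s\n' "$words" | grep -c '[^A]en')

printf '%s\n' "$dot" "$class" "$negclass"
```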

Regular Expressions
The * (asterisk)
  Matches 0 or more of the preceding character.
What's this?
  /.*/
  /[a-zA-Z]*/
  /([^)]*)/

Regular Expressions
The /^ (for the rabbit) character
  In the beginning …
The $/ (for the teacher) character
  At the end …
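
The asterisk and the two anchors can be tried with grep. A minimal sketch using basic regular expressions, where parentheses are ordinary characters; the sample lines are invented.

```shell
lines=$(printf '%s\n' 'call f(x) now' 'no parens' 'end f()')

# ( then zero or more characters that are not ), then ): matches (x) and ().
star=$(printf '%s\n' "$lines" | grep -c '([^)]*)')
# ^ anchors the match to the beginning of the record.
begin=$(printf '%s\n' "$lines" | grep -c '^no')
# $ anchors the match to the end of the record.
last=$(printf '%s\n' "$lines" | grep -c ')$')

printf '%s\n' "$star" "$begin" "$last"
```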

Regular Expressions
Quote the raven – backslash
  \. This yields .
  \\ This yields \
  \* This yields *
  \[ This yields [
  \] This yields ]
  \/ This yields /
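
The effect of quoting shows up when the same pattern is run with and without the backslash. A minimal sketch with grep and sed; the sample data is invented.

```shell
data=$(printf '%s\n' 'a.b' 'axb' 'a*b' 'aab')

# Unquoted . matches any single character, so all four records match.
any_char=$(printf '%s\n' "$data" | grep -c 'a.b')
# \. matches only a literal period.
literal_dot=$(printf '%s\n' "$data" | grep -c 'a\.b')
# \* matches only a literal asterisk.
literal_star=$(printf '%s\n' "$data" | grep -c 'a\*b')
# \/ lets a slash appear inside a /…/ delimited pattern.
slash=$(printf '%s\n' 'usr/bin' | sed 's/\//:/')

printf '%s\n' "$any_char" "$literal_dot" "$literal_star" "$slash"
```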

sed – the old Stream EDitor
sed [-n] [-f script-file] [file-list]
Copies and edits to standard output
Edits file(s) in a non-interactive mode
Gets its instructions from a script file
  -f filename contains sed instructions
  No -f option: the 1st command argument is used as the script
  -n suppress stdout unless specified

sed – the old mill stream
Record processing
1. Read record from file list
2. Read instruction from script (or cmd line)
3. Apply selection criteria
4. If selected perform instruction
   and repeat 2 through 4 until no more script
5. Repeat 1 through 5 until no more file list.

He sed what!!??
Instruction format
  [addr1 [,addr2]] inst [arg-list]
Address
  A line number
  Regular expression
  Addr1 – start
  Addr2 – stop

Address line numbers
$ Designates the last line of the last file
1st address line number
  Starts selecting records based on their position in the input file list relative to 1.
2nd address line number
  Stops selecting records when position in the input file list is greater than the line number.

He sed some more
Instructions
  ! – Not: negates the address selection (placed after the address)
    sed -n '/line/!p' file.list
  {…} – Groups the instructions for the address selection

sed Instructions
p – Print now and continue
d – Delete and get the next record
q – Quit processing; Stop; Go Away
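
The three instructions, together with ! and {…}, can be tried on a four-record input. A minimal sketch; the input and variable names are illustrative.

```shell
input=$(printf '%s\n' one two three four)

# -n suppresses the default output, so p prints only records 2 through 3.
printed=$(printf '%s\n' "$input" | sed -n '2,3p')
# d deletes every record that matches the regular expression /t/.
deleted=$(printf '%s\n' "$input" | sed '/t/d')
# q quits after record 2 has been written.
quit=$(printf '%s\n' "$input" | sed '2q')
# ! negates the address: print records that do NOT contain "o".
negated=$(printf '%s\n' "$input" | sed -n '/o/!p')
# {…} groups instructions for one address: print record 3, then quit.
grouped=$(printf '%s\n' "$input" | sed -n '3{p;q;}')

printf '%s\n' "$printed" "$deleted" "$quit" "$negated" "$grouped"
```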

sed Instructions
c – Change
  [addr1] [,addr2] c\
  yada yada yada
  all selected records are replaced as a group by the change value
a – Append
  [addr1] a\
  …
  add the text after the selected record

sed Instructions
i – Insert
  [addr1] i\
  …
  add the text before the selected record
n – Next
  [addr1] n
  writes the current, gets the next and continues the script
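
The a, i, c and n instructions can be compared side by side. A minimal sketch, written with the text on the line after the backslash, which is the form POSIX sed requires; the input and variable names are illustrative.

```shell
input=$(printf '%s\n' one two three)

# a\ adds the text after record 2.
appended=$(printf '%s\n' "$input" | sed '2a\
after two')

# i\ adds the text before record 2.
inserted=$(printf '%s\n' "$input" | sed '2i\
before two')

# c\ replaces records 1 through 2, as a group, with the text.
changed=$(printf '%s\n' "$input" | sed '1,2c\
replaced')

# n fetches the next record; with -n only the record fetched by n is printed.
skipped=$(printf '%s\n' "$input" | sed -n 'n;p')

printf '%s\n' "$appended" "$inserted" "$changed" "$skipped"
```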

sed Instructions
w – Write
  [addr1] [,addr2] w filename
  writes the selected records to a file
r – Read
  [addr1] r filename
  reads records from the filename and appends them to the selected record

sed Instructions
s – Substitute
  [addr1] [,addr2] s/ptrn/repl/[g] [p] [w f]
  for each selected record match the pattern and replace
  g – Replace all non-overlapping occurrences
  p – Print the record
  w – write the record to the filename

Hawk – Squawk – awk
The programmable utility that does everything.
Aho – Weinberger – Kernighan
Provides:
  Conditional execution
  Looping
Handles:
  Numeric & string variables
  Regular expressions
  C print facilities

awk
awk [-Fc] [-f program-file | program] [file-list]
  F – field delimiter character
  f – name of the awk program file
  program – in-stream instructions, used when -f is not given
  file-list – list of files to process

awk – program lines
pattern { action }
Like sed, pattern selects records
Record processing is the same as sed

awk – pattern
Patterns follow regular expression format.
  ~ Tests for match to regular expression
  !~ Tests for NO match to regular expression
  , – Establishes a pattern range; all records are processed inclusively within the range
  BEGIN executes before the first record is processed
  END executes after the last record is processed
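
All five pattern forms fit in one small program. A minimal sketch; the name and age records are invented for illustration.

```shell
data=$(printf '%s\n' 'ann 30' 'bob 25' 'cat 41' 'dan 19')

report=$(printf '%s\n' "$data" | awk '
BEGIN       { print "start" }            # runs before the first record
$1 ~ /^b/   { print "b-name: " $1 }      # field 1 matches the expression
$1 !~ /a/   { print "no a: " $1 }        # field 1 does not match
/cat/,/dan/ { n++ }                      # range: from cat through dan, inclusive
END         { print "in range: " n }     # runs after the last record
')

printf '%s\n' "$report"
```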

awk – relational operators
< – less than
<= – less than or equal to
== – equal to
!= – not equal to
>= – greater than or equal to
> – greater than

awk – operators
Arithmetic
  + – addition
  - – subtraction
  * – multiplication
  / – division
Assignment
  = – assigns the value to the variable on the left
  += – adds the value to the variable on the left

awk – boolean operators
&& – and
|| – or
! – not
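
The relational, arithmetic, assignment and boolean operators can be combined in patterns and actions. A minimal sketch; the name and age records are invented.

```shell
data=$(printf '%s\n' 'ann 30' 'bob 25' 'cat 41' 'dan 19')

# && : both comparisons must hold.
middle=$(printf '%s\n' "$data" | awk '$2 >= 25 && $2 < 40 { print $1 }')
# || : either comparison may hold.
edges=$(printf '%s\n' "$data" | awk '$2 < 20 || $2 > 40 { print $1 }')
# ! negates a condition.
notbob=$(printf '%s\n' "$data" | awk '!($1 == "bob") { print $1 }')
# += accumulates a sum; / divides it by the record count NR in END.
average=$(printf '%s\n' "$data" | awk '{ sum += $2 } END { print sum / NR }')

printf '%s\n' "$middle" "$edges" "$notbob" "$average"
```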

awk – actions
# – Comment to the right on any line
Default action is print to stdout
Multiple actions can be taken
  Use {…} to enclose multiple actions
  Separate actions with ;

awk – actions
print variable …
  Var, Var2, Var3
    Prints variables separated by the output field separator (OFS)
  Var Var2 Var3
    NO separators
  "literal value"
    Prints exactly everything between the " "

awk – actions
printf "control string", variable …
Control String
  \n – new line
  \t – tab
  %[-][n][.d] conv char
    - left justification
    n number of characters
    .d decimal positions

awk – actions
%[-][n][.d] conv char
  - left justification
  n number of characters
  .d decimal positions
  conv char – conversion character
    d – decimal, e – exponent, f – floating-point
    o – octal, x – hexadecimal, s – string
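
The difference between the print forms and the printf control string can be seen in a few one-liners. A minimal sketch; the input strings are invented.

```shell
# A comma between variables inserts the output field separator (a space by default).
commas=$(echo 'a b c' | awk '{ print $1, $3 }')
# No comma: the values are concatenated with no separator.
concat=$(echo 'a b c' | awk '{ print $1 $3 }')
# Changing OFS changes what the comma inserts.
ofs=$(echo 'a b c' | awk 'BEGIN { OFS = "-" } { print $1, $3 }')
# %-4s  left-justified string, 4 characters wide
# %8.2f floating-point, 8 characters wide, 2 decimal positions
# %x    hexadecimal
fmt=$(echo 'pi 3.14159' | awk '{ printf "%-4s|%8.2f|%x\n", $1, $2, 255 }')

printf '%s\n' "$commas" "$concat" "$ofs" "$fmt"
```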

awk – variables
awk provided variables
  NF – total number of fields in the current record
  $1…$n – each field in the current record
  FS – input field separator (default space or tab)
  OFS – output field separator (default space)

awk – variables
awk provided variables
  NR – current record number
  $0 – entire current record
  RS – record separator (default newline)
  ORS – output record separator (default newline)
  FILENAME – name of current input file
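
Most of the provided variables can be shown with one small input file. A minimal sketch; the file name users and its colon-separated contents are invented for illustration.

```shell
dir=$(mktemp -d)
printf '%s\n' 'root:x:0' 'daemon:x:1' > "$dir/users"

# FS splits each record on ':', OFS separates the printed values,
# NR is the record number, NF the field count, $NF the last field.
summary=$(cd "$dir" && awk '
BEGIN { FS = ":"; OFS = " | " }
      { print NR, NF, $1, $NF }
END   { print FILENAME ": " NR " records" }
' users)

printf '%s\n' "$summary"
rm -rf "$dir"
```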

awk - variables
Associative Arrays
  array_name[string]
    The array name should be meaningful
    The index of the array is a string
    Elements are automatically created
  for ( element in array ) actions
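
A common use of an associative array is counting how often each value occurs. A minimal sketch; the word list and the array name seen are invented, and the output is piped through sort because the order of a for-in loop is not defined.

```shell
counts=$(printf '%s\n' apple pear apple fig pear apple | awk '
  { seen[$1]++ }                                   # element is created on first use
  END { for (word in seen) print word, seen[word] }
' | sort)

printf '%s\n' "$counts"
```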

awk - functions
length(string) – returns the number of characters in string
int(num) – returns the integer portion of num
index(str1,str2) – returns the position of str2 in str1, or 0 if not present
split(str,arr,del) – populates arr[ ] from fields in str delimited by del; returns the count of elements.

awk - functions
sprintf(fmt, args) – formats args using the fmt and returns the formatted string.
substr(str, pos, len) – returns a substring of str starting with position pos for a length of len.
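
The functions can all be exercised from a BEGIN block, which needs no input file. A minimal sketch; the sample string and variable names are illustrative.

```shell
result=$(awk 'BEGIN {
  s = "file manipulation"
  print length(s)                    # 17 characters
  print int(7.9)                     # 7
  print index(s, "man")              # 6, positions start at 1
  n = split("a:b:c", parts, ":")
  print n, parts[1], parts[3]        # 3 a c
  print substr(s, 6, 3)              # man
  print sprintf("%03d-%s", 7, "x")   # 007-x
}')

printf '%s\n' "$result"
```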