
Introduction to Shell scripting

Presented by: Shailender Nagpal, Al Ritacco

Research Computing, UMASS Medical School


AGENDA

• Shell basics: Scalars, Arrays, Expressions, Printing
• Built-in commands, Blocks, Branching, Loops
• String and Array operations
• File operations: Text processing utilities SED, AWK
• Writing custom functions
• Providing input to programs
• Shell scripting strategies
• Using Linux scripts with the LSF cluster


What is Shell scripting?

• Series of Linux commands in a text file that can be executed on a Linux shell in top-down fashion

• The Linux shell provides a high-level, general-purpose, interpreted, interactive programming environment

• Simple iterative, top-down, left-to-right programming style for users to create small and largish programs
    – Mainly for automating Linux tasks, but also for writing integrated workflows


Features of Shell scripting

• Linux shell code runs on the Linux operating system only
• Easy to use, and lots of resources are available
• Procedural programming, not strongly "typed"
• Programming syntax similar to other languages
    – if, for, do, functions, etc.
• Provides limited methods to manipulate data
    – scalars, arrays

• Statements don’t need to be terminated by semi-colon (but can be)


Advantages of Shell scripting

• Not as fully featured as C, Java, Perl or Python, but still very useful for automation, file processing and workflow development, which makes it advantageous in certain applications such as bioinformatics
    – Fewer lines of code than C or Java; similar to Perl and Python
    – No compilation necessary. Prototype and run!
    – Run every line of code interactively
    – Vast command library
    – Save coding time and automate computing tasks
    – Code is often even more concise than Perl and Python


Types of linux "shells"

• Shells provide a user interface (command prompt) to the underlying unix operating system

• They give users an environment to execute commands upon login

• Many shells are available, which are mostly the same, but with some minor differences
    – Bourne shell (sh), C shell (csh), TC shell (tcsh), Korn shell (ksh), Bourne Again Shell (bash)

• Which "shell" are you using?
    echo $SHELL


Shell features

FEATURES               Bourne   C     TC    Korn   BASH
Command history        No       Yes   Yes   Yes    Yes
Command alias          No       Yes   Yes   Yes    Yes
Shell scripts          Yes      Yes   Yes   Yes    Yes
Filename completion    No       Yes   Yes   Yes    Yes
Command line editing   No       No    Yes   Yes    Yes
Job control            No       Yes   Yes   Yes    Yes


First Shell program

• The obligatory "Hello World" program
    #!/usr/bin/bash
    # Comment: 1st program: variable, echo
    name="World"
    echo "Hello $name"
    echo "Hello ${name}"

• Save with a ".sh" extension, then at the Linux shell:
    chmod 755 hello.sh    # Make it executable
    ./hello.sh

• Output:
    Hello World
    Hello World


Understanding the code

• The first line of a Shell script requires an interpreter location, which is the path to the "bash" shell

#!/path/to/bash

• 2nd line: A comment, beginning with "#"
• 3rd line: Declaration of a string variable
• 4th, 5th lines: Echoing some text to the shell with a variable, whose value is interpolated by the "$" sign
• The quotes are not echoed, and "$name" is replaced by "World" in the output.


Second program

• Report summary statistics of a DNA sequence
    #!/usr/bin/bash
    dna="ATAGCAGATAGCAGACGACGAGA"
    dna_length=`echo -n $dna | wc -m`    # -n: do not count a trailing newline
    echo "Length of DNA is $dna_length"
    echo "Number of A bases is" `echo $dna | grep -o "A" | wc -l`
    echo "Number of C bases is" `echo $dna | grep -o "C" | wc -l`
    echo "Number of G bases is" `echo $dna | grep -o "G" | wc -l`
    echo "Number of T bases is" `echo $dna | grep -o "T" | wc -l`
    echo "Number of GC dinucleotides is" `echo $dna | grep -o "GC" | wc -l`
    gc=$((`echo $dna | grep -o "G" | wc -l`+`echo $dna | grep -o "C" | wc -l`))
    gc_per=`echo "$gc/$dna_length*100" | bc -l`
    printf "G+C percent content is %2.1f\n" $gc_per

• Quick summary, re-use code to find motifs, RE sites, etc


Linux Commands

    ls       cp        rm       mv            cd
    mkdir    pwd       rmdir    cat           head
    tail     clear     vi       passwd        less
    more     history   export   alias         function
    date     who       whoami   last          exit
    wc       grep      man      sort          uniq
    gzip     tar       file     ssh           rsh
    scp      rsync     ftp      echo          touch
    file     cut       tee      dos2unix      ps
    bg       fg        wait     top           time
    who      df        du       screen        last
    chmod    chown     chgrp    grep/egrep    sed
    awk      test      expr     csplit        diff


Linux Commands (…contd)

    find       locate    finger    history   host
    hostname   jobs      join      kill      ln
    mail       make      mount     umount    nl
    nohup      passwd    ps        pstree    nice
    renice     rlogin    rsh       set       setenv
    tee        test      top       tr        unalias
    uname      untar     unless    unzip     uptime
    vmstat     wget      which     while     xargs
    zip        env       su        sudo      emacs
    pico       nano      bzip2     sleep     disown
    source     exec      bash      umask     paste
    svn        free      banner    fgrep     crontab


Linux Commands (…contd)

    if     then   else   elif    fi
    for    do     done   while   case
    esac


Application commands

    allegro     bedtools    blast
    bowtie      bwa         clustalW
    crossbow    cufflinks   fasta
    fastx       maq         maqview
    mfold       plink       polyphen
    primer3     prinseq     samtools
    snpEff      sratools    tophat
    vcftools    vmd         namd

• To run the "blast" command, for example, run this:
    blastall -p blastn -d nr -i query.fa


Shell comments

• Use the "#" character at the beginning of a line to add comments to your code
• Helps you and others to understand your thought process
• Let's say you intend to sum up an array of numbers
    # (sum from 1 to 100 of X)
• The code would look like this:
    sum=0                        # Initialize variable called "sum" to 0
    for i in $(seq 1 100); do    # Use "for" loop to iterate over 1 to 100
        sum=$(( $sum + $i ))     # Add i to the previous sum
    done
    echo "The sum of 1..100 is $sum"    # Report the result


Shell script: Variables

• Variables – Provide a location to "store" data we are interested in

• Strings, decimals, integers, characters, arrays, …
    – What is a character? A single letter or number
    – What is a string? An array of characters
    – What is floating point? A number such as 4.7 (sometimes referred to as a real if there is a decimal point)

• Variables can be assigned or changed easily within a Shell script


Variables and built-in keywords

• Variable names should represent or describe the data they contain
    – Do not use meta-characters; stick to letters, digits and underscores. Begin the variable name with a letter

• Shell scripting as a language has keywords that should not be used as variable names. They are reserved for writing syntax and logical flow of the program
    – Examples include: if, then, fi, for, while, do, done, case, function, etc.


Special shell variables

• Shell built-in variables available:
    $#    Number of command line arguments
    $*    All arguments; inside double quotes ("$*") they are joined into a single string
    $@    All arguments; inside double quotes ("$@") each argument stays a separate word
    $$    Process ID of the current shell (the running script)
    $!    Process ID of the last program put into the background
    $?    Exit code of the last command executed
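A short sketch of the special variables in use may help. The function name "show_args" and its arguments are hypothetical, chosen only for illustration; bash is assumed.

```shell
# Hypothetical helper that reports how it was called.
show_args() {
    echo "Number of arguments: $#"
    for arg in "$@"; do          # "$@" keeps each argument as a separate word
        echo "  arg: $arg"
    done
    echo "All as one string: $*"
}

show_args "first arg" second

true &                           # start a (trivial) background job
bg_pid=$!                        # process ID of that background job
wait "$bg_pid"                   # wait for it to finish
bg_status=$?                     # exit code reported by "wait"
echo "Shell PID: $$, background PID: $bg_pid, exit code: $bg_status"

# Count the per-argument lines printed for three arguments.
count=$(show_args a "b c" d | grep -c "^  arg:")
echo "Per-argument lines: $count"
```

Note that "b c" is counted as one argument because it was quoted on the command line and expanded with "$@".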


Shell "Environment" variables

• Try out the commands
    env
    printenv

• Variables that control the behavior of the shell are called Environment Variables

• An important variable is the "PATH" variable, which lists the directories that are searched, in order, for a command to execute

• Try:
    which man


Variables, Arrays

• Variables that hold single strings are string variables
• Variables that hold single numbers are numeric variables (the shell stores every value as a string and does only integer arithmetic itself)
    rank=3
    score=5.3
    dna="ATAGGATAGCGA"

• Collections of variables are called arrays; an array could hold the students in a class, scores from a test, etc.
    students=("Alan" "Shailender" "Chris")
    scores=(89.1 65.9 92.4)
    binding_pos=(9439984 114028942)


Printing text and variables

• Single quotes do not process delimiters or variables and are therefore generally not used
• Double quotes process variables prefixed with the "$" sign. Delimiters (escape sequences such as \t and \n) are not processed with "echo"
    Ex:
    x=1
    echo "This \t is a test\nwith text $x"

    Output:
    This \t is a test\nwith text 1

• To process delimiters, use "printf":
    printf "This is a \t tab"
    printf "This is a \t tab %s %s" $x $x
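The quoting rules above can be checked with a small sketch (bash assumed). The variable names are illustrative only. The last line uses "echo -e", a bash option that also interprets escape sequences.

```shell
x=1
single='Value is $x\n'                  # single quotes: nothing is expanded
double="Value is $x"                    # double quotes: $x is expanded
formatted=$(printf "a\tb %s\n" "$x")    # printf interprets \t and \n

echo "$single"
echo "$double"
echo "$formatted"
echo -e "This \t is a test\nwith text $x"
```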


Printing arrays

• Array variables can be echoed whole with a default delimiter (a space); another way to echo arrays is to put them in a loop and echo the elements as scalars
    students=("Alan" "Shailender" "Chris")
    echo "students\n"    # Does not work!
    printf "%s %s %s\n" ${students[@]}    # Method 1
    printf "%s %s %s\n" ${students[0]} ${students[1]} ${students[2]}    # Method 2

• If you run this as a program, the two printf lines give this output:
    Alan Shailender Chris    # Method 1
    Alan Shailender Chris    # Method 2


Math Operators and Expressions

• Math operators
    – Eg: echo $((3 + 2))
    – + is the operator
    – We read this left to right
    – Basic operators such as + - / * and ** (exponentiation)
    – Variables can be used
    echo "Sum of 2 and 3 is " $((2+3))
    x=3    # No spaces around "=" in an assignment
    echo "Sum of 2 and x is " $((2+$x))

• PEMDAS rules are followed to build mathematical expressions. Floating point operations are not supported inside $(( ))
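As a quick illustration of integer arithmetic in bash, the sketch below evaluates a few expressions. The variable names are arbitrary. Note that "^" is bitwise XOR in bash arithmetic, not a power operator.

```shell
x=3
sum=$((2 + x))           # inside $(( )) the "$" before a variable name is optional
power=$((2 ** 10))       # ** is exponentiation
xor=$((6 ^ 3))           # ^ is bitwise XOR, not a power operator
quotient=$((7 / 2))      # integer division truncates the result
remainder=$((7 % 2))     # % gives the remainder
grouped=$(((2 + 3) * 4)) # parentheses are evaluated first

echo "$sum $power $xor $quotient $remainder $grouped"
```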


Mathematical operations

• Another way to do integer arithmetic
    let "x=3"
    let "y=5"
    let "z=y+x"
    echo $z
    let "x=x*z"
    let "y++"
    echo $x
    echo $y

• Yet another way
    z=`expr $x + 4`    # Spaces required between operands and operator


Floating point arithmetic

• Many ways to do this. If "bc" is available, print an expression and send it to the "bc" calculator
    x=1.5
    y=2.9
    echo "$x/$y" | bc -l

• One can also use the "awk" program
    echo `awk 'BEGIN {print 5/3}'`
    z=`awk 'BEGIN { x = 1.5; y = 2.9; printf("%2.1f", y/x) }'`
    echo $z


Creating Arrays

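• Integer sequences can be created using the command "seq", which takes a start value, an optional increment and an end value, in that order
    seq 1 2 10
    echo $(seq 1 2 10)

• Wrap the command substitution in parentheses to store the sequence in an array
    odd=($(seq 1 2 10))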


Array Indexing

• Arrays can be indexed by number to retrieve individual elements
• Indexes have range 0 to (n-1), where 0 is the index of the first element and n-1 is the last item's index
    nucleotides=("adenine" "cytosine" "guanine" "thymine" "uracil")
    echo ${nucleotides[3]}    # is equal to thymine
    echo ${nucleotides[4]}    # is equal to what?

• Any element of an array can be re-assigned
    nucleotides[4]="Uracil"
    echo ${nucleotides[@]}    # @ represents all elements


Array Operations

• Consider an array
    data=(10 20)

• To get the number of items in the array
    echo ${#data[@]}

• To add items to the end of the array
    data=(${data[@]} 30 40); echo ${data[@]}

• To get the string length of a particular item in the array
    echo ${#data[3]}


Array Operations (…contd)

• To extract a slice of items in the array
    echo ${data[@]:2:2}

• To find and replace within the items of the array
    echo ${data[@]/0/5}

• To remove an item at a given position
    unset data[3]; echo ${data[@]}

• To remove items based on a pattern
    echo ${data[@]/2*/}
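The array operations from the last two slides can be strung together as below. This is a sketch assuming bash; the "+=" append form and the quoted expansions are alternatives to the unquoted forms shown on the slides, and the variable names are illustrative.

```shell
data=(10 20)
data+=(30 40)                    # append items to the end of the array
count=${#data[@]}                # number of items
slice=("${data[@]:1:2}")         # two items starting at index 1
replaced=("${data[@]/0/5}")      # replace the first "0" in every item
unset 'data[3]'                  # remove the item at index 3
odds=($(seq 1 2 9))              # fill an array from command output

echo "count=$count"
echo "slice=${slice[*]}"
echo "replaced=${replaced[*]}"
echo "data=${data[*]}"
echo "odds=${odds[*]}"
```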


String Operations: Split

• Shell script provides excellent features for handling strings contained in variables

• Bash has no built-in string "split" function (the Linux "split" command breaks up files), but a string can be broken apart on a delimiter with a utility such as "tr"

• For example, to extract the words in a sentence, we translate the space delimiter into newlines to capture the words
    x="This is a sentence"
    echo $x | tr " " "\n"
    for word in `echo $x | tr " " "\n"`; do
        echo $word
    done
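Another way to split a string, sketched here under the assumption that bash is the shell, is to let "read -a" fill an array, with the IFS variable naming the delimiter. The sample record is made up for illustration.

```shell
x="This is a sentence"
read -r -a words <<< "$x"                  # split on whitespace (the default IFS)
echo "Number of words: ${#words[@]}"
echo "Second word: ${words[1]}"

record="Smith:Al:18"                       # hypothetical colon-delimited record
IFS=":" read -r -a fields <<< "$record"    # IFS is changed only for this one command
echo "First name: ${fields[1]}"
```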


String Operations

• Two quoted strings placed next to one another with no space between them are concatenated automatically
    echo "Hello ""world"
    Hello world

    words=`echo "Hello ""world"`
    echo $words
    Hello world


String Sub-scripting

• Once a string is created, it can be subscripted using its indices, which begin with 0
    word="Programming"
    echo ${word:0}      # "Programming"
    echo ${word:3}      # "gramming"
    echo ${word:0:3}    # "Pro"

• Slices of shell strings cannot be assigned, e.g.
    ${word:0:1}="D"    # This won't work


String Commands

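• Some examples
    dna="ATAGACGACGACGTCAGAGACGA"

• Length of DNA is
    echo "Length is" ${#dna}

• Find the position (counting from 1) of the first character that matches any character in a set
    echo `expr index "$dna" GA`

• Extract a substring (start position counting from 1, then length)
    echo `expr substr $dna 1 2`

• Convert to lowercase (swap the two sets to convert to uppercase)
    echo $dna | tr 'A-Z' 'a-z'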


String Functions (…contd)

• Delete a pattern within a string
    echo ${dna#A*A}     # Delete shortest match from front
    echo ${dna##A*A}    # Delete longest match from front

• Find and replace a string
    echo ${dna//AT/GGGGG}    # Replace all occurrences
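To see what these expansions return, the sketch below applies them to the same DNA string as the previous slide and stores each result in a variable (the variable names are illustrative only).

```shell
dna="ATAGACGACGACGTCAGAGACGA"

length=${#dna}                     # number of characters
front_short=${dna#A*A}             # shortest leading match of A*A removed ("ATA")
front_long=${dna##A*A}             # longest leading match removed (the whole string)
replaced=${dna//AT/GGGGG}          # every "AT" replaced
lower=$(echo "$dna" | tr 'A-Z' 'a-z')

echo "length:      $length"
echo "front_short: $front_short"
echo "front_long:  '$front_long'"
echo "replaced:    $replaced"
echo "lower:       $lower"
```

Because the string both starts and ends with "A", the longest match of A*A covers all of it and nothing is left.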


File Processing operations

• There are many commands in Linux that operate directly on files, without having to open them and save data in arrays, etc.

• This can be a big advantage over Perl and Python
    sort   cut    uniq   awk   sed   tr   split
    head   tail


File Processing operations: AWK

• Consider the CSV file "gene_data.txt"
    awk -F "," '$2>1000' gene_data.txt
    awk -F "," '$2>50 && $4<50' gene_data.txt
    awk -F "," '$2>50 && $4<50 {print $1}' gene_data.txt
    awk -F "," '$2>50 && $4<50 {printf("%s\t%s\t%s\n", $1,$3,$5)}' gene_data.txt
    awk -F "," '$2>50 && $4<50 {printf("%s\t%f\n", $1,$4-$2)}' gene_data.txt
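The layout of "gene_data.txt" is not shown in the slides, so the sketch below builds a small stand-in file with made-up columns (name, value, chromosome, value, strand) to show how the filters behave. Only the awk syntax is the point; the data is hypothetical.

```shell
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
geneA,1200,chr1,30,+
geneB,40,chr2,80,-
geneC,75,chr1,20,+
EOF

# Rows whose 2nd field exceeds 1000: print the 1st field.
high=$(awk -F "," '$2 > 1000 {print $1}' "$tmp")

# Rows satisfying both conditions: print the 1st field.
both=$(awk -F "," '$2 > 50 && $4 < 50 {print $1}' "$tmp")

# Arithmetic on fields: 4th minus 2nd field for one named row.
diff=$(awk -F "," '$1 == "geneC" {printf("%.1f", $4 - $2)}' "$tmp")

echo "high: $high"
echo "both: $both"
echo "diff: $diff"
rm -f "$tmp"
```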


File Processing operations: SED

• SED is a text stream editor that operates on files as well as standard input. Its main function is to find patterns and act on them: delete or replace text

• Here are some simple examples of using SED
    – Delete lines containing a pattern from a file
    sed '/^>/d' sequence.fa       # Result in STDOUT
    sed -i '/^>/d' sequence.fa    # In-place

    – Replace a text pattern with other text
    sed 's/T/U/g' sequence.fa
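A small worked example may make the two operations concrete. The FASTA content below is invented for illustration; the file is a temporary stand-in for "sequence.fa" and is left unmodified because "-i" is not used.

```shell
fasta=$(mktemp)
printf '>seq1 example\nATTG\nCATA\n' > "$fasta"

# Delete the header line, then join the sequence lines into one string.
bases=$(sed '/^>/d' "$fasta" | tr -d '\n')

# Two sed commands in one call: delete headers, then replace every T with U.
rna=$(sed -e '/^>/d' -e 's/T/U/g' "$fasta" | tr -d '\n')

echo "DNA: $bases"
echo "RNA: $rna"
rm -f "$fasta"
```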


File Processing operations: CUT

• Dissect the "gene_info.txt" file in a few ways
    – Extract the 2nd column from the file (each line)
    cut -f 2 -d "," gene_info.txt

    – Extract the 1st and 4th columns from the file (each line)
    cut -f 1,4 -d "," --output-delimiter=" " gene_info.txt

    – Extract the 10th character in each line
    cut -c 10 gene_info.txt

    – Extract the 10th to 12th characters in each line
    cut -c 10-12 gene_info.txt

    – Extract the 3rd and 13th characters in each line
    cut -c 3,13 gene_info.txt


File Processing operations: SORT

• Sort the "gene_data.txt" file in different ways
    – 1st column, dictionary order. Delimiter is ","
    sort -k 1,1 -d -t "," gene_data.txt

    – 2nd column, numerical increasing order. Delimiter is ","
    sort -k 2,2 -n -t "," gene_data.txt

    – 4th column, numerical decreasing order. Delimiter is ","
    sort -k 4,4 -nr -t "," gene_data.txt


File Processing operations: UNIQ

• The "uniq" command finds consecutive lines in files or STDIN that are the same and merges them for display

• The best use of the command is with delimited files where a particular field is "cut" out and sorted

• Which unique chromosomes are represented in the file "gene_info.txt", and how many are there?
    cat gene_info.txt
    cat gene_info.txt | cut -d "," -f 3 | sort | uniq
    cat gene_info.txt | cut -d "," -f 3 | sort | uniq | wc -l    # Count them
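The cut, sort and uniq pipeline can be tried on a toy file. The contents below are made up, and the assumption that the chromosome sits in the 3rd comma-separated column follows the command on the slide. "uniq -c" additionally prefixes each line with its count.

```shell
info=$(mktemp)
cat > "$info" <<'EOF'
geneA,100,chr1
geneB,200,chr2
geneC,300,chr1
geneD,400,chrX
EOF

# Number of distinct chromosomes (tr strips padding that some wc versions add).
n_chrom=$(cut -d "," -f 3 "$info" | sort | uniq | wc -l | tr -d ' ')

# Chromosome with the most genes: count, sort by count descending, take the top line.
top_chrom=$(cut -d "," -f 3 "$info" | sort | uniq -c | sort -k 1,1nr | head -n 1 | awk '{print $2}')

echo "Distinct chromosomes: $n_chrom"
echo "Most frequent: $top_chrom"
rm -f "$info"
```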


File Processing operations: TR

• "tr" translates, squeezes, and/or deletes characters from standard input, writing to standard output
    – In a string, delete all spaces
    echo "Sam Smith" | tr -d ' '

    – In a string, replace spaces with tabs (-s squeezes repeated tabs into one)
    echo "Sam Smith" | tr -s ' ' '\t'

    – In a FASTA file, concatenate all DNA into one string
    sed '/^>/d' sequence.fa | tr -d '\n'


File Processing operations: SPLIT

• Let's say you want to break a FASTQ file into pieces so you can align each piece separately in parallel. How would you split the file?
    – One approach is to count the reads and split into pieces of "m" reads each
    – Another is to divide into "n" pieces of roughly equal size, which may corrupt the FASTQ records
    – Shown below (a FASTQ read normally takes 4 lines, so the line count given to "split" should be a multiple of 4):
    nlines=`wc -l reads.fq | cut -f 1 -d " "`
    echo $nlines/100 | bc
    split -l 132000 -a 3 -d reads.fq
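One way to keep records intact, sketched below, is to compute the piece size in reads and multiply by 4 before calling "split". The toy file, the piece count and the "reads_part_" prefix are assumptions made for illustration; a 4-line-per-read FASTQ layout and a "split" that supports "-d" (as used on the slide) are assumed.

```shell
workdir=$(mktemp -d)

# Build a toy FASTQ file with 10 reads (40 lines); real data would already exist.
for i in $(seq 1 10); do
    printf '@read%s\nACGT\n+\nIIII\n' "$i"
done > "$workdir/reads.fq"

pieces=3
nlines=$(wc -l < "$workdir/reads.fq")
nreads=$((nlines / 4))
reads_per_piece=$(( (nreads + pieces - 1) / pieces ))    # round up

# Lines per piece is a multiple of 4, so no read is cut in half.
split -l $((reads_per_piece * 4)) -a 3 -d "$workdir/reads.fq" "$workdir/reads_part_"

part_count=$(ls "$workdir"/reads_part_* | wc -l)
last_lines=$(wc -l < "$workdir/reads_part_002")
echo "reads: $nreads, pieces written: $part_count, lines in last piece: $last_lines"
rm -rf "$workdir"
```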


File Processing operations: CAT

• With the "cat" command, many file operations can be accomplished
    – Words of a file can be loaded into an array (the parentheses create the array; the text is split on whitespace, not on line ends)
    lines=(`cat filename.txt`)
    echo ${lines[@]}

    – Files can be sent to STDOUT and piped into other commands
    cat filename.txt | wc -l

    – Files can be appended to other files with the ">>" re-direction operator
    cat filename.txt >> filename2.txt
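To get one array element per line rather than per word, bash version 4 or later provides the "mapfile" builtin. The sketch below contrasts the two approaches on a temporary file whose contents are invented for illustration.

```shell
tmp=$(mktemp)
printf 'first line\nsecond line\nthird line\n' > "$tmp"

words=($(cat "$tmp"))            # split on whitespace: one element per word
mapfile -t lines < "$tmp"        # bash 4+: one element per line, newlines stripped

echo "${#words[@]} words, ${#lines[@]} lines"
echo "Second line: ${lines[1]}"
rm -f "$tmp"
```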


Command blocks in Shell script

• A group of statements surrounded by braces {}?
    – No! Shell control structures are not delimited by curly braces (braces appear around function bodies)
    – Shell script blocks begin with "then", "elif", "else", "do" and "case" statements and end in "fi", "done" and "esac" statements

• A block groups the statements and commands of one branch or loop (it does not create a new variable scope)
• Ex:
    if (( $x>1 )); then
        echo "Test"
        echo "x is greater than 1"
    fi


Conditional operations with "if-then-else"

• If-then-else syntax allows programmers to introduce logic in their programs

• Blocks of code can be branched to execute only when certain conditions are met
    if [ condition1 is true ]; then
        <statements if condition1 is true>
    else
        <statements if condition1 is false>
    fi

• Nested if statements are possible
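A concrete instance of the template, including an "elif" branch, might look like this. The function name "classify" is hypothetical and serves only to make the branches easy to call.

```shell
# Hypothetical helper: report the sign of an integer.
classify() {
    local n=$1
    if (( n < 0 )); then
        echo "negative"
    elif (( n == 0 )); then
        echo "zero"
    else
        echo "positive"
    fi
}

classify -5
classify 0
classify 12
```

Note that "else" is followed directly by its statements, with no semi-colon after the keyword.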


Conditions/Tests

• Linux supports many kinds of "tests" that result in a T/F value, which can be used in an if-then-else statement
    if [ -f file.txt ]; then echo "File exists"; \
    else echo "Does not exist"; fi
    if [ -d dirname ]; then echo "Directory exists"; \
    else echo "Does not exist"; fi
    if [ "string" = "$string" ]; then echo "Identical strings"; \
    else echo "Not same"; fi
    if [ "string" != "$string" ]; then echo "Not identical strings"; \
    else echo "Same"; fi
    if [ -n "$string" ]; then echo "String not empty"; \
    else echo "Empty string"; fi


Conditions/Tests (..contd)

    if [ INTEGER1 -eq INTEGER2 ]; then echo ""; else echo ""; fi    # equal
    if [ INTEGER1 -ge INTEGER2 ]; then echo ""; else echo ""; fi    # greater than or equal
    if [ INTEGER1 -gt INTEGER2 ]; then echo ""; else echo ""; fi    # greater than
    if [ INTEGER1 -le INTEGER2 ]; then echo ""; else echo ""; fi    # less than or equal
    if [ INTEGER1 -lt INTEGER2 ]; then echo ""; else echo ""; fi    # less than
    if (( $num <= 5 )); then echo "Number is less than or equal to 5"; fi

• Double square bracket syntax is also used. (When?)


Rules of conditional statements

• Always keep spaces between the brackets and the actual check/comparison

• Always terminate the condition with ";" (or a newline) before putting a new keyword like "then", since the test is itself a shell command

• Quote string variables if you use them in conditions
• You can invert a condition by putting an "!" in front of it
• You can combine conditions by using "-a" for "and" and "-o" for "or"


Flow Control: "For" loop

• "For" loops allow users to repeat a set of statements a pre-set number of times.
    STAGE=$(seq 1 10)
    for i in ${STAGE}; do
        echo "Stage $i"
    done

• The "in" syntax accepts any list of words, including the output of other commands (split on whitespace)
    for file in `ls`; do
        echo $file
    done
    for line in `cat gene_info.txt`; do    # One iteration per word, so lines containing spaces are split
        echo $line
    done
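Because command output is split on whitespace, file names containing spaces are broken apart when looping over "ls". The sketch below, assuming bash, compares that with a wildcard (glob) loop and also shows the C-style "for" form; the temporary directory and file names are made up.

```shell
dir=$(mktemp -d)
touch "$dir/a.txt" "$dir/b c.txt"      # the second name contains a space

glob_count=0
for file in "$dir"/*.txt; do           # glob: one iteration per file
    glob_count=$((glob_count + 1))
done

ls_count=0
for file in $(ls "$dir"); do           # command output: one iteration per word
    ls_count=$((ls_count + 1))
done

total=0
for (( i = 1; i <= 5; i++ )); do       # C-style loop with a counter
    total=$((total + i))
done

echo "glob: $glob_count, ls words: $ls_count, total: $total"
rm -rf "$dir"
```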


Iterating over arrays with "while"

• Example:
    nucleotides=("adenine" "cytosine" "guanine" "thymine" "uracil")
    i=0
    while [ $i -lt ${#nucleotides[@]} ]; do
        printf "Nucleotide is: %s\n" ${nucleotides[i]}
        i=$(($i+1))
    done

• Output:
    Nucleotide is: adenine
    Nucleotide is: cytosine
    Nucleotide is: guanine
    Nucleotide is: thymine
    Nucleotide is: uracil


Switch-case

• Case statements allow for branching to be performed on code blocks based on different values a variable takes

• Like an if-then-else statement, except instead of condition, the syntax checks for values of variable

    x=1
    case $x in
        "1") echo 1 ;;
        "2") echo 2 ;;
        *) echo "none" ;;
    esac


Shell script File access

• What is file access?
    – Set of Shell script commands/syntax to work with data files

• Why do we need it?
    – Makes reading data from files easy; we can also create new data files

• What different types are there?
    – Read, write, append


File I/O

• Low-level file I/O is usually not performed in Linux
    – Abundance of file manipulation tools/commands

• If needed though, ASCII/text files can easily be read line by line using shell script.
    file="sequence.fa"
    while read line; do
        # display $line or do something with $line
        echo "$line"
    done < "$file"


File read and write example

• Input file "mailing_list":
    Last name:First name:Age:Address:Apartment:City:State:ZIP
    Smith:Al:18:123 Apple St.:Apt. #1:Cambridge:MA:02139

• Code (each line is split on ":" into the array "fields"; the header line is skipped):
    file="mailing_list"
    tail -n +2 "$file" | while IFS=":" read -r -a fields; do
        printf "%s %s\n" "${fields[1]}" "${fields[0]}"
        printf "%s, %s\n" "${fields[3]}" "${fields[4]}"
        printf "%s, %s %s\n" "${fields[5]}" "${fields[6]}" "${fields[7]}"
    done

• Output:
    Al Smith
    123 Apple St., Apt. #1
    Cambridge, MA 02139


Functions

• What is a function?
    – Group related statements into a single task
    – Segment code into logical blocks
    – Avoid code and variable based collision
    – Can be "called" by segments of other code

• Functions return an exit status (an integer from 0 to 255)
    – Explicitly with the "return" command
    – Implicitly as the exit status of the last executed command

• Data such as a scalar or a flat list is passed back by echoing it and capturing the output with command substitution


Functions

• A function can be written in any Shell script program; it is identified by the "function" keyword

• Writing a function
    function echostars {
        echo "***********************"
    }
    function exitIfError {
        if [[ $1 -ne 0 ]]; then
            echo "ERROR! - return code $1"
            exit 1
        fi
    }
    echostars; exitIfError $?


Functions with Inputs and Outputs

• The "echo" statement can be used to return some output from the function
    function fib2 {
        result=()
        a=0; b=1
        while [ $b -lt $1 ]; do
            result=(${result[@]} $b)
            next=$(($a+$b))    # Compute the next term before overwriting a
            a=$b
            b=$next
        done
        echo ${result[@]}
    }

• The function can then be called
    source fib2.sh; fib2 100
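The two ways a function hands something back, echoed output and exit status, can be sketched as follows. Both helper names ("gc_count" and "is_dna") are hypothetical and bash is assumed.

```shell
# Hypothetical helper: echoes the number of G and C bases in a sequence.
gc_count() {
    local seq=$1
    local only_gc=${seq//[^GC]/}     # delete every character that is not G or C
    echo "${#only_gc}"
}

# Hypothetical helper: exit status 0 if the string contains only A, C, G, T.
is_dna() {
    [[ $1 =~ ^[ACGT]+$ ]]
}

n=$(gc_count "ATAGCAGATAGCAGACGACGAGA")     # capture the echoed output
if is_dna "ATAGC"; then valid="yes"; else valid="no"; fi
if is_dna "ATXGC"; then invalid="yes"; else invalid="no"; fi

echo "G+C count: $n, ATAGC is DNA: $valid, ATXGC is DNA: $invalid"
```

Command substitution captures what the function prints, while "if" tests only its exit status.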


Providing input to programs

• It is sometimes convenient not to have to edit a program to change certain data variables

• Shell script allows you to read data from the shell directly into program variables with the "read" command

• Examples:
    echo -n "Enter your name: "
    read name
    Shailender          (typed by the user)
    echo $name
    Shailender


Command Line Arguments

• Command line arguments are optional data values that can be passed as input to the Shell script program as the program is run
    – After the name of the program, place string or numeric values with spaces separating them
    – Access them inside the program through "$@" (all arguments) or $1, $2, $3 …
    – Avoid entering or replacing data by editing the program

• Examples:
    bash arguments.sh arg1 arg2 10 20
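The contents of "arguments.sh" are not given in the slides, so the sketch below writes a hypothetical version to a temporary file and runs it with the same four arguments. It is meant only to show how $#, $1 and "$@" behave.

```shell
script=$(mktemp)
cat > "$script" <<'EOF'
#!/usr/bin/env bash
echo "Argument count: $#"
echo "First: $1, third: $3"
total=0
for arg in "$@"; do
    # Add up only the arguments that are whole numbers.
    if [[ $arg =~ ^[0-9]+$ ]]; then
        total=$((total + arg))
    fi
done
echo "Sum of numeric arguments: $total"
EOF

output=$(bash "$script" arg1 arg2 10 20)
echo "$output"
rm -f "$script"
```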


Creating a bash "Shell script"

• The power of Linux can be captured in a script, where commands can be placed sequentially to be executed from top to bottom, left to right
    – The text file containing these commands is called a "shell script"

• Scripts are useful because a compilation of commands executes a task in an automated and precise manner, repeatedly


Shell scripting strategies

• Use "exit" codes
    – Shell scripts can be terminated abruptly with the "exit" command; it is desirable to terminate if errors occur, rather than continuing to run
    – Example
    cd /home/sn34w/project1    # Change to "project1"
    rm -rf *                   # Delete everything there

    – What if "project1" did not exist and there was an error?
        • Everything in your current directory would get deleted!

• Use of exit codes avoids this problem


Shell scripting tips and tricks (…contd)

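• The "$?" special variable stores the exit code of the most recent Linux command; it has a value of 0 if the command was successful, otherwise 1 or more (see error code array). Save it right away, because every later command (including "echo") overwrites it
    cd /home/sn34w/project1
    status=$?
    echo $status
    if [[ $status -eq 0 ]]; then
        rm -rf *
    fi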
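A compact alternative, sketched here, is to chain the check onto "cd" with "||" so that nothing after it runs when the directory is missing. The function name "clean_dir" is hypothetical, and the demonstration works only inside a temporary directory; the subshell parentheses keep the "cd" from changing the caller's working directory.

```shell
project=$(mktemp -d)                 # stands in for a real project directory
touch "$project/old_result.txt"

# Hypothetical helper: empty a directory, but only if "cd" into it succeeded.
clean_dir() {
    cd "$1" || return 1              # stop here if the directory does not exist
    rm -rf -- ./*
}

( clean_dir "$project" )
ok_status=$?
( clean_dir "$project/does_not_exist" ) 2>/dev/null
bad_status=$?
remaining=$(ls -A "$project" | wc -l)

echo "existing dir: $ok_status, missing dir: $bad_status, files left: $remaining"
rm -rf "$project"
```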


Useful Shell scripting tips

• Pipes (|) send the output of one command to another as standard input, so that powerful constructs for operating on data become possible
    – Data flows from left to right
    cat sequence.fa | grep "ACTTTA" | wc -l

• A Linux command can be split across multiple lines by using the "\" character at the end of the line
    cat sequence.fa | grep \
        "ACTTTA" | wc -l


Useful Shell scripting tips (…contd)

• Shell expansion with wild cards
• Input and Output redirection with "<", ">", and ">>"
• Tab completion
• Combining options/flags
• Using flag names with "--"


Useful Shell scripting tips (…contd)

• Copying and pasting from the clipboard with left and right mouse clicks
• Using multiple shells at the same time
• Using semi-colon to run commands on the same line
• Evaluating Linux commands with backticks
• Conditional execution of commands with && and ||


Shell scripts in our home directory

• Users of the bash shell have scripts in their home directory that control shell behaviors
– .bashrc, executed with each new interactive terminal session
– .bash_profile, executed with each new login session
– .bash_history, contains history of commands: saved on exit and loaded at the start of a session
– .bash_logout, contains things to do upon logout

• To look at any of these, say .bashrc, do:
ls -a ~ # Display hidden files in home dir
vi ~/.bashrc # Open .bashrc file in home dir
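To give a feel for what such a file contains, here is a sketch of a few .bashrc-style lines. They are written to a temporary file and loaded from there so that the real ~/.bashrc is untouched; the variable `PROJECT_DIR` and the function `ll` are hypothetical examples, not settings from the cluster.

```shell
#!/bin/bash
# Hypothetical .bashrc-style file written to a temp location
rcfile=$(mktemp)
cat > "$rcfile" <<'EOF'
# Environment variable available to every command started from the shell
export PROJECT_DIR="$HOME/project1"
# Shortcut for a long directory listing
ll() { ls -l "$@"; }
EOF

# "." (also spelled "source") runs the file inside the current shell,
# which is how bash loads .bashrc when a session starts
. "$rcfile"

echo "PROJECT_DIR is $PROJECT_DIR"
rm -f "$rcfile"
```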


Shell script example: Downloading the human genome

• The hg19 build of the human genome can be downloaded from the UCSC website, but before it is usable, it has to be unzipped, "cleaned up", etc.
vi make_hg19.sh
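The contents of make_hg19.sh are not shown on the slide, so the following is only a guess at its general shape. The download URL, the archive name and the meaning of "clean up" (here, deleting the archive after unpacking) are all assumptions to be checked against the UCSC download page. By default the sketch only prints the commands; setting the hypothetical `DRY_RUN` variable to "no" would execute them.

```shell
#!/bin/bash
# make_hg19.sh (sketch): download, unpack and tidy the hg19 chromosomes.
# URL and file names are assumptions; verify them before a real run.
url="http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/chromFa.tar.gz"
outdir="hg19"
dry_run="${DRY_RUN:-yes}"     # default: print the commands only

run() {
    # Print the command, and execute it only when dry_run is "no"
    echo "+ $*"
    if [ "$dry_run" = "no" ]; then
        "$@"
    fi
}

run mkdir -p "$outdir"
run wget -O "$outdir/chromFa.tar.gz" "$url"
run tar -xzf "$outdir/chromFa.tar.gz" -C "$outdir"
# "Clean up" is taken here to mean dropping the archive once unpacked
run rm -f "$outdir/chromFa.tar.gz"
```

A real download is large and slow, which makes this kind of script a natural candidate for a cluster job rather than an interactive session.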


Using Shell script programs on the cluster

• Shell scripts can easily be submitted as jobs to be run on the MGHPCC infrastructure

• A basic understanding of Linux commands is required, as is an account on the cluster

• Lots of useful information, including account registration, is available at www.umassrc.org

• Feel free to reach out to Research Computing for [email protected]


What is a computing "Job"?

• A computing "job" is an instruction to the HPC system to execute a command or script
– Simple Linux commands or shell/R scripts that can be executed within milliseconds would probably not qualify to be submitted as a "job"
– Any command that is expected to take up a big portion of CPU or memory for more than a few seconds on a node would qualify to be submitted as a "job". Why? (Hint: multi-user environment)


How to submit a "job"

• The basic syntax is:
bsub <valid linux command>

• bsub: LSF command for submitting a job
• Let's say a user wants to execute a shell script. On a Linux PC, the command is
bash countDNA.sh

• To submit a job to do the work, do
bsub bash countDNA.sh


Specifying more "job" options

• Jobs can be marked with options for better job tracking and resource management
– Jobs should be submitted with parameters such as queue name, estimated runtime, job name, memory required, output and error files, etc.

• These can be passed on in the bsub command
bsub -q short -W 1:00 -R "rusage[mem=2048]" -J "Myjob" -o hpc.out -e hpc.err bash countDNA.sh
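When the option list gets long, it can help to keep each setting in a named variable and assemble the command from them. The sketch below does that and only prints the resulting command, so it can be run on a machine without LSF; the variable names are hypothetical and the values are the ones from the slide.

```shell
#!/bin/bash
# Settings for the job, one per variable (values taken from the slide)
queue="short"
walltime="1:00"        # HH:MM
mem_mb=2048
job_name="Myjob"
script="countDNA.sh"

# Build the command as an array so each argument stays intact
cmd=(bsub -q "$queue" -W "$walltime" -R "rusage[mem=$mem_mb]"
     -J "$job_name" -o hpc.out -e hpc.err bash "$script")

# printf %q quotes each argument so the line can be pasted into a shell
printf '%q ' "${cmd[@]}"; echo

# On a login node where bsub exists, the job could be submitted with:
# "${cmd[@]}"
```

Quoting the -R value matters because square brackets are wildcard characters to the shell.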


Job submission "options"

| Option flag or name | Description |
|---|---|
| -q | Name of queue to use. On our systems, possible values are "short" (<=4 hrs execution time), "long" and "interactive" |
| -W | Allocation of node time. Specify hours and minutes as HH:MM |
| -J | Job name. Eg. "Myjob" |
| -o | Output file. Eg. "hpc.out" |
| -e | Error file. Eg. "hpc.err" |
| -R | Resources requested from assigned node. Eg. "-R rusage[mem=1024]", "-R span[hosts=1]" |
| -n | Number of cores to use on assigned node. Eg. "-n 8" |
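LSF can also read these options from "#BSUB" comment lines at the top of a script when the script is fed to bsub on standard input, for example "bsub < job.sh". The sketch below uses flags from the table; job.sh is a hypothetical file name and the values are examples, not site recommendations.

```shell
#!/bin/bash
# Options mirror the table: queue, time, cores, memory, name, log files
#BSUB -q short
#BSUB -W 1:00
#BSUB -n 1
#BSUB -R "rusage[mem=1024]"
#BSUB -J Myjob
#BSUB -o hpc.out
#BSUB -e hpc.err

# The body is ordinary shell; to bash, the #BSUB lines are just comments
node=$(uname -n)
echo "Job started on $node"
```

Because bash ignores the "#BSUB" lines, the same file can be test-run locally with "bash job.sh" before it is submitted.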


Why use the correct queue?

• Match requirements to resources
• Jobs dispatch quicker
• Better for entire cluster
• Help GHPCC staff determine when new resources are needed


Questions?

• How can we help further?
• Please check out books we recommend as well as web references (next 2 slides)


Shell script Books

• Shell script books which may be helpful
– Linux Command Line and Shell Scripting Bible, 3rd Edition: http://shop.oreilly.com/product/9781118983843.do
– Linux Command Line and Shell Scripting Bible, 2nd Edition: http://shop.oreilly.com/product/9781118004425.do
– Linux Shell Scripting Cookbook, 2nd Edition: http://shop.oreilly.com/product/9781782162742.do
– Beginning Shell Scripting: http://shop.oreilly.com/product/9780764583209.do
