Workshop on command line tools - day 2
Uploaded by: leandro-lima
![Page 1: Workshop on command line tools - day 2](https://reader030.fdocuments.net/reader030/viewer/2022032618/55b9f494bb61eb3d2b8b45b3/html5/thumbnails/1.jpg)
I Workshop on command-line tools
(day 2)
Center for Applied Genomics
Children's Hospital of Philadelphia
February 12-13, 2015
![Page 2: Workshop on command line tools - day 2](https://reader030.fdocuments.net/reader030/viewer/2022032618/55b9f494bb61eb3d2b8b45b3/html5/thumbnails/2.jpg)
awk - a powerful way to check conditions and show specific columns
Example: show only CNVs that span 3 targets (exons) or fewer
tail -n +2 DATA.xcnv | awk '$8 <= 3'
![Page 3: Workshop on command line tools - day 2](https://reader030.fdocuments.net/reader030/viewer/2022032618/55b9f494bb61eb3d2b8b45b3/html5/thumbnails/3.jpg)
awk - different ways to do the same thing
tail -n +2 DATA.xcnv | awk '$8 <= 3'
# same effect 1
tail -n +2 DATA.xcnv | awk '$8 <= 3 {print}'
# same effect 2 (an "if" statement must sit inside an action block)
tail -n +2 DATA.xcnv | awk '{if ($8 <= 3) {print}}'
# same effect 3
tail -n +2 DATA.xcnv | awk '{if ($8 <= 3) {print $0}}'
# different effect (prints only the first column)
tail -n +2 DATA.xcnv | awk '{if ($8 <= 3) {print $1}}'
![Page 4: Workshop on command line tools - day 2](https://reader030.fdocuments.net/reader030/viewer/2022032618/55b9f494bb61eb3d2b8b45b3/html5/thumbnails/4.jpg)
awk - more options on if statement
# Applying XHMM "gold" thresholds (KB >= 1,
# NUM_TARG >= 3, Q_SOME >= 65, Q_NON_DIPLOID >= 65)
tail -n +2 DATA.xcnv | \
awk '$4 >= 1 && $8 >= 3 && $10 >= 65 && $11 >= 65' \
> DATA.gold.xcnv
# Using only awk
awk 'NR > 1 && $4 >= 1 && $8 >= 3 &&
$10 >= 65 && $11 >= 65' DATA.xcnv > DATA.gold2.xcnv
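A minimal sketch of the same filter on a toy file, assuming you also want to keep the header line in the output (the slide version drops it). The file name, column labels c1, c2, ... and all values are made up for illustration; only the positions of columns 4, 8, 10 and 11 follow the slide.

```shell
# Hypothetical toy data: only columns 4, 8, 10 and 11 matter here
tmp=$(mktemp -d)
printf 'c1\tc2\tc3\tKB\tc5\tc6\tc7\tNUM_TARG\tc9\tQ_SOME\tQ_NON_DIPLOID\n' > "$tmp/toy.xcnv"
printf 'S1\tDEL\tchr1:100-200\t2.5\tchr1\t150\t1..4\t4\t50\t90\t95\n' >> "$tmp/toy.xcnv"
printf 'S2\tDUP\tchr2:300-400\t0.5\tchr2\t350\t5..6\t2\t10\t40\t99\n' >> "$tmp/toy.xcnv"

# NR == 1 keeps the header; the rest is the "gold" filter from the slide
awk -F'\t' 'NR == 1 || ($4 >= 1 && $8 >= 3 && $10 >= 65 && $11 >= 65)' \
    "$tmp/toy.xcnv" > "$tmp/toy.gold.xcnv"

kept=$(wc -l < "$tmp/toy.gold.xcnv" | tr -d ' ')
first_call=$(sed -n '2p' "$tmp/toy.gold.xcnv" | cut -f1)
echo "$kept lines kept (header + passing calls), first call: $first_call"
```

Setting -F'\t' makes awk split on tabs only, which is safer if a field can contain spaces.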
![Page 5: Workshop on command line tools - day 2](https://reader030.fdocuments.net/reader030/viewer/2022032618/55b9f494bb61eb3d2b8b45b3/html5/thumbnails/5.jpg)
diff - compare files line by line
# Compare
diff DATA.gold.xcnv DATA.gold2.xcnv
# Tip: install tkdiff to use a
# graphical version of diff
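When two files are expected to be identical, as with the two "gold" files, diff prints nothing. A small sketch of how a script could rely on the exit status instead of reading the output; the file names here are made up.

```shell
tmp=$(mktemp -d)
printf 'a\nb\n' > "$tmp/one.txt"
printf 'a\nb\n' > "$tmp/two.txt"
printf 'a\nc\n' > "$tmp/three.txt"

# diff exits with 0 when the files are identical and 1 when they differ
if diff "$tmp/one.txt" "$tmp/two.txt" > /dev/null; then
    same_12=yes
else
    same_12=no
fi
if diff "$tmp/one.txt" "$tmp/three.txt" > /dev/null; then
    same_13=yes
else
    same_13=no
fi
echo "one vs two identical: $same_12; one vs three identical: $same_13"
```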
![Page 6: Workshop on command line tools - day 2](https://reader030.fdocuments.net/reader030/viewer/2022032618/55b9f494bb61eb3d2b8b45b3/html5/thumbnails/6.jpg)
Exercises
1. Using adhd.map, show 10 SNPs with rsID starting with 'rs' on chrom. 2, between positions 1Mb and 2Mb
2. Check which chromosome has the most SNPs
3. Check which SNP IDs are duplicated
![Page 7: Workshop on command line tools - day 2](https://reader030.fdocuments.net/reader030/viewer/2022032618/55b9f494bb61eb3d2b8b45b3/html5/thumbnails/7.jpg)
Suggestions
# 1.
grep '\brs' adhd.map | \
awk '$1 == 2 && int($4) >= 1000000 && int($4) <= 2000000' | \
less
# 2.
cut -f1 adhd.map | sort | uniq -c | sort -k1n | tail -1
# 3.
cut -f2 adhd.map | sort | uniq -c | awk '$1 > 1'
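An alternative sketch for suggestions 2 and 3, run on a made-up 4-column map-like file (chromosome, SNP ID, genetic distance, position). It assumes you only need the duplicated IDs themselves, not their counts.

```shell
tmp=$(mktemp -d)
# Hypothetical toy file in the same column layout as adhd.map
printf '1\trs1\t0\t100\n1\trs2\t0\t200\n2\trs1\t0\t300\n2\trs3\t0\t400\n2\trs4\t0\t500\n' > "$tmp/toy.map"

# uniq -d prints one copy of each line that occurs more than once
dups=$(cut -f2 "$tmp/toy.map" | sort | uniq -d)
echo "duplicated IDs: $dups"

# Chromosome with the most SNPs: sort the counts in reverse numeric order, take the first
top_chrom=$(cut -f1 "$tmp/toy.map" | sort | uniq -c | sort -k1,1nr | head -n 1 | awk '{print $2}')
echo "chromosome with most SNPs: $top_chrom"
```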
![Page 8: Workshop on command line tools - day 2](https://reader030.fdocuments.net/reader030/viewer/2022032618/55b9f494bb61eb3d2b8b45b3/html5/thumbnails/8.jpg)
More awk - inserting external variables
awk -v Mb=1000000 -v chrom=2 \
'$1 == chrom && int($4) >= Mb && int($4) <= 2*Mb' \
adhd.map | less
# Printing specific columns
# (the action must start on the same line as the condition)
awk -v Mb=1000000 -v chrom=2 \
'$1 == chrom && int($4) >= Mb && int($4) <= 2*Mb {
print $1" "$2" "$4}' \
adhd.map | less
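A short sketch of passing shell variables into awk with -v, using a made-up toy file. The variable names lo and hi are illustrative; OFS is awk's built-in output separator, set here so the printed columns stay tab-delimited.

```shell
tmp=$(mktemp -d)
# Hypothetical toy file: chromosome, SNP ID, genetic distance, position
printf '2\trs10\t0\t1500000\n2\trs11\t0\t2500000\n3\trs12\t0\t1200000\n' > "$tmp/toy.map"

# -v passes shell values into awk; OFS sets the separator used by "print a, b, c"
lo=1000000
hi=2000000
hits=$(awk -v chrom=2 -v lo="$lo" -v hi="$hi" -v OFS='\t' \
    '$1 == chrom && $4 >= lo && $4 <= hi {print $1, $2, $4}' "$tmp/toy.map")
echo "$hits"
```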
![Page 9: Workshop on command line tools - day 2](https://reader030.fdocuments.net/reader030/viewer/2022032618/55b9f494bb61eb3d2b8b45b3/html5/thumbnails/9.jpg)
Using awk to check number of variants in ped files
# Options using only awk, but they take (much) more time
awk 'NR == 1 {print (NF-6)/2}' adhd.ped
awk 'NR < 2 {print (NF-6)/2}' adhd.ped # Slow, too
# Better alternative
head -n 1 adhd.ped | awk '{print (NF-6)/2}'
# Now, the map file
wc -l adhd.map
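The awk-only versions are slow because awk keeps reading the whole file after line 1. A minimal sketch, on a made-up ped-like file, of stopping early with awk's exit statement:

```shell
tmp=$(mktemp -d)
# Hypothetical ped-like file: 6 leading columns, then 2 allele columns per variant
printf 'F1 I1 0 0 1 1 A A C T G G\nF2 I2 0 0 2 1 A C C C G T\n' > "$tmp/toy.ped"

# "exit" makes awk stop after the first line instead of reading the whole file
n_variants=$(awk 'NR == 1 {print (NF-6)/2; exit}' "$tmp/toy.ped")
echo "variants: $n_variants"
```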
![Page 10: Workshop on command line tools - day 2](https://reader030.fdocuments.net/reader030/viewer/2022032618/55b9f494bb61eb3d2b8b45b3/html5/thumbnails/10.jpg)
time - time command execution
time head -n 1 adhd.ped | awk '{print (NF-6)/2}'
real 0m0.485s
user 0m0.391s
sys 0m0.064s
time awk 'NR < 2 {print (NF-6)/2}' adhd.ped
# Forget… just press Ctrl+C
real 1m0.611s
user 0m51.261s
sys 0m0.826s
![Page 11: Workshop on command line tools - day 2](https://reader030.fdocuments.net/reader030/viewer/2022032618/55b9f494bb61eb3d2b8b45b3/html5/thumbnails/11.jpg)
top - display and update sorted information about processes / display Linux tasks
top
z : color
k : kill process
u : choose specific user
c : show complete commands running
1 : show usage of individual CPUs
q : quit
![Page 12: Workshop on command line tools - day 2](https://reader030.fdocuments.net/reader030/viewer/2022032618/55b9f494bb61eb3d2b8b45b3/html5/thumbnails/12.jpg)
screen - screen manager with terminal emulation (i)
screen
screen -S <session_name>
Ctrl+a, then c: create window
Ctrl+a, then n: go to next window
Ctrl+a, then p: go to previous window
Ctrl+a, then 0: go to window number 0
Ctrl+a, then d: detach from your session, but keep it running
![Page 13: Workshop on command line tools - day 2](https://reader030.fdocuments.net/reader030/viewer/2022032618/55b9f494bb61eb3d2b8b45b3/html5/thumbnails/13.jpg)
screen - screen manager with terminal emulation (ii)
Ctrl+a, then [ : activate copy mode (to scroll screen)
q : quit copy mode
exit : close current window
screen -r : resume the only detached session
screen -r <session_name> : resume a specific detached session
screen -rD <session_name> : reattach a session, detaching it elsewhere first if needed
![Page 14: Workshop on command line tools - day 2](https://reader030.fdocuments.net/reader030/viewer/2022032618/55b9f494bb61eb3d2b8b45b3/html5/thumbnails/14.jpg)
split - split a file into pieces
split -l <lines_of_each_piece> <input> <prefix>
# Example
split -l 100000 adhd.map map_
wc -l map_*
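A small sketch, on a made-up 25-line file, of splitting and then checking that the pieces can be joined back into the original. The prefix part_ is arbitrary.

```shell
tmp=$(mktemp -d)
seq 1 25 > "$tmp/numbers.txt"

# Pieces of 10 lines each: part_aa, part_ab, part_ac
split -l 10 "$tmp/numbers.txt" "$tmp/part_"
n_pieces=$(ls "$tmp"/part_* | wc -l | tr -d ' ')

# The pieces sort alphabetically, so cat restores the original order
cat "$tmp"/part_* > "$tmp/rejoined.txt"
if cmp -s "$tmp/numbers.txt" "$tmp/rejoined.txt"; then
    rejoined=identical
else
    rejoined=different
fi
echo "$n_pieces pieces, rejoined file is $rejoined"
```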
![Page 15: Workshop on command line tools - day 2](https://reader030.fdocuments.net/reader030/viewer/2022032618/55b9f494bb61eb3d2b8b45b3/html5/thumbnails/15.jpg)
in-line Perl/sed to find and replace (i)
head DATA.gold.xcnv | cut -f3 | perl -pe 's/chr/CHR/g'
head DATA.gold.xcnv | cut -f3 | perl -pe 's/chr//g'
# Other possibilities
head DATA.gold.xcnv | cut -f3 | perl -pe 's|chr||g'
head DATA.gold.xcnv | cut -f3 | perl -pe 's!chr!!g'
head DATA.gold.xcnv | cut -f3 | sed 's/chr//g'
# Creating a BED file
head DATA.gold.xcnv | cut -f3 | perl -pe 's/[:-]/\t/g'
![Page 16: Workshop on command line tools - day 2](https://reader030.fdocuments.net/reader030/viewer/2022032618/55b9f494bb61eb3d2b8b45b3/html5/thumbnails/16.jpg)
in-line Perl/sed to find and replace (ii)
# "s" means substitute
# "g" means global (replace all matches, not only first)
# See the difference...
head DATA.gold.xcnv | cut -f3 | sed 's/9/nine/g'
head DATA.gold.xcnv | cut -f3 | sed 's/9/nine/'
# Adding more replacements
head DATA.gold.xcnv | cut -f3 | sed 's/1/one/g; s/2/two/g'
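To see the effect of the "g" flag without the data file, here is a minimal sketch on a made-up interval string:

```shell
region='chr9:100-199'   # made-up interval string

# Without "g", only the first match on the line is replaced
first_only=$(echo "$region" | sed 's/9/nine/')
# With "g", every match on the line is replaced
all_matches=$(echo "$region" | sed 's/9/nine/g')

echo "$first_only"
echo "$all_matches"
```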
![Page 17: Workshop on command line tools - day 2](https://reader030.fdocuments.net/reader030/viewer/2022032618/55b9f494bb61eb3d2b8b45b3/html5/thumbnails/17.jpg)
copy from terminal to clipboard/paste from clipboard to terminal
# pbpaste and pbcopy are macOS commands
# This is like Ctrl+V in your terminal
pbpaste
# This is like Ctrl+C from your terminal
head DATA.xcnv | pbcopy
# Then, Ctrl+V in another text editor
# On Linux, you can install "xclip"
http://sourceforge.net/projects/xclip/
![Page 18: Workshop on command line tools - day 2](https://reader030.fdocuments.net/reader030/viewer/2022032618/55b9f494bb61eb3d2b8b45b3/html5/thumbnails/18.jpg)
datamash - command-line calculations
tail -n +2 DATA.xcnv | \
head | \
cut -f6,10,11 | \
datamash mean 1 sum 2 min 3
# mean of 1st column
# sum of 2nd column
# minimum of 3rd column
http://www.gnu.org/software/datamash/
![Page 19: Workshop on command line tools - day 2](https://reader030.fdocuments.net/reader030/viewer/2022032618/55b9f494bb61eb3d2b8b45b3/html5/thumbnails/19.jpg)
touch - change file access and modification times
ls -lh DATA.gold.xcnv
touch DATA.gold.xcnv
ls -lh DATA.gold.xcnv
![Page 20: Workshop on command line tools - day 2](https://reader030.fdocuments.net/reader030/viewer/2022032618/55b9f494bb61eb3d2b8b45b3/html5/thumbnails/20.jpg)
Introduction to "for" loop
tail -n +2 DATA.xcnv | cut -f1 | sort | uniq | head > samples.txt
for sample in `cat samples.txt`; do touch $sample.txt; done
ls -lh Sample*
for sample in `cat samples.txt`; do
mv $sample.txt $sample.csv;
done
ls -lh Sample*
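A sketch of the same two loops in a form that tolerates spaces or other odd characters in names. The sample names and the temporary directory are made up; "while read" is a swapped-in alternative to the `cat` loop on the slide, not the slide's own method.

```shell
tmp=$(mktemp -d)
printf 'Sample_1\nSample_2\nSample_3\n' > "$tmp/samples.txt"

# Read one sample name per line; quoting keeps odd characters in names intact
while IFS= read -r sample; do
    touch "$tmp/$sample.txt"
done < "$tmp/samples.txt"

# Rename .txt to .csv; ${f%.txt} strips the old extension
for f in "$tmp"/Sample_*.txt; do
    mv "$f" "${f%.txt}.csv"
done

n_csv=$(ls "$tmp"/Sample_*.csv | wc -l | tr -d ' ')
echo "$n_csv csv files"
```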
![Page 21: Workshop on command line tools - day 2](https://reader030.fdocuments.net/reader030/viewer/2022032618/55b9f494bb61eb3d2b8b45b3/html5/thumbnails/21.jpg)
Variables (i)
i=1
name=Leandro
count=`wc -l adhd.map`
echo $i
echo $name
echo $count
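Note that `wc -l adhd.map` prints the file name after the number, so $count holds both. A minimal sketch, on a made-up 3-line file, of storing only the number so it can be used in arithmetic:

```shell
tmp=$(mktemp -d)
printf 'a\nb\nc\n' > "$tmp/toy.map"

# Reading from stdin makes wc print only the number, without the file name
count=$(wc -l < "$tmp/toy.map" | tr -d ' ')
echo "The file has $count lines"

# Arithmetic on the variable
half=$((count / 2))
echo "Half of that (integer division): $half"
```

$(...) does the same job as backticks and is easier to nest.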
![Page 22: Workshop on command line tools - day 2](https://reader030.fdocuments.net/reader030/viewer/2022032618/55b9f494bb61eb3d2b8b45b3/html5/thumbnails/22.jpg)
Variables (ii)
# Examples
bwa=/home/users/llima/tools/bwa
hg19=/references/hg19.fasta
# Do not run
$bwa index $hg19
![Page 23: Workshop on command line tools - day 2](https://reader030.fdocuments.net/reader030/viewer/2022032618/55b9f494bb61eb3d2b8b45b3/html5/thumbnails/23.jpg)
System variables
echo $HOME
echo $USER
echo $PWD
# directories where bash looks for your programs
echo $PATH
![Page 24: Workshop on command line tools - day 2](https://reader030.fdocuments.net/reader030/viewer/2022032618/55b9f494bb61eb3d2b8b45b3/html5/thumbnails/24.jpg)
Exercise
1. Create a program that shows input parameters/arguments
2. Create a program (say, "fields", or "colnames") that prints the column names of a <tab>-delimited file (example: DATA.xcnv)
3. Send this program to your PATH
![Page 25: Workshop on command line tools - day 2](https://reader030.fdocuments.net/reader030/viewer/2022032618/55b9f494bb61eb3d2b8b45b3/html5/thumbnails/25.jpg)
Running a bash script (i)
cat > arguments.sh
echo Your program is $0
echo Your first argument is $1
echo Your second argument is $2
echo You entered $# parameters.
# Ctrl+D to finish the input and exit "cat"
![Page 26: Workshop on command line tools - day 2](https://reader030.fdocuments.net/reader030/viewer/2022032618/55b9f494bb61eb3d2b8b45b3/html5/thumbnails/26.jpg)
Running a bash script (ii)
bash arguments.sh
bash arguments.sh A B C D E
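A possible variation on arguments.sh that loops over every argument with "$@" instead of naming $1 and $2. It is written with a here-document so it can be created without typing into cat; the temporary directory is an assumption of this sketch.

```shell
tmp=$(mktemp -d)
# Quoting 'EOF' stops the shell from expanding $0, $# and $@
# while the file is being written
cat > "$tmp/arguments.sh" <<'EOF'
echo "Your program is $0"
echo "You entered $# parameters."
for arg in "$@"; do
    echo "Argument: $arg"
done
EOF

# "C D" is quoted, so it counts as a single argument
output=$(bash "$tmp/arguments.sh" A B "C D")
echo "$output"
```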
![Page 27: Workshop on command line tools - day 2](https://reader030.fdocuments.net/reader030/viewer/2022032618/55b9f494bb61eb3d2b8b45b3/html5/thumbnails/27.jpg)
chmod - set permissions (i)
ls -lh arguments.sh
-rw-r--r--
# First character
b Block special file.
c Character special file.
d Directory.
l Symbolic link.
s Socket link.
p FIFO.
- Regular file.
![Page 28: Workshop on command line tools - day 2](https://reader030.fdocuments.net/reader030/viewer/2022032618/55b9f494bb61eb3d2b8b45b3/html5/thumbnails/28.jpg)
chmod - set permissions (ii)
Next characters
user, group, others | read, write, execute
ls -lh arguments.sh
-rw-r--r--
# Everybody can read
# Only user can write/modify
![Page 29: Workshop on command line tools - day 2](https://reader030.fdocuments.net/reader030/viewer/2022032618/55b9f494bb61eb3d2b8b45b3/html5/thumbnails/29.jpg)
chmod - set permissions (iii)
# Add writing permission to group
chmod g+w arguments.sh
ls -lh arguments.sh
# Remove writing permission from group
chmod g-w arguments.sh
ls -lh arguments.sh
# Add execution permission to all
chmod a+x arguments.sh
ls -lh arguments.sh
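A self-contained sketch that captures the permission string before and after chmod, so the change can be compared directly. The file hello.sh is made up, and the first chmod only puts it in a known starting state.

```shell
tmp=$(mktemp -d)
echo 'echo hello' > "$tmp/hello.sh"

# Start from a known state: user read/write, group and others read-only
chmod u=rw,go=r "$tmp/hello.sh"
before=$(ls -l "$tmp/hello.sh" | cut -c1-10)

chmod g+w "$tmp/hello.sh"     # group may now write
chmod a+x "$tmp/hello.sh"     # everybody may now execute
after=$(ls -l "$tmp/hello.sh" | cut -c1-10)

echo "$before -> $after"
```

cut -c1-10 keeps the file-type character plus the nine permission characters and drops anything ls appends after them.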
![Page 30: Workshop on command line tools - day 2](https://reader030.fdocuments.net/reader030/viewer/2022032618/55b9f494bb61eb3d2b8b45b3/html5/thumbnails/30.jpg)
Run your program again
# Run the script directly (it is now executable)
./arguments.sh
./arguments.sh A B C D E
# change the name
mv arguments.sh arguments
# Send to your PATH (showing on Mac)
sudo cp arguments /usr/local/bin/
# Go to other directory
# Type argu<Tab>, and "which arguments"
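For exercise 2, one possible sketch of a "colnames" program. The script name, the numbered output format and the toy file are all assumptions of this example, not something shown in the slides.

```shell
tmp=$(mktemp -d)
# Hypothetical script "colnames": numbers the header fields of a tab-delimited file
cat > "$tmp/colnames" <<'EOF'
#!/bin/bash
# Usage: colnames <file>
head -n 1 "$1" | tr '\t' '\n' | awk '{print NR": "$0}'
EOF
chmod u+x "$tmp/colnames"

# Made-up tab-delimited file with a header line
printf 'SAMPLE\tCNV\tINTERVAL\nS1\tDEL\tchr1:1-2\n' > "$tmp/toy.tsv"
names=$(bash "$tmp/colnames" "$tmp/toy.tsv")
echo "$names"
```

Once copied into a directory on your PATH, it could be called as `colnames DATA.xcnv`.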