Introducing the LINUX Operating System
Mark Wamalwa
BecA-ILRI Hub, Nairobi, Kenya
http://hub.africabiosciences.org/ http://www.Ilri.org/ [email protected]
BecA-ILRI INTRODUCTION TO BIOINFORMATICS
What is UNIX?
• A family of operating systems
IRIX
SOLARIS
AIX
LINUX
Digital UNIX
HP-UX
...
• Multitasking
Runs more than one program at the same time.
A busy system can be running several hundred or even thousands of programs at the same time.
• Multiuser
Many different people can use the system at the same time.
• Networked
It is designed to be linked to other computers and to allow people to work over a network.
The network IS the computer.
What is LINUX?
• A freely available clone of the UNIX operating system for personal computers
• Linux and Unix
– Time-sharing operating systems: allow multiple users to use the system simultaneously
– Unix: developed in 1969 at Bell Labs
– Linux is similar to Unix in some aspects
[Photo: Linus Torvalds, creator of the Linux kernel]
What does UNIX do?
[Diagram: the UNIX kernel sits between the computer's hardware (disk storage, memory, network adapter, modem, screen, keyboard) and the programs that users run (the shell, console programs, X programs).]
The Kernel
• Controls access to the hardware.
• Prevents programs interfering with each other.
• Provides an easy way for programmers to talk to the electronics.
• Controls data storage and protection.
The Shell (or command line)
• Allows the user to interact directly with the computer by typing commands.
• The shell interprets these and instructs the kernel accordingly.
• Very powerful but can be intimidating.
Console programs
• Run from the shell.
• Use one program actively at a time.
The X Window System
• Graphical interface (point, click, drag, drop etc.).
• Network enabled.
• Can use many programs at once.
• Is a separate program.
• Easier to use than the shell but less powerful.
User Interaction
• Many different users, typically accessing the system from remote machines in different ways.
• Any number of users can use any number of programs and methods to access the system from any number of remote machines at the same time.
Logging in
Log in from anywhere you have permission.
Have graphical output sent anywhere you have permission.
You must have a username (login id) to use a unix/linux system
This identifies you to the system so it can manage your work properly.
Every user is a member of one or more groups of users.
This helps the system manage different types of user properly.
Logging in
Connect to the Linux machine using:
• PuTTY
• WinSCP: an open-source SFTP (SSH File Transfer Protocol) and SCP (Secure CoPy) client for Windows that uses SSH (Secure SHell)
• Telnet, Xterm, Secure Shell, Kermit or other terminal emulators
A typical login session:
Connecting to http://hpc.ilri.cgiar.org
Connected.
Welcome to Genotyping by Sequencing (GBS) workshop
Login: username
Password:
The system will be unavailable during Ramadhan.
You have new mail.
username@hpc~>
• UNIX is case sensitive: username is not the same as Username or USERNAME.
• Linux doesn't show the password on the screen as you type it.
• After you log in, you may get some messages from the system administrator.
Accessing HPC from Windows systems
• Two stage process:
– Connecting to the system via secure shell (ssh) login
– Getting a graphical connection that supports X-Windows
• ssh connection:
– Need third party software.
– Local suggestion: use PuTTY
• Process is slightly more awkward than ideal because local PuTTY is configured for the Sun UNIX environment.
• Better: download putty.exe from http://www.chiark.greenend.org.uk/~sgtatham/putty/
– Just runs from your desktop
• Alternative: cygwin, a Linux-like environment for Windows
– www.cygwin.com
Using Local PuTTY - 1
[Screenshot, annotated "Better choice" and "This is necessary for all PuTTY installs."]
Using Local PuTTY - 2
[Screenshot, annotated "linux"]
Using Local PuTTY - 3
[Screenshot: PuTTY Terminal Screen]
The shell or command line
1. The Prompt
There are several different shells, but they behave more or less the same.
username@hpc/home~>
An interactive shell displays a prompt made up of:
• your username
• the machine you are logged in to
• your present location
The prompt can be customised to look how you wish.
The shell or command line
2. Commands
username@hpc~> ls -ald *.txt
The shell breaks the command up into individual words.
• The first word is a command (here, ls).
• The subsequent words form a list of arguments to the command. Arguments beginning with - are options (here, -ald).
• * is a special character. It means 'any group of characters' (including none). The shell finds all the filenames that match anything.txt and adds them to the list of arguments.
• The boundary between words is a space. For the shell to treat a phrase that includes spaces as a single word, put it in quotes: 'my word' or "my word".
• Options control how the program runs. '-a -l -d' is equivalent to '-ald'.
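The word-splitting rules above can be tried out safely in a scratch directory. This is a minimal sketch, assuming a POSIX shell; the file names (notes.txt, data.csv, "my word.txt") are made up for illustration.

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch notes.txt data.csv "my word.txt"

# The shell replaces *.txt with the matching filenames before ls runs.
txt_files=$(ls -d *.txt)
echo "$txt_files"

# Quotes make a phrase containing a space count as a single word.
# 'set --' stores its arguments so that $# can count them.
set -- "my word.txt" notes.txt
quoted_count=$#
echo "$quoted_count arguments"

# Separate options and combined options behave the same.
separate=$(ls -a -l -d . | cut -c1-10)
combined=$(ls -ald . | cut -c1-10)
echo "$separate $combined"

# Clean up the scratch directory.
cd / && rm -rf "$workdir"
```

Without the quotes, "my word.txt" would be passed to the command as two separate arguments, my and word.txt.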
More Special Characters
*     Any group of characters, including none.
?     Any single character.
" '   Quotes. Word delineation.
&     Cause the process to run in the background.
|     Pipe. Pass the output of the command on the left as the input to the command on the right.
>     Redirect the command's output, e.g. to a file.
<     Redirect a command's input, e.g. from a file instead of the keyboard.
``    Backticks (not '). Take the output of the command as an argument.
$     String or Dollar. Treat the next word as a variable and write out its value.
\     Backslash. Change the meaning of the next character.
;     Semicolon. Separate commands typed in together.
Some special characters can lose their special meaning if they are inside quotes.
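Each of the special characters above can be seen in action in a few lines. This is a sketch only, assuming a POSIX shell; the file names and the variable names (project, message and so on) are invented for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# > sends a command's output to a file instead of the screen.
printf 'seq3\nseq1\nseq2\n' > names.txt

# < makes a command read its input from a file instead of the keyboard.
line_count=$(wc -l < names.txt | tr -d ' ')

# | feeds the output of the command on the left to the command on the right.
first_sorted=$(sort names.txt | head -n 1)

# ; separates two commands typed on one line.
touch a.txt; touch b.txt

# Backticks are replaced by the output of the command inside them.
txt_count=`ls *.txt | wc -l | tr -d ' '`

# $ writes out the value of a variable.
project=genomes
message="project is $project"

# A backslash or single quotes switch off the special meaning of $.
escaped="cost is \$5"
quoted='cost is $5'

# & runs a command in the background; wait pauses until it has finished.
sleep 1 &
wait

echo "$line_count lines, first is $first_sorted, $txt_count .txt files"
echo "$message"
echo "$escaped"

cd / && rm -rf "$workdir"
```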
Organisation
"Everything is a file"
• An ordinary file contains data. The data could be an image, a document, a set of instructions (a program) or any fixed information.
• A directory contains other files. This is a folder on Windows. A directory can contain other directories (sub-directories).
• A link is a file that is a shortcut to another file. Files can have more than one name, and be in different directories at the same time.
• There are many other types of file.
Organisation of the file system
/
    bin
    usr
    etc
    home
        username
            letter
            prot
            project
                seq1
                seq2
                seq3
                seq4
• The top of the file system is the directory '/', commonly known as the root directory.
• bin, usr, etc and home are several subdirectories under the root directory.
• username is another subdirectory: an example user's home directory, with a subdirectory (project) and several files.
Any file in the file system can be uniquely identified by describing the path to it from the root directory.
For example, /home/username/prot reads from left to right as: the root directory /, then the directory home, then the directory username, then the file prot.
Organisation of the file system
Any process is located somewhere in the filesystem.
The command 'pwd' (print working directory) will tell you where.
username@hpc ~> pwd
/home/username
'~' is a Linux shortcut for 'your home directory'.
Looking at the file system
'ls' lists the files in a directory or directories.
username@hpc ~> ls
prot letter project
username@hpc ~> ls project
seq1 seq2 seq3 seq4
Without an argument, ls lists all the files that don't start with . in the current directory.
There are many options to ls that allow you to select and control the information it presents.
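As a small illustration of how options change what ls shows, the sketch below rebuilds the example home directory inside a scratch directory (so the real /home is not touched) and adds a hypothetical hidden file named .hidden. It assumes a POSIX shell.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir project
touch letter prot .hidden project/seq1 project/seq2 project/seq3 project/seq4

# Plain ls skips names that start with a dot.
plain=$(ls)
echo "$plain"

# -a also lists the dot files (including . and ..).
hidden_shown=$(ls -a | grep -c '^\.hidden$')

# Giving a directory as the argument lists what is inside it.
inside=$(ls project)
echo "$inside"

# -l prints one file per line with permissions, owner, size and date.
ls -l

cd / && rm -rf "$workdir"
```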
Moving around the file system
You can move to a different directory with the command 'cd directory'.
'directory' is the directory to which you want to move. The name can be written as the full path (from root) or as the relative path (from your current directory).
Using the full path:
username@hpc ~> cd /home/username/project
username@hpc ~/project> pwd
/home/username/project
Repeat using the relative path:
username@hpc ~> cd project
username@hpc ~/project> pwd
/home/username/project
'..' means the parent directory. '.' means the current directory.
username@hpc ~/project> cd ..
username@hpc ~> pwd
/home/username
username@hpc ~>
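The same moves can be rehearsed without touching a real home directory. This sketch assumes a POSIX shell and recreates home/username/project under a scratch directory; the variable names are made up for the example.

```shell
base=$(mktemp -d)
mkdir -p "$base/home/username/project"

# Full path: works from anywhere.
cd "$base/home/username/project" || exit 1
by_full_path=$(pwd)

# Relative path: depends on where you currently are.
cd "$base/home/username" || exit 1
home_dir=$(pwd)
cd project || exit 1
by_relative_path=$(pwd)

# .. is the parent directory, . is the current directory.
cd .. || exit 1
after_dotdot=$(pwd)
cd . || exit 1
after_dot=$(pwd)

echo "$by_full_path"
echo "$after_dotdot"

cd / && rm -rf "$base"
```

Both routes end in the same directory, and 'cd .' leaves you where you already were.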
Changing the file system
You can create a new subdirectory in the current directory with the command 'mkdir directory'.
username@hpc ~> mkdir model
username@hpc ~>
[The directory tree now shows a new directory, model, under username.]
Changing the file system
You can delete an empty subdirectory with the command 'rmdir directory'.
username@hpc ~> rmdir model
username@hpc ~>
You can delete a file with the command 'rm file'.
username@hpc ~> rm prot
You can delete a subdirectory and its contents with the command 'rm -rf directory'.
username@hpc ~> rm -rf directory
Be careful: rm does not ask for confirmation here, and deleted files cannot be recovered.
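A short sketch of the difference between rmdir, rm and rm -rf, run inside a scratch directory so that nothing important can be deleted. It assumes a POSIX shell; the variable rmdir_result is invented for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

mkdir model project
touch prot project/seq1 project/seq2

# rmdir only removes empty directories.
rmdir model

# rm removes a file.
rm prot

# rmdir refuses to remove a directory that still has files in it.
if rmdir project 2>/dev/null; then
    rmdir_result=removed
else
    rmdir_result=refused
fi
echo "rmdir project: $rmdir_result"

# rm -rf removes the directory and everything inside it, without asking.
rm -rf project
remaining=$(ls | wc -l | tr -d ' ')
echo "$remaining entries left"

cd / && rm -rf "$workdir"
```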
More about files: filenames
Filenames can contain any normal text character including spaces and special characters.
Filenames can be almost any length.
It is best to stick to a-z, A-Z, _, -, and numbers.
It is best to keep them short as it saves typing.
If a filename contains a special character or a space you may need to put quotes around the whole path.
Special characters in filenames can cause problems with some programs.
More about files: reading files
You can print the contents of one or more files to the screen with the command: 'cat file1 file2 ...'
cat prints the whole file at once, so a file longer than just a few lines will run off the top of your screen.
You can view the contents of one or more files a page at a time on the screen with the command: 'more file1 file2 ...'
more will let you search through a file, go backwards and forwards and has many other functions.
You can print the first few lines of a file with the command: 'head file1 file2 ...'
The last few lines can be viewed with 'tail'.
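To see how cat, head and tail differ, the sketch below writes a 20-line file named sample.txt (a made-up name) and reads it back in three ways. It assumes a POSIX shell; the -n option, which sets how many lines to show, is standard for both head and tail.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a 20-line file: "line 1" ... "line 20".
i=1
while [ "$i" -le 20 ]; do
    echo "line $i"
    i=$((i + 1))
done > sample.txt

# cat prints everything at once.
all_lines=$(cat sample.txt | wc -l | tr -d ' ')

# head shows the first 10 lines by default; -n picks another number.
default_head=$(head sample.txt | wc -l | tr -d ' ')
first_three=$(head -n 3 sample.txt)

# tail does the same from the end of the file.
last_two=$(tail -n 2 sample.txt)

echo "$first_three"
echo "$last_two"

cd / && rm -rf "$workdir"
```

more is interactive (it waits for key presses), so it is left out of this non-interactive sketch.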
More about files: editing files
You can change the content of text files and create new files with a text editor.
Text editors edit text. They do not try to format the text like word processors.
PICO A novice friendly basic text editor used as standard on many systems. Start with the command 'pico filename'
EMACS A powerful editing environment which can be programmed. It has many modes for auto layout of program code. Start with the command 'emacs filename'
VI A powerful editor which can be somewhat confusing for newcomers. It is designed for rapid editing of text files and programming. Start with the command 'vi filename'
Others: kedit, gedit, kwrite, etc.
More about files: copying files
You can copy a file with the command 'cp oldfilename newfilename'.
username@hpc ~> ls
letter project
username@hpc ~> cp letter draft
username@hpc ~> ls
draft letter project
If newfilename is a directory, then the file will be copied to 'newfilename/oldfilename'.
The command 'mv oldfilename newfilename' can be used to rename a file.
username@hpc ~> mv oldfilename newfilename
Warning: If a file called newfilename already exists then it will be overwritten.
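The copy, rename and overwrite behaviour can be checked in a scratch directory. This is a sketch assuming a POSIX shell; the file names final and existing are invented for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo "Dear colleague" > letter
mkdir project

# Copy to a new name.
cp letter draft

# Copy into a directory: the copy becomes project/letter.
cp letter project
copied_into_dir=$(cat project/letter)

# Rename a file.
mv draft final

# An existing target is overwritten without warning.
echo "old contents" > existing
cp letter existing
overwritten=$(cat existing)

# cp -i and mv -i ask before overwriting; they are not run here
# because they wait for an answer typed at the keyboard.

listing=$(ls)
echo "$listing"

cd / && rm -rf "$workdir"
```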
More about files: permissions
• Every file is protected.
• Permissions determine who can read, write, or execute a given file.
• There are three types of user:
– Owner: The user who owns the file.
– Group: Other users in the same group as the user who owns the file.
– World: All the other users in the system.
• Files can have read (r), write (w) or execute (x) permission for each of the three types of user.
More about files: permissions
You can view the permissions for a file by listing it in long format with the command 'ls -l filename' (the option is the letter l).
username@hpc ~> ls -l letter
-rwxr--r-- 1 username users 6048 Aug 17 16:07 letter
Reading the line from left to right:
-              The file type: - ordinary file, d directory, l link (shortcut)
rwx            Permissions for the owner
r--            Permissions for the owner's group
r--            Permissions for everyone else
username       The user who owns the file
users          The file's group
6048           The file's size
Aug 17 16:07   The date the file was last modified
letter         The file's name
More about files: permissions
You can change the permissions for a file with the command 'chmod change filename'.
change is the modification you want to make to the file's permissions.
username@hpc ~> ls -l letter
-rwxr--r-- 1 username users 6048 Aug 17 16:07 letter
username@hpc ~> chmod o-r letter
username@hpc ~> ls -l letter
-rwxr----- 1 username users 6048 Aug 17 16:07 letter
username@hpc ~>
The change 'o-r' has three parts:
• For whom you are changing permissions: o - other, g - group, u - user, a - all
• How you are changing permissions: - remove these permissions, + add these permissions, = set permissions to this
• Permissions being changed: r - read permission, w - write permission, x - execute (run) permission
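The chmod session above can be reproduced on a scratch file. This is a sketch assuming a POSIX shell; 'cut -c1-10' keeps only the first ten characters of the long listing (the file type plus the nine permission letters), and the variable names are invented for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch letter

# Set the starting permissions explicitly: rwx for the owner,
# read-only for the group and for everyone else.
chmod u=rwx,g=r,o=r letter
before=$(ls -l letter | cut -c1-10)

# Remove read permission for other users.
chmod o-r letter
after_remove=$(ls -l letter | cut -c1-10)

# Add write permission for the group.
chmod g+w letter
after_add=$(ls -l letter | cut -c1-10)

echo "$before"
echo "$after_remove"
echo "$after_add"

cd / && rm -rf "$workdir"
```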
Introduction to Awk
Awk is a convenient and expressive programming language that can be applied to a wide variety of computing and data manipulation tasks.
Awk
• Works well on record-type data
• Reads input file(s) a line at a time
• Parses each line into fields
• Performs user-defined tests against each line, performs actions on matches
Other Common Uses
• Input validation
– Does every record have the same number of fields?
– Do values make sense (negative time, hourly wage > $100, etc.)?
• Filtering out certain fields
• Searches
– Who got a zero on lab 3?
– Who got the highest grade?
• Many others
Invocation
• Can write little one-liners on the command line (very handy):
– print the 3rd field of every line:
$ awk '{ print $3 }' input.txt
• Execute an awk script file:
$ awk -f script.awk input.txt
• Or, use this shebang as the first line, and give your script execute permissions (adjust the path if awk is installed elsewhere, e.g. /usr/bin/awk):
#!/bin/awk -f
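The first two ways of running awk give the same result, as this sketch shows. It assumes a POSIX shell with awk installed; the contents of input.txt are made up for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A small two-line input file with three fields per line.
printf 'a b c\nd e f\n' > input.txt

# 1. One-liner on the command line.
one_liner=$(awk '{ print $3 }' input.txt)

# 2. The same program stored in a script file and run with -f.
printf '{ print $3 }\n' > script.awk
from_file=$(awk -f script.awk input.txt)

echo "$one_liner"
echo "$from_file"

cd / && rm -rf "$workdir"
```

The single quotes around the awk program matter: they stop the shell from treating $3 as one of its own variables.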
Form of an AWK program
• AWK programs are entries of the form: pattern { action }
– pattern: some test, looking for a pattern (regular expressions) or C-like conditions. If the pattern is omitted, the action is applied to every line.
– action: a statement or set of statements. If no action is provided, the default action is to print the entire line, much like grep.
Awk Features
• Patterns can be regular expressions or C-like conditions.
• Each line of the input is matched against the patterns, one after the next. If a match occurs the corresponding action is performed.
• Input lines are parsed and split into fields, which are accessed by $1,…,$NF, where NF is a variable set to the number of fields. The variable $0 contains the entire line, and by default lines are split by white space (blanks, tabs).
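A brief sketch of fields and patterns, assuming a POSIX shell with awk; the record and the other input lines are made up for illustration.

```shell
# One made-up record with six whitespace-separated fields.
record='1 rs123 0 1045 A G'

first_field=$(printf '%s\n' "$record" | awk '{ print $1 }')
last_field=$(printf '%s\n' "$record" | awk '{ print $NF }')
field_count=$(printf '%s\n' "$record" | awk '{ print NF }')
whole_line=$(printf '%s\n' "$record" | awk '{ print $0 }')

# A regular-expression pattern and a C-like condition: the action runs
# only on the lines where the pattern matches.
regex_match=$(printf 'seq1\nletter\nseq2\n' | awk '/^seq/ { print $0 }')
condition_match=$(printf 'a 5\nb 30\n' | awk '$2 > 22 { print $1 }')

echo "$first_field $last_field $field_count"
echo "$regex_match"
echo "$condition_match"
```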
Variables
• Not declared, nor typed
• No character type
– Only strings and floats (support for ints)
• $n refers to the nth field (where n is some integer value)
# prints each field on the line
{ for (i = 1; i <= NF; ++i) print $i }
Some Built-in Variables
• FS: the input field separator
• OFS: the output field separator
• NF: number of fields; changes with each record
• NR: the number of records read (so far). So, the current record number.
• $0: the entire input line
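The built-in variables can be combined in one short program. This sketch assumes a POSIX shell with awk and uses made-up colon-separated input. BEGIN is a standard awk pattern whose action runs before any input is read; it is not covered in these slides and is used here only to set FS and OFS.

```shell
# Input fields are separated by ':'; output fields by a tab.
result=$(printf 'Beth:4.00:0\nKathy:4.00:10\n' |
    awk 'BEGIN { FS = ":"; OFS = "\t" } { print NR, $1, NF }')

# Each output line holds: record number, first field, number of fields.
echo "$result"

# The same output written out by hand, for comparison.
expected=$(printf '1\tBeth\t3\n2\tKathy\t3\n')
```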
Getting help
You can get help on a command by using the command 'man command'.
This will bring up the manual page and show it to you screen by screen.
If you do not know what a command is called, use the option '-k' to get a list of commands that may be relevant: 'man -k word'
This will find all manual pages containing word in the short description of the command.
Try using the options '-h', '-help', or '--help' if you can't find the man page.
Exercise: Filter SNPs
Go to http://hpc.ilri.cgiar.org/beca/gbs/ and run these commands in your home directory:
a) mkdir snp_data
b) cd snp_data
c) wget http://hpc.ilri.cgiar.org/beca/gbs/Africa55K_10Pops.bim
d) wget http://hpc.ilri.cgiar.org/beca/gbs/emp.data
e) ls -alh
f) grep '^23\|^25\|^26' Africa55K_10Pops.bim > AfricaAll_Pops_non_autosomal.rsids
g) awk '{if ($1 > 22) print $2}' Africa55K_10Pops.bim > Africa55K_10Pops.xchrsnps
Example
$ cat emp.data
Beth 4.00 0
Dan 3.75 0
Kathy 4.00 10
Mark 5.00 20
Mary 5.50 22
Susie 4.25 18
Print those employees who actually worked:
$ awk '$3>0 {print $1, $2*$3}' emp.data
Kathy 40
Mark 100
Mary 121
Susie 76.5
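A few variations on the same data, as a sketch assuming a POSIX shell with awk. The file is recreated in a scratch directory, with the columns read as name, hourly rate and hours worked. END is a standard awk pattern whose action runs after the last line has been read; it is not covered in these slides.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate emp.data: name, hourly rate, hours worked.
cat > emp.data <<'EOF'
Beth 4.00 0
Dan 3.75 0
Kathy 4.00 10
Mark 5.00 20
Mary 5.50 22
Susie 4.25 18
EOF

# Employees who worked, with their pay (rate times hours).
paid=$(awk '$3 > 0 { print $1, $2 * $3 }' emp.data)

# Employees who did not work.
idle=$(awk '$3 == 0 { print $1 }' emp.data)

# Total pay over all employees, printed once at the end.
total=$(awk '{ total += $2 * $3 } END { print total }' emp.data)

echo "$paid"
echo "$idle"
echo "$total"

cd / && rm -rf "$workdir"
```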
Acknowledgement
• SANBI (David Martin)
• BSK
Adapted from SANBI & Bioinformatics Society of Kenya/BSK
Useful literature
'Learning the UNIX operating system', O'Reilly press.
'UNIX Quickguide' hpc
Questions?