Introduction to linux at Introductory Bioinformatics Workshop

Introduction to Linux by Setor Amuzu H3ABioNet Associate [email protected]


This is a brief introduction to Linux, with emphasis on the command-line interface. This presentation was given to participants of the H3ABioNet Introductory Bioinformatics workshop held in Accra, Ghana on 26 March 2014.


Outline
1. What is Linux?
2. Command-line Interface, Shell & BASH
3. Popular commands
4. File Permissions and Owners
5. Installing programs
6. Piping and Scripting
7. Variables
8. Common applications in bioinformatics
9. Conclusion

11/04/2023 H3ABioNet Workshop 1: Day 4 2


What is Linux?

• Linux is a Unix-like computer operating system assembled under the model of free and open source software development and distribution.

• UNIX is a multitasking, multi-user computer OS originally developed in 1969.


Linus Torvalds – Former Chief architect of Linux Kernel and current project Coordinator


What is Linux?

• Operating system (OS): Set of programs that manage computer hardware resources and provide common services for application software.

• Kernel


What is Linux?

• The Linux kernel (v0.01) was first released in 1991. At the time of this presentation, the current stable version was 3.13, released in January 2014.

• The underlying source code of Linux kernel may be used, modified, and distributed — commercially or non-commercially — by anyone under licenses such as the GNU General Public License.

• Therefore, different varieties of Linux have arisen to serve different needs and tastes. These are called Linux distributions (or distros).

• All Linux distros have the Linux kernel in common


What is Linux?


[Figure: a Linux distribution is made up of the Linux kernel, supporting packages, and free, open-source or proprietary software.]


What is Linux?

• There are over 600 Linux distributions, over 300 of which are in active development.


What is Linux?

• Linux distributions share core components but may look different and include different programs and files.
• For example:


What is Linux?

Commercially-backed distros
• Fedora (Red Hat)
• OpenSUSE (Novell)
• Ubuntu (Canonical Ltd.)
• Mandriva Linux (Mandriva)

Ubuntu is the most popular desktop Linux distribution with 20 million daily users worldwide, according to ubuntu.com.

Community-driven distros
• Debian
• Gentoo
• Slackware
• Arch Linux


Shell, Command-line Interface & BASH

Command-line interface (CLI) Graphical User Interface (GUI)


The shell provides an interface for users of an operating system.


Shell, Command-line Interface & BASH

Comparison of CLI and GUI, by topic:

• Ease of use
– CLI: Generally more difficult to successfully navigate and operate a CLI.
– GUI: Much easier when compared to a CLI.
• Control
– CLI: Greater control of file system and operating system in a CLI.
– GUI: More advanced tasks may still need a CLI.
• Resources
– CLI: Uses less resources.
– GUI: Requires more resources to load icons etc.
• Scripting
– CLI: Easily script a sequence of commands to perform a task or execute a program.
– GUI: Limited ability to create and execute tasks, compared to CLI.


Shell, Command-line Interface & BASH

• A command is a directive to a computer program, acting as an interpreter of some kind, to perform a specific task.

• BASH is the default shell on most GNU/Linux distributions and, at the time of this presentation, on Mac OS X.

Shell→ CLI→ BASH (Bourne-Again SHell)


• A Linux command typically consists of a program name, followed by options and arguments.
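As a small illustration of this structure, the sketch below runs two commands and labels their parts. The directory `demo_dir` and the file names inside it are made up for the example.

```shell
# Create a throwaway directory with two files (hypothetical names).
mkdir -p demo_dir
touch demo_dir/reads.fastq demo_dir/notes.txt

#  program   options   argument
#  ls        -l -h     demo_dir
ls -l -h demo_dir

# Single-letter options can usually be combined, so this is equivalent:
ls -lh demo_dir

# An option can take its own value: -n 2 asks head for the first 2 lines.
printf 'one\ntwo\nthree\n' > demo_dir/notes.txt
first_two=$(head -n 2 demo_dir/notes.txt)
echo "$first_two"
```

Here `ls` and `head` are the program names, `-l`, `-h` and `-n 2` are options, and `demo_dir` and `demo_dir/notes.txt` are arguments.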


Shell, Command-line Interface & BASH

Useful BASH shortcuts…

Shortcut Meaning


Popular commands

• Directory structure


Default working directory after user login

Complete directory path: /home/user/Documents/LinuxClass


Popular commands

• Changing working directories

Command: cd


Default working directory after user login

Move to parent directory

Move to child directory

Move using complete path: cd /home/user/Documents/LinuxClass
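The moves listed above can be tried safely in a scratch directory. This is a minimal sketch; the directory tree is created under a temporary location chosen by `mktemp`, so the paths are illustrative rather than the `/home/user` paths on the slide.

```shell
# Build a small directory tree to move around in (hypothetical layout).
base=$(mktemp -d)
mkdir -p "$base/Documents/LinuxClass"

cd "$base/Documents"          # move using a complete (absolute) path
cd LinuxClass                 # move to a child directory (relative path)
pwd                           # print the current working directory
cd ..                         # move to the parent directory
parent=$(basename "$(pwd)")
echo "$parent"                # prints: Documents

# Running cd with no argument returns to your home directory,
# the default working directory after login.
```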


Popular Commands

• Navigating directories


Popular Commands

• Compressing and archiving files
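The slide's command table did not survive in this transcript, so the following is only a sketch of common usage with `tar` and `gzip`, not necessarily the commands originally shown. The directory and file names are invented for the example.

```shell
# Work in a scratch directory so nothing important is touched.
work=$(mktemp -d)
cd "$work"
mkdir project
printf 'ACGT\n' > project/seq1.txt
printf 'TTGA\n' > project/seq2.txt

# Archive and compress a directory in one step:
#   c = create, z = compress with gzip, f = name of the archive file
tar -czf project.tar.gz project

# List the contents of the archive without extracting it (t = list)
tar -tzf project.tar.gz

# Extract into a separate directory (x = extract, -C = destination)
mkdir restored
tar -xzf project.tar.gz -C restored

# Compress and decompress a single file with gzip
gzip project/seq1.txt          # replaces it with project/seq1.txt.gz
gunzip project/seq1.txt.gz     # restores project/seq1.txt
```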


Popular Commands

• Monitoring & managing resources

Task: Command
– Hard disk usage: df -lh
– Memory (RAM) usage: free -m
– What processes are running in real-time? top
– Snapshot of current processes: ps aux
– Stop a process running in the terminal: CTRL + C
– Stop a process that is running outside the terminal: kill <PID>
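To show how a PID is obtained and used with `kill`, here is a minimal sketch in which `sleep 300` stands in for a long-running job. The variable names are arbitrary.

```shell
# Start a long-running command in the background.
sleep 300 &
pid=$!                      # $! holds the PID of the most recent background job
echo "Started process $pid"

# kill -0 sends no signal; it only checks that the process exists.
kill -0 "$pid" && echo "Process $pid is running"

# Ask the process to stop (kill sends SIGTERM by default), then wait for it.
kill "$pid"
status=0
wait "$pid" 2>/dev/null || status=$?
echo "Exit status: $status"   # 143 = 128 + 15, i.e. terminated by SIGTERM
```

For a process started elsewhere, the PID can be looked up in the output of `ps aux` or `top`.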


Popular Commands

• Monitoring Network Connections
– Do I have an internet connection? ping <web address>
– The ping command reports how long a message takes to travel to the given server and back.


Popular Commands

• Downloading files
– wget <url of file>
– curl <url of file>

• wget is a free software package for retrieving files using HTTP, HTTPS and FTP, the most widely-used Internet protocols.

• curl is a tool to transfer data from or to a server, using one of several supported protocols (DICT, FILE, FTP, FTPS, GOPHER, HTTP, HTTPS, IMAP, etc).


Popular Commands

• Remote Connections

– How can I get access to a remote computer? ssh user@hostname

– The ssh (secure shell) command securely logs you into a remote computer where you already have an account.

– X11 forwarding, which lets graphical programs on the remote computer display on your screen, is enabled with the -X option.

– Example: ssh -X [email protected]

– scp, sftp commands allow users to securely copy files to or from remote computers.

Command-line help

Getting help (offline)

• More information about a command can be found from manual pages

COMMAND: man
Example: man ls

• ARGUMENTS: -h or --help
Example: blastall --help


Command-line help

Getting help (online)

• Go to explainshell.com
• Type in a command line to see the help text that matches each argument.


Command-line help

• Output from explainshell.com, for:
– grep '>' fasta | sed 's/>//' > id.txt
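Since the explainshell.com output itself is not reproduced in this transcript, the sketch below runs the same pipeline on a tiny, made-up FASTA file and annotates each step. The file name `example.fasta` and its contents are assumptions for illustration.

```shell
# A tiny FASTA file with three made-up sequences
printf '>seq1\nACGTACGT\n>seq2\nGGCATT\n>seq3\nTTAGC\n' > example.fasta

# Step 1: grep keeps only the header lines (those containing ">")
# Step 2: sed deletes the first ">" on each of those lines
# Step 3: the unquoted ">" redirects the result into id.txt
grep '>' example.fasta | sed 's/>//' > id.txt

cat id.txt
# seq1
# seq2
# seq3
```

Note that the quoted `'>'` is a search pattern for grep, while the final unquoted `>` is shell redirection.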


File Permissions and Owners

• Linux is a multi-user OS. Therefore, different users can create, modify or delete the same files.

• To control access and modification of user files, Linux has a file permission and ownership system.

• This system consists of two parts:
– Who is the owner of the file or directory?
– What type of access does each user have?


File Permissions and Owners

• Each file and directory has three user-based permission groups:
1. Owner (u) - The Owner permissions apply only to the owner of the file or directory.
2. Group (g) - The Group permissions apply only to the group that has been assigned to the file or directory.
3. Others (o) - The Others permissions apply to all other users on the system. In chmod, the letter a (all) refers to owner, group and others together.

• Each file or directory has three basic permission types:
1. Read (r) - The Read permission refers to a user's capability to read the contents of a file or list the contents of a directory.
2. Write (w) - The Write permission refers to a user's capability to write or modify a file or directory.
3. Execute (x) - The Execute permission affects a user's capability to execute a file or enter a directory.


File Permissions and Owners


[me@linuxbox me]$ ls -l some_file

-rw-rw-r-- 1 me me 1097374 Sep 26 18:48 some_file

To view a file's permissions: ls -l <file_name>
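The permission string in the listing above can be read as one file-type character followed by three groups of three characters. The following sketch recreates a file with the same permissions and splits the string apart; using `cut` for this is just one convenient way to show the layout.

```shell
# Create a file and give it the same permissions as in the listing above.
touch some_file
chmod 664 some_file

# The first 10 characters of `ls -l` are the file type plus three rwx triplets.
perms=$(ls -l some_file | cut -c1-10)
echo "$perms"                                 # -rw-rw-r--

echo "type : $(echo "$perms" | cut -c1)"      # -    (regular file; d = directory)
echo "owner: $(echo "$perms" | cut -c2-4)"    # rw-  (read and write)
echo "group: $(echo "$perms" | cut -c5-7)"    # rw-  (read and write)
echo "other: $(echo "$perms" | cut -c8-10)"   # r--  (read only)
```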


File Permissions and Owners

• The chmod command is used to modify file and directory permissions. Typical permissions are read (r), write (w), execute (x).

syntax: chmod [options] permissions files
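As a sketch of this syntax, the example below changes the permissions of a scratch file using both the symbolic form (letters such as `u+x`) and the octal form (digits, where r = 4, w = 2 and x = 1 are added up for each of owner, group and others). The file name `myscript.sh` is hypothetical.

```shell
touch myscript.sh
chmod 644 myscript.sh          # octal: rw- (4+2) for owner, r-- (4) for group and others

chmod u+x myscript.sh          # symbolic: add execute for the owner (u)
after_ux=$(ls -l myscript.sh | cut -c1-10)
echo "$after_ux"               # -rwxr--r--

chmod go-r myscript.sh         # remove read from group (g) and others (o)
after_go=$(ls -l myscript.sh | cut -c1-10)
echo "$after_go"               # -rwx------

chmod 755 myscript.sh          # octal: rwx (4+2+1) for owner, r-x (4+1) for the rest
final=$(ls -l myscript.sh | cut -c1-10)
echo "$final"                  # -rwxr-xr-x
```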


File Permissions and Owners

• sudo is a command for Unix-like computer operating systems that allows users to run programs with the security privileges of another user (normally the superuser, or root). Its name is a concatenation of the su command (which grants the user a shell for the superuser) and "do", or take action.

– Example: sudo cp ./myscript.pl /usr/local/bin/


Installing Programs

1. Using package managers
1.1 Graphical package manager, example Synaptic for Ubuntu
1.2 High-level command-line package manager, example apt for Debian
1.3 Low-level command-line package manager, example dpkg for Debian

2. Copy executable file of program to a directory in PATH*
2.1 Pre-compiled
2.2 Build from source

* PATH is an environment variable listing the directories, such as /usr/local/bin, where BASH looks for commands.


Installing Programs

1.1 Using graphical package manager (Synaptic on Ubuntu)


Installing Programs

• Search and install programs using Synaptic on Ubuntu


Installing Programs


• Install program dependencies (Synaptic on Ubuntu)


Piping and Scripting

• Piping: Run different programs sequentially where the output of one program becomes the input for the next one.

• Bash uses the “|” sign (pipe) to pipe the output of one program as the input of another program.

• For example:
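The slide's own example is missing from this transcript; the following is a stand-in sketch, not the original. It counts how often each name occurs in a made-up list by chaining `sort`, `uniq` and `wc`.

```shell
# Hypothetical input: one organism name per line, with repeats
printf 'human\nmouse\nhuman\nyeast\nmouse\nhuman\n' > organisms.txt

# sort groups identical lines together, uniq -c counts each group,
# and sort -rn puts the most frequent name first.
counts=$(sort organisms.txt | uniq -c | sort -rn)
echo "$counts"

# How many distinct names are there?
# tr -d ' ' strips the padding that some versions of wc add.
distinct=$(sort organisms.txt | uniq | wc -l | tr -d ' ')
echo "$distinct"                # 3
```

Each program in the chain reads the previous program's output as its input, so no intermediate files are needed.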


Piping and Scripting

• Another popular combination is to redirect stdout (output) to a file using '>' (write, or overwrite if the file exists) or '>>' (append).

• Example:
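The original example is not preserved in this transcript. As a minimal stand-in, the sketch below shows the difference between the two operators using a hypothetical file `log.txt`.

```shell
echo "first run"  > log.txt     # creates log.txt (or overwrites it)
echo "second run" > log.txt     # overwrites: log.txt now holds one line
echo "third run"  >> log.txt    # appends: log.txt now holds two lines

cat log.txt
# second run
# third run

# Redirection works with any command, for example saving sorted output:
sort -r log.txt > log_sorted.txt
```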


Piping and Scripting

• A shell program, called a script, is a tool for building applications by "gluing together" system calls, tools, utilities, and compiled binaries.

• For example: fasta_seq_count.sh

#!/bin/bash
# Count sequences in fasta file (1st argument)
grep -c '>' "$1"

• To run this script:
1. Give script execute permission: chmod u+x fasta_seq_count.sh
2. Run it: ./fasta_seq_count.sh <fasta_file> (running it as bash fasta_seq_count.sh <fasta_file> also works, and does not require step 1)
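The whole workflow can be tried end to end as sketched below. This version adds a usage check and anchors the pattern to the start of the line (`'^>'`); both are additions for illustration, not part of the original script. The input file `two_seqs.fasta` is made up.

```shell
# Write the script to a file. The quoted 'EOF' stops the shell from
# expanding $1, $# and $0 while the file is being written.
cat > fasta_seq_count.sh <<'EOF'
#!/bin/bash
# Count sequences in a FASTA file (1st argument)
if [ "$#" -ne 1 ]; then
    echo "Usage: $0 <fasta_file>" >&2
    exit 1
fi
grep -c '^>' "$1"
EOF

# Make a small test input with two made-up sequences
printf '>a\nACGT\n>b\nGGCC\n' > two_seqs.fasta

chmod u+x fasta_seq_count.sh               # step 1: give execute permission
n=$(./fasta_seq_count.sh two_seqs.fasta)   # step 2: run the script
echo "$n"                                  # 2
```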


Variables

• A variable is a name assigned to a location or set of locations in computer memory, holding an item of data.

• Variables in BASH can be put into two categories:
1. System variables: Variables defined by system, such as PATH and HOME
2. User-defined variables: Variables defined by a user during shell session.

Example:
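The slide's example is missing from this transcript, so here is a minimal sketch of user-defined variables. All variable names and values are invented for illustration.

```shell
# Assign with NAME=value (no spaces around "=")
fasta_file="sample.fasta"
min_length=200

# Read a variable by prefixing its name with "$"
echo "Input file: $fasta_file"
echo "Minimum length: $min_length"

# Braces mark where the name ends when text follows directly
output_file="${fasta_file}.filtered"
echo "$output_file"             # sample.fasta.filtered

# Command substitution stores a command's output in a variable
n_words=$(echo "one two three" | wc -w | tr -d ' ')
echo "$n_words"                 # 3
```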


Variables

• System variables
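The table of system variables from this slide is not preserved here. As a brief sketch, the two variables named earlier (HOME and PATH) can be inspected as follows; the exact values printed depend on your system.

```shell
echo "Home directory: $HOME"

# PATH is a colon-separated list of directories searched for commands.
# tr turns each ":" into a newline so the list is easier to read.
path_dirs=$(echo "$PATH" | tr ':' '\n')
echo "$path_dirs"

# command -v reports which file on the PATH would run for a given command
ls_location=$(command -v ls)
echo "$ls_location"
```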


Variables

• Commands to interact with variables

• Example: Add a program executable directory to your PATH.

export PATH=/home/user/shscripts:$PATH
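To see the effect of extending PATH, the sketch below creates a scripts directory in a temporary location (standing in for `/home/user/shscripts`), puts a one-line script in it, and runs that script by name. The script `hello.sh` is hypothetical.

```shell
# Create a directory for personal scripts (temporary location for the demo)
scripts_dir=$(mktemp -d)/shscripts
mkdir -p "$scripts_dir"

# A one-line script to test with
printf '#!/bin/bash\necho "hello from my script"\n' > "$scripts_dir/hello.sh"
chmod u+x "$scripts_dir/hello.sh"

# Put the directory at the front of PATH for this shell session
export PATH="$scripts_dir:$PATH"

# The script can now be run by name from any directory
greeting=$(hello.sh)
echo "$greeting"                # hello from my script
```

The change lasts only for the current shell session. To make it permanent, the export line is usually added to a startup file such as `~/.bashrc`.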


Common applications in bioinformatics

• Fasta file manipulation

– FASTA is a text-based format for representing either nucleotide sequences or peptide sequences, in which nucleotides or amino acids are represented using single-letter codes.


Common applications in bioinformatics

• Fasta file manipulation
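The commands shown on this slide are not preserved in the transcript. The sketch below illustrates two typical manipulations, counting sequences and reporting the length of each one, on a made-up file; it is one possible approach with `grep` and `awk`, not necessarily the one presented.

```shell
# Made-up FASTA file; seq2 is wrapped over two lines
printf '>seq1\nACGTACGT\n>seq2\nGGCA\nTT\n>seq3\nTTAGC\n' > example.fasta

# Count the sequences (one header line per sequence)
grep -c '^>' example.fasta              # 3

# Print each sequence ID with its length, separated by a tab:
#   header line   -> print the previous record, then remember the new ID
#   sequence line -> add its length to the running total
#   END           -> print the last record
awk '/^>/ { if (id != "") print id "\t" len; id = substr($0, 2); len = 0; next }
     { len += length($0) }
     END { if (id != "") print id "\t" len }' example.fasta > lengths.tsv

cat lengths.tsv
# seq1    8
# seq2    6
# seq3    5
```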


Common applications in bioinformatics

• BLAST output manipulation

– The BLAST tabular format is one of the most common and useful formats for presenting BLAST output. It has 12 columns: query_id, subject_id, %identity, align_length, mismatches, gaps_openings, q_start, q_end, s_start, s_end, e_value, bit_score


Common applications in bioinformatics

• BLAST output manipulation
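The examples from this slide are missing in the transcript. As a hedged sketch of what such manipulation can look like, the commands below filter and sort a made-up 12-column tabular file with `awk`, `cut` and `sort`. The hit values and the thresholds (90% identity, e-value 1e-10) are arbitrary choices for illustration.

```shell
# Three made-up hits in 12-column tabular format (tab-separated)
printf 'q1\ts1\t98.5\t200\t3\t0\t1\t200\t5\t204\t1e-100\t360\n'  > hits.tsv
printf 'q1\ts2\t75.0\t120\t30\t2\t10\t129\t1\t120\t2e-5\t50.1\n' >> hits.tsv
printf 'q2\ts3\t91.2\t150\t13\t1\t1\t150\t20\t169\t3e-60\t250\n' >> hits.tsv

# Keep hits with %identity (column 3) >= 90 and e-value (column 11) <= 1e-10.
# Adding 0 forces awk to compare the fields as numbers.
awk -F '\t' '$3+0 >= 90 && $11+0 <= 1e-10' hits.tsv > good_hits.tsv

# List the distinct query IDs (column 1) that have at least one good hit
cut -f 1 good_hits.tsv | sort -u

# Sort the good hits by bit score (column 12), highest first
sort -t "$(printf '\t')" -k 12,12nr good_hits.tsv
```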


Common applications in bioinformatics

• High throughput sequencing software

– Create a report on the quality of a read set: fastqc

– Assemble reads into contigs: velvet, SPAdes, etc.

– Align reads to a known reference sequence: SHRiMP, Bowtie2, BWA etc.

– Many other tools: samtools, picard, GATK, etc.


Conclusion

• Linux is a free and open source OS with powerful and flexible command-line tools to advance your bioinformatics research projects.

• While learning to use these tools may be challenging at first, the rewards of UNIX/Linux command-line proficiency are worth the effort.




Questions
