Part 4 of 'Introduction to Linux for bioinformatics': Managing data
Introduction to Linux at Introductory Bioinformatics Workshop
setor-amuzu -
Outline
1. What is Linux?
2. Command-line Interface, Shell & BASH
3. Popular commands
4. File Permissions and Owners
5. Installing programs
6. Piping and Scripting
7. Variables
8. Common applications in bioinformatics
9. Conclusion
11/04/2023 H3ABioNet Workshop 1: Day 4 2
What is Linux?
• Linux is a Unix-like computer operating system assembled under the model of free and open source software development and distribution.
• UNIX is a multitasking, multi-user computer OS originally developed in 1969.
Linus Torvalds – Former Chief architect of Linux Kernel and current project Coordinator
What is Linux?
• Operating system (OS): Set of programs that manage computer hardware resources and provide common services for application software.
• Kernel
What is Linux?
• Linux kernel (v 0.01) was first released in 1991. The current stable version is 3.13, released in January 2014.
• The underlying source code of Linux kernel may be used, modified, and distributed — commercially or non-commercially — by anyone under licenses such as the GNU General Public License.
• Therefore, different varieties of Linux have arisen to serve different needs and tastes. These are called Linux distributions (or distros).
• All Linux distros have the Linux kernel in common
What is Linux?
[Diagram: a Linux distribution combines the Linux kernel, supporting packages, and free, open-source, or proprietary software.]
What is Linux?
• There are over 600 Linux distributions, over 300 of which are in active development.
What is Linux?
• Linux distributions share core components but may look different and include different programs and files.
• For example:
What is Linux?
Commercially-backed distros
• Fedora (Red Hat)
• OpenSUSE (Novell)
• Ubuntu (Canonical Ltd.)
• Mandriva Linux (Mandriva)
Ubuntu is the most popular desktop Linux distribution with 20 million daily users worldwide, according to ubuntu.com.
Community-driven distros
• Debian
• Gentoo
• Slackware
• Arch Linux
Shell, Command-line Interface & BASH
Command-line interface (CLI) Graphical User Interface (GUI)
The shell provides an interface for users of an operating system.
Shell, Command-line Interface & BASH
CLI compared with GUI:
• Ease of use
– CLI: Generally more difficult to successfully navigate and operate.
– GUI: Much easier when compared to a CLI.
• Control
– CLI: Greater control of the file system and operating system.
– GUI: More advanced tasks may still need a CLI.
• Resources
– CLI: Uses fewer resources.
– GUI: Requires more resources to load icons etc.
• Scripting
– CLI: Easily script a sequence of commands to perform a task or execute a program.
– GUI: Limited ability to create and execute tasks, compared to a CLI.
Shell, Command-line Interface & BASH
• A command is a directive to a computer program, acting as an interpreter of some kind, to perform a specific task.
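• BASH is the default shell on most GNU/Linux distributions and was the default on Mac OS X (macOS has used zsh as its default shell since version 10.15).
Shell → CLI → BASH (Bourne-Again SHell)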
• A Linux command typically consists of a program name, followed by options and arguments.
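A minimal sketch of this structure, using ls as the program. The scratch directory and the file name reads.fasta are hypothetical and exist only for this illustration.

```shell
# Create a scratch directory with one file so the command has something to list.
# (reads.fasta is a made-up file name for this example.)
demo_dir=$(mktemp -d)
touch "$demo_dir/reads.fasta"

# program name: ls
# options:      -l (long listing), -a (include hidden entries)
# argument:     the directory to list
listing=$(ls -l -a "$demo_dir")
echo "$listing"
```

Options change how the program behaves, while arguments tell it what to act on.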
Shell, Command-line Interface & BASH
Useful BASH shortcuts…
Shortcut Meaning
Popular commands
• Directory structure
Default working directory after user login
Complete directory path: /home/user/Documents/LinuxClass
Popular commands
• Changing working directories
Command: cd
Default working directory after user login
Move to parent directory
Move to child directory
Move using complete path: cd /home/user/Documents/LinuxClass
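The moves listed above can be tried out as follows. This is only a sketch: it builds a Documents/LinuxClass tree inside a temporary directory (an assumption made so the example runs anywhere) rather than under /home/user.

```shell
# Build a small directory tree to move around in.
# (The names mirror the slide's example path but live in a scratch area.)
base=$(mktemp -d)
mkdir -p "$base/Documents/LinuxClass"

cd "$base/Documents"     # move using a complete (absolute) path
cd LinuxClass            # move to a child directory (relative path)
in_child=$(pwd)          # pwd prints the current working directory
cd ..                    # move to the parent directory
in_parent=$(pwd)

echo "$in_child"
echo "$in_parent"
```

Running cd with no argument returns to the home directory, the default working directory after login.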
Popular Commands
• Navigating directories
Popular Commands
• Compressing and archiving files
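The command table for this slide did not survive in the transcript. As an assumption about what it covered, the sketch below uses tar and gzip, two tools commonly used for this task; all file and directory names are hypothetical.

```shell
# Set up a small project directory to archive (names are made up).
work=$(mktemp -d)
mkdir "$work/project"
echo ">seq1" > "$work/project/a.fasta"
echo "ACGT" >> "$work/project/a.fasta"

# Archive and compress the directory in one step:
# -c create, -z gzip-compress, -f archive file name, -C change to this directory first.
tar -czf "$work/project.tar.gz" -C "$work" project

# List the contents of the archive without extracting it (-t).
tar -tzf "$work/project.tar.gz"

# Extract (-x) into a separate directory.
mkdir "$work/restored"
tar -xzf "$work/project.tar.gz" -C "$work/restored"

# Compress a single file; gzip replaces a.fasta with a.fasta.gz.
gzip "$work/restored/project/a.fasta"
# gunzip reverses the compression and restores a.fasta.
gunzip "$work/restored/project/a.fasta.gz"
```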
Popular Commands
• Monitoring & managing resources
– Hard disk usage: df -lh
– RAM memory usage: free -m
– What processes are running in real time? top
– Snapshot of current processes: ps aux
– Stop a process running in the terminal: CTRL + C
– Stop a process that is running outside the terminal: kill <PID>
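A small sketch of stopping a process by its PID. The sleep command stands in for any long-running program; it is an assumption chosen because it is harmless and available on practically every system.

```shell
# Start a long-running process in the background; $! holds its process ID (PID).
sleep 300 &
pid=$!

# Show the process in a snapshot of current processes (skipped if ps is unavailable).
ps -p "$pid" || echo "ps could not show PID $pid"

# Ask the process to stop, then let the shell collect its exit status.
kill "$pid"
wait "$pid" 2>/dev/null || true

# kill -0 sends no signal; it only checks whether the PID still exists.
if kill -0 "$pid" 2>/dev/null; then
    status="running"
else
    status="stopped"
fi
echo "$status"
```

For a process started by someone else's terminal, the PID would first be looked up in the output of ps aux or top.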
Popular Commands
• Monitoring Network Connections
– Do I have an internet connection? ping <web address>
– The ping command reports how long a message takes to travel to the given server and back.
Popular Commands
• Downloading files
– wget <url of file>
– curl -O <url of file> (without -O, curl writes the download to standard output instead of a file)
• wget is a free software package for retrieving files using HTTP, HTTPS and FTP, the most widely-used Internet protocols.
• curl is a tool to transfer data from or to a server, using one of several supported protocols (DICT, FILE, FTP, FTPS, GOPHER, HTTP, HTTPS, IMAP, etc).
Popular Commands
• Remote Connections
– How can I get access to a remote computer? ssh user@hostname
– The ssh (secure shell) command securely logs you into a remote computer where you already have an account.
– X11 connections are possible using the -X option.
– Example: ssh -X [email protected]
– The scp and sftp commands allow users to securely copy files to or from remote computers.
Command-line help
Getting help (offline)
• More information about a command can be found in its manual page.
Command: man
Example: man ls
• Many programs also print a usage summary when given the -h or --help argument.
Example: blastall --help
Command-line help
Getting help (online)
• Go to explainshell.com
• Type in a command line to see the help text that matches each argument.
Command-line help
• Output from explainshell.com, for:
– grep '>' fasta | sed 's/>//' > id.txt
File Permissions and Owners
• Linux is a multi-user OS. Therefore, different users can create, modify, or delete the same files.
• To control access and modification of user files, Linux has a file permission and ownership system.
• This system consists of two parts:
– Who is the owner of the file or directory?
– What type of access does each user have?
File Permissions and Owners
• Each file and directory has three user-based permission groups:
1. Owner (u) - The Owner permissions apply only to the owner of the file or directory.
2. Group (g) - The Group permissions apply only to the group that has been assigned to the file or directory.
3. Others (o) - The Others permissions apply to all other users on the system. In chmod, 'a' (all) refers to the owner, group, and others together.
• Each file or directory has three basic permission types:
1. Read (r) - The Read permission refers to a user's capability to read the contents of the file.
2. Write (w) - The Write permissions refer to a user's capability to write or modify a file or directory.
3. Execute (x) - The Execute permission affects a user's capability to execute a file or to enter a directory and access the files inside it.
File Permissions and Owners
[me@linuxbox me]$ ls -l some_file
-rw-rw-r-- 1 me me 1097374 Sep 26 18:48 some_file
Information about a file's permissions: ls -l <file_name>
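The first column of the ls -l output packs the file type and the three permission groups into ten characters. The sketch below splits the string from the example listing into its parts; the variable names are chosen for illustration only.

```shell
# Permission string copied from the example listing above.
perms="-rw-rw-r--"

# Character 1 is the file type ('-' regular file, 'd' directory);
# characters 2-4, 5-7 and 8-10 are the owner, group and others permissions.
file_type=$(printf '%s' "$perms" | cut -c1)
owner=$(printf '%s' "$perms" | cut -c2-4)
group=$(printf '%s' "$perms" | cut -c5-7)
others=$(printf '%s' "$perms" | cut -c8-10)

echo "type:   $file_type"
echo "owner:  $owner"
echo "group:  $group"
echo "others: $others"
```

Here the owner and the group may read and write some_file, while all other users may only read it.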
File Permissions and Owners
• The chmod command is used to modify file and directory permissions. Typical permissions are read (r), write (w), execute (x).
syntax: chmod [options] permissions files
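A brief sketch of the two common ways to write the permissions argument, numeric and symbolic. The file myscript.sh is a hypothetical file created in a temporary directory for this example.

```shell
# Create a throwaway script file to experiment on (hypothetical name).
work=$(mktemp -d)
script="$work/myscript.sh"
printf '#!/bin/bash\necho "hello"\n' > "$script"

# Numeric form: 6 = read+write for the owner, 4 = read for group, 4 = read for others.
chmod 644 "$script"
before=$(ls -l "$script" | cut -c1-10)

# Symbolic form: add execute (x) permission for the owner (u).
chmod u+x "$script"
after=$(ls -l "$script" | cut -c1-10)

echo "$before"   # -rw-r--r--
echo "$after"    # -rwxr--r--
```

The symbolic form changes only the permissions it names, whereas the numeric form sets all three groups at once.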
File Permissions and Owners
• sudo is a command for Unix-like computer operating systems that allows users to run programs with the security privileges of another user (normally the superuser, or root). Its name is a concatenation of the su command (which grants the user a shell for the superuser) and "do", or take action.
– Example: sudo cp ./myscript.pl /usr/local/bin/
Installing Programs
1. Using package managers
1.1 Graphical package manager, for example Synaptic for Ubuntu
1.2 High-level command-line package manager, for example apt for Debian
1.3 Low-level command-line package manager, for example dpkg for Debian
2. Copy executable file of program to PATH*
2.1 Pre-compiled
2.2 Build from source
* PATH is a list of directories, such as /usr/local/bin, where BASH looks for commands
Installing Programs
1.1 Using graphical package manager (Synaptic on Ubuntu)
Installing Programs
• Search and install programs using Synaptic on Ubuntu
Installing Programs
• Install program dependencies (Synaptic on Ubuntu)
Piping and Scripting
• Piping: Run different programs sequentially where the output of one program becomes the input for the next one.
• Bash uses the “|” sign (pipe) to pipe the output of one program as the input of another program.
• For example:
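One possible illustration of a pipe, counting the header lines of a FASTA file. The file seqs.fasta and its contents are made up for this sketch.

```shell
# Create a tiny FASTA file with three sequences (made-up data).
work=$(mktemp -d)
printf '>seq1\nACGT\n>seq2\nGGCC\n>seq3\nTTAA\n' > "$work/seqs.fasta"

# grep selects the header lines; the pipe hands them to wc -l, which counts them.
n_headers=$(grep '>' "$work/seqs.fasta" | wc -l)
echo "$n_headers"
```

Neither program knows about the other: grep writes to its standard output and wc reads from its standard input, with the shell connecting the two.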
Piping and Scripting
• Another popular combination is to redirect stdout (standard output) to a file using '>' (write, or overwrite the file if it exists) or '>>' (append).
• Example:
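A short sketch contrasting '>' and '>>', reusing the grep and sed pipeline shown on the explainshell.com slide. The input file and the extra ID seq3 are invented for the illustration.

```shell
# Create a tiny FASTA file; '>' creates seqs.fasta or overwrites it if it exists.
work=$(mktemp -d)
printf '>seq1\nACGT\n>seq2\nGGCC\n' > "$work/seqs.fasta"

# Write the sequence IDs (header lines without '>') to id.txt.
grep '>' "$work/seqs.fasta" | sed 's/>//' > "$work/id.txt"

# '>>' appends to the end of the file instead of replacing it.
echo "seq3" >> "$work/id.txt"
n_after_append=$(wc -l < "$work/id.txt")

# Running the first command again with '>' overwrites id.txt, so seq3 is gone.
grep '>' "$work/seqs.fasta" | sed 's/>//' > "$work/id.txt"
n_after_overwrite=$(wc -l < "$work/id.txt")

echo "$n_after_append"
echo "$n_after_overwrite"
```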
Piping and Scripting
• A shell program, called a script, is a tool for building applications by "gluing together" system calls, tools, utilities, and compiled binaries.
• For example: fasta_seq_count.sh
#!/bin/bash
# Count sequences in fasta file (1st argument)
grep -c '>' "$1"
• To run this script:
1. Give the script execute permission: chmod u+x fasta_seq_count.sh
2. Run it: bash fasta_seq_count.sh <fasta_file>
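The whole workflow can be sketched end to end as below. The test file and the temporary directory are assumptions made for the illustration, and the pattern is anchored with ^ so that only lines starting with '>' are counted, a small variation on the slide's version.

```shell
work=$(mktemp -d)

# Write the script to a file; the quoted 'EOF' keeps $1 from being expanded here.
cat > "$work/fasta_seq_count.sh" <<'EOF'
#!/bin/bash
# Count sequences in a FASTA file (1st argument).
grep -c '^>' "$1"
EOF

# A made-up FASTA file with two sequences to test the script on.
printf '>seq1\nACGT\n>seq2\nGGCC\n' > "$work/test.fasta"

# Give the owner execute permission, then run the script with the file as argument.
chmod u+x "$work/fasta_seq_count.sh"
count=$(bash "$work/fasta_seq_count.sh" "$work/test.fasta")
echo "$count"
```

Once the script is executable it can also be started directly, for example as ./fasta_seq_count.sh <fasta_file> from the directory that contains it.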
Variables
• A variable is a name assigned to a location or set of locations in computer memory, holding an item of data.
• Variables in BASH can be put into two categories:
1. System variables: Variables defined by the system, such as PATH and HOME
2. User-defined variables: Variables defined by a user during a shell session.
Example:
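A minimal sketch of both categories. The variable names genome and fasta_file, and the value ecoli_k12, are hypothetical.

```shell
# System variables are already defined when the shell session starts.
echo "Home directory: $HOME"
echo "Search path:    $PATH"

# User-defined variables: no spaces around '=', and no '$' when assigning.
genome="ecoli_k12"              # hypothetical sample name
fasta_file="${genome}.fasta"    # braces mark where the variable name ends
echo "$fasta_file"
```

A '$' in front of the name reads the value back; without it the shell treats the name as plain text.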
Variables
• System variables
Variables
• Commands to interact with variables
• Example: Add a program executable directory to your PATH.
export PATH=/home/user/shscripts:$PATH
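To see the effect of such a change, the sketch below creates a script in a scratch directory and adds that directory to PATH. The directory and the script name hello_h3a are invented for the illustration; on a real system the directory would be something like /home/user/shscripts.

```shell
# Create a scratch directory holding a tiny executable script (hypothetical names).
bin_dir=$(mktemp -d)/shscripts
mkdir -p "$bin_dir"
printf '#!/bin/sh\necho "hello from my script"\n' > "$bin_dir/hello_h3a"
chmod u+x "$bin_dir/hello_h3a"

# Prepend the directory to PATH so the shell searches it first.
export PATH="$bin_dir:$PATH"

# command -v reports where the shell finds a command by name.
found=$(command -v hello_h3a)
echo "$found"
```

The change lasts only for the current shell session; adding the export line to a startup file such as ~/.bashrc is the usual way to make it permanent.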
Common applications in bioinformatics
• Fasta file manipulation
– FASTA is a text-based format for representing either nucleotide sequences or peptide sequences, in which nucleotides or amino acids are represented using single-letter codes.
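The slide's own examples were images and are not in the transcript, so the following is only a sketch of typical manipulations with standard tools. The file seqs.fasta and its records are made up.

```shell
# A made-up FASTA file: two records, the first with its sequence split over two lines.
work=$(mktemp -d)
cat > "$work/seqs.fasta" <<'EOF'
>seq1 sample A
ACGTACGT
ACGT
>seq2 sample B
GGCC
EOF

# Count sequences: every record starts with a '>' header line.
n_seqs=$(grep -c '^>' "$work/seqs.fasta")

# Extract sequence IDs: drop '>' and everything after the first space.
ids=$(grep '^>' "$work/seqs.fasta" | sed 's/^>//' | cut -d ' ' -f 1)

# Total residues: remove header lines, strip newlines, count the characters left.
n_residues=$(grep -v '^>' "$work/seqs.fasta" | tr -d '\n' | wc -c)

echo "sequences: $n_seqs"
echo "$ids"
echo "residues:  $n_residues"
```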
Common applications in bioinformatics
• BLAST output manipulation
– The BLAST tabular format is one of the most common and useful formats for presenting BLAST output. It has 12 columns: query_id, subject_id, %identity, align_length, mismatches, gaps_openings, q_start, q_end, s_start, s_end, e_value, bit_score
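Because the format is plain tab-separated text, it can be filtered with standard tools. The sketch below runs on three invented hits written in the 12-column layout described above; the file names and the 90% identity threshold are assumptions for the illustration.

```shell
work=$(mktemp -d)
tab=$(printf '\t')

# Three made-up hits in the 12-column tabular layout (tab-separated).
printf 'q1\ts1\t98.5\t200\t3\t0\t1\t200\t5\t204\t1e-50\t350\n'   >  "$work/hits.tsv"
printf 'q1\ts2\t80.0\t150\t30\t2\t1\t150\t10\t159\t1e-10\t120\n' >> "$work/hits.tsv"
printf 'q2\ts3\t99.0\t300\t3\t0\t1\t300\t1\t300\t1e-80\t540\n'   >> "$work/hits.tsv"

# Keep hits with at least 90% identity (column 3).
awk -F '\t' '$3 >= 90' "$work/hits.tsv" > "$work/high_identity.tsv"

# Best hit per query: sort by query ID (column 1), then by bit score (column 12)
# from high to low, and keep only the first line seen for each query.
sort -t "$tab" -k1,1 -k12,12nr "$work/hits.tsv" \
    | awk -F '\t' '!seen[$1]++' > "$work/best_hits.tsv"

# Count hits per query ID.
cut -f 1 "$work/hits.tsv" | sort | uniq -c
```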
Common applications in bioinformatics
• High throughput sequencing software
– Create a report on the quality of a read set: fastqc
– Assemble reads into contigs: velvet, SPAdes, etc.
– Align reads to a known reference sequence: SHRiMP, Bowtie2, BWA etc.
– Many other tools: samtools, picard, GATK, etc.
Conclusion
• Linux is a free and open source OS with powerful and flexible command-line tools to advance your bioinformatics research projects.
• While learning to use these tools may be challenging at first, the rewards of UNIX/Linux command-line proficiency are worth the effort.
Questions