Applied Bioinformatics: Introduction to Linux and R
Bing Zhang, Department of Biomedical Informatics, Vanderbilt University, bing.zhang@vanderbilt.edu

  • Slide 1
  • Applied Bioinformatics: Introduction to Linux and R. Bing Zhang, Department of Biomedical Informatics, Vanderbilt University, bing.zhang@vanderbilt.edu
  • Slide 2
  • Quick summary of the introduced Linux commands
      Command          Meaning
      rsh              Remote shell
      passwd           Modify a user's password
      exit             Exit the shell
      pwd              Display the path of the current directory
      ls               List files and directories
      ls -a            List all files and directories
      ls -a -l         List all files and directories in a long listing format
      mkdir directory  Make a directory
      cd directory     Change to the named directory
      cd               Change to home directory
      cd ~             Change to home directory
      cd ..            Change to parent directory
      rmdir directory  Remove a directory
      more file        View the contents of a file
      cp file1 file2   Copy file1 and name the copied file file2
      mv file1 file2   Move or rename file1 to file2
      rm file          Remove a file
      man command      Display manual pages for a command
  • Slide 3
  • Getting help
      • man (display manual pages for a command)
      • space bar to show the next page
      • up and down arrows to move up and down
      • q to exit
  • Slide 4
  • Exercise (task: command)
      • Go to home directory: cd
      • Display manual pages for the command ls: man ls
      • List the contents of the current directory: ls
      • List the contents of the current directory, including entries starting with . and using a long listing format: ls -a -l
      • Create a test directory if you don't have one yet (skip this if you already have it): mkdir test
      • Go to the test directory: cd test
      • Copy the file sample_data.txt under directory /home/igptest to the current directory with the same name: cp /home/igptest/sample_data.txt .
      • View the content of the created file: more sample_data.txt
      • Make a copy of the file: cp sample_data.txt sample_data_copy.txt
      • View the content of the new copy: more sample_data_copy.txt
      • List the contents of the current directory: ls
      • Remove the new copy: rm sample_data_copy.txt
      • List the contents of the current directory: ls
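The exercise above can be rehearsed without access to the course machine. The following is a minimal sketch, assuming a scratch directory created with mktemp stands in for the home directory and a two-line file written with printf stands in for /home/igptest/sample_data.txt; the variable names scratch, before, and after are illustrative only.

```shell
# Scratch directory standing in for the home directory (assumption).
scratch=$(mktemp -d)
cd "$scratch" || exit 1

mkdir -p test                 # -p: no error if the directory already exists
cd test || exit 1

# Stand-in for the course file; the real one lives under /home/igptest.
printf 'Line_01 example\nLine_02 example\n' > sample_data.txt

cp sample_data.txt sample_data_copy.txt   # make a copy
before=$(ls)                              # listing with the copy present
rm sample_data_copy.txt                   # remove the copy
after=$(ls)                               # listing after removal

printf 'before rm:\n%s\nafter rm:\n%s\n' "$before" "$after"
```

Comparing the two listings shows that rm removed only the copy and left the original file in place.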
  • Slide 5
  • Data manipulation with filters
      • Filters: programs that accept textual data and then transform it in a particular way.
      • head, tail, cut, sort, uniq, sed
      • Task: command
          • View the content of a file: more sample_data.txt
          • Get the first 10 lines of the file: head sample_data.txt
          • Get the first 5 lines of the file: head -n 5 sample_data.txt
          • Get all but the last 5 lines of the file: head -n -5 sample_data.txt
          • Get the last 10 lines of the file: tail sample_data.txt
          • Get the last 5 lines of the file: tail -n 5 sample_data.txt
          • Get all lines starting from line 5: tail -n +5 sample_data.txt
          • Get the first three columns of the file: cut -f 1-3 sample_data.txt
          • Get selected columns of the file: cut -f 1,3,5 sample_data.txt
          • Sort all lines based on the numerical values in the second column (non-numeric entries are interpreted as zero): sort -k 2 -n sample_data.txt
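To see what each filter returns, it helps to run them on a file whose content is known in advance. The sketch below assumes a generated 20-line, tab-separated stand-in for sample_data.txt (its content is hypothetical, not the course file); the second column counts down from 20 so the effect of the numeric sort is visible.

```shell
# Build a 20-line, 5-column tab-separated stand-in file (hypothetical content).
workdir=$(mktemp -d)
data="$workdir/sample_data.txt"
i=1
while [ "$i" -le 20 ]; do
    printf 'Line_%02d\t%d\tA%d\tB%d\tC%d\n' "$i" $((21 - i)) "$i" "$i" "$i"
    i=$((i + 1))
done > "$data"

first5=$(head -n 5 "$data")      # first 5 lines
last5=$(tail -n 5 "$data")       # last 5 lines
from5=$(tail -n +5 "$data")      # line 5 through the end of the file
cols13=$(cut -f 1-3 "$data")     # columns 1 to 3 (cut splits on tabs by default)
sorted=$(sort -k 2 -n "$data")   # numeric sort on the second column

# Show the three smallest second-column values after sorting.
printf '%s\n' "$sorted" | head -n 3
```

Because the second column runs from 20 down to 1, the sorted output starts with Line_20, the reverse of the file order.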
  • Slide 6
  • Data manipulation with piping and redirection
      • Piping (|): sending data from one program to another program.
      • Redirection: sending output from one program to a file
          • >: save output to a file
          • >>: append output to a file
      • Task: command
          • Get the first 10 lines of the file and then get the first three columns: head sample_data.txt | cut -f 1-3
          • Get the first 10 lines of the file, then get the first three columns of these lines, and then redirect the content to a new file: head sample_data.txt | cut -f 1-3 > sample_data_subset.txt
          • View the new file: more sample_data_subset.txt
          • Append the last 10 lines of the old file to the end of the new file: tail sample_data.txt >> sample_data_subset.txt
          • View the new file: more sample_data_subset.txt
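The difference between > and >> is easiest to check by counting lines. The sketch below assumes a generated 30-line, 4-column stand-in for sample_data.txt (hypothetical content, built with seq and awk); the variable names data and subset are illustrative.

```shell
# Build a 30-line, 4-column tab-separated stand-in file (hypothetical content).
workdir=$(mktemp -d)
data="$workdir/sample_data.txt"
subset="$workdir/sample_data_subset.txt"
seq 1 30 | awk '{ printf "Line_%02d\t%d\t%d\t%d\n", $1, $1 * 2, $1 * 3, $1 * 4 }' > "$data"

# Pipe: head passes its 10 lines to cut; > creates (or overwrites) the subset file.
head "$data" | cut -f 1-3 > "$subset"

# >> appends the last 10 lines of the original file, all columns, to the subset file.
tail "$data" >> "$subset"

# 10 lines from the pipe plus 10 appended lines.
wc -l < "$subset"
```

Running the first redirection again with > would discard the appended lines, since > always starts from an empty file.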
  • Slide 7
  • Editing files with nano
      • nano is a user-friendly text editor
      • A quick tutorial: http://staffwww.fullcoll.edu/sedwards/Nano/IntroToNano.html
      • Task: command
          • Open sample_data.txt for editing: nano sample_data.txt
          • Delete the text Line_01 and the space after it, save the file, and then exit: in nano, ^O for saving and ^X for exit
          • View the edited file: more sample_data.txt
          • View the content of the .bashrc file, which is located under your home directory. The file includes commands that are executed when a new shell starts: more ~/.bashrc
          • Open the .bashrc file under your home directory for editing: nano ~/.bashrc
          • Add setpkgs -a R to the end of this file. This will allow you to use the R environment, which has been installed in the ACCRE system for statistical computing: in nano, ^O for saving and ^X for exit
          • View the edited .bashrc file: more ~/.bashrc
          • Run the .bashrc file: source ~/.bashrc
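The last step matters because a line added to .bashrc has no effect on the shell that is already running until the file is read again. The sketch below illustrates this with a scratch file instead of the real ~/.bashrc; the file and the variable DEMO_TOOL are hypothetical, and the ACCRE-specific setpkgs command is replaced by a plain export so the example runs anywhere.

```shell
# Scratch file standing in for ~/.bashrc (assumption; the real file is untouched).
rcfile=$(mktemp)

# Append a setting, as you would by editing the file in nano.
echo 'export DEMO_TOOL=R' >> "$rcfile"

# The current shell has not read the file yet, so the variable is still unset.
before=${DEMO_TOOL:-unset}

# Read the file into the current shell. `.` is the portable spelling of `source`.
. "$rcfile"

echo "before: $before, after: $DEMO_TOOL"
```

Opening a new shell would have the same effect as sourcing, since .bashrc is read at startup.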
  • Slide 8
  • What is R
      • R is a free software environment for statistical computing and graphics. It includes:
          • an effective data handling and storage facility
          • a suite of operators for calculations on arrays, in particular matrices
          • a large, coherent, integrated collection of intermediate tools for data analysis
          • graphical facilities for data analysis and display either on-screen or on hardcopy
          • a well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions and input and output facilities
  • Slide 9
  • R installation and tutorial
      • Download and install R: http://www.r-project.org/
      • Choose a CRAN (Comprehensive R Archive Network) mirror
      • Binary distributions of the base system and contributed packages
          • Windows version
          • Mac OS X version
          • Linux version (already installed on the ACCRE cluster, will be used for this module)
      • Tutorials
          • An Introduction to R: http://cran.r-project.org/doc/manuals/r-release/R-intro.html
  • Slide 10
  • R interface
      • Command-line R: Linux/OS X. Type R in your Linux shell to start R; type q() in the R interface to close R.
      • R GUI: OS X (the Windows GUI is similar). Download and install on your laptop.
      • RStudio: a powerful and user-friendly interface for R. Excellent for both beginners and developers (http://www.rstudio.com/)
  • Slide 11
  • Install and load packages
      • CRAN packages: http://cran.r-project.org/web/packages/ (>6000 packages)
      • BioConductor packages: http://www.bioconductor.org/ (~1000 packages for the analysis of high-throughput genomics data)
      • Task: R code
          • Install a CRAN package: install.packages("package_name")
          • Install a BioConductor package: source("http://www.bioconductor.org/biocLite.R") and then biocLite("package_name")
          • Load a package/library: library(package_name)
  • Slide 12
  • Basic R syntax Object
  • Operators and calculations
      • Comparison operators: ==, !=, <, >, <=, >=
      • Logical operators: & (AND), | (OR), ! (NOT)
      • Calculations
          • Arithmetic operators: +, -, *, /, ^
          • Arithmetic functions: log, exp, sqrt, mean, var, sd, sum, etc.
      • Task: R code
          • Comparisons: 3==5 3!=5 30 & y>0
          • Calculations: (4+2^2)/(2*2) x