2012 DOCK Tutorial With Streptavidin


  • 8/2/2019 2012 DOCK Tutorial With Streptavidin


    2012 DOCK tutorial with Streptavidin

    From Rizzo Lab

    For additional Rizzo Lab tutorials see DOCK Tutorials.

    Use this link Wiki Markup (http://www.unilang.org/wiki/index.php/Wiki_markup) as a reference for editing the wiki.

    Contents

    1 I. Introduction
        1.1 DOCK
        1.2 Streptavidin & Biotin
        1.3 Organizing Directories
    2 II. Preparing the Receptor and Ligand
        2.1 Downloading the PDB Structure (1DF8)
        2.2 Preparing the Ligand and Enzyme in Chimera
    3 III. Generating Receptor Surface and Spheres
        3.1 Receptor Surface
        3.2 Spheres
    4 IV. Generating Box and Grid
        4.1 Box
        4.2 Grid
    5 V. Docking a Single Molecule for Pose Reproduction
        5.1 Docking and Results
        5.2 Results
    6 VI. Virtual Screening
        6.1 Virtual Screening Preparation
        6.2 Virtual Screening Protocol
        6.3 Virtual Screening Results
    7 VII. Running DOCK in Serial and in Parallel on Seawulf
        7.1 Running DOCK in Serial on a Single Processor
        7.2 Running DOCK in Parallel using MPI
        7.3 Serial Calculation for Pose Reproduction
        7.4 Parallel Virtual Screen
    8 VIII. Frequently Encountered Problems
        8.1 Barbara
        8.2 Woo Suk
            8.2.1 Distinguishing overlapped spheres
        8.3 Longfei
        8.4 Roman
        8.5 Quan
        8.6 Hui
        8.7 Tuoling
        8.8 Yuchen
        8.9 Always Remember Your Path
        8.10 Yunting
            8.10.1 Delete Spheres Manually
        8.11 Kip
            8.11.1 Keeping your data in sync
            8.11.2 Deleting jobs

    I. Introduction

    DOCK

    2012 DOCK tutorial with Streptavidin - Rizzo Lab http://ringo.ams.sunysb.edu/index.php/2012_DOCK_tutorial_...

    1 de 18 29/03/12 18:05


    DOCK is a molecular docking program used in drug discovery. It was developed by Irwin D. Kuntz, Jr. and colleagues at UCSF (see UCSF DOCK (http://dock.compbio.ucsf.edu/) ). Given a protein binding site and a small molecule, the program tries to predict the correct binding mode of the small molecule in the binding site and the associated binding energy. Small molecules with highly favorable binding energies could be new drug leads, which makes DOCK a valuable drug discovery tool. DOCK is typically used to screen massive libraries of millions of compounds against a protein to isolate potential drug leads. These leads are then studied further and could eventually result in a new, marketable drug. DOCK works well as a screening procedure for generating leads, but is not currently as useful for optimizing those leads.

    DOCK 6 uses an incremental construction algorithm called anchor and grow. It is described by a three-step process:

    1. The rigid portion of the ligand (the anchor) is docked by geometric methods.
    2. Non-rigid segments are added in layers and energy minimized.
    3. The resulting configurations are 'pruned' and energy re-minimized, yielding the docked configurations.

    Streptavidin & Biotin

    Streptavidin is a tetrameric prokaryotic protein that binds the co-enzyme biotin with extremely high affinity. The streptavidin monomer is composed of eight antiparallel beta-strands that fold to give a beta-barrel tertiary structure. A biotin binding site is located at one end of each beta-barrel, which has a high affinity as well as a high avidity for biotin. Four identical streptavidin monomers associate to give streptavidin's tetrameric quaternary structure. The biotin binding site in each barrel consists of residues from the interior of the barrel, together with a conserved Trp120 from the neighboring subunit. In this way, each subunit contributes to the binding site on the neighboring subunit, and so the tetramer can also be considered a dimer of functional dimers.

    Biotin is a water-soluble B-complex vitamin composed of a ureido (tetrahydroimidizalone) ring fused with a tetrahydrothiophene ring. It is a co-enzyme that is required in the metabolism of fatty acids and leucine. It is also involved in gluconeogenesis.

    Organizing Directories

    While performing docking, it is convenient to adopt a standard directory structure / naming scheme, so that files are easy to find / identify. For this tutorial, we will use something similar to the following:

    ~username/AMS536/DOCK-Tutorial/
        00-original-files/
        01-dockprep/
        02-surface-spheres/
        03-box-grid/
        04-dock/
        05-virtual-screen/

    In addition, most of the important files that are derived from the original crystal structure will be given a prefix that is the same as the PDB code, '1DF8'. The following sections in this tutorial will adhere to this directory structure / naming scheme.
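
    The directory layout above can also be created with a short script. The following is a minimal sketch, not part of the original tutorial; the function name make_tutorial_tree is hypothetical, and the demonstration runs in a temporary directory rather than in ~username/AMS536.

```python
from pathlib import Path
import tempfile

# Subdirectory names taken from the layout listed in this tutorial.
SUBDIRS = [
    "00-original-files",
    "01-dockprep",
    "02-surface-spheres",
    "03-box-grid",
    "04-dock",
    "05-virtual-screen",
]


def make_tutorial_tree(base):
    """Create base/DOCK-Tutorial with its numbered subdirectories and return the root."""
    root = Path(base) / "DOCK-Tutorial"
    for name in SUBDIRS:
        # parents=True also creates DOCK-Tutorial itself; exist_ok=True makes reruns harmless.
        (root / name).mkdir(parents=True, exist_ok=True)
    return root


if __name__ == "__main__":
    # Throwaway location for the demo; pass e.g. Path.home() / "AMS536" for real use.
    with tempfile.TemporaryDirectory() as tmp:
        root = make_tutorial_tree(tmp)
        for path in sorted(root.iterdir()):
            print(path.name)
```

    Running the function twice on the same base directory is safe, since existing directories are left untouched.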

    II. Preparing the Receptor and Ligand

    Downloading the PDB Structure (1DF8)

    The Protein Data Bank contains atomic coordinates for more than 79,000 molecules and is accessible via the internet from PDB (http://www.pdb.org) . The target (protein or nucleic acid) can be found by PDB ID, molecule name, or author. After entering the PDB ID, molecule name, or author, click Download Files , then select PDB File (text) . A window will open; click on Save File .

    Preparing the Ligand and Enzyme in Chimera

    If you are on "herbie" you can access Chimera directly by typing in Chimera at the prompt. Otherwise, download Chimera to your desktop and obtain your protein/nucleic acid complex of interest using Fetch (by ID).


    dockprep image

    Open Your Protein of Interest and Delete Unwanted Molecules:

    Click on Open under the File menu and find where you saved your PDB file.

    You only need one of the monomers to perform docking. Remove chain B by carrying out the following:

    1. Select -> Chain -> B -> All .
    2. Actions -> Atoms -> Delete .

    To delete water molecules and/or other ligands, go to Tools -> Structure Editing -> Dock Prep . Check all boxes and click OK. This will walk you through the steps needed to prepare the complex for docking, and will also assign partial charges to the protein and the ligand. Choose AM1-BCC for charges.

    Save the file as 1DF8.dockprep.mol2.

    This file contains a conformation of the complex with hydrogen atoms. The grid calculation is based on the receptor with its hydrogen atoms. The grid score is an energy calculation that is based on the following equation:

    $E_{\mathrm{GRID}} = E_{\mathrm{VDW}} + E_{\mathrm{ES}}$

    The score is an approximation of the molecular mechanics energy function and it considers only through-space (non-bonded) interactions.
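
    To make the two terms of the grid score concrete, here is a minimal sketch of a single atom-pair energy. It is an illustration under stated assumptions, not DOCK's actual implementation: the function names are hypothetical, the van der Waals term is written as a generalized Lennard-Jones potential whose minimum is -well_depth at r = r_min, the electrostatic term uses a distance-dependent dielectric eps(r) = dielectric_factor * r, and 332.0 kcal*Angstrom/(mol*e^2) is assumed as the Coulomb conversion constant. The default exponents (6 and 9) and dielectric factor (4) mirror the grid.in values used later in this tutorial.

```python
# Assumed Coulomb conversion constant in kcal*Angstrom/(mol*e^2).
COULOMB_CONSTANT = 332.0


def vdw_energy(r, r_min, well_depth, rep_exp=9, att_exp=6):
    """Generalized Lennard-Jones energy; equals -well_depth at r == r_min."""
    a, b = rep_exp, att_exp
    ratio = r_min / r
    # Prefactors b/(a-b) and a/(a-b) place the minimum at r_min with depth well_depth.
    return well_depth * (b / (a - b) * ratio ** a - a / (a - b) * ratio ** b)


def es_energy(r, q1, q2, dielectric_factor=4.0):
    """Coulomb energy with a distance-dependent dielectric eps(r) = dielectric_factor * r."""
    return COULOMB_CONSTANT * q1 * q2 / (dielectric_factor * r * r)


def pair_energy(r, q1, q2, r_min, well_depth):
    """Sum of the VDW and electrostatic terms for one atom pair (E_VDW + E_ES)."""
    return vdw_energy(r, r_min, well_depth) + es_energy(r, q1, q2)


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from any force-field file.
    print(pair_energy(r=3.5, q1=0.2, q2=-0.3, r_min=3.5, well_depth=0.1))
```

    In a grid-based score, a sum of such pair terms over all receptor atoms would be precomputed at each grid point so that it does not have to be recomputed for every ligand pose.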

    Create a Receptor File:

    Go to Select -> Residue -> BTN . Then go to Actions -> Atoms -> Delete .

    Save the file as 1DF8.receptor.mol2.

    Create a Receptor File with No Hydrogen Atoms:

    Go to Select -> Chemistry -> Element -> H , then delete the selected atoms ( Actions -> Atoms -> Delete ).

    Save a PDB file as 1DF8.receptor.noH.pdb.

    Create a Ligand File:

    Open only 1DF8.dockprep.mol2.

    Select the protein chain via Select -> Chain and delete it. This leaves only the ligand.

    Save the file as 1DF8.ligand.mol2.

    These are the files that you will need to continue with rigid or flexible docking.

    III. Generating Receptor Surface and Spheres

    Receptor Surface

    To generate an enzyme surface, first open the receptor PDB file with the hydrogen atoms removed (1DF8.receptor.noH.pdb) . Next, go to Actions -> Surface -> Show . Note that hydrogen atoms are considered in DOCK calculations, but for generating the enzyme surface and spheres it is necessary to use the protein without hydrogens.


    2012_DOCK_Tutorial_1DF8_surface

    Recent versions of Chimera include a Write DMS tool that facilitates calculation of the molecular surface. Go to Tools -> Structure Editing -> Write DMS . Save the surface as 1DF8.receptor.dms .

    2012_DOCK_Tutorial_1DF8_dms

    The Write DMS tool will "roll" a small probe (default radius = 1.4 Angstroms, the size of a water molecule) over the surface of the enzyme and calculate the surface normal at each point. The DMS (distributed molecular surface) file is subsequently used as the input file for sphgen .

    Spheres

    To generate docking spheres, we need to use a command line program called sphgen . To run sphgen , we need an input file named INSPH .

    1DF8.receptor.dms    #molecular surface file that we got from previous step
    R                    #sphere outside of surface (R) or inside surface (L)
    X                    #specifies subset of surface points to be used (X=all points)
    0.0                  #prevents generation of large spheres with close surface contacts (default=0.0)
    4.0                  #maximum sphere radius in angstroms (default=4.0)
    1.4                  #minimum sphere radius in angstroms (default=radius of probe)
    1DF8.receptor.sph    #clustered spheres file that we want to generate
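
    The INSPH file can also be written by a small helper, which avoids typing the seven answers by hand. This is a minimal sketch: write_insph is a hypothetical name, and it assumes that the "#" annotations shown above are explanatory only and should be left out of the actual file.

```python
from pathlib import Path


def write_insph(path, surface_file, sphere_file, side="R", subset="X",
                dotlim=0.0, radmax=4.0, radmin=1.4):
    """Write the seven INSPH answers, one per line, and return them as a list."""
    lines = [
        surface_file,        # molecular surface (DMS) file
        side,                # R = spheres outside the surface, L = inside
        subset,              # X = use all surface points
        f"{dotlim:.1f}",     # close-contact control for large spheres
        f"{radmax:.1f}",     # maximum sphere radius in Angstroms
        f"{radmin:.1f}",     # minimum sphere radius in Angstroms
        sphere_file,         # output file for the clustered spheres
    ]
    Path(path).write_text("\n".join(lines) + "\n")
    return lines


if __name__ == "__main__":
    write_insph("INSPH", "1DF8.receptor.dms", "1DF8.receptor.sph")
    print(Path("INSPH").read_text())
```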

    Save the INSPH file. Then use this command to generate the spheres file:

    sphgen -i INSPH -o OUTSPH

    -i means input; -o means output.

    You should get two output files: OUTSPH and 1DF8.receptor.sph . The OUTSPH file should look similar to this:

    density type = X
    reading 1DF8.receptor.dms type R
    # of atoms = 881 # of surf pts = 10771
    finding spheres for 1DF8.receptor.dms
    dotlim = 0.000
    radmax = 4.000
    Minimum radius of acceptable spheres?
    1.4000000
    output to 1DF8.receptor.sph
    clustering is complete 28 clusters

    You can also open the spheres file that we generated in this step (1DF8.receptor.sph) . This file contains detailed information about the spheres, which are divided into 28 clusters. Cluster 0, at the end of the spheres file, is a combination of all the clusters.

    In order to visualize the generated spheres, you can use a program called showsphere . Showsphere is an interactive program. On the command line, simply type

    showsphere

    You will be asked the following questions:

    Enter name of sphere cluster file: 1DF8.receptor.sph
    Enter cluster number to process ([...]

    [...] Actions -> Atoms/Bonds -> sphere , and you will be able to see the spheres in cluster 1. You can also open the receptor file (1DF8.receptor.mol2) at the same time, choose Presets -> Interactive 3 (hydrophobicity surface), and then again choose Actions -> Atoms/Bonds -> sphere to see what the spheres in cluster 1 look like on the enzyme surface.

    2012_DOCK_Tutorial_1DF8_spheres

    There are over 500 spheres in the spheres file (1DF8.receptor.sph) . However, we are only interested in docking the ligand into the active site. Therefore we need to select only those spheres that are inside the active site, using the sphere_selector program.


    2012_DOCK_Tutorial_1DF8_surface_spheres_10 angstroms

    Flow Chart of Questions for Showbox (Red path is followed in this tutorial)

    sphere_selector 1DF8.receptor.sph ../01-dockprep/1DF8.ligand.mol2 10.0

    Sphere_selector filters the output from sphgen , selecting all spheres within a user-defined radius of a target molecule. In this case, we selected the spheres within 10 angstroms of our ligand. A file called selected_spheres.sph should be created, containing the spheres that were selected. You can again visualize it with showsphere . You can also change the radius (to 8 angstroms or 6 angstroms) or manually edit the file selected_spheres.sph to keep only the spheres you want. In this tutorial, spheres within 6 angstroms of the ligand are used for the next step (with one sphere that does not belong to the active site being manually deleted).
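
    The selection rule described above can be sketched as a simple distance filter. This is an illustration only, not the sphere_selector source: select_spheres is a hypothetical name, coordinates are plain (x, y, z) tuples rather than parsed .sph or .mol2 records, and it assumes a sphere is kept when its center lies within the cutoff of at least one ligand atom.

```python
import math


def select_spheres(sphere_centers, ligand_atoms, cutoff):
    """Return the sphere centers within `cutoff` Angstroms of any ligand atom."""
    selected = []
    for center in sphere_centers:
        # Keep the sphere as soon as one ligand atom is close enough.
        if any(math.dist(center, atom) <= cutoff for atom in ligand_atoms):
            selected.append(center)
    return selected


if __name__ == "__main__":
    # Made-up coordinates for demonstration purposes.
    spheres = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
    ligand = [(1.0, 0.0, 0.0), (2.0, 1.0, 0.0)]
    print(select_spheres(spheres, ligand, cutoff=6.0))
```

    Lowering the cutoff from 10 to 6 angstroms, as done in this tutorial, simply shrinks the set of retained spheres around the ligand.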

    2012_DOCK_Tutorial_1DF8_surface_spheres_6 angstroms

    IV. Generating Box and Grid

    Box

    In order to speed up docking calculations, DOCK generates a fine grid. At each point in the grid, the electrostatic and VDW energies of probe atoms are precomputed using a molecular force field. To determine the dimensions of the grid, however, we first generate a box that defines the outer boundaries for the grid calculation. The dimensions and location of the box can be determined using a program called showbox .

    First create a directory where you will place the grid files.

    $mkdir 03-box-grid
    $cd 03-box-grid

    showbox can be used interactively, or a file with predetermined answers can be fed into the program.

    The program asks the questions depicted in the diagram on the right:

    To run the program in the interactive mode, run

    $showbox

    To feed the answers to the questions, run

    $showbox < showbox.in

    For example, showbox.in can contain:

    Y
    5
    ../02-surface-spheres/selected_spheres.sph
    1
    1DF8.box.pdb



    1DF8 receptor along with our ligand and the box we generated using showbox

    Y means we use automatic box construction, 5 is the extra margin (in Angstroms) to be enclosed around the sphere cluster, selected_spheres.sph is the sphere file we generated, 1 corresponds to the cluster number in the selected_spheres.sph file, and 1DF8.box.pdb is the output file. We can open the output box file in Chimera to make sure the box is in the right place.
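
    The idea of automatic box construction with an extra margin can be sketched as an axis-aligned bounding box. This is an assumption-laden illustration rather than the showbox algorithm itself: bounding_box is a hypothetical name, and it uses only the point coordinates (for example sphere centers), ignoring sphere radii.

```python
def bounding_box(points, margin):
    """Return (center, size) of the axis-aligned box enclosing `points` plus `margin`."""
    xs, ys, zs = zip(*points)
    low = (min(xs) - margin, min(ys) - margin, min(zs) - margin)
    high = (max(xs) + margin, max(ys) + margin, max(zs) + margin)
    center = tuple((lo + hi) / 2.0 for lo, hi in zip(low, high))
    size = tuple(hi - lo for lo, hi in zip(low, high))
    return center, size


if __name__ == "__main__":
    # Made-up sphere centers; the 5 Angstrom margin matches the showbox.in example.
    centers = [(0.0, 0.0, 0.0), (2.0, 4.0, 6.0)]
    center, size = bounding_box(centers, margin=5.0)
    print("center:", center)
    print("size:", size)
```

    The margin is added on both sides of every axis, so each box dimension grows by twice the margin.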

    Grid

    Now let's generate a grid within our box. We will use the energy scoring method to generate a grid, resulting in three additional files with extensions *.nrg , *.bmp , and *.out . The *.nrg file contains the energy scoring grid; the *.bmp file contains the size, position and grid spacing and is used to determine whether there are any overlaps with receptor atoms.

    To generate the grid we will use the grid program. This program can either be used interactively, or an input file can be fed in, just like the showbox program.

    Usage: grid [-i [input_file]] [-o [output_file]] ...
           [-standard_i/o] [-terse] [-verbose]

    -i: read from grid.in or input_file, standard_in otherwise
    -o: write to grid.out or output_file (-i required), standard_out otherwise
    -s: read from and write to standard streams (-i and/or -o illegal)
    -t: terse program output
    -v: verbose program output

    For our grid.in file, we will use the following answers:

    compute_grids                  yes
    grid_spacing                   0.3
    output_molecule                no
    contact_score                  no
    energy_score                   yes
    energy_cutoff_distance         9999
    atom_model                     a
    attractive_exponent            6
    repulsive_exponent             9
    distance_dielectric            yes
    dielectric_factor              4
    bump_filter                    yes
    bump_overlap                   0.75
    receptor_file                  ../01-dockprep/1DF8.receptor.mol2
    box_file                       1DF8.box.pdb
    vdw_definition_file            /opt/software/AMS536software/dock6/parameters/vdw_AMBER_parm99.defn
    score_grid_prefix              grid

    Line by line:

    1. compute scoring grids (yes)
    2. the distance between grid points along each axis (in Angstroms)
    3. whether to write the coordinates of the receptor into a new file
    4. compute contact grid? default is no
    5. compute energy score? yes - we are using this method to compute force fields on probes
    6. the maximum distance between atoms for the energy contribution to be computed
    7. atom_model: u means the united-atom model, in which hydrogens attached to carbons are merged into the carbon, and a stands for the all-atom model, in which hydrogens on carbons are treated separately
    8. attractive_exponent is the exponent of the attractive LJ term in the VDW potential
    9. repulsive_exponent is the exponent of the repulsive LJ term in the VDW potential
    10. distance_dielectric makes the dielectric constant linearly dependent on distance
    11. dielectric_factor is the coefficient of the dielectric
    12. the bump filter flag determines whether we want to screen orientations for clashes before scoring and minimization
    13. bump_overlap is the fraction of allowed overlap, where 1 corresponds to no allowed overlap and 0 corresponds to full overlap being permitted
    14. our receptor file
    15. the box file we generated in the Box section
    16. the VDW parameters file
    17. the prefix for the grid file names; all the extensions will be generated automatically

    V. Docking a Single Molecule for Pose Reproduction

    Docking and Results

    Change directory into 04-dock and create an empty input file called dock.in

    $touch dock.in #create a file called dock.in

    Run dock6.4

    $dock6 -i dock.in

    Alternatively:

    $dock6 -i dock.in -v # -v option allows you to print out the information regarding each growth step on terminal

    Or:

    $dock6 -i dock.in -o dock.bf.out -v # -o allows you to write the aforementioned information into an output file named dock.bf.out

    1st run :

    Notice that running dock6 with an empty input file will require you to answer a series of questions. For the first run we will deactivate most of the features by selecting no. Parameters requiring specification are listed below:

    ligand_atom_file [database.mol2] (): ../01-dockprep/1DF8.ligand.mol2
    ligand_outfile_prefix [output] (): 1DF8.output

    In the 1st run the program simply takes the ligand.mol2 file and directly generates an output file without any additional manipulations. You would expect the result to be exactly the same as the original pose. You can open it in Chimera by using the ViewDock function under Tools -> Surface/Binding Analysis. We will not show the result here.

    2nd run:

    The real experiment begins here. Notice that we selected no for most of the functions. This time we will try changing some parameters in the dock.in file.

    $vim dock.in

    Parameters being changed are listed below:

    orient_ligand yes

    The "orient_ligand" option tells the program whether to try orientations different from the pose in the original .mol2 file. Note that because of the change, additional questions are asked by the program. For simplicity we keep most of the answers as default. File paths that need to be specified are listed below:

    receptor_site_file [receptor.sph] ():../02-surface-spheres/selected_spheres_06A.sph
    vdw_defn_file [vdw.defn] ():../03-grid/vdw_AMBER_parm99.defn
    flex_defn_file [flex.defn] ():../03-grid/flex.defn
    flex_drive_file [flex_drive.tbl] ():../03-grid/flex_drive.tbl

    The receptor_site_file should be the selected spheres file (.sph) generated in a previous step (02 surface and spheres), according to which the program orients the ligand.


    The vdw_defn_file instructs the dock6 program to use the Van der Waals parameter sets from the AMBER force field.

    The flex_defn_file and the flex_drive_file contain the information required by the program to sample conformations.

    The result is shown below. As you may notice the result doesn't look very good.

    DOCK second Run result

    3rd Run:

    Here we will further specify parameters to improve the result.

    calculate_rmsd yes
    score_molecules yes
    num_scored_conformers 10

    Again, further questions will be asked when you run the program. Keep most answers as the default values. Paths requiring specification are listed below:

    grid_score_grid_prefix [grid] ():../03-grid/1DF8.grid

    Note that the above specification will tell the program to load the 1DF8.grid.nrg file you generated in the previous step (grid) for scoring.

    simplex_max_iterations [1000] ():20
    write_conformations [yes] (yes no):no
    cluster_rmsd_threshold [2.0] ():0.2

    The simplex_max_iterations parameter specifies the maximum number of minimization iterations.

    Note that the program will group poses that are very close together (RMSD smaller than the threshold specified in the cluster_rmsd_threshold parameter) into a single cluster.

    This time the program returned the best 10 poses. The one with the best grid score (-52.8, shown in blue) superimposes quite well with the original pose in the crystal structure (RMSD = 0.71 Å). You can view the grid scores in ViewDock by selecting Column -> Show -> Grid Score .
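
    For reference, the RMSD between a docked pose and the crystal pose can be sketched as follows. This is a minimal illustration, not DOCK's own routine: rmsd is a hypothetical helper, it assumes both poses list the same atoms in the same order, and it performs no superposition, which is appropriate when both poses already sit in the receptor's coordinate frame.

```python
import math


def rmsd(pose_a, pose_b):
    """Root-mean-square deviation between two equally ordered lists of (x, y, z) atoms."""
    if len(pose_a) != len(pose_b) or not pose_a:
        raise ValueError("poses must be non-empty and contain the same number of atoms")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(pose_a, pose_b):
        # Accumulate the squared displacement of each matched atom pair.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(pose_a))


if __name__ == "__main__":
    # Made-up coordinates: the second pose is the first shifted by 1 Angstrom along x.
    crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
    docked = [(x + 1.0, y, z) for x, y, z in crystal]
    print(rmsd(crystal, docked))
```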

    DOCK third Run result

    4th Run:

    This time we will set the ligand as flexible:

    flexible_ligand yes

    Further questions will be asked during the run. Keep most of the answers as the default values except for:

    simplex_grow_max_iterations [20] ():500

    The simplex_grow_max_iterations parameter specifies the maximum number of iterations per cycle per growth step.

    Notice that the run is significantly slower this time. Again the best pose generated (grid score -64.8) is shown in blue. The grid score improved significantly but the RMSD did not change much (0.79 Å).

    DOCK fourth Run result

    5th Run:

    This time we will turn the bump filter on:

    bump_filter yes

    The bump_filter option filters out conformations that cause clashes between atoms.

    Further questions will be asked during the run. Keep answers as the default values. Specify the following path:

    bump_grid_prefix [grid] ():../03-grid/1DF8.grid

    Note that the path tells the program to access the .bmp file generated in the previous grid step.

    The best pose (grid score -65.3) is again shown in blue. Notice that turning on the bump filter does not significantly alter either the grid score or the RMSD (0.78 Å).

    DOCK fifth Run result

    Remember you can tweak any parameter in the dock.in file instead of keeping them as default.

    Following is the final dock.in file used during the fifth run:

    ligand_atom_file                    ../01-dockprep/1DF8.ligand.mol2
    limit_max_ligands                   no
    skip_molecule                       no
    read_mol_solvation                  no
    calculate_rmsd                      yes
    use_rmsd_reference_mol              no
    use_database_filter                 no
    orient_ligand                       yes
    automated_matching                  yes
    receptor_site_file                  ../02-surface-spheres/selected_spheres_06A.sph
    max_orientations                    500
    critical_points                     no
    chemical_matching                   no
    use_ligand_spheres                  no
    use_internal_energy                 yes
    internal_energy_rep_exp             12
    flexible_ligand                     yes
    min_anchor_size                     40
    pruning_use_clustering              yes
    pruning_max_orients                 100
    pruning_clustering_cutoff           100
    pruning_conformer_score_cutoff      25
    use_clash_overlap                   no
    write_growth_tree                   no
    bump_filter                         no
    score_molecules                     yes
    contact_score_primary               no
    contact_score_secondary             no
    grid_score_primary                  yes
    grid_score_secondary                no
    grid_score_rep_rad_scale            1
    grid_score_vdw_scale                1
    grid_score_es_scale                 1
    grid_score_grid_prefix              ../03-grid/1DF8.grid
    dock3.5_score_secondary             no
    continuous_score_secondary          no
    gbsa_zou_score_secondary            no
    gbsa_hawkins_score_secondary        no
    amber_score_secondary               no
    minimize_ligand                     yes
    minimize_anchor                     yes
    minimize_flexible_growth            yes
    use_advanced_simplex_parameters     no
    simplex_max_cycles                  1
    simplex_score_converge              0.1
    simplex_cycle_converge              1.0
    simplex_trans_step                  1.0
    simplex_rot_step                    0.1
    simplex_tors_step                   10.0
    simplex_anchor_max_iterations       500
    simplex_grow_max_iterations         500
    simplex_grow_tors_premin_iterations 0
    simplex_random_seed                 0
    simplex_restraint_min               no
    atom_model                          all
    vdw_defn_file                       ../03-grid/vdw_AMBER_parm99.defn
    flex_defn_file                      ../03-grid/flex.defn
    flex_drive_file                     ../03-grid/flex_drive.tbl
    ligand_outfile_prefix               1DF8.output
    write_orientations                  no
    num_scored_conformers               10
    write_conformations                 no
    cluster_conformations               yes
    cluster_rmsd_threshold              0.2
    rank_ligands                        no
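
    Because dock.in is a plain list of "parameter value" pairs, it is easy to inspect from a script. The following is a minimal sketch; parse_dock_input is a hypothetical helper, and it assumes one parameter per line with the name and value separated by whitespace.

```python
def parse_dock_input(text):
    """Parse 'name value' lines from a DOCK input file into a dict of strings."""
    params = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # split on the first run of whitespace
        if not parts:
            continue  # skip blank lines
        name = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        params[name] = value
    return params


if __name__ == "__main__":
    # A short excerpt of the dock.in file shown in this tutorial.
    excerpt = """\
ligand_atom_file        ../01-dockprep/1DF8.ligand.mol2
flexible_ligand         yes
num_scored_conformers   10
cluster_rmsd_threshold  0.2
"""
    settings = parse_dock_input(excerpt)
    print(settings["flexible_ligand"], settings["num_scored_conformers"])
```

    Values are kept as strings so that paths, yes/no flags and numbers can all be handled uniformly; convert them with int() or float() only where needed.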

    Now we will write a script for submitting your dock job to Seawulf. Create a script called sub.dock.csh

    $vim sub.dock.csh

    wherein you write:

    #! /bin/tcsh
    #PBS -l nodes=1:ppn=1
    #PBS -l walltime=01:00:00
    #PBS -o zzz.qsub.out
    #PBS -j oe

    cd ~/AMS536/DOCK_tutorial/05-dock_qsub

    /nfs/user03/sudipto/dock6/bin/dock6 -i dock.in -o dock.out

    This will request 1 processor from the cluster for your job. To submit the job:

    $qsub sub.dock.csh

    Note that you might have to make the script executable before running it:


    $chmod +x sub.dock.csh

    Results

    VI. Virtual Screening

    Virtual Screening Preparation

    Virtual screening is a widely used method in computational drug design. It searches large libraries of chemical compounds to identify favorable structures that bind to a target molecule. To perform virtual screening, we use ligands.mol2 , a mol2 file that contains 101 small molecules, as the virtual library. The computational cost is reasonable for a quick search. Usually we use a larger molecule set from a chemical database, such as ZINC (http://zinc.docking.org/).

    Since the computational cost of virtual screening is much higher than single-ligand docking, it is better to run it on Seawulf. We need to archive the whole DOCK-Tutorial directory, copy the archive to Seawulf, and extract it there.

    tar -cvf DOCK-Tutorial.tar DOCK-Tutorial/
    scp DOCK-Tutorial.tar sw:/nfs/user03/usrname/AMS536
    tar -xvf DOCK-Tutorial.tar

    Virtual Screening Protocol

    The purpose of virtual screening is different from single molecule docking, so we need to modify our previous docking input dock.in to obtain vs.in . We can see the difference between the two files (lines marked < are from vs.in , lines marked > are from dock.in ).

    1c1
    < ligand_atom_file ligands.mol2
    ---
    > ligand_atom_file ../01-dockprep/1DF8.ligand.mol2
    56,59c56,59
    < vdw_defn_file /nfs/user03/sudipto/dock6/parameters/vdw_AMBER_parm99.defn
    < flex_defn_file /nfs/user03/sudipto/dock6/parameters/flex.defn
    < flex_drive_file /nfs/user03/sudipto/dock6/parameters/flex_drive.tbl
    < ligand_outfile_prefix 1DF8.vs.output
    ---
    > vdw_defn_file ../03-box-grid/vdw_AMBER_parm99.defn
    > flex_defn_file ../03-box-grid/flex.defn
    > flex_drive_file ../03-box-grid/flex_drive.tbl
    > ligand_outfile_prefix 1DF8.output
    61c61
    < num_scored_conformers 1
    ---
    > num_scored_conformers 10
    63c63,64
    < cluster_conformations no
    ---
    > cluster_conformations yes
    > cluster_rmsd_threshold 0.2

    num_scored_conformers: 10 -> 1 In virtual screening, we only need the most favorable pose of each candidate molecule, so that the candidates can be compared with each other.

    cluster_conformations: yes -> no Slightly different conformations are not clustered together. They are treated as different conformations in the docking process.

    In order to generate different search spaces, we can modify some other parameters.

    max_orientations: The maximal number of anchor orientations that will be generated.
    min_anchor_size: The minimum number of atoms for an anchor.
    pruning_use_clustering: Pruning conformations during the clustering process.
    use_internal_energy: Using repulsive VDW to avoid internal clashes.
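
    When parameters like these are being modified, it helps to confirm exactly which settings differ between two input files. The following minimal sketch compares two sets of parameters held in dictionaries; diff_params is a hypothetical helper, and the example values are the ones shown in the comparison of dock.in and vs.in in this section.

```python
def diff_params(old, new):
    """Return {name: (old_value, new_value)} for every parameter that differs."""
    changes = {}
    for name in sorted(set(old) | set(new)):
        before, after = old.get(name), new.get(name)  # None marks a missing parameter
        if before != after:
            changes[name] = (before, after)
    return changes


if __name__ == "__main__":
    dock_in = {
        "ligand_outfile_prefix": "1DF8.output",
        "num_scored_conformers": "10",
        "cluster_conformations": "yes",
        "cluster_rmsd_threshold": "0.2",
    }
    vs_in = {
        "ligand_outfile_prefix": "1DF8.vs.output",
        "num_scored_conformers": "1",
        "cluster_conformations": "no",
    }
    for name, (before, after) in diff_params(dock_in, vs_in).items():
        print(f"{name}: {before} -> {after}")
```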

    Parallel computing can reduce the running time of virtual screening. Here is our job submission script sub.virtual_screen.csh .

    #! /bin/tcsh
    #PBS -l nodes=4:ppn=2
    #PBS -l walltime=01:00:00
    #PBS -o zzz.qsub.out
    #PBS -j oe
    #PBS -V
    set nprocs = `wc -l $PBS_NODEFILE | awk '{print $1}'`
    echo "Running on ${nprocs} processors"
    echo ""
    echo "processor list are:"
    cat $PBS_NODEFILE
    cd ~/AMS536/DOCK_Tutorial/06-virtualscreen
    mpirun -np $nprocs dock6.mpi -i vs.in -o vs.out

    Finally, we can submit the job and perform virtual screening.

    qsub sub.virtual_screen.csh

    Virtual Screening Results

    After performing virtual screening on Seawulf or any other parallel computer, you should get a multi-mol2 file 1DF8.vs.output_scored.mol2 which contains the mol2 records of all successfully docked ligands, a vs.out file which contains the dock results of successfully docked ligands, and vs.out.1 through vs.out.7 which contain dock results from the other processors (the number will vary according to the number of processors you use; here we use 8 processors, that is 4 nodes with 2 processors each, as specified in the submission script).

    The vs.out file is returned by the leading node and contains the information for each successfully docked ligand. It looks like this:

    Molecule: ZINC33171556

    Anchors: 1
    Orientations: 500
    Conformations: 116

    Grid Score: -52.373383
    Grid_vdw: -51.643253
    Grid_es: -0.730129
    Int_energy: 6.125062
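
    Records in this format are straightforward to collect and rank from a script. The sketch below is an illustration under the assumption that every record starts with a "Molecule:" line and that successfully docked ligands have a "Grid Score:" line; parse_scores and rank_by_grid_score are hypothetical helper names, and EXAMPLE_LIGAND with its score is made up for the demonstration.

```python
def parse_scores(text):
    """Map molecule name -> grid score for every record that reports a Grid Score."""
    scores = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Molecule:"):
            current = line.split(":", 1)[1].strip()
        elif line.startswith("Grid Score:") and current is not None:
            scores[current] = float(line.split(":", 1)[1])
    return scores


def rank_by_grid_score(scores):
    """Return (name, score) pairs sorted from most to least favorable (most negative first)."""
    return sorted(scores.items(), key=lambda item: item[1])


if __name__ == "__main__":
    sample = """\
Molecule: ZINC33171556
Anchors: 1
Grid Score: -52.373383

Molecule: EXAMPLE_LIGAND
Anchors: 1
Grid Score: -60.000000
"""
    for name, score in rank_by_grid_score(parse_scores(sample)):
        print(name, score)
```

    Molecules without a "Grid Score:" line, such as ligands that failed to dock, are simply left out of the ranking.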

    The vs.out.1 through vs.out.7 files are returned by the other processors, each of which handles its own share of the ligands. For successfully docked ligands, the file reports the elapsed time for docking; for ligands that were not successfully docked, it reports an error message like this:

    Molecule: ZINC20605433

    Elapsed time: 0 seconds

    ERROR: Could not complete growth.
    Confirm that the grid box is large enough to contain the ligand,
    and try increasing max_orientations.

    You can download the multi-mol2 file from Seawulf using:

    scp sw:~/AMS536/DOCK_tutorial/06-virtual-screen/1DF8.vs.output_scored.mol2 ./

    Now that you have the multi-mol2 file 1DF8.vs.output_scored.mol2 on your local machine, you can open it in Chimera to check your docking results visually. It is not a good idea to open the multi-mol2 file directly, because it contains all 47 successfully docked ligands and the display would be cluttered. Instead, first open the receptor file 1DF8.receptor.mol2 and the ligand file 1DF8.ligand.mol2 as a reference. Then use the ViewDock function of Chimera to look at the 47 ligands one at a time, or however many you want at a time. You can find ViewDock via Tools -> Surface/Binding Analysis -> ViewDock ; then locate your 1DF8.vs.output_scored.mol2 file in your own directory and click Open. A new ViewDock window will appear, and an extra ligand will be shown in the main Chimera window; this is the ligand highlighted in the ViewDock window. You can look at the ligands one at a time, or hold Ctrl and click on different ligands to view them at the same time. This gives you a direct idea of how well these ligands dock.

    The multi-mol2 file 1DF8.vs.output_scored.mol2 also contains the energy score of every ligand, so in the ViewDock window you can go to Column -> Show -> Grid Score to show the grid score, and then click on the column header to rank all the ligands by their grid scores.

    The picture shows the best and worst scored ligands of our calculation: the best scored one in cyan, the worst scored one in magenta, and the reference ligand colored by element. As you can see, the best scored ligand fits in the binding pocket very well, but the worst scored one almost sticks out of the binding pocket.


    Example of Virtual Screening Result

    VII. Running DOCK in Serial and in Parallel on Seawulf

    The Seawulf Cluster has 235 dual processor nodes (2 processors per node), for a total of 470 individual processors. These are 3.4 GHz Intel Pentium IV Xeon processors from Dell. Each node has 2 GB attached RAM and a 40 GB hard disk.

    Typically you will use one processor on a single node to dock one ligand. If you are docking multiple ligands, you can use more than one processor in parallel mode, but you should never use more processors than you have ligands.
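
    One way to respect this rule is to count the ligands in the multi-mol2 library before choosing the processor count. The following minimal sketch relies on the fact that each molecule in a Tripos mol2 file begins with an "@<TRIPOS>MOLECULE" record; count_mol2_molecules and usable_processors are hypothetical helper names.

```python
def count_mol2_molecules(text):
    """Count molecules in mol2 text by counting '@<TRIPOS>MOLECULE' record headers."""
    return sum(1 for line in text.splitlines() if line.strip() == "@<TRIPOS>MOLECULE")


def usable_processors(requested, n_ligands):
    """Cap the requested processor count at the number of ligands (minimum of one)."""
    return max(1, min(requested, n_ligands))


if __name__ == "__main__":
    # Tiny stand-in for a ligand library with two (empty) molecule records.
    library = "@<TRIPOS>MOLECULE\nlig1\n@<TRIPOS>ATOM\n@<TRIPOS>MOLECULE\nlig2\n@<TRIPOS>ATOM\n"
    n_ligands = count_mol2_molecules(library)
    print(n_ligands, usable_processors(8, n_ligands))
```

    For the 101-molecule library used in this tutorial, a request for 8 processors would pass through unchanged, while a single-ligand pose reproduction run would be capped at one processor.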

    Running DOCK in Serial on a Single Processor

    The following sample code can be used to run DOCK on one processor on a single node:

    #!/bin/tcsh
    #PBS -l nodes=1:ppn=1
    #PBS -l walltime=01:00:00
    #PBS -N dock6
    #PBS -M [email protected]
    #PBS -j oe
    #PBS -o pbs.out

    cd /nfs/user03/sudipto/DOCK_Tutorial
    /nfs/user03/sudipto/dock6/bin/dock6 -i dock.in -o dock.out

    Here is an explanation of the code and format:

    #!/bin/tcsh                    #Execute script with tcsh
    #PBS -l nodes=1:ppn=1          #Use one node and one processor per node, so one single process
    #PBS -l walltime=01:00:00      #Allow 1 hour for your job to run
    #PBS -N dock6                  #Name of your job
    #PBS -M [email protected]     #Email address for notifications about your job
    #PBS -j oe                     #Combine the output and error streams into a single output file
    #PBS -o pbs.out                #Name of your output file

    cd /path-to-your-home-directory-on-seawulf/DOCK_Tutorial      #Change to the folder in your home directory that holds your dock files
    /nfs/user03/sudipto/dock6/bin/dock6 -i dock.in -o dock.out    #Path to the dock executable, followed by the input and output files

    Running DOCK in Parallel using MPI

    The following sample code can be used to run DOCK on 4 nodes, using both processors on each node, for a total of 8 processors.

    #!/bin/tcsh
    #PBS -l nodes=4:ppn=2
    #PBS -l walltime=24:00:00
    #PBS -o zzz.qsub.out
    #PBS -j oe
    #PBS -V

    # Count the processors assigned to this job (one line per processor in $PBS_NODEFILE)
    set nprocs = `wc -l $PBS_NODEFILE | awk '{print $1}'`
    echo "Running on ${nprocs} processors"

    cd /nfs/user03/sudipto/DOCK_Tutorial
    cp /nfs/user03/sudipto/dock6/parameters/vdw_AMBER_parm99.defn .
    cp /nfs/user03/sudipto/dock6/parameters/flex* .

    mpirun -np $nprocs /nfs/user03/sudipto/dock6/bin/dock6.mpi -i dock.in -o dock.out

    For more information, see PBS Queue

    Serial Calculation for Pose Reproduction

    Parallel Virtual Screen

    VIII. Frequently Encountered Problems

    Barbara

    How to Open Files

    It is important to remember that if you save files to your desktop as mol2 or pdb files while preparing your ligand and receptor for docking, you cannot open them directly in Windows. Windows opens pdf files with Adobe Reader, but it has no program for pdb or mol2 files.

    You can only access these files through Chimera.

    If you want to view a file in Chimera, first transfer the file from "herbie" or "seawulf" to your laptop using WinSCP. Then start Chimera on your laptop and open the files from there.

    Woo Suk

    Distinguishing overlapped spheres

    Sometimes your 1DF8.output_1.pdb contains overlapped spheres. You cannot see a hidden sphere until you delete the one on top of it, so you have to repeat the deletion for each overlapped sphere. To avoid this situation, you can adjust a parameter in the INSPH file as follows:

    1DF8.receptor.dms
    R
    X
    0.001
    4.0
    1.4
    1DF8.receptor.sph

    You can change the fourth parameter from 0.0 to other values such as 0.1, 0.01, or 0.001. Because you only want to distinguish spheres that are very close together, you do not need large numbers.

    Longfei

    Installation of DOCK

    If you want to install DOCK on your own desktop or laptop and you are new to Unix/Linux, you might encounter some problems during installation. A good reference for the installation is the DOCK User Manual (http://dock.compbio.ucsf.edu/DOCK_6/dock6_manual.htm). The problem I encountered was this: after I used gnu as the configuration file to configure the Makefile, the config.h file was created successfully, but when I tried to build the DOCK executables, the error message "g77 command not found" appeared. Installing the g77 command is quite inconvenient. One way to solve this problem is to open the config.h file and manually change g77 in that file to gfortran.
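If you prefer not to edit config.h by hand, the same substitution can be scripted. This is a minimal sketch for a POSIX shell (sh or bash, not tcsh); the function name use_gfortran is hypothetical, and you pass it the path to your own config.h. It keeps a backup copy so the change can be undone.

```shell
# Replace every occurrence of g77 with gfortran in the given file.
# The original is kept next to it with a .bak suffix.
use_gfortran() {
    cp "$1" "$1.bak" &&
    sed 's/g77/gfortran/g' "$1.bak" > "$1"
}
```

After running use_gfortran config.h, compare config.h with config.h.bak (for example with diff) to confirm that only the compiler name changed, then rebuild.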

    Use the Correct Path of Your File

    One frequently encountered problem is using the wrong path to a file. For example, when you use sphere_selector to generate spheres around the ligand, it requires two files: the spheres file (.sph) and your ligand file. Your ligand file is probably not in your working directory, so you need to either copy it to the current directory or specify its correct path, as below:


    sphere_selector 1DF8.receptor.sph ../01-dockprep/1DF8.ligand.mol2 10.0

    This is also important when you run DOCK. Remember to use the correct path and name of your file.

    Notice the Shell That You are Using

    This tutorial is based on the tcsh shell, and you may be using a different shell. Most of the time this will not cause any trouble, but it matters in some circumstances, for example when you want to execute a .csh file. One way to solve this problem is simply to type "tcsh" on the command line, which starts tcsh inside your current shell.

    Roman

    One of the problems you may encounter with grid creation is choosing the location that is sampled for ligand binding. It is important to create the grid, and dock the ligand, at the correct location in the protein. If the grid is created in the wrong location, ligand poses will only be sampled within that grid, and the best binding site will never be sampled. In our case, we are docking to an area where a ligand is known to bind to streptavidin. However, this may not always be the case.

    One solution is to dock to the center of mass of the sphere clusters created with sphgen, and to assess whether that center of mass is viable based on your chemical intuition.

    Another way is to assess the protein for buried sites where there may also be a number of spheres. Both are options in the showbox program.

    Finally, you can look for proteins with a similar sequence, determine where their ligand binding sites are, and then use that information to pick the binding site in the protein you are docking to.

    Quan

    Hui

    Keep the previous dock6 output .mol2 files

    When you run dock6, the output file is "1DF8.output_scored.mol2", where "1DF8.output" is the prefix that you specified in the dock.in file. If you run dock6 several times with the same output path (in the same directory) and want to keep the result of each run, you need to rename "1DF8.output_scored.mol2" before the next run; otherwise the old .mol2 file will be overwritten by the new one.

    For example, you can rename the .mol2 file generated by the first run and then perform the second run.

    mv 1DF8.output_scored.mol2 first_1DF8.output_scored.mol2
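If you repeat runs often, the renaming step can be wrapped in a small helper that also refuses to overwrite a copy you saved earlier. This is only a sketch for a POSIX shell (sh or bash, not tcsh); the function name keep_output and the label argument are hypothetical.

```shell
# Rename a DOCK output file by prefixing a run label, e.g.
#   keep_output 1DF8.output_scored.mol2 first
# produces first_1DF8.output_scored.mol2 in the same directory.
# Returns non-zero and leaves everything untouched if that name is taken.
keep_output() {
    # $1 = output file, $2 = label for this run
    dir=$(dirname "$1")
    base=$(basename "$1")
    target="$dir/${2}_$base"
    if [ -e "$target" ]; then
        echo "refusing to overwrite $target" >&2
        return 1
    fi
    mv "$1" "$target"
}
```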

    Tuoling

    Some recommended parameters we can change when running dock

    max_orientations 500
    min_anchor_size 40
    pruning_conformer_score_cutoff 25
    pruning_clustering_cutoff 100

    Explanations of these parameters can be found on the website provided by Longfei. Increasing the values of these parameters can increase the running time, but it enables more thorough sampling and may give you better results.

    Yuchen

    Always Remember Your Path

    One situation that often appears is that the program returns an error such as "Could not open 1DF8.ligands.mol2 for reading.", indicating that it could not find an input file you specified in your dock.in, vs.in, or some other file. The most common reason is that the path you specified in your input file is wrong. So the first thing to do is check those paths. Pay special attention to paths where you use "." to indicate the same folder and ".." to indicate the folder one level up, because this is where you are most likely to make mistakes. If you are not sure how to use "." and "..", it is safer to use the absolute paths of the files. You can use the command pwd to ask the system where you are.
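The path check described above can be partly automated before you submit a job. The sketch below is for a POSIX shell (sh or bash, not tcsh) and makes one loud assumption: every parameter in the input file whose name ends in "_file" (for example ligand_atom_file or receptor_site_file) holds a path to an input file. Paths are tested relative to the current directory, so run it from the directory where you will run dock6. The function name check_dock_paths is hypothetical.

```shell
# List the input files named in a DOCK input file that cannot be found.
# Assumption: parameters ending in "_file" hold input paths; prefixes such
# as ligand_outfile_prefix are ignored because they do not name existing files.
check_dock_paths() {
    awk '$1 ~ /_file$/ { print $1, $2 }' "$1" |
    while read -r param path; do
        [ -e "$path" ] || echo "missing: $param -> $path"
    done
}
```

Running check_dock_paths dock.in prints nothing when every listed file exists, and one "missing:" line for each path that needs fixing.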

    Yunting

    Delete Spheres Manually

    We need to play some tricks if we want to manually select the spheres in selected_spheres.sph .

    1. Convert the sph file to a pdb file using showsphere.
    2. Open the pdb file in Chimera and choose the sphere we want to delete.
    3. Identify the number of that sphere via Actions -> Label -> residue -> specifier.
    4. Open the sph file, delete that sphere, and reduce the number of spheres in that cluster accordingly.

    DOCK spheres within 6.0 ang of ligands
    cluster     1   number of spheres in cluster    22
       60  28.86000  11.09173   4.97943   2.337  586 0  0
       67  28.67840   9.74620  12.13940   1.400   60 0  0
      161  27.11509  13.42709  13.48051   1.401  385 0  0
      174  27.79940  12.82468  12.04078   2.475  480 0  0
      183  34.13903  14.01393   7.31591   3.590  691 0  0
      187  36.78641  17.02434   9.54436   3.481  691 0  0
      385  30.01753  14.79655  14.46633   1.402  174 0  0
      463  25.74631  14.83304   9.98248   1.400  589 0  0
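Step 4 is easy to get wrong by hand, because the sphere line and the count in the cluster header must change together. The following sketch for a POSIX shell (sh or bash, not tcsh) does both at once. It assumes a file with a single cluster, laid out as in the listing above, with the sphere number in the first column. The function name delete_sphere is hypothetical, and the result is written to standard output so the original file is left untouched.

```shell
# Remove one sphere (by its number in column 1) from a single-cluster .sph
# file and lower the count in the "cluster ..." header line to match.
delete_sphere() {
    # $1 = sphere file, $2 = number of the sphere to remove
    awk -v target="$2" '
        { line[NR] = $0 }
        /^cluster/ { header = NR; next }
        header && $1 == target { drop[NR] = 1; removed++ }
        END {
            for (i = 1; i <= NR; i++) {
                if (i in drop) continue
                if (i == header && removed) {
                    n = split(line[i], f, " ")
                    fmt = "%" length(f[n]) "d"      # keep the field width
                    updated = sprintf(fmt, f[n] - removed)
                    sub(/[0-9]+[ \t]*$/, updated, line[i])
                }
                print line[i]
            }
        }' "$1"
}
```

For example, delete_sphere selected_spheres.sph 67 > selected_spheres.new.sph writes a copy without sphere 67; inspect the new file before using it in place of the original.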

    Kip

    Keeping your data in sync

    One problem you may encounter is that you want to run your jobs on Seawulf, but you also want the same files on a mathlab computer (or your home computer) for analysis. One option is to use rsync to keep your files on Seawulf and Herbie (or any mathlab computer) in sync. For example, if you want to move your DOCK Tutorial files to Seawulf, do the following from Herbie or any mathlab computer:

    rsync -arv /home/username/DOCK_Tutorial/ sw:/nfs/user03/username/DOCK_Tutorial

    Note: The trailing slash on /home/username/DOCK_Tutorial/ means that rsync will only copy the contents of your DOCK_Tutorial folder. If you want to copy the DOCK_Tutorial folder itself, as well as its contents, then remove the trailing /.

    To copy newer files from Seawulf back to Herbie, do the following from Seawulf:

    rsync -arv /nfs/user03/username/DOCK_Tutorial/ [email protected]:/home/username/DOCK_Tutorial

    You can apply this same strategy to sync files with your home computer as well. Use the following format:

    rsync -arv source target

    Note: this is easiest if you are using a Linux or Mac computer at home.

    Deleting jobs

    If you make a mistake and need to delete a job from the queue, first list all your queued or running jobs:

    qstat -u kip

    IMPORTANT: If you get no output from qstat, it means that whatever jobs you have submitted are done!

    If you do have jobs running or waiting in the queue, qstat will output a list that includes their job id(s). Find the job id for the one you want to delete, and do:


    qdel jobid

    Retrieved from "http://ringo.ams.sunysb.edu/index.php?title=2012_DOCK_tutorial_with_Streptavidin&oldid=7190"

    This page was last modified on 1 March 2012, at 18:06. This page has been accessed 652 times.
