Browser Scatter Plot


  • 8/8/2019 Browser Scatter Plot

    1/51

    Vipendra Singh,

    Intern, ChEMBL Group

    European Bioinformatics Institute

    Final review meeting date: 17/06/10

    Division of Bioinformatics

    SBST

    Data Visualization Methods for Exploring the ChEMBL Chemogenomics Database

    Vipendra Singh

  • 2/51

    Contents:

  • 3/51

    17.05.2010

    1. Introduction

  • 4/51

    Introduction

    Rapidly increasing data complexity and volume.

    This is especially important in datasets containing systems biology and chemical biology data.

    The ability to visualize and browse chemical databases has become a limiting factor.

    We propose a new metaphor for data visualization and organism browsing on top of the ChEMBL database.

  • 5/51

    Introduction

    ChEMBLdb is a new public-domain resource linking chemical tools and drugs to their bioactivities and molecular targets.

    The concepts underlying our approach are based on navigation and visualization with zooming and portals.

  • 6/51

    About the ChEMBL Group

    The ChEMBL team's research focuses on mapping the interactions and functional effects of small molecules binding to their macromolecular targets.

    The group studies the interactions of pharmacologically active molecules and their receptors.

    In particular, the group builds and maintains a series of drug discovery databases that are components of ChEMBL.

    ChEMBL includes:

    ChEMBLdb - medicinal chemistry database

    Kinase SARfari - www.sarfari.org

  • 7/51

    About ChEMBLdb

    ChEMBL is a database of bioactive drug-like small molecules.

    Contains:

    2-D structures

    Calculated properties (e.g. logP, molecular weight, Lipinski parameters) and

    Abstracted bioactivities (e.g. binding constants, pharmacology and ADMET data)

    Where does the data come from?

    Data is abstracted and curated from the primary scientific literature.

    How is the data supported?

    Funding from the Wellcome Trust

  • 8/51

    About ChEMBLdb: Database Statistics (from last release)

    Using database: CHEMBL_02

    Created on: 30-NOV-09

    Protein targets: 4,364

    Cell-line targets: 1,198

    Other targets (organisms, etc.): 1,630

    Total targets: 7,192

    Total activities: 2,404,622

    Total compounds: 622,824

    Total publications: 33,956

  • 9/51

    About ChEMBLdb (current release)

  • 10/51

    About Organism Browser

    Organism Browser is a GUI (Graphical User Interface) for the Organism Class table in the ChEMBL schema.

    Within the ChEMBL schema, there is an N:1 mapping between the Target Dictionary table and the Organism table (many targets map to one organism).
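The N:1 mapping can be illustrated with a minimal sketch that uses Python's built-in sqlite3 in place of Oracle. It assumes cut-down, hypothetical versions of the two tables joined on tax_id, the key named under Organism Browser Status. All other column names and the sample rows are illustrative and do not come from the real ChEMBL schema.

```python
import sqlite3

# Hypothetical, cut-down tables; the real ChEMBL schema has many more columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE organism_class (
        tax_id   INTEGER PRIMARY KEY,  -- the "1" side: one row per organism
        l1       TEXT,                 -- top level of the hierarchy
        organism TEXT
    );
    CREATE TABLE target_dictionary (
        tid       INTEGER PRIMARY KEY,
        pref_name TEXT,
        tax_id    INTEGER REFERENCES organism_class(tax_id)  -- the "N" side
    );
    INSERT INTO organism_class VALUES (9606, 'Eukaryotes', 'Homo sapiens');
    INSERT INTO organism_class VALUES (562, 'Bacteria', 'Escherichia coli');
    INSERT INTO target_dictionary VALUES (1, 'Target A', 9606);
    INSERT INTO target_dictionary VALUES (2, 'Target B', 9606);
    INSERT INTO target_dictionary VALUES (3, 'Target C', 562);
""")

# Many targets collapse onto one organism row through the shared tax_id.
targets_per_organism = conn.execute("""
    SELECT o.organism, COUNT(t.tid)
    FROM target_dictionary t
    JOIN organism_class o ON o.tax_id = t.tax_id
    GROUP BY o.organism
    ORDER BY o.organism
""").fetchall()
print(targets_per_organism)  # [('Escherichia coli', 1), ('Homo sapiens', 2)]
```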

  • 11/51

    About Organism Browser

    ChEMBL schema

  • 12/51

    About Organism Browser (GUI)

    Web version: 1.0.0

  • 13/51

    About Organism Browser (GUI), Web version: 1.0.0

    Features:

    Dynamic graphic display (pie chart)

    Clickable tree

    Searching and highlighting nodes in the tree

  • 14/51

    How does it work?

    On every click on the jQuery tree, a request is made to the Google Charts API:

    a) The request is processed on the Google server.

    Output: a colourful static image (a pie chart in our case)

    b) The parent node level (e.g. L0, L1 ... LN) and the entity (e.g. Eukaryote, Bacteria) are stored in a session variable

    - a function in the controller (CodeIgniter's MVC architecture) is called

    - the children of the selected parent are retrieved

    - the tree expands
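The click flow above can be sketched in Python as a hypothetical stand-in for the PHP controller. A dict plays the session, and a hard-coded mapping plays the organism hierarchy; the node names and counts are invented. The chart request uses the Google Image Charts URL parameters cht (chart type, p for pie), chs (size), chd (data) and chl (labels). Google has since deprecated that API, so treat the URL as illustrative.

```python
from urllib.parse import urlencode

# Hypothetical stand-ins: a dict for the PHP session and a small, invented hierarchy
# mapping (level, entity) -> {child entity: number of entries under it}.
session = {}
HIERARCHY = {
    ("L0", "root"): {"Eukaryotes": 120, "Bacteria": 45, "Viruses": 12},
    ("L1", "Eukaryotes"): {"Mammalia": 80, "Insecta": 40},
}

def pie_chart_url(counts, size="400x200"):
    """Build a Google Image Charts pie-chart URL from {label: count}."""
    labels = list(counts)
    total = sum(counts.values())
    # Basic text encoding (t:) expects values in 0-100, so send percentages.
    shares = [round(100 * counts[k] / total, 1) for k in labels]
    params = {
        "cht": "p",
        "chs": size,
        "chd": "t:" + ",".join(str(s) for s in shares),
        "chl": "|".join(labels),
    }
    return "https://chart.googleapis.com/chart?" + urlencode(params)

def on_node_click(level, entity):
    """One click on the tree: remember the parent node, then fetch its children."""
    session["parent_level"] = level    # e.g. L0, L1, ..., LN
    session["parent_entity"] = entity  # e.g. Eukaryotes, Bacteria
    children = HIERARCHY.get((level, entity), {})
    url = pie_chart_url(children) if children else None
    return children, url  # children expand the tree; url refreshes the pie chart
```

Keeping the parent node in the session lets the next request know where in the tree the user is, without the client resending the whole path.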

  • 15/51

  • 16/51

    About Organism Browser (GUI): Search Hits Result Page

  • 17/51

    Scatterplot Module:

    Introduction

    Interactive data visualization for viewing the hits (against the client's query) from the ChEMBL database.

    Scope

    2D scatterplot.

    Datapoints coloured by Lipinski's Rule of Five and by boundary conditions such as molecular weight, logP, HBA, HBD and PSA.

    Identify potential drug targets.
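Colouring by the Rule of Five can be sketched in Python under a few assumptions. It uses the standard Lipinski cut-offs: molecular weight at most 500, logP at most 5, at most 5 H-bond donors and at most 10 H-bond acceptors. The Compound type and the three series colours are hypothetical and are not the module's actual code.

```python
from dataclasses import dataclass

@dataclass
class Compound:
    chembl_id: str
    mw: float     # molecular weight (Da)
    alogp: float  # calculated logP
    hbd: int      # hydrogen-bond donors
    hba: int      # hydrogen-bond acceptors

def ro5_violations(c: Compound) -> int:
    """Count how many of Lipinski's four Rule-of-Five limits a compound breaks."""
    return sum([c.mw > 500, c.alogp > 5, c.hbd > 5, c.hba > 10])

# Hypothetical colour per series: 0, 1, and 2+ violations.
SERIES_COLOURS = {0: "#2ca02c", 1: "#ff7f0e", 2: "#d62728"}

def colour_for(c: Compound) -> str:
    """Pick the series colour; two or more violations share one colour."""
    return SERIES_COLOURS[min(ro5_violations(c), 2)]
```

Each point would then go into the series matching its violation count. Toggling that series then shows or hides every compound with that many violations.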

  • 18/51

    Existing system and Procedures (In use)

    Almost all graphical libraries generate static images for scatterplots.

    None of the other JS libraries supports heavy user interaction.

    Highcharts: the best available JS library for our task.

    Why not Flash? It requires an additional plug-in on the client side (the client says bye-bye).

  • 19/51

    a) Proposed / Expected Design

  • 20/51

    Use case: Select Interaction Analysis Module (primary scenario)

    Step, actor action / system response:

    1 User: selects the plot option and updates the graph.

    2 System: displays the scatterplot with the corresponding X-Y axes and stores the event in the session.

    3 User: changes the settings and updates the graph.

    4 System: displays the scatterplot and overwrites the session variables.

    5 User: clicks the link for the colour options.

    6 System: shows a Highslide JS popup box with the colour options.

    7 User: clicks the input box and changes the colour in the colour picker.

    8 System: an XMLHttpRequest is sent to the server, the session variables are overwritten and the series is redrawn.

    9 User: presses the mouse button, drags over the chart and finally releases it.

    10 System: a zoom-in event occurs and a "Reset zoom" button appears on the chart.

    11 User: disables / enables a series.

    12 System: the series disappears / appears.
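On the server side, this scenario mostly means reading and overwriting a few session values. Below is a minimal Python sketch of that, with a plain dict standing in for the PHP session. All function and key names are hypothetical.

```python
# Hypothetical stand-in for the PHP session used by the scatterplot module.
session = {}

def update_plot(x_axis, y_axis):
    """Steps 1-4: store the chosen axes, overwriting any earlier choice."""
    session["x_axis"] = x_axis
    session["y_axis"] = y_axis
    return {"x": x_axis, "y": y_axis}

def set_series_colour(series_name, colour):
    """Steps 7-8: the colour picker posts a new colour; the server overwrites it."""
    session.setdefault("colours", {})[series_name] = colour
    return {"redraw": series_name, "colour": colour}

def toggle_series(series_name):
    """Steps 11-12: hide a visible series or show a hidden one; returns visibility."""
    hidden = session.setdefault("hidden", set())
    if series_name in hidden:
        hidden.remove(series_name)
        return True   # series reappears
    hidden.add(series_name)
    return False      # series disappears
```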

  • 21/51

  • 22/51

    Model View Controller

  • 23/51

    Model View Controller: Why MVC?

    1) Effective separation of logic and view.

    2) Speeds up development.

    Components of MVC:

    1) Controller: loads the model and the view.

    2) Model: performs database operations such as SELECT, INSERT and UPDATE.

    3) View: displays the HTML content.
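A minimal Python sketch can illustrate how the three components divide the work. CodeIgniter itself is PHP, so this is not its API; the class and method names are hypothetical, and a list stands in for the Oracle tables.

```python
import html

class CompoundModel:
    """Model: the only layer that touches the data (a list stands in for Oracle)."""

    def __init__(self, rows):
        self._rows = rows

    def select_by_max_weight(self, max_mw):
        # Equivalent in spirit to: SELECT ... WHERE molweight <= :max_mw
        return [r for r in self._rows if r["mw"] <= max_mw]

class CompoundView:
    """View: turns data into HTML and knows nothing about where it came from."""

    @staticmethod
    def render(rows):
        items = "".join(f"<li>{html.escape(r['id'])}</li>" for r in rows)
        return f"<ul>{items}</ul>"

class CompoundController:
    """Controller: loads the model and the view and routes one request through them."""

    def __init__(self, model, view):
        self.model = model
        self.view = view

    def index(self, max_mw=500):
        return self.view.render(self.model.select_by_max_weight(max_mw))
```

Because the view only receives plain rows, the HTML can change without touching any query, and the query can change without touching any HTML.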

  • 24/51

    26.12.2010

    Technologies Used for Organism Browser:

    1) PHP

    2) JavaScript

    3) jQuery

    4) Google Charts

    5) SQL

    6) PL/SQL

    Database used:

    1) Oracle 11g

    Materials and Methodology

  • 25/51

  • 26/51

    Software used:

    1) CodeIgniter (Model View Controller framework)

    2) Eclipse

    3) SQL Developer

    4) Firebug (for testing)

    Framework used:

    CodeIgniter (PHP framework)

    Materials and Methodology

  • 27/51

    Software used:

    For working with the repository:

    1) Subversion

    2) Trac

    Operating system:

    1) Mac OS X

    Software for running Windows on Mac OS X:

    1) Virtual machine

    Materials and Methodology

  • 28/51

  • 29/51

    Major Issues in Development

    1) Compatibility issues between the latest versions of PHP and CodeIgniter (Y)

    (a common problem with open-source projects)

    2) Compatibility issues between the latest versions of Mozilla and Firebug (Y)

    (installed older versions of Mozilla and Firebug)

  • 30/51

    Scatterplot Functional Overview

    Zoom in / out

    Clickable datapoints

    Adjustable datapoint size

    Adjustable chart size

    (for best performance at varying desktop resolutions)

  • 31/51

    Scatterplot Functional Overview

    Colour picker for changing the series colour.

    Tooltip bubble with the compound image, X-Y coordinates, ChEMBL ID and series details.

    Data structure storing the X-Y coordinates of each compound.

    Popup showing the number of compounds represented by a given datapoint.
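One way to back both the coordinate store and the compound-count popup is a dictionary keyed by (x, y). Here is a minimal Python sketch. It assumes exact coordinate matches with no binning and uses hypothetical field names.

```python
from collections import defaultdict

def group_by_point(compounds, x_key, y_key):
    """Map each plotted (x, y) coordinate to the IDs of every compound at that spot."""
    points = defaultdict(list)
    for c in compounds:
        points[(c[x_key], c[y_key])].append(c["id"])
    return dict(points)

# Hypothetical rows; field names are illustrative only.
compounds = [
    {"id": "cmpd-1", "mw": 300.0, "alogp": 2.0},
    {"id": "cmpd-2", "mw": 300.0, "alogp": 2.0},
    {"id": "cmpd-3", "mw": 410.5, "alogp": 3.1},
]
points = group_by_point(compounds, "mw", "alogp")
# The popup count for a datapoint is just the length of its ID list.
print(len(points[(300.0, 2.0)]))  # 2
```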

  • 32/51

    Scatterplot Functional Overview

    Link to ChEMBL report card.

    Flickr/Facebook-like top menu bar.

    Display of the time taken to execute the SQL query.

    Export options
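Timing the query can be sketched in Python, with sqlite3 standing in for Oracle and a hypothetical hits table. The timer wraps both execution and fetching, because a database driver may return rows lazily.

```python
import sqlite3
import time

def timed_query(conn, sql, params=()):
    """Run a query and return (rows, elapsed_seconds) for display next to the chart."""
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    elapsed = time.perf_counter() - start
    return rows, elapsed

# sqlite3 stands in for Oracle; the hits table is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hits (molregno INTEGER, alogp REAL)")
conn.executemany("INSERT INTO hits VALUES (?, ?)", [(1, 2.5), (2, 4.1)])
rows, elapsed = timed_query(
    conn, "SELECT molregno FROM hits WHERE alogp < ? ORDER BY molregno", (5,)
)
print(f"{len(rows)} rows in {elapsed * 1000:.2f} ms")
```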

  • 33/51

    Testing:

    Initial testing of the performance of the Highcharts JS library:

    HCTEST_SMALL: 127 rows of data

    HCTEST_MEDIUM: 798 rows of data

    HCTEST_LARGE: 20,284 rows of data

  • 34/51

    Each table contains the following columns:

    MOLREGNO

    CHEBI_ID

    MOLWEIGHT

    SYNONYMS

    ALOGP

    PSA

    HBA

    HBD

    RO5

    SIMILARITY

    CHIME

    TAG

  • 35/51

    Overview of the browsers used to access the ChEMBL interface (Firefox is the most popular):

  • 36/51

    Rendering time for 2,000 datapoints, tested with and without tooltips in all the major browsers:

    Browser          With tooltip     Without tooltip
    1) Safari        2-3 sec          1-2 sec
    2) Chrome        2-3 sec          1-2 sec
    3) Firefox       5-6 sec          1-2 sec
    4) Opera         3-4 sec          2-3 sec
    5) IE            not feasible     3-4 sec
    6) SeaMonkey     5-6 sec          1-2 sec

  • 37/51

    For 20,000 datapoints (zoom disabled):

    Firefox: 6 sec

    Safari: 9-10 sec

    Opera: 35-36 sec

    Chrome: 3-4 sec

    IE: not feasible

    For 5,000 datapoints (tooltip and zoom disabled):

    IE: 5-6 sec

    For 500 datapoints:

    All options enabled

  • 38/51

    > 500 datapoints: zoom disabled in IE

    > 1,500 datapoints: zoom disabled in Firefox

    > 5,000 datapoints: zoom disabled in all browsers

    Best performance with Safari and Google Chrome (both use the same engine, i.e. WebKit)

    Can show well over 20,000 datapoints (good performance at the 20,000 datapoints tested)
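The zoom thresholds above translate directly into a small policy function, sketched here in Python. The browser keys are hypothetical. How the browser is detected (for example from the user-agent string) is assumed and not shown.

```python
# Datapoint counts above which zoom was switched off, per the test results above.
# Hypothetical browser keys; browser detection (e.g. user-agent parsing) is not shown.
BROWSER_ZOOM_LIMITS = {"ie": 500, "firefox": 1500}
ALL_BROWSERS_ZOOM_LIMIT = 5000

def zoom_enabled(browser: str, n_points: int) -> bool:
    """Return True if zoom should be offered for this browser and plot size."""
    if n_points > ALL_BROWSERS_ZOOM_LIMIT:
        return False
    limit = BROWSER_ZOOM_LIMITS.get(browser.lower())
    return limit is None or n_points <= limit
```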

  • 39/51

  • 40/51

    Final plot options dropdown (optimised to be user-friendly)

    Initial plot options dropdown (too long)

  • 41/51

    Colour by: molecular weight boundaries

    Colour by: RO5 violations

  • 42/51

    Snapshot of the application

  • 43/51

    Organism Browser Status

    Created the organism_class table in the chembl_app_01 schema

    ORGANISM_CLASS joined to TARGET_DICTIONARY using TAX_ID (unique identifier)

    Created the organism browser code using the target browser code as a template.

    Organism browser functions in the same way as the target browser.

  • 44/51

    Organism Browser Status

    Added further functionality, such as searching in the tree.

    Tree search is case-insensitive

    Added wildcards to searches

    (e.g. a search for abc matches the string -----abc-------)
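In SQL, a case-insensitive wildcard search can be written by upper-casing both sides and wrapping the term in % wildcards. Below is a minimal sketch with sqlite3 standing in for Oracle and a hypothetical organism column. Note that a user-typed % or _ would itself act as a wildcard unless it is escaped.

```python
import sqlite3

# sqlite3 stands in for Oracle; the table and its rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE organism_class (tax_id INTEGER PRIMARY KEY, organism TEXT)")
conn.executemany(
    "INSERT INTO organism_class VALUES (?, ?)",
    [(9606, "Homo sapiens"), (10090, "Mus musculus"), (562, "Escherichia coli")],
)

def search_tree(term):
    """Case-insensitive 'contains' search: the term is wrapped in % wildcards."""
    sql = (
        "SELECT organism FROM organism_class "
        "WHERE UPPER(organism) LIKE '%' || UPPER(?) || '%' "
        "ORDER BY organism"
    )
    return [name for (name,) in conn.execute(sql, (term,))]

print(search_tree("SAPI"))  # ['Homo sapiens']
```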

  • 45/51

    Snapshot of Organism Browser

  • 46/51

    Target result page

  • 47/51

    4. Conclusion and Future Prospects


  • 48/51

    Further enhancements to the scatterplot?

  • 49/51

    iGoogle-like draggable ChEMBL analytics dashboard

  • 50/51

    iGoogle-like draggable ChEMBL analytics dashboard

  • 51/51

    Have a good day ahead

    Thank you for your patience