
SureChEMBL – Tech Track Session

ELIXIR Innovation and SME forum

Mark Davies

Technical Lead

ChEMBL Group


Outline

• Background

• Patent data

• Coverage and content

• Capabilities

• Future plans

• SureChEMBL interface demo

• myChEMBL example

• SureChEMBL exercises


Background


What is EMBL-EBI?

• Part of the European Molecular Biology Laboratory

• International, non-profit research institute

• Europe’s hub for biological data services and research

• 500 members of staff from 53 nations.


EMBL-EBI resources & groups

• Genes, genomes & variation: European Nucleotide Archive, 1000 Genomes, Ensembl, Ensembl Genomes, European Genome-phenome Archive, Metagenomics portal

• Gene, protein & metabolite expression: ArrayExpress, Expression Atlas, Metabolights, PRIDE

• Protein sequences, families & motifs: InterPro, Pfam, UniProt

• Molecular structures: Protein Data Bank in Europe, Electron Microscopy Data Bank

• Chemical biology: ChEMBL, ChEBI

• Reactions, interactions & pathways: IntAct, Reactome, MetaboLights

• Systems: BioModels, Enzyme Portal, BioSamples

• Literature & ontologies: Europe PubMed Central, Gene Ontology, Experimental Factor Ontology


ChEMBL: Data for drug discovery

Figure: compound, assay/target (e.g. thrombin) and bioactivity data (e.g. Ki = 4.5 nM; APTT = 11 min.)

1. Scientific facts

2. Organization, integration, curation and standardization of pharmacology data

3. Insight, tools and resources for translational drug discovery


Patent data

• Historically a closed and costly data source

• Out of reach to many academics and SMEs

• Patent literature 2-3 years ahead of published literature

• Prior art and freedom to operate

• Competitor intelligence

• Provides access to lots more data

• High cost to extract and lots of noise

Do we include patent data in the ChEMBL database?


Patent Data


What is a patent?

• patere (Latin) = to lay open

• Legal and technical documents

• Agreement between Inventor and State

• Disclosure of invention in exchange for exclusive rights

• Usually lasts 20 years

• Requires:

• Novelty, utility and inventive step

• Part of IP legislation, controlled by international treaties


Parts of a patent document: front page, claims, description/examples


Patent authorities


Patent families

• As defined by the EPO:

• A patent family is a set of either patent applications or publications taken in multiple countries to protect a single invention by a common inventor(s) and then patented in more than one country.

• A first application is made in one country – the priority – and is then extended to other offices.
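To make the family concept concrete, the sketch below groups publications that claim exactly the same set of priority applications, which is roughly the EPO's notion of a "simple" family. It is an illustration under that assumption, not SureChEMBL's or the EPO's family-building code, and every publication and priority number in it is made up.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

# (publication number, priority application numbers); all values are hypothetical.
Publication = Tuple[str, Tuple[str, ...]]


def group_into_families(pubs: List[Publication]) -> Dict[FrozenSet[str], List[str]]:
    """Group publications that claim exactly the same set of priorities.

    A simplified 'simple family' rule used for illustration only.
    """
    families: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for pub_number, priorities in pubs:
        families[frozenset(priorities)].append(pub_number)
    return dict(families)


pubs = [
    ("WO-2000000001-A1", ("GB-0000001-A",)),  # first filing extended via PCT
    ("EP-1000001-A1", ("GB-0000001-A",)),     # ...and to the EPO
    ("US-6000001-B1", ("GB-0000001-A",)),     # ...and to the USPTO
    ("EP-1000002-A1", ("DE-0000002-A",)),     # an unrelated invention
]
families = group_into_families(pubs)
for priorities, members in families.items():
    print(sorted(priorities), members)
```

Real family data also has to cope with multiple and partial priorities, which is why the EPO distinguishes simple from extended families; the sketch only covers the exact-match case.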


Types of pharmaceutical patents

• Protein sequences

• Substances & compounds (composition of matter)

• Manufacturing processes

• Formulations/dosing

• Fixed-dose combinations

• Indications and uses


Patent classifications

• Classification Systems

• International Patent Classification (IPC/IPCR)

• Cooperative Patent Classification (CPC)

• European CLAssification (ECLA)

• United States Patent Classification (USPC)

• Example: C07D 239/94 (C07D: Heterocyclic compounds)

• Example: A61K 31/505 (A61K: Preparations for medical, dental or toilet purposes)
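Classification symbols such as C07D 239/94 are hierarchical: section (C), class (C07), subclass (C07D), main group (239/00) and subgroup (239/94). A minimal sketch that splits an IPC symbol into those levels, assuming the common "subclass, space, group/subgroup" notation; the regular expression is an illustration, not an official validator.

```python
import re
from typing import Dict

# Section letter (A-H), two-digit class, subclass letter, then main group / subgroup.
IPC_PATTERN = re.compile(r"^([A-H])(\d{2})([A-Z])\s*(\d{1,4})/(\d{2,6})$")


def ipc_hierarchy(symbol: str) -> Dict[str, str]:
    """Split an IPC symbol such as 'C07D 239/94' into its hierarchy levels."""
    m = IPC_PATTERN.match(symbol.strip())
    if m is None:
        raise ValueError(f"not an IPC symbol: {symbol!r}")
    section, cls, subclass, main_group, subgroup = m.groups()
    prefix = section + cls + subclass
    return {
        "section": section,
        "class": section + cls,
        "subclass": prefix,
        "main_group": f"{prefix} {main_group}/00",
        "subgroup": f"{prefix} {main_group}/{subgroup}",
    }


for code in ("C07D 239/94", "A61K 31/505"):
    print(ipc_hierarchy(code))
```

The subclass level is the one queried in the exercise answers later in this deck (e.g. ic:C07D).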


International Patent Classification

http://web2.wipo.int/ipcpub


Chemical entities in patents

• Markush structures

• Molecular formulas (e.g. C2H5)

• IUPAC nomenclature (e.g. propyl, acetylsalicylic acid)

• Trivial and trade names (e.g. aspirin)

• Images of 2D structures

• Non-structural entities (e.g. ‘pharmaceutically acceptable salt’)

• Supplementary files (e.g. molfiles, chemdraw)

• Identifiers (e.g. CAS numbers)

Figure: a Markush structure vs. an exemplified structure


Why is searching chemical patents useful?

• Infringement search to avoid areas of valid patent protection (freedom to operate)

• Search for industrial profiles and research directions (competitive intelligence)

• State-of-the-art search*

• e.g. find claimed EGFR inhibitors published in 2014

• Novel heterocyclic scaffolds / reaction schemes

• Search for citations and key references

• Most of the knowledge in chemical patents will never appear anywhere else


How can one search for chemical patents?

• http://worldwide.espacenet.com

• http://www.lens.org/lens

• http://patentscope.wipo.int

• https://www.google.com/patents


However…

• Thomson Pharma ($)

• CAS SciFinder ($)

• Elsevier Reaxys ($)

• IBM SIIPS ($)

• SureChEMBL


SureChEMBL


SureChem becomes SureChEMBL

• In December 2013, EMBL-EBI acquired SureChem, a leading chemistry patent-mining product, from Digital Science (Macmillan Group)

• SureChem was not aligned with Digital Science's core future academic business

• Existing SureChem user base

• Free (SureChemOpen)

• Paying (SureChemPro + API)

• EMBL-EBI supported existing licensees during transition

• EMBL-EBI provides an ongoing, free and open resource to the entire community

• Rebranded as SureChEMBL


Rebranding process


SureChEMBL patent coverage (data, description & languages, years)

• EP applications: bibliographic data (DocDB + original) and full text (original: EN, DE, FR), from 1978

• EP granted: bibliographic data (DocDB + original) and full text (original: EN, DE, FR), from 1980

• WO applications: bibliographic data (DocDB + original) and full text (original: EN, DE, FR, ES, RU), from 1978

• US applications: bibliographic data (DocDB + original) and full text (original: EN), from 2001

• US granted: bibliographic data (DocDB + original) from 1920; full text (original: EN) from 1976

• JP applications: bibliographic data (DocDB) from 1973; PAJ (Patent Abstracts of Japan) English abstracts/titles from 1976

• JP granted: bibliographic data (DocDB) from 1994

• 90+ countries: bibliographic data (DocDB) from 1920


SureChEMBL chemistry data coverage

• Structures from text: 1976 onwards

• Title, abstract, claims, description

• IUPAC, trivial, drug names, etc.

• Proprietary SureChem chemical entity recognition algorithm

• Name-to-structure conversion with ACD/Labs, ChemAxon, OpenEye, OPSIN and PerkinElmer tools

• Structures from images: 2007 onwards

• CLiDE image-to-structure conversion

• The USPTO has offered ‘Complex Work Units’ (CWUs) since 2001

• CWU file types include MOL and CDX

• CWUs processed as part of the pipeline: 2007 onwards


SureChEMBL data pipeline

Figure components: patent offices (WO; EP applications & granted; US applications & granted; JP abstracts); processed patents (IFI Claims); OCR; patent PDFs (service); the SureChEMBL system with entity recognition (SureChem IP), name to structure (five methods) and image to structure (one method); chemistry database; database; application server; API; users.

Example chemical name shown in the figure: 1-[4-ethoxy-3-(6,7-dihydro-1-methyl-7-oxo-3-propyl-1H-pyrazolo[4,3-d]pyrimidin-5-yl)phenylsulfonyl]-4-methylpiperazine


SureChEMBL user interface

https://www.surechembl.org/


SureChEMBL data content (27/10/14)

• 15,893,365 unique compounds

• 13,046,249 annotated patents

• ~80,000 novel compounds extracted from ~50,000 new patents monthly

• 2–7 days for a published patent to be chemically annotated and searchable in SureChEMBL

• SureChEMBL provides search access to all patents (not just chemically annotated ones)

• ~120M patents


EMBL-EBI chemistry resources

RDF and REST API interfaces

• UniChem: InChI-based chemical resolver (full + relaxed ‘lenses’), >70M; REST API interface: https://www.ebi.ac.uk/unichem/

• Atlas: ligand-induced transcript response, 750

• PDBe: ligand structures from structurally defined protein complexes, 15K

• ChEBI: nomenclature of primary and secondary metabolites; chemical ontology, 24K

• SureChEMBL: chemical structures from patent literature, ~16M

• ChEMBL: bioactivity data from literature and depositions, 1.5M

• 3rd-party data: ZINC, PubChem, ThomsonPharma DOTF, IUPHAR, DrugBank, KEGG, NIH NCC, eMolecules, FDA SRS, PharmGKB, Selleck, …, ~55M


SureChEMBL compound data access

• UniChem (“Universal Compound Resolver”)

• Weekly updates

• Web service lookup

• Connectivity search

• https://www.ebi.ac.uk/unichem/

• FTP download

• Quarterly updates

• All SureChEMBL compounds in SDF and CSV format

• Raw data

• ftp://ftp.ebi.ac.uk/pub/databases/chembl/SureChEMBL/
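The FTP dump is plain SDF/CSV, so it can be processed with ordinary tooling. The exact file names and data-field names in the directory are not given here, so the sketch below reads the data items ("> <NAME>" blocks) of any SD file generically with the standard library; the sample record and its ID and MW fields are made up.

```python
import io
from typing import Dict, Iterator, List, Optional, TextIO


def iter_sdf_fields(handle: TextIO) -> Iterator[Dict[str, str]]:
    """Yield the data items ('> <NAME>' blocks) of each record in an SD file."""
    in_data = False                 # True once the record's 'M  END' line is seen
    fields: Dict[str, str] = {}
    name: Optional[str] = None      # data item currently being read
    values: List[str] = []
    for raw in handle:
        line = raw.rstrip("\r\n")
        if line == "$$$$":          # end of record
            if name is not None:
                fields[name] = "\n".join(values)
            yield fields
            in_data, fields, name, values = False, {}, None, []
        elif not in_data:           # still inside the molfile block
            in_data = line.startswith("M  END")
        elif line.startswith(">"):  # data header, e.g. '> <ID>'
            if name is not None:
                fields[name] = "\n".join(values)
            start = line.find("<")
            end = line.find(">", start + 1)
            name = line[start + 1:end] if 0 <= start < end else None
            values = []
        elif line == "":            # a blank line terminates the value
            if name is not None:
                fields[name] = "\n".join(values)
            name, values = None, []
        elif name is not None:
            values.append(line)


SAMPLE = "\n".join([
    "example",
    "  sketch",
    "",
    "  1  0  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "M  END",
    "> <ID>",
    "EXAMPLE-1",
    "",
    "> <MW>",
    "16.04",
    "",
    "$$$$",
]) + "\n"

records = list(iter_sdf_fields(io.StringIO(SAMPLE)))
print(records)  # [{'ID': 'EXAMPLE-1', 'MW': '16.04'}]
```

If a downloaded file turns out to be gzip-compressed, `gzip.open(path, "rt")` yields a text handle that can be passed straight to `iter_sdf_fields`; the CSV files can be read with `csv.DictReader` in the same streaming way.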


ChEMBL – SureChEMBL overlap (ChEMBL 18)

• InChI-based comparison using filtered parent compounds

• Venn diagram: SureChEMBL 12.2M, ChEMBL 1.3M, overlap 235K (18.4%)

• Filters:

• MW between 100 and 1200

• #Atoms between 6 and 70

• ALogP between -10 and 10

• #C > 0

• #Rings > 0

• #C != #Atoms

• RTB (rotatable bonds) <= 20
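The overlap filters can be expressed as a single predicate over precomputed descriptors. This sketch assumes the descriptors were already calculated with a cheminformatics toolkit, that "#Atoms" means heavy atoms and that the "between" bounds are inclusive; the `Descriptors` class is hypothetical and the aspirin and benzene values are illustrative, not taken from ChEMBL.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed descriptors for one parent compound (hypothetical container)."""
    mw: float
    heavy_atoms: int
    alogp: float
    carbons: int
    rings: int
    rotatable_bonds: int


def passes_overlap_filters(d: Descriptors) -> bool:
    """Apply the slide's filters; bounds assumed inclusive."""
    return (
        100 <= d.mw <= 1200
        and 6 <= d.heavy_atoms <= 70
        and -10 <= d.alogp <= 10
        and d.carbons > 0                  # must contain carbon
        and d.rings > 0                    # must contain at least one ring
        and d.carbons != d.heavy_atoms     # not an all-carbon skeleton
        and d.rotatable_bonds <= 20
    )


aspirin = Descriptors(mw=180.16, heavy_atoms=13, alogp=1.2, carbons=9, rings=1, rotatable_bonds=3)
benzene = Descriptors(mw=78.11, heavy_atoms=6, alogp=1.7, carbons=6, rings=1, rotatable_bonds=0)
print(passes_overlap_filters(aspirin), passes_overlap_filters(benzene))  # True False
```

Benzene fails twice: its MW is below 100 and every heavy atom is a carbon.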


SureChEMBL IPCR classification

SureChEMBL patents from 2014 ~400K


Can we have everything?

Cost

Time

Quality


Common sources of errors

• Small, poor-quality images

• OCR errors in names (OCR done by IFI). There is an OCR correction step, but it cannot fix all errors, e.g.:

‘2,6-Difluoro-Λ/-{1 -r(4-iodo-2-methylphenyl)methvn-1 H-pyrazol-3-vDbenzamide’

• Reliability is better for US patents because they include MOL files


Use cases with SureChEMBL

• Chemoinformatics

• Chemistry landscape for a particular biological target/disease

• MDS, MCS and R-group analysis of the chemistry claimed in a particular patent family (see myChEMBL examples)

• (Negative) novelty checking with UniChem

• Competitive intelligence

• Reporting

• Patent alerts

• Per target/disease
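(Negative) novelty checking boils down to asking whether a compound's identifier already occurs in a reference set. A sketch, assuming you already hold standard InChIKeys for the query and for the known compounds (e.g. from UniChem or the FTP files): an exact key match is treated as "known", and a match on the first 14-character block, which hashes the InChI connectivity layer, as a looser connectivity-level match. This only approximates UniChem's relaxed "lenses", which are not reproduced here, and the keys below are made-up placeholders, not real compounds.

```python
from typing import Iterable


def connectivity_block(inchikey: str) -> str:
    """First block of a standard InChIKey: a hash of the connectivity (skeleton) layer."""
    return inchikey.split("-")[0]


def novelty(query_key: str, known_keys: Iterable[str]) -> str:
    """Classify a query InChIKey against a set of known InChIKeys."""
    known = set(known_keys)
    if query_key in known:
        return "exact match"
    if connectivity_block(query_key) in {connectivity_block(k) for k in known}:
        return "connectivity match"  # same skeleton, other layers differ
    return "no match"


# Made-up placeholder keys (not real compounds).
KEY_A = "A" * 14 + "-UHFFFAOYSA-N"
KEY_A_VARIANT = "A" * 14 + "-BBBBBBBBSA-N"
KEY_C = "C" * 14 + "-UHFFFAOYSA-N"
known = [KEY_A, KEY_C]

print(novelty(KEY_A_VARIANT, known))  # connectivity match
```

A "no match" result only says the compound is absent from this particular reference set, which is why the slide frames it as negative novelty checking.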


Future plans

• SureChEMBL UI now available

• https://www.surechembl.org/

• OpenPHACTS project

• Biological entity extraction and annotation

• Semantic integration

• Add new data sources

• Patent authorities

• Europe PubMed Central (scientific literature corpus)

• Image and attachment processing prior to 2007

• Images and ‘Complex Work Units’


Enhanced entity extraction plans

• Identify new entity types e.g. proteins, diseases and cell lines

• Working with Open PHACTS partners

• Extend using ChEMBL dictionaries + others

• Ontology/synonym mapping - integration

• Target-relevance assessment

• Protein/biotherapeutic sequence extraction

• Sequence based patent searches

• Enhanced cross-referencing

• Tag up all commonly used identifiers (CAS, ChEBI, ChEMBL, PubChem, UniProt,…)


Bioactivity data extraction?

Figure labels: compounds, target/assay, bioactivity


Markush structure extraction?

-alkyl-aryl-heteroaryl-heterocyclyl-cycloalkyl….


SureChEMBL Interface


https://www.surechembl.org/


Homepage

Help

Search by keyword

Search by chemical structure (sketch compound)

Search by SMILES, MOL, SMARTS, name

Search by patent number

Filter by authority (US, EP, WO and JP)

Filter by document section (title, claims, abstract, description and images)

Chemical search type filter (substructure, similarity, identical)

Filter by date

Filter by MW


Keyword-based search

• Uses the Lucene query language

• Example searches:

• roche OR novartis

• sterili?e

• kinase*

• pfizer C07D "kinase inhibitor"

• pn:WO2011058149A1

• (pa:bayer OR genentech OR merck) AND desc:(chemotherap* AND ("phosphoinositide kinase"~0.8 OR Pi3K))
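The wildcard examples above follow standard Lucene semantics: `?` matches exactly one character and `*` matches zero or more. A minimal sketch of that term-level matching, translating a wildcard term into a Python regular expression; it ignores Lucene's text analysis entirely, and case-insensitive matching is an assumption about how SureChEMBL's index behaves.

```python
import re


def lucene_wildcard_to_regex(term: str) -> "re.Pattern[str]":
    """Translate a Lucene wildcard term: '?' = one character, '*' = zero or more."""
    parts = []
    for ch in term:
        if ch == "?":
            parts.append(".")
        elif ch == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))  # everything else matches literally
    return re.compile("".join(parts), re.IGNORECASE)


words = ["sterilise", "sterilize", "sterile", "kinase", "kinases", "kinesin"]
for query in ("sterili?e", "kinase*"):
    pattern = lucene_wildcard_to_regex(query)
    print(query, [w for w in words if pattern.fullmatch(w)])
# sterili?e ['sterilise', 'sterilize']
# kinase* ['kinase', 'kinases']
```

This is why `sterili?e` catches both British and American spellings, while `kinase*` picks up plurals without matching unrelated words such as kinesin.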


Fielded keyword search

Keyword search

Filter by document section

Logical operators


Lucene fields (field: description; sample indexed data; sample search)

• scpn: SureChEMBL Patent Number (SCPN); EP-0555555-B1; scpn:EP-0555555-B1

• pn: publication number; EP0555555B1; pn:ep0555555b1

• pd: publication date; 20120101; pd:20120101

• an: application number; EP06009700A; an:EP06009700A

• ad: application date; 20061213; ad:20061213

• pri: priority(ies); DE19958719A 19991206; pri:"DE19958719A 19991206"

• pridate: all priority dates; 20000913; pridate:20000913

• pdyear: publication year; 2013; pdyear:2013

• ds: designated states; DE GB; ds:(DE OR GB OR FR), ds:FR

• pctpn: PCT publication number; WO2006098969A2; pctpn:WO2006098969A2

• pctpd: PCT publication date; 20060921; pctpd:20060921

• pctan: PCT application number; US2006008177W; pctan:US2006008177W

• pctad: PCT application date; 20060308; pctad:20060308

• relan: related application number; Division of application No. 12/159,232; relan:US15923208

• relad: related application date; Jun 26, 2008; relad:20080626

• ic: IPCR; C C08 C08K C08K0005; ic:C

• cpc: CPC; C C07 C07D C07D0471 C07D047104; cpc:C07D

• ecla: ECLA; C07D487/10; ecla:C07D487/10

• uc: US class; 29; uc:029

• inv: inventor(s); schmidt hans-werner; inv:("schmidt hans" AND thelakkat)

• apl: applicant; Sony International (Europe) GmbH; apl:sony

• asg: assignee; SIEMENS AKTIENGESELLSCHAFT; asg:siemens

• pa: assignee(s) or applicant(s) (apl or asg); see apl and asg above; pa:sony

• cor: correspondent; Dr Roger Brooks; cor:"Dr Roger Brooks"

• agt: agents; Pohlman, Sandra M; agt:"Pohlman, Sandra M"

• pcit: patent citations; EP0748154B1; pcit:EP0748154B1

• ncit: non-patent citations; TANG C W: "Two-layer organic photovoltaic cell"; ncit:(tang AND "Two-layer organic photovoltaic cell")

• ttl: title in English, French and German; Sonnenenergiesystem; ttl:("solar energy" OR "énergie solaire" OR Sonnen*)

• ab: abstract in English, French and German

• desc: description in English, French and German

• clm: claims in English, French and German

• text: abstract or description or claims in English, French or German

• pnlang: publication language; EN FR DE PT NO RU NL SV FI TR IS and more; pnlang:(NO OR FI OR SV)
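Fielded queries like those in the field list, and the `ic:(C07K AND C07D) AND pdyear:2013` query used in the exercise answers, are plain strings, so they are easy to generate programmatically, for example to build many year- or class-specific queries. A minimal sketch using only field names from the list; the helper names are hypothetical, and Lucene special characters inside values are not escaped.

```python
from typing import Iterable


def quote(value: str) -> str:
    """Wrap multi-word values in double quotes so they are searched as phrases."""
    return f'"{value}"' if any(ch.isspace() for ch in value) else value


def field(name: str, value: str) -> str:
    """A single fielded term, e.g. field('pdyear', '2013') -> 'pdyear:2013'."""
    return f"{name}:{quote(value)}"


def field_group(name: str, op: str, values: Iterable[str]) -> str:
    """Several values for one field joined by AND/OR, e.g. 'ic:(C07K AND C07D)'."""
    return f"{name}:(" + f" {op} ".join(quote(v) for v in values) + ")"


query = " AND ".join([field_group("ic", "AND", ["C07K", "C07D"]), field("pdyear", "2013")])
print(query)  # ic:(C07K AND C07D) AND pdyear:2013
print(field("cor", "Dr Roger Brooks"))  # cor:"Dr Roger Brooks"
```

Generating queries this way keeps field names and quoting consistent when many variants are needed; the resulting string is simply pasted into (or sent to) the keyword search.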


SureChEMBL Patent Numbers (SCPN)

• Standardised patent number format used for searching the system

• Format: CC-PATNO-KK, e.g. WO-2011161255-A2

• Batch conversion available via interface homepage link
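Converting a compact publication number (as indexed in the pn field) into SCPN form is mostly a matter of normalising case and inserting hyphens. A sketch, assuming a two-letter country code, a purely numeric serial and a kind code of one letter plus an optional digit; numbers with letters in the serial are rejected, and for real batches the interface's own conversion tool is the safer choice.

```python
import re

# Country code, numeric serial, kind code (letter + optional digit); an assumption.
PN_PATTERN = re.compile(r"^([A-Z]{2})(\d+)([A-Z]\d?)$")


def to_scpn(publication_number: str) -> str:
    """Convert e.g. 'WO2011161255A2' (or 'wo 2011161255 a2') to 'WO-2011161255-A2'."""
    compact = re.sub(r"[\s\-]", "", publication_number).upper()
    m = PN_PATTERN.match(compact)
    if m is None:
        raise ValueError(f"unsupported publication number: {publication_number!r}")
    country, serial, kind = m.groups()
    return f"{country}-{serial}-{kind}"


for pn in ("WO2011161255A2", "ep0555555b1", "WO-2011058149-A1"):
    print(to_scpn(pn))
```

Because hyphens are stripped before matching, the function is idempotent: an SCPN passed in comes back unchanged.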


Keyword searches return documents


Patent family members


Export patent chemistry

Property range filters

Count filters

Go to ‘My Exports’ to download CSV or XML


Patent view - Front page


Patent view - Claims


Chemical entities in patent

Click on blue highlighted text to see chemical info box


Patent view - Tools

Access to source document PDF

Export chemistry for document or family


Chemistry-based searching

Structure sketch

(2 sketchers)

Types of search

Filter by MW range

Filter by document section


Structure search type differences
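Similarity search ranks structures by a fingerprint similarity coefficient, most commonly the Tanimoto coefficient: the number of bits set in both fingerprints divided by the number set in either. The sketch below represents fingerprints as sets of "on" bit positions; the bit sets are made up, and which fingerprint SureChEMBL uses, and whether its 90% threshold is inclusive, are not stated here, so `>=` is an assumption.

```python
from typing import Dict, FrozenSet, List


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """|A & B| / |A | B| for fingerprints given as sets of on-bit positions."""
    if not a and not b:
        return 0.0  # convention for two empty fingerprints (an assumption)
    return len(a & b) / len(a | b)


def similarity_search(query: FrozenSet[int], library: Dict[str, FrozenSet[int]],
                      threshold: float = 0.9) -> List[str]:
    """Names of library entries whose Tanimoto similarity to the query meets the threshold."""
    return [name for name, fp in library.items() if tanimoto(query, fp) >= threshold]


query = frozenset(range(10))
library = {
    "identical": frozenset(range(10)),                      # similarity 1.0
    "near": frozenset(range(9)),                            # 9 / 10 = 0.9
    "distant": frozenset(range(5)) | frozenset(range(20, 25)),  # 5 / 15
}
hits = similarity_search(query, library)
print(hits)  # ['identical', 'near']
```

An identical search corresponds to exact structure matching rather than a similarity of 1.0, and substructure search needs subgraph matching, which this sketch does not cover.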


Chemistry searches return structures

Tautomers are registered as different structures, unlike in ChEMBL – this will likely change in future


Review chemistry hits


Compound report page

UniChem integration: on-the-fly integration with 71M structures from 25 data sources


Review patent documents for chemistry



Example SureChEMBL Workflow


myChEMBL Example


myChEMBL LaunchPad


SureChEMBL and myChEMBL

More: http://chembl.blogspot.co.uk/2014/10/mychembl-19-released.html

Download: ftp://ftp.ebi.ac.uk/pub/databases/chembl/VM/myChEMBL/current/


SureChEMBL and myChEMBL

http://nbviewer.ipython.org/github/rdkit/UGM_2014/blob/master/Notebooks/Vardenafil.ipynb


Exercises


1. What are the IPCR codes for Heterocyclic Compounds and Peptides?

2. How many patents are classified as containing:

• Heterocyclic compounds

• Peptides

• Heterocyclic compounds AND Peptides in 2013

3. How many family members does the patent WO-2011058149-A1 have?


4. How many compounds have a structure similar (>90% Tanimoto) to the approved drug gefitinib?

5. How many patents contain the structure of gefitinib? And what is the priority number of the earliest patent?

6. Extract the chemistry from a recent patent family that refers to inflammation and also contains the following structure (aspirin):


SureChEMBL knowledge base

https://surechembl.uservoice.com/


SureChEMBL support

[email protected]


Acknowledgements

• ChEMBL team

• John Overington

• Jon Chambers

• George Papadatos

• Mark Davies

• Nathan Dedman

• Anna Gaulton

• Digital Science

• Nicko Goncharoff

• James Siddle

• Richard Koks

• Open PHACTS consortium

• http://www.openphacts.org/partners/consortium

Funding: Innovative Medicines Initiative Joint Undertaking, grant agreement no. 115191 (Open PHACTS)

Wellcome Trust Strategic Award for Chemogenomics, WT086151/Z/08/Z

European Molecular Biology Laboratory

European Commission FP7 Capacities Specific Programme, grant agreement no. 284209 (BioMedBridges)

Software:


Answers

1. Go to the http://web2.wipo.int/ipcpub website and search for the terms (note: searching is a bit tricky):

• Heterocyclic compounds = C07D

• Peptides = C07K

2. Go to the SureChEMBL site (https://www.surechembl.org) and carry out the following searches:

• ic:C07D (returned 848,603 hits 21/11/14)

• ic:C07K (returned 496,289 hits 21/11/14)

• ic:(C07K AND C07D) AND pdyear:2013 (returned 424 hits 21/11/14)


3. Carry out a patent number search for WO-2011058149-A1 and click on the family icon in the results table

• 12 family members

4. Type the term ‘gefitinib’ into the Manual structure input, select the Similarity search radio button and set the Tanimoto coefficient to 90%

• 50 structures are returned

5. Run an identical search for gefitinib, click on the compound to retrieve its patents and go to the last page (sorry, no sort function at present)

• Earliest patent: WO-1996033980-A1

• Priority: GB-9508538-A (1995-04-27)


6. Conduct a keyword search for ttl:inflammation and draw the structure of aspirin. Choose an appropriate search method and press the search button. Select one or more compounds and view the patent results. To download the chemistry, press the “Export chemistry for this family” button:

(Make sure the export settings are updated accordingly, as the molecular weight of aspirin is 180)
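Because the export filters work on molecular weight, it helps to know roughly where a compound of interest falls before exporting. The sketch below computes an average molecular weight from a simple molecular formula using rounded standard atomic weights. It is a formula-based approximation: SureChEMBL derives MW from the registered structure with its own toolkit, so small differences are possible, and the element table here is deliberately short.

```python
import re

# Rounded standard atomic weights for a few common elements (extend as needed).
ATOMIC_WEIGHTS = {
    "H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
    "F": 18.998, "S": 32.06, "Cl": 35.45, "Br": 79.904,
}

TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def molecular_weight(formula: str) -> float:
    """Average molecular weight of a simple formula such as 'C9H8O4' (no brackets)."""
    if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    total = 0.0
    for symbol, count in TOKEN.findall(formula):
        if symbol not in ATOMIC_WEIGHTS:
            raise ValueError(f"no atomic weight for {symbol!r}")
        total += ATOMIC_WEIGHTS[symbol] * (int(count) if count else 1)
    return total


print(round(molecular_weight("C9H8O4"), 2))         # aspirin: 180.16
print(round(molecular_weight("C22H24ClFN4O3"), 2))  # gefitinib: 446.91
```

An MW range that excludes about 180 would silently drop aspirin from the export, which is what the note in answer 6 warns about.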