Searching for antibody information on STN®
Robert Austin – FIZ Karlsruhe
Agenda
• Introduction to antibodies and immunoglobulins
• Understanding and searching antibody indexing in DGENE, REGISTRY and CAplus℠
• Sequence searching for antibodies using DGENE, USGENE®, PCTGEN and REGISTRY
  – Complementarity Determining Regions (CDRs) using BLAST® and Sequence Code Match (SCM) searching
  – Multi-file search and post-processing
2
See also: Sequence Basics e-Seminar (June 2010): http://www.stn-international.com/Sequence_Basics_Seminar.html
Why use STN to search for antibodies?
• Sequence information on STN is comprehensive
  – Four sequence databases allow users to achieve a comprehensive search of sequences from both patent and non-patent documents
• Sequence databases on STN are timely
  – Allows you to keep up-to-date with the most current information
• Sequence indexing is unique
  – Allows you to retrieve sequences containing uncommon residues or chemical modifications that are difficult to find
3
Four sequence databases on STN provide unique information
• DGENE
  – Sequences from the 41 authorities covered by DWPI℠
  – Sequence data are intellectually analyzed and indexed
  – Legal status and patent family display options
  – Information is updated once every two weeks
• REGISTRY
  – Sequences from the 61 authorities covered by CAplus
  – Sequences also come from >3000 life science journals
  – Sequence data are intellectually analyzed and indexed
  – Information is updated daily
4
Four sequence databases on STN provide unique information (cont.)
• USGENE
  – Sequences from all relevant USPTO published patent applications and granted (issued) patents
  – Legal status and patent family display options
  – Updated weekly, within three days of publication
• PCTGEN
  – Sequences submitted and published electronically as a formal part of WIPO/PCT published patent applications
  – Legal status and patent family display options
  – Updated weekly, within 24 hours of publication
5
Antibodies are produced as a defence against foreign substances (antigens)
• Antibodies (Ab) are specialised glycoproteins, which differ in size, charge, carbohydrate content and amino acid sequence composition
• They are also known as immunoglobulins (Ig)
• Antibodies are found in blood and other bodily fluids of mammals and some other vertebrates
• They are a central part of the humoral immune response (HIR) and are synthesised by B-cells
6
Antibodies are useful because of their biological properties and high specificity
• There are different classes of antibodies, depending on their structure
  – Mammals have five classes of antibodies
    • α (IgA), δ (IgD), ε (IgE), γ (IgG), μ (IgM)
  – Each class has different biological properties
• Antibodies are highly specific to antigens
  – Able to locate one molecule of a protein antigen out of more than 10^8 similar molecules
  – Useful in targeted therapy and as diagnostic tools
7
Mammalian antibodies are Y-shaped and composed of heavy and light chains
• Antibodies are composed of four polypeptides
  – Two identical light chains (L)
  – Two identical heavy chains (H)
• Both light and heavy chains consist of constant (C) region domains with little variability, and variable (V) region domains with high variability
• The four chains are held together by several disulphide bonds and form a Y-shaped molecule
• The antigen binding sites (CDRs) are located at the tips of the Y-shaped arms
8
Additional nomenclature describes variable region domains
• Light chains exist in two forms
  – kappa (κ), lambda (λ)
• Heavy chains exist in five forms
  – α (IgA), δ (IgD), ε (IgE), γ (IgG), μ (IgM)
  – Variation in heavy chains gives rise to various antibody subclasses: IgG1, IgG2, IgA1, etc.
9
Mammalian antibodies are Y-shaped and composed of heavy and light chains
10
[Figure: Y-shaped antibody. Labels: Light Chain (L): κ, λ; Heavy Chain (H): γ, µ, α, δ, ε; Hinge; disulfide bonds (-S-S-); CH2 and CH3 domains; an antigen binding site at the tip of each arm.]
CDR = complementarity determining region
VL = variable region - light chain
VH = variable region - heavy chain
CL = constant region - light chain
CH = constant region - heavy chain
Humanization of antibodies is an important process for therapeutic usage
• Immunotherapy (or biotherapy)
  – Uses certain parts of the immune system to fight diseases such as cancer
  – Treatments are less toxic and potentially more effective than chemical drugs
  – Types of antibodies used for therapy
    • Monoclonal antibodies
    • Chimeric antibodies
    • CDR-Grafted antibodies
    • Phage Display antibodies
11
Antibodies are also used as diagnostic tools
• Antibody tools reduce assay time without compromising sensitivity
  – Flow cytometric analysis
    • Analysis of morphological complexity of the cells, DNA content (cell cycle analysis), cell sorting
  – Microarray technology
    • Proteomics: Protein characterization and analysis of diseased vs. healthy patients
  – Immunoblotting (or western blot)
    • Detection of a specific protein from a tissue or cell sample
  – Immunohistochemistry
    • Localization of protein(s) in cells or tissue sections using antibodies
12
Antibody sequences are indexed in GENESEQ™ on STN (DGENE)
• Description (/DESC)
  – Concise one-line description of the sequence
  – E.g. Mouse anti-protein X antibody VL region
• Keyword (/KW) indexing for antibody sequences
  – Type, e.g. humanized, monoclonal
  – Region, e.g. light chain constant region
  – Activity, e.g. antibody therapy, immune stimulation
  – Target, e.g. protein X
  – Disease, e.g. immune disorder, autoimmune disease
  – Technology, e.g. antibody engineering, antibody array
• Abstract (AB)
  – Includes the use of the antibody within the invention
• Features Table (/FEAT)
  – Details about Domain, Region, Disulphide-bonds, etc.
15
L1 ANSWER 1 OF 1 DGENE COPYRIGHT 2010 THOMSON REUTERS on STN
ACCESSION NUMBER: AEN02775 protein DGENE
TITLE: New antibody useful for treating e.g. cancer
competitively inhibiting binding of competitor antibody
having complementarity determining region of specific
amino acid sequences as given in the specification to G-
protein coupled receptor.
INVENTOR: Howard M; Schall T
PATENT ASSIGNEE: (CHEM-N)CHEMOCENTRYX INC.
PATENT INFO: WO 2006116319 A2 20061102 60
APPLICATION INFO: WO 2006-US15492 20060419
PRIORITY INFO: US 2005-674140P 20050421
PAT. SEQ. LOC: Disclosure; SEQ ID NO 22
DATA ENTRY DATE: 22 FEB 2007 (first entry)
DOCUMENT TYPE: Patent
LANGUAGE: English
OTHER SOURCE: 2007-110158 [11]
DESCRIPTION: anti-CCX-CK2-antibody 11G8 VL region SEQ ID NO 22.
DGENE records also include the DWPI patent title (/TI).
Each DGENE record has a concise one-line description of the antibody sequence
Each DGENE record has keyword indexing for the antibody sequence
16
KEYWORD: cytostatic; neuroprotective; nootropic; nephrotropic; antirheumatic; antiarthritic; cardiant; antiarteriosclerotic; antiasthmatic; dermatological; antiinflammatory; gastrointestinal-gen.; antipsoriatic; vasotropic; immunosuppressive; antiulcer; ophthalmological; antidiabetic; vulnerary; hepatotropic; anorectic; respiratory-gen.; gynecological; hemostatic; cardiovascular-gen.; contraceptive; protein interaction; antibody; angiogenesis inhibition; cell proliferation; protein detection; antibody therapy; arthritis; Alzheimers disease; multiple sclerosis; renal failure; rheumatoid arthritis; transplant rejection; asthma; glomerulonephritis; contact dermatitis; inflammatory bowel disease; colitis; psoriasis; reperfusion injury; ocular disease; diabetic retinopathy; retinopathy of prematurity; macular degeneration; graft rejection; neovascular glaucoma; rubeosis; Osier-Webber Syndrome; telangiectasis; angiofibroma; Crohns disease; eczema;
• • •wound healing; osteopathic; fractures; burns; inflammation; ischemia; peripheral vascular disease; pre-eclampsia; cardiovascular disease; 11G8; light chain variable region.
ORGANISM: Mus sp.
Each DGENE abstract describes the use of the antibody sequence within the invention
17
ABSTRACT: The invention describes an antibody (A1) that competitively inhibits binding of a competitor antibody (a1) to CCX-CKR2 (G-protein coupled receptor), where the competitor antibody comprises the complementarity determining region (CDR) of specific amino acid sequences as given in the specification. The antibody is useful in a pharmaceutical composition for inhibiting angiogenesis or proliferation of a cancer cell in an individual such as other than human having or pre-disposed to have arthritis; for detecting a cell expressing CCX-CKR2 in a biological sample; for treating Alzheimer's disease; multiple sclerosis; kidney dysfunction; rheumatoid arthritis; cardiac allograft rejection; atherosclerosis; asthma; glomerulonephritis; contact dermatitis; inflammatory bowel disease; colitis; psoriasis; reperfusion injury; ocular angiogenic diseases, for example, • • • joints (e.g. arthritis and hemophiliac joints), healing of wounds, fractures, and burns, inflammatory diseases, ischemic heart, and peripheral vascular diseases; preclampsia and cardiovascular disease; for birth control. The antibody competitively inhibits binding of a competitor antibody to CCX-CKR2, and potently inhibits angiogenesis. This is the amino acid sequence of anti-CCX-CK2-antibody 11G8 light chain variable region.
The DGENE Feature Table includes detailed annotations for the antibody sequence
18
AMINO ACID COUNTS: 2 A; 4 R; 2 N; 6 D; 0 B; 2 C; 5 Q; 3 E; 0 Z; 9 G; 3 H;
5 I; 10 L; 4 K; 1 M; 4 F; 6 P; 15 S; 5 T; 1 W; 6 Y; 7 V;
0 Others
SEQUENCE LENGTH: 100
SEQUENCE
1 dvlmtqtpls lpvslgdqas iscrsshyiv hsdgntylew ylqkpgqspk
51 lliykvsnrf sgvpdrfsgs gsgtdftlki srveaedlgi yycfqgshvp
FEATURE TABLE:
Key |Location|Qualifier|
==========+========+=========+=======================
Region |1..23 |note |"framework region 1"
Region |24..39 |note |"CDR1"
Region |40..54 |note |"framework region 2"
Region |55..61 |note |"CDR2"
Region |62..93 |note |"framework region 3"
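The feature table locations are 1-based and inclusive, so they can be applied directly to the displayed sequence. The following is a minimal Python sketch, not an STN feature: the helper name `region` is our own, and the data are copied from the record above. It pulls CDR1 and CDR2 out of the sequence and recounts residues so they can be compared with the AMINO ACID COUNTS and SEQUENCE LENGTH fields.

```python
from collections import Counter

# Sequence as displayed in the DGENE record above (AEN02775), spaces removed.
SEQ = (
    "dvlmtqtplslpvslgdqasiscrsshyivhsdgntylewylqkpgqspk"
    "lliykvsnrfsgvpdrfsgsgsgtdftlkisrveaedlgiyycfqgshvp"
)

# Feature table entries from the slide: (start, end), 1-based and inclusive.
FEATURES = {
    "framework region 1": (1, 23),
    "CDR1": (24, 39),
    "framework region 2": (40, 54),
    "CDR2": (55, 61),
    "framework region 3": (62, 93),
}


def region(seq, start, end):
    """Return residues start..end (1-based, inclusive), as in the feature table."""
    return seq[start - 1:end]


cdr1 = region(SEQ, *FEATURES["CDR1"])
cdr2 = region(SEQ, *FEATURES["CDR2"])

# Recount residues to compare with the AMINO ACID COUNTS field of the record.
counts = Counter(SEQ.upper())

print("CDR1:", cdr1)
print("CDR2:", cdr2)
print("length:", len(SEQ), "S:", counts["S"], "C:", counts["C"])
```

Extracted CDRs like these are the kind of short query used for the BLAST searches later in this presentation.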
Antibody light and heavy chain sequences are indexed in separate DGENE records
19
L1 ANSWER 1 OF 2 DGENE COPYRIGHT 2010 THOMSON REUTERS on STN
AN AWO22746 protein DGENE
TI Purifying an antibody from a composition comprises loading the . . .
DESC Anti-VEGF bevacizumab humanized antibody light chain sequence, SEQ ID 12.
KW protein purification; VEGF ligand; cation-exchange; chromatography; light chain; humanized antibody; protein purification; vascular endothelial growth factor.
SQL 214

L1 ANSWER 2 OF 2 DGENE COPYRIGHT 2010 THOMSON REUTERS on STN
AN AWO22745 protein DGENE
TI Purifying an antibody from a composition comprises loading the . . .
DESC Anti-VEGF bevacizumab humanized antibody heavy chain sequence, SEQ ID 11.
KW protein purification; VEGF ligand; cation-exchange; chromatography; heavy chain; humanized antibody; protein purification; vascular endothelial growth factor.
SQL 453

Note: This example comes from WO2009058812.
Thomson Reuters indexing makes clear which one is which.
Antibody light and heavy chain sequences are indexed in separate USGENE records
20
=> D AN TRIAL 1-2
L2 ANSWER 1 OF 2 USGENE COPYRIGHT 2010 SEQUENCEBASE CORP on STN
AN 20090148435.12 Protein USGENE
TI ANTIBODY PURIFICATION BY CATION EXCHANGE CHROMATOGRAPHY
(PublishedApplication)
DESC Artificial Protein; Sequence is synthesized; sequence 12 of 20
MTY Protein
SQL 214
L2 ANSWER 2 OF 2 USGENE COPYRIGHT 2010 SEQUENCEBASE CORP on STN
AN 20090148435.11
TI ANTIBODY PURIFICATION BY CATION EXCHANGE CHROMATOGRAPHY
(PublishedApplication)
DESC Artificial Protein; Sequence is synthesized; sequence 11 of 20
MTY Protein
SQL 453
Note: This example comes from US20090148435 A1, which is equivalent to WO2009058812 A1.
Patent applicants often do not provide a clear description.
Antibodies are indexed as substances in CAS REGISTRY
• Antibodies are indexed as sequences if the sequence(s) is provided by the author(s)
  – The Note (/NTE) field contains additional information about the sequence (e.g. chemical modifications, linkages, uncommon amino acids)
  – Separate records for the full multi-chain antibody, light chain and heavy chain sequences may be created
• Antibody sequences are also indexed with
  – Index Names
  – Trade names
  – Generic names
  – Lab names
21
CA Index Names provide information, such as sequence type, organisms, and strain/cell/tissue types
22
L1 ANSWER 1 OF 1 REGISTRY COPYRIGHT 2010 ACS on STN
RN 216974-75-3 REGISTRY
CN Immunoglobulin G1, anti-(human vascular endothelial growth factor)(human-mouse monoclonal rhuMAb-VEGF g1-chain), disulfide with human-mouse monoclonal rhuMAb-VEGF light chain, dimer (9CI) (CA INDEX NAME)
OTHER NAMES:
CN Anti-VEGF monoclonal antibody
CN Avastatin
CN Avastin
CN Bevacizumab
CN rhuMAb-VEGF
FS PROTEIN SEQUENCE
SQL 1334,453,453,214,214
Avastin is registered as a multichain sequence (two heavy and two light chains).
Brand/generic names, and lab names are listed under “Other names”.
The CA Index Name for Avastin contains additional information, such as the isotype (G1), the antigen (VEGF), monoclonal antibody, etc.
Modifications and/or linkages between chains are listed in the /NTE field
23
NTE multichain
----------------------------------------------------------------
type     ------ location ------     description
----------------------------------------------------------------
bridge   Cys-22    - Cys-96         disulfide bridge
bridge   Cys-150   - Cys-206        disulfide bridge
bridge   Cys-226   - Cys-214''      disulfide bridge
bridge   Cys-232   - Cys-232'       disulfide bridge
bridge   Cys-235   - Cys-235'       disulfide bridge
bridge   Cys-267   - Cys-327        disulfide bridge
bridge   Cys-373   - Cys-431        disulfide bridge
bridge   Cys-22'   - Cys-96'        disulfide bridge
bridge   Cys-150'  - Cys-206'       disulfide bridge
bridge   Cys-226'  - Cys-214'''     disulfide bridge
bridge   Cys-267'  - Cys-327'       disulfide bridge
bridge   Cys-373'  - Cys-431'       disulfide bridge
bridge   Cys-23''  - Cys-88''       disulfide bridge
bridge   Cys-134'' - Cys-194''      disulfide bridge
----------------------------------------------------------------
In REGISTRY, the specific residue(s) position(s) are listed in the /NTE field.
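The bridge list can be read mechanically once the prime marks are interpreted. The Python sketch below assumes (this is our reading, not something stated on the slide) that the number of primes identifies the chain in the order given by the SQL field (453, 453, 214, 214), so no prime and one prime are the heavy chains, and two and three primes are the light chains. Under that assumption it separates intra-chain from inter-chain bridges and checks that the chain lengths add up to the total length.

```python
# Bridge locations copied from the /NTE display above.
BRIDGES = [
    ("Cys-22", "Cys-96"), ("Cys-150", "Cys-206"), ("Cys-226", "Cys-214''"),
    ("Cys-232", "Cys-232'"), ("Cys-235", "Cys-235'"), ("Cys-267", "Cys-327"),
    ("Cys-373", "Cys-431"), ("Cys-22'", "Cys-96'"), ("Cys-150'", "Cys-206'"),
    ("Cys-226'", "Cys-214'''"), ("Cys-267'", "Cys-327'"),
    ("Cys-373'", "Cys-431'"), ("Cys-23''", "Cys-88''"),
    ("Cys-134''", "Cys-194''"),
]

# SQL field of the Avastin record: total length, then the four chain lengths.
TOTAL_LENGTH = 1334
CHAIN_LENGTHS = [453, 453, 214, 214]


def chain_index(residue):
    """Assumed convention: chain number = number of prime marks + 1."""
    return residue.count("'") + 1


inter_chain = [(a, b) for a, b in BRIDGES if chain_index(a) != chain_index(b)]
intra_chain = [(a, b) for a, b in BRIDGES if chain_index(a) == chain_index(b)]

print("inter-chain bridges:", inter_chain)
print("intra-chain bridges:", len(intra_chain))
print("chain lengths sum to total:", sum(CHAIN_LENGTHS) == TOTAL_LENGTH)
```

Read this way, the inter-chain entries are the two heavy-light links (226 to 214) and the two heavy-heavy links at positions 232 and 235.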
CAS will index both light and heavy chains for antibodies
24
Heavy Chain
SEQ   1 EVQLVESGGG LVQPGGSLRL SCAASGYTFT NYGMNWVRQA PGKGLEWVGW
     51 INTYTGEPTY AADFKRRFTF SLDTSKSTAY LQMNSLRAED TAVYYCAKYP
    101 HYYGSSHWYF DVWGQGTLVT VSSASTKGPS VFPLAPSSKS TSGGTAALGC
        • • •
    401 PPVLDSDGSF FLYSKLTVDK SRWQQGNVFS CSVMHEALHN HYTQKSLSLS
    451 PGK

Heavy Chain
SEQ   1 EVQLVESGGG LVQPGGSLRL SCAASGYTFT NYGMNWVRQA PGKGLEWVGW
     51 INTYTGEPTY AADFKRRFTF SLDTSKSTAY LQMNSLRAED TAVYYCAKYP
    101 HYYGSSHWYF DVWGQGTLVT VSSASTKGPS VFPLAPSSKS TSGGTAALGC
        • • •
    401 PPVLDSDGSF FLYSKLTVDK SRWQQGNVFS CSVMHEALHN HYTQKSLSLS
    451 PGK

Light Chain
SEQ   1 DIQMTQSPSS LSASVGDRVT ITCSASQDIS NYLNWYQQKP GKAPKVLIYF
     51 TSSLHSGVPS RFSGSGSGTD FTLTISSLQP EDFATYYCQQ YSTVPWTFGQ
        • • •
    201 LSSPVTKSFN RGEC

Light Chain
SEQ   1 DIQMTQSPSS LSASVGDRVT ITCSASQDIS NYLNWYQQKP GKAPKVLIYF
     51 TSSLHSGVPS RFSGSGSGTD FTLTISSLQP EDFATYYCQQ YSTVPWTFGQ
        • • •
    201 LSSPVTKSFN RGEC
Light and heavy chain sequences may also be indexed in separate REGISTRY records
25
=> D SQIDE 1-2
L4 ANSWER 1 OF 2 REGISTRY COPYRIGHT 2010 ACS on STN
RN 1150802-75-7 REGISTRY
CN 8: PN: WO2009058812 SEQID: 12 unclaimed protein (CA INDEX NAME)
FS PROTEIN SEQUENCE
SQL 214
. . . .

L4 ANSWER 2 OF 2 REGISTRY COPYRIGHT 2010 ACS on STN
RN 1150802-74-6 REGISTRY
CN 7: PN: WO2009058812 SEQID: 11 unclaimed protein (CA INDEX NAME)
FS PROTEIN SEQUENCE
SQL 453
. . . .
Tip: Compare to DGENE (slide 19) and to USGENE (slide 20).
Note: REGISTRY sequence records from patents often do not include a description of the sequence.
The antibody REGISTRY numbers are indexed in CAplus bibliographic records
26
L26 ANSWER 1 OF 1 HCAPLUS COPYRIGHT 2010 ACS on STN
AN 2009:553192 HCAPLUS
DN 150:513022
TI Antibody purification by cation exchange chromatography using a high pH wash step to remove of contaminants prior to eluting in a buffer with increased conductivity
IN Lebreton, Benedicte Andree; O'Connor, Deborah Ann; Safta, Aurelia; Sharma, Mandakini
PA Genentech, Inc., USA
SO PCT Int. Appl., 53pp.
   CODEN: PIXXD2
DT Patent
LA English
FAN.CNT 1
   PATENT NO.       KIND   DATE       APPLICATION NO.    DATE
   ---------------  ----   --------   ----------------   --------
PI WO 2009058812    A1     20090507   WO 2008-US81516    20081029
. . . .
AB A method for purifying an antibody by cation exchange chromatog. is described in which a high pH wash step is used to remove of contaminants prior to eluting the desired antibody using an elution buffer with increased conductivity. Preferably the antibody binds human CD20, such as rituximab, or binds human vascular endothelial growth factor (VEGF), such as bevacizumab. . . . .
CAplus records provide title, patent family and abstract.
The antibody Registry Numbers are linked to detailed roles and index terms in CAplus
27
CC 15-3 (Immunochemistry)
   Section cross-reference(s): 9
. . . .
IT 174722-31-7P, Rituximab 216974-75-3P, Bevacizumab
   RL: ARG (Analytical reagent use); BPN (Biosynthetic preparation); PUR (Purification or recovery); THU (Therapeutic use); ANST (Analytical study); BIOL (Biological study); PREP (Preparation); USES (Uses)
   (antibody purification by cation exchange chromatog. using high pH wash step to remove of contaminants prior to eluting in buffer with increased conductivity)
. . . .
IT 192433-87-7 214551-08-3 214551-09-4 214551-11-8 214551-12-9 214551-13-0 444104-00-1 556112-97-1 556112-98-2 556112-99-3 556113-00-9 1150529-46-6 1150802-74-6 1150802-75-7 1150802-76-8 1150802-77-9 1150802-78-0 1150802-79-1 1150802-80-4 1150802-81-5
   RL: PRP (Properties)
   (unclaimed protein sequence; antibody purification by cation exchange chromatog. using a high pH wash step to remove of contaminants prior to eluting in a buffer with increased conductivity)
. . . .
This CAplus record has indexing for both the multi-chain antibody, and separately for the light and heavy chain sequences.
Antibodies are indexed in CAplus
• Covered by controlled terms in CAplus
  – Consult the Lexicon for old and new terms
  – Refine with additional concepts contained within the same index term by using the (L) operator
• CAS Registry Numbers (CAS RNs) are available to supplement a CAplus search, if retrieval of specific antibody substances is required
28
How are antibodies indexed in CAplus?
• Antibodies are indexed to the most specific level disclosed in a document
  – Light chains
    • κ (kappa), λ (lambda)
  – Heavy chains
    • α (IgA), δ (IgD), ε (IgE), γ (IgG), μ (IgM)
    • Subclasses: IgG1, IgG2, IgA1, etc.
• Descriptors (limiters) can provide additional information
  – Examples: bispecific, catalytic, labeled, neutralizing, humanized, chimeric, etc.
29
Antibody controlled term indexing has changed over time in CAplus
30
Controlled Term Years of Coverage
Antibodies and Immunoglobulins/CT 2002 to present
Amboceptors/CT 1907-1946
Antibodies/CT 1907-2001
Globulins, immune/CT 1967-1976
Immunoglobulins/CT 1962-2001
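Because the controlled terms changed over time, a search covering a given period needs every term that was valid during it. A small Python sketch of that lookup follows; the table values are those listed above, while the function names are purely illustrative and not part of STN or the CAplus Lexicon.

```python
# (controlled term, first year, last year); None means "to present".
CONTROLLED_TERMS = [
    ("Antibodies and Immunoglobulins/CT", 2002, None),
    ("Amboceptors/CT", 1907, 1946),
    ("Antibodies/CT", 1907, 2001),
    ("Globulins, immune/CT", 1967, 1976),
    ("Immunoglobulins/CT", 1962, 2001),
]


def terms_for_year(year):
    """Controlled terms in use during a single year."""
    return [
        term for term, first, last in CONTROLLED_TERMS
        if first <= year and (last is None or year <= last)
    ]


def terms_for_range(start, end):
    """Controlled terms whose coverage overlaps the period start..end."""
    return [
        term for term, first, last in CONTROLLED_TERMS
        if first <= end and (last is None or start <= last)
    ]


print(terms_for_year(1970))
print(terms_for_range(1995, 2010))
```

A search spanning 1995 to 2010, for example, crosses the 2001/2002 boundary and therefore needs both the older single-word terms and the combined current term.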
Immunoglobulin (Ig) nomenclature can be used to focus on specific forms of interest
31
Search Question:
Find records covering lambda light chain immunoglobulins. Are any of these IgG immunoglobulins? Identify uses of these substances.
Search Strategy
To find references on immunoglobulins...
Step 1. Search appropriate Ig terms
Step 2. Refine with class, subclass, or chain nomenclature
Step 3. Evaluate using D SCAN HIT
Step 4. (Optional) Refine with additional concepts or CAS Roles
Step 5. Display results
32
Tips for searching in CAplus
• Use BOTH single word immunoglobulin and antibody terms in the /IT field, especially to include records prior to 2002
• Use (L) proximity to add modifying terms
• Specific subclass may be used but allow for generic class for comprehensive results
  – For example, IgG2a is of most interest but IgG2 encompasses it and should be included
• Supplement CAS Roles with text terms in the modifier
33
Use SET commands to automatically add plurals and abbreviations
=> SET PLU ON; SET ABB ON; SET SPE ON
SET COMMAND COMPLETED

=> S (IMMUNOGLOBULIN(L)LAMBDA)/IT
         11999 IMMUNOGLOBULIN/IT
        143185 IMMUNOGLOBULINS/IT
        150880 IMMUNOGLOBULIN/IT
                 ((IMMUNOGLOBULIN OR IMMUNOGLOBULINS)/IT)
         20421 IG/IT
          6316 IGS/IT
         25278 IG/IT
                 ((IG OR IGS)/IT)
        156119 IMMUNOGLOBULIN/IT
                 ((IMMUNOGLOBULIN OR IG)/IT)
        178328 LAMBDA
            68 LAMBDAS
        178342 LAMBDA
                 (LAMBDA OR LAMBDAS)
L1        1213 IMMUNOGLOBULIN/IT (L) LAMBDA
34
Adding antibody term retrieves additional answers
=> S ANTIBODY/IT (L) LAMBDA
         70328 ANTIBODY/IT
        214281 ANTIBODIES/IT
        223724 ANTIBODY/IT
                 ((ANTIBODY OR ANTIBODIES)/IT)
        178328 LAMBDA
            68 LAMBDAS
        178342 LAMBDA
                 (LAMBDA OR LAMBDAS)
L2         568 ANTIBODY/IT (L) LAMBDA

=> S L1 OR L2
L3        1374 L1 OR L2
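The three answer counts above also show how much the two term families overlap. By inclusion-exclusion, the number of records found by both searches is |L1| + |L2| - |L1 OR L2|. A short Python check of that arithmetic, using only the counts displayed in the session (the function name is our own):

```python
def overlap(count_a, count_b, count_union):
    """Records common to two answer sets, by inclusion-exclusion."""
    return count_a + count_b - count_union


L1_IMMUNOGLOBULIN = 1213   # IMMUNOGLOBULIN/IT (L) LAMBDA
L2_ANTIBODY = 568          # ANTIBODY/IT (L) LAMBDA
L3_COMBINED = 1374         # L1 OR L2

both = overlap(L1_IMMUNOGLOBULIN, L2_ANTIBODY, L3_COMBINED)
only_antibody = L3_COMBINED - L1_IMMUNOGLOBULIN

print("found by both searches:", both)
print("added by the antibody term alone:", only_antibody)
```

The second figure is the practical payoff of the tip to search both terms: those records would have been missed by the immunoglobulin term on its own.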
35
Alternative search using BOTH terms
=> S (ANTIBODY OR IMMUNOGLOBULIN)/IT (L) LAMBDA
L4        1374 (ANTIBODY OR IMMUNOGLOBULIN)/IT (L) LAMBDA

=> S L4 (L) IGG?
         80343 IGG?
L5          58 L4 (L) IGG?

=> D HIT SCAN

L5 58 ANSWERS CAPLUS COPYRIGHT 2010 ACS on STN
IT Antibodies and Immunoglobulins
   RL: ADV (Adverse effect, including toxicity); BSU (Biological study, unclassified); DGN (Diagnostic use); BIOL (Biological study); USES (Uses)
   (IgG, .lambda., .kappa.; gammopathy detected by serum protein electrophoresis for predicting and managing therapy of lymphoproliferative disorder in liver transplant recipients)

HOW MANY MORE ANSWERS DO YOU WISH TO SCAN? (1):END
36
Both Lambda and Kappa forms are indexed.
Remove the kappa light chain entries from the index term search
37
=> S L5 (NOTL) KAPPA
         70588 KAPPA
            11 KAPPAS
         70594 KAPPA
                 (KAPPA OR KAPPAS)
L6          32 L5 (NOTL) KAPPA

=> D SCAN HIT

L6 32 ANSWERS CAPLUS COPYRIGHT 2010 ACS on STN
IT Antibodies and Immunoglobulins
   RL: BSU (Biological study, unclassified); BIOL (Biological study)
   (light chain, .lambda.; of IgG autoantibodies to TSH receptors in Graves disease of humans)

HOW MANY MORE ANSWERS DO YOU WISH TO SCAN? (1):2
Exclude Kappa from the index term search with (NOTL).
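The effect of (L) and (NOTL) can be pictured as a filter applied to each index-term entry (heading plus text modification) rather than to the whole record. The Python sketch below is only a simplified model of that idea and does not reproduce STN's search engine: the entries are abridged from the D SCAN displays in this section, plurals are listed by hand, and `IGG?` truncation is approximated with a prefix test.

```python
import re

# Index-term entries abridged from the D SCAN displays in this section.
ENTRIES = [
    "Antibodies and Immunoglobulins (IgG, .lambda., .kappa.; gammopathy "
    "detected by serum protein electrophoresis)",
    "Antibodies and Immunoglobulins (light chain, .lambda.; of IgG "
    "autoantibodies to TSH receptors in Graves disease of humans)",
    "Immunoglobulins (mapping of .lambda. light chain epitopes for human "
    "lupus IgG autoantibodies)",
]

IG_TERMS = {"antibody", "antibodies", "immunoglobulin", "immunoglobulins"}


def tokens(entry):
    """Lower-case words of one index-term entry."""
    return set(re.findall(r"[a-z0-9]+", entry.lower()))


def matches(entry):
    """(ANTIBODY OR IMMUNOGLOBULIN) (L) LAMBDA (L) IGG? (NOTL) KAPPA, per entry."""
    words = tokens(entry)
    has_ig = bool(words & IG_TERMS)
    has_lambda = bool(words & {"lambda", "lambdas"})
    has_igg = any(w.startswith("igg") for w in words)  # stands in for IGG?
    has_kappa = bool(words & {"kappa", "kappas"})
    return has_ig and has_lambda and has_igg and not has_kappa


kept = [entry for entry in ENTRIES if matches(entry)]
for entry in kept:
    print(entry)
```

In this model the first entry is dropped because kappa appears in the same entry as lambda and IgG, which mirrors why the gammopathy hit shown for L5 is no longer a hit term in L6.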
Index terms describe therapeutic applications
L6 32 ANSWERS CAPLUS COPYRIGHT 2010 ACS on STN
IT Lymphoma
   (B-cell, monoclonal antibodies to IgG1.lambda. paraprotein variable region epitopes in diagnosis of human)
IT Antibodies
   RL: BIOL (Biological study)
   (monoclonal, IgG1.lambda. paraprotein variable region epitopes recognition by, of human, lymphoid disease diagnosis in relation to)

L6 32 ANSWERS CAPLUS COPYRIGHT 2010 ACS on STN
IT Immunoglobulins
   RL: PRP (Properties)
   (mapping of .lambda. light chain epitopes for human lupus IgG autoantibodies)
IT Protein sequences
   (of Ig .lambda. light chains of humans in relation to lupus IgG autoantibody binding)

HOW MANY MORE ANSWERS DO YOU WISH TO SCAN? (1):0
38
Sequence searching for CDRs
• Within the variable domains of the heavy and light chains, there are hypervariable regions called Complementarity Determining Regions (CDRs)
• Each chain contains three CDRs
• The CDRs are the regions that recognize the different antigens
  – CDRs are important when developing antibodies
40
BLAST and/or Sequence Code Match (SCM) can be used to retrieve CDRs
41
Search Question:
The epithelial cell adhesion molecule (EpCAM) is a cell surface protein that is expressed by a variety of tumor cells. We have identified CDR sequence DMGWGSGWRPYYYYGMDV in our laboratory. Find all patent publications that disclose this sequence or similar sequences.
Search Strategy for DGENE, USGENE, PCTGEN and REGISTRY/CAplus
Step 1. RUN BLAST in USGENE, DGENE and PCTGEN using offline BATCH mode
Step 2. Repeat the search using CAS REGISTRY BLAST
Step 3. Retrieve, merge, organize by patent family and display USGENE, DGENE and PCTGEN results
Step 4. Retrieve, identify and display unique CAplus references from the REGISTRY BLAST search
Step 5. Post-process results into tables and reports
42
RUN BLAST searches in DGENE, USGENE and PCTGEN in offline BATCH mode
43
=> FILE DGENE; RUN BLAST DMGWGSGWRPYYYYGMDV/SQP -F F -E 20000 -W 2 -M PAM30 BATCH
PLEASE ENTER BATCH IDENTIFIER (MAX. 8 CHARS):EPCAMCDR
BATCH PROCESSING STARTED FOR EPCAMCDR
=> FILE USGENE; RUN BLAST DMGWGSGWRPYYYYGMDV/SQP -F F -E 20000 -W 2 -M PAM30 BATCH
PLEASE ENTER BATCH IDENTIFIER (MAX. 8 CHARS):EPCAMCDR
BATCH PROCESSING STARTED FOR EPCAMCDR
=> FILE PCTGEN; RUN BLAST DMGWGSGWRPYYYYGMDV/SQP -F F -E 20000 -W 2 -M PAM30 BATCH
PLEASE ENTER BATCH IDENTIFIER (MAX. 8 CHARS):EPCAMCDR
BATCH PROCESSING STARTED FOR EPCAMCDR
Tip: DGENE, USGENE and PCTGEN BLAST searches can be run in parallel using BATCH mode.
Add BATCH to the end of the command.
Adjust the REGISTRY BLAST settings for optimal search retrieval
44
Recommended BLAST settings for short sequences (35 amino acids or less)
1) Turn OFF (uncheck) the low complexity filter
2) Increase Expectation Value to 20,000
3) Decrease Word Size to 2
4) Choose the PAM-30 Weight Matrix
For more information about BLAST matrices, visit the NCBI web site.
Increase the maximum number of answers to 1,000.
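The four recommended settings line up with the options used in the RUN BLAST commands shown earlier (`-F F -E 20000 -W 2 -M PAM30`). The flag-to-setting mapping in the Python sketch below is inferred by matching the values on the two slides, and the helper name is our own. It simply assembles the same command line for a short query so the settings are not retyped by hand.

```python
# Recommended short-sequence settings, paired with the option letters seen in
# the RUN BLAST commands in this presentation (mapping inferred, not documented here).
SHORT_SEQUENCE_SETTINGS = {
    "-F": "F",       # low complexity filter off
    "-E": "20000",   # expectation value
    "-W": "2",       # word size
    "-M": "PAM30",   # weight matrix
}

MAX_SHORT_LENGTH = 35  # "short sequences (35 amino acids or less)"


def run_blast_command(sequence, batch=True):
    """Build a RUN BLAST protein (/SQP) command line for a short query."""
    query = "".join(sequence.split()).upper()
    if len(query) > MAX_SHORT_LENGTH:
        raise ValueError("these settings are recommended for short sequences only")
    parts = ["RUN BLAST", f"{query}/SQP"]
    for flag, value in SHORT_SEQUENCE_SETTINGS.items():
        parts += [flag, value]
    if batch:
        parts.append("BATCH")  # offline batch mode, as used for DGENE/USGENE/PCTGEN
    return " ".join(parts)


print(run_blast_command("DMGWGSGWRP YYYYGMDV"))
```

The same string can then be issued after `FILE DGENE`, `FILE USGENE` and `FILE PCTGEN` in turn.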
Retrieve references for sequences
45
Note: in this example, BLAST sequences with scores of 42 or more (60% match or more) are selected.
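The score cutoff and the percentage are two views of the same threshold: the query's self score in this example is 70 (see the SCORE line in the result displays), and 60% of 70 is 42. A tiny Python sketch of the conversion, with illustrative function names:

```python
def min_score(self_score, percent):
    """Smallest integer score that is at least `percent` of the query self score."""
    return -(-self_score * percent // 100)  # ceiling division on integers


def percent_of_self(score, self_score):
    """Score expressed as a percentage of the query self score."""
    return 100 * score / self_score


print(min_score(70, 60))        # cutoff used in this example
print(percent_of_self(42, 70))  # and back again
```

A different query has a different self score, so the percentage form (entered as `60%` at the GETBATCH prompt later on) is the more portable way to state the cutoff.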
Retrieve REGISTRY BLAST results
46
=> FIL REGISTRY
=> QUE (1065745-06-3 OR 1067695-29-7 OR 1065745-11-0 OR . . .
L1 QUE (1065745-06-3 OR 1067695-29-7 OR 1065745-11-0 OR . . .
=> QUE (862861-57-2 OR 487483-96-5 OR 215027-98-8 OR . . .
L2 QUE (862861-57-2 OR 487483-96-5 OR 215027-98-8 OR . . .
=> QUE (1089240-08-3 OR 1074003-68-1 OR 960549-36-4 OR . . .
L3 QUE (1089240-08-3 OR 1074003-68-1 OR 960549-36-4 OR . . .
=> S L1 OR L2 OR L3
L4 36 L1 OR L2 OR L3
36 similar sequences (L4) with BLAST scores of 42 or more.
Transfer BLAST sequences with scores of 42 or more (60% match or more).
Commands within the dotted box are automatic commands.
Tip: BLAST is better than SCM for searching short sequences for less than 100% match
47
=> FIL REG
=> S DMGWGSGWRPYYYYGMDV/SQSFP
L5 20 DMGWGSGWRPYYYYGMDV/SQSFP

=> S L5 NOT L4
L6 0 L5 NOT L4

=> DEL L5-L6 Y
Subsequence family protein search (/SQSFP) (L5) allows for amino acid family substitution.
REGISTRY BLAST (L4) retrieved 16 extra sequences with 60% or higher match by score that were not retrieved with /SQSFP (L5).
Retrieve the DGENE, USGENE and PCTGEN BLAST search results
48
=> FILE DGENE; RUN GETBATCH EPCAMCDR
. . . .
ENTER (ALL) OR ? :60%
L5 RUN STATEMENT CREATED
L5 32 DMGWGSGWRPYYYYGMDV/SQP.-F F -E 20000 -W 2 -M PAM30

=> FILE USGENE; RUN GETBATCH EPCAMCDR
. . . .
ENTER (ALL) OR ? :60%
L6 RUN STATEMENT CREATED
L6 17 DMGWGSGWRPYYYYGMDV/SQP.-F F -E 20000 -W 2 -M PAM30

=> FILE PCTGEN; RUN GETBATCH EPCAMCDR
. . . .
ENTER (ALL) OR ? :60%
L7 RUN STATEMENT CREATED
L7 12 DMGWGSGWRPYYYYGMDV/SQP.-F F -E 20000 -W 2 -M PAM30
DGENE, USGENE and PCTGEN BLAST searches are retrieved with the RUN GETBATCH command.
Merge and review the DGENE, USGENE and PCTGEN BLAST search results
49
=> DUP IDE L5 L6 L7

FILE 'DGENE' ENTERED AT . . .
COPYRIGHT (C) 2010 THOMSON REUTERS

FILE 'USGENE' ENTERED AT . . .
COPYRIGHT (C) 2010 SEQUENCEBASE CORP

FILE 'PCTGEN' ENTERED AT . . .
COPYRIGHT (C) 2010 WIPO

L8 61 DUP IDE L5 L6 L7 (INCLUDES 0 SETS OF DUPLICATES)
   ANSWERS '1-32' FROM FILE DGENE
   ANSWERS '33-49' FROM FILE USGENE
   ANSWERS '50-61' FROM FILE PCTGEN

=> SOR SCORE D AN D
PROCESSING COMPLETED FOR L8
L9 61 SOR L8 SCORE D AN D
The multi-file answers (L8) can be sorted into descending BLAST score order (L9).
Use the DUP IDE command to merge the results into a single multi-file L-number (L8).
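Conceptually, the two commands above concatenate the per-file answer sets and then order the combined list. The Python sketch below models only that idea, reading `SOR SCORE D AN D` as "score descending, then accession number descending" (our interpretation of the `D` suffix). It does not attempt duplicate detection, and the two lower-scoring records are hypothetical placeholders added so the ordering is visible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    file: str    # source database
    an: str      # accession number
    score: int   # BLAST score


def merge_and_sort(*answer_sets):
    """Combine answer sets, then sort by score and accession number, both descending."""
    merged = [hit for answers in answer_sets for hit in answers]
    return sorted(merged, key=lambda h: (h.score, h.an), reverse=True)


# Accession numbers with score 70 appear in this presentation; the HYPO
# records are invented purely for illustration.
dgene = [Hit("DGENE", "ATS30301", 70), Hit("DGENE", "AEC16143", 70),
         Hit("DGENE", "HYPO0001", 45)]
usgene = [Hit("USGENE", "20070081993.104", 70)]
pctgen = [Hit("PCTGEN", "HYPO-PCT-1", 52)]

for hit in merge_and_sort(dgene, usgene, pctgen):
    print(hit.score, hit.file, hit.an)
```

Sorting on score first puts the exact CDR matches from all three databases at the top of a single list, which is what makes the side-by-side review on the following slides practical.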
Review the BLAST search results
50
=> D TRI SCORE ALIGN FROM EACH
L9 ANSWER 1 OF 61 DGENE COPYRIGHT 2010 THOMSON REUTERS on STN
AN ATS30301 protein DGENE
TI New polypeptide comprises a binding domain capable of binding to an epitope of human and non-chimpanzee primate CD3 epsilon chain, useful for preventing, treating, or ameliorating a proliferative, tumor, or immunological disorder.
DESC Anti-CD3/anti-EpCAM cross-species single chain Ab protein, SEQ: 592.
KW single chain antibody; CD3E; T-cell CD3 glycoprotein epsilon chain; TACSTD1; EpCAM; protein production; protein therapy; therapeutic; prophylactic to disease; protein detection; immune disorder; immunomodulator; cancer; cytostatic; hyperproliferation; Fusion protein.
SQL 504
SCORE 70 100% of query self score 70
BLASTALIGN
Query = 18 letters
Length = 504
Score = 69.8 bits (157), Expect = 5e-18
Identities = 18/18 (100%), Positives = 18/18 (100%)
Query: 1  DMGWGSGWRPYYYYGMDV 18
          DMGWGSGWRPYYYYGMDV
Sbjct: 99 DMGWGSGWRPYYYYGMDV 116
Example: displaying the best answer from each database in a free-of-charge review format.
Review the BLAST search results (cont.)
51
L9 ANSWER 21 OF 61 USGENE COPYRIGHT 2010 SEQUENCEBASE CORP on STN
TI Method of identifying binding site domains that retain the capacity of binding to an epitope (Patent)
DESC Homo Sapiens Protein; sequence 54 of 77
MTY Protein
SQL 127
SCORE 70 100% of query self score 70
BLASTALIGN
Query = 18 letters
Length = 127
Score = 69.8 bits (157), Expect = 2e-18
Identities = 18/18 (100%), Positives = 18/18 (100%)
Query: 1  DMGWGSGWRPYYYYGMDV 18
          DMGWGSGWRPYYYYGMDV
Sbjct: 99 DMGWGSGWRPYYYYGMDV 116

L9 ANSWER 25 OF 61 PCTGEN COPYRIGHT 2010 WIPO on STN
TI CROSS-SPECIES-SPECIFIC BINDING DOMAIN
MTY PRT
SQL 504
SCORE 70 100% of query self score 70
BLASTALIGN
Query = 18 letters
Length = 504
Score = 69.8 bits (157), Expect = 5e-18
Identities = 18/18 (100%), Positives = 18/18 (100%)
Query: 1  DMGWGSGWRPYYYYGMDV 18
          DMGWGSGWRPYYYYGMDV
Sbjct: 99 DMGWGSGWRPYYYYGMDV 116
Use the STN Express 8.4 Patent Family Manager wizard to display the results
52
Access the patent family manager wizard from the Discover! Menu.
Choose a bibliographic display format with alignment for the first (best) hit, and a free-of-charge format with alignment for the rest of the sequences in each patent family group.
The patent family manager begins by organising the results using FSORT...
53
=> FSORT L9
. . . .
L10 61 FSO L9

11 Multi-record Families   Answers 1-60
   Family 1    Answers 1-18
   Family 2    Answers 19-20
   Family 3    Answers 21-22
   Family 4    Answers 23-25
   Family 5    Answers 26-27
   Family 6    Answers 28-29
   Family 7    Answers 30-34
   Family 8    Answers 35-42
   Family 9    Answers 43-56
   Family 10   Answers 57-58
   Family 11   Answers 59-60

1 Individual Record   Answer 61
0 Non-patent Records
In this example, 12 patent family groups (i.e. 11 + 1) are retrieved.
Commands in RED are those issued automatically by the STN Express Patent Family Manager.
FSORT organizes the patent sequence records by Publication, Application, Related, and Priority numbers.
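Grouping records that share any publication, application, related or priority number is a connected-components problem. The Python sketch below illustrates that idea with a small union-find; it is not a description of how FSORT is implemented. The numbers for the DGENE record and for USGENE 20070081993.104 are taken from the displays in this presentation, while the numbers attached to the second USGENE hit are assumed for illustration because its display does not show them.

```python
def group_families(records):
    """Group record ids that share at least one patent number.

    records: dict mapping a record id to a set of number strings.
    Returns a sorted list of sorted member lists.
    """
    parent = {rid: rid for rid in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_owner = {}  # number -> first record seen carrying it
    for rid, numbers in records.items():
        for number in numbers:
            if number in first_owner:
                union(first_owner[number], rid)
            else:
                first_owner[number] = rid

    families = {}
    for rid in records:
        families.setdefault(find(rid), []).append(rid)
    return sorted(sorted(members) for members in families.values())


records = {
    # PI / AI / PRAI as displayed for DGENE AEC16143
    "DGENE AEC16143": {"US 20050180979", "US 2004-778915"},
    # Assumed numbers: this hit's display omits them
    "USGENE family-4 hit (assumed numbers)": {"US 20050180979", "US 2004-778915"},
    # PI / AI / RLI as displayed for USGENE 20070081993.104
    "USGENE 20070081993.104": {"US 20070081993", "US 2004-554851", "WO 2004-EP5687"},
}

for family in group_families(records):
    print(family)
```

Records that share no number with any other record end up alone, which corresponds to the "individual record" reported in the FSORT summary.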
...and then continues by displaying the family groups in the specified formats
54
=> DIS L10 PFAM=4 1 BIB,PSL,SQL,SCORE,ALIGN
L10 ANSWER 23 OF 61 DGENE COPYRIGHT 2010 THOMSON REUTERS on STN
FAMILY 4
AN AEC16143 protein DGENE Full-text
TI Treating tumorous disease, such as breast, colon, prostate, liver, skin, ovarian, cervical and lung cancer, by administering a human immunoglobulin specifically binding to the human epithelial cell adhesion molecule (EpCAM) antigen.
IN Peters M; Locher M; Prang N; Quadt C
PA (MICR-N) MICROMET AG.
PI US 20050180979 A1 20050818 23
AI US 2004-778915 20040213
PRAI US 2004-778915 20040213
LA English
OS 2005-590351 [60]
DESC Human anti-EpCAM immunoglobulin heavy chain, SEQ ID NO: 1.
PSL Claim 12; SEQ ID NO 1
SQL 457
SCORE 70 100% of query self score 70
BLASTALIGN
Query = 18 letters
Length = 457
Score = 69.8 bits (157), Expect = 5e-18
Identities = 18/18 (100%), Positives = 18/18 (100%)
Query: 1  DMGWGSGWRPYYYYGMDV 18
          DMGWGSGWRPYYYYGMDV
Sbjct: 99 DMGWGSGWRPYYYYGMDV 116
Commands in RED are those issued automatically by the STN Express Patent Family Manager.
...and then continues by displaying selected results in the specified formats (cont.)
=> DIS L10 PFAM=4 2-TOT TRIAL,SCORE,ALIGN
L10 ANSWER 24 OF 61 USGENE COPYRIGHT 2010 SEQUENCEBASE CORP on STN
FAMILY 4
TI Anti-EpCam Immunoglobulins (PublishedApplication)
DESC Artificial Protein; Anti-EpCAM Heavy Chain; sequence 1 of 2
MTY Protein
SQL 457
SCORE 70 100% of query self score 70
BLASTALIGN
Query = 18 letters
Length = 457
Score = 69.8 bits (157), Expect = 5e-18
Identities = 18/18 (100%), Positives = 18/18 (100%)
Query: 1  DMGWGSGWRPYYYYGMDV 18
          DMGWGSGWRPYYYYGMDV
Sbjct: 99 DMGWGSGWRPYYYYGMDV 116

L10 ANSWER 25 OF 61 USGENE COPYRIGHT 2010 SEQUENCEBASE CORP on STN
FAMILY 4
TI Anti-EpCAM immunoglobulins (PublishedApplication)
MTY Protein
SQL 457
SCORE 70 100% of query self score 70
BLASTALIGN
Query = 18 letters
Length = 457
Score = 69.8 bits (157), Expect = 5e-18
Identities = 18/18 (100%), Positives = 18/18 (100%)
Query: 1  DMGWGSGWRPYYYYGMDV 18
          DMGWGSGWRPYYYYGMDV
Sbjct: 99 DMGWGSGWRPYYYYGMDV 116
These two USGENE hits are in the same family as the DGENE record on the previous slide.
...and then continues by displaying selected results in the specified formats (cont.)
=> DIS L10 61 BIB,PSL,SQL,SCORE,ALIGN
L10 ANSWER 61 OF 61 USGENE COPYRIGHT 2010 SEQUENCEBASE CORP on STN
AN 20070081993.104 Peptide USGENE Full-text
TI Pharmaceutical composition comprising a bispecific antibody for epcam (PublishedApplication)
IN Kufer Peter (Moosburg, DE); Berry Meera (Ulm, DE); Offner Sonja (Munich, DE); Brischwein Klaus (Munich, DE); Wolf Andreas (Planegg, DE)
PA NO ASSIGNEE AT PUBLICATION
PI US 20070081993 A1 20070412
AI US 2004-554851 20040526
RLI WO 2004-EP5687 20040526
DT Patent
SQL 18
SCORE 70 100% of query self score 70
BLASTALIGN
Query = 18 letters
Length = 18
Score = 69.8 bits (157), Expect = 3e-19
Identities = 18/18 (100%), Positives = 18/18 (100%)
Query: 1 DMGWGSGWRPYYYYGMDV 18
         DMGWGSGWRPYYYYGMDV
Sbjct: 1 DMGWGSGWRPYYYYGMDV 18
This USGENE record is the “individual record” in the FSORT answer set (L10).
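The subject coordinates reported in these alignments can be sanity-checked offline. The sketch below is not BLAST: it assumes that a plain exact substring search is sufficient for a 100% identity hit. The `locate_exact` helper and the padded subject sequence are hypothetical, since the full 457-residue heavy chain is not reproduced in the slides.

```python
def locate_exact(query, subject):
    """Return the 1-based (start, end) of the first exact match, or None."""
    index = subject.find(query)
    if index < 0:
        return None
    return index + 1, index + len(query)

cdr = "DMGWGSGWRPYYYYGMDV"  # 18-residue query shown in the alignments

# Hypothetical subject: 98 placeholder residues, the CDR, then filler up to length 457
subject = "A" * 98 + cdr + "A" * (457 - 98 - len(cdr))

span = locate_exact(cdr, subject)      # heavy-chain style record
peptide_span = locate_exact(cdr, cdr)  # record whose whole sequence is the CDR
```

Under these assumptions `span` reproduces the 99 to 116 subject range of the 457-residue hits, and `peptide_span` reproduces the 1 to 18 range of the 18-residue peptide record.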
Retrieve and identify any unique CAplus references found by the REGISTRY search
=> FILE HCAPLUS
=> TRA L10 PN
L11 TRANSFER L10 1- PN : 22 TERMS
L12 16 L11
ALL TERMS IN L11 RETRIEVED.
=> S L4
L13 14 L4
=> S L13 NOT L12
L14 1 L13 NOT L12
In this example, one additional relevant reference has been found by including a REGISTRY/CAplus search (L14).
Transfer Publication Numbers (PN) from DGENE, USGENE and PCTGEN (L10) to find corresponding HCAplus records (L12).
The 36 REGISTRY records (L4) correspond to 14 HCAplus records (L13).
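The logic of the transfer and NOT steps is a simple set difference, which can be illustrated with Python sets. This is a sketch of the idea only: apart from 2008:1210508, the accession numbers are hypothetical placeholders, and the sets are far smaller than the real L12 and L13.

```python
# HCAplus records reached via patent numbers from DGENE/USGENE/PCTGEN (stands in for L12)
from_patent_numbers = {"AN-0001", "AN-0002", "AN-0003"}

# HCAplus records linked to the REGISTRY sequence hits (stands in for L13)
from_registry = {"AN-0002", "AN-0003", "2008:1210508"}

# Equivalent of "S L13 NOT L12": references that only the REGISTRY route found
unique_to_registry = from_registry - from_patent_numbers
```

Only the record absent from the patent-number route survives the difference, which is the role L14 plays in the transcript.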
Display any unique CAplus references retrieved in the REGISTRY search
=> D L14 BIB ABS HITRN
L14 ANSWER 1 OF 1 HCAPLUS COPYRIGHT 2010 ACS on STN
AN 2008:1210508 HCAPLUS Full-text
DN 149:446284
TI Cross-species bispecific single-chain antibodies to human or non-chimpanzee primate CD3e and surface antigen for treating tumorous, proliferative, or immunological disease and cancer
IN Ebert, Evelyn; Meier, Petra; Sriskandarajah, Mirnaalini; Burghart, Elke; Wissing, Sandra; Klinger, Matthias; Bluemel, Claudia; Raum, Tobias; Rau, Doris; Mangold, Susanne; Kvesic, Majk; Kischel, . . . . Hausmann, Susanne; Riethmueller, Gert
PA Micromet A.-G., Germany
SO PCT Int. Appl., 397pp.
   CODEN: PIXXD2
DT Patent
LA English
FAN.CNT 3
   PATENT NO.      KIND DATE     APPLICATION NO.      DATE
   --------------- ---- -------- -------------------- --------
PI WO 2008119565   A2   20081009 WO 2008-EP2662       20080403
. . . .
Retrieve, identify and display unique CAplus references from the REGISTRY search
AB The present invention provides polypeptides comprising an antibody binding domain capable of binding to an epitope of human and non-chimpanzee primate CD3 e-chain fused to a cell surface antigen selected from epidermal growth factor receptor, epidermal growth factor receptor variant III, melanoma chondroitin sulfate proteoglycan, carbonic anhydrase IX, CD30, CD33, CD44 variant 6, EpCAM, Her2/neu, MUC1, and IgE. An N-terminal 1-27 amino acid residues polypeptide fragment of the extracellular domain of human CD3 e-chain was identified which - in contrast to all other known epitopes of CD3 e - maintains its 3-dimensional structural integrity when taken out of its native environment in the CD3 complex. . . . . Further, the invention provides a method for the identification of polypeptides comprising a cross-species specific binding domain capable of binding to an epitope of human and non-chimpanzee primate CD3 e.
IT 1067695-29-7 1067695-33-3 1067695-35-5 1067695-37-7 1067695-39-9
   RL: PRP (Properties); THU (Therapeutic use); BIOL (Biological study); USES (Uses)
   (amino acid sequence; cross-species bispecific single-chain antibodies to human or non-chimpanzee primate CD3e and surface antigen for treating tumorous, proliferative, or immunol. disease and cancer)
It is important to include the HITRN or HITIND display for post-processing.
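When HITRN output is post-processed outside STN Express, CAS Registry Numbers copied from a transcript can be checked for transcription errors with the standard CAS check-digit rule (the last digit equals the weighted sum of the preceding digits, modulo 10). The following is a minimal sketch; the `is_valid_cas_rn` helper name is an assumption, and the numbers are those listed in the IT field above.

```python
import re

def is_valid_cas_rn(rn):
    """Validate a CAS Registry Number of the form NNNNNNN-NN-N via its check digit."""
    match = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", rn)
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    # Weight the digits 1, 2, 3, ... starting from the rightmost one
    total = sum(weight * int(d) for weight, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))

hit_rns = [
    "1067695-29-7",
    "1067695-33-3",
    "1067695-35-5",
    "1067695-37-7",
    "1067695-39-9",
]
results = {rn: is_valid_cas_rn(rn) for rn in hit_rns}
```

A check like this is useful on flattened transcripts, where adjacent Registry Numbers can run together and must be split by hand.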
STN Express post-processing tools provide the finishing touches to the multi-file search
1) DGENE, USGENE and PCTGEN results (L10) can be conveniently tabulated using the STN Express Table Tool and exported to a Microsoft Excel worksheet
2) The REGISTRY “BLAST Report with Alignment data” tool merges BLAST alignments with corresponding unique CAplus records (L14) to form a single RTF file
1) DGENE, USGENE and PCTGEN results can be tabulated and exported to Excel
Preferred fields, fonts, labels, etc., can be saved as a Template for repeated re-use.
Once in Excel there are various options to sort, filter and review the multi-file results
Some tips for Microsoft Excel:
• Resize columns and rows as desired, especially the BLAST alignment column to approx. 77
• View, Freeze panes: holds the top row fixed when scrolling down
• Add Filters: provides a great way to navigate results, for example by BLAST score (above)
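For users who prefer to script this review step, the tabulate, sort, and filter workflow can be approximated with the Python standard library. This is a sketch under assumptions and does not reproduce the STN Express Table Tool: the column names and the `to_csv` helper are hypothetical, the first two rows are taken from the records displayed earlier, and the third row is invented to show the filter.

```python
import csv
import io

rows = [
    {"database": "DGENE", "accession": "AEC16143", "length": 457, "score": 70},
    {"database": "USGENE", "accession": "20070081993.104", "length": 18, "score": 70},
    # Made-up low-scoring row, included only to demonstrate filtering
    {"database": "USGENE", "accession": "HYPOTHETICAL-1", "length": 120, "score": 35},
]

def to_csv(rows, min_score=0):
    """Return CSV text for rows with score >= min_score, best score first."""
    kept = sorted(
        (r for r in rows if r["score"] >= min_score),
        key=lambda r: (-r["score"], r["length"]),  # ties: shortest sequence first
    )
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["database", "accession", "length", "score"]
    )
    writer.writeheader()
    writer.writerows(kept)
    return buffer.getvalue()

csv_text = to_csv(rows, min_score=50)
```

The resulting text can be saved with a `.csv` extension and opened in Excel, where the tips above still apply.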
2) REGISTRY BLAST Alignments can be merged with corresponding CAplus records
Preferred fields, fonts, labels, etc., can be saved as a Template for repeated re-use.
REGISTRY BLAST alignments are merged with the CAplus records in the transcript.
The REGISTRY “BLAST Report with Alignment data” tool provides an RTF file
Patent classifications for antibody topics
USCL Class   USCL Subclass   IPC8 Subclass   IPC8 Group   Topic
530          387.1+          C07K            16/00+       Immunoglobulins, Antibodies
530          388.1+          C12P            21/08+       Monoclonal Antibodies
530          391.1+          C07K            16/00+       Conjugated antibodies
424          1.49            A61K            49/00+       Compositions containing radio-labeled antibodies
424          130.1           A61K            39/395+      Compositions for body treatment containing antibodies (therapeutics, vaccines)
435          7.1+            G01N            33/53+       Immunoassays using antibodies
435          188             C12N            09/96+       Antibodies conjugated to enzymes
435          188.5           C07K            16/00+       Catalytic antibodies
435          325+            C12N            5/00+        Cells that express antibodies (fused, recombinant)
525          54.1            A61K            47/48+       Antibodies bound to resins
Derwent World Patents Index® (DWPI℠) Manual Codes (/MC) for antibody topics
http://scientific.thomson.com/cgi-bin/mc/search.cgi
More than 120 Manual Codes (/MC) are available for antibody searching.
Summary
• DGENE provides detailed annotations and indexing for text searching for antibody technologies
• REGISTRY provides extensive annotations, and common, trade, generic, and lab antibody names
• CAS controlled and index terms are all useful for retrieving antibody information in CAplus
  – Use text terms to search for types of antibodies in CAplus
    • Class (α (IgA), δ (IgD), ε (IgE), γ (IgG), μ (IgM))
• BLAST and SCM searches are available for antibody sequence searching in DGENE, USGENE, PCTGEN and REGISTRY on STN
Resources
• Archived e-Seminars: www.cas.org/support/stngen/stntraining/recorded.html
  – Unmasking the World of Antibodies in CAS REGISTRY
  – Finding Antibodies and Immunoglobulins
  – Sequence motif searching on STN
• STN User Documentation: www.cas.org/support/stngen/stndoc/sequences.html
  – Quick Reference Guides
    • CAS REGISTRY: BLAST similarity searching via STN Express
    • CAS REGISTRY: Exact and pattern searching of nucleic acid sequences
    • CAS REGISTRY: Exact and pattern searching of protein sequences
• Sequence Searching on STN public workshop: www.stn-international.com/sequence_searching.html
Searching for antibody information on STN®
www.stn-international.com