
Searching for antibody information on STN®

Robert Austin – FIZ Karlsruhe

Agenda

• Introduction to antibodies and immunoglobulins
• Understanding and searching antibody indexing in DGENE, REGISTRY and CAplus℠
• Sequence searching for antibodies using DGENE, USGENE®, PCTGEN and REGISTRY
– Complementarity Determining Regions (CDRs) using BLAST® and Sequence Code Match (SCM) searching
– Multi-file search and post-processing

2

See also: Sequence Basics e-Seminar (June 2010): http://www.stn-international.com/Sequence_Basics_Seminar.html

http://www.stn-international.com/training_center/bioseq/dgene_wm.pdf


Why use STN to search for antibodies?

• Sequence information on STN is comprehensive
– Four sequence databases allow users to achieve a comprehensive search of sequences from both patent and non-patent documents
• Sequence databases on STN are timely
– Allows you to keep up-to-date with the most current information
• Sequence indexing is unique
– Allows you to retrieve sequences containing uncommon residues or chemical modifications that are difficult to find

3

Four sequence databases on STN provide unique information

• DGENE
– Sequences from the 41 authorities covered by DWPI℠
– Sequence data are intellectually analyzed and indexed
– Legal status and patent family display options
– Information is updated once every two weeks
• REGISTRY
– Sequences from the 61 authorities covered by CAplus
– Sequences also come from >3000 life science journals
– Sequence data are intellectually analyzed and indexed
– Information is updated daily

4

Four sequence databases on STN provide unique information (cont.)

• USGENE
– Sequences from all relevant USPTO published patent applications and granted (issued) patents
– Legal status and patent family display options
– Updated weekly, within three days of publication
• PCTGEN
– Sequences submitted and published electronically as a formal part of WIPO/PCT published patent applications
– Legal status and patent family display options
– Updated weekly, within 24 hours of publication

5

Antibodies are produced as a defence against foreign substances (antigens)

• Antibodies (Ab) are specialised glycoproteins, which differ in size, charge, carbohydrate content and amino acid sequence composition
• They are also known as immunoglobulins (Ig)
• Antibodies are found in blood and other bodily fluids of mammals and some other vertebrates
• They are a central part of the humoral immune response (HIR) and are synthesised by B-cells

6

Antibodies are useful because of their biological properties and high specificity

• There are different classes of antibodies, depending on their structure
– Mammals have five classes of antibodies
  • α (IgA), δ (IgD), ε (IgE), γ (IgG), μ (IgM)
– Each class has different biological properties
• Antibodies are highly specific to antigens
– Able to locate one molecule of a protein antigen out of more than 10^8 similar molecules
– Useful in targeted therapy and as diagnostic tools

7

Mammalian antibodies are Y-shaped and composed of heavy and light chains

• Antibodies are composed of four polypeptides
– Two identical light chains (L)
– Two identical heavy chains (H)
• Both light and heavy chains consist of constant (C) region domains with little variability, and variable (V) region domains with high variability
• The four chains are held together by several disulphide bonds and form a Y-shaped molecule
• The antigen binding sites (CDRs) are located at the tips of the Y-shaped arms

8

Additional nomenclature describes variable region domains

• Light chains exist in two forms
– kappa (κ), lambda (λ)
• Heavy chains exist in five forms
– α (IgA), δ (IgD), ε (IgE), γ (IgG), μ (IgM)
– Variation in heavy chains gives rise to various antibody subclasses: IgG1, IgG2, IgA1, etc.

9

Mammalian antibodies are Y-shaped and composed of heavy and light chains

10

[Figure: schematic of a Y-shaped antibody. Labels: Light Chain (L): κ, λ; Heavy Chain (H): γ, µ, α, δ, ε; an antigen binding site at the tip of each arm; hinge; CH2 and CH3 domains; -S-S- disulphide bonds]

CDR = complementarity determining region
VL = variable region - light chain
VH = variable region - heavy chain
CL = constant region - light chain
CH = constant region - heavy chain

Humanization of antibodies is an important process for therapeutic usage

• Immunotherapy (or biotherapy)
– Uses certain parts of the immune system to fight diseases such as cancer
– Treatments are less toxic and potentially more effective than chemical drugs
– Types of antibodies used for therapy
  • Monoclonal antibodies
  • Chimeric antibodies
  • CDR-Grafted antibodies
  • Phage Display antibodies

11

Antibodies are also used as diagnostic tools

• Antibody tools reduce assay time without compromising sensitivity
– Flow cytometric analysis
  • Analysis of morphological complexity of the cells, DNA content (cell cycle analysis), cell sorting
– Microarray technology
  • Proteomics: Protein characterization and analysis of diseased vs. healthy patients
– Immunoblotting (or western blot)
  • Detection of a specific protein from a tissue or cell sample
– Immunohistochemistry
  • Localization of protein(s) in cells or tissue sections using antibodies

12


Antibody sequences are indexed in GENESEQ™ on STN (DGENE)

• Description (/DESC)
– Concise one-line description of the sequence
– E.g. Mouse anti-protein X antibody VL region
• Keyword (/KW) indexing for antibody sequences
– Type, e.g. humanized, monoclonal
– Region, e.g. light chain constant region
– Activity, e.g. antibody therapy, immune stimulation
– Target, e.g. protein X
– Disease, e.g. immune disorder, autoimmune disease
– Technology, e.g. antibody engineering, antibody array
• Abstract (AB)
– Includes the use of the antibody within the invention
• Features Table (/FEAT)
– Details about Domain, Region, Disulphide-bonds, etc.

15

L1 ANSWER 1 OF 1 DGENE COPYRIGHT 2010 THOMSON REUTERS on STN

ACCESSION NUMBER: AEN02775 protein DGENE

TITLE: New antibody useful for treating e.g. cancer

competitively inhibiting binding of competitor antibody

having complementarity determining region of specific

amino acid sequences as given in the specification to G-

protein coupled receptor.

INVENTOR: Howard M; Schall T

PATENT ASSIGNEE: (CHEM-N)CHEMOCENTRYX INC.

PATENT INFO: WO 2006116319 A2 20061102 60

APPLICATION INFO: WO 2006-US15492 20060419

PRIORITY INFO: US 2005-674140P 20050421

PAT. SEQ. LOC: Disclosure; SEQ ID NO 22

DATA ENTRY DATE: 22 FEB 2007 (first entry)

DOCUMENT TYPE: Patent

LANGUAGE: English

OTHER SOURCE: 2007-110158 [11]

DESCRIPTION: anti-CCX-CK2-antibody 11G8 VL region SEQ ID NO 22.

DGENE records also include the DWPI patent title (/TI).

Each DGENE record has a concise one-line description of the antibody sequence

Each DGENE record has keyword indexing for the antibody sequence

16

KEYWORD: cytostatic; neuroprotective; nootropic; nephrotropic; antirheumatic; antiarthritic; cardiant; antiarteriosclerotic; antiasthmatic; dermatological; antiinflammatory; gastrointestinal-gen.; antipsoriatic; vasotropic; immunosuppressive; antiulcer; ophthalmological; antidiabetic; vulnerary; hepatotropic; anorectic; respiratory-gen.; gynecological; hemostatic; cardiovascular-gen.; contraceptive; protein interaction; antibody; angiogenesis inhibition; cell proliferation; protein detection; antibody therapy; arthritis; Alzheimers disease; multiple sclerosis; renal failure; rheumatoid arthritis; transplant rejection; asthma; glomerulonephritis; contact dermatitis; inflammatory bowel disease; colitis; psoriasis; reperfusion injury; ocular disease; diabetic retinopathy; retinopathy of prematurity; macular degeneration; graft rejection; neovascular glaucoma; rubeosis; Osier-Webber Syndrome; telangiectasis; angiofibroma; Crohns disease; eczema;

• • •wound healing; osteopathic; fractures; burns; inflammation; ischemia; peripheral vascular disease; pre-eclampsia; cardiovascular disease; 11G8; light chain variable region.

ORGANISM: Mus sp.

Each DGENE abstract describes the use of the antibody sequence within the invention

17

ABSTRACT: The invention describes an antibody (A1) that competitively inhibits binding of a competitor antibody (a1) to CCX-CKR2 (G-protein coupled receptor), where the competitor antibody comprises the complementarity determining region (CDR) of specific amino acid sequences as given in the specification. The antibody is useful in a pharmaceutical composition for inhibiting angiogenesis or proliferation of a cancer cell in an individual such as other than human having or pre-disposed to have arthritis; for detecting a cell expressing CCX-CKR2 in a biological sample; for treating Alzheimer's disease; multiple sclerosis; kidney dysfunction; rheumatoid arthritis; cardiac allograft rejection; atherosclerosis; asthma; glomerulonephritis; contact dermatitis; inflammatory bowel disease; colitis; psoriasis; reperfusion injury; ocular angiogenic diseases, for example, • • • joints (e.g. arthritis and hemophiliac joints), healing of wounds, fractures, and burns, inflammatory diseases, ischemic heart, and peripheral vascular diseases; preclampsia and cardiovascular disease; for birth control. The antibody competitively inhibits binding of a competitor antibody to CCX-CKR2, and potently inhibits angiogenesis. This is the amino acid sequence of anti-CCX-CK2-antibody 11G8 light chain variable region.

The DGENE Feature Table includes detailed annotations for the antibody sequence

18

AMINO ACID COUNTS: 2 A; 4 R; 2 N; 6 D; 0 B; 2 C; 5 Q; 3 E; 0 Z; 9 G; 3 H;

5 I; 10 L; 4 K; 1 M; 4 F; 6 P; 15 S; 5 T; 1 W; 6 Y; 7 V;

0 Others

SEQUENCE LENGTH: 100

SEQUENCE

1 dvlmtqtpls lpvslgdqas iscrsshyiv hsdgntylew ylqkpgqspk

51 lliykvsnrf sgvpdrfsgs gsgtdftlki srveaedlgi yycfqgshvp

FEATURE TABLE:

Key |Location|Qualifier|

==========+========+=========+=======================

Region |1..23 |note |"framework region 1"

Region |24..39 |note |"CDR1"


Region |55..61 |note |"CDR2"
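The feature table locations can be applied directly to the sequence shown in the record. The following is a minimal Python sketch, assuming the locations are 1-based and inclusive (the usual convention for sequence feature tables); the helper name `region` is illustrative and not part of STN.

```python
# Sequence from the DGENE record above (AEN02775), with the display spaces removed.
SEQUENCE = (
    "dvlmtqtpls" "lpvslgdqas" "iscrsshyiv" "hsdgntylew" "ylqkpgqspk"
    "lliykvsnrf" "sgvpdrfsgs" "gsgtdftlki" "srveaedlgi" "yycfqgshvp"
)

# Locations copied from the rows of the feature table that are shown.
FEATURES = {"framework region 1": "1..23", "CDR1": "24..39", "CDR2": "55..61"}


def region(sequence: str, location: str) -> str:
    """Return the residues for a 'start..end' location (assumed 1-based, inclusive)."""
    start, end = (int(part) for part in location.split(".."))
    return sequence[start - 1:end]


for note, location in FEATURES.items():
    print(f"{note:20s} {location:8s} {region(SEQUENCE, location)}")
```

Under that assumption, CDR1 (24..39) is a 16-residue stretch and CDR2 (55..61) a 7-residue stretch, which are the short query lengths that the CDR search later in this presentation is tuned for.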


Antibody light and heavy chain sequences are indexed in separate DGENE records

19

L1 ANSWER 1 OF 2 DGENE COPYRIGHT 2010 THOMSON REUTERS on STN
AN AWO22746 protein DGENE
TI Purifying an antibody from a composition comprises loading the . . .
DESC Anti-VEGF bevacizumab humanized antibody light chain sequence, SEQ ID 12.
KW protein purification; VEGF ligand; cation-exchange; chromatography; light chain; humanized antibody; protein purification; vascular endothelial growth factor.
SQL 214

L1 ANSWER 2 OF 2 DGENE COPYRIGHT 2010 THOMSON REUTERS on STN
AN AWO22745 protein DGENE
TI Purifying an antibody from a composition comprises loading the . . .
DESC Anti-VEGF bevacizumab humanized antibody heavy chain sequence, SEQ ID 11.
KW protein purification; VEGF ligand; cation-exchange; chromatography; heavy chain; humanized antibody; protein purification; vascular endothelial growth factor.
SQL 453

Note: This example comes from WO2009058812.

Thomson Reuters indexing makes clear which one is which.

Antibody light and heavy chain sequences are indexed in separate USGENE records

20

=> D AN TRIAL 1-2

L2 ANSWER 1 OF 2 USGENE COPYRIGHT 2010 SEQUENCEBASE CORP on STN

AN 20090148435.12 Protein USGENE

TI ANTIBODY PURIFICATION BY CATION EXCHANGE CHROMATOGRAPHY

(PublishedApplication)

DESC Artificial Protein; Sequence is synthesized; sequence 12 of 20

MTY Protein

SQL 214

L2 ANSWER 2 OF 2 USGENE COPYRIGHT 2010 SEQUENCEBASE CORP on STN

AN 20090148435.11

TI ANTIBODY PURIFICATION BY CATION EXCHANGE CHROMATOGRAPHY

(PublishedApplication)

DESC Artificial Protein; Sequence is synthesized; sequence 11 of 20

MTY Protein

SQL 453

Note: This example comes from US20090148435 A1, which is equivalent to WO2009058812 A1.

Patent applicants often do not provide a clear description.

Antibodies are indexed as substances in CAS REGISTRY

• Antibodies are indexed as sequences if the sequence(s) is provided by the author(s)
– The Note (/NTE) field contains additional information about the sequence (e.g. chemical modifications, linkages, uncommon amino acids)
– Separate records for the full multi-chain antibody, light chain and heavy chain sequences may be created
• Antibody sequences are also indexed with
– Index Names
– Trade names
– Generic names
– Lab names

21

CA Index Names provide information, such as sequence type, organisms, and strain/cell/tissue types

22

L1 ANSWER 1 OF 1 REGISTRY COPYRIGHT 2010 ACS on STN
RN 216974-75-3 REGISTRY
CN Immunoglobulin G1, anti-(human vascular endothelial growth factor)(human-mouse monoclonal rhuMAb-VEGF g1-chain), disulfide with human-mouse monoclonal rhuMAb-VEGF light chain, dimer (9CI) (CA INDEX NAME)
OTHER NAMES:
CN Anti-VEGF monoclonal antibody
CN Avastatin
CN Avastin
CN Bevacizumab
CN rhuMAb-VEGF
FS PROTEIN SEQUENCE
SQL 1334,453,453,214,214

Avastin is registered as a multichain sequence (two heavy and two light chains).

Brand/generic names, and lab names are listed under “Other names”.

The CA Index Name for Avastin contains additional information, such as the isotype (G1), the antigen (VEGF), monoclonal antibody, etc.

Modifications and/or linkages between chains are listed in the /NTE field

23

NTE multichain
------------------------------------------------------------
type     ------ location ------     description
------------------------------------------------------------
bridge   Cys-22    - Cys-96         disulfide bridge
bridge   Cys-150   - Cys-206        disulfide bridge
bridge   Cys-226   - Cys-214''      disulfide bridge
bridge   Cys-232   - Cys-232'       disulfide bridge
bridge   Cys-235   - Cys-235'       disulfide bridge
bridge   Cys-267   - Cys-327        disulfide bridge
bridge   Cys-373   - Cys-431        disulfide bridge
bridge   Cys-22'   - Cys-96'        disulfide bridge
bridge   Cys-150'  - Cys-206'       disulfide bridge
bridge   Cys-226'  - Cys-214'''     disulfide bridge
bridge   Cys-267'  - Cys-327'       disulfide bridge
bridge   Cys-373'  - Cys-431'       disulfide bridge
bridge   Cys-23''  - Cys-88''       disulfide bridge
bridge   Cys-134'' - Cys-194''      disulfide bridge
------------------------------------------------------------

In REGISTRY, the specific residue position(s) are listed in the /NTE field.
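The bridge locations in the /NTE field can be read programmatically to separate intra-chain from inter-chain disulfide bridges. The Python sketch below rests on an interpretation, not on documented STN behaviour: it assumes that the number of prime marks after a residue position identifies the chain (no prime = chain 1, ' = chain 2, and so on), which is consistent with Cys-214'' being the final cysteine of a 214-residue light chain. The function names are hypothetical.

```python
import re

# One residue reference such as "Cys-214''": residue name, position, trailing primes.
RESIDUE = re.compile(r"([A-Za-z]+)-(\d+)('*)")


def parse_bridge(location: str):
    """Split a bridge location into two (residue, position, chain) tuples.

    Assumption: the count of prime marks selects the chain (0 primes = chain 1).
    """
    matches = RESIDUE.findall(location)
    if len(matches) != 2:
        raise ValueError(f"expected two residues in {location!r}")
    return tuple((name, int(pos), len(primes) + 1) for name, pos, primes in matches)


def is_interchain(location: str) -> bool:
    """True when the two bridged residues sit on different chains."""
    first, second = parse_bridge(location)
    return first[2] != second[2]


for loc in ["Cys-22 - Cys-96", "Cys-226 - Cys-214''", "Cys-232 - Cys-232'"]:
    kind = "inter-chain" if is_interchain(loc) else "intra-chain"
    print(f"{loc:22s} {kind}")
```

Read this way, the table above lists ten intra-chain bridges and four inter-chain bridges (two heavy-to-light and two heavy-to-heavy).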

CAS will index both light and heavy chains for antibodies

24

Heavy Chain
SEQ   1 EVQLVESGGG LVQPGGSLRL SCAASGYTFT NYGMNWVRQA PGKGLEWVGW
     51 INTYTGEPTY AADFKRRFTF SLDTSKSTAY LQMNSLRAED TAVYYCAKYP
    101 HYYGSSHWYF DVWGQGTLVT VSSASTKGPS VFPLAPSSKS TSGGTAALGC
        • • •
    401 PPVLDSDGSF FLYSKLTVDK SRWQQGNVFS CSVMHEALHN HYTQKSLSLS
    451 PGK

Heavy Chain
SEQ   1 EVQLVESGGG LVQPGGSLRL SCAASGYTFT NYGMNWVRQA PGKGLEWVGW
     51 INTYTGEPTY AADFKRRFTF SLDTSKSTAY LQMNSLRAED TAVYYCAKYP
    101 HYYGSSHWYF DVWGQGTLVT VSSASTKGPS VFPLAPSSKS TSGGTAALGC
        • • •
    401 PPVLDSDGSF FLYSKLTVDK SRWQQGNVFS CSVMHEALHN HYTQKSLSLS
    451 PGK

Light Chain
SEQ   1 DIQMTQSPSS LSASVGDRVT ITCSASQDIS NYLNWYQQKP GKAPKVLIYF
     51 TSSLHSGVPS RFSGSGSGTD FTLTISSLQP EDFATYYCQQ YSTVPWTFGQ
        • • •
    201 LSSPVTKSFN RGEC

Light Chain
SEQ   1 DIQMTQSPSS LSASVGDRVT ITCSASQDIS NYLNWYQQKP GKAPKVLIYF
     51 TSSLHSGVPS RFSGSGSGTD FTLTISSLQP EDFATYYCQQ YSTVPWTFGQ
        • • •
    201 LSSPVTKSFN RGEC

Light and heavy chain sequences may also be indexed in separate REGISTRY records

25

=> D SQIDE 1-2

L4 ANSWER 1 OF 2 REGISTRY COPYRIGHT 2010 ACS on STN
RN 1150802-75-7 REGISTRY
CN 8: PN: WO2009058812 SEQID: 12 unclaimed protein (CA INDEX NAME)
FS PROTEIN SEQUENCE
SQL 214
. . . .

L4 ANSWER 2 OF 2 REGISTRY COPYRIGHT 2010 ACS on STN
RN 1150802-74-6 REGISTRY
CN 7: PN: WO2009058812 SEQID: 11 unclaimed protein (CA INDEX NAME)
FS PROTEIN SEQUENCE
SQL 453
. . . .

Tip: Compare to DGENE (slide 19) and to USGENE (slide 20).

Note: REGISTRY sequence records from patents often do not include a description of the sequence.

The antibody REGISTRY numbers are indexed in CAplus bibliographic records

26

L26 ANSWER 1 OF 1 HCAPLUS COPYRIGHT 2010 ACS on STN
AN 2009:553192 HCAPLUS
DN 150:513022
TI Antibody purification by cation exchange chromatography using a high pH wash step to remove of contaminants prior to eluting in a buffer with increased conductivity
IN Lebreton, Benedicte Andree; O'Connor, Deborah Ann; Safta, Aurelia; Sharma, Mandakini
PA Genentech, Inc., USA
SO PCT Int. Appl., 53pp.
   CODEN: PIXXD2
DT Patent
LA English
FAN.CNT 1
   PATENT NO.       KIND  DATE      APPLICATION NO.   DATE
   ---------------  ----  --------  ----------------  --------
PI WO 2009058812    A1    20090507  WO 2008-US81516   20081029
. . . .
AB A method for purifying an antibody by cation exchange chromatog. is described in which a high pH wash step is used to remove of contaminants prior to eluting the desired antibody using an elution buffer with increased conductivity. Preferably the antibody binds human CD20, such as rituximab, or binds human vascular endothelial growth factor (VEGF), such as bevacizumab. . . . .

CAplus records provide title, patent family and abstract.

The antibody Registry Numbers are linked to detailed roles and index terms in CAplus

27

CC 15-3 (Immunochemistry)
   Section cross-reference(s): 9
. . . .
IT 174722-31-7P, Rituximab 216974-75-3P, Bevacizumab
   RL: ARG (Analytical reagent use); BPN (Biosynthetic preparation); PUR (Purification or recovery); THU (Therapeutic use); ANST (Analytical study); BIOL (Biological study); PREP (Preparation); USES (Uses)
   (antibody purification by cation exchange chromatog. using high pH wash step to remove of contaminants prior to eluting in buffer with increased conductivity)
. . . .
IT 192433-87-7 214551-08-3 214551-09-4 214551-11-8 214551-12-9 214551-13-0 444104-00-1 556112-97-1 556112-98-2 556112-99-3 556113-00-9 1150529-46-6 1150802-74-6 1150802-75-7 1150802-76-8 1150802-77-9 1150802-78-0 1150802-79-1 1150802-80-4 1150802-81-5
   RL: PRP (Properties)
   (unclaimed protein sequence; antibody purification by cation exchange chromatog. using a high pH wash step to remove of contaminants prior to eluting in a buffer with increased conductivity)

. . . .

This CAplus record has indexing for both the multi-chain antibody, and separately for the light and heavy chain sequences.

Antibodies are indexed in CAplus

• Covered by controlled terms in CAplus
– Consult the Lexicon for old and new terms
– Refine with additional concepts contained within the same index term by using the (L) operator
• CAS Registry Numbers (CAS RNs) are available to supplement a CAplus search, if retrieval of specific antibody substances is required

28

How are antibodies indexed in CAplus?

• Antibodies are indexed to the most specific level disclosed in a document
– Light chains
  • κ (kappa), λ (lambda)
– Heavy chains
  • α (IgA), δ (IgD), ε (IgE), γ (IgG), μ (IgM)
  • Subclasses: IgG1, IgG2, IgA1, etc.
• Descriptors (limiters) can provide additional information
– Examples: bispecific, catalytic, labeled, neutralizing, humanized, chimeric, etc.

29

Antibody controlled term indexing has changed over time in CAplus

30

Controlled Term                       Years of Coverage
Antibodies and Immunoglobulins/CT     2002 to present
Amboceptors/CT                        1907-1946
Antibodies/CT                         1907-2001
Globulins, immune/CT                  1967-1976
Immunoglobulins/CT                    1962-2001
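Because the controlled term depends on the indexing period, it can help to look up which terms apply to a given year before building a query. The Python sketch below simply encodes the table above; the function name `terms_for_year` is illustrative, and `None` stands for "to present".

```python
from typing import List, Optional, Tuple

# (controlled term, first year, last year); None means "to present".
CONTROLLED_TERMS: List[Tuple[str, int, Optional[int]]] = [
    ("Antibodies and Immunoglobulins/CT", 2002, None),
    ("Amboceptors/CT", 1907, 1946),
    ("Antibodies/CT", 1907, 2001),
    ("Globulins, immune/CT", 1967, 1976),
    ("Immunoglobulins/CT", 1962, 2001),
]


def terms_for_year(year: int) -> List[str]:
    """Return the controlled terms whose coverage period includes the given year."""
    return [
        term
        for term, first, last in CONTROLLED_TERMS
        if first <= year and (last is None or year <= last)
    ]


for year in (1930, 1970, 2005):
    print(year, terms_for_year(year))
```

For 1970, three different terms were in use at once, which is why a comprehensive search needs both the antibody and the immunoglobulin vocabulary.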

Immunoglobulin (Ig) nomenclature can be used to focus on specific forms of interest

31

Search Question: Find records covering lambda light chain immunoglobulins. Are any of these IgG immunoglobulins?

Identify uses of these substances.

Search Strategy

To find references on immunoglobulins...

Step 1. Search appropriate Ig terms
Step 2. Refine with class, subclass, or chain nomenclature
Step 3. Evaluate using D SCAN HIT
Step 4. (Optional) Refine with additional concepts or CAS Roles
Step 5. Display results

32

Tips for searching in CAplus

• Use BOTH single word immunoglobulin and antibody terms in the /IT field, especially to include records prior to 2002
• Use (L) proximity to add modifying terms
• Specific subclass may be used but allow for generic class for comprehensive results
– For example, IgG2a is of most interest but IgG2 encompasses it and should be included
• Supplement CAS Roles with text terms in the modifier

33

Use SET commands to automatically add plurals and abbreviations

=> SET PLU ON; SET ABB ON; SET SPE ON
SET COMMAND COMPLETED

=> S (IMMUNOGLOBULIN(L)LAMBDA)/IT
         11999 IMMUNOGLOBULIN/IT
        143185 IMMUNOGLOBULINS/IT
        150880 IMMUNOGLOBULIN/IT
                 ((IMMUNOGLOBULIN OR IMMUNOGLOBULINS)/IT)
         20421 IG/IT
          6316 IGS/IT
         25278 IG/IT
                 ((IG OR IGS)/IT)
        156119 IMMUNOGLOBULIN/IT
                 ((IMMUNOGLOBULIN OR IG)/IT)
        178328 LAMBDA
            68 LAMBDAS
        178342 LAMBDA
                 (LAMBDA OR LAMBDAS)
L1        1213 IMMUNOGLOBULIN/IT (L) LAMBDA

34

Adding antibody term retrieves additional answers

=> S ANTIBODY/IT (L) LAMBDA
         70328 ANTIBODY/IT
        214281 ANTIBODIES/IT
        223724 ANTIBODY/IT
                 ((ANTIBODY OR ANTIBODIES)/IT)
        178328 LAMBDA
            68 LAMBDAS
        178342 LAMBDA
                 (LAMBDA OR LAMBDAS)
L2         568 ANTIBODY/IT (L) LAMBDA

=> S L1 OR L2
L3        1374 L1 OR L2
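The answer counts above show how much each term contributes. As a small worked check in Python (plain inclusion-exclusion on the three counts reported by STN; the function name is illustrative):

```python
def overlap(count_a: int, count_b: int, count_union: int) -> int:
    """Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|."""
    return count_a + count_b - count_union


l1 = 1213  # IMMUNOGLOBULIN/IT (L) LAMBDA
l2 = 568   # ANTIBODY/IT (L) LAMBDA
l3 = 1374  # L1 OR L2

shared = overlap(l1, l2, l3)
print("records found by both searches:", shared)            # 407
print("records added by the antibody term:", l3 - l1)       # 161
print("records found only with immunoglobulin:", l3 - l2)   # 806
```

So the antibody term contributes 161 records that the immunoglobulin term alone would have missed.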

35

Alternative search using BOTH terms

=> S (ANTIBODY OR IMMUNOGLOBULIN)/IT (L) LAMBDA
L4        1374 (ANTIBODY OR IMMUNOGLOBULIN)/IT (L) LAMBDA

=> S L4 (L) IGG?
         80343 IGG?
L5          58 L4 (L) IGG?

=> D HIT SCAN

L5 58 ANSWERS CAPLUS COPYRIGHT 2010 ACS on STN
IT Antibodies and Immunoglobulins
   RL: ADV (Adverse effect, including toxicity); BSU (Biological study, unclassified); DGN (Diagnostic use); BIOL (Biological study); USES (Uses)
   (IgG, .lambda., .kappa.; gammopathy detected by serum protein electrophoresis for predicting and managing therapy of lymphoproliferative disorder in liver transplant recipients)

HOW MANY MORE ANSWERS DO YOU WISH TO SCAN? (1):END

36

Both Lambda and Kappa forms are indexed.

Remove the kappa light chain entries from the index term search

37

=> S L5 (NOTL) KAPPA
         70588 KAPPA
            11 KAPPAS
         70594 KAPPA
                 (KAPPA OR KAPPAS)
L6          32 L5 (NOTL) KAPPA

=> D SCAN HIT

L6 32 ANSWERS CAPLUS COPYRIGHT 2010 ACS on STN
IT Antibodies and Immunoglobulins
   RL: BSU (Biological study, unclassified); BIOL (Biological study)
   (light chain, .lambda.; of IgG autoantibodies to TSH receptors in Graves disease of humans)

HOW MANY MORE ANSWERS DO YOU WISH TO SCAN? (1):2

Exclude Kappa from the index term search with (NOTL).

Index terms describe therapeutic applications

L6 32 ANSWERS CAPLUS COPYRIGHT 2010 ACS on STN
IT Lymphoma
   (B-cell, monoclonal antibodies to IgG1.lambda. paraprotein variable region epitopes in diagnosis of human)
IT Antibodies
   RL: BIOL (Biological study)
   (monoclonal, IgG1.lambda. paraprotein variable region epitopes recognition by, of human, lymphoid disease diagnosis in relation to)

L6 32 ANSWERS CAPLUS COPYRIGHT 2010 ACS on STN
IT Immunoglobulins
   RL: PRP (Properties)
   (mapping of .lambda. light chain epitopes for human lupus IgG autoantibodies)
IT Protein sequences
   (of Ig .lambda. light chains of humans in relation to lupus IgG autoantibody binding)

HOW MANY MORE ANSWERS DO YOU WISH TO SCAN? (1):0

38


Sequence searching for CDRs

• Within the variable domains of the heavy and light chains, there are three hypervariable regions called Complementarity Determining Regions (CDRs)
• Each chain contains three CDRs
• The CDRs are the regions that recognize the different antigens
– CDRs are important when developing antibodies

40

BLAST and/or Sequence Code Match (SCM) can be used to retrieve CDRs

41

Search Question:The epithelial cell adhesion molecule (EpCAM) is a cell surface protein that is expressed by a variety of tumor cells. We have identified CDR sequence DMGWGSGWRP YYYYGMDV in our laboratory. Find all patent publications that disclose this sequence or similar sequences.

Search Strategy for DGENE, USGENE, PCTGEN and REGISTRY/CAplus

Step 1. RUN BLAST in USGENE, DGENE and PCTGEN using offline BATCH mode

Step 2. Repeat the search using CAS REGISTRY BLAST

Step 3. Retrieve, merge, organize by patent family and display USGENE, DGENE and PCTGEN results

Step 4. Retrieve, identify and display unique CAplus references from the REGISTRY BLAST search

Step 5. Post-process results into tables and reports

42

RUN BLAST searches in DGENE, USGENE and PCTGEN in offline BATCH mode

43

=> FILE DGENE; RUN BLAST DMGWGSGWRPYYYYGMDV/SQP -F F -E 20000 -W 2 -M PAM30 BATCH

PLEASE ENTER BATCH IDENTIFIER (MAX. 8 CHARS):EPCAMCDR

BATCH PROCESSING STARTED FOR EPCAMCDR

=> FILE USGENE; RUN BLAST DMGWGSGWRPYYYYGMDV/SQP -F F -E 20000 -W 2 -M PAM30 BATCH



=> FILE PCTGEN; RUN BLAST DMGWGSGWRPYYYYGMDV/SQP -F F -E 20000 -W 2 -M PAM30 BATCH



Tip: DGENE, USGENE and PCTGEN BLAST searches can be run in parallel using BATCH mode.

Add BATCH to the end of the command.

Adjust the REGISTRY BLAST settings for optimal search retrieval

44

Recommended BLAST settings for short sequences (35 amino acids or less)
1) Turn OFF (uncheck) the low complexity filter
2) Increase Expectation Value to 20,000
3) Decrease Word Size to 2
4) Choose the PAM-30 Weight Matrix
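The same four settings appear as options in the RUN BLAST command lines used for DGENE, USGENE and PCTGEN (-F F, -E 20000, -W 2, -M PAM30). A minimal Python sketch that assembles such a command string for a short query is given below. It only formats text in the pattern shown in this presentation; the function name and the decision to reject longer queries are assumptions made for illustration, and nothing here talks to STN.

```python
SHORT_QUERY_MAX = 35  # residues; threshold quoted in the recommended settings
# Filter off, expectation value 20000, word size 2, PAM30 matrix (as in the slides).
SHORT_QUERY_OPTIONS = "-F F -E 20000 -W 2 -M PAM30"


def short_blast_command(sequence: str, batch: bool = True) -> str:
    """Format a RUN BLAST protein query line for a short (CDR-sized) sequence."""
    query = "".join(sequence.split()).upper()  # drop display spaces, e.g. blocks of 10
    if not query.isalpha():
        raise ValueError("sequence must contain letters only")
    if len(query) > SHORT_QUERY_MAX:
        # Assumption: these settings are meant for short queries only.
        raise ValueError("settings shown apply to sequences of 35 residues or less")
    command = f"RUN BLAST {query}/SQP {SHORT_QUERY_OPTIONS}"
    return command + " BATCH" if batch else command


print(short_blast_command("DMGWGSGWRP YYYYGMDV"))
```

For the 18-residue EpCAM CDR query this reproduces the command line used in the batch searches.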

For more information about BLAST matrices, visit the NCBI web site.

Increase the maximum number of answers to 1,000.

http://www.cas.org/support/stnexp/seehow/expressblast.html

Retrieve references for sequences

45

Note: in this example, BLAST sequences with scores of 42 or more (60% match or more) are selected.

Retrieve REGISTRY BLAST results

46

=> FIL REGISTRY

=> QUE (1065745-06-3 OR 1067695-29-7 OR 1065745-11-0 OR . . .

L1 QUE (1065745-06-3 OR 1067695-29-7 OR 1065745-11-0 OR . . .

=> QUE (862861-57-2 OR 487483-96-5 OR 215027-98-8 OR . . .

L2 QUE (862861-57-2 OR 487483-96-5 OR 215027-98-8 OR . . .

=> QUE (1089240-08-3 OR 1074003-68-1 OR 960549-36-4 OR . . .

L3 QUE (1089240-08-3 OR 1074003-68-1 OR 960549-36-4 OR . . .

=> S L1 OR L2 OR L3
L4 36 L1 OR L2 OR L3

36 similar sequences (L4) with BLAST scores of 42 or more.

Transfer BLAST sequences with scores of 42 or more (60% match or more).

Commands within the dotted box are automatic commands.
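The cut-off of 42 follows from the query's self score. The record displays later in this presentation report "SCORE 70 100% of query self score 70" for an exact match, so a 60% match corresponds to a score of 42. A short Python sketch of that arithmetic, assuming the percentage is simply the hit score divided by the self score (function names are illustrative):

```python
import math


def min_score(self_score: int, percent: int) -> int:
    """Smallest whole score that reaches the given percentage of the self score."""
    return math.ceil(self_score * percent / 100)


def percent_of_self(score: int, self_score: int) -> float:
    """Hit score expressed as a percentage of the query self score."""
    return 100 * score / self_score


print(min_score(70, 60))        # 42
print(percent_of_self(42, 70))  # 60.0
```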

Tip: BLAST is better than SCM for searching short sequences for less than 100% match

47

=> FIL REG

=> S DMGWGSGWRPYYYYGMDV/SQSFP
L5 20 DMGWGSGWRPYYYYGMDV/SQSFP

=> S L5 NOT L4
L6 0 L5 NOT L4

=> DEL L5-L6 Y

Subsequence family protein search (/SQSFP) (L5), allows for amino acid family substitution.

REGISTRY BLAST (L4) retrieved 16 extra sequences with 60% or higher match by score that were not retrieved with /SQSFP (L5).

Retrieve the DGENE, USGENE and PCTGEN BLAST search results

48

=> FILE DGENE; RUN GETBATCH EPCAMCDR. . . .

ENTER (ALL) OR ? :60%
L5 RUN STATEMENT CREATED
L5 32 DMGWGSGWRPYYYYGMDV/SQP.-F F -E 20000 -W 2 -M PAM30

=> FILE USGENE; RUN GETBATCH EPCAMCDR. . . .


=> FILE PCTGEN; RUN GETBATCH EPCAMCDR. . . .


DGENE, USGENE and PCTGEN BLAST searches are retrieved with the RUN GETBATCH command.

Merge and review the DGENE, USGENE and PCTGEN BLAST search results

49

=> DUP IDE L5 L6 L7

FILE 'DGENE' ENTERED AT . . .COPYRIGHT (C) 2010 THOMSON REUTERS

FILE 'USGENE' ENTERED AT . . .COPYRIGHT (C) 2010 SEQUENCEBASE CORP

FILE 'PCTGEN' ENTERED AT . . .COPYRIGHT (C) 2010 WIPO

L8 61 DUP IDE L5 L6 L7 (INCLUDES 0 SETS OF DUPLICATES)
ANSWERS '1-32' FROM FILE DGENE
ANSWERS '33-49' FROM FILE USGENE
ANSWERS '50-61' FROM FILE PCTGEN

=> SOR SCORE D AN D
PROCESSING COMPLETED FOR L8
L9 61 SOR L8 SCORE D AN D

The multi-file answers (L8) can be sorted into descending BLAST score order (L9).

Use the DUP IDE command to merge the results into a single multi-file L-number (L8).
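The merge-then-sort idea can be pictured outside STN as follows. This Python sketch concatenates per-file hit lists and orders them by descending score, then descending accession number, mirroring the sort keys in `SOR SCORE D AN D`. It does not reproduce the duplicate identification that DUP IDE performs. The three accession numbers with score 70 are taken from record displays in this presentation; the fourth entry is hypothetical and is included only so that the score ordering is visible.

```python
from typing import Dict, List

dgene: List[Dict] = [
    {"file": "DGENE", "an": "AEC16143", "score": 70},
    {"file": "DGENE", "an": "ATS30301", "score": 70},
]
usgene: List[Dict] = [
    {"file": "USGENE", "an": "HYPOTHETICAL.1", "score": 42},  # hypothetical hit
    {"file": "USGENE", "an": "20070081993.104", "score": 70},
]
pctgen: List[Dict] = []  # left empty in this sketch

# Merge into one multi-file answer set, then sort on (score, accession number),
# both descending.
merged = dgene + usgene + pctgen
ranked = sorted(merged, key=lambda hit: (hit["score"], hit["an"]), reverse=True)

for hit in ranked:
    print(f'{hit["score"]:3d}  {hit["file"]:7s} {hit["an"]}')
```

Sorting by score first puts the exact matches at the top of the combined list regardless of which database they came from.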

Review the BLAST search results

50

=> D TRI SCORE ALIGN FROM EACH

L9 ANSWER 1 OF 61 DGENE COPYRIGHT 2010 THOMSON REUTERS on STN
AN ATS30301 protein DGENE
TI New polypeptide comprises a binding domain capable of binding to an epitope of human and non-chimpanzee primate CD3 epsilon chain, useful for preventing, treating, or ameliorating a proliferative, tumor, or immunological disorder.
DESC Anti-CD3/anti-EpCAM cross-species single chain Ab protein, SEQ: 592.
KW single chain antibody; CD3E; T-cell CD3 glycoprotein epsilon chain; TACSTD1; EpCAM; protein production; protein therapy; therapeutic; prophylactic to disease; protein detection; immune disorder; immunomodulator; cancer; cytostatic; hyperproliferation; Fusion protein.
SQL 504
SCORE 70 100% of query self score 70
BLASTALIGN
Query = 18 letters
Length = 504
Score = 69.8 bits (157), Expect = 5e-18
Identities = 18/18 (100%), Positives = 18/18 (100%)
Query: 1  DMGWGSGWRPYYYYGMDV 18
          DMGWGSGWRPYYYYGMDV
Sbjct: 99 DMGWGSGWRPYYYYGMDV 116

Example: displaying the best answer from each database in a free-of-charge review format.

Review the BLAST search results (cont.)

51

L9 ANSWER 21 OF 61 USGENE COPYRIGHT 2010 SEQUENCEBASE CORP on STN
TI Method of identifying binding site domains that retain the capacity of binding to an epitope (Patent)
DESC Homo Sapiens Protein; sequence 54 of 77
MTY Protein
SQL 127
SCORE 70 100% of query self score 70
BLASTALIGN

L9 ANSWER 25 OF 61 PCTGEN COPYRIGHT 2010 WIPO on STN
TI CROSS-SPECIES-SPECIFIC BINDING DOMAIN
MTY PRT
SQL 504
SCORE 70 100% of query self score 70
BLASTALIGN



Use the STN Express 8.4 Patent Family Manager wizard to display the results

52

Access the patent family manager wizard from the Discover! Menu.

Choose a bibliographic display format with alignment for the first (best) hit, and a free-of-charge format with alignment for the rest of the sequences in each patent family group.

http://www.stn-international.com/stn_express_pat_fam_manage.html


The patent family manager begins by organising the results using FSORT...

53

=> FSORT L9
. . . .
L10 61 FSO L9

11 Multi-record Families   Answers 1-60
   Family 1    Answers 1-18
   Family 2    Answers 19-20
   Family 3    Answers 21-22
   Family 4    Answers 23-25
   Family 5    Answers 26-27
   Family 6    Answers 28-29
   Family 7    Answers 30-34
   Family 8    Answers 35-42
   Family 9    Answers 43-56
   Family 10   Answers 57-58
   Family 11   Answers 59-60

1 Individual Record   Answer 61
0 Non-patent Records

In this example, 12 patent family groups (i.e. 11 + 1) are retrieved.

Commands in RED are those issued automatically by the STN Express Patent Family Manager.

FSORT organizes the patent sequence records by Publication, Application, Related, and Priority numbers.
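To illustrate what grouping by shared numbers means, the Python sketch below puts records into the same family when they share at least one publication, application, related or priority number, directly or through a chain of other records. This is a generic connected-components (union-find) approach chosen for illustration; it is not a description of how FSORT is implemented. The record ids "A", "B" and "C" are hypothetical labels; the patent numbers are taken from record displays in this presentation.

```python
from collections import defaultdict
from typing import Dict, List, Set


def group_families(records: Dict[str, Set[str]]) -> List[List[str]]:
    """Group record ids that share a patent number, directly or transitively."""
    parent = {rid: rid for rid in records}

    def find(rid: str) -> str:
        # Follow parent links to the group representative (with path halving).
        while parent[rid] != rid:
            parent[rid] = parent[parent[rid]]
            rid = parent[rid]
        return rid

    first_seen: Dict[str, str] = {}  # patent number -> first record that carried it
    for rid, numbers in records.items():
        for number in numbers:
            if number in first_seen:
                parent[find(rid)] = find(first_seen[number])
            else:
                first_seen[number] = rid

    families = defaultdict(list)
    for rid in records:
        families[find(rid)].append(rid)
    # Largest families first, then alphabetical, for a stable display order.
    return sorted((sorted(m) for m in families.values()), key=lambda m: (-len(m), m))


records = {
    "A": {"US 20050180979", "US 2004-778915"},                    # publication + application
    "B": {"US 2004-778915"},                                      # shares the application number
    "C": {"US 20070081993", "US 2004-554851", "WO 2004-EP5687"},  # no shared number
}
print(group_families(records))  # [['A', 'B'], ['C']]
```

In the terms used above, "A" and "B" would form one multi-record family and "C" would be an individual record.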

...and then continues by displaying the family groups in the specified formats

54

=> DIS L10 PFAM=4 1 BIB,PSL,SQL,SCORE,ALIGN

L10 ANSWER 23 OF 61 DGENE COPYRIGHT 2010 THOMSON REUTERS on STN
FAMILY 4
AN AEC16143 protein DGENE Full-text
TI Treating tumorous disease, such as breast, colon, prostate, liver, skin, ovarian, cervical and lung cancer, by administering a human immunoglobulin specifically binding to the human epithelial cell adhesion molecule (EpCAM) antigen.
IN Peters M; Locher M; Prang N; Quadt C
PA (MICR-N) MICROMET AG.
PI US 20050180979 A1 20050818 23
AI US 2004-778915 20040213
PRAI US 2004-778915 20040213
LA English
OS 2005-590351 [60]
DESC Human anti-EpCAM immunoglobulin heavy chain, SEQ ID NO: 1.
PSL Claim 12; SEQ ID NO 1
SQL 457
SCORE 70 100% of query self score 70
BLASTALIGN



Commands in RED are those issued automatically by the STN Express Patent Family Manager.

...and then continues by displaying selected results in the specified formats (cont.)

55

=> DIS L10 PFAM=4 2-TOT TRIAL,SCORE,ALIGN

L10 ANSWER 24 OF 61 USGENE COPYRIGHT 2010 SEQUENCEBASE CORP on STN
FAMILY 4
TI Anti-EpCam Immunoglobulins (PublishedApplication)
DESC Artificial Protein; Anti-EpCAM Heavy Chain; sequence 1 of 2
MTY Protein
SQL 457
SCORE 70 100% of query self score 70
BLASTALIGN

L10 ANSWER 25 OF 61 USGENE COPYRIGHT 2010 SEQUENCEBASE CORP on STN
FAMILY 4
TI Anti-EpCAM immunoglobulins (PublishedApplication)
MTY Protein
SQL 457
SCORE 70 100% of query self score 70
BLASTALIGN



These two USGENE hits are in the same family as the DGENE record on the previous slide.

...and then continues by displaying selected results in the specified formats (cont.)

56

=> DIS L10 61 BIB,PSL,SQL,SCORE,ALIGN

L10 ANSWER 61 OF 61 USGENE COPYRIGHT 2010 SEQUENCEBASE CORP on STN
AN 20070081993.104 Peptide USGENE Full-text
TI Pharmaceutical composition comprising a bispecific antibody for epcam (PublishedApplication)
IN Kufer Peter (Moosburg, DE); Berry Meera (Ulm, DE); Offner Sonja (Munich, DE); Brischwein Klaus (Munich, DE); Wolf Andreas (Planegg, DE)
PA NO ASSIGNEE AT PUBLICATION
PI US 20070081993 A1 20070412
AI US 2004-554851 20040526
RLI WO 2004-EP5687 20040526
DT Patent
SQL 18
SCORE 70 100% of query self score 70
BLASTALIGN



This USGENE record is the “individual record” in the FSORT answer set (L10).

Retrieve and identify any unique CAplus references found by the REGISTRY search


=> FILE HCAPLUS

=> TRA L10 PN
L11 TRANSFER L10 1- PN : 22 TERMS
L12 16 L11
ALL TERMS IN L11 RETRIEVED.

=> S L4
L13 14 L4

=> S L13 NOT L12
L14 1 L13 NOT L12

In this example, one additional relevant reference has been found by including a REGISTRY/CAplus search (L14).

Transfer Publication Numbers (PN) from DGENE, USGENE and PCTGEN (L10) to find corresponding HCAplus records (L12).

The 36 REGISTRY records (L4) correspond to 14 HCAplus records (L13).
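The step `S L13 NOT L12` is a set difference: it keeps the HCAplus records reached through REGISTRY that were not already reached through the transferred publication numbers. Here is a minimal Python sketch using the answer counts from the transcript. All accession numbers are hypothetical placeholders except 2008:1210508, and the 13-record overlap is simply what the counts imply.

```python
# L12: HCAplus records found via publication numbers transferred from
# DGENE, USGENE and PCTGEN (16 records; placeholder identifiers).
l12 = {f"placeholder-{i:02d}" for i in range(1, 17)}

# L13: HCAplus records corresponding to the REGISTRY hits (14 records).
# 13 of them overlap with L12; one does not.
l13 = {f"placeholder-{i:02d}" for i in range(1, 14)} | {"2008:1210508"}

# L14 = L13 NOT L12: references unique to the REGISTRY/CAplus route.
l14 = l13 - l12

print(len(l12), len(l13), sorted(l14))
```

The three patent sequence files and REGISTRY cover different authorities and sources, so the difference is not guaranteed to be empty. That is why the REGISTRY route is checked separately.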

Display any unique CAplus references retrieved in the REGISTRY search


=> D L14 BIB ABS HITRN

L14 ANSWER 1 OF 1 HCAPLUS COPYRIGHT 2010 ACS on STN
AN 2008:1210508 HCAPLUS Full-text
DN 149:446284
TI Cross-species bispecific single-chain antibodies to human or non-chimpanzee primate CD3e and surface antigen for treating tumorous, proliferative, or immunological disease and cancer
IN Ebert, Evelyn; Meier, Petra; Sriskandarajah, Mirnaalini; Burghart, Elke; Wissing, Sandra; Klinger, Matthias; Bluemel, Claudia; Raum, Tobias; Rau, Doris; Mangold, Susanne; Kvesic, Majk; Kischel, . . . . Hausmann, Susanne; Riethmueller, Gert
PA Micromet A.-G., Germany
SO PCT Int. Appl., 397pp.
CODEN: PIXXD2
DT Patent
LA English
FAN.CNT 3
PATENT NO. KIND DATE APPLICATION NO. DATE
--------------- ---- -------- -------------------- --------
PI WO 2008119565 A2 20081009 WO 2008-EP2662 20080403
. . . .

Retrieve, identify and display unique CAplus references from the REGISTRY search


AB The present invention provides polypeptides comprising an antibody binding domain capable of binding to an epitope of human and non-chimpanzee primate CD3 e-chain fused to a cell surface antigen selected from epidermal growth factor receptor, epidermal growth factor receptor variant III, melanoma chondroitin sulfate proteoglycan, carbonic anhydrase IX, CD30, CD33, CD44 variant 6, EpCAM, Her2/neu, MUC1, and IgE. An N-terminal 1-27 amino acid residues polypeptide fragment of the extracellular domain of human CD3 e-chain was identified which - in contrast to all other known epitopes of CD3 e - maintains its 3-dimensional structural integrity when taken out of its native environment in the CD3 complex. . . . . Further, the invention provides a method for the identification of polypeptides comprising a cross-species specific binding domain capable of binding to an epitope of human and non-chimpanzee primate CD3 e.

IT 1067695-29-7 1067695-33-3 1067695-35-5 1067695-37-7 1067695-39-9
RL: PRP (Properties); THU (Therapeutic use); BIOL (Biological study); USES (Uses)
(amino acid sequence; cross-species bispecific single-chain antibodies to human or non-chimpanzee primate CD3e and surface antigen for treating tumorous, proliferative, or immunol. disease and cancer)

It is important to include the HITRN or HITIND display for post-processing.
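When the HITRN display is post-processed outside STN Express, the hit CAS Registry Numbers can be pulled out of the IT line and sanity-checked with the standard CAS check-digit rule. In that rule, each digit before the final one is weighted by its position counting from the right, and the sum modulo 10 must equal the final digit. The Python sketch below shows this under stated assumptions: the function names are illustrative, and this is not how the STN Express tools work internally.

```python
import re

# A CAS Registry Number: 2-7 digits, 2 digits, 1 check digit.
CAS_RN = re.compile(r"(?<!\d)(\d{2,7})-(\d{2})-(\d)(?!\d)")

def cas_check_digit_ok(rn):
    """Return True if rn is well formed and its check digit is consistent."""
    m = CAS_RN.fullmatch(rn)
    if m is None:
        return False
    body = m.group(1) + m.group(2)
    # Weight digits 1, 2, 3, ... starting from the rightmost body digit.
    total = sum(int(d) * w for w, d in enumerate(reversed(body), start=1))
    return total % 10 == int(m.group(3))

def extract_hit_rns(it_line):
    """Collect the valid Registry Numbers found in an IT display line."""
    return [m.group(0) for m in CAS_RN.finditer(it_line)
            if cas_check_digit_ok(m.group(0))]

IT_LINE = "IT 1067695-29-7 1067695-33-3 1067695-35-5 1067695-37-7 1067695-39-9"
print(extract_hit_rns(IT_LINE))
```

All five Registry Numbers in the displayed record pass the check, which gives a quick way to catch numbers damaged by copy and paste.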

STN Express post-processing tools provide the finishing touches to the multi-file search

1) DGENE, USGENE and PCTGEN results (L10) can be conveniently tabulated using the STN Express Table Tool and exported to a Microsoft Excel worksheet

2) The REGISTRY “BLAST Report with Alignment data” tool merges BLAST alignments with corresponding unique CAplus records (L14) to form a single RTF file


1) DGENE, USGENE and PCTGEN results can be tabulated and exported to Excel

Preferred fields, fonts, labels, etc., can be saved as a Template for repeated re-use.

Once in Excel there are various options to sort, filter and review the multi-file results

Some tips for Microsoft Excel:
• Resize columns and rows as desired, especially the BLAST alignment column (to approx. 77)
• View, Freeze panes: holds the top row fixed when scrolling down
• Add Filters: provides a great way to navigate results, for example by BLAST score (above)
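The same filter-and-sort review can be scripted once the table has been saved as CSV. The Python sketch below assumes a CSV export with the hypothetical column names `Database`, `Accession`, `SQL` and `Score`. The first two rows use values from the displayed records, and the third row is an invented low-scoring hit added only to show the filter.

```python
import csv
import io

# Stand-in for an exported table; column names are assumptions.
EXPORT = """Database,Accession,SQL,Score
DGENE,AEC16143,457,70
USGENE,20070081993.104,18,70
PCTGEN,HYPOTHETICAL-1,120,42
"""

def filter_by_score(csv_text, min_score):
    """Keep rows with Score >= min_score, highest score first."""
    rows = csv.DictReader(io.StringIO(csv_text))
    kept = [row for row in rows if int(row["Score"]) >= min_score]
    # sorted() is stable, so ties keep their original table order.
    return sorted(kept, key=lambda row: -int(row["Score"]))

for row in filter_by_score(EXPORT, min_score=70):
    print(row["Database"], row["Accession"], row["SQL"], row["Score"])
```

For a real export, replace `io.StringIO(csv_text)` with an open file handle and adjust the column names to match the saved Template.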

2) REGISTRY BLAST Alignments can be merged with corresponding CAplus records

Preferred fields, fonts, labels, etc., can be saved as a Template for repeated re-use.

REGISTRY BLAST alignments are merged with the CAplus records in the transcript.

The REGISTRY “BLAST Report with Alignment data” tool provides an RTF file


Patent classifications for antibody topics


USCL Class   USCL Subclass   IPC8 Subclass   IPC8 Group   Topic
530          387.1+          C07K            16/00+       Immunoglobulins, Antibodies
530          388.1+          C12P            21/08+       Monoclonal Antibodies
530          391.1+          C07K            16/00+       Conjugated antibodies
424          1.49            A61K            49/00+       Compositions containing radio-labeled antibodies
424          130.1           A61K            39/395+      Compositions for body treatment containing antibodies (therapeutics, vaccines)
435          7.1+            G01N            33/53+       Immunoassays using antibodies
435          188             C12N            09/96+       Antibodies conjugated to enzymes
435          188.5           C07K            16/00+       Catalytic antibodies
435          325+            C12N            5/00+        Cells that express antibodies (fused, recombinant)
525          54.1            A61K            47/48+       Antibodies bound to resins

Derwent World Patents Index® (DWPISM) Manual Codes (/MC) for antibody topics


http://scientific.thomson.com/cgi-bin/mc/search.cgi

More than 120 Manual Codes (/MC) are available for antibody searching.


http://scientific.thomson.com/cgi-bin/mc/displaycode.cgi?code=B04-B04C7

Summary

• DGENE provides detailed annotations and indexing for text searching for antibody technologies

• REGISTRY provides extensive annotations, and common, trade, generic, and lab antibody names

• CAS controlled and index terms are all useful for retrieving antibody information in CAplus
– Use text terms to search for types of antibodies in CAplus

• Class (α (IgA), δ (IgD), ε (IgE), γ (IgG), μ (IgM))

• BLAST and SCM searches are available for antibody sequence searching in DGENE, USGENE, PCTGEN and REGISTRY on STN


Resources

• Archived e-Seminars: www.cas.org/support/stngen/stntraining/recorded.html
– Unmasking the World of Antibodies in CAS REGISTRY
– Finding Antibodies and Immunoglobulins
– Sequence motif searching on STN

• STN User Documentation: www.cas.org/support/stngen/stndoc/sequences.html
– Quick Reference Guides
• CAS REGISTRY: BLAST similarity searching via STN Express
• CAS REGISTRY: Exact and pattern searching of nucleic acid sequences
• CAS REGISTRY: Exact and pattern searching of protein sequences

• Sequence Searching on STN public workshop: www.stn-international.com/sequence_searching.html


Searching for antibody information on STN®

www.stn-international.com
