2.6 Data Capture and Analysis Tools MET and CAI

Alexander Nevyjel, Head, Content Management Group, INIS & NKM Section, International Atomic Energy Agency. 35th Consultative Meeting of INIS Liaison Officers (35th ILO Meeting), 28-29 October 2010, Vienna, Austria.


Page 1: 2.6 Data Capture and Analysis Tools MET and CAI

28-29 Oct 2010 35th ILO Meeting 1

International Atomic Energy Agency

2.6 Data Capture and Analysis Tools

MET and CAI

Alexander Nevyjel, Head, Content Management Group

INIS & NKM Section

35th Consultative Meeting of INIS Liaison Officers, 28-29 October 2010, Vienna, Austria

Page 2: Data Capture and Analysis Tools

• MET – Metadata Extraction Tool

• CAI – Computer-assisted Indexing

  • CAI batch

  • CAI online

Page 3: Metadata Extraction Tool – MET

Objective: to automate INIS record creation from electronic documents in PDF format

• Capture text from the original full text (PDF)

• Reformat content according to INIS input rules

• Verify against INIS authorities

• Produce bibliographic files in TTF and/or XML

• Export bibliographic files plus PDFs
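The five MET steps above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not MET itself:

- It starts from text already extracted from the PDF. Real extraction would need a PDF library.
- It assumes the first line is the title and the second line is a ';'-separated author list.
- It uses a tiny hypothetical language authority list (`LANGUAGE_AUTHORITY`) in place of the real INIS authority files.
- It writes hypothetical TI/AU/LA tags rather than the actual INIS TTF tags.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Hypothetical stand-in for an INIS authority file (language name -> code).
LANGUAGE_AUTHORITY = {"english": "EN", "french": "FR", "german": "DE"}


@dataclass
class Record:
    title: str = ""
    authors: list = field(default_factory=list)
    language: str = ""
    errors: list = field(default_factory=list)


def capture(text: str) -> Record:
    """Step 1 (assumed layout): title on line 1, authors on line 2, 'Language: X' anywhere."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    rec = Record()
    if lines:
        rec.title = lines[0]
    if len(lines) > 1:
        rec.authors = [a.strip() for a in lines[1].split(";") if a.strip()]
    match = re.search(r"Language:\s*(\w+)", text, re.IGNORECASE)
    if match:
        rec.language = match.group(1).lower()
    return rec


def invert_name(name: str) -> str:
    """'Alexander Nevyjel' -> 'Nevyjel, A.' (a simplified, hypothetical input rule)."""
    parts = name.split()
    if len(parts) < 2:
        return name
    initials = " ".join(p[0] + "." for p in parts[:-1])
    return f"{parts[-1]}, {initials}"


def reformat(rec: Record) -> None:
    """Step 2: normalise whitespace in the title and the order of author names."""
    rec.title = " ".join(rec.title.split())
    rec.authors = [invert_name(a) for a in rec.authors]


def verify(rec: Record) -> None:
    """Step 3: replace the language name by its authority code, or flag it as an error."""
    code = LANGUAGE_AUTHORITY.get(rec.language)
    if code is None:
        rec.errors.append(f"language not in authority list: {rec.language!r}")
    else:
        rec.language = code


def to_ttf(rec: Record) -> str:
    """Step 4a: tagged-text output; the TI/AU/LA tags are hypothetical."""
    lines = [f"TI: {rec.title}"] + [f"AU: {a}" for a in rec.authors] + [f"LA: {rec.language}"]
    return "\n".join(lines)


def to_xml(rec: Record) -> str:
    """Step 4b: the same record as XML (element names are hypothetical)."""
    root = ET.Element("record")
    ET.SubElement(root, "title").text = rec.title
    for author in rec.authors:
        ET.SubElement(root, "author").text = author
    ET.SubElement(root, "language").text = rec.language
    return ET.tostring(root, encoding="unicode")


def met(text: str):
    """Run capture, reformat and verify, then produce both output formats."""
    rec = capture(text)
    reformat(rec)
    verify(rec)
    return rec, to_ttf(rec), to_xml(rec)


if __name__ == "__main__":
    sample = "Data Capture and  Analysis Tools\nAlexander Nevyjel\nLanguage: English\n"
    rec, ttf, xml = met(sample)
    print(ttf)
```

The last step, exporting the bibliographic files together with the PDFs, is plain file handling and is left out of the sketch. Records whose `errors` list is non-empty would be held back for manual correction.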

Page 4: MET – Milestones

• Prototype development: October 2007

• MET version 1.0 final acceptance: December 2007

• Specifications version 2: December 2007

• MET version 2 final acceptance: August 2008

• Specifications versions 3.0, 3.1, 4.0: October 2008

• Development of version 3.0: Aug-Dec 2009

• MET version 3.0 final acceptance: December 2009

• Development of version 3.1: Sep-Dec 2010

• Version 4 (for Member States): planned for 2011

Page 5: MET – Implementation Planning

• MET version 2.0: in operation from August 2008

• MET version 3.0: functionality improvements; in operation from December 2009

• MET version 3.1: XML compliance; in development, planned for 2010

• MET version 4.0: remote use by and/or distribution to Member States; planned for 2011/2012, subject to available resources

Page 6: Metadata Extraction Tool – MET

• Five staff members have been working with MET since January 2008

• More than 9,000 records produced and entered into the database

• Significant improvements in handling; data capture made easier

• Basic set of validation rules implemented; final check performed by FIBRE and the production system

• Advanced validation rules (FIBRE level) planned for phases 3 and 4

• Availability for Member States: once version 4 is completed and consolidated (expected by the end of 2012)
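A basic validation layer of this kind can be pictured as a list of independent rules applied to each record before the final FIBRE and production-system checks. The rules below are hypothetical examples and not the actual MET rule set: mandatory title, author and year fields, and a plausible publication year. The field names are assumptions as well.

```python
import re
from typing import Callable, Dict, List, Optional

# A record is modelled as a plain field -> value mapping; field names are hypothetical.
Record = Dict[str, str]
Rule = Callable[[Record], Optional[str]]


def require(field_name: str) -> Rule:
    """Build a rule that fails when a mandatory field is missing or blank."""
    def rule(rec: Record) -> Optional[str]:
        if not rec.get(field_name, "").strip():
            return f"missing mandatory field: {field_name}"
        return None
    return rule


def year_is_plausible(rec: Record) -> Optional[str]:
    """Fail when a year is present but is not a four-digit 19xx or 20xx year."""
    year = rec.get("year", "")
    if year and not re.fullmatch(r"(19|20)\d\d", year):
        return f"implausible publication year: {year!r}"
    return None


BASIC_RULES: List[Rule] = [require("title"), require("author"), require("year"), year_is_plausible]


def validate(rec: Record, rules: List[Rule] = BASIC_RULES) -> List[str]:
    """Run every rule and collect all messages, so the operator sees every problem at once."""
    messages = []
    for rule in rules:
        message = rule(rec)
        if message is not None:
            messages.append(message)
    return messages


print(validate({"title": "MET", "year": "10"}))
```

Keeping each rule as a separate function would make it straightforward to append further, FIBRE-level rules later without touching the existing ones.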

Page 7: CAI for Member States

• CAI batch (since 2005)

  • Member States (MS) prepare FIBRE file (without descriptors)

  • MS send file to INIS

  • INIS processes file with CAI batch

  • File sent back to MS

  • Review process in FIBRE

  • Final file sent to INIS for the production system

• CAI online (since 2008)

  • MS prepare FIBRE file (without descriptors)

  • MS send file to INIS

  • INIS loads file on CAI online

  • Review process online

  • File exported to the production system
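The two procedures differ mainly in how often the file changes hands. The sketch below encodes each list as (actor, step) pairs and counts the transfers between Member States and INIS. The step wording is paraphrased from the slide. The actor assigned to the review and export steps is an assumption, since the slide does not state it.

```python
from enum import Enum


class Actor(Enum):
    MS = "Member State"
    INIS = "INIS"


# Steps paraphrased from the slide. The actors for the review and export
# steps are assumed; the slide does not say who performs them.
CAI_BATCH = [
    (Actor.MS, "prepare FIBRE file without descriptors"),
    (Actor.MS, "send file to INIS"),
    (Actor.INIS, "process file with CAI batch"),
    (Actor.INIS, "send file back to MS"),
    (Actor.MS, "review in FIBRE"),
    (Actor.MS, "send final file to INIS for the production system"),
]

CAI_ONLINE = [
    (Actor.MS, "prepare FIBRE file without descriptors"),
    (Actor.MS, "send file to INIS"),
    (Actor.INIS, "load file on CAI online"),
    (Actor.MS, "review online"),
    (Actor.INIS, "export file to production system"),
]


def file_transfers(workflow):
    """Count the steps in which a file travels between a Member State and INIS."""
    return sum(1 for _, action in workflow if action.startswith("send"))


def describe(workflow):
    """Number the steps and name the actor responsible for each one."""
    return [f"{i}. {actor.value}: {action}" for i, (actor, action) in enumerate(workflow, start=1)]


for line in describe(CAI_ONLINE):
    print(line)
print("transfers:", file_transfers(CAI_BATCH), "batch vs", file_transfers(CAI_ONLINE), "online")
```

Under these assumptions the batch route needs three file transfers and the online route only one, because review happens directly on the INIS side.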

Page 8: CAI Processing – Reviewing Process

• Delete all suggested descriptors that are too general

• Add relevant descriptors that were not found, e.g.:

  • numerical values, such as pressure ranges, temperature ranges, etc.

  • nuclear reactions

  • chemical compounds, alloys, etc.

• Clean up BT/NTs (broader/narrower terms) resulting from manual additions

• Clean up suggestions caused by homographic terms
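The review steps above are decisions made by a human indexer, but their combined effect on a descriptor list can be sketched as follows. The descriptor names and the thesaurus excerpt (`broader_terms`) are hypothetical. The BT/NT clean-up is interpreted here, as an assumption, as dropping a descriptor that is a broader term of a more specific descriptor added manually.

```python
def review(suggested, too_general, manual_additions, broader_terms, homograph_hits):
    """Apply the four review steps in slide order.

    suggested:        descriptors proposed by CAI (order is kept)
    too_general:      suggestions the reviewer judges too general
    manual_additions: relevant descriptors CAI did not find
    broader_terms:    hypothetical thesaurus excerpt, descriptor -> set of its BTs
    homograph_hits:   suggestions triggered by a homographic word in the text
    """
    # 1. Delete suggested descriptors that are too general.
    kept = [d for d in suggested if d not in too_general]
    # 2. Add relevant descriptors that CAI did not find.
    for d in manual_additions:
        if d not in kept:
            kept.append(d)
    # 3. Assumed BT/NT clean-up: drop broader terms of manually added descriptors.
    redundant = set()
    for d in manual_additions:
        redundant |= broader_terms.get(d, set())
    kept = [d for d in kept if d not in redundant]
    # 4. Remove suggestions caused by homographic terms.
    return [d for d in kept if d not in homograph_hits]


result = review(
    suggested=["REACTORS", "LEAD", "STEELS", "PRESSURE VESSELS"],
    too_general={"REACTORS"},
    manual_additions=["STAINLESS STEELS", "TEMPERATURE RANGE 0400-1000 K"],
    broader_terms={"STAINLESS STEELS": {"STEELS"}},
    homograph_hits={"LEAD"},  # e.g. 'lead' used as a verb, not the metal
)
print(result)  # ['PRESSURE VESSELS', 'STAINLESS STEELS', 'TEMPERATURE RANGE 0400-1000 K']
```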

Page 9: CAI Batch Processing Statistics, 2005 – Sep 2010

Country      2005     2006     2007     2008     2009 2010/1-9    Total

AR            141        4       53        -       10        -      208

AU            224        -        -        -        -        -      224

BG             32        -      199      151       79       54      515

CN            299     2319     2314     2959     3883     3009    14783

DE            363      644     1019      879      744      762     4411

ET              -        -    13017     9186     6551     5559    34313

FR            138      721        -        -        -        -      859

JP             11        -        -       32       10      121      174

LT              -       39       69        -        -        -      108

MY            133      270      205      112       96        -      816

US              -       97       46        -        -        -      143

UZ            359      396       43        -      179        -      977

VN              8       16        -       83       94      114      315

others        306      105        -        -      244      112      767

Total        2014     4611    16965    13402    11890     9731    58613
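Because several cells in this table are empty, reconstructing it means deciding which year each number belongs to. The short check below recomputes every row total, every year total and the grand total from the per-year placement, and compares them with the totals reported on the slide. An empty result means the placement is consistent. The counts are copied from the table; the code is only an illustration.

```python
YEARS = ["2005", "2006", "2007", "2008", "2009", "2010/1-9"]

# Per-year counts (None = empty cell) and the reported row total, copied from the table.
BATCH = {
    "AR": ([141, 4, 53, None, 10, None], 208),
    "AU": ([224, None, None, None, None, None], 224),
    "BG": ([32, None, 199, 151, 79, 54], 515),
    "CN": ([299, 2319, 2314, 2959, 3883, 3009], 14783),
    "DE": ([363, 644, 1019, 879, 744, 762], 4411),
    "ET": ([None, None, 13017, 9186, 6551, 5559], 34313),
    "FR": ([138, 721, None, None, None, None], 859),
    "JP": ([11, None, None, 32, 10, 121], 174),
    "LT": ([None, 39, 69, None, None, None], 108),
    "MY": ([133, 270, 205, 112, 96, None], 816),
    "US": ([None, 97, 46, None, None, None], 143),
    "UZ": ([359, 396, 43, None, 179, None], 977),
    "VN": ([8, 16, None, 83, 94, 114], 315),
    "others": ([306, 105, None, None, 244, 112], 767),
}
REPORTED_YEAR_TOTALS = [2014, 4611, 16965, 13402, 11890, 9731]
REPORTED_GRAND_TOTAL = 58613


def check_totals(table, year_totals, grand_total):
    """Return a list of mismatches between recomputed and reported totals."""
    problems = []
    for code, (counts, row_total) in table.items():
        if sum(c or 0 for c in counts) != row_total:
            problems.append(f"row {code}")
    column_sums = [sum(counts[i] or 0 for counts, _ in table.values())
                   for i in range(len(year_totals))]
    for year, computed, reported in zip(YEARS, column_sums, year_totals):
        if computed != reported:
            problems.append(f"column {year}")
    if sum(column_sums) != grand_total:
        problems.append("grand total")
    return problems


print(check_totals(BATCH, REPORTED_YEAR_TOTALS, REPORTED_GRAND_TOTAL))  # []
```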

Page 10: CAI Online Processing Statistics, 2008 – Sep 2010

Country      2008     2009 2010/1-9    Total

AR             36      451      159      646

AU              -        -        -        0

BG              -       10       76       86

BR            314       84      544      942

CU              -       10       63       73

CZ              -      175        -      175

EG              -       16        -       16

JP             59      516      166      741

MX              -       12      136      148

NL              -       69        -       69

NO              -      113       27      140

UY             65       96      119      280

UZ              -       54       80      134

others          -      117        -      117

Total         474     1723     1370     3567
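The bar chart on the following slide orders the countries by their total for 2008 to Sep 2010. The sketch below recomputes the row totals from this table and sorts them, which reproduces that order. AU, which has a total of 0, is omitted. The rule of keeping "others" last is inferred from the chart and is not stated on the slide.

```python
# Counts for 2008, 2009 and 2010/1-9 (None = empty cell), copied from the table.
ONLINE = {
    "AR": (36, 451, 159),
    "AU": (None, None, None),
    "BG": (None, 10, 76),
    "BR": (314, 84, 544),
    "CU": (None, 10, 63),
    "CZ": (None, 175, None),
    "EG": (None, 16, None),
    "JP": (59, 516, 166),
    "MX": (None, 12, 136),
    "NL": (None, 69, None),
    "NO": (None, 113, 27),
    "UY": (65, 96, 119),
    "UZ": (None, 54, 80),
    "others": (None, 117, None),
}


def totals(table):
    """Row totals, treating empty cells as zero."""
    return {code: sum(c or 0 for c in counts) for code, counts in table.items()}


def chart_order(table):
    """Countries with a non-zero total, largest first, with 'others' kept last."""
    t = totals(table)
    countries = [c for c in t if c != "others" and t[c] > 0]
    countries.sort(key=lambda c: t[c], reverse=True)
    return countries + (["others"] if t.get("others") else [])


print(chart_order(ONLINE))
# ['BR', 'JP', 'AR', 'UY', 'CZ', 'MX', 'NO', 'UZ', 'BG', 'CU', 'NL', 'EG', 'others']
print(sum(totals(ONLINE).values()))  # 3567
```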

Page 11: CAI Online Processing Statistics, 2008 – Sep 2010

[Bar chart: CAI online processing per country (BR, JP, AR, UY, CZ, MX, NO, UZ, BG, CU, NL, EG, others), with bars for 2008, 2009 and 2010/1-9; vertical axis from 0 to 600]

Page 12: CAI for Member States

• CAI batch: China, Germany, Uzbekistan, Malaysia, Bulgaria, Viet Nam, ETDE

• CAI online: Argentina, Brazil, Bulgaria, Cuba, Czech Republic, Iran, Japan, Mexico, Switzerland, Uruguay, Uzbekistan

CAI online and CAI batch are now regular services for Member States.