Supporting the Research Process The NaCTeM Text Mining Service William Black Informatics,...

Supporting the Research Process

The NaCTeM Text Mining Service

William Black

Informatics, Manchester

Contents

• What is Text Mining/What is NaCTeM?

• Approaches/Methods

• Text Mining Tasks

– IE, Argumentative Zoning, Terminology Discovery

• End-user services for researchers

• NaCTeM activities with social scientists

What is Text Mining?

• Knowledge discovery from textual sources

– Primary sources

• Documents, News, Web

– Scientific Literatures

• Using NLP, Ontologies, IR on a large scale

What is the Text Mining Centre? http://www.nactem.ac.uk

• Established in 2004 in response to a JISC/EPSRC/BBSRC initiative

• A Manchester and Liverpool collaboration

– Formerly also UMIST, Salford

– Accommodated in the Manchester Interdisciplinary Biocentre (MIB)

• Develop a variety of national services based on the application to biological sciences, with deployment from Autumn 2006

• Initially in biological sciences, with a second focus on social science during 2006-7

Text Mining - Approaches

• Distinguished from IR by semantic analysis leading to extraction of entities, facts, events, not mere documents.

• Distinguished from the Semantic Web by use of automated analysis based on robust natural language processing.

• A wide variety of methods and analyses ranging from domain-independent to domain-specific.

Methods of Text Mining

• Pipelined processes performing increasing levels of analysis common to all approaches

– Document structure analysis, tokenization, tagging, phrasal chunking, named entity recognition/classification, fact and event extraction.

– Indexed to provide conceptual IR services
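
The pipelined stages above can be illustrated with a minimal sketch. It is not NaCTeM's implementation: the gazetteer, the relation verb list and all function names are hypothetical stand-ins for the trained components a real service would use, and tagging and chunking are omitted for brevity.

```python
import re
from collections import defaultdict

# Hypothetical toy resources; a real pipeline would use trained models.
GAZETTEER = {"p53": "PROTEIN", "mdm2": "PROTEIN"}
RELATION_VERBS = {"activates", "inhibits", "binds"}


def tokenize(text):
    """Split text into word tokens, keeping internal digits and hyphens."""
    return re.findall(r"[A-Za-z0-9][A-Za-z0-9\-]*", text)


def recognise_entities(tokens):
    """Return (position, token, type) for every token found in the gazetteer."""
    return [
        (i, tok, GAZETTEER[tok.lower()])
        for i, tok in enumerate(tokens)
        if tok.lower() in GAZETTEER
    ]


def extract_facts(tokens, entities):
    """Emit (subject, verb, object) when a relation verb lies between two adjacent entities."""
    facts = []
    for (i, first, _), (j, second, _) in zip(entities, entities[1:]):
        verbs = [t.lower() for t in tokens[i + 1:j] if t.lower() in RELATION_VERBS]
        if verbs:
            facts.append((first, verbs[0], second))
    return facts


def run_pipeline(documents):
    """Run the stages in order and index document ids by extracted fact."""
    index = defaultdict(list)
    for doc_id, text in documents.items():
        tokens = tokenize(text)
        entities = recognise_entities(tokens)
        for fact in extract_facts(tokens, entities):
            index[fact].append(doc_id)
    return dict(index)
```

Each stage consumes the output of the previous one, and the final index maps facts rather than keywords to documents, which is the sense of "conceptual IR" on the slide.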

Sample text mining sub-tasks

• Named entity recognition and classification.

• Terminology discovery and ontology maintenance

• Information extraction (IE) in limited domains - for intelligence analysts and scientists

• Summarization - informative, tailored, multilingual, multi-document

• Open-domain IE and QA

• Association mining over databases of extracted facts.

Illustrations of IE on successive full-page screenshots

• Named entity phrase bracketing

• Named entity extraction

• Fact extraction and slot filling

• An application to a research literature

Terminology Discovery - Ananiadou, NaCTeM

• A form of unsupervised learning, whose only required resource is a general purpose PoS tagger.

• Can be applied to text in any language, domain or genre to reveal terminology on the basis of phrasehood and distribution.

• TerMine will be among the first deployed NaCTeM tools.
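
As a rough illustration of term discovery from phrasehood and distribution, the sketch below collects adjective/noun sequences ending in a noun from PoS-tagged text and ranks them by raw frequency. This is an assumption-laden simplification, not TerMine's actual scoring: the tag names (`ADJ`, `NOUN`), the function names and the plain frequency ranking are all illustrative choices.

```python
from collections import Counter


def candidate_terms(tagged_tokens, min_length=2):
    """Collect maximal adjective/noun runs that end in a noun (phrasehood filter)."""
    candidates, run = [], []
    # A sentinel flushes the final run at the end of the sentence.
    for word, tag in list(tagged_tokens) + [("", "END")]:
        if tag in ("ADJ", "NOUN"):
            run.append((word.lower(), tag))
            continue
        # Trim trailing adjectives so the phrase ends in a noun head.
        while run and run[-1][1] != "NOUN":
            run.pop()
        if len(run) >= min_length:
            candidates.append(" ".join(w for w, _ in run))
        run = []
    return candidates


def rank_terms(tagged_sentences):
    """Rank candidate terms by how often they occur (distribution)."""
    counts = Counter()
    for sentence in tagged_sentences:
        counts.update(candidate_terms(sentence))
    return counts.most_common()
```

Because the only input is PoS-tagged text, the same code runs unchanged on any domain or language for which a tagger with comparable tags is available.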


Argumentative Zoning - Simone Teufel, Cambridge Computing Lab

• BKG: General scientific background (yellow)

• OTH: Neutral descriptions of others' work (orange)

• OWN: Neutral descriptions of own, new work (blue)

• AIM: Statements of the particular aim of the current paper (pink)

• TXT: Statements of the textual organisation of the current paper (red)

• CTR: Contrastive or comparative statements, including explicit mention of weaknesses of other work (green)

• BAS: Statements that own work is based on other work (purple)
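
The zone scheme can be represented as a small data structure. The sketch below assumes sentences have already been labelled by some classifier (not shown); the `Zone` enum and `group_by_zone` are hypothetical names introduced for illustration.

```python
from collections import defaultdict
from enum import Enum


class Zone(Enum):
    """The seven zone labels, with short glosses."""
    BKG = "general scientific background"
    OTH = "neutral description of others' work"
    OWN = "neutral description of own, new work"
    AIM = "aim of the current paper"
    TXT = "textual organisation of the current paper"
    CTR = "contrast or comparison with other work"
    BAS = "own work based on other work"


def group_by_zone(labelled_sentences):
    """Group (sentence, label) pairs by zone; unknown labels raise KeyError."""
    grouped = defaultdict(list)
    for sentence, label in labelled_sentences:
        grouped[Zone[label]].append(sentence)
    return dict(grouped)
```

Grouping by zone lets a reader or a downstream tool pull out, for example, only the AIM and CTR sentences of a paper.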

Argumentative Zoning Example

End-user services based on full NLP and conceptual indexing

• Two conceptual IR services based on prior full-scale NLP analysis of Medline at Tsujii Lab, University of Tokyo

– InfoPubMed: A complex tool supporting a research workflow for literature review and knowledge discovery/hypothesis generation

– Medie: A simple IR interface as intuitive as Google, but returning fact-bearing sentences, which are more than document surrogates.
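
To make "returning fact-bearing sentences" concrete, here is a minimal sketch of retrieval over pre-extracted facts. It does not reproduce Medie's interface or index: the `Fact` record, the `search` function and the sample document ids are assumptions for illustration only.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    """One extracted fact together with the sentence that expresses it."""
    subject: str
    verb: str
    obj: str
    sentence: str
    doc_id: str


def search(facts, subject=None, verb=None, obj=None):
    """Return (doc_id, sentence) for facts matching every slot that is given."""
    def matches(value, wanted):
        # A slot left as None acts as a wildcard.
        return wanted is None or value.lower() == wanted.lower()

    return [
        (f.doc_id, f.sentence)
        for f in facts
        if matches(f.subject, subject) and matches(f.verb, verb) and matches(f.obj, obj)
    ]
```

A query such as `search(facts, verb="inhibits", obj="p53")` returns the sentences stating what inhibits p53, rather than a list of documents that merely mention both words.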

Gene/gene products you are interested in

Fields

By clicking this button, you can restrict search fields.

By clicking this button, you can restrict species.

GeneBoxes

Drag this GeneBox to the Interaction Viewer

Drag this InteractionBox to the ContentViewer

Sentence Box

Property which means the co-occurrence in the sentence is direct evidence of interaction

Property which means the co-occurrence in the sentence is a mere co-occurrence

Possible end-user service based on AZ

More than Google’s PageRank™, because the links are typed.
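
One way to picture typed links is a citation graph whose edges carry a label. The sketch below assumes the link types are zone labels such as BAS and CTR; that mapping and the function name `typed_in_links` are illustrative assumptions, not a description of an existing service.

```python
from collections import Counter, defaultdict


def typed_in_links(citations):
    """Count incoming citations per paper, broken down by link type."""
    profile = defaultdict(Counter)
    for citing, cited, link_type in citations:
        profile[cited][link_type] += 1
    return {paper: dict(counts) for paper, counts in profile.items()}
```

An untyped link count would only say that a paper is cited three times; the typed profile distinguishes two papers that build on it from one that criticises it.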

NaCTeM and Social Science/Humanities

• In Year 3 (from Oct 2006), develop pilot service aimed at social science.

• Local links with NCESS

• Preparatory invited workshop held in May 2006.

• Text-mining and Digitised C19th Research Resources Workshop with British Library

Workshop on Text Mining in Social Sciences

Presentations available at NaCTeM Web page

– Bridging qualitative and quantitative methods for social sciences using text mining techniques (Sophia Ananiadou)

– Text Mining Activities at the National Centre (Sophia Ananiadou, Jun-ichi Tsujii, Paul Watry)

– Smart Qualitative Data: Methods and Community Tools for Data Mark-Up SQUAD (Louise Corti)

– Author Identification (Katerina T. Frantzi)

– Sentiment Analysis and Financial Grids (Lee Gillam)

– Concordances and semi-automatic coding in qualitative analysis: possibilities and barriers (Graham R. Gibbs)

– Bridging quantitative and qualitative methods for social sciences using text mining techniques (Tetsuya Nasukawa)

– Computer-Assisted Content Analysis (Andrew Wilson)

NaCTeM status

• NaCTeM is almost at the end of its tool development phase

• Moving to deployment of services this Autumn

• Will include domain-independent terminology management from the outset

• Other applications of interest to social science researchers will be appearing approx. 1 year from now.
