TEXT MINING: THE NEXT DATA FRONTIER · 2016-05-11 · To document and classify text mining...
TEXT MINING: THE NEXT DATA FRONTIER
An Infrastructural Approach
@openminted_eu
Dr. Petr Knoth, CORE (core.ac.uk)
Knowledge Media Institute, The Open University, United Kingdom
OpenMinTeD: establish an open and sustainable Text and Data Mining (TDM) platform and infrastructure where researchers can collaboratively create, discover, share and re-use knowledge from a wide range of text-based scientific and scholarly sources.
Beyond Open Access: making sense of large volumes of scientific content
The phases of text mining
[Figure: four stages of text mining, covering Information Retrieval, NLP Analysis / Entity Recognition, Information Extraction, and Data Mining / Knowledge Discovery]
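The four stages can be illustrated with a deliberately tiny, in-memory sketch. The corpus, the entity dictionary and every function name below are hypothetical, and the stage order (retrieval, then analysis and entity recognition, then extraction, then mining) is an assumption about how the figure reads. Keyword matching, dictionary lookup and sentence-level co-occurrence stand in for the much richer components a real TDM platform would chain together.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus standing in for full texts obtained from content providers.
CORPUS = {
    "doc1": "Aspirin reduces fever. Aspirin inhibits COX-1.",
    "doc2": "In children, aspirin reduces fever. Ibuprofen inhibits COX-1.",
    "doc3": "Wheat yields depend on rainfall.",
}

# Hypothetical dictionary used by the entity recognition stage (lower-cased keys).
ENTITIES = {"aspirin": "DRUG", "ibuprofen": "DRUG", "cox-1": "PROTEIN", "fever": "SYMPTOM"}

TOKEN = re.compile(r"[\w-]+")


def tokens(text):
    """Lower-cased word tokens; hyphenated names such as COX-1 stay whole."""
    return TOKEN.findall(text.lower())


def retrieve(corpus, query):
    """Stage 1, information retrieval: keep documents containing any query term."""
    terms = set(tokens(query))
    return {doc_id: text for doc_id, text in corpus.items() if terms & set(tokens(text))}


def recognise(text):
    """Stage 2, NLP analysis and entity recognition: split sentences, tag known entities."""
    sentences = [s for s in re.split(r"(?<=\.)\s+", text) if s]
    return [[(tok, ENTITIES[tok]) for tok in tokens(s) if tok in ENTITIES] for s in sentences]


def extract(tagged_sentences):
    """Stage 3, information extraction: entity pairs that co-occur in one sentence."""
    pairs = []
    for tagged in tagged_sentences:
        names = sorted({name for name, _ in tagged})
        pairs.extend(combinations(names, 2))
    return pairs


def mine(corpus, query):
    """Stage 4, data mining: count co-occurrence pairs across the retrieved documents."""
    counts = Counter()
    for text in retrieve(corpus, query).values():
        counts.update(extract(recognise(text)))
    return counts


print(mine(CORPUS, "fever cox-1").most_common(3))
```

Each stage consumes the previous stage's output, which is exactly where interoperability matters: swapping the dictionary tagger for a trained model only works if it produces the same kind of annotations.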
OPENMINTED -The Open Mining Infrastructure for Text and Data
TDM challenges for researchers
1. Content challenges
- Barriers and obstacles due to non-availability, technical restrictions, copyright law or licensing issues
- No uniform way to search for, retrieve and access content for TDM
2. Services challenges
- How do I identify the most suitable TDM service?
- How do I combine it with other TDM services I have access to?
- How do I use them on my content?
3. Processing challenges
- Where should I deploy? Are my machines powerful enough?
- How can I get access to more powerful machines?
- Where should I store intermediate and final results?
- How can I ensure persistent storage?
OpenMinTeD provides solutions
An open and sustainable TDM infrastructure where researchers can collaboratively create, discover, share and re-use knowledge from a wide range of text-based scientific sources.
OpenMinTeD: working on many fronts
- Accessible content: via standardised programmatic interfaces
- Discoverable services: well-documented, easily discoverable text mining services and workflows which process, analyse and annotate text
- Efficient processing: operate on public e-Infrastructures via standardised APIs
- Research communities: different scientific communities have different challenges
- Value-added apps: community-driven applications to illustrate the value of the infrastructure; engage with industry
The project
- Started: June 2015
- Duration: 3 years
- Budget: €6 million
- Grant: €5.3 million
- 16 partners:
  - 6 mining research groups
  - 3 content providers
  - 1 data center
  - 1 library association
  - 2 legal experts
  - 6 community-related partners
  - 2 SMEs
PARTNERS
Athena RIC, Univ. of Manchester (NaCTeM), Univ. of Darmstadt, INRA, EMBL-EBI, Agro-Know, LIBER, Univ. of Amsterdam, Open University UK (CORE), EPFL, CNIO, Univ. of Sheffield (GATE), GESIS, GRNET, Frontiers, Univ. of Stirling
[Figure: The OpenMinTeD landscape]
Infrastructural approach
OpenMinTeD does not build new services; it adopts and adapts existing services for new communities.
OpenMinTeD focuses on interoperability across text mining services and content provision outlets.
OpenMinTeD creates an open and collaborative space where researchers can use the best-fitting text mining services available, building on the cloud computing philosophy.
[Figure: OpenMinTeD architecture. Shown: users (researchers, curators, text-miners and new services developers); platform services (Registry, Workflow Management, Auth2 & Policy management, Annotator, Accounting); a language resources and corpora registry service; and three interoperability layers:
Layer 1: Interoperability of text mining services (platforms or components)
Layer 2: Interoperability of language resources & corpora
Layer 3: Interoperability to shared storage and computing resources
Components shown include mining platforms (some with proprietary architectures), language resources, publisher, OpenAIRE/CORE, PMC and other text corpora, and data centres, including in the public cloud.]
Overview
Interoperability framework
Bringing together mining tools, resources and content
1. Content metadata & transfer standards
To document scientific literature, language resources, taxonomies and provenance, and to define transfer protocols for full-text retrieval.
2. Service metadata & pipelining
To document and classify text mining services: how they receive input, in what form they output their results, how they combine into workflows, and what granularity to consider.
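A minimal sketch of why input and output descriptions matter for pipelining, assuming each service is described only by a name, an input format and an output format. The `ServiceMetadata` class, the `check_workflow` function and the format labels are hypothetical illustrations, not the OpenMinTeD metadata schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceMetadata:
    """Hypothetical minimal service description; not the OpenMinTeD metadata schema."""
    name: str
    input_format: str   # illustrative format labels, not a real vocabulary
    output_format: str


def check_workflow(steps):
    """List incompatibilities between consecutive services in a proposed workflow."""
    problems = []
    for upstream, downstream in zip(steps, steps[1:]):
        if upstream.output_format != downstream.input_format:
            problems.append(
                f"{upstream.name} outputs {upstream.output_format!r}, "
                f"but {downstream.name} expects {downstream.input_format!r}"
            )
    return problems


# Hypothetical services described with the metadata above.
tokenizer = ServiceMetadata("tokenizer", "plain-text", "tokens")
tagger = ServiceMetadata("entity-tagger", "tokens", "entities")
indexer = ServiceMetadata("indexer", "json", "search-index")

print(check_workflow([tokenizer, tagger]))           # compatible chain
print(check_workflow([tokenizer, tagger, indexer]))  # tagger output does not match indexer input
```

Exact string equality is the simplest possible compatibility rule; a real framework would more likely rely on shared annotation types and converters between formats.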
3. IPR and licensing
To study IPR restrictions; to describe license metadata governing the re-use of content and of TDM services and tools; and to provide information on how to apply for academic and non-commercial mining research.
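One way license metadata could be used is to summarise what a whole workflow may do, given the licenses of every resource it combines. The sketch below assumes a heavily simplified model: two boolean flags per resource and a "most restrictive wins" rule. The `LicenseInfo` class, `workflow_permissions` function and resource names are hypothetical; real license compatibility depends on the exact terms and is not captured by two flags.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LicenseInfo:
    """Hypothetical, heavily simplified license metadata attached to a resource."""
    resource: str
    license_id: str        # license label, e.g. an SPDX identifier such as "CC-BY-4.0"
    tdm_allowed: bool      # may the resource be used for text and data mining at all?
    commercial_use: bool   # may derived results be used commercially?


def workflow_permissions(resources):
    """Most-restrictive-wins summary for a workflow that combines these resources.

    A deliberate simplification: real license compatibility depends on the exact terms.
    """
    return {
        "tdm_allowed": all(r.tdm_allowed for r in resources),
        "commercial_use": all(r.commercial_use for r in resources),
        "blocking": sorted(r.resource for r in resources if not r.tdm_allowed),
    }


# Hypothetical resources: an openly licensed corpus and a non-commercial tagger.
corpus = LicenseInfo("example-corpus", "CC-BY-4.0", tdm_allowed=True, commercial_use=True)
tagger = LicenseInfo("example-tagger", "CC-BY-NC-4.0", tdm_allowed=True, commercial_use=False)

print(workflow_permissions([corpus, tagger]))
```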
OpenMinTeD users
1. End users
- Researchers, database curators, …
- Novice: use services to advance their science
- Advanced: combine TDM services into complex workflows
2. Content and service providers
- Publishers, libraries, scientific database centres, …
- TDM researchers
- SMEs
Bottom-up approach
OpenMinTeD works with four use cases (research analytics, social sciences, agriculture and life sciences), which provide requirements and evaluate the results.
OpenMinTeD use case 1: Scholarly communication analytics
• Semantic search and discovery of open scientific outcomes
• Map of academia: scholarly communication network
• Research monitoring and analytics
Partners: CORE/OU, OpenAIRE/ARC, Frontiers
OpenMinTeD use case 2: Life sciences
• Assisted curation of the EMBL-EBI chemical databases for metabolomics
• Curation of the neuroscience resources KnowledgeBase and NeuroLex
Partners: EBI (Metabolomics), Human Brain Project
OpenMinTeD use case 3: Agriculture and biodiversity
• Enrich agricultural databases to assist food- and water-borne disease outbreak alerts and product recalls
• Image, figure and dataset discovery in AGRIS
Partners: INRA, Agro-Know
OpenMinTeD use case 4: Social sciences
• Develop and evaluate methods for the automatic detection and linking of named entities, citation traces and intentions in social science publications
Partners: GESIS
What can OpenMinTeD do for you?
Are you a content provider?
Make your content available for mining: register your collections in the OpenMinTeD registry and let others discover them.
Are you a TDM service provider?
Share and collaborate with other TDM services: register your TDM service in the OpenMinTeD registry and let others discover it.
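The register-then-discover idea for both content collections and TDM services can be sketched as a toy in-memory registry. Everything below (`RegistryEntry`, `Registry`, the keyword-based `discover` method and the example entry names) is hypothetical and does not model the actual OpenMinTeD registry or its API; it only shows how shared descriptive metadata lets others find registered resources.

```python
from dataclasses import dataclass, field


@dataclass
class RegistryEntry:
    """Hypothetical registry record for a content collection or a TDM service."""
    name: str
    kind: str                                  # "content" or "service"
    keywords: set = field(default_factory=set)


class Registry:
    """Toy in-memory registry; it does not model the actual OpenMinTeD registry API."""

    KINDS = ("content", "service")

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        if entry.kind not in self.KINDS:
            raise ValueError(f"unknown kind: {entry.kind!r}")
        self._entries[entry.name] = entry  # re-registering a name replaces the old record

    def discover(self, keyword, kind=None):
        """Sorted names of entries tagged with `keyword`, optionally filtered by kind."""
        return sorted(
            e.name for e in self._entries.values()
            if keyword in e.keywords and (kind is None or e.kind == kind)
        )


registry = Registry()
registry.register(RegistryEntry("example-oa-corpus", "content", {"scholarly", "full-text"}))
registry.register(RegistryEntry("example-ner-service", "service", {"scholarly", "entities"}))
print(registry.discover("scholarly"))                  # both entries
print(registry.discover("scholarly", kind="service"))  # only the service
```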
Are you a text miner, or a researcher who can benefit from text mining?
Use OpenMinTeD (when launched).
Conclusions
- The ability to text-mine research literature at scale can redefine the way we do research
- OpenMinTeD is laying the groundwork (interoperability) and building the cloud infrastructure for text-mining research literature
- Building an open, transparent infrastructure that enables others to participate
Contact us
www.openminted.eu
twitter.com/openminted_eu
facebook.com/openminted
bit.do/openmintedlinkedin
vimeo.com/openminted
bit.do/openmintedplus