ICIC 2013 Conference Proceedings Uwe Rosemann TIB

Post on 15-Jun-2015

Text and Non-textual Objects: Seamless access for scientists

Uwe Rosemann (German National Library of Science and Technology (TIB), Germany)

The European High Level Expert Group on Scientific Data has formulated the challenges for a scientific infrastructure to be reached by 2030: “Our vision is a scientific e-infrastructure that supports seamless access, use, re-use, and trust of data. In a sense, the physical and technical infrastructure becomes invisible and the data themselves become the infrastructure – a valuable asset, on which science, technology, the economy and society can advance”. Here, “data” is not restricted to primary data but also includes all non-textual material (graphs, spectra, videos, 3D objects etc.). The German National Library of Science and Technology (TIB) has developed a concept for a national competence center for non-textual materials, which is now funded by the German federal government and the German federal states. The center's task is to develop solutions and services together with the scientific community to make such data available, citable, sharable and usable, including visual search tools and enhanced content-based retrieval. With solutions such as DataCite and modular development for extraction, indexing and visual searching of new scientific metadata, TIB will accept the challenge and make all data accessible to its users in a fast, convenient and easy-to-use way. The paper shows which special tools TIB is developing in the context of scientific AV media, 3D objects and research data.

Uwe Rosemann

ICIC 2013 Vienna

Textual and non-textual objects:

Seamless access for scientists

2

• Specialized library for architecture, chemistry, computer science, mathematics, physics, engineering and technology

• Financed by the Federal Government and all Federal States

• Member of the Leibniz Association

• Global supplier of scientific and technical information

German National Library of Science and Technology (TIB)

3

Global Network

TechLib

4

Customers

[Chart: customer shares by region (Germany, Europe, USA, World); values shown: 71%, 10%, 14%, 5%]

5

Main Services

• Provision of scientific content

  • full texts, document delivery, interlibrary loan

• Scientific retrieval

  • portal GetInfo

• Long-term preservation

• DOI service for research data

• Research and development

6

Jim Gray, eScience Group, Microsoft Research

Changes in the scientific process

7

A gap

• A widening gap in the scientific record between published research in a text document and the data that underlies it

• As a result, datasets are

• difficult to discover

• difficult to access

• Scientific information gets lost

8

Requirements - Politics

Knowledge is power.

Europe must manage the digital assets its researchers generate.

9

Final report of the High Level Expert Group on Scientific Data.

“Riding the wave” – How Europe can gain from the rising tide of scientific data

10

Strategy – Move beyond text

Simulation

Scientific Films

3D Objects

Text

Research Data

Software

11

Move beyond text – Consequences for TIB

• Research communities produce many types of scientific and technical information

• Each has its own unique characteristics and life cycle

• TIB must become capable of accepting and managing new media formats

12

Competence Center for Non-textual Materials I

• Develop a clear strategy for the use and integration of non-textual materials at the TIB

• Systematically collect non-textual materials from research and teaching

• Define, integrate and establish technical infrastructure

• Define and establish workflows for indexing, cataloguing, digital preservation, DOI names, licensing

13

Competence Center for Non-textual Materials II

• Develop innovative media-specific portals enabled by, e.g., automated video analysis with scene, speech, text and image recognition

• Link non-textual materials to other research information such as full texts and research data via the specialist portal GetInfo

• Engage in communities, provide support and advice to media providers

TIB will establish its own research capacity

14

• Infrastructure for research data

• Visual search tools for AV-media

• 3D Objects

• chemOCR

How have we been preparing?

15

• In 2005, the TIB became a non-commercial DOI registration agency for research data

• In 2010, the TIB became co-founder of the international DataCite consortium to establish easier access to scientific research data on the Internet

Mission

• Citability of research data

• High visibility of the data

• Easy re-use and verification of the data sets

• Increasing quality of published papers

Collaboration – Research Data
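The citability goal above can be illustrated with a minimal sketch, assuming a citation layout of the form "Creator (Year): Title. Publisher. Identifier" and the public resolver prefix https://doi.org/. The `Dataset` class, the helper names and all example values (authors, title, DOI) are hypothetical and only serve the illustration; the DOI shown does not resolve.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Minimal descriptive metadata for a citable data set (hypothetical model)."""
    creators: list[str]
    year: int
    title: str
    publisher: str
    doi: str


def doi_url(doi: str) -> str:
    """Return the resolver URL for a DOI name."""
    return "https://doi.org/" + doi.strip()


def cite(ds: Dataset) -> str:
    """Format a citation as: Creator (Year): Title. Publisher. DOI link."""
    creators = "; ".join(ds.creators)
    return f"{creators} ({ds.year}): {ds.title}. {ds.publisher}. {doi_url(ds.doi)}"


# All values below are invented for illustration.
example = Dataset(
    creators=["Doe, J.", "Roe, R."],
    year=2013,
    title="Hourly temperature readings",
    publisher="Example Data Centre",
    doi="10.1234/example.5678",
)

print(cite(example))
```

Because the DOI is part of the citation itself, a reader of a paper can go from the reference list directly to the data set, which is what makes re-use and verification practical.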

16

DataCite Members

17

Example: EHEC bacterium

18

Example: EHEC bacterium

19

DOI Services

• Contracts with 60 data centres

  • Research institutes

  • Universities

  • Libraries

  • Publishers

• 776,454 DOI registrations

  • 22,533 up to September 2013

20

Research data – Further developments

• KomFor

  • Centre of Expertise for Research Data from the “Earth and Environment” project

• RADAR

  • RADAR - Research Data Repositorium

• Visual Analysis

  • VisInfo Methods

21

Time [h]  T [°C]
 1        12
 2        13
 3        12
 4        12
 5        13
 6        35
 7        17
 8        11
 9        10
10        12
11        13
12        13
13        12
14        12
15        12
16        11
17        11
18        10
19        10
20        11
21        11
22        10
23        12
24        12

Numerical data

22

Visual access to research data
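The hourly temperature readings on the preceding slide are hard to scan as bare numbers. As a minimal sketch of why visual access helps, the following renders them as a text bar chart so that the spike at hour 6 stands out. The function names are hypothetical, and this is not the VisInfo method, only an illustration of the idea.

```python
# Hourly temperature readings in °C for hours 1..24, taken from the slide.
temps = [12, 13, 12, 12, 13, 35, 17, 11, 10, 12, 13, 13,
         12, 12, 12, 11, 11, 10, 10, 11, 11, 10, 12, 12]


def bar_chart(values, start_hour=1):
    """Return one text line per reading, with a bar proportional to the value."""
    lines = []
    for hour, t in enumerate(values, start=start_hour):
        lines.append(f"{hour:2d} h | {'#' * t} {t}")
    return lines


def peak_hour(values, start_hour=1):
    """Return the hour with the highest reading (first one if there are ties)."""
    return max(range(len(values)), key=values.__getitem__) + start_hour


print("\n".join(bar_chart(temps)))
print("peak at hour", peak_hour(temps))
```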

23

• Infrastructure for research data

• Visual search tools for AV-media

• 3D Objects

• chemOCR

How have we been preparing?

24

TIB‘s portal for audiovisual media

Project: Development of a portal for audiovisual media

Aim: Improve access to AV media

Time: July 2011 – December 2013

Partner: Hasso-Plattner-Institut für Softwaresystemtechnik GmbH

25

How do I find what I‘m looking for in videos?

Today: Manual annotation of the whole video

TIB‘s portal for audiovisual media

Metadata

• Title

• Author

• Description

• Publisher

• Publication year

• Rightsholder

• …..

26

source: Scorupka, Sascha, Experiment der Woche, 2011

Future: Manual Annotation plus content-based information

1. Speech

2. Visual features, e.g. indoor, experiment, technology

3. Textual information, e.g. “Leibniz University Hannover”

4. Structural information: scenes, shots, segments

TIB‘s portal for audiovisual media

27

TIB‘s portal for audiovisual media

Media analysis process

Upload

28

TIB‘s portal for audiovisual media

Scene recognition

Hard cut

Kopf, S. Computergestützte Inhaltsanalyse von digitalen Videoarchiven, Mannheim. 2006

Automatic cut detection

→ luminance / contrast

→ colour distribution / colour histogram

→ edges
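A hard cut can be sketched as a large jump in the luminance distribution between two consecutive frames. The following is a minimal illustration under stated assumptions, not the portal's actual implementation: frames are non-empty flat lists of 0-255 luminance values, and the bin count and the threshold of 0.5 are arbitrary choices. Edge-based features, which the slide also names, are left out.

```python
def histogram(frame, bins=8):
    """Normalised histogram of 0-255 luminance values (frame must be non-empty)."""
    counts = [0] * bins
    for value in frame:
        counts[min(value * bins // 256, bins - 1)] += 1
    total = len(frame)
    return [c / total for c in counts]


def histogram_distance(a, b):
    """Half the L1 distance: 0.0 for identical, 1.0 for disjoint histograms."""
    return 0.5 * sum(abs(x - y) for x, y in zip(a, b))


def detect_hard_cuts(frames, threshold=0.5):
    """Return indices i where frame i differs strongly from frame i - 1."""
    hists = [histogram(f) for f in frames]
    return [i for i in range(1, len(hists))
            if histogram_distance(hists[i - 1], hists[i]) > threshold]


# Toy "video": two dark frames followed by two bright frames.
dark = [20, 25, 30, 22] * 4
dark2 = [21, 26, 29, 23] * 4
bright = [220, 230, 225, 240] * 4
frames = [dark, dark2, bright, bright]

print(detect_hard_cuts(frames))  # the cut is reported at frame index 2
```

Gradual transitions such as fades change the histogram slowly and would need a different criterion, for example a distance accumulated over several frames.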

29

TIB‘s portal for audiovisual media

Harald Sack, Hasso-Plattner-Institute for IT-Systems Engineering

this work is copy right ed nine teen thirty six

Automatic speech recognition

Quality of results is dependent upon

• quality of the speaker

• dialects

• background noises

• voice overlaps

30

TIB‘s portal for audiovisual media

Intelligent Character Recognition (ICR)

• Character/Logo Detection

• Character Filtering

• Character Recognition

31

Method of analysis

Image recognition

Interview, experiment, animation, lecture

Extracted data is converted into text

TIB‘s portal for audiovisual media

Automated analysis: Image recognition

32

Visual Concepts

Graphical: Animation

Graphical: Drawing

Graphical: Diagram

Real: Outdoor

Real: Indoor

Real: Lecture / Conference

Real: Interview

Real: Buildings ...

TIB‘s portal for audiovisual media

Machine learning using visual features

Keyframes, Annotation

33

TIB‘s portal for audiovisual media

34

• Infrastructure for research data

• Visual search tools for AV-media

• 3D Objects

• chemOCR

How have we been preparing?

35

3D Objects – an excursion to Architecture

36

content based indexing

visual search

Visual search tools

37

segmentation with form-primitives

extraction of room connectivity graphs

Content based indexing
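A room connectivity graph can be pictured as rooms (nodes) joined by doors or passages (edges). The sketch below is a minimal illustration, assuming an adjacency-map representation and using the sorted degree sequence as a crude index descriptor. The room names, function names and descriptor are hypothetical stand-ins and not the method used in the project.

```python
from collections import defaultdict


def build_graph(connections):
    """Build an undirected adjacency map from (room, room) pairs."""
    graph = defaultdict(set)
    for a, b in connections:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def degree_sequence(graph):
    """Sorted number of neighbours per room: a simple, layout-independent descriptor."""
    return sorted((len(neighbours) for neighbours in graph.values()), reverse=True)


# Hypothetical floor plan extracted from a 3D model.
floor_plan = [
    ("hall", "kitchen"),
    ("hall", "living room"),
    ("hall", "bathroom"),
    ("living room", "kitchen"),
]

# Hypothetical query sketched by a user, with unlabeled rooms.
sketch = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]

plan_graph = build_graph(floor_plan)
query_graph = build_graph(sketch)

print(degree_sequence(plan_graph))
print(degree_sequence(plan_graph) == degree_sequence(query_graph))
```

Equal degree sequences are a necessary but not sufficient condition for two graphs to match, so a descriptor like this could only pre-filter candidates before a proper graph comparison.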

38

3D sketch attributed graph

result visualization

Visual search

39

Further developments

40

• Infrastructure for research data

• Visual search tools for AV-media

• 3D Objects

• chemOCR

How have we been preparing?

41

Search for chemical structures – how?

Chemists are used to drawing

Information retrieval in Chemistry

42

Table with reaction scheme

2a-i: Derivatives from the reaction

Chemical structure

Reaction scheme

Chemical Names

Linked entities from the table

Textual and non-textual chemical information

43

image data chemical structure data

CLiDE chemOCR

Non-textual data processing – chemOCR

44

Information retrieval in chemistry Text AND formulas

45

Further subjects

• Open Science Lab

• Ontology

46

Dissemination of scientific and technical information has been a foundational mission.

The methods have completely changed, but the mission remains the same.

Conclusion

47

Ultimate Goal: Interlinking and Search Across All Types of Digital Assets.

Conclusion

48

GetInfo – Portal for Science and Technology

• 58 million metadata records in internal index

• 390 million metadata records in external sources

• 900,000 PDF full texts

• Data, AV media, 3D objects

49

Development of media-specific portals

Provision

Probado 3D, Portal for audiovisual media

50

Questions?