Machine Support for Interacting with Scientific Publications, Improving Information Retrieval, and...

Post on 18-Nov-2014


4th German-Russian Young Researchers Forum Saint-Petersburg, August 2014


Introduction Vision Technology Solutions Conclusion

Machine Support for Interacting with Scientific Publications, Improving Information Retrieval, and Assessing Quality of Scientific Output

4th German-Russian Young Researchers Forum 2014

Christoph Lange (1, 2)

(1) Enterprise Information Systems, Institute for Applied Computer Science, University of Bonn

(2) Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS), Sankt Augustin

http://langec.wordpress.com/about

2014-07-07

Lange (Bonn): Interacting with Scientific Publications; Assessing Quality of Scientific Output, 2014-07-07


Hello, World!

2011 PhD at Jacobs Univ. Bremen, Germany: software for collaborating on mathematical documents [Lan11]

2011/12 Univ. Bremen, Germany: making knowledge of different complexity manageable for computers [OntoIOp13]

2012/13 Univ. Birmingham, UK: enabling domain experts to make mathematical models machine-verifiable [KLR]

2013– Enterprise Information Systems @ Univ. Bonn, Germany / Organized Knowledge @ Fraunhofer IAIS: enterprise information integration [AL14], data quality assessment, . . .


Assess Quality of Scientific Output (I)

Vision: answer the following questions about the quality of scientific output:

Author: “What is a good workshop to discuss my latest idea?”

Senior Researcher: “Should I accept an invitation to the programme committee of this conference?”

PhD Student: “What are the best publications I should read to get started?”

Reviewer: “Is this paper based on high-quality data?”


Assess Quality of Scientific Output (II)

How? – Semantic Web / Linked Open Data technology:

weak artificial intelligence – does not aim at replacing, but at supporting humans

practically applicable, and scalable to the size of the Web (→ search engine example)

suitable for connecting data from heterogeneous sources:

scientific publications (bibliographic metadata, citations and full text)

social networks (in science? – ResearchGate, Mendeley, etc.)

research data
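Why Linked Data suits heterogeneous sources can be illustrated with a minimal Python sketch. It assumes that all sources identify papers and people by the same URIs; the ex: names are hypothetical stand-ins for such URIs, dc:creator and foaf:knows are real Dublin Core and FOAF terms, and ex:usesDataset is invented for this example. Under that assumption, merging sources is just a set union of triples, and one query can follow links across all of them.

```python
# Hypothetical identifiers: "ex:" names stand in for shared HTTP URIs.
# dc:creator (Dublin Core) and foaf:knows (FOAF) are real vocabulary terms;
# ex:usesDataset is made up for this sketch.
bibliography = {
    ("ex:paper1", "dc:creator", "ex:alice"),
    ("ex:paper2", "dc:creator", "ex:carol"),
}
social_network = {
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:carol", "foaf:knows", "ex:dave"),
}
research_data = {
    ("ex:paper1", "ex:usesDataset", "ex:dataset7"),
}

# Because all sources use the same identifiers, merging them is a set union.
graph = bibliography | social_network | research_data

def subjects(graph, p, o):
    """All s such that (s, p, o) is in the graph."""
    return {s for (s, p2, o2) in graph if p2 == p and o2 == o}

def objects(graph, s, p):
    """All o such that (s, p, o) is in the graph."""
    return {o for (s2, p2, o) in graph if s2 == s and p2 == p}

def contacts_of_dataset_users(graph, dataset):
    """People known by authors of papers that use `dataset`:
    answering this needs all three sources at once."""
    papers = subjects(graph, "ex:usesDataset", dataset)
    authors = {a for p in papers for a in objects(graph, p, "dc:creator")}
    return {k for a in authors for k in objects(graph, a, "foaf:knows")}

print(contacts_of_dataset_users(graph, "ex:dataset7"))  # {'ex:bob'}
```

A real deployment would keep the triples in an RDF store and query them with SPARQL; the set-based version only shows the idea.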


Linked Open Data: schema.org

initiative of search engines (Google, Yandex, . . . )

structuring web page content (creative works, events, organisations, persons, places, products)

Example (Movie description)

Avatar

Director: James Cameron (born August 16, 1954)

Science fiction

Trailer


The same movie description in plain HTML:

<div class="movie">
  <h1>Avatar</h1>
  <div class="director">Director: James Cameron (born August 16, 1954)</div>
  <span class="genre">Science fiction</span>
  <a href="../movies/avatar-theatrical-trailer.html">Trailer</a>
</div>


The same HTML, annotated with schema.org microdata:

<div itemscope itemtype="http://schema.org/Movie">
  <h1 itemprop="name">Avatar</h1>
  <div itemprop="director" itemscope itemtype="http://schema.org/Person">
    Director: <span itemprop="name">James Cameron</span>
    (born <span itemprop="birthDate">August 16, 1954</span>)
  </div>
  <span itemprop="genre">Science fiction</span>
  <a href="../movies/avatar-theatrical-trailer.html" itemprop="trailer">Trailer</a>
</div>


The resulting graph:

[Figure: a resource of type Movie with name “Avatar”, genre “Science fiction”, trailer ../movies/. . . and a director; the director is a resource of type Person with name “James Cameron” and birthDate “August 16, 1954”.]
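How a consumer such as a search engine might turn the microdata markup above into the triples of this graph can be sketched in Python with only the standard library's html.parser. This is a simplified reading of the microdata model, not a complete processor: it handles nested itemscope, itemtype and itemprop, takes values from href, content or element text, and ignores itemref, itemid, multiple property names and src/datetime values. The blank-node labels _:b1, _:b2 are arbitrary.

```python
from html.parser import HTMLParser

VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

class MicrodataParser(HTMLParser):
    """Simplified microdata reader; assumes well-formed HTML."""

    def __init__(self):
        super().__init__()
        self.triples = []   # (subject, property, value)
        self.scopes = []    # blank-node ids of the currently open items
        self.elements = []  # per open element: (opened_item, text_capture)
        self.captures = []  # property values still collecting text
        self.count = 0

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        prop = a.get("itemprop")
        parent = self.scopes[-1] if self.scopes else None
        opened, capture = False, None
        if "itemscope" in a:                 # start of a new item
            self.count += 1
            node = f"_:b{self.count}"
            if prop and parent:              # nested item is a property value
                self.triples.append((parent, prop, node))
            if a.get("itemtype"):
                self.triples.append((node, "type", a["itemtype"]))
            self.scopes.append(node)
            opened = True
        elif prop and parent:
            if "href" in a:                  # <a>, <link>: value is the link target
                self.triples.append((parent, prop, a["href"]))
            elif "content" in a:             # <meta>: value is the content attribute
                self.triples.append((parent, prop, a["content"]))
            elif tag not in VOID_TAGS:       # otherwise: value is the element's text
                capture = (parent, prop, [])
                self.captures.append(capture)
        if tag in VOID_TAGS:
            if opened:
                self.scopes.pop()            # a void element cannot hold properties
        else:
            self.elements.append((opened, capture))

    def handle_data(self, data):
        for _, _, parts in self.captures:
            parts.append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.elements:
            return
        opened, capture = self.elements.pop()
        if opened:
            self.scopes.pop()
        if capture is not None:
            self.captures.pop()
            subject, prop, parts = capture
            text = " ".join("".join(parts).split())  # normalise whitespace
            self.triples.append((subject, prop, text))

def extract_microdata(html):
    parser = MicrodataParser()
    parser.feed(html)
    parser.close()
    return parser.triples

AVATAR_HTML = """<div itemscope itemtype="http://schema.org/Movie">
  <h1 itemprop="name">Avatar</h1>
  <div itemprop="director" itemscope itemtype="http://schema.org/Person">
    Director: <span itemprop="name">James Cameron</span>
    (born <span itemprop="birthDate">August 16, 1954</span>)
  </div>
  <span itemprop="genre">Science fiction</span>
  <a href="../movies/avatar-theatrical-trailer.html" itemprop="trailer">Trailer</a>
</div>"""

for triple in extract_microdata(AVATAR_HTML):
    print(triple)
```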


Social Data with schema.org

review or rating of a creative work, organization or product (written by a person)

social network of a person: “knows”, “works for”, “is colleague of”, “has parent/sibling/spouse/child/relative”

Example (Reviews of a movie)

[Figure: the Movie named “Avatar” has two reviews; each review has an author of type Person and a reviewRating; the authors are named “Pünktchen” and “Anton”, the ratingValues are 6 and 8.5, and “knows” connects the two persons.]
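To show how review data and social data could be combined, here is a minimal Python sketch over triples modelled on the figure: it averages the ratingValues of a movie's reviews, optionally counting only reviewers whom a given person knows. The ex: identifiers, the assignment of ratings to authors and the direction of “knows” are assumptions, since the figure does not fix them; the property names follow the figure.

```python
from statistics import mean

# Triples modelled on the review graph (ex: identifiers are hypothetical).
# Which author gave which rating, and the direction of "knows", are assumptions.
TRIPLES = [
    ("ex:avatar", "name", "Avatar"),
    ("ex:avatar", "reviews", "ex:r1"), ("ex:avatar", "reviews", "ex:r2"),
    ("ex:r1", "author", "ex:p1"), ("ex:r1", "reviewRating", "ex:rating1"),
    ("ex:r2", "author", "ex:p2"), ("ex:r2", "reviewRating", "ex:rating2"),
    ("ex:rating1", "ratingValue", 6), ("ex:rating2", "ratingValue", 8.5),
    ("ex:p1", "name", "Pünktchen"), ("ex:p2", "name", "Anton"),
    ("ex:p1", "knows", "ex:p2"),
]

def objects(s, p):
    """All o such that (s, p, o) is in TRIPLES."""
    return [o for (s2, p2, o) in TRIPLES if s2 == s and p2 == p]

def average_rating(work, reviewers=None):
    """Mean ratingValue over the reviews of `work`; if `reviewers` is given,
    only reviews authored by one of them count. None if there is no rating."""
    values = [
        v
        for review in objects(work, "reviews")
        if reviewers is None or any(a in reviewers for a in objects(review, "author"))
        for rating in objects(review, "reviewRating")
        for v in objects(rating, "ratingValue")
    ]
    return mean(values) if values else None

print(average_rating("ex:avatar"))                                  # 7.25
print(average_rating("ex:avatar", set(objects("ex:p1", "knows"))))  # 8.5
```

Restricting ratings to a reader's own network is just one possible use of the “knows” links; the point is that both kinds of data live in the same graph.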


schema.org in a Search Engine


Workshop Quality

Author: “What is a good workshop to discuss my latest idea?”


Workshop Quality: Examples

Low-quality workshop: 1st International Workshop on Applied Networking (but all non-invited submissions are from authors from the same institution as the chairs)

High-quality workshop: focused topic, 10 editions so far, balanced continuity and renewal in organising committee, number of submissions not decreasing, international participation, part of a high-profile conference
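Several of these indicators can be computed once workshop metadata is available as data. A minimal Python sketch, assuming a hypothetical Edition record per workshop edition; the field names, the data and the use of Jaccard similarity for committee continuity are choices made for this illustration, not an established metric.

```python
from dataclasses import dataclass

@dataclass
class Edition:
    """One edition of a workshop (hypothetical data model)."""
    year: int
    chairs: set[str]                    # organising committee members
    chair_institutions: set[str]
    submission_institutions: list[str]  # one entry per non-invited submission

def same_institution_share(ed: Edition) -> float:
    """Fraction of non-invited submissions from the chairs' own institutions
    (1.0 in the low-quality example)."""
    subs = ed.submission_institutions
    if not subs:
        return 0.0
    return sum(inst in ed.chair_institutions for inst in subs) / len(subs)

def committee_overlap(prev: Edition, cur: Edition) -> float:
    """Jaccard similarity of consecutive organising committees:
    1.0 means no renewal, 0.0 means no continuity."""
    union = prev.chairs | cur.chairs
    return len(prev.chairs & cur.chairs) / len(union) if union else 1.0

def submissions_not_decreasing(editions: list[Edition]) -> bool:
    """True if the number of submissions never drops from one edition to the next."""
    counts = [len(e.submission_institutions)
              for e in sorted(editions, key=lambda e: e.year)]
    return all(a <= b for a, b in zip(counts, counts[1:]))

# Hypothetical data for two editions of one workshop.
ed2012 = Edition(2012, {"A", "B"}, {"Uni X"}, ["Uni X", "Uni Y", "Uni Z"])
ed2013 = Edition(2013, {"B", "C"}, {"Uni X", "Uni W"},
                 ["Uni Y", "Uni Z", "Uni V", "Uni X"])
print(same_institution_share(ed2013))                # 0.25
print(committee_overlap(ed2012, ed2013))             # 1/3: one shared chair of three
print(submissions_not_decreasing([ed2013, ed2012]))  # True
```

A balanced committee would sit somewhere between the extremes of committee_overlap; where exactly is a judgement call the sketch does not make.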


Workshop Quality: Data

Semantic Publishing Challenge [DL14] @ Extended Semantic Web Conference 2014

One task focused on extracting Linked Data from CEUR-WS.org workshop proceedings volumes:

1,200 workshops since 1995

open access

most important publisher for computer science workshops

semi-structured HTML tables of contents

unstructured PDF full-text

A team from Saint-Petersburg (ITMO University) won the award for the best-performing tool [KK14]
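A minimal Python sketch of template-based extraction from a semi-structured HTML table of contents, using only the standard library's html.parser. The template (one <li> per paper, a link to the PDF, and <span class="title"> / <span class="author"> elements) is an assumption made for illustration; it is neither the markup of any particular CEUR-WS.org volume nor the method of [KK14]. A fixed template like this breaks as soon as a volume's markup deviates from it, which is the “unstable markup” problem named in the title of [KK14].

```python
from html.parser import HTMLParser

class TocParser(HTMLParser):
    """Extracts papers from a table of contents following a hypothetical template:
    one <li> per paper, a link to the PDF, and <span class="title"> /
    <span class="author"> around title and authors."""

    def __init__(self):
        super().__init__()
        self.papers = []
        self.field = None   # "title" or "author" while inside such a span
        self.text = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        href = a.get("href") or ""
        if tag == "li":                      # a new paper entry starts
            self.papers.append({"pdf": None, "title": None, "authors": []})
        elif tag == "a" and self.papers and href.endswith(".pdf"):
            self.papers[-1]["pdf"] = href
        elif tag == "span" and a.get("class") in ("title", "author"):
            self.field, self.text = a["class"], []

    def handle_data(self, data):
        if self.field:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag != "span" or not self.field:
            return
        value = " ".join("".join(self.text).split())
        if self.papers:
            if self.field == "title":
                self.papers[-1]["title"] = value
            else:
                self.papers[-1]["authors"].append(value)
        self.field = None

def extract_toc(html):
    parser = TocParser()
    parser.feed(html)
    parser.close()
    return parser.papers

# Hypothetical markup in the assumed template.
TOC = """<ul>
<li><a href="paper1.pdf"><span class="title">A First Paper</span></a>
    <span class="author">A. Author</span>, <span class="author">B. Author</span></li>
<li><a href="paper2.pdf"><span class="title">A Second Paper</span></a>
    <span class="author">C. Author</span></li>
</ul>"""

for paper in extract_toc(TOC):
    print(paper)
```

Each extracted record could then be published as Linked Data, e.g. one resource per paper with title, author and PDF links.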


Conference Quality

Senior Researcher: “Should I accept an invitation to the programme committee of this conference?”


Conference Quality in the Past: Ranking

CORE (Computing Research and Education Association of Australasia) and ERA (Excellence in Research for Australia) rankings of 2008, 2010 and 2013: infrequent and not transparent


Paper Quality in the Past: Impact Factor

PhD Student: “What are the best publications I should read to get started?”

Impact Factor:

average number of citations received in a given year by the articles a journal published in the two preceding years

journals only

not comparable across disciplines

can be influenced by journal editors
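A minimal Python sketch of the standard two-year impact factor for year Y: citations received in Y by items the journal published in Y-1 and Y-2, divided by the number of citable items it published in those two years. The Citation record, the journal names and all numbers are hypothetical. The exclude_self option, which ignores journal self-citations, is added only to show one way editorial influence (e.g. encouraging authors to cite the journal) can inflate the value.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Citation:
    citing_journal: str
    cited_journal: str
    cited_year: int   # publication year of the cited item
    year: int         # year in which the citing item appeared

def impact_factor(journal, year, citations, citable_items, exclude_self=False):
    """Two-year impact factor of `journal` for `year`: citations received in
    `year` by items published in the two preceding years, divided by the
    number of citable items published in those years (citable_items[y])."""
    window = {year - 1, year - 2}
    cited = sum(
        1 for c in citations
        if c.cited_journal == journal and c.year == year and c.cited_year in window
        and not (exclude_self and c.citing_journal == journal)
    )
    return cited / sum(citable_items[y] for y in window)

# Hypothetical numbers for a journal "J".
items = {2012: 10, 2013: 10}
cites = ([Citation("K", "J", 2013, 2014)] * 30     # from other journals
         + [Citation("J", "J", 2012, 2014)] * 10   # journal self-citations
         + [Citation("K", "J", 2011, 2014)] * 5)   # outside the two-year window
print(impact_factor("J", 2014, cites, items))                     # 2.0
print(impact_factor("J", 2014, cites, items, exclude_self=True))  # 1.5
```

The value describes a journal as a whole; it says nothing about which of its articles a PhD student should read.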


Paper Quality in the Future

Multidimensional, context-sensitive analysis: trend detection, topic analysis, expert search, community dynamics, research performance at different levels (e.g. [OM14])

context-sensitive citation analysis, e.g. 2014 Semantic Publishing Challenge task 2 (using PubMed Central XML metadata) [DL14]:

“good citation”: B’s contribution is based on A’s methodology

“bad citation”: A cited in a footnote in the “related work” section
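Context-sensitive citation analysis could, for example, weight each citation by where and how it occurs instead of counting every citation as 1. A minimal Python sketch, assuming a hypothetical CitationContext record and hand-picked section weights and cue phrases; these numbers are illustrative only, and a real system would need a calibrated or learned model.

```python
from dataclasses import dataclass

@dataclass
class CitationContext:
    cited: str        # identifier of the cited paper (A)
    section: str      # section of the citing paper (B) containing the citation
    in_footnote: bool
    sentence: str     # sentence around the citation

# Hypothetical weights and cue phrases, chosen only for illustration.
SECTION_WEIGHT = {"methods": 1.0, "results": 0.8, "introduction": 0.4, "related work": 0.3}
METHOD_CUES = ("based on", "we use", "we follow", "we adopt")

def citation_weight(ctx: CitationContext) -> float:
    w = SECTION_WEIGHT.get(ctx.section.lower(), 0.5)  # unknown section: neutral weight
    if any(cue in ctx.sentence.lower() for cue in METHOD_CUES):
        w += 0.5                                      # B builds on A's method
    if ctx.in_footnote:
        w *= 0.5                                      # A cited only in passing
    return w

def weighted_citations(contexts, paper):
    """Context-sensitive citation count: sum of weights instead of 1 per citation."""
    return sum(citation_weight(c) for c in contexts if c.cited == paper)

good = CitationContext("A", "Methods", False, "Our approach is based on the methodology of [A].")
bad = CitationContext("A", "Related Work", True, "See also [A].")
print(citation_weight(good))  # 1.5
print(citation_weight(bad))   # 0.15
```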


Data Quality

Reviewer: “Is this paper based on high-quality data?”

Quality metrics of an evolving dataset [DLA14]


Data Quality Assessment

Quality := “fitness for use” – categories [Zav+13]:

[Figure: data quality dimensions grouped into the categories Contextual, Intrinsic, Accessibility and Representation: Relevancy, Trustworthiness, Understandability, Timeliness, Syntactic Validity, Semantic Accuracy, Consistency, Conciseness, Completeness, Availability, Performance*, Security*, Licensing*, Interlinking*, Representational Conciseness, Interoperability, Interpretability, Versatility*; lines connect pairs of related dimensions.]

Enable authors to upload data with their papers!

Give peer reviewers access to data quality metrics

Starting collaboration with GESIS (social science)
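To make “give peer reviewers access to data quality metrics” concrete, here is a minimal Python sketch of two metrics for dimensions from the figure, Completeness and Syntactic Validity. The definitions are simplified assumptions (share of entities that have all required properties; share of date values in ISO 8601 form), not the metrics defined in [Zav+13] or [DLA14], and the dataset is hypothetical apart from the birth date taken from the schema.org example.

```python
from datetime import date

def completeness(entities, required):
    """Share of entities with a value for every required property
    (a simplified completeness metric)."""
    if not entities:
        return 1.0
    ok = sum(all(p in props for p in required) for props in entities.values())
    return ok / len(entities)

def is_iso_date(value):
    """True if `value` parses as an ISO 8601 calendar date."""
    try:
        date.fromisoformat(value)
        return True
    except (TypeError, ValueError):
        return False

def syntactic_validity(entities, prop):
    """Share of values of `prop` that are well-formed ISO 8601 dates
    (a simplified syntactic-validity metric)."""
    values = [props[prop] for props in entities.values() if prop in props]
    return sum(map(is_iso_date, values)) / len(values) if values else 1.0

# Hypothetical dataset: entity id -> {property: value}
people = {
    "ex:cameron": {"name": "James Cameron", "birthDate": "1954-08-16"},
    "ex:b": {"name": "B", "birthDate": "16/08/1954"},
    "ex:c": {"name": "C"},
}
print(completeness(people, ["name", "birthDate"]))  # 2 of 3 entities complete
print(syntactic_validity(people, "birthDate"))      # 0.5
```

Attached to a submitted dataset, such numbers would give a reviewer a first indication of where to look more closely.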


Directions: Jailbreaking the PDF

“exploring ways to access scholarly data in modern ways”

free peer-reviewed scientific knowledge from being locked up in PDF documents


Directions: Pact with the Devil

Openness vs. impact:

Springer: conference linked data

Elsevier: executable paper challenge

ResearchGate: open reviews


Conclusion

Scientists need help with assessing the quality of scientific output.

Having PDF documents peer-reviewed by human experts is not sufficient.

We need better quality metrics than the impact factor.

Not just paper quality matters, but also data quality.

Semantic Web/Linked Data technology helps to provide complementary machine support . . .

. . . and is a gate into openness.


References

[AL14] S. Auer and C. Lange. “Interlinking Data and Knowledge in Enterprises, Research and Society with Linked Data”. In: Proceedings of the 11th International Baltic Conference on Databases and Information Systems (Baltic DB&IS). (Tallinn, Estonia, June 8–11, 2014). Ed. by H.-M. Haav, A. Kalja, and T. Robal. Invited paper. Tallinn, Estonia: Tallinn University of Technology Press, 2014, pp. 3–12.

[DL14] A. Di Iorio and C. Lange, eds. Semantic Publishing Challenge (Extended Semantic Web Conference, Semantic Web Evaluation Track). (Anissaras, Greece, May 25, 2014). 2014. URL: http://2014.eswc-conferences.org/program/semwebeval.


[DLA14] J. Debattista, C. Lange, and S. Auer. “Representing Dataset Quality Metadata using Multi-Dimensional Views”. 2014. Submitted.

[KK14] M. Kolchin and F. Kozlov. “Unstable markup: A template-based information extraction from web sites with unstable markup”. In: Semantic Publishing Challenge (Extended Semantic Web Conference, Semantic Web Evaluation Track). (Anissaras, Greece, May 25, 2014). Ed. by A. Di Iorio and C. Lange. 2014. URL: http://2014.eswc-conferences.org/program/semwebeval.


[KLR] M. Kerber, C. Lange, and C. Rowat. ForMaRE. Formal Mathematical Reasoning in Economics. URL: http://cs.bham.ac.uk/research/projects/formare/ (visited on 2013-02-10).

[Lan11] C. Lange. “Enabling Collaboration on Semiformal Mathematical Knowledge by Semantic Web Integration”. PhD thesis. Jacobs University Bremen, 2011.


[OM14] F. Osborne and E. Motta. “Understanding Research Dynamics”. In: Semantic Publishing Challenge (Extended Semantic Web Conference, Semantic Web Evaluation Track). (Anissaras, Greece, May 25, 2014). Ed. by A. Di Iorio and C. Lange. 2014. URL: http://2014.eswc-conferences.org/program/semwebeval.

[OntoIOp13] OntoIOp (Ontology, Model and Specification Integration and Interoperability), an OMG Standard Development Initiative. 2013. URL: http://ontoiop.org (visited on 2013-10-09).


[Zav+13] A. Zaveri, A. Rula, A. Maurino, R. Pietrobon, J. Lehmann, and S. Auer. “Quality Assessment Methodologies for Linked Open Data”. In: Semantic Web Journal (2013). This article is still under review. URL: http://www.semantic-web-journal.net/content/quality-assessment-linked-open-data-survey.
