Scholia: A Wikidata-based site for analytics and visualization ...Scholia Wikidata \Wikidata: Veri...

35
Scholia: A Wikidata-based site for analytics and visualization of science Finn ˚ Arup Nielsen Cognitive Systems, DTU Compute, Technical University of Denmark 3 oktober 2018

Transcript of Scholia: A Wikidata-based site for analytics and visualization ...Scholia Wikidata \Wikidata: Veri...

Page 1: Scholia: A Wikidata-based site for analytics and visualization ...Scholia Wikidata \Wikidata: Veri able, Linked Open Knowledge That Anyone Can edit" (Dario Taraborelli) CC0-licensed

Scholia: A Wikidata-based site for analytics and

visualization of science

Finn Arup Nielsen

Cognitive Systems, DTU Compute, Technical University of Denmark

3 oktober 2018

Page 2: Scholia: A Wikidata-based site for analytics and visualization ...Scholia Wikidata \Wikidata: Veri able, Linked Open Knowledge That Anyone Can edit" (Dario Taraborelli) CC0-licensed

Scholia

Finn Arup Nielsen 1 3 oktober 2018

Page 3: Scholia: A Wikidata-based site for analytics and visualization ...Scholia Wikidata \Wikidata: Veri able, Linked Open Knowledge That Anyone Can edit" (Dario Taraborelli) CC0-licensed

Scholia

Scholia

Scholia is a webservicefrom https://tools.wmflabs.

org/scholia/ and a Pythonpackage from https://github.

com/fnielsen/scholia.

The webservice generatesoverview of science withWikidata Query Serviceand is built with the Flaskweb framework, HTML,Bootstrap, Javascript andtemplated SPARQL.

For researcher profiles, scientometrics, bibliographic reference manage-ment, information discovery (find relevant papers, scientific meetings,researchers, funding opportunities, . . . ).

Finn Arup Nielsen 2 3 oktober 2018

Page 4: Scholia: A Wikidata-based site for analytics and visualization ...Scholia Wikidata \Wikidata: Veri able, Linked Open Knowledge That Anyone Can edit" (Dario Taraborelli) CC0-licensed

Scholia

Where does the data comes from?

Finn Arup Nielsen 3 3 oktober 2018

Page 5: Scholia: A Wikidata-based site for analytics and visualization ...Scholia Wikidata \Wikidata: Veri able, Linked Open Knowledge That Anyone Can edit" (Dario Taraborelli) CC0-licensed

Scholia

Finn Arup Nielsen 4 3 oktober 2018

Page 6: Scholia: A Wikidata-based site for analytics and visualization ...Scholia Wikidata \Wikidata: Veri able, Linked Open Knowledge That Anyone Can edit" (Dario Taraborelli) CC0-licensed

Scholia

Wikidata

“Wikidata: Verifiable,Linked Open KnowledgeThat Anyone Can edit”(Dario Taraborelli)

CC0-licensed data avail-able on website, API,SPARQL endpoint ordump files.

Each page is an “item”with labels, aliases, prop-erties and property val-ues, as well as Wikipedialinks.

Wikidata site UI mockup from 2012 for Berlin (Q64).

Finn Arup Nielsen 5 3 oktober 2018

Page 7: Scholia: A Wikidata-based site for analytics and visualization ...Scholia Wikidata \Wikidata: Veri able, Linked Open Knowledge That Anyone Can edit" (Dario Taraborelli) CC0-licensed

Scholia

Wikidata Query Service

Wikidata Query Service

(WDQS) is the SPARQL

endpoint for the RDF-

transformed data in Wiki-

data.

There is a “Query Helper”

for non-programmatic for-

mation of SPARQL queries,

predefined prefixes, identi-

fier lookup.

Several results output for-

mats: table, bubble chart,

line chart, graphs, etc.

Finn Arup Nielsen 6 3 oktober 2018

Page 8: Scholia: A Wikidata-based site for analytics and visualization ...Scholia Wikidata \Wikidata: Veri able, Linked Open Knowledge That Anyone Can edit" (Dario Taraborelli) CC0-licensed

Scholia

WikiCite

“WikiCite: Building the sum of all hu-

man citations” (Dario Taraborelli)

Use Wikidata to hold metadata about

works (scientific articles, book, etc.)

Properties: authors, publication date,

where it is published, reviewed by, edi-

tor, main subject, language, retracted

by, erratum, volume, issue number,

page range, number of pages, type

or genre (retraction notice, retracted

paper), series, publisher, and a lot

of identifiers: DOI, ACM, Semantic

Scholar, PMCID, PMID, arXiv, etc.

Finn Arup Nielsen 7 3 oktober 2018

Page 9: Scholia: A Wikidata-based site for analytics and visualization ...Scholia Wikidata \Wikidata: Veri able, Linked Open Knowledge That Anyone Can edit" (Dario Taraborelli) CC0-licensed

Scholia

WikiCite Statistics

Wikidata statistics on

WikiCite data. Cur-

rently presented on

the main page of

Scholia.

121 million citations.

17 million PubMed

links.

14 million DOI links.

187 thousand ORCID

links.

Finn Arup Nielsen 8 3 oktober 2018

Page 10: Scholia: A Wikidata-based site for analytics and visualization ...Scholia Wikidata \Wikidata: Veri able, Linked Open Knowledge That Anyone Can edit" (Dario Taraborelli) CC0-licensed

Scholia

Jakob Voß’ WikiCite statistics

Jakob Voß’ Wikicite

statistics that is up-

date regularly.

http://wikicite.org/

statistics.html

Number of publica-

tions and citations in

Wikidata.

Note the staircase curve of the citations. My guess is that this shape

is due to prolific James Hare using Europe PubMed Central initially and

then switching to CrossRef for citations.

Finn Arup Nielsen 9 3 oktober 2018

Page 11: Scholia: A Wikidata-based site for analytics and visualization ...Scholia Wikidata \Wikidata: Veri able, Linked Open Knowledge That Anyone Can edit" (Dario Taraborelli) CC0-licensed

Scholia

Scholia

Finn Arup Nielsen 10 3 oktober 2018

Page 12: Scholia: A Wikidata-based site for analytics and visualization ...Scholia Wikidata \Wikidata: Veri able, Linked Open Knowledge That Anyone Can edit" (Dario Taraborelli) CC0-licensed

Scholia

Scholia’s aspects

Scholia shows Wiki-

data data in aspects,

author, work, organi-

zation (e.g., univer-

sity, research group),

venue (journal or con-

ference), series, pub-

lisher, sponsor, loca-

tion, event, award,

topic, chemical, dis-

ease, etc.

For instance, the Technical University of Denmark may be viewed as a

publisher, topic, organization, sponsor and location.

Finn Arup Nielsen 11 3 oktober 2018

Page 13: Scholia: A Wikidata-based site for analytics and visualization ...Scholia Wikidata \Wikidata: Veri able, Linked Open Knowledge That Anyone Can edit" (Dario Taraborelli) CC0-licensed

Scholia

Author aspect: Co-author graph

The egocentric co-author

graph in Scholia’s au-

thor aspect for the re-

searcher Mikkel Wal-

lentin, Aarhus Univer-

sity.

Colored according to

gender.

Finn Arup Nielsen 12 3 oktober 2018

Page 14: Scholia: A Wikidata-based site for analytics and visualization ...Scholia Wikidata \Wikidata: Veri able, Linked Open Knowledge That Anyone Can edit" (Dario Taraborelli) CC0-licensed

Scholia

Organization aspect: Citations

Co-author normalized citations per year for Technical University of Den-mark: Number of citations per year divided by number of co-authors oncited paper.

Finn Arup Nielsen 13 3 oktober 2018

Page 15: Scholia: A Wikidata-based site for analytics and visualization ...Scholia Wikidata \Wikidata: Veri able, Linked Open Knowledge That Anyone Can edit" (Dario Taraborelli) CC0-licensed

Scholia

Work aspect: Retractions

Wikidata can specifyretracted papers, re-traction notices andtheir connection.

By combining cita-tion and retractioninformation we canfind papers citing an-other paper after ithas been retracted.

Currently, Scholia visualizes such information in a timeline. Here Identi-fication of Aurora-A as a direct target of E2F3 during G2/M cell cycleprogression: “For example, silencing E2F3 prevented entry into G2/M inovarian cancer cells [61].” (received April 2016, accepted August 2017)

Finn Arup Nielsen 14 3 oktober 2018

Page 16: Scholia: A Wikidata-based site for analytics and visualization ...Scholia Wikidata \Wikidata: Veri able, Linked Open Knowledge That Anyone Can edit" (Dario Taraborelli) CC0-licensed

Scholia

Publisher aspects

Scatter plot of num-

ber of citations

as a function of

number of works

published in jour-

nals published un-

der the BioMed

Central brand.

The top left one is

Genome Biology,

the lower right Crit-

ical Care.

Finn Arup Nielsen 15 3 oktober 2018

Page 17: Scholia: A Wikidata-based site for analytics and visualization ...Scholia Wikidata \Wikidata: Veri able, Linked Open Knowledge That Anyone Can edit" (Dario Taraborelli) CC0-licensed

Scholia

Country aspect

Locations in Denmark

that is the main subject

of a work (Nielsen et al.,

2018).

Example popup: Suc-

cession of phytoplank-

ton in response to en-

vironmental factors in

Lake Arresø, North Zea-

land, Denmark.

Similar maps can be cre-

ated for narrative loca-

tions.

Finn Arup Nielsen 16 3 oktober 2018

Page 18: Scholia: A Wikidata-based site for analytics and visualization ...Scholia Wikidata \Wikidata: Veri able, Linked Open Knowledge That Anyone Can edit" (Dario Taraborelli) CC0-licensed

Scholia

Project aspect: Research projects in Scholia

Research project aspect

(Willighagen et al., 2018a).

If works are linked up

to the project (by Wiki-

data’s sponsored by prop-

erty) we can make un-

usually statistics.

Here citations per mil-

lion budget.

(The schema for projects

and grants is not quite

settled)

Finn Arup Nielsen 17 3 oktober 2018

Page 19: Scholia: A Wikidata-based site for analytics and visualization ...Scholia Wikidata \Wikidata: Veri able, Linked Open Knowledge That Anyone Can edit" (Dario Taraborelli) CC0-licensed

Scholia

Use aspect

Bar chart for usage

of SPM software (func-

tional neuroimaging soft-

ware) over time with dif-

ferent software versions

indicated by color.

Uses the describes a

project that uses prop-

erty.

Such data is likely not

available in directly ma-

chine readable format.

Finn Arup Nielsen 18 3 oktober 2018

Page 20: Scholia: A Wikidata-based site for analytics and visualization ...Scholia Wikidata \Wikidata: Veri able, Linked Open Knowledge That Anyone Can edit" (Dario Taraborelli) CC0-licensed

Scholia

Comparison of multiple items

Multiple countries, e.g., some Southern and Eastern African countries or

cheminformatics journals (here Willighagen’s citations to work ratio).

Finn Arup Nielsen 19 3 oktober 2018

Page 21: Scholia: A Wikidata-based site for analytics and visualization ...Scholia Wikidata \Wikidata: Veri able, Linked Open Knowledge That Anyone Can edit" (Dario Taraborelli) CC0-licensed

Scholia

Scholia’s “subaspects”

Cocitation network for machine learning researchers in Denmark:/scholia/country/Q33/topic/Q2539.

Finn Arup Nielsen 20 3 oktober 2018

Page 22: Scholia: A Wikidata-based site for analytics and visualization ...Scholia Wikidata \Wikidata: Veri able, Linked Open Knowledge That Anyone Can edit" (Dario Taraborelli) CC0-licensed

Scholia

Geodata and Scholia

Wikipedia researchers near

Tubingen: Weight infor-

mation in Wikidata by the

geographical distance and

topic of authored works

(Nielsen et al., 2018).

/scholia/location/Q3806/-

topic/Q52.

Nearby (in space and time)

events also possible.

Finn Arup Nielsen 21 3 oktober 2018

Page 23: Scholia: A Wikidata-based site for analytics and visualization ...Scholia Wikidata \Wikidata: Veri able, Linked Open Knowledge That Anyone Can edit" (Dario Taraborelli) CC0-licensed

Scholia

Finding related items

Finn Arup Nielsen 22 3 oktober 2018

Page 24: Scholia: A Wikidata-based site for analytics and visualization ...Scholia Wikidata \Wikidata: Veri able, Linked Open Knowledge That Anyone Can edit" (Dario Taraborelli) CC0-licensed

Scholia

Related diseases with Wikidata Query Service

Count some form ofco-occurences with aSPARQL query in theWikidata Query ser-vice.

Scholia is doing thisfor diseases and pro-teins with tailor-madeSPARQL. Here forthe disease schizo-phrenia.

Shows genetically as-sociated diseases viathe P2293 (geneticassociation) property.

Finn Arup Nielsen 23 3 oktober 2018

Page 25: Scholia: A Wikidata-based site for analytics and visualization ...Scholia Wikidata \Wikidata: Veri able, Linked Open Knowledge That Anyone Can edit" (Dario Taraborelli) CC0-licensed

Scholia

Wembedder

Finding related items basedon word2vec-based knowledgegraph embedding (Nielsen,2017).

Here for a scientific article.

In this case, the similar articlesfound are (probably) mostlyrelated to coauthorship rela-tions.

But a newer embedding wouldprobably be much affected bythe citation relations betweenpapers.

Finn Arup Nielsen 24 3 oktober 2018

Page 26: Scholia: A Wikidata-based site for analytics and visualization ...Scholia Wikidata \Wikidata: Veri able, Linked Open Knowledge That Anyone Can edit" (Dario Taraborelli) CC0-licensed

Scholia

Related items by co-citations

Example with Do alt-

metrics work? Twitter

and ten other social web

services.

Counts citations back

and forth, one step

and two step with the

SPARQL fragment:

wd:Q21133507

(^wdt:P2860 | wdt:P2860)

/

(^wdt:P2860 | wdt:P2860)?

?work .

Finn Arup Nielsen 25 3 oktober 2018

Page 27: Scholia: A Wikidata-based site for analytics and visualization ...Scholia Wikidata \Wikidata: Veri able, Linked Open Knowledge That Anyone Can edit" (Dario Taraborelli) CC0-licensed

Scholia

How do we get data into Wikidata?

Finn Arup Nielsen 26 3 oktober 2018

Page 28: Scholia: A Wikidata-based site for analytics and visualization ...Scholia Wikidata \Wikidata: Veri able, Linked Open Knowledge That Anyone Can edit" (Dario Taraborelli) CC0-licensed

Scholia

Wikidata input

Manual input on the https://

www.wikidata.org website.

Magnus Manske’s tools: Source-MD including its ORCIDator andresolver, Quickstatements, TAB-ernacle (left screenshot). Rela-tively quick for each researcher ifORCID profile has DOI publica-tions.

Other approaches: Fatameh,programmatic upload, e.g., withWikidataIntegrator.

Scholia has arXiv scraping.

Finn Arup Nielsen 27 3 oktober 2018

Page 29: Scholia: A Wikidata-based site for analytics and visualization ...Scholia Wikidata \Wikidata: Veri able, Linked Open Knowledge That Anyone Can edit" (Dario Taraborelli) CC0-licensed

Scholia

Scientometrics limitations

PubMed bias: A large portion of the documents comes from PubMed.

DOI bias: Documens with DOIs are easier to setup than documents

without.

I4OC bias: The citations we have (and that we are going to get) are

primarily from open citation databases (CrossRef), i.e., citations from

organizations such as IEEE and Elsevier are underrepresented.

Authors are not equally represented. One problem: Some author names

are hard to resolve, e.g., Chinese and Korean names, cf. (Ioannidis et al.,

2018).

Scholia bias: Chemoinformatics, Zika virus, etc.

Finn Arup Nielsen 28 3 oktober 2018

Page 30: Scholia: A Wikidata-based site for analytics and visualization ...Scholia Wikidata \Wikidata: Veri able, Linked Open Knowledge That Anyone Can edit" (Dario Taraborelli) CC0-licensed

Scholia

Scholia usage statistics

Monthly pageview for

Scholia has increased

and has been over

300’000.

The latest increase is

likely due to inclu-

sion of link to Scho-

lia from Wikimedia

Commons templates.

Whether page view

comming this way are

bots or users are not

known.

Finn Arup Nielsen 29 3 oktober 2018

Page 31: Scholia: A Wikidata-based site for analytics and visualization ...Scholia Wikidata \Wikidata: Veri able, Linked Open Knowledge That Anyone Can edit" (Dario Taraborelli) CC0-licensed

Scholia

Scholia/Wikidata promotions

How do we spread the word of Scholia

and Wikidata?

Here Egon Willighagen uses the hash

tag #icanhazwikidata to encourage

researchers to tweet their ORCID iD

so that we can “orcidator” their pub-

lication into Wikidata.

Deep links from Wikipedia and Wiki-

media Commons to Scholia profiles,

e.g., on Uta Frith.

Finn Arup Nielsen 30 3 oktober 2018

Page 32: Scholia: A Wikidata-based site for analytics and visualization ...Scholia Wikidata \Wikidata: Veri able, Linked Open Knowledge That Anyone Can edit" (Dario Taraborelli) CC0-licensed

Scholia

Development

Development takes placeon GitHub under GPL athttps://github.com/-fnielsen/scholia/.

Three developers: EgonWillighagen (almost all che-moinformatics aspects, bi-ological pathways, etc., seealso (Willighagen et al.,2018b)) and Daniel Mi-etchen.

Provided a Python devel-opment environment, youcan download and runScholia on your own com-puter.

Finn Arup Nielsen 31 3 oktober 2018

Page 33: Scholia: A Wikidata-based site for analytics and visualization ...Scholia Wikidata \Wikidata: Veri able, Linked Open Knowledge That Anyone Can edit" (Dario Taraborelli) CC0-licensed

Scholia

Conclusion

Wikidata and its Wikidata Query Service yield an open corpus of metadata

queryable in complex ways.

Scholia aggregates Wikidata data a present the data in an interactive

environment.

Data in Wikidata is limited and there is biased coverage.

Wikidata input is somewhat cumbersome. We rely heavily on Magnus

Manskes bespoke tools.

Ontology still not clear, e.g., preprints, postprints

WikiCite part of Wikidata continues to grow.

Finn Arup Nielsen 32 3 oktober 2018

Page 34: Scholia: A Wikidata-based site for analytics and visualization ...Scholia Wikidata \Wikidata: Veri able, Linked Open Knowledge That Anyone Can edit" (Dario Taraborelli) CC0-licensed

References

References

Ioannidis, J. P. A., Klavans, R., and Boyack, K. W. (2018). Thousands of scientists publish a paper everyfive days. Nature, 561:167–169. DOI: 10.1038/D41586-018-06185-8.

Nielsen, F. A. (2017). Wembedder: Wikidata entity embedding web service. DOI: 10.5281/ZEN-ODO.1009127.

Nielsen, F. A., Mietchen, D., and Willighagen, E. (2018). Geospatial data and Scholia. Proceedings ofthe 3rd International Workshop on Geospatial Linked Data and the 2nd Workshop on Querying the Webof Data. DOI: 10.5281/ZENODO.1202256.

Willighagen, E., Jahn, N., and Nielsen, F. A. (2018a). The EU NanoSafety Cluster as Linked Datavisualized with Scholia. DOI: 10.6084/M9.FIGSHARE.6727931.

Willighagen, E., Slenter, D., Mietchen, D., Evelo, C. T., and Nielsen, F. A. (2018b). Wikidata and Scholiaas a hub linking chemical knowledge. 11th International Conference on Chemical Structures. Program &Abstracts, page 146. DOI: 10.6084/m9.figshare.6356027.v1.

Finn Arup Nielsen 33 3 oktober 2018

Page 35: Scholia: A Wikidata-based site for analytics and visualization ...Scholia Wikidata \Wikidata: Veri able, Linked Open Knowledge That Anyone Can edit" (Dario Taraborelli) CC0-licensed

References

Copyright and license

Wikidata logo by Arun Ganesh (Planemad). It is a trademark of the

Wikimedia Foundation.

Wikidata UI mockup by Denny Vrandecic, CC0.

Jakob Voß’ statistics plot is by himself with an unknown license.

Screenshot from Magnus Manske webservice.

Map is CC BY-SA by OpenStreetMap contributors.

WikiCite logo by Dario Taraborelli, CC0.

Photo of Dario Taraborelli by Pax Ahimsa Gethen, CC BY-SA 4.0.

Finn Arup Nielsen 34 3 oktober 2018