Arcas: Using Python to access open research literature · The illustrated guide to a Ph.D. Matt...
Arcas: Using Python to access open research literature
@NikoletaGlyn
The illustrated guide to a Ph.D.
Matt Might
http://matt.might.net/articles/phd-school-in-pictures/
ARTICLE
JOURNAL REVIEW
PUBLISHED
Sustainable Software
0.5 min + 100 × 1.5 min + 10 × 0.5 min = 155.5 min ⇒ 2 h and 35.5 min
API
QUERY
http://export.arxiv.org/api/query?search_query=ti:Sustainable%20Software
15 min + 1 min + 50 min = 66 min ⇒ 1 h and 6 min
QUERY
http://export.arxiv.org/api/query?search_query=ti:Sustainable%20Software
http://api.plos.org/search?q=title:Sustainable%20Software&rows=100
http://www.nature.com/opensearch/request?queryType=cql&query=dc.title%20adj%20SustainableSoftware&maximumRecords=100
...
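The query URLs above ask the same question in different syntaxes. As a rough illustration, the sketch below builds the arXiv and PLOS URLs from a single title using only the standard library. The function names `arxiv_url` and `plos_url` are hypothetical helpers for this example, not part of Arcas or of either API.

```python
from urllib.parse import quote


def arxiv_url(title):
    """Build an arXiv title query in the form shown on the slide."""
    # quote() percent-encodes the space in the title as %20.
    return "http://export.arxiv.org/api/query?search_query=ti:" + quote(title)


def plos_url(title, rows=100):
    """Build a PLOS title query in the form shown on the slide."""
    return "http://api.plos.org/search?q=title:{}&rows={}".format(
        quote(title), rows)


if __name__ == "__main__":
    for build in (arxiv_url, plos_url):
        print(build("Sustainable Software"))
```

Even for two providers, the field name (`ti:` versus `title:`) and the paging parameter differ, so every additional API means another small builder like these.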
[Diagram: six APIs (API 1 to API 6), each with its own query and its own XML response]
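Each API answers with XML that has to be parsed separately. As one example, the arXiv API returns an Atom feed, and the sketch below pulls titles and authors out of such a feed with the standard library. The `SAMPLE` string is a hand-written stand-in built from the article shown later in this talk, not a real API response, and `parse_entries` is a hypothetical helper rather than Arcas code.

```python
import xml.etree.ElementTree as ET

# Atom elements live in this XML namespace.
ATOM = "{http://www.w3.org/2005/Atom}"

# Hand-written stand-in for an API response (assumption, heavily trimmed).
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>VisIt: Experiences with Sustainable Software</title>
    <author><name>Sean Ahern</name></author>
    <author><name>Eric Brugger</name></author>
  </entry>
</feed>"""


def parse_entries(xml_text):
    """Return one dict per <entry> with its title and author names."""
    root = ET.fromstring(xml_text)
    records = []
    for entry in root.findall(ATOM + "entry"):
        records.append({
            "title": entry.findtext(ATOM + "title"),
            "author": [a.findtext(ATOM + "name")
                       for a in entry.findall(ATOM + "author")],
        })
    return records


if __name__ == "__main__":
    print(parse_entries(SAMPLE))
```

A different provider would use different element names and nesting, so this parser would have to be rewritten for each one.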
[Diagram: ARCAS shown with the same six APIs (API 1 to API 6), each with its own query and XML response]
$ pip install arcas
>>> import arcas
>>> api = arcas.Arxiv()
>>> parameters = api.parameters_fix(
...     title='sustainable software', records=1, start=1)
>>> url = api.create_url_search(parameters)
>>> request = api.make_request(url)
>>> root = api.get_root(request)
>>> raw_article = api.parse(root)
>>> article = api.to_dataframe(raw_article[0])
>>> api.export(article, "result.json")
{"key":{"0":"Ahern2013"},
"unique_key":{"0":"698d27415f69258ef122f46b184a77e0"},
"title":{"0":"VisIt: Experiences with Sustainable Software"},
"author":{"0":"Sean Ahern","1":"Eric Brugger"},
"abstract":{"0":" The success of the VisIt visualization..."},
"date":{"0":2013},
"journal":{"0":"arXiv"},
"provenance":{"0":"arXiv"}}
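The exported JSON is organised by column: each field maps a row index to a value. A minimal sketch of reading such a record back with the standard library follows. The string is copied from the slide (abstract left truncated as shown), and `summarize` is a hypothetical helper, not part of Arcas.

```python
import json

# Record as shown on the slide; the abstract is truncated there too.
RESULT = """{"key":{"0":"Ahern2013"},
"unique_key":{"0":"698d27415f69258ef122f46b184a77e0"},
"title":{"0":"VisIt: Experiences with Sustainable Software"},
"author":{"0":"Sean Ahern","1":"Eric Brugger"},
"abstract":{"0":" The success of the VisIt visualization..."},
"date":{"0":2013},
"journal":{"0":"arXiv"},
"provenance":{"0":"arXiv"}}"""


def summarize(text):
    """Collapse the column -> row-index -> value layout into one dict."""
    columns = json.loads(text)
    # Row indices are strings ("0", "1", ...); sort them numerically.
    author_rows = sorted(columns["author"], key=int)
    return {
        "key": columns["key"]["0"],
        "title": columns["title"]["0"],
        "authors": [columns["author"][row] for row in author_rows],
        "year": columns["date"]["0"],
        "provenance": columns["provenance"]["0"],
    }


if __name__ == "__main__":
    print(summarize(RESULT))
```

To read an exported file instead of the inline string, `summarize(open("result.json").read())` would work the same way, assuming the file has this layout.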
>>> for p in [arcas.Arxiv, arcas.Nature, arcas.Ieee, arcas.Plos]:
...     api = p()
...     parameters = api.parameters_fix(
...         title='sustainable software', records=1, start=1)
...     url = api.create_url_search(parameters)
...     request = api.make_request(url)
...     root = api.get_root(request)
...     raw_article = api.parse(root)
...     try:
...         for art in raw_article:
...             article = api.to_dataframe(art)
...             api.export(article, "result_from_{}.json".format(
...                 api.__class__.__name__))
...     except TypeError:
...         pass
15 min + 5 min = 20 min
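The three time estimates in this talk can be checked with a few lines of Python. The figures are the ones on the slides. The labels `by_hand`, `one_api_at_a_time` and `with_arcas` are an interpretation of where each estimate appears in the talk, not wording from the slides.

```python
def to_hours_minutes(total_minutes):
    """Split a duration in minutes into whole hours and leftover minutes."""
    hours, minutes = divmod(total_minutes, 60)
    return int(hours), minutes


# Figures from the slides; the labels are assumptions (see above).
estimates = {
    "by_hand": 0.5 + 100 * 1.5 + 10 * 0.5,
    "one_api_at_a_time": 15 + 1 + 50,
    "with_arcas": 15 + 5,
}

if __name__ == "__main__":
    for label, minutes in estimates.items():
        hours, rest = to_hours_minutes(minutes)
        print("{}: {} min = {} h {} min".format(label, minutes, hours, rest))
```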
[Figure: Articles per Year (N = 87). x-axis: year (2000 to 2018); y-axis: number of records (2 to 16)]
[Figure: number of records per year (2000 to 2016), grouped by provenance: IEEE, arXiv, PLOS]
Birgit Penzenstadler
[Diagram: layout of the Arcas project: tools.py; doc/ (arcas.readthedocs.io/); ieee, nature, arxiv, . . .; test ieee, test nature, test arxiv, . . .]
$ arcas_scrape --version
Arcas 0.0.3
$ arcas_scrape -p arxiv -t "Sustainable Software" -r 1
http://export.arxiv.org/api/query?search_query=ti:Sustainable Software&max_results=1&start=1
@NikoletaGlyn
https://github.com/ArcasProject/Arcas
https://nikoleta-v3.github.io