[Figure: Usage of http://sparql.uniprot.org]
See also: www.isb-sib.org
See also: www.sib.swiss
Contact [email protected] www.uniprot.org
The UniProt SPARQL Endpoint: 34 Billion Triples in Production
Jerven Bolleman (1), Sebastien Gehant (1), Thierry Lombardot (1), Alan Bridge (1), Ioannis Xenarios (1,2,3), Nicole Redaschi (1), and the UniProt Consortium (1,4,5)
(1) Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, rue Michel-Servet 1, 1211 Geneva 4, Switzerland
(2) Vital-IT Group, SIB Swiss Institute of Bioinformatics, Quartier Sorge, Bâtiment Génopode, 1015 Lausanne, Switzerland
(3) University of Lausanne, 1015 Lausanne, Switzerland
(4) European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Hinxton Cambridge CB10 1SD, UK
(5) Protein Information Resource (PIR), Georgetown University Medical Center, 3300 Whitehaven Street, NW, Suite 1200, Washington, DC 20007, USA
UniProt is mainly supported by the National Institutes of Health (NIH), National Human Genome Research Institute (NHGRI) and National Institute of General Medical Sciences (NIGMS) grant U41HG007822. Additional support for the EBI's involvement in UniProt comes from the NIH grant 2P41 HG02273. Swiss-Prot activities at the SIB are supported by the Swiss Federal Government through the State Secretariat for Education, Research and Innovation SERI. PIR's UniProt activities are also supported by the NIH grants 5R01GM080646-07, 3R01GM080646-07S1, 5G08LM010720-03, and 8P20GM103446-12, and the National Science Foundation (NSF) grant DBI-1062520.
UniProt on the web
UniProt is a comprehensive resource for protein sequence and annotation data. It has been available on the web since its creation in 2002 (and through its predecessors Swiss-Prot and TrEMBL for much longer).
UniProt on the semantic web
All UniProt data has been available in RDF since 2007 and can be downloaded in this format from the UniProt FTP site and the www.uniprot.org REST interface. Since 2014 you can also query the data directly on our public SPARQL endpoint at sparql.uniprot.org.
The UniProt data has grown eightfold over the last five years. UniProt release 2017_11 consists of 34 billion triples and requires just over 1.6 TB of disk space when loaded in Virtuoso 7.2, a columnar relational database that supports SPARQL.
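To put these figures in perspective, the short Python sketch below derives two rough numbers from them: the average yearly growth factor implied by eightfold growth over five years, and the average disk footprint per triple. It assumes steady compound growth and decimal terabytes (1 TB = 10^12 bytes); neither assumption is stated in the text, so treat the results as order-of-magnitude estimates only.

```python
# Figures quoted in the text; everything derived from them is an estimate.
TRIPLES = 34e9        # triples in UniProt release 2017_11
DISK_BYTES = 1.6e12   # "just over 1.6 TB", ASSUMING decimal terabytes
GROWTH_FACTOR = 8.0   # eightfold growth ...
YEARS = 5             # ... over five years


def annual_growth(factor: float, years: int) -> float:
    """Yearly growth factor, ASSUMING steady compound growth."""
    return factor ** (1.0 / years)


def bytes_per_triple(disk_bytes: float, triples: float) -> float:
    """Average on-disk footprint of one triple, indexes included."""
    return disk_bytes / triples


if __name__ == "__main__":
    yearly = annual_growth(GROWTH_FACTOR, YEARS)
    print(f"about {yearly:.2f}x per year")  # about 1.52x per year
    print(f"about {bytes_per_triple(DISK_BYTES, TRIPLES):.0f} bytes per triple")  # about 47
```

Under these assumptions the data grows by roughly half each year, and each triple costs on the order of 47 bytes on disk.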
at your SERVICE
The SERVICE keyword allows you to run part of your query on another SPARQL endpoint. For example, you can combine the UniProt and Ensembl endpoints to get the coding exons for a protein.
SELECT ?protein ?transcript ?exon ?order
WHERE {
  ?protein rdfs:seeAlso ?transcript .
  ?transcript up:database database:Ensembl .
  SERVICE <http://www.ebi.ac.uk/rdf/services/ensembl/sparql/> {
    ?transcript obo:SO_translates_to ?peptide .
    ?peptide a ensemblterms:protein .
    ?transcript obo:SO_has_part ?exon ;
                sio:SIO_000974 ?orderedPart .
    ?orderedPart sio:SIO_000628 ?exon ;
                 sio:SIO_000300 ?order .
  }
}
Using http://sparql.uniprot.org
This website contains example queries with brief English explanations. You can download query results in a number of formats, including tab- or comma-separated for use in Excel, R and other tools.
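The same result formats can be requested programmatically. The Python sketch below builds a SPARQL 1.1 Protocol GET request and selects the output format through the HTTP Accept header. The endpoint path https://sparql.uniprot.org/sparql, the helper names build_request and run_query, and the example query are assumptions made for illustration and are not taken from the poster.

```python
import urllib.parse
import urllib.request

# ASSUMPTION: the query service is exposed at this path on sparql.uniprot.org.
ENDPOINT = "https://sparql.uniprot.org/sparql"

# Standard media types for SPARQL SELECT results.
FORMATS = {
    "csv": "text/csv",
    "tsv": "text/tab-separated-values",
    "json": "application/sparql-results+json",
}

# Illustrative query only; it lists five protein IRIs.
EXAMPLE_QUERY = """
PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?protein WHERE { ?protein a up:Protein } LIMIT 5
"""


def build_request(query: str, fmt: str = "csv",
                  endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build (but do not send) a GET request carrying the query."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": FORMATS[fmt]})


def run_query(query: str, fmt: str = "csv", timeout: float = 60.0) -> str:
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(build_request(query, fmt), timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    print(run_query(EXAMPLE_QUERY, "tsv"))
```

Keeping request construction separate from sending makes it easy to inspect the URL and headers before contacting the endpoint.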
SPARQL: A graph query language
SPARQL is a standard for querying a graph database and looks a little bit like SQL. It is optimised for pattern matching and for queries that span multiple data sources. There are more than 40 compliant implementations of the latest recommendation, version 1.1.
Hardware
Node 1: 64 CPU cores, 256 GB RAM, 8 TB consumer SSD
Node 2: 64 CPU cores, 256 GB RAM, 8 TB consumer SSD
Load balancer: Apache mod_balancer
Many more endpoints on the web
Triples: Simple sentences for complicated data
RDF uses (many) simple ‘sentences’ to describe information. Each one consists of subject-predicate-object, making it a triple. Example:
<http://sparql.uniprot.org/> rdfs:comment "a free API for you" .
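As a rough illustration of the triple model, the Python sketch below stores a few triples as plain tuples and matches them against a pattern in which None acts as a wildcard, which is the basic idea behind a SPARQL triple pattern. Only the first triple comes from the example above; the other two, the example.org IRIs, and the match helper are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Full IRI behind the rdfs:comment prefixed name.
RDFS_COMMENT = "http://www.w3.org/2000/01/rdf-schema#comment"

TRIPLES = [
    # The triple from the example above.
    Triple("http://sparql.uniprot.org/", RDFS_COMMENT, "a free API for you"),
    # Hypothetical triples, added only to give the matcher something to filter.
    Triple("http://example.org/other", RDFS_COMMENT, "a hypothetical resource"),
    Triple("http://sparql.uniprot.org/", "http://example.org/hosts", "UniProt"),
]


def match(triples: List[Triple],
          subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return the triples that agree with every position that is not None."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


if __name__ == "__main__":
    # Comparable to the pattern: <http://sparql.uniprot.org/> rdfs:comment ?o
    for t in match(TRIPLES, subject="http://sparql.uniprot.org/", predicate=RDFS_COMMENT):
        print(t.obj)  # a free API for you
```

A real triple store answers such patterns from indexes rather than by scanning a list, but the matching semantics are the same.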
SPARQL endpoints: Communicating over HTTP
Your everyday tools: Accessing endpoints over HTTP
SPARQL API
Powered by Vital-IT
ChEMBL & more