Data+Need=Hack


  • Data+Need=Hack. Nikos Manolis, AgroKnow, 5th July 2014. 2nd SemaGrow Hackathon (in conjunction with IRSS14)
  • The hack equation
  • The Hackathon Challenges
      • How to help agricultural researchers discover the resources they need?
      • How to support food safety trainers in preparing their training courses with high-quality material?
  • Lots of Open Data
  • The Data: how to access
      • Application programming interface (API): POST, GET, PUT, DELETE
      • Dump files
      • SPARQL endpoints
      • Harvesting from services (OAI-PMH)
      • HTML / data scraping
      • Crawling
      • A combination of the above
  • Green Learning Network (GLN)
  • Green Learning Network (GLN): two main parts
      • Metadata acquisition and preparation: transformation, correction, identification, filtering, post-processing, broken-link checking
      • Maintenance of aggregated metadata: up-to-date metadata records, broken-link checking
  • Need for data aggregation and harmonization
  • GLN Search API (agINFRA powered)
      • REST-based queries over harmonized information (the result of metadata processing)
      • Supported internal data model: akif, describing educational resources for agriculture
      • Example: http://domain/search-api/v1/akif/?q=*
  • ABN Search API (agINFRA powered)
      • Agriculture Bibliographic Network (ABN)
      • REST-based queries over aggregated metadata
      • Supported internal data model: agrif, describing bibliographic resources for food & agriculture (mainly from FAO's data)
      • Example: http://domain/search-api/v1/agrif/?q=*
  • Search options
      • Simple search: http://domain/search-api/v1/akif/?q=tomato
      • Searching within specific fields: http://BASE_URL/search-api/v1/akif/?languageBlocks.en.description=tomato
      • Temporal: http://BASE_URL/search-api/v1/akif/?creationDate=2013-04-16
      • Fetching specific items: http://BASE_URL/search-api/v1/akif/COLLECTION/20296
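The search options above all follow the same URL pattern, so they can be generated rather than typed by hand. The following is a minimal sketch, not part of the original slides: `search_url` and `item_url` are hypothetical helper names, and `BASE_URL` is left as the same placeholder host used on the slide.

```python
from urllib.parse import urlencode

# Placeholder host, exactly as written on the slide; replace with a real one.
BASE_URL = "http://BASE_URL/search-api/v1"


def search_url(model, **params):
    """Build a search URL for a data model such as "akif" or "agrif"."""
    # safe="*" keeps the match-all wildcard readable instead of %2A.
    return f"{BASE_URL}/{model}/?{urlencode(params, safe='*')}"


def item_url(model, collection, item_id):
    """Build the URL that fetches one specific item from a collection."""
    return f"{BASE_URL}/{model}/{collection}/{item_id}"


simple = search_url("akif", q="tomato")
# Field names contain dots, so they are passed as a dict rather than keywords.
by_field = search_url("akif", **{"languageBlocks.en.description": "tomato"})
temporal = search_url("akif", creationDate="2013-04-16")
one_item = item_url("akif", "COLLECTION", 20296)
```

Only the URLs are built here; fetching them and decoding the response is left out because the slides do not show the response format.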
  • Managing results
      • Sorting results, e.g. ?q=*&sort_by=creationDate&sort_order=desc
      • Facets, e.g. ?facets=set&facet_size=3
      • Pagination, e.g. ?q=sea&page_size=25&page=3
      • Resources related to food safety risk analysis: http://api.greenlearningnetwork.com/search-api/v1/akif/?q=risk?analysis&set=aglrfaocdx,optunesco,faocapacityportal,oeorganiceprints,oeintute
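Sorting, facets and pagination are all plain query parameters, so one way to manage results is to add them to an existing query URL. This is a sketch under assumptions: `with_params` and `page_urls` are hypothetical helpers, and page numbering is assumed to start at 1, which the slides do not state.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_params(url, **extra):
    """Return url with extra query parameters added or overridden."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query.update({key: str(value) for key, value in extra.items()})
    # Keep "*" and "," literal so wildcards and set lists stay readable.
    return urlunsplit(parts._replace(query=urlencode(query, safe="*,")))


def page_urls(url, page_size, pages):
    """Yield the URLs for pages 1..pages of the same query (1-based: assumed)."""
    for page in range(1, pages + 1):
        yield with_params(url, page_size=page_size, page=page)


base = "http://BASE_URL/search-api/v1/akif/?q=sea"
newest_first = with_params(base, sort_by="creationDate", sort_order="desc")
first_pages = list(page_urls(base, page_size=25, pages=2))
```

Because `with_params` overrides existing keys, the same helper can also change the sort order or facet size of a URL that already carries those parameters.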
  • The agDataHarvester service
      • Implements the OAI-PMH protocol to harvest metadata records from open data providers
      • REST-based API
      • Harvested datasets are available through HTTP
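For readers unfamiliar with OAI-PMH, harvesting boils down to repeated `ListRecords` requests, each following the `resumptionToken` returned by the previous page. The sketch below illustrates the generic protocol only; it is not agDataHarvester's code, the function names are hypothetical, and the sample response is made up for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # In OAI-PMH the resumptionToken is an exclusive argument.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_page(xml_text):
    """Return (record identifiers, next resumption token or None)."""
    root = ET.fromstring(xml_text)
    ids = [e.text for e in root.iterfind(".//oai:header/oai:identifier", OAI_NS)]
    token = root.find(".//oai:resumptionToken", OAI_NS)
    next_token = token.text.strip() if token is not None and token.text else None
    return ids, next_token


# Made-up response, trimmed to the elements the parser looks at.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

first_url = list_records_url("http://repository.ias.ac.in/cgi/oai2", "mets")
ids, token = parse_page(SAMPLE)
next_url = list_records_url("http://repository.ias.ac.in/cgi/oai2", resumption_token=token)
```

A harvester would loop until `parse_page` returns no token; the network call itself is omitted to keep the sketch self-contained.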
  • agDataHarvester parameters
        {
          "document_type": "harvesting_target",
          "harvesting_target": {
            "name": "Repository name",
            "description": "Short Repository Description",
            "url": "OAI-PMH target URL",
            "type": "metadata format prefix",
            "frequency": hours
          }
        }
  • param.json
        {
          "document_type": "harvesting_target",
          "harvesting_target": {
            "name": "Indian Academy of Science",
            "description": "Indian Academy of Science",
            "url": "http://repository.ias.ac.in/cgi/oai2",
            "type": "mets",
            "frequency": 24
          }
        }
    Request: curl -X POST -d@param.json http://'demo001':XXX@agro.ipb.ac.rs/agcouchdb
    Response: { "ok": true, "id": "5c56a3fa18fa21d2a85fd63cc9eb78ac", "rev": "1-19ef1210376df8f1695a32b53ecb963a" }
  • http://agro.ipb.ac.rs/agcouchdb/_design/datasets/_list/search/list?dataset.process_parameter_id=5c56a3fa18fa21d2a85fd63cc9eb78ac
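Writing `param.json` by hand is error-prone (a single missing quote makes it invalid), so it can help to generate it. A minimal sketch, assuming the document shape shown on the slides; the `harvesting_target` function and its check that the frequency is a positive whole number of hours are illustrative assumptions, not documented service rules.

```python
import json


def harvesting_target(name, description, url, metadata_prefix, frequency_hours):
    """Build a harvesting-target document in the shape shown on the slides."""
    # Assumed rule: the slides only say "hours", so require a positive integer.
    if not isinstance(frequency_hours, int) or frequency_hours <= 0:
        raise ValueError("frequency must be a positive number of hours")
    return {
        "document_type": "harvesting_target",
        "harvesting_target": {
            "name": name,
            "description": description,
            "url": url,
            "type": metadata_prefix,
            "frequency": frequency_hours,
        },
    }


doc = harvesting_target(
    "Indian Academy of Science",
    "Indian Academy of Science",
    "http://repository.ias.ac.in/cgi/oai2",
    "mets",
    24,
)
# This string is what would be saved as param.json and sent with curl.
param_json = json.dumps(doc, indent=2)
```

Serializing with `json.dumps` guarantees the quoting is valid before the document is posted.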
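The `id` returned when the harvesting target is registered is the value used as `dataset.process_parameter_id` in the lookup URL above. A short sketch of that step, assuming the reply has the `ok`/`id`/`rev` shape shown on the slides; `dataset_url` is a hypothetical helper name.

```python
import json
from urllib.parse import urlencode

LIST_URL = "http://agro.ipb.ac.rs/agcouchdb/_design/datasets/_list/search/list"


def dataset_url(response_text):
    """Turn the JSON reply to the POST into the dataset lookup URL."""
    reply = json.loads(response_text)
    if not reply.get("ok"):
        raise RuntimeError(f"harvesting target was not accepted: {reply}")
    target_id = reply["id"].strip()  # tolerate stray whitespace around the id
    query = urlencode({"dataset.process_parameter_id": target_id})
    return f"{LIST_URL}?{query}"


reply_text = (
    '{"ok": true, "id": "5c56a3fa18fa21d2a85fd63cc9eb78ac", '
    '"rev": "1-19ef1210376df8f1695a32b53ecb963a"}'
)
lookup = dataset_url(reply_text)
```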
  • Using scientific information
  • The AGRIS case
      • A collection of more than 7 million bibliographic references in agriculture
      • AGRIS records come with AGROVOC descriptors
      • An RDF-aware system: the AGRIS database is exposed as RDF
      • AGROVOC is the backbone for interlinking with external sources of information (statistics, distribution maps, country profiles, germplasm data)
  • AgroTagger
      • The purpose of the application is to index Web resources (i.e. URLs) with the AGROVOC thesaurus
      • The application accepts two different inputs:
          • A text file with a list of URLs
          • The output file of an Apache Nutch Web crawler (which contains a list of discovered URLs, but in a specific format)
      • The output is a set of connections between the input URLs and the extracted AGROVOC URIs
      • It can be a simple text file or a set of triples (N-Triples serialization)
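To make the triple output concrete, the sketch below serializes (URL, AGROVOC URI) pairs as N-Triples. It is an illustration only and not the tagger's own code: the slides do not say which predicate is used, so `dcterms:subject` is assumed here, and the concept URI `c_0000` is a made-up placeholder.

```python
# Assumed predicate; the slides do not name the one actually used.
DCTERMS_SUBJECT = "http://purl.org/dc/terms/subject"

# Characters that may not appear unescaped inside an N-Triples IRI.
FORBIDDEN = set('<>" {}|\\^`')


def to_ntriples(links, predicate=DCTERMS_SUBJECT):
    """Serialize (resource URL, AGROVOC URI) pairs as N-Triples lines."""
    lines = []
    for url, concept in links:
        for iri in (url, predicate, concept):
            # Rough validity check; it does not cover control characters.
            if FORBIDDEN.intersection(iri):
                raise ValueError(f"not a valid IRI for N-Triples: {iri!r}")
        lines.append(f"<{url}> <{predicate}> <{concept}> .")
    return "\n".join(lines)


# Hypothetical input: one discovered page tagged with one placeholder concept.
links = [("http://example.org/page", "http://aims.fao.org/aos/agrovoc/c_0000")]
triples = to_ntriples(links)
```

Each output line is one complete triple, which is what makes N-Triples convenient for appending results from a long crawl.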
  • AgroTagger output
  • Crawling the Web
      • Objective: discovering Web resources in agriculture and interlinking them with AGRIS records
      • Final goal: when the system displays an AGRIS record, a list of related Web resources should be available to the user
  • DataSets and APIs: http://wiki.agroknow.gr/agroknow/index.php/SemaGrow_Hackathon#DataSets_and_APIs
  • Thank you! Nikos Manolis, AgroKnow, manolisn@agroknow.gr