Federated Ontology Based Query System
Integrated Ontology for Sports (Domains: Cricket, Football, and Tennis)
Database Interoperability Project
Abhishek Agrawal, George Sam, Hari Haran Venugopal, Noopur Joshi

Outline:
• Problem Statement and Motivation
• Scope of the Project
• Our Approach
• Data Sources: Scraper
• Data Cleaning: Google Refine, Karma
• Ontology Creation: using existing ontologies to create a federated ontology
• Data Modeling: Karma tool
• Data Publishing: RDF and triple store creation
• Data Extraction: using OpenRDF for SPARQL queries
• Future Work and Challenges
• Conclusion

Problem Statement and Motivation
Why do we need ontologies?
- There is a need for constant, intelligent access to up-to-date, integrated, and detailed information from the Web.
- Ontologies help to aggregate data from various sources.
Why a federated sports ontology?
- It represents different sports and presents a common view of them.
- It is easily extensible.
- Intelligent information gathering
  - Scores: Who's winning, and how did the score change?
  - Schedules: Who's playing whom, when, and where?
  - Standings: Who's in first place? Who's closest to qualifying?
- Data analysis
  - Statistics: How do the players and/or teams measure up against one another in various categories?
  - News: How do we combine editorial coverage of sports with all data feeds?

Scope of the Project
Tennis
- Players
- Tournaments
Cricket
- Players
- Matches
- Rankings
Football
- Players
- Leagues

Our Approach
1. Data Extraction
2. Data Cleaning
3. Ontology Creation
4. Data Modeling
5. Querying using SPARQL

Data Source: Scraper
Web scraping (also called web harvesting or web data extraction) is a software technique for extracting information from websites.
Scraping tools:
• Beautiful Soup: simple methods and Unicode support; works with parsers such as lxml and html5lib.
• Jsoup: a Java HTML parser that implements the WHATWG HTML5 specification and parses HTML to the same DOM as modern browsers do.
• Web Scraper (Chrome extension): with this extension you create a plan (sitemap) that describes how a website should be traversed and what should be extracted. Using these sitemaps, Web Scraper navigates the site accordingly and extracts the data.
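
The scraping step can be sketched in a few lines. The example below is only an illustration: it uses Python's standard-library `html.parser` as a dependency-free stand-in for the tools listed on this slide, and the HTML fragment, the `TableScraper` class, and the `scrape_table` helper are all hypothetical names made up for the sketch.

```python
from html.parser import HTMLParser

# Hypothetical page fragment; a real scraper would download this from a website.
SAMPLE_HTML = """
<table id="players">
  <tr><th>Name</th><th>Sport</th></tr>
  <tr><td>Player A</td><td>Tennis</td></tr>
  <tr><td>Player B</td><td>Cricket</td></tr>
</table>
"""


class TableScraper(HTMLParser):
    """Collects the text of every table row as a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text pieces of the cell being read, or None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Return one dict per data row, keyed by the cells of the header row."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


records = scrape_table(SAMPLE_HTML)
```

Here `records` holds one dictionary per table row, which is a convenient shape to hand to a cleaning step. A library such as Beautiful Soup would replace the hand-written parser class with a few selector calls.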

Data Cleaning
Data cleansing (also called data cleaning or data scrubbing) is the process of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database.
Data cleaning tools:
• Karma: offers a programming-by-example interface that lets users define data transformation scripts, which transform data expressed in multiple formats into a common format.
• Google Refine: a power tool for working with messy data, cleaning it up, and transforming it from one format into another.
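
As a rough illustration of "detecting and correcting (or removing)" records, the sketch below applies three simple rules in plain Python. It is not what Karma or Google Refine do internally; the field names (`name`, `sport`), the sample rows, and the `clean_records` helper are assumptions chosen for the example.

```python
def clean_records(records, required=("name", "sport")):
    """Trim whitespace, normalise the sport's case, and drop incomplete or duplicate rows."""
    seen = set()
    cleaned = []
    for record in records:
        # Correct: collapse runs of whitespace and strip both ends of every value.
        row = {key: " ".join(str(value).split()) for key, value in record.items()}
        # Correct: use one spelling convention for the sport ("tennis" -> "Tennis").
        row["sport"] = row.get("sport", "").title()
        # Remove: skip records that are missing a required field.
        if any(not row.get(field) for field in required):
            continue
        # Remove: skip records already seen (case-insensitive comparison).
        key = tuple(row[field].lower() for field in required)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned


# Hypothetical scraped rows with typical defects.
raw = [
    {"name": "  Player  A ", "sport": "tennis"},
    {"name": "Player A", "sport": "Tennis "},  # duplicate of the first row
    {"name": "", "sport": "Cricket"},          # missing name
    {"name": "Player B", "sport": "CRICKET"},
]
cleaned = clean_records(raw)
```

After the call, `cleaned` contains two rows: one for Player A (Tennis) and one for Player B (Cricket).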
![Page 8: Federated Ontology Based Query System](https://reader031.fdocuments.net/reader031/viewer/2022020208/55cbd22ebb61eb56678b4640/html5/thumbnails/8.jpg)
Ontology: Class Hierarchy
![Page 9: Federated Ontology Based Query System](https://reader031.fdocuments.net/reader031/viewer/2022020208/55cbd22ebb61eb56678b4640/html5/thumbnails/9.jpg)
Federated Ontology

Data Modeling
Tool used: Karma (USC ISI)
• Browser-based data integration and data modeling tool
• Advantage: data integration and publishing are easy
• Steps:
  1. Load ontologies and data sets
  2. Primitive data filtering
  3. Setting semantic types for attributes
  4. Building semantics for each sport individually
• Karma intelligently creates semantic mappings for higher concepts.
• Create URLs for entities.
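
To make "setting semantic types" and "creating URLs for entities" more concrete, here is a minimal sketch of the idea in plain Python. It does not use Karma or reproduce its model format; the namespaces, the property names, the `TennisPlayer` class, and the helper functions are all hypothetical.

```python
from urllib.parse import quote

BASE = "http://example.org/sports/"          # hypothetical entity namespace
ONT = "http://example.org/sports/ontology#"  # hypothetical ontology namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Semantic types: the ontology property each source column maps to (hypothetical).
SEMANTIC_TYPES = {
    "name": ONT + "name",
    "country": ONT + "country",
}


def mint_uri(kind, label):
    """Build a stable URL for an entity from its kind and a human-readable label."""
    slug = quote(label.strip().lower().replace(" ", "_"), safe="")
    return f"{BASE}{kind}/{slug}"


def row_to_triples(row, sport_class):
    """Translate one cleaned row into (subject, predicate, object) triples."""
    subject = mint_uri("player", row["name"])
    # The first triple links the entity to its class in the ontology.
    triples = [(subject, RDF_TYPE, ONT + sport_class)]
    # Every mapped column that has a value becomes one property triple.
    for column, prop in SEMANTIC_TYPES.items():
        if row.get(column):
            triples.append((subject, prop, row[column]))
    return triples


triples = row_to_triples({"name": "Player A", "country": "Country X"}, "TennisPlayer")
```

Minting the URL from a normalised label means the same player scraped from two sources maps to the same subject, which is what lets data from different sports and sites be merged under one ontology.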
![Page 11: Federated Ontology Based Query System](https://reader031.fdocuments.net/reader031/viewer/2022020208/55cbd22ebb61eb56678b4640/html5/thumbnails/11.jpg)
Screenshot

Data Publishing
• Available frameworks: OpenRDF, Protégé, Apache Jena.
• OpenRDF:
  - Browser-based framework
  - Integrated with Karma
  - Publish each data set as:
    1. JSON
    2. R2RML model
    3. RDF
  - Create a triple store for the RDF
  - Load the RDF into the OpenRDF triple store
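
The RDF that gets loaded into a triple store is, at its simplest, a list of subject/predicate/object statements. The sketch below writes such statements in the N-Triples format using plain Python. It is an illustration only and does not reflect how Karma or OpenRDF generate RDF; the example URLs and the `to_ntriples` helper are hypothetical, and language tags and datatypes are left out.

```python
def escape_literal(text):
    """Escape the characters that must be escaped inside an N-Triples literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_ntriples(triples):
    """Serialise (subject, predicate, object, object_is_uri) tuples as N-Triples."""
    lines = []
    for subject, predicate, obj, obj_is_uri in triples:
        # URLs are written in angle brackets, plain values as quoted literals.
        obj_term = f"<{obj}>" if obj_is_uri else f'"{escape_literal(obj)}"'
        lines.append(f"<{subject}> <{predicate}> {obj_term} .\n")
    return "".join(lines)


# Hypothetical statements about one player.
PLAYER = "http://example.org/sports/player/player_a"
statements = [
    (PLAYER, "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
     "http://example.org/sports/ontology#TennisPlayer", True),
    (PLAYER, "http://example.org/sports/ontology#name", "Player A", False),
]
document = to_ntriples(statements)
```

Each line of `document` is one statement ending in ` .`, and a file in this format can typically be imported by an RDF triple store.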
![Page 13: Federated Ontology Based Query System](https://reader031.fdocuments.net/reader031/viewer/2022020208/55cbd22ebb61eb56678b4640/html5/thumbnails/13.jpg)

Data Extraction
SPARQL
• Language used to extract information from RDF
• Query based
SELECT *
WHERE { ?Subject ?Predicate ?Object }
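
The query on this slide matches every triple in the store, because all three positions are variables. The toy matcher below shows that idea in plain Python: a position left as `None` behaves like a variable, and a fixed value restricts the results. This is only a sketch of triple-pattern matching, not how OpenRDF evaluates SPARQL, and the `ex:` names and the `match_pattern` function are hypothetical.

```python
def match_pattern(triples, subject=None, predicate=None, obj=None):
    """Return variable bindings for one triple pattern.

    A value of None plays the role of a SPARQL variable (?Subject, ?Predicate,
    ?Object); any other value must equal the triple's term exactly.
    """
    results = []
    for s, p, o in triples:
        if subject is not None and s != subject:
            continue
        if predicate is not None and p != predicate:
            continue
        if obj is not None and o != obj:
            continue
        results.append({"Subject": s, "Predicate": p, "Object": o})
    return results


# Hypothetical in-memory store using shortened names.
store = [
    ("ex:player_a", "ex:playsSport", "ex:Tennis"),
    ("ex:player_b", "ex:playsSport", "ex:Cricket"),
    ("ex:player_a", "ex:name", "Player A"),
]

everything = match_pattern(store)  # like SELECT * WHERE { ?Subject ?Predicate ?Object }
tennis = match_pattern(store, predicate="ex:playsSport", obj="ex:Tennis")
```

Fixing the predicate and object, as in the second call, corresponds roughly to a query such as `SELECT * WHERE { ?Subject ex:playsSport ex:Tennis }`, which is the kind of narrowing needed to ask who plays a given sport.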

Future Work
1. Inclusion of other sports
2. Creating a web or mobile interface to query the data
3. Creating an application for university-level players and teams
4. Providing more specific information, such as:
  • Details about a particular team from 1990 to 2014
  • Images of the players and teams
  • Details of all the matches played between two players or teams

References
• http://www.isi.edu/integration/karma/
• http://phd.jabenitez.com/wp-content/uploads/2014/03/A-Practical-Guide-To-Building-OWL-Ontologies-Using-Protege-4.pdf
• http://ict.siit.tu.ac.th/~sun/SW/Protege%20Tutorial.pdf
• http://www.crummy.com/software/BeautifulSoup/
• https://chrome.google.com/webstore/detail/web-scraper/jnhgnonknehpejjnehehllkliplmbmhn?hl=en
• https://code.google.com/p/google-refine/
• http://www.datacleansing.net.au/Data_Cleansing_Services