In Search of a Semantic Book Search Engine: Are We There Yet?
Computer Science Online Conference 2016
In Search of a Semantic Book Search Engine on the Web: Are We There Yet?
By Irfan Ullah and Shah Khusro
University of Peshawar, Pakistan
5th Computer Science On-line Conference 2016
In this Presentation
• Abstract
• Introduction
• Survey of the Literature
  • Extracting Structure & Indexing Books
  • Searching and Ranking Books
  • Book Recommendations
  • Fine-grained Access to Information in Books
• Discussion and Analysis
• Conclusions
• References
Abstract
• Books: a valuable source of knowledge and learning
• Position
  • Web Information Retrieval (IR) techniques are used for book retrieval
  • Existing search solutions treat books as plain-text collections
  • Result: inaccurate and imprecise book search results
• Solution
  • Books are different from web pages
  • Use the structural semantics and logical connections in their content for searching, ranking, and recommendations
  • Provide fine-grained access to information in books, e.g., tables and figures
Introduction
• Web Information Retrieval
  • Rich text collections with explicit hypertextual structure
  • This structure is used in searching and ranking web pages
  • Problem: books lack this graph-like structure
• Books are well organized and logically connected
  • They present a graph-like structure that can be used in searching, ranking, and recommending books
  • But this structure is visible to human readers only
  • Problem: it needs to be machine-understandable and machine-processable
Introduction
• Solution: Semantic Book Search Engine
• What is required?
  • A more in-depth and comprehensive book structure ontology
  • Domain-level ontologies to understand book contents in different domains
  • Connecting books in a graph-like manner
• Why?
  • Better searching, ranking, and recommendations
  • Increased user satisfaction
  • Promoting the objectives of other stakeholders
Survey of the Literature
• Extracting Structure & Indexing Books
• Many research initiatives and conferences
  • INEX, ICDAR, and BooksOnline
  • Indexing books’ valuable parts [2]
  • Book layout analysis for extracting the TOC [3] and other parts [8]
  • Resurgence software for detecting different parts [4-6]
  • Rule-based and SVM-based methods for extracting the TOC [7]
  • Detecting and parsing TOC pages [9] and index pages [9] through classical methods [10, 11] and trailing page whitespace methods [9]
• Required
  • Connecting the book title with other parts
  • Better book indexing, ranking, and recommendations
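To make the idea of rule-based TOC detection concrete, here is a minimal sketch in Python. It is not the method of [7] or [9]; the regular expression, the function names `parse_toc_line` and `looks_like_toc_page`, and the 0.5 threshold are all assumptions chosen for illustration. It only recognizes entries of the form "title, dot leaders, page number".

```python
import re

# Assumed pattern: an entry title, at least two dot leaders
# (optionally separated by spaces), then a page number.
TOC_LINE = re.compile(r"^\s*(?P<title>.+?)\s*(?:\.\s*){2,}\s*(?P<page>\d+)\s*$")


def parse_toc_line(line):
    """Return (title, page) if the line looks like a TOC entry, else None."""
    match = TOC_LINE.match(line)
    if match is None:
        return None
    return match.group("title"), int(match.group("page"))


def looks_like_toc_page(lines, min_ratio=0.5):
    """Heuristic: treat a page as a TOC page if enough non-empty lines parse as entries."""
    non_empty = [ln for ln in lines if ln.strip()]
    if not non_empty:
        return False
    hits = sum(1 for ln in non_empty if parse_toc_line(ln) is not None)
    return hits / len(non_empty) >= min_ratio
```

A real extractor would also need to handle multi-line entries, roman page numerals, and layouts without dot leaders, which is where the layout-analysis and SVM-based approaches cited above come in.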
Survey of the Literature
• Searching and Ranking Books
• Ranking authors by expert finding in order to rank books [12]
  • “Authors capture an important aspect of relevance” [12]
  • Read books written by popular experts in the field
• No bag-of-words models
  • Ranking by what is actually inside books [13]
  • Thesauri, reference works, and ontologies
  • Helping readers gain useful insights into the text and decide on the relevance of the book
Survey of the Literature
• Searching and Ranking Books
• Digitized books
  • Ranking by combining and comparing scores for book headings, TOC, and book titles [2]
• Digitization projects: limited or no ranking
  • Project Gutenberg: sorting of results
  • Google Books: 100 (unknown) ranking signals [1]
  • Google patents [15, 16]: not implemented yet
  • Books could be connected through references [14]: limited
• Need
  • Using the Semantic Web and ontologies
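The idea of combining scores for headings, TOC, and titles can be sketched as a weighted sum of per-field scores. This is an illustrative assumption, not the scoring used in [2]: the term-overlap scoring function, the field names, and the weights in `FIELD_WEIGHTS` are hypothetical.

```python
# Hypothetical field weights; a real system would tune or learn these.
FIELD_WEIGHTS = {"title": 0.5, "toc": 0.3, "headings": 0.2}


def field_score(query_terms, field_text):
    """Toy score: fraction of query terms that occur in the field text."""
    if not query_terms:
        return 0.0
    tokens = set(field_text.lower().split())
    return sum(1 for term in query_terms if term in tokens) / len(query_terms)


def book_score(query, book):
    """Weighted sum of the per-field scores for one book (a dict of field -> text)."""
    terms = query.lower().split()
    return sum(
        weight * field_score(terms, book.get(field, ""))
        for field, weight in FIELD_WEIGHTS.items()
    )


def rank_books(query, books):
    """Return the books sorted by descending combined score."""
    return sorted(books, key=lambda book: book_score(query, book), reverse=True)
```

Keeping the fields separate until the final sum is what allows a match in the title to count for more than a match in a section heading.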
Survey of the Literature
• Book Recommendations
• Available recommenders
  • BReK12: readability levels of K-12 readers plus book contents [21]
  • BReT: helps K-12 teachers find relevant books for K-12 students [22]
  • K3Rec: for K-3 readers, their parents, and teachers [23]
  • Using near and partial duplicates, citation analysis, and metadata similarities [24]
  • User modeling with information from the Social Web [17]
  • Book reviews [18, 19]
  • Semantic Web and ontologies [25-27]
  • Limitation: these use only book descriptions, not the actual content
• Required
  • A true content-based semantic book recommender
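As a rough illustration of what "content-based" means here, the sketch below recommends books by comparing term-frequency vectors of their actual text with cosine similarity. It is a plain bag-of-words baseline, not a semantic recommender and not any of the systems cited above; the function names and the toy data layout (a dict from book id to text) are assumptions.

```python
import math
from collections import Counter


def term_vector(text):
    """Term-frequency vector of a text, using whitespace tokenization."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recommend(target_id, contents, k=3):
    """Return up to k ids of books whose content is most similar to the target book."""
    target = term_vector(contents[target_id])
    scored = [
        (cosine(target, term_vector(text)), book_id)
        for book_id, text in contents.items()
        if book_id != target_id
    ]
    scored.sort(reverse=True)
    return [book_id for score, book_id in scored[:k] if score > 0]
```

A semantic recommender of the kind called for above would replace the raw terms with concepts drawn from book structure and domain ontologies.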
Survey of the Literature
• Fine-grained access to information in books
  • Retrieving similar and related tables, figures, images, algorithms, equations, quotations, and passages
  • Augmenting tables with different data sources to restore the lost semantics [28]
  • The same applies to figures and images
  • CiteSeer: document, author, and table search
• Need
  • Exploitation of book structural semantics and logical connections
Discussion & Analysis
• Indexing books
  • A multi-field inverted index should be used [29]
• A book search engine should be able to understand
  • The nature of books, their contents, and user intentions
  • E.g., for fiction and novels, readers may be interested in different strata, including the plot, the idea, and the composition of the work [30]
• Required
  • Semantic indexing by exploiting book structural semantics
  • Indexing fiction and novels
  • Indexing books using metadata
    • Book reviews
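A multi-field inverted index keeps postings per field rather than per document as a whole, so a query can be restricted to, say, titles or TOC entries. The following is a minimal in-memory sketch under that assumption; the field names and the functions `build_index` and `search` are hypothetical and do not reproduce the design in [29].

```python
from collections import defaultdict


def build_index(books):
    """Map each (field, term) pair to the set of book ids that contain it.

    `books` is a dict: book id -> dict of field name -> field text.
    """
    index = defaultdict(set)
    for book_id, fields in books.items():
        for field, text in fields.items():
            for term in text.lower().split():
                index[(field, term)].add(book_id)
    return index


def search(index, term, field=None):
    """Look up a term in one field, or in all fields when field is None."""
    term = term.lower()
    if field is not None:
        return set(index.get((field, term), set()))
    hits = set()
    for (_, indexed_term), book_ids in index.items():
        if indexed_term == term:
            hits |= book_ids
    return hits
```

For example, with one book titled "Semantic Web" and another whose TOC mentions "Semantic Versioning", `search(index, "semantic", "title")` returns only the first book, while `search(index, "semantic")` returns both.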
Discussion & Analysis
• Searching books
• Search Engine Results Page (SERP)
  • Too many relevant and irrelevant results: information overload [31]
• Required: user interface
  • Provide more relevant results
  • Robust, unambiguous, understandable, and relevant to the information need
  • Present results in a manner that augments user understanding
Discussion & Analysis
• Ranking and recommending books
  • Using ontologies and the actual book contents
  • Exploiting structural semantics and logical connections in book contents
• Problem
  • Existing ontologies (JeromeDL and DocBook) are limited in fully describing books
• Required
  • A comprehensive book structure ontology and several domain-level ontologies
  • Ontology engineering and ontology learning [32], along with the involvement of domain experts
Discussion & Analysis
• Finding related tables and figures
• Table extraction and searching
  • Summarize, elaborate, and compare tables
  • Interpret tables accurately
  • Structural and semantic characteristics of book tables in all possible layout variations
  • Using online knowledge sources in annotating tables [28]
  • Using ontologies in indexing, searching, and ranking tables
• Figure extraction and searching
  • Relating figures using visual similarities and contextual clues
  • To retrieve books that present images and figures on a certain concept or topic
Conclusions
• Book search and retrieval
  • Has been the focus of research initiatives and academic research
  • Several retrieval methods have been proposed
  • Several book ontologies have been developed for indexing, ranking, and recommending books
  • Still, we are miles away from the ideal system
• Need
  • Further research initiatives for discovering book structural semantics and using them in searching, ranking, and recommending books
Conclusions
• Need: a semantic book search engine
  • Treat books differently from other web documents
  • Use their structural semantics and logical connections in searching, ranking, and recommendations
  • Comprehensive book structure ontology
  • Domain-level ontologies
    • To process book contents in different domains
  • To create a graph-like structure of books to be used by PageRank-type algorithms
  • To allow fine-grained access to information in books, such as tables, figures, algorithms, equations, and similar passages
  • To fulfill the information needs of readers and other stakeholders
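To illustrate how a graph-like structure of books could feed a PageRank-type algorithm, here is a minimal power-iteration sketch. The graph representation (a dict from each book id to the book ids it references), the damping factor of 0.85, and the fixed iteration count are assumptions for illustration; the slides do not specify how such a graph would be built or ranked.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Basic PageRank by power iteration.

    `links` maps each book id to the list of book ids it references.
    Books with no outgoing references spread their rank evenly over all books.
    """
    nodes = set(links) | {target for targets in links.values() for target in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: distribute its rank uniformly.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank
```

With `links = {"A": ["C"], "B": ["C"], "C": []}`, book C, which both other books reference, ends up with the highest rank. In a semantic book search engine the edges could come from citations, shared concepts, or other logical connections rather than hyperlinks.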