Page 1: Entity Search Engine

ENTITY SEARCH ENGINE: A NEW SEARCH TOOL

Speaker: Tanmay Mondal, MSLIS 2013-2015

Indian Statistical Institute, Bangalore, Documentation Research and Training Centre, Seminar (1), 2014

Page 2: Entity Search Engine

Overview

Present Approach

Entity Search

Benefit of Entity Search

Entity & Its Facets

Main Work of ESE

Popular Entity Search

OKKAM: Enabling a Web of Entities

Workflow of Okkam

My Library

References

Page 3: Entity Search Engine

Present Approach

● Information is everywhere, and it is growing exponentially

● A traditional information extraction approach is to scan every document in a collection

● For web search, the document collection is the set of all web pages indexed by a search engine

● Getting pinpointed information this way is time-consuming for users
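The cost of the traditional approach can be made concrete with a minimal Python sketch. The toy collection and the function names `scan_collection` and `build_entity_index` are hypothetical: the first re-reads every document for each query, while the second scans the collection once and then answers entity lookups directly.

```python
from collections import defaultdict

def scan_collection(documents, entity_name):
    """Traditional approach: re-read every document for every query."""
    needle = entity_name.lower()
    return [doc_id for doc_id, text in documents.items() if needle in text.lower()]

def build_entity_index(documents, known_entities):
    """Scan the collection once and map each known entity to the documents that mention it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        lowered = text.lower()
        for entity in known_entities:
            if entity.lower() in lowered:
                index[entity].add(doc_id)
    return index

# Hypothetical toy collection.
documents = {
    "d1": "IBM is an organization.",
    "d2": "S. R. Ranganathan is a person.",
    "d3": "A short note that mentions IBM and Bangalore.",
}

print(scan_collection(documents, "IBM"))           # ['d1', 'd3']
index = build_entity_index(documents, ["IBM", "Bangalore"])
print(sorted(index["Bangalore"]))                  # ['d3']
```

Substring matching is deliberately crude here; a real system would tokenize the text and recognize entity mentions properly.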

Page 4: Entity Search Engine

Entity types for specific information: Person, Location, Organization, Nationality, Religion, Product, Phone Number, Email Address/URL, Distance, Date, Time, Money, Generic Number

The problem of identifying and linking/grouping different manifestations of the same real-world object
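Some of the entity types above (email address/URL, phone number, date, money) have regular surface forms, so even simple patterns can pick them out. The sketch below assumes hypothetical regular expressions that cover only a few formats; they are illustrative, not a robust extractor, and types such as Person or Organization generally need dictionaries or learned models instead.

```python
import re

# Illustrative patterns only (assumptions): each covers a few common formats.
PATTERNS = {
    "Email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "URL": re.compile(r"\bhttps?://\S+"),
    "Phone Number": re.compile(r"\+\d{1,3}(?:[\s-]\d{2,5}){2,4}"),
    "Date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "Money": re.compile(r"(?:Rs\.?|USD|\$)\s?\d+(?:,\d{3})*(?:\.\d+)?"),
}

def extract_typed_entities(text):
    """Return (entity type, matched text) pairs for every pattern that fires."""
    found = []
    for entity_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((entity_type, match.group()))
    return found

text = ("Seminar on 2014-01-15; fee Rs. 500; contact info@example.org "
        "or +91 80 1234 5678; see http://www.example.org for details.")
for entity_type, value in extract_typed_entities(text):
    print(f"{entity_type}: {value}")
```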

Page 5: Entity Search Engine

Web of Documents / Web of Entities

Cluster the records that correspond to the same entity
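One simple way to cluster records that refer to the same entity is to normalize their names and merge any records that share a key value, using a union-find structure. This is a sketch under stated assumptions: the honorifics list, the choice of name and email as matching keys, and the sample records are all hypothetical, and it is not the clustering method of any particular system cited in this talk.

```python
import re

TITLES = {"prof", "dr", "mr", "mrs", "ms"}  # hypothetical honorifics to ignore

def normalize(name):
    """Lower-case, drop punctuation and honorifics: 'Prof. S.R. Ranganathan' -> 's r ranganathan'."""
    tokens = re.sub(r"[^a-z0-9]+", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in TITLES)

def cluster_records(records, keys=("name", "email")):
    """Group record indices so that records sharing a normalized key value end up together."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    first_seen = {}  # (key, normalized value) -> index of the first record with it
    for idx, record in enumerate(records):
        for key in keys:
            value = record.get(key)
            if not value:
                continue
            value = normalize(value) if key == "name" else value.strip().lower()
            if (key, value) in first_seen:
                parent[find(idx)] = find(first_seen[(key, value)])  # union
            else:
                first_seen[(key, value)] = idx

    clusters = {}
    for idx in range(len(records)):
        clusters.setdefault(find(idx), []).append(idx)
    return list(clusters.values())

records = [
    {"name": "Prof. S.R. Ranganathan"},
    {"name": "S. R. Ranganathan", "email": "srr@example.org"},
    {"name": "Ranganathan, S.R.", "email": "srr@example.org"},
    {"name": "IBM"},
    {"name": "ibm"},
]
print(cluster_records(records))  # [[0, 1, 2], [3, 4]]
```

Because merging is transitive, record 2 joins the Ranganathan cluster through the shared email even though its name is written in a different order.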

Page 6: Entity Search Engine

Entity Search

● An entity is any object or thing that can be uniquely identified in the world

● Entity search matches search queries against a database containing hundreds of millions of "entities"

● Each entity is related to many other entities

● The answer entities carry specific information, and the engine must identify the right relationships among them

● Semantic or faceted search on entities
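Faceted search over entities can be sketched as filtering on attribute values and counting how many entities fall under each value of a facet, so the user can keep narrowing the result set. The repository, the attribute names (`type`, `country`) and the helper functions below are hypothetical.

```python
from collections import Counter

# Hypothetical toy repository of entities with two facets: type and country.
ENTITIES = [
    {"name": "S. R. Ranganathan", "type": "Person", "country": "India"},
    {"name": "Indian Statistical Institute", "type": "Organization", "country": "India"},
    {"name": "IBM", "type": "Organization", "country": "USA"},
    {"name": "Bangalore", "type": "Location", "country": "India"},
]

def faceted_search(entities, **facets):
    """Keep only the entities whose attributes match every requested facet value."""
    return [e for e in entities if all(e.get(k) == v for k, v in facets.items())]

def facet_counts(entities, facet):
    """Count entities per value of one facet, e.g. to show refinement options."""
    return Counter(e[facet] for e in entities if facet in e)

indian = faceted_search(ENTITIES, country="India")
print(dict(facet_counts(indian, "type")))  # {'Person': 1, 'Organization': 1, 'Location': 1}
print([e["name"] for e in faceted_search(indian, type="Organization")])
# ['Indian Statistical Institute']
```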

Page 7: Entity Search Engine

Why?

● When people use retrieval systems, they are often not searching for documents or text passages

● Summarization of entities and concepts

● Named entities (persons, organizations, locations, products, ...) play a central role in answering such information needs

● At least 20-30% of the queries submitted to Web search engines are simply entities

● About 71% of Web search queries contain named entities

Source: Building Taxonomy of Web Search Intents for Name Entity Queries by Xiaoxin Yin & Sarthak Shah

Page 8: Entity Search Engine

Benefit of Entity Search

● Entities are often categorized into a taxonomy

● The user's primary task is often to make a decision

● Results are more structured than in document-based search

● An entity is associated with the same URI across different repositories

● Entity information integration

● Results are more understandable by humans

● Higher precision and less time-consuming
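When every repository uses the same URI for an entity, integrating their information reduces to merging records keyed by that URI. A minimal sketch, assuming each repository is a dict from URI to attribute/value pairs; the URI and repository contents are hypothetical. Values are collected into sets so that conflicting values from different sources remain visible instead of silently overwriting each other.

```python
def integrate(*repositories):
    """Merge entity records from several repositories that share the same URI.

    Each repository maps URI -> {attribute: value}; values are collected into
    sets so that disagreements between sources stay visible.
    """
    merged = {}
    for repo in repositories:
        for uri, attributes in repo.items():
            profile = merged.setdefault(uri, {})
            for attribute, value in attributes.items():
                profile.setdefault(attribute, set()).add(value)
    return merged

# Hypothetical URI and repository contents.
IBM_URI = "http://example.org/entity/IBM"
repo_a = {IBM_URI: {"type": "Organization", "name": "IBM"}}
repo_b = {IBM_URI: {"type": "Organization", "headquarters": "Armonk, New York"}}

print(integrate(repo_a, repo_b)[IBM_URI])
# {'type': {'Organization'}, 'name': {'IBM'}, 'headquarters': {'Armonk, New York'}}
```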

Page 9: Entity Search Engine

Entity & Its Facets

● An entity must be distinguishable from other entities; it can be anything, including abstract things such as diseases or imaginary art

● The type of an entity is the generic class into which the entity is classified

● An attribute is a property (predicate) associated with an entity

● A value is the value of an attribute (for a given entity)

● A relation links an entity to other entities and provides more information about it

● Examples: Prof. S. R. Ranganathan is a person; IBM is an organization
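The vocabulary of type, attribute, value and relation maps naturally onto a small data structure. This Python sketch uses a hypothetical `Entity` dataclass; the example attributes are illustrative (Ranganathan is known for the Colon Classification and founded the Documentation Research and Training Centre; IBM is headquartered in Armonk, New York).

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    name: str
    entity_type: str                                 # generic class the entity belongs to
    attributes: dict = field(default_factory=dict)   # attribute (predicate) -> value
    relations: list = field(default_factory=list)    # (relation name, other Entity) pairs

    def relate(self, relation, other):
        """Record a directed relation from this entity to another one."""
        self.relations.append((relation, other))

    def related(self, relation):
        """Names of the entities reached through the given relation."""
        return [other.name for rel, other in self.relations if rel == relation]

ranganathan = Entity("S. R. Ranganathan", "Person", {"known_for": "Colon Classification"})
drtc = Entity("Documentation Research and Training Centre", "Organization")
ibm = Entity("IBM", "Organization", {"headquarters": "Armonk, New York"})
ranganathan.relate("founded", drtc)

print(ranganathan.related("founded"))  # ['Documentation Research and Training Centre']
```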

Page 10: Entity Search Engine

Main Work of ESE

● Entity Retrieval: entity search engines can return a ranked list of the entities most relevant to a user query

● Entity Relationship / Fact Mining and Navigation: discovers interesting relationships/facts about the entities associated with users' queries

● Prominence Ranking: detects the popularity of an entity and enables users to browse entities in different categories

● Entity Description Retrieval: retrieves description blocks for each entity, since information about an object in a web page is generally grouped together as an object block
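Entity retrieval and prominence ranking can be combined in a toy scoring function: count how many query terms appear in an entity's description, then add a small boost for popularity. The formula, the `prominence_weight` parameter and the mention counts below are hypothetical assumptions for illustration, not the ranking used by any particular entity search engine.

```python
import math

def rank_entities(query, entities, prominence_weight=0.1):
    """Return entity names ranked by relevance to `query`, boosted by prominence.

    score = (number of query terms found in the description)
            + prominence_weight * log(1 + mentions)
    Entities that share no term with the query are left out.
    """
    terms = set(query.lower().split())
    scored = []
    for entity in entities:
        overlap = len(terms & set(entity["description"].lower().split()))
        if overlap == 0:
            continue
        score = overlap + prominence_weight * math.log1p(entity["mentions"])
        scored.append((score, entity["name"]))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored]

# Hypothetical descriptions and mention counts.
entities = [
    {"name": "IBM", "description": "technology company organization", "mentions": 5000},
    {"name": "Indian Statistical Institute",
     "description": "research institute organization in india", "mentions": 800},
    {"name": "Bangalore", "description": "city in india", "mentions": 3000},
]

print(rank_entities("organization in india", entities))
# ['Indian Statistical Institute', 'Bangalore', 'IBM']
print(rank_entities("organization", entities))
# ['IBM', 'Indian Statistical Institute']
```

The logarithm keeps very popular entities from swamping relevance: prominence mostly decides between entities that match the query about equally well, as in the second query.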

Page 11: Entity Search Engine

Popular Entity Search

● Product search: various products such as books, electronics, clothes, etc.

● People search: experts, friends, profiles of famous persons, etc.

● Location search: travel, addresses, businesses, government offices, etc.

Page 13: Entity Search Engine

Idea about entity search engine


Page 15: Entity Search Engine

Various ESE

● Freebase: http://www.freebase.com/

● Sindice: http://sindice.com/

● Geneview: http://bc3.informatik.hu-berlin.de/

● Okkam: http://www.okkam.org/

● WolframAlpha: http://www.wolframalpha.com/

● Yatedo: http://www.yatedo.com/

● GeoNames: http://www.geonames.org/

● DBpedia: http://dbpedia.org/About

● EntityCube: http://entitycube.research.microsoft.com/

● etc.

Page 16: Entity Search Engine

OKKAM-Enabling a Web of Entities

● Any collection of data and information about any type of entity published on the Web can be integrated into a single virtual, decentralized, open knowledge base.

● It leads to a faster, more efficient and more precise way to deal with the flood of information available on the Web today

Entities should not be multiplied beyond necessity

Page 17: Entity Search Engine

OKKAM ENS

● OKKAM ENS (Entity Name System) is for entity search: its storage, indexing and matching technology was built for finding an entity given its description

● Every entity (individual, instance, “thing”) is assigned a global identifier, ideally unique

● A repository of more than 7.5 million entities in a more structured form

Entity identifiers should not be multiplied beyond necessity
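The core ENS idea of reusing an existing identifier when an entity is already known, and minting a new global one only when it is not, can be sketched as follows. The `EntityNameSystem` class, its exact-match rule on normalized (name, type) pairs and the `urn:uuid:` identifiers are all assumptions for illustration; the actual OKKAM ENS uses its own URI scheme and far richer matching.

```python
import uuid

class EntityNameSystem:
    """Toy entity name system: one global identifier per known entity."""

    def __init__(self):
        self._ids = {}        # normalized (name, type) -> identifier
        self.profiles = {}    # identifier -> profile

    @staticmethod
    def _key(name, entity_type):
        # Exact match on whitespace/case-normalized name and type (a simplifying assumption).
        return " ".join(name.lower().split()), entity_type.lower()

    def find(self, name, entity_type):
        """Return the existing identifier for this description, or None."""
        return self._ids.get(self._key(name, entity_type))

    def get_or_create(self, name, entity_type, **attributes):
        """Reuse the identifier of a known entity; mint a new one only if needed."""
        key = self._key(name, entity_type)
        if key not in self._ids:
            new_id = uuid.uuid4().urn   # 'urn:uuid:...', standing in for a real ENS URI scheme
            self._ids[key] = new_id
            self.profiles[new_id] = {"name": name, "type": entity_type, **attributes}
        return self._ids[key]

ens = EntityNameSystem()
first = ens.get_or_create("IBM", "Organization", headquarters="Armonk, New York")
again = ens.get_or_create("  ibm ", "organization")
print(first == again, len(ens.profiles))   # True 1
```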

Page 18: Entity Search Engine

Project Partners

● University of Trento, Italy (Coordinator)

● L3S Research Center, Germany

● SAP Research, Germany

● Expert System, Italy

● Elsevier B.V., Netherlands

● Europe Unlimited SA, Belgium

● National Microelectronics Application Center (MAC), Ireland

● École Polytechnique Fédérale de Lausanne (EPFL), Switzerland

● DERI Galway, Ireland

● University of Malaga, Spain

● INMARK, Spain

● Agenzia Nazionale Stampa Associata (ANSA), Italy

Page 20: Entity Search Engine

Sources of Information

● Wikipedia: provides lists of countries, cities and members of particular domains, which are very common in search queries

● GeoNames: contains over 10 million geographical names and consists of over 9 million unique features (including 2.8 million populated places) and 5.5 million alternate names

● OkkamDBManager: another important information source for OKKAM can be generic databases, such as those of extranets, online shops or publishing houses

● OkkamManualEntry: new entities can also be inserted manually

Page 21: Entity Search Engine

Extracting data from any unstructured source more effectively

Page 22: Entity Search Engine

Cogito Semantic Technology

● A semantic analysis engine and a complete semantic network for a full understanding of text

● Transforms unstructured information into structured data

● Identifies the most relevant concepts

● Interprets the meaning of texts

● Precisely extracts information

● Automatically connects entities extracted from sources

Page 23: Entity Search Engine

Sensigrafo

● Enables the disambiguation of terms

● Allows Cogito to understand the meaning of words and their context

● Extraction of data and metadata

● Used in product development, competitive intelligence, marketing, finance, media & publishing, oil & gas, life sciences & pharma, government and telecommunications, and many other activities where knowledge sharing is critical

● More than 1 million concepts and more than 4 million relationships
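To illustrate what term disambiguation means, here is a sketch of the simplified Lesk algorithm, a classic technique that picks the sense whose dictionary gloss overlaps most with the surrounding words. This is a swapped-in, well-known method, not a description of how Sensigrafo or Cogito work; the two-sense inventory and the stopword list are hypothetical.

```python
# Hypothetical two-sense inventory; a real semantic network holds far more concepts and links.
SENSES = {
    "bank": {
        "bank/finance": "institution that accepts deposits and lends money",
        "bank/river": "sloping land beside a body of water such as a river",
    },
}
STOPWORDS = {"a", "an", "the", "of", "and", "to", "in", "on", "at",
             "such", "as", "that", "beside"}

def disambiguate(word, sentence):
    """Simplified Lesk: pick the sense whose gloss shares most words with the context."""
    context = {w.strip(".,").lower() for w in sentence.split()} - STOPWORDS - {word}
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(context & (set(gloss.split()) - STOPWORDS))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

print(disambiguate("bank", "He sat on the bank of the river watching the water."))  # bank/river
print(disambiguate("bank", "She deposits money at the bank."))                      # bank/finance
```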

Page 26: Entity Search Engine

Workflow of Okkam

● Storage: a scalable repository of entity profiles, in which billions of entities are assigned an ID and a profile that distinguish one entity from another

● Matching: requests from client applications arrive in the form of a bag of keywords or a collection of name/value pairs (unstructured or semi-structured queries)

● ID storage and management: stores, maintains and makes available for reuse the IDs (URIs) of anything that is named in a networked environment

● Lifecycle management: takes care of the evolution of the repository and of all entity profiles over time
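The matching step accepts either form of request. In this sketch (a hypothetical `match` function and scoring rule, not OKKAM's actual matcher), a bag of keywords is scored by the fraction of keywords found anywhere in a profile, and name/value pairs by the fraction of pairs the profile satisfies. The profile IDs are placeholders; the names and affiliations come from the references of this talk.

```python
def match(query, profiles, top_k=3):
    """Match a request against entity profiles and return (id, score) pairs.

    `query` is either a string (bag of keywords, unstructured) or a dict of
    name/value pairs (semi-structured). Scores lie in [0, 1].
    """
    results = []
    for entity_id, profile in profiles.items():
        if isinstance(query, str):
            keywords = set(query.lower().split())
            words = set(" ".join(str(v) for v in profile.values()).lower().split())
            score = len(keywords & words) / len(keywords) if keywords else 0.0
        else:
            hits = sum(1 for name, value in query.items()
                       if str(profile.get(name, "")).lower() == str(value).lower())
            score = hits / len(query) if query else 0.0
        if score > 0:
            results.append((entity_id, score))
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results[:top_k]

# Placeholder IDs; names and affiliations are taken from this talk's references.
profiles = {
    "id-1": {"name": "Paolo Bouquet", "type": "Person", "affiliation": "University of Trento"},
    "id-2": {"name": "University of Trento", "type": "Organization", "country": "Italy"},
    "id-3": {"name": "Heiko Stoermer", "type": "Person", "affiliation": "University of Trento"},
}

print([i for i, _ in match("trento university italy", profiles)])   # ['id-2', 'id-1', 'id-3']
print([i for i, _ in match({"type": "Person", "affiliation": "University of Trento"}, profiles)])
# ['id-1', 'id-3']
```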

Page 27: Entity Search Engine

Entity Query & Matching in Okkam

Page 30: Entity Search Engine

ISI

Page 31: Entity Search Engine

Wolfram|Alpha

● Wolfram|Alpha is an engine for computing answers and providing knowledge

● It generates output by doing computations over its own internal knowledge base, instead of searching the web and returning links

● It is an online service that answers factual queries directly by computing the answer

● Its goal is to make all systematic knowledge immediately computable and accessible to everyone

Page 32: Entity Search Engine

5 nearest stars
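The "5 nearest stars" query shows the difference from link-based search: the answer is computed from stored facts. A minimal sketch, assuming a tiny toy knowledge base of approximate distances from the Sun in light-years; it is not Wolfram|Alpha's data or method.

```python
# Toy knowledge base (assumption): approximate distances from the Sun in light-years.
STAR_DISTANCES_LY = {
    "Proxima Centauri": 4.24,
    "Alpha Centauri A": 4.37,
    "Alpha Centauri B": 4.37,
    "Barnard's Star": 5.96,
    "Wolf 359": 7.86,
    "Sirius A": 8.6,
}

def nearest_stars(n):
    """Answer 'n nearest stars' by computing over stored facts rather than returning links."""
    return sorted(STAR_DISTANCES_LY, key=STAR_DISTANCES_LY.get)[:n]

print(nearest_stars(5))
# ['Proxima Centauri', 'Alpha Centauri A', 'Alpha Centauri B', "Barnard's Star", 'Wolf 359']
```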

Page 34: Entity Search Engine

How many newspapers are there in the world?

Page 36: Entity Search Engine

Overall Difficulties

● The number of entities could be huge

● Information redundancy

● Information fragmentation

● Entity information integration

● A single algorithm for fine-grained entity matching may not exist

● Storing and retrieving entities using IR-based techniques

● Matching on very large datasets

● Natural language processing

Page 37: Entity Search Engine

Contd...

● Limited availability of knowledge bases

● Multi-domain entities

● Deduplication problem

● Some names and relationships could be incorrect, and the information may not be up-to-date

● Name disambiguation is still largely unsolved

● ESEs are at an early stage

Creating knowledge bases from text and unstructured data is the goal
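Matching on very large datasets and deduplication both suffer from the number of candidate pairs, which grows quadratically with the number of records. A common mitigation is blocking: only records that share a cheap blocking key are compared in detail. The key used below (first three alphanumeric characters of the name) and the sample records are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def blocking_key(record):
    """Hypothetical blocking key: first three alphanumeric characters of the lower-cased name."""
    return "".join(ch for ch in record["name"].lower() if ch.isalnum())[:3]

def candidate_pairs(records):
    """Return index pairs to compare in detail: only records inside the same block."""
    blocks = defaultdict(list)
    for idx, record in enumerate(records):
        blocks[blocking_key(record)].append(idx)
    pairs = []
    for members in blocks.values():
        pairs.extend(combinations(members, 2))
    return pairs

records = [{"name": "IBM"}, {"name": "I.B.M."}, {"name": "Ranganathan"},
           {"name": "Ranganathan, S. R."}, {"name": "Bangalore"}]

print(candidate_pairs(records))                         # [(0, 1), (2, 3)]
print(len(list(combinations(range(len(records)), 2))))  # 10 pairs without blocking
```

Blocking trades recall for speed: "Bangalore" and "Bengaluru" get different keys and would never be compared, so practical systems often combine several blocking keys.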

 

Page 39: Entity Search Engine

My Library

● Entities are for use

● Each entity has its own attributes & relations

● Every entity has its importance

● Save time in finding entities

● Entities are growing rapidly

Page 40: Entity Search Engine

References

1. Statistical Entity Extraction from Web, by Zaiqing Nie, Ji-Rong Wen and Wei-Ying Ma

2. State of the Art in IE: Overview, Comparison and Analysis, by Stefan Dumitrescu (PhD student)

3. The Entity Name System: Enabling the Web of Entities, by Heiko Stoermer, Themis Palpanas and George Giannakopoulos, University of Trento

4. Hybrid Entity Clustering Using Crowds and Data, by Jongwuk Lee, Hyunsouk Cho, Jin-Woo Park, Young-rok Cha, Seung-won Hwang, Zaiqing Nie and Ji-Rong Wen

5. Supporting Entity Search: A Large-Scale Prototype Search Engine, by Tao Cheng, Xifeng Yan and Kevin Chen-Chuan Chang

Page 41: Entity Search Engine

References...

6. OKKAM: Enabling a Web of Entities, by Paolo Bouquet, Heiko Stoermer and Daniel Giacomuzzi, University of Trento

7. Entity Data Management in OKKAM, by Themis Palpanas (University of Trento), Junaid Chaudhry (Ajou University), Periklis Andritsos (University of Trento) and Yannis Velegrakis (University of Trento)

8. Space and Time Entity Repository: Human-enhanced time-aware multimedia search, funded by EU07. See: http://issuu.com/cubrikproject/docs/issuu.cubrik.d41.unitn.wp4.v1.0

9. http://api.okkam.org/search/

10. http://www.wolframalpha.com/
