Learn more about Entity Extraction May 2014

13 April 2023

Overview of scenarios

Scenarios | Benefits of using entity extraction

Explore your content | Explore the enterprise graph

Discover insights about your products | Monitor trends

Discover new expertise inside your organization | Find the people with the right competences

Enhance search navigation | Filter unstructured data


Prevent duplicate work | Find similar content

Help your users find their dream home | Extract potential decision criteria from natural language

Visualize your content in a new way | Enrich documents with metadata

Discover new expertise inside your organization | Find the people with the right competences

Motivation

• Search for “usability”

• Only people that have tagged themselves with “usability” will be returned

• If we rely only on standard category types and database information, we get only what is in the people database

• But what if you could also find those who write, blog, or tweet about “usability”, without them being explicitly tagged with this category?

Enhanced search index

• The search index is enhanced with information about what topics, keywords, people, places, etc. authors write about

• Search for “usability”

• Get improved search results

Discover competences people have

Discover interests people have and share

Gather all people writing about the same topic

Enhanced expertise search
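
The idea of an expertise index can be illustrated with a minimal sketch. It assumes topics have already been extracted from each document; the function names and the sample documents are hypothetical and are not part of the framework described in this deck.

```python
from collections import Counter, defaultdict

def build_expertise_index(documents):
    """Map each extracted topic to a Counter of the authors who wrote about it."""
    index = defaultdict(Counter)
    for doc in documents:
        for topic in doc["topics"]:
            index[topic.lower()][doc["author"]] += 1
    return index

def find_experts(index, topic):
    """Return authors ranked by how many of their documents mention the topic."""
    counts = index.get(topic.lower(), Counter())
    return [author for author, _ in counts.most_common()]

# Hypothetical sample data: nobody is explicitly tagged with "usability",
# the topics come from what each author wrote.
documents = [
    {"author": "Sarah Jensen", "topics": ["usability", "search"]},
    {"author": "Carl Sorensen", "topics": ["usability"]},
    {"author": "Sarah Jensen", "topics": ["Usability"]},
]
index = build_expertise_index(documents)
experts = find_experts(index, "usability")
```

Here a search for "usability" returns both authors, ranked by how often they wrote about the topic, even though neither has a profile tag.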

Enhance search navigation | Filter unstructured data

Motivation

• Search for “yoga”

• Lots of semi-structured documents (HTML, Word, PDF, etc.)

• Some are missing administrative metadata such as author, date last saved

• Some are missing descriptive metadata such as title, topic, tags, category

No proper title

Will you go through all results to find the relevant ones?

Extract named entities and metadata

• Identify and add to the document information such as title, keywords, author, summary, subsection titles

New filters and improved metadata

• Search for “yoga”

• The newly created data is used to filter documents and improve relevance

Improved visual results (documents have titles)

Improved relevance (titles and subsection titles are ranked higher than body text)

Possibility to filter on authors, topics, places, etc. (use the filter rather than pagination)
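
As a rough sketch of how extracted metadata can drive both filtering and relevance, the example below weights title and subsection-title matches higher than body text and applies exact-match filters. The boost values, field names and sample documents are assumptions made for illustration; a real search engine would provide its own scoring and faceting.

```python
def score(doc, query, title_boost=3.0, heading_boost=2.0):
    """Score a document; title and subsection-title matches outweigh body matches."""
    q = query.lower()
    total = 0.0
    if q in doc.get("title", "").lower():
        total += title_boost
    total += heading_boost * sum(q in h.lower() for h in doc.get("headings", []))
    total += doc.get("body", "").lower().count(q)
    return total

def search(docs, query, filters=None):
    """Return ids of matching documents, best first, after applying metadata filters."""
    filters = filters or {}
    hits = []
    for doc in docs:
        if any(doc.get(field) != value for field, value in filters.items()):
            continue  # document does not match the selected filter
        s = score(doc, query)
        if s > 0:
            hits.append((s, doc["id"]))
    return [doc_id for _, doc_id in sorted(hits, key=lambda hit: -hit[0])]

# Hypothetical documents, already enriched with title, headings and author.
docs = [
    {"id": "a", "title": "Yoga for beginners", "headings": [],
     "body": "An introduction.", "author": "Sarah Jensen"},
    {"id": "b", "title": "Wellness newsletter", "headings": ["Lunch yoga"],
     "body": "Join the class at noon.", "author": "Carl Sorensen"},
    {"id": "c", "title": "Canteen menu", "headings": [],
     "body": "Yoga is mentioned once.", "author": "Sarah Jensen"},
]
all_hits = search(docs, "yoga")
by_author = search(docs, "yoga", filters={"author": "Sarah Jensen"})
```

Filtering on an extracted author narrows the list without the user paging through every result.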

Explore your content | Explore the enterprise graph

Motivation

• Search for ‘Copenhagen’ on your intranet

• Ambiguous query

• Lots of results

• Missing context

• What is the user intent with this query?

Relationship Extraction for Entities

• Extract relations from unstructured data

• Built upon named entity recognition

• Relationship extraction enables us to build a graph search solution with unstructured data

Lorem ipsum dolor sit amet Sarah Jensen, consectetur adipiscing elit Philadelphia et Copenhagen. Fusce nec placerat libero. Suspendisse nibh quam, sodales in posuere ac, porttitor non erat. Sed semper sodales varius. Fusce elementum Findwise, enim sed semper ultrices Carl Sorensen, nisl ligula consectetur sapien, non feugiat sapien enim id quam. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. Nullam egestas non velit nec accumsan. Google at orci augue.

Proin tempus tristique arcu, a lobortis diam tempus ut. Nam arcu risus, tempor nec elit eu, Anders Anderson posuere viverra mauris. Donec tempor in magna in mollis. Suspendisse in elementum magna. Findwise in faucibus sapien, et Microsoft. Fusce ullamcorper malesuada sapien, sit amet viverra odio bibendum sed. Fusce molestie vel tortor nec eleifend. Nullam et leo ac felis iaculis convallis.





Entities extracted from the text: Sarah Jensen, Philadelphia, Copenhagen, Google, Anders Anderson, Findwise, Microsoft, Carl Sorensen
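
A very simplified sketch of building a graph from recognised entities follows. It assumes that two entities mentioned in the same sentence are related, which is a deliberately naive stand-in for real relationship extraction. The entity list reuses the names from the sample text, while the sentences and function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

KNOWN_ENTITIES = [
    "Sarah Jensen", "Carl Sorensen", "Anders Anderson",
    "Philadelphia", "Copenhagen", "Findwise", "Google", "Microsoft",
]

def build_graph(text, entities=KNOWN_ENTITIES):
    """Link every pair of known entities that occur in the same sentence."""
    graph = defaultdict(set)
    for sentence in text.split("."):
        found = [entity for entity in entities if entity in sentence]
        for a, b in combinations(found, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

def second_level(graph, node):
    """Entities reachable in exactly two hops (neighbours of neighbours)."""
    direct = graph.get(node, set())
    two_hops = {n2 for n1 in direct for n2 in graph.get(n1, set())}
    return two_hops - direct - {node}

# Hypothetical sentences, invented for this sketch.
text = (
    "Sarah Jensen moved from Philadelphia to Copenhagen. "
    "Carl Sorensen works at Findwise. "
    "Findwise is a partner of Microsoft."
)
graph = build_graph(text)
```

With such a graph, a query for one entity can surface its direct neighbours as well as second-level connections.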

Suggestions as you type, using the graph

• Search for ‘Copenhagen’ on your intranet

Narrow down search results directly from the search box

Disambiguate the query by selecting one of the different types of suggestions (consultants, projects, partners)

Navigate directly to second- or higher-level connections in the graph

Business Intelligence, using the graph

• Search for: ‘Customers where we have done Projects based on Google technology with at least 1000 hours of consulting time and a revenue of more than 1 MDKK, and where the word “e-commerce” is mentioned many times in the Project Documentation’

Business Intelligence

Project numbers (worked hours)

Financial numbers (revenue, profits)

Project Documentation

What would this query look like in SQL?

Discover insights about your products | Monitor trends

Motivation

• Search for the product name ‘Tusin’

• Product is mentioned in different sources, under different contexts (user feedback, marketing material, internal specifications), and using different terminologies (on social media compared to website)

• How to keep track of all information?

• How easy is it to identify trends?

Identify the same product in different contexts

• Identify the entity denoting the same product from different sources

Internal name for the same product

Internal Production Specification

Product Marketing Material from Website

Feedback about the marketing material / the experience of the user

Mentions the product

User feedback

Metric

Internal Issues Management System
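
One way to recognise the same product under different names is an alias dictionary that maps every known mention to a canonical product. The sketch below assumes such a dictionary exists; the aliases (including the internal code `PROD-0042`), the source names and the function names are all hypothetical.

```python
import re
from collections import Counter, defaultdict

# Hypothetical aliases: product name, internal code, social media hashtag.
ALIASES = {
    "tusin": "Tusin",
    "prod-0042": "Tusin",
    "#tusin": "Tusin",
}

def normalize(mention):
    """Lowercase a mention and collapse whitespace before the lookup."""
    return re.sub(r"\s+", " ", mention.strip().lower())

def resolve(mention, aliases=ALIASES):
    """Return the canonical product for a mention, or None if it is unknown."""
    return aliases.get(normalize(mention))

def mentions_by_source(records):
    """Count mentions per canonical product and per source."""
    trend = defaultdict(Counter)
    for source, mention in records:
        product = resolve(mention)
        if product is not None:
            trend[product][source] += 1
    return trend

records = [
    ("website", "Tusin"),
    ("issue tracker", "PROD-0042"),
    ("social media", "#tusin"),
    ("social media", "Other product"),
]
trend = mentions_by_source(records)
```

Once mentions are resolved to one product, counts per source can be tracked over time to spot trends.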

Monitor trends on your products

• Search for ‘Tusin’

or

• Remember it as a search term and create a dashboard with content driven by search

Monitor trends

Reduce time for replying to customers or users

Stay competitive

Prevent duplicate work | Find similar content

Motivation

• You have just started working on a new material at a construction company

• What is the cost of duplicating the work?

• Will you perform a search on previous work?

• What if another team has a similar initiative?

Enhanced Search Index

• Automatically extract entities and representative keywords from content

Documents

Announcements

Public Emails

Newsfeed

Steel Structures

Glass Type 1.A

Project ANSA

Torso Tower

Polyethylene Terephthalate

Prevent duplicate work

• Get suggestions of similar work based on extracted entities

Identify similar work early in the project

Identify potential collaborations

Prevent duplicate work
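
Suggesting similar work from extracted entities can be sketched as a set-overlap comparison. The example below uses Jaccard similarity, which is one simple choice and not necessarily what the framework uses; the project names, entity assignments and threshold are hypothetical.

```python
def jaccard(a, b):
    """Share of entities two pieces of work have in common (0.0 to 1.0)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similar_work(new_entities, existing, threshold=0.5):
    """Return existing work whose entity overlap reaches the threshold, best first."""
    scored = [(jaccard(new_entities, entities), name)
              for name, entities in existing.items()]
    return [name for s, name in sorted(scored, reverse=True) if s >= threshold]

# Hypothetical earlier work, described by the entities extracted from it.
existing = {
    "project-a": {"Steel Structures", "Glass Type 1.A"},
    "project-b": {"Polyethylene Terephthalate"},
}
new_entities = {"Steel Structures", "Glass Type 1.A", "Polyethylene Terephthalate"}
suggestions = similar_work(new_entities, existing)
```

A lower threshold would surface more loosely related work, at the cost of noisier suggestions.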

Visualize your content in a new way | Enrich documents with metadata

Motivation

• Search for “financial results Copenhagen”

• Search results: documents

• Clicking on a result opens the document

• Does this search answer the user’s question?

Identify entities in documents

• Identify locations, revenues, departments, etc. from semi-structured data

• Combine with data in spreadsheets or databases

Documents

Database

Spreadsheets

Answer

Visualize your content in a new way

• Search for “financial results Copenhagen”

• Additional information shown

• Can show computed results

Enrich documents with metadata

Visualize the content

Compute answers

Make comparisons

Create dashboards based on searches

Help your users find their dream home | Extract potential decision criteria from natural language

Motivation

• Searching for an ‘apartment with a good view, located in central Copenhagen, well sized bathroom, close to shopping outlets, preferably with 3 rooms’

• The apartment information consists of mostly structured data (m², number of rooms, postal code, floor)

• Can we improve the search experience?

Long list of static filters

Search query consists of an area (post code, street etc.)

Understanding what the users want

• Here’s how Facebook helps users define their queries:

• Can we interpret the query ‘apartment with a good view, located in central Copenhagen, well sized bathroom, close to shopping outlets, preferably with 3 rooms’?

Understanding what the users want

• Searching for ‘apartment with a good view, located in central Copenhagen, well sized bathroom, close to shopping outlets, preferably with 3 rooms’

• Apartments with 3 rooms are shown in search results but those with fewer are not excluded

• Those that mention shopping outlets (such as Netto or Fakta) are boosted

Interpret natural language

Boost results based on ‘preferences’

Better search experience

Increase user satisfaction

Boost those with 3 rooms (boost on map can be represented by a bigger pointer)

Free text search
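
Boosting on preferences, as opposed to filtering, can be sketched as follows: every apartment keeps a base score, and matching preferences only add to it. The boost values, field names and sample listings are assumptions for illustration; extracting the preferences from the free-text query is out of scope for this sketch.

```python
def rank_apartments(apartments, preferred_rooms=None, preferred_terms=(),
                    room_boost=2.0, term_boost=1.0):
    """Rank apartments; preferences raise the score but never exclude a listing."""
    ranked = []
    for apt in apartments:
        score = 1.0  # base score: every apartment stays in the result list
        if preferred_rooms is not None and apt["rooms"] == preferred_rooms:
            score += room_boost
        description = apt["description"].lower()
        score += term_boost * sum(term.lower() in description
                                  for term in preferred_terms)
        ranked.append((score, apt["id"]))
    ranked.sort(key=lambda pair: -pair[0])
    return [apt_id for _, apt_id in ranked]

# Hypothetical listings.
apartments = [
    {"id": "apt-1", "rooms": 2, "description": "Bright flat with a good view"},
    {"id": "apt-2", "rooms": 3, "description": "Close to Netto and Fakta"},
    {"id": "apt-3", "rooms": 3, "description": "Quiet street"},
]
ranking = rank_apartments(
    apartments,
    preferred_rooms=3,
    preferred_terms=("Netto", "Fakta", "good view"),
)
```

The two-room apartment still appears, only lower in the list, which matches the "preferably with 3 rooms" wording of the query.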

Behind the scenes

Entity Extraction

Entity extraction is the process of identifying named entities (such as locations, people, companies) in a block of text

Add structure to unstructured data

New possibilities of interpreting the data

Improve data quality and findability of documents

Reduce time spent by users manually structuring content

Entity Extraction Framework

Combines dictionaries with trained models and regular expressions, based on needs

Scalable, adaptable and extendable framework

Automatically enrich documents with named entities

Iterative approach to continuously improve accuracy

Built by Findwise in response to our customers’ requirements and vision
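
To illustrate how dictionaries and regular expressions can be combined, here is a minimal sketch. It is not the Findwise framework: the trained-model component is left out entirely, and the dictionary entries, the `AMOUNT` pattern and the function name are hypothetical.

```python
import re

# Hypothetical dictionary of known entities and their types.
DICTIONARY = {
    "Copenhagen": "LOCATION",
    "Philadelphia": "LOCATION",
    "Findwise": "COMPANY",
    "Google": "COMPANY",
    "Microsoft": "COMPANY",
}

# Hypothetical pattern for entities that are easier to describe than to list.
PATTERNS = {
    "AMOUNT": re.compile(r"\b\d+(?:[.,]\d+)?\s?MDKK\b"),
}

def extract_entities(text):
    """Return (entity, type) pairs in order of appearance in the text."""
    found = []
    for term, label in DICTIONARY.items():
        for match in re.finditer(r"\b" + re.escape(term) + r"\b", text):
            found.append((match.start(), match.group(), label))
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((match.start(), match.group(), label))
    return [(entity, label) for _, entity, label in sorted(found)]

entities = extract_entities(
    "Findwise opened an office in Copenhagen with a revenue of 1 MDKK."
)
```

A dictionary covers known names precisely, while patterns catch regular forms such as amounts; a trained model would be the place to handle names that neither of them knows.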

Entity Extraction Framework

Autotag, Edit, Evaluate, Incremental train

90% accuracy

The Danish and Swedish entity extractors can reach 90% accuracy
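
The evaluation step can be sketched by comparing extracted entities with manually annotated ones. The deck does not say how its accuracy figure is measured, so the sketch below uses precision, recall and F1 as an assumption; the annotated and predicted sets are invented.

```python
def evaluate(predicted, gold):
    """Compare predicted (entity, type) pairs with annotated ones."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical annotations and extractor output.
gold = {
    ("Findwise", "COMPANY"),
    ("Copenhagen", "LOCATION"),
    ("Sarah Jensen", "PERSON"),
    ("Google", "COMPANY"),
}
predicted = {
    ("Findwise", "COMPANY"),
    ("Copenhagen", "LOCATION"),
    ("Google", "LOCATION"),  # right entity, wrong type: counted as an error
}
precision, recall, f1 = evaluate(predicted, gold)
```

Re-running such a comparison after each round of annotation and training shows whether the extractor is actually improving.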

Graphical Annotation Tool

Visual representation of annotated documents

Annotate more documents to improve precision

Easy-to-use, point and click interaction

Built by Findwise in response to our customers’ requirements and visions


Anders Hä[email protected]
