Learn more about Entity Extraction May 2014
Uploaded by anders-haeggdahl
13 April 2023
Overview of scenarios
Scenarios | Benefits of using entity extraction
Explore your content | Explore the enterprise graph
Discover insights about your products | Monitor trends
Discover new expertise inside your organization | Find the people with the right competences
Enhance search navigation | Filter unstructured data
Prevent duplicate work | Find similar content
Help your users find their dream home | Extract potential decision criteria from natural language
Visualize your content in a new way | Enrich documents with metadata
Discover new expertise inside your organization | Find the people with the right competences
Motivation
• Search for “usability”
• Only people who have tagged themselves with “usability” will be returned
• If we rely only on standard category types and database information, we get only what is in the people database
• But what if you could also find those who write, blog, or tweet about “usability”, even if they are not explicitly tagged with this category?
Enhanced search index
• The search index is enhanced with information about what topics, keywords, people, places, etc. authors write about
• Search for “usability”
• Get improved search results
Discover competences people have
Discover interests people have and share
Gather all people writing about the same topic
Enhanced expertise search
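The enhanced index can be illustrated with a small Python sketch. It assumes an entity extractor (not shown) has already produced a set of topics for each person's documents, blog posts, or tweets; the people, the data, and the simple evidence-count ranking are hypothetical and not Findwise's implementation.

```python
from collections import defaultdict

# Hypothetical data: self-assigned profile tags, plus topics that an entity
# extractor (not shown) has already pulled out of each person's content.
PROFILE_TAGS = {
    "Sarah Jensen": {"usability"},
    "Carl Sorensen": {"java"},
    "Anders Anderson": set(),
}
EXTRACTED_TOPICS = [
    ("Carl Sorensen", "blog post", {"usability", "accessibility"}),
    ("Anders Anderson", "tweet", {"usability"}),
    ("Anders Anderson", "wiki page", {"search"}),
]


def build_expertise_index(tags, extracted):
    """Map topic -> {person: evidence count}, merging tags and extracted topics."""
    index = defaultdict(lambda: defaultdict(int))
    for person, person_tags in tags.items():
        for tag in person_tags:
            index[tag][person] += 1  # an explicit profile tag is one piece of evidence
    for person, _source, topics in extracted:
        for topic in topics:
            index[topic][person] += 1  # each item mentioning the topic adds evidence
    return index


def find_experts(index, topic):
    """Rank people by evidence for the topic; ties are broken alphabetically."""
    people = index.get(topic, {})
    return sorted(people, key=lambda person: (-people[person], person))
```

With these sample data, a search for "usability" returns all three people, whereas an index built from profile tags alone returns only Sarah Jensen.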
Enhance search navigation | Filter unstructured data
Motivation
• Search for “yoga”
• Lots of semi-structured documents (HTML, Word, PDF, etc.)
• Some are missing administrative metadata such as author and date last saved
• Some are missing descriptive metadata such as title, topic, tags, and category
No proper title
Will you go through all results to find the relevant ones?
Extract named entities and metadata
• Identify and add document information such as title, keywords, author, summary, and subsection titles
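A minimal sketch of this step for HTML pages, using only Python's standard `html.parser`. It assumes that the `<title>` element, a `<meta name="author">` tag, and h1 to h3 headings are the signals worth collecting, and it falls back to the first heading when a page has no title. Word and PDF files would need other parsers, which are not shown.

```python
from html.parser import HTMLParser


class MetadataExtractor(HTMLParser):
    """Collects <title>, <meta name="author">, and h1-h3 heading text."""

    HEADINGS = {"h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.author = None
        self.headings = []
        self._capturing = None  # tag whose text is currently being collected
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "author":
            self.author = attributes.get("content")
        elif tag == "title" or tag in self.HEADINGS:
            self._capturing, self._buffer = tag, []

    def handle_data(self, data):
        if self._capturing:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._capturing:
            text = " ".join("".join(self._buffer).split())  # normalise whitespace
            if tag == "title":
                self.title = text
            elif text:
                self.headings.append(text)
            self._capturing = None


def extract_metadata(html):
    parser = MetadataExtractor()
    parser.feed(html)
    parser.close()
    # Fall back to the first heading when there is no usable <title>.
    title = parser.title or (parser.headings[0] if parser.headings else None)
    return {"title": title, "author": parser.author, "headings": parser.headings}
```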
New filters and improved metadata
• Search for “yoga”
• The newly created data is used to filter documents and improve relevance
Improved visual results (documents have titles)
Improved relevance (titles and subsection titles are ranked higher than body text)
Possibility to filter on authors, topics, places, etc. (use the filter rather than pagination)
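To illustrate how the extracted metadata can drive relevance and filtering, here is a toy scorer in which a title match weighs more than a heading match, which in turn weighs more than a body match, combined with an exact-match facet filter. The weights (3, 2, 1), the field names, and the sample documents are assumptions; a production search engine would express this as field boosts in its own query language.

```python
# Assumed field weights: titles rank above subsection headings, which rank above body text.
FIELD_WEIGHTS = {"title": 3.0, "headings": 2.0, "body": 1.0}

DOCS = [  # hypothetical documents enriched with extracted metadata
    {"id": "a", "title": "Yoga retreat", "headings": [],
     "body": "a weekend in the mountains", "author": "Sarah Jensen"},
    {"id": "b", "title": "Weekly newsletter", "headings": ["Classes"],
     "body": "yoga on mondays", "author": "Carl Sorensen"},
]


def field_score(doc, term):
    """Weighted count of whitespace-separated term matches per field."""
    term = term.lower()
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        value = doc.get(field, "")
        text = " ".join(value) if isinstance(value, list) else value
        total += weight * text.lower().split().count(term)
    return total


def search(docs, term, **filters):
    """Return matching document ids, best first; keyword args act as facet filters."""
    hits = []
    for doc in docs:
        if any(doc.get(name) != wanted for name, wanted in filters.items()):
            continue  # facet filter, e.g. author="Sarah Jensen"
        score = field_score(doc, term)
        if score > 0:
            hits.append((score, doc["id"]))
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return [doc_id for _score, doc_id in hits]
```

Here `search(DOCS, "yoga")` ranks the document titled "Yoga retreat" first, and adding `author="Carl Sorensen"` narrows the result to the newsletter.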
Explore your content | Explore the enterprise graph
Motivation
• Search for ‘Copenhagen’ on your intranet
• Ambiguous query
• Lots of results
• Missing context
• What is the user intent with this query?
Relationship Extraction for Entities
• Extract relations from unstructured data
• Built upon named entity recognition
• Relationship extraction enables us to build a graph search solution on unstructured data
Lorem ipsum dolor sit amet Sarah Jensen, consectetur adipiscing elit Philadelphia et Copenhagen. Fusce nec placerat libero. Suspendisse nibh quam, sodales in posuere ac, porttitor non erat. Sed semper sodales varius. Fusce elementum Findwise, enim sed semper ultrices Carl Sorensen, nisl ligula consectetur sapien, non feugiat sapien enim id quam. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. Nullam egestas non velit nec accumsan. Google at orci augue.
Proin tempus tristique arcu, a lobortis diam tempus ut. Nam arcu risus, tempor nec elit eu, Anders Anderson posuere viverra mauris. Donec tempor in magna in mollis. Suspendisse in elementum magna. Findwise in faucibus sapien, et Microsoft. Fusce ullamcorper malesuada sapien, sit amet viverra odio bibendum sed. Fusce molestie vel tortor nec eleifend. Nullam et leo ac felis iaculis convallis.
Sarah Jensen
Philadelphia
Copenhagen
Anders Anderson
Findwise
Microsoft
Carl Sorensen
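Relationship extraction builds on the entities recognized in the example text. As a deliberately simplified stand-in, the sketch below links two known entities whenever they occur in the same sentence and counts how often that happens. Real relationship extraction would typically also classify the type of relation, which this co-occurrence sketch does not attempt; the entity list is taken from the example text.

```python
import itertools
import re
from collections import Counter

# Entities assumed to have been found by a named entity recognizer (see the example text).
KNOWN_ENTITIES = {"Sarah Jensen", "Philadelphia", "Copenhagen", "Findwise",
                  "Carl Sorensen", "Anders Anderson", "Microsoft", "Google"}


def entities_in(sentence, known=KNOWN_ENTITIES):
    """Known entity names occurring in the sentence (plain substring match)."""
    return sorted(name for name in known if name in sentence)


def cooccurrence_edges(text, known=KNOWN_ENTITIES):
    """Count how often each pair of entities appears in the same sentence."""
    edges = Counter()
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for pair in itertools.combinations(entities_in(sentence, known), 2):
            edges[pair] += 1  # pair is alphabetically ordered, so the edge is undirected
    return edges
```

The resulting weighted edges are the kind of input a graph search solution could be built on.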
Suggestions as you type, using the graph
• Search for ‘Copenhagen’ on your intranet
Narrow down search results directly from the search box
Disambiguate the query by selecting one of the different types of suggestions (consultants, projects, partners)
Navigate directly to second- or higher-level connections on the graph
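The suggestion behavior can be sketched over a small, hypothetical graph: prefix matches are grouped by entity type to support disambiguation, and a breadth-first search returns connections up to a chosen depth. The node names, types, and edges below are invented for illustration.

```python
from collections import defaultdict, deque

# Hypothetical graph built from extracted entities and relations.
NODE_TYPES = {
    "Copenhagen": "location",
    "Copenhagen Airport": "partner",
    "Sarah Jensen": "consultant",
    "Carl Sorensen": "consultant",
    "Project Harbour": "project",
}
EDGES = [
    ("Sarah Jensen", "Copenhagen"),
    ("Sarah Jensen", "Project Harbour"),
    ("Project Harbour", "Copenhagen Airport"),
    ("Carl Sorensen", "Project Harbour"),
]

ADJACENCY = defaultdict(set)
for a, b in EDGES:  # undirected graph
    ADJACENCY[a].add(b)
    ADJACENCY[b].add(a)


def suggest(prefix):
    """Type-ahead: names starting with the prefix, grouped by entity type."""
    groups = defaultdict(list)
    for name, node_type in NODE_TYPES.items():
        if name.lower().startswith(prefix.lower()):
            groups[node_type].append(name)
    return {node_type: sorted(names) for node_type, names in groups.items()}


def connections(start, max_depth=2):
    """Breadth-first search: nodes within max_depth hops, mapped to their distance."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_depth:
            continue  # do not expand beyond the requested depth
        for neighbour in ADJACENCY[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    del distance[start]
    return distance
```

Typing "cop" yields one location and one partner to choose from; selecting "Copenhagen" then exposes Sarah Jensen (1st level) and Project Harbour (2nd level).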
Business Intelligence, using the graph
• Search for: ‘Customers where we have done Projects based on Google technology with at least 1000 hours of consulting time and a revenue of more than 1 MDKK, and where the word “e-commerce” is mentioned many times in the Project Documentation’
Business Intelligence
Project numbers (worked hours)
Financial numbers (revenue, profits)
Project Documentation
What would this query look like in SQL?
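One possible answer, assuming the project, financial, and documentation data have been loaded into relational tables and that entity extraction has produced a per-project mention count for “e-commerce”. The schema, the sample rows, and the threshold of 10 mentions for “many times” are all hypothetical; the sketch uses Python's built-in `sqlite3` so it runs as-is.

```python
import sqlite3

# Hypothetical schema; entity_mentions is assumed to be filled by running
# entity extraction over each project's documentation.
SCHEMA_AND_SAMPLE_ROWS = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE projects (id INTEGER PRIMARY KEY, customer_id INTEGER,
                       technology TEXT, hours INTEGER, revenue_dkk INTEGER);
CREATE TABLE entity_mentions (project_id INTEGER, term TEXT, mentions INTEGER);
INSERT INTO customers VALUES (1, 'Customer A'), (2, 'Customer B');
INSERT INTO projects VALUES (10, 1, 'Google', 1200, 1500000),
                            (20, 2, 'Google', 800, 2000000);
INSERT INTO entity_mentions VALUES (10, 'e-commerce', 14), (20, 'e-commerce', 30);
"""

QUERY = """
SELECT DISTINCT c.name
FROM customers AS c
JOIN projects AS p ON p.customer_id = c.id
JOIN entity_mentions AS m ON m.project_id = p.id
WHERE p.technology = 'Google'
  AND p.hours >= 1000            -- at least 1000 hours of consulting time
  AND p.revenue_dkk > 1000000    -- more than 1 MDKK
  AND m.term = 'e-commerce'
  AND m.mentions >= ?            -- threshold for many times is an assumption
ORDER BY c.name
"""


def demo_connection():
    """In-memory database populated with the hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA_AND_SAMPLE_ROWS)
    return conn


def matching_customers(conn, min_mentions=10):
    return [name for (name,) in conn.execute(QUERY, (min_mentions,))]
```

Even in this simplified form, the query depends on a mention-count table that only exists because the documentation was processed by entity extraction.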
Discover insights about your products | Monitor trends
Motivation
• Search for the product name ‘Tusin’
• The product is mentioned in different sources, in different contexts (user feedback, marketing material, internal specifications), and with different terminology (on social media compared to the website)
• How do you keep track of all this information?
• How easy is it to identify trends?
Identify the same product in different contexts
• Identify the entity denoting the same product from different sources
Internal Production Specification (internal name for the same product)
Product Marketing Material from Website
User feedback (feedback about the marketing material or the user experience; mentions the product)
Internal Issues Management System (metric)
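Identifying the same product across sources can start with an alias table that maps every surface form to one canonical ID. In the sketch below, the internal code name `TX-4411` and the ID `product:tusin` are invented; only “Tusin” comes from the slides. A real system would also have to handle misspellings and ambiguous names, which this sketch does not.

```python
import re
from collections import Counter

# Hypothetical alias table: every surface form that refers to the product.
ALIASES = {
    "tusin": "product:tusin",
    "#tusin": "product:tusin",    # social media hashtag
    "tx-4411": "product:tusin",   # invented internal code name
}

TOKEN = re.compile(r"#?[\w-]+")  # words, optionally prefixed by '#', may contain '-'


def resolve_products(text, aliases=ALIASES):
    """Canonical product IDs mentioned in the text."""
    return {aliases[tok.lower()] for tok in TOKEN.findall(text) if tok.lower() in aliases}


def mentions_by_source(items):
    """Count (product, source) pairs from (source, text) items."""
    counts = Counter()
    for source, text in items:
        for product in resolve_products(text):
            counts[(product, source)] += 1
    return counts
```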
Monitor trends on your products
• Search for ‘Tusin’
or
• Remember it as a search term and create a dashboard with content driven by search
Monitor trends
Reduce the time it takes to reply to customers or users
Stay competitive
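Monitoring trends over resolved product mentions can be as simple as counting per calendar week and flagging a sudden rise. The week-over-week comparison, the doubling threshold, and the sample dates are illustrative assumptions, not a feature described in the slides.

```python
from collections import Counter
from datetime import date, timedelta


def week_start(day):
    """Monday of the week containing `day`."""
    return day - timedelta(days=day.weekday())


def weekly_counts(mentions):
    """Count (week_start, product) pairs from (date, product_id) mentions."""
    return Counter((week_start(day), product) for day, product in mentions)


def is_rising(counts, product, week, factor=2.0):
    """True if this week's mentions are at least `factor` times last week's (min. 1)."""
    previous = counts[(week - timedelta(weeks=1), product)]
    current = counts[(week, product)]
    return current >= factor * max(previous, 1)


# Hypothetical mentions of one product, e.g. produced by resolve-style alias matching.
MENTIONS = [
    (date(2014, 5, 6), "product:tusin"),
    (date(2014, 5, 12), "product:tusin"),
    (date(2014, 5, 13), "product:tusin"),
    (date(2014, 5, 14), "product:tusin"),
]
```

A dashboard driven by a saved search could run such a check for each week and highlight products whose mentions jump.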
Prevent duplicate work | Find similar content
Motivation
• You have just started working on a new material at a construction company
• What is the cost of duplicating the work?
• Will you perform a search on previous work?
• What if another team has a similar initiative?
Enhanced Search Index
• Automatically extract entities and representative keywords from content
Documents
Announcements
Public Emails
Newsfeed
Steel Structures
Glass Type 1.A
Project ANSA Torso Tower
Polyethylene Terephthalate
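One common baseline for extracting representative keywords is TF-IDF: terms that are frequent in a document but rare across the collection score highest. This is a generic sketch with a tiny hand-picked stopword list; the slides do not say which keyword method the framework uses.

```python
import math
import re
from collections import Counter

STOPWORDS = {"a", "and", "for", "in", "of", "on", "the", "to"}  # minimal, illustrative


def tokenize(text):
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def top_keywords(docs, k=3):
    """The k highest-scoring TF-IDF terms for each document (ties broken alphabetically)."""
    tokenized = [tokenize(doc) for doc in docs]
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    n_docs = len(docs)
    keywords = []
    for tokens in tokenized:
        if not tokens:
            keywords.append([])
            continue
        tf = Counter(tokens)
        scores = {t: (c / len(tokens)) * math.log(n_docs / doc_freq[t]) for t, c in tf.items()}
        keywords.append(sorted(scores, key=lambda t: (-scores[t], t))[:k])
    return keywords
```

For the three documents "steel structures for the tower", "glass for the tower facade", and "steel beams and steel structures", the top two keywords of the second document are "facade" and "glass", since "tower" also appears elsewhere.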
Prevent duplicate work
• Get suggestions of similar work based on extracted entities
Identify similar work early in the project
Identify potential collaborations
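Given a set of extracted entities per project, one simple way to suggest similar work is Jaccard similarity: the number of shared entities divided by the number of distinct entities in either project. The project names, entity sets, and the 0.3 threshold below are hypothetical.

```python
def jaccard(a, b):
    """|a & b| / |a | b|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_work(new_entities, existing, threshold=0.3):
    """Existing projects whose entity sets overlap enough with the new one, best first."""
    scored = [(jaccard(new_entities, entities), name) for name, entities in existing.items()]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(name, round(score, 2)) for score, name in scored if score >= threshold]


NEW_PROJECT = {"Polyethylene Terephthalate", "Glass Type 1.A", "Steel Structures"}
EXISTING_PROJECTS = {  # hypothetical projects and their extracted entities
    "Project A": {"Steel Structures", "Glass Type 1.A", "Torso Tower"},
    "Project B": {"Concrete", "Steel Structures"},
    "Project C": {"Timber"},
}
```

Here only Project A (2 shared entities out of 4 distinct ones, score 0.5) passes the threshold.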
Visualize your content in a new way | Enrich documents with metadata
Motivation
• Search for “financial results Copenhagen”
• Search results: documents
• Clicking on a result opens the document
• Does this search answer the user question?
Identify entities in documents
• Identify locations, revenues, departments, etc. in semi-structured and unstructured data
• Combine with data in spreadsheets or databases
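Combining extracted facts with structured data can be sketched with a regular expression for revenue figures and a small list of locations. The pattern, the units, the location list, and the assumption that the spreadsheet rows cover figures not already stated in the documents (so that summing does not double-count) are all illustrative.

```python
import re

# Hypothetical pattern and gazetteer; a real system would use the trained extractor.
REVENUE = re.compile(r"revenue of (\d+(?:\.\d+)?) (MDKK|KDKK)", re.IGNORECASE)
SCALE = {"MDKK": 1_000_000, "KDKK": 1_000}
LOCATIONS = {"Copenhagen", "Aarhus", "Stockholm"}


def extract_facts(text):
    """(location, revenue in DKK) pairs found in the same sentence."""
    facts = []
    for sentence in re.split(r"(?<=\.)\s+", text):
        places = [loc for loc in sorted(LOCATIONS) if loc in sentence]
        for amount, unit in REVENUE.findall(sentence):
            for place in places:
                facts.append((place, float(amount) * SCALE[unit.upper()]))
    return facts


def revenue_by_location(document_texts, spreadsheet_rows):
    """Computed answer: total revenue per location from documents plus spreadsheet rows."""
    totals = {}
    for text in document_texts:
        for place, amount in extract_facts(text):
            totals[place] = totals.get(place, 0.0) + amount
    for place, amount in spreadsheet_rows:
        totals[place] = totals.get(place, 0.0) + amount
    return totals
```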
Documents
Database
Spreadsheets
Answer
Visualize your content in a new way
• Search for “financial results Copenhagen”
• Additional information shown
• Can show computed results
Enrich documents with metadata
Visualize the content
Compute answers
Make comparisons
Create dashboards based on searches
Help your users find their dream home | Extract potential decision criteria from natural language
Motivation
• Searching for an ‘apartment with a good view, located in central Copenhagen, well sized bathroom, close to shopping outlets, preferably with 3 rooms’
• The apartment information consists mostly of structured data (m², number of rooms, postcode, floor)
• Can we improve the search experience?
Long list of static filters
Search query consists of an area (postcode, street, etc.)
Understanding what the users want
• Here’s how Facebook helps users define their queries:
• Can we interpret the query ‘apartment with a good view, located in central Copenhagen, well sized bathroom, close to shopping outlets, preferably with 3 rooms’?
Understanding what the users want
• Searching for ‘apartment with a good view, located in central Copenhagen, well sized bathroom, close to shopping outlets, preferably with 3 rooms’
• Apartments with 3 rooms are shown in the search results, but those with fewer rooms are not excluded
• Those that mention shopping outlets (such as Netto or Fakta) are boosted
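The interpretation step can be sketched as turning the free-text query into soft preferences that boost listings rather than filter them out. The regular expressions and the boost values are assumptions; only the 3-room preference and the Netto/Fakta example come from the slides.

```python
import re

SHOP_NAMES = {"netto", "fakta"}  # examples of shopping outlets from the slide


def parse_preferences(query):
    """Extract soft preferences from a natural-language apartment query."""
    prefs = {}
    rooms = re.search(r"(\d+)\s+rooms?\b", query, re.IGNORECASE)
    if rooms:
        prefs["rooms"] = int(rooms.group(1))
    if re.search(r"\bshop(?:s|ping)?\b", query, re.IGNORECASE):
        prefs["near_shopping"] = True
    return prefs


def rank_listings(listings, prefs):
    """Boost matching listings; nothing is excluded, non-matches just rank lower."""
    def score(listing):
        value = 0.0
        if prefs.get("rooms") == listing["rooms"]:
            value += 2.0  # assumed boost for the preferred number of rooms
        description = listing["description"].lower()
        if prefs.get("near_shopping") and any(shop in description for shop in SHOP_NAMES):
            value += 1.0  # assumed boost for mentioning a shopping outlet
        return value

    return sorted(listings, key=lambda listing: (-score(listing), listing["id"]))
```

For the query on the slide, a 2-room listing is still returned, just ranked below the 3-room listings, and a 3-room listing that mentions Netto ranks first.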
Interpret natural language
Boost results based on ‘preferences’
Better search experience
Increase user satisfaction
Boost those with 3 rooms (on the map, a boost can be represented by a bigger pointer)
Free text search
Behind the scenes
Entity Extraction
Entity extraction is the process of identifying named entities (such as locations, people, companies) in a block of text
Add structure to unstructured data
New possibilities of interpreting the data
Improve data quality and findability of documents
Reduce time spent by users manually structuring content
Entity Extraction Framework
Combines dictionaries, a trained model, and regular expressions, depending on needs
Scalable, adaptable and extendable framework
Automatically enrich documents with named entities
Iterative approach to continuously improve accuracy
Built by Findwise in response to our customers' requirements and vision
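A hybrid extractor of this kind can be sketched as several candidate sources (a dictionary, regular expressions, and an optional model) whose overlapping matches are resolved by preferring the longest span. The labels, the longest-span rule, and the `model` callable are assumptions for illustration, not the framework's actual design.

```python
import re


def dictionary_matches(text, gazetteer):
    """Whole-word matches of dictionary terms, as (start, end, label, text)."""
    found = []
    for term, label in gazetteer.items():
        for m in re.finditer(r"\b" + re.escape(term) + r"\b", text):
            found.append((m.start(), m.end(), label, m.group()))
    return found


def regex_matches(text, patterns):
    """Matches of labelled regular expressions, as (start, end, label, text)."""
    found = []
    for label, pattern in patterns.items():
        for m in re.finditer(pattern, text):
            found.append((m.start(), m.end(), label, m.group()))
    return found


def merge(candidates):
    """Keep non-overlapping entities, preferring longer spans, returned in text order."""
    chosen = []
    for ent in sorted(candidates, key=lambda e: (-(e[1] - e[0]), e[0])):
        if all(ent[1] <= c[0] or ent[0] >= c[1] for c in chosen):
            chosen.append(ent)
    return sorted(chosen)


def extract(text, gazetteer, patterns, model=None):
    candidates = dictionary_matches(text, gazetteer) + regex_matches(text, patterns)
    if model is not None:
        candidates += model(text)  # hypothetical hook: a trained model returning the same tuples
    return merge(candidates)


GAZETTEER = {"Sarah Jensen": "PERSON", "Jensen": "PERSON", "Findwise": "ORG", "Copenhagen": "LOC"}
PATTERNS = {"DATE": r"\d{4}-\d{2}-\d{2}"}
```

In "Sarah Jensen met Findwise in Copenhagen on 2014-05-13.", the shorter dictionary match "Jensen" is discarded because it overlaps the longer "Sarah Jensen".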
Entity Extraction Framework
Autotag, Edit, Evaluate, Incremental train
90% accuracy
The Danish and Swedish entity extractors can reach 90% accuracy
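The slides do not say how the 90% figure is measured. One common way to carry out the Evaluate step is exact-match precision, recall, and F1 over (start, end, label) spans, sketched below as one possible choice.

```python
def evaluate(predicted, gold):
    """Exact-match precision, recall, and F1 over sets of (start, end, label) spans."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

For example, if the gold standard has four entities and the extractor returns three, two of them correct, precision is 2/3, recall is 1/2, and F1 is 4/7. Comparing these numbers before and after each incremental training round shows whether the edits are paying off.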
Graphical Annotation Tool
Visual representation of annotated documents
Annotate more documents to improve precision
Easy-to-use, point and click interaction
Built by Findwise in response to our customers' requirements and vision
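Annotations made in a point-and-click tool are usually stored as character spans, while many sequence-labelling models train on token-level BIO tags (B = beginning, I = inside, O = outside an entity). The sketch below shows that conversion with naive whitespace tokenization; the span format is an assumption, since the annotation tool's export format is not described.

```python
import re


def to_bio(text, spans):
    """Convert character-span annotations (start, end, label) into token-level BIO tags."""
    tagged = []
    for m in re.finditer(r"\S+", text):  # naive whitespace tokenization
        tag = "O"
        for start, end, label in spans:
            if m.start() >= start and m.end() <= end:
                tag = ("B-" if m.start() == start else "I-") + label
                break
        tagged.append((m.group(), tag))
    return tagged
```

Each newly annotated document converted this way adds training examples for the next incremental training round.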
Anders Häggdahl