Building a Foundation for Info Apps
Tom Reamy
Chief Knowledge Architect
KAPS Group
Program Chair – Text Analytics World
Knowledge Architecture Professional Services
http://www.kapsgroup.com
Agenda
Introduction
A Semantic Platform – What and Why
Text Analytics – What and Why
– Getting Started with Text Analytics
Building on the Platform:
– Search
– Range of Apps
Conclusion
Introduction: KAPS Group
Knowledge Architecture Professional Services – Network of Consultants
Applied Theory – Faceted taxonomies, complexity theory, natural categories, emotion taxonomies
Services:
– Strategy – IM & KM - Text Analytics, Social Media, Integration
– Taxonomy/Text Analytics development, consulting, customization
– Text Analytics Quick Start – Audit, Evaluation, Pilot
– Social Media: Text-based applications – design & development
Partners – Smart Logic, Expert Systems, SAS, SAP, IBM, FAST, Concept Searching, Attensity, Clarabridge, Lexalytics
Clients:
– Genentech, Novartis, Northwestern Mutual Life, Financial Times, Hyatt, Home Depot, Harvard Business Library, British Parliament, Battelle, Amdocs, FDA, GAO, World Bank, etc.
Presentations, Articles, White Papers – www.kapsgroup.com
Building a Foundation for Info Apps
What is a Semantic Platform?
Semantic Layer = Taxonomies, Metadata, Vocabularies + Text Analytics – adding cognitive science
Technology Layer
– Search, Content Management, SharePoint, Intranets
Publishing process, multiple users & info needs
– Hybrid human-automatic structure (tagging)
Infrastructure – Not an Application
– Business / Library / KM / EA and IT
Building on the Foundation
– Info Apps (Search-based Applications)
Foundation of foundation – Text Analytics
Building a Foundation for Info Apps
Why a Semantic Platform
Search Failed – lack of semantics
– Results of Findwise survey – deep dissatisfaction
– Ten years of development = ?
Content Management under-performing – lack of semantics
Taxonomy and Metadata – a solution, but it failed
– Taxonomy – formal model of a domain
– Library science good for some things – indexing, etc.
Semantics is about language, meaning, information
– And structure = taxonomy Plus
– Need cognitive science – how people think – Text Analytics
Solution = Strategic Vision + Quick Start
Building a Foundation for Info Apps
Text Analytics Features
Noun Phrase Extraction / Fact Extraction
– Catalogs with variants, rule based dynamic
– Relationships of entities – Ontologies of people-organizations, etc.
Sentiment Analysis – Products and Phrases
– Statistics, Dictionaries, & rules – Positive and Negative
Summarization – replace snippets
Auto-categorization – built on a taxonomy
– Training sets, Terms, Semantic Networks
– Rules: AND, OR, NOT, DIST, PARAGRAPH, SENTENCE
– Foundation – subjects, disambiguation, add intelligence to all
Ontologies – fact extraction + reasoning about relationships
Text Mining – NLP, machine learning, predictive analytics
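The categorization rule operators listed above (AND, OR, NOT, DIST, SENTENCE) can be illustrated with a small sketch. This is not any vendor's rule syntax. The function names, the naive sentence splitter, and the "money laundering" rule are all assumptions made for illustration.

```python
import re

def tokenize(text):
    """Lowercase word tokens, in document order."""
    return re.findall(r"[a-z0-9']+", text.lower())

def sentences(text):
    """Naive split on ., ! and ? (an assumption, not a real NLP splitter)."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]

def has(term):
    """Rule: the term occurs somewhere in the text."""
    return lambda text: term in tokenize(text)

def AND(*rules):
    return lambda text: all(r(text) for r in rules)

def OR(*rules):
    return lambda text: any(r(text) for r in rules)

def NOT(rule):
    return lambda text: not rule(text)

def DIST(n, a, b):
    """Rule: terms a and b occur within n tokens of each other."""
    def rule(text):
        toks = tokenize(text)
        pos_a = [i for i, t in enumerate(toks) if t == a]
        pos_b = [i for i, t in enumerate(toks) if t == b]
        return any(abs(i - j) <= n for i in pos_a for j in pos_b)
    return rule

def SENTENCE(*rules):
    """Rule: all sub-rules match inside one and the same sentence."""
    return lambda text: any(all(r(s) for r in rules) for s in sentences(text))

# Hypothetical rule for a "Money Laundering" taxonomy node
money_laundering = AND(
    OR(has("laundering"), DIST(5, "shell", "company")),
    NOT(has("laundry")),
)

doc = "The audit found a shell holding company. Funds moved offshore."
print(money_laundering(doc))
```

A PARAGRAPH operator would follow the same pattern as SENTENCE, with a paragraph splitter in place of the sentence splitter.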
Building a Foundation for Info Apps
Adding Structure to Unstructured Content
How do you bridge the gap – taxonomy to documents?
Tagging documents with taxonomy nodes is tough
– And expensive – central or distributed
Library staff – experts in categorization, not subject matter
– Too limited, narrow bottleneck
– Often don’t understand business processes and business uses
Authors – Experts in the subject matter, terrible at categorization
– Intra and Inter inconsistency, “intertwingleness”
– Choosing tags from taxonomy – complex task
– Folksonomy – almost as complex, wildly inconsistent
– Resistance – not their job, cognitively difficult = non-compliance
Text Analytics is the answer(s)!
Building a Foundation for Info Apps
Adding Structure to Unstructured Content
Text Analytics and Taxonomy Together – Platform
– Text Analytics provides the power to apply the taxonomy
– And metadata of all kinds
– Consistent in every dimension, powerful and economical
Hybrid Model
– Publish Document -> Text Analytics analysis -> suggestions for categorization, entities, metadata -> present to author
– Cognitive task is simple -> react to a suggestion instead of selecting from your head or from a complex taxonomy
– Feedback – if author overrides -> suggestion for new category
– Facets – Requires a lot of Metadata - Entity Extraction feeds facets
Hybrid – Automatic is really a spectrum – depends on context
– Automatic – adding structure at search results
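The hybrid model above (publish, analyze, suggest, author reacts, feedback) might be sketched roughly as follows. The taxonomy, the keyword matching, and the names `suggest`, `author_review` and `Review` are hypothetical stand-ins. A real text analytics engine would replace the simple term lookup.

```python
from dataclasses import dataclass, field

# Hypothetical taxonomy: category -> trigger terms (illustration only)
TAXONOMY = {
    "Finance": {"budget", "invoice", "revenue"},
    "Legal": {"contract", "liability", "compliance"},
}

@dataclass
class Review:
    suggested: list                 # what the software proposed
    accepted: list                  # what ends up on the document
    new_category_candidates: list = field(default_factory=list)

def suggest(text):
    """Stand-in for text analytics: propose categories whose terms appear."""
    words = set(text.lower().split())
    return sorted(cat for cat, terms in TAXONOMY.items() if words & terms)

def author_review(text, author_choice=None):
    """The author reacts to suggestions instead of picking from the taxonomy.

    If the author overrides with a tag that is not in the taxonomy,
    it is recorded as a candidate for a new category (the feedback step).
    """
    suggested = suggest(text)
    if author_choice is None:       # author simply accepts the suggestions
        return Review(suggested, suggested)
    candidates = [c for c in author_choice if c not in TAXONOMY]
    return Review(suggested, list(author_choice), candidates)

accepted = author_review("Annual budget and revenue report")
overridden = author_review("New contract terms", ["Procurement"])
print(accepted.accepted, overridden.new_category_candidates)
```

The point of the design is that the author's default action is a single accept, and only an override creates work for the taxonomy team.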
Quick Start for Text Analytics
Step 1: Start with Self Knowledge
Ideas – Content and Content Structure
– Map of Content – Tribal language silos
– Structure – articulate and integrate
– Taxonomic resources
People – Producers & Consumers
– Communities, Users, Central Team
Activities – Business processes and procedures
– Semantics, information needs and behaviors
– Information Governance Policy
Technology – CMS, Search, portals, text analytics
– Applications – BI, CI, Semantic Web, Text Mining
Quick Start for Text Analytics
Step 2: Software Evaluation: Different Type of Evaluation
Traditional Software Evaluation - Start
– Filter One - Ask Experts - reputation, research – Gartner, etc.
• Market strength of vendor, platforms, etc.
• Feature scorecard – minimum, must have, filter to top 6
– Filter Two – Technology Filter – match to your overall scope and capabilities – Filter not a focus
– Filter Three – In-Depth Demo – 3-6 vendors
Reduce to 1-3 vendors
Vendors have different strengths in multiple environments
– Millions of short, badly typed documents, Build application
– Library 200 page PDF, enterprise & public search
Essential Step – POC or Pilot – search or first Info App
Quick Start for Text Analytics
Step 3: Proof of Concept / Quick Start
POC – understand how text analytics can work in your environment
Learn the software – internal resources trained by doing
Learn the language – syntax (Advanced Boolean)
Learn categorization and extraction
Good categorization rules
– Balance of general and specific
– Balance of recall and precision
Develop or refine taxonomies for categorization
POC – can be the Quick Start or the First Application
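The balance of recall and precision mentioned above can be made concrete during a POC by scoring each rule against a hand-tagged test set. The sketch below uses the standard definitions. The document IDs and the two rule outputs are invented for illustration.

```python
def precision_recall(predicted, relevant):
    """Precision = correct hits / all hits; recall = correct hits / all relevant."""
    predicted, relevant = set(predicted), set(relevant)
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical hand-tagged test set: documents that truly belong to the category
relevant = {"d1", "d2", "d3", "d4"}

# A general rule catches everything relevant, plus a lot of noise
general_rule_hits = {"d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"}

# A specific rule is always right but misses half the relevant documents
specific_rule_hits = {"d1", "d2"}

general = precision_recall(general_rule_hits, relevant)
specific = precision_recall(specific_rule_hits, relevant)
print("general:", general)
print("specific:", specific)
```

Here the general rule scores precision 0.5 and recall 1.0, while the specific rule scores precision 1.0 and recall 0.5. Rule refinement is the work of moving both numbers up together.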
Development, Implementation
Quick Start – First Application: Search and TA
Simple Subject Taxonomy structure
– Easy to develop and maintain
Combined with categorization capabilities
– Added power and intelligence
Combined with people tagging, refining tags
Combined with Faceted Metadata
– Dynamic selection of simple categories
– Allow multiple user perspectives
• Can’t predict all the ways people think
• Monkey, Banana, Panda
Combined with ontologies and semantic data
– Multiple applications – Text mining to Search
– Combine search and browse
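Dynamic selection over faceted metadata, as described above, amounts to filtering on whichever facets the user picks and recounting the rest. A minimal sketch follows, assuming documents already carry facet values from categorization and entity extraction. The documents, facet names and organization names are made up.

```python
from collections import Counter

# Hypothetical tagged documents; facet values would come from text analytics
DOCS = [
    {"title": "Q3 outlook", "subject": "Finance", "org": "Acme", "doc_type": "Report"},
    {"title": "Supplier terms", "subject": "Legal", "org": "Acme", "doc_type": "Contract"},
    {"title": "Audit memo", "subject": "Finance", "org": "Globex", "doc_type": "Memo"},
]

def filter_docs(docs, **selected):
    """Keep documents that match every selected facet value."""
    return [d for d in docs if all(d.get(k) == v for k, v in selected.items())]

def facet_counts(docs, facet):
    """Count values of one facet over the current result set."""
    return Counter(d[facet] for d in docs if facet in d)

# One user starts from subject, another might start from organization
finance = filter_docs(DOCS, subject="Finance")
print(facet_counts(finance, "org"))
print([d["title"] for d in filter_docs(DOCS, subject="Finance", org="Acme")])
```

Because any facet can be the starting point, the same content supports multiple user perspectives without a single fixed hierarchy.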
Building a Foundation for Info Apps
What are Info Apps?
Search-based Applications Plus E-Discovery, Behavior Prediction, document duplication, BI & CI, etc.
Legal Review
– Significant trend – computer-assisted review (manual = too many)
– TA – categorize and filter to smaller, more relevant set
– Payoff is big – One firm with 1.6 M docs – saved $2M
Expertise Location
– Data (HR, project) plus text – authored documents – subject & level
Financial Services
– Combine unstructured text (why) and structured data (what)
– Anti-Money Laundering
Building a Foundation for Info Apps
Pronoun Analysis: Fraud Detection - Enron Emails
Patterns of “Function” words reveal wide range of insights
Function words = pronouns, articles, prepositions, conjunctions, etc.
– Used at a high rate, short and hard to detect, very social, processed in the brain differently than content words
Areas: sex, age, power-status, personality – individuals and groups
Lying / Fraud detection: Documents with lies have
– Fewer and shorter words, fewer conjunctions, more positive emotion words
– More use of “if, any, those, he, she, they, you”, less “I”
– More social and causal words, more discrepancy words
Current research – 76% accuracy in some contexts
Text Analytics can improve accuracy and utilize new sources
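The function-word signals listed above can be turned into simple per-document rates. The sketch below only computes features. It does not classify anything as a lie, and the short word lists are illustrative assumptions, not the lists used in the research cited.

```python
import re
from collections import Counter

# Small illustrative word lists (assumptions, not the research lexicons)
FIRST_PERSON = {"i", "me", "my", "mine", "myself"}
OTHER_PRONOUNS = {"he", "she", "they", "you", "those"}
CONJUNCTIONS = {"and", "but", "or", "because", "so"}

def function_word_profile(text):
    """Return rates per token for a few function-word groups, plus word length."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens) or 1          # avoid division by zero on empty text
    counts = Counter(tokens)

    def rate(words):
        return sum(counts[w] for w in words) / total

    return {
        "first_person": rate(FIRST_PERSON),
        "other_pronouns": rate(OTHER_PRONOUNS),
        "conjunctions": rate(CONJUNCTIONS),
        "avg_word_length": sum(len(t) for t in tokens) / total,
    }

profile = function_word_profile("I sent my report and they approved it")
print(profile)
```

Rates like these would be inputs to a downstream model or a reviewer's dashboard, compared against a baseline for the same author or group.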
Conclusions
Info Apps are based on search, and search needs help
Text analytics with taxonomy & metadata = semantic platform
– Formal and informal language and cognition
Semantic Infrastructure
– Knowledge Audit -> Content, People, Technology, Processes
• Strategic Vision
– Integration of text analytics, search, content management
– Hybrid Model of tagging – best of human & machine
– Build integrated Info Apps
Platform vs. Apps = Yes
Think Big (Semantics), Build Small, Build Integrated
Questions?
KAPS Group
Knowledge Architecture Professional Services
http://www.kapsgroup.com