A New Content Processing Framework for Search Applications
Iain Fletcher [email protected]

  • Slide 1

A New Content Processing Framework for Search Applications
Iain Fletcher [email protected]

  • Slide 2

Agenda
Briefly About Search Technologies
Key Issues for Enterprise Search
A New Content Processing Framework for Search Applications
How do we use it?
What does it look like?
Use case example

  • Slide 3

Search Technologies overview
The leading IT services company focused on search engines
Consulting
Implementation
Managed services
Technology independent, working with most of the leading search engines
90 staff, 250+ customers

  • Slide 4

Search Technologies overview
San Diego, CA
San Jose, CR
Herndon, VA
Ascot, UK
Boston, MA
Cincinnati, OH

  • Slide 5

Executive team
Executive, title, and enterprise search industry experience:
Kamran Khan, President & CEO. 18 years: International Sales, VP Sales, Executive
John Steinhauer, VP Technology. 16 years: Development Management, Project Management, Executive
Paul Nelson, Chief Architect. 22 years: Development, Innovation, Architecting, Dev. Management
Graham Charlesworth, VP Europe. 16 years: Business Development, VP Sales, Executive
Phil Lewis, Tech. Director, Europe. 19 years: Development, Innovation, Architecting, Project Management
Dennis Tran, VP & Founder. 21 years: International Sales, VP Sales
John Back, VP Sales. 15 years: Sales, Federal Sales Director
Iain Fletcher, VP Marketing. 16 years: International Sales, Product Management, VP Marketing
Note: number of years in the search engine industry

  • Slide 6

Selected customers

  • Slide 7

A New Content Processing Framework for Search Applications

  • Slide 8

Agenda
Briefly About Search Technologies
Key Issues for Enterprise Search
A New Content Processing Framework for Search Applications
How do we use it?
What does it look like?
Use case example

  • Slide 9

Enterprise Search - An Indifferent Reputation
Major surveys show that no progress has been made during the last 10 years
"Searchers are successful in finding what they seek 50% of the time or less" (2001, IDC, Quantifying Enterprise Search)
"More than half cannot find the information they need using their Enterprise search system" (2011, MindMetre/SmartLogic, Mind the Enterprise Search Gap)

  • Slide 10

Search Fundamentals

  • Slide 11

Metadata Supports Relevance Ranking

  • Slide 12

Metadata Supports Relevance Ranking
Supported by great metadata!
Title
Meta description
URL
Inbound links
Alt tag text
Etc.
Provided for free by millions of SEO practitioners

  • Slide 13

Key Issues
Almost all modern search functions are driven by data structure

  • Slide 14

Key Issues
The majority of serious problems in serious search systems are caused by data quality issues
Also...
Big Data and BI from unstructured data will face the same challenges
Can you trust an analysis if you are unsure of data provenance?

  • Slide 15

Data quality examples
The subscription portal caught out by template information
The Intranet search skewed by a new piece of hardware
The Intranet search where great quality was the problem!

  • Slide 16

Key Issues
Data structure and quality issues are addressed in the indexing pipelines of search engines
Cleaning, enriching, normalizing, granularizing...
It is about process as much as technology
And data constantly evolves
Sometimes the built-in indexing pipeline is not good enough (issues with scale, flexibility or transparency)
Some search engines don't really have one
We've written our own

  • Slide 17

Agenda
Briefly About Search Technologies
Key Issues for Enterprise Search
A New Content Processing Framework for Search Applications
How do we use it?
What does it look like?
Use case example

  • Slide 18

Document Processing Methodology for Search (DPMS): The Philosophy
Understand the Document Model
Understand the User Model
Includes business-level requirements
Create the Search Engine Model
Search = the pivot point between User and Data
Document everything

  • Slide 19

DPMS: The Methodology
Phases: 1 Assessment, 2 Detailed Analysis, 3 Execution
[Diagram labels, in extraction order: Assessment (Search Technologies Architect and Business Analyst); DPMS Analysis (Knowledge Engineer, Business Analyst, etc.); Assessment Report; Expert assessment and recommendations; Validation; Aspire; DMDs; Review (Architect, Domain Experts, Peers); Implementation (Developer); Validate DMDs; Search Engine]

  • Slide 20

DPMS: The Implementation

  • Slide 21

Introducing Aspire
Think of it as a stand-alone indexing pipeline with a framework + component architecture
Framework built for scalability, performance and flexibility; designed to use cloud elasticity
Components built to be autonomous and transparent

  • Slide 22

Technology Suite
100% Java
OSGi: the Dynamic Module System for Java (see www.osgi.org)
Apache Felix: open source implementation of OSGi
Jetty: embedded HTTP server
Maven & Maven Repositories: for component deployment

  • Slide 23

Component Configuration
Any number of document processing pipelines can be used in an application
Disparate data sources will need different treatment
Components can be shared where appropriate
Configurations are easy to change

  • Slide 24

Component autonomy
Components communicate via XML
Each component has a known and transparent input and output, and can be tested in isolation
This simplifies problem diagnosis, promotes transparency and controls cost-of-ownership

  • Slide 25

Data Quality Monitoring
Components have built-in quarantine systems to monitor data quality
Content is constantly evolving
This provides transparency and enables content issues to be diagnosed and resolved faster

  • Slide 26

The Component Library
Search Technologies maintains a library of
components
Currently there are more than 70
Components can be as simple as 3 lines of Groovy script, or complex 3rd-party technologies
Many applications can be addressed using existing components + configuration

  • Slide 27

Component Upgrading
Components can be upgraded in-situ from a cloud-based service, without stopping/restarting the system
Helpful in the maintenance of complex or mission-critical systems

  • Slide 28

Component control
Every component has its own control / status page

  • Slide 29

A very simple example

  • Slide 30

Security expansion example

  • Slide 31

Patent Assignee Name Normalization

  • Slide 32

Complexity example
CPA Global Discover
The world's leading patent research portal
80 million patents from 95 patent offices
More than a dozen navigators built
Numerous graphical search results display options
Whole document comparison features

  • Slide 33

In Summary
Many applications today don't need this level of diligence
But as data and data dynamism grow, more will
A stand-alone unstructured content processing system can serve multiple applications, and makes sense for some companies
Method. Diligence. Transparency: it's not rocket science...
Applying this approach to enterprise search is a key part of moving user satisfaction forward during the next few years

  • Slide 34

Thank You!
Iain Fletcher [email protected]
http://uk.linkedin.com/in/iainfletcher
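
Illustrative sketch (not from the slides, and not Aspire code or its API): the deck describes components that exchange XML, have a known input and output, can be tested in isolation, and quarantine problem content. The Python below is one way those ideas might look in miniature, using assignee name normalization as the example component. The function names, the XML field names and the suffix list are all hypothetical, and well-formed XML input is assumed.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical list of legal suffixes to drop; for illustration only.
LEGAL_SUFFIXES = {"inc", "corp", "corporation", "ltd", "llc", "co"}


def require_title(doc):
    """Component: reject documents that have no usable <title>."""
    title = doc.findtext("title")
    if not title or not title.strip():
        raise ValueError("missing title")
    return doc


def normalize_assignee(doc):
    """Component: add <assignee_normalized>, derived from <assignee>."""
    raw = doc.findtext("assignee")
    if not raw or not raw.strip():
        raise ValueError("missing assignee")
    # Replace punctuation with spaces, lower-case, then drop legal suffixes.
    words = re.sub(r"[^\w\s]", " ", raw).lower().split()
    kept = [w for w in words if w not in LEGAL_SUFFIXES]
    ET.SubElement(doc, "assignee_normalized").text = " ".join(kept)
    return doc


def run_pipeline(xml_docs, components):
    """Pass each XML string through the components, in order.

    A document that fails in any component is quarantined together with the
    failing component's name and the reason, and the run carries on.
    Assumes every input string is well-formed XML.
    """
    processed, quarantine = [], []
    for xml_text in xml_docs:
        doc = ET.fromstring(xml_text)
        try:
            for component in components:
                doc = component(doc)
        except ValueError as err:
            quarantine.append((xml_text, component.__name__, str(err)))
            continue
        processed.append(ET.tostring(doc, encoding="unicode"))
    return processed, quarantine


docs = [
    "<doc><title>Widget</title><assignee>Acme Widgets, Inc.</assignee></doc>",
    "<doc><title>Gadget</title></doc>",  # no assignee: ends up in quarantine
]
processed, quarantine = run_pipeline(docs, [require_title, normalize_assignee])
```

Because each component is a plain function from one XML element to another, it can be exercised on its own with a single hand-written document, and the quarantine list records which component rejected which input and why.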