Search Systems
Uploaded by miles-price
Search Systems: Information Architecture
Does your site need search?
▫Does your site have enough content?
▫Will this divert resources from navigation systems?
▫Do you have the time and knowledge to optimize the search system?
▫Are there alternatives?
▫Will your users bother with search?
Before you add a search system
•Do not assume that a search engine alone will satisfy all users’ information needs
•Search should be used in addition to well-structured navigation, not as a replacement for it
Need a search system if…
•You have too much content to browse, or the content warrants it
▫Eg – course catalog, research site, large site like Microsoft, real estate site
•You have fragmented subsites – Eg – UB
•The site is a learning tool – Eg – web coding tutorials online
•The site is dynamic, like a newspaper where articles are archived and the only way to access them is to search
Search System Anatomy
•Indexing by the search engine (SE)
•Web sites need search engine optimization (SEO)
•Spiders
•What is indexed – URL, title, headings, keywords, content
•Search interface
•Boolean operators (and, or, not)
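As a rough illustration of indexing and Boolean operators, the sketch below builds a tiny inverted index and evaluates and / or / not queries against it. The document collection and the function names are hypothetical, chosen only for this example; a real search engine does far more (tokenization, ranking, field handling).

```python
from collections import defaultdict

# Hypothetical mini-collection: document id -> text.
DOCS = {
    1: "search systems and navigation",
    2: "navigation systems for large sites",
    3: "search engine indexing with spiders",
}

def build_index(docs):
    """Map each lowercase term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

INDEX = build_index(DOCS)

def boolean_and(a, b):
    """Documents containing both terms."""
    return INDEX.get(a, set()) & INDEX.get(b, set())

def boolean_or(a, b):
    """Documents containing either term."""
    return INDEX.get(a, set()) | INDEX.get(b, set())

def boolean_not(a, b):
    """Documents containing a but not b."""
    return INDEX.get(a, set()) - INDEX.get(b, set())
```

For example, `boolean_and("search", "navigation")` returns only document 1, while `boolean_not("search", "navigation")` returns only document 3.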
The Retrieval Process
[Diagram. Labels: User Query, Search Interface, Query Operations, Search Engine, DB Manager Module, Content, Text Database, Retrieved Docs, Ranked Docs, Results]
Search Systems
•Types of searches:
▫Basic search (also known as “keyword search”)
▫Advanced search: use of search refinement and metadata search
•Search engines are the software applications that form the foundation of search systems
Choosing what to search
•You don’t have to index everything
•If you conduct an inventory and analysis of your content, you should have a good idea of what content is “good”
•Silos – staff directories, sub sites, tech articles, books, etc.
•Content components – title, author, etc.
Search Zones
•Subsets of the site that have been indexed separately.
▫Example: http://search.dell.com/index.asp
▫Amazon does a great job of this
•Can be: content type, audience, role, topic, geography, chronology, department
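A minimal sketch of search zones, assuming each page carries a zone label and each zone gets its own small index. The page data, zone names, and function names are made up for illustration and are not taken from any of the sites mentioned above.

```python
from collections import defaultdict

# Hypothetical pages, each tagged with the zone it belongs to.
PAGES = [
    {"url": "/support/drivers", "zone": "support", "text": "download printer drivers"},
    {"url": "/shop/printers", "zone": "products", "text": "buy laser printer"},
    {"url": "/support/manuals", "zone": "support", "text": "printer manuals and guides"},
]

def build_zone_indexes(pages):
    """Build one term -> URLs index per zone, so zones are indexed separately."""
    zones = defaultdict(lambda: defaultdict(set))
    for page in pages:
        for term in page["text"].lower().split():
            zones[page["zone"]][term].add(page["url"])
    return zones

ZONES = build_zone_indexes(PAGES)

def search(term, zone=None):
    """Search a single zone, or every zone when no zone is given."""
    targets = [zone] if zone else list(ZONES)
    hits = set()
    for name in targets:
        if name in ZONES:
            hits |= ZONES[name].get(term.lower(), set())
    return sorted(hits)
```

Calling `search("printer")` looks in every zone, while `search("printer", zone="support")` narrows the results to the support pages.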
Types of Pages
•Navigation pages – pages that help you browse a site
•Destination pages – contain actual information
•Want to make sure search results contain mostly destination pages
Search Systems
•Selecting content components to index
▫Take advantage of the site structure
▫Components to index:
  • Image link
  • Image alt text
  • Description
  • Keywords
  • Remote anchor text
  • Body
  • Title
  • URL
  • Site name
  • Link
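One way to take advantage of indexed components is to weight a match by the component it occurs in. The sketch below assumes, purely for illustration, that a title match counts three times as much as a body match; the weights, field names, and sample documents are hypothetical.

```python
# Assumed weights: a match in the title counts more than one in the body.
FIELD_WEIGHTS = {"title": 3.0, "keywords": 2.0, "body": 1.0}

def score(doc, term):
    """Sum the weights of every indexed field that contains the term as a word."""
    term = term.lower()
    return sum(
        weight
        for field, weight in FIELD_WEIGHTS.items()
        if term in doc.get(field, "").lower().split()
    )

doc_a = {"title": "Search zones", "keywords": "search index", "body": "how zones work"}
doc_b = {"title": "Navigation", "body": "search is covered elsewhere"}
```

Here `score(doc_a, "search")` is 5.0 (title plus keywords) and `score(doc_b, "search")` is 1.0 (body only), so `doc_a` would be listed first.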
Search Algorithms
•There are many types of algorithms available.
•The bottom line is to select the one that is appropriate for the type of search capabilities required by the user.
User Tasks
•Retrieval: ad hoc, filtering
•Browsing
Classic Models
•Boolean
•Vector space
•Probabilistic
Set Theoretic
•Fuzzy
•Extended Boolean
Algebraic
•Generalized vector
•Latent semantic indexing
•Neural networks
Probabilistic
•Inference network
•Belief network
•Language models
Structured Models
•Non-overlapping lists
•Proximal nodes
Browsing
•Flat
•Structure guided
•Hypertext
Pattern Matching Algorithms
•Most common; matches the string that the user entered
•Depending on your users’ needs you have to emphasize recall or precision
•Recall = # relevant docs retrieved / # relevant docs in collection
•Precision = # relevant docs retrieved / # total docs retrieved
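A small worked example of the two measures, using the standard definitions: recall divides the relevant documents retrieved by all relevant documents in the collection, and precision divides them by all documents retrieved. The document ids are invented for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved and relevant doc ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant docs that were actually retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# 4 docs retrieved, 5 relevant docs exist, 2 of the retrieved docs are relevant.
p, r = precision_recall(retrieved=[1, 2, 3, 4], relevant=[2, 4, 6, 8, 10])
```

With these numbers precision is 2/4 = 0.5 and recall is 2/5 = 0.4. Retrieving more documents tends to raise recall and lower precision, which is why the two usually have to be traded off.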
Pattern Matching Algorithms
•Automatic Stemming – expands a term to include other terms that share the same root
▫Eg: “word” gets you “password”
•No Stemming – results contain just that word
•Depends on the content you are indexing. Eg – course catalog
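To make the idea of query expansion by stemming concrete, here is a deliberately naive suffix-stripping sketch. The suffix list and vocabulary are hypothetical, and this is not the algorithm of any particular search engine; production systems typically use something like the Porter stemmer.

```python
# Hypothetical suffix list, checked in order (longest forms first).
SUFFIXES = ("ations", "ation", "ing", "ers", "er", "s")

def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def expand(term, vocabulary):
    """Return every vocabulary word that shares a stem with the query term."""
    root = stem(term)
    return sorted(w for w in vocabulary if stem(w) == root)

VOCABULARY = ["computer", "computers", "computing", "course", "courses"]
```

With this sketch, `expand("computer", VOCABULARY)` returns the three "comput" words and `expand("courses", VOCABULARY)` returns both "course" and "courses". Turning stemming off would return only the exact word entered.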
Other Approaches
•Document similarity – allowing user feedback (“more like this” option)
▫Can be done by re-querying without stopwords or automatically based on metadata
•Collaborative filtering
▫Cited by
▫Active bibliography (related docs)
▫Users who viewed this document also viewed
▫Similar documents based on text
▫Related documents based on co-citation
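A "more like this" option can be approximated by comparing the non-stopword terms of two documents. The sketch below uses Jaccard overlap as the similarity measure, which is an assumption made for this example; the stopword list, documents, and function names are likewise illustrative.

```python
STOPWORDS = {"the", "a", "of", "and", "to", "in", "for"}  # assumed stopword list

def terms(text):
    """Lowercase terms of a text with stopwords removed."""
    return {w for w in text.lower().split() if w not in STOPWORDS}

def jaccard(a, b):
    """Shared terms divided by all distinct terms of the two texts."""
    ta, tb = terms(a), terms(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

def more_like_this(doc_id, docs, top_n=2):
    """Rank the other documents by similarity to doc_id, dropping zero scores."""
    source = docs[doc_id]
    scored = [
        (jaccard(source, text), other)
        for other, text in docs.items()
        if other != doc_id
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for s, other in scored[:top_n] if s > 0]

DOCS = {
    "a": "design of the search interface",
    "b": "search interface design tips",
    "c": "history of the printing press",
}
```

Here `more_like_this("a", DOCS)` returns only document "b", since "c" shares no non-stopword terms with "a".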
Query Builders
•Tools that help SE performance – invisible to users
▫Spell-checkers – Google’s “did you mean”
▫Phonetic tools – sounds like
▫Stemming tools – same stem results
▫Natural language processing tools – how to
▫Controlled vocabulary – include synonyms
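Two of these query builders can be sketched with the Python standard library: a spell-checker built on `difflib.get_close_matches`, and synonym expansion from a controlled vocabulary. The term list, synonym table, and the 0.7 similarity cutoff are assumptions for illustration, not how any named search engine works.

```python
import difflib

VOCABULARY = ["search", "navigation", "index", "catalog"]  # hypothetical indexed terms
SYNONYMS = {"catalog": ["catalogue", "directory"]}         # hypothetical controlled vocabulary

def did_you_mean(term):
    """Suggest the closest indexed term, or None if the term is known or nothing is close."""
    if term in VOCABULARY:
        return None
    matches = difflib.get_close_matches(term, VOCABULARY, n=1, cutoff=0.7)
    return matches[0] if matches else None

def expand_query(term):
    """Return the term followed by any synonyms from the controlled vocabulary."""
    return [term] + SYNONYMS.get(term, [])
```

For example, `did_you_mean("serch")` suggests "search", and `expand_query("catalog")` also searches for "catalogue" and "directory" without the user having to type them.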
Presenting Results
•What to display?
▫Title
▫Summary
▫Relevance score
▫Other parts of the structure of docs
▫Depends on your audience – more or less info – give users the option to see ‘detailed’ results if they choose – descriptive vs representational
•How many documents?
▫Number of retrieved docs
▫Number of results per page
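The "how many documents" question usually comes down to reporting the total number retrieved and showing one page at a time. A minimal sketch, assuming a default of 10 results per page (an arbitrary choice for this example):

```python
import math

def paginate(results, page, per_page=10):
    """Return one page of results plus the counts needed to display it."""
    total_pages = max(1, math.ceil(len(results) / per_page))
    page = min(max(page, 1), total_pages)  # clamp out-of-range page numbers
    start = (page - 1) * per_page
    return {
        "page": page,
        "total_pages": total_pages,
        "total_results": len(results),
        "items": results[start:start + per_page],
    }

RESULTS = list(range(1, 26))  # 25 hypothetical result ids
```

With 25 results, `paginate(RESULTS, 3)` returns the last five items and reports three pages in total.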
Listing Results
•Sorting
▫Alphabetically
▫Chronologically
•Ranking
▫By relevance
▫By popularity
▫By users’ or experts’ ratings
▫By pay-for-placement
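The difference between sorting and ranking shows up clearly in code: the same result set is simply ordered by a different key. The result records, relevance scores, and view counts below are invented for illustration.

```python
from datetime import date

# Hypothetical result set with the attributes needed for each ordering.
RESULTS = [
    {"title": "Zones", "date": date(2004, 5, 1), "relevance": 0.4, "views": 120},
    {"title": "Anatomy", "date": date(2005, 1, 15), "relevance": 0.9, "views": 30},
    {"title": "Interfaces", "date": date(2003, 9, 9), "relevance": 0.7, "views": 300},
]

def titles(results):
    """Helper that lists result titles in their current order."""
    return [r["title"] for r in results]

# Sorting: order by an attribute of the document itself.
alphabetical = sorted(RESULTS, key=lambda r: r["title"])
newest_first = sorted(RESULTS, key=lambda r: r["date"], reverse=True)

# Ranking: order by a score that estimates usefulness to the searcher.
by_relevance = sorted(RESULTS, key=lambda r: r["relevance"], reverse=True)
by_popularity = sorted(RESULTS, key=lambda r: r["views"], reverse=True)
```

Note that the most relevant result ("Anatomy") is the least popular one here, which is one reason a site may want to offer more than one ordering.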
Listing Results
•Grouping results: clustering
•Exporting results
▫Print or email results
▫Select a subset of results
▫Save search
•No single approach is perfect – combine approaches
Search Interfaces
•Factors that affect the interface design
▫User’s searching expertise
▫Type of results wanted
▫Type of information being searched
▫Amount of information being searched
Search Interface
•The box: Simple and clear
▫Good for users that don’t want to learn more about the search mechanism
▫Placement of search matters on a site
▫Put close to main navigation or near top of page
▫Don’t be creative with button label
Advanced Search
•Unveils search system functionality
▫Field searching
▫Date ranges
▫Search zones
•How often do you take advantage of these features?
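Field searching and date ranges amount to optional filters applied on top of a basic query. A minimal sketch with hypothetical article records and parameter names:

```python
from datetime import date

# Hypothetical archive records.
ARTICLES = [
    {"title": "Budget vote", "author": "Lee", "published": date(2004, 3, 2)},
    {"title": "Budget analysis", "author": "Kim", "published": date(2005, 6, 20)},
    {"title": "Sports roundup", "author": "Lee", "published": date(2005, 7, 1)},
]

def advanced_search(articles, title_contains=None, author=None, start=None, end=None):
    """Apply only the filters the user filled in; empty fields are ignored."""
    hits = []
    for article in articles:
        if title_contains and title_contains.lower() not in article["title"].lower():
            continue
        if author and article["author"] != author:
            continue
        if start and article["published"] < start:
            continue
        if end and article["published"] > end:
            continue
        hits.append(article["title"])
    return hits
```

For example, `advanced_search(ARTICLES, author="Lee", start=date(2005, 1, 1))` returns only "Sports roundup". Leaving every field empty returns everything, which mirrors how an advanced search form degrades to a basic one.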
Supporting Revision
•What to do when users don’t get what they want?
▫Repeat search in results
▫Explain where results came from (what data was searched)
▫Explain what the user did (restate query, filters, sort order)
▫Integrate searching and browsing (product inventory)
Search Systems
•When users get stuck
▫Way too many results: offer options to narrow the search
▫Zero results: offer a means of revising the search, search tips, a means of browsing (i.e. site map), and human contact if searching and browsing don’t work
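The two "stuck" cases can be handled by a small decision function that the results page calls before rendering. The threshold of 200 results and the wording of the suggestions are assumptions for this sketch.

```python
def advise(result_count, too_many=200):
    """Return the help options to show for a given number of results."""
    if result_count == 0:
        return [
            "revise the search",
            "show search tips",
            "offer the site map",
            "offer human contact",
        ]
    if result_count > too_many:
        return ["offer options to narrow the search"]
    return []  # a manageable result list needs no extra help
```

A page with zero hits would then show all four recovery options, while a page with thousands of hits would show only the narrowing controls.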
Search Systems
•Commercial web site search available: Verity Ultraseek, Altavista, Google … and many others
Search Systems
•Free search options:
▫Adding Google search to your site: http://www.google.com/searchcode.html
▫Open source software: Lucene (Jakarta Project), MG (Managing Gigabytes)
Discussion Questions
•How has the search engine changed the way we use the web?
•Where do you see it going in the future?
•Search Engines – Pros / Cons
•Articles