
ITEC547 Text Mining Fall 2015-16

Overview of Search Engines

Outline of Presentation

1 Early Search Engines
2 Indexing Text for Search
3 Indexing Multimedia
4 Queries
5 Searching an Index

Search Engine Characteristics

• Unedited – anyone can enter content
• Quality issues; spam
• Varied information types
  – Phone book, brochures, catalogs, dissertations, news reports, weather, all in one place!
• Different kinds of users
  – Lexis-Nexis: paying, professional searchers
  – Online catalogs: scholars searching scholarly literature
  – Web: every type of person, with every type of goal
• Scale
  – Hundreds of millions of searches per day; billions of documents

Web Search Queries

• Web search queries are short:
  – ~2.4 words on average (Aug 2000)
  – This has increased; it was ~1.7 words around 1997
• User expectations:
  – Many say "The first item shown should be what I want to see!"
  – This works if the user has the most popular/common notion in mind, but not otherwise.

Background

History, Problems, Solutions …


Search Engines

• Open Text (1995-1997)
• Magellan (1995-2001)
• Infoseek (Go) (1995-2001)
• Snap (NBCi) (1997-2001)
• Direct Hit (1998-2002)
• Lycos (1994; reborn 1999)
• WebCrawler (1994; reborn 2001)
• Yahoo (1994; reborn 2002)
• Excite (1995; reborn 2001)
• HotBot (1996; reborn 2002)
• Ask Jeeves (1998; reborn 2002)
• Teoma (2000-2001)
• AltaVista (1995- )
• LookSmart (1996- )
• Overture (1998- )


EARLY SEARCH ENGINES

• Initially used in academic or specialized domains
  – Legal and other specialized domains consume a large amount of textual information
• Use of expensive proprietary hardware and software
  – High computational and storage requirements
• Boolean query model
• Iterative search model
  – Fetch documents in many steps


Medline of National Library of Medicine

• Developed in the late 1960s and made available in 1971
• Based on an inverted file organization
• Boolean query language
  – Queries are broken down into numbered segments
  – The results of one query segment are fed into the next
• Each user is assigned a time slot
  – If the cycle is not completed within the time slot, the most recent results are returned
• Query and browse operations are performed as separate steps
  – Following a query, the results are viewed
  – Modifications start a new query-browse cycle

Dialog

• Broader subject content
• Specialized collections of data available for payment
• Boolean queries
  – Each term is numbered and executed separately, then the results are combined
  – Word patterns
  – For multiword queries, the proximity operator W

Information Retrieval

• The indexing and retrieval of textual documents.

• Searching for pages on the World Wide Web is perhaps the most widely used IR application.
• Concerned firstly with retrieving documents relevant to a query.
• Concerned secondly with retrieving from large sets of documents efficiently.

Web Search as a Typical IR Task

• Given:
  – A corpus of textual natural-language documents.
  – A user query in the form of a textual string.
• Find:
  – A ranked set of documents that are relevant to the query.

Typical IR System Architecture

Figure: the IR system takes a query string and a document corpus as input and returns a ranked list of documents (1. Doc1, 2. Doc2, 3. Doc3, ...).

Architecture of a Search Engine

Figure: architecture of a search engine. A gatherer/web spider crawls the Web and feeds an indexer, which builds the indexes (including ad indexes). A searcher and evaluator answer queries submitted through the user interface, which displays a ranked results page (the slide shows a Google results page for the query "miele", with organic results and sponsored links).

How does it work?

• User Interface – lets you type a query and displays the results.
• Searcher – searches the database for matches to your query.
• Evaluator – assigns scores/ranks to the retrieved information.
• Gatherer – travels the Web and collects information.
• Indexer – categorizes the data collected by the gatherer.


User Interface

• Provides a mechanism for a user to submit queries to the search engine.
• Uses forms and is designed to be user friendly.
• Displays the search results in a convenient way.
• A summary of each matched page is shown.


Searcher

• A program that uses the search engine's database to locate the matches for a specific query.
• The database of a search engine holds an extremely large number of indexed pages.
• A highly efficient search algorithm is therefore necessary.
• Computer scientists have spent years developing these searching and sorting methods; see any standard algorithms textbook.


Evaluator

• The searcher returns a set of URLs that match your query.
• Not all of the hits match your query equally well.
• The more references there are to a page, the higher its ranking will be.
• How is the relevancy score calculated? It varies from one engine to another:
  – How many times does the word appear?
  – Do the query words appear in the title?
  – Do the query words appear in the META tag?


Web Spider/Gatherer

• A program that traverses the Web and gathers information about Web documents.
• It runs at short, regular intervals.
• The information it returns is then indexed into the database.
• Alternate names: bot, crawler, robot, spider and worm.

Example Crawling Algorithms

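The slide gives only the topic heading; as a concrete illustration, here is a minimal breadth-first crawler sketch in Python. The seed URLs, the regular-expression link extraction and the fixed politeness delay are simplifying assumptions for illustration, not the algorithm of any particular engine.

import re
import time
from collections import deque
from urllib.parse import urljoin
from urllib.request import urlopen

def crawl(seed_urls, max_pages=100, delay=1.0):
    """Breadth-first crawl starting from seed_urls; returns {url: html}."""
    frontier = deque(seed_urls)          # URLs waiting to be fetched
    visited = set(seed_urls)             # URLs already queued or fetched
    pages = {}
    link_pattern = re.compile(r'href="([^"#]+)"')   # crude link extractor
    while frontier and len(pages) < max_pages:
        url = frontier.popleft()
        try:
            html = urlopen(url, timeout=5).read().decode("utf-8", "replace")
        except Exception:
            continue                     # skip dead or unreachable links
        pages[url] = html
        for link in link_pattern.findall(html):
            absolute = urljoin(url, link)
            if absolute not in visited:
                visited.add(absolute)
                frontier.append(absolute)
        time.sleep(delay)                # politeness: pause between requests
    return pages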

Indexer

• Organizes the data by creating a set of keys, i.e. an index (e.g. libraries index by author, title, ISBN, etc.).
• Indexes need to be rebuilt frequently, in order to ensure the returned URLs are not out of date.
• A search engine is very complex and needs to be broken down into different components.


Case Study - AltaVista

• Sends out crawlers (robot programs) that capture information from the Web and bring it back.
• The main crawler, "Scooter", sends out many simultaneous HTTP requests, behaving like blind users browsing the Web.
• All of this information is passed to the indexing engine.
• Scooter's cousins help to remove "dead" links.
• On a typical day, Scooter visits over 10 million pages.
• Web pages with no links referencing them will never be found.
• You can also submit your URL to AltaVista directly.

2 Indexing Text for Search

Reduce retrieval time; improve hit accuracy

Why Index

• The simplest approach is to search the text sequentially
  – Only feasible if the collection is small
• Static or semi-static indexes
• Inverted Index
  – A mapping from content, such as words or numbers, to its locations in a database file, a document, or a set of documents.
  – Entries can record documents, positions within documents, and weights
  – Fuzzy matching, stemming, stopwords

What is an inverted index?

• T0: "it is what it is"
• T1: "what is it"
• T2: "it is a banana"

A (record-level) inverted index maps each term to the set of documents that contain it:

• "a": {2}
• "banana": {2}
• "is": {0, 1, 2}
• "it": {0, 1, 2}
• "what": {0, 1}

Full Inverted Index

A full inverted index also records where each term occurs, as (document, position) pairs:

• "a": {(2, 2)}
• "banana": {(2, 3)}
• "is": {(0, 1), (0, 4), (1, 1), (2, 1)}
• "it": {(0, 0), (0, 3), (1, 2), (2, 0)}
• "what": {(0, 2), (1, 0)}
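A minimal sketch of how such a full inverted index can be built in Python (assuming simple lowercase/whitespace tokenization; an illustration, not the indexing code of any particular engine):

from collections import defaultdict

def build_full_inverted_index(docs):
    """docs: list of strings. Returns {term: [(doc_id, position), ...]}."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for position, token in enumerate(text.lower().split()):
            index[token].append((doc_id, position))
    return index

docs = ["it is what it is", "what is it", "it is a banana"]
index = build_full_inverted_index(docs)
print(index["is"])   # [(0, 1), (0, 4), (1, 1), (2, 1)]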

Inverted indexes are periodically rebuilt and static otherwise. Documents are parsed to extract tokens, which are saved together with the document ID.

How Inverted Files Are Created

Doc 1: "Now is the time for all good men to come to the aid of their country"

Doc 2: "It was a dark and stormy night in the country manor. The time was past midnight"

Term/Doc pairs in document order:
now 1, is 1, the 1, time 1, for 1, all 1, good 1, men 1, to 1, come 1, to 1, the 1, aid 1, of 1, their 1, country 1, it 2, was 2, a 2, dark 2, and 2, stormy 2, night 2, in 2, the 2, country 2, manor 2, the 2, time 2, was 2, past 2, midnight 2

• After all documents have been parsed, the inverted file is sorted alphabetically.

How Inverted Files are Created

Term/Doc pairs sorted alphabetically:
a 2, aid 1, all 1, and 2, come 1, country 1, country 2, dark 2, for 1, good 1, in 2, is 1, it 2, manor 2, men 1, midnight 2, night 2, now 1, of 1, past 2, stormy 2, the 1, the 1, the 2, the 2, their 1, time 1, time 2, to 1, to 1, was 2, was 2

Multiple term entries for a single document are merged.

Within-document term frequency information is compiled.

How Inverted Files are Created

Term/Doc/Freq entries after merging:
a 2 1, aid 1 1, all 1 1, and 2 1, come 1 1, country 1 1, country 2 1, dark 2 1, for 1 1, good 1 1, in 2 1, is 1 1, it 2 1, manor 2 1, men 1 1, midnight 2 1, night 2 1, now 1 1, of 1 1, past 2 1, stormy 2 1, the 1 2, the 2 2, their 1 1, time 1 1, time 2 1, to 1 2, was 2 2

• Finally, the file can be split into:
  – A Dictionary or Lexicon file: the set of all words in the text (the vocabulary), and
  – A Postings file: one entry for each occurrence of each word in the text.

How Inverted Files are Created

Dictionary/Lexicon (term, number of docs, total frequency):
a 1 1, aid 1 1, all 1 1, and 1 1, come 1 1, country 2 2, dark 1 1, for 1 1, good 1 1, in 1 1, is 1 1, it 1 1, manor 1 1, men 1 1, midnight 1 1, night 1 1, now 1 1, of 1 1, past 1 1, stormy 1 1, the 2 4, their 1 1, time 2 2, to 1 2, was 1 2

Postings (doc #, frequency), listed in dictionary order:
(2,1), (1,1), (1,1), (2,1), (1,1), (1,1), (2,1), (2,1), (1,1), (1,1), (2,1), (1,1), (2,1), (2,1), (1,1), (2,1), (2,1), (1,1), (1,1), (2,1), (2,1), (1,2), (2,2), (1,1), (1,1), (2,1), (1,2), (2,2)

• Inverted files permit fast search for individual terms.
• For each term, you get a list consisting of:
  – document ID
  – frequency of the term in the doc (optional)
  – positions of the term in the doc (optional)
• These lists can be used to solve Boolean queries (a code sketch follows):
  – country -> d1, d2
  – manor -> d2
  – country AND manor -> d2
• They are also used for statistical ranking algorithms.
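A minimal sketch of answering the Boolean AND query above by intersecting posting lists, using the country/manor example (the set representation and function name are illustrative choices):

# Posting lists as sets of document IDs (from the Doc 1 / Doc 2 example).
postings = {
    "country": {1, 2},
    "manor":   {2},
}

def boolean_and(index, *terms):
    """Intersect the posting lists of all query terms."""
    result = None
    for term in terms:
        docs = index.get(term, set())
        result = docs if result is None else result & docs
    return result or set()

print(boolean_and(postings, "country", "manor"))   # {2}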

Inverted indexes

• Inverted indexes are still used, even though the web is so huge.

• Some systems partition the indexes across different machines. Each machine handles different parts of the data.

• Other systems duplicate the data across many machines; queries are distributed among the machines.

• Most do a combination of these.

Inverted Indexes for Web Search Engines

Inverted Index: should we index all words?

How to index words?

Inverted Index: web features

Google Index

• A unique DocId is associated with each URL.
• Hit: a word occurrence, recording
  – wordID: a 24-bit number
  – word position
  – font size relative to the rest of the document
  – plain hit: occurs in the body of the document
  – fancy hit: occurs in the URL, title, anchor text or meta tags
• The word occurrences of a web page are distributed across a set of barrels (see the sketch below).
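As an illustration only (not Google's actual bit-level encoding), a hit record with the fields listed above and a toy barrel assignment might look like this; everything beyond the listed fields is an assumption:

from dataclasses import dataclass

@dataclass
class Hit:
    word_id: int        # 24-bit word identifier from the lexicon
    position: int       # position of the word within the page
    font_size: int      # font size relative to the rest of the document
    fancy: bool         # True if in URL, title, anchor text or meta tags

# Hits grouped by (word, doc) and spread across "barrels"
# (here just a list of dicts, purely for illustration).
barrels = [dict() for _ in range(4)]

def add_hit(doc_id, hit):
    barrel = barrels[hit.word_id % len(barrels)]   # pick a barrel by wordID
    barrel.setdefault((hit.word_id, doc_id), []).append(hit)

add_hit(42, Hit(word_id=0x00A1B2, position=7, font_size=2, fancy=False))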

Architecture of the 1st Google Engine


3 Indexing Multimedia

Broadcast and compress for seamless delivery

Indexing Multimedia

• Forming an index for multimedia:
  – Use context: the surrounding text
  – Add a manual description
  – Analyze the content automatically and attach a description

4 Queries

Queries

• Keywords
• Proximity
• Patterns
• Phrases
• Ranges
• Weights of keywords
• Spelling mistakes

Queries

• Boolean query
  – No relevance measure
  – May be hard to understand
• Multimedia query
  – Find images of Everest
  – Find x-rays showing the human rib cage
  – Find companies whose stock prices have similar patterns

Relevance

• Relevance is a subjective judgment and may include:
  – Being on the proper subject.
  – Being timely (recent information).
  – Being authoritative (from a trusted source).
  – Satisfying the goals of the user and his/her intended use of the information (the information need).


Keyword Search

• Simplest notion of relevance is that the query string appears verbatim in the document.

• Slightly less strict notion is that the words in the query appear frequently in the document, in any order (bag of words).


Problems with Keywords

• May not retrieve relevant documents that include synonymous terms:
  – "restaurant" vs. "café"
  – "PRC" vs. "China"
• May retrieve irrelevant documents that include ambiguous terms:
  – "bat" (baseball vs. mammal)
  – "Apple" (company vs. fruit)
  – "bit" (unit of data vs. act of eating)


Relevance Feedback

• The user enters query terms
  – Keywords may be weighted or not
• Links are returned
  – The user marks the relevant and irrelevant ones
• The query is then updated from this feedback (see the formula sketch below); if there is no negative feedback, the second term is 0
• The T's are terms from the relevant and irrelevant sets marked by the user
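The slide refers to a query-update formula without showing it. A standard formulation of this kind of feedback is the Rocchio update, given here as an illustrative assumption rather than the exact formula from the slide:

q_m = \alpha \, q_0 + \frac{\beta}{|D_r|} \sum_{d_j \in D_r} d_j - \frac{\gamma}{|D_{nr}|} \sum_{d_k \in D_{nr}} d_k

where q_0 is the original query vector, D_r and D_{nr} are the sets of documents the user marked relevant and irrelevant, the d's are their term vectors (the T's on the slide), and \alpha, \beta, \gamma are tuning weights. When the user marks nothing as irrelevant, the negative-feedback term is 0, matching the remark above.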

5 Searching an Index

Searching an Inverted Index

• Tokenize the query and look up each query token in the index vocabulary
• Get the list of documents associated with each token
• Combine the lists of documents using the constraints specified in the query

Google Search

1. Tokenize the query and remove stopwords
2. Translate the query words into wordIDs using the lexicon
3. For every wordID, get the list of documents from the short inverted barrel and build a composite set of documents
4. Scan the composite list of documents:
   i. Skip to the next document if the current document does not match
   ii. Compute a rank using the query and the document features
   iii. If there are no more documents, go to step 3 and use the full inverted barrels to find more docs
   iv. If there is a sufficient number of docs, go to step 5
5. Sort the final document list by rank

How are results ranked?

• Weight type:
  – Location: title, URL, anchor, body
  – Size: relative font size
  – Capitalization
• Count of occurrences
• Closeness (proximity)
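A minimal sketch of combining some of these factors into a score (the particular weights, the hit tuple layout and the proximity bonus are assumptions for illustration; real engines tune such factors carefully):

# Assumed location weights: hits in prominent places count more.
LOCATION_WEIGHT = {"title": 4.0, "url": 3.0, "anchor": 2.0, "body": 1.0}

def score(hits, query_terms):
    """hits: list of (term, location, font_size, position) tuples for one page."""
    total = 0.0
    positions = []
    for term, location, font_size, position in hits:
        if term not in query_terms:
            continue
        total += LOCATION_WEIGHT.get(location, 1.0) * font_size  # per-occurrence weight
        positions.append(position)
    if len(positions) > 1:
        spread = max(positions) - min(positions)
        total += 5.0 / (1 + spread)      # proximity bonus: closer query terms score higher
    return total

print(score([("miele", "title", 2, 0), ("vacuum", "body", 1, 3)], {"miele", "vacuum"}))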

Evaluation

• Response time
• Quality:
  – Recall: % of correct items that are selected
  – Precision: % of selected items that are correct
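A tiny worked example of these two measures (the document sets are made up for illustration):

relevant  = {"d1", "d2", "d3", "d4"}   # all correct items in the collection
retrieved = {"d2", "d3", "d7"}         # items selected by the engine

true_positives = relevant & retrieved
recall    = len(true_positives) / len(relevant)    # 2/4 = 0.5
precision = len(true_positives) / len(retrieved)   # 2/3 ≈ 0.67

print(recall, precision)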

Ranking Algorithms: Hyperlink

• Popularity ranking
  – Rank "popular" documents higher among the set of documents matching specific keywords.
• Determining "popularity":
  – Access rate? How do we get accurate data?
  – Bookmarks? They might be private.
  – Links to related pages? Use a web crawler to analyze external links.

Popularity/Prestige

• Transfer of prestige
  – A link from a popular page x to a page y is treated as conferring more prestige on page y than a link from a not-so-popular page z.
• Count of in-links/out-links

Hypertext Induced Topic Search (HITS)

• The HITS algorithm computes popularity using a set of related pages only.
• Important web pages are those cited by other important web pages, or by a large number of less-important pages.
• Initially, all pages have the same importance.

Hubs and Authorities

• Hub – a page that stores links to many related pages
  – may not in itself contain actual information on a topic
• Authority – a page that contains actual information on a topic
  – may not store links to many related pages
• Each page gets a prestige value as a hub (hub prestige), and another prestige value as an authority (authority prestige).

Hubs and Authorities in Twitter

Hubs and Authorities algorithm

1. Locate and build the subgraph
2. Assign initial values to the hub and authority scores of each node
3. Run a loop until convergence:
   i. Set the authority score of node x to the sum of the hub scores of all nodes y that link to x
   ii. Set the hub score of node x to the sum of the authority scores of all nodes y that x links to
   iii. Normalize the hub and authority scores of all nodes
   iv. Check for convergence: is the difference < threshold?
4. Return the list of nodes sorted in descending order of hub and authority scores
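A minimal sketch of this loop in Python (the adjacency-list graph representation, the toy graph, the threshold and the Euclidean normalization are illustrative assumptions):

import math

def hits(graph, threshold=1e-6, max_iter=100):
    """graph: {node: [nodes it links to]}. Returns (hub, authority) score dicts."""
    nodes = set(graph) | {n for targets in graph.values() for n in targets}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(max_iter):
        # Authority of x = sum of hub scores of nodes linking to x
        new_auth = {n: 0.0 for n in nodes}
        for x, targets in graph.items():
            for y in targets:
                new_auth[y] += hub[x]
        # Hub of x = sum of authority scores of nodes x links to
        new_hub = {x: sum(new_auth[y] for y in graph.get(x, [])) for x in nodes}
        # Normalize both score vectors
        for scores in (new_auth, new_hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for n in scores:
                scores[n] /= norm
        diff = sum(abs(new_hub[n] - hub[n]) + abs(new_auth[n] - auth[n]) for n in nodes)
        hub, auth = new_hub, new_auth
        if diff < threshold:
            break
    return hub, auth

# Toy graph: page "a" links to "b" and "c"; "b" links to "c".
print(hits({"a": ["b", "c"], "b": ["c"]}))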

PageRank Algorithm

• Ranks pages based on citation statistics (in-links and out-links).
• The rank of a page depends on the ranks of the pages that link to it.

PageRank Algorithm

1. Locate and build the subgraph
2. Save the number of out-links from every node in an array
3. Assign a default PageRank to all nodes
4. Run a loop until convergence:
   i. Compute a new PageRank score for every node: sum, over every node that links to it, that node's PageRank score divided by its number of out-links, and add the default rank source
   ii. Check convergence: is the difference between the new and old PageRank < threshold?
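A minimal sketch of this procedure, treating the "default rank source" as the standard (1 - d)/N term with damping factor d = 0.85 (the damping value, toy graph and threshold are illustrative assumptions; dangling nodes are ignored for simplicity):

def pagerank(graph, damping=0.85, threshold=1e-6, max_iter=100):
    """graph: {node: [nodes it links to]}. Returns {node: PageRank score}."""
    nodes = set(graph) | {n for targets in graph.values() for n in targets}
    out_degree = {n: len(graph.get(n, [])) for n in nodes}   # step 2
    rank = {n: 1.0 / len(nodes) for n in nodes}              # step 3: default rank
    for _ in range(max_iter):
        new_rank = {n: (1 - damping) / len(nodes) for n in nodes}  # rank source
        for x, targets in graph.items():
            if out_degree[x] == 0:
                continue
            share = damping * rank[x] / out_degree[x]
            for y in targets:
                new_rank[y] += share        # step 4.i: shares from in-linking nodes
        diff = sum(abs(new_rank[n] - rank[n]) for n in nodes)
        rank = new_rank
        if diff < threshold:                # step 4.ii: convergence check
            break
    return rank

# Toy graph: "b" and "c" both link to "a"; "a" links to "b".
print(pagerank({"a": ["b"], "b": ["a"], "c": ["a"]}))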