Luanne Freund & Elaine G. Toms Understanding the Brevity of...



Understanding the Brevity of Web Queries

Luanne Freund & Elaine G. Toms

Faculty of Information Studies

University of Toronto

{freund, toms}@fis.utoronto.ca

Introduction

Why do people submit brief, non-discriminating queries to web search engines?

Internet search engines are the most common resource for people seeking answers to questions and access to web-based information, and in most search engines, keyword queries are the primary means of retrieving information.

Based on transaction log analyses, we know that web queries are on average about two words in length and have grown only slightly over time. This is much shorter than the queries used in traditional information retrieval systems, and provides very limited input to support information retrieval.

This study seeks to better understand why people use such brief and general queries when using search engines:

What factors motivate the formulation of brief queries?

How are brief queries being used in the web search process?

What obstacles prevent the use of longer, more descriptive queries?

Methods

TREC10 Interactive Track user study of web searching:

48 non-expert participants each completed 4 assigned search tasks using a slightly modified Google interface.

Tasks were from four domains: Research, Consumer Health, Travel and Shopping.

Half the tasks were left partially open so that participants could personalize them.

We asked participants to use keyword queries for 2 tasks and full-sentence queries for the other 2.

We used a transaction log to record queries and other aspects of search behaviour.

A semi-structured interview was conducted while each task was replayed using screen capture software.

The interview included questions on how and why queries were formulated and reformulated.

Results

I. Length

Keyword queries were on average 2.7 terms in length; those entered as sentences were on average 5.7 terms long.

Of the 297 queries collected in the study, over 30% were one or two terms in length.

Sample query sequences (successive queries within a search, spelling as entered):

purchasing soy milk online → buy soy milk online → soy milk products → soy milk products – purchasing → soy milk products - purchasing online → soy milk products- online products → soy milk online products

shopping and boots → shopping and boots and lord&taylor → brown’s Toronto → brown’s and toronto and boots → nine west → brown’s footwear → patrick cox

cd great Britain → cd → jazz cds

places of interst in anatartica → anatartica → sight seeing in anatartica → Anatartica → places anatartica → places in anatartica → a tour of anatartica → antartica → antartica.com → tours and travels in antartica

Amsterdam cool things → amsterdam cat boat → Amsterdam tourism → amsterdam cats

Journal on Titanic → titanic sinking → titanic sinking newspaper → news article titanic sinking

websites for global warning → websites for global warning information → global warning information

kyoto accord → kyoto protocol → global warming

global warming → global warming → what is global warming? → information on global warming

aurther ash bibliography → aurther ash → Aurther ashe → Aurther ash → Arther ash → Arthur ash → Arthur tennis player

toronto and history → toronto and archives → toronto and city and history → history and Toronto (-.com) → ontario history …

II. Reformulation

46% of searches were completed using a single query.

Queries reformulated over the course of a search were more often lengthened (one or more terms added) than shortened.

The most common strategy was to change one or more of the query terms in successive reformulations.
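The poster does not say how successive queries were assigned to the four reformulation strategies it reports (repeat query, remove term/s, add term/s, change term/s). As a hypothetical sketch only, assuming each query is treated as a set of lowercase terms with surrounding punctuation stripped, consecutive queries in one search could be coded as follows. The helper names `query_terms`, `classify_reformulation` and `tally_search` are illustrative and not part of the study.

```python
import string
from collections import Counter

# Hypothetical helpers: the study does not describe its coding procedure.
# Characters stripped from the ends of each token (an assumption).
_STRIP = string.punctuation + "–—‘’“”…"


def query_terms(query: str) -> set[str]:
    """Return the set of lowercase terms in a query, dropping punctuation-only tokens."""
    tokens = (token.strip(_STRIP) for token in query.lower().split())
    return {token for token in tokens if token}


def classify_reformulation(previous: str, current: str) -> str:
    """Assign one of the four strategy labels to a pair of successive queries."""
    old, new = query_terms(previous), query_terms(current)
    if new == old:
        return "repeat"  # same terms resubmitted
    if new < old:
        return "remove"  # strict subset: terms dropped, none added
    if new > old:
        return "add"  # strict superset: terms added, none dropped
    return "change"  # at least one term replaced by another


def tally_search(queries: list[str]) -> Counter:
    """Count strategies over each consecutive pair of queries in one search."""
    return Counter(
        classify_reformulation(before, after)
        for before, after in zip(queries, queries[1:])
    )


if __name__ == "__main__":
    # Sequence taken from the sample query sequences listed above.
    print(tally_search(["cd great Britain", "cd", "jazz cds"]))
```

Treating queries as term sets is a simplifying choice: a pure reordering of the same terms counts as a repeat, and a spelling correction such as "Arther ash" to "Arthur ash" counts as a change, which a human coder might classify differently.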

Query Reformulation Strategies

Repeat query: 13%
Remove query term/s: 9%
Add query term/s: 20%
Change query term/s: 58%

Discussion and Future Work

Brief queries often occur in situations where the underlying information need is quite general, but also when it is well-defined and specific.

In some cases, users are constrained by preconceptions of how search engines work, by lack of knowledge of the material available on a topic, and by their own abilities (see III. Use Context, Constraints).

Short queries are also used strategically in a number of ways (see III. Use Context, Uses).

This study provides further evidence that querying is a complex and interactive process.

Web queries seem to play different roles in searching than traditional information retrieval queries do, which is one reason for their brevity.

User perceptions of search engines and the Internet also play a role in shaping queries, which suggests a means of encouraging longer queries where these may improve search results.

This study is part of an ongoing project to better understand the factors at play when people formulate web queries, in order to provide better support for that process.

III. Use Context

Uses: Target a Site; Open a Gateway; Establish a Universe

Constraints: Knowledge and Abilities; Perception of System; Perception of Content

Target a Site

Brief queries, often proper nouns, were submitted when participants were seeking the websites of known organizations or other information portal sites.

“Futureshop”, “Lonely Planet” and “British Museum” are examples of queries seeking organizational websites. “Titanic” and “global warming” are examples of attempts to reach information portal sites on these topics.

Participants preferred to search for detailed parameters once they reached the target website.

Yeah, I'd go to the search box. If I knew I was going to Lonely Planet, and if I didn't know the whatever it is, the address, URL address or whatever it's called, I'd just go to the search box. [P16-33]

when I approach certain topics, I choose to go to a specific website and do a search there, if I think it's going to be more efficient. [P09-24]

Open a Gateway

Brief queries are used as opening gambits to provide easy entry into the webspace.

This allows the searcher to get the lay of the land: “just sort of a general start to see what it would pull up with the two sort of areas I was interested in” [P14-13].

Refining the search by adding more terms to the query was a secondary tactic, used only if the initial results were unsatisfactory.

antarctica? titanic? hepatitis? smoke? cortisone? Giacometti? Bicycle? cd? Sweaters? Travel?

Establish a Universe

Some participants did not want or need a highly focused search.

These searchers purposely used general queries as a coarse filter, to create a browsing environment.

Some expressed this in terms of “retaining control” over the search.

Well because I actually... I want to decide, I want to sift through everything. Even though it's overwhelming. [P10-21]

I just prefer to find my own word restrictions on that, so I just start in a bigger range, and then I choose from there. [P07-34]

deciding what to query is not the challenging part for me. Figuring out how to refine sometimes is…. Starting isn’t the problem [P06-33]

I could have put "Tylenol, large doses, benefits". Maybe nothing would have come up. I didn't try that. I put just "Tylenol, large doses". I didn't actually put "benefits" in. "Benefits" or "harmful". I didn't really know if it would be beneficial or harmful [P10-14]

if “second hand smoking” did not work, I would probably go back or, well, do another search again but with either synonyms or added words, to make a more descriptive search [P13-12]

Perception of System

Participants seemed to have a strong sense that search engine queries should be brief.

Many participants expressed discomfort at being asked to submit queries in sentence format: “I don’t put sentences in these things” [P10-21].

Some participants seemed to view query formulation as a subject classification problem and restricted their queries to subject descriptors.

you just get into to the point, you don't. I can't imagine putting a sentence in there. [P09-42]

… what is the base? What is the base topic, and then branch out from there. So the base topic was Titanic, and then the next category would be history [P12-41]

Perception of Content

Participants used brief queries when they did not know what was available on the web on a given topic.

Brief queries were considered to be “safer” because they close off fewer options and are more likely to return some results.

Some searchers were quite happy with very large result sets; others used them as a cue to refine the query.

I think that “second hand smoke,” maybe it's too narrow to find in the directory, so I go to “smoke” firstly and try to find something… [P22-11]

Knowledge and Abilities

Some participants indicated that they used brief queries because they couldn’t think up more terms.

A small number of participants said that they lacked the know-how to build longer queries, which they felt required special syntax.

Because I wouldn't know how to describe... like if I'm only going to spend 700 bucks on a camera, I don't know how to put that in the search. …. I'm not sophisticated in how to... I think there's something about commas and ways to put things in a search part that I'm not familiar with [P10-14]

I was going to put "flu shot in health profession", and I thought, I don't know. But I've seen like sometimes when you go to the library and they want to combine some things and I would put a dash or something, but I couldn't remember exactly how… [P19-24]

Acknowledgements

This work was partially funded by an IBM Centre for Advanced Studies Student Fellowship to the first author and a Natural Sciences and Engineering Research Council of Canada grant to the second author.