Outline - University of Otago · is different: Query formulation Search can contain both structural...

14
Centre for Content and Knowledge Engineering, Utrecht University, the Netherlands Query Formulation for XML Retrieval with Bricks Bricks: , the building blocks to tackle query formulation issues in structured document retrieval Roelof van Zwol Jeroen Baas Herre van Oostendorp Frans Wiering Centre for Content and Knowledge Engineering, Utrecht University, the Netherlands Outline Motivation Objectives with Bricks Theory behind Bricks Usability experiment Conclusions Outline - Motivation - Objectives - Theory - Usability - Conclusions

Transcript of Outline - University of Otago · is different: Query formulation Search can contain both structural...

Page 1: Outline - University of Otago · is different: Query formulation Search can contain both structural and textual conditions. Retrieval strategy Exploit the document structure to retrieve

Centre for Content and Knowledge Engineering, Utrecht University, the Netherlands

Query Formulation for XMLRetrieval with Bricks

Bricks: , the building blocks to tackle queryformulation issues in structured document retrieval

Roelof van Zwol

Jeroen Baas

Herre van Oostendorp

Frans Wiering

Centre for Content and Knowledge Engineering, Utrecht University, the Netherlands

OutlineMotivationObjectives with BricksTheory behind BricksUsability experimentConclusions

Outline - Motivation - Objectives - Theory - Usability - Conclusions

Page 2: Outline - University of Otago · is different: Query formulation Search can contain both structural and textual conditions. Retrieval strategy Exploit the document structure to retrieve

Centre for Content and Knowledge Engineering, Utrecht University, the Netherlands

MotivationStructured document retrieval (XML retrieval)is different:

Query formulationSearch can contain both structural and textual conditions.

Retrieval strategyExploit the document structure to retrieve relevantdocument fragments.

Result presentationPresent individual document fragments, or clusteredfragments (browse and fetch), requires new navigationtechniques.

Outline - Motivation - Objectives - Theory - Usability - Conclusions

Centre for Content and Knowledge Engineering, Utrecht University, the Netherlands

Running example:A user planning his holiday, has thefollowing specific information need:

“Find information about traveling todestinations with major airports, where theweather has a tropical climate.”

Based on the Lonely Planet collection.

Legend:Structural conditions

Textual conditions

Outline - Motivation - Objectives - Theory - Usability - Conclusions

Page 3: Outline - University of Otago · is different: Query formulation Search can contain both structural and textual conditions. Retrieval strategy Exploit the document structure to retrieve

Centre for Content and Knowledge Engineering, Utrecht University, the Netherlands

Query formulation

Techniques:

Keyword based (NEXI-CO)

Natural Language Processing

Combination of keyword- and structure-based search (NEXI-CAS)

Bricks…

Outline - Motivation - Objectives - Theory - Usability - Conclusions

Centre for Content and Knowledge Engineering, Utrecht University, the Netherlands

Query formulationNEXI-Content Only:

Keyword-based query, containing words andphrases.User is ignorant of document structure.Any document or XML document fragmentcan be returned.

Information request:“major airport” destination weather “tropicalclimate”

Outline - Motivation - Objectives - Theory - Usability - Conclusions

Page 4: Outline - University of Otago · is different: Query formulation Search can contain both structural and textual conditions. Retrieval strategy Exploit the document structure to retrieve

Centre for Content and Knowledge Engineering, Utrecht University, the Netherlands

Query formulationNatural language processing:

Complex formulation, including structure is very wellpossible in theory.Close to user’s “mental model”.

User needs to know about structure.

Brisbane’s NLP2NEXI engine:Kindly provided by [Geva et al.]We ran into conversion problems, with respect to recursivestructure, and generation of complex NEXI-queries.

Therefore not included into experiment.

Information request:Find information about traveling to destinations with majorairports, where the weather has a tropical climate.

Outline - Motivation - Objectives - Theory - Usability - Conclusions

Centre for Content and Knowledge Engineering, Utrecht University, the Netherlands

Query formulationNEXI - Content and Structure:

Query contains structural requirements.User is aware of document structure, and …User is capable to specify structural constraints.

Powerful query language for structured documentretrieval.

Information request://destination[about(.//weather, “tropical climate”)] //getting_there_and_away[about(., “major airport”]

Outline - Motivation - Objectives - Theory - Usability - Conclusions

Page 5: Outline - University of Otago · is different: Query formulation Search can contain both structural and textual conditions. Retrieval strategy Exploit the document structure to retrieve

Centre for Content and Knowledge Engineering, Utrecht University, the Netherlands

Query formulation

Task Complexity 71

Form

ulat

ion

com

plex

ity

Easy

Hard

ContentOnly

NEXIContent

AndStructure

Bricks (?)

Natural LanguageProcessing

Outline - Motivation - Objectives - Theory - Usability - Conclusions

Centre for Content and Knowledge Engineering, Utrecht University, the Netherlands

Objectives with BricksMinimize the complexity of the queryformulation process

Minimize the required knowledge of thedocument structure

Maximize the expression power asprovided by NEXI

Outline - Motivation - Objectives - Theory - Usability - Conclusions

Page 6: Outline - University of Otago · is different: Query formulation Search can contain both structural and textual conditions. Retrieval strategy Exploit the document structure to retrieve

Centre for Content and Knowledge Engineering, Utrecht University, the Netherlands

Outline - Motivation - Objectives - Theory - Usability - Conclusions

Running example in bricks

Centre for Content and Knowledge Engineering, Utrecht University, the Netherlands

Theory behind BricksGraphical approachIntuition of a mental modelBuilding blocksAvoid information overload

Outline - Motivation - Objectives - Theory - Usability - Conclusions

Page 7: Outline - University of Otago · is different: Query formulation Search can contain both structural and textual conditions. Retrieval strategy Exploit the document structure to retrieve

Centre for Content and Knowledge Engineering, Utrecht University, the Netherlands

Graphical approachReduces syntactical formulation issues.

Bricks is NEXI-compatible.

Reduces/eliminates knowledge of documentstructure.

Bricks uses of pull-down lists.

Alternative in development: TreeSearch.

Avoids formulation of malformed informationrequests.

Bricks uses extensive checks for both query syntaxand structure.

Referred to in literature as: Direct manipulation [Preece et al.]

Outline - Motivation - Objectives - Theory - Usability - Conclusions

Centre for Content and Knowledge Engineering, Utrecht University, the Netherlands

Intuition of a mental modelA user has a mental model of the informationhe is looking for.

Effectivity and efficiency of user increaseswhen the formulation process is close to theuser’s mental model.

User thinks in ‘natural language’…… and is likely to specify the requested element ofretrieval first:

“Find information about traveling to…”

… NEXI-spefication is not user oriented, butstructure oriented…

Outline - Motivation - Objectives - Theory - Usability - Conclusions

Page 8: Outline - University of Otago · is different: Query formulation Search can contain both structural and textual conditions. Retrieval strategy Exploit the document structure to retrieve

Centre for Content and Knowledge Engineering, Utrecht University, the Netherlands

Outline - Motivation - Objectives - Theory - Usability - Conclusions

Building blocks

In Brick, the formulation process is splitinto small building blocks, that a userneeds to complete, to specify hisinformation need.

Similar theory as used for form-wizards.

Centre for Content and Knowledge Engineering, Utrecht University, the Netherlands

Outline - Motivation - Objectives - Theory - Usability - Conclusions

Avoid information overload

The risk of too many options…

Syntax: allow a minimal (logical) set of nextsteps.

Structure: present a limited (logical) set ofstructural elements. (LP: 271 unique elements)

Semantic

Logical

Presentation

<destination><weather>

<attractions>

<p> <b> <i>

<ul><li>

<section><chapter><image>

Page 9: Outline - University of Otago · is different: Query formulation Search can contain both structural and textual conditions. Retrieval strategy Exploit the document structure to retrieve

Centre for Content and Knowledge Engineering, Utrecht University, the Netherlands

Usability experimentHypothesisSetupResultsTask complexityObservations

Outline - Motivation - Objectives - Theory - Usability - Conclusions

Centre for Content and Knowledge Engineering, Utrecht University, the Netherlands

Hypothesis

H1:

Use of sophisticated query formulationtechniques will lead to a higher effectivenessof the task performance.

NEXI-CO and Bricks should perform significantlybetter than NEXI-CO, with respect to successfultask completion.

The difference in effectiveness is dependant onthe task complexity.

Outline - Motivation - Objectives - Theory - Usability - Conclusions

Page 10: Outline - University of Otago · is different: Query formulation Search can contain both structural and textual conditions. Retrieval strategy Exploit the document structure to retrieve

Centre for Content and Knowledge Engineering, Utrecht University, the Netherlands

Hypothesis

H2:

The Bricks approach for query formulationwill increase the efficiency of the user for agiven task.

When time is taken into account, with respect toeffectivity, we expect that users will needsignificantly less time to (successfully) completethe task, when using NEXI-CO and Bricks.

Outline - Motivation - Objectives - Theory - Usability - Conclusions

Centre for Content and Knowledge Engineering, Utrecht University, the Netherlands

SetupDocument collection: Lonely Planet.

Systems: NEXI-CO, NEXI-CAS, and Bricks.

Users: 54 MIR-students, trained in NEXI andSDR.

Topics: 27 topics, sorted in three complexitygroups.

Survey: prior and after the experiment theparticipants filled in a survey (satisfaction)

Experience: Before the experiment the userspracticed with the TERS-interface and searchengines. (reduces learning effects)

Outline - Motivation - Objectives - Theory - Usability - Conclusions

Page 11: Outline - University of Otago · is different: Query formulation Search can contain both structural and textual conditions. Retrieval strategy Exploit the document structure to retrieve

Centre for Content and Knowledge Engineering, Utrecht University, the Netherlands

Overall results

Effectivity: significant diff. between Brick and NEXI-CAS vs. NEXI-CO. (H1)

Efficiency: Bricks most efficient. (sign. diff.) (H2)

Time: users need significantly more time to formulateinformation need with NEXI-CAS.

Satisfaction: no significant difference, but NEXI-CASpreferred, followed by Bricks.

4.10.150.27198NEXI-CO

4.70.140.34245NEXI-CAS

214

Time

0.32

Effectivity

0.16

Efficiency

4.6Bricks

SatisfactionSystem

Outline - Motivation - Objectives - Theory - Usability - Conclusions

Centre for Content and Knowledge Engineering, Utrecht University, the Netherlands

Task complexity

Outline - Motivation - Objectives - Theory - Usability - Conclusions

Page 12: Outline - University of Otago · is different: Query formulation Search can contain both structural and textual conditions. Retrieval strategy Exploit the document structure to retrieve

Centre for Content and Knowledge Engineering, Utrecht University, the Netherlands

Task complexity

Outline - Motivation - Objectives - Theory - Usability - Conclusions

Centre for Content and Knowledge Engineering, Utrecht University, the Netherlands

Task complexity

Outline - Motivation - Objectives - Theory - Usability - Conclusions

Page 13: Outline - University of Otago · is different: Query formulation Search can contain both structural and textual conditions. Retrieval strategy Exploit the document structure to retrieve

Centre for Content and Knowledge Engineering, Utrecht University, the Netherlands

Observations

Completely different working procedures:

NEXI-CO: formulate query, inspect resultsand refine. (many iteration steps)

NEXI-CAS: step-wise construction andvalidation of NEXI-query, submissions tocheck syntax (few iteration steps)

Bricks: Longer construction time, almost noiterations, submit to check results.

Outline - Motivation - Objectives - Theory - Usability - Conclusions

Centre for Content and Knowledge Engineering, Utrecht University, the Netherlands

ConclusionsUser need to be capable to adequately use thestructure of a document, to make SDR work inpractice.

Three objectives for query formulation:Adequate expression power

Minimize syntactical formulation problems

Minimize required knowledge of document structure

Bricks: graphical approach, intuition of mentalmodel, building blocks, avoid informationoverload. (Online-demo available)

Outline - Motivation - Objectives - Theory - Usability - Conclusions

Page 14: Outline - University of Otago · is different: Query formulation Search can contain both structural and textual conditions. Retrieval strategy Exploit the document structure to retrieve

Centre for Content and Knowledge Engineering, Utrecht University, the Netherlands

ConclusionsExperiment:

Sophisticated query formulation techniques have apositive influence of task performance (effectivenessof NEXI-CAS and Bricks)

Bricks is more efficient, since it allows the user tosuccessfully perform their task in a shorter amountof time.

Task complexity vs. effectivity: negative correlation,but Bricks and NEXI-CAS are more effective for mid-and highly complex task.

Task complexity vs. efficiency: strong negativecorrelation.

NEXI-CAS is sensitive with respect to time needed for taskcompletion.

Outline - Motivation - Objectives - Theory - Usability - Conclusions