
Sept 20-21, 2001 R. Scott Cost - CADIP, UMBC

CARROT II

Collaborative Agent-based Routing and Retrieval of Text, Version 2
CADIP Fall Research Symposium


Overview

A distributed, agent-based system for large-scale, high-bandwidth information retrieval and visualization.

Carrot I, implemented ~1997, demonstrated the distribution of queries to various backend systems through a single broker, using Telltale, with TKQML as a communication mechanism.


Outline

Project Review
  Goals
  Overview
  Issues
  Architecture
Progress Report


C2 Project Goals

Build a powerful, high-bandwidth distributed IR system
Create a testbed for research in a variety of IR issues
Foster new and ongoing IR research at UMBC


Basic C2 Approach

A client submits a query to some agent in a distributed C2 system.
That agent uses metadata about its collection and the collections around it to decide whether to handle the query or forward it to another agent.
Results are assembled and returned to the client.
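The handle-or-forward decision can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Agent` class, the use of a plain term set as collection metadata, and the overlap-count scoring are hypothetical and are not taken from the C2 implementation, which is Java-based and uses MG and Jackal.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical C2-style agent: a name, a local corpus, and known peers."""
    name: str
    docs: dict[str, str]                                  # doc id -> text
    peers: dict[str, "Agent"] = field(default_factory=dict)

    def metadata(self) -> set[str]:
        """Assumed metadata form: the set of terms in the local collection."""
        return {t for text in self.docs.values() for t in text.lower().split()}

    def search(self, terms: set[str]) -> list[tuple[str, int]]:
        """Rank local documents by the number of query terms they contain."""
        hits = []
        for doc_id, text in self.docs.items():
            score = len(terms & set(text.lower().split()))
            if score:
                hits.append((doc_id, score))
        return sorted(hits, key=lambda h: (-h[1], h[0]))

    def handle(self, query: str) -> tuple[str, list[tuple[str, int]]]:
        """Answer locally, or forward to the peer whose metadata overlaps most."""
        terms = set(query.lower().split())
        best, best_overlap = self, len(terms & self.metadata())
        for peer in self.peers.values():
            overlap = len(terms & peer.metadata())
            if overlap > best_overlap:
                best, best_overlap = peer, overlap
        return best.name, best.search(terms)


# Two peer agents that know about each other.
a = Agent("a", {"a1": "agent communication languages", "a2": "software agents"})
b = Agent("b", {"b1": "text retrieval engines", "b2": "retrieval of web text"})
a.peers["b"] = b
b.peers["a"] = a

# The client asks agent "a"; the query is forwarded to "b", which answers it.
routed_to, results = a.handle("text retrieval")
```

In this sketch a query is forwarded at most once and only one agent answers, so there is nothing to fuse; a fuller system would also have to merge ranked lists from several agents.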


How does it work?

Single IR engine is replicated across multiple machines
Each engine gets a portion of the total document collection
Engines exchange metadata describing their collections
Engines receive queries, and either answer or forward them as appropriate
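The partitioning and metadata-exchange steps above might look roughly like the following sketch. The round-robin split and the use of per-term document frequencies as metadata are assumptions chosen for brevity; the slides do not say how C2 divides the collection or what form its metadata takes, and the function names are hypothetical.

```python
from collections import Counter


def partition(docs: dict[str, str], n_engines: int) -> list[dict[str, str]]:
    """Deal documents round-robin so each engine gets a portion of the collection."""
    shards: list[dict[str, str]] = [{} for _ in range(n_engines)]
    for i, doc_id in enumerate(sorted(docs)):
        shards[i % n_engines][doc_id] = docs[doc_id]
    return shards


def summarize(shard: dict[str, str]) -> Counter:
    """Assumed metadata: for each term, the number of shard documents containing it."""
    df: Counter = Counter()
    for text in shard.values():
        df.update(set(text.lower().split()))
    return df


def exchange(shards: list[dict[str, str]]) -> dict[int, dict[int, Counter]]:
    """Give every engine the summaries of all the other engines."""
    summaries = [summarize(s) for s in shards]
    return {
        i: {j: summaries[j] for j in range(len(shards)) if j != i}
        for i in range(len(shards))
    }


docs = {
    "d1": "agent routing",
    "d2": "text retrieval",
    "d3": "agent retrieval",
    "d4": "web text",
}
shards = partition(docs, 2)   # engine 0 holds d1 and d3, engine 1 holds d2 and d4
views = exchange(shards)      # views[0][1] is what engine 0 knows about engine 1
```

With summaries like these in hand, an engine can compare a query's terms against its own and its peers' metadata before deciding to answer or forward.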


Research Issues/Questions

Heterogeneity (information sources)
Metadata (form, order, comparison)
Query Management (routing, standing)
Results Fusion
Corpus Management
Integration with Parallel Telltale, RAMA
  Index-based parallelism
  Storage-based parallelism


Flexible System

Form of system can change dramatically, based on:
  How system is distributed
  How metadata is distributed
  How queries are handled
  How fusion is handled
  Whether or not system adapts dynamically to query performance and/or load…


Some example scenarios

Two peer agents, each managing a corpus (IR System is MG).
  Each agent advertises metadata to the other.
  Queries directed at either are routed to the appropriate agent.

Based on TREC WT10g Collection
  ~1,700,000 documents from the WWW
  N agents, one for each of the ~12,000 servers represented in collection
  Topology of system inferred from link topology in collection of web pages

An agent starts and runs a C2 system for a specific purpose.
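For the WT10g scenario, one way to infer an agent topology from page links is sketched below. The assumptions are mine: one agent per host name, an undirected connection between two agents whenever any page on one server links to a page on the other, and hypothetical `example` URLs standing in for collection data. The slides do not describe the actual inference procedure.

```python
from urllib.parse import urlsplit


def server_of(url: str) -> str:
    """Host part of a URL, used here as the identity of the server's agent."""
    return urlsplit(url).netloc.lower()


def agent_topology(links: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Map each server to the servers it shares at least one hyperlink with."""
    neighbors: dict[str, set[str]] = {}
    for src, dst in links:
        a, b = server_of(src), server_of(dst)
        neighbors.setdefault(a, set())
        neighbors.setdefault(b, set())
        if a != b:                      # links within one server add no edge
            neighbors[a].add(b)
            neighbors[b].add(a)
    return {server: sorted(peers) for server, peers in neighbors.items()}


# Hypothetical (source page, target page) pairs extracted from hyperlinks.
links = [
    ("http://a.example/x.html", "http://b.example/y.html"),
    ("http://a.example/x.html", "http://a.example/z.html"),
    ("http://c.example/", "http://b.example/y.html"),
]
topology = agent_topology(links)
```

Here the agent for `b.example` ends up connected to both other agents, while `a.example` and `c.example` can only reach each other through it.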


C2 Architecture

C2 Agents
  Form the core of the C2 system
C2 Infrastructure Elements
  Provide effective communication and control support
C2 Support Elements
  Control and provide access to system


C2 Agent

Java-based software agent
Communicates using the Jackal system
Runs a local corpus and metadata engine (currently MG)


Basic Node Architecture

(diagram) Labels: Agent; Jackal; Other nodes; IR Engine Wrapper; Decision Interface

IR System: Manages local corpus and metadata


C2 Infrastructure

Provides for efficient control of system
Hierarchical
Several Types of Agent:
  Master
  Node
  Platform
  Cluster


Infrastructure

(diagram: infrastructure hierarchy)

Master Agent
  Node Agent: Controls one physical node (Next Node…)
    Platform: Controls one JVM (Next Platform…)
      Cluster Agent: Controls one Jackal instance (Next Cluster…)
        C2 Agent, C2 Agent


Infrastructure…

Infrastructure hierarchy allows for efficient propagation of control information
Communication and coordination are localized to reduce overhead
Shape of tree can be modified to change performance
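A small sketch of how control information could travel down such a hierarchy follows. The `ControlAgent` class, its method names, and the tree dimensions are hypothetical; the real infrastructure agents communicate through Jackal, which is not modeled here. Each agent passes a command only to its direct children, which is the sense in which communication stays local.

```python
from dataclasses import dataclass, field


@dataclass
class ControlAgent:
    """Hypothetical infrastructure agent: master, node, platform, cluster or c2."""
    kind: str
    name: str
    children: list["ControlAgent"] = field(default_factory=list)
    log: list[str] = field(default_factory=list)

    def add(self, child: "ControlAgent") -> "ControlAgent":
        self.children.append(child)
        return child

    def propagate(self, command: str) -> int:
        """Record the command, pass it to direct children, return agents reached."""
        self.log.append(command)
        return 1 + sum(child.propagate(command) for child in self.children)

    def depth(self) -> int:
        """Number of levels in the subtree rooted at this agent."""
        return 1 + max((child.depth() for child in self.children), default=0)


# Assumed shape: 2 nodes, 1 platform (JVM) per node, 2 clusters per platform,
# 2 C2 agents per cluster.
master = ControlAgent("master", "master")
for n in range(2):
    node = master.add(ControlAgent("node", f"node{n}"))
    platform = node.add(ControlAgent("platform", f"node{n}/jvm0"))
    for c in range(2):
        cluster = platform.add(ControlAgent("cluster", f"{platform.name}/cluster{c}"))
        for i in range(2):
            cluster.add(ControlAgent("c2", f"{cluster.name}/agent{i}"))

reached = master.propagate("shutdown")   # 1 + 2 + 2 + 4 + 8 agents
levels = master.depth()
```

Changing the loop bounds changes the shape of the tree: a wider, shallower tree means fewer hops per command but more messages sent by each parent.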


C2 Support

Master
  Controls the C2 system
ANS
  White pages communications support
Collection Manager
  Controls distribution of documents/collections to C2 Agents
Logger Agent
  Logs system operational information


C2 Tools

Query Agent
  Supports the controlled presentation, collection and analysis of large batches of queries
C2 System Visualizer
  Presents a graphical view of the flow of queries through the system


C2 Tools: Visualizer

(screen shot)


For More Information …

For more details on the goals and design of the project, see the documents on the project site: http://acm.org/~cost/carrot2/info.htm


3/6/12 Plan (From 9/2000)

3: Clear design, working prototype.
6: Fully operational system, testing on real data.
12: Publication ready results for one or more research questions. Tentative target of CIKM.

50-75% complete: System still in test with scalability issues, design publications in press.


3/6/12 Plan (From 9/2001)

3: Exercise system and prepare initial results for publication.
6: Expand system.
12: To be determined.


External Publication Plans

WWW 2002
Autonomous Agents 2002
SIGIR 2002


Academic Milestones

Monitoring and Control of a Distributed IR System
  M.S. Thesis, Srikanth Kallurkar (Fall ’01)

Integrating C2 as an Information Source for ITTALKS
  M.S. Project, Yogesh Nagappa (Fall ’01)

Integrating Telltale into the C2 System
  691 Project, Jonathan Kessler and Matt Siegel (Fall ’01)

Visualization of a Distributed IR System
  691 Project, Tom Laufert (Fall ’01)

Data Fusion in C2 Agents
  691 Project, Mithun Sheshagiri (Fall ’01)

Query Caching in the C2 System
  M.S. Thesis, Hemali Majithia (Spring ’02)

A User-friendly Interface to the C2 System
  Jacquelyn Nicole Winston, High School Intern (Spring ’01)