Proceedings Template - WORD - USF Computer Sciencewolber/Research/WebTop/papers/jcdl04.doc · Web...

Navigating the Personal Web

David Wolber, Chris Brooks

University of San Francisco
2130 Fulton Avenue
San Francisco, CA 94117
(415) 422-6451

ABSTRACT

This paper presents a system for seamlessly navigating from one’s own personal space to external information sources and to the personal spaces of other users. We present techniques for peer-to-peer knowledge sharing and zero-input publishing, as well as a context view that combines searching, browsing, associative file management, and blog-like features.

Categories and Subject Descriptors

H.3.3 [Information Search and Retrieval]

General Terms

Algorithms, Human Factors, Standardization, Experimentation, Languages.

Keywords

Personalization, contextualization, collaboration, peer-to-peer.

1. INTRODUCTION

The context for our research includes two emerging phenomena: 1) an explosion in the availability of general-purpose and domain-specific document collections (digital libraries), and 2) the pervasiveness of incredibly powerful computing, storage, and networking capabilities available to ordinary computer users. The purpose of our research is to leverage these phenomena in order to improve the research and creative process.

We take an inward-out approach, grounding our tools and techniques in what we call the personal web. The term actually has dual connotations: 1) providing a user with a personal view of the WWW, and 2) considering the personal information space, including all documents, bookmarks, and links, as a highly interconnected space that extends seamlessly to the external world.

have designed a user interface that allows for easy navigation of this neighborhood.

3. WEBTOP: AN ASSOCIATIVE CLIENT

Associative agents serve as virtual library assistants, peeking over the user’s shoulder as the user writes or browses, analyzing what associative information would be helpful, and then scurrying off to virtual libraries (information sources) to gather data. Also known as reconnaissance agents [29], personal information assistants [9], and attentive systems [31], these agents aim to augment the user’s associative thinking capabilities and thereby improve the creation and discovery process.

Figure 2 shows a screenshot of the current version of our associative agent, WebTop. The user can browse web documents or edit MS-Word documents in the right panel. As the user works, associative links are displayed in the left panel, which we call the context view. The ‘I’, ‘O’, and ‘C’ icons specify the type of association. ‘I’ stands for inward link, i.e., the listed document points to the working one, ‘O’ stands for outward link, and ‘C’ means the document has similar content. When the user clicks on any expander (+) in the context panel, associations at the next degree of separation from the open document are displayed.

Based on our study of associative agents as well as the experience we have gained building and using them, we have identified several features that seem to be effective in helping users locate and manage information. They include:

Zero-input interface [29]. In the traditional desktop, creation and information retrieval are two distinct processes. When a creator is in need of information, he or she switches from the current task, opens a search engine, formulates an information query and then invokes the query.

[Figure 1 diagram labels: Personal (three instances), General-Purpose Search Engine, Specialized Search Engines, Associative Thinking Agent]

Figure 1. The proposed system allows any entity that can provide associative data to interested parties to register as an associative thinking agent source. This includes general-purpose search engines like Google, specialized search engines like the Internet Archive, and ordinary users of personal computers. Associative thinking agents, residing on personal computers, help the user identify sources and information relevant to the user’s current task.

Zero-input interfaces seek to integrate creation and information retrieval. The agent underlying the interface analyzes the user’s working document(s) and automatically formulates and invokes information queries. One common zero-input task uses TFIDF [3] to identify the most characteristic words in the document, then sends those as keywords to information source searches (this is how the ‘C’ links in Figure 2 are generated). The results of such queries are listed on the periphery of the user’s focus. The user periodically glances at the suggested links and interrupts the working task only when something is of interest.

Figure 2. The WebTop Associative Client

Because zero-input interfaces are always formulating associative queries, impromptu information discovery is facilitated. There is no need for users to stop their current task and switch contexts and applications in order to search for related work. The challenge, however, is that many users find changes in the user interface that they did not initiate disruptive. Our client uses a ramping interface [38] to iteratively and cooperatively expose information to the user, and only modifies the context panel when a new URL is loaded or a node in the panel is expanded. We also provide a search box for traditional search.
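The TFIDF step described above can be illustrated with a minimal sketch. It is not WebTop's implementation: the tokenizer, the background corpus, the smoothing choice (counting the working document itself in the document frequency), and the function names are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def characteristic_words(document, corpus, k=3):
    """Return the k terms of `document` with the highest TF-IDF score.

    `corpus` is a hypothetical list of background documents, used only to
    estimate how common each term is.
    """
    doc_tokens = tokenize(document)
    if not doc_tokens:
        return []
    term_counts = Counter(doc_tokens)
    corpus_sets = [set(tokenize(d)) for d in corpus]
    n_docs = len(corpus_sets) + 1  # the working document counts as one document
    scores = {}
    for term, count in term_counts.items():
        doc_freq = 1 + sum(term in words for words in corpus_sets)
        tf = count / len(doc_tokens)
        idf = math.log(n_docs / doc_freq)
        scores[term] = tf * idf
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

The returned terms would then be sent as the keywords of a content-related query; words that appear in every background document score zero and are never selected ahead of rarer terms.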

Graph/tree view of retrieved information. Search engines typically provide results in a linear fashion. The user can select a link to view the corresponding page, but there is no way to expand the link to view documents related to it, and there is no mechanism for viewing a set of documents and their relationships.

A more flexible approach, taken by WebTop, is to display retrieved links in a file-manager-like tree view. When the user expands a node in the tree, the system retrieves information associated with that link and displays it at the next level in the tree. By expanding a number of nodes, the user can view a collection of associated documents, e.g., the citation graph of a particular research area.

Mixing of association types. Search engines and file managers typically focus on one type of association. For instance, Google’s standard search retrieves content-related links, that is, links related to a set of keywords. In the separate advanced search page, a Google user can view inward links of a URL. However, there are no queries or views that integrate content-related and link-related associations. Similarly, file managers focus on one association type, the parent-child relationships of folders and documents, and ignore hyperlink associations and content-related associations. The early associative agents [9, 38] also focused on one association type, content-related links, and ignored explicit links and other associations.

WebTop integrates various association types, e.g., folder-child, link, content-relation, and in general results from any query available in the associative sources API which we have defined (see next section). Associations from each type can be listed at each level of the tree, allowing a user to view various multiple-degree associations, e.g., the documents that point to the content-related links of a document, or the inward links of the outward links of a document (its siblings).

Note that when the sources of the links are personal webs, this essentially allows the client user to navigate into the personal space of another user. When a document from another’s personal web is expanded, the system will display outward, inward, and content-related links from that same source. Outward and inward links from a personal web include folders as well as other documents, so the client user can navigate both the folder hierarchy and the links within the personal space of the other user.
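A rough sketch of such a context tree follows, assuming each node is expanded lazily and each child records the association type that produced it. The class, the `lookup` callable, and the toy link table are hypothetical and stand in for queries against real associative sources.

```python
from dataclasses import dataclass, field

# Association types shown in the context view.
INWARD, OUTWARD, CONTENT = "I", "O", "C"


@dataclass
class ContextNode:
    url: str
    assoc_type: str = ""  # how this node relates to its parent
    children: list = field(default_factory=list)
    expanded: bool = False


def expand(node, lookup):
    """Fetch a node's associations the first time it is expanded.

    `lookup(url)` is a hypothetical callable returning (assoc_type, url)
    pairs gathered from whichever sources are active.
    """
    if not node.expanded:
        node.children = [ContextNode(url, kind) for kind, url in lookup(node.url)]
        node.expanded = True
    return node.children


# Toy link table: a.html and b.html both point to c.html.
LINKS = {"a.html": ["c.html"], "b.html": ["c.html"], "c.html": []}


def toy_lookup(url):
    outward = [(OUTWARD, target) for target in LINKS.get(url, [])]
    inward = [(INWARD, src) for src, targets in LINKS.items() if url in targets]
    return outward + inward
```

Expanding `a.html` and then its outward link `c.html` lists the inward links of that outward link, which is the "siblings" view mentioned above (the list includes `a.html` itself, since no filtering is done in this sketch).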

Mixing of external and internal documents. In the traditional desktop, there are tools that work with web documents (search engines) and tools that work with local documents (file managers and editors). There is generally little integration between the two. WebTop de-emphasizes the distinction between local and external documents by integrating both into a single context tree view, and by considering links from local to external documents. For instance, if a local document contains a hyperlink to a web document, the agent will display that relationship. If an external document has similar content to that of a local document, that association will be displayed. By considering both the user’s own documents and documents from external sources, the associative agent serves as both a remembrance agent [38] and a reconnaissance agent [29].

Associative saving. Users can also save documents within the context panel, so the agent also serves as an associative file manager. The system provides the ability to create edge links, which associate documents without modifying the internals of either document. WebTop stores these links as metadata and displays them in the context panel. One thing edge links make possible is adding links to web pages that the user does not own (bookmarks).

Such integration of previously separated tools is beginning to occur in commercial systems, e.g., one can now blog within Google. In that case, browsing and blogging (annotated, published bookmarking) are integrated, but saving to the user’s personal space is not. WebTop integrates all of these features—when a user links a document into the personal web, it is saved locally and, if within the shared personal web space, made available to other users. This feature is what we call zero-input publishing—just by bookmarking and saving documents, the user can disseminate knowledge.
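The edge-link idea, in which associations are kept as metadata rather than written into either document, might be sketched as below. The storage format (a list of dictionaries serialized as JSON) and all names are assumptions for illustration; the paper does not describe how WebTop stores this metadata.

```python
import json


class EdgeLinkStore:
    """Keeps user-created links outside the documents they connect."""

    def __init__(self):
        self._links = []  # each entry: {"source", "target", "note"}

    def add(self, source, target, note=""):
        link = {"source": source, "target": target, "note": note}
        if link not in self._links:  # ignore exact duplicates
            self._links.append(link)

    def outward(self, url):
        """Targets of edge links that start at `url`."""
        return [l["target"] for l in self._links if l["source"] == url]

    def inward(self, url):
        """Sources of edge links that point to `url`."""
        return [l["source"] for l in self._links if l["target"] == url]

    def dumps(self):
        """Serialize the links so they can be saved next to the documents."""
        return json.dumps(self._links, indent=2)
```

Because the links live in a separate store, a user can associate a local note with a web page they cannot edit, and the same records can answer both inward and outward link queries.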

Aggregation of multiple sources. The newest version of WebTop allows a user to select the active sources from the dynamic list of all registered associative sources. For each chosen source, the user also specifies the queries that should be invoked (e.g., keyword search, inward link, outward link) including the number of results returned from each. When a new URL or document is loaded in the browser, or a node in the context tree is expanded, the user-specified sources and queries are invoked. The results are then displayed in both the order and count specified by the user.
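As a hedged illustration of this aggregation step, the sketch below assumes the user's preferences are an ordered list of (source, query, count) triples and that each source is a dictionary of query callables. Both structures and the function name are hypothetical.

```python
def aggregate(url, preferences, sources):
    """Invoke each user-selected (source, query) pair in the user's order.

    `preferences` is an ordered list of (source_name, query_name, max_results).
    `sources` maps source_name to a dict of query_name -> callable(url, n),
    where each callable returns a list of links.
    """
    results = []
    for source_name, query_name, max_results in preferences:
        query = sources.get(source_name, {}).get(query_name)
        if query is None:
            continue  # this source does not offer the requested query
        links = query(url, max_results)[:max_results]  # enforce the user's count
        results.append((source_name, query_name, links))
    return results
```

The output preserves the order of the preference list, so the context panel can render each group of links in the order and count the user specified.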

4. ASSOCIATIVE SOURCES

In many domains, web service providers are agreeing on standard programmatic interfaces so that information consumer applications need not re-implement client code to access each particular service. For instance, Microsoft has published a WSDL interface to which securities information providers can conform [41].

Our system applies this standardization in a cross-domain fashion by considering web services that provide similar “associative” functionality but are not generally within the same topic domain. In particular, we consider a class of web service which we call associative information sources. Such services associate documents with keywords, documents with other documents, authors with documents, and in general information resources with other resources. The Google and Amazon web services are prime examples of services in this class, as are domain-specific information sources like FindLaw, the Modern Languages Association (MLA) page for literature, and CiteSeer and the ACM Digital library for computer science.

Currently such services either provide only a web page interface that must be scraped by an agent, or they provide a web service based on their own programmatic interface (API). For instance, the Google and Amazon web services both provide a search method that accepts keywords and returns a list of links, but a different method signature is used by each. Because of this non-uniformity, client applications must talk to each associative source using a different protocol. This prevents a developer who has written a client for the Google service from reusing that code to access the Amazon service.

More importantly, the lack of a uniform API prohibits the use of polymorphic lists of associative sources. This is important for clients that aggregate multiple sources, such as the WebTop system described above. Without polymorphism, the choice of which sources to make available in a client application must be set at development time, and the end-user of the client application is restricted to those chosen. An end-user cannot access a newly created or discovered source without the code of the client being changed.

A standardized API and registry system is clearly the solution. Initiatives for standardization of search-like protocols exist both in the web meta-search area, with STARTS, SDLIP, and SDARTS [18], and in the digital library world, with, for instance, the OAI [35], OCI [36], and XLink [47].

Our particular goal is to define an API based on the XML/SOAP web service protocol and the accompanying Web Service Description Language (WSDL [46]) and Universal Description, Discovery, and Integration (UDDI [44]) specifications. We also plan to explore various associative methods within our API(s), so we have not conformed to any of the existing protocols in this version of our software.

Instead, we have defined a common API for an “associative source”, and a public registry system for such sources [49]. The API is specified in a publicly available WSDL file. It contains various associative methods, including keyword and citation search (see Figure X). The methods allow the client to specify the number of links to be returned and to set restrictions (e.g., date, country) on the elements that should be considered. Results are returned in a generic list of Metadata objects, where the Metadata class is defined to contain the Dublin Core fields and a URL.

With this open system, any organization or individual can expose a digital collection as an associative information source. If there is already a web service for the collection, the owners or a third-party can write a wrapper (adapter) service that conforms to the associative sources API but makes calls to the existing service.
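The uniform API and the wrapper (adapter) approach can be sketched in Python as follows. The actual API is defined in WSDL and is not reproduced in this paper, so the method names, the subset of Dublin Core fields, and the `LegacyCatalog` service with its placeholder record are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Metadata:
    """A few Dublin Core fields plus a URL (illustrative subset)."""
    title: str
    creator: str
    date: str
    url: str


class AssociativeSource(Protocol):
    """Hypothetical common interface that every source conforms to."""

    def keyword_search(self, keywords: list[str], max_results: int) -> list[Metadata]: ...

    def inward_links(self, url: str, max_results: int) -> list[Metadata]: ...


class LegacyCatalog:
    """Stand-in for an existing service with its own method signature."""

    def find(self, query_string):
        # Placeholder record, not real data.
        return [{"name": "Example title", "by": "Example Author",
                 "year": "2004", "href": "http://example.org/item/1"}]


class LegacyCatalogAdapter:
    """Wrapper that exposes LegacyCatalog through the common interface."""

    def __init__(self, catalog):
        self._catalog = catalog

    def keyword_search(self, keywords, max_results):
        rows = self._catalog.find(" ".join(keywords))
        return [Metadata(r["name"], r["by"], r["year"], r["href"])
                for r in rows[:max_results]]

    def inward_links(self, url, max_results):
        return []  # the wrapped service offers no link data
```

With every source behind the same interface, a client can hold a plain list of sources and loop over it without knowing which concrete service sits behind each entry, which is the polymorphism argued for above.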

After the source is implemented and deployed, it can be registered using a web page interface that we provide. The registry parses the WSDL files of the sources that register to determine which of the associative source API methods are implemented.

Aggregator agents like WebTop use a web service interface to the registry to access the list of registered sources available to the user. The objects returned from the registry web service method contain URLs referencing the WSDL and endpoint of the source, the particular associative methods that the source provides, and metadata about the source. The aggregator can list the available sources for the user to choose from, or intelligently choose the source(s) for the user. In either instance, the list of sources is dynamic, allowing users to benefit from newly developed sources as soon as they are available.
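A minimal in-memory sketch of such a registry follows. The real registry parses each source's WSDL file; here Python introspection stands in for that step, and the method names, class name, and entry layout are assumptions.

```python
# Hypothetical names for the methods of the associative source API.
API_METHODS = ("keyword_search", "inward_links", "outward_links")


class Registry:
    """Records which API methods each registered source implements."""

    def __init__(self):
        self._entries = {}

    def register(self, name, endpoint, source):
        # Stand-in for WSDL parsing: check which API methods the object offers.
        methods = [m for m in API_METHODS if callable(getattr(source, m, None))]
        self._entries[name] = {"endpoint": endpoint, "methods": methods}

    def methods(self, name):
        return list(self._entries[name]["methods"])

    def list_sources(self, method=None):
        """Names of all sources, or only those implementing `method`."""
        return sorted(name for name, entry in self._entries.items()
                      if method is None or method in entry["methods"])
```

An aggregator could call `list_sources("inward_links")` each time the user opens the source selection dialog, so that newly registered sources appear without any change to the client code.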

To bootstrap the system, we developed a number of associative source web services, including ones that access data from Google, Amazon, CiteSeer, the Meerkat RSS feed site, and FindLaw. We have also developed sample C# and Java web service code that can be downloaded and used to build new associative sources. In implementing these sources, we faced a number of interoperability problems, primarily because of the immaturity of the WSDL and XML/SOAP protocols and differences between the .Net and Java web service development tools. Through much trial and error, these problems were solved so that services developed on both the Java and .Net platforms can be called generically by our agent.

5. SHARING THE PERSONAL WEB

A key component of our project is the idea of a personal web. A personal web consists of the collection of documents and bookmarks on the user’s local hard disk or server space. On initial startup, WebTop users specify the root folders to be analyzed for the personal web, e.g., “My Documents”. The system iterates through the file system, identifying hyperlinks between documents and the characteristic words of each document, and building an inverted index for full-text search of the space. As a user works, this personal web metadata is updated so that it is always consistent with the file system. For example, when a user bookmarks a web page or adds a link from a document to a web page, that association information is recorded.
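The indexing pass described above might look roughly like the following sketch. To stay self-contained it works on an in-memory mapping of paths to text rather than walking the file system, and the regular expressions, function names, and return layout are assumptions rather than WebTop's actual design.

```python
import re
from collections import defaultdict

HREF = re.compile(r'href="([^"]+)"', re.IGNORECASE)
TAG = re.compile(r"<[^>]+>")


def index_personal_web(documents):
    """Build an inverted index and link tables for a set of documents.

    `documents` maps a path to its text.
    Returns (inverted_index, outward_links, inward_links).
    """
    inverted = defaultdict(set)  # word -> paths containing it
    outward = {}                 # path -> list of link targets
    inward = defaultdict(set)    # target -> paths linking to it
    for path, text in documents.items():
        targets = HREF.findall(text)
        outward[path] = targets
        for target in targets:
            inward[target].add(path)
        visible = TAG.sub(" ", text)  # drop markup before indexing words
        for word in re.findall(r"[a-z]+", visible.lower()):
            inverted[word].add(path)
    return inverted, outward, inward


def search(inverted, query):
    """Return the paths that contain every word of the query."""
    words = re.findall(r"[a-z]+", query.lower())
    if not words:
        return set()
    return set.intersection(*(inverted.get(w, set()) for w in words))
```

The inward table answers the remembrance-agent question "which of my documents point to the open one", while the inverted index supports full-text search of the same space.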

One use of the personal web information is as a remembrance agent for the user: when a document is opened in the browser, documents from the user’s own personal web that point to or are content-related to the open document are displayed in the context panel.

A more interesting use, of course, is for peer-to-peer knowledge sharing. It should be noted that the personal web is not just a collection of documents that can be searched. Instead, it consists of documents and associations, including hyperlinks and folder-document relationships. Thus, each time the user categorizes a document by placing it in a folder, creates a folder, or adds a hyperlink within a document, he or she is adding to the richness of the information.

To expose personal webs for sharing, we have implemented a web service which conforms to the associative source API and returns information from a personal web. We are currently completing implementation of a mechanism so that, on initial start-up of WebTop, users will be asked whether they want to expose their personal spaces as information sources and, if so, which folders should be shared with whom. If a user chooses exposure, the system will automatically register the personal web as an associative source and, each time the user logs on to the system, deploy the web service exposing the methods to the outside world.

Once the user specifies the shared folders, the user will be able to share without effort: all document saving, bookmarking, and link creation will automatically create shared knowledge. We hypothesize that zero-effort publication will lead to more sharing, and that much of the information in personal spaces is hidden from others not for privacy reasons, but because publishing the information as a web page or a blog takes effort.

We realize, of course, that privacy is a complicated issue in both the corporate and academic settings. The challenge will be to provide a privacy specification mechanism that is flexible enough to provide for the various needs of individuals and organizations, but easy enough that people actually use it instead of choosing “share all” or “share none”. The W3C P3P effort [45] should be of help here, along with efforts such as [1] and [5]. Our plan is to implement a fairly simple mechanism, make the system available, and then refine the privacy mechanism iteratively based on user feedback.

6. PERSONALIZED PAGE RANKING AND SOURCE SELECTION

Multiple information sources exacerbate the already challenging information overload problem of single source search engines. Clustering of results can help [11], as can personalized page ranking [20, 23, 39] and automated source selection.

WebTop currently requires the user to explicitly specify the number of results to be returned from each source, and the ordering. We are currently implementing extensions which will also provide automated page ranking and source selection. The algorithm we have designed combines content similarity, link analysis, and source reputation measurements in choosing sources and links. Prior to information retrieval, context information, including the open documents and those near them in the personal web, can be compared against characteristic terms from prospective sources. In the post-processing phase, the context information can be compared to the results returned from the various sources. In both cases, links from the personal web and the personal webs of others can be used to compute a personalized PageRank.

We also plan to incorporate source reputation into the algorithms. The reputation of a source can be computed specifically for the user by measuring the percentage of listed links from the source that the user actually chooses. Collaborative filtering can also be used to take into account the source’s reputation vis-à-vis other users. Such automated source reputation measures have proven helpful in blogging systems and peer-to-peer systems [19].
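The per-user reputation measure described above, the fraction of a source's listed links that the user actually chooses, can be sketched as follows. This covers only that one measure, not the collaborative filtering or the combined ranking algorithm; the class and method names are hypothetical.

```python
from collections import defaultdict


class SourceReputation:
    """Tracks, per source, how many listed links the user went on to open."""

    def __init__(self):
        self._listed = defaultdict(int)
        self._chosen = defaultdict(int)

    def record_listed(self, source, count):
        """Call when `count` links from `source` are shown to the user."""
        self._listed[source] += count

    def record_chosen(self, source):
        """Call when the user opens a link that came from `source`."""
        self._chosen[source] += 1

    def score(self, source):
        """Fraction of listed links chosen; 0.0 for a source never listed."""
        listed = self._listed.get(source, 0)
        return self._chosen.get(source, 0) / listed if listed else 0.0

    def ranked(self):
        """Listed sources, best reputation first (ties alphabetical)."""
        return sorted(self._listed, key=lambda s: (-self.score(s), s))
```

A client could use `ranked()` to decide which sources to query first, or weight each source's results by its score during post-processing.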

7. SCENARIOS OF USE

This section describes current and envisioned uses of the WebTop system.

7.1 Impromptu Information Discovery

A key facet of the integrated WebTop system is that information can be discovered in an impromptu manner. For instance, when a user opens one of his own research papers in the browser, the system will display the inward links to that paper. If a new web page has linked to it or cited it, that page will appear.

The first author was using WebTop and opened up his list of publications page. An interesting inward link appeared that the author had not seen before. He clicked on the link, and was taken to a page written in German. Fortunately, the page had some English links pointing to interesting work related to the first author’s work.

7.2 Navigating an Expert’s Personal Web

Imagine an expert, say Henry Lieberman of the MIT Media Lab, renowned computer scientist, guru in various areas including artificial intelligence, software agents, and human-computer interaction, and a person who has a profound effect on the people he meets, one of those people with whom a five-minute conversation can revolutionize one’s thinking, change the course of one’s research, and trigger a thousand new ideas.

Now imagine Henry sitting in his office at MIT, or on a cross-country airplane, sitting and working on his computer, reading some papers, browsing the web, bookmarking particular texts, writing notes, adding links between a paper from one of his research communities (e.g., HCI) to another (e.g., Artificial Intelligence). And imagine that you are able to look over his shoulder and observe him, or better yet, all the work he is doing is recorded in an easily digestible form, one that you can browse at your leisure. Instead of asking Google what it thinks about, say, the semantic web, you can ask Henry—you can search within his personal collection of bookmarks and notes, you can browse his directories, you can navigate through the links he has layered on top of the documents in his collection.

Then to take it a step further, besides navigating the document links within Henry’s collection, you can also navigate people links, people links that have been created through an automated analysis of Henry’s documents, so now you are not only picking Henry’s brains, but sitting in a room of experts, seamlessly floating from one expert’s brain to another. And all this is made true because the selection, filtering, and notation work that Henry and the others have been performing is available in the public domain.

7.3 WebTop as Groupware

You are part of a small research group at a small law firm. Unlike the scenario above, you only want to share your day-to-day knowledge creation with members of the group, but it is crucial that the members of the group do not duplicate their efforts and that your team is able to produce a cohesive and exhaustive survey of the topic in a limited time.

In this case, you choose the personal webs of the other group members as the sources for your associative agent, as well as a law citation source. Whenever you open a new document, the context panel displays any notes or cases that the other members have associated with the document, as well as cases that cite or are cited by the open document.

8. ECONOMIC AND POLITICAL MOTIVATION

Though important in today’s relatively free Internet environment, distributed and peer-to-peer knowledge sharing may prove even more important as the freedom of that environment is challenged. We must not be fooled by the free and creative beginnings of the Internet, or the “good citizen” approach of Google’s founders [McHugh]. The economic and political forces of our society can easily render these historical anomalies. Instead, a Zinn-like [51] view must be taken: this is a battle for the Internet between corporate interests and the general populace.

Neither business nor political interests are motivated to create a more knowledgeable society. As opposed to free thinkers, both are better served by a placid population of consumers. Corporations do not care what we see, as long as we view their interstitial advertisements. The government that protects those corporations fears anything that threatens their power.

As the portal to information is centralized through Google (a Google-Opoly [33]), the company that buys Google, or some other monopoly that emerges, the dangers to information freedom will grow. As Lawrence Lessig has argued [28], this danger will not present itself in law or stated policy, but in code! Our freedom will be dependent on the source choosing and page ranking algorithms hidden within the centralized server.

Two dangers lurk: the infiltration of advertisement within our computers, and thought control through the careful dispensation and withholding of information. In terms of advertisement, one need only consider the historical precedent of television, and how the commercial time per hour has risen steadily.

Powerful forces would like to see the Internet go in the same direction. Consider the following Kafkaesque scenario which the first author recently experienced. As he was browsing, advertisements—to Netscape, Great Beginnings, and AOL, among other “legitimate” companies—began appearing on his computer screen. Note that these were not the normal pop-ups that the author, in his infinite wisdom, had removed months before using anti-popup software. In this case, the popup did not emanate from the pages the author was visiting. Instead, an “adware” agent, maliciously installed on the author’s computer, was responsible. The agent was not only popping up interstitial messages, it was monitoring the user’s browser behavior so as to “personalize” the ads that he received (no comment on how this personalization manifested itself).

Being a somewhat savvy computer user, the author realized what was occurring and opened up the “Control Panel” to remove it. He searched his list of installed components and found one he didn’t recognize. With some fear that he was deleting some component of some application that was important, the author selected the unknown application for removal.

A dialog appeared stating that the application should not be removed, but if you really want to, click OK. Of course, the dialog disappeared before the OK button could be chosen. After many attempts, the author deftly reached the OK button to specify his choice. Unfortunately, and obviously, clicking the button had no effect—the program was not removed.

The point of this anecdote is not that some idiot was able to infiltrate the computer in this way, but that legitimate companies were accepting enough of such a strategy to buy into it. It is certainly proof of the passion and desperation corporations have in making Internet advertisement work, as well as their expectation of what consumers will accept.

The other threat that centralized access to the web poses is information access control through page ranking or other software mechanisms. Encoded in complex algorithms, such access control would significantly hinder freedom of thought. Google has already acknowledged reacting to pressure from the Chinese government:

Chinese who use Google to search on terms like "falun gong" or "human rights in china" receive a standard-looking results page. But when they click on any of the results, either their browsers are redirected to a blank or government-approved page, or their computers are blocked from accessing Google for an hour or two.[33]

One is left to wonder: does Google facilitate similar access control for our government, but in less obvious ways?

Clearly, no matter who is in charge, centralized control of information is not in the public’s interest. Systems that aggregate information from separate collections are less prone to such control. Those based on a centralized registry, such as the WebTop system we have described, are not immune from control, even though the centralization is at the information source level and does not rank or suggest documents directly. Ideally, both source identification and document discovery should probably use the peer-to-peer model [39, 42].

9. RELATED WORK

Reconnaissance Agents. The idea of an agent that assists users in their browsing and discovers new links on their behalf has been explored in the past. One of the most well-known systems is Lieberman's Letizia system [30], which helped users browse web pages by looking ahead at the links on each page and suggesting ones that match the user's working profile. Letizia performed a personal crawl [12] seeded from the current document, and built the user's working profile based on information recorded during the current session. A successor system, PowerScout [29], used longer-term user profiling information, and also recognized the need for multiple profiles representing a user's various personalities and interests.

Margin Notes [38] was a just-in-time information system. It used TFIDF to find the most characteristic words in each section of a document, and then sent those words to both a general-purpose search engine and a search facility for the user's local documents. The resulting links were listed in the margins of the document, providing just-in-time and up-to-date annotations of the document. Since the annotations could come from the user's local files as well as the web, the agent served as a remembrance agent as well as a reconnaissance agent.

Whereas Letizia helped users browse, Margin Notes provides associative links both within the browser context and a word processing context. This latter component is important in that associative information is pushed to the user during the creative process. Margin Notes is also different than Letizia in that it uses a general-purpose search engine to search for related items on the entire web, as opposed to searching only in the neighborhood of the current document.

Watson [9] is an information management assistant similar to Margin Notes. Its goal, similar to that of SUITOR [31], is to be attentive to all of a user's everyday applications. Watson researchers also explored automatically selecting the source of a search using the terms from the working document [27].

The WebTop associative agent was motivated by all of the above systems. Like Watson, it finds related information from multiple information sources. It also performs a search in the neighborhood of the working document, like Letizia, and searches the personal space, like Margin Notes. WebTop is different from all these systems in the types of associations it considers and the way it relaxes conventional distinctions between local and web documents, and between various types of associations.

Personal Knowledge Base Systems. TheBrain [43] is billed as an associative computing system. It allows users to create “things”, associate them, and view a graph of those things. A user can create a thing from a URL, but the system is not integrated with a file manager, browser, or web graphing system.

Haystack [22] offers an integrated personal platform based on XML and RDF associations. Haystack is at the systems level, with XML-compliant applications running on top of it. Thus, applications like email and file managers speak the same language, enabling the system to offer associative features not possible in the traditional environment.

MBiblio [37] offers a personalized interface to a federation of digital libraries. All libraries in the federation follow the OAI-PMH standard. Like WebTop, use of a standard allows for cohesive meta-search. Note, however, that MBiblio is not integrated with a browser or file manager, and does not provide views of different types of associations.

Meta-search. There are a number of commercial and research metasearch systems. 37.com provides access to 37 different search engines from a single interface. Dogpile (www.dogpile.com) combines various types of information sources, including newsgroups and white pages. Inquirus [17], [50], and MetaSpider [13] use link analysis as well as content in clustering the results from various sources. Inquirus also augments queries with user information. SavvySearch [14] uses the user’s past choices to help choose the sources that should be searched on each query. The digital library system described in [34] performs both meta-search and reference linking from multiple sources.
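To make the basic meta-search step concrete, the sketch below merges ranked result lists from several sources into one list. It uses a simple summed reciprocal-rank score, which is an illustrative choice and not the method of any system cited above; the function name and the sample URLs in the usage note are hypothetical.

```python
def merge_results(result_lists):
    """Merge ranked result lists from several sources into one ranking.

    Each URL receives 1/rank from every list in which it appears, so a
    URL returned near the top by several sources rises in the merged list.
    """
    scores = {}
    for results in result_lists:
        for rank, url in enumerate(results, start=1):
            scores[url] = scores.get(url, 0.0) + 1.0 / rank
    # Highest combined score first; ties are broken alphabetically.
    return sorted(scores, key=lambda url: (-scores[url], url))
```

For example, merging `["a", "b", "c"]`, `["b", "a", "d"]`, and `["b", "e"]` places `b` first, because it ranks highly in all three lists.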

Source Discovery. Source discovery is aided by sources exposing characteristic terms or some other sort of profile. STARTS and SDARTS [18] define standards for exposing such a profile. Similar efforts exist in the UDDI world. DAML-S [2] is a web-service-specific language for describing the semantics of the particular methods that a source provides.

Collaboration. The idea of developing a multi-agent system that allows users to share information or collaborate has also been explored by other researchers. Chau et al. [11] describe a system in which users are able to annotate and share the results of Web searches. They found that user performance was reduced compared to a single-user system when the number of other collaborators was small, but that once a threshold number of other collaborators and searches was reached, sharing and annotation became a worthwhile task.

Research in collaborative filtering systems has also considered the problem of allowing users to share recommendations about web pages, movies, or books. Collaborative filtering is an extremely active research area, producing research projects such as GroupLens [25], among many others, as well as commercial products such as the Alexa Toolbar. Collaborative filtering typically works by having a large set of users rate a number of documents, and then relating a new user to these users, so that the new user is 'close' to users with similar tastes.
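The notion of a new user being 'close' to existing users can be sketched with a small user-based example. This is a generic illustration under stated assumptions: cosine similarity over rating dictionaries and a k-nearest-neighbor prediction are common choices, but they are not claimed to be what GroupLens or any other cited system uses, and all names here are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two {document: rating} dictionaries."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[d] * b[d] for d in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(new_user, ratings, k=2):
    """Rank documents the new user has not rated, using the k closest users.

    `ratings` maps each existing user to a {document: rating} dictionary.
    Each unseen document is scored by the similarity-weighted average of
    the ratings given by the k most similar users.
    """
    neighbors = sorted(
        ratings, key=lambda u: (-cosine(new_user, ratings[u]), u)
    )[:k]
    totals, weights = {}, {}
    for user in neighbors:
        sim = cosine(new_user, ratings[user])
        if sim <= 0:
            continue  # Ignore users with no overlap in rated documents.
        for doc, rating in ratings[user].items():
            if doc in new_user:
                continue
            totals[doc] = totals.get(doc, 0.0) + sim * rating
            weights[doc] = weights.get(doc, 0.0) + sim
    predicted = {doc: totals[doc] / weights[doc] for doc in totals}
    return sorted(predicted, key=lambda d: (-predicted[d], d))
```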

Community Formation. One of the more intriguing possibilities is the spontaneous emergence of communities of people with shared interests discovering each other. Flake et al. [15] have studied the identification of self-organizing communities on the web. Using techniques from graph theory, in particular network flow algorithms, they identify clusters of web pages that are highly interconnected, thereby forming a community. Often these communities are emergent; they form through a series of local interactions, rather than through some supervising process. Flake et al.'s work differs from ours in that they are concerned with identifying communities that already exist, rather than bringing new communities into existence. We are also interested in forming communities of users, rather than of documents. Nevertheless, their work on identifying network structures that enable community formation will help us determine effective methods for aiding community formation.

There has also been work in the multi-agent area concerning the formation of coalitions and congregations of agents [7,8,40,52] and communities [16,32].

Personal Spiders and Focused Crawlers. Focused crawlers [10] and Personal spiders [12,13] crawl the web beginning with a set of seed documents and a profile. The plan for WebTop is to run a personal crawler with seed documents taken from the personal web.
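A crawl that starts from seed documents and is steered by a profile can be sketched as a best-first search. The sketch below runs over an in-memory link graph rather than the live web, and its relevance measure (the fraction of profile terms found in a page) is a deliberately simple assumption; it is not the scoring used in [10], [12], or [13], and the function and parameter names are hypothetical.

```python
import heapq


def focused_crawl(seeds, links, text, profile, limit=10):
    """Visit pages best-first, starting from `seeds`.

    `links` maps a URL to the URLs it links to, `text` maps a URL to its
    page text, and `profile` is a set of lowercase terms of interest.
    Seeds are always expanded; other pages are expanded only if they
    match at least one profile term.
    """
    def relevance(url):
        if not profile:
            return 0.0
        words = set(text.get(url, "").lower().split())
        return len(words & profile) / len(profile)

    # heapq is a min-heap, so relevance is negated to pop the best page.
    frontier = [(-relevance(url), url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    visited = []
    while frontier and len(visited) < limit:
        neg_score, url = heapq.heappop(frontier)
        visited.append(url)
        if url in seeds or neg_score < 0:
            for target in links.get(url, ()):
                if target not in seen:
                    seen.add(target)
                    heapq.heappush(frontier, (-relevance(target), target))
    return visited
```

In this sketch an off-topic page is still recorded as visited, but its outgoing links are not followed, which keeps the crawl near the topic defined by the profile.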

10. CURRENT STATUS AND FUTURE WORK

Only a prototype of our newest WebTop version exists. It is incomplete, buggy, and has yet to be formally evaluated, but informal observation of users suggests it is extremely powerful as a research tool. The prototype, the implemented associative sources, and the registry have not yet been made public. Our plan is to release a public version in March of 2004.

Once the system is available, we plan to study both explicitly designed groups of WebTop users and grassroots uses of the system. Will communities of users evolve? Will experts share? Will users free-ride the system, as many do with Gnutella and other file sharing systems?

The system currently offers no help in source discovery, other than a description provided by each source. We plan to explore mechanisms for automated source discovery, as well as automated congregation of sources.

As mentioned, we are also implementing both automated source selection and personalized page ranking using a user model based on the personal web. We are also completing implementation of a personal spider that emanates from the documents in the personal web. Whereas the context panel allows the user and agent to collaboratively navigate, the personal spider works on its own (as the user sleeps!), collecting metadata about the documents near the personal space. Our plan is to perform user tests of the quality of n-degree neighborhoods, both for the user’s own neighborhood, and for the neighborhoods of others (some expert or group of experts). Will such neighborhoods provide better search results than Google in some instances?
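One way to read "n-degree neighborhood" is the set of documents reachable within n link hops of the personal space. The sketch below computes that set by breadth-first search over an in-memory link graph. The interpretation, the function name, and the data layout are assumptions made for illustration; the spider under development may define neighborhoods differently.

```python
from collections import deque


def neighborhood(start_docs, links, n):
    """Map each document within n link hops of `start_docs` to its distance.

    `links` maps a document to the documents it links to. Documents in
    `start_docs` (the personal space) have distance 0.
    """
    distance = {doc: 0 for doc in start_docs}
    queue = deque(start_docs)
    while queue:
        doc = queue.popleft()
        if distance[doc] == n:
            continue  # Do not expand beyond n hops.
        for target in links.get(doc, ()):
            if target not in distance:
                distance[target] = distance[doc] + 1
                queue.append(target)
    return distance
```

A search restricted to a neighborhood would then consider only the keys of the returned dictionary, optionally weighting each document by its distance.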

11. SUMMARY

The key contributions of our work are the introduction of:

An associative tree view of documents that can be “programmed” by the end-user, i.e. the user can choose the associations shown on node expansion.

A system that integrates browsing, searching, citation analysis, and blogging.

A working version of an “associative source” API and registry.

A GUI that cohesively integrates the personal space, external information sources, and the personal spaces of others.

Though the work is presented as a web application, most of the ideas apply to the concept of a personal digital library as well.

12. ACKNOWLEDGMENTS

We would like to thank the 2003 senior project team at the University of San Francisco for implementing the WebTop system described in this paper.

13. REFERENCES

[1] Agrawal, R., Kiernan, J., Srikant, R., Xu, Y., An XPath-based Preference Language for P3P, Proceedings of the World Wide Web Conference, WWW2003, Budapest, Hungary, 2003.

[2] DAML-S Coalition: Ankolekar, A., Burstein, M., Hobbs, J., Lassila, O., Martin, D., McIlraith, S., Narayanan, S., Paolucci, M., Payne, T., Sycara, K., and Zeng, H., DAML-S: Semantic markup for Web services. In Proc. Int. Semantic Web Working Symposium (SWWS), 411–430, 2001.

[3] Baeza-Yates, R., Ribeiro-Neto, B., Modern Information Retrieval. ACM Press, New York, 1999.

[4] Billsus, D., Pazzani, M., Learning Probabilistic Models, Workshop Notes of "Machine Learning for User Modeling", Sixth International Conference on User Modeling, Chia Laguna, Sardinia, 1997.

[5] Bretzke, H., Vassileva, J., Motivating Cooperation in Peer to Peer Networks, in P. Brusilovsky, A. Corbett, F. De Rosis (eds.), Proceedings of the 9th International Conference on User Modelling, UM03, Johnstown, PA, Springer LNCS, 218-227, 2003.

[6] Brin, S., Page, L., The anatomy of a large-scale hypertextual Web search engine, Computer Networks and ISDN Systems, 30(1), pp. 107-117, 1998.

[7] Brooks, C., Durfee, E., Armstrong, A., An Introduction to Congregating in Multiagent Systems, Proceedings of the Fourth International Conference on Multiagent Systems, pp. 79-86, 2000.

[8] Brooks, C., Durfee, E., Congregation Formation in Multiagent Systems, Autonomous Agents and Multiagent Systems, Special Issue on Infrastructure for Agents, Multi-Agent Systems and Scalable Multi-Agent Systems, 7(1-2), July/September, 2003.

[9] Budzik, J., Hammond, K., "Watson: Anticipating and Contextualizing Information Needs," 62nd Annual Meeting of the American Society for Information Science, Medford, NJ, 1999.

[10] Chakrabarti, S., van den Berg, M., and Dom, B.: Focused Crawling: A New Approach to Topic-Specific Web Resource Discovery. In Proceedings of the 8th International World Wide Web Conference, Toronto, Canada, May 1999.

[11] Chau, M., Zeng, D., Chen, H., Huang, M., Nendriawan, D., Design and Evaluation of a Multi-agent Collaborative Web Mining System, Decision Support Systems, 988, 2002.

[12] Chen, H., Chung, Y., Ramsey, M., Yang, C., An Intelligent personal spider (agent) for dynamic internet/intranet searching, Decision Support Systems, 23(1), pp. 41-58, 1998.

[13] Chen, H., Fan, H., Chau, M., and Zeng, D.: MetaSpider: Meta-searching and Categorization on the Web. Journal of the American Society of Information Science & Technology, 52(13) (2001), 1134-1147.

[14] Dreilinger, D., Howe, A., Experiences with selecting search engines using metasearch, ACM Transactions on Information Systems (TOIS), v.15 n.3, p.195-222, July 1997.

[15] Flake, G., Lawrence, S., Giles, C., Coetzee, F., "Self-Organization of the Web and Identification of Communities", IEEE Computer, 35(3), pp. 66-71, 2002.

[16] Foner, L. N. Yenta: A Multi-Agent, Referral-Based Matchmaking System. In Proceedings of The First International Conference on Autonomous Agents, 301-307, ACM Press, 1997.

[17] Glover, E., Tsioutsiouliklis, K., Lawrence, S., Pennock, D., Flake, G., Using Web Structure for Classifying and Describing Web Pages, Proceedings of WWW02, Honolulu, HI, 2002.

[18] Green, N., Ipeirotis, P., Gravano, L., SDLIP + STARTS = SDARTS: a protocol and toolkit for metasearching, Proceedings of the first ACM/IEEE-CS joint conference on Digital libraries, p.207-214, January 2001, Roanoke, Virginia, United States.

[19] Gupta, M., Judge, P., Ammar, M., Peer to peer systems: A reputation system for peer-to-peer networks, Proceedings of the 13th international workshop on Network and operating systems support for digital audio and video, 2003.

[20] Haveliwala, T., Topic-sensitive PageRank. In Proceedings of the Eleventh International World Wide Web Conference, Honolulu, Hawaii, May 2002.

[21] Huang, Z., Chung, W., Ong, T., Chen, H., A Graph-Based Recommender System for Digital Library, in: Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL'02), Portland, Oregon, July 14-18, 65-73, (2002).

[22] Huynh, D., Karger, D., and Quan, D. Haystack: a platform for creating, organizing and visualizing information using RDF. Semantic Web Workshop, WWW2002 (May 2002).

[23] Jeh, G., Widom, J., Scaling personalized web search, Proceedings of the twelfth international conference on World Wide Web, May 20-24, 2003, Budapest, Hungary.

[24] Kleinberg, J., Authoritative Sources in a Hyperlinked Environment. J. ACM 46(5): 604-632 (1999).

[25] Konstan, J., Miller, B., Maltz, D., Herlocker, J., Gordon, L., and Riedl, J., GroupLens: Applying Collaborative Filtering to Usenet News, Communications of the ACM, 40(3), 1997.

[26] Lawrence, S., Giles, C., "Text and Image Metasearch on the Web", Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, pp 829-835, CSREA Press, 1999.

[27] Leake, D., Scherle, R., Budzik, J., Hammond, K., Selecting Task-Relevant Sources for Just-in-Time Retrieval, Proceedings of the AAAI-99 Workshop on Intelligent Information Systems, AAAI Press, 1999.

[28] Lessig, L., The Future of Ideas: The Fate of the Commons in a Connected World, Random House, 2001.

[29] Lieberman, H., Fry, C., Weitzman, L., Exploring the Web with Personal Reconnaissance Agents, Communications of the ACM, 44(8), August, 2001.

[30] Lieberman, H., Letizia: An agent that assists Web browsing, Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI-95), Montreal, 1995.

[31] Maglio, P., Barrett, R., Campbell, C., Selker, T., SUITOR: An Attentive Information System, 2000 International Conference on Intelligent User Interfaces, New Orleans, LA, ACM Press.

[32] Marsh, S. and Masrour, Y. 1997. Agent Augmented Community Information — The ACORN Architecture. In Proceedings of CASCON’97, Meeting of Minds, 1997.

[33] McHugh, J., “Google vs. Evil”, Wired Magazine, 11.01, http://www.wired.com/wired/archive/11.01/google_pr.html, January 2003.

[34] Mischo, W., Habing, T., Cole, T., Integration of simultaneous searching and reference linking across bibliographic resources on the web, Proceedings of the 2003 Joint Conference on Digital Libraries, 2003.

[35] Open Archives Initiative, http://www.openarchives.org/

[36] Open Citation Project, http://opcit.eprints.org/

[37] Reyes-Farfan, N., Sanchez, J., Personal Spaces in the Context of OAI, proceedings of the Joint Conference on Digital Libraries, 2003.

[38] Rhodes, B., Maes, P., Just-in-time information retrieval agents, IBM Systems Journal, 39(3-4), pp. 685-704, 2000.

[39] Shi, S., Yu, J., Yang, G., Wang, D., Distributed Page Ranking in Peer-to-Peer Systems, Proceedings of 2003 International Conference on Parallel Processing, October, 2003.

[40] Shehory, O., Kraus, S., Methods for Task Allocation via Agent Coalition Formation, Artificial Intelligence, 101, pp. 165-200, 1998.


[41] Short, S., Building XML Web Services for the Microsoft .Net Platform, Microsoft Press, 2002.

[42] Suel, T., Mathur, C., Wu, J., Zhang, J., A Peer-to-Peer Architecture for Scalable Web Search and Information Retrieval, Proceedings of WWW 2003, 2003.

[43] TheBrain, www.thebrain.com.

[44] UDDI Home Page, http://www.uddi.org.

[45] W3C P3P Group, http://www.w3.org/P3P.

[46] W3C WSDL Specification, http://www.w3.org/TR/wsdl

[47] XLink Language Specification, http://www.w3.org/TR/xlink/

[48] Wolber, D., Kepe, M., Ranitovic, R., Exposing Document Context in the Personal Web, Proceedings of the International Conference on Intelligent User Interfaces (IUI 2002), San Francisco, CA.

[49] Wolber, D., Brooks, C., Associative Agents and Sources, submitted to the World Wide Web Conference (WWW 2004).

[50] Yu, C., Meng, W., Wu, W., Liu, K., Efficient and Effective Metasearch for Text Databases Incorporating Linkages among Documents. In Proc. of SIGMOD 01, CA, 2001.

[51] Zinn, H., The People’s History of the United States, Harper and Row, 1980.

[52] Zlotkin, G., Rosenschein, J., Coalition, Cryptography, and Stability: Mechanisms for Coalition Formation in Task Oriented Domains, Proceedings of the National Conference on Artificial Intelligence, Seattle, WA, pp. 432-437, 1994.