
The Online Knowledge Center: Building a Component Based Portal

O. Balsoy, Computer Science Department, Florida State University, Tallahassee, FL ([email protected])

M. S. Aktas, Community Grids Lab, Indiana University, Bloomington, IN

G. Aydin, Community Grids Lab, Indiana University, Bloomington, IN

M. N. Aysan, Community Grids Lab, Indiana University, Bloomington, IN

C. Ikibas, Community Grids Lab, Indiana University, Bloomington, IN

A. Kaplan, Community Grids Lab, Indiana University, Bloomington, IN

J. Kim, Computer Science Department, Florida State University, Tallahassee, FL

M. E. Pierce, Community Grids Lab, Indiana University, Bloomington, IN

A. E. Topcu, Community Grids Lab, Indiana University, Bloomington, IN

B. Yildiz, Community Grids Lab, Indiana University, Bloomington, IN

G. C. Fox, Community Grids Lab, Indiana University, Bloomington, IN

Abstract

This paper presents an overview of the Online Knowledge Center (OKC) web portal. The OKC is built around a portlet/container architecture: a central control portal is composed of several portlets that can deliver both local content and content from remote servers. The modular structure allows us to develop sophisticated portal components independently and plug them into the portal container using well-defined XML interfaces. We describe problems we have discovered with this architecture and extensions and solutions that we are implementing. We also describe two advanced services that can be plugged into the overall framework: XML message-based newsgroups and hybrid structured/unstructured data searches.

Keywords: portlets, XML messaging, hybrid search

1 Introduction

The Online Knowledge Center (OKC) is a component based portal system that we are designing to support distributed web content display, management, and authoring. The OKC will potentially be used to provide access to a wide range of information in various formats, including but not limited to presentation slides, software repositories, contact information, training announcements, and newsgroup postings.

Component-based portals represent an important area for investigation and development. Essentially, this allows us to treat all web accessible content as objects that can be wrapped inside standard XML interfaces. Through a standard web service framework such as WSRP [1], portal containers will be able to dynamically discover and bind to desired content portlets. We view this as the correct interface management system for user interfaces to web services: portals become an aggregate of distributed portlets, each of which in turn is a client to one or more web services. All components describe themselves and communicate with XML.

In summary, OKC development and research focus on the following major areas: portal components that manage information delivered from distributed content servers; a distributed content management system; dynamic content creation "wizards" to support newsgroups, training, and conference registration; and multiple search capabilities, including searches over semi-structured web site material, structured searches over XML data, and hybrid searches over linked documents.

The OKC server contains only the central portal control and display code and OKC specific web content. We use Jetspeed [2] for this portal framework. Other web content is maintained on separate servers that double as content management servers. Apache-Jakarta's Slide [3] project is a WebDAV-based [4] content management system that we are evaluating.

Remote content is organized into portlets, which are mapped to HTML tables for display in browsers. The central portal server controls the arrangement and display of the portlets. Portlet content can be as simple as static web pages, or it can be a user interface to sophisticated custom services.

In the remainder of the paper we present extensions and modifications that we have made to standard portlets. We then describe two examples of sophisticated services (newsgroups and hybrid searches) that we are developing. Web interfaces for these services plug naturally into the portal container framework as portlet components.

2 Portal Development for OKC

The OKC Web Portal is built on Jetspeed, an open source project from Apache's Jakarta project [5]. Jetspeed provides user authentication and profiling, screen layout management and configuration, HTML aggregation from different sources, and various other portal services. We have taken Jetspeed as a starting point for implementing the OKC distributed content management portal. However, further research and development was needed to extend Jetspeed functionality to accomplish our goal.

2.1 Navigation Problem

Jetspeed serves as a thin Web interface with no content depth. All the content, except Jetspeed's menu and navigation links, is retrieved from remote hosts, and users are directed to these hosts whenever a link is chosen. The OKC Portal, in contrast, presents HTML content from different sources while keeping users navigating through the content within the portal environment. That is, we need to be able to display linked pages within the same portlet, delivering "deep" content developed by independent content developers.

Jetspeed's standard mechanism for delivering remote web content is through the WebPagePortlet. This portlet also uses other API functions, such as HTMLRewriter, to replace all the HTML links with their absolute forms.

The OKC development team has developed a new version of WebPagePortlet, called OKCWebPagePortlet. In the new version, we have assigned a unique identification to each portlet to distinguish between them. This allows us to maintain separate state in several different navigable portlets. This ID system may be superseded by future standard Jetspeed conventions.

We have also rewritten HTML links fetched from the remote content so that instead of spawning a new browser window with an absolute URL address, we now redirect this absolute URL as a parameter to our modified WebPagePortlet. This is summarized in Table 1.

Jetspeed runs on a Turbine [6] servlet which constructs a special object, called rundata, that maintains all servlet request, session and user profiler data, and distributes it to each portlet. Following the user click, the portal request URL is formed as "http://./portal?unique_id=http://./&.," and, from rundata objects, the URL parameters are retrieved by each portlet using their unique IDs. Finally, the new URL is used to fetch the selected page, an action which simulates Web navigation.

Original Link        http://remote.host/dir/file.html
WebPagePortlet       <a href="http://remote.host/dir/file.html">
OKCWebPagePortlet    <a href="http://jetspeed.host/jetspeed/portal?uniqueId=http://remote.host/dir/file.html">

Table 1: Reprocessing rules for URLs in different portlets. A unique identifier for the portlet is specified in a configuration file and replaces the parameter uniqueId above.
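The rewriting rule in Table 1 can be illustrated with a minimal sketch. It is written in Python for brevity rather than in the Java used by Jetspeed, and it is not the OKCWebPagePortlet code: the function names are hypothetical, and percent-encoding the embedded URL is an assumption (Table 1 shows it unencoded).

```python
from urllib.parse import parse_qs, quote, urljoin, urlsplit

# Portal address as it appears in Table 1.
PORTAL_URL = "http://jetspeed.host/jetspeed/portal"


def rewrite_link(href, page_url, portlet_id):
    """Turn a link found in remote content into a request back to the portal."""
    # Step 1: resolve the link against the page it came from (absolute form).
    absolute = urljoin(page_url, href)
    # Step 2: pass the absolute URL as the value of the portlet's own parameter.
    # Percent-encoding is an assumption; Table 1 shows the URL unencoded.
    return f"{PORTAL_URL}?{portlet_id}={quote(absolute, safe='')}"


def target_for(portal_request_url, portlet_id):
    """Recover the page a given portlet should fetch, or None if it has no parameter."""
    params = parse_qs(urlsplit(portal_request_url).query)
    values = params.get(portlet_id)
    return values[0] if values else None
```

Because each portlet looks only for its own identifier in the request, a click in one portlet leaves the others without a parameter, and they keep their current page.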

When the same user has multiple portlets, each OKCWebPagePortlet stores the last clicked URL in the servlet session. When the portal view is reconstructed without a link being clicked, for example on returning from the layout customizer, all the portlets retrieve their current URLs from the servlet session. If no URL is found, the initial URL from the registry is used.
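The lookup order described above (clicked URL, then session, then registry) can be sketched as follows. This is an illustration in Python, not the portlet's Java code; plain dictionaries stand in for the servlet session and the Jetspeed registry, and the function name is hypothetical.

```python
def current_url(session, registry, portlet_id, clicked_url=None):
    """Return the URL a portlet should display for this request."""
    if clicked_url is not None:
        # A link was clicked in this portlet: remember it for later requests.
        session[portlet_id] = clicked_url
    # No click in this portlet: reuse the stored URL, falling back to the
    # initial URL that the registry holds for the portlet.
    return session.get(portlet_id, registry[portlet_id])
```

Keying the session entry by the portlet's unique ID is what lets several navigable portlets keep separate state for one user.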

2.2 HTML Aggregation Problems

All the HTML content and resources accessed by the portal have their own elements and characteristics according to the context they are produced for. Title information, metadata, scripts, colors, font faces, styles, images, Java applets, ActiveX controls and many other technologies are widely used. However, forming a new portal page from different HTML sources requires sacrificing features that do not work well together and scripts that do not fit into the portal environment.

The OKC Web Portal discriminates among the HTML features as needed. JavaScript sections are preserved but their behaviors may change. OKC includes a script library to resolve problems that content developers might face in adapting their scripts. All the HTML header tags except META tags are preserved. LINK tags are moved to the portal's header section so that developers' styles may take effect. Any HTML content with frames is displayed in a new browser window outside the portal environment. Content files with systematic names for each frame can be displayed within the portal. HTML links with window and frame targets are altered to suit the OKC Portal environment.

2.3 Restricted Hosts

As we solved the navigation problem, we also had to prevent HTML content from arbitrary sources from being displayed within the portal because of the problems mentioned in Section 2.2. We define each content host and specific location within that host as an OKC area. An OKC area consists of remote content in a single directory that has all been produced following OKC content guidelines. All the HTML links within an area are local, relative paths. Such content is displayed in an area portlet inside the portal. Links to other areas are valid and area portlets can switch from area to area. Any link to resources outside an area is not guaranteed to possess valid, OKC-compatible web content, so it is displayed outside of the portal view in a new window.
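The area test can be sketched as a host and path-prefix check. This Python sketch is illustrative only: the registry contents, host names and function names are hypothetical, and the actual portal may match areas differently.

```python
from urllib.parse import urlsplit

# Hypothetical area registry: area name -> (host, path of the area's directory).
AREAS = {
    "training": ("content.example.org", "/training/"),
    "software": ("repo.example.org", "/okc/software/"),
}


def find_area(url, areas=AREAS):
    """Return the name of the registered area that contains url, or None."""
    parts = urlsplit(url)
    for name, (host, path) in areas.items():
        if parts.hostname == host and parts.path.startswith(path):
            return name
    return None


def display_mode(url, areas=AREAS):
    """Links inside any registered area stay in the area portlet; others open a new window."""
    return "area-portlet" if find_area(url, areas) else "new-window"
```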

Each area is registered in an XML area configuration file with its name, host and path information. Each area is also expected to have a stem links file in its top level content directory. The stem links file is an XML file that defines the content's top level menu. This file contains a menu title, a description, and a list of menu items, each having a relative path from the top level directory to the content file, an item title and a brief description. The OKC Web Portal provides area sensitive menus along with the area portlets, which construct a sub menu while users navigate through the content. Any changes in areas are detected and the sub menu is replaced with another menu using the stem links files inside the content.
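To make the stem links description concrete, the sketch below parses a hypothetical stem links file into a menu structure. The element and attribute names (stemlinks, item, path) are assumptions made for illustration, since the actual schema is not given here, and Python is used only for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical stem links file; the real element names are not specified.
STEM_LINKS = """
<stemlinks>
  <title>Training</title>
  <description>Courses and tutorials</description>
  <item path="intro/index.html">
    <title>Introduction</title>
    <description>Getting started</description>
  </item>
  <item path="mpi/index.html">
    <title>MPI</title>
    <description>Message passing course</description>
  </item>
</stemlinks>
"""


def load_menu(xml_text):
    """Read the menu title, description and items from a stem links document."""
    root = ET.fromstring(xml_text.strip())
    return {
        "title": root.findtext("title"),
        "description": root.findtext("description"),
        "items": [
            {
                "path": item.get("path"),  # relative to the area's top level directory
                "title": item.findtext("title"),
                "description": item.findtext("description"),
            }
            for item in root.findall("item")
        ],
    }
```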

3 XML Messaging for NewsGroups

The OKC Web Portal is enriched by collaborative tools. As one of the first attempts to make this happen, we have added newsgroups capability to the portal where users can interact through both online and offline messaging. The OKC not only provides interfaces to read and post messages but also allows e-mail distributions to and from group subscribers. This service resembles mailing lists; however, the architecture is built on distributed modules operating separately and communicating by a JMS-compliant event brokering system [7, 8] so that each module can serve as a Web service to multiple clients or other services as systems grow and have new functionalities. This represents an example of using forms with XML targets.

The newsgroup system's major components and interactions are shown in Figure 1.

Figure 1: OKC's newsgroup architecture supports both email and JSP clients and readers.

To the system, newsgroup messages are simply events: their structures are defined by an XML schema, they are enveloped in SOAP [9] messages with attachments, and they are handled as other system events are handled. Newsgroup modules listen to distributed events, distinguish the newsgroup messages from others, and act accordingly.

There are two major event generators in the newsgroups architecture. These are the News Wizard and the E-mail Handler. The Wizard is a JSP [10] interface based on the event schema. For newsgroup users, it is an interface to post messages to the system. After users post their messages, the Wizard generates XML event instances and publishes them to the event broker as system events. The second major event generator is the Handler. It interacts with an SMTP host, validates and converts users' e-mail messages to system events.

Each event is identified by a unique URI such as gxos://okc/events/2334, and each message contained in an event by a URI such as gxos://okc/newsgroups/agroup/44332. Each URI corresponds to an XML document or XML metaobject whose schema and management are defined by GXOS [11]. Each XML metaobject contains information about where and how an actual resource is accessed and preserved.

The Number Generator service is deployed to ensure system-wide unique naming. This service is used by event publishers before any URI is assigned to events and messages.
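One way to picture the Number Generator is as a shared counter that publishers consult before forming a URI. The Python sketch below is an assumption-laden illustration: the class and function names are hypothetical, the counter lives in a single process, and only the URI shapes are taken from the examples above.

```python
import itertools
import threading


class NumberGenerator:
    """Hands out numbers that are never repeated within one running service."""

    def __init__(self, start=1):
        self._counter = itertools.count(start)
        self._lock = threading.Lock()

    def next_id(self):
        # The lock keeps concurrent publishers from drawing the same number.
        with self._lock:
            return next(self._counter)


def event_uri(generator):
    """URI for a new system event, following the gxos://okc/events/... pattern."""
    return f"gxos://okc/events/{generator.next_id()}"


def message_uri(generator, group):
    """URI for a new message in the given newsgroup."""
    return f"gxos://okc/newsgroups/{group}/{generator.next_id()}"
```

A deployed service would also have to persist the counter so that numbers stay unique across restarts; that part is omitted here.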

There are two subscribers to published events. The first is the News Recorder, which provides persistency for system events. Each user message is recorded in a database for later reference. The other subscriber is the E-mail Distributor. The Distributor detects events with user messages and sends them out to the subscribers of the newsgroup the message was sent to.
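The publish/subscribe relationship between the broker and its two subscribers can be sketched with an in-memory stand-in. This Python sketch is not the JMS-compliant broker used by the OKC: the class names, the dictionary-based event layout and the "kind" field are all hypothetical, and e-mail delivery is replaced by an outbox list.

```python
class Broker:
    """In-memory stand-in for the event broker: delivers each event to every subscriber."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, event):
        for callback in self._subscribers:
            callback(event)


class NewsRecorder:
    """Keeps every system event; a list stands in for the database."""

    def __init__(self):
        self.stored = []

    def on_event(self, event):
        self.stored.append(event)


class EmailDistributor:
    """Forwards newsgroup messages to the members of the target group."""

    def __init__(self, members):
        self.members = members  # group name -> list of e-mail addresses
        self.outbox = []        # (address, body) pairs instead of real mail

    def on_event(self, event):
        if event.get("kind") != "newsgroup-message":
            return  # other system events are ignored
        for address in self.members.get(event["group"], []):
            self.outbox.append((address, event["body"]))
```

Neither subscriber knows about the other or about the publishers, which is the property that lets modules be added or moved independently.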

Recorded events or user messages can be reviewed as a list of resources in Rich Site Summary format [12], which is meant to publish recent content changes of a Web site in an XML format to remote subscribers. RSS contains a topic, description, and items with link titles and addresses to the related resources. The News Feeder module provides such RSS feeds to any other parts of the system. Any resource chosen from the items list can be retrieved from the database by using its address information through the Feeder.
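A feed of the kind described above might be assembled as in the following Python sketch. A simplified RSS 0.91-style layout is assumed (the exact RSS version and elements used by the News Feeder are not stated), and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_feed(topic, description, messages):
    """Build a small RSS document from (title, address) pairs."""
    rss = ET.Element("rss", version="0.91")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = topic
    ET.SubElement(channel, "description").text = description
    for title, address in messages:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        # The address is what a reader later uses to request the full message.
        ET.SubElement(item, "link").text = address
    return ET.tostring(rss, encoding="unicode")
```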

The front-end of the newsgroups architecture is the News Reader. This module interacts with users and retrieves RSS feeds for selected groups or news messages in XML if a particular message is selected to display. The Reader uses XSLT [13] transformers to generate HTML content for the portal from XML messages.


4 Hybrid Search Prototype

We made a test prototype of the hybrid search for XML documents with attached external files. An XML document can have link tags, which designate external documents. The external documents may be Microsoft Word files, Microsoft PowerPoint files, PDF documents, or PostScript files. Hybrid searches allow us to simultaneously search semi-structured documents and structured metadata about those documents. For this, we are incorporating the Oracle database and searching tools [14, 15, 16].

We cannot know specific information about a paper document without an XML instance that carries its meta-data: title, authors, affiliations, source, and publication year. The meta-data provides not only this information but also a performance improvement, because it narrows the group of documents to be searched with the context index. However, a large number of XML documents may reduce the performance of extracting particular nodes from XMLType columns in the Oracle database tables. In that case, we can consider mapping the frequently accessed nodes of the XML instances to ordinary indexed columns. The DocType column has a filtering option: BINARY documents will be converted to plain text documents through the filter, and TEXT documents do not need to be filtered when indexed. The Entity-Relationship (E-R) diagram for our prototype is in Figure 2.

The descriptions attribute of the Papers entity set represents the XML instances with the XMLType object type, and PaperND (Paper Name and Directory) is a key field with a unique value of Name and Directory in the Papers table. The Contents attribute of the PaperFiles entity set has the BFILE large object type. The actual file will be stored in the designated subdirectory outside of the database. Any additional update, insert, or delete will change the status of the index, and the new information will be reflected in the index using the synchronize package included in the Oracle 9i database system. The FileLocator table shows the relationship between the two data tables: Papers and PaperFiles.

Figure 2: E-R diagram of the test prototype

Users interact with the search tool through a web interface. User queries are handled by a Java servlet that makes a SQL query from the parameters passed from the user search Web page and gets the result set from the database tables by querying through a JDBC connection.
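The step of turning search form parameters into a SQL query can be sketched as follows. This is an illustration under stated assumptions, not the prototype's servlet: it uses Python with an in-memory SQLite database instead of Java, JDBC and Oracle, a LIKE match stands in for the Oracle Text index, the metadata is held in ordinary columns rather than XMLType, and all column names and sample rows are hypothetical.

```python
import sqlite3

# Search form field -> SQL condition. Only these fields are accepted, and
# values are always bound as parameters, never pasted into the SQL string.
CONDITIONS = {
    "title": "p.title LIKE ?",
    "author": "p.authors LIKE ?",
    "year": "p.year = ?",
    "text": "f.contents LIKE ?",  # stand-in for a full-text index lookup
}


def build_query(params):
    """Combine metadata and content conditions into one parameterized query."""
    clauses, args = [], []
    for field, value in params.items():
        if field not in CONDITIONS or not value:
            continue  # ignore unknown or empty form fields
        clauses.append(CONDITIONS[field])
        args.append(int(value) if field == "year" else f"%{value}%")
    sql = (
        "SELECT DISTINCT p.paper_nd FROM papers p "
        "JOIN file_locator l ON l.paper_nd = p.paper_nd "
        "JOIN paper_files f ON f.file_id = l.file_id"
    )
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql + " ORDER BY p.paper_nd", args


def demo_database():
    """Small in-memory database with made-up rows, mirroring the three tables."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE papers (paper_nd TEXT PRIMARY KEY, title TEXT, authors TEXT, year INTEGER);
        CREATE TABLE paper_files (file_id INTEGER PRIMARY KEY, contents TEXT);
        CREATE TABLE file_locator (paper_nd TEXT, file_id INTEGER);
        INSERT INTO papers VALUES ('okc/portal', 'Building a Portal', 'Author A', 2002);
        INSERT INTO papers VALUES ('okc/search', 'Hybrid Search', 'Author B', 2002);
        INSERT INTO paper_files VALUES (1, 'portlets and containers');
        INSERT INTO paper_files VALUES (2, 'indexing linked documents');
        INSERT INTO file_locator VALUES ('okc/portal', 1);
        INSERT INTO file_locator VALUES ('okc/search', 2);
    """)
    return db


def search(db, params):
    """Run the query built from the form parameters and return matching paper keys."""
    sql, args = build_query(params)
    return [row[0] for row in db.execute(sql, args)]
```

Combining a structured condition (such as the year) with a content condition in one WHERE clause is what makes the search hybrid: the metadata narrows the set of files whose text has to be examined.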

5 Summary and Future Work

In this paper we have described extensions to portlet capabilities that are needed to support entire remote areas within a single portlet. We have also described important example services: the newsgroup system will serve as a prototype for all of our dynamic content creation and management, and hybrid search capabilities can be applied to search online libraries. Important future work includes more sophisticated content management services, including page validity services that check HTML compliance and message-based authoring systems similar to the newsgroup system. We also plan to develop portlets that support multimedia content. Finally, we must also support WSRP-style web services for finding and binding to distributed portlets. This will require support for distributed events within a web services framework, which will be an important area for development.

6 Acknowledgements

The Online Knowledge Center is funded by the US Department of Defense's High Performance Computing Modernization Program through the Programming Environment and Training initiative. We gratefully acknowledge their support.

References

[1] Web Services for Remote Portals: http://www.oasis-open.org/committees/wsrp/.

[2] Jetspeed Overview: http://jakarta.apache.org/jetspeed/site/index.html.

[3] The Jakarta Slide Project: http://jakarta.apache.org/slide.

[4] WebDAV Resources: http://www.webdav.org.

[5] The Apache Jakarta Project home page: http://jakarta.apache.org.

[6] Jakarta Turbine Web Site: http://jakarta.apache.org/turbine.

[7] Java Message Service API 1.0.2: http://java.sun.com/products/jms.

[8] Narada Event Brokering System: http://grids.ucs.indiana.edu/ptliupages/projects/narada.

[9] Simple Object Access Protocol (SOAP) 1.1: http://www.w3c.org/TR/SOAP.

[10] JavaServer Pages Technology: http://java.sun.com/products/jsp.

[11] Garnet XML Object Specification: http://aspen.ucs.indiana.edu/project/gxos/.

[12] RDF Site Summary Specification: http://groups.yahoo.com/group/rssdev/files/specification.html.

[13] Extensible Stylesheet Language Transformations: http://www.w3c.org/Style/XSL.

[14] Oracle Corporation, Oracle 9i Application Developer's Guide - XML, June 2001.

[15] Oracle Corporation, Oracle Text Application Developer's Guide Release 9.0.1, June 2001.

[16] Oracle Corporation, Oracle 9i New Features Summary, Technical white paper. http://www.orcle.com/xml/documents, October 2000.