1 Introduction to Stanford DB Group Research Li Ruixuan public.wh.hb.cn.
![Page 1: 1 Introduction to Stanford DB Group Research Li Ruixuan public.wh.hb.cn.](https://reader038.fdocuments.net/reader038/viewer/2022110209/56649e555503460f94b4d152/html5/thumbnails/1.jpg)
1
Introduction to Stanford DB Group Research
Li Ruixuan
http://cs.hust.edu.cn/rxli/
[email protected]@public.wh.hb.cn
![Page 2: 1 Introduction to Stanford DB Group Research Li Ruixuan public.wh.hb.cn.](https://reader038.fdocuments.net/reader038/viewer/2022110209/56649e555503460f94b4d152/html5/thumbnails/2.jpg)
2
Contents
Introduction
Past projects
Current projects
Events
References
Links
![Page 3: 1 Introduction to Stanford DB Group Research Li Ruixuan public.wh.hb.cn.](https://reader038.fdocuments.net/reader038/viewer/2022110209/56649e555503460f94b4d152/html5/thumbnails/3.jpg)
3
The Stanford Database Group
“Mainstream” faculty
– Hector Garcia-Molina
– Jennifer Widom
– Jeff Ullman
– Gio Wiederhold
“Adjunct” faculty
– Chris Manning (natural language processing)
– Rajeev Motwani (theory)
– Terry Winograd (human-computer interaction)
A.k.a. Stanford InfoLab
![Page 4: 1 Introduction to Stanford DB Group Research Li Ruixuan public.wh.hb.cn.](https://reader038.fdocuments.net/reader038/viewer/2022110209/56649e555503460f94b4d152/html5/thumbnails/4.jpg)
4
Database Group (cont’d)
Approximately 25 Ph.D. students
Varying numbers of M.S. and undergraduate students
Handful of visitors
One senior research associate
One systems administrator, one programmer
Excellent administrative staff
Resident photographer
![Page 5: 1 Introduction to Stanford DB Group Research Li Ruixuan public.wh.hb.cn.](https://reader038.fdocuments.net/reader038/viewer/2022110209/56649e555503460f94b4d152/html5/thumbnails/5.jpg)
5
Research Areas (very coarse)
Digital libraries
Peer-to-peer systems
Data streams
Replication, caching, archiving, broadcast, …
The Web
Ontologies, semantic Web
Data mining
Miscellaneous
![Page 6: 1 Introduction to Stanford DB Group Research Li Ruixuan public.wh.hb.cn.](https://reader038.fdocuments.net/reader038/viewer/2022110209/56649e555503460f94b4d152/html5/thumbnails/6.jpg)
6
Past Projects
LIC: Large-Scale Interoperation and Composition (1999) - mediator (SKC, OntoWeb, CHAIMS, SimQL, image DB)
SKC: Scalable Knowledge Composition (2000) - semantic heterogeneity
TID: Trusted Image Distribution (2001) - Image Filtering for Secure Distribution of Medical Information
Image Database: Content-based Image Retrieval (2003)
SimQL: Simulation Access Language (2001) - Software modules in manufacturing, acquisition, and planning systems
![Page 7: 1 Introduction to Stanford DB Group Research Li Ruixuan public.wh.hb.cn.](https://reader038.fdocuments.net/reader038/viewer/2022110209/56649e555503460f94b4d152/html5/thumbnails/7.jpg)
7
Past Projects (cont’d)
TSIMMIS: Wrapping and mediation for heterogeneous information sources (1998)
Lore: A Database Management System for XML (2000)
WHIPS: WareHouse Information Prototype at Stanford (1998) - Data warehouse creation and maintenance
MIDAS: Mining Data at Stanford (1999)
WSQ: Web-Supported Queries (2000) - Integrating database queries and Web searches
![Page 8: 1 Introduction to Stanford DB Group Research Li Ruixuan public.wh.hb.cn.](https://reader038.fdocuments.net/reader038/viewer/2022110209/56649e555503460f94b4d152/html5/thumbnails/8.jpg)
8
Current Projects
WebBase: Crawling, storage, indexing, and querying of large collections of Web pages. (Molina)
STREAM: A Database Management System for Data Streams (Widom)
Peers: Building primitives for peer-to-peer systems (Molina)
Digital Libraries: Interoperating on-line services for end-user support (TID, WebBase, OntoAgents) (Molina)
TRAPP: Approximate data caching: trading precision for performance (Widom)
CHAIMS: Compiling High-level Access Interfaces for Multi-site Software (1999) (Wiederhold)
OntoAgents: Ontology based Infrastructure for Agents (2002) (Wiederhold)
![Page 9: 1 Introduction to Stanford DB Group Research Li Ruixuan public.wh.hb.cn.](https://reader038.fdocuments.net/reader038/viewer/2022110209/56649e555503460f94b4d152/html5/thumbnails/9.jpg)
9
WebBase: Objectives
Provide a storage infrastructure for Web-like content
Store a sizeable portion of the Web
Enable researchers to easily build indexes of page features across large sets of pages
Distribute WebBase content via multicast channels
Support structure and content-based querying over the stored collection
![Page 10: 1 Introduction to Stanford DB Group Research Li Ruixuan public.wh.hb.cn.](https://reader038.fdocuments.net/reader038/viewer/2022110209/56649e555503460f94b4d152/html5/thumbnails/10.jpg)
10
WebBase: Architecture
[Figure: WebBase architecture. Labeled components: WWW, Crawler, Page Repository, Indexing Module, Analysis Module, Feature Repository, Retrieval Indexes, Index API, WebBase API, Query Engine, Multicast Module, Indexing Client, Clients]
![Page 11: 1 Introduction to Stanford DB Group Research Li Ruixuan public.wh.hb.cn.](https://reader038.fdocuments.net/reader038/viewer/2022110209/56649e555503460f94b4d152/html5/thumbnails/11.jpg)
11
WebBase: Current Status
Efficient “smart” crawler
– Parallelism
– Freshness & Relevance
Efficient and scalable indexing
– Distributed Web-scale content indexes
– Indexes over graph structure
Unicast dissemination
– Within Stanford
– External clients: Columbia, U.Wash, U.C.Berkeley
![Page 12: 1 Introduction to Stanford DB Group Research Li Ruixuan public.wh.hb.cn.](https://reader038.fdocuments.net/reader038/viewer/2022110209/56649e555503460f94b4d152/html5/thumbnails/12.jpg)
12
WebBase: In Progress
WebBase Infrastructure
– Multicast dissemination
– Complex queries
Other work
– PageRank extensions
– Clustering and similarity search
– Structured data extraction
– Hidden Web crawling
![Page 13: 1 Introduction to Stanford DB Group Research Li Ruixuan public.wh.hb.cn.](https://reader038.fdocuments.net/reader038/viewer/2022110209/56649e555503460f94b4d152/html5/thumbnails/13.jpg)
13
Data Streams: Motivation
Traditional DBMS -- data stored in finite, persistent data sets
New applications -- data as multiple, continuous, rapid, time-varying data streams
– Network monitoring and traffic engineering
– Security applications
– Telecom call records
– Financial applications
– Web logs and click-streams
– Sensor networks
– Manufacturing processes
![Page 14: 1 Introduction to Stanford DB Group Research Li Ruixuan public.wh.hb.cn.](https://reader038.fdocuments.net/reader038/viewer/2022110209/56649e555503460f94b4d152/html5/thumbnails/14.jpg)
14
STREAM: Architecture
[Figure: STREAM architecture. Labels: Input streams, Register Query, DSMS, Scratch Store, Streamed Result, Stored Result, Archive, Stored Relations]
![Page 15: 1 Introduction to Stanford DB Group Research Li Ruixuan public.wh.hb.cn.](https://reader038.fdocuments.net/reader038/viewer/2022110209/56649e555503460f94b4d152/html5/thumbnails/15.jpg)
15
STREAM: Challenges
Multiple, continuous, rapid, time-varying streams of data
Queries may be continuous (not just one-time)
– Evaluated continuously as stream data arrives
– Answer updated over time
Queries may be complex
– Beyond element-at-a-time processing
– Beyond stream-at-a-time processing
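To make the idea of a continuous query concrete, here is a minimal Python sketch. It is not STREAM code or STREAM's query language; the class name `SlidingAverage`, the method `on_arrival`, and the count-based window are assumptions chosen for illustration. The query is registered once, is re-evaluated on every arriving element, and its answer changes over time.

```python
from collections import deque


class SlidingAverage:
    """Hypothetical continuous query: average of the last `window` elements."""

    def __init__(self, window):
        self.window = window
        self.items = deque()  # only the current window is kept in memory
        self.total = 0.0

    def on_arrival(self, value):
        """Called for each new stream element; returns the updated answer."""
        self.items.append(value)
        self.total += value
        if len(self.items) > self.window:
            # Evict the oldest element so memory stays bounded.
            self.total -= self.items.popleft()
        return self.total / len(self.items)


query = SlidingAverage(window=3)
answers = [query.on_arrival(v) for v in [10, 20, 30, 40]]
print(answers)
```

Keeping only the window, rather than the whole stream, is one simple way to respect the bounded-memory constraint that distinguishes a DSMS from a DBMS.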
![Page 16: 1 Introduction to Stanford DB Group Research Li Ruixuan public.wh.hb.cn.](https://reader038.fdocuments.net/reader038/viewer/2022110209/56649e555503460f94b4d152/html5/thumbnails/16.jpg)
16
DBMS versus DSMS
DBMS:
– Persistent relations
– One-time queries
– Random access
– Access plan determined by query processor and physical DB design
– “Unbounded” disk store
DSMS:
– Transient streams (and persistent relations)
– Continuous queries
– Sequential access
– Unpredictable data arrival and characteristics
– Bounded main memory
![Page 17: 1 Introduction to Stanford DB Group Research Li Ruixuan public.wh.hb.cn.](https://reader038.fdocuments.net/reader038/viewer/2022110209/56649e555503460f94b4d152/html5/thumbnails/17.jpg)
17
STREAM: Current Status
Data streams and stored relations
Declarative language for registering continuous queries
Flexible query plans
Designed to cope with high data rates and query workloads
– Graceful approximation when needed
– Careful resource allocation and usage
Relational, centralized (for now)
![Page 18: 1 Introduction to Stanford DB Group Research Li Ruixuan public.wh.hb.cn.](https://reader038.fdocuments.net/reader038/viewer/2022110209/56649e555503460f94b4d152/html5/thumbnails/18.jpg)
18
STREAM: Ongoing Work
Algebra for streams
Semantics for continuous queries
Synopses and algorithmic issues
Memory management issues
Exploiting constraints on streams
Approximation in query processing
Distributed stream processing
System development
![Page 19: 1 Introduction to Stanford DB Group Research Li Ruixuan public.wh.hb.cn.](https://reader038.fdocuments.net/reader038/viewer/2022110209/56649e555503460f94b4d152/html5/thumbnails/19.jpg)
19
STREAM: Related Work
Amazon/Cougar (Cornell) – sensors
Aurora (Brown/MIT) – sensor monitoring, dataflow
Hancock (AT&T) – telecom streams
Niagara (OGI/Wisconsin) – Internet XML databases
OpenCQ (Georgia) – triggers, incr. view maintenance
Stream (Stanford) – general-purpose DSMS
Tapestry (Xerox) – pub/sub content-based filtering
Telegraph (Berkeley) – adaptive engine for sensors
Tribeca (Bellcore) – network monitoring
![Page 20: 1 Introduction to Stanford DB Group Research Li Ruixuan public.wh.hb.cn.](https://reader038.fdocuments.net/reader038/viewer/2022110209/56649e555503460f94b4d152/html5/thumbnails/20.jpg)
20
Peer-To-Peer Systems
Multiple sites (at edge)
Distributed resources
Sites are autonomous (different owners)
Sites are both clients and servers
Sites have equal functionality
![Page 21: 1 Introduction to Stanford DB Group Research Li Ruixuan public.wh.hb.cn.](https://reader038.fdocuments.net/reader038/viewer/2022110209/56649e555503460f94b4d152/html5/thumbnails/21.jpg)
21
P2P Benefits
Pooling available (inexpensive) resources
High availability and fault-tolerance
Self-organization
![Page 22: 1 Introduction to Stanford DB Group Research Li Ruixuan public.wh.hb.cn.](https://reader038.fdocuments.net/reader038/viewer/2022110209/56649e555503460f94b4d152/html5/thumbnails/22.jpg)
22
P2P Challenges
Search
– Query Expressiveness
– Comprehensiveness
– Topology
– Data Placement
– Message Routing
Resource Management
– Fairness
– Load balancing
Security & Privacy
– Anonymity
– Reputation
– Accountability
– Information Preservation
– Information Quality
– Trust
– Denial of service attacks
![Page 23: 1 Introduction to Stanford DB Group Research Li Ruixuan public.wh.hb.cn.](https://reader038.fdocuments.net/reader038/viewer/2022110209/56649e555503460f94b4d152/html5/thumbnails/23.jpg)
23
Peers: Stanford Research
New Architectures
Performance Modeling and Optimization
Security and Trust
Distributed Resource Management
Applications
![Page 24: 1 Introduction to Stanford DB Group Research Li Ruixuan public.wh.hb.cn.](https://reader038.fdocuments.net/reader038/viewer/2022110209/56649e555503460f94b4d152/html5/thumbnails/24.jpg)
24
Digital Library Project: Overview
[Figure: Digital Library overview. Labels: Internet Libraries, Payment Institutions, Search Agents, User Interfaces and Annotations, Commercial Information Brokers & Providers, Copyright Services, Query/Data Conversion; protocols shown: HTTP, Z39.50, Telnet]
![Page 25: 1 Introduction to Stanford DB Group Research Li Ruixuan public.wh.hb.cn.](https://reader038.fdocuments.net/reader038/viewer/2022110209/56649e555503460f94b4d152/html5/thumbnails/25.jpg)
25
DigLib Projects: DLI1, DLI2
Resource Discovery
Retrieving Information
Interpreting Information
Managing Information
Sharing Information
![Page 26: 1 Introduction to Stanford DB Group Research Li Ruixuan public.wh.hb.cn.](https://reader038.fdocuments.net/reader038/viewer/2022110209/56649e555503460f94b4d152/html5/thumbnails/26.jpg)
26
DigLib: Resource Discovery
Geographic Views (Tools to assist you in more systematically locating different types of information from a large and diverse number of information sources)
![Page 27: 1 Introduction to Stanford DB Group Research Li Ruixuan public.wh.hb.cn.](https://reader038.fdocuments.net/reader038/viewer/2022110209/56649e555503460f94b4d152/html5/thumbnails/27.jpg)
27
DigLib: Retrieving Information
Information Tiling
PalmPilot Infrastructure (PDA)
Power Browsing (PDA)
Query Translator
SDLIP (Simple Digital Library Interoperability Protocol)
Value Filtering
WebBase
![Page 28: 1 Introduction to Stanford DB Group Research Li Ruixuan public.wh.hb.cn.](https://reader038.fdocuments.net/reader038/viewer/2022110209/56649e555503460f94b4d152/html5/thumbnails/28.jpg)
28
DigLib: Interpreting Information
Murals (Tools to help a user interpret and organize search results)
Web Clustering
![Page 29: 1 Introduction to Stanford DB Group Research Li Ruixuan public.wh.hb.cn.](https://reader038.fdocuments.net/reader038/viewer/2022110209/56649e555503460f94b4d152/html5/thumbnails/29.jpg)
29
DigLib: Managing Information
Archival Repositories
Archiving Movie
InterBib (a tool for maintaining bibliographic information)
Medical Transport Info
PhotoBrowser
![Page 30: 1 Introduction to Stanford DB Group Research Li Ruixuan public.wh.hb.cn.](https://reader038.fdocuments.net/reader038/viewer/2022110209/56649e555503460f94b4d152/html5/thumbnails/30.jpg)
30
DigLib: Sharing Information
Diet ORB (PDA, based on MICO)
Digital Wallets
Mobile Info Delivery
Mobile Security
Multicasting
![Page 31: 1 Introduction to Stanford DB Group Research Li Ruixuan public.wh.hb.cn.](https://reader038.fdocuments.net/reader038/viewer/2022110209/56649e555503460f94b4d152/html5/thumbnails/31.jpg)
31
DLI1 Projects (95-99)
AHA
ComMentor
DLITE
Google
GLOSS
FAB
Grassroots
Metadata Architecture
RManage/FIRM
SenseMaker
SCAM
Shopping Models, U-PAI
SONIA
STARTS
WebWriter
![Page 32: 1 Introduction to Stanford DB Group Research Li Ruixuan public.wh.hb.cn.](https://reader038.fdocuments.net/reader038/viewer/2022110209/56649e555503460f94b4d152/html5/thumbnails/32.jpg)
32
TRAPP: Overview
TRAPP: Tradeoff in Replication Precision and Performance
A.k.a.: Approximate Data Caching
Project goal: investigating techniques to permit controlled and explicit relaxation of data precision in exchange for improved performance
![Page 33: 1 Introduction to Stanford DB Group Research Li Ruixuan public.wh.hb.cn.](https://reader038.fdocuments.net/reader038/viewer/2022110209/56649e555503460f94b4d152/html5/thumbnails/33.jpg)
33
TRAPP: Motivation
Transactional consistency too expensive
Even nontransactional propagation of every update still too expensive in many cases
Solution: Approximate Caching
– Exploit the fact that many applications do not require exact consistency
– Avoid propagating insignificant updates
– Trade cache precision for network load
![Page 34: 1 Introduction to Stanford DB Group Research Li Ruixuan public.wh.hb.cn.](https://reader038.fdocuments.net/reader038/viewer/2022110209/56649e555503460f94b4d152/html5/thumbnails/34.jpg)
34
Example: TRAPP Over Numeric Data
Caches store intervals that bound the exact source values
Sources refresh when value leaves interval
Query answers are intervals
Precision constraints specify maximum width
[Figure: the cache holds the intervals [2, 5] and [-1, 0.8]; the two sources hold the exact values 3.9 and 0.2 and send refreshes to the cache]
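The refresh rule on this slide can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the TRAPP implementation: the `Source` class, its `update` method, and the fixed `half_width` used to re-center the interval are all hypothetical (the TRAPP papers adjust bound widths adaptively).

```python
class Source:
    """Hypothetical source that owns an exact value and its cached bound."""

    def __init__(self, value, low, high):
        self.value = value
        self.low, self.high = low, high
        self.refreshes = 0  # number of messages sent to the cache

    def update(self, new_value, cache, key, half_width=1.5):
        """Apply an update; refresh the cache only if the bound is violated."""
        self.value = new_value
        if not (self.low <= new_value <= self.high):
            # Assumed policy: re-center a fixed-width interval on the new value.
            self.low = new_value - half_width
            self.high = new_value + half_width
            cache[key] = (self.low, self.high)
            self.refreshes += 1


cache = {"X": (2.0, 5.0)}
x = Source(3.9, 2.0, 5.0)
x.update(4.5, cache, "X")  # still inside [2, 5]: nothing is sent
x.update(6.0, cache, "X")  # leaves the interval: one refresh is sent
print(cache["X"], x.refreshes)
```

Updates that stay inside the cached interval cost no network traffic, which is the precision-for-performance trade the project name refers to.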
![Page 35: 1 Introduction to Stanford DB Group Research Li Ruixuan public.wh.hb.cn.](https://reader038.fdocuments.net/reader038/viewer/2022110209/56649e555503460f94b4d152/html5/thumbnails/35.jpg)
35
Example (cont’d): Querying in TRAPP
For one-time aggregation queries:
– Answers computed by combining approximate cached data and exact source data
– At query time: find a low-cost subset of sources to probe so the final answer will have adequate precision
– Algorithm determined by aggregation function
» Some easy, some hard
[Figure: the cache holds [2, 5] for X and [-1, 0.8] for Y; sources X and Y hold the exact values 3.9 and 0.2. Query: X + Y (within 2). After a probe, Answer: [2.9, 4.7]]
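The slide's numbers can be reproduced with a short Python sketch. It assumes a SUM query, equal probe cost for every source, and a greedy widest-interval-first choice of what to probe; the function name `answer_sum` and these choices are illustrative assumptions, not the algorithm from the TRAPP papers.

```python
def answer_sum(cache, sources, max_width):
    """Answer SUM over cached intervals, probing sources until the
    answer interval is no wider than max_width."""
    bounds = dict(cache)  # key -> (low, high)
    probed = []

    def width():
        return sum(high - low for low, high in bounds.values())

    while width() > max_width and len(probed) < len(bounds):
        # Assumed heuristic: probe the source with the widest cached interval.
        key = max(bounds, key=lambda k: bounds[k][1] - bounds[k][0])
        exact = sources[key]
        bounds[key] = (exact, exact)  # a probed value has zero width
        probed.append(key)

    low = sum(b[0] for b in bounds.values())
    high = sum(b[1] for b in bounds.values())
    return (low, high), probed


cache = {"X": (2.0, 5.0), "Y": (-1.0, 0.8)}
sources = {"X": 3.9, "Y": 0.2}
(low, high), probed = answer_sum(cache, sources, max_width=2.0)
print(probed, round(low, 6), round(high, 6))
```

With the cached bounds alone the sum lies in [1, 5.8], which is wider than the allowed 2. Probing X replaces [2, 5] with 3.9, giving 3.9 + [-1, 0.8] = [2.9, 4.7], the answer shown on the slide.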
![Page 36: 1 Introduction to Stanford DB Group Research Li Ruixuan public.wh.hb.cn.](https://reader038.fdocuments.net/reader038/viewer/2022110209/56649e555503460f94b4d152/html5/thumbnails/36.jpg)
36
TRAPP: Approximate Caching
Two common scenarios:
• Minimize bandwidth usage, precision fixed
» TRAPP: caches store bounds as approximations
» Queries select combination of cached & source data
» Adaptive bound adjustment for good precision level
• Bandwidth fixed, maximize precision
» Best-Effort Synchronization: caches store stale copies
» Refreshing based on priority scheduling
» Global priority order via threshold
» Adaptive threshold setting for flow control
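The second scenario (fixed bandwidth, maximize precision) can be sketched as a refresh round with a message budget. This is a simplified illustration, not the published Best-Effort Synchronization algorithm: the priority function (absolute divergence between the stale copy and the current value), the function name `refresh_round`, and the centralized view of all sources are assumptions; the slide's threshold mechanism for a global order is not modeled here.

```python
import heapq


def refresh_round(cached, current, budget):
    """Refresh the `budget` most divergent cached copies in one round."""
    # Assumed priority: how far the stale copy has drifted from the source.
    priority = {key: abs(current[key] - cached[key]) for key in cached}
    chosen = heapq.nlargest(budget, priority, key=priority.get)
    for key in chosen:
        cached[key] = current[key]
    return chosen


cached = {"a": 10, "b": 20, "c": 30}
current = {"a": 11, "b": 27, "c": 33}
chosen = refresh_round(cached, current, budget=2)
print(chosen, cached)
```

With a budget of two messages, the copies of `b` and `c` are refreshed and `a` stays slightly stale, so the limited bandwidth is spent where it improves precision most.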
![Page 37: 1 Introduction to Stanford DB Group Research Li Ruixuan public.wh.hb.cn.](https://reader038.fdocuments.net/reader038/viewer/2022110209/56649e555503460f94b4d152/html5/thumbnails/37.jpg)
37
TRAPP: Status
Past work: focused on an approximate data caching architecture that permits fine-grained control of the precision-performance tradeoff for numerical data in data caching environments.
Current work: applying the above techniques and others to more complex data such as Web pages.
![Page 38: 1 Introduction to Stanford DB Group Research Li Ruixuan public.wh.hb.cn.](https://reader038.fdocuments.net/reader038/viewer/2022110209/56649e555503460f94b4d152/html5/thumbnails/38.jpg)
38
CHAIMS: Overview
CHAIMS: Compiling High-level Access Interfaces for Multi-site Software
Objective: Investigate revolutionary approaches to large-scale software composition.
Approach: Develop and validate a composition-only language, a protocol for large, distributed, heterogeneous and autonomous megamodules, and a supporting system.
Planned contributions:
– Asynchrony by splitting up CALL-statement.
– Hardware and software platform independence.
– Potential for multi-site dataflow optimization.
– Performance optimization by invocation scheduling.
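The first planned contribution, asynchrony by splitting up the CALL statement, can be illustrated with Python's standard `concurrent.futures` module. This is an analogy, not CHAIMS syntax: the functions `route_cost` and `fuel_cost` stand in for remote megamodules and are hypothetical. The point is that starting an invocation and collecting its result are separate steps, so several invocations can overlap.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def route_cost(distance_km):
    """Stand-in for a slow remote module."""
    time.sleep(0.05)
    return distance_km * 2


def fuel_cost(distance_km):
    """Stand-in for a second, independent remote module."""
    time.sleep(0.05)
    return distance_km * 3


with ThreadPoolExecutor(max_workers=2) as pool:
    # Step 1: start both invocations without waiting for either.
    route_handle = pool.submit(route_cost, 100)
    fuel_handle = pool.submit(fuel_cost, 100)
    # Step 2: collect the results only when they are needed.
    total = route_handle.result() + fuel_handle.result()

print(total)
```

A single blocking call per module would run the two delays back to back; splitting invocation from result extraction lets them run concurrently, which is also what makes invocation scheduling possible.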
![Page 39: 1 Introduction to Stanford DB Group Research Li Ruixuan public.wh.hb.cn.](https://reader038.fdocuments.net/reader038/viewer/2022110209/56649e555503460f94b4d152/html5/thumbnails/39.jpg)
39
CHAIMS: Overview
[Figure: a megaprogram for composition, written by a domain programmer; the CHAIMS system automates generation of a client for the distributed system; megamodules are provided by various megamodule providers]
![Page 40: 1 Introduction to Stanford DB Group Research Li Ruixuan public.wh.hb.cn.](https://reader038.fdocuments.net/reader038/viewer/2022110209/56649e555503460f94b4d152/html5/thumbnails/40.jpg)
40
CHAIMS: Architecture
[Figure: CHAIMS architecture. Labels: Megaprogrammer writes the Megaprogram (in CHAIMS language); the CHAIMS Compiler generates the CSRT (compiled megaprogram); Distribution System (CORBA, RMI…); megamodules a, b, c, d, e; CHAIMS Repository; the Megamodule Provider adds information to the repository and wraps non-CHAIMS compliant megamodules; Wrapper Templates]
![Page 41: 1 Introduction to Stanford DB Group Research Li Ruixuan public.wh.hb.cn.](https://reader038.fdocuments.net/reader038/viewer/2022110209/56649e555503460f94b4d152/html5/thumbnails/41.jpg)
41
OntoAgents: Objective
OntoAgents goal: establish an agent infrastructure on the WWW or WWW-like networks
Such an agent infrastructure requires an information food chain: every part of the food chain provides information, which enables the existence of the next part.
![Page 42: 1 Introduction to Stanford DB Group Research Li Ruixuan public.wh.hb.cn.](https://reader038.fdocuments.net/reader038/viewer/2022110209/56649e555503460f94b4d152/html5/thumbnails/42.jpg)
42
OntoAgents: Architecture
[Figure: OntoAgents architecture. Labels: Ontology Construction Tool, Ontology Articulation Toolkit, Annotated Webpages, Webpage Annotation Tool, Ontologies, Agents, Metadata Repository, Inference Engine, Community Portal, End User]
![Page 43: 1 Introduction to Stanford DB Group Research Li Ruixuan public.wh.hb.cn.](https://reader038.fdocuments.net/reader038/viewer/2022110209/56649e555503460f94b4d152/html5/thumbnails/43.jpg)
43
Events: DB Seminars

| Academic Year | Fall | Winter | Spring |
|---|---|---|---|
| 2002/2003 | no seminar in the fall quarter | Database Seminar (CS545) | Genome Databases (CS545G) |
| 2001/2002 | Past, Present, and Future of Database Technology | Genome Databases | Database Seminar to come |
| 2000/2001 | Interoperation, Databases and the Semantic Web | Image Databases | Databases and the Semantic Web |
| 1999/2000 | Ontologies, E-Commerce, XML & Metadata | n/a | Ontologies, E-Commerce, XML & Metadata |
| 1998/1999 | Digital Libraries | Image Databases | Internet and Databases |
| 1997/1998 | Data Warehousing | Image Databases | Internet and Databases |
| 1996/1997 | Fall Quarter 96 | Image Databases | Spring Quarter 97 |
![Page 44: 1 Introduction to Stanford DB Group Research Li Ruixuan public.wh.hb.cn.](https://reader038.fdocuments.net/reader038/viewer/2022110209/56649e555503460f94b4d152/html5/thumbnails/44.jpg)
44
Events: Meetings
Stanford Computer Science Forum - Annual Affiliates Meeting, Stanford, May 2003.
SWiM (the Stream Winter Meeting): About 35 data stream researchers came together at Stanford for SWiM, Jan. 2003.
– Stream Team: A few data stream research groups held some informal get-togethers, 2002.
Conference Talks: ACM SIGMOD/PODS, VLDB, ICDT, ICDE, ICDCS, CIDR
![Page 45: 1 Introduction to Stanford DB Group Research Li Ruixuan public.wh.hb.cn.](https://reader038.fdocuments.net/reader038/viewer/2022110209/56649e555503460f94b4d152/html5/thumbnails/45.jpg)
45
References: WebBase
Junghoo Cho, Hector Garcia-Molina. "Parallel Crawlers," In Proceedings of the Eleventh World Wide Web Conference, May 2002.
Taher Haveliwala, Aristides Gionis, et al. "Evaluating Strategies for Similarity Search on the Web," Proceedings of the Eleventh International World Wide Web Conference, May 2002.
Taher Haveliwala. "Topic-Sensitive PageRank," Proceedings of the Eleventh International World Wide Web Conference, May 2002.
![Page 46: 1 Introduction to Stanford DB Group Research Li Ruixuan public.wh.hb.cn.](https://reader038.fdocuments.net/reader038/viewer/2022110209/56649e555503460f94b4d152/html5/thumbnails/46.jpg)
46
References: STREAM
R. Motwani, J. Widom, et al. Query Processing, Resource Management, and Approximation in a Data Stream Management System. In Proc. of the 2003 Conference on Innovative Data Systems Research (CIDR), January 2003.
A. Arasu, B. Babcock, et al. STREAM: The Stanford Stream Data Manager. In Proc. of the ACM Intl Conf. on Management of Data (SIGMOD 2003), June 2003.
B. Babcock, S. Babu, et al. Models and Issues in Data Stream Systems. Invited paper in Proc. of the 2002 ACM Symp. on Principles of Database Systems (PODS 2002), June 2002.
![Page 47: 1 Introduction to Stanford DB Group Research Li Ruixuan public.wh.hb.cn.](https://reader038.fdocuments.net/reader038/viewer/2022110209/56649e555503460f94b4d152/html5/thumbnails/47.jpg)
47
References: Peers
Neil Daswani, Hector Garcia-Molina and Beverly Yang. Open Problems in Data-Sharing Peer-to-Peer Systems. In ICDT, 2003.
Hector Garcia-Molina. Peer-To-Peer Data Management. Keynote, ICDE, 2002.
Hrishikesh Deshpande, Mayank Bawa, and Hector Garcia-Molina. Streaming Live Media over a Peer-to-Peer Network.
![Page 48: 1 Introduction to Stanford DB Group Research Li Ruixuan public.wh.hb.cn.](https://reader038.fdocuments.net/reader038/viewer/2022110209/56649e555503460f94b4d152/html5/thumbnails/48.jpg)
48
References: TRAPP
C. Olston and J. Widom. Best-Effort Cache Synchronization with Source Cooperation. ACM SIGMOD 2002 International Conference on Management of Data, Madison, Wisconsin, June 2002, pp. 73-84.
C. Olston, B. T. Loo and J. Widom. Adaptive Precision Setting for Cached Approximate Values. ACM SIGMOD 2001 International Conference on Management of Data, Santa Barbara, California, May 2001, pp. 355-366.
![Page 49: 1 Introduction to Stanford DB Group Research Li Ruixuan public.wh.hb.cn.](https://reader038.fdocuments.net/reader038/viewer/2022110209/56649e555503460f94b4d152/html5/thumbnails/49.jpg)
49
Useful Links
Database Group: http://www-db.stanford.edu/
STREAM: http://www-db.stanford.edu/stream/
Peers: http://www-db.stanford.edu/peers/
DigLib: http://www-diglib.stanford.edu/
TRAPP: http://www-db.stanford.edu/trapp/
WebBase: http://www-diglib.stanford.edu/~testbed/doc2/WebBase/