Search Computing Overview
Transcript of Search Computing Overview
Search Computing
Stefano Ceri, Keynote talk at CAISE, Hammamet, June 9, 2010
Joint work with: Adnan Abid, Mamoun Abu Helu, Davide Barbieri, Daniele Braga, Marco Brambilla, Alessandro Bozzon, Alessandro Campi, Sofia Ceppi, Francesco Corcoglioniti, Emanuele Della Valle, Davide Eynard, Piero Fraternali, Nicola Gatti, Giorgio Ghisalberghi, Michael Grossniklaus, Davide Martinenghi, Marco Masseroli, Maristella Matera, Chiara Pasini, Elena Pellizzotti, Stefania Ronchi, Marco Tagliasacchi, Luca Tettamanti, Salvatore Vadacca, Riccardo Volonterio, Serge Zagorac
Prof. Stefano Ceri, Database Management
Genesis of Search Computing
My “Gong Show” challenge at 2003 Lowell Workshop: “Find an ethnic restaurant in a nice place close to Milano”.
Logically a composition of domains:
– Restaurants (ethnic)
– Geo-locations (nice place close to Milano)
Composing maps with “geo-located” information is now solved by all search engines …
… but in general no system is capable of composing arbitrary semantic domains
Motivating Examples
“Who are the strongest candidates in Europe for competing on software ideas?”
“Who is the best doctor who can cure insomnia in a close-by hospital?”
“Where can I attend an interesting scientific conference in my field and at the same time relax on a beautiful beach nearby?”
Their Common Aspect
Multi-domain queries
Individual answers are on the Web
A knowledgeable user would do the query step by step:
– Search database conferences, get their city
– Check that the city average temperature is warm enough
– Search low-cost flights via a broker for that city
– Search luxury hotels via another broker
We want a system for supporting this search process:
– Build several “solutions” which already integrate all dimensions
– Rank “solutions” according to a global rank function and output results in rank order
– Support user-friendly query definition and result browsing
– Add search domains while the search proceeds
– Possibly change the relative weight of each ranking
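The ranking of combined "solutions" can be illustrated with a small sketch. It assumes that each solution carries one normalized score per domain and that the global rank function is a weighted sum; the talk does not fix a particular function, and all names and sample scores below are hypothetical.

```python
from typing import Dict, List, Tuple

# A solution is a label plus one score in [0, 1] per search domain (assumed shape).
Solution = Tuple[str, Dict[str, float]]

def global_rank(solutions: List[Solution],
                weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Order solutions by a normalized weighted sum of per-domain scores."""
    total = sum(weights.values())
    scored = [
        (label, sum(weights[d] * s for d, s in scores.items()) / total)
        for label, scores in solutions
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Invented sample solutions for illustration only.
solutions = [
    ("A", {"conference": 0.9, "flight": 0.2, "hotel": 0.5}),
    ("B", {"conference": 0.6, "flight": 0.9, "hotel": 0.7}),
]
equal = global_rank(solutions, {"conference": 1, "flight": 1, "hotel": 1})
conf_heavy = global_rank(solutions, {"conference": 8, "flight": 1, "hotel": 1})
```

Changing the relative weight of one ranking, as in the last two lines, is enough to reorder the same set of solutions.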
OVERALL FRAMEWORK
Search Computing architecture: overall view
[Architecture diagram. Components: Front End, Query Analysis, Query To Domain Mapper, Query Planner, Query Engine (operators OP 1, OP 2, ..., OP N), Result Transformation, Domain Framework, Domain Repository, WS-Framework, Service Repository, WS World, and Caches attached to several components. Artifacts along the main query flow: High-Level Query, Sub-queries, Concrete Query Plan, Low-level queries, Merged Results, Final User Results. Legend: main query flow, <Uses> relation.]
High-level query: “Where can I attend a DB scientific conference close to a beautiful beach reachable with cheap flights?”
Sub-query 1: “Where can I attend a DB scientific conference?”
Sub-query 2: “place close to a beautiful beach?”
Sub-query 3: “place reachable with cheap flight?”
Low-level query 1: ConfSearch(“DB”, PlaceX, DateY)
Low-level query 2: TourSearch(“Beach”, PlaceX)
Low-level query 3: Flight(“cost<200”, PlaceX, DateY)
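The shared variables PlaceX and DateY are what tie the three low-level queries together. The sketch below shows one way to read that, with values produced by the conference search fed into the other two calls. The service names follow the slide, but their bodies are stubs over invented sample data, not the real services.

```python
from typing import Iterator, List, Tuple

# Invented sample data standing in for the three remote services.
CONFERENCES = {"DB": [("place-A", "date-1"), ("place-B", "date-2")]}
BEACHES = {"place-A": ["beach-1"]}
FLIGHTS = {("place-A", "date-1"): [("carrier-1", 180)],
           ("place-B", "date-2"): [("carrier-2", 150)]}

def conf_search(topic: str) -> List[Tuple[str, str]]:
    """ConfSearch(topic, PlaceX, DateY): returns (place, date) bindings."""
    return list(CONFERENCES.get(topic, []))

def tour_search(kind: str, place: str) -> List[str]:
    """TourSearch(kind, PlaceX): this stub ignores `kind`."""
    return list(BEACHES.get(place, []))

def flight(max_cost: int, place: str, date: str) -> List[Tuple[str, int]]:
    """Flight(cost < max_cost, PlaceX, DateY)."""
    return [(c, p) for c, p in FLIGHTS.get((place, date), []) if p < max_cost]

def answer(topic: str, max_cost: int) -> Iterator[tuple]:
    # PlaceX and DateY bound by the first call constrain the other two.
    for place, date in conf_search(topic):
        for beach in tour_search("Beach", place):
            for carrier, price in flight(max_cost, place, date):
                yield (place, date, beach, carrier, price)

results = list(answer("DB", 200))
```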
Query plan: service invocations and operator execution
Results
Presented results:
ESWC - Crete - Olympic
CAISE - Hammamet - Alitalia
TOOLS - Malaga - EasyJet
Search Computing architecture: incremental prototyping
[Architecture diagram as in the overall view, with added labels: Admin Interface, Low-level queries, Sub-queries, Concrete Query Plan]
Prototype 1: Core behaviour of the system
• Engine-based execution of queries
• Domain repository
• Service repository
• Coarse result presentation
Prototype 2: Planning
• Automatic optimized query planning
Prototype 3: Mapping and presentation
• Mapping to domains
• Presentation of results
Prototype 4: High-level queries
CAISE FOCUS on: Service Registration
Service Marts:
• Conceptual representation of resources as entities and connections
• Logical representation of signatures
• Physical representation as service implementations
CAISE FOCUS on: Front-end
Liquid Query
Client-side framework for configuration and automatic rendering of query and result interfaces
User interaction primitives that support exploratory search
CAISE FOCUS on: Development Process
[Development process diagram. Roles: Final User, Expert User, Service Developer, Service Publisher, SeCo Expert. Artifacts and activities: Liquid Query, Liquid Result, Liquid Query Template, User Interface Specification, Search Services, SeCo platform, Wrapping, Materialization / Normalization, Registration of Service Mart, Service Mart Repository. Relations: <<uses>>, <<produces>>, <<defines>>, <<submits>>, <<manipulates>>, <<implements>>, <<deploys>>, <<performs>>. Phases: Deploy Time, Service Publishing Time, Config. Time, Execution Time.]
Development Support Environment
Tools supporting Service Registration
Query Design
Performance Monitoring
SERVICE REGISTRATION
Service Registration in SeCo
Objective: providing a framework for registering services as first-class citizens within SeCo
=> Service Marts
• High-level abstractions of “real world entities” that provide a simple interface to users and hide implementation details
• Inspired by Data Marts, a data modeling pattern used in data warehousing
• Each Service Mart can have multiple modalities of data access and can be mapped to multiple service implementations, possibly offered by different providers
=> Connection Patterns
• High-level abstractions of “real world relationships” that provide a simple interface to users and hide implementation details
• Built by means of attributes that share the same domains
Service Marts – Conceptual Level
Every SM definition includes a name and a collection of the exposed attributes, i.e. the attributes of the real world object described by the SM:
Movie(Title, Director, Year, Language, Genres(Genre), Actors(Name, Sex))
• Atomic, single-valued, typed attributes
• Repeating groups (multi-valued, typed attributes): each “repeating group” is a non-empty set of typed sub-attributes that collectively defines a property of the service mart
The model choices are:
• To support structural complexity with only one level of nesting (rather than an arbitrary level of nesting)
• To avoid explicit descriptions of relationships (using repeating groups for M:N relationships)
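As an illustration of the conceptual level, the Movie service mart can be written down as a record with atomic attributes and two repeating groups nested one level deep. This is only a sketch of the data shape, assuming Python dataclasses as the notation; the class names and the non-emptiness check are not part of SeCo.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Actor:
    """One entry of the Actors(Name, Sex) repeating group."""
    name: str
    sex: str

@dataclass
class Movie:
    """Movie(Title, Director, Year, Language, Genres(Genre), Actors(Name, Sex))."""
    title: str          # atomic, single-valued, typed attributes
    director: str
    year: int
    language: str
    genres: List[str]   # repeating group Genres(Genre)
    actors: List[Actor] # repeating group Actors(Name, Sex)

    def __post_init__(self) -> None:
        # The slide defines a repeating group as a non-empty set.
        if not self.genres or not self.actors:
            raise ValueError("repeating groups must be non-empty")

ben_hur = Movie("Ben Hur", "William Wyler", 1959, "English",
                ["Drama"], [Actor("Charlton Heston", "M")])
```

Because only one level of nesting is allowed, a sub-attribute such as `Actor.name` cannot itself be a repeating group.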
Service Marts – Logical Level
At this level, each SM is associated with one or more Access Patterns, e.g.:
Movie1(Title^O, Director^O, Score^RO, Year^O, Language^I, Genres.Genre^O, Actors.Name^O, Actors.Sex^O, Genres.Genre^I)
Movie2(Title^I, Director^O, Year^O, Language^O, Genres.Genre^O, Actors.Name^O, Actors.Sex^O)
Access patterns contain adorned attributes, i.e. attributes tagged with one of the following:
• I, if they are input attributes
• O, if they are output attributes
• R, if they are attributes used for ranking; they may or may not be visible in output
Movie1 accesses movies by Language and Genre (e.g., “action movies in English”) and results are ranked by Score (a new attribute).
Movie2 accesses movies by Title (e.g. “Ben Hur”). We expect few (zero, one, more) results, which are not ranked.
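The adornments lend themselves to a simple check of which access pattern can be invoked with the values a user has supplied. The sketch below encodes Movie1 and Movie2 as dictionaries from attribute to adornment string; this encoding and the helper names are assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Set

# Adornments from the slide: I = input, O = output, R = ranking.
# Genres.Genre is listed both as input and as output in Movie1, hence "IO".
MOVIE1: Dict[str, str] = {
    "Title": "O", "Director": "O", "Score": "RO", "Year": "O",
    "Language": "I", "Genres.Genre": "IO",
    "Actors.Name": "O", "Actors.Sex": "O",
}
MOVIE2: Dict[str, str] = {
    "Title": "I", "Director": "O", "Year": "O", "Language": "O",
    "Genres.Genre": "O", "Actors.Name": "O", "Actors.Sex": "O",
}

def inputs(pattern: Dict[str, str]) -> Set[str]:
    """Attributes that must be bound before the pattern can be called."""
    return {attr for attr, tag in pattern.items() if "I" in tag}

def ranking(pattern: Dict[str, str]) -> Set[str]:
    """Attributes used to rank the results, if any."""
    return {attr for attr, tag in pattern.items() if "R" in tag}

def usable(patterns: Dict[str, Dict[str, str]], bound: Iterable[str]) -> List[str]:
    """Names of the patterns whose inputs are all covered by `bound`."""
    have = set(bound)
    return sorted(name for name, p in patterns.items() if inputs(p) <= have)

PATTERNS = {"Movie1": MOVIE1, "Movie2": MOVIE2}
```

For example, a user who only knows a title can use Movie2 but not Movie1, while one who supplies a language and a genre can use Movie1 and gets ranked results.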
Service Marts – Physical Level
At this level, every Access Pattern can match different Service Implementations, having:
• Physical URI to be called
• Physical properties which are specific to the implementation
• Mapping between logical and physical parameters
IMDBMovie1(MovieTitle^O, Director^O, Stars^RO, Year^O, Language^I, Genres.Genre^I, Actors.Name^O, Actors.Gender^O)
IMDBMovie AP: Movie1
TTL=6000, chunksize=10, cacheable=true, exposed=false, ...
URI: http://...
Logical attributes: Title Director Score Year Language ...
Physical parameters: MovieTitle Director Stars Year Lang ...
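The mapping between logical and physical parameters amounts to a renaming applied in both directions: logical inputs are renamed before the call, and physical outputs are renamed back afterwards. A minimal sketch, assuming the five name pairs shown on the slide and hypothetical helper names:

```python
from typing import Any, Dict

# Logical attribute -> physical parameter, as listed for IMDBMovie on the slide.
LOGICAL_TO_PHYSICAL: Dict[str, str] = {
    "Title": "MovieTitle",
    "Director": "Director",
    "Score": "Stars",
    "Year": "Year",
    "Language": "Lang",
}
PHYSICAL_TO_LOGICAL = {p: l for l, p in LOGICAL_TO_PHYSICAL.items()}

def to_physical(logical_args: Dict[str, Any]) -> Dict[str, Any]:
    """Rename logical input attributes to the service's parameter names."""
    return {LOGICAL_TO_PHYSICAL[k]: v for k, v in logical_args.items()}

def to_logical(physical_row: Dict[str, Any]) -> Dict[str, Any]:
    """Rename one physical result row back to logical attribute names."""
    return {PHYSICAL_TO_LOGICAL[k]: v for k, v in physical_row.items()}

request = to_physical({"Language": "English"})
row = to_logical({"MovieTitle": "Ben Hur", "Stars": 8})  # invented score value
```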
External and Selector Attributes
• External attributes, for supporting access and ranking
• Selector attributes, for supporting choices among service implementations
External attributes (between SM and AP):
SM: Movie(Title, Director, Year, Language, …)
AP Movie1: Title^O | Director^O | Year^O | … | Score^RO | Genre^I
AP Movien: Title^O | Director^O | Year^O | … | Title^I
Selector attributes (between SM and SI):
SM: Movie(Title, Director, Year, Language, …)
Selector: Language
SI: Movie Implementation 1, ..., Movie Implementation n
Connection Patterns
Connections between marts only exist in terms of attributes that share the same domains, on different levels of abstraction:
• Conceptually, by an undirected edge with a name: PlayingMovie(Movie, Theatre)
• Logically, by an edge (possibly directed) with name and join condition: PlayingMovie(Movie, Theatre): (Title=Movie.Title)
[Diagram: service marts Movie and Theatre; access patterns Movie4 and Theatre2]
Connection Patterns – Logical Level
Directed edge: information is “piped” from one access pattern to another, along connection attributes which are in output in the first service and in input in the second service -> PIPE JOIN
[Diagram: Movie1 (Title, Director, Score, Year, A.Name, A.Sex, G.Genre, Language) piped into Theatre1 (Name, Address, M.Start, M.Title)]
Connection Patterns – Logical Level
Undirected edge: results are produced by both access patterns in output and then joined -> PARALLEL JOIN
[Diagram: Movie1 (Title, Director, Score, …, …, G.Genre) joined with Theatre1 (Name, Address, M.Start, M.Title)]
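The difference between the two kinds of edge can be sketched in a few lines. In the pipe join, each output of the first service becomes an input of a call to the second; in the parallel join, both services are called independently and their outputs are matched afterwards. The function names and sample rows are hypothetical, and ranking is left out to keep the contrast visible.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]

def pipe_join(first: Iterable[Row],
              second_service: Callable[[object], List[Row]],
              key: str) -> Iterator[Row]:
    """Directed edge: call the second service once per output of the first."""
    for left in first:
        for right in second_service(left[key]):
            yield {**left, **right}

def parallel_join(left_rows: Iterable[Row], right_rows: List[Row],
                  left_key: str, right_key: str) -> Iterator[Row]:
    """Undirected edge: both result lists exist already and are matched."""
    for left in left_rows:
        for right in right_rows:
            if left[left_key] == right[right_key]:
                yield {**left, **right}

# Invented sample rows for illustration only.
movies = [{"Title": "m1", "Score": 0.9}, {"Title": "m2", "Score": 0.5}]
theatres = [{"Name": "t1", "M.Title": "m1"}, {"Name": "t2", "M.Title": "m3"}]

def theatre_by_title(title: object) -> List[Row]:
    """Stub for an access pattern with M.Title in input."""
    return [t for t in theatres if t["M.Title"] == title]

piped = list(pipe_join(movies, theatre_by_title, "Title"))
parallel = list(parallel_join(movies, theatres, "Title", "M.Title"))
```

Both strategies produce the same combinations here; they differ in how many calls are made and in which service must accept the join attribute as input.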
Join of two Services, Pipe Version, NY City
[Diagram of the three levels. Service marts: Movie, Theatre. Access patterns: Movie1, Movie2, Theatre1. Service interfaces: IMDB1, IMDB2, Hyperrev1, Google1, NYLocalSearch. Annotation: Search only in NY.]
JOIN OF TWO SEARCH SERVICES
JOIN of Web Services
Input: items resulting from TWO web service calls, possibly ranked
Output: composed items resulting from the concatenation of matching items, presented in a “global ranking order”
Matching condition using:
– value equality
– partial set matching
– term matching within a vocabulary
– …
Services are known, their matching function is predefined: this is not service discovery!
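A join with a predefined matching function and a global ranking order might look like the sketch below. It assumes each service returns (item, score) pairs and that the global score is the sum of the two local scores; both choices, like the helper names, are illustrative assumptions rather than the SeCo definitions.

```python
from typing import Callable, Dict, List, Tuple

Item = Dict[str, object]
Ranked = List[Tuple[Item, float]]

def value_equality(attr: str) -> Callable[[Item, Item], bool]:
    """Match when both items carry the same value for `attr`."""
    return lambda x, y: x[attr] == y[attr]

def partial_set_match(attr: str, min_shared: int = 1) -> Callable[[Item, Item], bool]:
    """Match when the two set-valued attributes share enough elements."""
    return lambda x, y: len(set(x[attr]) & set(y[attr])) >= min_shared

def join_ranked(xs: Ranked, ys: Ranked,
                matches: Callable[[Item, Item], bool]) -> List[Tuple[Item, Item, float]]:
    """Concatenate matching items and order them by combined score."""
    combos = [(x, y, sx + sy) for x, sx in xs for y, sy in ys if matches(x, y)]
    return sorted(combos, key=lambda c: c[2], reverse=True)

# Invented sample results for illustration only.
xs = [({"city": "a", "tags": ["sea", "sun"]}, 0.9),
      ({"city": "b", "tags": ["snow"]}, 0.8)]
ys = [({"city": "b", "tags": ["snow", "ski"]}, 0.7),
      ({"city": "a", "tags": ["sun"]}, 0.3)]

by_city = join_ranked(xs, ys, value_equality("city"))
by_tags = join_ranked(xs, ys, partial_set_match("tags"))
```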
Join
[Diagram: Service X with bx1 … bx5, Service Y with by1 … by5, and join results r1, r2, r3]
Matching items
Choice of the join strategies
The join search space
– Different explorations for different join methods under different assumptions and with different guarantees
Any exploration trajectory for this space is a join strategy
[Diagram: join search space divided into tiles tij; labels: Chunk, Chunksize, Candidate join result]
Nested Loop - Rectangular
Merge scan - Triangular
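The two slide titles name two trajectories through the tile space. The sketch below gives one plausible reading, assuming tile (i, j) stands for the join of chunk i of the first service with chunk j of the second: the nested loop covers a rectangle row by row, while the merge scan advances along diagonals and so covers a triangle. The figures themselves are not in the transcript, so the exact visiting order is an assumption.

```python
from typing import List, Tuple

Tile = Tuple[int, int]

def nested_loop_tiles(n_x: int, n_y: int) -> List[Tile]:
    """Rectangular exploration: for each chunk of X, scan every chunk of Y."""
    return [(i, j) for i in range(n_x) for j in range(n_y)]

def merge_scan_tiles(n_diagonals: int) -> List[Tile]:
    """Triangular exploration: visit tiles diagonal by diagonal (i + j constant)."""
    return [(i, d - i) for d in range(n_diagonals) for i in range(d + 1)]

rectangular = nested_loop_tiles(2, 3)
triangular = merge_scan_tiles(3)
```

Under this reading, the diagonal order reaches combinations of highly ranked chunks from both services before any low-ranked chunk is touched.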
Parallel and Pipe Joins
Parallel join of two search services
Pipe join of two search services
[Plan diagrams over services S1 and S2. Annotations recovered from the figures: (1,10), 5, (0,1)n, period: 500 ms, size: 20, stop: 1, (1,2)n, period: 150 ms, C1, stop: 10, excess: (1,1)]
SUPPORT OF “SIMILARITY JOINS”
Supporting value similarity
The concept of “nearness” is widely implemented, in different ways depending on the context, such as:
• Lexical near (similar strings)
• Spatial near (between addresses/geo locations)
• Temporal near (between dates/times)
• Economic near (between costs)
Context is defined according to the attributes involved
=> Semantics of nearness built bottom-up, starting from the physical layer (available services) up to the conceptual one.
Similarity comes from Shared Domains
The attribute “address” is shared by the 4 entities (restaurant, theatre, apartment, hotel). Its semantic type, describing a location, enables “nearness” connections between each pair of entities (i.e. addresses can be compared for “nearness” within the same city, country, …)
[Diagram: restaurant, theatre, apartment and hotel, each with an Address attribute, connected by Spatial Near]
Supporting Nearness within Services
Several physical services natively support ranking by distances (e.g. GoogleMovies)
E.g.: GoogleMovies receives the user address as input, and returns theatres ranked by distance, each one with its address as output. UserAddress and Distance are external attributes.
GoogleMovies(UserAddress^I, Distance^R | Name^O, Address^O, Movie.Title^I, Movie.StartTime^O)
GoogleMovies AP: Theatre1
TTL=6000, chunksize=10, cacheable=true, provides=Spatial Near
URI: http://...
Logical attributes: UserAddress Name Address M.Title M.StartTime ...
Physical parameters: IAddr Name OAddr MovieTit MovieTime ...
“Nearness” Support within Services
GoogleMovies AP: Theatre1
TTL=600, chunksize=10, cache=1, provides=Spatial Near
URI: http://...
Logical attributes: UserAddr Name Address M.Title ...
Physical parameters: Addr Name Addr MovieTit ...
[Diagram: access pattern Theatre1 (UserAddress, Name, Address, M.Title, M.StartTime) connected by Spatial near, with Distance, to access pattern Restaurant2 (Name, Address, Cuisine, Price); at the conceptual level, Theatre and Restaurant are connected by Spatial Near]
Nearness Services within the Execution Engine
Ad-hoc services providing the notion of distance at the physical level require two domain values as input and produce their distance as output
• Two input attributes specify two values of the domain
• One output attribute specifies the distance in given units
SpatialNear System
TTL=600, chunksize=1, cacheable=1, ...
URI: http://...
Input1, Input2: Coordinates Output: Distance (Km)
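A service with this signature can be sketched as a pure function from two coordinate pairs to a distance in kilometres. The talk does not say how SpatialNear computes distance; the haversine great-circle formula and the mean Earth radius of 6371 km used below are assumptions chosen for the illustration.

```python
import math
from typing import Tuple

Coordinates = Tuple[float, float]  # (latitude, longitude) in degrees

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an assumption of this sketch

def spatial_near(input1: Coordinates, input2: Coordinates) -> float:
    """Two domain values in, their distance in km out (haversine formula)."""
    lat1, lon1 = map(math.radians, input1)
    lat2, lon2 = map(math.radians, input2)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

one_degree = spatial_near((0.0, 0.0), (0.0, 1.0))
```

The engine can then use the returned distance both as a join condition (keep pairs under a threshold) and as a ranking attribute.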
Supporting Nearness within the Execution Engine
[Diagram: access patterns Theatre1 (Name, Address, M.Title, M.StartTime) and Restaurant2 (Name, Address, Cuisine, Price) are connected through the SpatialNear system service, with Addr1 and Addr2 in input and Distance in output; at the conceptual level, Theatre and Restaurant are connected by Spatial Near]
Join of three Services at the three Levels in NY
[Diagram of the three levels. Service marts: Movie, Theatre, Restaurant. Access patterns: Movie1, Movie2, Theatre1, Rest1, Rest2, with one AP providing spatial near. Service interfaces: IMDB1, IMDB2, Hyperrev1, Google1, NYLocalSearch, Yahoo1, Yahoo2. Connection: Spatial Near. Annotation: Search only in NY.]
Three Levels with Connection Semantics
Level: Services / Connections
Conceptual: Service Mart / Name (with associated semantics)
(between conceptual and logical: bindings between SM and AP attributes, plus definition of extra attributes)
Logical: Access Pattern / Join attributes, directed vs undirected edge (with nearness service APs added as needed)
(between logical and physical: bindings between AP attributes and SI parameters)
Physical: Service Interface (with associated semantics and with system services) / Nearness Services
Resource graph
Specialized way for describing search-service-based knowledge available on the web [ER model, ontology, class diagram?]
[Graph nodes: Concert, Artist, Exhibition, Restaurant, Hotel, Movie, Metro Station, Theatre, Photo, Landmark, News, Piece, Shopping Center, ...]
APPLICATION DEVELOPMENT PROCESS
SeCo development process
Main roles:
• Service developer
• Service publisher
• Expert user
• SeCo expert
Dichotomy:
• Top-down vs. bottom-up
• Run time vs. design time
[Process diagram, by phase]
Search Service Development: Implement search service (Service developer)
Service Adaptation and Registration: Wrap or materialize service; Register service mart and interface (Service publisher); Service Mart model
Application Configuration: Design Liquid Query Template (Expert user); Liquid Query model
Query Plan Refinement: Manual optimization needed? (Y/N); Panta Rhei plan refinement (SeCo expert); Query Plan model
The service registration process
[Flowchart. Steps: Service Description, SM Identification, SM CREATION, SM UPDATE, SM MAPPING, Associated SI Update (new connections), Service Physical Description, AP CREATION, END. Decisions (YES/NO): Some SM retrieved? Modification of the SM structure? Strategies: Bottom-up, Top-down, Hybrid.]
The SM Creation process, with semantic hints
SM CREATION
• SM Name and attributes schema definition
• SM and attributes Semantical Description: Synsets (and tags?), from WN
• Connection patterns (CP) definition
Type conventions: Spatial_near, Textual_near, Temporal_near
Movie(Title, Director, Score, Year, Genres(Genre), Openings(Country, Date), Actors(Name))
Movie: S: (n) movie, film, picture, moving picture, moving-picture show, motion picture, motion-picture show, picture show, pic, flick (a form of entertainment that enacts a story by sound and a sequence of images giving the illusion of continuous movement) "they went to a movie every Saturday night";
Director: S: (n) film director, director (the person who directs the making of a film)
Defined CP: Shows Textual_near
Possible CP: Title (String) Textual_near, Year (Date) Temporal_near, …
Composition Language operators association
Shows(Movie, Theatre): [(Title=Title)]
Automatic recommendation of connectable SMs (Theatres, SM1, ..., SMn)
The SM Mapping procedure
SM MAPPING
Original SM: Movie(Title, Director, Score, Year, …)
SI: ImdbMovie: Title | Director | Score | Year | …
• Corresponding SM attributes, e.g. Director (String), S: (the person who directs the making of a film)
• Auxiliary attributes (i.e. query attributes)
• Selector attributes
SeCo Tools
• Online tool suite that covers the whole development process
• Mashup-based
• Built by using state of the art technologies:
1. MVC on the client: Javascript MVC
2. UI organization and panels: Yahoo! User Interfaces
3. Diagram drawing and editing: WireIt
Service Mart Registration
Mapping editor
Query Registration Interface
Query Registration Editor, Logical Connections
LIQUID QUERY INTERFACE
Liquid Query
“A new paradigm allowing users to formulate and get responses to multi-domain queries through an exploratory information seeking approach, based upon structured information sources exposed as software services…”
• Composite answers obtained by aggregating search results from various domains
• Highlight the contribution of each search service
• Join of results based on the structural information afforded by the search service interfaces
• Refine the user query
• Re-shape the result list
Liquid query definition
It consists of subsetting and parametrizing the resource graph...
[Diagram: subset of the resource graph with Photo, Concert, Metro Station, Restaurant, News, Exhibition, Artist, Hotel. Legend: inputs, outputs; GR = global ranking]
Liquid query definition
... And then characterizing the user interaction
[Diagram: the same subset of the resource graph (Photo, Concert, Metro Station, Restaurant, News, Exhibition, Artist, Hotel), with an Expand interaction]
Plus:
• Parametrization of global ranking
• Data visualization options
• ... and so on
Query Submission
Concert query conditions
Hotels query conditions
Query Execution & Result Presentation
SECO ENGINE
Overview
The tool is aimed at developers and lets them compose, plan and run a SeCo query
Four panels, one for each query processing phase:
Splashscreen!
Query composition (1)
Service interface browser
• lists registered service interfaces
• input and output parameters are listed
Selected service’s statistics
• collected service statistics are displayed
• statistics may be edited for testing purposes
Query composition (2)
User-entered datalog-like query
• joins implicitly encoded by datalog vars
• $vars encode query inputs provided at runtime
Query optimisation parameters
• control the behaviour of the planner
• trigger the planning process
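The two conventions of the query language can be illustrated with a small analysis step: a variable that occurs in more than one atom encodes a join, and a variable starting with `$` is an input supplied at run time. The concrete query syntax of the tool is not shown in the slides, so the list-of-atoms encoding and the attribute names below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Atom = Tuple[str, List[str]]  # (service interface name, argument variables)

# Hypothetical query: Movie1(Title, Score, $Genre), Theatre1(Name, $UserAddress, Title)
query: List[Atom] = [
    ("Movie1", ["Title", "Score", "$Genre"]),
    ("Theatre1", ["Name", "$UserAddress", "Title"]),
]

def analyse(atoms: List[Atom]) -> Tuple[List[str], Dict[str, List[str]]]:
    """Return the runtime inputs and the joins implied by shared variables."""
    seen: Dict[str, set] = defaultdict(set)
    for service, args in atoms:
        for var in args:
            seen[var].add(service)
    inputs = sorted(v for v in seen if v.startswith("$"))
    joins = {v: sorted(s) for v, s in seen.items()
             if not v.startswith("$") and len(s) > 1}
    return inputs, joins

inputs, joins = analyse(query)
```

Here the shared variable `Title` is what tells the planner to join the two services, without any explicit join clause in the query.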
Logical planning
Physical planning
Query execution (1)
Execution session management
• a session corresponds to a single query execution, where multiple user commands may be issued
• query input parameters are specified at session initialisation
Execution status
• displays the current session status
• displays the status of the execution commands issued so far
Execution command forms
• a more-all command requests more query results
• a more-one command requests more results by extracting more data from a specific service invoked by the query
Query execution (2)
Query results
• Displays ranked results, as soon as computed
Execution timeline
• displays activation of execution units (e.g. service calls)
• useful to fine tune the engine and the join strategies
Query execution (3)
Service calls log
• displays service calls at the chunk granularity
• shows response times, statistics, cache behaviour
DEMO
http://demo.search-computing.eu
SUMMARY OF SECO RESULTS
Results after 18 months
Concepts
– Service marts, rank join methods, Panta Rhei, liquid query
Research results
– Springer LNCS: Search Computing Challenges and Directions
– Many publications (with VLDB, WWW), many ongoing submissions
– Filing of US Patent (top-k method, random & sequential services)
Prototypes
– Execution environment, focus on liquid query and on integration
– Design support environment, focus on mashups
Dissemination
– Fifteen keynote talks, twelve articles in the Italian press
– SeCo Web site, SeCo blog, Facebook, LinkedIn, Twitter communities
– Search Computing Graduate Course at PoliMi
Temporary research positions (1 PhD, 5 post-MS, 3 post-doc)
Publications
SeCo
- D. Braga, A. Campi, S. Ceri, A. Raffio Joining the results of heterogeneous search engines Information Systems, Vol. 33, Issues 7-8, (November-December 2008), Pages 658-680
- D. Braga, S. Ceri, F. Daniel, D. Martinenghi Optimization of Multi-Domain Queries on the Web VLDB 2008: 562-573, Auckland, New Zealand, August 2008
- D. Braga, S. Ceri, F. Daniel, D. Martinenghi Mashing Up Search Services, IEEE Internet Computing 12(5): 16-23 (2008)
- D. Braga, D. Calvanese, A. Campi, S. Ceri, F. Daniel, D. Martinenghi, P. Merialdo, R. Torlone, NGS: a framework for multi-domain query answering, ICDE Workshops 2008: 254-261
- S. Ceri, Search Computing, Invited Paper, 25th International Conference on Data Engineering, Shanghai, March 29 - April 2, 2009
- D. Barbieri, A. Bozzon, D. Braga, M. Brambilla, A. Campi, S. Ceri, E. Della Valle, P. Fraternali, M. Grossniklaus, D. Martinenghi, S. Ronchi, M. Tagliasacchi Data-driven optimization of search service composition for answering multi-domain queries (USETIM 2009) workshop at VLDB 2009, Lyon, France, August 24-28, 2009
- M.Brambilla, S. Ceri, Engineering Search Computing Applications: Vision and Challenges The 7th joint meeting of the European Software Engineering Conference (ESEC) and the ACM
SIGSOFT Symposium on the Foundations of Software Engineering (FSE), Amsterdam, The Netherlands, August 24-28 2009
- S. Ceri Search Computing The 2009 IEEE/WIC/ACM International Conference on Web Intelligence, Milan, Italy, September 15-18 2009
- S. Ceppi and N. Gatti, An Automated Mechanism Design Approach for Sponsored Search Auctions with Federated Search Engines In Proceedings of the 12th Workshop on Agent-Mediated Electronic Commerce (AMEC) in the 9th International Joint Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), Toronto, Canada May 10 2010
- D. Martinenghi, M. Tagliasacchi, and S. Ceri Top-k pipe-join International Workshop on Ranking in Databases, Long Beach, USA, March 2010
- A. Bozzon, M. Brambilla, S. Ceri, P. Fraternali Liquid Query: Multi-Domain Exploratory Search on the Web WWW 2010 - 19th International World Wide Web Conference - Raleigh,
North Carolina, April 26-30 2010
- A. Campi, S. Ceri, A. Maesani, S. Ronchi Designing Service Marts for Engineering Search Computing Applications The Tenth International Conference on Web Engineering, ICWE
2010, Vienna, Austria, July 5-9 2010
Related
- M. Brambilla, S. Ceri, I. Celino, D. Cerizza, E. Della Valle, F. M. Facca, A. Turati, C. Tziviskou Experiences in the Design of Semantic Services Using Web Engineering Methods and Tools Journal on Data Semantics 2008
- A. Raffio, D. Braga, S. Ceri, P. Papotti, M. Hernandez Clip: a Visual Language for Explicit Schema Mappings International Conference on Data Engineering (ICDE), April 2008
- D. Braga, D. Calvanese, A. Campi, S. Ceri, F. Daniel, D. Martinenghi, P. Merialdo, R. Torlone A New Generation Search Engine Supporting Cross Domain Queries Italian Symposium on
Advanced Database Systems (SEBD), June 2008
- D. Braga, D. Calvanese, A. Campi, S. Ceri, F. Daniel, D. Martinenghi, P. Merialdo, R. Torlone NGS: a Framework for Multi-Domain Query Answering IIMAS, International Conference on Data
Engineering Workshops (ICDE), April 2008
- A. Raffio, D. Braga, S. Ceri, P. Papotti, M. Hernandez Clip: a Tool for Mapping Hierarchical Schemas ACM SIGMOD/PODS Conference, Demo Session, June 2008
- A. Bozzon, M. Brambilla, P. Fraternali Conceptual Modeling of Multimedia Search Applications Using Rich Process Models ICWE 2009, Springer LNCS, vol. 5648, ISBN 978-3-642-02817-5.
- E. Della Valle, S. Ceri, D. F. Barbieri, D. Braga, A. Campi A First Step Towards Stream Reasoning Future Internet Symposium (FIS) 2008, pp. 72-81.
- A. Bozzon, M. Brambilla, F. M. Facca, G. Toffetti Carughi A Conceptual Modeling Approach to Business Service Mashup Development IEEE International Conference on Web Services, ICWS
2009, Los Angeles. IEEE Press, July 2009, pp. 751 - 758.
- P. Fraternali, M. Brambilla, A. Bozzon, Model-Driven Design of Audiovisual Indexing Processes for Search-Based Applications Content-Based Multimedia Indexing, 2009, CBMI '09, IEEE Press,
ISBN: 978-1-4244-4265-2, pp. 120-125.
- D. F. Barbieri, D. Braga, S. Ceri, E. Della Valle and M. Grossniklaus, C-SPARQL: SPARQL for Continuous Querying Proceedings of WWW 2009, 18th International World Wide Web Conference
(Poster), Madrid, Spain, April 2009
- D. F. Barbieri, D. Braga, S. Ceri, E. Della Valle and M. Grossniklaus Continuous Queries and Real-time Analysis of Social Semantic Data with C-SPARQL
In Proceedings of SDoW 2009, 2nd ISWC Workshop on Social Data on the Web, Washington, DC, USA, October 2009
- D. F. Barbieri, D. Braga, S. Ceri, E. Della Valle and M. Grossniklaus C-SPARQL: A Continuous Query Language for RDF Data Streams International Journal of Semantic Computing (IJSC), 2010,
World Scientific Publishing
- D. F. Barbieri, D. Braga, S. Ceri and M. Grossniklaus An Execution Environment for C-SPARQL Queries In Proceedings of EDBT 2010, 13th International Conference on Extending Database Technology,
Lausanne, Switzerland, March 2010
Web Site & Blog
Web Site
Tech Watch Blog
Blog stats: ~ 900 absolute unique visitors in the last two months
Accesses to Web Site & Blog
Provenance
Sources
Visits: 20% USA, 18% Italy, 6% UK, 4% India, 4% Canada
Search Computing First Workshop, June 17-19, 2009
Search Computing Challenges and Directions (LNCS, vol. 5950, Ceri-Brambilla eds.)
Part 1: Vision
– Ceri: Search computing
– Baeza-Yates: Next generation search
– Weikum: Search for knowledge
Part 2: Technology Watch
– Della Valle-Buganza-Gatti: The search engine industry
– Casati-Daniel-Soi: Mashup technologies
– Baumgartner-Campi-Gottlob-Herzog: Web data extraction
– Hedeler-Belhajjame-Campi-Embury-Fernandez-Paton: Dataspaces
– Bozzon-Fraternali: Multimedia and multimodal information retrieval
Part 3: Issues in Search Computing
– Campi-Ceri-Gottlob-Ronchi: Service marts
– Braga-Campi-Grossniklaus: Join methods and query optimization
– Ilyas-Martinenghi-Tagliasacchi: Rank aggregation
– Braga-Grossniklaus-Ceri: Panta Rhei, a query execution environment
– Brambilla-Ceri-Fraternali-Manolescu: Liquid queries and liquid results
– Brambilla-Ceri: Software engineering of search computing applications
– Masseroli-Paton-Spasic: Search computing and the life sciences
Second Workshop: Design Principles
Consolidate several ongoing research chapters touching the various aspects of the project
Develop connections to other research projects so as to share knowledge - and possibly build cooperations based on mutual complementarity.
Setting internal deadlines to project evolution
– Being ready for the workshop
– Dump organisational responsibility to session chairs
Try a more discussion-oriented format
– Our view
– Guest’s views
– Panel/discussion (sometimes driven, sometimes not)
Produce Proceedings as Springer LNCS, each session contributing to a short part
Second SeCo Workshop Last Week
Second Workshop: Sessions
Pre-Workshop (Milano, May 25)
– Search as a Process
– Business Models
Workshop (Como, May 26-28)
– Semantic Resource Framework
– Wrapping Technology and Ontological Annotation
– Design Tools and Mashup Languages
– Search Computing and Research Evaluation
– Query Processing
– Rank Join
– Search Computing for BioMedical Applications
– User-Centered Approach to Search Computing Applications
Post-Workshop (Milano, May 31)
– Visual Interfaces for Complex Search
Looking forward
Establish stronger co-operation with other projects
– Both for technology and applications
Strengthen SeCo “core research”
– Cover the process lifecycle with methods & tools
– Improve result visualization and user interaction
– Use semantics in service registration and query processing
– Turn Panta Rhei into a full Service Base Management System (SBMS) with new rank join methods, proximity, uncertainty…
Strengthen the prototypes
– Fully develop the registration environment
– Extend the execution environment, make it scalable over clouds
– Extend the liquid interface, cover mobile interfaces
Put a “killer” application online (usable!)
Explore exploitation options