eTOUR CWA final 2009-06-03


CEN CWA - - - - -

WORKSHOP Final 2009-06-03

AGREEMENT

ICS Number

English version

Harmonization of data interchange in tourism

This CEN Workshop Agreement has been drafted and approved by a Workshop of representatives of interested parties, the constitution of which is indicated in the foreword of this Workshop Agreement.

The formal process followed by the Workshop in the development of this Workshop Agreement has been endorsed by the National Members of CEN but neither the National Members of CEN nor the CEN Management Centre can be held accountable for the technical content of this CEN Workshop Agreement or possible conflicts with standards or legislation.

This CEN Workshop Agreement can in no way be held as being an official standard developed by CEN and its Members.

This CEN Workshop Agreement is publicly available as a reference document from the CEN Members National Standard Bodies.

CEN Members are the national standards bodies of Austria, Belgium, Bulgaria, Cyprus, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Italy, Latvia, Lithuania, Luxembourg, Malta, the Netherlands, Norway, Poland, Portugal, Romania, Slovakia, Slovenia, Spain, Sweden, Switzerland, and United Kingdom.

EUROPEAN COMMITTEE FOR STANDARDIZATION

COMITÉ EUROPÉEN DE NORMALISATION

EUROPÄISCHES KOMITEE FÜR NORMUNG

Management Centre: Avenue Marnix 17, 36 B-1000 Brussels

© 2009 CEN All rights of exploitation in any form and by any means reserved worldwide for CEN National Members.

Ref. No. CWA - - - - -

2 – CEN/ISSS WS/eTOUR – CWA – 2009-06-03

Contents

Foreword 7

Executive summary 8

Summary of recommendations 11

Overall recommendations 11

List of recommendations on different topics 12

1 Scope 17

2 Normative references 18

3 Abbreviations, terms and definitions 19

3.1 Abbreviations 19

3.2 Terms and definitions 20

4 Methodology and thematic overview 21

4.1 Thematic circle 21

4.2 Topics 23

4.2.1 Semantics 23

4.2.2 Data transformation 24

4.2.3 Process handling 25

4.2.4 Metasearch 25

4.2.5 Object identification 25

4.3 Cross-cutting concerns / Prerequisites 26

4.3.1 Legal aspects 26

4.3.2 Multiculturalism 27

4.3.3 Business models 28

4.3.4 Technology 29

5 Case study 30

5.1 The processes 31

5.1.1 The actors 31

5.1.2 Consumer process 31

5.1.3 Travel-related professional process 33

5.2 The information and communication technologies 34

5.2.1 Multiple levels of data sources 34

5.2.2 Type of information 36

5.2.3 Type of data sources 38

6 Semantics 40

6.1 Standards 40

6.1.1 Needs and requirements 40

6.1.1.1 Introduction 40

6.1.1.2 Needs 41

6.1.1.3 Requirements 42

6.1.2 State of the art 42

6.1.2.1 Types of standards 44

6.1.2.2 List of travel industry standards, companies and organizations (examples) 44

6.1.3 Gaps and future needs 55

6.1.4 Recommendations 55

6.1.4.1 Short-term recommendations (1–3 years) 55

6.1.4.2 Long-term recommendations (3–10 years) 56

6.2 Taxonomies 56

6.2.1 Needs and requirements 56

6.2.1.1 Introduction 56

6.2.1.2 Needs 56

6.2.1.3 Requirements 57

6.2.2 State of the art 57

6.2.2.1 Examples of tourism taxonomies 58

6.2.3 Gaps and future needs 59

6.2.4 Recommendations 60

6.2.4.1 Short-term recommendations (1–3 years) 60

6.2.4.2 Long-term recommendations (3–10 years) 60

6.3 Ontologies 60

6.3.1 Needs and requirements 60

6.3.1.1 Introduction 60

6.3.1.2 Needs 61

6.3.2 State of the art 62

6.3.2.1 Definitions of the notion of ontology within the computer science domain 62

6.3.2.2 Main components of an ontology 63

6.3.2.3 Ontology development tools 63

6.3.2.4 Ontology development languages 64

6.3.2.5 Examples of standard ontologies 65

6.3.3 Gaps and future needs 68

6.3.4 Recommendations 69

6.3.4.1 Short-term recommendations (1–3 years) 69

6.3.4.2 Long-term recommendations (3–10 years) 69

7 Data transformation 70

7.1 Structured data mapping 70

7.1.1 Needs and requirements 70

7.1.1.1 Introduction 70

7.1.1.2 Needs 71

7.1.1.3 Requirements 72

7.1.2 State of the art 73

7.1.3 Gaps and future needs 74

7.1.4 Recommendations 75

7.1.4.1 Short-term recommendations (1–3 years) 75

7.1.4.2 Long-term recommendations (3–10 years) 75

7.2 Manual semantic annotation 75

7.2.1 Needs and requirements 76

7.2.2 State of the art 77

7.2.3 Gaps and future needs 78

7.2.4 Recommendations 78

7.2.4.1 Short-term recommendations (1–3 years) 78

7.2.4.2 Long-term recommendations (3–10 years) 78

7.3 Automatic information extraction 79

7.3.1 Needs and requirements 79

7.3.1.1 Needs 79

7.3.1.2 Requirements 79

7.3.2 State of the art 79

7.3.2.1 Named entity recognition 80

7.3.2.2 Event extraction 80

7.3.2.3 Tourism-specific information extraction 81

7.3.3 Gaps and future needs 82

7.3.3.1 Named entity recognition 82

7.3.3.2 Event extraction 82

7.3.3.3 Tourism-specific information extraction 82

7.3.4 Recommendations 82

7.3.4.1 Short-term recommendations (1–3 years) 82

7.3.4.2 Long-term recommendations (3–10 years) 82

7.4 Inter-ontology mapping 83

7.4.1 Needs and requirements 83

7.4.1.1 Introduction 83

7.4.1.2 Needs 83

7.4.1.3 Requirements 83

7.4.2 State of the art 84

7.4.3 Gaps and future needs 85

7.4.4 Recommendations 86

7.4.4.1 Short-term recommendations (1–3 years) 86

7.4.4.2 Long-term recommendations (3–10 years) 86

8 Process handling 87

8.1 Needs and requirements 87

8.1.1 Introduction 87

8.1.2 Needs 88

8.1.3 Requirements 90

8.2 State of the art 91

8.2.1 Global standardization efforts 91

8.2.2 Application Integration and APIs 91

8.3 Gaps and future needs 92

8.4 Recommendations 93

8.4.1 Short-term recommendations (1–3 years) 93

8.4.2 Long-term recommendations (3–10 years) 93

9 Metasearch 94

9.1 Methodology 94

9.1.1 Needs and requirements 94

9.1.1.1 Introduction 94

9.1.1.2 Quality of results 94

9.1.1.3 Response time 94

9.1.1.4 Access to data 95

9.1.1.5 Efforts for maintenance 95

9.1.2 State of the art 95

9.1.2.1 Web crawler 95

9.1.2.2 HTTP requests 95

9.1.2.3 Website wrapper 96

9.1.2.4 Application Programming Interfaces (API) 96

9.1.2.5 Web services 96

9.1.2.6 Semantic annotation 96

9.1.2.7 Caching mechanism 97

9.1.2.8 Summary 97

9.1.3 Gaps and future needs 97

9.1.4 Recommendations 98

9.1.4.1 Short-term recommendations (1–3 years) 98

9.1.4.2 Long-term recommendations (3–10 years) 98

9.2 Querying 99

9.2.1 Needs and requirements 99

9.2.1.1 Introduction 99

9.2.1.2 Needs and requirements 99

9.2.2 State of the art 100

9.2.2.1 Methods for query distribution 100

9.2.2.2 Query by example 101

9.2.2.3 Standardized query languages 101

9.2.2.4 Interface standardization 102

9.2.2.5 Metadata syndication 103

9.2.3 Gaps and future needs 104

9.2.3.1 Query by example 104

9.2.3.2 Standardized query languages / SPARQL 104

9.2.3.3 Interface standardization 104

9.2.3.4 Metadata syndication 105

9.2.4 Recommendations 105

9.2.4.1 Short-term recommendations (1–3 years) 105

9.2.4.2 Long-term recommendations (3–10 years) 105

9.3 Role of registries in eTourism 106

9.3.1 Needs and requirements 106

9.3.1.1 Introduction 106

9.3.1.2 Needs 106

9.3.1.3 Requirements 107

9.3.2 State of the art 107

9.3.2.1 UDDI and the ebXML Registry Specification 107

9.3.2.2 CEN/ISSS eGovernment Focus Group and CEN/ISSS WS/eGov-Share 109

9.3.3 Gaps and future needs 111

9.3.3.1 Shortcomings of current registry standards 111

9.3.3.2 Future needs 112

9.3.4 Recommendations 113

9.3.4.1 Short-term recommendations (1–3 years) 113

9.3.4.2 Long-term recommendations (3–10 years) 114

10 Object identification 115

10.1 Needs and requirements 115

10.1.1 Introduction 115

10.1.2 Needs 115

10.1.3 Requirements 116

10.1.3.1 Location codes 116

10.1.3.2 Travel service codes 116

10.1.3.3 Travel service qualifier codes 117

10.1.3.4 Travel company codes 117

10.2 State of the art 117

10.2.1 IATA 117

10.2.2 ICAO 118

10.2.3 ISO 119

10.2.4 UN/LOCODE 119

10.2.5 HEDNA 120

10.2.6 ACRISS 120

10.2.7 GIATA 120

10.2.8 GS1 121

10.2.9 URI 121

10.2.10 UUID 121

10.3 Gaps and future needs 122

10.3.1 Location 122

10.3.1.1 Country codes 122

10.3.1.2 Region codes 122

10.3.1.3 City, airport and other point of travel codes 123

10.3.2 Currency and language codes 124

10.3.3 Travel service codes 124

10.3.4 Travel service qualifier codes 124

10.3.5 Travel company codes 124

10.4 Recommendations 125

10.4.1 Short-term recommendations (1–3 years) 125

10.4.2 Long-term recommendations (3–10 years) 125

11 Best practice case 126

11.1 The starting point 126

11.2 The existing case of euromuse.net 126

11.3 Future scenario for euromuse.net 127

11.4 Critical discussion 128

12 Bibliography and references 130

Foreword

The objective of the Workshop CEN/ISSS WS/eTOUR on “Harmonization of data interchange in tourism” and the production of this draft CEN Workshop Agreement (CWA) were approved by the Workshop at its plenary meeting held in Brussels on 6 February 2008.

This final version of the CWA was approved by letter ballot following the final Workshop meeting on 15 May 2009.

The document has been prepared by the eTOUR Project Team:

David Faveur, Afidium, France,

Manfred Hackl, x+o Business Solutions GmbH, Austria,

Marc Wilhelm Küster, Fachhochschule Worms, Germany,

Carlos Lamsfus, Asociacion Centro de Investigacion Cooperativa en Turismo, Spain.

In his capacity as Chair of the Workshop, Wolfram Höpken (University of Applied Sciences Ravensburg-Weingarten and Etourism Competence Center Austria) has contributed greatly to the work on the CWA.

The Secretary of the Workshop has been Håvard Hjulstad, Standards Norway.

Workshop participants have included: Afidium (France) • Asociación Centro de Investigación Cooperativa en Turismo, CICtourGUNE (Spain) • BIT Reiseliv (Norway) • Centre de Recherche Public Henri Tudor (Luxembourg) • ECCA – Etourism Competence Center Austria • eCl@ss – International Classification System (Germany) • euromuse.net – the European exhibition portal • ETOA – European Tour Operators Association • Fachhochschule Worms (Germany) • Hochschule München – Fakultät für Tourismus (Germany) • FernUniversität in Hagen (Germany) • Infoterm – International Information Centre for Terminology • IfM – Institute for Museum Research – SMB-PK (Germany) • OpenTravel Alliance (USA) • Smart Information Systems (Austria) • Travel and Telecom Ltd (UK) • TTI – Travel Technology Initiative Ltd (UK) • Universitat Oberta de Catalunya (Spain) • x+o Business Solutions GmbH (Austria)

This CEN Workshop Agreement is publicly available as a reference document from the National Members of CEN: the national standards bodies of Austria, Belgium, Bulgaria, Cyprus, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Italy, Latvia, Lithuania, Luxembourg, Malta, the Netherlands, Norway, Poland, Portugal, Romania, Slovakia, Slovenia, Spain, Sweden, Switzerland, and United Kingdom.

Executive summary

Problem statement

Tourism is in the vanguard of ICT adoption and eBusiness in the area of eMarketing and online sales (B2C). Yet, in a ranking of various sectors, the tourism industry only achieves a mid-level score in the overall use of ICT and eBusiness. It still lags behind, especially regarding the deployment of ICT infrastructure and the adoption of e-integrated business processes [eBusiness W@tch Report 2006/2007, p. 167]. At the same time, tourism is an important and growing sector of the European economy, with a large presence of SMEs.

Electronic data interchange and the interoperability between systems of different parties are critical for the execution of eBusiness processes throughout the entire industry. A CEN Workshop was set up to recommend approaches for reaching global interoperability, i.e. seamless data interchange and execution of business processes in the tourism sector, meeting the requirements of players on all levels of the value chain.

Approach

Data interchange has two key components: the electronic data itself and the exchange of data between two or more tasks in larger process chains. This hinges on the ability of all tasks to understand the data they are supposed to consume (data interoperability) and on the ability of processes to cooperate meaningfully (process interoperability). This CWA thus revolves around the two core issues “data” and “processes” and the related challenges in the domain that need deeper analysis. In particular, we have identified five topics for further analysis, which are briefly outlined below: “semantics”, “data transformation”, “process handling”, “metasearch” and “object identification”.

These five topics are placed in the larger context of four cross-cutting concerns that permeate all of them. On the one hand, tourism transactions regularly transcend national and cultural boundaries and frequently involve both very small and very large players. On the other hand, many of the parameters – rating systems for accommodation, opening hours of sites, classification of beaches – are regulated nationally or even regionally and reflect cultural preferences. All transactions must naturally follow pertinent national or regional laws and regulations. This leads to the four cross-cutting concerns “Legal aspects”, “Multiculturalism”, “Business models”, and “Technology”.

The five challenges

Semantics

The meaning and structure of data are at the heart of data interoperability, and, given the plethora of pertinent formats, this is unfortunately a complex problem. Agreed strategies for expressing semantics in eTourism applications are key to the flexible integration of heterogeneous data structures from a wide range of data sources. Semantics is thus also a central requirement for building flexible, cross-organizational process chains.

Data transformation

The co-existence of many different data formats already implies the need to transform data during data exchange. This mapping can affect data structures on different levels that need to be transformed:

Metadata (ontologies and taxonomies); structured data; unstructured data.

Together with well-defined semantics, data transformation is an essential tool to integrate data sources and build cross-organizational processes.
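As a minimal illustration of transforming structured data, the Python sketch below renames fields and converts values from one record layout to another. Both layouts, the mapping table and the field names (`hotelName`, `stars`, `cityName`, `seaView`) are hypothetical and do not come from any existing tourism standard. Fields without a mapping are collected under `_unmapped` rather than silently dropped, so that information loss during the exchange stays visible.

```python
from typing import Any, Callable

# Hypothetical mapping: source field -> (target field, value converter).
# Neither layout corresponds to an existing tourism standard.
FIELD_MAP: dict[str, tuple[str, Callable[[Any], Any]]] = {
    "hotelName": ("name", str.strip),
    "stars": ("category", lambda n: f"{int(n)}*"),
    "cityName": ("city", str.title),
}


def transform(record: dict[str, Any],
              field_map: dict[str, tuple[str, Callable[[Any], Any]]]) -> dict[str, Any]:
    """Map a source record onto the target layout.

    Fields without a mapping are collected under "_unmapped" so that
    information loss stays visible instead of being silently discarded.
    """
    result: dict[str, Any] = {}
    unmapped: dict[str, Any] = {}
    for key, value in record.items():
        if key in field_map:
            target, convert = field_map[key]
            result[target] = convert(value)
        else:
            unmapped[key] = value
    if unmapped:
        result["_unmapped"] = unmapped
    return result


source = {"hotelName": " Hotel Example ", "stars": 4, "cityName": "WORMS", "seaView": False}
print(transform(source, FIELD_MAP))
# {'name': 'Hotel Example', 'category': '4*', 'city': 'Worms', '_unmapped': {'seaView': False}}
```

The mapping table is the only part that changes per pair of formats; keeping it as data rather than code makes it easier to review and maintain.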

Process handling

The World Wide Web has significantly boosted the use of ICT in the tourism industry and has empowered customers to make travel arrangements autonomously using a wide variety of data sources. This requires the seamless interplay of different computer systems, which in turn enables new online services such as dynamic packaging of tourism products.

Metasearch

Metasearch builds on shared semantics and data transformation to enable searches across the individual search components of heterogeneous websites and to aggregate the results into a unified list. From a user’s perspective, metasearch engines thus offer a one-stop entry point to a specific type of information; from a technology perspective, they place high demands on distributed data querying.
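The aggregation step of a metasearch can be sketched as follows, assuming each search component can be called as a function that returns offers as dictionaries. The two sources and their data are hypothetical stand-ins for real websites; duplicates are merged on a normalized name and city, and the cheapest offer is kept.

```python
from typing import Any, Callable

Offer = dict[str, Any]  # keys used here: "name", "city", "price", "source"


def source_a(query: str) -> list[Offer]:
    # Stand-in for one website's search component (hypothetical data).
    return [
        {"name": "Hotel Rhein", "city": "Worms", "price": 95.0, "source": "A"},
        {"name": "Hotel Dom", "city": "Worms", "price": 120.0, "source": "A"},
    ]


def source_b(query: str) -> list[Offer]:
    # A second, independently formatted source (hypothetical data).
    return [{"name": "hotel rhein ", "city": "Worms", "price": 89.0, "source": "B"}]


def metasearch(query: str, sources: list[Callable[[str], list[Offer]]]) -> list[Offer]:
    """Query every source, merge duplicates, and return one list sorted by price."""
    best: dict[tuple[str, str], Offer] = {}
    for search in sources:
        for offer in search(query):
            # Naive duplicate detection on normalized name and city.
            key = (offer["name"].strip().lower(), offer["city"].strip().lower())
            if key not in best or offer["price"] < best[key]["price"]:
                best[key] = offer
    return sorted(best.values(), key=lambda o: o["price"])


for offer in metasearch("hotels in Worms", [source_a, source_b]):
    print(offer["name"].strip(), offer["price"], offer["source"])
```

Merging on hotel names, as done here, is fragile: two different hotels may share a name, and one hotel may be spelled differently by each source. This is one reason why unique object identification matters for metasearch.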

Object identification

Electronic transactions often hinge on being able to uniquely identify the objects on which they operate. In contrast to flights, for example, many types of objects in tourism do not have a unique identifier. At present there is no universally accepted scheme to identify, say, a given hotel that is to be booked, or to compare different offers for the same hotel.
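Without a universal identifier, a common workaround is a cross-reference (transcoding) table that maps each provider's own code to one shared identifier. The sketch below assumes such a table exists; the provider names, their codes and the `GLOBAL-` identifiers are all hypothetical. Where a code is missing from the table, the comparison deliberately returns `None` instead of guessing.

```python
from typing import Optional

# Hypothetical cross-reference table: (provider, provider's own hotel code) -> shared ID.
# Neither the provider names nor the "GLOBAL-" identifiers belong to a real scheme.
CROSSWALK: dict[tuple[str, str], str] = {
    ("providerA", "H-1001"): "GLOBAL-0001",
    ("providerB", "WOR123"): "GLOBAL-0001",
    ("providerB", "WOR456"): "GLOBAL-0002",
}


def same_object(a: tuple[str, str], b: tuple[str, str],
                crosswalk: dict[tuple[str, str], str] = CROSSWALK) -> Optional[bool]:
    """True/False if both codes are known, None if either code has no shared ID."""
    id_a, id_b = crosswalk.get(a), crosswalk.get(b)
    if id_a is None or id_b is None:
        return None  # undecidable without a shared identifier
    return id_a == id_b


def group_offers(offers: list[tuple[str, str, float]],
                 crosswalk: dict[tuple[str, str], str] = CROSSWALK) -> dict[str, list[tuple[str, float]]]:
    """Group (provider, code, price) offers by shared ID; unknown codes are skipped."""
    groups: dict[str, list[tuple[str, float]]] = {}
    for provider, code, price in offers:
        shared = crosswalk.get((provider, code))
        if shared is not None:
            groups.setdefault(shared, []).append((provider, price))
    return groups


offers = [("providerA", "H-1001", 95.0), ("providerB", "WOR123", 89.0), ("providerC", "X9", 70.0)]
print(group_offers(offers))
# {'GLOBAL-0001': [('providerA', 95.0), ('providerB', 89.0)]}
```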

Best practice case

To illustrate the interoperability issue as a whole and to reflect on ways to solve the problems arising from the five challenges, the existing eTEN project euromuse.net has been chosen as a best practice case. euromuse.net deploys the Harmonise technology, a result of a former IST project, to mediate between different data formats from the cultural heritage and tourism sectors, and it faces very much the same challenges as those discussed in this workshop report.

“Mediation” has been identified as the key concept for reaching interoperability in a highly fragmented and diversified area such as the tourism industry. This best practice case demonstrates how interoperability can easily be reached through data mediation, while leaving each partner enough flexibility to define its own data format.
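One way to see why mediation scales well in a fragmented market is to count converters. Assume that each of n partners keeps its own format and that every direction of translation needs a separate converter. Direct pairwise mapping then needs n(n−1) converters, while mediation through one shared reference model needs 2n, one into and one out of the model per format. This counting model is a simplification introduced here for illustration, not a figure taken from the euromuse.net case.

```python
def pairwise_converters(n: int) -> int:
    """Direct mapping: one converter per ordered pair of distinct formats."""
    return n * (n - 1)


def mediated_converters(n: int) -> int:
    """Mediation via a shared reference model: one converter in and one out per format."""
    return 2 * n


for n in (2, 3, 5, 10, 20):
    print(f"{n:>3} formats: pairwise {pairwise_converters(n):>4}, mediated {mediated_converters(n):>3}")
```

Under these assumptions, direct mapping is cheaper for two formats (2 against 4), both approaches need 6 converters for three formats, and for ten formats mediation needs 20 converters instead of 90.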

Recommendations

The workshop came up with a number of recommendations that all centre on the basic idea of dealing with the diversity of existing standards, technologies, projects, and entities, rather than bringing yet another standard to the market. The keywords in this context are harmonization and mediation.

The suggested approach is to examine existing standards and approaches carefully when starting to create something new, and to build upon them, keeping differences to the necessary minimum. This harmonization should help to avoid isolated standards and approaches that make interoperability difficult.

Furthermore, ways should be found to mediate between the remaining differences of existing approaches. The tourism sector has come up with a broad spectrum of different standards and models, and for various reasons it will be difficult, if not impossible, to replace them. This diversity is also needed to some extent, and mediation between the standards should help to deal with these differences.

To oversee the market, it is highly recommended to implement a watchtower as a follow-up action within the work of this CEN workshop, keeping a map of the semantic landscape to support harmonization of data and offering technology and recommendations to mediate between existing standards. HarmoNET, an existing non-profit network established out of a European project and dedicated to data mediation in tourism, shall be the starting point for this watchtower.

In addition, it is recommended to invest in long-term research on semantic methods and tools, as well as new ways of object identification, to continue what has already started in several European projects.

These recommendations aim at keeping the diversity and flexibility of the European eTourism landscape, while allowing process and data interoperability for the actors involved to achieve a higher level of e-integration.

Summary of recommendations

Overall recommendations

The workshop came up with a number of recommendations that all centre on the basic idea of dealing with the diversity of existing standards, technologies, projects, and entities, rather than bringing yet another standard to the market. The keywords in this context are harmonization and mediation. Desirable as it may seem to unify terms and standards to allow easy exchange of information and execution of processes, it is just as important to leave the market the flexibility and diversity to define data schemas. Instead, ways should be found to mediate between the different approaches. The tourism sector has come up with a broad spectrum of different standards, and for various reasons it will be difficult, if not impossible, to replace them.

One reason is that eTourism-relevant information, like most product descriptions and information classifications, is often deeply rooted in local and national peculiarities and is sometimes even laid down in national law. Take as simple examples the classification of “sea view” or of “wellness area”.

Another reason is the interplay of market forces, which makes it difficult to reach consensus on the issues involved. Unlike in many other industries, such as the construction industry, the benefits of having different standards seem to outweigh the disadvantages of the lack of interoperability. This can be observed in the area of destination management as well as on the side of tour operators. The need for standardization is nevertheless recognized, as can be seen from various industry associations and forums, but strong resistance can be observed when approaches for European or worldwide standards are discussed.

Beyond the detailed recommendations listed in the chapters below, a general approach is therefore suggested: to harmonize (keeping differences to a minimum) and to mediate (enabling understanding across the differences) existing formats and standards. This approach must be flexible, easy to use and cost-effective, as is the case, for example, in the project euromuse.net, which is described as the best practice case. These criteria are critical to the success of the approach, since the tourism sector is characterized by a large number of small and medium-sized organizations.

The mediation approach is in no way an invitation to establish as many isolated new standards as possible. Rather, one should carefully examine existing standards or approaches when starting to create something new, so that later mediation between them is as easy as possible, deviating from other standards only where absolutely required.

To ease harmonization and mediation, it is highly recommended to implement a watchtower that keeps a map of the semantic landscape and offers technologies and recommendations to mediate between existing standards. The watchtower shall monitor relevant standards and reference lists to see what is coming up and what is used in the market. It could also help to identify existing standards or frameworks for object identification. In addition, it could keep track of technologies and projects that ease the problem of data and process interoperability, in order to come up with recommendations on interoperability approaches and best practices for data models. At the same time, the watchtower could operate a central data mediation service between the recognized standards in the field.

It is recommended to write a more detailed proposal for this “eTourism watchtower” as a follow-up action within the work of this CEN workshop. HarmoNET, an existing non-profit network established out of a European project and dedicated to data mediation in tourism, is the ideal starting point for this watchtower. HarmoNET is composed of main tourism bodies from different levels and is well positioned to host the watchtower.

In addition, it is recommended to invest in long-term research on semantic methods and tools, as well as new ways of object identification, to continue what has already been started in several European projects. A more detailed recommendation on research areas is given in the following chapters. However, the proposed watchtower could also help to identify gaps and needs for long-term research.

All these recommendations aim at keeping the diversity and flexibility of the European eTourism landscape, while allowing process and data interoperability for the actors involved to achieve a higher level of e-integration.

List of recommendations on different topics

Under each topic are listed short-term recommendations and long-term recommendations with time spans of 1–3 and 3–10 years respectively.

Standards

Short-term recommendations

Leverage existing standards rather than develop new specifications whenever possible.

Build cooperation between private associations like IATA, OTA, and XFT and formal standardization bodies such as ISO and CEN.

Build a “watchtower” registry of relevant eTourism standards that also acts as a coordination body between various formal and informal standardization activities. Such an activity can be modelled on the MoU/MG.

Long-term recommendations

Lower the entry barrier for participation in pertinent formal and informal standardization bodies, especially for SMEs, and extend the scope of those activities to cover the requirements of SMEs.

Work on interoperability approaches between different standards.

Taxonomies

Short-term recommendations

Follow existing taxonomies, including established definitions, wherever possible.

Produce mappings between eTourism-related taxonomies.

Federate existing eTourism-related taxonomies across languages based on taxonomy mappings and offer a SKOS (Simple Knowledge Organisation System) interface to them.

Formulate guidelines for the design of eTourism-related taxonomies.

Long-term recommendations

Build organizational structures for the long-term maintenance of eTourism-related taxonomies.
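The recommendation to federate taxonomies across languages and expose them through SKOS can be sketched in a few lines. The example below serializes two concepts from two hypothetical taxonomies as SKOS in Turtle syntax, using the standard `skos:Concept`, `skos:prefLabel` and `skos:exactMatch` terms, and looks up a label in another language by following an `exactMatch` link. The `ex1:`/`ex2:` namespaces and the concepts themselves are placeholders.

```python
from dataclasses import dataclass, field
from typing import Optional

SKOS = "http://www.w3.org/2004/02/skos/core#"


@dataclass
class Concept:
    uri: str                      # prefixed name, e.g. "ex1:SeaView" (hypothetical)
    labels: dict[str, str]        # language tag -> preferred label
    exact_matches: list[str] = field(default_factory=list)


def _literal(text: str) -> str:
    # Escape backslashes and double quotes for a Turtle string literal.
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_turtle(concepts: list[Concept]) -> str:
    """Serialize concepts as SKOS in Turtle syntax (placeholder namespaces)."""
    lines = [
        f"@prefix skos: <{SKOS}> .",
        "@prefix ex1: <http://example.org/taxonomy1/> .",
        "@prefix ex2: <http://example.org/taxonomy2/> .",
        "",
    ]
    for c in concepts:
        props = ["a skos:Concept"]
        props += [f"skos:prefLabel {_literal(label)}@{lang}" for lang, label in sorted(c.labels.items())]
        props += [f"skos:exactMatch {m}" for m in c.exact_matches]
        lines.append(f"{c.uri} " + " ;\n    ".join(props) + " .")
    return "\n".join(lines)


def label_in(concepts: list[Concept], uri: str, lang: str) -> Optional[str]:
    """Return a label in `lang`, following skos:exactMatch links one step if needed."""
    index = {c.uri: c for c in concepts}
    start = index[uri]
    for candidate in [start] + [index[m] for m in start.exact_matches if m in index]:
        if lang in candidate.labels:
            return candidate.labels[lang]
    return None


concepts = [
    Concept("ex1:SeaView", {"en": "Sea view", "de": "Meerblick"}, ["ex2:VueMer"]),
    Concept("ex2:VueMer", {"fr": "Vue sur la mer"}, ["ex1:SeaView"]),
]
print(to_turtle(concepts))
print(label_in(concepts, "ex1:SeaView", "fr"))  # Vue sur la mer
```

A production interface would normally rely on an RDF library and a SPARQL endpoint; plain string output is used here only to keep the sketch self-contained.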

Ontologies

Short-term recommendations

Use recognized standard reference models such as the Harmonise ontology (for tourism purposes) or CIDOC CRM (for cultural heritage data) wherever possible.

Produce guidelines for the mappings between eTourism-related ontologies based on standard reference models.

Use established standards such as RDF(S), OWL or the Topic Map Constraint Language to express ontologies.

Raise awareness of open source, user-friendly tools for ontology definition such as Protégé.

Structured data mapping

Short-term recommendations

Use (graphical) mediation tools with reasoning capabilities to automatically suggest semantically equivalent data resources, identify inconsistencies and decrease the amount of human intervention in the mapping processes.

Pursue the design and implementation of new data resources on the basis of agreed recommendations, such as the W3C recommendations for Semantic Web technologies.

Long-term recommendations

Use semantic web technologies (e.g. based on RDF URIs) to name and represent (data) resources on the Web so that mapping can be undertaken automatically.

Agree on the degree of formality with which information ought to be defined, so that automatic mapping tools can compare information.

Ontologies should be developed on different abstraction levels. Agreed high-level ontologies should be in place and should be used when defining domain ontologies. General domain ontologies should be reused when more specific sub-domain ontologies are defined.
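The short-term recommendation on mediation tools that suggest semantically equivalent data resources can be approximated, very crudely, by string similarity on field names. The sketch below uses `difflib.SequenceMatcher` from the Python standard library; the field names and the 0.6 threshold are hypothetical choices, and every suggestion would still need human confirmation.

```python
import re
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lower-case and drop separators so 'hotel_name' and 'hotelName' compare equal."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def suggest_mappings(source_fields: list[str], target_fields: list[str],
                     threshold: float = 0.6) -> dict[str, tuple[str, float]]:
    """Suggest, for each source field, the most similar target field above a threshold."""
    suggestions: dict[str, tuple[str, float]] = {}
    for s in source_fields:
        score, best = max(
            (SequenceMatcher(None, normalize(s), normalize(t)).ratio(), t) for t in target_fields
        )
        if score >= threshold:
            suggestions[s] = (best, round(score, 2))
    return suggestions


print(suggest_mappings(["hotelName", "starRating", "postalCode"],
                       ["hotel_name", "rating_stars", "zip"]))
# {'hotelName': ('hotel_name', 1.0)}
```

`starRating` and `rating_stars` denote the same thing but score only about 0.57 because the words appear in a different order, so no mapping is suggested. Lexical matching of this kind misses exactly the cases that reasoning over an ontology is meant to catch.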

Manual semantic annotation

Due to the nature of this topic, some recommendations overlap with other issues that have already been covered, such as ontologies.

Short-term recommendations

Enhance the use of standard ontologies (e.g. Harmonise) in the field of tourism.

Enhance the development of ontologies with standard languages: OWL, RDF.

Enhance the use of already existing manual annotation tools in the realm of tourism.

Long-term recommendations

Investigate the automation of annotation.

Investigate automatic ontology extension.

Automatic information extraction

Short-term recommendations

Foster the use of semantic web technologies to describe non-structured data on the web by means of resources, so as to make the data machine-processable.

Semantically tag non-structured information.

Long-term recommendations

Together with a recognized body such as the W3C, agree on the names that ought to be used for the tags that represent particular tourism content and that are valid for search engines.

Develop software that enables (semi-)automatic information annotation according to the previous recommendation.
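As a rough sketch of semantic tagging of non-structured text, the function below annotates occurrences of known terms with concept URIs by dictionary (gazetteer) lookup. The terms and the `http://example.org/tourism#` URIs are hypothetical placeholders for names that would be agreed with a recognized body; real information extraction would add linguistic analysis, disambiguation and handling of inflected forms, which this sketch does not attempt.

```python
import re

# Hypothetical gazetteer: surface term -> concept URI (placeholder namespace).
GAZETTEER: dict[str, str] = {
    "sea view": "http://example.org/tourism#SeaView",
    "museum": "http://example.org/tourism#Museum",
    "hotel": "http://example.org/tourism#Accommodation",
}


def tag(text: str, gazetteer: dict[str, str] = GAZETTEER) -> list[tuple[str, int, int, str]]:
    """Return (matched text, start, end, concept URI) for every gazetteer term in text."""
    # Longest terms first, so multi-word terms win over their single-word parts.
    terms = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b", re.IGNORECASE)
    return [(m.group(0), m.start(), m.end(), gazetteer[m.group(0).lower()])
            for m in pattern.finditer(text)]


for match in tag("Our hotel offers rooms with sea view, close to the Museum."):
    print(match)
```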

Inter-ontology mapping

Short-term recommendations

Foster the development of ontologies using the same standard definition language as well as the same degree of formality and expressivity to ease automatic ontology mapping, following W3C recommendations.

Long-term recommendations

Based on the short-term recommendations, build tools with graphical user interfaces that automatically merge and link ontologies, using the ontologies' reasoning capabilities to automatically find and resolve alignment inconsistencies.

Process handling

Short-term recommendations

Simplify and rationalize existing processes – use stateless process handling or request-response pairs only.

Build an ontology of common processes in the tourism industry.

Long-term recommendations

Develop process mediators.

Put research efforts into intelligent agent technologies for automatic process handling.
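Stateless request-response handling, as recommended above, can be illustrated with an availability check in which every request carries all the context needed to answer it (hotel, dates, number of guests), so no session has to be kept between calls. The message fields, the hotel code `H1` and the inventory figures are hypothetical; they are not taken from OTA or any other message standard.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class AvailabilityRequest:
    hotel_id: str
    check_in: date
    check_out: date
    guests: int


@dataclass(frozen=True)
class AvailabilityResponse:
    hotel_id: str
    available: bool
    nights: int


# Hypothetical inventory: free beds per hotel and night.
INVENTORY: dict[str, dict[date, int]] = {
    "H1": {date(2009, 6, 3): 4, date(2009, 6, 4): 2},
}


def handle(req: AvailabilityRequest,
           inventory: dict[str, dict[date, int]] = INVENTORY) -> AvailabilityResponse:
    """Answer one request using only the request and the inventory; no session state."""
    nights = (req.check_out - req.check_in).days
    free = inventory.get(req.hotel_id, {})
    stay = [req.check_in + timedelta(days=i) for i in range(max(nights, 0))]
    ok = nights > 0 and all(free.get(night, 0) >= req.guests for night in stay)
    return AvailabilityResponse(req.hotel_id, ok, nights)


print(handle(AvailabilityRequest("H1", date(2009, 6, 3), date(2009, 6, 5), 2)))
```

Because the answer depends only on the request and the current inventory, any server instance can handle any request. A booking would be modelled as a separate request-response pair rather than as a continuation of a session.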

Metasearch methodology

Short-term recommendations

Make use of semantic technologies to describe your data.

Provide content and meta-content as close to an existing standard as possible.

Provide regularly updated, external data stores with pre-processed and well-described content for fast querying (caching mechanism) if query processing times are long or queries are complex.

Develop aggregated data repositories that provide pre-processed data from different sources.

Long-term recommendations

Focus on the development of fast, easy-to-use alternatives to current metasearch technologies, enabling or supporting the use of semantic technologies for data transformation.
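The caching mechanism recommended above for metasearch can be sketched as a small time-to-live cache placed in front of a slow data source. The 60-second lifetime, the query text and the `slow_source` function are hypothetical; the fake clock only makes the expiry deterministic for the demonstration.

```python
import time
from typing import Any, Callable, Hashable


class TTLCache:
    """Serve repeated queries from a local store until entries are older than ttl_seconds."""

    def __init__(self, fetch: Callable[[Hashable], Any], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch          # slow call to the original data source
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: dict[Hashable, tuple[float, Any]] = {}

    def get(self, key: Hashable) -> Any:
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]          # fresh enough: no round trip to the source
        value = self._fetch(key)     # missing or stale: refresh from the source
        self._store[key] = (now, value)
        return value


# Demo with a fake clock so the expiry is deterministic.
calls: list[str] = []


def slow_source(query: str) -> str:
    calls.append(query)
    return f"results for {query}"


fake_now = [0.0]
cache = TTLCache(slow_source, ttl_seconds=60, clock=lambda: fake_now[0])
cache.get("museums in Berlin")
cache.get("museums in Berlin")   # served from the cache
fake_now[0] = 61.0
cache.get("museums in Berlin")   # entry expired, source queried again
print(len(calls))  # 2
```

The choice of lifetime trades the freshness of results against the load placed on the original source.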

Querying

Short-term recommendations

If a system is to be available for external queries, use general query statements that are supported by a broad range of query languages. Avoid features and functionality specific to your own database.

Further develop flexible standardized query languages that can be adapted to different system environments and support semantically enriched data.

Publish “partial translators”, which provide a structured translation for human search concepts like “near” that can be used by different query languages.

Long-term recommendations

Research technologies for flexible and adaptive query methods that are able to understand the semantics of a web repository and can send an appropriate query.
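A “partial translator” for the concept “near”, combined with the advice to use only general query statements, might turn a point and a radius into a plain SQL predicate built from `BETWEEN` and `AND`, which most SQL dialects accept. The sketch below assumes a table with `lat` and `lon` columns, a radius of 5 km and roughly 111 km per degree of latitude; the table, the point names and their coordinates are hypothetical. SQLite from the Python standard library serves only as a test database, and the `?` placeholders are the parameter style of Python's `sqlite3` module.

```python
import math
import sqlite3

KM_PER_DEGREE_LAT = 111.0  # approximate length of one degree of latitude


def near_to_sql(lat: float, lon: float, radius_km: float) -> tuple[str, tuple[float, ...]]:
    """Translate "near (lat, lon)" into a plain SQL predicate plus parameters.

    Uses a bounding box, so corners reach up to about 1.4 x radius_km away;
    it is not meant for use close to the poles or the 180th meridian.
    """
    dlat = radius_km / KM_PER_DEGREE_LAT
    dlon = radius_km / (KM_PER_DEGREE_LAT * math.cos(math.radians(lat)))
    clause = "lat BETWEEN ? AND ? AND lon BETWEEN ? AND ?"
    return clause, (lat - dlat, lat + dlat, lon - dlon, lon + dlon)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE poi (name TEXT, lat REAL, lon REAL)")
conn.executemany("INSERT INTO poi VALUES (?, ?, ?)", [
    ("A", 49.632, 8.362),   # hypothetical points around (49.63, 8.36)
    ("B", 49.700, 8.360),
    ("C", 49.630, 8.500),
])
clause, params = near_to_sql(49.63, 8.36, radius_km=5)
rows = [r[0] for r in conn.execute(f"SELECT name FROM poi WHERE {clause} ORDER BY name", params)]
print(rows)  # ['A']
```

The bounding box is a deliberate approximation: it keeps the generated query portable, and a system that supports geographic functions could refine the candidates with an exact distance check.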

Object identification

Short-term recommendations

Build a registry of the object identification schemes currently used in the tourism industry.

Develop travel-related global geography identifiers and build transcoding capabilities.

Develop travel company related global identifiers.
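Transcoding between geography coding schemes can be sketched as a registry that assigns one global identifier per place and records that place's code in each scheme. The sketch below is a minimal illustration under that assumption. The `GEO:` identifiers are hypothetical. The two entries, IATA city codes and UN/LOCODEs for Vienna and Paris, are illustrative and are not an extract of any real registry.

```python
from typing import Dict, Optional, Tuple

# Hypothetical registry: one global identifier per place, with its code in each scheme.
REGISTRY: Dict[str, Dict[str, str]] = {
    "GEO:0001": {"iata_city": "VIE", "unlocode": "ATVIE", "name": "Vienna"},
    "GEO:0002": {"iata_city": "PAR", "unlocode": "FRPAR", "name": "Paris"},
}

# Reverse index (scheme, code) -> global identifier, built once from the registry.
REVERSE: Dict[Tuple[str, str], str] = {
    (scheme, code): gid
    for gid, codes in REGISTRY.items()
    for scheme, code in codes.items()
    if scheme != "name"
}


def transcode(code: str, from_scheme: str, to_scheme: str) -> Optional[str]:
    """Map a code from one scheme to another via the global identifier."""
    gid = REVERSE.get((from_scheme, code))
    if gid is None:
        return None  # unknown code: report it instead of guessing
    return REGISTRY[gid].get(to_scheme)


print(transcode("VIE", "iata_city", "unlocode"))  # ATVIE
print(transcode("XXX", "iata_city", "unlocode"))  # None
```

Routing every translation through a global identifier means that adding a new scheme only requires one extra column per place, not a new table for every pair of schemes.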

Long-term recommendations

Provide guidelines for travel service coding schemes.

Build a global repository with transcoding capacities.


1 Scope

The CEN/ISSS Workshop on eTourism aims to produce guidelines for reaching global interoperability, i.e. enabling seamless data interchange and execution of eBusiness processes in the tourism sector.

The Workshop’s main deliverable will be a CEN Workshop Agreement (CWA) on “Harmonization of data interchange in tourism”.

The CWA will cover the following topics from a pan-European interoperability perspective:

a. analysis and identification of the needs of B2B and B2C partners for harmonized data interchange;

b. analysis of the gaps in the design of current interoperability approaches;

c. description of the metadata and principles and requirements for data modelling;

d. analysis of business models and legal issues (IPRs, DRMs, personal data protection and privacy);

e. analysis of existing initiatives and approaches for flexible harmonization and global interoperability (including process interoperability);

f. recommendations concerning a general framework for eTourism-related information exchange;

g. best practice case.

The Workshop’s main focus is on interoperability issues in electronic data interchange. It will analyse and further build on the results of the already completed European projects Harmonise (Tourism Harmonisation Network), HarmoTEN and Satine (Semantic based interoperability infrastructure for integrating web service platform to peer to peer networks). The Workshop’s aim is to validate and disseminate their results to a wider audience than the project partners. The CEN Workshop will build on the work done by previous projects on metadata frameworks and ontologies.

It is outside the scope of the Workshop to do any direct standardization work on terminology. Instead, it will analyse existing initiatives or approaches to the interoperability problem and recommend steps for making maximal use of such approaches, as well as the research activities needed to further improve them.

The CEN Workshop will focus on data integration and discovery as well as on the seamless execution of eBusiness processes. Applying these results will support end-user satisfaction with, and consumption of, travel products and will increase data reliability, revenue generation and margin contribution, thereby motivating early adoption and roll-out to market.


2 Normative references

The following normative documents (European and International Standards) are referenced in this document. Other documents of interest are listed in the Bibliography.

ISO 639-1:2002 Codes for the representation of names of languages — Part 1: Alpha-2 code

ISO 639-2:1998 Codes for the representation of names of languages — Part 2: Alpha-3 code

ISO 639-3:2007 Codes for the representation of names of languages — Part 3: Alpha-3 code for comprehensive coverage of languages

ISO 3166-1:2006 Codes for the representation of names of countries and their subdivisions — Part 1: Country codes

ISO 3166-2:2007 Codes for the representation of names of countries and their subdivisions — Part 2: Country subdivision code

ISO 4217:2008 Codes for the representation of currencies and funds

ISO/IEC 7810:2003 Identification cards — Physical characteristics

ISO 9000:2005 Quality management systems — Fundamentals and vocabulary

ISO/IEC 9075 (several parts) Information technology — Database languages — SQL

ISO/IEC 9834:2005 (several parts) Information technology — Open Systems Interconnection — Procedures for the operation of OSI Registration Authorities

ISO/IEC 10646:2003 Information technology — Universal Multiple-Octet Coded Character Set (UCS)

ISO/IEC 13250:2003 Information technology — SGML applications — Topic maps

ISO 14001:2004 Environmental management systems — Requirements with guidance for use

ISO 16642:2003 Computer applications in terminology — Terminological markup framework

ISO 21127:2006 Information and documentation — A reference ontology for the interchange of cultural heritage information

ISO/IEC Guide 2:2004 Standardization and related activities — General vocabulary


3 Abbreviations, terms and definitions

3.1 Abbreviations

A

AI — artificial intelligence

API — application programming interfaces

ASCII — American Standard Code for Information Interchange

B

B2B — business to business

B2C — business to consumer

B2G — business to government

C

CEN — European Committee for Standardization

CRS — computer reservation system

CWA — CEN Workshop Agreement

CycL — ontology language used in AI and computer science

D

DB — database

DML — data manipulation language

DRM — digital rights management

E

EDI — electronic data interchange

ETSI — European Telecommunications Standards Institute

F

ftp — file transfer protocol

G

G2C — government to citizen

GDS — global distribution system

H

HEDNA — Hotel Electronic Distribution Network Association

HTML — hypertext markup language

http — hypertext transfer protocol

I

IATA — International Air Transport Association

ICAO — International Civil Aviation Organization

ICT — information and communications technology

IEC — International Electrotechnical Commission

IEEE — Institute of Electrical and Electronics Engineers

IFITT — International Federation for IT and Travel & Tourism

IFLA — International Federation of Library Associations and Institutions

IPR — intellectual property right

ISO — International Organization for Standardization

IST — Information Society Technologies

M

M2M — machine to machine

N

NKRL — narrative knowledge representation language

O

OML — outline markup language

OWL — web ontology language

P

P3P — Platform for Privacy Preferences

PDA — personal digital assistant

PMS — property management system

Q

QBE — query by example


R

RDF — resource description framework

RDFS — resource description framework schema

RMSIG — Reference Model Special Interest Group (under IFITT)

S

SCORM — Sharable Content Object Reference Model

SHOE — simple HTML ontology extensions

SME — small and medium enterprises

SOA — service-oriented architectures

SQL — structured query language

T

TCP/IP — Transmission Control Protocol / Internet Protocol

TGV — train à grande vitesse: high-speed train

U

UCS — universal character set (ISO/IEC 10646)

UNWTO — World Tourism Organization

URI — uniform resource identifier

W

W3C — World Wide Web Consortium

WAI — Web Accessibility Initiative

WSMO — web service modeling ontology

WWW — World Wide Web

X

XFT — exchange for travel

XHTML — extensible HTML

XML — extensible markup language

XSLT — extensible stylesheet language transformations

3.2 Terms and definitions

For the purposes of this document, the following terms and definitions apply.

computer reservation system (CRS) — computerized system used to store and retrieve information and conduct transactions

eTourism — eBusiness methods and techniques applied to the tourism domain

global distribution system (GDS) — CRS connecting and integrating the automated booking systems of different organizations

tour operator — person or company that organizes tours

thesaurus — controlled vocabulary containing synonyms and relationships, but not definitions (see 7.2.1.2)

taxonomy — subject-based classification using a controlled vocabulary in a hierarchy (see 6.2 and 7.2.1.1)

ontology — (1) study of the nature of being, existence or reality; (2) structured information about reality (see 6.3)

folksonomy — taxonomy developed as a broad collaborative effort (see 7.2.1.3)


4 Methodology and thematic overview

Tourism is in the vanguard of ICT adoption and eBusiness in the area of eMarketing and online sales (B2C). Yet, in a ranking of various sectors, the tourism industry only achieves a mid-level score in the overall use of ICT and eBusiness. It is still lagging behind, especially regarding the deployment of ICT infrastructure and the adoption of e-integrated business processes [eBusiness W@tch Report 2006/2007, p 167].

Tourism is an important and growing sector of the European economy, with a large presence of SMEs. ICT is an enabler to strengthen efficiency, reduce costs and improve the competitiveness of the industry. Tourism is expected to contribute 8.4 % of total employment and 9.9 % of GDP worldwide [World Travel and Tourism Council, 2008, p 4].

For these reasons, it is important that companies and associations in the tourism sector understand the benefits they can reap from eBusiness, enhance their ICT infrastructure, and adopt eBusiness processes.

Electronic data interchange and the interoperability between systems of different parties are critical for eBusiness processes in all industry sectors. This CWA focuses on approaches for reaching global interoperability, i.e. seamless data interchange and execution of business processes in the tourism sector.

In eBusiness implementations, the tourism sector has some specific characteristics. Data quality and reliability are critical issues (e.g. up-to-date opening hours for a museum, reliable online booking). Other critical issues are territorial definition and coordination between regional or local groups and national sites. Both commercial information (B2B, B2C, B2G) and “touristic information” (information for the end user, G2C) are concerned. All parties involved provide information at different levels (e.g. government: travel warnings; B2C: the opening hours mentioned above; B2B: distribution prices and their meanings). These specificities lead to a high degree of heterogeneity in tourism. Tourism market structures are complex and highly fragmented. Information interchange on the level of processes and data structures is not harmonized, and the electronic execution of business processes on a global level is still burdened by heterogeneous interfaces and data structures.

4.1 Thematic circle

Data interchange has two key components: the electronic data itself and the exchange of data between two or more tasks in larger process chains; see figure 4-1.

Figure 4-1

This hinges on the ability of all tasks to understand the data they are supposed to consume – i.e. data interoperability – and of processes to be able to meaningfully cooperate – process interoperability. Our report thus circles around the two key concepts of data and processes; see figure 4-2.

Figure 4-2

The circle captures the relationship between data and process interoperability with key enablers that need deeper analysis. In particular, we have identified the following topics for further analysis:

Semantics

Data transformation

Process handling

Metasearch

Object identification

The topics are placed in the larger context of four cross-cutting concerns that permeate all of them. Tourism transactions on the one hand regularly transcend national and cultural boundaries and frequently involve both very small and very large players. On the other hand, many of the parameters – rating systems for accommodation, opening hours of sites, classification of beaches – are nationally or even regionally regulated or reflect cultural preferences. All transactions must naturally follow pertinent national or regional laws and regulations.

Processes will in particular be implemented in line with the process owner’s overall business model. The data structures will similarly often be dictated by the owner’s value proposition. Furthermore, both data and processes will at least to a degree reflect the technology – software, hardware, overall connectivity etc. – on which the system in question operates.

4.2 Topics

The following subsections briefly present each of the selected topics, give a bird’s-eye view, and motivate the rationale for their choice. The remainder of the report will then examine the issues methodically and in more detail.

4.2.1 Semantics

The meaning and structure of data is at the heart of data interoperability – and, given the plethora of pertinent formats, it is unfortunately a complex problem. Differences on the syntactic level – say, XML messages versus comma-separated files or EDI-type communications – can already affect how much semantics the data carries in itself. Formal or informal standards can externally assign meaning to an otherwise meaningless data set (say, to an otherwise arbitrary sequence of fields in the rows of a csv file), or make explicit the semantics of XML structures that are already partially self-explanatory to humans.
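The following sketch illustrates how an external agreement can assign meaning to otherwise arbitrary csv fields. It assumes a hypothetical field specification, covering column positions, names and documented meanings, that sender and receiver have agreed on. Neither the layout nor the field names come from an existing standard.

```python
import csv
import io
from typing import Dict, List, Tuple

# Hypothetical agreed layout: position in the row -> (field name, documented meaning).
FIELD_SPEC: List[Tuple[str, str]] = [
    ("hotel_id", "provider's identifier of the hotel"),
    ("room_size", "room size in square metres"),
    ("price", "price per night, in the currency of the next field"),
    ("currency", "ISO 4217 alphabetic currency code"),
]


def parse_with_spec(text: str) -> List[Dict[str, str]]:
    """Attach the agreed field names to each positional value; the rows carry no labels."""
    records = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) != len(FIELD_SPEC):
            raise ValueError(f"expected {len(FIELD_SPEC)} fields, got {len(row)}")
        records.append({name: value for (name, _meaning), value in zip(FIELD_SPEC, row)})
    return records


raw = "H1,18,95.00,EUR\nH2,14,60.00,PLN\n"
print(parse_with_spec(raw)[1]["currency"])  # PLN
```

Without `FIELD_SPEC`, the row `H2,14,60.00,PLN` is just four strings. All of its semantics live in the external agreement.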

Taxonomies can help to unambiguously specify possible value sets for the data, ideally combined with specific definitions of the individual options and their relationships to one another. Ontologies can then reference and use these value sets in properties of classes that go a long way further towards specifying the exact semantics of data.
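A taxonomy that fixes the permitted value set, with a definition for each option and a link to its broader term, could be represented as in the sketch below. The categories and definitions are hypothetical examples, not an agreed eTourism vocabulary.

```python
from typing import Dict, List, NamedTuple, Optional


class Term(NamedTuple):
    parent: Optional[str]  # broader term in the hierarchy (None for the root)
    definition: str


# Hypothetical fragment of an accommodation taxonomy.
TAXONOMY: Dict[str, Term] = {
    "accommodation": Term(None, "Any establishment offering overnight stays."),
    "hotel": Term("accommodation", "Establishment with reception and daily room service."),
    "camping": Term("accommodation", "Site offering pitches for tents or caravans."),
    "boutique_hotel": Term("hotel", "Small hotel with individual design."),
}


def validate(value: str) -> str:
    """Reject any value that is not part of the controlled vocabulary."""
    if value not in TAXONOMY:
        raise ValueError(f"{value!r} is not a term of the taxonomy")
    return value


def broader_terms(value: str) -> List[str]:
    """Walk up the hierarchy, e.g. so that a search for 'hotel' also matches a boutique hotel."""
    chain, current = [], TAXONOMY[validate(value)].parent
    while current is not None:
        chain.append(current)
        current = TAXONOMY[current].parent
    return chain


print(broader_terms("boutique_hotel"))  # ['hotel', 'accommodation']
```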

In conjunction with data transformation techniques, agreed strategies for expressing semantics in eTourism applications are crucial to the flexible integration of heterogeneous data structures from a wide range of data sources. Agreed semantics are thus also a central requirement for building flexible, cross-organizational process chains.

4.2.2 Data transformation

The co-existence of many data formats already implies the need to transform data during data exchange. This mapping can affect data structures on different levels:

Ontologies and taxonomies

Structured data

Unstructured data

Together with well-defined semantics, data transformation is an essential tool to integrate data sources and build cross-organizational processes.

Structured data: Structured data can be expressed in a number of syntactic formats (XML, csv, EDI etc.), but even within one “syntactic family” the concrete data structures regularly conflict. For XML, standardized technologies such as XSLT are used to describe and execute the mapping between data sets. However, this usually involves loss of information and only works with limited precision, depending on the similarity of the underlying data models.
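The XSLT-style mapping described above can be imitated for illustration with Python's standard `xml.etree.ElementTree`, which stands in for an XSLT processor here. Both message layouts are hypothetical. The target format has no element for the board basis, so that information is lost in the transformation, which is the kind of loss mentioned above.

```python
import xml.etree.ElementTree as ET

# Hypothetical source format used by partner A.
SOURCE = """<offer>
  <hotel code="H1">Hotel Alpha</hotel>
  <price currency="EUR">95.00</price>
  <board>half board</board>
</offer>"""


def map_a_to_b(xml_text: str) -> str:
    """Map partner A's offer into partner B's (hypothetical) flatter layout."""
    src = ET.fromstring(xml_text)
    dst = ET.Element("Product")
    ET.SubElement(dst, "Id").text = src.find("hotel").get("code")
    ET.SubElement(dst, "Name").text = src.find("hotel").text
    price = src.find("price")
    # B expresses amount and currency in one string field.
    ET.SubElement(dst, "Rate").text = f"{price.text} {price.get('currency')}"
    # B has no element for the board basis, so <board> is dropped (information loss).
    return ET.tostring(dst, encoding="unicode")


print(map_a_to_b(SOURCE))
# <Product><Id>H1</Id><Name>Hotel Alpha</Name><Rate>95.00 EUR</Rate></Product>
```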

Unstructured data: Much of the eTourism-related data is only available in unstructured formats such as web sites. Using this data in automatic transactions is difficult at best, as the semantics of individual data sets are quite unclear. Two strategies can help to make their meaning explicit: explicit manual semantic annotation and automatic information extraction.

Semantic annotation: Key information on a web page is explicitly added to the site as metadata in a machine-readable format (e.g. a serialization format of RDF).

Automatic information extraction: The unstructured information is automatically structured according to some predefined templates. The information is then available for reuse.
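Automatic information extraction with a predefined template can be sketched with a regular expression. The template below, a pattern for opening hours in English web text, and the sample sentence are hypothetical. Real extractors are usually far more robust.

```python
import re
from typing import Dict, Optional

# Hypothetical template: "open <day> to <day> from HH:MM to HH:MM".
OPENING_HOURS = re.compile(
    r"open\s+(?P<from_day>[A-Z][a-z]+)\s+to\s+(?P<to_day>[A-Z][a-z]+)\s+"
    r"from\s+(?P<opens>\d{1,2}:\d{2})\s+to\s+(?P<closes>\d{1,2}:\d{2})",
    re.IGNORECASE,
)


def extract_opening_hours(text: str) -> Optional[Dict[str, str]]:
    """Fill the template slots from free text; return None if the template does not match."""
    match = OPENING_HOURS.search(text)
    return match.groupdict() if match else None


page = "The museum is open Tuesday to Sunday from 10:00 to 18:00, closed on holidays."
print(extract_opening_hours(page))
# {'from_day': 'Tuesday', 'to_day': 'Sunday', 'opens': '10:00', 'closes': '18:00'}
```

The filled slots form a structured record that can be reused in the same way as data that was structured from the outset.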

Inter-ontology mapping: Often a number of independent ontologies compete in a given domain, even more so in the case of overlapping domains. Furthermore, different though related standards such as RDF-S, OWL and Topic Maps [ISO/IEC 13250] are in common use to express ontologies syntactically or to describe the constraints applied to them. This multitude of approaches implies the need for reference ontologies, such as the Harmonise ontology, to exchange semantic information across individual ontologies.

An even larger number of formats are currently employed to express taxonomies, many of which can be mapped to the reference system defined in the Terminological Markup Framework [ISO 16642].
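Mediating through a reference ontology means that each local ontology only needs mappings to and from the shared reference, rather than one mapping to every other ontology. For n ontologies, that is 2n directed mappings instead of n(n-1). The sketch below assumes simple one-to-one concept correspondences. The concept names are hypothetical and are not taken from the Harmonise ontology.

```python
from typing import Dict, Optional

# Hypothetical local vocabularies mapped to concepts of a shared reference ontology.
TO_REFERENCE: Dict[str, Dict[str, str]] = {
    "ontology_a": {"Hotel": "ref:Hotel", "Pension": "ref:GuestHouse"},
    "ontology_b": {"HotelEstablishment": "ref:Hotel", "B&B": "ref:GuestHouse"},
}

# Inverse mappings (reference concept -> local concept); assumes one-to-one correspondences.
FROM_REFERENCE: Dict[str, Dict[str, str]] = {
    name: {ref: local for local, ref in mapping.items()}
    for name, mapping in TO_REFERENCE.items()
}


def translate(concept: str, source: str, target: str) -> Optional[str]:
    """Translate a concept via the reference ontology (source -> reference -> target)."""
    ref = TO_REFERENCE[source].get(concept)
    return FROM_REFERENCE[target].get(ref) if ref is not None else None


def mapping_count(n: int) -> Dict[str, int]:
    """Directed mappings needed pairwise versus via a reference ontology."""
    return {"pairwise": n * (n - 1), "via_reference": 2 * n}


print(translate("Pension", "ontology_a", "ontology_b"))  # B&B
print(mapping_count(10))  # {'pairwise': 90, 'via_reference': 20}
```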


4.2.3 Process handling

The eTourism sector has evolved from isolated online presences to online transaction platforms. The WWW has significantly boosted the use of ICT in the tourism industry and empowered travellers and tourists to access more information from a wide variety of data sources. More and more consumers want to arrange their stay on their own and combine different products into a unique bundle instead of buying pre-packaged tours. Dynamic packaging, for example, has become one of the most discussed buzzwords at industry events, but without process interoperability it will always remain a local or otherwise limited phenomenon.

Consumers are increasingly used to making online transactions, and this is leading to a crowding-out process: business actors have to follow demand to keep or expand their market share. Traditional distribution channels are vanishing, and more flexible and dynamic networks are emerging. A trend towards outsourcing and focusing on core competences can be observed, leading to a more consumer-centric approach and allowing highly individualized and ad-hoc product design. This challenge brings with it the need to orchestrate business processes flexibly and across organizations.

4.2.4 Metasearch

One of the prerequisites for process handling is the ability to identify the relevant players for potential joint processes and to find information across those players. Registries, especially federated registries, will play a leading role in describing potential partners and their services. They will thus help to bring them together.

Metasearch builds on shared or mapped semantics and data transformation to enable searches across the individual search components of heterogeneous instances (platforms, websites, databases) and to aggregate the results in a unified list. From a user’s perspective, metasearch thus offers a one-stop entry point to a specific type of information (e.g. hotels or flights).

At present, search components differ in their query syntax, which makes it difficult to scale metasearches and to spontaneously integrate new data sources. For the actual technical realization of metasearches, agreed query strategies and query syntaxes are therefore desirable and are being worked on.
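The metasearch pattern described above involves three steps: fan one query out to heterogeneous sources, normalize their answers into a common structure, and merge them into one list. It can be sketched as follows. The two source functions, their response formats and the use of a shared `hotel_id` as de-duplication key are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Offer:
    hotel_id: str    # shared identifier across sources (an assumption, see 4.2.5)
    source: str
    price_eur: float


def source_a(city: str) -> List[dict]:
    # Hypothetical platform returning prices in euro cents.
    return [{"id": "H1", "cents": 9500}, {"id": "H2", "cents": 12000}]


def source_b(city: str) -> List[tuple]:
    # Hypothetical website returning (identifier, price in euros) tuples.
    return [("H1", 89.0), ("H3", 70.0)]


# Adapters normalize each source's native format into Offer objects.
ADAPTERS: Dict[str, Callable[[str], Iterable[Offer]]] = {
    "A": lambda city: (Offer(r["id"], "A", r["cents"] / 100) for r in source_a(city)),
    "B": lambda city: (Offer(i, "B", p) for i, p in source_b(city)),
}


def metasearch(city: str) -> List[Offer]:
    """Query every source, keep the cheapest offer per hotel, return one sorted list."""
    best: Dict[str, Offer] = {}
    for adapter in ADAPTERS.values():
        for offer in adapter(city):
            current = best.get(offer.hotel_id)
            if current is None or offer.price_eur < current.price_eur:
                best[offer.hotel_id] = offer
    return sorted(best.values(), key=lambda o: o.price_eur)


for o in metasearch("Vienna"):
    print(o.hotel_id, o.source, o.price_eur)
# H3 B 70.0
# H1 B 89.0
# H2 A 120.0
```

The merge step only works because both sources are assumed to use the same hotel identifier. Without such identifiers, duplicates from different sources cannot be recognized reliably.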

4.2.5 Object identification

Electronic transactions often hinge on being able to uniquely identify the objects on which they operate. A flight booking service needs to have a clear idea of how the flight booked actually maps to the physical event of a flight operated between a point of departure A and a destination B. The mapping does not have to be one-to-one – in our example, a flight may have a number of identifiers through code-sharing – but the object described must be distinct.

While object identification does work for flights, there are many other types of objects in eTourism that do not have a unique identifier. One of the most important cases in point is accommodation. There is at present no universally accepted scheme to identify, e.g., a given hotel that should be booked.
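The flight example can be modelled as several marketing designators that resolve to one distinct operating flight. The sketch below uses made-up carriers (XX, YY, ZZ) and flight numbers. Only the idea is taken from the text: a code-share flight carries several identifiers, while the operated flight remains one object.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Tuple


@dataclass(frozen=True)
class OperatingFlight:
    """The distinct physical event: who flies, from where, to where, on which day."""
    carrier: str
    number: str
    origin: str
    destination: str
    day: date


# Hypothetical code-share table: (marketing designator, date) -> operating flight.
op = OperatingFlight("XX", "100", "VIE", "CDG", date(2009, 6, 3))
CODESHARES: Dict[Tuple[str, date], OperatingFlight] = {
    ("XX100", op.day): op,   # operating carrier's own designator
    ("YY2100", op.day): op,  # partner's marketing designator
    ("ZZ700", op.day): op,   # another partner
}


def resolve(designator: str, day: date) -> OperatingFlight:
    """Map any identifier under which the flight was sold to the one flight it denotes."""
    return CODESHARES[(designator, day)]


print(resolve("YY2100", date(2009, 6, 3)) == resolve("ZZ700", date(2009, 6, 3)))  # True
```

No comparable table exists for hotels, because there is no agreed identifier for the distinct object that the table entries could point to.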

4.3 Cross-cutting concerns / Prerequisites

By their very nature, cross-cutting concerns permeate the requirements for and implementation of all topics. We will here outline major characteristics of these concerns, which will be referenced and (if needed) expanded upon in the relevant chapters of the report.

4.3.1 Legal aspects

eTourism transactions do not happen in a vacuum. They regularly transcend national and regional boundaries and are governed by the legal system(s) of the country or countries concerned. Such transactions are almost always affected by contract law, such as the laws that govern the contract between the end user and the tour operator or service provider(s), between tour operators and their destinations, or the obligations of travel agencies towards their customers. Closely related are the laws regulating the redress that one of the contractual partners can seek in the case of a perceived breach of obligations.

Laws, however, influence many other areas of eTourism transactions. The following list is only indicative and certainly not a complete overview of pertinent legislation:

Reporting obligations on security and crime prevention, especially in the caseof air transport.

Classification schemes, e.g. in the form of legally defined eTourism terms (in some countries).

Customer protection laws setting, amongst others, minimum standards for the data provided to end users.

Reporting obligations on statistics.

Health and anti-discrimination regulations that can impact eTourism data and processes.

Media publication regulations when sharing or reusing media on the Internet in general.

Some countries and regions, such as Oberösterreich (Upper Austria), even have dedicated laws on tourism (see http://www.oberoesterreich-tourismus.at/alias/lto/recht/410624/tourismusrecht.html).

These laws are only partially harmonized across Europe; [Directive 90/314/EEC], a directive setting minimum pan-European standards of customer protection for package tours, is more the exception than the rule. Furthermore, the legal systems of countries across Europe do not necessarily cover the same areas. For example, hotel classification is mandated by law in some countries such as Italy and Greece and does not even exist in others such as Finland.

Dynamic packaging (example): The example of dynamic packaging may illustrate some aspects of the impact of the legal system on tourism. For pre-packaged tours, the legal situation is quite clear from an end user’s point of view. Such tours are always regulated by the national laws in question. From the customer’s perspective, the tour operator is the only contractual partner [Freyer, 2006, p 234] [Directive 90/314/EEC] and is responsible for providing all the services that were promised. It is also solely responsible for any redress that may result from unsatisfactory services.

The situation is much muddier for extras, such as car rental at the destination, for which a travel agency acts only as an intermediary. Dynamic packaging poses even bigger problems in this respect. An intermediary – often a specialized travel agent – combines pre-assembled packages based on user preferences. The user does not administer the different items in the package himself, but gets offers for packages which are dynamically assembled based on his preferences. However, the legal and contractual consequences of such dynamic bundles are not yet clear. For an end user, such a bundle of sub-packages can also imply a set of separate contracts which do not by themselves necessarily fall under the definition of “package” in [Directive 90/314/EEC]. Consequently, the contractual situation and the legal mechanisms for redress can be considerably more complex for dynamic packaging. An unnamed provider of software components for eTourism transactions identified this as the single biggest obstacle to the uptake of dynamic packaging.

4.3.2 Multiculturalism

Related to legal aspects are the multicultural facets of many eTourism transactions, which span cultures and frequently involve both very small and very large players, thus also mixing organizational cultures. Culture here is a much wider concept than high culture and covers “the set of distinctive spiritual, material, intellectual and emotional features of society or a social group, and [...] encompasses, in addition to art and literature, lifestyles, ways of living together, value systems, traditions and beliefs” [UNESCO, 2002]. Europe in particular is characterized by multiculturalism, right down to its official motto, “United in diversity”.

Many cultural preconditions have influenced local description systems such as rating systems for accommodation or classification of beaches, which in some countries are nationally or even regionally regulated. Others, such as the usual opening hours of sites or food offerings, usually follow local customs without being subject to laws.

Multilingualism: Languages are an integral and often defining part of cultures, and as such multiculturalism includes multilingualism, the coexistence of many languages. Until around the turn of the millennium, the treatment of multilingual data in computer systems posed major problems. However, the widespread adoption of the Universal Character Set (UCS) [ISO/IEC 10646], also known as Unicode, and its companion standards has changed the game. The UCS is supported in virtually all current operating systems and many application programs, including all major browsers and email clients. XML is squarely based on the UCS. Thus the internal representation, exchange and display of multilingual data are now quite unproblematic.

That said, some of the Global Distribution Systems (GDSs) that are at the core of many eTourism transactions stem from the 1950s and 1960s, and even the youngest of the “big four” GDSs, Amadeus, was written in the 1970s and 1980s. They thus long predate the UCS and have at best sketchy support for multilingual data. Many to this day operate on subsets of ASCII. This can obviously create considerable issues, notably for the handling of personal names and the names of organizations. It is outside the scope of this report to elucidate these issues in detail, though such an overview would be highly beneficial.
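The ASCII problem can be made concrete. One common workaround is to strip diacritics via Unicode compatibility decomposition (NFKD). This is lossy and does not even cover every Latin letter, because some, such as Ł and Ø, have no decomposition at all. The sketch uses Python's standard `unicodedata` module, and the sample names are illustrative. It is not a description of how any particular GDS handles names.

```python
import unicodedata


def fold_to_ascii(name: str) -> str:
    """Decompose (NFKD), drop combining marks, replace anything still non-ASCII by '?'."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    return without_marks.encode("ascii", errors="replace").decode("ascii")


for name in ["Müller", "Muller", "Łódź", "Ørsted"]:
    print(f"{name} -> {fold_to_ascii(name)}")
# Müller -> Muller
# Muller -> Muller
# Łódź -> ?odz
# Ørsted -> ?rsted
```

Two different surnames collapse into one, and letters without a decomposition are destroyed. Both effects are critical when names must match travel documents.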

Taxonomies and terminology are another important area in which data is necessarily language-dependent. The exact definitions of categories such as “double room” (with or without children), “luxury hotel” etc. will reflect the understanding in a given language and culture.

Accommodation ratings (example): Accommodation ratings are a concrete example of classification systems that reflect legal, cultural and linguistic factors, not to forget (in some countries) personal preferences. As we have seen, some countries require hotels to be classified, whereas others do not even have a national classification system. And where such systems do exist, the quality criteria differ widely from country to country, as an overview such as http://www.hotelstars.org/ shows. For example, a three-star hotel in Germany (http://www.hotelsterne.de/uk/system_kriterien.php) is guaranteed to have rooms of 14 m² or bigger for singles and 18 m² or bigger for doubles, bilingual employees and a reception that is open 12 hours a day, to single out only a few of the criteria. The rooms in a three-star hotel in Poland (http://hotelarze.pl/en/regulations/), on the other hand, need only measure 10 m² and 14 m² respectively, and the hotel must offer room service for at least 12 hours a day, but its staff do not need to speak foreign languages. Many other criteria are not even comparable, as the overall schemes are quite different.

In countries such as the USA, the semi-official AAA classification is complemented by many classification systems that are specific to individual travel websites such as Expedia or Travelocity and at times even factor customer feedback into the figures. Depending on the (often non-transparent) weighting of criteria, these sites arrive at quite different ratings, which in turn may deviate significantly from customer ratings [Grossman, 2004]. With the increasing dominance of international sites, these additional ratings are going to start competing with the official or semi-official national European rating systems.

In view of this multitude of taxonomies, which may or may not coincide with the customer’s own cultural and personal preferences, ratings will have to be based on specific properties of accommodation, and, for that matter, of general service, rather than on general classifications alone. Searches for “hotels with WiFi, restaurant and rooms over 20 m²” are likely to produce more acceptable results for users from many cultural backgrounds than searches on “3-star” alone.
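The property-based search suggested above might look like the following sketch. The filter mirrors the example query in the text (WiFi, restaurant, rooms over 20 m²). The hotel records and attribute names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List


@dataclass(frozen=True)
class Hotel:
    name: str
    country: str
    stars: int                  # national rating: kept for display, not comparable across countries
    room_size_m2: float
    facilities: FrozenSet[str] = field(default_factory=frozenset)


def search(hotels: List[Hotel], required: FrozenSet[str], min_room_m2: float) -> List[str]:
    """Filter on concrete properties instead of on a star category."""
    return [
        h.name for h in hotels
        if required <= h.facilities and h.room_size_m2 > min_room_m2
    ]


hotels = [
    Hotel("Haus Berg", "DE", 3, 22.0, frozenset({"wifi", "restaurant"})),
    Hotel("Hotel Wisła", "PL", 3, 14.0, frozenset({"wifi", "restaurant", "room_service"})),
    Hotel("Villa Mare", "IT", 4, 25.0, frozenset({"restaurant"})),
]
print(search(hotels, frozenset({"wifi", "restaurant"}), 20.0))  # ['Haus Berg']
```

The two three-star hotels give different results for the same query, which the star category alone would not reveal.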

4.3.3 Business models

Each player in the tourism industry operates on an implicit or explicit business model. The value proposition can be the traditional offer of a service, e.g. accommodation; it can be the “convenience” and consulting proposition of a travel agent, or the “integrated packaging” approach of a tour operator, to name only a few. Much of the thrust of customer-driven eTourism transactions stems, in fact, from the desire to disintermediate the industry or, at least, to offer a new type of intermediary that operates automatically, can thus compete primarily on price and, in part, is closely related to today’s GDSs. The GDSs themselves operate on two related but distinct business models, namely as a service company for major service providers such as airlines and as an integration platform for intermediaries.

All eTourism activities must be seen in the context of the relevant business models. They dictate the initial willingness to interchange data and to engage in cross-organizational processes. In many respects, this willingness is a premise of this report.

4.3.4 Technology

The advent of the World Wide Web marks a watershed for the tourism industry as well. As we have seen, GDSs have been operational since the early 1960s, but they depended on highly proprietary distribution networks to allow travel agents to interact with them. The advent of videotext systems such as BTX in Germany and Minitel in France and similar technologies in the 1980s somewhat opened and standardized these channels, but by and large the communication channels remained accessible only to professional intermediaries.

The success of the WWW has largely standardized the communication channels between providers on standard Internet protocols; not necessarily http, though, as many larger data sets are still transferred using ftp or related protocols. The underlying technology has in many cases changed much less, with today’s GDSs largely operating on the same transactional stacks as before; some, though by no means all, of their details have been abstracted away through the common protocols.

This standardization on common network protocols has allowed for the rise of collaboration standards such as SOAP-based Web Services, XML-based data formats, semantic standards and, last but not least, the http standard itself, which is again central to today’s emphasis on RESTful web services. This report concentrates on the interoperability layer between implementations.


5 Case study

Mechanisms and solutions for electronic data exchange in the tourism industry were first developed a long time ago by airline companies to allow them to exchange data about flights and bookings. Different standards emerged from those initial operational exchanges, taking into account the limitations of the means of communication of that time.

Over the years, the need to access inventory, prices, booking files, customer data and sales or descriptive information has grown rapidly. This happened first through the development of the GDSs (Sabre in 1960, Galileo in 1971) and the main CRSs (Pegasus, Wizcom, etc.), and more recently through the web, used for both B2B and B2C applications.

The thematic circle introduced earlier (4.1) will be illustrated through the following case study. The case study is based on a consumer (end user or travel-related professional) who wants to book a trip or gather travel-related information using information and communication technologies.

Figure 5-1

The case study is first detailed in terms of different trip phases and correspondinginformation needs and processes to be used by the consumer:

before the trip, to end up with a booked trip;
before the trip, to increase his knowledge about the trip, update the trip itself, etc.;
during the trip, to amend the trip or input comments, media, etc.;
after the trip, to testify, complain, etc.

Platforms, technologies, types of information and data sources are reviewed within the case study. Some drawbacks and limitations, gaps and future needs will also be identified and associated with the elements of our thematic circle; these will then be detailed later in the document.


5.1 The processes

5.1.1 The actors

We consider the case study to be as general as possible and to include any type of tourism actor and business process. In the context of the case study, we differentiate the following types of tourism actors:

end consumer (traveller, customer booking for somebody else, etc.);
travel-related professional (incoming agent, tour operator agent, travel consultant, etc.).

Figure 5-2

5.1.2 Consumer process

Buying a trip by taking advantage of the web can be seen as a four-step global process:

Discovering:
o select possible destinations and types of trips based on personal or family interests (a particular activity or hobby, a destination, etc.);
o select according to a season (winter sports, sun in winter, etc.);
o investigate prices and opportunities, accommodation, services, events, etc.;
o explore recommendations and ratings from other travellers;
o etc.

Shopping, to match reality with expectations:
o compare prices;
o compare the content of offers (similar offers, different types of trips, etc.);
o investigate testimonies.

Constituting the trip itself by:
o validating price and availability for a trip from a single vendor, or
o amalgamating components from different vendors, such as a hotel vendor, a pre-packaged tour vendor, an airline company, etc. (in a single booking or in multiple bookings);
o requesting bids, quotes or alerts from different vendors.

Finalizing the buying process (confirmed or option booking(s)):
o finally buy from a single vendor, or
o buy the amalgamated components (stored in a single booking or in multiple bookings);
o add links to reference data (to keep track of weather, health or country data, activities, testimonies, etc.);
o pay (deposit or total).


A finalized booking does not end the process. Some consumers continue browsing the web to:

complement: search for additional information, testimonies and activities to perform, and exchange with people who have travelled in the same club or region;

bargain: find better opportunities and counter-proposals or complements;

manage: simply update their booking(s) to take new information into account (a change of plan, more people joining, new activities to cram into the agenda, a special meal request, printing e-tickets or itineraries, consulting with specialists, paying the amounts due, etc.).

Finally, during or after the trip comes the part that is now booming with Web 2.0 sites. The consumer could:

testify: add his own piece of information to the web, using forums, testimony sites and polls;

publish newly generated content, such as media or text;

enrich his profile(s) on the different sites in order to keep in touch with opportunities related to his interests;

follow his subsequent trips, in case he actually prepared more than one trip or acquired components that are valid for several trips;

share common interests in order to organize group events;

possibly file and follow up a complaint.

This is illustrated by the schema in figure 5-3.


Figure 5-3

5.1.3 Travel-related professional process

Not all end consumers perform the whole process online; some require assistance for part or all of the life cycle of a booking. In that case, part of the above-mentioned activities apply in a B2B framework, with more or less the same features. Additionally, travel professionals also use specific expert processes not necessarily available to end consumers, such as:

air ticketing in the case of negotiated fares;
building complex itineraries including items that cannot be found or bought online;
finding availability or better prices where automated systems would not;
bringing added-value services or expertise that correspond to the differentiation of the distributor (specialized destination or activity, luxury trips, etc.).

Other professional processes also revolve around the major task of publishing data for professional and end consumer use, such as:

publishing fares;
providing information on products and destinations;
referencing other sources of information;
selecting and ranking data (vendors, destinations, etc.);
etc.


Those additional processes either rely on the same systems, platforms and communication means as the ones available to end consumers, but with advanced features; rely on specific systems not available to end consumers; or end up being manual.

5.2 The information and communication technologies

The present case study assumes that the consumer uses information and communication technologies (over the web or a private network: a public web site, a restricted B2B site, a dedicated rich application, etc.) to consult travel information. The case study considers that the front system used by the end consumer takes advantage of multilevel sources, possibly even of a multilevel dynamic network of travel-related services. Although sources may publish heterogeneous structured and non-structured data, the front system would still provide its end users with homogeneous access to all the data they publish. The distributor should have the responsibility for, and the choice of, the final formatting and the proposed processes.

This is of course only possible to the extent allowed by the flexibility of the exchanges, the formats made available by the sources and intermediaries, and the extensive use of the semantic web and other mechanisms allowing automated exchanges and recognition of meaning and data. These aspects are detailed in the present document.

The user may also consult different sites in parallel, thereby initiating different processes. This behaviour is considered outside the scope of the present case study.

5.2.1 Multiple levels of data sources

Figure 5-4

The owner of the information and communication technology would usually own one or several data sources and make direct use of them. That would be the case, for instance, for a hotel group and its hotel data (editorial text, prices, availability, comments, etc.).


The front system may also connect to other external sources to aggregate additional information. A hotel chain may not own the inventory of each hotel and could query the individual hotels’ PMSs or the hotel groups’ CRSs to validate availability.

Each of those additional external sources could in turn either own the data or itself aggregate content from other sources, thus creating a chain of sources involved in a single request from the consumer. That would typically be the case for an online site like Opodo dynamically requesting airline availability and fares from a GDS (Amadeus in our example), which itself launches requests to different airlines for the requested city pair.

The added value of using layers of sources would reside in their capacity to:

concentrate coherent data from different sources (as is the case of GDSs for airlines, or of comparators);

enrich data from a source, either by directly adding data or by concatenating data from other external sources (like web sites proposing different types of trips).
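The chain of sources described above (a front system querying an intermediate, which itself queries the data owners) can be sketched as a recursive query. The following Python sketch is illustrative only: the `DataSource` and `Offer` classes, the source names and the prices are hypothetical, and a real implementation would add asynchronous calls, timeouts, caching and format transformation at each layer.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Offer:
    """A priced offer as seen by the front system (hypothetical model)."""
    product_id: str
    price: float
    chain: List[str] = field(default_factory=list)  # sources the offer passed through


class DataSource:
    """A source that may own data and/or aggregate other (upstream) sources."""

    def __init__(self, name: str,
                 own_offers: Optional[Dict[str, float]] = None,
                 upstream: Optional[List["DataSource"]] = None) -> None:
        self.name = name
        self.own_offers = own_offers or {}
        self.upstream = upstream or []

    def query(self, product_id: str) -> List[Offer]:
        offers: List[Offer] = []
        # Data owned directly by this source.
        if product_id in self.own_offers:
            offers.append(Offer(product_id, self.own_offers[product_id], [self.name]))
        # Concentrate: relay the same request to every upstream source and
        # record this layer at the front of each returned offer's chain.
        for source in self.upstream:
            for offer in source.query(product_id):
                offer.chain.insert(0, self.name)
                offers.append(offer)
        return offers


# Hypothetical three-level chain: front site -> distribution system -> carriers.
carrier_a = DataSource("Carrier A", {"PAR-NYC": 420.0})
carrier_b = DataSource("Carrier B", {"PAR-NYC": 399.0})
distribution = DataSource("Distribution system", upstream=[carrier_a, carrier_b])
front = DataSource("Online agency", upstream=[distribution])

offers = front.query("PAR-NYC")
cheapest = min(offers, key=lambda o: o.price)
print(cheapest.price, " -> ".join(cheapest.chain))
```

Keeping the `chain` on each offer is one possible way to support the traceability of individual providers mentioned under the legal aspects; it is a design choice of this sketch, not part of any standard.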

Online agencies such as Expedia or Opodo also have back-office systems to enter and maintain editorial data, price lists and stock. That would be their own data source. They typically do not own destination, weather, policy or health-related data but use external sources such as Lonely Planet or government web sites. These in-house distributor systems also usually connect to GDSs (Global Distribution Systems such as Amadeus or Galileo) to request airline fares and availability. This is the situation where an intermediate data source browses other external data sources for information.

This need for a distributed architecture, composed of distinct systems around the world owned by different companies with various strategies and technologies, leads to a number of constraints and requirements identified as cross-cutting aspects in the previous introduction:

Technical aspects come first to mind, with the need to ensure compatibility of the different systems, increase the reliability of the individual elements, measure the impact on architectures and scale accordingly. Performance of the different systems and of the overall chain is key and leads to additional complexity (such as caching, uniqueness of data, etc.).

Business models must also be taken into account, because making money is central to the smooth working of the complete system. There must therefore be the capacity:
o to use other systems against remuneration (fixed price, price per transaction, percentage of a booking, etc.);
o to add mark-ups along the chain and still offer a competitive price;
o to access net prices directly at intermediate levels in the chain;
o etc.

Legal aspects are equally important, with the necessity to ensure that:
o the information and products found and possibly purchased on the different systems can legally be purchased or used;
o the distributor and the end user are able to track individual providers so that they fulfil their obligations (provided the same notion exists on the provider’s side) in case of any issue.

Even multiculturalism comes into play when speaking about the systems composing the complete infrastructure:
o providing services (and support) on a 24-hour basis, without stopping servers during the night, is unusual in certain countries or for small companies;
o documentation for consuming a service may not be written in a widely used language such as English, or may not be available in multiple translations.
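The requirement to add mark-ups along the chain and still offer a competitive price can be made concrete with a little arithmetic. The sketch below assumes, purely for illustration, that each layer applies a percentage mark-up followed by a fixed per-transaction fee; the layer names and figures are hypothetical, and real remuneration schemes vary.

```python
from typing import List, Tuple

# (layer name, percentage mark-up as a fraction, fixed fee per transaction)
Layer = Tuple[str, float, float]


def price_along_chain(net_price: float, layers: List[Layer]) -> float:
    """Apply each layer's mark-up and fee in order, from supplier to consumer."""
    price = net_price
    for _name, pct_markup, fixed_fee in layers:
        price = price * (1 + pct_markup) + fixed_fee
    return round(price, 2)


# Hypothetical layers between a supplier's net price and the consumer.
full_chain = [("Wholesaler", 0.08, 0.0), ("Distribution system", 0.0, 2.5), ("Agency", 0.10, 0.0)]
direct = [("Agency", 0.10, 0.0)]

print(price_along_chain(100.0, full_chain))  # via every intermediate
print(price_along_chain(100.0, direct))      # agency accesses the net price directly
```

With these hypothetical figures, a net price of 100 becomes 108, then 110.50, then 121.55 through the full chain, against 110.00 when the agency can access the net price directly; compounding is why access to net prices at intermediate levels matters for competitiveness.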

The main topics involved in allowing process and data interoperability also come into play in the case of multilevel data sources:

Object identification is mandatory to avoid cumbersome and time-consuming transcoding, to allow data enrichment along the chain and, ultimately, the comparison and cleaning of results.

Semantics provide the structure behind the data, in order to have coherence between the layers and to merge the information after some data transformation has been performed.

Data transformation is central to the implementation of multilevel data sources, because data seldom share the same formats even when based on the same standards.

Metasearch is the key to searching for services dynamically and to having loosely coupled systems.

Without efficient process handling, the efficiency of multilevel data sources will remain minimal, amounting to a mere juxtaposition of data from different services without true interaction.

With all these elements in place in the multilevel data sources scenario of our case study, we could have complex processes such as dynamic packaging, with:

data interoperability: sharing and grouping objects with different identifiers and semantic definitions; and

process interoperability:
o compatibility of the different exchanges for each sub-process;
o the capacity to evolve only certain components of the system;
o etc.
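Sharing and grouping objects that carry different identifiers presupposes some form of object identification. A minimal sketch, assuming a cross-reference table that maps each (source, local identifier) pair to a shared identifier, might group records as follows; the table contents, source names and identifiers are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Record = Tuple[str, str, Dict[str, Any]]  # (source, local id, payload)

# Hypothetical cross-reference: (source, source-specific id) -> shared object id.
CROSS_REFERENCE: Dict[Tuple[str, str], str] = {
    ("site_a", "H-001"): "HOTEL-42",
    ("site_b", "9981"): "HOTEL-42",
    ("tour_operator", "PKG7/HTL1"): "HOTEL-42",  # same hotel inside a package
    ("site_b", "1200"): "HOTEL-77",
}


def group_by_shared_id(
    records: List[Record],
) -> Tuple[Dict[str, List[Tuple[str, Dict[str, Any]]]], List[Record]]:
    """Group records describing the same object; return unmatched ones separately."""
    groups: Dict[str, List[Tuple[str, Dict[str, Any]]]] = defaultdict(list)
    unmatched: List[Record] = []
    for source, local_id, payload in records:
        shared_id = CROSS_REFERENCE.get((source, local_id))
        if shared_id is None:
            unmatched.append((source, local_id, payload))  # needs manual or fuzzy matching
        else:
            groups[shared_id].append((source, payload))
    return dict(groups), unmatched


records = [
    ("site_a", "H-001", {"price": 90.0}),
    ("site_b", "9981", {"price": 85.0}),
    ("tour_operator", "PKG7/HTL1", {"in_package": True}),
    ("site_c", "X-5", {"price": 88.0}),
]
groups, unmatched = group_by_shared_id(records)
```

Maintaining such a cross-reference table by hand is precisely the cumbersome transcoding that a widely shared identifier for each object would avoid.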

5.2.2 Type of information

The types of information that front platforms would provide in our case study are:

product-related data (editorial data, testimonies, media, prices, availability, technical data detailing the travel (flight numbers, airlines, types of rooms, etc.), marketing qualification);


destination-related data (geography, health, climate, history, activities, religions, etc.);

customer-related data (in case the user is a known customer or consultant: identity, preferences, past trips, relatives and family members, additional qualification, etc.);

etc.

Depending on the type of information, different issues and needs arise, which we have again grouped based on our thematic circle, listing first the topics of the thematic circle and then the prerequisites:

Object identification, to pinpoint identical elements:
o the same object, but coming from different sources (e.g. the same hotel from different web sites);
o the same object, but within aggregated content (e.g. a hotel within a packaged tour that is the same as a hotel sold alone).

Semantics:
o identify the meaning of information (e.g. identify climate information in country-related content);
o provide explicit, objective, structured definitions and rules, not just transcoding features (e.g. to explain for each provider what a double room is: D = exactly 2 adults, whereas DBL may contain 2 adults but would also accept a child in an extra bed).

Data transformation:
o extract media from text;
o extract text from HTML;
o extract data with a certain meaning from a complete document;
o map different ontologies to be able to share information (e.g. to understand that a D room for one provider is a DBL room for another and a double for a third);
o etc.

Process handling:
o network capacity to ensure that a complete, complex process can run on separate systems with true system independence;
o depending on the presence or absence of certain data, launch certain processes, stop certain processes, or redirect to alternative sources;
o cost effectiveness;
o reliability, stability and performance, to ensure that the user will not suffer from the failure or inefficiency of one or several elements of the process chain;
o launch alerts according to certain contents or the lack of certain contents;
o ensure security over the complete process chain;
o update data sources (through a manual request, remotely, with automatic synchronization, etc.);
o etc.

Metasearch:
o perform queries without being hindered by specific query syntax;
o browse the internet for certain types of information without having to set up the searches manually.

Multiculturalism:
o multilingualism of content (e.g. information in Italian provided by the Italian government to be used in the USA);
o access the right media for the audience (pictures with or without people, certain hair colours, etc., would be more or less appropriate in certain countries, for instance);
o dynamic translations.

Technology:
o share communication protocols, or at least use interoperable communication protocols;
o data accessibility (an XML-based format publishing the data via web services versus a CSV file to be sent by mail);
o large updates of data sources (large amounts of data for each source, multiplication of sources, etc.).

Legal aspects:
o reliability of the data itself;
o estimating the quality of the data;
o determining the legal constraints associated with a piece of information;
o conditions for using and distributing the data;
o conditions for storing data not owned.

Business models:
o business model in relation to the use of the data.
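The double-room example in the semantics item above shows why explicit definitions are more useful than code-to-code transcoding. The following sketch encodes each provider’s room code as an explicit occupancy rule, so that a request is checked against the rule rather than against a code name. It follows the text’s example (D accepts exactly 2 adults; DBL accepts 2 adults plus optionally one child in an extra bed); the provider names and the `OccupancyRule` structure are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class OccupancyRule:
    """Explicit meaning of a room code, independent of its label."""
    adults: int                  # exact number of adults
    max_extra_bed_children: int  # children allowed in an extra bed


# Hypothetical provider-specific codes, each mapped to an explicit rule.
ROOM_SEMANTICS: Dict[Tuple[str, str], OccupancyRule] = {
    ("provider_1", "D"): OccupancyRule(adults=2, max_extra_bed_children=0),
    ("provider_2", "DBL"): OccupancyRule(adults=2, max_extra_bed_children=1),
}


def accepts(rule: OccupancyRule, adults: int, children: int) -> bool:
    """True if the party fits the rule's explicit definition."""
    return adults == rule.adults and children <= rule.max_extra_bed_children


def matching_codes(adults: int, children: int) -> List[Tuple[str, str]]:
    """Provider codes whose explicit rule fits the requested party."""
    return [key for key, rule in ROOM_SEMANTICS.items() if accepts(rule, adults, children)]


print(matching_codes(2, 0))  # both codes fit two adults
print(matching_codes(2, 1))  # only DBL also accepts a child
```

A plain transcoding table (D ↔ DBL) would wrongly treat both codes as interchangeable for a family with a child; the explicit rule keeps the difference visible.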

5.2.3 Type of data sources

Typical data sources in the travel industry are:

GDSs (Galileo/Worldspan, Amadeus and Sabre), allowing access to airline, car rental, hotel, ferry, insurance, leisure, etc. systems via XML, EDIFACT or flat file transfers;

specialized online platforms providing structured data for a certain type of activity (car rental companies, hotel concentrators, tour operators, destination management systems, etc.), usually via XML-based web services or XHTML data;

web sites providing HTML-based data (structured and non-structured).

This is illustrated in figure 5-5.


Figure 5-5
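For the third type of source, web sites providing HTML-based data, the data transformation need of extracting text from HTML (listed in 5.2.2) can be sketched with Python’s standard-library `html.parser` module. This is a deliberately minimal sketch: it drops script and style content and joins the remaining text, whereas real pages usually also need structure-aware extraction of the meaningful fields. The sample markup is hypothetical.

```python
from html.parser import HTMLParser
from typing import List


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> content."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &amp; etc. are decoded
        self._skip_depth = 0
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return " ".join(parser.chunks)


sample = ("<html><head><style>p {color: red}</style></head><body>"
          "<h1>Hotel Roma</h1><p>3 stars, <b>near</b> the station &amp; the old town.</p>"
          "<script>var tracking = 1;</script></body></html>")
print(html_to_text(sample))
```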

As introduced in the previous sections, each data source may in turn connect to multiple data sources of the same type or of other types.

For instance, a specialized pre-packaged provider could:

connect to Galileo for scheduled air:
o Galileo would directly connect to Delta Air Lines, but
o Galileo would connect to Amadeus to get Air France flights;

connect to Pegasus for hotels:
o Pegasus would hold certain inventories, but
o Pegasus could access Gulliver or Transhotels for others;

connect to TripAdvisor to get testimonies on both airlines and hotels;

etc.
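Because each source may in turn connect to other sources, identifying which chain of systems a request actually traverses (for performance analysis, or for the legal traceability of providers discussed in 5.2.1) amounts to a path search in a graph of connections. The sketch below encodes the example above as a hypothetical adjacency list and finds the shortest chain with a breadth-first search; it illustrates the idea only and does not describe the actual connectivity of the named systems.

```python
from collections import deque
from typing import Dict, List, Optional

# Illustrative connections taken from the example above (not real connectivity data).
CONNECTIONS: Dict[str, List[str]] = {
    "Pre-packaged provider": ["Galileo", "Pegasus", "TripAdvisor"],
    "Galileo": ["Delta Air Lines", "Amadeus"],
    "Amadeus": ["Air France"],
    "Pegasus": ["Gulliver", "Transhotels"],
}


def chain_to(start: str, target: str, graph: Dict[str, List[str]]) -> Optional[List[str]]:
    """Shortest chain of sources from start to target (breadth-first search)."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # target not reachable through known connections


print(chain_to("Pre-packaged provider", "Air France", CONNECTIONS))
```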


6 Semantics

6.1 Standards

6.1.1 Needs and requirements

6.1.1.1 Introduction

The first word that may come to mind when talking about data and information interoperability and exchange is “standards”. Standards have traditionally been widely used in different industries. The general goal of standards and standardization is to allow compatibility, interoperability, safety, repeatability, quality, etc. The process of developing and agreeing upon a standard is known as standardization.

Generally speaking, a standard is an established norm or requirement that needs to be followed in order to allow components (of different nature and origin) to fit and work together. It is usually a formal document that establishes uniform engineering or technical criteria, methods, processes and practices designed to be consistently used as a rule, guideline or definition.

The ISO/IEC Guide 2 defines a standard as “a document established by consensus and approved by a recognized body that provides for common and repeated use, rules, guidelines or characteristics for activities or their results, aimed at the achievement of the optimum degree of order in a given context”. Standards help to make life simpler and to increase the reliability and effectiveness of many goods, services and processes. They are intended to be a summary of good and best practices rather than general practice. Standards are created by bringing together the experience and expertise of all interested parties, such as the producers, sellers, buyers, users and regulators of a particular material, product, process or service. Standards are designed for voluntary use and do not impose any regulations. However, laws and regulations may refer to certain standards and make compliance with them compulsory. For example, the physical characteristics and format of credit cards are set out in International Standard ISO/IEC 7810:1996. Adhering to this standard means that the cards can be used worldwide.

Within the computer science domain and Information and Communication Technologies, standards have also been widely used and are becoming increasingly important. A vast number of software and hardware developers and manufacturers worldwide produce different items, and these items need to follow particular standards in order to work together satisfactorily. As the amount of information on the Internet increases every second, a unified representation for web data and resources is needed in today’s large-scale Internet data management systems. This unification of standards will allow machines to meaningfully process the available information and to exchange and integrate data coming from distributed databases and information management systems. This has been happening, e.g. in the context of eLearning with the development of the SCORM (http://www.adl.net/) and AICC (http://www.aicc.org/) standards, or in the context of telemedicine applications with the development of standard data transport protocols such as HL7 and ISO/IEEE 11073, among others.


This is also essential in the tourism sector, which is changing from a labour-intensive industry into a knowledge- and information-intensive industry. In the tourism domain, the use of information systems to support market processes has not reached the goal of a single electronic tourism market. The complex structure of traditional tourism markets, characterized by many different distribution channels and long value chains, was transferred one-to-one to its electronic counterpart. The result was a multitude of different electronic tourism markets. The most important obstacle to a single tourism market is the lack of commitment by all market participants to the semantics of the information to be exchanged, as well as to the method of exchange.

Some efforts have already been invested in this direction (see 6.1.2) in order to enable distributed data exchange and integration. Interoperability between databases and information sources needs to be provided at both a technical and an informational (semantic) level. The social value of the Web is that it enables human communication, commerce, and opportunities to share knowledge, information and experiences. One of the primary goals of the W3C (World Wide Web Consortium, http://www.w3c.org/) is to make these benefits available to all people, whatever their hardware, software, network infrastructure, native language, culture, geographical location, or physical or mental ability.

6.1.1.2 Needs

Benefits of use of standards

Standards have proved to be a powerful tool for organizations of all sizes, supporting innovation and increasing productivity and efficiency in their business processes. Effective standardization promotes competition and enhances profitability, enabling a business to take a leading role in shaping the industry itself. Generally speaking, standards allow a company to:

attract and assure customers;
demonstrate market leadership;
create competitive advantage;
develop and maintain best practice.

Standards within business

In modern business, effective communication along the supply chain and with legislative bodies, clients and customers is imperative. Applying standards in the everyday operation of a company provides the means to measure various variables and thus to manage their evolution, which brings benefits within the company’s own infrastructure. Business costs and risks can be minimized, internal processes streamlined and communication improved. Standardization promotes interoperability, providing the competitive edge necessary for the effective worldwide trading of products and services.


6.1.1.3 Requirements

Within the tourism industry, standards may help companies to be more competitive in terms of web presence, by complying with information and communication standards and recommendations. In order to achieve exchange and integration of information across different information systems, information formats and transfer protocols must be compatible and ought to allow any hardware and software used to access the information to work together.

Furthermore, information integration and exchange are required to provide trade and commerce capabilities on web sites, so that a local company can be globally present through the web and increase its business opportunities.

In order to do this, there must be some kind of standard or sufficiently agreed communication protocol between the information systems that exchange information. These may be standards about data repositories (how to structure data in a repository or how that data ought to be named) or standards about intermediation systems, such as the HarmoNET ontology.

Regarding Web and information standards, one of the most active bodies is the W3C. The W3C designs and promotes interoperable, open (non-proprietary) formats and protocols to avoid the market fragmentation of the past. A W3C Recommendation is the equivalent of a web standard, indicating that the W3C-developed specification is stable, contributes to web interoperability, and has been reviewed by the W3C membership, which favours its adoption by the industry.

Reaching a general information standard that enables information integration and exchange within the tourism sector is a relatively complex activity. Recommendations by official and recognized bodies (such as the W3C) in this direction should therefore be followed, so that tourism companies’ information management systems can effectively process all information and interoperate.

6.1.2 State of the art

As mentioned before, a standard is “a document established by consensus and approved by a recognized body that provides for common and repeated use, rules, guidelines or characteristics for activities or their results, aimed at the achievement of the optimum degree of order in a given context”.

ETSI is the European Telecommunications Standards Institute. It is an independent, non-profit organization in the telecommunications industry in Europe with worldwide projection. ETSI is officially responsible for standardization of Information and Communication Technologies (ICT) within Europe.

ETSI standards could be described in general as “definitions and specifications for products and processes requiring repeated use”. They are a set of rules for ensuring that a process is always carried out the same way with a certain degree of quality, or that a product is always manufactured following the same tasks in the same order, complying with a degree of quality assumed to be generally satisfactory.


A more complete definition of a “standard” from an ETSI perspective would be: “A technical specification approved by a recognized standardization body for repeated or continuous application, with which compliance is not compulsory and which is one of the following:

International Standard: a standard adopted by an international standardization organization;

European Standard: a standard adopted by a European standardization body;

National Standard: a standard adopted by a national standardization body and made available to the public.”

(Source: Directive 98/34/EC definitions.)

ETSI’s standards-making priorities include:

fully specified scoping;
consistent use of specific terms;
accurate referencing;
contextualizing of abbreviations;
accuracy and completeness of technical content;
clear and unambiguous requirements;
legibility and comprehension.

Two major objectives of ICT standardization are interconnection and interoperability. ETSI’s uncompromising approach facilitates these by ensuring that content is easily interpretable, understandable and unambiguous. Only this level of attention to detail can produce the truly high-quality standards that industry, operators and users now demand to grow their increasingly global markets.

Standards can be found throughout daily life, but why would we need to use standards? Rather than asking why we would need standards, we might usefully ask ourselves what the world would be like without them. Products would not work as expected. They would be of inferior quality and incompatible with other products or equipment; in fact, they would not even connect with them, and, in extreme cases, non-standardized products could potentially be dangerous.

From a user’s standpoint, standards are extremely important in the computer industry because they allow the combination of products from different manufacturers to create a customized system. Without standards, only hardware and software from the same company could be used together. In addition, standard user interfaces can make it much easier to learn how to use new applications.

Most official computer standards are set by one of the following organizations:

ANSI (American National Standards Institute);
ITU (International Telecommunication Union);
IEEE (Institute of Electrical and Electronics Engineers);
ISO (International Organization for Standardization);
VESA (Video Electronics Standards Association).


6.1.2.1 Types of standards

The primary types of technical standards are:

A standard specification is an explicit set of requirements for an item, material, component, system or service. It is often used to formalize the technical aspects of a procurement agreement or contract. For example, there may be a specification for a turbine blade for a jet engine that defines the exact material and performance requirements, shape, etc. This guarantees that components produced by different manufacturers can be used and assembled in the same product and perform as expected.

A standard test method describes a definitive procedure that produces a test result comparable with a reference, in order to validate a certain product. It may involve making a careful personal observation or conducting a highly technical measurement. For example, a physical property of a material is often affected by the precise method of testing: any reference to the property should therefore reference the test method used.

A standard procedure (or standard practice) gives a set of instructions for performing operations or functions, usually tasks to be carried out in a particular order. For example, a company’s quality assurance system ensures that all procedures within the company have been identified and defined, and are always carried out the same way.

A standard guide provides general information or options that do not require a specific course of action.

A standard definition is formally established terminology that is sufficiently agreed upon within the expert community.

6.1.2.2 List of travel industry standards, companies and organizations (examples)

The following is a classification and description of standards within, or with direct relevance to, the tourism domain. Standards and initiatives have been assigned to five different categories: Tourism initiatives and vocabularies; eBusiness vocabularies; eBusiness frameworks; Business semantics; Modelling languages:

Tourism initiatives and vocabularies:
o ACRISS: The Association of Car Rental Industry Systems Standards has devised a car coding system, the ‘ACRISS Code’. This identifies the features of a car so that you can be sure your client gets the same standard of car wherever they rent in Europe from an ACRISS member.

o ANSI ASC X12I TG08, American National Standards Institute: The ANSI X12 standards were the first branch-independent standards for EDI, but their focus is only on the North American market. Today ANSI X12 has specified more than 275 document types, so-called transaction sets, to be used in B2B. Similarly to UN/EDIFACT, the ANSI X12 syntax is based on hierarchical structuring and implicit data element identification. However, X12 has its own set of notations and rules on representations. X12 does not make use of composite data elements. ANSI ASC X12 is divided into branch-specific subcommittees. The subcommittee X12I is responsible for the area of transportation. Each subcommittee consists of multiple task groups. X12I TG08 is the task group for travel, tourism and leisure.

o CEN/TC 329, European Committee for Standardization / Technical Committee Tourism Services: The Technical Committee 329 Tourism Services of the European Committee for Standardization focuses on the standardization of terminology and the specification of facilities and services, including tourism-related activities, that can be used in information and reservation systems. Accordingly, CEN/TC 329 develops a European glossary of definitions for tourist terms. The project has brought together numerous national and international trade associations and interest groups, as well as tour operators, public institutions and consumer groups. The glossary covers domain know-how that should be captured by an ontology in the field of tourism.

o DATEX: DATEX is a European task force set up to standardize the interface between inter-regional Traffic Control Centres. The standardization work resulted in the Data Exchange Network (DATEX-Net) Specifications for Interoperability, which is a set of basic tools to provide a common interface, including a common Data Dictionary, a common set of EDIFACT messages and a common Geographical messaging system.

o Enjoy Europe: In the enjoyeurope initiative (previously InTouriSME), launched in 1996, 40 European regions agreed on a common metadata definition (Minimum Data Set, MDS) for key tourism information, federating the local legacy systems and encapsulating the metadata descriptions. The first tangible result of this federation is the EnjoyEurope portal. In addition, the resulting European tourism data interoperability is expected to provide a critical-mass information base for new services.

o HEDNA stands for Hotel Electronic Distribution Network Association, http://www.hedna.org/. HEDNA is a global association focused on identifying distribution opportunities and providing solutions for the lodging industry and its distribution community. HEDNA’s activities are intended to stimulate the booking of hotel rooms through the use of Global Distribution Systems, the Internet and other electronic means. HEDNA works in the following directions:

Optimizing the use of current and emerging technologies;

Providing an opportunity for an open exchange of information among members;

Educating industry partners.

HEDNA has produced the Unique Global Identifiers for the Hospitality Industry (UGI), a unique reference number to identify and provide information about operational units within the hospitality industry. A UGI is a random number that is attached to attribute and relationship information.

o HITIS, Hospitality Industry Technology Integration Standards: The goal of HITIS is to identify general functions (of property management systems) and standardize their implementation. In addition, a common data dictionary for hospitality-relevant data is to be developed. HITIS provides an object standard and therefore specifies standardized interfaces for objects providing the identified functions. The object standard is additionally provided as XML specifications.

o IATA: The International Air Transport Association. The main objective of this association is to assist airline companies in achieving lawful competition and uniformity in prices. IATA assigns the following standard identifiers:

IATA Airport Codes, also called IATA location identifiers, designate most of the airports around the world;

IATA Railway Station Codes: following the idea of the Airport Codes, IATA has also labelled various railway stations in the world, especially where there is an agreement between an airline and a railway provider;

IATA Airline Designator codes identify airlines operating worldwide;

IATA also assigns standard codes to flight delays.

o IATAN, International Airlines Travel Agent Network: IATAN’s mission is to promote professionalism, administer meaningful and impartial business standards, and provide cost-effective products and services that benefit the travel industry. Through the use of its informational and other resources, IATAN provides a vital link between the supplier community and the US travel distribution network.

o ICAO, International Civil Aviation Organization: The ICAO Council adopts standards and recommended practices concerning air navigation, prevention of unlawful interference, and facilitation of border-crossing procedures for international civil aviation. In addition, the ICAO defines the protocols for air accident investigation followed by transport safety authorities in countries signatory to the Convention on International Civil Aviation, commonly known as the Chicago Convention. The ICAO also standardizes certain functions for use in the airline industry, such as the Aeronautical Message Handling System (AMHS); in this respect it also acts as a standards organization. The ICAO defines an International Standard Atmosphere (also known as ICAO Standard Atmosphere), a model of the standard variation of pressure, temperature, density, and viscosity with altitude in the Earth’s atmosphere. This is useful in calibrating instruments and designing aircraft. The ICAO standardizes machine-readable passports worldwide. Such passports have an area where some of the information otherwise written in textual form is written as strings of alphanumeric characters, printed in a manner suitable for optical character recognition. This enables border controllers and other law enforcement agents to process such passports quickly, without having to input the information manually into a computer. ICAO publishes Doc 9303, Machine Readable Travel Documents, the technical standard for machine-readable passports. A more recent standard is for biometric passports. These contain biometrics to authenticate the identity of travellers. The passport’s critical information is stored on a tiny RFID computer chip, much like information stored on smartcards. Like some smartcards, the passport book design calls for an embedded contactless chip that is able to hold digital signature data to ensure the integrity of the passport and the biometric data.

o IFITT RMSIG, IFITT Reference Model Special Interest Group: The objective of the IFITT Reference Model Special Interest Group (IFITT RMSIG) is the harmonization of electronic tourism markets in an open and flexible manner, based on a reference model. The main purpose of the IFITT RMSIG is to bring together the different market participants and domain experts to ensure a broad acceptance of the reference model. The reference model provided by the IFITT RMSIG is a framework for modelling electronic tourism markets. Instead of fixed standardization, the reference model enables the flexible description of specific models, based on a common modelling language and standardized building blocks as vocabulary. The purpose of the reference model is to enable the description of specific models for specific standards or data exchange formats in a form understandable by other market participants, and to enable a mapping between different standards. Suppliers of tourism services or brokers within the tourism market can use the reference model to describe their specific standards or data exchange formats, which can then be understood and used by other market participants. In this way, not only new but also existing standards or data exchange formats can be integrated into one open electronic tourism market.

o JourneyWeb: The JourneyWeb project researched, designed and developed an Internet-based protocol for the dynamic exchange of electronic schedule data between distributed heterogeneous computing systems, giving any telephone enquiry centre or Internet-based service universal access to any public transport information, regardless of location. These services allow any traveller to obtain an unbiased selection of integrated journey alternatives, which may contain trips by multiple travel modes (train, bus, coach, air) and be remotely sourced from different databases and software suppliers.

o KAREN, Keystone Architecture Required for European Networks: The KAREN Framework Architecture supports the planning and preparation of intelligent transport system (ITS) implementations and offers a basis for a European integrated approach to ITS. The project was organized to facilitate the complete process, from the establishment of European requirements, through the production of a comprehensive European Transport Telematics Framework Architecture, to the creation of consensus and endorsement of the results.

o omnis-online: omnis-online is an electronic marketplace platform for the travel and tourism industry. The essential feature of omnis-online is the contractual and procedural framework that makes it possible for buyers and sellers of holiday and travel products in different parts of the world to trade together. omnis-online provides a standards book, a statement of procedures, rules and definitions governing the use of omnis-online, including in particular product description standards (based on XML).

o OTA, Open Travel Alliance: The Open Travel Alliance aims to promote the free flow of travel services through multiple distribution channels. To this end, its objective is to provide a vocabulary and grammar for communicating travel-related information as tags across all travel industry segments. These tags are implemented using the eXtensible Markup Language (XML).

o SIGRT, Sistema de Informação de Gestão de Recursos Turísticos (Tourism Resources Management Information System): SIGRT is an information system that serves as a global reference in the promotion of national tourism products in Portugal. Its very large body of information, spanning multiple tourism sectors, is made available to the general public, to tourism operators and to other public or private institutions in the sector. The system is based on a tourism resource database that enables the storage, management and availability of data elements regarding national tourism services. Within the scope of SIGRT, standardized data structures are defined for describing tourism products. The information is structured according to 190 different identified types, each with its own data structure.

o TIH, Travel Information Highway: The Travel Information Highway (TIH) is an open communications approach to facilitate the exchange of real-time information between network operators themselves and with driver information service providers. It was developed with the objectives of co-ordinating network operating strategies and providing high-quality information services to the travelling public.

o TIN, Tourism Information Norm for German tourism: The TIN aims to provide rules for a uniform presentation and search structure within the information and reservation systems of German tourism. To this end, it defines and structures the characteristics for describing tourism services and specifies the access to these services.

o TourinFrance: TourinFrance aims to develop a nation-wide common data format for describing and exchanging tourism information. This data format respects the independence and autonomy of all the regional and local tourist information systems, but should eventually allow local content to be aggregated at national level. This national project will therefore enable the exchange of data between the existing players in French tourism: the French government services, the regions, the counties, the tourist offices, and the major tourism federations. The format, which is NOT based on XML, allows information to be transmitted about the following tourism entities: hotels, campsites, self-catering accommodation, restaurants, events, natural sites, cultural sites, leisure activities, tourist routes, holiday villages and holiday resorts. Currently, TourinFrance does not support any commercial transactions.

o Transmodel: TRANSMODEL is a reference data model for Public Transport operations. TRANSMODEL describes the data of interest to a company designing an Integrated Information System. TRANSMODEL is a conceptual model and does not mandate any particular implementation at the logical or physical level. TRANSMODEL aims to increase the efficiency of transport operations by underpinning them with more secure and reliable Information Systems. TRANSMODEL is also expected to open the market by allowing the integration of complementary software products from different suppliers.

o TransXChange: TransXChange aims to define a national data standard for the interchange of bus route registration, route and timetable information between operators, the Traffic Area Network, Local Authorities and Public Transport Executives, and the National Passenger Transport Information System. TransXChange is a standard for data records; its scope is therefore defined by the possible extent of its contents, which are in turn determined by the concepts to be supported.

o TRIDENT, TRansport Intermodality Data sharing and Exchange NeTworks: The goal of the project is to support multimodal travel ITS services by establishing common and reusable mechanisms that enable sharing and exchanging data between transport operators (content owners) of different modes (bus/tram/metro, rail and road) as well as information service providers. It will also investigate and propose solutions for the organizational and strategic issues hampering travel intermodality. This will lead to proposals for new standards as well as to recommendations supporting the implementation of systems based on the project’s results.

o TTI, Travel Technology Initiative: The Travel Technology Initiative was created to establish technology standards within the travel industry. TTI maintains and publishes the Unicorn EDI messages, of which there are now over 130 in use throughout the travel industry. The TTI is cooperating with the OTA on establishing XML standards.

o UIC 912, Union Internationale des Chemins de fer 912 protocol: UIC 912 is a proprietary protocol developed by the UIC to support information exchange among its members. The application areas covered are international freight, passenger and baggage traffic, and documentary research. Each message format contains a header followed by a series of 32-bit fields. EDIFER (the UIC’s competence centre for EDI standardization) maintains UIC 912. However, the UIC took the strategic decision to adopt the UN/EDIFACT standards as the overall UIC standard. Recently, a new working group on XML has been established within the UIC. The UIC participated in ebXML.

o UN/LOCODE: This is the United Nations Code for Trade and Transport Locations, a geographic coding scheme developed by the United Nations Economic Commission for Europe (UNECE). UN/LOCODE assigns codes to locations used in trade and transport with functions such as seaports, rail and road terminals, airports, post offices and border crossing points. UN/LOCODEs have five characters. The first two are letters taken from the ISO 3166-1 alpha-2 country codes. Normally three letters follow, but if there are not enough combinations, the digits 2 to 9 can also be used. For airports, the three letters following the country code are not always identical to the IATA airport code.

o UN/EDIFACT TT&L, United Nations rules for Electronic Data Interchange for Administration, Commerce and Transport, Travel, Tourism and Leisure: UN/EDIFACT aims to facilitate the electronic exchange of business data between communication partners and comprises a set of internationally agreed standards, directories and guidelines for the electronic exchange of structured data.

o USTOA: The United States Tour Operators Association is a professional association representing the tour operator industry. It is composed of companies whose tours and packages encompass the entire globe and who conduct business in the USA.

o WATA, World Association of Travel Agencies: For 50 years, WATA has been the leading association of the travel trade. With members in most countries and on all continents, it stands for quality and reliability. Its own guarantee fund covers transactions. The MASTER KEY and the GLOBAL TRAVEL PLANNER help tour operators and travel agents in their daily business, and the yearly General Assembly fosters friendship.
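As an illustration of the implicit (positional) data element identification mentioned in the ANSI X12 entry above, the following Python sketch splits an X12-style fragment into segments and elements. It assumes the commonly used delimiters "~" (segment terminator) and "*" (element separator); in real interchanges these are declared in the ISA interchange header. The sample content is made up, and the sketch is not a complete X12 parser (no envelope handling or validation).

```python
# Minimal sketch: split an ANSI X12-style fragment into segments and elements.
# Assumption: "~" terminates segments and "*" separates elements (in real
# interchanges these delimiters are declared in the ISA header).
from typing import Dict, List

SEGMENT_TERMINATOR = "~"
ELEMENT_SEPARATOR = "*"


def parse_segments(data: str) -> List[List[str]]:
    """Return each segment as [segment_id, element_1, element_2, ...]."""
    segments: List[List[str]] = []
    for raw in data.split(SEGMENT_TERMINATOR):
        raw = raw.strip()
        if raw:  # ignore the empty piece after the final terminator
            segments.append(raw.split(ELEMENT_SEPARATOR))
    return segments


def element_at(segment: List[str], position: int) -> str:
    """Implicit identification: an element is identified only by its position.

    Positions are 1-based because segment[0] holds the segment ID; omitted
    trailing elements come back as empty strings.
    """
    return segment[position] if position < len(segment) else ""


def index_by_id(segments: List[List[str]]) -> Dict[str, List[List[str]]]:
    """Group segments by their segment identifier."""
    grouped: Dict[str, List[List[str]]] = {}
    for seg in segments:
        grouped.setdefault(seg[0], []).append(seg)
    return grouped


if __name__ == "__main__":
    # Made-up fragment; the element values are illustrative only.
    segs = parse_segments("N1*ST*Example Hotel~DTM*011*20090603~")
    print(segs)  # [['N1', 'ST', 'Example Hotel'], ['DTM', '011', '20090603']]
    print(element_at(index_by_id(segs)["DTM"][0], 2))  # 20090603
```

Because nothing in the data names the elements, sender and receiver must agree on the meaning of every position in advance, which is exactly what the transaction-set specifications provide.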
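The machine-readable zone described in the ICAO entry above protects key fields, such as the document number and dates, with check digits. The Python sketch below implements the Doc 9303 check-digit rule: digits keep their value, letters A to Z count as 10 to 35, the filler "<" counts as 0, values are multiplied by the repeating weights 7, 3, 1, and the check digit is the sum modulo 10. It is an illustration only, not a full MRZ reader; the field layouts themselves are defined in Doc 9303.

```python
# Minimal sketch of the ICAO Doc 9303 check-digit rule for MRZ fields.
WEIGHTS = (7, 3, 1)  # repeating weight pattern


def char_value(ch: str) -> int:
    """Digits keep their value, A-Z map to 10-35, the filler '<' is 0."""
    if "0" <= ch <= "9":
        return ord(ch) - ord("0")
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10
    if ch == "<":
        return 0
    raise ValueError(f"invalid MRZ character: {ch!r}")


def check_digit(field: str) -> int:
    """Weighted sum of the character values, modulo 10."""
    total = sum(char_value(ch) * WEIGHTS[i % 3] for i, ch in enumerate(field))
    return total % 10


def is_valid(field: str, printed_digit: str) -> bool:
    """Compare a field with the check digit printed after it in the MRZ."""
    return (len(printed_digit) == 1 and "0" <= printed_digit <= "9"
            and check_digit(field) == int(printed_digit))


if __name__ == "__main__":
    # Sample values of the kind used in Doc 9303 specimen documents.
    print(check_digit("L898902C3"))  # document number -> 6
    print(check_digit("740812"))     # date of birth, YYMMDD -> 2
    print(is_valid("120415", "9"))   # date of expiry -> True
```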
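The five-character UN/LOCODE structure described above lends itself to a simple format check. This sketch tests only the shape of a code (two letters for the country part, then three characters from A to Z or 2 to 9); it does not verify that the country part is an assigned ISO 3166-1 code or that the location exists, which would require the official UNECE code lists. The sample codes are for illustration.

```python
# Minimal sketch: structural check of a UN/LOCODE such as "BEBRU".
import re
from typing import Tuple

# Two letters (country part in ISO 3166-1 alpha-2 format; not checked against
# the official list) followed by three letters A-Z or digits 2-9.
LOCODE_PATTERN = re.compile(r"[A-Z]{2}[A-Z2-9]{3}")


def is_well_formed(code: str) -> bool:
    return LOCODE_PATTERN.fullmatch(code) is not None


def split_locode(code: str) -> Tuple[str, str]:
    """Return (country_part, location_part) of a well-formed code."""
    if not is_well_formed(code):
        raise ValueError(f"not a well-formed UN/LOCODE: {code!r}")
    return code[:2], code[2:]


if __name__ == "__main__":
    print(split_locode("BEBRU"))    # ('BE', 'BRU')
    print(is_well_formed("BE1RU"))  # False: the digit 1 is not allowed
```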

eBusiness vocabularies:

o cXML, commerce eXtensible Markup Language: cXML allows buyers, suppliers, aggregators, and intermediaries to communicate using a single, standard, open language. Successful business-to-business electronic commerce (B2B e-commerce) portals depend upon a flexible, widely adopted protocol. cXML is a language designed specifically for B2B e-commerce and provides access to products and services. cXML transactions consist of documents, which are simple text files with well-defined format and content. Most types of cXML documents are analogous to hardcopy documents traditionally used in business.

o OAG, Open Applications Group: The Open Applications Group is a non-profit industry consortium focussing on promoting the easy and cost-effective integration of key business application software components for enterprise and supply chain functions for end-user organizations. The Open Applications Group Integration Specifications (OAGIS) accelerate component integration and electronic commerce by providing capabilities for Supply Chain Integration using the Extensible Markup Language (XML). The Open Applications Group has also published a proposal for a common middleware (OAMAS) that, when adopted with OAGIS, will move the industry much closer to the vision of plug-and-play compatibility for business applications.

o RosettaNet: RosettaNet is an independent, non-profit organization dedicated to promoting an industry-wide initiative to agree on and adopt common electronic business processes world-wide. RosettaNet focuses on building a master dictionary to define properties for products, partners, and business transactions. This master dictionary, coupled with an established implementation framework (exchange protocols), is used to support the eBusiness dialog known as the Partner Interface Process (PIP). RosettaNet PIPs create new areas of alignment within the overall IT supply-chain eBusiness processes, allowing IT supply-chain partners to scale eBusiness and to fully leverage electronic commerce applications and the Internet as a business-to-business commerce tool.

o Simpl-EDI: Simpl-EDI’s purpose is to provide more focused EDI messages based on simple, standard international data elements and well-structured master files. It builds on the best-practice work already done, with the aim of providing a sound basis for the widest and most cost-effective use of electronic commerce and associated computer applications. The philosophy of Simpl-EDI is a common, shared understanding of what is required to pass between companies involved in the supply, transport and purchase of all types of goods and services. Simplified messages rely on moving redundant or stable information out of the messages themselves into master data files, where it can be accessed or processed separately.

o xCBL, XML Common Business Library: The XML Common Business Library (xCBL) is a set of XML building blocks and a document framework that allows the creation of robust, reusable XML documents to facilitate global trading. It essentially serves as the “mother code”, providing one language that all e-marketplace participants can understand. This interoperability allows businesses everywhere to easily exchange documents across multiple e-marketplaces, giving global access to buyers, suppliers, and providers of business services.
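The Simpl-EDI principle described above, moving stable information out of each message into master files, can be sketched as follows. All identifiers, master-file contents and the message structure are hypothetical; the point is only that the message carries keys plus the data that actually changes, and the receiver resolves the keys against master data exchanged beforehand.

```python
# Minimal sketch of the Simpl-EDI idea: stable data lives in master files
# shared in advance; each message carries only keys plus volatile data.
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical master data, exchanged once and then referenced by key.
PARTY_MASTER: Dict[str, Dict[str, str]] = {
    "P001": {"name": "Example Tours Ltd", "country": "GB"},
}
PRODUCT_MASTER: Dict[str, Dict[str, str]] = {
    "X100": {"description": "Double room, half board"},
}


@dataclass
class OrderLine:
    product_id: str  # key into PRODUCT_MASTER
    quantity: int    # volatile data travels in the message itself


@dataclass
class OrderMessage:
    buyer_id: str    # key into PARTY_MASTER
    lines: List[OrderLine]


def expand(msg: OrderMessage) -> Dict[str, object]:
    """Resolve the keys against the master files on the receiving side."""
    return {
        "buyer": PARTY_MASTER[msg.buyer_id]["name"],
        "lines": [
            (PRODUCT_MASTER[line.product_id]["description"], line.quantity)
            for line in msg.lines
        ],
    }


if __name__ == "__main__":
    msg = OrderMessage(buyer_id="P001", lines=[OrderLine("X100", 2)])
    print(expand(msg))
    # {'buyer': 'Example Tours Ltd', 'lines': [('Double room, half board', 2)]}
```

The trade-off is that both partners must keep their master files synchronized; an unknown key raises a KeyError here, whereas a real implementation would need an agreed error-handling procedure.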

eBusiness frameworks:

o BizTalk: BizTalk is an industry initiative defining the BizTalk Framework, an Extensible Markup Language (XML) framework for application integration and electronic commerce. It includes a design framework for implementing an XML schema and a set of XML tags used in messages sent between applications. The BizTalk Framework will be used to produce and publish XML schemas in a consistent manner.

o ebXML, electronic business XML: The mission of ebXML is to provide an open XML-based infrastructure enabling the global use of electronic business information in an interoperable, secure and consistent manner by all parties. ebXML, sponsored by UN/CEFACT and OASIS, is a modular suite of specifications that enables enterprises of any size and in any geographical location to conduct business over the Internet. Using ebXML, companies now have a standard method to exchange business messages, conduct trading relationships, communicate data in common terms and define and register business processes.

o eCoFramework: The goal of the eCo Framework project is to develop a common framework for interoperability among XML-based application standards and key electronic commerce environments. The project’s working group will develop a specification for content names and definitions in electronic commerce documents, and an interoperable transaction framework specification.

o Ontology.org: Ontology.Org is an independent industry and research forum focussed upon the application of ontologies in Internet commerce. The central goal of Ontology.Org is to use ontologies to address the problems that impact the formation and sustainability of large electronic trading groups.

o OntoWeb: OntoWeb (Ontology-based information exchange for knowledge management and electronic commerce) is a collaborative network of European researchers and industrial partners that aims to strengthen the European influence on Semantic Web standardization efforts such as those based on RDF and XML.

o OO-edi, Object-oriented edi: OO-edi is an attempt to put the Open-edi Reference Model into practice. It is based on business process and information modelling methodologies. The resultant model specifies the business flow needs and identifies related object classes to the extent that production of off-the-shelf software to support EDI exchanges becomes feasible. Business process and information modelling in OO-edi is based on the Unified Modelling Language (UML). The methodology used for business process and information modelling is a customization of the Rational Unified Process and is called UN/CEFACT’s Modelling Methodology (UMM or “N90”). The development of this methodology was influenced by the methodology used at SWIFT. Later on, the methodology merged with the modelling methodology and metamodel used in RosettaNet. UMM is referenced by ebXML as the preferred methodology to model ebXML-compliant business processes.

o Open-edi, ISO/IEC 14662:1997 Information technology – Open-edi reference model: Open-edi specifies a Reference Model which should enable business partners to do business electronically without any prior agreements.

o SOAP, Simple Object Access Protocol: SOAP provides the definition of an XML document which can be used for exchanging structured and typed information between peers in a decentralised, distributed environment. It is fundamentally a stateless, one-way message exchange paradigm, but applications can create more complex interaction patterns (e.g., request/response, request/multiple responses, etc.) by combining such one-way exchanges with features provided by an underlying transport protocol or application-specific information. SOAP is silent on the semantics of any application-specific data it conveys, as it is on issues such as the routing of SOAP messages, reliable data transfer, firewall traversal, etc.

o UDDI, Universal Description, Discovery and Integration: UDDI enables companies to publish how they want to conduct business on the web, potentially fuelling growth of business-to-business (B2B) electronic commerce. UDDI is intended to benefit businesses of all sizes by creating a global, platform-independent, open architecture for describing businesses and services, discovering those businesses and services, and integrating businesses using the Internet. The core function of UDDI is performed by a UDDI Business Registry.

o XML/EDI, eXtensible Markup Language / Electronic Data Interchange: XML/EDI is an extension of EDI and aims to enhance the EDI mechanisms with the flexibility and extensibility of XML. The basic approach of the XML/EDI framework is to express EDI mechanisms using XML syntax. To reach full dynamic electronic commerce, the XML/EDI framework provides three additional components: process templates containing processing information, software agents interpreting the process templates, and repositories providing the syntactic and semantic information needed for the execution of EDI transactions.
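To make the SOAP message structure described in the SOAP entry above concrete, the following sketch builds and reads back a SOAP 1.2 envelope using Python’s standard xml.etree.ElementTree module. The envelope namespace is the one defined for SOAP 1.2 by the W3C; the payload element names and the urn:example:availability namespace are hypothetical. Transport (for example HTTP) is deliberately omitted, since SOAP itself only defines the message.

```python
# Minimal sketch: build and parse a SOAP 1.2 envelope with the standard library.
import xml.etree.ElementTree as ET

SOAP_ENV = "http://www.w3.org/2003/05/soap-envelope"  # SOAP 1.2 envelope namespace
APP_NS = "urn:example:availability"                  # hypothetical payload namespace

ET.register_namespace("env", SOAP_ENV)
ET.register_namespace("app", APP_NS)


def build_request(hotel_code: str, date: str) -> bytes:
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    ET.SubElement(envelope, f"{{{SOAP_ENV}}}Header")  # optional header blocks go here
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    req = ET.SubElement(body, f"{{{APP_NS}}}AvailabilityRequest")
    ET.SubElement(req, f"{{{APP_NS}}}HotelCode").text = hotel_code
    ET.SubElement(req, f"{{{APP_NS}}}Date").text = date
    return ET.tostring(envelope, encoding="utf-8")


def read_payload(message: bytes) -> ET.Element:
    """Return the first child of the SOAP Body; SOAP does not interpret it."""
    root = ET.fromstring(message)
    body = root.find(f"{{{SOAP_ENV}}}Body")
    if body is None or len(body) == 0:
        raise ValueError("message has no Body payload")
    return body[0]


if __name__ == "__main__":
    msg = build_request("H123", "2009-06-03")
    print(msg.decode("utf-8"))
    payload = read_payload(msg)
    print(payload.find(f"{{{APP_NS}}}HotelCode").text)  # H123
```

Note that read_payload only locates the Body content; what AvailabilityRequest means is left entirely to the communicating applications, in line with SOAP being silent on application semantics.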
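The basic XML/EDI approach of expressing EDI content in XML syntax can be illustrated by converting UN/EDIFACT-style segments into XML elements. This sketch assumes the default EDIFACT service characters ("'" ends a segment, "+" separates data elements, ":" separates components) and does not handle the release character "?". The generated element names (message, segment, element, component) are hypothetical and do not follow any published XML/EDI guideline.

```python
# Minimal sketch: express UN/EDIFACT-style segments in XML syntax.
# Assumes default service characters: "'" ends a segment, "+" separates
# data elements, ":" separates components. The release character "?" is
# not handled.
import xml.etree.ElementTree as ET


def segments_to_xml(interchange: str) -> ET.Element:
    root = ET.Element("message")
    for raw in interchange.split("'"):
        raw = raw.strip()
        if not raw:
            continue
        tag, *elements = raw.split("+")
        seg = ET.SubElement(root, "segment", {"tag": tag})
        for pos, element in enumerate(elements, start=1):
            el = ET.SubElement(seg, "element", {"position": str(pos)})
            # Composite data elements become one <component> per part.
            for cpos, component in enumerate(element.split(":"), start=1):
                comp = ET.SubElement(el, "component", {"position": str(cpos)})
                comp.text = component
    return root


if __name__ == "__main__":
    # Illustrative segment; the values are not tied to a particular message.
    xml_root = segments_to_xml("DTM+137:20090603:102'")
    print(ET.tostring(xml_root, encoding="unicode"))
```

Even this naive mapping makes the positional structure explicit as markup; the process templates, agents and repositories mentioned in the XML/EDI entry would be needed to attach actual meaning to the positions.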

Business semantics:

o BSR, Basic Semantic Register: The BSR is an official ISO data register for use by designers, implementers and users of information systems in a manner which will allow systems development to move from a closed to an open multilingual environment, especially for use in domestic and international electronic communication including electronic commerce and EDI. The purpose of the BSR is to provide an internationally agreed register of multilingual data concepts, semantic units (SU), with its technical infrastructure. This will provide storage, maintenance and distribution facilities for reference data about semantic units and their links (bridges) with operational directories. The semantic units will be built from semantic components, which can be considered as building blocks.

o ISO/IEC 11179, Specification and Standardization of Data Elements: ISO/IEC 11179 is a multi-part International Standard concerning data element specification and standardization. The complete set includes six interrelated parts, with each part focusing on one aspect of data element development and maintenance.

o UNSPSC, United Nations Standard Products and Services Code: The UNSPSC Code is a coding system to classify both products and services. It was established in 1999 by the merger of the UN Common Procurement Code (CPC) list with the Dun and Bradstreet Standard Product and Services Code list. UNSPSC is a five-level hierarchical taxonomy. At each level, a product or service is identified by a two-digit number (and a textual description). For example, the code “90” on the highest level contains “Travel and Food and Lodging and Entertainment Services”. UNSPSC codes can be used in a UDDI registry for the identification of products and services.

o UNTDED, United Nations Trade Data Elements Directory, ISO 7372: The standard data elements included in this Directory are intended to facilitate the interchange of data in international trade. These standard data elements can be used with any method of data interchange, on paper documents as well as with other means of data communication: they can be selected for transmission one by one, or used within a particular system of interchange rules.
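Because every UNSPSC level adds a two-digit pair, as described in the UNSPSC entry above, the hierarchy can be read directly from a code. The sketch below assumes codes of up to five pairs in which higher-level codes are padded with "00" pairs (for example "90000000" for the segment "90"); the level names are those commonly used for UNSPSC, and the longer codes in the usage lines are syntactic examples rather than looked-up entries.

```python
# Minimal sketch: read the UNSPSC hierarchy from a numeric code.
# Assumptions: up to five two-digit levels; higher-level codes are padded
# with "00" pairs (e.g. "90000000" for segment 90).
import re
from typing import List

LEVEL_NAMES = ["segment", "family", "class", "commodity", "business function"]


def levels(code: str) -> List[str]:
    """Split a code into two-digit level pairs, dropping trailing "00" padding."""
    if not re.fullmatch(r"(?:[0-9]{2}){1,5}", code):
        raise ValueError(f"unexpected UNSPSC code: {code!r}")
    pairs = [code[i:i + 2] for i in range(0, len(code), 2)]
    while pairs and pairs[-1] == "00":
        pairs.pop()
    return pairs


def is_ancestor(parent: str, child: str) -> bool:
    """True if `parent` lies strictly above `child` in the hierarchy."""
    p, c = levels(parent), levels(child)
    return len(p) < len(c) and c[:len(p)] == p


def describe(code: str) -> List[str]:
    return [f"{LEVEL_NAMES[i]}={pair}" for i, pair in enumerate(levels(code))]


if __name__ == "__main__":
    print(describe("90110000"))                 # ['segment=90', 'family=11']
    print(is_ancestor("90000000", "90111500"))  # True
```

Prefix comparison is what makes such codes convenient for registry searches: everything below segment 90 can be found without a separate hierarchy table.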

Modelling languages:

o DCMI, Dublin Core Metadata Initiative: The Dublin Core Metadata Initiative (DCMI) is an organization dedicated to promoting the widespread adoption of interoperable metadata standards and developing specialized metadata vocabularies for describing resources that enable more intelligent information discovery systems. The DCMI is committed to the continual refinement of a “core” foundation of property values and types to provide vertically specific (or semantic) information about Web resources, much in the same way as library card catalogues provide indexed information about book properties. The Dublin Core Metadata Element Set (DCMES) was the first metadata standard developed out of the DCMI as an IETF standard. DCMES provides a semantic vocabulary for describing “core” information properties, such as “Description”, “Creator” and “Date”.

o GINF, Generic Interoperability Framework: The Generic Interoperability Framework (GINF) has been developed to facilitate the integration of heterogeneous components. One of the main principles it employs is the generic representation of protocols, languages, data and interface descriptions. The current implementation of the framework is based on RDF. The implementation of GINF provides semantic-oriented middleware for application development and integration. GINF middleware allows the creation of open and highly extensible client/server applications.

o OIL, Ontology Inference Layer: OIL is a proposal for a web-based representation and inference layer for ontologies, which combines the widely used modelling primitives from frame-based languages with the formal semantics and reasoning services provided by description logics. It is compatible with RDF Schema (RDFS), and includes a precise semantics for describing term meanings (and thus also for describing implied information).

o Object Management Group: The OMG was formed to create a component-based software marketplace by hastening the introduction of standardized object software. The organization’s charter includes the establishment of industry guidelines and detailed object management specifications to provide a common framework for application development. Primary goals are the reusability, portability, and interoperability of object-based software in distributed, heterogeneous environments. Specific task forces develop specifications for particular markets or domains (domain interfaces), e.g. the Transportation or the Retail group.

o RDF, Resource Description Framework: RDF provides the foundation for metadata interoperability across different resource description communities. RDF allows descriptions of Web resources (any object with a Uniform Resource Identifier (URI) as its address) to be made available in machine-understandable form. This enables the semantics of objects to be expressible and exploitable. RDF is based on a concrete formal model using directed graphs that allude to the semantics of resource description. The basic concept is that a Resource is described through a collection of Properties called an RDF Description. Each of these Properties has a Property Type and Value. Any resource can be described with RDF as long as the resource is identifiable with a URI.

o UML, Unified Modelling Language: The Unified Modelling Language (UML) is a language for specifying, visualizing, constructing, and documenting the artefacts of software systems, as well as for business modelling and other non-software systems. The Unified Modelling Language fuses the concepts of object-oriented analysis and design approaches (Booch, OMT and OOSE). The result is a single, common, and widely usable modelling language.
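A Dublin Core description of a tourism resource, following the DCMI entry above, might be serialized as shown below. The sketch uses the DCMES element namespace http://purl.org/dc/elements/1.1/ and rejects names outside the fifteen DCMES elements; the enclosing record element and the sample values are hypothetical.

```python
# Minimal sketch: a Dublin Core (DCMES) description as simple XML.
import xml.etree.ElementTree as ET
from typing import Dict

DC = "http://purl.org/dc/elements/1.1/"  # DCMES element namespace
DCMES_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description", "format",
    "identifier", "language", "publisher", "relation", "rights", "source",
    "subject", "title", "type",
}
ET.register_namespace("dc", DC)


def dc_record(properties: Dict[str, str]) -> ET.Element:
    """Build a record; keys must be DCMES element names such as 'title'."""
    unknown = set(properties) - DCMES_ELEMENTS
    if unknown:
        raise ValueError(f"not DCMES elements: {sorted(unknown)}")
    record = ET.Element("record")  # hypothetical container element
    for name, value in properties.items():
        ET.SubElement(record, f"{{{DC}}}{name}").text = value
    return record


if __name__ == "__main__":
    rec = dc_record({
        "title": "Guide to the old town",
        "creator": "Example Tourist Office",
        "date": "2009-06-03",
        "description": "Walking tour with museum opening hours.",
    })
    print(ET.tostring(rec, encoding="unicode"))
```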
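The RDF model of resources described by properties and values, as outlined in the RDF entry above, can be sketched without any RDF library by storing statements as (subject, predicate, object) triples, i.e. labelled edges of a directed graph. The resource URI under http://example.org/ is hypothetical; the predicates reuse Dublin Core element URIs. This toy store does not distinguish literals from URIs and has no serialization; a real application would more likely use an RDF toolkit and a standard syntax such as RDF/XML.

```python
# Minimal sketch of the RDF data model: statements are (subject, predicate,
# object) triples, i.e. labelled edges in a directed graph.
from typing import Dict, Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element URIs as predicates


class Graph:
    def __init__(self) -> None:
        self.triples: Set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self.triples.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Iterator[Triple]:
        """Yield the triples matching a pattern; None acts as a wildcard."""
        for t in self.triples:
            if ((s is None or t[0] == s) and (p is None or t[1] == p)
                    and (o is None or t[2] == o)):
                yield t

    def description(self, s: str) -> Dict[str, List[str]]:
        """Collect all property values of one resource, like an RDF Description."""
        props: Dict[str, List[str]] = {}
        for _, p, o in self.match(s=s):
            props.setdefault(p, []).append(o)
        return {p: sorted(values) for p, values in props.items()}


if __name__ == "__main__":
    g = Graph()
    hotel = "http://example.org/hotel/42"  # hypothetical resource URI
    g.add(hotel, DC + "title", "Hotel Example")
    g.add(hotel, DC + "subject", "accommodation")
    g.add(hotel, DC + "subject", "city centre")
    print(g.description(hotel))
```

The pattern-matching method is the essential operation: because every statement has the same three-part shape, queries across descriptions from different sources need no per-source schema.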

6.1.3 Gaps and future needs

Standards are becoming an increasingly central issue in the so-called information and knowledge-based society. Information and communication technologies are having a tremendous impact on the travel and tourism industry, transforming the whole sector on both the industry side and the consumer side. Within this context, standards in the travel and tourism industry ought to provide integration and exchange of heterogeneous sources of distributed tourism information, so that processes (reservation, purchasing, checking, etc.) can be carried out seamlessly, regardless of location, communication technology or language.

There is a need to provide (semantic) definitions and clarifications in order to transform disparate localized information into a global, coherent resource on the Internet (the most common communication platform and environment in this case).

The following important functionalities should be obtained:

Define a common language for domain experts and IT developers to formulate requirements and to agree upon system functionalities with respect to the correct handling of tourism information.

Support (semi-)automatic data transformation from local to global structures without (essential) loss of meaning.

Support associative queries against integrated resources by providing a global model of the basic classes and their associations with which to formulate such queries.

Define data structures that can be applied, used and processed by the majority of organizations in the travel and tourism industry.

Define the level of formality with which the different agents of the travel and tourism sector ought to define data, so that these data can be compared and processed automatically and efficiently by machines, with as little human intervention as possible.
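The requirement to transform local structures into global ones without essential loss of meaning can be illustrated with a small mapping sketch in the spirit of the reference-model approach of the IFITT RMSIG entry earlier: each local format is mapped once onto a shared vocabulary, so that N formats need N mappings instead of N x (N - 1) pairwise converters. Both local formats, their field names and the reference concepts are hypothetical, and the sketch does not reproduce the IFITT reference model itself.

```python
# Minimal sketch: translate records between two hypothetical tourism data
# formats by routing them through a shared reference vocabulary.
from typing import Dict

# Hypothetical mappings: local field name -> reference concept.
FORMAT_A_TO_REF: Dict[str, str] = {
    "hotelName": "accommodation.name",
    "stars": "accommodation.classification",
    "town": "location.city",
}
FORMAT_B_TO_REF: Dict[str, str] = {
    "nom": "accommodation.name",
    "categorie": "accommodation.classification",
    "commune": "location.city",
}


def to_reference(record: Dict[str, str], mapping: Dict[str, str]) -> Dict[str, str]:
    """Lift a local record into reference concepts (unmapped fields are dropped)."""
    return {mapping[k]: v for k, v in record.items() if k in mapping}


def from_reference(ref: Dict[str, str], mapping: Dict[str, str]) -> Dict[str, str]:
    """Lower reference concepts into another local format."""
    inverse = {concept: field for field, concept in mapping.items()}
    return {inverse[c]: v for c, v in ref.items() if c in inverse}


def translate(record: Dict[str, str], src: Dict[str, str],
              dst: Dict[str, str]) -> Dict[str, str]:
    return from_reference(to_reference(record, src), dst)


if __name__ == "__main__":
    a = {"hotelName": "Hotel Example", "stars": "3", "town": "Brussels"}
    print(translate(a, FORMAT_A_TO_REF, FORMAT_B_TO_REF))
    # {'nom': 'Hotel Example', 'categorie': '3', 'commune': 'Brussels'}
```

Silently dropping unmapped fields, as this sketch does, is precisely the kind of loss of meaning the requirement warns against; a production mapping would have to report such fields instead.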

6.1.4 Recommendations

6.1.4.1 Short-term recommendations (1–3 years)

Leverage existing standards rather than developing new specifications whenever possible.

Build cooperation between private associations such as IATA, OTA and XFT and formal standardization bodies such as ISO and CEN.

Build a “watchtower” registry of relevant eTourism standards that also acts as a coordination body between various formal and informal standardization activities. Such an activity can be modelled on the MoU/MG.

6.1.4.2 Long-term recommendations (3–10 years)

Lower the entry barrier for participation in pertinent formal and informal standardization bodies, especially for SMEs, and extend the scope of those activities to cover the SMEs' respective requirements.

Work on interoperability approaches between different standards.

6.2 Taxonomies

6.2.1 Needs and requirements

6.2.1.1 Introduction

Traditionally, all sciences classify their objects. Astronomy classifies celestial bodies such as planets, stars and galaxies; botany classifies plants; chemistry classifies chemicals; medicine classifies illnesses; psychology classifies mental processes; library and information science classify documents as well as systems and methods of knowledge organization; religious studies classify religions; and the list could go on.

Such classifications are not produced merely for aesthetic effect. Classifications are constructed to work efficiently and to provide the means to find and retrieve meaningful, required information efficiently. Classification is not something extra put on top of scientific work; rather, it is deeply integrated within scientific work itself, as it provides a deeper understanding of the subject matter of study.

For example, if a new group of chemical substances is found to help cure a certain disease, and this fact is widely demonstrated, the group will be classified as a kind of drug (e.g. antidepressants, tranquillizers or anti-inflammatory drugs) that helps humans recover from that particular disease.

There is a close connection between the development of scientific concepts and classifications. E.g. when an astronomer recognizes what is distinctive about the nature of a particular star or planet, s/he reflects this fact both in his/her conception of the item and in its later classification within the table of celestial bodies. Classification is carried out under various criteria and aims at distributing entities into different groups whose members share one or several common features.

6.2.1.2 Needs

Due to the tremendous impact that information and communication technologies have had upon the travel and tourism industry, the whole sector has to be rethought. Although traditional (tourism) research remains valid, research in other realms (especially IT-related ones) is also needed, e.g. new information management systems (storage, management, access, retrieval), new communication technologies, channels and platforms, devices that allow people on the move to access and receive information-based services, as well as new consumption patterns and behaviours. All of this is only possible if information is classified, stored and organized in agreed ways by all agents (public institutions and bodies, industry, research communities, final users, etc.). Relevant tourism information in general, its organization within information management systems and its explicit specification through schemas or information representation methods and models need to be defined.

Information and content are key. To access the right piece of information at the right moment, information needs to be clearly stored and classified. Almost anything (including tourism information, i.e. travel, accommodation, restaurants, events, useful information, etc.) has to be classified following a structure, e.g. a taxonomic schema.

6.2.1.3 Requirements

As the amount of available information of all kinds increases on the web, the particular piece of information we seek may be buried in information we do not seek. Thus, classifying information becomes increasingly important, as it makes it easier to find particular content on the web. For a company, this translates into business opportunities in terms of the services it provides. Easily available information is even more important to those planning some kind of leisure activity, as their behaviour pattern indicates that they will not spend long on web sites looking for information. Thus, information has to be object-oriented, not experience-oriented. However, in order to build a successful tourism taxonomy, both approaches are required.

Taxonomies or conceptual hierarchies are crucial for any knowledge-based system, i.e. any system making use of declarative knowledge about the domain it deals with. For a knowledge system to succeed, it has to be easy enough for anyone to use without specialized training or background in its content. Information has to be classified in a meaningful and general way by taxonomists and tourism domain experts, and this requires strict control over the creation of new entities and branches. Information management principles and practices, taxonomies and other controlled vocabularies serve as knowledge management tools that can help organize content and connect people with the information they need.

6.2.2 State of the art

Taxonomies are one way of classifying things into groups. There is a significant difference between describing the objects being classified and describing the subjects used to classify them. Taxonomies (and other classification techniques) are different approaches to describing subjects: a taxonomy is a subject-based classification that arranges the terms of a controlled vocabulary into a hierarchy and does nothing further. In practice, taxonomies may also be found applied within more complex structures.

The benefit of this (taxonomic) approach is that it allows related terms to be grouped together and categorized in ways that make it easier to find the correct term for whatever purpose. Within the tourism domain, if there is a taxonomic classification for the notion of “Event”, different “Event”s (e.g. sport events, cultural events, etc.) could be classified under the general one, which would allow a tourist to easily find the kind of “Event” s/he wants to attend.
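The “Event” example can be sketched in Python under simple assumptions. The taxonomy is stored as a mapping from each term to its narrower terms, and a search for a general kind of event also returns events classified under any narrower term. The terms and event listings are hypothetical.

```python
# Hypothetical fragment of an "Event" taxonomy: term -> narrower terms.
NARROWER = {
    "Event": ["Sport event", "Cultural event"],
    "Sport event": ["Football match", "Marathon"],
    "Cultural event": ["Concert", "Exhibition", "Festival"],
}

# Hypothetical event listings, each tagged with its most specific term.
EVENTS = {
    "City Marathon": "Marathon",
    "Summer Jazz Night": "Concert",
    "Modern Art Show": "Exhibition",
}


def all_narrower(term: str) -> list[str]:
    """Return every term below `term` in the hierarchy, depth-first."""
    result = []
    for child in NARROWER.get(term, []):
        result.append(child)
        result.extend(all_narrower(child))
    return result


def find_events(kind: str) -> list[str]:
    """Find events of `kind` or of any narrower kind."""
    wanted = {kind, *all_narrower(kind)}
    return sorted(name for name, term in EVENTS.items() if term in wanted)


print(find_events("Cultural event"))  # ['Modern Art Show', 'Summer Jazz Night']
```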

Etymologically speaking, the word “taxonomy” comes from the Greek taxis (“arrangement, order”) and nomos (“law”).

Each unit in a taxonomy is termed a taxon (plural: taxa). Initially, taxonomy was only the science of classifying living organisms and species, but the word was later applied in a wider sense and may also refer to either a classification of things or the principles underlying that classification. Classification of species, however, began well before the eighteenth century. Aristotle distinguished species by habitat and means of reproduction, but Andrea Cesalpino produced the first significant taxonomy of plants in 1583, arranging the species in a hierarchical, graded order. His work was developed by Marcello Malpighi, who expanded his hierarchical system to include animals. The word taxonomy is sometimes used synonymously with classification and sometimes given a special meaning.

There have also been some attempts to differentiate taxonomies from simple classifications. These attempts may also serve as a review of the different definitions authors have given to the notion of taxonomy. “A taxonomy obtains when several fundamenta divisionis are considered in succession, rather than simultaneously, by an intensional cl. [classification]. The order in which fundamenta are considered is highly relevant: the taxonomy obtained by using property X to classify a genus and then property Y to classify its species is by no means the same as that obtained by considering property Y first and property X afterwards” [Marradi, 1990].
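Marradi's point about the order of the fundamenta can be shown with a small Python sketch. The same hypothetical accommodation items are divided successively by two properties, first type and then location, and then in the reverse order. The two resulting hierarchies differ. The items and property names are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical items described by two properties (fundamenta divisionis).
ITEMS = [
    {"name": "A", "type": "hotel", "location": "coast"},
    {"name": "B", "type": "hotel", "location": "mountain"},
    {"name": "C", "type": "campsite", "location": "coast"},
]


def classify(items, first, second):
    """Divide items by `first`, then divide each group by `second`."""
    tree = defaultdict(lambda: defaultdict(list))
    for item in items:
        tree[item[first]][item[second]].append(item["name"])
    # Convert to plain dicts so the result is easy to compare and print.
    return {genus: dict(species) for genus, species in tree.items()}


by_type_first = classify(ITEMS, "type", "location")
by_location_first = classify(ITEMS, "location", "type")
print(by_type_first)
# {'hotel': {'coast': ['A'], 'mountain': ['B']}, 'campsite': {'coast': ['C']}}
print(by_location_first)
# {'coast': {'hotel': ['A'], 'campsite': ['C']}, 'mountain': {'hotel': ['B']}}
```

The leaves are identical, but the genera at the top of each hierarchy (and therefore the taxonomies) are not.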

Campbell & Currier (31/10/00) [Campbell, Currier] ask “What is a taxonomy?” and provide the following answer:

A taxonomy is an ordered classification system. Information is grouped according to presumed natural relationships. Ordered resources are grouped like with like. The structure of a taxonomy should be consistent with user groups’ conceptualization of their subject.

6.2.2.1 Examples of tourism taxonomies

There is a vast number of taxonomic classifications within the tourism domain in the literature. Almost every project applying information management methods uses a taxonomy to organize the existing information of its universe of discourse (the project to be developed). Taxonomies are later used to design database structures, ontologies and other tools so that the information is easily accessible and retrievable for the final user.

In commercial web sites and online travel agencies, services are often organized under taxonomies, e.g. restaurants and kinds of restaurants. Accommodation facilities are organized under different categories (hostel, 5-star hotel, 4-star hotel, etc.) or even in price ranges, depending on the search criteria.
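Organizing the same offers under different criteria depending on the search can be sketched in Python. The example groups offers either by category or by price range. The offers, the band limits and the band names are hypothetical. `bisect_right` is used so that each limit is the exclusive upper bound of its band.

```python
from bisect import bisect_right

# Hypothetical price bands: exclusive upper limits in EUR per night.
BAND_LIMITS = [50, 100, 200]
BAND_NAMES = ["under 50", "50-99", "100-199", "200 and over"]

# Hypothetical offers: (name, category, price per night in EUR).
OFFERS = [
    ("Youth Hostel Centro", "hostel", 25),
    ("Hotel Mar", "4-star hotel", 120),
    ("Palace Hotel", "5-star hotel", 310),
    ("Hotel Plaza", "4-star hotel", 95),
]


def price_band(price: float) -> str:
    """Map a nightly price to its band name."""
    return BAND_NAMES[bisect_right(BAND_LIMITS, price)]


def group_by(criterion: str) -> dict[str, list[str]]:
    """Group offer names by 'category' or by 'price' band (the search criterion)."""
    key_functions = {
        "category": lambda category, price: category,
        "price": lambda category, price: price_band(price),
    }
    key_of = key_functions[criterion]  # KeyError for an unknown criterion
    groups: dict[str, list[str]] = {}
    for name, category, price in OFFERS:
        groups.setdefault(key_of(category, price), []).append(name)
    return groups


print(group_by("category")["4-star hotel"])  # ['Hotel Mar', 'Hotel Plaza']
print(group_by("price")["100-199"])         # ['Hotel Mar']
```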

Other examples found in the literature are:

Cultural Tourism Taxonomies and Folksonomies: the objective of the taxonomy is to develop a comprehensive map of the elements of cultural heritage that attract different people to towns and cities by:

o identifying and categorizing a range of cultural attractors;

o identifying interests and motivators for different types of tourists;

o identifying relations of attraction between attractors and interests;

Cultural Tourism Taxonomies and Folksonomies: a project in COST Action C21 defines and builds a taxonomy of tourist attraction sites. Twenty-three types of attractions are defined based on Prentice’s typology. Building a classification is a complex process. Problems arise, as in the Dewey classification, where most of the topics concern European rather than worldwide culture. In the urban domain, classifications tend to be object-oriented, like the Dewey classification, rather than experience-oriented. Experience-oriented classification is based on usage. A folksonomy is an ethnoclassification: the goal is to define categories rather than to build a correct classification. For example, the Flickr web site lists the most popular tags used for photographs.
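The usage-based nature of a folksonomy can be illustrated with a short Python sketch. Free tags attached by users are counted, and the most popular ones emerge as categories. The photos and tags are hypothetical. Lower-casing tags before counting is an assumption of this sketch and does not describe how Flickr actually computes its popular tags.

```python
from collections import Counter

# Hypothetical user-tagged photos: free tags, no controlled vocabulary.
PHOTOS = [
    {"id": 1, "tags": ["Beach", "sunset", "Barcelona"]},
    {"id": 2, "tags": ["beach", "surf"]},
    {"id": 3, "tags": ["barcelona", "architecture", "Gaudi"]},
    {"id": 4, "tags": ["sunset", "beach"]},
]


def popular_tags(photos, n=3):
    """Count tags case-insensitively and return the n most frequent."""
    counts = Counter(tag.strip().lower() for photo in photos for tag in photo["tags"])
    # Ties keep the order in which tags were first encountered.
    return counts.most_common(n)


print(popular_tags(PHOTOS))  # [('beach', 3), ('sunset', 2), ('barcelona', 2)]
```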

6.2.3 Gaps and future needs

As the preceding review of the most significant literature shows, no single (correct) way of classifying things exists. Furthermore, the same instances could be classified in different ways with different objects (possibly depending on their application scenario), and different objects could be instantiated using the same meaning. Consequently, there is a need to find an agreement in the community involving all possible agents: public administration and regulatory bodies, industry, final users, the research community, etc.

In a taxonomy, the means for subject description consist of essentially one relationship: the broader/narrower relationship used to build the hierarchy. The set of terms being described is of course open, but the language used to describe them is closed, since it consists of only a single relationship. Before actually developing the taxonomy, one needs to define the scope of the classification, its purpose and the types of content formats. It is crucial to bear in mind at all times the target audience and communities who will use it. A needs assessment or interviews can be carried out to identify and focus on the content that final users care about and on the organization of that content.
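Because a taxonomy relies on a single broader/narrower relationship, basic quality checks on it are simple. The Python sketch below assumes a strict mono-hierarchy (each term has at most one broader term), which is a design choice and not a requirement of every taxonomy. It reports terms with several broader terms and cycles. The terms are hypothetical.

```python
# Hypothetical taxonomy stored as (narrower, broader) pairs: its only relationship.
BROADER = [
    ("Sport event", "Event"),
    ("Cultural event", "Event"),
    ("Concert", "Cultural event"),
    ("Marathon", "Sport event"),
]


def check_hierarchy(pairs) -> list[str]:
    """Return problems found; an empty list means the pairs form a strict tree."""
    problems = []
    parent = {}
    for narrower, broader in pairs:
        if narrower in parent and parent[narrower] != broader:
            problems.append(f"{narrower!r} has more than one broader term")
        parent.setdefault(narrower, broader)
    for term in parent:
        seen, current = {term}, term
        while current in parent:  # walk up towards the top term
            current = parent[current]
            if current in seen:
                problems.append(f"cycle reachable from {term!r}")
                break
            seen.add(current)
    return problems


print(check_hierarchy(BROADER))  # []
print(check_hierarchy(BROADER + [("Concert", "Sport event")]))
# ["'Concert' has more than one broader term"]
```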

Taxonomies usually require strict control over the creation of new entities and branches, and this restriction needs to be overcome, especially given the way information is consumed on the web. Systems need to be as dynamic, i.e. as flexible, as possible.

Traditional information systems have classified information according to a particular hierarchy. Now, some information systems allow links to be introduced, which makes information available in an easier way. In the future, information systems will not introduce classification methodologies; rather, they will categorize their content and make it available via tags, links, etc.

6.2.4 Recommendations

6.2.4.1 Short-term recommendations (1–3 years)

Follow existing taxonomies, including established definitions, wherever possible.

Produce mappings between eTourism-related taxonomies.

Federate existing eTourism-related taxonomies across languages based on the mappings and offer a SKOS interface to them.

Formulate guidelines for the design of eTourism-related taxonomies.
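A minimal Python sketch, using only the standard library, shows how two federated taxonomy concepts with multilingual labels and a mapping between them might be published in SKOS (Turtle syntax). The SKOS namespace and the properties `skos:Concept`, `skos:prefLabel` and the mapping properties (`exactMatch`, `closeMatch`, `broadMatch`, `narrowMatch`, `relatedMatch`) come from the W3C SKOS vocabulary. The example.org URIs, labels and the mapping itself are hypothetical. A production interface would more likely use an RDF library and could also expose a query endpoint.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
SKOS_MAPPING_PROPERTIES = {"exactMatch", "closeMatch", "broadMatch", "narrowMatch", "relatedMatch"}

# Hypothetical concepts from two eTourism taxonomies, with multilingual labels.
CONCEPTS = {
    "http://example.org/tax-a/Hostel": {"en": "Hostel", "de": "Jugendherberge"},
    "http://example.org/tax-b/Albergue": {"es": "Albergue"},
}
# Hypothetical result of a mapping exercise between the two taxonomies.
MAPPINGS = [("http://example.org/tax-a/Hostel", "exactMatch", "http://example.org/tax-b/Albergue")]


def to_turtle(concepts, mappings) -> str:
    """Serialize concepts, labels and mappings as SKOS in Turtle syntax."""
    lines = [f"@prefix skos: <{SKOS}> ."]
    for uri, labels in concepts.items():
        lines.append(f"<{uri}> a skos:Concept .")
        for lang, label in sorted(labels.items()):
            escaped = label.replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'<{uri}> skos:prefLabel "{escaped}"@{lang} .')
    for source, relation, target in mappings:
        if relation not in SKOS_MAPPING_PROPERTIES:
            raise ValueError(f"not a SKOS mapping property: {relation}")
        lines.append(f"<{source}> skos:{relation} <{target}> .")
    return "\n".join(lines)


print(to_turtle(CONCEPTS, MAPPINGS))
```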

6.2.4.2 Long-term recommendations (3–10 years)

Build organizational structures to sustain eTourism-related taxonomies over the long term.

6.3 Ontologies

6.3.1 Needs and requirements

6.3.1.1 Introduction

The word “Ontology” (note the upper-case ‘O’) originally comes from philosophy. From a philosophical point of view, Ontology is the branch of philosophy which deals with the nature and the organization of reality [Guarino, Giaretta, 1995]. We have to go as far back as Aristotle to find the first reference to this notion, when he tries to define a “science” that is “on top of” the rest of the sciences, describing in Book IV of his Metaphysics a science that studies being as being (i.e. Ontology):

“There is a science that studies the being as being and its properties as such (being) which belong to it in virtue of its nature. Now, this science is not the same as any of the so called special sciences, since none of these other treat (universally) the being as being itself but reducing the being to one part of it, they (“only”) investigate the essential properties of this part. Since we are seeking the first principles and the highest causes, there must (clearly) be something to which these belong in virtue of its own nature. If then, those who sought the elements of existing things were seeking these same principles, it is necessary that the elements must be elements of being not only by accident but just because it is being. Therefore, it is of being as being that we also must grasp the first causes” [Aristotle, Metaphysics Book IV].

In the computer science domain, ontologies (note now the lower-case ‘o’) aim at capturing domain knowledge in a generic way and providing a commonly agreed understanding of a domain which may be reused and shared across applications and groups. Ontologies provide a common vocabulary of an area and define, with different levels of formality, the meaning of terms and the relations between them. Since the beginning of the 1990s, ontologies have become a popular research topic investigated by several Artificial Intelligence research communities, including knowledge engineering, natural language processing and knowledge representation. More recently, the notion of an ontology has also become widespread in fields such as intelligent information integration, information retrieval on the Internet, and knowledge management. Ontologies are popular largely because of what they promise: a shared and common understanding of some domain that can be communicated across people and computers.

6.3.1.2 Needs

In recent years, the development of ontologies has been moving from Artificial Intelligence (AI) laboratories to the desktops of domain experts. Ontologies have become common on the World Wide Web, ranging from large taxonomies categorizing web sites to categorizations of products for sale and their features. Many disciplines now develop standardized ontologies that domain experts can use to share and annotate information in their fields. Why would someone want to develop an ontology? Here are some of the possible reasons:

Clarification of knowledge structures

Ontological analysis clarifies the structure of knowledge. The first reason is that ontologies form the heart of any system of knowledge representation: without conceptualizations underlying knowledge, there is no vocabulary for representing knowledge. Thus, the first step in knowledge representation is performing an effective ontological analysis of some field of knowledge. Weak analyses lead to incoherent knowledge bases.

Consider a domain in which there are people, some of whom are students, some professors, some other types of employees, some female and some male. For quite some time, a simple ontology was used in which the classes of students, employees, professors, males and females were represented as “types of” humans. Soon this caused problems, because it was noted that students could also be employees at times and could also stop being students. Further ontological analysis showed that “students”, “employees”, etc. are not “types of” humans but rather “roles” that humans can play, unlike categories such as “females”, which are in fact a “type of” human. Clarifying the ontology of this data domain made it possible to avoid various difficulties in reasoning about the data.
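The distinction between types and roles can be illustrated with a small Python data model. It is a sketch under stated assumptions. The person's category is stored once as a fixed attribute, while roles are time-bounded assignments, so a person can hold several roles at the same time and can stop playing a role. The class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class RoleAssignment:
    """A role a person plays for a period of time (hypothetical model)."""
    role: str                   # e.g. "student", "employee", "professor"
    start: date
    end: Optional[date] = None  # None: the role is still being played


@dataclass
class Person:
    name: str
    category: str  # e.g. "female": a "type of" human, fixed for the individual
    roles: list[RoleAssignment] = field(default_factory=list)

    def roles_on(self, day: date) -> set[str]:
        """Roles held on a given day; several roles may overlap."""
        return {r.role for r in self.roles
                if r.start <= day and (r.end is None or day <= r.end)}


ana = Person("Ana", "female", [
    RoleAssignment("student", date(2005, 9, 1), date(2009, 6, 30)),
    RoleAssignment("employee", date(2008, 1, 1)),
])
print(sorted(ana.roles_on(date(2008, 6, 1))))  # ['employee', 'student']
print(sorted(ana.roles_on(date(2009, 9, 1))))  # ['employee']
```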

Knowledge sharing

Ontologies enable knowledge sharing. The second reason why ontologies are important is that they provide a means of sharing knowledge. Suppose we carry out an analysis and arrive at a satisfactory set of conceptualizations, and terms standing for them, for some area of knowledge, say, the domain of “electronic devices”. The resulting ontology would be likely to include terms such as “transistors” and “diodes”, more general terms such as “functions” and “processes”, and also terms from the electrical domain, such as “voltage”, that could be necessary to represent the behaviour of these devices. It is important to note that the ontology (defined by the basic concepts involved and their relations) is intrinsic to the domain, independently of the choice of vocabulary used to represent it. This ontology can be shared with others who have similar needs for knowledge representation in that domain, avoiding the need to replicate the knowledge analysis.

6.3.2 State of the art

The building of a large common-sense knowledge base began as early as the mid-1980s. This knowledge base can be considered an ontology (probably the first one). However, it was not until the beginning of the 1990s that ontologies became more widely known.

It was at that time that DARPA (Defense Advanced Research Projects Agency) started its Knowledge Sharing Effort, which envisioned a new way in which intelligent systems could be built [Neches, Fikes, Finin, et al, 1991]. At the time, building knowledge-based systems usually entailed constructing new knowledge bases from scratch; instead, it could be done by assembling reusable components. System developers would then only need to worry about creating the specialized knowledge. The new system would interoperate with existing systems, using them to perform some of its reasoning. In this way, declarative knowledge, problem-solving techniques and reasoning services would all be shared among applications. This approach would facilitate building bigger and better systems at lower cost.

Since then, considerable progress has been made in developing the conceptual bases needed for building technology that allows knowledge components to be reused and shared.

6.3.2.1 Definitions of the notion of ontology within the computer science domain

One of the first definitions of the word “ontology” within the computer science domain is due to Neches et al. [1991]. They defined an ontology as follows: “An ontology defines the basic terms and relations comprising the vocabulary of a topic area as well as the rules for combining terms and relations to define extensions to the vocabulary”.

This definition gives some clues, albeit vague ones, about how to proceed to build an ontology:

identify basic terms and relations between them, identify rules to combine them, and provide definitions of such terms and relations.

Later, in 1993, Gruber’s definition became the most referenced in the literature. His definition of an ontology is: “An ontology is an explicit specification of a conceptualization”. This definition was later refined in the literature to “a formal, explicit specification of a shared conceptualization”. Conceptualization refers to an abstract model of phenomena in the world obtained by identifying the relevant concepts of those phenomena. Explicit means that the types of concepts used and the constraints on their use are clearly defined. Formal refers to the fact that the ontology should be machine-readable and processable. Shared reflects the notion that an ontology captures consensual knowledge, that is, knowledge that is not private to some individual but accepted by a representative group of users belonging to a particular domain of knowledge.

Finally, we include here Uschold and Grüninger’s [1993] definition of an ontology: “Ontology is the term used to refer to the shared understanding of some domain of interest which may be used as a unifying framework to solve problems e.g. semantic interoperability, structuring and representing relevant concepts in a large knowledge base, etc.” In conclusion, there are as many definitions of this word as there are authors, although these last two are the most used in the reviewed literature.

6.3.2.2 Main components of an ontology

Ontologies provide a common vocabulary of an area and define, with different levels of formality, the meaning of the terms and the relations between them. Knowledge in ontologies is mainly formalised using five kinds of components: classes, relations, functions, axioms and instances [Gruber, 1993].

Classes (also called concepts) in the ontology are usually organized in taxonomies. Classes or concepts are used in a broad sense: a concept can be anything about which something is said and, therefore, could also be the description of a task, function, action, strategy, reasoning process, etc.

Relations represent a type of interaction between concepts of the domain. They are formally defined as any subset of a product of n sets.

Functions are a special case of relations in which the n-th element of the relationship is unique for the n-1 preceding elements.

Axioms are used to model sentences that are always true. They can be included in an ontology for several purposes, such as defining the meaning of ontology components, defining complex constraints on the values of attributes, the arguments of relations, etc., verifying the correctness of the information specified in the ontology or deducing new information.

Instances are used to represent specific elements.
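The five kinds of components can be made concrete with a toy Python sketch. It is an illustration only and not a formalism such as OWL or Ontolingua. Classes form a taxonomy, instances belong to classes, a relation is a set of tuples, a check tests whether a relation is a function, and an axiom states that `LOCATED_IN` is a subset of the product Accommodation × City. All names are hypothetical.

```python
from itertools import product

# Classes organized in a taxonomy (class -> broader class); hypothetical example.
SUBCLASS_OF = {"Hotel": "Accommodation", "Hostel": "Accommodation",
               "Accommodation": "Thing", "City": "Thing"}

# Instances: specific elements and their most specific class.
INSTANCE_OF = {"HotelSol": "Hotel", "HostelLuna": "Hostel", "Valencia": "City"}

# A binary relation: a subset of the product Accommodation x City.
LOCATED_IN = {("HotelSol", "Valencia"), ("HostelLuna", "Valencia")}


def is_a(instance: str, cls: str) -> bool:
    """True if the instance belongs to cls directly or through the taxonomy."""
    current = INSTANCE_OF.get(instance)
    while current is not None:
        if current == cls:
            return True
        current = SUBCLASS_OF.get(current)
    return False


def members(cls: str) -> set[str]:
    return {i for i in INSTANCE_OF if is_a(i, cls)}


def is_function(relation: set[tuple]) -> bool:
    """A relation is a function if its last element is unique for the preceding n-1."""
    seen: dict[tuple, object] = {}
    for *args, last in relation:
        if seen.setdefault(tuple(args), last) != last:
            return False
    return True


def located_in_axiom_holds() -> bool:
    """Axiom: every located_in pair links an Accommodation to a City."""
    return LOCATED_IN <= set(product(members("Accommodation"), members("City")))


print(is_function(LOCATED_IN), located_in_axiom_holds())  # True True
```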

6.3.2.3 Ontology development tools

The tools that can be used for building ontologies usually provide a graphical user interface, which allows ontologists to create ontologies without directly using a specific ontology specification language. Some tools, such as Protégé, Chimaera and FCA-Merge, support merging and integrating ontologies.

In the context of the Semantic Web, several tools have arisen in recent years for annotating web resources in SHOE, RDF, DAML+OIL and OWL. Their main objective is the creation and maintenance of ontology-based markup in static web documents; in fact, they make it easy to manage instances, attributes and relationships between web resources. Some of these annotation tools are OntoAnnotate, OntoMAt and SHOE Knowledge Annotator.

There are also some ontology-based text mining tools, which allow ontologies to be extracted from structured, semi-structured or free text. These tools are used to learn ontologies from natural language, exploiting the interacting constraints on the various language levels (from morphology to pragmatics and background knowledge) in order to discover new concepts and stipulate relationships between concepts.
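One classic, deliberately simple ontology-learning technique is a lexico-syntactic pattern in the style of Hearst's “X such as A, B and C”, which proposes is-a relations between A, B, C and X. The Python sketch below implements only this one pattern, handles only single-word hypernyms, and uses hypothetical input text. Real text-mining tools combine many patterns with deeper linguistic analysis.

```python
import re

# "X such as A, B and C": X is the candidate broader concept (hypernym).
PATTERN = re.compile(r"(\w+) such as ([^.;]+)")


def extract_isa(text: str) -> list[tuple[str, str]]:
    """Return (hyponym, hypernym) pairs proposed by the 'such as' pattern."""
    pairs = []
    for match in PATTERN.finditer(text):
        hypernym = match.group(1).lower()
        # Split the enumeration on commas and on the words "and"/"or".
        items = re.split(r",\s*|\s+(?:and|or)\s+", match.group(2))
        pairs.extend((item.strip().lower(), hypernym) for item in items if item.strip())
    return pairs


text = ("The region offers accommodations such as hotels, hostels and campsites. "
        "Visitors enjoy events such as concerts or festivals.")
print(extract_isa(text))
# [('hotels', 'accommodations'), ('hostels', 'accommodations'),
#  ('campsites', 'accommodations'), ('concerts', 'events'), ('festivals', 'events')]
```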

There are some important parameters that can be used to compare and evaluate existing tools. Some of these parameters are:

Software architecture and tool evolution. This includes information about the hardware and software platforms necessary to use the tool, its architecture, extensibility and ontology storage. In this respect, tools are moving towards Java-based applications, most of them accessible on the web.

Interoperability. There is no standardization of the tools used to perform any of these tasks, and these environments are not usually interoperable.

Methodology support. Tools rarely support a methodology for building ontologies.

6.3.2.4 Ontology development languages

A great range of languages has been used for the specification of ontologies during the last decade: Ontolingua, LOOM, OCML, F-Logic, CARIN. Many of these languages had already been used for representing knowledge inside knowledge-based applications, others were adapted from existing knowledge representation languages, and there is also a group of languages that were specifically created for the representation of ontologies. These languages, which we will call “traditional” languages, are in a stable phase of development, and their syntax consists of plain text in which ontologies are specified.

Recently, many other languages have been developed in the context of the World Wide Web: RDF, RDF Schema, SHOE, XOL, OML, OIL, DAML+OIL and OWL. Their syntax is based on XML, which has been widely adopted as a ‘standard’ language for exchanging information on the web, except for SHOE, whose syntax is based on HTML.

Among all these languages, RDF and RDF Schema cannot be considered ontology specification languages per se, but rather general languages for the description of metadata on the web. Most of these “markup” languages are still in a development phase.

Many other languages have also been considered in this survey. For instance, some languages have been created for the specification of specific ontologies, such as CycL and GRAIL. There are also other languages that were not created specifically for the representation of ontologies and that include additional features not usual in ontologies, such as NKRL.

The selection of an ontology specification language for the development of an ontology will depend not only on the characteristics of the language, but also on the tools that support it, the applications in which the ontology will be used, and the availability of reusable ontologies in the same domain in a specific language.

The most commonly used ontology development languages are the following:

RDF: RDF (Resource Description Framework) is one of the essential tools of the Semantic Web. It is defined as a data model for objects (“resources”) and their relations. It offers simple semantics and uses an XML-based syntax. The scientific community has adopted RDF as a standard for marking up metadata, and it is widely supported by the W3C and various other organizations;

OWL: The Web Ontology Language (OWL) has been designed to extend RDF’s descriptive features. OWL is part of a growing set of W3C recommendations related to the Semantic Web. OWL has three sublanguages, each offering a higher degree of expressivity than the previous one; they have been conceived according to the level of expressivity and formality needed in the application:

o OWL Lite: it is used in cases where mainly a hierarchical classification and very simple restrictions are needed. E.g., it allows cardinality restrictions, but only with the values 0 or 1. OWL Lite is less complex than OWL DL;

o OWL DL (Description Logic): OWL DL has been designed for cases in which maximum expressivity is required while retaining computational completeness and decidability. OWL DL includes all the language constructs of OWL Full, but they can only be used under certain restrictions; e.g., a class can be a sub-class of many other classes, but a class cannot be an instance of another class;

o OWL Full is the third OWL sublanguage. In OWL Full, a class can simultaneously be treated as a set of individuals and as an individual in its own right. OWL Full can be considered an extension of RDF(S).
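The RDF data model and two of the sublanguage restrictions described above can be illustrated together in a simplified Python sketch. A small ontology is stored as (subject, predicate, object) triples. The code then flags a class that is also used as an instance of another class (allowed only in OWL Full, as described above for OWL 1) and cardinality values other than 0 or 1 (not allowed in OWL Lite). The `ex:` names and the blank-node identifiers are hypothetical. This is not a substitute for a real OWL species validator or reasoner.

```python
RDF_TYPE, OWL_CLASS = "rdf:type", "owl:Class"
CARDINALITY = {"owl:cardinality", "owl:minCardinality", "owl:maxCardinality"}

# A small hypothetical ontology as RDF (subject, predicate, object) triples.
TRIPLES = {
    ("ex:Accommodation", RDF_TYPE, OWL_CLASS),
    ("ex:Hotel", RDF_TYPE, OWL_CLASS),
    ("ex:Hotel", "rdfs:subClassOf", "ex:Accommodation"),
    ("ex:HotelSol", RDF_TYPE, "ex:Hotel"),
    ("_:r1", RDF_TYPE, "owl:Restriction"),
    ("_:r1", "owl:onProperty", "ex:hasStarRating"),
    ("_:r1", "owl:maxCardinality", 1),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return {t for t in triples
            if (s is None or t[0] == s) and (p is None or t[1] == p)
            and (o is None or t[2] == o)}


def profile_issues(triples) -> list[str]:
    """Flag two of the sublanguage restrictions described above (simplified)."""
    issues = []
    classes = {s for s, _, _ in match(triples, p=RDF_TYPE, o=OWL_CLASS)}
    for s, _, o in match(triples, p=RDF_TYPE):
        if s in classes and o != OWL_CLASS:
            issues.append(f"{s} is used as a class and as an instance (OWL Full only)")
    for s, p, o in triples:
        if p in CARDINALITY and o not in (0, 1):
            issues.append(f"{p} {o} on {s} is not allowed in OWL Lite")
    return sorted(issues)


print(profile_issues(TRIPLES))  # []
```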

6.3.2.5 Examples of standard ontologies

Some research communities have already tried to define standard ontologies that cover a particular area of knowledge in a generic way and that could thus be used in a standard way.

The CIDOC Conceptual Reference Model (CIDOC CRM)

The CIDOC CRM, a core ontology explaining the extended meaning of data structures from the humanities and cultural heritage (including the history of science), is the outcome of a long-term, disciplined knowledge engineering activity which excels in its ontological commitment, i.e. the acceptance of its constructs by domain experts.

The primary role of the CRM is to enable information exchange and integration between heterogeneous sources of cultural heritage information (Doe, 03). It aims at providing the semantic definitions and clarifications needed to transform disparate, localized information sources into a coherent global resource, within a larger institution, in intranets or within the Internet. More concretely, it defines, and is restricted to, the underlying semantics of the database schemas and document structures used in cultural heritage and museum documentation, in terms of a formal ontology.

The success of the CRM relies on the fact that common meaning can be explained with a very small set of primitive concepts and relations, in contrast to data structures that suggest to the user what to say about an object. The relations in data structures that connect items directly through highly specific, diverse kinds of relationships can frequently be expressed by data paths composed of a few fundamental relationships defined within the core ontology.
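The idea of replacing highly specific relationships with data paths of a few fundamental ones can be sketched in Python. Two hypothetical local fields, `painted_by` and `sculpted_by`, are both expressed as the path object → P108i was produced by → E12 Production → P14 carried out by → actor. The CRM identifiers are written in the style of the CRM's RDFS encoding. Their exact spelling and namespace should be checked against the CRM version in use. The `ex:` identifiers and record layout are assumptions.

```python
# Hypothetical local fields that all express "who made this object".
CREATOR_FIELDS = ("painted_by", "sculpted_by")


def to_crm_triples(record: dict) -> list[tuple[str, str, str]]:
    """Express any creator field as: object -P108i-> E12 Production -P14-> actor."""
    obj = f"ex:{record['id']}"
    production = f"ex:{record['id']}_production"  # hypothetical event identifier
    triples = [(obj, "rdf:type", "crm:E22_Man-Made_Object")]
    for field_name in CREATOR_FIELDS:
        if field_name in record:
            triples += [
                (obj, "crm:P108i_was_produced_by", production),
                (production, "rdf:type", "crm:E12_Production"),
                (production, "crm:P14_carried_out_by", f"ex:{record[field_name]}"),
            ]
    return list(dict.fromkeys(triples))  # drop duplicate triples, keep order


painting = to_crm_triples({"id": "p1", "painted_by": "artist7"})
sculpture = to_crm_triples({"id": "s1", "sculpted_by": "artist7"})
# Both specific fields end up as the same two fundamental relations.
print([p for _, p, _ in painting] == [p for _, p, _ in sculpture])  # True
```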

The CIDOC CRM has become the most promising core element for realizing semantic interoperability in archives, libraries and museums, owing to its capability to link the intellectual structure of highly diverse sources and products of scientific and scholarly discourse with the elements formally handled by information systems.

The CIDOC CRM is the culmination of over 10 years of work by the CIDOC Documentation Standards Working Group and the CIDOC CRM SIG (Special Interest Group), which are working groups of CIDOC. Since 2006 it has been an official ISO standard, ISO 21127.

FRBRoo

FRBRoo is a formal ontology intended to capture and represent the underlying semantics of bibliographic information and to facilitate the integration, mediation and interchange of bibliographic and museum information. The FRBR model was originally designed as an entity-relationship model by a study group appointed by the International Federation of Library Associations and Institutions (IFLA).

The CIDOC CRM model had been under development since 1996 under the auspices of the ICOM-CIDOC (International Council of Museums – International Committee for Documentation) Documentation Standards Working Group. The idea that both the library and museum communities might benefit from harmonizing the two models was first expressed in 2000 and gained ground in the following years. Eventually it led to the formation, in 2003, of the International Working Group on FRBR/CIDOC CRM Harmonisation, which brings together representatives from both communities with the common goals of:

expressing the IFLA FRBR model with the concepts, tools, mechanisms, and notation conventions provided by the CIDOC CRM, and

aligning (possibly even merging) the two object-oriented models with the aim of contributing to the solution of the problem of semantic interoperability between the documentation structures used for library and museum information, such that:

o all equivalent information can be retrieved under the same notions;

o all directly and indirectly related information can be retrieved regardless of its distribution over individual data sources;

o knowledge encoded for a specific application can be repurposed for other studies;

o recall and precision in systems employed by both communities are improved;

o both communities can learn from each other’s concepts for their mutual progress;

o for the benefit of the scientific and scholarly communities and the general public.

In 2006 a first draft of FRBRoo was completed. It is a logically rigid model interpreting the conceptualizations expressed in FRBRer, together with the concepts necessary to explain the intended meaning of all FRBRer attributes and relationships. The model is formulated as an extension of the CIDOC CRM. Any conflicts arising in the harmonization process with the CIDOC CRM have been or will be resolved on the CIDOC CRM side as well. The Harmonization Group intends to continue work on modelling the FRAR concepts and on elaborating the application of FRBR concepts to the performing arts.

HarmoNET

The Harmonisation Network for the Exchange of Travel and Tourism Information, HarmoNET, is an international network bringing together people and organizations with an interest in harmonization and seamless information exchange in travel and tourism. HarmoNET provides unique technologies and services enabling easy, affordable and fast information exchange.

The travel and tourism industry is an information-based business in which information exchange is essential to maintaining a dynamic market. HarmoNET aims to create for its members an international network for harmonization and seamless data exchange in the travel and tourism industry. HarmoNET does not implement a new standard; rather, it provides the means for an effective data mediation process.

HarmoNET offers the following services:

Ontology Management: HarmoNET provides and maintains a tourism-specific ontology as a common definition of concepts and terms, their meaning and the relations between them. This ontology serves as a common agreement for the HarmoNET mediation service as well as a reference model for building specific data models or tourism information systems.

Mediation Service: The HarmoNET mediation service provides a technical solution to the interoperability problem. Heterogeneous data are mapped from the local format on one side to the format on the other side.

Community Services: In order to build a strong community and foster communication and information exchange within it, HarmoNET offers online community services such as mailing lists, discussion fora, newsletters or bulletin boards, as well as traditional community services such as conferences, workshops and seminars, which allow the community to meet and to continue work on the definition of the HarmoNET ontology.
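The mediation idea described above can be illustrated with a minimal Python sketch. It assumes two hypothetical local formats (`hotel_a`, `hotel_b`) and a hypothetical shared vocabulary (`acc:name`, `acc:city`, `acc:category`); none of these names are taken from the actual HarmoNET ontology or its mediation service. Each system maps its own fields to the shared concepts once, and a record is translated by lifting it into the concept space and lowering it into the target format.

```python
# Minimal mediation sketch. All field names and concept names below are
# hypothetical illustrations, not the actual HarmoNET ontology or formats.
LOCAL_TO_ONTOLOGY = {
    "hotel_a": {"HotelName": "acc:name", "Town": "acc:city", "Stars": "acc:category"},
    "hotel_b": {"name": "acc:name", "city": "acc:city", "rating": "acc:category"},
}


def to_ontology(system: str, record: dict) -> dict:
    """Lift a local record into the shared concept space (unmapped fields are dropped)."""
    mapping = LOCAL_TO_ONTOLOGY[system]
    return {mapping[f]: v for f, v in record.items() if f in mapping}


def from_ontology(system: str, concepts: dict) -> dict:
    """Lower shared concepts into the target system's local field names."""
    inverse = {concept: f for f, concept in LOCAL_TO_ONTOLOGY[system].items()}
    return {inverse[c]: v for c, v in concepts.items() if c in inverse}


def mediate(source: str, target: str, record: dict) -> dict:
    """Translate a record between two local formats via the shared concepts."""
    return from_ontology(target, to_ontology(source, record))


if __name__ == "__main__":
    print(mediate("hotel_a", "hotel_b", {"HotelName": "Alpenhof", "Town": "Innsbruck", "Stars": 4}))
    # {'name': 'Alpenhof', 'city': 'Innsbruck', 'rating': 4}
```

With this design each participant maintains one mapping to the shared concepts rather than one converter per partner, which is the practical benefit of mediating through an ontology instead of through pairwise translations.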

SUO

Recognizing both the need for large ontologies and the need for an open process leading to a free, public standard, a diverse group of people has come together to make such a standard a reality. The Standard Upper Ontology (SUO) will be an upper level ontology that provides definitions for general-purpose terms and acts as a foundation for more specific domain ontologies.

It is estimated to contain between 1000 and 2500 terms, plus roughly ten definitional statements for each term.

The standard will be suitable to support knowledge-based reasoning applications.

This standard will enable the development of a large (20 000+) general-purpose standard ontology of common concepts, which will provide the basis for middle-level domain ontologies and lower-level application ontologies.

The ontology will be suitable for “compilation” to more restricted forms such as XML or database schemata. This will enable database developers to define new data elements in terms of a common ontology, and thereby gain some degree of interoperability with other compliant systems.

Owners of existing systems will be able to map existing data elements just once to a common ontology, and thereby gain a degree of interoperability with other representations that are compliant with the SUO.

Domain-specific ontologies that are compliant with the SUO will be able to interoperate (to some degree) by virtue of the shared common terms and definitions.

Applications of the ontology will include:

o e-commerce applications from different domains which need to interoperate at both the data and semantic levels;

o educational applications in which students learn concepts and relationships directly from, or expressed in terms of, a common ontology. This will also enable a standard record of learning to be kept;

o natural language understanding tasks in which a knowledge-based reasoning system uses the ontology to disambiguate natural language terms and structures.

SOUPA

Standard Ontology for Ubiquitous and Pervasive Applications (SOUPA) is designed to model and support pervasive computing applications. This ontology is expressed using the Web Ontology Language OWL and includes modular component vocabularies to represent intelligent agents with associated beliefs, desires, and intentions, time, space, events, user profiles, actions, and policies for security and privacy. SOUPA can be extended and used to support the applications of CoBrA, a broker-centric agent architecture for building smart meeting rooms, and MoGATU, a peer-to-peer data management system for pervasive environments.

6.3.3 Gaps and future needs

Ontologies are still neither flexible nor extensible enough. The tourism sector could be partially covered by some existing concepts; however, extending the initial ontology to cover new concepts would require a relatively large (manual) effort. Due to the heterogeneity of the travel and tourism industry, it is a challenge for a single ontology to cover the whole market offer; the ontology management process would therefore potentially become too complicated.

In order for tourism companies to adopt either a standard or an ontology mediation process, they need to be confident that they can still differentiate their offer within the market. Standards and ontologies sometimes tend to bury the distinct features of products and services.

6.3.4 Recommendations

6.3.4.1 Short-term recommendations (1–3 years)

Use recognized standard reference models such as the Harmonise ontology (for tourism purposes) or CIDOC CRM (for cultural heritage data) wherever possible.

Produce guidelines for the mappings between eTourism-related ontologies based on standard reference models.

Use established standards such as RDF(S), OWL or the Topic Map Constraint Language to express ontologies.

Raise awareness of Open Source, user-friendly tools for ontology definition such as Protégé.

6.3.4.2 Long-term recommendations (3–10 years)

Build ontologies to represent other standards, e.g. IATA.

Build tools to automatically map ontologies.

Work on automatic ontology (re)structuring and population.

7 Data transformation

7.1 Structured data mapping

7.1.1 Needs and requirements

7.1.1.1 Introduction

The so-called information society demands complete access to available information, which is most often distributed and heterogeneous. First, a suitable information source that potentially contains data of interest must be located. Then access to the data contained in the information source has to be provided, i.e. the information source and the querying system need to understand each other in order to effectively retrieve the particular piece of information of interest.

In order to establish comprehensive information sharing and to achieve efficient interoperability of information systems, various kinds of solutions have been made available. Within this range of possible approaches, ontologies have been shown to play an important role in resolving semantic heterogeneity among information sources by providing a shared understanding of a given domain of interest, e.g. the travel and tourism industry.

Information sources may contain information at different levels of organization: data may be structured in databases, semi-structured in XML documents, or completely unstructured as web pages or other types of documents. Regardless of its origin, the data has to be mapped to an ontology if the objective is to achieve interoperability between the local system and other systems. This chapter reviews the mapping between an information source (e.g. a database, an XML file, etc.) and an ontology.

The first step in a mapping process is to relate ontologies to the actual contents of an information source. Ontologies may relate to the database schema but also to single terms used in the database or data structure. Regardless of this distinction, different general approaches to establishing a connection between ontologies and information sources can be observed:

Structure Resemblance: A straightforward approach to connecting the ontology with the structured data source is simply to produce a one-to-one copy of the structure and encode it in a language that makes automated reasoning possible. The integration is then performed on the copy of the model and can easily be traced back to the original data.

Definition of Terms: In order to make the semantics of terms in a database schema clear, it is not sufficient to produce a copy of the schema. Some approaches use the ontology to further define terms from the database or the database schema. These definitions do not correspond to the structure of the database; they are only linked to the information by the term that is defined. The definition itself can consist of a set of rules defining the term; in most cases, however, terms are described by concept definitions.

Structure Enrichment: This is the most common approach to relating ontologies to information sources. It combines the two previously mentioned approaches: a logical model is built that resembles the structure of the information source and contains additional definitions of concepts.

Meta-Annotation: A rather new approach is the use of meta-annotations that add semantic information to an information source. This approach is becoming prominent with the need to integrate information present on the World Wide Web, where annotation is a natural way of adding semantics. A further distinction can be made between annotations that replicate parts of the actual information and approaches that avoid such redundancy.

7.1.1.2 Needs

Using information systems in the travel and tourism industry implies using information coming from different data sources. Such a system works in cooperation with other systems, and information coming from various data sources may be needed to provide a particular service to a client.

Mapping is a critical operation in various application domains such as the Semantic Web, schema or ontology integration, data integration, data warehouses, eCommerce, etc. As mentioned in previous chapters, eCommerce activities are crucial in the eTourism domain. Focusing on mappings, three different kinds can be distinguished:

Schema mapping: Mappings are established between database schemas. This method takes two database schemas as input and produces a mapping between those elements of the two schemas that correspond to each other.

Ontology mapping: Ontology mapping is somewhat similar to schema mapping. In this case, the purpose of the mapping is to relate the vocabularies of two ontologies that share the same domain of discourse.

Database-to-Ontology mapping: This is the process through which a structured data source and an ontology are semantically related at a conceptual level, i.e. relationships are set up between the ontology and data source components.
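The simplest automatic support for schema or ontology mapping proposes correspondences by comparing element names. The sketch below uses Python's standard `difflib.SequenceMatcher` as a stand-in similarity measure and a hypothetical threshold of 0.6; it is not the method of any particular mapping tool, and real matchers combine many more signals (data types, structure, instances). The element names in the example are hypothetical.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lower-case and drop separators so that 'room_type' and 'RoomType' compare equal."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1] between two normalized element names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_elements(source: list[str], target: list[str], threshold: float = 0.6) -> dict[str, str]:
    """Propose, for each source element, the most similar target element above the threshold."""
    proposals = {}
    for s in source:
        best = max(target, key=lambda t: similarity(s, t))
        if similarity(s, best) >= threshold:
            proposals[s] = best
    return proposals


if __name__ == "__main__":
    print(match_elements(["hotel_name", "room_type", "check_in_date"],
                         ["HotelName", "RoomType", "ArrivalDate"]))
```

In this example `check_in_date` and `ArrivalDate` mean the same thing but share few characters, so no correspondence is proposed for them. Synonyms of this kind are one reason why the proposals of name-based matching still need human review.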

The approach to be taken requires the creation of a mapping description, expressed in some kind of formal language, that maintains the level of formality and expressivity of both the ontology and the database. The document containing this description has to show the correspondences between the components of the database’s SQL schema and those of the ontology. Afterwards, the ontology needs to be populated through the mappings made explicit in the document. The process ought to be as automatic as possible in order to minimize human effort.

In order to do this, languages to define mappings are needed. These languages have to have the following features:

They have to be fully declarative in order to efficiently define and describe mappings between relational database schemas and ontologies. They also have to be expressive enough to define the semantics of the mappings.

The language ought to define how to create instances in the ontology in terms of the data stored in the database.

The declarative nature of the language should make it possible to discover inconsistencies and ambiguities in the definition of a mapping; such potential problems have to be detected automatically.

The mapping definition language could potentially be used to automatically characterize data sources, allowing dynamic query distribution in intelligent information integration approaches.

The mapping definition language does not have to declare the degree of similarity between database elements and ontology components. Rather, it has to state under which conditions and after which transformations the database elements are equivalent to the ontology components.
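The features listed above (declarative definitions, instance creation from stored data, automatic detection of inconsistencies, and conditions plus transformations rather than similarity degrees) can be illustrated with a small sketch. The mapping format below is hypothetical and invented for illustration; it is not R2O or any other published mapping language, and the table, column and property names are assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical mapping format for illustration only (not R2O or any other published language).


@dataclass(frozen=True)
class PropertyMap:
    column: str                                           # source column
    prop: str                                             # target ontology property
    transform: Callable[[object], object] = lambda v: v   # transformation applied to the value


@dataclass
class ClassMap:
    table: str
    ontology_class: str
    key: str                                              # column used to build instance identifiers
    condition: Callable[[dict], bool] = lambda row: True  # rows must satisfy this to become instances
    properties: list[PropertyMap] = field(default_factory=list)


def validate(maps: list[ClassMap], schema: dict[str, set[str]]) -> list[str]:
    """Detect problems from the declarations alone, before any data is touched."""
    problems = []
    for m in maps:
        cols = schema.get(m.table)
        if cols is None:
            problems.append(f"unknown table {m.table}")
            continue
        for col in [m.key] + [p.column for p in m.properties]:
            if col not in cols:
                problems.append(f"unknown column {m.table}.{col}")
        targets = [p.prop for p in m.properties]
        for prop in sorted({t for t in targets if targets.count(t) > 1}):
            problems.append(f"ambiguous: several columns map to {prop} in {m.table}")
    return problems


def populate(m: ClassMap, rows: list[dict]) -> list[tuple]:
    """Create (subject, predicate, object) triples for the rows that meet the condition."""
    triples = []
    for row in rows:
        if not m.condition(row):
            continue
        subject = f"{m.ontology_class}/{row[m.key]}"
        triples.append((subject, "rdf:type", m.ontology_class))
        for p in m.properties:
            triples.append((subject, p.prop, p.transform(row[p.column])))
    return triples


if __name__ == "__main__":
    hotels = ClassMap(
        table="hotel", ontology_class="acc:Hotel", key="id",
        condition=lambda row: row["active"] == 1,
        properties=[PropertyMap("name", "acc:name", str.strip),
                    PropertyMap("stars", "acc:category", int)],
    )
    print(validate([hotels], {"hotel": {"id", "name", "stars", "active"}}))  # []
    print(populate(hotels, [{"id": 1, "name": " Alpenhof ", "stars": "4", "active": 1}]))
```

Because the mapping is data rather than procedural code, `validate` can check it against the schema before any population run, and the same declarations could in principle be reused to describe what a source can answer.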

7.1.1.3 Requirements

Semantic conflicts occur whenever two contexts do not use the same interpretation of the information. Goh identifies three main causes of semantic heterogeneity that need to be overcome in order to achieve semantic interoperability [Goh, 1997]:

Confounding conflicts occur when information items seem to have the same meaning but differ in reality, e.g. owing to different temporal contexts.

Scaling conflicts occur when different reference systems are used to measure a value, e.g. different currencies.

Naming conflicts occur when naming schemes of information differ significantly. A frequent phenomenon is the presence of homonyms and synonyms.
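Scaling and naming conflicts can often be resolved once the implicit context is made explicit. The sketch below assumes a hypothetical synonym table and hypothetical fixed exchange rates (illustrative values, not real rates). A production system would need the rate valid for the relevant date; that temporal context is exactly the kind of information whose absence causes confounding conflicts.

```python
from decimal import Decimal

# Hypothetical synonyms resolving naming conflicts to one canonical term.
SYNONYMS = {"lodging": "accommodation", "accommodation": "accommodation",
            "hostel": "hostel", "youth hostel": "hostel"}

# Hypothetical, illustrative rates: EUR per one unit of each currency (not real rates).
EUR_PER_UNIT = {"EUR": Decimal("1"), "GBP": Decimal("1.10"), "CHF": Decimal("0.65")}


def canonical_term(term: str) -> str:
    """Resolve a naming conflict by mapping a local term to its canonical name."""
    return SYNONYMS.get(term.strip().lower(), term)  # unknown terms pass through unchanged


def to_eur(amount: Decimal, currency: str) -> Decimal:
    """Resolve a scaling conflict by converting a price into a common reference currency."""
    return (amount * EUR_PER_UNIT[currency]).quantize(Decimal("0.01"))


if __name__ == "__main__":
    print(canonical_term("Lodging"), to_eur(Decimal("100"), "GBP"))  # accommodation 110.00
```

`Decimal` is used instead of `float` so that converted prices do not pick up binary rounding errors.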

The use of ontologies to make implicit and hidden knowledge explicit is a possible approach to overcoming the problem of semantic heterogeneity. With respect to their impact on the data exchange, structural conflicts can be classified as:

fully mappable: all clashes can be resolved without any loss of information;

partially or non-mappable: structural conflicts for which any conceivable transformation will cause a loss of information.

The following examples of clashes between different standards have been identified [Dell’Erba, Fodor, Höpken, et al, 2005]:

Different naming: Equivalent concepts have different names in different standards. This is a fully mappable semantic clash.

Different position: Equivalent concepts have different positions within the structure of the standards. This is also a fully mappable semantic clash.

Different scope of concepts: Concepts containing the same piece of information in different standards have different scopes, i.e. the same piece of information might be represented as a single concept or as part of several concepts. This is also a fully mappable semantic clash.

Different abstraction levels: The same information is represented at different levels of abstraction. This is a partially mappable semantic clash.

Different granularity: The same information is represented at different levels of granularity. This is a partially mappable semantic clash.

Missing concept: If a concept in one standard has no counterpart in the other standard, it cannot be mapped.
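Several of these clash types can be handled by an explicit field-level mapping between two standards, while others cannot. The sketch below uses two hypothetical message layouts (neither is taken from OTA or any other real standard). It shows different naming and different position resolved by a path mapping, a different scope resolved by splitting one concept into two, and a missing concept reported as information that cannot be carried over.

```python
# Hypothetical standard A: nested message, address kept as one "Street, City" string.
# Hypothetical standard B: flat message, street and city as separate concepts.


def get_path(doc: dict, path: str):
    """Follow a '/'-separated path through nested dicts; return None if absent."""
    node = doc
    for part in path.split("/"):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


# Different naming and different position: a nested path in A maps to a flat key in B.
PATH_MAP = {"Hotel/Name": "propertyName", "Hotel/Contact/Phone": "phone"}
# Concepts of A that have no counterpart in B (missing concept).
UNMAPPABLE = ["Hotel/Ecolabel"]


def translate(doc_a: dict) -> tuple[dict, list[str]]:
    """Translate an A message into B; also return the A paths whose content was lost."""
    doc_b, lost = {}, []
    for src, dst in PATH_MAP.items():
        value = get_path(doc_a, src)
        if value is not None:
            doc_b[dst] = value
    # Different scope: one A concept becomes two B concepts, relying on the comma convention.
    address = get_path(doc_a, "Hotel/Address")
    if address is not None:
        street, _, city = address.partition(",")
        doc_b["street"], doc_b["city"] = street.strip(), city.strip()
    for src in UNMAPPABLE:
        if get_path(doc_a, src) is not None:
            lost.append(src)  # information that cannot be represented in B
    return doc_b, lost
```

The split of the address only works while the source respects the assumed comma convention, and the reverse direction would need a rule for joining the parts; real clashes in scope, abstraction or granularity are rarely this tidy.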

Most current approaches to solving the interoperability problem are based on the idea of fixed, obligatory standards that define all details of the exchanged messages. An example of an international XML-based standard is the OTA specification [OTA]. Companies that use such standards are automatically able to exchange information with each other. However, all details of the exchanged messages must be agreed upon by all communication participants. The process of defining and maintaining such standards requires a lot of effort; therefore such standards are almost exclusively used by large companies such as hotel chains, airlines and Global Distribution Systems (GDS).

7.1.2 State of the art

This section presents the state of the art in structured-data-to-ontology mapping from a database perspective. The same concepts hold for any other structured data source, such as XML data structures.

Database-to-ontology mapping gives rise to different mapping situations. A database-to-ontology mapping can be defined as a set of correspondences that relate the vocabulary of a relational database schema to that of an ontology. That is, we want to relate a database’s tables, columns, primary and foreign keys, etc., to an ontology’s concepts, relations, attributes, etc.

There are several approaches in the literature to database-to-ontology mapping. In general, they can be classified into two main categories: approaches that create a new ontology from a database, and approaches that map a database to an already existing ontology.

Creating an ontology from a database: This approach refers to the creation of an ontology model from a relational database model and the migration of the contents of the database to the generated ontology. The mappings here are simply the correspondences between each created ontological component (class, property, etc.) and its original database component (table, column). Mappings in this case are usually not very complex, and the process can be automated to a high degree. However, this kind of direct mapping may fail to express the full semantics of the database domain. The creation of an ontology structure may require discovering hidden semantics implicitly expressed between database components (e.g. referential constraints) and taking them into account in the ontology building process.

Mapping a database to an already existing ontology: This refers to the creation of links between the two or to the population of the ontology with database content. Mappings in this case are far more complex, as different levels of overlap between the database domain and that of the ontology can be found. These domains do not necessarily coincide, as the criteria used to design databases differ from those used to design ontologies.

Both kinds of mapping involve two processes:

mapping definition, i.e. the definition of the transformation from the database structure (schema) to the ontology structure; and

data migration, i.e. the migration of database content to instances of the ontology.

Volz et al [Volz, Handschuch, Staab, Studer, 2004] [Volz, Stojanovic, Stojanovic, 2002] propose an approach based on the semi-automatic generation of an F-Logic ontology from a relational database model. Mappings are defined between the database and the generated ontology. The ontology generation process takes into account different types of relationships between database tables and maps them to suitable relations in the ontology. The mapping process is not completely automatic: user intervention is needed to choose the most suitable rule when several rules could be applied.

Each table is transformed into a class and each attribute into a property. In addition, if a relational database table has foreign key references to other tables, these can be transformed into instance pointers, i.e. a new slot is added to the class representing the referencing table, whose value is an instance of the class representing the referenced table. The user manually selects the tables to be mapped to the ontology; the mapping process is then run completely automatically.
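The transformation rules just described (table to class, attribute to property, foreign key to a slot pointing at instances of the referenced class) can be sketched as follows. The schema description format and the output as simple triples are assumptions for illustration; the original approach generates F-Logic, and the user's table selection is represented here by the hypothetical `selected` parameter.

```python
from dataclasses import dataclass


@dataclass
class Table:
    name: str
    columns: list[str]
    foreign_keys: dict[str, str]  # column -> referenced table


def schema_to_ontology(tables: list[Table], selected: set[str]) -> list[tuple[str, str, str]]:
    """Generate class and property declarations for the user-selected tables."""
    axioms = []
    for t in tables:
        if t.name not in selected:
            continue
        axioms.append((t.name, "a", "Class"))                 # table -> class
        for col in t.columns:
            prop = f"{t.name}.{col}"
            if col in t.foreign_keys:
                # Foreign key -> slot whose value is an instance of the referenced class.
                axioms.append((prop, "a", "ObjectProperty"))
                axioms.append((prop, "range", t.foreign_keys[col]))
            else:
                axioms.append((prop, "a", "DatatypeProperty"))  # attribute -> property
            axioms.append((prop, "domain", t.name))
    return axioms


if __name__ == "__main__":
    hotel = Table("Hotel", ["name", "region_id"], {"region_id": "Region"})
    region = Table("Region", ["label"], {})
    for axiom in schema_to_ontology([hotel, region], {"Hotel", "Region"}):
        print(axiom)
```

If the user selects `Hotel` but not `Region`, the generated range points to a class that was never declared; this is one of the situations where a human has to decide how to proceed.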

Relational.OWL [de Laborda, Conrad, 2005] is an OWL ontology representing abstract schema components of relational databases. Based on this ontology, the schema of (virtually) any relational database can be described and in turn be used to represent the data stored in that specific database. This approach uses the meta-modelling capabilities of OWL Full, which prevents the use of decidable inference on the resulting ontology.

The definition of mappings is automatic or semi-automatic in the approaches that create a new ontology, whereas no approach allows the completely automatic definition of mappings to an already existing ontology. The process of ontology population, on the other hand, is always automatic. The approaches that create a new ontology use a massive dump process for ontology population, except DB2OWL, which allows a query-driven process.
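The difference between the two population strategies can be sketched with Python's built-in `sqlite3` module. A massive dump converts every row into instances up front; a query-driven process translates a request into SQL and materializes only the matching rows. The table, class and property names and the simple `property = value` query form are hypothetical; this does not reproduce DB2OWL's actual mechanism.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Small in-memory example database (hypothetical data)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE hotel (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
    con.executemany("INSERT INTO hotel (name, city) VALUES (?, ?)",
                    [("Alpenhof", "Innsbruck"), ("Seeblick", "Bregenz"), ("Stadthaus", "Innsbruck")])
    return con


def row_to_triples(row: tuple) -> list[tuple]:
    hid, name, city = row
    subject = f"Hotel/{hid}"
    return [(subject, "rdf:type", "Hotel"), (subject, "name", name), (subject, "city", city)]


def massive_dump(con: sqlite3.Connection) -> list[tuple]:
    """Eager population: every row becomes ontology instances."""
    triples = []
    for row in con.execute("SELECT id, name, city FROM hotel"):
        triples.extend(row_to_triples(row))
    return triples


ALLOWED_PROPERTIES = {"name", "city"}  # ontology properties backed by a column of the same name


def query_driven(con: sqlite3.Connection, prop: str, value: str) -> list[tuple]:
    """Lazy population: only rows answering the query are turned into instances."""
    if prop not in ALLOWED_PROPERTIES:
        raise ValueError(f"no column mapped to property {prop!r}")
    sql = f"SELECT id, name, city FROM hotel WHERE {prop} = ?"  # prop is whitelisted above
    triples = []
    for row in con.execute(sql, (value,)):
        triples.extend(row_to_triples(row))
    return triples


if __name__ == "__main__":
    con = make_db()
    print(len(massive_dump(con)), len(query_driven(con, "city", "Innsbruck")))  # 9 6
```

The dump keeps a complete but possibly stale copy of the data, while the query-driven variant always reads current data at the cost of translating every request.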

7.1.3 Gaps and future needs

Although a lot of effort has already been invested in ontology research (concepts, methods, building, theory, etc.) and (commercial) application building, general mapping processes are still in their infancy. There is a clear notion of what a mapping is; however, the real semantics and expressiveness of the links themselves have not yet been clearly defined.

Most mappings have been defined ad hoc, i.e. for particular cases, and are neither reusable nor extensible to other cases. Moreover, should changes occur within databases, the whole mapping, and even the ontology, would have to be redefined in order to cover new concepts and relations.

The literature review has shown a number of languages that have been used to map databases to ontologies. However, there is no evidence of any language that links (maps) ontology components to database elements.

A lot of human intervention is still needed to create mappings. Although graphical interfaces have been created (as in the case of R2O), mapping work is in general still labour-intensive. This depends on the level of formality and expressivity with which information is represented and stored in databases. One possible way to partially automate the mapping creation process could be to recommend building ontologies with existing standard languages. This way, ontologies could be compared, as they would have the same degree of expressivity and formality.

7.1.4 Recommendations

7.1.4.1 Short-term recommendations (1–3 years)

Use (graphical) mediation tools with reasoning capabilities that automatically suggest semantically equivalent data sources, identify inconsistencies and decrease the amount of human intervention in the mapping process.

Pursue the design and implementation of new data resources on the basis of agreed recommendations, such as the W3C recommendations for Semantic Web technologies.

7.1.4.2 Long-term recommendations (3–10 years)

Use semantic web technologies (e.g. based on RDF URIs) to name and represent (data) resources on the Web so that mapping can be undertaken automatically.

Agree on the degree of formality with which information ought to be defined, so that automatic mapping tools compare the same kind of information.

Foster high-level general ontologies to describe particular domains of interest, so that lower-level, more concrete ontologies can later be linked to or merged into the (more general) structure (if and only if both ontologies are defined with the same level of formality and the same ontology definition language).

7.2 Manual semantic annotation

Semantic Annotation is about attaching meaningful (information) structures to information resources such as documents, general multimedia content or information on the Web, in such a way that computers can use them meaningfully to enhance the usefulness of those resources. Semantic Annotation formally identifies concepts and relations between concepts in documents, and is intended primarily for use by machines.

Information about documents and information sources has traditionally been managed through the use of metadata. Metadata is simply information concerning a particular source of information: author, date, origin, content, type of file, etc. Within the context of the Semantic Web (as defined by Tim Berners-Lee), it is proposed to annotate document content using semantic information from domain ontologies [Berners-Lee, 2001]. The result of (manually) annotating a Web information resource is Web pages with machine-interpretable mark-up that provide the source material with which agents, Semantic Web services and advanced search engines operate. The goal is to create annotations with well-defined semantics.

The amount of tourism information on the Web is huge, and its nature is highly diverse. Furthermore, recent studies have shown that tourists’ decisions about potential destinations are increasingly influenced by multimedia and web-based content and by comments generated by other tourists. Tourists have also begun to share their experiences on the web (the so-called Web 2.0 phenomenon), and a tremendous number of web pages have been created by tourists and end users. Even destination management organizations are beginning to include user-generated content in their own web sites as a way to promote their destination.

All of this (usually unstructured) information has to be made available to the general public, i.e. metadata about it has to be created in order to make it reachable on the Internet.

7.2.1 Needs and requirements

For the sake of data interoperability and exchange, well-defined semantics are a must to ensure that annotator and annotation consumer actually share meaning. A key contribution of the Semantic Web is therefore to provide a set of worldwide standards and recommendations on manual annotation. These recommendations make it possible to work with heterogeneous resources by providing a common intermediate layer of syntax, methods, semantics and understanding.

Travel and tourism is a leading industry in the application of B2C and B2B2C eCommerce and mCommerce solutions as well as Web-based information channels, and a huge number of tourism information systems have been developed to support all the processes related to the electronic market. If the objective is to automate eBusiness processes over the Web without human intervention, allowing machines to interoperate automatically, information sources must be annotated so that a mediation ontology can integrate information coming from heterogeneous systems.

Therefore, in order for the tourism industry to succeed, new ways of annotating data and content have to be developed, so that a particular piece of information can be used by a particular machine for a particular business process, enabling a vertical data integration approach to the tourism market.

Semantic annotation is required because it brings enhanced information retrieval and improved interoperability among systems. Information retrieval, mostly related to search on unstructured data sources, is improved by the ability to perform searches that exploit the ontology to make inferences about data from heterogeneous resources [Welty, 1999].
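A minimal illustration of a search that exploits the ontology to make inferences: documents annotated with a specific concept are found by a query for a more general concept, because the subclass hierarchy is taken into account. The hierarchy, concept names and document annotations below are hypothetical.

```python
# Hypothetical subclass hierarchy: child concept -> parent concept (assumed acyclic).
SUBCLASS_OF = {
    "Hotel": "Accommodation",
    "Hostel": "Accommodation",
    "Campsite": "Accommodation",
    "Museum": "Attraction",
    "Accommodation": "TourismResource",
    "Attraction": "TourismResource",
}

# Hypothetical semantic annotations: document -> concepts it is annotated with.
ANNOTATIONS = {
    "doc1.html": {"Hotel"},
    "doc2.html": {"Museum"},
    "doc3.html": {"Hostel", "Museum"},
}


def ancestors(concept: str) -> set[str]:
    """All concepts that `concept` is (transitively) a subclass of, including itself."""
    result = {concept}
    while concept in SUBCLASS_OF:
        concept = SUBCLASS_OF[concept]
        result.add(concept)
    return result


def search(concept: str) -> list[str]:
    """Return documents annotated with the concept or with any of its subclasses."""
    return sorted(doc for doc, concepts in ANNOTATIONS.items()
                  if any(concept in ancestors(c) for c in concepts))


if __name__ == "__main__":
    print(search("Accommodation"))  # ['doc1.html', 'doc3.html']
```

A plain keyword match on "Accommodation" would find neither document, since neither annotation uses that word; the result comes entirely from the inferred subclass relations.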

According to “Semantic Annotation for Knowledge Management: Requirements and a Survey of the State of the Art” [Uren, 2005], there are six requirements for semantic annotation:

Standard formats: standards can provide a bridging mechanism that allows heterogeneous resources to be accessed simultaneously and collaborating users and organizations to share annotations. Two standards can be mentioned: OWL for describing ontologies and RDF for the annotation schema;

User centred: easy-to-use interfaces that ease the task of annotating documents;

Ontology support: annotation tools need to support multiple ontologies;

Support of heterogeneous document formats;

Document evolution: keep documents and annotations consistent;

Annotation storage: there are different storage criteria. Some argue that annotations ought to be stored separately from the documents, while others argue that annotations are an integral part of the document and should therefore be stored together with it.
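The storage question interacts with the document-evolution requirement. If annotations are stored separately from the document (standoff annotation), each one has to point into the text, for example by character offsets, and can silently drift when the document is edited. The sketch below is an assumption rather than the design of any particular annotation tool: it stores the annotated text alongside the offsets so that outdated annotations can be detected and, where possible, re-anchored. The ontology URI is a hypothetical placeholder.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    start: int    # character offset into the document
    end: int      # exclusive end offset
    text: str     # the annotated text, kept to detect drift
    concept: str  # ontology concept the span is annotated with


def annotate(doc: str, phrase: str, concept: str) -> Annotation:
    """Create a standoff annotation for the first occurrence of `phrase`."""
    start = doc.index(phrase)  # raises ValueError if the phrase is absent
    return Annotation(start, start + len(phrase), phrase, concept)


def check(doc: str, ann: Annotation) -> str:
    """Return 'ok', 'moved' (offsets re-anchored in place) or 'stale' (text no longer found)."""
    if doc[ann.start:ann.end] == ann.text:
        return "ok"
    new_start = doc.find(ann.text)
    if new_start == -1:
        return "stale"
    ann.start, ann.end = new_start, new_start + len(ann.text)
    return "moved"


if __name__ == "__main__":
    ann = annotate("Stay at Hotel Alpenhof in Innsbruck.", "Innsbruck",
                   "http://example.org/geo#Innsbruck")  # hypothetical concept URI
    print(check("Book now: stay at Hotel Alpenhof in Innsbruck.", ann))  # moved
```

Re-anchoring to the first occurrence is a simplification: if the annotated phrase appears several times in the edited document, the annotation may be attached to the wrong one. Embedding annotations in the document avoids the drift but mixes content and metadata, which is the trade-off behind the two storage positions.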

7.2.2 State of the art

There are a number of tools that produce semantic annotations, i.e. annotations that refer to a particular ontology. These tools meet some of the requirements above; however, they need further development.

Manual annotation tools allow users to create annotations, i.e. metadata about a particular information source, by hand. These tools are in general relatively similar to those used for purely textual annotations, but differ in that they provide some support for ontologies.

The following list presents some of the most relevant annotation tools found in the literature:

Amaya [Quint, 1994] is a Web browser and editor that marks up Web documents in XML or HTML. Users can make annotations in the same tool they use for browsing. It facilitates manual annotation of Web pages but does not support automatic annotation;

The Annozilla browser aims to make all Amaya annotations readable in the Mozilla browser;

The Mangrove system is another example of a manual but user-friendly annotation tool [McDowell, 2003]. The annotation tool is a straightforward GUI that allows users to associate a selection of tags with text that they highlight;

Due to the increase of multimedia content on the Web, tools to annotate this kind of content have become very useful. Vannotea [Schroeter, 2003] can be used to add metadata to MPEG-2 (video), JPEG2000 (image) and Direct 3D (mesh) files, with the mesh being used to define regions of images;

OntoMat Annotizer: this is a tool for making annotations which is built on the principles of the CREAM framework. It has a Web browser to display the page being annotated and provides some reasonably user-friendly functions for manual annotation, such as drag-and-drop creation of instances and the ability to mark up pages while they are being created;

The M-OntoMat-Annotizer [Bloehdorn, 2005] supports manual annotation of image and video data by indexers with little multimedia experience through automatic extraction of low-level features that describe objects in the content. A commercial version of OntoMat, called OntoAnnotate, is available from Ontoprise;

SHOE Knowledge Annotator [Heflin, 2001] was an early system that allowed users to mark up HTML pages in SHOE, guided by ontologies available locally or via a URL. Users were assisted by being prompted for inputs. Unusually, the SHOE Knowledge Annotator did not have a browser to display Web pages, which could only be viewed as source code.

7.2.3 Gaps and future needs

Although most annotation tools are based on easy-to-understand and easy-to-use GUIs, it is still relatively expensive to annotate information sources. There is a need for integrated systems that allow users to deal with the documents, the ontologies and the annotations that link documents to ontologies.

The most important challenge for manual annotation tools is automation: automation to support annotation, to support ontology maintenance, and to help maintain the consistency of documents, ontologies and annotations [Uren, 2005].

Other important challenges for the future in this active research area are automating the annotation of information in various formats, addressing issues of trust and security, and resolving problems of storage.

7.2.4 Recommendations

Due to the nature of this topic, some recommendations may overlap with those for issues already covered, such as ontologies.

7.2.4.1 Short-term recommendations (1–3 years)

Enhance the use of standard ontologies (e.g. harmoNISE) in the field of tourism.

Enhance the development of ontologies with standard languages: OWL, RDF.

Enhance the use of already existing manual annotation tools in the realm of tourism.

7.2.4.2 Long-term recommendations (3–10 years)

Investigate the automation of annotation.

Investigate automatic ontology extension.

7.3 Automatic information extraction

7.3.1 Needs and requirements

Much of the data relevant for eTourism is available on ordinary public web sites. Just as tourism itself is a wide-ranging concept, data pertinent to it can stem from many sources, touristic and otherwise. Local communities often provide ample information about points of interest in their area. Theatres and orchestras regularly publish their programmes, museums publish opening hours and ticket prices, and hotels frequently provide information about their services that complements the basic facts stored in major booking systems.

These scenarios are only a few examples of many. In fact, the amount of information stored in this way probably vastly surpasses that in structured sources. As analysed by Martin Hepp, Katharina Siorpaes et al., structured and unstructured data complement each other in many cases, e.g. for hotels, where web sites frequently contain more complete descriptions of the hotel while the GDSs only publish room availability.

Normally, however, the data on the web is unstructured and geared towards human consumption only. Only rarely do metadata or formal resource descriptions reliably complement and make explicit this unstructured information to facilitate its use in automated transactions or its automated integration with structured resources. It seems unlikely that this situation will improve fundamentally over the next years.

The unstructured nature of the data invariably limits its reuse in electronic transactions. On the basis of this type of information it will be difficult at best to, for example, automatically complement a hotel booking with the reservation of museum and theatre tickets.

7.3.1.1 Needs

Nonetheless, as long as there is no prospect of fundamentally changing the present situation, companies need to make the best possible use of the data that currently exists.

7.3.1.2 Requirements

In an ideal world, information extraction would structure free text in such a way that it can be automatically analyzed, queried and integrated with structured data sources. This is certainly illusory for the foreseeable future. Nevertheless, it is necessary to explore the potential of the various facets of information extraction for the eTourism domain.

7.3.2 State of the art

Information extraction is “the automatic identification of selected types of entities, relations, or events in free text” [Grishman, 2003]. Unstructured information is thus automatically structured and usually imported into databases, XML files or other structured storage formats for subsequent analysis and evaluation.

Information extraction is by no means restricted to web sites. In fact, information extraction was originally popularized in the 1980s on the basis of locally stored free-text corpora. However, many of today’s applications incorporate the harvesting of information on the web. This is certainly also the more applicable scenario for eTourism.

Currently the two branches of information extraction that have drawn most attention in the research community are named entity recognition (the identification of references to persons, organizations, places, etc.) and event extraction. The latter is practised, e.g., in projects such as JRC’s EMM Violent Events Maps, which are automatically compiled from published news feeds. Both are pertinent to eTourism. Furthermore, some research has been done on information extraction specific to eTourism.

7.3.2.1 Named entity recognition

Named entity recognition is by now a rather well understood topic with wide application, both across many fields (computational linguistics, computational philology and related disciplines, even genetics) and across many languages. Approaches to name tagging build either on hand-crafted rules, where good classifiers can reach a precision well above 90 % for English-language material (cf. Grishman, 2003, note 3), or on machine learning technologies, including automated learning and statistical model building. Both maximum entropy [Borthwick, 1999] and Hidden Markov [Bikel, Miller, Schwartz, Weischedel, 1997] models have been trained on tagged reference material. The models have then been successfully applied to material not used in training, again reaching precision levels above 90 %.

Various readily available tools implement named entity recognition. The ANNIE package of the open source GATE suite contains resources such as tokenizers, gazetteers and semantic taggers for building rule-based named entity recognizers. Many other open source and commercial offerings are listed in http://en.wikipedia.org/wiki/Named_entity_recognition.
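The rule-based approach can be illustrated by a minimal Python sketch that combines a gazetteer lookup with one hand-crafted rule. It neither uses nor reproduces the ANNIE API; the gazetteer entries, the type labels and the rule that "Hotel" followed by a capitalised word names an organization are assumptions made purely for illustration.

```python
import re

# Hypothetical gazetteer (surface form -> entity type); entries are illustrative only.
GAZETTEER = {
    "Vienna": "LOCATION",
    "Salzburg": "LOCATION",
    "Burgtheater": "ORGANIZATION",
}

# Hand-crafted rule (assumption): "Hotel" followed by a capitalised word names an organization.
HOTEL_RULE = re.compile(r"\bHotel [A-Z][a-z]+\b")


def tag_entities(text):
    """Return (start, end, surface, type) tuples sorted by start offset."""
    entities = []
    # Apply the rule first so that its longer matches take precedence.
    for m in HOTEL_RULE.finditer(text):
        entities.append((m.start(), m.end(), m.group(), "ORGANIZATION"))
    # Gazetteer lookup on whole words, skipping spans already covered by a match.
    for name, etype in GAZETTEER.items():
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            covered = any(s < m.end() and m.start() < e for s, e, _, _ in entities)
            if not covered:
                entities.append((m.start(), m.end(), name, etype))
    return sorted(entities)


if __name__ == "__main__":
    sentence = "Guests of Hotel Alpenblick in Vienna get discounts at the Burgtheater."
    for start, end, surface, etype in tag_entities(sentence):
        print(etype, surface)
```

In "Hotel Vienna" the rule match covers the gazetteer entry, so only the organization is reported; real rule sets need many such precedence decisions.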

7.3.2.2 Event extraction

Whereas named entity recognition is a rather well understood topic, event extraction is somewhat more experimental and by necessity more closely bound to the type of events that are to be extracted. A given event type is usually captured according to a given template (essentially a database table or a set of formal assertions) whose slots are filled with entities isolated in the free text. As a rule, named entity recognition is part of this process, as named entities frequently occur in the description of events.

To illustrate this, a typical example of an event description from the Wall Street Journal of 1993-02-19 may help. This example is taken directly from GATE Information Extraction:

New York Times Co. named Russell T. Lewis, 45, president and general manager of its flagship New York Times newspaper, responsible for all business-side activities. He was executive vice president and deputy general manager. He succeeds Lance R. Primis, who in September was named president and chief operating officer of the parent.

Ideally, event extraction might automatically capture the series of events implied in this article according to a job-related template with fields such as organization, job title, newly appointed person, and previous job holder. In reality this is often highly non-trivial, as exemplified by the number of anaphoric references (“he”, “who” and “the parent”), the need for inference (Primis obviously was the previous job holder and has now been promoted) and the amount of encyclopaedic knowledge (New York Times Co. is the holding company of the newspaper) needed to interpret even this short and seemingly simple news bulletin.
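The following Python sketch shows what filling the job-related template could look like for this one bulletin. The slot names follow the fields listed above; the two regular expressions are hypothetical and written for this sentence shape only, and the assumption that "He" refers to the newly appointed person is a deliberately crude stand-in for anaphora resolution.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobChangeEvent:
    """Hypothetical job-related template; slot names follow the text above."""
    organization: Optional[str] = None
    job_title: Optional[str] = None
    new_person: Optional[str] = None
    previous_holder: Optional[str] = None


# Hand-written patterns for this one sentence shape (illustrative, not GATE's grammar).
APPOINTMENT = re.compile(
    r"(?P<org>[A-Z][\w.& ]+?) named (?P<person>[^,]+), \d+, (?P<title>.+?) of "
)
SUCCESSION = re.compile(r"\bHe succeeds (?P<prev>[^,]+),")


def extract(text: str) -> JobChangeEvent:
    """Fill the template slots that the two patterns can reach."""
    event = JobChangeEvent()
    m = APPOINTMENT.search(text)
    if m:
        event.organization = m.group("org")
        event.new_person = m.group("person")
        event.job_title = m.group("title")
    m = SUCCESSION.search(text)
    # Crude anaphora heuristic: assume "He" refers to the person just appointed.
    if m and event.new_person:
        event.previous_holder = m.group("prev")
    return event
```

The slots are filled, but nothing in the result records that Primis has moved to the parent company or that "the parent" denotes New York Times Co.; capturing that would require the inference and encyclopaedic knowledge discussed above.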

Unsurprisingly, results tend to be better if the source material already follows some recurrent pattern, as is the case, e.g., for many job postings or medical records, but also, interestingly, for news articles on violent events such as bombings or earthquakes.

The number of readily available tools for event extraction is smaller than that for named entity recognition, and they need to be heavily tailored to any given type of event and template. One example of such a tool is the open source GATE Information Extraction package. Commercial offerings include the OpenCalais suite of web services.

7.3.2.3 Tourism-specific information extraction

Information extraction for tourism-specific data necessarily has to deal with a number of different types of events, such as performances, sports events, entries from event calendars, etc. Each of these can have its own display rules and needs its own templates. Furthermore, pertinent data is regularly spread across many sources in many different languages, so extraction must support parsing in many languages. Ideally, the data should then be stored in language-independent templates based on language-neutral concept hierarchies. For end-user consumption the templates must be rendered in various languages, ideally fully automatically. The FP4 project on Multilingual Information Extraction for Tourism and Travel Assistance (MIETTA, 1998–2000) already worked on precisely these issues. Xu, Netter and Stenzhorn elaborate on two event types, adult education courses and theatre performances, and describe the MIETTA system developed in the project. Sadly, they do not publish any data on the reliability of the system obtained by testing the extracted information against manually captured data, as would have been normal practice. Such data would obviously be a precondition for gauging the viability of the project’s approach.
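A language-independent template of the kind described above might be sketched as follows: the record stores a language-neutral concept identifier, and labels are attached only when the record is rendered for an end user. The concept identifier, the label table and the fallback to English are assumptions for illustration and do not reflect the MIETTA design.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical language-neutral concept identifiers with per-language labels.
CONCEPT_LABELS = {
    "event:TheatrePerformance": {
        "en": "Theatre performance",
        "de": "Theateraufführung",
        "fr": "Représentation théâtrale",
    },
}


@dataclass(frozen=True)
class EventRecord:
    """Language-independent template: a concept id plus language-neutral values."""
    concept: str   # language-neutral concept identifier
    title: str     # proper name, kept as found in the source
    venue: str
    day: date


def render(record: EventRecord, lang: str) -> str:
    """Render the record for end users, falling back to English labels."""
    labels = CONCEPT_LABELS[record.concept]
    label = labels.get(lang, labels["en"])
    return f"{label}: {record.title}, {record.venue}, {record.day.isoformat()}"
```

Dates are kept in ISO 8601 form here; rendering them in locale-specific formats would be a further, separate step.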

A brief follow-up project on a Multilingual Information Environment for Travel and Tourism Applications (MIETTA-II) was funded under FP5 from 2001 to 2002. Its primary goal was to commercialize the findings of MIETTA. Unfortunately, very little information on MIETTA-II is publicly available. It is not clear whether the project actually achieved its primary objective and whether the results of the two projects were ever deployed in real-life settings.

Current research seems to have largely abandoned information extraction in the tourism domain and to have opted for semantic web approaches to interoperability. Such approaches have been analyzed in projects such as SATINE (2004–2006) and concentrate on the semantic description of web services that give access to already structured data.

7.3.3 Gaps and future needs

7.3.3.1 Named entity recognition

For eTourism, named entity recognition is key to linking extracted information with given locations or organizations such as hotels, theatres or other relevant players. For this purpose, agreement is needed on a suitable model to unambiguously link the names of organizations to a suitable vocabulary of organizational units in the eTourism domain, possibly based on the 29 types proposed in “Annotation guidelines for answer types” [Brunstein, 2002]. Such a model needs to be validated against sample data to test whether its level of granularity is adequate and whether tagging reaches sufficient precision.
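The linking problem can be made concrete with a small Python sketch that normalizes a name found in text and looks it up in a controlled vocabulary of organizational units. The vocabulary identifiers, the type labels and the token-subset matching rule are hypothetical; they do not reproduce the 29 types of [Brunstein, 2002].

```python
import re
import unicodedata

# Hypothetical controlled vocabulary: identifier -> (canonical name, organizational type).
VOCABULARY = {
    "org:001": ("Hotel Alpenblick Innsbruck", "HOTEL"),
    "org:002": ("Burgtheater", "THEATRE"),
    "org:003": ("Hotel Alpenblick Salzburg", "HOTEL"),
}


def normalise(name: str) -> str:
    """Case-fold, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = re.sub(r"[^\w\s]", " ", no_accents.casefold())
    return " ".join(cleaned.split())


def link(mention: str) -> list:
    """Return the ids of all entries whose name contains every token of the mention."""
    tokens = set(normalise(mention).split())
    return sorted(
        ident
        for ident, (canonical, _type) in VOCABULARY.items()
        if tokens and tokens <= set(normalise(canonical).split())
    )
```

A mention such as "Hotel Alpenblick" matches two entries, which illustrates why an agreed model and validation against sample data are needed before such tagging can be relied upon.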

7.3.3.2 Event extraction

Event extraction is still a research area, although, as we have seen, first applications are operational, e.g. in the news arena. Standardization in this area would, however, be premature.

7.3.3.3 Tourism-specific information extraction

Event extraction for eTourism is still very much an area of research. In particular, it lacks performance tests that would allow an informed decision on the precision that current systems can reach. Given the great potential that information extraction has for the domain, such data would be highly desirable.

7.3.4 Recommendations

7.3.4.1 Short-term recommendations (1–3 years)

Foster the use of semantic web technologies to describe non-structured data on the web by means of resources, in order to make the data machine-processable.

Semantically tag non-structured information.
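As an illustration of what semantically tagging non-structured information could yield, the following Python sketch turns facts taken from a museum web page into RDF statements in N-Triples syntax. Only the rdf:type and xsd:decimal URIs are standard; the http://example.org/tourism# namespace, its property names and the sample values are hypothetical placeholders for an agreed tourism vocabulary.

```python
from typing import List, Optional

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"
EX = "http://example.org/tourism#"  # hypothetical tourism vocabulary namespace


def literal(value: str, datatype: Optional[str] = None, lang: Optional[str] = None) -> str:
    """Serialise a string as an N-Triples literal, escaping backslashes, quotes and newlines."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    if datatype:
        return f'"{escaped}"^^<{datatype}>'
    if lang:
        return f'"{escaped}"@{lang}'
    return f'"{escaped}"'


def tag_museum(subject: str, name: str, opening_hours: str, price_eur: str) -> List[str]:
    """Turn facts found on a museum web page into N-Triples statements."""
    s = f"<{subject}>"
    return [
        f"{s} <{RDF_TYPE}> <{EX}Museum> .",
        f"{s} <{EX}name> {literal(name, lang='en')} .",
        f"{s} <{EX}openingHours> {literal(opening_hours)} .",
        f"{s} <{EX}ticketPriceEUR> {literal(price_eur, datatype=XSD_DECIMAL)} .",
    ]
```

The hard part, extracting the name, hours and price reliably from free text, is assumed to have been done already; the sketch only shows the target representation.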

7.3.4.2 Long-term recommendations (3–10 years)

Agree on the name tags (labels) that particular tourism content ought to have (preferably with the involvement of a recognized body such as the W3C), so that the content is made visible to search engines.

Develop software that enables (semi-)automatic information tagging according to the previous recommendation.

7.4 Inter-ontology mapping

7.4.1 Needs and requirements

7.4.1.1 Introduction

The mapping between an integrated global ontology and local ontologies may support enterprise knowledge management and data or information integration. In the Semantic Web, an integrated global ontology extracts information from the local ones and provides a unified view through which users can query different local ontologies. In an information integration system, a mediated schema is constructed for user queries. Mappings are used to describe the relationship between the mediated schema, i.e. an integrated global ontology, and the local schemas.
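The role of mappings between a mediated schema and local schemas can be sketched in Python. The example simplifies ontologies to record schemas: two hypothetical airline sources describe flights with different field names and units, and each mapping translates a local record into the global Flight schema so that a single query can be answered over both sources. Source names, fields and fares are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flight:
    """Element of the mediated (global) schema."""
    origin: str
    destination: str
    carrier: str
    price_eur: float


# Two hypothetical local sources with their own field names and units.
SOURCE_A = [{"from": "VIE", "to": "BCN", "fare_eur": 129.0}]
SOURCE_B = [{"dep_airport": "VIE", "arr_airport": "BCN", "price_cents": 11900}]

# Mappings: how each local record is expressed in terms of the global schema.
MAPPINGS = {
    "AirlineA": (SOURCE_A, lambda r: Flight(r["from"], r["to"], "AirlineA", r["fare_eur"])),
    "AirlineB": (SOURCE_B, lambda r: Flight(r["dep_airport"], r["arr_airport"],
                                            "AirlineB", r["price_cents"] / 100)),
}


def query(origin: str, destination: str) -> list:
    """Answer a query posed against the global schema by applying every mapping."""
    results = []
    for records, to_global in MAPPINGS.values():
        for record in records:
            flight = to_global(record)
            if (flight.origin, flight.destination) == (origin, destination):
                results.append(flight)
    return sorted(results, key=lambda f: f.price_eur)
```

Users see one list of flights for the route, whichever source each record came from; adding a third airline means adding one mapping, not changing the query.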

7.4.1.2 Needs

There may be different airlines flying to the same destinations from the same origins, and that information has to be shown to the end user so that she can decide on the most convenient way to travel.

Tasks on distributed and heterogeneous systems demand support from more than one ontology. Multiple ontologies need to be accessed from different systems. In addition, the distributed nature of ontology development (conceptualization) has led to dissimilar ontologies for the same or overlapping domains. As a consequence, parties with different ontologies do not fully understand each other and cannot work together, which prevents electronic transactions. To solve these problems, ontology mapping is needed to achieve interoperability among information sources and to enable effective and efficient business transactions over the Internet.

7.4.1.3 Requirements

Information sharing and integration must not only provide full access to data; they ought to make that data fully processable and interpretable by machines as well. One possible way to achieve effective integration of heterogeneous information is to create links among already existing ontologies. There are different ways to map ontologies: from an integrated global ontology to local ontologies, between local ontologies, and ontology mapping as part of ontology merging and alignment.

Ontology mapping between local ontologies provides interoperability for highly dynamic, open and distributed environments such as tourism. It can be used for mediation between distributed data in such environments. This kind of mapping is more appropriate and scalable than mapping between an integrated global ontology and local ontologies. It enables ontologies to be contextualized, as it keeps content local. It can provide interoperability between local ontologies when they cannot be integrated or merged because their information is mutually inconsistent.

With the growing use of ontologies in different domains of interest, the problem of overlapping knowledge in a common domain becomes critical. The complexity of the travel and tourism industry could by no means be represented by a single ontology, so multiple ontologies would have to be accessed from various applications. Inter-ontology mapping could very well provide a common layer through which several ontologies could be accessed and could hence exchange information in a semantically sound manner.

7.4.2 State of the art

The task of integrating heterogeneous information sources puts ontologies in context. They cannot be perceived as standalone models of the world but should rather be seen as the glue that binds together information of various kinds. Consequently, the relation of ontologies to their environment plays an essential role in information integration. The term mapping is used to denote the connection of ontologies to other parts of the application system. The two most important uses of mappings required for information integration are mappings between ontologies and the information they describe, and mappings between the different ontologies used in a system.

Many information integration systems use more than one ontology to describe the information. Mapping different ontologies onto each other is a well-known problem in knowledge engineering. General approaches used in information integration systems are:

Defined Mappings: A common approach to the ontology mapping problem is to provide the possibility to define mappings. Different kinds of mappings are distinguished in this approach, ranging from simple one-to-one mappings between classes and values up to mappings between compound expressions. This approach allows great flexibility, but it fails to ensure the preservation of semantics: the user is free to define arbitrary mappings even if they do not make sense or produce conflicts.

Lexical Relations: These approaches extend a common description logic model by quantified inter-ontology relationships borrowed from linguistics. Some of the relationships used are synonym, hypernym, hyponym, overlap, covering and disjoint. While these relations are similar to constructs used in description logics, they do not have a formal semantics. Consequently, the subsumption algorithm is heuristic rather than formally grounded.

Top-Level Grounding: In order to avoid a loss of semantics, one has to stay inside the formal representation language when defining mappings between different ontologies. A straightforward way to stay inside the formalism is to relate all ontologies used to a single top-level ontology. This can be done by inheriting concepts from a common top-level ontology, and the approach can be used to resolve conflicts and ambiguities. While it allows connections to be established between concepts from different ontologies in terms of common superclasses, it does not establish a direct correspondence. This might lead to problems when exact matches are required.

Semantic Correspondences: An approach that tries to overcome the ambiguity that arises from an indirect mapping of concepts via a top-level grounding is the attempt to identify well-founded semantic correspondences between concepts from different ontologies. In order to avoid arbitrary mappings between concepts, these approaches have to rely on a common vocabulary for defining concepts across different ontologies.
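Two of the approaches above can be contrasted in a small Python sketch. Defined mappings are stored as explicit pairs and accepted without any semantic check, while top-level grounding relates classes only through the most specific superclass they share in a common top-level ontology. The class names and the hierarchy are hypothetical; lexical relations and semantic correspondences are not modelled.

```python
from typing import Dict, List, Optional, Set

# Two hypothetical local ontologies ("a:" and "b:") grounded in a shared
# top-level ontology ("top:"). Each entry maps a class to its superclass.
PARENTS: Dict[str, str] = {
    "a:Hotel": "top:Accommodation",
    "a:GuestHouse": "top:Accommodation",
    "a:Concert": "top:Event",
    "b:Pension": "top:Accommodation",
    "b:Campsite": "top:Accommodation",
    "top:Accommodation": "top:TourismProduct",
    "top:Event": "top:TourismProduct",
}


def ancestors(cls: str) -> List[str]:
    """The class itself followed by its superclasses, most specific first."""
    chain = [cls]
    while chain[-1] in PARENTS:
        chain.append(PARENTS[chain[-1]])
    return chain


def common_superclass(c1: str, c2: str) -> Optional[str]:
    """Top-level grounding: the most specific shared superclass, not a direct match."""
    shared = set(ancestors(c2))
    return next((c for c in ancestors(c1) if c in shared), None)


# Defined mappings: explicit pairs that are accepted without any semantic check.
DEFINED: Dict[str, Set[str]] = {}


def define_mapping(source: str, target: str) -> None:
    DEFINED.setdefault(source, set()).add(target)


define_mapping("a:GuestHouse", "b:Pension")
define_mapping("a:GuestHouse", "b:Campsite")  # questionable, yet nothing prevents it
```

Both a:Hotel and a:GuestHouse share top:Accommodation with b:Pension, so grounding alone cannot tell which of them is the direct counterpart; this is the ambiguity that semantic correspondences try to resolve.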

7.4.3 Gaps and future needs

Ontologies have been widely used in a large number of information systems for different purposes. However, much remains to be done in order to successfully mediate information exchange and integration processes.

Although reasonable results have been achieved on the technical side of using ontologies for intelligent information integration, the use of inter-ontology mapping is still an exception. A review of the literature suggests that most mappings have been realized ad hoc, i.e. for the particular purpose at hand, especially for connecting different ontologies. There are approaches that try to provide well-founded mappings, but they either rely on assumptions that cannot always be guaranteed or face technical problems. Research is needed on general-purpose mapping methodologies.

Most systems only provide tools to develop ontologies and fail to indicate a particular methodology for developing them. A comparison of different approaches indicates that requirements concerning ontology language and structure depend on the kind of information to be integrated and on the intended use of the ontology. There is a need for a more general methodology that includes an analysis of the integration task and supports the process of defining the role of ontologies with respect to these requirements.

Through the use of ontologies, inter-ontology mapping could offer ontology-based mediation among diverse information sources. eTourism is an industry with a strong need for interoperability among all agents in the value chain if an integrated service (a general market trend) is to be provided. While languages for representing semantic models have been intensively studied, semantic mapping is an open and active research field that is still in its very early stages. A number of issues need to be overcome in the near future:

Identify incompatibilities between different data models and representation structures.

Define the degree of formality with which information needs to be defined, so that two different data structures can be compared, merged and eventually mapped effectively and as automatically as possible.

Language Heterogeneity is still a problem. There are a number of standard languages (with different degrees of formality) for representing semantic models. However, one of the major challenges is still the translation between models encoded in different languages. The challenge is to provide translations with guaranteed formal properties.

The Nature of Semantic Relations: Most existing mapping approaches use a very limited set of semantic relations that can hold between elements from different models; in particular, implication and equivalence are frequently used. Many realistic settings, however, demand richer relations such as inconsistency, cause-effect relations or overlap. Very limited work exists on approaches for measuring the degree of relatedness specified by a mapping. This is particularly important when mappings are created by automatic mapping tools. A very specific problem with respect to semantic relations is the definition of semantic relations between models that describe the domain of interest at different levels of abstraction.

A general observation about the state of the art of mapping representations is that mappings are not yet considered first-class entities in semantic models. While most approaches agree on elements such as concepts, relations and instances, mappings are not yet an agreed element of semantic modelling. Important operations on mappings, such as reasoning about, retrieving and composing mappings, are currently not supported.

A Framework for Comparing Mappings: A very concrete research task is to design a common framework for comparing existing mapping approaches.

Text mining: Text analysis and conversion into a formal representation model that can automatically be linked into the common model of the tourism domain.

Identify techniques and methodologies to solve these problems. Semantic mapping needs to be further automated; however, complete automation is not expected to be reached.

Tools that support the creation of a common ontology that enables semi-automatic mapping of data structures with data representation models.

Tools that act as information and mediation brokers (data and information conversion) between the common and particular models.
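Treating mappings as first-class entities, with an explicit relation and a degree of relatedness, makes operations such as composition possible. The Python sketch below assumes a deliberately small set of relations (equivalence, subsumption and overlap, the last meaning at least one shared instance) and a confidence value on a 0..1 scale that is multiplied on composition; both choices are illustrative assumptions, not an agreed model.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Relation(Enum):
    EQUIVALENT = "equivalent"
    SUBSUMED_BY = "subsumed_by"  # every instance of the source is an instance of the target
    OVERLAP = "overlap"          # source and target share at least one instance


@dataclass(frozen=True)
class Mapping:
    """A mapping treated as a first-class object rather than an ad-hoc link."""
    source: str
    target: str
    relation: Relation
    confidence: float  # degree of relatedness on an assumed 0..1 scale


def compose(m1: Mapping, m2: Mapping) -> Optional[Mapping]:
    """Compose m1 (A -> B) with m2 (B -> C); return None if no relation can be derived."""
    if m1.target != m2.source:
        raise ValueError("mappings do not share a middle element")
    r1, r2 = m1.relation, m2.relation
    if r1 is Relation.EQUIVALENT:
        relation = r2   # A equals B, so A relates to C as B does
    elif r2 is Relation.EQUIVALENT:
        relation = r1   # B equals C, so A relates to C as it relates to B
    elif r2 is Relation.SUBSUMED_BY:
        relation = r1   # B lies within C, so subsumption or overlap with B carries over
    else:
        return None     # B merely overlaps C: nothing follows about A and C
    # Multiplying confidences assumes the two mappings err independently.
    return Mapping(m1.source, m2.target, relation, m1.confidence * m2.confidence)
```

Where the composed relation cannot be derived, for example when a subsumption is followed by an overlap, the sketch returns None rather than guessing.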

7.4.4 Recommendations

Recommendations within this section are by nature very similar to the recommendations proposed within chapter 6.3 (“Ontologies”).

7.4.4.1 Short-term recommendations (1–3 years)

Foster the development of ontologies using the same standard definition language as well as the same degree of formality and expressivity to ease automatic ontology mapping, following W3C recommendations.

7.4.4.2 Long-term recommendations (3–10 years)

Based on the short-term recommendations, build tools with graphical user interfaces that automatically merge and link ontologies, using the ontologies' reasoning capabilities to automatically find alignments and resolve inconsistencies.

8 Process handling

8.1 Needs and requirements

8.1.1 Introduction

Consumers in the tourism industry are increasingly used to making online transactions, and the industry competes with services to attract these customers and get them to the actual booking as fast as possible. Traditional distribution channels are vanishing, and more flexible and dynamic networks are emerging. This very dynamic development puts pressure on service providers: business actors have to follow demand to keep or expand their market share, otherwise they might be crowded out.

These challenges require skills in marketing but, most of all, in deploying modern information technology to manage the actual buying or booking process. This process, like other processes in the domain, usually requires the participation of different players along the value chain, making it necessary to interact easily with other computer systems on a process level. But managing business processes is already difficult within one organization, and it becomes a much more sophisticated challenge in a network of organizations.

We want to start the discussion on the topic by looking at broadly accepted definitions. Davenport [1993] defines a (business) process as “a structured, measured set of activities designed to produce a specific output for a particular customer or market. It implies a strong emphasis on how work is done within an organization, in contrast to a product focus’ emphasis on what. A process is thus a specific ordering of work activities across time and space, with a beginning and an end, and clearly defined inputs and outputs: a structure for action. ... Taking a process approach implies adopting the customer’s point of view. Processes are the structure by which an organization does what is necessary to produce value for its customers.”

Although this is a very customer-oriented definition, it is, first, broadly accepted and, second, contains an important phrase: “A process is thus a specific ordering of work activities across time and space, with a beginning and an end, and clearly defined inputs and outputs.” A similar definition comes from Rummler and Brache [1995], who state that “a business process is a series of steps designed to produce a product or service”.

As already outlined in the introduction to this topic, in the context of information and communication technology we consider a process to consist of data, defined as inputs and outputs, and of its execution, being a “work activity” or step. The problem of data heterogeneity across different systems is covered in the chapter on semantics, while this chapter discusses the dynamic aspect of executing processes that involve heterogeneous computer systems.

In fact, even a one-time exchange of data is already a simple process, which implies that data cannot be exchanged without some kind of process being involved. Since this is already true for web sites being “crawled” to get information, we do not consider passive process participation in our discussion of the matter. Instead, we consider a rather complex interplay of at least two participants. This has always been a problematic issue and a more critical challenge than the mere exchange of data. The issue becomes even more pressing in a highly networked, dynamic and diverse environment like today’s tourism industry. The introduction of standards is of course always a sophisticated way to address interoperability issues, but we know from the past that industry-wide acceptance is difficult to achieve. One reason is the loss of flexibility that accompanies standards; another is the play of market forces.

Since we leave the problem of data mediation to the chapter on semantics, and since we consider complex processes, we have named this chapter “Process handling”; we consider it the dynamic component of process interoperability, which requires the active participation of all actors involved.

8.1.2 Needs

In this chapter the basic and principal needs for process interoperability are analysed and discussed, while requirements are outlined in the following chapter. The challenge is to find ways of achieving process interoperability between heterogeneous systems that allow easy integration of business processes while preserving the autonomy and diversity of the different players, which is needed to meet the diversity of requirements on a global scale. The following discussion does not touch on business issues such as pricing, virtual or ad-hoc organizational forms, dynamic packaging, legal aspects, etc. Nor is the intention to design platforms for these issues; the aim is merely to discuss and recommend one or several ways of enabling process interoperability.

According to the definition presented in the introduction, a process has a clearly defined starting point, a clearly defined end point (an objective), and may have a number of interim steps. The following use case illustrates in simple terms a booking process in the tourism industry: the starting point is a user who intends to book a room in a hotel he has already selected. The starting point is a clearly identified object; the end point is a booking confirmation for this object for a specific date. The process could be broken down into the following steps:

1. Check availability of the room for the date given.
2. Get the customer’s acceptance of terms (e.g. room rate for that date) and payment details.
3. Make the reservation and print the booking confirmation.
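The three steps can be sketched in Python for a single hotel system. The in-memory availability table, the rates and the payment reference are hypothetical, and treating a list of alternative dates as the backward loop after an unsuccessful availability check is an assumption; sub-steps such as temporary blocking are not modelled.

```python
from datetime import date
from typing import Callable, Dict, List, Optional

# Hypothetical in-memory state of one selected hotel room.
AVAILABLE: Dict[date, bool] = {date(2009, 6, 3): False, date(2009, 6, 4): True}
RATES: Dict[date, float] = {date(2009, 6, 3): 120.0, date(2009, 6, 4): 120.0}


def book(requested: List[date], accept_terms: Callable[[date, float], Optional[str]]) -> dict:
    """Run steps 1-3 for each requested date in turn until one succeeds."""
    for day in requested:
        # Step 1: check availability of the room for the given date.
        if not AVAILABLE.get(day, False):
            continue  # backward loop: return to step 1 with the next candidate date
        # Step 2: get the customer's acceptance of terms; None means declined.
        payment = accept_terms(day, RATES[day])
        if payment is None:
            return {"status": "aborted", "date": day}
        # Step 3: make the reservation and issue the booking confirmation.
        AVAILABLE[day] = False
        return {"status": "confirmed", "date": day, "rate": RATES[day], "payment": payment}
    return {"status": "no availability"}
```

Here every step runs on the same system; the interesting cases arise when the steps are spread over several systems.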

These three steps are very basic and might have some backward loops (e.g. if the room is not available) or sub-steps (the availability check might include temporarily blocking the room for the specific date). But above all, parts of the process might run on other systems. Imagine that the booking is made on a portal comprising a number of different hotel chains. The availability check is done on a hotel chain’s computer system, and the approval of credit card payment (a prerequisite for making a reservation) is done on a third system.

This short use case illustrates that a business process can be broken down into different steps (or sub-processes), which might need the interaction of different systems. The entire process could be drawn as a flow chart showing the different steps and their dependencies. In any case, completing the entire business process requires handling the steps and conditions. For example, if the room has been reserved in a first step and the credit card is not accepted in a later step, the reservation of the room must be cancelled. Alternatively, the reservation is cancelled automatically after some time if the confirmation required to complete the booking is not received. This is up to the design of the process on the hotel’s side. However, the portal, as owner of the entire business process, might need to deal with as many different systems as there are hotel chains presented on the portal. And each of these systems might use a different name for reserving a room (booking, reservation, locking, etc.) and apply different conditions (requires confirmation to complete the reservation, cancels the reservation automatically after some time without confirmation, keeps the reservation alive until its status is changed, etc.).
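The portal’s situation can be sketched in Python with two hypothetical hotel systems that name the reservation operation differently. One adapter per system hides the naming difference, and a compensating action cancels the reservation when the later payment step fails, in the spirit of the saga pattern. Automatic expiry of unconfirmed reservations, the other option mentioned above, is not modelled; all class and method names are invented.

```python
class HotelSystemA:
    """Hypothetical hotel chain system that calls the operation a 'reservation'."""

    def __init__(self):
        self.held = set()

    def create_reservation(self, room, day):
        self.held.add((room, day))
        return (room, day)

    def cancel_reservation(self, ref):
        self.held.discard(ref)


class HotelSystemB:
    """Hypothetical hotel chain system that calls the same operation a 'lock'."""

    def __init__(self):
        self.locks = {}
        self.next_id = 1

    def lock(self, room, day):
        lock_id = self.next_id
        self.next_id += 1
        self.locks[lock_id] = (room, day)
        return lock_id

    def unlock(self, lock_id):
        self.locks.pop(lock_id, None)


# One adapter per hotel system: the portal only ever calls reserve() and cancel().
ADAPTERS = {
    HotelSystemA: (
        lambda hotel, room, day: hotel.create_reservation(room, day),
        lambda hotel, ref: hotel.cancel_reservation(ref),
    ),
    HotelSystemB: (
        lambda hotel, room, day: hotel.lock(room, day),
        lambda hotel, ref: hotel.unlock(ref),
    ),
}


def portal_booking(hotel, room, day, card_accepted):
    """Reserve on the hotel system, then compensate if the payment step fails."""
    reserve, cancel = ADAPTERS[type(hotel)]
    ref = reserve(hotel, room, day)   # step on the hotel chain's system
    if not card_accepted(ref):        # step on the payment provider's system
        cancel(hotel, ref)            # compensating action undoes the earlier step
        return "cancelled"
    return "confirmed"
```

Every additional hotel system with its own naming and conditions needs another entry in the adapter table, which is the growth problem described next.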

Although the portal requires just one step to be done on another system, it might have to deal with 100 different ways of performing this step if 100 hotels are involved. And each hotel might have to deal with 100 booking systems if it does business with 100 portals. Thus each actor might have to implement 100 interfaces to be interoperable with the other systems it needs. Obviously, this dramatically increases the effort required to run processes automatically with other partners.
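The arithmetic behind this observation can be made explicit, counting each portal-hotel pairing as one integration and assuming, for comparison, a single interface agreed by all actors. The figures are illustrative and ignore the effort of agreeing on that interface.

```python
def point_to_point(portals: int, hotels: int) -> int:
    """Every portal integrates separately with every hotel system."""
    return portals * hotels


def common_interface(portals: int, hotels: int) -> int:
    """Every actor implements one shared interface once."""
    return portals + hotels


for n in (10, 100, 1000):
    print(n, point_to_point(n, n), common_interface(n, n))
# 10 100 20
# 100 10000 200
# 1000 1000000 2000
```

The point-to-point effort grows with the product of the number of actors, the shared-interface effort only with their sum.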

Since this simple use case is very common and the industry depends more and more on the interaction of different computer systems, we can assume a strong need for a solution that decreases the complexity of process interoperability in a networked environment. Typical business processes in the tourism industry are:

searching, selling and buying, reservation, booking, modification, cancellation, confirmation, notification, payment and other money transfers.

This list might not be complete and could provoke a lot of discussion (e.g. about the difference between buying and booking, or between confirmation and notification). However, it is only meant to give examples of the range of possibilities we are discussing. To perform all these processes in a networked environment, we can assume that

“the basic industry need is an applicable concept for the technical interaction of heterogeneous ICT systems to provoke and run complete business process cycles involving at least two different technical systems.”

“Applicable concept” expresses the need for something that is useful in daily business life.

“Technical interaction of heterogeneous ICT systems” focuses the topic on the technical level, leaving out business, legal, social or any other aspects. It has to be possible to run processes on different systems regardless of their technical specification (frameworks, programming language, database, etc.). However, these different systems obviously cannot be completely different. They need at least some kind of network connection and some protocol to be able to interact with each other. In this context, and due to its relevance, it is assumed that an internet connection and industry-wide protocols (e.g. TCP/IP) and standards (e.g. XML) are available.

“Complete business process cycle” means that a business process, wherever it starts or ends, should be carried out completely as defined.

“Involving at least two different technical systems” means that the concept can have a bi-directional setup, but in any case has to be flexible enough to run in a network of different technical systems, i.e. more than two and up to an unspecified number.

The design and management of business processes is a subject of its own, but for the following discussion it is enough to say that business processes can be broken into a number of steps. Each step needs a trigger to initiate it, has some conditions to be started and delivers an output, including information required for the performance of the overall process (e.g. a trigger for another step).

Furthermore, we assume that a step can only run on one system. If it requires two systems, the step must be broken down into several steps. This assumption is reasonable because, regardless of whether it would be technically feasible, a single actor must have authority over a step. Otherwise two or more actors would be responsible for the same step, which is obviously not feasible in practice.
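The step model described above (a trigger, start conditions, an output that may trigger the next step, and exactly one system per step) can be written down as a small data structure. This is an illustrative Python sketch only; the `Step` fields, the system names and the booking steps are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Step:
    name: str
    system: str                               # the one system (and actor) that runs the step
    trigger: str                              # event that initiates the step
    condition: Callable[[Dict], bool]         # must hold before the step may start
    action: Callable[[Dict], Optional[str]]   # produces output; returns the next trigger


def run_process(steps: List[Step], start: str, state: Dict, max_steps: int = 100) -> List[str]:
    """Follow triggers from step to step and return the steps that were executed."""
    by_trigger = {step.trigger: step for step in steps}
    executed: List[str] = []
    trigger: Optional[str] = start
    while trigger in by_trigger and len(executed) < max_steps:
        step = by_trigger[trigger]
        if not step.condition(state):
            break  # precondition not met: the process stops here
        trigger = step.action(state)
        executed.append(f"{step.name}@{step.system}")
    return executed


# Hypothetical three-step booking process spread over three systems.
def search(state: Dict) -> Optional[str]:
    state["offer"] = "room-42"
    return "offer_selected"


def reserve(state: Dict) -> Optional[str]:
    state["reserved"] = True
    return "reserved"


def pay(state: Dict) -> Optional[str]:
    state["paid"] = True
    return None  # last step: no further trigger


BOOKING_PROCESS = [
    Step("search", "portal", "user_query", lambda s: True, search),
    Step("reserve", "hotel_crs", "offer_selected", lambda s: "offer" in s, reserve),
    Step("pay", "payment_provider", "reserved", lambda s: s.get("reserved", False), pay),
]
```

Starting from `"user_query"` with an empty state, `run_process` executes all three steps in order; starting from `"offer_selected"` with an empty state, it stops immediately because the reservation step's condition is not met.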

8.1.3 Requirements

This section restates the industry needs described above in a more structured and operational form.

1. Network capability: Ability to run one complete business process as an interplay of an unlimited number of different and clearly separated computer systems.

2. System independence: Ability to be deployed independently of the ICT system used, especially independently of:

o databases,
o data structuring,
o operating systems,
o frameworks.

3. Player independence: Ability of each player to participate in the same business process, initiated by any of an unlimited number of different players.

4. Process range: Flexibility to run every business process possible in the tourism industry.

5. Player’s autonomy: Leave autonomy and flexibility to the individual player to change its own system and consume external data in an autonomous way.

6. Cost effectiveness: Low cost of integration and operation.

7. Stability and reliability: Including fault-secure systems and error handling.

8. System performance: High performance to allow fast transactions and comprehensive handling.

9. Security: Ability to meet security and trust requirements like:

o data encryption,
o partner identification,
o fraud resistance.

10. System openness: Accessibility and availability, e.g. whether the system is publicly available or the specification is open.

8.2 State of the art

The following approaches currently represent the state of the art in the tourism industry:

8.2.1 Global standardization efforts

Standards define a formalized schema for how to handle processes; thus they usually leave only little flexibility for the participants who have to comply with them. Furthermore, they flatten diversity, since no deviations from the standards are allowed. Standards can provide rather rigid rules, as for example the standards set by the Open Travel Alliance or by OASIS (ebXML). They define data schemas and process rules at a concrete level. Developing such a standard is in general a lengthy process, initiated either by market power or by industrial or governmental interest groups. And implementing a standard can be a complex and expensive procedure, which makes it especially difficult for smaller players.

Additionally, there are more flexible initiatives that provide a framework within which players can adapt according to their needs. They are based on a non-coercive language that allows common basic elements to be expressed in a similar way by all players, while at the same time allowing those elements to be combined in different ways so as to preserve diversity. The cost of implementation can be lower than for a full standard, since players may publish different levels of service. The use of templates also allows a certain flexibility in the format of responses depending on the requester. A drawback stems from the fact that integrating different players may require certain adaptations due to commercial or system-driven specificities. On the other hand, this allows competition and diversity. An example of such a language is the XFT (Exchange For Travel) language.

8.2.2 Application Integration and APIs

Application Integration and Application Programming Interfaces (API) allow a 1:1 interplay of different players. The type of interplay is defined jointly by the partners involved or by one central partner with the market power to do so (as in the case of Amadeus). Application Integration is a much deeper form of system interplay, requiring development work to fulfil the purpose by “integrating” the systems, while an API is a gateway for external systems, where the partners using it do not care about what happens behind the partner’s gate.

This is feasible when there is a central player, but not in open and dynamic networks, since an interface has to be developed for each player. This drastically increases the complexity and cost of implementation. However, Application Integration and APIs are better suited to handle different processes and are more responsive to the specific requirements of the systems involved.

8.3 Gaps and future needs

Service providers in the tourism industry are faced with a fast-changing and highly dynamic environment. They have to meet changing market requirements in a shorter time and more cost-efficiently. They need to offer enhanced functionality to their customers and at the same time need to run processes across different interacting systems, including the integration of external information systems.

The following table highlights how well the current state of the art described above meets the needs and requirements identified:

Criteria | Standards | Application integration
Network capability | rather no | no
System independence | rather yes | rather yes
Player independence | yes | rather no
Process range | rather no | rather yes
Player’s autonomy | no | no
Cost effectiveness | rather no | no
Stability and reliability | yes | rather yes
System performance | rather yes | yes
Security | indifferent | rather yes
System openness | rather yes | no

The individual entries might well be questionable and can raise discussion, but in general they reflect the current situation well: Standards and Application Integration are not fully suitable for a highly networked and dynamic environment like today’s tourism industry. They result in a loss of autonomy, need some central entity or concentration of power, and are expensive.

For each process run across different systems, the interface needs to be specified, developed and maintained separately, since the systems do not all use the same standards or interfaces. If a new version of a standard or interface is published, it cannot be used automatically; it needs to be deployed and maintained manually. A more flexible solution based on a mediating technology therefore obviously meets the requirements better than rather rigid technologies.

Current research projects address process interoperability by focusing on Semantic Web Services and Multi-Agent Technologies, but also on Grid technologies, with the aim of developing intelligent and adaptive systems for the interplay of heterogeneous systems. Examples of these projects are:

Agent Link: http://www.agentlink.org/

ArguGRID: http://www.argugrid.eu/

ARTEMIS: http://www.srdc.metu.edu.tr/webpage/projects/artemis/

ASG: http://asg-platform.org/cgi-bin/twiki/view/Public

BREIN: http://www.eu-brein.com/

DIP: http://dip.semanticweb.org/index.html

SATINE: http://www.srdc.metu.edu.tr/webpage/projects/satine/

SUPER: http://www.ip-super.org/

These projects address the need for a more flexible and cost-efficient way to align business processes between different systems. A promising approach is the concept of Semantic Business Process Management, resulting from the application of Semantic Web Services to Business Process Management, as discussed for example by Hepp et al. [2005]. Based on this concept, Cimpian and Mocan [2005] proposed a process mediator that adjusts the bi-directional flow of messages based on the Web Service Modeling Ontology (WSMO). This approach is similar to that chosen by the Harmonise project (http://www.harmonet.org/) for data mediation, in which a technology for mediating between heterogeneous data sources was developed. The Harmonise technology allows the parties involved to exchange information without changing their local data structures, only by referring to a common understanding of a domain-specific ontological concept, the Harmonise Ontology [Fodor, Werthner, 2005].
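The core idea behind such mediators, that each party keeps its local terms and only declares how they relate to a shared vocabulary, can be illustrated with a greatly simplified Python sketch. It maps message names only; it does not implement WSMO process mediation, the Harmonise technology or the Harmonise Ontology, and the vocabulary and partner names are hypothetical.

```python
from typing import Dict

# Hypothetical shared process vocabulary (stand-in for a common ontology).
SHARED_ACTIONS = {"HoldRoom", "ConfirmBooking", "CancelBooking"}


class ProcessMediator:
    """Translates partner-specific action names via a shared vocabulary."""

    def __init__(self) -> None:
        self._to_shared: Dict[str, Dict[str, str]] = {}
        self._from_shared: Dict[str, Dict[str, str]] = {}

    def register(self, partner: str, mapping: Dict[str, str]) -> None:
        """Declare once how a partner's local action names map to shared ones."""
        unknown = set(mapping.values()) - SHARED_ACTIONS
        if unknown:
            raise ValueError(f"not in shared vocabulary: {sorted(unknown)}")
        self._to_shared[partner] = dict(mapping)
        # Reverse direction; assumes each shared action has one local name per partner.
        self._from_shared[partner] = {shared: local for local, shared in mapping.items()}

    def translate(self, sender: str, receiver: str, action: str) -> str:
        """Rename a sender's action into the receiver's local terminology."""
        shared = self._to_shared[sender][action]
        return self._from_shared[receiver][shared]


mediator = ProcessMediator()
mediator.register("portal_a", {"book": "HoldRoom", "confirm": "ConfirmBooking"})
mediator.register("hotel_x", {"lockRoom": "HoldRoom", "commit": "ConfirmBooking"})
mediator.register("hotel_y", {"reservation": "HoldRoom", "finalize": "ConfirmBooking"})
```

Here `mediator.translate("portal_a", "hotel_x", "book")` returns `"lockRoom"`. Each new partner adds one mapping instead of one interface per counterpart, which is where the cost advantage over bilateral integration would come from.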

8.4 Recommendations

8.4.1 Short-term recommendations (1–3 years)

Simplify and rationalize existing processes – use stateless process handling or request-response pairs only.

Build an ontology of common processes in the tourism industry.

8.4.2 Long-term recommendations (3–10 years)

Develop process mediators.

Put research efforts into intelligent agent technologies for automatic process handling.

9 Metasearch

9.1 Methodology

9.1.1 Needs and requirements

9.1.1.1 Introduction

Metasearch is the ability to run one search process over the search engines of different heterogeneous instances (platforms, websites, databases) and aggregate the results in a unified list. In the tourism industry, metasearch engines are typically used to compile and compare specific offers. Examples are:

Checkfelix: http://www.checkfelix.at/,

Kayak: http://www.kayak.com/,

Farechase: http://farechase.yahoo.com/,

Trabber: http://www.trabber.com/,

Kelkoo Travel: http://travel.kelkoo.co.uk/, and

Minube: http://www.minube.com/.

Typically, search results are not stored in a database, but delivered as real-time results. However, some systems make use of data replication for static data (e.g. hotel descriptions).
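The basic metasearch flow (send one query to several engines, then merge the answers into one list) might look as follows in Python. The engines are stand-in functions; matching the same hotel across engines by its lower-cased name is a naive assumption made for illustration, since real systems name and identify hotels differently.

```python
from typing import Callable, Dict, List

# A search engine is modelled as a function from a query string to a list of offers.
SearchEngine = Callable[[str], List[Dict]]


def metasearch(query: str, engines: Dict[str, SearchEngine]) -> List[Dict]:
    """Run one query over several engines and aggregate a unified, price-sorted list.

    Offers for the same hotel (naively matched by normalised name) are merged,
    keeping the cheapest one and recording which engine supplied it."""
    best: Dict[str, Dict] = {}
    for engine_name, search in engines.items():
        for offer in search(query):
            key = offer["name"].strip().lower()
            if key not in best or offer["price"] < best[key]["price"]:
                best[key] = {**offer, "source": engine_name}
    return sorted(best.values(), key=lambda offer: offer["price"])
```

Because results are fetched in real time, every call hits every engine; the replicated static data mentioned above would typically be joined in after this step.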

9.1.1.2 Quality of results

Metasearch engines in tourism rely on the quality of data, especially regarding the accuracy of information. The quality of results obviously depends on a common understanding of the same information provided by different systems, for example:

data organization: the structuring and naming of data;

data understanding: the encoding (like different categories) and precision of information (like miles and kilometres);

data depth: the availability of different information items in different systems (like disclosure of price composition);

data accuracy: like prices and availability.
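Differences in data understanding are typically handled by normalising every source record onto one common schema before results are merged. A minimal Python sketch, assuming hypothetical field names (`name`, `category`, `distance`, `distance_unit`) and hypothetical per-source category tables; the mapping of labels such as `superior` to star values is invented for the example.

```python
from typing import Dict

KM_PER_MILE = 1.609344

# Hypothetical per-source category encodings, mapped onto a common 1-5 star scale.
CATEGORY_MAPS: Dict[str, Dict[str, int]] = {
    "source_a": {"*": 1, "**": 2, "***": 3, "****": 4, "*****": 5},
    "source_b": {"economy": 2, "standard": 3, "superior": 4, "luxury": 5},
}


def normalize(record: Dict[str, str], source: str) -> Dict[str, object]:
    """Map one source-specific hotel record onto a common schema."""
    distance = float(record["distance"])
    if record.get("distance_unit", "km") == "mi":   # convert miles to kilometres
        distance *= KM_PER_MILE
    return {
        "name": record["name"],
        "stars": CATEGORY_MAPS[source][record["category"]],
        "distance_km": round(distance, 2),
    }
```

For example, `normalize({"name": "A", "category": "superior", "distance": "2", "distance_unit": "mi"}, "source_b")` yields `{"name": "A", "stars": 4, "distance_km": 3.22}`. Data depth and accuracy cannot be repaired this way: a missing field or a wrong price stays missing or wrong.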

9.1.1.3 Response time

An acceptable response time for search engines is very important to meet users’ expectations. Metasearch engines depend on the response time of the other search engines and need clever algorithms to avoid deadlocks. However, this is less a matter of information and process interoperability, except that runtime performance in aggregating data might be improved.
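One common way to keep a metasearch engine responsive is to query all engines in parallel and give them a fixed time budget, dropping late or failing answers instead of waiting for them. A sketch using Python's standard `concurrent.futures` module; the engine functions and the 0.2-second budget in the usage note are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable, Dict, List

SearchEngine = Callable[[str], List[Dict]]


def query_all(query: str, engines: Dict[str, SearchEngine], timeout_s: float) -> Dict[str, List[Dict]]:
    """Query all engines in parallel and keep only answers that arrive in time."""
    pool = ThreadPoolExecutor(max_workers=max(1, len(engines)))
    futures = {pool.submit(search, query): name for name, search in engines.items()}
    done, _late = wait(futures, timeout=timeout_s)
    pool.shutdown(wait=False)  # do not block on slow engines; their answers are dropped
    results: Dict[str, List[Dict]] = {}
    for future in done:
        if future.exception() is None:  # skip engines that failed with an error
            results[futures[future]] = future.result()
    return results


# Hypothetical engines: one answers at once, one is too slow for the time budget.
def fast_engine(query: str) -> List[Dict]:
    return [{"name": "Hotel A", "price": 90.0}]


def slow_engine(query: str) -> List[Dict]:
    time.sleep(0.5)
    return [{"name": "Hotel B", "price": 70.0}]
```

`query_all("Rome", {"fast": fast_engine, "slow": slow_engine}, timeout_s=0.2)` returns only the `"fast"` answer; the slow engine's result is discarded when it eventually arrives. The trade-off is completeness: a slow but relevant source is simply missing from the list.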

9.1.1.4 Access to data

Data can be accessed either by retrieving it automatically from the web interface (user interface) or via a data interface (e.g. web services). Semantic annotation of content and semantic mapping, which allow the metasearch engine to find the required information, provide part of the answer and are detailed in the corresponding chapters of this study. However, some issues remain, for instance regarding data encapsulated in purely graphical applications such as Flash applications, because the data is not accessible at all. Possible solutions come from new technology trends such as Flex applications, where the whole application is XML-based. Other issues stem from client-side calculations (e.g. options depend on different settings, prices are calculated on the fly on the client side). In that case, the data is hard-coded directly in the application, and accessing the data and the corresponding rules would require interpreting the code.

Another aspect of access to data is covered in the section on querying (9.2).

9.1.1.5 Efforts for maintenance

The search on another system is often tailored to the particularities of the foreign system. If these interfaces do not follow a given pattern or standard, they have to be updated each time the other system changes in order to maintain the service level.

9.1.2 State of the art

9.1.2.1 Web crawler

Web crawlers (synonyms: robots, bots, spiders) are software scripts and programs that browse the World Wide Web in an automated manner to create copies of websites (which are processed later by other software agents) or to gather specific information. They are used by search engines but are typically not used for metasearch processes, since they normally only gather information and do not run processes on other websites.

9.1.2.2 HTTP requests

HTTP requests can be used to run automated search queries on existing search engines by rebuilding the HTTP request that is used on each of the external sites. HTTP requests are very maintenance-intensive, since each small change in the HTTP requests requires an update of the process. Depending on the external site, data is sent back in an unstructured or a structured manner and needs to be processed to bring it into the scheme (semantics) used by the metasearch engine for displaying results. Depending on how results are provided by the external systems, HTTP requests can be a lightweight, yet still maintenance-intensive, way of running a metasearch process.
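Rebuilding an external site's HTTP request essentially means keeping a per-site template of the URL and parameter names its search form sends. The Python sketch below uses only the standard `urllib.parse.urlencode`; the sites, URLs and parameter names are hypothetical, and sending the request and parsing the answer are not shown. The template is exactly the part that must be updated by hand whenever a site changes its request.

```python
from typing import Dict, Tuple
from urllib.parse import urlencode

# Hypothetical request templates, reconstructed from each site's search form:
# base URL plus the site's own name for each field of our common query.
SITE_TEMPLATES: Dict[str, Tuple[str, Dict[str, str]]] = {
    "site_a": ("https://search.example.com/hotels",
               {"city": "dest", "checkin": "from", "checkout": "to"}),
    "site_b": ("https://booking.example.org/find",
               {"city": "location", "checkin": "arrival", "checkout": "departure"}),
}


def build_request_url(site: str, query: Dict[str, str]) -> str:
    """Rebuild the GET request that the site's own search form would send."""
    base_url, param_names = SITE_TEMPLATES[site]
    params = {param_names[key]: value for key, value in query.items()}
    return f"{base_url}?{urlencode(params)}"
```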

9.1.2.3 Website wrapper

Website wrappers make it possible to grasp the unstructured information provided on websites and transform it into a structured form. Advanced wrappers can reach deeper into information architectures than web crawlers, but have to be adapted for each website that is wrapped. The provider of the wrapped website does not need to make any changes or adaptations; thus the semantics of the wrapper can be applied to the search process.

Since advanced tools can run different operations, website wrappers are well suited for metasearching. Still, they need considerable maintenance effort, since a change in the wrapped website requires an update of the wrapper.
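A website wrapper can be as simple as a parser that knows where a particular site puts each piece of information. This Python sketch uses the standard `html.parser` module and assumes a hypothetical page layout (`<li class="offer">` items containing `name` and `price` spans); any change to that layout breaks the wrapper, which is the maintenance problem described above.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class HotelListWrapper(HTMLParser):
    """Wrapper for one hypothetical site layout: each offer is an
    <li class="offer"> with <span class="name"> and <span class="price">."""

    def __init__(self) -> None:
        super().__init__()
        self.offers: List[Dict[str, object]] = []
        self._field: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "li" and css_class == "offer":
            self.offers.append({})            # start a new structured record
        elif tag == "span" and css_class in ("name", "price") and self.offers:
            self._field = css_class           # the next text belongs to this field

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None

    def handle_data(self, data):
        text = data.strip()
        if self._field is None or not text:
            return
        if self._field == "price":            # site-specific format, e.g. "EUR 95"
            self.offers[-1]["price"] = float(text.replace("EUR", "").strip())
        else:
            self.offers[-1][self._field] = text
```

Feeding it `<li class="offer"><span class="name">Hotel Roma</span><span class="price">EUR 95</span></li>` leaves `[{"name": "Hotel Roma", "price": 95.0}]` in `offers`.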

9.1.2.4 Application Programming Interfaces (API)

APIs are the classic way for different computer systems to interact and allow a broad range of possibilities. They can be independent of the programming language and are therefore open to any kind of integration and information exchange. It has to be mentioned that implementing an API typically requires considerable effort, and there is no general standard for APIs, since APIs vary significantly depending on the purpose and the domain concerned.

9.1.2.5 Web services

A web service is a software application that enables the exchange of data in XML format to allow machine-to-machine (M2M) interaction across different platforms. Unlike HTTP requests and web crawlers, a web service has to be implemented by the service provider, who is identified by listings in registries (UDDI).

A similar approach is REST (or RESTful web services), which allows the exchange of domain-specific data over HTTP without an additional messaging layer as used in web services. It is often described as a simpler form of web services.

Web services and REST provide data in a structured (understandable) way and are therefore well suited for retrieving information for metasearch queries. However, they still leave open the problem of different ways of describing the data.

9.1.2.6 Semantic annotation

Semantic annotations provide methods to add metadata to documents, allowing some formalised understanding of the information provided in these documents. In this way the meaning (semantics) of information can be understood automatically by information systems (see also the chapter on semantic annotations, 7.2).

Three different approaches exist to add semantic annotations to a document:

1. Embedded annotations are added into the document.
2. The document refers to another document providing the annotations.
3. Annotations refer to the document concerned.

Semantic annotation is an enabler for metasearch, making it easier to find and understand resources with information relevant to the search criteria. It is not a search method itself and still depends on a common schema to describe the annotations, but it can be a powerful method in combination with web crawlers.
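In contrast to a site-specific wrapper, a reader for embedded annotations (the first approach above) relies on attribute names from a shared vocabulary rather than on the page layout. The Python sketch below collects RDFa-style `property` / `content` attributes with the standard `html.parser` module; it is heavily simplified (no prefixes, subjects or nesting, so it is not an RDFa processor), and the property names are illustrative.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class EmbeddedAnnotationReader(HTMLParser):
    """Collects (property, value) pairs from elements with a `property` attribute.

    Simplified RDFa-style reading: a `content` attribute wins over element text;
    prefixes, subjects and nesting are ignored."""

    def __init__(self) -> None:
        super().__init__()
        self.pairs: List[Tuple[str, str]] = []
        self._pending: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        prop = attributes.get("property")
        if prop is None:
            return
        if attributes.get("content") is not None:
            self.pairs.append((prop, attributes["content"]))
        else:
            self._pending = prop  # value is the text inside the element

    def handle_endtag(self, tag):
        self._pending = None

    def handle_data(self, data):
        if self._pending and data.strip():
            self.pairs.append((self._pending, data.strip()))
            self._pending = None
```

The same reader works on any page that uses the shared property names, which is why annotations combine well with generic crawlers; it still depends on all sites agreeing on those names.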

9.1.2.7 Caching mechanism

Caching mechanisms aim to provide extensive content, disclosed directly by vendors on a regular basis (every day, hour, etc.) in a common language, so that it can easily be integrated by metasearch engines. Many metasearch engines in fact build internal caches from data provided by the vendors on a regular basis, so as to improve response time and to improve the vendors’ chances of being present on the metasearch engine (because sources with poor response times are unlikely to be displayed in metasearch engines).
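The caching idea can be sketched as a per-vendor store that is refreshed only when its copy is older than a fixed interval, so that user queries are answered from local data. This is a pull-based Python sketch under assumed names; in the push variant described above, the vendor would deliver the dump and the engine would simply replace its copy. The injectable `clock` exists only to make the behaviour easy to test.

```python
import time
from typing import Callable, Dict, List, Tuple


class VendorCache:
    """Per-vendor copy of the vendor's data, refreshed only when older than ttl_s."""

    def __init__(self, ttl_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_s = ttl_s
        self.clock = clock
        self._store: Dict[str, Tuple[float, List[dict]]] = {}

    def get(self, vendor: str, fetch: Callable[[], List[dict]]) -> List[dict]:
        """Answer from the local copy; call the (slow) vendor only if it is stale."""
        now = self.clock()
        entry = self._store.get(vendor)
        if entry is None or now - entry[0] >= self.ttl_s:
            entry = (now, fetch())
            self._store[vendor] = entry
        return entry[1]
```

With `ttl_s=3600`, repeated queries within the same hour reuse one vendor call; prices and availability in the cache may of course be up to an hour old.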

9.1.2.8 Summary

The methods described above can be divided into two groups.

The first group comprises methods where the agent providing a metasearch engine can integrate other search engines without any assistance from the search engines used (web crawler, HTTP requests, website wrapper). Thus the metasearch agent is more independent of the other systems. These methods are therefore more flexible, but cause considerable effort for the implementation and maintenance of a metasearch service. However, they would obviously require less effort if standards were supported or interoperability problems were solved.

The second group comprises methods where the assistance of the external search engine is required, i.e. where some kind of interface is provided or other changes are necessary. Clearly, these methods make it easier to implement and maintain a metasearch service, but require the application of standards or the solving of interoperability issues to run smoothly.

9.1.3 Gaps and future needs

The tourism industry is characterised by a vast number of search engines for flights, accommodation, events, attractions and other tourism services. Metasearch engines are important tools to provide one-stop access to this information at a regional, national or transnational level.

The methods for metasearch described above provide useful tools to integrate different search engines, but quality of results, response time, access and maintenance effort depend very much on the use of standards or on the ability to understand the other system in some other way. In particular, the combination of website wrappers and semantic annotations of websites seems a promising way to enable improved metasearch functionality. The deployment of metasearch engines could be strongly supported if either broadly accepted standards or means for the interoperability of tourism-related information could be provided.

One important direction metasearch engines can take is that of semantics. Semantic search engines are becoming increasingly popular. Semantic search engines are systems that need to understand (the meaning of) both what the user is asking for and the information that is stored on the web. A semantics-based query recognizes the keywords used to carry out a search and uses that same information to display more precise results. The final and main objective of this search technique is to find all documents on the web that contain the most relevant information related to the query (i.e. those that syntactically match the search keywords), minimizing the number of false results.

Additionally, semantic information enables the inference of new knowledge from semantic documents, based on various logic rules together with classes, attributes and values. This knowledge can be stored and processed, which is not done by traditional search engines. The graph resulting from a semantic data collection is different from the one created by hyperlinked HTML documents. Thus, a semantic search engine differs from traditional ones, because it searches on semantic annotations of content, i.e. annotations realised using domain ontologies.
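The inference mentioned above can be illustrated with a single hand-written rule, transitivity of a hypothetical `ex:locatedIn` property, applied by forward chaining until no new triples appear. A keyword engine looking for hotels in Rome would not find `ex:HotelSole`, which is only stated to be in Trastevere; after inference it is. This is a Python sketch over plain tuples, not an RDF-S/OWL reasoner.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def infer_transitive(triples: Set[Triple], prop: str) -> Set[Triple]:
    """Apply (x prop y) and (y prop z) => (x prop z) until nothing new is derived."""
    known = set(triples)
    while True:
        links = [(s, o) for s, p, o in known if p == prop]
        derived = {(x, prop, z) for x, y in links for y2, z in links if y == y2}
        new = derived - known
        if not new:
            return known
        known |= new


# Hypothetical facts: the hotel is only stated to be in a district of Rome.
FACTS = {
    ("ex:HotelSole", "ex:locatedIn", "ex:Trastevere"),
    ("ex:Trastevere", "ex:locatedIn", "ex:Rome"),
    ("ex:Rome", "ex:locatedIn", "ex:Italy"),
}
```

`infer_transitive(FACTS, "ex:locatedIn")` returns the three original triples plus three inferred ones, including `("ex:HotelSole", "ex:locatedIn", "ex:Rome")`.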

Another big challenge for metasearch engines, after understanding the content, is system performance. All methods have to fetch data from different systems, transform it and display it in an appropriate manner (ranking, paging, etc.). The more sources the system queries, the more benefit it offers the user, but the slower it becomes.
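Ranking and paging over many sources need not sort every result first. If each source already returns its results sorted by price (an assumption), a lazy k-way merge produces one page while reading only the results up to the end of that page. A Python sketch using the standard `heapq.merge` and `itertools.islice`:

```python
import heapq
from itertools import islice
from typing import Dict, Iterable, List


def merged_page(sources: List[Iterable[Dict]], page: int, page_size: int) -> List[Dict]:
    """Return page `page` (0-based) of the price-ordered union of all sources.

    Assumes every source is already sorted by ascending price; the merge is lazy,
    so only results up to the end of the requested page are consumed."""
    merged = heapq.merge(*sources, key=lambda offer: offer["price"])
    start = page * page_size
    return list(islice(merged, start, start + page_size))
```

This helps with ranking and paging only; the cost of fetching from each source remains and still grows with the number of sources.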

9.1.4 Recommendations

9.1.4.1 Short-term recommendations (1–3 years)

Make use of semantic technologies to describe your data.

Provide content and meta-content as close to an existing standard as possible.

Provide a regularly updated, external data store with pre-processed and well-described content for fast querying (caching mechanism) if you have long query processing times or complex queries.

Develop aggregated data repositories providing pre-processed data from different sources.

9.1.4.2 Long-term recommendations (3–10 years)

Focus on the development of fast and easy-to-use alternative metasearch technologies, enabling or supporting the use of semantic technologies for data transformation.

9.2 Querying

9.2.1 Needs and requirements

9.2.1.1 Introduction

More often than not, the information involved in eTourism transactions is distributed across a number of different data stores, usually operated by different companies: various GDSs (potentially in their respective national incarnations), CRSs and other sources, not to forget the plethora of unstructured data such as the web. As discussed throughout this section, we often need to find information in and across many of these data sources and, indeed, often need to search for the data sources themselves.

However, in many scenarios applications need to go beyond mere search and query across data sources for data sets that meet very specific sets of constraints. Typical queries might be:

List all hotels in Rome with at least three stars that have availability between October 20th and 22nd.

List all prices for flights to Rome that arrive on the morning of the 20th and return on the evening of the 22nd.

In many cases queries could be much more complex still and be combined with constraints based on geographical data (hotels not more than 500 metres from the Spanish Steps), price ranges (not more than EUR 100 per night), etc. In many cases subsequent queries will build on the result sets of simpler queries and refine them further in a piecemeal manner.
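The example query and its geographical and price refinements can be expressed as simple filters over a common data model, each refinement working on the previous result set. A Python sketch under stated assumptions: the `Hotel` fields are hypothetical, "availability between October 20th and 22nd" is read as free rooms for the nights of the 20th and 21st, distances use the haversine formula on a spherical Earth, and the coordinates of the Spanish Steps (about 41.906 N, 12.483 E) are approximate.

```python
import math
from dataclasses import dataclass, field
from datetime import date
from typing import List, Set, Tuple

SPANISH_STEPS = (41.9060, 12.4828)  # approximate latitude/longitude


@dataclass
class Hotel:
    name: str
    city: str
    stars: int
    price_eur: float                                  # price per night
    lat: float
    lon: float
    free_nights: Set[date] = field(default_factory=set)


def distance_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres (haversine formula, spherical Earth)."""
    earth_radius_m = 6_371_000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi, dlmb = phi2 - phi1, math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


def hotels_in(hotels: List[Hotel], city: str, min_stars: int, nights: Set[date]) -> List[Hotel]:
    """Base query: city, minimum category and availability for every requested night."""
    return [h for h in hotels
            if h.city == city and h.stars >= min_stars and nights <= h.free_nights]


def within(results: List[Hotel], point: Tuple[float, float], max_m: float) -> List[Hotel]:
    """Refinement: keep results no further than max_m metres from a point."""
    return [h for h in results if distance_m(h.lat, h.lon, point[0], point[1]) <= max_m]


def at_most(results: List[Hotel], max_price_eur: float) -> List[Hotel]:
    """Refinement: keep results whose nightly price does not exceed the limit."""
    return [h for h in results if h.price_eur <= max_price_eur]
```

A query session would call `hotels_in(...)`, then `within(..., SPANISH_STEPS, 500)`, then `at_most(..., 100)`, each time narrowing the previous result set rather than re-querying all sources.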

Looking in more detail at human search behaviour, we can observe that users do not search for hotels “not more than 500 metres from the Spanish Steps” but rather for hotels “close to” or “near” the Spanish Steps. However, “near” Rome might describe a different distance than “near” the Spanish Steps. The translation of human search needs or peculiarities into a corresponding machine-readable search query covers aspects of interoperability that we do not address in this chapter, which focuses on machine-to-machine interoperability. Nevertheless, natural language processing and the transformation into search queries remain important aspects and challenges in querying.

9.2.1.2 Needs and requirements

Fast transactions

Queries along these lines are typical parts of the selection phase in eTourism transactions. A given transaction will often involve a considerable number of queries as the customer or her agents narrow down the result set to a small number of hits that fit the demands. Queries therefore must be fast and return results within a few seconds at most. Nevertheless, some metasearch engines on the market today take minutes rather than seconds to run real-time queries on external systems.

Reflect complexity of search requirements

Queries must be sufficiently expressive to model the customer’s requirements, either directly through a single complex query that enumerates all constraints, or through a sequence of simpler queries that narrow down result sets.

100 % correctness of query results at this stage is highly desirable, but not absolutely necessary. The ultimate corroboration or falsification of query results can follow in the booking phase, when a non-binding service offer is turned into a binding contract between supplier and customer.

Expansion of querying sources

The tourism industry is a highly dynamic environment, and data stores and search engines appear, and also disappear, almost continuously. Content aggregation and syndication become indispensable tasks for the provision of one-stop platforms. Ideally, integrating new data stores into a user-facing or travel-agency-facing metasearch engine should be largely transparent, easy and thus cost-efficient.

9.2.2 State of the art

9.2.2.1 Methods for query distribution

Technically, the search query entered into current metasearch engines has to be translated for other data stores for further processing. This can be done by

1. manual translation on a case-by-case basis,
2. use of query by example,
3. use of standardized query languages, or
4. use of standardized query interfaces.

Furthermore, queries can be truly federated or based on pre-harvested and regularly updated data that the participating data stores provide to the metasearch engine. Some engines also combine these two approaches.

At present, option 1 dominates. For federated queries, individual data stores today offer their own query strategies. These strategies often reflect their historic evolution and their specific internal processes. This makes querying one of the biggest challenges for metasearch, since queries cannot be translated easily from one system to another. Integrating each new data store means considerable custom programming, making it a costly and time-consuming enterprise.

For want of a standardized format, options 2 (“query by example”), 3 (“standardized query languages”) and 4 (“standardized query interfaces”) are at present not widely used, though they offer considerable potential for easier integration of data stores. All of them are related in that they presuppose a commonly agreed query language.

9.2.2.2 Query by example

“Query by example” was developed by IBM in the 1970s in parallel with what was to become SQL (cf. Ramakrishnan, Gehrke, 2002, chapter 6). The user supplies example result sets that can formulate constraints or other selection criteria in addition to typical string values. Examples can often be built through graphical user interfaces.

When looking for a hotel, for example, the client would specify basic hotel characteristics (e.g. name, category, etc.) and a room category, and the system would on that basis return a suitable set of hotels [Höpken, 2004]. In this way, query by example partially relieves users of having to learn formalized query languages and instead allows them to find entries related to known samples. However, it needs clear templates for the type of examples that can be constructed and used as the basis for cross-data-store queries.
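Query by example can be sketched as matching records against a partially filled template, where each filled field holds either a literal value or a constraint. The Python sketch below illustrates the idea only; it is not IBM's QBE, and the record fields (`city`, `category`, `room`) are hypothetical.

```python
from typing import Any, Dict, List


def query_by_example(records: List[Dict[str, Any]], example: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the records that match every field filled in the example.

    A field value in the example is either a literal (matched by equality) or a
    callable predicate acting as a constraint, e.g. lambda c: c >= 3."""

    def matches(record: Dict[str, Any]) -> bool:
        for field_name, wanted in example.items():
            if field_name not in record:
                return False
            value = record[field_name]
            ok = wanted(value) if callable(wanted) else value == wanted
            if not ok:
                return False
        return True

    return [record for record in records if matches(record)]
```

The example template, here a dictionary with agreed field names, is exactly what would need to be standardized before such queries could be sent across data stores.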

9.2.2.3 Standardized query languages

Standardized query languages usually expect users (who in this case will normally be system integrators rather than end users) to learn a specialized language for querying a system. The best known of these is certainly the Structured Query Language (SQL) for relational databases, which is used by all current relational database management systems. At least the core features of ISO/IEC 9075, the international standard specifying SQL, are implemented by virtually all suppliers of relational database management systems.
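For comparison, the first example query from 9.2.1.1 (hotels in Rome with at least three stars and availability between October 20th and 22nd) can be written in SQL and run against an in-memory SQLite database through Python's standard `sqlite3` module. The two-table schema, the sample rows and the year are assumptions; availability is modelled as one row per free night, so "free for the whole stay" becomes a count of distinct nights.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hotel (id INTEGER PRIMARY KEY, name TEXT, city TEXT, stars INTEGER);
    CREATE TABLE availability (hotel_id INTEGER REFERENCES hotel(id), night TEXT);
    INSERT INTO hotel VALUES (1, 'Hotel A', 'Rome', 3), (2, 'Hotel B', 'Rome', 2),
                             (3, 'Hotel C', 'Rome', 4);
    INSERT INTO availability VALUES (1, '2009-10-20'), (1, '2009-10-21'),
                                    (2, '2009-10-20'), (2, '2009-10-21'),
                                    (3, '2009-10-20');
""")

# Hotels in Rome with at least three stars that are free on both nights (20th and 21st).
QUERY = """
    SELECT h.name
    FROM hotel AS h
    JOIN availability AS a ON a.hotel_id = h.id
    WHERE h.city = ? AND h.stars >= ? AND a.night BETWEEN ? AND ?
    GROUP BY h.id, h.name
    HAVING COUNT(DISTINCT a.night) = ?
    ORDER BY h.name
"""
rows = conn.execute(QUERY, ("Rome", 3, "2009-10-20", "2009-10-21", 2)).fetchall()
```

Here `rows` is `[('Hotel A',)]`: Hotel B has too few stars and Hotel C is free on only one of the two nights. The query works against this one database; it says nothing about how to send it to other stores with different schemas.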

SQL does not lend itself particularly well to federated queries and is normally used to consult a given database instance. SQL-like syntax is used, however, for drill-down searches in federated registries such as the ebXML Registry Specification. Likewise, SQL syntax has heavily influenced the syntax of a number of other non-relational query languages such as the Object Query Language (OQL), the SPARQL Protocol and RDF Query Language (SPARQL), and aspects of the Topic Map Query Language (TMQL). In the following we shall look at one new query language specifically for (potentially federated) semantic queries.

SPARQL: The SPARQL Query Language for RDF is, as the name suggests, a language for querying RDF triples. This relatively new W3C Recommendation was only published in January 2008, but can already point to a considerable implementation base. It can be used to query a considerable number of commercial and non-commercial native triple stores (for an incomplete list cf. http://esw.w3.org/topic/SparqlImplementations), but also adaptors such as D2R Server that sit on top of relational databases. This flexibility has encouraged the growth of a number of publicly available SPARQL endpoints, some of which are listed on http://esw.w3.org/topic/SparqlEndpoints.

SPARQL queries can honour the transitive properties defined in RDF-S and OWL ontologies, provided the underlying store performs the corresponding inference.

Typical queries might look like:

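# dc: is assumed to be the Dublin Core Elements 1.1 namespace
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?resource
WHERE {
  ?resource dc:creator <http://www4.wiwiss.fu-berlin.de/gutendata/resource/people/Abbott_Eleanor_Hallowell_1872-1958> .
}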

which would list all resources created by Eleanor Hallowell Abbott available in a given triple store, or

PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?title
WHERE {
  ?book dc:language "en" .
  ?book dc:title ?title .
}
ORDER BY ?title

which lists all available titles of publications in English.

Unlike SQL, SPARQL can be used for distributed queries and aggregation of data across data stores [Schenk, Staab, 2008], [Haase, Wang, 2007], [Quilitz, Leser, 2008]. Vocabularies can be cross-referenced and queried across data stores. However, if divergent ontologies are exposed by the participating data stores, suitable mappings, e.g. to a reference ontology, must exist for a distributed query to succeed. Such a search strategy could in principle scale to manually annotated data sources such as web pages annotated with RDFa.
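The role of the mappings can be illustrated without a SPARQL engine. In the Python sketch below, two hypothetical stores describe hotels with different vocabularies; each is rewritten into a hypothetical reference vocabulary (`ref:`) and then a single pattern, roughly equivalent to `?h ref:city "Rome" . ?h ref:stars ?s` with a filter on `?s`, is evaluated over the union. Without the mapping for store B, its hotel would simply be missed. For brevity the sketch merges all data locally, which a real distributed query engine would avoid by sending sub-queries to each endpoint.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# Two hypothetical data stores using different vocabularies for the same facts.
STORE_A: Set[Triple] = {
    ("a:h1", "a:locatedIn", "Rome"), ("a:h1", "a:stars", "3"),
    ("a:h2", "a:locatedIn", "Milan"), ("a:h2", "a:stars", "4"),
}
STORE_B: Set[Triple] = {
    ("b:x9", "b:city", "Rome"), ("b:x9", "b:category", "4"),
}

# Hypothetical mappings from each local vocabulary to a reference vocabulary.
MAPPINGS: Dict[str, Dict[str, str]] = {
    "A": {"a:locatedIn": "ref:city", "a:stars": "ref:stars"},
    "B": {"b:city": "ref:city", "b:category": "ref:stars"},
}


def federated_select(stores: Dict[str, Set[Triple]], city: str, min_stars: int) -> List[str]:
    """Rewrite every store into the reference vocabulary, then match the pattern
    ?h ref:city <city> . ?h ref:stars ?s  with ?s >= min_stars."""
    merged: Set[Triple] = set()
    for store_name, triples in stores.items():
        mapping = MAPPINGS[store_name]
        # Triples whose predicate has no mapping cannot be queried and are skipped.
        merged |= {(s, mapping[p], o) for s, p, o in triples if p in mapping}
    in_city = {s for s, p, o in merged if p == "ref:city" and o == city}
    return sorted(s for s, p, o in merged
                  if p == "ref:stars" and s in in_city and int(o) >= min_stars)
```

`federated_select({"A": STORE_A, "B": STORE_B}, "Rome", 3)` returns `["a:h1", "b:x9"]`; querying store A alone would return only `["a:h1"]`.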

That said, at present few, if any, examples of distributed SPARQL queries across a number of nodes operated by different organizations are known to be used in a production environment, even fewer of queries including many individual web pages (though some commercial products such as AllegroGraph (http://agraph.franz.com/allegrograph/) support elaborate and largely transparent federated SPARQL queries and reasoning across distributed instances of the system). Little is known about whether the technology would in fact scale well enough for large heterogeneous networks and, if so, in which type of network topology.

Similarly, in spite of the generally positive take-up of SPARQL, no endpoints in production use are currently known in the eTourism domain. SPARQL may or may not prove to be a good choice for the domain.

9.2.2.4 Interface standardization

Like the query language itself, the interfaces to query services can be standardized, e.g. through shared interface specifications in WSDL. Without a shared query language the expressiveness of such services is necessarily limited, but in many cases even the result sets of simple queries can be refined in subsequent query steps.

The OpenTravel Alliance specifies query interfaces in its schemas (http://www.opentravel.org/Specifications/SchemaIndex.aspx?FolderName=2008A), e.g. for the availability of cruises, golf courses and many more. While the long-term benefits of interface standardization may be smaller than those of a shared query language, it is a chain of piecemeal, often informal standardization activities that can lower integration cost in the short to mid-term.

9.2.2.5 Metadata syndication

The alternative to distributed queries is local data stores based on metadata syndication. In rather simple forms, i.e. the regular supply of data dumps generated from the supplier’s CRSs, this is in many cases the practice in today’s GDSs. Often such dumps simply replace all of the supplier’s data.

Using simple syndication protocols based on Topic Map or RDF payloads, only records that have actually changed are exchanged between data stores. An Atom-based general-purpose syndication protocol is specified in part 1b of the nascent eGov-Share CWA (http://www.egovpt.org/fg/CWA_Part_1b). Nodes subscribe to change feeds and can thus import new, deleted or updated records on a case-by-case basis from their source registry, provided they have the necessary credentials to access the feeds and their metadata can be mapped onto a shared reference ontology.

Figure 9-1


Querying thus becomes a sub-problem of data integration, and queries can then be run locally against the aggregated data store. Updates can be pulled at short intervals (say, every 10 minutes), thus providing cached queries with nearly live results.
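The pull-based update cycle described above can be sketched with the standard library's XML parser. This is a hedged illustration only: the feed layout uses standard Atom elements (id, title, updated), but the change type is carried in a hypothetical <x:action> extension element (namespace urn:example:sync). That element is an assumption for this sketch and is NOT the element set defined in eGov-Share CWA Part 1b.

```python
# Sketch of a subscriber applying an Atom change feed to a local store.
# Assumption: each entry carries a hypothetical <x:action> element
# (namespace urn:example:sync) with the value "create", "update" or "delete".
import xml.etree.ElementTree as ET
from typing import Dict

NS = {"atom": "http://www.w3.org/2005/Atom", "x": "urn:example:sync"}

FEED = """<feed xmlns="http://www.w3.org/2005/Atom" xmlns:x="urn:example:sync">
  <id>urn:example:feed:hotels</id>
  <title>Hotel changes</title>
  <updated>2009-06-03T10:07:00Z</updated>
  <entry><id>urn:example:hotel:1</id><title>Hotel A (renovated)</title>
    <updated>2009-06-03T10:01:00Z</updated><x:action>update</x:action></entry>
  <entry><id>urn:example:hotel:3</id><title>Hotel C</title>
    <updated>2009-06-03T10:05:00Z</updated><x:action>create</x:action></entry>
  <entry><id>urn:example:hotel:2</id><title>Hotel B</title>
    <updated>2009-06-03T10:07:00Z</updated><x:action>delete</x:action></entry>
</feed>"""


def apply_feed(feed_xml: str, store: Dict[str, str], since: str) -> str:
    """Apply entries newer than `since` to `store` and return the newest
    timestamp seen, to be passed as `since` in the next poll.
    Timestamps are compared as strings, which is only valid because all of
    them use the same UTC 'YYYY-MM-DDThh:mm:ssZ' form."""
    root = ET.fromstring(feed_xml)
    entries = sorted(root.findall("atom:entry", NS),
                     key=lambda e: e.findtext("atom:updated", default="", namespaces=NS))
    latest = since
    for entry in entries:
        updated = entry.findtext("atom:updated", default="", namespaces=NS)
        if updated <= since:
            continue  # already imported during an earlier poll
        record_id = entry.findtext("atom:id", default="", namespaces=NS)
        action = entry.findtext("x:action", default="", namespaces=NS)
        if action == "delete":
            store.pop(record_id, None)
        else:  # "create" or "update": replace only this one record
            store[record_id] = entry.findtext("atom:title", default="", namespaces=NS)
        latest = max(latest, updated)
    return latest


local = {"urn:example:hotel:1": "Hotel A", "urn:example:hotel:2": "Hotel B"}
cursor = apply_feed(FEED, local, since="2009-06-03T10:00:00Z")
print(local)   # {'urn:example:hotel:1': 'Hotel A (renovated)', 'urn:example:hotel:3': 'Hotel C'}
print(cursor)  # 2009-06-03T10:07:00Z
```

A subscriber would call apply_feed at the chosen interval, passing the returned cursor each time, so that only changed records are transferred. Paging, authentication and the mapping onto a reference ontology are omitted here.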

9.2.3 Gaps and future needs

9.2.3.1 Query by example

Query by example (QBE) can be used without any specific query language, using only data samples. This makes it easy to implement once data interoperability is solved, independently of which method is used to achieve data interoperability (standards, interfaces, mediation). The main drawback is that complex queries cannot be expressed, so user requirements will not be met in most cases. However, in specific scenarios, especially when looking for descriptions or listings in domains with a shared or even standardized ontology, QBE might prove to be sufficient.
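The principle, and its main limitation, can be shown in a few lines. In this minimal sketch the "query" is itself a partial record, and a record matches when it agrees on every field the example fills in; the field names and hotel data are invented for illustration.

```python
# A minimal sketch of query by example: the example record is the query.
# Field names and data are invented for illustration.
from typing import Any, Dict, List

Record = Dict[str, Any]


def query_by_example(example: Record, records: List[Record]) -> List[Record]:
    """Return records that agree with the example on every field it fills in."""
    return [r for r in records if all(r.get(k) == v for k, v in example.items())]


hotels: List[Record] = [
    {"name": "Hotel A", "city": "Vienna", "stars": 4},
    {"name": "Hotel B", "city": "Vienna", "stars": 3},
    {"name": "Hotel C", "city": "Graz", "stars": 4},
]
print([h["name"] for h in query_by_example({"city": "Vienna", "stars": 4}, hotels)])  # ['Hotel A']
```

Conditions such as "at least 3 stars" or "within 2 km of the station" cannot be written as a sample record, which is the limitation on complex queries noted above.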

9.2.3.2 Standardized query languages / SPARQL

In order to gauge the potential of standardized query languages in general, and SPARQL in particular, in tourism scenarios, we need experience reports and test-beds. Such test-beds should involve major information providers and/or integrators to evaluate aspects such as:

• the ease of producing RDF triples from existing data stores,
• the ease of implementing SPARQL endpoints on top of existing data stores, and
• the performance characteristics of federated SPARQL queries.

Since the ICT infrastructure in the tourism industry is characterized by a broad range of heterogeneous systems (and thus different databases), it is very unlikely that any single conventional query language can be deployed as a standard on a broad basis. SPARQL, on the other hand, has the potential for broad acceptance, since in the case of divergent data models it can be deployed on top of existing reference models. In this setup it has similar benefits and constraints as QBE, but overcomes QBE’s main obstacle by allowing complex queries. SPARQL therefore seems to be one of the main candidates for handling metasearch queries in a distributed and divergent environment.

9.2.3.3 Interface standardization

The short-term benefit of interface standardization can be heightened by an overview of the respective schemata. On this basis the relevant fora, such as the OpenTravel Alliance and XFT, can spot gaps and fill them. Such specifications should be elaborated in line with the general recommendations in the process handling section.

Interface standardization also seems a reasonable practical method for running distributed queries. Its potential is limited mainly by two facts. Firstly, each possible query sequence must be defined as part of the interface(s). This makes query interfaces difficult to define and adopt, and therefore limits the potential for running complex queries, since the effort of defining and deploying interfaces becomes overwhelming. Secondly, participating partners must either implement the standard or define mappings to be interoperable with it. Thus query interfaces have to be implemented for each query scenario, or mappings based on a shared reference model must be set up.

Thus interface standardization is more advanced than QBE, but still has similar restrictions. It might be well suited to specific scenarios, but is of limited use for broader deployment.

9.2.3.4 Metadata syndication

eTourism can build on the experience with metadata syndication in the eGovernment domain. Those results should be evaluated and screened for their applicability to data integration between CRSs, GDSs and intermediaries.

In fact, metadata syndication bypasses the problem of running integrated queries in “metasearch” scenarios by running normal queries against “metadata repositories”. This seems to be a fairly practical approach. It results in fast queries of whatever degree of complexity, limited only by the possibilities of the query language used. One disadvantage is the hosting of redundant data. Another is the need for constant updates in a highly dynamic environment such as hotel bookings. It is also doubtful whether metadata syndication is feasible for networked environments, since it might result in multiple asynchronous data hubs. A clear advantage is that it keeps the data source free from queries, improving the source’s overall system performance. GDSs in particular are subject to performance problems, which are alleviated this way. Again, the full benefit might be reserved for a limited number of search scenarios.

9.2.4 Recommendations

9.2.4.1 Short-term recommendations (1–3 years)

If a system is to be available for external queries, use general query statements that are supported by a broad range of query languages. Avoid features and functionality specific to your own database.

Further develop flexible standardized query languages that can be adapted to different system environments and support semantically enriched data.

Publish “partial translators”, which provide a structured translation for human search concepts like “near” that can be used by different query languages.
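As an illustration of what such a partial translator could look like, the following sketch turns the vague concept “near” into an explicit, reusable filter based on great-circle (haversine) distance. The function names, the 2 km default radius and the choice of the haversine formula are assumptions made for this example; the coordinates are approximate.

```python
# A minimal sketch of a hypothetical "partial translator": the vague concept
# "near" is turned into an explicit filter (great-circle distance within a
# radius). The 2 km default radius is an assumption; coordinates are approximate.
import math
from typing import Callable, List, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def translate_near(lat: float, lon: float, radius_km: float = 2.0) -> Callable[[float, float], bool]:
    """Translate 'near (lat, lon)' into a predicate on coordinates."""
    return lambda plat, plon: haversine_km(lat, lon, plat, plon) <= radius_km


POIS: List[Tuple[str, float, float]] = [
    ("Stephansdom", 48.2085, 16.3731),
    ("Schönbrunn Palace", 48.1845, 16.3122),
]

near_opera = translate_near(48.2031, 16.3691)  # Vienna State Opera
print([name for name, lat, lon in POIS if near_opera(lat, lon)])  # ['Stephansdom']
```

Because the translator yields a plain radius and centre point, the same definition of “near” could be rendered as a filter in SQL, in SPARQL or in a proprietary interface.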

9.2.4.2 Long-term recommendations (3–10 years)

Research technologies for flexible and adaptive query methods that are able to understand the semantics of a web repository and send an appropriate query.


9.3 Role of registries in eTourism

9.3.1 Needs and requirements

9.3.1.1 Introduction

As has been discussed above, both services and data are widely distributed in typical eTourism scenarios. In addition to the information provided by one or more large players such as a GDS, a typical eTourism transaction can bring together (or could in the future profit from combining) local and remote data from many sources. Standardized queries, e.g. based on SPARQL, or ad-hoc protocols can be used to retrieve specific data sets from data stores (see above), and web services can be used to access specific services, ideally through standardized APIs.

This scenario presumes, however, prior awareness of all pertinent sources of information and services, which in reality no single player has. Instead, machine-processable information on such stores and services is currently either not available at all or spread across many different data collections. These range from major commercial operators such as the GDSs themselves, through national or regional integrators and portals, to the web sites of small tourist destinations that list relevant services in their small geographical area.

The need for machine-processable information, especially on services, has long been recognized. When web services became popular in the late 1990s, three key factors were considered crucial for the success of the then new paradigm:

1. Technical interoperability: Web services need to exchange (often pre-defined) data structures or perform RPC-style calls across systems.

2. Description of service interfaces: The APIs of web services and the data structures must be defined in a machine-readable fashion.

3. Lists of available services: Knowledge about other existing web services and their goals is a prerequisite for using them.

Solutions for these requirements are based on open specifications and, in the context of “traditional” web services, are usually identified with the three well-known basic web service standards SOAP, WSDL and UDDI (questions of semantic interoperability were largely out of focus at that time). In RESTful Web Services the stack is somewhat less clearly defined, especially for machine-processable API descriptions, but the general requirements are very much the same.

9.3.1.2 Needs

Looking beyond those specific web service standards, the OASIS Reference Model for Service Oriented Architecture [OASIS Reference Model] explores some of these requirements on a more precise, technologically neutral level. Around the idea of a service as “the mechanism by which needs and capabilities are brought together” gravitate concepts such as the interaction of services, their service descriptions and their visibility and reachability, all grounded in the willingness to collaborate with the goal of achieving a real-world effect.


Rightly, “the large amount of associated documentation and description” [OASIS Reference Model] that exists for a service is seen as one of the defining characteristics of Service-Oriented Architectures (SOAs). This service description, however, goes well beyond interface descriptions and includes both information about organizational prerequisites (the foundations for what is termed organizational interoperability) and a depiction of the service’s semantics, required for semantic interoperability. The same principles can be extended to the crucial role of the visibility of data stores in eBusiness and in particular eTourism transactions, an aspect less in the focus of the OASIS Reference Model.

Registries are typically regarded as one approach to achieve visibility, other options being semantic or general-purpose search engines. Registries in this sense help to find actual resources, thus enabling their discovery. For that purpose, they store more or less standardized metadata to describe those resources and offer an interface to query that metadata. This metadata could conceivably one day also be harvested using information extraction.
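The two functions named above, storing descriptive metadata and offering a query interface over it, can be sketched as follows. The record fields (identifier, kind, title, endpoint, subjects), the class names and the example.org URLs are hypothetical; they do not reproduce the information model of UDDI, ebXML or eGov-Share.

```python
# A minimal sketch of a registry, assuming a hypothetical metadata record.
# The registry stores and searches metadata only; the described services
# and data stores themselves live elsewhere.
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Entry:
    identifier: str                 # unique, resolvable identifier of the description
    kind: str                       # e.g. "service" or "data store"
    title: str
    endpoint: str                   # where the described resource can be reached
    subjects: List[str] = field(default_factory=list)


class Registry:
    def __init__(self) -> None:
        self._entries: Dict[str, Entry] = {}

    def publish(self, entry: Entry) -> None:
        """Add or replace the description with this identifier."""
        self._entries[entry.identifier] = entry

    def find(self, subject: str, kind: Optional[str] = None) -> List[Entry]:
        """Discovery: return descriptions matching a subject and optional type."""
        return [e for e in self._entries.values()
                if subject in e.subjects and (kind is None or e.kind == kind)]


registry = Registry()
registry.publish(Entry("http://registry.example.org/e/1", "service", "Ski pass availability",
                       "http://skipass.example.org/api", ["skiing", "availability"]))
registry.publish(Entry("http://registry.example.org/e/2", "data store", "Regional events",
                       "http://events.example.org/sparql", ["events"]))
print([e.title for e in registry.find("availability")])  # ['Ski pass availability']
```

The search result contains only the endpoint; actually using the resource is a separate step, which is what distinguishes discovery from querying.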

9.3.1.3 Requirements

Registries must facilitate finding existing services and data repositories. Together with standardized query technologies they thus help to put those resources to optimal use.

While registries focus on the visibility of resources, they build on the often unspoken assumption that there is already a willingness to collaborate and share those resources in a given context, be it within an organization or across organizational boundaries, be it for free or for a charge. This may or may not be true in a given case, and it may or may not imply that a registry owner is willing to give up control over the data. Furthermore, in the real world there is rarely a single source of information for any given area of interest and, as we have seen in the introduction to this section, this is particularly true for the tourism sector. Individual registries are maintained at various levels of government (notably, local authorities supporting their local tourism industry), in tourism associations, GDSs and other private sector organizations. This makes sense; in many cases the maintainers are closest to the resources themselves and have both the best first-hand knowledge and the strongest business case to keep the data up-to-date.

That said, there is also a strong requirement for centrality or, more exactly, for central interfaces to enable searches across individual registries. Otherwise any one search will involve direct queries to a large number of eTourism registries, negating the very idea of visibility of data and services.

9.3.2 State of the art

9.3.2.1 UDDI and the ebXML Registry Specification

Two well-known registry standards dominate the relatively small literature on the subject, namely UDDI and the ebXML Registry Specification. But neither standard has been widely adopted in the market. This is, as we argue in [Küster, Moore, Ludwig, 2007], due to fundamental design issues that plague both specifications, namely the mixing up of the in reality orthogonal technical exchange formats, information models and organizational rules, leading to very limited adaptability to new requirements and to bloated specifications.

UDDI

UDDI is the best-known standard for registries of services. The UDDI 1.0 specification was released in 2000, pushed by major software vendors such as IBM, SAP and Microsoft. It was supposed to lay the basis for the loosely coupled operation of web services, bringing together service consumers and service providers, possibly even based on automatic discovery and cooperation. For this purpose, the vendors created three public UDDI registries that were open to all interested parties. These public registries, however, were not widely used and were eventually discontinued in early 2006.

Technically, UDDI is above all an API for a set of SOAP-based web services with their respective data models. This API has continued to grow over the three published versions of the standard and today covers, amongst others, methods for publishing information on businesses and their services, for finding them and for establishing links between them. By now, the monolithic UDDI 3.0 standard totals an estimated 400 pages, not counting the nine XML schemata with the actual API specifications.

ebXML Registry Specification

The ebXML Registry Specification is composed of two sister OASIS standards [OASIS ebXML Registry]: the ebXML Registry Information Model (ebRIM), which specifies its internal data model, and the ebXML Registry Services and Protocols (ebRS), which specifies its SOAP-based API. In coverage it is quite similar to UDDI, though it supports more flexible content models. It distinguishes itself from UDDI by supporting federated queries across a number of different registries:

Figure 9-2

The open source Omar project (http://ebxmlrr.sourceforge.net/3.0/PropertiesGuide.html) appears to be the most popular implementation of the ebXML Registry Specification.


Semantically Enhanced Registries in SATINE

Neither UDDI nor the ebXML Registry Specification allows per se for detailed semantic descriptions of (web) services, let alone other types of resources such as data stores. Queries can at most leverage rather coarse-grained, domain-independent taxonomies such as UNSPSC.

As has been argued above, semantic technologies are a key to enabling data and process interoperability, but are at present largely underused in eTourism in general and in GDSs in particular. The SATINE project (http://www.srdc.metu.edu.tr/webpage/projects/satine) was funded under FP6 from 2004 to 2006 with the explicit goal of overcoming the shortcomings of some current GDSs. SATINE set out to “provide tools and mechanisms for publishing, discovering and invoking web services through their semantics in peer-to-peer networks” (http://www.srdc.metu.edu.tr/webpage/projects/satine/deliverables/D4.1.1.doc). Semantic technologies and specifically ontologies for web services play a significant role in the SATINE architecture.

Amongst other deliverables, SATINE set out to establish a “Semantic-based Interoperability Infrastructure for integrating Web Service Platforms to Peer-to-Peer Networks”. Looking at both specifications, but in particular at the ebXML RS, the SATINE project defined mechanisms for describing the Web service semantics of registry entries through the use of OWL-S (task 4.1 deliverable (http://www.srdc.metu.edu.tr/webpage/projects/satine/deliverables/D4.1.1.doc)). In particular, it built tools for mapping OWL constructs into UDDI and ebXML RS, notably into ebXML class hierarchies. It studied how these semantic descriptions can be leveraged in queries. The user interface permits users to discover the Web Services advertised in the SATINE P2P network using their semantic definitions (http://www.srdc.metu.edu.tr/webpage/projects/satine/publications/FreezedeChallenges.doc).

Discovery is intended to happen on both levels: that of concrete eTourism services and that of eTourism-related collections and registries. Few strategies for actually federating those registries are defined, though.

9.3.2.2 CEN/ISSS eGovernment Focus Group and CEN/ISSS WS eGov-Share

The CEN/ISSS eGov-Share Workshop (http://www.cen.eu/cenorm/businessdomains/businessdomains/isss/workshops/wsegovshare.asp) was established in February 2008 with the aim of helping designers and developers of eGovernment systems and applications by developing approaches and tools to facilitate the sharing of information across agencies and across borders.

The workshop produces specifications, guidelines and two practical demonstrators to help designers and developers of eGovernment systems and services to exchange descriptions of eGovernment resources in the widest sense and to build and maintain federated repositories that integrate resources (both services and data stores) created and managed by several agencies, thus providing users with a single point of access.


Figure 9-3

Local registries are aggregated into larger registries that are often targeted at specific user communities. Those aggregated registries can, of course, be further aggregated into other registries still. All the while the origin of certain metadata sets remains fully traceable through unique identifiers. Furthermore, each of the semantic descriptions is addressable through normal URLs, making the overall architecture fully RESTful and an ideal fit for Resource Oriented Architectures (ROAs) and SOAs alike.
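Why stable identifiers keep aggregation traceable can be sketched in a few lines. The sketch assumes that each registry is a mapping from a URL-style identifier to a metadata record whose "origin" field names the registry that created it; all URLs are hypothetical, and this is not the eGov-Share protocol itself.

```python
# A minimal sketch, assuming each registry is a dict from a URL-style
# identifier to a metadata record whose "origin" names the registry that
# created it. Aggregators copy entries without re-identifying them, so
# origins stay traceable however deep the aggregation goes.
from typing import Dict

Record = Dict[str, str]
RegistryData = Dict[str, Record]


def aggregate(*sources: RegistryData) -> RegistryData:
    """Merge registries; an identifier seen twice denotes the same description."""
    merged: RegistryData = {}
    for source in sources:
        for identifier, record in source.items():
            merged.setdefault(identifier, record)  # first copy wins, no duplicates
    return merged


tyrol: RegistryData = {
    "http://tyrol.example.org/reg/1": {"title": "Ski school directory",
                                       "origin": "http://tyrol.example.org/reg"},
}
vienna: RegistryData = {
    "http://vienna.example.org/reg/7": {"title": "Museum opening hours",
                                        "origin": "http://vienna.example.org/reg"},
}

national = aggregate(tyrol, vienna)    # first-level aggregation
european = aggregate(national, tyrol)  # the Tyrol registry is reachable twice
print(len(european))                                         # 2
print(european["http://tyrol.example.org/reg/1"]["origin"])  # http://tyrol.example.org/reg
```

Because the identifiers are never rewritten, a record that reaches an aggregator by two paths is recognized as one record, and its origin can always be looked up.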

In the overall framework of specifications the workshop first specifies a simple, domain-independent, Atom-based protocol for the exchange of semantic descriptions. It continues with a reference ontology for eGovernment resources with two representations, one in OWL and one in Topic Maps. While this ontology is naturally domain-specific, the overall architecture supports plugging in other domain reference ontologies, e.g. for eTourism. Terminological resources, e.g. “skosified” vocabularies and taxonomies such as Eurovoc, are used to provide anchor points for value domains. Soft cultural elements help to heighten the awareness of and information on culturally variable system factors.


Figure 9-4

The resulting multipart CWA is currently out for open consultation and consists of the following parts:

• CWA Part 0: Introduction
• CWA Part 1a: Reference Ontology and Metadata Schema
• CWA Part 1b: Protocol for the Syndication of Semantic Descriptions
• CWA Part 2: Federated Terminological Resources
• CWA Part 3: Establishment of a set of Soft Cultural Elements
• CWA Part 4: Evaluation and Recommendations

Future work may add specifications for the organizational arrangements, especially in the eGovernment domain.

9.3.3 Gaps and future needs

9.3.3.1 Shortcomings of current registry standards

Neither UDDI nor ebXML registries have been well received in the marketplace. This is due to a number of serious shortcomings that affect those registries:


1. Both specifications essentially build on a fixed ontology with corresponding data formats for registry entries. This ontology is non-trivial to extend for other requirements (the ebXML Registry model being more flexible than UDDI).

2. Both specifications are overly long and complex.

3. Both are bound to a single technology stack, SOAP-based web services.

4. Both standards lack a well-defined and simple format for data exchange and storage.

5. UDDI in particular clearly implies a specific set of procedures and organizational environments in which to operate.

6. UDDI has insufficient support for linking up registries. The federation support in ebXML is heavily Web Service based and difficult to implement.

Attempts such as SATINE to build ontology constructs into the registries further complicate the specifications and have seen little adoption in practice.

In short, UDDI and, to a lesser extent, the ebXML Registry Specification mesh three important but orthogonal concerns that should be kept apart:

• an information model for registry entries,
• a specific technical interface to the registry, and
• organizational procedures for maintaining the registry data.

9.3.3.2 Future needs

In line with the recommendation of the CEN/ISSS eGovernment Focus Group to “build domain registries in response to the needs of individual business cases and to construct them out of existing, standardized technologies” (section 1.2.2), we need to design federated registries in the eTourism domain that are built on a suitable information model, possibly in line with models used in semantic interoperability. At the same time we should align with the trends in technical interfaces to those registries (both notification and exchange formats) that are being set by the CEN/ISSS eGov-Share Workshop.

Much of the eGov-Share architecture lends itself well to this adoption, provided that a reference ontology for eTourism-related resources is developed.

Figure 9-5

The “watchtower” registry of relevant eTourism standards (cf. recommendation 6.1.4.1) lends itself to being the test case also for developing collaboration models for shared eTourism registries. Once this prototype is in operation, plans for the long-term operation of that registry must be elaborated that, again, can be exemplary for other registries in the domain.

9.3.4 Recommendations

9.3.4.1 Short-term recommendations (1–3 years)

• Develop a reference ontology for eTourism-related resources.
• Build the “lighthouse” registries (cf. other recommendations) based on the syndication specifications standardized in WS eGov-Share.
• Specify collaboration models for shared eTourism registries.

9.3.4.2 Long-term recommendations (3–10 years)

Plan for the long-term operation and business models of the “watchtower” registry.


10 Object identification

10.1 Needs and requirements

10.1.1 Introduction

Until recently, and still as standard practice, travel information is obtained and travel-related products are bought via intermediaries (such as agencies), which provide the information directly and perform the bookings on dedicated, possibly vendor-specific, systems. As introduced in the case study, the use of the internet for travel-related searches and online shopping is increasing and already widely accepted. Multiple sources of information are available, offering single products (like hotels, car rentals, events, etc.) or complex packaged products, comparing or aggregating information from different sources and becoming sources themselves.

Identifying identical items (like the same hotel listed under similar names on different sites), comparing information on different items (such as room or price definitions), and merging or filtering similar information from different sources (such as information on the Baleares, which is sometimes found by searching a Spanish region and sometimes by searching the Baleares directly) is next to impossible in the current situation.

10.1.2 Needs

In this chapter the basic needs for unique identifiers for tourism products or services are discussed.

Travel being a change in location, precisely identifying geographical locations is a basic need for tourism. Identification mechanisms should allow searches and identification as well as geopositioning on maps, a tool increasingly used on the web in relation to travel. In a more general context, each travel service should have a unique identifier so as to allow cross-references between the various sources, reliable comparisons and data aggregation. That would cover hospitality items, events, animations, activities, historical sites, exhibitions, museums, etc. In a world of open architecture technology, that would allow matching data from different sources without building time-consuming transcoding tables, thereby reducing the amount of translation, allowing more efficient querying and reducing the time needed to resolve query issues like cache synchronization. That would also allow building extensive knowledge bases compiling different data sources (for instance, matching hotel, chain and testimonial sites and completing them with regional, historical or event site information).

Certain aspects of each type of service further require unique identification so as to remove ambiguity of definition and to allow comparisons. For instance, when buying a stay in a hotel, the type of room (a double or a triple room) becomes essential. At present, however, it is not necessarily clear what a double room is (what is the size of the bed, is there one bed or two, can an extra bed be added for a child or even an adult, etc.). For other components, other features may be crucial and should be correctly identified.
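The ambiguity of a label such as “double room” disappears once the relevant features are stated as explicit, comparable attributes. The attribute names and values in this minimal sketch are hypothetical and do not correspond to any existing codification.

```python
# A minimal sketch, assuming hypothetical attribute names: the label
# "double room" is replaced by explicit, comparable attributes.
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RoomDescription:
    beds: Tuple[str, ...]    # e.g. ("double",) or ("single", "single")
    bed_width_cm: int        # width of the widest bed
    extra_bed: str           # "none", "child" or "adult"
    max_occupancy: int


offer_a = RoomDescription(beds=("double",), bed_width_cm=160, extra_bed="child", max_occupancy=3)
offer_b = RoomDescription(beds=("single", "single"), bed_width_cm=90, extra_bed="none", max_occupancy=2)

# Both may be marketed as a "double room", yet they are clearly not the same product.
print(offer_a == offer_b)  # False
```

Making the class frozen gives value equality and hashing, so identical descriptions from different sources compare equal and can be used as dictionary keys.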


On a different level, to track the different intermediaries introduced in the previous case study and possibly to allow compensation for services rendered by different entities over the whole travel (pre-trip, on-trip and post-trip), tagging individual entities such as central reservation systems, credit card companies, GDSs, web sites, wholesalers, travel agencies, chains, etc. would greatly facilitate commission and money collection.

10.1.3 Requirements

This chapter outlines the different requirements that may be deduced from the needs described above. Being able to uniquely identify objects corresponds to building taxonomies or ontologies for certain domains, some of which are mentioned in the following chapters. More information may be found in the taxonomy chapter of this document.

10.1.3.1 Location codes

Unique, precise and exhaustive location codes are a basic requirement for the travel industry. Location coding should not be limited to general codes such as countries, cities or airports. With online information and booking facilities becoming widespread, it is now necessary to be able to associate codes with all levels of locations that can be used in travel, such as:

• touristic regions,
• terminals,
• stations (railway stations, ski stations, car rental pickup stations),
• points of interest,
• leisure, event or activity locations, etc.

The location codes are often used directly by experts and are also becoming more and more visible to end users (on itineraries, on displays, in search forms, etc.).

Geodesic coordinates are also becoming vital information for searches (“What can I do in the vicinity of my hotel?”, “Which alternative hotels are there?”, etc.) and for representing itineraries, results, etc. However, it does not seem realistic for geodesic coordinates to be the unique coding mechanism, since coordinates are complex and in essence correspond to a point. What, then, would a country’s coordinate be?

10.1.3.2 Travel service codes

Travel products are always composed of a number of separate travel services offered by different vendors through a multitude of resellers. Some of those companies have standardized codes (such as IATA airline codes), but most have codes that depend on the vendor or distributor (for instance, hotel codes differ for each distributor, so the same hotel has a different code in Sabre, Amadeus, Hotels.com, each hotel chain offering the hotel and the hotel’s property management system itself).
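In the absence of shared identifiers, comparing offers for the same hotel requires a crosswalk from each distributor’s own code to one common key. The following minimal sketch assumes a hypothetical shared scheme ("shared:hotel:NNNN"); the distributor names, codes and prices are invented and do not represent any real system’s codes.

```python
# A minimal sketch of a code crosswalk, assuming a hypothetical shared
# identifier scheme ("shared:hotel:NNNN"). All codes and prices are invented.
from collections import defaultdict
from typing import Dict, List, Tuple

CROSSWALK: Dict[Tuple[str, str], str] = {
    ("distributorA", "HX123"): "shared:hotel:0001",
    ("distributorB", "998877"): "shared:hotel:0001",
    ("chain", "VIE-01"): "shared:hotel:0001",
    ("distributorA", "HX555"): "shared:hotel:0002",
}

Offer = Tuple[str, str, float]  # (source, source-specific code, price)


def group_by_hotel(offers: List[Offer]) -> Tuple[Dict[str, List[float]], List[Tuple[str, str]]]:
    """Group prices by shared identifier; report codes with no known mapping."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    unmatched: List[Tuple[str, str]] = []
    for source, code, price in offers:
        shared = CROSSWALK.get((source, code))
        if shared is None:
            unmatched.append((source, code))
        else:
            grouped[shared].append(price)
    return dict(grouped), unmatched


offers: List[Offer] = [
    ("distributorA", "HX123", 120.0),
    ("distributorB", "998877", 115.0),
    ("chain", "VIE-01", 118.0),
    ("distributorA", "HX555", 90.0),
    ("distributorB", "111111", 80.0),  # no mapping known yet
]
grouped, unmatched = group_by_hotel(offers)
print(min(grouped["shared:hotel:0001"]))  # 115.0
print(unmatched)                          # [('distributorB', '111111')]
```

The table needs one row per hotel per source and must be maintained by hand, which is why unique identifiers assigned at the source are preferable to ever-growing mapping tables.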


Furthermore, more and more types of leisure, activity or travel-related services are being offered and published on the Internet without any unique identification (or classification). Unique identifiers for all those services are required to have a chance of discovering and aggregating data efficiently.

It seems unrealistic at present, however, to imagine a single global entity providing identification for all services worldwide; specific identifiers per country or per sector would also be possible, provided there is the capacity to ensure uniqueness of codes.
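One common way to obtain global uniqueness without a single global authority is namespacing: each (country, sector) issuer allocates local codes, and the issuer prefix makes the combined code unique. The "CC-SECTOR-NNNNNN" format in this sketch is a hypothetical convention (the two-letter country part could, for example, follow ISO 3166-1 alpha-2); uniqueness holds only as long as each prefix is delegated to exactly one issuer.

```python
# A minimal sketch of namespaced identifiers, assuming a hypothetical
# "CC-SECTOR-NNNNNN" format. Codes are unique as long as each
# (country, sector) prefix is delegated to exactly one issuer.
import itertools


class Issuer:
    """Allocates codes within one (country, sector) namespace."""

    def __init__(self, country: str, sector: str) -> None:
        if len(country) != 2 or not country.isalpha():
            raise ValueError("country must be a two-letter code")
        self.prefix = f"{country.upper()}-{sector.upper()}"
        self._counter = itertools.count(1)

    def issue(self) -> str:
        """Return the next code, e.g. 'AT-EVENT-000001'."""
        return f"{self.prefix}-{next(self._counter):06d}"


at_events = Issuer("at", "event")
es_events = Issuer("es", "event")
print(at_events.issue(), es_events.issue(), at_events.issue())
# AT-EVENT-000001 ES-EVENT-000001 AT-EVENT-000002
```

Local counters may overlap freely between issuers; the prefix alone keeps "AT-EVENT-000001" and "ES-EVENT-000001" apart.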

10.1.3.3 Travel service qualifier codes

To compare or qualify each type of service, structured information based on universally accepted taxonomies is increasingly required. This information must also be codified. For some services, such as hotels or car rental, codification is more developed than for others, but it consists only of recommended codifications and not true unique identifiers. For most services, codification is still specific to each service provider.

Here too, it seems unrealistic to have a single body responsible for this type of codification.

10.1.3.4 Travel company codes

The high level of intermediation and the number of different companies involved in a selling process make it complex to explain pricing schemes, to resolve complaints and to process payments. Adding traceability to each step of the process is becoming an important requirement. This would imply unique identifiers for each company involved in those processes, such as the end travel service providers introduced in the previous section, the wholesalers (hotel chains, tour operators), the distributors (travel agents, online companies), the intermediaries (central reservation systems, GDSs, switch companies), the clearing, commission processing or payment processing companies, the call centres, etc.

10.2 State of the art

In this section, we review commonly used codes. Other bodies produce codes that either associate other codes with regions, cities or countries, or provide local codes (for instance for all cities in a country, postal codes, etc.). Those have not been reviewed, though they could very well be used to designate additional locations, such as locations not linked with airports.

10.2.1 IATA

The IATA codes are the first codes that come to mind in the travel industry, because they are used for airports, airline companies, etc.

IATA airport codes: alpha-3 codes. The IATA alpha-3 airport codes uniquely identify individual airports worldwide. They are made up of exactly three letters; numerals are not allowed. In fact, these codes have been extended to also include city codes, for cities with more than one airport, as well as coach, rail or ferry locations if requested by an airline or CRS. For instance, TGV railway stations usually have IATA codes because TGV services are used as feeders for airlines. It is therefore more accurate to describe IATA codes as location codes used in travel rather than only as airport codes. Except for cities, the codes correspond to transportation boarding locations rather than to stay- or service-oriented locations. The main drawback of IATA airport codes is that they cannot be extended much further to include all the locations required by the travel industry.

IATA airline codes: officially alphanumeric-3 codes as well as purely numeric codes (used for ticketing, for instance). They were initially alphanumeric-2 codes, which are still the codes mainly used. The alphanumeric-2 codes are used in combination with others in ticket numbers, timetables, tariffs, etc. Codes are also allocated to railway or coach companies whenever requested by airlines or GDSs. Some codes are even reused for different airlines whenever their destinations are not likely to overlap! Codes allocated to airlines that discontinue business can be reused after six months.

IATA agency codes: numeric codes. IATA is pivotal in the worldwide accreditation of travel agents issuing airline tickets, with the exception of the USA, where this is done by the Airlines Reporting Corporation. Permission to sell airline tickets from the participating carriers is obtained through national member organizations. As a consequence, some agencies do not have IATA numbers, which has led to alternative solutions depending on the country, including the allocation of pseudo IATA numbers in some cases (such as SNCF ticketing agencies in France that are not IATA-accredited).

There are also less used IATA codes such as baggage tag issuers, delay codes,accounting prefix codes, logistics company codes, etc.

10.2.2 ICAO

ICAO airport codes: the ICAO (International Civil Aviation Organization) alpha-4 airport identifier codes uniquely identify individual airports worldwide. They are used in flight plans to indicate departure, destination and alternate airfields, as well as in other professional aviation publications. Usually, the first two letters of an ICAO code identify the country (but they do not correspond to ISO country codes). In the continental USA, however, codes normally consist of a ‘K’ followed by the airport’s IATA code.
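The format rules described for IATA and ICAO airport codes can be written as simple shape checks. This sketch only validates the form of a code, not whether it is actually assigned, and it treats the continental-USA ‘K’ rule as the usual pattern it is stated to be, so the derived ICAO code is only a candidate to verify. The function names are illustrative.

```python
import re

IATA_AIRPORT = re.compile(r"[A-Z]{3}")  # exactly three letters, no numerals
ICAO_AIRPORT = re.compile(r"[A-Z]{4}")  # alpha-4 identifier


def is_iata_airport_code(code: str) -> bool:
    """Shape check only: does not tell whether the code is actually assigned."""
    return IATA_AIRPORT.fullmatch(code) is not None


def is_icao_airport_code(code: str) -> bool:
    """Shape check only, as above."""
    return ICAO_AIRPORT.fullmatch(code) is not None


def candidate_icao_continental_us(iata: str) -> str:
    """'K' + IATA code: the usual pattern in the continental USA, not a guarantee."""
    if not is_iata_airport_code(iata):
        raise ValueError(f"not an IATA airport code: {iata!r}")
    return "K" + iata


if __name__ == "__main__":
    print(is_iata_airport_code("CDG"))           # True
    print(is_iata_airport_code("CD3"))           # False: numerals are not allowed
    print(is_icao_airport_code("LFPG"))          # True: 'LF' is France, not ISO 'FR'
    print(candidate_icao_continental_us("JFK"))  # KJFK
```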

ICAO airline designator: the ICAO airline designator is a code assigned by the International Civil Aviation Organization (ICAO) to aircraft operating agencies, aeronautical authorities and services. Each code is unique to one airline. Some companies have ICAO codes with no corresponding IATA code.

10.2.3 ISO

A number of ISO standards are used on a regular basis in the travel industry:

Country codes: ISO 3166-1 alpha-2, alpha-3 and numeric. ISO 3166-1, part of the ISO 3166 standard published by the International Organization for Standardization (ISO), provides codes for the names of countries and dependent territories. Some codes in fact designate regions rather than countries (such as MQ for Martinique, part of France), which leads to some confusion (“Does FR cover only the mainland or the whole of France?”, for instance). Alpha-2 codes are the most often used, alone or in combination with other codes.

Region zones: ISO 3166-2 alphanumeric codes. ISO 3166-2 is the second part of the ISO 3166 standard published by the International Organization for Standardization (ISO). It is a geocode system created for coding the names of country subdivisions and dependent areas, such as regions, states or departments, depending on the country. The codes usually correspond to administrative zones.

Language codes: ISO 639-1. Although the alpha-2 codes cannot represent all languages, they are sufficient in most cases. If there is a need to expand, ISO 639-2 or ISO 639-3 could be used. In some cases, when local variations of a language are important, the ISO 3166-1 alpha-2 country code is used in association with the language code (such as fr-FR and fr-CA).
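Combining a language code with a country code, as in fr-FR or fr-CA, can be handled by splitting the tag and falling back to the language alone when no local variant matches. This minimal sketch assumes only the simple forms “language” and “language-COUNTRY” (a two- or three-letter language code, an alpha-2 country code); full language tags as defined in IETF BCP 47 allow more subtags, which it does not handle. The function names are illustrative.

```python
import re
from typing import NamedTuple, Optional

_TAG = re.compile(r"(?P<lang>[a-z]{2,3})(?:-(?P<country>[A-Z]{2}))?")


class LanguageTag(NamedTuple):
    language: str           # ISO 639-1 (2 letters) or ISO 639-2/3 (3 letters)
    country: Optional[str]  # ISO 3166-1 alpha-2, when a local variant is given


def parse_language_tag(tag: str) -> LanguageTag:
    m = _TAG.fullmatch(tag)
    if m is None:
        raise ValueError(f"unsupported tag: {tag!r}")
    return LanguageTag(m.group("lang"), m.group("country"))


def best_match(requested: str, available: list[str]) -> Optional[str]:
    """Prefer an exact language-country match, then fall back to the language alone."""
    req = parse_language_tag(requested)
    parsed = [(tag, parse_language_tag(tag)) for tag in available]
    for tag, p in parsed:
        if p == req:
            return tag
    for tag, p in parsed:
        if p.language == req.language:
            return tag
    return None


if __name__ == "__main__":
    print(parse_language_tag("fr-CA"))              # LanguageTag(language='fr', country='CA')
    print(best_match("fr-CA", ["en-GB", "fr-FR"]))  # fr-FR
```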

Currency codes: ISO 4217. The first two letters of the code are the two letters of the ISO 3166-1 alpha-2 country code and the third is usually the initial of the currency itself. In some cases, the third letter is the initial of the word for “new” in that country’s language, to distinguish the currency from an older one that was revalued; the code often long outlasts the usage of the term “new” itself.
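The structure of ISO 4217 alphabetic codes can be checked mechanically. In this sketch the small country table is an excerpt for illustration only; a real implementation would load the complete ISO 3166-1 and ISO 4217 lists. EUR is included as a case where the prefix is not an ordinary country code (EU is an exceptionally reserved ISO 3166-1 code), and XAU as a code whose prefix is not a country at all.

```python
from typing import NamedTuple

# Excerpt only: a real system would load the complete ISO 3166-1 alpha-2 list.
COUNTRY_ALPHA2 = {"CH": "Switzerland", "GB": "United Kingdom", "MX": "Mexico",
                  "TR": "Turkey", "US": "United States"}
# Codes reserved in ISO 3166-1 that do not designate countries.
RESERVED_ALPHA2 = {"EU": "European Union"}


class CurrencyCode(NamedTuple):
    code: str
    prefix: str  # first two letters, normally the country code
    suffix: str  # third letter, usually the currency initial
    area: str    # country or area named by the prefix, or "unknown"


def split_currency_code(code: str) -> CurrencyCode:
    if len(code) != 3 or not (code.isascii() and code.isalpha() and code.isupper()):
        raise ValueError(f"not an ISO 4217 alphabetic code: {code!r}")
    prefix, suffix = code[:2], code[2]
    area = COUNTRY_ALPHA2.get(prefix) or RESERVED_ALPHA2.get(prefix) or "unknown"
    return CurrencyCode(code, prefix, suffix, area)


if __name__ == "__main__":
    print(split_currency_code("GBP"))
    # CurrencyCode(code='GBP', prefix='GB', suffix='P', area='United Kingdom')
```

In MXN and TRY the suffix is the initial of the local word for “new” (nuevo, yeni), illustrating the case described above where the code outlives the term.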

10.2.4 UN/LOCODE

The United Nations Code for Trade and Transport Locations is more commonly known as UN/LOCODE. Although managed and maintained by UNECE, it is the product of a wide collaboration in the framework of the joint trade facilitation effort undertaken within the United Nations.

Each code element consists of five characters: the first two indicate the country (according to ISO 3166-1) and the following three represent the place name. Examples such as CHGVA, FRPAR, GBLON, JPTYO and USNYC ring a bell for air travellers, who are used to seeing the last three letters of these codes on their luggage tags. UN/LOCODE adopts the IATA location identifiers wherever possible, to benefit from their association value and to avoid unnecessary code conflicts. When allocating codes, the secretariat tries to find a mnemonic link with the place names to aid human memorization. This is of course increasingly difficult for large country lists, where the 17,576 (26³) three-letter permutations are close to exhaustion.

Each code is also associated with additional information, including one or more functions such as airport, harbour, railway station, road terminal, etc.
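The five-character structure, the 26³ = 17,576 place-name combinations available per country and the function information can be illustrated with the following sketch. It follows the letters-only structure described above. The function classifiers (1 port, 2 rail terminal, 3 road terminal, 4 airport, 5 postal exchange office, 6 multimodal, 7 fixed transport, B border crossing) are taken as an assumption from the UN/LOCODE manual and should be checked against its current edition; the function names are illustrative.

```python
import string
from typing import NamedTuple

# 26 letters in each of the three place-name positions.
PERMUTATIONS_PER_COUNTRY = len(string.ascii_uppercase) ** 3  # 17,576

# Function classifiers as given in the UN/LOCODE manual (assumption: verify
# against the current edition before relying on them).
FUNCTIONS = {
    "1": "port", "2": "rail terminal", "3": "road terminal", "4": "airport",
    "5": "postal exchange office", "6": "multimodal functions",
    "7": "fixed transport functions", "B": "border crossing",
}


class LoCode(NamedTuple):
    country: str  # ISO 3166-1 alpha-2
    place: str    # three letters, reusing the IATA code where possible


def parse_locode(code: str) -> LoCode:
    compact = code.replace(" ", "")  # tolerate the spaced form "FR PAR"
    if len(compact) != 5 or any(c not in string.ascii_uppercase for c in compact):
        raise ValueError(f"not a five-letter UN/LOCODE: {code!r}")
    return LoCode(compact[:2], compact[2:])


def decode_functions(indicator: str) -> list[str]:
    """Translate a function indicator such as '1--4----' into readable names."""
    return [FUNCTIONS[c] for c in indicator if c in FUNCTIONS]


if __name__ == "__main__":
    print(parse_locode("CHGVA"))         # LoCode(country='CH', place='GVA')
    print(PERMUTATIONS_PER_COUNTRY)      # 17576
    print(decode_functions("1--4----"))  # ['port', 'airport']
```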

The approach of this additional coding mechanism is interesting because:

it is based on existing and accepted standards (IATA codes whenever possible, ISO 3166-1 country codes);

it expands the code list following the same structure and methodology;

it takes into account the human use of the codes, facilitating mnemonic associations.

10.2.5 HEDNA

HEDNA is an international association focused on identifying distribution opportunities and providing solutions for the lodging industry and its distribution community. HEDNA compiles codes, for instance for hotel chains and room types, and also provides lists and codes of conduct on how to use them.

HEDNA is also working on a project to provide global unique identifiers.

10.2.6 ACRISS

ACRISS members use an industry-standard vehicle matrix to define car groups, ensuring a like-for-like comparison of standards across countries. This easy-to-use matrix consists of four categories: each position in the four-character vehicle code represents a definable characteristic of the vehicle. The expanded vehicle matrix makes it possible to have 400 vehicle types.

This coding system has been adopted to ensure that all ACRISS members display the same code for the same vehicles, enabling you to make an informed decision when comparing rates.

This certainly makes it easier to understand what type of vehicle is being rented, though many surprises can still happen, even among ACRISS members.

ACRISS does not actually provide standardization for all car rental related data; for instance, car rental stations are not standardized, nor are opening hours.
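Because each position of the four-character ACRISS code has its own meaning, a code can be decoded position by position. The tables in this sketch are a small partial excerpt given as an assumption for illustration; the authoritative and complete matrix is maintained by ACRISS, and unknown letters are simply reported as such.

```python
# Partial, illustrative excerpts of the four ACRISS positions (not the full matrix).
CATEGORY = {"M": "Mini", "E": "Economy", "C": "Compact", "I": "Intermediate",
            "S": "Standard", "F": "Fullsize", "P": "Premium", "L": "Luxury"}
TYPE = {"B": "2-3 door", "C": "2/4 door", "D": "4-5 door", "W": "Wagon/Estate",
        "V": "Passenger van"}
TRANSMISSION_DRIVE = {"M": "Manual, unspecified drive",
                      "A": "Automatic, unspecified drive"}
FUEL_AIR = {"R": "Unspecified fuel, air conditioning",
            "N": "Unspecified fuel, no air conditioning"}

POSITIONS = (("category", CATEGORY), ("type", TYPE),
             ("transmission/drive", TRANSMISSION_DRIVE), ("fuel/air", FUEL_AIR))


def decode_acriss(code: str) -> dict[str, str]:
    """Decode each position of a four-character ACRISS code with the excerpt tables."""
    if len(code) != 4:
        raise ValueError(f"ACRISS codes have four characters: {code!r}")
    decoded = {}
    for (name, table), letter in zip(POSITIONS, code.upper()):
        decoded[name] = table.get(letter, f"unknown ({letter})")
    return decoded


if __name__ == "__main__":
    for position, meaning in decode_acriss("CDMR").items():
        print(f"{position}: {meaning}")
```

Two members quoting the same code are comparing like with like only for the four characteristics the matrix encodes; anything outside them, such as luggage space, remains a possible source of the surprises mentioned above.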

10.2.7 GIATA

GIATA acquires and standardizes (normalizes) digital image and text data for many tour operators and travel agencies such as TUI, Thomas Cook, Easyjet, Expedia, Opodo or Lastminute.com. Its data are also used by all the well-known CRSs/GDSs (Amadeus, Sabre, Galileo/Worldspan) to provide decoding information based on a unique identifier present in those GDSs.

GIATA is not a global standardization body, but it has compiled enough data to become a de facto “standard” source of information, its identifier becoming the reference identifier. This is not entirely true, though, since it is not used globally, nor even by hotel owners.

10.2.8 GS1

The GS1 System is an integrated system of global standards that provides for accurate identification and communication of information regarding products, assets, services and locations. It is the most widely implemented supply chain standards system in the world.

GS1 Identification Keys automatically identify things such as trade items, locations, logistic units and assets in a unique way worldwide. They can be used in bar codes, in online transactions, for selling or synchronization processes, etc.

Though this identification scheme is not currently used systematically in the travel industry, it is successfully applied in many other trades and could therefore easily be extended to the travel trade.
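GS1 identification keys such as the 13-digit Global Location Number (GLN) end with a check digit, which lets any system detect most keying errors without consulting a central database. The sketch below implements the standard GS1 modulo-10 check digit (weights 3 and 1 alternating from the rightmost digit of the key body); the number used is a commonly cited EAN-13 example, chosen only to exercise the function and not a real travel location.

```python
def gs1_check_digit(body: str) -> int:
    """Check digit for a GS1 key given without its final check digit."""
    if not (body.isascii() and body.isdigit()):
        raise ValueError("GS1 keys are numeric")
    total = 0
    # Weights alternate 3, 1, 3, ... starting from the rightmost digit of the body.
    for i, ch in enumerate(reversed(body)):
        weight = 3 if i % 2 == 0 else 1
        total += int(ch) * weight
    return (10 - total % 10) % 10


def is_valid_gs1_key(key: str, length: int = 13) -> bool:
    """Validate a full key (13 digits for a GLN) including its check digit."""
    return (len(key) == length and key.isascii() and key.isdigit()
            and gs1_check_digit(key[:-1]) == int(key[-1]))


if __name__ == "__main__":
    print(gs1_check_digit("400638133393"))    # 1
    print(is_valid_gs1_key("4006381333931"))  # True
    print(is_valid_gs1_key("4006381333932"))  # False: wrong check digit
```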

GS1 operates in multiple sectors and industries and already works in close relation with many corporations throughout the world, as well as with various standardization bodies and identification systems such as the International Organization for Standardization (ISO), UN/EDIFACT, the Global Commerce Initiative (GCI), ISBN (International Standard Book Number) and ISSN (International Standard Serial Number).

10.2.9 URI

Since we are reviewing methods of obtaining unique identifiers, it should be noted that the W3C provides a means for globally unique identification: URIs. A Uniform Resource Identifier (URI) is a compact string of characters used to identify or name a resource on the Internet. The main purpose of this identification is to enable interaction with representations of the resource over a network, typically the World Wide Web, using specific protocols.

URIs could be used in the travel industry in a systematic way, but they have major drawbacks: they are not short, they require registration (and therefore money), and they do not really provide standard naming conventions.

10.2.10 UUID

Universally Unique Identifier (UUID) is an identifier standard used in software construction, standardized by the Open Software Foundation (OSF) as part of the Distributed Computing Environment (DCE). The intent of UUIDs is to enable distributed systems to uniquely identify information without significant central coordination. Thus, anyone can create a UUID and use it to identify something with reasonable confidence that the identifier will never be unintentionally used by anyone for anything else. Information labelled with UUIDs can therefore later be combined into a single database without needing to resolve name conflicts.

Though not directly applied in the tourism industry, since the scheme is technically oriented, UUIDs are interesting in that they do not require a centralised body for validation (though repositories or registries would be useful). UUIDs are nevertheless not directly usable by people because of their inherent complexity.
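Python’s standard uuid module illustrates both points: random (version 4) UUIDs can be created anywhere without coordination, and name-based (version 5) UUIDs let independent parties derive the same identifier from the same name. The URL used as a name is a hypothetical example; the 36-character result also shows why such keys are hard for people to use directly.

```python
import uuid

# Version 4: random, created locally with no central body involved.
booking_id = uuid.uuid4()

# Version 5: derived from a namespace and a name, so every system applying the
# same (hypothetical) naming convention obtains the same identifier.
hotel_id = uuid.uuid5(uuid.NAMESPACE_URL, "http://example.org/hotels/42")

if __name__ == "__main__":
    print(booking_id)          # different on every run
    print(hotel_id)            # identical on every run and on every machine
    print(len(str(hotel_id)))  # 36
```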

10.3 Gaps and future needs

In this section, the gaps in the current identification schemes are outlined, as well as future needs.

10.3.1 Location

In the previous sections we have seen that various associations and organizations propose location identifiers. However, there is currently no worldwide identification standard that can uniquely identify and provide information about entities within the travel industry.

10.3.1.1 Country codes

There is broad consensus on country codes (though several coding schemes exist). The ISO 3166 standard is very widely used and even incorporated in other standards (such as the UN codes). However, it is mostly the alpha-2 codes that are used, which limits migration to the alpha-3 codes and may hinder extension of the code list.

Some “country” codes are also allocated to regions of certain countries, or even to parts of the world bigger than countries (such as EU for the European Union and MQ for Martinique). This most likely stems from the need for travel-oriented zones, which often coincide with countries, but not always. At present this is not done systematically (there is no code for Corsica or the Balearics, for instance). There is a real need to differentiate touristic “zones” from political countries or areas.

10.3.1.2 Region codes

There is less consensus here. The ISO subdivisions of countries are less widely used because they match the needs of the travel industry less well.

There is a need for travel-specific regions that do not map onto political or administrative boundaries: cruise regions at sea, ski regions (or mountain ranges) spanning several countries, and touristic regions that may lie within one country or across several (the Mediterranean region, the south of France, Sardinia, the Balearics, La Réunion, etc.).

Some countries have several levels of subdivision, and the current ISO codes only take one level into account (for example the French departments but not the French regions, for which a local coding is used; some of these local codes are identical to ISO subdivision codes but have a different meaning).

Some travel companies also specialize in certain domains (such as diving or hunting) and require specific regions related to their specialty. There is no way to submit such regions in order to create a global repository. There should be a mechanism to submit and validate such codification, because it would allow a better understanding of offers that are at present difficult to compare.

10.3.1.3 City, airport and other point of travel codes

IATA codes, though widely accepted, provide incomplete identification of cities, airports, railway stations, etc., without differentiating or identifying the types of location.

New codes are added only in relation to airline business, without a systematic coding process.

Furthermore, three-letter (alpha-3) identification is far too limited to code all travel-related locations.

These codes are still widely used, and a global coding process should allow their integration, at least for their original purpose (airport codes).

ICAO also provides airport codes in a more neutral way, using its own (non-ISO) country prefixes. These codes tend to be used internally by airlines and airports, which therefore work with two sets of codes. They are, however, specialized and limited to airports.

All in all, airport codification is fairly well covered, though cluttered. However, no coding scheme integrates terminal data with airport codes, so vendors often create pseudo codes such as CD3 in lieu of CDG terminal 3, which disrupts the original IATA code set.

Furthermore, travel destinations are not limited to airports or main cities (which are covered by the IATA codes). No scheme exists on a global scale to precisely identify cities in general, villages, stations (airport terminals, ski, railway, car rental, coach stations, etc.), points of interest within or outside cities, lieux-dits (named localities), etc., and this is a major issue for eTourism.

There are several possible ways forward: either differentiate airports, railway stations and cities and build an identification scheme for each type of item, or, on the contrary, create a single set of identifiers for all points of travel.

The second approach corresponds to the historical approach, in which cities inherited the codes of their airports and were then sometimes differentiated. This seems logical because, for a trip, a destination and its airport are very similar notions, except where there are several airports and they must be differentiated.
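A single identifier space for all points of travel could record the kind of each location and its enclosing location as attributes instead of overloading the code itself, which avoids pseudo codes such as CD3. The identifiers, kinds and registry class below are hypothetical and only sketch this idea; legacy IATA codes are kept as an optional attribute so that existing codes can still be integrated.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Kind(Enum):
    CITY = "city"
    AIRPORT = "airport"
    TERMINAL = "terminal"
    RAILWAY_STATION = "railway station"


@dataclass(frozen=True)
class PointOfTravel:
    ident: str                    # hypothetical identifier in one shared space
    kind: Kind
    name: str
    parent: Optional[str] = None  # identifier of the enclosing point of travel
    iata: Optional[str] = None    # legacy code kept for integration, if any


class PointRegistry:
    def __init__(self) -> None:
        self._points: dict[str, PointOfTravel] = {}

    def add(self, point: PointOfTravel) -> None:
        if point.ident in self._points:
            raise ValueError(f"duplicate identifier {point.ident}")
        # Parents must exist first, which also rules out cycles.
        if point.parent is not None and point.parent not in self._points:
            raise ValueError(f"unknown parent {point.parent}")
        self._points[point.ident] = point

    def ancestry(self, ident: str) -> list[PointOfTravel]:
        """The point itself followed by its enclosing points, e.g. terminal, airport, city."""
        chain = []
        current: Optional[str] = ident
        while current is not None:
            point = self._points[current]
            chain.append(point)
            current = point.parent
        return chain


if __name__ == "__main__":
    reg = PointRegistry()
    reg.add(PointOfTravel("PT-0001", Kind.CITY, "Paris", iata="PAR"))
    reg.add(PointOfTravel("PT-0002", Kind.AIRPORT, "Paris-Charles de Gaulle",
                          parent="PT-0001", iata="CDG"))
    reg.add(PointOfTravel("PT-0003", Kind.TERMINAL, "Terminal 3", parent="PT-0002"))
    for p in reg.ancestry("PT-0003"):
        print(p.ident, p.kind.value, p.name, p.iata)
```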

Neither IATA nor ICAO seems in a position to provide such coding schemes. Integrating local postal codes and possibly other codifications into a global identification process could speed up the process. The UN has already initiated this type of process (UN/LOCODE), complementing airport codes with stations, harbours, etc. whenever possible.

10.3.2 Currency and language codes

Relevant ISO standards are already in place and there are no major gaps, except possibly a precise definition of the notion of localization.

10.3.3 Travel service codes

IATA and, preferably, ICAO provide extensive identification for airlines. Car rental companies and hotel chains are also identified by two or three identifiers. However, there is no global unique identification for hospitality items, cruise companies, events, entertainment, activities, services, restaurants, etc.

It is therefore impossible to uniquely identify each element of a trip, and consequently impossible to compare or even merge information. If such identification were in place, additional qualification would then be needed, such as the rights of the source with respect to the content (Is this first-hand information? Does the author have the right to create or distribute the information? etc.).

Some organizations, such as HEDNA, have such a project for hospitality services or other specific services. Private companies in certain countries provide partial data (such as GIATA in Germany). Private companies distributing content also provide unique identifiers within their own systems, but these do not allow cross-referencing.

10.3.4 Travel service qualifier codes

To refine the definition of travel-related items further, we would need to identify room types, car types, facilities, staff credentials, etc.

Here again, nothing comprehensive really exists. Certain associations provide recommendations or partial identification schemes and guidelines, without being able to impose a standard. For instance, there are coding recommendations for double rooms (such as DBL or just D), but these do not indicate the true occupancy of the room or its location, view, comfort, etc.

Defining unique codes for travel services is very delicate because it touches on marketing- or sales-oriented information, which is subjective and also requires many details to be precise. In practice, codes are likely to be aggregates of different pieces of information (such as room information, bed information, features, location, etc.).

10.3.5 Travel company codes

Finally, we stressed the importance of being able to track each company involved in a travel-related booking or data exchange process. There is at present no global service providing unique identifiers for vendors, distributors, central reservation systems, cruise companies, commission payment systems, tour operators, etc.

10.4 Recommendations

10.4.1 Short-term recommendations (1–3 years)

Build a registry of the object identification schemes currently used in the tourism industry.

Develop travel-related global geography identifiers.

Integrate the global geography identifiers in the registry and build transcoding capability.

Develop travel company related global identifiers.
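A registry with transcoding capability essentially maps each (scheme, code) pair to a shared identifier and back, so that once two local codes resolve to the same entry one can be translated into the other. The scheme names, codes and identifiers in this sketch are hypothetical; they mirror the situation described earlier, where the same hotel carries a different code with each distributor.

```python
from collections import defaultdict
from typing import Optional


class TranscodingRegistry:
    """Hypothetical registry linking scheme-specific codes to shared identifiers."""

    def __init__(self) -> None:
        self._to_id: dict[tuple[str, str], str] = {}
        # Keeps the most recently registered code per scheme for each identifier.
        self._from_id: dict[str, dict[str, str]] = defaultdict(dict)

    def register(self, global_id: str, scheme: str, code: str) -> None:
        key = (scheme, code)
        existing = self._to_id.get(key)
        if existing is not None and existing != global_id:
            raise ValueError(f"{scheme}:{code} already mapped to {existing}")
        self._to_id[key] = global_id
        self._from_id[global_id][scheme] = code

    def resolve(self, scheme: str, code: str) -> Optional[str]:
        """Shared identifier for a scheme-specific code, if registered."""
        return self._to_id.get((scheme, code))

    def transcode(self, scheme: str, code: str, target_scheme: str) -> Optional[str]:
        """Translate a code from one scheme to another through the shared identifier."""
        global_id = self.resolve(scheme, code)
        if global_id is None:
            return None
        return self._from_id[global_id].get(target_scheme)


if __name__ == "__main__":
    reg = TranscodingRegistry()
    reg.register("HOTEL-000123", "distributor-a", "PAR123")
    reg.register("HOTEL-000123", "distributor-b", "77541")
    print(reg.transcode("distributor-a", "PAR123", "distributor-b"))  # 77541
```

Extending such a registry with relations between entries (broader, narrower, related) is one possible path from a registry towards the thesaurus mentioned in the long-term recommendations.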

10.4.2 Long-term recommendations (3–10 years)

Provide recommendations for travel service coding schemes.

Build transcoding capacity over the above-mentioned repository to transform the registry into a thesaurus.

11 Best practice case

11.1 The starting point

The objective of the best practice case is to present a real case study, based on an existing business case, that demonstrates a future scenario for the main issues of data and process interoperability. The CWA results are discussed on the basis of this case to show the feasibility of what the CWA states.

We selected an existing eTEN project, which joined the workshop as a member and also contributed keynote speakers at one of the workshop meetings in Berlin. The project, called “euromuse.net”, does not come from the core tourism domain but from cultural heritage, which is a driver of tourism. The project improves an existing platform to offer services and exhibition data to the tourism industry and aims to bridge the existing gap between cultural heritage and the tourism industry. It faces the same problems as those discussed in the workshop and has an appropriate data mediation solution in use, illustrating the approach generally recommended by this CWA for overcoming interoperability problems. It uses Harmonise 2.0 to integrate data from hundreds of Europe’s top museums and to provide this aggregated information to the various players of the tourism industry. And of course there is a strong need for a cost-effective and easy-to-use solution, since museums usually do not have large IT departments, if any at all.

euromuse.net was identified as a very good starting point for discussing the issue and for demonstrating a real live system, which could not otherwise have been implemented easily within the course of the CEN workshop. It allows a real demonstration to be given and the issues presented in this document to be discussed on the basis of a system in use.

11.2 The existing case of euromuse.net

People interested in exhibitions and museums depend on access to information which, in most cases, is only available with great effort in a rather complex and scattered market, since bundled museum data at a supranational and multilingual level are difficult to access. euromuse.net is a public access portal providing multilingual information on museums and their exhibitions throughout Europe.

euromuse.net offers both a ‘one-stop’ web tool to the greatest exhibitions in Europe for the public and a special data interface called Harmonise that delivers structured data from the museums to the tourism sector. The euromuse.net project will deploy an existing online service, which provides multilingual information about temporary exhibitions and museums as well as other museum resources on a web platform, to develop a wider pan-European data collection based on public sector information to be re-used by different actors in the cultural and tourism fields. The project has three main goals:

1. Improve and extend the existing platform, a website offering museum and exhibition information to the general public for free.

2. Integrate the museum information of the euromuse.net database with the Harmonise tools. Through this integration, euromuse.net’s rich content will be linked with the online offers of other European and national tourism and marketing services for culture.

3. Enhance the existing services to integrate information on scientific publications from museums and to expand the current services, which provide an overview of “virtual” museums and their (online) resources.

The main focus is to improve the connection between the existing marketing and promotion channels of the tourism industry and the cultural sector through the euromuse.net database. A general idea of the euromuse.net project is to better connect the museum sector with relevant target groups in the tourism sector, both on a professional and on a non-professional or private level. euromuse.net services will support and strengthen existing connections between the general public interested in museums and exhibitions, the professional tourism sector and museums.

The service will help to make information about exhibitions and museums all over Europe easily accessible. This is done through three complementary services: the website http://www.euromuse.net/, mainly for the general public and accessible for free; tools for structured data exchange with the databases of the tourism industry and other tourism players; and a scientific literature database of museum publications, mainly for researchers and museum staff. The data exchange tools will enable representatives of the tourism industry and services to organize personalised tourism packages for their customers through the service.

Because the requirements of industrial and private users normally differ, the project offers special access for tourism industry users in addition to the euromuse.net website. Special search strings and precise queries to the euromuse.net database allow optimized preparation of organized trips. Industrial users will receive structured, XML-formatted data through a special export from the euromuse.net database. Commercial users of this functionality will be asked to pay a contribution for the service provided.

11.3 Future scenario for euromuse.net

The current setup of euromuse.net is sufficient to collect exhibition information from hundreds of partners with different data models and to pool this information in a central repository via the Harmonise service. These data can be searched by external partners, above all tourism organizations, to obtain up-to-date information about exhibitions all over Europe. This is again done via the Harmonise service, so the tourism partners receive the data in their own data format and can feed their databases easily.
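The mediation pattern described here, in which each partner keeps its own data model and a mediator translates through a common model, can be sketched as one field mapping per partner. This is not the Harmonise API; the partner names, field names and mapping format are hypothetical, and real mediation also has to convert values, not only rename fields.

```python
from typing import Any

Record = dict[str, Any]

# Hypothetical mappings: partner field name -> common-model field name.
MAPPINGS: dict[str, dict[str, str]] = {
    "museum_a": {"titel": "title", "beginn": "start_date", "ende": "end_date"},
    "tour_operator_b": {"name": "title", "from": "start_date", "to": "end_date"},
}


def to_common(partner: str, record: Record) -> Record:
    """Lift a partner record into the common model, dropping unmapped fields."""
    mapping = MAPPINGS[partner]
    return {common: record[local] for local, common in mapping.items() if local in record}


def from_common(partner: str, record: Record) -> Record:
    """Lower a common-model record into a partner's own format."""
    mapping = MAPPINGS[partner]
    return {local: record[common] for local, common in mapping.items() if common in record}


def mediate(source: str, target: str, record: Record) -> Record:
    """Translate a record from the source partner's format to the target's."""
    return from_common(target, to_common(source, record))


if __name__ == "__main__":
    exhibition = {"titel": "Impressionismus", "beginn": "2009-06-01", "ende": "2009-09-30"}
    print(mediate("museum_a", "tour_operator_b", exhibition))
```

Going through a common model means each of n partners needs a single mapping, instead of the n(n-1) one-way converters required if every pair of partners translated directly.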

This setup closely follows the approaches recommended in this CWA to overcome the data interoperability problem. However, this is where the current setup ends, leaving open some issues that are also discussed in this CWA. Some of them should be addressed in euromuse.net in the future; others are still not easy to solve.

Following the order of this document, the first issue is process handling. Most museums do not have a system allowing online ticket purchase, but they might have one sooner or later. Online ticket purchase will therefore become an issue, also because travel agencies might wish to bundle services dynamically to sell the client a full travel package that also includes exhibitions. In principle, process handling would be easiest with a stateless approach, in which processes are handled solely through the exchange of data. Process mediators are currently being developed in applied ICT research and might offer an improved solution in the longer run (these process mediators work similarly to the data mediator Harmonise).

Meta search is the next topic, and in some sense euromuse.net is already a meta-search repository, since it aggregates data from different sources and makes them available for search queries. When querying the euromuse.net database today, a fixed query string or fixed query rules have to be used, since no proper solution could be found to handle different query strings in a flexible and generic way. In the future, it should also be possible to map different queries so that one query can be run simultaneously on a larger number of instances, each of which might use a different query language. This shows the need for interoperable query languages, but also the need for registries in order to find the data instances that should be searched. Clearly, some meta information is needed about where to search, because searching every database in the world to obtain a certain set of data is inefficient, if not impossible. Reliable registries directing search queries to potential data sources would therefore significantly improve search efficiency.
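The combination called for here, a registry indicating which sources are worth querying plus a per-source translation of one generic query, could look like the following sketch. The registry entries, the two query syntaxes and the translator functions are hypothetical stand-ins for real query languages.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GenericQuery:
    topic: str  # e.g. "exhibition"
    city: str
    start: str  # ISO 8601 dates, e.g. "2009-07-01"
    end: str


@dataclass(frozen=True)
class Source:
    name: str
    topics: frozenset[str]
    cities: frozenset[str]
    translate: Callable[[GenericQuery], str]  # generic query -> native query


def keyword_syntax(q: GenericQuery) -> str:
    # Hypothetical keyword-style query language.
    return f'{q.topic} city:"{q.city}" date:{q.start}..{q.end}'


def url_syntax(q: GenericQuery) -> str:
    # Hypothetical URL-parameter language (a real one would URL-encode values).
    return f"type={q.topic}&city={q.city}&from={q.start}&to={q.end}"


def plan(query: GenericQuery, registry: list[Source]) -> dict[str, str]:
    """Keep only the sources the registry says can answer, translating the query for each."""
    return {
        s.name: s.translate(query)
        for s in registry
        if query.topic in s.topics and query.city in s.cities
    }


if __name__ == "__main__":
    registry = [
        Source("museums-berlin", frozenset({"exhibition"}), frozenset({"Berlin"}), url_syntax),
        Source("culture-paris", frozenset({"exhibition", "concert"}), frozenset({"Paris"}),
               keyword_syntax),
        Source("events-berlin", frozenset({"exhibition", "concert"}), frozenset({"Berlin"}),
               keyword_syntax),
    ]
    query = GenericQuery("exhibition", "Berlin", "2009-07-01", "2009-07-31")
    for source, native in plan(query, registry).items():
        print(source, "->", native)
```

The Paris source is never contacted for a Berlin query, which is the efficiency gain that registries directing search queries are expected to bring.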

And even if you search various databases and retrieve a large number of results (let’s say exhibitions in the case of euromuse.net), you do not automatically know how many exhibitions are represented several times in the retrieved data sets. Object identification is therefore the last of the topics covered by this CWA, and also a future enhancement of euromuse.net. If all exhibitions, museums and locations can be identified automatically, then multiple entries of the same object can be removed from the database automatically. At the moment the issue is left open in euromuse.net, since the number of sources is manageable and the probability that one exhibition is reported by two museums is very low. However, this probability might rise significantly and quickly as the network grows.
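Once exhibitions carry shared identifiers, removing multiple entries of the same object reduces to grouping records by identifier. In this sketch the “id” field is assumed to hold such a shared identifier, and the merge policy (the first non-empty value wins and all reporting sources are kept) is one hypothetical choice among many.

```python
from typing import Any

Record = dict[str, Any]


def merge_duplicates(records: list[Record]) -> list[Record]:
    """Collapse records sharing the same 'id', keeping the first non-empty field values."""
    merged: dict[str, Record] = {}
    for rec in records:
        key = rec["id"]
        target = merged.setdefault(key, {"id": key, "sources": []})
        target["sources"].append(rec.get("source", "unknown"))
        for field, value in rec.items():
            if field in ("id", "source"):
                continue
            # Fill gaps only: an earlier non-empty value is never overwritten.
            if target.get(field) in (None, "") and value not in (None, ""):
                target[field] = value
    return list(merged.values())


if __name__ == "__main__":
    records = [
        {"id": "EXH-1", "source": "museum-a", "title": "Impressionism", "city": ""},
        {"id": "EXH-1", "source": "museum-b", "title": "Impressionism", "city": "Paris"},
        {"id": "EXH-2", "source": "museum-b", "title": "Bauhaus", "city": "Berlin"},
    ]
    for rec in merge_duplicates(records):
        print(rec)
```

Without shared identifiers the same task requires fuzzy matching on titles, dates and places, which is far less reliable as the number of sources grows.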

11.4 Critical discussion

The discussion above showed that even in this rather small scenario, where the business case and the players are easy to survey, all the topics addressed in the CWA are relevant issues. Even though the project originally comes from the cultural heritage sector, it has strong links with the tourism industry, perhaps stronger than one might perceive at first sight. This more “extra-orbital” topic of exhibitions might also make it easier to see the questions raised and answered in this document, since it is less bound to the daily topics of hotels and flights (and of course of other products closer to the core of tourism than exhibitions). Nevertheless, even coming from the outer sphere of tourism, it is of deep relevance to tourism in Europe.

It is easy to see that the topics are exactly the same for exhibitions as they are for accommodation. euromuse.net therefore nicely demonstrates how all of these issues can also be solved on a global scale. The same technology and setup for mediating data and processes can be used for any other object, such as accommodation, flights, car rentals, events, etc.

One important issue, however, remains unanswered, since it is outside the scope of interoperability: even if all the data could be exchanged smoothly, data sources identified easily, content understood and booking processes run, how can data quality be assured? How can one make sure that a timetable (opening hours, flight schedules) is correct or that price quotes are valid? Quality of service and user acceptance will depend very much on data quality. In euromuse.net, involving users to report back on the quality of information is under discussion. The involvement of users (user-generated content) might be a reliable source for estimating data quality. But although this topic is an important one, it is not part of this CWA on data and process interoperability.

12 Bibliography and references

The following is a list of documents and web sites other than the referenced European and International Standards, which are listed in chapter 2 (“Normative references”).

[Adam, Hofer, Zang, et al, 2005] Otmar Adam, Anja Hofer, Sven Zang, Christoph Hammer, Mirko Jerrentrup, Stefan Leinenbach: “A Collaboration Framework for Cross-enterprise Business Process Management”. In: Panetto, Hervé (ed.): Interoperability of Enterprise Software and Applications – INTEROP-ESA’2005. Geneva, Switzerland, February 23–25, 2005, Technical Sessions, 2005, p 499-510

[Addis, Boniface, Goodall, et al, 2003] M. Addis, M. Boniface, S. Goodall, P. Grimwood, S. Kim, P. Lewis, K. Martinez, A. Stevenson: “SCULPTEUR: Towards a new paradigm for multimedia museum information handling”, In: Proceedings of the Second International Conference on Semantic Web, p 582-596, 2003

[Addis, Stevenson, 2002] M. Addis, A. Stevenson: D6.2 Impact on World-Wide Metadata Standards, Deliverable report of ARTISTE project, 2002

[Adrian, Sauermann, Roth-Berghofer, 2007] B. Adrian, L. Sauermann, T. Roth-Berghofer: “ConTag: A semantic tag recommendation system”. In: Proceedings of I-Semantics ’07, p 297-304, 2007

[Advanced Distributed Learning] http://www.adlnet.gov/

[Agent Link] http://www.agentlink.org/

[Ahern, King, Naaman, et al, 2007] S. Ahern, S. King, M. Naaman, R. Nair, J.H.I. Yang: “ZoneTag: Rich, Community-Supported Context-Aware Media Capture and Annotation”. In: Proceedings, MSI workshop CHI2007, San Jose, Calif, 2007

[AICC] Aviation Industry CBT Committee, http://www.aicc.org/

[Amadeus] http://www.amadeus.com/

[Amann, Fundulaki, 1999] B. Amann, I. Fundulaki: “Integrating Ontologies and Thesauri to build RDF Schemas”, ECDL Research and Advanced Technologies for Digital Libraries, p 234-253, 1999

[ANSI] American National Standards Institute, http://www.ansi.org/

[ArguGRID] http://www.argugrid.eu/

[Aristotle] Aristotle: Metaphysics Book IV, http://classics.mit.edu/Aristotle/metaphysics.4.iv.html

[Arnarsdóttir, Berre, Hahn, Missikoff, Taglino] K. Arnarsdóttir, A.-J. Berre, A. Hahn, M. Missikoff, F. Taglino: Semantic Mapping: ontology based vs. model based approach. Alternative or complementary approaches?, ftp://ftp.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-200/17.pdf

[ARTEMIS] http://www.srdc.metu.edu.tr/webpage/projects/artemis/

[ASG] http://asg-platform.org/cgi-bin/twiki/view/Public

[Aviation Industrie CBTI Committee] http://www.aicc.org/

[Baader, Horrocks, Sattler, 2003] F. Baader, I. Horrocks, U. Sattler: “Description logics as ontology languages for the semantic web”. In: S. Staab, R. Studer, eds: Lecture Notes in Artificial Intelligence, Springer Verlag, 2003

[Bailey, 1994] K.D. Bailey: Typologies and Taxonomies - An Introduction to Classification Techniques, London, Sage Publications, Quantitative Applications in the Social Sciences, 1994

[Barrasa, Corcho, Gómez-Pérez, 2004] J. Barrasa, O. Corcho, A. Gómez-Pérez: R2O, an Extensible and Semantically Based Database-to-Ontology Mapping Language. Second Workshop on Semantic Web and Databases (SWDB2004). Toronto, Canada, August 2004

[Berners-Lee, Hendler, Lassila, 2001] Tim Berners-Lee, J. Hendler and O. Lassila: “The Semantic Web”. In: Scientific American vol 284, no 5, p 34-43, May 2001

[Bikel, Miller, Schwartz, Weischedel, 1997] Daniel M. Bikel, Scott Miller, Richard Schwartz, Ralph Weischedel: Nymble: a High-Performance Learning Name-finder, 1997, http://xxx.lanl.gov/pdf/cmp-lg/9803003

[Biron, Malhotra, 2001] P.V. Biron, A. Malhotra (Eds): XML Schema Part 2: Datatypes. W3C Recommendation, May 2001, http://www.w3.org/TR/xmlschema-2/

[Bizer, 2003] C. Bizer: D2R MAP – A Database to RDF Mapping Language, The twelfth international World Wide Web Conference, WWW2003, Budapest, Hungary, 2003

[Bloehdorn, 2005] S. Bloehdorn, K. Petridis, C. Saathoff, N. Simou, V. Tzouvaras, Y. Avrithis, S. Handschuh, Y. Kompatsiaris, S. Staab, M.G. Strintzis: Semantic annotation of images and videos for multimedia analysis, in: Proceedings of the 2nd European Semantic Web Conference (ESWC 2005), 29 May–1 June 2005, Heraklion, Greece, 2005

[Borgida, An, Mylopoulos, 2005] A. Borgida, Y. An, J. Mylopoulos: Inferring Complex Semantic Mappings Between Relational Tables and Ontologies from Simple Correspondences. In: CoopIS, DOA, and ODBASE, OTM Confederated International Conferences, Cyprus, Part II, volume 3761 of LNCS, p 1152-1169, Springer, 2005

[Borthwick, 1999] Andrew Eliot Borthwick: A maximum entropy approach to named entity recognition, New York University, 1999

[BREIN] http://www.eu-brein.com/

[Brunstein, 2002] Ada Brunstein, “Annotation guidelines for answer types”, BBN Technologies, 2002, http://www.ldc.upenn.edu/Catalog/docs/LDC2005T33/BBN-Types-Subtypes.html

[Campbell, Currier] L.M. Campbell, S. Currier, (31/10/00), http://www.sesdl.scotcit.ac.uk/sellic_pres/sellic2.html

[Chandrasekaran, Josephson, Benjamins, 1998] B. Chandrasekaran, J.R. Josephson, V.R. Benjamins: “Ontology of Tasks and Methods”, In: Proceedings of 1998 Banff Knowledge Acquisition Workshop, 1998

[CIDOC CRM] The CIDOC Conceptual Reference Model, http://cidoc.ics.forth.gr/

[Cimpian, Mocan, 2005] Emilia Cimpian, Adrian Mocan: WSMX Process Mediation Based on Choreographies, 1st International Workshop on Web Service Choreography and Orchestration for Business Process Management, 2005

[DARPA] Defense Advanced Research Projects Agency, http://www.darpa.gov

[Davenport, 1993] Thomas Davenport: Process Innovation: Reengineering work through information technology, Harvard Business School Press, Boston, 1993

[Davis, Van House, Towle, et al, 2005] M. Davis, N. Van House, J. Towle, S. King, S. Ahern, C. Burgener, D. Perkel, M. Finn, V. Viswanathan, M. Rothenberg: MMM2: mobile media metadata for media sharing, CHI ’05 extended abstracts on Human factors in computing systems, April 02-07, Portland, OR, USA, 2005, http://portal.acm.org/citation.cfm?id=1056910&dl=GUIDE&coll=GUIDE&CFID=25341546&CFTOKEN=26292269

[de Laborda, Conrad, 2005] C.P. de Laborda, S. Conrad: Relational.OWL – A Data and Schema Representation Format Based on OWL. In Second Asia-Pacific Conference on Conceptual Modelling (APCCM2005), volume 43 of CRPIT, p 89-96, Newcastle, Australia, 2005, ACS

[Dell’Erba, Fodor, Höpken, et al, 2005] M. Dell’Erba, O. Fodor, W. Höpken, et al, “Exploiting Semantic Web Technologies for Harmonizing e-Markets”. In: IT&T Information Technology & Tourism – Application – Methodologies – Techniques, 2005

[DIP] http://dip.semanticweb.org/index.html

[Directive 90/314/EEC] Council Directive 90/314/EEC of 13 June 1990 on package travel, package holidays and package tours

[Dodgeball] http://www.dodgeball.com/

[Dörr, 2003] M. Dörr: “The CIDOC conceptual reference module: An ontological approach to semantic interoperability of metadata”. AI Magazine 24(3) (2003), 75–92

[Dörr, Guarino, Fernández López, et al, 2001] M. Dörr, N. Guarino, M. Fernández López, E. Schulten, M. Stefanova, A. Tate: “State of the Art in Content Standards. OntoWeb Deliverable 3.1.”, Technical Report, 2001

[Dörr, Hunter, Lagoze, 2003] M. Dörr, J. Hunter, C. Lagoze: “Towards a core ontology for information integration”. Journal of Digital Information 4(1) (2003)

[Dou, McDermott, Qi] D. Dou, McDermott, P. Qi: “Ontology translation by Ontology Merging and Automated Reasoning”

[Dunieveld, Stoter, Weiden, et al, 2000] A.J. Dunieveld, R. Stoter, M.R. Weiden, B. Kenepa, V.R. Benjamins: “WonderTools? A comparative study of ontological engineering tools”, 2000

[Earley, 2005] S. Earley: Resolving Taxonomy Challenges and Information Architecture Conflicts, 2005, http://www.dama-nj.org/presentations/Seth%20Earley%20Taxonomies%20May%2012%202005%20(DamaNJ).pdf

[eBusiness W@tch Report 2006/2007] eBusiness W@tch Report 2006/2007, http://www.ebusiness-watch.org/key_reports/documents/EBR06.pdf

[ebXML] eBusiness XML, http://www.ebxml.org/

[Echarte, Astrain, Cordoba, Villadangos, 2007] F. Echarte, J.J. Astrain, A. Cordoba, J. Villadangos: Ontology of Folksonomy: A New Modelling Method. Proceedings of the Semantic Authoring, Annotation and Knowledge Markup Workshop (SAAKM2007), British Columbia, Canada, Vol-289, 2007, http://ftp.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-289/p08.pdf

[ESP Game] http://www.espgame.org/

[ETSI] European Telecommunications Standards Institute, http://www.etsi.org/

[euromuse] http://www.euromuse.net, http://www.euromuse-project.net

[Expedia] http://www.expedia.com/

[Fabian, 1975] J. Fabian: “Taxonomy and Ideology: On the Boundaries of Concept Classification”. In: M. Kinkade (ed), Linguistics and Anthropology, Lisse, p 183-197, 1975

[Facebook] http://www.facebook.com/

[Flickr] http://www.flickr.com/

[Fodor, Werthner, 2005] Oliver Fodor, Hannes Werthner: Harmonise: a step toward an interoperable e-tourism marketplace. In: International Journal of Electronic Commerce, Winter 2004-5, Vol 9, No 2, p 11-39, 2005

[Freyer, 2006] Freyer, Walter: Tourismus: Einführung in die Fremdenverkehrsökonomie, 8th revised ed, München: Oldenbourg, 2006

[Fuxman, Hernández, Ho, et al, 2006] A. Fuxman, M.A. Hernández, H. Ho, R. Miller, P. Papotti, L. Popa: Nested Mappings: Schema Mapping Reloaded. Proc. VLDB 2006 Conf., p 67-78, Seoul, Korea, 2006

[Garshol, 2004] L.M. Garshol: Metadata? Thesauri? Taxonomies? Topic Maps! Making Sense of it all, Journal of Information Science, 2004

[Gennari, Musen, Fergerson, et al, 2002] J. Gennari, M.A. Musen, R.W. Fergerson, W.E. Grosso, M. Crubezy, H. Eriksson, N.F. Noy, S.W. Tu: The Evolution of Protégé: An Environment for Knowledge-Based Systems Development, Technical Report SMI-2002-0943, 2002

[Ghawi, Cullot] R. Ghawi, N. Cullot: Database-to-ontology Mapping Generation for semantic interoperability

[Gilchrist, 2003] A. Gilchrist: Thesauri, taxonomies and ontologies - an etymological note. Journal of Documentation, 2003, 59 (1), p 7-18

[Goodall, Lewis, Martinez, et al, 2004] S. Goodall, P.H. Lewis, K. Martinez, P. Sinclair, F. Giorgini, M.J. Addis, M.J. Boniface, C. Lahanier, J. Stevenson: “SCULPTEUR: Multimedia Retrieval for Museums”, CIVR 2004, LNCS 3115, p 638-646, 2004

[Grishman, 2003] Ralph Grishman, “Information Extraction”. In: The Oxford Handbook of Computational Linguistics, ed. R. Mitkov, Oxford University Press, 2003

[Grosof, Horrocks, Volz, Decker, 2003] B.N. Grosof, I. Horrocks, R. Volz, S. Decker: Description logic programs: Combining logic programs with description logic. In Proc. of the Twelfth International World Wide Web Conference (WWW 2003), p 48-57, ACM, 2003

[Grossman, 2004] Grossman, David: Confusion is the star of hotel rating systems, http://www.usatoday.com/travel/columnist/grossman/2004-03-05-grossman_x.htm

[Grove, 2003] A. Grove: Taxonomy. In: Encyclopedia of Library and Information Science, p 2770-2777, New York, Marcel Dekker Inc, 2003

[Gruber, 1993a] T.R. Gruber: “A translation approach to portable ontology specifications”, Knowledge Acquisition, Vol 5, 1993

[Gruber, 1993b] T.R. Gruber: “Towards Principles of the Design of Ontologies Used for Knowledge Sharing”, International Journal of Human Computer Studies, Vol 43, p 907-928, 1993

[Gruber, 2005a] T. Gruber: Ontology of Folksonomy: A Mash-up of Apples and Oranges, AIS SIGSEMIS Bulletin, 2005

[Gruber, 2005b] T. Gruber: TagOntology, a way to agree on the semantics of tagging data, 2005

[GS1] http://www.gs1.org/

[Guarino, Giaretta, 1995] N. Guarino, P. Giaretta: “Ontologies and knowledge bases. Towards a terminological clarification”, Towards Very Large Knowledge Bases. Ed IOS Press, p 25-32

[GUID] http://en.wikipedia.org/wiki/GUID

[Gulli, Signorini, 2005] A. Gulli, A. Signorini: Building an open source meta search engine [WWW2005]

[Haase, Wang, 2007] P. Haase, Y. Wang: “A decentralized infrastructure for query answering over distributed ontologies”. In: Proceedings of the 2007 ACM Symposium on Applied Computing (Seoul, Korea, March 11-15, 2007). SAC ’07. ACM, New York, NY, p 1351-1356, http://doi.acm.org/10.1145/1244002.1244294

[HarmoNET] The Harmonisation Network for the Exchange of Travel and Tourism Information, http://www.harmonet.org/

[HEDNA] http://www.hedna.org/

[Heflin, 2001] J. Heflin, J. Hendler, “A portrait of the Semantic Web in action”, IEEE Intell. Syst. 16 (2) (2001), p 54–59

[Hempel, 1965] C.G. Hempel: “Fundamentals of Taxonomy”, p 137-154. In: C.G. Hempel: Aspects of scientific explanation and other essays in the philosophy of science, New York, The Free Press, 1965

[Hepp, Leymann, Domingue, et al, 2005] Martin Hepp, Frank Leymann, John Domingue, Alexander Wahler, Dieter Fensel: Semantic Business Process Management: A Vision Towards Using Semantic Web Services for Business Process Management, Proceedings of the IEEE ICEBE, 2005

[Höpken, 2004] Wolfram Höpken: Reference Model of an Electronic Tourism Market (IFITT RM), Version 1.3, 2004, http://www.rmsig.de/documents/ReferenceModel.doc

[Hull, 1998] D.L. Hull: Taxonomy. In: Routledge Encyclopedia of Philosophy, Version 1.0, London, Routledge, 1998

[Hunter, 2002] J. Hunter: “Combining the CIDOC CRM and MPEG-7 to describe multimedia in museums”, In: Proceedings of Museums on the Web 2002 Conference, Boston, 2002

[IATA] http://www.iata.org/, http://en.wikipedia.org/wiki/IATA

[IEEE] Institute of Electrical and Electronics Engineers, http://www.ieee.org

[IFITT] International Federation for IT and Travel & Tourism, http://www.ifitt.org/

[IFLA] International Federation of Library Associations and Institutions, http://www.ifla.org/

[ISO] International Organization for Standardization, http://www.iso.org/; for references to ISO standards see also chapter 2 “Normative references”

[ISO 3166] http://www.iso.org/iso/country_codes.htm, http://www.iso.org/iso/fr/country_codes.htm

[ISO/IEEE 11073] Health informatics – Point-of-care medical device communications (multiple parts)

[ISO 21127:2006] Information and documentation – A reference ontology for the interchange of cultural heritage information

[IST] Information Society Technologies, http://cordis.europa.eu/ist/

[ITU] International Telecommunication Union, http://www.itu.int

[Iurgel, 2004] I. Iurgel: From another point of view: art-E-fact, In: Proc. TIDSE’04 (2004) vol 1, p 26-35

[Kalfoglou, Schorlemmer, 2003] Yannis Kalfoglou, Marco Schorlemmer: Ontology mapping, the state of the art. Knowledge Engineering Review, 18(1), p 1-31, 2003

[Kim, Yang, Song, et al, 2007] H.L. Kim, S.K. Yang, S.J. Song, G.J. Breslin: “Tag Mediated Society with SCOT Ontology”, Proceedings of the Semantic Web Challenge 2007 in conjunction with the Sixth International Semantic Web Conference, November 11-15, Busan, Korea, 2007

[Knerr, 2006] T. Knerr: Tagging Ontology: Towards a Common Ontology for Folksonomies, 2006

[Konstantinou, Spanos, Chalas, et al, 2006] N. Konstantinou, D. Spanos, M. Chalas, E. Solidakis, N. Mitrou: VisAVis: An Approach to an Intermediate Layer between Ontologies and Relational Database Contents. International Workshop on Web Information Systems Modeling (WISM 2006), Luxembourg, 2006

[Küster, Moore, Ludwig, 2007] Marc Wilhelm Küster, Graham Moore, and Christoph Ludwig, “Semantic registries.” In: XMLTage 2007 in Berlin, Berlin, 2007

[Lagoze, Hunter, 2001] C. Lagoze, J. Hunter: “The ABC Ontology and Model”, Journal of Digital Information, Vol 2, No 2, 2001

[Lahti, Palola, Korva, et al, 2006] J. Lahti, M. Palola, J. Korva, U. Westermann, K. Pentikousis, P. Pietarila: “A mobile phone-based context-aware video management application,” In: Multimedia on Mobile Devices II, Edited by Creutzburg, Takala, Chen, Proceedings of the SPIE, Volume 6074, p 204-215, 2006

[Lamsfus, Linaza, Smithers] Carlos Lamsfus, María Teresa Linaza, Tim Smithers: “Towards semantic-based information exchange and integration standards: the art-E-fact ontology as a possible extension to the CIDOC CRM (ISO/CD 21127) standard”. K-CAP2005, Banff, Alberta, Canada, Proceedings (ISSN 1613-0073) of the Workshop on Integrating Ontologies, p 49-54

[Landwehr, Bull, McDermott, Chpi, 1994] C.E. Landwehr, A.R. Bull, J.P. McDermott, W.S. Choi: A Taxonomy of Computer Program Security Flaws, with Examples. ACM Computing Surveys, 26,3 (Sept 1994), http://chacs.nrl.navy.mil/publications/CHACS/1994/1994landwehr-acmcs.pdf

[Lassila, Swick, 1999] O. Lassila, R.R. Swick: “Resource Description Framework (RDF): Model and Syntax Specification”, Recommendation, World Wide Web Consortium, February 1999

[LOCODE] http://www.unece.org/cefact/locode/

[Lu, Meng, Shu, et al, 2005] Y. Lu, W. Meng, L. Shu, C. Yu, K. Liu: Evaluation of Result Merging Strategies for Metasearch Engines. WISE Conference, 2005

[Lu, Wu, Zhao, et al, 2007] Yiyao Lu, Zonghuan Wu, Hongkun Zhao, Weiyi Meng, King-Lup Liu, Vijay Raghavan, Clement Yu: MySearchView: A Customized Metasearch Engine Generator. 26th ACM SIGMOD International Conference on Management of Data (SIGMOD 2007), Demo paper, p 1113-1115, Beijing, China, June 2007

[Marradi, 1990] A. Marradi: Classification, Typology, Taxonomy. Quality and Quantity, 1990, XXIV, 2, p 129-157. Available at: http://web.archive.org/web/20040705070709/http://www.unibo.edu.ar/marradi/classqq.pdf (Visited 2004-01-04)

[McDowell, 2003] L. McDowell, O. Etzioni, S. Gribble, A. Halevy, H. Levy, W. Pentney, D. Verma, S. Vlasseva: Enticing ordinary people onto the Semantic Web via instant gratification. In: Proceedings of the 2nd International Semantic Web Conference (ISWC 2003), October 2003

[Medjahed, Bouguettaya, 2005] Brahim Medjahed, Athman Bouguettaya: A Multilevel Composability Model for Semantic Web Services, IEEE Transactions on Knowledge and Data Engineering (July 2005) vol 17 Issue 7 p 954-968

[Meehl, 1995] P.E. Meehl: Bootstraps taxometrics: solving the classification problem in psychopathology. American Psychologist, 1995, 50(4), p 266-275

[Meng, Yu, Liu, 2002] W. Meng, C. Yu, K. Liu: Building Efficient and Effective Metasearch Engines. ACM Computing Surveys, 34(1), March 2002, p 48-89

[Merholz, 2004] P. Merholz: Ethnoclassification and vernacular vocabularies, 2004

[metasearch] http://www.trln.org/events/NISO/NISOmetasearch.ppt

[Miles, Brickley, 2005] A. Miles, D. Brickley: SKOS Core Vocabulary Specification, W3C Working Draft, 2005

[Miller, Haas, Hernandez, 2000] E. Miller, L. Haas, M.A. Hernandez: Schema Mapping as Query Discovery. Proc. VLDB 2000 Conf., p 77-88, Cairo, Egypt, 2000

[Mishler, 2006] B.D. Mishler: Integrative Biology 200A, “Principles of phylogenetics”, 2006, http://ib.berkeley.edu/courses/ib200a/pdfs/lect_12_(classification).pdf

[Mutton, P., and Golbeck, 2003] P. Mutton, J. Golbeck: “Visualization of semantic metadata and ontologies”, In: Proc. of Information Visualization 2003, London UK, 2003

[MySpace] http://www.myspace.com/

[Neches, Fikes, Finin, et al, 1991] R. Neches, R.E. Fikes, T. Finin, T.R. Gruber, T. Senator, W.R. Swartout: “Enabling technology for knowledge sharing”, AI Magazine, Vol 12, No 3, p 36-56, 1991

[Noy, McGuinness, 2001] N.F. Noy, D.L. McGuinness: “Ontology development 101: A Guide to creating your first ontology”, Stanford University, 2001

[OASIS ebXML Registry] OASIS ebXML Registry Information Model (RIM) Standard, v3.0 and OASIS ebXML Registry Services (RS) Standard, v3.0, http://www.oasis-open.org/committees/download.php/23648/regrep-3.0.1-cd3.zip

[OASIS Reference Model] http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=soa-rm, http://www.oasis-open.org/committees/download.php/16587/wd-soa-rm-cd1ED.pdf

[ontology] http://www.ontologyportal.org/pubs/IJCAI2001.pdf

[Opodo] http://www.opodo.com/

[OTA] Open Travel Alliance, http://www.opentravel.org/

[P3P] W3C’s Platform for Privacy Preferences, http://www.w3.org/P3P/

[Petrini, Risch, 2004] J. Petrini, T. Risch: Processing Queries over RDF views of Wrapped Relational Databases. In 1st International Workshop on Wrapper Techniques for Legacy Systems, WRAP 2004, Delft, Holland, 2004

[Photostuff] http://www.photostuff.com/

[Prud’Hommeaux, Seaborne, 2006] E. Prud’Hommeaux, A. Seaborne: SPARQL Query Language for RDF. World Wide Web Consortium, Working Draft WD-rdf-sparql-query-2006, 2006

[Quilitz, Leser, 2008] B. Quilitz, U. Leser: Querying Distributed RDF Data Sources with SPARQL. In: The Semantic Web: Research and Applications. LNCS 5021/2008, p 524-538

[Quint, 2004] V. Quint, I. Vatton, An Introduction to Amaya, W3C NOTE 20-February-1997, 1997, http://www.w3.org/TR/NOTE-amaya-970220.html

[Rabble] http://www.rabble.com/

[Ramakrishnan, Gehrke, 2002] Raghu Ramakrishnan, Johannes Gehrke: Database Management Systems, 3rd edition, McGraw-Hill, 2002

[RDF] http://www.w3.org/RDF/

[reference sources] http://www.libraries.rutgers.edu/rul/rr_gateway/e_ref_shelf/refmaps.shtml

[REWERSE] Reasoning on the Web with Rules and Semantics, http://rewerse.net/

[RFC 4122] http://www.ietf.org/rfc/rfc4122.txt, http://www.faqs.org/rfcs/rfc4122.html

[Rodriguez, Gómez-Pérez, 2006] J.B. Rodriguez, A. Gómez-Pérez: “Upgrading relational legacy data to the semantic web”. In: Proceedings of the 15th International Conference on World Wide Web (Edinburgh, Scotland, May 23-26, 2006), WWW ’06, ACM Press, New York, NY, p 1069-1070

[RQL] http://www.openrdf.org/doc/rql-tutorial.html

[Rummler, Brache, 1995] Rummler, Brache: Improving Performance: How to manage the white space on the organizational chart, Jossey-Bass, San Francisco, 1995

[Sanghee, Lewis, Martinez, 2004] K. Sanghee, P. Lewis, K. Martinez: “SCULPTEUR-D7.1 - Semantic Network of Concepts and their Relationships”, Technical Deliverable, 2004

[Sarvas, Herrarte, Wilhelm, Davis, 2004] R. Sarvas, E. Herrarte, A. Wilhelm, M. Davis: “Metadata creation system for mobile images”. In: Proceedings of the 2nd international conference on Mobile systems, applications, and services, Boston, MA, USA, 2004, http://portal.acm.org/citation.cfm?id=990072&dl=GUIDE&coll=GUIDE&CFID=25341318&CFTOKEN=52999446

[Sarvas, Viikari, Pesonen, Nevanlinna, 2004] R. Sarvas, M. Viikari, J. Pesonen, H. Nevanlinna: “MobShare: controlled and immediate sharing of mobile images”. In: Proceedings of the 12th annual ACM international conference on Multimedia, October 10-16, 2004, New York, NY, USA, http://portal.acm.org/citation.cfm?id=1027690&dl=GUIDE&coll=GUIDE&CFID=25341318&CFTOKEN=52999446

[SATINE] Semantic Web travel services on a voyage of discovery, http://www.srdc.metu.edu.tr/webpage/projects/satine/, http://cordis.europa.eu/ictresults/index.cfm/section/news/tpl/article/BrowsingType/Features/ID/79947

[Schenk, Staab, 2008] S. Schenk, S. Staab: “Networked graphs: a declarative mechanism for SPARQL rules, SPARQL views and RDF data integration on the web”. In: Proceeding of the 17th international Conference on World Wide Web (Beijing, China, April 21-25, 2008). WWW ’08. ACM, New York, NY, p 585-594, http://doi.acm.org/10.1145/1367497.1367577

[Schroeter, 2003] R. Schroeter, J. Hunter, D. Kosovic: Vannotea, A collaborative video indexing, annotation and discussion system for broadband networks, in: Proceedings of the K-CAP 2003 Workshop on “Knowledge Markup and Semantic Annotation”, October 2003, Florida, 2003

[SCORM] Sharable Content Object Reference Model, http://www.adlnet.gov/scorm/index.aspx

[Shvaiko, Euzenat, 2005] P. Shvaiko, J. Euzenat: “A Survey of Schema-Based Matching Approaches”. In: J. Data Semantics IV 3730, 2005, p 146-171

[Silva] Nuno Silva: Ontology Mapping for Interoperability in Semantic Web, GEDAC - Knowledge Engineering and Decision Support Research Group, Porto, Portugal

[Slavic, 2000] A. Slavic: A Definition of Thesauri and Classification as Indexing Tools, 2000, http://dublincore.org/documents/thesauri-definition/ (Visited 2005-12-20)

[Smithers, Posada, Stork, et al, 2004] T. Smithers, J. Posada, A. Stork, M. Pianciamore, N. Ferreira, S. Grimm, I. Jimenez, S. di Marca, G. Marcos, M. Mauri, P. Selvini, N. Sevilmis, B. Thelen, V. Zecchino: “Information management and knowledge sharing in wide”, In: European Workshop on the Integration of Knowledge, Semantics and Digital Media Technology, London, 2004

[SOUPA] Standard Ontology for Ubiquitous and Pervasive Applications, http://ebiquity.umbc.edu/paper/html/id/168/

[standard] http://www.webopedia.com/TERM/S/standard.htm, http://www.etsi.org/WebSite/Standards/WhatIsAStandard.aspx

[Stuckenschmidt, van Harmelen, de Waard, et al, 2004] H. Stuckenschmidt, F. van Harmelen, A. de Waard, T. Scerri, R. Bhogal, J. van Buel, I. Crowlesmith, Ch. Fluit, A. Kampman, J. Broekstra, E. van Mulligen: “Exploring Large Document Repositories with RDF Technology: The DOPE project”, IEEE Intelligent Systems, Vol 19, No 3, p 34-40, 2004

[Studer, Benjamins, Fensel, 1998] R. Studer, R. Benjamins, D. Fensel: “Knowledge Engineering”, DKE, Vol 25, No 1-2, p 161-197, 1998

[SUO] Standard Upper Ontology, http://suo.ieee.org/

[SUPER] http://www.ip-super.org/

[taxonomy] http://archive.eiffel.com/doc/manuals/technology/oosc/inheritance-design/penn.html, http://www.db.dk/bh/lifeboat_ko/concepts/taxonomy.htm, http://www.db.dk/jni/lifeboat/info.asp?subjectid=15, http://en.wikipedia.org/wiki/Taxonomy

[Terzi, Vakali, Hacid] Evimaria Terzi, Athena Vakali, Mohand-Saïd Hacid, “Knowledge Representation, ontologies and the Semantic Web”

[Topic maps] http://www.ontopia.net/topicmaps/materials/tm-vs-thesauri.html

[towntology] http://www.towntology.net/Meetings/0605-Belfast/Minutes-Presentations.pdf

[Trant, Wyman, 2006] J. Trant, B. Wyman: Investigating social tagging and folksonomy in art museums with steve.museum. World Wide Web 2006: Tagging Workshop. Editor, Edinburgh, Scotland, ACM, Access, 2006

[TUI] http://www.tui.com/

[UNESCO, 2002] UNESCO: Universal Declaration on Cultural Diversity, 2002, http://unesdoc.unesco.org/images/0012/001271/127160m.pdf

[UNWTO] World Tourism Organization, http://www.unwto.org/

[Uren, Cimiano, Iria, Handschuh, Vargas-Vera, Motta, Ciravegna] Semantic annotation for knowledge management: Requirements and a survey of the state of the art, 2005: Journal of Web Semantics

[URI, URL] http://www.w3.org/Addressing/

[URN] Uniform Resource Name, http://en.wikipedia.org/wiki/Uniform_Resource_Name, http://fr.wikipedia.org/wiki/Nom_uniformisé_de_ressource, http://de.wikipedia.org/wiki/Uniform_Resource_Name

[Uschold, Grüninger, 1996] M. Uschold, M. Grüninger: “Ontologies: Principles, Methods and Applications”, Knowledge Engineering Review, Vol 2, 1996

[Van Damme, Hepp, Siorpaes, 2007] C. Van Damme, M. Hepp, K. Siorpaes: FolksOntology: An Integrated Approach for Turning Folksonomies into Ontologies, Proceedings of the ESWC 2007 Workshop “Bridging the Gap between Semantic Web and Web 2.0”, Innsbruck, Austria, 2007

[Van Harmelen, Broekstra, Chirtiaan, et al, 2001] F. Van Harmelen, J. Broekstra, F. Chirtiaan, H. Horst, A. Kampman, J. van der Meer, M. Sabou: “Ontology-based Information Visualisation”, In: Proceedings of the Fifth International Conference on Information Visualisation, England, 2001

[VESA] Video Electronics Standards Association, http://www.vesa.org/

[Volz, Handschuch, Staab, Studer, 2004] R. Volz, S. Handschuh, S. Staab, R. Studer: OntoLiFT Demonstrator, 2004

[Volz, Stojanovic, Stojanovic, 2002] R. Volz, L. Stojanovic, N. Stojanovic: Migrating data-intensive Web Sites into the Semantic Web. ACM Symposium on Applied Computing (SAC 2002), Madrid, Spain, March 2002

[W3C] World Wide Web Consortium, http://www.w3.org/

[Wache, et al, 2001] H. Wache, et al: Ontology-Based Integration of Information - A Survey of Existing Approaches. In Stuckenschmidt, H., editor, IJCAI-2001 Workshop on Ontologies and Information Sharing, p 108-117, Seattle, USA, April 4-5, 2001

[WAI] W3C’s Web Accessibility Initiative, http://www.w3.org/WAI/

[Web Service Modeling Ontology Working Group] http://www.wsmo.org/

[Welty, 1998] C.A. Welty: The Ontological Nature of Subject Taxonomies. In: N. Guarino (ed), Proceedings of the First Conference on Formal Ontology and Information Systems, Amsterdam, IOS Press, 1998, http://www.cs.vassar.edu/faculty/welty/papers/fois-98/fois-98-1.html

[Welty, 1999] C. Welty, N. Ide: Using the right tools: enhancing retrieval from marked-up documents, J. Comput. Humanit. 33 (10) (1999), p 59–84

[Wielinga, Schreiber, Wielemaker, Sandberg, 2001] B.J. Wielinga, A.Th. Schreiber, J. Wielemaker, J.A.C. Sandberg: “From Thesaurus to Ontology”, In: Proceedings of the First International Conference on Knowledge Capture and Acquisition, p 194-201, 2001

[Wordpress] http://wordpress.com/

[World Travel and Tourism Council, 2008] World Travel and Tourism Council: 2008 Tourism and Travel Executive Summary, 2008, http://www.wttc.org/bin/pdf/temp/exec_summary_final.html

[WRL] The Web Rule Language, http://www.wsmo.org/wsml/wrl/wrl.html

[XFT] eXchange For Travel, http://www.exchangefortravel.org/

[YouTube] http://www.youtube.com/

[Zhao, Meng, Wu, et al, 2005] H. Zhao, W. Meng, Z. Wu, V. Raghavan, C. Yu: Fully Automatic Wrapper Generation for Search Engines. World Wide Web Conference (WWW14), p 66-75, 2005