Interoperability - is it feasible

  • date post

    13-Jan-2016

Interoperability - is it feasible? Peter Wittenburg. PowerPoint presentation.

Transcript of Interoperability - is it feasible

  • Why care about interoperability?

    e-Science & e-Humanities: "data is the currency of modern research", thus we need integrated access to many data sets. Data sets are scattered across many repositories => (virtual) integration. They are created by different research teams using different conventions (formats, semantics), and are often in a bad state and of poor quality => curation.

    Thus "interoperability" was the most used word at the ICRI conference.

    Big questions: What is meant by interoperability? How do we remove interoperability barriers to analyze large, heterogeneous and probably distributed data sets? Is interoperability something we need/want to achieve?

  • What is interoperability?

    Wikipedia: Interoperability is a property of a system, whose interfaces are completely understood, to work with other systems, present or future, without any restricted access or implementation.

    IEEE: Interoperability is the ability of two or more systems or components to exchange information and to use the information that has been exchanged.

    O'Brien/Marakas: Being able to accomplish end-user applications using different types of computer systems, operating systems, and application software, interconnected by different types of local and wide-area networks.

    OSLC: To be interoperable one should actively be engaged in the ongoing process of ensuring that the systems, procedures and culture of an organization are managed in such a way as to maximise opportunities for exchange and re-use of information.

  • What is interoperability?

    Technical interoperability (technical encoding, format, structure, API, protocol).

    Semantic interoperability: is it also about bridging understanding between two or more humans?

    humans - humans: we better speak about understanding. humans - machine: same, or? machine - machine: well, here interoperability makes sense.

  • What is interoperability?

    It seems that everyone speaks about technical systems when talking about interoperability. Do we include feeding machines with some mapping rules specified by human users and then carrying out some automatic functions?

    When linguists hear about mapping tag sets, some immediately say that it is impossible and does not make sense. Why: tags are part of a whole theory behind them.

    Well, if you look to other disciplines (life sciences, earth observation sciences etc.), that's exactly what they do. Why: people want to work across collections and ignore theories; some see tag sets just as a first help but want to work on raw data; some see the demand of politicians and society to come up with answers and not with statements about problems; AND there is much money (is it useless?).

  • Big Data in Natural Science

    Numbers in regular structures; how to find relevant data sets?

    Volcanology/earthquakes/tsunamis/etc.: X sensor data streams (seismology): (time, location, parameters). X human observations (biodiversity): (time, location, nr. of frogs (etc.)).

    Window extraction to transfer and manage data; interpret regular structures (even frogs); time normalization, take care of dynamics etc.; visualize things coherently.

    Interoperability looks simple enough: just find patterns in sequences of numbers; the format you need to know (well, not quite as simple, but ...).

  • Big Data in Environmental Sciences

    Many different types of observations: climate, weather, etc.; species and populations according to a multitude of classification systems and schools.

    Grand challenge: how can all these observations be used to stabilize our environment? How can it all be used to maintain diversity? Etc.

    Sounds similar to our field: interoperability is tough, but there are expected gains and there is more money.

    Intensive work also in social science.

  • Many layers of interop: access

    Diagram: enabling technologies with the layers Discovery, Access (ref. resolution, protocols, AAI), Interpretation and Reuse, on top of datasets accessed via repositories.

    Metadata search resulting in Handles (PIDs) and some properties. Handle (PID) resolution, and you get the data. Here linguistics is playing a role (get schemas and semantics). Here linguistics is playing an even bigger role (get context information).

    Need a high degree of automation: what can be automated?

  • Many layers of interop: management (mostly underestimated!!!)

    Diagram: enabling technologies with Collections + Properties, Access (ref. resolution, protocols, AAI), formalized policies, workflow engine and Assessment, on top of datasets accessed via repositories.

    Metadata gathering resulting in Handles (PIDs) and some properties. Handle (PID) resolution, and you get the data. Check of rules and engine: it's all about establishing trust. Formal rules manipulate properties of Handles and metadata and may generate new DOs.

    Need a high degree of automation.

  • Simple but essential example: PIDs

    It's similar to TCP/IP with all its core machinery that brought us the Internet and thus interoperability with respect to communication. The email system works when we abstract from content, and thus the semantics of our human messages, and focus on the semantics of attributes, parameters etc.

    Let's assume that you want to use a certain file and first want to be sure that the file has not been modified: you look up in metadata; that automatically looks for the PID; the PID is resolved automatically and a checksum is retrieved; the checksum is automatically compared with the checksum of the file accessed; a warning is given automatically if the two don't match. This would be a great service (and will come).
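
The verification steps above can be sketched roughly as follows. This is only a minimal sketch under stated assumptions: the metadata catalogue (`METADATA`), the PID record store (`PID_RECORDS`), the example Handle and the choice of SHA-256 are all hypothetical stand-ins, not taken from the slides; a real service would query a resolution system over the network.

```python
import hashlib

# Hypothetical metadata catalogue: file name -> metadata record holding the PID.
METADATA = {"session1.wav": {"pid": "hdl:00000/example-1"}}

# Hypothetical PID records: PID -> attributes, including the registered checksum.
PID_RECORDS = {
    "hdl:00000/example-1": {
        "checksum": hashlib.sha256(b"original content").hexdigest()
    }
}


def resolve_pid(pid):
    """Stand-in for a real resolution service; returns the PID record."""
    return PID_RECORDS[pid]


def is_unmodified(name, content):
    """Compare the checksum in the PID record with that of the bytes read."""
    pid = METADATA[name]["pid"]              # look up the PID in the metadata
    expected = resolve_pid(pid)["checksum"]  # resolve the PID, get the checksum
    actual = hashlib.sha256(content).hexdigest()
    return actual == expected


def check_file(name, content):
    """Return a warning string if the file no longer matches its PID record."""
    if is_unmodified(name, content):
        return None
    return f"WARNING: {name} does not match the checksum registered for its PID"
```

The point of the sketch is that no step needs to interpret the content of the file: only attributes (PID, checksum) are compared, which is why the whole chain can run automatically.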

  • Internet machinery (collaboration CNRI and MPI)

    Diagram (protocol stack, top to bottom): email, WWW, phone; SMTP, HTTP, RTP; TCP, UDP; IP; Ethernet, PPP; CSMA, async, SONET; copper, fiber, radio. DNS alongside.

    All applications make use of the same basic protocol, where the packet is the basic object and where endpoints have addresses and names.

  • Data machinery (collaboration CNRI and MPI)

    Diagram (top to bottom): persistent reference, analysis, citation, apps, custom clients, plug-ins; resolution system, typing; PID; local storage, cloud, computed, data sets, RDBMS, files; digital objects.

    All applications make use of the same basic protocol, where data is the basic object and where PID and metadata attributes describe object properties.

    Digital object: bit sequence (instance); PID record attributes (point to instances, describe properties); metadata attributes (describe properties & context); PID record and metadata point to each other.

  • Layers of interoperability

    Protocols/APIs: defined formats, semantics, processes.

    SCSI: how to read/write/etc. blocks to/from a SCSI disk.

    File system: how to read/write/etc. to/from logical entities; how to organize files on a machine (virtualization); how to organize files across machines.

    OAI-PMH: how to serve metadata descriptions.

    SRU/CQL: how to do distributed content search.

    Etc. etc.

    All are based on standards or widely accepted best practices. Advantage: standards establish a 1:N relation that is constant over time. There is a large number of standards/best practices for metadata (structure, semantics).

  • Back to linguistics

    Where are we in the linguistics domain? What happened in some well-known projects? Do we miss the big challenges which other disciplines have and that would force us to ignore schools, vanity, etc.?

    4 examples: metadata, DOBES, TDS, CLARIN.

  • Metadata is kind of easy

    DC/OLAC - CMDI mapping examples:

    DC:language - CMDI:languageIn, CMDI:dominantLanguage, CMDI:sourceLanguage, CMDI:targetLanguage

    DC:date - CMDI:creationDate, CMDI:publicationDate, CMDI:startYear, CMDI:derivationDate

    DC:format - CMDI:mediaType, CMDI:mimeType, CMDI:annotationFormat, CMDI:characterEncoding

    Everyone accepts now: metadata is for pragmatic purposes and is not replacing the one and only true categorization. Mapping errors may influence recall and precision, but who cares, really? Crucial for machine processing.

    Semantic mapping is doable due to limited element sets and to now well-described semantics (except for recursive machines such as TEI). If mapping is used for discovery: no problem. If mapping is used for statistics: well ...
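
The mapping examples above can be written down as a small lookup table. The following is a minimal sketch: the element names are those on the slide, but the direction (several CMDI elements collapsing onto one DC element), the function name `to_dublin_core` and the sample record are assumptions made for illustration.

```python
from collections import defaultdict

# Element pairs from the slide; the CMDI -> DC direction is an assumption.
CMDI_TO_DC = {
    "languageIn": "language",
    "dominantLanguage": "language",
    "sourceLanguage": "language",
    "targetLanguage": "language",
    "creationDate": "date",
    "publicationDate": "date",
    "startYear": "date",
    "derivationDate": "date",
    "mediaType": "format",
    "mimeType": "format",
    "annotationFormat": "format",
    "characterEncoding": "format",
}


def to_dublin_core(cmdi_record):
    """Collapse a flat CMDI-style record onto DC elements (lossy, N:1)."""
    dc = defaultdict(list)
    for element, value in cmdi_record.items():
        target = CMDI_TO_DC.get(element)
        if target is not None:  # unmapped elements are simply dropped
            dc[target].append(value)
    return dict(dc)


# Hypothetical sample record, for illustration only.
record = {
    "sourceLanguage": "deu",
    "targetLanguage": "eng",
    "creationDate": "2011-05-02",
    "mimeType": "audio/x-wav",
    "projectName": "example",
}
dc_record = to_dublin_core(record)
```

In this sketch the source and target language both end up under DC:language. A search for "eng" still finds the record, which is fine for discovery, but a count of "resources in English" would be inflated, which is the caveat about statistics.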

  • Truth in metadata usage - still!! (Rebecca Koskela: DataONE)

  • DOBES: some facts

    DOBES = Documentation of Endangered Languages.

    Started in 2000 with 7 international teams and 1 archive team. 2012: now 68 documentation teams working almost everywhere. Cross-disciplinary approach: linguists, ethnologists, musicologists, biologists, ship builders, etc. Every year one workshop and two training courses.

  • DOBES agreements

    In the first 2-3 years, quite some joint agreements: formats to be stored in the archive (interoperability); principles of archiving such as PIDs; workflows determining the archive-team interaction; organizational principles to manage and manipulate data; metadata to be used to manage and find data (pragmatics vs. theory); joint agreement on a Code of Conduct.

    Short discussions on more linguistic aspects failed: agreement on a joint tag set - NO; agreement on joint lexical structures - NO; etc.

    Good reason: the languages are so different. Bad reason: agreements require effort.

  • Recent DOBES questions

    Now, after >10 years, we have so much good data in the archive: what can we do with it????

    Traditional: every researcher looks at his/her data and publishes of co