Lecture Notes in Computer Science 12391978-3-030-59003-1/1.pdf · – Big Data Management and...

25
Lecture Notes in Computer Science 12391 Founding Editors Gerhard Goos Karlsruhe Institute of Technology, Karlsruhe, Germany Juris Hartmanis Cornell University, Ithaca, NY, USA Editorial Board Members Elisa Bertino Purdue University, West Lafayette, IN, USA Wen Gao Peking University, Beijing, China Bernhard Steffen TU Dortmund University, Dortmund, Germany Gerhard Woeginger RWTH Aachen, Aachen, Germany Moti Yung Columbia University, New York, NY, USA

Transcript of Lecture Notes in Computer Science 12391978-3-030-59003-1/1.pdf · – Big Data Management and...

Page 1: Lecture Notes in Computer Science 12391978-3-030-59003-1/1.pdf · – Big Data Management and Analytics – Consistency, Integrity, Quality of Data – Constraint Modeling and Processing

Lecture Notes in Computer Science 12391

Founding Editors

Gerhard GoosKarlsruhe Institute of Technology, Karlsruhe, Germany

Juris HartmanisCornell University, Ithaca, NY, USA

Editorial Board Members

Elisa BertinoPurdue University, West Lafayette, IN, USA

Wen GaoPeking University, Beijing, China

Bernhard SteffenTU Dortmund University, Dortmund, Germany

Gerhard WoegingerRWTH Aachen, Aachen, Germany

Moti YungColumbia University, New York, NY, USA

Page 2: Lecture Notes in Computer Science 12391978-3-030-59003-1/1.pdf · – Big Data Management and Analytics – Consistency, Integrity, Quality of Data – Constraint Modeling and Processing

More information about this series at http://www.springer.com/series/7409

Page 3: Lecture Notes in Computer Science 12391978-3-030-59003-1/1.pdf · – Big Data Management and Analytics – Consistency, Integrity, Quality of Data – Constraint Modeling and Processing

Sven Hartmann • Josef Küng •

Gabriele Kotsis • A Min Tjoa •

Ismail Khalil (Eds.)

Database and ExpertSystems Applications31st International Conference, DEXA 2020Bratislava, Slovakia, September 14–17, 2020Proceedings, Part I

123

Page 4: Lecture Notes in Computer Science 12391978-3-030-59003-1/1.pdf · – Big Data Management and Analytics – Consistency, Integrity, Quality of Data – Constraint Modeling and Processing

EditorsSven HartmannClausthal University of TechnologyClausthal-Zellerfeld, Germany

Josef KüngJohannes Kepler University of LinzLinz, Austria

Gabriele KotsisJohannes Kepler University of LinzLinz, Austria

A Min TjoaIFSVienna University of TechnologyVienna, Wien, AustriaIsmail Khalil

Johannes Kepler University of LinzLinz, Austria

ISSN 0302-9743 ISSN 1611-3349 (electronic)Lecture Notes in Computer ScienceISBN 978-3-030-59002-4 ISBN 978-3-030-59003-1 (eBook)https://doi.org/10.1007/978-3-030-59003-1

LNCS Sublibrary: SL3 – Information Systems and Applications, incl. Internet/Web, and HCI

© Springer Nature Switzerland AG 2020This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of thematerial is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,broadcasting, reproduction on microfilms or in any other physical way, and transmission or informationstorage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology nowknown or hereafter developed.The use of general descriptive names, registered names, trademarks, service marks, etc. in this publicationdoes not imply, even in the absence of a specific statement, that such names are exempt from the relevantprotective laws and regulations and therefore free for general use.The publisher, the authors and the editors are safe to assume that the advice and information in this book arebelieved to be true and accurate at the date of publication. Neither the publisher nor the authors or the editorsgive a warranty, expressed or implied, with respect to the material contained herein or for any errors oromissions that may have been made. The publisher remains neutral with regard to jurisdictional claims inpublished maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AGThe registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Page 5: Lecture Notes in Computer Science 12391978-3-030-59003-1/1.pdf · – Big Data Management and Analytics – Consistency, Integrity, Quality of Data – Constraint Modeling and Processing

Preface

This volume contains the papers presented at the 31st International Conference onDatabase and Expert Systems Applications (DEXA 2020). This year, DEXA was heldas a virtual conference during September 14–17, 2020, instead of as it was originallyplanned to be held in Bratislava, Slovakia.

On behalf of the Program Committee we commend these papers to you and hopeyou find them useful.

Database, information, and knowledge systems have always been a core subject ofcomputer science. The ever increasing need to distribute, exchange, and integrate data,information, and knowledge has added further importance to this subject. Advances inthe field will help facilitate new avenues of communication, to proliferate interdisci-plinary discovery, and to drive innovation and commercial opportunity.

DEXA is an international conference series which showcases state-of-the-artresearch activities in database, information, and knowledge systems. The conferenceand its associated workshops provide a premier annual forum to present originalresearch results and to examine advanced applications in the field. The goal is to bringtogether developers, scientists, and users to extensively discuss requirements, chal-lenges, and solutions in database, information, and knowledge systems.

DEXA 2020 solicited original contributions dealing with all aspects of database,information, and knowledge systems. Suggested topics included, but were not limitedto:

– Acquisition, Modeling, Management and Processing of Knowledge– Authenticity, Privacy, Security, and Trust– Availability, Reliability and Fault Tolerance– Big Data Management and Analytics– Consistency, Integrity, Quality of Data– Constraint Modeling and Processing– Cloud Computing and Database-as-a-Service– Database Federation and Integration, Interoperability, Multi-Databases– Data and Information Networks– Data and Information Semantics– Data Integration, Metadata Management, and Interoperability– Data Structures and Data Management Algorithms– Database and Information System Architecture and Performance– Data Streams and Sensor Data– Data Warehousing– Decision Support Systems and Their Applications– Dependability, Reliability, and Fault Tolerance– Digital Libraries and Multimedia Databases– Distributed, Parallel, P2P, Grid, and Cloud Databases– Graph Databases

Page 6: Lecture Notes in Computer Science 12391978-3-030-59003-1/1.pdf · – Big Data Management and Analytics – Consistency, Integrity, Quality of Data – Constraint Modeling and Processing

– Incomplete and Uncertain Data– Information Retrieval– Information and Database Systems and Their Applications– Mobile, Pervasive, and Ubiquitous Data– Modeling, Automation, and Optimization of Processes– NoSQL and NewSQL Databases– Object, Object-Relational, and Deductive Databases– Provenance of Data and Information– Semantic Web and Ontologies– Social Networks, Social Web, Graph, and Personal Information Management– Statistical and Scientific Databases– Temporal, Spatial, and High Dimensional Databases– Query Processing and Transaction Management– User Interfaces to Databases and Information Systems– Visual Data Analytics, Data Mining, and Knowledge Discovery– WWW, Databases and Web Services– Workflow Management and Databases– XML and Semi-structured Data

Following the call for papers which yielded 190 submissions, there was a rigorousrefereeing process that saw each submission reviewed by three to five internationalexperts. The 38 submissions judged best by the Program Committee were accepted asfull research papers, yielding an acceptance rate of 20%. A further 20 submissions wereaccepted as short research papers.

As is the tradition of DEXA, all accepted papers are published by Springer. Authorsof selected papers presented at the conference were invited to submit substantiallyextended versions of their conference papers for publication in special issues ofinternational journals. The submitted extended versions underwent a further reviewprocess.

We wish to thank all authors who submitted papers and all conference participantsfor the fruitful discussions.

This year we have five keynote talks addressing emerging trends in the database andartificial intelligence community:

• “Knowledge Graphs for Drug Discovery” by Prof. Ying Ding (The University ofTexas at Austin, USA)

• “Incremental Learning and Learning with Drift” by Prof. Barbara Hammer (CITECCentre of Excellence, Bielefeld University, Germany)

• “From Sensors to Dempster-Shafer Theory and Back: the Axiom of AmbiguousSensor Correctness and its Applications” by Prof. Dirk Draheim (Tallinn Universityof Technology, Estonia)

• “Knowledge Availability and Information Literacies” by Dr. Gerald Weber (TheUniversity of Auckland, New Zealand)

• “Explainable Fact Checking for Statistical and Property Claims” by Paolo Papotti(EURECOM, France)

vi Preface

Page 7: Lecture Notes in Computer Science 12391978-3-030-59003-1/1.pdf · – Big Data Management and Analytics – Consistency, Integrity, Quality of Data – Constraint Modeling and Processing

In addition, we had a panel discussion on “The Age of Science-making Machines”led by Prof. Stéphane Bressan (National University of Singapore, Singapore).

This edition of DEXA features three international workshops covering a variety ofspecialized topics:

• BIOKDD 2020: 11th International Workshop on Biological Knowledge Discoveryfrom Data

• IWCFS 2020: 4th International Workshop on Cyber-Security and Functional Safetyin Cyber-Physical Systems

• MLKgraphs 2020: Second International Workshop on Machine Learning andKnowledge Graphs

The success of DEXA 2020 is a result of collegial teamwork from many individuals.We like to thank the members of the Program Committee and the external referees fortheir timely expertise in carefully reviewing the submissions.

Warm thanks to Ismail Khalil and the conference organizers as well as all workshoporganizers.

We would also like to express our thanks to all institutions actively supporting thisevent, namely:

• Comenius University Bratislava (who was prepared to host the conference)• Institute of Telekoopertion, Johannes Kepler University Linz (JKU)• Software Competence Center Hagenberg (SCCH)• International Organization for Information Integration and Web based Applications

and Services (@WAS)

We hope you enjoyed the conference program.

September 2020 Sven HartmannJosef Küng

Preface vii

Page 8: Lecture Notes in Computer Science 12391978-3-030-59003-1/1.pdf · – Big Data Management and Analytics – Consistency, Integrity, Quality of Data – Constraint Modeling and Processing

Organization

Program Committee Chairs

Sven Hartmann Clausthal University of Technology, GermanyJosef Küng Johannes Kepler University Linz, Austria

Steering Committee

Gabriele Kotsis Johannes Kepler University Linz, AustriaA Min Tjoa Vienna University of Technology, AustriaIsmail Khalil Johannes Kepler University Linz, Austria

Program Committee and Reviewers

Javier Nieves Acedo Azterlan, SpainSonali Agarwal IIIT, IndiaRiccardo Albertoni CNR-IMATI, ItalyIdir Amine Amarouche USTHB, AlgeriaRachid Anane Coventry University, UKMustafa Atay Winston-Salem State University, USAFaten Atigui CEDRIC, CNAM, FranceLadjel Bellatreche LIAS, ENSMA, FranceNadia Bennani LIRIS, INSA de Lyon, FranceKarim Benouaret Université de Lyon, FranceDjamal Benslimane Université de Lyon, FranceMorad Benyoucef University of Ottawa, CanadaCatherine Berrut LIG, Université Joseph Fourier Grenoble I, FranceVasudha Bhatnagar University of Delhi, IndiaAthman Bouguettaya The University of Sydney, AustraliaOmar Boussai ERIC Laboratory, FranceStephane Bressan National University of Singapore, SingaporeSharma Chakravarthy The University of Texas at Arlington, USACindy Chen University of Massachusetts Lowell, USAGang Chen Victoria University of Wellington, New ZealandMax Chevalier IRIT, FranceSoon Ae Chun City University of New York, USAAlfredo Cuzzocrea University of Calabria, ItalyDebora Dahl Conversational Technologies, USAJérôme Darmont Université de Lyon, FranceSoumyava Das Teradata Labs, USAVincenzo Deufemia University of Salerno, ItalyDejing Dou University of Oregon, USA

Page 9: Lecture Notes in Computer Science 12391978-3-030-59003-1/1.pdf · – Big Data Management and Analytics – Consistency, Integrity, Quality of Data – Constraint Modeling and Processing

Cedric Du Mouza CNAM, FranceUmut Durak German Aerospace Center (DLR), GermanySuzanne Embury The University of Manchester, UKMarkus Endres University of Passau, GermanyNora Faci Université de Lyon, FranceBettina Fazzinga ICAR-CNR, ItalyStefano Ferilli Università di Bari, ItalyFlavio Ferrarotti Software Competence Centre Hagenberg, AustriaFlavius Frasincar Erasmus University Rotterdam, The NetherlandsBernhard Freudenthaler Software Competence Center Hagenberg, AustriaSteven Furnell Plymouth University, UKAryya Gangopadhyay University of Maryland Baltimore County, USAManolis Gergatsoulis Ionian University, GreeceJavad Ghofrani HTW Dresden University of Applied Sciences,

GermanyVikram Goyal IIIT Delhi, IndiaSven Groppe University of Lübeck, GermanyWilliam Grosky University of Michigan, USAFrancesco Guerra Università di Modena e Reggio Emilia, ItalyGiovanna Guerrini DISI, University of Genova, ItalyAllel Hadjali LIAS, ENSMA, FranceAbdelkader Hameurlain IRIT, Paul Sabatier University, FranceTakahiro Hara Osaka University, JapanIonut Iacob Georgia Southern University, USAHamidah Ibrahim Universiti Putra Malaysia, MalaysiaSergio Ilarri University of Zaragoza, SpainAbdessamad Imine Loria, FrancePeiquan Jin University of Science and Technology of China, ChinaAnne Kao Boeing, USAAnne Kayem Hasso Plattner Institute, University of Potsdam,

GermanyUday Kiran The University of Tokyo, JapanCarsten Kleiner University of Applied Science and Arts Hannover,

GermanyHenning Koehler Massey University, New ZealandMichal Kratky VSB-Technical University of Ostrava, Czech RepublicPetr Kremen Czech Technical University in Prague, Czech RepublicAnne Laurent LIRMM, University of Montpellier, FranceLenka Lhotska Czech Technical University in Prague, Czech RepublicWenxin Liang Chongqing University of Posts

and Telecommunications, ChinaChuan-Ming Liu National Taipei University of Technology, TaiwanHong-Cheu Liu School of Information Technology and Mathematical

Sciences, AustraliaJorge Lloret University of Zaragoza, SpainHui Ma Victoria University of Wellington, New Zealand

x Organization

Page 10: Lecture Notes in Computer Science 12391978-3-030-59003-1/1.pdf · – Big Data Management and Analytics – Consistency, Integrity, Quality of Data – Constraint Modeling and Processing

Qiang Ma Kyoto University, JapanZakaria Maamar Zayed University, UAESanjay Madria Missouri S&T, USAJorge Martinez-Gil Software Competence Center Hagenberg, AustriaElio Masciari Federico II University, ItalyAtif Mashkoor Software Competence Center Hagenberg, AustriaJun Miyazaki Tokyo Institute of Technology, JapanLars Moench University of Hagen, GermanyRiad Mokadem Paul Sabatier University, Pyramide Team, FranceAnirban Mondal The University of Tokyo, JapanYang-Sae Moon Kangwon National University, South KoreaFranck Morvan IRIT, Paul Sabatier University, FranceFrancesco D. Muñoz-Escoí Universitat Politècnica de València, SpainIsmael Navas-Delgado University of Malaga, SpainWilfred Ng The Hong Kong University of Science

and Technology, Hong KongTeste Olivier IRIT, FranceCarlos Ordonez University of Houston, USAMarcin Paprzycki Systems Research Institute, Polish Academy

of Sciences, PolandOscar Pastor Lopez Universitat Politècnica de València, SpainClara Pizzuti CNR-ICAR, ItalyElaheh Pourabbas CNR, ItalyBirgit Proell Johannes Kepler University Linz, AustriaRodolfo Resende Federal University of Minas Gerais, BrazilWerner Retschitzegger Johannes Kepler University Linz, AustriaClaudia Roncancio Grenoble Alps University, FranceViera Rozinajova Slovak University of Technology in Bratislava,

SlovakiaMassimo Ruffolo ICAR-CNR, ItalyShelly Sachdeva National Institute of Technology Delhi, IndiaMarinette Savonnet University of Burgundy, FranceStefanie Scherzinger Universität Passau, GermanWieland Schwinger Johannes Kepler University Linz, AustriaFlorence Sedes IRIT, Paul Sabatier University, FranceNazha Selmaoui-Folcher University of New Caledonia, New CaledoniaMichael Sheng Macquarie University, AustraliaPatrick Siarry Université de Paris 12, FranceGheorghe Cosmin Silaghi Babes-Bolyai University, RomaniaHala Skaf-Molli University of Nantes, LS2N, FranceBala Srinivasan Monash University, AustraliaStephanie Teufel University of Fribourg, iimt, SwitzerlandJean-Marc Thévenin IRIT, Université Toulouse I, FranceVicenc Torra University Skövde, SwedenTraian Marius Truta Northern Kentucky University, USALucia Vaira University of Salento, Italy

Organization xi

Page 11: Lecture Notes in Computer Science 12391978-3-030-59003-1/1.pdf · – Big Data Management and Analytics – Consistency, Integrity, Quality of Data – Constraint Modeling and Processing

Ismini Vasileiou De Montfort University, UKKrishnamurthy Vidyasankar Memorial University, CanadaMarco Vieira University of Coimbra, PortugalPiotr Wisniewski Nicolaus Copernicus University, PolandMing Hour Yang Chung Yuan Chritian University, TaiwanXiaochun Yang Northeastern University, ChinaHaruo Yokota Tokyo Institute of Technology, JapanQiang Zhu University of Michigan, USAYan Zhu Southwest Jiaotong University, ChinaMarcin Zimniak Leipzig University, GermanyEster Zumpano University of Calabria, Italy

Organizers

xii Organization

Page 12: Lecture Notes in Computer Science 12391978-3-030-59003-1/1.pdf · – Big Data Management and Analytics – Consistency, Integrity, Quality of Data – Constraint Modeling and Processing

Abstracts of Keynote Talks

Page 13: Lecture Notes in Computer Science 12391978-3-030-59003-1/1.pdf · – Big Data Management and Analytics – Consistency, Integrity, Quality of Data – Constraint Modeling and Processing

Knowledge Graph for Drug Discovery

Ying Ding

The University of Texas at Austin, USA

Abstract. A critical barrier in current drug discovery is the inability to utilizepublic datasets in an integrated fashion to fully understand the actions of drugsand chemical compounds on biological systems. There is a need to intelligentlyintegrate heterogeneous datasets pertaining to compounds, drugs, targets, genes,diseases, and drug side effects now available to enable effective network datamining algorithms to extract important biological relationships. In this talk, wedemonstrate the semantic integration of 25 different databases and showcase thecutting-edge machine learning and deep learning algorithms to mine knowledgegraphs for deep insights, especially the latest graph embedding algorithm thatoutperforms baseline methods for drug and protein binding predictions.

Page 14: Lecture Notes in Computer Science 12391978-3-030-59003-1/1.pdf · – Big Data Management and Analytics – Consistency, Integrity, Quality of Data – Constraint Modeling and Processing

Incremental Learning and Learning with Drift

Barbara Hammer

CITEC Centre of Excellence, Bielefeld University, Germany

Abstract. Neural networks have revolutionized domains such as computervision or language processing, and learning technology is included in everydayconsumer products. Yet, practical problems often render learning surprisinglydifficult, since some of the fundamental assumptions of the success of deeplearning are violated. As an example, only few data might be available for taskssuch as model personalization, hence few shot learning is required. Learningmight take place in non-stationary environments such that models face thestability-plasticity dilemma. In such cases, applicants might be tempted to usemodels for settings they are not intended for, such that invalid results areunavoidable.Within the talk, I will address three challenges of machine learning when

dealing with incremental learning tasks, addressing the questions: how to learnreliably given few examples only, how to learn incrementally in non-stationaryenvironments where drift might occur, and how to enhance machine learningmodels by an explicit reject option, such that they can abstain from classificationif the decision is unclear

Page 15: Lecture Notes in Computer Science 12391978-3-030-59003-1/1.pdf · – Big Data Management and Analytics – Consistency, Integrity, Quality of Data – Constraint Modeling and Processing

From Sensors to Dempster-Shafer Theoryand Back: The Axiom of Ambiguous Sensor

Correctness and Its Applications

Dirk Draheim

Tallinn University of Technology, [email protected]

Abstract. Since its introduction in the 1960s, Dempster-Shafer theory becameone of the leading strands of research in artificial intelligence with a wide rangeof applications in business, finance, engineering, and medical diagnosis. In thispaper, we aim to grasp the essence of Dempster-Shafer theory by distinguishingbetween ambiguous-and-questionable and ambiguous-but-correct perceptions.Throughout the paper, we reflect our analysis in terms of signals and sensors as anatural field of application. We model ambiguous-and-questionable perceptionsas a probability space with a quantity random variable and an additional per-ception random variable (Dempster model). We introduce a correctness propertyfor perceptions. We use this property as an axiom for ambiguous-but-correctperceptions. In our axiomatization, Dempster’s lower and upper probabilities donot have to be postulated: they are consequences of the perception correctnessproperty. Furthermore, we outline how Dempster’s lower and upper probabili-ties can be understood as best possible estimates of quantity probabilities.Finally, we define a natural knowledge fusion operator for perceptions andcompare it with Dempster’s rule of combination.

Page 16: Lecture Notes in Computer Science 12391978-3-030-59003-1/1.pdf · – Big Data Management and Analytics – Consistency, Integrity, Quality of Data – Constraint Modeling and Processing

Knowledge Availability and InformationLiteracies

Gerald Weber

The University of Auckland, New Zealand

Abstract. At least since Tim Berners-Lee’s call for ‘Raw Data Now’ in 2009,which he combined with a push for linked data as well, the question has beenraised how to make the wealth of data and knowledge available to the citizensof the world. We will set out to explore the many facets and multiple layers ofthis problem, leading up to the question of how we as users will access andutilize the knowledge that should be available to us.

Page 17: Lecture Notes in Computer Science 12391978-3-030-59003-1/1.pdf · – Big Data Management and Analytics – Consistency, Integrity, Quality of Data – Constraint Modeling and Processing

Explainable Fact Checking for Statisticaland Property Claims

Paolo Papotti

EURECOM, France

Abstract. Misinformation is an important problem but fact checkers are over-whelmed by the amount of false content that is produced online every day. Tosupport fact checkers in their efforts, we are creating data-driven verificationmethods that use structured datasets to assess claims and explain their decisions.For statistical claims, we translate text claims into SQL queries on relationaldatabases. We exploit text classifiers to propose validation queries to the usersand rely on tentative execution of query candidates to narrow down the set ofalternatives. The verification process is controlled by a cost-based optimizer thatconsiders expected verification overheads and the expected claim utility astraining samples. For property claims, we use the rich semantics in knowledgegraphs (KGs) to verify claims and produce explanations. As information in aKG is inevitably incomplete, we rely on rule discovery and on text mining togather the evidence to assess claims. Uncertain rules and facts are turned intological programs and the checking task is modeled as a probabilistic inferenceproblem. Experiments show that both methods enable the efficient and effectivelabeling of claims with interpretable explanations, both in simulations and inreal world user studies with 50% decrease in verification time. Our algorithmsare demonstrated in a fact checking website (https://coronacheck.eurecom.fr),which has been used by more than twelve thousand users to verify claims relatedto the coronavirus disease (COVID-19) spread and effect.

Page 18: Lecture Notes in Computer Science 12391978-3-030-59003-1/1.pdf · – Big Data Management and Analytics – Consistency, Integrity, Quality of Data – Constraint Modeling and Processing

“How Many Apples?” or the Ageof Science-Making Machines (Panel)

Stéphane Bressan (Panel Chair)

National University of Singapore, Singapore

Abstract. Isaac Newton most likely did not spend much time observing applesfalling from trees. Galileo Galilei probably never threw anything from the towerof Pisa. They conducted thought experiments.What can big data, data science, and artificial intelligence contribute to the

creation of scientific knowledge? How can advances in computing, communi-cation, and control improve or positively disrupt the scientific method?Richard Feynman once explained the scientific method as follows. “In gen-

eral, we look for a new law by the following process. First, we guess it; don’tlaugh that is really true. Then we compute the consequences of the guess to seewhat, if this is right, if this law that we guessed is right, we see what it wouldimply. And then we compare those computation results to nature, or, we say,compare to experiment or experience, compare directly with observations to seeif it works. If it disagrees with experiment, it’s wrong and that simple statementis the key to science.” He added euphemistically that “It is therefore notunscientific to take a guess.”Can machines help create science?The numerous advances of the many omics constitute an undeniable body of

evidence that computing, communication, and control technologies, in the formof high-performance computing hardware, programming frameworks, algo-rithms, communication networks, as well as storage, sensing, and actuatingdevices, help scientists and make the scientific process significantly more effi-cient and more effective. Everyone acknowledges the unmatched ability ofmachines to streamline measurements and to process large volumes of results, tofacilitate complex modeling, and to run complex computations and extensivesimulations. The only remaining question seems to be the extent of theirunexplored potential.Furthermore, the media routinely report new spectacular successes of big data

analytics and artificial intelligence that suggest new opportunities. Scientists arediscussing physics-inspired machine learning. We are even contemplating theprospect of breaking combinatorial barriers with quantum computers. However,except, possibly for the latter, one way or another, it all seems about heavy-dutymuscle-flexing without much subtlety nor finesse.Can machines take a guess?Although the thought processes leading to the guesses from which theories

are built are laden with ontological, epistemological, and antecedent theoreticalassumptions, and the very formulation of the guesses assumes certain conceptualviews, scientists seem to have been able to break through those glass ceilingsagain and again and invent entirely new concepts. Surely parallel computing,optimization algorithms, reinforcement learning, or genetic algorithms can assist

Page 19: Lecture Notes in Computer Science 12391978-3-030-59003-1/1.pdf · – Big Data Management and Analytics – Consistency, Integrity, Quality of Data – Constraint Modeling and Processing

in the exploration of the space of combinatorial compositions of existing con-cepts. In the words of Feynman again: “We set up a machine, a great computingmachine, which has a random wheel and if it makes a succession of guesses andeach time it guesses a hypothesis about how nature should work computesimmediately the consequences and makes a comparison to a list of experimentalresults that it has at the other hand. In other words, guessing is a dumb man’sjob. Actually, it is quite the opposite and I will try to explain why.” He con-tinues: “The problem is not to make, to change or to say that something might bewrong but to replace it with something and that is not so easy.”Can machines create new concepts?The panelists are asked to share illustrative professional experiences, anec-

dotes, and thoughts, as well as their enthusiasm and concerns, regarding theactuality and potential of advances in computing, communication, and control inimproving and positively disrupting the scientific process.

xxii S. Bressan

Page 20: Lecture Notes in Computer Science 12391978-3-030-59003-1/1.pdf · – Big Data Management and Analytics – Consistency, Integrity, Quality of Data – Constraint Modeling and Processing

Contents – Part I

Keynote Paper

From Sensors to Dempster-Shafer Theory and Back:The Axiom of Ambiguous Sensor Correctness and Its Applications . . . . . . . . 3

Dirk Draheim and Tanel Tammet

Big Data Management and Analytics

A Graph Partitioning Algorithm for Edge or Vertex Balance . . . . . . . . . . . . 23Adnan El Moussawi, Nacéra Bennacer Seghouani,and Francesca Bugiotti

DSCAN: Distributed Structural Graph Clustering for Billion-Edge Graphs . . . 38Hiroaki Shiokawa and Tomokatsu Takahashi

Accelerating All 5-Vertex Subgraphs Counting Using GPUs. . . . . . . . . . . . . 55Shuya Suganami, Toshiyuki Amagasa, and Hiroyuki Kitagawa

PhoeniQ: Failure-Tolerant Query Processing in Multi-node Environments . . . 71Yutaro Bessho, Yuto Hayamizu, Kazuo Goda, and Masaru Kitsuregawa

Consistency, Integrity, Quality of Data

A DSL for Automated Data Quality Monitoring . . . . . . . . . . . . . . . . . . . . . 89Felix Heine, Carsten Kleiner, and Thomas Oelsner

Fast One-to-Many Reliability Estimation for Uncertain Graphs . . . . . . . . . . . 106Junya Yanagisawa and Hiroaki Shiokawa

Quality Matters: Understanding the Impact of Incomplete Dataon Visualization Recommendation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122

Rischan Mafrur, Mohamed A. Sharaf, and Guido Zuccon

Data Analysis and Data Modeling

ModelDrivenGuide: An Approach for Implementing NoSQL Schemas. . . 141Jihane Mali, Faten Atigui, Ahmed Azough, and Nicolas Travers

Automatic Schema Generation for Document-Oriented Systems . . . . . . . . . . 152Paola Gómez, Rubby Casallas, and Claudia Roncancio

Page 21: Lecture Notes in Computer Science 12391978-3-030-59003-1/1.pdf · – Big Data Management and Analytics – Consistency, Integrity, Quality of Data – Constraint Modeling and Processing

Schedule for Detour-Type Demand Bus with Regular Usage Data. . . . . . . . . 164Yuuri Sakai, Yasuhito Asano, and Masatoshi Yoshikawa

Algebra for Complex Analysis of Data . . . . . . . . . . . . . . . . . . . . . . . . . . . 177Jakub Peschel, Michal Batko, and Pavel Zezula

FD-VAE: A Feature Driven VAE Architecture for Flexible SyntheticData Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188

Gianluigi Greco, Antonella Guzzo, and Giuseppe Nardiello

Data Mining

Bi-Level Associative Classifier Using Automatic Learning on Rules . . . . . . . 201Nitakshi Sood, Leepakshi Bindra, and Osmar Zaiane

RandomLink – Avoiding Linkage-Effects by Employing Random Effectsfor Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217

Gert Sluiter, Benjamin Schelling, and Claudia Plant

Fast and Accurate Community Search Algorithm for Attributed Graphs . . . . . 233Shohei Matsugu, Hiroaki Shiokawa, and Hiroyuki Kitagawa

A Differential Evolution-Based Approach for Community Detectionin Multilayer Networks with Attributes . . . . . . . . . . . . . . . . . . . . . . . . . . . 250

Clara Pizzuti and Annalisa Socievole

Databases and Data Management

Modeling and Enforcing Integrity Constraints on Graph Databases . . . . . . . . 269Fábio Reina, Alexis Huf, Daniel Presser, and Frank Siqueira

Bounded Pattern Matching Using Views . . . . . . . . . . . . . . . . . . . . . . . . . . 285Xin Wang, Yang Wang, Ji Zhang, and Yan Zhu

A Framework to Reverse Engineer Database Memory by AbstractingMemory Areas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 304

James Wagner and Alexander Rasin

Collaborative SPARQL Query Processing for DecentralizedSemantic Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 320

Arnaud Grall, Hala Skaf-Molli, Pascal Molli, and Matthieu Perrin

Information Retrieval

Discovering Relational Intelligence in Online Social Networks . . . . . . . . . . . 339Leonard Tan, Thuan Pham, Hang Kei Ho, and Tan Seng Kok

xxiv Contents – Part I

Page 22: Lecture Notes in Computer Science 12391978-3-030-59003-1/1.pdf · – Big Data Management and Analytics – Consistency, Integrity, Quality of Data – Constraint Modeling and Processing

Generating Dialogue Sentences to Promote Critical Thinking . . . . . . . . . . . . 354Satoshi Yoshida and Qiang Ma

Semantic Matching over Matrix-Style Tables in RichlyFormatted Documents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 369

Hongwei Li, Qingping Yang, Yixuan Cao, Ganbin Zhou,and Ping Luo

On Integrating and Classifying Legal Text Documents. . . . . . . . . . . . . . . . . 385Alexandre Quemy and Robert Wrembel

Prediction and Decision Support

An Explainable Artificial Intelligence Methodology for Hard DiskFault Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 403

Antonio Galli, Vincenzo Moscato, Giancarlo Sperlí,and Aniello De Santo

An Improved Software Defect Prediction Algorithm Using Self-organizingMaps Combined with Hierarchical Clustering and Data Preprocessing . . . . . . 414

Natalya Shakhovska, Vitaliy Yakovyna, and Natalia Kryvinska

A City Adaptive Clustering Framework for Discovering POIswith Different Granularities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 425

Junjie Sun, Tomoki Kinoue, and Qiang Ma

Deep Discriminative Learning for Autism SpectrumDisorder Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 435

Mingli Zhang, Xin Zhao, Wenbin Zhang, Ahmad Chaddad, Alan Evans,and Jean Baptiste Poline

From Risks to Opportunities: Real Estate Equity Crowdfunding . . . . . . . . . . 444Verena Schweder, Andreas Mladenow, and Christine Strauss

Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 455

Contents – Part I xxv

Page 23: Lecture Notes in Computer Science 12391978-3-030-59003-1/1.pdf · – Big Data Management and Analytics – Consistency, Integrity, Quality of Data – Constraint Modeling and Processing

Contents – Part II

Authenticity, Privacy, Security and Trust

A Framework for Factory-Trained Virtual Sensor Models Basedon Censored Production Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

Sabrina Luftensteiner and Michael Zwick

Blockchain-Based Access Control for IoT in Smart Home Systems . . . . . . . . 17Bacem Mbarek, Mouzhi Ge, and Tomas Pitner

Online Attacks on Picture Owner Privacy. . . . . . . . . . . . . . . . . . . . . . . . . . 33Bizhan Alipour Pijani, Abdessamad Imine, and Michaël Rusinowitch

Cloud Databases and Workflows

Distributed Caching of Scientific Workflows in Multisite Cloud . . . . . . . . . . 51Gaëtan Heidsieck, Daniel de Oliveira, Esther Pacitti,Christophe Pradal, François Tardieu, and Patrick Valduriez

Challenges in Resource Provisioning for the Execution of Data WranglingWorkflows on the Cloud: A Case Study . . . . . . . . . . . . . . . . . . . . . . . . . . 66

Abdullah Khalid A. Almasaud, Agresh Bharadwaj, Sandra Sampaio,and Rizos Sakellariou

Genetic Programming Based Hyper Heuristic Approach for DynamicWorkflow Scheduling in the Cloud . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76

Kirita-Rose Escott, Hui Ma, and Gang Chen

Data and Information Processing

Bounded Pattern Matching Using Views . . . . . . . . . . . . . . . . . . . . . . . . . . 93Xin Wang, Yang Wang, Ji Zhang, and Yan Zhu

Extending Graph Pattern Matching with Regular Expressions . . . . . . . . . . . . 111Xin Wang, Yang Wang, Yang Xu, Ji Zhang, and Xueyan Zhong

Construction and Random Generation of Hypergraphs with PrescribedDegree and Dimension Sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130

Naheed Anjum Arafat, Debabrota Basu, Laurent Decreusefond,and Stéphane Bressan

Page 24: Lecture Notes in Computer Science 12391978-3-030-59003-1/1.pdf · – Big Data Management and Analytics – Consistency, Integrity, Quality of Data – Constraint Modeling and Processing

Knowledge Discovery

A Model-Agnostic Recommendation Explanation System Basedon Knowledge Graph. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149

Yuhao Chen and Jun Miyazaki

Game Theory Based Patent Infringement Detection Method . . . . . . . . . . . . . 164Weidong Liu, Xiaobo Liu, Youdong Kong, Zhiwei Yang,and Wenbo Qiao

Unveiling Relations in the Industry 4.0 Standards Landscape Basedon Knowledge Graph Embeddings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179

Ariam Rivas, Irlán Grangel-González, Diego Collarana, Jens Lehmann,and Maria-Esther Vidal

On Tensor Distances for Self Organizing Maps:Clustering Cognitive Tasks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195

Georgios Drakopoulos, Ioanna Giannoukou, Phivos Mylonas,and Spyros Sioutas

Machine Learning

MUEnsemble: Multi-ratio Undersampling-Based Ensemble Frameworkfor Imbalanced Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213

Takahiro Komamizu, Risa Uehara, Yasuhiro Ogawa,and Katsuhiko Toyama

The Linear Geometry Structure of Label Matrix for Multi-label Learning . . . . 229Tianzhu Chen, Fenghua Li, Fuzhen Zhuang, Yunchuan Guo,and Liang Fang

Expanding Training Set for Graph-Based Semi-supervised Classification . . . . 245Li Tan, Wenbin Yao, and Xiaoyong Li

KPML: A Novel Probabilistic Perspective Kernel Mahalanobis DistanceMetric Learning Model for Semi-supervised Clustering . . . . . . . . . . . . . . . . 259

Chao Wang, Yongyi Hu, Xiaofeng Gao, and Guihai Chen

Semantic Web and Ontologies

Indexing Data on the Web: A Comparison of Schema-Level Indicesfor Data Search. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277

Till Blume and Ansgar Scherp

CTransE: An Effective Information Credibility Evaluation Method Basedon Classified Translating Embedding in Knowledge Graphs . . . . . . . . . . . . . 287

Yunfeng Li, Xiaoyong Li, and Mingjian Lei

xxviii Contents – Part II

Page 25: Lecture Notes in Computer Science 12391978-3-030-59003-1/1.pdf · – Big Data Management and Analytics – Consistency, Integrity, Quality of Data – Constraint Modeling and Processing

Updating Ontology Alignment on the Instance Level Basedon Ontology Evolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301

Adrianna Kozierkiewicz, Marcin Pietranik, and Loan T. T. Nguyen

Theme-Based Summarization for RDF Datasets . . . . . . . . . . . . . . . . . . . . . 312Mohamad Rihany, Zoubida Kedad, and Stéphane Lopes

Contextual Preferences to Personalise Semantic Data Lake Exploration . . . . . 322Devis Bianchini, Valeria De Antonellis, Massimiliano Garda,and Michele Melchiori

Stream Data Processing

Feature Drift Detection in Evolving Data Streams . . . . . . . . . . . . . . . . . . . . 335Di Zhao and Yun Sing Koh

An Efficient Segmentation Method: Perceptually Important Pointwith Binary Tree. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 350

Qizhou Sun and Yain-Whar Si

SCODIS: Job Advert-Derived Time Series for High-Demand SkillsetDiscovery and Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 366

Elisa Margareth Sibarani and Simon Scerri

Temporal, Spatial, and High Dimensional Databases

Temporal Aggregation of Spanning Event Stream: A General Framework . . . 385Aurélie Suzanne, Guillaume Raschia, and José Martinez

Leveraging 3D-Raster-Images and DeepCNN with Multi-source UrbanSensing Data for Traffic Congestion Prediction. . . . . . . . . . . . . . . . . . . . . . 396

Ngoc-Thanh Nguyen, Minh-Son Dao, and Koji Zettsu

Privacy-Preserving Spatio-Temporal Patient Data Publishing. . . . . . . . . . . . . 407Anifat M. Olawoyin, Carson K. Leung, and Ratna Choudhury

Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 417

Contents – Part II xxix