Semiotics in spreadsheets
Semiotics in Spreadsheets: Enhancing Semantic Interoperability
Ivelize Rocha Bernardo
André Santanchè
Outline
• Motivation
• Research Problems
• Related Work
• What I did in my Master's Degree
• Limitations of the Master's Degree Proposal
• Plans for the PhD
Motivation
Large amount of information in spreadsheets [Syed et al., 2010]
Why?
• They are intuitive
• They have high flexibility -> diverse needs
Motivation
However, they were designed for:
• Isolated use
• Human reading
Research Goal
The main goal of our research is to promote richer semantic interoperability among spreadsheets
Interoperability
(Ouksel & Sheth 1999)
• system interoperability
• syntactic interoperability
• structural interoperability
• semantic interoperability
(Tolk 2006)
• no interoperability
• technical interoperability
• syntactic interoperability
• semantic interoperability
• pragmatic interoperability
• dynamic interoperability
• conceptual interoperability
Interoperability
(Ouksel & Sheth 1999): semantic interoperability
(Tolk 2006): semantic interoperability, pragmatic interoperability, dynamic interoperability, conceptual interoperability
Data Interpretation
Which elements must be considered in this interpretation process?
Unity Interpretation
Related Work
isolated label
(Han et al., 2008) - RDF123: From Spreadsheets to RDF, The Semantic Web, Lecture Notes in Computer Science, vol. 5318, Springer
(Langegger & Wöß, 2009) - XLWrap: Querying and Integrating Arbitrary Spreadsheets with SPARQL, The Semantic Web, Lecture Notes in Computer Science, vol. 5823, Springer
Related Work
template
(Abraham & Erwig, 2006) - Inferring Templates from Spreadsheets, Proceedings of the International Conference on Software Engineering
Related Work
instances
(Zhao et al., 2010) - A spreadsheet system based on data semantic object, IEEE International Conference on Information Management and Engineering
Related Work
isolated label associated with linked data
(Syed et al., 2010) - Exploiting a Web of Semantic Data for Interpreting Tables, Proceedings of the Web Science Conference
Related Work
correlation of labels associated with linked data
(Venetis et al., 2011) - Recovering Semantics of Tables on the Web, Proceedings of the VLDB Endowment
(Mulwad et al., 2010) - Using linked data to interpret tables, Proceedings of the International Workshop on Consuming Linked Data
Related Work
correlation between several spreadsheet elements associated with linked data
(Limaye et al., 2010) - Annotating and Searching Web Tables Using Entities, Types and Relationships, Proceedings of the VLDB Endowment
How far can a system interpret, considering labels and their correlations?
How different are they in fact?
What I did in my Master's Degree
Research Strategy
1. To identify construction patterns followed by biologists during the creation of these spreadsheets
2. To verify whether these construction patterns could lead us to recognize the purpose of a spreadsheet
3. To achieve semantic interoperability among these spreadsheets
How to identify Construction Patterns
[Figure, built up over several slides, with the labels "what", "when" and "where" added one at a time.]
Construction Patterns
[Figure, built up over several slides, with the labels "catalogue" and "collection".]
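As a purely illustrative sketch of how header labels might be mapped to the two construction patterns: the slides do not state the recognition rules, so the rule below (a spreadsheet whose headers answer both "when" and "where" is treated as a collection, anything else as a catalogue) is an assumption, and every keyword and the function name `guess_pattern` are hypothetical.

```python
# Hypothetical keyword sets; none of these labels come from the slides.
WHEN_HINTS = {"date", "year", "collection date"}
WHERE_HINTS = {"locality", "country", "latitude", "longitude"}


def guess_pattern(headers):
    """Guess 'collection' or 'catalogue' from header labels (assumed rule)."""
    labels = {h.strip().lower() for h in headers}
    answers_when = bool(labels & WHEN_HINTS)    # some header answers "when"
    answers_where = bool(labels & WHERE_HINTS)  # some header answers "where"
    return "collection" if answers_when and answers_where else "catalogue"


print(guess_pattern(["Species", "Family"]))
print(guess_pattern(["Species", "Date", "Locality"]))
```

A real recognizer would need far more than keyword matching; the point is only that the what/when/where questions can be turned into explicit tests on the spreadsheet structure.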
SciSpread System
Architecture Evaluation
Automatic analysis of 11,150 spreadsheets
The system recognized 1,151 spreadsheets:
• 806 spreadsheets were classified as catalogue
• 345 spreadsheets were classified as collection
Total: 748,459 records analyzed
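The counts on this slide can be cross-checked with a few lines of arithmetic. The percentages below are derived here from the reported counts and are not stated in the slides; the variable names are illustrative.

```python
# Counts as reported on the slide.
analyzed = 11_150
recognized = 1_151
catalogue = 806
collection = 345

# The two classes should account for every recognized spreadsheet.
assert catalogue + collection == recognized

recognized_share = recognized / analyzed
catalogue_share = catalogue / recognized
collection_share = collection / recognized

print(f"recognized: {recognized_share:.1%}")
print(f"catalogue:  {catalogue_share:.1%}")
print(f"collection: {collection_share:.1%}")
```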
Architecture Evaluation - Results
• A random subset of 1,203 spreadsheets was selected to evaluate precision/recall
– Precision: 0.84
– Recall: 0.76
– Specificity: 0.95
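For reference, the three measures follow the standard confusion-matrix definitions. The sketch below states them as functions; the confusion counts in the usage lines are made up for illustration (the slides do not give the underlying counts), and the F1 score is a value derived here from the reported precision and recall, not one reported by the authors.

```python
def precision(tp, fp):
    """Fraction of spreadsheets labelled positive that really are positive."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Fraction of truly positive spreadsheets that were found."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of truly negative spreadsheets that were rejected."""
    return tn / (tn + fp)


def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# Hypothetical confusion counts, only to exercise the definitions.
tp, fp, fn, tn = 8, 2, 2, 38
print(precision(tp, fp), recall(tp, fn), specificity(tn, fp))

# F1 derived from the precision and recall reported on the slide.
print(round(f1_score(0.84, 0.76), 3))
```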
Limitations of the Master's Degree Proposal
Main Limitations
● Single Domain
○ Specific spreadsheets (catalogue and collection)
● Lack of a Model to represent construction patterns
○ the models later built for the construction patterns were isolated from each other
● Linking labels to ontologies
○ not able to aggregate different labels belonging to the same concept
○ the ontology was selected by us; it is not necessarily the best representation for the spreadsheets' data
● Multiple Domains
● Model as an association network
○ relates elements and concepts of several spreadsheets
● Linking spreadsheet structure to ontologies
○ the link is made between concepts
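One way to picture a model as an association network is as an undirected graph whose nodes are spreadsheet elements and concepts. The sketch below is a minimal illustration under that assumption, not the model proposed in the talk; the class `AssociationNetwork`, the sheet names, the label "organism" and the concept name are all hypothetical.

```python
from collections import defaultdict


class AssociationNetwork:
    """Undirected graph relating spreadsheet elements and concepts."""

    def __init__(self):
        self._edges = defaultdict(set)

    def relate(self, a, b):
        """Add a symmetric association between two nodes."""
        self._edges[a].add(b)
        self._edges[b].add(a)

    def neighbours(self, node):
        """Return a copy of the nodes directly associated with `node`."""
        return set(self._edges.get(node, set()))

    def shared(self, a, b):
        """Nodes associated with both `a` and `b`."""
        return self.neighbours(a) & self.neighbours(b)


net = AssociationNetwork()
# Two differently named labels from two hypothetical sheets, one concept.
net.relate(("sheet_A", "org."), "concept:Organism")
net.relate(("sheet_B", "organism"), "concept:Organism")

print(net.shared(("sheet_A", "org."), ("sheet_B", "organism")))
```

Because both labels attach to the same concept node, different labels belonging to one concept are aggregated by construction, which is the limitation of label-by-label linking noted above.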
What are the plans for my PhD
[Figure, built up over several slides: a diagram running from Start to End through the labels below.]
Labels: SEEK, proj., title, nam., org., NCBI ID, stra., gene nam., Mod. type, phe., com., tre1. ph, tre2. tem., and tre. val., SD, Unit (listed twice)
Example values added in later builds: MOSES, M_MZ_sample1, ura, Saccharomyces_cerevisiae, 4932, CEN.PK-113-7D, ura3, 6,5 0,1 37 0,5 oC
Semantic Interoperability among Spreadsheets
[Figure, built up over several slides: the Start-to-End diagram with the labels SEEK, proj., title, nam., org., NCBI ID, stra., gene nam., Mod. type, phe., com., tre1. ph, tre2. tem., and tre. val., SD, Unit (listed twice), extended one at a time with the labels ID, time, rel. glu., genotype and trea.; the last builds add the labels Spreadsheet Purpose and Spreadsheet Domain.]
Data Model
Spreadsheets Semiotic Sign
signifier: structural form
signified: spreadsheet purpose + semantic spreadsheet data
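The two-part sign can be written down as a small data structure. The sketch below is an assumption about how such a sign might be represented, not the authors' data model: the class and field names are hypothetical, the structural form is reduced to an ordered tuple of header labels, and the purpose and concept strings are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Signifier:
    """Structural form, reduced here to the ordered header labels."""
    labels: tuple


@dataclass
class Signified:
    """Spreadsheet purpose plus the semantics attached to its data."""
    purpose: str
    data_semantics: dict = field(default_factory=dict)


@dataclass
class SpreadsheetSign:
    """A spreadsheet seen as a sign: form paired with meaning."""
    signifier: Signifier
    signified: Signified


sign = SpreadsheetSign(
    signifier=Signifier(labels=("proj.", "title", "nam.", "org.")),
    signified=Signified(
        purpose="placeholder purpose",                    # hypothetical
        data_semantics={"org.": "concept:Organism"},      # hypothetical
    ),
)
print(sign.signifier.labels, sign.signified.purpose)
```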
Architecture
[Figure: the same diagram as on the previous slides (labels from Start/SEEK to End, plus ID, time, rel. glu., genotype, trea., Spreadsheet Purpose and Spreadsheet Domain), with the additional labels "Start" and "XYZ".]
Research Challenge
How to distinguish different domains when the networks are interconnected?
[Figure labels: Spreadsheet Domain, Spreadsheet Purpose]
Research Questions
• When can spreadsheets be considered to have the same purpose?
• Is there a canonical representation among spreadsheets of the same purpose?
• Is it possible to define a canonical representation for a spreadsheet group?
• Can this representation be used to predict spreadsheets of a given purpose?
Acknowledgements
● Laboratory of Information Systems (LIS)
● UNICAMP
● FAPESP
● Microsoft Research FAPESP Virtual Institute (NavScales project)
● CNPq (MuZOO Project and PRONEX-FAPESP)
● INCT in Web Science (CNPq 557.128/2009-9)
● CAPES
Thank you for your attention!