Post on 17-Jul-2015
Data Analysis in a Changing Discourse |
Data Analysis in a Changing Discourse
The Challenges of Scholarly Communication
Paul Groth @pgroth
[Figure: network of stemmed keywords, e.g. queri, semant, rdf, sparql, pagerank, folksonomi, clickstream]
Keyword co-occurrence network in WWW 2008
web, query, online, mobile
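A network like this can be built by linking keywords that appear on the same paper. The following is a minimal sketch, assuming each paper is represented as a list of already-stemmed keywords; the sample papers and the function names are invented for illustration and are not the WWW 2008 data or the method used for the slide.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(papers):
    """Count how often each pair of keywords appears on the same paper."""
    edges = Counter()
    for keywords in papers:
        # Sort so that (a, b) and (b, a) count as the same undirected edge.
        for a, b in combinations(sorted(set(keywords)), 2):
            edges[(a, b)] += 1
    return edges

def weighted_degree(edges):
    """Sum edge weights per keyword; high values mark the hubs of the network."""
    degree = Counter()
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return degree

# Hypothetical input: one list of stemmed keywords per paper.
papers = [
    ["web", "queri", "search"],
    ["web", "mobil", "onlin"],
    ["web", "queri", "log"],
]
edges = cooccurrence_edges(papers)
hubs = weighted_degree(edges)
```

Ranking keywords by `hubs` for each conference year is one simple way to see which terms dominate the discourse at that point in time.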
[Figure: network of stemmed keywords, e.g. search, social, semant, sentiment, privaci, blog, large-scal]
Keyword co-occurrence network in WWW 2010
search, social, data
Figure 1. Evolution of the number of classes of the three branches of the Gene Ontology.
Dameron O, Bettembourg C, Le Meur N (2013) Measuring the Evolution of Ontology Complexity: The Gene Ontology Case Study. PLoS ONE 8(10): e75993. doi:10.1371/journal.pone.0075993 http://127.0.0.1:8081/plosone/article?id=info:doi/10.1371/journal.pone.0075993
Table 2. Gene Ontology complexity variations.
Dameron O, Bettembourg C, Le Meur N (2013) Measuring the Evolution of Ontology Complexity: The Gene Ontology Case Study. PLoS ONE 8(10): e75993. doi:10.1371/journal.pone.0075993 http://127.0.0.1:8081/plosone/article?id=info:doi/10.1371/journal.pone.0075993
• The most recent changes to the GO term “apoptotic process” as displayed in QuickGO [20]. In total there have been 54 changes over the lifetime of the term.
• Huntley et al. GigaScience 2014 3:4 doi:10.1186/2047-217X-3-4
Definitions change
Ramifications
What happens to the long tail?
CHEMBL 15: Targets are now proteins
http://chembl.blogspot.nl/2013/01/chembl-15-schema-changes.html
Downstream effects
The growth of data munging
https://storify.com/chenghlee/dataformathell
http://isps.yale.edu/sites/default/files/files/IDCC14_DQR_PeerGreenStephenson.pdf
“60 % of time is spent on data preparation”
NASA, A.40 Computational Modeling Algorithms and Cyberinfrastructure, tech. report, NASA, 19 Dec. 2011
Search target Oxidoreductase: 481 targets from different species
Selection of all the oxidoreductases and filtering bioactivities with the criterion IC50 < 100 (no units could be selected): 11,497 data points obtained
Table exported to an Excel spreadsheet and manually filtered
From Mabel Loza - USC team
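The missing-units problem in this workflow is the kind of step that can be automated instead of filtered by hand in a spreadsheet. Below is a minimal sketch, assuming each bioactivity record carries a numeric value and a unit string; the field names, the unit table, and the sample records are hypothetical and do not reflect any real database export.

```python
# Conversion factors to nanomolar (illustrative subset, an assumption).
TO_NM = {"nM": 1.0, "uM": 1_000.0, "mM": 1_000_000.0}

def ic50_below(records, threshold_nm):
    """Keep records whose IC50, converted to nM, is below the threshold.

    Records with a missing or unknown unit are returned separately
    instead of being silently compared as raw numbers.
    """
    kept, unresolved = [], []
    for rec in records:
        factor = TO_NM.get(rec.get("unit"))
        if factor is None:
            unresolved.append(rec)
        elif rec["value"] * factor < threshold_nm:
            kept.append(rec)
    return kept, unresolved

# Hypothetical records; the values are made up for illustration.
records = [
    {"target": "T1", "value": 50, "unit": "nM"},
    {"target": "T2", "value": 50, "unit": "uM"},
    {"target": "T3", "value": 0.05, "unit": "uM"},
    {"target": "T4", "value": 50, "unit": None},
]
kept, unresolved = ic50_below(records, threshold_nm=100)
```

Returning the unresolved records separately keeps the ambiguous cases visible for manual review rather than mixing them into the result.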
The Seven Deadly Sins of Bioinformatics
Professor Carole Goble carole.goble@manchester.ac.uk
The University of Manchester, UK The myGrid project
OMII-UK
Andy Law's Third Law
• “The number of unique identifiers assigned to an individual is never less than the number of Institutions involved in the study”... and is frequently many, many more.
http://bioinformatics.roslin.ac.uk/lawslaws.html
PubChem Drugbank ChemSpider
Imatinib
Mesylate
What Is Gleevec?
Some Solutions
Issue: Identifiers aren’t the same and we can’t agree on when one thing equals another
Solution: Adaptive identifier mapping based on profiles
Strict Relaxed
Analysing Browsing
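One way to read the slide's labels is that a strict profile suits analysis while a relaxed one suits browsing. The following is a minimal sketch of profile-based mapping under that reading; the link types, profile contents, and identifiers are all hypothetical placeholders, not the scheme of any particular system.

```python
# Hypothetical link types accepted by each profile.
PROFILES = {
    "strict": {"exact"},
    "relaxed": {"exact", "salt_form", "close"},
}

# Hypothetical cross-database links: (identifier, identifier, link type).
LINKS = [
    ("dbA:imatinib", "dbB:imatinib", "exact"),
    ("dbA:imatinib", "dbC:imatinib-mesylate", "salt_form"),
]

def equivalents(identifier, profile):
    """Return identifiers treated as 'the same thing' under the given profile.

    Only direct links are followed; chains of links are not expanded.
    """
    accepted = PROFILES[profile]
    found = set()
    for left, right, link_type in LINKS:
        if link_type not in accepted:
            continue
        if left == identifier:
            found.add(right)
        elif right == identifier:
            found.add(left)
    return found

strict_matches = equivalents("dbA:imatinib", "strict")
relaxed_matches = equivalents("dbA:imatinib", "relaxed")
```

The same query returns one match under the strict profile and two under the relaxed one, so the notion of equality becomes a parameter chosen per task rather than a fixed property of the data.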
Issue: There’s no one data model of science
Solution: Simple “common sense” driven data model primarily focused on user interface needs
provbook.org
My Questions:
15/03/15
[Gray et al. ISWC 2014]
We have to rely on computers
Contact: Elsevier Labs
• Paul Groth p.groth@elsevier.com • http://pgroth.com • @pgroth
Questions
• What is the interplay between data munging and concept drift?
• What happens when humans are not in the loop?
• What’s our tolerance for fuzziness?
• Should we worry about the long tail?