Technical Coping Strategies for Resource Discovery - Paul Walk
![Page 2: Technical Coping Strategies for Resource Discovery - Paul Walk](https://reader035.fdocuments.net/reader035/viewer/2022081404/558d26bbd8b42a2e638b4671/html5/thumbnails/2.jpg)
Contents
1. a general consideration:
   • open or closed
2. a particular challenge:
   • synchronisation in an open world
3. the ‘nothing new’, but doing it better:
   • APIs that work and can be trusted
![Page 3: Technical Coping Strategies for Resource Discovery - Paul Walk](https://reader035.fdocuments.net/reader035/viewer/2022081404/558d26bbd8b42a2e638b4671/html5/thumbnails/3.jpg)
a healthy(?) state of tension between open and closed
![Page 4: Technical Coping Strategies for Resource Discovery - Paul Walk](https://reader035.fdocuments.net/reader035/viewer/2022081404/558d26bbd8b42a2e638b4671/html5/thumbnails/4.jpg)
open and closed worlds
• I’m not talking about licensing or access to data
• open
  • unbounded - like the Web
• closed
  • bounded - like most collections management systems, aggregations etc.
• formally, much of what we do is underpinned by ‘open/closed world’ assumptions:
  • open world assumption: any statement not known to be true is unknown
  • closed world assumption: any statement not known to be true is false
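The two assumptions can be sketched in a few lines of Python. This is an illustration only: the fact set, the item identifier and the function names are hypothetical and not from the talk.

```python
from typing import Optional, Tuple

Statement = Tuple[str, str, str]

# Hypothetical set of statements we know to be true.
KNOWN_FACTS = {
    ("item-42", "heldBy", "Library A"),
}

def closed_world(statement: Statement) -> bool:
    """Closed world: anything not known to be true is treated as false."""
    return statement in KNOWN_FACTS

def open_world(statement: Statement) -> Optional[bool]:
    """Open world: anything not known to be true is unknown (None)."""
    return True if statement in KNOWN_FACTS else None

query = ("item-42", "heldBy", "Library B")
print(closed_world(query))  # False: the closed system denies it
print(open_world(query))    # None: the open system simply does not know
```

A bounded catalogue can safely answer "no"; a system that integrates across the Web can only answer "not as far as I know".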
![Page 5: Technical Coping Strategies for Resource Discovery - Paul Walk](https://reader035.fdocuments.net/reader035/viewer/2022081404/558d26bbd8b42a2e638b4671/html5/thumbnails/5.jpg)
characteristics of an open world
![Page 6: Technical Coping Strategies for Resource Discovery - Paul Walk](https://reader035.fdocuments.net/reader035/viewer/2022081404/558d26bbd8b42a2e638b4671/html5/thumbnails/6.jpg)
characteristics of a closed/bounded world
![Page 7: Technical Coping Strategies for Resource Discovery - Paul Walk](https://reader035.fdocuments.net/reader035/viewer/2022081404/558d26bbd8b42a2e638b4671/html5/thumbnails/7.jpg)
judging where to apply each
• we need our infrastructure (especially integration technology between systems) to be open and relatively unbounded
• the Web is still the best available foundation for this
• however, we still need to manage our resources, maintain quality and honour complex rights management commitments
• we probably need to recognise that users’ experience is often enhanced through the application of a more focussed, targeted and context-aware approach
![Page 8: Technical Coping Strategies for Resource Discovery - Paul Walk](https://reader035.fdocuments.net/reader035/viewer/2022081404/558d26bbd8b42a2e638b4671/html5/thumbnails/8.jpg)
a particular challenge
![Page 9: Technical Coping Strategies for Resource Discovery - Paul Walk](https://reader035.fdocuments.net/reader035/viewer/2022081404/558d26bbd8b42a2e638b4671/html5/thumbnails/9.jpg)
synchronisation
• how is the state of the resource maintained across an infrastructure of ‘federated’ repositories?
• if a resource is changed or deleted, how does the right-hand side aggregation know?
• note - this is based on our existing ‘harvesting’ or ‘pull’ approach
[Diagram: four resource collections and three aggregations - multiple harvest routes, multiple copies]
![Page 10: Technical Coping Strategies for Resource Discovery - Paul Walk](https://reader035.fdocuments.net/reader035/viewer/2022081404/558d26bbd8b42a2e638b4671/html5/thumbnails/10.jpg)
ResourceSync
• a joint project of NISO and OAI, led by Herbert Van de Sompel of Los Alamos
• a light-weight mechanism to allow the state of web resources to be communicated between web systems
• developing a spec which builds on the sitemap specification, allowing content providers to publish changesets
• draft: http://bit.ly/WYhTz2
• Jisc have funded UK participation in this
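As a rough sketch of how an aggregation might consume a published changeset: the XML below is modelled on the sitemap format with an `rs:md` change marker, as in the later published ResourceSync specification, and may not match the draft linked above. The URLs, the sample document and the `apply_changes` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}

# Hypothetical change list published by a content provider.
CHANGE_LIST = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <url>
    <loc>http://example.org/records/1</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md change="updated"/>
  </url>
  <url>
    <loc>http://example.org/records/2</loc>
    <lastmod>2013-01-02T14:00:00Z</lastmod>
    <rs:md change="deleted"/>
  </url>
</urlset>
"""

def apply_changes(local_copy: dict, change_list_xml: str) -> dict:
    """Return a new {url: lastmod} map with the published changes applied."""
    updated = dict(local_copy)
    root = ET.fromstring(change_list_xml)
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        lastmod = url.findtext("sm:lastmod", namespaces=NS)
        md = url.find("rs:md", NS)
        change = md.get("change") if md is not None else None
        if change == "deleted":
            updated.pop(loc, None)   # drop our stale copy
        else:
            updated[loc] = lastmod   # created or updated: mark for re-harvest
    return updated

local = {
    "http://example.org/records/1": "2012-12-01T00:00:00Z",
    "http://example.org/records/2": "2012-12-01T00:00:00Z",
}
print(apply_changes(local, CHANGE_LIST))
```

The point is that the aggregation learns about deletions and updates from the source, instead of inferring them by re-harvesting everything.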
![Page 11: Technical Coping Strategies for Resource Discovery - Paul Walk](https://reader035.fdocuments.net/reader035/viewer/2022081404/558d26bbd8b42a2e638b4671/html5/thumbnails/11.jpg)
The sun shone, having no alternative, on the nothing new. Murphy, Samuel Beckett
![Page 12: Technical Coping Strategies for Resource Discovery - Paul Walk](https://reader035.fdocuments.net/reader035/viewer/2022081404/558d26bbd8b42a2e638b4671/html5/thumbnails/12.jpg)
A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable
Leslie Lamport
![Page 13: Technical Coping Strategies for Resource Discovery - Paul Walk](https://reader035.fdocuments.net/reader035/viewer/2022081404/558d26bbd8b42a2e638b4671/html5/thumbnails/13.jpg)
a common ‘anti-pattern’
• as a developer, I have no reason to trust that these APIs are any good.
• after all, the service provider doesn’t seem to trust them for their own application...
[Diagram: some aggregated data of broad interest and potential usefulness, with the provider’s own UI serving end-users alongside three separate APIs aimed at future 3rd-party developers and their UIs; legend: certainty, belief, speculation]
![Page 14: Technical Coping Strategies for Resource Discovery - Paul Walk](https://reader035.fdocuments.net/reader035/viewer/2022081404/558d26bbd8b42a2e638b4671/html5/thumbnails/14.jpg)
a better pattern
• As a developer, I’m more likely to trust this pattern.
• the content provider is using their own API to deliver their own application.
• they have a vested interest!
[Diagram: some aggregated data of broad interest and potential usefulness, exposed through a single API that serves both a focussed app and a 3rd-party app, each with its own UI and end-users; legend: certainty, belief, speculation]
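A minimal sketch of this pattern, assuming a single search API over the aggregated data. The class, function names and sample records are all hypothetical; the point is only that the provider’s own application has no private route to the data.

```python
from typing import Dict, List

# Hypothetical aggregated records.
RECORDS = [
    {"id": "r1", "title": "Map of Bath"},
    {"id": "r2", "title": "Letters from York"},
]

class DiscoveryAPI:
    """The one public interface to the aggregated data."""

    def __init__(self, records: List[Dict[str, str]]) -> None:
        self._records = records

    def search(self, term: str) -> List[Dict[str, str]]:
        needle = term.lower()
        return [r for r in self._records if needle in r["title"].lower()]

def provider_app(api: DiscoveryAPI, term: str) -> str:
    """The provider's own focussed app: it goes through the API like anyone else."""
    return "\n".join(f"* {r['title']}" for r in api.search(term))

def third_party_app(api: DiscoveryAPI, term: str) -> int:
    """A third-party app using exactly the same interface."""
    return len(api.search(term))

api = DiscoveryAPI(RECORDS)
print(provider_app(api, "map"))
print(third_party_app(api, "york"))
```

If `search` breaks, the provider’s own application breaks too, which is the vested interest that gives outside developers a reason to trust the API.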
![Page 15: Technical Coping Strategies for Resource Discovery - Paul Walk](https://reader035.fdocuments.net/reader035/viewer/2022081404/558d26bbd8b42a2e638b4671/html5/thumbnails/15.jpg)
APIs are not best thought of as machine-to-machine interfaces
APIs are interfaces for developers
![Page 16: Technical Coping Strategies for Resource Discovery - Paul Walk](https://reader035.fdocuments.net/reader035/viewer/2022081404/558d26bbd8b42a2e638b4671/html5/thumbnails/16.jpg)
messages from developers to content-providers
• These are from yesterday’s developer day held here at the BL in support of this summit:
• please don’t build elaborate APIs which do not allow us to see all of the data, or its extent. It’s not that we simply want to download all the data - but we do need to see what we’re dealing with
• if you give us access to incomplete data (perhaps because you’re worried about revealing poor data quality), then we will tend to either abandon our attempts to use it or we will ‘fill in the gaps’ with data from elsewhere. So offering an API which delivers incomplete data is usually self-defeating
• the implicit bargain, made explicit:
  • give us access to the data as soon as possible and we will do some of the work to process it so it is fit for some new purpose - and we will happily share this code with you
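One way an API can show the extent of the data is to report the total alongside every page of results. The sketch below is an assumption about what that might look like, not something shown on the day; the response shape and function names are hypothetical.

```python
from typing import Any, Dict, List

def list_records(records: List[Any], offset: int = 0, limit: int = 2) -> Dict[str, Any]:
    """Server side: return one page, plus the total so clients can see the extent."""
    return {
        "total": len(records),
        "offset": offset,
        "limit": limit,
        "records": records[offset:offset + limit],
    }

def harvest_all(records: List[Any], limit: int = 2) -> List[Any]:
    """Client side: walk every page, using 'total' to know when to stop."""
    collected: List[Any] = []
    offset = 0
    while True:
        response = list_records(records, offset, limit)
        collected.extend(response["records"])
        offset += limit
        if offset >= response["total"]:
            break
    return collected

data = [{"id": i} for i in range(5)]
first_page = list_records(data)
print(first_page["total"], len(first_page["records"]))
print(len(harvest_all(data)))
```

A developer who only wants a sample can stop after the first page, but still knows how much data sits behind it.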
![Page 17: Technical Coping Strategies for Resource Discovery - Paul Walk](https://reader035.fdocuments.net/reader035/viewer/2022081404/558d26bbd8b42a2e638b4671/html5/thumbnails/17.jpg)
Questions for the parallel sessions
1. Which emerging technologies do we need to focus on in 2013?
2. Do we still need to aggregate?
3. What does data quality stop us doing?
![Page 18: Technical Coping Strategies for Resource Discovery - Paul Walk](https://reader035.fdocuments.net/reader035/viewer/2022081404/558d26bbd8b42a2e638b4671/html5/thumbnails/18.jpg)
Which emerging technologies do we need to focus on in 2013?
• Graphs: ~~Content~~ Context is king
• both Facebook and Google are betting heavily on graph technologies
• closer to home - so are content providers like the BBC
• linking these is an interesting challenge
• databases based on a graph model give the potential for a richer understanding about entities (users!)
• instrumentation in personal devices makes more context available (e.g. geo-location).
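To make the graph idea concrete, here is a small sketch in which a user’s context (a location reported by their device) is linked to resources through a shared place. The edge list, identifiers and relation names are hypothetical, and a real system would more likely use a graph database than in-memory dictionaries.

```python
from collections import defaultdict
from typing import List, Tuple

Edge = Tuple[str, str, str]  # (subject, relation, object)

# Hypothetical graph linking a user, places and resources.
EDGES: List[Edge] = [
    ("user:alice", "located_in", "place:bath"),
    ("resource:map-17", "depicts", "place:bath"),
    ("resource:photo-3", "depicts", "place:york"),
]

def build_index(edges: List[Edge]):
    """Index edges by subject (outgoing) and by object (incoming)."""
    outgoing = defaultdict(set)
    incoming = defaultdict(set)
    for subject, relation, obj in edges:
        outgoing[subject].add((relation, obj))
        incoming[obj].add((relation, subject))
    return outgoing, incoming

def resources_in_context(user: str, edges: List[Edge]) -> List[str]:
    """Two hops: user -> place they are in -> resources depicting that place."""
    outgoing, incoming = build_index(edges)
    places = {obj for relation, obj in outgoing[user] if relation == "located_in"}
    return sorted(
        subject
        for place in places
        for relation, subject in incoming[place]
        if relation == "depicts"
    )

print(resources_in_context("user:alice", EDGES))
```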
![Page 19: Technical Coping Strategies for Resource Discovery - Paul Walk](https://reader035.fdocuments.net/reader035/viewer/2022081404/558d26bbd8b42a2e638b4671/html5/thumbnails/19.jpg)
Do we still need to aggregate?
![Page 20: Technical Coping Strategies for Resource Discovery - Paul Walk](https://reader035.fdocuments.net/reader035/viewer/2022081404/558d26bbd8b42a2e638b4671/html5/thumbnails/20.jpg)
Do we still need to aggregate?
yes.
![Page 21: Technical Coping Strategies for Resource Discovery - Paul Walk](https://reader035.fdocuments.net/reader035/viewer/2022081404/558d26bbd8b42a2e638b4671/html5/thumbnails/21.jpg)
Do we still need to aggregate?
• to address systems/network latency - provide a cache
• to showcase!
• for ‘Web Scale concentration’
• network effects if user-facing services are also developed
• to create middleman business opportunities
• as infrastructure to support locally developed services
• as an approach to preservation
yes.
![Page 22: Technical Coping Strategies for Resource Discovery - Paul Walk](https://reader035.fdocuments.net/reader035/viewer/2022081404/558d26bbd8b42a2e638b4671/html5/thumbnails/22.jpg)
What does data quality stop us doing?
• interpreted as: “what does a concern for data quality stop us doing?”
  • it stops us from releasing data early
• interpreted as: “what does poor/uncertain data quality stop us doing?”
  • it erodes trust, which impacts the likelihood of someone doing something worthwhile with our data
• reconciling these concerns is a major challenge for us.