Short-term storage and data documentation Mari Wigham COMMIT
Short-term storage and data documentation
Mari Wigham
COMMIT/
Information Management @ WUR
Organising, sharing, finding and reusing data
Expertise in:
●Modelling data and how people work with it
●Semantic technology – making the data ‘understandable’ for computers so that they can better support the users
Emphasis on applications for use in real life
This presentation
What data do we store?
With what technology?
Where do we store it?
Why document your data?
Example: Study on the effects of diet on health
Questions
What data are we storing?
Raw data
Final data
Papers
but also
Intermediate data
Drafts of papers
Methods
Equipment and materials
Labnotes
...
What data are we storing?
Everything you need to be able to do your work
Everything your colleagues need to do their work
Everything required by your funding organisation
Everything necessary to reproduce your work
With what technology?
Smartphone/tablet/laptop/PC...
Specialised hardware and software
Dedicated e-labnotebook software or standard software
What do you want to be able to do with it?
●Take notes?
●Access internet?
●Log on to your network?
●Write documents?
●Give presentations?
●Use in the lab?
●Link up with lab systems?
Where do we store it?
Storage solutions
Advantages, disadvantages and what each solution is suitable for
Personal computer / laptop
●Advantages: always available; portable
●Disadvantages: what if it breaks or is stolen? What if you are ill or away?
●Suitable for: temporary storage
Network drive / managed file servers
●Advantages: regularly backed up and maintained; stored securely; stored centrally
●Disadvantages: costs; may not be accessible from everywhere or by everyone
●Suitable for: master copy (if enough space is provided)
External storage devices (USB, flash etc.)
●Advantages: low cost; portable
●Disadvantages: easily damaged or lost; insecure
●Suitable for: temporary storage
Cloud services (Dropbox, Figshare, SkyDrive etc.)
●Advantages: automatic sync (some services); easy access
●Disadvantages: is it secure? No control over backup procedure
●Suitable for: data sharing
Sharepoint and OneDrive
"between dream and deed, laws stand in the way, and practical objections"
“there’s many a slip twixt cup and lip”
• Proof of Concept: end of April
• If it gets a green light, available at the end of September
Features per option
Sharepoint and OneDrive
Microsoft Sharepoint 2013
●Application: enterprise info (department, project)
●Data classification: up to Confidential
●Social functions: social
●Search facility: search facility
●Costs: high costs
●Work offline: no sync client for Linux
●AnyTime: 24/7 via app, web, Explorer
●AnyWhere: internal / external WUR
●AnyDevice: Windows, Mac, iOS, Android
●Share with 3rd parties: yes, via Federation
●User-friendly: self & simple
●Sourcing: on premises
Microsoft OneDrive business
●Application: personal data
●Data classification: up to Confidential
●Social functions: social integration
●Search facility: search integration
●Costs: very low
●Work offline: no sync client for Linux
●AnyTime: 24/7 via app, web, Explorer
●AnyWhere: internal / external WUR
●AnyDevice: Windows, Mac, iOS, Android
●Share with 3rd parties: yes, via Federation
●User-friendly: self & simple
●Sourcing: public cloud
Current environment
Sharepoint 2010
●Application: enterprise info (department, project)
●Data classification: up to Confidential
●Social functions: no social
●Search facility: mediocre search
●Costs: high costs
●Work offline: no sync client
●AnyTime: 24/7 via app, web, Explorer
●AnyWhere: internal / external WUR
●AnyDevice: Windows, Mac, iOS, Android
●Share with 3rd parties: X-Account
●User-friendly: complex
●Sourcing: on premises
M: / W: drive
●Application: all types of data
●Data classification: up to Secret
●Social functions: no social
●Search facility: limited search
●Costs: high costs
●Work offline: limited sync with M:\ drive on laptop
●AnyTime: 24/7 via Citrix
●AnyWhere: via Citrix
●AnyDevice: via Citrix / WURclient
●Share with 3rd parties: not possible
●User-friendly: not self / simple
●Sourcing: on premises
Short term storage – what are the issues?
Space
Access
●From where?
●By whom?
Versioning
Backups
Finding it again!
Short term storage: Basic tips
Space
●Try to estimate how much you will need
●How will you monitor use?
●What do you do if you need more?
●What is your procedure for deletion?
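One way to monitor use is a small script that adds up the file sizes in a project folder and compares the total with the space you were given. The sketch below is only an illustration: the function names, the quota value and the 80% warning threshold are assumptions, not WUR tooling.

```python
import os


def folder_size_bytes(root):
    """Sum the sizes of all regular files below root (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def usage_report(root, quota_bytes, warn_fraction=0.8):
    """Compare current use with an assumed quota and flag when it gets close.

    quota_bytes and warn_fraction are hypothetical values you choose yourself.
    """
    used = folder_size_bytes(root)
    return {
        "used_bytes": used,
        "quota_bytes": quota_bytes,
        "fraction_used": used / quota_bytes,
        "warning": used >= warn_fraction * quota_bytes,
    }
```

Running `usage_report` on the project folder at regular intervals gives you a simple record of growth, which also helps with the estimate for the next project.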
Access
●Think about who will need access and from where
●What is your alternative if there is temporarily no access?
●Does everyone have the same access and edit rights?
Short term storage: Basic tips
Versioning
●Use a file in one (online) location as the “master”, and do all your modifications and processing on copies of that master
●When you have consolidated your changes and do not want to lose them, replace the master file with the consolidated file
●Indicate versions clearly, especially which is the master!
●Use a naming convention that includes date or number (e.g. ..._v1, ..._v2)
●Keep track of ‘milestone files’
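A naming convention is easier to keep up if the next name is generated rather than typed by hand. The sketch below assumes one possible convention, `<date>_<name>_v<number><extension>`; the function name and the convention itself are illustrative choices, not a prescribed standard.

```python
import re
from datetime import date
from pathlib import Path

# Matches a trailing version marker such as "_v2" at the end of a file stem.
VERSION_PATTERN = re.compile(r"_v(\d+)$")


def next_version_name(existing_names, stem, suffix, on=None):
    """Return the next versioned file name for `stem`.

    existing_names: file names already present in the folder.
    stem: descriptive part of the name, e.g. "diet-results".
    suffix: file extension including the dot, e.g. ".csv".
    on: date to put in the name (defaults to today).
    """
    on = on or date.today()
    highest = 0
    for name in existing_names:
        p = Path(name)
        if p.suffix != suffix:
            continue
        match = VERSION_PATTERN.search(p.stem)
        if not match:
            continue
        base = p.stem[: match.start()]
        # Accept "diet-results" or "<date>_diet-results", nothing else.
        if base == stem or base.endswith("_" + stem):
            highest = max(highest, int(match.group(1)))
    return f"{on.isoformat()}_{stem}_v{highest + 1}{suffix}"
```

Putting the date first in ISO format (year-month-day) means that sorting the folder by name also sorts it by time.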
Short term storage: Basic tips
Backups
●As soon as possible
●Regularly
●How easily can you get hold of the backup?
●Make sure the backup is as independent as possible from the main storage
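A backup is only useful if the copy really matches the original. As a minimal sketch, assuming the backup location is simply another folder (ideally on a different device or service), the code below copies a file and compares checksums afterwards. The function names are hypothetical; a managed backup service would normally take care of this for you.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_file(source, backup_dir):
    """Copy source into backup_dir and verify that the copy is identical."""
    source = Path(source)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / source.name
    shutil.copy2(source, target)  # copy2 also keeps timestamps where possible
    if sha256_of(source) != sha256_of(target):
        raise IOError(f"Backup of {source} does not match the original")
    return target
```

The same `sha256_of` function can be used later to check how easily, and how reliably, you can get hold of the backup.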
Finding
●Use descriptive names (descriptive for others than just yourself!)
●Document your data
Why document your data?
For yourself
For data processing and analysis
Help in writing reports and papers
Reference for the future
●Will you still understand it in 2 months, 6 months, 2 years..?
Include failures and dead ends!
“On 19 September 1994, on the verge of giving up, Wiles had a flash of insight that the proof could be saved by returning to his original Horizontal Iwasawa theory approach, which he had abandoned in favour of the Kolyvagin–Flach approach, this time strengthening it with expertise gained in Kolyvagin–Flach's approach”
For others
Your research colleagues – the ‘lone genius’ is very rare.
Provenance and traceability
●Patents
●Fraud
Journals are starting to ask for the data behind the paper
Research institutes and funding institutions such as the EU and NWO also increasingly want the data
The importance of good documentation
“I have discovered a truly marvellous proof of this, which this margin is too narrow to contain”
Documentation = paper?
Data documentation
Structure is essential!
The structure comes from you!
A hierarchy of different files...
...or everything in one program
Example
Study to examine the effects of diet on health
- Conducted over 3 years by 3 researchers – Peter, Lisa and Anna
There are many ways to organise the data. We will look at three:
- By researcher
- By year
- By activity
Example
It is now the summer holidays in 2014. Peter and Anna are on holiday, and Lisa has received some urgent questions from the reviewers. They need to know:
●the procedure used to produce the high protein diet
●which bureau measured the data
●what sort of preprocessing was carried out on the data
Organisation by year/researcher
To find the right folder, you need to know what was done when, or by whom
Example – Organising by activity
Easy to navigate: for each question you quickly find the right folder, even if you have no prior knowledge.
Example – Organising by activity
You still need to do quite a lot of detective work to find the information: you have to rely on good names and guesswork, and read through the content of the files.
Structure AND metadata
Enter a brief description for each activity (folder)
It may help to identify inputs and outputs, or types of files (e.g. dataset, procedure, sample, document)
Linking to items produced in other activities allows you to:
● follow the workflow
● reuse items
● avoid problems due to multiple copies
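The description, inputs, outputs and links for an activity can be kept in a small file inside the activity folder itself. The sketch below assumes a JSON file called `activity.json`; the file name, the field names and the functions are illustrative, and an e-labnotebook program may store the same information in its own way.

```python
import json
from pathlib import Path

METADATA_FILE = "activity.json"  # hypothetical file name


def describe_activity(folder, description, inputs=(), outputs=(), links=()):
    """Write a short metadata record into an activity folder.

    inputs/outputs: names of items used and produced by this activity.
    links: paths to other activity folders whose items are used here,
           so that items are referenced rather than copied.
    """
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    record = {
        "description": description,
        "inputs": list(inputs),
        "outputs": list(outputs),
        "links": list(links),
    }
    (folder / METADATA_FILE).write_text(
        json.dumps(record, indent=2), encoding="utf-8"
    )
    return record


def read_activity(folder):
    """Read the metadata record back from an activity folder."""
    text = (Path(folder) / METADATA_FILE).read_text(encoding="utf-8")
    return json.loads(text)
```

Because the record lives next to the data, it travels with the folder when the folder is moved or archived.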
Example – Organising by activity plus extra structure
Easy to navigate: for each question you quickly find the right folder, even if you have no prior knowledge.
Descriptions and structure help you to find and understand the data
Links make the whole process traceable
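To show what traceable means in practice: once each activity records which other activities it links to, the full history behind a result can be followed automatically. The sketch below assumes the links have been collected into a simple dictionary; the activity names are made up for the diet study example.

```python
def trace_upstream(links, start):
    """Return every activity that `start` depends on, nearest first.

    links: dictionary mapping an activity name to the list of activities
           it links to (its direct inputs).
    """
    seen = []
    queue = list(links.get(start, []))
    while queue:
        current = queue.pop(0)
        if current == start or current in seen:
            continue  # skip repeats so circular links cannot loop forever
        seen.append(current)
        queue.extend(links.get(current, []))
    return seen


# Hypothetical activities for the diet study
study_links = {
    "analysis": ["preprocessing"],
    "preprocessing": ["measurement"],
    "measurement": ["diet-preparation"],
    "diet-preparation": [],
}
```

Calling `trace_upstream(study_links, "analysis")` returns `["preprocessing", "measurement", "diet-preparation"]`, which is the chain Lisa would follow to answer the reviewers' questions.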
Electronic lab notes
Notes taken in the lab are often unstructured
May also cover different projects
Splitting the notes per activity and structuring them helps
How far you go depends on the time you have and what is necessary for understanding the data
The same goes for other large, unstructured files
Structuring data
It takes time!
But it’s an investment – not time lost
Why document your data?
If you store your files in a structure with description and links, you can:
See your research in context
Find (and understand) information more easily
Make your research traceable
Make your research reusable
Questions?