Data management (newest version)


Page 1: Data management (newest version)

Attribution-NonCommercial-ShareAlike

1. Plan ahead

Managing needs

Ethics

Plagiarism

Note-taking

2. Organizing your data

Files

Metadata

RSS feeds

Manage your email

References

Remote access

Safekeeping

3. Preserving your data

What to keep/delete

Long-term storage

4. Market your data

Reasons to share

Reasons not to share

How?

G. Gabriel

LSC Library

Pocock House

235 Southwark Bridge Road

London SE1 6NP

[email protected]

© jannoon028, FreeDigitalPhotos.net

Manage your data

Page 2: Data management (newest version)

What is data?

©EpicGraphic.com

[Graphic labels: Presentation, Information, Data, Knowledge]

Page 3: Data management (newest version)

The Royal Society. (2012). Science as an open enterprise. London: The Royal Society.

What is data?

Page 4: Data management (newest version)

“‘research data’ are defined as factual records (numerical scores, textual records, images and sounds) used as primary sources for scientific research, and that are commonly accepted in the scientific community as necessary to validate research findings. A research data set constitutes a systematic, partial representation of the subject being investigated.”

What is research data?

OECD. (2007). OECD Principles and guidelines for access to research data from public funding. Available at www.oecd.org/sti/sci-tech/38500813.pdf (retrieved 1 October 2014).

Page 5: Data management (newest version)

EMC. (2012). The digital universe: 50-fold growth from the beginning of 2010 to the end of 2020 [picture]. Available at http://www.emc.com/leadership/digital-universe/iview/executive-summary-a-universe-of.htm (retrieved 14 August 2014).

Digital universe

Page 6: Data management (newest version)

• Video;

• Audio;

• Databases;

• Still images;

• Spreadsheets;

• Text documents;

• Instrument measurements;

• Experimental observations;

• Quantitative/qualitative data;

• Slides, artefacts, specimens, samples;

• Survey results & interview transcripts;

• Simulation data, models & software;

• Sketches, diaries, lab notebooks.

©Supertrooper, FreeDigitalPhotos.net

Types/formats of research data

©thmvmnt on Flickr

©David Castillo Dominici, FreeDigitalPhotos.net

Page 7: Data management (newest version)

©Stuart Miles, FreeDigitalPhotos.net

©Stuart Miles, FreeDigitalPhotos.net

Page 8: Data management (newest version)

Consider your data needs:

• Type of data created

– Consider what data will be created (e.g. interviews/transcripts, experimental measurements);

– Consider how data will be created/captured (e.g. recorded, written, printed);

– Consider the equipment/software required (find out whether funding is available in case new software is needed).

Plan ahead data management needs

Page 9: Data management (newest version)

Consider your data needs:

• Choose format(s)

– Which software/formats have you (or your colleagues) used in past projects?

– Which software/formats can be easily modified/shared (e.g. Microsoft Excel, SPSS)?

– Which formats are at risk of obsolescence?

– Which software is compatible with the hardware you already have?

Plan ahead data management needs

Page 10: Data management (newest version)

Consider your data needs:

• Volume of data created

– Consider where the data is going to be stored;

– Consider whether the scale of the data poses challenges when sharing/transferring it.

• Plan how to sort and analyse the data;

• Investigate the intellectual property rights (IPR) concerning your research and its dissemination, future related research projects, and any associated profit/credit.

Plan ahead data management needs

Page 11: Data management (newest version)

• Investigate data protection and ethics. The Data Protection Act 1998 governs the processing of personal data and requires it to comply with eight data protection principles. Personal data must be:

– processed fairly and lawfully;

– obtained for specified and lawful purposes;

– adequate, relevant and not excessive;

– accurate and, where necessary, kept up to date;

– not kept for longer than necessary;

– processed in accordance with the data subject's rights;

– kept secure;

– not transferred abroad without adequate protection.

Available at http://www.legislation.gov.uk/ukpga/1998/29/contents (retrieved 17 August 2014).

Plan ahead ethics

Page 12: Data management (newest version)

“Plagiarism is defined as submitting as one's own work, irrespective of intent to deceive, that which derives in part or in its entirety from the work of others without due acknowledgement. It is both poor scholarship and a breach of academic integrity.”

© Thomas Hawk via Flickr

University of Cambridge. (2011). University-wide statement on plagiarism. Available at http://www.admin.cam.ac.uk/univ/plagiarism/students/statement.html (retrieved 10 July 2014).

Plan ahead plagiarism

Page 13: Data management (newest version)

While you are reading/writing, make sure you identify:

• Which part is your own thought and which is taken from other authors;

• Which parts of your own writing are a response to the argument or directly inspired by ideas in the text;

• Which parts are paraphrases of the author’s points;

• Which parts were done in collaboration with others.

Plan ahead avoiding plagiarism

Page 14: Data management (newest version)

Design a reading grid to note the main ideas/data/research (including specific citations you may use later on).

• Quivy and Campenhoudt

Main ideas/content | Evaluation of ideas/content
1. e.g. Theory A considers… (pages x-x) | e.g. Different theories; carry out further research on those supporting theory x and theory y;
2. e.g. Theory B considers… |
3. e.g. Theory C… |

Plan ahead note-taking

Translated from: Quivy, R.; Campenhoudt, L. (2008). Manual de investigação em ciências sociais [Handbook of social science research] (5th ed.). Lisboa: Gradiva.

Page 15: Data management (newest version)

• The Cornell Method

Major themes | Detailed points
1st main point | e.g. There are several types of theories. More detailed information: e.g. Theory A explains…; Theory B explains…; Theory C explains…
2nd main point | e.g. Why do some believe in theory A? e.g. Reason 1…; Reason 2…

Critical evaluation: e.g. Neither theory A nor theory B explains the occurrence of xxx.

Plan ahead note-taking

Pauk, W. (1993). How to study in college (5th ed.). Boston: Houghton Mifflin Co.

Page 16: Data management (newest version)

Plan ahead further information

JISC Legal: copyright and intellectual property law http://www.jisclegal.ac.uk/LegalAreas/CopyrightIPR.aspx

JISC Legal: data protection overview www.jisclegal.ac.uk/LegalAreas/DataProtection/DataProtectionOverview.aspx

UK Data Archive: duty of confidentiality http://www.data-archive.ac.uk/create-manage/consent-ethics/legal?index=1

The Information Commissioner's Office guide to data protection http://www.ico.org.uk/for_organisations/data_protection/the_guide

Page 17: Data management (newest version)

LEKO via Jalopnik, ThePimp.Blog

Page 18: Data management (newest version)

When naming files:

• Adhere to existing procedures (those of your research group, or those preferred by your supervisor);

• Use folders and subfolders:

– Name folders appropriately (e.g. after the areas of work) and consistently;

– Structure folders hierarchically (a limited number of folders for the broader topics, with more specific folders within these);

– Separate ongoing and completed work.

Organize your data files

Page 19: Data management (newest version)

When naming files:

• Be consistent with filenames

– Choose a standard vocabulary, such as a numbering system (e.g. xxxx_v01.doc; 1930film0001.tif), and specify how many digits to use (the traditional standard is an eight-character limit on file names);

– Decide on a date format so that documents are listed chronologically (e.g. YYYY-MM-DD);

– Include a version control table for important documents.

Organize your data files
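The numbering and date conventions above can be sketched in Python. This is a minimal illustration, assuming a hypothetical pattern `project_YYYY-MM-DD_vNN.ext` and a hypothetical helper `make_filename`; it shows why zero-padded version numbers and year-first dates make alphabetical order match chronological order.

```python
from datetime import date


def make_filename(project: str, created: date, version: int, ext: str) -> str:
    """Build a name such as 'survey_2014-10-18_v03.docx' (hypothetical convention).

    The ISO 8601 date (YYYY-MM-DD) and the zero-padded two-digit version
    number mean that a plain alphabetical sort also lists files by date and version.
    """
    return f"{project}_{created.isoformat()}_v{version:02d}.{ext}"


names = [
    make_filename("survey", date(2014, 10, 18), 10, "docx"),
    make_filename("survey", date(2014, 9, 2), 2, "docx"),
    make_filename("survey", date(2014, 10, 18), 9, "docx"),
]

# Alphabetical order now matches chronological and version order.
for name in sorted(names):
    print(name)
```

Without the padding, "v10" would sort before "v9", which is the kind of mix-up a fixed number of digits avoids.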

Page 20: Data management (newest version)

When naming files:

• Be consistent with filenames

– Avoid characters such as / : * ? < > | (they are reserved by the operating system) and spaces; use hyphens or underscores instead, particularly for files destined for the Web;

– When drafts are circulating, decide how to identify each contributor's version (e.g. by adding their initials after the version number in xxxx_v01.doc);

– Mark the final document as “Final” and prevent further changes.

Organize your data files
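The advice on reserved characters and spaces can be automated. The following is a minimal Python sketch using a hypothetical helper `safe_filename`; besides the characters listed above, it assumes backslash and double quote should also be replaced, since Windows reserves them too. Hyphens are left untouched.

```python
import re

# Characters listed in the guidance (/ : * ? < > |), plus backslash and
# double quote, which Windows also reserves (an added assumption).
RESERVED = r'[/\\:*?<>|"]'


def safe_filename(name: str) -> str:
    """Replace reserved characters and runs of whitespace with underscores."""
    name = re.sub(RESERVED, "_", name)
    name = re.sub(r"\s+", "_", name.strip())
    # Collapse repeated underscores produced by adjacent replacements.
    return re.sub(r"_+", "_", name)


print(safe_filename("Interview notes: draft 2.docx"))  # Interview_notes_draft_2.docx
```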

Page 21: Data management (newest version)

Organize your data files

• Review records (assess materials regularly or at the end of a project to ensure files aren’t kept needlessly);

• Back up everything: your files, your data, and even your browser favourites.
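Regular record reviews can start from a list of files nobody has touched for a while. This is a minimal Python sketch, assuming a hypothetical helper `stale_files`, a hypothetical project folder `my_project`, and a one-year threshold; it only lists candidates for review and deletes nothing.

```python
import time
from pathlib import Path


def stale_files(folder: Path, days: int = 365) -> list[Path]:
    """List files under `folder` whose last modification is older than `days` days."""
    cutoff = time.time() - days * 24 * 60 * 60
    return sorted(
        p for p in folder.rglob("*")
        if p.is_file() and p.stat().st_mtime < cutoff
    )


if __name__ == "__main__":
    project = Path("my_project")  # hypothetical folder name
    if project.is_dir():
        for path in stale_files(project, days=365):
            print("review:", path)
```

Modification time is only a rough signal; whether a file should be kept is still a judgement made against the questions in the "What to keep/delete" section.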

Page 22: Data management (newest version)

• Use metadata (data about data, usually embedded in the data files/documents themselves) to add information to your documents (e.g. use Microsoft Office’s “Document properties”).

– Provide searchable information to help you and others find the data.

Organize your data metadata

Page 23: Data management (newest version)

• Standard metadata fields:

– Title (name of the dataset or research project);

– Creator (who created the data);

– Identifier (number used to identify the data);

– Subject(s) (keywords);

– Intellectual property rights held for the data;

– Access information (where/how data can be accessed by others);

– Methodology (how the data was generated);

– Versions (date/time stamp for each file).

Organize your data metadata
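The standard fields above can also be kept as a small machine-readable record stored next to the data. The Python sketch below is one possible layout, not a formal metadata standard. The class `DatasetMetadata`, its field names and every value shown are hypothetical placeholders, and the type hints assume Python 3.9 or later.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class DatasetMetadata:
    """One record holding the standard fields listed above (names are illustrative)."""
    title: str
    creator: str
    identifier: str
    subjects: list[str]
    rights: str
    access: str
    methodology: str
    versions: dict[str, str] = field(default_factory=dict)  # file name -> date/time stamp


record = DatasetMetadata(
    title="Example interview study",
    creator="A. Researcher",
    identifier="DATASET-0001",
    subjects=["interviews", "data management"],
    rights="Copyright held by the research team",
    access="On request from the project lead",
    methodology="Semi-structured interviews, transcribed verbatim",
    versions={"transcripts_v01.docx": "2014-10-18T14:30"},
)

# Could be saved alongside the data (e.g. as metadata.json) so it can be searched.
print(json.dumps(asdict(record), indent=2))
```

For sharing or deposit, a community schema such as the Data Documentation Initiative (listed under further information) is usually preferable to an ad hoc layout like this one.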

Page 24: Data management (newest version)

• Organize information from the web (news websites, blogs, etc.) in a feed reader (e.g. Feedly, Digg Reader, NewsBlur, Netvibes);

• Set up RSS feeds from databases.

©Vector, www.youtoart.com

Organize your data RSS feeds

Page 25: Data management (newest version)

• Structure your folders by subject, activity or project;

• Set up a separate folder for personal emails (create filters to sort them automatically);

• Archive old emails;

• Delete useless emails and block junk email;

• Limit the use of attachments (use alternative ‘data sharing’ options);

• Try applications that help you manage your email (see “7 great services for taking back control of your inbox”).

Organize your data manage your email

Page 26: Data management (newest version)

• Keep track of every bibliographic reference used/seen;

• Use reference management software;

• Back up your bibliographic data.

Organize your data references

Page 27: Data management (newest version)

©winnond, FreeDigitalPhotos.net

• Use a single technology/method of remote access, or

• Decide on clear rules for managing your remote access technologies;

• Designate one device as your “master” storage location;

• Transfer the latest versions of your files to your master device as soon as possible every time you work away from your master storage location;

• Back up your important files regularly.

Organize your data remote access

Page 28: Data management (newest version)

• Keep key printed data in a secure location (e.g. locked cupboards);

• Protect sensitive electronic data (including backups) with passwords or encryption, or set privileged levels of access;

• Do not use printouts with sensitive data as scrap paper. Decide on secure methods of disposal (e.g. shredding).

Organize your data safekeeping

Page 29: Data management (newest version)

• Do not leave computer terminals unattended, and log off at the end of each session;

• Protect your computer with anti-virus, firewall and anti-keylogging software;

• Choose strong passwords and change them frequently (if you store passwords on a computer system, encrypt the file).

Organize your data safekeeping
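One way to follow the advice on strong passwords is to generate them randomly rather than invent them. The following is a minimal Python sketch using the standard `secrets` module, which is designed for security-sensitive randomness. The helper name `generate_password`, the 16-character default and the character set are assumptions for illustration, not a policy from this guide.

```python
import secrets
import string

# Letters, digits and punctuation; some systems reject certain symbols,
# so this alphabet may need trimming (an assumption to check locally).
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length: int = 16) -> str:
    """Return a random password built with the cryptographically secure `secrets` module."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


print(generate_password())
```

A password manager does the same job and also stores the result encrypted, which matches the advice above about not keeping passwords in plain files.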

Page 30: Data management (newest version)

• Store crucial data in more than one secure location:

– Networked drives;

– Personal computers/laptops;

– External storage devices (CDs, DVDs, USB flash drives);

– Remote or online storage systems (Dropbox, Mozy, A-Drive, etc.).

Organize your data safekeeping
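Keeping copies in several locations is only useful if each copy is intact. This minimal Python sketch copies a file into several folders and confirms each copy with a SHA-256 checksum. It assumes each destination is a folder the computer can write to, such as a mounted networked or USB drive. Online services such as Dropbox are not covered. The helper names `sha256_of` and `back_up` and the folder names in the demo are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, destinations: list[Path]) -> dict[Path, bool]:
    """Copy `source` into each destination folder and check each copy's checksum."""
    original = sha256_of(source)
    results = {}
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        copy = Path(shutil.copy2(source, folder / source.name))  # copy2 keeps timestamps
        results[copy] = sha256_of(copy) == original
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        data = root / "results_v01.csv"
        data.write_text("id,score\n1,42\n")
        # Hypothetical stand-ins for a networked drive and an external disk.
        report = back_up(data, [root / "network_drive", root / "usb_drive"])
        for copy, ok in report.items():
            print(copy.name, "verified" if ok else "MISMATCH")
```

Dedicated backup software (see the further reading list) adds scheduling and versioning, which this sketch does not attempt.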

Page 31: Data management (newest version)

Organize your data further information

Data Documentation Initiative www.ddialliance.org

UK Data Archive: documenting your data www.data-archive.ac.uk/create-manage/document/overview

MIT Libraries: documentation and metadata http://libraries.mit.edu/guides/subjects/data-management/metadata.html

Online services that provide storage (e.g. Dropbox)

Online/desktop programs that store documents and keep track of the changes made to them (e.g. Git)

Page 32: Data management (newest version)

See: http://datalib.edina.ac.uk/mantra/

Organize your data further information

Page 33: Data management (newest version)

Jones, S. (2011). How to Develop a Data Management and Sharing Plan. Edinburgh: Digital Curation Centre. Available at: http://www.dcc.ac.uk/resources/how-guides/develop-data-plan (retrieved 17 February 2014).

Organize your data further information

Page 34: Data management (newest version)

©Pixabay.com

Page 35: Data management (newest version)

EMC. (2012). The digital universe in 2020: big data, bigger digital shadows, and biggest growth in the Far East. Available at http://www.emc.com/leadership/digital-universe/iview/executive-summary-a-universe-of.htm (retrieved 14 January 2014).

Preserving your data the cloud

Page 36: Data management (newest version)

• Does your funder require you to keep the data and/or make it available for a certain amount of time?

• Is the data a vital record of a project/organisation that therefore needs to be retained indefinitely?

• Do you have the legal and intellectual property rights to keep and re-use the data? If not, can these be negotiated?

• Does sufficient metadata exist to allow the data to be found wherever it is stored?

Preserving your data what to keep/delete?

Page 37: Data management (newest version)

• If you need to pay to keep the data, can you afford it?

• Only store what you need to keep! Storage costs money and/or effort, and storing massive amounts of data requires a well-thought-out plan for organizing it so that information is easily found.

Preserving your data what to keep/delete?

Page 38: Data management (newest version)

• Digital repository

Provides online archival storage (usually open access) and cares for digital materials, ensuring that they remain readable for as long as the repository survives.

• Archive/data center

Ensures data safekeeping in the long term: datasets are fully documented with all bibliographical details, and users of the data are made aware of the need to acknowledge the data sources in publications.

e.g. Archaeology Data Service

Preserving your data long term storage

Page 39: Data management (newest version)

Preserving your data further reading

DMPonline (Digital Curation Centre data management planning tool) https://dmponline.dcc.ac.uk

Digital Curation Centre: the value of digital curation www.dcc.ac.uk/digital-curation

UK Data Archive FAQ www.data-archive.ac.uk/help/user-faq#2

National Preservation Office: caring for CDs and DVDs www.bl.uk/blpac/pdf/cd.pdf

Wikipedia: list of backup software http://en.wikipedia.org/wiki/List_of_backup_software

Wikipedia: comparison of online backup services http://en.wikipedia.org/wiki/List_of_online_backup_services

Page 40: Data management (newest version)

Digital Curation Centre. (cop. 2004-2014). DCC curation lifecycle model [image]. Available at http://www.dcc.ac.uk/resources/curation-lifecycle-model (retrieved 17 February 2014).

Page 41: Data management (newest version)

©SOMMAI, FreeDigitalPhotos.net

Page 42: Data management (newest version)

• Scientific integrity: publishing your data and citing its location in published research papers allows others to replicate, validate or correct your results, thereby improving the scientific record.

• Funding mandates: UK research councils increasingly require data sharing, to avoid duplication of effort and save costs.

• Increase the impact of your research: researchers who use your data and cite it in their own work help to increase your impact within your field and beyond it.

Market your data reasons to share

Page 43: Data management (newest version)

• Preserve your data for future use: anyone (including you, once you have lost familiarity with the data, perhaps several years later) benefits from being able to identify, retrieve and understand it independently.

• Make publicly funded research available to the public: there is a growing movement in this direction, as reflected, for example, in the Organisation for Economic Co-operation and Development (OECD) Principles and Guidelines for Access to Research Data from Public Funding.

Market your data reasons to share

Page 44: Data management (newest version)

• Increase transparency through creating, disseminating and curating knowledge.

• Increase collaboration: the use of archived data by other researchers may lead to collaboration with the data owner and to co-authorship of publications based on re-use of the data.

Market your data reasons to share

Page 45: Data management (newest version)

• If your data has financial value or is the basis for potentially valuable patents, it may be unwise to share it, even with a data licence or terms and conditions attached.

• If the data contains sensitive personal information about human subjects, sharing it may violate the Data Protection Act, ethics codes or written consent forms; do not share such data even with other researchers. Note: there are often ways to anonymise the data by removing personally identifying information, making it sharable as a public use dataset.

Market your data reasons not to share

Page 46: Data management (newest version)

• If parts of the data are owned by others (such as commercial entities or authors), you may not have the rights to share the data, even if you have derived wholly new data from the original sources.

Market your data reasons not to share

Page 47: Data management (newest version)

• Publish in Open Access journals;

• Enhance your online presence through social media (e.g. Facebook, Twitter) and by starting and maintaining a blog;

• Use author identifiers (ResearcherID from Web of Science, Scopus Author ID, ORCID);

• Share research on “academic” platforms (LinkedIn, Academia.edu, ResearchGate, Microsoft Academic Search, Mendeley);

• Keep track of different metrics (e.g. number of citations).

Market your data how?

Page 48: Data management (newest version)

Digital Curation Centre: overview of major funders’ data policies

SHERPA/JULIET: searchable international database of funders' open access and archiving requirements.

Times Higher Education supplement: "Research intelligence - Request hits a raw spot" (15 July 2010).

DOAJ – Directory of Open Access Journals (with information on the OA journal preservation program and OA quality standards).

OAD – Open Access Directory.

Market your data Further information

Page 49: Data management (newest version)

Guidance leaflet by the DICE, SHARD and PrePARe projects.

Summary

Page 50: Data management (newest version)

LSC Library

Pocock House

235 Southwark Bridge Road

London SE1 6NP

[email protected]/lsclondon

Attribution-NonCommercial-ShareAlike