D5.6 Updated EDSA Data Management Plan · Project acronym: EDSA Project full name: European Data...

Project acronym: EDSA Project full name: European Data Science Academy Grant agreement no: 643937 D5.6 Updated EDSA Data Management Plan Deliverable Editor: Emily Vacher (ODI) Other contributors: Deliverable Reviewers: R. Brochenin (Tu/e) / Shatha Jaradat (KTH) Deliverable due date: 31/07/2016 Submission date: 29/07/2016 Distribution level: Public Version: 1.0 This document is part of a research project funded by the Horizon 2020 Framework Programme of the European Union


Change Log

| Version | Date | Amended by | Changes |
| --- | --- | --- | --- |
| 0.1 | 27/05/2016 | Emily Vacher | Created document, added initial plan outline |
| 0.2 | 31/05/2016 | Emily Vacher | Added datasets and WP descriptions |
| 0.3 | 10/06/2016 | Emily Vacher | Incorporated amendments |
| 0.4 | 23/06/2016 | Emily Vacher | Updated with new published data |
| 0.5 | 28/06/2016 | Emily Vacher | Incorporated amendments |
| 0.6 | 27/07/2016 | Elena Simperl | Scientific Review |
| 1.0 | 29/07/2016 | Aneta Tumilowicz | Final QA |


2016 © Copyright lies with the respective authors and their institutions.

Table of Contents

Change Log
Table of Contents
List of Tables
List of Figures
1 Executive Summary
  1.1 Lessons learnt
  1.2 Updates from Initial DMP
2 Policy
  2.1 Data standards and metadata policy for EDSA
  2.2 Data sharing policy for EDSA
  2.3 Supporting people who want to use EDSA data
  2.4 Data storage and management policy for EDSA
  2.5 Data preservation and archiving policy for EDSA
3 Challenges and decisions
  3.1 Informed consent
  3.2 Anonymisation of personal data
  3.3 Third party licences
4 Data Management Plan
  4.1 Summary
  4.2 The EDSA Register
    4.2.1 Introduction
  4.3 Work package 1 - Demand analysis and advisory board
    4.3.1 Corpora of crawled web-based adverts from LinkedIn
    4.3.2 Aggregated statistics of European skill demand based on web-based job adverts
    4.3.3 Individual results from demand analysis
    4.3.4 Summary data from surveys and interviews
    4.3.5 De-identified survey responses from demand analysis
    4.3.6 Recordings and transcriptions of interviews
    4.3.7 ideXlab search platform results
  4.4 Work package 2 - Curricula and course development
    4.4.1 Related course data regarding similar modules and training available across the EU
    4.4.2 Dataset for course examples and exercises
    4.4.3 Event log from a municipality process
  4.5 Work package 3 - Training delivery and learning analytics feedback
    4.5.1 Repository statistics on downloads and views of educational resources
    4.5.2 Learning Analytics data generated from the EDSA Online Courses portal
    4.5.3 Internal log of eLearning systems
    4.5.4 Statistics of course registration, participation and completion
    4.5.5 Aggregated statistics of engagement with the developed courses and educational resources
    4.5.6 Recorded behavior of students following the first session of the process mining MOOC
  4.6 Work package 4 - Dissemination and community building
    4.6.1 Web server logs and Google analytics of project website access
    4.6.2 Generated social media engagement data
  4.7 Work package 5 - Exploitation
    4.7.1 List of project exploitation results - collaborations, institutional and geographical beneficiaries
    4.7.2 The EDSA Register

List of Tables

Table 1: Entries in the Data Management Plan
Table 2: Four Levels of Certificates
Table 3: Corpora of crawled Web-based adverts from LinkedIn
Table 4: Aggregated Statistics of European skill demand on web-based job adverts
Table 5: Individual results from demand analysis
Table 6: Summary data from surveys and interviews
Table 7: De-identified survey responses from demand analysis
Table 8: Recordings and transcriptions of interviews
Table 9: ideXlab search platform results
Table 10: Related course data regarding similar modules and training available across the EU
Table 11: Dataset for course examples and exercises
Table 12: Event log from a municipality process
Table 13: Repository statistics on downloads and views of educational resources
Table 14: Learning analytics data generated from EDSA online courses portal
Table 15: Internal log of elearning systems
Table 16: Statistics of course registration, participation and completion
Table 17: Aggregated statistics of engagement with the developed courses and educational resources
Table 18: Recorded behaviour of students following the first session of the process mining MOOC
Table 19: Web server logs and Google analytics of project website access
Table 20: Generated social media engagement data
Table 21: List of project exploitation results - collaborations, institutional and geographical beneficiaries
Table 22: The EDSA Register

List of Figures

Figure 1: Entries in the Data Management Plan
Figure 2: The Data Spectrum


1 Executive Summary

The European Data Science Academy (EDSA) is participating in the pilot action on open access research data, as defined in the guidelines on Open Access to Scientific Publications and Research Data in Horizon 2020 [1]. EDSA data includes data that is used, generated and collected by the project. The data management plan (DMP) is key to tracking these datasets, and identifying which of them have been or can be published under an open licence. It is not always appropriate to publish data as open data; the data management plan allows us to clearly see what data is not published openly and the reason for that.

This is the second iteration of the DMP. The first was included in D5.5 at Month 6 of the project. The original DMP was an outline of the data we anticipated, whereas this DMP includes data that we have started collecting, generating and using in the project. The final version of the DMP is due at Month 36 of the project.

The EDSA DMP includes the following information for each dataset:

- Dataset reference and name
- Dataset description
- Standards and metadata
- Data sharing
- Archiving and preservation

Specifically, our goals are to:

- Manage and maintain data, where applicable, to ensure quality and to make the data usable.
- Ensure that all data produced by the project is subject to appropriate levels of security and privacy.
- Publish data produced by the project under an open licence, where possible.

At the halfway stage of the project, this DMP addresses some of the key challenges and lessons that the Consortium has learnt. These are outlined below. The plan also outlines how individual datasets will be maintained during and after the project. We will continue to update datasets where appropriate, collect and generate new datasets and publish as much of our data as open as possible, using the lessons that we have learnt.

1.1 Lessons learnt

The management of EDSA data has provided us with some useful lessons for further iterations of the DMP. These lessons can be divided into two categories: licensing and managing risk.

[1] Guidelines on Open Access to Scientific Publications and Research Data in Horizon 2020 (2016) http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-pilot-guide_en.pdf [accessed 29/06/2016]


Licensing

- Check third party licences at regular intervals to ensure that we are adhering to their terms, as licence changes are generally not communicated by data providers.
- Encourage the use of Creative Commons licences [2], specifically CC-BY, within the EDSA project to make it as easy as possible to reuse our data.

Managing risk

- Anticipate changes in data use where possible.
- Get informed consent for use of personal data.

We will add to these lessons as the project continues and we face more challenges with our data use.

1.2 Updates from Initial DMP

The EDSA DMP is not a static log of the project datasets but an evolving resource that reflects the changing nature of the data the project collects or generates. This DMP reflects the progress the project has made in the past 12 months, incorporating further datasets and adhering to our guiding policies on open publication.

D5.5 provided an initial snapshot of the data we managed in the early stages of the project and the data that we anticipated would be collected or generated over the coming months. This DMP reflects the current status of the project's datasets in more extensive detail. Figure 1 shows the datasets which have been collected or generated by the Consortium between months 6 and 18 of the project.

There are two new entries to the DMP:

- De-identified survey responses from the demand analysis research
- Learning Analytics data generated from the EDSA Online Courses portal

The de-identified survey responses have replaced the aggregated results from the online survey dataset, as the way the data is presented has been changed. For a more detailed discussion on this topic see the section on 'Challenges and decisions', or D1.4.

Learning Analytics from the EDSA Online Courses portal is a new entry to this DMP. It has been published openly via GitHub [3], for others to benefit from, as there are no restrictions with the third party.

The entries below have been removed from this DMP as they have either been replaced by specific dataset entries, or are no longer expected to be collected or generated as the project has progressed. The removed entries are:

[2] Please note that there are multiple Creative Commons licences, which are outlined on their website: https://creativecommons.org/licenses/
[3] https://alexmikro.github.io/learning-analytics-dataset-from-the-edsa-online-courses-portal/


- Aggregated results from the online survey (as above)
- Aggregated statistics of networking and engagement data (datasets now explicitly stated)
- Linked open data sources, such as the DBLP Computer Science Bibliography [4] and GeoNames Ontology [5] (datasets now explicitly stated)
- Publicly available governmental, financial, network and environmental datasets for each course (datasets now explicitly stated)

Table 1: Entries in the Data Management Plan

| Work Package | Lead | Dataset | Project Phase | Status | New entry to DMP D5.6 |
| --- | --- | --- | --- | --- | --- |
| WP1 | ODI | Corpora of crawled web-based adverts from LinkedIn | M6-M18 | Finished | No |
| WP1 | ODI | Aggregated statistics of European skill demand based on web-based job adverts | M6-M18 | Ongoing | No |
| WP1 | ODI | Individual results from demand analysis | M2-M18 | Finished | No |
| WP1 | ODI | Demand Analysis Summary | M18 | Finished | Yes |
| WP1 | ODI | De-identified data from demand analysis | M2-M18 | Finished | Yes |
| WP1 | ODI | Recordings and transcriptions of interviews | M2-M18 | Finished | No |
| WP1 | ODI | ideXlab search platform results | M6-M36 | Ongoing | No |
| WP2 | ODI | Related course data regarding similar modules and training offerings across the EU | M18 | Finished | No |
| WP2 | Persontyle | Datasets for course examples and exercises | M6-M36 | Ongoing | No |
| WP2 | TU/e | Event log from a municipality process | M12-M36 | Ongoing | No |
| WP3 | OU | Learning Analytics data generated from the EDSA Online Courses portal | M12-M36 | Ongoing | Yes |
| WP3 | JSI | Repository statistics on downloads and views of educational resources | M12-M36 | Ongoing | No |
| WP3 | JSI | Internal logs of elearning systems | M12-M36 | Ongoing | No |
| WP3 | JSI | Statistics of course registration, participation and completion | M12-M36 | Ongoing | No |
| WP3 | JSI | Aggregated statistics of engagement with the developed courses and educational resources | M12-M36 | Ongoing | No |
| WP3 | TU/e | Recorded behavior of students following the first session of the process mining MOOC | M12 | Finished | No |
| WP4 | SOTON | Web server logs and Google analytics of project website access | M12-M36 | Ongoing | No |
| WP4 | SOTON | Generated social media engagement data | M12-M36 | Ongoing | No |
| WP5 | ideXlab | List of project exploitation results - collaborations, institutional and geographical beneficiaries | M18-M36 | Ongoing | No |
| WP5 | ODI | EDSA register | M6-M36 | Ongoing | Yes |

[4] http://dblp.uni-trier.de/
[5] http://www.geonames.org/ontology/documentation.html


2 Policy

In D5.5 we outlined the overall EDSA policies for data standards and metadata standards, data sharing and data preservation, in line with best practice for establishing a DMP [6].

2.1 Data standards and metadata policy for EDSA

Standardising the project's collection and production of data ensures reusability and interoperability within the project, and externally if openly available. Where possible, data is made available in CSV, JSON or linked data in RDF format.
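As a minimal sketch of this format policy, the same tabular records can be serialised to both CSV and JSON using only the Python standard library. The field names and sample values below are hypothetical placeholders and are not taken from any EDSA dataset.

```python
import csv
import io
import json

# Hypothetical records; the field names and values are illustrative only.
records = [
    {"dataset": "example-dataset-a", "work_package": "WP1", "licence": "CC-BY-4.0"},
    {"dataset": "example-dataset-b", "work_package": "WP3", "licence": "CC-BY-4.0"},
]


def to_csv(rows):
    """Serialise a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(rows):
    """Serialise the same rows to an indented JSON array."""
    return json.dumps(rows, indent=2)


csv_text = to_csv(records)
json_text = to_json(records)
```

Publishing both forms from one in-memory structure keeps the two files consistent with each other; an RDF serialisation would need a vocabulary choice that is outside the scope of this sketch.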

2.2 Data sharing policy for EDSA

Where possible, open data will be provided so that others are able to access, use and share the data. This data will be made available under a Creative Commons Attribution licence (CC BY 4.0).

The Open Data Institute has produced a data spectrum asset to explain frequently used, but frequently misinterpreted terms, such as open data, closed data, personal data, and big data. The most useful categorisation of data is through the licence and access rights. Data exists on a spectrum [7], which ranges from closed to shared, to open.

Figure 1: The Data Spectrum (CC-BY The Open Data Institute)

[6] http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-pilot-guide_en.pdf
[7] https://theodi.org/data-spectrum


The survey and interview data from the demand analysis provides us with a good example of data across the spectrum. The list of names and email addresses of participants is closed. This is currently only held by one member of the Consortium and will only be used if required for audit purposes. The individual recordings and transcripts are an example of shared data. These are only available to members of the consortium who have been given named access. The de-identified survey results from the demand analysis are open. This data is published on GitHub under a CC-BY licence [8].

2.3 Supporting people who want to use EDSA data

We use the ODI's Open Data Certificate standard to benchmark each open dataset. This will enable users to see when the data will be updated, what format the data is in, what support is available and where it came from. Where we have published data openly, we have used the Open Data Institute's certification process to demonstrate to potential reusers that it is quality open data. There are four levels of Certificates [9]:

Table 2: Four Levels of Certificates

| Level | Requirements |
| --- | --- |
| Bronze | The data is openly licensed, available with no restrictions, accessible and legally reusable. |
| Silver | The data satisfies the Bronze requirements, the data is documented in a machine readable format, is reliable and offers ongoing support from the publisher via a dedicated communication channel. |
| Gold | The data satisfies the Silver requirements, is published in an open standard machine readable format, has guaranteed regular updates, offers greater support, documentation, and includes a machine readable rights statement. |
| Platinum | The data satisfies the Gold requirements, has machine readable provenance documentation, uses unique identifiers in the data, the publisher has a communications team offering support. This is an exceptional example of an information infrastructure. |

CC-BY The Open Data Institute

[8] http://davetaz.github.io/quantitative-data-from-edsa-demand-analysis-/
[9] https://certificates.theodi.org/en/


Currently, not all of the data that has been published has been certified, although this is our aim and in progress. EDSA datasets that currently have a certificate:

- The EDSA Register has a Bronze certificate because it is:
  - Openly licensed and legally reusable (= 'open')
  - Accessible on the web
- De-identified survey responses from the demand analysis has a Silver certificate because it is:
  - Openly licensed and legally reusable (= 'open')
  - Accessible on the web
  - Published in a machine readable format
  - Supported on an ongoing basis by the publisher via GitHub

It is important to note that there is no issue in being at Bronze level, as the data is still published openly to a level that meets user needs. A higher level is not always appropriate for the EDSA data, as there is not always a mechanism in place for ongoing discussion of the data (Silver), and it may not be updated regularly, especially beyond the length of the project (Gold). Over 99% of all ODI certificates are to Bronze standard [10].

2.4 Data storage and management policy for EDSA

There are currently three main types of repository for EDSA data: open access repositories (for example GitHub), the EDSA project website and internal institutional and organisational repositories for securely holding data. The criteria for determining where data is stored are as follows:

Open access repositories: We are following a policy of 'open by default.' If there is no reason why the data cannot or should not be published openly, then our policy is that it should be published under an open licence. Open data about individuals should be de-identified, and only published with the consent of the individuals concerned. The data should also be unrestricted by terms of use.

The EDSA project website: The aim is that all of the data that is published openly will be made available via the EDSA website. This is to ensure that the data is findable by as wide an audience as possible. Data that is openly licensed but difficult to discover is not widely considered to be open data. The EDSA website also displays data that cannot be published openly, often due to restrictions in terms of use. This allows users to view the data, or aggregations of the data.

Internal institutional and organisational repositories: Some datasets in the data management plan are hosted in repositories of the organisation responsible for that data. While some of these are internal, hosted in Consortium partners' internal repositories, some datasets used in course materials are hosted on external repositories; therefore that organisation is responsible for maintaining the data. Datasets hosted in internal repositories cannot be published, usually due to restrictions of use of personal data.

[10] https://certificates.theodi.org/status

2.5 Data preservation and archiving policy for EDSA

Striving for preservation of data will enable long-term value to be added to the domain beyond the project. It will also prove a valuable resource to a European-wide initiative (EDSA) initiated as part of work package 5. Although the aim of the project is to preserve as much of the data as possible, data published in external open repositories is reliant on that system. As a project, EDSA are yet to determine a policy regarding archiving of datasets. This will be decided prior to the final Data Management Plan (M36).


3 Challenges and decisions

Creating and maintaining a DMP for the project has ensured that we have been able to highlight potential data management and usage challenges and make informed decisions within the Consortium on how they should be addressed. In this section, we highlight some of the main topics of consideration when managing project datasets. These provide us with useful lessons for further iterations of the DMP.

3.1 Informed consent

It is a legal requirement to inform people how their personal data is going to be used and to obtain their informed consent. Whilst there are exceptions to this requirement, such as national security or for services in the public interest, these exceptions do not apply to this project. This is an area that we have addressed in the project over the last 12 months.

The intended use of the demand analysis survey data changed over the course of the 18-month data collection, due to the length of the study. At the start of the project, we planned to release summary statistics of the quantitative survey data through our skills dashboard. Accordingly, participants were informed that data would be made accessible in an anonymous, aggregated form.

Following discussions during the evaluation of the pilot study, we established that releasing de-identified survey responses would add value to the project's outputs. This data would provide access to responses on an individual basis, thus adding much greater detail and utility for potential reusers of the data. Consequently, the data would no longer be aggregated. In month 9 of the project, we therefore changed the wording of the informed consent section of the survey, stating that data could later be made publicly available in de-identified formats, using an open licence.

Due to this change in use, we have not used the data collected before the permissions change on the dashboard, or in the open dataset. The data collected from early study participants has been included in the aggregated analysis in D1.4, but is not available in the open data. We have also not published data from people who had withheld consent for this use of their data.

3.2 Anonymisation of personal data

It was important to ensure that the Learning Analytics data was anonymised before it could be published openly and that no individual user could be identified. The Consortium came to an agreed policy which will be applied where appropriate to further datasets. We will publish data openly if the data has been de-identified, and individual users cannot be recognised. De-identified data will also be published alongside a Privacy Impact Assessment which identifies potential risks and how they have been managed.
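The consent and de-identification decisions described in sections 3.1 and 3.2 can be sketched as a simple filtering step. This is an illustration only, not the project's actual pipeline: the field names (`name`, `email`, `consent_open_publication`, `collected_on`), the cut-off date and the sample rows are all hypothetical. Dropping direct identifiers does not by itself guarantee that individuals cannot be recognised, which is why the policy pairs publication with a Privacy Impact Assessment.

```python
from datetime import date

# Hypothetical direct identifiers to strip before publication.
DIRECT_IDENTIFIERS = {"name", "email"}

# Hypothetical administrative fields that are not needed in the open dataset.
ADMIN_FIELDS = {"consent_open_publication", "collected_on"}


def publishable(responses, cutoff):
    """Return de-identified copies of the responses that may be published.

    A response is kept only if it was collected on or after `cutoff`
    (a stand-in for the date the consent wording changed) and the
    participant consented to open publication.
    """
    result = []
    for response in responses:
        if response["collected_on"] < cutoff:
            continue  # collected under the earlier, aggregated-only wording
        if not response["consent_open_publication"]:
            continue  # participant withheld consent for open publication
        cleaned = {
            key: value
            for key, value in response.items()
            if key not in DIRECT_IDENTIFIERS and key not in ADMIN_FIELDS
        }
        result.append(cleaned)
    return result


# Hypothetical sample data for illustration.
sample = [
    {"name": "A", "email": "a@example.org", "collected_on": date(2020, 1, 5),
     "consent_open_publication": True, "sector": "finance", "skill": "statistics"},
    {"name": "B", "email": "b@example.org", "collected_on": date(2019, 12, 1),
     "consent_open_publication": True, "sector": "health", "skill": "databases"},
    {"name": "C", "email": "c@example.org", "collected_on": date(2020, 2, 1),
     "consent_open_publication": False, "sector": "retail", "skill": "visualisation"},
]

open_rows = publishable(sample, cutoff=date(2020, 1, 1))
```

Only the first sample row passes both checks; the second predates the hypothetical cut-off and the third lacks consent.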


3.3 Third party licences

There are many types of open licences. Several times, the Consortium faced challenges when scouring the terms and conditions of third party sites, such as LinkedIn [11], Adzuna API [12] and Learning Locker [13], to ensure the terms of use are adhered to. For the EDSA data, the Consortium is encouraged to use Creative Commons licences to ensure that people wishing to use our data can clearly find how they can use it. Alternative open licences used in the project are the open source GNU General Public Licence V3 [14] and the 3TU.Datacentrum General Terms of Use [15].

When data is collected from a third party website, it is vital to track the terms of use, as these can change. At the beginning of the project, the Consortium collected and published data from a third party website, LinkedIn. At the time, the terms allowed such use, but during the project the licence provided by LinkedIn changed. We did not keep a record of the original licence. There was debate about whether the project could keep the data openly available, as at the time of collection this was permitted. To avoid risk we decided to remove that data, especially as the links to the licence that we had used specifically stated that we could not use it in the way we had planned.

Another notable challenge was the use of data from Trovit [16], a website that aggregates job advertisements from across Europe. This data populates the jobs dashboard. The terms of the licence did not allow us to use the data; however, after careful consideration we decided as a Consortium that the UK text and data mining (TDM) exception for research purposes [17] allowed the use of the data as long as the data itself was not accessible by others. The UK law follows guidance from the EU Database Directive (96/9/EC), and discussions on an EU-wide TDM exception are likely to take place in 2016.

Restrictions on data use frequently prevent individuals from maximising the value of that data. If the data was open, and anyone could access, use and share it for any purpose, Trovit would ultimately benefit from increased coverage and traffic, via attribution. If a company does not want anyone to benefit financially from their work, a non-commercial licence such as CC-BY-NC 4.0 [18] would still enable others to use the data and link back to Trovit.

[11] https://developer.linkedin.com/legal/api-terms-of-use
[12] https://developer.adzuna.com/
[13] http://learninglocker.net/
[14] http://www.gnu.org/licenses/gpl-3.0.en.html
[15] http://researchdata.4tu.nl/fileadmin/editor_upload/pdf/General_terms_of_use_3TU.Datacentrum.pdf
[16] http://jobs.trovit.co.uk/
[17] http://www.legislation.gov.uk/ukdsi/2014/9780111112755p6 (accessed 30/06/2016)
[18] https://creativecommons.org/licenses/by-nc/4.0/


4 Data Management Plan

4.1 Summary

When creating the EDSA DMP, we took guidance from best practice and from online tools such as DMPTool [19] and DMPonline [20]. DMPonline allows you to select the specific project category, in this case the Horizon 2020 pilot action on open access research data, which helped ensure that we captured all the necessary metadata.

The datasets are organised by work package. Each table represents one dataset generated or collected by the EDSA project. Each table includes the following information:

- Dataset reference and name
- Dataset description
- Standards and metadata
- Data sharing
- Archiving and preservation

We created a dataset of all the datasets generated or collected by the EDSA project. Details of this can be found below.

4.2 TheEDSARegister

4.2.1 Introduction

The EDSA Register is published under a CC‐BY 4.0 Creative Commons licence21. It is published onGitHub22andcanalsobeaccessedviatheEDSAwebsiteathttp://edsa‐project.eu/resources/datasets/ThedatasethasbeencertifiedasBronzeusingtheOpenDataCertificates23.ThisdatasetisupdatedeverythreemonthsbytheODIwithinformationfromtheWorkPackageleads.Thenextupdateisdueatmonth21oftheproject.WiththeConsortiumworkpackageleads,weexploredthedatasetsforeachworkpackage,enablingdiscoveryofwhatdatacouldbepublishedopenly.Welooktosharebestpracticesandtoensureahighqualityofopendata.Bestpracticesinclude:

Publishinginamachinereadableformat,e.g.CSV Providingsupportingdocumentationormetadata Using a clear open licence, preferably Creative Commons Attribution 4.0 licence24 for

consistency.
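The three best practices listed above can be illustrated with a short script. This is a minimal sketch under stated assumptions: the rows, field names and metadata layout are invented for illustration and are not taken from any EDSA dataset or from a particular metadata standard.

```python
import csv
import io
import json

# Hypothetical rows, for illustration only; not taken from any EDSA dataset.
rows = [
    {"country": "UK", "skill": "machine learning", "adverts": 120},
    {"country": "NL", "skill": "data visualisation", "adverts": 45},
]


def to_csv(records):
    """Serialise a list of dicts as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(records[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def describe(title, records, licence_name, licence_url):
    """Build a small metadata record to publish next to the CSV file."""
    return {
        "title": title,
        "licence": {"name": licence_name, "url": licence_url},
        "fields": list(records[0].keys()),
        "row_count": len(records),
    }


csv_text = to_csv(rows)
metadata = describe(
    "Example skills demand",
    rows,
    "CC BY 4.0",
    "https://creativecommons.org/licenses/by/4.0/",
)
print(csv_text)
print(json.dumps(metadata, indent=2))
```

Publishing the CSV text together with the metadata record gives a reuser the data, its structure and its licence in one place.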

[19] https://dmp.cdlib.org/
[20] https://dmponline.dcc.ac.uk/
[21] https://creativecommons.org/licenses/
[22] https://theodi.github.io/european-data-science-academy-register/
[23] https://certificates.theodi.org
[24] https://creativecommons.org/licenses/


We use a Google Sheet as our data management tool, which we update every three months. This Google Sheet is embedded on the EDSA website [25], and any edits made to it show on the website in real time.

4.3 Work package 1 – Demand analysis and advisory board

WP1 has collected and generated data from the demand analysis study. This includes recordings and transcriptions of the interviews, survey responses and anonymised results of the surveys and interviews.

4.3.1 Corpora of crawled web-based adverts from LinkedIn

Table 3: Corpora of crawled web-based adverts from LinkedIn

Dataset Reference and Name
Dataset Identifier: WebSiteHarvest

Dataset description
Generated or collected: Collected
Origin: LinkedIn
Scale: 46 terms, 31 languages, 47 countries, 1 harvest per day, 2162 data points per day
Who is this useful for? Internal demand analysis and to inform curriculum development.
Similar existing datasets: Many datasets are collected in this area; however, due to the specific nature of this study, collection of new data is required and integration with existing datasets is not viable. The value of this dataset comes from the provision of an up-to-date snapshot of current data science skills needs across the EU.

Standards and metadata
Methodology for data collection/management: All data collected is translated into CSV format.
Metadata, supporting material: Data will not be available for reuse or accessible by anyone outside of the project. The data collected will be used for internal analysis to inform the creation of curriculum.
Status and location of metadata: Metadata is not publicly available

[25] http://edsa-project.eu/datasets


Data sharing
Licensing, data protection, ownership and copyright: Usage of the LinkedIn service is bound by the user agreement
If the data cannot be published openly, why? The terms of the LinkedIn user agreement now forbid harvesting and collection of data without express permission. When the data was collected, this was not the case. https://www.linkedin.com/legal/user-agreement?trk=hb_ft_userag
How will the data be shared? Data will not be shared or available for reuse
Data repository: Internal ODI repository
Dataset link: There is no external link

Archiving and preservation
How long should the data be preserved? Until the end of the project
Approx end volume: <1 GB
Who is responsible for the data management and curation? ODI lead data management and curation, other WP1 partners will contribute
Quality assurance including backup procedures: Backed up to an internal ODI repository
Associated costs for data management: Approximately 1 day effort per month

4.3.2 Aggregated statistics of European skill demand based on web-based job adverts

Table 4: Aggregated statistics of European skill demand based on web-based job adverts

Dataset Reference and Name
Dataset Identifier: WebSiteStatistics

Dataset description
Generated or collected: Collected


Origin: Adzuna API [26], Trovit [27]
Scale: Varied
Who is this useful for? Populating the dashboard, internal demand analysis and to inform curriculum development.
Similar existing datasets: Many datasets are collected in this area; however, due to the specific nature of this study, collection of new data is required and integration with existing datasets is not viable. The value of this dataset comes from the provision of an up-to-date snapshot of current data science skills needs across the EU.

Standards and metadata
Methodology for data collection/management: All data collected is translated into CSV format.
Metadata, supporting material: The Adzuna data is accessible via the Adzuna API. The Trovit data will not be available for reuse or accessible by anyone outside of the project.
Status and location of metadata: Metadata is not publicly available

Data sharing
Licensing, data protection, ownership and copyright: The data will be available for use via the EDSA dashboard. However, it will not be available to download as this contravenes Trovit’s terms and conditions.
If the data cannot be published openly, why? Trovit’s terms of use prohibit the use of their data. The research exception allows us to use the data but not to make it available in raw format for others to consume for commercial purposes.
How will the data be shared? Via the EDSA dashboard
Data repository: In an internal JSI repository
Dataset link: N/A

Archiving and preservation
How long should the data be preserved? Until the end of the project
Approx end volume: <1 GB

[26] https://developer.adzuna.com/
[27] https://www.trovit.com/


Who is responsible for the data management and curation? ODI lead data management and curation, other WP1 partners will contribute
Quality assurance including backup procedures: Backed up to an internal JSI repository
Associated costs for data management: Approximately 1 day effort per month

4.3.3 Individual results from demand analysis

Table 5: Individual results from demand analysis

Dataset reference and name
Dataset identifier: IndividualResponses

Dataset description
Generated or collected: Generated
Origin: Guided surveys and online responses
Scale: 584 surveys, 108 interviews
Who is this data useful for? Internal demand analysis.
Similar existing datasets: A number of surveys exist in this domain but their data is not available to this project. This data will enable EDSA to build up a country by country view of current capacity and requirements for data science skills.

Standards and metadata
Methodology for data collection/management: Data collection methods outlined in D1.4. Translated into CSV format.
Metadata, supporting material: Data will not be available for reuse or accessible by anyone outside of the project. The data collected will be used for internal analysis to inform the creation of curriculum. De-identified data will be publicly available, where possible.
Status and location of metadata: Metadata is not publicly available

Data Sharing
Licensing, ownership and copyright: Raw data will be owned by the project and unlicensed. It will not be available for reuse.
If the data cannot be published openly, why? Data protection of personal data


How will the data be shared? Data will not be shared or available for reuse
Data repository: Internal ODI repository
Dataset link: There is no external link

Archiving and preservation
How long should the data be preserved? Until the end of the project
Approximate end volume: <100 MB
Who is responsible for data curation and management? ODI lead data management and curation, other WP1 partners will contribute
Quality assurance including backup procedures: Backed up to an internal ODI repository
Associated costs for data management: Approximately 1 day effort per month

4.3.4 Summary data from surveys and interviews

Table 6: Summary data from surveys and interviews

Dataset reference and name
Dataset identifier: DemandAnalysisSummary

Dataset description
Generated or collected: Generated
Origin: Guided surveys and online responses
Scale: 584 surveys, 108 interviews
Who is this data useful for? External analysis of respondents who took the surveys and interviews.
Similar existing datasets: None

Standards and metadata
Methodology for data collection/management: Data collection methods outlined in D1.4. Translated into CSV format.
Metadata, supporting material: A README.md file is available detailing the data structure and basic usage.
Status and location of metadata: https://theodi.github.io/edsa-demand-analysis-summary-data/

Data Sharing
Licensing, ownership and copyright: Creative Commons Attribution (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/


If the data cannot be published openly, why? The data is published openly
How will the data be shared? Data will be available to access from the EDSA website and the ODI’s GitHub repository.
Data repository: GitHub / EDSA website
Dataset link: https://theodi.github.io/edsa-demand-analysis-summary-data/

Archiving and preservation
How long should the data be preserved? As long as GitHub exists as a minimum. Beyond that a value judgement would have to be made.
Approximate end volume: <100 MB
Who is responsible for data curation and management? ODI lead data management and curation, other WP1 partners will contribute
Quality assurance including backup procedures: Stored in external repositories - EDSA website and GitHub
Associated costs for data management: Stored in external repositories - EDSA website and GitHub

4.3.5 De-identified survey responses from demand analysis

Table 7: De-identified survey responses from demand analysis

Dataset reference and name
Dataset identifier: DeidentifiedResponses

Dataset description
Generated or collected: Generated
Origin: Online survey http://edsa-project.eu/resources/survey/
Scale: 496 survey results
Who is this data useful for? External analysis of results and trends by anyone who wishes to gather survey data in the area of data science
Similar existing datasets: There are a number of other surveys that have been aggregated that we can compare our results to, and we can use these results if necessary. This dataset has the same eventual value as others in the area.

Standards and metadata
Methodology for data collection/management: Data collection methods outlined in D1.4. Translated into CSV format.


Metadata, supporting material: A README.md file is available detailing the data structure and basic usage.
Status and location of metadata: http://davetaz.github.io/quantitative-data-from-edsa-demand-analysis-/

Data Sharing
Licensing, ownership and copyright: Creative Commons Attribution (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
If the data cannot be published openly, why? The data is published openly
How will the data be shared? Data will be available to view on the EDSA dashboard and accessible for free in the EDSA dashboard GitHub repository.
Data repository: GitHub / EDSA Dashboard on website
Dataset link: http://davetaz.github.io/quantitative-data-from-edsa-demand-analysis-/

Archiving and preservation
How long should the data be preserved? As long as GitHub exists as a minimum. Beyond that a value judgement would have to be made.
Approximate end volume: <100 MB
Who is responsible for data curation and management? ODI lead data management and curation, other WP1 partners will contribute
Quality assurance including backup procedures: Stored in external repositories - EDSA website and GitHub
Associated costs for data management: Stored in external repositories - EDSA website and GitHub
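The plan keeps raw individual responses internal and publishes de-identified responses openly. The following is a minimal sketch of one simple de-identification step, dropping direct identifiers before export. It is not the procedure the project followed (the collection methods are outlined in D1.4); the column names and rows are hypothetical, and a real release would also need a review of indirect identifiers.

```python
import csv
import io

# Hypothetical raw survey export; column names and values are invented.
RAW_CSV = """respondent_id,name,email,country,sector,uses_big_data
r-104,Ann Example,ann@example.org,UK,Finance,yes
r-231,Ben Sample,ben@example.org,DE,Health,no
"""

# Columns assumed to identify a person directly.
DIRECT_IDENTIFIERS = {"respondent_id", "name", "email"}


def deidentify(raw_csv, drop=DIRECT_IDENTIFIERS):
    """Return CSV text without direct identifiers, keyed by a neutral row number."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    kept = [field for field in reader.fieldnames if field not in drop]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["row"] + kept, lineterminator="\n")
    writer.writeheader()
    for number, record in enumerate(reader, start=1):
        clean = {field: record[field] for field in kept}
        clean["row"] = number
        writer.writerow(clean)
    return out.getvalue()


public_csv = deidentify(RAW_CSV)
print(public_csv)
```

The original respondent identifier is replaced by a sequential row number so that published rows cannot be joined back to the internal file by key.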

4.3.6 Recordings and transcriptions of interviews

Table 8: Recordings and transcriptions of interviews

Dataset reference and name
Dataset identifier: InterviewTranscipts

Dataset description
Generated or collected: Generated
Origin: Interviews
Scale: 108 transcripts, 108 recordings
Who is this data useful for? Internal demand analysis


Similar existing datasets: No similar datasets exist that are usable for this project. The interviews provide insights and data points for use in the demand analysis.

Standards and metadata
Methodology for data collection/management: Qualitative and quantitative research methodology for collection outlined in D1.4
Metadata, supporting material: Data will not be available for reuse or accessible by anyone outside of the project. The data collected will be used for internal analysis to inform the creation of curriculum.
Status and location of metadata: Metadata is not publicly available

Data Sharing
Licensing, ownership and copyright: Raw data will be owned by the project and unlicensed. It will not be available for reuse.
If the data cannot be published openly, why? Data protection of personal data
How will the data be shared? Data will not be shared or available for reuse
Data repository: Internal ODI repository
Dataset link: There is no external link

Archiving and preservation
How long should the data be preserved? Until the end of the project
Approximate end volume: <3 GB
Who is responsible for data curation and management? ODI lead data management and curation
Quality assurance including backup procedures: Backed up to an internal ODI repository
Associated costs for data management: As part of the subcontracting costs of WP1

4.3.7 ideXlab search platform results

Table 9: ideXlab search platform results

Dataset Reference and Name
Dataset Identifier: ExpertIdentification

Dataset description
Generated or collected: Collected
Origin: Research publications


Scale: Not yet known as collection is ongoing
Who is this useful for? Internal demand analysis and to inform curriculum development. Provides insights into offer side of skills analysis.
Similar existing datasets: Not in this area. This dataset will provide validation of the demand analysis and form the basis for further insights.

Standards and metadata
Methodology for data collection/management: The ideXlab search engine will use the sampling approach outlined in D1.2 for data collection. CSV data will be created.
Metadata, supporting material: Data will not be available for reuse or accessible by anyone outside of the project. The data collected will be used for internal analysis to inform the creation of curriculum.
Status and location of metadata: Accompanying document to explain data structure. This will not be made open.

Data sharing
Licensing, data protection, ownership and copyright: Raw data will be owned by the project and unlicensed. It will not be available for reuse.
If the data cannot be published openly, why? Protection of personal data
How will the data be shared? The data will not be shared due to restrictions on the use of personal data.
Data repository: ideXlab search platform
Dataset link: There is no external link

Archiving and preservation
How long should the data be preserved? Until the end of the project
Approx end volume: Est. 1000 returns
Who is responsible for the data management and curation? ideXlab lead data management and curation, other WP1 partners will contribute
Quality assurance including backup procedures: Backed up to an internal ideXlab repository
Associated costs for data management: Approx 2 person days per month. No other external costs


4.4 Work package 2 – Curricula and course development

WP2 has collected data from openly available sources and created subsets of this data to be used in the learning resources produced. Data has also been collected about existing data science courses as per earlier recommendations.

4.4.1 Related course data regarding similar modules and training available across the EU

Table 10: Related course data regarding similar modules and training available across the EU

Dataset Reference and Name
Dataset Identifier: DataScienceCourses

Dataset description
Generated or collected: Collected
Origin: Course websites
Scale: Not yet known
Who is this useful for? Internal use for development of curricula and learning materials.
Similar existing datasets: None. The data will provide a useful resource as part of the demand analysis.

Standards and metadata
Methodology for data collection/management: Systematic search and review of available data science courses. The search terms were Data Science, Big Data, Data Analytics, Business Analytics, Machine Learning, Distributed Computing, Advanced Computing Data Science Stream, Data Analytics stream.
Metadata, supporting material: Metadata has been published alongside the data
Status and location of metadata: https://theodi.github.io/data-science-courses-in-europe-2016/

Data sharing
Licensing, data protection, ownership and copyright: The data is licensed under a Creative Commons CC-BY 4.0 licence
If the data cannot be published openly, why? The data is published openly
How will the data be shared? GitHub / EDSA website
Data repository: GitHub. Also available via the EDSA website
Dataset link: https://theodi.github.io/data-science-courses-in-europe-2016/


Archiving and preservation
How long should the data be preserved? Until the end of the project
Approx end volume: <1 GB
Who is responsible for the data management and curation? ODI lead data management and curation
Quality assurance including backup procedures: Backed up to an internal ODI repository
Associated costs for data management: As part of the subcontracting costs of WP1. No ongoing costs.

4.4.2 Dataset for course examples and exercises

Table 11: Dataset for course examples and exercises

Dataset Reference and Name
Dataset Identifier: Using namespace notation to specify R packages: sml::poly4, sml::poly4b, sml::kmeans, sml::seeds, car::Duncan, car::Davis, datasets::car, datasets::HairEyeColor, datasets::Airquality, datasets::swiss, bestGLM::zprostate, MASS::menarche

Dataset description
Generated or collected: Both
Origin: Third party R packages students download from CRAN. Some in an author developed package hosted on CRAN
Scale: 12 small datasets. <1 MB
Who is this useful for? Students in the "Essentials of Data Analytics and Machine Learning" course.
Similar existing datasets: Datasets are archived in CRAN. Used in course examples and exercises.

Standards and metadata
Methodology for data collection/management: None


Metadata, supporting material: The datasets will be used within learning activities offered as part of the "Essentials of Data Analytics and Machine Learning" course. They are stored in the sml R package.
Status and location of metadata: Package documentation (except, currently, for those in the sml package)

Data sharing
Licensing, data protection, ownership and copyright: GNU GPL V3 http://www.gnu.org/licenses/gpl-3.0.en.html
If the data cannot be published openly, why? The data is published openly
How will the data be shared? Via R packages, searchable online.
Data repository: CRAN
Dataset link: https://vincentarelbundock.github.io/Rdatasets/datasets.html

Archiving and preservation
How long should the data be preserved? As long as the owners do not remove them. If the datasets are no longer accessible, other similar datasets will be used in the module.
Approx end volume: <1 MB
Who is responsible for the data management and curation? Persontyle lead data management and curation, third parties for collected data
Quality assurance including backup procedures: Relying on CRAN
Associated costs for data management: None


4.4.3 Event log from a municipality process

Table 12: Event log from a municipality process

Dataset Reference and Name
Dataset Identifier: a07386a5-7be3-4367-9535-70bc9e77dbe6

Dataset description
Generated or collected: Collected
Origin: Dutch municipality
Scale: 200 KB
Who is this useful for? Users interested in real life event logs.
Similar existing datasets: Large collection of real life event logs at http://data.3tu.nl/repository/collection:event_logs_real

Standards and metadata
Methodology for data collection/management: Management through 3TU datacentre
Metadata, supporting material: Includes number of traces, events, attributes, time span, etc.
Status and location of metadata: http://data.3tu.nl/repository/uuid:a07386a5-7be3-4367-9535-70bc9e77dbe6

Data sharing
Licensing, data protection, ownership and copyright: Own licence (Attribution, non-commercial) http://researchdata.4tu.nl/fileadmin/editor_upload/pdf/General_terms_of_use_3TU.Datacentrum.pdf
If the data cannot be published openly, why? The data is available publicly. As there are restrictions of use with the licence, this cannot be considered ‘open data’
How will the data be shared? Via 3TU Datacentre
Data repository: 3TU Datacentre
Dataset link: http://data.3tu.nl/repository/uuid:a07386a5-7be3-4367-9535-70bc9e77dbe6

Archiving and preservation
How long should the data be preserved? Past project end


Approx end volume: 200 KB
Who is responsible for the data management and curation? 3TU
Quality assurance including backup procedures: Reliant on third party. If the dataset becomes unavailable we will use a similar one in the online module.
Associated costs for data management: None

4.5 Work package 3 – Training delivery and learning analytics feedback

WP3 has started collecting data on the training delivered in the project, both face-to-face and online, and will continue to collect data as more training is created and delivered.

This includes data on course registration, participation and student retention rate. We use this data to inform best practices for students and educators, and to improve the curricula and content. There is still a lot to be explored around the learning analytics data, especially as we continue to create more online modules. Different partners have created modules using different software, for example Coursera [28], Tin Can API (xAPI) [29] and Learning Locker [30].

4.5.1 Repository statistics on downloads and views of educational resources

Table 13: Repository statistics on downloads and views of educational resources

Dataset Reference and Name
Dataset Identifier: RepositoryStatistics

Dataset description
Generated or collected: Collected
Origin: videolectures.net
Scale: Views and comments for each video lecture
Who is this useful for? Internal analysis, curriculum development, external demand analysis

[28] https://www.coursera.org
[29] http://tincanapi.com/
[30] http://learninglocker.net/


Similar existing datasets: None. Provides evidence of resource usage and basis for improving curriculum, content and course structure.

Standards and metadata
Methodology for data collection/management: CSV is used for Videolectures API
Metadata, supporting material: Videolectures REST API documentation. A Markdown README file is available for download
Status and location of metadata: https://github.com/innanoval/edsa-videolectures-statistics-dataset-1/tree/gh-pages/data

Data sharing
Licensing, data protection, ownership and copyright: The data is published under a CC-BY licence.
If the data cannot be published openly, why? N/A
How will the data be shared? Available to see at videolectures website; described as part of WP3 deliverables; published on GitHub
Data repository: GitHub / videolectures repository. Proximity to data source.
Dataset link: https://github.com/innanoval/edsa-videolectures-statistics-dataset-1/tree/gh-pages/data

Archiving and preservation
How long should the data be preserved? The data will be available after the project ends as part of the project's learning materials
Approx end volume: <1 GB
Who is responsible for the data management and curation? JSI lead data management and curation. OU contribute
Quality assurance including backup procedures: videolectures - relying on internal quality assurance & back up procedures
Associated costs for data management: Approximately 1 day per month during the project’s lifetime


4.5.2 Learning analytics data generated from the EDSA Online Courses portal

Table 14: Learning analytics data generated from EDSA online courses portal

Dataset Reference and Name
Dataset Identifier: EDSAOnlineCoursesLA

Dataset description
Generated or collected: Generated
Origin: http://courses.edsa-project.eu
Scale: Not yet known
Who is this useful for? Course producers can get an understanding of how their courses are being used. Learners can monitor their learning progress.
Similar existing datasets: Not many Learning Analytics datasets are publicly available. The OU has recently published a similar dataset: https://analyse.kmi.open.ac.uk/open_dataset

Standards and metadata
Methodology for data collection/management: The xAPI specification is used for expressing the data; the open source Learning Locker software is used for storing and visualising the data.
Metadata, supporting material: Introduction to the xAPI (or Tin Can API): https://tincanapi.com/overview/. Introduction to Learning Locker: https://learninglocker.net
Status and location of metadata: https://tincanapi.com/overview/, https://learninglocker.net, https://alexmikro.github.io/learning-analytics-dataset-from-the-edsa-online-courses-portal/

Data sharing
Licensing, data protection, ownership and copyright: Creative Commons Attribution (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
If the data cannot be published openly, why? The data is published openly.
How will the data be shared? Via the EDSA website / GitHub
Data repository: We have set up a dedicated EDSA Learning Locker. This was chosen for the reasons outlined in https://learninglocker.net/benefits/


Dataset link: https://alexmikro.github.io/learning-analytics-dataset-from-the-edsa-online-courses-portal/

Archiving and preservation
How long should the data be preserved? At least until the end of project
Approx end volume: Not yet known
Who is responsible for the data management and curation? OU lead data management and curation.
Quality assurance including backup procedures: Relying on the backup procedures of the OU, as the dataset is hosted on an OU server.
Associated costs for data management: Server storage has already been purchased. Effort for analysing the data has been allocated in Task 3.4.
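Table 14 states that the xAPI specification is used for expressing the learning analytics data. For readers unfamiliar with xAPI, the following is a minimal sketch of a single statement, built around the actor, verb and object structure that the specification defines. The learner, the activity URL path and the helper function are hypothetical and do not describe the actual contents of the EDSA Learning Locker.

```python
import json
from datetime import datetime, timezone


def make_statement(learner_email, learner_name, verb, activity_url, activity_name, when):
    """Build a minimal xAPI statement (actor, verb, object) as a dict."""
    return {
        "actor": {
            "objectType": "Agent",
            "name": learner_name,
            "mbox": f"mailto:{learner_email}",
        },
        "verb": {
            # ADL publishes common verbs under this IRI prefix.
            "id": f"http://adlnet.gov/expapi/verbs/{verb}",
            "display": {"en-GB": verb},
        },
        "object": {
            "objectType": "Activity",
            "id": activity_url,
            "definition": {"name": {"en-GB": activity_name}},
        },
        "timestamp": when.isoformat(),
    }


# Hypothetical learner and activity; the URL path below is invented.
statement = make_statement(
    "learner@example.org",
    "Example Learner",
    "completed",
    "http://courses.edsa-project.eu/example-module",
    "Example module",
    datetime(2016, 6, 30, 12, 0, tzinfo=timezone.utc),
)
print(json.dumps(statement, indent=2))
```

Because the actor field identifies a person, statements of this kind would need de-identification or aggregation before being published as open data.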

4.5.3 Internal log of eLearning systems

Table 15: Internal log of eLearning systems

Dataset description
Generated or collected: Collected
Origin: videolectures.net
Scale: 20,000 videos, 17,431 lectures, 12,998 authors, 952 events, 579 categories
Who is this useful for? Internal demand analysis
Similar existing datasets: None. Provides evidence of resource usage and basis for improving curriculum, content and course structure.

Standards and metadata
Methodology for data collection/management: JSON is used for Videolectures API
Metadata, supporting material: Videolectures REST API documentation
Status and location of metadata: N/A


Data sharing
Licensing, data protection, ownership and copyright: Raw data will be owned by the project and unlicensed. It will not be available for reuse.
If the data cannot be published openly, why? Privacy. Data requires anonymisation and/or aggregation, and at the moment the use case for anonymised data is not clear.
How will the data be shared? Available to see at videolectures website; described as part of WP3 deliverables
Data repository: videolectures repository. Proximity to data source.
Dataset link: There is no external link

Archiving and preservation
How long should the data be preserved? At least until the end of project
Approx end volume: N/A
Who is responsible for the data management and curation? JSI lead data management and curation. OU contribute
Quality assurance including backup procedures: Videolectures - relying on internal quality assurance & back up procedures
Associated costs for data management: N/A

4.5.4 Statistics of course registration, participation and completion

Table 16: Statistics of course registration, participation and completion

Dataset Reference and Name
Dataset Identifier: StatisticsForCourses

Dataset description
Generated or collected: Collected
Origin: videolectures.net
Scale: For videolectures - available per video lecture, per viewer
Who is this useful for? Internal demand analysis


Similar existing datasets: None. Provides basis for improving curriculum, content and course structure.

Standards and metadata
Methodology for data collection/management: JSON is used for Videolectures API
Metadata, supporting material: Videolectures REST API documentation
Status and location of metadata: N/A

Data sharing
Licensing, data protection, ownership and copyright: Raw data will be owned by the project and unlicensed. It will not be available for reuse.
If the data cannot be published openly, why? Privacy. Data requires anonymisation and/or aggregation. It is intended that this data will be published before the end of the project.
How will the data be shared? Available to see at videolectures website; described as part of WP3 deliverables
Data repository: videolectures repository. Proximity to data source.
Dataset link: N/A

Archiving and preservation
How long should the data be preserved? At least until the end of project
Approx end volume: <1 GB
Who is responsible for the data management and curation? JSI lead data management and curation. OU contribute
Quality assurance including backup procedures: videolectures - relying on internal quality assurance & back up procedures
Associated costs for data management: N/A


4.5.5 Aggregated statistics of engagement with the developed courses and educational resources

Table 17: Aggregated statistics of engagement with the developed courses and educational resources

Dataset Reference and Name
Dataset Identifier: AggregatedStatistics

Dataset description
Generated or collected: Generated
Origin: videolectures.net
Scale: For videolectures - available per video lecture, per viewer
Who is this useful for? Internal analysis, demand analysis
Similar existing datasets: None. Provides evidence of adoption and basis for improving curriculum, content and course structure.

Standards and metadata
Methodology for data collection/management: JSON is used for Videolectures API
Metadata, supporting material: Videolectures REST API documentation
Status and location of metadata: N/A

Data sharing
Licensing, data protection, ownership and copyright: Raw data will be owned by the project and unlicensed. It will not be available for reuse.
If the data cannot be published openly, why? Privacy. Data that does not contain privacy issues might be publishable
How will the data be shared? Available to see at videolectures website; described as part of WP3 deliverables
Data repository: videolectures repository. Proximity to data source.
Dataset link: N/A

Archiving and preservation

How long should the data be preserved? At least until the end of the project

Approx end volume: <1GB

Who is responsible for the data management and curation? JSI lead data management and curation. OU contribute.

Quality assurance including backup procedures: Videolectures - relying on internal quality assurance and backup procedures

Associated costs for data management: Approximately 1 day of effort per month

4.5.6 Recorded behaviour of students following the first session of the process mining MOOC

Table 18: Recorded behaviour of students following the first session of the process mining MOOC

Dataset Reference and Name

Dataset Identifier: CourseraMOOCprocmin001

Dataset description

Generated or collected: Collected

Origin: coursera.org

Scale: Several large tables

Who is this useful for? Learning analytics within EDSA

Similar existing datasets: Every Coursera course has this data recorded

Standards and metadata

Methodology for data collection/management: Data collection is managed by Coursera

Metadata, supporting material: There is no external link to the metadata

Status and location of metadata: There is no external link to the metadata

Data sharing

Licensing, data protection, ownership and copyright: Raw data is managed by TU/e and cannot be shared due to Coursera restrictions of use.

If the data cannot be published openly, why? Restrictions of use from the data provider

How will the data be shared? This data will not be published openly

Data repository: The data is collected by and stored on a Coursera repository.

Dataset Link: There is no external link to the data.

Archiving and preservation

How long should the data be preserved? N/A

Approx end volume: Around 1GB

Who is responsible for the data management and curation? Joos Buijs

Quality assurance including backup procedures: N/A

Associated costs for data management: N/A

4.6 Work package 4 – Dissemination and community building

WP4 has continued to collect data from web server logs and Google Analytics for the project website, as well as social media engagement data from Twitter and LinkedIn. This allows for monitoring of the project's community building and dissemination. Aggregated statistics of the networking and engagement data will be produced and included in D4.4 and D4.5.

4.6.1 Web server logs and Google Analytics of project website access

Table 19: Web server logs and Google Analytics of project website access

Dataset Reference and Name

Dataset Identifier: WebsiteAnalytics

Dataset description

Generated or collected: Collected

Origin: http://edsa-project.eu

Scale: 1 website

Who is this useful for? Internal analysis for dissemination and community analysis. Secondary use for implicit demand analysis.

Similar existing datasets: None. Provides evidence of engagement and basis for UX improvement.

Standards and metadata

Methodology for data collection/management: Quantitative recording of website traffic via the Google Analytics dashboard, analysed using a variety of analytic tools.

Metadata, supporting material: Sessions, Pageviews, Demographics, User Flow, Bounce rate.

Status and location of metadata: There is no metadata publicly available as the data is not openly published. All sections that will be used are within https://analytics.google.com/

Data sharing

Licensing, data protection, ownership and copyright: Raw data will be owned by the project and unlicensed. It will not be available for reuse.

If the data cannot be published openly, why? User privacy. The data can be aggregated and published under an open licence. A judgement call will have to be made on whether this is worth it.

How will the data be shared? Analysed data will be made available throughout deliverable reports in WP4.

Data repository: Internal institutional Soton/OU repositories

Dataset Link: There is no external link

Archiving and preservation

How long should the data be preserved? At least until the end of the project

Approx end volume: <1GB

Who is responsible for the data management and curation? OU lead data management and curation. Soton contribute.

Quality assurance including backup procedures: Backed up remotely

Associated costs for data management: Free storage. 0.5 day per month

4.6.2 Generated social media engagement data

Table 20: Generated social media engagement data

Dataset Reference and Name

Dataset Identifier: SocialMediaEngagements

Dataset description

Generated or collected: Collected

Origin: Twitter

Scale: 1 Twitter account

Who is this useful for? Internal analysis for community strength and project dissemination.

Similar existing datasets: None that relate to EDSA. Provides evidence for engagement with the project and effectiveness of dissemination activities. Provides basis for understanding what content users find most engaging.

Standards and metadata

Methodology for data collection/management: Regular access of data from analytics.twitter.com

Metadata, supporting material: Tweets, Impressions, Profile Visits, Followers, Mentions

Status and location of metadata: https://analytics.twitter.com/user/edsa_project/home

Data sharing

Licensing, data protection, ownership and copyright: Data will be licensed in compliance with each social network's terms and conditions

If the data cannot be published openly, why? Data sharing needs to comply with individual site licences. However, the majority of social networks do not permit collection, harvesting and republication of data.

How will the data be shared? Dashboard on the EDSA website. Deliverable reports in WP4.

Data repository: Internal institutional Soton repositories

Dataset Link: There is no external link as the terms and conditions have not yet been checked.

Archiving and preservation

How long should the data be preserved? Until the end of the project

Approx end volume: <1GB

Who is responsible for the data management and curation? Soton lead data management and curation.

Quality assurance including backup procedures: Backed up remotely

Associated costs for data management: Free storage. 1 day per month

4.7 Work package 5 – Exploitation

WP5 will generate an ongoing list of established collaboration initiatives, institutions benefiting from the project and geographical regions using the project’s results. The EDSA Register is an additional dataset that comes under this work package.

4.7.1 List of project exploitation results - collaborations, institutional and geographical beneficiaries

Table 21: List of project exploitation results - collaborations, institutional and geographical beneficiaries

Dataset Reference and Name

Dataset Identifier: ProjectExploitation

Dataset description

Generated or collected: Generated

Origin: Project partners

Scale: Variable

Who is this useful for? Internal analysis for results to be exploited and targets

Similar existing datasets: None. Provides data on dissemination activity, network and results.

Standards and metadata

Methodology for data collection/management: Report detailing results from interviews and exploitation activities

Metadata, supporting material: This data will be internal only

Status and location of metadata: This data will be internal only

Data sharing

Licensing, data protection, ownership and copyright: Raw data will be owned by the project and unlicensed. It will not be available for reuse.

If the data cannot be published openly, why? Confidentiality

How will the data be shared? Deliverable reports in WP5.

Data repository: Google Docs shared document

Dataset Link: This data will be internal only

Archiving and preservation

How long should the data be preserved? Until the end of the project

Approx end volume: <500MB

Who is responsible for the data management and curation? ideXlab lead data management and curation

Quality assurance including backup procedures: Backed up remotely

Associated costs for data management: Free storage. 1 day per month

4.7.2 The EDSA Register

Table 22: The EDSA Register

Dataset Reference and Name

Dataset Identifier: EDSARegister

Dataset description

Generated or collected: Generated

Origin: Project partners

Scale: <500KB

Who is this useful for? Anyone interested in understanding the datasets used within the EDSA project. Internal management tool.

Similar existing datasets: None.

Standards and metadata

Methodology for data collection/management: Project partners update every three months until the end of the project. ODI responsible for conversion to CSV and publication as open data.

Metadata, supporting material: A README.md file is available detailing the data structure and basic usage.

Status and location of metadata: https://theodi.github.io/european-data-science-academy-register/

Data sharing

Licensing, data protection, ownership and copyright: This dataset is published on GitHub, under a CC-BY licence.

If the data cannot be published openly, why? N/A

How will the data be shared? Via GitHub and via the EDSA website (http://edsa-project.eu/resources/datasets/)

Data repository: GitHub

Dataset Link: https://theodi.github.io/european-data-science-academy-register/

Archiving and preservation

How long should the data be preserved? As long as GitHub exists as a minimum. Beyond that a value judgement would have to be made.

Approx end volume: <500KB

Who is responsible for the data management and curation? ODI lead data management and curation, other WP1 partners will contribute

Quality assurance including backup procedures: Stored in external repositories - EDSA website and GitHub

Associated costs for data management: Stored in external repositories - EDSA website and GitHub; approximately 2 days per month effort for maintenance.
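The table above notes that partner updates to the register are converted to CSV before publication. The plan does not specify the tooling, so the following is only an illustrative sketch of such a conversion using the Python standard library. The column names in the usage example are hypothetical and do not reflect the actual structure of the published register, which is documented in its README.md.

```python
import csv
import io


def register_to_csv(entries, fieldnames):
    """Serialise register entries (a list of dicts) as CSV text.

    Fields missing from an entry are written as empty cells so that
    every row has the same columns as the header.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for entry in entries:
        writer.writerow({name: entry.get(name, "") for name in fieldnames})
    return buffer.getvalue()


# Hypothetical usage with illustrative column names:
# register_to_csv(
#     [{"identifier": "EDSARegister", "licence": "CC-BY"}],
#     ["identifier", "licence"],
# )
```

Fixing the column order through `fieldnames` keeps successive quarterly exports comparable, which makes differences between versions easier to review in a version-controlled repository.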