Data Quality Control and Assurance · Quality assurance and Quality control Data contamination...
Data Quality Control and Assurance
DataONE Community Engagement & Outreach Working Group
Lesson Topics
Definitions
  Quality assurance and quality control
  Data contamination
  Types of errors
QA/QC best practices
  Before data collection
  During data collection/entry
  After data collection/entry
CC image by cobalt123 on Flickr
Learning Objectives
After completing this lesson, the participant will be able to:
  Define data quality control and data quality assurance
  Perform quality control and assurance on their data at all stages of the research cycle
CC image by 0xFCAF on Flickr
The Data Life Cycle
DataONE Life Cycle
Definitions
Data Contamination:
  Process or phenomenon, other than the one of interest, that affects the variable value
  Erroneous values
Definitions: Types of Errors
Errors of Commission
  Incorrect or inaccurate data entered
  Examples: malfunctioning instrument, mistyped data
Errors of Omission
  Data or metadata not recorded
  Examples: inadequate documentation, human error, anomalies in the field
CC image by Nick J Webb on Flickr
Defining QA/QC
Strategies for preventing errors from entering a dataset
Activities to ensure quality of data before collection
Activities that involve monitoring and maintaining the quality of data during the study
QA/QC Before Collection
Define & enforce standards
  Formats
  Codes
  Measurement units
  Metadata
Assign responsibility for data quality
  Be sure assigned person is educated in QA/QC
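
One way to enforce standards that were defined before collection is to encode them once and check every record against them. The sketch below is only an illustration: the field names, the site codes, the date format, and the `check_record` helper are all hypothetical and would be replaced by whatever a project actually agrees on.

```python
from datetime import datetime

# Hypothetical project standards, agreed on before collection begins.
STANDARDS = {
    "date_format": "%Y-%m-%d",         # format standard
    "site_codes": {"A1", "A2", "B1"},  # allowed codes
    "temp_unit": "C",                  # measurement unit
}

def check_record(record):
    """Return a list of the ways a record violates STANDARDS."""
    problems = []
    try:
        datetime.strptime(record["date"], STANDARDS["date_format"])
    except ValueError:
        problems.append("date does not match the agreed date format")
    if record["site"] not in STANDARDS["site_codes"]:
        problems.append("unknown site code: " + record["site"])
    if record["temp_unit"] != STANDARDS["temp_unit"]:
        problems.append("temperature unit must be " + STANDARDS["temp_unit"])
    return problems

good = {"date": "2016-11-12", "site": "A1", "temp_unit": "C"}
bad = {"date": "11/12/2016", "site": "Z9", "temp_unit": "F"}

print(check_record(good))  # no problems reported
print(check_record(bad))   # one problem per violated standard
```

Keeping the standards in one place means the person responsible for data quality has a single definition to maintain and to point collaborators to.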
QA/QC During Data Entry
Double entry
  Data keyed in by two independent people
  Check for agreement with computer verification
Record a reading of the data and transcribe from the recording
Use text-to-speech program to read data back
CC image by weskriesel on Flickr
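
The "check for agreement with computer verification" step of double entry can be as simple as a cell-by-cell comparison of the two keyed versions. The following is a minimal sketch, assuming each version has been loaded as a list of rows (dictionaries keyed by column name); the `compare_entries` function and the sample columns are hypothetical.

```python
def compare_entries(entry_a, entry_b):
    """Return (row, column, value_a, value_b) for every cell that disagrees."""
    mismatches = []
    if len(entry_a) != len(entry_b):
        mismatches.append(("row count", None, len(entry_a), len(entry_b)))
    # Compare the rows that both versions have, cell by cell.
    for row_num, (row_a, row_b) in enumerate(zip(entry_a, entry_b), start=1):
        for col in sorted(set(row_a) | set(row_b)):
            if row_a.get(col) != row_b.get(col):
                mismatches.append((row_num, col, row_a.get(col), row_b.get(col)))
    return mismatches

# The same field sheet keyed in by two people working independently.
person_1 = [
    {"site": "A1", "length_mm": "38.2"},
    {"site": "A2", "length_mm": "41.5"},
]
person_2 = [
    {"site": "A1", "length_mm": "38.2"},
    {"site": "A2", "length_mm": "14.5"},  # transposed digits
]

for mismatch in compare_entries(person_1, person_2):
    print(mismatch)
```

Each reported mismatch should be resolved by going back to the original field sheet, not by trusting either typed version.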
QA/QC During Data Entry
Design data storage well:
  Minimize the number of times items must be entered repeatedly
  Use consistent terminology
  Atomize data: one cell per piece of information
Document changes to data
  Avoids duplicate error checking
  Allows undo if necessary
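
Documenting changes can be sketched as a small change log that records the old value, the new value, and the reason for every correction, which is what makes an undo possible. This is an illustrative sketch only: `change_log`, `correct_value`, `undo_last`, and the table layout are hypothetical names, not part of any DataONE tool.

```python
from datetime import date

change_log = []  # one entry per change, so each can be reviewed or undone

def correct_value(table, row, column, new_value, reason):
    """Change one cell and document what was changed and why."""
    change_log.append({
        "row": row,
        "column": column,
        "old": table[row][column],
        "new": new_value,
        "reason": reason,
        "date": date.today().isoformat(),
    })
    table[row][column] = new_value

def undo_last(table):
    """Restore the value recorded in the most recent log entry."""
    entry = change_log.pop()
    table[entry["row"]][entry["column"]] = entry["old"]

table = [{"site": "A1", "length_mm": 415.0}]
correct_value(table, 0, "length_mm", 41.5, "misplaced decimal point on field sheet")
print(table[0]["length_mm"], len(change_log))
```

Because the log states which values have already been checked and corrected, a later reviewer does not need to repeat the same error check.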
QA/QC After Data Entry
Make sure data line up in proper columns
No missing, impossible, or anomalous values
Perform statistical summaries
CC image by cobalt123 on Flickr
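
The checks for missing or impossible values and the statistical summaries can be combined in one screening pass. The sketch below assumes missing values are stored as `None` and that at least one value is present; the `VALID_RANGE` limits and the `screen_values` helper are hypothetical and should come from knowledge of the variable being measured.

```python
import statistics

# Hypothetical plausible range for air temperature in degrees Celsius.
VALID_RANGE = (-40.0, 50.0)

def screen_values(values):
    """Return positions of missing and impossible values, plus a summary."""
    low, high = VALID_RANGE
    missing = [i for i, v in enumerate(values) if v is None]
    impossible = [i for i, v in enumerate(values)
                  if v is not None and not (low <= v <= high)]
    present = [v for v in values if v is not None]
    # The summary covers every recorded value, so a suspicious
    # minimum or maximum stands out immediately.
    summary = {
        "n": len(present),
        "min": min(present),
        "max": max(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
    }
    return missing, impossible, summary

air_temp_c = [12.1, 13.4, None, 11.8, 999.0, 12.6]
missing, impossible, summary = screen_values(air_temp_c)
print(missing, impossible, summary)
```

In this example the value 999.0 is flagged as impossible and also shows up as the maximum in the summary, a pattern typical of a missing-value code that was entered as data.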
QA/QC After Data Entry
Look for outliers:
  Outliers are extreme values for a variable given the statistical model being used
  The goal is not to eliminate outliers but to identify potential data contamination
QA/QC After Data Entry
Methods to look for outliers
  Graphical
    Normal probability plots
    Regression
    Scatter plots
  Maps
  Subtract values from mean
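
The "subtract values from mean" method can be sketched by expressing each deviation in units of the sample standard deviation and flagging the large ones. The cutoff of 2.0 standard deviations and the `flag_outliers` helper are assumptions made for illustration; an appropriate cutoff depends on the sample size and the statistical model being used.

```python
import statistics

def flag_outliers(values, threshold=2.0):
    """Return (index, value) pairs that lie far from the mean.

    threshold is a hypothetical cutoff in sample standard deviations.
    """
    mean = statistics.mean(values)
    spread = statistics.stdev(values)
    if spread == 0:
        return []  # all values identical, nothing to flag
    flagged = []
    for i, v in enumerate(values):
        deviation = v - mean  # subtract the mean from the value
        if abs(deviation) / spread > threshold:
            flagged.append((i, v))
    return flagged

readings = [10.2, 9.8, 10.1, 10.4, 9.9, 10.0, 10.3, 9.7, 10.1, 25.0]
print(flag_outliers(readings))
```

Flagged values are candidates for review against field notes and instrument logs, in line with the goal of identifying potential contamination; they are not removed automatically.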
Summary
Data contamination is data that results from a factor not examined by the study that results in altered data values
Data error types: commission or omission
Quality assurance and quality control are strategies for
  preventing errors from entering a dataset
  ensuring data quality for entered data
  monitoring and maintaining data quality throughout the project
Identify and enforce quality assurance and quality control measures throughout the Data Life Cycle
Resources
D. Edwards, in Ecological Data: Design, Management and Processing, WK Michener and JW Brunt, Eds. (Blackwell, New York, 2000), pp. 70-91. Available at www.ecoinformatics.org/pubs
R. B. Cook, R. J. Olson, P. Kanciruk, L. A. Hook, Best practices for preparing ecological data sets to share and archive. Bull. Ecol. Soc. Amer. 82, 138-141 (2001).
A. D. Chapman, “Principles of Data Quality: Report for the Global Biodiversity Information Facility” (Global Biodiversity Information Facility, Copenhagen, 2004). Available at http://www.gbif.org/communications/resources/print-and-online-resources/download-publications/bookelets/
About
Participate in our GitHub repo: https://dataoneorg.github.io/dataone_lessons/
Suggested citation: DataONE Education Module: Data Management. DataONE. Retrieved November 12, 2016. From http://www.dataone.org/sites/all/documents/L01_DataManagement.pptx
Copyright license information: No rights reserved; you may enhance and reuse for your own purposes. We do ask that you provide appropriate citation and attribution to DataONE.