Page 1: DDI 3 Comparison Test-Case at ICPSR

Sanda Ionescu
Documentation Specialist

ICPSR

Page 2: DDI 3 Comparison: "Research Questions"

• How can we use DDI 3 to document comparability and support data harmonization projects?
  – Explore use of the Comparative module (information coverage, functionality).
  – Compare use of the Comparative module with use of inheritance through grouping: are both methods equally effective in capturing the necessary information?
  – Can we build a tool to assist in documenting comparability and data harmonization in DDI 3? What would such a tool look like?

Page 3: DDI 3 Comparison Test-Case: Background

• DDI 3 markup was applied to the "Adult Demographics" variables of three nationally representative surveys on mental health, integrated in the Collaborative Psychiatric Epidemiology Surveys (CPES):
  – The National Comorbidity Survey Replication (NCS-R)
  – The National Latino and Asian American Study (NLAAS)
  – The National Survey of American Life (NSAL)
  http://www.icpsr.umich.edu/CPES/

Page 4: DDI 3 Comparison Test-Case: Background

• CPES studies:
  – Conducted individually but with comparison in mind.
  – May be analyzed independently.
  – NOT a longitudinal design (all collected 2001-2003).
  – Comparison intended across populations, or subpopulations, of the USA:
    • NCS-R – US national probability sample
    • NLAAS – target populations: Latino and Asian American
    • NSAL – target populations: African American and Afro-Caribbean
  – Comparability could be documented using either group and inheritance, or the Comparison module.

Page 5: DDI 3 Comparison Test-Case: Background

• Choosing between use of Group/Inheritance or the Comparison module:
  – Comparison by design vs. post-hoc comparison: sometimes not a clear-cut distinction, suggesting the possibility of using either method.
  – It is important to know the practical implications of using either method – advantages, disadvantages, and issues related to applying markup and/or processing: test by documenting the same example in both ways.

Page 6: DDI 3 Comparison Test-Case: Background

• A typical harmonization process workflow was outlined based on an ongoing ICPSR project seeking to produce a harmonized dataset of ten U.S. family and fertility surveys, belonging to three different, but related, series of longitudinal data:
  – Growth of American Families, 1955 and 1960
  – National Fertility Survey, 1965 and 1970
  – National Survey of Family Growth, Cycles I-VI (1973, 1976, 1982, 1988, 1995, and 2002)
  (Integrated Fertility Survey Series – IFSS: http://www.icpsr.umich.edu/IFSS/)

Page 7: DDI 3 Comparison Test-Case

• Harmonization procedure:
  – Datasets are searched (by keyword or concept, if available).
  – Potentially comparable variables are selected.
  – Complete variable descriptions are extracted from existing documentation (a schematic sketch follows):
    • Variable name (and label)
    • Question text / textual description of the variable
    • Physical representation (values, value labels, etc.)
    • Universe
    • Question context (preceding questions)
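As a rough illustration only, such an extracted description for a hypothetical AGE variable might be captured as below; the element names and the variable itself are invented placeholders for this sketch, not actual DDI 3 tags or CPES content.

    <!-- Schematic sketch; simplified, invented element names -->
    <VariableDescription>
      <VariableName>AGE</VariableName>
      <Label>Age of respondent</Label>
      <QuestionText>How old were you on your last birthday?</QuestionText>
      <Representation type="numeric" min="18" max="99"/>
      <Universe>All respondents aged 18 and over</Universe>
      <QuestionContext>Follows the household roster questions</QuestionContext>
    </VariableDescription>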

Page 8: DDI 3 Comparison Test-Case

• Harmonization procedure (continued):
  – Similarities/differences in the listed elements are examined.
  – A harmonized variable is projected based on the findings in the step above (there are no fixed rules; this is done on a case-by-case basis).
  – A decision is made regarding the action on the component variables (recode, or simply add).
  – Statistical software commands are generated and applied to the data to create the new harmonized dataset.

Page 9: DDI 3 Comparison Test-Case

• The harmonized dataset is documented.
• New variable descriptions include:
  – Information about the source variables.
  – Information about the aggregation procedure (recodes, etc.).
  – Information about similarities and differences in the source variables compared with the harmonized one (usually in the form of a note).

Page 10: DDI 3 Comparison Test-Case

How does DDI 3 fit into the harmonization procedure?
• When a harmonized dataset is being produced, documenting pairwise comparisons between source variables in DDI as an intermediary step (pre-harmonization) appears superfluous:
  – It does not assist in the decision-making process, which takes a more holistic approach, assessing candidate variables as a group.
  – It would involve an expense of time and effort that would not be justified by its limited/transitory utility (the harmonized variable would capture the comparability among sources anyway).

Page 11: DDI 3 Comparison Test-Case

How does DDI 3 fit into the harmonization procedure?
• When a harmonized dataset is being produced, there is greater benefit in using the Comparison module to document similarities and differences between the harmonized variable and each of its sources (post-harmonization):
  – This kind of documentation is required by harmonization best practices anyway.
  – Information about the comparability among the source variables may also be recreated by parsing their pairwise comparisons with the harmonized one.

Page 12: DDI 3 Comparison Test-Case

• How does DDI 3 fit into the harmonization procedure? Post-harmonization:

[Workflow diagram: DDI 3 documentation of individual studies → Search → Display → Examine → Harmonize data → Document harmonized dataset and source comparison in DDI 3 → Display/Disseminate → Discover/Analyze]

Page 13: DDI 3 Comparison Test-Case

How does DDI 3 fit into the harmonization procedure?
• If a harmonized dataset is NOT being produced, then it is useful to document the comparability of the "original" variables to assist data users in analysis.

NO harmonization:

[Workflow diagram: DDI 3 documentation of individual studies → Search → Display → Examine → Document comparability in DDI 3 → Display/Disseminate → Discover/Analyze]

Page 14: DDI 3 Comparison Test-Case

How can a tool assist in documenting comparability in DDI 3?
• (Projected) Tool:
  – Searches DDI documentation of individual studies with full variable descriptions.
  – Allows narrowing down results to a customized selection.
  – Provides same-page display of the selected variables' descriptions (ideally complete with concept and universe statements).
  – Search results are saved, and may be retrieved, to facilitate variable evaluation and decisions about harmonizing them, and ultimately to help develop a translation table.
• The steps above are available in ICPSR SSVD – Internal Search.

Page 15: DDI 3 Comparison Test-Case

(Projected) Tool: Example of a customized selection
[Screenshot]

Page 16: DDI 3 Comparison Test-Case

• Potential/Projected Tool:
  – On the selected search results list, allows further pairwise selection and display of variables with full descriptions.
  – An interactive feature allows the user to flag the elements in the variable descriptions as similar or different.
  – Based on the information entered in the step above, a DDI 3 Comparison module is created:
    • Elements flagged as similar or different are listed in the <Correspondence><Commonality> or <Correspondence><Difference> fields.
    • The <CommonalityTypeCoded> element may be filled in automatically based on the information entered above (all common = "identical"; some different = "some"; use of "none"?).

Page 17: DDI 3 Comparison Test-Case: Use of the Comparison Module

• The Comparison module: structure (M = mandatory; O = optional; R = repeatable; NR = not repeatable); an instance sketch follows the outline.
  – Maps: Concepts, Variables, Questions, Categories, Codes, Universes.
  – Map:
      SourceSchemeReference (M)
      TargetSchemeReference (M)
      Correspondence (M)
  – ItemMap:
      SourceItem (M)
      TargetItem (M)
      Correspondence:
        Commonality (M)
        Difference (M)
        CommonalityTypeCoded (O, NR)
        CommonalityWeight (O, NR)
        UserDefinedCorrespProperty (O, R)
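Putting the pieces together, a minimal instance of a variable map might look like the sketch below. The element names follow the structure above, but the namespace-free spelling, the scheme and variable identifiers, and the free-text content are simplified placeholders, not schema-valid DDI 3.

    <!-- Minimal sketch of a variable map; identifiers and reference
         syntax are placeholders, not schema-valid DDI 3 -->
    <VariableMap>
      <SourceSchemeReference>VariableScheme_NCSR</SourceSchemeReference>
      <TargetSchemeReference>VariableScheme_NLAAS</TargetSchemeReference>
      <ItemMap>
        <SourceItem>NCSR.SEX</SourceItem>
        <TargetItem>NLAAS.SEX</TargetItem>
        <Correspondence>
          <Commonality>Question text; universe; categories</Commonality>
          <Difference>Variable label; one survey adds a "refused" code</Difference>
          <CommonalityTypeCoded>Some</CommonalityTypeCoded>
        </Correspondence>
      </ItemMap>
    </VariableMap>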

Page 18: DDI 3 Comparison Test-Case

• Used by ICPSR in the CPES markup example:
  – Commonality
  – Difference
    Both are mandatory. If the list of elements is structured and used consistently, it may become machine-actionable, eliminating the need for the UserDefinedCorrespondenceProperty. (Should we enable an optional CV to allow interoperability? Such a list would only apply to one type of map – variables, in our case.)
  – CommonalityTypeCoded, with the proposed CV:
    • Identical
    • Some
    • None

Page 19: DDI 3 Comparison Test-Case

HTML view of a Variable Map in the DDI 3 Comparison module
[Screenshot]

Page 20: DDI 3 Comparison Test-Case

• Using XSLT to (re)create the variables crosswalk from the pairwise comparisons:
  – If we compare sources with a harmonized variable, the latter will always be the "target":
    • A -> H
    • B -> H
    • C -> H
  – In this case the crosswalk will be relatively easy to create (see the sketch below).
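A minimal XSLT 1.0 sketch of this easy case, assuming simplified, namespace-free ItemMap/SourceItem/TargetItem elements (a real DDI 3 instance would need the comparative module's namespace and full reference syntax): it groups the pairwise maps by their target and prints one crosswalk row per harmonized variable.

    <!-- Sketch: one crosswalk row per harmonized (target) variable,
         listing all of its source variables -->
    <xsl:stylesheet version="1.0"
                    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="text"/>
      <xsl:key name="byTarget" match="ItemMap" use="TargetItem"/>
      <xsl:template match="/">
        <!-- Muenchian grouping: visit only the first ItemMap per target -->
        <xsl:for-each select="//ItemMap[generate-id() =
                              generate-id(key('byTarget', TargetItem)[1])]">
          <xsl:value-of select="TargetItem"/>
          <xsl:text>: </xsl:text>
          <xsl:for-each select="key('byTarget', TargetItem)">
            <xsl:value-of select="SourceItem"/>
            <xsl:if test="position() != last()">
              <xsl:text>, </xsl:text>
            </xsl:if>
          </xsl:for-each>
          <xsl:text>&#10;</xsl:text>
        </xsl:for-each>
      </xsl:template>
    </xsl:stylesheet>

Run against maps A -> H, B -> H, C -> H, this would emit a single line "H: A, B, C"; the grouping works precisely because every pair shares the same target.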

Page 21: DDI 3 Comparison Test-Case

• Using XSLT to (re)create the variables crosswalk from the pairwise comparisons:
  – If we compare individual variables for analysis purposes, creating a crosswalk can become very difficult and labor-intensive:
    • A -> B
    • B -> C
    • A -> C
    • A -> D
    • B -> D
    • C -> D
  – There is nothing in the discrete pairs to indicate their relationship; parsing by multiple iterations results in duplications that need to be cleaned up; "source" and "target" denotations become irrelevant, but give the relationship a directionality that makes it more difficult to process.

Page 22: DDI 3 Comparison Test-Case

• Recreating the variables crosswalk from the pairwise comparisons:
  – The same structure is used for handling two different types of comparison (pre-harmonized and post-harmonized).
  – Do we need a different model/structure for comparing "original" (individual) variables?
  – Or some additional element that would provide a key for the pairs needing to be linked? Explore the possibility of using ItemMap@alias (see the sketch below).
  – Use a different solution than XSLT to create the crosswalk? (More sophisticated programming may be needed to capture complex relationships.)
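One hypothetical reading of the ItemMap@alias idea: give every pairwise map in the same comparison cluster a shared alias value, so a processor can collect the pairs directly instead of inferring the grouping from overlapping source/target references. This is a sketch of the open question, not an established DDI 3 usage; the markup is simplified and the variable names are invented.

    <!-- Hypothetical: @alias as a shared cluster key (simplified markup,
         invented variable names) -->
    <ItemMap alias="marital-status">
      <SourceItem>NCSR.MAR</SourceItem>
      <TargetItem>NLAAS.MARSTAT</TargetItem>
    </ItemMap>
    <ItemMap alias="marital-status">
      <SourceItem>NLAAS.MARSTAT</SourceItem>
      <TargetItem>NSAL.MARITAL</TargetItem>
    </ItemMap>
    <!-- A processor could select all ItemMaps with alias="marital-status"
         and treat the union of their items as one undirected cluster -->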

Page 23: DDI 3 Comparison Test-Case

• Use of the Comparison module: questions/comments
  – We normally include items (i.e., variables, in our case) that have some degree of comparability; "None" would not be routinely used.
  – Use of CommonalityWeight is optional: a scale of weights would have to be defined.
  – UserDefinedCorrespondenceProperty may replace CommonalityTypeCoded in user-specific cases.
  – The map structure is identical (except for codes), but the items compared are organically different: not all elements are relevant in all maps. (For variables we find it necessary to list the similar and different components of their descriptions, but for universes, questions, etc., comparison would be at a more conceptual level.)

Page 24: DDI 3 Comparison Test-Case

• Use of the Comparison module: questions/comments
  Comparing non-harmonized variables:
  – Is there a rationale for documenting comparability between their components as well (in addition to flagging them as similar or different)?
  – The Comparison module does not provide links between items included in different maps, and the same item (question, universe, code scheme) may be used by multiple variables that are part of different mappings.
  – The complete variable descriptions may be pulled from the Logical Product.

Page 25: DDI 3 Comparison Test-Case

• Use of the Comparison module: questions/comments
  Comparing harmonized variables with their sources:
  – The GenerationInstruction sequence in the Code Map allows referencing the source variable(s) and may document the recodes performed to harmonize them (see the sketch below).
  – This sequence mirrors the Coding:GenerationInstruction section in the Data Collection module.
  – Coding is Identifiable (it may be referenced by the resulting variable); GenerationInstruction is not Identifiable (it cannot be referenced).

Page 26: DDI 3 Comparison Test-Case

• Use of the Comparison module: questions/comments
  – Documentation of comparability is "dissociated" from the individual variable descriptions.
  – Could group + inheritance be a more effective way to capture both the variable descriptions and their comparability, while at the same time allowing a complete description of individual datasets, including variables that have no comparable counterparts?
  – Test by documenting the same data in both ways – when version 3.1 is published, to allow identification of the variable Name (in some instances the only element that changes).