Www.uis.unesco.org Building SDMX Data Structure Definitions based on a generic conceptual model for...

www.uis.unesco.org Building SDMX Data Structure Definitions based on a generic conceptual model for contents Experience with the joint Eurostat-Unesco-OECD education statistics questionnaire Michael Bruneforth, UNESCO Institute for Statistics Expert Group on SDMX [email protected] May 10, 2007, UN, Geneva


Page 1:

www.uis.unesco.org

Building SDMX Data Structure Definitions based on a generic conceptual model for contents

Experience with the joint Eurostat-Unesco-OECD education statistics questionnaire

Michael Bruneforth, UNESCO Institute for Statistics

Expert Group on SDMX

[email protected] May 10, 2007, UN, Geneva

Page 2:

Overview

The world of international education data collections

Why build a conceptual model

Steps to build the model

The model

From the model towards an SDMX data structure definition

Page 3:

The world of international education data collections

Page 4:

Instruments used in the system of international education data collections

The UNESCO-UIS / OECD / EUROSTAT (UOE) Data Collection on Education Statistics

• Excel-based questionnaire, organized in 31 worksheets

• 47 countries, 14,000+ data points

• Changes: 2003

The World Education Indicators Project (WEI)

• Based on UOE instruments, extended by 10 worksheets

• 16 countries, 15,000+ data points

• Examples at www.uis.unesco.org/publications/wei2006

The UIS Survey

• PDF-based e-questionnaire infrastructure, plus paper form

• All remaining countries, 5,000+ data points

• Examples at www.uis.unesco.org -> current surveys

Page 5:

Instruments used in the system of international education data collections (II)

[Diagram: the three instruments UOE, UIS and WEI, with two links between them labelled "Can be transformed".]

Page 6:

Education Questionnaires: ever changing

1998

• Tables were introduced after ISCED 97 was adopted.

2000

• Redesign of Finance tables.

2001 – 2005: ???

2005

• Major redesign: tables redesigned, some tables split or combined.

2006

• In ENRL8a, ENRL8b and ENRL8c, the Caribbean countries are now included with Latin America instead of Northern America.

2007

• In table ENRL-7, three new sub-categories, “unknown residence”, “unknown prior education” and “unknown citizenship”, have been added.

• In ENTR-2 a new row has been added to collect typical age of entry.

• In GRAD-1 and GRAD-3 a new row has been added to collect typical graduation age.

Page 7:

Why build a conceptual model?

Meta data

• Theoretical basis for describing data

• Visualization of data

• Validation of codes

Questionnaire design

• Improving internal consistency in questionnaires

• Maintaining the coding schemes:

» Avoiding random or ad-hoc data descriptions leading to inconsistent, incomprehensible systems (we need discipline as much as a model!)

Page 8:

Why use a conceptual model as the basis for SDMX?

A model describes a universe of questionnaires

• Consistency across questionnaires

• Consistency across tables

• Consistency across statistical units

• Facilitates adaptation of SDMX to changes to tables

» Typically no/few keys need to be changed; most new data can be defined using existing keys

A model can be used to describe indicators and derived data

• SDMX exchange of results (-> World Bank, MDG)

A model can be transformed into/from data base definitions

• Use of existing meta data (efficiency)

• Avoid redundant information (less error prone)

• Basis to match national data to international SDMX definitions

Page 9:

Building the model

Step 1: Bo Sundgren’s analysis of the UIS Questionnaire

Step 2: Analysis of the relational data base at UIS

Step 3: Correction / Expansion of Bo’s model

Step 4: Model verification 1: Review of UOE questionnaires

Step 5a: Model verification 2: Transformation of UIS database model to conceptual model, automated creation of full code list

Step 5b: Model verification 2: Analysis of the relational data base at OECD

Step 6: Creation of data structure definition based on existing meta data
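Step 5a mentions the automated creation of a full code list. A minimal Python sketch of that idea is given below: it collects, for every dimension, the distinct values that occur across a set of cell descriptions. The function name `build_code_lists`, the dictionary layout of a cell and the sample values are assumptions for illustration; the slides do not show how the UIS tooling actually does this.

```python
def build_code_lists(cells):
    """Collect the distinct values seen for each dimension across all cells."""
    code_lists = {}
    for description in cells:
        for dimension, value in description.items():
            code_lists.setdefault(dimension, set()).add(value)
    # Sort dimensions and values so the output is stable and easy to review.
    return {dim: sorted(values) for dim, values in sorted(code_lists.items())}


# Hypothetical cell descriptions (dimension -> value), not real UIS metadata.
SAMPLE_CELLS = [
    {"sex": "Total", "ISCED.level": "1", "repeater": "Total"},
    {"sex": "Male", "ISCED.level": "2", "repeater": "Yes"},
    {"sex": "Female", "ISCED.level": "2", "repeater": "Total"},
]

if __name__ == "__main__":
    for dimension, codes in build_code_lists(SAMPLE_CELLS).items():
        print(dimension, codes)
```

Deriving the lists from the existing cell descriptions, rather than typing them by hand, is in line with the goal of reusing existing meta data.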

Page 10:

Student
.count .partTimeFraction.sum
x age x sex x origin (array) x previousEducation

Enrolment
- partTimeFraction
x workmode x repeater x completer x entrant x (adjustement)

ProgramExecution (Class, pedagogical unit, …)
.count
x adult x grade x ISCED.field x location

Institutions (education provider; non-instructional institutions; …)
x sector x InstType

EducationProgram (Utility)
- startingAge - duration - Name
x ISCED.level x ISCED.orientation x ISCED.destination x ISCED.degreePos

Relationship labels: is enrolled in; for; of; provides
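One way to read the diagram is as a small schema in which each object carries measures (entries prefixed with "."), plain attributes ("-") and classifying dimensions ("x"). The Python sketch below encodes the five objects that way. This reading of the prefixes, the `ModelObject` class and the `all_dimensions` helper are assumptions made for illustration and are not part of the UIS model itself; the bracketed "(adjustement)" entry is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelObject:
    """One object of the conceptual model (illustrative structure only)."""
    name: str
    measures: tuple = ()    # written with a leading "." on the slide
    attributes: tuple = ()  # written with a leading "-"
    dimensions: tuple = ()  # written with a leading "x"


MODEL = {
    obj.name: obj
    for obj in (
        ModelObject("Student",
                    measures=("count", "partTimeFraction.sum"),
                    dimensions=("age", "sex", "origin", "previousEducation")),
        ModelObject("Enrolment",
                    attributes=("partTimeFraction",),
                    dimensions=("workmode", "repeater", "completer", "entrant")),
        ModelObject("ProgramExecution",
                    measures=("count",),
                    dimensions=("adult", "grade", "ISCED.field", "location")),
        ModelObject("Institutions",
                    dimensions=("sector", "InstType")),
        ModelObject("EducationProgram",
                    attributes=("startingAge", "duration", "Name"),
                    dimensions=("ISCED.level", "ISCED.orientation",
                                "ISCED.destination", "ISCED.degreePos")),
    )
}


def all_dimensions(model):
    """Return every (object, dimension) pair in declaration order."""
    return [(obj.name, dim) for obj in model.values() for dim in obj.dimensions]


if __name__ == "__main__":
    for obj_name, dim in all_dimensions(MODEL):
        print(f"{obj_name}.{dim}")
```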

Page 11:

Example 1: Students and Repeaters

ENRL3: NUMBER OF STUDENTS AND REPEATERS (ISC123) IN GENERAL PROGRAMMES BY LEVEL OF EDUCATION, SEX AND GRADE

Form header: COUNTRY; School year, data collection period (Please indicate the dates in table ENRL1a.); Sources; Methods; UOE version

Form controls: Block Check; Global Check & Save; Row Instructions; RowNotes; ColumnNotes; CellNotes; Missing Value Codes

Columns 1 to 6, in two blocks (Number of students; Number of repeaters), each block by LEVEL OF EDUCATION:
PRIMARY (ISC 1): All educational programmes
LOWER SECONDARY (ISC 2): All general programmes
UPPER SECONDARY (ISC 3): All general programmes

Rows (TOTAL PUBLIC AND PRIVATE INSTITUTIONS; TOTAL FULL-TIME AND PART-TIME):

Total males and females, Grade groups
A1 Total: All grade groups (within ISC-Level) (A2toA12) X
A2 Grade 1 (within ISC-Level) (A14+A26)
A3 Grade 2 (within ISC-Level) (A15+A27)
A4 Grade 3 (within ISC-Level) (A16+A28)
A5 Grade 4 (within ISC-Level) (A17+A29)
A6 Grade 5 (within ISC-Level) (A18+A30)
A7 Grade 6 (within ISC-Level) (A19+A31)
A8 Grade 7 (within ISC-Level) (A20+A32)
A9 Grade 8 (within ISC-Level) (A21+A33)
A10 Grade 9 (within ISC-Level) (A22+A34)
A11 Grade 10 (within ISC-Level) (A23+A35)
A12 Grade unknown (within ISC-Level) (A24+A36)

Males, Grade groups
A13 Total: All grade groups (within ISC-Level) (A14toA24)
A14 Grade 1 (within ISC-Level)
A15 Grade 2 (within ISC-Level) Y
A16 Grade 3 (within ISC-Level)

Page 12:

Count of lower secondary general students

Student
.count .partTimeFraction.sum
x age x sex x origin (array) x previousEducation

Enrolment
- partTimeFraction
x workmode x repeater x completer x entrant x (adjustement)

ProgramExecution (Class, pedagogical unit, …)
.count
x adult x grade x ISCED.field x location

Institutions (education provider; non-instructional institutions; …)
x sector x InstType

EducationProgram (Utility)
- startingAge - duration - Name
x ISC.level: 2 x ISC.orientation: General x ISCED.destination x ISCED.degreePos

Relationship labels: is enrolled in; for; of; provides

Page 13:

Count of male lower secondary general repeaters in grade 2

Student
.count .partTimeFraction.sum
x age x sex: Male x origin (array) x previousEducation

Enrolment
- partTimeFraction
x workmode x repeater: Yes x completer x entrant x (adjustement)

ProgramExecution (Class, pedagogical unit, …)
.count
x adult x grade: 2 x ISCED.field x location

Institutions (education provider; non-instructional institutions; …)
x sector x InstType

EducationProgram (Utility)
- startingAge - duration - Name
x ISC.level: 2 x ISC.orientation: General x ISCED.destination x ISCED.degreePos

Relationship labels: is enrolled in; for; of; provides
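The two model views for Example 1 suggest that a data point is pinned down by fixing a value on a few dimensions while every other dimension stays unrestricted. The Python sketch below expresses that idea. The `describe_cell` and `narrowed` helpers, the flat `Object.dimension` names and the use of "Total" as the default value for every dimension are assumptions for illustration only.

```python
# Dimensions of the core model, written as "Object.dimension" (naming is an assumption).
DIMENSIONS = (
    "Student.age", "Student.sex", "Student.origin", "Student.previousEducation",
    "Enrolment.workmode", "Enrolment.repeater", "Enrolment.completer",
    "Enrolment.entrant",
    "ProgramExecution.adult", "ProgramExecution.grade",
    "ProgramExecution.ISCED.field", "ProgramExecution.location",
    "Institutions.sector", "Institutions.InstType",
    "EducationProgram.ISCED.level", "EducationProgram.ISCED.orientation",
    "EducationProgram.ISCED.destination", "EducationProgram.ISCED.degreePos",
)


def describe_cell(measure, selected):
    """Describe a data point: chosen values override the assumed 'Total' default."""
    unknown = set(selected) - set(DIMENSIONS)
    if unknown:
        raise KeyError(f"not in the model: {sorted(unknown)}")
    dimensions = {dim: "Total" for dim in DIMENSIONS}
    dimensions.update(selected)
    return {"measure": measure, "dimensions": dimensions}


def narrowed(cell_a, cell_b):
    """Return the dimensions on which cell_b differs from cell_a."""
    return {dim: value for dim, value in cell_b["dimensions"].items()
            if cell_a["dimensions"][dim] != value}


LOWER_SECONDARY_GENERAL = {
    "EducationProgram.ISCED.level": "2",
    "EducationProgram.ISCED.orientation": "General",
}

# Cell X: count of lower secondary general students.
cell_x = describe_cell("Student.count", LOWER_SECONDARY_GENERAL)

# Cell Y: count of male lower secondary general repeaters in grade 2.
cell_y = describe_cell("Student.count", {
    **LOWER_SECONDARY_GENERAL,
    "Student.sex": "Male",
    "Enrolment.repeater": "Yes",
    "ProgramExecution.grade": "2",
})

if __name__ == "__main__":
    print(narrowed(cell_x, cell_y))
```

Both cells use the same measure and the same set of dimensions; only the selected values differ, which is what allows one key structure to describe cells from different rows and columns.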

Page 14:

Example 2: Students and Classes

Class1: AVERAGE CLASS SIZE BY LEVEL OF EDUCATION AND BY TYPE OF INSTITUTIONS

Form header: Country; School year start (mm/yyyy); School year end (mm/yyyy); Sources; Methods; UOE version

Columns:
1 PRIMARY Education (ISC 1): All regular programmes
2 LOWER SECONDARY SCHOOLS (ISC 2): All general programmes

Rows (TYPE OF INSTITUTIONS):

TOTAL: Public and private institutions
A1 Average class size
A2 Number of students
A3 Number of classes

Public institutions
A4 Average class size
A5 Number of students X
A6 Number of classes Y

Page 15:

Count of lower secondary general classes and students, class size

Student
.count .partTimeFraction.sum
x age x sex x origin (array) x previousEducation

Enrolment
- partTimeFraction
x workmode x repeater x completer x entrant x (adjustement)

ProgramExecution (Class, pedagogical unit, …)
.count
x adult x grade x ISCED.field x location

Institutions (education provider; non-instructional institutions; …)
x sector x InstType

EducationProgram (Utility)
- startingAge - duration - Name
x ISC.level: 2 x ISC.orientation: General x ISCED.destination x ISCED.degreePos

Relationship labels: is enrolled in; for; of; provides

Page 16:

Example 2: Students and Classes

ANNUAL INTAKE BY LEVEL OF EDUCATION AND PROGRAMME DESTINATION

TOTAL PUBLIC AND PRIVATE INSTITUTIONS; TOTAL FULL-TIME AND PART-TIME; Total males and females

Columns:
1 UPPER SECONDARY: ISCED 3
2 POST-SECONDARY NON-TERTIARY: ISCED 4
3 TERTIARY: ISCED 5A
4 TERTIARY: ISCED 5B
5 TERTIARY: ISCED 6

Rows:

Enrolment
A1 Total number of students enrolled (ENRL1, row A1) (A2+A3+A4)
Of which:
A2 New entrants (B1)
A3 Re-entrants
A4 Continuing students

New entrants
B1 New entrants (from A2) (B2+B3)
Of which:
B2 With previous education at the other tertiary level
B3 Without any previous education at the tertiary level

Page 17:

Count of new entrants to tertiary 5B with previous tertiary education

Student
.count .partTimeFraction.sum
x age x sex x origin (array) x previousEducation: 5A

Enrolment
- partTimeFraction
x workmode x repeater x completer x entrant: New entrant x (adjustement)

ProgramExecution (Class, pedagogical unit, …)
.count
x adult x grade x ISCED.field x location

Institutions (education provider; non-instructional institutions; …)
x sector x InstType

EducationProgram (Utility)
- startingAge - duration - Name
x ISC.level: 5 x ISC.orientation: General x ISCED.destination: B x ISCED.degreePos

Relationship labels: is enrolled in; for; of; provides

Page 18:

Student
.count .partTimeFraction.sum
x age x sex x origin (array) x previousEducation

Enrolment
- partTimeFraction
x workmode x repeater x completer x entrant x (adjustement)

EducationStaff (Teacher, …)
.partTimeFraction.sum
x age x sex x training

Engagement
.count
- partTimeFraction
x workmode x engagementType

ProgramExecution (Class, pedagogical unit, …)
.count
x adult x grade x location

Institutions (education provider; non-instructional institutions; …)
x sector x InstType

EducationProgram (Utility)
- startingAge - duration - Name
x ISCED.level x ISCED.orientation x ISCED.destination x ISCED.degreePos

EducationSystem (Utility)
- compulsoryEducationStart - compulsoryEducationEnd - academicYearBeginn - academicYearEnd - financialYearBeginn - country - currency

Funder (Governments, private entities, …)
x sector

Expenditure
.amount.sum
x nature

Relationship labels: is enrolled in; belongs to; for; isEngagedIn; for; of; spends on; belongs to; transfers to/spends on; receives transfer; receives transfer; provides

Page 19:

Institutions (education provider; non-instructional institutions; …)
x sector x InstType

Funder (Governments, private entities, …)
x sector

Expenditure
.amount.sum
x nature

Households

Relationship labels: spends on; transfers to/spends on; receives transfer; receives transfer; receives transfer; transfers to/spends on

Page 20:

Principles for generating the detailed model for individual data points

Use existing meta data

Avoid multiple capturing of questionnaire information

Ensure consistency with existing systems

Page 21:

Generate the detailed model for individual data points

Cell ID: 1052108025 (ENTR1-B2:4, Version 2005 to 2099)

Object: Student: Count

-age: Total -sex: Female -origin: All -previousEducation: 5A

enrolled in -> Object: Enrolment

-workmode: Total -repeater: NO -completer: NO -entrant: First time entrant

for -> Object: ProgramExecution

-adult: Total -grade: Total -ISCED.field: Total -location: Total

of -> Object: EducationProgram

-ISCED.level: 5 -ISCED.orientation: Total -ISCED.destination: B -ISCED.degreePos: First

provided by -> Object: Institution

-sector: Total -InstType: Instructional
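A detailed cell description like the one above can be flattened into an ordered key of dimension values, which is the general form an SDMX series key takes. The Python sketch below does this for cell 1052108025. The dimension order in `KEY_ORDER`, the `to_code` rule that derives codes from labels, and the dot-separated output are all assumptions for illustration; the actual data structure definition and its code lists are not shown in the slides.

```python
# Dimension order of a hypothetical data structure definition (an assumption).
KEY_ORDER = (
    "age", "sex", "origin", "previousEducation",
    "workmode", "repeater", "completer", "entrant",
    "adult", "grade", "ISCED.field", "location",
    "ISCED.level", "ISCED.orientation", "ISCED.destination", "ISCED.degreePos",
    "sector", "InstType",
)

# Cell 1052108025 (ENTR1-B2:4), values as listed on the slide.
CELL = {
    "age": "Total", "sex": "Female", "origin": "All", "previousEducation": "5A",
    "workmode": "Total", "repeater": "NO", "completer": "NO",
    "entrant": "First time entrant",
    "adult": "Total", "grade": "Total", "ISCED.field": "Total", "location": "Total",
    "ISCED.level": "5", "ISCED.orientation": "Total",
    "ISCED.destination": "B", "ISCED.degreePos": "First",
    "sector": "Total", "InstType": "Instructional",
}


def to_code(label):
    """Derive a made-up code from a label: upper case, spaces to underscores."""
    return label.upper().replace(" ", "_")


def series_key(cell, order=KEY_ORDER):
    """Join the coded dimension values in a fixed order, separated by dots."""
    missing = [dim for dim in order if dim not in cell]
    if missing:
        raise ValueError(f"cell lacks dimensions: {missing}")
    return ".".join(to_code(cell[dim]) for dim in order)


if __name__ == "__main__":
    print(series_key(CELL))
```

Because every cell is described on the same dimensions, a new questionnaire row usually needs only a new combination of existing codes, not a new key structure.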

Page 22:

The basis: the UIS meta data (relational database description)

Page 23:

Example: UIS meta data (relational database codes, XML version)

Page 24:

What is needed beyond the model to get a complete data structure definition?

The data structure definition has to cope with data points collected twice.

• Total number of primary students is collected in ENRL1a, ENRL1, ENRL3, ENRL4, CLASS1

The data structure definition has to cope with adjustments to the coverage of data.

• The count of students is collected with coverage adjusted to match the expenditure data.
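Data points collected more than once can be found mechanically once every cell has a model description: cells whose descriptions are identical refer to the same data point. The Python sketch below groups cells that way. The table names come from the slide, but the cell references, the sparse description format and the `find_duplicates` helper are hypothetical.

```python
from collections import defaultdict


def find_duplicates(cells):
    """Group cell references by model description; return groups with more than one cell."""
    groups = defaultdict(list)
    for ref, description in cells.items():
        # A frozenset of items is hashable and ignores the order of dimensions.
        groups[frozenset(description.items())].append(ref)
    return [sorted(refs) for refs in groups.values() if len(refs) > 1]


# Hypothetical cell references; each description lists only the restricted dimensions.
PRIMARY_TOTAL = {"measure": "Student.count", "ISCED.level": "1"}
CELLS = {
    "ENRL1a/primary-total": dict(PRIMARY_TOTAL),
    "ENRL1/primary-total": dict(PRIMARY_TOTAL),
    "ENRL3/primary-total": dict(PRIMARY_TOTAL),
    "ENRL4/primary-total": dict(PRIMARY_TOTAL),
    "CLASS1/primary-total": dict(PRIMARY_TOTAL),
    "ENRL3/lower-secondary-general": {
        "measure": "Student.count",
        "ISCED.level": "2",
        "ISCED.orientation": "General",
    },
}

if __name__ == "__main__":
    for group in find_duplicates(CELLS):
        print("same data point:", ", ".join(group))
```

A grouping like this only identifies the duplicates. How the data structure definition then distinguishes them, for example with an extra dimension for the source table, is a design decision the slide leaves open.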

Page 25:

Questions, comments?

Education content: Michael Bruneforth ([email protected])

IT: Brian Buffett ([email protected])

Thanks