NEC METHODS: MATCHING, DEDUPLICATION, ANALYSIS & RESPONSE RATES

28 October 2014

Page 1: NEC METHODS: MATCHING, DEDUPLICATION, ANALYSIS & RESPONSE RATES

28 October 2014

Page 2: Matching & Deduplication

Page 3: Purpose of the Merged Analytic Cross-Region Datasets

PIF-ER Merged Dataset

Analyses on types of trainees who attended particular events

PIF-ER-ACRE Merged Dataset

Analyses on outcomes of AETC training programs related to self-assessed changes in provider behavior and clinical practice.

Page 4: Analytic Dataset Creation Overview

1. Collect regional process and evaluation data

2. Convert data in submitted format (Excel, CSV, SPSS) to SAS

3. Reformat regional datasets to match expected data file specifications (e.g., character/numeric type)

Process data: HRSA data manual

Evaluation data: ACRE implementation manual

4. Create all-region ER, PIF, ACRE IP, ACRE FUP, and FTCC PIF datasets by concatenating/appending regional files of the same type

5. Create analytic PIF-ER merged dataset

6. Create analytic PIF-ER-ACRE datasets

Page 5: Cross-Region Analytic Data

Steps 1, 2, 3, 4: Collect, convert, reformat data. Create all-region ER, PIF, ACRE IP and FUP datasets.

Step 5: Create analytic ER-PIF dataset

Step 6: Create analytic ER-PIF-ACRE dataset

Page 6: Creating the Analytic PIF-ER Merged Dataset

Check to see which regions have repeats on PROG_ID by LPS

Merge PIF and ER

For 1-2 regions with repeated PROG_ID, sort and merge the PIF and ER by AETC, LPS, and PROG_ID

For all other regions that have distinct PROG_ID, sort and merge the PIF and ER by AETC and PROG_ID only

[Figure: bottom of the PIF, showing the PROG_ID, AETC, and LPS fields]
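The key selection on this slide can be sketched as follows. This is only an illustration in Python (the overview slide indicates the actual processing is done in SAS), and it assumes each regional file has been read into a list of dictionaries. The function names and the `HAS_PIF` flag are hypothetical; ERs without any PIF are kept here because a later step filters them out.

```python
from collections import defaultdict

def choose_merge_keys(er_rows):
    """Add LPS to the merge keys only if a PROG_ID repeats across LPS."""
    lps_by_prog = defaultdict(set)
    for row in er_rows:
        lps_by_prog[(row["AETC"], row["PROG_ID"])].add(row["LPS"])
    repeated = any(len(lps) > 1 for lps in lps_by_prog.values())
    return ("AETC", "LPS", "PROG_ID") if repeated else ("AETC", "PROG_ID")

def merge_pif_er(pif_rows, er_rows):
    """Attach every matching PIF to its ER for one region's data."""
    keys = choose_merge_keys(er_rows)
    pifs_by_key = defaultdict(list)
    for pif in pif_rows:
        pifs_by_key[tuple(pif[k] for k in keys)].append(pif)
    merged = []
    for er in er_rows:
        matches = pifs_by_key.get(tuple(er[k] for k in keys), [])
        if not matches:
            # ER with no PIF: kept and flagged (HAS_PIF is a hypothetical name)
            merged.append({**er, "HAS_PIF": False})
        for pif in matches:
            merged.append({**er, **pif, "HAS_PIF": True})
    return merged
```

Choosing the keys per region keeps the simpler two-variable merge for regions whose PROG_ID values are already distinct.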

Page 7: Creating the Analytic PIF-ER-ACRE Merged Dataset (1)

Select eligible ACRE IP data

Check to see which regions have repeats on PROG_ID by LPS

Exclude records where all 4 IP questions are missing/blank

Exclude records where the PIF_ID is . [missing], 0, or 99999999

De-duplicate IP records by AETC, LPS (if applicable), PROG_ID, PIF_ID, AIP1, AIP2

Select eligible records from the previously created ER-PIF merged dataset

Include only records where there is at least 1 PIF record included (e.g., there are some ERs without any PIFs)

Exclude records where the PIF_ID is . [missing], 0, or 99999999

Cont’d
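A minimal sketch of the ACRE IP selection rules listed on this slide, assuming records are dictionaries and that a missing value is stored as `None`. The slide names only AIP1 and AIP2; the names `AIP3` and `AIP4` for the other two IP questions are assumptions, as is keeping the first record of each duplicate group.

```python
INVALID_PIF_IDS = {None, 0, 99999999}            # ". [missing]", 0, or 99999999
IP_QUESTIONS = ("AIP1", "AIP2", "AIP3", "AIP4")  # AIP3/AIP4 are assumed names

def select_eligible_ip(ip_rows, use_lps=False):
    """Drop blank or invalid-ID records, then de-duplicate on the key variables."""
    keys = ["AETC"] + (["LPS"] if use_lps else []) + ["PROG_ID", "PIF_ID", "AIP1", "AIP2"]
    seen = set()
    kept = []
    for row in ip_rows:
        if all(row.get(q) is None for q in IP_QUESTIONS):
            continue  # all 4 IP questions missing/blank
        if row.get("PIF_ID") in INVALID_PIF_IDS:
            continue  # PIF_ID is missing, 0, or 99999999
        key = tuple(row.get(k) for k in keys)
        if key in seen:
            continue  # duplicate on the de-duplication keys
        seen.add(key)
        kept.append(row)
    return kept
```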

Page 8: Creating the Analytic PIF-ER-ACRE Merged Dataset (2)

Sort the ER-PIF and the ACRE IP data by AETC, LPS (if applicable), PROG_ID, and PIF_ID. The ER-PIF dataset is further sorted by PIFDATE

Merge the ER-PIF and ACRE IP data by AETC, LPS, PROG_ID, and PIF_ID

De-duplicate the data based on the key variables AETC, LPS (if applicable), PROG_ID, PIF_ID [Note: this deletes <200 records]

Sort the all-region ACRE FUP by AETC, LPS (if applicable), PROG_ID, and PIF_ID

Sort the previously created ER-PIF-IP dataset by AETC, LPS (if applicable), PROG_ID, and PIF_ID

Merge the ER-PIF-IP with the ACRE FUP by these key variables

Restrict the analytic dataset to records with a valid, non-missing PIF_ID with a PIF available [Note: approx. 20K records removed]

Page 9: PIF ID

PIF ID is available on the PIF, ACRE IP, and ACRE FUP data

Though the PIF ID is not on the ER form, the Program ID on the PIF and ER allows PIF IDs to be associated with events

PIF ID used for matching

Across training events (repeat trainees)

Across evaluation forms (ACRE IP and FUP)

PIF_ID = month of birth + day of birth + last 4 digits of SSN

Page 10: NEC valid PIF ID algorithm

Valid PIF ID contains:

Valid month of birth (1-12)

Valid day of birth (1-31)

Valid last 4 digits of SSN (≥1 and not 9999)

Valid PIF ID is a numeric value <99999999

Examples of invalid PIF IDs:

99999999

0

. [missing]

12345678

04049999

1122420932

Records with invalid PIF IDs are excluded from regression analyses
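The validity checks above can be sketched as a single function. This is an illustration, not the NEC code: it assumes the ID is laid out as MMDDSSSS (two-digit month, two-digit day, last four SSN digits) and may be stored as a number with a leading zero dropped or as a digit string, with `None` standing in for a missing value.

```python
def is_valid_pif_id(pif_id):
    """Apply the month, day, SSN, and upper-bound checks to one PIF ID."""
    if pif_id is None:                    # ". [missing]"
        return False
    value = int(pif_id)                   # "04049999" -> 4049999
    if value <= 0 or value >= 99999999:   # must be a numeric value < 99999999
        return False
    month = value // 1_000_000            # leading digits (MM)
    day = (value // 10_000) % 100         # middle digits (DD)
    last4 = value % 10_000                # last 4 digits of SSN
    return 1 <= month <= 12 and 1 <= day <= 31 and last4 >= 1 and last4 != 9999
```

Under this layout, 12345678 fails because its day field is 34, 04049999 fails because its SSN digits are 9999, and 1122420932 fails the upper bound.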

Page 11: De-Duplication Examples

For overall ACRE regression analyses:

ER-PIF-ACRE dataset restricted to records with a valid PIF ID and with a linked PIF

Restricted dataset sorted by combined AETC region, PIF ID, eligibility for ACRE IP, having associated IP record, and PIF date

Last record is outputted

For MAI ACRE regression analyses, similar:

ER-PIF-ACRE dataset restricted to records with a valid PIF ID and with a linked PIF

Restricted ER-PIF-ACRE dataset sorted by combined AETC region, PIF ID, having an MAI training record, eligibility for ACRE IP, having associated IP record, and PIF date

Last record is outputted
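A sketch of the "sort, then output the last record" pattern for the overall analysis, assuming the dataset has already been restricted to valid, linked records. The field names `IP_ELIGIBLE` and `HAS_IP` are hypothetical stand-ins for the eligibility and associated-IP indicators, and `region_of` is an assumed helper that maps an AETC code to its combined region.

```python
from itertools import groupby

def keep_last_record(rows, region_of):
    """Sort by the listed variables and keep the last row per (region, PIF ID)."""
    def sort_key(r):
        # False sorts before True, so eligible records with an IP record come last
        return (region_of(r["AETC"]), r["PIF_ID"],
                r["IP_ELIGIBLE"], r["HAS_IP"], r["PIFDATE"])

    def trainee(r):
        return (region_of(r["AETC"]), r["PIF_ID"])

    ordered = sorted(rows, key=sort_key)
    return [list(group)[-1] for _, group in groupby(ordered, key=trainee)]
```

Because eligibility and the IP indicator come before the PIF date in the sort, a later record that is not IP-eligible does not displace an earlier eligible one. For the MAI analysis, an MAI indicator would be added to the sort key ahead of eligibility.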

Page 12: Recoding & Analysis

Page 13: Eligible Records for ACRE Regression Analyses

Last eligible record among repeat trainees is used

“Eligible” means the PIF_ID is not an invalid code according to the NEC algorithm and there is truly an associated PIF in the linked dataset

Analytic population includes:

For IP: targeted IP trainee (i.e., attended Level 1, 2, or 3 training), who has an associated PIF and IP record, and is a direct HIV provider (PIF13=1)

For FUP: targeted FUP trainee (i.e., attended Level 2 training and topic included clinical management [ER4_1-16] or prevention and behavior change [ER4_29-31] topics), who has an associated PIF and FUP record, and is a direct HIV provider (PIF13=1)

Page 14: ACRE IP Eligible Trainings

ACRE immediate post questions asked immediately after training event

Eligible trainings, from the Event Record form: ER9_1>0 -OR- ER9_2>0 -OR- ER9_3>0

Page 15: ACRE FUP Eligible Trainings

ACRE follow-up asked 6 weeks after training through a web-based survey

Eligible trainings, from the Event Record form: ER9_2>0 -AND- ANY of (ER4_1=1 or ER4_2=1 or ... or ER4_31=1)
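The IP and FUP targeting rules can be sketched as two flag functions on an Event Record stored as a dictionary. The function names are illustrative. For the FUP topics, the sketch follows the ranges given for the analytic population (ER4_1-16 and ER4_29-31), which is an assumption, since this slide abbreviates the list with an ellipsis.

```python
# Topic fields assumed to qualify for FUP: ER4_1..ER4_16 and ER4_29..ER4_31
FUP_TOPICS = [f"ER4_{i}" for i in list(range(1, 17)) + list(range(29, 32))]

def is_ip_target(er):
    """IP target: ER9_1 > 0 or ER9_2 > 0 or ER9_3 > 0."""
    return any((er.get(f"ER9_{level}") or 0) > 0 for level in (1, 2, 3))

def is_fup_target(er):
    """FUP target: ER9_2 > 0 and at least one qualifying topic equals 1."""
    has_level2 = (er.get("ER9_2") or 0) > 0
    has_topic = any(er.get(name) == 1 for name in FUP_TOPICS)
    return has_level2 and has_topic
```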

Page 16: FY 11/12 AETC Cross-Region Trainees in IP Analyses

Data source: cross-region ER-PIF and ACRE IP FY11-12.

N = 72,642 ACRE IP records received by NEC

N = 108,687 FY 11-12 trainees (based on linked AETC PIF and ER)

n = 45,452 linked ER-PIF-ACRE IP

n = 42,465 linked records and a targeted IP training

n = 2,987 linked records and NOT a targeted IP training

n = 30,331 linked records, IP targeted, and trainee’s last record in FY 11-12

N = 108,687 excludes n = 2,459 event records without a PIF associated and n = 5,736 records with an invalid PIF ID. This number includes repeat trainees.

Though n = 93,756 records fulfilled the IP target criteria, only n = 42,465 (45.3%) ER-PIF-IP records linked and fulfilled the target.

Of the n = 30,331 last records, n = 15,979 (52.7%) indicated they were direct HIV providers on the PIF.
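As a quick consistency check, the counts and percentages on this slide can be re-derived from the reported numbers. The helper name `pct` is illustrative. The 52.7% figure matches 15,979 out of the 30,331 last records, not out of the 42,465 linked and targeted records.

```python
def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

linked_targeted = 42_465      # linked records and a targeted IP training
linked_not_targeted = 2_987   # linked records and NOT a targeted IP training
targeted_all = 93_756         # records fulfilling the IP target criteria
last_records = 30_331         # trainee's last record in FY 11-12
direct_providers = 15_979     # direct HIV providers on the PIF

linked_total = linked_targeted + linked_not_targeted   # 45,452 linked ER-PIF-ACRE IP
link_rate = pct(linked_targeted, targeted_all)         # 45.3
provider_rate = pct(direct_providers, last_records)    # 52.7
```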

Page 17: FY 11/12 AETC Cross-Region Trainees in FUP Analyses

Data source: cross-region ER-PIF and ACRE FUP FY11-12.

N = 3,847 ACRE FUP records received by NEC

N = 108,687 FY 11-12 trainees (based on linked AETC PIF and ER)

n = 2,620 linked ER-PIF-ACRE FUP

n = 2,018 linked records and a targeted FUP training

n = 602 linked records and NOT a targeted FUP training

n = 1,707 linked records, FUP targeted, and trainee’s last record in FY 11-12

N = 108,687 excludes n = 2,459 event records without a PIF associated and n = 5,736 records with an invalid PIF ID. This number includes repeat trainees.

Though n = 61,647 records fulfilled the FUP target criteria, only n = 2,018 (3.3%) ER-PIF-FUP records linked and fulfilled the target.

Of the n = 1,707 last records, n = 1,014 (59.4%) indicated they were direct HIV providers on the PIF and FUP survey.

Page 18: Analytic Variables

Regression models have included the following predictors:

Big 6

Worked in Ryan White funded setting

Minority provider

Minority serving

Provider experience

HIV+ clients per month

Repeat trainee

All of the above predictors come directly from the PIF except for Repeat trainee status, which is based on the linked PIF-ER

Regression models are restricted to direct providers of HIV+

ACRE FUP web survey is targeted to direct providers

Page 19: Analytic Variable: Clinical Providers “BIG 6”

Comes from PIF question 3

Clinical providers encompass 7 professional categories, though we often refer to them as “big 6”

All other non-missing responses are coded as non-clinical providers

[Figure: Participant Information Form excerpt showing PIF3 (mutually exclusive)]

Page 20: Analytic Variable: Ryan White-Funded

From the RWFUND administrative variable on the bottom of the PIF

Exceptions apply: some regions have advised the NEC to use PIF8A for this information

[Figure: Participant Information Form excerpts showing RWFUND and PIF8A, annotated with response codes =1, =0 and =1, =0, =9]

Page 21: Analytic Variable: Minority Provider

A minority provider is Hispanic, multiracial, AI/AN, Asian, Native Hawaiian or Pacific Islander, or Black

A non-minority provider is a non-Hispanic White provider with only a single race indicated

Those without any race indicated are left as missing

[Figure: Participant Information Form excerpts showing PIF9 (=0, =1; mutually exclusive) and PIF10_1 through PIF10_5 (not mutually exclusive)]

Page 22: Analytic Variable: Minority Serving

Among providers with direct service experience to HIV-infected clients (PIF12_1=1 and PIF13=1):

“Minority serving” (i.e., serves greater than half minorities): PIF12B = 3 or 4

Not minority serving (i.e., serves fewer than half minorities): PIF12B = 0, 1, or 2

Skip pattern: This question should only be answered if PIF12_1=1 and PIF13=1

[Figure: Participant Information Form excerpt showing PIF12_2 with response codes =0, =1, =2, =3, =4]
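A minimal sketch of the minority-serving recode, assuming a PIF record is a dictionary and that records outside the skip pattern, or with an unlisted PIF12B value, are left as missing (`None`). The function name and the 1/0 output coding are illustrative.

```python
def minority_serving(pif):
    """Return 1 (minority serving), 0 (not), or None (missing / skipped)."""
    if pif.get("PIF12_1") != 1 or pif.get("PIF13") != 1:
        return None          # question should not have been answered
    answer = pif.get("PIF12B")
    if answer in (3, 4):
        return 1
    if answer in (0, 1, 2):
        return 0
    return None
```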

Page 23: Analytic Variable: Provider Experience

Among providers with direct service experience to HIV-infected clients (PIF12_1=1 and PIF13=1):

Novice: 0 to <2 years of experience

New: 2 to <3 years of experience

Experienced: 3 or more years of experience

Skip pattern: This question should only be answered if PIF12_1=1 and PIF13=1

[Figure: Participant Information Form excerpt showing PIF14 (continuous numeric variable)]
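The experience categories can be sketched as a recode of the continuous PIF14 value. Treating negative or absent values, and records outside the skip pattern, as missing is an assumption; the function name is illustrative.

```python
def experience_category(pif):
    """Map years of experience (PIF14) to Novice / New / Experienced."""
    if pif.get("PIF12_1") != 1 or pif.get("PIF13") != 1:
        return None              # outside the skip pattern
    years = pif.get("PIF14")
    if years is None or years < 0:
        return None
    if years < 2:
        return "Novice"          # 0 to <2 years
    if years < 3:
        return "New"             # 2 to <3 years
    return "Experienced"         # 3 or more years
```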

Page 24: Analytic Variable: HIV+ Clients per Month

Categories for HIV+ clients per month:

0/month: PIF13 = 0 (No direct HIV+ services provided) or PIF15 = 0

1-19/month: PIF15 = 1 or 2

20+/month: PIF15 = 3 or 4

Skip pattern: This question should only be answered if PIF12_1=1 and PIF13=1

[Figure: Participant Information Form excerpt showing PIF15 with response codes =0, =1, =2, =3, =4]
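A sketch of the three-level recode for HIV+ clients per month, assuming a PIF record is a dictionary. Returning `None` for any combination not listed on the slide is an assumption, and the function name is illustrative.

```python
def clients_per_month(pif):
    """Collapse PIF13 and PIF15 into 0, 1-19, or 20+ clients per month."""
    if pif.get("PIF13") == 0 or pif.get("PIF15") == 0:
        return "0/month"         # no direct HIV+ services, or zero clients
    if pif.get("PIF15") in (1, 2):
        return "1-19/month"
    if pif.get("PIF15") in (3, 4):
        return "20+/month"
    return None                  # unlisted or missing response
```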

Page 25: Special Initiatives

Page 26: Repeat Trainees

Repeat trainee status is relative to the last eligible record during the analysis period

An individual who attended multiple AETC trainings with only 1 MAI training would not be categorized as a repeat trainee in an MAI analysis, since the last eligible MAI training record is the first and only MAI training

However, this same individual would be considered a repeat trainee for a cross-region analysis during this time period

A trainee is considered non-unique if s/he has the same PIF ID within a combined AETC region (e.g., AETC codes 13, 39, and 51 are combined as the PAMA region)

Assumption: An individual took trainings within one region only. For example, a trainee who moved from CA to NY and has training records in both regions would be counted as two separate individuals in the cross-site data.

Page 27: Repeat Trainees – Combined AETC Codes

Regional AETC Name    Combined AETC Codes
Delta                 1, 30
Florida/Caribbean     2, 31, 57, 61
Midwest               4, 32
Mountain Plains       5, 33, 56
New England           8, 35
NY/NJ                 10, 36
Northwest             11, 37, 52
Pacific               12, 38, 50, 68
PAMA                  13, 39, 51
Southeast             15, 40, 58
TX/OK                 16, 41

Page 28: Repeat Trainees - Example

If PIF_ID 12345678 were truly a valid ID and the records below are all event data for this trainee in the fiscal year, sorted by event date:

PIF_ID    AETC  Funding Type  Training event (any type) during analytic period
12345678  13    MAI           1
12345678  13    Base          2
12345678  39    CDC testing   3
12345678  13    Base          4

In an MAI analysis, the latest MAI record would be retained. This trainee is not a repeat trainee in the MAI analysis.

In an overall analysis, this trainee is a repeat trainee. The fourth training record is retained for the analysis.

Notes: AETC=39 is grouped with AETC=13 for the region PA/MA. Repeat trainee analyses are coupled with the de-duplication process.
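The example can be reproduced with a small sketch that uses the combined AETC codes and treats repeat status as relative to the records eligible for a given analysis. The function name, the `eligible` parameter, and the `FUNDING` and `EVENT` field names are illustrative assumptions; the records are assumed to be sorted by event date already.

```python
COMBINED_AETC = {
    "Delta": (1, 30), "Florida/Caribbean": (2, 31, 57, 61), "Midwest": (4, 32),
    "Mountain Plains": (5, 33, 56), "New England": (8, 35), "NY/NJ": (10, 36),
    "Northwest": (11, 37, 52), "Pacific": (12, 38, 50, 68), "PAMA": (13, 39, 51),
    "Southeast": (15, 40, 58), "TX/OK": (16, 41),
}
REGION_OF = {code: name for name, codes in COMBINED_AETC.items() for code in codes}

def last_record_and_repeat(records, eligible=lambda r: True):
    """Return {(region, PIF_ID): (last eligible record, is_repeat_trainee)}."""
    result = {}
    for rec in records:
        if not eligible(rec):
            continue
        key = (REGION_OF[rec["AETC"]], rec["PIF_ID"])
        # A trainee is a repeat trainee once an earlier eligible record exists
        result[key] = (rec, key in result)
    return result

events = [
    {"PIF_ID": 12345678, "AETC": 13, "FUNDING": "MAI", "EVENT": 1},
    {"PIF_ID": 12345678, "AETC": 13, "FUNDING": "Base", "EVENT": 2},
    {"PIF_ID": 12345678, "AETC": 39, "FUNDING": "CDC testing", "EVENT": 3},
    {"PIF_ID": 12345678, "AETC": 13, "FUNDING": "Base", "EVENT": 4},
]
overall = last_record_and_repeat(events)
mai_only = last_record_and_repeat(events, eligible=lambda r: r["FUNDING"] == "MAI")
```

Here `overall` retains event 4 and flags the trainee as a repeat trainee, while `mai_only` retains event 1 and does not.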

Page 29: MAI Initiative Events

We identified data to include by limiting records to those identified as MAI on the ER:

ER5_3=1 on the Event Record form (not mutually exclusive)

Page 30: HIV Testing Events

We identified data to include by limiting records to those identified as “HIV Testing” on the ER and through the code associated with CDC funding used by AETC Regions:

ER4_7=1 -OR- ER4_31=1 (Event Record form) -OR- AETC = 30-41 (CDC testing code)
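The special-initiative filters can be sketched as flag functions on an Event Record stored as a dictionary. The function names are illustrative, and the MAI rule (ER5_3=1) is included alongside the HIV testing rule for comparison.

```python
def is_mai_event(er):
    """MAI event: ER5_3 = 1 on the Event Record."""
    return er.get("ER5_3") == 1

def is_hiv_testing_event(er):
    """HIV testing event: ER4_7 = 1, or ER4_31 = 1, or a CDC testing AETC code (30-41)."""
    aetc = er.get("AETC")
    cdc_code = aetc is not None and 30 <= aetc <= 41
    return er.get("ER4_7") == 1 or er.get("ER4_31") == 1 or cdc_code
```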

Page 31: ACRE Rescaled Outcomes

For ease of interpretation, all outcome responses were rescaled from 1-5 to 0-100 so that the results could be interpreted as percent change:

Original 1 (IP: “Novice”, “Poor”, “Disagree Strongly”; FUP: “Strongly Disagree”) = new 0

Original 2 (“Disagree”) = new 25

Original 3 (“Neither Agree or Disagree”) = new 50

Original 4 (“Agree”) = new 75

Original 5 (IP: “Expert”, “Excellent”, “Agree Strongly”; FUP: “Strongly Agree”) = new 100

Original scale values of 0 or >5 are recoded to missing. Decimal values between 1-5 are rounded down to a whole number.
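The rescaling is linear, new = 25 × (original − 1), applied after rounding down. A minimal sketch follows; treating any value below 1 (not only 0) as missing is an assumption, and the function name is illustrative.

```python
import math

def rescale_acre(value):
    """Rescale a 1-5 response to 0-100; out-of-range values become missing."""
    if value is None or value < 1 or value > 5:
        return None                       # 0, >5, or absent: recoded to missing
    return (math.floor(value) - 1) * 25   # decimals are rounded down first
```

For example, a response of 4.7 is rounded down to 4 and rescaled to 75.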

Page 32: Response Rates (ACRE-FUP)

Page 33: Response Rates - Background

Over a wide range of disciplines, email response rates average 20-30%

Factors hypothesized to influence response rates

Number of questions

Pre-notification

Follow-up

Salience

Page 34: 2013 Response Rates*

2013 response rates: 5% - 64%, avg: 30%

Top responders:

University of Hawaii (Pacific) – 64.1%

YVFWC (Northwest) – 47.7%

UNC Chapel Hill (SEATEC) – 42.5%

AARTH (Northwest) – 42.1%

Indiana (MATEC) – 38.5%

*Response rates by LPS for VF users with a minimum of 20 total participants

Page 35: 2014 Response Rates*,**

2014 response rates: 11% - 61%, avg: 27%

Top responders:

UK (SEATEC) – 60.5%

USC (SEATEC) – 55.8%

AZ AIDS ETC (Pacific) – 47.4%

SPIPA (Northwest) – 44.0%

Pittsburgh (PA/MA) – 41.0%

*Response rates by LPS for VF users with a minimum of 20 total participants

**Response rates through October 1, 2014

Page 36: Response Rates

LPS with >1 events per year have higher response rates: 31% vs 23%

Average response rate for LPS with 10+ events: 35%

Average response rate for LPS with 50+ attendees/event: 28%

Average response rate for LPS with <20 attendees/event: 35%

Page 37: Response Rates

Email comments from top responders:

Online registration (UK & USC)

Participant buy-in (UK, USC, SPIPA)

Cultural awareness of participants (SPIPA)

Monthly audits from central office (UK & USC)

Page 38: Response Rates

Additional comments?

Questions/concerns?