Big Data in the U.S. Consumer Price Index Presentation...Big Data in the U.S. Consumer Price Index:...

Post on 28-May-2020

3 views 0 download

Transcript of Big Data in the U.S. Consumer Price Index Presentation...Big Data in the U.S. Consumer Price Index:...

1 — U.S. BUREAU OF LABOR STATISTICS • bls.gov

Big Data in the U.S. Consumer Price Index:

Experiences & Plans

Crystal Konny, Brendan Williams, and David Friedman

Federal Economic Statistical Advisory Committee Meeting - June 14, 2019

2 — U.S. BUREAU OF LABOR STATISTICS • bls.gov2 — U.S. BUREAU OF LABOR STATISTICS • bls.gov

Potential Benefits

Transaction prices

Larger sample sizes

Reduced collection costs

Reduced or eliminated respondent burden

Data descriptiveness

Real-time expenditures and weights

3 — U.S. BUREAU OF LABOR STATISTICS • bls.gov3 — U.S. BUREAU OF LABOR STATISTICS • bls.gov

Challenges Methodological

Product life cycle, representativeness, data descriptiveness

Operational

Data lag, continuity, quality verification

Geographic structure

System design

Legal, Policy, and Budgetary

Contracting for data, webscraping agreements, confidentiality concerns

4 — U.S. BUREAU OF LABOR STATISTICS • bls.gov4 — U.S. BUREAU OF LABOR STATISTICS • bls.gov

Alternative Data Data not collected through traditional field

collection procedures by BLS staff

(traditional = in-store/on-phone/manually on website)

Three main categories:

Corporate

Secondary Source

Web/Mobile app scraping data

Decade of explorations & pilot projects –transition into production

5 — U.S. BUREAU OF LABOR STATISTICS • bls.gov5 — U.S. BUREAU OF LABOR STATISTICS • bls.gov

General steps for Alt Data Projects

Determine what to pursue

Evaluate options

Evaluate selected source (definition, coverage, other quality dimensions)

Evaluate data quality over predefined time

Methods to test

Evaluate results

Transition to production?

6 — U.S. BUREAU OF LABOR STATISTICS • bls.gov6 — U.S. BUREAU OF LABOR STATISTICS • bls.gov

Criteria for use in production (to date)

As good or better than current pricing methodology

Does improvement in index justify any additional costs – cost effective?

In general, is it a good fit for CPI?

Use of short-term solution while continuing to research longer-term improvements

7 — U.S. BUREAU OF LABOR STATISTICS • bls.gov

Corporate Data

CorpX

0

20

40

60

80

100

Oct

-14

Jan-1

5

Apr-

15

Jul-15

Oct

-15

Jan-1

6

Apr-

16

Jul-16

Cosmetics Misc. Goods Jewelry

Dishes Misc. Household Men’s suits or blazers

Women's tops, skirts, and suits Women’s outerwear

Impact of Incorporating CorpX

96

98

100

102

104

106

108

Jul-16

Oct

-16

Jan-1

7

Apr-

17

Jul-17

Oct

-17

Jan-1

8

Apr-

18

Jul-18

Oct

-18

Apparel CPI Apparel CPI + Transaction Data

10 — U.S. BUREAU OF LABOR STATISTICS • bls.gov10 — U.S. BUREAU OF LABOR STATISTICS • bls.gov

CorpY

February 2012 refused to initiate new prescription drug sample

March 2015 agreement to supply data corporately

May 2015 first use in index

11 — U.S. BUREAU OF LABOR STATISTICS • bls.gov11 — U.S. BUREAU OF LABOR STATISTICS • bls.gov

CorpYCorpY In-store

Item Selection

Probability Proportional to Size (PPS) over the past year nationally by sales excluding lowest 10% of transactions

PPS based on price of the last 20 prescriptions sold

Geography National Outlet Specific

Price Average price of at least 100 transactions

Single price

Insurance prices Mostly cash prices

National price Outlet specific price

Per pill price Per prescription price

Patent Loss Unit prices averaged across brand and generic

Based on analyst monitoring of patents for an NDC

Data Frequency

Bimonthly odd collection Monthly and bimonthly odd/even collection

12 — U.S. BUREAU OF LABOR STATISTICS • bls.gov

Secondary Source Data

13 — U.S. BUREAU OF LABOR STATISTICS • bls.gov13 — U.S. BUREAU OF LABOR STATISTICS • bls.gov

Hospitals and Physicians’ Services

Relative Importance 4.04%; response rate for Medical Care is 48.1%

4,116 price quotes

Cash price overrepresented

High respondent burden

High collection costs

Difficult collection methodology

Researching use of medical claims datasets

14 — U.S. BUREAU OF LABOR STATISTICS • bls.gov14 — U.S. BUREAU OF LABOR STATISTICS • bls.gov

New Vehicle Observations

0

100,000

200,000

300,000

400,000

500,000

CPI JDPower

Num

ber

of

Obse

rvations/

Month

Model Year Price Indexes

80

85

90

95

100

105

Jan-0

8

Jan-0

9

Jan-1

0

Jan-1

1

Jan-1

2

Jan-1

3

Jan-1

4

Jan-1

5

2009 2010 2011 2012 2013 2014 2015

Experimental Index for New Vehicles

Untaxed)

90

95

100

105

110

115

200712 200812 200912 201012 201112 201212 201312 201412 201512 201612 201712 201812

Exp New Vehicles (Untaxed) Official New Vehicles (Taxed)

17 — U.S. BUREAU OF LABOR STATISTICS • bls.gov

Web/Mobile app scraping data

Crowd Sourced Motor FuelsRegular Unleaded Gasoline

90

95

100

105

110

115

120

Nov-1

7

Dec-

17

Jan-1

8

Feb-1

8

Mar-

18

Apr-

18

May-1

8

Jun-1

8

Jul-18

Aug-1

8

Sep-1

8

Oct

-18

CPI GasBuddy Data

19 — U.S. BUREAU OF LABOR STATISTICS • bls.gov

Plans

20 — U.S. BUREAU OF LABOR STATISTICS • bls.gov20 — U.S. BUREAU OF LABOR STATISTICS • bls.gov

Establishing Priorities Relative importance of the item

Number of quotes replaced

Cost of collection relative to cost of alternative data

Respondent relationship with BLS

Concentration of respondents in the sample

Ease of implementation

Accuracy issues in the current index…

21 — U.S. BUREAU OF LABOR STATISTICS • bls.gov21 — U.S. BUREAU OF LABOR STATISTICS • bls.gov

In the works

Item RI#

quotes concentration issues

priority

Source of data

% sample

Gasoline (all types)4.344 3,778 M L H scrape 100

Other motor fuels0.094 830 M L H scrape 90

New vehicles3.695 1,900 L H H sec 100

Physicians' services1.728 1,993 L H H sec 75

Hospital services2.312 2,123 L H H sec 85

Cable and satellite television service 1.501 1,906 H H H sec 95

Wireless telephone services1.693 1,279 H H H sec 98

Land-line telephone services0.572 874 H H H sec 95

Internet services & electronic info providers 0.780 773 H H H sec 95

22 — U.S. BUREAU OF LABOR STATISTICS • bls.gov22 — U.S. BUREAU OF LABOR STATISTICS • bls.gov

In pursuitRI

# quotes

concentration issues priority

Source of data Experience

% sample

Prescription drugs1.316 4,641 H H H corp some

Limited service meals and snacks 2.542 2,808 M L M corp pursue

Delivery services0.014 231 H L corp pursue

Airline fares0.683 1,745 H L M

scrape, corp research

Used cars and trucks2.329 4,537 H H H sec Prod, seek 100

Postage0.094 230 H L sec prod

Leased cars and trucks 0.655 265 L H M sec research 100

Electricity2.655 1,406 M M H seek

Utility (piped) gas service 0.747 1,404 M M H seek

Rent and OER 31.548 seek

23 — U.S. BUREAU OF LABOR STATISTICS • bls.gov23 — U.S. BUREAU OF LABOR STATISTICS • bls.gov

Conclusions

Significant portion of the CPI based on alternative data within 5 years

Substantial R&D on methodology needed

Alternative data introduced incrementally alongside monthly publication

24 — U.S. BUREAU OF LABOR STATISTICS • bls.gov24 — U.S. BUREAU OF LABOR STATISTICS • bls.gov

Questions for FESAC

Do you have any reactions to the general criteria CPI has used to date for determining fitness for use? Are we missing anything, etc.?

Do our criteria for establishing priorities in moving forward make sense to you?

Any advice for meeting the methodological challenges BLS faces with some of the alternative data sources?

Contact Information

25 — U.S. BUREAU OF LABOR STATISTICS • bls.gov

Brendan Williams

Senior Economist

Branch of Consumer Prices

Williams.Brendan@bls.gov

Crystal Konny

Branch Chief

Branch of Consumer Prices

Konny.crystal@bls.gov

David Friedman

Associate Commissioner

Prices and Living Conditions

Friedman.david@bls.gov