Google Trends Predicting Present v2

  • 8/22/2019 Google Trends Predicting Present v2

    Google Confidential and Proprietary 1

    Predicting the Present With Google Trends

    Hyunyoung Choi

    Hal Varian

    June 2009

    Problem statement

    Government agencies and other organizations produce monthly reports on economic activity

    Retail Sales

    House Sales

    Automotive Sales

    Unemployment

    Problems with reports

    Compilation delay of several weeks

    Subsequent revisions

    Sample size may be small

    Not available at all geographic levels

    Google Trends releases daily and weekly indexes of search queries by industry vertical

    Real time data

    No revisions (but some sampling variation)

    Large samples

    Available by country, state and city

    Can Google Trends data help predict current economic activity?

    Before release of preliminary statistics

    Before release of final revision

    Categories in Google Trends by Query Shares

    Note: Queries from 2009-01-01 to 2009-04-30 & Growth Comparison w/ the same time window

    Real Estate

    Geography

    Category

    Time window

    Real Estate Agencies

    Rental Listings & Referrals

    Home Insurance

    Home Inspections & Appraisal

    Property Management

    Home Financing

    Subcategories under Real Estate by Query Shares

    Searches on Real Estate Agencies

    Searches on Rental Listings & Referrals

    Depicting trends

    Google Trends measures the normalized query share of a particular category of queries, which controls for overall growth.

    It is often useful to look at year-on-year changes to eliminate seasonality.

    Illustrate correlations and covariates.
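
    The year-on-year transformation mentioned here can be sketched in a few lines of Python. This is a minimal illustration, assuming a monthly index held in a plain list; the function name `yoy_growth` and the sample values are hypothetical and are not Google Trends data.

```python
def yoy_growth(index, period=12):
    """Return the percentage change versus the same period one year earlier.

    `index` is a list of monthly index values. The first `period`
    entries have no year-ago counterpart, so they are skipped.
    """
    return [
        100.0 * (index[t] - index[t - period]) / index[t - period]
        for t in range(period, len(index))
    ]


# Hypothetical monthly index: a seasonal pattern repeated with 10% growth.
year1 = [100, 105, 120, 130, 125, 110, 100, 95, 90, 100, 105, 98]
year2 = [v * 1.10 for v in year1]
growth = yoy_growth(year1 + year2)
print([round(g, 1) for g in growth])
```

    Because the seasonal shape is identical in both years, every year-on-year value is about 10%: the seasonal swings cancel and only the underlying growth remains.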

    Improving predictions

    Forecast time series using its own lagged

    values and add Trends data as a predictor.

    Statistical significance?

    Improved fit?

    Improved forecasts?

    Identify turning points?

    [Figure: Real Estate Agencies Query Index, 2006 to 2008]

    [Figure: Real Estate Agencies YOY Growth Index]

    15 yr Mortgage Rate vs. Home Financing

    Forecasting primer

    Basic forecasting models

    Autoregressive: value at time t depends on the value at time t-1

    Seasonal adjustment: value at time t depends on the value at time t-12 (for monthly data)

    Transfer function: value at time t depends on other contemporaneous or lagging variables

    Seasonal autoregressive transfer model: value at time t depends on

    Value at time t-12 (seasonality)

    Value at time t-1 (recent behavior)

    Other lagging or contemporaneous variables (such as Google Trends data)

    Typical question of interest

    How much more accurate do forecasts become when additional variables are added, over and above the accuracy obtained from the history of the time series itself?
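
    The seasonal autoregressive transfer model and the "typical question" can be illustrated with a small least-squares fit. This is a sketch on simulated data, not the authors' code: the series `y`, the predictor `x`, the true coefficients, and all function names are assumptions made for the example, and ordinary least squares via the normal equations stands in for whatever estimation routine was actually used.

```python
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, targets):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def mean_abs_error(rows, beta, targets):
    """In-sample mean absolute error of the fitted values."""
    preds = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(targets)


# Simulated monthly series: y depends on y[t-1], y[t-12], and x[t].
random.seed(0)
n = 300
x = [random.uniform(0, 10) for _ in range(n)]
y = [45.0] * 12
for t in range(12, n):
    y.append(5 + 0.5 * y[t - 1] + 0.3 * y[t - 12] + 0.8 * x[t]
             + random.gauss(0, 0.5))

# Regressors: intercept, y[t-1], y[t-12], x[t]; the baseline drops x[t].
rows_full = [[1.0, y[t - 1], y[t - 12], x[t]] for t in range(12, n)]
rows_base = [r[:3] for r in rows_full]
targets = y[12:]

beta_full = fit_ols(rows_full, targets)
beta_base = fit_ols(rows_base, targets)
mae_full = mean_abs_error(rows_full, beta_full, targets)
mae_base = mean_abs_error(rows_base, beta_base, targets)
print("coefficient on x:", round(beta_full[3], 3))
print("MAE without x:", round(mae_base, 3), "with x:", round(mae_full, 3))
```

    Comparing `mae_base` with `mae_full` is the in-sample version of the question above; an out-of-sample comparison would refit on a rolling window and score only the held-out points.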

    New Home Sales Model

    Time Series

    Recent trend with New Home Sales at t-1

    Seasonality with New Home Sales at t-12

    Google Trends

    Recent search activity on

    Real Estate Agencies

    Rental Listings & Referrals

    Home Inspections & Appraisal

    Property Management

    Home Insurance

    Home Financing

    Exogenous Variables

    Housing affordability with Average/Median Home Price

    Predicting the present

    New Residential Sales from US Census

    Monthly release 24 to 28 days after the month

    Seasonally adjusted

    National and regional aggregates

    Google Trends Real Estate by Category

    Home Inspections & Appraisal

    Home Insurance

    Home Financing

    Property Management

    Rental Listings & Referrals

    Real Estate Agencies

    New House Sales vs. Real Estate Google Trends

    Analysis and Forecasting

    Model:

    $Y_t = 446.1 + 0.864\,Y_{t-1} - 4.340\,\mathrm{us378.1}_t + 4.198\,\mathrm{us96.2}_t - 0.001\,\mathrm{AvgP}_{t-1}$

    Y_t: new houses sold in month t

    AvgP_{t-1}: average sales price of new one-family houses sold in month t-1

    us378.1: Google Trends index for vertical id 378 (Rental Listings & Referrals) in the 1st week of month t

    us96.2: Google Trends index for vertical id 96 (Real Estate Agencies) in the 2nd week of month t

    July 2008

    Actual = 515K

    Predicted = 442.98K

    Z-score = 2.53

    August 2008 Prediction = 417.52K
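
    The July 2008 figures can be related to each other with a short calculation. The slide does not state the standard error behind the z-score, so the sketch below assumes the z-score is the residual divided by a residual standard error and backs out the value that assumption implies; the variable names are illustrative.

```python
actual = 515.0       # July 2008 actual new home sales, in thousands
predicted = 442.98   # July 2008 model prediction, in thousands
z_score = 2.53       # z-score reported on the slide

# Residual: how far the actual value landed above the prediction.
residual = actual - predicted

# Assumption: z = residual / standard error, so the implied standard error is:
implied_sigma = residual / z_score

print(f"residual = {residual:.2f}K")
print(f"implied standard error = {implied_sigma:.2f}K")
```

    Under that assumption the residual of about 72K corresponds to a standard error of roughly 28.5K, which is why the July 2008 actual stands out as an unusually large miss.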

    Analysis and Forecasting

    Observations

    Since 2005 new house sales have been decreasing, with little seasonality

    Google Trends captures seasonality & recent trends

    Positive association with Real Estate Agencies (96)

    Negative association with Rental Listings & Referrals (378) and Average Price

    Travel

    Hotels & Accommodations

    Attractions & Activities

    Air Travel

    Bus & Rail

    Cruises & Charters

    Adventure Travel

    Car Rental & Taxi Services

    Vacation Destinations

    Subcategories under Travel by Query Shares

    Travel to Hong Kong

    Visitors Arrival Statistics from Hong Kong Tourism Board

    Monthly summaries released with a 1-month lag

    Reports country/territory of residence of visitors

    Data available 2004-2008

    Google Trends Travel by Category

    Hotels & Accommodations

    Air Travel

    Car Rental & Taxi Services

    Cruises & Charters

    Attractions & Activities

    Vacation Destinations: Australia, Caribbean Islands, Hawaii, Hong Kong, Las Vegas, Mexico, New York City, Orlando

    Adventure Travel

    Bus & Rail

    Visitors Arrival Statistics vs. Google Trends

    Analysis and Forecasting

    Model:

    $\log(Y_{i,t}) = 0.664 + 0.113\,\log(Y_{i,t-1}) + 0.828\,\log(Y_{i,t-12}) + 0.001\,X_{i,t,2} + 0.001\,X_{i,t,3} + 0.005\,\mathrm{FXrate}_{i,t} + u_i + e_{i,t}$

    $e_{i,t} \sim N(0, 0.0938^2)$, $u_i \sim N(0, 0.0228^2)$

    Y_{i,t} = arrivals in Hong Kong from the i-th country in month t

    X_{i,t,1} = Google Trends search index for the i-th country in the 1st week of month t

    X_{i,t,2} = Google Trends search index for the i-th country in the 2nd week of month t

    X_{i,t,3} = Google Trends search index for the i-th country in the 3rd week of month t

    FXrate_{i,t} = Hong Kong dollars per one unit of the i-th country's local currency in month t. The average FX rate in the first week is used as a proxy for each month's FX rate.

    u_i = random effect for the i-th country

    Visitor Arrival Statistics - Actual & Fitted

    Analysis and Forecasting

    Conclusion

    Arrival at time t is positively associated with arrival at time t-1 and arrival at time t-12.

    It shows strong seasonality and autocorrelation

    Arrival at time t is positively associated with searches on [Hong Kong].

    Arrival at time t is positively associated with FX rates.

    When the local currency appreciates relative to Hong Kong Dollar, visitors to Hong Kong increase.
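
    Because the response in the Hong Kong model is log(arrivals), a coefficient on a regressor entered in levels converts to a percentage effect through the exponential function. The sketch below works this out for the reported FX rate coefficient (0.005) and search index coefficient (0.001), holding all other terms fixed; the helper name `pct_effect` is hypothetical.

```python
import math


def pct_effect(coef, delta=1.0):
    """Percent change in Y when a level regressor rises by `delta`
    and the response variable is log(Y), other terms held fixed."""
    return 100.0 * (math.exp(coef * delta) - 1.0)


fx_effect = pct_effect(0.005)      # one more HKD per unit of local currency
search_effect = pct_effect(0.001)  # one more point on the search index

print(f"FX rate +1 unit   -> arrivals {fx_effect:+.3f}%")
print(f"Search index +1   -> arrivals {search_effect:+.3f}%")
```

    For coefficients this small the percentage effect is close to 100 times the coefficient, so a one-unit rise in the FX rate corresponds to roughly 0.5% more arrivals and a one-point rise in the search index to roughly 0.1% more.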

    Automobiles

    US Auto Sales by Make

    Monthly summaries released 1 week after end of month

    Data available by Car Sales, Truck Sales and Total Sales for each make

    Data available from 2003-2008

    Source: Automotive News Data Center

    Google Trends under Vehicle Brands Category

    Google Trends subcategory: Vehicle Brands

    Weekly search query index

    31 verticals in total in this subcategory

    27 verticals match makes for which monthly sales are available

    Google Categories under Vehicle Brands

    NOTE: Area represents query volume in the first half of 2008, and color represents the yearly growth rate of queries

    Auto Sales by Make (Top 9 Makes by Sales)

    Monthly Sales vs. Google Trends at Second Week of Each Month

    Analysis and Forecasting

    Fixed effects model:

    $\log(Y_{i,t}) = 2.4276 + 0.2552\,\log(Y_{i,t-1}) + 0.4930\,\log(Y_{i,t-12}) + 0.0005\,X_{i,t,1} + 0.0014\,X_{i,t,2} + a_i\,\mathrm{Make}_i + e_{i,t}$

    $e_{i,t} \sim N(0, 0.1347^2)$, adjusted $R^2 = 0.9829$

    Y_{i,t} = auto sales of the i-th make in month t

    X_{i,t,1} = Google Trends search index for the i-th make in the 1st week of month t

    X_{i,t,2} = Google Trends search index for the i-th make in the 2nd week of month t

    Make_i = dummy variable for auto make

    a_i = coefficient capturing the mean level of auto sales by make

    ANOVA Table

                         Df    Sum Sq   Mean Sq      F value    Pr(>F)
    trends1               1     12.89     12.89     710.3542   < 2e-16 ***
    trends2               1      0.05      0.05       2.7987   0.09455 .
    log(s1)               1   1532.95   1532.95   84452.7530   < 2e-16 ***
    log(s12)              1     24.07     24.07    1325.9741   < 2e-16 ***
    as.factor(brand)     26      3.34      0.13       7.0696   < 2e-16 ***
    Residuals          1480     26.86      0.02
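
    The ANOVA figures are internally consistent, which a short check makes visible. The sketch below recomputes each F value as the term's mean square divided by the residual mean square, and recomputes adjusted R2 from the sums of squares. It uses the rounded values printed in the table, so the results agree with the reported numbers only approximately; the variable names are illustrative.

```python
# (degrees of freedom, sum of squares) for each model term, as printed.
anova = {
    "trends1": (1, 12.89),
    "trends2": (1, 0.05),
    "log(s1)": (1, 1532.95),
    "log(s12)": (1, 24.07),
    "as.factor(brand)": (26, 3.34),
}
resid_df, resid_ss = 1480, 26.86

# F value = (term sum of squares / term df) / residual mean square.
resid_ms = resid_ss / resid_df
f_values = {name: (ss / df) / resid_ms for name, (df, ss) in anova.items()}

# Total sum of squares and degrees of freedom (corrected for the mean).
total_ss = resid_ss + sum(ss for _, ss in anova.values())
total_df = resid_df + sum(df for df, _ in anova.values())

r2 = 1 - resid_ss / total_ss
adj_r2 = 1 - resid_ms / (total_ss / total_df)

for name, f in f_values.items():
    print(f"{name:18s} F ~ {f:10.2f}")
print(f"R2 ~ {r2:.4f}, adjusted R2 ~ {adj_r2:.4f}")
```

    The recomputed adjusted R2 rounds to the reported 0.9829, and the lagged-sales term log(s1) accounts for the bulk of the explained variation, with the Trends terms adding a smaller but significant share.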

    Actual vs. Fitted Sales (Top 9 Makes by Sales)

    Analysis and Forecasting

    Conclusion

    Sales at time t are positively associated with Sales at time t-1 and Sales at time t-12.

    Sales show strong seasonality and autocorrelation

    Monthly sales are positively correlated with search volume in the first and second weeks of each month.

    If search volume increases by 1%, sales volume increases by 0.19% on average.

    Unemployment

    YoY Growth in Initial Claims & Google Search

    According to the NBER, the current recession started December 2007.

    The national unemployment rate passed 5% in mid-2008, and search queries on [Welfare and Unemployment] increased at the same time.

    Initial claims is an important leading indicator

    Google Trends data [Search Insights screenshot]

    Google Trends data [Search Insights screenshot]

    Initial Claims and Google Trends

    California, weekly data spanning March 2009 to May 2009

                        US Dept of Labor                                                                    Google Trends
    Week                Initial Claims   Continued Claims   Covered Employment   Insured Unemployment Rate   Jobs   Welfare & Unemployment
    3/15/09 - 3/21/09   81,236           859,561            15,395,215           5.58                        9%     -2%
    3/22/09 - 3/28/09   74,179           826,924            15,395,215           5.37                        6%     -9%
    3/29/09 - 4/4/09    69,471           866,734            15,395,215           5.63                        2%     -13%
    4/5/09 - 4/11/09    75,875           834,569            15,356,117           5.43                        0%     -12%
    4/12/09 - 4/18/09   84,410           846,477            15,356,117           5.51                        1%     -6%
    4/19/09 - 4/25/09   Release at 5/7/09                                                                    -9%    -9%
    4/26/09 - 5/2/09    Release at 5/14/09                                                                   -11%   -10%
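
    The Insured Unemployment Rate figures for California can be reproduced from the other two Department of Labor series, since the rate is continued claims as a percentage of covered employment. The sketch below assumes the five reported values line up with the first five weeks shown (the last two weeks were not yet released); the variable names are illustrative.

```python
# Values as reported for the five weeks with Department of Labor data.
continued_claims = [859_561, 826_924, 866_734, 834_569, 846_477]
covered_employment = [15_395_215, 15_395_215, 15_395_215,
                      15_356_117, 15_356_117]
reported_rate = [5.58, 5.37, 5.63, 5.43, 5.51]

# Insured unemployment rate = continued claims / covered employment, in %.
computed_rate = [
    round(100.0 * claims / employment, 2)
    for claims, employment in zip(continued_claims, covered_employment)
]

for computed, reported in zip(computed_rate, reported_rate):
    print(f"computed {computed:.2f}%  reported {reported:.2f}%")
```

    The computed and reported rates agree to two decimals for all five weeks, which supports reading the flattened table columns in this order.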

    Strong Autocorrelation in Initial Claims

    [Figure panels: Time Series; Autocorrelation Function]

    Initial Claims Before/After Recession Started

    [Figure panels: California; New York]

    Time Window for Analysis

    Window For Long Term Model

    Window For Short Term Model

    Recession Starts

    Model

    Reference model: ARIMA(0,1,1) × (1,0,0)_12

    Model with Google Trends: ARIMA(0,1,1) × (1,0,0)_12 plus Trends predictors

    Model fit improved significantly: smaller standard deviation, higher log likelihood, and smaller AIC

    Initial Claims are positively correlated with searches on Jobs and Welfare.
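
    Written out, a seasonal ARIMA(0,1,1) × (1,0,0)_12 model with external regressors takes the general form below. This is the textbook specification rather than a formula taken from the slide: the symbols $y_t$ (the modeled claims series), $x_t$ (the Jobs and Welfare search indexes), $\beta$, $n_t$, and the backshift operator $B$ are notation introduced here, and the plus sign on $\theta$ follows the convention used by R's `arima`, which is assumed rather than confirmed.

```latex
\begin{aligned}
y_t &= \beta^\top x_t + n_t, \\
(1 - \Phi B^{12})(1 - B)\, n_t &= (1 + \theta B)\, \varepsilon_t,
\qquad \varepsilon_t \sim N(0, \sigma^2)
\end{aligned}
```

    Setting $\beta = 0$ gives the reference model, so the two specifications differ only in the Trends regressors.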

    Reference Model

                Theta        Phi         Sigma   Log likelihood   AIC
    LT Model    -0.755 ***   0.619 ***   0.086   268.85           -531.69
    ST Model    -0.691 ***   0.463 ***   0.098   99.04            -192.08

    Model with Google Trends

                Theta        Phi         Jobs       Welfare     Sigma   Log likelihood   AIC
    LT Model    -0.725 ***   0.565 ***   0.004 **   0.003 **    0.083   285.96           -561.91
    ST Model    -0.657 ***   0.359 **    0.002      0.007 ***   0.088   114.19           -218.38

    Signif. codes: 0.001 *** 0.05 ** 0.01 *
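
    The reported AIC values follow from the log likelihoods through AIC = 2k - 2 log L, where k is the number of estimated parameters. The sketch below assumes k = 3 for the reference model (Theta, Phi, and the error variance) and k = 5 for the model with Google Trends (adding the Jobs and Welfare coefficients); those counts are inferred from the table, not stated on the slide.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 log L."""
    return 2 * n_params - 2 * log_likelihood


# (log likelihood, assumed parameter count, AIC reported on the slide)
models = {
    "LT reference": (268.85, 3, -531.69),
    "LT with Trends": (285.96, 5, -561.91),
    "ST reference": (99.04, 3, -192.08),
    "ST with Trends": (114.19, 5, -218.38),
}

computed = {name: aic(ll, k) for name, (ll, k, _) in models.items()}
for name, (_, _, reported) in models.items():
    print(f"{name:15s} computed {computed[name]:9.2f}  reported {reported:9.2f}")
```

    The computed values match the reported ones to within rounding of the log likelihoods. Adding the two Trends terms costs 4 AIC points in penalty, yet AIC still falls by about 30 for the LT model and 26 for the ST model.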

    Long Term Model: Prediction Comparison with MAE

    With Google Trends, the out-of-sample prediction MAE decreases by 16.84%.

    Prediction with rolling window from 1/11/2009 to 4/12/2009

    Prediction Error at t:

    Mean Absolute Error:
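
    The formulas for these two quantities did not survive in this transcript. A common definition, assumed here, takes the prediction error at t as the difference between the predicted and actual values and the MAE as the average of the absolute errors over the evaluation window; the slide may have used a log or percentage form instead. The sketch below uses made-up numbers, not the initial claims data, and all names are hypothetical.

```python
def mean_absolute_error(actual, predicted):
    """Average of |predicted - actual| over the evaluation window."""
    errors = [abs(p - a) for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors)


def pct_decrease(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline


# Hypothetical one-step-ahead predictions from two models.
actual = [100, 110, 120, 130]
pred_reference = [90, 115, 125, 120]
pred_with_trends = [95, 112, 123, 125]

mae_reference = mean_absolute_error(actual, pred_reference)
mae_with_trends = mean_absolute_error(actual, pred_with_trends)
improvement = pct_decrease(mae_reference, mae_with_trends)

print(mae_reference, mae_with_trends, improvement)
```

    In a rolling-window evaluation each prediction comes from a model refit on data available before that week, so the comparison reflects out-of-sample accuracy rather than in-sample fit.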

    Short Term Model: Prediction Comparison with MAE

    With Google Trends, the out-of-sample prediction MAE decreases by 19.23%.

    Prediction errors are within the same range as LT Model.

    Fit improvement is better with ST Model.

    Summary

    Google Trends significantly improves out-of-sample prediction of state unemployment, up

    to 18 days in advance of data release.

    Mean absolute error for out-of-sample predictions declines by 16.84% for LT Model and 19.23% for ST Model.

    Further work

    Can examine metro level data

    Other local data (real estate)

    Combine with other predictors

    Detect turning points?