New Perspectives for Survey Research. Data Linkage, Georeferenced Data, Causal Analysis (with examples from environmental research projects) Andreas Diekmann Zurich team: Heidi Bruderer Enzler, Jennifer Gewinner, Matthias Näf, Anouk Widmer September 12th 2018

  • New Perspectives for Survey Research. Data Linkage, Georeferenced Data, Causal Analysis (with examples from environmental research projects)

    Andreas Diekmann

    Zurich team: Heidi Bruderer Enzler, Jennifer Gewinner, Matthias Näf, Anouk Widmer

    September 12th 2018

  • Is moderate alcohol consumption good for health?

    ► „French Paradox“: Wine, cheese, and longer life: how do they go together?

    ► „Groenbaek study confirms: Wine drinkers live healthier than abstinent people!“ (Forum Wein und Gesundheit e.V.; Klatsky et al., 2003. Wine, liquor, beer and mortality. Am. J. of Epidemiology 158)

  • J-Curve of alcohol consumption

    https://www.wineinmoderation.eu/en/content/Wine-Health.11/

    Abstinent persons: a selective sample with a higher proportion of former addicts, among other confounders.

    ► This selective sample may be one reason for the fallacy in earlier studies (e.g. Fekjær 2013).
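    The selection argument above can be made concrete with a small arithmetic sketch. All risk values and the 20% share of former heavy drinkers are hypothetical numbers chosen for illustration; they are not taken from any of the cited studies.

```python
def group_mortality(share_former_heavy, risk_lifelong, risk_former_heavy):
    """Average risk in a group mixing lifelong abstainers and former heavy drinkers."""
    return (1 - share_former_heavy) * risk_lifelong + share_former_heavy * risk_former_heavy

# Hypothetical risks (illustration only).
RISK_LIFELONG_ABSTAINER = 0.010
RISK_MODERATE_DRINKER = 0.011   # assumed slightly HIGHER than for lifelong abstainers
RISK_FORMER_HEAVY = 0.030

# Observed "abstainers" = 80% lifelong abstainers + 20% former heavy drinkers.
abstainers_observed = group_mortality(0.2, RISK_LIFELONG_ABSTAINER, RISK_FORMER_HEAVY)

print(f"observed abstainers: {abstainers_observed:.3f}")
print(f"moderate drinkers:   {RISK_MODERATE_DRINKER:.3f}")
```

    Under these assumptions the observed abstainer group looks worse off than moderate drinkers, although lifelong abstainers have the lowest risk by construction. This is the left arm of a spurious J-curve.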

  • Wood et al., Lancet 2018: meta-analysis of 83 studies

    Threshold: 100 g per week (!)

    ► J-curve disproved!

    “The level of alcohol consumption that minimised harm across health outcomes was zero.”

    Griswold et al., Lancet 2018

  • Survey Research Problems

    ► Selective samples; declining response rates

    ► Unobserved heterogeneity (unobserved confounders)

    ► Fallacy of causal inference from survey/correlational/non-experimental data

    ► Validity of indicators (particularly when based on self-reported behavior)

    ► „Replication Crisis“ (Ioannidis 2005, „Why most published research findings are false“; Nosek et al. 2015, „reproducibility project“, Open Science Collaboration 2015: 64% of findings in psychology were not replicable!)

    ► How to cope with these problems? Examples from our research projects on environmentally related behaviour, energy saving, and pollution

  • How to cope with these problems?

    1. Data Linkage

    2. New Data Sources

    3. Methods of Causal Inference

    4. Future Perspectives

  • 1. Data Linkage

    „Cocaine in sewage. Wastewater tells more about drug trends than any survey.“ („Koks im Kanal. Über Drogentrends erzählt das Abwasser mehr als jede Umfrage“, Süddeutsche Zeitung, August 11/12, 2018)

    ► Survey data on self-reported behaviour are often unreliable. RRT („randomized response“) and related techniques are rarely helpful (Höglinger, Jann, Diekmann 2017; Höglinger and Diekmann 2017).

    ► Combine survey data and process-produced data!

  • Example 1: Do Future-Oriented Persons Use Less Energy? A Study Combining Survey and Metered Electricity Usage Data (NFP 71)

    Heidi Bruderer Enzler (1), Andreas Diekmann (1) & Ulf Liebe (2); (1) ETH Zurich, (2) University of Bern, Switzerland. Funded by a grant of the Swiss National Science Foundation (SNF).

    ► Data linkage by utility company

    • Contrary to expectations, we do not find any correlations between energy use and subjective discount rates.

    • The future orientation scale CFC (Consideration of Future Consequences), however, is related to energy use.

    • There is a large impact of gender. In one-person households, women's energy consumption is 20% lower than men's.
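    The linkage step in Example 1 can be sketched as a join on a shared household key. The field names (`household_id`, `cfc_score`, `kwh_per_year`) and all values are hypothetical; the slides do not describe the actual file layout or the matching procedure used by the utility company.

```python
def link_records(survey_rows, meter_rows, key="household_id"):
    """Inner join of survey answers and metered usage on a shared pseudonymous key."""
    meter_by_key = {row[key]: row for row in meter_rows}
    linked = []
    for survey_row in survey_rows:
        meter_row = meter_by_key.get(survey_row[key])
        if meter_row is not None:  # keep only households present in both sources
            linked.append({**survey_row, **meter_row})
    return linked

# Hypothetical survey answers (self-reported) ...
survey = [
    {"household_id": "h1", "gender": "f", "cfc_score": 4.2},
    {"household_id": "h2", "gender": "m", "cfc_score": 3.1},
    {"household_id": "h3", "gender": "f", "cfc_score": 2.8},
]
# ... and hypothetical process-produced meter readings.
meter = [
    {"household_id": "h1", "kwh_per_year": 1800},
    {"household_id": "h2", "kwh_per_year": 2100},
    {"household_id": "h4", "kwh_per_year": 2500},
]

linked = link_records(survey, meter)
for row in linked:
    print(row)
```

    Only households found in both sources survive the join (here h1 and h2), so the linked file can itself be a selective sample of the survey.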

  • Spatial Context: Environmental Justice in a GIS

    Example 2: Data linkage of survey data with geo-referenced data in a GIS

    The Social Gradient of Environmental Pollution. Do migrants suffer more from pollution than Swiss people? Is income related to environmental burden?

    Respondents of the Swiss Environmental Survey (FORS data archive)

    Pollutant map of particulate matter, Federal Office for the Environment: particles (PM10) (µg/m3) x 1000 in year 2000

    Linkage of household data with geo-referenced data on environmental pollution (particulate matter, NO2, noise)
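    One simple way to attach a pollution value to each household is a nearest-cell lookup in a pollutant raster. This is a minimal sketch with hypothetical coordinates and PM10 values; the projects described here work in a GIS, and the slides do not state which spatial join was actually used.

```python
import math

def nearest_cell_value(x, y, grid):
    """Return the pollutant value of the grid cell whose centre is closest to (x, y)."""
    _, _, value = min(grid, key=lambda cell: math.hypot(cell[0] - x, cell[1] - y))
    return value

# Hypothetical raster: (x_centre, y_centre, PM10 value) per cell.
pm10_grid = [
    (0.0, 0.0, 18.0),
    (1.0, 0.0, 22.5),
    (0.0, 1.0, 19.5),
    (1.0, 1.0, 27.0),
]

# Hypothetical geocoded survey households.
households = {"h1": (0.1, 0.2), "h2": (0.9, 0.8), "h3": (0.8, 0.1)}

exposure = {hid: nearest_cell_value(x, y, pm10_grid) for hid, (x, y) in households.items()}
print(exposure)
```

    The resulting exposure values can then be merged into the survey file as additional variables, one per pollutant.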

  • Noise and air pollution in Switzerland by nationality and income

    Regression coefficients (t-values in parentheses)

                                                          Air pollution                               Road traffic noise
                                                          NO2        PM10       PM2.5      Ozone      Day        Night
    Swiss                                                 ref.       ref.       ref.       ref.       ref.       ref.
    Western Europe, North America                         -0.37      -0.18      -0.10      2.77       0.54       0.62
                                                          (-0.73)    (-0.60)    (-0.47)    (0.28)     (0.90)     (1.02)
    Southern Europe                                       2.28**     1.44**     0.91**     18.82+     1.48*      1.84**
                                                          (4.10)     (4.35)     (4.01)     (1.73)     (2.22)     (2.75)
    Other (Balkans, Eastern Europe, Asia, South America)  3.20**     1.72**     1.05**     -21.23     1.89*      2.03*
                                                          (4.16)     (3.74)     (3.33)     (-1.41)    (2.01)     (2.14)
    Years of education, household (BFS 2007, in tens)     0.27       0.04       0.02       0.74       -0.54      -0.76
                                                          (0.68)     (0.17)     (0.13)     (0.10)     (-1.13)    (-1.58)
    Equivalised income (monthly, in thousands)            -0.14**    -0.07**    -0.05**    -0.05      -0.04      -0.04
                                                          (-4.88)    (-4.28)    (-4.15)    (-0.08)    (-1.13)    (-1.19)
    Rural area                                            ref.       ref.       ref.       ref.       ref.       ref.
    Agglomeration                                         6.55**     3.36**     2.29**     -22.83**   2.30**     2.61**
                                                          (21.47)    (18.52)    (18.41)    (-3.82)    (6.33)     (7.16)
    Small or medium-sized town                            8.33**     2.90**     1.91**     -54.37**   5.03**     4.00**
                                                          (22.49)    (13.13)    (12.61)    (-7.50)    (11.37)    (8.99)
    Large city                                            16.91**    8.50**     5.45**     -139.77**  5.37**     4.21**
                                                          (50.44)    (42.58)    (39.78)    (-21.30)   (13.40)    (10.45)
    Constant                                              16.42**    17.24**    13.27**    303.35**   48.52**    37.50**
                                                          (29.59)    (52.13)    (58.52)    (27.92)    (73.02)    (56.12)
    Adj. R-squared                                        0.526      0.446      0.409      0.195      0.086      0.052
    Number of cases                                       2569       2568       2568       2565       2546       2546

    Diekmann & Meyer 2010, 2012 (funded by SNF); new project: Bern, Zurich, Mainz, Hannover (Bruderer Enzler, Diekmann, Kurz, Liebe, Preisendörfer; funded by SNF & DFG)

    Surprisingly, very flat „social gradient“ of environmental burden by income in Switzerland!

    Bivariate correlations (p-values below the correlation coefficients)

                                                          Air pollution                               Road traffic noise
                                                          NO2        PM10       PM2.5      Ozone      Day        Night
    Swiss                                                 ref.       ref.       ref.       ref.       ref.       ref.
    Foreigners                                            0.10**     0.08**     0.08**     -0.00      0.07**     0.08**
                                                          0.000      0.000      0.000      0.812      0.000      0.000
    Years of education, household (BFS 2007, in tens)     0.10**     0.08**     0.08**     -0.06**    0.01       -0.02
                                                          0.000      0.000      0.000      0.001      0.663      0.404
    Equivalised income (monthly, in thousands)            -0.04+     -0.03+     -0.03+     -0.02      -0.02      -0.03
                                                          0.074      0.085      0.080      0.385      0.302      0.194
    Average number of cases                               2828       2827       2827       2824       2801       2801

    + p

  • Spatial Data: new developments (GIS) based on old ideas. Snow, John, 1855. „On the Mode of Communication of Cholera“

    Dr. John Snow analysed the spatial data of cholera deaths 1848-49 in London. He detected that the source of the illness was contaminated water.

    [Figure: detail from Snow's map, showing the pump on Broad Street and the frequency of deaths. Source: UCLA Department of Epidemiology, http://www.ph.ucla.edu/epi/snow/snowbook2.html]

  • Follow-Up Project Environmental Justice Surveys 2018

    • Mailed questionnaires, three reminders, no incentive, random sample from registration office

    Response rate (AAPOR): Bern 55 %, Zurich 48 %, Mainz 45 %, Hannover 35 %

    Research project on „Environmental Justice“: environmental burden in four urban areas in Germany and Switzerland; data linkage of survey and pollution data in a GIS. Funded by DFG and SNF.

    ► Going back to low-tech traditional methods pays off! Higher response rates than most CATI and CAPI surveys!

  • 2. New Data Sources: Automatic web scraping of digital data

    ► A huge amount of information is available on web pages. Renaissance of „unobtrusive measures“ (Webb, Campbell, Schwartz, Sechrest 1966).

    ► Example: Reputation, price, and feedback giving on digital markets (more than 350'000 auctions; Diekmann, Jann, Przepiorka, Wehrli, ASR 2014)

    ► Example: New (pilot) project on ratings of energy-intensive products: N = 1730 ratings of 754 refrigerators on Amazon. Automatic web scraping of product characteristics and ratings.

    ► Rating = b0 + b1 Efficiency + b2 Volume + b3 Noise (dB) + b4 Price + [b] Producer
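    A rating equation of this form can be fitted by ordinary least squares with a dummy variable for the producer. The slides do not say which estimator was used, so OLS is an assumption here. The products, the units, and the coefficient values below are hypothetical; they are not estimates from the pilot project. The sketch generates ratings from known coefficients and checks that the fit recovers them.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ols(X, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Hypothetical coefficients: b0, efficiency, volume, noise, price, one producer dummy.
TRUE_COEFFICIENTS = [3.0, 0.3, 0.1, -0.2, -0.05, 0.4]

# Hypothetical products: efficiency score, volume (100 l), noise (10 dB),
# price (100 EUR), producer dummy (1 = producer B, 0 = reference producer).
products = [
    (3, 2.0, 3.8, 6.0, 0),
    (2, 2.5, 4.2, 4.5, 1),
    (1, 1.5, 3.6, 3.0, 0),
    (0, 3.0, 4.0, 2.5, 1),
    (2, 1.0, 4.4, 5.0, 0),
    (3, 3.5, 3.7, 8.0, 1),
    (1, 2.2, 4.1, 3.5, 1),
    (0, 1.8, 3.9, 2.0, 0),
]
X = [[1.0, *p] for p in products]          # prepend the intercept column
y = [sum(b * x for b, x in zip(TRUE_COEFFICIENTS, row)) for row in X]  # noise-free ratings

coefficients = ols(X, y)
print([round(c, 3) for c in coefficients])
```

    With more than two producers, each additional producer gets its own dummy column and one producer serves as the reference category.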

    Significant coefficients, α = 0.05

    [Figures: pie chart of producer shares (aeg to zanussi); bar chart of counts by energy efficiency class (A+++, A++, A+, A, B, C, D, G)]
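    The automatic scraping of ratings mentioned on this slide can be sketched with Python's standard `html.parser` module. The markup (a `div` with `data-product` and `data-stars` attributes) is hypothetical and is not the structure of any real shop page; the sample page is an inline string, so no network access is involved.

```python
from html.parser import HTMLParser

class RatingParser(HTMLParser):
    """Collect (product, stars) pairs from elements carrying hypothetical data attributes."""

    def __init__(self):
        super().__init__()
        self.ratings = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "div" and "data-product" in attr and "data-stars" in attr:
            self.ratings.append((attr["data-product"], float(attr["data-stars"])))

# Hypothetical page fragment with two reviews and one unrelated element.
SAMPLE_PAGE = """
<html><body>
  <div class="review" data-product="fridge-a" data-stars="4.5">Quiet and efficient.</div>
  <div class="review" data-product="fridge-b" data-stars="3.0">Too loud.</div>
  <div class="ad">Sponsored</div>
</body></html>
"""

parser = RatingParser()
parser.feed(SAMPLE_PAGE)
ratings = parser.ratings
print(ratings)
```

    In a real project the parsed ratings would be stored together with the scraped product characteristics (efficiency class, volume, noise, price, producer) so that they can enter the rating regression.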

  • 3. Causal Inference. Experiments, Field Experiments, Intervention Studies

    ► Correlational analysis, multiple regression, SEM often misleading!

    ► „Gold standard“: RCT (Randomized Controlled Trial) in evidence based medicine

    ► Increasingly used design in the social sciences (Banerjee and Duflo 2011)

    ► Example: Allcott (2011): A large field experiment in the US with 600’000 households. About 2 % energy reduction through mailed reports with social comparison norms.

    ► Approximations of RCT; less restrictive (quasi-)experiments

    ► Sometimes useful; still, there is the problem of a „Hawthorne effect“.

    ► Interesting study by Schwartz et al. (2013): they found 2.7% lower energy use simply because subjects were aware of participating in a research project. Possible causal fallacy when participants in the treatment group are aware of their participation but those in the control group are not!
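    The possible fallacy in the last point can be shown with a small calculation. The sketch takes the 2 % treatment effect and the 2.7 % awareness effect from the two studies cited above. Combining them multiplicatively in one design is an assumption made for illustration, as are the baseline usage values.

```python
def mean(values):
    return sum(values) / len(values)

def observed_usage(baseline, treated, aware, treatment_effect, awareness_effect):
    """Usage after applying proportional reductions for treatment and for awareness."""
    usage = baseline
    if treated:
        usage *= 1 - treatment_effect
    if aware:
        usage *= 1 - awareness_effect
    return usage

def estimated_reduction(control_aware, baselines=(3000.0, 4000.0, 5000.0),
                        treatment_effect=0.02, awareness_effect=0.027):
    """Relative difference in mean usage between treatment and control group.

    Both groups get the same hypothetical baselines (idealised randomisation);
    the treatment group is always aware of taking part in the study.
    """
    treated = mean([observed_usage(b, True, True, treatment_effect, awareness_effect)
                    for b in baselines])
    control = mean([observed_usage(b, False, control_aware, treatment_effect, awareness_effect)
                    for b in baselines])
    return 1 - treated / control

print(f"control group aware:     {estimated_reduction(True):.4f}")
print(f"control group not aware: {estimated_reduction(False):.4f}")
```

    When both groups are aware, the awareness effect cancels and the estimate equals the treatment effect. When only the treatment group is aware, the estimate mixes both effects and overstates the treatment effect by more than a factor of two under these assumptions.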

  • Green energy default has an extremely large effect: an increase of about 80% in green energy consumption (and a corresponding decrease of conventional energy)

    [Figure: intervention year 2. Households: ca. 80%; Businesses: ca. 70%]

    10,659 households and 1,139 businesses

    Liebe, Ulf, 2018. Green Energy Defaults Have Massive and Persistent Effects in the Household and Business Sector; Liebe, Gewinner, Diekmann 2018, funded by NFP71.

    Our field studies of green energy are certainly not prone to a Hawthorne effect: „covered“ design, process-produced data.

  • Causal Inference for Non-Experimental Data

    Most significant development in statistics & software: fixed effects regression of panel data controls for observed and (time-constant) unobserved heterogeneity (under certain assumptions). Instructive overview: Brüderl 2010.

    Important progress in statistics, but not completely new: e.g. Mundlak 1961, Hsiao 1985.
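    The idea behind fixed effects regression can be sketched with the within transformation: subtracting each unit's own mean removes everything about the unit that is constant over time, observed or not. The panel below is hypothetical and noise-free; the unit effects are deliberately correlated with x so that the pooled estimate is biased.

```python
def slope(xs, ys):
    """Bivariate OLS slope: cov(x, y) / var(x)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def within_transform(panel, var):
    """Subtract each unit's own mean from its observations of `var`."""
    out = []
    for rows in panel.values():
        unit_mean = sum(r[var] for r in rows) / len(rows)
        out.extend(r[var] - unit_mean for r in rows)
    return out

# Hypothetical data-generating process: y_it = alpha_i + TRUE_BETA * x_it.
TRUE_BETA = -0.5
unit_effects = {"A": 10.0, "B": 20.0, "C": 30.0}   # unobserved, time-constant
unit_x = {"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0], "C": [7.0, 8.0, 9.0]}

panel = {
    unit: [{"x": x, "y": unit_effects[unit] + TRUE_BETA * x} for x in unit_x[unit]]
    for unit in unit_effects
}

# Pooled regression ignores the unit effects and picks up their correlation with x.
pooled_slope = slope(
    [r["x"] for rows in panel.values() for r in rows],
    [r["y"] for rows in panel.values() for r in rows],
)
# Fixed effects (within) estimator uses only variation inside each unit.
fe_slope = slope(within_transform(panel, "x"), within_transform(panel, "y"))

print(f"pooled slope: {pooled_slope:.2f}, fixed effects slope: {fe_slope:.2f}")
```

    In this example the pooled slope even has the wrong sign, while the within estimator recovers the true coefficient. Time-varying unobserved confounders are not removed by this transformation, which is one of the assumptions the slide alludes to.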

  • The Future of Survey Research

    ► Surveys will not die out! They are indispensable in many areas, particularly for opinion research.

    ► Systematic survey design: total survey error (TSE) approach and transparent reporting.

    ► The weakness of surveys in some areas may be compensated by data linkage (registered data, spatial data, social context, GIS).

    ► Spread of large panel (and trend) survey programs, „globalization“ and international comparison of household panels; specialized panels on health, education etc.

    ► But there will also be increasing use of alternative data sources: automatic text processing, web scraping and other techniques.

    ► Social sciences have to become more professional. They should attain the status of a cumulative science! (We are still in a state like e.g. chemistry in the 18th century.)

    ► The curriculum of social science studies needs profound revision: multivariate statistics, data science, knowledge in methods and model building.

    ► Data archives like FORS, GESIS etc. are extremely important for archiving and giving access to data, for replications, for methodological research, and for data collection.

    ► Replications, replications, replications! Never trust a single study! Most newly reported effects are either wrong or inflated. The SNF and FORS should support replication studies. Establish an annual prize for the best replication study!

  • Brigitte.de

    Congratulations on FORS' 10th anniversary!
