Research and Training on Big Data · 2018. 3. 15.

Research and Training on Big Data Seminar on Statistical Capacity Building for New Data Sources Keio Plaza Hotel, Tokyo, Japan 8 December 2017 Kaushal Joshi Asian Development Bank


  • Research and Training on Big Data

    Seminar on Statistical Capacity Building for New Data Sources

    Keio Plaza Hotel, Tokyo, Japan, 8 December 2017

    Kaushal Joshi, Asian Development Bank

  • Outline

    • Conventional vs Big Data sources

    • ADB’s Innovative Data Collection for Agriculture and Rural Statistics

    • ADB’s forthcoming Data for Development Initiative

    • Concluding Observations

  • Conventional Data Sources of Official Statistics

      • SURVEYS
      • CENSUSES
      • ADMINISTRATIVE REGISTERS

    Innovative Data Sources

      • SATELLITE IMAGES
      • MOBILE PHONE RECORDS
      • SENSORS AND SCANNER DATA
      • SOCIAL MEDIA DATA, etc.

    Conventional vs Big Data sources

  • The SDGs call for no one to be left behind.

  • The "leave no one behind" principle requires GRANULAR DATA, disaggregated by:

      • income class
      • population subgroups
      • gender
      • ethnicity
      • geographic location
      • migration status
      • disability status
      • etc.

    Conventional vs Big Data

  • Conventional vs Big Data

    • Limitations of sample surveys
    • Increasing costs to collect and analyze
    • Potential loss of quality
    • Pressure to collect more information
    • Response burden
    • Politics of/over data
    • Transparency, etc.

    Challenges for data disaggregation from traditional sources

  • Source of Funds: Japan Fund for Poverty Reduction

    • Pilot Countries: Lao PDR, Philippines, Thailand and Viet Nam

    • Implementation Period: June 2013 to October 2017

    • Objectives:

      • Development of customized software applications and methodology to estimate paddy rice cultivation area and crop production using satellite data,

      • Training of counterpart staff in the four pilot countries, and

      • Development of an online training program on the use of satellite data for agricultural and rural statistics.

    Innovative Data Collection for Agriculture and Rural Development - R-CDTA 8369

  • Developed customized versions of the INternational Asian Harvest mOnitoring system for Rice (INAHOR-AD)

    • Trained staff from Lao PDR, the Philippines, Thailand, and Viet Nam in:

      • basic remote sensing, use of INAHOR-AD software, use of QGIS, crop cutting, farmer recall survey, and

      • geospatial technologies (e.g. SNAP) and computer-assisted personal interviewing (e.g. Survey Solutions)

    Innovative Data Collection for Agriculture and Rural Development - R-CDTA 8369

  • Developed an online training course on Estimating Rice Paddy Extent and Production with ALOS-2/PALSAR-2 and INAHOR-AD

    • Promotional video for the course: https://youtu.be/SSwg000ooHc

    • Link for the course: http://adbx.online/

    Innovative Data Collection for Agriculture and Rural Development - R-CDTA 8369

  • Methodological Research - 1: Using Area Frame for Paddy Rice Statistics: Methodology and Weighting Procedures, Results of Survey Estimates and Sampling Errors.

    • An area frame approach in conjunction with the crop cutting technique is used to estimate paddy rice area, yield, and production for the 2015 cropping season (July 2015–November 2015) in the provinces of Savannakhet, Lao PDR; Ang Thong, Thailand; and Thai Binh, Viet Nam.

    • The results obtained are compared with existing administrative data sources. The two sets of estimates deviate significantly for rice area. Yield estimates are similar for both methods in all countries except Lao PDR.

    Innovative Data Collection for Agriculture and Rural Development
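As a rough illustration of how an area frame sample combined with crop cutting can yield area, production, and yield estimates, the sketch below applies a simple expansion estimator. It assumes a simple random sample of equal-sized segments; the segment values, the frame size, and the function name `expand_total` are hypothetical, and the weighting procedures actually used in the study are not described in these slides.

```python
import math

def expand_total(values, frame_size):
    """Expansion estimate of a frame total, with its standard error,
    assuming a simple random sample of segments without replacement."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    total = frame_size * mean
    # Finite population correction: (1 - n/N)
    std_error = frame_size * math.sqrt((1 - n / frame_size) * variance / n)
    return total, std_error

# Hypothetical sampled segments: measured paddy area (ha) and
# crop-cut yield (t/ha). These are NOT figures from the study.
segments = [
    {"paddy_ha": 12.0, "yield_t_per_ha": 4.0},
    {"paddy_ha": 8.0, "yield_t_per_ha": 5.0},
    {"paddy_ha": 10.0, "yield_t_per_ha": 4.5},
    {"paddy_ha": 6.0, "yield_t_per_ha": 5.5},
]
FRAME_SIZE = 200  # hypothetical number of segments in the area frame

area_total, area_se = expand_total(
    [s["paddy_ha"] for s in segments], FRAME_SIZE
)
prod_total, prod_se = expand_total(
    [s["paddy_ha"] * s["yield_t_per_ha"] for s in segments], FRAME_SIZE
)
# Average yield as a ratio of the two estimated totals.
avg_yield = prod_total / area_total

print(f"Paddy area: {area_total:.0f} ha (SE {area_se:.1f})")
print(f"Production: {prod_total:.0f} t (SE {prod_se:.1f})")
print(f"Average yield: {avg_yield:.2f} t/ha")
```

A stratified or unequal-probability design would need segment-specific weights in place of the single expansion factor used here.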

  • Methodological Research - 2: Land Measurement Bias: Comparisons from Global Positioning System, Self-Reports, and Satellite Data

    • This research looks at differences in farmer-reported area versus GPS (gold standard) and Google Earth for agricultural plot area.

    • Farmer-reported plot area estimates are found to be statistically different when compared with the two methods in three out of four countries.

    • Google Earth performs just as well as GPS (no statistically significant differences).

    Innovative Data Collection for Agriculture and Rural Development
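One common way to check whether two measurements of the same plots differ systematically is a paired t-test. The sketch below is a minimal example of that idea, not the study's actual procedure, which the slides do not specify. The plot areas and the function name `paired_t` are hypothetical.

```python
import math
import statistics

def paired_t(measure_a, measure_b):
    """Return the mean paired difference (a - b) and its t-statistic."""
    diffs = [a - b for a, b in zip(measure_a, measure_b)]
    mean_diff = statistics.mean(diffs)
    std_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff, mean_diff / std_error

# Hypothetical plot areas in hectares for the same 8 plots.
# These are illustrative values, NOT data from the study.
gps_area = [0.50, 0.80, 1.20, 0.30, 0.95, 0.60, 1.50, 0.40]
self_reported = [0.60, 1.00, 1.25, 0.40, 1.00, 0.75, 1.60, 0.50]

mean_diff, t_stat = paired_t(self_reported, gps_area)

# Two-sided 5% critical value of Student's t with 7 degrees of freedom.
T_CRITICAL_DF7 = 2.365
is_different = abs(t_stat) > T_CRITICAL_DF7

print(f"Mean over-report: {mean_diff:.3f} ha, t = {t_stat:.2f}")
print("Statistically different at 5%:", is_different)
```

The same function can be applied to Google Earth versus GPS measurements; a small t-statistic there would be consistent with the "no statistically significant differences" finding.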

  • Measuring Rice Yield from Space

    • Compared area and yield estimates between ALOS-2 satellite data and Landsat-MODIS fusion data.

    • ALOS-2 satellite data and Landsat-MODIS fusion data are equally efficient for paddy rice area estimation, but Landsat-MODIS provides better results for crop yield estimation.

    • Ground data (crop cutting) and Landsat-MODIS fusion data were used to create a spatially delineated rice yield map for Thai Binh province in Viet Nam to permit spatial analysis.

    Innovative Data Collection for Agriculture and Rural Development

    Methodological Research - 3

  • Innovative Data Collection for Agriculture and Rural Development - R-CDTA 8369

  • TA 9356: ADB’s Data for Development Project aims to strengthen the capacity of NSOs to meet the disaggregated data requirements of the SDGs.

  • Target Outputs

    • Training workshops on SAE and Big Data analytics targeted to NSO staff

    • Training Manual on Disaggregation of Official Statistics and SDGs

    • Online Course Modules on SAE and Big Data Analytics

    • Country-Specific Case Studies on SAE and Big Data Analytics

    TA 9356-REG: Data for Development

  • Country-Specific Case Studies

    • Explore the potential of using big data as an alternative source.

    • Facilitate comparison of estimated indicators based on methods using traditional data sources and big-data-complemented techniques.

    • Help NSOs identify their operational resource requirements in integrating big data analytics in their work programs.

    TA 9356-REG: Data for Development

  • TA focus areas can capitalize on the following innovative data sources

    • Satellite images

      • Publicly accessible
      • Have various applications
      • Estimation methods have already been developed

    • Mobile phone call detail records (CDRs)

    TA 9356-REG: Data for Development

  • Focus of Country-specific Case Studies

    • Population Mapping

    • Poverty Mapping

    TA 9356-REG: Data for Development

  • Population mapping example: (Top-left) Population density from census data for each administrative level 2 unit in an area of northern Vietnam. (Top-right) Land cover dataset for the same area. (Bottom-left) Satellite image of the area at night. (Bottom-right) WorldPop population modelling methods take the census data as input, then use machine learning methods to exploit the relationship between population density and high-resolution landscape features, such as those from land cover and satellite data, to predict population densities for each 100 x 100 m grid cell on the landscape. Source: http://www.worldpop.org.uk/about_our_work/case_studies/

    TA 9356-REG: Data for Development
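The final step of this kind of population mapping, spreading a census count over grid cells in proportion to a weighting layer, can be sketched as below. This is only an illustration under stated assumptions: the weights are hypothetical fixed numbers standing in for the output of a machine learning model, the function name `redistribute` is invented here, and the code is not WorldPop's implementation.

```python
def redistribute(census_total, cell_weights):
    """Allocate a census count across grid cells in proportion to
    per-cell weights, so that the cell values sum to the census total."""
    total_weight = sum(cell_weights)
    if total_weight <= 0:
        raise ValueError("at least one cell must have a positive weight")
    return [census_total * w / total_weight for w in cell_weights]

# Hypothetical weights for four grid cells in one administrative unit,
# e.g. water (0), cropland (1), peri-urban (3), built-up (6).
# In practice a model trained on land cover, night-time lights and
# similar features would predict these; here they are made up.
weights = [0.0, 1.0, 3.0, 6.0]
census_count = 5000  # hypothetical census population of the unit

cell_population = redistribute(census_count, weights)
print(cell_population)
```

Because the allocation is proportional, the gridded values always add back up to the census figure for the unit, which keeps the map consistent with the official count.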

  • Source: https://unstats.un.org/unsd/bigdata/taskteams/si-gsd/default.asp

    TA 9356-REG: Data for Development

    Big data – UN Global Working Group

  • Big data, with its advantages of timeliness and geospatial detail, offers immense potential for generating quick and more granular estimates.

    • Methods and results from big data applications, however, need to be tested and cross-validated against traditional survey results for robustness.

    • Proxy indicators correlated with traditional indicators (like night-time lights) provide opportunities to generate more frequent estimates and can complement estimates based on traditional data for early estimates and forecasting trends.

    Concluding Observations
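To make the idea of a proxy indicator concrete, the sketch below fits an ordinary least squares line between a proxy (night-time light intensity) and a survey-based indicator, then uses it for an early estimate. All numbers and the function name `fit_line` are hypothetical, and a real application would need far more data and validation against survey results.

```python
import math

def fit_line(x, y):
    """Ordinary least squares fit of y on x.
    Returns (slope, intercept, Pearson correlation)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    correlation = sxy / math.sqrt(sxx * syy)
    return slope, intercept, correlation

# Hypothetical regional values: mean night-light intensity (proxy) and
# a survey-based economic indicator. NOT real data.
night_lights = [1.0, 2.0, 3.0, 4.0, 5.0]
survey_indicator = [2.1, 3.9, 6.2, 7.8, 10.0]

slope, intercept, correlation = fit_line(night_lights, survey_indicator)

# Early estimate for a region where only the proxy has been observed.
new_lights = 3.5
early_estimate = intercept + slope * new_lights

print(f"slope={slope:.2f}, intercept={intercept:.2f}, r={correlation:.3f}")
print(f"Early estimate at lights={new_lights}: {early_estimate:.2f}")
```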

  • Big data pose challenges: privacy issues, costs, sharing of data by holders of big data, and capacity to use the data.

    • While big data should be embraced, traditional sources will remain important.

    • More methodological research is needed to adopt big data in official statistics.

    Concluding Observations

  • Thank you! Email: [email protected]

  • Innovative Data Collection for Agriculture and Rural Development

    SATELLITE    SOURCE       SPATIAL RESOLUTION      TEMPORAL RESOLUTION   COST   SENSOR TYPE
    MODIS        NASA         1 km / 500 m / 250 m    1-2 days              FREE   Optical
    Landsat      USGS/NASA    30 m                    16 days               FREE   Optical
    ALOS-2       JAXA         100 m                   14 days               Paid   SAR
    Sentinel-2   ESA          10 m                    5 days                FREE   Optical

  • Appendix: Big Data Analytics

    Source: Google Images

    DATA FOR DEVELOPMENT

  • Appendix: Big Data Analytics

    Source: ADB’s Key Indicators for Asia and the Pacific 2016

    DATA FOR DEVELOPMENT

  • Appendix: Big Data Analytics

    Source: ADB’s Key Indicators for Asia and the Pacific 2016

    https://blogs.adb.org/blog/how-nighttime-lights-help-us-study-development-indicators

    DATA FOR DEVELOPMENT

  • Appendix: Big Data Analytics

    Source: www.unglobalpulse.org

    DATA FOR DEVELOPMENT

  • Appendix: Big Data Analytics

    DATA FOR DEVELOPMENT

  • Appendix: Big Data Analytics

    Source: (Science) – Combining satellite imagery and machine learning to predict poverty

    DATA FOR DEVELOPMENT
