Research and Training on Big Data · 2018. 3. 15.
-
Research and Training on Big Data
Seminar on Statistical Capacity Building for New Data Sources
Keio Plaza Hotel, Tokyo, Japan, 8 December 2017
Kaushal Joshi, Asian Development Bank
-
Outline
• Conventional vs Big Data sources
• ADB’s Innovative Data Collection for Agriculture and Rural Statistics
• ADB’s forthcoming Data for Development Initiative
• Concluding Observations
-
Conventional Data Sources of Official Statistics
• Surveys
• Censuses
• Administrative registers
Innovative Data Sources
• Satellite images
• Mobile phone records
• Sensors and scanner data
• Social media data, etc.
Conventional vs Big Data sources
-
The SDGs call for no one to be left behind.
-
The "leave no one behind" principle requires GRANULAR DATA by:
• income class
• population subgroups
• gender
• ethnicity
• geographic location
• migration status
• disability status
• etc.
Conventional vs Big Data
-
Conventional vs Big Data
• Limitations of sample surveys
• Increasing costs to collect and analyze
• Potential loss of quality
• Pressure to collect more information
• Response burden
• Politics of/over data
• Transparency, etc.
Challenges for data disaggregation from traditional sources
-
• Source of Funds: Japan Fund for Poverty Reduction
• Pilot Countries: Lao PDR, Philippines, Thailand, and Viet Nam
• Implementation Period: June 2013 to October 2017
• Objectives:
  • Development of customized software applications and methodology to estimate paddy rice cultivation area and crop production using satellite data,
  • Training of counterpart staff in the four pilot countries, and
  • Development of an online training program on the use of satellite data for agricultural and rural statistics.
Innovative Data Collection for Agriculture and Rural Development - R-CDTA 8369
-
• Developed customized versions of the INternational Asian Harvest mOnitoring system for Rice (INAHOR-AD)
• Trained staff from Lao PDR, the Philippines, Thailand, and Viet Nam in:
  • basic remote sensing, use of INAHOR-AD software, use of QGIS, crop cutting, farmer recall survey, and
  • geospatial technologies (e.g. SNAP) and computer-assisted personal interviewing (e.g. Survey Solutions)
Innovative Data Collection for Agriculture and Rural Development - R-CDTA 8369
-
• Developed an online training course on Estimating Rice Paddy Extent and Production with ALOS-2/PALSAR-2 and INAHOR-AD
• Promotional video for the course: https://youtu.be/SSwg000ooHc
• Link for the course: http://adbx.online/
Innovative Data Collection for Agriculture and Rural Development - R-CDTA 8369
-
Methodological Research 1: Using Area Frame for Paddy Rice Statistics: Methodology and Weighting Procedures, Results of Survey Estimates and Sampling Errors
• An area frame approach, in conjunction with the crop cutting technique, is used to estimate paddy rice area, yield, and production for the 2015 cropping season (July 2015–November 2015) in the provinces of Savannakhet, Lao PDR; Ang Thong, Thailand; and Thai Binh, Viet Nam.
• The results are compared with existing administrative data sources. Rice area estimates deviate significantly between the two sources. Yield estimates are similar for both methods in all countries except Lao PDR.
Innovative Data Collection for Agriculture and Rural Development
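The slide does not spell out the weighting procedure, so the following is only a minimal sketch of how an area frame sample with crop-cut yields can be expanded to province-level totals. It assumes a simple random sample of equal-size segments with no stratification. The names `Segment` and `estimate_paddy` and all input values are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Segment:
    paddy_area_ha: float   # paddy area measured inside the sampled segment
    yield_t_per_ha: float  # crop-cut yield observed in that segment


def estimate_paddy(segments, total_segments):
    """Expand a simple random sample of segments to the whole area frame.

    Assumption: every segment in the frame had the same chance of selection,
    so each sampled segment represents total_segments / n segments.
    """
    n = len(segments)
    if n < 2 or total_segments < n:
        raise ValueError("need at least 2 sampled segments and N >= n")
    weight = total_segments / n  # expansion factor
    area = weight * sum(s.paddy_area_ha for s in segments)
    production = weight * sum(s.paddy_area_ha * s.yield_t_per_ha for s in segments)
    mean_yield = production / area if area > 0 else 0.0  # area-weighted yield
    # Standard error of the estimated area total, with finite population correction
    mean_area = sum(s.paddy_area_ha for s in segments) / n
    s2 = sum((s.paddy_area_ha - mean_area) ** 2 for s in segments) / (n - 1)
    se_area = total_segments * sqrt((1 - n / total_segments) * s2 / n)
    return {
        "area_ha": area,
        "production_t": production,
        "yield_t_per_ha": mean_yield,
        "se_area_ha": se_area,
    }


# Made-up illustrative inputs: 3 sampled segments out of a frame of 30
sample = [Segment(10.0, 4.0), Segment(20.0, 5.0), Segment(30.0, 6.0)]
result = estimate_paddy(sample, total_segments=30)
print(result)
```

A stratified or unequal-probability design would replace the single expansion factor with per-segment weights, and the resulting totals are what would then be compared against the administrative figures.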
-
Methodological Research 2: Land Measurement Bias: Comparisons from Global Positioning System, Self-Reports, and Satellite Data
• This research looks at differences between farmer-reported area and measurements from GPS (gold standard) and Google Earth for agricultural plot area.
• Farmer-reported plot area estimates are found to be statistically different from those of the two methods in three out of four countries.
• Google Earth performs just as well as GPS (no statistically significant differences).
Innovative Data Collection for Agriculture and Rural Development
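The slide does not name the statistical test used. One common way to check whether two measurement methods differ on the same plots is a paired t-statistic on the per-plot differences, sketched below. The function name `paired_t_statistic` and the plot areas are hypothetical illustrations, not data from the research.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(reported, reference):
    """Return (mean difference, t statistic) for paired plot measurements.

    reported  : areas from one method (e.g. farmer self-reports), in hectares
    reference : areas for the same plots from the benchmark (e.g. GPS)
    """
    if len(reported) != len(reference) or len(reported) < 2:
        raise ValueError("need two equally long lists with at least 2 plots")
    diffs = [r - g for r, g in zip(reported, reference)]
    d_bar = mean(diffs)
    se = stdev(diffs) / sqrt(len(diffs))  # standard error of the mean difference
    if se == 0:
        raise ValueError("all differences are identical; t is undefined")
    return d_bar, d_bar / se


# Made-up example: four plots measured by two methods
farmer = [1.0, 2.0, 3.0, 4.0]
gps = [0.5, 1.0, 2.5, 3.0]
bias, t_stat = paired_t_statistic(farmer, gps)
print(f"mean bias = {bias:.2f} ha, t = {t_stat:.2f} (df = {len(farmer) - 1})")
```

The t statistic would be compared with a Student t critical value for n - 1 degrees of freedom. A large absolute value points to systematic over- or under-reporting relative to the benchmark.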
-
Methodological Research 3: Measuring Rice Yield from Space
• Compared area and yield estimates between ALOS-2 satellite and Landsat-MODIS fusion data.
• ALOS-2 satellite and Landsat-MODIS fusion data are equally efficient for paddy rice area estimation, but Landsat-MODIS provides better results for crop yield estimation.
• Ground data (crop cutting) and fusion Landsat-MODIS data were used to create a spatially delineated rice yield map for Thai Binh province in Viet Nam to permit spatial analysis.
Innovative Data Collection for Agriculture and Rural Development
-
TA 9356: ADB’s Data for Development project aims to strengthen the capacity of national statistical offices (NSOs) to meet the disaggregated data requirements of the SDGs.
-
Target Outputs
• Training workshops on small area estimation (SAE) and Big Data analytics targeted to NSO staff
• Training Manual on Disaggregation of Official Statistics and SDGs
• Online Course Modules on SAE and Big Data Analytics
• Country-Specific Case Studies on SAE and Big Data Analytics
TA 9356-REG: Data for Development
-
Country-Specific Case Studies
• Explore the potential of using big data as an alternative source.
• Facilitate comparison of estimated indicators based on methods using traditional data sources and techniques complemented by big data.
• Help NSOs identify their operational resource requirements in integrating big data analytics in their work programs.
TA 9356-REG: Data for Development
-
TA focus areas can capitalize on the following innovative data sources:
• Satellite images
  • Publicly accessible
  • Have various applications
  • Methods for producing estimates already exist
• Mobile phone call detail records (CDRs)
TA 9356-REG: Data for Development
-
Focus of Country-Specific Case Studies
• Population Mapping
• Poverty Mapping
TA 9356-REG: Data for Development
-
Population mapping example: (Top-left) Population density from census data for each administrative level 2 unit in an area of northern Vietnam. (Top-right) Land cover dataset for the same area. (Bottom-left) Satellite image of the area at night. (Bottom-right) WorldPop population modelling methods take the census data as input, then use machine learning methods to exploit the relationship between population density and high-resolution landscape features, such as those from land cover and satellite data, to predict population densities for each 100 x 100 m grid cell on the landscape. Source: http://www.worldpop.org.uk/about_our_work/case_studies/
TA 9356-REG: Data for Development
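The caption describes census counts being spread over grid cells according to model predictions. The sketch below illustrates only that final redistribution step. It assumes each grid cell already has a non-negative weight, for example a density predicted by a machine learning model from land cover and nighttime lights. The function name and the weights are hypothetical, and this is not WorldPop's actual code.

```python
def redistribute_population(census_total, cell_weights):
    """Split one administrative unit's census count across its grid cells.

    Each cell receives a share proportional to its weight, so the cell
    values always add back up to the census total for the unit.
    """
    if any(w < 0 for w in cell_weights):
        raise ValueError("weights must be non-negative")
    total_weight = sum(cell_weights)
    if total_weight <= 0:
        raise ValueError("at least one cell needs a positive weight")
    return [census_total * w / total_weight for w in cell_weights]


# Made-up example: 1,000 people in a unit covered by three grid cells,
# with hypothetical model weights for cropland, village, and town cells
cells = redistribute_population(1000, [1, 3, 6])
print(cells)
```

Because the cell values are constrained to sum to the census total, the gridded surface stays consistent with the official figure while adding spatial detail within the unit.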
-
Source: https://unstats.un.org/unsd/bigdata/taskteams/si‐gsd/default.asp
TA 9356-REG: Data for Development
Big Data – UN Global Working Group
-
• Big data, with its advantages of timeliness and geospatial detail, offers great potential for generating quicker and more granular estimates.
• Methods and results from big data applications, however, need to be tested and cross-validated against traditional survey results for robustness.
• Proxy indicators that are correlated with traditional indicators (such as nighttime lights) make more frequent estimates possible and can complement estimates based on traditional data for early estimates and trend forecasting.
Concluding Observations
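As a rough illustration of the second and third points, the sketch below fits a straight line between a proxy indicator and a survey-based indicator, then checks the fit on regions held out from the fitting. This is one simple validation approach under the assumption of a linear relationship. The function names and all numbers are hypothetical and are not results from any ADB study.

```python
from math import sqrt


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two equally long lists with at least 2 points")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("proxy values must not all be identical")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def holdout_rmse(x_train, y_train, x_test, y_test):
    """Fit on the training regions, then measure error on held-out regions."""
    slope, intercept = fit_line(x_train, y_train)
    sq_errors = [(intercept + slope * xi - yi) ** 2 for xi, yi in zip(x_test, y_test)]
    return sqrt(sum(sq_errors) / len(sq_errors))


# Made-up values: nighttime light intensity (proxy) vs. a survey-based indicator
lights_train, survey_train = [1.0, 2.0, 3.0, 4.0], [3.1, 4.9, 7.2, 8.8]
lights_test, survey_test = [2.5, 5.0], [6.0, 11.3]
print(fit_line(lights_train, survey_train))
print(holdout_rmse(lights_train, survey_train, lights_test, survey_test))
```

A small held-out error relative to the indicator's scale would support using the proxy for early estimates between survey rounds. A large one suggests the relationship does not transfer across regions or periods.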
-
• Big data pose challenges: privacy issues, costs, data sharing by holders of big data, and the capacity to use the data.
• While big data should be embraced, traditional sources will remain important.
• More methodological research is needed to adopt big data in official statistics.
Concluding Observations
-
Thank you! Email: [email protected]
-
Innovative Data Collection for Agriculture and Rural Development
SATELLITE    SOURCE      SPATIAL RESOLUTION     TEMPORAL RESOLUTION   COST   SENSOR TYPE
MODIS        NASA        1 km / 500 m / 250 m   1-2 days              Free   Optical
Landsat      USGS/NASA   30 m                   16 days               Free   Optical
ALOS-2       JAXA        100 m                  14 days               Paid   SAR
Sentinel-2   ESA         10 m                   5 days                Free   Optical
-
Appendix: Big Data Analytics
Source: Google Images
DATA FOR DEVELOPMENT
-
Appendix: Big Data Analytics
Source: ADB’s Key Indicators for Asia and the Pacific 2016
-
Appendix: Big Data Analytics
Source: ADB’s Key Indicators for Asia and the Pacific 2016
https://blogs.adb.org/blog/how-nighttime-lights-help-us-study-development-indicators
-
Appendix: Big Data Analytics
Source: www.unglobalpulse.org
-
Appendix: Big Data Analytics
Source: (Science) – Combining satellite imagery and machine learning to predict poverty