1 1 Immigration Immigration Immigration. 2 Test Your Immigration Knowledge.
Emerging Uses of Big Data in Immigration Research...Emerging Uses of Big Data in Immigration...
Transcript of Emerging Uses of Big Data in Immigration Research...Emerging Uses of Big Data in Immigration...
EmergingUsesofBigDatainImmigrationResearch
FinalReportSubmittedtoSSHRCOctober13th,2016
GrantNo.421-2015-2036
WilliamAshton,PallabiBhattacharyya,EleniGalatsanou,SallyOgoeandLoriWilkinson1
Thisresearchwassupportedbythe
SocialSciencesandHumanitiesResearchCouncilofCanada.
1Thenamesarewritteninalphabeticalorderanddonotreflectorderofauthorship.SpecialthankstoKimLemkyforherinputwithproposaldevelopment.
1
TableofContentsKeymessages...............................................................................................................................................................2
Executivesummary......................................................................................................................................................3
1.Introduction.............................................................................................................................................................4
2.Context–theissue..................................................................................................................................................5
3.ImplicationsandPotentialUsesofbigdatainImmigrationResearch....................................................................7
4.Methodology...........................................................................................................................................................8
5.Results.....................................................................................................................................................................9
5.1.Howhas“bigdata”beenusedinimmigrationstudies?...................................................................................9
5.2.Challengesandlimitationsofusingbigdatainimmigrationstudies.............................................................15
5.3.Opportunitiesforimmigrationresearchwithbigdata...................................................................................17
6.KnowledgeMobilization........................................................................................................................................18
6.1.Integratedknowledgemobilization................................................................................................................18
6.2.End-of-Grantandpost-grantknowledgemobilization...................................................................................18
7.References.............................................................................................................................................................19
2
KeymessagesHowhasbigdatabeenusedinimmigrationstudies.
§ Thereisanincreasingtrendinutilizingbigdatainimmigrationstudiesfrom2007onwards,withmoreandmorepublishedstudiesusingbigdatathaneverbefore.ThemajorityofstudiesarepublishedinjournalarticleswhileLabormarketandSocialIntegrationarethetwomainresearchthemesthatdominatetheliterature.
§ Closeto60%ofimmigrationstudiesuseStatisticsCanadadatafortheirresearch,eitherintheformoftheCensus,LongitudinalandothersurveysbutalsolinkedbyStatisticsCanadadatabases.TheCanadianCensusandtheLongitudinalSurveyofImmigrantstoCanada(LSIC)areutilizedthemostamongthelargedatabases.
Challengeswithusingbigdatainimmigrationstudies
§ Bigdatasets,despitetheirlargenumberofparticipantsandvariables,haveissuesofpopulationunderrepresentation,especiallyinadministrativedataandoftencontributetoinaccurateandunsettledrelationshipbetweenselectedvariables.Largedatasetsoftenlackspecificvariablesoradditionalexplanationonvariablesandimmigrationresearchersrelyonassumptionsandproxiestocompletetheirstudies.
§ Methodologicalchallengesandchallengeswithdatamanipulationandvalidationinfluencesthekindofstudiesresearcherscandoandthekindofquestionstheycanask.Storage,management,andsharingofbigdataremainachallengeforallusersofbigdata.Withoutauniversity-affiliation,mostofthisdataisoutofthereachofnon-governmentsettlementorganizations,groupsthatcancriticallyanalyzethisinformationforpracticalpurposes.
§ Thereareethicalissueswiththeuseoflinkeddataandwithadministrativedataandmanyofthelargedatasetsdidnotseekethicalapprovalbyparticipants.Credibilityofresultsisamajorissuesincedatafromadministrativesources,collectedfromsocialmedia,internetandonlinecommunicationsourcesaresubjecttobias.
Opportunities
§ Bigdatacanproveverybeneficialforsettlementandcommunityorganizationsfordesigningandimplementationofpoliciesandprograms,provideditistimelyreleasedandeasytoaccess.Researchersshouldbeencouragedtopartnerwithinstitutionsandorganizationstoexaminequestionsrelevanttothepracticalneedsofthesettlementsector.
§ Greatknowledgecanbegainedfromexaminingdatafromonlinesources(largedatasets,socialmedia,webloginhistory,IPaddressesandothers)andcanprovidenewopportunitiesforimmigrationstudies.Addressinginherentrisksandtrainmoreresearchersinstatisticalmodelinganddataanalysisaresomemandatorystepsinorderforimmigrationresearchtoutilizeallformsofbigdata.
§ Bytheyear2018,newdevelopmentsconcerninglinkedadministrativeandsurveydataareexpectedtobereleasedandwillgiveadditionalopportunitiesforfurtherresearchonavarietyofdifferenttopics.
3
ExecutivesummaryThisprojectseekstounderstandtheuse,challenges,andopportunitiesofbigdatasetsinimmigrationresearch.Bigdataarelargesetsofdatawhosesizeisbeyondtheabilityoftypicalsoftwaretoolstocapture,store,manage,andanalyze.Thedatasetscomefromgovernmentsurveys,fromgeolocationlikeFacebooktags,anddigitaltransactionlikefinancialservices.Attheoutsetofthisproject,itisnotclearhoworifbigdataarebeingusedbyresearcherstodigdeeperandmorecomprehensivelyaboutthelivesofimmigrants.Suchfindingscaninformimmigrationsettlementservices,businessproductsandservicesofferedtoimmigrants,alongwithgovernmentpoliciesandprograms.Ascopingreviewmethodisolatedkeythemesembeddedwithinalargebodyofimmigrationliterature.Theestablishsix-stageframeworkbyLevac,Colquuhoum,andO'Brien(2010)enabledarigorousapproachthatidentified453papers,andafterconfirminghighlevelsofinter-raterreliability,therewere251immigrationstudies(1994-2016)usedinthisknowledgesynthesisproject.About60%werejournalarticles,withanother20%asreports,andthebalancefromgreyliteratureandtheses.Thepapersexaminedfivethemes,yetdominatedbylabourmarket(57%)andsocialintegration(25%)papers.Thestudiesused15databases,with60%ofthemdrawingfromStatisticsCanadaand65%usedonedatasetand15%utilizedmorethanthreesets.Authorsreportedmanychallengesthatweregroupedunderfivetopics,includingsamplingissues,lowresponserates,invalidatedvariables,access,andconfidentiality.Theauthorsidentifiedopportunitiesaswell,rangingfromtheneedfordiverseknowledgeondatamanipulationgiventhecomplexityandgrowingnatureofsomedatasets,todesignexperimentsonunprecedentedscale,skillsdevelopmentofresearcherstousedatasets,linkingdatabasestoexaminedifferentaspectsofimmigrants'lives,andneedtoadvancehardwareandsoftwarealongwithtoolsandtechniquesofdatamining.Bigdatahasgreatpotentialinimmigrationresearch.Asmentionedearlier,geocodeddatacanhelpaidorganizationsidentifywherepeopleinneedmightbelocated,butalsowhattheymightneed!TheSyrianHumanitarianTrackerhasbeenusedbythousandsofinternallydisplacedpeopleandasylumseekersabroadtoidentifysafetravelroutesandtoavoidhumantraffickers.ThiskindofdatahasbeenusedinNorthAmericatoproviderealtimealerts,withinformation,aboutmissingchildren(e.g.,AmberAlertsinCanada).Themonitoringofsocialmediadatabynon-governmentaidagenciescanhelpthembetterdeterminehowmanypeoplemightbewaitingtoaccessEnglishorFrenchlanguagecourses.Canadahasrecentlyprovidedextensiveandalmoston-demanddataontheSyrianrefugeearrivals.ThroughImmigrationRefugeeandCitizenshipCanada’sonlinedatabase,thepubliccannowviewmapswhichareuploadedfromtheirwebsiteandprovidegeographicallylinkedinformationonthenumberandlocationofresettledSyrianrefugeesinCanada.ThegovernmentofCanada’s“OpenData”initiativeprovidessomedataoncitizenshipuptake,internationalstudents,temporaryforeignworkpermitholdersandotherimmigrationrelatedissues.Thesearestatictables,butprovideadditionalinformationforuserswhichwerenotpreviouslymadeavailable.StatisticsCanadaalsoprovidestabularinformationbasedonresultsfromtheNationalCensusthatallowsitevisitorstoidentifyvarioustrendsinimmigrantsandlabourmarketidentifiers,ethnicidentityandsecond-generationandlanguageuseamongnewcomers.
4
StatisticsCanada,togetherwithImmigrationRefugeesandCitizenshipCanada,havebeenworkingtoprovideresearcherswithevenmoredataopportunities.ThisisinadditiontothewealthofdataStatisticsCanadaprovidesinbothPublicUseMicrodatafileswhichnoviceandintermediatestatisticianscanusetoproducetheirownstatisticalestimatesandequations(e.g.,EthnicDiversitySurvey,LongitudinalSurveyofImmigrantstoCanada,etc.).Moreadvanceddatausersaffiliatedwithuniversitiescanaccessthemasterdatafilesofoverthreedozencurrentsurveys.Thisallowsuserstolookatsmall-scaletrendsandpatternsindatathatcannotbereleasedtothegeneralpublic.Althoughitispossibleforindependentresearcherstogainaccesstoadministrative-leveldatasuchasthemasterdatafileforIMDBwhichishousedinOttawa,theseopportunitiesaregiventoonlyafewselectresearchers.Therehavebeensomeveryinnovativeandrecentusesofthedata,suchastheGISMappingProject(Garceaetal.,2016)whichusesdatafromtheImmigrantLandingFileandIMDBtogeo-codelandingsandotherrecordsamongimmigrantsandrefugeestoCanada’swesternandnorthernregions.
1.IntroductionWhatdobigdataandrefugeemovementshaveincommon?Turnsout,alotactually.Thepast18monthshavebeenabooninthespeed,volumeandtypeofdatabeingsharedbothpubliclyandbygovernmentsconcerningrefugeemovementsinEurope,AfricaandtheMiddleEast.Large-scalemigrationfromSyria,althoughoverfiveyearsold,reallybeganinearnestinSpring2015whenlargenumbersofasylumseekersbravedthe4KMjourneybetweenBodrumTurkeyandKosGreeceinricketyplasticdinghiesandbrokendownovercrowdedfishingboats.ThesigningoftheEU-TurkeyRepatriationofRefugeesAgreementinlate2015wassupposedtoend,oratleastgreatlydecrease,thenumberofwould-berefugeesmakingtheperilousjourneyacrosstheMediterraneanOcean.Instead,itisestimatedthatbyearlyDecember2016,over170,000newasylumseekerswillarriveontheshoresofItaly,surpassingthenumbersarrivinginGreece(IOM,2016).Howdoweknowthis?TheIOMreleasesweeklyfiguresofarrivals,deportationsandresettlementofrefugeesinEurope.Theydependondatasharingagreementswiththevariouscountriesandthentabulateanddistributethisinformationtoanetworkofpolicyanalysts,researchersandnon-governmentalorganizationsonaweeklybasis.TheCanadiangovernmentdidthesame,reportingweeklyonImmigrationRefugeeandCitizenshipCanada’s(IRCC)websiteonthenumbersandlocationsofthenewlyarrivedSyrianstoCanada.Itisthefirsttimeinhistorythatthislevelofdetailed,geographicallyconnecteddatawaswidelydistributedandpubliclyavailableinsuchashortperiodoftime.BoththeIOMandIRCCreportthattraffictothesewebsiteshasincreaseddramaticallyinthepastyearasthethirstforknowledgeanddataaboutthenewlyarrivingrefugeesincreased.Theserefugeemovements,alongwiththevariousgovernmentandnon-governmentinterventionstocountandaccountforrefugeemovement,gotourteamthinkingabouttheextenttowhichbigdatahasbeenusedinimmigrationresearchinCanada.Ourprojectaimstoidentifystudiesusingbigdataandtooffersomeinsighttothetypeofdatatheyuse,thetopicstheypursueandwherethesestudiesmightbepublished.Themovementtouse‘bigdata’inallsortsofresearchanddecisionmakingispartofagrowingtrendthattakesadvantageofthelargeamountofdatawegenerateashumansinthe21stcentury.In2015,“weproducedasmuchdataaswascreatedinallpreviousyearsofhumancivilization”(RattiandHelbing,2016:1).ThisdatahasbeenusedtoharnessthemovementandactivitiesoftheSyrianpeopleastheyfleetheconflictthatmirestheircountry.TheSyriaTrackerCrisisMap(HumanitarianTracker,2016),isanappcreatedbydisplaced
5
SyriansintendedtoprovideinformationforSyriansontherun(Rathnam,2015).ThedatacomeslargelyfromsocialmediaplatformssuchasTwitter,FacebookandSkype,alongwithmediareportstoproduceinformationwhichispostedtothesiteandintendedtoassistNGOsandotherorganizationswiththeneedsofdisplacedSyrians.Ithashelpedidentifygeographicregionswherediseaseamongstthedisplacedpeoplemightoccurnext,assistedinthepreventionofhumantrafficking,andiscreditedwithpredictingwhenthenextattackwilloccur(Rathnam,2015).WewonderediftheseinitiativesweretakingplaceintheCanadianresearchcontext.Thisreportbeginswithadiscussionofthecontextofbigdataandresearchingeneral.Theimplicationsofusingthiskindofdata,alongwiththemethodologyofthestudyfollow.Sectionfourdetailsthequantitativefindingsofourstudy,whilesections5and6detailtheknowledgemobilizationplansandsomeconcludingcomments.
2.Context–theissueTraditionally,bigdatahasbeendefinedasmega-sizedsetsofinformationthataretoolargeforcommondataprocessingsoftwaretohandle.Notsolongago,bigdatawouldhaveincludedthepublicusemicrodatafiles(PUMF)oftheCensusofCanada.Backinthemid-1990s,microcomputerpowerwassounder-developedthatPUMFfileshadtobehand-loadedontoamicroprocessorandtimetoanalyzetheinformationhadtobe‘booked’onaninstitution’smainframecomputer.Inthatage,acensusfileofhalfamillionpeoplewouldtakehourstocomputeeventhesimplestcross-tabulationsbecausecomputerprocessingpowerwasweak.Bythestartofthe21stcentury,microprocessorsbecamesmallerandfasterandourabilitytostoredatadigitally,insteadofbyanalog,hasmeantanexplosionindatastoring,sharingandcomputingcapabilities.Andbecausethecostofowningacomputerandstoringlargeamountsofdatahavegreatlydeclined,almostanyonewithdatabaseandstatisticalknowledgecanworkwithlargedatasets.Now,researcherscandownloadandanalyzecensusandothersimilardataontheirownandcanmanipulatethedatainmeresecondstoproducesimilar(andoftenbetter)andfasterresultsthaninpreviousyears.Alongsidethedevelopmentofbetterpersonalcomputerprocessors,companies,governmentsandotherorganizationsbegantodiscoverthatvalidpredictionsandtrendscouldbecalculatedbyexaminingthedatatheyalreadycollect.Andwecollectalotofdata.It’sbeenestimatedthat2.5exabytesofnewdataaregeneratedworldwideonadailybasis(HilbertandLópez,2011).This,inadditiontothedatacollectingpowerofvarioussoftwareprogramsdevelopedtogleaninformationfromtheinternethasmeantthatithasbecomebotheasierthanevertoconductdataanalyses,butalsomeantaproliferationindatacollectionactivitiesincludingsocialmediaandinternettrafficmonitoring,administrativedatamining,andlinkingdatabetweendatasets,theextentofwhichmanyoftheseactivitiesthepublicareunaware.Dataisn’tjustlimitedtotheaffluentorindevelopednations.Nearly3.5billionpeoplehavesomeaccesstotheInternet,representing40%oftheworld’spopulationandalmosthalfofthemarecurrentlylivinginAsia(InternetLiveStatistics,2016).AccordingtoICT(2016)thereareover7billionmobilephonesubscriptionsrepresentingabout60%oftheworld’spopulation.Thismeansagrowingnumberofpeople,regardlessoftheirlivingsituation,havesomeaccesstoelectroniccommunication,atypeofdatathatiseasilycollectedandmonitored.Thistypeofdataissometimesminedinbigdataprojectssoourabilitytomeasurecertainfeaturesoflifeandcertainlyaspectsofonlineconnectivityhavegreatlycontributedtothebigdatamovement.Thisdataisparticularlyimportantintermsofaddingageographic
6
dimension—wherepeoplearemakinginquiriesaboutinformationisaveryimportantindicatorforseveralsocialandhealthproblems.Internetsearchesfor‘flu’helphealthresearchersidentifypotentialoutbreaksandclustersbeforetheyarereportedtohealthauthorities.TheSyrianHumanitarianTrackermentionedaboveisanotherexampleofgeo-codeddataused.Governmentsareincreasinglyusingtheiradministrativedatabasestotracktrendsamongitscitizensandtomonitorcertainaspectsoftheirlives.Thisadministrativedataisadifferenttypeofinformation,noless‘messy’thanthedataminedfrominternetsources.Governmenttaxrecords,provincialhealthdatabasesandcriminaljusticeadministrativedatahaveallbeenusedtoproducereports,informpolicyandprovidemeaningful,large-scaleinformationtogovernmentoffices.Historically,thisdatahasonlybeenavailabletogovernmentemployees.Outsiders,suchasNGOs,independentresearchersandacademicswerelargelypreventedfromaccessingtheseinterestingdatasourcesduetoprivacyconcerns.Today,however,governmentsareincreasinglyconnectingwithacademicsanduniversitiestoanalyzethesedata.TwoexamplesaretheManitobaCentreforHealthPolicyResearchandPolicywiseforChildrenandFamiliesinAlbertaaretwoinitiativeswhereindependentresearchersroutinelygainaccesstovariousadministrativedatabaseswiththeexpresspurposeofproducinganalysesandreportsthatinformgovernmentpolicyanddecisionmaking.Thistypeofdataisalsoassociatedwiththebigdatamovementbecauseofthesizeandeverchangingnatureoftheinformationheldwithinthedatabase.Lesscommonareother,usuallyabitsmaller,formsofbigdatathatrelyondatalinkage.Datalinkagereferstotheconnectionoftwoormoreunrelatedsetsofdata.Intheimmigrationfield,themostwidelyusedandwellknownlinkeddataistheIMDBorImmigrantDatabase.ItlinkstheImmigrantLandingFileandtheCanadianTaxFilesforallimmigrantswhohavelandedinCanadasince1980.Thedataispreparedasa‘cube’meaningthatitislessflexiblethanotherdataasusersareusuallyconstrainedtochangingtheconditionsofthreeorfourvariablesatatimeandthatstatisticalanalysisisnotlargelypossible.Wecanthinkofthesekindsofdataasmorestatic.Inanattempttomakethemmoreuserfriendly,administratorshavemountedtheminplatformssuchas2020IvisionBrowserwhichallowuserstocreatetheirowntablesbymanipulatingaseriesofrowsandcolumns.Sadly,theseformatstendtobeclumsyandfrustratingfornoviceusers.Additionally,theselargedatabasesaresourcesofpotentialidentitydisclosure,soonlyafewvettedandpre-approvedresearchersandgovernmentemployeeseverhaveachancetousetheseverycomplexbutrichdata.Itisthislatestformofdata,linkeddatabasesthathaveproducedthemostrecentandvoluminousimmigrationresearchinCanada.“Bigdatareferstodatasetswhosesizeisbeyondtheabilityoftypicaldatabasesoftwaretoolstocapture,store,manage,andanalyze”(Manyikaetal.,2011:1).Butwhatistypicaldatabasesoftware?Althoughmanywell-knownandcommonlyusedstatisticalpackagesdohavelimitsastothenumberofgigabytesorterabytestheycanprocess,theircomputinglimitationsdon’tseemtopreventresearchersandstatisticiansfromharnessingthevastinformationcontainedwithinthem.TheOECD(2013)hascharacterizedbigdataashavingthesethreeattributes:
• Volume:theamountofdatawithinadatabase“challengesthecapacitiesoftraditionalsoftwareandstorage”
7
• Velocity:datacanbedistributed,disseminatedandanalyzedwithincreasingspeed.Thegoalistoprocessdatainrealtime
• Variety:datathatcomesfromavarietyofdifferentdatasources,preferablytriangulated
Thereareother‘Vs’thathavebeenaddedbyvariousresearchersincludingvalue,whichreferstotheeconomicandsocialusesofdata(LaczkoandRango,2014).OtherVsidentifiedincludevalue,visualization,veracityandvariability(vanRijmenam,2014).AsMcNulty(2014)observes,itisthe3Vsthatremainthemostwidelyusedtocategorizedataandarethecategorieswerelyuponinthisreport.
3.ImplicationsandPotentialUsesofbigdatainImmigrationResearchBigdatahasgreatpotentialinimmigrationresearch.Asmentionedearlier,geocodeddatacanhelpaidorganizationsidentifywherepeopleinneedmightbelocated,butalsowhattheymightneed!TheSyrianHumanitarianTrackerhasbeenusedbythousandsofinternallydisplacedpeopleandasylumseekersabroadtoidentifysafetravelroutesandtoavoidhumantraffickers.ThiskindofdatahasbeenusedinNorthAmericatoproviderealtimealerts,withinformation,aboutmissingchildren(e.g.,AmberAlertsinCanada).Themonitoringofsocialmediadatabynon-governmentaidagenciescanhelpthembetterdeterminehowmanypeoplemightbewaitingtoaccessEnglishorFrenchlanguagecourses.Canadahasrecentlyprovidedextensiveandalmoston-demanddataontheSyrianrefugeearrivals.ThroughImmigrationRefugeeandCitizenshipCanada’sonlinedatabase,thepubliccannowviewmapswhichareuploadedfromtheirwebsiteandprovidegeographicallylinkedinformationonthenumberandlocationofresettledSyrianrefugeesinCanada.ThegovernmentofCanada’s“OpenData”initiativeprovidessomedataoncitizenshipuptake,internationalstudents,temporaryforeignworkpermitholdersandotherimmigrationrelatedissues.Thesearestatictables,butprovideadditionalinformationforuserswhichwerenotpreviouslymadeavailable.StatisticsCanadaalsoprovidestabularinformationbasedonresultsfromtheNationalCensusthatallowsitevisitorstoidentifyvarioustrendsinimmigrantsandlabourmarketidentifiers,ethnicidentityandsecond-generationandlanguageuseamongnewcomers.StatisticsCanada,togetherwithImmigrationRefugeesandCitizenshipCanada,havebeenworkingtoprovideresearcherswithevenmoredataopportunities.ThisisinadditiontothewealthofdataStatisticsCanadaprovidesinbothPublicUseMicrodatafileswhichnoviceandintermediatestatisticianscanusetoproducetheirownstatisticalestimatesandequations(e.g.,EthnicDiversitySurvey,LongitudinalSurveyofImmigrantstoCanada,etc.).Moreadvanceddatausersaffiliatedwithuniversitiescanaccessthemasterdatafilesofoverthreedozencurrentsurveys.Thisallowsuserstolookatsmall-scaletrendsandpatternsindatathatcannotbereleasedtothegeneralpublic.Althoughitispossibleforindependentresearcherstogainaccesstoadministrative-leveldatasuchasthemasterdatafileforIMDBwhichishousedinOttawa,theseopportunitiesaregiventoonlyafewselectresearchers.Therehavebeensomeveryinnovativeandrecentusesofthedata,suchastheGISMappingProject(Garceaetal.,2016)whichusesdatafromtheImmigrantLandingFileandIMDBtogeo-codelandingsandotherrecordsamongimmigrantsandrefugeestoCanada’swesternandnorthernregions.
8
Insummary,therearemanypossibilitiesforusingthisdataandnon-governmentagencies,researchersandpolicyanalystsaremakingincreasinguseofthistimelyandimportantdata.Thisreportoutlinessomeoftheoutcomesofthisresearch.
4.MethodologyThisprojectseekstounderstandtheuseandchallengesofbigdatainimmigrationresearch.Ascopingreviewmethodservedtoisolatekeythemesembeddedwithinsubjectareaswhichenabledapreliminarysummaryofalargebodyofimmigrationliterature.Givenashorttimeline,Levac,ColquhounandO’Brien’s(2010)six-stageframeworkprovidedarigorousmethodformappingliterature.Steponeofthescopingreviewisdeterminingthesearchtermstobeusedandinourcase,thedatabasesmostcommonlyused.Weusedseveralsearchtermsincluding:administrativedata,refugees,mentalhealth,evidencebasedresearch,datadrivenresearch,bigdataimmigration,labourmarketincomeandjobs,amongothers.Wealsosearchedformajordatabaseswhichwewereawareof.Theseinclude:bigdata,LongitudinalSurveyofImmigrantstoCanada(LSIC),LandedImmigrantsDatafile(LIDS),ImmigrantLandingFile(ILF),ImmigrantDatabase(IMDB),CensusofCanada,NationalHouseholdSurvey(NHS),andCanadianCommunityHealthSurvey(CCHS).Thesecondstageidentifiedarticlesfromjournalsaswellasotherrelevantnon-indexedliteratureincludinggovernmentreports,OECDreports,andunpublishedstudies.Thesesearchtermswereusedtosearchuniversitylibrarywebsitesanddatabases,suchasEBSCOhostthatindexesandabstractsover3,000periodicals,andfulltextofover1,500periodicals.Manyoftheseperiodicaljournalsarepeer-reviewed.Inaddition,librariansfromtheUniversityofManitobaandBrandonUniversityreviewedandtheirsuggestionswereusedtoassistusinthesearchprocedures.Thissearchresultedinadatapoolwiththerelevantliteratureextracted.Thethirdstageofthestudyinvolvedtworesearchersindependentlyreviewingtheabstracts,researchthemes,methodologyandstudylimitationsforeacharticleandreport.Allthisinformationhelpedindecidingiftheiteminvolvedbigdataandhowtheyhavebeenusedinimmigrationstudies.Themappingofthedatawasdoneintwoseparatespreadsheetsforatotalof453studies.Bothreviewersindependentlyextractedtherequiredinformationandcodedtheliteratureas“include”,“exclude”or“uncertain”.Twenty-fourstudieswereduplicatesofoneanotherand127studieswereexcluded.Atotalof251studieseventuallymettheprojectcriteria:beingaboutatopicofimmigrationinCanadaandpossiblyinvolving‘bigdata’.Therewerealsoabout60generalresearchpapersandresearcharticlesthatexaminebigdataasaresearchmethodology,whichwereseparatelycodedforansweringfewofthestudyquestionsonchallengesandopportunities.Thefourthstageinvolvedconfirmingconsistencyofcodingthearticlesbetweenthetworesearchers.Tenrandomentriesweregiventoeachresearchassistanttorecodeasaprocessofcheckingvalidityandreliabilityincodingofdata(foratotalof20studiesrecoded).Aftersmallchangesweremadetoensureconsistencytheyproceededtocodeallarticles.Athirdresearcherscrutinizedthecodingresultsandresolvedanydiscrepancies.Thesechecksservedtostrengthenthereliabilityofthecoding.
9
Thefifthstageinvolvedcollatingandsummarizingresultswithadescriptivenumericanalysis(numberofdocuments,type,yearpublished,etc.)andathematicanalysis(e.g.identifythemesused).Reportingresultsincludedidentifyingimplicationsforresearch,policyandpractice.Thelaststage,consultation,wascompletedwiththeadvisorypanelforthepurposeofknowledgemobilization.Therewereinbetweenconsultationswiththeadvisorypanelforincludingtheirideasandsharingtheresearchprocessandoutcomesfromtimetotimeduringthewholetenureoftheproject.Therewerefewmethodologicalchallengeswhichtheresearchersfacedwhiledoingthescopingreview.Itwasdifficulttodecideatthebeginningifthearticlesinthesearchlistwereappropriatefortheintendedresearchtopic,especiallyifthedatathatwasusedwasreally‘bigdata’.Intheend,wedecidedtocodethedatasetsinawaythatallowedustoeasilyidentifytheadministrativelinkeddataandcouldseparatethosefromothertypesofdata(someofwhichisarguablynot‘bigdata’atall).Wefoundthatkeywordsthatauthorsusedtodescribetheirstudiesdidnotaccuratelyreflecttheactualcontentofthearticle.OursearchwasconductedusingEnglishtermsonly,whichbiasedoursearchresults.Giventhesixmonthtimeconstrainttocompletethisproject,weranoutoftimetoincludethevaluableFrenchlanguagepapers.Thiswaywemighthavemissedsomescopingreviewsintheavailableliterature.Itwasalsoverydifficulttodecideifwehaveincludedalltherelevantarticlesandtheonlywaywecouldassurethatwehavesearchedexhaustively,wasthroughrepeatedsearchleadingtosimilararticlesshowingupagainandagain.Oneoftheprincipleinvestigatorsassistedtheteambyidentifyingthenamesoflargedatasetsasawayoffilteringinandoutrelevantmaterial.Forthisreview,boththereviewershadtousetheirstrongjudgmenttodeterminethateachreviewasawholesufficientlymetourstudyhypothesisandresearchquestionsofscopingreview.Thismighthaveleftsomegapwithinthemethodologicalprocesssubjecttoreviewerbias.
5.Results
5.1.Howhas“bigdata”beenusedinimmigrationstudies?Accordingtothereport“bigdatainActionforDevelopment”,producedbytheWorldBankGroup(López.H,etal.2014.),therearetwobasicbigdataapproachesidentifiedinGlobalcontext.Amongthetwoapproaches,onewayofutilizingbigdatais“forprojectsorprocesseswhichseektoanalyzebehaviorsoutsideofgovernmentordevelopmentagenciesinordertoheightenawarenessandinformdecisionmaking”(López.H,etal.2014,Pg.11).Thesecondapproachhelpsinanalyzing“behaviorsinternaltoasingleinstitution,suchasthegovernment,oftentostreamlineandimproveservices.”(López.H,etal.2014,Pg.11).Generallyspeaking,ourreviewfindsthatmostofthedatausedinpublishedresearchonimmigrantsdealswithdatainthesecondapproach,governmentleveladministrativedata.Therewere,however,ahandfulofstudiesthatutilizednon-administrativedatatoexaminecertainaspectsofthemigrationexperience.Thereportonthe“BigdatainActionforDevelopment”(López.H,etal.2014),givesanaptexampleonhowtextanalysisofsocialmediadatahasbeenabletosuccessfullyidentifyproblemsofdifferentgroupsofpeoplesuchasrefugeesandalsomaphumanmigrationpatterns,utilizingBigDatafordevelopingactionprojectsinvolvingbothdatascientistsandpractitionerstogetherforsolvingdifferentsocialissues.Topicsuncoveredinourscopingreviewonimmigrationand‘bigdata’includelabourmarketoutcomes,culturalandethnicdiversity,healthassessments,andvariousaspectsrelatedtosocial
10
integration.Thefollowingchartsandtablesreflectsomeofthemajorthemesandobservationswemadeaswesurveyedtheexistingresearch.Themajorityofimmigrationstudiesutilizingbigdatawerepublishedafter2007,butthetrendappearstobeupward,withmoreandmorepublishedstudiesusingbigdatathaneverbefore.Figure1-NumberofBigDataStudiesPublishedbyYearofPublication
Figure1examinesthegrowingtrendofpublicationsinvolvingbigdataandimmigration.Approximately79%(199outof251)ofthebigdataresearchstudiesidentifiedhavebeenpublishedbetween2007and2016.In2010,thelargestnumberofstudiesinvolvingbigdatawerepublished,withatotalof41.After2010,about20studiesperyearinvolvedbigdata.Clearly,researchersandpolicymakersarebeginningtorelymoreonbigdatafortheirresearchinthefieldofimmigrationstudiesandthetrendseemstobecontinuing.Thisislikelyareflectionofbetteraccesstobigdataandresearcherswithbetter‘bigdata’skillsetsthaninpreviousyears.
0%
2%
4%
6%
8%
10%
12%
14%
16%
18%
0
5
10
15
20
25
30
35
40
45
1994
1995
1996
1997
1998
1999
2000
2001
2002
2003
2004
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
NumberofBigDataStudiesPublishedbyYearofPublica9on
#studies Percentage Linear(#studies)
11
Themajorityofimmigrationresearchutilizingbigdataispublishedinacademicjournals.Figure2-Frequencydistributionofbigdatastudiesbypublicationtype
Mostofthebigdataimmigrationresearchwecouldlocatehasbeenpublishedasjournalarticles(60%).Another20%arereports,greyliterature(18%)ordoctoralormaster’stheses(2%)(seeFig.2).Outofthe251studies,150appearinpeer-reviewedacademicjournals.Another51studiesaregovernmentreportsand44appearinthegreyliterature(definedastechnicalandworkingpapers,statisticalreports,etc.).Labourmarketissuesdominatestudiesusingbigdata.Withinthereviewedliterature,thepredominantthemewasimmigrantlabourmarketoutcomeswith57%ofthereviewedpapers(144outof251).Thisisnotsurprisinggiventhatanexaminationofpublicationthemesofthemajorpeer-reviewedjournalsinimmigrationworldwiderevealsthatlabourmarketstudiesarethethemeofover60%ofallpublishedarticlesforthepast15years(Wilkinson,2015)andisatrendthathascontinuedsincethemid-1990s(Li,1997).Thenextmostpopulartopicwassocialintegrationwith25%(63outof251)ofthereviewedstudies.Health,ethnicandculturaldiversityandothersmakeuptheremaining18%ofthearticlesusingbigdata.Figure3presentsthemajorthemesandsub-topicsofimmigrationstudiesthatusebigdata.Asmentionedabove,themajorityofstudiesutilizingbigdataintheirresearchhaveexaminedthefieldofimmigrantlabourmarketoutcomes.Humancapital,labourmarketdiscrimination,Labourhealthandjobs,andincomearethetopicsmostdiscussed.Labourmarketintegrationisthemostpredominantsub-theme,notonlywithinthelabourmarketdomain,butinalloftheresearchweidentified(72.9%).Another26studieswerecategorizedbroadlyas“labourmarket”andincludedrelatedtopicssuchasImmigrantSelf-EmploymentandEntrepreneurship,ImmigrantsandtheDynamicsofHigh-WageJobs,Immigrantearningsandeconomicoutcomes.Withinthebroaderthemeof“socialintegration”,wefindsub-themessuchas“housing”,“immigrantattractionandretention”,“immigrationandeducation”,“integration”,“settlement”and“socialcapital”amongthemostpublishedtopics.A‘catch-all’categorycontainsthepapersthatcouldn’tbecategorizedin
60%20%
18% 2%
Frequencydistribu9onofbigdatastudiesbypublica9ontype
Ar]cle Report Grey Thesis
12
otherways.Itcontainsresearchfocusingonaspectsrelatedtomigrationandchildren,thenumberofimmigrantarrivals,andotherstudiesthatcouldn’tbecategorizedsuchasastudyon“racializedselectionofimmigrantsinCanada”andastudyon“Globalbankingandfinancialservicestoimmigrants”.Withinthehealththeme,physicalhealthwasmostpredominant,withmentalhealthand“theinfluenceoflanguageonvariousaspectsrelatedtohealthasmajorthemesofresearch.Therewere5studieswhichwerebasedon“multiculturalismanddiversity”includedwithinthebroadercategoryofdiversity.Figure3-Indepthexaminationofbigdataimmigrationtopics
Closeto60%oftheimmigrationstudiesutilizingbigdatauseStatisticsCanadasurveysastheirdatasource.InCanada,thefederalgovernmentfundsthebulkofsocialscienceresearch.Theyarealsothemaincollectorsofpublicusemicrodataintheformsofsurveys.Itisnotsurprisingthatgovernmentdatamakesupthemajorityofdatasourcesforimmigrationandbigdataresearch.Almostathird(29%)ofallthedatasourcesinvolvedsomelinkedgovernmentdata,usuallybetweenanadministrativedatabaseandasurvey.Anotherthirdusedlongitudinalsurveys.Figure4alsoshowsthatalmost60%ofthestudiesreviewedutilizeStatisticsCanadadata.Onlyninepercentofthestudiesusedataextractedfromindependentandnon-governmentsurveysandinternetsources.
5 8 2 4 6
26
2 2 3
105
2 518
311 15 20
4 10
MULTICULTURA
LISM
/DIVER
SITY
HEALTH
HEALTH
ANDLANGU
AGE
MEN
TALHE
ALTH
HUMAN
CAP
ITAL
LABO
URMAR
KET
LABO
URMAR
KETAN
DDISCRIMINAT
ION
LABO
URMAR
KETAN
DHE
ALTH
LABO
URMAR
KETAN
DINCO
ME
LABO
URMAR
KETINTEGR
ATION
IMMIGRA
NTCH
ILDR
EN
MIGRA
TIONFLO
WS
OTH
ER
HOUSING
IMMIGRA
NTAT
TRAC
TIONAND
RETENTION
IMMIGRA
TIONAND
EDUCA
TION
INTEGR
ATION
SETTLEMEN
T
SOCIALCAP
ITAL
DIVERSITY HEALTH LABORMARKET OTHERS SOCIALINTEGRATION
Indepthexamina9onofbigdataimmigra9ontopics
13
Figure4–Typesofdatasourcesusedinimmigrationresearch
TheCanadianCensusandtheLongitudinalSurveyofImmigrantstoCanada(LSIC)arethedatasetsmostutilizedbyresearchers.Figure5-NumberofDataSourcesperstudy
Figure5representsthenumberofbigdatasourcesusedinthereviewedstudies.Amongthe251studies,163(65%)usedonedataset,35(15%)usedtwodatasets,14(6%)usedthreedatasets
Sta]s]csCanada-LongitudinalSurvey
33%
Sta]s]csCanada-Census12%
Sta]s]csCanada-Survey11%Censuslinked
withSta]s]csCanada-Survey
2%
CensuslinkedwithSta]s]csCanada-LongitudinalSurvey
2%
LinkedDatafromGovernmentSources14%
LinkedDatafromGovernmentand
IndependentSources8%
CensuslinkedwithGovernment
Administra]veData3%
Other(mostlyindependentandnon-gov.surveys,internet
sources)9%
GovernmentAdministra]veData
6%
65%14%
6%15%
Fig.5NumberOfDataSources
1datasource 2datasources 3datasources >3datasources
14
whileanother39(15%)ofthestudiesusedmorethanthreedatasets.Insummary,bigdataisoftennotlimitedtotheexaminationofasingledatasource,whichmakesanalysismoredifficultbutmoreholistic.Figure6-MostPopularDatasetsused
Figure6showsthetypesofdatabasesusedbythestudiesincludedinourscopingreview.Studiesusingmorethanonedatabasewerecountedmorethanonce.TheCensusdatahasbeenusedin73studiesoutof251(29%)andisbyfar,themostciteddata.ThisisfollowedbytheLongitudinalSurveyofImmigrantstoCanada(LSIC),appearingin24%ofthestudiescapturedinthescopingreview.Thisisinteresting,giventhatthesurveyendeddatacollectionin2004butresearcherscontinuetopublishfromthewealthofthiswork.Therearedatabasesusinginternet-baseddata,independentsurveys,datafromothercountries(forcomparativepurposes)whichwerecategorizedas“others”andwerefoundinfiftythreestudiesReflectingthepreoccupationwithlabourmarketoutcomes,theSurveyofLabourandIncomeDynamics(SLID)(29studies),LabourForceSurvey(LFS)(15studies),WorkplaceandEmployeeSurvey(WES)(14studies),FinancialSecurity(SFS)(2studies),andSurveyofFamilyExpenditures(Famex),YouthinTransitionSurvey(YITS)arealsocitedassourcesofdata.
1 1 1 1 1 1 1 1 1 1 1 1 1 2 3 3 5 5 7 8
14 15 15
24 29
32 53
59 73
0 10 20 30 40 50 60 70 80
SurveyofFamilyExpenditures(Famex)
LongitudinalSurveyofLow-IncomeStudents(L-SLIS)
Na]onalSurveyofWorkandLifelongLearning(WALL)
LabourMarketAc]vitySurvey(LMAS)
Interna]onalAdultLiteracyandSkillsSurvey(IALSS)
PanelStudyofIncomeDynamics(PSID)
Intergenera]onalIncomeDatabase(IID)
LongitudinalAdministra]veDatabank(LAD)
Na]onalPopula]onHealthSurvey(NPHS)
CanadianCommunityHealthSurvey(CCHS)
WorkplaceandEmployeeSurvey(WES)
LabourForceSurvey(LFS)
SurveyofLabourandIncomeDynamics(SLID)
Other(internetdata,independentsurveys,othersmaller
Census
MostPopularDatasetsused
15
5.2.ChallengesandlimitationsofusingbigdatainimmigrationstudiesAlthoughtherearenumerousbenefitstousingbigdata,weknowtherearevariouschallengesandlimitationsandtheresultsofthescopingreviewrevealedmany.Wehavesummarizedthemhereandaddedsomeofourownideas.Experts inbigdata talk about limitations regarding the sample selectionand samplingbiasbuiltintobigdata.Bigdatasets,despitetheirlargenumberofparticipantsandvariables,haveissuesofpopulation underrepresentation, especially in administrative data and often contribute toinaccurateandunsettledrelationshipbetweenselectedvariables.Forexample,peoplewhohavenocontactwiththemedicalsystemmaynotappearinadministrativedatabasesandskewtheresults.This has bigger implications for immigration research. Researchers who engage in immigrationresearch face the issue of having to deal with relatively small sample sizes even within largedatasets. TheNational Longitudinal Study of Children and Youth, for example, chronically undersampledimmigrantandrefugeechildrenforthefirstfewyearsofitsexistence.EvenLSIChasbuilt-inbiases,particularlyinthesmallsamplesizeofadultsaged55andolder(Hochbaum2013).Tobefair,LSICwasintendedtofollownewcomersatarrivalandveryfewimmigrantsarrivetoCanadawhoareaged55andolder.Samplingissuescanalsooccurevenintargeteddatabases.Forinstance,even though theLongitudinal Surveyof Immigrants toCanada (LSIC) contains a large volumeofinformationabouttheimmigrantexperience,itisverydifficulttoexaminenewcomersfromallbutthefivelargestimmigrant-sendingcountries.Evensexdifferencescanbedifficulttoexaminegiventhe small sample size. Pottie and colleagues (2008) describe the recruitment problems withlongitudinal studies, particularlywith immigrants, asmany fear for their future in Canada (theymightbedeported)andwerereluctanttoparticipateinthestudy.Simoneandherresearchteam(2014) complain that LSIC does not include refugee claimants and temporaryworkers so thesegroupsarelargelyignoredinresearch.Administrativedatacanbedifficulttouse,particularlyinimmigrationstudies.Often,thisdatamustbelinkedwiththeImmigrantLandingFile(ILF)—andeventhoughitisacensusofallnewcomersto Canada, itmay not be possible tomake positivematcheswith newdata. Individualswho aredifficulttomatch(whomayhaveincorrectidentifiersenteredintotheadministrativedatabase,forinstance)arenot‘matched’andarethenexcludedfromthedatabase.Thereareknownpatternstothis type of information—with those arriving as children or as teens more likely to be‘unmatchable’ due to identification-based database errors of this sort. Undoubtedly,when ILF isusedtolinkwithaStatisticsCanadaSurvey,researchershaveachancetoexaminedatainwaysthatwerenotpreviouslypossible.Thishasopenedupresearchonquestionsrelatedtorefugees,whohave historically been excluded from analyses or are mixed in with other types of newcomers.Whilethishasbeenimportanttoadvancingourknowledgeofthisgroup,thereareotherproblemswithlinkingILFtosurveysmeantforthegeneralpopulation.Lightmanandhercolleagues(2013)haddifficultyanalyzing theSurveyonLabourand IncomeDynamicsbecause itemspertaining tothe immigrant experience, such as country of last job and country of highest level of education,were not asked in that survey. It is frustrating from a Canadian standpoint because nearly one-quarter of the Canadian population was not born here and the failure of general surveys toroutinelycollecttheseitemsembedsevenmorebiasintoresults.Itisafrustrationechoedbyotherresearchersinvestigatingotherdatabases(Marzia2015;Papademetriouetal.,2009).
16
Another challenge is that some comprehensive big data sets, such as the ones obtained fromStatisticsCanada,havebeenreportedbyusersasbeingnearlyimpossibletovalidatethemeasuresandindicatorswithinthedataduetomissingdata,notknowinghowtomanipulatetheinformation,inconsistencyinhowthesurveysorcensuseswerecollected,difficultywithhowtooperationalizeand measure selected variables (Nikolaos 2002). The impact it has on immigration studies inCanadaisthat, it influencesthekindofstudiesresearcherscandoandthekindofquestionstheycanask.Storage,management, and sharingofbigdata remaina challenge for all usersofbigdata.Thesethree issues are connected. In Canada, as elsewhere, many of these large data sets, bothadministration and survey-based, are located in secure facilities. The protection of data is a bigissue in 21st century research and the possibility that the anonymity of participants could bejeopardized if the data were to fall into the wrong hands is real (Franke et al, 2015).With theincreasedsecurityofdata, comesrestriction toaccess,meaning fewerand fewerresearcherscanaccessthedata.Rigoroussecurityscreeningisoftenrequiredbeforearesearchercanaccessdata.They are also constrained by the office hours of the organization ofwhose data they are using.Studentshaveaparticularlydifficulttimeinaccessingdata,althoughtheResearchDataCentresofStatistics Canada have been a golden opportunity for them to access data that only theirsupervisors could examine in previous decades. One of the unintended consequences of secureaccessisthatresearchersoutsidethegovernmentanduniversityfinditnexttoimpossibletoaccessdata. Without a university-affiliation, most of this data is out of the reach of non-governmentsettlementorganizations,groupsthatcancriticallyanalyzethisinformationforpracticalpurposes.Itsetsupatwo-tieredsystemofaccesswhereresearchersareincreasinglythegatekeepersofdatathatcommunityorganizationscoulduse.HeatherandJames(2013)pointtoarelatedissuewhichtheycall“immaturetechnology”.Byimmaturetechnology,theylamentthepaceatwhichbigdatabecomes available to the public as compared to the technology needed to manage and analyzemega-datasets. Alongside immature technology, we would like to add that the number ofresearcherswiththeadvancedskillsetsinbothdatamanagementandstatisticsareinshortsupplyso the reality is that even ifmorebigdatawere tobecomeavailable, therewill be a shortageofqualifiedpeopletoanalyzeandinterpretthenumbers.AccordingtoIBM,therewillbeashortageofover4.6milliondataandanalyticalworkersinthecomingdecade.Thereareethicalissueswiththeuseoflinkeddataandwithadministrativedata.Often,thisdataiscollected from individualswho are in need of a service such asmedical care. No one tells themwhentheyvisittheirphysician(orfiletheirincometaxorsendtheirchildtoschool)thatthedatacollected could be used for research purposes in the future. All universities and federally andprovinciallyfundedresearchrequiresanethicsreviewpriortothebeginningofdatacollectionandmanyof thedata setsused in studies for this scoping reviewdidnot seekethical approval fromparticipants.Onenotorious example is the IMDBwhich links the ILFwith the tax record file.AllimmigrantswhohavearrivedtoCanadaonorafter1980arepartofthisdataset.Datafromadministrativesources,collectedfromsocialmedia,internetandonlinecommunicationsourcesaresubjecttobias,eveniftheyarepresentedasacensus.Credibilityofresultsisamajorissue.Zagheniandhiscolleagues(2014)pointoutthatinformationcollectedfromsocialmediacanbe produced by anyone (or anything such as a cyberbot) without going through the rigorousprocess of data verification, validation and analysis. There is no valid way for researchers toidentifydatageneratedbycomputerversushumansothissortofdataisalwayssuspected.
17
5.3.OpportunitiesforimmigrationresearchwithbigdataDespite these problems, there is real potential for big data in advancing future immigrationresearch. Inthisperiodofdisplacementcrisis,“accurateandtimelystatisticsareneededtoassistmigrants effectively” (Marzia, 2015, para. 3). Migration data can be very helpful for settlementorganizations to usewhen developing new programs or expanding on existing ones. One of thebiggestchallengesthatcommunityorganizationsfacedduringthearrivalofSyrianrefugeesoverashortperiodoftimewasnotknowinghowbigtheirfamilieswere.Thisiswherethetimelyreleaseofbigdata,muchlikethehandymapsofSyrianarrivalstoCanadareleasedbyIRCC,canbeusedtoa greater degree. Even basic information such as average family size can help communityorganizationslocateappropriatehousinginamoretimelyfashion.Bigdataonmigrantshelpsnotonlyincreating“sensiblemigrationpolicies”butalsoensuresadequatedistributionofresources.Ifthesharingofthesedatacanbemoredemocratic,witheasieraccesstothisvitaldata,particularlytocommunityorganizations,thenbigdatacanbeharnessedtohelpdesignandimplementbetterpoliciesandprogramsfornewcomersinamoretimelyfashion.Good research is informed by both policy and practice. Researchers should be encouraged topartnerwithinstitutionsandorganizationstoexaminequestionsrelevanttothepracticalneedsofthe settlement sector. More of this research needs to be done and if these partnerships can besustained,moredatawillbeusedandmorearticlesandreportswillberead.Frankeetal.(2015)remindusthatresearcherscanbenefitfromtheinsightofcommunitysettlementserviceworkers,particularly in interpreting the results of data analysis. Since big data is complex, continuouslychangingandgrowing,itrequiresdiverseknowledgeondatamanipulation.TheSyriancrisishasshown,withgreatclarity,theubiquitousnatureofcellphoneuseandinternetaccess.Onlyafewstudieshaveusedthisdatatoexamineimmigration.Althoughthereareinherentriskswithusing thisdata, greatknowledgecanalsobegainedwhenweexamine it.This reflectshow“unprecedentedlylargeandcomplexamountofdataisbeinggeneratedinrealtimeeverytimeacalloranonlinepaymentismade,oreverytimepeopleinteractonsocialmedia,whichisusuallyreferredtoasbigdata”(Marzia,2015,para.9).Activitiesusingsocialmedia,webloginhistory,IPaddresses, emails and text messages can be geolocated and turned into migration data usingstatisticalmodeling.“Massivedatasetsandsocialnetworkingsitesprovideopportunitiestodesignexperiments on a scale that was previously impossible in the social sciences”, (Grimmer, 2015,p.81).Whenusedappropriately,thisdatacanshedlightonimportantaspectsofimmigration.In order for immigration research to benefit from the current and future opportunities with avarietyofnewbigdatasources,thereisaneedtotrainmorepeopleindataanalysisandstatisticalmodeling. The complex work done by researchers such as Jedwab and Soroka (2014) is oneexampleofhowcomplex theseanalyses canbe. Ifwe invest in additional trainingof students tohandle largedatasets,more researchcanusebigdata.Renewed trainingmayactuallyencouragemorepeopletoentertheimmigrationfieldasmoredataanalysisopportunitiesbecomeavailable.Therearesomeexcitingdevelopmentsconcerninglinkedadministrativeandsurveydatarelatingtoimmigrationstudies inCanada. In2016,IRCC,alongwithStatisticsCanada,announcedsomenewexciting data linkages to be released in the coming 18 months. The ILF will be linked to the
18
following datasets: the Canadian Employer and Employee Dynamic Database (includinginternational students and temporary workers), the 2011 National Household Survey, the 2016CensusofCanada,theCanadianCommunityHealthSurveyandthe2013GeneralSocialSurveyonSocialIdentity. IMDBwillcontinuetobereleased.ThenewSettlementOutcomesSurveywillalsobe linkedtoadministrativedatabases.HealthrecordsfromOntario,BCandManitobawillalsobelinkedtotheILF.SomeresearchersmaybeabletoaccessiCARE,theadministrativedatabasethathousesinformationonallsettlementservicesusedbynewcomersacrossCanada(IRCC,2016).Inshort, there will be great opportunities in the very near future for researchers and communityorganizationstolearnmoreaboutnewcomersonavarietyofdifferenttopics.
6.KnowledgeMobilizationThepotentialknowledgeusersofsynthesisresultsinclude,butarenotlimitedto,researchersinvariousacademicinstitutions,governmentofficials,policyanalysts,andrepresentativesoftheintegrationandsettlementsectors,aswellasgraduatestudents.Theresearchfindingsfromthisreportwillbedisseminatedwidelytoensureanoptimalresearchuptakefromallresearchknowledgegroups.Thisprojectutilizedthreetypesofknowledgemobilizationstrategies:integrated,end-of-grantandpost-grant.Theknowledgemobilizationstrategiesweredevelopedbytheresearchteamandwerecommunicatedwiththeadvisorypanelmembers.
6.1.Integratedknowledgemobilization.TheAdvisoryCommitteewascomprisedofthreeindividuals;onerepresentativeoftheManitobaprovincialgovernmentandtworepresentativesofnon-governmentalorganizations(NGOs)inBritishColumbiaandAlberta.ThepurposeoftheAdvisoryCommitteeistoassisttheresearchteamintheinterpretationandthedisseminationoftheresearchfindingsandtoprovideinputandfeedbackonthereporting.Thethreeindividualsrepresentgroupsofpotentialknowledgeusersandparticipatedinonemeetingtoprovidefeedbackontheresearchapproachandtheirinputonhowthisresearchcanbenefittheirorganizations(May).Aninformalconsultationtookplaceviae-mailsinAugustwherethepreliminaryresearchoutcomesweresharedwiththeAdvisoryCommitteeinvitingthemembers’feedback.Theadvisorypanelwillassisttheteamindisseminatingtheresultsoftheresearch.
6.2.End-of-Grantandpost-grantknowledgemobilization.Theresearchfindingswillbedisseminatedinvariouswaystoensureanoptimalresearchuptakefromallresearchknowledgegroups.Thefollowingend-of-grantandpost-grantknowledgemobilizationactivitiesareplanned:• ConductameetingwiththeAdvisoryCommittee(November)todiscusstheresearch
findingsandthemostefficientandpracticalwaystopackageanddisseminatetheresearchfindingstonon-academicaudiencesandfront-endbigdatausers.
• Presenttheresearchfindingsat“TheCanadianInstituteforIdentitiesandMigration(CIIM)&ImmigrationResearchWest(IRW)RegionalSymposium:MigrationandRefugeinWesternCanada”onOctober21st,2016.
• PresenttheresearchfindingsatSocialSciencesandHumanitiesResearchCouncil’sFallForuminOttawaonNovember22,2016
• Submitaproposalforapresentationatthe19thNationalMetropolisConference.• SubmitapublicationtotheCanadianDiversity,AssociationforCanadianStudies.
19
• Produceaweb-basedresourceofthetabulateddatabaseusedduringthestudyselectionandchartingthedataphasesoftheresearchprocess.AllmaterialsproducedfromthisstudywillbehostedontheRuralDevelopmentInstitute,BrandonUniversityandImmigrationresearchWest,UniversityofManitoba’websites.LinkstotheabovewebsiteswillbedistributedthroughResearchPartnersNetwork(e.g.RuralPolicyLearningCommons)andImmigrationResearchWestNetwork.
7.ReferencesBenton,M.(2014).Smartinclusivecities:Hownewapps,bigdata,andcollaborativetechnologiesaretransformingimmigrantintegration.Washington,DC:TheMigrationPolicyInstitute.Papademetriou,D.G.,Somerville,W.,&Sumption,M.(2009).Thesocialmobilityofimmigrantsandtheirchildren.MigrationPolicyInstitute,Washington.Franke,B.,Plante,J.F.,Roscher,R.,Lee,A.,Smyth,C.,Hatefi,A.,...&Hoffman,M.M.(2015).StatisticalInference,LearningandModelsinBigData.arXivpreprintarXiv:1509.02900.Garcea,J.,JasonD.,WinstonZ.,&Wilkinson,L.(2016).Geo-spatialDataofImmigrationtoCanada’sWesternandNorthernRegion.Winnipeg;ImmigrationResearchWest.Accessedonlineathttp://gistest.usask.ca/irw/Grimmer,J.(2015).Weareallsocialscientistsnow:howbigdata,machinelearning,andcausalinferenceworktogether.PS:PoliticalScience&Politics,48(01),80-83.Smith,H.,A.,&McKeenJ,D.(2012).BigdataandDataAnalytics.CIOBrief,18(3),1-6.Hilbert,M.,&López,P.(2011).Theworld’stechnologicalcapacitytostore,communicate,andcomputeinformation.science,332(6025),60-65.Hochbaum,C.V.(2013).TooOldtoWork?:TheInfluenceofRetrainingonEmploymentStatusforOlderImmigrantstoCanada.CanadianEthnicStudies,44(3),97-120.Huebner,R.A.(2013).ASurveyofEducationalData-MiningResearch.Researchinhighereducationjournal,19.HumanitarianTracker(2016).SyriaTracker.Accessedonlineathttp://www.humanitariantracker.org/syria-trackerICT(2016)WorldTelecommunication/ICTIndicatorsDatabase.12July2016.Accessedonlineathttp://www.itu.int/en/ITU-D/Statistics/Pages/stat/default.aspxInternationalOrganizationforMigration(2016).MixedMigrationFlowsintheMediterraneanandBeyond.Geneva:IOM.Accessedonlineathttp://migration.iom.int/docs/WEEKLY%20Flows%20Compilation%20No26%206%20October%202016.pdf
20
InternationalLiveStatistics(2016)InternetUsersbyCountry.Accessedonlineathttp://www.internetlivestats.com/internet-users/ImmigrationRefugeesandCitizenshipCanada(2016).Mapofdestinationcommunitiesandsettlementserviceproviderorganizations,updated21September2016.Ottawa:IRCC.Accessedonlineathttp://www.cic.gc.ca/english/refugees/welcome/map.aspImmigrationRefugeesandCitizenshipCanada(2016).ResearchingSyrianRefugees:DataAvailabilityandAccess.Ottawa:IRCC.Jedwab,J.&Soroka,S.(2014).“IndexingIntegration:AReviewOfNationalAndInternationalModels”.AreportpreparedfortheDepartmentofCitizenshipandImmigrationCanadabytheAssociationforCanadianStudies(CanadianInstituteforIdentitiesandMigration),30.Khandor,E.,&Koch,A.(2011).Theglobalcity:newcomerhealthinToronto.TorontoPublicHealth.Laczko,F.&Rango,M.(2014).Canbigdatahelpusachievea‘migrationdatarevolution’?MigrationPolicyPractice6(2),20-29.Levac,D.,Colquhoun,H.,&O'Brien,K.K.(2010).Scopingstudies:advancingthemethodology.ImplementationScience,5(1),1.Li,P.S.(1997)AReviewoftheAcademicLiteratureonImmigrationStudiesinCanada.Ottawa:CitizenshipandImmigrationCanada.López.H,etal.2014.BigdatainActionforDevelopment.TheWorldBankgroup.Accessedonlineathttp://live.worldbank.org/sites/default/files/Big%20Data%20for%20Development%20Report_final%20version.pdfManyika,J.,Chui,M.,Brown,B.,Bughin,J.,Dobbs,R.,Roxburgh,C.,&Byers,A.H.(2011).Bigdata:Thenextfrontierforinnovation,competition,andproductivity.Accessedonlineathttp://www.mckinsey.com/business-functions/digital-mckinsey/our-insights/big-data-the-next-frontier-for-innovationMcNulty,E.(2014).“Understandingbigdata:the7vs”Dataconomy.Accessedonlineathttp://dataconomy.com/seven-vs-big-data/Mendez,P.,Wyly,E.,&Hiebert,D.(2011).Landingathome:InsightsonimmigrationandmetropolitanhousingmarketsfromthelongitudinalsurveyofimmigrantstoCanada.Liodakis,N.,&Satzewich,Victor.(2002).TheVerticalMosaicWithin:Class,GenderandNativitywithinEthnicity,ProQuestDissertationsandTheses.OECD(2013).ExploringData-drivenInnovationasaNewSourceofGrowth:MappingthePolicyIssuesRaisedby‘bigdata’.Accessedonlineathttp://www.oecd-ilibrary.org/science-and-technology/exploring-data-driven-innovation-as-a-new-source-of-growth_5k47zw3fcp43-en
21
Pandey,M.,&Townsend,J.(2011).Provincialnomineeprograms:Anevaluationoftheearningsandretentionratesofnominees.PrairieMetropolisCentre.Pottie,K.,Ng,E.,Spitzer,D.,Mohammed,A.,&Glazier,R.(2008).Languageproficiency,genderandself-reportedhealth:ananalysisofthefirsttwowavesofthelongitudinalsurveyofimmigrantstoCanada.CanadianJournalofPublicHealth/RevueCanadiennedeSante'ePublique,505-510.Rango,M.2015.“HowbigdataCanHelpMigrants”.WorldEconomicForum.Extractedfromonlinesourcehttps://www.weforum.org/agenda/2015/10/how-big-data-can-help-migrants/Rathnam,L.(2015)“CanbigdatahelptheSyrianconflict?”icrunchdata.comOctober8.Accessedonlineathttps://icurnchdata.com/big-data-help-syrian-conflict/Ratti,C.&Helbing,D.(2016).“Thehiddendangerofbigdata”TheStraitsTimes,September12,Accessedonlineatwww.straitstimes.com/opinion/the-hidden-danger-of-big-data.comSimone,D.,&Newbold,K.B.(2014).Housingtrajectoriesacrosstheurbanhierarchy:analysisofthelongitudinalsurveyofimmigrantstoCanada,2001–2005.HousingStudies,29(8),1096-1116.Sweetman,A.,&Warman,C.(2012).ThestructureofCanada’simmigrationsystemandCanadianlabourmarketoutcomes.Queen’sEconomicsDepartmentWorkingPaperNo,1292.UNGlobalPulse.2014.EstimatingMigrationFlowsUsingOnlineSearchData,GlobalPulseProjectSeriesno.4.VanRijmenam,M.(2014).Whythe3Vsarenotsufficienttodescribebigdata.Datafloq.Accessedonlineathttp://dataconomy.com/seven-vs-big-data/Wilkinson,L.(2015)“Immigrationresearchandpolicydevelopments:whatdoweknowandwhereareweheaded?”plenarypanelpaperpresentedattheCanadianSociologicalAssociationAnnualConference,Ottawa,UniversityofOttawaZagheni,E.,Garimella,V.R.K.,&Weber,I.(2014).Inferringinternationalandinternalmigrationpatternsfromtwitterdata.InProceedingsofthe23rdInternationalConferenceonWorldWideWeb(pp.439-444).ACM.