Provided by the author(s) and University College Dublin Library in accordance with publisher policies. Please cite the published version when available.

Title Proceedings of the 3rd European Data and Computational Journalism Conference

Publication date 2019-07-01

Conference details The 3rd European Data and Computational Journalism Conference, Malaga, Spain, 1 - 2 July 2019

Publisher University College Dublin

Link to online version https://www.datajconf.com/2019/

Item record/more information http://hdl.handle.net/10197/11426

Downloaded 2020-10-02T15:28:01Z

The UCD community has made this article openly available. Please share how this access benefits you. Your story matters! (@ucd_oa)

Some rights reserved. For more information, please see the item record link above.

Editors: Bahareh Heravi, Martin Chorley, Glyn Mottershead

Copyright: The Authors of the papers in the collection.

ISBN: 978-1910963388

Welcome to the 3rd European Data and Computational Journalism Conference!

The 3rd European Data and Computational Journalism Conference aims to bring together industry, practitioners and academics in the fields of journalism and news production and information, data, social and computer sciences, facilitating a multidisciplinary discussion on these topics in order to advance research and practice in the broad area of Data and Computational Journalism.

Held in Malaga, Spain, the conference presented a mix of academic talks and keynotes from industry leaders. It was followed by a day of workshops and tutorials.

Submissions of both academic research-focused and industry-focused talks, on the subjects of journalism, data journalism, and information, data, social and computer sciences, were invited for the conference. Topics of interest include, but are not limited to:

• Application of data and computational journalism within newsrooms
• Data driven investigations
• Data storytelling
• Open data for journalism, storytelling, transparency and accountability
• Algorithms, transparency and accountability
• Automated, robot and chatbot journalism
• Newsroom software and tools
• ‘Post-fact’ journalism and the impact of data
• User experience and interactivity
• Data and Computational Journalism education
• Post-desktop news provision/interaction
• Data mining news sources
• Visualisation and presentation
• Newsgames and gamification of News
• Bias, ethics, transparency and truth in Data Journalism
• Newsroom challenges with respect to data journalism, best practices, success and failure stories

Collected within these proceedings are the academic abstracts presented at the conference. We would like to take this opportunity to thank the programme committee for their hard work reviewing submissions and helping us to come up with the fantastic line-up of talks for this year. And an enormous thank you to the organising committee at the University of Malaga for being such excellent hosts.

Welcome to Malaga, and welcome to DataJConf 2019!

Bahareh R Heravi, Martin J Chorley & Glyn Mottershead
DataJConf 2019 co-chairs

Title

Invited talk
Daniele Grasso, El Pais

Without the human element your data stories are just spreadsheets
Mohammed Haddad - Al Jazeera

How do you cover uncertain elections?
Josh Rayman & Alice Grenié - BBC World Service

Detecting newsworthy events in a journalistic platform
Tareq Al-Moslmi, Marc Gallofré Ocaña, Andreas L Opdahl and Bjørnar Tessem - University of Bergen

Fake News Detection Based on Named Entity Recognition and Machine Learning
Francisco Lopez Valverde, Rafaela Benitez Rochel and Maria Guerrero Aguilar - University of Malaga

RODA: a tool for semi-automatic data-driven visual stories
Xaquín Veira-González, Anton Bardera, Apple Chan-Fardel and María Luisa Otero López - University of Girona, University of Santiago de Compostela

Becoming a Data Journalist: the role of identity in data journalism education
Lizabeth Hannaford - Manchester Metropolitan University

Predictive sentiment analysis of messages for Journalistic Purposes: Real-time classification of tweets based on Machine Learning
Félix Ortega, Carlos Arcila and Antonio García - University of Salamanca, University Rey Juan Carlos

Building a Stats Bot
Sophie Warnes, Jure Stabuc and Henry Lau - Office for National Statistics

Style, Singularity, and Substance: What Picture Editors Want from A.I.
Martin Schön and Neil Thurman - LMU Munich

Can data journalism really stimulate local news? A case study with media in the countryside of Portugal
Ricardo Morais and Pedro Jerónimo - University of Beira Interior / Labcom.IFP

Invited Talk
Meredith Broussard, New York University

For day 2 panels and workshop please visit the conference website on datajconf.com.

Detecting Newsworthy Events in a Journalistic Platform

Tareq Al-Moslmi, Marc Gallofré, Andreas L Opdahl, Bjørnar Tessem
Dept. Information Science and Media Studies, Univ of Bergen, N-5020 Bergen, Norway

{Tareq.Al-Moslmi,Marc.Gallofre,Andreas.Opdahl,Bjornar.Tessem}@uib.no

Abstract: Social and other open and proprietary data sources are rapidly changing the nature of news and of journalistic work. In the News Angler project, we want to harness such big-data sources for journalistic purposes. We propose a platform, News Hunter, that is able to suggest appropriate news angles on unfolding events to journalists. Precisely assessing the newsworthiness of these events is important to avoid alert fatigue. News angles are seen as patterns that can be matched by news events represented in the knowledge graph. Work on the platform so far suggests that newsworthiness can be estimated as an interplay of at least three factors: reliability – that the event is corroborated by multiple independent and/or trusted sources; match – that the event fits a news angle that is aligned with the intended audience and newsroom profile; and novelty – that the event has not been reported widely from this angle already.

Keywords: Journalistic platforms, newsroom systems, knowledge graphs, big data, news angles, news values, newsworthiness.

Introduction

The News Angler project aims to harness social and other open and proprietary data sources for journalistic purposes. Specifically, we want to leverage the concept of news angles to help journalists effectively identify news events and narrate news stories that may interest their audience. Examples of angles are conflict, local person, and fall from grace. Some angles are more detailed versions of others, such as David-versus-Goliath, a subtype of the conflict angle.

In collaboration with a developer of newsroom systems for the international market, we are developing a platform, News Hunter, that is able to harvest potentially news-relevant text items from the web, analyse them semantically, ingest them into a knowledge graph, aggregate items in the graph into potentially newsworthy events, and suggest suitable news angles on unfolding events to journalists (Berven et al. 2018, Gallofré et al. 2018). News angles are patterns that can match, and make interesting, events in this knowledge graph. With ever-increasing information, precisely assessing the newsworthiness of unfolding events is essential to prevent journalistic alert fatigue.

There is already a broad variety of news-relevant information platforms available (Diakopoulos 2016). They range from general news services such as Google and Yahoo News, through general information platforms such as EMM, OCCRP and WebLyzard, to news-specific ones such as Bloomberg’s knowledge graph (Voskarides 2018), Event Registry (Leban 2014) and Reuters Tracer (Liu 2017). Many of them already use knowledge graphs and related semantic technologies, but we are not aware of existing approaches that aim to support news angles and use them to assess newsworthiness.

Our work on the News Hunter platform (Berven et al. 2018, Gallofré et al. 2018) suggests that newsworthiness can be estimated as an interplay of at least three factors: reliability – that the event is corroborated by multiple independent and/or trusted sources; match – that the event fits a news angle that is aligned with the intended audience and newsroom profile; and novelty – that the event has not been reported widely from this angle already.

Methods

We aim to understand news platforms, news angles, and newsworthiness through design research (Hevner 2007), developing a series of prototypes based on state-of-the-art big data and knowledge graph technologies. Practical relevance is ensured by our industrial partner, who shares their understanding of industrial and journalistic needs and wishes. Theoretical relevance is ensured by focussing on open research issues such as how news platforms can support news angles and how knowledge-graph architectures can scale to big-data settings.

Findings and Argument

Our work on the News Hunter platform so far suggests that newsworthiness can be estimated as an interplay of at least three factors, which we now discuss in more depth to establish requirements for a news platform that supports angles.

Figure 1 - Overview of the News Hunter platform (from Berven et al. 2018).

Reliability: Most importantly, in order to be newsworthy, an event must be reliable. If the source is highly trusted, the event may be newsworthy even if it is reported only by a single item. But in most cases, the event must be corroborated by items originating from multiple sources. Reliability of the event is influenced both by the reliability of (or trust in) those sources and by the independence of the items. For example, two tweets may be based on the same underlying source, or one may simply be a retweet of the other. Finally, event reliability is also influenced by how reliable the lifting of textual items into semantic item graphs and the aggregation of those item graphs into event graphs were. To support corroboration, the news platform must therefore be able to trace events back to their originating items and those items’ sources. Trust in sources must be estimated and maintained, as well as trust in the lifting and aggregation steps on the way from textual items to event graphs. To the extent possible, the external sources of items should also be identified – at least, items that are based directly on one another or on a common precursor need to be identified.

Match: In order to be newsworthy, the event must match a news angle, which is a pattern for moulding an event, if possible, into a fabula, which is a sub-graph of facts about the event that can be narrated to become a story. It is important that angles fit not only the event, but also the news organisation’s intended audience and profile. To support angle matching, the news platform must therefore maintain a library of angles, whether created manually or learned automatically. It must be aware of which events and angles fit the audience and newsroom profile, and it must be able to match angles with events to mould fabulas. Also, to match a news angle, the event graph must be sufficiently detailed. This is another reason for aggregating news items into event graphs, which will presumably be not only more reliable, but also more detailed and complete than the individual item graphs. The news platform should invite the journalist to collect further facts when needed to complete a promising angle.
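To make the idea of matching an angle against an event graph more concrete, the following is a minimal sketch and not the News Hunter implementation. It assumes, purely for illustration, that an event graph is a set of (subject, predicate, object) triples and that an angle is a list of triple patterns in which terms starting with `?` are variables. The predicate names, the `match_angle` function and the example event are all hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def _unify(pattern: Triple, fact: Triple, binding: Binding) -> Optional[Binding]:
    """Try to extend `binding` so that `pattern` becomes equal to `fact`."""
    new_binding = dict(binding)
    for p_term, f_term in zip(pattern, fact):
        if p_term.startswith("?"):            # variable term
            if new_binding.get(p_term, f_term) != f_term:
                return None                   # conflicts with an earlier binding
            new_binding[p_term] = f_term
        elif p_term != f_term:                # constant terms must match exactly
            return None
    return new_binding


def match_angle(angle: List[Triple], event: Set[Triple]) -> List[Binding]:
    """Return every variable binding under which all angle patterns hold."""
    bindings: List[Binding] = [{}]
    for pattern in angle:
        bindings = [
            extended
            for binding in bindings
            for fact in sorted(event)
            if (extended := _unify(pattern, fact, binding)) is not None
        ]
    return bindings


# Hypothetical "David versus Goliath" angle: a small actor opposes a large one.
DAVID_VS_GOLIATH = [
    ("?small", "inConflictWith", "?large"),
    ("?small", "hasSize", "small"),
    ("?large", "hasSize", "large"),
]

# Invented event graph, used only to exercise the matcher.
event_graph = {
    ("LocalBakery", "inConflictWith", "SupermarketChain"),
    ("LocalBakery", "hasSize", "small"),
    ("SupermarketChain", "hasSize", "large"),
    ("LocalBakery", "locatedIn", "Bergen"),
}

fabula_bindings = match_angle(DAVID_VS_GOLIATH, event_graph)
print(fabula_bindings)
```

Each returned binding identifies the facts that could form a fabula for that angle. An empty result would correspond to the case described above, where the event graph is not yet detailed enough and the journalist might be invited to collect further facts.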

Novelty: Finally, in order to be newsworthy, an angled event should be original. Other news media that target the same audience should not already have covered the event from the same angle. To support novelty, the news platform must therefore harvest, in real time, news items published by competing media organisations that target a similar audience. It must be able to trace from a news item to the event it describes, and it must be able to detect the angle from which an event is narrated.
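The abstract describes newsworthiness as an interplay of reliability, match and novelty but gives no formula. The sketch below is one possible reading, under stated assumptions: every factor is a score in [0, 1]; reliability is a noisy-OR over independent origins, so that a retweet adds nothing beyond the item it repeats; novelty is the share of competing outlets that have not yet used the angle; and the three scores are multiplied so that a zero in any factor vetoes the event. The `Item` fields, the trust values and the combination rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Item:
    source: str   # who published the item
    origin: str   # underlying precursor; a retweet shares the original's origin
    trust: float  # assumed trust in the source, in [0, 1]


def reliability(items: List[Item]) -> float:
    """Noisy-OR over independent origins; dependent items count only once."""
    best_per_origin: Dict[str, float] = {}
    for item in items:
        previous = best_per_origin.get(item.origin, 0.0)
        best_per_origin[item.origin] = max(previous, item.trust)
    doubt = 1.0
    for trust in best_per_origin.values():
        doubt *= 1.0 - trust
    return 1.0 - doubt


def novelty(competitors_with_same_angle: int, competitors_total: int) -> float:
    """Share of competing outlets that have not covered the event from this angle."""
    if competitors_total == 0:
        return 1.0
    return 1.0 - competitors_with_same_angle / competitors_total


def newsworthiness(rel: float, match: float, nov: float) -> float:
    """Assumed combination rule: a plain product of the three factors."""
    return rel * match * nov


# Invented example: a tweet, its retweet, and one independent agency report.
items = [
    Item(source="@witness", origin="tweet-1", trust=0.5),
    Item(source="@follower", origin="tweet-1", trust=0.5),  # retweet, not independent
    Item(source="news-agency", origin="wire-7", trust=0.8),
]
score = newsworthiness(reliability(items), match=0.8, nov=novelty(1, 4))
print(round(score, 2))
```

Multiplication is only one design choice; a weighted sum or a learned model would express a softer trade-off between the factors.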

Conclusions

We have presented the objectives of the News Angler project and how we plan to identify newsworthy events by assessing their reliability, match, and originality. In future work we will validate our approach by continuing to extend the News Hunter platform to support newsworthiness and news angles.

References

Berven, Arne, et al. (2018) “News Hunter: Building and mining knowledge graphs for newsroom systems.” Proc. NOKOBIT 26, Svalbard.
Diakopoulos, Nicholas (2016) "Computational journalism and the emergence of news platforms." Routledge Comp. Dig. Journalism Studies.
Gallofré Ocaña, Marc, et al. (2018) "Towards a Big Data Platform for News Angles." Proc. NOBIDS’18, Trondheim.
Hevner, Alan R. (2007) "A three cycle view of design science research." SJIS 19.2:4.
Leban, Gregor, et al. (2014) "Event registry: learning about world events from news." Proc. 23rd Int. Conf. WWW.
Liu, Xiaomo, et al. (2017) "Reuters tracer: Toward automated news production using large scale social media data." IEEE Int. Conf. Big Data.
Voskarides, Nikos, et al. (2018) "Weakly-supervised Contextualization of Knowledge Graph Facts." Proc. 41st ACM Int. SIGIR Conf.

Fake News Detection Based on Named Entity Recognition and Machine Learning

Francisco L. Valverde, Dept. of Computer Science, University of Malaga, valverde@uma.es

Rafaela Benítez Rochel, Dept. of Computer Science, University of Malaga, rbenitezr@uma.es

María Guerrero Aguilar, Journalist of Press Office, University of Malaga, mariaguerrero@uma.es

Abstract: False news has become a problem of the first magnitude for national governments and the media. Given the large volume of information that must be analyzed, it seems that the solution should be an automatic method that manages to detect false news. However, today we still do not have the technology for an automatic and efficient solution. For this reason, the only solutions that are working are based on manual operation. Our proposal consists of a decision support system for these organizations to facilitate their work. Using a machine learning system based on Named Entity Recognition and the identities of the authors, it is possible to make a prior classification of authenticity. Thanks to this scheme, the amount of information that needs to be analyzed manually is greatly reduced.

Keywords: Fake News Detection, Named Entity Recognition, Machine Learning, Support Vector Machines, Identity authenticity.

Introduction

In this paper we present a proposal for a semi-automatic scheme for the detection of false news. Over recent years, the extensive growth in the number and types of fake news has led to the necessity of building an effective detection system for fake news identification, capable of handling the volume, variety and velocity associated with it. Augey and Alcaraz (2019), in a recent investigation, conclude that most false news is motivated by financial objectives. We are, therefore, faced with a new challenge that, as McNair (2017) points out, does not respond to an isolated cultural problem, but is the result of the social trends of the 21st century. Globalization, the rise of relativism, the crisis of objectivity, the consumption of digital news and the decline of trust in journalism are some of the factors that this author identifies as drivers of the rise of false news. In relation to the online fake news audience, the latest works indicate that it is a disloyal, high-availability subset of the total news audience on the Internet (Nelson & Taneja, 2018).

The World Economic Forum (WEF) has been warning for years about the global danger of massive digital disinformation, as a technological and geopolitical risk. Likewise, the Eurobarometer 464 on 'False news and disinformation online', conducted in 2018, detected a high degree of belief in Spain of being exposed to fake news. Its direct consequences in politics are also the subject of many other studies. In this line, an article published by the Pew Research Center (2016) claimed that most Americans (64 percent) suspected that false news generated confusion and had a potential impact on both political life and individual citizens. Vargo, Guo and Amazeen (2017) warn of the relationship of this news with the digital partisan media, which they identify as highly sensitive; however, they downplay its impact on the new media, although these also respond to the false news agendas.

In this work we propose a decision support system that facilitates the work of agencies that specialize in the detection of false news. Using a machine learning model based on Named Entity Recognition (NER) and identity authenticity, it is possible to make a massive preliminary classification with an accuracy of more than 80 percent.

Methods

Currently, there are fact-checking organizations such as Snopes, Politifact, TruthOrFiction, Factcheck, OpenSources, FakeNewsWatch, fakespot, reviewmeta, Opensecrets.org, etc., which operate on the basis of the traditional journalistic model. In these organizations the reporters have to evaluate facts in order to establish the veracity of a statement. This approach is not automated, is often time-consuming, and struggles to keep pace with the quantity of fake news published daily. This problem has led researchers and technical developers to look at several automated ways of assessing the truth value of potentially deceptive text based on the properties of the content and the patterns of computer-mediated communication.

Machine learning: Supervised machine learning algorithms like Decision Tree, Random Forest, Support Vector Machine (SVM), Logistic Regression and K-nearest Neighbour are extensively used in previous literature for online hoaxes, frauds, and deceptive information classification (Afroz et al., 2012); deep-learning-based methods are good solutions for online fake news representation and detection, and have been introduced in Ruchansky et al. (2017). Unsupervised learning models for fake news detection include cluster analysis, outlier analysis and semantic similarity analysis (Li, McLean, Bandar, O’shea, & Crockett, 2006), and unsupervised news embedding techniques include Word2vec, FastText (Bojanowski, Grave, Joulin, & Mikolov, 2017), Sent2vec (Pagliardini, Gupta, & Jaggi, 2017), and Doc2vec (Le & Mikolov, 2014).

Our method is based on Named Entity Recognition applied to the news and on the identity of the authors. Using only this information in a Support Vector Machine, it is possible to classify a story as probably authentic or probably false. Our focus is on the simplicity of the analysis, which makes it more appropriate when analyzing large amounts of news. Only those news items that have been classified as false are analyzed manually by the organizations to confirm this classification.

Findings and Argument

On a simulated dataset of 100 false and 100 truthful news items, the SVM system was able to correctly classify 81.5% of the news. As Figure 1 shows, the classification is quite accurate.

Figure 1 – News classification by SVM based on NER. 1 indicates true and -1 false
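The classification step described in the Methods can be illustrated with a minimal sketch. The abstract does not state which features or kernel were used, so everything here is an assumption: two hypothetical features per story (an entity score derived from NER and an author-identity score, both scaled to [-1, 1]), a linear SVM trained by subgradient descent on the hinge loss, and an invented toy dataset that is not the authors' data. The labels follow Figure 1: 1 for true and -1 for false.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def train_linear_svm(
    samples: List[Tuple[Vector, int]],
    epochs: int = 50,
    learning_rate: float = 0.1,
    regularization: float = 0.01,
) -> Tuple[List[float], float]:
    """Fit weights and bias by subgradient descent on the regularised hinge loss."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in samples:  # label is +1 (true) or -1 (false)
            score = sum(w * x for w, x in zip(weights, features)) + bias
            # Regularisation step: shrink the weights towards zero.
            weights = [w * (1.0 - learning_rate * regularization) for w in weights]
            if label * score < 1.0:      # inside the margin or misclassified
                weights = [
                    w + learning_rate * label * x for w, x in zip(weights, features)
                ]
                bias += learning_rate * label
    return weights, bias


def classify(weights: Vector, bias: float, features: Vector) -> int:
    """Return 1 (probably authentic) or -1 (probably false)."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score >= 0 else -1


# Invented toy data: (entity_score, author_identity_score) -> label.
TOY_STORIES = [
    ((0.8, 1.0), 1),
    ((0.6, 1.0), 1),
    ((0.4, 1.0), 1),
    ((-0.7, -1.0), -1),
    ((-0.5, -1.0), -1),
    ((-0.2, -1.0), -1),
]

weights, bias = train_linear_svm(TOY_STORIES)
print(classify(weights, bias, (0.7, 1.0)))    # unseen story with verified author
print(classify(weights, bias, (-0.6, -1.0)))  # unseen story with unverified author
```

In a decision support setting, only the stories labelled -1 would be passed on for manual checking. For scale, the reported 81.5% on 200 stories corresponds to 163 correctly classified items.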

Conclusions

A new hybrid method to detect false news has been presented in this article. The main contribution is its simplicity, which makes it suitable for analyzing large amounts of data. It is a decision support system for companies specialized in the detection of false news, since it facilitates their work and greatly reduces the time and costs involved.

References

Alcaraz, M. & Augey, D. (2019) Will Fake News Kill Information? In book: Digital Information Ecosystems. Journal of Communication, pp 139-159. DOI: 10.1002/9781119579717.ch7
Afroz, S., Brennan, M., & Greenstadt, R. (2012). Detecting hoaxes, frauds, and deception in writing style online. Security and privacy (sp), 2012 IEEE symposium on. IEEE 461–475.
Bojanowski, P., Grave, E., Joulin, A., & Mikolov, T. (2017). Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5, 135–146.
Barthel, M., Mitchell, A. & Holcomb, J. (2016). Many americans believe fake news is sowing confusion. Pew Research Center. Retrieved from: https://www.journalism.org/2016/12/15/many-americans-believe-fake-news-is-sowing-confusion/
European Commission (2018). Eurobarometer 464 on “Fake News and Disinformation online”. Retrieved from: http://ec.europa.eu/commfrontoffice/publicopinion/index.cfm/survey/getsurveydetail/instruments/flash/surveyky/2183
Gualda, E. (2019) Teorías de la conspiración, confianza y credibilidad en la información. Communication & Society VOL. 32(1). Retrieved from: https://www.unav.es/fcom/communication-society/es/articulo.php?art_id=728
Hernández, A. (2017). Resiliencia de la organización de la información en la era de las posverdad. Alcance, 6(14), 47-59.
High-Level Expert Group on Fake news and Disinformation (2018). A multi-dimensional approach to disinformation. Report of the independent High level Group on fake news and online disinformation. March. European Commission, Directorate-General for Communication Networks, Content and Technology, Luxembourg. Retrieved from: https://publications.europa.eu/en/publication-detail/-/publication/6ef4df8b-4cea-11e8-be1d-01aa75ed71a1/language-en
Le, Q., & Mikolov, T. (2014). Distributed representations of sentences and documents. International conference on machine learning 1188–1196.
Li, Y., McLean, D., Bandar, Z. A., O’shea, J. D., & Crockett, K. (2006). Sentence similarity based on semantic nets and corpus statistics. IEEE Transactions on Knowledge and Data Engineering, 18(8), 1138–1150.
McNair, B. (2017) Fake News. Falsehood, Fabrication and Fantasy in Journalism. Routledge Focus. 1st edition.
Pagliardini, M., Gupta, P., & Jaggi, M. (2017). Unsupervised learning of sentence embeddings using compositional n-gram features. arXiv:1703.02507
Ruchansky, N., Seo, S., & Liu, Y. (2017). Csi: A hybrid deep model for fake news detection. Proceedings of the 2017 ACM on conference on information and knowledge management. ACM 797–806.
Vargo, J., Guo, L. & Amazeen, A. (2017) The agenda-setting power of fake news: A big data analysis of the online landscape from 2014 to 2016. New Media & Society, vol. 20, 5: pp. 2028-2049.
Weir, W. (2009). History´s Greatest Lies. Beverly: Fair Winds Press.

RODA: a tool for semi-automatic data-driven visual stories

Xaquín Veira-González, Universitat de Girona, Graphics and Imaging Laboratory, xocasgv@gmail.com

Anton Bardera, Universitat de Girona, Graphics and Imaging Laboratory, anton.bardera@imae.udg.edu

Apple Chan-Fardel, apple.cjcfardel@yahoo.com

María Luisa Otero López, Universidade de Santiago de Compostela, marotlo@usc.s

Abstract: RODA, our robot data assistant, is an interactive tool for data inquiry and automation of data-driven visual narratives. So far, most of the research on data-driven automation in journalism has drilled down on either text narratives or isolated data visualizations. Our research for RODA is the first time that the aim is to produce narratives incorporating semantically interwoven text and visualizations. By using the current advances in natural-language generation and research on structures of this type of storytelling device, we expand the theoretical framework of data-driven visual storytelling.

Keywords: data-driven storytelling, information visualization, natural-language generation, narrative visualization, narrative automation, data-driven visual stories, robot journalism, data journalism, visual storytelling

Introduction

Leading newsrooms are devoting more resources to making visual-driven formats integral to their vocabulary. The New York Times’ 2020 report “Journalism That Stands Apart” charted the growth of “stories with deliberately placed visual elements” from close to nothing in 2014 to 12.1% by September 2016. “Deliberately placed” is the crucial nuance that hints at a holistic way of editing, as Cairo (2017) describes it, that is currently being used in many leading newsrooms: text, visualizations, pictures, and videos are just different media types to tell a portion of a story, which flows in and out of them seamlessly. Text and graphics blend in the narrative; they use short sentences anchored in summary statistics that refer to what the graphics show.

Many tools automate the production of charts from data, and the current advances in NLG are starting to provide tools to automate the writing of stories from data. We present a robot data assistant (RODA) with the ambition to automate the production of complete narratives that coherently mix text and visualizations based on the input data. The user would enter a dataset into the system and, following a conversation with the application, the robot would try to understand the data, question the user about necessary checks, summarize possible patterns or trends, recommend visualization types, and use natural-language generation (NLG) to assemble a narrative with text and visuals that would fit the priorities of the communicator.

Methods

The focus of information visualization (InfoVis) had been, until recently, on interactive visual representations of data as isolated interfaces for the data. Segel and Heer (2010) shifted the focus and introduced the concept of narrative visualizations, as a combination of visuals, multimedia and textual elements integrated within data-driven storytelling systems. In Riche et al. (2018), practitioners and researchers explore storytelling techniques, the life cycle of the story, and narrative patterns, defining future lines of research and exploration for data-driven visual storytelling. Following the patterns in Pérez-Montoro and Veira-González (2018), we explore the underlying structures of these data-driven visual stories and their atomic components.

Findings and Argument

RODA is designed as a blend of chatbot and on-screen interaction with the user, from which it calculates the summary statistics, recommends charts, understands what the data means, gathers the user’s priorities for the story, and outputs a story structure composed of narrative blocks with semantically interconnected text and visuals.

Figure 1 - Streamlined flowchart of how RODA’s inputs and feedback to the user work

Each story atom is composed of a data description text, a visualization, and an explanation and transitional text, all tied to the current view of the data. They resemble Kosara’s (2017) Claim, Fact, and Conclusion (CFC) pattern based on Cohn’s (2013) narrative structure for comics.

RODA’s recommendations rest on research that two of the authors did while designing The Guardian’s in-house charting tool. While the implementation starts with a dataset and then suggests a visual display, our approach walked backward from a dozen visualizations: the system parses the data types of the input and the ranges of the numeric properties, and filters the available visualization methods based on a set of constraints.

Around each instance of the visualization, the text refers to the critical features visualized and, if it is known, provides the context and the reasons why they are salient. Text blocks in these types of stories, especially the explanatory copy, serve a similar function to the annotation layer within data visualizations.

In order to automatically detect facts from the data for content planning of the transitional text, several statistics have to be computed. In the first iteration of the prototype, we focus on basic statistical measures, such as mean, median, percentiles and quantiles, and standard deviation. The tool uses rule-based generation to write the text, a more sophisticated method than fill-in-the-blanks-with-data templates.
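Two of the steps described above, filtering visualization methods by constraints and turning summary statistics into rule-based text, can be sketched as follows. This is an illustration under assumptions rather than RODA's code: the constraint table, the chart names, the 10% thresholds and the sentence wording are all hypothetical, and the sample values are invented.

```python
import statistics
from typing import Dict, List, Sequence

# Hypothetical constraint table: chart type -> number of columns needed per kind.
CHART_CONSTRAINTS: Dict[str, Dict[str, int]] = {
    "bar chart": {"categorical": 1, "numeric": 1},
    "scatter plot": {"numeric": 2},
    "line chart": {"temporal": 1, "numeric": 1},
    "histogram": {"numeric": 1},
}


def candidate_charts(column_kinds: Sequence[str]) -> List[str]:
    """Keep the chart types whose column requirements the dataset can satisfy."""
    available = {kind: list(column_kinds).count(kind) for kind in set(column_kinds)}
    return [
        chart
        for chart, needs in CHART_CONSTRAINTS.items()
        if all(available.get(kind, 0) >= count for kind, count in needs.items())
    ]


def summarize(values: Sequence[float]) -> Dict[str, float]:
    """Basic measures named in the text: mean, median, quartiles, standard deviation."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "median": median,
        "q1": q1,
        "q3": q3,
        "stdev": statistics.stdev(values),
    }


def describe(name: str, stats: Dict[str, float]) -> str:
    """Rule-based sentence: the wording depends on the shape of the data.

    The ratio rules assume positive values; this is a simplification.
    """
    if stats["mean"] > 1.1 * stats["median"]:
        shape = "a few high values pull the average above the typical value"
    elif stats["mean"] < 0.9 * stats["median"]:
        shape = "a few low values pull the average below the typical value"
    else:
        shape = "the average is close to the typical value"
    return f"The median {name} is {stats['median']:g}; {shape}."


charts = candidate_charts(["categorical", "numeric"])
sentence = describe("value", summarize([10, 12, 14, 15, 90]))
print(charts)
print(sentence)
```

The difference from a fill-in-the-blanks template is that the rule selects which statement to make before any number is inserted, so the text changes with the shape of the data and not only with its values.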

The user’s answers determine the content for explanatory texts in the final part of the application flow. Once the tool has computed a set of facts and has understood how to refer to the data, it can ask the user to add context or reasons behind some of those calculated facts, such as “do the outliers have anything in common” or “how about items near the average or the median.” The application would then summarize those answers.

In order to structure those narrative blocks and compose the story, the approach that best fits our purposes is Kosara’s (2017) Claim, Fact, and Conclusion (CFC) pattern, based on Cohn’s (2013) narrative structure for comics. We also use Pérez-Montoro and Veira-González's (2018) story patterns to determine the order and scope (overview or details) of the narrative blocks.
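The step from computed facts to follow-up questions can be sketched in the same spirit. The text does not say how outliers or "near the median" are defined, so this sketch assumes the common 1.5 x IQR rule for outliers and treats items between the first and third quartile as near the median. The function name, the labels and the values are hypothetical.

```python
import statistics
from typing import Dict, List


def follow_up_questions(rows: Dict[str, float]) -> List[str]:
    """Turn computed facts into questions the assistant could put to the user."""
    values = list(rows.values())
    q1, _median, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread  # assumed 1.5 x IQR rule

    outliers = sorted(label for label, v in rows.items() if v < low or v > high)
    typical = sorted(label for label, v in rows.items() if q1 <= v <= q3)

    questions = []
    if outliers:
        questions.append(
            f"Do the outliers ({', '.join(outliers)}) have anything in common?"
        )
    if typical:
        questions.append(
            f"How about items near the median ({', '.join(typical)})?"
        )
    return questions


# Invented dataset: nine labelled values, one of them far from the rest.
rows = {"A": 10, "B": 11, "C": 12, "D": 13, "E": 14, "F": 15, "G": 16, "H": 17, "I": 90}
for question in follow_up_questions(rows):
    print(question)
```

The user's answers to such questions would then supply the context that the statistics alone cannot provide.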

Conclusions

What RODA can do in its current iteration is inherent to its purpose. It isn’t just a tool for automating data-driven stories, but almost more importantly a tool for training journalists: an aid to contribute to data and visual literacy in newsrooms and other communication environments. Some of the limitations also come from the relatively small body of research on data-driven visual stories (still in its infancy compared to other subfields of InfoVis) and the fact that this paper is a first-ever approach to automating this type of narrative. Future research will gain as well from surveying the effectiveness of these machine-written visual stories compared both to human-written text pieces and to individual charts by themselves.

References

Cairo, A., 2017. Nerd journalism: How data and digital technology transformed news graphics (Doctoral dissertation, Universitat Oberta de Catalunya).
Cohn, N., 2013. Visual narrative structure. Cognitive science, 37(3), pp. 413-452.
Kosara, R., 2017, June. An argument structure for data stories. In Proceedings of the Eurographics/IEEE VGTC Conference on Visualization: Short Papers (pp. 31-35). Eurographics Association.
Pérez-Montoro, M. and Veira-González, X., 2018. Information Visualization in Digital News Media. In Interaction in Digital News Media (pp. 33-53). Palgrave Macmillan, Cham.
Riche, N.H., Hurter, C., Diakopoulos, N. and Carpendale, S. eds., 2018. Data-driven storytelling. CRC Press.

Becoming a data journalist: the role of identity in journalism education

Liz Hannaford, Manchester Metropolitan University, L.Hannaford@mmu.ac.uk

Abstract: As data journalism becomes mainstream, journalism educators need to find ways to bring it into their teaching. However, the literature shows that this can be problematic for staff and students in an already tightly-packed curriculum. The objective of this study is to explore ways in which learning to do data journalism can be reconceptualised as a social process of becoming a data journalist whereby students are invited to take on the beliefs and values of the new professional identities made desirable in a datified society. I address this problem by using a discourse analysis approach to explore how vocal advocates in this field use different discursive strategies to justify their practices. Preliminary findings suggest speakers redraw the boundaries of journalism as they negotiate and lay claim to competing identities with implications for journalism educators.

Keywords: Data journalism education, identity, Community of Practice, discourse analysis

Introduction

This paper sets out the background, rationale, methods and some initial findings of my doctoral research, currently ongoing, which is a study of the discourses of data journalism, the construction of data journalist identities and the implications for journalism education. Encountering spreadsheets, statistics and code can be a jarring experience for undergraduates who did not expect their journalism course to be ‘technical.’ Educators need to address what professional and social identities we are inviting students to invest in when they study journalism and how these identities have been recalibrated by journalism’s ‘quantitative turn.’ To achieve these aims, my research is analysing the taken-for-granted norms that have become embedded in the discourse of data journalism, and asking whether and how this discourse can be exclusionary.

There is now a small but growing number of specialist data journalism courses at Masters level in the UK (Bradshaw, 2018), but attempts to introduce data journalism skills into traditional journalism programmes have encountered obstacles. These include journalism students’ aversion to maths, a fear that students are put off by the subject and the lack of qualified staff to teach data journalism (Hewett, 2015).

The literature on data journalism education has predominantly focused on the extent to which it is taught around the world and the challenges it presents (Splendore et al., 2016 in six European countries; Berret and Phillips, 2016 in the United States; Yang and Du, 2016 in Hong Kong; Davies and Cullen, 2016 in Australia; Heravi, 2019 globally). Elsewhere, research has explored the importance in this field of peer-to-peer learning through informal networks such as the NICAR listserv (Howard, 2014; Fink and Anderson, 2015; Hermida and Young, 2017), Hacks/Hackers meet-ups around the world (Lewis and Usher, 2014) and dedicated social media groups (Appelgren, 2016), all of which suggest the importance of social identity and community participation in this field.

The implications of these learning practices for formal journalism education have yet to be fully explored in the literature. Given the need for journalism education to embrace basic data skills as a requirement of the profession (Stalph and Borges-Rey, 2018), it is argued here that socio-cultural perspectives could provide valuable insights into learning as an ongoing social process. Research from other professions provides support for this approach (Monrouxe (2010) in medical students and Beauchamp and Thomas (2009) in student teachers, for example).

This paper presents initial findings from analysis of the discourses of data journalism as deployed by its influential early pioneers during the social interaction of interviews and panel discussions. The analysis suggests that the dominant discourses of data journalism often rely on negative representations of traditional journalism as ‘broken’ whilst representing technologists as heroic saviours. These discursive strategies talk data journalism ‘out’ of the journalism curriculum and can antagonise students’ social identity. I argue that it would be more beneficial to find ways of talking it ‘in’ to the curriculum to help students manage the transition to the new professional identities required in a datified society.

Methods

The methodology is driven by the following research questions. How do vocal advocates of data journalism talk about this field? What identities (subject positionings) and practices do these ways of talking make possible? What are the implications for journalism educators?

The research is currently ongoing. The preliminary analysis presented in this paper is based on a discourse analysis approach (Jäger and Maier, 2016), which builds on recent interest in investigating data journalism as a socio-discursive construct produced through social interaction (Powers, 2012; De Maeyer et al., 2015; Borges-Rey, 2017) as opposed to an inevitable reality existing ‘out there’.

The data to which this analysis has been applied consists of transcripts of interviews and panel discussions involving prominent practitioners and pioneers in North American and European data journalism from 2008 to 2018. To produce a manageable dataset for a discourse analysis approach, a purposive selection of interactional texts was made. The advantage of analysing social interaction texts as opposed to written texts is that they are a rich source of narratives about how the speakers see themselves and what they do.

Findings and Argument

The research is ongoing, and as such the findings presented are preliminary. A number of different discursive strategies were used by vocal advocates of data journalism to legitimise their practice. There is evidence of the use of negative representations of traditional journalism as ‘broken’, the idealised pursuit of journalistic ‘truth’, the unquestioned privileging of numerical, structured data over other forms of knowing, a radical vision of the future of journalism, but also positive representations of technology as emotionally fulfilling and a discourse of optimism about journalism’s future.

These discourses involve the speakers negotiating and laying claim to different and competing identities as they redraw the boundaries of journalism and explore new ways of becoming a journalist. These negotiations take place against the background of an existential crisis in journalism. Neo-liberal discourses thus blend with these new identities as journalism becomes a digitised commodity in a global market.

Practices required by these ways of talking about data journalism include the fetishisation of openness, collaboration, disruption and innovation. Continual learning – often from peers – is highly valued and aligned with social participation in communities of like-minded practitioners that transcend organisational boundaries.

Conclusions

The preliminary conclusions of the research suggest that journalism’s quantitative turn requires more from educators than just squeezing new skills into an already tightly-packed curriculum. I argue that knowledge and identity are intertwined (Lave and Wenger, 1991) and so educators need to consider who students need to be as much as what they need to know. The experience of becoming a journalist repeatedly raises issues of identity, values and beliefs that need to be addressed in the classroom. Students have to be able to make sense of themselves in this new journalism-technology environment and, if they choose, inhabit the professional identities that result.

References
Appelgren, E. (2016) 'Data Journalists Using Facebook.' Nordicom Review, 37(1) pp. 156-169.
Beauchamp, C. and Thomas, L. (2009) 'Understanding teacher identity: an overview of issues in the literature and implications for teacher education.' Cambridge Journal of Education, 39(2) pp. 175-189.
Berret, C. and Phillips, C. (2016) Teaching Data and Computational Journalism. New York, NY: Columbia Journalism School.
Borges-Rey, E. (2017) 'Towards an epistemology of data journalism in the devolved nations of the United Kingdom: Changes and continuities in materiality, performativity and reflexivity.' Journalism (published online ahead of print 1st February) Available at DOI: 10.1177/1464884917693864
Bradshaw, P. (2018) 'Data Journalism Teaching, Fast and Slow.' Asia Pacific Media Educator, 28(1) pp. 55-66.
Davies, K. and Cullen, T. (2016) 'Data Journalism Classes in Australian Universities: Educators Describe Progress to Date.' Asia Pacific Media Educator, 26(2) pp. 132-147.
De Maeyer, J., Libert, M., Domingo, D., Heinderyckx, F. and Le Cam, F. (2015) 'Waiting for Data Journalism: A qualitative assessment of the anecdotal take-up of data journalism in French-speaking Belgium.' Digital Journalism, 3(3) pp. 432-446.
Fink, K. and Anderson, C. (2015) 'Data Journalism in the United States: Beyond the “usual suspects”.' Journalism Studies, 16(4) pp. 467-481.
Heravi, B.R. (2019) '3Ws of Data Journalism Education: What, where and who?' Journalism Practice, 13(3) pp. 349-366.
Hermida, A. and Young, M.L. (2017) 'Finding the Data Unicorn: A hierarchy of hybridity in data and computational journalism.' Digital Journalism, 5(2) pp. 159-176.
Hewett, J. (2015) 'Learning to teach data journalism: Innovation, influence and constraints.' Journalism, 17(1) pp. 119-137.
Howard, A.B. (2014) The Art and Science of Data-driven Journalism. New York: Tow Center for Digital Journalism. [Online] [Accessed on 21st March 2019] Available at https://doi.org/10.7916/D8Q531V1
Jäger, S. and Maier, F. (2016) 'Analysing discourses and dispositives: a Foucauldian approach to theory and methodology.' In Wodak, R. and Meyer, M. (eds) Methods of critical discourse studies. 3rd ed., Los Angeles: Sage, pp. 109-136.
Lave, J. and Wenger, E. (1991) Situated learning: Legitimate peripheral participation. Cambridge University Press.
Lewis, S.C. and Usher, N. (2014) 'Code, Collaboration, And The Future Of Journalism: A case study of the Hacks/Hackers global network.' Digital Journalism, 2(3) pp. 383-393.
Monrouxe, L.V. (2010) 'Identity, identification and medical education: why should we care?' Medical education, 44(1) pp. 40-49.
Powers, M. (2012) '“In Forms That Are Familiar and Yet-to-Be Invented”: American Journalism and the Discourse of Technologically Specific Work.' Journal of Communication Inquiry, 36(1) pp. 24-43.
Splendore, S., Di Salvo, P., Eberwein, T., Groenhart, H., Kus, M. and Porlezza, C. (2016) 'Educational strategies in data journalism: A comparative study of six European countries.' Journalism, 17(1) pp. 138-152.
Stalph, F. and Borges-Rey, E. (2018) 'Data Journalism Sustainability: An outlook on the future of data-driven reporting.' Digital Journalism, 6(8) pp. 1078-1089.
Yang, F. and Du, Y.R. (2016) 'Storytelling in the Age of Big Data: Hong Kong Students’ Readiness and Attitude towards Data Journalism.' Asia Pacific Media Educator, 26(2) pp. 148-162.


Predictive sentiment analysis of messages for Journalistic Purposes: Real-time classification of tweets based on Machine Learning

Prof. Dr. Félix Ortega, Universidad de Salamanca, e-mail: fortega@usal.es
Dr. Carlos Arcila, Universidad de Salamanca, e-mail: carcila@usal.es
Prof. Antonio García, Universidad Rey Juan Carlos, e-mail: antonio.garcia@urjc.es

Abstract: Algorithms, big data, machine learning and artificial intelligence systems are key concepts and methods for the reshaping of our sociocultural, economic and political relations in our everyday life. Digital culture and communication are inevitably changing as media infrastructures, media practices and social environments become increasingly data-conscious and data-driven. Efforts to bring together automated sentiment analysis based on machine learning and streaming technologies that produce large amounts of data are relatively new to journalism-oriented media enterprises. This paper describes and assesses the creation of machine learning models to predict sentiment in real-time tweets associated with the Journalistic Value Chain (JVC), which is itself undergoing a revolution, and shows how this process can be scaled using commercial distributed computing when personal computers cannot support the computation and storage required, in order to provide the Data Journalistic Unit (DJU) with tools for better job and information performance.
Keywords: Predictive sentiment analysis; Political Opinion; Twitter; Machine Learning; Big Data; Political tweets; Big data digital journalism.

Introduction
Algorithms, big data, machine learning and artificial intelligence systems are key concepts and methods for the reshaping of our sociocultural, economic and political relations in our everyday life. Digital culture and communication are inevitably changing as media infrastructures, media practices and social environments become increasingly data-conscious and data-driven. The consumer's use of commonplace media technologies in a world of information and news is mediated by data, interpellating and adapting to consumer preferences in an ever more automated way. We live in a world which is increasingly influenced by algorithms and artificial intelligence methods and processes. This trend is now spreading rapidly to the analysis and comprehension of the flow of information, opinions and news in the digital media. Algorithms, machine learning and artificial intelligence are being implemented in the filtering of a large percentage of the content published on social media platforms and their apps, picking out what is potentially newsworthy (Thurman et al., 2016, 2017) for the consumer given their preferences, and transforming news management and agenda setting in media enterprises into a more Data Broker Management Journalism (DBMJ), where new journalistic competences converge with online and almost real-time analysis of “trends”, “visits”, “ratios” and broker-marketing-oriented visualizations.
In this context, there is a growing interest in surveying opinions using large-scale data produced by social media (Cobb, 2015; O'Connor et al., 2010; Bollen, Mao & Pepe, 2011) in media enterprises, and in particular in the traditional journalistically oriented business. The vast majority of this research is based either on manual classification or on automated content analysis using dictionaries that score words (e.g. giving an a priori negative or positive value to each word) (Leetaru, 2012; Feldman, 2013), while other approaches such as supervised machine learning (Vinodhini & Chandrasekaran, 2012) are scarce in communication research (van Zoonen & Toni, 2016). Moreover, efforts to bring together automated sentiment analysis based on machine learning and streaming technologies that produce large amounts of data are relatively new to journalism-oriented media enterprises. This paper describes and assesses the creation of machine learning models to predict sentiment in real-time tweets associated with the Journalistic Value Chain (JVC), which is itself undergoing a revolution, and shows how this process can be scaled using commercial distributed computing when personal computers cannot support the computation and storage required, in order to provide the Data Journalistic Unit (DJU) with tools for better job and information performance.
Local computing solutions for journalistic purposes may impose serious limitations on the analysis of large amounts of data, which necessarily requires scalable storage and distributed computing. Running streaming data analysis on distributed platforms has been challenging in the complex and changing big data landscape (Turck & Hao, 2016). The incorporation of tools such as Apache Kafka has allowed Apache Spark, currently the most widespread open-source software for distributed computing, to fill this gap with Spark Streaming (Spark Kafka Integration, 2016), which can run code written in Scala or in Python (with the PySpark module). In our research we analyze how journalists can extend sentiment analysis with Apache Spark Streaming on local machines using models trained with Spark Machine Learning. We also explain how this procedure can be scaled using commercial tools (instead of academic grids) such as Amazon Web Services (AWS), the most popular Infrastructure as a Service (IaaS), which offers Amazon S3 for massive storage and Amazon Elastic Compute Cloud (EC2) to create a flexible set of connected instances in the cloud in order to compute the analysis.
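To make concrete the dictionary-based approach that the introduction contrasts with supervised learning, the following is a minimal sketch in plain Python. It is not taken from the authors' notebook: the lexicon, the function names and the thresholds are hypothetical and serve only to illustrate what "giving an a priori negative or positive value to each word" amounts to.

```python
import re

# Hypothetical toy lexicon: an a priori polarity for each word.
# A real study would use a published, language-specific dictionary.
LEXICON = {"good": 1, "great": 1, "love": 1,
           "bad": -1, "terrible": -1, "hate": -1}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-záéíóúüñ]+", text.lower())


def dictionary_score(text, lexicon=LEXICON):
    """Sum the a priori polarity of every known word in the text."""
    return sum(lexicon.get(token, 0) for token in tokenize(text))


def label(score):
    """Map a numeric score to a coarse sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for tweet in ["I love this great plan", "What a terrible, bad idea"]:
        print(tweet, "->", label(dictionary_score(tweet)))
```

Because each word is scored in isolation, a phrase such as "not good" would still count as positive in this sketch, which is one reason supervised models trained on labelled examples are of interest.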

Methods
The computational methods and services explained in our research may help journalists in media enterprises to study, interpret and analyze large amounts of tweets in any language by running sentiment analysis in real time. These methods and techniques do require some programming skills; however, existing models allow short learning curves for the final user and are easy to adapt. We provide journalists with all the code for Spark (written in Python and using PySpark) in an IPython notebook (ipynb). No mathematical background is needed to run the machine learning models, but a theoretical understanding of the algorithms will increase the quality of the interpretation for the trained journalist. In the case of the commercial services mentioned (AWS, Azure, IBM, etc.), media enterprises must consider the financial costs associated with this analysis. In addition, working with interdisciplinary teams (computer scientists, statisticians, computational linguists, etc.) can improve the results and save resources in the design of the data journalism enterprise. The described procedure for monitoring streaming tweets might help to test traditional and emerging theoretical approaches in communication research that require longitudinal data, and might also contribute to experimental studies which need real-time inputs to create or adapt to stimuli. Understanding the media journalistic value chain in a reciprocal way is key to the consolidation of the professional profiles of the Data Analysis Era.
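As a rough illustration of what training and applying a supervised sentiment model involves, the sketch below implements a small multinomial naive Bayes classifier using only the Python standard library. It is an assumption-laden stand-in, not the authors' PySpark code: the abstract does not say which algorithm was used, and the class name, the training tweets and the labels here are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


class NaiveBayesSentiment:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        # Count documents per class and word occurrences per class.
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, lab in zip(texts, labels):
            self.word_counts[lab].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_logp = None, -math.inf
        for lab, n_docs in self.class_counts.items():
            # Log prior of the class plus log likelihood of each known token.
            logp = math.log(n_docs / total_docs)
            n_words = sum(self.word_counts[lab].values())
            for token in tokenize(text):
                if token not in self.vocab:
                    continue  # ignore words never seen during training
                count = self.word_counts[lab][token]
                logp += math.log((count + 1) / (n_words + len(self.vocab)))
            if logp > best_logp:
                best_label, best_logp = lab, logp
        return best_label


if __name__ == "__main__":
    # Hypothetical hand-labelled training tweets.
    train_texts = ["great speech by the candidate", "I love this proposal",
                   "terrible debate performance", "I hate this policy"]
    train_labels = ["pos", "pos", "neg", "neg"]
    model = NaiveBayesSentiment().fit(train_texts, train_labels)
    print(model.predict("love the speech"))
```

In a streaming setting the same two steps apply: the model is fitted once on labelled tweets, and `predict` is then called on each incoming message. A distributed framework such as Spark changes where that computation runs, not the logic of the two steps.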

Findings and Argument
In a sector associated with innovation and technology, media enterprises are adapting and transforming their workflow structures into a more automated journalistic value chain, where artificial intelligence software is replacing traditional journalistic “roles” in a scenario where little or no human intervention is required aside from the software-hardware implementation and programming (Carlson, 2015) in some specific niches of news production and redistribution, such as social networks. If I may use a metaphor to illustrate: just as the ongoing implementation of robotics is revolutionizing the automobile industry at a continuous pace, so the development of big data more broadly, artificial intelligence processes, machine learning and written news is opening a new scenario in which technology providers implement algorithms and artificial intelligence processes to deliver automated news in multiple languages, with ethical challenges arising (Dörr, 2016; Dörr and Hollnbuchner, 2017). This is the true revolution for journalistic media enterprises. Algorithms are being used in new ways to distribute and package news content, both enabling consumers to request more of what they like and less of what they don’t and also making decisions on consumers’ behalf based on their data preference curve and profile (Groot Kormelink and Costera Meijer, 2014). As indicated above, we provide journalists with all the code for Spark (written in Python and using PySpark) in an IPython notebook (ipynb). No mathematical background is needed to run the machine learning models, but a theoretical understanding of the algorithms will increase the quality of the interpretation for the trained journalist.


Conclusions
The social role of journalism will prevail as a long-standing facilitator and interpreter of what is going on, but the labour process of the journalistic roles and authorities will merge into an Artificial Intelligence role and a human-based value-added provider. The quality of the news, its interpretation and the final accountability will be placed progressively in the hands of artificial neural nets and human neurons simultaneously. As in the automobile industry, the human neurons will supervise and program the software-hardware-robotic process workflow and contribute where complex reasoning and reprogramming, or complex writing, is needed and is socio-economically and politically viable. There should not be catastrophic concern about the quality of the news, its transparency and accountability with the data revolution in place. It will remain almost as it is today, improving as more data and analysis become available. It will imply less human intervention, where “machine learning processes” will prevail given their comparative advantages over the human workforce. The journalistic worker will evolve into a Data Broker Management Journalist (DBMJ) riding his “data-driven surfboard” in an all-digital, revolutionised Journalistic Value Chain (JVC), following data and news labelled via blockchains for better traceability. This renewed journalist will adjust processes and take decisions by supervising the job done by the Artificial Intelligence Big Data News solutions, filtering and contrasting “fake or irrelevant” news, and in some more value-added and quality-orientated media businesses will to some extent complement their products and services by providing human analysis and interpretation where suitable for the consumer and profitable for the journalistic business. The traditional human brand-based journalism which the NYT, the Washington Post, El País and La Nación, among others, represent is merging at a steady pace with the DBMJ and an all-digital and data-based journalistic value chain. The old and new editorial functions are progressively being allocated to new working profiles and roles situated at renewed and specific value chain loci, where the intersection of the human and the machine is being replaced by primarily artificial intelligence processes given their competitive advantages.
The obligations of automation, artificial intelligence and machine learning journalism should keep in place the norms, ethics and values transcribed into the new “all digital” software-hardware journalism. The relationship with the audience, whether performed by an all-automatic “robotic news provider” or with the supervision and mediation of human intelligence, should carry the obligation of preserving consumer rights and effectively implementing privacy and data management policies. In our research we provide the prototype of one of the latest predictive instruments and methods for research on algorithms, machine learning, automation and news: the “distributed journalistic oriented sentiment analysis” tool for real-time tweets (DJOSA-tool), one of many tools for the DBMJ. A new paradigm for contemporary empirical research and rigorous conceptual development on digital journalism and data analysis has to be implemented in the coming years in the new data scenario for news production and distribution. Research oriented to empirical analysis with rich conceptual discussion will be presented in our case study.
The promised land for journalism which algorithms and automation are building will provide personalisation of content and faster news provision associated with preference curves. Apps are bound to be the instrument for engaging users in a mixed advertising-funded and direct-pay business model. Social networks and content providers will necessarily merge and search for quality content in order to retain consumers within their personalised environments.
In this article we describe and evaluate the application of Predictive Sentiment Analysis (PSA) to a political communication case study through a real-time classifier of political opinions in Spanish tweets using machine learning methods and techniques, both on a local computer and using distributed computing for Big Data problems. We present the pilot application and the first results of the designed data experiment and prototype. We describe the associated emerging methodologies and techniques and analyze the threats and opportunities that these innovations represent for political communication and other communication research areas of interest. This prototype is freely accessible and open source, enabling the communication researcher to interpret the data autonomously given minimal prior training on the techniques and methods. It provides a scientific instrument designed for the understanding of media flows and political thought and opinion.


The data paradigm has arrived as an unquestionable source of information for the study of digital culture and digital media, communication and technology. Algorithms and data are today fundamental for effectively testing formerly unexplorable hypotheses. It may be disruptive at the beginning, but it is certainly a change of paradigm, from obscurity to potentially large quantities of data analysis and machine learning methodologies. The focus does not change: it is understanding communication processes. But the instruments and data do change how we understand, study and research in our disciplines. The shift of focus onto algorithms and data is positively disruptive for the ways in which we see our research and disciplines. It may even appear to limit the theoretical and methodological tools through which we increasingly try to understand mediation, the formation of identity, social life, politics and the creative industries. There is a need to reformulate the theoretical and empirical perspectives, and even paradigms, of Communication Research and its relation to data acquisition, curation and interpretation. If we are to augment and diversify our perspectives in Communication Research academia, algorithms, machine learning and artificial intelligence are certainly essential methods and instruments for bringing light to the still scientifically unexplored digital citizen-consumer. This research presents methodological and instrumental methods for addressing digital communication processes, as a complement to existing scientific methods.

References
Anderson, C.W. (2017) Social survey reportage: Context, narrative, and information visualization in early 20th century American journalism. Journalism 18:1, pages 81-100.
Ausserhofer, J., Robert Gutounig, Michael Oppermann, Sarah Matiasek, Eva Goldgruber. (2017) The datafication of data journalism scholarship: Focal points, methods, and research propositions for the investigation of data-intensive newswork. Journalism 17.
Bollen, J., Mao, H., & Pepe, A. (2011). Modeling public mood and emotion: Twitter sentiment and socio-economic phenomena. ICWSM, 11, 450-453.
Carlson, M. (2015). The Robotic Reporter. Automated journalism and the redefinition of labor, compositional forms, and journalistic authority. Digital Journalism, 3:3, pp. 416-431. http://dx.doi.org/10.1080/21670811.2014.976412
Cobb, W.N.W. (2015). Trending now: using big data to examine public opinion of space policy. Space Policy, 32, 11-16.
Dörr, K.N. (2016) Mapping the field of Algorithmic Journalism. Digital Journalism, 4:6, pp. 700-722.
Dörr, K.N., Hollnbuchner, K. (2017) Ethical Challenges of Algorithmic Journalism, Digital Journalism, 5:4, 404-419, DOI: 10.1080/21670811.2016.1167612
Davies, K., Trevor Cullen. (2016) Data Journalism Classes in Australian Universities: Educators Describe Progress to Date. Asia Pacific Media Educator 26:2, pages 132-147.
Farida Vis (2013) Twitter as a reporting tool for breaking news. Digital Journalism, 1:1, 27-47, DOI: 10.1080/21670811.2012.741316
Feldman, R. (2013). Techniques and applications for sentiment analysis. Communications of the ACM, 56(4), 82-89.
Go, A., Bhayani, R., & Huang, L. (2009). Twitter sentiment classification using distant supervision. CS224N Project Report, Stanford, 1, 12.
Groot-Kormelink, T., Costera-Meijer, Irene (2014) Tailor-Made News, meeting the demands of news users on mobile and social media. Journal of Journalism Studies, Volume 15:5: Future of journalism: in an age of digital media and economic uncertainty. Pp. 632-641. http://dx.doi.org/10.1080/1461670X.2014.894367
Jung, J., Haeyeop Song, Youngju Kim, Hyunsuk Im, Sewook Oh. (2017) Intrusion of software robots into journalism: The public's and journalists' perceptions of news written by algorithms and human journalists. Computers in Human Behavior 71, pages 291-298.
Kelleher, J.D., Mac Namee, B., & D'Arcy, A. (2015). Fundamentals of machine learning for predictive data analytics: algorithms, worked examples, and case studies. MIT Press.
Leetaru, K. (2012). Data mining methods for the content analyst: An introduction to the computational analysis of content. Routledge.
Mark Coddington (2015) Clarifying Journalism’s Quantitative Turn, Digital Journalism, 3:3, 331-348, DOI: 10.1080/21670811.2014.976400
Marchetti, R., Ceccobelli, D. (2016) Twitter and Television in a Hybrid Media System. Journalism Practice 10:5, pages 626-644.
Momin M. Malik, Jürgen Pfeffer. (2016) A Macroscopic Analysis of News Content in Twitter. Digital Journalism 4:8, pages 955-979.
Nicholas Diakopoulos (2015) Algorithmic Accountability, Digital Journalism, 3:3, 398-415, DOI: 10.1080/21670811.2014.976411
O'Connor, B., Balasubramanyan, R., Routledge, B.R., & Smith, N.A. (2010). From tweets to polls: Linking text sentiment to public opinion time series. ICWSM, 11(122-129), 1-2.
Spark Kafka Integration (2016). Spark Streaming + Kafka Integration Guide. Available at: http://spark.apache.org/docs/latest/streaming-kafka-integration.html
Seth C. Lewis (2015) Journalism In An Era Of Big Data, Digital Journalism, 3:3, 321-330, DOI: 10.1080/21670811.2014.976399
Shahin, S. (2016) When Scale Meets Depth: Integrating Natural Language Processing and Textual Analysis for Studying Digital Corpora. Communication Methods and Measures 10:1, pages 28-50.
Sormanen, N., Jukka Rohila, Epp Lauk, Turo Uskali, Jukka Jouhki, Maija Penttinen. (2016) Chances and Challenges of Computational Data Gathering and Analysis. Digital Journalism 4:1, pages 55-74.
Swart, J., Peters, C., Broersma, M. (2016) Navigating Cross-Media News Use. Journalism Studies 0:0, pages 1-20.
Swart, J., Chris Peters, Marcel Broersma. (2017) Repositioning news and public connection in everyday life: a user-oriented perspective on inclusiveness, engagement, relevance, and constructiveness. Media, Culture & Society, pages 016344371667903.
Tuomo Hiippala (2017) The Multimodality of Digital Longform Journalism, Digital Journalism, 5:4, 420-442, DOI: 10.1080/21670811.2016.1169197
Turck, M. & Hao, J. (2016). The Chart of the Big Data Landscape 2016 (Version 3.0). Available at: http://mattturck.com/big-data-landscape-2016-v18-final/
Thurman, N., Dörr, K.N, Kunert, J. (2017) When reporters get hands-on with robo-writing: Professionals Consider Automated Journalism's Capabilities and Consequences, Digital Journalism. http://dx.doi.org/10.1080/21670811.2017.1289819
Thurman, N.J., Schifferes, S., Fletcher, R., Newman, N., Hunt, S. & Schapals, A.K. (2016). Giving computers a nose for news: exploring the limits of story detection and verification. Digital Journalism, 4(7), pp. 838-848. doi: 10.1080/21670811.2016.1149436
van Zoonen, W., & Toni, G.L.A. (2016). Social media research: The application of supervised machine learning in organizational communication research. Computers in Human Behavior, 63, 132-141.
Vinodhini, G., & Chandrasekaran, R.M. (2012). Sentiment analysis and opinion mining: a survey. International Journal of Advanced Research in Computer Science and Software Engineering, 2(6), 282-292.
Wiesenberg, M., Zerfass, A., Moreno A. (2017) Big Data and Automation in Strategic Communication. International Journal of Strategic Communication 11:2, pages 95-114.
Wilson, T., Wiebe, J. & Hoffmann, P. (2005). Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis. Proc. of HLT-EMNLP-2005.
Yeon-Lee, N., Kim, Y., Sang, Y. (2017) How do journalists leverage Twitter? Expressive and consumptive use of Twitter. The Social Science Journal 54:2, pages 139-147.


Style, Singularity, and Substance: What Picture Editors Want from A.I.

Martin Schön, LMU Munich, martingeorg.schoen@gmail.com
Professor Neil Thurman, LMU Munich, neil.thurman@ifkw.lmu.de

Abstract: This paper aims to assess the opportunities and challenges inherent in the use of artificial intelligence in journalistic picture editing. To do this we built a tool that suggests images to illustrate news articles using keyword extraction and evaluated it using a qualitative online survey of professional picture editors and a simple, manual “suitability” heuristic. Our results show that the tool is able to return a factually “suitable” image about half the time, performing better on national or international stories than on those with a local or regional focus. However, the survey of picture editors revealed that whether an image matches a story’s topic is not the only criterion used in image selection. Also important is whether the image: works within the space allocated to it on the page, matches the target publication’s house style, has particular aesthetic qualities, and is original, helping the story and its respective publication to stand out from the competition. The development and deployment of artificial intelligence tools in journalistic picture editing will need to consider these contextual and artistic issues, as well as the resistance to automation that some professional picture editors expressed to us.
Keywords: Artificial intelligence, Image Selection, Keyword extraction, Machine learning, Picture editing

Introduction
In August 2018, Getty Images launched ‘Panels’, an “artificial intelligence tool ... that recommends ... visual content to accompany a news article.” The promise was that it could help picture editors create “better stories, more quickly” (Getty 2018). The launch is part of a trend for artificial intelligence to be applied both to editorial functions within news and to the creation of still and moving images more widely. In this paper we describe a system that we have built that automatically selects images for news articles and evaluate that system with the help of professional picture editors in order that we can assess the opportunities and challenges inherent in the use of artificial intelligence in journalistic picture editing.

Methods
We built a system that suggests images to illustrate news articles using keyword extraction. The tool takes the plain text of a news article as input, returning a search string that is used to query an image database. The tool ranks all terms that occur in the article according to three criteria. Firstly, term frequency: the number of times a term occurs. Secondly, first occurrence: the position of the term’s first mention. Thirdly, entity category: a semantic categorisation of the term retrieved from the Thomson Reuters Open Calais tagging service. Examples of categories include Human Protagonist and Location. The ranking is performed by a feedforward neural network. The network is trained to classify good and bad search terms with a set of 100,000 terms generated from our own corpus of 20,000 BBC News articles. The highest ranked terms are compiled into a search query. In addition to this machine learning approach, a second ranking mechanism was developed. This statistical approach calculates the ranking score for each term directly from the term frequency and first occurrence values, without any prior learning involved. A demonstration of the tool using the Getty Images API is available online (Schön 2018).
In order to evaluate the system we used a qualitative online survey of professional picture editors and a simple, manual “suitability” heuristic. The survey used a convenience sample (N=25) and asked picture editors about their work routines, with a focus on identifying tasks that had the potential to be automated and on how they selected images to illustrate articles. The editors were then given an opportunity to use our system by inputting text stories to receive, automatically, suggestions for illustrative images. Following the interactive demonstration, the respondents were asked to discuss the system’s strengths and limitations, and AI’s future potential in their work.
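The statistical ranking mechanism described above can be sketched in a few lines of Python. The abstract does not give the scoring formula, so the one used here (relative term frequency multiplied by an "earliness" factor) is an assumption made purely for illustration, as are the stopword list and the function names `rank_terms` and `build_query`.

```python
import re
from collections import Counter

# Hypothetical minimal stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is",
             "for", "at", "by", "with", "has", "was"}


def rank_terms(text, top_n=3):
    """Rank terms by term frequency and position of first occurrence.

    Assumed score: (count / max_count) * (1 - first_position / n_tokens),
    so frequent terms that appear early in the article rank highest.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    first_pos = {}
    for index, token in enumerate(tokens):
        if token in STOPWORDS:
            continue
        counts[token] += 1
        first_pos.setdefault(token, index)  # keep only the first mention
    if not counts:
        return []
    max_count = max(counts.values())
    n_tokens = len(tokens)
    scores = {
        term: (counts[term] / max_count) * (1 - first_pos[term] / n_tokens)
        for term in counts
    }
    # Sort by score (descending); break ties in favour of earlier terms.
    ranked = sorted(scores, key=lambda t: (-scores[t], first_pos[t]))
    return ranked[:top_n]


def build_query(text, top_n=3):
    """Compile the highest-ranked terms into a search string."""
    return " ".join(rank_terms(text, top_n))


if __name__ == "__main__":
    article = ("Volcano erupts in Iceland. The volcano sent ash over "
               "Iceland and flights were cancelled.")
    print(build_query(article))
```

The resulting string would then be sent to an image database. The entity-category criterion and the neural network ranking are omitted because they depend on an external tagging service and on trained weights that are not described here.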

Findings
Our first evaluation used a simple, manual heuristic to determine the system’s performance in terms of the general suitability of the images suggested for a given news story. We tested implementations of both ranking mechanisms (four different neural networks and one statistical approach) with 100 articles from the BBC News corpus. The Getty Images API was used as the image database. The resulting images were manually classified as either “suitable” or “unsuitable” with regard to the respective article. An image was deemed “suitable” if it illustrated the main topic of the story, but no judgement was made on the image’s aesthetic qualities. Because all images returned came from the Getty Images database, they met Getty’s minimum standards for sharpness, exposure, composition and so forth. The evaluation results show that both ranking mechanisms perform similarly well, with the neural networks performing slightly better. The tool worked better on articles without a local or regional focus, such as news about international politics, technology, science or business. On articles with a local focus, the statistical approach outperformed the networks by far. First occurrence proved to be the most powerful criterion for judging the suitability of a term as search query. On average around half the images returned were classified as “suitable.”
Our second evaluation involved a qualitative online survey of professional picture editors. Most of the editors selected images to illustrate specific articles at least daily, but very few were aware of, or had used, any software that could help automate their routine tasks. Those who had did not come away very impressed. One had used software that could automatically crop images but found it “restrictive”. Another had used the Getty ‘Panels’ product mentioned in the introduction but thought it was deficient, not being “smart or subtle enough.” The picture editors’ feedback on our system was, on balance, more negative than positive. On the positive side some acknowledged that it was capable of suggesting suitable illustrative images quickly. One said they might use it “if I had a rush on or was stuck for ideas”. Others suggested it might be useful for teams who did not have a designated, or experienced, picture researcher. On the negative side, a common criticism concerned the lack of relevance of some of the suggested images, a limitation our “suitability” heuristic also highlighted. The editors’ survey revealed other shortcomings too. Some editors wanted the system to be able to suggest images from a wider variety of sources (not just Getty) and to show the costs of licencing particular images, which are not insurmountable technical challenges. However, some of the editors’ other wishes present more of a challenge. Firstly, several mentioned that images do not only have to illustrate the content of a particular story but they need to be in keeping with the house style of the target publication. Secondly, an image has to be suitable for the space available for it on the page, which might preclude images with certain compositions or aspect ratios. Thirdly, editors emphasized the importance of images’ aesthetics, for example their “beauty” and “visual impact.” Finally, the importance of having “original” or “unique” images was emphasized, a thought encapsulated in this response from one of our respondents: “I want something different to what everyone else has.”

Conclusions
In this study we have highlighted some of the work being undertaken to apply artificial intelligence to the task of selecting images to illustrate stories. Our own system was described, and we demonstrated, both quantitatively and qualitatively, that it is capable of returning a factually “suitable” image about half the time. With further development this proportion should improve. However, our survey of professional picture editors revealed that, in the real world, it is not enough for an image to match the topic of a story. For AI image selection tools, like ours and Getty’s ‘Panels’, to be useful they need to do more. Two interesting challenges are, firstly, to make the tools contextually aware, both at the page level (where the image will appear) and at the outlet level (the publication’s house style) and, secondly, to offer selections that are able to fulfil editors’ desire to have “original” images that are “different” to their competitors. Two other requirements present a much greater level of challenge. Firstly, to select images that have the right aesthetic and emotional qualities and, secondly, to build systems that do not make picture editors feel that the “creative” element of picture editing is being “taken away” from them.

References
Getty (2018) “Getty Images Launches AI Tool to Transform Search for Media Publishers” 2 August, http://press.gettyimages.com/getty-images-launches-ai-tool-to-transform-search-for-media-publishers/
Schön, Martin (2018) Picpic Explorer. http://picpic-explorer.argonn.me/#/. Accessed 24 October 2018.


Can data journalism really stimulate local news? A case study with media in the countryside of Portugal

Ricardo Morais, University of Beira Interior / Labcom.IFP, ricardo.morais@labcom.ubi.pt
Pedro Jerónimo, University of Beira Interior / Labcom.IFP & CECS, pedrojeronimo.phd@gmail.com

Abstract: This study aims to outline the development of data journalism at twenty-five Portuguese local media outlets. The results presented are based on a survey of 107 journalists from local newspapers and radio stations. A content analysis of the data journalism published on their websites is also presented and compared with the results obtained through the survey. The main findings suggest that, despite the knowledge journalists have, there is no investment in data journalism because this type of practice is not considered to be a determining factor in attracting new audiences and approaching the most-assessed news. Based on this analysis, we propose, within the scope of the project in which we developed this study, some practices that local media can adopt to help journalists realize the potential of data journalism and, above all, to encourage them to adopt its practices.
Keywords: Data journalism; local news; Portugal; Remedia.Lab

Introduction
In recent years much has been said about the possibilities of data journalism and how it can improve the journalistic field. But the truth is, as Rogers reveals, “Data journalism is not new”; actually, the first example of data journalism dates back to 1821, “in the very first Guardian”, and is related to a “list of schools in Manchester and Salford, with how many pupils attended each one and average annual spending” (Rogers, 2013, p. 60). Therefore, this well-known form of journalism has only reawakened in recent years because societies have become increasingly digital and the amount of information available on networks has grown. It was precisely this increase in the amount of information available that made data journalism decisive on two levels: “1) analysis to bring sense and structure out of the never-ending flow of data and 2) presentation to get what’s important and relevant into the consumer’s head” (Meyer apud Gray, Chambers and Bounegru, 2012, p. 6). At a time when information is everywhere, the most important tasks are no longer searching and gathering, but filtering and verification. The role of journalists is, in this context, particularly significant, since they have the power to make sense of information. As Pilhofer says, data journalism “can include everything from traditional computer-assisted reporting (using data as a “source”) to the most cutting-edge data visualization and news applications”, but the ultimate goal remains the same: “providing information and analysis to help inform us all about important issues of the day” (Pilhofer apud Gray, Chambers and Bounegru, 2012, p. 6). In spite of that, there are some limitations in data journalism quality even in major media companies (Young, Hermida and Fulda, 2018).
Important for the production and dissemination of information in the new digital ecosystem, data journalism “may be the most powerful forum of collective journalistic sensemaking” (Anderson, 2019). This practice assumes particular importance in certain contexts, such as the local one. As Kristen Muller, chief content officer at KPCC, says, “if local newsrooms are going to achieve digital sustainability, we must try new things. We need to experiment with different approaches to coverage and revenue” (2018). Therefore, proximity, which is a characteristic of local and regional media, can find in data journalism a unique possibility for “finding unique stories (not from newswires), and executing the watchdog function”. Jerry Vermanen believes data journalism is crucial to regional newspapers: “because local newspapers have this direct impact in their neighborhood and sources become digitalized, a journalist must know how to find, analyze and visualize a story from data” (Vermanen apud Gray, Chambers and Bounegru, 2012, p. 7). Stefan Back (2018) also pointed out that the engagement of journalists and civic technologists can be challenging to public service at a local level.
The question then arises as to whether these local media, where resources are often scarce since the number of journalists is low, understand the potential of data journalism and are ready for it. Sometimes they do, and also have people in the newsrooms with that kind of knowledge (editorial and technical staff) but, unfortunately, that is not embedded in company or editorial strategies (Jerónimo, 2015). Do local journalists see data journalism as a way to scrutinize the world and hold the powers accountable? Are journalists aware of data journalism techniques? Do they understand that basic skills from traditional journalism just aren’t enough in a digital era? These are some of the questions that we seek to answer in this paper, through an analysis of a set of local media in the central region of Portugal, a significant part of the Portuguese media landscape and one where the number of media outlets has declined in recent years due to the lack of public support and low audiences.

Methods

In terms of research methods, we opted for the case study strategy, since it seems to us the tool best suited to the reality that we intend to study. For Yin (1989), a case study is empirical research that consists of the analysis of a particular phenomenon in the real world, through different ways of collecting data. Rossman and Rallis (2003) consider that case studies “seek to understand the larger phenomenon through a close examination of a specific case and therefore focus on the particular” (p. 104). Our case study is, in fact, a multiple case study (Yin, 1989, p. 52), since it was carried out in different local media at the same time.

To answer our questions, we collected data through a survey of journalists from twenty-five local newspapers and radio stations, and we also conducted a content analysis in search of data journalism examples published on the websites of the media investigated. In doing so we followed the proposals of Yin (1989), who advocates the use of different data sources, i.e., “multiple sources of evidence” (p. 23).

The journalistic projects chosen for this study aim to represent the central area of Portugal, one of the regions most affected by the closure of media outlets. These media are also part of the project Remedia.Lab, in which we try “to diagnosis the current situation of local/regional media, promoting experimental tools and strategies to strengthen their business model, increasing their innovation degree and improving their connection with the public”.

Findings and Argument

Results from the survey show that journalists consider it very important to have knowledge of web scraping, to obtain tools for data collection and analysis, and to gain knowledge in the creation of infographics and data presentation. Furthermore, the answers collected from the local journalists show, surprisingly, that these professionals have good knowledge of a set of skills such as web scraping, data visualization and presentation. However, the findings also show that journalists do not consider data journalism the most effective approach to attracting and retaining new audiences. The lack of company and editorial strategies can also help to discourage the adoption of data journalism (Jerónimo, 2015).

This is the kind of data that we seek to explore in this work, especially since several studies indicate that data journalism can help to revitalize local journalism in general and small local newsrooms in particular. Can these results help explain the limited number of data-based journalistic pieces we found on the media websites? If, as the survey shows, knowledge of data journalism techniques exists, is the lack of investment due to insufficient human and material resources? These are some of the results that we explore in this work, confronting this reality with the amount of information that is now available on the network, but also questioning the accessibility of the data for journalists' work.

Conclusions

Although our findings show that local media journalists have the technical knowledge to handle data journalism, those skills are not invested in news production. This situation can be explained by the lack of strategies, the small size of newsrooms and a traditional newsmaking culture, which is especially evident in newspapers. On the other hand, we cannot ignore the findings of previous studies that identify “key actors” in the newsrooms: journalists with the ability to innovate in their field, on their own, even without resorting to existing strategies (Jerónimo, 2015).

With this work, we would like to point out that the lack of data journalism in Portuguese local media is an opportunity: journalistic projects should recognize that there are possibilities in the journalistic field that must be explored to capture and retain audiences. Identifying, encouraging and helping the key players in the newsrooms are some of the steps that need to be taken next. These results, together with the evaluation of the type of data that public services and governments provide, constitute important knowledge that can be passed on as advice to local media, in order to help them implement data journalism in their newsrooms. These results will also be important in encouraging public and private entities at local and regional level to openly disclose their data.

References

Alexandre, I. A. R. (2014). Jornalismo de Dados: o estado da arte nos jornais generalistas diários em Portugal. Mestrado em Novos Media e Práticas Web. Faculdade de Ciências Sociais e Humanas da Universidade Nova de Lisboa. Available at https://run.unl.pt/handle/10362/13615 [Accessed April 3, 2019].

Anderson, C. W. (2019). Genealogies of Data Journalism. In: J. Gray and L. Bounegru, eds., The Data Journalism Handbook 2: Towards a Critical Data Practice. European Journalism Centre and Google News Initiative. Available at https://datajournalismhandbook.org/handbook/two/situating-data-journalism/genealogies-of-data-journalism [Accessed April 3, 2019].

Gray, J. and Bounegru, L. (Eds.) (2019). The Data Journalism Handbook 2: Towards a Critical Data Practice. European Journalism Centre and Google News Initiative. Available at https://datajournalismhandbook.org/handbook/two#situating-data-journalism [Accessed April 3, 2019].

Baack, S. (2018). Practically Engaged. Digital Journalism, 6(6), 673-692. DOI: 10.1080/21670811.2017.1375382

Gray, J., Bounegru, L. and Chambers, L. (2012). The Data Journalism Handbook. How journalists can use data to improve the news. California: O’Reilly Media. Available at http://datajournalismhandbook.org/ [Accessed April 3, 2019].

Jerónimo, P. (2015). Ciberjornalismo de proximidade: redações, jornalistas e notícias online. Covilhã: LabCom Books.

Martinho, A. I. P. (2013). Jornalismo de dados: contributo para uma caracterização do estado da arte em Portugal. Dissertação de Mestrado. ISCTE-IUL. Available at http://hdl.handle.net/10071/8329 [Accessed April 3, 2019].

Meyer, P. (2002). Precision Journalism: a reporter’s introduction to social science methods. Maryland: Rowman & Littlefield Publishers.

Rogers, S. (2013). Facts are Sacred: The Power of Data. London: Guardian Books.

Rossman, G. and Rallis, S. (2003). Learning in the Field: An introduction to qualitative research. Thousand Oaks (California): Sage Publications.

Yin, R. (1989). Case Study Research – Design and Methods. London: Sage Publications.

Young, M. L., Hermida, A. and Fulda, J. (2018). What Makes for Great Data Journalism? Journalism Practice, 12(1), 115-135. DOI: 10.1080/17512786.2016.1270171

Young, W. D. (2018). Data Journalism Goes Undercover. Available at https://www.niemanlab.org/2019/01/data-journalism-goes-undercover/ [Accessed April 3, 2019].