HAL Id: hal-01672282
https://hal.archives-ouvertes.fr/hal-01672282
Submitted on 23 Dec 2017
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
Visual Network Exploration for Data Journalists
Tommaso Venturini, Mathieu Jacomy, Liliana Bounegru, Jonathan Gray

To cite this version:
Tommaso Venturini, Mathieu Jacomy, Liliana Bounegru, Jonathan Gray. Visual Network Exploration for Data Journalists. Scott A. Eldridge II; Bob Franklin. The Routledge Handbook of Developments in Digital Journalism Studies, Routledge, 2018, 9781138283053. ⟨hal-01672282⟩
Networks are classic but under-acknowledged figures of journalistic storytelling. Who is connected to whom and by which means? Which organizations receive support from which others? What resources or information circulate through which channels and which intermediaries enable and regulate their flows? These are all customary stories and lines of inquiry in journalism and they all have to do with networks. Additionally, the recent spread of digital media has increasingly confronted journalists with information coming not only in the traditional form of statistical tables, but also of relational databases. Yet, journalists have so far made little use of the analytical resources offered by networks. To address this problem, in this chapter we examine how “visual network exploration” may be brought to bear in the context of data journalism in order to explore, narrate and make sense of large and complex relational datasets. We borrow the more familiar vocabulary of geographical maps to show how key graphical variables such as position, size and hue can be used to interpret and characterise graph structures and properties. We illustrate this technique by taking as a starting point a recent example from journalism, namely a catalogue of French information sources compiled by Le Monde’s Décodex. We establish that good visual exploration of networks is an iterative process where practices to demarcate categories and territories are entangled and mutually constitutive. To enrich investigation we suggest ways in which the insights of the visual exploration of networks can be supplemented with simple calculations and statistics of distributions of nodes and links across the network. We conclude with a reflection on the knowledge-making capacities of this technique and how these compare to the insights and instruments that journalists have used in the Décodex project, suggesting that visual network exploration is a fertile area for further exploration and collaborations between data journalists and digital researchers.
INTRODUCTION

Few people know as well as journalists that the world is made of relations. Following alliances, unveiling links, unravelling threads is, and has long been, a central part of their investigations. If social scientists can speculate about longstanding structures and global arrangements, journalists have no such leisure. Their work consists in tracing the specific associations that connect individuals and institutions to uncover how lumps of money, influence and knowledge are exchanged through them and where unethical behaviour, corruption, fraud or unfair political influence may occur. The advent of digital technologies has made such work both easier and more difficult. Easier, because it has increased the traceability of economic and political associations. More difficult, because it has submerged journalists with more information than their investigative toolkit is used to handling.
When, for example, the reporters of the International Consortium of Investigative Journalists (ICIJ) received the 2.6 terabytes and 11.5 million documents composing the so-called 'Panama Papers', they obviously could not process them manually (Baruch & Vaudano, 2016). Note that this is not just a ‘big data’ problem. The trouble with the leak was not only its size, but the fact that its interest came from the links it established between specific individuals and particular tax havens. Extracting “key” figures through statistical aggregation or abstracted computational models would miss the point of many of the stories that journalists were most keen to explore. The inquiry could not simplify the dataset, but had to explore each and every one of the connections it exposed. This was done, among other ways, through a tool called Linkurious (http://linkurio.us), whose interest comes less from its computational power than from the way in which it allows its users to see and follow the connections of a network.
The Panama Papers case is interesting, but also interestingly isolated. Despite longstanding interest, the use of networks in journalism remains comparatively marginal (cf. Bounegru et al., 2016 for an overview of the emerging uses of networks in journalism). The reasons are not difficult to imagine. Graph mathematics is more demanding and less widely known than traditional statistical approaches and does not come with the same readily accessible and publicly recognised vocabulary of visual motifs. With all its computational power, graph mathematics does not fit journalistic needs because it tends to be obscure for both reporters and their readers.
In this chapter, we address this difficulty by suggesting a technique for the visual exploration of networks. As we will try to show, when performed correctly, the visual representation of networks translates some of the most important graph structures into graphical variables (thereby supporting investigative work) and allows the interpretation of networks with conventions similar to those developed for geographical maps (thereby remaining legible for a large audience). After having introduced the mathematical and historical bases of our approach, we will present our technique for the visual exploration of networks. Using as an example the network of the French information sphere, we will illustrate the recursive work of interpretation and categorisation that allows the network to be read as an organised territory. Visual network exploration, which is growing in prominence amongst digital methods researchers for social and cultural research, may be useful not only for studying media landscapes, but also for digital journalism practitioners who are interested in exploring and telling stories with networks and relational data.
UNDERSTANDING FORCE-DIRECTED LAYOUTS

Far from being merely aesthetic, the graphical representation of networks has an intrinsic hermeneutic value, which you will have experienced if you have ever used a public transportation map. Such maps are distinctly different from road maps or city maps. It is not only that transportation maps are simpler (the level of detail depending only on the resolution of the map), it is that they represent a network and not a geographical territory. An illustration of this difference can be found in the famous map of the London tube as designed by Harry Beck in 1933. Before Beck’s redesign, the diagram was a classic geographical map locating stations according to their coordinates. After the redesign, it became a network of correspondences in which stations are positioned according to their relative proximity and connectivity. The gain in legibility is evident as the function of the transportation map is not to situate stations in urban space, but relative to each other, so as to help users to move from one to another (a type of orientation that strikingly resembles the one used by traditional sea navigators, see, for example, Turnbull, 2000, pp. 133-165).
Figure 1. London tube map (a) in 1920, before Beck's redesign, and (b) in 1933, after Beck's redesign.
Another example of such a mapping approach comes from early works in Social Network Analysis (Freeman, 2000). Jacob Moreno, founder of SNA, is explicit about the importance of visualization: 'A process of charting has been devised by the sociometrists, the sociogram, which is more than merely a method of presentation. It is first of all a method of exploration' (1953, pp. 95-96). In an interview given by Moreno to the New York Times in 1933, network analysis is presented as a 'new geography'. More important than the title, however, is the figure that accompanies that interview, depicting friendships among fourth grade pupils. The sociogram presented in this figure powerfully reveals how friendship is not equally distributed in the class. One only needs to know that triangles represent boys and circles girls to see how inter-gender relationships are discouraged at that specific age (or at least the declaration of such friendships). The trick, of course, only works because the nodes are not positioned randomly in the space, but in a way that minimizes line-crossing (in Moreno’s own words 'the fewer the number of lines crossing, the better the sociogram', 1953, p. 141). It is because triangles are pushed on one side and circles on another that it is easy to spot the existence of a single inter-gender connection.
Figure 2. Sociogram representing friendship among school pupils (original title and image accompanying Moreno’s 1933 New York Times interview) (a) in the original version and (b) in the modern force-directed spatialisation.
Moreno’s rule of spatialisation is easy to follow on a graph of a few dozen nodes and edges but impracticable on larger networks. Graphs with thousands of nodes and edges are so intricate that the direct counting of line-crossings becomes prohibitively time-consuming. An indirect approach consists of drawing connected nodes closer together to minimize the length of the edges and therefore the possibility of crossings. But even in this case, since each node may be connected to several other nodes which are themselves connected to many other nodes, minimizing the length of the edges is far from a trivial exercise.
Thus we might explore the network using a technique called 'force-directed spatialisation'. Such spatialisation follows a physical analogy: nodes are charged with a repulsive force that drives them apart, while edges act as springs binding the nodes that they connect. Once the algorithm is launched it changes the disposition of nodes until it reaches a balance of forces (Jacomy et al., 2014). Such equilibrium reduces line-crossings and improves the legibility of the graph. Fruchterman and Reingold (1991), who proposed the first efficient force-directed algorithm, cite the minimisation of line-crossings as the second of their aesthetic criteria.
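The physical analogy can be made concrete with a toy implementation. The sketch below is only an illustration under simplifying assumptions; it is neither the algorithm of Fruchterman and Reingold nor ForceAtlas2. Every pair of nodes repels with a force of magnitude `repulsion / distance`, every edge pulls its endpoints together with a force of magnitude `attraction * distance`, and positions are nudged for a fixed number of iterations. The function name, the parameters and their default values are hypothetical.

```python
import math
import random


def force_directed_layout(nodes, edges, iterations=1000, repulsion=1.0,
                          attraction=1.0, step=0.02, max_move=0.2, seed=0):
    """Toy force-directed layout; returns {node: (x, y)}. `nodes` is a list."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for n in nodes}
    for _ in range(iterations):
        force = {n: [0.0, 0.0] for n in nodes}
        # Repulsion: every pair of nodes pushes each other apart.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = max(math.hypot(dx, dy), 1e-6)
                push = repulsion / dist
                force[a][0] += push * dx / dist
                force[a][1] += push * dy / dist
                force[b][0] -= push * dx / dist
                force[b][1] -= push * dy / dist
        # Attraction: each edge acts as a spring between its endpoints.
        for a, b in edges:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            force[a][0] -= attraction * dx
            force[a][1] -= attraction * dy
            force[b][0] += attraction * dx
            force[b][1] += attraction * dy
        # Move each node along its net force, capping the displacement.
        for n in nodes:
            mx, my = step * force[n][0], step * force[n][1]
            length = math.hypot(mx, my)
            if length > max_move:
                mx, my = mx * max_move / length, my * max_move / length
            pos[n][0] += mx
            pos[n][1] += my
    return {n: (x, y) for n, (x, y) in pos.items()}
```

On a small graph made of two tightly knit groups joined by a single edge, linked nodes should end up closer to each other on average than unlinked ones, which is the property that the following paragraphs rely on. The quadratic pair loop is acceptable for a few hundred nodes; production tools use faster approximations.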
Yet, scholars working with networks soon realised that avoiding line-crossing is not the most interesting effect of force-directed layouts. At equilibrium, the visual density of nodes and edges becomes an approximate but reliable proxy of the mathematical structure of the graph (for a detailed mathematical proof, see Venturini et al., forthcoming). Groups of nodes gathering in the layout tend to correspond to the clusters identified by community-detection techniques (Noack, 2009); structural holes (Burt, 1995) tend to look like sparser zones; central nodes move towards middle positions; and bridges are positioned somewhere between different regions (Jensen et al., 2015).
The trick of force-directed algorithms is all the more remarkable, given that the space of networks is relative rather than absolute (it can be rotated or mirrored without distortion of information) and that it is a consequence and not a condition of element positioning. In traditional geographical representation, the space is defined a priori by the way the horizontal and vertical axes are constructed. Points are projected on such pre-existing space according to a set of rules that assign a univocal position to a pair of coordinates. The same is true for any Cartesian diagram (scatterplots for instance), but not for networks, in which the space is defined by the position of the nodes and not the other way around.
Despite such differences (which should not be forgotten), force-directed algorithms allow reading networks as geographical maps, translating complicated mathematical concepts into the more conventional vocabulary of regions and margins, paths and landmarks, centres and peripheries (Lynch, 1960). This is a crucial advantage that explains why force-directed algorithms have become the de facto standard of network visualisation: they facilitate the exploration of networks and relations by means of more familiar and intuitive spatial metaphors, as well as through less familiar computational and statistical metrics.
THE DÉCODEX: A CONTROVERSIAL CASE STUDY

In the following pages, we will illustrate the technique of visual network exploration drawing on a concrete example. Our case study is a network of websites extracted from a listing compiled by the French newspaper Le Monde. Since 2009, a group of journalists gathered under the name of Les Décodeurs (www.lemonde.fr/les-decodeurs/article/2014/02/12/l-equipe-des-decodeurs_4365082_4355770.html) has verified the accuracy of thousands of stories circulating in the French blogosphere and in social media. In January 2017 (at the beginning of the French presidential campaign), Les Décodeurs launched an online tool called the Décodex (www.lemonde.fr/verification), allowing readers to search for the most important sources of online information relevant to French public debates (though not necessarily in French). Each source is accompanied by a short description and, more crucially, by an evaluation of its trustworthiness according to the journalists of Le Monde.
Figure 3. User interface of the Décodex tool by Le Monde
Not surprisingly, the classification provided by Les Décodeurs has stirred much debate in the French media sphere. Several of the sources categorized as imprecise or unreliable, along with other newspapers and blogs, have contested the Décodex, with critiques ranging from challenging the way in which websites are over-simplistically classified; to questioning the right of Le Monde (which is itself a rival source of information) to rate the reliability of other websites; to disputing the legitimacy and interest of such classification in general (arguing that some of the websites in the list are meant to circulate opinions rather than information). Les Décodeurs themselves admitted the difficulty of their exercise, the many ambiguities that they were obliged to decide on and the errors and inaccuracies that may have derived from them. At the same time, they defended their work by pointing at the increasing quantity of false or partisan information circulating online and by affirming their openness to discussing their classification and revising it if necessary.
The controversy around the Décodex is a good example of the difficulties connected to the detection of fake news online (Bounegru et al., 2017), but also of the more general debates surrounding all kinds of classifications. Categorizing things is never a self-evident or innocent practice (Bowker & Star, 1999) and should always be carried out with the greatest caution. This is true for the initial classification of the Décodex, but it is also true for the network extracted from it. As we will see in the following pages, the visual exploration of networks involves a constant toing and froing of categorization and observation, typology and topology.
To build our example network, we have extracted, in collaboration with Les Décodeurs, all the websites contained in the Décodex and investigated the way in which they cite each other. To do so, we employed Hyphe (http://hyphe.medialab.sciences-po.fr), a web crawler developed by the médialab of Sciences Po, which facilitates the exploration of websites and the following of the hyperlinks present in their pages. All the websites comprising the Décodex corpus have been crawled at a depth of one click starting from the home page. We thus obtained a network with 653 nodes and 5943 edges. Whilst Les Décodeurs focus on editorial judgements about how to classify websites in the French media landscape, our network exploration examines the relations between them and other websites by means of their linking practices. While some researchers focus on how networks are held together through financial ties, organisational affiliations, business relationships and family and social relations, we consider their relations according to the hyperlink, in accordance with a longer tradition of digital methods, digital sociology and new media studies research (see, e.g., Marres & Rogers, 2005; Rogers, 2013).
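To give a sense of scale, the two figures reported above can be turned into standard summary numbers. The short sketch below assumes that the 5943 edges are distinct directed hyperlinks between distinct sites (no self-links); under that assumption, which the text does not state explicitly, roughly 1.4% of all possible directed links are present and each site receives about nine links on average. The function names are hypothetical.

```python
def directed_density(n_nodes, n_edges):
    """Share of possible directed links (self-links excluded) actually present."""
    return n_edges / (n_nodes * (n_nodes - 1))


def mean_in_degree(n_nodes, n_edges):
    """Average number of incoming links per node."""
    return n_edges / n_nodes


# Node and edge counts as reported for the crawled Décodex network.
density = directed_density(653, 5943)
mean_links = mean_in_degree(653, 5943)
print(f"density = {density:.4f}, mean in-degree = {mean_links:.1f}")
```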
The treatment of social platforms (such as Facebook, Twitter, YouTube…) in our crawl requires some additional explanation. These platforms are both sources of information as a whole and containers of multiple individual sources in the form of pages or accounts. Since extracting all the hyperlinks from a site as large as Facebook would have been impossible, we only crawled the accounts that were specifically mentioned in the Décodex. We have, however, kept a record of all the links pointing toward the main social media platforms to investigate how they are cited by the other websites of our corpus.
A VISUAL EXPLORATION OF THE DÉCODEX NETWORK

The visual exploration of networks exploits three visual variables to graphically represent their features: position, size and hue (for a definition of these variables and their semiotic affordances, see Bertin, 1967). For the reasons discussed above, position is crucial in translating the mathematical characteristics of the graphs. Force-directed layouts create regions where numerous nodes are densely assembled and regions that are less crowded. These differences of density, determined by the uneven distribution of links, reveal the uneven association between the entities of the network. Everything may be connected in this world, but not everything is equally connected.
Discerning the spatial structure of networks, however, is not always straightforward. In the easiest cases, the difference in the density of association is such that clusters appear as well defined knots of nodes and edges separated by empty (or almost empty) zones. These zones are called 'structural holes' (Burt, 1995) and, when they exist, they provide crucial guidance for the interpretation of the network. Thanks to the ruptures created by structural holes, the boundaries of clusters can be easily detected, like cliffs separating a plateau from a valley. Most natural and social networks, however, do not exhibit such a clear separation and the borders of their clusters tend to be gradual, like hillside slopes. The fuzziness of clusters’ frontiers is not necessarily an obstacle to their recognition (one can point at a hill even when it is impossible to say exactly where it starts and ends), but it certainly makes their identification more difficult. This is why visual network analysis is often more like an exploratory expedition, where meanings and findings are progressively and hermeneutically generated, than the statistical confirmation of a set of pre-existing hypotheses (on the difference between exploratory and confirmatory analysis see Tukey, 1977 and Behrens and Chong-Ho, 2003).
This is certainly the case for our Décodex network, which, at first look, does not present any manifest structural hole or any clear spatial structure. To visualise our network we used two main tools: Gephi (https://gephi.org) for filtering and spatializing the network (using in particular the force-directed algorithm ForceAtlas2) and Graph Recipes (http://tools.medialab.sciences-po.fr/graph-recipes) to tweak the visual rendering of the network. Though no structural holes are evident in the Décodex network, looking closely at the layout makes it possible to notice that the network does not spatialize as a perfect circle, but rather in an avocado-like shape with a smaller top and a larger bottom. These irregularities (as weak and subtle as they can be) often suggest the presence of polarising effects which can be interesting to investigate further.
Figure 4. The Décodex network spatialized by ForceAtlas2. The size of nodes is proportional to in-degree.
The first and most crucial way to explore our network is to look at the identity of the nodes that occupy its different regions. This may seem trivial, but it is not. It is a distinct advantage of visual exploration compared to other forms of statistical analysis that it does not aggregate the individual entities that compose its corpus: each and every node is visible in the layout and can be interrogated by the researcher. Even in a network as small as the one in our example, however, the quantity of nodes can make it difficult (and time consuming) to look at all of them.
This is where the second variable of our visual exploration, size, comes in handy. Since, in networks, nodes are defined first and foremost by their connections, we have ranked the nodes according to the number of edges pointing to them. In the jargon of network analysis this number is called 'in-degree' and nodes with an elevated in-degree are called 'authorities', because they are recognised and referred to by many others. In the previous figure and in all following, we have sized the nodes according to their in-degree so that a greater authority literally translates into increased visual prominence.
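The ranking by in-degree is simple to reproduce on any list of directed links. The sketch below is a minimal illustration, not the procedure used in Gephi: edges are assumed to be `(source, target)` pairs, the linear sizing rule and its `base` and `scale` parameters are hypothetical, and the three-site example is invented.

```python
from collections import Counter


def in_degrees(nodes, edges):
    """Count, for every node, the edges pointing to it."""
    counts = Counter(target for _source, target in edges)
    return {node: counts[node] for node in nodes}


def node_sizes(degrees, base=1.0, scale=1.0):
    """Hypothetical sizing rule: drawn size grows linearly with in-degree."""
    return {node: base + scale * deg for node, deg in degrees.items()}


# Invented toy data, not taken from the Décodex corpus.
nodes = ["site-a", "site-b", "site-c"]
edges = [("site-a", "site-b"), ("site-c", "site-b"), ("site-b", "site-a")]

degrees = in_degrees(nodes, edges)
sizes = node_sizes(degrees)
# Nodes sorted from the strongest 'authority' to the weakest.
ranking = sorted(nodes, key=lambda n: degrees[n], reverse=True)
```

Here `site-b` receives two links and therefore comes first in the ranking and is drawn largest.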
Reading the names of websites that occupy the two poles of our avocado, it seems natural to suppose that their separation derives from a linguistic fracture. The websites in the lower part are predominantly French, while those in the upper part are more international. A way to highlight this is to show the uneven distribution of TLDs (top-level domains) in the network.
Fig. 5. Distribution of TLDs in the Décodex network.
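A tally of top-level domains of this kind can be obtained directly from the addresses of the crawled sites. The following sketch uses only the Python standard library; the helper names are hypothetical, and taking the last label of the host name is a simplification that treats an address such as `example.co.uk` as belonging to `uk`.

```python
from collections import Counter
from urllib.parse import urlparse


def top_level_domain(url):
    """Return the last label of the host name, e.g. 'fr' for www.lemonde.fr."""
    if "://" not in url:
        url = "//" + url  # let urlparse treat a bare host as a network location
    host = urlparse(url).hostname or ""
    return host.rsplit(".", 1)[-1] if "." in host else ""


def tld_distribution(urls):
    """Count how many addresses fall under each top-level domain."""
    return Counter(top_level_domain(u) for u in urls)
```

Applied to three addresses mentioned in this chapter, `tld_distribution(["https://www.lemonde.fr/verification", "gephi.org", "http://linkurio.us"])` counts one site each under `fr`, `org` and `us`.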
The linguistic separation we just highlighted, however, is not particularly surprising or interesting. This kind of division is regularly observed in networks of websites and hyperlinks. Detecting it is important, but rather in a negative way: it makes us aware that in order to generate more interesting findings, we will have to look beyond it.
Further exploring the network, we may notice the role of not just languages, but also social network platforms, such as YouTube, Facebook, Twitter, Instagram and Dailymotion. With the remarkable exception of Wikipedia, all the main social media platforms are located in the middle right of the layout, somewhere in between the English and the French websites (as one would expect given their multilingual content), but also separated from both by their distinctive nature (and possibly by the different way in which they have been treated in the crawl).
Moreover, by focussing on the lower and larger part of the network, we can recognise two different sub-poles, with national sources (such as Le Monde, Le Figaro, France Info, Libération...) occupying most of the lower region and the regional press clustering at the bottom-right of the layout.
Fig. 6. Zoom on the French regional press
The distinctive position of the platforms and the national/regional press are both interesting and nontrivial findings, but we can push our analysis further. The way to do so is by playing with the third visual variable exploited by the visual exploration of networks: the hue of the node. This is a laborious but revealing part of our visual exploration. It consists in categorizing the nodes of the network according to multiple classifications and visualizing these classes on the network as different colors or (as in this paper) as different shades of grey. It is important to notice that the operations of classifying the nodes and of reading the disposition of classes are not separated, but performed at the same time. As will become clear in the next pages, our technique does not consist simply in the projection of a set of pre-existing categories on a connectivity-based layout, but in recursively using the categories to make sense of the layout and the layout to define the categories.

It is important to remember that color is a ‘non-mixable’ visual variable. A node can be red or blue, for example, but not the two at the same time. When categorizing nodes, it is therefore necessary to employ exclusive categories. A website, for example, can be classed in the category 'news' or 'satire', but not in both. In the (not uncommon) case of nodes resisting a unique classification, the researcher can introduce a residual category such as 'multiple' or 'misc'.
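The constraint of exclusive categories can be expressed as a small rule that collapses whatever labels a node has received into a single class before a shade is assigned. The sketch below is an illustration only: the use of 'multiple' for nodes with several competing labels and 'misc' for nodes with none is one possible reading of the residual categories mentioned above, and the grey values are arbitrary.

```python
# Hypothetical palette: one shade of grey per exclusive category.
GREY_SHADES = {
    "news": "#222222",
    "satire": "#777777",
    "multiple": "#bbbbbb",
    "misc": "#eeeeee",
}


def exclusive_category(labels):
    """Collapse a node's candidate labels into exactly one class."""
    distinct = set(labels)
    if len(distinct) == 1:
        return distinct.pop()
    # Several competing labels, or none at all: fall back to a residual class.
    return "multiple" if distinct else "misc"


def node_shade(labels):
    """Return the single shade used to draw a node."""
    return GREY_SHADES.get(exclusive_category(labels), GREY_SHADES["misc"])
```

A node labelled both 'news' and 'satire' is thus drawn in the 'multiple' shade rather than in a mixture of the two.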
As a first step in our combined exploration of topology and typology, we will color the nodes of the network according to the original categories of the Décodex. These categories refer to the trustworthiness of the sources as manually assessed by the journalists of Le Monde. The four categories are 'reliable', 'imprecise', 'unreliable' and 'satirical'. Precisely because these categories have been defined before and independently from the extraction of the network, their disposition does not follow the spatial articulation of the network. Rather, it is possible to find nodes of every category in almost all regions of the network. A remarkable exception is the satirical websites, which are to be found on the right side of the layout both in its upper and lower part. Arguably, this position is not due to the hyperlinks between the satirical websites (which do not cite each other very much), but to their strong connection with social media platforms to which all these sites extensively link.
Fig. 7. The 'satirical' websites according to the original Décodex classification (nodes have been emphasized by the black color and by doubling their radius despite their low degree)
The other classes are distributed more evenly but not randomly. The 'reliable' websites tend to occupy the center of both the international and the French poles, while the 'imprecise' and 'unreliable' take a more marginal position. More interestingly, looking at the lower part of the network, we observe two groups of 'imprecise' and 'unreliable' sources: while a majority of these nodes are positioned above the core of national and reliable websites (and hence in between the French and the international websites), a significant minority is located below them.
Fig. 8. Highlight of the 'reliable' websites (left) and 'unreliable' and 'imprecise' websites (right)
To account for this separation, we introduce an additional categorisation based on the political leaning of the websites. In particular, we distinguish the websites that disseminate unreliable or imprecise information because they pursue a right-wing or extreme-right agenda (which occupy the center of the network) and the websites exhibiting a more general conspiratorial attitude (which occupy the bottom of the network).
Fig. 9. Highlight of the 'conspiratorial' websites (left) and 'right' and 'extreme right' websites (right)
Through our iterative exploration of typology and topology we have eventually revealed a partitioning of the network that, while invisible at first glance, allows us to interpret some of the main contours of the French media landscape. Though these territories are not separated by clear structural holes, the nodes that they contain are fairly consistent. Interestingly, our final classification produces a homogeneous partition of the layout not in spite of, but because of its heterogeneity, which mixes linguistic categories, trustworthiness classes and political leanings. The fact that a non-homogeneous categorization turns out to offer the best characterization of the structure of our network should not come as a surprise. Networks are complex objects which articulate diverse elements through disparate logics. In this, they remind us of a passage by Jorge Luis Borges cited by Foucault as a perfect example of a heterogeneous classification that, while defying our traditional categories, is nonetheless highly effective at describing the culture in which it has been elaborated:
“[Borges] quotes a ‘certain Chinese encyclopaedia’ in which it is written that ‘animals are divided into: (a) belonging to the Emperor, (b) embalmed, (c) tame, (d) suckling pigs, (e) sirens, (f) fabulous, (g) stray dogs, (h) included in the present classification, (i) frenzied, (j) innumerable, (k) drawn with a very fine camelhair brush, (l) et cetera, (m) having just broken the water pitcher, (n) that from a long way off look like flies’. In the wonderment of this taxonomy, the thing we apprehend in one great leap, the thing that, by means of the fable, is demonstrated as the exotic charm of another system of thought, is the limitation of our own” (Foucault, 1970, p. XV).
Fig. 10. The heterogeneous territories of the Décodex network.
LINKING PATTERNS IN THE DÉCODEX NETWORK

Now that, by means of visual exploration, we have defined a heterogeneous but hermeneutically robust partitioning of our network, we can use it as a basis for a statistical analysis. While praising the advantages of the visual interpretation, we are also aware that not all structural properties can be rendered visually. The direction of edges or the connection between different classes, in particular, are not easily read in network images. These questions, however, can be investigated by other means once the partitioning of the network has been defined.
Fig. 11. Distribution of the number of nodes per category
Fig. 11 shows the distribution of nodes in the regions identified in our final classification (see figure 10), to which we have added the ‘satirical’ websites (which we discussed above but did not include in figure 10 for the sake of legibility) as well as ‘other reliable’ and ‘other unreliable’. These two residual categories together comprise about one fifth of the nodes of the network. This relatively high figure is not uncommon. Given the heterogeneity of the networks they work with, social scientists and journalists should aim at classifications that are robust and insightful (capable of delineating homogeneous zones in the graph) rather than comprehensive.
Fig. 12. Connectivity between the categories of our final classification. Rows convey how many times the nodes of a given category cite the nodes of other categories. Columns convey how many times the nodes of a given category are cited by the nodes of other categories.
Our empirical categories are powerful tools to unveil different linking strategies in the network. Figure 12 above presents the links in the corpus aggregated by categories. As we can see, not all categories cite or are cited the same way. ‘French national media’ and ‘platforms’ are much cited and by various actors (their columns contain larger circles), while ‘satirical’ websites are scarcely cited (their column is almost empty). Platforms do not cite much, but this is merely a consequence of our method since (as explained above) most of them had not been crawled. ‘Right-wing’, conspiracy theorist and other ‘unreliable’ websites are on the contrary the origins of the highest number of citations and, very interestingly, they seem to favour “reliable” sources over “unreliable” ones. As expected, the reliable websites do not link back to them, and this asymmetry reveals an important hierarchy. To investigate this linking pattern, we will compare the incoming and outgoing links of some of the most interesting categories.
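An aggregation of links by category of this kind can be sketched in a few lines once every node carries one exclusive category. The code below is a minimal illustration rather than the procedure behind figure 12: edges are assumed to be `(citing site, cited site)` pairs, `category_of` is a hypothetical dictionary mapping each site to its category, and the function names are invented.

```python
from collections import Counter


def category_link_counts(edges, category_of):
    """Count links per (citing category, cited category) pair."""
    counts = Counter()
    for source, target in edges:
        counts[(category_of[source], category_of[target])] += 1
    return counts


def as_table(counts, categories):
    """Rows are citing categories, columns are cited categories."""
    return [[counts[(row, col)] for col in categories] for row in categories]
```

With invented data in which two 'unreliable' sites each cite one 'reliable' site and nothing is cited in return, `as_table` places the value 2 in the 'unreliable' row and 'reliable' column and 0 in the mirrored cell, which is the kind of asymmetry discussed in the text.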
Fig. 13. Hierarchical structure in the corpus, based on our final categories. Black arrows on the right side summarize the link structure between these categories.
Fig. 14. Simplified version of the statistical analysis presented in figure 13.
This kind of hierarchical structure is common on the web and has been explained as a consequence of preferential attachment (Barabási & Albert, 1999): actors tend to link to other websites that they perceive as higher in the hierarchy and avoid linking to those that they perceive as lower. This style of preferential attachment, whereby smaller actors link to establishment actors without reciprocation of the linking act, has elsewhere been called “aspirational linking” (Rogers, 2013). Links in a network do not always produce a hierarchy of categories, but this behaviour does. This linking pattern, and the way it fits our empirical categories, may suggest an alternative way to characterise the trustworthiness being investigated by Les Décodeurs: reliable sources are cited by all types of websites, while unreliable sources are only cited by a few other types (if any).
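One way to make this characterisation operational is to count, for each category, how many other categories cite it, and to measure the imbalance of links between pairs of categories. The sketch below is only an illustration under stated assumptions: the category-level link counts are invented, and the helper names (`citing_categories`, `net_flow`, `rank_by_breadth`) are hypothetical, not part of the analysis presented in figures 13 and 14.

```python
# Hypothetical category-to-category link counts (rows cite columns).
matrix = {
    "national media": {"national media": 40, "platforms": 25},
    "right-wing": {"national media": 30, "platforms": 20, "right-wing": 15},
    "conspiracy": {"national media": 12, "platforms": 18,
                   "right-wing": 9, "conspiracy": 6},
}


def citing_categories(matrix, target):
    """Categories other than `target` that send at least one link to it."""
    return {
        src for src, row in matrix.items()
        if src != target and row.get(target, 0) > 0
    }


def net_flow(matrix, a, b):
    """Links from a to b minus links from b to a.

    A positive value means a cites b more than b cites a,
    i.e. a links 'upwards' to b.
    """
    return matrix.get(a, {}).get(b, 0) - matrix.get(b, {}).get(a, 0)


def rank_by_breadth(matrix):
    """Order categories by how many other categories cite them."""
    categories = set(matrix) | {c for row in matrix.values() for c in row}
    return sorted(
        categories,
        key=lambda c: (-len(citing_categories(matrix, c)), c),
    )


ranking = rank_by_breadth(matrix)
```

In this invented example the categories cited by the widest range of others come first and the category cited by no other comes last, while `net_flow` is positive whenever a lower-ranked category is compared with a higher-ranked one.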
This observation is in many ways at odds with what is often affirmed about the “post-truth era” in which we have supposedly landed. While fake news is said to leverage the horizontality of digital media to blur the boundaries between true and false, the linking patterns of the (French) information spheres suggest a different picture. Despite their different ideological leanings, all websites agree on the overall hierarchy of reliability by citing in one direction and not in the other. The ‘right-wing’ websites, for example, try to blur the lines by citing both their peers and more reliable sources, but they also try to draw a line between themselves and the even less reliable ‘conspiracy theorist’ websites. Whatever its position in the pyramid of hyperlinking, every actor tries to improve its situation by linking upwards to authorities above, and not linking to less reputable websites below, thus reinforcing the hierarchy.
CONCLUSION
This chapter discussed the visual exploration of networks with the aim of improving the understanding of one of the dominant visual-analytical forms of our digital age, the network diagram, and its potential role in relation to the study and practice of digital journalism. Drawing on graph semiotics and traditional cartography, this chapter proposed a model whereby the interpretation of network topology, with its regions, paths, cores and peripheries, is guided by three visual variables: position, size and hue. The process that we described emphasizes the exploratory and iterative character of the investigation. Although it may seem counter-intuitive at first, we emphasised that in order to surface the multiple logics that play out in the structure of a network graph, analysis should not limit itself to one classificatory principle. Multiple heterogeneous criteria of classification are often necessary to characterize the topology of a network map. Finally, we advocated mixing methods, complementing visual network exploration with statistical analyses in order to further characterise network properties.
Through the case study of the French media hyperlink map, we tried to show how the visual exploration of networks reveals new angles which other analyses may leave unexplored. In this case the chapter illustrated an alternative way to assess websites’ reliability that complements the traditional fact-checking approach of qualifying content with an examination of the linking patterns between different regions of the network as reputational markers (Rogers, 2013). In this analysis we have thus combined the manual classification of reliability undertaken by Le Monde’s journalists with the standing of a source according to the hyperlinks that it receives and gives. This approach enabled us to bring fresh findings to current debates around fake news. In spite of the proliferation of fabricated content of various shades, reputation hierarchies on the web seem to be maintained (at least to some extent), as fake and hyper-partisan sites deploy aspirational hyperlinking styles which favour, perhaps surprisingly, authoritative sources.
REFERENCES
Barabási, A. L., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286(5439), 509. Retrieved from http://www.sciencemag.org/cgi/content/abstract/sci;286/5439/509
Baruch, J., & Vaudano, M. (2016, April 8). « Panama papers » : un défi technique pour le journalisme de données. Le Monde. Paris. Retrieved from http://data.blog.lemonde.fr/2016/04/08/panama-papers-un-defi-technique-pour-le-journalisme-de-donnees
Behrens, J. T., & Chong-Ho, Y. (2003). Exploratory Data Analysis. In I. B. Weiner (Ed.), Handbook of Psychology (pp. 33–64). London: Wiley. http://doi.org/10.1002/0471264385.wei0202
Bounegru, L., Gray, J., Venturini, T., & Mauri, M. (2017). A Field Guide to Fake News. Retrieved from fakenews.publicdatalab.org
Bounegru, L., Venturini, T., Gray, J., & Jacomy, M. (2016). Narrating Networks: Exploring the Affordances of Networks as Storytelling Devices in Journalism. Digital Journalism.
Bowker, G. C., & Star, S. L. (1999). Sorting Things Out: Classification and Its Consequences (Inside Technology S.). Cambridge, MA: MIT Press.
Burt, R. S. (1995). Structural Holes: The Social Structure of Competition. Cambridge, MA: Harvard University Press. Retrieved from http://books.google.com/books?id=E6v0cVy8hVIC&pgis=1
Foucault, M. (1970). The Order of Things. New York: Pantheon Books.
Freeman, L. C. (2000). Visualizing Social Networks. Journal of Social Structure, 1(1).
Fruchterman, T. M., & Reingold, E. M. (1991). Graph drawing by force-directed placement. Software: Practice and Experience, 21(November), 1129–1164. Retrieved from http://onlinelibrary.wiley.com/doi/10.1002/spe.4380211102/abstract
Jacomy, M., Venturini, T., Heymann, S., & Bastian, M. (2014). ForceAtlas2, a Continuous Graph Layout Algorithm for Handy Network Visualization Designed for the Gephi Software. PLoS ONE, 9(6), e98679. http://doi.org/10.1371/journal.pone.0098679
Jensen, P., Morini, M., Karsai, M., Venturini, T., Vespignani, A., Jacomy, M., … Fleury, E. (2015). Detecting global bridges in networks. Journal of Complex Networks, cnv022. http://doi.org/10.1093/comnet/cnv022
Lynch, K. (1960). The Image of the City. Cambridge, MA: MIT Press. Retrieved from http://books.google.com/books?hl=it&lr=&id=_phRPWsSpAgC&pgis=1
Marres, N., & Rogers, R. (2005). Recipe for Tracing the Fate of Issues and Their Publics on the Web. In B. Latour & P. Weibel (Eds.), Making Things Public: Atmospheres of Democracy (pp. 922–935). Cambridge, MA: MIT Press.
Moreno, J. (1953). Who Shall Survive? (Second Edition). New York: Beacon House Inc.
Noack, A. (2009). Modularity clustering is force-directed layout. Physical Review E, 79(2). http://doi.org/10.1103/PhysRevE.79.026102
Rogers, R. (2013). Digital Methods. Cambridge, MA: MIT Press.
The New York Times. (1933). Emotions Mapped by New Geography. The New York Times, 3 April.
Tukey, J. W. (1977). Exploratory Data Analysis. Reading, MA: Addison-Wesley.
Turnbull, D. (2000). Masons, Tricksters and Cartographers. London: Routledge.
Venturini, T., Jacomy, M., & Jensen, P. (n.d.). What do we See, When we Look At Networks. Towards a Positive Measure of Spatialisation Quality for Force-Driven Network Layouts. Forthcoming.