Cell Systems, Volume 3 · Cell Systems, Volume 3 Supplemental Information Accurate Reconstruction...

17
Cell Systems, Volume 3 Supplemental Information Accurate Reconstruction of Cell and Particle Tracks from 3D Live Imaging Data Juliane Liepe, Aaron Sim, Helen Weavers, Laura Ward, Paul Martin, and Michael P.H. Stumpf

Transcript of Cell Systems, Volume 3 · Cell Systems, Volume 3 Supplemental Information Accurate Reconstruction...

Cell Systems, Volume 3

Supplemental Information

Accurate Reconstruction of Cell

and Particle Tracks from 3D Live Imaging Data

Juliane Liepe, Aaron Sim, Helen Weavers, Laura Ward, Paul Martin, and Michael P.H.Stumpf

1

Accuratereconstructionofcellandparticletracksfrom3DliveimagingdataJulianeLiepe*1;2,AaronSim*1;2,HelenWeavers3,LauraWard4,PaulMartin3;4;5,MichaelPHStumpf1;21DepartmentofLifeSciences,ImperialCollegeLondon,London,UK,SW72AZ2CentreforIntegrativeSystemsBiologyandBioinformatics,ImperialCollegeLondon,UK,SW72AZ3DepartmentofBiochemistry,MedicalSciences,UniversityofBristol,Bristol,UK,BS81TD4SchoolofPhysiologyandPharmacology,UniversityofBristol,UK,BS81TD5SchoolofMedicine,UniversityofCardiff,UK,BS81TD*Theseauthorsequallycontributedtothework.1.SupplementalExperimentalProcedures1.1.Tutorial:HowtounwrapcelltrajectorydatausingJupyterIn this sectionweprovide abrief examplehow tounwrap trajectorydata that lie on a curved surface. TherequiredcodeisprovidedinSuppl.MaterialsasanhtmlfileaswellasaJupyternotebook.YoucanfollowstepbystepthenextsectionsinparallelwiththeJupyternotebook.Prerequisites.The code iswritten inR. To run the Jupyternotebookyouneed: (i) an installationof the Jupyternotebook(http://Jupyter.org/), (ii) an installation of the R statistical environment (http://www.r-project.org/) and (iii)theRKernelfortheJupyternotebook,whichcanbeinstalledfromhttps://github.com/IRkernel/IRkernel.Forthe latter the instructions found at http://www.michaelpacer.com/maths/r-kernel-for-ipython-notebook aregoodforinstallingthisunderOSX.VisualizingtheoutputinRrequirestherglpackage,whichcanbeinstalledusingyourRenvironment.Providingthatthesepackagesareinplacethecodeinthisnotebookshouldrunwithoutanyfurtherproblems.Installation.TheJupyternotebook(previouslyIPythonnotebook)requiresaworkinginstallationofPythoninthefirstplace;mostPythondistributionsaimedatscientificcomputingcontaintherelevantfilesandpackages.TheAnaconda(https://www.continuum.io/downloads) is, inourexperience,particularlystraightforwardto install,use,andmaintain. Installation can be done via the provided installers (for Windows, OSX and Linux), or from thecommand line (the website https://www.continuum.io/downloads contains instructions for the variousversions).Maintainingandupgradingthedistribution’spackagesisdoneusingthecondapackage-manager.Toupgradethejupyternotebook,forexample,atthecommandlinewrite> conda upgrade jupyter ThisinstallsallthefilesrequiredforusingJupyterinconjunctionwithPython.Otherkernelscanbeinstalledasdescribedontherelevantwebpages,whicharelinkedtoathttps://github.com/ipython/ipython/wiki/IPython-kernels-for-other-languages.FortheRkernelthecondadistributionoffersaconvenientwayof installingtherelevantpackages(assumingthatarecentRinstallationispresent),> conda install –c r r-essentials (seehttps://www.continuum.io/blog/developer/jupyter-and-conda-rforfurtherdetails).

2

HowtoexecutetheJupyternotebook.The Jupyternotebooks are available in the folder Jupyter. A step-by-step guide is presentedas awebpagecalledUnwRapping.html(DataS1).Toexecutethenotebook,atthecommandlineenter(intheJupyterfolderDataS1)> jupyter notebook Thiswillstartthedefaultbrowserwithandtheloadsthecontentsofthedirectory,

Clicking on the relevant Jupyer notebook (the files with an extension “.ipynb”) will then start the relevantJupyternotebook.RunningMannifold_learning.ipynbwill be straightforward with any recent Python installation; running theUnwRapping.ipynbnotebookwillrequiretheinstallationoftheRkernel.Routines forUnwrappingData.Wefirstdefineasetofnecessaryroutines.getAngleBias() letsusdefinetheangletothewound(target).getAnglePersistence()determinestheanglebetweensuccessivestepsandhencemeasuresthepersistence.transformData()andunwrapData3D()aretheroutinesthattransformthedataandunwrapthemintoaflatspace.ThecodeisprovidedintheJupyternotebook.Datapreparation.Weprovideanexampledataset inSuppl.Materials(exampleDataBPRW.csv) inthefolderSimulatedData.Thisdatasetdescribessimulatedcelltrajectoriesbasedonabiasedpersistentrandomwalkonanellipsoidsurface.Thedataaresavedascsv format,whichcanbeopened inanytexteditororExcel.Thedatafilehastobeprovidedinaspecificlayout:Itshouldcontain4columns,wherethefirstcolumnisthecelltrack ID (id), and the second, third and fourth columns are the x-, y- and z-coordinates of the cell tracks,respectively.Therowsarethentheindividualtimepointsforallcelltracks.Theusercanreplacethisdatafilewithowndatafiles.Weusethesamedataformatforallmethodsprovided.Thetargetofthebiasedcells(wound)isatposition(-7;0;0),whichneedstobedefinedinthefirststep.Nextthedataareimportedandreformatted(point6inthenotebook).PlotDatain3D.Webeginbyplottingthetrajectorydatain3D(usingrgl).Thisallowsustogetanideaofthegeometryofthedata(seenotebookpoint6).Wecanthenclearlyseethatthetrajectorydatalieonasurfaceofanellipsoidwithradii6;6and18.

3

Analysisofrandomwalkstatisticsinthe2Dprojection.Beforeunwrappingthedatawecalculatethebiasandpersistence of the random walk data in the conventional projection down to 2D, i.e. using the x- and y-coordinates only. We call the routines getAngleBias() and getAnglePersistence() by passing the relevantcoordinates.Unwrapping of the data onto a flat manifold. We next transform the data onto a flat space using theunwrapping method by calling the routine unwrapData(). If a simple representation of the manifold isavailable(suchasacylinderorellipsoid)thenwecanunwrapthedatabymappingthecorrectpositionsonthemanifold(akintocartographicprojections)inawaythatmaintainstheanglescorrectly.Theroutinewillcreatea3Dgraphicalpresentationoftheprogressoftheunwrappingprocedure.Firstly,theoriginaldataareplottedin3DandthenshiftedtobesuitablefortheunwrappingusingtheroutinetransformData().Theoriginaldataarethenclusteredalongthex-axisandplottedin3D,whereeachclusterisshowninadifferentcolor.Next,anellipseisfittedtoeachcluster.Thedataarethenunrolledontoanewspacebasedonthecharacteristicsofthefitted ellipse (for exact details see section 4). This first step is an approximation to manifold learningtechniques.Inthefollowingthedataobtainedfromthefirststeparefurtherunrolledbyfittinganellipseontoaflatspace.Theroutinewillplotthetransformeddataingrey.Analysisofrandomwalkstatisticsinthemanifoldprojection.Wecannowcalculatethebiasandpersistenceoftherandomwalkdatainthe2Dmanifoldprojection.Again,thisisdoneviatheroutinesgetAngleBias()andgetAnglePersistence()bypassingtherelevantcoordinates.Comparisonofinferredbiasedandpersistencebehaviorinthe2Dx-yandunwrapped(manifold)projection.Finallywe can compare the computed statistics for thexy-projectionwith the statistics computed from theunwrappeddata.Thisisforexampledoneviaplottingthehistogramsanddensitiesforthebiasandpersistentdistributions.Doingso,weobservestrongartifacts in theobtaineddistributionsbasedonthexy-projection.Onthecontrary,unwrappingmanagestorecovertheexpectedbiasandpersistencedistributions.1.2.Tutorial:HowtounwrapcelltrajectorydatausingRAdditionallytotheJupyternotebookwealsoprovidetheplainRcodeforunwrappingdatathatlieonacurvedsurface.Theequivalentroutine for theabove-describedexampleofabiased-persistentrandomwalkcanbefoundinthefolderexampleCode_Unwrapping_1_BPRW(partofDataS2).Furthermoreweprovidethesameroutine for a purely persistent randomwalk (without bias) in the folderexampleCode_Unwrapping_2_PRW(part of Data S2) and for an in vivo data set extracted from a fly embryo in the folderexampleCode_Unwrapping_3_inVivo(partofDataS2).Inthelatterdatasettheflywaswoundedwithalaser.In this example we find that without unwrapping it is possible to detect a weak bias towards the wound.However,afterunwrappingthedataitbecomesapparentthatthecellsarestronglybiasedtowardsthewoundbutalsointotheoppositedirection,i.e.awayfromthewound.InstallationandPrerequisites.You need to install the R statistical environment (http://www.r-project.org/). You can download theprecompiledbinary file for installation formost computerplatforms (http://cran.ma.imperial.ac.uk/). Simplydownload thebinary file suitable foryourplatform,doubleclick it to start installation.MoreadvancedusermightchosetoinstallRfromsourcecode.ForvisualizationpurposeyouneedtheRlibrary‘rgl’.Again,youcandownloadthebinaryfilefor installationfromhttps://cran.r-project.org/web/packages/rgl/index.html.Alternatively,openaterminal,startRbytypingR followedbyenter.Thentypeinstall.packages('rgl') whichwillalsoinitiatetheinstallationofthelibrary.

4

HowtousetheRscripts.Firstofallopenaterminal.OnmostMacsyoucanfindtheterminalinthefolder‘Applications/Utilities’.Whenthe terminal is started, it usually links to your home directory. Type ‘pwd’ to know inwhich directory youcurrentlyare.Changethedirectorytooneofthe3examplecodefoldersbytypingintheterminalforexample:cd WorkFolder/exampleCode_Unwrapping/ exampleCode_Unwrapping_1_BPRW Ifyouarenewtoterminalandtherelatedcommands,pleaserefertohttp://ss64.com/osx/ThenstartRbytypingintheterminalR followedbyenter.Toruntheexamplescripttypesource(“runAnalysis.r”) Thescriptwill first read in thedata ‘exampleDataBPRW.csv’ located in the folder ‘simulatedData’.Thedataaresavedascsvformat,whichcanbeopenedinanytexteditororExcel.Thedatafilehastobeprovidedinaspecific layout: It shouldcontain4columns,where the first column is thecell track ID (id),and thesecond,thirdandfourthcolumnsarethex-,y-andz-coordinatesofthecelltracks,respectively.Therowsarethentheindividualtimepointsforallcelltracks.Theusercanreplacethisdatafilewithowndatafiles.Afterreadingthedata,awindowpops-up,whichshowsthedataplottedin3D.Thedataarethentransformedviaunwrapping.Inthesamepop-upwindowtheprocedureofthealgorithmcanbefollowed,i.e.thedataareshifted, grouped, unwrapped in the first dimension (rainbow colored, still curved surface), followed byunwrappingintheseconddimensionresultingintransformeddatapointsthatlieinonaflat2Dsurface(greyplottedpoints).Afterthedatatransformationtookplace,theRscriptanalysestheinitialdatatransformedviaxy-projectionaswellasthedatatransformedviaunwrapping.Forbothdatasetsthebiasandpersistenceanglesarecomputedandplottedashistograms,overlaidwiththeestimateddensityoftheresultingdistributions(redlines).If you run the in vivo example in the folder ‘exampleCode_Unwrapping_3_inVivo’ an additional pop-upwindowwillopen,wherethetrajectoriesareplottedinthexy-projectionandafterunwrappingforcomparison.In this example the analysis output of bias and persistence distribution differs slightly. Here, we plotadditionallytothebiasandpersistencedistributions,thetransformedbiasdistributions.Asexplainedfurtherbelow,thebiasanglescantakevaluesbetween–piandpi.Inprinciple,thisdistributionshouldbeplottedonacircle, because angles generate circular distributions. We refrain from doing so, since the circularrepresentationishardertoread.However,oneshouldkeepinmindthat–pi isequivalentto+pi.Weaimtohighlight this by shifting our obtained bias distribution by –pi. In this way it becomes clear that the biasdistributionobtainedfromtheunwrappeddataindicatestwomaxima(oneat0andoneat–pi),whichshowsthatcellsarebiasedtowardsthewound(0),butalsointheoppositedirection,i.e.awayfromthewound(-pi).Finally,weprovideforcomparisontheJupyternotebookforthepresentedmanifoldlearningtechniqueontheexample of the persistent random walk based on the same data set as the R routine for unwrappingexampleCode_Unwrapping_2_PRW(seeprevioussection).

5

1.3.Tutorial:HowtodomanifoldlearningwithJupyterInstallationandPrerequisites.ThemethodisimplementedinPythonandmakesuseofthefollowingpackages:NumPy,Pandas,scikit-learn,MatplotlibandSeabornThe Jupyternotebookcanbe installed following the instructions in theunwrappingexamplesabove,exceptthereisnoneedfortheRkernelshere.NotethatthecodecanbeexecutedinbothPython2.7and3.x.The Jupyter notebook document is titledManifold_learning.ipnb (part of Data S1) and can be found in thefolderJupyter(DataS1).Adetailedstep-by-stepinstallationguideisincludedwithin.Forreference,anHTMLversion of the documentManifold_learning.html, which can be accessed using any web browser, is alsoprovidedinthesamefolder.1.4.TimingAllprovidedmethodsareabletohandle largedatasets.Theprovidedexamplesrun inacoupleofseconds.TheUnwrappingwastestedonadatasetwith3000datapoints.Thistakesdependingonthecomputerused(here:MacOS10.8,2.7GhzIntelCorei7,16GBMemory)severalseconds.Themanifoldlearningmethodsareslightly slower, again depending on data set size and computer used. The provided example in the Jupyternotebookanalysesadatasetwithabroadlyrealisticsizeof3000datapoints.Theexamplecanberununderaminuteonastandardworkstation.Themanifoldlearningcomputationisdominatedbythecalculationofthesimilaritymatrix,whichinturnscalesasO(N2)whereNisthenumberofdatapoints.Thereforeadatasetwith20-25Kdatapointscouldbeanalyzedwithinapproximatelyonehour. Inpractice,thelimitationsaredefinedbythecomputerequipment(e.g.processor,memory,diskspace).1.5.MethodsinBrief1.5.1.RandomwalksinBiologyTherearedifferenttypesofrandomwalksthatarecommonlydescribedinBiology.Wecanclassifythemintorandom walks that describe the step length distribution and random walks that describe the angulardistributions.Thedefinitionofrandomwalksviasteplengthdistributionissomewhatmorefrequentlyused.However, to investigate if a cell or a molecule is targeted in its movement, it is easier to look at angulardistributions.ThemostprominentrandomwalkisBrownianmotion.Theangulardistributionis isotropic,meaningthatateachstepacellormoleculehasequalprobabilitytomoveinanydirection.Ifwemeasuretheanglesbetweenamotionvector(cellstep)andareferencedirection,wewillfindthattheresultingangulardistributionisflat(uniformly distributed). If, on the contrary, a cell has a specific target direction, then the cell has higherprobabilitytomovetowardsthattargetdirectioncomparedtoallremainingdirections.Inthiscasewespeakabout abiased randomwalk. Theexpectedangulardistributionwill have apeak at the anglewhichpointstowardsthetargetdirection.Theremainingcharacteristicsoftheangulardistributionofsuchbiasedrandomwalk depend on the details of the exhibitedwalk,which are usually unknown.However, a commonly useddescriptionoftheangulardistributionisawrappednormaldistribution(anormaldistributionwrappedaroundacircletodescribecircularvariablessuchasangles).Themeanofthewrappednormaldistributionindicatesthebiasdirectionandthevarianceindicatesthestrengthofthebias.Thelowerthevariance,thenarroweristhe distribution and the stronger is the exhibited bias. A further type of random walk frequently used todescribeanimalmovementandcellmigrationisapersistentrandomwalk.Acellexhibitingthistypeofwalkhas higher probability of moving in the same direction as in the previous step compared to changing its

6

direction. If we measure the angles between consecutive motion vectors (consecutive cell steps) we willobserveapeakat0, i.e.nochangeofdirection.Asforthebiasedrandomwalk,thepersistentrandomwalkcanalsobedescribedusingawrappednormaldistributionwith0meananda variancewhich indicates thestrengthofthepersistence(thelowerthevariancethestrongerthepersistence).All threetypesofwalkshavebeendescribedformigrationof immuneandothercells,migrationofanimals,andmovementofmoleculesinsidethecell.Oftenamixofthesethreetypesisobserved.1.5.2.AnalyzingcellmigrationdataCell migration trajectories are often extracted from confocal time-lapse microscopy imaging data. Recentadvances allow researchers to collect such data even in vivo in living animals. Examples include imaging ofmacrophage,neutrophils and cancer cellmigration in zebrafish tail fin, flanks, gills or yolk; imagingof stemcellsandhematopoieticcellsinmousebonemarrow;imagingofhaemocytesinvariousstagesandorgansofdrosophila;imagingofmigratingneutrophilsonthesurfaceoftheheartandmanymore.While these data contain potentially a huge amount of new information about the underlying biologicalprocessesinvivo,theircorrectanalysisburiesavastrangeofchallengesandoneofthemwehighlightedinthisstudy:themovementofcellsoncurvedsurfaces.Inordertoextractinformationaboutbiasandpersistencefromobservedcelltrajectories,wehavetocomputetwotypesofangles:(i)theangleαbetweenafixedreferencedirectionandthecellmotionvectorand(ii)theangleβ between two consecutivemotion vectors.While the first angleα helpsus todetectpotential biasdirection, the later angle β helps us to measure the strength of persistence (as described in the previoussection).Ifthemovementofthecellisrestrictedtoacurvedsurface,thendirectlymeasuringtheanglesαandβbasedontheoriginal(untransformed)datawillprovideuswithartifacts,whichinsomecasescouldmimicatargetbiaswhereinrealitythereisnone.Inordertostillbeabletoextractbiasandpersistenceinformationfromsuchdata,weneedtoeithertransformthetrajectorydatainsuchwaythattheylieonaflatsurface(andthen apply the standard analysis tools), or use some methods to learn the exact surface (manifold) andcompute the angles on suchmanifold. Either way, the aim is to remove any artifacts that appear throughcurvedsurfacesfromtheanalysis.Thefirstsolutioncanbeobtainedviaunwrapping;thesecondbringsustothefieldofmanifoldlearning.1.5.3.UnwrappingtrajectorydataAs mentioned in the previous section, unwrapping trajectory data aims to transform data from a curvedsurfacesothattheylieonaflatsurface.Morespecifically,theUnwrappingmethodisfittingseveralellipsestotheobserveddatapoints.Theseellipsescan then be unrolled onto a 2D surface. The basic idea behind thismethod is rather simple and intuitive:Imagineourcellsaremigratingonthepeelofanorange,whichisclearlyacurvedsurfacedescribingasphereoranellipsoid.Theaimisnowtopeeltheorangeinsuchwaythatwecanlaythepeelontheflattableandstillconserve the characteristics of the cell trajectories. We are here interested to conserve directionalcharacteristics,morethandistances.Theresultingtransformedcelltrajectoriescannowbeanalyzedwiththecommonlyusedtools.Unwrappingisbestsuitedfor3Dobjectsthathavearathersmall intrinsicandconvexcurvature.Thismeansbeforethismethodisapplied,wealreadyhaveanideaofthetruegeometry.

7

1.5.4.ManifoldlearningManifoldlearningreferstoadiversesuiteofmethodsthataimtogeneralizewell-knownlineardimensionalityreduction methods – Principal Component Analysis (PCA), Independent Component Analysis (ICA), LinearDiscriminantAnalysis(LDA)–toaccountfornon-linearfeaturesinthedata.The key assumption underlying dimensional reductionmethods – linear and non-linear – is that the ‘true’numberofdegreesoffreedomislowerthantheapparentdimensionalityofthedata.Theproblemaddressedinthispapergivesthesimplestandperhapsthemostexplicit illustration:wehavepoint-clouddata inthreedimensionsconstrainedtolieontwo-dimensionalsurfaces,whichmayormaynotbeflat.The study of smooth curved spaces belongs to themathematical field ofdifferential geometry. There is anintuitiveideaunderlyingthisfield:atsmallenoughscales,everylocalpatchofasurfacecanbeapproximatedbyaflatsurface;acurvedmanifoldisthensimplyanoverlappingpatchworkof(small)flatspaces.Thisistheapproach adopted by most manifold learning algorithms – that is, to identify local neighborhoods of datapoints,treattheseaslinearspaces,andthenfindsome‘optimal’methodofjoiningthesetogetherintoaflatglobalrepresentation.Anecessaryrequirementforthesemethodstoworkwell,therefore,isthatthedensityofthedatapointsishighenoughtoallowonetoconsidersmalllinearpatches.If one starts to consider distances between points and angles between vectors, then we augment thisdescriptionof themanifoldwithametric,which is amathematicalobject that, loosely speaking,providesalocalspecificationof lengthsandangles.Amanifoldwithametric isknownasaRiemannianmanifoldandistheobjectofstudyinRiemanniangeometry.Manifold learning algorithms do not explicitly preserve themetric information. However one can augmentthesemethodsbyextractingthemetricateachofthedatapoints.Thisturnsoutstobeessentialforobtainingaccuratedirectionalstatistics,asweshowinthispaper.1.6.Imageprocessingandcelltracking.Imagingresultedinimagestackswithdarkbackgroundandfluorescentcells.TheimageprocessingwasdoneinRusingthepackageEBImage[1].Theinformationofthecellswasextractedautomaticallyfromtheimagesusing an edge detectionmethod.Amanually set threshold of the light intensitywas usedper image stack.Eachdetectedcellwasdescribedasanobjectwiththecoordinatesofitsgeometricalcenterindicatingthecelllocationandthetimethecellwasobserved.Thecellsweretrackedandreconstructedoverthez-stackusingasurface algorithm. The surface algorithmwas then applied to track reconstructed cells over time, which isbasedontheshortestdistancebetweencells fromtwoconsecutive images.Weexcludedallcell trajectoriesthat included timepoints inwhich thecellwas locatedat theedgeof the image.Extractedcell trackswerereoriented,sothatthecenteroftheimagedobject(embryooryolksyncytium)waspositionedinthecenterofthecoordinatesystem(x=y=z=0)forfurtherprocessing.1.7.Dimensionalreduction:fromlinearmethodstoRiemannianmanifoldlearning.In this sectionweprovide the theoreticalbackground to themethodsemployed in thepaper, including theunwrapping method outlined in Section 1. The challenge of describing and visualizing the geometry ofembeddedcurvedsurfacesiscommonlyencounteredinphysics[2],computervision,andmachinelearning[3]tasks.Thetechniquesusedarethosefromdifferentialgeometryor,morespecifically,Riemanniangeometry.For the sake of completeness and consistency, we adopt this more formal mathematical description. Weprovideabrief introduction to theessential topics; formoredetails,we refer the reader to [2]. Letℳbeasmooth𝑚-dimensionalmanifoldand𝑔theRiemannianmetricdefined foreverypoint𝑝 ∈ ℳ. Fora smoothmanifold𝒩withdim(𝒩) ≡ 𝑛 ≥ 𝑚,let𝑓:ℳ → 𝒩beanisometricembedding,i.e.forall𝑝 ∈ ℳandtangentvectors𝑢, 𝑣 ∈ 𝑇7ℳ

8

𝑢, 𝑣 89 = 𝑑𝑓7 𝑢 , 𝑑𝑓7 𝑣 <=(9),here , 89 is the inner product on the tangent space𝑇7ℳ,𝑔7 ≡ 𝑔(𝑝), andℎthemetric defined for everypoint𝑞 ∈ 𝒩. 𝑑𝑓7: 𝑇7ℳ → 𝑇A(7)𝒩istheJacobianof𝑓at𝑝.Foradataset𝐷 = 𝑞C, … , 𝑞E ofpointsin𝒩,dimensionalreductionisthetaskof inferringtheinversemap𝑓FC:𝒩 → ℳ.Formanypurposesitisoftensufficienttoinferthecorrespondingimages 𝑥 𝑝C , … , 𝑥(𝑝E) for𝑞H = 𝑓(𝑝H)and in some coordinate chart𝑥:ℳ → ℝJ. Throughout this paper,we restrict ourselves to𝒩 ⊂ℝL(i.e. 3D imaging data). Ifℳ ⊂ ℝC,Mthen we can use linear dimensional reduction methods. Here weconsidertwolinearandthreenon-linearmethods.ProjectionintotheXY–plane.Atrivialandlineardimensionalreductionmethodisthesimpleprojectionontosomepre-defined set of coordinate axes.We assume,without loss of generality, that these are the firstmcoordinates;intwodimensions,thesearetheX-andY-axes,hencethename.Thenfor𝑖 = 1, … , 𝑁,wesimplyhave𝑥(𝑝H) Q = 𝑞H Q,𝑎 = 1, … ,𝑚,wherethe𝑎subscriptisthecoordinateindex.Principalcomponentanalysis.Insteadofpre-specifyingtheaxes,wecanperformlineardimensionalreductionviaprincipalcomponentanalysis (PCA).Let𝐶 = C

EFC(𝑞H − 𝑞)(𝑞H − 𝑞)UE

HVC be thesamplecovariancematrix,

with themean𝑞 = CE

𝑞HEHVC . Then for the rotationmatrix𝑅 = 𝑒C 𝑒M 𝑒L 𝜖𝑂 𝑁 , 𝑒C, 𝑒M, 𝑒Lthe eigenvectors

of𝐶indecreasingorderoftheirrespectiveeigenvalues,wehave𝑥(𝑝H) Q = 𝑅𝑞H Q,𝑎 = 1, … ,𝑚.Unwrapping.Weintroduceamethodtomapdatapointsona2Dconvexsurfaceontoasubspaceof𝔼M,the2D Euclidean space, which we call the Unwrapping method. This method is a particularly effectiveapproximation for surfaces of small intrinsic curvature (e.g. a thin cigar-shaped surface, a small patch on alargecurvedsurface,etc).Theunwrappinghappensintwosteps,bothofwhichinvolvesfittingaseriesof1Dellipses to thedata. The first step is a transformation,which removes the intrinsic curvatureof the surfacewhile seeking tomaintain thegeometrical relationshipsbetween thepoints (i.e.distances, angles, etc). Thesecondstepsimplyunwrapsthetransformedsurfaceontoaflat2Dsurface.Let(𝑥H, 𝑦H, 𝑧H)representthe3Dcoordinatesofthedatapoint𝑞H.Ifweapproximatethedatasetaspointsonasubspaceofanellipsoid,wechoosetoalignourcoordinatesystemsuchthatthelargestradiusoftheellipsoidisdescribedbythex-axisandthesecondlargestradiusisthey-axis.Nextthedatapointsareclusteredintonequal-sizedbinsalongthex-axis.Foreachclustercwefitanellipse𝐸_ asthelocusoftheequation`a

bca+ (eFJf)a

bfa= 1,

for radii𝑟 , 𝑟e and the z-coordinate𝑚e of the midpoint 𝑥_, 0,𝑚e ,with𝑥_ the mean x-coordinate of theclusterc.Theellipse is thenunrolledontoastraight lineparallel tothey-axiswith𝑥 = 𝑥_ and𝑧 = 𝑧_JQithemaximumz-coordinatevalueofthepointsinclusterc.Thisthenguidesthefirsttransformationofthepoints(𝑥H, 𝑦H, 𝑧H) → (𝑥Hj, 𝑦Hj, 𝑧Hj)as follows. We keep the x-coordinate fixed, i.e.𝑥Hj = 𝑥H. As for the z-coordinate,because,ingeneral,thedatapointsdonotlieontheellipse(i.e.∉ 𝐸_),welet𝑧Hjbeequaltothedifferenceindistances to the center of𝐸_ from𝑞H and the point on𝐸_ along the line joining𝑞H and the centre. It can beshownthat

9

𝑧Hj = |𝑞H − 𝑞Hm| ≡ 𝑑H,where𝑞Hm isapointwithcomponents[𝑞Hm]C = 𝑥H

[𝑞Hm]M =𝑦H𝑟eM

2 𝑧H − 𝑚e 𝑟M−

𝑦HM𝑟eq

4(𝑧H − 𝑚e)M𝑟q+

𝑦H𝑟eM

𝑧H − 𝑚e

[𝑞Hm]L =𝑟M𝑟eq − 𝑟eM([𝑞Hm]M)M

𝑟M

Todetermine𝑦Hj,wedefineapoint-specificellipse𝐸H withthesamecentre 𝑥_, 0,𝑚e as𝐸_ butwithradii𝑟e,H = 𝑟e + 𝑑H and𝑟 ,H = 𝑟e,H

bcbf.Thenif𝑞H,eistheintersectionofthe𝐸H withthe

xy-plane,𝑦Hjis then the arc length of𝐸H between𝑞H,eand𝑞H. Repeating this transformation for alln clustersresultsinthefirstunwrappingofthedatapoints.Forthesecondunwrappingthesameprocedureisrepeatedonthetransformeddatasetgiving𝑞Hj → 𝑞Hjj,butwiththevariableswap𝑥 ↔ 𝑦.Note, ifalldatapoints{𝑞H}HVCE liestrictlyonanellipsoidalsurfacethen𝑧Hjj = 0forall𝑖 = 1, … , 𝑁.Anexampletutorialisprovidedinsuppl.material.Manifoldlearning.Manifoldlearningreferstoaclassofnon-lineardimensionalreductionmethodsthatseekto recover the geom etry of the low-dimensional manifold. These include ISOMAP [4], Locally LinearEmbedding(LLE)[5],andLaplacianEigenmaps[6],amongstseveralothers.Ineverycase,themetriconMisaglobalEuclideanmetric,i.e.𝑔Qv = 𝛿Qv,whereδthekroneckerdeltaoridentitymatrixanda,bthecoordinateindices.WerefertotheseapproachesasEuclideanmanifoldlearning.InthispaperwehaveemployedLLEinoursimulationandanalysisofrealdata.LLEisbasedontheexpectationthatgivenasufficientlylargedataset,eachdatapointanditsclosestneighborslieonalocallylinearpatchofthe surface. The algorithm has two steps: 1. Expressing each higher dimensional data point as a linearcombinationofitsneighbors,and2.Obtainingasetoflowerdimensionalcoordinatesgivenrelationsabove.Inboth steps, we proceed by minimizing the mean square errors of the data points from their linearreconstructions.Ourmotivation foradopting LLE comes fromboth its intuitiveapproach todimensionality reduction (locallylinear patches) and also its effectiveness in providing an isometric reduction for surfaces with no intrinsiccurvature,e.g.thesurfaceofacylinderordatapointsonacylindrical‘swiss-roll’.Furthermoreitisalsowidelyusedinthemachinelearningcommunity.InthispaperwehaveusedtheimplementationofLLEintheScikit-learnPythonmachinelearningpackage.Riemannian manifold learning. Riemannian manifold learning aims to augment the set of coordinates𝑥 𝑝H HVC

E withthecorrespondingsetoflocalmetricvalues,i.e.(𝑥 𝑝H), 𝛿Qv HVC

E → (𝑥 𝑝H , [𝑔 𝑝H ]Qv) HVCE ,

where𝑎, 𝑏 = 1, … ,𝑚labelthemetriccomponents.Bydefinition,𝑔issymmetricandpositivesemidefinite.Inthissetup,onecanrecovertheprecisegeometricalinformationoftheembeddedmanifold.Inthispaper,wehaveadaptedtheLEARNMETRIC5[3]algorithmtorecoverthe2Dcoordinatesandthecorrespondingmetriccomponentsfrom3Ddatapoints.Forcertainapplications,suchascomputingthegeodesics(seebelow),thereisaneedtoderivemetricvaluesfor out-of-sample points onℳ. Themetric𝑔(𝑥)for𝑥 ∉ {𝑥(𝑝H)}HVCE are approximated in two steps. Firstweperform a regression for each of the𝑚(𝑚 + 1)/2unique components of𝑔. In this paper we used the

10

implementationofGaussianProcessesintheScikitlearnPythonmachinelearningpackage.Secondwesatisfythepositivesemidefiniteconstraintbyreplacingthematrix𝑔(𝑥)withthenearestpositivesemidefinitematrixasmeasuredbytheFrobeniusNorm.WeimplementthisusingtheapproximationmethodofHigham(2002)[7].1.8.ExtractionofgeometricalinformationAlltherelevantgeometricalinformationofinterestcanbeextractedfromthemetric.Theanglebetweenthetwovectors𝑢, 𝑣isgivenby

𝜃 = 𝑐𝑜𝑠FC ~,� �9

~,~ �9 �,� �9.

Thegeodesic𝛾: ℝ → ℳ,isthepathofextremallengthandisthesolutiontothesetofHamiltonianequations.Withslightabuseofnotationwriting𝑥(𝛾 𝑡 ) ≡ 𝑥(𝑡),theseequationsare𝑥Q = ��

�b�= 𝑔Qv𝑟v,

𝑟Q = − CM�8��

�i�𝑟v𝑟_,

where𝑟Qistheconjugatemomentato𝑥Q, 𝑔Qvthecomponentsoftheinversemetric,𝑥 ≡ 𝑑𝑥/𝑑𝑡,andtheHamiltonian

𝐻 =12𝑔Qv𝑟v

Thebiasdirectionfromcellatagiventimetoapointsourceofattractionisgivenbythetangentvector𝑢jtothe geodesic connecting the two points. Therefore we solve the geodesic equations (9) for𝑥(𝑡)under theconstraints𝑥 𝑡 = 0 = 𝑥 𝑝C ,𝑥 𝑡 = 1 = 𝑥(𝑝M).Oneapproachistosimulatethesegeodesicsfrom𝑝Cvia,say,thesimpleEulermethod[8]andfindtheinitialvector thatgenerates thepointson thegeodesic that intersects𝑝M.However this seeminglystraightforwardprocess ishighly sensitive toerrors in theout-of-samplemetricapproximations– theerrors compoundandoneoftenendsupwithunstabletrajectories.Inthispaper,weimplementamorestableandefficientdiscreteapproximationasfollows.Wefirstoverlayagridoverthelearnedmanifold.Herewefixthegriddimensionsto200x200.Next,usingtheGaussianProcess regressionmethoddescribedabove,weobtain themetric valuesat every grid vertexandconsequentlythelengthsofthesidesofthecellsacrossthegrid.Thenweapproximateourgeodesicsbetweenthe point of interest and the bias point by the grid path thatminimizes the total length.WeusdDijkstra’salgorithm to accomplish this step. Finally, we approximate the initial vector by performing a simple linearregressiononthefirstfewverticesinourdiscretizedgeodesicpath.1.9.DirectionalStatistics.Thestraightness index,𝐷, is commonlyused to investigatecellmigrationstrategiesand it isdefinedas𝐷 =|i�,i�|

�,where𝑥�is thepositionof thecellat time0,𝑥U is thepositionof thecellat timeT,|𝑥�, 𝑥U|indicates

theshortestdistancebetween𝑥�and𝑥U and𝑙istheactuallengthofthepaththecelltookfrom𝑥�to𝑥U .FormostapplicationstheshortestpathbetweenstartandendpointissimplytheEuclideandistance,however,forcurvedsurfacesthemetricneedstobelearned.Anotherwayofdescribingcellmigrationtracksisbydeterminingtheirbiastowardsaspecifictarget(sourceof

11

attractant, likewoundsorothercell types)and theirpersistence.Wedefine thepersistenceofacellas theprobabilityofthecellmovingattimetinthesamedirectionasattime𝑡 − 1.Therefore,weneedtocomputethe angles (𝛽�) between twomotion vectors,whichwill result in a characteristic distribution. Thewrappednormal distribution has been successfully used to describe persistent movement of cells. The probabilitydensityfunctionisdefinedas

𝑁� 𝛽� 𝜇, 𝜎 = C� M�

exp(− (��F��M��)a

M�a)�

�VF� ,where 𝜇is themeanand𝜎is the standarddeviation. In thecaseofpersistencewehave𝜇 = 𝛽�FC.Wecanthendefinethestrengthofthepersistence𝑝as𝜎 = −2log(𝑝).Acellthatishighlypersistenthasa𝑝closeto1,whileacellthatisnotpersistentatallhasa𝑝of0,inwhichcasethewrappednormaldistributionbecomesa wrapped uniform distribution. The bias of a cell is also described by an angular distribution. Here, wecomputetheangle(𝛼)betweenamotionvectorofacellandthevectorthatpointsfromthecelltowardsthetarget.Again,weapply thewrappednormaldistributionwith thebiasparameter𝑏(insteadof𝑝)describingthestrengthofthebiasand𝜇 = 0.1.10.SimulationofrandomwalksonellipsoidsTheellipsoidisdefinedasthesetofpointssatisfyingia

Qa+ `a

va+ ea

_a= 1,

with𝑥, 𝑦, 𝑧the3Dcartesiancoordinates,and𝑎, 𝑏, 𝑐thethreeshapeparameters.Weconsiderseveraldifferentellipsoidshapeswith𝑎/𝑐ratiosfromtheset{1,0.66,0.5,0.4,0.33,0.1}where𝑎 = 𝑏throughout.Weusethe2Dparameterization𝑥 = 𝑎cos𝜇sin𝜈,𝑦 = 𝑏sin𝜇sin𝜈,𝑧 = 𝑐cos𝜈,

withmetriccomponents𝑔�� = sinM𝜈 𝑎MsinM𝜇 + 𝑏McosM𝜇 ,𝑔�¦ = 𝑔¦� = sin𝜇cos𝜇sin𝜈cos𝜈,𝑔¦¦ = cosM𝜈 𝑎McosM𝜇 + 𝑏MsinM𝜇 + 𝑐MsinM𝜈.

Wesimulate400randomwalksoneachoftheellipsoidsasfollows:startingfromthe initialpoint 𝜇�, 𝜈� =𝜋, �

M,werandomlyselectaninitialangleofmotion𝜃�~𝐵(𝜃, 𝜃�j ),where𝐵isthebiasangledistributionwith

𝜃�j thedirectiontothebiassourceattime-stepindext;𝜃�j isdeterminedfollowingtheminimizationproceduredescribed above and, in turn, defines an initial tangent vector. The particle moves along the geodesicgeneratedbythisvectorwithrandomsteplengthtakenfroma𝜒M-distributionwithmean𝑘 = 2;theabsolutelengthsarescaledwithaconstant factor in therange0.015–0.25toensurethat thetrajectoriescover theellipsoid.Ateachsubsequenttimestep𝑡 > 1,theparticlechangesdirectionandtakesanangleaccordingtoaweighteddistribution𝜃� = 𝑤𝐵 𝜃, 𝜃�FCj + 1 − 𝑤 𝑃(𝜃 − 𝜃�¯�°),where𝑃is thepersistenceangledistributionand𝜃�¯�° isangleofmotionprior tochangingdirectionsat timestept.Both𝑃and𝐵areeitheruniformdistributionsintherange(0,2𝜋]orwrappednormaldistributionswithparameter𝜎 = 1.2,dependingonthe typeof randomwalkbeingsimulated.For thebiaspersistent randomwalk,wefixtheweightparametertobe𝑤 = 0.5.Werepeateachpathfor20timesteps,giving21trajectorypointsperpath.

12

2.Supplementalfigure,movieanddatalegendsFigure S1. Related to Figure 2. Performance comparison of xy-projection, the unwrapping method andmanifoldlearningmethods.Shownaretheexactdistributions(black)thatdescribebiasandpersistencefor3types of randomwalks: (A) Brownianmotion, (B) biased randomwalk and (C) persistent randomwalk. Celltrajectoriesweresimulatedonthesurfaceofellipsoidswithdifferentratiosof itsradii (a/cratio=0.1,0.25,0.33, 0.5, 0.66 and 1.0). For each scenario we compute the distributions based on the xy-projection, theunwrappingmethodandthetwomanifoldlearningmethods.Thetwomanifoldlearningmethodsarenotableto deal with the two lowest a/c ratios and where left out. The reason is that these ellipsoids were tooelongated,sothatthemanifoldlearningmethodstreatedthedataasiftheywerelocatedinaslimplane.(E)Shownaretheexactpersistenceangledistribution(black),thepersistenceangledistributioncomputedfromthe3Dvectors(green)andthepersistenceangledistributionresultingfromunwrappingthedataintoanon-curved surface (red). Theunderlying data are persistent randomwalk trajectories simulatedon an ellipsoidwith radii ratio a/c = 0.33. (F-G) Application of the unwrapping method (orange) to neutrophil cell tracksextracted from the epidermis overlying the yolk syncytium of a zebrafish and its comparison to the xy-projection (blue)andprinciplecomponentanalysis (green).Theepidermiswaswoundedwitha laserbeforeimageacquisition.(CorrespondstoFigure2F)Figure S2. Related to Figure 2. Performance measurements of the different methods. We computed theKolmogorov-Smirnovdistance(ks-distance)betweenthetrueangulardistributionsandtheextractedangulardistributionsusing xy-projection, unwrapping, Euclidianmanifold learning andmetricmanifold learning.WeconsideredthesamedatageneratedfortherandomwalkmodelsdescribedinSupplementalFigure1A-C.Thesmallertheks-distanceis,thebetteristheperformanceofthemethods.FigureS3.RelatedtoFigure2.Shownarethe2DprojectionsofcelltracksextractedfromDrosophilaembryos(coloredtracks)afterapplying(A)xy-projection,(B)unwrappingand(C)manifoldlearning.Thegreen,orange,darkblueandredtracksarehighlightedforeasycomparisonbetweenthemethods.Thegreylinesin(B)and(C)areforcomparisonwiththexy-projectionastheyarethesametracksusingthexy-projection.FigureS4.RelatedtoFigure2.Manifoldlearningwithmorecomplexdata.(A)Wesimulatedrandomwalkcelltrajectorydataonacomplexsurface.Shownisthe3Drepresentationofthesedata(blue)withthexy-,xz-andyz-projections(grey).Thisexampleistoocomplextobesuccessfullytransformedviaunwrapping,butcanbedealtwithmanifoldlearningtechniques.ThetrajectoriesdisplayaBrownianmotiontyperandomwalk,wherethebiasandpersistencedistributionsareexpectedtobeflat.(B)Weanalyzethedatainthexy-projectionandcomparethistotheEuclidianmanifoldlearningalgorithmandthemetricmanifoldlearningalgorithm.Ascanbe seen in supplemental figure 4B, the xy-projection induces extremeartifacts. Thebias distribution iswellextractedfrombothmanifoldlearningmethods,whilethepersistencedistributionisonlycorrectlyextractedusingthemetricmanifoldlearningalgorithm,highlightingtheneedtoaccuratelylearnthemetricofthedataMovieS1.Related toFigures1and2.ShownareexamplerawdatafortheunwoundedDrosophilaembryodatasetanalyzedinfigures1A,2EandSupplementalFigure1.Time-lapsemovieofthedynamicbehaviorofDrosophila immunecells(heamocytes)inunwoundedtissue.EpithelialcellsarelabeledusingE-cadherin-GFP(greencelloutlines),immunecellnucleiarelabeledusingnuclearRed-Stinger(red)andimmunecellcytoplasmusingcytoplasmicGFP(green)bothdrivenbysrp-Gal4.Movie S2.Related to Figures 1 and 2. Shown are example raw data for the wounded zebrafish data setanalyzed in figures 1B and 2F. Time-lapse movie of the dynamic behavior of zebrafish immune cells(neutrophils)inlaser-inducedwoundedtissue.ImmunecellsarelabeledusingcytoplasmicdsRed(red)drivenbythelysozymeC(lyz)promoter.Movie S3. Related to Figures 1 and 2. Shown are the haemocyte cell tracks extracted from Drosophilaembryosin3D.Thedifferentrotationanglesshowthecurvatureofthespacethehaemocytesaremigratingin.MovieS4.RelatedtoFigures1and2.Shownaretheneutrophilcelltracksextractedfromzebrafishembryo.

13

Theepidermisoverlying theyolk syncytiumwaswounded.The locationof thewound is indicatedbya lightbluedot.DataS1.RelatedtoExperimentalProcedures.ThisfoldercontainsthetwoJupyternotebooksforUnwrappingandManifoldlearning.Furthermoreitincludesthetwowebsitesforbothmethods.Exampledataarestoredinthefolder‘SimulationData’.

DataS2.RelatedtoExperimentalProcedures.ThisfoldercontainsalldescribedRscriptsandtheprovidedexampledatainordertoperformUnwrappingonsimulatedandinvivodata.

3.Supplementalfigures.

14

FigureS1.RelatedtoFigure2.

15

FigureS2.RelatedtoFigure2.

FigureS3.RelatedtoFigure2.

FigureS4.RelatedtoFigure2.

16

4.SupplementalReferences.1. Pau,G.,Fuchs,F.,Sklyar,O.,Boutros,M.&Huber,W.Ebimage -anrpackage for imageprocessingwith

applicationstocellularphenotypes.Bioinformatics26,979–81(2010).2. Nakahara,M.Geometry,TopologyandPhysics,SecondEdition(CRCPress,2003).3. Perraul-Joncas,D.&Meila,M.Non-lineardimensionalityreduction:Riemannianmetricestimationandthe

problemofgeometricdiscovery.arXiv.org(2013).1305.7255v1.4. Tenenbaum,J.B.,deSilva,V.&Langford,J.C.Aglobalgeometricframeworkfornonlineardimensionality

reduction.Science(NewYork,N.Y.)290,2319–+(2000).5. Roweis, S. T.& Saul, L. K.Nonlinear dimensionality reduction by locally linear embedding. Science (New

York,N.Y.)290,2323–+(2000).6. Belkin,M.&Niyogi,P.Laplacianeigenmapsfordimensionalityreductionanddatarepresentation.Neural

Computation15,1373–1396(2003).7. Higham,N.J.Computingthenearestcorrelationmatrix-aproblemfromfinance.IMAJournalofNumerical

Analysis22,329–343(2002).8. Leimkuhler,B.&Reich,S.SimulatingHamiltonianDynamics(CambridgeUniversityPress,2004).