Analysing RNA-Seq data with the DESeq package - Bioconductor
Analysis of single-cell RNA-seq data with R and Bioconductor · 2017. 8. 8. · Analysis of...
Transcript of Analysis of single-cell RNA-seq data with R and Bioconductor · 2017. 8. 8. · Analysis of...
Analysisofsingle-cellRNA-seqdatawithRandBioconductor.
FannyPerraudeau,KellyStreet,DavideRisso
Tasksandpackages
• DimensionalityreducDonwithzinbwave.– bioconductor.org/packages/zinbwave
• ClusteranalysiswithclusterExperiment.– bioconductor.org/packages/clusterExperiment
• Lineageinferenceandtrajectoryanalysiswithslingshot.– github.com/kstreet13/slingshot
Workshopmaterial:github.com/fperraudeau/bioc2017singlecell
Acknowledgements UC Berkeley
Methods development Sandrine Dudoit
Elizabeth Purdom Nir Yosef
Svetlana Gribkova JP Vert
Experiments and analyses Russell Fletcher
Diya Das John Ngai Lab
Core Facilities QB3 Functional Genomics Laboratory
QB3 Genomics Sequencing Laboratory Cancer Research Laboratory
Funding
NIH BRAIN Initiative Cell Census Consortium
Adult stem cells maintain and regenerate tissues
Bond, Ming, Song (2015) Cell Stem Cell
Fletcher et al. (2017) Cell Stem Cell
Usefullinks
• ZINB-WaVEpreprint:– hYps://doi.org/10.1101/125112
• Slingshotpreprint:– hYps://doi.org/10.1101/128843
• Pleasesubmitbugreportsandotherissuesat:– github.com/drisso/zinbwave/issues– github.com/epurdom/clusterExperiment/issues– github.com/kstreet13/slingshot/issues
Contactus!
• E-mail– [email protected]– [email protected]– [email protected]
• TwiYer:– @fannyperraudeau– @KStreetBerkeley– @drisso1893
RSEC:Resampling-basedSequenDalEnsembleClustering
• UseaclusteringrouDnethatfindsalargenumberofsmall,coherentclusters:– Subsamplingofdatatofindrobustclusters– SequenDalclusteringàfindagroupofcoherentsamples,removethem,startover
• PerformthisrouDneovermanydifferentparameters• Findasingleconsensusovertheclusterings• Mergetogethernon-differenDalclusters• FindbiomarkersviadifferenDalexpressionwithtargetedcomparisons
• ImplementedwithvisualizaDontoolsinclusterExperiment package
Whatmeanbysubsampleclustering?
• Pickanunderlyingclusteringstrategy(e.g.kmeansorPAMwithparDcularchoiceofK)
• Repeatthefollowing:– Subsamplethedata,e.g.70%ofsamples– Findclustersonthesubsample
• CreatecoClusteringmatrixD:%ofsubsampleswheresampleswereinsamecluster
ExamplesofmatrixD
Note,hereforcedthesamplesinordergivenbyPAMAlsousedkmeansinresampling,ratherthanPAM
K=3 K=7
Whatmeanbysubsampleclustering?
• Pickanunderlyingclusteringstrategy(e.g.kmeansorPAMwithparDcularchoiceofK)
• Repeatthefollowing:– Subsamplethedata,e.g.70%ofsamples– Findclustersonthesubsample
• CreatecoClusteringmatrixD:%ofsubsampleswheresampleswereinsamecluster
• ClustermatrixDforfinalclustering– Couldbewithoriginalclusteringstrategy,ordifferentone– “Right”KmaynotbetheKusedinoriginalclustering(e.g.kmeans)
– Weuseamoreflexibleapproachofhierarchicalclusteringandpickingclusterssohaveatleast1-αsimilarity
– ChangefrompickingKtopickingα,moreintuiDvechoice– Notallsamplesgetclustered
WhatmeanbysequenDalclustering?
• OverrangeofstarDngparameters,doclustering• Theclusterthatstaysatleastβsimilar,idenDfyascluster
andremove• RepeatunDlnomoresuchclustersfound,ornotenough
sampleslen• Drawsonideasof“Dghtclustering”ofgenesofTsengand
Wong(2005)
Specifically,• WerangeoverKinunderlyingPAMinsubsampling• Wefindclustersbasedonresultsofsubsampleclustering
(somaynotbesameKasinputparameter)
Becauseofsubsampling,changingKismoreaperturbaDon
K=4 K=5
K=6 K=7
Findconsensuscluster
• Createaco-ClusteringmatrixDofhowmanyDmesco-clustertogetherandclusterD– Likewithsubsampling,onlynowacrossdifferentparameters
– Again,importantthatclustersarelargelyperturbaDons,notradicallydifferentclusters
Bioconductorworkflowforsingle-cellRNA-seqdataanalysis:
dimensionalityreduc<on,clustering,andpseudo<meordering.
FannyPerraudeau,DavideRisso,KellyStreetBioC2017
July28th,2017
Clustereddatainlow-dimensionalspace.Weconsiderdimensionalityreduc<onandclusteringtobeseparateproblems,butgenerallypreferPCAandRSEC.
2
Step0:Input
●●
●● ●
●
●
●●
●
●●●
●
● ●●
●
●
●
●
●●
●
●●●
●● ●●●
●
●
●●
●●●
●●
●
● ●●●
●
●
●
●●●●
●●
●
●
●● ●
●
●●
●●● ●
●
●●
●●
●
●●
●
●● ●● ●
●● ●●
●
●●
●
●
●
●●● ●
●●●●
● ● ●
●
●
●
●
●
●●●
●●●●
●
●●
●
●
●
●●
●●
●
●
●
●●
● ●●
●
●
●
●● ●
●●
●●
●● ●
●
●
●●
●
●●●
●
● ●●
●
●
●
●
●●
●
●●●
●● ●●●
●
●
●●
●●●
●●
●
● ●●●
●
●
●
●●●●
●●
●
●
●● ●
●
●●
●●● ●
●
●●
●●
●
●●
●
●● ●● ●
●● ●●
●
●●
●
●
●
●●● ●
●●●●
● ● ●
●
●
●
●
●
●●●
●●●●
●
●●
●
●
●
●●
●●
●
●
●
●●
● ●●
●
●
●
●● ●
●●
●●
●● ●
●
●
●●
●
●●●
●
● ●●
●
●
●
●
●●
●
●●●
●● ●●●
●
●
●●
●●●
●●
●
● ●●●
●
●
●
●●●●
●●
●
●
●● ●
●
●●
●●● ●
●
●●
●●
●
●●
●
●● ●● ●
●● ●●
●
●●
●
●
●
●●● ●
●●●●
● ● ●
●
●
●
●
●
●●●
●●●●
●
●●
●
●
●
●●
●●
●
●
●
●●
● ●●
●
●
●
●● ●
●●
● ● ●
●
●
3
Step1:ClusterMST
Constructminimumspanningtree(MST)onclusters.Thenodesareclusters,notclustercenters.Requiresadistancemetric.
●●
●● ●
●
●
●●
●
●●●
●
● ●●
●
●
●
●
●●
●
●●●
●● ●●●
●
●
●●
●●●
●●
●
● ●●●
●
●
●
●●●●
●●
●
●
●● ●
●
●●
●●● ●
●
●●
●●
●
●●
●
●● ●● ●
●● ●●
●
●●
●
●
●
●●● ●
●●●●
● ● ●
●
●
●
●
●
●●●
●●●●
●
●●
●
●
●
●●
●●
●
●
●
●●
● ●●
●
●
●
●● ●
●●
● ● ●
●
●
4
Step1:ClusterMST
Constructminimumspanningtree(MST)onclusters.Selectstar<ngclusterbasedonmarkergenesorparsimony.
●●
●● ●
●
●
●●
●
●●●
●
● ●●
●
●
●
●
●●
●
●●●
●● ●●●
●
●
●●
●●●
●●
●
● ●●●
●
●
●
●●●●
●●
●
●
●● ●
●
●●
●●● ●
●
●●
●●
●
●●
●
●● ●● ●
●● ●●
●
●●
●
●
●
●●● ●
●●●●
● ● ●
●
●
●
●
●
●●●
●●●●
●
●●
●
●
●
●●
●●
●
●
●
●●
● ●●
●
●
●
●● ●
●●
● ● ●
●
●
5
Step1:ClusterMST
Specifyknownterminalclustersforaddi<onalsupervision.Thisresultsintheconstruc<onofaconstrainedMST.
●●
●● ●
●
●
●●
●
●●●
●
● ●●
●
●
●
●
●●
●
●●●
●● ●●●
●
●
●●
●●●
●●
●
● ●●●
●
●
●
●●●●
●●
●
●
●● ●
●
●●
●●● ●
●
●●
●●
●
●●
●
●● ●● ●
●● ●●
●
●●
●
●
●
●●● ●
●●●●
● ● ●
●
●
●
●
●
●●●
●●●●
●
●●
●
●
●
●●
●●
●
●
●
●●
● ●●
●
●
●
●● ●
●●
● ● ●
●
●
6
Step1:ClusterMST
Constructminimumspanningtree(MST)onclusters.Selectstar<ngclusterbasedonmarkergenesorparsimony.
●●
●● ●
●
●
●●
●
●●●
●
● ●●
●
●
●
●
●●
●
●●●
●● ●●●
●
●
●●
●●●
●●
●
● ●●●
●
●
●
●●●●
●●
●
●
●● ●
●
●●
●●● ●
●
●●
●●
●
●●
●
●● ●● ●
●● ●●
●
●●
●
●
●
●●● ●
●●●●
● ● ●
●
●
●
●
●
●●●
●●●●
●
●●
●
●
●
●●
●●
●
●
●
●●
● ●●
●
●
●
●● ●
●●
7
Step1.5:PrincipalCurves
Highlystable,nonlineargeneraliza<onofprincipalcomponents.Fitsacurvetothe“middle”ofthedata.Inconsistent,failstoreflectunderlyingbiology(ie.branching).
●●
●● ●
●
●
●●
●
●●●
●
● ●●
●
●
●
●
●●
●
●●●
●● ●●●
●
●
●●
●●●
●●
●
● ●●●
●
●
●
●●●●
●●
●
●
●● ●
●
●●
●●● ●
●
●●
●●
●
●●
●
●● ●● ●
●● ●●
●
●●
●
●
●
●●● ●
●●●●
● ● ●
●
●
●
●
●
●●●
●●●●
●
●●
●
●
●
●●
●●
●
●
●
●●
● ●●
●
●
●
●● ●
●●
8
Step2:SimultaneousPrincipalCurves
S<llhighlystableandnonlinear.Extendstheconceptofprincipalcurvestomul<ple,branchingcurveswithcommonorigin(smoothtreestructures).
●●
●● ●
●
●
●●
●
●●●
●
● ●●
●
●
●
●
●●
●
●●●
●● ●●●
●
●
●●
●●●
●●
●
● ●●●
●
●
●
●●●●
●●
●
●
●● ●
●
●●
●●● ●
●
●●
●●
●
●●
●
●● ●● ●
●● ●●
●
●●
●
●
●
●●● ●
●●●●
● ● ●
●
●
●
●
●
●●●
●●●●
●
●●
●
●
●
●●
●●
●
●
●
●●
● ●●
●
●
●
●● ●
●●
9
Step2:SimultaneousPrincipalCurves
S<llhighlystableandnonlinear.Extendstheconceptofprincipalcurvestomul<ple,branchingcurveswithcommonorigin(smoothtreestructures).
10
Step2:SimultaneousPrincipalCurves
Projectcellsontocurvestoobtainpseudo<mevalues.Likelinearprincipalcomponents,thecurvesseektominimizesquaredprojec<ondistance(subjecttosomeconstraints).
●●
●● ●
●
●
●●
●
●●●
●
● ●●
●
●
●
●
●●
●
●●●
●● ●●●
●
●
●●
●●●
●●
●
● ●●●
●
●
●
●●●●
●●
●
●
●● ●
●
●●
●●● ●
●
●●
●●
●
●●
●
●● ●● ●
●● ●●
●
●●
●
●
●
●●● ●
●●●●
● ● ●
●
●
●
●
●
●●●
●●●●
●
●●
●
●
●
●●
●●
●
●
●
●●
● ●●
●
●
●
●● ●
●●