Current Biology, Volume 26

Supplemental Information

Automatic Segmentation of Drosophila Neural Compartments Using GAL4 Expression Data Reveals Novel Visual Pathways

Karin Panser, Laszlo Tirian, Florian Schulze, Santiago Villalba, Gregory S.X.E. Jefferis, Katja Bühler, and Andrew D. Straw


Figure S1. (data related to Figure 1): Evaluation of k-medoids clustering for automatically segmenting brain regions into anatomical structures. (A) Repeatability scores across multiple runs of the k-medoids algorithm. The adjusted Rand index, a measure of repeatability, was calculated from 10 repeated runs of the k-medoids algorithm for both datasets and several brain regions. (B) Colocalization similarity (measured as the Dice coefficient s between the set of voxels in the manually annotated region and the set in the clustering result) between the Janelia FlyLight dataset and manual assignments using the same 3D template brain. Manual assignments were based on a manually segmented neuropil image. Glomeruli that could not be unambiguously identified were labeled "glomerulus". (Janelia FlyLight data for the right antennal lobe region, run 1, 6502 voxels, 3462 driver lines, k = 60.) (C) Automatic segmentation of the central complex (CX). 3D axes scale: 30 µm. (D) Individual singleton clusters (left) and average image of strongly expressing driver lines in each cluster, with broad driver lines removed (right). Scale bars 20 µm. (E) Average images from agglomerated clusters (top) and dendrogram of the agglomerated hierarchy. Scale bars 20 µm. (F) As in E, but from the Vienna dataset, k = 60. Scale bars 20 µm. Panels C-E: Janelia FlyLight data for CX, run 1, 27598 voxels, 3462 driver lines, k = 60.

[Figure S1 image. (A) Adjusted Rand index (0.0-1.0) vs. number of clusters k (0-160) for Janelia AL, CX, MB, oVLNP and SEZ, Vienna AL, CX, MB and oVLNP, and their mean. (B) Co-localization similarity s (Dice coefficient) between automatic singleton cluster assignments and manual annotation of antennal lobe glomeruli. (D) Singleton clusters C43, C13, C23 and C46 with cluster average images. (E) Agglomerated cluster average images C110, C116 and C117 (Janelia dataset, k = 60, run 1). (F) Agglomerated cluster average images C88, C108 and C110 (Vienna dataset, k = 60, run 1).]

Figure S2. (data related to Figures 2-3): Clustering quality for oVLNP in both datasets. (A) Quantification of similarity between clusters, measured as the voxel-to-voxel co-expression distance ($1 - s$, where s is the Dice coefficient between the two sets of enhancer expression), for the medoids of all clusters of run 1 in the oVLNP region of the Janelia dataset. (B) Dendrogram of agglomerative hierarchical clustering with average linkage, representing the co-expression distance between medoids in the oVLNP region of the Janelia dataset. (C) Quantification of similarity between clusters, measured as the voxel-to-voxel co-expression distance, for the medoids of all clusters of run 1 in the oVLNP region of the Vienna dataset. (D) Dendrogram as in B, using the Vienna dataset.

[Figure S2 image. (A, C) Matrices of co-expression distance (metric Dice distance, 0.0-1.0) between the medoids of the Janelia FlyLight clusters (A) and the Vienna Tiles clusters (C). (B, D) Corresponding dendrograms, with an average linkage distance axis from 0.0 to 1.0.]

Figure S3. (data related to Figures 2-3): Automatically assigned oVLNP singleton clusters colocalize with manually segmented optic glomeruli; repeated clustering of the same dataset gives similar results, and clustering of different datasets gives similar results. (A, B) Colocalization similarity (measured as the Dice coefficient s between the set of voxels in the manually annotated region and the set in the clustering result) between the Janelia FlyLight dataset and manual assignments using the same 3D template brain. (Janelia FlyLight data for oVLNP, 42317 voxels, 3462 driver lines, k = 60.) (C, D) Colocalization similarity between the Vienna Tiles dataset and manual assignments using the same 3D template brain. (Vienna Tiles data for oVLNP, 13458 voxels, 6022 driver lines, k = 60.)

[Figure S3 image. Matrices of co-localization similarity s (Dice coefficient, color scale 0.00-0.64) between automatic singleton cluster assignments (C01-C60) and manual annotation of the optic glomeruli LC04, LC06, LC09, LC10, LC11, LC12, LC13, LC15, LC16, LC17, LC18, LC20, LC21, LC22/LPLC4, LC24, LPC1, LPLC1, LPLC2, LPLC3, MC61, MC62 and MC63. (A) Run 1, Janelia FlyLight dataset. (B) Run 2, Janelia FlyLight dataset. (C) Run 1, Vienna Tiles dataset. (D) Run 2, Vienna Tiles dataset.]

Table S1. (data related to Figure 4): Visual projection neuron (VPN) types with their synonyms, best driver lines, FlyCircuit IDs, and corresponding clusters. Note: MC63 may be synonymous with VPN-MB1 [S3], which was published while this study was under review.

Column key. Synonyms: synonyms, with sources. Janelia GAL4: best enhancers identified for the neuron type from the Janelia GAL4 library. VT GAL4: best enhancers identified for the neuron type from the Vienna Tiles (VT) GAL4 library. FlyCircuit: FlyCircuit.tw single-cell examples for the neuron type. C, C', C'', C''': clusters corresponding to the optic glomerulus or tract associated with the VPN in the Janelia FlyLight dataset (C), the Vienna Tiles dataset (C'), the Janelia FlyLight dataset, 2nd run (C''), and the Vienna Tiles dataset, 2nd run (C'''). "x": no cluster listed; "-": no synonym listed.

VPN type | Synonyms | Janelia GAL4 | VT GAL4 | FlyCircuit | C | C' | C'' | C'''
LC04 | Col A (Mu et al., 2012; Strausfeld & Okamura, 2007; Strausfeld and Hausen, 1977) | GMR26G09, GMR47H03 | VT042758, VT046005 | Cha-F-000138, Cha-F-200257, Gad1-F-300256 | C33, C21, C15, C25 | C'26, C'39 | C''02, C''17 | C'''24
LC06 | S4 (Fischbach and Lyly-Hünerberg, 1983) | GMR41C07, GMR22A07 | VT006549, VT009855 | Cha-F-000039, Gad1-F-400244, Gad1-F-200326 | C57 | C'27 | C''48 | C'''42
LC09 | S4 (Fischbach and Lyly-Hünerberg, 1983) | GMR71C02, GMR14A11 | VT014209, VT005102, VT027704 | Cha-F-000028, Gad1-F-700145, Gad1-F-200274 | C32, C14 | C'59, C'60 | C''52, C''56, C''35 | C'''02
LC10 | S3 (Fischbach and Lyly-Hünerberg, 1983) | GMR22D06, GMR35D04 | VT021760, VT043920 | Gad1-F-100080, Cha-F-300390, fru-F-800100 | C22, C09, C19 | C'32, C'55, C'48, C'29 | C''03, C''54, C''49, C''06 | C'''33, C'''34, C'''50
LC11 | L1CN (Mu et al., 2012) | GMR23D02, GMR87B04, GMR51F09, GMR22H02 | VT004968, VT008647, VT004967 | Cha-F-000153, Cha-F-200132, Gad1-F-300060 | C07, C45 | C'18 | C''32, C''30 | C'''16
LC12 | - | GMR59B10, GMR35D04, GMR19G01 | VT062247, VT040919 | Cha-F-000124, Cha-F-000015, VGlut-F-000056, VGlut-F-400347 | C26, C05 | C'06 | C''45 | C'''39, C'''13
LC13 | - | GMR50C10, GMR14A11 | VT057283, VT025771 | Cha-F-000255, Cha-F-100003, Gad1-F-100040 | C46 | C'51 | C''26, C''01 | C'''58
LC14 | DC neurons (Hassan et al., 2000) | GMR21H10, GMR12F01, GMR58H11 | VT037804 | Cha-F-400228, Cha-F-400231, Gad1-F-300016 | x | C'03 | C''34 | C'''08
LC15 | - | GMR42H06, GMR24A02 | VT014207, VT047878, VT012320 | Cha-F-000361, Cha-F-100351 | C28 | C'44 | C''33, C''21 | C'''41, C'''40
LC16 | - | GMR32D04, GMR25G03 | VT061079, VT025771 | Gad1-F-100202, Cha-F-000316, fru-F-000032, VGlut-F-000603 | C37, C03 | C'40, C'27 | C''47 | C'''49
LC17 | - | GMR21B04, GMR65C12 | VT034259, VT033301 | Cha-F-100017, Cha-F-000004, Gad1-F-000025 | C23, C26, C01 | C'35, C'38, C'58 | C''08, C''45 | C'''38, C'''29, C'''35, C'''11, C'''39, C'''60, C'''12
LC18 | - | GMR92B11 | VT008183 | 5-HT1B-F-500016, Cha-F-000333, fru-F-200061, Gad1-F-300054 | C29, C02 | C'01 | C''07, C''53 | C'''37, C'''44
LC20 | - | GMR17A04, GMR71G09 | VT025718 | VGlut-F-200564, VGlut-F-700163, Gad1-F-200101 | C43 | x | C''10 | x
LC21 | - | GMR85F11, GMR25A07 | VT014960 | Gad1-F-400102, Cha-F-300208 | C40, C28, C07 | C'18 | C''30, C''40 | C'''40, C'''16
LC22/LPLC4 | - | GMR24A05 | VT058688 | LC22: Gad1-F-900022, Cha-F-600134, VGlut-F-500700; LPLC4: Gad1-F-200058, Cha-F-200302, Cha-F-200028 | C16 | C'42, C'19 | C''57 | C'''14
LC24 | - | GMR20G09 | VT038216 | Cha-F-000283, Cha-F-200073, Cha-F-400116 | C37 | C'40 | C''47 | C'''10
LPLC1 | LPL2CN (Mu et al., 2012) | GMR36B06, GMR12G03 | VT007767 | Cha-F-200219, Cha-F-300035, Gad1-F-400140 | C18, C44, C25 | C'07 | C''25 | C'''30
LPLC2 | - | GMR75G12, GMR12E04 | VT007194, VT049479 | Gad1-F-000300, Cha-F-100287, Cha-F-300111 | C44 | C'21 | C''25 | C'''06, C'''30
LPLC3 | - | GMR9C11, GMR49A05 | VT044492, VT062624 | Cha-F-100027, Cha-F-300004, Gad1-F-200099, fru-F-500009 | C35, C55, C20, C30 | C'46, C'05, C'09 | C''59, C''13, C''19 | C'''28, C'''14
LPC1 | - | GMR37G12, GMR77A06, GMR81A05, GMR20A09 (subset) | VT046005 | VGlut-F-700361, Cha-F-000272, fru-F-000101 | C04, C30, C20 | C'05 | C''12, C''59, C''19 | C'''46
MC61 | LC10c (Otsuna & Ito, 2006) | GMR53B08 | VT002072, VT021203 | Gad1-F-400023, Cha-F-300285, Cha-F-200026 | C56 | C'34, C'10 | C''49 | C'''17
MC62 | - | GMR78G04, GMR85C01 | VT062624 | none identified | C48 | C'56 | C''05 | x
MC63 | VPN-MB1? (Vogt et al., 2016) | GMR72C11 | VT022290, VT008183, VT017001 | Cha-F-200103 | C42, C48 | C'25, C'56 | C''04, C''11, C''05 | C'''55
Lat | - | GMR16G04, GMR13E10, GMR85G07, GMR39F04 | VT045604, VT014963, VT033613 | TH-F-200107, Trh-F-100019, TH-F-100004, Cha-F-300333 | C50, C42 | C'30, C'52, C'56, C'57 | C''04 | C'''55

Supplemental Experimental Procedures

Thresholding, Dice Similarity, k-Medoids, and Hierarchical Agglomeration

GAL4 expression patterns were transformed into a binary representation in two steps. First, each image was thresholded; second, morphological opening (dilation of the erosion by a 3x3x3 structuring element) was applied to reduce clutter. The threshold was chosen so that 1% of the voxels in the resulting mask were stained. For the datasets tested, this simple heuristic was more reliable than other standard automatic thresholding methods.
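The two binarization steps can be sketched as follows, assuming each registered expression image is a 3D NumPy array. Realizing the "1% stained voxels" rule with a quantile threshold is an assumption, as are the function names (`erode`, `dilate`, `binarize`); the original pipeline may have treated ties and image borders differently. Erosion and dilation are written with plain NumPy shifts so the sketch needs no image-processing library.

```python
import numpy as np


def _neighbourhood_views(mask):
    """Yield the 27 shifted copies of `mask` covering each voxel's 3x3x3 neighbourhood."""
    # Pad with False so that voxels outside the volume count as unstained.
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    nz, ny, nx = mask.shape
    for dz in range(3):
        for dy in range(3):
            for dx in range(3):
                yield padded[dz:dz + nz, dy:dy + ny, dx:dx + nx]


def erode(mask):
    # A voxel survives only if its whole 3x3x3 neighbourhood is stained.
    out = np.ones(mask.shape, dtype=bool)
    for view in _neighbourhood_views(mask):
        out &= view
    return out


def dilate(mask):
    # A voxel is set if any voxel in its 3x3x3 neighbourhood is stained.
    out = np.zeros(mask.shape, dtype=bool)
    for view in _neighbourhood_views(mask):
        out |= view
    return out


def binarize(volume, stained_fraction=0.01):
    """Threshold so that about `stained_fraction` of voxels are set, then apply opening."""
    threshold = np.quantile(volume, 1.0 - stained_fraction)
    mask = volume > threshold
    # Morphological opening = dilation of the erosion; removes specks smaller than 3x3x3.
    return dilate(erode(mask))
```

In this sketch, a compact 4x4x4 blob of expression survives the opening intact, while isolated bright voxels and one-voxel-thick streaks are removed.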

From the binarized images, the set of expressing lines was assembled for each voxel. Similarity between two voxels, based on the expression set A of one voxel and the expression set B of the other, was computed using Dice's coefficient,

$$ s = \frac{2\,|A \cap B|}{|A| + |B|}, $$

where ∩ denotes intersection and |x| denotes the number of elements in set x. To decrease the effects of registration error and image acquisition noise, and to increase the speed of subsequent processing steps, we binned the original image voxel data into larger voxels using 3x3x3 nearest-neighbor downsampling. Analysis was performed on specific brain regions (e.g., antennal lobe or oVLNP) defined by a 3D brain atlas of neuropils (included in the supplemental data). Voxels in the bounding cube but not in the defined neuropil were excluded. The k-medoids algorithm [S1] was run in Julia 0.4.0 using JuliaStats Clustering 0.5.0 (see Supplementary file 1), on the Dice dissimilarity $1 - s$.
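A toy illustration of the Dice coefficient between per-voxel sets of expressing driver lines, and of k-medoids clustering on the precomputed dissimilarity $1 - s$, is given below. It uses a simple alternating (Voronoi-iteration) k-medoids written from scratch in plain Python; it is not the JuliaStats Clustering implementation used in the study. The driver-line names are hypothetical, and treating two voxels that express no lines as identical is an assumption.

```python
import random


def dice(a, b):
    """Dice coefficient s = 2|A & B| / (|A| + |B|) between two sets of driver lines."""
    if not a and not b:
        return 1.0  # assumption: two voxels expressing no lines count as identical
    return 2 * len(a & b) / (len(a) + len(b))


def dissimilarity_matrix(voxel_sets):
    """n x n matrix of Dice dissimilarities 1 - s between all pairs of voxels."""
    return [[1.0 - dice(a, b) for b in voxel_sets] for a in voxel_sets]


def assign(d, medoids):
    """Label each item with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: d[i][medoids[c]]) for i in range(len(d))]


def k_medoids(d, k, seed=0, max_iter=100):
    """Alternating k-medoids on a precomputed dissimilarity matrix d."""
    rng = random.Random(seed)  # different seeds give different random starts
    medoids = rng.sample(range(len(d)), k)
    for _ in range(max_iter):
        labels = assign(d, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i, label in enumerate(labels) if label == c]
            if not members:  # keep the old medoid if a cluster becomes empty
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest summed dissimilarity to its cluster.
            new_medoids.append(min(members, key=lambda m: sum(d[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return assign(d, medoids), medoids
```

For example, voxels expressing {"L1", "L2", "L3"}, {"L1", "L2"} and {"L2", "L3"} end up in one cluster and voxels expressing {"L7", "L8", "L9"}, {"L8", "L9"} and {"L7", "L9"} in the other when k = 2. The full n x n matrix grows quadratically with the number of voxels, so this form only suits small examples.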

To agglomerate the medoids, we used the fastcluster package [S2] with Python 2.7.10, applying average linkage with the metric distance $1 - s$ between medoids.
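The merging rule of average linkage can be shown with a naive, dependency-free version. The study used the fastcluster package for this step; this sketch is not its implementation. Here the input is assumed to be the k x k matrix of dissimilarities between medoids. Leaves receive ids 0 to k - 1 and each merge receives the next free id, so k singletons produce k - 1 merges and 2k - 1 nodes in total.

```python
def average_linkage(d):
    """Naive average-linkage (UPGMA) agglomeration of k items.

    d is a k x k dissimilarity matrix. Returns a list of
    (new_id, left_id, right_id, merge_distance) tuples.
    """
    k = len(d)
    members = {i: [i] for i in range(k)}  # active node id -> leaves it contains
    merges = []
    next_id = k
    while len(members) > 1:
        best = None
        ids = sorted(members)
        for x, a in enumerate(ids):
            for b in ids[x + 1:]:
                # Average linkage: mean dissimilarity over all leaf pairs across the two nodes.
                dist = sum(d[i][j] for i in members[a] for j in members[b])
                dist /= len(members[a]) * len(members[b])
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        dist, a, b = best
        members[next_id] = members.pop(a) + members.pop(b)
        merges.append((next_id, a, b, dist))
        next_id += 1
    return merges
```

With four medoids forming two tight pairs, the sketch first joins each pair and then joins the two pairs, giving nodes 4, 5 and 6 on top of leaves 0 to 3. The brute-force search is cubic in k, which is acceptable for k = 60 but not for voxel-level clustering.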

Initial clustering was performed on a distance matrix computed as follows. For each voxel within the analyzed brain region (e.g., antennal lobe or lateral protocerebrum), we calculated the set of driver lines for which GFP expression was higher than a threshold. We used the Dice coefficient (a measure of overlap, see above) to quantify expression similarity between each possible pair of n voxels. The resulting n × n matrix of Dice dissimilarities ($1 - s$) was used to group voxels into clusters of similar expression using k-medoids clustering, a standard clustering technique (Figure 1A; see Experimental Procedures for details). Clustering with other standard algorithms, such as mini-batch k-means, gave qualitatively similar results; we focus here on k-medoids only for convenience. As is typical for clustering algorithms, one parameter, k, controls the number of clusters; we evaluated results for several values of k in each of the two independent datasets. Every voxel in the analysis is assigned to exactly one cluster. Neither manual inspection nor the adjusted Rand index, a metric designed to measure clustering repeatability (Figure S1A), showed an obvious optimal value for k. We therefore chose k = 60, which appeared to provide enough clusters to capture important small-scale structures without producing an overwhelming number. The result of the initial clustering step is the assignment of each voxel in the input brain region to one of k clusters. The second major step, hierarchical clustering, took the cluster centers (medoids) from the first step and agglomerated these 'singletons' into a hierarchy of 2k - 1 clusters (the k singletons plus k - 1 agglomerated clusters).
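The adjusted Rand index used to assess repeatability (Figure S1A) compares two label assignments of the same voxels, for example from two k-medoids runs with different random seeds. Below is a minimal pure-Python sketch of the standard pair-counting formula with Hubert and Arabie's chance correction; it is not necessarily the implementation used to produce Figure S1A.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two clusterings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")
    n = len(labels_a)
    if n < 2:
        return 1.0
    # Pairs of items placed together in both clusterings.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs of items placed together in each clustering separately.
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / comb(n, 2)
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        # Degenerate case, e.g. both labelings put every item in one cluster
        # or every item in its own cluster; treat as perfect agreement.
        return 1.0
    return (together_both - expected) / (maximum - expected)
```

The index is 1.0 for identical partitions regardless of how clusters are numbered, and it is close to 0 for chance-level agreement, which is what makes it usable across runs whose cluster numbering differs.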

Evaluating Repeatability of Clustering

As discussed above, automatic calculation of a measure of repeatability (adjusted Rand index, Figure S1A) found no obvious optimal value of k for the initial clustering step. We therefore sought a more biologically meaningful sense of consistency across multiple runs of the algorithm for k = 60 by visually comparing the results of manual and automatic segmentations. We did this for the oVLNP with each of four different clustering runs, two from each dataset (Figure S3). The results show that, despite different random-number initialization seeds, most optic glomeruli correspond strongly to a singleton cluster across repeated runs of the algorithm, both within and across the two datasets (Vienna Tiles and Janelia FlyLight). This indicates substantial, biologically meaningful repeatability within and between datasets at the first clustering step, which agglomeration then organizes hierarchically.
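The colocalization similarity shown in Figures S1B and S3 is the Dice coefficient between the voxel set of a manually annotated region and the voxel set of a cluster. The sketch below assumes both are given as sets of voxel coordinates in the same template space. The region and cluster names in the example are hypothetical, and reporting the best-matching cluster per region is only one way to summarize the matrix.

```python
def dice(a, b):
    """Dice coefficient between two voxel sets; 0.0 if both are empty."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def colocalization_matrix(manual_regions, clusters):
    """Dice similarity for every (manual region, cluster) pair."""
    return {
        (region, cluster): dice(region_voxels, cluster_voxels)
        for region, region_voxels in manual_regions.items()
        for cluster, cluster_voxels in clusters.items()
    }


def best_matching_cluster(matrix):
    """For each manual region, the cluster with the highest Dice similarity."""
    best = {}
    for (region, cluster), s in matrix.items():
        if region not in best or s > best[region][1]:
            best[region] = (cluster, s)
    return best
```

Comparing the best matches obtained from different runs or datasets gives a rough, region-by-region view of the repeatability described above.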

 

Supplemental References

S1. Kaufman, L., and Rousseeuw, P.J. (1987). Clustering by means of medoids. In Statistical Data Analysis Based on the L1 Norm and Related Methods.

S2. Müllner, D. (2013). fastcluster: Fast hierarchical, agglomerative clustering routines for R and Python. Journal of Statistical Software 53(9). https://www.jstatsoft.org/article/view/v053i09.

S3. Vogt, K., Aso, Y., Hige, T., Knapek, S., Ichinose, T., Friedrich, A.B., Turner, G.C., Rubin, G.M., and Tanimoto, H. (2016). Direct neural pathways convey distinct visual information to Drosophila mushroom bodies. eLife 5, e14009.