Analysis of Chromosome Aberration Frequencies using Algebraic Statistics

Serkan Hoşten

Mathematics Department
San Francisco State University

May 19, 2014

Serkan Hoşten Testing Proximity of Chromosome Territories

Motivation

Definition

A chromosome territory is a discrete region within the cell nucleus that is occupied by a chromosome.

Figure: Representation of Chromosome Territories. (Bolzer et al. 2005)

Motivation

Chromosome positioning within the cell nucleus has been shown to influence

gene expression,

DNA damage processing,

genetic diseases and cancer.

Where? and How?: poorly understood.

Motivation

Figure: Red indicates the CT of chromosome 18 (gene-poor) and green that of chromosome 19 (gene-rich) in the nucleus of a non-stimulated human lymphocyte. Image courtesy of M. Cremer and I. Solovei.

In general,

chromosomes {1, 19, 22, 17} are located close to the center of the nucleus;

chromosomes {18, 2, 4, 13} are located closer to the boundary of the nucleus.

Chromosome Aberrations

Application of ionizing radiation to the cell nucleus during the G0/G1 phase of the cell cycle causes DNA double-strand breaks, which promote chromosome aberrations and rearrangements of pieces.

Definition

An interchange is an exchange-type chromosome aberration involving two different (i.e., non-homologous) chromosomes.

Chromosome Aberrations

To study chromosome interchanges, the multicolor fluorescence in situ hybridization (mFISH) method is used.

Figure: In this mFISH image each chromosome is labeled in a different color; the chromosome exchanges are marked by the white arrows.

Proximity-Effect Hypothesis

Definition

The proximity-effect hypothesis states that chromosomes which are in close juxtaposition are more likely to have a higher yield of interchanges.

Data

Chr   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  Sum
  1  44  38  42  29  26  29  18  39  29  25  18  15  18  34  31  22  12  14  22   9  27  541
  2      43  37  32  30  24  25  29  16  24  30  29   9  26   8  24   8   7  12  13  15  485
  3          21  31  32  24  21  26  23  25  23  21  18  18  19  21  11  17  11  12  10  465
  4              23  27  28  24  26  20  13  19  23  22  20  16  18  11   6  12  10   7  425
  5                  17  31  26  25  24  30  25  25  15  19   8  19  13   7  16   7   4  426
  6                      18  22  21  31  13  30  18  15  19  14  15  13  10   9   8   7  395
  7                          20  20  17  28  25  13  18   8  18  23  11   9  19   6   7  396
  8                              13  12  24  11  25  15  16  12  16  17   4   9   7   8  345
  9                                  21  25   7  23  23  27  20  15  22   8   9   7  10  416
 10                                      18  21  14  14  10  19  14   9   5  11   7   3  338
 11                                          25   5  15  16  19  15   8  10  12   3  11  364
 12                                               9  16   9  12  16   8  13  10   5   5  337
 13                                                  29  10  10   7  16   5   6   7   9  319
 14                                                      22  13   6  10   2   6  13  11  310
 15                                                          22  13   9   7  11   7   9  332
 16                                                              12  15  12  20   8  13  321
 17                                                                   5   4  11   5  10  291
 18                                                                       2  11   9   3  223
 19                                                                           6   0   8  156
 20                                                                               7  10  240
 21                                                                                   6  156
 22                                                                                      193

Table: Table of Chromosome Interchanges for 3585 Lymphocyte Cells. Each entry f(j, k) holds the number of cells in which at least one exchange between chromosomes j and k was recorded (Arsuaga et al. 2004).

Log-Linear Model for Chromosome Interchange

Since 22 different colors were used to mark the chromosomes involved, we have $\binom{22}{2} = 231$ heterologous autosome pairs.

Based on the possible outcomes of this experiment, we can define a random variable $X \in \{(j, k) : 1 \le j < k \le 22\}$ which takes 231 values corresponding to all possible colored pairs.
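As a quick check of the count, the state space of X can be enumerated directly. This is only a sketch; the lexicographic ordering of the pairs and the name `index_of` are choices made here for illustration, not something the slides specify.

```python
from itertools import combinations
from math import comb

chromosomes = range(1, 23)                    # autosomes 1..22
pairs = list(combinations(chromosomes, 2))    # all (j, k) with j < k

# Assumed convention for this sketch: states are ordered lexicographically.
index_of = {pair: idx for idx, pair in enumerate(pairs)}

print(len(pairs), comb(22, 2))
```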

Log-Linear Model for Chromosome Interchange

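To model no proximity effect, consider $\varphi : \mathbb{R}^{22} \to \mathbb{R}^{231}$ defined by

$$\varphi(\vartheta_1, \dots, \vartheta_{22}) = (\vartheta_i \vartheta_j)_{1 \le i < j \le 22} \in \mathbb{R}^{231},$$

where $p_{i,j} = \vartheta_i \vartheta_j$ is the probability of chromosomes $i$ and $j$ interacting.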
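The parametrization can be sketched in a few lines. The function names `phi` and `normalize` are hypothetical, and the normalization step is an assumption added here so that the 231 values sum to one; the slides do not state how the probabilities are scaled.

```python
from itertools import combinations

def phi(theta):
    """Map (theta_1, ..., theta_n) to the products theta_i * theta_j for i < j."""
    n = len(theta)
    return {(i + 1, j + 1): theta[i] * theta[j]
            for i, j in combinations(range(n), 2)}

def normalize(p):
    """Assumption of this sketch: rescale so the pair probabilities sum to 1."""
    total = sum(p.values())
    return {pair: value / total for pair, value in p.items()}

theta = [1.0] * 22             # hypothetical parameters: all chromosomes alike
p = normalize(phi(theta))      # then every pair is equally likely
```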

Goodness of Fit Test

Consider the hypothesis testing problem:

$$H_0 : p \in \mathcal{M} \quad \text{versus} \quad H_1 : p \notin \mathcal{M}$$

using

$$\chi^2(F) = \sum_{1 \le j < k \le 22} \frac{\left(F(j, k) - \hat{f}(j, k)\right)^2}{\hat{f}(j, k)},$$

where $\hat{f}(j, k)$ are the entries of the maximum likelihood estimate (MLE) table with respect to the no-proximity-effect model given the data table $f$.

To compute the p-value of the test, observe that

we cannot use an asymptotic test, as the data table has small counts of exchanges for some chromosome pairs;

we cannot use Fisher's exact test, as it is too computationally expensive: it requires enumeration of all 22 × 22 tables preserving the minimal sufficient statistics.

Use Markov chain Monte Carlo (MCMC).
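The statistic itself is simple to evaluate once a table and the fitted values are available. The sketch below uses a hypothetical helper `chi_square` and only four cells, with counts copied from the data and MLE tables in these slides, so the resulting number is a partial sum for illustration and not the test statistic of the full table.

```python
def chi_square(table, fitted):
    """Pearson statistic: sum over cells of (F - fhat)^2 / fhat."""
    return sum((table[pair] - fitted[pair]) ** 2 / fitted[pair] for pair in fitted)

# Four cells only: observed counts and (rounded) MLE values from the slides.
observed = {(1, 2): 44, (1, 3): 38, (1, 22): 27, (13, 14): 29}
fitted = {(1, 2): 47, (1, 3): 43, (1, 22): 11, (13, 14): 14}

partial = chi_square(observed, fitted)
contributions = {pair: (observed[pair] - fitted[pair]) ** 2 / fitted[pair]
                 for pair in fitted}
```

Most of this partial sum comes from the cells {1, 22} and {13, 14}, the two pairs that reappear later in the slides.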

MLE Table

Chr   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  Sum
  1  47  43  38  37  33  33  27  34  26  28  25  23  22  24  23  20  14 8.9  15 8.8  11  541
  2      37  32  32  29  29  24  30  23  25  22  21  20  22  21  18  13 8.3  14 8.2  11  485
  3          30  30  27  27  23  28  22  24  22  20  19  21  20  18  13 8.2  14 8.2  11  465
  4              27  24  24  21  26  20  22  20  18  18  19  18  16  12 7.8  13 7.8  10  425
  5                  24  24  21  26  20  22  20  19  18  19  19  17  12   8  13   8  10  426
  6                      22  19  24  19  20  19  17  17  18  17  16  12 7.7  12 7.7 9.7  395
  7                          19  24  19  20  19  17  17  18  17  16  12 7.9  13 7.9 9.9  396
  8                              20  16  18  16  15  15  16  15  14  11 7.2  11 7.2   9  345
  9                                  20  21  20  18  18  19  18  17  13 8.6  13 8.5  11  416
 10                                      17  16  15  15  16  15  14  11 7.3  11 7.4 9.2  338
 11                                          17  16  16  17  16  15  11   8  12   8 9.9  364
 12                                              15  15  16  15  14  11 7.6  12 7.6 9.4  337
 13                                                  14  15  15  13  10 7.4  11 7.4 9.1  319
 14                                                      15  14  13  10 7.4  11 7.4   9  310
 15                                                          15  14  11 7.9  12 7.9 9.6  332
 16                                                              14  11 7.9  11 7.9 9.5  321
 17                                                                  10 7.4  11 7.4 8.9  291
 18                                                                       6 8.6   6 7.3  223
 19                                                                         6.4 4.6 5.6  156
 20                                                                             6.6 7.8  240
 21                                                                                 5.7  156
 22                                                                                      193

Table: MLE Table
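The slides do not say how the MLE table was computed. One standard route for log-linear models is to look for fitted counts of the form ϑ_j ϑ_k whose chromosome sums equal the observed ones, which is also what the Sum column above reflects. The sketch below does this by cyclic rescaling of one parameter at a time; the function name, the starting point, and the fixed number of sweeps are assumptions, and the example is a toy table with 4 chromosomes, not the slide data.

```python
from itertools import combinations

def fit_no_proximity(sums, sweeps=500):
    """Fitted counts theta_j * theta_k whose per-chromosome sums match `sums`."""
    n = len(sums)
    theta = [1.0] * n
    for _ in range(sweeps):
        for j in range(n):
            others = sum(theta) - theta[j]
            # After this update, sum_k theta_j * theta_k equals sums[j].
            theta[j] = sums[j] / others
    return {(j + 1, k + 1): theta[j] * theta[k]
            for j, k in combinations(range(n), 2)}

# Toy table generated exactly by theta = (1, 2, 3, 4); not data from the slides.
toy = {(1, 2): 2, (1, 3): 3, (1, 4): 4, (2, 3): 6, (2, 4): 8, (3, 4): 12}
sums = [sum(v for pair, v in toy.items() if c in pair) for c in range(1, 5)]
fitted = fit_no_proximity(sums)
```

Because the toy table lies in the model, the fitted counts should reproduce it; on real data only the chromosome sums are reproduced.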

Markov Basis for the Second Hypersimplex

The model $\mathcal{M}$ is the toric model of the second hypersimplex $\Delta[2, 22]$ given by the matrix $A[2, 22]$, whose columns are $e_j + e_k$ with $1 \le j < k \le 22$.

A quadratic lexicographic Gröbner basis for the toric ideal of $A[2, n]$ has been given by De Loera, Sturmfels, and Thomas (1995).
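To make the matrix concrete, the sketch below builds A[2, 22] and one quadratic move of the form +{i,j} +{k,l} -{i,k} -{j,l}, then checks that the move lies in the kernel of A, so adding it to a table leaves every chromosome sum unchanged. Moves of this shape are assumed here to be the kind a quadratic basis supplies; the helper names and the 0-based chromosome indices are choices of this sketch.

```python
from itertools import combinations

def hypersimplex_matrix(n):
    """Rows of A[2, n]; column (j, k) is e_j + e_k, with 0-based j < k."""
    pairs = list(combinations(range(n), 2))
    rows = [[1 if r in pair else 0 for pair in pairs] for r in range(n)]
    return rows, pairs

def quadratic_move(pairs, i, j, k, l):
    """Vector with +1 on {i,j} and {k,l}, -1 on {i,k} and {j,l}."""
    move = [0] * len(pairs)
    for a, b, sign in ((i, j, 1), (k, l, 1), (i, k, -1), (j, l, -1)):
        move[pairs.index(tuple(sorted((a, b))))] += sign
    return move

A, pairs = hypersimplex_matrix(22)
move = quadratic_move(pairs, 0, 1, 2, 3)
image = [sum(a * m for a, m in zip(row, move)) for row in A]  # A times move
```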

Results of goodness of fit test

Based on the results of a Metropolis-Hastings algorithm that generated 1,000,000 tables (with 30,000 steps between consecutive "picks"), we could not reject the no-proximity-effect model (p ≈ 1).
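A much smaller version of such a sampler can be sketched as follows. It is not the implementation behind the reported result: the quadratic moves, the target proportional to 1/∏F(j,k)! (the conditional distribution under multinomial sampling, assumed here), the parameter names `samples` and `thin`, and the toy table are all assumptions of this sketch.

```python
import random

def chi_square(table, fitted):
    return sum((table[p] - fitted[p]) ** 2 / fitted[p] for p in fitted)

def key(a, b):
    return (a, b) if a < b else (b, a)

def mcmc_p_value(observed, fitted, n, samples=2000, thin=20, seed=0):
    """Estimate P(chi2(F) >= chi2(observed)) over tables with the same sums."""
    rng = random.Random(seed)
    table = dict(observed)
    threshold = chi_square(observed, fitted)
    hits = 0
    for _ in range(samples):
        for _ in range(thin):                      # steps between two "picks"
            i, j, k, l = rng.sample(range(1, n + 1), 4)
            plus = (key(i, j), key(k, l))
            minus = (key(i, k), key(j, l))
            if table[minus[0]] == 0 or table[minus[1]] == 0:
                continue                           # would give a negative count
            # Metropolis-Hastings ratio for a target proportional to 1/prod(F!)
            ratio = (table[minus[0]] * table[minus[1]]) / (
                (table[plus[0]] + 1) * (table[plus[1]] + 1))
            if rng.random() < ratio:
                for p in plus:
                    table[p] += 1
                for p in minus:
                    table[p] -= 1
        hits += chi_square(table, fitted) >= threshold - 1e-12
    return hits / samples, table

# Toy table with 4 chromosomes that fits the model exactly (not slide data).
toy = {(1, 2): 2, (1, 3): 3, (1, 4): 4, (2, 3): 6, (2, 4): 8, (3, 4): 12}
toy_fit = {pair: float(count) for pair, count in toy.items()}
p_value, last_table = mcmc_p_value(toy, toy_fit, n=4)
```

In this toy case the observed statistic is zero, so every sampled table counts as at least as extreme and the estimate equals 1.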

MLE and Experiment Data

We noticed that the differences between the expected numbers of interchanges and the observed ones are larger for some chromosome pairs.

Chr     2     3     4     5     6     7     8     9    10    11    12    13    14    15    16    17    18    19    20    21    22
  1  -3.2  -5.5   4.2  -8.2  -7.2  -3.8    -9     5   3.3  -2.9    -7  -8.1    -4    10   8.4   2.2    -2   5.1   6.8   0.2    16
  2         6.2   4.6  -0.1   1.2  -4.6   1.1  -0.9  -6.9  -0.9   7.6   8.2   -11   4.4   -13   5.9  -4.9  -1.3  -2.1   4.8   4.3
  3              -9.4   0.8   4.7  -3.2  -1.8  -2.4   1.1   1.2   1.4   0.9  -1.3  -2.8  -0.9   3.4  -1.7   8.8  -2.8   3.8 -0.52
  4                    -3.9   2.5   3.6   3.3   0.3   0.0  -8.7  -0.8   4.5   4.2   0.8  -2.4   1.7  -0.9  -1.8  -0.9   2.2    -3
  5                          -7.5   6.6   5.2  -0.6   3.9   8.2   5.1   6.4    -3  -0.4   -11   2.5   0.9    -1   2.9  -1.0  -6.2
  6                                -4.4   2.8  -2.5    12  -7.2    11   0.6  -1.8   0.9  -3.4  -0.5   1.5   2.3  -3.5   0.3  -2.7
  7                                       0.7  -3.5  -1.7   7.7   6.4  -4.5   1.1   -10   0.5   7.3  -0.7   1.1   6.3  -1.9  -2.9
  8                                            -7.2  -4.3   6.4  -5.2   9.7   0.1   0.0  -3.5     2   6.5  -3.2  -2.4  -0.2    -1
  9                                                   1.2   3.7   -13   4.5   5.1   7.9   1.6  -1.6   9.5  -0.6  -4.5  -1.5  -0.7
 10                                                         0.8     5  -1.1  -0.7  -5.8   3.7   0.1  -1.6  -2.3  -0.4  -0.4  -6.2
 11                                                               7.8   -11  -0.9  -1.0   2.6   0.1  -3.4     2  -0.3    -5   1.1
 12                                                                    -6.2   1.2  -6.8  -3.3     2  -2.8   5.4  -1.6  -2.6  -4.4
 13                                                                            15    -5  -4.6  -6.4   5.6  -2.4  -5.2  -0.4  -0.1
 14                                                                                 7.3  -1.3  -7.1  -0.2  -5.4    -5   5.6     2
 15                                                                                       6.8  -0.9    -2  -0.9  -0.7  -0.9  -0.6
 16                                                                                            -1.6   4.2   4.1   8.5   0.1   3.5
 17                                                                                                    -5  -3.4   0.3  -2.4   1.1
 18                                                                                                          -4   2.4     3  -4.3
 19                                                                                                              -0.4  -4.6   2.4
 20                                                                                                                     0.4   2.2
 21                                                                                                                           0.4

Table: Deviation Between Observed and MLE Counts.
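The entries of this table are observed counts minus MLE values. As a small illustration, the sketch below recomputes five of them from the (rounded) numbers in the data and MLE tables of these slides and ranks the pairs by absolute deviation; the selection of pairs is arbitrary.

```python
# (observed, MLE) for a few cells, copied from the two tables in the slides.
entries = {
    (1, 22): (27, 11),
    (13, 14): (29, 14),
    (6, 10): (31, 19),
    (2, 16): (8, 21),
    (9, 12): (7, 20),
}

deviation = {pair: obs - mle for pair, (obs, mle) in entries.items()}
ranked = sorted(deviation, key=lambda pair: abs(deviation[pair]), reverse=True)

for pair in ranked:
    print(pair, deviation[pair])
```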

Modified Log-Linear Model

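To examine the effect of pairwise interaction for only one pair of chromosomes $\{r, s\}$, consider the following map:

$$\varphi : \mathbb{R}^{23} \to \mathbb{R}^{231}$$

defined as

$$\varphi(\vartheta_1, \dots, \vartheta_{22}, \alpha) = (\vartheta_i \vartheta_j \alpha_{ij})_{1 \le i < j \le 22} = (p_{ij}) \tag{1}$$

where

$$\alpha_{ij} = \begin{cases} \alpha & \text{if } \{i, j\} = \{r, s\}, \\ 1 & \text{otherwise.} \end{cases}$$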
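In code, the modification touches a single coordinate of the original map. The function name `phi_modified` and the parameter values below are hypothetical and serve only to show the shape of the map.

```python
from itertools import combinations

def phi_modified(theta, alpha, r, s):
    """Products theta_i * theta_j, with the single pair {r, s} scaled by alpha."""
    special = (min(r, s), max(r, s))
    p = {}
    for i, j in combinations(range(1, len(theta) + 1), 2):
        factor = alpha if (i, j) == special else 1.0
        p[(i, j)] = theta[i - 1] * theta[j - 1] * factor
    return p

theta = [1.0] * 22                               # hypothetical parameters
p = phi_modified(theta, alpha=2.0, r=1, s=22)    # only cell {1, 22} changes
p_original = phi_modified(theta, alpha=1.0, r=1, s=22)
```

Setting `alpha=1.0` gives back the original model, which is the nesting used in the test that follows.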

Log-Ratio Test

Let $\mathcal{M}_0$ be our original log-linear model and $\mathcal{M}$ be the modified model. Observe that, by construction, the original log-linear model is nested within the modified model.

We performed the following test:

$$H_0 : p \in \mathcal{M}_0 \quad \text{versus} \quad H_A : p \in \mathcal{M} \setminus \mathcal{M}_0$$

The likelihood ratio test statistic is:

$$G^2 = 2 \sum_{i<j} \hat{f}(i, j) \log\left(\frac{\hat{f}(i, j)}{\hat{f}_0(i, j)}\right)$$

where $\hat{f}(i, j)$ is the MLE under the assumption that $\mathcal{M}$ is valid and $\hat{f}_0(i, j)$ is the MLE under the assumption that $\mathcal{M}_0$ is valid.
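Given the two fitted tables, the statistic is a one-line sum. The sketch below also converts it to a p-value using a chi-square reference distribution with one degree of freedom (one extra parameter α); that reference distribution is an assumption made here, since the slides do not state how the p-values were obtained. The two small tables are hypothetical.

```python
import math

def g_squared(fit_full, fit_null):
    """G^2 = 2 * sum of fit_full * log(fit_full / fit_null) over positive cells."""
    return 2.0 * sum(m * math.log(m / fit_null[pair])
                     for pair, m in fit_full.items() if m > 0)

def chi2_tail_1df(x):
    """Upper tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

# Hypothetical fitted values with equal totals, for illustration only.
fit_null = {"a": 10.0, "b": 10.0}
fit_full = {"a": 15.0, "b": 5.0}

g2 = g_squared(fit_full, fit_null)
p_value = chi2_tail_1df(g2)       # assumption: 1-df chi-square reference
```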

Results of Log-Ratio Test

p-value

Chromosome pair   Cornforth et al. 2001   Lucas et al. 2002   Arsuaga et al. 2004
{1, 22}           0.001                   0.007               0.001
{1, 16}           1.000                   1.000               0.104
{2, 15}           0.800                   0.500               0.102
{3, 19}           1.000                   1.000               0.161
{6, 10}           1.000                   1.000               0.169
{6, 12}           1.000                   0.162               0.284
{8, 13}           0.589                   1.000               0.535
{9, 18}           1.000                   1.000               0.342
{9, 13}           0.120                   1.000               1.000
{11, 12}          1.000                   1.000               0.316
{13, 14}          0.879                   1.000               0.007
{16, 20}          0.918                   1.000               0.535

Table: p-values, adjusted with the Bonferroni correction, for the log-ratio test of the modified model against the original model, based on all three observed datasets.
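The Bonferroni adjustment multiplies each raw p-value by the number of tests and caps the result at 1, which is why many entries above read 1.000. In the sketch below the raw p-values are hypothetical, and `n_tests=12` is used only because the table lists twelve pairs; the number of tests actually corrected for is not stated in the slides.

```python
def bonferroni(p_values, n_tests=None):
    """Multiply each raw p-value by the number of tests and cap at 1."""
    m = len(p_values) if n_tests is None else n_tests
    return [min(1.0, p * m) for p in p_values]

raw = [0.0001, 0.01, 0.2]              # hypothetical raw p-values
adjusted = bonferroni(raw, n_tests=12)
```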

Conclusion

1. We were not able to reject the proposed log-linear model of no proximity effect (p-value of 1) for the data set published in Arsuaga et al. 2004.

2. We considered a modified log-linear model with a proximity factor for the chromosome pair of interest. We could not reject the modified model for chromosome pair {1, 22} and chromosome pair {13, 14}.
