Page 1

Probabilistic Reasoning in Data Analysis

Lawrence Sirovich
Mt. Sinai School of Med.; Rockefeller U.; Courant Inst., NYU

lsirovich@rockefeller.edu

Page 2

Coin Tossing

Urn Model: Urn filled with $N_b$ black and $N_w$ white marbles, chosen at random with replacement.

Experiments with probability

More generally …

and for all permutations we obtain the binomial probability distribution (slide 3)

Table for 3 tosses


where $N_h$ = number of heads in N trials.

Generalize to k colors so that

Probability of q heads followed by p tails
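The three-toss table and its link to the binomial distribution can be sketched numerically. This is a minimal illustration, assuming a fair coin (head probability 0.5) and independent tosses; the function names are hypothetical and not taken from the slides.

```python
from collections import Counter
from itertools import product
from math import comb


def toss_table(p_heads=0.5, n_tosses=3):
    """Enumerate every toss sequence and return {sequence: probability}."""
    table = {}
    for seq in product("HT", repeat=n_tosses):
        heads = seq.count("H")
        # Independent tosses: multiply p for each head, (1 - p) for each tail.
        table["".join(seq)] = p_heads**heads * (1 - p_heads) ** (n_tosses - heads)
    return table


def heads_distribution(table):
    """Add up the sequence probabilities that share the same number of heads."""
    dist = Counter()
    for seq, prob in table.items():
        dist[seq.count("H")] += prob
    return dict(dist)


def binomial(n, k, p):
    """Binomial probability of k heads in n tosses."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


table = toss_table()
dist = heads_distribution(table)
for k in sorted(dist):
    # Enumeration and the binomial formula should agree for every k.
    print(k, dist[k], binomial(3, k, 0.5))
```

Summing over all orderings with the same number of heads is what turns the probability of one particular sequence into the binomial probability.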

Page 3

This is the gamma distribution of rate λ, since $\langle t \rangle = \lambda^{-1}$

Observations on probabilities


, for x discrete or continuous

Factorial for noninteger n defined by $n! = \int_0^\infty y^n e^{-y}\,dy$

Therefore

is a probability in x, since


From

and a little thought, it is seen that $B_n(k)$ is the probability of k heads in n trials.
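One way to spell out the "little thought" is through the binomial expansion. The symbols p and q = 1 − p for the head and tail probabilities are an assumption here, since the transcript does not fix them.

```latex
1 = (p+q)^n = \sum_{k=0}^{n} \binom{n}{k} p^k q^{\,n-k}
\quad\Longrightarrow\quad
B_n(k) = \binom{n}{k} p^k q^{\,n-k},
\qquad \sum_{k=0}^{n} B_n(k) = 1 .
```

Each term collects the $\binom{n}{k}$ orderings that contain exactly k heads, and every such ordering has probability $p^k q^{n-k}$.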

Page 4

Which yields Stirling’s formula:

Gaussian probability:

Miscellanea

$n! = \int_0^\infty \exp(n \log y - y)\,dy$

Hint: Expand log for 1/n small in $(1 + x/n)^n = e^{n \log(1 + x/n)}$

Hint: Integrand has a max at y = n, and area is concentrated therein
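The integral form of the factorial and the resulting approximation can be checked numerically. The sketch below assumes the standard statement of Stirling's formula, n! ≈ √(2πn) (n/e)^n, and uses a plain midpoint rule with a truncated upper limit; the function names and step count are illustrative choices.

```python
from math import exp, factorial, gamma, log, pi, sqrt


def stirling(n):
    """Standard Stirling approximation: sqrt(2 pi n) * (n / e)^n."""
    return sqrt(2 * pi * n) * (n / exp(1)) ** n


def factorial_integral(n, steps=200_000):
    """Midpoint-rule estimate of the integral of exp(n log y - y) over y > 0.

    The integrand peaks at y = n with width of order sqrt(n), so the
    range is cut off well past the peak, where the tail is negligible.
    """
    upper = n + 40 * sqrt(n) + 40
    h = upper / steps
    total = 0.0
    for i in range(steps):
        y = (i + 0.5) * h
        total += exp(n * log(y) - y)
    return total * h


for n in (1, 5, 10, 50):
    # The ratio approaches 1 as n grows.
    print(n, stirling(n) / factorial(n))

# The integral also defines the factorial for noninteger n.
print(factorial_integral(5), factorial(5))
print(factorial_integral(0.5), gamma(1.5))
```

The ratio of the approximation to the exact factorial already lies within about one percent at n = 10.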

Probability distribution functions, pdfs.

Page 5

Binomial probability distribution

Binomial distribution: A special case

Assume that:

Thus, if

This is called the Poisson distribution

As with all probability distributions


and Stirling’s formula are substituted

, a pdf in k for t fixed.

Hint: $e^x = \sum_{k=0}^{\infty} x^k / k!$
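The passage from the binomial to the Poisson distribution can be illustrated numerically. This sketch assumes the usual limit (many trials n, small success probability p, with the product np held fixed); the mean of 2.0 and the function names are hypothetical.

```python
from math import comb, exp, factorial


def binomial_pmf(n, k, p):
    """Probability of k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, mean):
    """Poisson probability of k events for a given mean."""
    return mean**k * exp(-mean) / factorial(k)


def max_gap(n, mean, k_max=10):
    """Largest pointwise difference between the two pmfs for k = 0..k_max."""
    p = mean / n  # keep n * p fixed while n grows
    return max(
        abs(binomial_pmf(n, k, p) - poisson_pmf(k, mean))
        for k in range(k_max + 1)
    )


mean = 2.0  # hypothetical value of n * p
for n in (10, 100, 10_000):
    print(n, max_gap(n, mean))

# Normalization follows from the exponential series in the hint.
print(sum(poisson_pmf(k, mean) for k in range(60)))
```

The gap shrinks as n grows, which is the numerical counterpart of substituting the large-n approximations into the binomial formula.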

Page 6

Expected Outcomes


For pdf P(x) and some function f(x) define expectation

Thus, the average or mean is
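A minimal sketch of the expectation for a discrete pdf, assuming the usual definition ⟨f⟩ = Σ f(x) P(x) summed over the support. The binomial example (n = 10, p = 0.3) and all names are illustrative.

```python
from math import comb


def expectation(f, pdf, support):
    """Expected value of f(x) under a discrete pdf: sum of f(x) * P(x)."""
    return sum(f(x) * pdf(x) for x in support)


def binomial_pdf(k, n=10, p=0.3):
    """Binomial probability of k heads in n trials (hypothetical n and p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


support = range(0, 11)
norm = expectation(lambda k: 1.0, binomial_pdf, support)      # should be 1
mean = expectation(lambda k: k, binomial_pdf, support)        # f(x) = x
second = expectation(lambda k: k * k, binomial_pdf, support)  # f(x) = x^2
variance = second - mean**2

print(norm, mean, variance)
```

Choosing f(x) = x gives the mean, and f(x) = x² gives the second moment from which the variance follows.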

Page 7

Biological examples:
Photon arrivals activating retinal photoreceptors
Spontaneous neurotransmitter release events at a synapse

Suppose N(t) is the number of events (arrivals) in the time t; then the arrival rate is estimated by

Random Arrivals: Events that do not depend on prior history


The approximate number of events for any t is

Since the process is memoryless, the pdf in $\tau$ satisfies

This functional relation can only be satisfied by an exponential
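A small simulation can illustrate both the rate estimate and the memoryless property. The sketch assumes exponentially distributed waiting times, a rate estimate of the form N(t)/t, and the memoryless relation written for the survival probability, S(s + t) = S(s) S(t); the rate of 2.0, the seed, and the function names are hypothetical.

```python
import random


def simulate_waits(rate, n_events, seed=0):
    """Draw independent exponential waiting times with the given rate."""
    rng = random.Random(seed)
    return [rng.expovariate(rate) for _ in range(n_events)]


def estimated_rate(waits):
    """Number of events divided by the total elapsed time, N(t) / t."""
    return len(waits) / sum(waits)


def survival(waits, t):
    """Fraction of waiting times longer than t."""
    return sum(w > t for w in waits) / len(waits)


rate = 2.0  # hypothetical arrival rate
waits = simulate_waits(rate, 100_000)
print(estimated_rate(waits))

# Memorylessness: surviving s + t equals surviving s, then surviving t.
s, t = 0.3, 0.5
lhs = survival(waits, s + t)
rhs = survival(waits, s) * survival(waits, t)
print(lhs, rhs)
```

The two printed survival values agree up to sampling noise, which is the multiplicative relation that singles out the exponential.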

Page 8

Poisson Process

This is the probability of waiting times for a Poisson process

(not the same as a Poisson Distribution)

Average or expected waiting time defined by:


This form guarantees

Since $\langle t \rangle = 1/\lambda$
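For reference, the normalization and the mean waiting time can be worked out directly. This assumes the waiting-time pdf has the exponential form $P(t) = \lambda e^{-\lambda t}$, the standard choice consistent with a mean of 1/λ.

```latex
\int_0^\infty \lambda e^{-\lambda t}\,dt = 1, \qquad
\langle t \rangle = \int_0^\infty t\,\lambda e^{-\lambda t}\,dt
= \Big[-t\,e^{-\lambda t}\Big]_0^\infty + \int_0^\infty e^{-\lambda t}\,dt
= \frac{1}{\lambda}.
```

The boundary term vanishes at both limits, so integration by parts leaves only the elementary integral of the exponential.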

Page 9

Consequences of the Poisson process


For any time t = t1, the probability of an event in the interval t is

A consequence of this is that the probability of no event is

Since λ dt is the probability of an event, and 1 − λ dt that of a nonevent, in an increment dt, in general

$$P_k(t+dt) = \lambda\,dt\,P_{k-1}(t) + (1 - \lambda\,dt)\,P_k(t)$$

$$\frac{dP_k(t)}{dt} = \lambda P_{k-1}(t) - \lambda P_k(t)$$

The previously defined Poisson pdf satisfies this differential equation, which justifies the notation. Recall

The variance is given by so that mean & variance are equal.
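The claim that the Poisson pdf solves the differential equation, and that its mean and variance coincide, can be checked by direct integration. The sketch assumes the equation dP_k/dt = λ(P_{k−1} − P_k) with P_0(0) = 1 and the Poisson form P_k(t) = (λt)^k e^{−λt}/k!; the forward-Euler scheme, step size, and parameter values are illustrative choices.

```python
from math import exp, factorial


def integrate_master_equation(rate, t_end, k_max=40, dt=1e-4):
    """Forward-Euler integration of dP_k/dt = rate * (P_{k-1} - P_k).

    Starts from P_0(0) = 1 (no events at time zero) and truncates at k_max.
    """
    p = [0.0] * (k_max + 1)
    p[0] = 1.0
    steps = int(round(t_end / dt))
    for _ in range(steps):
        new = p[:]
        new[0] = p[0] - rate * dt * p[0]  # k = 0 has no inflow term
        for k in range(1, k_max + 1):
            new[k] = p[k] + rate * dt * (p[k - 1] - p[k])
        p = new
    return p


def poisson_pdf(k, rate, t):
    """Poisson probability of k events by time t."""
    return (rate * t) ** k * exp(-rate * t) / factorial(k)


rate, t_end = 2.0, 1.5  # hypothetical values, so rate * t_end = 3
p = integrate_master_equation(rate, t_end)

worst = max(abs(p[k] - poisson_pdf(k, rate, t_end)) for k in range(len(p)))
mean = sum(k * pk for k, pk in enumerate(p))
variance = sum(k * k * pk for k, pk in enumerate(p)) - mean**2
print(worst, mean, variance)
```

Both the mean and the variance come out close to λt, and the integrated probabilities track the Poisson values to within the discretization error.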

Page 10

Poisson Processes in Biology

A classic, Nobel-worthy paper (Hecht, Shlaer, & Pirenne, J. Gen. Physiol. 25:819-840, 1942) addresses the question: How many photons must be captured by the retina for the subject to correctly perceive an event?

In the psychophysical experiment, subjects are exposed to brief 1-ms flashes of light to determine how the probability of perception depends on the brightness of the flash.

Quantal content of 1-ms flash

It is a reasonable hypothesis that absorption of quanta by the retina obeys a Poisson distribution


Page 11

The Cumulative Poisson pdf

Note: The curves can be distinguished from one another by the steepness, which depends on n.

The probability that n or more photons are detected is described by the cumulative pdf:
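The cumulative probability can be tabulated with a few lines of code. This is a sketch assuming the Poisson form e^{−a} a^k / k! for the number of detected photons, with a the mean number per flash; the thresholds and mean values below are hypothetical and are not the values from the original experiment.

```python
from math import exp, factorial


def prob_at_least(n, mean):
    """P(K >= n) for a Poisson count K: one minus the sum over k < n."""
    below = sum(mean**k * exp(-mean) / factorial(k) for k in range(n))
    return 1.0 - below


means = [0.5, 1, 2, 4, 8, 16]  # hypothetical mean photon counts per flash
for n in (1, 3, 6):            # hypothetical detection thresholds
    row = [round(prob_at_least(n, a), 3) for a in means]
    print(f"n={n}", row)
```

Each row rises from near 0 to near 1 as the mean increases, and the rise is sharper for larger n, which is what allows the threshold to be read off from the shape of a measured curve.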

Page 12

Original data from Hecht, Shlaer, & Pirenne

These data are to be analyzed in the Problem Set

Hecht, Shlaer, & Pirenne, J. Gen. Physiol. 25:819-840, 1942.


Page 13

Poisson Processes in Biology

Related studies

References

Boyd & Martin, J. Physiol. 132:74-91, 1956.

del Castillo & Katz, J. Physiol. 124:560-573, 1954.

Luria & Delbrück, Genetics 28:491-511, 1943.

Next, we consider the work of Bernard Katz (del Castillo & Katz, 1954), who recorded nervous activity at the neuromuscular junction and noted low-level persistent voltage activity, later called mini end-plate potentials. He pursued the origin of this noise and conjectured that it was due to the synaptic release of vesicles of neurotransmitter of uniform size; he further hypothesized that the number of arriving vesicles followed a Poisson distribution. The ensuing experiments confirmed his speculations and contributed to his Nobel prize in 1970. The next two slides are based on subsequent verification (Boyd & Martin, 1956), and summarize the experimental and theoretical deliberations that went into this brilliant scientific effort. In a complementary vein, Luria & Delbrück (1943) demonstrated that bacterial mutations were of random origin. In effect, they did this by refuting the hypothesis that the mutations were governed by Poisson statistics; in the process, they established the genetic basis of bacterial reproduction and were awarded the Nobel prize for this work in 1969.

Page 14

Neuromuscular Vesicle Release

Fluctuations in postsynaptic end-plate potential (e.p.p.) as reinvestigated by Boyd & Martin (1956)

This publication reports on $N_0 = 198$ trials measuring postsynaptic e.p.p.s in response to a single presynaptic neural impulse. The inset to the figure below is the histogram of spontaneous activity (no upstream impulse).

Under Katz’s hypotheses, this implies that a single vesicle produces a 0.4-mV fluctuation. Over all trials, the mean fluctuation was 0.933 mV. Therefore, the mean arrival is m = 0.933/0.4 = 2.33.

Page 15

The Gaussian Fit

Boyd & Martin, J. Physiol. 132:74-91, 1956.

The continuous curve in the previous slide was created by a Gaussian fit, as explained in the figure legend below, that allows for the inclusion of the side bars that are seen in the above histogram.

This implies that the number of vesicles is given by: $N_k = N_0 p_k = N_0 m^k e^{-m} / k!$

Page 16

Theory vs. Experiment

Read this paper to see how beautifully all the data agree with the hypothesis that evoked release follows a Poisson distribution

Comparison between theory and experiment is summarized in the following table

k            0   1   2   3   4   5   6   7   8
Poisson     19  44  52  40  24  11   5   2   1
Experiment  18  44  55  36  25  12   5   2   1
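The theoretical row can be approximated directly from the Poisson formula $N_0 m^k e^{-m}/k!$, using the values $N_0 = 198$ and m = 2.33 reported for this experiment. The sketch below is illustrative only: individual entries need not round to exactly the tabulated Poisson row, since the treatment of amplitude spread and rounding in the original analysis is not reproduced here.

```python
from math import exp, factorial

N0 = 198   # number of trials reported for the experiment
M = 2.33   # mean number of vesicles per impulse
EXPERIMENT = [18, 44, 55, 36, 25, 12, 5, 2, 1]  # observed counts for k = 0..8


def expected_counts(n_trials, mean, k_max):
    """Expected number of trials with k vesicles, for k = 0..k_max."""
    return [
        n_trials * mean**k * exp(-mean) / factorial(k)
        for k in range(k_max + 1)
    ]


counts = expected_counts(N0, M, len(EXPERIMENT) - 1)
for k, (theory, observed) in enumerate(zip(counts, EXPERIMENT)):
    print(f"k={k}  Poisson={theory:6.2f}  experiment={observed}")
```

The observed counts sum to the 198 trials, while the theoretical counts for k ≤ 8 sum to slightly less because the Poisson tail beyond k = 8 is small but nonzero.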

Page 17

www.sciencesignaling.org

Slides from a lecture in the course Systems Biology—Biomedical Modeling

Citation: L. Sirovich, Probabilistic reasoning in data analysis. Sci. Signal. 4, tr14 (2011).