Skript Probability Oct12 2013
Probability
Dr. Michael Hinz, Bielefeld University
WS 2013/2014, Mon 16–18 T2-233, Tue 16–18 T2-204

Contents
1 Introduction
2 Discrete probability spaces
   2.1 Basic notions
   2.2 Conditional probability and independence
   2.3 Discrete random variables
   2.4 Bernoulli trials
3 Absolutely continuous distributions
4 Measure theory and Lebesgue integration
5 Product spaces and independence
6 Weak convergence and the Central Limit Theorem
7 Conditioning
8 Martingale theory
Chapter 1
Introduction
Any form of intelligent life, and a human being in particular, extracts information from the sensory input provided by the immediate environment. This information is observed and then processed further. Its evaluation (e.g. danger or reward) allows it to make decisions and to establish appropriate routines and reactions (e.g. flight or cooperation). The single most important mechanism in evaluating information is trial and error. On a philosophical level, this is nothing but testing a hypothesis by an experiment:
An experiment is an orderly procedure carried out with the goal of verifying, refuting or establishing the validity of a hypothesis.
(URL: http://en.wikipedia.org/wiki/Experiment)
A hypothesis is a proposed explanation for an observed phenomenon.
(URL: http://en.wikipedia.org/wiki/Hypotheses)
Sometimes the idea of trial and error, the source of all intelligent behaviour, is shamelessly disregarded. This also happens often in the educational system, where the terrible habit of considering mistakes something bad is widespread. This is illogical, because without mistakes there cannot be any learning process, any acquisition of reliable knowledge! After having completed our learning process up to an adequate level we of course hope to have established precise and correct thinking.
Maybe an experiment has more to do with perception than with physical reality. This is an issue people discuss in constructivism. But in any case we should keep in mind that what exactly the experiment is, what its possible outcomes are, and what the hypothesis to be tested is, will always be a matter of agreement.
A sort of trivial experiment is an experiment with a predetermined outcome. In this case it suffices to perform the experiment just once to decide whether the hypothesis is true or false. For example, we may throw a ball in the air and wait for things to happen. We assume our strength is finite and the mass of the ball is positive. According to classical mechanics it will sooner or later fall down. The hypothesis "the ball stays in the air forever" can be disproved at once. (Note however that we made silent and intrinsic assumptions: "sooner or later fall down" might suggest that we are standing on or above the surface of the earth, and that the earth will not cease to exist while the ball is in the air. In other words: we have to explain what we mean by common sense.)
In more interesting cases the experiment has several possible outcomes, i.e. its outcome is not predetermined. It is reasonable to assume that the collection of all possible outcomes of the experiment is known. For example, we may toss a coin (to be precise, an ideal coin that will always land on a face and never remain standing on its edge). The hypothesis "the coin always shows heads" can be disproved, but we may need to perform our experiment several times to do so.

The (theoretical) design of our experiment depends on the amount of information we can observe and on the sort of information we would like to extract. Suppose, for example, we record the temperature in a certain geographical location. Suppose that measurements are taken daily over a period of one year of 365 days and that we are close to the equator, so that seasons do not play a role. Assume that our thermometer only yields integer numbers. If we are interested in a single large set of data consisting of 365 numbers, then this set may be considered an outcome of the experiment, and no further discussion is needed. But this is interesting only if our hypothesis necessarily has to be phrased in terms of 365 numbers. If we would like to compare two locations, it may be more intelligent to ask for an average temperature, i.e. one single number. This amounts to the idea of having an experiment whose possible outcomes are integer numbers and which is performed repeatedly, 365 times. If we plot the absolute numbers of the occurrences of certain temperatures in a histogram, then we may try to conclude that they obey a certain distribution. In other words: we think of our experiment as an experiment with random outcome and ask for a probabilistic structure. This is filtering information: for instance, this idea does not tell whether it was 20°C today or yesterday.
When an experiment is performed repeatedly, two ideas arise:

- We may want to record, evaluate and interpret the obtained data. For instance, we may want to look at a time series to see whether we can determine distributions or dependencies. Observing and interpreting are the objectives of statistics.

- We may want to model the phenomenon and extract further information from the model: once we know that the physical system under observation shows typical statistical features, we can try to come up with a probabilistic model. This model allows further theoretical conclusions (independent of further physical observations) and may be used to forecast the future behaviour of the system. The model and the deduction of further information are the task of probability.
In this sense probability talks about models for experiments with several possible outcomes. We do not talk about reality, whatever reality is. Whether a probabilistic model is considered appropriate for a physical experiment is usually decided in subsequent steps of statistical testing, or simply by common sense. But in any case this is not a question to be decided by mathematics alone.
Another interpretation of assigning probabilities to certain events is to think of them as a measure of belief; think, for instance, of opinion polls. Luckily, this idea can be captured by exactly the same mathematical framework as used to model experiments, so we don't have to come up with yet another theory.
As far as we know, probability theory started in the sixteenth and seventeenth century (Cardano, Fermat, Pascal, Bernoulli, etc.; Bernoulli's Ars conjectandi appeared in 1713), mainly to analyze games of chance. Based on earlier results, a great body of work was established in the nineteenth century (Euler, Lagrange, Gauss, Laplace, etc.; Laplace's Théorie analytique des probabilités appeared in 1812). This is what we refer to as classical probability. It is already very useful and intuitive but still limited to modelling experiments that have at most a countable number of possible outcomes or can be described by probability measures that have densities (calculus-based methods, Riemann integration). We will follow this spirit in the first part of our lecture, and most concepts of probabilistic thinking can already be established at this level.

Later in the nineteenth century other areas of mathematics bloomed, for instance set theory (Cantor, Dedekind, etc.), while at the same time probability theory was nearly unknown to the general public. An encyclopedia from around 1890 says that mathematical probability is the ratio with numerator being the number of fortunate cases and denominator being the number of all possible cases. It also describes an urn model, says that an increasing number of experimental runs improves the quality of observations using empirical probabilities, refers to Bernoulli and Laplace, and relates probability theory to Gauss' least squares method. Finally it mentions a couple of contemporary textbooks on probability. But compared to what the encyclopedia tells about geometry, algebra or analysis, the article is extremely short.

In the course of the twentieth century mathematicians started to understand that it would be clever to base probability on set theory (Hausdorff's Grundzüge der Mengenlehre of 1914 was an important influence for probability theory), because then abstract and general models for experiments with an uncountable number of outcomes could be formulated and investigated. This dream came with the question of how to determine the probability of a set (called an event), and after an intense discussion people realized that determining the probability of an event should be more or less the same as measuring the length, area or volume of a subset of Euclidean space. This led to the concept of measure theory and to an axiomatization of probability theory based on measure theory (along with Lebesgue integration), developed in the early years of the last century (Hausdorff, von Mises, Wiener, Kolmogorov, etc.) and commonly referred to as modern probability. Usually its invention is attributed to Kolmogorov (Kolmogorov's Grundbegriffe der Wahrscheinlichkeitsrechnung appeared in 1933), but he himself remarked that these ideas had been around for some time. Modern probability is an incredibly flexible theory, and closely related developments in physics and economics (Bachelier, Einstein, Smoluchowski, Wiener, Birkhoff, Bose, Boltzmann, etc.) went in parallel. Because measure theory is a deep and somewhat abstract business close to the axiomatic roots of mathematics, we will only have a peek into some of its concepts.

In terms of its strength and flexibility, probability theory took an evolution in two, maybe even three steps: discrete theory, theory based on calculus, and modern theory based on measure theory. It still is a relatively young mathematical subject.

Chapter 2
Discrete probability spaces
Mon, Oct 14, 2013

In this chapter we have a look at key notions and ideas of probability with a minimum of technicalities. This is possible for models that use discrete probability spaces.
2.1 Basic notions
We use mathematical set theory to model an experiment.
There are different axiomatic systems mathematics can be based upon (mathematics is neither true nor false, as shown by Kurt Gödel around 1930, inspired by David Hilbert and Hans Hahn), and usually the notion "set" is explained within the chosen axiomatic system (for instance, the so-called Zermelo-Fraenkel axioms). An earlier and less rigorous definition (at that time the axiomatic method was not yet understood) was given by Georg Cantor in the late nineteenth century:

A set is a gathering together into a whole of definite, distinct objects of our perception or of our thought, which are called elements of the set.

For our purposes let us agree to accept this definition of the notion "set".
Exercise 2.1.1. Study or review the customary notation and verbalization of set theory (including the power set), set relations and operations and related notions, Venn diagrams, and the connection to logical operations.
The collection of all possible outcomes of an experiment is represented by a nonempty set Ω, called the sample space. Its elements ω ∈ Ω are called samples or outcomes.
Examples 2.1.1.
(1) Tossing a coin: A reasonable model is Ω = {H, T}, where H stands for heads and T for tails. The possible outcomes in this model are H and T.

(2) Throwing a die: Common sense suggests Ω = {1, 2, 3, 4, 5, 6}.

(3) Life span of a device or a living organism: A natural choice is Ω = [0, +∞); the possible outcomes are all nonnegative real numbers 0 ≤ t < +∞. The outcome 0 ∈ [0, +∞) means "broken from the beginning" or "born dead".

(4) Random choice of a number in [0, 1]: Of course Ω = [0, 1].
Exercise 2.1.2.
(1) Find an appropriate sample space for tossing a coin five times.
(2) Find an appropriate sample space for two people playing rock-paper-scissors.
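The first part of Exercise 2.1.2 can be checked mechanically; a minimal Python sketch enumerating the sample space for five coin tosses (the tuple encoding is one convenient choice, not prescribed by the text):

```python
from itertools import product

# Sample space for tossing a coin five times: all 5-tuples over {H, T}.
Omega = list(product("HT", repeat=5))

print(len(Omega))   # 2^5 = 32 possible outcomes
print(Omega[0])     # ('H', 'H', 'H', 'H', 'H')
```

The same pattern (cartesian products of the one-toss sample space) works for any fixed number of repetitions.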
A function (or map or mapping) f : Ω → Ω′ from a set Ω to a set Ω′ is a subset of the cartesian product

Ω × Ω′ := {(ω, ω′) : ω ∈ Ω, ω′ ∈ Ω′}

such that any ω ∈ Ω is the first component of exactly one ordered pair (ω, ω′). If ω is the first component of (ω, ω′), called the argument, we write f(ω) to denote the second component ω′, called the value.

A function f : Ω → Ω′ is called injective if f(ω) = f(ω′) implies ω = ω′. (Injectivity forbids two different arguments to have the same value.)

A set Ω is called countable if there is an injective function from Ω to the natural numbers N. Such an injective map is called an enumeration of our set.
Definition 2.1.1. A sample space is called discrete if it is countable.
If Ω is discrete we can write Ω = {ωj}j≥1, i.e. there exists an enumeration for Ω.

Examples 2.1.2. The sample spaces in Examples 2.1.1 (1) and (2) are discrete, but those in (3) and (4) are not.

Definition 2.1.2. A subset A of a discrete sample space Ω is called an event. An event of the form {ω} with ω ∈ Ω is called an elementary event.

Note that an event is a set A consisting of certain outcomes ω. When the experiment is practically (or mentally) carried out, it yields a certain sample (outcome) ω. We say that the event A takes place (or occurs, or is realized) with this sample if ω ∈ A.
Examples 2.1.3.
(1) Tossing a coin: The events are ∅, Ω, {H} and {T}.

(2) When throwing a die, the idea "to obtain an even number" is represented by the event A := {2, 4, 6} ⊂ {1, 2, 3, 4, 5, 6}.
Exercise 2.1.3. Write down all events for tossing a coin.
Given two events A, B ⊂ Ω, we say that A or B occurs if A ∪ B occurs. Similarly, we say that A and B occur if A ∩ B occurs.
Exercise 2.1.4. When throwing a die, what is the event that the outcome is an even number and greater than two? What is the event that it is an even number or greater than two?
Now some events may be particularly interesting for us, and we might want to somehow assign them a number telling how likely their occurrence is.
Examples 2.1.4.
(1) Maybe {H} is interesting when tossing a coin to make an otherwise difficult decision by chance.
(2) When throwing a die we might want to know how likely it is to obtain an even number. That would be the probability of {2, 4, 6}.

If the model is appropriate for the experiment, then the probability of an event A should be close to the relative frequency (sometimes also called the empirical probability) observed when performing the experiment repeatedly

and in such a way that the different trials do not influence each other. If n denotes the total number of trials and nA the number of trials in which the event A occurs, then the relative frequency is given by the ratio

nA / n.

Writing P(A) for the probability of A (which is yet to be defined at this point), the best we could hope for is to observe the limit relation

lim_{n→∞} nA/n = P(A).
While the sample space is often dictated by the design of the experiment or by common sense, the question which probabilities to assign to an event really is a matter of modelling. The following notion already provides a complete mathematical model for many applications.
Definition 2.1.3. Let Ω = {ωj}j≥1 be a discrete sample space. A probability measure P on the discrete sample space Ω is a countable collection {pj}j≥1 of numbers pj ≥ 0 with Σ_{j≥1} pj = 1. The probability of an event A ⊂ Ω is given by

P(A) := Σ_{j≥1: ωj∈A} pj.

We refer to (Ω, P) as a discrete probability space.
If Ω = {ωj}j≥1, {pj}j≥1 and P are as in the definition, then obviously

P({ωj}) = pj, j = 1, 2, ...

are the probabilities of the elementary events {ωj}.
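Definition 2.1.3 can be made concrete in a few lines; a minimal sketch (the dictionary representation and the name `prob` are my own choices, not from the text):

```python
from fractions import Fraction

# Weights p_j >= 0 summing to 1, indexed by the outcomes omega_j;
# here: a fair die on the sample space {1, ..., 6}.
weights = {w: Fraction(1, 6) for w in range(1, 7)}
assert sum(weights.values()) == 1

def prob(A):
    """P(A) := sum of p_j over all omega_j in A (Definition 2.1.3)."""
    return sum(weights[w] for w in A)

print(prob({2, 4, 6}))   # probability of an even number: 1/2
```

Using exact fractions avoids floating-point noise when checking that the weights sum to one.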
Remark 2.1.1. This is a definition in the style of classical probability. When dealing with uncountably infinite sample spaces (i.e. to describe experiments with a continuum of possible outcomes, such as determining the life span of an organism), it is usually impossible to come up with a reasonable way of assigning a probability to each subset A ⊂ Ω. Later we will see how to fix this by regarding only a proper subset F of the power set P(Ω) as the collection of events.
We list a couple of properties of probability measures.

Lemma 2.1.1. Let (Ω, P) be a discrete probability space.

(i) P(Ω) = 1 and P(∅) = 0.

(ii) If A ⊂ B then P(A) ≤ P(B).

(iii) P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

(iv) P(Aᶜ) = 1 − P(A).

(v) P(⋃_{i=1}^∞ Ai) ≤ Σ_{i=1}^∞ P(Ai).

(vi) If A1, A2, A3, ... are pairwise disjoint events (i.e. if Ai ∩ Ak = ∅ whenever i ≠ k), then

P(⋃_{i=1}^∞ Ai) = Σ_{i=1}^∞ P(Ai).
Proof. According to the definition we have P(Ω) = Σ_{j≥1} pj = 1, because ωj ∈ Ω for all j, and P(∅) = 0, because there is no j such that ωj ∈ ∅. This is (i). If A ⊂ B we have

Σ_{j≥1: ωj∈A} pj ≤ Σ_{j≥1: ωj∈B} pj,

which gives (ii). For (iii) note that

Σ_{j≥1: ωj∈A∪B} pj = Σ_{j≥1: ωj∈A} pj + Σ_{j≥1: ωj∈B} pj − Σ_{j≥1: ωj∈A∩B} pj

(subtracting the last sum ensures each index j with ωj ∈ A ∩ B appears exactly once on the right-hand side of the equality). A special case of (iii) together with (i) gives (iv). Finally, we have

Σ_{j≥1: ωj∈⋃_{i=1}^∞ Ai} pj ≤ Σ_{i=1}^∞ Σ_{j≥1: ωj∈Ai} pj,

with equality if the Ai are pairwise disjoint. This proves (v) and (vi).
Remark 2.1.2. Given a discrete probability space (Ω, P), we should try to think of the probability measure P as a function P : P(Ω) → [0, 1] from the power set P(Ω) of Ω into the unit interval [0, 1]. This function is normed by (i) and monotone by (ii), and the properties (v) and (vi) are called subadditivity and additivity, respectively.
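The finite properties (i)-(iv) of Lemma 2.1.1 are easy to check numerically; a small sketch with a hypothetical biased die (the weights are illustrative only):

```python
from fractions import Fraction

# A biased die: weight 1/2 on the outcome 6, weight 1/10 on 1, ..., 5.
p = {6: Fraction(1, 2), **{w: Fraction(1, 10) for w in range(1, 6)}}
Omega = set(p)

def P(A):
    return sum(p[w] for w in A)

A, B = {1, 2, 3}, {2, 3, 4, 6}
assert P(Omega) == 1 and P(set()) == 0            # (i) normed
assert P({1, 2}) <= P({1, 2, 3})                  # (ii) monotone
assert P(A | B) == P(A) + P(B) - P(A & B)         # (iii)
assert P(Omega - A) == 1 - P(A)                   # (iv) complement
print("Lemma 2.1.1 (i)-(iv) hold for this example")
```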

Examples 2.1.5.

(1) Tossing a fair coin: Ω = {H, T}, P({H}) = 1/2.

(2) Throwing a fair die: Ω = {1, ..., 6}, P({ω}) = p1 = ... = p6 = 1/6.

(3) Throwing an unfair (biased, marked) die: Ω = {1, ..., 6}, P({6}) = p6 = 1/2 and P({ω}) = p1 = ... = p5 = 1/10 for ω ∈ {1, 2, 3, 4, 5}. Of course there are also many other ways to make a die unfair. But whether it is fair or not should be reflected by the choice of the probability measure.

(4) Tossing two fair coins at once (or tossing one coin twice, but without any influence between the two trials): Ω = {(H, H), (H, T), (T, H), (T, T)} and P({ω}) = 1/4, ω ∈ Ω.

(5) To open a safe one needs to know the correct code consisting of 4 digits, each between 0 and 9. We would like to know the probability of finding it by chance. In this case

Ω = {(0, 0, 0, 0), (0, 0, 0, 1), ..., (9, 9, 9, 8), (9, 9, 9, 9)}

has 10⁴ different elements, and common sense suggests that in a model for this problem each of these four-tuples should have the same probability, namely 10⁻⁴. Hence the probability to find the correct code is 10⁻⁴.

(6) The letters A, B, C and D are randomly arranged in a line, each order being equally likely. Then the probability to get the string ADCB is 1/4! = 1/24. The probability space for this example is made up of all possible permutations of these letters,

Ω = {ABCD, ABDC, ACBD, ..., DCBA}.

(7) Suppose we have limited space and want to invite 20 out of our 100 friends to a party. We cannot decide and do it randomly, but in a fair manner (we do not prefer any combination). Given a fixed choice of 20 friends, the probability that exactly these friends will be invited is (100 choose 20)⁻¹. (This is a small number; maybe we better make a more emotional, nonrandom choice ...) Here the probability space consists of all combinations of 20 out of 100 elements, but it would be tedious to write it down more explicitly. However, we know it has (100 choose 20) elements.

(8) In a group there are 10 women and 10 men. We would like to form a gender-balanced team of four people. We do it in a fair manner and ask for the probability that some fixed, preferred team we have in mind will be the chosen one. The sample space could be thought of as consisting of elements ((m1, m2), (w1, w2)), where (m1, m2) is a randomly chosen combination of two out of ten men, and (w1, w2) is a randomly chosen combination of two out of ten women. For each of these two choices there are (10 choose 2) possibilities, hence our sample space will have (10 choose 2)·(10 choose 2) elements. The probability to choose the preferred team is ((10 choose 2)·(10 choose 2))⁻¹.
Exercise 2.1.5. Why is it (strictly speaking) wrong to write P(H) in Example 2.1.5 (1) or P(1) in (2)?

Please be careful. Some textbooks, encyclopedias or articles will nevertheless write these wrong expressions. This is often done to have a simplified (shorthand) notation, but with the agreement (often written somewhere, but sometimes silent) that it is written willingly to replace the mathematically correct expression.
Remark 2.1.3. The best way to solve modelling problems is: first determine the sample space, and afterwards come up with a reasonable probability measure on it. Once the sample space is written down correctly, the subsequent parts of the problem become much simpler.
Exercise 2.1.6. To design a new product, 10 items have been produced; 8 of them are of high quality but 2 are defective. In a test we randomly pick two of these 10 items. No item is preferred, and we do not place a drawn item back. How likely is it that our random choice gives one high-quality and one defective item?
Tue, Oct 15, 2013

Lemma 2.1.2 (Inclusion-exclusion principle). Let (Ω, P) be a discrete probability space and let A1, A2, ..., An be events. Then

P(⋃_{i=1}^n Ai) = Σ_{i=1}^n P(Ai) − Σ_{1≤i1<i2≤n} P(Ai1 ∩ Ai2) + Σ_{1≤i1<i2<i3≤n} P(Ai1 ∩ Ai2 ∩ Ai3) − ... + (−1)^{n+1} P(A1 ∩ A2 ∩ ... ∩ An).

This generalizes Lemma 2.1.1 (iii). For n = 3 and three events A, B, C, Lemma 2.1.2 gives

P(A ∪ B ∪ C) = P(A) + P(B) + P(C)
− P(A ∩ B) − P(A ∩ C) − P(B ∩ C)
+ P(A ∩ B ∩ C).
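The n = 3 case is easy to check numerically; a sketch with a uniform measure on a small hypothetical sample space (the particular sets are illustrative):

```python
# Uniform measure on a 12-point sample space.
Omega = set(range(1, 13))
P = lambda S: len(S) / len(Omega)

A, B, C = {1, 2, 3, 4}, {3, 4, 5, 6}, {4, 6, 8, 10}

lhs = P(A | B | C)
rhs = (P(A) + P(B) + P(C)
       - P(A & B) - P(A & C) - P(B & C)
       + P(A & B & C))
assert abs(lhs - rhs) < 1e-12   # inclusion-exclusion for three events
print(lhs)
```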
Exercise 2.1.7. Draw the corresponding Venn diagram for n = 3.
Exercise 2.1.8. Review or study the basic idea of proof by induction.
Proof. We proceed by induction. For n = 1 there is nothing to prove, and for n = 2 the statement is known to be true by Lemma 2.1.1 (iii). We assume it is true for n; this is called the induction hypothesis. We then use the induction hypothesis to prove the statement of the lemma for n + 1. If we manage to do so, it must be true for all natural numbers n and all choices of events A1, ..., An, as desired.
Given n + 1 events A1, ..., An, An+1, we observe

P(⋃_{i=1}^{n+1} Ai) = P((⋃_{i=1}^n Ai) ∪ An+1) = P(⋃_{i=1}^n Ai) + P(An+1) − P((⋃_{i=1}^n Ai) ∩ An+1)

by Lemma 2.1.1 (iii). The distributivity rule of set operations tells us that

(⋃_{i=1}^n Ai) ∩ An+1 = ⋃_{i=1}^n (Ai ∩ An+1),

and by the induction hypothesis this event has probability

P(⋃_{i=1}^n (Ai ∩ An+1)) = Σ_{i=1}^n P(Ai ∩ An+1) − Σ_{1≤i1<i2≤n} P(Ai1 ∩ Ai2 ∩ An+1) + ... + (−1)^{n+1} P(A1 ∩ ... ∩ An ∩ An+1).

Using the induction hypothesis once more, now on ⋃_{i=1}^n Ai, we obtain

P(⋃_{i=1}^{n+1} Ai) = Σ_{i=1}^n P(Ai) + P(An+1) − Σ_{1≤i1<i2≤n} P(Ai1 ∩ Ai2) + ... − (Σ_{i=1}^n P(Ai ∩ An+1) − Σ_{1≤i1<i2≤n} P(Ai1 ∩ Ai2 ∩ An+1) + ...).

Collecting the terms of equal order on the right-hand side gives exactly the inclusion-exclusion formula for the n + 1 events A1, ..., An+1, which completes the induction step.

As an application, we ask for the probability that a randomly chosen permutation σ : {1, 2, ..., n} → {1, 2, ..., n} has a fixed point (i.e. some element i ∈ {1, 2, ..., n} with σ(i) = i).

It makes sense to set the sample space Ω to be the space of all different permutations of {1, 2, ..., n}. It has |Ω| = n! elements. If Ei := {σ ∈ Ω : σ(i) = i} denotes the event that i is a fixed point, then

P(Ei1 ∩ ... ∩ Eik) = (n − k)!/n!

for all 1 ≤ i1 < i2 < ... < ik ≤ n. Therefore Lemma 2.1.2 gives

P(⋃_{i=1}^n Ei) = Σ_{k=1}^n (−1)^{k+1} (n choose k) (n − k)!/n! = Σ_{k=1}^n (−1)^{k+1}/k!,

which converges to 1 − e⁻¹ as n → ∞.
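The fixed-point probability can be checked by brute force against the inclusion-exclusion sum; a sketch for n = 6 (the choice of n is illustrative only):

```python
from itertools import permutations
from math import factorial

n = 6
# Brute force: fraction of permutations of {0, ..., n-1} with a fixed point.
perms = permutations(range(n))
brute = sum(any(s[i] == i for i in range(n)) for s in perms) / factorial(n)

# Inclusion-exclusion: sum_{k=1}^{n} (-1)^{k+1} / k!.
incl_excl = sum((-1) ** (k + 1) / factorial(k) for k in range(1, n + 1))

assert abs(brute - incl_excl) < 1e-12
print(brute)   # close to 1 - 1/e for moderate n already
```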

Lemma 2.1.3 (Continuity of probability measures). Let (Ω, P) be a discrete probability space.

(i) If (An)n≥1 is an increasing sequence of events (i.e. A1 ⊂ A2 ⊂ ...), then

lim_{n→∞} P(An) = P(⋃_{n=1}^∞ An).

(ii) If (An)n≥1 is a decreasing sequence of events (i.e. A1 ⊃ A2 ⊃ ...), then

lim_{n→∞} P(An) = P(⋂_{n=1}^∞ An).
Proof. To see (i), set

B1 := A1, B2 := A2 \ B1, ..., Bn := An \ ⋃_{i=1}^{n−1} Bi, ...

Note that the Bi are pairwise disjoint, ⋃_{i=1}^∞ Bi = ⋃_{i=1}^∞ Ai, and ⋃_{i=1}^n Bi = ⋃_{i=1}^n Ai = An for all n. Then

P(⋃_{i=1}^∞ Ai) = P(⋃_{i=1}^∞ Bi) = Σ_{i=1}^∞ P(Bi) = lim_{n→∞} Σ_{i=1}^n P(Bi) = lim_{n→∞} P(⋃_{i=1}^n Bi) = lim_{n→∞} P(An).
Statement (ii) is left as an exercise, just use complements.
2.2 Conditional probability and independence
Sometimes we will only have partial information about an experiment; for instance, we may have to impose or assume certain conditions in order to discuss statistical features. If these conditions are nonrandom, we can somehow ignore them by choosing an adequate model. But sometimes these conditions themselves are random, i.e. varying with the outcome of the experiment, and we need an appropriate probabilistic model to reflect this.
Examples 2.2.1. Consider a group of one hundred adults; twenty are women and eighty are men, fifteen of the women are employed and twenty of the men are employed. We randomly select a person and find out whether she or he is employed. If employed, what is the probability that the selected person is a woman?

Without the information on employment, we would have selected a woman with probability 20/100 = 1/5.

Knowing the selected person is employed, we can pass to another sample space, now consisting of the thirty-five employed people, and obtain the probability 15/35 = 3/7 of having selected a woman.

In general it is more flexible to keep the sample space fixed but to implement the given information by conditioning.

Definition 2.2.1. Let (Ω, P) be a discrete probability space. Given two events A and B with P(B) > 0, the conditional probability of A given B is defined as

P(A|B) := P(A ∩ B) / P(B).
Examples 2.2.2. For the previous example, set

A := {a woman is selected}

and

B := {the selected person is employed}

to see P(A|B) = 3/7.
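The computation in Examples 2.2.1 and 2.2.2 follows directly from Definition 2.2.1; a sketch (the list encoding of the group is my own choice):

```python
from fractions import Fraction

# 100 people: 20 women (15 employed), 80 men (20 employed), chosen uniformly.
people = ([("woman", True)] * 15 + [("woman", False)] * 5
          + [("man", True)] * 20 + [("man", False)] * 60)
P = lambda A: Fraction(sum(1 for x in people if A(x)), len(people))

A = lambda x: x[0] == "woman"   # event: a woman is selected
B = lambda x: x[1]              # event: the selected person is employed

cond = P(lambda x: A(x) and B(x)) / P(B)   # P(A|B) := P(A n B) / P(B)
print(cond)   # 3/7
```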
The next lemma is (almost) obvious.

Lemma 2.2.1. Let (Ω, P) be a discrete probability space and B an event with P(B) > 0. Then P(·|B) is again a probability measure.

We have P(B|B) = P(Ω|B) = 1. Moreover, statements (i)-(vi) of Lemma 2.1.1 and the statement of Lemma 2.1.2 remain valid for P(·|B) in place of P.
More specific rules for conditional probability are the following.
Lemma 2.2.2. Let (Ω, P) be a discrete probability space.

(i) If A and B both are events with positive probability, then

P(B|A) = P(A|B) P(B) / P(A).

(ii) If B1, ..., Bn are pairwise disjoint events of positive probability and such that Ω = ⋃_{i=1}^n Bi, then we have

P(A) = Σ_{i=1}^n P(A|Bi) P(Bi)

for any event A.

(iii) If B1, B2, ... are pairwise disjoint events of positive probability and such that Ω = ⋃_{i=1}^∞ Bi, then we have

P(A) = Σ_{i=1}^∞ P(A|Bi) P(Bi)

for any event A.

(iv) For any events A1, ..., An with P(⋂_{i=1}^{n−1} Ai) > 0 we have

P(⋂_{i=1}^n Ai) = P(A1) P(A2|A1) P(A3|A1 ∩ A2) ··· P(An | ⋂_{i=1}^{n−1} Ai).
Exercise 2.2.1. Prove Lemma 2.2.2.
(ii) and (iii) are called the law of total probability, and (i) is referred to as Bayes' rule.
Examples 2.2.3. We consider a medical test. Patients are being tested for a disease. For a patient sick with the disease the test will be positive in 99% of all cases. In 2% of all cases a healthy patient is tested positive. Statistical data show that one out of a thousand patients really gets sick with the disease. We would like to know the probability that a positively tested patient indeed is sick.

We write S for the event {patient is sick}, + for {patient is tested positive} and − for {patient is tested negative}. We know that

P(S) = 0.001, P(+|S) = 0.99 and P(+|Sᶜ) = 0.02.

We are looking for P(S|+). By the law of total probability,

P(+) = P(+|S) P(S) + P(+|Sᶜ) P(Sᶜ),

and therefore

P(S|+) = P(S ∩ +) / P(+) = P(+|S) P(S) / P(+) = (0.99 · 0.001) / (0.99 · 0.001 + 0.02 · 0.999),

which is approximately 1/20. This suggests that while the test may give some indication and serve as a first diagnostic tool, it is too inaccurate to allow the conclusion of a diagnosis without performing further examinations.

Exercise 2.2.2. About 5% of all men and 1% of all women suffer from dichromatism. Suppose that 60% of a group are women. If a person is randomly selected from that group, what is the probability that this individual suffers from dichromatism?
In our experiment or observation it may happen that the relative frequency of the occurrence of some event A is not affected by whether some other given event B occurs or not. In other words, if n is the total number of trials, nA and nB denote the numbers of trials in which A and B occur, respectively, and n_{A∩B} the number of trials in which both occur, we observe that the numbers

nA/n and n_{A∩B}/nB

are close to each other, at least for large n. A heuristic rearrangement of this relation is to say that the numbers

n_{A∩B}/n and (nA/n) · (nB/n)

are close. In the mathematical model this is formalized by the notion of independence.
Definition 2.2.2. Let (Ω, P) be a discrete probability space. Two events A and B are called independent if

P(A ∩ B) = P(A) P(B).

Events A1, ..., An are called independent if

P(Aj1 ∩ ... ∩ Ajk) = Π_{i=1}^k P(Aji)

for all distinct j1, ..., jk ∈ {1, ..., n}. The events A1, ..., An are called pairwise independent if for any two different j, k ∈ {1, ..., n} the two events Aj and Ak are independent.
(At first glance independence looks like some algebraic relation. In a sense, this intuition is not all that wrong ... maybe at some point you will encounter or have encountered product measures, characteristic functions, group characters, etc.)
Examples 2.2.4. Three events A, B and C are independent if

1. P(A ∩ B ∩ C) = P(A) P(B) P(C),

2. P(A ∩ B) = P(A) P(B),

3. P(A ∩ C) = P(A) P(C) and

4. P(B ∩ C) = P(B) P(C).
Obviously independence implies pairwise independence. In general the converse is false.
Exercise 2.2.3. Give an example of three events A, B, C that are pairwise independent but not independent.

Exercise 2.2.4. Verify that if two events A and B are independent, then also A and Bᶜ are independent, and Aᶜ and Bᶜ are independent.
Whether in a mathematical model two events are independent or not is a consequence of the choice of the probability measure.
Examples 2.2.5. We toss two coins, not necessarily fair. A suitable sample space is

Ω := {H, T}² = {(ω1, ω2) : ωi ∈ {H, T}, i = 1, 2}.

For a first model, let p, p′ ∈ (0, 1) and put

P({(H, H)}) = pp′,
P({(T, H)}) = (1 − p)p′,
P({(H, T)}) = p(1 − p′),
P({(T, T)}) = (1 − p)(1 − p′).     (2.1)

Then the events {ω1 = H} and {ω2 = H} are independent under P with

P({ω1 = H}) = p and P({ω2 = H}) = p′.     (2.2)

Conversely, it is not difficult to see that if we require these events to be independent under a probability measure P and to have (2.2), then P has to be as in (2.1).
For a second model, let the probability to get heads in the first trial be p ∈ (0, 1). If the first trial gives heads, then let the probability to get heads in the second trial be p′ ∈ (0, 1); otherwise let it be q ∈ (0, 1) with q ≠ p′. For a probability measure Q on Ω satisfying these ramifications we must have

Q({ω1 = H}) = p,
Q({ω2 = H} | {ω1 = H}) = p′,
Q({ω2 = H} | {ω1 = T}) = q.

This yields

Q({ω1 = H} ∩ {ω2 = H}) = Q({ω2 = H} | {ω1 = H}) Q({ω1 = H}) = p′p.

But on the other hand,

Q({ω2 = H}) = Q({ω2 = H} | {ω1 = H}) Q({ω1 = H}) + Q({ω2 = H} | {ω1 = T}) Q({ω1 = T}) = p′p + q(1 − p),

and therefore

Q({ω1 = H}) Q({ω2 = H}) = p(p′p + q(1 − p)) ≠ p′p = Q({ω1 = H} ∩ {ω2 = H}),

i.e. the events {ω1 = H} and {ω2 = H} are not independent under Q.
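The second model can be checked numerically; a sketch with the illustrative values p = 1/2, p′ = 1/3, q = 2/3 (any q ≠ p′ works):

```python
from fractions import Fraction

p, pp, q = Fraction(1, 2), Fraction(1, 3), Fraction(2, 3)   # q != p'

# The measure Q on {H,T}^2 built from the conditional description.
Q = {("H", "H"): p * pp,        ("H", "T"): p * (1 - pp),
     ("T", "H"): (1 - p) * q,   ("T", "T"): (1 - p) * (1 - q)}
assert sum(Q.values()) == 1

first_H = Q[("H", "H")] + Q[("H", "T")]    # Q(first toss is H)
second_H = Q[("H", "H")] + Q[("T", "H")]   # Q(second toss is H)
joint = Q[("H", "H")]                      # Q(both are H)

assert joint != first_H * second_H         # not independent under Q
print(joint, first_H * second_H)
```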
2.3 Discrete random variables
When performing and observing an experiment it is often useful to filter or rearrange information, or to change perspective. For instance, we might measure a temperature, viewed as the random outcome of the experiment, and want to calculate a reaction intensity that depends on the given temperature. Then this reaction intensity itself will be random. This, more or less, is the concept of a random variable, or more generally, a random element.

Definition 2.3.1. Let (Ω, P) be a discrete probability space and E ≠ ∅. A function X : Ω → E is called a random element with values in E. A function X : Ω → R is called a random variable.

A random variable is a random element with values in E = R.
Notation: We agree to write

{X ∈ B} := {ω ∈ Ω : X(ω) ∈ B}

for any random element X with values in E and any B ⊂ E, and in the special case B = {x} also

{X = x} := {ω ∈ Ω : X(ω) = x}.

Similarly, we agree to write

P(X ∈ B) := P({X ∈ B}),

and in the special case B = {x} also

P(X = x) := P({X = x}).

These abbreviations are customary.
Examples 2.3.1. Given an event A ⊂ Ω, set

1A(ω) := 1 if ω ∈ A, and 1A(ω) := 0 if ω ∈ Aᶜ.

This defines a random variable 1A on Ω, usually referred to as the indicator function of the event A. Note that {1A = 1} = A, {1A = 0} = Aᶜ and {1A = x} = ∅ for any x ∈ {0, 1}ᶜ.
Since Ω is discrete, any random element X on Ω with values in a set E can have at most a countable number of different values x ∈ E. If we enumerate these countably many different values by {xj}j≥1, then the events {X = xj} are pairwise disjoint and Ω = ⋃_{j≥1} {X = xj}. If X attains only finitely many different values, then of course the same is true with a finite number n in place of ∞.

If X is a random variable, we may rewrite it as

X = Σ_{j≥1} xj 1_{X=xj},

where 1_{X=xj} is the indicator function of the event {X = xj}.
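The decomposition of a random variable into indicators can be verified on a toy example (the particular Ω and X are illustrative only):

```python
# A random variable on a small discrete sample space, rebuilt as the
# sum of x_j * 1_{X = x_j} over its distinct values x_j.
Omega = [1, 2, 3, 4, 5, 6]
X = lambda w: w % 3                       # some random variable

values = sorted({X(w) for w in Omega})    # the distinct values x_j
indicator = lambda A: (lambda w: 1 if w in A else 0)

def X_rebuilt(w):
    return sum(x * indicator({v for v in Omega if X(v) == x})(w)
               for x in values)

assert all(X(w) == X_rebuilt(w) for w in Omega)
print(values)   # [0, 1, 2]
```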
2.4 Bernoulli trials

Chapter 3
Absolutely continuous distributions
Chapter 4
Measure theory and Lebesgue integration
Chapter 5
Product spaces and independence
Chapter 6
Weak convergence and the Central Limit Theorem
Chapter 7
Conditioning
Chapter 8
Martingale theory