Lecture Notes in Statistics 121 Chapter 1 (Daquis)


  • 8/9/2019 Lecture Notes in Statistics 121 Chapter 1 (Daquis)


    LECTURE NOTES

    Statistics 121

    Probability Theory I

    John Carlo P. Daquis

    Assistant Professor 1, UP School of Statistics


    Whenever I look through my notes on

    probability distributions, my eyes will

    look for her.

    For I am always fascinated by the

    normal distribution

    the elegance she has formed

    from irrationalities,

    the majestic curve that flows

    from a point in eternity

    towards the other unfathomable

    infinity,

    the power emanating from her

    that challenges the impossible.

    She has given me the power that love

    could only give:

    Now I can defy probabilities.

    MAJC


    SCHOOL OF STATISTICS

    University of the Philippines Diliman

    Statistics 121 (3 Units)

    PROBABILITY THEORY 1

    COURSE SYLLABUS

    First Semester, School Year 2014-2015

    Instructor: John Carlo P. Daquis

    Instructor Information

    OFFICE : School of Statistics Faculty Room 26

    OFFICE HOURS : 1:00-3:00, Tuesday to Friday

    OFFICE PHONE : (+632) 928 0881

    OFFICE WEBSITE : stat.upd.edu.ph

    E-MAIL ADDRESS : [email protected]

    [email protected]

    CLASS HOURS : 8:30-10:00 (WFR)

    10:00-11:30 (WFU)

    SCHEDULE OF OTHER CLASSES : 8:30-11:30 (TTh)

    Student Information Card

    On a 3x5 index card, provide the following:

    FRONT, left side:
    Last Name, Given Name M.I.
    Mobile Number
    Email Address
    Person to contact in case of emergency
    Contact number of the person

    FRONT, upper right:
    a recognizable 1x1 photo
    nickname below photo

    BACK:
    class schedule


    Course Objectives

    By the end of the course, a student enrolled in this class must have a heartfelt and working knowledge of all the lessons taught in class. The student must also acquire advanced skill in logical reasoning. In particular, one must be able to:

    - Define probability and other concepts such as conditionality, independence, random variables and probability distributions;

    - Master the basic properties of the probability function;

    - Compute the probability of an event;

    - Use calculus skills in obtaining the distribution function from a density or mass function and vice versa;

    - Evaluate expectations and moments;

    - Be well aware of some special univariate distributions and their basic properties; and

    - Derive the distribution of a function of a random variable.

    Course Prerequisites

    Course Prerequisites:

    Statistics 117/Mathematics for Statistics (or equiv.)

    - For proving methods, cardinality and basic combinatorics, set theory and evaluating sums

    Mathematics 53/Elementary Analysis 1

    - For evaluating limits, derivatives and integrals

    Course Co-requisite:

    Mathematics 54/Elementary Analysis 2

    Course Requirements

    Requirement              Breakdown
    3 Long Exams             60%
    1 Final Exam             15%
    Problem Sets              7%
    Peer Eval. of PS          3%
    Quizzes/Assignments      10%
    Attendance                5%

    Long Exams

    Materials needed: 2 blue books, ballpens, pencils, calculators, tables (if necessary), a fully-functioning brain

    Duration: 3 hours

    Schedule: to be announced, but will be on a Saturday or Monday

    Coverage: Our course will span 5 chapters (please see the table of contents). Long exam 1 will cover chapters 1 and 2, long exam 2 will cover chapters 3 and 4, and long exam 3 will cover the last chapter.

    Final Exam

    Materials needed: ballpens, pencils, a fully-functioning brain

    Duration: 2 hours

    Coverage: Though the coverage is all five chapters, the final exam is an evaluation of how well you know the concepts. Thus, computation and proving are kept to a minimum.

    Exemptions: None.

    Problem Sets

    Number of members: 3 cooperative members. You must be cooperative, since 3% of your overall grade will be based on how you contributed to your group.

    Perks: There will be a special discussion session right before an exam. A sign-up sheet for volunteers will be posted at the door of my room. Boardwork volunteers who satisfactorily get a correct answer will be awarded a 5% bonus in the long exam. Volunteers whose solutions are wrong will still get bonus points, though reduced to 4%. The class will assist them in getting the right answer.

    Attendance, Assignments and Quizzes

    Attendance: Always checked. Again, being focused and attentive is imperative. You might miss important details.

    Assignments/Quizzes: The schedule for giving assignments or quizzes is unstructured. By default, assignments are done individually and are to be submitted at the next meeting, while quizzes are announced.

    Grading System

    95 100 1.00 72


    Chapter 1

    Probability


    1.1 Introduction: Modeling Reality

    1.1.1 Theories and Models

    Theories are ideas or concepts that are used to explain a phenomenon happening in the real world. Thus, theories are just approximations of reality and are not necessarily true. For example, it was once believed that the universe is always expanding while maintaining a constant average density. A universe following this Steady State Theory has no beginning or end. It has since been superseded by the Big Bang Theory, which says that the universe expanded from a singularity, a zone of infinite density. The Big Bang is now the prevailing theory of the universe's development, yet this does not mean that it is true, entirely or partially. The scientific process works to disprove or refine a theory.

    A model, on the other hand, is a theoretical construct representing a process or describing a phenomenon using a set of variables and the quantitative relationships between them. Models, though they are theoretical approximations of reality, are of great help in understanding what is happening in our world. Hence the famous quote by statistician George Box (1979): "essentially, all models are wrong, but some are useful."

    1.1.2 Deterministic and Probabilistic Models

    Suppose we wish to measure the area of a rectangular lot. Denoting the area by $R$, the formula we will be using is $R = lw$, where $l$ is the length of the lot and $w$ is its width. This is what we call a deterministic model. The word "deterministic" comes from the fact that once the length and width are known, the area is assumed to be known. The formula models reality because even though the lot is not perfectly rectangular, the quantitative relationship helps us approximate the area. Deterministic models describe a phenomenon which will always produce the same outcome, without any room for variation.


    Now consider an experiment of tossing a balanced coin. There are two possible outcomes: heads or tails. No matter what, one cannot perfectly predict what the next outcome will be. However, we can say that the chance that the next outcome will come up heads is 0.5. The model described is probabilistic in nature. It describes the fact that, in the long run, the proportion of tosses that come up heads is 0.5, but it cannot state with certainty what the next outcome will be. Probabilistic models describe different outcomes not by fixed values, but rather by taking into account the presence of randomness in a phenomenon.
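    The long-run behavior of the balanced coin can be illustrated with a short simulation. This is only a sketch: the function name, the seed and the numbers of tosses are assumptions made for illustration, and a pseudorandom generator stands in for the physical coin.

```python
import random

def relative_frequency_of_heads(n_tosses, seed=0):
    """Simulate n_tosses of a balanced coin and return the fraction of heads."""
    rng = random.Random(seed)  # fixed seed so that the run is reproducible
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

# The individual tosses are unpredictable, but the fraction of heads
# settles near 0.5 as the number of tosses grows.
for n in (10, 1_000, 100_000):
    print(n, relative_frequency_of_heads(n))
```

    No single toss can be predicted, yet the printed fractions should drift toward 0.5 as the number of tosses increases.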

    1.1.3 Applications of Probability

    Probability models are used to describe a phenomenon.

    Example: Population models often take into account the birth and

    death rates as well as the population size at a particular

    given time. One probabilistic model shows that it is very

    likely for a population to go extinct if the birth rate is

    equal to the death rate. The population will likely

    survive if the death rate is lower than the birth rate.

    Probability is a useful tool in decision-making.

    Example: Weather reports nowadays also include a chance of

    raining the next day. This would assist viewers in

    making decisions like whether or not to bring umbrellas

    and cancelling or pushing through an appointment

    tomorrow.

    Probability theory is the foundation of statistical inference.

    Example: A car manufacturer claims that their new car model is more efficient than the leading model in the market. To validate this claim, the manufacturer conducted an experiment, and the resulting sample mean mileage is indeed higher than the leading model's claimed mileage by 2 km/liter. Is the difference significant enough to support the


    manufacturer's claim? A probability distribution governs the behavior of the sample mean, and it tells us whether or not (given a significance level) the evidence is significant enough to support the claim.

    1.2 Classical Probability

    Probability theory originated in games of chance. If we toss an unbiased coin independently $n$ times, where $n$ is very large, then the ratio of the number of heads to the total number of tosses will be very close to 0.5. Similarly, the relative frequency of drawing a heart from a standard deck of 52 cards is 0.25.

    The classical definition of probability relies on the assumption that all possible outcomes of the activity are equally likely. The ratios are obtained a priori, even without doing the actual experiment, unlike in the relative frequency approach, which obtains the probability a posteriori.

    1.2.1 Random Experiment

    Consider the Magic 8-Ball toy, a novelty item whose primary use is to give random advice, making it a popular toy for fortune telling. Inside the ball is a buoyant icosahedron. Each of the twenty faces of this polyhedron holds an answer:

    Affirmative answers:
    It is certain (a1)
    It is decidedly so (a2)
    Without a doubt (a3)
    Yes definitely (a4)
    You may rely on it (a5)
    As I see it, yes (a6)
    Most likely (a7)
    Outlook good (a8)
    Yes (a9)
    Signs point to yes (a10)

    Neutral answers:
    Reply hazy try again (o1)
    Ask again later (o2)
    Better not tell you now (o3)
    Cannot predict now (o4)
    Concentrate and ask again (o5)

    Negative answers:
    My reply is no (n1)
    My sources say no (n2)
    Don't count on it (n3)
    Outlook not so good (n4)
    Very doubtful (n5)

    Shaking the Magic 8-Ball and waiting for an answer to show up is an example of a random experiment. A random experiment is an activity which can be repeated many times under the same conditions, but whose outcome cannot be predicted with certainty. Thus in a random experiment, the outcome can in no way be predicted from any previous outcomes.

    Here are some other examples of a random experiment:

    The tossing of a coin

    Rolling a die

    Selecting a numbered ball in an urn

    Spinning a bottle.

    1.2.2 Sample Space, Outcomes and Events

    A random experiment has a set of realizations or sample points. Each realization of a random experiment is called an outcome. All possible outcomes belong to a set called the sample space. Any subset of the sample space is an event.

    Definitions: Outcomes, denoted by $\omega$, are realizations of a random experiment. They are the elements of the sample space.

    Sample Space, denoted by $\Omega$, is the set of all possible outcomes for a random experiment listed in a mutually exclusive and exhaustive way.

    Events are any subsets of the sample space and are denoted by capital Latin letters.

    [Figure: No magic. The proprietary toy, Magic 8-Ball, is nothing but an instrument used in a random experiment.]

    Example: In the Magic 8-Ball experiment, there is a total of 20 outcomes. Consider the following:

    $\Omega$ = {a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, o1, o2, o3, o4, o5, n1, n2, n3, n4, n5}

    A = the event of observing the response "My sources say no" = {n2}

    B = the event of observing a neutral response = {o1, o2, o3, o4, o5}

    In the definition of a sample space, the phrase "mutually exclusive" means that there should be no overlap of outcomes. For example, the set

    $\Omega$ = {H, T, heads, tails}

    is not appropriate, since the outcome of seeing heads in the experiment of tossing a coin is represented by two elements of the set; hence the outcomes are not mutually exclusive. On the other hand, the word "exhaustive" means that all outcomes must be included in the sample space. The set

    $\Omega^*$ = {1, 2, 4, 5, 6}

    is not a valid sample space for the experiment of rolling a die because the outcome of three dots on the upper face is not represented in the set.

    Example: Consider the random experiment of tossing a coin thrice.

    $\Omega$ = {(HHH), (HHT), (HTH), (HTT), (THH), (THT), (TTH), (TTT)}

    $= \{\omega = (\omega_1, \omega_2, \omega_3) \mid \omega_i \in \{H, T\},\ i = 1, 2, 3\}$

    A = event of observing 2 heads

    = {(HHT), (HTH), (THH)}

    B = event of observing at least 2 tails

    = {(HTT), (THT), (TTH), (TTT)}

    C = event of observing the same face on the 1st and 3rd tosses

    = {(HHH), (HTH), (THT), (TTT)}

    $= \{\omega = (\omega_1, \omega_2, \omega_3) \mid \omega_i \in \{H, T\},\ i = 1, 2, 3 \text{ and } \omega_1 = \omega_3\}$
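    The sample space and events of the three-toss experiment can also be listed by enumeration. The sketch below assumes that each outcome is represented as a tuple of the characters "H" and "T"; the variable names are chosen here for illustration only.

```python
from itertools import product

# Sample space for three tosses: all ordered triples of H and T.
omega = set(product("HT", repeat=3))

A = {w for w in omega if w.count("H") == 2}   # 2 heads
B = {w for w in omega if w.count("T") >= 2}   # at least 2 tails
C = {w for w in omega if w[0] == w[2]}        # same face on 1st and 3rd tosses

for name, event in (("A", A), ("B", B), ("C", C)):
    print(name, sorted("".join(w) for w in event))
```

    Each event is built as a subset of the sample space by filtering on a condition, which mirrors the set-builder notation used for C.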

    The way of writing the sample space is not unique. For example, consider again the experiment of tossing a coin thrice. The set

    $\Omega_1 = \{0, 1, 2, 3\}$

    can be used as a sample space, especially when the attribute of interest is the number of heads.

    We say that an event $A$ occurs when one of its elements is the outcome of the experiment. That is, if we say that A = event of observing 2 heads occurs, then either HHT, HTH or THH is the outcome. Events having only one element are called elementary events, while events having more than one element are called compound events. Elementary and compound events will be formally defined later.

    So far, the examples above are all examples of discrete sample spaces, in particular, finite sample spaces. Suppose someone wants to measure the height of a person. The sample space for this experiment is

    $\Omega = \{\omega \mid \omega \in (0, \infty)\}$.

    Definitions: Discrete Sample Space is a sample space which is finite or countably infinite.

    Continuous Sample Space is a sample space which is neither finite nor countably infinite.


    Example: Consider a random experiment of observing the number of people going inside a mall on a particular day and the average waiting time, in seconds, between each person going inside the mall. The sample space can be represented as follows:

    $\Omega = \{\omega = (x, y) \mid x = 0, 1, 2, \ldots \text{ and } y \ge 0\}$.

    Here, the outcome (37,000, 0.394) is a point in the sample space wherein 37,000 people went inside the mall on a particular day, with an average waiting time of 0.394 seconds between each person going inside the mall.

    Exercise: Specify the correct sample space for the following

    random experiments and find their respective

    cardinalities:

    i. 6 balls are drawn without replacement from an

    urn filled with balls labeled 1 to 42, ordering is

    not important

    ii. 6 balls are drawn with replacement from an urn

    filled with balls labeled 1 to 42 and observing

    which element was selected at each draw

    iii. Measuring the weight of a newborn baby

    iv. Tossing a coin until a head comes up

    v. Rolling a die until the face with six dots comes

    up

    1.2.3 Basic Concepts of Set Theory

    Set theory concepts are important in learning probability. The sample space can be viewed as a set having all elements, or a universal set. Events are sets while outcomes are elements. Here is a review of the basic definitions in set theory.

    The occurrence of an event $A$ can be translated into the language of sets and can be schematically represented by its corresponding Venn diagram. Consider the following:

    Event language                             Set language
    The sample space                           universal set, $\Omega$
    Event A occurs                             set $A$
    Event A does not occur                     complement of A, $A^c$
    At least one of A and B occurs             union, $A \cup B$
    Both A and B occur                         intersection, $A \cap B$
    A and B cannot occur simultaneously        A and B disjoint, $A \cap B = \emptyset$
    A occurs but not B                         difference, $A - B = A \cap B^c$
    Either in A or in B but not both           symmetric difference, $(A \cup B) - (A \cap B) = A \,\Delta\, B$

    [Figure: Venn diagrams of the events above, with the region of interest shaded.]
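    The translation from events to sets can be tried out directly with Python's built-in set type. The die-rolling sample space and the two events below are hypothetical choices made only for this sketch.

```python
omega = {1, 2, 3, 4, 5, 6}   # universal set: faces of a die (assumed example)
A = {2, 4, 6}                # hypothetical event: an even face
B = {4, 5, 6}                # hypothetical event: a face above three

print("complement of A:     ", omega - A)
print("union:               ", A | B)
print("intersection:        ", A & B)
print("difference A - B:    ", A - B)
print("symmetric difference:", A ^ B)

# The two identities from the table above.
print(A - B == A & (omega - B))
print(A ^ B == (A | B) - (A & B))
```

    The operators `|`, `&`, `-` and `^` correspond to union, intersection, difference and symmetric difference, respectively.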


    Definition: An event $E$ is said to be a simple or elementary event if and only if

    i. $E \neq \emptyset$, and

    ii. for every event $A \subseteq \Omega$, $E \cap A = \emptyset$ or $E \cap A = E$.

    A compound event is an event that can be expressed as the union of distinct elementary events.

    These set operations can now be used to form new events from old or existing ones. This process is called event composition.

    Exercise: Consider the random experiment of tossing a coin thrice. Let $B_i$ = event that the $i$th toss is a head, $i = 1, 2, 3$. Express the following events in terms of the $B_i$'s:

    C = the 1st toss is a head

    B = the 1st two tosses are heads

    D = there is exactly one toss which results in a head

    E = there is at least one toss which results in a head

    F = there is at most one toss which results in a head

    G = the second toss is a tail and the third toss is a head

    H = the second toss is a tail and there is at most one head

    I = the second toss is a tail and there is exactly one head

    J = all tosses are tails

    Exercise: Let $A$, $B$ and $C$ be arbitrary events in $\Omega$. Define the following events in terms of $A$, $B$ and $C$:

    D1 = event that (et) at least two of the events A, B, C occur

    D2 = et exactly two of the events A, B, C occur

    D3 = et at least one of the events A, B, C occurs

    D4 = et exactly one of the events A, B, C occurs

    D5 = et at most two of the events A, B, C occur

    Now suppose that A, B and C are disjoint. Find suitable expressions for D1 to D5.

    1.2.4 Generalized Union and Intersection of Events

    The union of events $B_1, B_2, \ldots, B_n$ is the event consisting of outcomes which belong to at least one of the events $B_1, B_2, \ldots, B_n$.

    Finite Union: $$\bigcup_{i=1}^{n} B_i = B_1 \cup B_2 \cup \cdots \cup B_n$$

    Similarly, define the union of a countably infinite sequence of events $B_1, B_2, \ldots$ as the event whose outcomes belong to at least one of the events $B_1, B_2, \ldots$.

    Countable Union: $$\bigcup_{i=1}^{\infty} B_i = B_1 \cup B_2 \cup \cdots$$

    The intersection of events $B_1, B_2, \ldots, B_n$ is the event consisting of outcomes which belong to all of the events $B_1, B_2, \ldots, B_n$.

    Finite Intersection: $$\bigcap_{i=1}^{n} B_i = B_1 \cap B_2 \cap \cdots \cap B_n$$

    Similarly, define the intersection of a countably infinite sequence of events $B_1, B_2, \ldots$ as the event whose outcomes belong to all of the events $B_1, B_2, \ldots$.

    Countable Intersection: $$\bigcap_{i=1}^{\infty} B_i = B_1 \cap B_2 \cap \cdots$$

    Exercise: Prove that De Morgan's Laws are valid for a finite collection of sets. The same argument can be extended to a countable collection of sets by considering an infinite sequence of events rather than a finite one.
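    Before writing the proof, it may help to check De Morgan's Laws numerically on one small case. The universal set and the three events below are arbitrary assumptions made for this sketch; a check on one example is not a proof.

```python
from functools import reduce

omega = set(range(10))                        # small universal set (assumed)
events = [{0, 1, 2}, {2, 3, 4}, {4, 5, 6}]    # arbitrary finite collection

union = reduce(set.union, events)
intersection = reduce(set.intersection, events)
complements = [omega - b for b in events]

# Complement of the union equals the intersection of the complements.
law1 = (omega - union) == reduce(set.intersection, complements)
# Complement of the intersection equals the union of the complements.
law2 = (omega - intersection) == reduce(set.union, complements)
print(law1, law2)
```

    Changing `events` to any other finite list of subsets of `omega` should leave both comparisons true, which is what the exercise asks to be proved in general.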


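    Example: A couple plans to have 7 children. Define the event

    $G_i$ = event that the $i$th child is a girl.

    The following events can be expressed in terms of the $G_i$ as follows:

    A = et all children are girls

    $$A = \bigcap_{i=1}^{7} G_i$$

    B = et all children are boys

    $$B = \bigcap_{i=1}^{7} G_i^c = \Big(\bigcup_{i=1}^{7} G_i\Big)^c$$

    C = et only the first child is a girl

    $$C = G_1 \cap \Big(\bigcup_{i=2}^{7} G_i\Big)^c$$

    D = et there is only 1 girl among the seven children

    $$D = \bigcup_{i=1}^{7} \Big[ G_i \cap \Big(\bigcup_{j \neq i} G_j\Big)^c \Big]$$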

    Exercise: Define the sample space as the set of nonnegative real

    numbers. Let

    K

    =

    =

    =

    Evaluate the following sets:

    1.

    U

    =

    2. I

    =


    1.2.5 Methods of Assigning Probabilities

    Let us now introduce the notation $P(\cdot)$ for probability. For example, the probability of an event $A$ is $P(A)$.

    Definition: The classical probability or a priori approach in assigning probabilities considers a random experiment with $n(\Omega)$ equally likely outcomes. Consider an event $A$ with $n(A)$ outcomes, then

    $$P(A) = \frac{n(A)}{n(\Omega)}.$$

    Classical probability assumes that all the outcomes are equally likely and that the sample space is finite. The sample space should therefore be defined in such a way that the outcomes are equally likely to happen. As will be discussed later, the classical definition of probability is too restrictive and circular.

    Example: A bingo shaker contains three B chips labeled B1, B2 and B3, two N chips labeled N31 and N32, and one O chip labeled O61. A second bingo shaker has a B4 chip, two N chips labeled N33 and N34, and three O chips labeled O62, O63 and O64. One chip is selected randomly from each shaker. The sample space in this random experiment is as follows:

    $\Omega = \{\omega = (x, y) \mid x \in \{B1, B2, B3, N31, N32, O61\} \text{ and } y \in \{B4, N33, N34, O62, O63, O64\}\}$

    All outcomes are equally likely because selection is done at random. Also, $n(\Omega) = (6)(6) = 36$.

    Define the following events:

    A = event of selecting 2 B chips

    B = event of selecting 2 N chips


    Now,

    $P(A) = n(A)/n(\Omega) = (3)(1)/36 = 3/36$

    $P(B) = n(B)/n(\Omega) = (2)(2)/36 = 4/36$
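    The counts in the bingo example can be verified by enumerating the sample space. In this sketch the chip labels are stored as strings, the second shaker is taken to hold the B4 chip as described in the example, and the helper name `classical_probability` is an assumption made for illustration.

```python
from fractions import Fraction
from itertools import product

shaker1 = ["B1", "B2", "B3", "N31", "N32", "O61"]
shaker2 = ["B4", "N33", "N34", "O62", "O63", "O64"]
omega = list(product(shaker1, shaker2))   # 36 equally likely pairs (x, y)

def classical_probability(event):
    """Return n(event) / n(omega); valid only for equally likely outcomes."""
    return Fraction(len(event), len(omega))

A = [(x, y) for x, y in omega if x[0] == "B" and y[0] == "B"]  # 2 B chips
B = [(x, y) for x, y in omega if x[0] == "N" and y[0] == "N"]  # 2 N chips

print(len(omega), len(A), len(B))
print(classical_probability(A), classical_probability(B))
```

    `Fraction` reduces its results to lowest terms, so the printed values are the reduced forms of 3/36 and 4/36.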

    The probabilities in the example above can be obtained because the sample space is appropriately defined with equally likely outcomes. If the sample space is instead defined as $\Omega = \{\omega = (x, y) \mid x \in \{B, N, O\} \text{ and } y \in \{B, N, O\}\}$, the outcomes are no longer equally likely to occur. We have just seen a case where P({(BB)})


    2000000 333275 0.1666375

    Notice that as the number of trials increases, the relative frequency approaches a certain value, 1/6. Thus, the relative frequency is a good probability estimate when $n$ is large enough.

    Doing the experiment and relying solely on the relative frequency does not tell us that the probability is exactly 1/6. The value could have been 0.16666793. But classical probability gives evidence that the limit of the relative frequency as the number of trials approaches infinity is indeed 1/6.

    Definition: In Subjective Probability, $P(A)$ is derived from an individual's personal judgment. If the individual feels that the event is more likely to occur, then $P(A)$ is close to 1. If it is less likely to occur, then $P(A)$ is close to 0.

    1.3 The Axiomatic Definition of Probability

    The classical definition of probability has two main flaws: it is too restrictive and it is circular. For probabilities to be defined, classical probability restricts the outcomes of a random experiment to be equally likely (e.g. assume the coin is fair, the die is balanced, etc.). The equally likely assumption also makes the definition circular. "Equally likely" means "equally probable": the very concept being defined is used as an assumption. Thus there is a need to provide a definition of probability which is free from these flaws while still not contradicting the classical one.

    1.3.1 The Event Space

    Not all subsets of the sample space can be considered as events. Some subsets are not of interest or cannot be measured. This is not apparent when considering random experiments like tossing a coin or rolling a die. But when measuring length, where the sample space is the set of positive real numbers, we can consider events like observing a length between 48 and 90 inches, but not events such as observing irrational numbers between 48 and 90 inches. The events under consideration form a class which is called the event space.

    Definition: The event space $\mathcal{F}$ is the class of all events associated with a given experiment. For mathematical consistency, we consider the event space to be a $\sigma$-algebra. That is,

    i. $\Omega \in \mathcal{F}$. The sample space is an event. Since it contains all possible outcomes, $\Omega$ is called the sure event.

    ii. Closure under complementation. If $A \in \mathcal{F}$ then $A^c \in \mathcal{F}$.

    iii. Closure under countable union. If $A_1, A_2, A_3, \ldots$ is a sequence of events belonging to $\mathcal{F}$, then $\bigcup_{i=1}^{\infty} A_i \in \mathcal{F}$.

    Theorem: $\emptyset \in \mathcal{F}$. The empty set is also called the impossible event. This follows from i and ii.

    Theorem: $\mathcal{F}$ is closed under finite union. Consider a sequence of events $A_1, A_2, \ldots, A_k, A_{k+1}, A_{k+2}, \ldots$ where $A_{k+i} = \emptyset$ for $i = 1, 2, \ldots$. By iii, $\bigcup_{i=1}^{\infty} A_i = \bigcup_{i=1}^{k} A_i \in \mathcal{F}$.

    Theorem: $\mathcal{F}$ is closed under finite intersection. (pf. exercise)

    Theorem: $\mathcal{F}$ is closed under countable intersection. (pf. exercise)
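    On a finite sample space, the three conditions can be checked mechanically. The function below is a sketch written for this purpose; its name and the coin-toss families are assumptions. Because the sample space is finite, closure under countable union reduces to closure under pairwise union.

```python
from itertools import combinations

def is_sigma_algebra(omega, family):
    """Check conditions i to iii for a family of subsets of a finite omega."""
    omega = frozenset(omega)
    family = {frozenset(s) for s in family}
    if omega not in family:                             # i. sure event
        return False
    if any((omega - a) not in family for a in family):  # ii. complements
        return False
    # iii. for a finite family, pairwise unions are enough
    return all((a | b) in family for a, b in combinations(family, 2))

omega = {"H", "T"}
print(is_sigma_algebra(omega, [set(), {"H"}, {"T"}, {"H", "T"}]))
print(is_sigma_algebra(omega, [set(), {"H"}, {"H", "T"}]))
```

    The second family fails condition ii, since the complement of {H} is missing from it.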

    1.3.2 Mutually Exclusive Events and Partition of a Set

    Definition: Define a sequence of events $A_1, A_2, A_3, \ldots$ in $\mathcal{F}$. The events in the sequence are mutually exclusive if all pairs $A_i$ and $A_j$ are mutually exclusive. That is,

    $$A_i \cap A_j = \emptyset \quad \text{for all } i \neq j.$$

    It is impossible for any two or any number of events in the sequence to happen at the same time. Mutually exclusive events are also called pairwise disjoint events. Consequently, the definition of mutually exclusive events also holds for a finite number of events.

    Example: In a standard deck of 52 cards, the event of observing

    an Ace and the event of observing a king are two

    mutually exclusive events since there is no card which

    is both an Ace and a king. On the other hand, the event

    of observing a king and the event of observing a spade

    suit are not mutually exclusive events because of the

    fact that there is a king of spades.

    Definition: The sets $A_1, A_2, A_3, \ldots, A_n$ in $\mathcal{F}$ are said to be a partition of a set $A$ if and only if the following conditions hold:

    i. Nonempty: $A_i \neq \emptyset$, $i = 1, 2, \ldots, n$,

    ii. $\bigcup_{i=1}^{n} A_i = A$, and

    iii. $A_1, A_2, A_3, \ldots, A_n$ are pairwise disjoint.

    Exercise: Let $A$ and $B$ be sets in $\mathcal{F}$. Show that the sets $(A - B)$, $A \cap B$ and $(B - A)$ form a partition of $A \cup B$. Assume that i in the definition is true.
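    The three conditions of a partition can be checked for concrete finite sets. The helper `is_partition` and the sets `A` and `B` below are hypothetical and serve only as a sketch; the example uses the sets from the exercise, taking the set being partitioned to be the union of A and B.

```python
def is_partition(parts, whole):
    """True if parts are nonempty, pairwise disjoint, and their union is whole."""
    parts = [set(p) for p in parts]
    union = set().union(*parts)
    nonempty = all(parts)                                # i. no empty part
    covers = union == set(whole)                         # ii. union is the whole set
    disjoint = sum(len(p) for p in parts) == len(union)  # iii. no overlaps
    return nonempty and covers and disjoint

A = {1, 2, 3, 4}   # hypothetical sets
B = {3, 4, 5}
print(is_partition([A - B, A & B, B - A], A | B))
print(is_partition([A, B], A | B))
```

    The second call returns False because A and B overlap, so condition iii fails.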

    1.3.3 The Axiomatic Definition of Probability

    In his book Foundations of the Theory of Probability, Andrey N. Kolmogorov laid the groundwork for modern probability.

    Definition: A probability function $P(\cdot)$ is a set function which assigns a number $P(A)$ to every $A$ in the $\sigma$-field $\mathcal{F}$, satisfying the following axioms:

    i. $P(A) \ge 0$ for every $A \in \mathcal{F}$,


    $$P(\emptyset) = P(\emptyset) + \sum_{i=2}^{\infty} P(\emptyset)$$

    $$\sum_{i=2}^{\infty} P(\emptyset) = 0$$

    Since the probability measure is nonnegative (axiom i), the last line holds if and only if $P(\emptyset) = 0$.

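    Theorem: Finite Additivity. Let $A_1, A_2, \ldots, A_n$ be a collection of mutually exclusive events in $\mathcal{F}$, then

    $$P\Big(\bigcup_{i=1}^{n} A_i\Big) = \sum_{i=1}^{n} P(A_i).$$

    Proof: Let $A_1, A_2, \ldots, A_n$ be a collection of mutually exclusive events in $\mathcal{F}$. Define a countably infinite sequence $\{A_i\} = A_1, A_2, \ldots, A_n, A_{n+1}, \ldots$ in $\mathcal{F}$ such that $A_i = \emptyset$ for $i > n$. By the bound law and the assumption on $A_1, A_2, \ldots, A_n$, the events in the sequence $\{A_i\}$ are mutually exclusive. Also by the bound law,

    $$\bigcup_{i=1}^{\infty} A_i = \Big(\bigcup_{i=1}^{n} A_i\Big) \cup \Big(\bigcup_{i=n+1}^{\infty} A_i\Big) = \Big(\bigcup_{i=1}^{n} A_i\Big) \cup \emptyset = \bigcup_{i=1}^{n} A_i.$$

    Now, (supply justifications)

    $$P\Big(\bigcup_{i=1}^{n} A_i\Big) = P\Big(\bigcup_{i=1}^{\infty} A_i\Big) = \sum_{i=1}^{\infty} P(A_i)$$

    $$= \sum_{i=1}^{n} P(A_i) + \sum_{i=n+1}^{\infty} P(A_i) = \sum_{i=1}^{n} P(A_i) + \sum_{i=n+1}^{\infty} P(\emptyset)$$

    $$= \sum_{i=1}^{n} P(A_i) + 0 = \sum_{i=1}^{n} P(A_i).$$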

    The example below shows that the classical probability assumption of

    equally likely outcomes need not hold when obtaining probabilities

    using the axiomatic definition. Another example shows that the


    axiomatic probability definition is consistent with the classical

    probability definition.

    Example: Consider tossing a biased coin where the event of observing a head is twice as likely as the event of observing a tail. Find the probability of observing a tail.

    This is a random experiment where the outcomes are not equally probable. Still, $\Omega = \{H, T\}$. Define the event $A = \{T\}$. If we let $P(A) = p$, then $P(A^c) = P(\{H\}) = 2p$.

    Now to find the value of $p$,

    $$P(\Omega) = P(A \cup A^c) = 1$$

    $$P(A \cup A^c) = P(A) + P(A^c) = p + 2p = 3p = 1, \quad \text{so } p = \frac{1}{3}.$$

    Example: Consider a sample space with $n$ equally likely outcomes. Show that the probability of an event $A$ is consistent with the classical definition of probability.

    Define $\Omega = \{\omega_1, \omega_2, \ldots, \omega_n\}$ so that $n(\Omega) = n$. Let $E_1, E_2, \ldots, E_n$ be $n$ equally likely events where $E_i = \{\omega_i\}$, $i = 1, 2, \ldots, n$. That is, the $E_i$'s form a partition of the sample space. Define $P(E_i) = p$ for all $i$.

    Now,

    $$1 = P(\Omega) = P\Big(\bigcup_{i=1}^{n} E_i\Big) = \sum_{i=1}^{n} P(E_i) = \sum_{i=1}^{n} p = np, \quad \text{so } p = \frac{1}{n}.$$

    Using the result above, we get

    $$P(A) = P\Big(\bigcup_{\omega_i \in A} E_i\Big) = \sum_{\omega_i \in A} P(E_i) = n(A) \cdot \frac{1}{n}$$

    $$P(A) = \frac{n(A)}{n(\Omega)}.$$

    Theorem: Probability of a Complement. For any event $A \in \mathcal{F}$, $P(A^c) = 1 - P(A)$.

    Proof: Let $A \in \mathcal{F}$. By the complement law, $A \cup A^c = \Omega$, so that (supply justifications)

    $$P(A \cup A^c) = P(\Omega)$$

    $$P(A) + P(A^c) = 1$$

    $$P(A^c) = 1 - P(A)$$

    Theorem: Probability of a Difference. For any events $A, B \in \mathcal{F}$, $P(A - B) = P(A) - P(A \cap B)$.

    Proof: Let $A, B \in \mathcal{F}$. Note that $A \cap B$ and $A - B$ form a partition of $A$ (verify) so that (supply missing steps or justifications) $P(A) = P(A \cap B) + P(A - B)$. The result immediately follows.

    Theorem: Probability of a Union of Two Events. For any events $A, B \in \mathcal{F}$, $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.

    Proof: (Exercise. Hint: verify partitioning of some sets and use the property on probability of a difference.)

    Corollary: Probability of a Union of Three Events. For any events $A, B, C \in \mathcal{F}$,

    $$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C)$$

    Proof: (Exercise. Use the property on the probability of a union of two events.)


    Corollary: For any mutually exclusive events $A, B \in \mathcal{F}$, $P(A \cup B) = P(A) + P(B)$. This corollary can also be obtained as a direct result of finite additivity.

    Theorem: Monotonicity Property. For any events $A, B \in \mathcal{F}$, if $A \subseteq B$ then $P(A) \le P(B)$.

    Proof: (exercise)

    Corollary: For any event $A \in \mathcal{F}$, $P(A) \le 1$. To prove this, define $B = \Omega$ and use the previous theorem.

    Example: If $P(A) = 0$, then $P(A \cap B) = 0$.

    Since $A \cap B$ is a subset of $A$, $P(A \cap B) \le P(A) = 0$. By axiom (i), this is possible only when $P(A \cap B) = 0$.

    1.3.4 Properties on Generalized Set Operations

    The following special theorems show properties of probabilities of

    events expressed as a generalized union or intersection.

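    Theorem: Inclusion-Exclusion Formula. Let $A_1, A_2, \ldots, A_n$ be
    a collection of events in $\mathcal{F}$. The probability of the union
    of these $n$ events is given by the formula below:

    $$P\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} P(A_i) - \sum_{i<j} P(A_i \cap A_j) + \sum_{i<j<k} P(A_i \cap A_j \cap A_k) - \cdots + (-1)^{n+1} P(A_1 \cap A_2 \cap \cdots \cap A_n).$$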
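    The inclusion-exclusion formula translates directly into a loop over
    all $k$-wise intersections. The sketch below assumes a finite sample
    space with equally likely outcomes; the function name
    `union_probability` and the example events (multiples of 2, 3 and 5
    among the integers 1 to 30) are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def union_probability(events, omega):
    """P(A_1 u ... u A_n) by inclusion-exclusion, for equally likely outcomes."""
    omega = set(omega)
    n = len(events)
    total = Fraction(0)
    for k in range(1, n + 1):
        sign = (-1) ** (k + 1)          # + for odd k, - for even k
        for subset in combinations(events, k):
            common = set.intersection(*map(set, subset))
            total += sign * Fraction(len(common), len(omega))
    return total

omega = range(1, 31)
events = [{x for x in omega if x % d == 0} for d in (2, 3, 5)]
print(union_probability(events, omega))  # 11/15
```

    The result agrees with counting the union directly: 22 of the 30
    integers are divisible by 2, 3 or 5. The loop visits $2^n - 1$
    intersections, so it is only practical for small $n$.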

    [The derivations on pages 23 to 25 of the original notes, ending with
    the proof of Boole's inequality, are not recoverable from this
    transcript.]

    The result follows.

    Boole's inequality also holds for a finite sequence of events
    $A_1, A_2, \ldots, A_n$:
    $P\left(\bigcup_{i=1}^{n} A_i\right) \le \sum_{i=1}^{n} P(A_i)$.

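    Example: Prove that if $A$ and $B$ are events then
    $P(A \cap B) \ge 1 - P(A^c) - P(B^c)$.

    Proof: (supply justifications)

    $$P(A \cap B) = 1 - P\left((A \cap B)^c\right) = 1 - P(A^c \cup B^c).$$

    But

    $$P(A^c \cup B^c) \le P(A^c) + P(B^c), \quad \text{so} \quad 1 - P(A^c \cup B^c) \ge 1 - P(A^c) - P(B^c).$$

    The result immediately follows.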

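    Theorem: Bonferroni's Inequality. Define a finite sequence of
    events $A_1, A_2, \ldots, A_n$ in $\mathcal{F}$. The following
    inequality holds:

    $$P\left(\bigcap_{i=1}^{n} A_i\right) \ge 1 - \sum_{i=1}^{n} P(A_i^c).$$

    Another form of Bonferroni's inequality is as follows:

    $$P\left(\bigcap_{i=1}^{n} A_i\right) \ge \sum_{i=1}^{n} P(A_i) - (n - 1).$$

    Proof: (exercise)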
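    The two forms of the inequality give the same lower bound, since
    $P(A_i^c) = 1 - P(A_i)$. A numerical illustration, sketched on a
    hypothetical sample space of twenty equally likely outcomes (the
    events and the helper `prob` are illustrative, not from the notes):

```python
from fractions import Fraction

# Hypothetical sample space and events, chosen only for illustration.
omega = set(range(1, 21))
events = [omega - {1, 2}, omega - {2, 3, 4}, omega - {5}]

def prob(event):
    """Classical probability n(E)/n(Omega)."""
    return Fraction(len(event), len(omega))

p_intersection = prob(set.intersection(*events))
# First form: 1 minus the sum of the complement probabilities.
bound_complements = 1 - sum(prob(omega - a) for a in events)
# Second form: sum of the probabilities minus (n - 1).
bound_sum = sum(prob(a) for a in events) - (len(events) - 1)

print(p_intersection, bound_complements, bound_sum)  # 3/4 7/10 7/10
```

    A check of this kind does not replace the proof, but it helps confirm
    the direction of the inequality.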


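    Theorem: Continuity from Below. If $\{A_n\}$ is a monotone
    nondecreasing sequence of events in $\mathcal{F}$ and
    $\bigcup_{n=1}^{\infty} A_n \in \mathcal{F}$, then

    $$P\left(\bigcup_{n=1}^{\infty} A_n\right) = \lim_{n \to \infty} P(A_n).$$

    Proof: (exercise)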

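    Theorem: Continuity from Above. If $\{A_n\}$ is a monotone
    nonincreasing sequence of events in $\mathcal{F}$ and
    $\bigcap_{n=1}^{\infty} A_n \in \mathcal{F}$, then

    $$P\left(\bigcap_{n=1}^{\infty} A_n\right) = \lim_{n \to \infty} P(A_n).$$

    Proof: (exercise)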

    1.3.4 Event Composition

    Definition: Event composition is a way of defining an event in
    terms of other events using set operations. This method,
    when paired with the properties of the probability
    measure, can be used to determine the probability of the
    composed events.

    The following are the steps in obtaining probabilities using event

    composition and properties of the probability measure:

    Step 1: Define the basic events.

    Step 2: List (or compute) the probabilities of these basic events.

    Step 3: Express events in question in terms of the basic events.

    Step 4: Use the properties of the probability measure to obtain

    the probabilities of these events.


    Example: Amelia Mintz and Tony Chu can detect if aspartame is
    present in a particular meal. If aspartame is present,
    Amelia can detect it with probability 0.95 while Tony
    can detect it with probability 0.90. They can both detect
    it 88% of the time. If a meal indeed has aspartame
    present, find the probability that

    i. at least one of them will detect aspartame presence.

    ii. neither of them will detect aspartame in the meal.

    iii. only Amelia will be able to detect aspartame in the meal.

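    Solution: Define the following events:

    $A$ = event that Amelia detects aspartame in the meal.

    $B$ = event that Tony detects aspartame in the meal.

    Given:

    $P(A) = 0.95$

    $P(B) = 0.90$

    $P(A \cap B) = 0.88$

    i. $P(A \cup B) = P(A) + P(B) - P(A \cap B)$

    $= 0.95 + 0.90 - 0.88$

    $= 0.97$

    ii. $P\left((A \cup B)^c\right) = 1 - P(A \cup B)$

    $= 1 - 0.97 = 0.03$

    iii. $P(A - B) = P(A) - P(A \cap B)$

    $= 0.95 - 0.88 = 0.07$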
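    The four steps of event composition for this example can be sketched
    in a few lines. The variable names are illustrative; exact fractions
    are used so that the arithmetic does not pick up floating-point
    rounding.

```python
from fractions import Fraction

# Steps 1 and 2: basic events and their given probabilities.
p_a = Fraction(95, 100)    # P(A): Amelia detects aspartame
p_b = Fraction(90, 100)    # P(B): Tony detects aspartame
p_ab = Fraction(88, 100)   # P(A n B): both detect it

# Steps 3 and 4: compose the events and apply the properties.
p_at_least_one = p_a + p_b - p_ab   # P(A u B), union of two events
p_neither = 1 - p_at_least_one      # P((A u B)^c), complement
p_only_amelia = p_a - p_ab          # P(A - B), difference

print(p_at_least_one, p_neither, p_only_amelia)  # 97/100 3/100 7/100
```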

    1.3.4 Special Probability Examples

    Example: This example shows that there can be nonempty sets
    which have a probability of zero.

    Suppose probabilities are assigned to the Borel sets of
    $\Omega = [0, 1]$ in such a way that for any real numbers $a$ and


    $b$ where $0 \le a$


    The event above is a match, thus the name matching

    problem. The sample space is defined as:

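    $$\Omega = \{(i_1, i_2, \ldots, i_n) : (i_1, i_2, \ldots, i_n) \text{ is a permutation of } (1, 2, \ldots, n)\}$$

    where $n(\Omega) = n(n-1)(n-2) \cdots (3)(2)(1) = n!$.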

    The random experiment has equiprobable outcomes,

    because the hatcheck person is absent-minded and

    randomly gives back the hats. Classical probability can

    be used in this example.

    The following are the probabilities:

    $P(A_i) = (n-1)!/n! = 1/n$

    $P(A_i \cap A_j) = (n-2)!/n! = 1/[n(n-1)]$

    $P(A_i \cap A_j \cap A_k) = (n-3)!/n! = 1/[n(n-1)(n-2)]$

    $P(A_1 \cap A_2 \cap \cdots \cap A_n) = 1/n!$

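    We therefore need to find $P\left(\bigcup_{i=1}^{n} A_i\right)$, or
    the probability of at least one match, since the probability of no
    match is just $1 - P\left(\bigcup_{i=1}^{n} A_i\right)$. Now, by the
    inclusion-exclusion formula,

    $$P\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} P(A_i) - \sum_{i<j} P(A_i \cap A_j) + \cdots + (-1)^{n+1} P(A_1 \cap A_2 \cap \cdots \cap A_n).$$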
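    Since there are $\binom{n}{k}$ intersections of $k$ events, each with
    probability $(n-k)!/n!$, the $k$th term of the inclusion-exclusion
    sum reduces to $(-1)^{k+1}/k!$. The sketch below evaluates that sum
    and compares it with a brute-force count over all permutations for
    small $n$. The function names are hypothetical, and the brute-force
    check assumes that position $i$ of a permutation holds the hat
    returned to person $i$.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def p_match_formula(n):
    """P(at least one match) as the sum of (-1)^(k+1) / k! for k = 1..n."""
    return sum(Fraction((-1) ** (k + 1), factorial(k)) for k in range(1, n + 1))

def p_match_bruteforce(n):
    """Count the permutations with at least one fixed point, divided by n!."""
    hits = sum(
        any(perm[i] == i for i in range(n))
        for perm in permutations(range(n))
    )
    return Fraction(hits, factorial(n))

print(p_match_formula(3), p_match_bruteforce(3))  # 2/3 2/3
```

    The brute-force count enumerates all $n!$ outcomes, so it is meant
    only as a check for small $n$.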


    b. sampling is randomly done without replacement

    4. Let $A$ and $B$ be events. Establish the following:

    a.

    b.

    c. Verify if is true or not.

    5. Let $A$ and $B$ be disjoint events.

    a. Are $A^c$ and $B^c$ disjoint?

    b. Are and disjoint for any event nonempty set C?

    c. Are and disjoint for any event nonempty set C?

    6. A delegation of 4 students is selected each year from the School of

    Statistics to attend the annual conference of the Philippine

    Statistical Association.

    a. In how many ways can the delegation be chosen if there are 12

    eligible students?

    b. There are 7 eligible males while there are 5 eligible females.

    What is the probability that the delegation will be composed

    of 2 male and 2 female students?

    c. Two students are lovers and they are in agreement that they

    should never part because love dominates everything they do.

    What is the probability that this selection process will test

    their supposedly unrelenting mutual pact, that is, find the

    probability that the selection process includes either one of

    them but not both?

    7. After a typhoon, 50% of the residents of a particular municipality
    in Camarines Sur were without electricity, 47% without water and
    38% without telephone services. One in five residents still had all
    three, while 10% were without all three, 12% were without
    electricity and water but still had a working telephone, and 4%
    were without electricity and a working telephone but still had
    water. A resident of the municipality is randomly selected for an
    interview. Express the given events in set notation and calculate
    the following probabilities:

    a. the selected resident still has at least one of the utilities

    b. the selected resident had water and telephone service but

    without electricity


    12. Diseases A, B and C are prevalent among people in a certain
    population. It is assumed that 10% of the population will contract
    disease A sometime during their lifetime, 25% will contract disease
    B and 20% will contract disease C eventually. There are 5% who
    will contract diseases A and B, 5% who will contract diseases A
    and C and 5% who will contract diseases B and C. Lastly, 3% will
    contract all three diseases sometime during their lifetime. A person
    is chosen randomly from this population. Find the probability that
    this selected person:

    a. will contract at least one of the three diseases

    b. will never contract any of the three diseases

    c. will contract at most one disease

    d. will contract only diseases A and B

    e. will contract exactly two diseases

    13. Solve for the probability of total derangement in the old hats
    problem where there are 10 guests who have checked their hats in
    the counter.

    14. There are 5 collectible stickers in pancit canton pouches. An
    impulsive collector buys 15 pouches of the product. What is the
    probability that the collector solves his problem and collects all 5
    different stickers?

    15. Prove Bonferroni's inequality.

    16. Prove continuity from above.