Summary 1231 Alg

Summary of UNSW's Algebra course for MATH1231

  • A Summary of

    MATH1231 Mathematics 1B

    ALGEBRA NOTES

    For MATH1231 students, this summary is an extract from the Algebra Notes in the MATH1231/41 Course Pack, for revision.

    If you find any mistakes or typos, please send me an email at [email protected]


  • Chapter 6

    VECTOR SPACES

    6.1 Definitions and examples of vector spaces

    Definition 1. A vector space V over the field F is a non-empty set V of vectors on which addition of vectors is defined and multiplication by a scalar is defined in such a way that the following ten fundamental properties are satisfied:

    1. Closure under Addition. If u, v ∈ V, then u + v ∈ V.

    2. Associative Law of Addition. If u, v, w ∈ V, then (u + v) + w = u + (v + w).

    3. Commutative Law of Addition. If u, v ∈ V, then u + v = v + u.

    4. Existence of Zero. There exists an element 0 ∈ V such that, for all v ∈ V, v + 0 = v.

    5. Existence of Negative. For each v ∈ V there exists an element w ∈ V (usually written as −v), such that v + w = 0.

    6. Closure under Multiplication by a Scalar. If v ∈ V and λ ∈ F, then λv ∈ V.

    7. Associative Law of Multiplication by a Scalar. If λ, μ ∈ F and v ∈ V, then λ(μv) = (λμ)v.

    8. If v ∈ V and 1 ∈ F is the scalar one, then 1v = v.

    9. Scalar Distributive Law. If λ, μ ∈ F and v ∈ V, then (λ + μ)v = λv + μv.

    10. Vector Distributive Law. If λ ∈ F and u, v ∈ V, then λ(u + v) = λu + λv.

    The following are vector spaces.

    R^n over R, where n is a positive integer.

    C^n over C, where n is a positive integer.

    P(F), Pn(F) over F, where F is a field. Usually F is either Q, R or C.



    M_{mn}(F) over F, where m, n are positive integers and F is a field.

    Furthermore, the following set, its subset of all continuous functions, and its subset of all differentiable functions are vector spaces over R.

    R[X], the set of all possible real-valued functions with domain X.

    6.2 Vector arithmetic

    Proposition 1. In any vector space V , the following properties hold for addition.

    1. Uniqueness of Zero. There is one and only one zero vector.

    2. Cancellation Property. If u, v, w ∈ V satisfy u + v = u + w, then v = w.

    3. Uniqueness of Negatives. For all v ∈ V, there exists only one w ∈ V such that v + w = 0.

    Proposition 2. Suppose that V is a vector space over a field F, λ ∈ F, v ∈ V, 0 is the zero scalar in F and 0 is the zero vector in V. Then the following properties hold for multiplication by a scalar:

    1. Multiplication by the zero scalar. 0v = 0.

    2. Multiplication of the zero vector. λ0 = 0.

    3. Multiplication by −1. (−1)v = −v (the additive inverse of v).

    4. Zero products. If λv = 0, then either λ = 0 or v = 0.

    5. Cancellation Property. If λv = μv and v ≠ 0, then λ = μ.

    6.3 Subspaces

    Definition 1. A subset S of a vector space V is called a subspace of V if S is itself a vector space over the same field of scalars as V and under the same rules for addition and multiplication by scalars.

    In addition, if there is at least one vector in V which is not contained in S, the subspace S is called a proper subspace of V.

    Theorem 1 (Subspace Theorem). A subset S of a vector space V over a field F, under the same rules for addition and multiplication by scalars, is a subspace of V if and only if

    i) The vector 0 in V also belongs to S.

    ii) S is closed under vector addition, and

    iii) S is closed under multiplication by scalars from F.

  • 6.4. LINEAR COMBINATIONS AND SPANS 3

    6.4 Linear combinations and spans

    Definition 1. Let S = {v1, . . . , vn} be a finite set of vectors in a vector space V over a field F. Then a linear combination of S is a sum of scalar multiples of the form

    λ1v1 + · · · + λnvn with λ1, . . . , λn ∈ F.

    Proposition 1 (Closure under Linear Combinations). If S is a finite set of vectors in a vector space V, then every linear combination of S is also a vector in V.

    Definition 2. Let S = {v1, . . . , vn} be a finite set of vectors in a vector space V over a field F. Then the span of the set S is the set of all linear combinations of S, that is,

    span(S) = span(v1, . . . , vn) = {v ∈ V : v = λ1v1 + · · · + λnvn for some λ1, . . . , λn ∈ F}.

    Theorem 2 (A span is a subspace). If S is a finite, non-empty set of vectors in a vector space V, then span(S) is a subspace of V. Further, span(S) is the smallest subspace containing S (in the sense that span(S) is a subspace of every subspace which contains S).

    Definition 3. A finite set S of vectors in a vector space V is called a spanning set for V if span(S) = V or, equivalently, if every vector in V can be expressed as a linear combination of vectors in S.

    6.4.1 Matrices and spans in R^m

    Proposition 3 (Matrices, Linear Combinations and Spans). If S = {v1, . . . , vn} is a set of vectors in R^m and A is the m × n matrix whose columns are the vectors v1, . . . , vn, then

    a) a vector b in R^m can be expressed as a linear combination of S if and only if it can be expressed in the form Ax for some x in R^n,

    b) a vector b in R^m belongs to span(S) if and only if the equation Ax = b has a solution x in R^n.

    Definition 4. The subspace of R^m spanned by the columns of an m × n matrix A is called the column space of A and is denoted by col(A).


    6.4.2 Solving problems about spans
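    In the full notes this subsection works through examples. As a small illustrative sketch (not from the notes; it assumes NumPy), Proposition 3(b) turns a span question into a rank test: b belongs to span(S) exactly when Ax = b is consistent, i.e. when rank(A) = rank([A|b]).

        import numpy as np

        v1 = np.array([1.0, 0.0, 2.0])
        v2 = np.array([0.0, 1.0, 1.0])
        A = np.column_stack([v1, v2])      # columns of A are the vectors of S

        def in_span(A, b):
            # b is in col(A) iff Ax = b has a solution (Proposition 3(b)).
            augmented = np.column_stack([A, b])
            return np.linalg.matrix_rank(A) == np.linalg.matrix_rank(augmented)

        print(in_span(A, np.array([2.0, 3.0, 7.0])))   # True: equals 2 v1 + 3 v2
        print(in_span(A, np.array([1.0, 1.0, 0.0])))   # False: system is inconsistent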

    6.5 Linear independence

    Definition 1. Suppose that S = {v1, . . . , vn} is a subset of a vector space. The set S is a linearly independent set if the only values of the scalars λ1, λ2, . . . , λn for which

    λ1v1 + · · · + λnvn = 0 are λ1 = λ2 = · · · = λn = 0.

    Definition 2. Suppose that S = {v1, . . . , vn} is a subset of a vector space. The set S = {v1, . . . , vn} is a linearly dependent set if it is not a linearly independent set, that is, if there exist scalars λ1, . . . , λn, not all zero, such that

    λ1v1 + · · · + λnvn = 0.

    6.5.1 Solving problems about linear independence

    We have seen that questions about spans in R^m can be answered by relating them to questions about the existence of solutions for systems of linear equations. The same is true for questions about linear dependence in R^m.

    Proposition 1. If S = {a1, . . . , an} is a set of vectors in R^m and A is the m × n matrix whose columns are the vectors a1, . . . , an, then the set S is linearly dependent if and only if the system Ax = 0 has at least one non-zero solution x ∈ R^n.
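    As a hedged sketch of this proposition (NumPy assumed; the helper name is my own), the columns of A are linearly independent exactly when rank(A) equals the number of columns, since a rank deficit means Ax = 0 has a non-zero solution.

        import numpy as np

        def is_linearly_independent(vectors):
            A = np.column_stack(vectors)               # m x n matrix
            return np.linalg.matrix_rank(A) == A.shape[1]

        print(is_linearly_independent([np.array([1.0, 0.0]),
                                       np.array([0.0, 1.0])]))   # True
        print(is_linearly_independent([np.array([1.0, 2.0]),
                                       np.array([2.0, 4.0])]))   # False: second = 2 * first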

    6.5.2 Uniqueness and linear independence

    Theorem 2 (Uniqueness of Linear Combinations). Let S be a finite, non-empty set of vectors in a vector space and let v be a vector which can be written as a linear combination of S. Then the values of the scalars in the linear combination for v are unique if and only if S is a linearly independent set.

    6.5.3 Spans and linear independence

    Theorem 3. A set of vectors S is a linearly independent set if and only if no vector in S can be written as a linear combination of the other vectors in S, that is, if and only if no vector in S is in the span of the other vectors in S.

    Note. The theorem is equivalent to: a set of vectors S is a linearly dependent set if and only if at least one vector in S is in the span of the other vectors in S.

    Theorem 4. If S is a finite subset of a vector space V and the vector v is in V, then span(S ∪ {v}) = span(S) if and only if v ∈ span(S).

    Theorem 5. Suppose that S is a finite subset of a vector space. The span of every proper subset of S is a proper subspace of span(S) if and only if S is a linearly independent set.

    Theorem 6. If S is a finite linearly independent subset of a vector space V and v is in V but not in span(S), then S ∪ {v} is a linearly independent set.


    6.6 Basis and dimension

    6.6.1 Bases

    Definition 1. A set of vectors B in a vector space V is called a basis for V if:

    1. B is a linearly independent set, and

    2. B is a spanning set for V (that is, span (B) = V ).

    Let B = {v1, . . . , vn} be a basis for a vector space V over F. Every vector v ∈ V can be uniquely written as

    v = λ1v1 + · · · + λnvn, where λ1, . . . , λn ∈ F.

    An orthonormal basis is a basis whose elements are all of length 1 and are mutually orthogonal. The advantage of using an orthonormal basis is that we can easily write any vector as the unique linear combination of the basis vectors, with the coefficients given by dot products.

    6.6.2 Dimension

    Theorem 1. The number of vectors in any spanning set for a vector space V is always greater than or equal to the number of vectors in any linearly independent set in V.

    Theorem 2. If a vector space V has a finite basis then every set of basis vectors for V contains the same number of vectors, that is, if B1 = {u1, . . . , um} and B2 = {v1, . . . , vn} are two bases for the same vector space V then m = n.

    Definition 2. If V is a vector space with a finite basis, the dimension of V, denoted by dim(V), is the number of vectors in any basis for V. V is called a finite dimensional vector space.

    Theorem 3. Suppose that V is a finite dimensional vector space.

    1. the number of vectors in any spanning set for V is greater than or equal to the dimension of V;

    2. the number of vectors in any linearly independent set in V is less than or equal to the dimension of V;

    3. if the number of vectors in a spanning set is equal to the dimension then the set is also a linearly independent set and hence a basis for V;

    4. if the number of vectors in a linearly independent set is equal to the dimension then the set is also a spanning set and hence a basis for V.


    6.6.3 Existence and construction of bases

    Theorem 4. If S is a finite non-empty subset of a vector space then S contains a subset which is a basis for span(S).

    In particular, if V is any non-zero vector space which can be spanned by a finite set of vectors then V has a basis.

    Theorem 5. Suppose that V is a vector space which can be spanned by a finite set of vectors. If S is a linearly independent subset of V then there exists a basis for V which contains S as a subset. In other words, every linearly independent subset of V can be extended to a basis for V.

    Theorem 6 (Reducing a spanning set to a basis in R^m). Suppose that S = {v1, . . . , vn} is any subset of R^m and A is the matrix whose columns are the members of S. If U is a row-echelon form for A and S′ is created from S by deleting those vectors which correspond to non-leading columns in U, then S′ is a basis for span(S).

    Theorem 7 (Extending a linearly independent set to a basis in R^m). Suppose that S = {v1, . . . , vn} is a linearly independent subset of R^m and A is the matrix whose columns are the members of S followed by the members of the standard basis for R^m. If U is a row-echelon form for A and S′ is created by choosing those columns of A which correspond to leading columns in U, then S′ is a basis for R^m containing S as a subset.
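    As a hedged sketch of Theorems 6 and 7 (NumPy assumed; the helper is my own, not from the notes), a column of A is a leading column of a row-echelon form exactly when it is not a combination of the columns before it, so keeping each column that increases the rank selects the same columns as Gaussian elimination would.

        import numpy as np

        def leading_columns(A, tol=1e-10):
            # Indices of columns that would be leading in a row-echelon form of A.
            kept = []
            for j in range(A.shape[1]):
                if np.linalg.matrix_rank(A[:, kept + [j]], tol=tol) == len(kept) + 1:
                    kept.append(j)
            return kept

        # Theorem 6: reduce a spanning set to a basis for its span.
        S = [np.array([1.0, 0.0, 1.0]), np.array([2.0, 0.0, 2.0]), np.array([0.0, 1.0, 1.0])]
        A = np.column_stack(S)
        basis = [S[j] for j in leading_columns(A)]       # drops the redundant 2 * v1

        # Theorem 7: extend the resulting independent set to a basis of R^3.
        B = np.column_stack(basis + [np.eye(3)[:, j] for j in range(3)])
        extended = [B[:, j] for j in leading_columns(B)]
        print(len(basis), len(extended))                 # 2 3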

    Proposition 8. If V is a finite-dimensional space and W is a subspace of V and dim(W) = dim(V), then W = V.

    Proof. By Theorem 4, there exists a basis B for W. So B is a linearly independent set in V. By Theorem 3 part 4, B is also a basis for V and V = span(B) = W.

    6.9 A brief review of set and function notation

    6.9.1 Set notation.

    A set is any collection of elements. Sets are usually defined either by listing their elements or by giving a rule for selection of the elements. The elements of a set are usually enclosed in braces {}.

    Definition 1. Two sets A and B are equal (notation A = B) if every element of A is an element of B, and if every element of B is an element of A.

    To prove that A = B it is necessary to prove that the two conditions:

    1. if x ∈ A then x ∈ B, and

    2. if x ∈ B then x ∈ A

    are both satisfied.

    Definition 2. A set A is a subset of another set B (notation A ⊆ B) if every element of A is also an element of B.


    To prove that A ⊆ B it is necessary to prove that the condition:

    if x ∈ A then x ∈ B

    is satisfied.

    Definition 3. A is said to be a proper subset of B if A is a subset of B and at least one element of B is not an element of A.

    To prove that A is a proper subset of B it is necessary to prove that the two conditions:

    1. if x ∈ A then x ∈ B, and

    2. for some x ∈ B, x is not an element of A

    are both satisfied.

    Definition 4. The intersection of two sets A and B (notation: A ∩ B) is the set of elements which are common to both sets.

    That is, A ∩ B = {x : x ∈ A and x ∈ B}.

    Definition 5. The union of two sets A and B (notation: A ∪ B) is the set of all elements which are in either or both sets.

    That is, A ∪ B = {x : x ∈ A or x ∈ B}.

    6.9.2 Function notation

    The notation f : X → Y (which is read as f is a function (or map) from the set X to the set Y) means that f is a rule which associates exactly one element y ∈ Y to each element x ∈ X. The y associated with x is written as y = f(x) and is called the value of the function f at x or the image of x under f. The set X is often called the domain of the function f and the set Y is often called the codomain of the function f.

    Equality of Functions. Two functions f : X → Y and g : X → Y are defined to be equal if and only if f(x) = g(x) for all x ∈ X.

    Addition of Functions. If f : X → Y and g : X → Y, and if elements of Y can be added, then the sum function f + g is defined by

    (f + g)(x) = f(x) + g(x) for all x ∈ X.

    Multiplication by a Scalar. If f : X → Y and λ ∈ F, where F is a field, and if elements of Y can be multiplied by elements of F, then the function λf is defined by

    (λf)(x) = λ(f(x)) for all x ∈ X.


    Multiplication of Functions. If f : X → Y and g : X → Y, and if elements of Y can be multiplied, then the product function fg is defined by

    (fg)(x) = f(x)g(x) for all x ∈ X.

    Composition of Functions. If g : X → W and f : W → Y, then the composition function f ∘ g : X → Y is defined by

    (f ∘ g)(x) = f(g(x)) for all x ∈ X.

  • Chapter 7

    LINEAR TRANSFORMATIONS

    7.1 Introduction to linear maps

    Definition 1. Let V and W be two vector spaces over the same field F. A function T : V → W is called a linear map or linear transformation if the following two conditions are satisfied.

    Addition Condition. T(u + v) = T(u) + T(v) for all u, v ∈ V, and

    Scalar Multiplication Condition. T(λv) = λT(v) for all λ ∈ F and v ∈ V.

    Proposition 1. If T : V → W is a linear map, then

    1. T (0) = 0 and

    2. T(−v) = −T(v) for all v ∈ V.

    Theorem 2. A function T : V → W is a linear map if and only if for all λ1, λ2 ∈ F and v1, v2 ∈ V

    T(λ1v1 + λ2v2) = λ1T(v1) + λ2T(v2). (#)

    Theorem 3. If T is a linear map with domain V and S is a set of vectors in V, then the function value of a linear combination of S is equal to the corresponding linear combination of the function values of S, that is, if S = {v1, . . . , vn} and λ1, . . . , λn are scalars, then

    T(λ1v1 + · · · + λnvn) = λ1T(v1) + · · · + λnT(vn).

    Theorem 4. For a linear map T : V → W, the function values for every vector in the domain are known if and only if the function values for a basis of the domain are known.

    Further, if B = {v1, . . . , vn} is a basis for the domain V then for all v ∈ V we have

    T(v) = x1T(v1) + · · · + xnT(vn),

    where x1, . . . , xn are the scalars in the unique linear combination v = x1v1 + · · · + xnvn of the basis B.



    7.2 Linear maps from R^n to R^m and m × n matrices

    Theorem 1. For each m × n matrix A, the function T_A : R^n → R^m, defined by

    T_A(x) = Ax for x ∈ R^n,

    is a linear map.

    Theorem 2 (Matrix Representation Theorem). Let T : R^n → R^m be a linear map and let the vectors e_j for 1 ≤ j ≤ n be the standard basis vectors for R^n. Then the m × n matrix A whose columns are given by

    a_j = T(e_j) for 1 ≤ j ≤ n

    has the property that

    T(x) = Ax for all x ∈ R^n.
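    A small sketch of this theorem (NumPy assumed; the map T below is my own example, a rotation by 90 degrees): the j-th column of A is T applied to the j-th standard basis vector, and the resulting matrix reproduces T.

        import numpy as np

        def T(x):
            # An example linear map from R^2 to R^2 (rotation by 90 degrees).
            return np.array([-x[1], x[0]])

        n = 2
        A = np.column_stack([T(np.eye(n)[:, j]) for j in range(n)])
        x = np.array([3.0, 4.0])
        print(A @ x, T(x))    # both print [-4.  3.]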

    7.3 Geometric examples of linear transformations

    Proposition 1. Suppose that T : R^n → R^m is a linear map. It maps a line in R^n to either a line or a point in R^m.

    7.4 Subspaces associated with linear maps

    7.4.1 The kernel of a map


    Definition 1. Let T : V → W be a linear map. Then the kernel of T (written ker(T)) is the set of all zeroes of T, that is, it is the subset of the domain V defined by

    ker(T) = {v ∈ V : T(v) = 0}.

    Definition 2. For an m × n matrix A, the kernel of A is the subset of R^n defined by

    ker(A) = {x ∈ R^n : Ax = 0},

    that is, it is the set of all solutions of the homogeneous equation Ax = 0.

    Theorem 1. If T : V → W is a linear map, then ker(T) is a subspace of the domain V.

    Definition 3. The nullity of a linear map T is the dimension of ker(T). The nullity of a matrix A is the dimension of ker(A).

    Proposition 2. Let A be an m × n matrix with real entries and T_A : R^n → R^m the associated linear transformation. Then

    ker(T_A) = ker(A).


    Proposition 3. For a matrix A:

    nullity(A) = maximum number of independent vectors in the solution space of Ax = 0

    = number of parameters in the solution of Ax = 0 obtained by Gaussian elimination and back substitution

    = number of non-leading columns in an equivalent row-echelon form U for A.

    Proposition 4. The columns of a matrix A are linearly independent if and only if nullity(A) = 0.
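    As a hedged numeric sketch (NumPy assumed), Proposition 3 gives nullity(A) as the number of non-leading columns, i.e. the number of columns minus rank(A); Proposition 4 then reads: the columns are independent exactly when this is zero.

        import numpy as np

        A = np.array([[1.0, 2.0, 0.0],
                      [0.0, 0.0, 1.0]])
        rank = np.linalg.matrix_rank(A)
        nullity = A.shape[1] - rank       # non-leading columns
        print(rank, nullity)              # 2 1
        print(nullity == 0)               # False: the columns are dependent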

    7.4.2 Image

    Definition 4. Let T : V → W be a linear map. Then the image of T is the set of all function values of T, that is, it is the subset of the codomain W defined by

    im(T) = {w ∈ W : w = T(v) for some v ∈ V}.

    Definition 5. The image of an m × n matrix A is the subset of R^m defined by

    im(A) = {b ∈ R^m : b = Ax for some x ∈ R^n}.

    Theorem 5. Let T : V → W be a linear map between vector spaces V and W. Then im(T) is a subspace of the codomain W of T.

    Proposition 6. Let A be an m × n matrix with real entries and T_A : R^n → R^m the associated linear transformation. Then

    im(A) = im(T_A).

    Definition 6. The rank of a linear map T is the dimension of im(T). The rank of a matrix A is the dimension of im(A).

    Proposition 7. For a matrix A:

    rank(A) = maximal number of linearly independent columns of A

    = number of leading columns in a row-echelon form U for A

    7.4.3 Rank, nullity and solutions of Ax = b

    Theorem 8 (Rank-Nullity Theorem for Matrices). For any matrix A,

    rank(A) + nullity(A) = number of columns of A.

    Theorem 9 (Rank-Nullity Theorem). Suppose V and W are finite dimensional vector spaces and T : V → W is linear. Then

    rank(T ) + nullity(T ) = dim(V ).


    Theorem 10. The equation Ax = b has:

    1. no solution if rank(A) ≠ rank([A|b]), and

    2. at least one solution if rank(A) = rank([A|b]). Further,

    i) if nullity(A) = 0 the solution is unique, whereas,

    ii) if nullity(A) = ν > 0, then the general solution is of the form

    x = x_p + λ1k1 + · · · + λνkν for λ1, . . . , λν ∈ R,

    where x_p is any solution of Ax = b, and where {k1, . . . , kν} is a basis for ker(A).
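    A hedged sketch of Theorem 10 (NumPy assumed; the classifier is my own wrapper): compare rank(A) with rank([A|b]), then use the nullity to decide between a unique solution and a ν-parameter family.

        import numpy as np

        def classify(A, b):
            rA = np.linalg.matrix_rank(A)
            rAb = np.linalg.matrix_rank(np.column_stack([A, b]))
            if rA != rAb:
                return "no solution"
            nullity = A.shape[1] - rA
            if nullity == 0:
                return "unique solution"
            return "infinitely many solutions (%d parameters)" % nullity

        A = np.array([[1.0, 2.0],
                      [2.0, 4.0]])
        print(classify(A, np.array([1.0, 3.0])))   # no solution
        print(classify(A, np.array([1.0, 2.0])))   # infinitely many solutions (1 parameters)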

    7.5 Further applications and examples of linear maps

    7.10 One-to-one, onto and inverses for functions

    Definition 1. The range or image of a function is the set of all function values, that is, for a function f : X → Y,

    im(f) = {y ∈ Y : y = f(x) for some x ∈ X}.

    Definition 2. A function is said to be onto (or surjective) if the codomain is equal to the image of the function, that is, a function f : X → Y is onto if for all y ∈ Y there exists an x ∈ X such that y = f(x).

    Definition 3. A function is said to be one-to-one (or injective) if no point in the codomain is the function value of more than one point in the domain, that is, a function f : X → Y is one-to-one if f(x1) = f(x2) if and only if x1 = x2.

    Definition 4. Let f : X → Y be a function. Then a function g : Y → X is called an inverse of f if it satisfies the two conditions:

    a) g ∘ f = id_X, where id_X is the identity function on X with the property that id_X(x) = x for all x ∈ X.

    b) f ∘ g = id_Y, where id_Y is the identity function on Y with the property that id_Y(y) = y for all y ∈ Y.

    Theorem 1. A function has an inverse if and only if the function is both one-to-one and onto.

  • Chapter 8

    EIGENVALUES AND EIGENVECTORS

    8.1 Definitions and examples

    Definition 1. Let T : V → V be a linear map. Then if a scalar λ and non-zero vector v ∈ V satisfy

    T(v) = λv,

    then λ is called an eigenvalue of T and v is called an eigenvector of T for the eigenvalue λ.

    Note. An eigenvector is non-zero, but zero can be an eigenvalue.

    Definition 2. Let A ∈ M_{nn}(C) be a square matrix. Then if a scalar λ ∈ C and non-zero vector v ∈ C^n satisfy

    Av = λv,

    then λ is called an eigenvalue of A and v is called an eigenvector of A for the eigenvalue λ.

    8.1.1 Some fundamental results

    Theorem 1. A scalar λ is an eigenvalue of a square matrix A if and only if det(A − λI) = 0, and then v is an eigenvector of A for the eigenvalue λ if and only if v is a non-zero solution of the homogeneous equation (A − λI)v = 0, i.e., if and only if v ∈ ker(A − λI) and v ≠ 0.

    Theorem 2. If A is an n × n matrix and λ ∈ C, then det(A − λI) is a complex polynomial of degree n in λ.

    Definition 3. For a square matrix A, the polynomial p(λ) = det(A − λI) is called the characteristic polynomial for the matrix A.



    Theorem 3. An n × n matrix A has exactly n eigenvalues in C (counted according to their multiplicities). These eigenvalues are the zeroes of the characteristic polynomial p(λ) = det(A − λI).

    Note.

    1. The equation p(λ) = 0 is called the characteristic equation for A.

    2. Theorem 3 is of fundamental theoretical importance, as it proves the existence of eigenvalues of a matrix. However, with the exception of 2 × 2 and specially constructed larger matrices, modern methods of finding eigenvalues of matrices do not make use of this theorem. These efficient modern methods are currently available in standard matrix software packages such as MAPLE, MATLAB.
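    NumPy offers the same facility as the packages mentioned in the note; as a small sketch (the matrix is my own example), eig returns the eigenvalues together with a matrix whose columns are corresponding eigenvectors.

        import numpy as np

        A = np.array([[2.0, 1.0],
                      [1.0, 2.0]])
        eigenvalues, eigenvectors = np.linalg.eig(A)
        print(eigenvalues)                  # eigenvalues 3 and 1 (order may vary)

        v = eigenvectors[:, 0]              # check A v = lambda v for one pair
        print(np.allclose(A @ v, eigenvalues[0] * v))   # True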

    8.1.2 Calculation of eigenvalues and eigenvectors

    8.2 Eigenvectors, bases, and diagonalisation

    Theorem 1. If an n × n matrix has n distinct eigenvalues then it has n linearly independent eigenvectors.

    Note. Even if the n × n matrix does not have n distinct eigenvalues, it may have n linearly independent eigenvectors.

    Theorem 2. If an n × n matrix A has n linearly independent eigenvectors, then there exists an invertible matrix M and a diagonal matrix D such that

    M^{-1}AM = D.

    Further, the diagonal elements of D are the eigenvalues of A and the columns of M are the eigenvectors of A, with the jth column of M being the eigenvector corresponding to the jth element of the diagonal of D.

    Conversely, if M^{-1}AM = D with D diagonal then the columns of M are n linearly independent eigenvectors of A.

    Definition 1. A square matrix A is said to be a diagonalisable matrix if there exists an invertible matrix M and diagonal matrix D such that M^{-1}AM = D.
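    As a hedged sketch of Theorem 2 (NumPy assumed; the matrix is invented), building M from the eigenvectors does diagonalise A:

        import numpy as np

        A = np.array([[4.0, 1.0],
                      [2.0, 3.0]])
        lams, M = np.linalg.eig(A)            # columns of M are eigenvectors
        D = np.linalg.inv(M) @ A @ M
        print(np.allclose(D, np.diag(lams)))  # True: M^{-1} A M = D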

    8.3 Applications of eigenvalues and eigenvectors

    8.3.1 Powers of A

    Proposition 1. Let D be the diagonal matrix

    D = diag(λ1, λ2, . . . , λn), the n × n matrix with λ1, . . . , λn on the diagonal and zeroes elsewhere.


    Then, for k ≥ 1,

    D^k = diag(λ1^k, λ2^k, . . . , λn^k).

    Proposition 2. If A is diagonalisable, that is, if there exists an invertible matrix M and diagonal matrix D such that M^{-1}AM = D, then

    A^k = M D^k M^{-1} for integer k ≥ 1.
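    A hedged sketch of Proposition 2 (NumPy assumed, reusing the matrix from the previous sketch): computing A^k through the diagonalisation agrees with repeated multiplication.

        import numpy as np

        A = np.array([[4.0, 1.0],
                      [2.0, 3.0]])
        lams, M = np.linalg.eig(A)
        k = 5
        Ak = M @ np.diag(lams**k) @ np.linalg.inv(M)          # M D^k M^{-1}
        print(np.allclose(Ak, np.linalg.matrix_power(A, k)))  # True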

    8.3.2 Solution of first-order linear differential equations

    Proposition 3. y(t) = v e^{λt} is a solution of

    dy/dt = Ay

    if and only if λ is an eigenvalue of A and v is an eigenvector for the eigenvalue λ.

    Proposition 4. If u1(t) and u2(t) are two solutions of the equation

    dy/dt = Ay,

    then any linear combination of u1 and u2 is also a solution.
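    As a hedged sketch of Propositions 3 and 4 (NumPy assumed; the matrix and initial condition are invented), each eigenpair gives a solution v e^{λt}, and a linear combination of these can be fitted to an initial condition by solving M c = y(0).

        import numpy as np

        A = np.array([[0.0, 1.0],
                      [-2.0, -3.0]])
        lams, M = np.linalg.eig(A)     # eigenvalues -1 and -2 for this A
        y0 = np.array([1.0, 0.0])
        c = np.linalg.solve(M, y0)     # coefficients of the combination

        def y(t):
            # y(t) = sum_j c_j v_j e^{lambda_j t}; it satisfies dy/dt = A y.
            return M @ (c * np.exp(lams * t))

        print(np.allclose(y(0.0), y0))   # True: the initial condition is met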


  • Chapter 9

    INTRODUCTION TO PROBABILITY AND STATISTICS

    9.1 Some Preliminary Set Theory

    Definition 1. A set is a collection of objects. These objects are called elements.

    Definition 2.

    A set A is a subset of a set B (written A ⊆ B) if and only if each element of A is also an element of B; that is, if x ∈ A, then x ∈ B.

    The power set P(A) of A is the set of all subsets of A.

    The universal set S is the set that denotes all objects of given interest.

    The empty set ∅ (or {}) is the set with no elements.

    Definition 3. A set S is countable if its elements can be listed as a sequence.

    Definition 4. For all subsets A, B ⊆ S, define the following set operations:

    complement of A: A^c = {x ∈ S : x ∉ A}

    intersection of A and B: A ∩ B = {x ∈ S : x ∈ A and x ∈ B}

    union of A and B: A ∪ B = {x ∈ S : x ∈ A or x ∈ B}

    difference: A − B = {x ∈ S : x ∈ A but x ∉ B} = A ∩ B^c

    Definition 5. Sets A and B are disjoint (or mutually exclusive) if and only if

    A ∩ B = ∅



    Definition 6. Disjoint subsets A1, . . . , Ak partition a set B if and only if

    A1 ∪ · · · ∪ Ak = B

    Lemma 1. If A1, . . . , An partition S and B is a subset of S, then A1 ∩ B, . . . , An ∩ B partition B.

    Definition 7. If A is a set, then |A| is the number of elements in A.

    The Inclusion-Exclusion Principle. |A ∪ B| = |A| + |B| − |A ∩ B|

    9.2 Probability

    9.2.1 Sample Space and Probability Axioms

    Definition 1. A sample space of an experiment is a set of all possible outcomes.

    Definition 2. A probability P on a sample space S is any real function on P(S) that satisfies the following conditions:

    (a) 0 ≤ P(A) ≤ 1 for all A ⊆ S;

    (b) P(∅) = 0;

    (c) P(S) = 1;

    (d) If A and B are disjoint, then P(A ∪ B) = P(A) + P(B).

    Theorem 1. Let P be a probability on a sample space S, and let A be an event in S.

    1. If S is finite (or countable), then P(A) = Σ_{a ∈ A} P({a}).

    2. If S is finite and P({a}) is constant for all outcomes a ∈ S, then P(A) = |A| / |S|.

    3. If S is finite (or countable), then Σ_{a ∈ S} P({a}) = 1.

    9.2.2 Rules for Probabilities

    Theorem 2. Let A and B be events of a sample space S.

    1. P(A ∪ B) = P(A) + P(B) − P(A ∩ B) (Addition Rule)

    2. P(A^c) = 1 − P(A)

    3. If A ⊆ B, then P(A) ≤ P(B).


    9.2.3 Conditional Probabilities

    We now consider what happens if we restrict the sample space from S to some event in S.

    Definition 3. The conditional probability of A given B is denoted and defined by

    P(A|B) = P(A ∩ B) / P(B)

    provided that P(B) ≠ 0.

    Lemma 3. For any fixed event B, the function P(A|B) is a probability on S.

    Multiplication Rule. P(A ∩ B) = P(A|B) P(B) = P(B|A) P(A)

    Total Probability Rule

    If A1, . . . , An partition S and B is an event, then P(B) = Σ_{i=1}^{n} P(B|Ai) P(Ai).

    Bayes Rule

    If A1, . . . , An partition S and B is an event, then

    P(Aj|B) = P(B|Aj) P(Aj) / Σ_{i=1}^{n} P(B|Ai) P(Ai).
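    A small numeric sketch of the Total Probability and Bayes Rules (the numbers are invented for illustration), with A1, A2 a partition of S:

        P_A = [0.3, 0.7]                # P(A1), P(A2): a partition of S
        P_B_given_A = [0.9, 0.2]        # P(B|A1), P(B|A2)

        # Total Probability Rule: P(B) = sum of P(B|Ai) P(Ai).
        P_B = sum(pb * pa for pb, pa in zip(P_B_given_A, P_A))

        # Bayes Rule: P(A1|B) = P(B|A1) P(A1) / P(B).
        P_A1_given_B = P_B_given_A[0] * P_A[0] / P_B
        print(P_B, P_A1_given_B)        # approximately 0.41 and 0.659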

    9.2.4 Statistical Independence

    Definition 4. Events A and B are (statistically) independent if and only if

    P(A ∩ B) = P(A) P(B)

    Definition 5. Events A1, . . . , An are mutually independent if and only if, for any Ai1, . . . , Aik of these,

    P(Ai1 ∩ · · · ∩ Aik) = P(Ai1) · · · P(Aik).

    Theorem 4. If events A1, . . . , An are mutually independent and Bi is either Ai or Ai^c for each i = 1, . . . , n, then B1, . . . , Bn are also mutually independent.

    Theorem 5. If events A1,1, . . . , A1,n1, A2,1, . . . , Am,nm are mutually independent and for each i = 1, . . . , m, the event Bi is obtained from Ai,1, . . . , Ai,ni by taking unions, intersections, and complements, then B1, . . . , Bm are also mutually independent.


    9.3 Random Variables

    Definition 1. A random variable is a real function defined on a sample space.

    Definition 2. For a random variable X on some sample space S, define for all subsets A ⊆ S and real numbers r ∈ R,

    {X ∈ A} = {s ∈ S : X(s) ∈ A}

    {X = r} = {s ∈ S : X(s) = r}

    {X ≤ r} = {s ∈ S : X(s) ≤ r}

    ... and so on.

    Definition 3. The cumulative distribution function of a random variable X is given by

    F_X(x) = P(X ≤ x) for x ∈ R.

    9.3.1 Discrete Random Variables

    Definition 4. A random variable X is discrete if its image is countable.

    Definition 5. The probability distribution of a discrete random variable X is the function P(X = x) on R. We sometimes write this as pk = P(X = xk).

    9.3.2 The Mean and Variance of a Discrete Random Variable

    Definition 6. The expected value (or mean) of a discrete random variable X with probability distribution pk is

    E(X) = Σ_{all k} xk pk.

    The expected value E(X) is often denoted by μ or μX.

    Theorem 1. Let X be a discrete random variable with probability distribution pk = P(X = xk). Then for any real function g(x), the expected value of Y = g(X) is

    E(Y) = E(g(X)) = Σ_k g(xk) pk.


    Definition 7. The variance of a discrete random variable X is

    Var(X) = E((X − E(X))²).

    The standard deviation of X is SD(X) = √Var(X).

    The standard deviation is often denoted by σ or σX, and the variance is often written as σ² or σ_X².

    Theorem 2. Var(X) = E(X²) − (E(X))².

    Theorem 3. If a and b are constants, then

    E(aX + b) = a E(X) + b

    Var(aX + b) = a² Var(X)

    SD(aX + b) = |a| SD(X).
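    A quick sketch of Definitions 6 and 7 and Theorem 2 (plain Python; the distribution is invented): compute E(X) and Var(X) straight from the table of values and probabilities.

        xs = [0, 1, 2]
        ps = [0.2, 0.5, 0.3]          # the p_k; they must sum to 1

        E  = sum(x * p for x, p in zip(xs, ps))
        E2 = sum(x**2 * p for x, p in zip(xs, ps))
        Var = E2 - E**2               # Theorem 2
        print(E, Var)                 # 1.1 and 0.49, up to floating-point rounding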

    9.4 Special Distributions

    A Bernoulli trial is an experiment with two outcomes, often success and failure, or Y(es) and N(o), or {1, 0}, where P(Y) and P(N) are denoted by p and q = 1 − p, respectively. A Bernoulli process is an experiment composed of a sequence of identical and mutually independent Bernoulli trials.

    9.4.1 The Binomial Distribution

    Definition 1. The Binomial distribution B(n, p) for n ∈ N is the function

    B(n, p, k) = (n choose k) p^k (1 − p)^(n−k), where k = 0, 1, . . . , n.

    Theorem 1. If X is the random variable that counts the successes of some Bernoulli process with n trials having success probability p, then X has the binomial distribution B(n, p).

    We write X ∼ B(n, p) to denote that X is a random variable with this distribution.

    Theorem 2. If X is a random variable and X ∼ B(n, p), then

    E(X) = np; Var(X) = npq = np(1 − p).
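    A hedged sketch of the binomial probability function using only the standard library (the parameters are invented): the probabilities sum to 1 and the mean works out to np, matching Theorem 2.

        from math import comb

        def binomial_pmf(n, p, k):
            # B(n, p, k) = (n choose k) p^k (1 - p)^(n - k)
            return comb(n, k) * p**k * (1 - p)**(n - k)

        n, p = 10, 0.3
        pmf = [binomial_pmf(n, p, k) for k in range(n + 1)]
        mean = sum(k * q for k, q in zip(range(n + 1), pmf))
        print(round(sum(pmf), 10), round(mean, 10))   # 1.0 3.0 (= np)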

    9.4.2 Geometric Distribution

    Definition 2. The Geometric distribution G(p) is the function

    G(p, k) = (1 − p)^(k−1) p = q^(k−1) p, where k = 1, 2, . . . .

    Theorem 3. Consider an infinite Bernoulli process of trials each of which has success probability p. If the random variable X is the number of trials conducted until success occurs for the first time, then X has the geometric distribution G(p).


    Theorem 4. If X ∼ G(p) and n is a positive integer, then P(X > n) = (1 − p)^n = q^n.

    Corollary 5. If X ∼ G(p), then the cumulative distribution function F is given by F(x) = P(X ≤ x) = 1 − (1 − p)^⌊x⌋ = 1 − q^⌊x⌋ for x ∈ R.

    Note that ⌊x⌋ denotes the largest integer less than or equal to x.

    Theorem 6. If X is a random variable and X ∼ G(p), then

    E(X) = 1/p; Var(X) = (1 − p)/p².

    9.4.3 Sign Tests

    Often, we have a sample of data consisting of independent observations of some quantity of interest, and it might be of interest to see whether the observed values differ systematically from some fixed and pre-determined value.

    To answer this question, one may use a sign test approach as follows:

    1. Count the number of observations that are strictly greater than the target value (+).

    2. Count the total number of observations that are either strictly greater (+) or strictly smaller (−) than the target value.

    3. Calculate the tail probability that measures how often one would expect to observe as many increases (+) as were observed, if there were equal probability of + and −.

    In this course, we will say that if the tail probability is less than 5% then we will regard this as significant.
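    A hedged sketch of this recipe (plain Python; the data are invented): observations equal to the target are discarded, and the tail probability is a binomial sum with success probability 1/2.

        from math import comb

        target = 50.0
        data = [52.1, 49.0, 53.4, 51.2, 50.0, 54.8, 50.9, 52.6]

        plus = sum(1 for x in data if x > target)         # step 1
        n = sum(1 for x in data if x != target)           # step 2 (ties dropped)
        tail = sum(comb(n, k) for k in range(plus, n + 1)) / 2**n   # step 3

        print(plus, n, tail)       # 6 7 0.0625
        print(tail < 0.05)         # False: not significant at the 5% level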

    9.5 Continuous random variables

    Definition 1. A random variable X is continuous if and only if F_X(x) is continuous.

    Strictly speaking, F_X(x) must actually be piecewise differentiable, which means that F_X(x) is differentiable except for at most countably many points. However, the above definition is good enough for our present purposes.

    Definition 2. The probability density function f(x) of a continuous random variable X is defined by

    f(x) = f_X(x) = (d/dx) F(x), x ∈ R

    if F(x) is differentiable, and lim_{x→a} (d/dx) F(x) if F(x) is not differentiable at x = a.

    Theorem 1. F(x) = ∫_{−∞}^{x} f(t) dt.


    9.5.1 The mean and variance of a continuous random variable

    Definition 3. The expected value (or mean) of a continuous random variable X with probability density function f(x) is defined to be

    μ = E(X) = ∫_{−∞}^{∞} x f(x) dx.

    Theorem 2. If X is a continuous random variable with density function f(x), and g(x) is a real function, then the expected value of Y = g(X) is

    E(Y) = E(g(X)) = ∫_{−∞}^{∞} g(x) f(x) dx.

    Definition 4. The variance of a continuous random variable X is

    Var(X) = E((X − E(X))²) = E(X²) − (E(X))².

    The standard deviation of X is σ = SD(X) = √Var(X).

    Theorem 3. If a and b are constants, then

    E(aX + b) = aE(X) + b

    Var(aX + b) = a² Var(X)

    SD(aX + b) = |a| SD(X).

    Theorem 4. If E(X) = μ and Var(X) = σ², and Z = (X − μ)/σ, then E(Z) = 0 and Var(Z) = 1.

    9.6 Special Continuous Distributions

    9.6.1 The Normal Distribution

    Definition 1. A continuous random variable X has normal distribution N(μ, σ²) if it has probability density

    φ(x) = (1/√(2πσ²)) e^(−(1/2)((x − μ)/σ)²), where −∞ < x < ∞.