Summary 1231 Alg
A Summary of
MATH1231 Mathematics 1B
ALGEBRA NOTES
For MATH1231 students, this summary is an extract from the Algebra Notes in the MATH1231/41 Course Pack, for revision.
If you find any mistake or typo, please send me an email to [email protected]
Chapter 6
VECTOR SPACES
6.1 Definitions and examples of vector spaces
Definition 1. A vector space V over the field F is a non-empty set V of vectors on which addition of vectors is defined and multiplication by a scalar is defined in such a way that the following ten fundamental properties are satisfied:
1. Closure under Addition. If u, v ∈ V, then u + v ∈ V.
2. Associative Law of Addition. If u, v, w ∈ V, then (u + v) + w = u + (v + w).
3. Commutative Law of Addition. If u, v ∈ V, then u + v = v + u.
4. Existence of Zero. There exists an element 0 ∈ V such that, for all v ∈ V, v + 0 = v.
5. Existence of Negative. For each v ∈ V there exists an element w ∈ V (usually written as −v), such that v + w = 0.
6. Closure under Multiplication by a Scalar. If v ∈ V and λ ∈ F, then λv ∈ V.
7. Associative Law of Multiplication by a Scalar. If λ, μ ∈ F and v ∈ V, then λ(μv) = (λμ)v.
8. If v ∈ V and 1 ∈ F is the scalar one, then 1v = v.
9. Scalar Distributive Law. If λ, μ ∈ F and v ∈ V, then (λ + μ)v = λv + μv.
10. Vector Distributive Law. If λ ∈ F and u, v ∈ V, then λ(u + v) = λu + λv.
The following are vector spaces.
- R^n over R, where n is a positive integer.
- C^n over C, where n is a positive integer.
- P(F) and Pn(F) over F, where F is a field. Usually F is either Q, R or C.
- Mmn(F) over F, where m and n are positive integers and F is a field.
Furthermore, the following set, its subset of all continuous functions and its subset of all differentiable functions are vector spaces over R.
- R[X], the set of all possible real-valued functions with domain X.
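The axioms above can be spot-checked numerically. The following is a minimal sketch (not from the Course Pack), representing vectors in R^3 as Python tuples and checking a handful of the ten properties on particular vectors and scalars.

```python
# A minimal sketch: spot-checking a few of the ten vector space axioms for R^3,
# with vectors represented as tuples. The test vectors and scalars are arbitrary.

def add(u, v):
    return tuple(a + b for a, b in zip(u, v))

def scale(c, v):
    return tuple(c * a for a in v)

u, v, w = (1.0, 2.0, 3.0), (-4.0, 0.5, 2.0), (0.0, 1.0, -1.0)
lam, mu = 2.0, -3.0
zero = (0.0, 0.0, 0.0)

assert add(add(u, v), w) == add(u, add(v, w))                  # associativity
assert add(u, v) == add(v, u)                                  # commutativity
assert add(v, zero) == v                                       # existence of zero
assert add(v, scale(-1.0, v)) == zero                          # existence of negative
assert scale(lam, scale(mu, v)) == scale(lam * mu, v)          # scalar associativity
assert scale(lam + mu, v) == add(scale(lam, v), scale(mu, v))  # scalar distributive
```

A passing check on one triple of vectors is of course no proof; the axioms hold for R^n because they reduce componentwise to the field axioms of R.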
6.2 Vector arithmetic
Proposition 1. In any vector space V , the following properties hold for addition.
1. Uniqueness of Zero. There is one and only one zero vector.
2. Cancellation Property. If u, v, w ∈ V satisfy u + v = u + w, then v = w.
3. Uniqueness of Negatives. For all v ∈ V, there exists only one w ∈ V such that v + w = 0.
Proposition 2. Suppose that V is a vector space over a field F, λ, μ ∈ F, v ∈ V, 0 is the zero scalar in F and 0 is the zero vector in V. Then the following properties hold for multiplication by a scalar:
1. Multiplication by the zero scalar. 0v = 0,
2. Multiplication of the zero vector. λ0 = 0.
3. Multiplication by −1. (−1)v = −v (the additive inverse of v).
4. Zero products. If λv = 0, then either λ = 0 or v = 0.
5. Cancellation Property. If λv = μv and v ≠ 0 then λ = μ.
6.3 Subspaces
Definition 1. A subset S of a vector space V is called a subspace of V if S is itself a vector space over the same field of scalars as V and under the same rules for addition and multiplication by scalars.
In addition, if there is at least one vector in V which is not contained in S, the subspace S is called a proper subspace of V.
Theorem 1 (Subspace Theorem). A subset S of a vector space V over a field F, under the same rules for addition and multiplication by scalars, is a subspace of V if and only if
i) The vector 0 in V also belongs to S.
ii) S is closed under vector addition, and
iii) S is closed under multiplication by scalars from F.
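The three conditions of the Subspace Theorem can be illustrated on a concrete subset. The sketch below (the set and helpers are invented for illustration) checks them for the plane S = {(x, y, z) ∈ R^3 : x + y + z = 0}; of course, a finite number of spot checks only suggests, and does not prove, closure.

```python
# A small sketch checking the three Subspace Theorem conditions for
# S = {(x, y, z) in R^3 : x + y + z = 0}, with vectors as tuples.

def in_S(v):
    return abs(sum(v)) < 1e-12      # membership test for the plane x + y + z = 0

def add(u, v):
    return tuple(a + b for a, b in zip(u, v))

def scale(c, v):
    return tuple(c * a for a in v)

# i) the zero vector belongs to S
assert in_S((0.0, 0.0, 0.0))

# ii) closure under vector addition (spot check on vectors known to lie in S)
u, v = (1.0, 2.0, -3.0), (4.0, -5.0, 1.0)
assert in_S(u) and in_S(v) and in_S(add(u, v))

# iii) closure under multiplication by scalars
assert in_S(scale(-7.5, u))
```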
6.4 Linear combinations and spans
Definition 1. Let S = {v1, . . . , vn} be a finite set of vectors in a vector space V over a field F. Then a linear combination of S is a sum of scalar multiples of the form
λ1 v1 + · · · + λn vn with λ1, . . . , λn ∈ F.
Proposition 1 (Closure under Linear Combinations). If S is a finite set of vectors in a vector space V, then every linear combination of S is also a vector in V.
Definition 2. Let S = {v1, . . . , vn} be a finite set of vectors in a vector space V over a field F. Then the span of the set S is the set of all linear combinations of S, that is,
span(S) = span(v1, . . . , vn) = {v ∈ V : v = λ1 v1 + · · · + λn vn for some λ1, . . . , λn ∈ F}.
Theorem 2 (A span is a subspace). If S is a finite, non-empty set of vectors in a vector space V, then span(S) is a subspace of V. Further, span(S) is the smallest subspace containing S (in the sense that span(S) is a subspace of every subspace which contains S).
Definition 3. A finite set S of vectors in a vector space V is called a spanning set for V if span(S) = V or, equivalently, if every vector in V can be expressed as a linear combination of vectors in S.
6.4.1 Matrices and spans in Rm
Proposition 3 (Matrices, Linear Combinations and Spans). If S = {v1, . . . , vn} is a set of vectors in R^m and A is the m × n matrix whose columns are the vectors v1, . . . , vn, then
a) a vector b in R^m can be expressed as a linear combination of S if and only if it can be expressed in the form Ax for some x in R^n,
b) a vector b in R^m belongs to span(S) if and only if the equation Ax = b has a solution x in R^n.
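Part (b) suggests an algorithm: to test whether b ∈ span(S), row reduce the augmented matrix [A|b] and check consistency. The sketch below is illustrative (the vectors and the `consistent` helper are made up, not from the Course Pack) and uses exact arithmetic via `Fraction`.

```python
# A hedged sketch of Proposition 3(b): b lies in span(S) iff Ax = b is
# consistent, where the columns of A are the vectors of S. Basic Gaussian
# elimination over exact rationals; the example vectors are invented.
from fractions import Fraction

def consistent(A, b):
    """Return True iff Ax = b has a solution (row reduce the augmented matrix)."""
    M = [[Fraction(x) for x in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    rows, cols = len(M), len(M[0])
    r = 0
    for c in range(cols - 1):           # eliminate column by column
        piv = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(r + 1, rows):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * p for a, p in zip(M[i], M[r])]
        r += 1
    # inconsistent iff some row reads (0 ... 0 | nonzero)
    return not any(all(x == 0 for x in row[:-1]) and row[-1] != 0 for row in M)

# columns v1 = (1,0,1), v2 = (0,1,1): their span is the plane z = x + y
A = [[1, 0], [0, 1], [1, 1]]
assert consistent(A, [2, 3, 5])       # (2,3,5) = 2 v1 + 3 v2, so it is in the span
assert not consistent(A, [1, 1, 3])   # 1 + 1 != 3, so (1,1,3) is not in the span
```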
Definition 4. The subspace of R^m spanned by the columns of an m × n matrix A is called the column space of A and is denoted by col(A).
6.4.2 Solving problems about spans
6.5 Linear independence
Definition 1. Suppose that S = {v1, . . . , vn} is a subset of a vector space. The set S is a linearly independent set if the only values of the scalars λ1, λ2, . . . , λn for which
λ1 v1 + · · · + λn vn = 0
are λ1 = λ2 = · · · = λn = 0.
Definition 2. Suppose that S = {v1, . . . , vn} is a subset of a vector space. The set S is a linearly dependent set if it is not a linearly independent set, that is, if there exist scalars λ1, . . . , λn, not all zero, such that
λ1 v1 + · · · + λn vn = 0.
6.5.1 Solving problems about linear independence
We have seen that questions about spans in R^m can be answered by relating them to questions about the existence of solutions for systems of linear equations. The same is true for questions about linear dependence in R^m.
Proposition 1. If S = {a1, . . . , an} is a set of vectors in R^m and A is the m × n matrix whose columns are the vectors a1, . . . , an, then the set S is linearly dependent if and only if the system Ax = 0 has at least one non-zero solution x ∈ R^n.
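For the special case of three vectors in R^3 (where A is square), Ax = 0 has a non-zero solution exactly when det(A) = 0, so Proposition 1 reduces to a determinant check. This sketch is an illustrative special case, not the general algorithm:

```python
# An illustrative special case of Proposition 1: three vectors in R^3 are
# linearly dependent exactly when the 3x3 matrix with those columns has
# determinant zero. The example vectors are made up.

def det3(c1, c2, c3):
    """3x3 determinant by cofactor expansion; c1, c2, c3 are the columns."""
    (a, d, g), (b, e, h), (c, f, i) = c1, c2, c3
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# v3 = v1 + v2, so {v1, v2, v3} is linearly dependent
assert det3((1, 0, 0), (0, 1, 0), (1, 1, 0)) == 0
# the standard basis of R^3 is linearly independent
assert det3((1, 0, 0), (0, 1, 0), (0, 0, 1)) == 1
```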
6.5.2 Uniqueness and linear independence
Theorem 2 (Uniqueness of Linear Combinations). Let S be a finite, non-empty set of vectors in a vector space and let v be a vector which can be written as a linear combination of S. Then the values of the scalars in the linear combination for v are unique if and only if S is a linearly independent set.
6.5.3 Spans and linear independence
Theorem 3. A set of vectors S is a linearly independent set if and only if no vector in S can be written as a linear combination of the other vectors in S, that is, if and only if no vector in S is in the span of the other vectors in S.
Note. The theorem is equivalent to:
A set of vectors S is a linearly dependent set if and only if at least one vector in S is in the span of the other vectors in S.
Theorem 4. If S is a finite subset of a vector space V and the vector v is in V, then span(S ∪ {v}) = span(S) if and only if v ∈ span(S).
Theorem 5. Suppose that S is a finite subset of a vector space. The span of every proper subset of S is a proper subspace of span(S) if and only if S is a linearly independent set.
Theorem 6. If S is a finite linearly independent subset of a vector space V and v is in V but not in span(S), then S ∪ {v} is a linearly independent set.
6.6 Basis and dimension
6.6.1 Bases
Definition 1. A set of vectors B in a vector space V is called a basis for V if:
1. B is a linearly independent set, and
2. B is a spanning set for V (that is, span (B) = V ).
Let B = {v1, . . . , vn} be a basis for a vector space V over F. Every vector v ∈ V can be uniquely written as
v = λ1 v1 + · · · + λn vn, where λ1, . . . , λn ∈ F.
An orthonormal basis is a basis whose elements are all of length 1 and are mutually orthogonal. The advantage of using an orthonormal basis is that any vector can easily be written as the unique linear combination of the basis, with the scalars found by dot products.
6.6.2 Dimension
Theorem 1. The number of vectors in any spanning set for a vector space V is always greater than or equal to the number of vectors in any linearly independent set in V.
Theorem 2. If a vector space V has a finite basis then every set of basis vectors for V contains the same number of vectors, that is, if B1 = {u1, . . . , um} and B2 = {v1, . . . , vn} are two bases for the same vector space V then m = n.
Definition 2. If V is a vector space with a finite basis, the dimension of V, denoted by dim(V), is the number of vectors in any basis for V. V is called a finite dimensional vector space.
Theorem 3. Suppose that V is a finite dimensional vector space.
1. the number of vectors in any spanning set for V is greater than or equal to the dimension of V;
2. the number of vectors in any linearly independent set in V is less than or equal to the dimension of V;
3. if the number of vectors in a spanning set is equal to the dimension then the set is also a linearly independent set and hence a basis for V;
4. if the number of vectors in a linearly independent set is equal to the dimension then the set is also a spanning set and hence a basis for V.
6.6.3 Existence and construction of bases
Theorem 4. If S is a finite non-empty subset of a vector space then S contains a subset which is a basis for span(S).
In particular, if V is any non-zero vector space which can be spanned by a finite set of vectors then V has a basis.
Theorem 5. Suppose that V is a vector space which can be spanned by a finite set of vectors. If S is a linearly independent subset of V then there exists a basis for V which contains S as a subset. In other words, every linearly independent subset of V can be extended to a basis for V.
Theorem 6 (Reducing a spanning set to a basis in R^m). Suppose that S = {v1, . . . , vn} is any subset of R^m and A is the matrix whose columns are the members of S. If U is a row-echelon form for A and S′ is created from S by deleting those vectors which correspond to non-leading columns in U, then S′ is a basis for span(S).
Theorem 7 (Extending a linearly independent set to a basis in R^m). Suppose that S = {v1, . . . , vn} is a linearly independent subset of R^m and A is the matrix whose columns are the members of S followed by the members of the standard basis for R^m. If U is a row-echelon form for A and S′ is created by choosing those columns of A which correspond to leading columns in U, then S′ is a basis for R^m containing S as a subset.
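The reduction procedure of Theorem 6 can be sketched in code: row reduce the matrix whose columns are the vectors of S, and keep only the vectors corresponding to leading columns. The example vectors and the `leading_columns` helper below are made up for illustration; arithmetic is exact via `Fraction`.

```python
# A sketch of Theorem 6: keep the vectors of S that correspond to leading
# columns of a row-echelon form of A (the matrix whose columns are S).
from fractions import Fraction

def leading_columns(A):
    """Indices of leading columns in a row-echelon form of A (list of rows)."""
    M = [[Fraction(x) for x in row] for row in A]
    rows, cols = len(M), len(M[0])
    lead, r = [], 0
    for c in range(cols):
        piv = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if piv is None:
            continue                     # non-leading column
        M[r], M[piv] = M[piv], M[r]
        for i in range(r + 1, rows):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * p for a, p in zip(M[i], M[r])]
        lead.append(c)
        r += 1
    return lead

# S = {v1, v2, v3} with v2 = 2 v1; v2 corresponds to a non-leading column
S = [(1, 0, 1), (2, 0, 2), (0, 1, 1)]
A = [[S[j][i] for j in range(3)] for i in range(3)]   # vectors as columns
basis = [S[j] for j in leading_columns(A)]
assert basis == [(1, 0, 1), (0, 1, 1)]
```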
Proposition 8. If V is a finite-dimensional space and W is a subspace of V and dim(W) = dim(V), then W = V.
Proof. By Theorem 4, there exists a basis B for W. So B is a linearly independent set in V. By Theorem 3 part 4, B is also a basis for V and V = span(B) = W.
6.9 A brief review of set and function notation
6.9.1 Set notation.
A set is any collection of elements. Sets are usually defined either by listing their elements or bygiving a rule for selection of the elements. The elements of a set are usually enclosed in braces {}.
Definition 1. Two sets A and B are equal (notation A = B) if every element ofA is an element of B, and if every element of B is an element of A.
To prove that A = B it is necessary to prove that the two conditions:
1. if x ∈ A then x ∈ B, and
2. if x ∈ B then x ∈ A
are both satisfied.
Definition 2. A set A is a subset of another set B (notation A ⊆ B) if every element of A is also an element of B.
To prove that A ⊆ B it is necessary to prove that the condition:
if x ∈ A then x ∈ B
is satisfied.
Definition 3. A is said to be a proper subset of B if A is a subset of B and at least one element of B is not an element of A.
To prove that A is a proper subset of B it is necessary to prove that the two conditions:
1. if x ∈ A then x ∈ B, and
2. for some x ∈ B, x is not an element of A
are both satisfied.
Definition 4. The intersection of two sets A and B (notation: A ∩ B) is the set of elements which are common to both sets.
That is, A ∩ B = {x : x ∈ A and x ∈ B}.
Definition 5. The union of two sets A and B (notation: A ∪ B) is the set of all elements which are in either or both sets.
That is, A ∪ B = {x : x ∈ A or x ∈ B}.
6.9.2 Function notation
The notation f : X → Y (which is read as "f is a function (or map) from the set X to the set Y") means that f is a rule which associates exactly one element y ∈ Y to each element x ∈ X. The y associated with x is written as y = f(x) and is called the value of the function f at x or the image of x under f. The set X is often called the domain of the function f and the set Y is often called the codomain of the function f.
Equality of Functions. Two functions f : X → Y and g : X → Y are defined to be equal if and only if f(x) = g(x) for all x ∈ X.
Addition of Functions. If f : X → Y and g : X → Y, and if elements of Y can be added, then the sum function f + g is defined by
(f + g)(x) = f(x) + g(x) for all x ∈ X.
Multiplication by a Scalar. If f : X → Y and λ ∈ F, where F is a field, and if elements of Y can be multiplied by elements of F, then the function λf is defined by
(λf)(x) = λ(f(x)) for all x ∈ X.
Multiplication of Functions. If f : X → Y and g : X → Y, and if elements of Y can be multiplied, then the product function fg is defined by
(fg)(x) = f(x)g(x) for all x ∈ X.
Composition of Functions. If g : X → W and f : W → Y, then the composition function f ∘ g : X → Y is defined by
(f ∘ g)(x) = f(g(x)) for all x ∈ X.
Chapter 7
LINEAR TRANSFORMATIONS
7.1 Introduction to linear maps
Definition 1. Let V and W be two vector spaces over the same field F. A function T : V → W is called a linear map or linear transformation if the following two conditions are satisfied.
Addition Condition. T(u + v) = T(u) + T(v) for all u, v ∈ V, and
Scalar Multiplication Condition. T(λv) = λT(v) for all λ ∈ F and v ∈ V.
Proposition 1. If T : V → W is a linear map, then
1. T(0) = 0 and
2. T(−v) = −T(v) for all v ∈ V.
Theorem 2. A function T : V → W is a linear map if and only if, for all λ1, λ2 ∈ F and v1, v2 ∈ V,
T(λ1 v1 + λ2 v2) = λ1 T(v1) + λ2 T(v2). (#)
Theorem 3. If T is a linear map with domain V and S is a set of vectors in V, then the function value of a linear combination of S is equal to the corresponding linear combination of the function values of S, that is, if S = {v1, . . . , vn} and λ1, . . . , λn are scalars, then
T(λ1 v1 + · · · + λn vn) = λ1 T(v1) + · · · + λn T(vn).
Theorem 4. For a linear map T : V → W, the function values for every vector in the domain are known if and only if the function values for a basis of the domain are known.
Further, if B = {v1, . . . , vn} is a basis for the domain V then for all v ∈ V we have
T(v) = x1 T(v1) + · · · + xn T(vn),
where x1, . . . , xn are the scalars in the unique linear combination v = x1 v1 + · · · + xn vn of the basis B.
7.2 Linear maps from R^n to R^m and m × n matrices
Theorem 1. For each m × n matrix A, the function TA : R^n → R^m, defined by
TA(x) = Ax for x ∈ R^n,
is a linear map.
Theorem 2 (Matrix Representation Theorem). Let T : R^n → R^m be a linear map and let the vectors ej for 1 ≤ j ≤ n be the standard basis vectors for R^n. Then the m × n matrix A whose columns are given by
aj = T(ej) for 1 ≤ j ≤ n
has the property that
T(x) = Ax for all x ∈ R^n.
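The theorem can be checked on a small example. The map below, T(x, y) = (x + y, x − y, 2x), is invented for illustration: its matrix is built column by column from T(e1) and T(e2), and Ax is computed as the corresponding linear combination of those columns.

```python
# A small sketch of the Matrix Representation Theorem for a made-up linear
# map T : R^2 -> R^3. The columns of A are a_j = T(e_j).

def T(x, y):
    return (x + y, x - y, 2 * x)

cols = [T(1, 0), T(0, 1)]            # the columns T(e1), T(e2) of A

def matvec(cols, x):
    """Ax computed as x1*a1 + x2*a2, a linear combination of the columns."""
    return tuple(sum(c[i] * xi for c, xi in zip(cols, x))
                 for i in range(len(cols[0])))

for x in [(3, 4), (-1, 2), (0, 0)]:
    assert T(*x) == matvec(cols, x)   # T(x) = Ax for all tested x
```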
7.3 Geometric examples of linear transformations
Proposition 1. Suppose that T : R^n → R^m is a linear map. It maps a line in R^n to either a line or a point in R^m.
7.4 Subspaces associated with linear maps
7.4.1 The kernel of a map
Definition 1. Let T : V → W be a linear map. Then the kernel of T (written ker(T)) is the set of all zeroes of T, that is, it is the subset of the domain V defined by
ker(T) = {v ∈ V : T(v) = 0}.
Definition 2. For an m × n matrix A, the kernel of A is the subset of R^n defined by
ker(A) = {x ∈ R^n : Ax = 0},
that is, it is the set of all solutions of the homogeneous equation Ax = 0.
Theorem 1. If T : V → W is a linear map, then ker(T) is a subspace of the domain V.
Definition 3. The nullity of a linear map T is the dimension of ker(T). The nullity of a matrix A is the dimension of ker(A).
Proposition 2. Let A be an m × n matrix with real entries and TA : R^n → R^m the associated linear transformation. Then
ker(TA) = ker(A).
Proposition 3. For a matrix A:
nullity(A) = maximum number of independent vectors in the solution space of Ax = 0
= number of parameters in the solution of Ax = 0 obtained by Gaussian elimination and back substitution
= number of non-leading columns in an equivalent row-echelon form U for A.
Proposition 4. The columns of a matrix A are linearly independent if and only if nullity(A) = 0.
7.4.2 Image
Definition 4. Let T : V → W be a linear map. Then the image of T is the set of all function values of T, that is, it is the subset of the codomain W defined by
im(T) = {w ∈ W : w = T(v) for some v ∈ V}.
Definition 5. The image of an m × n matrix A is the subset of R^m defined by
im(A) = {b ∈ R^m : b = Ax for some x ∈ R^n}.
Theorem 5. Let T : V → W be a linear map between vector spaces V and W. Then im(T) is a subspace of the codomain W of T.
Proposition 6. Let A be an m × n matrix with real entries and TA : R^n → R^m the associated linear transformation. Then
im(A) = im(TA).
Definition 6. The rank of a linear map T is the dimension of im(T). The rank of a matrix A is the dimension of im(A).
Proposition 7. For a matrix A:
rank(A) = maximal number of linearly independent columns of A
= number of leading columns in a row-echelon form U for A
7.4.3 Rank, nullity and solutions of Ax = b
Theorem 8 (Rank-Nullity Theorem for Matrices). For any matrix A,
rank(A) + nullity(A) = number of columns of A.
Theorem 9 (Rank-Nullity Theorem). Suppose V and W are finite dimensional vector spaces and T : V → W is linear. Then
rank(T ) + nullity(T ) = dim(V ).
Theorem 10. The equation Ax = b has:
1. no solution if rank(A) ≠ rank([A|b]), and
2. at least one solution if rank(A) = rank([A|b]). Further,
i) if nullity(A) = 0 the solution is unique, whereas,
ii) if nullity(A) = ν > 0, then the general solution is of the form
x = xp + λ1 k1 + · · · + λν kν for λ1, . . . , λν ∈ R,
where xp is any solution of Ax = b, and where {k1, . . . , kν} is a basis for ker(A).
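The structure "particular solution plus anything in the kernel" can be verified on a tiny example. Below, A = [1 1] (a single equation in two unknowns) is an invented illustration with nullity 1: xp = (2, 0) solves x1 + x2 = 2 and k = (1, −1) spans ker(A), so every xp + t·k should again be a solution.

```python
# A hedged illustration of Theorem 10, part 2(ii), for A = [1 1] and b = 2.

def apply_A(x):                      # the matrix A = [1 1] acting on R^2
    return x[0] + x[1]

xp, k, b = (2.0, 0.0), (1.0, -1.0), 2.0
assert apply_A(xp) == b              # xp is a particular solution
assert apply_A(k) == 0.0             # k lies in ker(A)

for t in (-3.0, 0.5, 7.0):
    x = (xp[0] + t * k[0], xp[1] + t * k[1])
    assert apply_A(x) == b           # the general solution x = xp + t k
```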
7.5 Further applications and examples of linear maps
7.10 One-to-one, onto and inverses for functions
Definition 1. The range or image of a function is the set of all function values, that is, for a function f : X → Y,
im(f) = {y ∈ Y : y = f(x) for some x ∈ X}.
Definition 2. A function is said to be onto (or surjective) if the codomain is equal to the image of the function, that is, a function f : X → Y is onto if for all y ∈ Y there exists an x ∈ X such that y = f(x).
Definition 3. A function is said to be one-to-one (or injective) if no point in the codomain is the function value of more than one point in the domain, that is, a function f : X → Y is one-to-one if f(x1) = f(x2) if and only if x1 = x2.
Definition 4. Let f : X → Y be a function. Then a function g : Y → X is called an inverse of f if it satisfies the two conditions:
a) g ∘ f = idX, where idX is the identity function on X with the property that idX(x) = x for all x ∈ X.
b) f ∘ g = idY, where idY is the identity function on Y with the property that idY(y) = y for all y ∈ Y.
Theorem 1. A function has an inverse if and only if the function is both one-to-one and onto.
Chapter 8
EIGENVALUES AND
EIGENVECTORS
8.1 Definitions and examples
Definition 1. Let T : V → V be a linear map. Then if a scalar λ and non-zero vector v ∈ V satisfy
T(v) = λv,
then λ is called an eigenvalue of T and v is called an eigenvector of T for the eigenvalue λ.
Note. An eigenvector is non-zero, but zero can be an eigenvalue.
Definition 2. Let A ∈ Mnn(C) be a square matrix. Then if a scalar λ ∈ C and non-zero vector v ∈ C^n satisfy
Av = λv,
then λ is called an eigenvalue of A and v is called an eigenvector of A for the eigenvalue λ.
8.1.1 Some fundamental results
Theorem 1. A scalar λ is an eigenvalue of a square matrix A if and only if det(A − λI) = 0, and then v is an eigenvector of A for the eigenvalue λ if and only if v is a non-zero solution of the homogeneous equation (A − λI)v = 0, i.e., if and only if v ∈ ker(A − λI) and v ≠ 0.
Theorem 2. If A is an n × n matrix and λ ∈ C, then det(A − λI) is a complex polynomial of degree n in λ.
Definition 3. For a square matrix A, the polynomial p(λ) = det(A − λI) is called the characteristic polynomial for the matrix A.
Theorem 3. An n × n matrix A has exactly n eigenvalues in C (counted according to their multiplicities). These eigenvalues are the zeroes of the characteristic polynomial p(λ) = det(A − λI).
Note.
1. The equation p(λ) = 0 is called the characteristic equation for A.
2. Theorem 3 is of fundamental theoretical importance, as it proves the existence of eigenvalues of a matrix. However, with the exception of 2 × 2 and specially constructed larger matrices, modern methods of finding eigenvalues of matrices do not make use of this theorem. These efficient modern methods are currently available in standard matrix software packages such as MAPLE and MATLAB.
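For a 2 × 2 matrix the characteristic-polynomial route is still practical: p(λ) = λ² − tr(A)λ + det(A), solved by the quadratic formula. The matrix below is an arbitrary symmetric example chosen so the eigenvalues are real.

```python
# A sketch of the characteristic polynomial approach for a made-up 2x2 matrix.
# As the note says, this is not how eigenvalues are found for large matrices.
import math

a, b, c, d = 2.0, 1.0, 1.0, 2.0      # A = [[2, 1], [1, 2]]
tr, det = a + d, a * d - b * c       # p(lambda) = lambda^2 - tr*lambda + det
disc = tr * tr - 4 * det
lams = sorted([(tr - math.sqrt(disc)) / 2, (tr + math.sqrt(disc)) / 2])
assert lams == [1.0, 3.0]

# check Av = lambda*v for the eigenvector v = (1, 1) of lambda = 3
v = (1.0, 1.0)
Av = (a * v[0] + b * v[1], c * v[0] + d * v[1])
assert Av == (3.0 * v[0], 3.0 * v[1])
```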
8.1.2 Calculation of eigenvalues and eigenvectors
8.2 Eigenvectors, bases, and diagonalisation
Theorem 1. If an n × n matrix has n distinct eigenvalues then it has n linearly independent eigenvectors.
Note. Even if the n × n matrix does not have n distinct eigenvalues, it may have n linearly independent eigenvectors.
Theorem 2. If an n × n matrix A has n linearly independent eigenvectors, then there exists an invertible matrix M and a diagonal matrix D such that
M⁻¹AM = D.
Further, the diagonal elements of D are the eigenvalues of A and the columns of M are the eigenvectors of A, with the jth column of M being the eigenvector corresponding to the jth element of the diagonal of D.
Conversely, if M⁻¹AM = D with D diagonal, then the columns of M are n linearly independent eigenvectors of A.
Definition 1. A square matrix A is said to be a diagonalisable matrix if there exists an invertible matrix M and diagonal matrix D such that M⁻¹AM = D.
8.3 Applications of eigenvalues and eigenvectors
8.3.1 Powers of A
Proposition 1. Let D be the diagonal matrix
D = diag(λ1, λ2, . . . , λn),
that is, the n × n matrix with λ1, . . . , λn on the main diagonal and zeros elsewhere.
Then, for k ≥ 1,
D^k = diag(λ1^k, λ2^k, . . . , λn^k).
Proposition 2. If A is diagonalisable, that is, if there exists an invertible matrix M and diagonal matrix D such that M⁻¹AM = D, then
A^k = M D^k M⁻¹ for integer k ≥ 1.
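Proposition 2 can be checked directly on a small example. The matrix A = [[2, 1], [1, 2]] below is invented for illustration; it has eigenvalues 1 and 3 with eigenvectors (1, −1) and (1, 1), and M⁻¹ is computed by hand from the 2 × 2 inverse formula.

```python
# A small sketch of Proposition 2: compare A^3 computed directly with
# M D^3 M^{-1}, for the made-up diagonalisable matrix A = [[2, 1], [1, 2]].

def matmul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[2.0, 1.0], [1.0, 2.0]]
M = [[1.0, 1.0], [-1.0, 1.0]]                 # columns are the eigenvectors
Minv = [[0.5, -0.5], [0.5, 0.5]]              # inverse of M, computed by hand
k = 3
Dk = [[1.0 ** k, 0.0], [0.0, 3.0 ** k]]       # D^k = diag(1^k, 3^k)

Ak = matmul(matmul(A, A), A)                  # A^3 by repeated multiplication
assert matmul(matmul(M, Dk), Minv) == Ak
```

The point of the identity is that once M and D are known, A^k costs two matrix multiplications for any k, rather than k − 1 of them.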
8.3.2 Solution of first-order linear differential equations
Proposition 3. y(t) = v e^(λt) is a solution of
dy/dt = Ay
if and only if λ is an eigenvalue of A and v is an eigenvector for the eigenvalue λ.
Proposition 4. If u1(t) and u2(t) are two solutions of the equation
dy/dt = Ay,
then any linear combination of u1 and u2 is also a solution.
Chapter 9
INTRODUCTION TO
PROBABILITY AND STATISTICS
9.1 Some Preliminary Set Theory
Definition 1. A set is a collection of objects. These objects are called elements.
Definition 2.
- A set A is a subset of a set B (written A ⊆ B) if and only if each element of A is also an element of B; that is, if x ∈ A, then x ∈ B.
- The power set P(A) of A is the set of all subsets of A.
- The universal set S is the set that denotes all objects of given interest.
- The empty set ∅ (or {}) is the set with no elements.
Definition 3. A set S is countable if its elements can be listed as a sequence.
Definition 4. For all subsets A, B ⊆ S, define the following set operations:
- complement of A: A^c = {x ∈ S : x ∉ A}
- intersection of A and B: A ∩ B = {x ∈ S : x ∈ A and x ∈ B}
- union of A and B: A ∪ B = {x ∈ S : x ∈ A or x ∈ B}
- difference: A − B = {x ∈ S : x ∈ A but x ∉ B} = A ∩ B^c
Definition 5. Sets A and B are disjoint (or mutually exclusive) if and only if
A ∩ B = ∅.
Definition 6. Disjoint subsets A1, . . . , Ak partition a set B if and only if
A1 ∪ · · · ∪ Ak = B.
Lemma 1. If A1, . . . , An partition S and B is a subset of S, then A1 ∩ B, . . . , An ∩ B partition B.
Definition 7. If A is a set, then |A| is the number of elements in A.
The Inclusion-Exclusion Principle. |A ∪ B| = |A| + |B| − |A ∩ B|.
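The set operations above, and the Inclusion-Exclusion Principle, map directly onto Python's built-in `set` type; the sets in this sketch are made up.

```python
# Set operations and Inclusion-Exclusion with Python's built-in sets.
S = set(range(10))                    # universal set
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

assert S - A == {0, 5, 6, 7, 8, 9}    # complement of A
assert A & B == {3, 4}                # intersection
assert A | B == {1, 2, 3, 4, 5, 6}    # union
assert A - B == {1, 2} == A & (S - B) # difference equals A intersect B-complement

# Inclusion-Exclusion: len(A | B) == len(A) + len(B) - len(A & B)
assert len(A | B) == len(A) + len(B) - len(A & B)
```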
9.2 Probability
9.2.1 Sample Space and Probability Axioms
Definition 1. A sample space of an experiment is the set of all possible outcomes.
Definition 2. A probability P on a sample space S is any real function on P(S) that satisfies the following conditions:
(a) 0 ≤ P(A) ≤ 1 for all A ⊆ S;
(b) P(∅) = 0;
(c) P(S) = 1;
(d) If A and B are disjoint, then P(A ∪ B) = P(A) + P(B).
Theorem 1. Let P be a probability on a sample space S, and let A be an event in S.
1. If S is finite (or countable), then P(A) = Σ_{a∈A} P({a}).
2. If S is finite and P({a}) is constant for all outcomes a ∈ S, then P(A) = |A|/|S|.
3. If S is finite (or countable), then Σ_{a∈S} P({a}) = 1.
9.2.2 Rules for Probabilities
Theorem 2. Let A and B be events of a sample space S.
1. P(A ∪ B) = P(A) + P(B) − P(A ∩ B) (Addition Rule)
2. P(A^c) = 1 − P(A)
3. If A ⊆ B, then P(A) ≤ P(B).
9.2.3 Conditional Probabilities
We now consider what happens if we restrict the sample space from S to some event in S.
Definition 3. The conditional probability of A given B is denoted and defined by
P(A|B) = P(A ∩ B) / P(B)
provided that P(B) ≠ 0.
Lemma 3. For any fixed event B, the function P (A|B) is a probability on S.
Multiplication Rule. P(A ∩ B) = P(A|B)P(B) = P(B|A)P(A).
Total Probability Rule
If A1, . . . , An partition S and B is an event, then P(B) = Σ_{i=1}^{n} P(B|Ai)P(Ai).
Bayes' Rule
If A1, . . . , An partition S and B is an event, then
P(Aj|B) = P(B|Aj)P(Aj) / Σ_{i=1}^{n} P(B|Ai)P(Ai).
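The two rules combine naturally in a worked example. The scenario and all numbers below are invented: two factories A1 and A2 partition production (60% and 40%), and B is the event that an item is defective.

```python
# A numeric sketch of the Total Probability and Bayes rules; the partition
# probabilities and conditional probabilities are made-up illustration values.
P_A = [0.6, 0.4]                      # P(A1), P(A2): a partition of S
P_B_given_A = [0.01, 0.05]            # P(B|A1), P(B|A2)

# Total Probability Rule: P(B) = sum over i of P(B|Ai) P(Ai)
P_B = sum(pb * pa for pb, pa in zip(P_B_given_A, P_A))
assert abs(P_B - 0.026) < 1e-12

# Bayes' Rule: P(A2|B) = P(B|A2) P(A2) / P(B)
P_A2_given_B = P_B_given_A[1] * P_A[1] / P_B
assert abs(P_A2_given_B - 0.02 / 0.026) < 1e-12
```

Note how conditioning reverses: even though A2 produces only 40% of items, it accounts for about 77% of the defectives.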
9.2.4 Statistical Independence
Definition 4. Events A and B are (statistically) independent if and only if
P(A ∩ B) = P(A)P(B).
Definition 5. Events A1, . . . , An are mutually independent if and only if, for any sub-collection Ai1, . . . , Aik of these,
P(Ai1 ∩ · · · ∩ Aik) = P(Ai1) · · · P(Aik).
Theorem 4. If events A1, . . . , An are mutually independent and Bi is either Ai or Ai^c for each i = 1, . . . , n, then B1, . . . , Bn are also mutually independent.
Theorem 5. If events A1,1, . . . , A1,n1, A2,1, . . . , Am,nm are mutually independent and for each i = 1, . . . , m, the event Bi is obtained from Ai,1, . . . , Ai,ni by taking unions, intersections, and complements, then B1, . . . , Bm are also mutually independent.
9.3 Random Variables
Definition 1. A random variable is a real function defined on a sample space.
Definition 2. For a random variable X on some sample space S, define for all subsets A ⊆ R and real numbers r ∈ R,
- {X ∈ A} = {s ∈ S : X(s) ∈ A}
- {X = r} = {s ∈ S : X(s) = r}
- {X ≤ r} = {s ∈ S : X(s) ≤ r}
- ... and so on.
Definition 3. The cumulative distribution function of a random variable X is given by
FX(x) = P(X ≤ x) for x ∈ R.
9.3.1 Discrete Random Variables
Definition 4. A random variable X is discrete if its image is countable.
Definition 5. The probability distribution of a discrete random variable X is the function P(X = x) on R. We sometimes write this as pk = P(X = xk).
9.3.2 The Mean and Variance of a Discrete Random Variable
Definition 6. The expected value (or mean) of a discrete random variable X with probability distribution pk is
E(X) = Σ_{all k} xk pk.
The expected value E(X) is often denoted by μ or μX.
Theorem 1. Let X be a discrete random variable with probability distribution pk = P(X = xk). Then for any real function g(x), the expected value of Y = g(X) is
E(Y) = E(g(X)) = Σ_k g(xk) pk.
Definition 7. The variance of a discrete random variable X is
Var(X) = E((X − E(X))²).
The standard deviation of X is SD(X) = √Var(X).
The standard deviation is often denoted by σ or σX, and the variance is often written as σ² or σX².
Theorem 2. Var(X) = E(X²) − (E(X))².
Theorem 3. If a and b are constants, then
E(aX + b) = aE(X) + b
Var(aX + b) = a2Var(X)
SD(aX + b) = |a|SD(X) .
9.4 Special Distributions
A Bernoulli trial is an experiment with two outcomes, often "success" and "failure", or Y(es) and N(o), or {1, 0}, where P(Y) and P(N) are denoted by p and q = 1 − p, respectively. A Bernoulli process is an experiment composed of a sequence of identical and mutually independent Bernoulli trials.
9.4.1 The Binomial Distribution
Definition 1. The Binomial distribution B(n, p) for n ∈ N is the function
B(n, p, k) = C(n, k) p^k (1 − p)^(n−k) where k = 0, 1, . . . , n,
and C(n, k) = n!/(k!(n − k)!) is the binomial coefficient.
Theorem 1. If X is the random variable that counts the successes of some Bernoulli process with n trials having success probability p, then X has the binomial distribution B(n, p).
We write X ~ B(n, p) to denote that X is a random variable with this distribution.
Theorem 2. If X is a random variable and X ~ B(n, p), then
E(X) = np; Var(X) = npq = np(1 − p).
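The binomial pmf, and the mean and variance formulas of Theorem 2, can be checked numerically with `math.comb`; the parameters n = 10, p = 0.3 are chosen arbitrarily for illustration.

```python
# A sketch of the binomial distribution B(n, p) using math.comb.
import math

def binom_pmf(n, p, k):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.3
pmf = [binom_pmf(n, p, k) for k in range(n + 1)]

assert abs(sum(pmf) - 1.0) < 1e-12                 # probabilities sum to 1
mean = sum(k * pk for k, pk in enumerate(pmf))
var = sum(k * k * pk for k, pk in enumerate(pmf)) - mean**2
assert abs(mean - n * p) < 1e-9                    # E(X) = np
assert abs(var - n * p * (1 - p)) < 1e-9           # Var(X) = np(1 - p)
```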
9.4.2 Geometric Distribution
Definition 2. The Geometric distribution G(p) is the function
G(p, k) = (1 − p)^(k−1) p = q^(k−1) p where k = 1, 2, . . . .
Theorem 3. Consider an infinite Bernoulli process of trials each of which has success probability p. If the random variable X is the number of trials conducted until success occurs for the first time, then X has the geometric distribution G(p).
Theorem 4. If X ~ G(p) and n is a positive integer, then P(X > n) = (1 − p)^n = q^n.
Corollary 5. If X ~ G(p), then the cumulative distribution function F is given by F(x) = P(X ≤ x) = 1 − (1 − p)^⌊x⌋ = 1 − q^⌊x⌋ for x ∈ R.
Note that ⌊x⌋ denotes the largest integer less than or equal to x.
Theorem 6. If X is a random variable and X ~ G(p), then
E(X) = 1/p; Var(X) = (1 − p)/p².
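Theorems 4 and 6 can be verified against partial sums of the pmf; the parameter p = 0.25 below is arbitrary, and the mean is approximated by truncating the infinite sum.

```python
# A sketch of the geometric distribution G(p): check P(X > n) = q^n and
# E(X) = 1/p numerically, for an arbitrary p.
p = 0.25
q = 1 - p

def geom_pmf(k):                      # P(X = k) = q^(k-1) p, for k = 1, 2, ...
    return q ** (k - 1) * p

n = 5
tail = 1 - sum(geom_pmf(k) for k in range(1, n + 1))
assert abs(tail - q ** n) < 1e-12     # Theorem 4: P(X > n) = q^n

# E(X) = 1/p, approximated by a long partial sum of k * P(X = k)
mean = sum(k * geom_pmf(k) for k in range(1, 2000))
assert abs(mean - 1 / p) < 1e-9
```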
9.4.3 Sign Tests
Often, we have a sample of data consisting of independent observations of some quantity of interest, and we may wish to see whether the observed values differ systematically from some fixed and pre-determined value.
To answer this question, one may use a sign test approach as follows:
1. Count the number of observations that are strictly greater than the target value (+).
2. Count the total number of observations that are either strictly greater (+) or strictly smaller (−) than the target value.
3. Calculate the tail probability that measures how often one would expect to observe as many increases (+) as were observed, if there were equal probability of + and −.
In this course, we will say that if the tail probability is less than 5% then we will regard this as significant.
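The three steps above can be sketched on invented data with a made-up target value of 50; under the null hypothesis the number of +'s among the non-ties follows B(n, 1/2).

```python
# A worked sketch of the sign test on invented data, target value 50.
import math

data = [52, 55, 51, 53, 58, 49, 54, 56, 57, 50]
target = 50

plus = sum(1 for x in data if x > target)       # step 1: count the +'s
n = sum(1 for x in data if x != target)         # step 2: count +'s and -'s
assert (plus, n) == (8, 9)

# step 3: tail probability P(X >= 8) for X ~ B(9, 1/2)
tail = sum(math.comb(n, k) for k in range(plus, n + 1)) / 2 ** n
assert abs(tail - 10 / 512) < 1e-15
print(f"tail probability = {tail:.4f}, significant at 5%: {tail < 0.05}")
```

Here the tail probability is about 0.0195, below the 5% cut-off, so by the convention above the data differ significantly from the target.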
9.5 Continuous random variables
Definition 1. Random variable X is continuous if and only if FX(x) is continuous.
Strictly speaking, FX(x) must actually be piecewise differentiable, which means that FX(x) is differentiable except for at most countably many points. However, the above definition is good enough for our present purposes.
Definition 2. The probability density function f(x) of a continuous random variable X is defined by
f(x) = fX(x) = (d/dx) F(x), x ∈ R
if F(x) is differentiable, and lim_{x→a} (d/dx) F(x) if F(x) is not differentiable at x = a.
Theorem 1. F(x) = ∫_{−∞}^{x} f(t) dt.
9.5.1 The mean and variance of a continuous random variable
Definition 3. The expected value (or mean) of a continuous random variable X with probability density function f(x) is defined to be
μ = E(X) = ∫_{−∞}^{∞} x f(x) dx.
Theorem 2. If X is a continuous random variable with density function f(x), and g(x) is a real function, then the expected value of Y = g(X) is
E(Y) = E(g(X)) = ∫_{−∞}^{∞} g(x) f(x) dx.
Definition 4. The variance of a continuous random variable X is
Var(X) = E((X − E(X))²) = E(X²) − (E(X))².
The standard deviation of X is σ = SD(X) = √Var(X).
Theorem 3. If a and b are constants, then
E(aX + b) = aE(X) + b
Var(aX + b) = a2Var(X)
SD(aX + b) = |a|SD(X) .
Theorem 4. If E(X) = μ and Var(X) = σ², and Z = (X − μ)/σ, then E(Z) = 0 and Var(Z) = 1.
9.6 Special Continuous Distributions
9.6.1 The Normal Distribution
Definition 1. A continuous random variable X has normal distribution N(μ, σ²) if it has probability density
φ(x) = (1/√(2πσ²)) e^(−(1/2)((x−μ)/σ)²) where −∞ < x < ∞.