November 7th, 2014 (Lecture 10)

SCNC1111 Scientific Method and Reasoning Part IIb Lecture 10 Probability and Statistics 7-11-2014


Transcript of November 7th, 2014 (Lecture 10)

  • SCNC1111 Scientific Method and Reasoning

    Part IIb

    Lecture 10 Probability and Statistics

    7-11-2014

  • Eddy Lam Department of Statistics and Actuarial Science

  • What is statistics? The term statistics is ultimately derived from the New

    Latin statisticum collegium ("council of state") and the Italian word statista ("statesman" or "politician").

    The German Statistik, first introduced in 1749, originally designated the analysis of data about the state, signifying the "science of state" (then called "political arithmetic" in English or Official Statistics of the Government).

    The scope of the discipline of statistics broadened in the early 19th century to include the collection and analysis of data in general. Today, statistics is widely employed in government, business, and natural and social sciences.

  • What is statistics?

  • What is statistics?

    Source: http://populationpyramid.net/hong-kong-sar-china/2050/

  • What is statistics? According to the Oxford Dictionary of

    Statistical Terms (2003), Statistics is the study of the collection, organization, presentation, analysis, and interpretation of data.

    There are many different kinds of surveys going on every day.

    For example: election

  • Midterm Election 2014 (Nov. 4)

    Red: Republican; Blue: Democratic

    Source: http://en.wikipedia.org/wiki/United_States_elections,_2014

  • Survey Statistics

    Someone drew a conclusion that:

    In Japan, men yell at women, while in Hong Kong, men are yelled at by women.

    Japanese women and Hong Kong men

    live the longest in the world.

  • Favorite Food Survey

    Foodtank, 2006. Pickupcafe, 2008. Nipic, 2011.

    A. Egg puffs

    B. Hong Kong-style milk tea

    C. Turtle Herbal Jelly

    D. Wonton noodles

    E. Fish ball

  • 1st Egg puffs

    2nd Hong Kong-style milk tea

    3rd Turtle Herbal Jelly

    Foodtank, 2006. Pickupcafe, 2008. Nipic, 2011.

  • Favorite Food Survey

    Foodtank, 2006. Pickupcafe, 2008. Nipic, 2011.

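    Scoring: a 1st choice earns 5 points, a 2nd choice 4, a 3rd choice 3, a 4th choice 2 and a 5th choice 1.

    A. $5 \times 31431 + 4 \times (21434 + 13732 + 13702) + 3 \times 11047 = 385768$

    B. $5 \times 21434 + 4 \times 31431 + 3 \times (13732 + 13702) + 2 \times 11047 = 337290$

    C. $5 \times 13732 + 3 \times 21434 + 1 \times (31431 + 13702 + 11047) = 189142$

    D. $5 \times 13702 + 4 \times 11047 + 2 \times (31431 + 13732) + 1 \times 21434 = 224458$

    E. $5 \times 11047 + 3 \times 31431 + 2 \times (21434 + 13702) + 1 \times 13732 = 233532$

    Number of students   31431   21434   13732   13702   11047

    1st choice           A       B       C       D       E

    2nd choice           B       A       A       A       D

    3rd choice           E       C       B       B       A

    4th choice           D       E       D       E       B

    5th choice           C       D       E       C       C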

  • Favorite Food Survey

    Method 1: Turtle Herbal Jelly ranks third if we simply count the number of students who rank it as their favorite food.

    Method 2: Turtle Herbal Jelly ranks fifth if we use the scoring method from the previous slide.

    Method 3: Turtle Herbal Jelly ranks fifth if we simply count the number of students who rank it as their least favorite food [61.5% (31431 + 13702 + 11047 out of 91346) rated it as their least favorite food].
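The three ranking methods can be checked directly from the preference table. The following is a minimal Python sketch, not part of the original slides: the names `GROUPS`, `count_at_position`, `borda_scores` and `rank_of` are made up for illustration, and the 5-4-3-2-1 weights are the ones implied by the scoring slide.

```python
# Preference table from the slide: (number of students, ranking from 1st to 5th choice).
GROUPS = [
    (31431, "ABEDC"),
    (21434, "BACED"),
    (13732, "CABDE"),
    (13702, "DABEC"),
    (11047, "EDABC"),
]
FOODS = "ABCDE"  # C is Turtle Herbal Jelly


def count_at_position(position):
    """Number of students who put each food at the given 0-based position."""
    counts = dict.fromkeys(FOODS, 0)
    for size, ranking in GROUPS:
        counts[ranking[position]] += size
    return counts


def borda_scores():
    """Weighted score: 5 points for a 1st choice down to 1 point for a 5th choice."""
    scores = dict.fromkeys(FOODS, 0)
    for size, ranking in GROUPS:
        for position, food in enumerate(ranking):
            scores[food] += (5 - position) * size
    return scores


def rank_of(food, totals, largest_first=True):
    """1-based rank of `food` after sorting the totals."""
    ordered = sorted(totals, key=totals.get, reverse=largest_first)
    return ordered.index(food) + 1


first_choice = count_at_position(0)
last_choice = count_at_position(4)
scores = borda_scores()
total_students = sum(size for size, _ in GROUPS)

print("Method 1 rank of C:", rank_of("C", first_choice))
print("Method 2 rank of C:", rank_of("C", scores))
# Method 3: the fewer last-place votes the better, so sort smallest first.
print("Method 3 rank of C:", rank_of("C", last_choice, largest_first=False))
print("Share rating C last:", round(last_choice["C"] / total_students, 3))
```

The same data gives a different rank for the same food depending on which summary is chosen, which is the point of the slide.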

    Wikipedia, 2012.

  • Rating of CE Leung Chun-ying? HKU POP 11/03/2014, based on a random sample of n = 998

    Please use a scale of 0-100 to rate your extent of support to the Chief Executive Leung Chun-ying, with 0 indicating absolutely not supportive, 100 indicating absolutely supportive and 50 indicating half-half. How would you rate the Chief Executive Leung Chun-ying?

    [Figure: rating scale from 0 to 100, with 50 at the midpoint]

    Wikipedia, 2012.

  • Rating of CE Leung Chun-ying?

    Mean rating = 47.4 (Failed?)

    Median rating = 50

    Mode rating = 50 (280/998, the answer that occurred most frequently; the first runner-up is 60 with 105/998, the second runner-up is 40 with 92/998, the third runner-up is 0 with 91/998, ..., the eighth runner-up is 100)

    Digit preference effect: mode rating = 0 (91/998, after some manipulation)

    Another measure is the proportion rating 50 or above.

  • Rating of CE Leung Chun-ying? 615/998 = 61.62% rated the CE 50 or above (or 668/998 = 66.9% rated the CE 50 or below),

    with a 95% confidence interval for the 61.62% figure given by (58.6%, 64.6%).

    Question: Which measure do you think is more appropriate to describe the rating of CY? Mean, median, mode or proportion? The answer is very obvious!
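The slide does not say how the 95% confidence interval was computed. As an assumption, the usual normal-approximation (Wald) interval for a proportion reproduces the quoted figures; the sketch below uses a hypothetical helper `proportion_ci` and the conventional critical value z = 1.96.

```python
import math


def proportion_ci(successes, n, z=1.96):
    """Normal-approximation (Wald) interval for a proportion.

    Assumption: this is one common method; the slide does not name the one used.
    """
    p = successes / n
    standard_error = math.sqrt(p * (1 - p) / n)
    return p - z * standard_error, p + z * standard_error


low, high = proportion_ci(615, 998)
print(f"{low:.1%} to {high:.1%}")
```

The interval describes sampling uncertainty only; it says nothing about which summary (mean, median, mode or proportion) should have been reported in the first place.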

  • Experimenter's bias: Observer-expectancy effect

    Pro-CY: the proportion (more than 61% of people supported CY) is more appropriate, or at least the median (50, absolutely a pass).

    Against CY: the mean (47.4, absolute failure) or the mode (0, total failure).

    Scientifically (common people; ref?): a clear, objective definition of the outcome variable is important!

    For example: a perfect and SUCCESSFUL surgery, but the patient died.

  • Experimenter's bias: Observer-expectancy effect

    A researcher's cognitive bias may cause them to unconsciously influence the participants of an experiment.

    Confirmation bias can cause the experimenter to interpret the results with a tendency to look for information that conforms to their research hypothesis and to overlook information against it.

    Examples:

    Placebo effect: patients given a placebo treatment will have a perceived or actual improvement in their medical condition.

    Pygmalion effect, or Rosenthal effect: the phenomenon whereby the greater the expectation placed upon people, the better they perform (education and social class research).

  • Summary of the Idealized Scientific Method

    Observation

    Hypothesis

    Prediction

    Experimental Test

    Confirmation or Falsification

    Statistics is particularly useful and has been extremely successful in proving

    (Generalization via induction)

    (Deduction)

  • Interpretations? Do you know that the great majority of

    people have more than the average number of legs?

    It's obvious really; amongst the 7 million people in Hong Kong there are probably 1,000 people who have only got one leg.

    Therefore the average number of legs is

    $\frac{(1000 \times 1) + (6{,}999{,}000 \times 2)}{7{,}000{,}000} = 1.999857\ldots$

    Since most people have 2 legs.......
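The arithmetic behind this claim is easy to verify. This is a small Python sketch using the slide's own rough figures (1,000 one-legged people out of 7 million); the variable names are illustrative only.

```python
# Number of people by leg count, using the slide's rough figures.
people_by_legs = {1: 1_000, 2: 6_999_000}

total_people = sum(people_by_legs.values())
total_legs = sum(legs * count for legs, count in people_by_legs.items())
average_legs = total_legs / total_people

# Everyone whose leg count exceeds the average.
above_average = sum(
    count for legs, count in people_by_legs.items() if legs > average_legs
)

print("Average number of legs:", round(average_legs, 6))
print("Share above the average:", round(above_average / total_people, 6))
```

A few low values pull the mean just below 2, so almost everyone sits above it. The mean alone can describe a skewed distribution poorly.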

    Wikipedia, 2012.

  • What is Statistics?

    Source: http://www.cafepress.com/+funny_math_statistics_poster,573965856

  • Interpretations? It is proven that the celebration of birthdays is

    healthy!

    -- S. den Hartog, Ph.D. Thesis, University of Groningen

    Weddingcake.name, 2012. NickRay2, 2012.

  • Interpretations? The Japanese eat very little fat and suffer fewer heart

    attacks than the British or the Americans. On the other hand, the French eat a lot of fat and

    also suffer fewer heart attacks than the British or the Americans.

    The Japanese drink very little red wine and suffer fewer heart attacks than the British or the Americans.

    The Italians drink excessive amounts of red wine and also suffer fewer heart attacks than the British or the Americans.

    Conclusion: Eat and drink whatever you like! It's speaking English that kills you.

    Englishblog, 2012.

  • Correlation OR Causation?! Statistics showed that 95% of young people

    have low back and neck pain

    The other 5% have no smart phone or computer

    Source: http://www.cospt.net/?portfolio=low-back-and-neck-pain-assessment-and-treatment

    Source: http://alexandertechniquebrighton.blogspot.hk/2012/07/can-i-be-as-smart-as-my-smart-phone-or.html

    Source: https://steynian.wordpress.com/category/preachery/

  • Correlation OR Causation? Eddy has low back and neck pain!

    What is the implication? There is a 95% chance that Eddy is a young

    man.

  • What is statistics? The mathematical foundations of statistics were laid in the 17th century with the development of probability theory by Blaise Pascal and Pierre de Fermat. Probability theory arose from the study of games of chance because of a gambler.

  • Gambler's problem The well-known gambler's problem: http://youtu.be/_hLbnscj8UQ

    Time: Year 1654 Place: France Person: C. de Méré

  • Probability Game 1: De Méré bets even money that he would throw at least one six in four throws of a die (i.e., de Méré wins if at least one six turns up in four throws).

    Game 2: De Méré bets even money that a double-six would appear at least once if he were given twenty-four throws of two dice.

    De Méré believed that the two games were equally favorable, according to the following argument:

                No. of possible outcomes    No. of throws

    Game 1      6                           4

    Game 2      6 x 6 = 36                  6 x 4 = 24

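  • Probability De Méré's calculation

    Game 1: $\Pr(\text{six}) = \frac{1}{6}$, so $\Pr(\text{at least one six in 4 throws}) = \frac{4}{6} = \frac{2}{3}$

    Game 2: $\Pr(\text{double-six}) = \frac{1}{36}$, so $\Pr(\text{at least one double-six in 24 throws}) = \frac{24}{36} = \frac{2}{3}$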

  • Probability De Méré suffered tremendous losses and posed the problem to a young French mathematician named B. Pascal.

    Pascal discussed the problem with another mathematician, P. de Fermat.

    Through their discussions they not only came up with a convincing and self-consistent solution to this problem:

    $\Pr(\text{de Méré wins in game 1}) = 1 - \left(\frac{5}{6}\right)^{4} = 0.51775$

    $\Pr(\text{de Méré wins in game 2}) = 1 - \left(\frac{35}{36}\right)^{24} = 0.4914$

    Therefore de Méré had a better chance in game 1 than in game 2,

    but also developed concepts that continue to be fundamental to this day.
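The two winning probabilities follow from the complement rule: the gambler wins unless every throw fails. Here is a minimal Python sketch with exact fractions; the function names `at_least_one` and `additive_guess` are illustrative, and the latter encodes the gambler's faulty "throws times single-throw chance" reasoning for comparison.

```python
from fractions import Fraction


def at_least_one(p_single, throws):
    """P(at least one success) = 1 - P(no success in any independent throw)."""
    return 1 - (1 - p_single) ** throws


def additive_guess(p_single, throws):
    """The gambler's faulty reasoning: simply add up the single-throw chances."""
    return throws * p_single


game1 = at_least_one(Fraction(1, 6), 4)     # at least one six in 4 throws
game2 = at_least_one(Fraction(1, 36), 24)   # at least one double-six in 24 throws

print("Game 1:", float(game1))
print("Game 2:", float(game2))
print("Additive guess for both games:", additive_guess(Fraction(1, 6), 4))
# The additive rule cannot be right in general: with 7 throws it exceeds 1.
print("Additive guess for 7 throws:", additive_guess(Fraction(1, 6), 7))
```

Game 1 comes out slightly above one half and game 2 slightly below, so even-money bets favor the gambler only in game 1.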

  • Probability To most people, probability is a loosely defined term employed in everyday conversation to indicate the measure of one's belief in the occurrence of a future event.

    In the U.S. and many other countries, the weather report even forecasts the probabilities of precipitation for the next day.

  • Probability Definition 1: A random experiment is a process leading to at least two possible outcomes with uncertainty as to which will occur.

    Examples are: (a) Tossing a coin; (b) Tossing a die; (c) Asking a student whether s/he likes this course or not; (d) Rise or fall of today's HSI.

  • Probability Definition 2: The possible outcomes of a

    random experiment are called the basic outcomes, and the set of all basic outcomes is called the sample space, S.

    Examples: (a) S = { H, T }; (b) S = { 1, 2, 3, 4, 5, 6 }; (c) S = { Yes, So So, No }; (d) S = { up, down }.

  • Probability Definition 3: An event is a set of basic

    outcomes, or a collection of some basic outcomes from the sample space S, and it is said to occur if the random experiment gives rise to one of its constituent basic outcomes.

    Obtaining an even number is an event in the die tossing experiment, as it consists of the basic outcomes {2, 4, 6}.

  • Example 1 (Tossing coins) What is the sample space for the experiment of tossing a fair coin once, twice and three times?

    For tossing a coin once, $S_1 = \{H, T\}$.

    For tossing a coin twice, $S_2 = \{HH, HT, TH, TT\}$.

    For tossing a coin three times, $S_3 = \{HHH, HHT, HTH, HTT, THH, THT, TTH, TTT\}$.
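These sample spaces can also be enumerated mechanically. The sketch below is illustrative only: `coin_sample_space` is a made-up helper that uses the standard library's `itertools.product` to list every sequence of heads (H) and tails (T).

```python
from itertools import product


def coin_sample_space(tosses):
    """All sequences of H and T of the given length, as strings."""
    return ["".join(outcome) for outcome in product("HT", repeat=tosses)]


for tosses in (1, 2, 3):
    space = coin_sample_space(tosses)
    print(tosses, "toss(es):", len(space), "basic outcomes:", space)
```

Each extra toss doubles the number of basic outcomes, so n tosses give 2^n of them.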

  • Axioms of Probability

    1. If A is any event in the sample space S, then $0 \le P(A) \le 1$.

    2. Let A be an event in S, and let $O_i$ denote the basic outcomes (basic outcomes are mutually exclusive). Then $P(A) = \sum_{O_i \in A} P(O_i)$, where the sum runs over all basic outcomes in A.

    3. $P(S) = 1$.

  • Three Conceptual Approaches to Probability

    1) Classical Probability

    Basic outcomes that have the same probability of occurrence are called equally likely outcomes. The classical probability rule is applied to compute the probabilities of events for an experiment in which all the basic outcomes are equally likely (say, tossing a coin, Mark Six).

    Example 2: Find the probability of obtaining a head and the probability of obtaining a tail in one toss of a coin.

    Example 3: Find the probability of obtaining an even number in one roll of a die.
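Under the classical rule, the probability of an event is the number of basic outcomes in the event divided by the number of basic outcomes in the sample space. A minimal Python sketch for Examples 2 and 3 follows; the helper name `classical_probability` is an assumption, and it presumes a fair coin and a balanced die.

```python
from fractions import Fraction


def classical_probability(event, sample_space):
    """Outcomes in the event divided by outcomes in the sample space.

    Only valid when every basic outcome is equally likely.
    """
    favourable = sum(1 for outcome in sample_space if outcome in event)
    return Fraction(favourable, len(sample_space))


coin = ["H", "T"]
die = [1, 2, 3, 4, 5, 6]

p_head = classical_probability({"H"}, coin)     # Example 2
p_tail = classical_probability({"T"}, coin)     # Example 2
p_even = classical_probability({2, 4, 6}, die)  # Example 3

print(p_head, p_tail, p_even)  # prints 1/2 1/2 1/2
```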

  • Example 4 (Rolling dice) When you roll two balanced dice together, you can get 36 equally likely outcomes, as shown below. What is the probability of

    (a) obtaining a sum of 2? $P(\text{a sum of 2}) = \frac{1}{36} = 0.027778$

    (b) obtaining a sum of 6? $P(\text{a sum of 6}) = \frac{5}{36} = 0.138889$

    (c) obtaining the same number on both dice? $P(\text{the same number}) = \frac{6}{36} = \frac{1}{6} = 0.16667$

    Figures from: Introductory Statistics. 9th ed. Neil A. Weiss. Boston: Pearson, 2012. P147
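The three answers can be confirmed by listing all 36 outcomes and counting. This is a short Python sketch under the assumption of two balanced dice; `rolls` and `probability` are illustrative names.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) pairs.
rolls = list(product(range(1, 7), repeat=2))


def probability(predicate):
    """Share of the 36 outcomes for which the predicate holds."""
    hits = sum(1 for roll in rolls if predicate(roll))
    return Fraction(hits, len(rolls))


p_sum_2 = probability(lambda roll: sum(roll) == 2)
p_sum_6 = probability(lambda roll: sum(roll) == 6)
p_same = probability(lambda roll: roll[0] == roll[1])

print(p_sum_2, p_sum_6, p_same)  # prints 1/36 5/36 1/6
```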

  • Example 5 (Drawing cards) A card is selected at random from a deck of 52 cards. What is the probability of

    (a) drawing a diamond? $P(\text{drawing a diamond}) = \frac{13}{52} = \frac{1}{4} = 0.25$

    Figures from: Introductory Statistics. 9th ed. Neil A. Weiss. Boston: Pearson, 2012. p153,154

  • Cont. Example 5

    (b) drawing an Ace? $P(\text{drawing an Ace}) = \frac{4}{52} = \frac{1}{13} = 0.076923$

    (c) drawing a face card (J/Q/K)? Let E denote the event of drawing a face card from a deck of cards. $P(E) = \frac{3 \times 4}{52} = \frac{3}{13} = 0.23077$
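The card probabilities in Example 5 can be counted the same way. The sketch below assumes a standard 52-card deck (13 ranks in each of 4 suits); the names `RANKS`, `SUITS`, `deck` and `probability` are illustrative.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["spades", "hearts", "diamonds", "clubs"]

# Every (rank, suit) pair, each equally likely to be drawn.
deck = list(product(RANKS, SUITS))


def probability(predicate):
    """Share of the 52 cards for which the predicate holds."""
    hits = sum(1 for card in deck if predicate(card))
    return Fraction(hits, len(deck))


p_diamond = probability(lambda card: card[1] == "diamonds")
p_ace = probability(lambda card: card[0] == "A")
p_face = probability(lambda card: card[0] in {"J", "Q", "K"})

print(p_diamond, p_ace, p_face)  # prints 1/4 1/13 3/13
```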

  • 2) Relative Frequency Concept of Probability

    If an experiment is repeated (under the same conditions) n times and an event A is observed f times, then the relative frequency concept of probability gives

    $P(A) = \frac{f}{n}$.

    Example 6: Among the first 10000 consecutive deliveries in Hong Kong this year, 4888 were male. Hence

    $P(\text{male}) = \frac{4888}{10000} = 0.4888$.

    Hence, probability is an abstraction of relative frequency, or relative frequency is a realization of probability.
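One way to see relative frequency as a realization of probability is to simulate a repeatable experiment and watch the observed share settle down. The sketch below is an illustration only, not data from the lecture: it simulates rolls of a balanced die with Python's `random` module, and `relative_frequency`, `roll_die` and `is_six` are made-up names.

```python
import random


def relative_frequency(event, experiment, n, seed=0):
    """Repeat the experiment n times and return (times event observed) / n."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    observed = sum(1 for _ in range(n) if event(experiment(rng)))
    return observed / n


def roll_die(rng):
    """One roll of a balanced six-sided die."""
    return rng.randint(1, 6)


def is_six(outcome):
    return outcome == 6


for n in (100, 10_000, 100_000):
    print(n, "rolls: relative frequency of a six =", relative_frequency(is_six, roll_die, n))
```

With more repetitions the relative frequency tends to move closer to the classical value 1/6, although any single run still fluctuates around it.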


  • 3) Subjective Probability

    Used when the outcomes are not equally likely and the experiment is not repeatable. It is based on the individual's own judgment, experience, information and belief (say, horse racing, NBA games).
