
Stat 201 – Project 2 – Fall 2018

Due Tuesday, October 30, 2018 (by 11:59pm, submitted to Canvas)

Assignments submitted by 11:59pm on Friday, October 26 will receive +7 bonus points.

New Project File: For project 2 you will be using “STAT 201 – Fall 2018 – Project 2.jmp”. This file can be found on the STAT 201 webpage under the “Projects” tab.

Getting Started: In this project, you will explore a subset (i.e., a sample) of some of the data collected from the survey that most Stat 201 students completed this semester. See pages 9-11 for a complete list of the questions asked in this survey (but don’t answer these questions!). Please be aware that some responses to the survey have been deleted, mostly to ensure anonymity of the results. You will be including a substantial amount of output within your write-up. INCLUDE ONLY THE OUTPUT NECESSARY TO ANSWER THE PROJECT QUESTIONS.

The data are found in the file “STAT 201 – Fall 2018 – Project 2.jmp”, which is located on the STAT 201 webpage under the “Projects” tab. This file contains 806 responses. In real-life situations, researchers would use all of the data they have available after conducting a survey. For this project, however, you will get JMP to help you take a random sample from the entire data set so that each student will have different results, and therefore will be turning in a UNIQUE project. The size of the random sample must be 600 plus the last two digits of your UT student ID number. For example, if your UT student ID number is 000314791, you will take a random sample of size 600 + 91 = 691.

When you create your random sample from the original JMP file, JMP creates a new file that will be named “Subset of STAT 201 – Fall 2018 – Project 2”. You should immediately save a copy of this file by clicking the “File” menu and choosing “Save As…”. JMP will prompt you to keep the same name, which is acceptable, or you can rename it to something like “Stat Project2 – My Data”.
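The sample-size rule can be checked with a few lines of Python. This is only an illustrative sketch and is not part of the project, which must be done in JMP. The function names `sample_size` and `draw_row_numbers` and the seed value are made up for this example. The ID is the example ID from the paragraph above, and 806 is the number of responses in the project file.

```python
import random

def sample_size(student_id, base=600):
    """Return base plus the last two digits of the student ID."""
    return base + int(student_id[-2:])

def draw_row_numbers(n_rows, size, seed=None):
    """Pick `size` distinct row numbers from 1..n_rows (no row is chosen twice)."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, n_rows + 1), size))

size = sample_size("000314791")                 # 600 + 91 = 691
rows = draw_row_numbers(806, size, seed=2018)   # seed chosen arbitrarily
print(size, len(rows))
```

Because the largest possible sample is 600 + 99 = 699 rows, the sample always fits within the 806 available responses.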

Taking Screenshots: Although there are many ways to get JMP graphics into a written presentation, we want you to use the “screen shot” method in all cases. Please see the video at http://tinyurl.com/utk-screenshots for instructions on how to take selective screen shots on a PC or a Mac. Clearly label which question and part you are answering so your project is graded correctly! See pages 7-8 for an example project format.

Tutorials and Write-up: See the JMP tutorials at http://web.utk.edu/~cwiek/201Tutorials/ and https://tinyurl.com/Project2PL for instructions on how to get JMP to perform most tasks. Use page 5 of this project for guidance on which tutorial to look at for each question in the project. For every question that asks you to produce output from JMP, we expect the output that answers the question to appear within the write-up. You should put this output immediately after your comments regarding that specific part of the assignment (i.e., not just a series of printouts from JMP at the back of your write-up). You can get help in the Stat 201 Lab with specific questions about the project. You can NOT ask a Stat 201 Lab worker to read your entire project for suggestions on what to change. Your finished work must be submitted within Canvas (see “Assignments”), and must be a Microsoft Word document (.doc or .docx).

JMP and Hodges Library computers: Using JMP installed on your own computer is much simpler than using JMP on a library computer! If you choose to use a computer in the library to do your project, be sure to first read the document “Using JMP in the Library”, found in MyLab under the Project Files tab. Also, you will need to save your project and your random sample subset file to a location you can access later, such as a memory stick. You could also e-mail these files to yourself for later use.

Writing a Good STAT 201 Project Report: Please take note that on page 12 of this document there is a page titled “Writing a Good Stat 201 Project Report”. This page contains a series of guidelines for the written part of your report. A portion of your grade (8%) is related to following these guidelines.

Page 2

Project Questions

Question One

1. (8 points) Using the JMP data file specified earlier, get JMP to select a random sample of size 600 plus the last 2 digits of your student ID number. Report the size of your sample. As you did in project 1, scroll to the bottom of your subset data file, and take a screen shot of the far left-hand portion of your file that includes the first column and at least the last 20 rows. Include this screenshot in your report. See page 7 of the project instructions for an example. Save this JMP file, and use it for all remaining questions.

NOTE - For the remainder of this project, you will be exploring a quantitative variable that you are interested in. You will use the power of the “Decision Tree” platform to help you quickly identify another quantitative variable that is associated with the variable you choose. You will then perform a regression analysis on the variable you choose, and the quantitative variable the Decision Tree identifies as most related to your chosen quantitative variable.

Question Two

2. Pick a quantitative variable in the survey that you are interested in exploring. Quantitative variables are identified in JMP with blue triangles in the far-left display of the column names. The full survey questions can be found on pages 9-11 of this document.

a. (5 points) Report the variable name that you have selected, along with the complete question text (from pages 9-11) for this variable and the units it was recorded in.

b. (6 points) Make a histogram of this variable. It must be horizontally oriented, and have a count axis. Include this histogram, along with the quantiles and summary statistics.

c. (6 points) Examine your data for potential outliers and report whether your data has outliers. Make sure to include a short explanation justifying the removal of the outliers or a short explanation justifying why you feel your histogram has no outliers. If you do feel you have outliers, use "Hide and Exclude" to generate another histogram without these observation(s). Your histogram should be formatted as in part (b).

Note: Don’t “Hide and Exclude” all data values the box plot identifies as an outlier! We always need justification to remove an outlier.
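To see why the box plot only suggests candidates, it helps to look at the rule it uses. The sketch below applies the usual 1.5 × IQR fences to a small set of invented sleep-hours values (the data and the function names are made up for illustration, and JMP may compute quartiles slightly differently than Python's `statistics.quantiles`). A value outside the fences is only flagged for review. Whether to remove it still needs a justification, such as an impossible or clearly mistyped response.

```python
import statistics

def iqr_fences(values, k=1.5):
    """Return (lower, upper) fences at k * IQR beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def flag_candidates(values, k=1.5):
    """List the values outside the fences; these are candidates, not automatic removals."""
    low, high = iqr_fences(values, k)
    return [v for v in values if v < low or v > high]

# Invented answers to a "hours of sleep per night" style question.
hours = [5, 6, 6, 7, 7, 7, 8, 8, 9, 24]
print(flag_candidates(hours))
```

Here the flagged value (24 hours of sleep on an average night) is not believable, which is the kind of reasoning a justification should contain. A large but plausible value would normally stay in the data.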

Page 3

Question Three

3. (6 points) When doing a Decision Tree analysis, we mentioned that identifier variables, and categorical variables with many different levels should not be used in the analysis. Which variables in the survey data should you not use in a Decision Tree analysis? (Hint: Is there an “Identifier Variable” in this file? Also, place ALL of the variables in “Y-columns” in the Analyze – Distribution platform, and look for variables with many levels. This output will be very large: DO NOT include that in your report!)
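The check described in the hint amounts to counting how many different values each categorical column takes. The following sketch shows that idea on invented rows. The column names `RespondentID`, `FavoriteApp` and `BornInTN`, the function names, and the `max_levels` cutoff are all hypothetical and are not taken from the project file. In JMP the same information comes from the Distribution output.

```python
def distinct_levels(rows, column):
    """Count the distinct non-missing values in one column."""
    return len({row[column] for row in rows if row[column] not in (None, "")})

def review_columns(rows, categorical_columns, max_levels=10):
    """Flag categorical columns that look like identifiers or have many levels."""
    report = {}
    for col in categorical_columns:
        k = distinct_levels(rows, col)
        if k == len(rows):
            report[col] = "identifier-like (every row differs)"
        elif k > max_levels:
            report[col] = "many levels"
    return report

# Invented rows with hypothetical column names.
rows = [
    {"RespondentID": "A1", "FavoriteApp": "Maps", "BornInTN": "Yes"},
    {"RespondentID": "A2", "FavoriteApp": "Music", "BornInTN": "No"},
    {"RespondentID": "A3", "FavoriteApp": "Camera", "BornInTN": "Yes"},
    {"RespondentID": "A4", "FavoriteApp": "Maps", "BornInTN": "Yes"},
]
print(review_columns(rows, ["RespondentID", "FavoriteApp", "BornInTN"], max_levels=2))
```

The cutoff for "many levels" is a judgment call. The sketch uses a tiny value only because the example has four rows.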

Question Four

4. Next you will use the “Partition” (i.e., “Decision Tree”) technique in JMP. Because “Decision Trees” are not covered in the STAT 201 JMP Tutorials, detailed instructions for how to produce the necessary output are provided on page 6, or you can watch the short YouTube video on Decision Trees linked in the Question Help Section on page 5.

a. (8 points) Place the variable you selected in question 2 in the “Y,Response” box. If you used “Hide and Exclude” on any rows in question 2(c), be sure to keep these rows hidden and excluded. Place all other variables in the “X,factor” box. Be sure to remove your Y variable from the “X,factor” box, as well as the variables you identified in question 3. Produce three (3) “splits”. If a quantitative variable appears in the decision tree within the first three splits, use the quantitative variable highest in the tree (i.e., the quantitative variable that first showed up as you created your splits) as X in the remaining steps. Otherwise, continue to split until a quantitative variable is identified in a split, and use that quantitative variable as X in the remaining steps. Report the variable name, and the full question text for this variable (from pages 9-11), that you will use as your “X” variable in the remaining steps. Include all the Decision Tree output in your report (if necessary, use multiple screen shots if you can’t get all the output from JMP on your screen at one time).

b. (6 points) Describe the direction of the association (either positive or negative, suggested by the Decision Tree output) between your Y variable and the X variable. Use values from the Decision Tree output to help you explain the direction of the association.

c. (5 points) Are you surprised by the direction of the association between these two variables? Briefly explain your reasoning. (Note: there is no “correct” answer to this, it is simply your opinion.)
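One way to think about part (b) is to look at a single split on a quantitative X: the tree divides the rows at a cut point and reports the mean of Y on each side. The sketch below finds the cut that minimizes the total squared error around the two group means. This is a simplified textbook criterion and is an assumption, not a description of JMP's exact Partition algorithm. The data and function names are invented for illustration.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def best_split(x, y):
    """Return (cut, mean of y where x < cut, mean of y where x >= cut)."""
    cuts = sorted(set(x))[1:]          # each cut leaves both sides non-empty
    if not cuts:
        raise ValueError("x needs at least two distinct values")
    best = None
    for cut in cuts:
        left = [yi for xi, yi in zip(x, y) if xi < cut]
        right = [yi for xi, yi in zip(x, y) if xi >= cut]
        total = sse(left) + sse(right)
        if best is None or total < best[0]:
            best = (total, cut, sum(left) / len(left), sum(right) / len(right))
    return best[1:]

# Invented data: y tends to be larger when x is larger.
x = [1, 2, 3, 4, 5, 6]
y = [10, 12, 11, 20, 22, 21]
cut, mean_low, mean_high = best_split(x, y)
direction = "positive" if mean_high > mean_low else "negative"
print(cut, mean_low, mean_high, direction)
```

If the mean of Y is higher in the branch with the larger X values, the tree suggests a positive association. If it is lower, the association is negative. The means shown in the Decision Tree output are the values to quote in your explanation.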

Page 4

Question Five

5. You will now do a regression analysis using your Y variable and the X variable identified in question 4.

a. (5 points) Produce a scatterplot of your Y and X variables using JMP's Analyze - Fit Y by X platform. Include this scatterplot in your report.

b. (5 points) Examine your scatterplot for potential outliers and report whether your data has outliers or not. If you do feel you have outliers, justify your conclusion, then use "Hide and Exclude" to generate another scatterplot without these observation(s). Make sure to include a short explanation justifying the removal of the outliers or a short explanation justifying why you feel your scatterplot has no outliers. If you have any data that was hidden in 2(c), make sure it is still hidden.

c. (5 points) Fit a least-squares regression line to your scatterplot, and include the scatterplot with the line, and all resulting output, in your report.

d. (5 points) Report the value of RSquare. Interpret this value: don’t comment on the magnitude of this number; tell the reader what this number means.

e. (6 points) Is the linear relationship “statistically significant”? How do you know?

f. (8 points) Is the direction of the association in the regression analysis the same as what you described in question 4(b)? What numerical value in your output indicates the direction of the association? Briefly interpret this value in the context of these data.
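For readers who want to see where the slope and RSquare in the JMP output come from, here is a minimal sketch of the standard least-squares formulas on invented data. The function name and the data are made up for illustration, and the sketch does not compute the p-value needed for part (e), which should be read from the JMP output.

```python
def least_squares(x, y):
    """Return (slope, intercept, r_squared) for the least-squares line of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                      # sign gives the direction of the association
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)     # share of the variation in y explained by the line
    return slope, intercept, r_squared

# Invented data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
slope, intercept, r_squared = least_squares(x, y)
print(round(slope, 3), round(intercept, 3), round(r_squared, 3))
```

In this invented example the slope is positive, so Y tends to increase as X increases, and RSquare is the proportion of the variation in Y that is accounted for by the linear relationship with X.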

Additional point values:

Project organization and flow (8 points): Projects should look neat and organized. Use the crop tool in Word if you need to improve screenshots. Your project should read like a report without the prompt of each question.

Use of the guidelines on page 12 (8 points): The opening paragraph of the project should give a short summary (3-5 sentences) of the analysis the reader is about to read. The closing paragraph should summarize interesting findings and discuss any ideas you have regarding further data collection and/or analysis.

Page 5

QUESTION HELP SECTION – Clickable Video Links Provided

Question | Heading | Tutorial | Video Link
1 | Miscellaneous Topics | Taking a Simple Random Sample | CLICK HERE – Taking a Random Sample
2b | Graphical Display of Quantitative Data | Histogram & Box Plot | CLICK HERE – Histograms and Box Plots
2c | Miscellaneous Topics | Excluding Data from an Analysis | CLICK HERE – Excluding Data
4a | -- | -- | CLICK HERE – Decision Trees
5a | Correlation and Regression | Scatterplot | CLICK HERE – Scatterplots & Regression
5b | Miscellaneous Topics | Excluding Data from an Analysis | CLICK HERE – Excluding Data
5c | Correlation and Regression | Least-Squares Regression Line, Residual Plots and Histogram of Residuals | CLICK HERE – Scatterplots & Regression

Page 6

Producing the “Decision Tree” Output

1. Analyze -> Predictive Modeling -> Partition.

2. Click on your quantitative variable, then click “Y,Response”.

3. Highlight ALL the variables on the left, and click “X,Factor”. In the box to the right of “X,Factor”, scroll to the variable you placed in “Y,Response”, click it and then click Remove. Also remove the variables you identified in question 3. Click OK.

4. Click Split 3 times (or more, as described in question 4(a)). Take a screen shot of all resulting output and place it in your report.

This output may be large. It is suggested you use one page, or more if needed, to show the output in your report. See the example on page 8. Note: In the example on page 8, it took 6 splits before a quantitative variable showed up in the decision tree. You should see a quantitative variable show up in your decision tree before the 6th split!

Page 7

EXAMPLE FORMAT – Showing Question 1

STAT 201 Project #2 – Fall 2018 – Learning about Fall 2018 Stat 201 Students

Submitted by Jane Q. Student

Note: Your first three to five sentences should address item #1 on page 12 of this document.

1. Since the last two digits of my student ID number are 21, I took a random sample of size 621.

Page 8

EXAMPLE FORMAT – Showing Decision Tree Output

Note: We suggest you use one full page of your report to clearly show the decision tree. Not all decision trees will be this big, but make sure yours is large enough to be read. The following decision tree was made using Fall 2016 data and is not possible to replicate with Fall 2018 data. Your decision tree should follow the guidelines in the project regarding the number of splits.

Page 9

STAT 201 SURVEY – Fall 2018 FOR REFERENCE ONLY - FULL TEXT OF QUESTIONS ASKED

Q0 Which section of Stat 201 are you in?
Q1 What is your gender?
Q2 How old are you (In years)?
Q3 Were you born in Tennessee?
Q4 What is your relationship status?
Q5 How far do you live from campus?
Q6 What was your high school GPA?
Q7 Are you a member of a fraternity or sorority?
Q8 Are you an only child, oldest child, middle child or youngest child? Pick one answer that best describes your birth order.
Q09 Have you ever broken a bone?
Q10 Do you have a Roth IRA?
Q11 How many pets do you own?
Q12 Estimate how much you spend on your pets per year; include veterinary expenses, food, toys, treats, grooming, etc. Enter zero if you have no pets.
Q13 How many hours per week do you spend reading assignments from textbooks that your instructors assign? Include all classes, not just Stat 201.
Q14 What is your major?
Q15 How many credit hours are you taking this semester?
Q16 How would you identify the economic level of your immediate family?
Q17 Are you in the honors program at UT?
Q18 What do you expect your starting annual salary (in US dollars) to be when you obtain a college degree?
Q19 Do you (or your parents) plan on, or have you (or your parents) already, taken out student loans to pay for your college expenses?
Q20 How many hours a week do you currently work at a job? If you are not employed, please put 0.
Q21 How many languages can you speak fluently? This includes your native language. This is the language you first learned.
Q22 How would you classify your views on economic political issues?
Q23 How would you classify your views on social political issues?
Q24 What should happen to Confederate statues?
Q25 Should the United States stop making pennies? (Eliminate the penny)
Q26 Will humans step foot on Mars?
Q27 Select the option below that completes the following sentence in a way that best describes your opinion. "Climate change on Earth:
Q28 Which of these do you believe to be closest to the truth regarding life on Earth?
Q29 Have you ever smoked marijuana?
Q30 Do you think marijuana should be legalized at the federal level (For the whole US)?
Q31 Should states have the ability to regulate what couples can marry? (i.e. defining marriage as only one man and one woman)
Q32 Age when you had your first alcoholic beverage. IMPORTANT- Don't count sips or communion. This should be an actual drink of alcohol.

Page 10

Q33 When you eat out at a restaurant that involves a waitress or waiter, what percent do you usually tip? Enter response as a whole number with no decimals for a percentage from 0 to 100.
Q34 Which statement best describes your behavior when you drink water on campus?
Q35 What is the most you've paid for a single coffee-based drink? This includes any size, additions and tips given for the drink.
Q36 Do you use any tobacco products?
Q37 Is vaping a safer alternative to smoking?
Q38 How many times have you cheated in college? This includes looking at another test during an exam, taking another student's work and presenting it as your own and other forms of academic dishonesty.
Q39 How many times did you cheat in high school? This includes looking at another test during an exam, taking another student's work and presenting it as your own and other forms of academic dishonesty.
Q40 On an average night, how many hours of sleep during the school year do you usually get?
Q41 What is the longest number of consecutive hours you've stayed awake?
Q42 Have you ever been arrested?
Q43 Approximately how many text messages do you send a week?
Q44 Approximately how many text messages do you send on the weekend?
Q45 On a typical school day last semester, approximately how many text messages would you send during class (while you were attending class)?
Q46 What is your favorite app on your phone?
Q47 What percentage of your income do you believe you should save in your 20s? Enter response as a whole number with no decimals for a percentage from 0 to 100.
Q48 Have you ever purchased perishable food items online?
Q49 In the past 6 months, have you purchased a product based on a TV commercial?
Q50 How many on-line purchases (not counting music downloads) have you made in the last week?
Q51 How often do you use coupons when you shop (not including on-line shopping)?
Q52 Roughly, how many selfies have you posted on social media in the past month?
Q53 How much are you willing to pay to see your favorite musician in concert?
Q54 What do you think of Kanye West as an individual?
Q55 What do you think of Stephen Colbert as an individual?
Q56 Have you ever watched most or all of a live sporting event on a smart phone or tablet?
Q57 Have you ever gone to a physical store to check out the features of a product, with the intention of purchasing the item online later?
Q58 How do you usually listen to music?
Q59 Have you ever reviewed a business or product on social media (i.e. Twitter, Facebook, etc.)?
Q60 Are you "friends" on Facebook with one or more of your parents (include step-parents in your answer)?

Page 11

Q61 Approximately how many friends do you have on Facebook? If you don't have a Facebook account, answer 0 for this question.
Q62 Approximately how many people have you defriended on Facebook? If you don't have a Facebook account, answer 0 for this question.
Q63 In a typical week last semester, how much time (in hours) did you spend on "social media"? Include both time reading social media and time communicating with social media.
Q64 The Tennessee Vols are scheduled to play 12 football games in the 2018 season (i.e., not including the SEC championship game, and not including a bowl game). Number of games won reported
Q65 Did you lie at any point on this survey?

Page 12

Writing a Good STAT 201 Project Report

Writing a report to your boss about a statistical analysis he has asked you to do is very different from writing a novel, or writing to your Statistics instructor. What does it take to write a good project report? Of course, it’s important to know your audience when you write anything. Let’s assume you are writing your project report for some busy executives in the company, and they have asked you to answer the questions in the project. They are very intelligent people, but they are not “Statisticians”. Assume that these executives have had some basic statistical education, but perhaps a long time ago. Keep this in mind as you complete your project. Below are some guidelines for writing an effective project report:

1. The first sentence or two of your report should “orient” the reader. What is this document about? Who is it from? What will you be covering? On what date did you complete the analysis?

2. Answer each question on the project instructions using correct sentence structure, spelling and grammar. Sentences should be succinct and clear. You can assume the executives have a copy of the questions they asked.

3. Avoid using "statistical jargon". Explain the results of the analysis in a way that the executives can understand.

4. As explained in the project instructions, graphics from JMP and/or Excel that address the project question must be embedded within the document, at the point where the executives need to see them. Don’t make them hunt for the output at the back of your report.

5. Avoid including discussion and/or graphics within the report that have no relevance to the question being addressed.

6. Wrap-up and sign off: Give two to three sentences that showcase your meaningful findings and ideas regarding further data collection or analysis. The wrap-up should contain something meaningful related to the report you wrote and should not just restate results.