
Stat 201 – Project 2 – Spring 2019
Due Friday, March 8, 2019 (by 11:59pm, submitted to Canvas)
Assignments submitted by 11:59pm on Wednesday, March 6 will receive +7 bonus points.

New Project File: For project 2 you will be using “STAT 201 – Spring 2019 – Project 2.jmp”. This file can be found on the STAT 201 webpage under the “Projects” tab.

Getting Started: In this project, you will explore a subset (i.e., a sample) of the data collected from the survey that most Stat 201 students completed this semester. See pages 9-11 for the complete list of survey questions (but don’t answer these questions!). Please be aware that some responses have been deleted, mostly to ensure the anonymity of the results. You will include a substantial amount of output in your write-up. INCLUDE ONLY THE OUTPUT NECESSARY TO ANSWER THE PROJECT QUESTIONS.

The data are in the file “STAT 201 – Spring 2019 – Project 2.jmp” (on the STAT 201 webpage under the “Projects” tab), which contains 1308 responses. In real-life situations, researchers would use all of the data available after conducting a survey. For this project, however, you will have JMP take a random sample from the entire data set, so that each student gets different results and therefore turns in a UNIQUE project. The size of your random sample must be 600 plus the last two digits of your UT student ID number. For example, if your UT student ID number is 000314791, you will take a random sample of size 600 + 91 = 691.

When you create your random sample from the original JMP file, JMP creates a new file named “Subset of STAT 201 – Spring 2019 – Project 2”. Save a copy of this file immediately by opening the “File” menu and choosing “Save As…”. JMP will suggest keeping the same name, which is acceptable, or you can rename it to something like “Stat Project2 – My Data”.
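The sample-size rule is simple arithmetic. As a minimal sketch (Python is not part of this project, and the function name `sample_size` is hypothetical), the rule can be written as:

```python
def sample_size(student_id: str) -> int:
    """Return 600 plus the last two digits of a UT student ID.

    The ID is treated as a string so that leading zeros
    (e.g., 000314791) are preserved.
    """
    digits = student_id.strip()
    if len(digits) < 2 or not digits.isdigit():
        raise ValueError("student ID must contain at least two digits")
    # Only the last two characters matter: "91" -> 91
    return 600 + int(digits[-2:])


# The example from the instructions: ID 000314791 -> 600 + 91 = 691
print(sample_size("000314791"))
```

Because the last two digits range from 00 to 99, every sample size falls between 600 and 699, well below the 1308 responses in the full file.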

Taking Screenshots: Although there are many ways to get JMP graphics into a written presentation, we want you to use the “screen shot” method in all cases. Please see the video at http://tinyurl.com/utk-screenshots for instructions on taking selective screen shots on a PC or a Mac. Clearly label which question and part you are answering so your project is graded correctly! See pages 7-8 for an example project format.

Tutorials and Write-up: See the JMP tutorials at http://web.utk.edu/~cwiek/201Tutorials/ and https://tinyurl.com/Project2PL for instructions on how to get JMP to perform most tasks. Use page 5 of this project for guidance on which tutorial to consult for each question. For every question that asks you to produce JMP output, the output that answers the question must appear in the write-up. Place this output immediately after your comments on that specific part of the assignment (i.e., not as a series of printouts from JMP at the back of your write-up). You can get help in the Stat 201 Lab with specific questions about the project, but you can NOT ask a Stat 201 Lab worker to read your entire project for suggestions on what to change. Your finished work must be submitted within Canvas (see “Assignments”) as a Microsoft Word document (.doc or .docx).

JMP and Hodges Library computers: Using JMP installed on your own computer is much simpler than using JMP on a library computer! If you choose to use a computer in the library to do your project, be sure to first read the document “Using JMP in the Library”, found in MyLab under the Project Files tab. Also, you will need to save your project and your random sample subset file to a location you can access later, such as a memory stick. You could also e-mail these files to yourself for later use.

Writing a Good STAT 201 Project Report: Note that page 13 of this document, titled “Writing a Good Stat 201 Project Report”, contains a series of guidelines for the written part of your report. A portion of your grade (8%) is based on following these guidelines.


Project Questions

Question One

1. (5 points) Using the JMP data file provided, have JMP select a random sample of size 600 plus the last 2 digits of your student ID number. Report the size of your sample. Scroll to the bottom of your subset data file, and take a screen shot of the far left-hand portion of your file that includes the first two columns and at least the last 20 rows. Include this screenshot in your report. See page 7 of the project instructions for an example. Save this JMP file, and use it for all remaining questions.
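JMP performs the sampling for you. Purely to illustrate what a simple random sample means (every subset of the chosen size is equally likely, and no row is drawn twice), here is a minimal Python sketch. The row count of 1308 comes from the instructions; the function name `draw_sample_rows` and the seed are hypothetical choices made so the draw is reproducible.

```python
import random
from typing import List, Optional

TOTAL_ROWS = 1308  # responses in the full project file


def draw_sample_rows(n: int, seed: Optional[int] = None) -> List[int]:
    """Return n distinct row numbers (1-based), chosen uniformly at random."""
    rng = random.Random(seed)
    # random.sample draws without replacement, so no row appears twice
    rows = rng.sample(range(1, TOTAL_ROWS + 1), n)
    return sorted(rows)


# Sample size 621, as in the Jane Q. Student example (ID ending in 21)
rows = draw_sample_rows(621, seed=2019)
print(len(rows), rows[:5])
```

Sampling without replacement is the key design choice: drawing with replacement could include the same survey response twice.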

NOTE - For the remainder of this project, you will explore a qualitative (categorical) variable that you are interested in. You will use the “Decision Tree” platform to quickly identify key variables that predict your variable. You will then perform a regression analysis with a quantitative variable you choose to predict and a second quantitative variable that is correlated with it.

Question Two

2. Pick a two-level qualitative (categorical) variable in the survey that you are interested in exploring. Qualitative variables are identified in JMP with a red histogram in the far-left display of the column names. The full survey questions can be found on pages 9-11 of this document.

a. (8 points) Report the variable name that you have selected, along with the complete question text (from pages 9-11) for this variable. Tell us WHY you selected this variable. What about predicting/exploring this variable is interesting?

b. (4 points) Make a bar chart of this variable. It must be horizontally oriented and have a count axis. Include this chart, along with the frequency statistics, in your report.


Question Three – Decision Tree Prep

3. (3 points) When doing a Decision Tree analysis, we mentioned that identifier variables and categorical variables with many different levels should not be used. Which variables in the survey data should you not use in a Decision Tree analysis? (Hint: Is there an “Identifier Variable” in this file? Also, place ALL categorical variables in “Y, Columns” in the Analyze > Distribution platform, and look for variables with many levels. This output will be very large: DO NOT include it in your report!)
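In JMP you answer this question by inspecting the Distribution output. As an illustration of the same idea outside JMP, the minimal Python sketch below counts the distinct levels of each categorical column and flags identifier-like columns (a different value in every row) and columns with many levels. The tiny data set, the column names (including `ResponseID`), and the cutoff of 10 levels are all hypothetical; the project does not specify a cutoff.

```python
def flag_columns(rows: list, max_levels: int = 10) -> dict:
    """Classify each column as 'identifier', 'too many levels', or 'ok'."""
    flags = {}
    n_rows = len(rows)
    for col in rows[0]:
        levels = {row[col] for row in rows}  # distinct values in this column
        if len(levels) == n_rows:
            flags[col] = "identifier"  # every row has its own value
        elif len(levels) > max_levels:
            flags[col] = "too many levels"
        else:
            flags[col] = "ok"
    return flags


# Hypothetical 12-row data set
majors = ["Biology", "Business", "Chemistry", "English", "Finance", "History",
          "Math", "Nursing", "Physics", "Psychology", "Statistics", "Business"]
survey = [
    {"ResponseID": i, "Major": m, "Q1 Gender": "Female" if i % 2 else "Male"}
    for i, m in enumerate(majors, start=1)
]
print(flag_columns(survey))
```

A column with a different value in every row cannot generalize to new respondents, and a column with many levels splits the data into tiny groups; that is why both kinds are excluded from the tree.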

Question Four – Decision Tree

4. Next you will use the “Partition” (i.e., “Decision Tree”) technique in JMP. Detailed instructions for how to produce the necessary output are provided on page 6 or you can watch the following short YouTube video (Click Here).

a. (6 points) Place the variable you selected in question 2a in the “Y, Response” box. Place all other variables in the “X, Factor” box. Be sure to remove your Y variable from the “X, Factor” box, as well as the variables you identified in question 3. Produce three (3) “splits”. Select “Leaf Report” and “Show Fit Details” from the red dropdown menu. Include all the Decision Tree output in your report (use multiple screen shots if you can’t fit all of JMP’s output on your screen at one time).

b. (6 points) Explain the significance and interpretation of the variable that appears in the first split. Use the full name of the variables in your write-up. Your interpretation should include the relationship between your Y variable and the X variable in the first split.

c. (6 points) Find the largest number in your “Response Counts” Leaf Label output. Report that number and explain it in context of the problem. If there is a tie, you can interpret either one.

d. (8 points) Locate the Confusion Matrix in your output. Using the table below to guide your description, comment on the number in each quadrant of the matrix. State the number and what it means in the context of the problem.

A B

C D

For example, state that the value for box “A” in my matrix is xx and this means ………

e. (5 points) Interpret the R² of your tree in context.

f. (3 points) Would you use this Decision Tree to predict your variable? Why or why not?
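The four quadrants of the Confusion Matrix in question 4d can be illustrated outside JMP. The minimal Python sketch below assumes the matrix is laid out with actual levels as rows and predicted levels as columns (check the row and column labels in your own JMP output). The response levels “Yes”/“No” and the eight observations are hypothetical.

```python
from collections import Counter


def confusion_quadrants(actual, predicted, levels):
    """Count the four cells of a 2x2 confusion matrix.

    Rows are actual levels and columns are predicted levels, so
    A = (levels[0], levels[0]), B = (levels[0], levels[1]),
    C = (levels[1], levels[0]), D = (levels[1], levels[1]).
    """
    counts = Counter(zip(actual, predicted))
    first, second = levels
    return {
        "A": counts[(first, first)],    # actual first, predicted first (correct)
        "B": counts[(first, second)],   # actual first, predicted second (error)
        "C": counts[(second, first)],   # actual second, predicted first (error)
        "D": counts[(second, second)],  # actual second, predicted second (correct)
    }


actual = ["Yes", "Yes", "Yes", "No", "No", "No", "No", "Yes"]
predicted = ["Yes", "No", "Yes", "No", "Yes", "No", "No", "Yes"]
cells = confusion_quadrants(actual, predicted, levels=("Yes", "No"))
misclassified = (cells["B"] + cells["C"]) / len(actual)
print(cells, misclassified)
```

Under this layout, quadrants A and D count correct predictions, while B and C count the two kinds of errors. Their total divided by the number of observations is the misclassification rate.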


Question Five – Regression

5. You will now do a regression analysis. Pick a quantitative variable, Y, that you would be interested in predicting. Quantitative variables are identified in JMP with a blue triangle in the far-left display of the column names. Pick 5 other quantitative variables that you believe may be correlated with your Y variable.

a. (6 points) Report the Y variable name that you have selected, along with the complete question text (from pages 9-11) for this variable. Tell us WHY you selected this variable. What about predicting/exploring this variable is interesting?

b. (2 points) Produce a correlation matrix of your Y and the five X variables using JMP’s Analyze > Multivariate Methods > Multivariate platform. Include this matrix in your report.

c. (5 points) Choose the X variable in your matrix with the strongest correlation (largest in absolute value) to your Y variable, and use only your Y variable and that one X. If you colored points during the decision tree analysis, clear the row states first. Produce a scatterplot of those two variables using JMP’s Analyze > Fit Y by X platform. Examine your scatterplot for potential outliers, then select the points you identified as outliers and color them red.

d. (7 points) Fit a least-squares regression line to your scatterplot, and include the scatterplot with the line, and all resulting output in your report.

e. (4 points) Report the value of RSquare. Interpret this value: don’t comment on the magnitude of this number, tell the reader what this number means in context.

f. (4 points) Is the linear relationship “statistically significant”? How do you know?

g. (6 points) What is the direction of the association in the regression analysis between your Y and X variables? What numerical value in your output indicates the direction of the association? Briefly interpret this value in the context of these data.
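JMP computes all of these quantities for you. To make the connections between parts b through g concrete, the minimal Python sketch below picks the X with the largest absolute correlation, fits the least-squares line ŷ = b0 + b1·x, and computes RSquare as 1 − SSE/SST, which for simple linear regression with an intercept equals r². The sign of the slope b1 (always the same as the sign of r) gives the direction of the association. The variable names and numbers are hypothetical. The sketch does not compute the p-value for question 5f; read that from JMP’s output.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def least_squares(x, y):
    """Return (intercept, slope, r_square) for the line y-hat = b0 + b1*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = (sum((a - mx) * (b - my) for a, b in zip(x, y))
          / sum((a - mx) ** 2 for a in x))
    b0 = my - b1 * mx
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))  # unexplained variation
    sst = sum((b - my) ** 2 for b in y)                         # total variation in Y
    return b0, b1, 1 - sse / sst


# Hypothetical data: Y and two candidate X variables for five students
y = [6.0, 7.5, 5.0, 8.0, 7.0]
candidates = {
    "X1": [2.0, 3.0, 1.0, 4.0, 3.5],
    "X2": [10.0, 4.0, 7.0, 3.0, 9.0],
}

# Strongest correlation = largest absolute value of r
best = max(candidates, key=lambda name: abs(pearson_r(candidates[name], y)))
b0, b1, r_square = least_squares(candidates[best], y)
print(best, round(b0, 3), round(b1, 3), round(r_square, 3))
```

Here X2 is negatively correlated with Y, but X1 is chosen because its correlation is larger in absolute value; its positive slope indicates a positive association.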

Additional point values:

Project organization and flow (4 points)

Projects should look neat and organized. Use the crop tool in Word if you need to improve screenshots. Your project should read like a report, without restating the prompt for each question.

Use of the guidelines on page 13 (8 points)

The opening paragraph of the project should give a short summary (3-5 sentences) of the analysis the reader is about to read. The closing paragraph should summarize interesting findings and discuss any ideas you have for further data collection and/or analysis.


QUESTION HELP SECTION – Clickable Video Links Provided

Question   Heading                                  Tutorial                          Video Link
1          Miscellaneous Topics                     Taking a Simple Random Sample     CLICK HERE – Taking a Random Sample
2b         Graphical Display of Categorical Data    Bar Charts                        CLICK HERE – Bar Charts
4a         --                                       --                                CLICK HERE – Decision Trees
5a         Correlation and Regression               Scatterplot                       CLICK HERE – Scatterplots & Regression
5b         Scatterplot Matrix                       Scatterplot Matrix                CLICK HERE – Scatterplot Matrix
5b         Miscellaneous Topics                     Excluding Data from an Analysis   CLICK HERE – Excluding Data


Producing the “Decision Tree” Output

1. Analyze -> Predictive Modeling -> Partition.

2. Click on your qualitative variable, then click “Y,Response”.

3. Highlight ALL the variables on the left, and click “X,Factor”. In the box to the right of “X,Factor”, scroll to the variable you placed in “Y,Response”, click it and then click Remove. Also remove the variables you identified in question 3. Click OK.

4. Click Split 3 times (or more, as described in question 4[a]). Take a screen shot of all resulting output and place it in your report.

This output may be large. It is suggested you use one page, or more if needed, to show the output in your report. See the example on page 8.


EXAMPLE FORMAT – Showing Question 1

STAT 201 Project #2 – Spring 2019 – Learning about Spring 2019 Stat 201 Students

Submitted by Jane Q. Student

Note: Your first three to five sentences should address item #1 on page 13 of this document.

1. Since the last two digits of my student ID number are 21, I took a random sample of size 621.


EXAMPLE FORMAT – Showing Decision Tree Output

Note: We suggest you use one full page of your report to clearly show the decision tree. Not all decision trees will be this big, but make sure yours is large enough to be read. The following decision tree was made using Fall 2016 data and cannot be replicated with Spring 2019 data. Your decision tree should follow the guidelines in the project regarding the number of splits.


STAT 201 SURVEY – SPRING 2019
FOR REFERENCE ONLY - FULL TEXT OF QUESTIONS ASKED

Q0 Which section of Stat 201 are you in?
Q1 What is your gender?
Q2 How old are you (in years)?
Q3 Were you born in Tennessee?
Q4 What is your relationship status?
Q5 How far do you live from campus?
Q6 What was your high school GPA?
Q7 Are you a member of a fraternity or sorority?
Q8 Are you an only child, oldest child, middle child or youngest child? Pick one answer that best describes your birth order.
Q09 Have you ever broken a bone?
Q10 Do you have a Roth IRA?
Q11 How many pets do you own?
Q12 Estimate how much you spend on your pets per year; include veterinary expenses, food, toys, treats, grooming, etc. Enter zero if you have no pets.
Q13 How many hours per week do you spend reading assignments from textbooks that your instructors assign? Include all classes, not just Stat 201.
Q14 What is your major?
Q15 How many credit hours are you taking this semester?
Q16 How would you identify the economic level of your immediate family?
Q17 Are you in the honors program at UT?
Q18 What do you expect your starting annual salary (in US dollars) to be when you obtain a college degree?
Q19 Do you (or your parents) plan on, or have you (or your parents) already, taken out student loans to pay for your college expenses?
Q20 How many hours a week do you currently work at a job? If you are not employed, please put 0.
Q21 How many languages can you speak fluently? This includes your native language. This is the language you first learned.
Q22 How would you classify your views on economic political issues?
Q23 How would you classify your views on social political issues?
Q24 What should happen to Confederate statues?
Q25 Should the United States stop making pennies? (Eliminate the penny)
Q26 Will humans step foot on Mars?
Q27 Select the option below that completes the following sentence in a way that best describes your opinion. "Climate change on Earth:"
Q28 Which of these do you believe to be closest to the truth regarding life on Earth?
Q29 Have you ever smoked marijuana?
Q30 Do you think marijuana should be legalized at the federal level (for the whole US)?
Q31 Should states have the ability to regulate what couples can marry? (i.e., defining marriage as only one man and one woman)
Q32 Age when you had your first alcoholic beverage. IMPORTANT - Don't count sips or communion. This should be an actual drink of alcohol.


Q33 When you eat out at a restaurant that involves a waitress or waiter, what percent do you usually tip? Enter response as a whole number with no decimals for a percentage from 0 to 100.
Q34 Which statement best describes your behavior when you drink water on campus?
Q35 What is the most you've paid for a single coffee-based drink? This includes any size, additions and tips given for the drink.
Q36 Do you use any tobacco products?
Q37 Is vaping a safer alternative to smoking?
Q38 How many times have you cheated in college? This includes looking at another test during an exam, taking another student's work and presenting it as your own, and other forms of academic dishonesty.
Q39 How many times did you cheat in high school? This includes looking at another test during an exam, taking another student's work and presenting it as your own, and other forms of academic dishonesty.
Q40 On an average night, how many hours of sleep during the school year do you usually get?
Q41 What is the longest number of consecutive hours you've stayed awake?
Q42 Have you ever been arrested?
Q43 Approximately how many text messages do you send a week?
Q44 Approximately how many text messages do you send on the weekend?
Q45 On a typical school day last semester, approximately how many text messages would you send during class (while you were attending class)?
Q46 What is your favorite app on your phone?
Q47 What percentage of your income do you believe you should save in your 20s? Enter response as a whole number with no decimals for a percentage from 0 to 100.
Q48 Have you ever purchased perishable food items online?
Q49 In the past 6 months, have you purchased a product based on a TV commercial?
Q50 How many online purchases (not counting music downloads) have you made in the last week?
Q51 How often do you use coupons when you shop (not including online shopping)?
Q52 Roughly, how many selfies have you posted on social media in the past month?
Q53 How much are you willing to pay to see your favorite musician in concert?
Q54 What do you think of Kanye West as an individual?
Q55 What do you think of Stephen Colbert as an individual?
Q56 Have you ever watched most or all of a live sporting event on a smart phone or tablet?
Q57 Have you ever gone to a physical store to check out the features of a product, with the intention of purchasing the item online later?
Q58 How do you usually listen to music?
Q59 Have you ever reviewed a business or product on social media (e.g., Twitter, Facebook, etc.)?
Q60 Are you "friends" on Facebook with one or more of your parents (include step-parents in your answer)?


Q61 Approximately how many friends do you have on Facebook? If you don't have a Facebook account, answer 0 for this question.
Q62 Approximately how many people have you defriended on Facebook? If you don't have a Facebook account, answer 0 for this question.
Q63 In a typical week last semester, how much time (in hours) did you spend on "social media"? Include both time reading social media and time communicating with social media.
Q64 The Tennessee Vols are scheduled to play 12 football games in the 2019 season (i.e., not including the SEC championship game, and not including a bowl game). Number of games won reported
Q65 Did you lie at any point on this survey?


Writing a Good STAT 201 Project Report

Writing a report to your boss about a statistical analysis they have asked you to do is very different from writing a novel, or writing to your Statistics instructor. What does it take to write a good project report? Of course, it’s important to know your audience when you write anything.

Let’s assume you are writing your project report for some busy executives in the company, and they have asked you to answer the questions in the project. They are very intelligent people, but they are not “Statisticians”. Assume that these executives have had some basic statistical education, but perhaps a long time ago. Keep this in mind as you complete your project.

Below are some guidelines for writing an effective project report:

1. The first sentence or two of your report should “orient” the reader. What is this document about? Who is it from? What will you be covering? On what date did you complete the analysis? How were the data obtained?

2. Answer each question on the project instructions using correct sentence structure, spelling and grammar. Sentences should be succinct and clear. You can assume the executives have a copy of the questions they asked.

3. Avoid using "statistical jargon". Explain the results of the analysis in a way that the executives can understand.

4. As explained in the project instructions, graphics from JMP and/or Excel that address the project question must be embedded within the document, at the point where the executives need to see them. Don’t make them hunt for the output at the back of your report.

5. Avoid including discussion and/or graphics within the report that have no relevance to the question being addressed.

6. Wrap-up and sign off: Give two to three sentences that showcase your meaningful findings and ideas regarding further data collection or analysis. The wrap-up should contain something meaningful related to the report you wrote and should not just restate results.
