Application

Now that you have completed all of the activities in this chapter, use the concepts and techniques that you've learned to respond to these questions. Notice that these problems are continuations of the corresponding scenarios and questions posed at the end of Chapter 13. Each problem begins with a residual analysis to assess the conditions for inference.
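For readers who want to check the arithmetic behind a residual analysis outside JMP, the following is a minimal sketch in plain Python. It fits a one-factor least-squares line, computes the residuals, and forms a confidence interval for the slope. The data values and the function names (`ols_fit`, `residuals`, `slope_interval`) are hypothetical illustrations and do not come from any of the book's data tables. The critical value 2.306 is the 97.5th percentile of the t distribution with 8 degrees of freedom, which matches the 10 made-up observations used here.

```python
import math


def ols_fit(x, y):
    """Return the least-squares (slope, intercept) for a single factor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def residuals(x, y, slope, intercept):
    """Observed response minus fitted response for each observation."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def slope_interval(x, resid, slope, t_crit):
    """Confidence interval for the slope, given a t critical value."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    mse = sum(e ** 2 for e in resid) / (n - 2)  # residual variance estimate
    se_slope = math.sqrt(mse / sxx)
    return slope - t_crit * se_slope, slope + t_crit * se_slope


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]

T_CRIT_DF8 = 2.306  # t(0.975) with n - 2 = 8 degrees of freedom

slope, intercept = ols_fit(x, y)
resid = residuals(x, y, slope, intercept)
lo, hi = slope_interval(x, resid, slope, T_CRIT_DF8)

print(f"slope = {slope:.4f}, intercept = {intercept:.4f}")
print(f"95% CI for slope: ({lo:.4f}, {hi:.4f})")
```

The interval is only meaningful if the residuals look consistent with the OLS conditions, so in practice you would plot `resid` against the fitted values and inspect its distribution before interpreting `lo` and `hi`.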

  1. Scenario: Return to the NHANES SRS data table.

    1. Exclude and hide respondents under age 18 and all males, leaving only adult females. Perform a regression analysis for BMI and waist circumference for adult women, evaluate the OLS conditions for inference, and report your findings and conclusions.

    2. If you can safely use this model, provide the 95% confidence interval for the slope and explain what it tells us about BMI for adult women.

    3. If you can safely use this model, use the linear fit graph to read off an approximate confidence interval for the mean BMI for women whose waist measurements are 68 cm.

  2. Scenario: High blood pressure continues to be a leading health problem in the United States. In this problem, continue to use the NHANES SRS table. For this analysis, we'll focus on just the following variables:

    • RIAGENDR: respondent's gender

    • RIDAGEYR: respondent's age in years

    • BMXWT: respondent's weight in kilograms

    • BPXPLS: respondent's resting pulse rate

    • BPXSY1: respondent's systolic blood pressure ("top" number in BP reading)

    • BPXD1: respondent's diastolic blood pressure ("bottom" number in BP reading)

    1. Perform a regression analysis with systolic BP as the response and age as the factor. Analyze the residuals and report on your findings.

    2. Perform a regression analysis of systolic and diastolic blood pressure and then evaluate the residuals. Explain fully what you have found.

    3. Create a scatterplot of systolic blood pressure and pulse rate. One might suspect that higher pulse rate is associated with higher blood pressure. Does the analysis bear out this suspicion?

  3. Scenario: We'll continue to examine the World Development Indicators data in BirthRate 2005. We'll broaden our analysis to work with other variables in that file:

    • MortUnder5: deaths, children under 5 years per 1,000 live births

    • MortInfant: deaths, infants per 1,000 live births

    1. Create a scatterplot for MortUnder5 and MortInfant. Run the regression and explain what a residual analysis tells you about this sample.

  4. Scenario: How do the prices of used cars vary according to the mileage of the cars? Our data table Used Cars contains observational data about the listed prices of three popular compact car models in three different metropolitan areas in the U.S. All of the cars are two years old.

    1. Create a scatterplot of price versus mileage. Run the regression, analyze the residuals, and report your conclusions.

    2. If it is safe to draw inferences, provide a 95% confidence interval estimate of the amount by which the price falls for each additional mile driven.

    3. You see an advertisement for a used car that has been driven 35,000 miles. Use the model and the scatterplot to provide an approximate 95% interval estimate for an asking price for this individual car.

  5. Scenario: Stock market analysts are always on the lookout for profitable opportunities and for signs of weakness in publicly traded stocks. Market analysts make extensive use of regression models in their work, and one of the simplest ones is known as the Random, or Drunkard's, Walk model. Simply put, the model hypothesizes that over a relatively short period of time the price of a particular share of stock will be a random deviation from the prior day. If Y_t represents the price at time t, then Y_t = Y_{t-1} + ε. In this problem, you'll fit a random walk model to daily closing prices for McDonald's Corporation for the first six months of 2009 and decide how well the random walk model fits. The data table is called MCD.

    1. Create a scatterplot with the daily closing price on the vertical axis and the prior day's closing price on the horizontal axis. Comment on what you see in this graph.

    2. Fit a line to the scatterplot, and evaluate the residuals. Report on your findings.

    3. If it is safe to draw inferences, is it plausible that the slope and intercept are those predicted in the random walk model?

  6. Scenario: Franz Joseph Haydn was a successful and well-established composer when the young Mozart burst upon the cultural scene. Haydn wrote more than twice as many piano sonatas as Mozart. Use the data table Haydn to perform a parallel analysis to the one we did for Mozart.

    1. Evaluate the residuals from the regression fit using Parta as the response variable.

    2. Compare the Haydn data residuals to the corresponding residual graphs using the Mozart data; explain your findings.

  7. Scenario: Throughout the animal kingdom, animals require sleep, and there is extensive variation in the number of hours in the day that different animals sleep. The data table called Sleeping Animals contains information for more than 60 mammalian species, including the average number of hours per day of total sleep. This will be the response column in this problem.

    1. Estimate a linear regression model using gestation as the factor. Gestation is the mean number of days that females of these species carry their young before giving birth. Assess the conditions using residual graphs and report on your conclusions.

  8. Scenario: For many years, it has been understood that tobacco use leads to health problems related to the heart and lungs. The Tobacco Use data table contains recent data about the prevalence of tobacco use and of certain diseases around the world.

    1. Using Cancer Mortality (CancerMort) as the response variable and the prevalence of tobacco use in both sexes (TobaccoUse) as the factor, run a regression analysis and examine the residuals. Should we use this model to draw inferences? Explain.

    2. Using Cardiovascular Mortality (CVMort) as the response variable and the prevalence of tobacco use in both sexes (TobaccoUse) as the factor, run a regression analysis and examine the residuals. Should we use this model to draw inferences? Explain.

  9. Scenario: In Chapter 2 our first illustration of experimental data involved a study of the compressive strength of concrete. In this scenario, we look at a set of observations all taken at 28 days (4 weeks) after the concrete was initially formulated; the data table is called Concrete 28. The response variable is the Compressive Strength column, and we'll examine the relationship between that variable and two candidate factor variables.

    1. Use Cement as the factor, run a regression analysis, and evaluate the residuals. Report on your findings in detail.

    2. Use Water as the factor, run a regression analysis, and evaluate the residuals. Report on your findings in detail.

  10. Scenario: Prof. Frank Anscombe of Yale University created an artificial data set to illustrate the hazards of applying linear regression analysis without looking at a scatterplot (Anscombe 1973). His work has been very influential, and JMP includes his illustration among the sample data tables packaged with the program. You'll find Anscombe both in this book's data tables and in the JMP sample data tables. Open it now.

    1. In the upper-left panel of the data table, you'll see a red triangle next to the words The Quartet. Click on the triangle, and select Run Script. This produces four regression analyses corresponding to four pairs of response and predictor variables. Evaluate the residuals for all four regressions and report on what you find.

    2. For which of the four sets of data (if any) would you recommend drawing inferences based on the OLS regressions? Explain your thinking.
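As a rough cross-check on the Anscombe problem outside JMP, the sketch below fits a least-squares line to each of the four pairs in plain Python. The values are the quartet as published in Anscombe (1973). It is an assumption that they match the columns of the JMP table exactly, and the names `ols_fit` and `quartet` are hypothetical. The sketch prints only the fitted coefficients. It does not evaluate the residuals, which is the part of the exercise that distinguishes the four cases.

```python
def ols_fit(x, y):
    """Return the least-squares (slope, intercept) for a single predictor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Anscombe's quartet; the first three sets share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

quartet = {
    "set 1": (x123, y1),
    "set 2": (x123, y2),
    "set 3": (x123, y3),
    "set 4": (x4, y4),
}

# Fit each pair separately and collect (slope, intercept).
fits = {name: ols_fit(x, y) for name, (x, y) in quartet.items()}

for name, (slope, intercept) in fits.items():
    print(f"{name}: slope = {slope:.3f}, intercept = {intercept:.3f}")
```

Comparing the printed coefficients across the four sets, and then comparing the four scatterplots and residual plots, shows why fitted coefficients alone are not enough to judge whether inference is appropriate.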


  
