The file BestBSchools.jmp contains data on post-MBA salaries for graduates of 60 business schools. The following variables are available for analysis:
| Variable Name | Description |
|---|---|
| 5 year salary gain ($thousand) | Total compensation after graduation, less tuition and forgone compensation. The five-year gain is before taxes and adjusted for the time value of money. |
| Years to payback | Payback on MBA tuition. |
| Pre-MBA salary | In thousands of dollars. |
| Post-MBA salary | In thousands of dollars. |
| Tuition | Total out-of-state tuition. |
Use the All Possible Models method to suggest candidate regression models that are good predictors of the 5-year salary gain.
Analyze the candidate models and select the model that best predicts 5-year salary gain. Write a paragraph providing justification for the model you selected.
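For readers who want to see what an all-possible-models search does behind the JMP dialog, the following is a minimal Python sketch, not JMP's implementation. It fits every non-empty subset of the predictors by ordinary least squares and ranks the subsets by adjusted R². The column names (`x1`, `x2`, `x3`, `y`) and the data values are invented for illustration and are not taken from BestBSchools.jmp; the helper names `solve`, `fit_ols`, and `all_possible_models` are likewise hypothetical.

```python
from itertools import combinations


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Assumes the matrix is non-singular (no perfectly collinear predictors).
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(columns, y):
    """Least-squares fit with an intercept.

    Returns (coefficients, R-squared, adjusted R-squared).
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])  # number of parameters, intercept included
    xtx = [[sum(X[i][j] * X[i][k] for i in range(n)) for k in range(p)]
           for j in range(p)]
    xty = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    ybar = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - ybar) ** 2 for yi in y)
    r2 = 1 - sse / sst
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p)
    return beta, r2, adj_r2


def all_possible_models(data, response):
    """Fit every non-empty predictor subset; best adjusted R-squared first."""
    names = [k for k in data if k != response]
    results = []
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            _, r2, adj_r2 = fit_ols([data[v] for v in subset], data[response])
            results.append((subset, r2, adj_r2))
    return sorted(results, key=lambda r: r[2], reverse=True)


# Invented illustration data (not from the exercise file).
demo = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x2": [2, 1, 4, 3, 6, 5, 8, 7],
    "x3": [5, 3, 8, 1, 9, 2, 7, 4],
    "y": [0.1, 4.8, 3.15, 12.95, 8.1, 17.9, 16.05, 21.95],
}
for subset, r2, adj_r2 in all_possible_models(demo, "y")[:3]:
    print(subset, round(r2, 4), round(adj_r2, 4))
```

Adjusted R² is only one possible ranking criterion. Because R² never decreases when a term is added, comparing models of different sizes needs a measure that penalizes extra terms, which is why the sketch sorts on the adjusted value.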
A college admissions office is seeking to increase the number of students who remain at the school after their freshman year. The admissions files and transcripts for thirty randomly selected sophomores have been obtained. Those factors believed to be good predictors of freshman grade point average (GPA) have been entered into the file freshman_gpa.jmp. The definitions of the variables are as follows:
| Variable Name | Description |
|---|---|
| High School GPA | High school grade point average on a four-point scale. |
| SAT Critical Reading | Score on the Critical Reading component of the SAT Reasoning Test. Scores can range from 200 to 800. |
| SAT Math | Score on the Math component of the SAT Reasoning Test. Scores can range from 200 to 800. |
| SAT Writing | Score on the Writing component of the SAT Reasoning Test. Scores can range from 200 to 800. |
| Motivation | Rated on a scale from 0-100 using a questionnaire. |
| School Type | Public or private school. |
| Freshman GPA | Grade point average on a four-point scale for the freshman year of college. |
Use the data to build a multiple regression model that predicts freshman grade point average. Give the regression equation and an assessment of the quality of the model. Discuss the method used to create the model (Stepwise, All Possible Models) and why you chose that method.
A regional economic development commission is investigating the effect of professional sports teams on local economies. The file professional_sports.jmp contains information on cities that have both Major League Baseball (MLB) and National Football League (NFL) franchises. You will find the variable definitions in the Notes option found in the Column Properties menu in the Column Information dialog box.
Use stepwise regression to find candidate models to predict Major League Baseball attendance. Compare the Forward, Backward, and Mixed stepwise regression results.
Select a final model for predicting MLB attendance.
Use stepwise regression to find candidate models to predict National Football League attendance. Compare the Forward, Backward, and Mixed stepwise regression results.
Select a final model for predicting NFL attendance.
Discuss the differences in the final models to predict MLB and NFL attendance.
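The logic of forward stepwise selection can be sketched in a few lines of Python. This is an illustration under stated assumptions, not JMP's algorithm: it uses a classical F-to-enter cutoff (default 4.0, an assumed value) in place of JMP's Prob to Enter threshold, it only adds terms and never removes them, and the data and the names `forward_select`, `remove_component`, and `dot` are invented for the example. At each step the candidate that most reduces the error sum of squares is entered if its partial F statistic clears the cutoff.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def remove_component(v, basis):
    """Return v minus its projection onto the vector `basis`."""
    scale = dot(v, basis) / dot(basis, basis)
    return [a - scale * b for a, b in zip(v, basis)]


def forward_select(data, response, f_to_enter=4.0):
    """Forward selection; returns [(variable, partial F at entry), ...]."""
    n = len(data[response])
    ones = [1.0] * n
    # Removing the intercept's component is the same as centering.
    resid = remove_component(data[response], ones)
    cands = {name: remove_component(col, ones)
             for name, col in data.items() if name != response}
    sst = dot(resid, resid)
    selected = []
    while cands:
        sse = dot(resid, resid)
        # Drop in SSE from adding each candidate. Each candidate has
        # already been orthogonalized against the terms in the model.
        gains = {name: dot(z, resid) ** 2 / dot(z, z)
                 for name, z in cands.items() if dot(z, z) > 1e-12}
        if not gains:
            break  # remaining candidates are collinear with the model
        best = max(gains, key=gains.get)
        df_error = n - (len(selected) + 2)  # intercept + old terms + new term
        if df_error <= 0:
            break
        new_sse = sse - gains[best]
        if new_sse > 1e-12 * sst:
            f_stat = gains[best] / (new_sse / df_error)
        else:
            f_stat = float("inf")  # essentially perfect fit
        if f_stat < f_to_enter:
            break
        z = cands.pop(best)
        selected.append((best, f_stat))
        resid = remove_component(resid, z)
        cands = {name: remove_component(col, z) for name, col in cands.items()}
    return selected


# Invented illustration data (not from professional_sports.jmp).
demo = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x2": [2, 1, 4, 3, 6, 5, 8, 7],
    "x3": [5, 3, 8, 1, 9, 2, 7, 4],
    "y": [5.1, 7.8, 11.15, 13.95, 17.1, 19.9, 23.05, 25.95],
}
for name, f_stat in forward_select(demo, "y"):
    print(name, round(f_stat, 1))
```

Backward elimination starts from the full model and removes the weakest term at each step, and a mixed procedure alternates entry and removal checks, so the three directions can end at different models when predictors are correlated.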
The file Canton_homes.jmp contains realtor-supplied selling prices and 10 other characteristics of homes, such as square footage, lot size, and so on, for a sample of 28 homes in the Canton, Ohio area. Use these data to develop a multiple regression model.
Transform any of the continuous variables as needed (e.g., Year Built).
Compare the models from Forward, Backward, and Mixed stepwise regression.
Change the Prob to Leave and Prob to Enter values and note the effect.
How do the stepwise regression models compare with those suggested by the All Possible Models method?
Use one of the stepwise models as a starting point to develop a final model. Which variables are most important in predicting house price?
Evaluate the precision associated with this model. Is the model sufficiently precise to be of use to a real estate agent or buyer? If not, what additional actions would you take to improve the model?
A food scientist is analyzing the nutritional value of breakfast cereals and would like to develop a predictive model for calories. The data on 76 different cereals with calories as the Y-variable and 12 X-variables are in Cereal_Calories.jmp.
Obtain the pairwise correlations and scatterplots for all continuous variables. Mark or label any outliers you identify on the scatterplots. Are there highly correlated X-variables?
Develop a good multiple regression model for predicting calories.
Assess the goodness-of-fit.
Analyze the residuals. Do any of the outliers identified from the scatterplots appear as outliers in the residual plots? Exclude the outliers and rerun the regression to determine their influence on the model. Should these outliers be removed from the model? Explain.
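The screening for highly correlated X-variables can also be sketched outside JMP. The Python below is a minimal illustration, assuming a cutoff of |r| ≥ 0.8 for "highly correlated" (a judgment call, not a rule from the text). The column names and values are invented and do not come from Cereal_Calories.jmp, and `pearson` and `high_correlations` are hypothetical helper names.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def high_correlations(data, threshold=0.8):
    """List variable pairs with |r| >= threshold, strongest first."""
    pairs = []
    for a, b in combinations(data, 2):
        r = pearson(data[a], data[b])
        if abs(r) >= threshold:
            pairs.append((a, b, r))
    return sorted(pairs, key=lambda t: abs(t[2]), reverse=True)


# Invented illustration columns (names and values are hypothetical).
demo = {
    "Sugars": [6, 8, 5, 0, 8, 10, 14, 8],
    "Carbs": [5, 8, 7, 8, 14, 10.5, 11, 18],
    "Fiber": [10, 2, 9, 14, 1, 1.5, 1, 2],
}
for a, b, r in high_correlations(demo, threshold=0.8):
    print(a, b, round(r, 3))
```

Pairs flagged this way are candidates for multicollinearity; whether to drop one of the two variables still depends on how each behaves in the fitted model.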
One of the criteria a rental car company uses to select new cars for its fleet is average annual fuel cost. The data in fuel economy.jmp were obtained from the US Department of Energy Web site. You will find the variable definitions in the Notes option found in the Column Properties menu in the Column Information dialog box. Determine those factors that affect the average annual fuel cost by building a multiple regression model. Consider creating interaction terms to improve the model. Assess the goodness-of-fit of the model and the impact of any outliers. What additional factors not in the data set might potentially improve the model? How could the model assist in deciding which car models the rental company should purchase?
An office manager is preparing next year's budget for computer equipment based on employee requests for replacement. Data for different types of computers have been collected from each of two authorized vendors and can be found in new_computer_purchases.jmp. The following variables are available for predicting the price of a new computer:
| Variable Name | Definition |
|---|---|
| RAM | Amount of random access memory in gigabytes. |
| HD Capacity | Amount of hard drive storage in gigabytes. |
| Processor | One of two models: GX2 or GX3. |
| Type | Laptop or desktop. |
| Screen Width | Measured in inches. |
| Vendor | OfficePlus or Clips. |
| Price | U.S. dollars. |
Build a multiple regression model that can be used to predict the price of a computer. Give the regression equation and associated goodness-of-fit measures. Use your model to answer the following questions.
Is one vendor preferable to the other for purchasing new computers?
What factor is most important in determining the price of a new computer?
Department heads have been discouraging employees from replacing desktop computers with laptops on the premise that laptops are more expensive than desktops. Based on the available data, are the department heads justified in their actions?
Should HD Capacity be included in the multiple regression model? Explain why or why not.
A two-income family is evaluating childcare options in their area. The file child_care_centers.jmp contains data as published in a local business magazine. Note that one of the childcare centers is open for only 3¼ hours; this is an after-school program. The variables have the following definitions:
| Variable Name | Definition |
|---|---|
| Weekly Rate ($) | Cost of childcare for one child for one week, in dollars. |
| FTE Enrollees | The number of full-time equivalent children enrolled at the childcare center. |
| Licensed Capacity | The maximum number of children that can be enrolled at a childcare center. |
| FTE Staff | The number of full-time equivalent staff employed at the childcare center. |
| Hours open/day | The number of hours the childcare center is open each day. |
| Summer Program | Indicates whether a center offers special activities (weekly field trips, swimming lessons, etc.) during the summer. |
Build a multiple regression equation to predict the weekly rate (cost) of childcare. Decide whether to include the after-school program in your analysis.
Create two new variables, Hourly Rate and Enrollee to Staff Ratio, by adding two columns with appropriate formulas to your JMP data table.
Look for a better regression model using either Hourly Rate or Weekly Rate as the dependent variable. Consider the use of Enrollee to Staff Ratio as an X-variable in your model.
Briefly discuss the differences between the various models you constructed.
Select a final model. Write a brief summary presenting your final model. Discuss the quality of the model and justification for excluding any outliers. Explain your choice for the dependent variable. Is your final equation easy to interpret? What does this model reveal about the childcare market in this area? Are there additional variables that should be investigated?
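The two derived columns in this exercise are simple ratios, and the arithmetic can be sketched in Python as a check on the JMP formulas. The sketch assumes a five-day week when converting the weekly rate to an hourly rate; the data table does not state the number of days open, so `days_per_week` is an assumption to adjust as needed. The function name `add_derived_columns` and the example row values are invented.

```python
def add_derived_columns(row, days_per_week=5):
    """Return a copy of one data-table row with the two derived columns.

    days_per_week is an assumption; the source data give only hours per day.
    """
    out = dict(row)
    weekly_hours = row["Hours open/day"] * days_per_week
    out["Hourly Rate"] = row["Weekly Rate ($)"] / weekly_hours
    out["Enrollee to Staff Ratio"] = row["FTE Enrollees"] / row["FTE Staff"]
    return out


# Invented example row (values are not from child_care_centers.jmp).
center = {
    "Weekly Rate ($)": 200,
    "FTE Enrollees": 60,
    "Licensed Capacity": 75,
    "FTE Staff": 12,
    "Hours open/day": 10,
}
print(add_derived_columns(center))
```

Expressing cost per hour puts a short-day program, such as the after-school center, on the same scale as full-day centers, which is one reason to compare Hourly Rate with Weekly Rate as the dependent variable.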