14.7 Summary

  • Simple linear regression analysis is used with two continuous variables (X, Y). X is the independent variable, Y is the dependent variable.

  • The model, estimated from the data, is a straight line with slope b1 and intercept b0.

  • The estimated regression equation is used to give a summary of the data, to estimate and test the change in the Y-variable as a function of the X-variable, and to estimate means and predict Y values.

  • The regression equation does not fit the data perfectly, except in rare cases. Therefore, the adequacy of the fit of the equation needs to be assessed.

  • Inadequate model fit may result from outlier observations, from a straight-line model that does not adequately represent the relationship between Y and X, or from variability in Y that is too large for the observed range of X-values.

  • The estimated slope coefficient b1 is the estimated average change in Y per unit change in X.

  • The estimated intercept b0 is the estimated average of Y when X = 0. An interpretation of the intercept is practical only when a value of X = 0 makes sense and when X = 0 is within or near the range of the data.

  • The simple regression relationship is termed statistically significant if the null hypothesis H0: β1 = 0 can be rejected based on the p-value associated with b1.

  • Regression can be used for estimating the average of Y at a specific value of X, or for predicting a future value of Y at X, leading to confidence intervals and prediction intervals, respectively. For the same confidence level and X, prediction intervals are always wider than confidence intervals, because they also account for the variability of an individual observation around the mean.

  • Prediction intervals are very sensitive to deviations from the Normal distribution assumption.

  • Adequate fit can be assessed using RMSE, R², scatterplots, and residual plots.

  • A good way to gauge the effectiveness of the regression line is to compare the RMSE (standard deviation around the regression line) with the standard deviation around the overall mean of Y. An RMSE that is small relative to that standard deviation indicates a good fit.

  • RSquare (R²) is a widely used measure of fit. R² = 0 indicates no fit, whereas R² = 1 indicates a perfect linear fit.

  • In regression analysis, the presence of outliers can distort the results.

  • Extrapolation beyond the range of the data is dangerous, because you cannot ascertain whether or not the model equation will be adequate outside the range of the data.
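
The quantities named in this summary can be illustrated with a minimal Python sketch. It is not the book's own procedure: the function names `fit_line` and `standard_errors` and the data values are made up for illustration, and the formulas are the usual textbook least-squares expressions for a single predictor. It computes b0, b1, RMSE, the standard deviation of Y, and R², and then compares the standard error for the mean of Y at a chosen X with the standard error for a single future Y at the same X.

```python
import math


def fit_line(x, y):
    """Least-squares fit of y = b0 + b1 * x, with basic fit measures."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    b1 = sxy / sxx            # estimated average change in Y per unit change in X
    b0 = y_bar - b1 * x_bar   # estimated average of Y when X = 0

    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)

    return {
        "n": n, "x_bar": x_bar, "sxx": sxx, "b0": b0, "b1": b1,
        "rmse": math.sqrt(sse / (n - 2)),   # SD around the regression line
        "sd_y": math.sqrt(sst / (n - 1)),   # SD around the overall mean of Y
        "r2": 1.0 - sse / sst,
    }


def standard_errors(fit, x0):
    """Standard errors at x0 for the mean of Y and for one future Y."""
    leverage = 1.0 / fit["n"] + (x0 - fit["x_bar"]) ** 2 / fit["sxx"]
    se_mean = fit["rmse"] * math.sqrt(leverage)        # used for the confidence interval
    se_pred = fit["rmse"] * math.sqrt(1.0 + leverage)  # used for the prediction interval
    return se_mean, se_pred


# Made-up illustrative data, not taken from the book.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

fit = fit_line(x, y)
se_mean, se_pred = standard_errors(fit, x0=3.5)

print(f"b0 = {fit['b0']:.3f}, b1 = {fit['b1']:.3f}")
print(f"RMSE = {fit['rmse']:.3f}, SD of Y = {fit['sd_y']:.3f}, R^2 = {fit['r2']:.4f}")
print(f"SE for mean of Y at x0 = {se_mean:.3f}, SE for a future Y at x0 = {se_pred:.3f}")
```

Each interval is the fitted value at X plus or minus a t critical value (n − 2 degrees of freedom) times the corresponding standard error. Because the prediction standard error contains an extra RMSE² term under the square root, the prediction interval is wider than the confidence interval at every X. The critical value itself is left out of the sketch; it would come from a t table or a statistics library.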

