14.7 Summary

  • Simple linear regression analysis is used with two continuous variables (X, Y), where X is the independent variable and Y is the dependent variable.

  • The model, estimated from the data, is a straight line with slope b1 and intercept b0.

  • The estimated regression equation is used to give a summary of the data, to estimate and test the change in the Y-variable as a function of the X-variable, and to estimate means and predict Y values.

  • The regression equation does not fit the data perfectly, except in rare cases. Therefore, the adequacy of the fit of the equation needs to be assessed.

  • Inadequate model fit might be because of outlier observations, because the straight-line model does not adequately represent the relationship between Y and X, or because the variability in Y is too large for the observed range of X-values.

  • The estimated slope coefficient b1 is the estimated average change in Y per unit change in X.

  • The estimated intercept b0 is the estimated average of Y when X = 0. An interpretation of the intercept is practical only when a value of X = 0 makes sense and when X = 0 is within or near the range of the data.

  • The simple regression relationship is termed statistically significant if the null hypothesis H0: β1 = 0 can be rejected based on the p-value associated with b1.

  • Regression can be used for estimating the average of Y at a specific value of X, or for predicting a future value of Y at X, leading to confidence and prediction intervals respectively. For the same confidence level and X, prediction intervals are always wider than confidence intervals, often much wider, because they also account for the variability of an individual observation around the mean.

  • Prediction intervals are very sensitive to deviations from the Normal distribution assumption.

  • Adequate fit can be assessed using RMSE, R2, scatterplots, and residual plots.

  • A good way to gauge the effectiveness of the regression line is to compare the RMSE (standard deviation around the regression line) with the standard deviation around the overall mean of Y. An RMSE that is small relative to that standard deviation indicates a good fit.

  • RSquare (R2) is a widely used measure of fit. R2 = 0 indicates no fit, whereas R2 = 1 indicates a perfect linear fit.

  • In regression analysis, the presence of outliers can distort the results.

  • Extrapolation beyond the range of the data is dangerous, because you cannot ascertain whether or not the model equation will be adequate outside the range of the data.
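
The quantities named in this summary can be illustrated with a minimal sketch in plain Python. The data values, the function names fit_simple_regression and intervals_at, and the variable names are hypothetical and chosen only for illustration. The critical value 2.306 is the two-sided 95% t-value for 8 degrees of freedom taken from a t-table, which assumes Normally distributed errors.

```python
import math

def fit_simple_regression(x, y):
    """Least-squares fit of y = b0 + b1*x, returning the summary quantities."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # estimated slope
    b0 = y_bar - b1 * x_bar             # estimated intercept
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return {
        "n": n, "x_bar": x_bar, "sxx": sxx, "b0": b0, "b1": b1,
        "rmse": math.sqrt(sse / (n - 2)),   # std. dev. around the line
        "sd_y": math.sqrt(sst / (n - 1)),   # std. dev. around the mean of Y
        "r2": 1 - sse / sst,
    }

def intervals_at(fit, x0, t_crit):
    """Confidence interval for the mean of Y and prediction interval at x0."""
    y_hat = fit["b0"] + fit["b1"] * x0
    leverage = 1 / fit["n"] + (x0 - fit["x_bar"]) ** 2 / fit["sxx"]
    se_mean = fit["rmse"] * math.sqrt(leverage)        # for the average of Y
    se_pred = fit["rmse"] * math.sqrt(1 + leverage)    # for a future Y value
    conf = (y_hat - t_crit * se_mean, y_hat + t_crit * se_mean)
    pred = (y_hat - t_crit * se_pred, y_hat + t_crit * se_pred)
    return conf, pred

# Made-up illustrative data (not from the text), n = 10.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]

fit = fit_simple_regression(x, y)
# Assumed t critical value: 95% two-sided, n - 2 = 8 degrees of freedom.
conf_int, pred_int = intervals_at(fit, x0=5.5, t_crit=2.306)

print("slope b1:", fit["b1"], "intercept b0:", fit["b0"])
print("RMSE:", fit["rmse"], "vs. SD of Y:", fit["sd_y"], "R2:", fit["r2"])
print("95% confidence interval at X = 5.5:", conf_int)
print("95% prediction interval at X = 5.5:", pred_int)
```

Comparing the printed RMSE with the standard deviation of Y, and the width of the two intervals, mirrors the fit assessment described in the bullets above. X = 5.5 lies inside the observed range, so the example does not extrapolate.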


