### 14.7 Summary

Simple linear regression analysis is used with two continuous variables (X, Y). X is the independent variable, Y is the dependent variable.

The model, estimated from the data, is a straight line with slope $b_1$ and intercept $b_0$.

The estimated regression equation is used to give a summary of the data, to estimate and test the change in the Y-variable as a function of the X-variable, and to estimate means and predict Y values.
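As a minimal sketch of how the slope and intercept can be estimated, the following uses the standard least-squares formulas in plain Python. The function name `fit_line` and the data are hypothetical and chosen only for illustration; they do not come from the chapter.

```python
from statistics import mean

def fit_line(x, y):
    """Return (b0, b1), the least-squares intercept and slope."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of squared deviations of X, and cross-products of X and Y deviations.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx            # estimated slope
    b0 = y_bar - b1 * x_bar   # estimated intercept
    return b0, b1

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = fit_line(x, y)
print(f"Y-hat = {b0:.2f} + {b1:.2f} * X")
```

For these made-up values the slope is 0.6 and the intercept is 2.2, so Y is estimated to rise by 0.6 on average for each unit increase in X.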

The regression equation does not fit the data perfectly, except in rare cases. Therefore, the adequacy of the fit of the equation needs to be assessed.

Inadequate model fit might be because of outlier observations, because the straight-line model does not adequately represent the relationship between Y and X, or because the variability in Y is too large for the observed range of X-values.

The estimated slope coefficient $b_1$ is the estimated average change in Y per unit change in X.

The estimated intercept $b_0$ is the estimated average of Y when X = 0. An interpretation of the intercept is practical only when a value of X = 0 makes sense and when X = 0 is within or near the range of the data.

The simple regression relationship is termed statistically significant if the null hypothesis $H_0: \beta_1 = 0$ can be rejected based on the p-value associated with $b_1$.

Regression can be used for estimating the average of Y at a specific value of X, or for predicting a future value of Y at X, leading to confidence and prediction intervals respectively. For the same confidence level and X, prediction intervals are much wider than confidence intervals.

Prediction intervals are very sensitive to deviations from the Normal distribution assumption.
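The slope test and the two kinds of interval can be sketched with the usual textbook formulas. This is an illustration under stated assumptions, not the chapter's own procedure: the function name, the data, and the point `x0` are hypothetical, and the critical value 3.182 (two-sided 95%, 3 degrees of freedom) is taken from a standard t-table rather than computed.

```python
import math
from statistics import mean

def slope_test_and_intervals(x, y, x0, t_crit):
    """Return (t_slope, ci_half, pi_half) for a simple linear regression.

    t_crit is the two-sided critical t-value with n - 2 degrees of freedom,
    supplied by the caller (for example, from a t-table).
    """
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    rmse = math.sqrt(sse / (n - 2))           # standard deviation around the line
    t_slope = b1 / (rmse / math.sqrt(sxx))    # test statistic for H0: beta1 = 0
    h = 1 / n + (x0 - x_bar) ** 2 / sxx
    ci_half = t_crit * rmse * math.sqrt(h)      # half-width: mean of Y at x0
    pi_half = t_crit * rmse * math.sqrt(1 + h)  # half-width: future Y at x0
    return t_slope, ci_half, pi_half

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
T_CRIT_95_DF3 = 3.182  # assumed table value: two-sided 95%, 3 degrees of freedom
t_slope, ci_half, pi_half = slope_test_and_intervals(x, y, x0=3, t_crit=T_CRIT_95_DF3)
print(t_slope, ci_half, pi_half)
```

With this small made-up sample the slope statistic is about 2.12, which is below the critical value, so the null hypothesis would not be rejected at the 5% level. The prediction half-width (about 3.12) is more than twice the confidence half-width (about 1.27), because the extra `1 +` term accounts for the variability of a single future observation.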

Adequate fit can be assessed using RMSE, $R^2$, scatterplots, and residual plots.

A good measure of the effectiveness of the regression line is to compare the RMSE (standard deviation around the regression line) with the standard deviation around the overall mean of Y. A relatively small RMSE indicates a good fit.
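The comparison of RMSE with the standard deviation of Y might look like the following sketch. The helper name `rmse_of_fit` and the data are hypothetical, and the RMSE here uses the usual n - 2 divisor for simple regression.

```python
import math
from statistics import mean, stdev

def rmse_of_fit(x, y):
    """Return the RMSE (standard deviation around the least-squares line)."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(sse / (n - 2))

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
rmse = rmse_of_fit(x, y)
sd_y = stdev(y)          # standard deviation around the overall mean of Y
ratio = rmse / sd_y      # values well below 1 suggest the line explains much of the spread
print(rmse, sd_y, ratio)
```

For these made-up values the RMSE is about 0.89 against a standard deviation of about 1.22, a ratio of roughly 0.73, so the line removes only a modest share of the variability in Y.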

RSquare ($R^2$) is a widely used measure of fit. $R^2 = 0$ indicates no fit, whereas $R^2 = 1$ indicates a perfect linear fit.

In regression analysis, the presence of outliers can distort the results.
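To make the R-squared measure and the effect of an outlier concrete, the sketch below computes R-squared as one minus the ratio of the residual sum of squares to the total sum of squares, first on a small data set and then with one added extreme point. The function name and all data values, including the outlier at (6, 0), are hypothetical.

```python
from statistics import mean

def slope_and_r2(x, y):
    """Return (b1, r2) for the least-squares line through (x, y)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # residual SS
    sst = sum((yi - y_bar) ** 2 for yi in y)                       # total SS
    return b1, 1 - sse / sst

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b1_clean, r2_clean = slope_and_r2(x, y)

# Same data plus one made-up outlying observation at X = 6.
b1_out, r2_out = slope_and_r2(x + [6], y + [0])
print(b1_clean, r2_clean, b1_out, r2_out)
```

In this example a single extreme point changes the slope from 0.6 to a negative value and drops R-squared from 0.6 to below 0.05, which is why scatterplots and residual plots are worth checking alongside the summary numbers.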

Extrapolation beyond the range of the data is dangerous, because you cannot ascertain whether or not the model equation will be adequate outside the range of the data.
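The risk of extrapolation can be illustrated with a small sketch in which the true relationship is assumed, purely for demonstration, to be the curve Y = X squared, while a straight line is fitted to observations with X between 1 and 5. The function names and the choice of curve are hypothetical.

```python
from statistics import mean

def fit_line(x, y):
    """Return (b0, b1), the least-squares intercept and slope."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - b1 * x_bar, b1

def true_relationship(x):
    """Assumed curved relationship, unknown to the analyst in practice."""
    return x ** 2

# Observed range of X is 1 to 5.
x = [1, 2, 3, 4, 5]
y = [true_relationship(xi) for xi in x]
b0, b1 = fit_line(x, y)

def predict(x0):
    return b0 + b1 * x0

error_inside = abs(predict(3) - true_relationship(3))     # within the data range
error_outside = abs(predict(10) - true_relationship(10))  # far outside the range
print(error_inside, error_outside)
```

Under these assumptions the fitted line misses the true value by 2 at X = 3 but by 47 at X = 10. Nothing in the data collected between 1 and 5 would reveal how badly the line performs at 10.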