

    PROC REG DATA=BB;                   /* Regression on BB Dataset */
      MODEL BRAINWEIGHT = BODYWEIGHT;   /* Brain ~ Body */
    RUN;                                /* Run the Regression */

By default, the PROC REG statement generates quite a bit of output. The individual t statistics test the significance of the point estimates for the slope and intercept, while the F statistic tests the overall significance of the regression. Both say that the regression is significant, so there is a linear effect. However, none of these statistics accurately captures what is wrong with this fit.

Look at the diagnostic plots: the fitted versus residual plots and the QQ plot. The "Quantile to Quantile" (or QQ) plot shows how closely the residuals from the fit match a normal distribution. Given that one of the assumptions of least squares linear regression is that the residuals follow a normal distribution, we can see we have a problem here. The histogram of residuals at the bottom, with a superimposed normal curve, shows the same thing: the residuals clearly do not follow a normal distribution.

To fix this and obtain a valid model, one thing we can do is attempt transformations of the variables. A reasonable choice might be a log transform of the independent variable, to remove some of the skew.

    DATA BB;                            /* Create new Dataset BB */
      SET BB;                           /* Using existing Set BB */
      /* Add log transform of Bodyweight */
    RUN;

    PROC REG DATA=BB;                   /* Run the Regression again on new BB */
      /* Brain weight Versus log(bodyweight) */
    RUN;

Looking at the F statistic and the t statistics for the individual coefficients, we can see that the regression is still significant. However, the transformation actually makes the fit significantly worse.
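The departure from normality in the residuals of the untransformed fit can also be checked outside the default diagnostic panel. The following is only a sketch: the output dataset name BBRES and the variable names RESID and PRED are illustrative assumptions, not names from the original analysis. It saves the residuals from the regression and passes them to PROC UNIVARIATE, whose NORMAL option prints formal tests of normality alongside a histogram and a QQ plot.

```sas
/* Sketch only: BBRES, RESID and PRED are hypothetical names */
PROC REG DATA=BB;
  MODEL BRAINWEIGHT = BODYWEIGHT;      /* Brain ~ Body */
  OUTPUT OUT=BBRES R=RESID P=PRED;     /* Save residuals and fitted values */
RUN;
QUIT;

PROC UNIVARIATE DATA=BBRES NORMAL;     /* NORMAL requests tests of normality */
  VAR RESID;
  HISTOGRAM RESID / NORMAL;            /* Histogram with fitted normal curve */
  QQPLOT RESID / NORMAL(MU=EST SIGMA=EST);  /* QQ plot against a normal */
RUN;
```

Small p-values in the normality tests would agree with what the QQ plot and histogram already suggest.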

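As a sketch of the log-transform step written out in full, the data step and regression might look like the following. The variable name LOGBODYWEIGHT is a hypothetical choice made for illustration and is not taken from the original code. LOG in SAS is the natural logarithm, so this assumes all body weights are positive.

```sas
/* Sketch only: LOGBODYWEIGHT is a hypothetical variable name */
DATA BB;                               /* Create new Dataset BB */
  SET BB;                              /* Using existing Set BB */
  LOGBODYWEIGHT = LOG(BODYWEIGHT);     /* Add log transform of Bodyweight */
RUN;

PROC REG DATA=BB;                      /* Run the Regression again on new BB */
  MODEL BRAINWEIGHT = LOGBODYWEIGHT;   /* Brain weight versus log(bodyweight) */
RUN;
QUIT;
```

Overwriting BB in place keeps the original columns, so the untransformed model can still be rerun for comparison.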