Simple Linear Regression with Minitab

This document assumes you have Minitab installed on your computer. The instructions are based on Minitab 14, which is known to run on Windows 98SE and XP. Version 15 requires at least Windows XP and initially had problems under Vista.

Getting and Opening Data Files

We will use an example data set from Regression Analysis by Example (4th ed.) by Chatterjee and Hadi (Wiley, New York, 2006). Go to the web site for this book at http://www.ilr.cornell.edu/~hadi/rabe4/. We will use the computer repair data. In this study a random sample of service call records for a computer repair operation was examined, and the length of each call (in minutes) and the number of components repaired or replaced were recorded.

The data are in file P027.MTB. Follow the directions on the book's home page to download this file and save it where you can find it again. The web site is a little misleading in that the file you actually obtain will be P027.zip, so you will need a program that can unzip it into P027.mtb.

Now you can run Minitab. From Minitab's main menu, select File, Open Worksheet and browse to where you put P027.mtb. Select that file and click on the Open button.

Simple Plots for Each Variable

Of course, the first step is to look at your data. Pull down the Graph menu and select Stem-and-Leaf. Double-click on each variable to select it.

MTB > GStd.
MTB > Stem-and-Leaf 'Minutes' 'Units'.

Stem-and-leaf of Minutes  N  = 14
Leaf Unit = 10


 2   0  22
 3   0  4
 5   0  67
(3)  0  899
 6   1  01
 4   1
 4   1  445
 1   1  6


Stem-and-leaf of Units  N  = 14
Leaf Unit = 0.10


 1   1   0
 2   2   0
 3   3   0
 5   4   00
 6   5   0
(2)  6   00
 6   7   0
 5   8   0
 4   9   00
 2   10  00

We could instead have made histograms or boxplots. We simply want to see whether there are any peculiarities in the data for each variable by itself before we look into relationships between variables. We see none here.

Scatterplots

Pull down the Graph menu and select Scatterplot. Accept the defaults on the first dialog. On the second, you must indicate your response and predictor variables. Available choices appear in a list at left. Double-click on Minutes to select it as the Y variable. The cursor moves to the X variable column, where you can double-click on Units to select it. Then click on OK and a scatterplot will appear in a new window.

[Figure: scatterplot of Minutes versus Units]

Note that the dialog boxes include numerous options (which we do not need at the moment). We are not surprised to see that the length of a service call increases with the number of components repaired or replaced.

Correlation and Covariance

Minitab has a command language in addition to a menu interface. Each has its pros and cons. A major advantage of the command language is that you can store commands in a macro file and rerun the same analysis over and over. This is useful, for example, if you do the same analysis every month for new data on the same variables (such as data on your business).

If commands are not appearing in the Session window at the top of the screen (as they did in the stem-and-leaf example above) when you make choices from the menus, select Editor (not Edit) and Enable Commands. You can check that commands are enabled by putting the cursor at the Minitab prompt "MTB >" in the Session window and typing desc c1, which stands for "describe column 1" of the worksheet in the bottom half of the screen. The commands for correlation and covariance are easy to remember. You can refer to variables by column (c1, for example) or by name (in single quotes).

MTB > corr 'Minutes' 'Units'

Pearson correlation of Minutes and Units = 0.994
P-Value = 0.000

MTB > covariance 'Minutes' 'Units'

           Minutes      Units
Minutes  2136.0275
Units     136.0000     8.7692
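
As a quick check, the correlation can be recovered from the covariance table, since the Pearson correlation is the covariance divided by the product of the two standard deviations. The following Python sketch is not part of Minitab; it simply redoes that arithmetic with the three numbers printed above, and the variable names are our own.

```python
import math

# Values copied from the Minitab covariance table above.
var_minutes = 2136.0275   # variance of Minutes
var_units = 8.7692        # variance of Units
cov_mu = 136.0000         # covariance of Minutes and Units

# Pearson correlation = covariance / (sd of Minutes * sd of Units)
r = cov_mu / math.sqrt(var_minutes * var_units)
print(f"r = {r:.3f}")
```

Rounded to three decimals, the result should agree with the correlation that the corr command reported.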

The diagonal entries of the covariance table (2136.0275 and 8.7692) are the variances of Minutes and Units. To see other summary statistics, pull down the Stat menu and select Basic Statistics, then Display Descriptive Statistics. Pick one or both variables. Summary statistics for the variable(s) of your choice should appear in the Session window.

Variable   N  N*   Mean  SE Mean  StDev  Minimum     Q1  Median     Q3  Maximum
Minutes   14   0   97.2     12.4   46.2     23.0   60.3    96.5  146.0    166.0
Units     14   0  6.000    0.791  2.961    1.000  3.750   6.000  9.000   10.000

(You may get a different selection of summary statistics.) These summaries do not include variances, so go through the process again. This time, click on the Statistics button in the dialog box to get a list of possible summary statistics. Tick Variance and make any other changes that appeal to you, then click OK. The variances should appear in a new summary table and should match (except for rounding) the diagonal of the covariance table.

Variable   N  N*   Mean  SE Mean  StDev  Variance  Minimum     Q1  Median
Minutes   14   0   97.2     12.4   46.2    2136.0     23.0   60.3    96.5
Units     14   0  6.000    0.791  2.961     8.769    1.000  3.750   6.000

Variable     Q3  Maximum
Minutes   146.0    166.0
Units     9.000   10.000
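
Because the stem-and-leaf display for Units has a leaf unit of 0.10 and every leaf is 0, the 14 Units values can be read off exactly, so the Units row of the summary table can be checked independently. The sketch below does this with Python's standard statistics module. It is an illustration outside Minitab. The "exclusive" quantile method is chosen on the assumption that it matches Minitab's quartile positions; it reproduces the quartiles shown for these data, but that is not a claim about Minitab's algorithm in general.

```python
import math
import statistics

# Units values read from the stem-and-leaf display (leaf unit 0.10, all leaves 0).
units = [1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 10]

n = len(units)
mean = statistics.mean(units)
variance = statistics.variance(units)   # sample variance, divisor n - 1
stdev = math.sqrt(variance)
se_mean = stdev / math.sqrt(n)          # standard error of the mean

# Quartiles by the (n + 1) * p position rule (assumed to match Minitab here).
q1, median, q3 = statistics.quantiles(units, n=4, method="exclusive")

print(f"N = {n}  Mean = {mean:.3f}  SE Mean = {se_mean:.3f}")
print(f"StDev = {stdev:.3f}  Variance = {variance:.3f}")
print(f"Q1 = {q1:.3f}  Median = {median:.3f}  Q3 = {q3:.3f}")
```

The same check is not possible for Minutes, because its display has a leaf unit of 10 and therefore shows only truncated values.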

Running the Regression

Now pull down Stat yet again and select Regression, Regression. Select Minutes as the response and Units as the predictor, then click OK. A brief regression output should appear.

The regression equation is
Minutes = 4.16 + 15.5 Units


Predictor     Coef  SE Coef      T      P
Constant     4.162    3.355   1.24  0.239
Units      15.5088   0.5050  30.71  0.000


S = 5.39172   R-Sq = 98.7%   R-Sq(adj) = 98.6%


Analysis of Variance

Source          DF     SS     MS       F      P
Regression       1  27420  27420  943.20  0.000
Residual Error  12    349     29
Total           13  27768
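
The pieces of this output are tied together by simple arithmetic. The sketch below (Python, outside Minitab, with variable names of our own choosing) recomputes a few of them from numbers already printed in this document: the slope as the covariance divided by the variance of Units, R-Sq as the regression sum of squares over the total, S as the square root of the residual mean square, and F as the square of the slope's T value. Because the inputs are rounded printed values, the results can be expected to agree with Minitab only approximately.

```python
import math

# Inputs copied from the Minitab output shown in this document.
cov_mu = 136.0000                 # covariance of Minutes and Units
var_units = 8.7692                # variance of Units
ss_reg, ss_res, ss_tot = 27420, 349, 27768   # sums of squares from the ANOVA table
df_res = 12                       # Residual Error degrees of freedom
t_slope = 30.71                   # T value for Units

slope = cov_mu / var_units        # least-squares slope
r_sq = ss_reg / ss_tot            # proportion of variation explained
s = math.sqrt(ss_res / df_res)    # residual standard deviation
f_from_t = t_slope ** 2           # with one predictor, F equals T squared

print(f"slope = {slope:.4f}")
print(f"R-Sq  = {100 * r_sq:.1f}%")
print(f"S     = {s:.2f}")
print(f"T^2   = {f_from_t:.1f}")
```

Small differences from the printed S and F are due to rounding in the inputs, not to a different method.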

The t-values (here "T") test the hypotheses that the corresponding population parameters are 0. To test a nonzero value, subtract it from the coefficient in the regression output and divide by the coefficient's standard error ("SE Coef"). Similarly, for a confidence interval, take the coefficient plus or minus the product of its standard error and the t-value for the desired confidence level with 12 degrees of freedom (the Residual Error DF in the output). Use a calculator for both computations.
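
The two hand calculations just described can be sketched as follows. This is plain Python rather than Minitab. The hypothesized slope of 12 minutes per unit is purely an illustrative value chosen here, not something taken from the data, and 2.179 is the tabulated two-sided 95% t-value for 12 degrees of freedom.

```python
# Inputs copied from the regression output.
coef = 15.5088      # slope estimate for Units
se_coef = 0.5050    # its standard error (SE Coef)

beta_0 = 12.0       # hypothetical null value, chosen only for illustration
t_crit = 2.179      # t table: 97.5th percentile with 12 degrees of freedom

# Test statistic for H0: slope = beta_0
t_stat = (coef - beta_0) / se_coef

# 95% confidence interval for the slope
lower = coef - t_crit * se_coef
upper = coef + t_crit * se_coef

print(f"t = {t_stat:.2f}")
print(f"95% CI for slope: ({lower:.2f}, {upper:.2f})")
```

The test statistic is compared with the same t distribution on 12 degrees of freedom. Replacing coef and se_coef with the Constant row gives the corresponding results for the intercept.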

To plot the regression line on the scatterplot, redo the scatterplot but this time pick With Regression in the first dialog box.

[Figure: scatterplot of Minutes versus Units with fitted regression line]

You can cut and paste Minitab output into your own reports, but note that the text windows on the statistics.com Assignments page will only accept text input. So, of the output examples above, the scatterplots could not be pasted there, while all the text that appears in the upper Session window in Minitab can be.

To copy the contents of a graphics window (say, for a report you are writing with your word processor), first click on the graph window to make it the current window if it isn't already, then select Edit > Copy Window. You will not see anything happen, but if you go to another application you can paste there.

Regression through the Origin

To fit a regression line through the origin (i.e., with intercept 0), redo the regression, but this time select Options on the dialog box where you pick variables. (While you are here, notice some of the other choices, such as computing the Durbin-Watson statistic.) Untick the box Fit Intercept. The new results (with commands) should be

MTB > Regress 'Minutes' 1 'Units';
SUBC>   NoConstant;
SUBC>   Brief 2.

The regression equation is
Minutes = 16.1 Units


Predictor      Coef  SE Coef      T      P
Noconstant
Units       16.0744   0.2213  72.63  0.000


S = 5.50228


Analysis of Variance

Source          DF      SS      MS        F      P
Regression       1  159683  159683  5274.42  0.000
Residual Error  13     394      30
Total           14  160077

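That is, the fitted line is Minutes = 16.0744 * Units. Note that the Total line now has 14 degrees of freedom rather than 13 and that no R-Sq is printed, because without an intercept the sums of squares are not corrected for the mean.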

If you wish to explore Minitab's command line, pull down the main Help menu and select Session Command Help (which may not be present unless you asked for it when installing Minitab).

Predictions

To make a prediction (with confidence interval), rerun the regression. (We will rerun the original regression rather than the one through the origin.) The Options dialog box has a field labeled Prediction Intervals for New Observations. Type in the value of Units for which you want a prediction of Minutes. Here we predict the length of a service call with four components repaired or replaced. The prediction output appears after the regular regression output. In it, the 95% CI is an interval for the mean length of all such calls, and the wider 95% PI is an interval for the length of a single new call. The output looks like


Predicted Values for New Observations

New
Obs    Fit  SE Fit      95% CI          95% PI
  1  66.20    1.76  (62.36, 70.03)  (53.84, 78.55)


Values of Predictors for New Observations

New
Obs  Units
  1   4.00
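
The prediction output can be reproduced from the regression output shown earlier. In this Python sketch (an illustration outside Minitab, with names of our own choosing), the fit is the regression equation evaluated at Units = 4, the confidence interval uses SE Fit, and the prediction interval uses the square root of S squared plus SE Fit squared. The value 2.179 is the tabulated t-value for 95% confidence and 12 degrees of freedom, and SE Fit is copied from the Minitab output rather than recomputed.

```python
import math

# Inputs copied from the Minitab output in this document.
b0, b1 = 4.162, 15.5088   # intercept and slope
s = 5.39172               # residual standard deviation S
se_fit = 1.76             # SE Fit at Units = 4 (taken from Minitab, not recomputed)
t_crit = 2.179            # t table: 97.5th percentile with 12 degrees of freedom
x_new = 4                 # number of units for the new service call

fit = b0 + b1 * x_new

# Confidence interval for the mean response at x_new
conf_int = (fit - t_crit * se_fit, fit + t_crit * se_fit)

# Prediction interval for a single new observation at x_new
se_pred = math.sqrt(s ** 2 + se_fit ** 2)
pred_int = (fit - t_crit * se_pred, fit + t_crit * se_pred)

print(f"Fit    = {fit:.2f}")
print(f"95% CI = ({conf_int[0]:.2f}, {conf_int[1]:.2f})")
print(f"95% PI = ({pred_int[0]:.2f}, {pred_int[1]:.2f})")
```

The results should match the Minitab table to within about 0.01; any remaining difference comes from the rounded inputs.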

© 2007 statistics.com