Simple Linear Regression with Excel

Getting and Opening Data Files

We will use an example data set from Regression Analysis by Example (4th ed.) by Chatterjee and Hadi (Wiley, New York, 2006).We will use the computer repair data. In this study a random sample of service call records for a computer repair operation were examined and the length of each call (in minutes) and the number of components repaired or replaced were recorded. The data is given below.

 
Minutes	Units
23	1
29	2
49	3
64	4
74	4
87	5
96	6
97	6
109	7
119	8
149	9
145	9
154	10
166	10

Scatterplots:

To draw a Scatter plot select Insert then Chart wizard and in Standard Types select XY(Scatter) then click on first scatter plot and press Next, then click the Series then click Add,after that in x-values select a unit data values and in y-values select a Minutes data values. Click next to write chart title and lable x and y axis, then Finish We will get a Scatter plot.

{short description of image}

Correlation and Covariance:

In Excel we can directly find Correlation matrix. To find Correlation matrix select Tools then Data analysis, in that select Correlation and click on ok. Next in range select two columns with labels, then click on the Labels in First Row finally click Ok We get Correlation Matrix as below:

		Minutes		Units
Minutes		1	
Units		0.993698746	1

Form the above matrix we can observe that the correlation coefficent of Minutes and Units is 0.993698746, which is positive and close to 1. This implies that there is strong positive correlation between the length of each call (in minutes) and the number of components repaired or replaced were recorded.

We can find directly Variance-Covariance matrix in Excel. To find Variance-Covariance matrix select Tools then Data analysis, in that select Covariance and click on ok. Next in range select two columns with labels, then click on the Labels in First Row finally click Ok. We get Variance Covariance Matrix as below:

		Minutes		Units
Minutes		1983.454082	
Units		126.2857143	8.142857143

From the above matrix we can observe that the covariance between Minute and Units is 126.2857143. We also get the variances of Minute and units as 1983.454082 and 8.142857143 respectively.

Running the Regression :

In excel 2003 select Tools then Data analysis, in that select Regressionand click on ok. Next in Input Y- Range select variable Minute with label and in Input X-range select variable Unit with label, then click on Labels and on the Line Fit Plots. Finally Click Ok. A brief regression output should appear. In 2007 select Data then Data analysis then follow the same procedure as in 2003.


SUMMARY OUTPUT								
								
Regression Statistics								
Multiple R		0.993698746							
R Square		0.987437198							
Adjusted R Square	0.986390298							
Standard Error		5.391724917							
Observations		14							

ANOVA		
		df	SS		MS		F		Significance F			
Regression	1	27419.50877	27419.50877	943.2009225	8.91625E-13			
Residual	12	348.8483709	29.07069758					
Total		13	27768.35714						
								
		Coefficients	Standard Error	t Stat		P-value		Lower 95%	Upper 95%
Intercept	4.161654135	3.355100399	1.240396304	0.238534368	-3.148481533	11.4717898
Units		15.50877193	0.504981306	30.71157636	8.91625E-13	14.4085122	16.60903166

RESIDUAL OUTPUT								
								
Observation	Predicted Minutes	Residuals						
	1	19.67042607		 3.329573935						
	2	35.17919799		-6.179197995						
	3	50.68796992		-1.687969925						
	4	66.19674185		-2.196741855						
	5	66.19674185		 7.803258145						
	6	81.70551378		 5.294486216						
	7	97.21428571		-1.214285714						
	8	97.21428571		-0.214285714						
	9	112.7230576		-3.723057644
	10	128.2318296		-9.231829574
	11	143.7406015		 5.259398496
	12	143.7406015		 1.259398496
	13	159.2493734		-5.249373434
	14	159.2493734		 6.750626566

{short description of image}

From the above output we get the regression equation is as follows

Minutes = 4.16 + 15.5 Units

The t-values (here "t stat") test the hypotheses that the corresponding population parameters are 0. If you wish to test a nonzero value, subtract it from the coefficient in the regression output window and divide by the coefficient's s.e. (Use a calculator for this.) Similarly, if you want confidence intervals, use the coefficient plus or minus the product of its s.e. with a t-value for the desired confidence level and 12 degrees of freedom. (Use a calculator for this.).

To plot the regression line on the scatterplot we need to format the output graph. First click on fitted line points then right click and select a chart type. Next select third chart type (Scatter with data points connected by the smoothed line without Markers) and then click Ok.

We get Scatter plot with fitted regression line as follows:

{short description of image}

Regression through the Origin:

To fit a regression line through the origin (i.e., intercept=0) redo the regression but this time select option Constant is Zero.

Excel output looks like below:

SUMMARY OUTPUT		
		
Regression Statistics		
Multiple R		0.992887937	
R Square		0.985826455	
Adjusted R Square	0.908903378	
Standard Error		5.50227961							
Observations		14							

ANOVA								
		df	SS		MS		F		Significance F			
Regression	1	27374.78109	27374.78109	904.2017485	1.14526E-12			
Residual	13	393.5760518	30.27508091					
Total		14	27768.35714						
								
		Coefficients	Standard Error	t Stat		P-value		Lower 95%	Upper 95%	Lower 95.0%	Upper 95.0%
Intercept	0		#N/A		#N/A		#N/A		#N/A		#N/A		#N/A		#N/A
Units		16.07443366	0.221334146	72.62518671	2.38032E-18	15.5962704	16.55259692	15.5962704	16.55259692



RESIDUAL OUTPUT								
		
Observation	Predicted Minutes	Residuals
	1	16.07443366		 6.925566343
	2	32.14886731		-3.148867314
	3	48.22330097		 0.776699029
	4	64.29773463		-0.297734628
	5	64.29773463		 9.702265372
	6	80.37216828		 6.627831715
	7	96.44660194		-0.446601942
	8	96.44660194		 0.553398058
	9	112.5210356		-3.521035599
	10	128.5954693		-9.595469256
	11	144.6699029		 4.330097087
	12	144.6699029		 0.330097087
	13	160.7443366		-6.74433657
	14	160.7443366		 5.25566343

Form the above output we get the regression equation as follows:

Minutes = 16.1 Units

Predictions

To find the predicted value of Y for given value of x (Units)= 4. Plug Unit = 4 in the regression equation Minutes = 4.16 + 15.5 Units We get, Minute = 4.16 + 15.5 * 4 This implies Minute = 66.16.

Therefore predicted value of minute in 66.16