QUESTION:

Weighted Least Squares. What is it?

When fitting an Ordinary Least Squares regression line, what is the implicit weighting of each record with respect to estimating the slope of the line (assume a bivariate model only)?

Example: I have 100 records, X and Y -- estimate Y = alpha + beta*X + e

Does each record contribute equally to the estimated beta? Or is there an implicit weighting by the size of the Y variable or by the squared residual?

ANSWER:

There is nothing implicit here. When you use OLS you are specifying explicitly that each and every Y value has a constant variance, and thus that each and every Y value is equally believable (or unbelievable).

Ordinary Least Squares (OLS) tries to estimate the relationship between a 'dependent' variable and a set of 'independent' variables such that the sum of squares of the errors is as small as possible (least squares). However, if the variance of the y's cannot be assumed to be constant, then one can use Generalized Least Squares, which allows for and incorporates non-constant variance (and even autocorrelated errors). The concept of influential observations, be they Pulses, Seasonal Pulses, Level Shifts or Local Time Trends, is an entirely different issue and speaks to the non-constancy of the mean of the errors, not the variance of the errors.
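For concreteness, here is a minimal sketch of the closed-form GLS estimator beta = (X'V^-1 X)^-1 X'V^-1 y. I have written it in Python with numpy (an assumption on my part, since the question mentioned C/C++); V is the known or estimated error variance-covariance matrix.

    import numpy as np

    def gls_fit(X, y, V):
        # Generalized least squares: beta = (X' V^-1 X)^-1 X' V^-1 y.
        # X is n-by-k (include a column of ones for the intercept),
        # y has length n, and V is the n-by-n error covariance matrix.
        Vinv = np.linalg.inv(V)
        XtVinv = X.T @ Vinv
        return np.linalg.solve(XtVinv @ X, XtVinv @ y)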

I read your question as speaking to the issue of Weighted Regression, which allows for non-constant variance. Other readers have responded as if you meant the issue of influential observations. I am not saying that is unimportant; I am simply saying that Gaussian violations can arise in the variance of the errors just as they can arise in the mean of the errors.

Continuing ... Koopmans in 1937 developed the concept of Weighted Regression in his seminal piece "Linear Regression Analysis of Economic Time Series". The idea is that rather than minimizing the sum of the e(i) squared, one minimizes the sum of the [w(i)*e(i)] squared, where w(i) is the square root of the reciprocal of the variance of the ith reading.

Thus, if you had 20 observations and you either observed or knew beforehand that the variance for the first 10 y readings was 4 times larger than the variance of the last 10 then you would set out to minimize

    [(1/2)*e(1)]^2 + [(1/2)*e(2)]^2 + ... + [(1/2)*e(10)]^2

    + [(1)*e(11)]^2 + [(1)*e(12)]^2 + ... + [(1)*e(20)]^2

rather than the simple (equally weighted) sum of squares of the e's.
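As a small numerical sketch of this (Python/numpy again, with data simulated purely for illustration), suppose the first 10 readings have variance 4 and the last 10 have variance 1, so the weights are 1/2 and 1. Minimizing the weighted sum of squares is the same as running ordinary least squares on the weighted data:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 20
    x = np.arange(1, n + 1, dtype=float)
    sd = np.where(x <= 10, 2.0, 1.0)      # variance 4 for the first 10 readings, 1 for the last 10
    y = 3.0 + 0.5 * x + rng.normal(0.0, sd)

    w = 1.0 / sd                          # square root of the reciprocal variance: 1/2 then 1
    X = np.column_stack([np.ones(n), x])

    # minimizing sum_i [w(i)*e(i)]^2 is ordinary least squares on (w*X, w*y)
    beta, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)
    print(beta)                           # weighted estimates of alpha and beta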

We wrote a piece of software called AUTOBOX which incorporates these concepts and extends Weighted Regression to Weighted ARIMA and Weighted Transfer Functions.

 

The issue is one of what weights to use. If you use the estimated residuals to compute estimated local variances, and then use these (proven to be different) variances to obtain weighted estimates, that is bothersome to some academics but not to this practitioner of applied econometrics.

I suggest that you go to a good statistical library and pursue Weighted Regression. You will find that good econometric texts cover estimation under heteroscedasticity (non-constant variance).

Please visit our web site to see material on strategies for the identification of Gaussian violations and practical remedies for dealing with them, including non-constancy of parameters.

Unfortunately, I don't know of any C/C++ code to do this. The trick is simply to assume OLS and to identify anomalous data points that may be one-time-only effects (pulses), seasonal pulses, level shifts or local time trends, and then to create or impute an "adjusted series", i.e. a modified series which is free of "unusual values". AUTOBOX calls this the ADJUSTED SERIES.

Next, you estimate OLS and order the residuals from 1 to N based on the order of the Y's. Now break these N residuals into two mutually exclusive sets (for example 1-10 and 11-20, where N=20). Compute the two variances and set up an F test of the hypothesis that the two variances are equal, and determine the P value for this test. Now make the groupings 1-9 versus 10-20 and recompute the P value. Do this for all possible groupings; if you require a minimum of 5 in a group, this gives 11 tests and thus 11 P values. Select the smallest one and assess whether or not it is significant against some threshold (perhaps .01).
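A rough sketch of that scan (Python again, with scipy assumed available for the F distribution; doubling the tail area for a two-sided test is just one convention) could look like this:

    import numpy as np
    from scipy import stats

    def scan_variance_change(residuals, min_group=5):
        # Try every split point that leaves at least `min_group` residuals
        # on each side, F-test the two group variances, and return the
        # split with the smallest p-value.
        e = np.asarray(residuals, dtype=float)
        n = len(e)
        best = None
        for k in range(min_group, n - min_group + 1):
            v1, v2 = e[:k].var(ddof=1), e[k:].var(ddof=1)
            f = max(v1, v2) / min(v1, v2)
            df1, df2 = (k - 1, n - k - 1) if v1 >= v2 else (n - k - 1, k - 1)
            p = min(1.0, 2.0 * stats.f.sf(f, df1, df2))   # two-sided p-value
            if best is None or p < best[2]:
                best = (k, f, p)
        return best   # (last index of group 1, F statistic, smallest p-value)

For N = 20 and a minimum group size of 5, this loop performs exactly the 11 tests described above.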

If no other differences or contrasts can be generated and the conclusion is that there was a variance change at time period 11, then one can (not without losing some purity here) obtain bootstrap estimates of the local variances.

Say 1-10 had a variance of 24 while 11-20 had a variance of 6.

One would then have a four-to-one ratio, and one could use a weighted regression package (AUTOBOX, SAS and others) where the weights would be

1/4,1/4,1/4,1/4,1/4,1/4,1/4,1/4,1/4,1/4,1,1,1,1,1,1,1,1,1,1

The program would take these weights and use the square roots of these weights as multipliers of the individual model errors in computing the error sum of squares, which is to be minimized.

The term 1/2 is used on the error while the square of this (viz. 1/4) is used on the square of the error. Perhaps I didn't make myself clear in the original response. Generalized least squares takes the inverse of V, which in this case has diagonal 4,4,4,4,4,4,4,4,4,4,1,1,1,1,1,1,1,1,1,1

(N.B. a diagonal variance-covariance matrix of order 20 by 20).

Its inverse is

1/4,1/4,1/4,1/4,1/4,1/4,1/4,1/4,1/4,1/4,1,1,1,1,1,1,1,1,1,1

and the square root of these diagonal elements gives the weights for the errors, whilst the elements themselves are the weights for the squares of the errors.
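As a quick numerical check of that square-root relationship (again a Python/numpy sketch on simulated data), GLS with the diagonal V of 4's and 1's and least squares on data weighted by the square roots 1/2 and 1 give identical coefficients:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 20
    x = np.arange(1, n + 1, dtype=float)
    X = np.column_stack([np.ones(n), x])
    y = 3.0 + 0.5 * x + rng.normal(0.0, np.where(x <= 10, 2.0, 1.0))

    v = np.where(x <= 10, 4.0, 1.0)       # diagonal of V: 4's then 1's
    Vinv = np.diag(1.0 / v)               # diagonal of V-inverse: 1/4's then 1's
    beta_gls = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)

    w = np.sqrt(1.0 / v)                  # square roots of the weights: 1/2's then 1's
    beta_wls, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)

    print(np.allclose(beta_gls, beta_wls))   # True: same estimates either way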