QUESTION:
Can I get help with the following? I am working on five different genotypes of sunflowers. I have used regression
to study the trends in a few parameters. Some parameters have a linear fit and some have a quadratic fit
(parameter-versus-time curves). Now I have problems comparing those curves. I tried using an analysis-of-covariance model to
compare intercepts and slopes separately, but that won't tell me which lines differ significantly.
- Using SAS, can we compare the linear regression equations? Can we use any kind of mean comparison method to compare
the curves?
- If one genotype fits a linear curve and another a quadratic curve, can we compare them?
ANSWER:
This question is as old as time itself (pun intended): how to distinguish between trend and randomness, or a mix of both. Should a
trend line (perhaps multiple trends?) be used, or should the forecast be based explicitly on the last value plus or minus a
constant? A trend model is a particular case of a DETERMINISTIC model, i.e. an intervention model where the user
assumes that there is one and only one trend, and that the trend starts immediately at time period 1 and continues
through the last point. This is an example of an INTERVENTION MODEL where the user knows this truth a priori
and proceeds to estimation. INTERVENTION DETECTION can also lead to this model if the data so evidence it. The
root problem, or opportunity, is how to model the relationship among sequential, equally spaced observations. There are
two major approaches, and I believe that the data should be allowed to suggest, through analysis, which one is appropriate.
Following is a brief discussion of the issue.

HOW TO INCORPORATE TIME

Traditional time-oriented regression analysis is usually presented in the following form:

   Y(t) = W0 + W1*T + A(t)

where W0 is the intercept, W1 is the trend or slope, T is the counting numbers 1, 2, 3, ..., N, and A(t) is a Gaussian
white-noise process. Y, of course, is the series of observed readings made at N equally spaced, consecutive points in time.
In the following, for pedagogical purposes, we assume a simple trend and a simple AR(1) to illustrate the point.
There are three possibilities:

   1. Deterministically:  Y(t) = W0 + W1*T
   2. Stochastically:     Y(t) = W0 + W1*Y(t-1)
   3. Both:               Y(t) = W0 + W1*T + B2*Y(t-1)

DETERMINISTIC MODEL

It is conventional in econometric model building to use polynomials and dummy variables to describe "trends". Such methods
are unsatisfactory, since it is unlikely that such deterministic trends are adequate to describe the development of observed
time series, implying as they do that "growth rates remain constant indefinitely".
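In the spirit of letting the data suggest the form, here is a minimal sketch of fitting the three possibilities by ordinary least squares and comparing their residual sums of squares (SSR). It assumes Python with NumPy and a simulated series. The function name `fit_three_forms` and the data are hypothetical, not part of the original answer. All three forms are fit on t = 2..N so they share one sample. Because the "both" form nests the other two, its SSR can never be larger, so a real choice still needs a significance test or an information criterion.

```python
import numpy as np


def fit_three_forms(y):
    """Fit the three candidate forms by ordinary least squares.

    All three are fit on t = 2..N so that they share the same sample
    and their residual sums of squares (SSR) are directly comparable.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    t = np.arange(2, n + 1, dtype=float)   # time index T for t = 2..N
    y_t = y[1:]                            # Y(t)
    y_lag = y[:-1]                         # Y(t-1)
    ones = np.ones_like(t)

    designs = {
        "deterministic": np.column_stack([ones, t]),      # W0 + W1*T
        "stochastic": np.column_stack([ones, y_lag]),     # W0 + W1*Y(t-1)
        "both": np.column_stack([ones, t, y_lag]),        # W0 + W1*T + B2*Y(t-1)
    }
    results = {}
    for name, X in designs.items():
        coef, *_ = np.linalg.lstsq(X, y_t, rcond=None)
        resid = y_t - X @ coef
        results[name] = (coef, float(resid @ resid))
    return results


# Hypothetical example: a pure trend plus Gaussian noise.
rng = np.random.default_rng(1)
n = 200
y = 3.0 + 0.5 * np.arange(1, n + 1) + rng.normal(0.0, 1.0, n)
for name, (coef, ssr) in fit_three_forms(y).items():
    print(f"{name:13s} coef={np.round(coef, 3)} SSR={ssr:.1f}")
```

On a series that really is a trend plus noise, the stochastic form's residuals behave like A(t) - A(t-1), so its SSR comes out roughly twice that of the deterministic form.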
STOCHASTIC MODEL

By incorporating differences or lags, one can capture stochastic or adaptive trends in the model.
EXAMPLE OF A DETERMINISTIC MODEL

   Y(t) = W0 + W1*T + A(t)

where T = 1, 2, 3, ..., N
The forecast for time period t+1 is independent of the most recent observation Y(T), save for the fact that the
parameters W0 and W1 are estimated using the Y(T) reading. Even if one recomputed the model coefficients
after each new reading, the impact on the one-period-ahead forecast would be small, since each observation is equally
weighted and no particular importance is placed on recent events or readings. The fact that all N observations
participate in an egalitarian way (aside from weighted least squares) is both the strength and the weakness of this model.
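A minimal sketch, assuming Python with NumPy and a hypothetical noise-free series, of how little the most recent reading moves the deterministic model's forecast. The series, the +10 shock and the name `trend_forecast` are illustrative only. The model is re-estimated after the last reading is shocked, and the resulting change in the one-step-ahead forecast is measured.

```python
import numpy as np


def trend_forecast(y):
    """One-step-ahead forecast from the deterministic model Y(t) = W0 + W1*T."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    T = np.arange(1, n + 1, dtype=float)
    X = np.column_stack([np.ones(n), T])
    (w0, w1), *_ = np.linalg.lstsq(X, y, rcond=None)
    return w0 + w1 * (n + 1)


# Hypothetical series: an exact trend 2 + 0.5*T for T = 1..20.
y = 2.0 + 0.5 * np.arange(1, 21)
base = trend_forecast(y)          # 2 + 0.5*21 = 12.5

# Shock the most recent reading by +10 and re-estimate the model.
y_shocked = y.copy()
y_shocked[-1] += 10.0
shift = trend_forecast(y_shocked) - base
print(base, shift)
```

With N = 20, the +10 shock moves the forecast by about 2, one fifth of the shock. The new reading only matters through re-estimated W0 and W1, and that influence shrinks as N grows.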
EXAMPLE OF A STOCHASTIC MODEL

   Y(t) = W0 + W1*Y(t-1) + A(t)

where T = 1, 2, 3, ..., N
The forecast for time period t+1 depends on the most recent observation Y(T), since Y(T) is used explicitly and
the parameters W0 and W1 are estimated using the Y(T) reading. Even if one recomputed the model coefficients
after each new reading, the impact on the one-period-ahead forecast would still be significant, owing to the explicit dependence
on the previous observation. The fact that all N observations participate in an egalitarian way in the estimation of the
parameters, while the forecast depends on the most recent reading, is both the strength and the weakness of this model.
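For contrast, here is a minimal sketch of the AR(1) forecast under the same kind of shock. It assumes Python with NumPy and a hypothetical noise-free AR(1) path. The helper names `ar1_fit` and `ar1_forecast` are illustrative. W0 and W1 are estimated by regressing Y(t) on Y(t-1), and the coefficients are then held fixed so the explicit dependence on Y(N) is isolated.

```python
import numpy as np


def ar1_fit(y):
    """OLS estimates of W0, W1 in Y(t) = W0 + W1*Y(t-1) + A(t)."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])   # columns [1, Y(t-1)]
    (w0, w1), *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    return w0, w1


def ar1_forecast(y, w0, w1):
    """One-step-ahead forecast: uses the most recent reading Y(N) explicitly."""
    return w0 + w1 * y[-1]


# Hypothetical noise-free AR(1) path: Y(t) = 1 + 0.8*Y(t-1), starting at 0.
y = [0.0]
for _ in range(29):
    y.append(1.0 + 0.8 * y[-1])
y = np.array(y)

w0, w1 = ar1_fit(y)
base = ar1_forecast(y, w0, w1)

# With the coefficients held fixed, a +10 shock to Y(N) moves the
# forecast by W1*10 = 8, however long the series is.
y_shocked = y.copy()
y_shocked[-1] += 10.0
shift = ar1_forecast(y_shocked, w0, w1) - base
print(round(w0, 6), round(w1, 6), round(shift, 6))
```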
AN EXAMPLE OF HOW ONE MIGHT IDENTIFY THE NEED FOR A SIMPLE TREND MODEL

Suppose that the following model is the true model:

   Y(t) = W0 + W1*t + A(t)                                          ...EQUATION 1...

Since EQUATION 1 holds at every t, EQUATION 2 follows (simply backspacing one period):

   Y(t-1) = W0 + W1*(t-1) + A(t-1)                                  ...EQUATION 2...

Subtract EQUATION 2 from EQUATION 1 to get EQUATION 3:

   Y(t) - Y(t-1) = W0 - W0 + W1*t - W1*(t-1) + A(t) - A(t-1)        ...EQUATION 3...

Simplifying, and recognizing that t - (t-1) is unity, we get:

   Y(t) - Y(t-1) = W1 + A(t) - 1*A(t-1)

Writing B for the backshift operator (B*Y(t) = Y(t-1)), this is:

   (1-B)Y(t) = W1 + (1-B)A(t)

i.e. the moving-average form (1-θB)A(t) with θ = 1.
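A quick numerical check of this derivation, as a minimal sketch assuming Python with NumPy and hypothetical parameter values. The script differences a simulated trend-plus-noise series. It then checks that the mean of the differences is close to W1 and that the lag-1 autocorrelation is close to -0.5. That is the value implied by an MA(1) term (1-θB)A(t), whose lag-1 autocorrelation is -θ/(1+θ²), when θ = 1.

```python
import numpy as np

# Hypothetical parameters for the "true" model Y(t) = W0 + W1*t + A(t).
W0, W1, SIGMA, N = 10.0, 0.3, 1.0, 100_000

rng = np.random.default_rng(0)
t = np.arange(1, N + 1)
y = W0 + W1 * t + rng.normal(0.0, SIGMA, N)

dy = np.diff(y)                      # (1-B)Y(t) = W1 + A(t) - A(t-1)
centered = dy - dy.mean()
r1 = (centered[1:] @ centered[:-1]) / (centered @ centered)

# An MA(1) term (1 - theta*B)A(t) has lag-1 autocorrelation -theta/(1 + theta**2);
# with theta = 1 that is -0.5, the signature of a differenced deterministic trend.
print(f"mean of differences (about W1): {dy.mean():.4f}")
print(f"lag-1 autocorrelation (about -0.5): {r1:.4f}")
```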
This is a first-order difference model whose moving-average coefficient equals 1. Thus, if one "identifies"
a stochastic model that requires differencing to make it stationary, and upon estimation the moving-average coefficient is
1, the conclusion is that the stochastic model might be inadequate and should be replaced by a deterministic model. In
practice, one simply lets the alternative approaches vie for supremacy through efficient search procedures,
leading to parsimonious models that can, and often do, include both kinds of structure. AUTOBOX conducts these
numerical tournaments in its development of adequate model forms. The point here was to show how a deterministic
trend model of the form Y(t) = W0 + W1*t + A(t) can be expressed as a particular ARIMA model with a root on the unit
circle (here, in its moving-average part).

After one has adequately described each one of the time series (for example, each genotype's parameter-versus-time curve),
one can take the most complicated or general model and estimate it locally, using each time series independently. The Chow
test, developed in the early 1960s, follows thereafter to test the hypothesis of a common set of coefficients across all groups.
AUTOBOX has extended the Chow test to time series. I am sure that if you have good SAS skills, it shouldn't take too much
time to implement this in SAS. On the other hand, AUTOBOX might be a viable alternative, since these features are already in place.
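A minimal sketch of the classical (cross-sectional) Chow test applied to the genotype question, assuming Python with NumPy and hypothetical data for two genotypes. This is not AUTOBOX's time-series extension: it assumes independent, equal-variance errors with no autocorrelation. Following the advice to use the most general model, every group is fit with a quadratic W0 + W1*t + W2*t**2. A genotype that is really linear simply gets W2 near zero, which is one way to compare a linear curve with a quadratic one. The names `quad_design`, `ssr` and `chow_f` are illustrative.

```python
import numpy as np


def quad_design(t):
    """Most general form considered here: W0 + W1*t + W2*t**2."""
    t = np.asarray(t, dtype=float)
    return np.column_stack([np.ones_like(t), t, t ** 2])


def ssr(X, y):
    """Residual sum of squares of an OLS fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)


def chow_f(groups):
    """Classical Chow F statistic for H0: all groups share one coefficient set.

    groups: list of (t, y) pairs, one pair per genotype.
    Returns (F, df_num, df_den); compare F with an F(df_num, df_den) critical value.
    Assumes independent, homoscedastic errors (no autocorrelation).
    """
    k = quad_design([0.0]).shape[1]          # parameters per group
    g = len(groups)
    n_total = sum(len(y) for _, y in groups)

    # Restricted model: one common curve fitted to the pooled data.
    X_pool = np.vstack([quad_design(t) for t, _ in groups])
    y_pool = np.concatenate([np.asarray(y, dtype=float) for _, y in groups])
    ssr_pooled = ssr(X_pool, y_pool)

    # Unrestricted model: a separate curve for each group.
    ssr_separate = sum(ssr(quad_design(t), np.asarray(y, dtype=float)) for t, y in groups)

    df_num = k * (g - 1)
    df_den = n_total - g * k
    f_stat = ((ssr_pooled - ssr_separate) / df_num) / (ssr_separate / df_den)
    return f_stat, df_num, df_den


# Hypothetical data: two genotypes measured at the same 12 times.
rng = np.random.default_rng(2)
t = np.arange(1, 13, dtype=float)
geno_a = 1.0 + 0.5 * t + rng.normal(0.0, 0.3, t.size)
geno_b = 1.0 + 2.0 * t - 0.05 * t ** 2 + rng.normal(0.0, 0.3, t.size)

print(chow_f([(t, geno_a), (t, geno_b)]))
```

Running the test on pairs of genotypes is one way to see which curves differ, but with five genotypes the resulting p-values should be adjusted for multiple comparisons.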
Note that intervention detection can lead to models like

   Y = W0 + W1*T1 + B2*T2

where

   T1 = 1, 2, 3, ..., T
   T2 = 0, 0, 0, 1, 2, 3, ..., T-3

or, equivalently (B being the backshift operator, so that 1/(1-B) accumulates a series),

   Y = W0 + W1*S1/(1-B) + B2*S2/(1-B)

where

   S1 = 1, 1, 1, 1, 1, ...   (1 for all t)
   S2 = 0, 0, 0, 1, 1, ...   (1 for all t after the third period)

since a STEP is equal to the first differences of a TREND, and a PULSE is equal to the first differences of a STEP.
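A minimal sketch of these intervention variables, assuming Python with NumPy. The series length N = 10, the break after the third period and the pulse variable P2 are hypothetical choices for illustration. The sketch builds T1, T2, S1 and S2 as listed and checks the identities. The first difference (1-B) of a trend is a step, the first difference of a step is a pulse, and the running sum 1/(1-B) turns a step back into a trend.

```python
import numpy as np

N, BREAK = 10, 3          # hypothetical: T2 and S2 are 0 for the first 3 periods
t = np.arange(1, N + 1)

T1 = t.astype(float)                          # 1, 2, 3, ..., N
T2 = np.maximum(0, t - BREAK).astype(float)   # 0, 0, 0, 1, 2, ..., N-3
S1 = np.ones(N)                               # 1 for all t
S2 = (t > BREAK).astype(float)                # 0, 0, 0, 1, 1, ..., 1
P2 = (t == BREAK + 1).astype(float)           # pulse at the first period of the step


def first_difference(x):
    """(1-B)x, taking the pre-sample value as 0."""
    return np.diff(x, prepend=0.0)


def summation(x):
    """1/(1-B) applied to x: the running sum, the inverse of (1-B)."""
    return np.cumsum(x)


print(first_difference(T2))   # equals S2: a step is the first difference of a trend
print(first_difference(S2))   # equals P2: a pulse is the first difference of a step
print(summation(S2))          # equals T2: summing a step gives a trend
```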
The presence of local trends is often overlooked in time series models. If you enable this feature, AUTOBOX
will test for a variety of TIME variables and identify the optimal point where each trend starts.
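As a rough illustration of locating where a second trend starts, here is a minimal sketch using a plain grid search. It assumes Python with NumPy and a hypothetical series. This is a swapped-in technique, not AUTOBOX's procedure, which the answer does not describe. Each candidate start b is tried in Y = W0 + W1*T1 + B2*T2 with T2 = max(0, T - b), and the b with the smallest residual sum of squares is kept. The sketch ignores autocorrelated errors. Judging whether the second trend is significant would need a test that accounts for the search over b. The names `fit_break` and `best_break` and the `min_seg` parameter are illustrative.

```python
import numpy as np


def fit_break(y, b):
    """SSR and coefficients of Y = W0 + W1*T1 + B2*T2, with T2 = max(0, T - b)."""
    n = len(y)
    T1 = np.arange(1, n + 1, dtype=float)
    T2 = np.maximum(0.0, T1 - b)
    X = np.column_stack([np.ones(n), T1, T2])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid), coef


def best_break(y, min_seg=3):
    """Grid search: try each candidate start of the second trend and keep
    the one with the smallest residual sum of squares."""
    n = len(y)
    candidates = range(min_seg, n - min_seg)
    return min(candidates, key=lambda b: fit_break(y, b)[0])


# Hypothetical series: slope 0.2 up to T = 30, then slope 1.2 afterwards.
rng = np.random.default_rng(3)
n = 60
T = np.arange(1, n + 1, dtype=float)
y = 5.0 + 0.2 * T + 1.0 * np.maximum(0.0, T - 30) + rng.normal(0.0, 0.2, n)

b = best_break(y)
print(b, np.round(fit_break(y, b)[1], 2))
```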