DEN Discussion List Archive


RE: Prediction and regression



Hello Deninzens. I'm very interested in this subject of regression
analysis and control charts. I have always wondered what the probability is
of concluding that a system with a linear trend is unstable when in fact it
is stable. To approach an answer to this question, I did a sort of
Shewhartian experiment with regressions. Let me explain it a little.

1) I simulated a straight line and added a certain level of noise to it,
that is, I used the equation: Yt = b1 * t + b0 + e(t)

where b1 is the slope, t is time, b0 is the Y intercept and e(t) is a
normal random variable with mean zero and a chosen standard deviation.

2) With the above equation, I first generated sample points with a small
amount of variation. By a small amount of variation I mean that the Mean
Absolute Percentage Error (MAPE) was around 5%. The MAPE is the average of
the absolute ratios |et/Yt|, expressed as a percentage, where et is the
residual from the fitted linear model and Yt is the data point for period
t. Note that a 5% MAPE is roughly equivalent to an R2 (coefficient of
determination) of 95%. So with a 5% MAPE I simulated samples of k=5, 10,
15, 20 and 25 data points.

3) For each sample size in step 2, I used regression calculations to
estimate the line equation and the residuals. Next I constructed two control
charts: one structural control chart for the linear data points, and a
second control chart for the residuals. For these calculations, I used
individuals and moving range formulas.

4) Note that under perfect circumstances, all these charts should lead me to
conclude that the system is stable. But the interesting thing is that
sometimes, as expected, they did not lead me to this conclusion. That is,
sometimes I was making Shewhart's mistake of treating a common cause as a
special cause. So I repeated this experiment about 15 times for EACH sample
size k, to obtain an approximation of the probability of this mistake and
to relate this probability to the amount of noise in the system.

5) I then repeated everything I mentioned in steps 2 to 4, but this time
with a higher amount of noise in the random term of the equation, that is,
a MAPE of approximately 30%.
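
For anyone who wants to try steps 1 to 5, here is a minimal Python sketch of
the experiment. It is an illustration under stated assumptions, not the exact
procedure used above: it relies on NumPy, the function names and the default
slope and intercept are hypothetical, it charts only the residuals (not the
structural chart), and it counts a false alarm only when a point falls
outside the individuals-chart limits (mean plus or minus 2.66 times the
average moving range). It also uses many more repetitions than the 15
mentioned in step 4.

```python
import numpy as np

def simulate_line(k, b1, b0, sigma, rng):
    """Simulate Yt = b1*t + b0 + e(t), e(t) ~ Normal(0, sigma), t = 1..k."""
    t = np.arange(1, k + 1, dtype=float)
    return t, b1 * t + b0 + rng.normal(0.0, sigma, size=k)

def fit_line(t, y):
    """Least-squares straight line; returns fitted values and residuals."""
    slope, intercept = np.polyfit(t, y, 1)
    fitted = slope * t + intercept
    return fitted, y - fitted

def mape(y, resid):
    """Mean absolute percentage error, in percent."""
    return 100.0 * np.mean(np.abs(resid / y))

def residual_chart_signals(resid):
    """Individuals chart on the residuals.

    Limits are center +/- 2.66 * (average moving range). Returns True if
    any residual falls outside the limits. Assumption: no run rules.
    """
    mr_bar = np.mean(np.abs(np.diff(resid)))
    center = np.mean(resid)
    ucl = center + 2.66 * mr_bar
    lcl = center - 2.66 * mr_bar
    return bool(np.any((resid > ucl) | (resid < lcl)))

def false_alarm_rate(k, sigma, n_rep=1000, b1=2.0, b0=100.0, seed=0):
    """Fraction of stable simulated series that the chart flags as unstable.

    b1, b0 and n_rep are illustrative defaults, not values from the post.
    """
    rng = np.random.default_rng(seed)
    alarms = 0
    for _ in range(n_rep):
        t, y = simulate_line(k, b1, b0, sigma, rng)
        _, resid = fit_line(t, y)
        alarms += residual_chart_signals(resid)
    return alarms / n_rep

if __name__ == "__main__":
    # sigma values are arbitrary "low" and "high" noise levels; check the
    # resulting MAPE with mape() and adjust sigma to hit 5% or 30%.
    for sigma in (5.0, 30.0):
        for k in (5, 10, 15, 20, 25):
            rate = false_alarm_rate(k, sigma)
            print(f"sigma={sigma:5.1f}  k={k:2d}  false alarm rate={rate:.3f}")
```

The noise level here is set through sigma rather than through the MAPE
directly, so sigma has to be tuned until mape() reports the level of interest.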

What I have concluded so far is the following:

a) With a small amount of noise (MAPE on the order of 5%) and with k=15 or
higher, the control charts work well. That is, most of the time (approx.
90%) you will conclude that the system is stable, when in fact it is. About
10% of the time, you will conclude that the system is unstable, when in
fact it is stable. You can even work with smaller sample sizes, and still
reach appropriate conclusions about 85% of the time.

b) With a large amount of noise (MAPE on the order of 20 to 30%), you do
have to work with k=20 or ideally more, or else you will often conclude the
system is unstable, when in fact it is stable.

Next I did everything again, but this time I added seasonality to the
linear trend. The conclusions are very similar, but they favor sample sizes
of k=20 or higher, even with small levels of noise. This of course is due to
the fact that one is losing degrees of freedom, as one has to estimate the
different components of the regression equation. Unfortunately I did this
experiment with classical decomposition, so I still have to repeat it using
regression with dummy variables to model the seasonality.
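
As a rough idea of what the dummy-variable version might look like, here is a
small Python sketch using NumPy. Everything in it is an assumption made for
illustration: the function names are hypothetical, the seasonal period is a
parameter chosen by the user, and the first season is taken as the baseline.
It also returns the residual degrees of freedom, which shows how each extra
seasonal term uses up one data point.

```python
import numpy as np

def seasonal_design_matrix(k, period):
    """Columns: intercept, time t = 1..k, and (period - 1) seasonal dummies.

    Season 0 is the baseline, so it has no dummy column.
    """
    t = np.arange(1, k + 1, dtype=float)
    season = np.arange(k) % period
    X = np.zeros((k, 2 + period - 1))
    X[:, 0] = 1.0
    X[:, 1] = t
    for s in range(1, period):
        X[:, 1 + s] = (season == s).astype(float)
    return X

def fit_trend_with_dummies(y, period):
    """Least-squares fit of trend plus seasonal dummies.

    Returns the coefficients, the residuals, and the residual degrees of
    freedom (number of points minus number of estimated coefficients).
    """
    y = np.asarray(y, dtype=float)
    X = seasonal_design_matrix(len(y), period)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    dof = len(y) - X.shape[1]
    return coef, resid, dof

if __name__ == "__main__":
    # Illustrative data: linear trend plus a quarterly pattern plus noise.
    rng = np.random.default_rng(1)
    k, period = 20, 4
    t = np.arange(1, k + 1)
    pattern = np.array([0.0, 3.0, -1.0, 5.0])
    y = 10.0 + 2.0 * t + pattern[(t - 1) % period] + rng.normal(0, 1.0, k)
    coef, resid, dof = fit_trend_with_dummies(y, period)
    print("coefficients:", np.round(coef, 2))
    print("residual degrees of freedom:", dof)
```

The residuals returned here could then be charted with the same individuals
and moving range formulas as in the trend-only case.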

I hope this is useful; your comments and observations are appreciated. Best
regards, 

Carlos Méndez



