DEN Discussion List Archive


Re: Random Numbers



Ian,

The fact that "random" number generators aren't really
random is a good illustration of Deming's observation
that "there is no true value of any quantity."  No
process out there produces truly random results. We
can only hope to find something random enough for our
particular purposes.

Given any method of generating "random" numbers we
know, if you generate a humongous number of numbers,
you will find definite patterns. This is especially
true of computer-generated numbers. Most
computer-generated "random" number sequences are
actually periodic; the entire sequence will repeat
itself after some millions or billions of numbers have
been generated.
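
To make the periodicity concrete, here is a minimal Python sketch.
It is my own toy illustration, not code from any product mentioned
in this thread: the function name cycle_length and the tiny modulus
31 are assumptions chosen so the whole cycle can be counted by hand.
Real generators work the same way, only with far larger moduli.

```python
def cycle_length(multiplier, modulus, seed=1):
    """Count how many values a multiplicative congruential generator
    x -> (multiplier * x) % modulus produces before returning to the seed.

    Assumes modulus is prime, multiplier is not a multiple of it, and
    0 < seed < modulus, so the sequence always comes back to the seed.
    """
    state = (multiplier * seed) % modulus
    count = 1
    while state != seed:
        state = (multiplier * state) % modulus
        count += 1
    return count


# Toy parameters for illustration only (not from any real product).
print(cycle_length(3, 31))  # cycles through all 30 nonzero values
print(cycle_length(5, 31))  # a poor multiplier: repeats after only 3 values
```

The point of the second call is that the choice of multiplier
matters as much as the size of the modulus: with the same modulus,
one multiplier visits every possible value and another repeats
almost immediately.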

For some applications, this matters. If you are doing
the kind of high-precision work that requires atomic
clocks, or you're sampling data from a frame that
numbers in the trillions - and there are such datasets -
you will need to be very, very careful about what kind
of random number generator you are using, because some
computer random-number generators out there just
aren't random enough when they start dealing with
numbers that big. The "pseudo" in "pseudorandom"
really matters in these cases.

The pseudorandom number generators used in higher-end
statistical software have had their randomness
properties analyzed and tested fairly extensively as
part of their quality assurance process.  For example,
I often use the random number generator found in SAS
software running on a PC. This product uses a prime
modulus multiplicative generator with modulus (2**31 -
1) and multiplier 397204094. An assessment of the
properties of this class of random number generators
was published in Fishman, G.S. and Moore, L.R. (1982),
"A Statistical Evaluation of Multiplicative
Congruential Generators with Modulus (2**31 - 1),"
Journal of the American Statistical Association, 77,
129-136. The effectiveness of this generator is limited
by the capacity of a PC: the number of bits in the
generator (the 31 in the 2**31 -1) is limited by the
fact that PCs have a CPU that processes only 32 bits
at a time. More powerful computers can process 64 or
even more bits at a time. There are random number
generators out there that use this extra power and
will take much, much longer to repeat themselves. And
there are people out there who need the extra
randomness.
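
For readers who want to see what a prime modulus multiplicative
generator looks like, here is a rough Python sketch using the
modulus and multiplier quoted above. The function names are my own,
and I am not claiming this reproduces the SAS implementation or its
output stream (seeding and scaling details may well differ); it
only shows the recurrence itself.

```python
MODULUS = 2**31 - 1        # prime modulus quoted above
MULTIPLIER = 397204094     # multiplier quoted above


def make_generator(seed):
    """Return a function that yields successive pseudorandom values
    in the open interval (0, 1).

    Hypothetical helper for illustration; the seed must satisfy
    0 < seed < MODULUS so the state can never become zero.
    """
    if not 0 < seed < MODULUS:
        raise ValueError("seed must be between 1 and MODULUS - 1")
    state = seed

    def next_uniform():
        nonlocal state
        # The whole generator is this one multiply-and-reduce step.
        state = (MULTIPLIER * state) % MODULUS
        return state / MODULUS

    return next_uniform


draw = make_generator(12345)
print([draw() for _ in range(3)])
```

Because the state is always one of the integers from 1 to
2**31 - 2, the sequence cannot run longer than 2**31 - 2 values
(a little over two billion) before it repeats, whatever the seed.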

Statisticians have found a number of deficiencies in
previous versions of the Microsoft Excel statistical
functions including its random number generator. These
were reported in Knusel, L. (1998), "On the accuracy
of statistical distributions in Microsoft Excel 97,"
Computational Statistics and Data Analysis, 26,
375-377; and McCullough, B.D. and Wilson, B. (1999),
"On the accuracy of statistical procedures in Microsoft
Excel 97," Computational Statistics and Data Analysis,
31, 27-37. A summary of the problems can be found
on-line at

http://www.agresearch.co.nz/Science/Statistics/exceluse1.htm

Microsoft's tech support web site acknowledges the
problems and reports that they continued through Excel
2002, but says that Excel 2003 provides an improved
random number generator addressing the deficiencies.

http://support.microsoft.com/default.aspx?kbid=828888&product=xl2003

The support article states that "the chance of a
serious practical affect on your random data by the
random number generator in Excel 2002 and earlier is
minimal. For example, you must have a lengthy sequence
of random numbers (such as 1 million) before the
repetitive behavior would have a serious affect on
your results." 

In order to assess whether such a random number
generator is random enough for a particular use, it is
critical to understand the properties of the generator
and the needs of the use. Statisticians performing
simulation and other number-intensive work often need
to generate pseudorandom sequences in the millions or
more, so problems appearing with a million numbers
would be unacceptable for a much larger class of
people than problems appearing with sequences in the
billions.

It's certainly true that most applications in the
quality field use numbers so much smaller than these
limits that the level of randomness involved is more
than adequate for the purpose. A statistical control
chart generally samples from frames that number from
the dozens to perhaps the thousands. If your numbers
can fit in an Excel spreadsheet, this level of
pseudorandomness may not be a problem. 

Although Microsoft now reports that its random-number
generator has improved, it may be some time before
statisticians like me will venture to use Excel for
their most sensitive work. Microsoft's random number
generator performance was considerably worse than many
methods that had been out there for many years, and
remained so for a long time. Even if it makes no
difference for a particular use and even if everything
has been fixed, there will be statisticians who might
wonder why they should take a chance on a product with
a history of reliability problems, when there are many
alternatives out there with a more proven track record
of quality.

I believe this discussion may be a useful illustration
of Deming's emphasis on the importance of method in
taking any measurement. It is as important to identify
the method used to generate a "random" number, and
understand whether its properties and limitations are
suitable for one's purpose, as it is to identify and
understand the method used to perform any other type
of observation or measurement. And quality and
reliability considerations play a role in products of
all descriptions, even random number generators. 

Jonathan Siegel



