RUNNING_STATS
The RUNNING_STATS function computes the mean and unbiased sample variance of an array without overflow. It can also combine previously computed statistics with new data. This lets you compute the mean and variance of data sets that are too large to fit into memory by processing them in chunks.
RUNNING_STATS uses Welford's "online" algorithm to compute the running mean and variance in a single pass through the data. The algorithm is numerically stable because it updates the mean and the sum of squared deviations incrementally, which avoids the cancellation error of the naive sum-of-squares formula. The routine is also significantly faster than the VARIANCE function and, unlike VARIANCE, does not require any additional memory.
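To illustrate the single-pass update described above, here is a minimal Python sketch of Welford's algorithm. It is not IDL's implementation. The function name `welford_stats` is hypothetical. Returning NaN statistics for empty input and a NaN variance for a single sample are assumptions of this sketch, not documented RUNNING_STATS behavior.

```python
import math


def welford_stats(values):
    """Return [mean, unbiased sample variance, count] using Welford's update.

    Illustrative sketch only; the [mean, variance, count] layout mirrors
    RUNNING_STATS, but this is not IDL's implementation.
    """
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        count += 1
        delta = x - mean          # deviation from the old mean
        mean += delta / count     # incremental mean update
        m2 += delta * (x - mean)  # old deviation times new deviation
    if count == 0:
        return [math.nan, math.nan, 0.0]  # assumption: no data gives NaN stats
    # Assumption: a single sample has an undefined (NaN) unbiased variance.
    variance = m2 / (count - 1) if count > 1 else math.nan
    return [mean, variance, float(count)]


print(welford_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

For the values 1 through 10 this yields a mean of 5.5, a variance of 82.5/9 (about 9.1667), and a count of 10.0. Only three scalars are kept between elements, so memory use does not grow with the data.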
Examples
; Define a vector of sample data:
IDL> A = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
; Compute the [mean, variance, count]:
IDL> result = RUNNING_STATS(A)
IDL> result
IDL prints:
5.5000000000000000 9.1666666666666661 10.000000000000000
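As a check on the printed values, the unbiased sample variance divides the sum of squared deviations by n-1 rather than n. For the ten values above, the deviations from the mean are ±0.5, ±1.5, ±2.5, ±3.5, and ±4.5:

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{55}{10} = 5.5
\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2
    = \frac{2\,(0.25 + 2.25 + 6.25 + 12.25 + 20.25)}{9}
    = \frac{82.5}{9} \approx 9.1666667
```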
Syntax
Result = RUNNING_STATS( X [, /NAN] [, PREVIOUS=value] [, TPOOL_MAX_ELTS=value] [, TPOOL_MIN_ELTS=value] [, /TPOOL_NOTHREAD] )
Return Value
Returns a three-element double-precision array, [mean, variance, count], containing the mean, the unbiased sample variance, and the number of elements of X used in the calculation.
Arguments
X
The array to be processed. This array can be any numeric type other than complex or double complex.
Keywords
NAN
Set this keyword to cause the routine to check for occurrences of the IEEE floating-point values NaN or Infinity in the input data. Elements with the value NaN or Infinity are treated as missing data.
Note: Because NaN and Infinity values are treated as missing data, if you set /NAN and X contains only NaN or Infinity values, the routine returns NaN for the mean and variance and zero for the count.
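The missing-data behavior of /NAN can be sketched in Python as follows. Non-finite elements are simply skipped, and an all-missing input produces [NaN, NaN, 0]. The function name `stats_ignoring_nonfinite` is hypothetical. The NaN variance for a single finite sample is an assumption of this sketch.

```python
import math


def stats_ignoring_nonfinite(values):
    """Mean, unbiased sample variance, and count, skipping NaN and +/-Inf.

    Illustrative sketch of /NAN-style handling (not IDL's code): non-finite
    elements are treated as missing. If nothing finite remains, the result
    is [NaN, NaN, 0.0], matching the documented RUNNING_STATS behavior.
    """
    count = 0
    mean = 0.0
    m2 = 0.0
    for x in values:
        if not math.isfinite(x):
            continue  # treat NaN and Infinity as missing data
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    if count == 0:
        return [math.nan, math.nan, 0.0]
    # Assumption: a single finite sample gives a NaN variance in this sketch.
    variance = m2 / (count - 1) if count > 1 else math.nan
    return [mean, variance, float(count)]


print(stats_ignoring_nonfinite([1.0, float("nan"), 3.0, float("inf"), 5.0]))
```

Here only 1, 3, and 5 contribute, so the mean is 3.0, the variance is 4.0, and the count is 3.0.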
PREVIOUS
Set this keyword to a three-element array, [mean, variance, count], from a previous calculation, such as the result of an earlier call to RUNNING_STATS. These values are combined with the statistics computed from X, so the result describes the previous data and X together. If this keyword is omitted or is set to [0, 0, 0], a new calculation is started.
Tip: See Additional Examples below for examples of chaining together multiple calls to RUNNING_STATS using the PREVIOUS keyword.
Note: If the count from a previous calculation is zero, then a new calculation is started, regardless of the mean or variance values.
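This page does not describe how PREVIOUS is merged internally. One standard way to combine two [mean, variance, count] triples is the pairwise update of Chan, Golub, and LeVeque, sketched below in Python. The function name `combine_stats` and its (new, previous) argument order are hypothetical. Because the variances are unbiased sample variances, each is converted back to a sum of squared deviations (variance × (count - 1)) before merging.

```python
import math


def combine_stats(new, previous):
    """Merge two [mean, variance, count] triples (pairwise update of Chan et al.).

    Hypothetical sketch; not IDL's implementation. Variances are unbiased
    sample variances, so each is converted to a sum of squared deviations.
    """
    mean_b, var_b, n_b = new
    mean_a, var_a, n_a = previous
    if n_a == 0:  # zero previous count: start a new calculation
        return [float(mean_b), float(var_b), float(n_b)]
    if n_b == 0:  # nothing new to add
        return [float(mean_a), float(var_a), float(n_a)]
    n = n_a + n_b
    delta = mean_b - mean_a
    m2_a = var_a * (n_a - 1) if n_a > 1 else 0.0
    m2_b = var_b * (n_b - 1) if n_b > 1 else 0.0
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n  # cross-term between sets
    variance = m2 / (n - 1) if n > 1 else math.nan
    return [float(mean), float(variance), float(n)]


# Stats of 6..10 merged with the stats of 1..5:
print(combine_stats([8.0, 2.5, 5.0], [3.0, 2.5, 5.0]))
```

Merging the statistics of 1 to 5 ([3.0, 2.5, 5.0]) with those of 6 to 10 ([8.0, 2.5, 5.0]) gives a mean of 5.5 and a variance of 82.5/9, the same as processing all ten values at once.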
Thread Pool Keywords
This routine is written to make use of IDL’s thread pool, which can increase execution speed on systems with multiple CPUs. The values stored in the !CPU system variable control whether IDL uses the thread pool for a given computation. In addition, you can use the thread pool keywords TPOOL_MAX_ELTS, TPOOL_MIN_ELTS, and TPOOL_NOTHREAD to override the defaults established by !CPU for a single invocation of this routine. See Thread Pool Keywords for details.
When computing the statistics for a large number of values, the results depend on the order in which the numbers are combined. Because the thread pool combines values in a different order, you may obtain a slightly different, but equally correct, result than the one produced by the standard non-threaded implementation. This effect occurs because RUNNING_STATS uses floating-point arithmetic, and the mantissa of a floating-point value has a fixed number of significant digits. For more information on floating-point numbers, see Accuracy and Floating Point Operations.
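The order dependence can be demonstrated with a Python sketch that processes the same data serially and in chunks, merging the per-chunk results the way a thread pool might. The chunked reduction in `chunked` is a hypothetical stand-in, not a description of how IDL partitions work among threads. The two results agree to within rounding error, but their last few bits may differ.

```python
import math
import random


def welford(values):
    """Return (count, mean, m2) for one chunk; m2 is the sum of squared deviations."""
    count, mean, m2 = 0, 0.0, 0.0
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    return count, mean, m2


def merge(a, b):
    """Combine two (count, mean, m2) partial results (pairwise update of Chan et al.)."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    if n_a == 0:
        return b
    if n_b == 0:
        return a
    n = n_a + n_b
    delta = mean_b - mean_a
    return (n, mean_a + delta * n_b / n, m2_a + m2_b + delta * delta * n_a * n_b / n)


def chunked(values, n_chunks):
    """Hypothetical stand-in for a thread pool: reduce each chunk, then merge."""
    size = max(1, math.ceil(len(values) / n_chunks))
    total = (0, 0.0, 0.0)
    for start in range(0, len(values), size):
        total = merge(total, welford(values[start:start + size]))
    return total


rng = random.Random(42)
data = [rng.uniform(0.0, 1.0) for _ in range(100_000)]

serial = welford(data)
parallel = chunked(data, 8)
var_serial = serial[2] / (serial[0] - 1)
var_parallel = parallel[2] / (parallel[0] - 1)
# The two orderings agree to within floating-point rounding; the
# differences printed here are tiny (possibly zero).
print(abs(serial[1] - parallel[1]), abs(var_serial - var_parallel))
```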
Additional Examples
IDL> A = [1, 2, 3, 4, 5]
IDL> B = [6, 7, 8, 9, 10]
; First compute the stats for the combined array:
IDL> RUNNING_STATS([A, B])
; 5.5000000000000000 9.1666666666666661 10.000000000000000
; Now compute the stats of just A and then combine with B using PREVIOUS keyword
IDL> Stats_of_A = RUNNING_STATS(A)
IDL> Stats_of_A
; 3.0000000000000000 2.5000000000000000 5.0000000000000000
IDL> RUNNING_STATS(B, PREVIOUS = Stats_of_A)
; 5.5000000000000000 9.1666666666666661 10.000000000000000
; Use the PREVIOUS keyword to compute stats on a data set too large to hold in memory, one chunk at a time:
IDL> stats = [0, 0, 0]
IDL> for i=0,99 do stats = RUNNING_STATS(randomu(seed, 1e7), PREVIOUS=stats)
IDL> stats
IDL prints:
0.50000184809149439 0.083333037727096743 1000000000.0000000
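The chunked loop above can be mirrored in Python. The chunks come from a generator, so the full data set never exists in memory at once. `running_stats` and `random_chunks` are hypothetical names for illustration, and the chunks are much smaller than the IDL example (100 chunks of 1,000 values) so the sketch runs quickly. As with PREVIOUS, starting from [0, 0, 0] begins a new calculation.

```python
import math
import random


def chunk_stats(chunk):
    """[mean, unbiased variance, count] of one chunk via Welford's update."""
    count, mean, m2 = 0, 0.0, 0.0
    for x in chunk:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    variance = m2 / (count - 1) if count > 1 else math.nan
    return [mean, variance, float(count)]


def running_stats(chunk, previous=(0.0, 0.0, 0.0)):
    """Hypothetical Python analogue of RUNNING_STATS(X, PREVIOUS=previous)."""
    mean_b, var_b, n_b = chunk_stats(chunk)
    mean_a, var_a, n_a = previous
    if n_a == 0:  # a zero previous count starts a new calculation
        return [mean_b, var_b, n_b]
    if n_b == 0:  # empty chunk: keep the previous result
        return [float(v) for v in previous]
    n = n_a + n_b
    delta = mean_b - mean_a
    m2 = (var_a * (n_a - 1) if n_a > 1 else 0.0) + (var_b * (n_b - 1) if n_b > 1 else 0.0)
    m2 += delta * delta * n_a * n_b / n
    return [mean_a + delta * n_b / n, m2 / (n - 1), n]


def random_chunks(n_chunks, chunk_size, seed=0):
    """Yield chunks one at a time so the full data set is never in memory."""
    rng = random.Random(seed)
    for _ in range(n_chunks):
        yield [rng.random() for _ in range(chunk_size)]


stats = [0.0, 0.0, 0.0]
for chunk in random_chunks(100, 1_000):
    stats = running_stats(chunk, previous=stats)
print(stats)  # mean near 0.5, variance near 1/12, count 100000.0
```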
Version History
8.8.3 | Introduced