Two-pass Algorithm
An alternate approach, using a different formula for the variance, first computes the sample mean,
- ,
and then computes the sum of the squares of the differences from the mean,
- ,
as given by the following pseudocode:
def two_pass_variance(data): n = 0 sum1 = 0 sum2 = 0 for x in data: n = n + 1 sum1 = sum1 + x mean = sum1/n for x in data: sum2 = sum2 + (x - mean)*(x - mean) variance = sum2/(n - 1) return varianceThis algorithm is often more numerically reliable than the naïve algorithm for large sets of data, although it can be worse if much of the data is very close to but not precisely equal to the mean and some are quite far away from it.
The results of both of these simple algorithms (I and II) can depend inordinately on the ordering of the data and can give poor results for very large data sets due to repeated roundoff error in the accumulation of the sums. Techniques such as compensated summation can be used to combat this error to a degree.
Read more about this topic: Algorithms For Calculating Variance