Convergence Tests, RTOL and ATOL

Tolerances are usually specified as either a relative tolerance RTOL or an absolute tolerance ATOL, or both. The user typically desires that

| True value – Computed value | < RTOL*|True Value| + ATOL (Eq.1)

where RTOL controls the number of significant figures in the computed value (a float or a double), and a small ATOL is just a “safety net” for the case where the True Value is close to zero. (What would happen if ATOL = 0 and True Value = 0? Would the convergence test ever be satisfied?) You should write your programs to take both RTOL and ATOL as inputs.
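As a minimal sketch of the test in Eq.1, assuming Python and a hypothetical helper named within_tolerance (it is not part of any library), the check might look like this. It is only usable when the true value is actually known, for example when validating a program on a problem with an analytic answer.

```python
def within_tolerance(true_value, computed_value, rtol, atol):
    """Return True if Eq.1 holds: |true - computed| < rtol*|true| + atol."""
    return abs(true_value - computed_value) < rtol * abs(true_value) + atol


# A value good to about 6 significant figures passes with rtol = 1e-5.
print(within_tolerance(3.14159265, 3.14159, rtol=1e-5, atol=1e-12))

# When the true value is zero, rtol contributes nothing to the right-hand
# side, so atol alone decides the outcome.
print(within_tolerance(0.0, 1e-9, rtol=1e-5, atol=1e-8))
```

Taking both tolerances as arguments, rather than hard-coding either one, follows the recommendation above.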

Sometimes programs are (foolishly) written using only one tolerance, e.g. only an ATOL. The problem with this is that the floating point answer is only accurate to a certain number of significant figures, and if ATOL is too small it may be impossible to satisfy the convergence test. If you run into this, you may want to add RTOL to the program, but sometimes this is difficult. If your problem requires accuracy to a specified RTOL, but the program you are using can only take an ATOL, here is the solution:

1) run the calculation with some guess at ATOL.

2) if ATOL/|calculated value| < RTOL, you are done.

3) otherwise, set ATOL ~ RTOL * |calculated value| from step 1 and re-run.
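The three steps above can be sketched in Python as follows. Everything named here is an assumption made for illustration: solve_to_rtol is a hypothetical wrapper, bisect_sqrt is a stand-in for "the program you are using" that accepts only an ATOL, and the safety factor of 0.5 in step 3 is one possible choice so that the check in step 2 passes on the re-run.

```python
def bisect_sqrt(atol, target=2.0e6):
    """Stand-in ATOL-only program: sqrt(target) by bisection on [0, target]."""
    lo, hi = 0.0, target
    while hi - lo > atol:
        mid = 0.5 * (lo + hi)
        if mid * mid < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def solve_to_rtol(solver, rtol, atol_guess, max_reruns=5):
    """Re-run an ATOL-only solver until atol/|value| < rtol."""
    atol = atol_guess
    for _ in range(max_reruns):
        value = solver(atol)              # step 1: run with the current ATOL
        if value == 0.0:
            raise ValueError("computed value is zero; RTOL cannot be applied")
        if atol / abs(value) < rtol:      # step 2: ATOL is tight enough
            return value, atol
        # step 3: ATOL ~ RTOL * |value|; the 0.5 is an assumed safety factor
        atol = 0.5 * rtol * abs(value)
    raise RuntimeError("ATOL did not settle within max_reruns")


value, atol_used = solve_to_rtol(bisect_sqrt, rtol=1e-6, atol_guess=10.0)
print(value, atol_used)
```

With a poor first guess such as ATOL = 10, the first run fails the check in step 2 and the second run, with the rescaled ATOL, passes it.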

Eq.1 is not usually useful in practice, since one normally has no way of knowing or computing the True Value exactly (if there were, we wouldn’t be computing it numerically!). Instead we normally use this convergence test:

| Best Calculated Answer – Next Best Calculated Answer | < RTOL*|Best Calculated Answer| + ATOL (Eq. 2)

Usually (but with no guarantee), if Eqn. (2) is satisfied, Eqn. (1) will also be satisfied. This will be so if the calculation is converging sufficiently rapidly. For example:

The numerical integral computed using Simpson's rule converges rapidly as the number of intervals N increases; in fact, one can prove that for a sufficiently smooth integrand and large N the upper bound on the error drops as 1/N⁴. Suppose we do two Simpson’s rule calculations of an integral, one with N intervals and one with 2N intervals, so we have Integral(N) and Integral(2N). Both calculations are in error:

Error(N) = Integral(N) - true integral Eq.(3)

Error(2N) = Integral(2N) - true integral Eq.(4)

Because of the rapid convergence of Simpson’s rule, we expect Error(2N) is much smaller than Error(N), so

|Error(2N)| < |Error(N)-Error(2N)| =|Integral(N)-Integral(2N)| Eq.(5)

So if we satisfy convergence test 2:

|Integral(N)-Integral(2N)| < RTOL*|Integral(2N)| + ATOL Eq.(6)

we can be sure that

|Error(2N)| < RTOL*|Integral(2N)| + ATOL Eq.(7)

and so by Eq. (4)

|Error(2N)| < RTOL*|True Integral + Error(2N)| + ATOL Eq.(8)

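and, using the triangle inequality |True Integral + Error(2N)| ≤ |True Integral| + |Error(2N)| and moving the resulting RTOL*|Error(2N)| term from the right to the left-hand side:

|True value – Computed Value| < (RTOL/(1-RTOL))*|True value| + ATOL/(1-RTOL) Eq.(10)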

Since typically RTOL << 1, the denominators are very close to 1, and Eq.(10) is essentially the same as Eq.(1). So Integral(2N) is acceptably close to the true value of the integral, without our ever knowing exactly what the true value is.
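Here is a minimal sketch of the interval-doubling procedure with the test of Eq.(6), assuming Python. The function names simpson and integrate, the starting value n = 2, and the cap max_n are illustrative choices, not a prescribed implementation.

```python
import math


def simpson(f, a, b, n):
    """Composite Simpson's rule on [a, b] with n intervals (n must be even)."""
    if n < 2 or n % 2:
        raise ValueError("n must be a positive even integer")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # interior points alternate between weights 4 (odd i) and 2 (even i)
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def integrate(f, a, b, rtol, atol, n=2, max_n=2**20):
    """Double n until |I(n) - I(2n)| < rtol*|I(2n)| + atol, as in Eq.(6)."""
    coarse = simpson(f, a, b, n)
    while n < max_n:
        fine = simpson(f, a, b, 2 * n)
        if abs(coarse - fine) < rtol * abs(fine) + atol:
            return fine, 2 * n            # report the better of the two
        coarse, n = fine, 2 * n
    raise RuntimeError("convergence test not satisfied before max_n")


# Integral of sin(x) from 0 to pi; the exact answer is 2.
value, n_used = integrate(math.sin, 0.0, math.pi, rtol=1e-8, atol=1e-12)
print(value, n_used)
```

Because the exact answer is known for this integrand, you can compare the returned value against 2 and confirm that Eq.(1) is satisfied as well.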

The key assumption in this derivation is “…we expect Error(2N) is much smaller than Error(N)…”. Upper bounds on the errors for very large N are known for most numerical methods, but one normally does not know how large “very large” is. Also, it is possible that a particular small N-interval calculation might ‘accidentally’ be more accurate than a larger 2N-interval calculation, so the assumption might be incorrect at some point in the computation. This can lead to premature termination of the calculation when Eq.(2) is used as the convergence test. However, it is usually quite difficult to come up with anything better, so Eq.(2) is widely used and we recommend you use it in this course.
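To see how premature termination can happen, consider the following sketch (again assuming Python, with an integrand chosen purely for illustration). The function sin²(4πx) vanishes at every multiple of 1/4, which are exactly the points sampled by Simpson's rule on [0, 1] with 2 and with 4 intervals, so both estimates are essentially zero even though the true integral is 1/2.

```python
import math


def simpson(f, a, b, n):
    """Composite Simpson's rule on [a, b] with n intervals (n must be even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def f(x):
    # Zero (to rounding error) at x = 0, 1/4, 1/2, 3/4, 1.
    return math.sin(4.0 * math.pi * x) ** 2


coarse = simpson(f, 0.0, 1.0, 2)
fine = simpson(f, 0.0, 1.0, 4)

rtol, atol = 1e-6, 1e-10
passes = abs(coarse - fine) < rtol * abs(fine) + atol

print(passes)            # whether the Eq.(2) test is satisfied
print(abs(fine - 0.5))   # actual error; the true integral is 1/2
```

One common safeguard, offered here only as a suggestion, is to start from a larger N or to require the test to pass on two successive doublings before stopping.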