Better Algorithm for beta, betaln, betainc, and gammainc
Contents
Problem 1
Beta and Betaln use expressions like
% gammaln(z) + gammaln(w) - gammaln(z+w)
and when z >> w or w >> z, this has bad cancellation errors, even when neither large.
Reproduction Step:
betaln(1e15,3)
ans = -108
You wouldn't believe that such a functional value is an integer, would you? A more accurate number can be obtained by approximating this numerical derivative with the psi function
gammaln(3) -(psi(1e15+3/2)*3)
ans = -1.029231820041721e+002
Or we can work out the Stirling approximations to the gammaln functions as
n=1e15-1; k=3; gammaln(k)-(n*log1p(k/n)+k*log(n+k)-k+0.5*log1p(k/n)+1/12/n/(n+k))
ans = -1.029231820041721e+002
Those two approximations amazingly match at all their 16 digits. We know that the Stirling's approximation is accurate to more than 16 digits because the first correction term 1/12/n/(n+k) is already less than 5e-17.
Therefore the error in the current betaln is about %5. The error in beta will be about 15000 percent.
As this calculation suggests, we can use the approximation
% gammaln(w) -(psi(z+w/2)*w) (*)
as a replacement for betaln when z >> w, and a symmetric variation of this also exists for the case z << w. The error term in this approximation is about 1/48 * w^3/z^2, so the choice criterion could be that w^3/z^2 < 1e-10. This is done in mbetaln.
Alternatively, we can use the Stirling's approximation (SA) in the above form which has the advantage of being valid even when both parameters are comparable. However, SA is not valid when one parameter is large compared to the other, but not large compared to 1, i.e. the corrections, being proportional to 1/n^2, 1/n^3 etc will not be negligible.
Still, Stirling's formula is a good way to check the accuracy of our approximation (*) at large parameters:
nlist=logspace(10,16,25); w=logspace(0,6,25); for m=1:numel(nlist) z=nlist(m)+1; n=nlist(m); a1=gammaln(w) -(psi(z+w/2).*w); a2=gammaln(w)-(n.*log1p(w/n)+w.*log(n+w)-w+0.5*log1p(w/n)+1/12/n./(n+w)); err=max(abs((a1-a2)./a2)); end err
err =
3.077872643862345e-016
and it doesn't get better than this. Notice that the error is small even when w=1e6 and z=1e10+1, with just 4 orders of magnitude difference.
Problem 2
The betainc computes its normalizing constant as
% exp(gammaln(ak+bk)-gammaln(bk+1)-gammaln(ak) + bk.*log(xk)+ak.*log1p(-xk))
and this can suffer from the same cancellation problems among the first three terms as beta and betaln, if a>>b or b>>a.
Solution: We can use the same thing here when ak << bk or vice versa. This expression then becomes
% exp(bk.*(log(xk)+psi(ak+bk/2))-gammaln(bk+1)+ak.*log1p(-xk))
with error term about ak^3/bk^2, when ak is small. This is done in m2betainc.m. Later in part (4), we will see that this modification is crucial if the continued fractions are desired to be used at large parameters (and for better accuracy, they should).
Problem 3
The gammainc computes its normalizing constant as
% exp(-xk + ak.*log(xk) - gammaln(ak+1))
and there's cancellation when a and x are moderately large.
Solution: We can use the Stirlings formula when ak is large, then the formula becomes
% exp(-(xk-ak) + ak.*log(xk./ak) - 0.5*log(2*pi*ak) -1/12/ak +1/360/ak.^3);
with the relative error about 1/1260/ak.^5. So, we can use this formula when ak >1e3.
Problem 4
The betainc and gammainc have kind of poor approximations for cases when the parameters are very large, particularly for betainc when only one of them is large.
Reproduction Steps:
format long a=linspace(.999e7,1.001e7,100);b=2; x=a./(a+b);y=betainc(x,a,b); plot(x,y), xlabel('x'), ylabel('betainc(x,a,b)') y([1 end])
ans = 0.406005874992874 0.406831885344198
The reason why this step behaviour arises is that betainc uses different algorithm on different sides of the step:
- continued fractions (in subfunction betacore) for a+b <= 1e7, at most 1000 iterations or
- approximations 26.5.20 AS, for a+b>1e7 or when the continued fractions fail.
Note that this is less of a problem if beta is a little larger:
b=20; x=a./(a+b); y2=betainc(x,a,b);plot(x,y2), xlabel('x'), ylabel('betainc(x,a,b)') y2([1 end])
ans = 0.470257348696773 0.470292196745020
Continued fractions would always be more accurate unless it takes too many iterations to converge and errors accumulate. In this case, the algorithm would also be inefficient. If this is the reason why the approximations are used, the criterion a+b > 1e7 may not be the best, because the number of continued fractions iterations seemingly depend on min(a,b), and not a+b. Here's a plot of the number of iterations:
alist=logspace(0,7,15); blist=logspace(0,7,15); for ka=1:numel(alist); for kb=1:numel(blist); a=alist(ka); b=blist(kb); x=[a/(a+b)+(-100/(a+b):1/(a+b):100/(a+b)), ... b/(a+b)+(-100/(a+b):1/(a+b):100/(a+b))]; x=x(0<x & x<1); x=sort(x); [p1,m(ka,kb)]=cbetainc(x,a,b); end end levels=[2.^(0:1:10)]; loglevels=log10(levels); [aa,bb]=meshgrid(alist,blist); [cs,h]=contourf(aa,bb,log10(m),loglevels); cbar=colorbar;set(gca,'XScale','log'),set(gca,'YScale','log') set(cbar,'Ytick',loglevels),set(cbar,'Yticklabel',sprintf('%1.0f|',levels)) xlabel('a'), ylabel('b'),title('Number of Maximum Iterations in BetaCore')
This form of dependence of number of iterations can also be seen from the continued fractions formula AS 26.5.8 used in betainc: The coefficients d_n are always smaller than 1 (by careful choice of x), and they are decreasing. If a is much bigger than b, then d_{2m} will be small. If b is much bigger than a, again d_{2m} will get small quickly, because in this case x will be chosen small (or a and b will switch). If any of d's get small, the continued fractions converge, because C and Dinv are updated according to C=1+d./C, Dinv=1+d./C and these converge to the same number more quickly when d is small. When C ~ Dinv, the algorithm stops. Overall, if at least one parameter is not large, the algorithm converges quickly. So the AS approximations shall only be used when both parameters are large, not when their sum is large.
Solution: Change the continued fractions criterion to min(a,b)>1e7. However, this has some problems when max(a,b) is more than 1e14, as shown below
a=1e15; b=1.7; x=a/(a+b); cbetainc(a/(a+b),a,b)
ans = -1.252174470198046
Here, cbetainc is the betainc function which always uses continued fractions.
It turns out that this is related to round-off errors stated in problem (2). If we change the criterion in m2betainc (which has this problem fixed) to min(a,b) > 1e7, this problem goes away :
m2betainc(a/(a+b),a,b)
ans = 0.376361214697719
The result is close, but not too close to the one given by the approximation in betainc:
betainc(a/(a+b),a,b)
ans = 0.377002649992816
Let's test m2betainc for parameters for which betainc already works:
alist=logspace(0,6,15); %alist=alist(15); blist=logspace(0,6,15); %blist=blist(1); for ka=1:numel(alist); for kb=1:numel(blist); a=alist(ka); b=blist(kb); x=[a/(a+b)+(-100/(a+b):1/(a+b):100/(a+b)), ... b/(a+b)+(-100/(a+b):1/(a+b):100/(a+b))]; x=x(0<x & x<1); x=sort(x); y1=betainc(x,a,b); y2=m2betainc(x,a,b); imp=(y1>1e-290 & y2>1e-290); y1=y1(imp);y2=y2(imp); x=x(imp); [er(ka,kb) ier]=max(abs(y1-y2)./y1); end end err=max(max(er))
err =
4.368458830741453e-009
However, the match is not always as perfect (especially when one parameter is large and the other small):
a=1e7; b=1; x=[a/(a+b)+(-100/(a+b):1/(a+b):100/(a+b)),... b/(a+b)+(-100/(a+b):1/(a+b):100/(a+b))];x=x(0<x & x<1); x=sort(x); y1=betainc(x,a,b); y2=m2betainc(x,a,b); y3=binocdf(b-1,a+b-1,1-x); imp=(y1>1e-290 & y2>1e-290); y1=y1(imp);y2=y2(imp);y3=y3(imp); x=x(imp); loglog(x,y1,'b:',x,y2,'r',x,y3,'k--'), xlabel('x'), ylabel('betainc(x,a,b)') legend('betainc','m2betainc','binocdf','Location','SouthEast')
Here, we compare the results with binocdf, which is the most accurate since b = 1 (it is the same as binopdf, actually). As you can see, betainc is not accurate for most values. As a reminder, betainc uses approximations for this set of parameters. On the other hand, m2betainc uses continued fractions and matches perfectly with binocdf.
Conclusion
In this report, we have investigated the geck 289905. The author of this geck makes valid points about the beta, betaln, betainc and gammainc functions. The problems regarding the betaln and beta functions can be fixed by approximating the numerical derivative with the one given by the psi function. The same applies to the betainc function. In addition, the continued fractions in betainc function can be used for a larger set of values to increase the accuracy. A better approximation for betainc function at large parameters remains an open question.