Improvement to binoinv

Contents

Problem Description

binoinv is kind of slow

tic,binoinv(0.9,1e5,0.5),toc
ans =

       50203

Elapsed time is 6.072841 seconds.

And surprisingly, it is not very accurate either, for example the below quantity is supposed to be 1e4 (or 1e4-1, if 1e-12 is an exact binocdf)

binoinv(1-1e-12,1e4,0.4)+binoinv(1e-12,1e4,0.6)
ans =

        9989

A trivial change

The problem in the second example can be fixed by using

% binoinv(y,N,p) = 1-binoinv(1-y,N,1-p);

whenever y > N*p. This also increases the performance of the algorithm when y > N*p, as fewer terms are summed up.

Betainc & Newtons Method Replacement

We know that (*) binocdf(x,N,p) = betainc(1-p,N-x,x+1). Based on my previous reports, this relation is accurate

A binoinv algorithm based betainc can suffer from these errors. However, since binoinv always returns an integer, these errors will rarely be a problem. Besides, the current binoinv does not work for large N at all, so any replacement would be better than nothing.

Once we decide to use the relation (*), we can calculate the binoinv by newtons method. This method solves a continous version of the problem and lets x = ceil(x) at the end.

All these are implemented in mbinoinv.m.

Comparison and Testing

The following code is an attempt to verify the legitimacy of mbinoinv:

N=1e4;
p=0.6;
y=logspace(-300,0,301);

tic,x=binoinv(y,N,p);toc
tic,x2=mbinoinv(y,N,p);toc

er=max(abs(x-x2))
Elapsed time is 3.412440 seconds.
Elapsed time is 0.988535 seconds.

er =

     0

It seems like that the user needs to be really unlucky to get an incorrect answer. And even then mbinoinv may be more correct, especially if the input y is close to 1 (where a lot round-off errors accumulate in binoinv).

There is also doubt on the accuracy of binocdf, for example the numbers below need to be the same

p=0.45;
N=40;
x=39;

1-binocdf(x,N,p)
binopdf(N,N,p)
ans =

    2.331468351712829e-015


ans =

    1.344313472276060e-014

In this case, betainc seems to be doing better than binocdf,

1-betainc(1-p,N-x,x+1)
ans =

    1.343369859796439e-014

Even this %0.1 error can be eliminated by use of AS 26.5.2

betainc(p,x+1,N-x)
ans =

    1.344313472276063e-014

For binocdf, this problem can be fixed for binocdf by using the alternate formula binocdf(x,N,p)=1-binocdf(N-1-x,N,1-p) when x is larger than N*p.

binocdf(N-1-x,N,1-p)
ans =

    1.344313472276055e-014

Because of this reason, we trust binocdf only until x < N*p. The following code tests binoinv against binocdf in that range:

N=3e4;
p=0.45; x=1:round(N*p);
y=binocdf(x,N,p);
i=((1e-300<y)&(y<1)); y=y(i);x=x(i);
[y,i,j]=unique(y); x=x(i);
x2=mbinoinv(y,N,p);

er=max(abs(x-x2))
er =

     0

Here we restrict ourselves to y>1e-300. When y is on the order of eps(0), some error result from the fact that betainc has an earlier cut-off than binocdf. We don't think this range is of interest anyways.

The first example above also shows that mbinoinv runs faster than binoinv, but not quite, because the largest y in the list, except y = 1, is only y = 0.1. The computational time difference gets significant if N is large and y > 0.5 :-

tic,x=binoinv(0.9,1e5,0.5);toc
tic,x2=mbinoinv(0.9,1e5,0.5);toc
Elapsed time is 6.064765 seconds.
Elapsed time is 0.008710 seconds.

What about the example showing the inaccuracy of binoinv? It is handled with much more care by mbinoinv:

mbinoinv(1-1e-12,1e4,0.5)+mbinoinv(1e-12,1e4,0.5)
ans =

       10000

Side Application: New binornd

We have suggested new improvement for binornd before, this time just out of curiosity we briefly talk about how an efficient binoinv could be used to build a random binomial generator by the "inverse method" (see Devroye or any book on nonuniform random number generation).

The only step is to call

tic,binoinv(rand(1),N,p),toc
ans =

       13421

Elapsed time is 1.621720 seconds.

but this is not as fast as

tic,mybinornd(N,p,1),toc
ans =

       13309

Elapsed time is 0.000499 seconds.

apparently because binoinv is not that efficient.

Conclusion

In this report, we have argued that, although the betainc function is not a super-accurate replacement for binocdf, it can still be of great use in the calculation of binoinv. The Newton's method and betainc replacement algorithm outperforms the current binoinv algorithm in computational efficiency. It handles extreme inputs better as well, except the case that y is on the order of eps(0).

extreme inputs better % as well, except the case that y is on the order of eps(0). ##### SOURCE END ##### -->