Unset better formulae for POISSCDF, BINOCDF, NBINCDF:

Contents

Description of the problem: POISSCDF

The function poisscdf utilizes the cumsum function on poisspdf. This results in

When the ratio x/lambda gets large, poisscdf approaches 1, and summing the poisspdf is largely a waste of time. However, for finite x/lambda the computation yields nontrivial values away from 1. Some examples are

clear;
x=1e7; lam=1;
tic, p1 = poisscdf(x,lam); toc, p1
x=1e7; lam=1e7;
tic, p1 = poisscdf(x,lam); toc, p1
Elapsed time is 4.765935 seconds.

p1 =

   1.000000000000000

Elapsed time is 5.077998 seconds.

p1 =

   0.500084107548683

Suggested Solution (Peter Perkins 2003)

The function poisscdf(x,lambda), it turns out, is equal to 1-gamcdf(lambda,x+1,1) (Lawless, 2nd ed., pg43, ex. 1.6c). For example:

x=1e7;
lam=1e7;
tic, p1 = poisscdf(x,lam); toc, p1
tic, p2 = 1-gamcdf(lam,x+1,1); toc, p2
Elapsed time is 5.044705 seconds.

p1 =

   0.500084107548683

Elapsed time is 0.028917 seconds.

p2 =

   0.500084104393297

The two formulas agree up to 8 significant digits in this example. We will show in the next section that this is not always the case.

Accuracy of the Suggestion

We may assume that poisscdf is more accurate than 1-gamcdf, as the latter involves numerical integration of the gamma distribution function and therefore may more readily prone to errors. For an error check, we run the loop:

lamlist=10.^(1:7); % Number list for lam=lambda

for k=1:numel(lamlist);
    lam=lamlist(k); std=round(sqrt(lam));
    x=max(1,lam-40*std):lam+10*std;       % Choose x around lambda, where the
                                          % distribution takes non-trivial values
    p1list = poisscdf(x,lam);
    p2list = 1-gamcdf(lam,x+1,1);

    imp=find((p1list>1e-300 & p2list>1e-300)|p1list>1e-290|p2list>1e-300);
    p1list=p1list(imp);p2list=p2list(imp);x=x(imp);

    relerror(k)=max(abs(p1list-p2list)./(abs(p1list)+(p1list==0)));
    abserror(k)=max(abs(p1list-p2list));

end
loglog(lamlist,max(relerror,1e-16),'r',lamlist,abserror,'g');
xlabel('\lambda');axis([0 lamlist(end) 1e-16 1e4])
legend('Relative Error','Absolute Error','Location','NorthWest')
title('Replacement of poisscdf by 1-gamcdf')

Although absolute errors are reasonable, the relative error is always %100 because 1-gamcdf can never be less than 1e-16. If the true answer is less than 1e-16, the user is going to get just 'zero'.

Alternative Suggestion for POISSCDF:

It turns out there is a simpler formulation for poisscdf, which satisfies poisscdf(x,lam)=gammainc(lam,x+1,'upper'), given in wikipedia (Did not try hard but I am sure it can be found elsewhere, too. In addition, it's easy to obtain it by successive integration by parts on the gammainc function)

clear
lamlist=10.^(2:5); % Number list for lam=lambda

for k=1:numel(lamlist);
    lam=lamlist(k); std=round(sqrt(lam));
    x=max(1,lam-40*std):lam+10*std;       % Choose x around lambda, where the
                                          % distribution takes non-trivial values
    p1list = poisscdf(x,lam);
    p2list = gammainc(lam,x+1,'upper');

    imp=find((p1list>1e-300 & p2list>1e-300)|p1list>1e-290|p2list>1e-300);
    p1list=p1list(imp);p2list=p2list(imp);x=x(imp);

    relerror(k)=max(abs(p1list-p2list)./abs(p1list));
    abserror(k)=max(abs(p1list-p2list));

end
loglog(lamlist,max(relerror,1e-16),'r',lamlist,abserror,'g');
xlabel('\lambda');
legend('Relative Error','Absolute Error','Location','NorthWest')
title('Replacement of poisscdf by gammainc')

Although this formulation does much better than the one suggested in the previous section, it still does poorly at large lambda.

Similar considerations for BINOCDF and NBINCDF:

It is suggested that for large values of x, the relation binocdf(x,n,p)=1-betainc(p,x+1,n-x) = betainc(1-p,n-x,x+1) from (26.5.2 A&S) be used. We prefer the latter formulation as it would be more accurate near zero. A repetition of the numerical experiments can easily be run as follows:

% LOOP FOR BINOCDF:
p=0.5;
nlist=10.^(1:1:7);
for k=1:numel(nlist);
    n=nlist(k);
    mu=round(n*p);std=round(sqrt(n*p*(1-p)));
    % Choose x around mu, where the distribution takes non-trivial values
    x=max(1,mu-40*std):min(n-1,mu+10*std);

    p1list = binocdf(x,n,p);
    p2list = betainc(1-p,n-x,x+1);

    imp=find((p1list>1e-300 & p2list>1e-300)|p1list>1e-290|p2list>1e-300);
    p1list=p1list(imp);p2list=p2list(imp);

    relerror12(k)=norm((p1list-p2list)./p1list,'inf');
    abserror12(k)=norm(p1list-p2list,'inf');
end

loglog(nlist,relerror12,'r',nlist,abserror12,'g');
xlabel('n'),legend('Relative Error','Absolute Error','Location','NorthWest')
title('Replacement of binocdf by betainc')

For nbincdf, the formula is nbincdf(x,r,p) = betainc(p,r,x+1).

LOOP FOR NBINCDF:

p=0.5;
rlist=10.^(1:1:5);
for k=1:numel(rlist);
    r=rlist(k);
    mu=r*(1-p)/p; std=sqrt(r*(1-p)/p^2);
    % Choose x around mu, where the distribution takes non-trivial values
    x=round(max(0,mu-40*std)):round(mu);

    p3list = nbincdf(x,r,p);
    p4list = betainc(p,r,x+1);

    imp=find((p3list>1e-300 & p4list>1e-300)|p3list>1e-290|p4list>1e-290);
    p3list=p3list(imp);p4list=p4list(imp);

    relerror34(k)=norm((p3list-p4list)./p3list,'inf');
    abserror34(k)=norm(p3list-p4list,'inf');
end

loglog(rlist,relerror34,'r',rlist,abserror34,'g');
xlabel('n');
legend('Relative Error','Absolute Error','Location','NorthWest')
title('Replacement of nbincdf by betainc')

The situation is worse for nbincdf, where both the absolute and the relative errors quickly grow big.

Notes and Conclusions

at the hypergeometric distribution % probabilities are the coefficients in the Taylor series for the % hypergeometric function. ##### SOURCE END ##### --> at the hypergeometric distribution % probabilities are the coefficients in the Taylor series for the % hypergeometric function. ##### SOURCE END ##### -->