Improvement to FCDF
Contents
Problem Description:
For an F distribution with the first degree of freedom v1 very large, you can get rounding on line 49 of FCDF, because x + v2 / v1 is small relative to x. The result is a step function coming out of FCDF.
Instead of line 49-50,
% xx=x(k)./(x(k)+v2(k)./v1(k)); (1) % p(k) = betainc(xx,v1(k)/2,v1(k)/2); (2)
you could use
% xx=1./(1+x(k).*v1(k)./v2(k)); (3) % p(k) = 1 - betainc(xx,v2(k)/2,v1(k)/2); (4)
i.e., don't use A&S 26.5.2, but then you run the risk of losing precision in the left tail because of the 1-betainc. Ideally, you'd want to choose the right one to use. But are these mutually exclusive cases, and can you choose easily?
Reproduction steps:
x=linspace(0,10,10000); v1=5e14;v2=1; % Original calculation x1=x./(x+v2./v1); p1 = betainc(x1,v1/2,v2/2); % Suggestion x2=1./(1+x.*v1./v2); p2= 1 - betainc(x2,v2/2,v1/2); subplot('position',[0.1 0.4 0.35 0.4]) plot(x,p1,x,p2,'r'); legend('original form','new form','Location','SouthEast') subplot('position',[0.6 0.4 0.35 0.4]) loglog(x,p1,x,p2,'r'); legend('original form','new form','Location','SouthEast')
As the loglog plot shows though, we lose some accuracy in the left tail.
The Cause of the Problem
Why does this happen? As the Geck owner mentions, because x1 is not precise and contains repetitions of the same number due to round-off error, the resulting fcdf is a step function.
On the other hand x2 is more precise representation of 1-x1. And since
more precision in calculating x2 pays of at the end, because the beta function B(v_2/2,v_1/2) takes on a small value, and the differences in x2 are stretched. Although for v_2 >2, taking the fractional power will shrink the differences in x2, the stretching effect of 1/B will be stronger since we assume v_1 >> v_2.
If v_2 < 2, then the beta function may be order 1, but then the taking of the fractional power v_2/2 will stretch the small differences in 0 < x2 < 1.
All in all, calculating x2 = 1-x1 precisely is very important. Some precision may be lost at the last step 1-betainc, but the relative error is small unless betainc is close to 1.
On the other hand, what happens in the original calculation is that x1 is close to 1, so its precision is on the order of eps(1), as compared to eps(0) precision of x2.
Then the betainc function takes in x1, and returns a value close to
Again, the errors in (1-x1) are stretches either by 1/B or the fractional power. But this time we are feeding in a 1-x1=x2 value that is only precise to order eps(1).
Mutually Exclusive Cases
A glance at Fig.1 shows that the two formulas converge when x is close to zero. Actually, the original formula may be better as 1-betainc may result in loss of precision when betainc is close to 1. This happens when x and x1 is close to zero, and x2 is close to 1.
We can try to choose 1st formulation when x1 < 0.5, and the second formulation when x2<=0.5:
x=linspace(0,10,10000); v1=5e14;v2=1; xx=1./(1+v2./v1./x); oneminusxx=1./(1+x.*v1./v2); k1=find(xx<0.5);k2=find(oneminusxx<=0.5); p(k1) = betainc(xx(k1),v1/2,v2/2); p(k2) = 1-betainc(oneminusxx(k2),v2/2,v1/2); p0=fcdf(x,v1,v2); clf subplot('position',[0.1 0.4 0.35 0.4]) plot(x,p0,x,p,'r'); legend('original form','new form','Location','SouthEast') subplot('position',[0.6 0.4 0.35 0.4]) loglog(x,p0,x,p,'r'); legend('original form','new form','Location','SouthEast')
As you can see, this does not work.
An easy solution is to calculate the fcdf in both ways, and use the suggested form unless 1-betainc returns some really small number.
x=linspace(0,10,10000); v1=5e14;v2=1; xx=1./(1+v2./v1./x); oneminusxx=1./(1+x.*v1./v2); p = 1-betainc(oneminusxx,v2/2,v1/2); i1=find(p<1e-2); p(i1)= betainc(xx(i1),v1/2,v2/2); p0=fcdf(x,v1,v2); clf subplot('position',[0.1 0.4 0.35 0.4]) plot(x,p0,x,p,'r'); legend('original form','new form','Location','SouthEast') subplot('position',[0.6 0.4 0.35 0.4]) loglog(x,p0,x,p,'r'); legend('original form','new form','Location','SouthEast')
That seems to work, but let us not jump to conclusions: We have not yet answered the question:
Are the two cases mutually exclusive?
The answer is NO. In fact, if you change v1 to higher values, you'll immediately see that the above algorithm does not work. Let's copy and paste:
x=linspace(0,10,10000); v1=1e18;v2=1; xx=1./(1+v2./v1./x); oneminusxx=1./(1+x.*v1./v2); p = 1-betainc(oneminusxx,v2/2,v1/2); i1=find(p<1e-2); p(i1)= betainc(xx(i1),v1/2,v2/2); p0=fcdf(x,v1,v2); clf subplot('position',[0.1 0.4 0.35 0.4]) plot(x,p0,x,p,'r'); legend('original form','new form','Location','SouthEast') axis([x(1) x(end) 0 1.1]) subplot('position',[0.6 0.4 0.35 0.4]) loglog(x,p0,x,p,'r'); legend('original form','new form','Location','SouthEast') axis([x(1) x(end) min(p0) 1e5])
The suggested form is better most of the time, except when the answer is about 1e-50.
Trying to choose one or the other in the calculation process may yield discontinuous results.
Additional Suggestion
If b is large, and x is small, and the combination b*x is order 1, then the following approximation holds:
This can be derived by seeing that the betainc integral has the most significant contribution from around t=0 and approximating (1-t)^(b-1) as exp(-(b-1)*t), which holds when t is small and b is large.
Plugging this into the suggested second formula, we obtain p=1-gammainc((v1/2-1)*x,v2/2)=1-gammainc((v1/2-1)*x,v2/2,'upper').
x=linspace(0,1.5,10000); v1=5e14;v2=10; % Original function p0 = fcdf(x,v1,v2); % Original Suggestion x2= 1./(1+x.*v1./v2); p2= 1 - betainc(x2,v2/2,v1/2); % New Suggestion p3= gammainc((v1/2-1)*x2,v2/2,'upper'); clf subplot('position',[0.1 0.4 0.35 0.4]) plot(x,p0,'b',x,p3,'g',x,p2,'r'); legend('fcdf','gammainc','Peter','Location',[0.3 0.1 0.4 0.1]) axis([x(1) x(end) 0 1.1]) subplot('position',[0.6 0.4 0.35 0.4]) loglog(x,p0,'b',x,p3,'g',x,p2,'r'); axis([x(1) x(end) min(p0) 1e5])
The results strangely deviate from what they should be if v2 is not also a little large, like 1e3. This is because the gammainc approximation is not as good.
Best Solution: mbetainc
The best solution to handle this problem is to modify the betainc function so it has an option to compute the 'upper' tail. All the formulae have been easily adapted, and the new function is called mbetainc. Now the new fcdf, named mfcdf, reads:
p4= mbetainc(x2,v2/2,v1/2,'upper');
We can plot the results as before:
clf subplot('position',[0.1 0.4 0.35 0.4]) plot(x,p0,'b',x,p4,'g',x,p2,'r'); legend('fcdf','mbetainc','Peter','Location',[0.3 0.1 0.4 0.1]) axis([x(1) x(end) 0 1.1]) subplot('position',[0.6 0.4 0.35 0.4]) loglog(x,p0,'b',x,p4,'g',x,p2,'r'); axis([x(1) x(end) min(p0) 1e5])
Error Analysis
In this section, we compare the two function fcdf and mfcdf. Let us not be too judgmental, and only check the absolute difference between the two functions:
v1list=logspace(1,20,40); x=logspace(-15,1,150); for k=1:numel(v1list); v1=v1list(k); v2=1; x2=1./(1+x.*v1./v2); y0=fcdf(x,v1,v2); y1=mfcdf(x,v1,v2); [er(k) ier]=max(abs(y1-y0)); xr=x(ier);x2r=x2(ier); y0r=y0(ier); y1r=y1(ier); end clf, loglog(v1list,er)
Where are those errors coming from? Here's one culprit:
x2r= 1.3e-8; v1= 3e7; v2= 1; betainc(1-x2r,v1/2,v2/2)+betainc(x2r,v2/2,v1/2)
ans =
0.9925
which needs to be 1. The all new mbetainc function yields
mbetainc(x2r,v2/2,v1/2,'upper')+mbetainc(x2r,v2/2,v1/2)
ans =
1
for the same problem. Here we input x2r, because we know that 1-x2r is not as accurate because x2r is small.
We leave the reader the decision whether or not mfcdf is better than fcdf.
Conclusion
In this report, we have investigated the Geck 289192. The given suggestion indeed improves the fcdf function most of the time, but the cases mentioned are not mutually exclusive. The best solution to solve this problem would be to extend the function betainc so that it can take the tail option 'upper'. This may also be needed because sometimes (when x is small) betainc(1-x,b,a) may not be good enough.