Improvement to Binornd Function
Contents
Problem Description
Geck 179533: "A customer at the Zurich FMA focus group noted that binornd is too slow. I suspect that's because we create a large binary matrix and sum it, which is going to be slow for large N"
Binornd function is indeed much slower than it could potentially be:
tic, binornd(1e4,0.4,40);toc
Elapsed time is 1.426907 seconds.
because it is using a very inefficient high-school algorithm, namely
- Generate N random variables by rand(N)
- X = number of random variables less than p.
This algorithm takes O(N) time and memory. Yes, binornd crashes when N*numberOfRandomVariables >~ 1e8 (depending on the computer).
Some alternative algorithms are
- Waiting time algorithm, O(N*p) time, constant memory
- Acceptance-Rejection method, O(sqrt(N)) in time, also can devise the algorithm to bound the computational time by a large constant, so technically it is O(1) in time.
- Recursive methods, O(loglog(N)) in time, if betarnd is fast, i.e. O(1).
Solution
In this report, we will choose the recursive method because it is much simpler to code and understand than the acceptance-rejection method. In addition, we believe that betarnd is fast enough, and O(loglog(N)) is almost as good as O(1) in practice, given that loglog of the largest representable number is about 6.57.
The recursive algorithm as described in pages 536-7 of Devroye(1986) -see also errata for a typo on p537- reduces the problem of generating a (N,p) binomial random variate to generating another binomial variate with smaller N*p. The number N may persistently remain large, but the product N*p is bound to get small quickly.
This algorithm needs to be supported by a random variate generator that is fast for small N*p (the matlab's current one is not suitable because it is O(N)). For this purpose we choose the first waiting algorithm described on page 525, which is indeed O(N*p).
Those are implemented in the file mybinornd.m.
Comparison and Testing
The new algorithm is much superior than the original one in terms of computational time:
tic, binornd(1e4,0.4,60);toc tic, mybinornd(1e4,0.4,60);toc
Elapsed time is 3.598106 seconds. Elapsed time is 0.027788 seconds.
It is also more memory efficient:
try binornd(2e4,0.3,100); catch u=lasterror; display(u.message) end display('whereas ') mybinornd(2e4,0.3,100); display('generates no errors')
Error using ==> rand Out of memory. Type HELP MEMORY for your options. whereas generates no errors
But does it give the right answer? Here's one attempt to check that
N=100; p=0.25; mu=round(N*p); std=round(sqrt(N*p*(1-p))); % Find mu and std for plotting purposes M=10000; % Sample size x=1:N; y=M*binopdf(x,N,p); % The continous PDF curve % Generate random variates r1= mybinornd(N,p,M,1); r0= binornd(N,p,M,1); % Find out how many incidences of each case occured yr1=zeros(1,N); yr0=yr1; for k=1:M; yr1(r1(k))=yr1(r1(k))+1; yr0(r0(k))=yr0(r0(k))+1; end % Plot and compare with PDF subplot(1,2,1), hold off bar(x,yr1,'b'), hold on, plot(x,y,'m','LineWidth',2); axis([mu-5*std mu+5*std 0 max([yr1(:);yr0(:)])+1]) xlabel('r'), ylabel('Number of incidents with X = r') title('mybinornd') subplot(1,2,2), hold off bar(x,yr0),hold on, plot(x,y,'m','LineWidth',2); axis([mu-5*std mu+5*std 0 max([yr1(:);yr0(:)])+1]) xlabel('r') title('binornd')
The curious reader is urged to play with the parameters of the above code. We can also simulate with larger parameters, but this time we do not call binornd because it is too slow and it generates memory error.
clear, clf N=1000; p=0.45; mu=round(N*p); std=round(sqrt(N*p*(1-p))); M=1e6; x=1:N; y=M*binopdf(x,N,p); s=5; r1= mybinornd(N,p,M,1); yr1=zeros(1,N); for k=1:numel(r1); yr1(r1(k))=yr1(r1(k))+1; end bar(x,yr1), hold on, plot(x,y,'m','LineWidth',2); axis([mu-s*std mu+s*std 0 max(yr1(:)+1)]) xlabel('r'), ylabel('Number of incidents with X = r') title('mybinornd')
What we see is a close agreement with the binopdf curve when the sample size is large.
Conclusion
In this report, we have implemented a recursive algorithm for binornd, which is
- faster than the current algorithm.
- memory efficient, i.e. if the returned answer can fit in the memory, the algorithm would mostly work. This is not true for the current algorithm when N and p are scalars.
For large sample size, we have found that the generated random variates agree with the PDF function.
The only disadvantage for this algorithm could be its dependence on betarnd, which we hope uses an efficient algorithm.