Resources

NoRSE Algorithm

<p>Abstract:</p>
<p class="AbstractText">NoRSE was developed to analyze high-frequency data sets collected from multi-state, dynamic experiments, such as molecular adsorption and desorption onto carbon nanotubes. As technology improves sampling frequency, these stochastic data sets become increasingly large with faster dynamic events. More efficient algorithms are needed to accurately locate the unique states in each time trace. NoRSE adapts and optimizes a previously published noise reduction algorithm (Chung <i>et al</i>., 1991) and uses a custom peak flagging routine to rapidly identify unique event states. The algorithm is explained using experimental data from our lab and its fitting accuracy and efficiency are then shown with a generalized model of stochastic data sets. The algorithm is compared to another recently published state finding algorithm and is found to be 27 times faster and more accurate over 55% of the generalized experimental space. NoRSE is written as an M-file for Matlab.</p>
<p class="AbstractText">Link to paper: <a href="http://bioinformatics.oxfordjournals.org/content/early/2011/11/14/bioinformatics.btr632"></a><a href="http://bioinformatics.oxfordjournals.org/content/early/2011/11/14/bioinformatics.btr632">http://bioinformatics.oxfordjournals.org/content/early/2011/11/14/bioinformatics.btr632</a></p>
<p class="AbstractText">Supplied code: NoRSE Algorithm for MATLAB</p>
<p>function NoRSE<br />% Originally Coded by Nigel F. Reuel 4.01.2011<br />% Massachusetts Institue of Technology<br />% Department of Chemical Engineering<br />%<br />% Last Updated 10.27.2011<br />%<br />% NoRSE was developed to analyze high-frequency data sets collected <br />% from multi-state, dynamic experiments, such as molecular adsorption <br />% and desorption onto carbon nanotubes. Full description available in the<br />% Bioinformatics publication entitled, "NoRSE: Noise Reduction and State <br />% Evaluator for High-Frequency Single Event Traces" by NF Reuel et al.<br />%<br />% The code is written to be as general as possible, but there are a few<br />% parameters that can be modified by the user to improve fitting to their<br />% specific data sets. See lines quoted below for further discussion of <br />% these parameters.<br />%<br />% ^^^ User specified parameters in the noise reduction subroutine ^^^<br />%<br />% 1. p (line 163)<br />% 2. M (line 162)<br />% 3. N (line 160)<br />%<br />% ^^^ User specified parameters in the peak finder subroutine ^^^<br />%<br />% 1. nbins (line 65)<br />% 2. SW (line 262)<br />% 3. MoleL (line 297)<br />% 4. SigBinD (line 413)<br />%<br />% ----- Step 1: Read in the Traces (can normalize here if desired) --------<br />% <br />% Data Columns = Trace number , Data Rows = Time steps<br />%<br />RTD = csvread('Data.csv');<br />%Determine the number of time steps (Ns) and number of transducers (Nt):<br />[Ns,Nt]= size(RTD);<br />RTD_n = RTD;<br />% Create empty matrix to recieve the noise reduced traces:<br />M = zeros(Ns,Nt);<br />%<br />% ---------------- Step 2: Denoise each of the Traces --------------------<br />%<br />%The denoising algorithm loses the first and last time steps, so set a new<br />%time window:<br />time = Ns - 2;<br />%Create an empty matrix to recieve the denoised trace data:<br />T_denoise = zeros(time,Nt);<br />for i = 1:Nt<br /> NT = RTD_n(:,i);<br /> IC = feval(@NoiseReduc,NT);<br /> T_denoise(:,i) = IC(:,1);<br />end<br />% csvwrite(['T_denoise',int2str(iii),'.csv'],T_denoise);<br />%<br />%---------------- Step 3: Construct APH for each trace ------------------<br />%<br />% Number of bins for the histogram construction. We found 200 works well<br />% for experimental traces of ~1000 steps in length. 
You may need to<br />% decrease this for shorter traces or increase for longer.<br />%<br />% ^^^ NOTE: User can change following parameter to improve fit ^^^<br />%<br />nbins = 200;<br />%<br />%Create empty recieving matrices for histogram counts (nM) and the bin<br />%centers (xM)<br />nM = zeros(Nt,nbins);<br />xM = zeros(Nt,nbins);<br />for i = 1:Nt<br /> Y = T_denoise(:,i);<br /> [n,xout] = hist(Y,nbins);<br /> nM(i,:) = n(1,:);<br /> xM(i,:) = xout(1,:);<br />end<br />%xlswrite('nM',nM);<br />%xlswrite('xM',xM);<br />%<br />%---------------- Step 4: Find peaks for each APH histrogram ------------<br />%<br />%Use PeakFinder Function to find the Average Peak Locations (APL) - these<br />%translate to the bin number <br />APL = feval(@PeakFinder,nM,Ns);<br />[Nt,Nlev] = size(APL);<br />%<br />%Translate the APL bin numbers back to the normalized intensity level to<br />%determine the stochastic step levels (SLL)<br />SLL = zeros(Nt,Nlev);<br />for i = 1:Nt<br /> % Very rarely, the peak finder will flag no bins as peaks, in this case<br /> % we will use a general gaussian fit of the denoised data to find a<br /> % mean step value.<br /> if sum(APL(i,:)) == 0<br /> Ncurves = 1;<br /> X = T_denoise(i,:);<br /> [u,~,~,~] = feval(@fit_mix_gaussian,X,Ncurves);<br /> SLL(i,1) = u;<br /> else<br /> for j = 1:Nlev<br /> if APL(i,j) ~= 0<br /> N_bin = APL(i,j);<br /> SLL(i,j) = xM(i,N_bin);<br /> end<br /> end<br /> end<br />end<br />%<br />%----------- Step 5 Determine the NoRSE Fit Trace ---------------<br />%<br />for ii = 1:Nt<br /> RealT = T_denoise(:,ii);<br /> count = 0;<br /> for m = 1:Nlev<br /> if SLL(ii,m) > 0<br /> count = count + 1;<br /> end<br /> end<br /> for i = 1:time<br /> compare = zeros(count,1);<br /> for j = 1:count<br /> compare(j,1) = abs(RealT(i,1)-SLL(ii,j));<br /> end<br /> [C,I] = min(compare);<br /> M(i+1,ii,1) = SLL(ii,I);<br /> end<br /> % Because the noise reduction program takes away the first and last time<br /> % point, estimate their values as those immediately following and<br /> % proceeding respectively<br /> M(1,ii,1) = M(2,ii,1); <br /> M(time+2,ii,1) = M(time+1,ii,1);<br />end<br />% Record the noise-reduced traces in a csv format:<br />csvwrite('NRData.csv',M)<br />end<br />%<br />%<br />%<br />%<br />% <--------------Functions embeded in the Trace Analyzer ------------><br />%<br />function IC = NoiseReduc(NT)<br />%<br />%Coded by Nigel Reuel on 8.31.2010<br />%<br />%Adaptation of Chung and Kennedy (1991) forward-backward non-linear <br />%filtering technique to reduce the noise of our<br />%fluorescent signal in an attempt to resolve the single molecule events<br />%amidst the noise.<br />%<br />%Input: Takes supplied noisy trace (NT)<br />%Output: Returns cleaned trace without the first and last data point<br />%<br />time = length(NT);<br />%<br />%Set algorithm parameters (discussion on this in Bioinformatics paper - Supplement A):<br />%<br />% ^^^ NOTE: User can change following parameter to improve fit ^^^<br />%<br />N = [4 8 16 32]; %Sampling window length<br />K = length(N); %Number of sampling windows<br />M = 10; %Weighting parameter length<br />P = 40; %Parameter that effects sharpness of steps<br />%<br />% -------- The Forward-Backward non-linear Algorithm -----------<br />%<br />%First - Run each time step and store the average forward and backward<br />%predictors for each of the sampling windows<br />%<br />I_avg_f = zeros(time,K);<br />I_avg_b = zeros(time,K);<br />for g = 1:time<br /> %Run for each sampling window<br /> for k = 
1:K<br /> %Solve for the average forward predictor (w/ some<br /> %logic to help solve the beginning of the trace<br /> window = N(k);<br /> if g == 1<br /> I_avg_f(g,k) = NT(1,1);<br /> elseif g - window - 1 < 0 <br /> I_avg_f(g,k) = sum(NT(1:g-1,1))/g;<br /> else <br /> epoint = g - window;<br /> spoint = g - 1;<br /> I_avg_f(g,k) = sum(NT(epoint:spoint,1))/window;<br /> end<br /> %Now do the same for the backward predictor<br /> if g == time<br /> I_avg_b(g,k) = NT(g,1);<br /> elseif g + window > time<br /> sw = time - g;<br /> I_avg_b(g,k) = sum(NT(g+1:time,1))/sw;<br /> else <br /> epoint = g + window;<br /> spoint = g + 1;<br /> I_avg_b(g,k) = sum(NT(spoint:epoint,1))/window;<br /> end<br /> end<br />end<br />%<br />%Second - Solve for non-normalized forward and backward weights:<br />f = zeros(time,K);<br />b = zeros(time,K);<br />for i = 1:time<br /> for k = 1:K<br /> Mstore_f = zeros(M,1);<br /> Mstore_b = zeros(M,1);<br /> for j = 0:M-1<br /> t_f = i - j;<br /> t_b = i + j;<br /> if t_f < 1<br /> Mstore_f(j+1,1) = (NT(i,1) - I_avg_f(i,k))^2;<br /> else<br /> Mstore_f(j+1,1) = (NT(t_f,1) - I_avg_f(t_f,k))^2;<br /> end<br /> if t_b > time<br /> Mstore_b(j+1,1) = (NT(i,1) - I_avg_b(i,k))^2;<br /> else<br /> Mstore_b(j+1,1) = (NT(t_b,1) - I_avg_b(t_b,k))^2;<br /> end<br /> end<br /> f(i,k) = sum(Mstore_f)^(-P);<br /> b(i,k) = sum(Mstore_b)^(-P);<br /> end<br />end<br />% Third - Solve for vector of normalization factors for the weights<br />C = zeros(time,1);<br />for i = 1:time<br /> Kstore = zeros(K,1);<br /> for k = 1:K<br /> Kstore(k,1) = f(i,k) + b(i,k);<br /> end<br /> C(i,1) = 1/sum(Kstore);<br />end<br />% Fourth and final step - Put all parameters together to solve for the<br />% intensities<br />Iclean = zeros(time,1);<br />for i = 1:time<br /> TempSum = zeros(K,1);<br /> for k = 1:K<br /> TempSum(k,1) = f(i,k)*C(i,1)*I_avg_f(i,k) + b(i,k)*C(i,1)*I_avg_b(i,k);<br /> end<br /> Iclean(i,1) = sum(TempSum);<br />end<br />IC = Iclean(2:time-1,1);<br />return<br />end</p>
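<p>% ---- Usage illustration (our addition, not part of the original file) ----<br />% NoiseReduc is a subfunction above, so it is only callable from inside this<br />% M-file; if pulled out into its own NoiseReduc.m it can be tested on a<br />% synthetic noisy step, e.g.:<br />%<br />% NT = [zeros(500,1); ones(500,1)] + 0.1*randn(1000,1); % noisy 0-to-1 step<br />% IC = NoiseReduc(NT); % filtered trace, 998x1 (first/last points dropped)<br />% plot(2:999,IC,1:1000,NT) % compare filtered and raw traces<br />%<br />% Because the weights are the inverse local prediction errors raised to the<br />% power P, increasing P makes the filter switch more sharply between the<br />% forward and backward predictors at a step edge.</p>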
<p>function PeakLocAvg = PeakFinder(nM,Ns)<br />% Coded by Nigel Reuel on 9.15.2010 updated in March 2011 by NFR<br />% This function finds the peaks of a histogram using a modified running<br />% average algorithm + peak flagging routine<br />%<br />[nT,nB] = size(nM);<br />% <br />% Size of window for running average algorithm (increasing this causes more<br />% of a rough approximation of peaks) User may need to vary for their <br />% specific data set)<br />%<br />% ^^^ NOTE: User can change following parameter to improve fit ^^^<br />%<br />SW = 3;<br />%<br />% Centered running average algorithm to help find peaks:<br />% <br />T_ra = zeros(nT,nB);<br />for j = 1:nT<br /> for i = 1:nB<br /> if i < (SW + 1)<br /> T_ra(j,i) = sum(nM(j,1:i+SW))/(i+SW);<br /> elseif i >= (SW+1) && i <= (nB-SW)<br /> T_ra(j,i) = sum(nM(j,i-SW:i+SW))/(SW*2+1);<br /> else<br /> T_ra(j,i) = sum(nM(j,i:nB))/(nB-i+1);<br /> end<br /> end<br />end<br />%xlswrite('T_ra',T_ra);<br />%<br />%----------------------Peak finding algorithm---------------------------<br />%<br />% Base function on physical example of climbing up and down hills. Drop<br />% flags on supposed peaks. Eliminate "peaks" that are molehills. Combine<br />% peaks that are close neighbors.<br />Flags = zeros(nT,nB,6);<br />% Layer 1 = Peak Flag<br />% Layer 2 = Right Valley Bin#<br />% Layer 3 = Left Valley Bin#<br />% Layer 4 = Peak magnitude (Histogram Count)<br />% Layer 5 = Right Valley magnitude (Histogram Count)<br />% Layer 6 = Left Valley magnitude (Histogram Count)<br />% Molehill level (number of responses that are not significant - some fraction of the avg response)<br />%<br />%<br />% ^^^ NOTE: User can change following parameter to improve fit ^^^<br />%<br />MoleL = (Ns/nB/SW)*.5; % = #responses/#bins/SW ~= 1/2 of the Avg. Response - no significance. 
<br />% The multiplied factor ('0.5') can be decreased to allow smaller peak heights to be deemed 'significant' <br />% or increased to allow for only the highest peaks to be flagged.<br />%<br />PeakLocAvg = zeros(nT,2);<br />for i = 1:nT<br /> for j = 1:nB<br /> if T_ra(i,j) > MoleL<br /> % Logic to find significant peaks at endpoints (and the R/L<br /> % valleys)<br /> if j == 1 && T_ra(i,j) > T_ra(i,j+1)<br /> Flags(i,j,1) = 1;<br /> Flags(i,j,4) = T_ra(i,j);<br /> Pval = T_ra(i,j);<br /> Pnext = T_ra(i,j+1);<br /> index = j;<br /> % Find right hand valley:<br /> while Pval > Pnext<br /> index = index + 1;<br /> Pval = T_ra(i,index);<br /> Pnext = T_ra(i,index+1);<br /> end<br /> Flags(i,j,2) = index;<br /> Flags(i,j,5) = T_ra(i,index);<br /> elseif j == 1 && T_ra(i,j) <= T_ra(i,j+1)<br /> Flags(i,j,1) = 0;<br /> elseif j == nB && T_ra(i,j) > T_ra(i,j-1)<br /> Flags(i,j,1) = 1;<br /> Flags(i,j,4) = T_ra(i,j);<br /> Pval = T_ra(i,j);<br /> Pprior = T_ra(i,j-1);<br /> index = j;<br /> % Find left hand valley:<br /> while Pval > Pprior<br /> index = index - 1;<br /> Pval = T_ra(i,index);<br /> Pprior = T_ra(i,index-1);<br /> end<br /> Flags(i,j,3) = index;<br /> Flags(i,j,6) = T_ra(i,index);<br /> elseif j == nB && T_ra(i,j) <= T_ra(i,j-1)<br /> Flags(i,j,1) = 0;<br /> % Logic to find peaks at all midpoints (and their R/L valleys)<br /> elseif T_ra(i,j) >= T_ra(i,j-1) && T_ra(i,j) >= T_ra(i,j+1)<br /> Flags(i,j,1) = 1;<br /> Flags(i,j,4) = T_ra(i,j);<br /> % Find left hand valley:<br /> Pval = T_ra(i,j);<br /> Pprior = T_ra(i,j-1);<br /> index = j;<br /> if j > 2<br /> Switch = 1;<br /> else<br /> Switch = 0;<br /> index = 1;<br /> end<br /> while Pval > Pprior && Switch == 1;<br /> index = index - 1;<br /> Pval = T_ra(i,index);<br /> Pprior = T_ra(i,index-1);<br /> if index == 2<br /> Switch = 0;<br /> index = 1;<br /> end<br /> end<br /> Flags(i,j,3) = index;<br /> Flags(i,j,6) = T_ra(i,index);<br /> % Find right hand valley:<br /> Pval = T_ra(i,j);<br /> Pnext = T_ra(i,j+1);<br /> index = j;<br /> if j < nB - 1<br /> Switch = 1;<br /> else<br /> Switch = 0;<br /> index = nB;<br /> end<br /> while Pval > Pnext && Switch == 1;<br /> index = index + 1;<br /> Pval = T_ra(i,index);<br /> Pnext = T_ra(i,index+1);<br /> if index == nB - 1;<br /> Switch = 0;<br /> index = nB;<br /> end<br /> end<br /> Flags(i,j,2) = index;<br /> Flags(i,j,5) = T_ra(i,index);<br /> end<br /> end<br /> end<br /> %{<br />X = (1:200)';<br />Y1 = Flags(i,:,1)';<br />Y2 = T_ra(i,:);<br />plot(X,Y1,X,Y2)<br /> %}<br />%Make a vector containing the peak locations:<br />PeakVec = 0;<br />count = 1;<br /> for j = 1:nB<br /> if Flags(i,j,1) == 1<br /> PeakVec(1,count) = j;<br /> count = count + 1;<br /> end<br /> end<br /> % Now analyze peak pairs:<br /> % Specify significant peak valley distance and significant bin<br /> % distance:<br /> %<br /> % ^^^ NOTE: User can change following parameter to improve fit. 
We found this parameter is analogous to that specified in line 297 ^^^<br /> %<br /> SigD = MoleL; <br /> %<br /> % ^^^ NOTE: User can change following parameter to improve fit ^^^<br /> %<br /> SigBinD = nB*(.025); % = 2.5% of the total bin #.<br /> % Decreasing the multiplied factor (0.025) will allow for peaks closer<br /> % together to remain unique, increasing the factor causes more of the<br /> % flagged peaks to be unified into single 'significant' peaks.<br /> %<br /> %<br /> Npeaks = length(PeakVec);<br /> % Determine the peak pairwise interactions...each will have a right<br /> % hand value and left hand value (except for the endpoints):<br /> % 1: Unique<br /> % 2: Combine<br /> % 3: Ignore<br /> % Create a matrix to recieve these calculations:<br /> PeakCode = zeros(2,Npeaks); % Row1 = to right, Row2 = to left<br /> for j = 1:Npeaks-1<br /> % Determine the peak bin numbers:<br /> PBN1 = PeakVec(1,j);<br /> PBN2 = PeakVec(1,j+1);<br /> % Calcualate the valley distances (left to right)<br /> VD1 = abs(Flags(i,PBN1,4) - Flags(i,PBN1,5)); <br /> VD2 = abs(Flags(i,PBN2,6) - Flags(i,PBN2,4));<br /> %<br /> if VD1 <= SigD && VD2 <= SigD <br /> % Combine<br /> PeakCode(1,j) = 2;<br /> PeakCode(2,j+1) = 2;<br /> elseif VD1 >= SigD && VD2 <= SigD<br /> % Left Peak sig, right peak ignore<br /> PeakCode(1,j) = 1;<br /> PeakCode(2,j+1) = 3;<br /> elseif VD1 <= SigD && VD2 >= SigD<br /> % Left peak ignore, right peak significant<br /> PeakCode(1,j) = 3;<br /> PeakCode(2,j+1) = 1;<br /> elseif VD1 >= SigD && VD2 >= SigD<br /> % Both peaks are significant<br /> PeakCode(1,j) = 1;<br /> PeakCode(2,j+1) = 1;<br /> end<br /> end<br /> % Logic for the peak endpoints<br /> Left = abs(Flags(i,PeakVec(1,1),4) - Flags(i,PeakVec(1,1),6));<br /> if Left >= SigD<br /> PeakCode(2,1) = 1;<br /> else<br /> PeakCode(2,1) = 3;<br /> end<br /> Right = abs(Flags(i,PeakVec(1,Npeaks),4) - Flags(i,PeakVec(1,Npeaks),5));<br /> if Right >= SigD<br /> PeakCode(1,Npeaks) = 1;<br /> else<br /> PeakCode(1,Npeaks) = 3;<br /> end<br /> % Now determine what to do with the peaks. Look at combinations<br /> % reading left to right. Put signifcant peaks into a vector.<br /> SPV = 0;<br /> j = 1;<br /> count = 1;<br /> while j <= Npeaks<br /> C1 = PeakCode(1,j);<br /> C2 = PeakCode(2,j);<br /> if C1 == 1 && C2 == 1<br /> % This is a bonafide unique peak. 
Put it in the vector.<br /> SPV(1,count) = PeakVec(1,j);<br /> count = count + 1;<br /> j = j+1;<br /> elseif C1 == 2 <br /> % This peak will be merged with other peak(s)<br /> count2 = 2;<br /> Merge = PeakVec(1,j);<br /> while C1 == 2<br /> j = j + 1;<br /> C1 = PeakCode(1,j);<br /> Merge(1,count2) = PeakVec(1,j);<br /> count2 = count2 + 1;<br /> end<br /> % Insert logic here to eliminate valleys and slow rises (see<br /> % what is happening at the significant enpoints!)<br /> End1 = Flags(i,Merge(1,1),4) - Flags(i,Merge(1,1),6);<br /> End2 = Flags(i,Merge(1,end),4) - Flags(i,Merge(1,end),5);<br /> Distance = Merge(1,end) - Merge(1,1);<br /> if End1 >= SigD && End2 >= SigD<br /> SPV(1,count) = round(mean(Merge));<br /> count = count + 1;<br /> j = j + 1;<br /> elseif End1 >= SigD && End2 <= (SigD)*(-1) && Distance >= SigBinD<br /> SPV(1,count) = round(mean(Merge));<br /> count = count + 1;<br /> j = j + 1;<br /> elseif End1 <= SigD*(-1) && End2 >= (SigD) && Distance >= SigBinD<br /> SPV(1,count) = round(mean(Merge));<br /> count = count + 1;<br /> j = j + 1;<br /> end<br /> else % Represents 1/3, 3/1, and 3/3 combos which are all ignored!!<br /> j = j + 1;<br /> end<br /> end<br /> Nsigpeaks = length(SPV);<br /> for j = 1:Nsigpeaks<br /> PeakLocAvg(i,j) = SPV(1,j);<br /> end<br /> %{<br /> % Plot the PeakLocAvg to see how the algorithm performs)<br /> S = Nsigpeaks;<br /> FF = zeros(1,nB);<br /> for k = 1:S<br /> Temp = PeakLocAvg(i,k);<br /> FF(1,Temp) = 1;<br /> end<br /> X = (1:200)';<br /> Y1 = FF(1,:)';<br /> Y2 = T_ra(i,:);<br /> plot(X,Y1,X,Y2)<br /> stop = 1;<br /> %} <br />end</p>
<p><br />return<br />end</p>
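<p>% ---- Usage illustration (our addition, not part of the original file) ----<br />% PeakFinder takes the all-points histogram (APH) counts of one or more<br />% denoised traces (one row per trace) plus the trace length, and returns the<br />% bin indices of the significant peaks, zero-padded. A hedged stand-alone<br />% sketch, assuming the subfunction is reachable from within this M-file:<br />%<br />% Y = T_denoise(:,1); % one denoised trace<br />% [n,xout] = hist(Y,200); % 200-bin APH, as in Step 3 above<br />% APL = PeakFinder(n,length(Y)); % bin numbers of the flagged state peaks<br />% levels = xout(APL(APL>0)); % convert peak bins back to intensity levels</p>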
<p>function [u,sig,t,iter] = fit_mix_gaussian( X,M )<br />%<br />% Gaussian Fit Function by Ohad Gal (c) 2003<br />% Download at:<br />% <a href="http://www.mathworks.com/matlabcentral/fileexchange/4222-a-collection-of-fitting-functions">http://www.mathworks.com/matlabcentral/fileexchange/4222-a-collection-of-fitting-functions</a><br />% on 3.21.2011<br />%<br />% fit_mix_gaussian - fit parameters for a mixed-gaussian distribution using EM algorithm<br />%<br />% format: [u,sig,t,iter] = fit_mix_gaussian( X,M )<br />%<br />% input: X - input samples, Nx1 vector<br />% M - number of gaussians which are assumed to compose the distribution<br />%<br />% output: u - fitted mean for each gaussian<br />% sig - fitted standard deviation for each gaussian<br />% t - probability of each gaussian in the complete distribution<br />% iter- number of iterations done by the function<br />%<br />% initialize and initial guesses<br />N = length( X );<br />Z = ones(N,M) * 1/M; % indicators vector<br />P = zeros(N,M); % probabilities vector for each sample and each model<br />t = ones(1,M) * 1/M; % distribution of the gaussian models in the samples<br />u = linspace(min(X),max(X),M); % mean vector<br />sig2 = ones(1,M) * var(X) / sqrt(M); % variance vector<br />C = 1/sqrt(2*pi); % just a constant<br />Ic = ones(N,1); % - enable a row replication by the * operator<br />Ir = ones(1,M); % - enable a column replication by the * operator<br />Q = zeros(N,M); % user variable to determine when we have converged to a steady solution<br />thresh = 1e-3;<br />step = N;<br />last_step = inf;<br />iter = 0;<br />min_iter = 10;</p>
<p>% main convergence loop, assume gaussians are 1D<br />while ((( abs((step/last_step)-1) > thresh) & (step>(N*eps)) ) | (iter<min_iter) ) <br /> <br /> % E step<br /> % ========<br /> Q = Z;<br /> P = C ./ (Ic*sqrt(sig2)) .* exp( -((X*Ir - Ic*u).^2)./(2*Ic*sig2) );<br /> for m = 1:M<br /> Z(:,m) = (P(:,m)*t(m))./(P*t(:));<br /> end<br /> <br /> % estimate convergence step size and update iteration number<br /> prog_text = sprintf(repmat( '\b',1,(iter>0)*12+ceil(log10(iter+1)) ));<br /> iter = iter + 1;<br /> last_step = step * (1 + eps) + eps;<br /> step = sum(sum(abs(Q-Z)));<br /> fprintf( '%s%d iterations\n',prog_text,iter );</p>
<p>% M step<br /> % ========<br /> Zm = sum(Z); % sum each column<br /> Zm(find(Zm==0)) = eps; % avoid devision by zero<br /> u = (X')*Z ./ Zm;<br /> sig2 = sum(((X*Ir - Ic*u).^2).*Z) ./ Zm;<br /> t = Zm/N;<br />end</p>
<p>sig = sqrt( sig2 );</p>
<p>return<br />end</p>
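<p>% ---- Usage illustration (our addition, not part of the original file) ----<br />% fit_mix_gaussian is only invoked when PeakFinder flags no peaks at all; it<br />% then supplies a single mean level by EM. A hedged stand-alone example with<br />% a bimodal sample (X as an Nx1 vector, per the function's help text):<br />%<br />% X = [0.3 + 0.02*randn(500,1); 0.7 + 0.02*randn(500,1)];<br />% [u,sig,t,iter] = fit_mix_gaussian(X,2); % expect u near [0.3 0.7]<br />% % t holds the mixing fractions (about [0.5 0.5] here), iter the EM steps</p>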
<p class="AbstractText">NoRSE was developed to analyze high-frequency data sets collected from multi-state, dynamic experiments, such as molecular adsorption and desorption onto carbon nanotubes. As technology improves sampling frequency, these stochastic data sets become increasingly large with faster dynamic events. More efficient algorithms are needed to accurately locate the unique states in each time trace. NoRSE adapts and optimizes a previously published noise reduction algorithm (Chung <i>et al</i>., 1991) and uses a custom peak flagging routine to rapidly identify unique event states. The algorithm is explained using experimental data from our lab and its fitting accuracy and efficiency are then shown with a generalized model of stochastic data sets. The algorithm is compared to another recently published state finding algorithm and is found to be 27 times faster and more accurate over 55% of the generalized experimental space. NoRSE is written as an M-file for Matlab.</p>
<p class="AbstractText">Link to paper: <a href="http://bioinformatics.oxfordjournals.org/content/early/2011/11/14/bioinformatics.btr632"></a><a href="http://bioinformatics.oxfordjournals.org/content/early/2011/11/14/bioinformatics.btr632">http://bioinformatics.oxfordjournals.org/content/early/2011/11/14/bioinformatics.btr632</a></p>
<p class="AbstractText">Supplied code: NoRSE Algorithm for MATLAB</p>
<p>function NoRSE<br />% Originally Coded by Nigel F. Reuel 4.01.2011<br />% Massachusetts Institue of Technology<br />% Department of Chemical Engineering<br />%<br />% Last Updated 10.27.2011<br />%<br />% NoRSE was developed to analyze high-frequency data sets collected <br />% from multi-state, dynamic experiments, such as molecular adsorption <br />% and desorption onto carbon nanotubes. Full description available in the<br />% Bioinformatics publication entitled, "NoRSE: Noise Reduction and State <br />% Evaluator for High-Frequency Single Event Traces" by NF Reuel et al.<br />%<br />% The code is written to be as general as possible, but there are a few<br />% parameters that can be modified by the user to improve fitting to their<br />% specific data sets. See lines quoted below for further discussion of <br />% these parameters.<br />%<br />% ^^^ User specified parameters in the noise reduction subroutine ^^^<br />%<br />% 1. p (line 163)<br />% 2. M (line 162)<br />% 3. N (line 160)<br />%<br />% ^^^ User specified parameters in the peak finder subroutine ^^^<br />%<br />% 1. nbins (line 65)<br />% 2. SW (line 262)<br />% 3. MoleL (line 297)<br />% 4. SigBinD (line 413)<br />%<br />% ----- Step 1: Read in the Traces (can normalize here if desired) --------<br />% <br />% Data Columns = Trace number , Data Rows = Time steps<br />%<br />RTD = csvread('Data.csv');<br />%Determine the number of time steps (Ns) and number of transducers (Nt):<br />[Ns,Nt]= size(RTD);<br />RTD_n = RTD;<br />% Create empty matrix to recieve the noise reduced traces:<br />M = zeros(Ns,Nt);<br />%<br />% ---------------- Step 2: Denoise each of the Traces --------------------<br />%<br />%The denoising algorithm loses the first and last time steps, so set a new<br />%time window:<br />time = Ns - 2;<br />%Create an empty matrix to recieve the denoised trace data:<br />T_denoise = zeros(time,Nt);<br />for i = 1:Nt<br /> NT = RTD_n(:,i);<br /> IC = feval(@NoiseReduc,NT);<br /> T_denoise(:,i) = IC(:,1);<br />end<br />% csvwrite(['T_denoise',int2str(iii),'.csv'],T_denoise);<br />%<br />%---------------- Step 3: Construct APH for each trace ------------------<br />%<br />% Number of bins for the histogram construction. We found 200 works well<br />% for experimental traces of ~1000 steps in length. 
You may need to<br />% decrease this for shorter traces or increase for longer.<br />%<br />% ^^^ NOTE: User can change following parameter to improve fit ^^^<br />%<br />nbins = 200;<br />%<br />%Create empty recieving matrices for histogram counts (nM) and the bin<br />%centers (xM)<br />nM = zeros(Nt,nbins);<br />xM = zeros(Nt,nbins);<br />for i = 1:Nt<br /> Y = T_denoise(:,i);<br /> [n,xout] = hist(Y,nbins);<br /> nM(i,:) = n(1,:);<br /> xM(i,:) = xout(1,:);<br />end<br />%xlswrite('nM',nM);<br />%xlswrite('xM',xM);<br />%<br />%---------------- Step 4: Find peaks for each APH histrogram ------------<br />%<br />%Use PeakFinder Function to find the Average Peak Locations (APL) - these<br />%translate to the bin number <br />APL = feval(@PeakFinder,nM,Ns);<br />[Nt,Nlev] = size(APL);<br />%<br />%Translate the APL bin numbers back to the normalized intensity level to<br />%determine the stochastic step levels (SLL)<br />SLL = zeros(Nt,Nlev);<br />for i = 1:Nt<br /> % Very rarely, the peak finder will flag no bins as peaks, in this case<br /> % we will use a general gaussian fit of the denoised data to find a<br /> % mean step value.<br /> if sum(APL(i,:)) == 0<br /> Ncurves = 1;<br /> X = T_denoise(i,:);<br /> [u,~,~,~] = feval(@fit_mix_gaussian,X,Ncurves);<br /> SLL(i,1) = u;<br /> else<br /> for j = 1:Nlev<br /> if APL(i,j) ~= 0<br /> N_bin = APL(i,j);<br /> SLL(i,j) = xM(i,N_bin);<br /> end<br /> end<br /> end<br />end<br />%<br />%----------- Step 5 Determine the NoRSE Fit Trace ---------------<br />%<br />for ii = 1:Nt<br /> RealT = T_denoise(:,ii);<br /> count = 0;<br /> for m = 1:Nlev<br /> if SLL(ii,m) > 0<br /> count = count + 1;<br /> end<br /> end<br /> for i = 1:time<br /> compare = zeros(count,1);<br /> for j = 1:count<br /> compare(j,1) = abs(RealT(i,1)-SLL(ii,j));<br /> end<br /> [C,I] = min(compare);<br /> M(i+1,ii,1) = SLL(ii,I);<br /> end<br /> % Because the noise reduction program takes away the first and last time<br /> % point, estimate their values as those immediately following and<br /> % proceeding respectively<br /> M(1,ii,1) = M(2,ii,1); <br /> M(time+2,ii,1) = M(time+1,ii,1);<br />end<br />% Record the noise-reduced traces in a csv format:<br />csvwrite('NRData.csv',M)<br />end<br />%<br />%<br />%<br />%<br />% <--------------Functions embeded in the Trace Analyzer ------------><br />%<br />function IC = NoiseReduc(NT)<br />%<br />%Coded by Nigel Reuel on 8.31.2010<br />%<br />%Adaptation of Chung and Kennedy (1991) forward-backward non-linear <br />%filtering technique to reduce the noise of our<br />%fluorescent signal in an attempt to resolve the single molecule events<br />%amidst the noise.<br />%<br />%Input: Takes supplied noisy trace (NT)<br />%Output: Returns cleaned trace without the first and last data point<br />%<br />time = length(NT);<br />%<br />%Set algorithm parameters (discussion on this in Bioinformatics paper - Supplement A):<br />%<br />% ^^^ NOTE: User can change following parameter to improve fit ^^^<br />%<br />N = [4 8 16 32]; %Sampling window length<br />K = length(N); %Number of sampling windows<br />M = 10; %Weighting parameter length<br />P = 40; %Parameter that effects sharpness of steps<br />%<br />% -------- The Forward-Backward non-linear Algorithm -----------<br />%<br />%First - Run each time step and store the average forward and backward<br />%predictors for each of the sampling windows<br />%<br />I_avg_f = zeros(time,K);<br />I_avg_b = zeros(time,K);<br />for g = 1:time<br /> %Run for each sampling window<br /> for k = 
1:K<br /> %Solve for the average forward predictor (w/ some<br /> %logic to help solve the beginning of the trace<br /> window = N(k);<br /> if g == 1<br /> I_avg_f(g,k) = NT(1,1);<br /> elseif g - window - 1 < 0 <br /> I_avg_f(g,k) = sum(NT(1:g-1,1))/g;<br /> else <br /> epoint = g - window;<br /> spoint = g - 1;<br /> I_avg_f(g,k) = sum(NT(epoint:spoint,1))/window;<br /> end<br /> %Now do the same for the backward predictor<br /> if g == time<br /> I_avg_b(g,k) = NT(g,1);<br /> elseif g + window > time<br /> sw = time - g;<br /> I_avg_b(g,k) = sum(NT(g+1:time,1))/sw;<br /> else <br /> epoint = g + window;<br /> spoint = g + 1;<br /> I_avg_b(g,k) = sum(NT(spoint:epoint,1))/window;<br /> end<br /> end<br />end<br />%<br />%Second - Solve for non-normalized forward and backward weights:<br />f = zeros(time,K);<br />b = zeros(time,K);<br />for i = 1:time<br /> for k = 1:K<br /> Mstore_f = zeros(M,1);<br /> Mstore_b = zeros(M,1);<br /> for j = 0:M-1<br /> t_f = i - j;<br /> t_b = i + j;<br /> if t_f < 1<br /> Mstore_f(j+1,1) = (NT(i,1) - I_avg_f(i,k))^2;<br /> else<br /> Mstore_f(j+1,1) = (NT(t_f,1) - I_avg_f(t_f,k))^2;<br /> end<br /> if t_b > time<br /> Mstore_b(j+1,1) = (NT(i,1) - I_avg_b(i,k))^2;<br /> else<br /> Mstore_b(j+1,1) = (NT(t_b,1) - I_avg_b(t_b,k))^2;<br /> end<br /> end<br /> f(i,k) = sum(Mstore_f)^(-P);<br /> b(i,k) = sum(Mstore_b)^(-P);<br /> end<br />end<br />% Third - Solve for vector of normalization factors for the weights<br />C = zeros(time,1);<br />for i = 1:time<br /> Kstore = zeros(K,1);<br /> for k = 1:K<br /> Kstore(k,1) = f(i,k) + b(i,k);<br /> end<br /> C(i,1) = 1/sum(Kstore);<br />end<br />% Fourth and final step - Put all parameters together to solve for the<br />% intensities<br />Iclean = zeros(time,1);<br />for i = 1:time<br /> TempSum = zeros(K,1);<br /> for k = 1:K<br /> TempSum(k,1) = f(i,k)*C(i,1)*I_avg_f(i,k) + b(i,k)*C(i,1)*I_avg_b(i,k);<br /> end<br /> Iclean(i,1) = sum(TempSum);<br />end<br />IC = Iclean(2:time-1,1);<br />return<br />end</p>
<p>function PeakLocAvg = PeakFinder(nM,Ns)<br />% Coded by Nigel Reuel on 9.15.2010 updated in March 2011 by NFR<br />% This function finds the peaks of a histogram using a modified running<br />% average algorithm + peak flagging routine<br />%<br />[nT,nB] = size(nM);<br />% <br />% Size of window for running average algorithm (increasing this causes more<br />% of a rough approximation of peaks) User may need to vary for their <br />% specific data set)<br />%<br />% ^^^ NOTE: User can change following parameter to improve fit ^^^<br />%<br />SW = 3;<br />%<br />% Centered running average algorithm to help find peaks:<br />% <br />T_ra = zeros(nT,nB);<br />for j = 1:nT<br /> for i = 1:nB<br /> if i < (SW + 1)<br /> T_ra(j,i) = sum(nM(j,1:i+SW))/(i+SW);<br /> elseif i >= (SW+1) && i <= (nB-SW)<br /> T_ra(j,i) = sum(nM(j,i-SW:i+SW))/(SW*2+1);<br /> else<br /> T_ra(j,i) = sum(nM(j,i:nB))/(nB-i+1);<br /> end<br /> end<br />end<br />%xlswrite('T_ra',T_ra);<br />%<br />%----------------------Peak finding algorithm---------------------------<br />%<br />% Base function on physical example of climbing up and down hills. Drop<br />% flags on supposed peaks. Eliminate "peaks" that are molehills. Combine<br />% peaks that are close neighbors.<br />Flags = zeros(nT,nB,6);<br />% Layer 1 = Peak Flag<br />% Layer 2 = Right Valley Bin#<br />% Layer 3 = Left Valley Bin#<br />% Layer 4 = Peak magnitude (Histogram Count)<br />% Layer 5 = Right Valley magnitude (Histogram Count)<br />% Layer 6 = Left Valley magnitude (Histogram Count)<br />% Molehill level (number of responses that are not significant - some fraction of the avg response)<br />%<br />%<br />% ^^^ NOTE: User can change following parameter to improve fit ^^^<br />%<br />MoleL = (Ns/nB/SW)*.5; % = #responses/#bins/SW ~= 1/2 of the Avg. Response - no significance. 
<br />% The multiplied factor ('0.5') can be decreased to allow smaller peak heights to be deemed 'significant' <br />% or increased to allow for only the highest peaks to be flagged.<br />%<br />PeakLocAvg = zeros(nT,2);<br />for i = 1:nT<br /> for j = 1:nB<br /> if T_ra(i,j) > MoleL<br /> % Logic to find significant peaks at endpoints (and the R/L<br /> % valleys)<br /> if j == 1 && T_ra(i,j) > T_ra(i,j+1)<br /> Flags(i,j,1) = 1;<br /> Flags(i,j,4) = T_ra(i,j);<br /> Pval = T_ra(i,j);<br /> Pnext = T_ra(i,j+1);<br /> index = j;<br /> % Find right hand valley:<br /> while Pval > Pnext<br /> index = index + 1;<br /> Pval = T_ra(i,index);<br /> Pnext = T_ra(i,index+1);<br /> end<br /> Flags(i,j,2) = index;<br /> Flags(i,j,5) = T_ra(i,index);<br /> elseif j == 1 && T_ra(i,j) <= T_ra(i,j+1)<br /> Flags(i,j,1) = 0;<br /> elseif j == nB && T_ra(i,j) > T_ra(i,j-1)<br /> Flags(i,j,1) = 1;<br /> Flags(i,j,4) = T_ra(i,j);<br /> Pval = T_ra(i,j);<br /> Pprior = T_ra(i,j-1);<br /> index = j;<br /> % Find left hand valley:<br /> while Pval > Pprior<br /> index = index - 1;<br /> Pval = T_ra(i,index);<br /> Pprior = T_ra(i,index-1);<br /> end<br /> Flags(i,j,3) = index;<br /> Flags(i,j,6) = T_ra(i,index);<br /> elseif j == nB && T_ra(i,j) <= T_ra(i,j-1)<br /> Flags(i,j,1) = 0;<br /> % Logic to find peaks at all midpoints (and their R/L valleys)<br /> elseif T_ra(i,j) >= T_ra(i,j-1) && T_ra(i,j) >= T_ra(i,j+1)<br /> Flags(i,j,1) = 1;<br /> Flags(i,j,4) = T_ra(i,j);<br /> % Find left hand valley:<br /> Pval = T_ra(i,j);<br /> Pprior = T_ra(i,j-1);<br /> index = j;<br /> if j > 2<br /> Switch = 1;<br /> else<br /> Switch = 0;<br /> index = 1;<br /> end<br /> while Pval > Pprior && Switch == 1;<br /> index = index - 1;<br /> Pval = T_ra(i,index);<br /> Pprior = T_ra(i,index-1);<br /> if index == 2<br /> Switch = 0;<br /> index = 1;<br /> end<br /> end<br /> Flags(i,j,3) = index;<br /> Flags(i,j,6) = T_ra(i,index);<br /> % Find right hand valley:<br /> Pval = T_ra(i,j);<br /> Pnext = T_ra(i,j+1);<br /> index = j;<br /> if j < nB - 1<br /> Switch = 1;<br /> else<br /> Switch = 0;<br /> index = nB;<br /> end<br /> while Pval > Pnext && Switch == 1;<br /> index = index + 1;<br /> Pval = T_ra(i,index);<br /> Pnext = T_ra(i,index+1);<br /> if index == nB - 1;<br /> Switch = 0;<br /> index = nB;<br /> end<br /> end<br /> Flags(i,j,2) = index;<br /> Flags(i,j,5) = T_ra(i,index);<br /> end<br /> end<br /> end<br /> %{<br />X = (1:200)';<br />Y1 = Flags(i,:,1)';<br />Y2 = T_ra(i,:);<br />plot(X,Y1,X,Y2)<br /> %}<br />%Make a vector containing the peak locations:<br />PeakVec = 0;<br />count = 1;<br /> for j = 1:nB<br /> if Flags(i,j,1) == 1<br /> PeakVec(1,count) = j;<br /> count = count + 1;<br /> end<br /> end<br /> % Now analyze peak pairs:<br /> % Specify significant peak valley distance and significant bin<br /> % distance:<br /> %<br /> % ^^^ NOTE: User can change following parameter to improve fit. 
We found this parameter is analogous to that specified in line 297 ^^^<br /> %<br /> SigD = MoleL; <br /> %<br /> % ^^^ NOTE: User can change following parameter to improve fit ^^^<br /> %<br /> SigBinD = nB*(.025); % = 2.5% of the total bin #.<br /> % Decreasing the multiplied factor (0.025) will allow for peaks closer<br /> % together to remain unique, increasing the factor causes more of the<br /> % flagged peaks to be unified into single 'significant' peaks.<br /> %<br /> %<br /> Npeaks = length(PeakVec);<br /> % Determine the peak pairwise interactions...each will have a right<br /> % hand value and left hand value (except for the endpoints):<br /> % 1: Unique<br /> % 2: Combine<br /> % 3: Ignore<br /> % Create a matrix to recieve these calculations:<br /> PeakCode = zeros(2,Npeaks); % Row1 = to right, Row2 = to left<br /> for j = 1:Npeaks-1<br /> % Determine the peak bin numbers:<br /> PBN1 = PeakVec(1,j);<br /> PBN2 = PeakVec(1,j+1);<br /> % Calcualate the valley distances (left to right)<br /> VD1 = abs(Flags(i,PBN1,4) - Flags(i,PBN1,5)); <br /> VD2 = abs(Flags(i,PBN2,6) - Flags(i,PBN2,4));<br /> %<br /> if VD1 <= SigD && VD2 <= SigD <br /> % Combine<br /> PeakCode(1,j) = 2;<br /> PeakCode(2,j+1) = 2;<br /> elseif VD1 >= SigD && VD2 <= SigD<br /> % Left Peak sig, right peak ignore<br /> PeakCode(1,j) = 1;<br /> PeakCode(2,j+1) = 3;<br /> elseif VD1 <= SigD && VD2 >= SigD<br /> % Left peak ignore, right peak significant<br /> PeakCode(1,j) = 3;<br /> PeakCode(2,j+1) = 1;<br /> elseif VD1 >= SigD && VD2 >= SigD<br /> % Both peaks are significant<br /> PeakCode(1,j) = 1;<br /> PeakCode(2,j+1) = 1;<br /> end<br /> end<br /> % Logic for the peak endpoints<br /> Left = abs(Flags(i,PeakVec(1,1),4) - Flags(i,PeakVec(1,1),6));<br /> if Left >= SigD<br /> PeakCode(2,1) = 1;<br /> else<br /> PeakCode(2,1) = 3;<br /> end<br /> Right = abs(Flags(i,PeakVec(1,Npeaks),4) - Flags(i,PeakVec(1,Npeaks),5));<br /> if Right >= SigD<br /> PeakCode(1,Npeaks) = 1;<br /> else<br /> PeakCode(1,Npeaks) = 3;<br /> end<br /> % Now determine what to do with the peaks. Look at combinations<br /> % reading left to right. Put signifcant peaks into a vector.<br /> SPV = 0;<br /> j = 1;<br /> count = 1;<br /> while j <= Npeaks<br /> C1 = PeakCode(1,j);<br /> C2 = PeakCode(2,j);<br /> if C1 == 1 && C2 == 1<br /> % This is a bonafide unique peak. 
Put it in the vector.<br /> SPV(1,count) = PeakVec(1,j);<br /> count = count + 1;<br /> j = j+1;<br /> elseif C1 == 2 <br /> % This peak will be merged with other peak(s)<br /> count2 = 2;<br /> Merge = PeakVec(1,j);<br /> while C1 == 2<br /> j = j + 1;<br /> C1 = PeakCode(1,j);<br /> Merge(1,count2) = PeakVec(1,j);<br /> count2 = count2 + 1;<br /> end<br /> % Insert logic here to eliminate valleys and slow rises (see<br /> % what is happening at the significant enpoints!)<br /> End1 = Flags(i,Merge(1,1),4) - Flags(i,Merge(1,1),6);<br /> End2 = Flags(i,Merge(1,end),4) - Flags(i,Merge(1,end),5);<br /> Distance = Merge(1,end) - Merge(1,1);<br /> if End1 >= SigD && End2 >= SigD<br /> SPV(1,count) = round(mean(Merge));<br /> count = count + 1;<br /> j = j + 1;<br /> elseif End1 >= SigD && End2 <= (SigD)*(-1) && Distance >= SigBinD<br /> SPV(1,count) = round(mean(Merge));<br /> count = count + 1;<br /> j = j + 1;<br /> elseif End1 <= SigD*(-1) && End2 >= (SigD) && Distance >= SigBinD<br /> SPV(1,count) = round(mean(Merge));<br /> count = count + 1;<br /> j = j + 1;<br /> end<br /> else % Represents 1/3, 3/1, and 3/3 combos which are all ignored!!<br /> j = j + 1;<br /> end<br /> end<br /> Nsigpeaks = length(SPV);<br /> for j = 1:Nsigpeaks<br /> PeakLocAvg(i,j) = SPV(1,j);<br /> end<br /> %{<br /> % Plot the PeakLocAvg to see how the algorithm performs)<br /> S = Nsigpeaks;<br /> FF = zeros(1,nB);<br /> for k = 1:S<br /> Temp = PeakLocAvg(i,k);<br /> FF(1,Temp) = 1;<br /> end<br /> X = (1:200)';<br /> Y1 = FF(1,:)';<br /> Y2 = T_ra(i,:);<br /> plot(X,Y1,X,Y2)<br /> stop = 1;<br /> %} <br />end</p>
<p><br />return<br />end</p>
<p>function [u,sig,t,iter] = fit_mix_gaussian( X,M )<br />%<br />% Gaussian Fit Function by Ohad Gal (c) 2003<br />% Download at:<br />% <a href="http://www.mathworks.com/matlabcentral/fileexchange/4222-a-collection-of-fitting-functions">http://www.mathworks.com/matlabcentral/fileexchange/4222-a-collection-of-fitting-functions</a><br />% on 3.21.2011<br />%<br />% fit_mix_gaussian - fit parameters for a mixed-gaussian distribution using EM algorithm<br />%<br />% format: [u,sig,t,iter] = fit_mix_gaussian( X,M )<br />%<br />% input: X - input samples, Nx1 vector<br />% M - number of gaussians which are assumed to compose the distribution<br />%<br />% output: u - fitted mean for each gaussian<br />% sig - fitted standard deviation for each gaussian<br />% t - probability of each gaussian in the complete distribution<br />% iter- number of iterations done by the function<br />%<br />% initialize and initial guesses<br />N = length( X );<br />Z = ones(N,M) * 1/M; % indicators vector<br />P = zeros(N,M); % probabilities vector for each sample and each model<br />t = ones(1,M) * 1/M; % distribution of the gaussian models in the samples<br />u = linspace(min(X),max(X),M); % mean vector<br />sig2 = ones(1,M) * var(X) / sqrt(M); % variance vector<br />C = 1/sqrt(2*pi); % just a constant<br />Ic = ones(N,1); % - enable a row replication by the * operator<br />Ir = ones(1,M); % - enable a column replication by the * operator<br />Q = zeros(N,M); % user variable to determine when we have converged to a steady solution<br />thresh = 1e-3;<br />step = N;<br />last_step = inf;<br />iter = 0;<br />min_iter = 10;</p>
<p>% main convergence loop, assume gaussians are 1D<br />while ((( abs((step/last_step)-1) > thresh) & (step>(N*eps)) ) | (iter<min_iter) ) <br /> <br /> % E step<br /> % ========<br /> Q = Z;<br /> P = C ./ (Ic*sqrt(sig2)) .* exp( -((X*Ir - Ic*u).^2)./(2*Ic*sig2) );<br /> for m = 1:M<br /> Z(:,m) = (P(:,m)*t(m))./(P*t(:));<br /> end<br /> <br /> % estimate convergence step size and update iteration number<br /> prog_text = sprintf(repmat( '\b',1,(iter>0)*12+ceil(log10(iter+1)) ));<br /> iter = iter + 1;<br /> last_step = step * (1 + eps) + eps;<br /> step = sum(sum(abs(Q-Z)));<br /> fprintf( '%s%d iterations\n',prog_text,iter );</p>
<p>% M step<br /> % ========<br /> Zm = sum(Z); % sum each column<br /> Zm(find(Zm==0)) = eps; % avoid devision by zero<br /> u = (X')*Z ./ Zm;<br /> sig2 = sum(((X*Ir - Ic*u).^2).*Z) ./ Zm;<br /> t = Zm/N;<br />end</p>
<p>sig = sqrt( sig2 );</p>
NoRSE and SFA Comparison Code

<p>function NoRSE_SFA_Comparison<br />% Finished by Nigel F. Reuel on 3.31.2011<br />% Massachusetts Institute of Technology<br />% Department of Chemical Engineering<br />%<br />% This code uses the generalized model of stochastic data sets to generate<br />% traces with embedded event states. It is currently set to compare NoRSE<br />% with the SFA algorithm. For each parameter set (A,B,C) it does the<br />% following:<br />%<br />% 1) Generates 100 traces<br />% 2) Fits traces with NoRSE<br />% 3) Fits traces with SFA<br />% 4) Computes average error for each algorithm<br />% 5) Reports the average times to run the 100 traces through NoRSE and SFA<br />%<br />% The errors and run times are then recorded in a *.csv file for further<br />% analysis.<br />%<br />% The program goes as follows:<br />%<br />% 1) Specify the trace generator parameters<br />%<br />A = 1;<br />B = [5 7 10 15 20 35 50 75 100 125 150 175 200 250 300 400 500];<br />C = [0.1 .15 .2 .3 .35 .4 .45 .5 .6 .7 .8 .85 .9 1 2 3 4 5 6 7 8 9 10];</p>
<p>% Number of traces: <br />nt = 100;<br />% Number of steps:<br />ns = 1000;<br />% Answer matrices:<br />NorseQual = zeros(length(B),length(C));<br />FSQual = zeros(length(B),length(C));<br />TimeN = zeros(length(B),length(C));<br />TimeFS = zeros(length(B),length(C));<br />%<br />for i = 1:length(B)<br /> for j = 1:length(C)<br /> b = B(i);<br /> c = C(j);<br /> disp(['Analyzer started for condition B = ',num2str(b),' and C = ',num2str(c)])<br /> % 2) Generate Traces<br /> M = feval(@GenTrace,A,b,c,nt,ns);<br /> % 3) Generate Fits<br /> [M2,T] = feval(@TraceFitter,M(:,:,2));<br /> % 4) Compare Fits<br /> % First you must specify the tolerance of fit:<br /> TF = 0.02; %<--- Remember this is on a scale of 0 to 1<br /> % Reset the count of bad NoRSE points and bad FS points:<br /> BadN = 0;<br /> BadFS = 0;<br /> for k = 1:nt<br /> for l = 1:ns<br /> if abs(M2(l,k,1)-M(l,k,1))> TF<br /> BadN = BadN + 1;<br /> end<br /> if abs(M2(l,k,2)-M(l,k,1))> TF<br /> BadFS = BadFS + 1;<br /> end<br /> end<br /> end<br /> % Now compute the percent error:<br /> %disp('End of analysis. Run time and results below.')<br /> TotalSteps = nt*ns;<br /> PerBN = BadN/TotalSteps;<br /> PerBFS = BadFS/TotalSteps;<br /> NorseQual(i,j) = PerBN;<br /> FSQual(i,j) = PerBFS;<br /> TimeN(i,j) = T(1,1);<br /> TimeFS(i,j) = T(1,2);<br /> end<br />end<br />% These are the output files:</p>
<p>csvwrite('BadFitPercentN.csv',NorseQual)<br />csvwrite('BadFitPercentFS.csv',FSQual)<br />csvwrite('TimeN.csv',TimeN)<br />csvwrite('TimeFS.csv',TimeFS)<br /> <br /> <br />return<br />end</p>
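<p>% ---- Illustration (our addition, not part of the original file) ----------<br />% The error metric used in step 4 above, written in vectorized form: the<br />% fraction of time points whose fitted level is more than TF away from the<br />% true generated level (M(:,:,1)), computed for the NoRSE fit (M2(:,:,1))<br />% and the SFA/FitStates fit (M2(:,:,2)).<br />%<br />% TF = 0.02; % tolerance of fit, on the 0-to-1 trace scale<br />% PerBN = sum(sum(abs(M2(:,:,1) - M(:,:,1)) > TF))/(nt*ns);<br />% PerBFS = sum(sum(abs(M2(:,:,2) - M(:,:,1)) > TF))/(nt*ns);</p>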
<p>function [M,T] = TraceFitter(RTD)<br />% This function fits the generated traces with both the NoRSE program and<br />% the fit states program.<br />% ----- Step 1: Read in and the Traces --------<br />%<br />%Determine the number of time steps (Ns) and number of transducers (Nt):<br />[Ns,Nt]= size(RTD);<br />RTD_n = RTD;<br />%<br />M = zeros(Ns,Nt,4); % First level Norse fit, second level FS<br />T = [0 0]; % 1 = NoRSE Time, 2 = FS time</p>
<p>%############# NoRSE ALGORITHM ############################################<br />%<br />% ---------------- Step 2: Denoise each of the Traces --------------------<br />%<br />tic<br />%The denoising algorithm loses the first and last time steps, so set a new<br />%time window:<br />time = Ns - 2;<br />%Create an empty matrix to recieve the denoised trace data:<br />T_denoise = zeros(time,Nt);<br />for i = 1:Nt<br /> NT = RTD_n(:,i);<br /> IC = feval(@NoiseReduc,NT);<br /> T_denoise(:,i) = IC(:,1);<br />end<br />% csvwrite(['T_denoise',int2str(iii),'.csv'],T_denoise);<br />%<br />%---------------- Step 3: Construct APH for each trace ------------------<br />%<br />%Number of bins for the histogram construction?<br />nbins = 200;<br />%Create empty recieving matrices for histogram counts (nM) and the bin<br />%centers (xM)<br />nM = zeros(Nt,nbins);<br />xM = zeros(Nt,nbins);<br />for i = 1:Nt<br /> Y = T_denoise(:,i);<br /> [n,xout] = hist(Y,nbins);<br /> nM(i,:) = n(1,:);<br /> xM(i,:) = xout(1,:);<br />end<br />%xlswrite('nM',nM);<br />%xlswrite('xM',xM);<br />%<br />%---------------- Step 4: Find peaks for each APH histrogram ------------<br />%<br />%Use PeakFinder Function to find the Average Peak Locations (APL) - these<br />%translate to the bin number <br />APL = feval(@PeakFinder,nM);<br />[Nt,Nlev] = size(APL);<br />%<br />%Translate the APL bin numbers back to the normalized intensity level to<br />%determine the stochastic step levels (SLL)<br />SLL = zeros(Nt,Nlev);<br />for i = 1:Nt<br /> % Very rarely, the peak finder will flag no bins as peaks, in this case<br /> % we will use a general gaussian fit of the denoised data to find a<br /> % mean step value.<br /> if sum(APL(i,:)) == 0<br /> Ncurves = 1;<br /> X = T_denoise(i,:);<br /> [u,~,~,~] = feval(@fit_mix_gaussian,X,Ncurves);<br /> SLL(i,1) = u;<br /> else<br /> for j = 1:Nlev<br /> if APL(i,j) ~= 0<br /> N_bin = APL(i,j);<br /> SLL(i,j) = xM(i,N_bin);<br /> end<br /> end<br /> end<br />end<br />%<br />%----------- Step 5 Determine the NoRSE Fit Trace ---------------<br />%<br />for ii = 1:Nt<br /> RealT = T_denoise(:,ii);<br /> count = 0;<br /> for m = 1:Nlev<br /> if SLL(ii,m) > 0<br /> count = count + 1;<br /> end<br /> end<br /> for i = 1:time<br /> compare = zeros(count,1);<br /> for j = 1:count<br /> compare(j,1) = abs(RealT(i,1)-SLL(ii,j));<br /> end<br /> [C,I] = min(compare);<br /> M(i+1,ii,1) = SLL(ii,I);<br /> end<br /> % Because the noise reduction program takes away the first and last time<br /> % point, estimate their values as those immediately following and<br /> % proceeding respectively<br /> M(1,ii,1) = M(2,ii,1); <br /> M(time+2,ii,1) = M(time+1,ii,1);<br />end<br />%<br />T(1,1) = toc;<br />%<br />%<br />% #############################################<br />%---------- FitStates Analyzer ---------------- (Added 1.15.2011 by NFR)<br />% #############################################<br />%<br />%<br />tic<br />forward_trans = 0;<br />NonNormTraces = RTD;<br />[num_rows, num_cols] = size(NonNormTraces);<br />Ftime = 1:1:num_rows;<br />% Make empty matrix to recieve the traces and their fits<br />count = 0;<br />GoodTrace2 = 0;<br />for i = 1:num_cols <br /> Tracemax = max(NonNormTraces(:,i)); <br /> Tracemin = min(NonNormTraces(:,i));<br /> data3 = 1000*(NonNormTraces(:,i) - Tracemin)/(Tracemax - Tracemin);<br /> data2 = 1000 - data3;<br /> MaxNumofStates = 30;<br /> maxN = 30;<br /> [NumofStates, BestFitTraces] = fitStates(maxN, Ftime, data2, data3, MaxNumofStates);<br /> M(:,i,2) = BestFitTraces; % 
Best fit trace from Fit States<br />end <br />T(1,2) = toc;<br />return<br />end<br />%<br />%<br />% <--------------Functions embeded in the Trace Analyzer ------------><br />%<br />function IC = NoiseReduc(NT)<br />%<br />%Coded by Nigel Reuel on 8.31.2010<br />%<br />%Adaptation of Chung and Kennedy (1991) forward-backward non-linear <br />%filtering technique to reduce the noise of our<br />%fluorescent signal in an attempt to resolve the single molecule events<br />%amidst the noise.<br />%<br />%Input: Takes supplied noisy trace (NT)<br />%Output: Returns cleaned trace without the first and last data point<br />%<br />time = length(NT);<br />%<br />%Set algorithm parameters:<br />N = [4 8 16 32]; %Sampling window length<br />K = length(N); %Number of sampling windows<br />M = 10; %Weighting parameter length<br />P = 40; %Parameter that effects sharpness of steps<br />%<br />% -------- The Forward-Backward non-linear Algorithm -----------<br />%<br />%First - Run each time step and store the average forward and backward<br />%predictors for each of the sampling windows<br />%<br />I_avg_f = zeros(time,K);<br />I_avg_b = zeros(time,K);<br />for g = 1:time<br /> %Run for each sampling window<br /> for k = 1:K<br /> %Solve for the average forward predictor (w/ some<br /> %logic to help solve the beginning of the trace<br /> window = N(k);<br /> if g == 1<br /> I_avg_f(g,k) = NT(1,1);<br /> elseif g - window - 1 < 0 <br /> I_avg_f(g,k) = sum(NT(1:g-1,1))/g;<br /> else <br /> epoint = g - window;<br /> spoint = g - 1;<br /> I_avg_f(g,k) = sum(NT(epoint:spoint,1))/window;<br /> end<br /> %Now do the same for the backward predictor<br /> if g == time<br /> I_avg_b(g,k) = NT(g,1);<br /> elseif g + window > time<br /> sw = time - g;<br /> I_avg_b(g,k) = sum(NT(g+1:time,1))/sw;<br /> else <br /> epoint = g + window;<br /> spoint = g + 1;<br /> I_avg_b(g,k) = sum(NT(spoint:epoint,1))/window;<br /> end<br /> end<br />end<br />%<br />%Second - Solve for non-normalized forward and backward weights:<br />f = zeros(time,K);<br />b = zeros(time,K);<br />for i = 1:time<br /> for k = 1:K<br /> Mstore_f = zeros(M,1);<br /> Mstore_b = zeros(M,1);<br /> for j = 0:M-1<br /> t_f = i - j;<br /> t_b = i + j;<br /> if t_f < 1<br /> Mstore_f(j+1,1) = (NT(i,1) - I_avg_f(i,k))^2;<br /> else<br /> Mstore_f(j+1,1) = (NT(t_f,1) - I_avg_f(t_f,k))^2;<br /> end<br /> if t_b > time<br /> Mstore_b(j+1,1) = (NT(i,1) - I_avg_b(i,k))^2;<br /> else<br /> Mstore_b(j+1,1) = (NT(t_b,1) - I_avg_b(t_b,k))^2;<br /> end<br /> end<br /> f(i,k) = sum(Mstore_f)^(-P);<br /> b(i,k) = sum(Mstore_b)^(-P);<br /> end<br />end<br />% Third - Solve for vector of normalization factors for the weights<br />C = zeros(time,1);<br />for i = 1:time<br /> Kstore = zeros(K,1);<br /> for k = 1:K<br /> Kstore(k,1) = f(i,k) + b(i,k);<br /> end<br /> C(i,1) = 1/sum(Kstore);<br />end<br />% Fourth and final step - Put all parameters together to solve for the<br />% intensities<br />Iclean = zeros(time,1);<br />for i = 1:time<br /> TempSum = zeros(K,1);<br /> for k = 1:K<br /> TempSum(k,1) = f(i,k)*C(i,1)*I_avg_f(i,k) + b(i,k)*C(i,1)*I_avg_b(i,k);<br /> end<br /> Iclean(i,1) = sum(TempSum);<br />end<br />IC = Iclean(2:time-1,1);<br />return<br />end</p>
<p>function PeakLocAvg = PeakFinder(nM)<br />% Coded by Nigel Reuel on 9.15.2010 updated in March 2011 by NFR<br />% This function finds the peaks of a histogram using a modified running<br />% average algorithm + peak flagging routine<br />%<br />[nT,nB] = size(nM);<br />% <br />%Size of window for running average algorithm (increasing this causes more<br />%of a rough approximation of peaks)<br />SW = 3;<br />% Centered running average algorithm to help find peaks:<br />% <br />T_ra = zeros(nT,nB);<br />for j = 1:nT<br /> for i = 1:nB<br /> if i < (SW + 1)<br /> T_ra(j,i) = sum(nM(j,1:i+SW))/(i+SW);<br /> elseif i >= (SW+1) && i <= (nB-SW)<br /> T_ra(j,i) = sum(nM(j,i-SW:i+SW))/(SW*2+1);<br /> else<br /> T_ra(j,i) = sum(nM(j,i:nB))/(nB-i+1);<br /> end<br /> end<br />end<br />%xlswrite('T_ra',T_ra);<br />%<br />%----------------------Peak finding algorithm---------------------------<br />%<br />% Base function on physical example of climbing up and down hills. Drop<br />% flags on supposed peaks. Eliminate "peaks" that are molehills. Combine<br />% peaks that are close neighbors.<br />Flags = zeros(nT,nB,6);<br />% Layer 1 = Peak Flag<br />% Layer 2 = Right Valley Bin#<br />% Layer 3 = Left Valley Bin#<br />% Layer 4 = Peak magnitude (Histogram Count)<br />% Layer 5 = Right Valley magnitude (Histogram Count)<br />% Layer 6 = Left Valley magnitude (Histogram Count)<br />% Molehill level (number of responses that are not significant - avg!)<br />MoleL = (5/SW)*.5; % = #responses/#bins/SW = Avg. Response, no significance)<br />PeakLocAvg = zeros(nT,2);<br />for i = 1:nT<br /> for j = 1:nB<br /> if T_ra(i,j) > MoleL<br /> % Logic to find significant peaks at endpoints (and the R/L<br /> % valleys)<br /> if j == 1 && T_ra(i,j) > T_ra(i,j+1)<br /> Flags(i,j,1) = 1;<br /> Flags(i,j,4) = T_ra(i,j);<br /> Pval = T_ra(i,j);<br /> Pnext = T_ra(i,j+1);<br /> index = j;<br /> % Find right hand valley:<br /> while Pval > Pnext<br /> index = index + 1;<br /> Pval = T_ra(i,index);<br /> Pnext = T_ra(i,index+1);<br /> end<br /> Flags(i,j,2) = index;<br /> Flags(i,j,5) = T_ra(i,index);<br /> elseif j == 1 && T_ra(i,j) <= T_ra(i,j+1)<br /> Flags(i,j,1) = 0;<br /> elseif j == nB && T_ra(i,j) > T_ra(i,j-1)<br /> Flags(i,j,1) = 1;<br /> Flags(i,j,4) = T_ra(i,j);<br /> Pval = T_ra(i,j);<br /> Pprior = T_ra(i,j-1);<br /> index = j;<br /> % Find left hand valley:<br /> while Pval > Pprior<br /> index = index - 1;<br /> Pval = T_ra(i,index);<br /> Pprior = T_ra(i,index-1);<br /> end<br /> Flags(i,j,3) = index;<br /> Flags(i,j,6) = T_ra(i,index);<br /> elseif j == nB && T_ra(i,j) <= T_ra(i,j-1)<br /> Flags(i,j,1) = 0;<br /> % Logic to find peaks at all midpoints (and their R/L valleys)<br /> elseif T_ra(i,j) >= T_ra(i,j-1) && T_ra(i,j) >= T_ra(i,j+1)<br /> Flags(i,j,1) = 1;<br /> Flags(i,j,4) = T_ra(i,j);<br /> % Find left hand valley:<br /> Pval = T_ra(i,j);<br /> Pprior = T_ra(i,j-1);<br /> index = j;<br /> if j > 2<br /> Switch = 1;<br /> else<br /> Switch = 0;<br /> index = 1;<br /> end<br /> while Pval > Pprior && Switch == 1;<br /> index = index - 1;<br /> Pval = T_ra(i,index);<br /> Pprior = T_ra(i,index-1);<br /> if index == 2<br /> Switch = 0;<br /> index = 1;<br /> end<br /> end<br /> Flags(i,j,3) = index;<br /> Flags(i,j,6) = T_ra(i,index);<br /> % Find right hand valley:<br /> Pval = T_ra(i,j);<br /> Pnext = T_ra(i,j+1);<br /> index = j;<br /> if j < nB - 1<br /> Switch = 1;<br /> else<br /> Switch = 0;<br /> index = nB;<br /> end<br /> while Pval > Pnext && Switch == 1;<br /> index = index + 1;<br /> 
Pval = T_ra(i,index);<br /> Pnext = T_ra(i,index+1);<br /> if index == nB - 1;<br /> Switch = 0;<br /> index = nB;<br /> end<br /> end<br /> Flags(i,j,2) = index;<br /> Flags(i,j,5) = T_ra(i,index);<br /> end<br /> end<br /> end<br /> %{<br />X = (1:200)';<br />Y1 = Flags(i,:,1)';<br />Y2 = T_ra(i,:);<br />plot(X,Y1,X,Y2)<br /> %}<br />%Make a vector containing the peak locations:<br />PeakVec = 0;<br />count = 1;<br /> for j = 1:nB<br /> if Flags(i,j,1) == 1<br /> PeakVec(1,count) = j;<br /> count = count + 1;<br /> end<br /> end<br /> % Now analyze peak pairs:<br /> % Specify significant peak valley distance and significant bin<br /> % distance:<br /> SigD = (5/SW)*.5; % = the average response<br /> SigBinD = 200*(.025); % = 2.5% of the total bin #.<br /> Npeaks = length(PeakVec);<br /> % Determine the peak pairwise interactions...each will have a right<br /> % hand value and left hand value (except for the endpoints):<br /> % 1: Unique<br /> % 2: Combine<br /> % 3: Ignore<br /> % Create a matrix to recieve these calculations:<br /> PeakCode = zeros(2,Npeaks); % Row1 = to right, Row2 = to left<br /> for j = 1:Npeaks-1<br /> % Determine the peak bin numbers:<br /> PBN1 = PeakVec(1,j);<br /> PBN2 = PeakVec(1,j+1);<br /> % Calcualate the valley distances (left to right)<br /> VD1 = abs(Flags(i,PBN1,4) - Flags(i,PBN1,5)); <br /> VD2 = abs(Flags(i,PBN2,6) - Flags(i,PBN2,4));<br /> %<br /> if VD1 <= SigD && VD2 <= SigD <br /> % Combine<br /> PeakCode(1,j) = 2;<br /> PeakCode(2,j+1) = 2;<br /> elseif VD1 >= SigD && VD2 <= SigD<br /> % Left Peak sig, right peak ignore<br /> PeakCode(1,j) = 1;<br /> PeakCode(2,j+1) = 3;<br /> elseif VD1 <= SigD && VD2 >= SigD<br /> % Left peak ignore, right peak significant<br /> PeakCode(1,j) = 3;<br /> PeakCode(2,j+1) = 1;<br /> elseif VD1 >= SigD && VD2 >= SigD<br /> % Both peaks are significant<br /> PeakCode(1,j) = 1;<br /> PeakCode(2,j+1) = 1;<br /> end<br /> end<br /> % Logic for the peak endpoints<br /> Left = abs(Flags(i,PeakVec(1,1),4) - Flags(i,PeakVec(1,1),6));<br /> if Left >= SigD<br /> PeakCode(2,1) = 1;<br /> else<br /> PeakCode(2,1) = 3;<br /> end<br /> Right = abs(Flags(i,PeakVec(1,Npeaks),4) - Flags(i,PeakVec(1,Npeaks),5));<br /> if Right >= SigD<br /> PeakCode(1,Npeaks) = 1;<br /> else<br /> PeakCode(1,Npeaks) = 3;<br /> end<br /> % Now determine what to do with the peaks. Look at combinations<br /> % reading left to right. Put signifcant peaks into a vector.<br /> SPV = 0;<br /> j = 1;<br /> count = 1;<br /> while j <= Npeaks<br /> C1 = PeakCode(1,j);<br /> C2 = PeakCode(2,j);<br /> if C1 == 1 && C2 == 1<br /> % This is a bonafide unique peak. 
Put it in the vector.<br /> SPV(1,count) = PeakVec(1,j);<br /> count = count + 1;<br /> j = j+1;<br /> elseif C1 == 2 <br /> % This peak will be merged with other peak(s)<br /> count2 = 2;<br /> Merge = PeakVec(1,j);<br /> while C1 == 2<br /> j = j + 1;<br /> C1 = PeakCode(1,j);<br /> Merge(1,count2) = PeakVec(1,j);<br /> count2 = count2 + 1;<br /> end<br /> % Insert logic here to eliminate valleys and slow rises (see<br /> % what is happening at the significant enpoints!)<br /> End1 = Flags(i,Merge(1,1),4) - Flags(i,Merge(1,1),6);<br /> End2 = Flags(i,Merge(1,end),4) - Flags(i,Merge(1,end),5);<br /> Distance = Merge(1,end) - Merge(1,1);<br /> if End1 >= SigD && End2 >= SigD<br /> SPV(1,count) = round(mean(Merge));<br /> count = count + 1;<br /> j = j + 1;<br /> elseif End1 >= SigD && End2 <= (SigD)*(-1) && Distance >= SigBinD<br /> SPV(1,count) = round(mean(Merge));<br /> count = count + 1;<br /> j = j + 1;<br /> elseif End1 <= SigD*(-1) && End2 >= (SigD) && Distance >= SigBinD<br /> SPV(1,count) = round(mean(Merge));<br /> count = count + 1;<br /> j = j + 1;<br /> end<br /> else % Represents 1/3, 3/1, and 3/3 combos which are all ignored!!<br /> j = j + 1;<br /> end<br /> end<br /> Nsigpeaks = length(SPV);<br /> for j = 1:Nsigpeaks<br /> PeakLocAvg(i,j) = SPV(1,j);<br /> end<br /> %{<br /> % Plot the PeakLocAvg to see how the algorithm performs)<br /> S = Nsigpeaks;<br /> FF = zeros(1,nB);<br /> for k = 1:S<br /> Temp = PeakLocAvg(i,k);<br /> FF(1,Temp) = 1;<br /> end<br /> X = (1:200)';<br /> Y1 = FF(1,:)';<br /> Y2 = T_ra(i,:);<br /> plot(X,Y1,X,Y2)<br /> stop = 1;<br /> %} <br />end</p>
<p><br />return<br />end</p>
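<p>For orientation, the sketch below shows one way PeakFinder might be called on the amplitude histograms built in Step 3; the two-level traces in Y are placeholders and are not part of the supplied code.</p>
<p>% Minimal usage sketch for PeakFinder (hypothetical data, not part of the<br />% original listing). Build a 200-bin amplitude histogram for each denoised<br />% trace, then flag its peaks; each row of APL holds the bin numbers of the<br />% significant peaks found for that trace (zero-padded).<br />Y = [0.2 + 0.02*randn(500,3); 0.8 + 0.02*randn(500,3)]; % two-level placeholder traces<br />nbins = 200;<br />nM = zeros(3,nbins);<br />for k = 1:3<br /> nM(k,:) = hist(Y(:,k),nbins); % histogram counts, one row per trace<br />end<br />APL = PeakFinder(nM);</p>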
<p>%% --------------------------- FIT STATES --------------------------------<br />function [NumofStates, BestFitTraces] = fitStates(maxN, time_vector, data2, data3, MaxNumofStates)<br />%This function determines the number of states for each trace</p>
<p>%INPUTS: <br />% maxN maximum number of transitions for each trace<br />% time_vector vector of times corresponding to each frame<br />% data2 second column of normalized data in *.dat files<br />% data3 third column of normalized data in *.dat files</p>
<p>%OUTPUTS<br />% NumofStates row vector containing number of states for each trace<br />% BestFitTraces matrix showing the current states occupied by each<br />% trace for each frame (crude approximation)</p>
<p>A{1} = time_vector';<br />[row, cols] = size(data2); %# rows = # timeframes, # cols = # SWNT<br />for q = 1:1:cols %Loop through each SWNT trace<br /> %redefine variable to match formatting used in StatesFinder<br /> A{2} = data2(:,q);<br /> A{3} = data3(:,q);</p>
<p>time=A{1};<br /> data=A{3}./1000;<br /> timeinc=time(2,1)-time(1,1);<br /> fulldata=[A{1} A{2} A{3}];</p>
<p>fitdata=data;<br /> antifit=data;</p>
<p>resultArray=zeros(maxN,3); %first column the begnning index. second column is the length. third colum is the step fit.<br /> resultArray(1,1)=1;<br /> resultArray(1,2)=length(data);<br /> resultArray(1,3)=mean(data);<br /> fitdata(:)=resultArray(1,3);<br /> chisquaredFit=zeros(maxN,1);<br /> chisquaredAntifit=zeros(maxN,1);<br /> chisquaredFit(1)=sum((data-fitdata).^2,1);<br /> SS=zeros(maxN,1);<br /> SS(1)=1;<br /> <br /> flag=0;<br /> resultArray0 = resultArray;<br /> for i=1:(maxN-1),<br /> para=0;<br /> for j=1:i,<br /> temp=resultArray(j,1);<br /> temp2=resultArray(j,2);<br /> B=data(temp:(temp2+temp-1));<br /> [StepSize,StepLocation,left,right]=TJstepFinder(B);<br /> temp3=StepSize*(temp2)^0.5;<br /> if temp3 > para %optimal num. of transitions (maximize temp3)<br /> para=temp3;<br /> newBegin1=temp;<br /> newLength1=StepLocation-1;<br /> newValue1=left;<br /> newBegin2=temp+StepLocation-1;<br /> newLength2=temp2-newLength1;<br /> newValue2=right;<br /> bestJ=j;<br /> end<br /> end<br /> <br /> resultArray(bestJ,1)=newBegin1;<br /> resultArray(bestJ,2)=newLength1;<br /> resultArray(bestJ,3)=newValue1;<br /> resultArray(i+1,1)=newBegin2;<br /> resultArray(i+1,2)=newLength2;<br /> resultArray(i+1,3)=newValue2;<br /> currentFitArray=sort_on_key(resultArray(1:i+1,:),1);<br /> antiFitArray=zeros(i+2,3);<br /> antiFitArray(1,1)=1;<br /> antiFitArray(1,2)=length(data);<br /> for j=1:(i+1),<br /> temp=currentFitArray(j,1);<br /> temp2=currentFitArray(j,2);<br /> if (temp2 == 1)<br /> flag = 1;<br /> break;<br /> end<br /> B=data(temp:(temp2+temp-1));<br /> [StepSize,StepLocation,left,right]=TJstepFinder(B);<br /> antiFitArray(j+1,1)=temp+StepLocation-1;<br /> end<br /> if (flag == 1)<br /> break;<br /> end<br /> for j=1:i+1,<br /> antiFitArray(j,2)=antiFitArray(j+1,1)-antiFitArray(j,1);<br /> antiFitArray(j,3)=mean(data(antiFitArray(j,1):(antiFitArray(j,2)+antiFitArray(j,1)-1)));<br /> end<br /> antiFitArray(i+2,2)=length(data)-antiFitArray(i+2,1)+1;<br /> antiFitArray(i+2,3)=mean(data(antiFitArray(i+2,1):end));<br /> <br /> for j=1:i+1,<br /> fitdata(resultArray(j,1):(resultArray(j,2)+resultArray(j,1)-1))=resultArray(j,3);<br /> end<br /> for j=1:i+2,<br /> antifit(antiFitArray(j,1):antiFitArray(j,2)+antiFitArray(j,1)-1)=antiFitArray(j,3);<br /> end<br /> <br /> chisquaredFit(i+1)=sum((data-fitdata).^2,1);<br /> chisquaredAntifit(i+1)=sum((data-antifit).^2,1);<br /> SS(i+1)=chisquaredAntifit(i+1)/chisquaredFit(i+1);<br /> <br /> resultArray0 = resultArray;<br /> end</p>
<p>resultArray = resultArray0;</p>
<p>[C,I]=max(SS);</p>
<p>bestfitnumber=I;<br /> resultArray(1,1)=1;<br /> resultArray(1,2)=length(data);<br /> resultArray(1,3)=mean(data);<br /> fitdata(:)=resultArray(1,3);<br /> chisquaredFit=zeros(maxN,1);<br /> chisquaredFit(1)=sum((data-fitdata).^2,1);<br /> for i=1:(bestfitnumber),<br /> para=0;<br /> for j=1:i,<br /> temp=resultArray(j,1);<br /> temp2=resultArray(j,2);<br /> B=data(temp:(temp2+temp-1));<br /> [StepSize,StepLocation,left,right]=TJstepFinder(B);<br /> temp3=StepSize*(temp2)^0.5;<br /> if temp3 > para,<br /> para=temp3;<br /> newBegin1=temp;<br /> newLength1=StepLocation-1;<br /> newValue1=left;<br /> newBegin2=temp+StepLocation-1;<br /> newLength2=temp2-newLength1;<br /> newValue2=right;<br /> bestJ=j;<br /> end<br /> end<br /> <br /> resultArray(bestJ,1)=newBegin1;<br /> resultArray(bestJ,2)=newLength1;<br /> resultArray(bestJ,3)=newValue1;<br /> resultArray(i+1,1)=newBegin2;<br /> resultArray(i+1,2)=newLength2;<br /> resultArray(i+1,3)=newValue2;<br /> <br /> end</p>
<p>for j=1:bestfitnumber+1,<br /> fitdata(resultArray(j,1):(resultArray(j,2)+resultArray(j,1)-1))=resultArray(j,3);<br /> end<br /> finalresult=sort_on_key(resultArray((1:bestfitnumber+1),:),1);<br /> finalresult(:,1)=(finalresult(:,1)-1)*timeinc+time(1);<br /> finalresult(:,2)=finalresult(:,2)*timeinc;<br /> <br /> if MaxNumofStates == 10 %if maximum number of states is 10<br /> statesnum=length(unique(round(10*finalresult(:,3))));<br /> rounded_fit = round(fitdata*10)/10;<br /> fitdata_average = zeros(1,length(fitdata));<br /> for i = 1:1:length(rounded_fit)<br /> if rounded_fit(i) ~= -1<br /> sameState_coord = find(rounded_fit==rounded_fit(i));<br /> total_sum = 0;<br /> for j = 1:1:length(sameState_coord)<br /> total_sum = total_sum+fitdata(sameState_coord(j));<br /> rounded_fit(sameState_coord(j)) = -1;<br /> end<br /> average_state = total_sum/(length(sameState_coord));<br /> for k = 1:1:length(sameState_coord)<br /> fitdata_average(sameState_coord(k)) = average_state;<br /> end<br /> end<br /> end<br /> else %if maximum number of states is something other than 10<br /> %from the fit, group states with similar values according to the<br /> %maximum numnber of states as specified by the user. First, create a<br /> %vector of states between 0 and 1 consisting of number of states<br /> %specified by the user. Then find the difference between each of these<br /> %states and the fitdata value, and whichever state i closest to the<br /> %fitdata value is determined to be the state.<br /> possible_states = 0:(1/MaxNumofStates):1;<br /> rounded_fit = zeros(1, length(fitdata));<br /> for i = 1:1:length(fitdata)<br /> difference = abs(fitdata(i)-possible_states);<br /> [dum, state_coord] = min(difference); %find the state closest to the fit data<br /> rounded_fit(i) = possible_states(state_coord);<br /> end<br /> %Now start to average all the states that were grouped into having the<br /> %same rounded state<br /> fitdata_average = zeros(1,length(fitdata));<br /> for i = 1:1:length(rounded_fit)<br /> if rounded_fit(i) ~= -1<br /> sameState_coord = find(rounded_fit==rounded_fit(i));<br /> total_sum = 0;<br /> for j = 1:1:length(sameState_coord)<br /> total_sum = total_sum+fitdata(sameState_coord(j));<br /> rounded_fit(sameState_coord(j)) = -1;<br /> end<br /> average_state = total_sum/(length(sameState_coord));<br /> for k = 1:1:length(sameState_coord)<br /> fitdata_average(sameState_coord(k)) = average_state;<br /> end<br /> end<br /> end<br /> statesnum = length(unique(fitdata_average));<br /> end<br /> output(q,:) = [statesnum, fitdata_average]; <br />end</p>
<p>NumofStates = output(:,1)';<br />BestFitTraces = output(:, 2:end)';</p>
<p>return;<br />end</p>
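<p>The following is a minimal usage sketch for fitStates on a single hypothetical trace, assuming the helper functions in this listing are on the MATLAB path; the random-walk signal is a placeholder, not data from the paper.</p>
<p>% fitStates usage sketch (hypothetical single trace). data3 is the trace<br />% rescaled to 0-1000, data2 is its complement, and the time vector is a<br />% simple frame index.<br />raw = cumsum(randn(500,1)); % placeholder signal<br />data3 = 1000*(raw - min(raw))/(max(raw) - min(raw));<br />data2 = 1000 - data3;<br />[NumofStates, BestFitTraces] = fitStates(30, 1:500, data2, data3, 30);</p>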
<p>%% ----------------------- TJ STEP FINDER ---------------------------------<br />function [StepSize,StepLocation,left,right]=TJstepFinder(B)</p>
<p>%This function is called upon by the fitStates function to locate steps in<br />%the traces.</p>
<p>global C; %this would be passed to TJflatFit (the fmins-based fit is commented out below)</p>
<p>lengthB=length(B);</p>
<p>%Initial conditions<br />fitResult=B;<br />tempfit=B;<br />StepSize=0;<br />bestchisquared=10^10;<br />time=(1:lengthB);<br />StepLocation=1;<br />C=zeros(1,2);<br />C(:,1)=(1:1)';<br />C(:,2)=B(1:1);<br />A1=mean(C(:,2));<br />tempfit(1:1)=A1;<br />C=zeros((lengthB-1),2);<br />C(:,1)=((1+1):lengthB)';<br />C(:,2)=B((1+1):lengthB);<br />A2=mean(C(:,2));<br />StepSize=abs(A2-A1);<br />left=A1;<br />right=A2;</p>
<p>for i=1:(lengthB-1),<br /> C=zeros(i,2);<br /> C(:,1)=(1:i)';<br /> C(:,2)=B(1:i);<br /> A1=mean(C(:,2));%fmins('TJflatFit',mean(C(:,2)));<br /> tempfit(1:i)=A1;<br /> C=zeros((lengthB-i),2);<br /> C(:,1)=((i+1):lengthB)';<br /> C(:,2)=B((i+1):lengthB);<br /> A2=mean(C(:,2));%fmins('TJflatFit',mean(C(:,2)));<br /> tempfit((i+1):lengthB)=A2;<br /> chisquared=sum((tempfit-B).^2,1);<br /> if chisquared < bestchisquared,<br /> bestchisquared=chisquared;<br /> fitResult=tempfit;<br /> StepSize=abs(A2-A1);<br /> StepLocation=i+1;<br /> left=A1;<br /> right=A2;<br /> end<br />end</p>
<p>return;<br />end</p>
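<p>As a quick sanity check (hypothetical example, not from the paper), TJstepFinder can be handed a synthetic two-level segment; it should report the step near index 51 with plateau means close to 0 and 1.</p>
<p>% TJstepFinder sanity check on a synthetic two-level segment (hypothetical<br />% example). The best split should fall near index 51, with left ~ 0 and right ~ 1.<br />B = [zeros(50,1); ones(50,1)] + 0.05*randn(100,1);<br />[StepSize, StepLocation, left, right] = TJstepFinder(B);</p>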
<p><br />%% -----------------------SORT ON KEY -------------------------------------</p>
<p>function sorteer=sort_on_key(rij,key)</p>
<p>%This function is called upon by the fitStates function to sort its fit<br />%segments. It reads an (index,props) array and sorts along the index with one of the<br />%props as sort key</p>
<p>size=length(rij(:,1));<br />sorteer=0*rij;<br />buf=rij;<br />for i=1:size<br /> [g,h]=min(buf(:,key));<br /> sorteer(i,:)=buf(h,:);<br /> buf(h,key)=max(buf(:,key))+1;<br />end</p>
<p>return;<br />end</p>
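<p>A small illustrative example of sort_on_key (numbers are arbitrary): it reorders the rows of a segment array so that the chosen key column is ascending, which is how fitStates puts its fit segments back into time order.</p>
<p>% sort_on_key usage sketch (illustrative data): reorder the rows of a small<br />% (begin index, length, level) array so that column 1 is ascending.<br />seg = [30 5 0.2; 1 10 0.7; 15 8 0.4];<br />seg_sorted = sort_on_key(seg, 1); % rows now ordered 1, 15, 30 by begin index</p>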
<p>function M = GenTrace(A,B,C,numTr,Ttotal)<br />% Coded by Nigel Reuel on 3.3.2011<br />% This function generates general step traces according to three<br />% parameters which encompass all types of step data:<br />% A - tendency up / tendency down<br />% B - avg event time / signal sampling time<br />% C - avg step size / avg noise<br />% It generates a user-specified number of traces with these three<br />% parameters and returns a 3D matrix of the traces (first level - real<br />% response, 2nd level - noise). To aid in comparing traces, all traces are<br />% normalized on a scale of 0 to 1.<br />%<br />%<br />Tdown = 1;<br />Tup = A;<br />Tprob = Tdown + Tup;<br />%SigSampT = 1; %< --- Not used, but coded in my index numbers<br />AvgEventT = B;<br />AvgNoise = 1;<br />AvgStep = C;<br />%<br />% Set total simulation time and number of traces: - Now function input<br />% Ttotal = 1000;<br />% Initialize solution matrix:<br />M = zeros(Ttotal,numTr,2);<br />% Generate the traces<br />for i = 1:numTr<br /> t = 1;<br /> % Signal start (arbitrary due to normalization):<br /> S = 100;<br /> % Track signal changes:<br /> SigChange = [0 0];<br /> num = 1;<br /> while t < Ttotal<br /> SigChange(num,1) = t;<br /> % 1) Determine step size:<br /> SS = rand()*2*AvgStep;<br /> % 2) Determine the direction:<br /> rand1 = rand();<br /> check1 = rand1*Tprob;<br /> if check1 < Tdown <br /> S = S - SS;<br /> else<br /> S = S + SS;<br /> end<br /> SigChange(num,2) = S;<br /> num = num + 1; <br /> % 3) Determine the next event time:<br /> tstep = rand()*2*AvgEventT;<br /> t = t + tstep;<br /> end<br /> % Turn signal change matrix into real trace and trace w/ noise<br /> Real_Tr = zeros(Ttotal,1);<br /> Noise_Tr = zeros(Ttotal,1);<br /> Intervals = round(SigChange(:,1));<br /> Intervals(num,1) = Ttotal;<br /> for j = 1:num-1<br /> for k = Intervals(j,1):Intervals(j+1,1)<br /> Real_Tr(k,1) = SigChange(j,2);<br /> NoiseL = (rand()*2-1)*AvgNoise;<br /> Noise_Tr(k,1) = SigChange(j,2) + NoiseL;<br /> end<br /> end<br /> %Normalize according to max and min levels in traces<br /> Max(1,1) = max(Real_Tr);<br /> Max(2,1) = max(Noise_Tr);<br /> Min(1,1) = min(Real_Tr);<br /> Min(2,1) = min(Noise_Tr);<br /> MaxN = max(Max);<br /> MinN = min(Min);<br /> SigDiff = MaxN-MinN;<br /> Real_n = (Real_Tr-MinN)./SigDiff;<br /> Noise_n = (Noise_Tr-MinN)./SigDiff;<br /> M(:,i,1) = Real_n;<br /> M(:,i,2) = Noise_n;<br /> % Graphical check of traces:<br /> %{<br /> X = 1:1000;<br /> X = X';<br /> plot(X,Real_n,X,Noise_n);<br /> %}<br />end<br />return<br />end</p>
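<p>The parameters below are illustrative only; this short sketch shows how GenTrace might be called to produce a small set of normalized traces and to compare one noisy trace against its noise-free counterpart.</p>
<p>% GenTrace usage sketch (illustrative parameters): 10 traces of 1000 points<br />% with symmetric up/down steps (A = 1), an average event length of 50<br />% samples (B = 50), and an average step size twice the average noise<br />% level (C = 2). Layer 1 holds the noise-free traces, layer 2 the noisy ones.<br />M = GenTrace(1, 50, 2, 10, 1000);<br />plot(1:1000, M(:,1,2), 1:1000, M(:,1,1)); % one noisy trace and its noise-free version</p>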
<p>function [u,sig,t,iter] = fit_mix_gaussian( X,M )<br />%<br />% Gaussian Fit Function by Ohad Gal (c) 2003<br />% Download at:<br />% <a href="http://www.mathworks.com/matlabcentral/fileexchange/4222-a-collection-of-fitting-functions">http://www.mathworks.com/matlabcentral/fileexchange/4222-a-collection-of-fitting-functions</a><br />% on 3.21.2011<br />%<br />% fit_mix_gaussian - fit parameters for a mixed-gaussian distribution using EM algorithm<br />%<br />% format: [u,sig,t,iter] = fit_mix_gaussian( X,M )<br />%<br />% input: X - input samples, Nx1 vector<br />% M - number of gaussians which are assumed to compose the distribution<br />%<br />% output: u - fitted mean for each gaussian<br />% sig - fitted standard deviation for each gaussian<br />% t - probability of each gaussian in the complete distribution<br />% iter- number of iterations done by the function<br />%<br />% initialize and initial guesses<br />N = length( X );<br />Z = ones(N,M) * 1/M; % indicators vector<br />P = zeros(N,M); % probabilities vector for each sample and each model<br />t = ones(1,M) * 1/M; % distribution of the gaussian models in the samples<br />u = linspace(min(X),max(X),M); % mean vector<br />sig2 = ones(1,M) * var(X) / sqrt(M); % variance vector<br />C = 1/sqrt(2*pi); % just a constant<br />Ic = ones(N,1); % - enable a row replication by the * operator<br />Ir = ones(1,M); % - enable a column replication by the * operator<br />Q = zeros(N,M); % user variable to determine when we have converged to a steady solution<br />thresh = 1e-3;<br />step = N;<br />last_step = inf;<br />iter = 0;<br />min_iter = 10;</p>
<p>% main convergence loop, assume gaussians are 1D<br />while ((( abs((step/last_step)-1) > thresh) & (step>(N*eps)) ) | (iter<min_iter) ) <br /> <br /> % E step<br /> % ========<br /> Q = Z;<br /> P = C ./ (Ic*sqrt(sig2)) .* exp( -((X*Ir - Ic*u).^2)./(2*Ic*sig2) );<br /> for m = 1:M<br /> Z(:,m) = (P(:,m)*t(m))./(P*t(:));<br /> end<br /> <br /> % estimate convergence step size and update iteration number<br /> prog_text = sprintf(repmat( '\b',1,(iter>0)*12+ceil(log10(iter+1)) ));<br /> iter = iter + 1;<br /> last_step = step * (1 + eps) + eps;<br /> step = sum(sum(abs(Q-Z)));<br /> fprintf( '%s%d iterations\n',prog_text,iter );</p>
<p>% M step<br /> % ========<br /> Zm = sum(Z); % sum each column<br /> Zm(find(Zm==0)) = eps; % avoid division by zero<br /> u = (X')*Z ./ Zm;<br /> sig2 = sum(((X*Ir - Ic*u).^2).*Z) ./ Zm;<br /> t = Zm/N;</p>
<p>sig = sqrt( sig2 );</p>
<p>return<br />end</p></div><div class="feed-description"><p>function NoRSE_SFA_Comparison<br />% Finished by Nigel F. Reuel on 3.31.2011<br />% Massachusetts Institute of Technology<br />% Department of Chemical Engineering<br />%<br />% This code uses the generalized model of stochastic data sets to generate<br />% traces with embedded event states. It is currently set to compare NoRSE<br />% with the SFA algorithm. For each parameter set (A,B,C) it does the<br />% following:<br />%<br />% 1) Generates 100 traces<br />% 2) Fits traces with NoRSE<br />% 3) Fits traces with SFA<br />% 4) Computes average error for each algorithm<br />% 5) Reports the average times to run the 100 traces through NoRSE and SFA<br />%<br />% The errors and run times are then recorded in *.csv files for further<br />% analysis.<br />%<br />% The program goes as follows:<br />%<br />% 1) Specify the trace generator parameters<br />%<br />A = 1;<br />B = [5 7 10 15 20 35 50 75 100 125 150 175 200 250 300 400 500];<br />C = [0.1 .15 .2 .3 .35 .4 .45 .5 .6 .7 .8 .85 .9 1 2 3 4 5 6 7 8 9 10];</p>
<p>% Number of traces: <br />nt = 100;<br />% Number of steps:<br />ns = 1000;<br />% Answer matrices:<br />NorseQual = zeros(length(B),length(C));<br />FSQual = zeros(length(B),length(C));<br />TimeN = zeros(length(B),length(C));<br />TimeFS = zeros(length(B),length(C));<br />%<br />for i = 1:length(B)<br /> for j = 1:length(C)<br /> b = B(i);<br /> c = C(j);<br /> disp(['Analyzer started for condition B = ',num2str(b),' and C = ',num2str(c)])<br /> % 2) Generate Traces<br /> M = feval(@GenTrace,A,b,c,nt,ns);<br /> % 3) Generate Fits<br /> [M2,T] = feval(@TraceFitter,M(:,:,2));<br /> % 4) Compare Fits<br /> % First you must specify the tolerance of fit:<br /> TF = 0.02; %<--- Remember this is on a scale of 0 to 1<br /> % Reset the count of bad NoRSE points and bad FS points:<br /> BadN = 0;<br /> BadFS = 0;<br /> for k = 1:nt<br /> for l = 1:ns<br /> if abs(M2(l,k,1)-M(l,k,1))> TF<br /> BadN = BadN + 1;<br /> end<br /> if abs(M2(l,k,2)-M(l,k,1))> TF<br /> BadFS = BadFS + 1;<br /> end<br /> end<br /> end<br /> % Now compute the percent error:<br /> %disp('End of analysis. Run time and results below.')<br /> TotalSteps = nt*ns;<br /> PerBN = BadN/TotalSteps;<br /> PerBFS = BadFS/TotalSteps;<br /> NorseQual(i,j) = PerBN;<br /> FSQual(i,j) = PerBFS;<br /> TimeN(i,j) = T(1,1);<br /> TimeFS(i,j) = T(1,2);<br /> end<br />end<br />% These are the output files:</p>
<p>csvwrite('BadFitPercentN.csv',NorseQual)<br />csvwrite('BadFitPercentFS.csv',FSQual)<br />csvwrite('TimeN.csv',TimeN)<br />csvwrite('TimeFS.csv',TimeFS)<br /> <br /> <br />return<br />end</p>
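<p>Once NoRSE_SFA_Comparison has been run, the grids written to disk can be inspected directly; the short sketch below (not part of the supplied code) loads the two error grids and maps where NoRSE gives the lower fit error across the (B,C) parameter space.</p>
<p>% Post-processing sketch (hypothetical; assumes NoRSE_SFA_Comparison has<br />% already written its output files). Compare the NoRSE and SFA error grids<br />% over the B (rows) x C (columns) parameter space.<br />NorseQual = csvread('BadFitPercentN.csv');<br />FSQual = csvread('BadFitPercentFS.csv');<br />imagesc(FSQual - NorseQual); % positive entries: NoRSE has the lower error<br />colorbar;</p>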
<p>function [M,T] = TraceFitter(RTD)<br />% This function fits the generated traces with both the NoRSE program and<br />% the fit states program.<br />% ----- Step 1: Read in the Traces --------<br />%<br />%Determine the number of time steps (Ns) and number of transducers (Nt):<br />[Ns,Nt]= size(RTD);<br />RTD_n = RTD;<br />%<br />M = zeros(Ns,Nt,4); % First level Norse fit, second level FS<br />T = [0 0]; % 1 = NoRSE Time, 2 = FS time</p>
<p>%############# NoRSE ALGORITHM ############################################<br />%<br />% ---------------- Step 2: Denoise each of the Traces --------------------<br />%<br />tic<br />%The denoising algorithm loses the first and last time steps, so set a new<br />%time window:<br />time = Ns - 2;<br />%Create an empty matrix to recieve the denoised trace data:<br />T_denoise = zeros(time,Nt);<br />for i = 1:Nt<br /> NT = RTD_n(:,i);<br /> IC = feval(@NoiseReduc,NT);<br /> T_denoise(:,i) = IC(:,1);<br />end<br />% csvwrite(['T_denoise',int2str(iii),'.csv'],T_denoise);<br />%<br />%---------------- Step 3: Construct APH for each trace ------------------<br />%<br />%Number of bins for the histogram construction?<br />nbins = 200;<br />%Create empty recieving matrices for histogram counts (nM) and the bin<br />%centers (xM)<br />nM = zeros(Nt,nbins);<br />xM = zeros(Nt,nbins);<br />for i = 1:Nt<br /> Y = T_denoise(:,i);<br /> [n,xout] = hist(Y,nbins);<br /> nM(i,:) = n(1,:);<br /> xM(i,:) = xout(1,:);<br />end<br />%xlswrite('nM',nM);<br />%xlswrite('xM',xM);<br />%<br />%---------------- Step 4: Find peaks for each APH histrogram ------------<br />%<br />%Use PeakFinder Function to find the Average Peak Locations (APL) - these<br />%translate to the bin number <br />APL = feval(@PeakFinder,nM);<br />[Nt,Nlev] = size(APL);<br />%<br />%Translate the APL bin numbers back to the normalized intensity level to<br />%determine the stochastic step levels (SLL)<br />SLL = zeros(Nt,Nlev);<br />for i = 1:Nt<br /> % Very rarely, the peak finder will flag no bins as peaks, in this case<br /> % we will use a general gaussian fit of the denoised data to find a<br /> % mean step value.<br /> if sum(APL(i,:)) == 0<br /> Ncurves = 1;<br /> X = T_denoise(i,:);<br /> [u,~,~,~] = feval(@fit_mix_gaussian,X,Ncurves);<br /> SLL(i,1) = u;<br /> else<br /> for j = 1:Nlev<br /> if APL(i,j) ~= 0<br /> N_bin = APL(i,j);<br /> SLL(i,j) = xM(i,N_bin);<br /> end<br /> end<br /> end<br />end<br />%<br />%----------- Step 5 Determine the NoRSE Fit Trace ---------------<br />%<br />for ii = 1:Nt<br /> RealT = T_denoise(:,ii);<br /> count = 0;<br /> for m = 1:Nlev<br /> if SLL(ii,m) > 0<br /> count = count + 1;<br /> end<br /> end<br /> for i = 1:time<br /> compare = zeros(count,1);<br /> for j = 1:count<br /> compare(j,1) = abs(RealT(i,1)-SLL(ii,j));<br /> end<br /> [C,I] = min(compare);<br /> M(i+1,ii,1) = SLL(ii,I);<br /> end<br /> % Because the noise reduction program takes away the first and last time<br /> % point, estimate their values as those immediately following and<br /> % proceeding respectively<br /> M(1,ii,1) = M(2,ii,1); <br /> M(time+2,ii,1) = M(time+1,ii,1);<br />end<br />%<br />T(1,1) = toc;<br />%<br />%<br />% #############################################<br />%---------- FitStates Analyzer ---------------- (Added 1.15.2011 by NFR)<br />% #############################################<br />%<br />%<br />tic<br />forward_trans = 0;<br />NonNormTraces = RTD;<br />[num_rows, num_cols] = size(NonNormTraces);<br />Ftime = 1:1:num_rows;<br />% Make empty matrix to recieve the traces and their fits<br />count = 0;<br />GoodTrace2 = 0;<br />for i = 1:num_cols <br /> Tracemax = max(NonNormTraces(:,i)); <br /> Tracemin = min(NonNormTraces(:,i));<br /> data3 = 1000*(NonNormTraces(:,i) - Tracemin)/(Tracemax - Tracemin);<br /> data2 = 1000 - data3;<br /> MaxNumofStates = 30;<br /> maxN = 30;<br /> [NumofStates, BestFitTraces] = fitStates(maxN, Ftime, data2, data3, MaxNumofStates);<br /> M(:,i,2) = BestFitTraces; % 
Best fit trace from Fit States<br />end <br />T(1,2) = toc;<br />return<br />end<br />%<br />%<br />% <--------------Functions embeded in the Trace Analyzer ------------><br />%<br />function IC = NoiseReduc(NT)<br />%<br />%Coded by Nigel Reuel on 8.31.2010<br />%<br />%Adaptation of Chung and Kennedy (1991) forward-backward non-linear <br />%filtering technique to reduce the noise of our<br />%fluorescent signal in an attempt to resolve the single molecule events<br />%amidst the noise.<br />%<br />%Input: Takes supplied noisy trace (NT)<br />%Output: Returns cleaned trace without the first and last data point<br />%<br />time = length(NT);<br />%<br />%Set algorithm parameters:<br />N = [4 8 16 32]; %Sampling window length<br />K = length(N); %Number of sampling windows<br />M = 10; %Weighting parameter length<br />P = 40; %Parameter that effects sharpness of steps<br />%<br />% -------- The Forward-Backward non-linear Algorithm -----------<br />%<br />%First - Run each time step and store the average forward and backward<br />%predictors for each of the sampling windows<br />%<br />I_avg_f = zeros(time,K);<br />I_avg_b = zeros(time,K);<br />for g = 1:time<br /> %Run for each sampling window<br /> for k = 1:K<br /> %Solve for the average forward predictor (w/ some<br /> %logic to help solve the beginning of the trace<br /> window = N(k);<br /> if g == 1<br /> I_avg_f(g,k) = NT(1,1);<br /> elseif g - window - 1 < 0 <br /> I_avg_f(g,k) = sum(NT(1:g-1,1))/g;<br /> else <br /> epoint = g - window;<br /> spoint = g - 1;<br /> I_avg_f(g,k) = sum(NT(epoint:spoint,1))/window;<br /> end<br /> %Now do the same for the backward predictor<br /> if g == time<br /> I_avg_b(g,k) = NT(g,1);<br /> elseif g + window > time<br /> sw = time - g;<br /> I_avg_b(g,k) = sum(NT(g+1:time,1))/sw;<br /> else <br /> epoint = g + window;<br /> spoint = g + 1;<br /> I_avg_b(g,k) = sum(NT(spoint:epoint,1))/window;<br /> end<br /> end<br />end<br />%<br />%Second - Solve for non-normalized forward and backward weights:<br />f = zeros(time,K);<br />b = zeros(time,K);<br />for i = 1:time<br /> for k = 1:K<br /> Mstore_f = zeros(M,1);<br /> Mstore_b = zeros(M,1);<br /> for j = 0:M-1<br /> t_f = i - j;<br /> t_b = i + j;<br /> if t_f < 1<br /> Mstore_f(j+1,1) = (NT(i,1) - I_avg_f(i,k))^2;<br /> else<br /> Mstore_f(j+1,1) = (NT(t_f,1) - I_avg_f(t_f,k))^2;<br /> end<br /> if t_b > time<br /> Mstore_b(j+1,1) = (NT(i,1) - I_avg_b(i,k))^2;<br /> else<br /> Mstore_b(j+1,1) = (NT(t_b,1) - I_avg_b(t_b,k))^2;<br /> end<br /> end<br /> f(i,k) = sum(Mstore_f)^(-P);<br /> b(i,k) = sum(Mstore_b)^(-P);<br /> end<br />end<br />% Third - Solve for vector of normalization factors for the weights<br />C = zeros(time,1);<br />for i = 1:time<br /> Kstore = zeros(K,1);<br /> for k = 1:K<br /> Kstore(k,1) = f(i,k) + b(i,k);<br /> end<br /> C(i,1) = 1/sum(Kstore);<br />end<br />% Fourth and final step - Put all parameters together to solve for the<br />% intensities<br />Iclean = zeros(time,1);<br />for i = 1:time<br /> TempSum = zeros(K,1);<br /> for k = 1:K<br /> TempSum(k,1) = f(i,k)*C(i,1)*I_avg_f(i,k) + b(i,k)*C(i,1)*I_avg_b(i,k);<br /> end<br /> Iclean(i,1) = sum(TempSum);<br />end<br />IC = Iclean(2:time-1,1);<br />return<br />end</p>
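<p>A minimal usage sketch for NoiseReduc follows (the trace is a placeholder): the filter is applied one column at a time and returns a result two samples shorter than its input, since the first and last points are dropped.</p>
<p>% NoiseReduc usage sketch (hypothetical trace): apply the adapted<br />% Chung and Kennedy forward-backward filter to a single noisy column vector.<br />NT = cumsum(randn(1000,1)) + randn(1000,1); % placeholder noisy trace<br />IC = NoiseReduc(NT); % 998x1 denoised trace (first and last points dropped)</p>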
<p>% The helper functions embedded in NoRSE_SFA_Comparison (PeakFinder, fitStates,<br />% TJstepFinder, sort_on_key, GenTrace, and fit_mix_gaussian) are identical to the<br />% versions listed above and are not repeated here.</p></div>