info on the files in this directory:

1. pdm_table.ods - table creator for blackman windowed sinc kernel for FIR
2. pdm_fir192.ino - 192 point FIR using lookup tables
3. pdm_fir192half.ino - 192 point FIR storing only half the table
4. pdm.ino - 5 stage CIC with 2 tap IIR compensation filter, uses lookup tables
5. Adafruit_ZeroI2S.cpp - modified library file
6. pdm.txt - some useful links
7. pdm_194_test.ino - optimized 192 (actually 194) tap filter
8. pdm_194lookup - folder with all the lookup tables for the 194 tap version

notes on files:

1. this uses a modified windowed sinc formula to create tables with an even number of points. basically, it replaces M with (M+1) in the usual formula. normally the kernel runs over points 0->M (M+1 points); here the window spans 0->(M+1), and points 0 and M+1 are thrown out, since they are always zero in a blackman window, which leaves points 1->M (M points). the blackman window gives better stopband attenuation than a hamming window, at the cost of a slightly slower roll-off. to use the table, put in M and Fc (the normalized cutoff frequency, F/Fs, where F is the cutoff frequency in Hz and Fs is the sample rate). typically Fc < 1/(2*D), where D is the decimation ratio. in this case i'm using ~0.8/(2*64), which reduces the bandwidth slightly to help cut down on aliasing.
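a minimal sketch of the (M+1) trick, assuming the standard blackman-windowed sinc formula with M swapped for (M+1) and the kernel normalized to unity gain at DC. the function name blackmanSinc is hypothetical, and this is not a cell-by-cell copy of pdm_table.ods. because M is even, the centre (M+1)/2 falls halfway between samples, so there is no 0/0 term to special-case.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// M-point blackman-windowed sinc. The (M+2)-point window is zero at i = 0 and
// i = M+1, so only i = 1..M are kept. fc is the normalized cutoff F/Fs.
std::vector<double> blackmanSinc(int M, double fc) {
    assert(M > 0 && M % 2 == 0);               // M must be even
    const double kPi = 3.14159265358979323846;
    const double L = M + 1.0;                  // (M+1) stands in for the usual M
    const double centre = L / 2.0;             // half-integer for even M
    std::vector<double> h;
    double sum = 0.0;
    for (int i = 1; i <= M; ++i) {
        const double t = i - centre;
        const double sinc = std::sin(2.0 * kPi * fc * t) / t;
        const double w = 0.42 - 0.5 * std::cos(2.0 * kPi * i / L)
                              + 0.08 * std::cos(4.0 * kPi * i / L);
        h.push_back(sinc * w);
        sum += sinc * w;
    }
    for (double& c : h) c /= sum;              // assumed: unity gain at DC
    return h;
}
```

for the decimate-by-64 case above, blackmanSinc(192, 0.8 / (2 * 64)) gives 192 symmetric taps.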

the coefficients are summed in bins of 8, and these sums are used to set the scale factors for the coefficients. if only one scale factor is used for all coefficients, it has to be the smallest per-bin value, which is in cell H25. this needs to be tweaked manually to ensure the rounded sums aren't larger than 32767 (the int16 maximum). if using a fixed scale factor per array, then adjust M such that row 4 has non-zero values only for the 192 consecutive samples in the middle. this can be checked by increasing the value in cell A1 until cell A4 is non-zero, and then iterating on M. M needs to be even.
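a rough sketch of the scale factor logic, assuming pdm bit 1 maps to +c and bit 0 to -c, so the largest table entry a bin can produce is the sum of |c| over its 8 coefficients (the same as the plain bin sum when all 8 share a sign). the helper names are hypothetical. the last function shows why a manual tweak can be needed: if each scaled coefficient is rounded before summing, the worst case can land above 32767 even at the "exact" scale.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Worst-case magnitude of one bin of 8 coefficients (assumed +c / -c mapping).
double binWorstCase(const std::vector<double>& h, size_t bin) {
    double s = 0.0;
    for (size_t k = 0; k < 8; ++k) s += std::fabs(h[bin * 8 + k]);
    return s;
}

// Largest scale per bin that maps that bin's worst case onto 32767.
std::vector<double> binScales(const std::vector<double>& h) {
    assert(h.size() % 8 == 0);
    std::vector<double> scales;
    for (size_t b = 0; b < h.size() / 8; ++b)
        scales.push_back(32767.0 / binWorstCase(h, b));
    return scales;
}

// A single scale shared by every bin must be the smallest per-bin value.
double globalScale(const std::vector<double>& h) {
    std::vector<double> s = binScales(h);
    return *std::min_element(s.begin(), s.end());
}

// True if the worst-case entry still fits in int16 when every scaled
// coefficient is rounded individually before being summed.
bool roundedBinFits(const std::vector<double>& h, size_t bin, double scale) {
    long total = 0;
    for (size_t k = 0; k < 8; ++k)
        total += std::lround(std::fabs(h[bin * 8 + k]) * scale);
    return total <= 32767;
}
```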

to get a C array, copy row 4 into another instance of calc and save it as *.CSV; using another sheet in the same instance sometimes causes issues. the data is laid out in rows so that commas get inserted between values on export; columns don't do that.
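if you'd rather not hand-edit the exported row, a tiny hypothetical helper like this wraps it in a C array declaration ready to paste into a sketch. it assumes the row is plain comma-separated integers and that int16_t is the element type you want.

```cpp
#include <string>

// Wrap one exported CSV row in a C array declaration.
std::string csvRowToCArray(const std::string& name, std::string row) {
    while (!row.empty() && (row.back() == '\n' || row.back() == '\r'))
        row.pop_back();                        // strip the CSV line ending
    return "const int16_t " + name + "[] = {" + row + "};";
}
```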

2. this creates the lookup tables in setup() and stores them in RAM. future iterations should precalculate these arrays and store them in flash. it uses the lookup tables to do the accumulation 8b at a time, iterated over 192b (6x32b data values). output is sent to the DAC, and the volume is scaled down by a right shift (downshift) in that write command. on a 120MHz M4, it executes in 1.8us and takes up 12kB of memory.
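a minimal host-side sketch of the table-driven FIR, assuming pdm bit 1 maps to +coefficient and bit 0 to -coefficient, and that sample n of the 192b window (n = 0 oldest) is weighted by h[n], which matches the convolution for a symmetric kernel. FirTables, buildTables and firOutput are hypothetical names, not the ones in pdm_fir192.ino. one lookup per byte replaces 8 multiply-accumulates, and 24 tables x 256 entries x 2 bytes comes to 12288 bytes, in line with the ~12kB above.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

constexpr int kTaps = 192;            // 6 x 32b words of PDM history
constexpr int kTables = kTaps / 8;    // one 256-entry table per byte of history

struct FirTables {
    int16_t table[kTables][256];
};

// table[t][b] = scaled sum over the 8 bits of b of (+h or -h), rounded once.
void buildTables(const std::vector<double>& h, double scale, FirTables& ft) {
    assert(h.size() == static_cast<size_t>(kTaps));
    for (int t = 0; t < kTables; ++t)
        for (int b = 0; b < 256; ++b) {
            double acc = 0.0;
            for (int k = 0; k < 8; ++k)
                acc += ((b >> k) & 1) ? h[t * 8 + k] : -h[t * 8 + k];
            ft.table[t][b] = static_cast<int16_t>(std::lround(acc * scale));
        }
}

// words[0] is the oldest word; inside a word the LSB is the oldest bit.
int32_t firOutput(const FirTables& ft, const uint32_t words[6]) {
    int32_t acc = 0;
    for (int w = 0; w < 6; ++w)
        for (int j = 0; j < 4; ++j)
            acc += ft.table[w * 4 + j][(words[w] >> (8 * j)) & 0xFFu];
    return acc;
}
```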

3. same as above, but only stores half the table, since it's symmetric. the last half of the data is iterated over in reverse, with a lookup table used to quickly bit-reverse the 8b data chunks that index the FIR lookup table. on a 120MHz M4, it executes in 2.1us and takes up 6kB of memory. lots of optimizations are possible here for improved audio performance, speed, or decreased memory usage. the first 3 arrays don't go above 127, and could be stored as bytes rather than ints (only saves 768B). the loop could be unrolled and individual arrays used rather than the nested array. different (2^n) scale factors could be used for each array, to maximize use of the range and reduce rounding errors (would probably help a lot); the accumulator would then be downshifted before the next array is added.
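a sketch of the half-table trick under the same assumptions as the full-table version (bit 1 -> +h, LSB oldest). for a symmetric kernel, h[n] == h[191 - n], so byte t (12..23) holding value b sees the same coefficients as byte 23 - t read back to front, and can reuse that table indexed by the bit-reversed byte. HalfTables, Rev8 and firOutputHalf are hypothetical names.

```cpp
#include <cstdint>

constexpr int kHalfTables = 12;   // 12 x 256 x int16 = 6144 bytes (~6kB)

// Tables for bytes 0..11 of the 24-byte (192b) window, byte 0 being the oldest.
struct HalfTables {
    int16_t t[kHalfTables][256];
};

// v[b] is b with its 8 bits in reverse order.
struct Rev8 {
    uint8_t v[256];
    Rev8() {
        for (int b = 0; b < 256; ++b) {
            uint8_t r = 0;
            for (int k = 0; k < 8; ++k)
                if ((b >> k) & 1) r |= static_cast<uint8_t>(1u << (7 - k));
            v[b] = r;
        }
    }
};

// Bytes 0..11 use their own table; bytes 12..23 mirror onto table 23 - t.
int32_t firOutputHalf(const HalfTables& half, const Rev8& rev, const uint32_t words[6]) {
    int32_t acc = 0;
    for (int t = 0; t < 24; ++t) {
        const uint8_t b = static_cast<uint8_t>((words[t / 4] >> (8 * (t % 4))) & 0xFFu);
        acc += (t < kHalfTables) ? half.t[t][b] : half.t[23 - t][rev.v[b]];
    }
    return acc;
}
```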

4. this creates a lookup table for the first and second order integrations over 8b at a time, which is then used to quickly calculate the first 2 stages of the CIC. the next 3 stages are standard (and very fast anyway), and then there is a simple IIR to compensate for the CIC's droop in the passband. it takes up negligible memory and executes in 3us on a 120MHz M4.
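a sketch of how two integrators can be advanced 8 bits at a time, assuming the inputs are the raw pdm bits (0 or 1) with the LSB oldest. running I1 += x[m]; I2 += I1 for m = 0..7 works out to I1' = I1 + S and I2' = I2 + 8*I1 + W, where S = sum of x[m] and W = sum of (8 - m)*x[m], so two 256-entry byte tables cover both stages. Cic2Lut, Cic2State and integrateByte are hypothetical names, and this is not necessarily how pdm.ino lays out its table.

```cpp
#include <cstdint>

struct Cic2Lut {
    uint8_t sum[256];     // S: number of set bits, 0..8
    uint8_t wsum[256];    // W: bit m (0 = oldest) weighted by 8 - m, 0..36
    Cic2Lut() {
        for (int b = 0; b < 256; ++b) {
            int s = 0, w = 0;
            for (int m = 0; m < 8; ++m)
                if ((b >> m) & 1) { s += 1; w += 8 - m; }
            sum[b] = static_cast<uint8_t>(s);
            wsum[b] = static_cast<uint8_t>(w);
        }
    }
};

// Unsigned wrap-around is fine for CIC integrators, provided the word is wide
// enough for the filter's full bit growth (the combs cancel the wrap).
struct Cic2State { uint32_t i1 = 0, i2 = 0; };

void integrateByte(Cic2State& st, const Cic2Lut& lut, uint8_t b) {
    st.i2 += 8u * st.i1 + lut.wsum[b];   // uses I1 from before this byte
    st.i1 += lut.sum[b];
}
```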

5. this gets pdm reception working on the feather M4. i modified the I2S registers to set it up for 32b data reception, LSB first. the I2S.read() function was modified to fetch only one uint32_t at a time, to get rid of the blocking. so, data comes in once every 32 pdm clock cycles (32/Fs seconds), as a 32b unsigned int with the oldest data in the LSB and the most recent data in the MSB.
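to make the bit order concrete, here is a hypothetical 192b history (6 x 32b words) fed by one read per 32 pdm bits. it assumes words[0] holds the oldest word, and inside each word bit 0 is the oldest bit and bit 31 the newest, matching the LSB-first setup above. PdmHistory is an illustrative name, not part of the modified library.

```cpp
#include <cstdint>

struct PdmHistory {
    uint32_t words[6] = {0, 0, 0, 0, 0, 0};

    // Drop the oldest word and append the newest one from the I2S read.
    void push(uint32_t w) {
        for (int i = 0; i < 5; ++i) words[i] = words[i + 1];
        words[5] = w;
    }

    // Bit n of the window in time order: n = 0 is the oldest, n = 191 the newest.
    int bit(int n) const { return (words[n / 32] >> (n % 32)) & 1; }
};
```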

6. resources for other projects and online filter tools, etc.

7. this is an FIR that was designed with 194 taps, since the first two taps in the original 192-tap design were 0 (the first tap is always zero). the lookup tables were created using floating point precision before rounding, to improve SNR a bit; not sure how much difference it makes. i also tried using individual arrays for each section and unrolling the loop, rather than iterating through a 2D array, and it was slower for some reason; not sure what's up with that. at any rate, with optimizations set to uber fast, it runs in 2.5us on a 120MHz M4. only the first 2 of the new arrays stay within 8b, so there isn't much saving from storing them as bytes instead of ints.
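a small sketch of why rounding the finished table entry (instead of each coefficient) helps, assuming the 8 coefficients are already multiplied by the scale factor and the usual bit 1 -> +c, bit 0 -> -c mapping. rounding once keeps every entry within 0.5 LSB of the exact value, while rounding each coefficient first can be off by up to 4 LSB (8 roundings of up to 0.5 each). worstError is a hypothetical helper.

```cpp
#include <cmath>

// Worst-case error (in LSBs) of one 256-entry table built from c[0..7],
// either rounding each entry once or summing individually rounded coefficients.
double worstError(const double c[8], bool roundOnce) {
    double worst = 0.0;
    for (int b = 0; b < 256; ++b) {
        double exact = 0.0;
        long summed = 0;
        for (int k = 0; k < 8; ++k) {
            const double v = ((b >> k) & 1) ? c[k] : -c[k];
            exact += v;
            summed += std::lround(v);
        }
        const long entry = roundOnce ? std::lround(exact) : summed;
        worst = std::fmax(worst, std::fabs(entry - exact));
    }
    return worst;
}
```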

8. all the lookup table business for 194.


other:

i tried both 225 and 64 point FIR filters. the 64 was really bad, and the 225 was better than the 192, but not by a whole lot. 128 or 160 point versions are probably worth a try, although the savings will be slim, so it's only worth it if you need to squeeze every last cycle out. one reason the 225 may not have sounded any better is that i was testing by listening via the onboard DAC, which has a bit depth of 12b, so SNR better than that isn't going to be noticed.
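to put a rough number on the 12b DAC limit: the textbook SNR of an ideal N-bit quantizer driven by a full-scale sine is about 6.02N + 1.76 dB. this is an upper bound, since a real DAC usually delivers fewer effective bits, so filter improvements below roughly this level are unlikely to be audible through it.

```latex
\mathrm{SNR}_{\text{ideal}} \approx 6.02\,N + 1.76\ \text{dB}
\qquad\Rightarrow\qquad
N = 12:\ \ 6.02 \cdot 12 + 1.76 \approx 74\ \text{dB}
```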