FFT Analysis using CMSIS DSP Library

FFT Analysis

Many times, once you've captured a data stream from A/D or some other means, the next question is what is the frequency content of it?

Generally, that means you have to use a Fast Fourier Transform to view the signal data in the frequency domain.

There's a lot to signal analysis that I'm not going into here.  For a better coverage of that, I recommand LabKitty.

This blog is just going to walk through the basic steps and some of the pitfalls of using an FFT in the real world.

Setup

I have an ST Micro STM32F746G Discovery board. The processor is an ARM Cortex-M7 with a vector processing unit, and the ARM powers have conveniently provided an optimized DSP library as part of CMSIS.

The Discovery board has a TFT LCD (480 x 272) which I will use to display the output of the FFT in graphical form. Otherwise I'm just using the microcontroller itself to sample the data and process it, not even using the external RAM available on the Discovery.

Sampling Data

In a normal sampling A/D unit you would take in a differential signal, possibly pre-processed in the analog domain to remove DC components and high frequency components. In my case, what I'm sampling is the single-ended output of another microcontroller's DAC, which I have configured to represent a 1V p-p signal as 0 - 3.3V.  So when I bring in the data on the F746, I have to subtract the 1.65V offset to get the represented signal, which means I am acting like the F746 has a 1V p-p range as input.

Data Representation

While I could convert the data to 32-bit floating point once it has been onboarded, I chose in this case to leave things as close to the hardware representation as possible, so I chose the q15_t format.  There's a description of this format on Wikipedia, but basically it acts like a signed 16-bit integer in some ways, but is really a fixed-point number between -1 and +1.  It is described in the CMSIS docs as Q1.15, essentially it has a sign bit and 15 fractional bits.  So you can convert a q15 value to float by (q / 32768.0).

When I sample the 12-bit A/D data, I set it for left justification, so in the data I get a 0 represents 0 V (-1.65 in my scheme), 8000 is midscale, and FFF0 represents the high end.  Here's a table to show some other values:



So what I have is a 16 bit sample value that represents data from -1 to +1, just like q15_t.  And if you look closely at the Q15 data representation, you can see that it is the same as the sample value, just with the sign bit flipped.

Analysis

The next thing on the timeline to do is windowing, but I'm going to talk about the FFT function we will use first to introduce it.

RFFT Function

The CMSIS DSP library has an FFT function suited for what we need - arm_rfft_q15. This function takes in N real-valued samples (in q15_t format) and performs an FFT on them.  

The first thing to note, and this is due to the FFT algorithm more than anything, is that N has to be a power of 2.  Additionally this library function limits the range of N from 32 to 8192. So if you want to do really big FFTs, you may have to look elsewhere.




The layout of the output array is critical, and the documentation doesn't describe it.

The output array of the function is 2*N q15_t values.  These are the complex results of the FFT in all their glory - both the positive and the negative halves of the spectrum.  For real-valued inputs you end up with N/2 bin values, represented as complex conjugate pairs.  The way the q15_t array is laid out is bin 0 is first - real, then imaginary parts, then bin 1's two values, and so on up to bin N/2-1.  After that you have the N/2 bin, and then the reflected conjugate parts of the first bins, which you can mostly ignore.

Also there is scaling inherent in performing the FFT.  For an FFT of N samples, you normally need to divide the results by N if you are going to use the actual numeric values and not just their relative levels.  Since N is a power of 2, and  we are using q15 values, we would just shift the values by log2(N) bits.  But the next note on the library function we used is that it has already scaled the values by log2(N)-1 bits, so that basically cancels out the /N you would see if we were doing this with an array of floats.

Some other notes about using the function:

  • It modifies the input array, so don't re-use the data later.
  • You need a control structure, which you get by declaring it and initializing with the arm_rfft_init_q15 function.

Windowing

Before you perform the FFT on the A/D sample data, you need to perform a windowing function.  Basically the FFT assumes you have a continuous stream of data, but you only have a limited sample size, so the windowing function emphasizes certain samples in such a way as to make it act more like a continuous signal. There are many options for windowing functions, and they choice is often influenced by what you are doing the analysis for.  I chose to use a Hamming window as a placeholder. 

Here's the window setup function:

void fft_analysis::window_hamming (void)
{
float alpha = 25.0/46;
float factor = 2.0 * M_PI / npoints;
for (int i = 0; i < npoints; i++) {
float f  = alpha - (1.0 - alpha) * arm_cos_f32 (factor * i);
rWindow[i] = f_to_q15 (f);
}
windowed = true;
}

To implement the windowing function, you want to store an N-length array of the window coefficients. If you store them in q15_t format, for each set of A/D sample data you can use the CMSIS function arm_mult_q15 to apply the coefficients to the input data.  Since the rfft function is going to munge the input data array, I decided to do the multiplication in-place to save allocating another N q15_t values.

// apply the window factors in-place to the input data
arm_mult_q15 (input_data, rWindow, input_data, npoints);
// windowing uses saturation multiplication, so still 1.15


RFFT

Now it is a simple matter of initializing the API structure and calling the FFT.
arm_rfft_instance_q15  S;

status = arm_rfft_init_q15 (&S, this->npoints, fftFlagForward, flagNotRev);
if (status != 0) {
output ("Failed to initialize RFFT structure\r\n");
return status;
}
// this rfft function scales output to N.(16-N) where N is the 
// order of the FFT (2^10 = 1024, so 1024 samples N = 10)

arm_rfft_q15 (&S, input_data, cDstData);

I could have, probably should have, moved the initialization of the instance out of the per-sample-set loop.

Handling Complex Results

Finally, we want to process the complex output array so that we can get back to working with real numbers. We have two choices - the amplitude of the signal in each bin is proportional to the magnitude of the bin's complex vector, while the power of the signal is proportional to the square of the magnitude.  I chose to go with the power layout, so I use the arm_cmplx_mag_squared_q15 function. The function is valid over all N complex values, but we only need the first N/2+1 due to the nature of the data.

// calculate magnitudes^2 of complex values for power spectrum
arm_cmplx_mag_squared_q15 (cplex, real, npoints/2 + 1);

If you want to analyze amplitude instead, use the arm_cmplx_mag_q15 function instead.

Results

For my demo code in https://github.com/dwinant/fft_demo_stm32f746g_discovery, I plotted the first 400 or so samples of the FFT on the LCD.  I found that plotting the log output was more generally useful.  The CMSIS doesn't provide vector log functions, so that has to be done in the display loop.

Peaks

Peak detection is the next obvious step, and I will address that in another post.  But basically peak analysis involves:

  • Finding the max value in any bin,
  • Interpolating among adjacent bin values to get a better estimate of the peak frequency,
  • Using guard bands to avoid treating one peak as multiple peaks,
  • Ignoring the usual pseudo-peaks at DC and at N/2,
  • Looping to get additional peaks.
The arm_max_q15 function looks like it will be really useful in finding peaks, but we may have to zero out some data points if we want to use it to find multiple peaks.


Comments

Popular posts from this blog

Basic Scan-Mode A/D DMA (STM32H743)

Double-Buffered DMA from ADC