Non-Powers-of-2 FFTs and Why You Should Use Them

Login

About Bradford Watson

Bradford S. Watson is a Technical Fellow at Lockheed Martin, where he has been a valued employee since 1997. He holds a Master of Science in Electrical Engineering with a focus on Signal Processing from the University of Colorado at Denver (2001). Watson has established an extensive background in Digital Signal Processing for spaceflight applications, with expertise in DSP Algorithm Design, FPGAs/ASIC design, and Digital Signal Processors.

In 2021, Watson published a book titled "Channelizers and Reconstructors - A Design Guide" (Nova Science Publishers ISBN 1685071740) on spectral channelization and reconstruction. He is also the inventor of US Patent #7675985: "Digital Channelizer Unit (DCU)". Watson has played a pivotal role in conceiving, architecting, designing, and leading the development of numerous modern digital processing payloads for the commercial, government, and international space sectors.

His professional interests encompass communications signal processing, statistical signal processing, array processing, adaptive systems, and digital design.

Home > On-Demand Archives > Talks >

Non-Powers-of-2 FFTs and Why You Should Use Them

Bradford Watson - Watch Now - EOC 2025 - Duration: 48:10

Non-Powers-of-2 FFTs and Why You Should Use Them

Bradford Watson

Get Your Pass Today!

Abstract Questions & Comments (4)

The Fast Fourier Transform (FFT) is a fundamental algorithm in digital signal processing, and it comes in various forms. However, the most commonly used FFTs are those with sizes that are powers of 2 (e.g., 256, 512, 1024, etc.). This is due to their widespread availability, ease of implementation, and ability to achieve an efficiency of O(N log N), making them a popular choice among designers.

While powers-of-2 FFTs are convenient and efficient, they may not always be the best choice for every design. In some cases, using a power-of-2 FFT is not always the best choice from a system standpoint, and can lead to complications in other parts of the system in terms of inefficient use of resources and increased complexity.

Fortunately, there are alternative FFT algorithms that can handle any size of FFT, not just powers of 2.

This talk will describe traditional FFT implementation, more exotic FFT algorithms such as:

Rader FFT
Winograd FFT
Prime Factor FFT

It will also cover how to extend Cooley-Tukey factoring to combine powers-of-2 and non-powers-of-2 FFTs, and the Four Step FFT.

Several other algorithms will be touched upon, along with their application, advantages, and disadvantages.

M↓ MARKDOWN HELP

italics	surround text with asterisks
bold	surround text with two asterisks
hyperlink	[hyperlink](https://example.com) or just a bare URL
code	surround text with `backticks`
~~strikethrough~~	surround text with ~~two tilde characters~~
quote	prefix with >

Upvotes Newest Oldest

EnesMUTTA

Score: 0 | 2 weeks ago | 1 reply

Hello Mr Watson
I enjoyed your presentation and would like to thank you.

I am a young FPGA designer and I implement FFT cores for real-time signal processing. The systems I work on sample @100+MHz and the FFT cores I design crunch data in a serial manner (1 input-output per clock sample). Before I watched your presentation, I thought non-powers of 2 FFTs would be too impractical (costly) for the kind of data loads I handle, but I am starting to wonder if that is not the case :). I can already think of a couple of benefits a non-power of 2 FFT implementation can provide.

Do you know (maybe from experience) whether or not it is possible to implement the non-power of 2 approaches you presented can be implemented on FPGAs in a real-time pipelined fashion? If so, can you comment on the memory requirement? For an FFT that has a point size N (which is a power of 2) the single-path delay feedback (SDF) architecture requires a memory that can fit N-1 samples. Can the pipelined versions of non-power of 2 FFTs be implemented with a similar amount of memory?

Also, if these pipelined versions exist, I think it should be possible to have run time configurable transform length functionality by simply bypassing some initial stages on the pipeline. For instance, in the FFT360 example @44min; if the FFT8 and the following transpose is bypassed and the input samples are directly fed to FFT9 then we should obtain FFT45 using the remaining structure (with minor modifications to rotator modules within FFT9 and FFT5). Can you comment if this sounds right?
Kind regards

Bradford WatsonSpeaker

Score: 1 | 2 weeks ago | 1 reply

Hello, thanks for watching the presentation and for your comments.
I have implemented many non-powers-of-2 FFTs in FPGAs with various types of architectures, including those with the SDF architecture (although this is the first time I've heard it called that, lol). I think the primary challenge you'll have with the memories is that there will always be some non-utilized memory in FPGA block ram because the number of samples isn't a power of 2. This is less of a problem in ASICs where you can specify exactly how many memory locations you need. In principle I think it should be similar to the radix-2 FFTs (N-1 samples as you're saying), but I haven't looked into it, so I don't know. Also, the bit-reversal trick that's commonly used with radix-2 FFTs isn't likely to be very clean, so you'll need to manage memory pointers.
For your second question, it's perfectly possible to bypass stages and use components of a larger FFT to calculate smaller ones. I'm doing that right now on a project.
Thanks again for your interest and comments.
Brad

EnesMUTTA

Score: 0 | 2 weeks ago | 1 reply

Hello, thanks for your swift response.
The point sizes I deal with are ranging from 10^4 to 10^6 so the imperfect utilization of some memory primitives can be simply disregarded as rounding errors. This is also the reason why, although very neat and clever, the memory requirement of the reordering of the input samples for the prime factor algorithm sounded to me like a nightmare compared to the cost of a single rotator which it saves. I should also admit that I was baffled when you said in your presentation

The Prime Factor FFT is useful for large composite FFTs that are non power of 2.

Although, I guess the scale of large and throughput requirement changes from application to application and I should not only consider it for my usage types. I have come across multiple cases where the memory utilization within the FPGA approaching +90% while the DSP and LUT utilization was barely 10%. Had to come up with some creative solutions for that.
When it comes to natural ordering, you just have to bite the memory cost of that if you don’t want to take any penalty to throughput. There is no way around that (as far as I know, I would welcome anyone who points out there are clever solution around it). I remember coming across a paper that reduced the memory cost of output reordering by a couple of percent while increasing the complexity of managing it astronomically higher.
When it comes to the reordering the output of a pipelined non-power of 2 FFT module, it did not occur to me it may be different compared to regular power of 2 modules before you pointed it out (thanks). Definitely some in depth work is needed to analyse and figure out a solution for it.
Thanks again for taking your time and responding

Bradford WatsonSpeaker

Score: 0 | 2 weeks ago | no reply

Thanks. Sounds like you have some interesting challenges.
When I said "large" I was thinking more in a practical sense. For example, a 6-point FFT could be composed of a 2-point cascaded with a 3-point. But, you could build a 6-point FFT using the Winograd approach, which would likely be better performing. On the other hand, I've build 12-point FFTs using the PFA approach many times, and I wouldn't call them "large." So, maybe I overstated it.

Login

About Bradford Watson

Non-Powers-of-2 FFTs and Why You Should Use Them

OUR SPONSORS & PARTNERS