Fixed-Point Made Easy: A Guide for Newcomers and Seasoned Engineers

Dan Boschen - EOC 2023 - Duration: 36:56

Fixed-point implementation is popular for lowest-power, lowest-cost solutions when it is critical to make the most of limited computing resources. However, the jargon and rules can be overwhelming to newcomers and seasoned engineers alike.

In this theatre talk, Dan will guide you through the common representations and rules for working with binary fixed point. This will include the Q notation for fractional number representation, two's complement, signed and unsigned numbers, considerations for truncation, rounding and overflow, and easy-to-follow rules for binary arithmetic. There will be plenty of fun examples to demonstrate the key concepts and practical use of the methodologies. If you are new to fixed-point or rusty and would like a refresher, this talk is for you! It is particularly suited to anyone who needs a recap on fixed-point and is interested in attending Dan's talk "Fixed-Point Filters - Modelling and Verification Using Python".

Even those exposed to fixed point in the past will appreciate this workout session to quickly get back in top fixed-point shape!

johned

Score: 0 | 3 years ago | 1 reply

Great talk, Dan,
Back in the days before everyone had PCs and laptops, I was designing some filters by hand and negating values using the "Flip All the Bits and add 1 LSB" technique I'd been taught at Uni.
The chief engineer told me about a much more efficient solution: "starting with the LSB, write down all the bits up to and including the first '1', then negate all of the other bits". Much easier to do in your head and also easier to implement in an FPGA :-)

DBoschen (Speaker)

Score: 0 | 3 years ago | 1 reply

Thanks Johned! Sounds interesting. Could you elaborate with an example? I am not sure I am following it yet…

johned

Score: 0 | 3 years ago | 1 reply

Sure thing, Dan,
If we look at your negation of 18, if you write down all the bits from the LSB up to and including the first '1', then we have: 10
Now negate all of the rest: 000100
Combine the two: 00010010
PS Glad to see you're using the correct Q format (Arm, which used to be called AMD format) :-)
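
A minimal C sketch of the two negation methods described in this thread, assuming the 8-bit word implied by the bit strings above (they correspond to starting from 11101110, the pattern for -18). The function names are illustrative and not taken from the talk.

```c
#include <stdint.h>

/* Classic method: flip all the bits, then add 1 (one LSB). */
uint8_t negate_flip_add(uint8_t x)
{
    return (uint8_t)(~(unsigned)x + 1u);
}

/* Shortcut: copy the bits from the LSB up to and including the first 1,
 * then invert every bit above it. */
uint8_t negate_copy_then_flip(uint8_t x)
{
    uint8_t result = 0u;
    int bit = 0;

    if (x == 0u) {
        return 0u;                      /* no 1 bit to find; -0 is 0 */
    }

    /* Copy phase: the low zeros stay zero, so just skip over them. */
    while (((x >> bit) & 1u) == 0u) {
        bit++;
    }
    result |= (uint8_t)(1u << bit);     /* copy the first 1 unchanged */

    /* Flip phase: every remaining higher bit is inverted. */
    for (bit = bit + 1; bit < 8; bit++) {
        if (((x >> bit) & 1u) == 0u) {
            result |= (uint8_t)(1u << bit);
        }
    }
    return result;
}
```

For 0xEE (11101110) both functions return 0x12 (00010010), matching the worked example.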

DBoschen (Speaker)

Score: 0 | 3 years ago | no reply

Ah I see! That is very nice, thanks for sharing that.

RickLyons

Score: 1 | 3 years ago | 1 reply

Hi Dan. This terrific presentation of yours reminds me of a quote from English poet Geoffrey Chaucer [1340-1400], "Gladly would he learn, and gladly teach."

DBoschen (Speaker)

Score: 0 | 3 years ago | no reply

Ha! Thanks for watching, Rick. You hit it on the head. I have such good memories and learned a lot from going through the 2nd order resonator in such detail with you, starting with your great write-up reminding us of the interesting quantization patterns for the poles in such structures and the Coupled Form: https://www.dsprelated.com/showarticle/183.php

nathancharlesjones

Score: 0 | 3 years ago | 1 reply

Are you able to recommend any good fixed-point math libraries, Dan? I've played around with fixed-point before a little and I feel like I could make +,-,*,/ functions but I'd get stuck if I needed to do anything more complex like sin or sqrt.

DBoschen (Speaker)

Score: 0 | 3 years ago | 1 reply

I recommend fpbinary for Python, which I cover in detail in my other workshop. I believe they have plans for adding trig and sqrt functions. However, my concern for verification is whether the algorithm used exactly matches the implementation; for that I would typically match the particular implementation when it is known (LUT, etc.).

nathancharlesjones

Score: 0 | 3 years ago | 1 reply

What about on an embedded controller? Any C/C++ libraries for fixed-point math/DSP?

DBoschen (Speaker)

Score: 0 | 3 years ago | 1 reply

I’ve come across this but have no experience using it: https://github.com/deftio/fr_math

nathancharlesjones

Score: 0 | 3 years ago | no reply

Thanks! I'll check it out.

rokath

Score: 1 | 3 years ago | 1 reply

Great in-depth explanation, Dan!
Let me just add how one can avoid floating-point operations and division in this example:

temp = ((5281 * ADCRaw) >> 16) - 50; // r = (adc * 3300 / 4095 - 500) / 10 = adc * 3300 / 40950 - 50

The 12-bit ADC value needs to be multiplied by 0.80586... and, as we know, division is costly. But if you know the divisor at compile time (here 40950), you can replace the division with a multiplication and a shift. A calculator gives: (3300/40950) * 2^16 = 5281.289... ≈ 5281.
The max ADC value is 4095, so 5281 * 4095 = 21,625,695 and no 32-bit overflow is possible. To decrease the error further, a 22-bit shift is even better. This allows a really fast computation at runtime. Hope that helps someone.
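
The multiply-and-shift approach can be sketched in C as follows. The function names are hypothetical, the truncating-division version is included only as a reference, and the 22-bit constant 338003 is derived here as round((3300/40950) * 2^22); it is not given in the comment above.

```c
#include <stdint.h>

/* Reference: adc * 3300 / 40950 - 50 using truncating integer division. */
int32_t temp_divide(uint16_t adc_raw)
{
    return ((int32_t)adc_raw * 3300) / 40950 - 50;
}

/* 16-bit shift: 5281 ~= (3300 / 40950) * 2^16.
 * Largest product is 5281 * 4095 = 21,625,695, well inside 32 bits. */
int32_t temp_shift16(uint16_t adc_raw)
{
    return (int32_t)((5281u * (uint32_t)adc_raw) >> 16) - 50;
}

/* 22-bit shift: 338003 ~= (3300 / 40950) * 2^22 (assumed constant).
 * Largest product is 338003 * 4095 = 1,384,122,285, still inside 32 bits. */
int32_t temp_shift22(uint16_t adc_raw)
{
    return (int32_t)((338003u * (uint32_t)adc_raw) >> 22) - 50;
}
```

With the 16-bit shift, the result can read one count lower than the truncating division for some inputs (for example ADCRaw = 273 gives -29 instead of -28), because 5281 slightly underestimates the scale factor. The 22-bit constant narrows that gap.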

DBoschen (Speaker)

Score: 0 | 3 years ago | no reply

Thank you Rokath!

RemingtonFurman

Score: 2 | 3 years ago | 1 reply

Thanks! The negative weight representation for the sign bit is very useful, and I'm surprised I haven't understood it that way before.
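
The negative-weight view can be written out directly. A small C sketch, assuming an 8-bit word and a hypothetical function name: the sign bit contributes -128 and every other bit contributes its usual positive power of two.

```c
#include <stdint.h>

/* Value of an 8-bit two's complement pattern, found by summing bit weights.
 * Bits 0..6 weigh +1, +2, ..., +64; bit 7 (the sign bit) weighs -128. */
int twos_complement_value(uint8_t bits)
{
    int value = 0;

    for (int i = 0; i < 7; i++) {
        if ((bits >> i) & 1u) {
            value += 1 << i;
        }
    }
    if ((bits >> 7) & 1u) {
        value -= 128;
    }
    return value;
}
```

For 11101110 this gives -128 + 64 + 32 + 8 + 4 + 2 = -18.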

The different "TI" vs "ARM" Q notations are unfortunate. I like the "ARM" notation more too, because it's explicit with how many bits are being used. On a microcontroller it's often going to be a multiple of 8, so if you see "Q3.4" you know you're looking at "TI" notation. But on an FPGA or other chip that might not be immediately clear. Also, after seeing negative m and n values, I agree that Q(m,n) is a better notation.
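
To make the difference concrete, here is a small C sketch of the word length implied by a signed Qm.n under each convention, assuming the usual definitions: in the "TI" style m excludes the sign bit, while in the "ARM" style m includes it. The function names are made up for illustration.

```c
/* Total bits in a signed Qm.n word, "TI" style: sign bit is extra. */
int word_length_ti(int m, int n)
{
    return 1 + m + n;
}

/* Total bits in a signed Qm.n word, "ARM" style: sign bit is part of m. */
int word_length_arm(int m, int n)
{
    return m + n;
}
```

Under these assumptions "Q3.4" is 8 bits in the "TI" style but only 7 bits in the "ARM" style, and a 16-bit fractional word is Q0.15 in one and Q1.15 in the other.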

I'm a fan of Randy Yates' fixed point documents, and it looks like he just made an update to "Fixed-Point Arithmetic: An Introduction" four days ago on April 21, 2023, though the revision history doesn't say what changed.

Another neat fixed-point math fact is that overflowing a fixed point value during a computation doesn't matter, as long as later operations in the computation bring the result back in range before the final result. Your binary wheel diagram makes that much easier to visualize.
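
A short C sketch of that property, assuming wrapping (not saturating) addition in an 8-bit accumulator; the helper names are hypothetical. The running sum of 100 + 100 - 120 passes through 200, which does not fit in a signed 8-bit value, yet the final result is still 80.

```c
#include <stdint.h>

/* Reinterpret an 8-bit pattern as a signed two's complement value. */
int to_signed8(uint8_t u)
{
    return (u < 128u) ? (int)u : (int)u - 256;
}

/* Add three signed values in an 8-bit accumulator that wraps modulo 256. */
int wrapped_sum3(int a, int b, int c)
{
    uint8_t acc = 0u;

    acc = (uint8_t)(acc + (uint8_t)a);
    acc = (uint8_t)(acc + (uint8_t)b);   /* may wrap past +127 here */
    acc = (uint8_t)(acc + (uint8_t)c);   /* a later term can bring it back */
    return to_signed8(acc);
}
```

If the true final sum is itself out of range, the wrapped answer is wrong: 100 + 100 + 0 comes back as -56 in this sketch.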

DBoschen (Speaker)

Score: 0 | 3 years ago | no reply

Thanks for the good comments, Remington. I believe Randy simply uploaded his "PA10" version from July 3, 2021, as listed in the Revision table. The date at the top is the print date, but the Rev matches the table. However, this was a significant addition, as my "PA9" version was only 1111 pages long and now it stands at a whopping 11001 pages!

DBoschen (Speaker)

Score: 2 | 3 years ago | no reply

In the recorded presentation I mention 2's complement as "Flip All the Bits and add 1 LSB". This would be clearer as "Flip All the Bits and Add 1 LSB Weight" (or "Flip All the Bits and Add 1 Bit"), which is what is now shown on the downloadable PDF. I wanted to be explicit since "Flip all the bits and add 1", as often stated, can be confusing. To convert negative numbers to and from 2's complement, we add the weight given to the LSB in the Q representation we are using (e.g., in Q15.1 this means flip all the bits and add 0.5).
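
As a sketch of the Q15.1 case in C, assuming a 16-bit word with one fractional bit (so the raw integer is the real value times 2) and hypothetical helper names: flipping the bits of 4.5 gives -5.0, and adding one LSB, worth 0.5 here, gives -4.5.

```c
#include <stdint.h>

/* Interpret a raw 16-bit pattern as Q15.1: signed, LSB weight 2^-1 = 0.5. */
double q15_1_to_double(uint16_t raw)
{
    long signed_raw = (raw < 32768u) ? (long)raw : (long)raw - 65536L;
    return signed_raw / 2.0;
}

/* Two's complement negation: flip all the bits, then add one LSB.
 * Adding 1 to the raw integer adds 0.5 to the Q15.1 value. */
uint16_t q15_1_negate(uint16_t raw)
{
    return (uint16_t)(~(unsigned)raw + 1u);
}
```

Here 4.5 is stored as raw 0x0009; the flipped pattern 0xFFF6 reads as -5.0, and 0xFFF7 after the add reads as -4.5.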
