Using C++ Features for Embedded System Development
Ravindra Singh - EOC 2025 - Duration: 42:40

Since C++14, features such as templates, constexpr, and specialized classes provide new ways to develop bare-metal software for embedded systems that is as efficient as, and in some cases more efficient than, an equivalent bare-metal C program.
Hi Nathan, thanks for your question. Here is the assembly for one of the set functions used in configuring the clocks, built with O1 optimization:
stm32l476re::rcc::ahb2rstr::gpioarst::set(1u);
800008e: 681a ldr r2, [r3, #0]
8000090: f022 0201 bic.w r2, r2, #1
8000094: 601a str r2, [r3, #0]
*p |= value;
8000096: 681a ldr r2, [r3, #0]
8000098: f042 0201 orr.w r2, r2, #1
800009c: 601a str r2, [r3, #0]
As you can see, the set function has been reduced to six instructions (16 bytes in this listing), and any overhead from the namespace and template abstraction has completely disappeared.
This is possible because modern C++ compilers optimize this kind of code very effectively. In this case, any computation that depends only on the compile-time parameters of the template class is done at compile time, which lets the compiler reduce the code size.
The same code built with O0 (no optimization) looks like this:
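A minimal sketch of that idea, assuming a hypothetical FieldMask helper (it is not the project's genMask, whose source is not shown in full here): a mask built only from template parameters is a constant expression, so the compiler can verify it with static_assert and use it as an immediate. That is consistent with the #1 operand in the bic.w and orr.w instructions above.

```cpp
#include <cstdint>

// Hypothetical helper for illustration only (not the project's genMask).
// Builds a mask of `Count` bits starting at bit `Offset`, using nothing but
// template parameters, so the result is a compile-time constant.
template <typename T, unsigned Offset, unsigned Count>
struct FieldMask {
    static constexpr unsigned bits = sizeof(T) * 8;
    static_assert(Count > 0, "field must be at least one bit wide");
    static_assert(Offset + Count <= bits, "field must fit in T");

    static constexpr T value() {
        // Shift an all-ones value right, then left; both shift counts stay
        // below the width of T, so there is no undefined behaviour.
        return static_cast<T>(
            static_cast<T>(static_cast<T>(~T{0}) >> (bits - Count)) << Offset);
    }
};

// Evaluated by the compiler; no instructions are generated for these checks.
static_assert(FieldMask<std::uint32_t, 0, 1>::value() == 0x1u, "single bit at bit 0");
static_assert(FieldMask<std::uint32_t, 4, 3>::value() == 0x70u, "3-bit field at bit 4");
static_assert(FieldMask<std::uint32_t, 0, 32>::value() == 0xFFFFFFFFu, "whole register");
```

The first check uses offset 0 and count 1, the same values that appear in the genMask<unsigned int, 1073877036u, 0u, 1u> instantiation name.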
stm32l476re::rcc::ahb2rstr::gpioarst::set(1u);
8000098: 2001 movs r0, #1
800009a: f000 f84f bl 800013c <regAccess<unsigned int, 1073877036u, 1u, 0u, 1u>::set(unsigned int)>
0800013c <regAccess<unsigned int, 1073877036u, 1u, 0u, 1u>::set(unsigned int)>:
{
private:
public:
static constexpr void set( T value )
800013c: b500 push {lr}
800013e: b085 sub sp, #20
8000140: 9001 str r0, [sp, #4]
{
volatile T* p;
p = reinterpret_cast<volatile T*>(address);
8000142: 4b0d ldr r3, [pc, #52] ; (8000178 <regAccess<unsigned int, 1073877036u, 1u, 0u, 1u>::set(unsigned int)+0x3c>)
8000144: 9303 str r3, [sp, #12]
T mask = genMask<T,address,offset,count>::getMask();
8000146: f7ff ffea bl 800011e <genMask<unsigned int, 1073877036u, 0u, 1u>::getMask()>
800014a: 9002 str r0, [sp, #8]
value = value << offset;
value &= mask;
800014c: 9a01 ldr r2, [sp, #4]
800014e: 9b02 ldr r3, [sp, #8]
8000150: 4013 ands r3, r2
8000152: 9301 str r3, [sp, #4]
{
*p |= value;
}
else
{
*p &= static_cast
8000154: 9b03 ldr r3, [sp, #12]
8000156: 681a ldr r2, [r3, #0]
8000158: 9b02 ldr r3, [sp, #8]
800015a: 43db mvns r3, r3
800015c: 401a ands r2, r3
800015e: 9b03 ldr r3, [sp, #12]
8000160: 601a str r2, [r3, #0]
*p |= value;
8000162: 9b03 ldr r3, [sp, #12]
8000164: 681a ldr r2, [r3, #0]
8000166: 9b01 ldr r3, [sp, #4]
8000168: 431a orrs r2, r3
800016a: 9b03 ldr r3, [sp, #12]
800016c: 601a str r2, [r3, #0]
}
}
As you can see, all the branch instructions and computations within the set() function are optimized away by the C++ compiler when O1 optimization is used.
You can generate these assembly listings by running arm-none-eabi-objdump on the board.elf file built in the GitHub project. I hope this answers your question. Please let me know if you have any further queries.
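To try the pattern on a host PC, here is a hedged sketch. RegField, fake_ahb2rstr, demo_field and demo() are hypothetical names made up for illustration, and the register is an ordinary volatile variable rather than a peripheral address; it is not the project's regAccess class.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for a memory-mapped register so the sketch runs on a host PC.
// On the target this would be a fixed peripheral address instead.
static volatile std::uint32_t fake_ahb2rstr = 0;

// Hypothetical field accessor (assumption, not the project's regAccess).
template <volatile std::uint32_t& Reg, unsigned Offset, unsigned Count>
struct RegField {
    static_assert(Count > 0 && Offset + Count <= 32, "field must fit in 32 bits");

    // Depends only on template parameters, so it is a compile-time constant.
    static constexpr std::uint32_t mask() {
        return (0xFFFFFFFFu >> (32 - Count)) << Offset;
    }

    static void set(std::uint32_t value) {
        value = (value << Offset) & mask();
        Reg = Reg & ~mask();  // first read-modify-write: clear the field
        Reg = Reg | value;    // second read-modify-write: write the new bits
    }

    static std::uint32_t get() { return (Reg & mask()) >> Offset; }
};

// Each alias names a distinct instantiation with its own constants.
using gpioarst = RegField<fake_ahb2rstr, 0, 1>;
using demo_field = RegField<fake_ahb2rstr, 4, 3>;

// Small host-side exercise of the two instantiations.
std::uint32_t demo() {
    fake_ahb2rstr = 0;
    gpioarst::set(1u);    // sets bit 0
    demo_field::set(5u);  // writes 0b101 into bits 6..4
    assert(gpioarst::get() == 1u);
    return fake_ahb2rstr;  // 0x51
}
```

Each instantiation is still a separate function, which is what the O0 listing shows. Because every body is only two read-modify-write operations with constant operands, an optimizing compiler can inline it at the call site instead of emitting a call, which is what the O1 listing suggests.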
Excellent presentation!
Thanks
That was a nice example!
Thanks
Thanks for the presentation. I'm struggling to understand the smaller code size in C++ with optimization. I thought that with templates the compiler would generate new functions (set and get) for each register, since they use different template parameters. We see this with O0, but I'm wondering what the compiler is doing to get much smaller code with O1. Do you have any insight?