Exception Handling
Programmers often focus on designing the "happy path" for their code—the scenario where everything goes smoothly, values are within range, and no timeouts occur. However, this emphasis can come at the expense of a program's robustness: its ability to continue operating effectively when exceptions or unexpected conditions arise.
In this workshop, participants will actively explore run-time options for handling exceptions, including techniques like global and local error indicators, return values, immutable sections of code, goto chains, and try-catch mechanisms. Through hands-on exercises, you'll learn how to implement these approaches with minimal extra code and complexity.
By the end of this workshop, you'll have practical strategies for building more robust and reliable programs, ensuring that your code can gracefully handle the unexpected while maintaining simplicity and clarity.
According to Nathan, what is the key difference between an "error" and an "exception" in embedded systems?
I found this useful. I think referring to them as "exceptional conditions", or similar, would be clearer, as "exception" has a very defined meaning in C++, Python, etc. I hadn't seen the nodiscard and lambda approaches used before. I can imagine the A-B retry being very difficult to test to confirm that it works as expected. I think consistency in whatever approach is taken is important. A really good example of consistent error handling can be found in the Micrium uC/OS-III source code (an underappreciated RTOS IMHO). Every function has an error output as the last argument, and all possible error enums are summarised in each function header comment.
https://github.com/weston-embedded/uC-OS3
I agree with all of this! Thanks for the link to the uC project. Great point, too, that many bugs can lurk in our exception handling code exactly because it's often less tested than other parts of the code. It's a great reminder to try to keep those solutions as simple as possible.
Just watched the replay. Great presentation, thanks!
I find that some of the techniques you are showing are great at reducing lines of code but maybe not so good for code readability or for complying with coding standards. I guess there's a tradeoff to be found between the two.
As for try/catch, I don't know how it's used in embedded as I haven't used C++ for a while, but I think that in the general C++ community, use of exceptions (as in try/catch) is rather discouraged these days.
For the use of return values, I quite like POSIX-like return codes (0 or >0 for valid, <0 for errors). It's the default way of handling errors in Zephyr RTOS, which I've been using for a couple of years now.
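A minimal sketch of that convention in C, in case it helps. The names sensor_read() and sample_channel() are made up for illustration and are not Zephyr APIs. Zero means success, a negative errno value means failure, and the reading itself comes back through a pointer so it never shares the return value with the error code.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical driver call, for illustration only (not a Zephyr API).
 * Returns 0 on success or a negative errno value on failure.
 * The reading itself is delivered through `out`. */
int sensor_read(int channel, uint16_t *out)
{
    if (out == NULL) {
        return -EINVAL;                      /* bad argument */
    }
    if (channel < 0 || channel > 3) {
        return -ENODEV;                      /* no such channel */
    }
    *out = (uint16_t)(100 * (channel + 1));  /* stand-in for a real reading */
    return 0;
}

/* Caller: a single `< 0` test covers every failure case. */
int sample_channel(int channel, uint16_t *out)
{
    int ret = sensor_read(channel, out);
    if (ret < 0) {
        return ret;                          /* propagate the code unchanged */
    }
    return 0;
}
```

With this stub, sample_channel(1, &v) returns 0 and stores 200 in v, while sample_channel(7, &v) returns -ENODEV.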
Actually, if assigning to a variable in a conditional expression is the only problem, then all of those techniques work just fine, provided you don't care afterward whether there was an error or what its value was. E.g.
A() || B() || C();
works just as well as
(err = A()) || (err = B()) || (err = C());
as long as you don't care about the value of err afterward. And if you did, an out-argument could still give that to you:
err_t err = NO_ERR;
A(&err);
B(&err);
C(&err);
like I mentioned in the middle of the workshop.
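Here's a rough sketch of how the out-argument version could hang together. It assumes each function checks *err on entry and does nothing if an earlier step already failed. The function bodies, the error names other than NO_ERR, and the steps_run counter are placeholders for illustration.

```c
/* Error codes; only NO_ERR appears in the discussion, the rest are made up. */
typedef enum { NO_ERR = 0, ERR_A_FAILED, ERR_B_FAILED } err_t;

int steps_run = 0;   /* counts steps that actually did work (demo only) */

void A(err_t *err)
{
    if (*err != NO_ERR) return;   /* an earlier step failed: do nothing */
    steps_run++;
}

void B(err_t *err)
{
    if (*err != NO_ERR) return;
    steps_run++;
    *err = ERR_B_FAILED;          /* simulate a failure inside B */
}

void C(err_t *err)
{
    if (*err != NO_ERR) return;   /* skipped here, because B set *err */
    steps_run++;
}

err_t run_sequence(void)
{
    err_t err = NO_ERR;
    A(&err);
    B(&err);
    C(&err);
    return err;                   /* the first error recorded, if any */
}
```

In this sketch run_sequence() returns ERR_B_FAILED and C's body never runs, with no assignment inside a conditional expression anywhere.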
Thanks! I don't disagree that some of my recommendations will be frowned upon by some coding standards. (I'm fairly certain that many will frown on the idea of assigning to a variable like err inside a conditional expression like the logical ors or the if/elses. Are there others you saw?) In that case, I guess you still have the pattern of checking the error variable prior to each block of code:
uint16_t val;
err_t err = A(&val);
if(!err) err = B(val);
if(!err) err = C();
if(err) {...}
Personally, I feel like that amount of added code is exceedingly minimal and, furthermore, the pattern is easy to read once you see that each section of code is just wrapped with if(!err){...} so that the program will skip over them once the err variable is set.
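For anyone who wants to compile that pattern, a self-contained version might look like the following. The bodies of A, B and C, the specific error names, and the calls_to_C counter are stand-ins invented for the example.

```c
#include <stdint.h>

/* Error codes are assumptions for this sketch. */
typedef enum { NO_ERR = 0, ERR_READ, ERR_RANGE, ERR_SEND } err_t;

int calls_to_C = 0;   /* demo only: shows whether C was reached */

err_t A(uint16_t *val) { *val = 5000; return NO_ERR; }   /* produce a value */
err_t B(uint16_t val)  { return (val > 4095) ? ERR_RANGE : NO_ERR; }
err_t C(void)          { calls_to_C++; return NO_ERR; }

err_t run_steps(void)
{
    uint16_t val = 0;
    err_t err = A(&val);
    if (!err) err = B(val);   /* runs only if A succeeded */
    if (!err) err = C();      /* runs only if A and B succeeded */
    if (err) {
        /* one place to log, retry, or fall back */
    }
    return err;
}
```

Here B rejects the value, so run_steps() returns ERR_RANGE and C is skipped.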
The POSIX-like return codes are fine, provided the function is not also returning a different value through that return value (i.e. it's not an "in-band" error code), as discouraged by SEI ERR02-C.
Hi Nathan, I really appreciated the workshop! Can you expound on the difference/relationship between exception and error? Thanks.
I listed several examples of exceptions in the presentation and there were more in the chat.
Examples of errors include:
- Null pointer dereferencing
- Dereferencing pointer before memory allocation
- Improper function arguments
- Out-of-bounds array index
- Missing object/file/etc
- Stack overflow
- Out of heap space
- Module called before initialization
- Critical library return codes (e.g. RTOS unable to register thread, HAL unable to initialize peripheral, etc)
- Program about to enter an undefined state (e.g. default case in a switch block, invalid next FSM state, etc)
- Program ROM fails checksum
- Division by zero
- The file/stream used by a function is open/closed when it begins/ends executing
- The value of a const/input-only variable isn't changed by a function
Sure thing! I like the way Miro Samek describes it here. He recommends in that article to ask yourself two questions:
(1) "Can a given situation legitimately arise in this particular system?" and
(2) "If it happens, is there anything specific that needs to or can be done in the software?"
If the answer to either question is "Yes", then treat the situation as an exception. Otherwise, it's an error.
If my sensor gives me a weird value I should treat that as an exception; it's totally plausible that a sensor could malfunction. On the other hand, if a function is called with bad parameters (e.g. out-of-bounds array access), I should probably treat that as an error. The best case scenario is that that's the result of a programmer mistake (off-by-one or not checking that the array index was in a valid range before calling the function); in the worst case scenario, though, my program ROM is corrupt or my system is under attack. In those cases, the presence of the error is tantamount to an invalid machine instruction; basically "all bets are off" and the system can't trust anything that happens next, so it should probably reset or transition immediately to a safe mode.
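As a rough illustration of that split, the sketch below treats an out-of-range sensor reading as an exception (report it and substitute a fallback) and a bad channel index as an error (fail fast with assert). The channel count, temperature limits, fallback value, and fake readings are all assumptions made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_CHANNELS    4u
#define TEMP_MIN_C      (-40)
#define TEMP_MAX_C      125
#define TEMP_FALLBACK_C 25    /* assumed safe default, illustration only */

typedef enum { NO_ERR = 0, ERR_SENSOR_RANGE } err_t;

/* Fake readings standing in for real hardware. */
static const int16_t raw_readings[NUM_CHANNELS] = { 21, 300, -5, 90 };

err_t read_temperature(size_t channel, int16_t *temp_c)
{
    /* Error: a bad index or null pointer can only come from a bug or
     * corruption, so fail fast. On a real target the assert handler
     * might reset the system or enter a safe mode. */
    assert(channel < NUM_CHANNELS);
    assert(temp_c != NULL);

    int16_t raw = raw_readings[channel];

    /* Exception: a sensor can legitimately malfunction, so handle it. */
    if (raw < TEMP_MIN_C || raw > TEMP_MAX_C) {
        *temp_c = TEMP_FALLBACK_C;
        return ERR_SENSOR_RANGE;
    }

    *temp_c = raw;
    return NO_ERR;
}
```

The caller gets a status it can act on for the plausible failure, while the "can't happen" failure never reaches the caller at all.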
I was thinking through the various exception handling methods in terms of Philip Koopman's talk on Embedded System Safety. One of the big reasons I've heard (and thought) to use built-in exception handling mechanisms (e.g., C++ throw/try/catch) is that they separate exceptional conditions from the actual logic of the code. In other words, the code you "care about the majority of the time" is not obfuscated with a bunch of exception handling code. But, thinking from a safety perspective, I'm wondering if that's actually such a good thing? I would imagine it is harder to prove (e.g., in formal reviews) that safety was taken into consideration when exceptions are "bubbled up" the stack with no visible code to show that. At least with explicit exception handling code, you can prove what you are doing for safety. Thoughts?
Ooh, there are two great points to which I'd like to respond here.
First, you're absolutely right that exceptions make code verification difficult. A major criticism of try/catch exceptions is that they amount to structured gotos, leaving a nearly unknowable number of possible execution paths in your code base (see here). And because of that, it can be very hard to ensure that your code remains correct and free of memory leaks or other errors when using try/catch exceptions (see here). A quote from "Code Complete" (by Steve McConnell) that I almost included in my presentation is, "Several programming languages have supported exceptions for 5-10 years or more, but little conventional wisdom has emerged about how to use them safely."
Second, I think that moving the exception handling code to a catch block seems like a neat idea, but it actually complicates things in practice. For instance, you'd need multiple catch blocks and multiple different types of thrown exceptions if there's any sort of nuance to your program's exception handling (e.g. "if A() returns exception 1 we can retry up to 3 times but if it returns exception 2 then we need to default to an alternate operation and if B() returns any exception then we need to trigger this other operation and..."). In the simplest case where your program does exactly one thing any time there's any exception in the try block, then it can possibly be easier to read. But beyond that, I just think it doesn't scale well.
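For comparison, here is a rough sketch of that kind of per-call policy written with plain return codes in C: retry on one exception, switch to an alternate operation on another. The error names, the retry limit constant, A_alternate(), and the simulated behaviour of A() are all invented for illustration.

```c
/* Error codes and policy constants are assumptions for this sketch. */
typedef enum { NO_ERR = 0, ERR_BUSY, ERR_NO_DEVICE } err_t;

#define MAX_RETRIES 3

int attempts_of_A = 0;   /* demo only */
int fallback_used = 0;   /* demo only */

/* Hypothetical operation: busy on the first two calls, then succeeds. */
err_t A(void)
{
    attempts_of_A++;
    return (attempts_of_A < 3) ? ERR_BUSY : NO_ERR;
}

/* Hypothetical alternate operation used when A cannot work at all. */
err_t A_alternate(void)
{
    fallback_used = 1;
    return NO_ERR;
}

err_t do_A_with_policy(void)
{
    err_t err = A();

    /* "Exception 1": transient, so retry up to MAX_RETRIES times. */
    for (int retry = 0; err == ERR_BUSY && retry < MAX_RETRIES; retry++) {
        err = A();
    }

    /* "Exception 2": not recoverable by retrying, so use the alternate. */
    if (err == ERR_NO_DEVICE) {
        err = A_alternate();
    }

    return err;   /* anything still set here is for the caller to handle */
}
```

The policy for each failure sits right next to the call it applies to, which is the property that gets harder to keep once the handling moves into separate catch blocks.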
Excellent session Nathan! Thank You! I really enjoyed the interaction, getting to hear everyone's experiences and ideas.
Thanks so much!
Great session!
Thank you! I hope it was useful.
Perhaps the "Fail Quiet" concept Philip discussed can be worked into exception handling thinking, in a safety context. Or maybe it's orthogonal. Not sure.
The opposite concept to exception handling is assertion testing (with "typed" assertions). Suppose that asserting a condition we "hope for" goes beyond just doing nothing when all is okay, and instead posts a successful assertion to a queue that is being monitored by another agent (say a watchdog thread or a safety thread). Then the absence of a successful assertion will trigger a condition that needs to be noticed and responded to.
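One possible reading of that idea, sketched in C. A bitmask stands in for the monitored queue, and the assertion IDs and function names are hypothetical. On a real system the shared variable would need atomic access or a proper RTOS queue.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical "typed" assertion IDs, one bit each. */
enum {
    ASSERT_SENSOR_IN_RANGE = 1u << 0,
    ASSERT_LOOP_ON_TIME    = 1u << 1,
    ASSERT_ALL             = ASSERT_SENSOR_IN_RANGE | ASSERT_LOOP_ON_TIME
};

/* Stands in for the monitored queue; not thread-safe as written. */
static uint32_t posted = 0;

/* Called by application code: report success when the condition holds. */
void assert_post(uint32_t id, bool condition)
{
    if (condition) {
        posted |= id;
    }
}

/* Called periodically by the monitor (e.g. a watchdog or safety thread).
 * Returns true only if every expected assertion succeeded since the
 * previous check, then clears the record for the next period. */
bool monitor_check(void)
{
    bool ok = (posted & ASSERT_ALL) == ASSERT_ALL;
    posted = 0;
    return ok;
}
```

If any expected assertion fails, or is simply never reached, monitor_check() returns false for that period and the monitor can respond.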




The chat transcript from the live event has now been posted and is available through the 'Chat Transcript' tab.