Creating Chaos and Hard Faults


Elecia White - Watch Now - EOC 2024 - Duration: 01:02:39



The best way to understand why the processor is sending you love letters (exceptions) is to see what they look like when you aren’t also frantically trying to fix your code. This talk goes over the code necessary to cause (and debug) divide by zero, bus errors, stack overflows, and buffer overflows.

For each one, Elecia looks at the information the Cortex-M processor provides and how to use that to determine the cause of the fault. She describes how to use the information in a hard fault handler to create small core dumps to be stored after a system reboot.
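As a rough illustration of the small core dump idea, here is a minimal sketch in C. It is not the code from the talk: the names `core_dump_t`, `COREDUMP_MAGIC`, `coredump_save`, `coredump_is_valid`, and `coredump_clear` are hypothetical, and the checksum is a simple byte sum chosen for brevity. It assumes the record lives in RAM that the startup code does not clear (usually arranged with a dedicated linker section, which is toolchain specific); here it is an ordinary global so the logic can be exercised on a host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define COREDUMP_MAGIC 0xC0DEDEADu /* arbitrary marker value (assumption) */

/* Registers a Cortex-M core pushes on exception entry, in stack order. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} stacked_frame_t;

static_assert(sizeof(stacked_frame_t) == 8 * sizeof(uint32_t),
              "frame must match the 8-word hardware stack frame");

typedef struct {
    uint32_t magic;        /* COREDUMP_MAGIC when a dump is present */
    uint32_t sw_version;   /* addresses move between builds */
    uint32_t cfsr;         /* fault status captured in the handler */
    stacked_frame_t frame; /* copy of the stacked registers */
    uint32_t checksum;     /* byte sum of every field above */
} core_dump_t;

/* On a target this would be placed in RAM that survives reset (assumption). */
core_dump_t g_core_dump;

static uint32_t coredump_checksum(const core_dump_t *d)
{
    const uint8_t *bytes = (const uint8_t *)d;
    uint32_t sum = 0;
    for (size_t i = 0; i < offsetof(core_dump_t, checksum); i++) {
        sum += bytes[i];
    }
    return sum;
}

/* Intended to be called from the fault handler with the stacked frame address. */
void coredump_save(const uint32_t *stacked_sp, uint32_t cfsr, uint32_t sw_version)
{
    g_core_dump.magic = COREDUMP_MAGIC;
    g_core_dump.sw_version = sw_version;
    g_core_dump.cfsr = cfsr;
    memcpy(&g_core_dump.frame, stacked_sp, sizeof g_core_dump.frame);
    g_core_dump.checksum = coredump_checksum(&g_core_dump);
}

/* Intended to be called early after reboot, before the record is reused. */
bool coredump_is_valid(void)
{
    return g_core_dump.magic == COREDUMP_MAGIC &&
           g_core_dump.checksum == coredump_checksum(&g_core_dump);
}

/* Call once the dump has been reported so it is not reported twice. */
void coredump_clear(void)
{
    memset(&g_core_dump, 0, sizeof g_core_dump);
}
```

The magic word and checksum together guard against treating random power-on RAM contents as a real dump; a CRC would be a stronger choice than the byte sum used here.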


Phil_Kasiecki

Score: 0 | 1 year ago | no reply

This was a great talk, as past experience led me to expect (I enjoyed the prior presentation you alluded to and love the Memory Map Land graphic - I use it on my work PC for my lock screen, and a former colleague who just retired loved it as well when he walked in and saw it once).

SimonSmith

Score: 0 | 2 years ago | no reply

Thanks Elecia, that was a great talk. The links are useful. I have the first edition of your book, looking forward to getting the second soon.

Is there a way to dump some or all of the call stack, beyond just the PC & LR to help work out when a particular function/method was called (similar to what the IDE gives)?

Some of my users just take a photo/video of the device’s screen when they report a crash, so another variant of the hard fault handler (HFH) could dump the registers to the screen, then either wait a few seconds for the watchdog to trigger a restart, or poll forever for a GPIO key press and then force a system restart. Including the SW version would also be useful in any dump, as addresses move between versions.

I sprinkle asserts liberally, including in overloaded new() to catch dynamic memory allocation failures before they cause effects later. I’ve recently switched to using the safe string functions (strncpy_s etc.), which call an abort constraint handler, where I have an assert(false), if there’s a buffer overrun or similar error.

If anyone’s interested, there was a talk by Jean Labrosse in a previous EOC all about the MPU. Barr Group also have a good article on how to prevent and detect stack overflow. Yours ties in well with the talk by Suraj Joseph on software safety.

EleciaWhiteSpeaker

Score: 3 | 2 years ago | no reply

Resources from this talk:
Introduction to Hard Faults:

First handler shown from FreeRTOS: Debugging and diagnosing hard faults on ARM Cortex-M CPUs

Second handler shown from Memfault: How to debug a HardFault on an ARM Cortex-M MCU | Interrupt (this is the most in-depth resource)

Code used in the demo: github.com/eleciawhite/making-embedded-systems/Ch09_Debugging/

Arm Documentation: [Configurable Fault Status Register - Cortex-M3](https://developer.arm.com/documentation/dui0552/a/cortex-m3-peripherals/system-control-block/configurable-fault-status-register)

Adding NULL identification: Setting up the Cortex-M3/4 (ARMv7-M) Memory Protection Unit (MPU) - Sticky Bits

Smashing the Stack for Fun and Profit describes how to manipulate the stack

Buried Treasure and Map Files (and Linker Files) talk and map
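To make the CFSR documentation linked above more concrete, here is a minimal sketch (not code from the talk) that turns a captured CFSR value into flag names. The bit positions follow the Cortex-M3/M4 user guides; the `cfsr_flag_t` table and the `cfsr_decode` function are hypothetical names. It takes the register value as a parameter rather than reading address 0xE000ED28, so it can run on a host against a value saved in a core dump.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t mask;
    const char *name;
} cfsr_flag_t;

/* CFSR bits, lowest bit first: MemManage, then BusFault, then UsageFault. */
static const cfsr_flag_t k_cfsr_flags[] = {
    { 1u << 0,  "IACCVIOL"    }, /* instruction access violation */
    { 1u << 1,  "DACCVIOL"    }, /* data access violation */
    { 1u << 3,  "MUNSTKERR"   }, /* MemManage fault on exception return */
    { 1u << 4,  "MSTKERR"     }, /* MemManage fault on exception entry */
    { 1u << 7,  "MMARVALID"   }, /* MMFAR holds the faulting address */
    { 1u << 8,  "IBUSERR"     }, /* instruction bus error */
    { 1u << 9,  "PRECISERR"   }, /* precise data bus error */
    { 1u << 10, "IMPRECISERR" }, /* imprecise data bus error */
    { 1u << 11, "UNSTKERR"    }, /* bus fault on exception return */
    { 1u << 12, "STKERR"      }, /* bus fault on exception entry */
    { 1u << 15, "BFARVALID"   }, /* BFAR holds the faulting address */
    { 1u << 16, "UNDEFINSTR"  }, /* undefined instruction */
    { 1u << 17, "INVSTATE"    }, /* invalid execution state */
    { 1u << 18, "INVPC"       }, /* invalid PC load on exception return */
    { 1u << 19, "NOCP"        }, /* no coprocessor */
    { 1u << 24, "UNALIGNED"   }, /* unaligned access (if trapping is enabled) */
    { 1u << 25, "DIVBYZERO"   }, /* divide by zero (if trapping is enabled) */
};

/* Writes space-separated flag names into out; returns how many flags are set.
 * Names that do not fit in out are counted but not written. */
size_t cfsr_decode(uint32_t cfsr, char *out, size_t out_len)
{
    size_t count = 0;
    size_t used = 0;

    assert(out != NULL || out_len == 0);
    if (out_len > 0) {
        out[0] = '\0';
    }
    for (size_t i = 0; i < sizeof k_cfsr_flags / sizeof k_cfsr_flags[0]; i++) {
        if ((cfsr & k_cfsr_flags[i].mask) == 0) {
            continue;
        }
        count++;
        size_t len = strlen(k_cfsr_flags[i].name);
        size_t sep = (used > 0) ? 1u : 0u;
        if (used + sep + len < out_len) {
            if (sep) {
                out[used++] = ' ';
            }
            memcpy(out + used, k_cfsr_flags[i].name, len);
            used += len;
            out[used] = '\0';
        }
    }
    return count;
}
```

For example, a CFSR of `0x02000000` decodes to `DIVBYZERO`, which is only raised when the DIV_0_TRP bit in the CCR is set.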

yen

Score: 1 | 2 years ago | 1 reply

Hey Elecia, love your book and podcast! I know this might not be as common, but do you happen to know strategies to cause/debug dynamic memory allocation issues? Thanks so much for your time.

EleciaWhiteSpeaker

Score: 1 | 2 years ago | no reply

I was trying to fit this in as one last slide but I don't think I'll manage so let me reply directly. (Also, I'm plagiarizing from Ch 9 of my book.)

I have a bag of tricks I use to debug memory issues. The first is to make variables static or global to see if it is a stack issue (or an uninitialized variable issue). Note: if the problem goes away when this happens, the problem is not solved, it is hidden; you still have to fix it.

Second, you can add extra buffers between buffers (red zones). Fill these with known values (such as 0xdeadbeef or 0xA5A5A5A5). After running for a while, look in the red zone buffers to see if the data has been modified, indicating one of the buffers went outside its boundaries. You can do similar things with stacks: fill them with known data at boot and then look for how much of the stack is used after running for a significant amount of time. The highest point the stack reaches (on the edge of where the red zone data has not been modified) is called the high-water mark and represents maximal stack usage.
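The red zone and high-water mark checks described above can be sketched in C as follows. This is a minimal host-side illustration, not code from the book: `PAINT_VALUE`, `paint_region`, `red_zone_intact`, `stack_unused_words`, and `stack_high_water_words` are hypothetical names, and the stack is assumed to grow downward from the end of the array (as on Cortex-M), so unused words are found at the low-address end.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAINT_VALUE 0xA5A5A5A5u /* known fill pattern */

/* Fill a red zone or an unused stack region with the known pattern. */
void paint_region(uint32_t *region, size_t words)
{
    assert(region != NULL);
    for (size_t i = 0; i < words; i++) {
        region[i] = PAINT_VALUE;
    }
}

/* A red zone is intact only if every word still holds the pattern. */
bool red_zone_intact(const uint32_t *zone, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        if (zone[i] != PAINT_VALUE) {
            return false;
        }
    }
    return true;
}

/* Count untouched words starting from the lowest address.
 * Assumes a descending stack: index 0 is the last word to be used. */
size_t stack_unused_words(const uint32_t *stack, size_t words)
{
    size_t unused = 0;
    while (unused < words && stack[unused] == PAINT_VALUE) {
        unused++;
    }
    return unused;
}

/* High-water mark: the deepest the stack has reached, in words. */
size_t stack_high_water_words(const uint32_t *stack, size_t words)
{
    return words - stack_unused_words(stack, words);
}
```

On a real target the painting has to stop below the current stack pointer, and a program that happens to store the pattern value itself can make the measurement read slightly low.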

If you are experiencing odd issues, try making your stack(s) (or heaps) larger to see if it takes longer for the problem to occur. Alternatively, does making the stack smaller cause the issue to happen sooner? Reproducing the bug is an important part of moving from impossible bug to merely very difficult.

Finally, replace heap (malloc and new) buffers with fixed buffers. Not only can this avoid using freed memory, it alleviates issues with trying to allocate too much memory (or trying to allocate a large block when the heap is fragmented).
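One common way to replace heap allocations with fixed buffers is a pool of equally sized blocks. The following is a minimal sketch under stated assumptions, not the approach from the book: `POOL_BLOCK_SIZE`, `POOL_BLOCK_COUNT`, `pool_alloc`, and `pool_free` are hypothetical names, the sizes are arbitrary, and there is no locking, so it is only suitable for a single context as written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_BLOCK_SIZE  32u /* bytes per block (arbitrary for the sketch) */
#define POOL_BLOCK_COUNT 4u  /* number of blocks (arbitrary for the sketch) */

static_assert(POOL_BLOCK_COUNT > 0, "pool needs at least one block");

/* The union forces each block to be aligned for any object type. */
typedef union {
    max_align_t align;
    uint8_t bytes[POOL_BLOCK_SIZE];
} pool_block_t;

static pool_block_t pool_blocks[POOL_BLOCK_COUNT];
static bool pool_in_use[POOL_BLOCK_COUNT];

/* Returns a free block, or NULL when the pool is exhausted.
 * Fixed-size blocks cannot fragment, so failure means "all in use". */
void *pool_alloc(void)
{
    for (size_t i = 0; i < POOL_BLOCK_COUNT; i++) {
        if (!pool_in_use[i]) {
            pool_in_use[i] = true;
            return pool_blocks[i].bytes;
        }
    }
    return NULL;
}

/* Returns false for a pointer that is not a pool block or is already free,
 * which makes double frees and stray pointers detectable by the caller. */
bool pool_free(void *p)
{
    for (size_t i = 0; i < POOL_BLOCK_COUNT; i++) {
        if (p == (void *)pool_blocks[i].bytes) {
            if (!pool_in_use[i]) {
                return false; /* double free */
            }
            pool_in_use[i] = false;
            return true;
        }
    }
    return false; /* not one of our blocks */
}
```

Because `pool_free` reports failures instead of silently corrupting state, its return value is a natural place to hang an assert during development.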

13:44:53 From Elecia White to Everyone:
	Code is here: https://github.com/eleciawhite/making-embedded-systems/blob/main/Ch09_Debugging/hardfaults.c
13:45:39 From Elecia White to Everyone:
	Web page I jump to: https://developer.arm.com/documentation/dui0552/a/cortex-m3-peripherals/system-control-block/configuration-and-control-register 
	Looking at the CFSR and later the CCR
13:46:19 From Elecia White to Everyone:
	Cortex-M4 Devices User Guide: https://developer.arm.com/documentation/dui0553/latest/
13:58:46 From Elecia White to Everyone:
	This first hard fault handler comes from FreeRTOS blog: Debugging Hard Faults and Other Exceptions:
	https://www.freertos.org/Debugging-Hard-Faults-On-Cortex-M-Microcontrollers.html
13:59:08 From Elecia White to Everyone:
	How to debug a HardFault on an ARM Cortex-M MCU | Interrupt
14:09:11 From Nicolas Fillon to Everyone:
	Excellent session
14:10:09 From Mark Bremer to Everyone:
	Have you looked at rust to avoid several of the issues you've mentioned?
14:10:36 From tyhw to Everyone:
	What do you recommend putting in the core dump?
14:11:57 From Nathan to Everyone:
	Do you use static analysis tools to improve detection of potential issues (when compiler warnings are not enough)?
14:12:02 From Jui Yen to Everyone:
	Great session! Just curious: so what actually happens when you div by 0 with the hard fault configured to not trigger?
14:12:22 From Ali to Everyone:
	Cool stuff to allocate a section in the linker file to store the coredump. How is the memory retained post reset? Are we supposed to get the NVRAM addresses?
14:12:48 From sstmichael to Everyone:
	What do you prefer for embedded systems development, C or C++? Or does it depend?
14:13:22 From tyhw to Everyone:
	Thanks!
14:14:35 From adrian to Everyone:
	Follow-on to Ali's question about memory. Do you see any issues with storing the coredump in volatile RAM (i.e. SRAM)? Most processors won't reset SRAM when the processor is reset in the hard-fault handler. Is that something we can always assume though?
14:19:01 From Nathan Jones to Everyone:
	What techniques would you recommend for catching "chaos" on MCUs that don't have a HardFault handler?
14:20:18 From Emily Schmalz to Everyone:
	All non-valid opcodes on PIC18 (for example) are treated as NOPs.
14:21:02 From Nathan Jones to Everyone:
	Thanks, you two!
14:26:27 From Gonzalo to Everyone:
	Do you have experience using another MCU as a supervisor (black box), and any recommendation on what should be reported to it? I think a core dump should be sent to the supervisor. I have never needed a supervisor yet, just guessing.
14:27:07 From sstmichael to Everyone:
	When do you recommend running bare-metal applications, moving to writing a tiny scheduler, and then finally using an actual os/rtos?
14:28:53 From Erwin to Everyone:
	As follow up to your red zone markers: How to deal with uninitialised variables or cpu's that allocate variables by only incrementing SP + 32 for example?
14:30:02 From Erwin to Everyone:
	Especially if after SP manipulation you access memory out of the current stack
14:33:28 From Marian Petre to Everyone:
	Thank you!  Terrific discussion!
14:33:35 From Jui Yen to Everyone:
	Thanks again for the information!
14:33:42 From Mike to Everyone:
	Thank you!
14:33:43 From Lyden Smith to Everyone:
	Thank you Elecia!
14:33:43 From Emily Schmalz to Everyone:
	Yeah, absolutely awesome talk!!
14:33:45 From Stephane to Everyone:
	Thank you!!!
14:33:48 From René Andrés Ayoroa to Everyone:
	Thank you Elecia!
14:33:52 From SuziO to Everyone:
	Thank you so much!! That was a great talk and I can't wait to read the new edition of your book!
14:33:53 From Carlos Hidalgo to Everyone:
	Thank you for an excellent talk!
14:33:56 From Vishwa to Everyone:
	Thank you for your great talk
14:33:58 From Erwin to Everyone:
	Thanks for sharing all your insights!
14:34:06 From sstmichael to Everyone:
	Thanks!
14:34:14 From Elecia White to Everyone:
	Thank you all for coming!
14:34:21 From Gabriel to Everyone:
	Thanks Elecia
14:34:27 From Andrew MacIsaac to Everyone:
	Thank you!
14:34:31 From BobF to Everyone:
	Jam packed ... a lesson or several in 'How to do it' !!
14:34:31 From Ingo Beyer to Everyone:
	Great, thanks a lot Elecia!
14:34:46 From Manoj to Everyone:
	Thank you!
