Demystifying Memory Protection Units (MPUs)
Jean Labrosse - EOC 2021 - Duration: 52:55
A Memory Protection Unit (MPU) is hardware that improves the safety and security of an embedded device by only allowing access to memory and peripheral devices by the code that needs to access those resources. The application can be organized by processes, each having access to its own memory and peripheral space. Not only does the MPU prevent application code from accessing memory or peripheral devices outside its designated area, but it can also be a useful tool for detecting stack overflows, one of the most common causes of issues when using an RTOS.
This class discusses some of the features provided by most MPUs, but specific examples assume the MPU found in most ARM Cortex-M MCUs. Topics covered include:
- Privilege modes
- Limiting RTOS APIs for user code
- Preventing code from executing out of RAM
- Sharing data
- Keeping RTOS objects in RTOS space
- Handling faults
- And more
From the live chat:
Jean mentioned that not using an MPU can make it difficult to certify your product. I wanted to ask if he could elaborate on that, please. Great presentation btw!
The MPU allows you to place "access restrictions" on code and, especially, on the data that code can access. So, if you get either open source software and/or commercial software that you are unsure about, the MPU will prevent that "unknown" code from accessing memory and/or I/Os you don't want that code to access. So, if YOUR code is designed for "safety critical" applications (avionics, medical, industrial, nuclear, etc.) that cannot afford a crash from potentially "unsafe" code, then the MPU will protect that safety critical code from the "unsafe" code. Also, it's extremely expensive to certify code such as TCP/IP stacks because those are huge and complex. Isolating that code with an MPU allows you to have a safety certified product using a TCP/IP stack that has not been certified. Without an MPU, you simply cannot prevent unsafe code from taking a system down.
From the live chat:
It was a great presentation. Lots of fascinating information. I'm a little unclear on how you determine which tasks get grouped into processes - is it functional or a matter of which regions of processor registers they need to access? Also, I imagine the MPU and privileged mode switches add overhead. Is it at all significant?
I would delineate processes by functionality and determine how many tasks are needed for each of those processes. So, communications could be one process with multiple tasks, controls another process, reading and writing from/to I/Os could be another process, user interface yet another, etc.
Indeed, running tasks within processes in user mode (non-privileged) does add overhead every time you make an RTOS API call! The amount of overhead depends on the API called. You might be looking at anywhere from 50 to a couple of hundred CPU cycles. A complex API such as OSTaskCreate() (e.g. in uC/OS-III) would add a couple of hundred cycles, but that API does a lot of work and, in the grand scheme of things, the overhead is irrelevant because you create tasks at startup and not at run-time (at least you should).
From the live chat:
From javi: In the process table example you showed us, there were many unused regions. Is there any reason for that?
Well, it turns out that I only needed 3 regions for those: code, RAM and stack. However, you'd need additional regions for shared RAM and I/Os. On the ARMv7-M you only have 8 regions in total to play with, so you have to plan how your memory is laid out.
From the live chat:
How can I document an RTOS project (is there a tool) to show the relationship between tasks, priorities, semaphores and mutexes?
Like I mentioned during the Q&A, I like to do a drawing showing all the tasks and ISRs and the communication paths using queues, semaphores, mutexes, etc. Examples of those types of drawings can be found in my books, which can be downloaded for free from:
https://www.weston-embedded.com/micrium-books
From the live chat:
I have a task that handles the UART message logger. I created a queue (object) that all the other tasks use to send messages to the UART logger. The problem is that the queue that stores the messages consumes a lot of memory because, for example, each message takes 256 bytes. If I define a queue of 200 elements, the queue uses 51.2 KB of RAM. What do you suggest to reduce this high consumption?
It sounds like the messages are not being consumed by the target on the other side of the UART (i.e. the recipient). I assume you don't have non-volatile memory, like an EEPROM, to store those messages until they can be consumed? Alternatively, instead of using fixed-size messages, have you considered using a large circular buffer that would hold variable-sized messages? As long as there is a way to know either the beginning or the end of each message, you would reduce your storage needs. So, if your messages are ASCII based, NUL-terminated messages would let you find the end of each message. If you don't use ASCII, you could use a unique 32-bit code to delineate messages.
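As a rough illustration of the circular-buffer idea, here is a minimal sketch in C that stores NUL-terminated ASCII messages back to back, so each message only costs its own length plus one byte. The names (`log_ring_t`, `log_ring_put`, `log_ring_get`) and the 1024-byte capacity are hypothetical choices for this example, not part of any RTOS API, and the sketch is not thread-safe: in a real system the calls would need to be guarded by a mutex or made from a single task.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LOG_BUF_SIZE 1024u          /* hypothetical capacity; size it to your RAM budget */

typedef struct {
    uint8_t data[LOG_BUF_SIZE];
    size_t  head;                   /* index of the next byte to write */
    size_t  tail;                   /* index of the next byte to read  */
    size_t  used;                   /* number of bytes currently stored */
} log_ring_t;

void log_ring_init(log_ring_t *r)
{
    assert(r != NULL);
    r->head = 0u;
    r->tail = 0u;
    r->used = 0u;
}

/* Append msg and its NUL terminator. Returns 0 on success, -1 if it does not fit. */
int log_ring_put(log_ring_t *r, const char *msg)
{
    assert(r != NULL && msg != NULL);
    size_t len = strlen(msg) + 1u;              /* include the terminator */
    if (len > LOG_BUF_SIZE - r->used) {
        return -1;                              /* not enough free space */
    }
    for (size_t i = 0u; i < len; i++) {
        r->data[r->head] = (uint8_t)msg[i];
        r->head = (r->head + 1u) % LOG_BUF_SIZE;
    }
    r->used += len;
    return 0;
}

/* Copy the oldest message into out (capacity cap bytes).
 * Returns the message length without the NUL, or -1 if the buffer is
 * empty or out is too small (the message is then left in place). */
int log_ring_get(log_ring_t *r, char *out, size_t cap)
{
    assert(r != NULL && out != NULL);
    if (r->used == 0u) {
        return -1;
    }
    /* Only whole messages are ever stored, so a terminator must exist. */
    size_t len = 0u;
    size_t idx = r->tail;
    while (r->data[idx] != 0u) {
        idx = (idx + 1u) % LOG_BUF_SIZE;
        len++;
    }
    if (len + 1u > cap) {
        return -1;
    }
    for (size_t i = 0u; i <= len; i++) {        /* copy text and terminator */
        out[i] = (char)r->data[r->tail];
        r->tail = (r->tail + 1u) % LOG_BUF_SIZE;
    }
    r->used -= len + 1u;
    return (int)len;
}
```

With this layout, storing "boot ok" followed by "x" consumes 10 bytes rather than two 256-byte slots; the trade-off is that a full buffer rejects new messages until the logger task drains older ones.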
From the live chat:
You say: "Code can execute out of RAM... Susceptible to code injection attacks". Could you explain this point?
If you set the XN (eXecute Never) bit for a region that puts a fence around RAM, then code that is injected into RAM by a buffer overflow attack or other means will generate an MPU fault, preventing that code from executing. Without the MPU, this would go undetected.
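To make the XN bit a little more concrete, the sketch below builds the attribute word for an ARMv7-M MPU region (the value that would go into the MPU_RASR register) with XN set. The bit positions follow the ARMv7-M register layout; the helper name `mpu_rasr_ram_no_exec` is made up for this example, the memory-type bits (TEX/S/C/B) are left at zero for brevity, and actually programming the MPU (selecting the region, writing the base address, enabling the MPU) is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* ARMv7-M MPU_RASR bit fields used in this sketch */
#define MPU_RASR_ENABLE     (1u << 0)    /* region enable                  */
#define MPU_RASR_SIZE_POS   1u           /* SIZE field, bits [5:1]         */
#define MPU_RASR_AP_POS     24u          /* access permission, bits [26:24] */
#define MPU_RASR_XN         (1u << 28)   /* eXecute Never                  */
#define MPU_AP_FULL_ACCESS  3u           /* read/write, privileged and unprivileged */

/* Hypothetical helper: attribute word for a read/write, non-executable
 * region of 2^log2_bytes bytes. The SIZE field encodes log2(size) - 1,
 * and the smallest ARMv7-M region is 32 bytes (log2_bytes == 5).
 * TEX/S/C/B memory attributes are deliberately omitted here. */
uint32_t mpu_rasr_ram_no_exec(uint32_t log2_bytes)
{
    assert(log2_bytes >= 5u && log2_bytes <= 32u);
    return MPU_RASR_XN
         | (MPU_AP_FULL_ACCESS << MPU_RASR_AP_POS)
         | ((log2_bytes - 1u) << MPU_RASR_SIZE_POS)
         | MPU_RASR_ENABLE;
}
```

For a 64 KB RAM region, `mpu_rasr_ram_no_exec(16)` yields 0x1300001F: data accesses are allowed, but an instruction fetch from that region raises a fault.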
Hi,
About the MPU vs "malloc"/new/C++/heap thing vs memory protection (design trade-off?), wouldn't it be possible to overload new/delete or provide a "process storage allocator" that would work with a "process protected storage heap"? The concept would be similar to the "thread storage" pattern (POSA2, e.g. errno in POSIX, impure_ptr in newlib and the like), so a malloc operation could SVC a syscall requesting memory that would use the TCB (Task Control Block) to make the allocator work on the process protected heap and allocate a piece of that region.
Anyhow, the main issue would probably still be memory fragmentation (no MMU), unless you are using it only to allocate memory on initialization and never free it, for example to make platform code that can be composed dynamically, to improve (reduce the effort) of its composability.
Unless I didn't understand your question, there are just not enough MPU regions (fences) that you can use with malloc() and free(). Also, you'd need to ensure proper alignment as well as a power-of-2 allocation size. I'd rather have a local heap for a process that is managed by the process itself or have a heap in shared RAM managed by "system" functions. That being said, I'd prefer fixed-size memory blocks over arbitrary-size allocation blocks.
I used an MPU once and this lecture kicked me into making it a habit. Thank you for getting out of retirement for this interesting lecture. Also glad I could meet you in person at ESC Boston roughly 10 years ago.
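The alignment and power-of-2 constraints can be checked with a few lines of C. This is only a sketch, assuming the ARMv7-M rules (region size is a power of 2 of at least 32 bytes, and the base address is a multiple of the size); the function names are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MPU_MIN_REGION_BYTES 32u    /* smallest ARMv7-M MPU region */

/* True if (base, size) could be described by one ARMv7-M MPU region:
 * size is a power of 2 >= 32 bytes and base is aligned to size. */
bool mpu_region_is_valid(uint32_t base, uint32_t size)
{
    bool pow2 = (size >= MPU_MIN_REGION_BYTES) && ((size & (size - 1u)) == 0u);
    return pow2 && ((base & (size - 1u)) == 0u);
}

/* Smallest legal region size that can hold 'bytes' bytes.
 * Precondition: bytes <= 2^31, since a full 4 GB region cannot be
 * expressed as a 32-bit byte count. */
uint32_t mpu_round_up_pow2(uint32_t bytes)
{
    assert(bytes <= 0x80000000u);
    uint32_t size = MPU_MIN_REGION_BYTES;
    while (size < bytes && size < 0x80000000u) {
        size <<= 1;
    }
    return size;
}
```

A 1000-byte allocation would therefore need a 1024-byte region whose base is 1024-byte aligned, which illustrates why arbitrary malloc() blocks map poorly onto a handful of MPU regions.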
Warm thanks from Qc.
Awesome. Great to know you enjoyed the class. As you noted, it's a great device to use, especially since it's included with most Cortex-M MCUs. As I indicated at the beginning of the Q&A, you DON'T have to force your code to run in user mode. If you have the discipline of not tampering with the MPU from your application, you still get a lot of benefits from using the MPU, with less protection but also less overhead.
Hats off Mr. Labrosse.
You just fit a master class on RTOS, their entities, and their protocols, all the while giving away most useful insights, tips and pitfalls in a super-divulgative and well-explained talk. It really shows this is a very known territory to you. I will definitely look at the "more reading" part of the end of the presentation.
Thank you for your kind words. I'm glad you enjoyed it.
Thanks for the effort and this good presentation. The topic is really interesting and makes me want to use an MPU in my projects.
If I understand correctly, uC/OS-III will be able to provide an MPU module in future releases. When do you think it will be available?
Also, beyond the borders of this great presentation, I wonder whether there are any plans to offer multicore (SMP or AMP) support with uC/OS? I am asking because I can only use one of the cores on my Cortex-A9 chip and the other core just sits idle.
Regards
Well, I'm not sure whether it will be uC/OS-III or the Weston Embedded Solutions (WES) version, which is called CesiumOS3. Weston Embedded is formed of ex-Micrium guys and they created a derived product from the Micrium software. That's because Silicon Labs released the uC/ products as open source, but many 'customers/users' prefer having a commercial version of the Micrium products. The initial version of CesiumOS is identical to the uC/ products as they were when Silicon Labs released them as open source. From now on, CesiumOS will be maintained for commercial users. WES also maintains the uC/ products, though.
I know that WES has an AMP version of uC/OS-III. SMP is a different animal. Check with WES: www.Weston-Embedded.com for any updates.
Excellent presentation. Thank you.
I had a question about the exception that can be fired when the stack overflows. Which stack does the exception use since the task's stack just overflowed - is it the RTOS's stack?
All exceptions (on the ARM Cortex-M) force the CPU to switch to an 'exception' (or ISR) stack. So, it's not technically the RTOS stack but the CPU's ISR/exception stack.
What about 3rd party libraries that use malloc/free? For example newlib, ST's drivers, etc?
You 'could' use an MPU region to 'wrap' all of the heap space and add that region to all process tables. However, you won't be able to distinguish each of the allocated areas and protect them from one another.
So if a device is not in a task's process table, then it cannot access it?
That's correct. This is done to ensure that a task that has no business accessing an I/O device doesn't. For example, a TCP/IP stack should be allowed to access the Ethernet controller, but a user interface process should not. However, if a process does need to access those I/Os, then you simply add a region to its process table that covers them.
From the live chat:
I was curious about why he put the stack in the last region.
The higher the region number, the higher its priority. This ensures that stack overflows will be detected even if other regions permit access to the RAM space. In fact, if anything, there's a greater likelihood of having a stack overflow than most other memory violations.
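The priority rule can be modelled on a host PC with a small C function that walks the region table from the highest region number down and lets the first match decide. This is only a sketch: the `region_t` type and `mpu_model_can_write` are hypothetical names, and the layout mentioned afterwards (a broad read/write RAM region plus a small no-access band in region 7) is an assumption made for illustration, not necessarily the layout used in the talk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MPU_NUM_REGIONS 8           /* ARMv7-M MPUs typically offer 8 regions */

typedef struct {
    bool     enabled;
    uint32_t base;                  /* first address covered by the region */
    uint32_t size;                  /* region length in bytes              */
    bool     writable;              /* simplified permission: write allowed? */
} region_t;

/* Host-side model of overlapping-region resolution: when several enabled
 * regions contain addr, the highest-numbered one determines the result. */
bool mpu_model_can_write(const region_t table[MPU_NUM_REGIONS], uint32_t addr)
{
    assert(table != NULL);
    for (int i = MPU_NUM_REGIONS - 1; i >= 0; i--) {
        const region_t *r = &table[i];
        if (r->enabled && addr >= r->base && (addr - r->base) < r->size) {
            return r->writable;     /* highest matching region wins */
        }
    }
    return false;                   /* no region matches: user-mode access faults */
}
```

If region 1 grants read/write access to 32 KB of RAM at 0x20000000 and region 7 marks 32 bytes at 0x20001000 as no-access, a write to 0x20001010 is rejected even though region 1 alone would have allowed it. Placing the stack-related region last relies on exactly this behavior.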