
Live Q&A - Super-Simple Tasker - Hardware RTOS for ARM Cortex-M

Miro Samek - Duration: 29:35


DS
Score: 0 | 10 months ago | no reply

Outstanding presentation! Thank you for sharing it. It has certainly caused me to consider ways that I may be able to apply this paradigm in my next project design.

nathancharlesjones
Score: 0 | 1 year ago | 1 reply

I had a bit of a fascination with schedulers a while back, since they were covered so heavily in my grad studies and because they're so easy to get wrong! For those interested in reading more, Phil Koopman has a whole chapter on schedulers in his book "Better Embedded System Software", and some of that material can be found on his teaching website. Miro's book is also freely available.

If the idea of a small, simple scheduler is of interest to you, here are a few that I've found on the internet. Some look like an RTOS, but are written in very few lines of C (like SST), which gives them smaller overhead and also helps make them easier to understand.

MiroSpeaker
Score: 1 | 1 year ago | 1 reply

Thank you, Nathan, for the compilation of the various schedulers. From that list, I'm only familiar with Adam Dunkels' Protothreads. I've also written about how Protothreads compare to state machines. (Readers might enjoy a heated debate at the end of that blog post.)

But if we're making lists, here is a list of approaches related more specifically to the discussed SST kernel (although not necessarily incorporated into SST in any way):

nathancharlesjones
Score: 0 | 12 months ago | no reply

Yes, thanks for posting those! I saw that list at the end of your presentation and I'm very excited to check them out. (My apologies if it seemed like I was trying to hijack your thread with my last post!)

nathancharlesjones
Score: 1 | 1 year ago | 1 reply

Wonderful talk, Miro, and I really enjoyed the Q&A! I wanted to follow up on my last question regarding the way that SST addresses events that might try to post while a task is running. How much of the following is correct? In the legacy version of SST (which you said was implemented just in software, right?), an "end of interrupt" instruction was issued (for platforms that supported it) so that the processor could still respond to ISR requests. Essentially, this allowed for nested interrupts and a preemptive scheduler. But the interrupt controller on the Cortex-M already supports nesting, so we don't need a separate "EOI" instruction. However, we do still need to worry about tasks running at one priority level and ISRs that post events running at another priority level. So the solution is to have all ISRs that post events run at a higher ISR priority than any of the tasks, right? Thanks!

MiroSpeaker
Score: 1 | 1 year ago | 1 reply

Hi Nathan,
Thanks a lot for tuning in, and good questions. Your description of the "legacy SST" sounds correct, and for completeness, I think it would be a good idea to provide an example port of SST to a more traditional CPU, such as the MSP430.
But in the "hardware SST for Cortex-M" implementation covered in the presentation there is really not much distinction between ISRs and tasks, except that ISRs are pended from hardware while tasks are pended from software. Beyond that, the NVIC really doesn't "know" any difference. This means that the prioritization of ISRs above tasks is a pure convention. In fact, it would be interesting to ignore this convention and try mixing priorities by assigning lower priorities to some ISRs than some tasks. For example, the SysTick ISR for slower ticking rates (say, only 100Hz) might have a rate slower than some tasks. According to RMA then, SysTick ISR could (and perhaps should) be prioritized according to its lower rate.
But with the existing and working implementation of SST, you can quite easily test all such ideas. I would highly recommend that you do!
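To illustrate the point that the NVIC cannot tell a software-pended "task" from a hardware-pended ISR, here is a tiny host-runnable toy model of NVIC arbitration. All names and the simulation itself are my own invention for illustration; this is not SST code or the CMSIS API.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of NVIC arbitration: among all pending exceptions, the one
 * with the numerically LOWEST priority value runs first (NVIC's reversed
 * priority order). The model cannot tell whether an exception was pended
 * by hardware (a "real" ISR) or by software (an SST "task") -- which is
 * exactly the point made above. */

#define NUM_EXC 8U

static uint8_t prio[NUM_EXC];     /* NVIC priority per exception          */
static uint8_t pending[NUM_EXC];  /* 1 = pended (by HW *or* SW)           */

/* pend an exception; in real code this would be NVIC_SetPendingIRQ() */
static void pend(unsigned n) { pending[n] = 1U; }

/* return the exception number the NVIC would take next, or -1 if idle */
static int next_to_run(void) {
    int best = -1;
    for (unsigned n = 0U; n < NUM_EXC; ++n) {
        if (pending[n] && (best < 0 || prio[n] < prio[best])) {
            best = (int)n;
        }
    }
    return best;
}
```

Because arbitration looks only at the priority numbers, nothing stops you from assigning a "task" exception a better (lower) priority value than an "ISR" exception, as suggested above for a slow SysTick.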

nathancharlesjones
Score: 1 | 1 year ago | 1 reply

Thanks for that response, Miro! I think I understand that the NVIC doesn't necessarily know or care whether it's executing an "ISR" or a "task". My question comes from 14:56 in your presentation in which it looks like the "Button" task/ISR is running at the highest priority and the other tasks (Blinky3, Button2b, Button2a, and Blinky1) are running at lower priorities. Those last 4 tasks are all triggered in SW, right? It's only the "Button" task/ISR that is actually triggered by a HW input (i.e. the pressing of a button)? If that's correct, I guess my question is "Is it true that the Button ISR (or any ISR responding to HW events) must run at a higher priority than any "tasks" (ISRs that are triggered in SW by pending them in the NVIC)?" It would seem so, since otherwise the system could be running a task at a higher priority than an ISR that's responding to HW inputs and the SST might "miss" an event. At 28:34 in your presentation there's another view in which we can see that "Blinky3", "Button2b", "Button2a", and "Blinky1" are associated with IRQs PVD, RTC, TSC, and I2C2 (respectively). So those IRQs must be configured to have a lower priority than the Button ISR (not shown), correct?

MiroSpeaker
Score: 1 | 1 year ago | 1 reply

Hi Nathan,
Thank you for drilling into the details. I realize it's tedious, but that's the only way to truly understand how such a kernel works. (The good news is that a deeper understanding of any concept is very satisfying and allows you to use the concept more efficiently and with greater confidence.)

So first, I now realize that the logic analyzer trace might be confusing because it shows "Button" as the very top trace. This "Button"-labeled trace is just a raw button wire, which provides a trigger for the logic analyzer when the button is physically depressed. So, again, the Button signal does not represent any task or ISR and as such has no NVIC priority assigned to it. This is a good point and perhaps confusing for other viewers as well so I'm glad for the opportunity to clarify that point.

With this out of the way, the confusion about the Button2a and Button2b tasks should be cleared up. The task priorities in the "blinky-button" example are embedded in the names of the tasks. So, for example, Button2a and Button2b both have priority 2, which is lower than priority 3 of Blinky3. Consequently, the two Button tasks are not the highest-priority tasks. All these tasks are triggered exclusively in software by posting events to them; none of the tasks are triggered in hardware. Specifically, event posting to the Button2a task occurs in the SysTick ISR. (The periodic SysTick ISR is used to de-bounce the button press, and therefore in the logic analyzer trace you see the scenario unfolding only at the second tick after the button press. Please check the code for SysTick_Handler() inside the BSP file bsp_nucleo-l053r8.c.)
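The tick-based debouncing pattern described above can be sketched in a few lines of portable C. This is only an illustrative model of the idea (post the button event on the second consecutive "pressed" sample), not the actual code in bsp_nucleo-l053r8.c; all names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { RELEASED = 0, PRESSED = 1 };

static int posted;       /* count of BUTTON_PRESSED events posted        */
static uint8_t history;  /* last two raw samples, bit 0 = newest         */

/* stands in for posting an event to the Button2a task */
static void post_button_event(void) { ++posted; }

/* called on every periodic tick (e.g., from a SysTick-style handler) */
static void tick(int raw_sample) {
    history = (uint8_t)(((history << 1) | (raw_sample & 1)) & 3U);
    if (history == 3U) {     /* two consecutive PRESSED samples */
        post_button_event();
        history = 0U;        /* re-arm: wait for a fresh press  */
    }
}
```

The event fires on the second tick after the line first reads "pressed", which matches the two-tick delay visible in the logic analyzer trace.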

> At 28:34 in your presentation there's another view in which we can see that "Blinky3", "Button2b", "Button2a", and "Blinky1" are associated with IRQs PVD, RTC, TSC, and I2C2 (respectively). So those IRQs must be configured to have a lower priority than the Button ISR (not shown), correct?

Yes, that's correct. The association between SST tasks and NVIC IRQs happens in the BSP (because it depends on the particular MCU). The assignment of SST priorities to tasks occurs in main.c, when the tasks are started. The priority assignment to the SysTick ISR is performed in the SST_onStart() callback found in the BSP (file bsp_nucleo-l053r8.c). You can see there: NVIC_SetPriority(SysTick_IRQn, 0U), which means that SysTick is assigned NVIC priority 0 (the highest in the reversed priority order used by NVIC).

I hope this clarifies the situation. Thanks again for the great questions.
--Miro

nathancharlesjones
Score: 0 | 1 year ago | no reply

It does! Thanks, Miro!

Aniket
Score: 1 | 1 year ago | 1 reply

Hi, thanks for the lucid explanation. We have been using a very similar implementation, albeit on a different family of microcontrollers. One qualm that some of us have had with this implementation is that the number of possible priorities depends on the number of interrupt priorities supported by the hardware, because of the way the stack is used. While I agree the context switch is significantly faster since the OS need not swap out the stack, don't you think that implementations such as FreeRTOS are more flexible with regard to the usage of interrupt priorities?

MiroSpeaker
Score: 1 | 1 year ago | no reply

Hi Aniket,
There is a lot to unpack here, so let me peel off the layers of this question one by one.

Let me start with the number of available priority levels (which is often confused with the number of available tasks). The main technical reason why virtually all RTOS kernels (such as FreeRTOS and also SST) support preemptive, priority-based scheduling is the RMA/RMS method. However, if you really apply this method, you will quickly notice that it becomes hopelessly complex beyond 2 or 3 priority levels counted from the top. In practice, people don't apply RMA/RMS (do you?). Instead, task priorities are usually assigned by "intuition" and then endlessly tweaked to compensate for various "glitches" (which can all be traced to some blocking calls wrecking the responsiveness of the system). With this type of approach, it is really hard to say how many task priorities you actually need. So going back to the presented hardware implementation of SST, the NVIC limit on interrupt/task priorities might be perceived as low. But if you choose Cortex-M silicon with 4 NVIC priority bits (16 priority levels), such as an STM32 based on the Cortex-M3 or higher, you should really have more priority levels than you can possibly know what to do with.

The second issue is the number of possible tasks, which goes to the fundamental reason why an RTOS kernel is used in the first place. And that reason is that an RTOS allows you to split your application into quasi-independent tasks (so an RTOS is foremost a "divide and conquer" strategy). To that end, the frequent cause of the proliferation of tasks is blocking: blocked tasks waiting for one kind of event are unresponsive to other events, so you split them into more tasks that can wait for those other events in parallel. This of course leads to poor cohesion and to sharing of resources, which leads to mutual exclusion and more blocking. For all these reasons, experts advise avoiding blocking, even if your RTOS allows it. (The NASA JPL developers, from the paper I quoted in the presentation, were certainly aware that the VxWorks RTOS could block anywhere they wanted. Still, they specifically chose not to.) My point here is that non-blocking tasks remain extensible for new functionality (new events), so you need fewer tasks. Such tasks give you the freedom to split your functionality the way it really fits the problem, with proper cohesion, rather than to compensate for unresponsiveness inflicted by blocking.

In the end, the dominating preconception is that for real work you need a "real" RTOS kernel, like FreeRTOS. I disagree. But I understand the immense inertia of the embedded community, and therefore I devoted 1/4 of the presentation to explaining how you can use FreeRTOS in an event-driven way. If this is the only takeaway from this talk, I'd consider my main mission accomplished. For anyone interested in a more complete event-driven framework built on top of FreeRTOS, I recommend the "FreeAct" project available on GitHub.

MiroSpeaker
Score: 1 | 1 year ago | no reply

In case it didn't come through clearly enough in my closing remarks, I'd like to reiterate that the most important take-home message is to avoid blocking your tasks. This holds regardless of whether you are using SST or a traditional RTOS. (Of course, in a traditional RTOS you have to block, but that blocking should be limited to the top of the event loop.) This is also why I spent 1/4 of the time explaining the FreeRTOS version of the event-driven "blinky-button" example. If this is the only takeaway from this talk, I'd consider my mission accomplished.
Finally, if you are interested in a more complete event-driven framework built on top of FreeRTOS, there is a dedicated, permissively licensed GitHub repo called "FreeAct". Please check it out!

BGav
Score: 0 | 1 year ago | 1 reply

Hi Miro, great presentation!
On the topic of blocking, is it possible to avoid all blocking calls in a more complex system, for example one that has a filesystem, networking stack and shell? I am a big fan of event driven development, and I agree that blocking calls are not ideal and should be avoided. But I am struggling to see how it would be possible to completely remove them from a large IoT platform for instance.

MiroSpeaker
Score: 2 | 1 year ago | 1 reply

Can a substantial, real-life system be implemented without blocking? Yes, I think so! One example is the NASA JPL program of Martian rovers (starting with Pathfinder in 1995), which was described in the paper discussed in the presentation. The real-time embedded software in the latest Perseverance Rover certainly qualifies as a complex system.
The biggest problem is not technical impossibility, but rather the traditional sequential approach taken in middleware (such as communication software, file systems, etc.). Such 3rd-party software often blocks internally and thus requires blocking primitives and a traditional blocking RTOS to run. Also, there is often sharing of resources among concurrent tasks, which requires mutual-exclusion mechanisms (also typically blocking, like mutexes). Interestingly, for reasons of performance, some such 3rd-party software (e.g., the LwIP TCP/IP stack) is internally implemented in an event-driven way, and blocking is added only at the end for "convenience" (e.g., to provide a BSD-style blocking socket interface). But event-driven, non-blocking middleware is becoming increasingly popular and available. A good example is the Mongoose Embedded TCP/IP Web Server from Cesanta.
Regarding embedded file systems in particular, the situation is complex because these often use microsecond-level delays (e.g., for erasing segments). Such delays are too short for blocking, so they are implemented as busy polling. Obviously, this causes high CPU utilization, and because of that the tasks running file systems must be given low priority. (Otherwise, they will prevent other tasks from running for a long time.) This last comment applies to any execution environment, blocking or not.

BGav
Score: 0 | 1 year ago | no reply

Thank you, I appreciate the detailed response!

Paul12345
Score: 2 | 1 year ago | 1 reply

This is definitely the best or one of the best presentations at this conference. I've been aware of Miro's scheduler work for a couple years and have occasionally used a (simpler?) C++ event loop from embxx. This talk was a great introduction and I plan to watch some of his YouTube videos to supplement.

MiroSpeaker
Score: 1 | 1 year ago | no reply

Thank you, Paul!
Please also visit the SST repo on GitHub, and try it yourself. The provided examples are intentionally for very inexpensive eval-boards. I also used a cheapo logic analyzer to make testing SST easy.
For a deeper background about event-driven, preemptive but non-blocking kernels, I'd recommend reading the original SST paper. An additional bonus is a comment section attached at the end, which clarifies many misconceptions.
Finally, I hope to see you at the live discussion on Zoom!
--Miro

ChipWeller
Score: 1 | 1 year ago | 1 reply

Miro,
Thanks for the talk. Using the unused NVIC vectors to dispatch tasks is very creative.
I have been using a simple prioritized event driven non-preemptive dispatcher since the 1990s on numerous projects. With my implementation there is only one event queue, and events are pulled by their priority before being sent to their registered event handler (task). This structure allows a task, which is often a state machine, to handle events based on priority instead of FIFO. Could you comment on what advantages you see in associating priorities with the tasks instead of the events?
Thanks, Chip Weller

MiroSpeaker
Score: 1 | 1 year ago | 1 reply

Hi Chip,
Thank you for the questions. Your non-preemptive dispatcher reminds me somewhat of the non-preemptive "SST0" that I mentioned in my talk (see https://github.com/QuantumLeaps/Super-Simple-Tasker#non-preemptive-sst0). But in contrast to your dispatcher, SST0 uses multiple event queues (priority queues).
So, regarding the question about the single but prioritized event queue, I have the following problems with this approach: First, sorting the queue at runtime as events of various priorities are posted (or pulled out) is probably non-trivial (expensive) and maybe not quite deterministic (it likely depends on the number of events in the queue).
But even more importantly "pulling events by their priority" potentially changes the order of events for the same task (e.g., when a low-priority event is followed by a high-priority event and the low-priority event was not consumed yet). This is very confusing at best and leads to errors at worst. For example, if the low-priority event triggers a state transition, that new state might react completely differently to the following high-priority event than the previous state. I hope you see my point.
And another argument against assigning priorities to events (as opposed to tasks) is that it can lead to priority inversion. This probably applies only to preemptive kernels, but still, let me quickly elaborate. Suppose, for example, that a task is handling a low-priority event and a high-priority event arrives in the common queue. Due to the RTC (run-to-completion) semantics, the high-priority event must wait until the processing of the previous event completes. But that processing runs at a low priority (the priority of the original event). That means that medium-priority events destined for other tasks can preempt the low-priority one, while an urgent, high-priority event is waiting in the queue. This would be the classic unbounded priority inversion, as it can go on indefinitely (for as long as medium-priority events are in the queue).
For these reasons, SST (and other kernels of this type, like OSEK/VDX) assign priorities to tasks but use multiple (priority) queues. This seems to give you the best of both worlds. The work can be prioritized (through priority queues) and hard real-time deadlines can be met without the risk of priority inversion. But at the same time, the order of events for the same task is preserved (FIFO queueing). There is also no overhead or non-determinism of sorting events by priority in a common queue (either when they are posted or when they are pulled out). Of course, there is a cost of having multiple event queues, but this is relatively low.
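A minimal host-runnable sketch of the "multiple priority queues" design described above: each task owns a FIFO queue, and the scheduler always serves the highest-priority task with a non-empty queue. This is a toy fixed-size model I wrote for illustration, not the SST implementation; it just shows that per-task FIFO order is preserved while work is still served in priority order.

```c
#include <assert.h>

#define NUM_TASKS 4  /* index doubles as task priority; higher = more urgent */
#define Q_LEN     8

typedef struct {
    int evt[Q_LEN];
    int head, tail, count;
} Queue;

static Queue q[NUM_TASKS];  /* one private FIFO event queue per task */

/* post an event to a task's own queue (FIFO order preserved per task) */
static void post(int task, int e) {
    Queue *p = &q[task];
    assert(p->count < Q_LEN);  /* toy model: queue must not overflow */
    p->evt[p->tail] = e;
    p->tail = (p->tail + 1) % Q_LEN;
    ++p->count;
}

/* fetch the next event to process: scan from the highest priority down,
 * take the front of the first non-empty queue; return the task index,
 * or -1 when idle; *e receives the event */
static int fetch(int *e) {
    for (int t = NUM_TASKS - 1; t >= 0; --t) {
        if (q[t].count > 0) {
            *e = q[t].evt[q[t].head];
            q[t].head = (q[t].head + 1) % Q_LEN;
            --q[t].count;
            return t;
        }
    }
    return -1;
}
```

There is no runtime sorting anywhere: posting and fetching are both O(1) per queue (plus a bounded scan over the fixed number of priorities), which is where the determinism comes from.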

ChipWeller
Score: 0 | 1 year ago | 1 reply

Miro,
Thanks for the reply. You bring up some good points. I have not run into issues with handling events out of order, but that may largely be because I know they can be received out of order. Often a state machine needs to handle race conditions between events, so my approach just makes them more likely.
As far as the priority inversion, yes, it can happen, but I could argue the other way. Typically the timing requirements can be viewed as a chain of events, so the highest-priority deadline doesn't need to take into account how many events could be queued before the most time-critical one. Doing a deadline-monotonic analysis becomes fairly simple with this approach. (As far as RTC goes, in a few cases I have had to break low-priority event handlers up, firing a new event and terminating, in order to allow higher-priority events to meet their deadlines.)
I hadn't noticed that a single SST task could handle multiple prioritized queues. That certainly would address some of my concerns.
As far as the cost of pulling events in priority order, the cost is bounded, and I am typically surprised by how few events are in the queue.
Thanks, Chip

MiroSpeaker
Score: 1 | 1 year ago | no reply

Hi Chip,
Just to clarify, SST has only one event queue per task. I called all these private, per-task queues collectively "priority queues" because they behave according to the standard priority queue algorithm (every SST queue has the priority of the associated task).
But I'd like to step back and reflect on the bigger picture here, which is the non-blocking property of the event handlers. This alone is what makes all such kernels possible (like your dispatcher, SST, or SST0). This is also what I meant in the presentation by remarking that event-driven code (implying non-blocking code) is more flexible. Yet a simple kernel like that can provide all the benefits of a traditional (blocking) RTOS kernel, such as the ability to decompose an application into tasks and the efficient, real-time execution of those tasks. Due to the non-blocking property (by which I also mean non-polling), the one-shot event handlers typically run very fast (at the microsecond level), so meeting hard real-time deadlines is much easier.

16:35:08	 From  Linari : What is the difference between SST and SST0
16:36:49	 From  Ribero.Alex : Is the number of SST tasks limited by the number of available interrupts vectors? Is there a mechanism to have more or unlimited tasks?
17:00:34	 From  Amy McNeil : Thank You, Miro!
17:00:34	 From  Nathan Jones : Thanks, Miro!
17:00:40	 From  주희수 : THANK YOU!
17:00:42	 From  Aminul Haq : Thanks
17:00:43	 From  Erwin Pschernig : Thanks
17:00:50	 From  Vishwa Perera : Thank you
