Mars Perseverance Software
Steve Scandore - EOC 2021 - Duration: 39:42
That's correct, the general use of semaphores (and task locks in general) in the software is not allowed. This avoids some classic misuse and unexpected task dependencies (e.g., inversions, deadlocks) in an architecture where we want tasks to operate as independently and deterministically as possible. Using semaphores also complicates runtime analysis and testing in a system with processing deadlines. In short, we remove, or use only cautiously, conventions that may call the operation of the code into question. In many cases we have easily redesigned code to avoid the casual use of a semaphore. Having said all that, we do have cases where waivers to this rule are granted. For example, IPC waits on a message are implemented using a semaphore. Not allowing semaphores by default, and then granting waivers in the few places where they are really required, helps ensure their safe use and the overall safe operation of the system.
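The redesign idea above (tasks that stay independent and block only on a message wait) can be sketched roughly as follows. This is a minimal illustration in Python, not the flight software's design: `CounterTask`, `run_demo`, and the message tuples are hypothetical names invented for this example. Python's `queue.Queue` uses a lock internally, which loosely mirrors the point that the IPC message wait is the one place where a semaphore is allowed.

```python
import queue
import threading


class CounterTask:
    """Hypothetical task that owns its state; others reach it only via messages."""

    def __init__(self):
        self.inbox = queue.Queue()  # the only blocking wait in this design
        self.total = 0              # touched by this task's thread only
        self._thread = threading.Thread(target=self._run)

    def start(self):
        self._thread.start()

    def _run(self):
        while True:
            msg = self.inbox.get()  # IPC wait on a message
            if msg is None:         # sentinel: shut the task down
                break
            kind, value, reply = msg
            if kind == "add":
                self.total += value
            elif kind == "read":
                reply.put(self.total)

    def stop(self):
        self.inbox.put(None)
        self._thread.join()


def run_demo(n_senders=4, n_msgs=1000):
    """Several sender threads update the counter without any explicit lock."""
    task = CounterTask()
    task.start()

    def sender():
        for _ in range(n_msgs):
            task.inbox.put(("add", 1, None))

    senders = [threading.Thread(target=sender) for _ in range(n_senders)]
    for s in senders:
        s.start()
    for s in senders:
        s.join()

    # All "add" messages are queued before this "read", and the queue is FIFO.
    reply = queue.Queue()
    task.inbox.put(("read", 0, reply))
    result = reply.get()
    task.stop()
    return result
```

Because only the owning task ever touches `total`, there is no shared-state lock in application code to misuse, and the task's timing depends only on the messages it receives.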
As I mentioned in the talk, we have learned a lot over the years. Here's a link to a related dependency problem from our past: https://www.youtube.com/watch?v=C2xKhxROmhA
Thanks Steve. I used to work on the avionics software for the F-16 at General Dynamics in the late 80's. Back then we had no RTOS, just a homegrown "cyclic executive". We weren't even allowed to pass parameters to a function because it took too long to push and pop data from the stack, so everything was in global data. There was even a complete software team just to manage the global data. I am really glad to hear that VxWorks is now used. I used to work with them at various contractors when I worked for Rational Software. The Ada days. :-)
Thanks for the great presentation.
You mentioned "compression and data streaming" in the Mars Perseverance FSW slide. Which compression formats do you use? Are they custom made or public/known protocols?
I remember NASA's articles about TTEthernet and how they use it. Did you also use (TT)Ethernet? Why?
Yunas, see related comment from DavidKnight below. We do not use time-triggered (TT) Ethernet on this mission. Unfortunately, it takes many years to get new technology (in space terms) introduced into the mission avionics baseline. I do hope it happens.
I often wondered what kind of state machine implementation was used in this kind of mission-critical SW. I was pretty happy to learn that Dr. Samek's awesome QP framework was used alongside a traditional RTOS. This framework deeply changed my way of programming embedded SW.
Also, thank you so much for giving us insight into the SW used for the Mars rover missions: it gives us a slight feeling of having been part of this big adventure.
Yes, I was also happy to hear that "Samek's hierarchical state machines were used". (So they are apparently on Mars now! Awesome!). But Steve didn't actually say that the whole QP framework was used in this mission and from my understanding this was actually NOT the case. But the system was clearly event-driven, which is sufficient to apply at least the state machine part...
Steve, in the past there have been papers published from JPL and NASA about software engineering techniques employed in various subsystems (e.g. in MRO's radios where QP was, at least initially, used and verified using SPIN/Promela).
I would be interested in learning more about how your team designed, implemented and tested the software to achieve this level of reliability. Are there any resources from which we can learn more details about this mission?
Very inspiring, thanks for the presentation Steve. The amount of redundancy and reliability needed in a mission of this scale is crazy. It's also astonishing how much one can achieve on such a limited processor with a good architecture.
Perfect Presentation, thanks a lot for your time and your effort.
Can you give some info about unit test coverage in this magnificent project?
Thank you for the comment. We require 100% code coverage in unit testing. Waivers to the 100% rule are allowed in specific cases where coverage is not possible (e.g., intentional spin-loops). We use gcov to measure and report the coverage. It's not perfect. We can't easily measure code path coverage, so we rely more on test reviews to ensure the right tests exist. We can then use gcov to see what parts of the code have not been tested, then fill in those test gaps.
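As a rough illustration of the "see what has not been tested, then fill the gaps" step, the sketch below scans the text of a gcov-annotated source file (`.gcov`), where each line has the form `count:line_number:source` and never-executed lines are marked `#####` (or `=====` for exceptional paths). The `COVERAGE-WAIVER` comment tag and the sample C source are assumptions made up for this example; the thread does not say how waivers are actually recorded.

```python
WAIVER_TAG = "COVERAGE-WAIVER"  # hypothetical marker, not a gcov feature


def untested_lines(gcov_text, waiver_tag=WAIVER_TAG):
    """Return (line_number, source) pairs that gcov marks as never executed
    and that do not carry the (assumed) waiver tag."""
    gaps = []
    for raw in gcov_text.splitlines():
        parts = raw.split(":", 2)  # count, line number, source text
        if len(parts) < 3:
            continue
        count, lineno, source = parts[0].strip(), parts[1].strip(), parts[2]
        if count not in ("#####", "====="):
            continue  # executed, or not an executable line ("-")
        if waiver_tag in source:
            continue  # intentionally uncovered, e.g. a spin-loop
        gaps.append((int(lineno), source.strip()))
    return gaps


# Hand-written sample in gcov's annotated format (illustrative only).
SAMPLE = """\
        -:    0:Source:motor.c
        3:    1:int motor_step(int n) {
        3:    2:    if (n < 0)
    #####:    3:        return -1;
        3:    4:    return n + 1;
        3:    5:}
        2:    6:void check(int ok) {
        2:    7:    if (!ok)
    #####:    8:        for (;;) { } /* COVERAGE-WAIVER: intentional spin-loop */
        2:    9:}
"""

for lineno, source in untested_lines(SAMPLE):
    print(f"line {lineno}: {source}")
```

On the sample, only the `return -1;` line is reported as a gap; the spin-loop is skipped because it carries the waiver tag. Note that this is line coverage only, which matches the caveat above that path coverage is not easily measured this way.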
Hey Steve, Thank you so much for this great presentation.
At one point in the Q&A I think you mentioned having a custom version of gzip for compression. I was wondering if this is the only compression algorithm used, or does the rover use other algorithms like Huffman coding or Rice encoding?
I'm also wondering how the downlink works. Does the rover use the CCSDS space packet protocol or some custom packetization protocol? If CCSDS is used, do you still use the custom gzip compression for downlink, or do you use the standard Rice encoding that CCSDS recommends?
My initial response focused on the engineering data and science data aspects of compression. The other types we use in this area are: LZO (data), JPEG (image), ICER (image), LOCO (image). Each of these has its own encoding algorithm. Some of these originated from JPL/NASA missions.
For downlink, I would say we use a tailored, but compliant, version of the CCSDS space packet protocol. The data in the space packets are compressed using gzip or one of the others mentioned above. The CCSDS transfer frames are then streamed through additional telecom-specific encoders for reliability, not really for bandwidth management. This can be Reed-Solomon encoding, but we more commonly use Turbo encoding methods.
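To make the packet layer concrete, here is a minimal sketch of wrapping a gzip-compressed payload in a standard CCSDS space packet primary header (6 bytes: version, type, secondary header flag, 11-bit APID, sequence flags, 14-bit sequence count, and a data length field equal to the data size minus one). The function names, the APID value, and the choice of stock gzip are assumptions for illustration; the mission's tailoring and its custom gzip are not described in this thread, and the transfer frame and Reed-Solomon/Turbo coding layers are not shown.

```python
import gzip
import struct


def build_space_packet(apid, seq_count, payload, compress=True):
    """Pack payload into a CCSDS space packet (primary header + data field)."""
    data = gzip.compress(payload) if compress else payload
    if not 1 <= len(data) <= 65536:
        raise ValueError("packet data field must be 1..65536 bytes")
    if not 0 <= apid < 2**11:
        raise ValueError("APID is an 11-bit field")

    version = 0        # packet version number (3 bits)
    pkt_type = 0       # 0 = telemetry, 1 = telecommand
    sec_hdr = 0        # no secondary header in this sketch
    seq_flags = 0b11   # unsegmented user data

    word1 = (version << 13) | (pkt_type << 12) | (sec_hdr << 11) | apid
    word2 = (seq_flags << 14) | (seq_count & 0x3FFF)
    word3 = len(data) - 1  # length field is "bytes in data field minus one"
    return struct.pack(">HHH", word1, word2, word3) + data


def parse_space_packet(packet, compressed=True):
    """Inverse of build_space_packet for packets produced by this sketch."""
    word1, word2, word3 = struct.unpack(">HHH", packet[:6])
    data = packet[6:6 + word3 + 1]
    return {
        "apid": word1 & 0x7FF,
        "seq_count": word2 & 0x3FFF,
        "payload": gzip.decompress(data) if compressed else data,
    }


# Usage: 0x123 is an arbitrary example APID.
packet = build_space_packet(0x123, 42, b"temperature=21.5;" * 50)
fields = parse_space_packet(packet)
```

The header itself carries no indication of which compressor was applied; how the receiver learns that (APID convention, secondary header, or something else) is left open here.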
Excellent presentation. Wonderful insight into how this national treasure was constructed, tested, successfully launched, and landed. I learned a great deal about how embedded software development/software modeling is carried out by one of the nation's brightest! Thank you so much for your time!!! 73, Dave Comer, NM5DC
Such a cool system! Thanks for the presentation. By the way, why did you choose the PowerPC750? Is this the best RAD processor on the USA space market now?
The RAD750 is a space-qualified, radiation-hardened processor from the early 2000s. The qualification process is long and expensive. It was the best choice for this mission (Perseverance) given the reuse directive and implementation timeline. There are newer versions and options which were not fully qualified in the early 2010 time frame for this mission.
Hi Steve, thank you for the great presentation!
I'm an embedded systems engineer, and my career started with a simple CubeSat, so I was glad to hear about the Perseverance architecture. Perseverance is one of the largest embedded systems. You could probably talk for more than an hour on each topic (e.g. customized processor, cruise software, fail-safes for radiation tolerance, temperature,...), and until now I had not been able to hear about any of them. Today I could. Thank you Steve and the EOC2021 team for giving me this opportunity.
Steve, great presentation. The work that you guys are doing is simply awe-inspiring to human civilization.
Thank you
Thank you, Steve - this was excellent and very informative. There was also plenty to think about, seeing the complexity of the system and even how the flight software had 1.2 million lines of flight code and over 50 percent more (1.9 million) lines of unit test code.
Fabulous presentation and Q&A, Steve. You mentioned that wrt coding you do not allow any developers to use semaphores. Just curious why that is the case. I have a hunch :-)