Developing a Robust Serial Communication Framework Between Microcontrollers
Serial communication between microcontrollers is essential for most embedded systems designs. Poorly designed communication frameworks can lead to unreliable system behaviours and can limit future scalability.
This session will explain the basics of building a robust serial communication framework between two microcontrollers. I will cover key design considerations such as:
- Understanding the system requirements: Identifying the data to be transmitted, upload frequency, and understanding the limitations of the communication link.
- Packet structure and message type design: Creating packet structures and message types that take the requirements into account.
- Error detection: Implementing techniques such as CRCs to ensure data integrity.
- Retries and timeouts: Strategies for handling missing packets and maintaining reliable communication based on the system requirements.
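To make the packet-structure and error-detection points concrete, here is a minimal C sketch of one possible frame layout: a start byte, a length byte, a message ID, a message type, the payload, and a CRC over everything after the start byte. The layout, the 0x7E start byte, the field order, the choice of CRC-16/CCITT-FALSE (polynomial 0x1021, initial value 0xFFFF), and the names `crc16_ccitt` and `build_packet` are assumptions for illustration, not the format used in the talk. Byte stuffing is deliberately left out, so payload bytes equal to the start byte are not yet protected.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical frame layout (illustrative only):
 * [START][LEN][MSG_ID][TYPE][PAYLOAD ... LEN bytes][CRC_HI][CRC_LO]
 * LEN counts payload bytes only; the CRC covers LEN through the payload. */
#define FRAME_START     0x7Eu
#define FRAME_OVERHEAD  6u    /* start + len + id + type + 2 CRC bytes */
#define MAX_PAYLOAD     64u

/* CRC-16/CCITT-FALSE: poly 0x1021, init 0xFFFF, no reflection, no final XOR. */
uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000u) {
                crc = (uint16_t)((crc << 1) ^ 0x1021u);
            } else {
                crc = (uint16_t)(crc << 1);
            }
        }
    }
    return crc;
}

/* Builds a frame into out (capacity out_cap). Returns the frame length,
 * or 0 if the payload is too large or the output buffer is too small. */
size_t build_packet(uint8_t msg_id, uint8_t type,
                    const uint8_t *payload, size_t payload_len,
                    uint8_t *out, size_t out_cap)
{
    if (payload_len > MAX_PAYLOAD || out_cap < payload_len + FRAME_OVERHEAD) {
        return 0;
    }
    size_t n = 0;
    out[n++] = FRAME_START;
    out[n++] = (uint8_t)payload_len;
    out[n++] = msg_id;
    out[n++] = type;
    if (payload_len > 0) {
        memcpy(&out[n], payload, payload_len);
        n += payload_len;
    }
    /* CRC over everything after the start byte, appended big-endian. */
    uint16_t crc = crc16_ccitt(&out[1], n - 1);
    out[n++] = (uint8_t)(crc >> 8);
    out[n++] = (uint8_t)(crc & 0xFFu);
    return n;
}
```

On the receiving side, running `crc16_ccitt` over the received LEN..payload bytes should reproduce the two trailing CRC bytes; in this sketch a mismatch would simply mean the frame is dropped.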
Attendees will understand the basics of building scalable and reliable serial communication protocols that minimize future pitfalls.
What this presentation is about and why it matters
How do you turn “just send bytes” into a serial link that stays predictable when packets are lost, buffers overflow, or two microcontrollers start talking at once? Prabo Semasinghe approaches that question as a practical firmware case study, grounded in microcontroller-to-microcontroller communication over UART and related serial links. He walks through the design decisions behind a robust protocol, from requirements and packet structure to system behavior, state machines, and testing. If you have ever had a protocol work in the lab and then fail in the field, this session is aimed at the design pressures that make that happen.
Who will benefit the most from this presentation
- Firmware engineers working on MCU-to-MCU links who need a protocol they can own end to end
- Embedded developers dealing with tight memory, bandwidth, or power limits on serial communication
- Systems engineers who have seen packet loss, desynchronization, or retry loops create instability
- Engineers planning firmware upgrade or file-transfer flows over a custom link
- Technical leads who want a clearer process for designing and testing communication behavior
What you need to know
A basic grasp of embedded firmware and serial communication will help.
- Familiarity with microcontrollers and UART-style links
- Comfort reading about packet fields and protocol behavior
- Some exposure to testing firmware, including failure cases
Glossary (terms used in this talk)
- CRC (Cyclic Redundancy Check): An error-detection method that computes a check value from a block of data to detect accidental corruption.
- Byte stuffing: A framing technique that transforms reserved byte values before transmission so they are not mistaken for delimiters or control bytes. The receiver reverses the transformation to recover the original payload.
- Checksum: A compact value computed from data to help detect corruption during transmission or storage. It is usually cheaper to compute than stronger integrity checks, but it may detect fewer error patterns.
- Heartbeat / keep-alive: A periodic signal or message used to show that a communication link is still active even when no application data is being sent. It helps systems distinguish an idle link from a broken one.
- State machine: A model that defines how a system moves between named states in response to inputs, events, or timeouts. It is often used to make protocol behavior explicit and predictable.
- UART (Universal Asynchronous Receiver-Transmitter): A serial communication peripheral that sends data one bit at a time without a shared clock. It is widely used for debug output, host-device messaging, and MCU-to-MCU links such as those discussed in this talk.
- CAN: A robust multi-node communication bus with built-in arbitration and error handling features. It is widely used in embedded and automotive systems where shared-bus communication is a fit.
- Modbus: A widely used industrial communication protocol family for exchanging structured data between devices. It is commonly used in control and automation environments with established device ecosystems.
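The glossary's point that a checksum "may detect fewer error patterns" than a CRC can be shown with a small experiment. The sketch below assumes a simple 8-bit additive checksum and the CRC-16/CCITT-FALSE variant as representative examples (the function names are hypothetical); it shows a case the additive checksum misses: two payload bytes swapped in transit produce the same sum, while the CRC changes.

```c
#include <stddef.h>
#include <stdint.h>

/* Simple 8-bit additive checksum: cheap, but insensitive to byte order. */
uint8_t checksum8(const uint8_t *data, size_t len)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum = (uint8_t)(sum + data[i]);
    }
    return sum;
}

/* Bitwise CRC-16/CCITT-FALSE (poly 0x1021, init 0xFFFF). */
uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x1021u)
                                  : (uint16_t)(crc << 1);
        }
    }
    return crc;
}
```

With `a = {0x12, 0x34}` and `b = {0x34, 0x12}`, both checksums are 0x46, but the CRCs differ, because a CRC-16 detects any error burst no longer than 16 bits.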
Final thoughts
Practical and design-focused, this microtalk gives you a vocabulary for thinking about protocol robustness instead of treating serial links as a byte-shuffling detail. The value is a clearer way to reason about packet structure, system behavior, and failure handling as one connected problem. It will help firmware developers, embedded architects, and anyone who has to make custom links dependable under real constraints. The talk’s spirit is simple: good communication is designed, not hoped for.
This overview is AI-generated from the session transcript.
Thank you for the feedback. Yes, the message ID has been extremely useful in many cases, such as troubleshooting unexpected behaviours in the field. I have been able to identify edge cases unrelated to the communication protocol simply because having a message ID makes it easier to interpret the system behaviour. And yes, the state machine logic should detect invalid message types. Logging these errors and sending them over as diagnostic data is an excellent idea; I should have included that in the presentation.
Thank you for a very solid presentation. I am maintaining a system that has a custom serial protocol that doesn't include a start byte, but instead makes the length byte the first byte of a packet. That then allows for the receiver to know how many more bytes to read for that packet.
Our particular implementation is point-to-point with 1 master talking to 1 device, and it seems to work well in that scenario and has been field-proven over a decade. I'm trying to think of cases where the start byte may be required or more beneficial than the way we are doing it with only the length, and one thing that comes to mind might be a 1 master to multiple device scenario? That way, devices would only need to "wake up" and pay attention when they receive a start byte. Is this the primary benefit of having a dedicated start byte, or are there others that I may be overlooking?
Thanks again!
As you said, the start byte can be used to trigger the receivers to start listening. You can even use different start bytes for different slaves on the same bus; that way, the start byte works as an address as well.
In addition, a start byte can be helpful in the following cases:
- Non-ideal electrical wiring or grounding can cause garbage bytes to show up on the bus. A receiver will then start listening and decoding a message, treating the garbage byte as the length (in your case). A start byte keeps the receiver from doing this unnecessary processing.
- What happens if a receiver goes through a reset cycle (someone turns the device off and back on at a random moment) and wakes up in the middle of a transmission? A start byte prevents the device from processing that partial message. The same thing can happen during a temporary link breakdown.
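The resynchronization benefit described above comes from the receiver's state machine: it ignores everything until it sees the start byte, so line noise or the tail of a frame received after a reset is discarded instead of being read as a length. The minimal C sketch below assumes a frame of [0x7E][LEN][LEN payload bytes][CRC_HI][CRC_LO] with no byte stuffing, a 32-byte payload limit, CRC-16/CCITT-FALSE over LEN and the payload, and hypothetical names (`rx_feed`, `rx_ctx_t`). Without byte stuffing a stray 0x7E can still cause a false start, so the length bound and the CRC act as a second line of defense; an inter-byte timeout that returns the parser to the waiting state could also be added.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define START_BYTE   0x7Eu
#define MAX_PAYLOAD  32u

typedef enum { RX_WAIT_START, RX_LEN, RX_PAYLOAD, RX_CRC_HI, RX_CRC_LO } rx_state_t;

typedef struct {
    rx_state_t state;
    uint8_t len;
    uint8_t idx;
    uint8_t payload[MAX_PAYLOAD];
    uint16_t rx_crc;
    uint32_t bad_len;   /* diagnostic counters */
    uint32_t bad_crc;
} rx_ctx_t;

/* One byte of CRC-16/CCITT-FALSE (poly 0x1021). */
static uint16_t crc16_update(uint16_t crc, uint8_t byte)
{
    crc ^= (uint16_t)((uint16_t)byte << 8);
    for (int bit = 0; bit < 8; bit++) {
        crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x1021u) : (uint16_t)(crc << 1);
    }
    return crc;
}

/* CRC over the LEN byte and the payload, starting from 0xFFFF. */
static uint16_t frame_crc(const rx_ctx_t *rx)
{
    uint16_t crc = crc16_update(0xFFFFu, rx->len);
    for (uint8_t i = 0; i < rx->len; i++) {
        crc = crc16_update(crc, rx->payload[i]);
    }
    return crc;
}

/* Feed one received byte. Returns true when a complete frame with a valid
 * CRC is available in rx->payload (rx->len bytes). */
bool rx_feed(rx_ctx_t *rx, uint8_t byte)
{
    switch (rx->state) {
    case RX_WAIT_START:
        if (byte == START_BYTE) {
            rx->state = RX_LEN;          /* every other byte is ignored here */
        }
        break;
    case RX_LEN:
        if (byte > MAX_PAYLOAD) {        /* implausible length: resync */
            rx->bad_len++;
            rx->state = RX_WAIT_START;
        } else {
            rx->len = byte;
            rx->idx = 0;
            rx->state = (byte == 0) ? RX_CRC_HI : RX_PAYLOAD;
        }
        break;
    case RX_PAYLOAD:
        rx->payload[rx->idx++] = byte;
        if (rx->idx == rx->len) {
            rx->state = RX_CRC_HI;
        }
        break;
    case RX_CRC_HI:
        rx->rx_crc = (uint16_t)((uint16_t)byte << 8);
        rx->state = RX_CRC_LO;
        break;
    case RX_CRC_LO:
        rx->rx_crc = (uint16_t)(rx->rx_crc | byte);
        rx->state = RX_WAIT_START;       /* always hunt for the next start byte */
        if (rx->rx_crc == frame_crc(rx)) {
            return true;
        }
        rx->bad_crc++;
        break;
    }
    return false;
}
```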
These are a couple of situations that I can think of for now.
Thank you very much for the feedback. I'm happy you enjoyed the talk.
Thank you for the reply! One further question. I didn't really understand the Start Byte - Byte stuffing you mentioned. You had:
"Byte stuffing – Ex: Start byte 0x7E. if payload contains 0x7E, send: 0x7D 0x5E"
Could you elaborate a bit more? I initially thought you meant something like escaping the 0x7E in the payload, but the closer I look at it, the more it seems like you might be describing something different.
Thanks again.
Yes, I understand the confusion; I did not go into the details given the limited time I had. Here is a more detailed explanation:
0x7E: This is the start delimiter of your choice.
0x7D: This is your escape byte. It can be chosen arbitrarily; 0x7D is used in some well-known protocols.
Encoding rule: This is the part I did not explain in the talk. It is the rule used to transform any reserved byte (the start delimiter in our case) into a special sequence. The encoding rule here is "stuff 0x7D, then XOR the reserved byte with 0x20".
Ex:
If the payload contains 0x7E, it is transmitted as 0x7D 0x5E, where 0x7D is the escape byte and 0x5E (= 0x7E XOR 0x20) is the transformed byte.
If the payload contains 0x7D (the escape byte itself), it is transmitted as 0x7D 0x5D, because 0x7D XOR 0x20 = 0x5D.
General rule: transmit 0x7D followed by (reserved byte XOR 0x20).
The receiver knows the rule and decodes accordingly: when it sees 0x7D, the escape byte, it drops it and XORs the next byte with 0x20 to recover the original value.
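The encoding rule above can be written as a pair of small functions. This is a minimal C sketch assuming 0x7E as the start delimiter, 0x7D as the escape byte, and XOR 0x20 as the transform (the same values used by the HDLC-like framing of PPP in RFC 1662); the function names are hypothetical, and the functions process only the frame body, not the delimiter itself.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FLAG_BYTE  0x7Eu  /* start delimiter */
#define ESC_BYTE   0x7Du  /* escape byte */
#define ESC_XOR    0x20u  /* transform applied to escaped bytes */

/* Escapes every reserved byte in in[0..len). Returns false if out is too
 * small; the worst case needs 2 * len bytes. */
bool stuff_bytes(const uint8_t *in, size_t len,
                 uint8_t *out, size_t out_cap, size_t *out_len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        uint8_t b = in[i];
        if (b == FLAG_BYTE || b == ESC_BYTE) {
            if (n + 2 > out_cap) return false;
            out[n++] = ESC_BYTE;
            out[n++] = (uint8_t)(b ^ ESC_XOR);
        } else {
            if (n + 1 > out_cap) return false;
            out[n++] = b;
        }
    }
    *out_len = n;
    return true;
}

/* Reverses stuff_bytes. Returns false on a malformed body (an unescaped
 * delimiter, a trailing escape byte) or if out is too small. */
bool unstuff_bytes(const uint8_t *in, size_t len,
                   uint8_t *out, size_t out_cap, size_t *out_len)
{
    size_t n = 0;
    bool escaped = false;
    for (size_t i = 0; i < len; i++) {
        uint8_t b = in[i];
        if (escaped) {
            b = (uint8_t)(b ^ ESC_XOR);  /* restore the original byte */
            escaped = false;
        } else if (b == ESC_BYTE) {
            escaped = true;              /* drop the escape, transform the next byte */
            continue;
        } else if (b == FLAG_BYTE) {
            return false;                /* delimiter cannot appear inside a body */
        }
        if (n >= out_cap) return false;
        out[n++] = b;
    }
    if (escaped) return false;
    *out_len = n;
    return true;
}
```

For example, the body {0x01, 0x7E, 0x7D, 0x02} is stuffed to {0x01, 0x7D, 0x5E, 0x7D, 0x5D, 0x02}. Because 0x7E can no longer appear inside a stuffed body, a receiver that sees it can safely treat it as the start of a new frame.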
Thank you for that very clear response! Now I understand :)
It's a great question, thank you very much for bringing it up. I think it will be helpful for others who are looking into implementation details.
Thank you for explaining what is done if 0x7D was sent. I was wondering about that too.
Thanks for the presentation, really interesting. I am currently designing a custom serial protocol and I'd like your opinion on keep-alive messages: in a command/response system (for example on RS-485, which is half duplex), how do you handle keep-alive? If the master sends a keep-alive and the slave responds, how do you detect on the slave side that communication has failed? Would you use a timeout timer?
If the system has periods of 'idle' and 'activity', where during activity the master constantly sends get_status messages and the slave responds, is it still necessary to send keep-alives during the 'activity' phase? Or could any message act as a keep-alive indicator?
- How do we detect from the slave side that communication with the master has failed? My approach is to use a timeout, as you suggested. When the timeout expires, the slave transitions into a "No connection to master" state in the state machine. This may mean shutting down certain safety-critical operations, or whatever makes sense for the application.
- Could any message act as a keep-alive message? Yes. My recommendation is to reset the "network timeout" timer regardless of the type of message a receiver hears from the transmitter. I would still send heartbeat messages during the active phase, just to compensate for missing messages, but the receiver can reset the timeout timer on reception of any message. In your case, you could even use two different timeouts for the idle and active states, with the idle state having the longer timeout.
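The timeout approach in this answer can be sketched as a small link monitor on the slave side: any valid message from the master resets the timer, separate timeouts apply to the idle and active phases, and the monitor drops into a "no master" state when the timer expires. This is a hypothetical sketch assuming a free-running 32-bit millisecond tick supplied by the caller; the names and any timeout values you pass in are placeholders, not recommendations from the talk.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { LINK_CONNECTED, LINK_NO_MASTER } link_state_t;

typedef struct {
    link_state_t state;
    uint32_t last_rx_ms;
    bool active_phase;
    uint32_t idle_timeout_ms;    /* longer: master talks rarely when idle */
    uint32_t active_timeout_ms;  /* shorter: master polls constantly when active */
} link_monitor_t;

void link_init(link_monitor_t *m, uint32_t now_ms,
               uint32_t idle_timeout_ms, uint32_t active_timeout_ms)
{
    m->state = LINK_NO_MASTER;   /* assume no master until a message arrives */
    m->last_rx_ms = now_ms;
    m->active_phase = false;
    m->idle_timeout_ms = idle_timeout_ms;
    m->active_timeout_ms = active_timeout_ms;
}

/* Call for ANY valid message from the master (heartbeat, get_status, ...). */
void link_on_message(link_monitor_t *m, uint32_t now_ms)
{
    m->last_rx_ms = now_ms;
    m->state = LINK_CONNECTED;
}

void link_set_active(link_monitor_t *m, bool active)
{
    m->active_phase = active;
}

/* Call periodically. Returns true only on the transition into
 * LINK_NO_MASTER, so the application can run its fail-safe action once. */
bool link_poll(link_monitor_t *m, uint32_t now_ms)
{
    uint32_t timeout = m->active_phase ? m->active_timeout_ms : m->idle_timeout_ms;
    /* Unsigned subtraction stays correct across a 32-bit tick wraparound. */
    if (m->state == LINK_CONNECTED && (uint32_t)(now_ms - m->last_rx_ms) > timeout) {
        m->state = LINK_NO_MASTER;
        return true;
    }
    return false;
}
```

Returning true only on the transition keeps the fail-safe action (for example, shutting down a safety-critical output) from being re-triggered on every poll while the link stays down.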
Thank you for your presentation. While basic (as promised), I found it clear, and it highlighted the important considerations without muddying the waters with unnecessary complexity.
Thank you very much for your feedback! I'm glad you enjoyed the session.
Thank you Prabo. This provides a reasonable basic presentation of the thought patterns to design with, things I should have known earlier in my career but learned debugging on the bench. You mentioned testing to verify failure response, which is not often stressed in development when the expectation is just to make it work to meet schedules. I've worked in application areas where unexpected electrical behavior was the normal condition, so detection and error handling are very important and often not encountered until the product is out of the lab. You might consider adding buffer validation to your list of implementation checks.
Thank you very much. And yes, I have learned things the hard way as well. I'm a big advocate for testing. Including buffer validation is a great idea; these are the things that can go wrong in the field, as you said, with different electrical behaviours. Thanks!
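One way to read the buffer-validation suggestion is to check every length taken from the wire against the real buffer sizes before copying anything. The sketch below is a hypothetical example for a length-prefixed payload; the function name, the result codes, and the rule "declared length must fit both the destination buffer and the bytes actually received" are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum { COPY_OK, COPY_TRUNCATED, COPY_TOO_LARGE } copy_result_t;

/* Copies a payload whose length came from the packet header.
 *   declared_len: length field from the header (untrusted input)
 *   avail:        bytes actually received after the header
 *   dst_cap:      capacity of the destination buffer */
copy_result_t copy_payload(uint8_t *dst, size_t dst_cap,
                           const uint8_t *src, size_t avail,
                           size_t declared_len)
{
    if (declared_len > dst_cap) {
        return COPY_TOO_LARGE;    /* copying would overflow dst */
    }
    if (declared_len > avail) {
        return COPY_TRUNCATED;    /* header claims more than arrived */
    }
    memcpy(dst, src, declared_len);
    return COPY_OK;
}
```

Each non-OK result is a natural candidate for an error counter, so corrupted length fields caused by electrical noise show up in diagnostics instead of as memory corruption.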
I need to make two corrections to the video:
- At 1:29, I say “sending one byte at a time”; this should be “sending one bit at a time.”
- At 22:39, there is a typo on the slide. The step number should be “5” instead of “7.”
Sorry for any confusion caused by these mistakes. If you have any questions, please feel free to leave a comment and I’ll be happy to clarify.
I enjoyed your presentation. It's a very wide topic, and it is impossible to cover every single detail of inter-processor communication, but you did a fine job summarizing the high-level considerations. I have designed and implemented many serial communication protocols over the years. One thing I have found very helpful is not to rely only on a checksum or CRC to detect errors. For instance, in step 2 of your presentation you talk about an incrementing message ID. I have used this recently and found it very useful on the receiving end to verify that the ID is the expected value (1 greater than the previously received value), because the CRC will not catch a packet that the receiver missed entirely. The same goes for a message type field: what if the message type is invalid? This could happen if the buffer holding the outgoing message is corrupted before the CRC is generated and placed in the buffer. The receiver should detect that it is invalid. I like to log the number of occurrences of all errors and have a way to view the statistics on a running system.
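The message-ID check, message-type validation, and error statistics described in this comment can be sketched as follows. This assumes an 8-bit message ID that wraps from 255 to 0, a hypothetical set of valid message types, and hypothetical counter names; treating a repeated ID as a retransmitted duplicate is one possible policy, not necessarily the commenter's. A sender that reboots and restarts its IDs would also show up as a gap here, which is one reason to log these counts rather than act on a single occurrence.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical message types. */
enum { MSG_HEARTBEAT = 0x01, MSG_GET_STATUS = 0x02, MSG_STATUS = 0x03 };

typedef struct {
    bool have_last;
    uint8_t last_id;
    uint32_t rx_ok;
    uint32_t missed;       /* packets inferred lost from ID gaps */
    uint32_t duplicates;   /* same ID received twice, e.g. a retry */
    uint32_t bad_type;     /* CRC passed but the type is not recognized */
} rx_stats_t;

static bool type_is_valid(uint8_t type)
{
    return type == MSG_HEARTBEAT || type == MSG_GET_STATUS || type == MSG_STATUS;
}

/* Call after the CRC has passed. Returns false if the packet should be dropped. */
bool check_header(rx_stats_t *s, uint8_t msg_id, uint8_t type)
{
    if (!type_is_valid(type)) {
        s->bad_type++;
        return false;
    }
    if (s->have_last) {
        uint8_t expected = (uint8_t)(s->last_id + 1u);   /* wraps 255 -> 0 */
        if (msg_id == s->last_id) {
            s->duplicates++;
            return false;
        }
        if (msg_id != expected) {
            /* Number of skipped IDs, modulo 256. */
            s->missed += (uint8_t)(msg_id - expected);
        }
    }
    s->have_last = true;
    s->last_id = msg_id;
    s->rx_ok++;
    return true;
}
```

Exposing the `rx_stats_t` counters through a diagnostic command would give the "view the statistics on a running system" capability the comment describes.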