Analyzing Edge AI Applications with Arm's CMSIS Debugger for VS Code
As Artificial Intelligence continues to move from the cloud to the edge, developers face new challenges in deploying and debugging AI workloads on deeply embedded systems. Edge AI applications running on microcontrollers and using CMSIS or Zephyr RTOS must deliver real-time inference under tight memory and performance constraints. Diagnosing performance bottlenecks, timing anomalies, or inconsistent inference results becomes complex when traditional debugging tools offer limited visibility into system-level interactions.
This session demonstrates how the Arm CMSIS Debugger extension for Visual Studio Code enables developers to understand, optimize, and debug edge AI workloads more effectively. The talk introduces common issues in edge AI integration such as thread scheduling conflicts, inference latency, and memory exhaustion. It demonstrates how CMSIS Debugger provides fine-grained insight into runtime behavior through RTOS-aware debugging, thread inspection, and event tracing.
Attendees will see a live demonstration of debugging a CMSIS- and Zephyr-based AI application, showcasing how to identify timing issues, analyze performance, and correlate AI kernel execution with system events. By the end, participants will gain practical methods to transform AI debugging from a trial-and-error process into a structured, data-driven workflow, resulting in shorter development cycles and improved reliability at the edge.
What this presentation is about and why it matters
How do you debug an edge AI system when the model is not the only thing that can be wrong? Matthias Hertel walks through that tension with a concrete Arm Cortex-M and Ethos-U example, using the CMSIS Debugger in VS Code, RTT, SDS, and GitHub Actions to show how software state, hardware state, runtime timing, and captured data fit together. The emphasis is on workflow and observability, not model design. If you work where embedded firmware meets deployed ML, this session helps you see why the usual tools are often not enough.
Who will benefit the most from this presentation
- Embedded engineers working on Cortex-M systems with an NPU or accelerator, especially when failures appear in deployed firmware rather than in training.
- Firmware developers using VS Code and CMSIS-based projects who need better visibility into call stacks, registers, and device state.
- ML engineers or edge AI developers who need to correlate model behavior with the actual sensor data seen on target hardware.
- Test and CI engineers building hardware-in-the-loop regressions for inference pipelines or preprocessing changes.
What you need to know
A basic familiarity with embedded debug workflows and edge AI deployment will help. The talk assumes comfort with concepts like firmware flashing, call stacks, and running code on real hardware.
- Knowing what a microcontroller debug session looks like.
- Understanding that edge AI often includes preprocessing, inference, and post-processing on target.
- Some awareness of hardware-in-the-loop testing or regression workflows.
Glossary (terms used in this talk)
- RTOS (Real-Time Operating System): An operating system designed to provide predictable timing behavior for real-time applications.
- RTT (Real-Time Transfer): A debug and trace channel that can stream text or data between target hardware and a host tool while the program is running. It is often used for console-style output without relying on a conventional serial port.
- PMU (Performance Monitoring Unit): Hardware counters built into a processor to measure events such as cache misses, branch mispredicts, and pipeline stalls. They help distinguish between compute-bound, memory-bound, and control-flow-related issues.
- SDS (Synchronous Data Stream): A structured data capture and playback mechanism for embedded pipelines. It can record target-side data streams with metadata and replay them later for debugging, validation, or regression testing.
- Ethos-U: An Arm machine learning processor designed to accelerate inference alongside a Cortex-M CPU. It is typically used for edge AI workloads where compute is split between the CPU and an accelerator.
- SVD (System View Description): An XML-based description format that documents a device's peripheral registers and fields so debuggers and IDEs can present named registers and bitfields to developers.
- SWD (Serial Wire Debug): Arm's two-pin debug transport protocol used by debug probes to access target memory and debug interfaces without a full JTAG port.
- CMSIS (Common Microcontroller Software Interface Standard): An Arm specification and ecosystem (device packs, headers, APIs) for developing Cortex-M software and describing boards/devices to tools. The name was originally expanded as Cortex Microcontroller Software Interface Standard.
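To make the record-and-replay idea behind SDS more concrete, here is a minimal host-side sketch in Python. It is an illustration only: the record layout (a timestamp, a length, then the payload), the `preprocess` stand-in, and all function names are assumptions made for this example, not the actual SDS file format or API. The sketch writes captured frames to a binary stream, replays them through a processing step, and reports the timestamps where the output drifts from a stored baseline, which is roughly what a regression job would check.

```python
import io
import struct

# Hypothetical record layout (NOT the actual SDS file format):
# uint32 timestamp in ms, uint32 payload length in bytes, then the payload.
HEADER = struct.Struct("<II")


def write_records(stream, records):
    """Append (timestamp_ms, payload_bytes) records to a binary stream."""
    for timestamp_ms, payload in records:
        stream.write(HEADER.pack(timestamp_ms, len(payload)))
        stream.write(payload)


def read_records(stream):
    """Yield (timestamp_ms, payload_bytes) records until the stream ends."""
    while True:
        header = stream.read(HEADER.size)
        if len(header) < HEADER.size:
            return
        timestamp_ms, length = HEADER.unpack(header)
        yield timestamp_ms, stream.read(length)


def preprocess(payload):
    """Stand-in for on-target preprocessing: mean of little-endian int16 samples."""
    samples = struct.unpack(f"<{len(payload) // 2}h", payload)
    return sum(samples) / len(samples)


def replay(stream, expected, tolerance=1e-6):
    """Replay a capture and return the timestamps whose output left the baseline."""
    failures = []
    for (timestamp_ms, payload), want in zip(read_records(stream), expected):
        if abs(preprocess(payload) - want) > tolerance:
            failures.append(timestamp_ms)
    return failures


# Simulated capture: two frames of int16 sensor samples.
capture = io.BytesIO()
write_records(capture, [
    (0, struct.pack("<4h", 1, 2, 3, 4)),
    (10, struct.pack("<4h", 10, 20, 30, 40)),
])
capture.seek(0)

baseline = [2.5, 25.0]  # outputs recorded from a known-good build
failures = replay(capture, baseline)
print(failures)  # an empty list means no regression was detected
```

Keeping the timestamp next to each payload is what lets a failing frame be traced back to a specific moment in the capture. In a real workflow the capture would come from the target rather than from a simulated stream.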
Toolbox (mentioned in this talk)
- GitHub: A web-based platform for hosting Git repositories and collaborating on software development.
- Visual Studio Code: A lightweight source code editor with broad language support and extension-based integration. It is often used for embedded development and can pair well with containerized workflows.
- Raspberry Pi: A family of low-cost single-board computers used for education, prototyping, and embedded computing.
- TensorFlow Lite: A lightweight runtime and model format for deploying machine learning models on mobile and embedded devices. It is commonly used as an intermediate step before target-specific compilation or conversion.
- GitHub Actions: A workflow automation service for building, testing, and deploying software from a GitHub repository. It is commonly used to run CI/CD jobs and generate build artifacts automatically.
- Arm CMSIS Debugger: A Visual Studio Code debugger extension for CMSIS-based embedded projects. It integrates flashing, target configuration, call stack inspection, and peripheral register views.
Final thoughts
Practical and workflow-oriented, this session turns edge AI debugging into something you can reason about with the right layers of visibility. You get a useful mental model for separating software faults, timing problems, and data issues, plus a clearer sense of how target data can flow into testing and CI. It will be especially helpful for embedded developers, edge AI engineers, and anyone asked to make inference behavior measurable on real hardware. The value is in seeing the system whole, not just the model.
This overview is AI-generated from the session transcript.







