Rohit Gupta
Rohit Gupta brings deep expertise in cloud and enterprise computing systems, with a focus on advancing generative AI and next-generation AI technologies that deliver value for both business and society. Based in San Jose, California, he has worked with several technology companies, from semiconductor vendors to hyperscalers, delivering data and compute infrastructure that accelerates AI innovation. He is passionate about building scalable, efficient systems that unlock the potential of AI foundation models, democratize access to advanced computing, reduce environmental impact, and drive positive societal outcomes.
From LLMs to SLMs in Embedded World
Status: Coming up in April 2026!
Large Language Models (LLMs) have demonstrated remarkable capabilities in reasoning, multimodal understanding, and natural-language interaction, but their computational and memory demands place them far beyond the reach of resource-constrained embedded platforms. As embedded systems increasingly require on-device intelligence, supporting tasks such as voice interfaces, anomaly detection, semantic understanding, and autonomous decision-making, there is a growing need to scale language models down without sacrificing essential accuracy, responsiveness, or safety.
This presentation traces the evolution from cloud-scale LLMs to optimized Small Language Models (SLMs) tailored for embedded systems. It examines the algorithmic, architectural, and co-design innovations that make this transition possible, including model compression, quantization, structured pruning, distillation, sparsity-aware compute, and memory-efficient attention mechanisms. It also highlights system-level considerations such as real-time inference, energy constraints, thermal limits, secure deployment, and domain-specific customization.
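To make one of these techniques concrete, the sketch below shows post-training symmetric int8 weight quantization, the simplest form of the quantization mentioned above. This is an illustrative example only, not the method covered in the session; the tensor shapes, per-tensor scale choice, and function names are assumptions made for the sketch.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor quantization: map float weights to int8 in [-127, 127].

    Illustrative sketch only; real deployments often use per-channel scales
    and calibration data.
    """
    scale = np.max(np.abs(weights)) / 127.0  # one scale for the whole tensor
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from int8 values and the scale."""
    return q.astype(np.float32) * scale

# Hypothetical weight tensor standing in for one layer of a small model.
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=(256, 256)).astype(np.float32)

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# int8 storage is 4x smaller than fp32, at the cost of bounded rounding error.
print(f"fp32 bytes: {w.nbytes}, int8 bytes: {q.nbytes}")
print(f"max reconstruction error: {np.max(np.abs(w - w_hat)):.2e}")
```

The 4x memory reduction is what makes such models plausible on microcontroller-class devices; the techniques listed above (pruning, distillation, sparsity-aware compute) compose with quantization to push the footprint down further.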
The session provides insight into how SLMs enable embedded devices to run meaningful language and reasoning workloads locally: reducing latency, improving privacy, increasing reliability in disconnected environments, and enabling new classes of intelligent edge applications. Representative use cases across automotive, industrial automation, smart IoT, and wearable devices illustrate the emerging potential of deploying compact language models directly at the edge.
