Cerebras

Engineering Manager, Inference ML Runtime

📍 Location
Toronto, ON
⏰ Job Type
Full-time
📅 Posted
May 20, 2026

Job Description

Company Overview

Cerebras Systems builds the world's largest AI chip, 56 times larger than the largest GPUs. Our novel wafer‑scale architecture provides the AI compute power of dozens of GPUs on a single chip, with the programming simplicity of a single device. This approach allows Cerebras to deliver industry‑leading training and inference speeds and empowers machine learning users to effortlessly run large‑scale ML applications, without the hassle of managing hundreds of GPUs or TPUs.

About the Role

The Inference ML Engineering team at Cerebras builds the runtime, APIs, and systems that power the fastest generative AI inference platform in the world.

As an Engineering Manager, Inference ML Runtime, you will lead a team responsible for designing and scaling the systems that enable seamless execution of state‑of‑the‑art AI models on Cerebras hardware. You will operate at the intersection of machine learning, distributed systems, and high‑performance runtime en...
