NVIDIA Gruppe

Senior Software Engineer, AI Inference Systems

📍 Location
Toronto, ON
⏰ Job Type
Full-time
📅 Posted
June 04, 2026

Job Description

We are seeking highly skilled and motivated software engineers to join us and build AI inference systems that serve large-scale models with extreme efficiency. You’ll architect and implement high-performance inference stacks, optimize GPU kernels and compilers, drive industry benchmarks, and scale workloads across multi-GPU, multi-node, and multi-cloud environments. You’ll collaborate across inference, compiler, scheduling, and performance teams to push the frontier of accelerated computing for AI.

What you’ll be doing:

Contribute features to vLLM that let the newest models take advantage of the latest NVIDIA GPU hardware features; profile and optimize the inference framework (vLLM) with methods such as speculative decoding, data/tensor/expert/pipeline parallelism, and prefill-decode disaggregation.

Develop, optimize, and benchmark GPU kernels (hand‑tuned and compiler‑generated) using techniques such as fusion, autotuning, and memory/layout optimization; build and extend high‑level DSLs and c...
