Cohere

Software Engineer (GPU Infrastructure, High Performance Computing)

📍 Location
Toronto, ON
⏰ Job Type
Full-time
📅 Posted
June 19, 2026

Job Description

Requirements

- Deep expertise in ML/HPC infrastructure: Experience with GPU/TPU clusters, distributed training frameworks (JAX, PyTorch, TensorFlow), and high-performance computing (HPC) environments
- Kubernetes at scale: Proven ability to deploy, manage, and troubleshoot cloud-native Kubernetes clusters for AI workloads
- Strong programming skills: Proficiency in Python (for ML tooling) and Go (for systems engineering), with a preference for open-source contributions over reinventing solutions
- Low-level systems knowledge: Familiarity with Linux internals, RDMA networking, and performance optimization for ML workloads
- Research collaboration experience: A track record of working closely with AI researchers or ML engineers to solve infrastructure challenges
- Self-directed problem-solving: The ability to identify bottlenecks, propose solutions, and drive impact in a fast-paced environment

If some of the above doesn’t line up perfectly with your experience, we still encourage you to apply!

What the job ...
