Job Description
Machine Learning Infrastructure Specialist
Position Summary
As an ML Infrastructure Specialist focused on systems and scalable AI infrastructure, you will build and improve efficient, reusable systems to train, deploy, monitor, and serve large-scale machine learning models, including large language models (LLMs). Working at the intersection of applied research and production systems, you will collaborate with Vector’s AI Engineering team members, researchers, and industry partners to bring advanced AI capabilities into real-world use. You will contribute to initiatives that strengthen the software and systems supporting state-of-the-art AI development and deployment, owning well-scoped projects end to end.
Key Responsibilities
Design and implement distributed systems for scalable ML training, inference, and serving on multi‑GPU/multi‑node environments, with a focus on large foundation models.
Configure and maintain LLM inference systems using modern serving ...