Drive innovation in MLOps and AI infrastructure at a leading organization. Collaborate with cross-functional teams to deliver scalable and secure AI platforms. Enhance your expertise in cutting-edge AI technologies.
Senior DevOps Engineer
in Information Technology, Permanent
Job Description
Overview
- Lead the design and operation of MLOps pipelines supporting AI/ML lifecycle processes.
- Collaborate with cross-functional teams to deliver scalable and secure AI platforms.
- Develop and maintain CI/CD pipelines for AI/ML services and infrastructure.
- Automate infrastructure provisioning using Infrastructure as Code tools like Terraform.
- Ensure reliability, scalability, and observability of AI platforms and workloads.
- Implement security, compliance, and governance requirements for AI systems.
- Support production workloads for Generative AI systems and LLM-based services.
- Document standards and best practices for MLOps and AI infrastructure.
Key Responsibilities & Duties
- Design and operate scalable MLOps pipelines for model lifecycle automation.
- Develop cloud-native infrastructure on AWS using Kubernetes and containerized workloads.
- Implement model versioning, artifact management, and experiment tracking.
- Ensure system health monitoring, model performance tracking, and drift detection.
- Collaborate with AI/ML engineers to standardize deployment patterns.
- Participate in incident response and continuous improvement initiatives.
- Optimize costs for compute-intensive workloads and ensure scalability.
- Document reference architectures and best practices for AI infrastructure.
Job Requirements
- A Bachelor of Science degree in a relevant field is required.
- 15+ years of experience in DevOps, SRE, or Platform Engineering roles.
- Proficiency in AWS cloud services, Kubernetes, and CI/CD pipeline development.
- Hands-on experience with Terraform and scripting/programming languages like Python.
- Experience with MLOps platforms, model registries, and experiment tracking.
- Exposure to Generative AI workloads and LLM-based services in production.
- Strong communication skills and ability to work across teams effectively.
- AWS certifications are preferred but not mandatory.