Platform Engineer

in Information Technology
  • Austin, Texas
  • Salary: $140,000.00 - $250,000.00
Permanent

Job Detail

  • Experience Level: Senior Level
  • Degree Type: Bachelor of Science (BS)
  • Employment: Full Time
  • Working Type: On Site
  • Job Reference: 0000017898
  • Salary Type: Annually
  • Industry: Defense and Space
  • Selling Points

    Lead impactful projects at a cutting-edge technology company. Enhance your expertise in GPU-accelerated Kubernetes platforms and ML infrastructure. Collaborate with innovative teams to drive technological advancements.

Job Description

Overview

  • Lead the operation and optimization of GPU-accelerated Kubernetes platforms for high-performance machine learning workloads.
  • Deploy and manage Kubernetes clusters on bare-metal infrastructure with hybrid cloud capabilities for scalability.
  • Design and maintain CI/CD pipelines to ensure efficient software deployment across diverse environments.
  • Develop observability systems for real-time monitoring and performance optimization of infrastructure.
  • Collaborate with development teams to enhance productivity through streamlined tooling and processes.
  • Implement infrastructure-as-code practices using modern tools to ensure consistency and scalability.
  • Manage core infrastructure components, including networking, storage, and system configurations.
  • Ensure compliance with security standards and harden systems against vulnerabilities.

Key Responsibilities & Duties

  • Deploy and manage Kubernetes clusters on bare-metal infrastructure supporting NVIDIA GPUs.
  • Optimize GPU clusters for machine learning training workloads, ensuring reliability and performance.
  • Design and operate CI/CD pipelines for automated build and deployment processes.
  • Develop observability stacks for real-time system health monitoring and alerting.
  • Collaborate with development teams to enhance tooling for deployment efficiency.
  • Implement infrastructure-as-code practices using Terraform, Helm, and Ansible.
  • Manage networking, storage, and system configurations for high-performance clusters.
  • Ensure systems meet defense-grade security and compliance standards.

Job Requirements

  • Bachelor's degree in Computer Science or related field.
  • 3-5 years of experience in platform engineering or DevOps roles.
  • Proficiency in Python and Bash for automation and tooling.
  • Deep knowledge of Kubernetes administration and GPU environments.
  • Experience with CI/CD pipelines and observability tools.
  • Strong expertise in Linux systems and infrastructure-as-code practices.
  • Ability to manage complex systems and ensure security compliance.
  • Preferred: experience with ML orchestration tools and build toolchains.
