Shape the future of ML evaluation by defining metrics and building scalable systems. Collaborate remotely with top-tier professionals in a dynamic environment. Drive impactful decisions influencing product reliability and model quality.
Applied Machine Learning Engineer – Evaluation
Information Technology | Permanent
Job Description
Overview
- Lead efforts to define and measure model quality, ensuring robust evaluation frameworks and trustworthy feedback loops.
- Collaborate across ML engineering, product quality, and tooling to build scalable evaluation infrastructure.
- Design and implement evaluation metrics, ground truth definitions, and automated pipelines for ML systems.
- Develop tools and dashboards to visualize evaluation results, trends, and regressions.
- Partner with cross-functional teams to ensure evaluations align with real-world applications and are actionable.
- Continuously improve workflows to enhance signal quality and accelerate iteration cycles.
- Work in a fully remote environment with strong collaboration across teams.
Key Responsibilities & Duties
- Define evaluation metrics and ground truth for ML systems, ensuring clarity and reproducibility.
- Build automated evaluation pipelines and integrate results into centralized systems.
- Develop human-in-the-loop tools for error analysis and failure categorization.
- Create internal dashboards to monitor evaluation results and trends over time.
- Collaborate with ML, product, and research teams to align evaluations with real-world use cases.
- Optimize evaluation workflows for efficiency and reliability.
- Influence evaluation standards across teams to ensure consistency and quality.
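To make the responsibilities above concrete, here is a minimal sketch of an automated evaluation pipeline: a metric definition, a labeled ground-truth set, and an aggregated report suitable for feeding a dashboard. All names here (`exact_match`, `evaluate`, the toy model and dataset) are illustrative assumptions, not part of the role's actual stack.

```python
from dataclasses import dataclass

@dataclass
class Example:
    input_text: str
    ground_truth: str  # reference label, ideally agreed with annotators

def exact_match(prediction: str, reference: str) -> float:
    # Normalize so trivial formatting differences don't count as errors.
    return float(prediction.strip().lower() == reference.strip().lower())

def evaluate(model, dataset):
    """Run the model over a labeled dataset and aggregate per-example scores."""
    scores = [exact_match(model(ex.input_text), ex.ground_truth)
              for ex in dataset]
    return {
        "n": len(scores),
        "accuracy": sum(scores) / len(scores) if scores else 0.0,
        # Keep per-example scores so a dashboard can surface regressions,
        # not just a single headline number.
        "per_example": scores,
    }

# Toy model and dataset purely for demonstration.
dataset = [Example("2+2", "4"), Example("capital of France", "Paris")]
model = lambda text: {"2+2": "4"}.get(text, "unknown")

report = evaluate(model, dataset)
```

In a production setting the report dictionary would be written to a centralized results store and versioned alongside the model and dataset, so that trends and regressions can be tracked over time.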
Job Requirements
- Bachelor of Science degree in a relevant field.
- Minimum of 4 years of professional experience in applied ML or ML engineering.
- Hands-on experience designing and implementing ML evaluation frameworks in production.
- Proficiency in metrics design, ground truth construction, and evaluation pipelines.
- Experience with tools like Weights & Biases for experiment tracking.
- Ability to build internal tools and dashboards for ML workflows.
- Strong engineering mindset focused on clarity, velocity, and correctness.
- Preferred experience in NLP, CV, or multimodal model evaluation.