Shape the future of ML evaluation by defining metrics and building scalable systems. Collaborate remotely with top-tier professionals in a dynamic environment. Drive impactful decisions influencing product reliability and model quality.
Applied Machine Learning Engineer – Evaluation
Information Technology | Permanent
Job Description
Overview
- Lead efforts to define and measure model quality, ensuring robust evaluation frameworks and trustworthy feedback loops.
- Collaborate across ML engineering, product quality, and tooling to build scalable evaluation infrastructure.
- Design and implement evaluation metrics, ground truth definitions, and automated pipelines for ML systems.
- Develop tools and dashboards to visualize evaluation results, trends, and regressions.
- Partner with cross-functional teams to ensure evaluations align with real-world applications and are actionable.
- Continuously improve workflows to enhance signal quality and accelerate iteration cycles.
- Work in a fully remote environment with strong collaboration across teams.
Key Responsibilities & Duties
- Define evaluation metrics and ground truth for ML systems, ensuring clarity and reproducibility.
- Build automated evaluation pipelines and integrate results into centralized systems.
- Develop human-in-the-loop tools for error analysis and failure categorization.
- Create internal dashboards to monitor evaluation results and trends over time.
- Collaborate with ML, product, and research teams to align evaluations with real-world use cases.
- Optimize evaluation workflows for efficiency and reliability.
- Influence evaluation standards across teams to ensure consistency and quality.
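To make the responsibilities above concrete, here is a minimal sketch of an automated evaluation pipeline: a metric definition, a labeled ground-truth set, and an aggregated report suitable for feeding a dashboard. All names here (`exact_match`, `evaluate`, the toy model and dataset) are illustrative assumptions, not part of the role's actual stack.

```python
from dataclasses import dataclass

@dataclass
class Example:
    input_text: str
    ground_truth: str  # reference label, ideally agreed with annotators

def exact_match(prediction: str, reference: str) -> float:
    # Normalize so trivial formatting differences don't count as errors.
    return float(prediction.strip().lower() == reference.strip().lower())

def evaluate(model, dataset):
    """Run the model over a labeled dataset and aggregate per-example scores."""
    scores = [exact_match(model(ex.input_text), ex.ground_truth)
              for ex in dataset]
    return {
        "n": len(scores),
        "accuracy": sum(scores) / len(scores) if scores else 0.0,
        # Keep per-example scores so a dashboard can surface regressions,
        # not just a single headline number.
        "per_example": scores,
    }

# Toy model and dataset purely for demonstration.
dataset = [Example("2+2", "4"), Example("capital of France", "Paris")]
model = lambda text: {"2+2": "4"}.get(text, "unknown")

report = evaluate(model, dataset)
```

In a production setting the report dictionary would be written to a centralized results store and versioned alongside the model and dataset, so that trends and regressions can be tracked over time.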
Job Requirements
- Bachelor of Science degree in a relevant field.
- Minimum of 4 years of professional experience in applied ML or ML engineering.
- Hands-on experience designing and implementing ML evaluation frameworks in production.
- Proficiency in metrics design, ground truth construction, and evaluation pipelines.
- Experience with tools like Weights & Biases for experiment tracking.
- Ability to build internal tools and dashboards for ML workflows.
- Strong engineering mindset focused on clarity, velocity, and correctness.
- Preferred experience in NLP, CV, or multimodal model evaluation.