
ML Engineer
- Groningen, Groningen
- Permanent
- Full-time
- Develop Evaluation Frameworks: Architect end-to-end pipelines that combine automated metrics (BLEU, ROUGE, BERTScore, custom error rates) with human‑in‑the‑loop assessments (a minimal metric-scoring sketch follows this list).
- Quantitative Analysis: Implement statistical and machine‑learning methods to measure LLM performance—accuracy, relevance, bias, fairness, robustness—and analyze trends over releases.
- Qualitative Assessment: Design annotation guidelines, recruit/train reviewers, and lead structured reviews of model outputs for coherence, factuality and style.
- Prompt Engineering & Optimization: Use tools like DSPy to craft, test and automatically refine prompts; run and analyze A/B experiments to maximize response quality and task success (an A/B analysis sketch follows this list).
- Custom Tooling: Build reusable Python libraries and dashboards for monitoring LLM behaviour, automating evaluation workflows and integrating with our CI/CD pipelines.
- Collaboration & Reporting: Partner with research, product and MLOps teams to translate user needs into evaluation requirements; present findings, drive data‑backed decisions and iterate on model improvements.
- Best Practices & Ethics: Champion documentation, version control, testing standards and fairness audits. Stay up to date on responsible AI guidelines and industry benchmarks.
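
A minimal sketch of the kind of automated-metric pass the evaluation pipelines involve, assuming the Hugging Face `evaluate` library (`pip install evaluate bert-score rouge_score`) and hypothetical model outputs; a production pipeline would add batching, custom error rates and a hand-off to human reviewers.

```python
# Automated metric pass over a batch of LLM outputs (sketch, hypothetical data).
import evaluate

predictions = ["The invoice was paid on 3 March."]          # model outputs (hypothetical)
references  = [["The invoice was settled on March 3rd."]]   # gold answers (hypothetical)

bleu = evaluate.load("bleu")
rouge = evaluate.load("rouge")
bertscore = evaluate.load("bertscore")

results = {
    "bleu": bleu.compute(predictions=predictions, references=references)["bleu"],
    "rougeL": rouge.compute(predictions=predictions,
                            references=[r[0] for r in references])["rougeL"],
    "bertscore_f1": bertscore.compute(predictions=predictions,
                                      references=[r[0] for r in references],
                                      lang="en")["f1"][0],
}
print(results)  # scores feed dashboards and release-over-release trend analysis
```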
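And a sketch of the A/B readout for two prompt variants, assuming per-response task success has already been judged upstream; the counts are hypothetical, and DSPy's optimizers would handle the prompt refinement itself, so this only shows the downstream significance check.

```python
# Significance check for an A/B prompt experiment (sketch, hypothetical counts).
from scipy.stats import chi2_contingency

# [successes, failures] per prompt variant
variant_a = [412, 88]   # baseline prompt (hypothetical)
variant_b = [447, 53]   # refined prompt (hypothetical)

chi2, p_value, _, _ = chi2_contingency([variant_a, variant_b])
rate_a = variant_a[0] / sum(variant_a)
rate_b = variant_b[0] / sum(variant_b)
print(f"success A={rate_a:.1%}  B={rate_b:.1%}  p={p_value:.4f}")
# Ship variant B only if the lift is both practically and statistically meaningful.
```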
- Education: MSc or higher in CS, Engineering, Data Science or a related field.
- AI/ML Expertise: deep knowledge of ML algorithms, with a focus on NLP and transformers.
- GenAI Expertise: 1+ years of experience evaluating, optimizing and productionizing GenAI products
- Software/Cloud: 3+ years of production experience with Python; experience with Docker, Kubernetes and FastAPI; hands-on experience with a major cloud provider (GCP/Azure/AWS)
- MLOps: familiarity with CI/CD for models, monitoring, versioning and pipelines (e.g. Kubeflow)
- Communication: business-fluent English; able to translate complex concepts for diverse stakeholders.
- Knowledge of fairness, bias detection and mitigation techniques for generative models.
- Experience with open‑source (self-hosted) LLMs (e.g. LLaMA, Qwen)
- Experience with LLM tracing and prompt management platforms (e.g. Langfuse)