Source: https://maven.com/parlance-labs/evals
If you run into questions like these while working with AI:
- How do you test applications whose results are probabilistic and require subjective evaluation?
- If I change a prompt, how can I be sure nothing else breaks?
- Where should engineering effort be directed? Is it necessary to test everything?
- What should you do if there is no data and no users yet? Where do you start?
- Which metrics should be tracked? What tools should be used? Which models should be selected?
- Is it possible to automate testing and evaluation? If so, how can you trust it?
then this course is for you.
This is a practical course for engineers and technical product managers. It is ideal for those who know how to program or “enjoy coding by intuition.”
What to Expect
Expect intensive practice: exercises and hands-on work with code and data. We meet twice a week for four weeks and offer generous office hours. All sessions are recorded and available for asynchronous viewing.
Course Content
- Basics and lifecycle of LLM application evaluation
- Systematic error analysis
- Building effective metrics and automated evaluation pipelines
- Collaborative practices and alignment of evaluation criteria
- Testing strategies for different architectures (RAG, pipelines, multimodal systems, etc.)
- Monitoring in production and continuous quality evaluation
- Organizing an effective human-in-the-loop review process
- Cost optimization and query routing
Learning Outcomes
- Master the best tools for finding, diagnosing, and prioritizing errors in AI.
- Learn how to use synthetic data before you have users and how to use real data as effectively as possible.
- Build a “data flywheel” that ensures your AI improves over time.
- Learn to automate parts of the evaluation process and to trust the results.
- Be able to customize AI to your preferences and requirements.
- Avoid common mistakes, drawing on experience from more than 35 AI projects.
- Gain practical experience through end-to-end exercises, code, and analysis of real cases.