AI Evals For Engineers & PMs

Source: https://maven.com/parlance-labs/evals

This course is for you if you run into questions like these while working with AI:

  1. How do you test applications whose outputs are probabilistic and require subjective evaluation?
  2. If I change a prompt, how can I be sure nothing else breaks?
  3. Where should engineering effort be directed? Is it necessary to test everything?
  4. What do you do if there is no data and no users yet? Where do you start?
  5. Which metrics should be tracked? Which tools should be used? Which models should be selected?
  6. Is it possible to automate testing and evaluation? If so, how can you trust it?

This is a practical course for engineers and technical product managers. It is ideal for those who know how to program or “enjoy coding by intuition.”

What to Expect

Expect intensive practice: exercises and hands-on work with code and data. We meet twice a week for four weeks and offer generous office hours. All sessions are recorded and available for asynchronous viewing.

Course Content

  1. Basics and lifecycle of LLM application evaluation
  2. Systematic error analysis
  3. Building effective metrics and automated evaluation pipelines
  4. Collaborative practices and alignment of evaluation criteria
  5. Testing strategies for different architectures (RAG, pipelines, multimodal systems, etc.)
  6. Monitoring in production and continuous quality evaluation
  7. Organizing an effective human-in-the-loop review process
  8. Cost optimization and query routing

Learning Outcomes

  1. Master the best tools for finding, diagnosing, and prioritizing errors in AI.
  2. Learn how to use synthetic data before you have users, and how to use real data as effectively as possible.
  3. Build a “data flywheel” that ensures your AI improves over time.
  4. Learn to automate parts of the evaluation process and to trust the automation.
  5. Be able to customize AI to your preferences and requirements.
  6. Avoid common mistakes, drawing on experience from more than 35 AI projects.
  7. Gain practical experience through end-to-end exercises, code, and analysis of real cases.

