Evaluations
Evaluation is essential for measuring the performance and reliability of your agents. Orbit provides tools for both automated and manual evaluation.
Evaluation Methods
- Automated tests: Use test suites to validate agent outputs and behaviors.
- Human-in-the-loop: Collect user feedback and ratings for continuous improvement.
- Metrics and logging: Track accuracy, latency, and usage statistics.
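The human-in-the-loop bullet above can be sketched as a minimal feedback store. This is a hypothetical illustration, not an Orbit API: `FeedbackLog` and its methods are names invented here to show one way to collect and aggregate user ratings.

```python
from dataclasses import dataclass, field
from statistics import mean

@dataclass
class FeedbackLog:
    """Minimal in-memory store for human ratings of agent responses.

    Hypothetical helper for illustration; a real deployment would
    persist ratings to a database or Orbit's own tooling.
    """
    ratings: list = field(default_factory=list)

    def record(self, response: str, rating: int) -> None:
        # Clamp input to a 1-5 scale before storing.
        if not 1 <= rating <= 5:
            raise ValueError("rating must be between 1 and 5")
        self.ratings.append({"response": response, "rating": rating})

    def average_rating(self) -> float:
        # Aggregate score used to track quality over time.
        return mean(r["rating"] for r in self.ratings)

log = FeedbackLog()
log.record("It is sunny today.", 5)
log.record("I don't know.", 2)
print(log.average_rating())  # 3.5
```

Tracking an average like this over releases gives a simple continuous-improvement signal to pair with automated tests.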
Example: Automated Test
def test_agent_response():
    # `my_agent` is assumed to be an Orbit agent defined elsewhere.
    response = my_agent.run("What is the weather today?")
    assert "weather" in response.lower()
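The metrics-and-logging method can be sketched as a thin wrapper that times each agent call. This is a sketch under assumptions: `EchoAgent` is a stand-in for a real agent, and the only assumed interface is a `.run(prompt)` method as used in the test above.

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("agent-eval")

def timed_run(agent, prompt: str):
    """Run the agent and log latency alongside the prompt."""
    start = time.perf_counter()
    response = agent.run(prompt)
    latency_ms = (time.perf_counter() - start) * 1000
    # Structured log line; feed these into your metrics backend.
    logger.info("prompt=%r latency_ms=%.1f", prompt, latency_ms)
    return response, latency_ms

class EchoAgent:
    # Stand-in for a real agent so the sketch is runnable.
    def run(self, prompt: str) -> str:
        return f"Stub answer to: {prompt}"

response, latency_ms = timed_run(EchoAgent(), "What is the weather today?")
```

The same wrapper can accumulate accuracy and usage counts; the key design choice is measuring latency at the call boundary so every evaluation path is covered.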
Best Practices
- Integrate evaluation into your CI/CD pipeline.
- Use Orbit's observability tools to monitor agent quality.
- Regularly review and update evaluation criteria.
See the Orbit documentation for evaluation templates and advanced techniques.