A/B testing
#793
Replies: 1 comment
@boxabirds have you got an API in mind?
Problem: output varies wildly with both the LLM and the prompt, so small changes to either can result in quite significant differences.
Solution: the ability to specify a list of prompt variations and a list of different LLMs to try.
You could use Optuna for efficient evaluation (cf. DSPy), along with Argilla for the human evaluation.
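To make the request more concrete, here is a rough sketch of what such an API could look like. Everything in it is an assumption for illustration only: `ab_test`, `Result`, `ModelFn` and `ScoreFn` are hypothetical names, not part of any existing library, and the two "models" are plain string functions standing in for real LLM calls. The length-based score is a toy placeholder for a real metric or a human rating.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List

# Hypothetical types, invented for this sketch.
ModelFn = Callable[[str], str]    # takes a prompt, returns the model output
ScoreFn = Callable[[str], float]  # takes an output, returns a score (higher is better)


@dataclass
class Result:
    prompt: str
    model: str
    output: str
    score: float


def ab_test(prompts: List[str], models: Dict[str, ModelFn], score: ScoreFn) -> List[Result]:
    """Run every prompt variation against every model, then rank by score."""
    results = []
    for prompt, (name, run) in product(prompts, models.items()):
        output = run(prompt)
        results.append(Result(prompt, name, output, score(output)))
    # Highest score first
    return sorted(results, key=lambda r: r.score, reverse=True)


# Stand-in "models" so the sketch runs offline; real ones would call an LLM.
models = {
    "upper": lambda p: p.upper(),
    "echo-twice": lambda p: f"{p} {p}",
}
prompts = ["Summarise X", "Summarise X in one sentence"]

# Toy score: output length. A real score would be a metric or a human rating.
ranking = ab_test(prompts, models, score=lambda out: float(len(out)))
best = ranking[0]
print(best.model, "|", best.prompt, "|", best.score)
```

This exhaustive loop tries every prompt/LLM pair, which is fine for short lists. For larger grids, the loop could be swapped for a search driven by Optuna, for example by picking the prompt and the model with `trial.suggest_categorical` inside an objective function, and the score could come from human ratings instead of an automatic metric.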