Add llm specification for automated scoring #1
base: main
Conversation
@@ -0,0 +1,7 @@
[[examples]]
answer = "Reversing word order and reversing characters both have O(n) complexity, but character reversal requires more operations per word, making it slightly less efficient in practice."
points = "{ \"R1\": 1, \"R2\": 1 }"
Is there any way to simplify this for the task designer, i.e. use TOML syntax instead of a string? Escaping things like this is a bit tedious.
Also: are these weights? Because the points below are 0.5 and 0.5 for R1 and R2.
Sure, I will try to change it to TOML syntax. Also yes, you are right: it should be 0.5 per rubric for the examples. These are points, not weights. Thanks!
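A minimal sketch of how the entry could look with native TOML syntax and the corrected 0.5 points, based on the discussion above (the inline-table form for `points` is an assumption, not the final spec):

```toml
[[examples]]
answer = "Reversing word order and reversing characters both have O(n) complexity, but character reversal requires more operations per word, making it slightly less efficient in practice."
# Inline table instead of an escaped JSON string; 0.5 points per rubric
# as corrected above (sketch, not the merged spec).
points = { R1 = 0.5, R2 = 0.5 }
```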
@@ -0,0 +1,9 @@
[[rubrics]]
id = "R1"
Is it even necessary to have rubric IDs, or could they be parsed in the order they appear?
The IDs help the model understand the examples. Without them, it would be harder to tell which rubric was correctly solved for each example, especially when there are more than two rubrics.
This is also useful for avoiding duplication of the rubrics: the model has the rubric text, but might not reliably infer the ordering from a prompt as unstructured and fairly large as ours. Also, as an educator it's easier to give them a clear ID like "asymptotically_equivalent" instead of an abstract "R2", which then makes it easier to create examples for the LLM. I will change the R1, R2 IDs to something along those lines, as sketched below.
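A hedged sketch of a rubric with a descriptive ID and a matching example entry (the `asymptotically_equivalent` name comes from the comment above; the exact fields and the inline-table `points` form are assumptions):

```toml
[[rubrics]]
# Descriptive ID the examples can reference, instead of an abstract "R2".
id = "asymptotically_equivalent"

[[examples]]
# Illustrative answer text, not taken from the PR.
answer = "Both approaches are O(n), so they are asymptotically equivalent."
# Points keyed by the descriptive rubric ID (sketch).
points = { asymptotically_equivalent = 0.5 }
```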
This PR contains an example exercise 04 with LLM-specific flags for automated scoring.