Added possibility to add range of values as a grading parameter #5

Open
wants to merge 1 commit into master

Conversation

KarlLundengaard
Collaborator

No description provided.

@RabidSheep55
Contributor

This looks good. Two things I think should be done before we merge, however:

  • Updating the docs file to reflect the additional functionality
  • I don't think inferring that an answer should be compared against a range should be done by checking its type. Since it sounds more like a parameter to me, could we keep answer as a number and have something like a range array in the params instead? This could also allow an asymmetric accepted range around an answer.

Something like this:

{
	"answer": x,
	"response": x,
	"params": {
		"range": [min, max]	
	}
}

Let me know what you think about that, we should discuss this further.

@KarlLundengaard
Collaborator Author

  • I did not want to update the documentation before I had tested that the functionality worked (i.e. the input field on the website accepts a list and doesn't throw an error immediately).
  • Using a parameter was my first thought as well but I wanted to try this solution for two reasons:
    • Ease of use: with this solution no change to the UI is needed and the author is not required to use grading parameters, whereas using a parameter would require one or the other.
    • IsSimilar always checks whether the response is a number in a given range; it only allows for different ways to define that range. This would be adding a third way, and if we give the min and max of the range directly, whatever is put in the "answer" will be ignored (except that the web UI forces you to add something). Thus simply replacing the answer with the range makes more sense (to me).

To me the concept of a "symmetric range" only makes sense if either it is defined with respect to some center point (as with atol or rtol) or we expect different things to happen depending on what part of the range the response is in, e.g. the feedback specifies both that the response was in the range and whether it was above or below the expected value. Currently there is no functionality like that as far as I am aware (are the returned real_diff and allowed_diff used for anything other than ease of debugging at the moment?).

I agree that checking the type is ugly; this was mostly me being lazy and wanting to quickly make something that I could test. I think a better solution is to always have the response be a string that is parsed; then it would be relatively easy to adjust the function so you could accept closed ranges, e.g. [a,b] (i.e. a <= x <= b), open ranges, e.g. (a,b) (i.e. a < x < b), or semi-closed ranges, e.g. (a,b] and [a,b).
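
As a purely illustrative sketch (the bracket-notation handling and the parse_range helper below are assumptions for discussion, not part of the current implementation), parsing such range strings could look roughly like this:

import re

def parse_range(answer_string):
    # Hypothetical helper: parse interval notation such as "[1, 2.5)" into
    # (lower, upper, include_lower, include_upper).
    match = re.fullmatch(r"\s*([\[\(])([^,]+),([^\]\)]+)([\]\)])\s*", answer_string)
    if match is None:
        raise ValueError(f"Could not parse range: {answer_string!r}")
    open_bracket, lower, upper, close_bracket = match.groups()
    return float(lower), float(upper), open_bracket == "[", close_bracket == "]"

def response_in_range(response, answer_string):
    # True if the response lies in the range, respecting open/closed bounds.
    lower, upper, include_lower, include_upper = parse_range(answer_string)
    above = response >= lower if include_lower else response > lower
    below = response <= upper if include_upper else response < upper
    return above and below

For example, response_in_range(3.0, "[1, 5)") evaluates to True, while response_in_range(5.0, "[1, 5)") does not.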

@RabidSheep55
Contributor

OK, I understand your point.

Currently, the IsSimilar function has been configured to only work with NUMBER or TEXT response areas. The alpha site doesn't allow inputting arrays or lists as the answer to those fields. In general, for ease of use and readability, I think we should try to keep the answer and response fields as the same data types (although this is open to discussion).

For asymmetric ranges, a use case I was thinking of was questions which deal with orders of magnitude, for example (e.g. the answer is 30, but any number between 10 and 99 is acceptable). Since dealing only with symmetric ranges doesn't add any functionality to the actual comparison, this is something that could be done by the front-end (the teacher provides the answer as a range, which gets converted in the background to an answer plus atol for the request made to the evaluation function).

As for parametrising the valid range for the function, it doesn't need to be easily human-readable. Instead, we want requests to be unambiguous and as easy as possible for the code to handle. This makes me sceptical about the string parsing you mentioned for the inputs. These functions are essentially only called by our front-end, which can reshape the user input as much as it wants before making requests. In your example, the user interface could offer exactly what you described here:

closed ranges, e.g. [a,b] (i.e. a <= x <= b), open ranges, e.g. (a,b) (i.e. a < x < b), or semi-closed ranges, e.g. (a,b], [a,b)

But then the actual web request made to the function could look like this:

{
	"answer": x,
	"response": x,
	"params": {
		"range": [min, max],
		"allow_min": <true or false> (open or closed range),
		"allow_max": <true or false>	
	}
}
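
To make this concrete, here is a minimal sketch of the evaluation side, assuming exactly these field names are used (the evaluation_function signature is also an assumption, not the function's actual interface):

def evaluation_function(response, answer, params):
    # Sketch: if an explicit range is supplied, grade against it while
    # respecting the open/closed boundary flags; otherwise fall back to
    # a plain answer + atol comparison.
    if "range" in params:
        lower, upper = params["range"]
        allow_min = params.get("allow_min", True)
        allow_max = params.get("allow_max", True)
        above = response >= lower if allow_min else response > lower
        below = response <= upper if allow_max else response < upper
        is_correct = above and below
    else:
        is_correct = abs(response - answer) <= params.get("atol", 0)
    return {"is_correct": is_correct}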

@KarlLundengaard
Collaborator Author

Preemptive "Sorry for the wall of text. Yes, I know that I'm annoying".

In general I am not sure it is a good idea to have the answer and response be the same type. This is only convenient if we want to do a direct comparison between the answer and the response (e.g. as in SymbolicEqual), but the actual use case is that for each problem we want to specify one or more statements about the response that should be true in order for the response to be considered correct. A simple example is the case where there are many possible correct answers and to solve the problem the response only needs to give one; in that case the "answer" should be a list (or set or whatever data structure is most suitable) while the response is a single element (number, expression etc.), which would be a different type. It would probably be more correct, w.r.t. how a problem author is likely to reason, to think in terms of "criteria" (that should be satisfied for a response to be correct) and "response"; this is why we have gotten several requests for having multiple inputs for one answer. This also feels like a more general point of discussion that should not happen in the comments on a particular evaluation function.

With regards to IsSimilar:

  • Why is IsSimilar enabled with TEXT responses? The current implementation does not take this case into account at all and will fail on any input (including numbers).
  • Using the [a,b], (a,b), (a,b], [a,b) notation is not about making it human-readable; it is about making it fast and easy to define the answer (since "3 key presses + whatever is needed to write the two numbers" is probably faster than "selecting the right fields in the UI to fill in + whatever is needed to write the two numbers").
  • There can also be cases where getting exactly what the author wrote in the input box is useful. For example, a reasonable default for IsSimilar is to assume that the answer is given with the desired number of significant digits and to compute atol from that, but if the answer is immediately parsed to a floating point number this default is not possible.
  • I am not sure I understand what functionality or possible functionalities you are proposing. If we want to minimize the content of the web request, shouldn't it always be reduced to an answer + atol? In other words, the web request is always of the form
{
	"answer": <number>,
	"response": <number>,
	"params": {
		"atol": <number>,
		"include_boundary": [<true or false>,<true or false>]
	}
}

where atol and answer are either given directly or computed by the frontend (atol = answer*rtol, or answer = (min+max)/2 and atol = (max-min)/2), and the comparisons are handled by the evaluation function; a rough sketch of this reduction is included at the end of this comment. This would probably mean that there would be more code in the frontend than in the backend; is that what we want? How does that work with contributions to the public eval functions from individuals without access to the private frontend repos?

  • If we allow different input types and different kinds of processing on the frontend, doesn't that increase ambiguity for the evaluation functions? If the frontend admin can configure things any way they like, the evaluation functions need to be prepared for all possible inputs, right? If the answer, response and params are always the same type (e.g. string), then the eval function creator at least knows that they will need to sanitize and parse the string before processing the input. While doing some processing on the front-end is a good idea (to save server resources), I would even argue that doing everything on the frontend would be nice, since it would make it easy for users to do their exercises even when they do not have an internet connection. Separating different kinds of functionality for the evaluation functions and keeping them in two different locations seems like asking for trouble. One solution would be to allow the eval function creator to specify what input types should be possible and what preprocessing should be done (e.g. in the config.json file or similar), or to have one evaluation.py and one preprocessing.ts (I am guessing that the typescript files are what define the frontend behaviour; I have not looked at that part of the code in detail) so that everything is in one place. But this is also starting to sound like a more general discussion so I am gonna stop now.
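
As a purely illustrative sketch of the reduction mentioned above (all names here are assumptions for discussion, not existing code in either repo): the frontend would compute answer and atol from whatever the author entered, and the evaluation function would only handle the final comparison.

def range_to_request(min_value, max_value, include_boundary=(True, True)):
    # Hypothetical frontend step: reduce a [min, max] range to answer + atol.
    answer = (min_value + max_value) / 2
    atol = (max_value - min_value) / 2
    return {"answer": answer, "params": {"atol": atol, "include_boundary": list(include_boundary)}}

def is_correct(response, request):
    # Hypothetical evaluation step: compare using answer, atol and the boundary flags.
    answer = request["answer"]
    atol = request["params"]["atol"]
    include_lower, include_upper = request["params"]["include_boundary"]
    lower, upper = answer - atol, answer + atol
    above = response >= lower if include_lower else response > lower
    below = response <= upper if include_upper else response < upper
    return above and below

For example, range_to_request(10, 99) gives answer = 54.5 and atol = 44.5, so any response between 10 and 99 (inclusive) is accepted.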

@RabidSheep55
Contributor

Yeah I agree. I think the issue revolves around the fact that as soon as we go into more advanced functionality, the frontend response-area components become more tightly linked to the evaluation function they're connected to: additional functionality means both additional frontend UI components and additional evaluation function parameters and logic. The first concept of an evaluation function we came up with last year would hold both the comparison logic and the code for the frontend component itself (similar to what you mentioned with preprocessing.ts). However, we abandoned this due to a heap of implementation hurdles. I'd be very interested in picking those discussions up again as we start cementing the specification for evaluation functions.

Support for multiple correct answers is a feature that would benefit every evaluation function, so I've been working on incorporating that functionality within the base layer.

As for just supplying strings in response and answer which require extensive parsing on the side of the evaluation function, I really don't think it should be done too often:

  • If the UI or frontend is responsible for those complex inputs, then syntax or logic errors can immediately be reported to the user.
  • In the same vein as this fail-fast approach, performing schema validation on function inputs makes more sense if the data is already structured (not just one long string). Schemas were implemented in the older function versions; they'll be added to the new ones too.
  • By having inputs to eval functions laid out unambiguously and explicitly, we allow functions to be easily called by a wider range of frontend components. For example, with IsSimilar we could have multiple ways of specifying ranges client-side: sliders, specific text boxes, or even the string syntax you propose ([a,b], (a,b), (a,b], [a,b)). All of these can be converted and reshaped into the explicit data structure before being sent to the function. This enhances the re-usability of the function, which I think answers your question:

    How does that work with contributions to the public eval functions from individuals without access to the private frontend repos?

  • I agree with your point about the importance of the balance of where computation is carried out (frontend, backend or evaluation function). Ultimately, though, I think we shouldn't shy away from creating bespoke UI components and logic client-side, since it does improve the teacher experience (with more intuitive and evident usage).

I personally don't mind keeping this on GitHub; we could move to a Trello ticket or Teams if you prefer. I do think the discussion is very valuable, though.

@KarlLundengaard
Collaborator Author

I agree that the discussion is valuable. It does not matter to me if it is on GitHub or Trello or elsewhere, but if we keep it on GitHub I think there are more appropriate places to have it, for example in connection with the eval function template rather than in one of the eval functions. Perhaps you can create an appropriate discussion somewhere with a short summary of previous discussions related to communication between the backend and frontend, and what should be done where? Or if this already exists, point me to it.
