Description
Here is my best shot at Expectations 0.7.2, much of it in CUE:
Changed in 0.7.1: Removed false from the values for with_margin
Changed in 0.7.2: Restricted expectation keys based on problem.type
// The registry maps submission name patterns to expectations.
// A submission whose name matches a pattern must satisfy its expectations.
// (In particular, a submission can match several patterns, and thus
// needs to satisfy several expectations.)
// The semantics of "matching a pattern" are not defined here
// (possibilities: regexen, prefix, globbing).
#registry: close({[string]: #root_expectations})
// Problem-wide constants
// ======================
// Set by the judging environment
time_limit: >0
// Set in problem.yaml; these are problem.timing.tle_margin and problem.timing.ac_margin
tle_margin: >1
ac_margin: >1
// Testcase Result
// ===============
// The result of running a submission on a testcase
#result: {
testcase_name!: #name
time!: >=0
if time < time_limit {
// When the submission terminates within the time limit
// its exit status is either a success or a failure:
terminated_successfully!: bool
if terminated_successfully {
// The output of successfully terminated submissions is
// validated by the output validator, the resulting values are:
output_validated!: bool
message?: string
score?: >=0
}
}
}
// The verdicts that can occur in the expectations framework are:
#verdict: "AC" | "WA" | "RTE" | "TLE"
// Testcase Verdict
// ================
//
// The verdict of (a submission's execution on) a single testcase is
// defined by the testcase result, that is, by its time, termination
// status, and output validation, as follows.
#verdict_for_result: {
// The verdict for the _result
_result: #result
let t = _result.time
if t >= time_limit {"TLE"}
if t < time_limit {
if !_result.terminated_successfully {"RTE"}
if _result.terminated_successfully {
if !_result.output_validated {"WA"}
if _result.output_validated {"AC"}
}
}
}
// Aggregate verdict
// =================
// For a linearly ordered sequence of verdicts, the aggregate verdict is
// * the first occurrence of "WA", "TLE", or "RTE" if one exists
// * "AC" if the list is empty or only contains "AC"
// Expectations
// ============
#root_expectations: {
// Set expectations for all testcases
#expectation | #range | #abbreviation
// And/or set them for testcases matching a pattern (such as testgroups)
[=~"^(sample|secret)"]: #expectation | #range | #abbreviation
}
// Often-used expectations are specified in terms of abbreviations
#abbreviation: "accepted" | "wrong answer" | "runtime exception" | "time limit exceeded" | "does not terminate" | "not accepted"
// Scoring problems can set the range
#range: number | ordered_tuple
ordered_tuple: tuple=[number, number & >=tuple[0]]
// In general, we can set fine-grained expectations in terms of which verdicts, timing,
// messages, scores are allowed and disallowed for a set of results R
#expectation: {
permitted_testcase_verdicts?: [...#verdict] // only these testcase verdicts may appear
required_testcase_verdicts?: [...#verdict] // at least one of these testcase verdicts must appear
message?: string // this judgemessage must appear
with_margin?: true // Let m = max(r.time) over all r in R. Then m < time_limit / ac_margin or m >= time_limit * tle_margin must hold
// pass-fail problems only:
permitted_aggregate_verdict?: [...#verdict] // the aggregate verdict must be in this list
// scoring problems only:
permitted_testcase_scores?: #range // all testcase scores must be in range
permitted_aggregate_score?: #range // the aggregate score must be in range
}
// Useful abbreviations
// ====================
//
// Each abbreviation is a shorthand for a common #expectation struct, as follows:
_expectation_for_abbreviation: {
_abbreviation: #abbreviation
if _abbreviation == "accepted" {
permitted_testcase_verdicts: ["AC"]
with_margin: true
}
if _abbreviation == "wrong answer" {
permitted_testcase_verdicts: ["AC", "WA"]
required_testcase_verdicts: ["WA"]
}
if _abbreviation == "runtime exception" {
permitted_testcase_verdicts: ["AC", "RTE"]
required_testcase_verdicts: ["RTE"]
}
if _abbreviation == "time limit exceeded" {
permitted_testcase_verdicts: ["AC", "TLE"]
required_testcase_verdicts: ["TLE"]
with_margin: true
}
if _abbreviation == "does not terminate" {
permitted_testcase_verdicts: ["AC", "RTE", "TLE"]
required_testcase_verdicts: ["RTE", "TLE"]
}
if _abbreviation == "not accepted" {
required_testcase_verdicts: ["RTE", "TLE", "WA"]
}
} & #expectation
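To make the semantics of the abbreviations concrete, here is a minimal Python sketch (not part of the specification) of how a checker might test a set of testcase results against the permitted/required verdict lists and the with_margin rule. The names Result, verdict_for, and satisfies are hypothetical, the time limit and margins are made-up values, and the reading of with_margin follows the comment in #expectation literally.

```python
from dataclasses import dataclass

# Hypothetical problem-wide constants (illustrative values only).
TIME_LIMIT = 2.0
AC_MARGIN = 2.0
TLE_MARGIN = 1.5


@dataclass
class Result:
    testcase_name: str
    time: float
    terminated_successfully: bool = False
    output_validated: bool = False


def verdict_for(r: Result) -> str:
    """Mirror of #verdict_for_result."""
    if r.time >= TIME_LIMIT:
        return "TLE"
    if not r.terminated_successfully:
        return "RTE"
    return "AC" if r.output_validated else "WA"


# A subset of the table in _expectation_for_abbreviation.
ABBREVIATIONS = {
    "accepted": {
        "permitted_testcase_verdicts": ["AC"],
        "with_margin": True,
    },
    "wrong answer": {
        "permitted_testcase_verdicts": ["AC", "WA"],
        "required_testcase_verdicts": ["WA"],
    },
    "time limit exceeded": {
        "permitted_testcase_verdicts": ["AC", "TLE"],
        "required_testcase_verdicts": ["TLE"],
        "with_margin": True,
    },
    "not accepted": {
        "required_testcase_verdicts": ["RTE", "TLE", "WA"],
    },
}


def satisfies(results, expectation) -> bool:
    """Check a list of results R against one expectation (assumed semantics)."""
    verdicts = [verdict_for(r) for r in results]
    permitted = expectation.get("permitted_testcase_verdicts")
    if permitted is not None and any(v not in permitted for v in verdicts):
        return False  # a verdict outside the permitted list appeared
    required = expectation.get("required_testcase_verdicts")
    if required is not None and not any(v in required for v in verdicts):
        return False  # none of the required verdicts appeared
    if expectation.get("with_margin") and results:
        m = max(r.time for r in results)
        if not (m < TIME_LIMIT / AC_MARGIN or m >= TIME_LIMIT * TLE_MARGIN):
            return False  # slowest run falls inside the margin window
    return True


fast = [Result("sample/1", 0.4, True, True), Result("secret/1", 0.9, True, True)]
slowish = [Result("sample/1", 0.4, True, True), Result("secret/1", 1.6, True, True)]
print(satisfies(fast, ABBREVIATIONS["accepted"]))     # True
print(satisfies(slowish, ABBREVIATIONS["accepted"]))  # False: 1.6 is not below 2.0 / 2.0
```

In this sketch a submission that is correct everywhere but too close to the time limit fails "accepted", which is the effect with_margin is meant to have.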
What changed?
The most visible change is that the #expectation struct is now a lot larger. Here it is again:
#expectation: {
permitted_testcase_verdicts?: [...#verdict] // only these testcase verdicts may appear
required_testcase_verdicts?: [...#verdict] // at least one of these testcase verdicts must appear
permitted_aggregate_verdict?: [...#verdict] // the aggregate verdict must be in this list
message?: string // this judgemessage must appear
permitted_testcase_scores?: #range // all testcase scores must be in range
permitted_aggregate_score?: #range // the aggregate score must be in range
with_margin?: true // Let m = max(r.time) over all r in R. Then m < time_limit / ac_margin or m >= time_limit * tle_margin must hold
}
- The key names distinguish between testcase_verdicts and aggregate_verdict. Symmetrically, the score field has split into two (and this will do a lot of good!).
- The new boolean with_margin has also been added. In total, 3 new fields, and several renamings. Have a look.
- Also, #result got a new field:
#result: {
testcase_name!: #name
...
}
We need this for sorting verdicts (so we can compute aggregate verdicts), and for applying the scoring aggregation rules (which depend on testdata.yaml file contents and their full path names). I added the ! to make it explicitly mandatory.
- I didn’t bother to specify the aggregate verdict in CUE; it’s not clearer than the prose expression I wrote down.
- Note that none of the abbreviations use aggregated verdicts or scores. (But they do use with_margin.)
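The prose definition of the aggregate verdict can also be sketched outside CUE. The following Python is an illustration only: aggregate_verdict is a hypothetical name, and ordering the testcases by sorting on testcase_name is an assumption based on the remark above that the name is needed for sorting verdicts.

```python
def aggregate_verdict(verdicts_by_testcase: dict[str, str]) -> str:
    """Return the first non-AC verdict in testcase-name order, or "AC" if none exists."""
    for name in sorted(verdicts_by_testcase):
        verdict = verdicts_by_testcase[name]
        if verdict in ("WA", "TLE", "RTE"):
            return verdict
    # Empty input, or every testcase was accepted.
    return "AC"


print(aggregate_verdict({"secret/2": "TLE", "secret/1": "WA", "sample/1": "AC"}))  # WA
print(aggregate_verdict({}))  # AC
```

Plain lexicographic sorting is the simplest choice for the sketch; a real judge may order testcases differently.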