Expectations 0.7.2 #137
My first take on this: I'm assuming that a lot of the naming is still placeholder? I'm not commenting on naming here.
I'm confused by this? These three can all be set in problem.yaml, the last two will always have a value (since there is a default for when they are not explicitly set).
Why does the name have to be part of the result? Wouldn't it feel more natural to have a testcase (which has a name) map to a result? Also, given that the name is here, why is it optional? A testcase always has a name.
It would feel more natural to me if
This is not how aggregate verdicts are currently defined for scoring problems.
A testcase or group could also start with
I don't think we need these. Well... we obviously don't need them since it's "just" syntactic sugar, but I don't think we should have them.
...or value (which is what the code actually says)
This feels very strange. When and why would you use this?
Please elaborate on the lot of good it will do. This seems to be a point I'm not getting above.
Shouldn't they though? |
In my understanding these are not something you can write in
All of this is only for the purposes of precisely specifying how to go from the
Can you explain how they are defined? I'm not quite sure.
I definitely want them. It's super useful to specify
No, I don't think so. Generally, |
Strongly disagree. (Again, it feels as if we are rehashing conversations from three months ago.) I very much want to be able to do this. (A quadratic time submission that passes the partially_accepted/thore.py:
sample: accepted
group1: accepted
group2: accepted
group3: time limit exceeded
group4: wrong answer

For an IOI-style problem, I have 20 of these submissions. I need the abbreviations. |
No, I would say that it's quite unclear. We need to improve that: It's clearly not simply "first error" though.
Ok, yeah, we probably (i.e. definitely) need some kind of shorthand. Do you feel the same about all of them?
It allows you to say things about the submissions (rather than the testcases). I.e. this submission will be judged as X. It can also allow for short circuiting. Seems to me to be the most basic thing we would want to express. |
Yes, I agree that we need some kind of shorthand. |
I agree that shorthand and verifying aggregate verdict/score are both very nice to have. |
It does not have to (the same effect would indeed be achieved with a higher-order datatype such as a map), it’s merely the conceptually most minimal way to give sense to “the first one” for a set of

#result_for_testcase: [#name]: #result

and then define

In short
No, I want fine-grained targeting of testdata to always start with
Assume you have a two-testcase scoring problem (with

Consider now:

thore.py:
secret:
permitted_testcase_scores: [0, 50]

This means that

On the other hand, consider:

ragnar.cpp:
secret:
permitted_aggregate_score: [0, 75]

This means that Ragnar’s submission gets at most 75 points in total over both

In principle, it makes sense to write:

my_heuristic.cpp:
permitted_aggregate_score: [0,90] // total score is max 90
permitted_testcase_scores: [0,10] // individual score on each testcase is max 10

Since there are many different ways of defining scoring problems (in particular, in the Swedish Olympics, using both subtasks and a bespoke scoring output validator for the last subtask), it is useful to support both concepts. So: |
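To make the distinction concrete, here is a small sketch of how the two kinds of score expectations could be checked. The field names `permitted_testcase_scores` and `permitted_aggregate_score` come from the proposal above, but the checking logic, and in particular using `sum()` as the aggregation, are my own simplifying assumptions (real aggregation depends on `testdata.yaml`):

```python
# Illustrative sketch, not normative: per-testcase vs aggregate score ranges.
# Using sum() as the aggregate is an assumption for this example.
def within(value, bounds):
    low, high = bounds
    return low <= value <= high

def check(testcase_scores,
          permitted_testcase_scores=None,
          permitted_aggregate_score=None):
    # Every individual testcase score must lie in the permitted range.
    if permitted_testcase_scores is not None and \
            not all(within(s, permitted_testcase_scores) for s in testcase_scores):
        return False
    # The total over all testcases must lie in the permitted range.
    if permitted_aggregate_score is not None and \
            not within(sum(testcase_scores), permitted_aggregate_score):
        return False
    return True

# Thore's submission: at most 50 on each individual testcase
print(check([0, 50], permitted_testcase_scores=(0, 50)))   # True
# Ragnar's submission: at most 75 in total over both testcases
print(check([50, 50], permitted_aggregate_score=(0, 75)))  # False
```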
The definition of how verdicts are aggregated in the specification is this:
I think this is close, except for
Is there anything else wrong with it? My suggestion to clarify both issues in 0.7.1 above is

// Aggregate verdict
// =================
// For a linearly ordered sequence of verdicts, the aggregate verdict is
// * the first occurrence of "WA", "TLE", or "RTE" if it exists
// * "AC" if the list is empty or only contains "AC" The linear ordering would be given by lexicographic ordering of |
I definitely agree that |
@thorehusfeldt
I would say that it should be
It should say that the result must be as if you ran the test cases in lexicographic order and take the first non-accepted verdict. Rather than rewording the spec to carefully avoid saying which order things should run, I would suggest adding something explicitly allowing "optimizations" (or whatever it should be called?) as long as the externally observable behavior is as specified.
What do you mean? The definition quoted clearly is restricted to pass-fail problems, and just removing "For pass-fail problems" gives us an incorrect description of how the verdict of scoring problems is determined. I agree that the description should not be limited to pass-fail problems.
I still don't understand the purpose of this text? The canonical description of how verdicts are aggregated should be in the spec, and we shouldn't have a separate copy (or much worse, a different definition) in the spec of the expectations. If this text is just for completeness while discussing the proposal, I guess it's fine. I would rather just refer to the correct section of the spec though. (Currently https://www.kattis.com/problem-package-format/spec/2023-07-draft.html#grading). As pointed out above, the description is in fact not correct for scoring problems. Why is the ordering mentioned outside of the comment, and not in it? |
I guess then it is an error to specify |
What is your guess based on? I don't think it should be.
Yes, that sounds reasonable. |
Why was it removed? And what does that mean exactly? |
The spec doesn't currently define verdicts for scoring problems. All it says is
So either we don't allow
I think that at the end of the meeting we agreed that if you want to exclude a submission from the default Since |
Agreed. I think we should define them.
Ok, I will look at that.
I guess I don't understand what this structure we are defining is. How can one say that |
No, we agreed (and I feel "agreed" is a bit strong) that all submissions must be in a subdirectory of

Some examples:

accepted:
used_for_timing: true
accepted/slow.py:
used_for_timing: false
# The above is ok, it sets it on the directory (this would typically be implied), and overrides it on the submission
---
other/sol1.py:
used_for_timing: true
# This is ok, it only sets the value once
---
accepted/*.py:
used_for_timing: false
accepted/fast.py:
used_for_timing: true
# This is NOT ok, it sets the value twice on the same level (the submission)

In any case... There were originally two main use cases we wanted to cover w.r.t.
Later we also touched on:
...but I'm not sure if we actually think anybody would ever want this, or if it was more theoretical. We also touched on "the same but opposite", i.e. that

There are 2 kinds of margins (lower and upper) and two kinds of exceptions (exclude and include), so the remaining very theoretical case would be:
Now, cases 1 & 2 are something that our users want to do often. There is clearly a need for this. Case 3 has never been requested, but I could maybe see how it could be useful? Case 4 has never been requested and I don't see how it would be useful or even reasonable. All these cases, but certainly 1 & 2, can be done quite nicely with the semantics shown in the examples below. What happens if we instead switch to the "you have to move it to a different directory" line of thinking (but keep the "all submissions must be in a subdirectory of I don't think this last option is better, mostly because the feedback in the past has been lukewarm, but I certainly don't think it's terrible. |
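The override rule from the earlier examples ("more path segments override fewer; two different values at the same depth is an error") can be sketched as follows. This is my own reading of the examples; the prefix-matching behavior and all names here are assumptions, not spec:

```python
# Illustrative sketch of the proposed override rule for keys like
# used_for_timing: deeper (more path segments) wins; a conflict at the
# same depth is an error. Matching a key against a path or any of its
# directory prefixes is my assumption based on the examples above.
import fnmatch

def _matches(pattern, path):
    # "accepted" should cover "accepted/slow.py", so try every prefix.
    parts = path.split("/")
    return any(
        fnmatch.fnmatch("/".join(parts[:i]), pattern)
        for i in range(1, len(parts) + 1)
    )

def resolve(rules, path):
    by_depth = {}
    for pattern, value in rules.items():
        if _matches(pattern, path):
            by_depth.setdefault(pattern.count("/"), set()).add(value)
    if not by_depth:
        return None  # no rule applies
    values = by_depth[max(by_depth)]
    if len(values) > 1:
        raise ValueError("conflicting values at the same level for " + path)
    return values.pop()

rules = {"accepted": True, "accepted/slow.py": False}
print(resolve(rules, "accepted/slow.py"))  # False: submission overrides dir
print(resolve(rules, "accepted/fast.py"))  # True: only the directory rule

# The NOT-ok example: both keys have the same number of path segments,
# so resolve({"accepted/*.py": False, "accepted/fast.py": True},
#            "accepted/fast.py") raises ValueError.
```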
Regarding abbreviations again. It seems we all agree that we should have some kind of shorthand. Below are some kinds of things I would like to be able to do with a shorthand. I'll avoid using the currently defined abbreviations in the examples, they could clearly cover several of these needs, but not all. Don't get too bogged down on the syntax, but of course, feel free to comment on it.

other/sol1.py: AC
# gets a verdict of AC (i.e. aggregated verdict)
other/sol2.py: WA
# gets a verdict of WA
other/sol3.py: 50
# gets a score of 50 (i.e. aggregated score)
other/sol3.py: 80-100
# gets a score in the 80-100 range
wrong_answer/sol4.py:
sample: AC
# gets an aggregated verdict for the data group `sample`
other/sol5.py:
secret/group1: 20
secret/group2: 20
secret/group3: 20
secret/group4: 0-40
# gets a score of 20 on groups 1-3 and in the 0-40 range on group 4
accepted/kinda_slow.py: do_not_use_for_timing
# Don't use for timing. This one does not feel important, it's barely shorter than the non-shorthand version.

The most common and simple things that you can't do (well) with the older spec, and that we wanted to add support for with the expectations framework, were the following, in order of how commonly it's been requested:
These things should be easy to do. With the examples above they would be. Thoughts? |
Quoting from #135, but I think it's still relevant
But this only works because there are no other files that also match those globs (and it's a bit hacky). What you really would want to be able to say is

thore.py: wrong answer
thore.py:
sample: accepted

But that is not legal YAML. There is a workaround that almost always works (obviously you can create cases where it doesn't, but they are not reasonable), although it still feels hacky:

thore.py: wrong answer
thore.py*:
sample: accepted

The same kind of things apply to the shorthands I suggested just above. E.g.:

thore.py: 80
thore.py:
used_for_timing: false

But the longhand is not much longer so I think it's fine:

thore.py:
score: 80
used_for_timing: false

In fact, I just realise that with a long enough file name (including in this case, so not very long at all) the "longhand" is shorter. :) |
Minor comments
|
Back to timing

I have found two ways to express fine-grained timing-related constraints as part of the expectations framework and laid them out in proposals 0.5 and 0.6. There is a less satisfying (to me) third way that I explain in the current issue (called 0.7). To summarise:
I have not seen a well-defined proposal for an overridable boolean flag that would make sense of the following:

accepted: accepted
accepted/thore:
with_margin: true
accepted/*.py:
with_margin: false
accepted/thore.py:
secret: { with_margin: true }

The only suggestion so far is Arnar’s, which maybe is the following:
For this to make sense, it becomes important, e.g., whether the Hence my decision. The default is |
Here are Fredrik’s desires expressed in the actual syntax of 0.7, with some comments.

other/sol1.py:
permitted_testcase_verdicts: ["AC"]
# gets a verdict of AC (i.e. aggregated verdict)
# Thore: I would just create `other_ac` (as below) and put sol1.py there
other/sol2.py: wrong answer
# gets a verdict of WA
other/sol3.py: 50
# gets a score of 50 (i.e. aggregated score)
# Thore: Note that in 0.7 this means the running time is less than timelimit (without margin).
other/sol3.py: [80, 100]
# gets a score in the 80-100 range
wrong_answer/sol4.py:
sample: accepted
# gets an aggregated verdict for the data group `sample`
other/sol5.py:
secret/group1: 20
secret/group2: 20
secret/group3: 20
secret/group4: [0, 40]
# gets a score of 20 on groups 1-3 and in the 0-40 range on group 4
other_ac/kinda_slow.py:
permitted_testcase_verdicts: ["AC"]
# Don't use for timing. This one does not feel important, it's barely shorter than the non-shorthand version. |
No. The most important thing for me and Johan, as expressed clearly in August, is to be able to express required and permitted testcase verdicts per testgroup. This is the entire core of the proposal, and has been so since August, in each of its iterations (of which we now have the 7th.) The minimal expressiveness we want is this:

partially_accepted/quadratic.py:
sample:
permitted_testgroup_verdicts: ["AC"]
secret/group1:
permitted_testgroup_verdicts: ["AC"]
secret/group2:
permitted_testgroup_verdicts: ["TLE", "AC"]
required_testgroup_verdicts: ["TLE"]
secret/group3:
permitted_testgroup_verdicts: ["TLE", "AC", "WA"]
required_testgroup_verdicts: ["WA"]

(Everything else, in particular the abbreviations, is syntactic sugar.) In particular, many authors of scoring problems think of subtasks as “this should kill the brute-force submission with

The decision of whether subtask one should get 15 or 21 points is different from that. Points-per-subtask are defined in

Johan was really admirably clear about this in August, and his explanation was exactly what got me to completely change the expectations proposal to be about expressing required and permitted testcase verdicts per testgroup (instead of aggregate numeral scores or aggregate verdicts). This is the right way of doing it, as agreed on in August. Every single proposal about expectations since then – three months of exchanges – has been grounded in that mindset. |
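A minimal checker for this core expressiveness could look like the sketch below. The field names come from the example above; the semantics are my reading of it (every observed verdict in a group must be permitted, and every required verdict must occur at least once in the group), so treat this as an assumption, not the spec:

```python
# Illustrative checker for per-testgroup permitted/required verdicts.
# Semantics assumed: observed verdicts must be a subset of permitted,
# and each required verdict must appear at least once in the group.
def satisfies(expectations, observed):
    for group, exp in expectations.items():
        verdicts = set(observed.get(group, []))
        permitted = exp.get("permitted_testgroup_verdicts")
        if permitted is not None and not verdicts <= set(permitted):
            return False
        required = exp.get("required_testgroup_verdicts")
        if required is not None and not set(required) <= verdicts:
            return False
    return True

quadratic_py = {
    "sample": {"permitted_testgroup_verdicts": ["AC"]},
    "secret/group1": {"permitted_testgroup_verdicts": ["AC"]},
    "secret/group2": {"permitted_testgroup_verdicts": ["TLE", "AC"],
                      "required_testgroup_verdicts": ["TLE"]},
}
print(satisfies(quadratic_py, {"sample": ["AC"],
                               "secret/group1": ["AC", "AC"],
                               "secret/group2": ["AC", "TLE"]}))  # True
print(satisfies(quadratic_py, {"sample": ["AC"],
                               "secret/group1": ["AC"],
                               "secret/group2": ["AC", "AC"]}))   # False: no TLE
```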
Are you
IMO the

Using this definition for aggregate verdicts is a bit problematic for scoring problems. The relationship between verdict and score should be such that the score may only exist and be non-0 if the verdict is AC. We should specify clearly whether the score exists and is 0 for rejected verdicts, or if it simply isn't defined, but it doesn't change the general point here.
I don't think anybody was second guessing YAML, I agree that changing the file format is orthogonal to and out of scope of this discussion. The question that (I think) you are answering was how to best express (in YAML) the gist of:

thore.py: wrong answer
thore.py:
sample: accepted

Given that that is not valid YAML. Not which other format to switch to where it would be legal. Do you have any thoughts on that? |
As mentioned, I strongly prefer to keep to the set of verdicts defined elsewhere (i.e. CLICS). It does behave really well when applied to subsets of testcases though. Fundamentally, I see the time limit verification as a separate process from the verdict, and I don't think these two concepts should be forced together.
This does fix the inconsistency with CLICS. Other than that is functionally identical to 0.5, meaning it will have the same benefits and drawbacks, except being less concise (and beautiful). That also means my fundamental issue above applies here as well.
We had put some limits on how to override it, but it's not non-overridable. In fact, if it's truly non-overridable it does not solve the two main (and only?) issues this was designed to solve. Namely, "I want to exclude an accepted submission from the normal time limit verification" and "I want to exclude a time limit exceeded submission from the normal time limit verification". I don't quite agree with the "difficult to decode for the human reader". I would say that that is rather the main benefit of this model. My reasoning is that having the ability to tag submissions with "exclude this from margin stuff" is very close to the main use cases.
Towards the end of the call we had (or at least worked towards) a definition that said that more path segments overrides fewer path segments, and setting different values with the same number of path segments is an error. We never talked about applying this to testcases but that would make sense, so with this we get:
Is that what you intended it to mean?
My intent when suggesting this was definitely to view the default to be explicitly set. I have not heard anyone saying or implying something else.
Is this referring to:
I think we all agree that the default is |
Replying to this comment from Thore
|
Those things are not mutually exclusive. Are you saying that what I listed was:
I would say that "being able to express required and permitted testcase verdicts per testgroup" is a very powerful (and very useful) feature, but is neither "simple" nor "commonly requested". Note that not being "commonly requested" doesn't mean that few would want it. I think most would, but that they had not thought about it. So, what you said can be "the most important thing" at the same time as what I listed was "The most common and simple things that you can't do (well) with the older spec, and that we wanted to add support for with the expectations framework". Are we actually disagreeing on something here? If so, what? |
The definition of the
This definition subsumes the case where

There is indeed a definition of “submission verdict” in the August draft of the spec:
This is well-defined for nonempty

Note that

(Let me repeat that I am not even advocating to add
unintended mistakes, plural. You say "unfortunate formulation about running testcases", I assume that's one of the mistakes you're referring to (and strictly speaking I agree), what other mistake(s) are you referring to?
For pass-fail problems, I agree. I also think that it is consistent, i.e. the "submission verdict" of a pass-fail problem is the same as the "aggregate verdict" of all the testcase verdicts ordered alphabetically on testcase path.
Why is a definition of "aggregate verdict" for arbitrary subsets needed "to make sense of even basic operations of the entire problem package"? Based on the fact that you are not advocating for having an "aggregate verdict", I'm assuming that you don't see a need for it? I wanted to add what we named "submission verdict" above (which I originally called "verdict" and we renamed to "aggregate verdict"), and I see a use for that. I would argue that "aggregate verdicts" are not so relevant for arbitrary subsets, there the concepts of |
I agree, and furthermore I believe there is consensus on this point. To be clear, consensus that we want "to be able to express required and permitted testcase verdicts per testgroup", not that we don't want anything else. And to even clearer, I'm not implying that you are saying that we want nothing else.
That sounds good. We have that in the current (and most/all previous) proposals. I also think there is consensus here.
Agreed. My point when arguing "against" the abbreviations was to focus the syntactic sugar (because it does have some cost) on the "common and simple" cases. In any case, the details of the syntactic sugar are less important than the underlying logic (but they're not unimportant).
I'm not entirely sure I understand exactly what you mean, but I think I agree.
Agreed.
Agreed.
That's a much more opinionated (and subjective) claim. I definitely think it's fine (and useful) to do verification by specifying the expected score for a set of submission. This is how it's typically done at IOI.
And we seem to agree on all that?
Why does it have to be instead of and not in addition to?
|
I'll try to summarize a little bit... Things we agree on:
Important things that we disagree on:
Things that we don't quite agree on, but that are less important:
After summarizing this it definitely feels to me that we are very close. Did I miss something? |
Note on regex/prefix matching

One clarification: the matching rule that I like is that
This means that

In particular, the submission

Also, the testcases in testgroup

I find this notationally very lightweight, and extremely simple to communicate. (And implementation is already done, because |
As a user I think this is pretty nice. I was going to write that it's limiting as an implementer since "all the regex dialects out there are slightly different", but it turns out at least for Python there is a quite nice list of things that are supported. And since anyway we'll mostly do this in Python I'm OK with it. https://docs.python.org/3/library/re.html#regular-expression-syntax

Some remarks:
|
I don't like using a specific language's regex implementation as a specification, as it basically forces everyone to use that language (Python in this case) to implement it. I know there's no single uniform regex specification, but I think we should stick to some sufficiently generic subset of regex that is supported in essentially all languages, for example POSIX Extended Regular Expressions, or maybe Perl Regex. But I also agree with Ragnar that simple globbing (possibly with |
I will not fight globbing, and I understand the very good point about

My main concern is that I am emotionally attached to:

accepted: accepted # don’t want to write `accepted/*`
partially_accepted/thore.py:
secret/group1: accepted # don’t want to write `secret/group1/*`
secret/group2: wrong answer Maybe the “and directories” in Ragnar’s post above is exactly what I’m looking for. Let me try:
Here, it is understood that
I think I like this. |
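The convention being converged on here ("a key matches a path or any directory prefix of it") is easy to prototype. A sketch of that reading, with the caveat that Python's `fnmatch` lets `*` match across `/`, unlike shell globbing, so a real implementation would likely want stricter segment-wise matching:

```python
# Sketch (my reading, not normative) of glob-with-directories matching:
# a key matches a submission or testcase if it glob-matches the full path
# or one of its directory prefixes, so plain `accepted` covers everything
# under accepted/. Caveat: fnmatch's `*` also matches `/`.
import fnmatch

def key_matches(key, path):
    parts = path.split("/")
    prefixes = ("/".join(parts[:i]) for i in range(1, len(parts) + 1))
    return any(fnmatch.fnmatch(prefix, key) for prefix in prefixes)

print(key_matches("accepted", "accepted/th.py"))            # True
print(key_matches("accepted/*.py", "accepted/th.py"))       # True
print(key_matches("*/th.py", "wrong_answer/th.py"))         # True
print(key_matches("secret/group1", "secret/group1/004-x"))  # True
print(key_matches("secret/group2", "secret/group1/004-x"))  # False
```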
This sounds very good! Python's glob has

It would be nice to do brace expansion as well,
OK, I now have a minimal expectations framework (

Only interesting observation (after actually implementing it and validating data): YAML doesn’t think

"accepted/*.py": accepted ## matches {accepted/th.py, accepted/ragnar.py}
"*/th.py" : accepted ## matches {accepted/th.py, wrong_answer/th.py}.
---
brute_force.py:
"*/*-small-*" : accepted ## matches secret/group3/032-small-disconnected

I think that’s perfectly fine. I will today
|
Closing, evolved into #145 |
Only |
That would depend on the (possibly undocumented) implementation of your YAML parser or your editor. The YAML specification itself is quite clear, but some tools are more lenient and make educated guesses about what “looks like a string”. (A situation that I find disastrous for a format used for specification, but that’s where we are. There are other famous examples with

In any case, the definition of “what looks like a string to YAML” is outside of the expectations framework. In the above example, the tools that I tried do the right thing for |
I don't understand your answer. It seems tangential, but I might be missing something? My point was that if it was only the string

If it's in fact any string starting with

Which of the above are you saying that it is? |
You are not missing anything; there is no issue I’m worried about, I used the formulation “interesting observation” above, not “end of the world”. I implemented the syntax of the expectations framework (as you can see in #145), including an online validator of the current specification that you can play around with in your browser. (Link repeated: https://cuelang.org/play/?id=baug8IzTJVU#cue@export@cue) You can add and remove

The only observation is that when we previously sketched examples for desirable syntax like:

*/th.py: accepted

then we can’t have that. Instead, we should think of it as

"*/th.py": accepted |
Here is my best shot at Expectations 0.7.2, much of it in CUE:
Changed in 0.7.1: Removed `false` from the values for `with_margin`

Changed in 0.7.2: Restricted `expectation` keys based on `problem.type`

What changed?

The most visible change is that the `#expectations` struct is now a lot larger. Here it is again:

The key names distinguish between `testcase_verdicts` and `aggregate_verdict`. Symmetrically, the `score` field has split into two (and this will do a lot of good!).

The new boolean `with_margin` has also been added. In total, 3 new fields, and several renamings. Have a look.

Also, `#result` got a new field

We need this for sorting verdicts (so we can compute aggregate verdicts), and for applying the scoring aggregation rules (which depend on `testdata.yaml` file contents and their full path names).

I added the `!` to make it explicitly mandatory.

I didn’t bother to specify the aggregate verdict in CUE, it’s not clearer than the prose expression I wrote down.

Note that none of the abbreviations use aggregated verdicts or scores. (But they do use `with_margin`.)