
Expectations 0.7.2 #137

Closed
@thorehusfeldt

Description

Here is my best shot at Expectations 0.7.2, much of it in CUE:

Changed in 0.7.1: Removed false from the values for with_margin

Changed in 0.7.2: Restricted expectation keys based on problem.type

// The registry maps submission name patterns to expectations.
// A submission whose name matches a pattern must satisfy its expectations.
// (In particular, a submission can match several patterns, and thus
// needs to satisfy several expectations.)
// The semantics of "matching a pattern" are not defined here
// (possibilities: regexen, prefix, globbing).

#registry: close({[string]: #root_expectations})
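For illustration, a registry might look like the following sketch. The submission-name patterns are hypothetical and read here as globs; the values are abbreviations, which are defined further down.

```cue
// A hypothetical registry, interpreting patterns as globs.
// "accepted" and "wrong answer" are abbreviations defined below.
registry_example: #registry & {
	"accepted/*":     "accepted"
	"wrong_answer/*": "wrong answer"
}
```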

// Problem-wide constants
// ======================
// Set by the judging environment
time_limit: >0
// Set in problem.yaml; these are problem.timing.tle_margin and problem.timing.ac_margin
tle_margin: >1
ac_margin:  >1

// Testcase Result
// ===============
// The result of running a submission on a testcase

#result: {
	testcase_name!: #name
	time!:         >=0
	if time <= time_limit {
		// When the submission terminates within the time limit
		// its exit status is either a success or a failure:
		terminated_successfully!: bool
		if terminated_successfully {
			// The output of successfully terminated submissions is
			// validated by the output validator, the resulting values are:
			output_validated!: bool
			message?:          string
			score?:            >=0
		}
	}
}
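As a sketch of a concrete result (assuming a concrete time_limit of 2.0, and assuming #name, which is defined elsewhere, admits path-like strings):

```cue
time_limit: 2.0 // assumed concrete for this example

result_example: #result & {
	testcase_name:           "secret/034-huge" // hypothetical testcase
	time:                    0.73
	terminated_successfully: true
	output_validated:        true
	message:                 "correct"
}
```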

// The verdicts that can occur in the expectations framework are:

#verdict: "AC" | "WA" | "RTE" | "TLE"

// Testcase Verdict
// ================
//
// The verdict of (a submission's execution on) a single testcase is
// defined in terms of the testcase result's time, termination
// status, and output validation, as follows.

#verdict_for_result: {
	// The verdict for the _result
	_result: #result
	let t = _result.time
	if t >= time_limit {"TLE"}
	if t < time_limit {
		if !_result.terminated_successfully {"RTE"}
		if _result.terminated_successfully {
			if !_result.output_validated {"WA"}
			if _result.output_validated {"AC"}
		}
	}
}
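For example (again assuming a concrete time_limit of 2.0), a run that exceeds the time limit gets "TLE"; since the time exceeds the limit, #result requires no further fields:

```cue
time_limit: 2.0 // assumed concrete for this example

tle_example: #verdict_for_result & {
	_result: {
		testcase_name: "secret/034-huge" // hypothetical testcase
		time:          3.1               // exceeds time_limit
	}
}
// tle_example evaluates to "TLE"
```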

// Aggregate verdict
// =================

// For a linearly ordered sequence of verdicts, the aggregate verdict is
// * the first occurrence of "WA", "TLE", or "RTE", if one exists
// * "AC" if the sequence is empty or contains only "AC"
//
// For example, the sequence "AC", "AC", "TLE", "WA" aggregates to "TLE".


// Expectations
// ============

#root_expectations: {
	// Set expectations for all testcases
	#expectation | #range | #abbreviation

	// And/or set them for testcases matching a pattern (such as testgroups)
	[=~"^(sample|secret)"]: #expectation | #range | #abbreviation
}
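So a single registry entry can mix a blanket expectation with per-testgroup ones. A minimal sketch, using only the abbreviations and the two testgroup names matched by the pattern above:

```cue
// Hypothetical entry: the sample testgroup must be solved perfectly,
// while something must go wrong on the secret testgroup.
mixed_example: #root_expectations & {
	sample: "accepted"
	secret: "not accepted"
}
```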

// Often-used expectations are specified in terms of abbreviations

#abbreviation: "accepted" | "wrong answer" | "runtime exception" | "time limit exceeded" | "does not terminate" | "not accepted"

// Scoring problems can set the range

#range: number | ordered_tuple

ordered_tuple: tuple=[number, number & >=tuple[0]]
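Concretely, both a single number and a non-decreasing pair are valid ranges:

```cue
full_score:    #range & 100      // exactly 100
partial_score: #range & [25, 75] // anywhere from 25 to 75
// #range & [75, 25] would fail: the second component must be >= the first
```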

// In general, we can set fine-grained expectations in terms of which verdicts,
// timing, messages, and scores are allowed and disallowed for a set of results R.

#expectation: {
	permitted_testcase_verdicts?: [...#verdict] // only these testcase verdicts may appear
	required_testcase_verdicts?: [...#verdict]  // at least one of these testcase verdicts must appear
	message?:     string // this judgemessage must appear
	with_margin?: true   // let m = max(r.time) over all r in R; then m < time_limit / ac_margin or m >= time_limit * tle_margin

	// pass-fail problems only:
	permitted_aggregate_verdict?: [...#verdict] // the aggregate verdict must be in this list

	// scoring problems only:
	permitted_testcase_scores?: #range // all testcase scores must be in this range
	permitted_aggregate_score?: #range // the aggregate score must be in this range
}
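A hand-written expectation for a hypothetical scoring problem might combine several of these fields:

```cue
// Hypothetical: every testcase scores between 0 and 50,
// and the aggregate score must be exactly 75.
partial_example: #expectation & {
	with_margin: true
	permitted_testcase_scores: [0, 50]
	permitted_aggregate_score: 75
}
```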

// Useful abbreviations
// ====================
//
// Each abbreviation is shorthand for a common #expectation struct, as follows:

_expectation_for_abbreviation: {
	_abbreviation: #abbreviation
	if _abbreviation == "accepted" {
		permitted_testcase_verdicts: ["AC"]
		with_margin: true
	}
	if _abbreviation == "wrong answer" {
		permitted_testcase_verdicts: ["AC", "WA"]
		required_testcase_verdicts: ["WA"]
	}
	if _abbreviation == "runtime exception" {
		permitted_testcase_verdicts: ["AC", "RTE"]
		required_testcase_verdicts: ["RTE"]
	}
	if _abbreviation == "time limit exceeded" {
		permitted_testcase_verdicts: ["AC", "TLE"]
		required_testcase_verdicts: ["TLE"]
		with_margin: true
	}
	if _abbreviation == "does not terminate" {
		permitted_testcase_verdicts: ["AC", "RTE", "TLE"]
		required_testcase_verdicts: ["RTE", "TLE"]
	}
	if _abbreviation == "not accepted" {
		required_testcase_verdicts: ["RTE", "TLE", "WA"]
	}
} & #expectation
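Unifying with a concrete abbreviation selects the corresponding branch; for instance:

```cue
wa_example: _expectation_for_abbreviation & {_abbreviation: "wrong answer"}
// wa_example is {permitted_testcase_verdicts: ["AC", "WA"], required_testcase_verdicts: ["WA"]}
```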

What changed?

The most visible change is that the #expectation struct is now a lot larger. Here it is again:

#expectation: {
	permitted_testcase_verdicts?: [...#verdict] // only these testcase verdicts may appear
	required_testcase_verdicts?: [...#verdict]  // at least one of these testcase verdicts must appear
	permitted_aggregate_verdict?: [...#verdict] // the aggregate verdict must be in this list
	message?:     string // this judgemessage must appear
	permitted_testcase_scores?: #range // all testcase scores must be in this range
	permitted_aggregate_score?: #range // the aggregate score must be in this range
	with_margin?: true // let m = max(r.time) over all r in R; then m < time_limit / ac_margin or m >= time_limit * tle_margin
}
  1. The key names distinguish between testcase_verdicts and aggregate_verdict. Symmetrically, the score field has split into two (and this will do a lot of good!).

  2. The new boolean with_margin has also been added. In total: 3 new fields and several renamings. Have a look.

  3. Also, #result got a new field

#result: {
   testcase_name!: #name
   ... 
}

We need this for sorting verdicts (so that we can compute aggregate verdicts), and for applying the scoring aggregation rules (which depend on testdata.yaml file contents and their full path names).

I added the ! to make it explicitly mandatory.

  1. I didn’t bother to specify the aggregate verdict in CUE; it’s not clearer than the prose expression I wrote down.

  2. Note that none of the abbreviations use aggregated verdicts or scores. (But they do use with_margin)
