Add submission/rejected directory #139

Closed
niemela opened this issue Dec 1, 2023 · 8 comments · Fixed by #144

Comments

@niemela
Member

niemela commented Dec 1, 2023

As mentioned in #17 and elsewhere...

@niemela
Member Author

niemela commented Dec 2, 2023

See also #87

@thorehusfeldt
Contributor

thorehusfeldt commented Dec 3, 2023

In terms of expectations 0.7, I think this would mean

// 0.7
if _abbreviation == "rejected" {
    required_testcase_verdicts: ["RTE", "TLE", "WA"]
}

Note that this definition does not set with_margin (I don’t know how to do that; I’m not sure 0.7 is good enough to express the intention here). Better people than me should try to clarify what is meant, in particular whether a successfully validated testcase result with t < timelimit (i.e., what is called AC- in 0.5) is a permitted testcase verdict.

In expectations 0.5 I would have defined

// 0.5
if _abbreviation == "rejected" {
    required: ["RTE", "TLE", "WA"] // Note: TLE means “too slow with margin”
}

or, in an earlier version that used ! instead of -:

// 0.5-alpha
if _abbreviation == "rejected" {
    required: ["RTE", "TLE!", "WA"] // Note: TLE! means “too slow with margin”
}

Both of these make it clear that we want the TLE verdict to fail with margin (which is the right thing to do, I believe). Neither 0.6 nor 0.7 allows this, unless I’m missing something.

I think this is an excellent example of something we want to be able to specify! :-) To me, this is another point in favour of 0.5 (clearer, shorter, more expressive) over both 0.6 and 0.7.

@niemela
Member Author

niemela commented Dec 3, 2023

In one of my proposals (the one you based "0.7" on I believe?) this would be:

rejected:
   required: ["RTE", "TLE", "WA"]
   used_for_timing: true

(I don't really understand why you can't do basically the same thing in 0.7 though?)

There are two downsides with this compared to 0.5:

  • It's longer (but not terribly so)
  • It's overly restrictive: it would disallow a submission that fails (only) on "RTE" or "WA" with a running time between time_limit/ac_margin and time_limit (using what I think is the latest and/or clearest definition)

There is one upside:

  • It's clearer. Because the "use for timing" is so explicit. This is subjective of course (and is basically the inverse of the "it's longer" downside).

I think the overly restrictive issue is fixable, but we need a better definition than the linked one.

@thorehusfeldt
Contributor

thorehusfeldt commented Dec 3, 2023

Consider the expectation

// 0.7
rejected:
   required: ["RTE", "TLE", "WA"]
   with_margin: true

in a problem with timelimit=1 and both margins 2, a submission with the following results on a two-testcase problem:

secret/a: { testcase_verdict: "AC"; time: .7 }
secret/b: { testcase_verdict: "WA"; time: .2 }

This submission would not satisfy the expectation because its maximum time does not satisfy the with_margin constraints (it fails to be <.5 or >=2). Maybe that’s what you want; I can’t tell.
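
To make the arithmetic concrete, here is a minimal Python sketch of the with_margin check as read in this comment. The function name `satisfies_with_margin` and the data layout are hypothetical and not part of any of the proposals; the rule itself (the maximum running time must be below timelimit / ac_margin or at least timelimit * tle_margin) is taken from the parenthetical above.

```python
def satisfies_with_margin(times, timelimit, ac_margin, tle_margin):
    """Return True if the slowest run is clearly fast or clearly slow.

    Assumed reading of `with_margin: true`: the maximum running time must be
    < timelimit / ac_margin or >= timelimit * tle_margin.
    """
    slowest = max(times)
    return slowest < timelimit / ac_margin or slowest >= timelimit * tle_margin


# The two-testcase example: secret/a runs in 0.7s, secret/b in 0.2s.
results = {"secret/a": ("AC", 0.7), "secret/b": ("WA", 0.2)}
times = [t for _, t in results.values()]
ok = satisfies_with_margin(times, timelimit=1, ac_margin=2, tle_margin=2)
print(ok)  # False: 0.7 is neither < 0.5 nor >= 2
```

Under this reading the WA on secret/b is irrelevant to the outcome; the expectation fails purely because of the 0.7s run on secret/a.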

I think the overly restrictive issue is fixable, but we need a better definition than the linked one.

Yes! Please provide one. I have a hard time navigating “It’s clearer! […] so explicit” and the simultaneous lack of an actual definition. I’ve tried to provide a definition (which I’m unhappy with, for reasons enumerated several times), so better people than me must try.

@eldering
Collaborator

eldering commented Dec 3, 2023

Consider the expectation

// 0.7
rejected:
   required: ["RTE", "TLE", "WA"]
   with_margin: true

TBH, I think that this (I'm referring to the underlying meaning, not the specific v0.7 syntax) does not make sense to specify and I think we could say that with_margin: true should be incompatible with specifying both a TLE and another verdict.

Let's drop the RTE from the above example for simplicity. If we think that the submission should either WA or TLE with margin on the same testcase, then that's a fairly esoteric expectation (although possible in theory I guess). But if this is expected on different testcases, then we should just express it that way, I guess something like

weird_rejected/jaap.cc:
  secret/a:
    required: ['TLE']
  secret/b:
    required: ['WA']

I'm not sure my syntax matches the latest v0.7 syntax, but I hope the intent is clear.

@thorehusfeldt
Contributor

thorehusfeldt commented Dec 4, 2023

Your intent is indeed clear, and the fine-grained specification of weird_rejected makes sense (mutatis mutandis) in most of the proposed syntaxen.

The present thread is about specifying the abbreviation rejected (which then naturally would specify the expectation for submissions placed in submissions/rejected.) In English prose, I believe the intended meaning of such a directory would be:

The submission receives a “rejected testcase verdict” on some testcase. “Rejected testcase verdict” here means RTE, WA, or (TLE with margin). (Implicitly, there is no restriction on permitted testcase verdicts; in particular there can be testcases that receive the verdict AC and run in time timelimit / ac_margin <= t < timelimit, and there can be testcases that receive the verdict TLE and run in time timelimit <= t < timelimit * tle_margin.)

Unless I’m missing something, the above can be expressed in neither expectations 0.6 nor expectations 0.7.

It can be expressed in expectations 0.5 in a single line like this:

rejected:
    required: ["WA", "RTE", "TLE"] // note that TLE means “t >= timelimit * tle_margin” in 0.5

Recall that the key required in 0.5 is called required_testcase_verdict in 0.7 to distinguish it from aggregate verdict (which is not part of 0.5).

My current conclusion is that if rejected is a desirable expectation abbreviation (or submission subdirectory that should be definable as an expectation) then neither of my proposals 0.6 or 0.7 is strong enough to express desired expectations. But I could be missing something, and invite attempts to define it.

I am weakly in favour of actually specifying rejected, so if my analysis is right, then this updates my preferences in favour of 0.5. (With the caveat that I increasingly prefer the set AC! AC WA RTE TLE TLE! of testcase verdicts to the set AC AC- WA RTE TLE- TLE.)
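
The English-prose meaning of rejected given earlier in this comment can be sketched as a predicate. This is only an illustration under stated assumptions: the function names and the `(verdict, time)` tuples are hypothetical, and "TLE" here is the raw verdict (t >= timelimit), so "TLE with margin" is checked separately against timelimit * tle_margin.

```python
REJECTED_WITHOUT_TIMING = {"RTE", "WA"}


def is_rejected_verdict(verdict, time, timelimit, tle_margin):
    """A 'rejected testcase verdict': RTE, WA, or TLE with margin."""
    if verdict in REJECTED_WITHOUT_TIMING:
        return True
    return verdict == "TLE" and time >= timelimit * tle_margin


def satisfies_rejected(results, timelimit, tle_margin):
    """True if at least one testcase gets a rejected testcase verdict.

    All other testcases are unrestricted: AC close to the time limit and
    TLE without margin are both permitted, they just do not count.
    """
    return any(
        is_rejected_verdict(verdict, time, timelimit, tle_margin)
        for verdict, time in results.values()
    )


# WA on one testcase is enough, even with a slow-ish AC elsewhere.
print(satisfies_rejected({"a": ("AC", 0.7), "b": ("WA", 0.2)}, 1, 2))   # True
# TLE without margin alone does not count as a rejected verdict.
print(satisfies_rejected({"a": ("AC", 0.7), "b": ("TLE", 1.5)}, 1, 2))  # False
```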

@RagnarGrootKoerkamp
Collaborator

RagnarGrootKoerkamp commented Dec 4, 2023

My starting point is that rejected should be able to contain pretty much everything that does not fit somewhere else.
I think that from this it follows that we don't want to use with_margins or anything similar here.

Suppose we have secret/1.in as only testcase, 1s timelimit, 2x safety margin in both directions.

I think all of the below should be rejected without further warning:

1.in: WA, 0.25s
1.in: WA, 0.75s
1.in: WA, 1.5s  (also TLE)
1.in: WA, 2.5s (also TLE, probably aborted after 2s)

But currently we can't detect a WA after 1.5s, since that always gets reported as TLE (which is correct IMO), and we never run the output validator. So from this we must allow TLE without safety margin.

(We could enrich result-per-testcase by running the output validator whenever the submission finished (ie not RTE or aborted TLE), so that AC/WA becomes an independent axis from finished/TLE/RTE, but I'm not in favour of that.)


The TLE vs TLE with margin case here only matters if a submission never RTEs or WAs. But then you can just put it in does_not_terminate anyway, right? Because that is exactly where we make this distinction clear and do assert the TLE condition.

Somewhat surprisingly, this leads to the following suggestion: We could instead specify:

rejected:
    required: ["WA", "RTE"] // note: no TLE here

For the typical case where more than one type of non-AC result occurs, this works. But then again you lose portability, because maybe you put something in rejected because on some other Python version it does work but is just terribly slow (without crashing).

In the end I think required: [WA, RTE, TLE] with use_margins: false is completely fine for me.
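
The four single-testcase scenarios above can be traced with a small Python sketch. It assumes, as stated in this comment, that any run with time >= timelimit is reported as TLE and that the output validator is then never run; the helper names `reported_verdict` and `meets` are hypothetical.

```python
def reported_verdict(output_correct, crashed, time, timelimit):
    """Verdict reported under the assumption that a run at or past the
    time limit is always TLE and the output validator is skipped."""
    if time >= timelimit:
        return "TLE"
    if crashed:
        return "RTE"
    return "AC" if output_correct else "WA"


def meets(required, verdicts):
    """True if at least one reported verdict is among the required ones."""
    return any(v in required for v in verdicts)


timelimit = 1.0
# Wrong output at four different running times (one submission each).
runs = [0.25, 0.75, 1.5, 2.5]
reported = [reported_verdict(False, False, t, timelimit) for t in runs]
print(reported)  # ['WA', 'WA', 'TLE', 'TLE']

# required: [WA, RTE, TLE] without margins accepts all four scenarios.
print([meets({"WA", "RTE", "TLE"}, [v]) for v in reported])
# required: [WA, RTE] would miss the two slow runs, whose WA is never observed.
print([meets({"WA", "RTE"}, [v]) for v in reported])
```

The last line illustrates why dropping TLE from the list only works when some other testcase yields an observable WA or RTE.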

@eldering
Collaborator

eldering commented Dec 4, 2023

I very much agree with @RagnarGrootKoerkamp's analysis, including

(We could enrich result-per-testcase by running the output validator whenever the submission finished (ie not RTE or aborted TLE), so that AC/WA becomes an independent axis from finished/TLE/RTE, but I'm not in favour of that.)

and finally

In the end I think required: [WA, RTE, TLE] with use_margins: false is completely fine for me.

So, I'd like to make a bolder statement and say that we can simply forbid using use_margins: true while at the same time specifying required: ["TLE", x] with x any of the other non-AC verdicts, and that should be perfectly fine as it doesn't prevent us from expressing anything that is very relevant. But I'm happy to hear a counter-example!
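
A minimal sketch of how such a restriction might be enforced when an expectation is loaded. The function `check_expectation` and its arguments are hypothetical and only mirror the `required` and `use_margins` keys named in this comment; no existing tool is implied to work this way.

```python
NON_AC = {"WA", "RTE", "TLE"}


def check_expectation(required, use_margins):
    """Reject the combination proposed to be forbidden: use_margins=True
    together with TLE and at least one other non-AC verdict in `required`."""
    others = (set(required) & NON_AC) - {"TLE"}
    if use_margins and "TLE" in required and others:
        raise ValueError(
            "use_margins: true cannot be combined with TLE and "
            + ", ".join(sorted(others))
        )


check_expectation(["WA", "RTE", "TLE"], use_margins=False)  # fine
check_expectation(["TLE"], use_margins=True)                # fine
try:
    check_expectation(["WA", "RTE", "TLE"], use_margins=True)
except ValueError as err:
    print(err)
```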

4 participants