Skip to content

Allowing run_time_error solutions to either WA, or TLE #17


Closed
jsannemo opened this issue Sep 6, 2022 · 10 comments

@jsannemo
Contributor

jsannemo commented Sep 6, 2022

I thought this had been allowed at some point. I was bitten by this recently in Coding Cup: I had a solution that was incorrect (in a crashy way), but which sometimes manifested as a WA and sometimes as a TLE.

TLE already allows WA, since we only want to verify that /some/ test case makes it time out, but the above case doesn't seem to map cleanly onto the spec right now?

@niemela @simonlindholm

@niemela
Member

niemela commented Sep 7, 2022

If we allow submissions that claim to be RTE to instead get WA or TLE, then what are we really testing? Would having a "rejected" directory, with the requirement being only that it is not accepted (i.e. it can RTE, WA, TLE, ...), solve this need?

@jsannemo
Contributor Author

jsannemo commented Sep 7, 2022

Yes, that's fine too.

I guess the reason why I'm fine with letting RTE solutions get TLE and WA is that:

  1. In C++, an RTE can often manifest as either (see the sketch below)
  2. I can't remember ever writing a solution where I wanted to test for RTE specifically, unlike WA and TLE - solutions only get placed there because they happen to crash on some case (except they sometimes don't /always/ crash...)

In the end I don't care very much; either option solves my problem, so pick whatever you prefer :)
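
To make point 1 concrete, here is a hypothetical Python submission (an editorial illustration, not from the thread) in the spirit of the later time_limit_exceeded/recursive.py example; whether it RTEs or TLEs depends on the test data and the recursion limit, which is exactly why pinning it to a single expected verdict is fragile:

#!/usr/bin/env python3
# Hypothetical buggy submission: linear recursion that is fine on
# small cases but fails on big ones. With the default recursion
# limit, a large n raises RecursionError (RTE); raising the limit
# trades that for a possible hard crash or a run slow enough to
# TLE instead. The verdict is data-dependent.
def depth(n):
    return 0 if n == 0 else 1 + depth(n - 1)

print(depth(int(input())))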

@jsannemo
Contributor Author

jsannemo commented Sep 7, 2022

@RagnarGrootKoerkamp what does BAPCtools call this?

@RagnarGrootKoerkamp
Collaborator

I'm handling this the DOMjudge way, using @EXPECTED_RESULTS@:, see here.
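
For context, DOMjudge's convention is a magic comment inside the submission source; a minimal sketch (the verdict spellings here are from memory, so treat them as illustrative rather than authoritative):

#!/usr/bin/env python3
# @EXPECTED_RESULTS@: WRONG-ANSWER,TIMELIMIT,RUN-ERROR
# The judging result counts as "as expected" if it matches any one
# of the listed verdicts, which covers the WA-or-TLE case from
# this issue.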

@jsannemo
Contributor Author

jsannemo commented Sep 7, 2022

Are they placed in an arbitrary folder matching one of the expected verdicts?

+1 for rejected anyway

@RagnarGrootKoerkamp
Collaborator

Yes indeed, any of the matching folders works.

I think I have an assert somewhere that the folder it is in must be one of the listed verdicts.

submissions/rejected sounds convenient indeed
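
The assert Ragnar mentions is small; a hedged sketch of the placement check (the directory-to-verdict mapping below is an assumption for illustration, not BAPCtools' actual code):

from pathlib import Path

# Assumed mapping from submission directories to verdict codes.
DIR_TO_VERDICT = {
    "accepted": "AC",
    "wrong_answer": "WA",
    "time_limit_exceeded": "TLE",
    "run_time_error": "RTE",
}

def check_placement(submission: Path, expected: list[str]) -> None:
    # The folder a submission lives in must be one of its listed verdicts.
    verdict = DIR_TO_VERDICT.get(submission.parent.name)
    assert verdict in expected, (
        f"{submission} sits in {submission.parent.name!r} "
        f"but expects one of {expected}"
    )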

jsannemo added a commit to jsannemo/problem-package-format that referenced this issue Sep 7, 2022
As discussed in Kattis#17.

I added some normative guidance on the folder for users (since I think this spec is used primarily not by implementers, but by problem creators).
@thorehusfeldt
Contributor

thorehusfeldt commented May 20, 2023

In case anybody is still here, I have a draft implementation that allows you to specify this in `/submissions/expected_grades.yaml` like this:

time_limit_exceeded/recursive.py:
  verdict:
    - RTE
    - TLE

or, if you like it terse:

time_limit_exceeded/recursive.py: ["RTE", "TLE"]

This can be specified down to individual testgroups, so you could also do, hypothetically:

time_limit_exceeded/recursive.py:
  subgroups:
    sample: AC
    secret:
      group1: AC
      group2: AC
      group3: ["TLE", "RTE"]  
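
Both spellings can be normalized to one internal shape before grading; a sketch of how a tool might do it (the exact field handling is my guess at the draft's semantics, not its code):

def normalize(entry):
    # Terse forms: a bare verdict string or a list of verdicts.
    if isinstance(entry, str):
        return {"verdict": [entry]}
    if isinstance(entry, list):
        return {"verdict": list(entry)}
    # Verbose form: a mapping; lift a scalar verdict into a list.
    # (Recursing into "subgroups" is elided for brevity.)
    out = dict(entry)
    if isinstance(out.get("verdict"), str):
        out["verdict"] = [out["verdict"]]
    return out

assert normalize("AC") == normalize(["AC"]) == {"verdict": ["AC"]}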

I have a BAPCtools fork that does this here: https://github.com/thorehusfeldt/BAPCtools

In green, the expected verdicts are shown. Testgroup data/secret/alcohols got an unexpected WA, so it's shown in red.

[screenshot: test results with expected verdicts in green and the unexpected WA in red]

This is very preliminary, but it parses a YAML file with arbitrarily rich specifications per submission and per testgroup, and compares them against the default grader (which I added to BAPCtools) at each internal node of the testdata tree.

I think this is the right way of doing it (or close enough), in particular for test groups. It is much superior to my own @EXPECTED_GRADES@ approach.
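
For intuition, the aggregation a default grader performs over the testdata tree can be sketched like this (the "first non-AC child wins" rule is an assumption about the default behaviour; the real grader also handles scoring and per-group grader flags):

def grade(node):
    # A node is either a leaf verdict string or a list of child nodes;
    # a node is AC only if all of its children are.
    if isinstance(node, str):
        return node
    for child in node:
        verdict = grade(child)
        if verdict != "AC":
            return verdict
    return "AC"

# sample passes, secret/group3 times out => the root verdict is TLE.
assert grade([["AC"], ["AC", "AC", "TLE"]]) == "TLE"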

@eldering
Collaborator

I prefer the @EXPECTED_RESULTS@ approach of encoding this information inside the source code itself. This is meta-data that I think is intrinsically tied to the submission (in the context of the problem, of course), and by encoding it in the source we ensure the meta-data is not lost, e.g. when uploading the submission into a CCS or when forwarding it from one CCS to another (e.g. when shadowing at the ICPC WFs).

@thorehusfeldt
Contributor

I’ve played around with various ideas now.

For editing and curation, I find it much more pleasant to have a single-file overview.

(Use-case: add another testgroup, or merge two existing testgroups. I can do this very quickly in a single file, with no errors. I can also check at a glance that all submissions get AC on sample, etc.) Also, the YAML could be syntax-checked.

On the other hand, when writing a new submission, or communicating the intent of a submission to others, the source-embedded approach makes more sense.

The semantics of "my" expected-grades proposal are orthogonal to this. An expectation could be defined (along with many others) in a common expectations.yaml file:

...
mixed/th.py: ["TLE", "RTE"]
time_limit_exceeded/recursive.py:
  verdict: AC
  score: 100
  subgroups:
    sample: AC
    secret:
      group1: AC
      group2: AC
      group3: ["TLE", "RTE"]  
...

but it could just as well reside in the source code of time_limit_exceeded/recursive.py:

#! /usr/bin/env python3
"""
@EXPECTATIONS_BEGIN@
  verdict: AC
  score: 100
  subgroups:
    sample: AC
    secret:
      group1: AC
      group2: AC
      group3: ["TLE", "RTE"]  
@EXPECTATIONS_END@
"""

def solve(instance):
 ...

(I have no opinion about the convention for source code embedding syntax.)
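
Whichever markers are settled on, extraction is mechanical; a minimal sketch assuming the @EXPECTATIONS_BEGIN@/@EXPECTATIONS_END@ markers above and PyYAML:

import re
import textwrap
import yaml

MARKERS = re.compile(r"@EXPECTATIONS_BEGIN@\n(.*?)@EXPECTATIONS_END@", re.S)

def embedded_expectations(path):
    # Pull the YAML block out of a submission's source, if present.
    match = MARKERS.search(open(path).read())
    if match is None:
        return None
    return yaml.safe_load(textwrap.dedent(match.group(1)))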

Contests or traditions could allow both or either; a tool could warn if a submission supplies both (just as it currently warns about inconsistencies with the expectations implied by the placement of the source file).

@niemela
Member

niemela commented Dec 1, 2023

Closing this issue because the actual need is covered both by adding rejected and by the expectation framework. The former we have agreed to multiple times; I (somewhat superfluously) created a ticket for just that (#139) so we don't forget to actually do it. The latter is WIP, currently discussed in #137.

The discussions in the thread are still interesting, but the actual issue is now closed.

@niemela niemela closed this as completed Dec 1, 2023