evouga
Collaborator

@evouga evouga commented Sep 2, 2025

Fixes #453

(Based on my interpretation of the consensus there.)

@Matistjati
Collaborator

Fredrik suggested "However, the behavior of the problem package must remain the same if all files starting with a period are removed.", which I think I prefer.

@RagnarGrootKoerkamp
Collaborator

RagnarGrootKoerkamp commented Sep 2, 2025

Fredrik suggested "However, the behavior of the problem package must remain the same if all files starting with a period are removed.", which I think I prefer.

I like that, but it has the potential of being a slightly circular definition, since this implies that spec ignores all dotfiles, which then implies that dotfiles are automatically not relevant to the package.

Edit: another question this raises: are there then any dotfiles at all that do affect the package?

@niemela
Member

niemela commented Sep 2, 2025

I like that, but it has the potential of being a slightly circular definition, since this implies that spec ignores all dotfiles, which then implies that dotfiles are automatically not relevant to the package.

How is that circular? And isn't that what we want? Take .gitignore for example, that is "not relevant for the package" itself, it's only relevant for tooling that works with the package. Isn't that what we wanted?

@evouga
Collaborator Author

evouga commented Sep 2, 2025

What is "the behavior of the problem package"? What does it mean for it to "remain the same"?

Note for instance that arguably it is possible to add or remove some test cases and submissions without changing "the behavior of the problem package" (the same set of user submissions provably get the same verdict with and without those test cases, etc.) Do we really want to allow dotfile test cases with this property?

I couldn't think of any precise way to define these so I went with an explicit rule.

@RagnarGrootKoerkamp
Collaborator

However, the behavior of the problem package must remain the same if all files starting with a period are removed.

This implies that the problem package ignores all dotfiles, which then implies this statement itself. Thus, we might as well state any dotfiles in a problem package are ignored, which is much more direct than the above.

But it seems we might want to special-case a few directories where we really do not want any dotfiles, such as submissions or testcases, which contradicts the above statement.

(Well unless you argue that removing data/secret/.dotfile does change the meaning of the package, since then the package goes from broken to working, implying that data/secret/.dotfile is not allowed in the first place.)

Anyway, more direct wording is better.


Side note: we used to have a bunch of .gitkeep files hanging around in data/{sample,secret} since we generate the content and we don't like git removing the directories. Would be nice to allow those?

@niemela
Member

niemela commented Sep 3, 2025

This implies that the problem package ignores all dotfiles, which then implies this statement itself. Thus, we might as well state any dotfiles in a problem package are ignored, which is much more direct than the above.

Yeah, this seems to be the most direct way to say what we mean.

OTOH, a risk with this is that "dotfiling" now is a way to "comment out" a file, and we don't really want that...

But it seems we might want to special-case a few directories where we really do not want any dotfiles, such as submissions or testcases, which contradicts the above statement.

Which directories would this be? And why?

@niemela
Member

niemela commented Sep 3, 2025

What is "the behavior of the problem package"? What does it mean for it to "remain the same"?

I think (?) the wording I used was "meaning" or "semantics" rather than "behavior", but that is similarly unclear.

The intended meaning of this is "removing them changes nothing for how the problem is run or evaluated".

Note for instance that arguably it is possible to add or remove some test cases and submissions without changing "the behavior of the problem package" (the same set of user submissions provably get the same verdict with and without those test cases, etc.) Do we really want to allow dotfile test cases with this property?

We absolutely don't want that, but I don't think you can do that. Removing or adding a test case clearly (?) changes how a problem is judged.

@evouga
Collaborator Author

evouga commented Sep 3, 2025

Well for instance, a dotfile-testcase that exactly duplicates a non-dotfile-testcase won't change the verdict of any user submission if it is added or removed.

I know what you mean---that "how a problem is judged" includes what files the judge implementation reads/executes in the process of determining the verdict, regardless of whether the outcome is affected---but it's not clear how to specify this formally.
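The distinction drawn here, between what verdict comes out and what the judge actually runs, can be illustrated with a toy judge. This is only a sketch: the `judge` function, the testcase tuples, and the `.1-copy` name are hypothetical and do not reflect how any real judging system is implemented.

```python
from typing import Callable

# A testcase is (name, input, expected answer); a submission maps input to output.
Testcase = tuple[str, str, str]


def judge(submission: Callable[[str], str], testcases: list[Testcase]) -> tuple[str, list[str]]:
    """Return (verdict, names of the testcases that were actually run)."""
    executed = []
    for name, data, answer in testcases:
        executed.append(name)
        if submission(data).strip() != answer.strip():
            return "WA", executed
    return "AC", executed


base = [("1", "2 3", "5")]
# Hypothetical dotfile testcase that exactly duplicates testcase "1".
with_duplicate = base + [(".1-copy", "2 3", "5")]


def add(data: str) -> str:
    """Example submission: print the sum of two integers."""
    a, b = map(int, data.split())
    return str(a + b)


verdict_a, ran_a = judge(add, base)
verdict_b, ran_b = judge(add, with_duplicate)
```

In this toy model `verdict_a` and `verdict_b` agree, while `ran_a` and `ran_b` differ, which is the gap between "same outcome" and "same judging process" that is hard to specify formally.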

@niemela
Member

niemela commented Sep 3, 2025

Well for instance, a dotfile-testcase that exactly duplicates a non-dotfile-testcase won't change the verdict of any user submission if it is added or removed.

Right. But it does change the running of the problem. I don't think that our definition of "no effect" should be based only on the verdict.

@evouga evouga changed the title from "Allow underscores and periods at start of file name (but no period at start of test cases or programs)" to "Allow periods/underscores/non-leading-dashes in files and folders" on Sep 5, 2025
@evouga
Collaborator Author

evouga commented Sep 5, 2025

Pushed a revised attempt: now ._- is allowed anywhere in file and folder names (except for leading dashes), but the problem package shall behave as if anything with a leading . were removed.
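To make the revised rule concrete, here is a minimal Python sketch. The pattern is an assumption for illustration (alphanumerics plus `.`, `_` and `-`, with no leading dash); the PR's actual regex is not quoted in this thread, and the helper names are hypothetical.

```python
import re
from pathlib import PurePosixPath

# Hypothetical pattern: letters, digits, '.', '_' and '-' anywhere,
# except that a name may not start with '-'. Not the spec's actual regex.
NAME_RE = re.compile(r"[A-Za-z0-9._][A-Za-z0-9._-]*")


def is_valid_name(name: str) -> bool:
    """Return True if a single file or folder name matches the sketch pattern."""
    return NAME_RE.fullmatch(name) is not None


def is_ignored(path: str) -> bool:
    """Return True if any component of the path starts with a period."""
    return any(part.startswith(".") for part in PurePosixPath(path).parts)


def effective_files(paths: list[str]) -> list[str]:
    """Files the package behaves as if it contains (nothing under a leading '.')."""
    return [p for p in paths if not is_ignored(p)]
```

Under these assumptions, `.gitkeep` is a valid name but `data/secret/.gitkeep` and everything below `.git/` drop out of `effective_files`, so the package behaves as if they had been removed.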

Collaborator

@Matistjati Matistjati left a comment

I like this formulation.

Co-authored-by: Joshua Bergman Andersson <[email protected]>
Collaborator

@eldering eldering left a comment

LGTM other than the small reformulation

@evouga
Collaborator Author

evouga commented Sep 12, 2025

@niemela Can we merge this, or discuss remaining issues?

@evouga
Collaborator Author

evouga commented Sep 19, 2025

Added two characters to the regex, based on feedback from judges during problem preparation for the North America Qualifier:

  • $ (shows up in Java .class filenames)
  • ~ (emacs, old Windows)

@eldering
Collaborator

$ (shows up in Java .class filenames)

Nitpick counter-argument: this could potentially lead to issues when these filenames are processed in shell code that doesn't escape things properly and evaluates things as variable names. I guess this is not a strong enough argument though.
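The concern can be reproduced with a small Python sketch that passes a file name through a POSIX shell. The file name `Outer$Inner.class` and the helper `shell_echo` are hypothetical, and the sketch assumes `/bin/sh` and `echo` are available.

```python
import os
import shlex
import subprocess

# Hypothetical file name of the kind javac produces for a nested class.
filename = "Outer$Inner.class"

# Minimal environment so that $Inner is definitely unset in the child shell.
env = {"PATH": os.environ.get("PATH", "/usr/bin:/bin")}


def shell_echo(argument: str) -> str:
    """Run `echo <argument>` through the shell and return what it printed."""
    result = subprocess.run(
        f"echo {argument}", shell=True, capture_output=True, text=True, env=env
    )
    return result.stdout.strip()


# Double quotes do not stop parameter expansion, so $Inner expands to nothing.
unescaped = shell_echo(f'"{filename}"')

# shlex.quote wraps the name in single quotes, which suppresses expansion.
escaped = shell_echo(shlex.quote(filename))
```

With the variable unset, the double-quoted form loses part of the name, while the `shlex.quote` form passes it through intact. Scripts that quote correctly are unaffected, which matches the assessment that this is not a strong argument on its own.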

~ (emacs, old Windows)

Those are backup files, so should never affect the actual problem package, so I'd argue can be excluded with the same reasoning as a leading dot.

@mpsijm
Contributor

mpsijm commented Sep 21, 2025

Regarding $: should the problem package contain compiled files at all?

Collaborator

@eldering eldering left a comment

I think at least ~ should be excluded.

@RagnarGrootKoerkamp
Collaborator

RagnarGrootKoerkamp commented Sep 21, 2025

I feel like I said this before but I don't see it in this thread here, but I still feel that we're conflating things:

  1. the tree during development
  2. the final package as uploaded to contest systems

With 1 I'm really fine with whatever, but with 2 it makes sense to be strict (because it's nice to error if there are any files in there that are not recognised by the format).
I don't think there's any need to upload files with ~ and $, and they should not be allowed in a final package, but that does not mean tools have to error on the development working directory.

@evouga
Collaborator Author

evouga commented Sep 21, 2025

The issue is that for judges developing the problem package, there is no difference between (1) and (2). They work in a git repo and pollute the package with all kinds of temporary and development files (that may or may not be filtered out in .gitignore) and do not want to see warnings or errors about how their .class files, .git folders, etc. are non-compliant with the problem package spec.

So if we want a regex that is only checked at installation, I agree that we can go back to something much simpler (we can require that the authors remove all of the .dotfiles; why do we need .git folders, .gitkeep files, etc. at installation?). It can be the tooling's responsibility to create a "canonical" problem package from a git repo by exporting the repo and removing .gitkeep and other clutter.

If we want a regex that is enforced during development (so that authors get early warning that their test case name is non-compliant, etc.) we need it to be lenient enough that they don't get errors from .class files etc.
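As a rough illustration of the "canonical package" idea, tooling could copy the working tree while skipping clutter. This is a sketch, not an existing tool: `export_package` is a hypothetical name, and the pattern list (dotfiles, `~` backups, `.class` files) is an assumption drawn from the examples in this thread rather than from the spec.

```python
import shutil
from pathlib import Path


def export_package(repo_dir: str, out_dir: str) -> Path:
    """Copy repo_dir to out_dir, leaving out development clutter.

    The pattern list is an assumption for this sketch, not a list from the spec.
    """
    clutter = shutil.ignore_patterns(".*", "*~", "*.class")
    # copytree applies the ignore callback to every directory it visits,
    # so .git/, .gitkeep, editor backups and compiled classes are all skipped.
    shutil.copytree(repo_dir, out_dir, ignore=clutter)
    return Path(out_dir)
```

A strict regex could then be checked against the exported tree only, while the working directory stays free of warnings.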

@RagnarGrootKoerkamp
Collaborator

Do people typically just zip the directory and upload that?

With BAPCtools we pretty much always run bt zip to get a problem.zip, and I think that already filters the top-level directory to only include known elements. (But it does include junk inside e.g. submissions/.)
For testcases it's already quite precise and finds all *.in files and then from those finds corresponding ans/out/... files.

Does the fact that some metadata files out of our control have ~ and $ in them imply that we'll accept testcases/submissions/programs that have those in their name? That seems silly.
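The testcase discovery described above (find every *.in file, then look for its companion files) could look roughly like the following. This is not BAPCtools' actual code; `find_testcases` and the extension list are assumptions for illustration.

```python
from pathlib import Path

# Extensions looked up next to each .in file; this list is an assumption.
COMPANION_EXTS = (".ans", ".out")


def find_testcases(data_dir: str) -> dict[str, list[str]]:
    """Map each testcase (path without .in, relative to data_dir) to the companion extensions found."""
    root = Path(data_dir)
    cases = {}
    for in_file in sorted(root.rglob("*.in")):
        # Strip only the trailing ".in" so names like "a.b.in" keep their inner dot.
        base = str(in_file)[: -len(".in")]
        companions = [ext for ext in COMPANION_EXTS if Path(base + ext).is_file()]
        cases[Path(base).relative_to(root).as_posix()] = companions
    return cases
```

Because discovery starts from the *.in files, stray files such as editor backups in the same directory are never picked up, which is why a permissive rule for metadata files need not imply accepting such names for testcases.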

extend rule for ignoring files and folders to all non-compliant names
@evouga
Collaborator Author

evouga commented Sep 21, 2025

Third attempt:

  • .dotfiles are disallowed again by the regex
  • expanded the rule for ignoring files and folders to all non-compliant names
  • added a sentence encouraging tooling to check for and warn about test cases/program files that accidentally include invalid characters
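The three points above can be sketched together in Python. Everything here is illustrative: the pattern is a hypothetical stand-in for the spec's regex, `classify` is an invented helper, and both the list of directories that trigger warnings and the choice to stay silent about dotfiles are assumptions.

```python
import re
from pathlib import PurePosixPath

# Hypothetical strict pattern without a leading '.' or '-'; not the spec's regex.
NAME_RE = re.compile(r"[A-Za-z0-9_][A-Za-z0-9._-]*")

# Directories where an ignored file is most likely a mistake (assumed list).
WARN_DIRS = ("data", "submissions")


def classify(paths: list[str]) -> tuple[list[str], list[str], list[str]]:
    """Split paths into (used, ignored, warnings) under the sketch rules."""
    used, ignored, warnings = [], [], []
    for path in paths:
        parts = PurePosixPath(path).parts
        if all(NAME_RE.fullmatch(part) for part in parts):
            used.append(path)
            continue
        # Any non-compliant component means the whole path is ignored.
        ignored.append(path)
        # Warn when the ignored file sits where test cases or programs live,
        # unless a dot-prefixed component suggests it was hidden on purpose.
        hidden = any(part.startswith(".") for part in parts)
        if parts[0] in WARN_DIRS and not hidden:
            warnings.append(path)
    return used, ignored, warnings
```

For example, `data/secret/big case.in` would be ignored and flagged with a warning, while `data/secret/.gitkeep` would be ignored silently.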

Successfully merging this pull request may close these issues.

Consider making file regex more permissive
6 participants