Add generated docs for dictionary repos #21

Open
snomos opened this issue Jan 9, 2025 · 4 comments

Comments

@snomos
Member

snomos commented Jan 9, 2025

Use GH Actions to generate info about:

  • number of entries
    • possibly also split on POS
  • number of translations
  • number of examples
  • authors

Maybe more.
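A minimal sketch of the counts listed above, assuming the GiellaLT dictionary XML layout where each entry is an `<e>` element, translations are `<t>` elements and example sentences are `<x>` elements. These element names are an assumption to check against the actual sources, and `count_dict_stats` is a hypothetical helper. Authors are left out, since they would more naturally come from the git history than from the XML.

```python
import xml.etree.ElementTree as ET


def count_dict_stats(xml_text: str) -> dict:
    """Count entries, translations and examples in one dictionary XML document.

    Assumes (hypothetically) <e> = entry, <t> = translation, <x> = example.
    """
    root = ET.fromstring(xml_text)
    return {
        # iter() walks the whole tree, so nesting depth does not matter;
        # it matches exact tag names, so <xt> is not counted as <x>
        "entries": sum(1 for _ in root.iter("e")),
        "translations": sum(1 for _ in root.iter("t")),
        "examples": sum(1 for _ in root.iter("x")),
    }


# Small hypothetical sample in the assumed format
SAMPLE = """<r>
  <e><lg><l pos="N">guolli</l></lg>
     <mg><tg xml:lang="nob"><t pos="N">fisk</t></tg>
         <xg><x>Guolli vuodjá.</x><xt>Fisken svømmer.</xt></xg></mg></e>
  <e><lg><l pos="V">boahtit</l></lg>
     <mg><tg xml:lang="nob"><t pos="V">komme</t><t pos="V">ankomme</t></tg></mg></e>
</r>"""

print(count_dict_stats(SAMPLE))
```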

@Phaqui
Contributor

Phaqui commented Jan 10, 2025

Should it run on every commit? If there are errors in the XML sources, should the script do its best, or report an error?
Where should the information be stored? In a stats.json file in the root, for example?
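One possible answer to both questions, sketched under assumptions: parse each source file independently, record files that fail to parse instead of aborting the whole run, and write the totals to a `stats.json`. The directory layout, the `<e>` entry element, the JSON shape and the helper names `collect_stats`/`write_stats` are all hypothetical.

```python
import json
import xml.etree.ElementTree as ET
from pathlib import Path


def collect_stats(src_dir: Path) -> dict:
    """Best effort: count <e> entries per file, listing files that fail to parse."""
    stats = {"entries": 0, "files": {}, "errors": {}}
    for path in sorted(src_dir.glob("*.xml")):
        try:
            root = ET.parse(path).getroot()
        except ET.ParseError as err:
            # Keep going: report the broken file instead of failing the whole run
            stats["errors"][path.name] = str(err)
            continue
        n = sum(1 for _ in root.iter("e"))
        stats["files"][path.name] = n
        stats["entries"] += n
    return stats


def write_stats(src_dir: Path, out_file: Path) -> dict:
    """Collect stats from src_dir and write them as pretty-printed UTF-8 JSON."""
    stats = collect_stats(src_dir)
    out_file.write_text(
        json.dumps(stats, indent=2, ensure_ascii=False) + "\n", encoding="utf-8"
    )
    return stats
```

In CI, one could call something like `write_stats(Path("src"), Path("stats.json"))` and fail the job only if `errors` is non-empty, so partial stats are still produced while broken files are flagged.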

@Phaqui
Contributor

Phaqui commented Jan 10, 2025

Okay, as I understand it now, it should be possible for a GitHub Action to commit to the repository on every push. However, that commit is just that, a new commit: users will have to pull after pushing to retrieve the commit that the GitHub Action created.

It's also possible to run the stats generation script on a schedule instead, e.g. every midnight, to cut down on the number of commits. Of course, that means the stats may lag behind the sources until the next run.

I wrote a script in Python (and also in Rust, to compare) which counts lemmas by POS, but we may want to discuss the details further before I add such an action.
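The core of a lemma-by-POS count can be sketched with `collections.Counter`, assuming that lemmas are `<l>` elements carrying a `pos` attribute. This is an illustration of the idea under that assumption, not the script referred to in this thread; `count_lemmas_by_pos` is a hypothetical name.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def count_lemmas_by_pos(xml_text: str) -> Counter:
    """Count <l> lemma elements grouped by their (assumed) pos attribute."""
    root = ET.fromstring(xml_text)
    # Lemmas without a pos attribute are grouped under "?" so they are not lost
    return Counter(lemma.get("pos", "?") for lemma in root.iter("l"))


# Small hypothetical sample in the assumed format
SAMPLE = """<r>
  <e><lg><l pos="N">guolli</l></lg></e>
  <e><lg><l pos="N">beana</l></lg></e>
  <e><lg><l pos="V">boahtit</l></lg></e>
  <e><lg><l>juoga</l></lg></e>
</r>"""

for pos, n in count_lemmas_by_pos(SAMPLE).most_common():
    print(pos, n)
```

Grouping missing `pos` values under a placeholder keeps the total equal to the number of lemmas, which makes it easy to spot entries that lack POS tagging.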

@Phaqui
Contributor

Phaqui commented Jan 13, 2025

I pushed the script to do this. It is giellalt/giella-core/dicts/scripts/gt_pos_counts.(py|rs).

@snomos
Member Author

snomos commented Jan 13, 2025

Thanks!

If we publish using a separate gh_pages branch (as is common), we can build the data as part of the CI process, and it will be committed to that branch only. That is all we need, and that is what we do right now for the maturity metadata in the lang-xxx repos.
