Use gettext for recurring phrases, fix a bunch of i18n issues!!!!! #684

johnzhou721 · 2025-06-28T22:57:21Z

Fixes #683, fixes #692, fixes #689 (3rd through plugin).

Notes about this PR

FINAL

REVISION July 9, 14:29 UTC

Deps

Once beeware/lektor-i18n-plugin#6 is merged, I will change the dependency from a local checkout to the released v0.5.5. That release adds some improvements, which are listed in its changelog.

Therefore DO NOT APPLY PREVIEW before I update that dep.

Fuzziness

Since the old translations split up sentences, and context-related mistakes were made in Simplified Chinese and Traditional Chinese, similar mistakes are likely in other languages as well. In addition, universally using ; and . (the English variants) probably introduced typographic errors in other scripts.

Therefore, after much monologue, Ctrl + U-ing, and contemplation, I've decided to mark anything satisfying any of the following as fuzzy:

GitHub blame shows that the phrase was changed in a PR that is not purely a translation PR. This raises the possibility of machine translation (not always, since there may be polyglots, but let's be on the safe side).
There are English proper nouns in a language that does not use Latin script -- spaces etc. might need tweaking.
There are HTML tags and/or %(placeholder)s. This shows that the string used to be split up, so translators probably didn't have much context back then. Again, some people like me tend to search through every single template to find where the strings are used, but to be on the safe side, I assume there are translators who are not as pedantic.
When I'm unsure about plural forms.
[EDIT] When older translated strings were missing punctuation and I added it manually. Different scripts have different punctuation, and the old templates assumed . or , or ;.
Whenever multiple strings are concatenated together.
Whenever I doubt my own copy + pasting, or I accidentally marked it fuzzy.

The only strings that are not fuzzy now should be simple phrases or complete sentences that were stored as a whole in the databags.

Let me know if you want all the strings I've added marked as fuzzy.
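Two of the criteria above (HTML tags and %(placeholder)s) can be checked mechanically. The following is only a rough sketch of such a check, not code from this PR; the function name and the regular expressions are my own assumptions, and the other criteria (blame history, plural doubts, and so on) still need a human.

```python
import re

# Hypothetical helper: the names and patterns are illustrative, not from the PR.
HTML_TAG = re.compile(r"</?[A-Za-z][^>]*>")
PLACEHOLDER = re.compile(r"%\([A-Za-z_][A-Za-z0-9_]*\)s")


def needs_fuzzy(msgid: str) -> bool:
    """Return True if msgid contains an HTML tag or a %(name)s placeholder."""
    return bool(HTML_TAG.search(msgid) or PLACEHOLDER.search(msgid))
```

A string such as by %(author)s; published %(date)s would be flagged, while a plain phrase such as Gold Member would not.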

Huge Diffs on PO / POT files

PO files used to have huge diffs because the gettext version on my macOS system did not match the one used on CI. I solved this by using a GNU/Linux virtual machine running Ubuntu 22.04, which ships the same gettext minor version as Ubuntu 24.04, which is what CI uses when updating translations. TL;DR: this has now been resolved.
POT files (still) have huge diffs because I switched to xgettext instead of msgcat for merging POT files. This avoids encoding issues while preserving full context without weird headers for source strings.
The English PO diff is caused by fixing a bug in which added strings didn't have a filled-in msgstr.

Changes to the headers of PO files

The plural rules were manually added, as they needed to be. This got documented in the Lektor i18n Plugin as well.
The Project-Id-Version field was manually added; otherwise the offline translation software I use to copy in the strings won't save the files. The Lektor i18n plugin has been updated to generate this field when initializing PO files.

A side effect of the second bullet point is a trivial inconsistency. When you initialize new PO files using that plugin now, all of them will have

+# Persian translations for BeeWare package.
+# Copyright (C) 2025 THE BeeWare'S COPYRIGHT HOLDER
+# This file is distributed under the same license as the BeeWare package.
+# Automatically generated, 2025.
 #

at the top. I haven't added this to the existing files, since it would take a lot of time and is purely cosmetic.

Why is pgettext used?

There are strings like , which is used as a list separator; Chinese uses 、 for that, and other conventions exist in other parts of the world. On its own, it's probably unclear what , is for.
_(...) seems to strip whitespace, which is problematic when we have stuff like _("Superpower: ") where there's a trailing space. Using pgettext seems to work around this, and the context can also tell translators that the space may be removed if their language does not require an additional space in these circumstances. Example: in Chinese, ： doesn't need a space after it.
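To illustrate why context helps, here is a toy lookup keyed by (context, msgid), which is roughly how a msgctxt distinguishes otherwise ambiguous strings. This is a sketch only: the catalog entries, the context wording, and the function name are made up for illustration and are not taken from the actual PO files.

```python
# Toy catalog keyed by (context, msgid). All entries are illustrative only.
ZH_CATALOG = {
    ("list separator", ", "): "、",
    ("label prefix; trailing space may be dropped", "Superpower: "): "超能力：",
}


def lookup_with_context(context: str, msgid: str, catalog: dict) -> str:
    """Return the translation for (context, msgid), or msgid if none exists."""
    return catalog.get((context, msgid), msgid)
```

With an empty catalog the original string, including its trailing space, comes back unchanged; with the toy Chinese catalog the separator becomes 、 and the label loses its trailing space.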

How did Labels Get Handled?

All labels are stored in a dict with a function to query it in templates/macros/labels.html. Some labels do not need translation, so they're stored as-is.

The file also includes the function for getting the list separator comma-space string.
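The real implementation is a Jinja macro file, but the idea can be sketched in Python. Everything below is hypothetical: the keys, the labels, the fallback to the key itself, and the `_` stand-in are assumptions for illustration, not the contents of labels.html.

```python
def _(message: str) -> str:
    """Stand-in for a gettext lookup; returns the message unchanged."""
    return message


# Hypothetical label table: keys and values are made up for illustration.
LABELS = {
    "event_type.talk": _("Talk"),  # translatable label
    "service.github": "GitHub",    # proper noun, stored as-is
}


def label(key: str) -> str:
    """Return the label for key, falling back to the key itself (an assumption)."""
    return LABELS.get(key, key)


def join_list(items: list[str], separator: str = ", ") -> str:
    """Join items with a (possibly translated) list separator."""
    return separator.join(items)
```

Keeping the separator as a parameter means a translated separator such as 、 can be passed in without changing the joining logic.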

How did I handle plurals?

There are a few strings requiring pluralization -- I've marked the sprint helping string and its plural as pluralized, and also Gold Member / Gold Members on the front page, just in case we get more. If a plural form previously existed for the ENTIRE string, I copy-pasted the regular form into the cases for <=1 and the plural form into all cases >1, then marked the entry fuzzy. If no plural existed but the ORIGINAL STRING existed ENTIRELY, I machine-translated the plural, pasted it into all cases >1, and then marked the entry fuzzy.
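For reference, this is how the English fallback behaves when no catalog is loaded, using Python's standard gettext module. It is a minimal sketch rather than the template code in this PR; the function name is an assumption.

```python
import gettext

# NullTranslations has no catalog, so ngettext falls back to the English rule:
# the singular form when n == 1, the plural form otherwise.
fallback = gettext.NullTranslations()


def member_label(n: int) -> str:
    """Pick 'Gold Member' or 'Gold Members' depending on the count n."""
    return fallback.ngettext("Gold Member", "Gold Members", n)
```

Languages with other plural rules get their forms from the Plural-Forms header of their PO file, which is why those headers had to be filled in.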

That said, maybe the "giving a talk" strings etc. also need plural forms; I'd need to work on that later (possibly in another PR), since this is taking too much time (6 days of continuous work).

What things did I manually Edit?

Traditional Chinese is missing a bunch of coverage, so I tried filling in some strings by converting them from the Simplified Chinese strings. However, I marked them as fuzzy, and everything I did gave the same result as Google Translate, so there is not much to worry about.

I modified some Simplified Chinese strings using all the new context that I was able to get.

For Arabic, the only change I made was to replace the semicolon in by AUTHOR; published DATE with the Arabic semicolon. I didn't change anything else, since I realized those things would be picked up in Weblate anyway. This string is also fuzzy.

For the former label.ini strings in Italian, capitalization is normalized (the first letter is capitalized when the other strings in the same category are). Those strings are all fuzzy.

For strings like {Silver, Gold, Platinum, Individual etc} Member: if the silver, gold, platinum, etc. strings existed in the databags but the word member didn't, a machine translation of the word Member is used and concatenated onto the silver, gold, platinum, etc. If those silver, gold, platinum etc. words don't exist but the word member does, no effort is made to complete the strings.

Misc Notes

I didn't touch the strings that overlap with the site content at all, except for Persian -- I just left them in their existing machine-translated state, marked fuzzy. I only touched the entirely new strings.

What I'm Working on Now

Double and Triple checking that everything got copied + pasted correctly.

Questions

Is the way I marked strings as fuzzy okay with you? Should we just mark everything resulting from this PR as fuzzy?
Should I credit the translators properly by putting the databag-translator's name and email address into the po file, if no one so far had translated it on Weblate? Is it a privacy concern to commit... email addresses and (potentially full) names??
In the lektorproject, the locale for pt_BR is br and not pt_BR. What is the purpose of that?
https://github.com/beeware/beeware.github.io/blob/lektor/BeeWare.lektorproject#L17-L20 (EDIT -- normalized by me already)

Tell me if I'm worrying about all of this way too much.

Easter Egg

Bee-fore I started this PR, the number of missing strings in Simplified Chinese in Weblate was 256 = 2^8. Now it's 254, since I translated some more in Weblate and resolved the conflicts.

PR Checklist:

All new features have been tested
All new features have been documented
I have read the CONTRIBUTING.md file
I will abide by the code of conduct

johnzhou721

Some notes. I'm really sweating right now after slaving away a solid 3 hours of my life that I'll never get back but am happy to give away... so need to step away from this for now, updating all the po files by hand is going to be hard, so @freakboy3742 or any adults w/ a credit card who signed up for a deepL key, maybe we should just deepL all the strings? Or I'll copy them in tomorrow or after dinner today.

johnzhou721 · 2025-06-28T23:04:23Z

BeeWare.lektorproject

@@ -83,7 +83,7 @@ locale = fa_IR
 lektor-github-repos = 0.1.1
 lektor-gravatar = 0.1.3
 lektor-markdown-admonition = 0.3.1
-git+https://github.com/beeware/lektor-i18n-plugin@v0.5.4 =
+../lektor-i18n-plugin =


This needs to be replaced with the new version after beeware/lektor-i18n-plugin#6 gets merged. That is just for extra safety; it seems to work fine without it, but you never know when the internal implementations of any of those programs will change...

../ is the local checkout I used to update the PO files.

johnzhou721 · 2025-06-28T23:05:37Z

babel.cfg

@@ -1,2 +1,3 @@
 [jinja2: **/templates/**.html]
 encoding = utf-8
+trimmed = True


See https://stackoverflow.com/questions/68868257/how-do-i-enable-the-trimmed-policy-in-jinja2.
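As I understand it, the trimmed policy collapses line breaks and the whitespace around them inside trans blocks before the message is extracted. The function below is only an approximation of that behaviour written for illustration; the name is mine, and it is not Jinja's own code.

```python
import re

# Matches a line break together with any whitespace on either side of it.
_LINEBREAK_WS = re.compile(r"\s*\n\s*")


def trim_message(text: str) -> str:
    """Strip the ends and replace each whitespace-wrapped line break with one space."""
    return _LINEBREAK_WS.sub(" ", text.strip())
```

This keeps msgids stable when a template is re-indented, since indentation inside the block no longer ends up in the extracted string.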

i18n/contents+ar.po

johnzhou721 · 2025-06-28T23:06:56Z

packages/lektor_beeware_plugin/lektor_beeware_plugin.py

@@ -9,6 +9,9 @@

 @pass_context
 def translate(context, string, bag_name="translate"):
+    if bag_name == 'translate':
+        raise RuntimeError("Use the new gettext system instead")


This is in place while I work on this PR. Eventually I will replace all the other bags as well.

johnzhou721 · 2025-06-28T23:07:12Z

templates/blog-post.html

@@ -8,16 +8,16 @@
  <div class="container">
    <p>{{ breadcrumbs(this) }}</p>
    <h1>{{ this.title }}</h1>
-    <p>{{ "posted_by"|trans }}
-        {% if this.mastodon_handle %}
+    {% set author_link %}


[sweating intensifies]

johnzhou721 · 2025-06-28T23:07:35Z

templates/event.html

+{% elif this.event_type == "keynote" %}
+  <p>{% trans title=this.title, url=this.url, talk_title=this.talk_title %}{{ speakers_list }} will be keynoting at {{ title }}, giving a presentation entitled "<a href="{{ url }}">{{ talk_title }}</a>".{% endtrans %}</p>
+{% elif this.event_type == "tutorial" %}
+  <p>{% trans title=this.title, url=this.url, talk_title=this.talk_title %}{{ speakers_list }} will be presenting a tutorial at {{ title }} entitled "<a href="{{ url }}">{{ talk_title }}</a>".{% endtrans %}</p>


sorry for the duplication here... but I realize I can't do away with it.

templates/project.html

johnzhou721 · 2025-06-29T00:05:54Z

@HalfWhitt I apologize that some code you wrote for translation bags might get deleted. The bird has flown away... (continuing the extended metaphor in the other thread)

johnzhou721 · 2025-06-29T01:44:09Z

Yikes... looks like if you're doing _(name) when name is a variable, it does not get extracted... this happens with the badge macro. Which means this PR must be reviewed extremely carefully for such things.
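The underlying reason is that extraction is static: the extractor reads the source and can only see string literals, not the value a variable will hold at render time. The toy extractor below shows the effect on plain Python source using the standard ast module. It is not Babel's extractor, and the sample strings are made up.

```python
import ast


def extract_literal_messages(source: str) -> list[str]:
    """Collect string literals passed directly to _(), skipping _(variable)."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "_"
            and node.args
            and isinstance(node.args[0], ast.Constant)
            and isinstance(node.args[0].value, str)
        ):
            found.append(node.args[0].value)
    return found


# Illustrative input: one literal call and one call with a variable.
SAMPLE = '''
name = "Gold Member"
a = _("Silver Member")
b = _(name)
'''
```

Running extract_literal_messages(SAMPLE) finds only the literal passed to the first call; the string behind name is never seen, so it has to be marked for extraction somewhere else.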

gettexting

johnzhou721 · 2025-06-29T01:47:22Z

FYI -- marked some arabic strings as fuzzy b/c cannot figure out where to put periods correctly... i'm using textedit to edit the po files directly, spent some time yak shaving to use emacs po-mode but realized I don't know how to use emacs...

johnzhou721 · 2025-06-29T02:09:38Z

Yikes. More strings not extracted out of event.html...

johnzhou721 · 2025-06-29T02:42:51Z

Isolating event.html into a separate directory and deleting random components shows that deleting

{% for slug in this.speaker %}
    {%
        do speaker_names.update(
            {slug: members.filter(F._slug == slug).first().name}
        )
    %}
{% endfor %}

will cause Babel to extract the strings properly.

johnzhou721 · 2025-06-29T14:13:52Z

Progress: python-babel/babel#1216 reports a simple MWE for this situation.

johnzhou721 · 2025-06-29T16:12:06Z

Given that this seems to be a bug, I'm going to work around it by separating the logic into a separate macro. To pass Python dicts around, I will put filters that convert to and from JSON in our plugin.
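A minimal sketch of what such a pair of filters could look like, using only the standard json module. The function names are hypothetical, and registering them as Jinja filters in the Lektor plugin is not shown here.

```python
import json


def to_json(value: dict) -> str:
    """Serialize a dict to a JSON string so it can be passed between macros."""
    return json.dumps(value, ensure_ascii=False, sort_keys=True)


def from_json(text: str) -> dict:
    """Parse a JSON string produced by to_json back into a dict."""
    return json.loads(text)
```

Using ensure_ascii=False keeps non-Latin names readable in the intermediate string, and the round trip returns an equal dict as long as the keys are strings.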

[sweats]

johnzhou721 · 2025-06-29T16:22:26Z

And... now the messages are extracted! Lektorbuilding to update the translation files and then committing.

johnzhou721 · 2025-06-30T02:13:09Z

FYI beeware/lektor-i18n-plugin#5 caused a huge diff in 42a8f3c because xgettext seems to format its output differently, but a cursory diff shows that this is actually good: the POT and PO files now have identical formatting, apart from the translated strings.

johnzhou721 · 2025-06-30T02:24:24Z

@freakboy3742 @HalfWhitt (latter -- since you started the freeform) I'd like some preliminary comments on this before I go copy all the strings into the po files and suggestions on how to work around the issue linked #684 (comment) ? 'cause this is going to produce huge, multithousand-line diffs.

Read the above comments though, if you have time, especially the last one. Thanks

johnzhou721 · 2025-06-30T02:58:29Z

Maybe CI is using Ubuntu 24.04 with gettext 0.21 while I'm using gettext 0.25 on macOS and we're hitting the test case over here at https://github.com/translate/translate/pull/5439/files which differentiates b/w <=0.23 and >0.23...

See: https://launchpad.net/ubuntu/noble/+package/gettext

OK... time to go yak shaving tomorrow to install gettext 0.21...

johnzhou721 · 2025-06-30T02:59:03Z

Hmm... maybe let me try my ubuntu 22.04 vm tomorrow. See how that works.

johnzhou721 · 2025-06-30T02:25:52Z

i18n/contents.pot


-#: https://beeware.org/ (content/contents+en.lr:button-block.label)


FYI beeware/lektor-i18n-plugin#5 caused a huge diff in 42a8f3c because xgettext seems to format its output differently, but a cursory diff shows that this is actually good: the POT and PO files now have identical formatting, apart from the translated strings.

Not in the specific hash but looking at these changes

johnzhou721 · 2025-06-30T15:23:37Z

Maybe CI is using Ubuntu 24.04 with gettext 0.21 while I'm using gettext 0.25 on macOS and we're hitting the test case over here at https://github.com/translate/translate/pull/5439/files which differentiates b/w <=0.23 and >0.23...

See: https://launchpad.net/ubuntu/noble/+package/gettext

OK... time to go yak shaving tomorrow to install gettext 0.21...

This is resolved now.

packages/lektor_beeware_plugin/pyproject.toml

mhsmith · 2025-07-09T14:14:41Z

You said this PR was finalized, but you've continued changing it almost every day. I'm switching it to draft status; please change it back once it's ready to review.

johnzhou721 · 2025-07-09T14:30:18Z

@mhsmith I apologize for this... but if you look at the latest diffs, they all come from merging lektor to get the latest Weblate changes into this branch. There are some version discrepancies that cause differences between how Weblate wraps the lines in the PO files and how my Ubuntu 24.04 machine wraps them... so sometimes I push new commits to reformat those files, so that further changes won't produce huge diffs (since rebuilding the website, which is almost always necessary, will rewrap the files).

Besides that, I apologize for a21b22d and 1324d11; they exist because an upstream "bug" I reported to Babel turned out to be caused by a missing plugin. After I got the reply, I removed the workaround.

johnzhou721 · 2025-07-21T15:26:30Z

@freakboy3742 I’d like a preliminary look at this PR and the i18n plugin one. Would you prefer if I mark all imported strings as fuzzy? Also: lock weblate before merging this PR and only after the Update Translations commit is finished — there will be conflicts if we don’t.

(Note the PO diffs on unrelated stuff are produced when I run lektor build — there’s a version mismatch for gettext, I can’t update to Ubuntu 24.04.)

thank you!

johnzhou721 · 2025-07-22T17:40:54Z

FWIW: I can't figure out why the zh_CN has this huge wrapping diff. It's formatted using Ubuntu 24.04 on GitHub codespaces; it matches the version used in CI. Maybe some of my plugin changes? The CI isn't re-wrapping it back, either.

johnzhou721 · 2025-07-22T18:21:08Z

Strangely, using the new PO file with the state of beeware.github.io before this PR triggers an update of the translations, but does not re-wrap the strings back.

What the heck is even going on here? I am totally confused as to why both Ubuntu 22.04 and 24.04 rewrap the strings from how Weblate wraps them... but this is not happening on the main branch! And copying the new PO file onto the main branch and updating translations in CI on my fork does not wrap them back!

johnzhou721 added 4 commits June 28, 2025 17:17

switch to gettext

331ee2b

verify that all |trans statements without bag specification are replaced

86d0e9f

translate most chinese strings

af41d4f

more context

d9499db

johnzhou721 commented Jun 28, 2025

Update templates/project.html

13540dd

start filling in arabic translations, also correct badge macro

e6b1e27

gettexting

work done today... still no fix for missing strings

2936cf9

johnzhou721 added 6 commits June 29, 2025 11:23

workaround a translaiton bug

e6ba812

restore pluralization

88d637a

update the plugin a bit

42a8f3c

add plural forms

5dc5542

plural forms (filled in missing ones, need the sprinting mentor part)

69ac875

add comment

07ed969

use an older gettext on a gnu/linux vm

84e158c

johnzhou721 commented Jun 30, 2025

johnzhou721 marked this pull request as ready for review June 30, 2025 15:23

johnzhou721 and others added 7 commits July 2, 2025 21:31

Update i18n/contents+fa.po

962af73

Update i18n/contents+fa.po

bafc25e

Update i18n/contents+fa.po

9e9e1fd

Update i18n/contents+fa.po

4052723

Merge remote-tracking branch 'upstream/lektor' into jinjai18n

b7c3fcd

Merge remote-tracking branch 'upstream/lektor' into jinjai18n

8f316f2

Merge remote-tracking branch 'upstream' into jinjai18n

79edb56

johnzhou721 commented Jul 5, 2025

packages/lektor_beeware_plugin/pyproject.toml (outdated)

johnzhou721 and others added 7 commits July 5, 2025 14:51

Update packages/lektor_beeware_plugin/pyproject.toml

2ae5660

Remove workaround

a21b22d

fix a bug, rebuild to reformat

1324d11

Merge remote-tracking branch 'upstream/lektor' into jinjai18n

639c7ad

Merge remote-tracking branch 'upstream/lektor' into jinjai18n

74239ef

linebreak using my gettext version conventions to prvent further big …

86eccb3

…diffs

Merge remote-tracking branch 'upstream/lektor' into jinjai18n, msgcat…

08ecfb0

… to reformat

mhsmith marked this pull request as draft July 9, 2025 14:14

johnzhou721 marked this pull request as ready for review July 9, 2025 14:30

johnzhou721 added 3 commits July 22, 2025 16:33

Merge remote-tracking branch 'upstream/lektor' into jinjai18n

919fb2a

Point at my own repo for lektor i18n plugin for simplified testing, r…

22ae47a

…esolve a missed conflict

reformat on ubuntu 24.04

9676c43

johnzhou721 and others added 2 commits July 22, 2025 14:04

fix beeware#692

ca9fdbc

fixup

0ca5940

johnzhou721 changed the title ~~Use gettext for recurring phrases!!!!!~~ Use gettext for recurring phrases, fix a bunch of i18n issues!!!!! Jul 22, 2025

johnzhou721 added 2 commits July 22, 2025 23:04

more fixups, update pos

895816b

Merge branch 'lektor' into jinjai18n

ef8ac8b

