ICU-23254 Remove C++ static initialization #3766

grhoten · 2025-11-05T06:32:33Z

We should not be using C++ static initialization. It can lengthen the time to load the ICU library. This can be an issue when frequently launching a program on some platforms, and only a small part of the ICU library is being used.

The -Wglobal-constructors -Wexit-time-destructors clang options can be used to prevent C++ static initialization from coming back.

Some of the C++ initialization was moved to a local context, since the static context added no value.

Checklist

Required: Issue filed: ICU-23254
Required: The PR title must be prefixed with a JIRA Issue number. Example: "ICU-NNNNN Fix xyz"
Required: Each commit message must be prefixed with a JIRA Issue number. Example: "ICU-NNNNN Fix xyz"
Issue accepted (done by Technical Committee after discussion)
Tests included, if applicable
API docs and/or User Guide docs changed or added, if applicable
Approver: Feel free to merge on my behalf

grhoten · 2025-11-05T06:33:11Z

.github/workflows/icu4c.yml

      - name: ICU4C with clang on MacOS
        env:
-          CPPFLAGS: '-Wall -Wextra -Wextra-semi -Wundef -Wnon-virtual-dtor -Wctad-maybe-unsupported -Werror'
+          CPPFLAGS: '-Wall -Wextra -Wextra-semi -Wundef -Wnon-virtual-dtor -Wctad-maybe-unsupported -Wglobal-constructors -Wexit-time-destructors -Werror'


This prevents the C++ static initialization from coming back.

icu4c/source/extra/scrptrun/scrptrun.cpp

grhoten · 2025-11-05T06:36:12Z

icu4c/source/test/intltest/numberformattesttuple.cpp

-    FIELD_INIT(parseCaseSensitive, &gIntOps),
-    FIELD_INIT(outputCurrency, &gStrOps)
-};
-


All moved to the object constructor.

grhoten · 2025-11-05T06:37:19Z

icu4c/source/test/intltest/transtst.cpp

+        expect(UnicodeString("Test case #") + i,
+               UnicodeString(BEGIN_END_TEST_CASES[i][0], -1, US_INV),
+               UnicodeString(BEGIN_END_TEST_CASES[i][1], -1, US_INV),
+               UnicodeString(BEGIN_END_TEST_CASES[i][2], -1, US_INV));


Restructuring this actually makes it much easier to read.

grhoten · 2025-11-05T06:38:43Z

icu4c/source/tools/genrb/parse.cpp

+    {"process(uca_rules)", u"process(uca_rules)", parseUCARules},
+    {"process(collation)", u"process(collation)", nullptr /* not implemented yet */},
+    {"process(transliterator)", u"process(transliterator)", parseTransliterator},
+    {"process(dependency)", u"process(dependency)", parseDependency},


Now that C++11 is required, the extra string initialization step is no longer required.

grhoten · 2025-11-05T06:39:44Z

icu4c/source/tools/genrb/genrb.cpp

 static     UBool       write_xliff = false;
 static     const char* outputEnc ="";

-static ResFile poolBundle;


This was moved to the main function in genrb.cpp.

grhoten · 2025-11-05T06:40:10Z

icu4c/source/tools/escapesrc/escapesrc.cpp

 # define cp1047_to_8859(c) cp1047_8859_1[c]

-// Our app's name
-std::string prog;


This was moved to the main function.

aheninger · 2025-11-06T00:32:19Z

Getting static initialization out of the ICU library code is good/needed. But is it really necessary to remove it all from the tests and tools? Use there isn't going to affect the ICU libraries themselves.

And we should maintain some level of testing that application use of static ICU objects works and doesn't cause anything weird to happen with loading and cleanup. We had it, somewhat accidentally, through the incidental use of statics in the test code.

grhoten · 2025-11-06T01:06:26Z

@aheninger You make a fair point. I'm mostly concerned about the common and i18n libraries themselves. I didn't quickly see an obvious way to target only the CPP flags for only the library code. I'm sure if I looked deeper that I'd find one.

With that being said, here are some additional observations.

NumberFormatTestTuple was really awkward. The current implementation is much cleaner, and it seems more portable. That wasn't testing ICU.
I think that transtst.cpp is much cleaner to read now. There's less jumping around and keeping fields in sync.
The tools changes aren't testing ICU. Some of the genrb changes are easier to read due to the ability to use u"" now.
The changes to intltest.cpp, and rbbitst.cpp aren't really adding value to the readability nor functionality. I agree that these changes are not adding value beyond preventing C++ static initialization coming back.
The C++ static initialization in loctest.cpp has a local scope. It doesn't look like intentional testing. They aren't explicitly testing C++ static initialization. If you feel that it needs to be explicitly tested, maybe there should be a dedicated test for it.
I don't have a vested interest to see ScriptRun change beyond preventing C++ static initialization coming back.

If we have non-accidental (intentional) C++ static initialization in the tests only, I can try to work with that.

grhoten · 2025-11-06T19:24:26Z

@aheninger Today's ICU-TC discussed this pull request. @eggrobin mentioned his support for enforcing this for the test suite too based on his experience with previous debugging issues in this area. So we will move forward with that TC choice.

grhoten · 2025-11-06T19:57:02Z

icu4c/source/extra/scrptrun/scrptrun.cpp

-const int32_t ScriptRun::pairedCharExtra = pairedCharCount - pairedCharPower;
-
-int8_t ScriptRun::highBit(int32_t value)
+static constexpr int8_t highBit(int32_t value)


@markusicu I tried to minimize the changes to this file, but this one time use function really should be switched to a constexpr function (C++11 syntax) to avoid the C++ static initialization. The constexpr functions are hard to separate between definition and declaration.

richgillam

LOKTM

.gitignore

icu4c/source/i18n/smpdtfmt.cpp

icu4c/source/i18n/transreg.h

icu4c/source/i18n/udatpg.cpp

icu4c/source/tools/genrb/reslist.cpp

icu4c/source/tools/genrb/reslist.h

jira-pull-request-webhook · 2025-11-11T09:04:26Z

Notice: the branch changed across the force-push!

.github/workflows/icu4c.yml is different
.gitignore is no longer changed in the branch
icu4c/source/i18n/smpdtfmt.cpp is different
icu4c/source/i18n/transreg.cpp is different
icu4c/source/test/intltest/caltest.cpp is different
icu4c/source/tools/genrb/reslist.cpp is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

grhoten · 2025-11-11T09:10:01Z

@roubert Please take look at the latest changes.

markusicu · 2025-11-11T17:22:05Z

Hi everyone, I would really like to review this PR before it goes in, but I am swamped. I promise to get to it soon after UTW...

icu4c/source/i18n/smpdtfmt.cpp

icu4c/source/i18n/udatpg.cpp

Done.

icu4c/source/test/fuzzer/fuzzer_utils.h

markusicu

this is a lot of work...
thank you!!
sorry for the late review
i do have some comments...

markusicu · 2025-11-21T23:13:04Z

icu4c/source/i18n/smpdtfmt.cpp

    case UDAT_YEAR_FIELD:
    case UDAT_YEAR_WOY_FIELD:
-        if (fDateOverride.compare(hebr)==0 && value>HEBREW_CAL_CUR_MILLENIUM_START_YEAR && value<HEBREW_CAL_CUR_MILLENIUM_END_YEAR) {
+        if (fDateOverride.compare(HEBREW_CALENDAR_VALUE, HEBREW_CALENDAR_VALUE_LEN)==0 && value>HEBREW_CAL_CUR_MILLENIUM_START_YEAR && value<HEBREW_CAL_CUR_MILLENIUM_END_YEAR) {


Suggested change

if (fDateOverride.compare(HEBREW_CALENDAR_VALUE, HEBREW_CALENDAR_VALUE_LEN)==0 && value>HEBREW_CAL_CUR_MILLENIUM_START_YEAR && value<HEBREW_CAL_CUR_MILLENIUM_END_YEAR) {

if (fDateOverride == u"hebr" && value>HEBREW_CAL_CUR_MILLENIUM_START_YEAR && value<HEBREW_CAL_CUR_MILLENIUM_END_YEAR) {

and don't add the new constants above

If this was meant to be time sensitive, I was trying to avoid doing a strlen. I'd rather go with Roubert's suggestion on these lines. I was trying to avoid the construction of u16string_view and a string length calculation from that.

We can expect the compiler to do all of this at compile time. No runtime overhead from UnicodeString::operator==(string literal).

I also had to get used to the fact that we are no longer dealing with the C++ of our college days...

I have switched it to be operator== instead. I have left it as a named constexpr, since I consider that to be best practice, and it's used in multiple locations.

markusicu · 2025-11-21T23:13:37Z

icu4c/source/i18n/smpdtfmt.cpp

        // century, for parsed strings from "00" to "99".  Any other string
        // is treated literally:  "2250", "-1", "1", "002".
-        if (fDateOverride.compare(hebr)==0 && value < 1000) {
+        if (fDateOverride.compare(HEBREW_CALENDAR_VALUE, HEBREW_CALENDAR_VALUE_LEN)==0 && value < 1000) {


markusicu · 2025-11-21T23:13:44Z

icu4c/source/i18n/smpdtfmt.cpp

    case UDAT_YEAR_WOY_FIELD:
        // Comment is the same as for UDAT_Year_FIELDs - look above
-        if (fDateOverride.compare(hebr)==0 && value < 1000) {
+        if (fDateOverride.compare(HEBREW_CALENDAR_VALUE, HEBREW_CALENDAR_VALUE_LEN)==0 && value < 1000) {


icu4c/source/i18n/transreg.cpp

icu4c/source/i18n/udatpg.cpp

markusicu · 2025-11-21T23:26:29Z

icu4c/source/test/intltest/intltest.cpp

 // Static list of errors found
-static UnicodeString errorList;
+static UnicodeString& getErrorList() {
+    static UnicodeString* errorList = new UnicodeString();


I am surprised that we don't get a memory leak complaint...

I guess that if the tests pass we don't call this. If they fail, we should get both the test failure and a leak report.

If there is no error, there is no error list to show. If there was a failure, I'm sure that this UnicodeString would have been allocated some memory, just as before.

Since the CI checks aren't complaining, I don't consider this to be an issue.

markusicu · 2025-11-21T23:31:15Z

icu4c/source/test/intltest/rbbitst.cpp

 constexpr RandomNumberGenerator::result_type defaultSeed = std::ranlux48_base::default_seed;
-static RandomNumberGenerator randomNumberGenerator;
+static RandomNumberGenerator& getDefaultRandomNumberGenerator() {
+    static RandomNumberGenerator *randomNumberGenerator = new RandomNumberGenerator();


memory leak?

What I mean with these questions:

I was convinced that we had instrumented our test suites so that they call u_cleanup() in the end, and when we run them in valgrind (or similar), and some memory hasn't been released, we get a memory leak failure. So while this kind of code is fine in a regular program (all goes "poof" when it exits), I would expect this to fail what I thought we had as a test setup.

Well, what do you know!? You're right, and the test isn't failing as expected. I'll rethink this one.

==15900== ==15900== HEAP SUMMARY: ==15900== in use at exit: 120 bytes in 1 blocks ==15900== total heap usage: 9,985,177 allocs, 9,985,176 frees, 2,062,279,131 bytes allocated ==15900== ==15900== 120 bytes in 1 blocks are still reachable in loss record 1 of 1 ==15900== at 0x4849013: operator new(unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so) ==15900== by 0x55533F: getDefaultRandomNumberGenerator (rbbitst.cpp:1991) ==15900== by 0x55533F: RBBITest::RunMonkey(icu::BreakIterator*, RBBIMonkeyKind&, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, long, signed char, _IO_FILE*, signed char) (rbbitst.cpp:3068) ==15900== by 0x52EF0F: RBBITest::TestMonkey() (rbbitst.cpp:2958) ==15900== by 0x304CC6: IntlTest::runTestLoop(char*, char*, char*) (intltest.cpp:809) ==15900== by 0x3048D3: IntlTest::runTest(char*, char*, char*) (intltest.cpp:730) ==15900== by 0x30477D: IntlTest::callTest(IntlTest&, char*) (intltest.cpp:631) ==15900== by 0x51FBB0: IntlTestRBBI::runIndexedTest(int, signed char, char const*&, char*) (itrbbi.cpp:36) ==15900== by 0x304CC6: IntlTest::runTestLoop(char*, char*, char*) (intltest.cpp:809) ==15900== by 0x3048D3: IntlTest::runTest(char*, char*, char*) (intltest.cpp:730) ==15900== by 0x30477D: IntlTest::callTest(IntlTest&, char*) (intltest.cpp:631) ==15900== by 0x316CEC: MajorTestLevel::runIndexedTest(int, signed char, char const*&, char*) (itmajor.cpp:118) ==15900== by 0x304CC6: IntlTest::runTestLoop(char*, char*, char*) (intltest.cpp:809)

FYI you need to use something like --errors-for-leak-kinds=definite or a variation of that to make valgrind fail.

On a related note, the udbg_knownIssue_open is causing memory to still be reachable according to one of the CI checks. My changes don't touch that code. Is that working as expected, or did you want an issue submitted for that?

On a related note, the udbg_knownIssue_open is causing memory to still be reachable according to one of the CI checks. My changes don't touch that code. Is that working as expected, or did you want an issue submitted for that?

Please create a ticket.
We should probably have an xxx_cleanup() function for test-only libraries that intltest & cintltst then need to call.

Submitted as ICU-23275

markusicu · 2025-11-21T23:37:10Z

icu4c/source/tools/genrb/parse.cpp

-
-U_STRING_DECL(k_type_string,    "string",    6);
-U_STRING_DECL(k_type_binary,    "binary",    6);
 U_STRING_DECL(k_type_bin,       "bin",       3);


why keep a few of these?

Because they're not used in gResourceTypes, whose variable references caused the C++ static initialization. These are aliases that are used elsewhere. I was trying to minimize the changes to make it easier to review.

icu4c/source/tools/genrb/genrb.cpp

icu4c/source/tools/genrb/reslist.h

jira-pull-request-webhook · 2025-11-22T07:13:46Z

Notice: the branch changed across the force-push!

icu4c/source/i18n/dtptngen.cpp is different
icu4c/source/i18n/number_mapper.cpp is no longer changed in the branch
icu4c/source/i18n/udatpg.cpp is different
icu4c/source/i18n/unicode/dtptngen.h is now changed in the branch
icu4c/source/test/intltest/rbbitst.cpp is different
icu4c/source/tools/genrb/genrb.cpp is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

icu4c/source/i18n/dtptngen.cpp

grhoten · 2025-11-22T14:38:04Z

@markusicu Please take another look at these changes.

markusicu · 2025-11-24T20:53:16Z

@grhoten I tried to look for further changes in context, but I see that you squashed prematurely... Please remember not to squash until after approval!

markusicu

lgtm except for the string literals in smpdtfmt.cpp

grhoten · 2025-11-25T04:25:00Z

@grhoten I tried to look for further changes in context, but I see that you squashed prematurely... Please remember not to squash until after approval!

Sorry about that. I thought I had addressed your comments. Ideally, this extra manual squash step should be removed, and the GitHub settings of this repository should be switched from "Rebase and merge" to "Squash and merge". It really simplifies the process, and the commit check could be removed.

markusicu

Ideally, this extra manual squash step should be removed, and the GitHub settings of this repository should be switched from "Rebase and merge" to "Squash and merge". It really simplifies the process, and the commit check could be removed.

Except it doesn't work for a project like ICU. We want every change to come with a Jira ticket, and we are enforcing it with a CI check that looks for an open ticket.

Squash-and-merge would go around that, because it lets you edit the commit message, without another CI check.

This is much less annoying than it was at first: PR approval is no longer dismissed when you truly squash and not change any file contents. So we can build up lots of commits and comments, get a PR approved, then squash -- doable via GitHub UI -- and let it rattle through the CI checks once more, some of which we need anyway for checking the PR title and commit message.

markusicu · 2025-11-25T15:00:48Z

icu4c/source/i18n/smpdtfmt.cpp

    case UDAT_YEAR_FIELD:
    case UDAT_YEAR_WOY_FIELD:
-        if (fDateOverride.compare(hebr)==0 && value>HEBREW_CAL_CUR_MILLENIUM_START_YEAR && value<HEBREW_CAL_CUR_MILLENIUM_END_YEAR) {
+        if (fDateOverride == HEBREW_CALENDAR_VALUE && value>HEBREW_CAL_CUR_MILLENIUM_START_YEAR && value<HEBREW_CAL_CUR_MILLENIUM_END_YEAR) {


FYI -- Thanks, this looks ok.
Note that for a named string constant like this you could also have a constexpr std::u16string_view. Then there would be no doubt that the compiler computes the length.

jira-pull-request-webhook · 2025-11-26T06:22:27Z

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

grhoten commented Nov 5, 2025

View reviewed changes

grhoten force-pushed the constructors branch from 1fc134c to c1f7e9c Compare November 5, 2025 06:58