Skip to content

Conversation

@grhoten
Copy link
Member

@grhoten grhoten commented Nov 5, 2025

We should not be using C++ static initialization. It can lengthen the time to load the ICU library. This can be an issue when frequently launching a program on some platforms, and only a small part of the ICU library is being used.

The -Wglobal-constructors -Wexit-time-destructors clang options can be used to prevent C++ static initialization from coming back.

Some of the C++ initialization was moved to a local context, since the static context added no value.

Checklist

  • Required: Issue filed: ICU-23254
  • Required: The PR title must be prefixed with a JIRA Issue number. Example: "ICU-NNNNN Fix xyz"
  • Required: Each commit message must be prefixed with a JIRA Issue number. Example: "ICU-NNNNN Fix xyz"
  • Issue accepted (done by Technical Committee after discussion)
  • Tests included, if applicable
  • API docs and/or User Guide docs changed or added, if applicable
  • Approver: Feel free to merge on my behalf

- name: ICU4C with clang on MacOS
env:
CPPFLAGS: '-Wall -Wextra -Wextra-semi -Wundef -Wnon-virtual-dtor -Wctad-maybe-unsupported -Werror'
CPPFLAGS: '-Wall -Wextra -Wextra-semi -Wundef -Wnon-virtual-dtor -Wctad-maybe-unsupported -Wglobal-constructors -Wexit-time-destructors -Werror'
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This prevents the C++ static initialization from coming back.

FIELD_INIT(parseCaseSensitive, &gIntOps),
FIELD_INIT(outputCurrency, &gStrOps)
};

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All moved to the object constructor.

expect(UnicodeString("Test case #") + i,
UnicodeString(BEGIN_END_TEST_CASES[i][0], -1, US_INV),
UnicodeString(BEGIN_END_TEST_CASES[i][1], -1, US_INV),
UnicodeString(BEGIN_END_TEST_CASES[i][2], -1, US_INV));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Restructuring this actually makes it much easier to read.

{"process(uca_rules)", u"process(uca_rules)", parseUCARules},
{"process(collation)", u"process(collation)", nullptr /* not implemented yet */},
{"process(transliterator)", u"process(transliterator)", parseTransliterator},
{"process(dependency)", u"process(dependency)", parseDependency},
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that C++11 is required, the extra string initialization step is no longer required.

static UBool write_xliff = false;
static const char* outputEnc ="";

static ResFile poolBundle;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was moved to the main function in genrb.cpp.

# define cp1047_to_8859(c) cp1047_8859_1[c]

// Our app's name
std::string prog;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was moved to the main function.

@aheninger
Copy link
Contributor

Getting static initialization out of the ICU library code is good/needed. But is it really necessary to remove it all from the tests and tools? Use there isn't going to affect the ICU libraries themselves.

And we should maintain some level of testing that application use of static ICU objects works and doesn't cause anything weird to happen with loading and cleanup. We had it, somewhat accidentally, through the incidental use of statics in the test code.

@grhoten
Copy link
Member Author

grhoten commented Nov 6, 2025

@aheninger You make a fair point. I'm mostly concerned about the common and i18n libraries themselves. I didn't quickly see an obvious way to target only the CPP flags for only the library code. I'm sure if I looked deeper that I'd find one.

With that being said, here are some additional observations.

  • NumberFormatTestTuple was really awkward. The current implementation is much cleaner, and it seems more portable. That wasn't testing ICU.
  • I think that transtst.cpp is much cleaner to read now. There's less jumping around and keeping fields in sync.
  • The tools changes aren't testing ICU. Some of the genrb changes are easier to read due to the ability to use u"" now.
  • The changes to intltest.cpp, and rbbitst.cpp aren't really adding value to the readability nor functionality. I agree that these changes are not adding value beyond preventing C++ static initialization coming back.
  • The C++ static initialization in loctest.cpp has a local scope. It doesn't look like intentional testing. They aren't explicitly testing C++ static initialization. If you feel that it needs to be explicitly tested, maybe there should be a dedicated test for it.
  • I don't have a vested interest to see ScriptRun change beyond preventing C++ static initialization coming back.

If we have non-accidental (intentional) C++ static initialization in the tests only, I can try to work with that.

@grhoten
Copy link
Member Author

grhoten commented Nov 6, 2025

@aheninger Today's ICU-TC discussed this pull request. @eggrobin mentioned his support for enforcing this for the test suite too based on his experience with previous debugging issues in this area. So we will move forward with that TC choice.

const int32_t ScriptRun::pairedCharExtra = pairedCharCount - pairedCharPower;

int8_t ScriptRun::highBit(int32_t value)
static constexpr int8_t highBit(int32_t value)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@markusicu I tried to minimize the changes to this file, but this one time use function really should be switched to a constexpr function (C++11 syntax) to avoid the C++ static initialization. The constexpr functions are hard to separate between definition and declaration.

richgillam
richgillam previously approved these changes Nov 10, 2025
Copy link
Contributor

@richgillam richgillam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LOKTM

@jira-pull-request-webhook
Copy link

Notice: the branch changed across the force-push!

  • .github/workflows/icu4c.yml is different
  • .gitignore is no longer changed in the branch
  • icu4c/source/i18n/smpdtfmt.cpp is different
  • icu4c/source/i18n/transreg.cpp is different
  • icu4c/source/test/intltest/caltest.cpp is different
  • icu4c/source/tools/genrb/reslist.cpp is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@grhoten
Copy link
Member Author

grhoten commented Nov 11, 2025

@roubert Please take look at the latest changes.

@markusicu
Copy link
Member

Hi everyone, I would really like to review this PR before it goes in, but I am swamped. I promise to get to it soon after UTW...

Copy link
Member

@markusicu markusicu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. this is a lot of work...
  2. thank you!!
  3. sorry for the late review
  4. i do have some comments...

case UDAT_YEAR_FIELD:
case UDAT_YEAR_WOY_FIELD:
if (fDateOverride.compare(hebr)==0 && value>HEBREW_CAL_CUR_MILLENIUM_START_YEAR && value<HEBREW_CAL_CUR_MILLENIUM_END_YEAR) {
if (fDateOverride.compare(HEBREW_CALENDAR_VALUE, HEBREW_CALENDAR_VALUE_LEN)==0 && value>HEBREW_CAL_CUR_MILLENIUM_START_YEAR && value<HEBREW_CAL_CUR_MILLENIUM_END_YEAR) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (fDateOverride.compare(HEBREW_CALENDAR_VALUE, HEBREW_CALENDAR_VALUE_LEN)==0 && value>HEBREW_CAL_CUR_MILLENIUM_START_YEAR && value<HEBREW_CAL_CUR_MILLENIUM_END_YEAR) {
if (fDateOverride == u"hebr" && value>HEBREW_CAL_CUR_MILLENIUM_START_YEAR && value<HEBREW_CAL_CUR_MILLENIUM_END_YEAR) {

and don't add the new constants above

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this was meant to be time sensitive, I was trying to avoid doing a strlen. I'd rather go with Roubert's suggestion on these lines. I was trying to avoid the construction of u16string_view and a string length calculation from that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can expect the compiler to do all of this at compile time. No runtime overhead from UnicodeString::operator==(string literal).

I also had to get used to the fact that we are no longer dealing with the C++ of our college days...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have switched it to be operator== instead. I have left it as a named constexpr, since I consider that to be best practice, and it's used in multiple locations.

// century, for parsed strings from "00" to "99". Any other string
// is treated literally: "2250", "-1", "1", "002".
if (fDateOverride.compare(hebr)==0 && value < 1000) {
if (fDateOverride.compare(HEBREW_CALENDAR_VALUE, HEBREW_CALENDAR_VALUE_LEN)==0 && value < 1000) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

case UDAT_YEAR_WOY_FIELD:
// Comment is the same as for UDAT_Year_FIELDs - look above
if (fDateOverride.compare(hebr)==0 && value < 1000) {
if (fDateOverride.compare(HEBREW_CALENDAR_VALUE, HEBREW_CALENDAR_VALUE_LEN)==0 && value < 1000) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

// Static list of errors found
static UnicodeString errorList;
static UnicodeString& getErrorList() {
static UnicodeString* errorList = new UnicodeString();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am surprised that we don't get a memory leak complaint...

I guess that if the tests pass we don't call this. If they fail, we should get both the test failure and a leak report.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is no error, there is no error list to show. If there was a failure, I'm sure that this UnicodeString would have been allocated some memory, just as before.

Since the CI checks aren't complaining, I don't consider this to be an issue.

constexpr RandomNumberGenerator::result_type defaultSeed = std::ranlux48_base::default_seed;
static RandomNumberGenerator randomNumberGenerator;
static RandomNumberGenerator& getDefaultRandomNumberGenerator() {
static RandomNumberGenerator *randomNumberGenerator = new RandomNumberGenerator();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

memory leak?

What I mean with these questions:

I was convinced that we had instrumented our test suites so that they call u_cleanup() in the end, and when we run them in valgrind (or similar), and some memory hasn't been released, we get a memory leak failure. So while this kind of code is fine in a regular program (all goes "poof" when it exits), I would expect this to fail what I thought we had as a test setup.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, what do you know!? You're right, and the test isn't failing as expected. I'll rethink this one.

==15900== 
==15900== HEAP SUMMARY:
==15900==     in use at exit: 120 bytes in 1 blocks
==15900==   total heap usage: 9,985,177 allocs, 9,985,176 frees, 2,062,279,131 bytes allocated
==15900== 
==15900== 120 bytes in 1 blocks are still reachable in loss record 1 of 1
==15900==    at 0x4849013: operator new(unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==15900==    by 0x55533F: getDefaultRandomNumberGenerator (rbbitst.cpp:1991)
==15900==    by 0x55533F: RBBITest::RunMonkey(icu::BreakIterator*, RBBIMonkeyKind&, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, long, signed char, _IO_FILE*, signed char) (rbbitst.cpp:3068)
==15900==    by 0x52EF0F: RBBITest::TestMonkey() (rbbitst.cpp:2958)
==15900==    by 0x304CC6: IntlTest::runTestLoop(char*, char*, char*) (intltest.cpp:809)
==15900==    by 0x3048D3: IntlTest::runTest(char*, char*, char*) (intltest.cpp:730)
==15900==    by 0x30477D: IntlTest::callTest(IntlTest&, char*) (intltest.cpp:631)
==15900==    by 0x51FBB0: IntlTestRBBI::runIndexedTest(int, signed char, char const*&, char*) (itrbbi.cpp:36)
==15900==    by 0x304CC6: IntlTest::runTestLoop(char*, char*, char*) (intltest.cpp:809)
==15900==    by 0x3048D3: IntlTest::runTest(char*, char*, char*) (intltest.cpp:730)
==15900==    by 0x30477D: IntlTest::callTest(IntlTest&, char*) (intltest.cpp:631)
==15900==    by 0x316CEC: MajorTestLevel::runIndexedTest(int, signed char, char const*&, char*) (itmajor.cpp:118)
==15900==    by 0x304CC6: IntlTest::runTestLoop(char*, char*, char*) (intltest.cpp:809)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI you need to use something like --errors-for-leak-kinds=definite or a variation of that to make valgrind fail.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On a related note, the udbg_knownIssue_open is causing memory to still be reachable according to one of the CI checks. My changes don't touch that code. Is that working as expected, or did you want an issue submitted for that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On a related note, the udbg_knownIssue_open is causing memory to still be reachable according to one of the CI checks. My changes don't touch that code. Is that working as expected, or did you want an issue submitted for that?

Please create a ticket.
We should probably have an xxx_cleanup() function for test-only libraries that intltest & cintltst then need to call.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Submitted as ICU-23275


U_STRING_DECL(k_type_string, "string", 6);
U_STRING_DECL(k_type_binary, "binary", 6);
U_STRING_DECL(k_type_bin, "bin", 3);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why keep a few of these?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because they're not used in gResourceTypes, whose variable references caused the C++ static initialization. These are aliases that are used elsewhere. I was trying to minimize the changes to make it easier to review.

@jira-pull-request-webhook
Copy link

Notice: the branch changed across the force-push!

  • icu4c/source/i18n/dtptngen.cpp is different
  • icu4c/source/i18n/number_mapper.cpp is no longer changed in the branch
  • icu4c/source/i18n/udatpg.cpp is different
  • icu4c/source/i18n/unicode/dtptngen.h is now changed in the branch
  • icu4c/source/test/intltest/rbbitst.cpp is different
  • icu4c/source/tools/genrb/genrb.cpp is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@grhoten
Copy link
Member Author

grhoten commented Nov 22, 2025

@markusicu Please take another look at these changes.

@markusicu
Copy link
Member

@grhoten I tried to look for further changes in context, but I see that you squashed prematurely... Please remember not to squash until after approval!

Copy link
Member

@markusicu markusicu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm except for the string literals in smpdtfmt.cpp

@grhoten
Copy link
Member Author

grhoten commented Nov 25, 2025

@grhoten I tried to look for further changes in context, but I see that you squashed prematurely... Please remember not to squash until after approval!

Sorry about that. I thought I had addressed your comments. Ideally, this extra manual squash step should be removed, and the GitHub settings of this repository should be switched from "Rebase and merge" to "Squash and merge". It really simplifies the process, and the commit check could be removed.

Copy link
Member

@markusicu markusicu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, this extra manual squash step should be removed, and the GitHub settings of this repository should be switched from "Rebase and merge" to "Squash and merge". It really simplifies the process, and the commit check could be removed.

Except it doesn't work for a project like ICU. We want every change to come with a Jira ticket, and we are enforcing it with a CI check that looks for an open ticket.

Squash-and-merge would go around that, because it lets you edit the commit message, without another CI check.

This is much less annoying than it was at first: PR approval is no longer dismissed when you truly squash and not change any file contents. So we can build up lots of commits and comments, get a PR approved, then squash -- doable via GitHub UI -- and let it rattle through the CI checks once more, some of which we need anyway for checking the PR title and commit message.

case UDAT_YEAR_FIELD:
case UDAT_YEAR_WOY_FIELD:
if (fDateOverride.compare(hebr)==0 && value>HEBREW_CAL_CUR_MILLENIUM_START_YEAR && value<HEBREW_CAL_CUR_MILLENIUM_END_YEAR) {
if (fDateOverride == HEBREW_CALENDAR_VALUE && value>HEBREW_CAL_CUR_MILLENIUM_START_YEAR && value<HEBREW_CAL_CUR_MILLENIUM_END_YEAR) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI -- Thanks, this looks ok.
Note that for a named string constant like this you could also have a constexpr std::u16string_view. Then there would be no doubt that the compiler computes the length.

@jira-pull-request-webhook
Copy link

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

@grhoten grhoten merged commit 9f1c574 into unicode-org:main Nov 26, 2025
106 checks passed
@grhoten grhoten deleted the constructors branch November 26, 2025 07:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants