@poliudian-iv (Contributor) commented Oct 31, 2025

Bugs identified by static analyzers when analyzing the Chromium source code (part 3):

  • icu4c/source/i18n/rbt_pars.cpp: "Out of bound access to memory preceding the field 'preContext'"
  • icu4c/source/common/cstring.cpp: "Out of bound access to memory" if n == INT_MAX
  • icu4c/source/i18n/rematch.cpp: "Out of bound access to memory preceding the field 'd'" in Regex8BitSet::contains() when c == -1
  • icu4c/source/i18n/number_longnames.cpp: "Out of bound access to memory" in getMixedUnitModifier
  • icu4c/source/common/messagepattern.cpp: "Out of bound access to memory preceding the field 'postContext'" when index > msg.length()
  • icu4c/source/common/ubidiln.cpp: "Out of bound access to memory" when limit == 0
  • icu4c/source/i18n/simpletz.cpp: "The left operand of '<' is a garbage value"

TODO: Fill out the checklist below.

Checklist

  • Required: Issue filed: ICU-23251
  • Required: The PR title must be prefixed with a JIRA Issue number. Example: "ICU-23251 Fix xyz"
  • Required: Each commit message must be prefixed with a JIRA Issue number. Example: "ICU-23251 Fix xyz"
  • Issue accepted (done by Technical Committee after discussion)
  • Tests included, if applicable
  • API docs and/or User Guide docs changed or added, if applicable
  • Approver: Feel free to merge on my behalf

@poliudian-iv changed the title from "Fix static analyzer bugs" to "ICU-23251 Fix static analyzer bugs" on Nov 1, 2025
@jira-pull-request-webhook commented:

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot


// Set postContext to some of msg starting at index.
- length=msg.length()-index;
+ length=msg.length()>index ? msg.length()-index : 0;
Member:

Is this something that could reasonably happen? If not, I suggest instead adding the length condition to the first if-statement in this function and simply doing nothing if index is out of range. You might even add a U_ASSERT() statement to assert that this function is never called with an invalid index value.
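A minimal sketch of the reviewer's suggestion, assuming a simplified setParseError-style function: the index is range-checked once at the top, an assertion flags the bug in debug builds, and an out-of-range call returns without touching the output, so the postContext length can never go negative. ParseErrorSketch, setParseErrorSketch, SKETCH_ASSERT and SKETCH_DEBUG are hypothetical stand-ins for UParseError, MessagePattern::setParseError, U_ASSERT and U_DEBUG; the surrogate-pair trimming of the real code is omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for U_ASSERT: only active when SKETCH_DEBUG is defined.
#ifdef SKETCH_DEBUG
#define SKETCH_ASSERT(exp) assert(exp)
#else
#define SKETCH_ASSERT(exp) ((void)0)
#endif

// ICU's U_PARSE_CONTEXT_LEN is 16; one slot is reserved for the terminating NUL.
constexpr int32_t kContextLen = 16;

// Hypothetical, simplified stand-in for UParseError.
struct ParseErrorSketch {
    int32_t offset = -1;
    char16_t preContext[kContextLen] = {};
    char16_t postContext[kContextLen] = {};
};

// Validate index once, up front, instead of clamping a negative length later.
void setParseErrorSketch(ParseErrorSketch* parseError,
                         const std::u16string& msg, int32_t index) {
    const int32_t msgLength = static_cast<int32_t>(msg.length());
    SKETCH_ASSERT(index >= 0 && index <= msgLength);
    if (parseError == nullptr || index < 0 || index > msgLength) {
        return;  // Out of range: leave *parseError untouched.
    }
    parseError->offset = index;

    // preContext: up to kContextLen-1 code units ending at index.
    int32_t length = index < kContextLen - 1 ? index : kContextLen - 1;
    msg.copy(parseError->preContext, length, index - length);
    parseError->preContext[length] = 0;

    // postContext: up to kContextLen-1 code units starting at index.
    // index <= msgLength was checked above, so length cannot be negative here.
    length = msgLength - index;
    if (length > kContextLen - 1) {
        length = kContextLen - 1;
    }
    msg.copy(parseError->postContext, length, index);
    parseError->postContext[length] = 0;
}
```

With this shape the clamping expression in the diff becomes unnecessary, because the only path to a negative length is rejected before any buffer is written.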

int32_t pos,
UErrorCode& status)
{
if (pos < 0) {
Member:

Same here: this should never happen. Assert (to get a nice human-readable error message in debug builds if some bug causes it to happen anyway) and then return immediately without doing anything.

const UnicodeSet &s = RegexStaticSets::gStaticSets->fPropSets[opValue];
if (s.contains(c)) {
success = !success;
if (c >= 0) {
Member:

This doesn't look right: UChar32 should be an unsigned data type, so it shouldn't be possible for this value to be negative.

Contributor (PR author):

UChar32 is int32_t, and U16_NEXT can return U_SENTINEL (== -1).

Member:

Oh, I see, I mixed it up with UChar, sorry about that. But then there's a clearly defined sentinel value that you can test for here, U_SENTINEL, which will tell any future reader that this is a value that this variable really can be assigned. (And in case you have reason to fear that any other negative value could ever be assigned here, cover that case with a U_ASSERT.)

Member:

Testing for negative / non-negative is much spiffier, and it is used throughout the code base.
U_SENTINEL is negative for a reason: for easy testing via if (c < 0).
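A small sketch of the guarded lookup this thread converges on, assuming a Regex8BitSet-like bitmap over code points 0..255 (Bit8SetSketch and inSmallSetSketch are hypothetical names, not ICU code). Because U_SENTINEL is -1, a single c >= 0 test excludes it, along with any other negative value, before c >> 3 is used as an array index.

```cpp
#include <cassert>
#include <cstdint>

constexpr int32_t kSentinelSketch = -1;  // same value as ICU's U_SENTINEL

// Hypothetical stand-in for Regex8BitSet: one bit per code point 0..255.
struct Bit8SetSketch {
    uint8_t d[32] = {};

    void add(int32_t c) {
        assert(c >= 0 && c < 256);
        d[c >> 3] |= static_cast<uint8_t>(1u << (c & 7));
    }

    // Precondition: 0 <= c < 256. For c == -1, c >> 3 would index d[-1],
    // i.e. memory preceding the field d.
    bool contains(int32_t c) const {
        return (d[c >> 3] & (1u << (c & 7))) != 0;
    }
};

// Guarded lookup: the sign test rejects the sentinel before contains() indexes d.
bool inSmallSetSketch(const Bit8SetSketch& set, int32_t c) {
    return c >= 0 && c < 256 && set.contains(c);
}
```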

@jira-pull-request-webhook commented:

Notice: the branch changed across the force-push!

  • icu4c/source/common/messagepattern.cpp is different
  • icu4c/source/i18n/number_longnames.cpp is different
  • icu4c/source/i18n/rbt_pars.cpp is different
  • icu4c/source/i18n/rematch.cpp is different

~ Your Friendly Jira-GitHub PR Checker Bot

@poliudian-iv requested a review from roubert on November 11, 2025 09:15
/* search for the run limits and initialize visualLimit values with the run lengths */
i=0;
- do {
+ while(i<limit) {
Member:

Line 578 says that limit>0. Let's assert that if necessary, rather than change the logic.
After 26 years with this code as is, I doubt there is a real problem to be fixed.

Contributor (PR author):

OK, if you'd like, I can add a check of limit at the start of the function, i.e. a check of the argument pBiDi->trailingWSStart.
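To illustrate the trade-off in this thread, a minimal sketch of a hypothetical run-counting loop with the same do { } while (i < limit) shape (countRunsSketch is an illustration, not the ubidiln.cpp code). A do-while body always runs at least once, so it reads levels[0] even when limit == 0; following the reviewer's suggestion, the sketch keeps the loop as is and states the limit > 0 precondition with an assertion instead of rewriting it as a while loop.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sketch, not the ubidiln.cpp code: count runs of equal
// levels in levels[0..limit). The do-while body always executes once,
// so the precondition limit > 0 is asserted rather than re-checked.
int32_t countRunsSketch(const std::vector<uint8_t>& levels, int32_t limit) {
    assert(limit > 0 && limit <= static_cast<int32_t>(levels.size()));
    int32_t runCount = 0;
    int32_t i = 0;
    do {
        const uint8_t level = levels[i];  // would read out of range if limit == 0
        while (++i < limit && levels[i] == level) {
            // extend the current run
        }
        ++runCount;
    } while (i < limit);
    return runCount;
}
```

The assertion documents the caller's contract and costs nothing in release builds, whereas switching to while (i < limit) silently changes behavior for an input that is not supposed to occur.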

const UnicodeSet &s = RegexStaticSets::gStaticSets->fPropSets[opValue];
if (s.contains(c)) {
success = !success;
if (c != U_SENTINEL) {
Member:

Suggested change:
- if (c != U_SENTINEL) {
+ if (c >= 0) {

etc.

Actually, I am surprised that @aheninger used the validating macro but didn't check for well-formed input.

Also, is it ok to do nothing when the input is ill-formed? We shouldn't change this code without understanding what it does.

FYI: If it was ok to use a simple fallback, then we could use U16_NEXT_OR_FFFD().
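For comparison, a minimal sketch of the simple fallback mentioned above. next16OrFffdSketch is a hypothetical helper, not ICU's macro; it mimics the documented behavior of U16_NEXT_OR_FFFD by returning U+FFFD for an unpaired surrogate, so the result is never negative and needs no sign check. Whether such a substitution is acceptable for the regex matcher is exactly the open question raised in the comment.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not ICU's macro): read one code point from s[i..length)
// and advance i. An unpaired surrogate yields U+FFFD, mirroring the documented
// behavior of U16_NEXT_OR_FFFD, so the result is always >= 0.
int32_t next16OrFffdSketch(const char16_t* s, int32_t& i, int32_t length) {
    assert(i >= 0 && i < length);
    const char16_t lead = s[i++];
    if (lead < 0xD800 || lead > 0xDFFF) {
        return lead;  // not a surrogate: a BMP code point
    }
    if (lead <= 0xDBFF && i < length) {
        const char16_t trail = s[i];
        if (trail >= 0xDC00 && trail <= 0xDFFF) {
            ++i;  // consume the trail surrogate as well
            return 0x10000 + ((lead - 0xD800) << 10) + (trail - 0xDC00);
        }
    }
    return 0xFFFD;  // unpaired lead or trail surrogate
}
```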

@jira-pull-request-webhook commented:

Notice: the branch changed across the force-push!

  • icu4c/source/common/ubidiln.cpp is different
  • icu4c/source/i18n/rematch.cpp is different

~ Your Friendly Jira-GitHub PR Checker Bot

3 participants