Skip to content

Conversation

kimkulling
Copy link
Owner

@kimkulling kimkulling commented Aug 26, 2025

Summary by CodeRabbit

  • Improvements

    • Increased default string capacity to 512 to better handle larger text and reduce reallocations.
    • Ensured string instances are consistently initialized with an explicit null terminator.
  • Bug Fixes

    • Fixed edge-case resizing to guarantee proper null-termination and prevent unstable string state.
  • Tests

    • Updated unit tests to expect the new default capacity (512) and verify termination behavior.

Copy link
Contributor

coderabbitai bot commented Aug 26, 2025

Walkthrough

Initialize TStringBase's inline buffer first element to '\0'; when copying, allocate one extra element (size+1), set capacity accordingly, and write a null terminator at index size after memcpy. Tests updated to expect default capacity 512 for TStringBase<char>.

Changes

Cohort / File(s) Summary
String base implementation
include/cppcore/Common/TStringBase.h
Initialize mBuffer[InitSize] with {'\\0'}. In copyFrom: allocate new T[size + 1], set mCapacity = size + 1, perform memcpy, then write terminator at targetPtr[size].
Unit tests
test/common/TStringBaseTest.cpp
Update expected default/pre-initialized capacity for TStringBase<char> from 256 to 512 in relevant tests.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

I nibble at bytes with a twitch of my nose,
A tiny '\0' planted where the buffer grows.
Capacity hops from two-five-six to five-one-two,
Tests clap their paws — the null's snug and true.
A rabbit-approved tweak, concise and new.

Tip

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch bugfix/ensure_no_string_override

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
include/cppcore/Common/TStringBase.h (1)

162-183: Equality can break after set/append because mHashId is not updated

set() and operator+= call copyFrom, but copyFrom does not update mHashId. Since operator== short-circuits to false on differing hashes, comparing a string built with set() to one built via the constructor (which hashes) will likely fail even when contents match.

Centralize hash maintenance in copyFrom so all writers keep mHashId in sync. This preserves existing constructor behavior (double-hashing is harmless; you can remove it later if desired).

Apply this diff:

 inline void TStringBase<T>::copyFrom(TStringBase<T> &base, const T *ptr, size_t size) {
@@
     memcpy(targetPtr, ptr, size * sizeof(T));
     targetPtr[size] = T(0);
     base.mSize = size;
+    // Keep hash consistent for all write paths (ctor, set, operator+=, etc.)
+    base.mHashId = THash<HashId>::toHash(base.c_str(), size);
 }

Optionally, you can remove the redundant hashing in the constructor:

 inline TStringBase<T>::TStringBase(const T *ptr, size_t size) {
     copyFrom(*this, ptr, size);
-    mHashId = THash<HashId>::toHash(c_str(), size);
 }

If you prefer not to touch copyFrom, add the hashing line in set() and both operator+= overloads instead. I can provide that patch too.

🧹 Nitpick comments (4)
include/cppcore/Common/TStringBase.h (2)

95-95: Prefer value-initialization over char literal for template generality

T mBuffer[InitSize] = {'\0'}; works for char but subtly couples the template to character-like T. Using {} value-initializes the entire buffer to zero for any trivially value-initializable T (including wchar_t, char16_t, etc.) and reads clearer.

Apply this diff:

-    T mBuffer[InitSize] = {'\0'};
+    T mBuffer[InitSize] = {};

121-126: Ensure empty c_str() stays NUL-terminated after reset

reset() sets mSize = 0 but doesn’t explicitly write a terminator into the active buffer. If the active buffer is the small buffer and previously held data, c_str() will still return a pointer to an old non-empty C-string even though size() == 0. This is surprising and can lead to misuse by callers that trust c_str().

Consider forcing the first element to zero in reset():

 inline void TStringBase<T>::reset() {
     mSize = 0u;
     mHashId = 0;
+    if (mStringBuffer == nullptr) {
+        mBuffer[0] = T(0);
+    } else if (mCapacity > 0) {
+        mStringBuffer[0] = T(0);
+    }
 }
test/common/TStringBaseTest.cpp (2)

47-47: Brittle assertion on exact capacity

Asserting an exact capacity (512) tightly couples the test to an internal implementation detail and will break on legitimate tuning. Prefer asserting invariants (e.g., capacity is at least the small-buffer size or doesn’t shrink across reset) instead of the precise number.

For example:

  • EXPECT_GE(string_1.capacity(), 4);
  • EXPECT_EQ_AFTER_RESET(capacity, initial_capacity); // if you capture and compare

I can refactor this test to verify behavior rather than the literal constant.


57-57: Same as above: avoid hard-coding InitSize in tests

Mirrors the previous comment. Unless you need to lock the ABI, asserting the exact small-buffer capacity makes tests noisy for non-functional changes.

Recommendation: assert relative properties (e.g., capacity unchanged by clear; capacity sufficient for the written data) rather than the magic number 512.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 4499610 and 2618fae.

📒 Files selected for processing (2)
  • include/cppcore/Common/TStringBase.h (2 hunks)
  • test/common/TStringBaseTest.cpp (2 hunks)

@kimkulling kimkulling merged commit aa18d4a into master Aug 26, 2025
3 checks passed
@kimkulling kimkulling deleted the bugfix/ensure_no_string_override branch August 26, 2025 14:02
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
include/cppcore/Common/TStringBase.h (4)

121-126: reset() should preserve the c_str() invariant (always NUL-terminated)

After reset(), c_str() may still point to mStringBuffer with stale data at index 0. Ensure the first element is 0 on the active buffer so c_str() is always a valid empty string.

 inline void TStringBase<T>::reset() {
     mSize = 0u;
     mHashId = 0;
+    // Keep storage but enforce empty-string invariant for c_str()
+    if (mStringBuffer) {
+        mStringBuffer[0] = T(0);
+    } else {
+        mBuffer[0] = T(0);
+    }
 }

205-213: operator+= overwrites instead of appending

operator+=(const TStringBase&) currently replaces the content with rhs instead of appending it. This is a semantic bug and will violate user expectations.

If you want to keep copyFrom() as the central routine, implement append here:

 inline TStringBase<T> &TStringBase<T>::operator += (const TStringBase<T> &rhs) {
     if (rhs.isEmpty()) {
         return *this;
     }
-
-    copyFrom(*this, rhs.c_str(), rhs.size());
+    const size_t newSize = mSize + rhs.mSize;
+    // Ensure capacity (including terminator)
+    if (newSize + 1 > mCapacity) {
+        T* newBuf = new T[newSize + 1];
+        // copy existing content
+        memcpy(newBuf, c_str(), mSize * sizeof(T));
+        // copy rhs
+        memcpy(newBuf + mSize, rhs.c_str(), rhs.mSize * sizeof(T));
+        newBuf[newSize] = T(0);
+        if (mStringBuffer) delete [] mStringBuffer;
+        mStringBuffer = newBuf;
+        mCapacity = newSize + 1;
+    } else {
+        // Append into active buffer
+        T* dst = mStringBuffer ? mStringBuffer : mBuffer;
+        memcpy(dst + mSize, rhs.c_str(), rhs.mSize * sizeof(T));
+        dst[newSize] = T(0);
+    }
+    mSize = newSize;
+    mHashId = THash<HashId>::toHash(c_str(), mSize);
 
     return *this;
 }

Optionally, extract the growth logic into a private ensureCapacity(size_t needed) helper to avoid duplication.


216-219: operator+=(char) also overwrites instead of appending

Same semantic issue as the string append. This should append one character, not replace the whole content.

 inline TStringBase<T>& TStringBase<T>::operator+=(char c) {
-    copyFrom(*this, &c, 1);
-    return *this;
+    const size_t newSize = mSize + 1;
+    if (newSize + 1 > mCapacity) {
+        T* newBuf = new T[newSize + 1];
+        memcpy(newBuf, c_str(), mSize * sizeof(T));
+        if (mStringBuffer) delete [] mStringBuffer;
+        mStringBuffer = newBuf;
+        mCapacity = newSize + 1;
+    }
+    T* dst = mStringBuffer ? mStringBuffer : mBuffer;
+    dst[mSize] = static_cast<T>(c);
+    dst[newSize] = T(0);
+    mSize = newSize;
+    mHashId = THash<HashId>::toHash(dst, mSize);
+    return *this;
 }

221-224: operator[] reads from the wrong buffer (potentially garbage)

operator[] always indexes mBuffer even when the active storage is mStringBuffer. This will return stale/garbage data for heap-backed strings.

 inline T TStringBase<T>::operator[](size_t index) const {
-    return mBuffer[index];
+    // Optionally add bounds checks in debug builds
+    return c_str()[index];
 }
♻️ Duplicate comments (1)
include/cppcore/Common/TStringBase.h (1)

167-183: Fix remaining overflow risk, active-buffer copy bug, and keep hash in sync

Three issues persist in copyFrom:

  • Capacity check ignores the terminator. If size == mCapacity (e.g., stack buffer of 512 and size 512), you write targetPtr[size] past the end. Condition must be size + 1 > mCapacity. [duplicate of prior review, still unresolved]
  • targetPtr is always initialized to mBuffer, so when mStringBuffer already exists and has enough capacity you still copy into mBuffer, but c_str() continues to point to mStringBuffer (stale content). Choose the active buffer first.
  • For size == 0, you return early without guaranteeing that c_str() points to a valid empty, null-terminated buffer. Ensure [0] = 0 on the active buffer.
  • Minor: write the terminator as T(0) instead of '\0' for type correctness.
  • Keep mHashId consistent; otherwise operator== can spuriously fail.

Apply this diff:

-    T *targetPtr = base.mBuffer;
-    if (size == 0) {
-        return;
-    }
-
-    if (size > base.mCapacity) {
+    // Use the active buffer and ensure capacity includes room for the terminator
+    T *targetPtr = base.mStringBuffer ? base.mStringBuffer : base.mBuffer;
+    if (size == 0) {
+        targetPtr[0] = T(0);
+        base.mSize = 0;
+        base.mHashId = 0; // empty string; adjust if your hash for empty is non-zero
+        return;
+    }
+
+    if (size + 1 > base.mCapacity) {
         if (base.mStringBuffer != nullptr) {
             delete [] base.mStringBuffer;
         }
-        base.mStringBuffer = new T[size + 1];
-        base.mCapacity = size + 1;
+        base.mStringBuffer = new T[size + 1];
+        base.mCapacity = size + 1;
         targetPtr = base.mStringBuffer;
     }
     memcpy(targetPtr, ptr, size * sizeof(T));
-    targetPtr[size] = '\0';
+    targetPtr[size] = T(0);
     base.mSize = size;
+    base.mHashId = THash<HashId>::toHash(base.c_str(), base.mSize);
🧹 Nitpick comments (2)
include/cppcore/Common/TStringBase.h (2)

95-95: Prefer T(0) over '\0' for template-friendly zero-initialization

Using '\0' (char) in a templated array can trigger narrowing or warnings when T ≠ char. Use T(0) (or simply {}) to make intent explicit and type-safe.

-    T mBuffer[InitSize] = {'\0'};
+    T mBuffer[InitSize] = {T(0)};

25-25: Use instead of "string.h" in C++ headers

Prefer the C++ header and qualified names (std::memcpy/std::memcmp). This avoids namespace pollution and is more idiomatic in C++.

-#include "string.h"
+#include <cstring>

Optionally:

-    memcpy(targetPtr, ptr, size * sizeof(T));
+    std::memcpy(targetPtr, ptr, size * sizeof(T));

and similarly for memcmp.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 2618fae and ae55a0c.

📒 Files selected for processing (1)
  • include/cppcore/Common/TStringBase.h (2 hunks)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants