Fix (u)int8 implicit char conversion in stringstream (#5445)
This PR fixes the implicit character conversion that occurs when writing
a `(u)int8` value to a `stringstream` in the helper function
`std::string to_str(const T& value)`. By casting to `(u)int32`, we avoid
this conversion.
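
The conversion described above can be reproduced outside TileDB. The following is a minimal sketch, not code from this PR: the helper names `stream_raw`, `int8_as_text`, and `uint8_as_text` are hypothetical, and it assumes the common case where `int8_t` and `uint8_t` are aliases for `signed char` and `unsigned char`.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helpers for illustration only; these are not TileDB functions.

// Streams the value unchanged. When T is (u)int8_t, which is assumed here to
// be an alias for (un)signed char, operator<< selects the character overload
// and writes a single raw byte instead of decimal digits.
template <class T>
std::string stream_raw(T v) {
  std::stringstream ss;
  ss << v;
  return ss.str();
}

// Widens to a 32-bit integer first, so the numeric overload is selected.
std::string int8_as_text(int8_t v) {
  std::stringstream ss;
  ss << static_cast<int32_t>(v);
  return ss.str();
}

// Same idea for the unsigned case.
std::string uint8_as_text(uint8_t v) {
  std::stringstream ss;
  ss << static_cast<uint32_t>(v);
  return ss.str();
}
```

Under that assumption, `stream_raw<int8_t>(65)` yields the one-character string `"A"`, while `int8_as_text(65)` yields `"65"`. For a value such as 200, `stream_raw<uint8_t>(200)` yields the single byte `0xC8`, which is not valid UTF-8 on its own; `uint8_as_text(200)` yields `"200"`.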

This issue was discovered after encountering a `UnicodeDecodeError` when
using `operator<<` in the TileDB-Py API on a TileDB Array schema that
contained a `(u)int8` attribute. Both a minimal TileDB-Py reproduction
and the original issue involving TileDB-SOMA now work as expected.

[sc-61915]

---
TYPE: NO_HISTORY | BUG
DESC: Fix implicit character conversion in `to_str` by casting (u)int8
to (u)int32.

---------

Co-authored-by: Theodore Tsirpanis <[email protected]>
kounelisagis and teo-tsirpanis authored Feb 14, 2025
1 parent 74ef793 commit d0348dc
Showing 3 changed files with 69 additions and 2 deletions.
6 changes: 4 additions & 2 deletions tiledb/sm/misc/parse_argument.cc
@@ -241,10 +241,12 @@ std::string to_str(const void* value, Datatype type) {
   std::stringstream ss;
   switch (type) {
     case Datatype::INT8:
-      ss << *(const int8_t*)value;
+      // cast to int32 to avoid char conversion to ASCII
+      ss << static_cast<int32_t>(*(const int8_t*)value);
       break;
     case Datatype::UINT8:
-      ss << *(const uint8_t*)value;
+      // cast to uint32 to avoid char conversion to ASCII
+      ss << static_cast<uint32_t>(*(const uint8_t*)value);
       break;
     case Datatype::INT16:
       ss << *(const int16_t*)value;
1 change: 1 addition & 0 deletions tiledb/sm/misc/test/CMakeLists.txt
@@ -36,6 +36,7 @@ commence(unit_test misc)
     unit_hilbert.cc
     unit_integral_type_casts.cc
     unit_math.cc
+    unit_parse_argument.cc
 )
 conclude(unit_test)

64 changes: 64 additions & 0 deletions tiledb/sm/misc/test/unit_parse_argument.cc
@@ -0,0 +1,64 @@
/**
* @file unit_parse_argument.cc
*
* @section LICENSE
*
* The MIT License
*
* @copyright Copyright (c) 2025 TileDB, Inc.
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
*
* @section DESCRIPTION
*
* Tests for useful (global) functions.
*/

#include "catch.hpp"
#include "tiledb/sm/enums/datatype.h"
#include "tiledb/sm/misc/parse_argument.h"

using namespace tiledb::sm::utils::parse;
using namespace tiledb::sm;

TEST_CASE("Test to_str function for integers", "[to_str][integer]") {
  int8_t int8_value = -10;
  uint8_t uint8_value = 10;

  REQUIRE(to_str(&int8_value, Datatype::INT8) == "-10");
  REQUIRE(to_str(&uint8_value, Datatype::UINT8) == "10");

  int16_t int16_value = -10;
  uint16_t uint16_value = 10;

  REQUIRE(to_str(&int16_value, Datatype::INT16) == "-10");
  REQUIRE(to_str(&uint16_value, Datatype::UINT16) == "10");

  int32_t int32_value = -10;
  uint32_t uint32_value = 10;

  REQUIRE(to_str(&int32_value, Datatype::INT32) == "-10");
  REQUIRE(to_str(&uint32_value, Datatype::UINT32) == "10");

  int64_t int64_value = -10;
  uint64_t uint64_value = 10;

  REQUIRE(to_str(&int64_value, Datatype::INT64) == "-10");
  REQUIRE(to_str(&uint64_value, Datatype::UINT64) == "10");
}
