Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VARINT and numerics type conversion #626

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

dentiny
Copy link
Contributor

@dentiny dentiny commented Feb 26, 2025

Addresses #598

Sorry for the latency, I was thinking to implement our own conversion from internal storage format of VARINT, then I realize postgres already provides a util function.

@dentiny dentiny force-pushed the hjiang/varint-numerics branch from f2d8b50 to f8eb069 Compare February 26, 2025 08:17
Copy link
Collaborator

@JelteF JelteF left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this. I think this is mostly good, but as in the other comment. Let's move the code to src/pg/types.cpp.

Comment on lines 344 to 345
Datum pg_numeric = DirectFunctionCall3(numeric_in, CStringGetDatum(value_str.c_str()),
ObjectIdGetDatum(InvalidOid), Int32GetDatum(-1));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this conversion I appreciate the simplicity that the string conversion gives us. Especially since VARINTs are fairly rare types and already a pretty expensive anyway. So converting making the conversion to Postgres a bit more expensive than strictly necessary seems acceptable. This does need a PostgresFunctionGuard though, to keep Postgres C code a bit more physically separated from the C++ code let's move this to logic to the src/pg/types.cpp file. (we're steadily moving more code to the src/pg directory, but we're not done with that migration yet)

Copy link
Contributor Author

@dentiny dentiny Feb 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does need a PostgresFunctionGuard though, to keep Postgres C code a bit more physically separated from the C++ code

Make sense! Function guard added.

Just to confirm, I don't think we need type conversion for numerics <-> VARINT, right?
Because the entrypoint is in pg and we have other mapping (double) in duckdb.

let's move this to logic to the src/pg/types.cpp file.

I could understand the motivation (i.e. place pg related code under pg directory for better maintainability, namespace, etc), but I'm not sure which part to place, should I move all functions there?

As you might be aware of, I'm also working on other type conversions; do you think it ok after I finish them? (I promise I will prioritize them). I'm willing to help migration later on :)

Copy link
Contributor Author

@dentiny dentiny Feb 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this conversion I appreciate the simplicity that the string conversion gives us. Especially since VARINTs are fairly rare types and already a pretty expensive anyway

Yeah I implement via pg native function mostly for code simplicity (but seems other types like TIME could be more easily implement, another story). I leave a comment for the motivation, and what we could improve.

@dentiny dentiny force-pushed the hjiang/varint-numerics branch 2 times, most recently from b5799e9 to e3a22bb Compare February 26, 2025 13:14
@dentiny dentiny requested a review from JelteF February 26, 2025 13:19
@dentiny dentiny marked this pull request as draft February 26, 2025 13:22
@dentiny dentiny changed the title VARINT and numerics type conversion [WIP] VARINT and numerics type conversion Feb 26, 2025
@dentiny dentiny force-pushed the hjiang/varint-numerics branch from e3a22bb to fb34cbb Compare February 26, 2025 13:34
// A wrapper function to convert a string to a numeric value, used for `PostgresFunctionGuard` macro.
static Datum
ConvertStringToNumerics(Datum str, Datum typelem, Datum typmod) {
Datum pg_numeric = DirectFunctionCall3(numeric_in, str, typelem, typmod);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

otherwise compilation failure

src/pgduckdb_types.cpp: In function 'Datum pgduckdb::ConvertNumericDatum(const duckdb::Value&)':
include/pgduckdb/pgduckdb_utils.hpp:106:68: error: no matching function for call to '__PostgresFunctionGuard__<Datum (*)(FunctionCallInfo), numeric_in>(const char [11], Datum, Datum, Datum)'
  106 |         pgduckdb::__PostgresFunctionGuard__<decltype(&FUNC), &FUNC>(#FUNC, ##__VA_ARGS__)
      |         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~
src/pgduckdb_types.cpp:346:42: note: in expansion of macro 'PostgresFunctionGuard'
  346 |                 const Datum pg_numeric = PostgresFunctionGuard(numeric_in, CStringGetDatum(value_str.c_str()),
      |                                          ^~~~~~~~~~~~~~~~~~~~~
include/pgduckdb/pgduckdb_utils.hpp:83:1: note: candidate: 'template<class Func, Func func, class ... FuncArgs> typename std::invoke_result<Func, FuncArgs ...>::type pgduckdb::__PostgresFunctionGuard__(const char*, FuncArgs ...)'
   83 | __PostgresFunctionGuard__(const char *func_name, FuncArgs... args) {
      | ^~~~~~~~~~~~~~~~~~~~~~~~~

@dentiny dentiny changed the title [WIP] VARINT and numerics type conversion VARINT and numerics type conversion Feb 26, 2025
@dentiny dentiny marked this pull request as ready for review February 26, 2025 13:35
Comment on lines 353 to 355
const Datum pg_numeric =
PostgresFunctionGuard(ConvertStringToNumerics, /*str=*/CStringGetDatum(value_str.c_str()),
/*typelen=*/ObjectIdGetDatum(InvalidOid), /*typmod=*/Int32GetDatum(-1));
Copy link
Collaborator

@JelteF JelteF Feb 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes most sense to create a function in pg/types.cpp

Datum
StringToNumeric(char* str) {
	// ...
}

And then have that function call the actual function with a PostgresFunctionGuard and name that actual function StringToNumeric_C. Similarly to how GetBaseDuckColumnType is implemented.

static Oid
GetBaseDuckColumnType_C(Oid attribute_type_oid) {
Oid typoid = attribute_type_oid;
if (get_typtype(attribute_type_oid) == TYPTYPE_DOMAIN) {
/* It is a domain type that needs to be reduced to its base type */
typoid = getBaseType(attribute_type_oid);
} else if (type_is_array(attribute_type_oid)) {
Oid eltoid = get_base_element_type(attribute_type_oid);
if (OidIsValid(eltoid) && get_typtype(eltoid) == TYPTYPE_DOMAIN) {
/* When the member type of an array is domain, you need to build a base array type */
typoid = get_array_type(getBaseType(eltoid));
}
}
return typoid;
}
Oid
GetBaseDuckColumnType(Oid attribute_type_oid) {
return PostgresFunctionGuard(GetBaseDuckColumnType_C, attribute_type_oid);
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good! Updated.

@dentiny dentiny requested a review from JelteF February 26, 2025 19:38
#include "utils/numeric.h"
#include "utils/uuid.h"
#include "utils/array.h"
#include "fmgr.h"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notice fmgr.h is included twice, sort the headers so we don't get confused.

@dentiny dentiny force-pushed the hjiang/varint-numerics branch from e1dceb7 to 9c0f1de Compare February 26, 2025 19:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants