Add union_tag scalar function #14687
// The only constraints on union field type IDs are that they are unique and in the 0..128 range:
// they may not start at 0, be sequential, or even be contiguous.
// Therefore, we allocate a values vector with a length equal to the highest type ID plus one,
// ensuring that each field's name can be placed at the index corresponding to its type ID.
The union column used in the sqllogictests contains a single field with type ID 3, so this case is put to the test:
datafusion/datafusion/sqllogictest/src/test_context.rs
Lines 411 to 430 in e4b78c7
fn register_union_table(ctx: &SessionContext) {
    let union = UnionArray::try_new(
        UnionFields::new(vec![3], vec![Field::new("int", DataType::Int32, false)]),
        ScalarBuffer::from(vec![3, 3]),
        None,
        vec![Arc::new(Int32Array::from(vec![1, 2]))],
    )
    .unwrap();
    let schema = Schema::new(vec![Field::new(
        "union_column",
        union.data_type().clone(),
        false,
    )]);
    let batch =
        RecordBatch::try_new(Arc::new(schema.clone()), vec![Arc::new(union)]).unwrap();
    ctx.register_batch("union_table", batch).unwrap();
}
datafusion/datafusion/sqllogictest/src/test_context.rs
Lines 117 to 120 in e4b78c7
"union_function.slt" => {
    info!("Registering table with union column");
    register_union_table(test_ctx.session_ctx())
}
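Editor's note: the lookup-table sizing described in the code comment above can be sketched with the standard library alone. This is a minimal illustration, not the PR's implementation: `build_tag_lookup` and `tags_for_rows` are hypothetical names, fields are passed as plain `(type_id, name)` pairs instead of Arrow's `UnionFields`, and type IDs are assumed to be non-negative.

```rust
/// Build a lookup table that maps a union type ID to its field name.
/// Hypothetical helper: `fields` holds (type_id, name) pairs whose IDs are
/// assumed to be unique and non-negative.
fn build_tag_lookup(fields: &[(i8, &str)]) -> Vec<Option<String>> {
    // Size the table by the highest type ID plus one, not by the field count,
    // because IDs may be sparse (for example, a single field with ID 3).
    let len = fields
        .iter()
        .map(|(id, _)| *id as usize + 1)
        .max()
        .unwrap_or(0);
    let mut lookup: Vec<Option<String>> = vec![None; len];
    for (id, name) in fields {
        // Each name sits at the index equal to its type ID.
        lookup[*id as usize] = Some(name.to_string());
    }
    lookup
}

/// Resolve each row's type ID to the name of the selected field.
/// IDs without a registered field resolve to `None`.
fn tags_for_rows(lookup: &[Option<String>], type_ids: &[i8]) -> Vec<Option<String>> {
    type_ids
        .iter()
        .map(|id| {
            usize::try_from(*id)
                .ok()
                .and_then(|i| lookup.get(i).cloned().flatten())
        })
        .collect()
}

fn main() {
    // Mirrors the test table: one field named "int" with type ID 3, two rows.
    let lookup = build_tag_lookup(&[(3, "int")]);
    let tags = tags_for_rows(&lookup, &[3, 3]);
    println!("table length = {}, tags = {:?}", lookup.len(), tags);
}
```

With a single field at type ID 3, the table has four slots and only the last one is populated, which is the situation the sqllogictest table exercises.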
@alamb - here is another function coming in (alongside xxhash, regexp_extract (both versions of it), and the array_min/array_max functions) where it is not clear what should be accepted and what shouldn't be. Since we've already accepted union_extract, this may be a case of fleshing that series of functions out. I strongly think we need 'official' documentation as to what will and won't be accepted, and a recommended repository/sub-project where additional functions can be located, hopefully under the Apache umbrella so that they can be maintained by the community and work across multiple DF versions.
Yeah, I agree. I think we should file a "discussion" type ticket to have this discussion. I can file one at some point later (I am low on time this week), or if you can, that would be sweet. We have some generic guidance here: https://datafusion.apache.org/contributor-guide/index.html#what-contributions-are-good-fits
On it. #14777
Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.
@alamb, thoughts on this?
If you think we should merge this @Omega359 I am fine with doing so too.
I have merged up from main to get a clean CI run, and hopefully we'll be good.
I think this will nicely round out the UnionArray related functions.
I'll review it today @alamb
LGTM. I think I'd like to see a test with multiple columns but the logic looks solid to me. I believe the use of unsafe is indeed ok given the conditions outlined.
@@ -23,7 +26,8 @@ query ?I
select union_column, union_extract(union_column, 'int') from union_table;
----
{int=1} 1
{int=2} 2
{string=bar} NULL
I added a new row to the table so we could test union_tag with a field that did not exist, per @Omega359's suggestion.
select union_column, union_tag(union_column) from union_table;
----
{int=1} int
{string=bar} string
I added a new test as suggested
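Editor's note: the behaviour these two test hunks check can be modeled in plain Rust. The sketch below is only an analogy under stated assumptions: `UnionValue`, `union_tag`, and `union_extract_int` are hypothetical stand-ins written for illustration, not DataFusion's API, and the rows are modeled on the values visible in the hunks.

```rust
/// Hypothetical stand-in for a union column with an `int` and a `string` field.
#[derive(Debug, Clone, PartialEq)]
enum UnionValue {
    Int(i32),
    Str(String),
}

/// Model of `union_tag`: return the name of the currently selected field.
fn union_tag(value: &UnionValue) -> &'static str {
    match value {
        UnionValue::Int(_) => "int",
        UnionValue::Str(_) => "string",
    }
}

/// Model of `union_extract(col, 'int')`: return the value when `int` is the
/// selected field, and `None` (SQL NULL) otherwise.
fn union_extract_int(value: &UnionValue) -> Option<i32> {
    match value {
        UnionValue::Int(i) => Some(*i),
        _ => None,
    }
}

fn main() {
    // Rows modeled on the test table after the `string` row was added.
    let rows = vec![
        UnionValue::Int(1),
        UnionValue::Int(2),
        UnionValue::Str("bar".to_string()),
    ];
    for row in &rows {
        println!(
            "{:?}: tag = {}, extract(int) = {:?}",
            row,
            union_tag(row),
            union_extract_int(row)
        );
    }
}
```

The point of the extra row is visible here: extracting `int` yields no value for the `string` row, while the tag is defined for every row.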
Thanks again for the review. I added the requested test. I think the CI is going to fail because of #15149 (comment). Once that is fixed, I'll refresh the PR.
* feat: add union_tag scalar function
* update for new api
* Add test for second field type
---------
Co-authored-by: Andrew Lamb <[email protected]>
Which issue does this PR close?
union_tag function #11080
Rationale for this change
Retrieve the name of the currently selected field on a union, as there's no way to do it today.
What changes are included in this PR?
union_tag scalar function implementation
Are these changes tested?
Yes, with sqllogictests when possible, and with unit tests for union scalars, which are not supported in SQL yet.
Are there any user-facing changes?
A new scalar function, union_tag