feat(rust/sedona-raster-functions): add RS_SetSRID/RS_SetCRS with batch-local cache refactoring#630
Conversation
…SRID/CRS is null Extract input nulls from the original SRID/CRS argument before conversion and merge them into the raster struct's null buffer. This distinguishes actual null inputs (which null the raster) from SRID=0/CRS="0" (which clear the CRS but preserve the raster).
…nstead of Cow The Cow<'static, str> was always used as Cow::Owned, never Borrowed. Plain String removes the enum discriminant overhead and simplifies inserts.
srid_to_crs_columnar and normalize_crs_columnar now return StringViewArray directly instead of ColumnarValue, removing redundant Scalar/Array re-wrapping and the cast_to(Utf8View) in replace_raster_crs. Broadcasting for scalar CRS applied to array raster is handled explicitly.
a8362f8 to
8a2d86b
Compare
There was a problem hiding this comment.
Pull request overview
This pull request adds RS_SetSRID and RS_SetCRS functions for setting the CRS on raster values, while refactoring CRS caching logic into shared batch-local cache structs that are used by both ST_ (geometry) and RS_ (raster) functions.
Changes:
- Adds
RS_SetSRIDandRS_SetCRSfunctions with null-input semantics: null SRID/CRS input nulls the entire raster, while SRID=0/CRS="0" clears the CRS but preserves the raster - Introduces shared
CachedSRIDToCrsandCachedCrsNormalizationcache structs insedona-schema/src/crs.rsfor batch-local CRS conversion caching - Refactors
ST_SetSRID/ST_SetCRSto use the new shared cache types, replacing inline caching with HashMap-based structures
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| rust/sedona-schema/src/crs.rs | Adds CachedSRIDToCrs and CachedCrsNormalization batch-local cache structs; simplifies CachedCrsToSRIDMapping by removing Cow wrapper and using Default derive |
| rust/sedona-raster-functions/src/rs_setsrid.rs | Implements RS_SetSRID and RS_SetCRS functions with 17 tests covering null handling, CRS normalization, and metadata preservation |
| rust/sedona-raster-functions/src/register.rs | Registers the new RS_SetSRID and RS_SetCRS functions |
| rust/sedona-raster-functions/src/lib.rs | Adds rs_setsrid module declaration |
| rust/sedona-raster-functions/benches/native-raster-functions.rs | Adds benchmarks for rs_setcrs and rs_setsrid functions |
| rust/sedona-functions/src/st_setsrid.rs | Refactors to use shared CachedSRIDToCrs and CachedCrsNormalization instead of inline HashSet/HashMap caching |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
| Arc::new(StringViewArray::new_null(len)) | ||
| } else { | ||
| let value = array.value(0); | ||
| Arc::new(std::iter::repeat_n(Some(value), len).collect::<StringViewArray>()) |
There was a problem hiding this comment.
I think there is an optimization for string views with respect to appending repeated values:
There was a problem hiding this comment.
The method optimized for inserting repeated values (try_append_value_n) was introduced in arrow-rs 57.2.0, we need to upgrade to a later version of DataFusion and arrow-rs to use them. See apache/datafusion#19797
I would like to stick to the current method since it is the way DataFusion 51 uses to append repeated values: https://github.com/apache/datafusion/blob/51.0.0/datafusion/common/src/scalar/mod.rs#L2960C13-L2964
Co-authored-by: Dewey Dunnington <dewey@dunnington.ca>
Summary
RS_SetSRIDandRS_SetCRSraster functions that set the CRS on raster values, with null-input semantics: null SRID/CRS input nulls the entire raster, while SRID=0/CRS="0" clears the CRS but preserves the raster.CachedSRIDToCrsandCachedCrsNormalizationbatch-local cache structs insedona-schema/src/crs.rs, shared by both ST_ (geometry) and RS_ (raster) setter functions.ST_SetSRID/ST_SetCRSto use the new shared cache types instead of inline caching.Details
RS_SetSRID / RS_SetCRS
engine: Option<Arc<dyn CrsEngine + Send + Sync>>matching the ST_ pattern_with_engine_udf(engine)and_udf()(delegates withNone) constructorsextract_input_nulls()captures the null buffer from the original input argument before any conversion, thenreplace_raster_crs()merges input nulls into the struct's null buffer viaNullBuffer::union()Batch-local caches
CachedSRIDToCrs: caches SRID integer → CRS string conversion (e.g. 4326 → "OGC:CRS84")CachedCrsNormalization: caches CRS string → abbreviatedauthority:codeform via deserialization#[derive(Default)]and are used identically by ST_ and RS_ functions