Skip to content

Conversation

@b4l
Copy link
Contributor

@b4l b4l commented Dec 30, 2025

This provides basic listing table provider functionality for LAZ files, featuring partitioned/parallel read capabilities with low memory footprint.

@paleolimbot
Copy link
Member

Looking forward to this!

Just a note that the packaging CI is mad because your files don't have Apache license headers (you can copy them from any other .rs file in the repo).

@b4l b4l marked this pull request as ready for review January 2, 2026 10:50
@b4l b4l changed the title [WIP] LAZ format support feat(rust/sedona-pointcloud) Initial LAZ format support Jan 2, 2026
@paleolimbot
Copy link
Member

Sorry for being slow here...I spent today catching up on reviews that had accumulated over the holidays and will take a look tomorrow!

Copy link
Member

@paleolimbot paleolimbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome! I left some comments inline but most of them can be follow-ups.

The main thing this needs before it can be merged is test coverage. I didn't spot any tests but feel free to let me know if I missed them! I would suggest:

  • Adding .laz file(s) that cover the range of input options you expose here to apache/sedona-testing, maybe with scripts to generate them if that's possible
  • Read the test files and check output with assert_batches_equal!(). There are some tests in sedona-geoparquet that do this kind of thing except yours can be easier because you can use your Plain geometry output to avoid having to check WKB.

fn setup_context() -> SessionContext {
let mut state = SessionStateBuilder::new().build();
state
.register_file_format(Arc::new(GeoParquetFormatFactory::new()), true)
.unwrap();
SessionContext::new_with_state(state).enable_url_table()
}
#[tokio::test]
async fn format_from_url_table() {

  • For some lower-level components you could also hard-code some byte ranges (e.g., the header bytes or the bytes for a records). For the builder you could load the builders with some data to check finish().

@b4l b4l force-pushed the laz branch 2 times, most recently from 2b1084a to 527fecd Compare January 15, 2026 09:21
Copy link
Member

@paleolimbot paleolimbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this (and for handling the Arrow/DataFusion update!)

I left some inline comments about ways to test these files. I am mostly worried that a future contributor or an LLM acting on their behalf will roll through and break something and there won't be a failing test to detect a regression for (e.g.) Int8s with a nodata value.

I think probably a golden file that exercises the full matrix of extra attribute data types / with or without offset / with or without scale / with or without nodata value + assertions that we read the file's content correctly would be sufficient. If it's easy to build that file using the Builder on demand we can do that too.

.await
.unwrap();

assert_eq!(batch.num_rows(), 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This level of granularity is OK if all the lower-level pieces are tested, although in this case they are mostly not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are now tests for any point format and single value extra attributes, each plane, scaled, and with nodata. Does this suffice?

@b4l
Copy link
Contributor Author

b4l commented Feb 5, 2026

@paleolimbot Thank you for the review! Sorry for being slow to address it.

I was wondering where the test data/generation should live. Is it in sedona-testing?

@paleolimbot
Copy link
Member

Sorry for being slow to address it.

No problem! Sorry for being so slow to review the edits! I am hoping to get here later today.

Copy link
Member

@paleolimbot paleolimbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for working on this!

Mostly nits...I think rust/sedona-pointcloud/testing would work for any example files/scripts to generate the for now to unblock this PR!

I have one more suggestion for testing the final bit of untested code here, which is the file format...I think you can copy any relevant tests from rust/sedona-geoparquet/src except they should be easier since you can use assert_batches_eq!() (I couldn't there because I had geometry column output and your output should just be xyz)`. Among other things, this makes sure that filters, projections, and options work and are all piped through all the steps.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this submodule need to be reverted or updated?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No sure how this ended up here so probably it needs to be reverted.

Comment on lines 136 to 138
impl ConfigExtension for LazTableOptions {
const PREFIX: &'static str = "laz";
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be pointcloud? (e.g., will these apply to las files as well as laz files?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I refactored the config in 43803c3 to reflect the potential scope of future usage.

fn file_source(&self) -> Arc<dyn FileSource> {
Arc::new(LazSource::default().with_options(self.options.clone()))
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In sedona-geoparquet/src/format.rs you'll see a few tests for the Parquet format that verify a few basic properties of the format in the spirit of "is it plugged in". I think this file would benefit from having those (both to make sure everything here is plugged in and to ensure future contributors to this file have a place to test bug fixes).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ported some tests.

Copy link
Member

@paleolimbot paleolimbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

I will try to follow up with Python bindings for this in the next little bit...I think that DataFusion 52 or 53 has FFIable file formats, or we can expose the listing table wrapper (e.g., sedona-geoparquet/src/provider.rs) as an ffiable TableProvider, although I am not sure the degree to which that works with predicate pushdown + custom functions at the moment.

@paleolimbot paleolimbot merged commit e8d4ed3 into apache:main Feb 10, 2026
16 checks passed
@b4l b4l deleted the laz branch February 10, 2026 18:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants