Pull request overview
This PR introduces a memory-mapped file I/O path for LASReader (with a stream fallback and an env-var override), alongside related performance/benchmarking improvements and LAZ encoder memory/layout refactors.
Changes:
- Add a `MemoryMappedFile` utility and integrate it into `LASReader` for path-based construction (with an optional `LASPP_DISABLE_MMAP` override and fallback to stream I/O).
- Refactor LAZ encoder storage to use `std::unique_ptr` inside the `LAZEncoder` variant and add decompression prefetching.
- Extend benchmarks/tests to cover mmap vs stream paths and add a “roofline” raw file read baseline.
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| src/utilities/thread_pool.hpp | Lazy thread creation + global pool init refactor used by parallel decompression. |
| src/utilities/memory_mapped_file.hpp | New cross-platform read-only memory-mapped file utility. |
| src/utilities/tests/test_memory_mapped_file.cpp | New standalone tests for MemoryMappedFile behavior and move semantics. |
| src/utilities/macros.hpp | Adds LASPP_PREFETCH macro used by LAZ decompression. |
| src/las_reader.hpp | Adds mmap-based path constructor + unified get_bytes() to support both I/O modes. |
| src/laz/stream.hpp | Makes pointer stream buffer read-only and adds seeking support (used by mmap path). |
| src/laz/encoders.hpp | Changes LAZEncoder to store heap-allocated encoders to shrink variant size. |
| src/laz/laz_writer.hpp | Updates writer logic for LAZEncoder now holding unique_ptrs. |
| src/laz/laz_reader.hpp | Updates reader logic for LAZEncoder now holding unique_ptrs; adds prefetch. |
| src/laz/layered_stream.hpp | Makes layered input stream constructor const-correct for read-only spans. |
| src/laz/integer_encoder.hpp | Adds constructor overloads for symbol encoder sharing/ownership. |
| src/laz/tests/test_point14_encoder.cpp | Const-correct span usage for layered stream inputs. |
| src/tests/test_reader.cpp | Adds coverage for mmap vs stream LASReader paths + disable-mmap env var. |
| apps/benchmark.cpp | Uses path-based LASReader, adds raw read “roofline” benchmark, tweaks logging. |
| apps/benchmark.py | Updates plotting (exclude threads=0 for scaling tools) and adds roofline line. |
| CMakeLists.txt | Disables benchmark-by-default when included as a subproject. |
Comments suppressed due to low confidence (2)
src/utilities/thread_pool.hpp:141
The worker tasks capture `func` by value (`enqueue([..., func, ...])`), which requires `Func` to be copyable and makes an extra copy per worker. This is an API regression vs capturing by reference (and it breaks move-only callables). Consider capturing `func` by reference (safe because `parallel_for` waits), or storing it behind a `shared_ptr`/`reference_wrapper`.
```cpp
    // Wait for all work to complete
    while (completed_count->load() < total_work) {
      std::this_thread::yield();
    }
  }

 private:
  void enqueue(std::function<void()> task) {
    {
      std::unique_lock<std::mutex> lock(m_queue_mutex);
      m_tasks.emplace(std::move(task));
```
src/las_reader.hpp:346
`read_points()` reinterprets raw bytes as `PointType` and dereferences the pointer. Even for packed/trivial types this relies on implicit object-lifetime rules and can be undefined behavior under strict aliasing/UBSan. A safer approach is to `std::memcpy` into a local `PointType` (or `std::bit_cast` when applicable) and then copy from that object.
```cpp
template <typename T>
std::span<T> read_chunks(std::span<T> output_location, std::pair<size_t, size_t> chunk_indexes) {
  if (header().is_laz_compressed()) {
    size_t compressed_start_offset =
```
```diff
@@ -154,31 +143,26 @@ class ThreadPool {
     m_condition.notify_one();
   }

   size_t m_num_threads;
   size_t m_max_threads;
   std::vector<std::thread> m_workers;
```
`parallel_for()` now uses a tight busy-wait loop (`yield()` until `completed_count == total_work`) and increments an atomic for every single index. For large ranges this can significantly increase contention and CPU usage. Consider using a `std::latch`/condition variable to block the caller, and batching indices (chunking) to reduce atomic traffic.
```cpp
#include <intrin.h>
#define LASPP_PREFETCH(addr) _mm_prefetch(reinterpret_cast<const char*>(addr), _MM_HINT_T0)
#else
```

`LASPP_PREFETCH` on MSVC unconditionally uses `_mm_prefetch` / `_MM_HINT_T0`, but this won’t compile on non-x86 Windows targets (e.g., ARM64), and `<intrin.h>` is not guaranteed to provide the SSE prefetch intrinsics/macros on all configurations. Consider guarding with architecture checks and/or including `<xmmintrin.h>`/`<immintrin.h>` when available, with a no-op fallback otherwise.

Suggested change:

```cpp
#include <intrin.h>
#if defined(_M_IX86) || defined(_M_X64) || defined(_M_AMD64)
#include <xmmintrin.h>
#define LASPP_PREFETCH(addr) _mm_prefetch(reinterpret_cast<const char*>(addr), _MM_HINT_T0)
#else
#define LASPP_PREFETCH(addr) ((void)0)
#endif
#else
```
```cpp
ReadBuffer get_bytes(size_t offset, size_t size) {
  if (m_mapped_file.has_value()) {
    return ReadBuffer(m_mapped_file->subspan(offset, size));
  }
  LASPP_ASSERT(m_input_stream != nullptr, "m_input_stream must be set for stream-based I/O");
  std::vector<std::byte> buf(size);
  m_input_stream->seekg(static_cast<int64_t>(offset));
  LASPP_CHECK_READ(
      m_input_stream->read(reinterpret_cast<char*>(buf.data()), static_cast<int64_t>(size)));
```
`get_bytes()` seeks with `seekg(static_cast<int64_t>(offset))` and then reads without checking whether the seek succeeded. If `offset` exceeds `int64_t` or the seek fails, the subsequent read may behave unexpectedly (and `LASPP_CHECK_READ` is a no-op when `LASPP_DEBUG_ASSERTS` is off). Consider using `std::streamoff`/`std::ios::beg`, checking `good()` after `seekg`, and handling conversion/overflow explicitly.
```diff
   auto t0 = Clock::now();
   {
-    std::ifstream ifs(path, std::ios::binary);
-    LASReader reader(ifs);
+    // Use file path constructor - automatically uses memory mapping for optimal performance
+    LASReader reader(path);
+    if (!reader.is_using_memory_mapping()) {
+      std::cerr << "WARNING: LASReader is NOT using memory mapping (fell back to streams)\n";
+    } else {
+      // Only print once per benchmark run, not per iteration
+      static bool printed = false;
+      if (!printed) {
+        std::cerr << "INFO: LASReader is using memory-mapped file I/O\n";
+        printed = true;
+      }
+    }
```
The benchmark prints to `std::cerr` inside the timed region. Even printing only once can skew the first timing, and the warning path prints on every iteration if mapping is unavailable. Consider moving this logging outside the timed scope (or gating it behind a flag) so results reflect I/O/decompression performance only.
```cpp
#endif
}

// Test 7: Verify data() returns correct span
```
Test numbering skips from “Test 5” to “Test 7”, which makes future additions/triage a bit confusing. Consider renumbering this comment to “Test 6” or adding the missing section.

Suggested change:

```cpp
// Test 6: Verify data() returns correct span
```
```cpp
        std::max(size_t{1}, static_cast<size_t>(std::thread::hardware_concurrency()));
    pool = std::make_unique<ThreadPool>(max_threads);
  });
```
`m_queue_mutex` does not need to be `mutable` for `ensure_threads()` (the method is non-const). Marking it `mutable` weakens const-correctness for the rest of the class; consider reverting to a non-mutable mutex unless there is a const method that must lock it.
```cpp
// Explicitly disable write operations to ensure const-correctness
std::streambuf::int_type overflow(std::streambuf::int_type) override {
  LASPP_FAIL("PointerStreamBuffer: write operation attempted on read-only buffer");
```
`PointerStreamBuffer::overflow()` uses `LASPP_FAIL(...)`, but when `LASPP_DEBUG_ASSERTS` is off that macro compiles down to `UNREACHABLE()` (UB if a write is attempted). Since this override is a standard-library error path, it should return `traits_type::eof()` (and ideally not rely on debug-only asserts) to keep behavior defined in release builds.

Suggested change:

```cpp
// Signal failure to write on a read-only buffer without invoking undefined behavior.
return traits_type::eof();
```
```cpp
pos_type seekoff(off_type off, std::ios_base::seekdir dir,
                 std::ios_base::openmode which = std::ios_base::in) override {
  if (which & std::ios_base::out) return pos_type(off_type(-1));  // read-only buffer
  char* new_gptr;
  if (dir == std::ios_base::beg) {
    new_gptr = eback() + off;
  } else if (dir == std::ios_base::cur) {
    new_gptr = gptr() + off;
  } else {  // std::ios_base::end
    new_gptr = egptr() + off;
  }
  if (new_gptr < eback() || new_gptr > egptr()) return pos_type(off_type(-1));
  setg(eback(), new_gptr, egptr());
  return pos_type(new_gptr - eback());
```
`seekoff()` does pointer arithmetic like `eback() + off` before checking bounds. If `off` is negative (valid for `seekg`), forming a pointer before `eback()` is undefined behavior even if you later compare it. Please do bounds checks using integer offsets (e.g., compute `base_index`/`new_index` as `ptrdiff_t` and validate the range) before converting back to pointers.
```cpp
std::streamsize size = file.tellg();
file.seekg(0, std::ios::beg);
```

`raw_file_read_once()` does not validate `tellg()` before using it as a vector size. If `tellg()` returns -1 (seek/tell failure), `static_cast<size_t>(size)` becomes huge and will attempt a massive allocation. Please check `size >= 0` (and `file.good()`) before allocating/reading.

Suggested change:

```cpp
std::streamsize size = file.tellg();
if (!file.good() || size < 0) {
  throw std::runtime_error("Failed to determine file size: " + path.string());
}
file.seekg(0, std::ios::beg);
if (!file.good()) {
  throw std::runtime_error("Failed to seek in file: " + path.string());
}
```