Releases: lance-format/lance
v9.0.0-beta.8
v9.0.0-beta.7
v9.0.0-beta.6
What's Changed
Breaking Changes 🛠
- refactor!: rename FMIndexIndexDetails to FMIndexDetails by @westonpace in #7397
New Features 🎉
- feat: support cleanup explain for python and java by @yanghua in #7248
- feat(index): support Utf8View prefixes in SargableQueryParser by @wombatu-kun in #7351
- feat(python): expose fragment-reuse remap and delete-by-offset by @wkalt in #7438
Performance Improvements 🚀
- perf: speed up cold reads up to 8x by lazily loading column metadata by @Xuanwo in #7375
- perf(fts): remove OR hot path bookkeeping by @BubbleCal in #7416
- perf(fts): use block max for or tail bounds by @BubbleCal in #7435
Full Changelog: v9.0.0-beta.5...v9.0.0-beta.6
v9.0.0-beta.5
What's Changed
Bug Fixes 🐛
- fix: relax DataReplacement conflict with concurrent Delete/Update by @brendanclement in #7433
Full Changelog: v9.0.0-beta.4...v9.0.0-beta.5
v9.0.0-beta.4
v8.0.0-rc.2
What's Changed
Breaking Changes 🛠
- feat!: migrate bitmap to index segment based by @Xuanwo in #6869
- fix(python)!: derive index type from details instead of opening the index by @wjones127 in #6903
- refactor!: remove index segment builder by @Xuanwo in #6997
- refactor(index)!: move distributed BTree build to segmented index framework by @zhangyue19921010 in #7013
- feat!: return write summaries from file writers by @Xuanwo in #7096
- perf!: avoid listing index files after writes by @Xuanwo in #7129
- fix(dataset)!: fail-fast casting for columns with attached indices by @WenDing-Y in #7158
- feat(vector)!: add approx mode for RaBitQ search by @BubbleCal in #7179
- perf(vector)!: add dedicated SIMD kernels for RaBitQ ex-code reranking by @BubbleCal in #7205
- refactor!: rename FMIndexIndexDetails to FMIndexDetails by @westonpace in #7397
Critical Fixes ‼️
- fix: merge_insert silently drops matches when a leading payload column is all-null by @Ar-maan05 in #7251
New Features 🎉
- feat: expose tracked_files and all_files on LanceDataset by @wjones127 in #6011
- feat: add commit timeout to CommitBuilder by @wjones127 in #6773
- feat: add segmented BTree index merge_segments support by @zhangyue19921010 in #6889
- feat(index): add streaming ivf kmeans training by @BubbleCal in #6913
- feat: use indexes to accelerate filtered count_rows by @westonpace in #6916
- feat(java): allow schema override for fragment writes by @beinan in #6919
- feat: make ICU the default FTS tokenizer by @Xuanwo in #6968
- feat: support Utf8View and BinaryView in encoding and filter coercion by @xuanyu-z in #6985
- feat(python): add shared RaBitQ rotation for distributed IVF_RQ builds by @gstamatakis95 in #7014
- feat(blobv2): support all BlobKind types in blob v2 compact_files by @yyzhao2025 in #7017
- feat: add TOS object store support via OpenDAL by @ddupg in #7019
- feat(index): implement FM-Index scalar index for exact substring search by @beinan in #7026
- feat(python): expose external blob mode and outside-base option for fragments by @plotor in #7028
- feat(lance-io): add GooseFS object store provider by @XuQianJin-Stars in #7034
- feat(index): support multi-bit IVF_RQ storage by @BubbleCal in #7038
- feat: expose getters for ScalarIndexExec by @LuQQiu in #7039
- feat: expose methods and getters of scalar index for possible distributed execution by @LuQQiu in #7045
- feat(mem_wal): configurable HNSW build params for MemTable writers by @touch-of-grey in #7054
- feat: dedup FTS results across LSM tiers in LsmFtsSearchPlanner by @hamersaw in #7066
- feat(index): support raw-query ivf rq search by @BubbleCal in #7078
- feat: add EnforceDistribution to physical optimizer by @wjones127 in #7086
- feat(java): add missing scanner and merge insert params to align with Python/Rust by @WenDing-Y in #7100
- feat: populate enriched IndexContent fields in dir namespace ListTableIndices by @wjones127 in #7109
- feat(index): support configurable multi-segment FM-Index builds by @beinan in #7123
- feat: support merging zonemap index segments by @Xuanwo in #7128
- feat(index): accelerate regex and infix LIKE with the ngram index by @wombatu-kun in #7139
- feat(rust): add cleanup explain API by @yanghua in #7147
- feat: stabilize cache codec with a versioned envelope by @wjones127 in #7163
- feat(lance-select): expose selected rows accessor on NullableRowAddrSet by @LuQQiu in #7164
- feat: branch-aware table version ops in directory and rest namespaces by @brendanclement in #7166
- feat(python): expose segment FTS build through create_index_uncommitted by @ddupg in #7170
- feat(python): expose zonemap segment builds by @everySympathy in #7177
- feat(dir-catalog): add reader/writer feature flags to __manifest by @jackye1995 in #7191
- feat(index): expose per-query I/O metrics on ANN operators by @wombatu-kun in #7204
- feat(mem-wal): snapshot-consistent as-of cut for fresh-tier membership by @hamersaw in #7215
- feat: expose io_buffer_size in CompactionOptions by @aimanmalib in #7226
- fix(python): expose stable row id property in stub by @BubbleCal in #7249
- feat: bump lance-namespace-reqwest-client to 0.8.6 (source_task_size) by @justinrmiller in #7254
- feat: configure blob inline threshold per column by @Xuanwo in #7269
- feat(mem_wal): warm flushed generations into shared caches before query by @hamersaw in #7284
- feat(java): expose RTree scalar index type to Java by @zhangyue19921010 in #7291
- feat: expose session cache key inventory by @jackye1995 in #7298
- feat: support mixed-language FTS stop words by @Xuanwo in #7324
- feat(scalar): expose LogicalScalarIndex::try_new and load_named_scalar_segments by @LuQQiu in #7339
- feat: allow tuning miniblock value chunks to 32k by @Xuanwo in #7356
Bug Fixes 🐛
- fix(encoding): plan sparse structural miniblock pages by @Xuanwo in #6787
- fix(java): resolve JNI classloader bug on dispatcher thread in Spark by @sezruby in #6946
- fix: preserve zero-length buffers in binary copy compaction by @zhangyue19921010 in #6992
- fix(storage): retry throttled fts metadata listing by @BubbleCal in #6994
- fix: support multivector IVF centroids in segment builds by @ddupg in #6995
- fix(python): validate BFloat16.from_bytes length by @ddupg in #6998
- fix(test): tolerate boundary-tie in IVF distance_range assertions by @xuanyu-z in #6999
- fix(datafusion): coerce filter literals for dictionary-encoded columns by @valkum in #7003
- fix: stream object copies larger than cloud's CopyObject limit by @vivek-bharathan in #7004
- fix: restore simple FTS tokenizer default by @Xuanwo in #7006
- fix: expose Hugging Face download mode by @Xuanwo in #7022
- fix(python): clamp target partition sizing by @ddupg in #7036
- fix(fts): handle empty query tokens in flat full-text search by @vivek-bharathan in #7046
- fix(lance-index): fix some flaky tests by @XuQianJin-Stars in #7052
- fix: specify roaring's patch version by @HuaHuaY in #7056
- fix(filtered-read): record IO metrics even when filter matches no rows by @westonpace in #7057
- fix: handle nested JSON conversion recursively by @Xuanwo in #7060
- fix: retry S3 multipart request timeouts by @Xuanwo in #7061
- fix(fts): size cached posting lists by referenced slice by @vivek-bharathan in #7068
- fix: cap exec-node parallelism to DataFusion target_partitions by @wjones127 in #7087
- fix(mem-wal): fence predecessor with a WAL sentinel on claim by @hamersaw in #7110
- fix(fts): reset TokenSet next_id and total_length after remap by @vivek-bharathan in #7115
- fix(linalg): reduce cosine bench TOTAL to avoid FixedSizeBinaryArray i32 overflow by @weston...
v9.0.0-beta.3
What's Changed
New Features 🎉
- feat: add SpillStore trait with local-disk implementation by @wjones127 in #7311
- feat(mem-wal): add ShardWriter::put_no_wait by @hamersaw in #7362
- feat(python): add list_with_delimiter, delete_file, and read_range to LanceFileSession by @jmhsieh in #7426
Bug Fixes 🐛
- fix(index): drop stale scalar index entries after stable-row-id update by @wkalt in #7359
- fix(merge_insert): keep nested-field index correct on stable row id update by @jackye1995 in #7410
- fix(update): keep nested-field index correct when updating a struct column by @jackye1995 in #7412
- fix(io): clamp ObjectStore::io_parallelism() to at least 1 by @LuciferYang in #7414
- fix(namespace): allow create_branch bootstrap on managed tables by @geruh in #7415
Performance Improvements 🚀
- perf(index): reduce TwoFileShuffler peak memory via interleave sort by @wjones127 in #7295
- perf(fts): open fts segments as scalar indices by @BubbleCal in #7408
Full Changelog: v9.0.0-beta.2...v9.0.0-beta.3
v9.0.0-beta.2
What's Changed
New Features 🎉
- feat(namespace-dir): add alter table column operations (add/alter/drop) by @XuQianJin-Stars in #6273
Bug Fixes 🐛
- fix: support manifests >5 GB via size-aware copy by @lixmgl in #7047
- fix: reject DataReplacement racing concurrent Update/Delete/Merge by @wkalt in #7373
- fix(index): use range block max for fts conjunction by @BubbleCal in #7387
Performance Improvements 🚀
- perf(fts): prune low-scoring conjunction candidates by @BubbleCal in #7386
- perf(index): improve FTS search metadata caching by @Xuanwo in #7398
Full Changelog: v9.0.0-beta.1...v9.0.0-beta.2
v9.0.0-beta.1
What's Changed
New Features 🎉
- feat: support COUNT(*) pushdown on stable row id datasets by @wkalt in #7360
- feat: support hamming clustering by @brendanclement in #7379
Bug Fixes 🐛
- fix: route JSON index queries to the correct sub-parser by path by @ztorchan in #7072
- fix: evaluate all list-element docs in FTS prefilter walk-the-allowlist branch by @Ar-maan05 in #7246
- fix: apply per-segment filters and frag-reuse remap in BTree segment merge by @zhangyue19921010 in #7320
- fix(ci): replace deprecated array.shape assignment for NumPy 2.5 by @zhangyue19921010 in #7384
- fix(fts): enforce required terms for and queries by @BubbleCal in #7385
Performance Improvements 🚀
- perf(knn): reduce memory for batch flat vector search by @LeoReeYang in #6950
- perf: speed up ICU FTS index builds by 11% by @Xuanwo in #7393
Other Changes
- refactor(index): rely on total ordering for nan zonemap max by @HaochengLIU in #7049
- refactor: remove as_vector_index from the Index trait by @westonpace in #7392
Full Changelog: release-root/9.0.0-beta.N...v9.0.0-beta.1
v8.0.0-rc.1
What's Changed
Breaking Changes 🛠
- feat!: migrate bitmap to index segment based by @Xuanwo in #6869
- fix(python)!: derive index type from details instead of opening the index by @wjones127 in #6903
- refactor!: remove index segment builder by @Xuanwo in #6997
- refactor(index)!: move distributed BTree build to segmented index framework by @zhangyue19921010 in #7013
- feat!: return write summaries from file writers by @Xuanwo in #7096
- perf!: avoid listing index files after writes by @Xuanwo in #7129
- fix(dataset)!: fail-fast casting for columns with attached indices by @WenDing-Y in #7158
- feat(vector)!: add approx mode for RaBitQ search by @BubbleCal in #7179
- perf(vector)!: add dedicated SIMD kernels for RaBitQ ex-code reranking by @BubbleCal in #7205
Critical Fixes ‼️
- fix: merge_insert silently drops matches when a leading payload column is all-null by @Ar-maan05 in #7251
New Features 🎉
- feat: expose tracked_files and all_files on LanceDataset by @wjones127 in #6011
- feat: add commit timeout to CommitBuilder by @wjones127 in #6773
- feat: add segmented BTree index merge_segments support by @zhangyue19921010 in #6889
- feat(index): add streaming ivf kmeans training by @BubbleCal in #6913
- feat: use indexes to accelerate filtered count_rows by @westonpace in #6916
- feat(java): allow schema override for fragment writes by @beinan in #6919
- feat: make ICU the default FTS tokenizer by @Xuanwo in #6968
- feat: support Utf8View and BinaryView in encoding and filter coercion by @xuanyu-z in #6985
- feat(python): add shared RaBitQ rotation for distributed IVF_RQ builds by @gstamatakis95 in #7014
- feat(blobv2): support all BlobKind types in blob v2 compact_files by @yyzhao2025 in #7017
- feat: add TOS object store support via OpenDAL by @ddupg in #7019
- feat(index): implement FM-Index scalar index for exact substring search by @beinan in #7026
- feat(python): expose external blob mode and outside-base option for fragments by @plotor in #7028
- feat(lance-io): add GooseFS object store provider by @XuQianJin-Stars in #7034
- feat(index): support multi-bit IVF_RQ storage by @BubbleCal in #7038
- feat: expose getters for ScalarIndexExec by @LuQQiu in #7039
- feat: expose methods and getters of scalar index for possible distributed execution by @LuQQiu in #7045
- feat(mem_wal): configurable HNSW build params for MemTable writers by @touch-of-grey in #7054
- feat: dedup FTS results across LSM tiers in LsmFtsSearchPlanner by @hamersaw in #7066
- feat(index): support raw-query ivf rq search by @BubbleCal in #7078
- feat: add EnforceDistribution to physical optimizer by @wjones127 in #7086
- feat(java): add missing scanner and merge insert params to align with Python/Rust by @WenDing-Y in #7100
- feat: populate enriched IndexContent fields in dir namespace ListTableIndices by @wjones127 in #7109
- feat(index): support configurable multi-segment FM-Index builds by @beinan in #7123
- feat: support merging zonemap index segments by @Xuanwo in #7128
- feat(index): accelerate regex and infix LIKE with the ngram index by @wombatu-kun in #7139
- feat(rust): add cleanup explain API by @yanghua in #7147
- feat: stabilize cache codec with a versioned envelope by @wjones127 in #7163
- feat(lance-select): expose selected rows accessor on NullableRowAddrSet by @LuQQiu in #7164
- feat: branch-aware table version ops in directory and rest namespaces by @brendanclement in #7166
- feat(python): expose segment FTS build through create_index_uncommitted by @ddupg in #7170
- feat(python): expose zonemap segment builds by @everySympathy in #7177
- feat(dir-catalog): add reader/writer feature flags to __manifest by @jackye1995 in #7191
- feat(index): expose per-query I/O metrics on ANN operators by @wombatu-kun in #7204
- feat(mem-wal): snapshot-consistent as-of cut for fresh-tier membership by @hamersaw in #7215
- feat: expose io_buffer_size in CompactionOptions by @aimanmalib in #7226
- fix(python): expose stable row id property in stub by @BubbleCal in #7249
- feat: bump lance-namespace-reqwest-client to 0.8.6 (source_task_size) by @justinrmiller in #7254
- feat: configure blob inline threshold per column by @Xuanwo in #7269
- feat(mem_wal): warm flushed generations into shared caches before query by @hamersaw in #7284
- feat(java): expose RTree scalar index type to Java by @zhangyue19921010 in #7291
- feat: expose session cache key inventory by @jackye1995 in #7298
- feat: support mixed-language FTS stop words by @Xuanwo in #7324
- feat(scalar): expose LogicalScalarIndex::try_new and load_named_scalar_segments by @LuQQiu in #7339
- feat: allow tuning miniblock value chunks to 32k by @Xuanwo in #7356
Bug Fixes 🐛
- fix(encoding): plan sparse structural miniblock pages by @Xuanwo in #6787
- fix(java): resolve JNI classloader bug on dispatcher thread in Spark by @sezruby in #6946
- fix: preserve zero-length buffers in binary copy compaction by @zhangyue19921010 in #6992
- fix(storage): retry throttled fts metadata listing by @BubbleCal in #6994
- fix: support multivector IVF centroids in segment builds by @ddupg in #6995
- fix(python): validate BFloat16.from_bytes length by @ddupg in #6998
- fix(test): tolerate boundary-tie in IVF distance_range assertions by @xuanyu-z in #6999
- fix(datafusion): coerce filter literals for dictionary-encoded columns by @valkum in #7003
- fix: stream object copies larger than cloud's CopyObject limit by @vivek-bharathan in #7004
- fix: restore simple FTS tokenizer default by @Xuanwo in #7006
- fix: expose Hugging Face download mode by @Xuanwo in #7022
- fix(python): clamp target partition sizing by @ddupg in #7036
- fix(fts): handle empty query tokens in flat full-text search by @vivek-bharathan in #7046
- fix(lance-index): fix some flaky tests by @XuQianJin-Stars in #7052
- fix: specify roaring's patch version by @HuaHuaY in #7056
- fix(filtered-read): record IO metrics even when filter matches no rows by @westonpace in #7057
- fix: handle nested JSON conversion recursively by @Xuanwo in #7060
- fix: retry S3 multipart request timeouts by @Xuanwo in #7061
- fix(fts): size cached posting lists by referenced slice by @vivek-bharathan in #7068
- fix: cap exec-node parallelism to DataFusion target_partitions by @wjones127 in #7087
- fix(mem-wal): fence predecessor with a WAL sentinel on claim by @hamersaw in #7110
- fix(fts): reset TokenSet next_id and total_length after remap by @vivek-bharathan in #7115
- fix(linalg): reduce cosine bench TOTAL to avoid FixedSizeBinaryArray i32 overflow by @westonpace in #7116
- fix: compile AVX-512 dist table for target CPU by @Xuanwo in https...