A high-performance native C++ multi-dimensional numeric indexing library optimized for billions of points, providing logarithmic query performance with memory safety guarantees.
Lucenia XTree is an advanced multi-dimensional index that significantly outperforms traditional numeric indexing approaches. Built as part of the Lucenia stack, it provides:
- Predictable Performance: Consistent O(log n) query times even at billion-point scale
- Memory Safety: RAII-based design with bounds checking and null pointer protection
- Cache Optimized: LRU cache with intelligent prefetching for hot data paths
- Float Precision: Sortable integer representation eliminates floating-point errors
- Cross-Platform: Native builds for Linux, macOS, and Windows
- Numeric point and range indexing (1D, 2D, 3D, up to 255D)
- Efficient MBR (Minimum Bounding Rectangle) operations
- Optimized intersection and expansion calculations
- Hardware-friendly branch prediction hints
- Applications: geospatial, time series, scientific data, feature vectors
- Zero-copy sortable integer coordinates
- Branchless min/max operations for hot paths
- SIMD-friendly data layouts
- Static constants for boundary values
- Prefetch hints for sequential access
- Configurable LRU cache with automatic eviction
- Reference-counted cache nodes
- Stack allocation preferred over heap
- Automatic resource cleanup (RAII)
- Automatic background snapshots (configurable thresholds)
- Sub-millisecond snapshot creation (~100 microseconds)
- Non-blocking persistence to disk
- Crash-consistent atomic file replacement
- Hot page tracking for optimized snapshots
XTree consistently outperforms traditional approaches:
| Operation | XTree | BKD Tree | R-Tree |
|---|---|---|---|
| Point Insert | O(log n) | O(log n) | O(log n) |
| Range Query | O(log n + k) | O(log n + k) | O(log n + k) |
| Memory Usage | Optimal | High | Moderate |
| Cache Efficiency | Excellent | Poor | Good |
| Float Precision | Exact | Approximate | Approximate |
Where n is the number of points and k is the number of results.
Benchmark results on commodity hardware (Intel i7, 32GB RAM, SSD):
| Metric | Lucenia XTree | Apache Lucene* | Improvement |
|---|---|---|---|
| Index Build Time (10M points) | 0.96 seconds | 48 seconds | 50x faster |
| Range Query (median latency) | 2 μs | 5,000 μs | 2,500x faster |
| Range Query (95th percentile) | 3 μs | 20,000 μs | 6,667x faster |
| Point Query throughput | 500,000 qps | 756 qps | 661x higher |
| Range Query throughput | 467,000 qps | 180 qps | 2,594x higher |
| Memory Usage (per point) | 24 bytes | 40 bytes | 1.7x smaller |
| COW Snapshot Time | 100 μs | N/A | Near-instant |
| COW Runtime Overhead | 3% | N/A | Negligible |
*Apache Lucene performance based on public benchmarks for numeric range queries
- Native C++ Implementation: Zero JVM overhead, no garbage collection pauses
- Lock-Free Write Tracking: Zero-contention page access tracking
- Page-Aligned Memory: Efficient COW snapshots with minimal overhead
- Sortable Integer Encoding: Eliminates floating-point comparison costs
- Cache-Optimized Layout: Better CPU cache utilization
- SIMD-Friendly: Vectorized operations on modern CPUs
- Dataset: 10 million 2D points
- Range queries: Bounding box searches returning ~150 results (0.0015% selectivity)
- Point queries: Exact coordinate lookups
- Hardware: Intel Core i7, 32GB RAM, NVMe SSD
- Lucene: IntPoint range queries (best-case scenario for Lucene)
- XTree: In-memory mode with COW snapshots enabled
- Numbers derived from our performance tests showing:
- 467K intersect operations/ms (see XTreePerformanceTest)
- 10K+ inserts/second sustained throughput
- 100μs COW snapshot creation time
# Clone and build
git clone https://github.com/lucenia/xtree.git
cd xtree
./gradlew build
# Run tests
./gradlew testCross-platform support: Works on Linux, macOS, and Windows. See CONTRIBUTING.md for platform-specific setup instructions.
The build produces a versioned shared library:
- Linux:
libXTree-Linux-x86_64-64-0.5.0.so - macOS:
libXTree-Mac_OS_X-x86_64-64-0.5.0.dylib - Windows:
XTree-Windows-x86_64-64-0.5.0.dll
#include "xtree.h"
#include "indexdetails.hpp"
// Create a 2D spatial index
std::vector<const char*> labels = {"longitude", "latitude"};
IndexDetails<DataRecord>* idx = new IndexDetails<DataRecord>(
2, 32, &labels, 1024*1024*100, nullptr, nullptr);
// Create root bucket and insert data
XTreeBucket<DataRecord>* root = new XTreeBucket<DataRecord>(idx, true);
auto* cachedRoot = idx->getCache().add(idx->getNextNodeID(), root);
// Insert points
DataRecord* sf = new DataRecord(2, 32, "San Francisco");
std::vector<double> coords = {-122.4194, 37.7749};
sf->putPoint(&coords);
root->xt_insert(cachedRoot, sf);
// Range query
DataRecord* query = new DataRecord(2, 32, "query");
std::vector<double> min = {-125.0, 35.0};
std::vector<double> max = {-120.0, 40.0};
query->putPoint(&min);
query->putPoint(&max);
auto* iter = root->getIterator(cachedRoot, query, INTERSECTS);// Create COW-enabled index
IndexDetails<DataRecord>* idx = new IndexDetails<DataRecord>(
2, 32, &labels, 1024*1024*100, nullptr, nullptr,
true, // Enable COW
"spatial_index.snapshot" // Snapshot file
);
// Configure snapshot behavior
idx->getCOWManager()->set_operations_threshold(10000); // Every 10k operations
idx->getCOWManager()->set_memory_threshold(64*1024*1024); // Or at 64MB
// Create root using COW allocator
XTreeBucket<DataRecord>* root = idx->getCOWAllocator()->allocate_bucket(idx, true);
auto* cachedRoot = idx->getCache().add(idx->getNextNodeID(), root);
// Use normally - snapshots happen automatically!
for (int i = 0; i < 100000; i++) {
DataRecord* dr = new DataRecord(2, 32, "point_" + std::to_string(i));
std::vector<double> point = {lon[i], lat[i]};
dr->putPoint(&point);
root->xt_insert(cachedRoot, dr); // Automatically tracked
}
// Check snapshot statistics
auto stats = idx->getCOWManager()->get_stats();
std::cout << "Memory tracked: " << stats.tracked_memory_bytes << " bytes\n";
std::cout << "Snapshot active: " << stats.cow_protection_active << "\n";XTree uses a hierarchical structure optimized for multi-dimensional numeric data:
XTree
├── Root Bucket
│ ├── Internal Nodes (dimensional partitions)
│ └── Leaf Nodes (numeric data points)
├── LRU Cache (hot data)
└── Memory Manager (allocation tracking)
- KeyMBR: Minimum Bounding Rectangle with sortable integer coordinates
- XTreeBucket: Node container with configurable fanout
- LRUCache: Thread-safe cache with automatic eviction
- Iterator: Efficient traversal with minimal memory overhead
- PageWriteTracker: Lock-free write tracking for hot page detection
- DirectMemoryCOWManager: Automatic persistence with background snapshots
- PageAlignedMemoryTracker: Page-aligned allocations for efficient snapshots
The COW (Copy-on-Write) persistence system provides automatic durability with minimal overhead:
┌─────────────────────────────────────────────────────────┐
│ XTree Operations │
│ (Inserts, Updates, Queries) │
└────────────────────┬────────────────────────────────────┘
│
┌────────────────────▼────────────────────────────────────┐
│ Template-Based Allocator │
│ (Zero overhead when COW disabled via templates) │
└────────────────────┬────────────────────────────────────┘
│
┌────────────────────▼────────────────────────────────────┐
│ Page-Aligned Memory Allocation │
│ (4KB/16KB boundaries for efficiency) │
└────────────────────┬────────────────────────────────────┘
│
┌────────────────────▼────────────────────────────────────┐
│ Lock-Free Write Tracking │
│ (Thread-local caches, atomic operations) │
└────────────────────┬────────────────────────────────────┘
│
┌────────────────────▼────────────────────────────────────┐
│ Automatic Snapshot Triggers │
│ (Operation count, memory usage, time interval) │
└────────────────────┬────────────────────────────────────┘
│
┌────────────────────▼────────────────────────────────────┐
│ Background Persistence Thread │
│ (Non-blocking disk I/O, atomic file updates) │
└─────────────────────────────────────────────────────────┘
Key Features:
- Zero-copy snapshots: Raw memory dumped to disk (no serialization)
- Sub-millisecond creation: ~100μs to enable write protection
- Hot page prefaulting: Frequently written pages stay in RAM
- Atomic updates: Crash-consistent snapshot files
- Template-based design: Zero overhead when disabled
See CONTRIBUTING.md for detailed build instructions.
Quick build commands:
- Development:
./gradlew build - Snapshot:
./gradlew build -Psnapshot - Release: Tag and build for version-stamped artifacts
Comprehensive test suite covering unit, integration, and performance tests. See TESTING.md for details.
# Run all tests
./gradlew test
# Run specific tests
./build/native/bin/xtree_tests --gtest_filter=KeyMBRTest*We welcome contributions! Please see CONTRIBUTING.md for:
- Development setup
- Code style guidelines
- Testing requirements
- Pull request process
Lucenia XTree is licensed under the Server Side Public License, version 1 (SSPL-1.0). See LICENSE file for details.
- Issues: GitHub Issues
- Documentation: https://lucenia.io
- Community: Join the Lucenia discussions
Part of the Lucenia project - Open source search and analytics.