lsCOMP (light source COMPression) is a user-friendly and fast GPU lossy/lossless compressor for light source data and unsigned integers (both uint32 and uint16).
Both compression and decompression in lsCOMP are fully executed in a single NVIDIA GPU kernel without CPU intervention, which enables high end-to-end performance.
Supporting both configurable lossy and lossless compression modes, lsCOMP can be used in diverse scenarios that require different levels of data fidelity.
lsCOMP is suitable not only for light source data but also for generic integer compression that requires high speed (e.g., visualization datasets from Open Scivis Datasets).
This work is published in [SC'25] lsCOMP: Efficient Light Source Compression.
- Developer: Yafan Huang, Sheng Di, Robert Underwood
- Contributors: Peco Myint, Miaoqi Chu, Guanpeng Li, Nicholas Schwarz, and Franck Cappello
- Contact:
[email protected]
- Linux OS with NVIDIA GPUs
- Git >= 2.15
- CMake >= 3.21
- CUDA Toolkit >= 11.0
- GCC >= 7.3.0
You can compile lsCOMP with the following commands:
$ git clone https://github.com/szcompressor/lsCOMP.git
$ cd lsCOMP
$ mkdir build && cd build
$ cmake ..
$ make -j
After compilation, you will see 2 executable binaries, lsCOMP_uint32 and lsCOMP_uint16, in lsCOMP/build/.
They are used for performing either configurable lossy or lossless compression for uint32 and uint16 data.
We use lsCOMP_uint32 here to explain; the usage of lsCOMP_uint16 is similar.
$ ./lsCOMP_uint32 --help
lsCOMP Usage:
./lsCOMP_uint32 -i oriFilePath -d dims.x dims.y dims.z -b quantBins.x quantBins.y quantBins.z quantBins.w -p value -x cmpFilePath -o decFilePath
Options:
-i oriFilePath: Path to the original data file
-d dims.x dims.y dims.z: Dimensions of the original data, where dim.z is the fastest dimension.
-b quantBins.x quantBins.y quantBins.z quantBins.w: Quantization bins for the 4 levels, where x is the base one and x<=y<=z<=w.
-p value: Pooling threshold for a data block.
-x cmpFilePath: Path to the compressed data file (optional).
-o decFilePath: Path to the decompressed data file (optional).
Examples:
./lsCOMP_uint32 -i data/cssi.bin -d 600 1813 1558 -b 3 5 10 15 -p 0.5
./lsCOMP_uint32 -i data/cssi.bin -d 600 1813 1558 -b 3 5 10 15 -p 0.5 -x data/cssi-cmp.bin
./lsCOMP_uint32 -i data/cssi.bin -d 600 1813 1558 -b 3 5 10 15 -p 0.5 -o data/cssi-dec.bin
./lsCOMP_uint32 -i data/cssi.bin -d 600 1813 1558 -b 3 5 10 15 -p 0.5 -x data/cssi-cmp.bin -o data/cssi-dec.bin
Note that lsCOMP supports configurable lossy modes, consisting of two steps: Adaptive Scalar Quantization and Selective Pooling.
You may choose to enable either step individually, enable both, or disable both (which makes lsCOMP operate in a lossless mode).
To disable Adaptive Scalar Quantization, set -b 1 1 1 1; to disable Selective Pooling, set -p 1.
More details about these algorithms can be found in the paper.
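To try the commands above without the CSSI dataset, a small synthetic input can be generated with NumPy. The following is a minimal sketch, not part of lsCOMP: it assumes the input file is headerless raw binary in native byte order (consistent with the dimensions being passed via -d), and the helper name write_synthetic_uint32, the file name, and the chosen dimensions are hypothetical. Whether lsCOMP accepts dimensions this small is not stated here, so they may need to be enlarged.

```python
import os
import tempfile

import numpy as np


def write_synthetic_uint32(path, dims, max_value=16, seed=0):
    """Write a random uint32 volume as raw binary.

    dims is (x, y, z). NumPy's default C order keeps the last axis
    contiguous, which matches "dims.z is the fastest dimension".
    """
    rng = np.random.default_rng(seed)
    data = rng.integers(0, max_value, size=dims, dtype=np.uint32)
    data.tofile(path)  # raw values only, no header (assumed input format)
    return data


dims = (4, 32, 64)  # hypothetical small volume
path = os.path.join(tempfile.gettempdir(), "synthetic_uint32.bin")
original = write_synthetic_uint32(path, dims)

# Read the file back the same way a decompressed file could be inspected.
restored = np.fromfile(path, dtype=np.uint32).reshape(dims)
assert np.array_equal(original, restored)

print(f"wrote {os.path.getsize(path)} bytes to {path}")
print(f"pass the dimensions as: -d {dims[0]} {dims[1]} {dims[2]}")
```

In lossless mode (-b 1 1 1 1 -p 1), the same np.fromfile call applied to the file written by -o should reproduce the original array exactly, which gives a simple round-trip check.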
A sample output (lossless mode) is shown below:
$ ./lsCOMP_uint32 -i cssi_600_1813_1558.uint32 -d 600 1813 1558 -b 1 1 1 1 -p 1 -x cmp.bin -o dec.bin
Section 0: lsCOMP Input Preparation
→ Read data from disk...
✓ Done.
→ Transfer data to GPU...
✓ Done.
Section 1: GPU Warmup
→ Performing GPU warmup runs for 3 iterations...
✓ Done.
Section 2: lsCOMP Compression and Decompression
→ lsCOMP GPU compression...
✓ Done.
→ Verify compressed data correctness via GPU-CPU-GPU transfer (optional step)...
✓ Done.
→ lsCOMP GPU decompression...
✓ Done.
Section 3: Output Data Writing (optional step)
→ Write compressed data from GPU to CPU...
✓ Done.
→ Write compressed data to from CPU to disk...
✓ Done.
→ Write decompressed data from GPU to CPU...
✓ Done.
→ Write decompressed data from CPU to disk...
✓ Done.
====================================
========== lsCOMP Summary ==========
====================================
Dataset information:
- dims: 600 x 1813 x 1558
- length: 1694792400
Input arguments:
- quantBins: 1 1 1 1
- poolingTH: 1.000000
Breakdown of time costs:
- Read data from disk time: 2.681619 s
- CPU data transfer to GPU: 0.530205 s
- GPU compression time: 0.012865 s
- GPU-CPU data tranfer time: 0.186769 s (optional step, flushing cmpData to 0 for verification)
- GPU decompression time: 0.019442 s
- GPU data transfer to CPU: 2.447481 s (optional step, only used when -x/-o flag is used)
- Write data to disk time: 5.952488 s (optional step, only used when -x/-o flag is used)
lsCOMP performance results:
lsCOMP compression end-to-end speed: 490.759780 GB/s
lsCOMP decompression end-to-end speed: 324.740466 GB/s
lsCOMP compression ratio: 16.171549
- oriSize: 6779169600 bytes
- cmpSize: 419203488 bytes
A detailed breakdown of the execution is printed. This result was measured on an NVIDIA A100 (40 GB) GPU.
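The summary figures can be cross-checked from the printed sizes and kernel times. The sketch below is not part of lsCOMP, and the function name summarize is hypothetical. It assumes the reported speed is the original size divided by the GPU kernel time, expressed in units of 2^30 bytes per second; the numbers in the sample output appear consistent with that reading, but the source code is the authority on the exact definition.

```python
GIB = 2 ** 30  # assumed unit behind the "GB/s" figures in the sample output


def summarize(dims, bytes_per_value, cmp_size, cmp_time_s, dec_time_s):
    """Recompute the summary quantities from sizes (bytes) and times (seconds)."""
    length = dims[0] * dims[1] * dims[2]
    ori_size = length * bytes_per_value
    return {
        "length": length,
        "ori_size": ori_size,
        "ratio": ori_size / cmp_size,
        "cmp_speed": ori_size / GIB / cmp_time_s,
        "dec_speed": ori_size / GIB / dec_time_s,
    }


# Values copied from the sample output (uint32 means 4 bytes per value).
s = summarize((600, 1813, 1558), 4, 419203488, 0.012865, 0.019442)

print("length           :", s["length"])
print("oriSize (bytes)  :", s["ori_size"])
print("compression ratio: %.6f" % s["ratio"])
print("compression speed: %.2f" % s["cmp_speed"])
print("decompress speed : %.2f" % s["dec_speed"])
```

Because the times are printed with six decimal places, the recomputed speeds agree with the reported ones only to within rounding.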
lsCOMP also provides Python bindings for fast GPU compression.
Examples for uint32 and uint16 can be found in the python folder.
The required Python packages are ctypes (part of the Python standard library), pycuda, and numpy.
The command-line usage of the Python bindings is consistent with the C/C++ executables. Taking uint32 compression as an example, the usage is shown below:
$ python example_uint32.py
usage: example_uint32.py [-h] -i ORI_PATH -d DX DY DZ -b BX BY BZ BW -p POOLING [-x CMP_PATH] [-o DEC_PATH]
A sample compression run is shown below.
$ python example_uint32.py -i cssi-128.bin -d 128 1813 1558 -b 1 1 1 1 -p 1
=== lsCOMP uint32 example ===
Input file : cssi-128.bin
Dims : (128, 1813, 1558) (z fastest)
Quantization : (1, 1, 1, 1)
Pooling SH : 1.0
Original size : 1379.226 MiB
Compressed size : 59.126 MiB
Compression ratio (orig/cmp): 23.327x
Compression time: 6.133 ms
Compression TP : 235.815 GB/s
Decompress time : 4.242 ms
Decompress TP : 340.932 GB/s
Max abs diff : 0
The Python bindings are based on the compiled shared library liblsCOMP.so, which is built in the build/ folder by default.
If you built this library in a different location, please update the path in python/lsCOMP.py accordingly.
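Before editing the path, it can help to confirm where liblsCOMP.so actually lives and that it loads. The following is a minimal sketch using only the standard library. It is not how python/lsCOMP.py locates the library: the helper find_library, the LSCOMP_LIB_DIR environment variable, and the list of search directories are all hypothetical.

```python
import ctypes
import os


def find_library(candidates, name="liblsCOMP.so"):
    """Return the absolute path of `name` in the first directory that has it, else None."""
    for directory in candidates:
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            return os.path.abspath(path)
    return None


# Hypothetical search order: an optional override variable, then build/
# relative to the repository root and to the python/ folder.
search_dirs = [
    os.environ.get("LSCOMP_LIB_DIR", ""),
    "build",
    os.path.join("..", "build"),
]
search_dirs = [d for d in search_dirs if d]

lib_path = find_library(search_dirs)
if lib_path is None:
    print("liblsCOMP.so not found in:", search_dirs)
else:
    # ctypes.CDLL raises OSError if the file exists but cannot be loaded,
    # for example when the CUDA runtime libraries are missing.
    lib = ctypes.CDLL(lib_path)
    print("loaded", lib_path)
```

The absolute path printed on success is the value to use when adjusting python/lsCOMP.py.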
If you find lsCOMP useful, please consider citing the following paper.
@inproceedings{huang2025lscomp,
title={lsCOMP: Efficient Light Source Compression},
author={Huang, Yafan and Di, Sheng and Underwood, Robert and Myint, Peco and Chu, Miaoqi and Li, Guanpeng and Schwarz, Nicholas and Cappello, Franck},
booktitle={Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis},
pages={1--18},
year={2025}
}
(C) 2025 by Argonne National Laboratory and University of Iowa. For more details see COPYRIGHT.
This work is supported by the U.S. Department of Energy (DOE) Office of Science, Advanced Scientific Computing Research (ASCR) and Basic Energy Sciences (BES) under the award "ILLUMINE - Intelligent Learning for Light Source and Neutron Source User Measurements Including Navigation and Experiment Steering." This work also received support from the DOE Office of Science ASCR Leadership Computing Challenge (ALCC) through the 2025–2026 award "Enhancing APS-Enabled Research through Integrated Research Infrastructure." This research used resources of the Advanced Photon Source (APS) and the Argonne Leadership Computing Facility (ALCF), both U.S. DOE Office of Science user facilities operated by Argonne National Laboratory under Contract No. DE-AC02-06CH11357. Additional computing resources were provided by the National Energy Research Scientific Computing Center (NERSC) under ALCC award ERCAP0030693 and the Oak Ridge Leadership Computing Facility (OLCF) at Oak Ridge National Laboratory under Contract No. DE-AC05-00OR22725. We also acknowledge computing resources provided by ALCF Polaris and the Argonne Laboratory Computing Resource Center (LCRC) Swing. This work was further supported by the U.S. DOE Office of Science, ASCR, under Contract DE-SC0024559, as well as National Science Foundation (NSF) grants OAC-2104023, OAC-2211538, OAC-2311875, OAC-2514036, and OAC-2513768.