Add cache_layer_indices to reduce wavefunction storage by selecting specific slice depths #95

Open
HaoranLMaoMao wants to merge 3 commits into h-walk:main from HaoranLMaoMao:feat/cache-layer-indices

Problem

When cache_levels=["slices"] is set, MultisliceCalculator stores exit-wavefunctions for all nz propagation slices. For a typical simulation (440 slices × 4,096 MD frames × 25 probes × complex128) this produces ~4 TB per block. Users who only need EELS spectra at a few target thicknesses are forced to store and manage tens of terabytes of intermediate data, even though only a small fraction is ever used downstream.


Solution

This pull request adds a single optional parameter cache_layer_indices to setup():

ms.setup(
    cache_levels=["slices"],
    cache_layer_indices=[43, 87, 175, 263, 351, 439],  # None = all layers (default)
)

When set, only the listed layers are FFT'd and stored in frame_data / wavefunction_data; the remaining layers are discarded immediately after propagation without any FFT or disk write. The full multislice propagation through all nz slices still runs (physically required). WFData.layer is updated to hold the actual layer indices instead of arange(nz).


Changes

  • MultisliceCalculator.setup(): new parameter cache_layer_indices: Optional[List[int]] = None
  • MultisliceCalculator.run(): introduces self._active_layers to replace the hardcoded range(self.n_layers) loop; allocation of wavefunction_data and frame_data uses len(self._active_layers) as the last dimension
  • WFData.layer: now holds the actual slice indices (e.g. [43, 87, 175, 263, 351, 439]) instead of arange(nz), so downstream code can map a target depth back to its compact storage slot via list(wave.layer).index(layer_idx). When cache_layer_indices=None, wave.layer remains identical to the original arange(nz).
  • Fully backward-compatible: cache_layer_indices=None (default) preserves existing behaviour exactly — no changes required to existing scripts
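
The selection logic described above can be sketched in a few lines of NumPy. This is a toy illustration, not the code in this PR: `propagate_with_selected_layers`, `phase_step`, and the validation rules (no duplicates, indices within range) are hypothetical, and the single-slice propagator is replaced by a trivial phase multiplication. It shows the key point: every slice is still propagated, but only the requested layers are FFT'd and written into a compact last axis.

```python
import numpy as np


def propagate_with_selected_layers(psi0, n_layers, step, cache_layer_indices=None):
    """Toy stand-in for the run() loop (hypothetical, not MultisliceCalculator code).

    `step(psi, layer)` applies one slice; every slice is always propagated,
    but only layers listed in `cache_layer_indices` are FFT'd and stored.
    """
    if cache_layer_indices is None:
        active_layers = list(range(n_layers))  # default: keep every layer
    else:
        active_layers = [int(i) for i in cache_layer_indices]
        # Assumed validation rules; the PR does not state how bad input is handled.
        if len(set(active_layers)) != len(active_layers):
            raise ValueError("cache_layer_indices contains duplicates")
        if any(i < 0 or i >= n_layers for i in active_layers):
            raise ValueError("cache_layer_indices out of range")

    # Map physical layer index -> compact storage slot.
    slot_of = {layer: slot for slot, layer in enumerate(active_layers)}
    stored = np.empty(psi0.shape + (len(active_layers),), dtype=np.complex128)

    psi = psi0
    for layer in range(n_layers):
        psi = step(psi, layer)          # propagation runs for all slices
        slot = slot_of.get(layer)
        if slot is not None:            # FFT and store only the selected ones
            stored[..., slot] = np.fft.fft2(psi)
    return stored, np.asarray(active_layers)


def phase_step(psi, layer):
    # Placeholder for transmit + propagate through a single slice.
    return psi * np.exp(0.05j * (layer + 1))


psi0 = np.ones((8, 8), dtype=np.complex128)
stored, layers = propagate_with_selected_layers(
    psi0, 440, phase_step, cache_layer_indices=[43, 87, 175, 263, 351, 439]
)
print(stored.shape, layers)
```

Keeping a dict from layer index to storage slot makes the per-slice membership check O(1), which matters little for 440 slices but keeps the loop body simple.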

Downstream usage example

Users specify layer indices directly. The corresponding physical depths can be derived from slice_thickness for reference:

slice_thickness = 1.0   # Angstrom, same value passed to ms.setup()
target_layers   = [43, 87, 175, 263, 351, 439]
 
# Show what physical depth each layer corresponds to
for layer_idx in target_layers:
    depth_nm = (layer_idx + 1) * slice_thickness / 10
    print(f"  layer {layer_idx:4d}  -> {depth_nm:5.2f} nm")
# layer   43  ->  4.40 nm
# layer   87  ->  8.80 nm
# layer  175  -> 17.60 nm
# layer  263  -> 26.40 nm
# layer  351  -> 35.20 nm
# layer  439  -> 44.00 nm
 
ms.setup(
    cache_levels=["slices"],
    cache_layer_indices=target_layers,
    # ... remaining setup() arguments
)
wave = ms.run()
# wave.layer -> array([43, 87, 175, 263, 351, 439])
 
# Post-processing: map layer index to compact storage slot
for layer_idx in target_layers:
    out_idx = list(wave.layer).index(layer_idx)
    tacaw = TACAWData(wave, layer_index=out_idx)
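
For scripts that think in depths rather than layer indices, the lookup can be wrapped in two small helpers. These are a sketch only: `slot_for_layer` and `nearest_cached_layer` are hypothetical names that do not exist in the package, and the depth convention `(layer_idx + 1) * slice_thickness / 10` is taken from the example above. A plain array stands in for `wave.layer`.

```python
import numpy as np


def slot_for_layer(layers, layer_idx):
    """Return the compact storage slot that holds physical layer `layer_idx`."""
    matches = np.flatnonzero(np.asarray(layers) == layer_idx)
    if matches.size == 0:
        raise KeyError(f"layer {layer_idx} was not cached")
    return int(matches[0])


def nearest_cached_layer(layers, depth_nm, slice_thickness):
    """Return (slot, layer_idx) of the cached layer closest to `depth_nm`.

    `slice_thickness` is in Angstrom, matching the value passed to setup().
    """
    layers = np.asarray(layers)
    depths_nm = (layers + 1) * slice_thickness / 10.0
    slot = int(np.argmin(np.abs(depths_nm - depth_nm)))
    return slot, int(layers[slot])


# Stand-in for wave.layer after a run with cache_layer_indices set.
cached_layers = np.array([43, 87, 175, 263, 351, 439])

print(slot_for_layer(cached_layers, 175))
print(nearest_cached_layer(cached_layers, depth_nm=18.0, slice_thickness=1.0))
```

Raising on a missing layer, rather than returning a default slot, avoids silently analysing the wrong thickness when a script requests a depth that was never cached.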

Impact

For the amorphous-Si benchmark (nz = 440, saving 6 layers):

| | Before | After |
|---|---|---|
| wavefunction_data shape | (25, 4096, 111, 111, 440) | (25, 4096, 111, 111, 6) |
| Disk per block | ~4 TB | ~55 GB |
| Saving | | ~98.6% |
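
The storage figures can be checked with a short back-of-envelope calculation from the two array shapes. The helper `storage_bytes` is illustrative only. The element size is left as a parameter rather than assumed: the quoted ~4 TB and ~55 GB line up with 8 bytes per element (complex64, roughly 4.0 TiB and 56 GiB in binary units), while 16 bytes per element (complex128) would double both. The relative saving depends only on the layer count, 1 - 6/440.

```python
import numpy as np


def storage_bytes(shape, dtype):
    """Raw size in bytes of a dense array with the given shape and dtype."""
    n_elements = int(np.prod(shape, dtype=np.int64))
    return n_elements * np.dtype(dtype).itemsize


full_shape = (25, 4096, 111, 111, 440)   # all nz = 440 layers
subset_shape = (25, 4096, 111, 111, 6)   # six selected layers

for dtype in (np.complex64, np.complex128):
    before = storage_bytes(full_shape, dtype)
    after = storage_bytes(subset_shape, dtype)
    print(
        f"{np.dtype(dtype).name}: "
        f"{before / 2**40:.2f} TiB -> {after / 2**30:.1f} GiB "
        f"(saving {1 - after / before:.1%})"
    )
```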

Added cache_layer_indices parameter to control which slice layers are stored. Updated related documentation and logic to handle selective layer storage.
Updated comments for clarity and consistency.