[BUG] list index out of range from dataloader #235

YuanfengZhang · 2025-03-02T09:52:43Z

Describe the bug
MambularRegressor.fit raises IndexError: list index out of range when X and y are NumPy arrays taken from a pandas DataFrame with .values.

To Reproduce
Steps to reproduce the behavior:

from mambular.models import MambularRegressor
import pandas as pd
from sklearn.model_selection import train_test_split
import torch

torch.set_float32_matmul_precision('high')

df = pd.read_csv('./df.csv')
(X_train,
 X_test,
 y_train,
 y_test) = train_test_split(df[[i for i in df.columns if 'f' in i]].values,
                            df[['y']].values,
                            random_state=42)

regressor = MambularRegressor()
regressor.fit(X_train, y_train, max_epochs=4)
print(regressor.evaluate(X_test, y_test))

Expected behavior
No error, regressor fitted.

Screenshots
Here is the full output:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[14], line 7
      4 torch.set_float32_matmul_precision('high')
      6 regressor = MambularRegressor()
----> 7 regressor.fit(X_train, y_train, max_epochs=4)
      8 print(regressor.evaluate(X_test, y_test))

File ~/mambaforge/envs/mambular/lib/python3.12/site-packages/mambular/models/sklearn_base_regressor.py:398, in SklearnBaseRegressor.fit(self, X, y, val_size, X_val, y_val, embeddings, embeddings_val, max_epochs, random_state, batch_size, shuffle, patience, monitor, mode, lr, lr_patience, lr_factor, weight_decay, checkpoint_path, dataloader_kwargs, train_metrics, val_metrics, rebuild, **trainer_kwargs)
    388 # Initialize the trainer and train the model
    389 self.trainer = pl.Trainer(
    390     max_epochs=max_epochs,
    391     callbacks=[
   (...)    396     **trainer_kwargs,
    397 )
--> 398 self.trainer.fit(self.task_model, self.data_module)  # type: ignore
    400 best_model_path = checkpoint_callback.best_model_path
    401 if best_model_path:

File ~/mambaforge/envs/mambular/lib/python3.12/site-packages/lightning/pytorch/trainer/trainer.py:539, in Trainer.fit(self, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path)
    537 self.state.status = TrainerStatus.RUNNING
    538 self.training = True
--> 539 call._call_and_handle_interrupt(
    540     self, self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
    541 )

File ~/mambaforge/envs/mambular/lib/python3.12/site-packages/lightning/pytorch/trainer/call.py:47, in _call_and_handle_interrupt(trainer, trainer_fn, *args, **kwargs)
     45     if trainer.strategy.launcher is not None:
     46         return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
---> 47     return trainer_fn(*args, **kwargs)
     49 except _TunerExitException:
     50     _call_teardown_hook(trainer)

File ~/mambaforge/envs/mambular/lib/python3.12/site-packages/lightning/pytorch/trainer/trainer.py:575, in Trainer._fit_impl(self, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path)
    568 assert self.state.fn is not None
    569 ckpt_path = self._checkpoint_connector._select_ckpt_path(
    570     self.state.fn,
    571     ckpt_path,
    572     model_provided=True,
    573     model_connected=self.lightning_module is not None,
    574 )
--> 575 self._run(model, ckpt_path=ckpt_path)
    577 assert self.state.stopped
    578 self.training = False

File ~/mambaforge/envs/mambular/lib/python3.12/site-packages/lightning/pytorch/trainer/trainer.py:982, in Trainer._run(self, model, ckpt_path)
    977 self._signal_connector.register_signal_handlers()
    979 # ----------------------------
    980 # RUN THE TRAINER
    981 # ----------------------------
--> 982 results = self._run_stage()
    984 # ----------------------------
    985 # POST-Training CLEAN UP
    986 # ----------------------------
    987 log.debug(f"{self.__class__.__name__}: trainer tearing down")

File ~/mambaforge/envs/mambular/lib/python3.12/site-packages/lightning/pytorch/trainer/trainer.py:1024, in Trainer._run_stage(self)
   1022 if self.training:
   1023     with isolate_rng():
-> 1024         self._run_sanity_check()
   1025     with torch.autograd.set_detect_anomaly(self._detect_anomaly):
   1026         self.fit_loop.run()

File ~/mambaforge/envs/mambular/lib/python3.12/site-packages/lightning/pytorch/trainer/trainer.py:1053, in Trainer._run_sanity_check(self)
   1050 call._call_callback_hooks(self, "on_sanity_check_start")
   1052 # run eval step
-> 1053 val_loop.run()
   1055 call._call_callback_hooks(self, "on_sanity_check_end")
   1057 # reset logger connector

File ~/mambaforge/envs/mambular/lib/python3.12/site-packages/lightning/pytorch/loops/utilities.py:179, in _no_grad_context.<locals>._decorator(self, *args, **kwargs)
    177     context_manager = torch.no_grad
    178 with context_manager():
--> 179     return loop_run(self, *args, **kwargs)

File ~/mambaforge/envs/mambular/lib/python3.12/site-packages/lightning/pytorch/loops/evaluation_loop.py:119, in _EvaluationLoop.run(self)
    117 @_no_grad_context
    118 def run(self) -> list[_OUT_DICT]:
--> 119     self.setup_data()
    120     if self.skip:
    121         return []

File ~/mambaforge/envs/mambular/lib/python3.12/site-packages/lightning/pytorch/loops/evaluation_loop.py:201, in _EvaluationLoop.setup_data(self)
    198 self._max_batches = []
    199 for dl in combined_loader.flattened:
    200     # determine number of batches
--> 201     length = len(dl) if has_len_all_ranks(dl, trainer.strategy, allow_zero_length) else float("inf")
    202     limit_batches = getattr(trainer, f"limit_{stage.dataloader_prefix}_batches")
    203     num_batches = _parse_num_batches(stage, length, limit_batches)

File ~/mambaforge/envs/mambular/lib/python3.12/site-packages/lightning/pytorch/utilities/data.py:99, in has_len_all_ranks(dataloader, strategy, allow_zero_length_dataloader_with_multiple_devices)
     93 def has_len_all_ranks(
     94     dataloader: object,
     95     strategy: "pl.strategies.Strategy",
     96     allow_zero_length_dataloader_with_multiple_devices: bool = False,
     97 ) -> TypeGuard[Sized]:
     98     """Checks if a given object has ``__len__`` method implemented on all ranks."""
---> 99     local_length = sized_len(dataloader)
    100     if local_length is None:
    101         # __len__ is not defined, skip these checks
    102         return False

File ~/mambaforge/envs/mambular/lib/python3.12/site-packages/lightning/fabric/utilities/data.py:52, in sized_len(dataloader)
     49 """Try to get the length of an object, return ``None`` otherwise."""
     50 try:
     51     # try getting the length
---> 52     length = len(dataloader)  # type: ignore [arg-type]
     53 except (TypeError, NotImplementedError):
     54     length = None

File ~/mambaforge/envs/mambular/lib/python3.12/site-packages/torch/utils/data/dataloader.py:532, in DataLoader.__len__(self)
    530     return length
    531 else:
--> 532     return len(self._index_sampler)

File ~/mambaforge/envs/mambular/lib/python3.12/site-packages/torch/utils/data/sampler.py:365, in BatchSampler.__len__(self)
    363     return len(self.sampler) // self.batch_size  # type: ignore[arg-type]
    364 else:
--> 365     return (len(self.sampler) + self.batch_size - 1) // self.batch_size

File ~/mambaforge/envs/mambular/lib/python3.12/site-packages/torch/utils/data/sampler.py:128, in SequentialSampler.__len__(self)
    127 def __len__(self) -> int:
--> 128     return len(self.data_source)

File ~/mambaforge/envs/mambular/lib/python3.12/site-packages/mambular/data_utils/dataset.py:47, in MambularDataset.__len__(self)
     46 def __len__(self):
---> 47     return len(self.num_features_list[0])

IndexError: list index out of range
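
Reading the last frame: `MambularDataset.__len__` indexes `self.num_features_list[0]`, so the `IndexError` means that list was empty when Lightning asked the validation dataloader for its length. The traceback does not show why it was empty. The sketch below reproduces only that last step with a toy class. `ToyDataset`, `GuardedToyDataset`, and `describe_len` are hypothetical names for illustration and are not mambular's code; the guard is one possible shape for a clearer error, not the actual fix.

```python
class ToyDataset:
    """Hypothetical stand-in for a dataset that stores one sequence per numerical feature."""

    def __init__(self, num_features_list):
        self.num_features_list = num_features_list

    def __len__(self):
        # Same expression as the last traceback frame: fails if the list is empty.
        return len(self.num_features_list[0])


class GuardedToyDataset(ToyDataset):
    """Same toy dataset, with an explicit check before indexing (an assumption, not the real fix)."""

    def __len__(self):
        if not self.num_features_list:
            raise ValueError("no numerical features found; check the type of X passed to fit()")
        return super().__len__()


def describe_len(dataset):
    """Return the dataset length as text, or the name and message of the error it raises."""
    try:
        return f"ok: {len(dataset)}"
    except (IndexError, ValueError) as exc:
        return f"{type(exc).__name__}: {exc}"


unguarded = describe_len(ToyDataset([]))                  # empty feature list, no check
guarded = describe_len(GuardedToyDataset([]))             # empty feature list, with check
populated = describe_len(ToyDataset([[0.1, 0.2, 0.3]]))   # one feature, three rows

print(unguarded)
print(guarded)
print(populated)
```

With an empty list the unguarded version fails with the same message as in the traceback above, while the guarded version points at the input instead.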

Desktop (please complete the following information):

OS: Ubuntu 24.04
Python 3.12.9
Mambular Version 1.2.1

Here is my conda.yaml to create the env:

name: mambular
channels:
  - rapidsai
  - conda-forge
  - nvidia
  - defaults
dependencies:
  - pip
  - pip:
    - mambular
    - 'polars[excel,pyarrow]'
  - conda-forge::jupyterlab
  - conda-forge::shap

Additional context
The df.csv file is attached.

df.csv


AnFreTh · 2025-03-02T10:10:36Z

Thanks for raising this. As a quick fix, removing .values from X should solve the issue.
But let's leave the issue open so that we can implement a check/fix in the package.

(X_train, X_test, y_train, y_test) = train_test_split(
    df[[i for i in df.columns if "f" in i]], df[["y"]].values.squeeze(-1), random_state=42
)

As a side note: I would suggest normalizing your targets before training. This is not done internally, so you have to do it yourself.
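
A minimal sketch of the target normalization suggested above, assuming a single continuous target and plain standardization (zero mean, unit variance). The helper names `fit_target_scaler`, `scale`, and `unscale` are made up for this example and are not part of mambular; scikit-learn's `StandardScaler` would do the same job.

```python
import numpy as np


def fit_target_scaler(y_train):
    """Compute mean and standard deviation from the training targets only."""
    y_train = np.asarray(y_train, dtype=float)
    mean = y_train.mean()
    std = y_train.std()
    if std == 0:
        std = 1.0  # constant target: avoid division by zero
    return mean, std


def scale(y, mean, std):
    """Standardize targets with statistics taken from the training split."""
    return (np.asarray(y, dtype=float) - mean) / std


def unscale(y_scaled, mean, std):
    """Map scaled values (for example predictions) back to the original units."""
    return np.asarray(y_scaled, dtype=float) * std + mean


# Toy targets, not taken from df.csv.
y_train = np.array([10.0, 20.0, 30.0, 40.0])
y_test = np.array([15.0, 35.0])

mean, std = fit_target_scaler(y_train)
y_train_scaled = scale(y_train, mean, std)
y_test_scaled = scale(y_test, mean, std)  # reuse the training statistics
```

The statistics come from the training split only, so nothing from the test set leaks into training. Predictions made on the scaled target need to go through `unscale` before being compared with the original `y`.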

YuanfengZhang · 2025-03-02T10:14:43Z

Thank you so much. I didn't expect a difference between how X and y are handled.
When removing .values from both, an error occurs: ValueError: could not determine the shape of object type 'DataFrame'
When adding .values to both, the IndexError: list index out of range appears.
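
Until a fix is released, the combination that the workaround in this thread relies on (X as a DataFrame, y as a 1-D array) can be enforced with a small helper before calling `fit`. This is only a sketch based on that workaround, not on mambular's documented API. `to_fit_inputs` is a hypothetical name, and the generated column names `f0`, `f1`, ... are an assumption.

```python
import numpy as np
import pandas as pd


def to_fit_inputs(X, y):
    """Return X as a DataFrame and y as a NumPy array, squeezing a single target column to 1-D."""
    if not isinstance(X, pd.DataFrame):
        X = np.asarray(X)
        # Assumed naming scheme for columns when X arrives without names.
        X = pd.DataFrame(X, columns=[f"f{i}" for i in range(X.shape[1])])
    y = np.asarray(y)
    if y.ndim == 2 and y.shape[1] == 1:
        y = y.squeeze(-1)  # (n, 1) -> (n,), as in the workaround above
    return X, y


# Toy inputs: X as a raw array, y as a one-column DataFrame.
X_raw = np.zeros((4, 3))
y_raw = pd.DataFrame({"y": [1.0, 2.0, 3.0, 4.0]})

X_ready, y_ready = to_fit_inputs(X_raw, y_raw)
print(type(X_ready).__name__, X_ready.shape, y_ready.shape)
```

The result could then be passed on as `regressor.fit(X_ready, y_ready, max_epochs=4)`.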

AnFreTh · 2025-03-02T10:17:53Z

Yes, it's definitely a mistake on our side. We'll fix it in the next release. This shouldn't happen, and you are absolutely correct that there should not be a difference.

YuanfengZhang added the bug Something isn't working label Mar 2, 2025

YuanfengZhang mentioned this issue Mar 3, 2025

[FAQ] sklearn raise error: ValueError: not enough values to unpack (expected 2, got 1) #236

Closed
