
Streaming dataset loses .features after .add_column #5752

@sanchit-gandhi

Description


Describe the bug

After appending a new column to a streaming dataset with .add_column, we can no longer access the dataset features through the .features attribute: it returns None.

Steps to reproduce the bug

from datasets import load_dataset

original_dataset = load_dataset("librispeech_asr", "clean", split="validation", streaming=True)
print(original_dataset.features.keys())

# now add a new column to our streaming dataset
modified_dataset = original_dataset.add_column("new_column", ["some random text" for _ in range(50)])
print(modified_dataset.features.keys())

Printed output:

dict_keys(['file', 'audio', 'text', 'speaker_id', 'chapter_id', 'id'])
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[1], line 8
      6 # now add a new column to our streaming dataset
      7 modified_dataset = original_dataset.add_column("new_column", ["some random text" for _ in range(50)])
----> 8 print(modified_dataset.features.keys())

AttributeError: 'NoneType' object has no attribute 'keys'

The features are available for the original dataset, but for the modified dataset with the added column .features is None, so calling .keys() on it raises an AttributeError.

Expected behavior

Features should be preserved after adding a new column, i.e. calling:

print(modified_dataset.features.keys())

should return:

dict_keys(['file', 'audio', 'text', 'speaker_id', 'chapter_id', 'id', 'new_column'])
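To make the expected behavior concrete, here is a minimal pure-Python sketch. It assumes a toy, hypothetical ToyStreamingDataset class and a hypothetical infer_feature_type helper; neither is part of the datasets library, and this is not how datasets implements add_column. The point it illustrates is that add_column can keep the parent's feature mapping and append an entry for the new column, instead of resetting the features to None.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional


def infer_feature_type(values: List[Any]) -> str:
    # Hypothetical, deliberately coarse type inference: the Python type name
    # of the first value (e.g. "str" for the strings used in this report).
    return type(values[0]).__name__ if values else "null"


class ToyStreamingDataset:
    """Hypothetical stand-in for a streaming dataset (NOT the datasets API)."""

    def __init__(
        self,
        generate: Callable[[], Iterator[Dict[str, Any]]],
        features: Optional[Dict[str, str]],
    ):
        # `generate` returns a fresh iterator each call, so the toy dataset
        # can be iterated more than once, like a re-openable stream.
        self._generate = generate
        self.features = features

    def __iter__(self) -> Iterator[Dict[str, Any]]:
        return iter(self._generate())

    def add_column(self, name: str, column: List[Any]) -> "ToyStreamingDataset":
        parent = self

        def generate() -> Iterator[Dict[str, Any]]:
            # Lazily attach column[i] to the i-th example. zip() stops at the
            # shorter of the two sequences, a simplification for this sketch.
            for example, value in zip(parent, column):
                yield {**example, name: value}

        # Preserve the existing features and append the new column's type,
        # rather than dropping the feature information altogether.
        new_features = None
        if self.features is not None:
            new_features = {**self.features, name: infer_feature_type(column)}
        return ToyStreamingDataset(generate, new_features)


original = ToyStreamingDataset(
    lambda: ({"id": str(i), "text": f"utterance {i}"} for i in range(3)),
    {"id": "str", "text": "str"},
)
modified = original.add_column("new_column", ["some random text" for _ in range(3)])
print(modified.features.keys())  # dict_keys(['id', 'text', 'new_column'])
```

Because the new feature mapping is built with dict unpacking, the original dataset's features are left untouched and the added column appears last, matching the key order shown in the expected output above.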

Environment info

  • datasets version: 2.10.2.dev0
  • Platform: Linux-4.19.0-23-cloud-amd64-x86_64-with-glibc2.28
  • Python version: 3.9.16
  • Huggingface_hub version: 0.13.3
  • PyArrow version: 10.0.1
  • Pandas version: 1.5.2

Metadata


Assignees

No one assigned

    Labels

    bug: Something isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
