Description
Describe the bug
After appending a new column to a streaming dataset with .add_column, the dataset's features can no longer be accessed through the .features attribute: it returns None instead of the feature mapping.
Steps to reproduce the bug
from datasets import load_dataset
original_dataset = load_dataset("librispeech_asr", "clean", split="validation", streaming=True)
print(original_dataset.features.keys())
# now add a new column to our streaming dataset
modified_dataset = original_dataset.add_column("new_column", ["some random text" for _ in range(50)])
print(modified_dataset.features.keys())
Print Output:
dict_keys(['file', 'audio', 'text', 'speaker_id', 'chapter_id', 'id'])
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[1], line 8
6 # now add a new column to our streaming dataset
7 modified_dataset = original_dataset.add_column("new_column", ["some random text" for _ in range(50)])
----> 8 print(modified_dataset.features.keys())
AttributeError: 'NoneType' object has no attribute 'keys'
We see that we get the features for the original dataset, but not the modified one with the added column.
Expected behavior
Features should be preserved after adding a new column, i.e. calling:
print(modified_dataset.features.keys())
Should return:
dict_keys(['file', 'audio', 'text', 'speaker_id', 'chapter_id', 'id', 'new_column'])
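Until the features are propagated, one possible stopgap is to read the column names off the first example of the stream. The sketch below is only an illustration, not part of the datasets API: peek_column_names is a hypothetical helper, and fake_stream is a stand-in generator used so the snippet runs without downloading anything. It assumes that iterating the modified streaming dataset yields dict-like examples.

```python
from typing import Any, Iterable, List, Mapping


def peek_column_names(examples: Iterable[Mapping[str, Any]]) -> List[str]:
    """Return the keys of the first example, or [] if the stream is empty.

    Hypothetical helper: it only inspects one example, so it recovers
    column names but not the feature types.
    """
    first = next(iter(examples), None)
    return list(first.keys()) if first is not None else []


def fake_stream():
    """Stand-in for a streaming dataset: yields dict examples one by one."""
    for i in range(3):
        yield {"id": i, "text": f"utterance {i}", "new_column": "some random text"}


print(peek_column_names(fake_stream()))
# → ['id', 'text', 'new_column']
```

With the dataset from the reproduction, the call would be peek_column_names(modified_dataset). Note that this gives names only, so it does not replace a populated .features, and on a one-shot generator the peeked example is consumed.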
Environment info
- datasets version: 2.10.2.dev0
- Platform: Linux-4.19.0-23-cloud-amd64-x86_64-with-glibc2.28
- Python version: 3.9.16
- Huggingface_hub version: 0.13.3
- PyArrow version: 10.0.1
- Pandas version: 1.5.2