I am trying to load the latest version of oscar-corpus/community-oscar and simply count the number of examples, but it seems the dataset is broken?
Simple testcase
from datasets import load_dataset
ds = load_dataset('oscar-corpus/community-oscar', data_files='data/2024-22/af_meta/*.jsonl.zst', split='train', streaming=True)
count = 0
for _ in ds:
    count += 1
print(count)
Another test case: I also tried decompressing the zst file to get the jsonl file and then loading it again, but I still get the same error
from datasets import load_dataset
ds = load_dataset('json', data_files='DATA_PATH/af_meta.jsonl', split='train', streaming=True)
count = 0
for _ in ds:
    count += 1
print(count)
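To check whether the decompressed file itself is malformed, independently of `datasets`, a minimal sketch along these lines could be used. It is only an illustration: `count_jsonl_records` is a hypothetical helper name, and it assumes the file is UTF-8 encoded with one JSON object per line.

```python
import json
from typing import List, Tuple


def count_jsonl_records(path: str) -> Tuple[int, List[int]]:
    """Count parseable records in a JSONL file.

    Returns (number_of_valid_records, line_numbers_that_failed_to_parse).
    Hypothetical helper; assumes UTF-8 and one JSON object per line.
    """
    count = 0
    bad_lines: List[int] = []
    with open(path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            if not line.strip():
                continue  # skip blank lines
            try:
                json.loads(line)
            except json.JSONDecodeError:
                # Remember where parsing failed so the line can be inspected later
                bad_lines.append(line_number)
            else:
                count += 1
    return count, bad_lines


# Example call (path taken from the test case above):
# count, bad_lines = count_jsonl_records('DATA_PATH/af_meta.jsonl')
# print(count, bad_lines[:10])
```

If `bad_lines` comes back non-empty, the reported line numbers point at the records worth inspecting by hand.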
Other test cases:
OSCAR-2301, using the same code
json.load() from the Python json package
@Uinelj Can anyone help solve this? Thanks!