
Adding new splits to a dataset script with existing old splits info in metadata's dataset_info fails #5315

@polinaeterna


Describe the bug

If you first create a custom dataset with a specific set of splits, generate metadata with `datasets-cli test ... --save_info`, and then change your script to return more splits, loading the dataset fails.

That's what happened in https://huggingface.co/datasets/mrdbourke/food_vision_199_classes/discussions/2#6385fd1269634850f8ddff48.

Steps to reproduce the bug

  1. Create a dataset script whose `_split_generators` returns, for example, only a "train" split (see the sketch after this list). Specifically, if you really want to reproduce, copy `https://huggingface.co/datasets/mrdbourke/food_vision_199_classes/blob/main/food_vision_199_classes.py`.
  2. Run `datasets-cli test dataset_script.py --save_info --all_configs`; this generates metadata YAML in README.md that contains info about the splits, for example:
  splits:
  - name: train
    num_bytes: 2973286
    num_examples: 19747
  3. Make changes to your script so that it returns another set of splits, for example, "train" and "test" (uncomment these lines).
  4. Run `load_dataset` and get the following error:
Traceback (most recent call last):
  File "/home/daniel/code/pytorch/env/bin/datasets-cli", line 8, in <module>
    sys.exit(main())
  File "/home/daniel/code/pytorch/env/lib/python3.8/site-packages/datasets/commands/datasets_cli.py", line 39, in main
    service.run()
  File "/home/daniel/code/pytorch/env/lib/python3.8/site-packages/datasets/commands/test.py", line 141, in run
    builder.download_and_prepare(
  File "/home/daniel/code/pytorch/env/lib/python3.8/site-packages/datasets/builder.py", line 822, in download_and_prepare
    self._download_and_prepare(
  File "/home/daniel/code/pytorch/env/lib/python3.8/site-packages/datasets/builder.py", line 1555, in _download_and_prepare
    super()._download_and_prepare(
  File "/home/daniel/code/pytorch/env/lib/python3.8/site-packages/datasets/builder.py", line 913, in _download_and_prepare
    self._prepare_split(split_generator, **prepare_split_kwargs)
  File "/home/daniel/code/pytorch/env/lib/python3.8/site-packages/datasets/builder.py", line 1356, in _prepare_split
    split_info = self.info.splits[split_generator.name]
  File "/home/daniel/code/pytorch/env/lib/python3.8/site-packages/datasets/splits.py", line 525, in __getitem__
    instructions = make_file_instructions(
  File "/home/daniel/code/pytorch/env/lib/python3.8/site-packages/datasets/arrow_reader.py", line 111, in make_file_instructions
    name2filenames = {
  File "/home/daniel/code/pytorch/env/lib/python3.8/site-packages/datasets/arrow_reader.py", line 112, in <dictcomp>
    info.name: filenames_for_dataset_split(
  File "/home/daniel/code/pytorch/env/lib/python3.8/site-packages/datasets/naming.py", line 78, in filenames_for_dataset_split
    prefix = filename_prefix_for_split(dataset_name, split)
  File "/home/daniel/code/pytorch/env/lib/python3.8/site-packages/datasets/naming.py", line 57, in filename_prefix_for_split
    if os.path.basename(name) != name:
  File "/home/daniel/code/pytorch/env/lib/python3.8/posixpath.py", line 143, in basename
    p = os.fspath(p)
TypeError: expected str, bytes or os.PathLike object, not NoneType
  5. Bonus: try to regenerate the metadata in README.md with `datasets-cli` as in step 2 and get the same error.
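
For context, here is a minimal sketch of what such a script's split generation looks like. Everything in it (class name, features, file names) is illustrative rather than taken from `food_vision_199_classes.py`; only the pattern of adding a split after `--save_info` was run matters:

```python
import datasets


class FoodVisionSketch(datasets.GeneratorBasedBuilder):
    """Illustrative builder: first published with only a "train" split."""

    def _info(self):
        return datasets.DatasetInfo(
            features=datasets.Features({"text": datasets.Value("string")}),
        )

    def _split_generators(self, dl_manager):
        return [
            datasets.SplitGenerator(
                name=datasets.Split.TRAIN,
                gen_kwargs={"filepath": "train.txt"},  # placeholder data file
            ),
            # Step 3: uncommenting this after `--save_info` was run with only
            # "train" in the metadata reproduces the TypeError above.
            # datasets.SplitGenerator(
            #     name=datasets.Split.TEST,
            #     gen_kwargs={"filepath": "test.txt"},
            # ),
        ]

    def _generate_examples(self, filepath):
        with open(filepath, encoding="utf-8") as f:
            for idx, line in enumerate(f):
                yield idx, {"text": line.strip()}
```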

This happens because `dataset.info.splits`, loaded from the saved metadata, contains only the "train" split. So when `self.info.splits[split_generator.name]` is called for the new split, `SplitDict.__getitem__` does not find the key and falls back to interpreting it as a split instruction (something like `info.splits['train[50%]']`), which is not the case here, and it fails.
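
A minimal sketch of that lookup, assuming datasets 2.7.1; the bare `SplitDict` below stands in for the splits loaded from the README metadata, where `dataset_name` ends up as `None`:

```python
from datasets.splits import SplitDict, SplitInfo

# Stand-in for self.info.splits as loaded from the saved metadata:
# only "train" is known, and dataset_name is left as None.
splits = SplitDict()
splits.add(SplitInfo(name="train", num_bytes=2973286, num_examples=19747))

splits["train"]  # known split: returned directly
splits["test"]   # unknown key: treated as a split instruction and routed
                 # through make_file_instructions(), which ends in
                 # "TypeError: expected str, bytes or os.PathLike object, not NoneType"
```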

Expected behavior

To be discussed?

This can be worked around by removing the splits information from the metadata file first (a rough sketch of that cleanup is below), but I wonder if there is a better way.
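
A rough sketch of that cleanup, assuming a single config whose generated metadata sits under a `dataset_info` key in the README.md front matter; this helper is hypothetical and not part of `datasets` or its CLI:

```python
import re

import yaml  # PyYAML

# Hypothetical cleanup: drop the stale "splits" entry from the dataset_info
# block in README.md, then re-run
# `datasets-cli test dataset_script.py --save_info --all_configs`.
with open("README.md", encoding="utf-8") as f:
    readme = f.read()

front_matter = re.match(r"^---\n(.*?)\n---\n", readme, flags=re.DOTALL).group(1)
metadata = yaml.safe_load(front_matter)

# Only "splits" appears in the example above; drop any size fields as well
# if `--save_info` saved them alongside it.
metadata.get("dataset_info", {}).pop("splits", None)

new_front_matter = yaml.safe_dump(metadata, sort_keys=False).strip()
with open("README.md", "w", encoding="utf-8") as f:
    f.write(readme.replace(front_matter, new_front_matter, 1))
```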

Environment info

  • Datasets version: 2.7.1
  • Python version: 3.8.13
