Code used for the download:
from datasets import load_dataset
dataset_name = "Papersnake/people_daily_news"
dataset = load_dataset(dataset_name, cache_dir=r'xxx/')
Error message:
An error occurred while generating the dataset
All the data files must have the same columns, but at some point there are 2 missing columns ({'author', 'page'})
This happened while the json dataset builder was generating data using
..\downloads\d434406d0e80132d996bc6796817699b81390d86744e10acda0ec2ea71fead71
Please either edit the data files to have matching columns, or separate them into different configurations (see docs at https://hf.co/docs/hub/datasets-manual-configuration#multiple-configurations)
Traceback (most recent call last):
File "_pydevd_bundle/pydevd_cython.pyx", line 546, in _pydevd_bundle.pydevd_cython.PyDBFrame._handle_exception
File "C:\Program Files\Python39\lib\linecache.py", line 26, in getline
def getline(filename, lineno, module_globals=None):
File "C:\Program Files\Python39\lib\linecache.py", line 36, in getlines
def getlines(filename, module_globals=None):
File "C:\Program Files\Python39\lib\linecache.py", line 80, in updatecache
def updatecache(filename, module_globals=None):
File "C:\Program Files\Python39\lib\codecs.py", line 319, in decode
def decode(self, input, final=False):
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb6 in position 41: invalid start byte
0.03s - Error on build_exception_info_response.
Traceback (most recent call last):
File "c:\program files\microsoft visual studio\2022\community\common7\ide\extensions\microsoft\python\core\debugpy\_vendored\pydevd\_pydevd_bundle\pydevd_comm.py", line 1404, in build_exception_info_response
def build_exception_info_response(dbg, thread_id, request_seq, set_additional_thread_info, iter_visible_frames_info, max_frames):
File "C:\Program Files\Python39\lib\linecache.py", line 26, in getline
def getline(filename, lineno, module_globals=None):
File "C:\Program Files\Python39\lib\linecache.py", line 36, in getlines
def getlines(filename, module_globals=None):
File "C:\Program Files\Python39\lib\linecache.py", line 80, in updatecache
def updatecache(filename, module_globals=None):
File "C:\Program Files\Python39\lib\codecs.py", line 319, in decode
def decode(self, input, final=False):
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb6 in position 41: invalid start byte
I opened the file named in the error message. Its content is:
{"url": "hf://datasets/Papersnake/people_daily_news@e61323bc7692312d907fc2d154b4ffc4290ce496/2004.jsonl.gz", "etag": null}
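To narrow down which data files lack the 'author' and 'page' columns, something like the following could be run against locally downloaded copies of the yearly `.jsonl.gz` files. This is only a rough sketch using the standard library: the helper names `columns_in_file` and `missing_columns` are made up for illustration, and it assumes each file is gzip-compressed JSON Lines with one object per line, which the `2004.jsonl.gz` name suggests but which I have not verified for every file.

```python
import gzip
import json
from pathlib import Path


def columns_in_file(path, max_rows=1000):
    """Return the set of keys seen in the first max_rows lines of a .jsonl.gz file."""
    keys = set()
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for i, line in enumerate(fh):
            if i >= max_rows:
                break
            line = line.strip()
            if line:  # skip blank lines
                keys.update(json.loads(line).keys())
    return keys


def missing_columns(paths, max_rows=1000):
    """Map each file name to the columns it lacks relative to the union over all files."""
    per_file = {Path(p).name: columns_in_file(p, max_rows) for p in paths}
    all_columns = set().union(*per_file.values()) if per_file else set()
    return {name: all_columns - cols for name, cols in per_file.items()}
```

Calling `missing_columns(sorted(Path("some_dir").glob("*.jsonl.gz")))` (with `some_dir` being wherever the files were saved) returns an empty set for files that have every column and a non-empty set for the ones the json builder would reject.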