Describe the bug
I'm getting JSONDecodeError: Extra data: line 12 column 2 (char 2984) when attempting to load NWB files from the VisualBehavior cache.
This occurred after I started a process to download all NWB files to a local directory. I set out to do this so that I could run a summary analysis across all experiments, and I wanted to work from the NWB files to ensure the results would be consistent with what an external user would get when doing the same analysis.
To Reproduce
I did the following to start the download process of all BehaviorOphysExperiment NWB files using 16 cores on my local machine:
import allensdk.brain_observatory.behavior.behavior_project_cache as bpc
from multiprocessing import Pool
data_storage_directory = "/allen/programs/braintv/workgroups/nc-ophys/visual_behavior/production_cache/" # Note: this path must exist on your local drive
cache = bpc.VisualBehaviorOphysProjectCache.from_s3_cache(cache_dir=data_storage_directory)
experiment_table = cache.get_ophys_experiment_table()
oeids = experiment_table.index.values
def open_experiment(oeid):
    print('oeid = {}'.format(oeid))
    cache.get_behavior_ophys_experiment(oeid)

with Pool(16) as pool:
    pool.map(open_experiment, oeids)

Expected behavior
I expected this process to take some number of hours to complete, and at the end I expected all NWB files to be in the data_storage_directory defined above.
Actual Behavior
The process started running as expected. After approximately 20 NWB files had been downloaded, I got the following error:
---------------------------------------------------------------------------
JSONDecodeError Traceback (most recent call last)
<ipython-input-18-8199a0378505> in <module>
1 oeid = 993891850
----> 2 cache.get_behavior_ophys_experiment(oeid)
~/anaconda3/envs/vba/lib/python3.8/site-packages/allensdk/brain_observatory/behavior/behavior_project_cache/behavior_project_cache.py in get_behavior_ophys_experiment(self, ophys_experiment_id, fixed)
515 fetch_session = partial(self.fetch_api.get_behavior_ophys_experiment,
516 ophys_experiment_id)
--> 517 return call_caching(
518 fetch_session,
519 lambda x: x, # not writing anything
~/anaconda3/envs/vba/lib/python3.8/site-packages/allensdk/api/warehouse_cache/caching_utilities.py in call_caching(fetch, write, read, pre_write, cleanup, lazy, num_tries, failure_message)
94 if not lazy or read is None:
95 logger.info("Fetching data from remote")
---> 96 data = fetch()
97 if pre_write is not None:
98 data = pre_write(data)
~/anaconda3/envs/vba/lib/python3.8/site-packages/allensdk/brain_observatory/behavior/behavior_project_cache/project_apis/data_io/behavior_project_cloud_api.py in get_behavior_ophys_experiment(self, ophys_experiment_id)
253 f" there are {row.shape[0]} entries.")
254 file_id = str(int(row[self.cache.file_id_column]))
--> 255 data_path = self._get_data_path(file_id=file_id)
256 return BehaviorOphysExperiment.from_nwb_path(str(data_path))
257
~/anaconda3/envs/vba/lib/python3.8/site-packages/allensdk/brain_observatory/behavior/behavior_project_cache/project_apis/data_io/behavior_project_cloud_api.py in _get_data_path(self, file_id)
347 data_path = self._get_local_path(file_id=file_id)
348 else:
--> 349 data_path = self.cache.download_data(file_id=file_id)
350 return data_path
351
~/anaconda3/envs/vba/lib/python3.8/site-packages/allensdk/api/cloud_cache/cloud_cache.py in download_data(self, file_id)
621 If the file cannot be downloaded
622 """
--> 623 super_attributes = self.data_path(file_id)
624 file_attributes = super_attributes['file_attributes']
625 self._download_file(file_attributes)
~/anaconda3/envs/vba/lib/python3.8/site-packages/allensdk/api/cloud_cache/cloud_cache.py in data_path(self, file_id)
592 """
593 file_attributes = self._manifest.data_file_attributes(file_id)
--> 594 exists = self._file_exists(file_attributes)
595 local_path = file_attributes.local_path
596 output = {'local_path': local_path,
~/anaconda3/envs/vba/lib/python3.8/site-packages/allensdk/api/cloud_cache/cloud_cache.py in _file_exists(self, file_attributes)
560
561 if not file_exists:
--> 562 file_exists = self._check_for_identical_copy(file_attributes)
563
564 return file_exists
~/anaconda3/envs/vba/lib/python3.8/site-packages/allensdk/api/cloud_cache/cloud_cache.py in _check_for_identical_copy(self, file_attributes)
502
503 with open(self._downloaded_data_path, 'rb') as in_file:
--> 504 available_files = json.load(in_file)
505
506 matched_path = None
~/anaconda3/envs/vba/lib/python3.8/json/__init__.py in load(fp, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
291 kwarg; otherwise ``JSONDecoder`` is used.
292 """
--> 293 return loads(fp.read(),
294 cls=cls, object_hook=object_hook,
295 parse_float=parse_float, parse_int=parse_int,
~/anaconda3/envs/vba/lib/python3.8/json/__init__.py in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
355 parse_int is None and parse_float is None and
356 parse_constant is None and object_pairs_hook is None and not kw):
--> 357 return _default_decoder.decode(s)
358 if cls is None:
359 cls = JSONDecoder
~/anaconda3/envs/vba/lib/python3.8/json/decoder.py in decode(self, s, _w)
338 end = _w(s, end).end()
339 if end != len(s):
--> 340 raise JSONDecodeError("Extra data", s, end)
341 return obj
342
JSONDecodeError: Extra data: line 12 column 2 (char 2984)

Now, simply calling:
oeid = 993891850
cache.get_behavior_ophys_experiment(oeid)

results in the same error as above for any oeid.
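As a quick check (just a diagnostic sketch, not part of the AllenSDK API; I'm assuming the cache bookkeeping lives in JSON files under the cache directory), scanning the cache directory for JSON files that no longer parse should confirm which record got corrupted:

import json
from pathlib import Path

# Diagnostic sketch: report any JSON file under the cache directory that no
# longer parses. The assumption is that the file read by
# _check_for_identical_copy in the traceback above is one of these.
cache_dir = Path(data_storage_directory)
for json_path in cache_dir.rglob("*.json"):
    try:
        with open(json_path, "r") as in_file:
            json.load(in_file)
    except json.JSONDecodeError as err:
        print("corrupted: {} ({})".format(json_path, err))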
Environment (please complete the following information):
- OS & version: Ubuntu 18.04
- Python version: 3.8.8
- AllenSDK version: 2.11.2
Additional context
I'm assuming that the parallel processing has somehow corrupted the manifest file. Is that correct? If so, is there a recommended way to download the NWB files other than what I tried above, or should I simply download them in a serial loop and wait however long it takes? A rough sketch of the serial fallback I have in mind is below.
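For reference, this is the kind of serial fallback I mean (just a sketch reusing the same cache and oeids objects as above; the only new piece is the error handling, so a single bad file doesn't stop the whole run):

# Sketch of a serial fallback: download one experiment at a time and record
# failures instead of aborting, so the run can later be retried for just the
# experiments that failed.
failed = []
for oeid in oeids:
    try:
        cache.get_behavior_ophys_experiment(oeid)
    except Exception as err:
        print('oeid {} failed: {}'.format(oeid, err))
        failed.append(oeid)
print('{} experiments failed'.format(len(failed)))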
This is also related to a recent forum question (https://community.brain-map.org/t/visual-behavior-optical-physiology/1183), so I suspect external users will run into similar problems if attempting to parallelize the download process.
Do you want to work on this issue?
Yes, I'd like to help solve this.