`dataset.from_files` always returns empty #1896

Peter9192 · 2023-01-18T11:59:38Z

Describe the bug
I tried out the new dataset facet search functionality following the example in https://github.com/ESMValGroup/ESMValCore/blob/main/notebooks/discovering-data.ipynb. However, I never seem to get any results. I tried with the same query, and also with CMIP5 instead of CMIP6.

Upon further investigation, it looks like the problem, for CMIP5 at least, lies in the filtering out of identical facet sets. In my case, it finds 2 local files on my laptop, for which `facets = dict(file.facets)` (ESMValCore/esmvalcore/dataset.py, line 111 in 69a284d) returns an empty dict. There seem to be some files on ESGF that do not have a complete facet set either.

Therefore, the `same` checker (ESMValCore/esmvalcore/dataset.py, lines 98 to 100 in 69a284d):

    def same(facets_a, facets_b):
        """Define when two sets of facets are the same."""
        return facets_a.issubset(facets_b) or facets_b.issubset(facets_a)

will always treat the empty set as a subset of every other set, and an incomplete set as a subset of any set that extends it. This results in all files being filtered out.
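The behaviour can be reproduced outside ESMValCore with plain Python sets. This is only a minimal sketch: the facet sets are hypothetical values stored as frozensets of `(facet, value)` pairs, and only the `same` function is taken from the snippet quoted above.

```python
def same(facets_a, facets_b):
    """Symmetric subset check, as quoted above."""
    return facets_a.issubset(facets_b) or facets_b.issubset(facets_a)


# Hypothetical facet sets, stored as frozensets of (facet, value) pairs.
complete = frozenset(
    {("dataset", "CMCC-ESM2"), ("mip", "Amon"), ("short_name", "tas")}
)
other = frozenset(
    {("dataset", "CanESM5"), ("mip", "Amon"), ("short_name", "tas")}
)
incomplete = frozenset({("short_name", "tas")})
empty = frozenset()

print(same(empty, complete))       # True: the empty set is a subset of every set
print(same(incomplete, complete))  # True: an incomplete set matches any set that extends it
print(same(complete, other))       # False: two complete sets that differ in one facet
```

Any filter that treats `same(...) == True` as "duplicate" will therefore pair the empty facet set with every other facet set.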

Changing the `same` function above to `facets_a.issubset(facets_b)` fixes the issue for me, but it now also returns incomplete facet sets that still contain wildcards. Can we somehow require that the facet set be complete?
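One way such a completeness requirement could look is sketched below. Everything here is an assumption for illustration: `REQUIRED_FACETS`, `is_complete`, and the candidate dicts are hypothetical names and values, not part of ESMValCore.

```python
# Hypothetical list of facets that a complete facet set must define.
REQUIRED_FACETS = ("dataset", "mip", "short_name")


def is_complete(facets, required=REQUIRED_FACETS):
    """Return True if every required facet is present and has no wildcard."""
    return all(
        key in facets and "*" not in str(facets[key])
        for key in required
    )


# Hypothetical candidates: one complete, one with a wildcard, one empty.
candidates = [
    {"dataset": "CMCC-ESM2", "mip": "Amon", "short_name": "tas"},
    {"dataset": "*", "mip": "Amon", "short_name": "tas"},
    {},
]
complete_only = [facets for facets in candidates if is_complete(facets)]
print(complete_only)
```

Only the first candidate survives the filter; the wildcard and empty facet sets are rejected.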


bouweandela · 2023-01-18T15:16:29Z

Thanks for reporting the issue! What are the names and paths of the files you have on your computer and what settings are you using in config-user.yml?

Peter9192 · 2023-01-18T16:49:57Z

CFG


Config({'always_search_esgf': True,
        'auxiliary_data_dir': PosixPath('/home/peter/ESMValGroup/ESMValCore/auxiliary_data'),
        'check_level': <CheckLevels.DEFAULT: 3>,
        'compress_netcdf': False,
        'config_developer_file': PosixPath('/home/peter/ESMValGroup/ESMValCore/esmvalcore/config-developer.yml'),
        'config_file': PosixPath('/home/peter/.esmvaltool/config-user.yml'),
        'diagnostics': None,
        'download_dir': PosixPath('/home/peter/climate_data'),
        'drs': {'CMIP5': 'default',
                'CMIP6': 'default',
                'CORDEX': 'default',
                'OBS': 'default',
                'native6': 'default'},
        'exit_on_warning': False,
        'extra_facets_dir': (PosixPath('/home/peter/.esmvaltool'),),
        'log_level': 'debug',
        'max_datasets': None,
        'max_parallel_tasks': 1,
        'max_years': None,
        'offline': False,
        'output_dir': PosixPath('/home/peter/esmvaltool_output'),
        'output_file_type': 'png',
        'profile_diagnostic': False,
        'remove_preproc_dir': False,
        'resume_from': [],
        'rootpath': {'CMIP5': [PosixPath('/mnt/c/Users/PeterKalverla/climatedata/cmip5')],
                     'CMIP6': [PosixPath('/mnt/c/Users/PeterKalverla/climatedata/cmip6')],
                     'OBS': [PosixPath('/mnt/c/Users/PeterKalverla/climatedata/obs')],
                     'OBS6': [PosixPath('/mnt/c/Users/PeterKalverla/climatedata/obs6')],
                     'RAWOBS': [PosixPath('/mnt/c/Users/PeterKalverla/climatedata/rawobs')],
                     'default': [PosixPath('/mnt/c/Users/PeterKalverla/climatedata')],
                     'native6': [PosixPath('/mnt/c/Users/PeterKalverla/climatedata/native6')]},
        'run_diagnostic': True,
        'save_intermediary_cubes': False,
        'skip_nonexistent': False})

Local file:

LocalFile('/mnt/c/Users/PeterKalverla/climatedata/cmip6/tas_Amon_CMCC-ESM2_historical_r1i1p1f1_gn_185001-201412.nc'),

bouweandela · 2023-01-18T18:27:44Z

The facets of the local files are read from the subdirectories in which the files are stored relative to the rootpath. In your case, this fails because the data on your computer is not using any subdirectories per facet. I'll add a note about this to the documentation and think a bit about how to improve the way the duplicate sets of facets are removed from the list. Maybe we could try to keep only those sets that have the largest number of facets?
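The "keep only the largest sets" idea could be sketched roughly as follows. This is a hypothetical illustration, not the fix that was later merged: `keep_largest` and the example facet sets are made-up names and values, and facet sets are again modelled as frozensets of `(facet, value)` pairs.

```python
def keep_largest(facet_sets):
    """Drop every facet set that is a strict subset of another one."""
    # Remove exact duplicates while preserving the original order.
    unique = list(dict.fromkeys(facet_sets))
    return [
        facets for facets in unique
        if not any(facets < other for other in unique)
    ]


# Hypothetical facet sets for illustration.
complete = frozenset(
    {("dataset", "CMCC-ESM2"), ("mip", "Amon"), ("short_name", "tas")}
)
other = frozenset(
    {("dataset", "CanESM5"), ("mip", "Amon"), ("short_name", "tas")}
)
incomplete = frozenset({("short_name", "tas")})
empty = frozenset()

result = keep_largest([empty, incomplete, complete, other, complete])
print(result == [complete, other])  # True
```

With this approach the empty and incomplete sets are discarded in favour of the sets that extend them, instead of causing the complete sets to be dropped.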

bouweandela · 2023-02-27T09:36:25Z

@Peter9192 This should work better now, let me know if you still encounter problems.

Peter9192 · 2023-02-27T09:50:01Z

@bouweandela thanks, I noticed #1609 and #1924. I'm currently trying it out.

Peter9192 assigned bouweandela on Jan 18, 2023.

bouweandela mentioned this issue on Feb 21, 2023 in #1609, "Support wildcards in the recipe and improve support for ancillary variables and dataset versioning" (merged).

remi-kazeroni closed this as completed in #1609 on Feb 24, 2023.
