dataset.from_files always returns empty #1896

Closed
Peter9192 opened this issue Jan 18, 2023 · 5 comments · Fixed by #1609
Comments

@Peter9192
Contributor

Peter9192 commented Jan 18, 2023

Describe the bug
I tried out the new dataset facet search functionality following the example in https://github.com/ESMValGroup/ESMValCore/blob/main/notebooks/discovering-data.ipynb. However, I never seem to get any results. I tried with the same query, and also with CMIP5 instead of CMIP6.

Upon further investigation it looks like the problem for CMIP5 at least lies in the filtering out of identical facetsets. In my case, it finds 2 local files on my laptop, for which

facets = dict(file.facets)

returns an empty dict. There seem to be some files on ESGF that do not have a complete facetset either.

Therefore, the same() checker:

def same(facets_a, facets_b):
    """Define when two sets of facets are the same."""
    return facets_a.issubset(facets_b) or facets_b.issubset(facets_a)

will always see the empty (and otherwise the incomplete) set as a subset of every other set. This results in all files being filtered out.
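The effect can be reproduced in isolation. The following is a minimal sketch, assuming facet sets are stored as frozensets of (key, value) pairs; the deduplicate() helper and the facet values are hypothetical and only illustrate the behaviour, they are not ESMValCore's actual code.

```python
def same(facets_a, facets_b):
    """Define when two sets of facets are the same."""
    return facets_a.issubset(facets_b) or facets_b.issubset(facets_a)


def deduplicate(facetsets):
    """Hypothetical filter: keep a facet set only if no kept set is 'the same'."""
    kept = []
    for facetset in facetsets:
        if not any(same(facetset, prev) for prev in kept):
            kept.append(facetset)
    return kept


# Illustrative facet sets (values chosen for the example only).
complete = frozenset(
    {"dataset": "CMCC-ESM2", "mip": "Amon", "short_name": "tas"}.items()
)
other = frozenset(
    {"dataset": "OTHER-MODEL", "mip": "Amon", "short_name": "tas"}.items()
)
empty = frozenset()  # what dict(file.facets) == {} turns into

# The empty set is a subset of every set, so it matches everything.
print(same(empty, complete))  # True
print(same(complete, other))  # False

# Once the empty set has been kept, every later facet set is discarded.
print(deduplicate([empty, complete, other]) == [empty])  # True
```

In this sketch a single file without facets is enough to hide all other results, provided it is encountered first.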

Changing the same() function above so that it only returns facets_a.issubset(facets_b) fixes the issue for me, but it then also returns incomplete facetsets that still contain wildcards. Can we somehow require that the facetset be complete?

@bouweandela
Member

Thanks for reporting the issue! What are the names and paths of the files you have on your computer and what settings are you using in config-user.yml?

@Peter9192
Contributor Author

CFG

Config({'always_search_esgf': True,
        'auxiliary_data_dir': PosixPath('/home/peter/ESMValGroup/ESMValCore/auxiliary_data'),
        'check_level': <CheckLevels.DEFAULT: 3>,
        'compress_netcdf': False,
        'config_developer_file': PosixPath('/home/peter/ESMValGroup/ESMValCore/esmvalcore/config-developer.yml'),
        'config_file': PosixPath('/home/peter/.esmvaltool/config-user.yml'),
        'diagnostics': None,
        'download_dir': PosixPath('/home/peter/climate_data'),
        'drs': {'CMIP5': 'default',
                'CMIP6': 'default',
                'CORDEX': 'default',
                'OBS': 'default',
                'native6': 'default'},
        'exit_on_warning': False,
        'extra_facets_dir': (PosixPath('/home/peter/.esmvaltool'),),
        'log_level': 'debug',
        'max_datasets': None,
        'max_parallel_tasks': 1,
        'max_years': None,
        'offline': False,
        'output_dir': PosixPath('/home/peter/esmvaltool_output'),
        'output_file_type': 'png',
        'profile_diagnostic': False,
        'remove_preproc_dir': False,
        'resume_from': [],
        'rootpath': {'CMIP5': [PosixPath('/mnt/c/Users/PeterKalverla/climatedata/cmip5')],
                     'CMIP6': [PosixPath('/mnt/c/Users/PeterKalverla/climatedata/cmip6')],
                     'OBS': [PosixPath('/mnt/c/Users/PeterKalverla/climatedata/obs')],
                     'OBS6': [PosixPath('/mnt/c/Users/PeterKalverla/climatedata/obs6')],
                     'RAWOBS': [PosixPath('/mnt/c/Users/PeterKalverla/climatedata/rawobs')],
                     'default': [PosixPath('/mnt/c/Users/PeterKalverla/climatedata')],
                     'native6': [PosixPath('/mnt/c/Users/PeterKalverla/climatedata/native6')]},
        'run_diagnostic': True,
        'save_intermediary_cubes': False,
        'skip_nonexistent': False})

Local file:

LocalFile('/mnt/c/Users/PeterKalverla/climatedata/cmip6/tas_Amon_CMCC-ESM2_historical_r1i1p1f1_gn_185001-201412.nc'),

@bouweandela
Member

bouweandela commented Jan 18, 2023

The facets of the local files are read from the subdirectories in which the files are stored relative to the rootpath. In your case, this fails because the data on your computer is not using any subdirectories per facet. I'll add a note about this to the documentation and think a bit about how to improve the way the duplicate sets of facets are removed from the list. Maybe we could try to keep only those sets that have the largest number of facets?
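A rough sketch of both points, for illustration only: TEMPLATE is a hypothetical, simplified directory layout (not the actual DRS definition from config-developer.yml), the /data/cmip6 rootpath is made up, and keep_most_complete() is just one possible reading of "keep only those sets that have the largest number of facets".

```python
from pathlib import PurePosixPath

# Hypothetical layout: one subdirectory per facet below the rootpath.
TEMPLATE = ("project", "dataset", "exp", "mip", "short_name")


def facets_from_path(path, rootpath):
    """Read facets from the subdirectories between rootpath and the file."""
    subdirs = PurePosixPath(path).relative_to(rootpath).parent.parts
    return dict(zip(TEMPLATE, subdirs))


def keep_most_complete(facetsets):
    """Drop every facet set that is contained in a larger, already kept set."""
    kept = []
    for facetset in sorted(facetsets, key=len, reverse=True):
        if not any(facetset.issubset(prev) for prev in kept):
            kept.append(facetset)
    return kept


root = "/data/cmip6"
filename = "tas_Amon_CMCC-ESM2_historical_r1i1p1f1_gn_185001-201412.nc"

# File stored directly in the rootpath: no subdirectories, so no facets.
flat = facets_from_path(f"{root}/{filename}", root)

# Same file stored below one subdirectory per facet.
nested = facets_from_path(
    f"{root}/CMIP6/CMCC-ESM2/historical/Amon/tas/{filename}", root
)

print(flat)               # {}
print(nested["dataset"])  # CMCC-ESM2

# The empty facet set is dropped in favour of the complete one.
facetsets = [frozenset(flat.items()), frozenset(nested.items())]
print(keep_most_complete(facetsets) == [frozenset(nested.items())])  # True
```

Sorting by size first means the empty or incomplete sets are compared against the complete ones rather than the other way around, so they can no longer mask complete results.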

@bouweandela
Member

@Peter9192 This should work better now, let me know if you still encounter problems.

@Peter9192
Contributor Author

@bouweandela thanks, I noticed #1609 and #1924. I'm currently trying it out.
