obs2ioda-v2.x overwrites existing hdf5 files #19

Open
st-ncar opened this issue Nov 11, 2024 · 1 comment

Comments

@st-ncar
Collaborator

st-ncar commented Nov 11, 2024

subroutine open_netcdf_for_write(fname,ncfileid)
    character(len=*), intent(in) :: fname
    integer, intent(out) :: ncfileid
    ! create nc file
    !ncstatus = nf90_create(path=trim(adjustl(fname)),cmode=nf90_clobber,ncid=ncfileid)
    ncstatus = nf90_create(path=trim(adjustl(fname)),cmode=NF90_NETCDF4,ncid=ncfileid)
    if ( ncstatus /= 0 ) then
        write(0,fmt='(a)') 'error creating netcdf file '//trim(adjustl(fname))
        write(0,*) 'ierr = ', ncstatus
        stop
    endif
    return
end subroutine open_netcdf_for_write

This issue is related to #9 but occurs in a different context.

The NF90_NETCDF4 creation mode in the call to nf90_create overwrites existing files, which can lead to unintended data loss in the HDF5 files. For example, consider converting two consecutive BUFR files to hourly data:

Convert the first BUFR file by executing
$ ./obs2ioda-v2.x -split gdas.satwnd.t00z.20180415.bufr
This generates the files
satwnd_obs_2018041421.h5
satwnd_obs_2018041422.h5
[…]
satwnd_obs_2018041503.h5

Convert the second BUFR file:
$ ./obs2ioda-v2.x -split gdas.satwnd.t06z.20180415.bufr
This generates the following files
satwnd_obs_2018041503.h5 (the content of the old file is overwritten and the data from the first BUFR file is missing)
satwnd_obs_2018041504.h5
[…]
satwnd_obs_2018041509.h5

To prevent any data loss, I suggest we check if a file already exists and if so, abort the execution. I am not aware that it is possible to concurrently use the NF90_NOCLOBBER and NF90_NETCDF4 access modes, but adding this sample code at the beginning of subroutine open_netcdf_for_write would achieve the desired goal:

logical :: file_exists
inquire(file = trim(adjustl(fname)), exist = file_exists)
if (file_exists) then
    write(*,*) 'File ',  trim(adjustl(fname)), ' already exists. Exiting.'
    stop
endif
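
For comparison, the netCDF creation-mode flags are bit flags, so it may be possible to pass both at once by combining them with ior, in which case nf90_create itself should return an error when the file already exists. The following is only a minimal sketch under that assumption, and it assumes a netCDF-Fortran build with netCDF-4 support; the subroutine name create_netcdf4_noclobber is hypothetical and does not exist in the repository.

```fortran
! Sketch only: create_netcdf4_noclobber is a hypothetical name, not part of obs2ioda.
! Assumes the netCDF-Fortran library was built with netCDF-4/HDF5 support.
subroutine create_netcdf4_noclobber(fname, ncfileid)
    use netcdf, only: nf90_create, nf90_strerror, nf90_netcdf4, &
                      nf90_noclobber, nf90_noerr
    implicit none
    character(len=*), intent(in)  :: fname
    integer,          intent(out) :: ncfileid
    integer :: ncstatus

    ! Combine the two creation-mode flags bitwise. With NF90_NOCLOBBER set,
    ! nf90_create is expected to fail rather than replace an existing file.
    ncstatus = nf90_create(path=trim(adjustl(fname)), &
                           cmode=ior(nf90_netcdf4, nf90_noclobber), &
                           ncid=ncfileid)
    if (ncstatus /= nf90_noerr) then
        ! Report the library's own error text, then stop with a nonzero status
        write(0,'(a)') 'error creating netcdf file '//trim(adjustl(fname))
        write(0,'(a)') trim(nf90_strerror(ncstatus))
        stop 1
    end if
end subroutine create_netcdf4_noclobber
```

If this works as expected with the installed library, it would avoid the small window between the inquire check and the file creation. It should be tested against an existing file before being relied on.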

In the future, we may want to improve the file handling. For example, if obs2ioda-v2.x is about to overwrite an existing file, we could check whether the max_datetime stored in the file is smaller than the min_datetime of the newly processed data and, if so, append the new data.

Adding @liujake @junmeiban @ibanos90 . Please let me know if I am missing something or if you disagree with my interpretation. Thank you!

@ibanos90
Collaborator

Hi @st-ncar, I am not sure if I understand the issue correctly, but we usually convert the observations for each analysis time in separate folders. For example, in this case the output files of ./obs2ioda-v2.x -split gdas.satwnd.t00z.20180415.bufr will go to the folder 2018041500, and those of ./obs2ioda-v2.x -split gdas.satwnd.t06z.20180415.bufr will go to 2018041506. Therefore, no files will be overwritten, if I am not mistaken. Does that make sense?
