obs2ioda-v2.x overwrites existing hdf5 files #19

st-ncar · 2024-11-11T23:44:31Z

Lines 169 to 183 in 2d8033b

    
    subroutine open_netcdf_for_write(fname,ncfileid)
       character(len=*), intent(in) :: fname
       integer, intent(out) :: ncfileid
       ! create nc file
       !ncstatus = nf90_create(path=trim(adjustl(fname)),cmode=nf90_clobber,ncid=ncfileid)
       ncstatus = nf90_create(path=trim(adjustl(fname)),cmode=NF90_NETCDF4,ncid=ncfileid)
       if ( ncstatus /= 0 ) then
          write(0,fmt='(a)') 'error creating netcdf file '//trim(adjustl(fname))
          write(0,*) 'ierr = ', ncstatus
          stop
       endif
       return
    end subroutine open_netcdf_for_write

This issue is related to #9 but occurs in a different context.

Calling nf90_create with cmode=NF90_NETCDF4 overwrites an existing file of the same name. This can silently destroy data in the HDF5 output files. For example, consider converting two consecutive BUFR files to hourly data:

Convert the first BUFR file by executing
$ ./obs2ioda-v2.x -split gdas.satwnd.t00z.20180415.bufr
This generates the files
satwnd_obs_2018041421.h5
satwnd_obs_2018041422.h5
[…]
satwnd_obs_2018041503.h5

Convert the second BUFR file:
$ ./obs2ioda-v2.x -split gdas.satwnd.t06z.20180415.bufr
This generates the following files
satwnd_obs_2018041503.h5 (the old file is overwritten, so its data from the first BUFR file is lost)
satwnd_obs_2018041504.h5
[…]
satwnd_obs_2018041509.h5

To prevent any data loss, I suggest we check whether a file already exists and, if so, abort the execution. I am not aware of a way to combine the NF90_NOCLOBBER and NF90_NETCDF4 creation modes, but adding this sample code at the beginning of subroutine open_netcdf_for_write would achieve the desired goal:

logical :: file_exists
inquire(file = trim(adjustl(fname)), exist = file_exists)
if (file_exists) then
    write(*,*) 'File ',  trim(adjustl(fname)), ' already exists. Exiting.'
    stop
endif
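As a possible alternative to the inquire check, the netCDF-Fortran creation flags are integer bit masks, so NF90_NETCDF4 and NF90_NOCLOBBER can likely be combined with ior. The following is a minimal, untested sketch of that idea. It assumes the installed netCDF build accepts the combination, and it declares ncstatus locally, whereas in the original code it appears to be declared elsewhere.

```fortran
subroutine open_netcdf_for_write(fname, ncfileid)
   use netcdf, only: nf90_create, nf90_strerror, nf90_noerr, &
                     NF90_NETCDF4, NF90_NOCLOBBER
   implicit none
   character(len=*), intent(in)  :: fname
   integer,          intent(out) :: ncfileid
   integer :: ncstatus   ! assumption: local here, not the module-level variable

   ! NF90_NOCLOBBER asks the library to return an error instead of
   ! replacing an existing file; NF90_NETCDF4 selects the HDF5-based format.
   ncstatus = nf90_create(path=trim(adjustl(fname)), &
                          cmode=ior(NF90_NETCDF4, NF90_NOCLOBBER), &
                          ncid=ncfileid)

   if (ncstatus /= nf90_noerr) then
      write(0,'(a)') 'error creating netcdf file '//trim(adjustl(fname))
      write(0,'(a)') trim(nf90_strerror(ncstatus))
      stop 1
   end if
end subroutine open_netcdf_for_write
```

With this variant the existence check and the file creation happen in a single library call, so there is no window between an inquire and the subsequent create.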

In the future, we may want to improve the file handling. For example, if obs2ioda-v2.x is about to overwrite an existing file, we could check whether max_datetime stored in the file is earlier than min_datetime of the newly processed data and, if so, append the new data instead.
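The decision logic for such improved file handling could be sketched as follows. This is only an illustration: the module name, the function choose_action, the ACTION_* constants, and the representation of max_datetime and min_datetime as 64-bit integers (for example, yyyymmddhhmmss) are all hypothetical and do not exist in obs2ioda.

```fortran
module obs_file_policy
   use, intrinsic :: iso_fortran_env, only: int64
   implicit none
   ! Hypothetical action codes returned by choose_action
   integer, parameter :: ACTION_CREATE = 0   ! no file yet: create it
   integer, parameter :: ACTION_APPEND = 1   ! new data is strictly later: append
   integer, parameter :: ACTION_ABORT  = 2   ! time ranges may overlap: stop
contains
   pure function choose_action(file_exists, existing_max_datetime, &
                               new_min_datetime) result(action)
      logical,        intent(in) :: file_exists
      ! Assumption: datetimes are encoded so that integer order equals time order
      integer(int64), intent(in) :: existing_max_datetime
      integer(int64), intent(in) :: new_min_datetime
      integer :: action

      if (.not. file_exists) then
         action = ACTION_CREATE
      else if (existing_max_datetime < new_min_datetime) then
         action = ACTION_APPEND
      else
         action = ACTION_ABORT
      end if
   end function choose_action
end module obs_file_policy
```

A caller would obtain file_exists from inquire, read max_datetime from the existing file, and then create, append to, or refuse to touch the file depending on the returned code.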

Adding @liujake @junmeiban @ibanos90. Please let me know if I am missing something or if you disagree with my interpretation. Thank you!


ibanos90 · 2024-11-12T16:40:45Z

Hi @st-ncar, I am not sure if I understand the issue correctly, but we usually convert the observations for each analysis time in separate folders. For example, in this case the output files of ./obs2ioda-v2.x -split gdas.satwnd.t00z.20180415.bufr will go to the folder 2018041500, and those of ./obs2ioda-v2.x -split gdas.satwnd.t06z.20180415.bufr will go to 2018041506. Therefore, no files will be overwritten, if I am not mistaken. Does that make sense?

st-ncar mentioned this issue Nov 12, 2024

obs2ioda-v2.x processing limited to one 6h window #20

Open
