Why does the "diff" command report that two netCDF files differ when they should be exactly the same? #10349

jinqj · 2025-05-20T05:12:07Z


What is your issue?

Hello, I have the following Python code to write out a variable from WRF output. I accidentally found that if I run this script twice without changing anything (I rename the output file after each run, e.g. test1.nc and test2.nc) and then run "diff test1.nc test2.nc", diff says the two files differ. Based on my understanding, the diff command should output nothing, i.e., the two files should be exactly the same. After digging deeper by running the script multiple times, I found that it sometimes produces two identical files and sometimes two different ones. The chance of getting two different files is higher if more time passes between the two executions of the script. Moreover, when I subtract the variable in one file from the variable in the other, the differences are zero over the entire model domain. I tested the script on two different servers and got the same result. I am not sure if this is the right place to report such weird behavior of xarray. Any suggestions or hints that could help me debug my code are appreciated.
Thanks a lot.
--Qinjian Jin

#!/usr/bin/env python
import numpy  as np
import pandas as pd
import xarray as xr

# =================| dir |====================================================
dir_2 = f'/glade/campaign/univ/ukul0004/africa/wrf5/fdda'
dir_o = f'/glade/work/qinjian'

ds2 = xr.open_mfdataset(f'{dir_2}/wrfout_d01_2020-06-05_00:00:00')

out = xr.Dataset()

data = ds2['AOD2D_OUT'].copy()
numb = ds2['AOD2D_OUT'].copy().rename(f'AOD2D_OUT_num')

out = xr.merge([data, numb])

out.to_netcdf(f'{dir_o}/test.nc')

kmuehlbauer · 2025-05-20T06:08:04Z

Maintainer

Thanks @jinqj for raising. Please check if anything discussed in #10028 helps here.

If the data and internal layout of your files are identical, then the following should show no differences:

$ h5dump file1.nc > dump1.txt
$ h5dump file2.nc > dump2.txt
$ diff dump1.txt dump2.txt

For NetCDF4 files, the underlying HDF5 layer can store creation/modification times for any object, which can be the cause of differences in the binary dump.
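
One caveat when diffing the text dumps: h5dump prints the file name on the first line (`HDF5 "file1.nc" {`), so dumps of differently named files always differ there. A minimal Python sketch that ignores that header line; `dumps_match` is a hypothetical helper name, and the strings below are made-up stand-ins for real dump text:

```python
def dumps_match(dump_a: str, dump_b: str) -> bool:
    """Compare two h5dump text dumps, skipping line 1 (it holds the file name)."""
    body_a = dump_a.splitlines()[1:]
    body_b = dump_b.splitlines()[1:]
    return body_a == body_b


# Toy dumps that differ only in the header line.
dump1 = 'HDF5 "file1.nc" {\nGROUP "/" {\n}\n}'
dump2 = 'HDF5 "file2.nc" {\nGROUP "/" {\n}\n}'
print(dumps_match(dump1, dump2))  # True
```

With real files, the text of dump1.txt and dump2.txt would be read and passed in instead of the toy strings.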


jinqj · 2025-05-20T15:54:27Z

Author

Thank you very much @kmuehlbauer! I followed your suggestion and here is what I get:

> h5dump test1.nc > log1
> h5dump test2.nc > log2
> diff log1 log2
1c1
< HDF5 "test1.nc" {
---
> HDF5 "test2.nc" {

diff on the files themselves still shows that they are different:

> diff test1.nc test2.nc
Binary files test1.nc and test2.nc differ

My WRF file is in NetCDF2 format. And my netcdf related libraries are:

# Name                    Version                   Build  Channel
h5netcdf                  1.3.0              pyhd8ed1ab_0    conda-forge
libnetcdf                 4.9.2           nompi_h9612171_113    conda-forge
netcdf-fortran            4.6.1           nompi_hacb5139_103    conda-forge
netcdf4                   1.6.5           nompi_py311he8ad708_100    conda-forge

The creation time stored in the .nc files (which are NetCDF4 files), as you suggested, may be the culprit. Do you know how I can check the creation time of an nc file?


kmuehlbauer · 2025-05-23T10:09:54Z

Maintainer

@jinqj I have to step back a bit here. Modification times etc. of an HDF5 object in the file are optional, and a quick peek into an xarray-generated file (engine="netcdf4") showed that no such times are stored.

But what I found is that the data offsets recorded in the HDF5 OHDR headers sometimes point to the "wrong" data position. Since the data is actually the same in this case (we made a copy), it doesn't matter which data the offset points to. I'm not sure why this happens, but it also leads to different CRC32 checksums at the end of each affected HDF5 OHDR.

I'm adding a simple example to demonstrate the issue:

import numpy as np
import xarray as xr

def create_xarray(num):
    temperature_data = np.array([10*num])
    time = np.array([num])
    ds = xr.Dataset(
        {
            "temperature": ("time", temperature_data),
        },
        coords={
            "time": time,
        },
    )
    ds.to_netcdf(f"test{num}.nc", format="NETCDF4")
    
def create_test(num, swap=False, engine="netcdf4"):
    flist = ["test1.nc", "test2.nc"]
    with xr.open_mfdataset(flist) as ds2:
        out = xr.Dataset()
        data = ds2['temperature'].copy()
        numb = ds2['temperature'].copy().rename('temperature_num')
        if swap:
            numb[:] = np.array([20, 10])
        out = xr.merge([data, numb])
        out.to_netcdf(f"test{num}.nc", engine=engine)

engine = "netcdf4"
create_xarray(1)
create_xarray(2)
for i in range(3,7):
    create_test(i, engine=engine)
for i in range(7,11):
    create_test(i, swap=True, engine=engine)

Create sha256 and hexdumps for comparison:

!sha256sum test1.nc
!sha256sum test2.nc
!sha256sum test3.nc
!sha256sum test4.nc
!sha256sum test5.nc
!sha256sum test6.nc
!sha256sum test7.nc
!sha256sum test8.nc
!sha256sum test9.nc
!sha256sum test10.nc
!xxd test3.nc > test3.hex
!xxd test4.nc > test4.hex
!xxd test5.nc > test5.hex
!xxd test6.nc > test6.hex
!xxd test7.nc > test7.hex
!xxd test8.nc > test8.hex
!xxd test9.nc > test9.hex
!xxd test10.nc > test10.hex

4a41eab85358aacbd2e50768c015c80e861e61fe70d5e8eaf9bc49cf1f563b59  test1.nc
a4ad098151cf9931c453ce92f793ad6e4238c56983e4d31005799de289a8b612  test2.nc
05f61eeea53ca1dae8cf22d25cf73a7ed34d27c7fa227222ad480c26a850b834  test3.nc
49782b1d5df1a09bb64a2cd3e4bae3dd82e278040e0c4809433be4d82ca8aef5  test4.nc
05f61eeea53ca1dae8cf22d25cf73a7ed34d27c7fa227222ad480c26a850b834  test5.nc
05f61eeea53ca1dae8cf22d25cf73a7ed34d27c7fa227222ad480c26a850b834  test6.nc
221f0c2696defa7a7fa716659426fbb8547dc9ef553168a02222f2a310026f9e  test7.nc
221f0c2696defa7a7fa716659426fbb8547dc9ef553168a02222f2a310026f9e  test8.nc
221f0c2696defa7a7fa716659426fbb8547dc9ef553168a02222f2a310026f9e  test9.nc
221f0c2696defa7a7fa716659426fbb8547dc9ef553168a02222f2a310026f9e  test10.nc
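
The same check can be scripted rather than read off by eye. A minimal sketch using only the standard library; `sha256_of` and `group_by_digest` are hypothetical helper names, not part of xarray or netCDF4:

```python
import hashlib
from collections import defaultdict


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()


def group_by_digest(paths):
    """Map each digest to the list of files that share it."""
    groups = defaultdict(list)
    for path in paths:
        groups[sha256_of(path)].append(str(path))
    return dict(groups)


# Example call (assumes the files from the script above exist):
# group_by_digest(f"test{i}.nc" for i in range(3, 11))
```

Files that end up in the same list are byte-for-byte identical; more than one list for the same cycle means that cycle produced more than one on-disk layout.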

We can observe that our two input files (test1.nc and test2.nc) are different, as expected. The first iteration (the plain dataset copy, test3 to test6) produced two distinct versions: test3, test5 and test6 share one hash, while test4 has another. The second iteration (with the changed dataset, test7 to test10) produced only one version. Let's compare the hexdumps. The a602/b602 values are the data offsets, and the other 4-byte differences are the CRC32 checksums.


!diff test3.hex test4.hex
175c175
< 00000ae0: 0000 0000 0301 b602 0000 0000 0000 1000  ................
---
> 00000ae0: 0000 0000 0301 a602 0000 0000 0000 1000  ................
186,187c186,187
< 00000b90: 0000 0000 0000 0000 0000 0000 0000 753a  ..............u:
< 00000ba0: 8f16 4f48 4452 020d 0001 0114 0000 0000  ..OHDR..........
---
> 00000b90: 0000 0000 0000 0000 0000 0000 0000 bdb9  ................
> 00000ba0: 924c 4f48 4452 020d 0001 0114 0000 0000  .LOHDR..........
192c192
< 00000bf0: 0301 a602 0000 0000 0000 1000 0000 0000  ................
---
> 00000bf0: 0301 b602 0000 0000 0000 1000 0000 0000  ................
203c203
< 00000ca0: 0000 0000 0000 0000 0000 f295 1dcb 4f43  ..............OC
---
> 00000ca0: 0000 0000 0000 0000 0000 1e44 0999 4f43  ...........D..OC
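
To make the a602/b602 fields easier to read: they are the first bytes of little-endian addresses. A short sketch decoding them, assuming 8-byte file offsets and that the preceding `03 01` bytes are the layout message version and class; the hex strings are copied from the diff above and `decode_le` is a hypothetical helper name:

```python
def decode_le(raw: bytes) -> int:
    """Interpret raw bytes as an unsigned little-endian integer."""
    return int.from_bytes(raw, "little")


# xxd row 0x00000bf0 from the test4.hex side of the diff above.
row = bytes.fromhex("0301 b602 0000 0000 0000 1000 0000 0000")

version, layout_class = row[0], row[1]  # assumed: layout version, layout class
address_b6 = decode_le(row[2:10])       # 8-byte address starting with b6 02
address_a6 = decode_le(bytes.fromhex("a602000000000000"))

print(version, layout_class, address_b6, address_a6)  # 3 1 694 678
```

The values 678 and 694 are the two "Data address" entries that appear in the layout messages of the h5debug output further down, which supports reading these fields as data offsets.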

This now compares one of the first-cycle files with the swapped-data version. In addition to the differences above, we now also see the data differences (0a00/1400).


!diff test3.hex test7.hex
44c44
< 000002b0: 0000 0000 0000 0a00 0000 0000 0000 1400  ................
---
> 000002b0: 0000 0000 0000 1400 0000 0000 0000 0a00  ................
175c175
< 00000ae0: 0000 0000 0301 b602 0000 0000 0000 1000  ................
---
> 00000ae0: 0000 0000 0301 a602 0000 0000 0000 1000  ................
186,187c186,187
< 00000b90: 0000 0000 0000 0000 0000 0000 0000 753a  ..............u:
< 00000ba0: 8f16 4f48 4452 020d 0001 0114 0000 0000  ..OHDR..........
---
> 00000b90: 0000 0000 0000 0000 0000 0000 0000 bdb9  ................
> 00000ba0: 924c 4f48 4452 020d 0001 0114 0000 0000  .LOHDR..........
192c192
< 00000bf0: 0301 a602 0000 0000 0000 1000 0000 0000  ................
---
> 00000bf0: 0301 b602 0000 0000 0000 1000 0000 0000  ................
203c203
< 00000ca0: 0000 0000 0000 0000 0000 f295 1dcb 4f43  ..............OC
---
> 00000ca0: 0000 0000 0000 0000 0000 1e44 0999 4f43  ...........D..OC

If we compare the second version of the first cycle with the swapped data, everything is in place, only the data has changed:


!diff test4.hex test7.hex
44c44
< 000002b0: 0000 0000 0000 0a00 0000 0000 0000 1400  ................
---
> 000002b0: 0000 0000 0000 1400 0000 0000 0000 0a00  ................
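
The changed bytes in this row are the data values themselves. A short sketch, assuming the values are little-endian 64-bit integers (as the datatype message in the h5debug output further down reports) and that the first value starts at byte 6 of the xxd row; `first_value` is a hypothetical helper name:

```python
# xxd row 0x000002b0 from both sides of the diff above.
row_test4 = bytes.fromhex("0000 0000 0000 0a00 0000 0000 0000 1400")
row_test7 = bytes.fromhex("0000 0000 0000 1400 0000 0000 0000 0a00")


def first_value(row: bytes, start: int = 6) -> int:
    """Decode the 8-byte little-endian integer that begins at `start`."""
    return int.from_bytes(row[start:start + 8], "little", signed=True)


# File position of that first value: row offset plus 6 bytes.
data_start = 0x2B0 + 6

print(first_value(row_test4), first_value(row_test7), data_start)  # 10 20 694
```

Only the low two bytes of the second value (`14 00` and `0a 00`) fall inside this row; the rest sits on the next row, which the diff does not show. The position 694 equals the b602 offset seen in the object headers, so this row appears to hold the array that one of the two variables points to.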

If we use engine="h5netcdf", this strange behaviour (offsets pointing to the wrong location) does not happen. But we might still get differing files, since the timestamps (access time, modification time, change time, birth time) are encoded into the HDF5 OHDR. (The internal file order is also different for the two engines.) As the times are encoded as seconds since the epoch, this only has an effect if the files/objects are not written within the same second.
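
A toy illustration of the whole-second point (plain Python, not an HDF5 API call; the timestamps are made up and `stored_time` is a hypothetical helper name):

```python
def stored_time(unix_time: float) -> int:
    """Truncate a Unix timestamp to whole seconds, as a seconds-resolution field would."""
    return int(unix_time)


# Two writes within the same second get the same stored value ...
same_second = stored_time(1747720000.20) == stored_time(1747720000.95)
# ... while two writes straddling a second boundary do not.
next_second = stored_time(1747720000.95) == stored_time(1747720001.05)

print(same_second, next_second)  # True False
```

This would be consistent with the original observation that the files are more likely to differ when more time passes between runs, at least for files written with timestamps enabled.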

You can use the low level debug tool h5debug to gain some more information from the files (48 is the OHDR address for the engine="netcdf4" in my case, other addresses might vary in your case too and also for engine="h5netcdf"):

# get overview
!h5debug test3.nc
# inspect ohdr address
!h5debug test3.nc 48
# inspect temperature object
!h5debug test3.nc 2710
# inspect temperature_num object
!h5debug test3.nc 2978

Output:

Reading signature at address 0 (rel)
File Super Block...
File name (as opened):                             test3.nc
File name (after resolving symlinks):              test3.nc
File access flags                                  0x00000000
File open reference count:                         1
Address of super block:                            0 (abs)
Size of userblock:                                 0 bytes
Superblock version number:                         2
Free list version number:                          0
Root group symbol table entry version number:      0
Shared header version number:                      0
Size of file offsets (haddr_t type):               8 bytes
Size of file lengths (hsize_t type):               8 bytes
Symbol table leaf node 1/2 rank:                   4
Symbol table internal node 1/2 rank:               16
Indexed storage internal node 1/2 rank:            32
File status flags:                                 0x00
Superblock extension address:                      18446744073709551615 (rel)
Shared object header message table address:        18446744073709551615 (rel)
Shared object header message version number:       0
Number of shared object header message indexes:    0
Address of driver information block:               18446744073709551615 (rel)
Root group symbol table entry:                     
   Name offset into private heap:                  0
   Object header address:                          48
   Cache info type:                                Nothing Cached
------------------------------------------------------------------------
Reading signature at address 48 (rel)
Object Header...
Dirty:                                             FALSE
Version:                                           2
Header size (in bytes):                            11
Number of links:                                   1
Attribute creation order tracked:                  Yes
Attribute creation order indexed:                  Yes
Attribute storage phase change values:             Default
Timestamps:                                        Disabled
Number of messages (allocated):                    11 (16)
Number of chunks (allocated):                      3 (4)
Chunk 0...
   Address:                                        48
   Size in bytes:                                  180
   Gap:                                            0
Chunk 1...
   Address:                                        3246
   Size in bytes:                                  77
   Gap:                                            0
Chunk 2...
   Address:                                        579
   Size in bytes:                                  83
   Gap:                                            0
Message 0...
   Message ID (sequence number):                   0x0002 `linfo' (0)
   Dirty:                                          FALSE
   Message flags:                                  <none>
   Chunk number:                                   0
   Raw message data (offset, size) in chunk:       (13, 34) bytes
   Message Information:                           
      Track creation order of links:               TRUE
      Index creation order of links:               TRUE
      Number of links:                             18446744073709551615
      Max. creation order value:                   3
      'Dense' link storage fractal heap address:   18446744073709551615
      'Dense' link storage name index v2 B-tree address: 18446744073709551615
      'Dense' link storage creation order index v2 B-tree address: 18446744073709551615
Message 1...
   Message ID (sequence number):                   0x000a `ginfo' (0)
   Dirty:                                          FALSE
   Message flags:                                  <C>
   Chunk number:                                   0
   Raw message data (offset, size) in chunk:       (53, 2) bytes
   Message Information:                           
      Max. compact links:                          8
      Min. dense links:                            6
      Estimated # of objects in group:             4
      Estimated length of object in group's name:  8
Message 2...
   Message ID (sequence number):                   0x0010 `hdr continuation' (0)
   Dirty:                                          FALSE
   Message flags:                                  <none>
   Chunk number:                                   0
   Raw message data (offset, size) in chunk:       (61, 16) bytes
   Message Information:                           
      Continuation address:                        3246
      Continuation size in bytes:                  77
      Points to chunk number:                      1
Message 3...
   Message ID (sequence number):                   0x0000 `null' (0)
   Dirty:                                          FALSE
   Message flags:                                  <none>
   Chunk number:                                   0
   Raw message data (offset, size) in chunk:       (83, 1) bytes
   Message Information:                           
      <No info for this message>
Message 4...
   Message ID (sequence number):                   0x0015 `ainfo' (0)
   Dirty:                                          FALSE
   Message flags:                                  <DS>
   Chunk number:                                   0
   Raw message data (offset, size) in chunk:       (90, 28) bytes
   Message Information:                           
      Number of attributes:                        18446744073709551615
      Track creation order of attributes:          TRUE
      Index creation order of attributes:          TRUE
      Max. creation index value:                   1
      'Dense' attribute storage fractal heap address: 18446744073709551615
      'Dense' attribute storage name index v2 B-tree address: 18446744073709551615
      'Dense' attribute storage creation order index v2 B-tree address: 18446744073709551615
Message 5...
   Message ID (sequence number):                   0x0010 `hdr continuation' (1)
   Dirty:                                          FALSE
   Message flags:                                  <none>
   Chunk number:                                   0
   Raw message data (offset, size) in chunk:       (124, 16) bytes
   Message Information:                           
      Continuation address:                        579
      Continuation size in bytes:                  83
      Points to chunk number:                      2
Message 6...
   Message ID (sequence number):                   0x0006 `link' (0)
   Dirty:                                          FALSE
   Message flags:                                  <none>
   Chunk number:                                   0
   Raw message data (offset, size) in chunk:       (146, 30) bytes
   Message Information:                           
      Link Type:                                   Hard
      Creation Order:                              1
      Link Name Character Set:                     ASCII
      Link Name:                                   'temperature'
      Object address:                              2710
Message 7...
   Message ID (sequence number):                   0x0000 `null' (1)
   Dirty:                                          FALSE
   Message flags:                                  <none>
   Chunk number:                                   0
   Raw message data (offset, size) in chunk:       (182, 5) bytes
   Message Information:                           
      <No info for this message>
Message 8...
   Message ID (sequence number):                   0x0006 `link' (1)
   Dirty:                                          FALSE
   Message flags:                                  <none>
   Chunk number:                                   1
   Raw message data (offset, size) in chunk:       (10, 23) bytes
   Message Information:                           
      Link Type:                                   Hard
      Creation Order:                              0
      Link Name Character Set:                     ASCII
      Link Name:                                   'time'
      Object address:                              239
Message 9...
   Message ID (sequence number):                   0x0006 `link' (2)
   Dirty:                                          FALSE
   Message flags:                                  <none>
   Chunk number:                                   1
   Raw message data (offset, size) in chunk:       (39, 34) bytes
   Message Information:                           
      Link Type:                                   Hard
      Creation Order:                              2
      Link Name Character Set:                     ASCII
      Link Name:                                   'temperature_num'
      Object address:                              2978
Message 10...
   Message ID (sequence number):                   0x000c `attribute' (0)
   Dirty:                                          FALSE
   Message flags:                                  <none>
   Chunk number:                                   2
   Raw message data (offset, size) in chunk:       (10, 69) bytes
   Message Information:                           
      Name:                                        "_NCProperties"
      Character Set of Name:                       ASCII
      Object opened:                               FALSE
      Object:                                      0
      Creation Index:                              0
      Datatype...
         Encoded Size:                             8
         Type class:                               text string
         Size:                                     34 bytes
         Version:                                  1
         Character Set:                            ASCII
         String Padding:                           NULL Terminated
      Dataspace...
         Encoded Size:                             4
         Space class:                              H5S_SCALAR
------------------------------------------------------------------------
Reading signature at address 2710 (rel)
Object Header...
Dirty:                                             FALSE
Version:                                           2
Header size (in bytes):                            12
Number of links:                                   1
Attribute creation order tracked:                  Yes
Attribute creation order indexed:                  Yes
Attribute storage phase change values:             Default
Timestamps:                                        Disabled
Number of messages (allocated):                    9 (16)
Number of chunks (allocated):                      2 (2)
Chunk 0...
   Address:                                        2710
   Size in bytes:                                  256
   Gap:                                            0
Chunk 1...
   Address:                                        3323
   Size in bytes:                                  90
   Gap:                                            0
Message 0...
   Message ID (sequence number):                   0x0001 `dataspace' (0)
   Dirty:                                          FALSE
   Message flags:                                  <none>
   Chunk number:                                   0
   Raw message data (offset, size) in chunk:       (14, 20) bytes
   Message Information:                           
      Rank:                                        1
      Dim Size:                                    {2}
      Dim Max:                                     {2}
Message 1...
   Message ID (sequence number):                   0x0003 `datatype' (0)
   Dirty:                                          FALSE
   Message flags:                                  <C>
   Chunk number:                                   0
   Raw message data (offset, size) in chunk:       (40, 12) bytes
   Message Information:                           
      Type class:                                  integer
      Size:                                        8 bytes
      Version:                                     1
      Byte order:                                  little endian
      Precision:                                   64 bits
      Offset:                                      0 bits
      Low pad type:                                zero
      High pad type:                               zero
      Sign scheme:                                 2's comp
Message 2...
   Message ID (sequence number):                   0x0005 `fill_new' (0)
   Dirty:                                          FALSE
   Message flags:                                  <C>
   Chunk number:                                   0
   Raw message data (offset, size) in chunk:       (58, 14) bytes
   Message Information:                           
      Space Allocation Time:                       Late
      Fill Time:                                   If Set
      Fill Value Defined:                          User Defined
      Size:                                        8
      Data type:                                   <dataset type>
Message 3...
   Message ID (sequence number):                   0x0008 `layout' (0)
   Dirty:                                          FALSE
   Message flags:                                  <none>
   Chunk number:                                   0
   Raw message data (offset, size) in chunk:       (78, 18) bytes
   Message Information:                           
      Version:                                     3
      Type:                                        Contiguous
      Data address:                                678
      Data Size:                                   16
Message 4...
   Message ID (sequence number):                   0x0015 `ainfo' (0)
   Dirty:                                          FALSE
   Message flags:                                  <DS>
   Chunk number:                                   0
   Raw message data (offset, size) in chunk:       (102, 28) bytes
   Message Information:                           
      Number of attributes:                        18446744073709551615
      Track creation order of attributes:          TRUE
      Index creation order of attributes:          TRUE
      Max. creation index value:                   2
      'Dense' attribute storage fractal heap address: 18446744073709551615
      'Dense' attribute storage name index v2 B-tree address: 18446744073709551615
      'Dense' attribute storage creation order index v2 B-tree address: 18446744073709551615
Message 5...
   Message ID (sequence number):                   0x000c `attribute' (0)
   Dirty:                                          FALSE
   Message flags:                                  <none>
   Chunk number:                                   0
   Raw message data (offset, size) in chunk:       (136, 65) bytes
   Message Information:                           
      Name:                                        "_Netcdf4Coordinates"
      Character Set of Name:                       ASCII
      Object opened:                               FALSE
      Object:                                      0
      Creation Index:                              0
      Datatype...
         Encoded Size:                             12
         Type class:                               integer
         Size:                                     4 bytes
         Version:                                  1
         Byte order:                               little endian
         Precision:                                32 bits
         Offset:                                   0 bits
         Low pad type:                             zero
         High pad type:                            zero
         Sign scheme:                              2's comp
      Dataspace...
         Encoded Size:                             20
         Space class:                              H5S_SIMPLE
            Rank:                                  1
            Dim Size:                              {1}
            Dim Max:                               {1}
Message 6...
   Message ID (sequence number):                   0x0010 `hdr continuation' (0)
   Dirty:                                          FALSE
   Message flags:                                  <none>
   Chunk number:                                   0
   Raw message data (offset, size) in chunk:       (207, 16) bytes
   Message Information:                           
      Continuation address:                        3323
      Continuation size in bytes:                  90
      Points to chunk number:                      1
Message 7...
   Message ID (sequence number):                   0x0000 `null' (0)
   Dirty:                                          FALSE
   Message flags:                                  <none>
   Chunk number:                                   0
   Raw message data (offset, size) in chunk:       (229, 35) bytes
   Message Information:                           
      <No info for this message>
Message 8...
   Message ID (sequence number):                   0x000c `attribute' (1)
   Dirty:                                          FALSE
   Message flags:                                  <none>
   Chunk number:                                   1
   Raw message data (offset, size) in chunk:       (10, 76) bytes
   Message Information:                           
      Name:                                        "DIMENSION_LIST"
      Character Set of Name:                       ASCII
      Object opened:                               FALSE
      Object:                                      0
      Creation Index:                              1
      Datatype...
         Encoded Size:                             16
         Type class:                               vlen
         Size:                                     16 bytes
         Version:                                  1
         Vlen type:                                sequence
         Location:                                 H5T_LOC_0
      Dataspace...
         Encoded Size:                             20
         Space class:                              H5S_SIMPLE
            Rank:                                  1
            Dim Size:                              {1}
            Dim Max:                               {1}
------------------------------------------------------------------------
Reading signature at address 2978 (rel)
Object Header...
Dirty:                                             FALSE
Version:                                           2
Header size (in bytes):                            12
Number of links:                                   1
Attribute creation order tracked:                  Yes
Attribute creation order indexed:                  Yes
Attribute storage phase change values:             Default
Timestamps:                                        Disabled
Number of messages (allocated):                    9 (16)
Number of chunks (allocated):                      2 (2)
Chunk 0...
   Address:                                        2978
   Size in bytes:                                  256
   Gap:                                            0
Chunk 1...
   Address:                                        3559
   Size in bytes:                                  90
   Gap:                                            0
Message 0...
   Message ID (sequence number):                   0x0001 `dataspace' (0)
   Dirty:                                          FALSE
   Message flags:                                  <none>
   Chunk number:                                   0
   Raw message data (offset, size) in chunk:       (14, 20) bytes
   Message Information:                           
      Rank:                                        1
      Dim Size:                                    {2}
      Dim Max:                                     {2}
Message 1...
   Message ID (sequence number):                   0x0003 `datatype' (0)
   Dirty:                                          FALSE
   Message flags:                                  <C>
   Chunk number:                                   0
   Raw message data (offset, size) in chunk:       (40, 12) bytes
   Message Information:                           
      Type class:                                  integer
      Size:                                        8 bytes
      Version:                                     1
      Byte order:                                  little endian
      Precision:                                   64 bits
      Offset:                                      0 bits
      Low pad type:                                zero
      High pad type:                               zero
      Sign scheme:                                 2's comp
Message 2...
   Message ID (sequence number):                   0x0005 `fill_new' (0)
   Dirty:                                          FALSE
   Message flags:                                  <C>
   Chunk number:                                   0
   Raw message data (offset, size) in chunk:       (58, 14) bytes
   Message Information:                           
      Space Allocation Time:                       Late
      Fill Time:                                   If Set
      Fill Value Defined:                          User Defined
      Size:                                        8
      Data type:                                   <dataset type>
Message 3...
   Message ID (sequence number):                   0x0008 `layout' (0)
   Dirty:                                          FALSE
   Message flags:                                  <none>
   Chunk number:                                   0
   Raw message data (offset, size) in chunk:       (78, 18) bytes
   Message Information:                           
      Version:                                     3
      Type:                                        Contiguous
      Data address:                                694
      Data Size:                                   16
Message 4...
   Message ID (sequence number):                   0x0015 `ainfo' (0)
   Dirty:                                          FALSE
   Message flags:                                  <DS>
   Chunk number:                                   0
   Raw message data (offset, size) in chunk:       (102, 28) bytes
   Message Information:                           
      Number of attributes:                        18446744073709551615
      Track creation order of attributes:          TRUE
      Index creation order of attributes:          TRUE
      Max. creation index value:                   2
      'Dense' attribute storage fractal heap address: 18446744073709551615
      'Dense' attribute storage name index v2 B-tree address: 18446744073709551615
      'Dense' attribute storage creation order index v2 B-tree address: 18446744073709551615
Message 5...
   Message ID (sequence number):                   0x000c `attribute' (0)
   Dirty:                                          FALSE
   Message flags:                                  <none>
   Chunk number:                                   0
   Raw message data (offset, size) in chunk:       (136, 65) bytes
   Message Information:                           
      Name:                                        "_Netcdf4Coordinates"
      Character Set of Name:                       ASCII
      Object opened:                               FALSE
      Object:                                      0
      Creation Index:                              0
      Datatype...
         Encoded Size:                             12
         Type class:                               integer
         Size:                                     4 bytes
         Version:                                  1
         Byte order:                               little endian
         Precision:                                32 bits
         Offset:                                   0 bits
         Low pad type:                             zero
         High pad type:                            zero
         Sign scheme:                              2's comp
      Dataspace...
         Encoded Size:                             20
         Space class:                              H5S_SIMPLE
            Rank:                                  1
            Dim Size:                              {1}
            Dim Max:                               {1}
Message 6...
   Message ID (sequence number):                   0x0010 `hdr continuation' (0)
   Dirty:                                          FALSE
   Message flags:                                  <none>
   Chunk number:                                   0
   Raw message data (offset, size) in chunk:       (207, 16) bytes
   Message Information:                           
      Continuation address:                        3559
      Continuation size in bytes:                  90
      Points to chunk number:                      1
Message 7...
   Message ID (sequence number):                   0x0000 `null' (0)
   Dirty:                                          FALSE
   Message flags:                                  <none>
   Chunk number:                                   0
   Raw message data (offset, size) in chunk:       (229, 35) bytes
   Message Information:                           
      <No info for this message>
Message 8...
   Message ID (sequence number):                   0x000c `attribute' (1)
   Dirty:                                          FALSE
   Message flags:                                  <none>
   Chunk number:                                   1
   Raw message data (offset, size) in chunk:       (10, 76) bytes
   Message Information:                           
      Name:                                        "DIMENSION_LIST"
      Character Set of Name:                       ASCII
      Object opened:                               FALSE
      Object:                                      0
      Creation Index:                              1
      Datatype...
         Encoded Size:                             16
         Type class:                               vlen
         Size:                                     16 bytes
         Version:                                  1
         Vlen type:                                sequence
         Location:                                 H5T_LOC_0
      Dataspace...
         Encoded Size:                             20
         Space class:                              H5S_SIMPLE
            Rank:                                  1
            Dim Size:                              {1}
            Dim Max:                               {1}

Just to mention: message 3 of each of the two temperature objects contains the data address. This data address is what differs between the raw files.

Message 3...
   Message ID (sequence number):                   0x0008 `layout' (0)
   Dirty:                                          FALSE
   Message flags:                                  <none>
   Chunk number:                                   0
   Raw message data (offset, size) in chunk:       (78, 18) bytes
   Message Information:                           
      Version:                                     3
      Type:                                        Contiguous
      Data address:                                694
      Data Size:                                   16
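To compare data addresses between two files without reading the dumps by eye, the h5debug text can be parsed. The following is only a minimal sketch, assuming the h5debug output has been captured as a string (for example by redirecting it to a text file); the helper name `layout_addresses` is made up for illustration and is not part of any HDF5 tooling. It relies only on the `Reading signature at address` and `Data address:` lines as they appear in the dumps in this thread.

```python
import re

# Hypothetical helper: not part of HDF5, h5py, or xarray.
def layout_addresses(h5debug_text):
    """Map each object header address to the 'Data address' of its layout message."""
    addresses = {}
    current_header = None
    for line in h5debug_text.splitlines():
        # A new object header dump starts with this line.
        header = re.match(r"\s*Reading signature at address (\d+)", line)
        if header:
            current_header = int(header.group(1))
            continue
        # The layout message reports where the raw data lives in the file.
        data = re.match(r"\s*Data address:\s+(\d+)", line)
        if data and current_header is not None:
            addresses[current_header] = int(data.group(1))
    return addresses
```

Running this on the dumps of two files and comparing the resulting dictionaries shows directly which objects had their raw data placed at different offsets.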

For engine="h5netcdf":

# get overview
!h5debug test3.nc
# inspect root object header (address taken from the superblock output)
!h5debug test3.nc 96
# inspect temperature object
!h5debug test3.nc 634
# inspect temperature_num object
!h5debug test3.nc 1190


Reading signature at address 0 (rel)
File Super Block...
File name (as opened):                             test3.nc
File name (after resolving symlinks):              test3.nc
File access flags                                  0x00000000
File open reference count:                         1
Address of super block:                            0 (abs)
Size of userblock:                                 0 bytes
Superblock version number:                         0
Free list version number:                          0
Root group symbol table entry version number:      0
Shared header version number:                      0
Size of file offsets (haddr_t type):               8 bytes
Size of file lengths (hsize_t type):               8 bytes
Symbol table leaf node 1/2 rank:                   4
Symbol table internal node 1/2 rank:               16
Indexed storage internal node 1/2 rank:            32
File status flags:                                 0x00
Superblock extension address:                      18446744073709551615 (rel)
Shared object header message table address:        18446744073709551615 (rel)
Shared object header message version number:       0
Number of shared object header message indexes:    0
Address of driver information block:               18446744073709551615 (rel)
Root group symbol table entry:                     
   Name offset into private heap:                  0
   Object header address:                          96
   Cache info type:                                Nothing Cached
------------------------------------------------------------------------
Reading signature at address 96 (rel)
Object Header...
Dirty:                                             FALSE
Version:                                           2
Header size (in bytes):                            27
Number of links:                                   1
Attribute creation order tracked:                  Yes
Attribute creation order indexed:                  Yes
Attribute storage phase change values:             Default
Timestamps:                                        Enabled
Access Time:                                       2025-05-23 12:01:42 CEST
Modification Time:                                 2025-05-23 12:01:42 CEST
Change Time:                                       2025-05-23 12:01:42 CEST
Birth Time:                                        2025-05-23 12:01:42 CEST
Number of messages (allocated):                    9 (16)
Number of chunks (allocated):                      2 (2)
Chunk 0...
   Address:                                        96
   Size in bytes:                                  224
   Gap:                                            0
Chunk 1...
   Address:                                        968
   Size in bytes:                                  126
   Gap:                                            0
Message 0...
   Message ID (sequence number):                   0x0002 `linfo' (0)
   Dirty:                                          FALSE
   Message flags:                                  <none>
   Chunk number:                                   0
   Raw message data (offset, size) in chunk:       (29, 34) bytes
   Message Information:                           
      Track creation order of links:               TRUE
      Index creation order of links:               TRUE
      Number of links:                             18446744073709551615
      Max. creation order value:                   3
      'Dense' link storage fractal heap address:   18446744073709551615
      'Dense' link storage name index v2 B-tree address: 18446744073709551615
      'Dense' link storage creation order index v2 B-tree address: 18446744073709551615
Message 1...
   Message ID (sequence number):                   0x000a `ginfo' (0)
   Dirty:                                          FALSE
   Message flags:                                  <C>
   Chunk number:                                   0
   Raw message data (offset, size) in chunk:       (69, 2) bytes
   Message Information:                           
      Max. compact links:                          8
      Min. dense links:                            6
      Estimated # of objects in group:             4
      Estimated length of object in group's name:  8
Message 2...
   Message ID (sequence number):                   0x0006 `link' (0)
   Dirty:                                          FALSE
   Message flags:                                  <none>
   Chunk number:                                   0
   Raw message data (offset, size) in chunk:       (77, 23) bytes
   Message Information:                           
      Link Type:                                   Hard
      Creation Order:                              0
      Link Name Character Set:                     ASCII
      Link Name:                                   'time'
      Object address:                              347
Message 3...
   Message ID (sequence number):                   0x0006 `link' (1)
   Dirty:                                          FALSE
   Message flags:                                  <none>
   Chunk number:                                   0
   Raw message data (offset, size) in chunk:       (106, 30) bytes
   Message Information:                           
      Link Type:                                   Hard
      Creation Order:                              1
      Link Name Character Set:                     ASCII
      Link Name:                                   'temperature'
      Object address:                              634
Message 4...
   Message ID (sequence number):                   0x0006 `link' (2)
   Dirty:                                          FALSE
   Message flags:                                  <none>
   Chunk number:                                   0
   Raw message data (offset, size) in chunk:       (142, 34) bytes
   Message Information:                           
      Link Type:                                   Hard
      Creation Order:                              2
      Link Name Character Set:                     ASCII
      Link Name:                                   'temperature_num'
      Object address:                              1190
Message 5...
   Message ID (sequence number):                   0x0015 `ainfo' (0)
   Dirty:                                          FALSE
   Message flags:                                  <DS>
   Chunk number:                                   0
   Raw message data (offset, size) in chunk:       (182, 28) bytes
   Message Information:                           
      Number of attributes:                        18446744073709551615
      Track creation order of attributes:          TRUE
      Index creation order of attributes:          TRUE
      Max. creation index value:                   1
      'Dense' attribute storage fractal heap address: 18446744073709551615
      'Dense' attribute storage name index v2 B-tree address: 18446744073709551615
      'Dense' attribute storage creation order index v2 B-tree address: 18446744073709551615
Message 6...
   Message ID (sequence number):                   0x0010 `hdr continuation' (0)
   Dirty:                                          FALSE
   Message flags:                                  <none>
   Chunk number:                                   0
   Raw message data (offset, size) in chunk:       (216, 16) bytes
   Message Information:                           
      Continuation address:                        968
      Continuation size in bytes:                  126
      Points to chunk number:                      1
Message 7...
   Message ID (sequence number):                   0x0000 `null' (0)
   Dirty:                                          FALSE
   Message flags:                                  <none>
   Chunk number:                                   0
   Raw message data (offset, size) in chunk:       (238, 9) bytes
   Message Information:                           
      <No info for this message>
Message 8...
   Message ID (sequence number):                   0x000c `attribute' (0)
   Dirty:                                          FALSE
   Message flags:                                  <none>
   Chunk number:                                   1
   Raw message data (offset, size) in chunk:       (10, 112) bytes
   Message Information:                           
      Name:                                        "_NCProperties"
      Character Set of Name:                       ASCII
      Object opened:                               FALSE
      Object:                                      0
      Creation Index:                              0
      Datatype...
         Encoded Size:                             8
         Type class:                               text string
         Size:                                     72 bytes
         Version:                                  1
         Character Set:                            ASCII
         String Padding:                           NULL Padded
      Dataspace...
         Encoded Size:                             8
         Space class:                              H5S_SCALAR
------------------------------------------------------------------------
Reading signature at address 634 (rel)
Object Header...
Dirty:                                             FALSE
Version:                                           2
Header size (in bytes):                            12
Number of links:                                   1
Attribute creation order tracked:                  Yes
Attribute creation order indexed:                  Yes
Attribute storage phase change values:             Default
Timestamps:                                        Disabled
Number of messages (allocated):                    9 (16)
Number of chunks (allocated):                      2 (2)
Chunk 0...
   Address:                                        634
   Size in bytes:                                  256
   Gap:                                            0
Chunk 1...
   Address:                                        1124
   Size in bytes:                                  66
   Gap:                                            0
Message 0...
   Message ID (sequence number):                   0x0001 `dataspace' (0)
   Dirty:                                          FALSE
   Message flags:                                  <none>
   Chunk number:                                   0
   Raw message data (offset, size) in chunk:       (14, 24) bytes
   Message Information:                           
      Rank:                                        1
      Dim Size:                                    {2}
      Dim Max:                                     {2}
Message 1...
   Message ID (sequence number):                   0x0003 `datatype' (0)
   Dirty:                                          FALSE
   Message flags:                                  <C>
   Chunk number:                                   0
   Raw message data (offset, size) in chunk:       (44, 12) bytes
   Message Information:                           
      Type class:                                  integer
      Size:                                        8 bytes
      Version:                                     1
      Byte order:                                  little endian
      Precision:                                   64 bits
      Offset:                                      0 bits
      Low pad type:                                zero
      High pad type:                               zero
      Sign scheme:                                 2's comp
Message 2...
   Message ID (sequence number):                   0x0005 `fill_new' (0)
   Dirty:                                          FALSE
   Message flags:                                  <C>
   Chunk number:                                   0
   Raw message data (offset, size) in chunk:       (62, 8) bytes
   Message Information:                           
      Space Allocation Time:                       Late
      Fill Time:                                   If Set
      Fill Value Defined:                          Default
      Size:                                        0
      Data type:                                   <dataset type>
Message 3...
   Message ID (sequence number):                   0x0008 `layout' (0)
   Dirty:                                          FALSE
   Message flags:                                  <none>
   Chunk number:                                   0
   Raw message data (offset, size) in chunk:       (76, 18) bytes
   Message Information:                           
      Version:                                     3
      Type:                                        Contiguous
      Data address:                                6160
      Data Size:                                   16
Message 4...
   Message ID (sequence number):                   0x0015 `ainfo' (0)
   Dirty:                                          FALSE
   Message flags:                                  <DS>
   Chunk number:                                   0
   Raw message data (offset, size) in chunk:       (100, 28) bytes
   Message Information:                           
      Number of attributes:                        18446744073709551615
      Track creation order of attributes:          TRUE
      Index creation order of attributes:          TRUE
      Max. creation index value:                   2
      'Dense' attribute storage fractal heap address: 18446744073709551615
      'Dense' attribute storage name index v2 B-tree address: 18446744073709551615
      'Dense' attribute storage creation order index v2 B-tree address: 18446744073709551615
Message 5...
   Message ID (sequence number):                   0x000c `attribute' (0)
   Dirty:                                          FALSE
   Message flags:                                  <none>
   Chunk number:                                   0
   Raw message data (offset, size) in chunk:       (134, 80) bytes
   Message Information:                           
      Name:                                        "DIMENSION_LIST"
      Character Set of Name:                       ASCII
      Object opened:                               FALSE
      Object:                                      0
      Creation Index:                              0
      Datatype...
         Encoded Size:                             16
         Type class:                               vlen
         Size:                                     16 bytes
         Version:                                  1
         Vlen type:                                sequence
         Location:                                 H5T_LOC_0
      Dataspace...
         Encoded Size:                             24
         Space class:                              H5S_SIMPLE
            Rank:                                  1
            Dim Size:                              {1}
            Dim Max:                               {1}
Message 6...
   Message ID (sequence number):                   0x0010 `hdr continuation' (0)
   Dirty:                                          FALSE
   Message flags:                                  <none>
   Chunk number:                                   0
   Raw message data (offset, size) in chunk:       (220, 16) bytes
   Message Information:                           
      Continuation address:                        1124
      Continuation size in bytes:                  66
      Points to chunk number:                      1
Message 7...
   Message ID (sequence number):                   0x0000 `null' (0)
   Dirty:                                          FALSE
   Message flags:                                  <none>
   Chunk number:                                   0
   Raw message data (offset, size) in chunk:       (242, 22) bytes
   Message Information:                           
      <No info for this message>
Message 8...
   Message ID (sequence number):                   0x000c `attribute' (1)
   Dirty:                                          FALSE
   Message flags:                                  <none>
   Chunk number:                                   1
   Raw message data (offset, size) in chunk:       (10, 52) bytes
   Message Information:                           
      Name:                                        "_Netcdf4Dimid"
      Character Set of Name:                       ASCII
      Object opened:                               FALSE
      Object:                                      0
      Creation Index:                              1
      Datatype...
         Encoded Size:                             12
         Type class:                               integer
         Size:                                     4 bytes
         Version:                                  1
         Byte order:                               little endian
         Precision:                                32 bits
         Offset:                                   0 bits
         Low pad type:                             zero
         High pad type:                            zero
         Sign scheme:                              2's comp
      Dataspace...
         Encoded Size:                             8
         Space class:                              H5S_SCALAR
------------------------------------------------------------------------
Reading signature at address 1190 (rel)
Object Header...
Dirty:                                             FALSE
Version:                                           2
Header size (in bytes):                            12
Number of links:                                   1
Attribute creation order tracked:                  Yes
Attribute creation order indexed:                  Yes
Attribute storage phase change values:             Default
Timestamps:                                        Disabled
Number of messages (allocated):                    9 (16)
Number of chunks (allocated):                      2 (2)
Chunk 0...
   Address:                                        1190
   Size in bytes:                                  256
   Gap:                                            0
Chunk 1...
   Address:                                        902
   Size in bytes:                                  66
   Gap:                                            0
Message 0...
   Message ID (sequence number):                   0x0001 `dataspace' (0)
   Dirty:                                          FALSE
   Message flags:                                  <none>
   Chunk number:                                   0
   Raw message data (offset, size) in chunk:       (14, 24) bytes
   Message Information:                           
      Rank:                                        1
      Dim Size:                                    {2}
      Dim Max:                                     {2}
Message 1...
   Message ID (sequence number):                   0x0003 `datatype' (0)
   Dirty:                                          FALSE
   Message flags:                                  <C>
   Chunk number:                                   0
   Raw message data (offset, size) in chunk:       (44, 12) bytes
   Message Information:                           
      Type class:                                  integer
      Size:                                        8 bytes
      Version:                                     1
      Byte order:                                  little endian
      Precision:                                   64 bits
      Offset:                                      0 bits
      Low pad type:                                zero
      High pad type:                               zero
      Sign scheme:                                 2's comp
Message 2...
   Message ID (sequence number):                   0x0005 `fill_new' (0)
   Dirty:                                          FALSE
   Message flags:                                  <C>
   Chunk number:                                   0
   Raw message data (offset, size) in chunk:       (62, 8) bytes
   Message Information:                           
      Space Allocation Time:                       Late
      Fill Time:                                   If Set
      Fill Value Defined:                          Default
      Size:                                        0
      Data type:                                   <dataset type>
Message 3...
   Message ID (sequence number):                   0x0008 `layout' (0)
   Dirty:                                          FALSE
   Message flags:                                  <none>
   Chunk number:                                   0
   Raw message data (offset, size) in chunk:       (76, 18) bytes
   Message Information:                           
      Version:                                     3
      Type:                                        Contiguous
      Data address:                                6176
      Data Size:                                   16
Message 4...
   Message ID (sequence number):                   0x0015 `ainfo' (0)
   Dirty:                                          FALSE
   Message flags:                                  <DS>
   Chunk number:                                   0
   Raw message data (offset, size) in chunk:       (100, 28) bytes
   Message Information:                           
      Number of attributes:                        18446744073709551615
      Track creation order of attributes:          TRUE
      Index creation order of attributes:          TRUE
      Max. creation index value:                   2
      'Dense' attribute storage fractal heap address: 18446744073709551615
      'Dense' attribute storage name index v2 B-tree address: 18446744073709551615
      'Dense' attribute storage creation order index v2 B-tree address: 18446744073709551615
Message 5...
   Message ID (sequence number):                   0x000c `attribute' (0)
   Dirty:                                          FALSE
   Message flags:                                  <none>
   Chunk number:                                   0
   Raw message data (offset, size) in chunk:       (134, 80) bytes
   Message Information:                           
      Name:                                        "DIMENSION_LIST"
      Character Set of Name:                       ASCII
      Object opened:                               FALSE
      Object:                                      0
      Creation Index:                              0
      Datatype...
         Encoded Size:                             16
         Type class:                               vlen
         Size:                                     16 bytes
         Version:                                  1
         Vlen type:                                sequence
         Location:                                 H5T_LOC_0
      Dataspace...
         Encoded Size:                             24
         Space class:                              H5S_SIMPLE
            Rank:                                  1
            Dim Size:                              {1}
            Dim Max:                               {1}
Message 6...
   Message ID (sequence number):                   0x0010 `hdr continuation' (0)
   Dirty:                                          FALSE
   Message flags:                                  <none>
   Chunk number:                                   0
   Raw message data (offset, size) in chunk:       (220, 16) bytes
   Message Information:                           
      Continuation address:                        902
      Continuation size in bytes:                  66
      Points to chunk number:                      1
Message 7...
   Message ID (sequence number):                   0x0000 `null' (0)
   Dirty:                                          FALSE
   Message flags:                                  <none>
   Chunk number:                                   0
   Raw message data (offset, size) in chunk:       (242, 22) bytes
   Message Information:                           
      <No info for this message>
Message 8...
   Message ID (sequence number):                   0x000c `attribute' (1)
   Dirty:                                          FALSE
   Message flags:                                  <none>
   Chunk number:                                   1
   Raw message data (offset, size) in chunk:       (10, 52) bytes
   Message Information:                           
      Name:                                        "_Netcdf4Dimid"
      Character Set of Name:                       ASCII
      Object opened:                               FALSE
      Object:                                      0
      Creation Index:                              1
      Datatype...
         Encoded Size:                             12
         Type class:                               integer
         Size:                                     4 bytes
         Version:                                  1
         Byte order:                               little endian
         Precision:                                32 bits
         Offset:                                   0 bits
         Low pad type:                             zero
         High pad type:                            zero
         Sign scheme:                              2's comp
      Dataspace...
         Encoded Size:                             8
         Space class:                              H5S_SCALAR

Here we can observe the modification times in the object header:

Reading signature at address 96 (rel)
Object Header...
Dirty:                                             FALSE
Version:                                           2
Header size (in bytes):                            27
Number of links:                                   1
Attribute creation order tracked:                  Yes
Attribute creation order indexed:                  Yes
Attribute storage phase change values:             Default
Timestamps:                                        Enabled
Access Time:                                       2025-05-23 12:01:42 CEST
Modification Time:                                 2025-05-23 12:01:42 CEST
Change Time:                                       2025-05-23 12:01:42 CEST
Birth Time:                                        2025-05-23 12:01:42 CEST
Number of messages (allocated):                    9 (16)
Number of chunks (allocated):                      2 (2)
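To check whether the bytes that differ between two files fall near such a header, one can list the differing byte offsets and compare them with the addresses that h5debug prints. This is a rough sketch using only the Python standard library; the function name `differing_offsets` and its parameters are assumptions for illustration. Offsets shortly after the object header address (96 in this file) would be consistent with the timestamps being the cause, though other header bytes may differ as well.

```python
# Hypothetical helper for locating byte-level differences between two files.
def differing_offsets(path_a, path_b, limit=20, chunk_size=65536):
    """Return up to `limit` byte offsets at which the two files differ."""
    offsets = []
    position = 0
    with open(path_a, "rb") as file_a, open(path_b, "rb") as file_b:
        while len(offsets) < limit:
            chunk_a = file_a.read(chunk_size)
            chunk_b = file_b.read(chunk_size)
            if not chunk_a and not chunk_b:
                break  # both files are exhausted
            if chunk_a != chunk_b:
                for i in range(max(len(chunk_a), len(chunk_b))):
                    # A missing byte (one file is shorter) counts as a difference.
                    byte_a = chunk_a[i] if i < len(chunk_a) else None
                    byte_b = chunk_b[i] if i < len(chunk_b) else None
                    if byte_a != byte_b:
                        offsets.append(position + i)
                        if len(offsets) == limit:
                            break
            position += max(len(chunk_a), len(chunk_b))
    return offsets
```

For example, `differing_offsets("test1.nc", "test2.nc")` returns an empty list for byte-identical files, and otherwise a list of offsets that can be matched against the `Address:` and `Data address:` values in the dumps.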

OK, I hope this little digression into the depths of HDF5 was not too exhausting, and that we all now have a somewhat deeper understanding of why and how the binary files sometimes differ (even when the data contents are exactly the same).
