
FileNotFoundError when writing an array that contains empty chunks #3136

Open
@lukasbindreiter


Zarr version

v3.0.8

Numcodecs version

v0.16.1

Python Version

3.13.1

Operating System

Mac

Installation

using uv

Description

There seems to be an issue with the zarr + obstore integration that results in a FileNotFoundError when writing data into a newly created, empty array if some of the chunks being written are empty (all zeros).

Curiously, the issue only surfaces when using a GCSStore or a LocalStore; with S3 it seems to work as expected.
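The traceback below ends in `_write_key` calling `byte_setter.delete()`, which suggests the write path behaves roughly like this simplified sketch: a chunk whose values all equal the fill value is not written, and its key is deleted instead. All names here (`StrictStore`, `LenientStore`, `write_chunk`) are hypothetical, and the code is synchronous for brevity; it only models why a store whose delete raises on a missing key would fail while one that ignores missing keys would not. That would be consistent with S3 working, since S3's DeleteObject generally reports success for a nonexistent key, whereas GCS answered with a 404 in the traceback below.

```python
"""Simplified, synchronous model of the chunk write path (hypothetical names)."""


class StrictStore:
    """Toy store whose delete() raises for a missing key (as GCS/LocalStore did here)."""

    def __init__(self) -> None:
        self.data: dict[str, bytes] = {}

    def set(self, key: str, value: bytes) -> None:
        self.data[key] = value

    def delete(self, key: str) -> None:
        if key not in self.data:
            raise FileNotFoundError(f"Object at location {key} not found")
        del self.data[key]


class LenientStore(StrictStore):
    """Toy store whose delete() silently ignores a missing key (as S3 appeared to)."""

    def delete(self, key: str) -> None:
        self.data.pop(key, None)


def write_chunk(store, key, chunk, fill_value=0, write_empty_chunks=False):
    """Store a chunk, or delete its key if every value equals fill_value."""
    if not write_empty_chunks and all(v == fill_value for v in chunk):
        # An "empty" chunk is not stored; any stale copy is removed so reads
        # fall back to the fill value. On a fresh array there is nothing to remove.
        store.delete(key)
    else:
        store.set(key, repr(chunk).encode())  # stand-in for real chunk encoding


if __name__ == "__main__":
    zeros = [0] * 4
    write_chunk(LenientStore(), "arr/c/0/1/2", zeros)  # no error
    try:
        write_chunk(StrictStore(), "arr/c/0/1/2", zeros)
    except FileNotFoundError as exc:
        print(exc)  # Object at location arr/c/0/1/2 not found
```

In this model, non-empty chunks never touch `delete`, which matches the observation that random data writes fine.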

import numpy as np
import zarr
from obstore.store import LocalStore
from zarr.storage import ObjectStore

zarr_store = ObjectStore(LocalStore("test_zarr_store"))  # issue also comes up with GCSStore
arr = zarr.create_array(zarr_store, name="arr", shape=(5, 128, 128), dtype=np.uint16, chunks=(1, 32, 32))

# this will fail with (traceback below is from a GCSStore run): FileNotFoundError: Object at location debug/zarr_issue/arr/c/0/1/2 not found
arr[0, :, :] = np.zeros((128, 128), dtype=np.uint16)
FileNotFoundError traceback:
Traceback (most recent call last):
  File "/Users/lukasbindreiter/Documents/tilebox/playground/zarr/gcs_issue.py", line 20, in <module>
    arr[0, :, :] = np.zeros((128, 128), dtype=np.uint16)
    ~~~^^^^^^^^^
  File "/Users/lukasbindreiter/Library/Caches/uv/environments-v2/repr-6df0040f128f3d90/lib/python3.13/site-packages/zarr/core/array.py", line 2553, in __setitem__
    self.set_orthogonal_selection(pure_selection, value, fields=fields)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/lukasbindreiter/Library/Caches/uv/environments-v2/repr-6df0040f128f3d90/lib/python3.13/site-packages/zarr/_compat.py", line 43, in inner_f
    return f(*args, **kwargs)
  File "/Users/lukasbindreiter/Library/Caches/uv/environments-v2/repr-6df0040f128f3d90/lib/python3.13/site-packages/zarr/core/array.py", line 3009, in set_orthogonal_selection
    return sync(
        self._async_array._set_selection(indexer, value, fields=fields, prototype=prototype)
    )
  File "/Users/lukasbindreiter/Library/Caches/uv/environments-v2/repr-6df0040f128f3d90/lib/python3.13/site-packages/zarr/core/sync.py", line 163, in sync
    raise return_result
  File "/Users/lukasbindreiter/Library/Caches/uv/environments-v2/repr-6df0040f128f3d90/lib/python3.13/site-packages/zarr/core/sync.py", line 119, in _runner
    return await coro
           ^^^^^^^^^^
  File "/Users/lukasbindreiter/Library/Caches/uv/environments-v2/repr-6df0040f128f3d90/lib/python3.13/site-packages/zarr/core/array.py", line 1446, in _set_selection
    await self.codec_pipeline.write(
    ...<12 lines>...
    )
  File "/Users/lukasbindreiter/Library/Caches/uv/environments-v2/repr-6df0040f128f3d90/lib/python3.13/site-packages/zarr/core/codec_pipeline.py", line 481, in write
    await concurrent_map(
    ...<6 lines>...
    )
  File "/Users/lukasbindreiter/Library/Caches/uv/environments-v2/repr-6df0040f128f3d90/lib/python3.13/site-packages/zarr/core/common.py", line 76, in concurrent_map
    return await asyncio.gather(*[asyncio.ensure_future(run(item)) for item in items])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/lukasbindreiter/Library/Caches/uv/environments-v2/repr-6df0040f128f3d90/lib/python3.13/site-packages/zarr/core/common.py", line 74, in run
    return await func(*item)
           ^^^^^^^^^^^^^^^^^
  File "/Users/lukasbindreiter/Library/Caches/uv/environments-v2/repr-6df0040f128f3d90/lib/python3.13/site-packages/zarr/core/codec_pipeline.py", line 431, in write_batch
    await concurrent_map(
    ...<8 lines>...
    )
  File "/Users/lukasbindreiter/Library/Caches/uv/environments-v2/repr-6df0040f128f3d90/lib/python3.13/site-packages/zarr/core/common.py", line 76, in concurrent_map
    return await asyncio.gather(*[asyncio.ensure_future(run(item)) for item in items])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/lukasbindreiter/Library/Caches/uv/environments-v2/repr-6df0040f128f3d90/lib/python3.13/site-packages/zarr/core/common.py", line 74, in run
    return await func(*item)
           ^^^^^^^^^^^^^^^^^
  File "/Users/lukasbindreiter/Library/Caches/uv/environments-v2/repr-6df0040f128f3d90/lib/python3.13/site-packages/zarr/core/codec_pipeline.py", line 427, in _write_key
    await byte_setter.delete()
  File "/Users/lukasbindreiter/Library/Caches/uv/environments-v2/repr-6df0040f128f3d90/lib/python3.13/site-packages/zarr/storage/_common.py", line 165, in delete
    await self.store.delete(self.path)
  File "/Users/lukasbindreiter/Library/Caches/uv/environments-v2/repr-6df0040f128f3d90/lib/python3.13/site-packages/zarr/storage/_obstore.py", line 184, in delete
    await obs.delete_async(self.store, key)
FileNotFoundError: Object at location debug/zarr_issue/arr/c/0/1/2 not found: Error performing DELETE https://storage.googleapis.com/workflow%2Dcache%2D15c9850/debug%2Fzarr%5Fissue%2Farr%2Fc%2F0%2F1%2F2 in 72.532333ms - Server returned non-2xx status code: 404 Not Found: <?xml version='1.0' encoding='UTF-8'?><Error><Code>NoSuchKey</Code><Message>The specified key does not exist.</Message><Details>No such object: workflow-cache-15c9850/debug/zarr_issue/arr/c/0/1/2</Details></Error>

Debug source:
NotFound {
    path: "debug/zarr_issue/arr/c/0/1/2",
    source: RetryError {
        method: DELETE,
        uri: Some(
            https://storage.googleapis.com/workflow%2Dcache%2D15c9850/debug%2Fzarr%5Fissue%2Farr%2Fc%2F0%2F1%2F2,
        ),
        retries: 0,
        max_retries: 10,
        elapsed: 72.532333ms,
        retry_timeout: 180s,
        inner: Status {
            status: 404,
            body: Some(
                "<?xml version='1.0' encoding='UTF-8'?><Error><Code>NoSuchKey</Code><Message>The specified key does not exist.</Message><Details>No such object: workflow-cache-15c9850/debug/zarr_issue/arr/c/0/1/2</Details></Error>",
            ),
        },
    },
}

The issue only surfaces when one (or all) of the chunks contains only zeros; if I fill the array with random values, it works:

# fails with FileNotFoundError
arr[0, :, :] = np.zeros((128, 128), dtype=np.uint16)

# works
arr[0, :, :] = np.random.randint(0, 100, size=(128, 128), dtype=np.uint16)
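To see why the zero-filled slab triggers deletes, the sketch below enumerates the chunks touched by `arr[0, :, :]` for shape (5, 128, 128) and chunks (1, 32, 32) and checks which of them contain only the fill value. It assumes zarr v3's default chunk key encoding (`c` prefix, `/` separator, keys relative to the array path `arr`) and a fill value of 0; the helper names are made up for illustration, and it uses plain lists instead of NumPy.

```python
from itertools import product


def chunk_keys_for_slab(index, shape, chunks):
    """Chunk keys touched by arr[index, :, :] for a 3-D array.

    Assumes the default v3 "c/..." key encoding and that each dimension
    is an exact multiple of its chunk size.
    """
    n_rows = shape[1] // chunks[1]
    n_cols = shape[2] // chunks[2]
    outer = index // chunks[0]
    return [f"c/{outer}/{i}/{j}" for i, j in product(range(n_rows), range(n_cols))]


def empty_chunks(slab, chunks, fill_value=0):
    """Return (row, col) chunk indices of a 2-D slab whose values all equal fill_value."""
    ch_r, ch_c = chunks[1], chunks[2]
    result = []
    for i, j in product(range(len(slab) // ch_r), range(len(slab[0]) // ch_c)):
        block = (
            slab[r][c]
            for r in range(i * ch_r, (i + 1) * ch_r)
            for c in range(j * ch_c, (j + 1) * ch_c)
        )
        if all(v == fill_value for v in block):
            result.append((i, j))
    return result


shape, chunks = (5, 128, 128), (1, 32, 32)
keys = chunk_keys_for_slab(0, shape, chunks)
print(len(keys))  # 16
print("c/0/1/2" in keys, "c/0/1/0" in keys)  # True True

zeros = [[0] * 128 for _ in range(128)]
print(len(empty_chunks(zeros, chunks)))  # 16: every chunk is "empty"
```

Both keys from the error messages (`c/0/1/2` and `c/0/1/0`) are among the 16 chunks of the slab, and with all-zero data every one of them would go down the delete path on a store where none of them exist yet.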

It makes sense to me that zarr deletes these chunks instead of writing chunks that contain only zeros.
However, it seems that it should use something like a delete_if_exists here instead of a plain delete operation, right?
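The `delete_if_exists` behaviour suggested above can be sketched as a small async wrapper that treats a missing key as already deleted (an idempotent delete). `FakeObjectStore` and `delete_if_exists` are hypothetical names for illustration, not zarr or obstore APIs. Only `FileNotFoundError` is suppressed, so other failures such as permission or network errors still propagate.

```python
import asyncio
import contextlib


class FakeObjectStore:
    """Hypothetical async store whose delete() raises for a missing key,
    mimicking what ObjectStore did with GCS and local backends in this report."""

    def __init__(self) -> None:
        self.data: dict[str, bytes] = {}

    async def set(self, key: str, value: bytes) -> None:
        self.data[key] = value

    async def delete(self, key: str) -> None:
        if key not in self.data:
            raise FileNotFoundError(f"Object at location {key} not found")
        del self.data[key]


async def delete_if_exists(store, key: str) -> None:
    """Delete key, treating an already-missing key as success."""
    with contextlib.suppress(FileNotFoundError):
        await store.delete(key)


async def main() -> None:
    store = FakeObjectStore()
    await delete_if_exists(store, "arr/c/0/1/2")  # missing key: no error
    await store.set("arr/c/0/1/2", b"old chunk")
    await delete_if_exists(store, "arr/c/0/1/2")  # existing key: removed
    print(store.data)  # {}


if __name__ == "__main__":
    asyncio.run(main())
```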

Steps to reproduce

# /// script
# requires-python = ">=3.11"
# dependencies = [
#   "zarr@git+https://github.com/zarr-developers/zarr-python.git@main",
#   "obstore==0.6.0",
# ]
# ///
#
# This script automatically installs the development branch of zarr to check whether the issue persists there

import shutil
from pathlib import Path

import numpy as np
import zarr
from obstore.store import LocalStore
from zarr.storage import ObjectStore

store_location = Path("test_zarr_store")
if store_location.exists():
    # allow running the script multiple times by deleting the output of the previous run
    shutil.rmtree(store_location)
store_location.mkdir()

zarr_store = ObjectStore(LocalStore(store_location))
arr = zarr.create_array(
    zarr_store, name="arr", shape=(5, 128, 128), dtype=np.uint16, chunks=(1, 32, 32)
)
# this will fail with: FileNotFoundError: Object at location test_zarr_store/arr/c/0/1/0 not found
arr[0, :, :] = np.zeros((128, 128), dtype=np.uint16)

# however, if we don't use zeros but random values it works
# arr[0, :, :] = np.random.randint(0, 100, size=(128, 128), dtype=np.uint16)


Labels: bug (Potential issues with the zarr-python library)
