Currently our UNETR implementation has a fixed input shape, see https://github.com/constantinpape/torch-em/blob/main/torch_em/model/unetr.py#L64.
This is due to the fixed input shape of the underlying ViT implementation (either timm/MAE or SAM). However, the only real constraint is the fixed size of the positional encoding. Otherwise the transformer could process sequences of arbitrary length, and consequently images of dynamic shape, as long as they are divisible by the patch shape.
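To make the constraint concrete, here is a minimal sketch of how the token grid depends on the image shape. It is not taken from `unetr.py`; the helper names `patch_grid` and `fits_fixed_pos_embed`, the patch size of 16, and the trained shape of 1024 x 1024 are illustrative assumptions.

```python
def patch_grid(image_shape, patch_size=16):
    """Return the (rows, cols) token grid for a 2D image shape.

    patch_size=16 is an assumed, illustrative value.
    """
    height, width = image_shape
    if height % patch_size or width % patch_size:
        raise ValueError(
            f"Image shape {image_shape} is not divisible by patch size {patch_size}."
        )
    return height // patch_size, width // patch_size


def fits_fixed_pos_embed(image_shape, trained_shape=(1024, 1024), patch_size=16):
    """Check whether an image yields the same token grid as the trained shape.

    A fixed positional encoding holds one entry per token of the trained grid,
    so any other grid does not match it. trained_shape is an assumed value.
    """
    return patch_grid(image_shape, patch_size) == patch_grid(trained_shape, patch_size)


if __name__ == "__main__":
    # Divisible by the patch size, but a different grid than the trained one.
    print(patch_grid((512, 768)))
    print(fits_fixed_pos_embed((512, 768)))
    print(fits_fixed_pos_embed((1024, 1024)))
```

Any shape that passes `patch_grid` could in principle be processed by the transformer blocks; only the positional encoding would need to be adapted to the new grid, for example by resampling it.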
It would be nice to update this so that arbitrary input shapes are supported. But this is currently not a priority. cc @anwai98.