
AutoEncoderKL output tensor dimension mismatch with Input #498

shankartmv opened this issue Jul 11, 2024 · 3 comments

@shankartmv

I am trying to train an AutoencoderKL model on RGB images with dimensions (3, 1225, 966). Here is the code I use (similar to what is in tutorials/generative/2d_ldm/2d_ldm_tutorial.ipynb).
from generative.networks.nets import AutoencoderKL

autoencoderkl = AutoencoderKL(
    spatial_dims=2,
    in_channels=3,
    out_channels=3,
    num_channels=(128, 256, 384),
    latent_channels=8,
    num_res_blocks=1,
    attention_levels=(False, False, False),
    with_encoder_nonlocal_attn=False,
    with_decoder_nonlocal_attn=False,
)
autoencoderkl = autoencoderkl.to(device)

The error is raised at line 27 of the training loop (the "Train Model" cell, as in the tutorial notebook):

recons_loss = F.l1_loss(reconstruction.float(), images.float())
RuntimeError: The size of tensor a (964) must match the size of tensor b (966) at non-singleton dimension 3
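
For reference, a minimal sketch (not from the original report) that reproduces the mismatch with a random tensor on the model defined above; it assumes the AutoencoderKL forward pass returns (reconstruction, z_mu, z_sigma) as in the GenerativeModels tutorials:

import torch

# Feed a dummy batch with the odd spatial sizes from the report (1225 x 966).
x = torch.randn(1, 3, 1225, 966).to(device)
with torch.no_grad():
    reconstruction, z_mu, z_sigma = autoencoderkl(x)

# The reconstruction comes back smaller than the input (1224 x 964 instead of
# 1225 x 966), which is what makes F.l1_loss(reconstruction, x) fail.
print(x.shape)               # torch.Size([1, 3, 1225, 966])
print(reconstruction.shape)  # torch.Size([1, 3, 1224, 964])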

Using the torchinfo package, I printed the model summary and can see the discrepancy in the upsampling layers.

===================================================================================================================
Layer (type:depth-idx) Input Shape Output Shape Param #
===================================================================================================================
AutoencoderKL [1, 3, 1225, 966] [1, 3, 1224, 964] --
├─Encoder: 1-1 [1, 3, 1225, 966] [1, 8, 306, 241] --
│ └─ModuleList: 2-1 -- -- --
│ │ └─Convolution: 3-1 [1, 3, 1225, 966] [1, 128, 1225, 966] 3,584
│ │ └─ResBlock: 3-2 [1, 128, 1225, 966] [1, 128, 1225, 966] 295,680
│ │ └─Downsample: 3-3 [1, 128, 1225, 966] [1, 128, 612, 483] 147,584
│ │ └─ResBlock: 3-4 [1, 128, 612, 483] [1, 256, 612, 483] 919,040
│ │ └─Downsample: 3-5 [1, 256, 612, 483] [1, 256, 306, 241] 590,080
│ │ └─ResBlock: 3-6 [1, 256, 306, 241] [1, 384, 306, 241] 2,312,576
│ │ └─GroupNorm: 3-7 [1, 384, 306, 241] [1, 384, 306, 241] 768
│ │ └─Convolution: 3-8 [1, 384, 306, 241] [1, 8, 306, 241] 27,656
├─Convolution: 1-2 [1, 8, 306, 241] [1, 8, 306, 241] --
│ └─Conv2d: 2-2 [1, 8, 306, 241] [1, 8, 306, 241] 72
├─Convolution: 1-3 [1, 8, 306, 241] [1, 8, 306, 241] --
│ └─Conv2d: 2-3 [1, 8, 306, 241] [1, 8, 306, 241] 72
├─Convolution: 1-4 [1, 8, 306, 241] [1, 8, 306, 241] --
│ └─Conv2d: 2-4 [1, 8, 306, 241] [1, 8, 306, 241] 72
├─Decoder: 1-5 [1, 8, 306, 241] [1, 3, 1224, 964] --
│ └─ModuleList: 2-5 -- -- --
│ │ └─Convolution: 3-9 [1, 8, 306, 241] [1, 384, 306, 241] 28,032
│ │ └─ResBlock: 3-10 [1, 384, 306, 241] [1, 384, 306, 241] 2,656,512
│ │ └─Upsample: 3-11 [1, 384, 306, 241] [1, 384, 612, 482] 1,327,488
│ │ └─ResBlock: 3-12 [1, 384, 612, 482] [1, 256, 612, 482] 1,574,912
│ │ └─Upsample: 3-13 [1, 256, 612, 482] [1, 256, 1224, 964] 590,080
│ │ └─ResBlock: 3-14 [1, 256, 1224, 964] [1, 128, 1224, 964] 476,288
│ │ └─GroupNorm: 3-15 [1, 128, 1224, 964] [1, 128, 1224, 964] 256
│ │ └─Convolution: 3-16 [1, 128, 1224, 964] [1, 3, 1224, 964] 3,459
===================================================================================================================
Total params: 10,954,211
Trainable params: 10,954,211
Non-trainable params: 0
Total mult-adds (Units.TERABYTES): 3.20
===================================================================================================================
Input size (MB): 14.20
Forward/backward pass size (MB): 26803.57
Params size (MB): 43.82
Estimated Total Size (MB): 26861.59
===================================================================================================================

@shankartmv (Author)

After some debugging I found a way to get around this problem. By resizing my images to a standard 3:2 aspect ratio (1024 x 720), the input and output shapes of my AutoencoderKL (as reported by the torchinfo summary) are consistent. I would still like to know the reason behind this error.
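
For illustration only (not part of the original comment), a hedged sketch of that workaround using MONAI's Resize transform; the exact pipeline and sizes are assumptions:

import torch
from monai.transforms import Resize

# Resize each channel-first (C, H, W) image to 1024 x 720; both sizes are
# divisible by 4, so two stride-2 downsamplings followed by two 2x upsamplings
# return to the same spatial size.
resize = Resize(spatial_size=(1024, 720))

image = torch.randn(3, 1225, 966)
resized = resize(image)
print(resized.shape)  # (3, 1024, 720)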

@xmhGit commented Aug 6, 2024

I believe this is caused by downsampling and upsampling on data with a non-power-of-2 dimension.
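
To make the arithmetic concrete (an added illustration, not part of the comment): once an odd size hits a stride-2 downsampling, floor division drops a pixel that the 2x upsamplings cannot recover.

# Trace the width 966 through the two downsampling / upsampling levels:
w = 966
for _ in range(2):
    w = w // 2   # 966 -> 483 -> 241 (483 is odd, so a pixel is lost here)
for _ in range(2):
    w = w * 2    # 241 -> 482 -> 964, no longer the original 966
print(w)         # 964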

@virginiafdez (Contributor)

I think this happens because you have downsamplings that divide the spatial dimensions by 2 followed by upsamplings, so unless you play around with the paddings and strides to make sure things end up with the same size, you might run into errors. I would recommend simply padding your inputs to a size that is consistently divisible by 2 at every downsampling level (see the sketch below).
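
A minimal sketch of that padding approach (my addition, assuming MONAI's DivisiblePad transform): with two stride-2 downsamplings, padding every spatial size to a multiple of 4 keeps the reconstruction the same shape as the input.

import torch
from monai.transforms import DivisiblePad

# Pad each channel-first (C, H, W) image so H and W are divisible by 4
# (two downsampling levels x factor 2 each).
pad = DivisiblePad(k=4)

image = torch.randn(3, 1225, 966)
padded = pad(image)
print(padded.shape)  # (3, 1228, 968)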

virginiafdez self-assigned this on Oct 25, 2024