How to achieve sota performance as shown in Table 2

Hi,

Thanks for this impressive work

I have a question regarding the Linear Probing performance. I noticed that the results reported in Table 2 are significantly better than the curves shown in Figure 5(a).

I am aware that Table 2 utilizes a symmetric ViT (Pixel Decoder) instead of the lightweight (4-layer) decoder. My understanding is that a symmetric/heavier decoder is primarily beneficial for reconstruction quality (rFID). However, it is surprising to see it lead to such a massive improvement in Linear Probing (representation quality) as well.

Does the symmetric decoder alone account for this performance gap?

Could you kindly clarify the training setup for Table 2? Specifically, how much data and how much training data was required?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

How to achieve sota performance as shown in Table 2 #5

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

How to achieve sota performance as shown in Table 2 #5

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions