Hi,
Thanks for this impressive work!
I have a question about the Linear Probing performance. I noticed that the results reported in Table 2 are significantly better than those shown in the curves of Figure 5(a).
I am aware that Table 2 utilizes a symmetric ViT (Pixel Decoder) instead of the lightweight (4-layer) decoder. My understanding is that a symmetric/heavier decoder is primarily beneficial for reconstruction quality (rFID). However, it is surprising to see it lead to such a massive improvement in Linear Probing (representation quality) as well.
Does the symmetric decoder alone account for this performance gap?
Could you kindly clarify the training setup for Table 2? Specifically, how much training data was used, and how long was the model trained?