[FEATURE] Support for more strong pretrained vision encoders

Hi, timm is excellent and has helped me a lot in my research. Thanks for your great work!

Do you plan to support more strong vision models such as Depth Anything v1/v2, or the vision encoders of cross-modal MLLMs such as Qwen2.5-VL and LLaVA?
@rwightman