This repository collects research papers on adapting vision-language models (VLMs) without labeled data. It will be continuously updated to track the latest work in the community.
Keywords: unsupervised learning, test-time adaptation, vision-language models.
- [Aug 7, 2025] Just finished the manuscript.
- Data-Free Transfer
- Unsupervised Domain Transfer
- Episodic Test-Time Adaptation
- Online Test-Time Adaptation
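As background for the topics above, here is a minimal sketch of CLIP-style zero-shot classification and the prediction-entropy quantity that many of the test-time adaptation papers listed below minimize (e.g. TPT-style objectives). The feature arrays are illustrative placeholders, not real CLIP embeddings:

```python
import numpy as np

def zero_shot_probs(image_feat, text_feats, temperature=0.01):
    """CLIP-style zero-shot classification: cosine similarity between one
    image embedding and per-class text embeddings, turned into a softmax
    distribution over classes."""
    img = image_feat / np.linalg.norm(image_feat)
    txt = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    logits = txt @ img / temperature
    logits -= logits.max()          # numerical stability
    p = np.exp(logits)
    return p / p.sum()

def entropy(p):
    """Prediction entropy; several test-time adaptation methods adapt
    prompts or features by minimizing this over augmented views."""
    return -np.sum(p * np.log(p + 1e-12))

# Toy example with placeholder 2-D "embeddings":
text_feats = np.array([[1.0, 0.0],   # class 0 text embedding
                       [0.0, 1.0]])  # class 1 text embedding
p = zero_shot_probs(np.array([0.9, 0.1]), text_feats)
print(p, entropy(p))
```

A confident prediction (image feature aligned with one class) yields low entropy, while an ambiguous one approaches the uniform-distribution maximum of `log(num_classes)`; test-time adaptation methods exploit this signal without any labels.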
Please see *Adapting Vision-Language Models Without Labels: A Comprehensive Survey* for details and comprehensive information. If you find our paper and this repository helpful, please consider citing:
```bibtex
@article{dong2025adapting,
  title={Adapting Vision-Language Models Without Labels: A Comprehensive Survey},
  author={Dong, Hao and Sheng, Lijun and Liang, Jian and He, Ran and Chatzi, Eleni and Fink, Olga},
  journal={arXiv preprint arXiv:2508.05547},
  year={2025}
}
```

- [ICCV-2025] FLOSS: Free Lunch in Open-vocabulary Semantic Segmentation
- [ICCV-2025] Generate, Transduct, Adapt: Iterative Transduction with VLMs
- [ICCV-2025] BATCLIP: Bimodal Online Test-Time Adaptation for CLIP
- [ICCV-2025] Is Less More? Exploring Token Condensation as Training-free Test-time Adaptation
- [ICML-2025] From Local Details to Global Context: Advancing Vision-Language Models with Attention-based Selection
- [ICML-2025] GS-Bias: Global-Spatial Bias Learner for Single-Image Test-Time Adaptation of Vision-Language Models
- [CVPR-2025] CLIP is Strong Enough to Fight Back: Test-time Counterattacks towards Zero-shot Adversarial Robustness of CLIP
- [CVPR-2025] TAPT: Test-Time Adversarial Prompt Tuning for Robust Inference in Vision-Language Models
- [CVPR-2025] SCAP: Transductive Test-Time Adaptation via Supportive Clique-based Attribute Prompting
- [CVPR-2025] O-TPT: Orthogonality Constraints for Calibrating Test-time Prompt Tuning in Vision-Language Models
- [CVPR-2025] R-TPT: Improving Adversarial Robustness of Vision-Language Models through Test-Time Prompt Tuning
- [CVPR-2025] Realistic Test-Time Adaptation of Vision-Language Models
- [CVPR-2025] SPARC: Score Prompting and Adaptive Fusion for Zero-Shot Multi-Label Recognition in Vision-Language Models
- [CVPR-2025] Free on the Fly: Enhancing Flexibility in Test-Time Adaptation with Online EM
- [CVPR-2025] Bayesian Test-Time Adaptation for Vision-Language Models
- [CVPR-2025] COSMIC: Clique-Oriented Semantic Multi-space Integration for Robust CLIP Test-Time Adaptation
- [CVPR-2025] Hierarchical Knowledge Prompt Tuning for Multi-task Test-Time Adaptation
- [CVPR-2025] On the Zero-shot Adversarial Robustness of Vision-Language Models: A Truly Zero-shot and Training-free Approach
- [ICLR-2025] RA-TTA: Retrieval-Augmented Test-Time Adaptation for Vision-Language Models
- [ICLR-2025] Noisy Test-Time Adaptation in Vision-Language Models
- [ICLR-2025] Efficient and Context-Aware Label Propagation for Zero-/Few-Shot Training-Free Adaptation of Vision-Language Model
- [ICLR-2025] DynaPrompt: Dynamic Test-Time Prompt Tuning
- [ICLR-2025] Test-time Adaptation for Cross-modal Retrieval with Query Shift
- [AAAI-2025] Learning to Prompt with Text Only Supervision for Vision-Language Models
- [AAAI-2025] Spurious Feature Eraser: Stabilizing Test-Time Adaptation for Vision-Language Foundation Model
- [WACV-2025] Pay Attention to Your Neighbours: Training-Free Open-Vocabulary Semantic Segmentation
- [WACV-2025] Enhancing Visual Classification using Comparative Descriptors
- [WACV-2025] DPA: Dual Prototypes Alignment for Unsupervised Adaptation of Vision-Language Models
- [WACV-2025] LATTECLIP: Unsupervised CLIP Fine-Tuning via LMM-Synthetic Texts
- [WACV-2025] Test-Time Low Rank Adaptation via Confidence Maximization for Zero-Shot Generalization of Vision-Language Models
- [WACV-2025] Just Shift It: Test-Time Prototype Shifting for Zero-Shot Generalization with Vision-Language Models
- [WACV-2025] CLIPArTT: Adaptation of CLIP to New Domains at Test Time
- [IJCV-2025] Diffusion-Enhanced Test-time Adaptation with Text and Image Augmentation
- [TIP-2025] Task-to-Instance Prompt Learning for Vision-Language Models at Test Time
- [PR-2025] A Closer Look at the Explainability of Contrastive Language-Image Pre-training
- [PR-2025] CTPT: Continual Test-time Prompt Tuning for Vision-Language Models
- [NeurIPS-2024] Boosting Vision-Language Models with Transduction
- [NeurIPS-2024] OTTER: Effortless Label Distribution Adaptation of Zero-shot Models
- [NeurIPS-2024] Frustratingly Easy Test-Time Adaptation of Vision-Language Models
- [NeurIPS-2024] AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation
- [NeurIPS-2024] WATT: Weight Average Test-Time Adaptation of CLIP
- [NeurIPS-2024] Dual Prototype Evolving for Test-Time Generalization of Vision-Language Models
- [NeurIPS-2024] BoostAdapter: Improving Vision-Language Test-Time Adaptation via Regional Bootstrapping
- [NeurIPS-2024] Historical Test-time Prompt Tuning for Vision Foundation Models
- [ACMMM-2024] WaveDN: A Wavelet-based Training-free Zero-shot Enhancement for Vision-Language Models
- [ACMMM-2024] Towards Robustness Prompt Tuning with Fully Test-Time Adaptation for CLIP’s Zero-Shot Generalization
- [ECCV-2024] Meta-Prompting for Automating Zero-shot Visual Recognition with LLMs
- [ECCV-2024] SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference
- [ECCV-2024] ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference
- [ECCV-2024] ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation
- [ECCV-2024] TAG: Text Prompt Augmentation for Zero-Shot Out-of-Distribution Detection
- [ECCV-2024] Improving Zero-shot Generalization of Learned Prompts via Unsupervised Knowledge Distillation
- [ECCV-2024] uCAP: An Unsupervised Prompting Method for Vision-Language Models
- [ECCV-2024] Robust Calibration of Large Vision-Language Adapters
- [ECCV-2024] Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation
- [ECCV-2024] In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation
- [ECCV-2024] Online Zero-Shot Classification with CLIP
- [IJCAI-2024] DTS-TPT: Dual Temporal-Sync Test-time Prompt Tuning for Zero-shot Activity Recognition
- [ICML-2024] Envisioning Outlier Exposure by Large Language Models for Out-of-Distribution Detection
- [ICML-2024] Realistic Unsupervised CLIP Fine-tuning with Universal Entropy Optimization
- [ICML-2024] Candidate Pseudolabel Learning: Enhancing Vision-Language Models by Prompt Tuning with Unlabeled Data
- [ICML-2024] Robust CLIP: Unsupervised Adversarial Fine-Tuning of Vision Embeddings for Robust Large Vision-Language Models
- [ICML-2024] Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models
- [CVPR-2024] Grounding Everything: Emerging Localization Properties in Vision-Language Transformers
- [CVPR-2024] The Neglected Tails in Vision-Language Models
- [CVPR-2024] PromptKD: Unsupervised Prompt Distillation for Vision-Language Models
- [CVPR-2024] Label Propagation for Zero-shot Classification with Vision-Language Models
- [CVPR-2024] Transductive Zero-Shot and Few-Shot CLIP
- [CVPR-2024] On the Test-Time Zero-Shot Generalization of Vision-Language Models: Do We Really Need Prompt Learning?
- [CVPR-2024] Test-Time Zero-Shot Temporal Action Localization
- [CVPR-2024] Leveraging Cross-Modal Neighbor Representation for Improved CLIP Classification
- [CVPR-2024] Efficient Test-Time Adaptation of Vision-Language Models
- [CVPR-2024] Dual Memory Networks: A Versatile Adaptation Approach for Vision-Language Models
- [CVPR-2024] Improved Self-Training for Test-Time Adaptation
- [CVPR-2024] Any-Shift Prompting for Generalization over Distributions
- [ICLR-2024] Test-Time Adaptation with CLIP Reward for Zero-Shot Generalization in Vision-Language Models
- [ICLR-2024] C-TPT: Calibrated Test-Time Prompt Tuning for Vision-Language Models via Text Feature Dispersion
- [ICLR-2024] PerceptionCLIP: Visual Classification by Inferring and Conditioning on Contexts
- [ICLR-2024] Follow-Up Differential Descriptions: Language Models Resolve Ambiguities for Image Classification
- [AAAI-2024] Robust Test-Time Adaptation for Zero-Shot Prompt Tuning
- [AAAI-2024] DART: Dual-Modal Adaptive Online Prompting and Knowledge Retention for Test-Time Adaptation
- [WACV-2024] ReCLIP: Refine Contrastive Language Image Pre-Training with Source Free Domain Adaptation
- [WACV-2024] CLIP-DIY: CLIP Dense Inference Yields Open-Vocabulary Semantic Segmentation For-Free
- [WACV-2024] DiffCLIP: Leveraging Stable Diffusion for Language Grounded 3D Classification
- [NeurIPS-2023] Neural Priming for Sample-Efficient Adaptation
- [NeurIPS-2023] ChatGPT-Powered Hierarchical Comparisons for Image Classification
- [NeurIPS-2023] LaFTer: Label-Free Tuning of Zero-shot Classifier using Language and Unlabeled Image Collections
- [NeurIPS-2023] Enhancing CLIP with CLIP: Exploring Pseudolabeling for Limited-Label Prompt Tuning
- [NeurIPS-2023] Intra-Modal Proxy Learning for Zero-Shot Visual Categorization with CLIP
- [NeurIPS-2023] SwapPrompt: Test-Time Prompt Adaptation for Vision-Language Models
- [NeurIPS-2023] Diffusion-TTA: Test-time Adaptation of Discriminative Models via Generative Feedback
- [NeurIPS-2023] Align Your Prompts: Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization
- [NeurIPS-2023] Test-Time Distribution Normalization for Contrastively Learned Vision-language Models
- [ACMMM-2023] VPA: Fully Test-Time Visual Prompt Adaptation
- [ICCV-2023] What Does a Platypus Look Like? Generating Customized Prompts for Zero-Shot Image Classification
- [ICCV-2023] SuS-X: Training-Free Name-Only Transfer of Vision-Language Models
- [ICCV-2023] Waffling around for Performance: Visual Classification with Random Words and Broad Concepts
- [ICCV-2023] Diverse Data Augmentation with Diffusions for Effective Test-time Prompt Tuning
- [ICML-2023] CHiLS: Zero-Shot Image Classification with Hierarchical Label Sets
- [ICML-2023] A Simple Zero-shot Prompt Weighting Technique to Improve Prompt Ensembling in Text-Image Models
- [ICML-2023] POUF: Prompt-Oriented Unsupervised Fine-tuning for Large Pre-trained Models
- [CVPR-2023] Texts as Images in Prompt Tuning for Multi-Label Image Recognition
- [CVPR-2023] Improving Zero-shot Generalization and Robustness of Multi-modal Models
- [ICLR-2023] Visual Classification via Description from Large Language Models
- [ICLR-2023] Masked Unsupervised Self-training for Label-free Image Classification
- [AAAI-2023] CALIP: Zero-Shot Enhancement of CLIP with Parameter-free Attention
- [NeurIPS-2022] ReCo: Retrieve and Co-segment for Zero-shot Transfer
- [NeurIPS-2022] Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models
- [ECCV-2022] Extract Free Dense Labels from CLIP
- [ICML-2021] Learning Transferable Visual Models From Natural Language Supervision
