This repository contains the code for Internal Compass, a course project exploring zero-shot image classification using internal confidence scores from Vision-Language Models (VLMs). The work builds on mechanistic interpretability techniques to analyze the internal representations of VLMs and improve zero-shot classification performance.
- Zero-shot image classification using Vision-Language Models.
- Internal confidence-based classification via the logit lens (a minimal sketch follows this list).
- Support for InstructBLIP and LLaVA models.
- Tested on the CIFAR-10 dataset.
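The sketch below illustrates the logit-lens idea behind the internal confidence score: project intermediate hidden states of the language backbone through the final layer norm and LM head, and read off how much probability each layer already assigns to the candidate class tokens. GPT-2 is used here only as a stand-in backbone, and the prompt, class-token handling, and layer aggregation are illustrative assumptions; the repository's actual implementation works on InstructBLIP/LLaVA and differs in detail.

```python
# Illustrative logit-lens sketch (GPT-2 as a stand-in backbone, NOT the repo's VLM pipeline).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in; the project uses the VLM's language backbone instead
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True).eval()

prompt = "This is a photo of a"
class_names = ["airplane", "automobile", "bird", "cat", "deer",
               "dog", "frog", "horse", "ship", "truck"]  # CIFAR-10 labels
# First sub-token of each class name, used as a crude proxy for the full class word.
class_ids = [tok(" " + c, add_special_tokens=False).input_ids[0] for c in class_names]

with torch.no_grad():
    out = model(**tok(prompt, return_tensors="pt"))

# Logit lens: push each layer's hidden state at the last position through the final
# layer norm and LM head, and collect the probability mass on the class tokens.
per_layer_probs = []
for hidden in out.hidden_states:  # one tensor per layer, shape (1, seq_len, d_model)
    logits = model.lm_head(model.transformer.ln_f(hidden[:, -1]))  # GPT-2-specific modules
    per_layer_probs.append(torch.softmax(logits, dim=-1)[0, class_ids])

# Aggregate the per-layer class probabilities into a single internal confidence score
# (mean pooling here; the aggregation choice is an assumption).
internal_confidence = torch.stack(per_layer_probs).mean(dim=0)
print(dict(zip(class_names, internal_confidence.tolist())))
```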
git clone https://github.com/ansh997/incom.git
cd incom
# Create a new conda environment
conda create -n vl python=3.9 -y
conda activate vl
# Install from the repo root first
pip3 install -e .
# Set up LLaVA repo
cd src/caption/llava/LLaVA
pip3 install -e .
# Install some remaining packages
pip3 install lightning openai-clip transformers==4.37.2 omegaconf python-dotenv "numpy<2"
# Missing dependency (only installable via conda, not pip)
conda install conda-forge::pattern -y
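As an optional sanity check (not part of the repository), the installed environment can be verified with a few imports; torch is pulled in by the editable installs above.

```python
# Optional environment sanity check; these imports mirror the packages installed above.
import torch
import lightning
import clip          # provided by the openai-clip package
import transformers

print("torch:", torch.__version__)
print("transformers:", transformers.__version__)  # expected: 4.37.2
print("CUDA available:", torch.cuda.is_available())
```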
The configs for the InstructBLIP models are under src/caption/lavis/configs/. To get InstructBLIP (7B) working, you need to download the pretrained model weights and the Vicuna-7B weights.
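One way to fetch the Vicuna-7B weights is via huggingface_hub, as in the sketch below. The repo id and target directory are assumptions for illustration; check the InstructBLIP config under src/caption/lavis/configs/ for the Vicuna version and local path it actually expects.

```python
# Sketch: download Vicuna-7B weights to a local folder. The repo id and target
# directory are assumptions; point the InstructBLIP config at the resulting path.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="lmsys/vicuna-7b-v1.5",     # assumed repo; verify the version the config expects
    local_dir="checkpoints/vicuna-7b",
)
print("Vicuna weights downloaded to:", local_dir)
```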
We evaluated the method on CIFAR-10 and compared it against standard supervised, one-shot, and zero-shot baselines:
| Method | Accuracy (%) |
|---|---|
| EfficientNet (Supervised) | 89.32 |
| ResNet18 (Supervised) | 90.89 |
| ViT (Supervised) | 96.70 |
| SimCLR (One-Shot) | 93.20 |
| CLIP (Zero-Shot) | 95.10 |
| SigLIP (Zero-Shot) | 95.32 |
| Our Method (Zero-Shot) | 82.31 |
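For context, the CLIP zero-shot baseline in the table can be reproduced along the lines below, using the openai-clip package installed earlier. The backbone, prompt template, and batch size are illustrative choices and not necessarily the ones behind the reported number; torchvision is assumed to be available.

```python
# Minimal CLIP zero-shot baseline on CIFAR-10 (illustrative; choices of backbone,
# prompt template, and batch size are assumptions).
import torch
import clip
from torch.utils.data import DataLoader
from torchvision.datasets import CIFAR10

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

test_set = CIFAR10(root="./data", train=False, download=True, transform=preprocess)
loader = DataLoader(test_set, batch_size=256, num_workers=4)

# One text prompt per CIFAR-10 class.
prompts = [f"a photo of a {c}" for c in test_set.classes]

correct = 0
with torch.no_grad():
    text_features = model.encode_text(clip.tokenize(prompts).to(device))
    text_features /= text_features.norm(dim=-1, keepdim=True)

    for images, labels in loader:
        image_features = model.encode_image(images.to(device))
        image_features /= image_features.norm(dim=-1, keepdim=True)
        preds = (image_features @ text_features.T).argmax(dim=-1).cpu()
        correct += (preds == labels).sum().item()

print(f"CLIP zero-shot accuracy: {100 * correct / len(test_set):.2f}%")
```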
@misc{pal2024internalcompass,
  title={Internal Compass: Zooming Zero-Shot Image Classification with Internal Confidence},
  author={Himanshu Pal and Snigdha Agarwal and Uday Bhaskar},
  institution={IIIT Hyderabad},
  year={2024}
}