From 673c18d35a0acc77f339a8797cd38a0189549504 Mon Sep 17 00:00:00 2001
From: marshrossney <17361029+marshrossney@users.noreply.github.com>
Date: Thu, 29 Aug 2024 10:13:29 +0100
Subject: [PATCH] add WIP notebook

---
 notebooks/BioclipExperiment.md | 305 +++++++++++++++++++++++++++++++++
 1 file changed, 305 insertions(+)
 create mode 100644 notebooks/BioclipExperiment.md

diff --git a/notebooks/BioclipExperiment.md b/notebooks/BioclipExperiment.md
new file mode 100644
index 0000000..6a49aa2
--- /dev/null
+++ b/notebooks/BioclipExperiment.md
@@ -0,0 +1,305 @@
---
jupyter:
  jupytext:
    formats: ipynb,md
    text_representation:
      extension: .md
      format_name: markdown
      format_version: '1.3'
    jupytext_version: 1.16.4
  kernelspec:
    display_name: Python 3 (ipykernel)
    language: python
    name: python3
---

# Joe experiments with BioCLIP

```python
import logging
import warnings

from PIL import Image
import torch
import open_clip

# Get clues about what's happening upstream
logging.basicConfig(level=logging.INFO)

warnings.filterwarnings(action="ignore", category=FutureWarning)
```


## Loading and inspecting the model

I am following the (very minimal) instructions for loading the BioCLIP model from the [BioCLIP Hugging Face page](https://huggingface.co/imageomics/bioclip#how-to-get-started-with-the-model), and the more generic ones from the [`open_clip` README](https://github.com/mlfoundations/open_clip?tab=readme-ov-file#usage).

The first step is

```python
model, preprocess_train, preprocess_val = open_clip.create_model_and_transforms('hf-hub:imageomics/bioclip')
```

I had a look at the logic inside [`open_clip.create_model_and_transforms`](https://github.com/mlfoundations/open_clip/blob/main/src/open_clip/factory.py#L375); it is quite messy, confusing and sparsely documented, with defaults silently overridden. Probably best to just treat this as a black box! Most importantly, it returns three things:

1. an instance of [`open_clip.model.CLIP`](https://github.com/mlfoundations/open_clip/blob/main/src/open_clip/model.py#L220) (a subclass of `torch.nn.Module`) with the pretrained weights loaded
2. an instance of [`torchvision.transforms.Compose`](https://pytorch.org/vision/main/generated/torchvision.transforms.Compose.html) that performs the image preprocessing for training
3. another `Compose` that performs the image preprocessing for validation


```python
model, preprocess_train, preprocess_val = open_clip.create_model_and_transforms('hf-hub:imageomics/bioclip')
model.eval()
print(type(model), type(preprocess_train), type(preprocess_val))
```

### Unpacking CLIP

The `forward` pass of `CLIP` encodes both image and text inputs (if both are provided).

#### Images
Images are passed to the `visual` submodule, which is an instance of [`VisionTransformer`](https://github.com/mlfoundations/open_clip/blob/main/src/open_clip/transformer.py#L434). You can get an idea of what a `VisionTransformer` contains and does from the `repr` below, but see the [source code](https://github.com/mlfoundations/open_clip/blob/main/src/open_clip/transformer.py#L434) for more details.
The outputs of the transformer are then [normalised](https://pytorch.org/docs/stable/generated/torch.nn.functional.normalize.html) (divided by their $\ell_2$ norm).

#### Text
Text goes through the following stages (a quick sketch of calling both encoders directly follows the list):

1. Token embedding (`token_embedding`)
2. Transformer (`transformer`)
3. Layer norm (`ln_final`)
4. Normalisation (another division by the $\ell_2$ norm; this step is functional, so it is not listed as a module)
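
As a quick sanity check on the two paths above, here is a minimal sketch (not taken from the BioCLIP or `open_clip` docs) that calls `encode_image` and `encode_text` directly on dummy inputs. It assumes the `model` loaded above and a recent `open_clip` in which both methods accept a `normalize` keyword; the text prompt is a made-up example.

```python
# Minimal sketch: exercise each encoder path described above on dummy inputs
# and check that the outputs come back unit-normalised.
with torch.no_grad():
    # Image path: `visual` (a VisionTransformer), then l2-normalisation
    dummy_image = torch.rand(1, 3, 224, 224)  # a single fake RGB image
    image_features = model.encode_image(dummy_image, normalize=True)

    # Text path: token_embedding -> transformer -> ln_final, then l2-normalisation
    tokenizer = open_clip.get_tokenizer('hf-hub:imageomics/bioclip')  # also loaded later in the notebook
    text_features = model.encode_text(tokenizer(["a photo of a diatom"]), normalize=True)

print(image_features.shape, text_features.shape)
print(image_features.norm(dim=-1), text_features.norm(dim=-1))  # both should be ~1.0
```
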
```python
model
```

### Preprocessing

The preprocessing pipelines for training and validation are slightly different: during training, [`torchvision.transforms.RandomResizedCrop`](https://pytorch.org/vision/main/generated/torchvision.transforms.RandomResizedCrop.html) crops a random portion of the image and resizes it to (224, 224), whereas in validation the image is resized so that its *smaller* edge is 224 pixels, and then a center crop is applied to make the resulting image square.

Note that [`torchvision.transforms.CenterCrop`](https://pytorch.org/vision/main/generated/torchvision.transforms.CenterCrop.html#torchvision.transforms.CenterCrop.forward) pads any dimension that is smaller than the given crop size with zeros.

```python
preprocess_train.transforms
```

```python
preprocess_val.transforms
```

**TODO:** demo the preprocessors on our images


### Tokenizer

```python
tokenizer = open_clip.get_tokenizer('hf-hub:imageomics/bioclip')
```

## Plankton images

```python
import os
from cyto_ml.models.scivision import prepare_image
from intake_xarray import ImageSource
from dotenv import load_dotenv

load_dotenv()  # sets our object store endpoint and credentials from the .env file

from intake import open_catalog
```

```python
dataset = open_catalog(f"{os.environ.get('ENDPOINT', '')}/metadata/intake.yml")
imgs = dataset.test_image().to_dask()
imgs.to_numpy().shape
```

```python
index_df = dataset.plankton().to_dask().compute()  # intake-xarray is not sensible here - to_dask().compute() seems silly
index_df
```


image_data = [
    ImageSource(row).to_dask().to_numpy() for row in index_df["Filename"].iloc[:100]  # take first 100, 535 is too many!
]
image_data[0].shape


```python
from tqdm.autonotebook import tqdm

fnames = index_df["Filename"]

images = []

for row in tqdm(fnames):
    try:
        image = ImageSource(row).to_dask().to_numpy()
    except OSError as e:  # I think one of these is a csv, not an image!
        print(e)
    else:
        images.append(image)
```
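
Since the comment above suspects a stray `.csv` in the file index, one could also pre-filter the `Filename` column by extension before looping, so the `OSError` handler becomes a genuine last resort. A minimal sketch, using plain pandas/pathlib rather than anything from the intake catalogue, and guessing at a sensible set of image extensions:

```python
# Hypothetical pre-filter for the loop above: keep only filenames whose
# extension looks like an image format (the set of suffixes is a guess).
from pathlib import PurePosixPath

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}

image_fnames = [
    f for f in index_df["Filename"]
    if PurePosixPath(str(f)).suffix.lower() in IMAGE_SUFFIXES
]
print(f"{len(image_fnames)} of {len(index_df)} filenames look like images")
```
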
```python
import matplotlib.pyplot as plt

plt.imshow(images[0])
```

```python
from IPython.display import display

print(images[0].shape)
im = Image.fromarray(images[0], "RGB")
display(im)
```

```python
for i in images[10:50]:
    display(Image.fromarray(i, "RGB"))
```

```python
images[0].shape
```

```python
dimensions = [image.shape[:-1] for image in images]  # (height, width) of each image, dropping the channel axis
plt.scatter(*zip(*dimensions), s=3)
```

```python
test = preprocess_val(Image.fromarray(images[0], "RGB"))
print(test.shape)
print(test.min(), test.max())

testim = test.permute(2, 1, 0).numpy()
print(testim.shape)
# NB Image.fromarray expects uint8 data, so this won't display sensibly for a normalised float tensor
im = Image.fromarray(test.permute(1, 2, 0).numpy(), "RGB")
display(im)
```

```python
# resize, crop, totensor, norm = preprocess_val.transforms

test_image = Image.fromarray(images[16], "RGB")

for transform in preprocess_val.transforms:
    print("transform: ", transform)
    test_image = transform(test_image)
    print("new type: ", type(test_image))

    if isinstance(test_image, Image.Image):
        display(test_image)

    elif isinstance(test_image, torch.Tensor):
        plt.imshow(test_image.transpose(2, 0))
        print(test_image.min(), test_image.max())

    else:
        print(type(test_image))
```

```python
from torchvision.transforms.v2.functional import to_image

test_image = Image.fromarray(images[16], "RGB")

poo = to_image(test_image)
poo.min(), poo.max()
```

## Put an image through the model

```python
test_image = Image.fromarray(images[16], "RGB")

display(test_image)
```

```python
in_tensor = preprocess_val(test_image)
in_tensor.shape
```

```python
# No text input, so text_features comes back as None
image_features, text_features, scale = model(in_tensor.unsqueeze(0))
```

```python
image_features.shape
```

### One at a time or it cries

```python
from pathlib import Path
from tqdm.autonotebook import tqdm

assert Path("features").is_dir()

for i, image in tqdm(enumerate(images)):
    image_tensor = preprocess_val(Image.fromarray(image, "RGB"))
    out_feature, _, _ = model(image_tensor.unsqueeze(0))
    torch.save(out_feature, f"features/feature_{i}.pt")
```

```python
features = torch.cat(
    [torch.load(f"features/feature_{i}.pt") for i in range(len(images))]
)
features.shape
```

```python
!pip install scikit-learn
```

```python
from sklearn.manifold import TSNE

t_sne = TSNE()

embedded = t_sne.fit_transform(features.detach().numpy())
```

```python
embedded.shape
```

```python
plt.scatter(*zip(*embedded))
```

```python

```

## Resources

### CLIP

- [GitHub repo](https://github.com/mlfoundations/open_clip/)
- [Class definition](https://github.com/mlfoundations/open_clip/blob/main/src/open_clip/model.py#L220)

### BioCLIP

- [arXiv paper](https://arxiv.org/pdf/2311.18803)
- [GitHub repo](https://github.com/Imageomics/bioclip)
- [Model card](https://huggingface.co/imageomics/bioclip)
- [Evaluation using t-SNE](https://imageomics.github.io/bioclip/#intrinsic)

### Evaluation

- [scikit-learn `TSNE` class](https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html#tsne)
- [t-SNE from the original author](https://lvdmaaten.github.io/tsne/)
- [Demo about manifold learning](https://scikit-learn.org/stable/auto_examples/manifold/plot_compare_methods.html#)

```python

```
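
A possible next step, sketched here with made-up candidate labels rather than real taxa from the catalogue: embed a few text prompts with the tokenizer and `model.encode_text`, then score them against the image features computed above. Both sides are $\ell_2$-normalised, so the matrix product is a cosine similarity. This assumes `model`, `tokenizer` and `features` from earlier cells.

```python
# Rough sketch of a zero-shot comparison against the saved image features.
# The candidate labels are placeholders, not taken from the plankton data.
candidate_labels = ["a photo of a diatom", "a photo of a copepod", "detritus"]

with torch.no_grad():
    text_features = model.encode_text(tokenizer(candidate_labels), normalize=True)
    image_features = features / features.norm(dim=-1, keepdim=True)  # should already be unit-norm
    similarity = image_features @ text_features.T  # (n_images, n_labels)

best_match = similarity.argmax(dim=-1)
for label, count in zip(candidate_labels, torch.bincount(best_match, minlength=len(candidate_labels))):
    print(f"{label}: {int(count)} images")
```
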