
Apply model explainability tools to the images output by similarity search #6

Closed
metazool opened this issue Jul 3, 2024 · 3 comments
Labels: wontfix (This will not be worked on)

Comments

metazool commented Jul 3, 2024

Exploration of model explainability techniques using the prediction capabilities of the CEFAS model, to complement its use as a source of embeddings.

For example: we take the images returned by a similarity search over the embeddings, make predictions with the original model, and look at the visual features that influenced those predictions.

SHAP and LIME are the ones I'm familiar with, but there's a whole toolbox in the Captum API. Suggestions of approaches that worked well during development of the AMI-system would be appreciated, @albags!
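
To make the idea concrete, here's a rough sketch of the kind of occlusion-based attribution Captum offers. A torchvision ResNet50 and a random tensor stand in for the CEFAS model and a preprocessed FlowCam image, so the loading and preprocessing details are assumptions, not the project's actual code:

```python
# Minimal sketch: occlusion attribution with Captum. A torchvision ResNet50 and a
# random tensor stand in for the CEFAS plankton model and an image returned by
# the similarity search.
import torch
from captum.attr import Occlusion
from torchvision.models import resnet50, ResNet50_Weights

model = resnet50(weights=ResNet50_Weights.DEFAULT).eval()
image = torch.rand(1, 3, 224, 224)  # stand-in for a preprocessed FlowCam image

# Predict a class with the model, then ask which image regions drove that prediction
with torch.no_grad():
    pred_class = model(image).argmax(dim=1).item()

occlusion = Occlusion(model)
attributions = occlusion.attribute(
    image,
    target=pred_class,
    sliding_window_shapes=(3, 15, 15),  # occlude 15x15 patches across all channels
    strides=(3, 8, 8),
)
# `attributions` has the same shape as the input and can be rendered as a heatmap
```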

  • Do heatmaps of features in similar subsets look properly coherent?
  • Can this be reproduced using the CEFAS reference data, if there's a reserved test set available?
  • Are the prediction capabilities of the model of any immediate use to the researchers working with the FlowCam data?
  • In the best case, can we show the model's view of "functional traits" in a way that looks familiar and meaningful to the researchers?
  • Can we flush out other factors, like image dimensions or object size, that may be giving a falsely positive impression of the embedding search results, so we can gauge whether it's truly worth putting effort into self-supervised clustering of the embeddings?

metazool commented Jul 4, 2024

A quick note on this, as I may not make time this week to finish the branch (to the extent it's worth finishing).

Initial output was a lot more inconclusive than I'd hoped for. There could be a range of reasons, including:

  1. the plankton-cefas ResNet model is undercooked (is there any info about how it was trained?)
  2. its classification mode is a poor fit for our data (unsurprisingly)
  3. we're missing a normalisation step for the input and that's throwing things off (are there worked examples?); a sketch of the step I mean is below
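
For point 3, this is the kind of normalisation step I mean. The mean/std values are the standard ImageNet ones; whether the CEFAS model was actually trained with them (or with this input size) is exactly the open question:

```python
# Sketch of an input normalisation step (point 3). The mean/std are the standard
# ImageNet values; whether they match the CEFAS model's training setup is unverified.
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize((256, 256)),   # size chosen for illustration only
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
# input_tensor = preprocess(pil_image).unsqueeze(0)  # shape (1, 3, 256, 256)
```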

It's worth running the same attempted interpretations over a CEFAS plankton test set before drawing any conclusions. Beyond that, this seems not worth pursuing much further: using the scivision model for classification was never the intention; the aim was only to throw light on how and why it seems to work pretty well for feature extraction.

It's also worth going back a step to extract and compare embeddings using different networks: a generic ImageNet-type ResNet50 that's never specifically looked at plankton, and a default network as a sense check.
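
For the ResNet50 baseline I'm imagining something along these lines: drop the classification head and keep the pooled 2048-d features. How embeddings are actually pulled from the plankton model in this repo may differ, so treat this as a sketch:

```python
# Sketch: embeddings from a generic ImageNet-pretrained ResNet50, as a baseline
# that has never seen plankton. The classifier head is replaced with a pass-through
# so the forward pass returns the pooled 2048-d feature vector.
import torch
from torchvision.models import resnet50, ResNet50_Weights

backbone = resnet50(weights=ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()
backbone.eval()

with torch.no_grad():
    embedding = backbone(torch.rand(1, 3, 224, 224))  # shape (1, 2048)
```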

Short video dataviz of the occlusion output; most of the other methods I tried were even more garbled. We should expect to see much more consistency here.

metazool commented

I was on the point of closing #7 because:

  1. The results were unhelpfully inconclusive (image size, model maturity, other?)
  2. Subsequent refactoring would now involve an overhaul of the code for work we don't particularly need

It's a useful line in the sand though. Low-priority but still actionable?

metazool added the wontfix label on Sep 11, 2024
metazool commented

Closed this along with #7; see the comments there.

metazool closed this as not planned on Sep 16, 2024