AstroCLIP is trained using a two-step process:
### Single-Modal Pretraining
#### Image Pretraining - DINOv2 ViT:
AstroCLIP uses a Vision Transformer (ViT) to encode galaxy images. Pretraining is performed using the [DINOv2](https://github.com/facebookresearch/dinov2/) package, which combines self-distillation, masked-modeling, and contrastive objectives. Overall, we use largely the same training regime; however, we modify some of the contrastive augmentations to suit an astrophysics context. Model training can be launched with the following command:
```
image_trainer -c astroclip/astrodino/config.yaml
```
We train the model using 20 A100 GPUs (on 5 nodes) for 250k steps, which takes roughly 46 hours.
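
To give a flavor of the kind of augmentation changes involved, here is a minimal sketch of astronomy-friendly contrastive views built on torchvision. It is our own illustration, not the actual AstroCLIP/DINOv2 augmentation configuration: galaxy images have no preferred orientation, so rotations and flips make natural views, while strong color jitter is avoided because relative flux between bands carries physical information.

```python
# Hypothetical sketch of astronomy-friendly contrastive views (torchvision).
# NOT the actual AstroCLIP/DINOv2 config; for illustration only.
import torch
import torchvision.transforms as T

astro_views = T.Compose([
    T.RandomRotation(degrees=180),      # galaxies have no canonical orientation
    T.RandomHorizontalFlip(p=0.5),
    T.RandomVerticalFlip(p=0.5),
    T.RandomResizedCrop(96, scale=(0.6, 1.0), antialias=True),
    # Mild pixel noise instead of strong color jitter: band ratios are
    # physically meaningful and should be preserved across views.
    T.Lambda(lambda x: x + 0.01 * torch.randn_like(x)),
])

img = torch.rand(3, 152, 152)           # dummy 3-band galaxy cutout
view_a, view_b = astro_views(img), astro_views(img)  # two views for the contrastive loss
```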
#### Spectrum Pretraining - Masked Modeling:
AstroCLIP uses a 1D Transformer to encode galaxy spectra. Pretraining is performed using a masked-modeling objective, whereby the 1D spectrum is split into contiguous, overlapping patches. Model training can be launched with the following command:
```
spectrum_trainer fit -c config/specformer.yaml
```
We train the model using 4 A100 GPUs (on 1 node) for 25k steps or until the validation loss converges.
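
For intuition, the following sketch shows how a 1D spectrum can be split into contiguous, overlapping patches. The `patch_size` and `overlap` values are hypothetical placeholders, not the settings from `config/specformer.yaml`:

```python
import torch

def patchify_spectrum(flux: torch.Tensor, patch_size: int = 20, overlap: int = 10) -> torch.Tensor:
    """Split a 1D spectrum into contiguous, overlapping patches.

    patch_size and overlap are illustrative placeholders, not the
    values used by the actual model (see config/specformer.yaml).
    """
    stride = patch_size - overlap
    # unfold produces a (num_patches, patch_size) view over the flux array
    return flux.unfold(0, patch_size, stride)

flux = torch.rand(7800)           # e.g. a DESI-like optical spectrum
patches = patchify_spectrum(flux)
print(patches.shape)              # torch.Size([779, 20])
```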
## Downstream Tasks
We demonstrate that AstroCLIP can be used to easily perform a variety of downstream tasks. In particular, we show its ability to perform:

1. In-modal and cross-modal similarity search (see the sketch below)
2. Photometric redshift prediction
3. Physical property estimation from images
4. Physical property estimation from spectra
5. Morphology classification from images
The details of these downstream tasks, along with the results reported in our paper, can be found in `astroclip/downstream_tasks`.
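
As a concrete illustration of task 1, cross-modal similarity search reduces to a nearest-neighbor lookup between normalized embeddings in the shared space. The snippet below is our own sketch with random placeholder tensors (and a hypothetical embedding dimension), not code from `astroclip/downstream_tasks`:

```python
import torch
import torch.nn.functional as F

# Random placeholders standing in for AstroCLIP embeddings: a gallery of
# image embeddings and a single query spectrum embedding (dim is hypothetical).
image_emb = F.normalize(torch.rand(10_000, 512), dim=-1)
query_spec = F.normalize(torch.rand(1, 512), dim=-1)

# Because both modalities share one embedding space, cross-modal retrieval
# is just cosine similarity followed by a top-k lookup.
scores = query_spec @ image_emb.T          # shape (1, 10000)
top_scores, top_idx = scores.topk(k=5)
print(top_idx)                             # indices of the 5 best-matching images
```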
## Acknowledgements
This repository uses datasets and contrastive augmentations from [Stein et al. (2022)](https://github.com/georgestein/ssl-legacysurvey/tree/main). The image pretraining is built on top of the [DINOv2](https://github.com/facebookresearch/dinov2/) framework; we also thank Piotr Bojanowski for valuable conversations around image pretraining.
## License
AstroCLIP code and model weights are released under the MIT license. See [LICENSE](https://github.com/PolymathicAI/AstroCLIP/blob/main/LICENSE) for additional details.