Hello everyone,
I want to train an LDM on remote sensing data. Because this data differs from the sources the pretrained VAE was trained on, I first fine-tuned the VAE on the remote sensing data. Following the standard LDM training procedure, I encode the training data with the VAE and then train the diffusion model on the resulting latents. However, after 10k training steps the loss still shows no decreasing trend, and I am quite confused by this.
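Roughly, the encoding step looks like this (a simplified sketch assuming a diffusers-style `AutoencoderKL`; the checkpoint path and the training-loop call are placeholders, not my exact code):

```python
import torch
from diffusers import AutoencoderKL

# VAE fine-tuned on the remote sensing data (path is a placeholder).
vae = AutoencoderKL.from_pretrained("path/to/finetuned-vae").eval().cuda()

@torch.no_grad()
def encode_batch(images, scale=1.2):
    """Map a batch of images in [-1, 1] to normalized latents."""
    posterior = vae.encode(images.cuda()).latent_dist
    z = posterior.sample()          # e.g. (B, 4, H/8, W/8)
    return z / scale                # normalize with the measured latent std

# The MDT/DiT model is then trained on these latents instead of pixels,
# e.g. with the usual guided-diffusion style loop:
#   latents = encode_batch(images)
#   t = torch.randint(0, diffusion.num_timesteps, (latents.shape[0],), device=latents.device)
#   loss = diffusion.training_losses(model, latents, t)["loss"].mean()
```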
Specifically, I am using MDT (a variant of DiT) and training its smallest configuration (12 DiT blocks). One thing I noticed: when I train in pixel space (a separate, independent experiment), the loss drops quickly and the model converges fast; when I train in latent space, the loss hardly changes at all. I suspected a problem with my own VAE fine-tuning, so I inspected the VAE-encoded vectors and projected them to 2D with t-SNE (20k and 100k vectors, plots at the bottom of this post), but nothing looked obviously wrong.
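For reference, the t-SNE plots below were produced roughly like this (a sketch using scikit-learn; the random `latents` array is only a stand-in for my real encoded vectors):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Placeholder: replace with the real VAE-encoded latents, shape (N, C, H, W).
latents = np.random.randn(2000, 4, 32, 32).astype(np.float32)

# Flatten each latent map into a single vector before projecting to 2D.
X = latents.reshape(latents.shape[0], -1)
emb = TSNE(n_components=2, perplexity=30, init="pca").fit_transform(X)

plt.scatter(emb[:, 0], emb[:, 1], s=1)
plt.title("t-SNE of VAE latents")
plt.show()
```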
When training the LDM I normalize the latents with scale = 1.2, where 1.2 was obtained by computing the standard deviation of the encoded vectors. Does anyone know what might be going wrong? Thanks!
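The scale was estimated by encoding a sample of the training set and measuring the overall standard deviation of the latent elements, roughly as below (a sketch; `vae` is a diffusers-style `AutoencoderKL` as in the snippet above, `loader` is an ordinary DataLoader, and I am assuming "scale" means dividing the latents by this std, analogous to Stable Diffusion's 0.18215 factor):

```python
import torch

@torch.no_grad()
def estimate_latent_std(vae, loader, num_batches=100):
    """Encode a sample of batches and return the std of all latent elements."""
    samples = []
    for i, images in enumerate(loader):
        if i >= num_batches:
            break
        z = vae.encode(images.cuda()).latent_dist.sample()
        samples.append(z.flatten().cpu())
    return torch.cat(samples).std().item()

# For my data this came out around 1.2; the latents are then divided by it
# so the diffusion model sees roughly unit-variance inputs.
```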
![t-SNE-20k](https://private-user-images.githubusercontent.com/86882618/317728546-e06b001b-66cc-4dda-9a1a-74898d03e02d.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk1ODU3MjksIm5iZiI6MTczOTU4NTQyOSwicGF0aCI6Ii84Njg4MjYxOC8zMTc3Mjg1NDYtZTA2YjAwMWItNjZjYy00ZGRhLTlhMWEtNzQ4OThkMDNlMDJkLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTUlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjE1VDAyMTAyOVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTczYzliNjM1OTIwODkwY2RkOGQ4ZDQ5NjYzNDZiZWEwODYwZWViZDhmYzYzZTQ1NTU4M2YwMThiYzY2MzI0ZDYmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.4DcrbKwp0pSGl-pJWUAALVkvgMsV3WZ6t_zxeKOmvug)
![t-SNE-100k](https://private-user-images.githubusercontent.com/86882618/317728562-42e148ac-b3d3-41e4-8654-a2b55a9715c1.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk1ODU3MjksIm5iZiI6MTczOTU4NTQyOSwicGF0aCI6Ii84Njg4MjYxOC8zMTc3Mjg1NjItNDJlMTQ4YWMtYjNkMy00MWU0LTg2NTQtYTJiNTVhOTcxNWMxLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTUlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjE1VDAyMTAyOVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTE5NzgxOWQ4MjRmY2UyMzcwNjg0NzEwOGY2YWU2ZTU2MTViN2YzZjBjOGFmNGNhNTU4MmMxNzI4YTI1YTA2OWMmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.NCEHy6hCCl_t8j20dJpIHWq2eQNkUUFSTbWv3nUJPJw)