Fine-tuning the Text Encoder of Dreambooth

I cannot reproduce the results from the "Fine-tuning the Text Encoder" section of the Dreambooth blog post (https://huggingface.co/blog/dreambooth). After fine-tuning both the UNet and the text encoder, the generated face images seem to have lost much of their prior semantics. I suspect something is wrong with my setup:
```
accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME  \
  --instance_data_dir=$INSTANCE_DIR \
  --class_data_dir=$CLASS_DIR \
  --output_dir=$OUTPUT_DIR \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --instance_prompt="a photo of sks person" \
  --class_prompt="a photo of person" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --learning_rate=2e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=200 \
  --max_train_steps=2000 \
  --save_interval=200 \
  --train_text_encoder
```
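
The command above depends on four shell variables that are not shown. A minimal sketch of how they might be set before launching follows. All four values are placeholders and assumptions, not the ones used in this run.

```shell
# Placeholder values (assumptions, not taken from the run described above).
export MODEL_NAME="CompVis/stable-diffusion-v1-4"   # assumed base checkpoint
export INSTANCE_DIR="./data/instance"               # hypothetical: photos of the subject
export CLASS_DIR="./data/class"                     # hypothetical: class (prior) images
export OUTPUT_DIR="./output/dreambooth-sks"         # hypothetical: where checkpoints are written

# Report any variable that is empty, so a flag such as
# --instance_data_dir= does not silently expand to nothing.
missing=0
for var in MODEL_NAME INSTANCE_DIR CLASS_DIR OUTPUT_DIR; do
  eval "value=\${$var}"
  if [ -z "$value" ]; then
    echo "error: $var is not set" >&2
    missing=1
  fi
done
```

If `missing` ends up as 1, at least one path or the model name would be passed to the training script as an empty string.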


