Commit 3fdf173

[docs] Update prompt weighting docs (#10843)
* sd_embed * feedback
1 parent aba4a57 commit 3fdf173

1 file changed

+137
-172
lines changed

docs/source/en/using-diffusers/weighted_prompts.md

Lines changed: 137 additions & 172 deletions
Original file line number | Diff line number | Diff line change
@@ -215,144 +215,107 @@ image
215215

216216
Prompt weighting provides a way to emphasize or de-emphasize certain parts of a prompt, allowing for more control over the generated image. A prompt can include several concepts, which get turned into contextualized text embeddings. The embeddings are used by the model to condition its cross-attention layers to generate an image (read the Stable Diffusion [blog post](https://huggingface.co/blog/stable_diffusion) to learn more about how it works).
217217

218-
Prompt weighting works by increasing or decreasing the scale of the text embedding vector that corresponds to its concept in the prompt because you may not necessarily want the model to focus on all concepts equally. The easiest way to prepare the prompt-weighted embeddings is to use [Compel](https://github.com/damian0815/compel), a text prompt-weighting and blending library. Once you have the prompt-weighted embeddings, you can pass them to any pipeline that has a [`prompt_embeds`](https://huggingface.co/docs/diffusers/en/api/pipelines/stable_diffusion/text2img#diffusers.StableDiffusionPipeline.__call__.prompt_embeds) (and optionally [`negative_prompt_embeds`](https://huggingface.co/docs/diffusers/en/api/pipelines/stable_diffusion/text2img#diffusers.StableDiffusionPipeline.__call__.negative_prompt_embeds)) parameter, such as [`StableDiffusionPipeline`], [`StableDiffusionControlNetPipeline`], and [`StableDiffusionXLPipeline`].
218+
Prompt weighting works by increasing or decreasing the scale of the text embedding vector that corresponds to a concept in the prompt, which is useful when you don't want the model to focus on all concepts equally. The easiest way to prepare the prompt-weighted embeddings is to use [Stable Diffusion Long Prompt Weighted Embedding](https://github.com/xhinker/sd_embed) (sd_embed). Once you have the prompt-weighted embeddings, you can pass them to any pipeline that has a [`prompt_embeds`](https://huggingface.co/docs/diffusers/en/api/pipelines/stable_diffusion/text2img#diffusers.StableDiffusionPipeline.__call__.prompt_embeds) (and optionally [`negative_prompt_embeds`](https://huggingface.co/docs/diffusers/en/api/pipelines/stable_diffusion/text2img#diffusers.StableDiffusionPipeline.__call__.negative_prompt_embeds)) parameter, such as [`StableDiffusionPipeline`], [`StableDiffusionControlNetPipeline`], and [`StableDiffusionXLPipeline`].
219219
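
As a rough illustration of the scaling idea described above, the sketch below multiplies each token's embedding vector by a per-token weight. It uses plain Python lists and a hypothetical helper named `scale_embeddings`. Real libraries such as sd_embed operate on tensors and may apply extra normalization, so treat this as a toy under those assumptions rather than their implementation.

```py
def scale_embeddings(token_embeddings, weights):
    """Scale each token's embedding vector by its weight (toy illustration)."""
    if len(token_embeddings) != len(weights):
        raise ValueError("need exactly one weight per token embedding")
    return [
        [value * weight for value in vector]
        for vector, weight in zip(token_embeddings, weights)
    ]


# Hypothetical 2-dimensional embeddings for the tokens "red", "cat", "ball".
embeddings = [[0.2, -0.4], [1.0, 0.5], [0.3, 0.3]]
# Upweight "ball" by 1.5 and leave the other tokens unchanged.
weighted = scale_embeddings(embeddings, [1.0, 1.0, 1.5])
```

Tokens with a weight of `1.0` keep their original vectors, so only the emphasized concept changes scale.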

220220
<Tip>
221221

222222
If your favorite pipeline doesn't have a `prompt_embeds` parameter, please open an [issue](https://github.com/huggingface/diffusers/issues/new/choose) so we can add it!
223223

224224
</Tip>
225225

226-
This guide will show you how to weight and blend your prompts with Compel in 🤗 Diffusers.
226+
This guide will show you how to weight your prompts with sd_embed.
227227

228-
Before you begin, make sure you have the latest version of Compel installed:
228+
Before you begin, make sure you have the latest version of sd_embed installed:
229229

230-
```py
231-
# uncomment to install in Colab
232-
#!pip install compel --upgrade
230+
```bash
231+
pip install git+https://github.com/xhinker/sd_embed.git@main
233232
```
234233

235-
For this guide, let's generate an image with the prompt `"a red cat playing with a ball"` using the [`StableDiffusionPipeline`]:
234+
For this example, let's use [`StableDiffusionXLPipeline`].
236235

237236
```py
238-
from diffusers import StableDiffusionPipeline, UniPCMultistepScheduler
237+
from diffusers import StableDiffusionXLPipeline, UniPCMultistepScheduler
239238
import torch
240239

241-
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", use_safetensors=True)
240+
pipe = StableDiffusionXLPipeline.from_pretrained("Lykon/dreamshaper-xl-1-0", torch_dtype=torch.float16)
242241
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
243242
pipe.to("cuda")
244-
245-
prompt = "a red cat playing with a ball"
246-
247-
generator = torch.Generator(device="cpu").manual_seed(33)
248-
249-
image = pipe(prompt, generator=generator, num_inference_steps=20).images[0]
250-
image
251-
```
252-
253-
<div class="flex justify-center">
254-
<img class="rounded-xl" src="https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/compel/forest_0.png"/>
255-
</div>
256-
257-
### Weighting
258-
259-
You'll notice there is no "ball" in the image! Let's use compel to upweight the concept of "ball" in the prompt. Create a [`Compel`](https://github.com/damian0815/compel/blob/main/doc/compel.md#compel-objects) object, and pass it a tokenizer and text encoder:
260-
261-
```py
262-
from compel import Compel
263-
264-
compel_proc = Compel(tokenizer=pipe.tokenizer, text_encoder=pipe.text_encoder)
265243
```
266244

267-
compel uses `+` or `-` to increase or decrease the weight of a word in the prompt. To increase the weight of "ball":
245+
To upweight or downweight a concept, surround the text with parentheses. More parentheses apply a heavier weight to the text. You can also append a numerical multiplier to the text to indicate how much you want to increase or decrease its weight.
268246

269-
<Tip>
270-
271-
`+` corresponds to the value `1.1`, `++` corresponds to `1.1^2`, and so on. Similarly, `-` corresponds to `0.9` and `--` corresponds to `0.9^2`. Feel free to experiment with adding more `+` or `-` in your prompt!
247+
| format | multiplier |
248+
|---|---|
249+
| `(hippo)` | increase by 1.1x |
250+
| `((hippo))` | increase by 1.21x |
251+
| `(hippo:1.5)` | increase by 1.5x |
252+
| `(hippo:0.5)` | decrease to 0.5x |
272253
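
To make the table concrete, here is a minimal sketch that computes the multiplier for a single fragment. The helper `paren_weight` is hypothetical and only covers the forms in the table. It is not sd_embed's parser, and the way it combines nesting with an explicit multiplier is an assumption.

```py
import re


def paren_weight(fragment):
    """Return (text, multiplier) for one weighted fragment such as "((hippo))"."""
    depth = 0
    # Peel off matching outer parentheses and count the nesting depth.
    while len(fragment) >= 2 and fragment[0] == "(" and fragment[-1] == ")":
        fragment = fragment[1:-1]
        depth += 1
    match = re.fullmatch(r"(.+):([0-9]*\.?[0-9]+)", fragment)
    if depth > 0 and match:
        # An explicit multiplier replaces the innermost pair's default 1.1
        # (assumption: any outer pairs still contribute 1.1 each).
        return match.group(1), float(match.group(2)) * 1.1 ** (depth - 1)
    # Without an explicit multiplier, each pair of parentheses multiplies by 1.1.
    return fragment, 1.1 ** depth


examples = ["hippo", "(hippo)", "((hippo))", "(hippo:1.5)", "(hippo:0.5)"]
results = [paren_weight(example) for example in examples]
```

Each entry in `results` pairs the bare text with its multiplier, so plain `hippo` maps to a multiplier of `1.0`.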

273-
</Tip>
254+
Create a prompt and use a combination of parentheses and numerical multipliers to upweight or downweight parts of the text.
274255

275256
```py
276-
prompt = "a red cat playing with a ball++"
257+
from sd_embed.embedding_funcs import get_weighted_text_embeddings_sdxl
258+
259+
prompt = """A whimsical and creative image depicting a hybrid creature that is a mix of a waffle and a hippopotamus.
260+
This imaginative creature features the distinctive, bulky body of a hippo,
261+
but with a texture and appearance resembling a golden-brown, crispy waffle.
262+
The creature might have elements like waffle squares across its skin and a syrup-like sheen.
263+
It's set in a surreal environment that playfully combines a natural water habitat of a hippo with elements of a breakfast table setting,
264+
possibly including oversized utensils or plates in the background.
265+
The image should evoke a sense of playful absurdity and culinary fantasy.
266+
"""
267+
268+
neg_prompt = """\
269+
skin spots,acnes,skin blemishes,age spot,(ugly:1.2),(duplicate:1.2),(morbid:1.21),(mutilated:1.2),\
270+
(tranny:1.2),mutated hands,(poorly drawn hands:1.5),blurry,(bad anatomy:1.2),(bad proportions:1.3),\
271+
extra limbs,(disfigured:1.2),(missing arms:1.2),(extra legs:1.2),(fused fingers:1.5),\
272+
(too many fingers:1.5),(unclear eyes:1.2),lowers,bad hands,missing fingers,extra digit,\
273+
bad hands,missing fingers,(extra arms and legs),(worst quality:2),(low quality:2),\
274+
(normal quality:2),lowres,((monochrome)),((grayscale))
275+
"""
277276
```
278277

279-
Pass the prompt to `compel_proc` to create the new prompt embeddings which are passed to the pipeline:
280-
281-
```py
282-
prompt_embeds = compel_proc(prompt)
283-
generator = torch.manual_seed(33)
284-
285-
image = pipe(prompt_embeds=prompt_embeds, generator=generator, num_inference_steps=20).images[0]
286-
image
287-
```
278+
Use the `get_weighted_text_embeddings_sdxl` function to generate the prompt embeddings and the negative prompt embeddings. It also generates the pooled and negative pooled prompt embeddings since you're using the SDXL model.
288279

289-
<div class="flex justify-center">
290-
<img class="rounded-xl" src="https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/compel/forest_1.png"/>
291-
</div>
292-
293-
To downweight parts of the prompt, use the `-` suffix:
294-
295-
```py
296-
prompt = "a red------- cat playing with a ball"
297-
prompt_embeds = compel_proc(prompt)
298-
299-
generator = torch.manual_seed(33)
300-
301-
image = pipe(prompt_embeds=prompt_embeds, generator=generator, num_inference_steps=20).images[0]
302-
image
303-
```
304-
305-
<div class="flex justify-center">
306-
<img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/compel-neg.png"/>
307-
</div>
308-
309-
You can even up or downweight multiple concepts in the same prompt:
310-
311-
```py
312-
prompt = "a red cat++ playing with a ball----"
313-
prompt_embeds = compel_proc(prompt)
314-
315-
generator = torch.manual_seed(33)
316-
317-
image = pipe(prompt_embeds=prompt_embeds, generator=generator, num_inference_steps=20).images[0]
318-
image
319-
```
320-
321-
<div class="flex justify-center">
322-
<img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/compel-pos-neg.png"/>
323-
</div>
324-
325-
### Blending
326-
327-
You can also create a weighted *blend* of prompts by adding `.blend()` to a list of prompts and passing it some weights. Your blend may not always produce the result you expect because it breaks some assumptions about how the text encoder functions, so just have fun and experiment with it!
280+
> [!TIP]
281+
> You can safely ignore the warning message below about the token index length exceeding the model's maximum sequence length. All your tokens will be used in the embedding process.
282+
>
283+
> ```
284+
> Token indices sequence length is longer than the specified maximum sequence length for this model
285+
> ```
328286
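
One common way for long-prompt implementations to use every token is to split the token ids into chunks that each fit the text encoder's limit and encode the chunks separately. The sketch below shows only the splitting step. The helper `chunk_token_ids` and the chunk size of 75 (assumed to leave room for start and end tokens within a 77-token limit) are illustrative assumptions, not a description of sd_embed's internals.

```py
def chunk_token_ids(token_ids, chunk_size=75):
    """Split token ids into consecutive chunks of at most `chunk_size` ids."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        token_ids[start:start + chunk_size]
        for start in range(0, len(token_ids), chunk_size)
    ]


# Stand-in for a tokenized prompt that is longer than one chunk.
token_ids = list(range(180))
chunks = chunk_token_ids(token_ids)
chunk_lengths = [len(chunk) for chunk in chunks]
```

No token is dropped: concatenating the chunks reproduces the original sequence.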
329287
```py
330-
prompt_embeds = compel_proc('("a red cat playing with a ball", "jungle").blend(0.7, 0.8)')
331-
generator = torch.Generator(device="cuda").manual_seed(33)
288+
(
289+
prompt_embeds,
290+
prompt_neg_embeds,
291+
pooled_prompt_embeds,
292+
negative_pooled_prompt_embeds
293+
) = get_weighted_text_embeddings_sdxl(
294+
pipe,
295+
prompt=prompt,
296+
neg_prompt=neg_prompt
297+
)
332298
333-
image = pipe(prompt_embeds=prompt_embeds, generator=generator, num_inference_steps=20).images[0]
299+
image = pipe(
300+
prompt_embeds=prompt_embeds,
301+
negative_prompt_embeds=prompt_neg_embeds,
302+
pooled_prompt_embeds=pooled_prompt_embeds,
303+
negative_pooled_prompt_embeds=negative_pooled_prompt_embeds,
304+
num_inference_steps=30,
305+
height=1024,
306+
width=1024 + 512,
307+
guidance_scale=4.0,
308+
generator=torch.Generator("cuda").manual_seed(2)
309+
).images[0]
334310
image
335311
```
336312
337313
<div class="flex justify-center">
338-
<img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/compel-blend.png"/>
314+
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/sd_embed_sdxl.png"/>
339315
</div>
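
The four values returned for SDXL have to line up with four differently named pipeline arguments, which is easy to mix up. The sketch below maps the tuple to keyword names in the order used in this guide. The helper `to_pipeline_kwargs` is hypothetical, and placeholder strings stand in for the real tensors.

```py
# Keyword names taken from the pipeline call in this guide, in the same order
# as the tuple returned by get_weighted_text_embeddings_sdxl.
SDXL_EMBED_KEYS = (
    "prompt_embeds",
    "negative_prompt_embeds",
    "pooled_prompt_embeds",
    "negative_pooled_prompt_embeds",
)


def to_pipeline_kwargs(embeds):
    """Map the 4-tuple of SDXL embeddings to pipeline keyword arguments."""
    if len(embeds) != len(SDXL_EMBED_KEYS):
        raise ValueError(
            f"expected {len(SDXL_EMBED_KEYS)} embeddings, got {len(embeds)}"
        )
    return dict(zip(SDXL_EMBED_KEYS, embeds))


# Placeholder strings stand in for the real tensors.
kwargs = to_pipeline_kwargs(("pos", "neg", "pooled_pos", "pooled_neg"))
```

With real tensors, the resulting dictionary could be unpacked into the call, for example `pipe(**kwargs, num_inference_steps=30)`.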
340316

341-
### Conjunction
342-
343-
A conjunction diffuses each prompt independently and concatenates their results by their weighted sum. Add `.and()` to the end of a list of prompts to create a conjunction:
344-
345-
```py
346-
prompt_embeds = compel_proc('["a red cat", "playing with a", "ball"].and()')
347-
generator = torch.Generator(device="cuda").manual_seed(55)
348-
349-
image = pipe(prompt_embeds=prompt_embeds, generator=generator, num_inference_steps=20).images[0]
350-
image
351-
```
352-
353-
<div class="flex justify-center">
354-
<img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/compel-conj.png"/>
355-
</div>
317+
> [!TIP]
318+
> Refer to the [sd_embed](https://github.com/xhinker/sd_embed) repository for additional details about long prompt weighting for FLUX.1, Stable Cascade, and Stable Diffusion 1.5.
356319
357320
### Textual inversion
358321

@@ -363,35 +326,63 @@ Create a pipeline and use the [`~loaders.TextualInversionLoaderMixin.load_textua
363326
```py
364327
import torch
365328
from diffusers import StableDiffusionPipeline
366-
from compel import Compel, DiffusersTextualInversionManager
367329

368330
pipe = StableDiffusionPipeline.from_pretrained(
369-
"stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16,
370-
use_safetensors=True, variant="fp16").to("cuda")
331+
"stable-diffusion-v1-5/stable-diffusion-v1-5",
332+
torch_dtype=torch.float16,
333+
).to("cuda")
371334
pipe.load_textual_inversion("sd-concepts-library/midjourney-style")
372335
```
373336

374-
Compel provides a `DiffusersTextualInversionManager` class to simplify prompt weighting with textual inversion. Instantiate `DiffusersTextualInversionManager` and pass it to the `Compel` class:
337+
Add the `<midjourney-style>` text to the prompt to trigger the textual inversion.
375338

376339
```py
377-
textual_inversion_manager = DiffusersTextualInversionManager(pipe)
378-
compel_proc = Compel(
379-
tokenizer=pipe.tokenizer,
380-
text_encoder=pipe.text_encoder,
381-
textual_inversion_manager=textual_inversion_manager)
340+
from sd_embed.embedding_funcs import get_weighted_text_embeddings_sd15
341+
342+
prompt = """<midjourney-style> A whimsical and creative image depicting a hybrid creature that is a mix of a waffle and a hippopotamus.
343+
This imaginative creature features the distinctive, bulky body of a hippo,
344+
but with a texture and appearance resembling a golden-brown, crispy waffle.
345+
The creature might have elements like waffle squares across its skin and a syrup-like sheen.
346+
It's set in a surreal environment that playfully combines a natural water habitat of a hippo with elements of a breakfast table setting,
347+
possibly including oversized utensils or plates in the background.
348+
The image should evoke a sense of playful absurdity and culinary fantasy.
349+
"""
350+
351+
neg_prompt = """\
352+
skin spots,acnes,skin blemishes,age spot,(ugly:1.2),(duplicate:1.2),(morbid:1.21),(mutilated:1.2),\
353+
(tranny:1.2),mutated hands,(poorly drawn hands:1.5),blurry,(bad anatomy:1.2),(bad proportions:1.3),\
354+
extra limbs,(disfigured:1.2),(missing arms:1.2),(extra legs:1.2),(fused fingers:1.5),\
355+
(too many fingers:1.5),(unclear eyes:1.2),lowers,bad hands,missing fingers,extra digit,\
356+
bad hands,missing fingers,(extra arms and legs),(worst quality:2),(low quality:2),\
357+
(normal quality:2),lowres,((monochrome)),((grayscale))
358+
"""
382359
```
383360

384-
Incorporate the concept to condition a prompt with using the `<concept>` syntax:
361+
Use the `get_weighted_text_embeddings_sd15` function to generate the prompt embeddings and the negative prompt embeddings.
385362

386363
```py
387-
prompt_embeds = compel_proc('("A red cat++ playing with a ball <midjourney-style>")')
364+
(
365+
prompt_embeds,
366+
prompt_neg_embeds,
367+
) = get_weighted_text_embeddings_sd15(
368+
pipe,
369+
prompt=prompt,
370+
neg_prompt=neg_prompt
371+
)
388372

389-
image = pipe(prompt_embeds=prompt_embeds).images[0]
373+
image = pipe(
374+
prompt_embeds=prompt_embeds,
375+
negative_prompt_embeds=prompt_neg_embeds,
376+
height=768,
377+
width=896,
378+
guidance_scale=4.0,
379+
generator=torch.Generator("cuda").manual_seed(2)
380+
).images[0]
390381
image
391382
```
392383

393384
<div class="flex justify-center">
394-
<img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/compel-text-inversion.png"/>
385+
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/sd_embed_textual_inversion.png"/>
395386
</div>
396387

397388
### DreamBooth
@@ -401,70 +392,44 @@ image
401392
```py
402393
import torch
403394
from diffusers import DiffusionPipeline, UniPCMultistepScheduler
404-
from compel import Compel
405395

406396
pipe = DiffusionPipeline.from_pretrained("sd-dreambooth-library/dndcoverart-v1", torch_dtype=torch.float16).to("cuda")
407397
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
408398
```
409399

410-
Create a `Compel` class with a tokenizer and text encoder, and pass your prompt to it. Depending on the model you use, you'll need to incorporate the model's unique identifier into your prompt. For example, the `dndcoverart-v1` model uses the identifier `dndcoverart`:
411-
412-
```py
413-
compel_proc = Compel(tokenizer=pipe.tokenizer, text_encoder=pipe.text_encoder)
414-
prompt_embeds = compel_proc('("magazine cover of a dndcoverart dragon, high quality, intricate details, larry elmore art style").and()')
415-
image = pipe(prompt_embeds=prompt_embeds).images[0]
416-
image
417-
```
418-
419-
<div class="flex justify-center">
420-
<img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/compel-dreambooth.png"/>
421-
</div>
422-
423-
### Stable Diffusion XL
424-
425-
Stable Diffusion XL (SDXL) has two tokenizers and text encoders so its usage is a bit different. To address this, you should pass both tokenizers and encoders to the `Compel` class:
400+
Depending on the model you use, you'll need to incorporate the model's unique identifier into your prompt. For example, the `dndcoverart-v1` model uses the identifier `dndcoverart`:
426401

427402
```py
428-
from compel import Compel, ReturnedEmbeddingsType
429-
from diffusers import DiffusionPipeline
430-
from diffusers.utils import make_image_grid
431-
import torch
432-
433-
pipeline = DiffusionPipeline.from_pretrained(
434-
"stabilityai/stable-diffusion-xl-base-1.0",
435-
variant="fp16",
436-
use_safetensors=True,
437-
torch_dtype=torch.float16
438-
).to("cuda")
439-
440-
compel = Compel(
441-
tokenizer=[pipeline.tokenizer, pipeline.tokenizer_2] ,
442-
text_encoder=[pipeline.text_encoder, pipeline.text_encoder_2],
443-
returned_embeddings_type=ReturnedEmbeddingsType.PENULTIMATE_HIDDEN_STATES_NON_NORMALIZED,
444-
requires_pooled=[False, True]
403+
from sd_embed.embedding_funcs import get_weighted_text_embeddings_sd15
404+
405+
prompt = """dndcoverart of A whimsical and creative image depicting a hybrid creature that is a mix of a waffle and a hippopotamus.
406+
This imaginative creature features the distinctive, bulky body of a hippo,
407+
but with a texture and appearance resembling a golden-brown, crispy waffle.
408+
The creature might have elements like waffle squares across its skin and a syrup-like sheen.
409+
It's set in a surreal environment that playfully combines a natural water habitat of a hippo with elements of a breakfast table setting,
410+
possibly including oversized utensils or plates in the background.
411+
The image should evoke a sense of playful absurdity and culinary fantasy.
412+
"""
413+
414+
neg_prompt = """\
415+
skin spots,acnes,skin blemishes,age spot,(ugly:1.2),(duplicate:1.2),(morbid:1.21),(mutilated:1.2),\
416+
(tranny:1.2),mutated hands,(poorly drawn hands:1.5),blurry,(bad anatomy:1.2),(bad proportions:1.3),\
417+
extra limbs,(disfigured:1.2),(missing arms:1.2),(extra legs:1.2),(fused fingers:1.5),\
418+
(too many fingers:1.5),(unclear eyes:1.2),lowers,bad hands,missing fingers,extra digit,\
419+
bad hands,missing fingers,(extra arms and legs),(worst quality:2),(low quality:2),\
420+
(normal quality:2),lowres,((monochrome)),((grayscale))
421+
"""
422+
423+
(
424+
prompt_embeds
425+
, prompt_neg_embeds
426+
) = get_weighted_text_embeddings_sd15(
427+
pipe
428+
, prompt = prompt
429+
, neg_prompt = neg_prompt
445430
)
446431
```
447432

448-
This time, let's upweight "ball" by a factor of 1.5 for the first prompt, and downweight "ball" by 0.6 for the second prompt. The [`StableDiffusionXLPipeline`] also requires [`pooled_prompt_embeds`](https://huggingface.co/docs/diffusers/en/api/pipelines/stable_diffusion/stable_diffusion_xl#diffusers.StableDiffusionXLInpaintPipeline.__call__.pooled_prompt_embeds) (and optionally [`negative_pooled_prompt_embeds`](https://huggingface.co/docs/diffusers/en/api/pipelines/stable_diffusion/stable_diffusion_xl#diffusers.StableDiffusionXLInpaintPipeline.__call__.negative_pooled_prompt_embeds)) so you should pass those to the pipeline along with the conditioning tensors:
449-
450-
```py
451-
# apply weights
452-
prompt = ["a red cat playing with a (ball)1.5", "a red cat playing with a (ball)0.6"]
453-
conditioning, pooled = compel(prompt)
454-
455-
# generate image
456-
generator = [torch.Generator().manual_seed(33) for _ in range(len(prompt))]
457-
images = pipeline(prompt_embeds=conditioning, pooled_prompt_embeds=pooled, generator=generator, num_inference_steps=30).images
458-
make_image_grid(images, rows=1, cols=2)
459-
```
460-
461-
<div class="flex gap-4">
462-
<div>
463-
<img class="rounded-xl" src="https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/compel/sdxl_ball1.png"/>
464-
<figcaption class="mt-2 text-center text-sm text-gray-500">"a red cat playing with a (ball)1.5"</figcaption>
465-
</div>
466-
<div>
467-
<img class="rounded-xl" src="https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/compel/sdxl_ball2.png"/>
468-
<figcaption class="mt-2 text-center text-sm text-gray-500">"a red cat playing with a (ball)0.6"</figcaption>
469-
</div>
433+
<div class="flex justify-center">
434+
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/sd_embed_dreambooth.png"/>
470435
</div>
