
Add EvilTwin optimizer for evil twin prompt optimization #7893


Open · wants to merge 20 commits into main

Conversation

ramisbahi

This PR introduces the EvilTwin optimizer to DSPy, implementing the Greedy Coordinate Gradient (GCG) algorithm from the "Prompts have evil twins" paper. EvilTwin generates "evil twin" prompts that appear garbled or obfuscated yet induce model outputs similar to those of the original prompt.
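To make the GCG search concrete, here is a minimal, model-free sketch of one greedy coordinate step. The `loss` function below is a toy stand-in for the KL-divergence objective the real optimizer would compute from model logits, and the random candidate sampling replaces GCG's gradient-based candidate ranking; names like `greedy_coordinate_step` are illustrative, not DSPy's API:

```python
import random

# Toy objective standing in for the KL-divergence loss the real optimizer
# computes from model logits: the number of positions where the candidate
# token sequence differs from a fixed target sequence.
def loss(tokens, target):
    return sum(1 for a, b in zip(tokens, target) if a != b)

def greedy_coordinate_step(tokens, target, vocab, top_k=8, seed=0):
    """One GCG-style coordinate step: at every position, try top_k candidate
    token substitutions and keep the single swap that lowers the loss most.
    (Real GCG ranks candidates by token gradients; this sketch samples them
    at random, which keeps the example model-free.)"""
    rng = random.Random(seed)
    best_tokens, best_loss = tokens, loss(tokens, target)
    for pos in range(len(tokens)):
        for cand in rng.sample(vocab, min(top_k, len(vocab))):
            trial = tokens[:pos] + [cand] + tokens[pos + 1:]
            trial_loss = loss(trial, target)
            if trial_loss < best_loss:
                best_tokens, best_loss = trial, trial_loss
    return best_tokens, best_loss

vocab = list(range(20))
target = [3, 7, 1, 9]
tokens = [0, 0, 0, 0]
for epoch in range(5):  # analogous to n_epochs in the optimizer settings
    tokens, cur_loss = greedy_coordinate_step(tokens, target, vocab, seed=epoch)
```

Because each step only accepts swaps that strictly lower the loss, the objective is non-increasing across epochs, mirroring the greedy acceptance rule of GCG.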

Key Features:

  • Uses KL divergence minimization to iteratively modify prompts to achieve a similar output distribution.
  • Runs on local models (default: "EleutherAI/gpt-neo-125M") since it requires gradients, logits, and token-level likelihoods, which API-based LLMs don’t expose.
  • Supports customizable optimization settings (e.g., n_epochs, batch_size, top_k, gamma for fluency penalty).
  • Provides an easy way to retrieve the final optimized prompt via optimizer.optimized_prompt.
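The KL-divergence objective in the first bullet can be illustrated with a tiny, self-contained sketch. The two log-probability lists stand in for the base model's next-token distributions under the original and candidate prompts; the numbers are hypothetical, and this is not DSPy's actual implementation:

```python
import math

def kl_divergence(p_logprobs, q_logprobs):
    """KL(P || Q) for two next-token distributions given as log-probabilities.
    In the optimizer, P would come from the model conditioned on the original
    prompt and Q from the candidate (evil twin) prompt; here the inputs are
    just example numbers."""
    return sum(math.exp(lp) * (lp - lq)
               for lp, lq in zip(p_logprobs, q_logprobs))

# Two toy 3-token distributions, expressed as log-probabilities.
p = [math.log(0.7), math.log(0.2), math.log(0.1)]
q = [math.log(0.6), math.log(0.3), math.log(0.1)]
divergence = kl_divergence(p, q)  # positive; zero only when P == Q
```

Minimizing this quantity over candidate prompts is what drives the twin's output distribution toward the original's.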

Example Usage:

import dspy
from dspy.teleprompt.evil_twin import EvilTwin

predictor = dspy.Predict('question -> answer')
q = "Describe the definition of artificial intelligence in one sentence."

optimizer = EvilTwin(question=q)
optimized_predictor = optimizer.compile(program=predictor)

print("Optimized Evil Twin Prompt:", optimizer.optimized_prompt)
original_response = predictor(question=q)
evil_twin_response = optimized_predictor(question=q)

print("Original Output:", original_response.answer)
print("Evil Twin Output:", evil_twin_response.answer)

Notes:

  • EvilTwin is best run on a GPU due to the computational cost of token gradient updates.
  • Future work may include warm start initialization, as proposed in the Evil Twins paper.

This PR enhances DSPy’s optimizer suite by enabling adversarial prompt exploration, making it a powerful tool for LLM evaluation and security research. 🚀
