eqtylab/camel-prompt-injection

Edoardo Debenedetti (1,3), Ilia Shumailov (2), Tianqi Fan (1), Jamie Hayes (2), Nicholas Carlini (2), Daniel Fabian (1), Christoph Kern (1), Chongyang Shi (2), Florian Tramèr (3)

(1) Google, (2) Google DeepMind, (3) ETH Zurich

Warning

This is a research artifact released to reproduce the results in our paper. The interpreter implementation likely contains bugs (e.g., it may throw uncaught exceptions and crash), and it may not be fully secure.

This is not a Google product, and we do not plan to support or maintain this codebase.

Prerequisites

  1. Install uv via the official instructions.
  2. Rename .env.example to .env and populate it with your API keys.
  3. There is no separate install step: uv installs all dependencies automatically when you run uv run ...
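If you want to confirm that step 2 is complete before launching a long run, the sketch below reads .env and lists any keys whose values are still empty. It assumes .env uses plain KEY=VALUE lines; the helpers parse_env_file and missing_keys are hypothetical and not part of this repository.

```python
from pathlib import Path


def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments.

    Deliberately minimal: no multi-line values, variable expansion,
    or 'export' prefixes.
    """
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes, e.g. KEY="abc" -> abc
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env_path: str = ".env") -> list[str]:
    """Return the names of keys in the env file whose values are empty."""
    env = parse_env_file(Path(env_path).read_text())
    return [key for key, value in env.items() if not value]
```

For example, calling missing_keys() from the repository root returns an empty list once every key copied from .env.example has been filled in.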

Running the defense against AgentDojo

uv run --env-file .env main.py MODEL_NAME [--use-original] [--ad_defense] [--reasoning-effort] [--thinking_budget_tokens] [--run-attack] [--replay-with-policies] [--eval_mode]

Run uv run main.py --help for details on all CLI arguments.
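To evaluate several models in one sweep, the invocation above can be scripted. The following is a hypothetical helper, not part of the repository: it only assembles the documented command line and passes any extra flags through verbatim, since which flags expect a value is documented by --help rather than assumed here.

```python
import shlex
import subprocess
from collections.abc import Sequence


def build_command(model_name: str, extra_args: Sequence[str] = ()) -> list[str]:
    """Assemble the documented invocation for a single model.

    extra_args are appended verbatim; consult `uv run main.py --help`
    for which flags expect a value.
    """
    return ["uv", "run", "--env-file", ".env", "main.py", model_name, *extra_args]


def run_models(
    model_names: Sequence[str],
    extra_args: Sequence[str] = (),
    dry_run: bool = True,
) -> None:
    """Print one command per model and, unless dry_run is True, execute it."""
    for name in model_names:
        cmd = build_command(name, extra_args)
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the sweep at the first non-zero exit code
            subprocess.run(cmd, check=True)
```

For example, assuming --run-attack is a plain switch, run_models(["MODEL_A", "MODEL_B"], ["--run-attack"]) prints both commands without running them; pass dry_run=False to execute them. MODEL_A and MODEL_B are placeholders for model names registered in models.py.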

FAQ

How do I try a new/different model?

You can add it to the _supported_model_names variable in models.py. The keys are the model names as used by the provider (check the provider's API), and the values are what the model answers when asked "what model are you?". Note that OpenAI reasoning models are stored in the _oai_thinking_models variable instead.
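A minimal sketch of what a new entry might look like. The two variable names come from this FAQ; the dict type, the placeholder model names, and the self_reported_name helper are illustrative assumptions, so check models.py for the actual structure before editing it.

```python
# Illustrative excerpt of models.py. The variable names are taken from the
# README; the dict type and every entry below are placeholder assumptions.
_supported_model_names: dict[str, str] = {
    # key: model name as the provider's API expects it
    # value: what the model replies when asked "what model are you?"
    "example-provider-model-001": "ExampleModel",
}

# OpenAI reasoning models are registered here instead (assumed same shape).
_oai_thinking_models: dict[str, str] = {
    "example-oai-reasoning-model": "ExampleReasoner",
}


def self_reported_name(model_name: str) -> str:
    """Hypothetical helper: look a model up in whichever table registers it."""
    for table in (_supported_model_names, _oai_thinking_models):
        if model_name in table:
            return table[model_name]
    raise KeyError(f"{model_name!r} is not registered in models.py")
```

Keeping the provider name as the key means the same string can be passed on the command line as MODEL_NAME, while the value records how the model describes itself.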

If I have questions about the codebase, how can I reach out?

Please open an issue in this repository. Note that we do not plan to fix bugs, as this codebase is meant only as a research artifact.

Running tests and linters

uv run ruff check --fix
uv run ruff format
uv run pyright
uv run pytest
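The four checks can also be run from a single script that reports every failing step instead of stopping at the first one. This is a hypothetical convenience script, not part of the repository, and it assumes uv run ruff format as the formatting step.

```python
import subprocess
from collections.abc import Callable, Sequence

# The checks listed above, in order (formatting step assumed to be ruff format).
CHECKS: list[list[str]] = [
    ["uv", "run", "ruff", "check", "--fix"],
    ["uv", "run", "ruff", "format"],
    ["uv", "run", "pyright"],
    ["uv", "run", "pytest"],
]


def run_checks(
    checks: Sequence[Sequence[str]] = CHECKS,
    runner: Callable[[Sequence[str]], int] = lambda cmd: subprocess.run(cmd).returncode,
) -> list[str]:
    """Run every check in order and return the commands that failed.

    Unlike chaining with &&, this keeps going after a failure so one run
    reports all broken steps. `runner` is injectable for testing.
    """
    failed: list[str] = []
    for cmd in checks:
        if runner(cmd) != 0:
            failed.append(" ".join(cmd))
    return failed
```

Calling run_checks() from the repository root returns an empty list when all four checks pass.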

This is not an officially supported Google product. This project is not eligible for the Google Open Source Software Vulnerability Rewards Program.

About

Code for the paper "Defeating Prompt Injections by Design"
