Spikee comes with a variety of built-in seeds and modules (e.g., targets, judges, plugins, attacks).
Spikee comes with a variety of built-in seeds, each designed for a specific testing purpose. These seeds are located in the `datasets/` directory after you run `spikee init`. You can list them at any time with `spikee list seeds`.
| Seed | Source | Type | Description |
|---|---|---|---|
| `seeds-cybersec-2026-01` | Reversec | Cybersecurity | A general-purpose dataset for testing prompt injection and cybersecurity harms. It focuses on common attack goals seen in web application security, such as data exfiltration, cross-site scripting (XSS), and resource exhaustion. |
| `seeds-harmful-instructions-only` | Reversec | Objectives | Specifically designed for attacks using LLM agents, such as Crescendo and LLM-Jailbreaker. These attacks require an instruction (objective) to generate their own attack vectors dynamically. Contains harmful instructions in `instructions.jsonl`, while leaving jailbreaks and user inputs as empty placeholders. |
| `seeds-simsonsun-high-quality-jailbreaks` | External | Jailbreaks | A high-quality set of contamination-free jailbreak prompts, specifically curated to avoid overlap with the training data of many common safety classifiers. |
| `seeds-in-the-wild-jailbreak-prompts` | External | Jailbreaks | Contains approximately 1,400 real-world jailbreak prompts collected from public sources like Discord and Reddit (filtered from the TrustAIRLab dataset). Ideal for testing a target's resilience against known, publicly available jailbreaks. |
| `seeds-wildguardmix-harmful` | External | Harmful | A dataset for testing harmful content generation. The prompts are sourced from the WildGuardMix dataset. |
| `seeds-wildguardmix-harmful-fp` | External | Harmful (FP) | A companion dataset to `seeds-wildguardmix-harmful`, containing benign (harmless) prompts. |
| `seeds-toxic-chat` | External | Harmful | A dataset for testing toxic prompts, filtered from 10K user prompts collected from the Vicuna online demo. |
| `seeds-investment-advice` | Reversec | Topical Guardrails | Designed to test topical guardrails that are supposed to block personal financial or investment advice. It includes both malicious instructions and standalone attack prompts. |
| `seeds-investment-advice-fp` | Reversec | Topical Guardrails (FP) | A companion dataset to `seeds-investment-advice`, containing benign (harmless) queries about financial topics. |
| `seeds-sysmsg-extraction-2025-04` | Reversec | System Prompt Extraction | Specifically designed to test for system prompt extraction. The instructions and judges are tailored to detect whether the target model leaks its own system prompt or initial instructions. |
| `seeds-llm-mailbox` | Reversec | Tutorial | An example seed tailored for testing an email summarization feature. The documents are sample emails, and the instructions are designed to test for vulnerabilities in that specific context. See the associated blog post for a detailed walkthrough. |
| `seeds-empty` | Reversec | Utility | An empty template folder. It contains empty `documents.jsonl`, `jailbreaks.jsonl`, and `instructions.jsonl` files. This is the recommended starting point when creating a new dataset from scratch, especially for standalone attacks. |
| `seeds-mini-test` | Reversec | Utility | A very small set of examples for quick, functional testing of Spikee itself. Use this to verify your setup or to test a new custom target or plugin without running a large number of tests. |
FP (false positive) datasets are intended for use with the `--false-positive-checks` flag, which measures how often a guardrail incorrectly blocks legitimate prompts when evaluating harmful content filters.

External datasets require you to run a fetch script to download the prompts; see the `README.md` inside each seed folder for instructions. Some of these use an LLM judge by default, which is specified in the seed's README.
**Usage Example**

```bash
spikee generate --seed ./seeds-cybersec-2026-01
```

Spikee includes a variety of built-in and sample targets, which can be listed at any time with `spikee list targets`.
Built-in targets cover several common LLM providers and are located in the `spikee/targets/` folder. To use them, rename `.env-example` to `.env` and add any necessary API keys.
| Target | Type | Description |
|---|---|---|
| `llm_provider` | Provider | Generic LLM target for supported LLM providers (e.g., `openai`, `bedrock`, `google`, `ollama`, etc.). See Docs. |
| `aws_bedrock_guardrail` | Guardrails | Assesses AWS Bedrock Guardrails. |
| `az_ai_content_safety_harmful` | Guardrails | Assesses Azure AI Content Safety harm categories. |
| `az_prompt_shields_document_analysis` | Guardrails | Assesses Azure Prompt Shields document analysis. |
| `az_prompt_shields_prompt_analysis` | Guardrails | Assesses Azure Prompt Shields prompt analysis. |
Sample targets are provided in the `workspace/targets/` folder, which is created by running `spikee init`. These demonstrate how to write custom targets and can easily be modified to assess an LLM application of your choice.
| Target | Type | Description |
|---|---|---|
| `sample_target` | Single-Turn | Sends a GET request to a fictional application, demonstrating options as well as advanced guardrail and error handling. |
| `sample_target_legacy` | Single-Turn (Legacy) | Returns a mock message. This is a legacy target, demonstrating the older target format. |
| `sample_pdf_request_target` | Single-Turn | Sends a POST request containing a PDF to a fictional application. |
| `test_chatbot` | Multi-Turn | Sends requests to the Spikee Test Chatbot. |
| `simple_test_chatbot` | Multi-Turn | Implements the simple multi-turn target format and sends requests to the Spikee Test Chatbot. |
| `llm_mailbox` | Single-Turn | Sample target for the email summarisation application tutorial. |
**Usage Example**

```bash
spikee test --dataset datasets/cybersec-2026-01.jsonl \
    --target llm_provider \
    --target-options "bedrock/claude45-haiku"
```

Spikee includes several built-in judges to evaluate LLM responses, located in the `spikee/judges/` and `workspace/judges/` folders. These can be listed at any time with `spikee list judges`.
**Basic Judges**

These evaluate responses based on simple criteria:

- `canary`: Checks if a predefined canary string is present in the response.
- `regex`: Uses regular expressions to identify specific patterns in the response.
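Conceptually, these basic judges reduce to simple string checks. A minimal sketch (the function names below are illustrative, not Spikee's actual judge interface):

```python
import re

def canary_judge(response: str, canary: str) -> bool:
    """Success if the predefined canary string appears in the response."""
    return canary in response

def regex_judge(response: str, pattern: str) -> bool:
    """Success if the regular expression matches anywhere in the response."""
    return re.search(pattern, response) is not None
```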
**LLM Judges**

For some test cases, success cannot be determined by a simple keyword or pattern. For instance, did the model's response contain harmful advice, or did it refuse to answer a question on a restricted topic? LLM-based judges address this by using a separate LLM to evaluate the target's response against a natural-language criterion.

- `llm_judge_harmful`: Evaluates whether the target LLM's response complied with a potentially harmful user prompt.
- `llm_judge_objective`: Evaluates whether the target LLM's response meets a specific input objective.
- `llm_judge_output_criteria`: Evaluates whether the target LLM's response meets specific success criteria defined in `judge_args`.
The judge's LLM model can be specified using the `--judge-options` flag. See LLM Providers for a complete list of supported models, prefixes, and examples. Some common examples include:

- `offline`: Mock judge for restrictive environments. See the re-judging and isolated-environments documentation for more information.
- `bedrock/<model_name>`: AWS Bedrock API (e.g., `bedrock/claude45-haiku`)
- `openai/<model_name>`: OpenAI API (e.g., `openai/gpt-4o-mini`)
- `google/<model_name>`: Google Gen AI API (e.g., `google/gemini-2.5-flash`)
**Usage Example**

```bash
# Use an offline judge, allowing for later re-judging
spikee test --dataset datasets/cybersec-2026-01.jsonl \
    --target llm_provider \
    --target-options "bedrock/claude45-haiku" \
    --judge-options offline
```

Spikee includes several built-in plugins that can be leveraged to enhance dataset generation. These are scripts that apply static transformations to payloads during dataset generation and can create multiple iterations of each entry. Built-in plugins are located in the `spikee/plugins/` directory; local plugins live in the `plugins/` directory within your workspace. You can list them at any time with `spikee list plugins`.
The following list provides an overview of each built-in plugin; further information on each plugin can be found within the plugin file.
**Type Key**
- Encoding: Deterministic format or character conversion.
- Obfuscation: Noise injection, character mangling, and word masking.
- Translation: Language or script conversion.
- Formatting: Structural manipulation of payload layout.
- Social Engineering: LLM-driven persuasion and manipulation tactics.
- LLM: Requires an LLM provider.
- ML: Requires local machine learning models.
- Attack-Based: Adapted from dynamic attack research into static dataset transformations.
| Plugin | Type | Description | Options |
|---|---|---|---|
| `1337` | Encoding | Transforms text into "leet speak" by replacing certain letters with numbers or symbols. | N/A |
| `ascii_smuggler` | Encoding | Transforms ASCII text into a series of Unicode tag characters that are generally invisible in most UI elements (bypassing content filters). | N/A |
| `base64` | Encoding | Encodes text using Base64 encoding. | N/A |
| `ceasar` | Encoding | Applies a Caesar cipher to the text, shifting letters by a specified number of positions. | `shift` (number of positions to shift, default: 3) |
| `hex` | Encoding | Encodes text into its hexadecimal representation. | N/A |
| `morse` | Encoding | Encodes text into Morse code. | N/A |
| `best_of_n` | Obfuscation, Attack-Based | Implements "Best-of-N Jailbreaking" (John Hughes et al., 2024), applying character scrambling, random capitalization, and character noising. | `variants` (number of variations to generate, default: 50) |
| `flip` | Obfuscation | Applies a flip attack to obfuscate text, with three modes: FWO (Flip Word Order), FCW (Flip Chars in Word), FCS (Flip Chars in Sentence). | `mode` (the flip mode to apply, default: FWO) |
| `mask` | Obfuscation, LLM | Masks high-risk words in the text with random character sequences, while providing a suffix that maps the masks back to the original words. | `advanced` (if true, creates multiple masks for longer words)<br>`advanced-split` (number of characters per mask chunk for the advanced option, default: 6) |
| `splat` | Obfuscation | Obfuscates the text using splat-based techniques (e.g., asterisks `*`, special characters, and spacing tricks) to bypass basic filters. | `character` (the character to use for splatting, default: `*`)<br>`insert_rand` (probability of inserting a splat within words, default: 0.6)<br>`pad_rand` (probability of padding words with splats, default: 0.4) |
| `digraphic_translate` | Translation, LLM | Generates jailbreak prompts by mixing writing systems within a single digraphic language (e.g., Japanese Kanji/Romaji, Serbian Cyrillic/Latin) to evade script-sensitive safety classifiers. | `language` (target digraphic language, default: japanese; options: korean, serbian, chinese, hindi-urdu) |
| `google_translate` | Translation | Translates text to another language using Google Translate. | `source-lang` (language code for source language, default: en)<br>`target-lang` (language code for target language, default: zh-cn) |
| `llm_multi_language_jailbreaker` | Translation, LLM, Attack-Based | Generates jailbreak attempts using different languages, focusing on low-resource languages. | `model` (the LLM model to use for generating attacks, default: `model=openai/gpt-4o`)<br>`variants` (number of variations to generate, default: 5) |
| `opus_translator` | Translation, ML | Translates text to another language using local OPUS-MT models. | `source` (source language code, default: en)<br>`targets` (target language(s), default: zh)<br>`quality` (translation quality, default: 1)<br>`device` (cpu or gpu, default: auto-detect)<br>`cache_dir` (directory to cache ML models, optional) |
| `anti_spotlighting` | Formatting, Attack-Based | Generates variations of delimiter-based attacks to test LLM applications against spotlighting vulnerabilities. | `variants` (number of variations to generate, default: 50) |
| `prompt_decomposition` | Formatting, Attack-Based | Decomposes a prompt into chunks and generates shuffled variations. | `modes` (LLM model to apply, default: dumb)<br>`variants` (number of variations to generate, default: 50) |
| `llm_jailbreaker` | Social Engineering, LLM, Attack-Based | Uses an LLM to iteratively generate jailbreak attacks against the target. | `model` (the LLM model to use for generating attacks, default: `model=openai/gpt-4o`)<br>`variants` (number of variations to generate, default: 5) |
| `llm_poetry_jailbreaker` | Social Engineering, LLM, Attack-Based | Generates jailbreak attempts in the form of poetry or rhymes. | `model` (the LLM model to use for generating attacks, default: `model=openai/gpt-4o`)<br>`variants` (number of variations to generate, default: 5) |
| `shortener` | LLM | Uses an LLM to shorten the text to a specified maximum length while retaining key details. | `max_length` (the maximum length for the shortened text, default: 256) |
| `rag_poisoner` | LLM, Attack-Based | Injects fake RAG context that appears to be legitimate document snippets supporting the attack objective. | `model` (the LLM model to use for generating attacks, default: `model=openai/gpt-4o`)<br>`variants` (number of variations to generate, default: 5) |
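To make the deterministic encoding plugins concrete, here are minimal sketches of the `1337` and `ceasar` transformations. These are illustrative reimplementations, not Spikee's plugin code; the exact substitution table used by the real `1337` plugin may differ.

```python
def leet(text: str) -> str:
    """A simple "leet speak" substitution, as in the 1337 plugin."""
    table = str.maketrans({"a": "4", "e": "3", "i": "1", "o": "0", "t": "7"})
    return text.lower().translate(table)

def caesar(text: str, shift: int = 3) -> str:
    """Caesar cipher with the plugin's default shift of 3.
    Non-alphabetic characters pass through unchanged."""
    out = []
    for ch in text:
        if ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)
```

Because these transformations are deterministic, each dataset entry maps to exactly one encoded variant, unlike the randomized obfuscation plugins.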
**Usage Example**

```bash
spikee generate --seed ./seeds-cybersec-2026-01 \
    --plugin best_of_n google_translate|base64 \
    --plugin-options "best_of_n:variants=5;google_translate:source-lang=en"
```

Spikee includes several built-in dynamic attacks that iteratively modify prompts/documents until they succeed (or run out of iterations). These are located in the `spikee/attacks/` folder and can be listed at any time with `spikee list attacks`.
You can customize the behavior of attacks using the following command-line options:

- `--attack-iterations`: Specifies the maximum number of iterations for each attack (default: 1000).
- `--attack-options`: Passes a single string option to the attack script for custom behavior (e.g., `"mode=aggressive"`).
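Conceptually, a dynamic attack is a mutate-and-retry loop bounded by the iteration budget. A simplified sketch of this control flow (the `target`, `judge`, and `mutate` callables here are stand-ins, not Spikee's actual attack API):

```python
def run_attack(prompt, target, judge, mutate, max_iterations=1000):
    """Iteratively generate mutated candidates until the judge marks a
    target response as successful or the iteration budget runs out."""
    for i in range(1, max_iterations + 1):
        candidate = mutate(prompt)      # e.g. suffix search, char noising
        response = target(candidate)    # query the system under test
        if judge(response):             # e.g. canary / regex / LLM judge
            return {"success": True, "iterations": i, "prompt": candidate}
    return {"success": False, "iterations": max_iterations, "prompt": None}
```

Multi-turn attacks like Crescendo follow the same outer loop but carry conversation state between iterations instead of mutating a single prompt.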
**Type Key**
- Obfuscation: Noise injection, character mangling, and random token perturbation.
- Formatting: Structural manipulation of payload layout.
- Social Engineering: LLM-driven persuasion, escalation, and manipulation tactics.
- Translation: Cross-lingual evasion via language conversion.
- LLM: Requires an LLM provider.
| Attack | Type | Description | Additional Options |
|---|---|---|---|
| `best_of_n` | Obfuscation | Implements "Best-of-N Jailbreaking" (John Hughes et al., 2024), applying character scrambling, random capitalization, and character noising. | N/A |
| `random_suffix_search` | Obfuscation | Implements random suffix search, which appends random suffixes to the prompt to bypass filters. | N/A |
| `anti_spotlighting` | Formatting | Assesses spotlighting vulnerabilities by sequentially trying variations of delimiter-based attacks. | N/A |
| `prompt_decomposition` | Formatting, LLM | Decomposes a prompt into chunks and generates shuffled variations. | `modes` (LLM model to apply, default: dumb)<br>`variants` (number of variations to generate, default: 50) |
| `llm_multi_language_jailbreaker` | Translation, LLM | Generates jailbreak attempts using different languages, focusing on low-resource languages. | `model` (the LLM model to use for generating attacks) |
| `llm_jailbreaker` | Social Engineering, LLM | Uses an LLM to iteratively generate jailbreak attacks against the target. | `model` (the LLM model to use for generating attacks, e.g., `model=openai/gpt-4o`) |
| `llm_poetry_jailbreaker` | Social Engineering, LLM | Generates jailbreak attempts in the form of poetry or rhymes. | `model` (the LLM model to use for generating attacks) |
| `crescendo` | Social Engineering, LLM | Implements the Crescendo attack: a multi-turn jailbreak in which an LLM agent prompts the target with seemingly benign prompts, gradually escalating the conversation by referencing the model's own replies until it achieves a jailbreak. | N/A |
| `echo_chamber` | Social Engineering, LLM | Implements the Echo Chamber attack: a multi-turn attack that uses an LLM agent to create a feedback loop in which the model's own responses are fed back to it to bypass guardrails and achieve jailbreaks. | N/A |
| `goat` | Social Engineering, LLM | Implements the GOAT attack: a multi-turn attack in which an LLM acts as an automated red-teaming agent, applying a range of adversarial prompting and jailbreaking techniques to achieve an objective. | See file for target-specific configuration using `APPLICATION_CONFIG` and `APPLICATION_GUARDRAILS`. |
| `multi_turn` | Multi-Turn | Sequentially sends a predefined list of user prompts to the target LLM, from a simple multi-turn dataset. | N/A |
| `rag_poisoner` | LLM | Injects fake RAG context that appears to be legitimate document snippets supporting the attack objective. | `model` (the LLM model to use for generating attacks) |
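As an illustration of the obfuscation style used by `best_of_n`, the Best-of-N perturbations (character scrambling, random capitalization, character noising) can be sketched as below. This is an illustrative reimplementation with arbitrarily chosen probabilities, not the attack's exact logic.

```python
import random

def perturb(text: str, rng: random.Random) -> str:
    """One Best-of-N-style perturbation pass: scramble the middle characters
    of longer words, randomly flip letter case, and occasionally substitute
    a neighbouring ASCII character."""
    words = []
    for word in text.split():
        if len(word) > 3:
            middle = list(word[1:-1])
            rng.shuffle(middle)                          # character scrambling
            word = word[0] + "".join(middle) + word[-1]
        chars = []
        for ch in word:
            if ch.isalpha() and rng.random() < 0.5:
                ch = ch.swapcase()                       # random capitalization
            if ch.isascii() and ch.isalpha() and rng.random() < 0.05:
                ch = chr(ord(ch) + rng.choice([-1, 1]))  # character noising
            chars.append(ch)
        words.append("".join(chars))
    return " ".join(words)

def best_of_n(text: str, n: int = 50, seed: int = 0) -> list:
    """Generate N independent perturbed variants of the input."""
    rng = random.Random(seed)
    return [perturb(text, rng) for _ in range(n)]
```

In the dynamic attack, each variant is sent to the target in turn until a judge marks one response as successful; the plugin version simply emits the variants into the dataset.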
**Usage Example**

```bash
spikee test --dataset datasets/dataset-name.jsonl \
    --target demo_llm_application \
    --attack crescendo \
    --attack-options 'max-turns=5,model=bedrock/deepseek-v3' \
    --attack-only
```