Spikee comes with a variety of built-in seeds and modules (e.g., targets, judges, plugins, attacks).
Spikee comes with a variety of built-in seeds, each designed for a specific testing purpose. These seeds are located in the `datasets/` directory after you run `spikee init`. You can list them at any time with `spikee list seeds`.
| Seed | Source | Type | Description |
|---|---|---|---|
| `seeds-cybersec-2026-01` | Reversec | Cybersecurity | A general-purpose dataset for testing prompt injection and cybersecurity harms. It focuses on common attack goals seen in web application security, such as data exfiltration, cross-site scripting (XSS), and resource exhaustion. |
| `seeds-harmful-instructions-only` | Reversec | Objectives | Specifically designed for attacks using LLM agents, such as Crescendo and LLM-Jailbreaker. These attacks require an instruction (objective) to generate their own attack vectors dynamically. Contains harmful instructions in `instructions.jsonl`, while leaving jailbreaks and user inputs as empty placeholders. |
| `seeds-simsonsun-high-quality-jailbreaks` | External | Jailbreaks | A high-quality set of contamination-free jailbreak prompts, specifically curated to avoid overlap with the training data of many common safety classifiers. |
| `seeds-in-the-wild-jailbreak-prompts` | External | Jailbreaks | Contains approximately 1,400 real-world jailbreak prompts collected from public sources like Discord and Reddit (filtered from the TrustAIRLab dataset). Ideal for testing a target's resilience against known, publicly available jailbreaks. |
| `seeds-wildguardmix-harmful` | External | Harmful | A dataset for testing harmful content generation. The prompts are sourced from the WildGuardMix dataset. |
| `seeds-wildguardmix-harmful-fp` | External | Harmful (FP) | A companion dataset to `seeds-wildguardmix-harmful`, containing benign (harmless) prompts. |
| `seeds-toxic-chat` | External | Harmful | A dataset for testing toxic prompts, filtered from 10K user prompts collected from the Vicuna online demo. |
| `seeds-investment-advice` | Reversec | Topical Guardrails | Designed to test topical guardrails that are supposed to block personal financial or investment advice. It includes both malicious instructions and standalone attack prompts. |
| `seeds-investment-advice-fp` | Reversec | Topical Guardrails (FP) | A companion dataset to `seeds-investment-advice`, containing benign (harmless) queries about financial topics. |
| `seeds-sysmsg-extraction-2025-04` | Reversec | System Prompt Extraction | Specifically designed to test for system prompt extraction. The instructions and judges are tailored to detect whether the target model leaks its own system prompt or initial instructions. |
| `seeds-llm-mailbox` | Reversec | Tutorial | An example seed tailored for testing an email summarization feature. The documents are sample emails, and the instructions are designed to test for vulnerabilities in that specific context. See the associated blog post for a detailed walkthrough. |
| `seeds-empty` | Reversec | Utility | An empty template folder. It contains empty `documents.jsonl`, `jailbreaks.jsonl`, and `instructions.jsonl` files. This is the recommended starting point when creating a new dataset from scratch, especially for standalone attacks. |
| `seeds-mini-test` | Reversec | Utility | A very small set of examples for quick, functional testing of Spikee itself. Use this to verify your setup or to test a new custom target or plugin without running a large number of tests. |
FP (false positive) datasets are intended for use with the `--false-positive-checks` flag, which measures how often a guardrail incorrectly blocks legitimate prompts when evaluating harmful content filters.

External datasets require you to run a fetch script to download the prompts; see the `README.md` inside each seed folder for instructions. Some of these use an LLM judge by default, which is specified in the seed's README.
**Usage Example**

```bash
spikee generate --seed ./seeds-cybersec-2026-01
```

Spikee includes a variety of built-in and sample targets, which can be listed at any time with `spikee list targets`.
Built-in targets cover several common LLM providers and are located in the `spikee/targets/` folder. To use them, rename `.env-example` to `.env` and add any necessary API keys.
| Target | Type | Description |
|---|---|---|
| `llm_provider` | Provider | Generic LLM target for supported LLM providers (e.g., `openai`, `bedrock`, `google`, `ollama`, etc.). See Docs. |
| `aws_bedrock_guardrail` | Guardrails | Assesses AWS Bedrock Guardrails. |
| `az_ai_content_safety_harmful` | Guardrails | Assesses Azure AI Content Safety harm categories. |
| `az_prompt_shields_document_analysis` | Guardrails | Assesses Azure Prompt Shields document analysis. |
| `az_prompt_shields_prompt_analysis` | Guardrails | Assesses Azure Prompt Shields prompt analysis. |
Sample targets are provided in the `workspace/targets/` folder, which is created by running `spikee init`. These demonstrate how to write custom targets and can easily be modified to assess an LLM application of your choice.
| Target | Type | Description |
|---|---|---|
| `sample_target` | Single-Turn | Sends a GET request to a fictional application, demonstrating options as well as advanced guardrail and error handling. |
| `sample_target_legacy` | Single-Turn (Legacy) | Returns a mock message. This is a legacy target, demonstrating the older target format. |
| `sample_pdf_request_target` | Single-Turn | Sends a POST request containing a PDF to a fictional application. |
| `test_chatbot` | Multi-Turn | Sends requests to the Spikee Test Chatbot. |
| `simple_test_chatbot` | Multi-Turn | Implements the simple multi-turn target format and sends requests to the Spikee Test Chatbot. |
| `llm_mailbox` | Single-Turn | Sample target for the email summarisation application tutorial. |
**Usage Example**

```bash
spikee test --dataset datasets/cybersec-2026-01.jsonl \
    --target llm_provider \
    --target-options "bedrock/claude45-haiku"
```

Spikee includes several built-in judges to evaluate LLM responses, located in the `spikee/judges/` and `workspace/judges/` folders. These can be listed at any time with `spikee list judges`.
**Basic Judges**

These evaluate responses based on simple criteria:

- `canary`: Checks if a predefined canary string is present in the response.
- `regex`: Uses regular expressions to identify specific patterns in the response.
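Conceptually, these basic judges reduce to simple string checks. A minimal sketch (the function names below are illustrative, not Spikee's actual judge interface):

```python
import re

def canary_judge(response: str, canary: str) -> bool:
    """Success if the predefined canary string appears in the response."""
    return canary in response

def regex_judge(response: str, pattern: str) -> bool:
    """Success if the regular expression matches anywhere in the response."""
    return re.search(pattern, response) is not None
```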
**LLM Judges**

For some test cases, success cannot be determined by a simple keyword or pattern. For instance, did the model's response contain harmful advice, or did it refuse to answer a question on a restricted topic? LLM-based judges address this by using a separate LLM to evaluate the target's response against a natural-language criterion.

- `llm_judge_harmful`: Evaluates whether the target LLM's response complied with a potentially harmful user prompt.
- `llm_judge_objective`: Evaluates whether the target LLM's response meets a specific input objective.
- `llm_judge_output_criteria`: Evaluates whether the target LLM's response meets specific success criteria defined in `judge_args`.
The judge's LLM model can be specified using the `--judge-options` flag. See LLM Providers for a complete list of supported models, prefixes, and examples. Some common examples include:

- `offline`: Mock judge for restrictive environments. See the re-judging and isolated-environments documentation for more information.
- `bedrock/<model_name>`: AWS Bedrock API (e.g., `bedrock/claude45-haiku`)
- `openai/<model_name>`: OpenAI API (e.g., `openai/gpt-4o-mini`)
- `google/<model_name>`: Google Gen AI API (e.g., `google/gemini-2.5-flash`)
**Usage Example**

```bash
# Use an offline judge, allowing for later re-judging
spikee test --dataset datasets/cybersec-2026-01.jsonl \
    --target llm_provider \
    --target-options "bedrock/claude45-haiku" \
    --judge-options offline
```

Spikee includes several built-in plugins that can be leveraged to enhance dataset generation. These are scripts that apply static transformations to payloads during dataset generation and can create multiple iterations of each entry. Built-in plugins are located in the `spikee/plugins/` directory; local plugins live in the `plugins/` directory within your workspace. You can list them at any time with `spikee list plugins`.
The following list provides an overview of each built-in plugin; further information on each plugin can be found within the plugin file.
**Type Key**
- Encoding: Deterministic format or character conversion.
- Obfuscation: Noise injection, character mangling, and word masking.
- Translation: Language or script conversion.
- Formatting: Structural manipulation of payload layout.
- Social Engineering: LLM-driven persuasion and manipulation tactics.
- LLM: Requires an LLM provider.
- ML: Requires local machine learning models.
- Attack-Based: Adapted from dynamic attack research into static dataset transformations.
| Plugin | Type | Description | Options |
|---|---|---|---|
| `1337` | Encoding | Transforms text into "leet speak" by replacing certain letters with numbers or symbols. | N/A |
| `ascii_smuggler` | Encoding | Transforms ASCII text into a series of Unicode tag characters that are generally invisible in most UI elements (bypassing content filters). | N/A |
| `base64` | Encoding | Encodes text using Base64 encoding. | N/A |
| `ceasar` | Encoding | Applies a Caesar cipher to the text, shifting letters by a specified number of positions. | `shift` (number of positions to shift, default: 3) |
| `hex` | Encoding | Encodes text into its hexadecimal representation. | N/A |
| `morse` | Encoding | Encodes text into Morse code. | N/A |
| `best_of_n` | Obfuscation, Attack-Based | Implements "Best-of-N Jailbreaking" (John Hughes et al., 2024), applying character scrambling, random capitalization, and character noising. | `variants` (number of variations to generate, default: 50) |
| `flip` | Obfuscation | Applies a flip attack to obfuscate text, with three modes: FWO (Flip Word Order), FCW (Flip Chars in Word), FCS (Flip Chars in Sentence). | `mode` (the flip mode to apply, default: FWO) |
| `mask` | Obfuscation, LLM | Masks high-risk words in the text with random character sequences, while providing a suffix that maps the masks back to the original words. | `advanced` (if true, creates multiple masks for longer words)<br>`advanced-split` (number of characters per mask chunk for the advanced option, default: 6) |
| `splat` | Obfuscation | Obfuscates the text using splat-based techniques (e.g., asterisks `*`, special characters, and spacing tricks) to bypass basic filters. | `character` (the character to use for splatting, default: `*`)<br>`insert_rand` (probability of inserting a splat within words, default: 0.6)<br>`pad_rand` (probability of padding words with splats, default: 0.4) |
| `digraphic_translate` | Translation, LLM | Generates jailbreak prompts by mixing writing systems within a single digraphic language (e.g., Japanese Kanji/Romaji, Serbian Cyrillic/Latin) to evade script-sensitive safety classifiers. | `language` (target digraphic language, default: japanese; options: korean, serbian, chinese, hindi-urdu) |
| `google_translate` | Translation | Translates text to another language using Google Translate. | `source-lang` (language code for source language, default: en)<br>`target-lang` (language code for target language, default: zh-cn) |
| `llm_multi_language_jailbreaker` | Translation, LLM, Attack-Based | Generates jailbreak attempts using different languages, focusing on low-resource languages. | `model` (the LLM model to use for generating attacks, default: `model=openai/gpt-4o`)<br>`variants` (number of variations to generate, default: 5) |
| `opus_translator` | Translation, ML | Translates text to another language using local OPUS-MT models. | `source` (source language code, default: en)<br>`targets` (target language(s), default: zh)<br>`quality` (translation quality, default: 1)<br>`device` (cpu or gpu, default: auto-detect)<br>`cache_dir` (directory to cache ML models, optional) |
| `anti_spotlighting` | Formatting, Attack-Based | Generates variations of delimiter-based attacks to test LLM applications against spotlighting vulnerabilities. | `variants` (number of variations to generate, default: 50) |
| `prompt_decomposition` | Formatting, Attack-Based | Decomposes a prompt into chunks and generates shuffled variations. | `modes` (LLM model to apply, default: dumb)<br>`variants` (number of variations to generate, default: 50) |
| `llm_jailbreaker` | Social Engineering, LLM, Attack-Based | Uses an LLM to iteratively generate jailbreak attacks against the target. | `model` (the LLM model to use for generating attacks, default: `model=openai/gpt-4o`)<br>`variants` (number of variations to generate, default: 5) |
| `llm_poetry_jailbreaker` | Social Engineering, LLM, Attack-Based | Generates jailbreak attempts in the form of poetry or rhymes. | `model` (the LLM model to use for generating attacks, default: `model=openai/gpt-4o`)<br>`variants` (number of variations to generate, default: 5) |
| `shortener` | LLM | Uses an LLM to shorten the text to a specified maximum length while retaining key details. | `max_length` (the maximum length for the shortened text, default: 256) |
| `rag_poisoner` | LLM, Attack-Based | Injects fake RAG context that appears to be legitimate document snippets supporting the attack objective. | `model` (the LLM model to use for generating attacks, default: `model=openai/gpt-4o`)<br>`variants` (number of variations to generate, default: 5) |
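To make the deterministic encoding plugins concrete, here are minimal sketches of the `1337` and `ceasar` transformations. These are illustrative reimplementations, not Spikee's plugin code; the exact substitution table used by the real `1337` plugin may differ.

```python
def leet(text: str) -> str:
    """A simple "leet speak" substitution, as in the 1337 plugin."""
    table = str.maketrans({"a": "4", "e": "3", "i": "1", "o": "0", "t": "7"})
    return text.lower().translate(table)

def caesar(text: str, shift: int = 3) -> str:
    """Caesar cipher with the plugin's default shift of 3.
    Non-alphabetic characters pass through unchanged."""
    out = []
    for ch in text:
        if ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)
```

Because these transformations are deterministic, each dataset entry maps to exactly one encoded variant, unlike the randomized obfuscation plugins.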
**Usage Example**

```bash
spikee generate --seed ./seeds-cybersec-2026-01 \
    --plugin best_of_n google_translate|base64 \
    --plugin-options "best_of_n:variants=5;google_translate:source-lang=en"
```

Spikee includes several built-in dynamic attacks that iteratively modify prompts/documents until they succeed (or run out of iterations). These are located in the `spikee/attacks/` folder and can be listed at any time with `spikee list attacks`.
You can customize the behavior of attacks using the following command-line options:

- `--attack-iterations`: Specifies the maximum number of iterations for each attack (default: 1000).
- `--attack-options`: Passes a single string option to the attack script for custom behavior (e.g., `"mode=aggressive"`).
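Conceptually, a dynamic attack is a mutate-and-retry loop bounded by the iteration budget. A simplified sketch of this control flow (the `target`, `judge`, and `mutate` callables here are stand-ins, not Spikee's actual attack API):

```python
def run_attack(prompt, target, judge, mutate, max_iterations=1000):
    """Iteratively generate mutated candidates until the judge marks a
    target response as successful or the iteration budget runs out."""
    for i in range(1, max_iterations + 1):
        candidate = mutate(prompt)      # e.g. suffix search, char noising
        response = target(candidate)    # query the system under test
        if judge(response):             # e.g. canary / regex / LLM judge
            return {"success": True, "iterations": i, "prompt": candidate}
    return {"success": False, "iterations": max_iterations, "prompt": None}
```

Multi-turn attacks like Crescendo follow the same outer loop but carry conversation state between iterations instead of mutating a single prompt.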
**Type Key**
- Obfuscation: Noise injection, character mangling, and random token perturbation.
- Formatting: Structural manipulation of payload layout.
- Social Engineering: LLM-driven persuasion, escalation, and manipulation tactics.
- Translation: Cross-lingual evasion via language conversion.
- LLM: Requires an LLM provider.
| Attack | Type | Description | Additional Options |
|---|---|---|---|
| `best_of_n` | Obfuscation | Implements "Best-of-N Jailbreaking" (John Hughes et al., 2024), applying character scrambling, random capitalization, and character noising. | N/A |
| `random_suffix_search` | Obfuscation | Implements random suffix search, which appends random suffixes to the prompt to bypass filters. | N/A |
| `anti_spotlighting` | Formatting | Assesses spotlighting vulnerabilities by sequentially trying variations of delimiter-based attacks. | N/A |
| `prompt_decomposition` | Formatting, LLM | Decomposes a prompt into chunks and generates shuffled variations. | `modes` (LLM model to apply, default: dumb)<br>`variants` (number of variations to generate, default: 50) |
| `llm_multi_language_jailbreaker` | Translation, LLM | Generates jailbreak attempts using different languages, focusing on low-resource languages. | `model` (the LLM model to use for generating attacks) |
| `llm_jailbreaker` | Social Engineering, LLM | Uses an LLM to iteratively generate jailbreak attacks against the target. | `model` (the LLM model to use for generating attacks, e.g., `model=openai/gpt-4o`) |
| `llm_poetry_jailbreaker` | Social Engineering, LLM | Generates jailbreak attempts in the form of poetry or rhymes. | `model` (the LLM model to use for generating attacks) |
| `crescendo` | Social Engineering, LLM | Implements the Crescendo attack: a multi-turn jailbreak in which an LLM agent prompts the target with seemingly benign prompts, gradually escalating the conversation by referencing the model's own replies until it achieves a jailbreak. | N/A |
| `echo_chamber` | Social Engineering, LLM | Implements the Echo Chamber attack: a multi-turn attack that uses an LLM agent to create a feedback loop in which the model's own responses are fed back to it to bypass guardrails and achieve jailbreaks. | N/A |
| `goat` | Social Engineering, LLM | Implements the GOAT attack: a multi-turn attack in which an LLM acts as an automated red-teaming agent, applying a range of adversarial prompting and jailbreaking techniques to achieve an objective. | See file for target-specific configuration using `APPLICATION_CONFIG` and `APPLICATION_GUARDRAILS`. |
| `multi_turn` | Multi-Turn | Sequentially sends a predefined list of user prompts to the target LLM, from a simple multi-turn dataset. | N/A |
| `rag_poisoner` | LLM | Injects fake RAG context that appears to be legitimate document snippets supporting the attack objective. | `model` (the LLM model to use for generating attacks) |
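As an illustration of the obfuscation style used by `best_of_n`, the Best-of-N perturbations (character scrambling, random capitalization, character noising) can be sketched as below. This is an illustrative reimplementation with arbitrarily chosen probabilities, not the attack's exact logic.

```python
import random

def perturb(text: str, rng: random.Random) -> str:
    """One Best-of-N-style perturbation pass: scramble the middle characters
    of longer words, randomly flip letter case, and occasionally substitute
    a neighbouring ASCII character."""
    words = []
    for word in text.split():
        if len(word) > 3:
            middle = list(word[1:-1])
            rng.shuffle(middle)                          # character scrambling
            word = word[0] + "".join(middle) + word[-1]
        chars = []
        for ch in word:
            if ch.isalpha() and rng.random() < 0.5:
                ch = ch.swapcase()                       # random capitalization
            if ch.isascii() and ch.isalpha() and rng.random() < 0.05:
                ch = chr(ord(ch) + rng.choice([-1, 1]))  # character noising
            chars.append(ch)
        words.append("".join(chars))
    return " ".join(words)

def best_of_n(text: str, n: int = 50, seed: int = 0) -> list:
    """Generate N independent perturbed variants of the input."""
    rng = random.Random(seed)
    return [perturb(text, rng) for _ in range(n)]
```

In the dynamic attack, each variant is sent to the target in turn until a judge marks one response as successful; the plugin version simply emits the variants into the dataset.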
**Usage Example**

```bash
spikee test --dataset datasets/dataset-name.jsonl \
    --target demo_llm_application \
    --attack crescendo \
    --attack-options 'max-turns=5,model=bedrock/deepseek-v3' \
    --attack-only
```