LLMart uses Hydra for configuration management, which provides a flexible command line interface for specifying configuration options. This document details all available command line arguments organized by functional groups.
Configurations can be specified directly on the command line (using dot notation for nested configurations) when launching the LLMart module with `-m llmart`. Lists are specified using brackets and comma separation (be careful with extra spaces). For example:
`accelerate launch -m llmart model=llama3-8b-instruct data=basic steps=567 optim.n_tokens=11 banned_strings=[car,machine]`
You can also compose configurations from pre-defined groups and override specific values as needed.
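For instance, a run that swaps in a different model group and overrides a few scalar values might look like the following sketch (the specific values are illustrative, not recommendations):

```bash
# Illustrative only: compose the llama3.1-8b-instruct model group with the
# advbench_behavior dataset, then override individual scalar values.
accelerate launch -m llmart \
    model=llama3.1-8b-instruct \
    data=advbench_behavior \
    steps=300 \
    seed=1234
```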
These parameters control the basic behavior of experiments. Parameters marked as MISSING are mandatory.
Parameter | Type | Default | Description |
---|---|---|---|
`model` | string | MISSING | Model name. Can be either one of the pre-defined options, or the Hugging Face name supplied to `AutoModelForCausalLM`. |
`revision` | string | MISSING | Model revision. Hugging Face revision supplied to `AutoModelForCausalLM`. |
`data` | string | "advbench_behavior" | Dataset configuration. Can be one of the pre-defined options, or `custom` when loading an arbitrary Hugging Face dataset. |
`loss` | string | "model" | Loss function type. |
`optim` | string | "gcg" | Optimization algorithm. Choices: `gcg`, `sgd`, `adam`. Choosing `sgd` or `adam` results in a soft embedding attack. |
`scheduler` | string | "linear" | Scheduler to use on an integer hyper-parameter specified by `scheduler.var_name`. Choices: `constant`, `linear`, `exponential`, `cosine`, `multistep`, `plateau`. |
`experiment_name` | string | llmart | Name of the folder where results will be stored. |
`output_dir` | string | `${now:%Y-%m-%d}/${now:%H-%M-%S.%f}` | Name of the sub-folder where results for the current run will be stored. Defaults to a timestamp with sub-second precision. |
`seed` | integer | 2024 | Global random seed for reproducibility. |
`use_deterministic_algorithms` | boolean | false | Whether to use cuDNN deterministic algorithms. |
`steps` | integer | 500 | Number of adversarial optimization steps. |
`early_stop` | boolean | true | Whether to enable early stopping. If true, optimization stops early once all forced tokens are rank-1 (guaranteed selection in greedy decoding). |
`val_every` | integer | 50 | Validation frequency (in steps). |
`max_new_tokens` | integer | 512 | Maximum number of tokens to auto-regressively generate when periodically validating the adversarial attack. |
`save_every` | integer | 50 | Result saving frequency (in steps). |
`per_device_bs` | integer | 1 | Per-device batch size. Setting this to -1 enables automatically finding the largest batch size that fits on the device. ❗ The value -1 is currently only supported for single-device execution. |
`use_kv_cache` | boolean | false | Whether to use the KV cache for efficiency. ❗ Setting this to true is only intended for `len(data.subset)=1`; otherwise it may cause silent errors. |
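As a rough illustration of how these core parameters combine on the command line (the values below are arbitrary examples, not tuned settings):

```bash
# Illustrative values: run 1000 steps, validate and save every 100 steps,
# and let the toolkit search for the largest single-device batch size.
accelerate launch -m llmart \
    model=llama3-8b-instruct \
    data=advbench_behavior \
    steps=1000 \
    val_every=100 \
    save_every=100 \
    per_device_bs=-1
```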
Parameters related to model selection and configuration.
Parameter | Type | Default | Description |
---|---|---|---|
`model.task` | string | "text-generation" | Task for the model pipeline. |
`model.device` | string | "cuda" | Device to run on ("cuda", "cpu", etc.). |
`model.device_map` | string | null | Device mapping strategy. |
`model.torch_dtype` | string | "bfloat16" | Torch data type. |
Pre-defined model options:
llama3-8b-instruct
llama3.1-8b-instruct
llama3.1-70b-instruct
llama3.2-1b-instruct
llama3.2-11b-vision
llamaguard3-1b
llama3-8b-grayswan-rr
deepseek-r1-distill-llama-8b
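For example, a plausible invocation that selects one of the pre-defined models but overrides its device and data type (the override values are illustrative):

```bash
# Illustrative only: pick a pre-defined model group, then force CPU execution
# in float32 via the model.* overrides documented above.
accelerate launch -m llmart \
    model=llama3.2-1b-instruct \
    data=basic \
    model.device=cpu \
    model.torch_dtype=float32
```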
Parameters for configuring adversarial token placement and optimization methods.
Parameter | Type | Default | Description |
---|---|---|---|
`banned_strings` | list[string] | empty | Any tokens that are superstrings of any element in this list will be excluded from optimization. |
`attack.suffix` | integer | 20 | How many adversarial suffix tokens are optimized. |
`attack.prefix` | integer | 0 | How many adversarial prefix tokens are optimized. |
`attack.pattern` | string | null | The string that is replaced by `attack.repl` tokens. Each occurrence of the string pattern will be replaced with the same tokens. |
`attack.dim` | integer | 0 | The dimension out of {0: dict_size, 1: embedding_dim} used to define and compute gradients. 0 is currently the only robust and recommended setting. |
`attack.default_token` | string | " !" | The initial string representation of the adversarial tokens. |
`optim.lr` | float | 0.001 | Learning rate (step size) for the optimizer. |
`optim.n_tokens` | integer | 20 | Number of tokens to simultaneously optimize in a single step. |
`optim.n_swaps` | integer | 1024 | Number of token candidate swaps (replacements) to sample in a single step. |
`scheduler.var_name` | string | "n_tokens" | The `optim` integer hyper-parameter that the scheduler modifies during optimization. Choices: `n_tokens`, `n_swaps`. |
Parameters for data loading and processing.
Parameter | Type | Default | Description |
---|---|---|---|
`data.path` | string | MISSING | Name of the Hugging Face dataset. Only required when using `data=custom`. |
`data.subset` | list[integer] | null | Specific data samples to use from the dataset; a single adversarial attack is learned for all of them. |
`data.files` | string | null | Files passed to the Hugging Face dataset. |
`data.shuffle` | boolean | false | Whether to shuffle data at each step. |
`data.n_train` | integer | 0 | Number of training samples to take from `data.subset`. Leaving this and `data.{n_val, n_test}` at their default values will automatically use only the first sample for training and testing. |
`data.n_val` | integer | 0 | Number of validation samples. |
`data.n_test` | integer | 0 | Number of test samples. |
`bs` | integer | 1 | Data batch size to use in an optimization step. Distinct from `per_device_bs`, and must be equal to it if `len(data.subset) > 1`. |
Pre-defined data options:
basic
advbench_behavior
advbench_judge
custom
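For instance, a hypothetical run on a custom Hugging Face dataset might look like the sketch below (the dataset name is a placeholder, and the subset and split sizes are arbitrary):

```bash
# Illustrative only: "my_org/my_dataset" is a placeholder Hugging Face dataset name.
# Optimize a single attack over the first four samples, training on two of them
# and validating/testing on one each.
accelerate launch -m llmart \
    model=llama3-8b-instruct \
    data=custom \
    data.path=my_org/my_dataset \
    data.subset=[0,1,2,3] \
    data.n_train=2 \
    data.n_val=1 \
    data.n_test=1
```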