LLMart uses Hydra for configuration management, which provides a flexible command line interface for specifying configuration options. This document details all available command line arguments organized by functional groups.
Configurations can be specified directly on the command line (using dot notation for nested configurations) when launching the LLMart module with `-m llmart`. Lists are specified using brackets and comma separation (be careful with extra spaces). For example:
`accelerate launch -m llmart model=llama3-8b-instruct data=basic steps=567 optim.n_tokens=11 banned_strings=[car,machine]`
You can also compose configurations from pre-defined groups and override specific values as needed.
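For instance, a run that swaps in a different model group and overrides a few scalar values might look like the following sketch (the specific values are illustrative, not recommendations):

```bash
# Illustrative only: compose the llama3.1-8b-instruct model group with the
# advbench_behavior dataset, then override individual scalar values.
accelerate launch -m llmart \
    model=llama3.1-8b-instruct \
    data=advbench_behavior \
    steps=300 \
    seed=1234
```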
These parameters control the basic behavior of experiments. Parameters marked as MISSING are mandatory.
Parameter | Type | Default | Description |
---|---|---|---|
`model` | string | MISSING | Model name. Can be either one of the pre-defined options, or the Hugging Face name supplied to `AutoModelForCausalLM`. |
`revision` | string | MISSING | Model revision. Hugging Face revision supplied to `AutoModelForCausalLM`. |
`data` | string | "advbench_behavior" | Dataset configuration. Can be one of the pre-defined options, or `custom` when loading an arbitrary Hugging Face dataset. |
`loss` | string | "model" | Loss function type. |
`optim` | string | "gcg" | Optimization algorithm. Choices: `gcg`, `sgd`, `adam`. Choosing `sgd` or `adam` results in a soft embedding attack. |
`scheduler` | string | "linear" | Scheduler to use on an integer hyper-parameter specified by `scheduler.var_name`. Choices: `constant`, `linear`, `exponential`, `cosine`, `multistep`, `plateau`. |
`experiment_name` | string | llmart | Name of the folder where results will be stored. |
`output_dir` | string | `${now:%Y-%m-%d}/${now:%H-%M-%S.%f}` | Name of the sub-folder where results for the current run will be stored. Defaults to a timestamp with sub-second precision. |
`seed` | integer | 2024 | Global random seed for reproducibility. |
`use_deterministic_algorithms` | boolean | false | Whether to use cuDNN deterministic algorithms. |
`steps` | integer | 500 | Number of adversarial optimization steps. |
`early_stop` | boolean | true | Whether to enable early stopping. If true, optimization stops early once all forced tokens are rank-1 (guaranteed selection in greedy decoding). |
`val_every` | integer | 50 | Validation frequency (in steps). |
`max_new_tokens` | integer | 512 | Maximum number of tokens to auto-regressively generate when periodically validating the adversarial attack. |
`save_every` | integer | 50 | Result saving frequency (in steps). |
`per_device_bs` | integer | 1 | Per-device batch size. Setting this to -1 enables automatically finding the largest batch size that fits on the device. ❗ The value -1 is currently only supported for single-device execution. |
`use_kv_cache` | boolean | false | Whether to use the KV cache for efficiency. ❗ Setting this to true is only intended for `len(data.subset)=1`; otherwise it may cause silent errors. |
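As a rough illustration of how these core parameters combine on the command line (the values below are arbitrary examples, not tuned settings):

```bash
# Illustrative values: run 1000 steps, validate and save every 100 steps,
# and let the toolkit search for the largest single-device batch size.
accelerate launch -m llmart \
    model=llama3-8b-instruct \
    data=advbench_behavior \
    steps=1000 \
    val_every=100 \
    save_every=100 \
    per_device_bs=-1
```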
Parameters related to model selection and configuration.
Parameter | Type | Default | Description |
---|---|---|---|
`model.task` | string | "text-generation" | Task for the model pipeline. |
`model.device` | string | "cuda" | Device to run on ("cuda", "cpu", etc.). |
`model.device_map` | string | null | Device mapping strategy. |
`model.torch_dtype` | string | "bfloat16" | Torch data type. |
Pre-defined model options:
llama3-8b-instruct
llama3.1-8b-instruct
llama3.1-70b-instruct
llama3.2-1b-instruct
llama3.2-11b-vision
llamaguard3-1b
llama3-8b-grayswan-rr
deepseek-r1-distill-llama-8b
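For example, a plausible invocation that selects one of the pre-defined models but overrides its device and data type (the override values are illustrative):

```bash
# Illustrative only: pick a pre-defined model group, then force CPU execution
# in float32 via the model.* overrides documented above.
accelerate launch -m llmart \
    model=llama3.2-1b-instruct \
    data=basic \
    model.device=cpu \
    model.torch_dtype=float32
```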
Parameters for configuring adversarial token placement and optimization methods.
Parameter | Type | Default | Description |
---|---|---|---|
`banned_strings` | list[string] | empty | Any tokens that are superstrings of any element in this list will be excluded from optimization. |
`attack.suffix` | integer | 20 | How many adversarial suffix tokens are optimized. |
`attack.prefix` | integer | 0 | How many adversarial prefix tokens are optimized. |
`attack.pattern` | string | null | The string that is replaced by `attack.repl` tokens. Each occurrence of the string pattern will be replaced with the same tokens. |
`attack.dim` | integer | 0 | The dimension out of {0: dict_size, 1: embedding_dim} used to define and compute gradients. 0 is currently the only robust and recommended setting. |
`attack.default_token` | string | " !" | The initial string representation of the adversarial tokens. |
`optim.lr` | float | 0.001 | Learning rate (step size) for the optimizer. |
`optim.n_tokens` | integer | 20 | Number of tokens to simultaneously optimize in a single step. |
`optim.n_swaps` | integer | 1024 | Number of token candidate swaps (replacements) to sample in a single step. |
`scheduler.var_name` | string | "n_tokens" | The `optim` integer hyper-parameter that the scheduler modifies during optimization. Choices: `n_tokens`, `n_swaps`. |
Parameters for data loading and processing.
Parameter | Type | Default | Description |
---|---|---|---|
`data.path` | string | MISSING | Name of the Hugging Face dataset. Only required when using `data=custom`. |
`data.subset` | list[integer] | null | Specific data samples to use from the dataset; a single adversarial attack is learned for all of them. |
`data.files` | string | null | Files passed to the Hugging Face dataset. |
`data.shuffle` | boolean | false | Whether to shuffle data at each step. |
`data.n_train` | integer | 0 | Number of training samples to take from `data.subset`. Leaving this and `data.{n_val, n_test}` at their default values will automatically use only the first sample for training and testing. |
`data.n_val` | integer | 0 | Number of validation samples. |
`data.n_test` | integer | 0 | Number of test samples. |
`bs` | integer | 1 | Data batch size to use in an optimization step. Distinct from `per_device_bs`, and must be equal to it if `len(data.subset) > 1`. |
Pre-defined data options:
basic
advbench_behavior
advbench_judge
custom
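For instance, a hypothetical run on a custom Hugging Face dataset might look like the sketch below (the dataset name is a placeholder, and the subset and split sizes are arbitrary):

```bash
# Illustrative only: "my_org/my_dataset" is a placeholder Hugging Face dataset name.
# Optimize a single attack over the first four samples, training on two of them
# and validating/testing on one each.
accelerate launch -m llmart \
    model=llama3-8b-instruct \
    data=custom \
    data.path=my_org/my_dataset \
    data.subset=[0,1,2,3] \
    data.n_train=2 \
    data.n_val=1 \
    data.n_test=1
```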