Source code for "Correlation inference attacks against machine learning models"

Source code for pre-processing datasets, running experiments, and generating the figures of the Science Advances paper "Correlation inference attacks against machine learning models" by Ana-Maria Cretu*, Florent Guépin*, and Yves-Alexandre de Montjoye (* denotes equal contribution).

Requirements

For optimal execution, we recommend using a machine with at least 40 cores and 2 GPUs. The cores are needed to parallelize the execution of experiments using logistic regression models, while the GPUs are needed to parallelize the execution of experiments using multilayer perceptron models. The code can also be run using one GPU and fewer cores, but it will be slower.

Install the environment:

conda env create -f correlations.yml

If this command returns an error, for instance due to using a different OS than Ubuntu (which we used in our experiments), you need to install the main libraries manually using conda or pip. Please refer to correlations.yml for the library versions. We recommend using the same version for all libraries using randomness, including numpy, random, torch, and scikit-learn.
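If you install the libraries manually, it can help to compare the installed versions against those pinned in correlations.yml. The sketch below is not part of the repository. It relies only on the standard library's importlib.metadata, and the package list is an assumption based on the libraries named above (random ships with Python itself, so it has no separate version to check).

```python
from importlib import metadata

# Distribution names as published on PyPI; adjust to match correlations.yml.
PACKAGES = ["numpy", "torch", "scikit-learn"]


def installed_versions(packages):
    """Return {package: version, or None if the package is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version or 'not installed'}")
```

Compare the printed versions by hand with the entries in correlations.yml.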

Figure 1

To generate Figure 1 run the notebook notebooks/results/ear_shape_analysis.ipynb.

Figure 2

To reproduce the results of Figure 2A and 2B, run:

bash scripts/run_grid_attack.sh logreg logreg NBR_CORES

For Figure 2C, run:

bash scripts/run_grid_attack.sh mlptorch mlptorch NBR_CORES

In our experiments, we set the NBR_CORES variable to 40. You can modify it according to your resources.
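If you are unsure which value to use for NBR_CORES, one option is to cap it at the number of CPUs your machine reports. This is a minimal sketch, not part of the repository; the helper name pick_nbr_cores is hypothetical and the default of 40 simply mirrors the value used in our experiments.

```python
import os


def pick_nbr_cores(requested=40):
    """Cap the requested worker count at the number of CPUs the OS reports."""
    available = os.cpu_count() or 1  # os.cpu_count() can return None
    return max(1, min(requested, available))


if __name__ == "__main__":
    nbr_cores = pick_nbr_cores()
    # Print the command to run from the repository root.
    print(f"bash scripts/run_grid_attack.sh logreg logreg {nbr_cores}")
```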

To generate Figure 2, run the corresponding cell in the following notebook: notebooks/results/figures_synthetic_evaluation.ipynb.

You can generate smaller-scale results (i.e., a blurry version of Figure 2) more quickly by reducing the granularity of the grid discretization by setting the --lengths parameter to, e.g., "10,10".

Figure 3

Figure 3 shows results for predicting $\rho(X_1, X_2)$ from models trained on synthetic datasets of $n$ variables $X_1, \dots, X_{n-1}, Y$, with $n \in \{3, \dots, 10\}$.
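For readers unfamiliar with the quantity being inferred, the sketch below illustrates $\rho(X_1, X_2)$ as the Pearson correlation between two variables. It is an illustration only and does not reproduce the repository's data generator: it assumes two standard normal variables and builds the second one as a linear mix of the first and independent noise. The function names are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def sample_correlated_pair(rho, size, seed=0):
    """Draw (X1, X2) as standard normals with population correlation rho."""
    rng = random.Random(seed)
    x1 = [rng.gauss(0.0, 1.0) for _ in range(size)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(size)]
    # X2 = rho * X1 + sqrt(1 - rho^2) * Z has unit variance and corr(X1, X2) = rho.
    x2 = [rho * a + math.sqrt(1.0 - rho**2) * z for a, z in zip(x1, noise)]
    return x1, x2


if __name__ == "__main__":
    x1, x2 = sample_correlated_pair(rho=0.7, size=20000)
    print(f"empirical rho(X1, X2) = {pearson(x1, x2):.3f}")
```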

To reproduce the results of Figure 3 - S1, run:

bash scripts/run_randomized_target_attack_balanced.sh logreg logreg two 3 NBR_CORES NBR_GPUS CUDA_VISIBLE_DEVICES
bash scripts/run_randomized_target_attack_balanced.sh mlptorch mlptorch two 3 NBR_CORES NBR_GPUS CUDA_VISIBLE_DEVICES

To reproduce the results of Figure 3 - S2, run:

bash scripts/run_randomized_target_attack_balanced.sh logreg logreg column 3 NBR_CORES NBR_GPUS CUDA_VISIBLE_DEVICES
bash scripts/run_randomized_target_attack_balanced.sh mlptorch mlptorch column 3 NBR_CORES NBR_GPUS CUDA_VISIBLE_DEVICES

To reproduce the results of Figure 3 - S3, run:

bash scripts/run_randomized_target_attack_balanced.sh logreg logreg all_but_target 3 NBR_CORES NBR_GPUS CUDA_VISIBLE_DEVICES
bash scripts/run_randomized_target_attack_balanced.sh mlptorch mlptorch all_but_target 3 NBR_CORES NBR_GPUS CUDA_VISIBLE_DEVICES

We set the NBR_CORES variable to 40 in our experiments. You can modify it according to your resources. For experiments using logistic regression (logreg), CUDA_VISIBLE_DEVICES should be set to 0 (default). For experiments using multilayer perceptrons (mlptorch) which are trained on the GPU, we set CUDA_VISIBLE_DEVICES to 0,1 to parallelize our code on 2 GPUs, and we set NBR_GPUS=2. If you only have 1 GPU, set CUDA_VISIBLE_DEVICES to 0 and set NBR_GPUS to 1.
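The rules above for the last two arguments can be summarized in a small helper that builds the command line. This is a sketch under assumptions, not part of the repository: the function name build_command is hypothetical, and using NBR_GPUS=1 for logreg follows the other logreg commands in this README (which end in "1 0").

```python
def build_command(model, scenario, nbr_cores, gpu_ids=(0,)):
    """Build the argument list for run_randomized_target_attack_balanced.sh.

    model: "logreg" or "mlptorch"; scenario: "two", "column" or "all_but_target".
    gpu_ids: GPU indices to expose through CUDA_VISIBLE_DEVICES.
    """
    if model == "logreg":
        # Logistic regression does not train on the GPU: keep the default device 0.
        gpu_ids = (0,)
    cuda_visible_devices = ",".join(str(i) for i in gpu_ids)
    return [
        "bash",
        "scripts/run_randomized_target_attack_balanced.sh",
        model,
        model,
        scenario,
        "3",
        str(nbr_cores),
        str(len(gpu_ids)),  # NBR_GPUS
        cuda_visible_devices,  # CUDA_VISIBLE_DEVICES
    ]


if __name__ == "__main__":
    print(" ".join(build_command("mlptorch", "two", 40, gpu_ids=(0, 1))))
```

The returned list can be passed to subprocess.run(..., check=True) from the repository root, or printed and pasted into a shell.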

To generate Figure 3, run the corresponding cell in the following notebook: notebooks/results/figures_synthetic_evaluation.ipynb.

Figure 4

To reproduce the results of Figure 4, run the following command:

bash scripts/run_mitigations.sh logreg logreg NBR_CORES 1 0

We set the NBR_CORES variable to 40 in our experiments. You can modify it according to your resources.

To generate Figure 4, run the corresponding cell in the following notebook: notebooks/results/figures_synthetic_evaluation.ipynb.

Figure 5

To reproduce the results of Figure 5, run the following command:

bash scripts/run_dp_experiment.sh NBR_CORES

We set the NBR_CORES variable to 40 in our experiments. You can modify it according to your resources.

To generate Figure 5, run the corresponding cell in the following notebook: notebooks/results/dp_experiment.sh

Table 1

Run the notebooks/dataset_preprocessing.ipynb notebook to download and pre-process the real-world datasets used in this experiment.

Then, to reproduce the results of Table 1 for logistic regression models (logreg), run the following commands:

bash scripts/run_real_dataset_attack.sh communities_and_crime_v2 logreg logreg 3 NBR_CORES 1 0
bash scripts/run_real_dataset_attack.sh communities_and_crime_v2 logreg logreg 5 NBR_CORES 1 0
bash scripts/run_real_dataset_attack.sh fifa19_v2 logreg logreg 3 NBR_CORES 1 0
bash scripts/run_real_dataset_attack.sh fifa19_v2 logreg logreg 5 NBR_CORES 1 0
bash scripts/run_real_dataset_attack.sh musk logreg logreg 3 NBR_CORES 1 0
bash scripts/run_real_dataset_attack.sh musk logreg logreg 5 NBR_CORES 1 0

We set the NBR_CORES variable to 40 in our experiments. You can modify it according to your resources. For multilayer perceptron models, run:

bash scripts/run_real_dataset_attack.sh communities_and_crime_v2 mlptorch mlptorch 3 NBR_CORES NBR_GPUS CUDA_VISIBLE_DEVICES
bash scripts/run_real_dataset_attack.sh communities_and_crime_v2 mlptorch mlptorch 5 NBR_CORES NBR_GPUS CUDA_VISIBLE_DEVICES
bash scripts/run_real_dataset_attack.sh fifa19_v2 mlptorch mlptorch 3 NBR_CORES NBR_GPUS CUDA_VISIBLE_DEVICES
bash scripts/run_real_dataset_attack.sh fifa19_v2 mlptorch mlptorch 5 NBR_CORES NBR_GPUS CUDA_VISIBLE_DEVICES
bash scripts/run_real_dataset_attack.sh musk mlptorch mlptorch 3 NBR_CORES NBR_GPUS CUDA_VISIBLE_DEVICES
bash scripts/run_real_dataset_attack.sh musk mlptorch mlptorch 5 NBR_CORES NBR_GPUS CUDA_VISIBLE_DEVICES

These commands use multilayer perceptrons (mlptorch) which are trained on the GPU. We set CUDA_VISIBLE_DEVICES to 0,1 to parallelize our code on 2 GPUs, and we set NBR_GPUS=2. If you only have 1 GPU, set CUDA_VISIBLE_DEVICES to 0 and set NBR_GPUS to 1.
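The twelve Table 1 commands follow a regular pattern (model, dataset, number of variables), so they can be generated rather than typed by hand. The sketch below is not part of the repository; the function name table1_commands is hypothetical, and the GPU settings follow the values described above (1 and 0 for logreg, 2 and 0,1 for mlptorch on two GPUs).

```python
from itertools import product

DATASETS = ["communities_and_crime_v2", "fifa19_v2", "musk"]
NBR_VARIABLES = [3, 5]
# (NBR_GPUS, CUDA_VISIBLE_DEVICES) per model; edit for a single-GPU machine.
GPU_SETTINGS = {"logreg": ("1", "0"), "mlptorch": ("2", "0,1")}


def table1_commands(nbr_cores=40):
    """Return the Table 1 commands as strings, logreg first, then mlptorch."""
    commands = []
    for model, dataset, n in product(GPU_SETTINGS, DATASETS, NBR_VARIABLES):
        nbr_gpus, devices = GPU_SETTINGS[model]
        commands.append(
            f"bash scripts/run_real_dataset_attack.sh {dataset} "
            f"{model} {model} {n} {nbr_cores} {nbr_gpus} {devices}"
        )
    return commands


if __name__ == "__main__":
    print("\n".join(table1_commands()))
```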

The table results are aggregated in the following notebook: notebooks/results/real_dataset_evaluation.ipynb.

Table 2 (and Table S1)

To reproduce the results of Table 2 (and Table S1), run the following command:

bash scripts/run_large_scale_aia_attack.sh fifa19_v2 NBR_CORES

We set the NBR_CORES variable to 40 in our experiments. You can modify it according to your resources.

The table results are aggregated in the following notebook: notebooks/results/aia_results.ipynb.

Figure 6

To reproduce the results of Figure 6, run the following commands:

bash scripts/run_correlation_extraction.sh fifa19_v2 NBR_CORES
bash scripts/run_correlation_extraction.sh communities_and_crime_v2 NBR_CORES
bash scripts/run_correlation_extraction.sh musk NBR_CORES

We set the NBR_CORES variable to 40 in our experiments. You can modify it according to your resources.

To generate Figure 6, run the corresponding cell in the following notebook: notebooks/results/correlation_extraction.ipynb.

Figure 7

The results of Figure 7 - left are already computed by the experiment used to generate the results for Figure 3 - S2.

To reproduce the results of Figure 7 - right, run the following command:

bash scripts/run_randomized_target_attack_same_seed.sh mlptorch mlptorch column 3 NBR_CORES NBR_GPUS CUDA_VISIBLE_DEVICES

We set the NBR_CORES variable to 40 in our experiments. You can modify it according to your resources. We set CUDA_VISIBLE_DEVICES to 0,1 to parallelize our code on 2 GPUs, and we set NBR_GPUS=2. If you only have 1 GPU, set CUDA_VISIBLE_DEVICES to 0 and set NBR_GPUS to 1.

To generate Figure 7, run the corresponding cell in the following notebook: notebooks/results/figures_synthetic_evaluation.ipynb.

Figure S1

To reproduce the results of Figure S1, run the following command:

bash scripts/run_randomized_target_attack_model_less_only.sh NBR_CORES

We set the NBR_CORES variable to 40 in our experiments. You can modify it according to your resources.

To generate Figure S1, run the corresponding cell in the following notebook: notebooks/results/model_less_attack_analysis.ipynb.

Figure S2

To reproduce the results of Figure S2, run the following command:

bash scripts/run_randomized_target_attack_balanced.sh logreg logreg column 5 NBR_CORES 1 0

We set the NBR_CORES variable to 40 in our experiments. You can modify it according to your resources.

To generate Figure S2, run the corresponding cell in the following notebook: notebooks/results/figures_synthetic_evaluation.ipynb.

Figures S3 and S4

The results of Figures S3 and S4 are already computed as a result of the experiment used to generate results for Figure 2. To generate the figures, run the corresponding cell in the following notebook: notebooks/results/figures_synthetic_evaluation.ipynb.

Figure S5

To reproduce the results of Figure S5, run the following command:

bash scripts/run_mitigations.sh mlptorch mlptorch NBR_CORES NBR_GPUS CUDA_VISIBLE_DEVICES

We set the NBR_CORES variable to 40 in our experiments. You can modify it according to your resources. This experiment uses multilayer perceptrons (mlptorch) which are trained on the GPU. We set CUDA_VISIBLE_DEVICES to 0,1 to parallelize our code on 2 GPUs, and we set NBR_GPUS=2. If you only have 1 GPU, set CUDA_VISIBLE_DEVICES to 0 and set NBR_GPUS to 1.

To generate Figure S5, run the corresponding cell in the following notebook: notebooks/results/figures_synthetic_evaluation.ipynb.

Figure S6

To reproduce the results of Figure S6, run the following command:

bash scripts/run_granularity_marginals.sh NBR_CORES

We set the NBR_CORES variable to 40 in our experiments. You can modify it according to your resources.

To generate Figure S6, run the corresponding cell in the following notebook: notebooks/results/real_dataset_evaluation.ipynb.

Figures S7, S8 and S9

The results of Figures S7, S8 and S9 are already computed as a result of the experiment used to generate results for Figure 3 - S2. To generate the figures, run the corresponding cells in the following notebook: notebooks/results/figures_synthetic_evaluation.ipynb.

How to cite

If you re-use this code, please cite our paper as:

@article{crectu2024correlation,
  title={Correlation inference attacks against machine learning models},
  author={Cre{\c{t}}u, Ana-Maria and Gu{\'e}pin, Florent and de Montjoye, Yves-Alexandre},
  journal={Science Advances},
  volume={10},
  number={28},
  pages={eadj9260},
  year={2024},
  publisher={American Association for the Advancement of Science}
}
