Commit c711dc7

npapernot authored and benoitsteiner committed
added private learning with multiple teachers (tensorflow#331)
1 parent 54a48a1 commit c711dc7

8 files changed: +1636 −0 lines changed


privacy/README.md

+79
@@ -0,0 +1,79 @@
# Learning private models with multiple teachers

This repository contains code to create a setup for learning privacy-preserving
student models by transferring knowledge from an ensemble of teachers trained
on disjoint subsets of the data for which privacy guarantees are to be provided.

Knowledge acquired by teachers is transferred to the student in a differentially
private manner by noisily aggregating the teacher decisions before feeding them
to the student during training.
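As a rough illustration of this aggregation step, here is a minimal NumPy
sketch (the actual implementation is the function `noisy_max` in
`aggregation.py`, included below):

```
import numpy as np

def noisy_aggregate(teacher_votes, lap_scale):
  # teacher_votes: vector of per-class vote counts from the ensemble
  noisy_counts = teacher_votes + np.random.laplace(
      loc=0.0, scale=lap_scale, size=teacher_votes.shape)
  # The label is the class with the largest noisy vote count
  return np.argmax(noisy_counts)
```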
A paper describing the approach is in preparation. A link will be added to this
README when available.

## Dependencies

This model uses `TensorFlow` to perform numerical computations associated with
machine learning models, as well as common Python libraries such as `numpy`,
`scipy`, and `six`. Instructions to install these can be found in their
respective documentation.
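For example, assuming a `pip`-based setup (versions unpinned; consult each
project's documentation for authoritative instructions):

```
pip install tensorflow numpy scipy six
```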
## How to run

This repository supports the MNIST, CIFAR10, and SVHN datasets. The following
instructions are given for MNIST but can easily be adapted by replacing the
flag `--dataset=mnist` with `--dataset=cifar10` or `--dataset=svhn`.

Training is a two-step process: first we train an ensemble of teacher models,
and second we train a student using predictions made by this ensemble. Data is
downloaded automatically when you start the teacher training.

**Training the teachers:** first run the `train_teachers.py` script with at
least three flags specifying (1) the number of teachers, (2) the ID of the
teacher you are training among these teachers, and (3) the dataset on which to
train. For instance, to train teacher number 10 among an ensemble of 100
teachers for MNIST, use the following command:

```
python train_teachers.py --nb_teachers=100 --teacher_id=10 --dataset=mnist
```
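Each run trains a single teacher, so the command must be repeated for every
teacher ID in the ensemble. A minimal shell sketch (assuming a POSIX shell and
sequential training):

```
for id in $(seq 0 99); do
  python train_teachers.py --nb_teachers=100 --teacher_id=$id --dataset=mnist
done
```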
The flags `train_dir` and `data_dir` can optionally be set to point to the
directories where model checkpoints and temporary data (like the dataset)
should be saved, respectively. The flag `max_steps` (default: 3000) controls
the length of training. See `train_teachers.py` and `deep_cnn.py` for the
available flags and their descriptions.
**Training the student:** once the teachers are all trained, e.g., teachers
with IDs `0` to `99` are trained for `nb_teachers=100`, we are ready to train
the student. The student is trained by labeling some of the test data with
predictions from the teachers. The predictions are aggregated by counting the
votes assigned to each class among the ensemble of teachers, adding Laplacian
noise to these votes, and assigning the label with the maximum noisy vote count
to the sample. This is detailed in the function `noisy_max` in the file
`aggregation.py`. To train the student, use the following command:

```
python train_student.py --nb_teachers=100 --dataset=mnist --stdnt_share=5000
```
The flag `--stdnt_share=5000` indicates that the student should be able to
use the first `5000` samples of the dataset's test subset as unlabeled
training points (they will be labeled using the teacher predictions). The
remaining samples are used to evaluate the student's accuracy, which
is displayed upon completion of training.
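To make the split concrete, here is an illustrative Python sketch (the
variable names are hypothetical, not taken from the code):

```
stdnt_share = 5000
student_data = test_data[:stdnt_share]   # labeled by the noisy teacher ensemble
eval_data = test_data[stdnt_share:]      # held out to evaluate the student
eval_labels = test_labels[stdnt_share:]
```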
## Alternative deeper convolutional architecture

Note that a deeper convolutional model is available. Both the default and
deeper model graphs are defined in `deep_cnn.py`, by the functions
`inference` and `inference_deeper` respectively. Use the flag `--deeper=true`
to switch to the deeper model when launching `train_teachers.py` and
`train_student.py`.
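For instance, to train teacher number 10 with the deeper architecture, reuse
the teacher command from above with the extra flag:

```
python train_teachers.py --nb_teachers=100 --teacher_id=10 --dataset=mnist --deeper=true
```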
## Contact

To ask questions, please email `[email protected]` or open an issue on
the `tensorflow/models` issue tracker. Please assign issues to
[@npapernot](https://github.com/npapernot).

privacy/aggregation.py

+131
@@ -0,0 +1,131 @@
# Copyright 2016 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================


from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import numpy as np
from six.moves import xrange  # provides xrange under Python 3

def labels_from_probs(probs):
  """
  Helper function: computes argmax along last dimension of array to obtain
  labels (max prob or max logit value)
  :param probs: numpy array where probabilities or logits are on last dimension
  :return: array with same shape as input except for the last dimension, which
           is reduced away and replaced by the integer labels
  """
  # Compute last axis index
  last_axis = len(np.shape(probs)) - 1

  # Label is argmax over last dimension
  labels = np.argmax(probs, axis=last_axis)

  # Return as np.int32
  return np.asarray(labels, dtype=np.int32)

def noisy_max(logits, lap_scale, return_clean_votes=False):
  """
  This aggregation mechanism takes the softmax/logit output of several models
  resulting from inference on identical inputs and computes the noisy-max of
  the votes for candidate classes to select a label for each sample: it
  adds Laplacian noise to label counts and returns the most frequent label.
  :param logits: logits or probabilities for each sample
  :param lap_scale: scale of the Laplacian noise to be added to counts
  :param return_clean_votes: if set to True, also returns clean votes (without
                             Laplacian noise). This can be used to perform the
                             privacy analysis of this aggregation mechanism.
  :return: the noisy-max labels and, if return_clean_votes is set to True, the
           clean counts for each class per sample and the original labels
           produced by the teachers.
  """

  # Compute labels from logits/probs and reshape array properly:
  # labels has shape (nb_teachers, nb_samples)
  labels = labels_from_probs(logits)
  labels_shape = np.shape(labels)
  labels = labels.reshape((labels_shape[0], labels_shape[1]))

  # Initialize array to hold final labels
  result = np.zeros(int(labels_shape[1]))

  if return_clean_votes:
    # Initialize array to hold clean votes for each sample
    # (the number of classes is hardcoded to 10: MNIST, CIFAR10, and SVHN)
    clean_votes = np.zeros((int(labels_shape[1]), 10))

  # Parse each sample
  for i in xrange(int(labels_shape[1])):
    # Count number of votes assigned to each class
    label_counts = np.bincount(labels[:, i], minlength=10)

    if return_clean_votes:
      # Store vote counts for export
      clean_votes[i] = label_counts

    # Cast to float32 before the addition of Laplacian noise
    label_counts = np.asarray(label_counts, dtype=np.float32)

    # Sample independent Laplacian noise for each class
    for item in xrange(10):
      label_counts[item] += np.random.laplace(loc=0.0, scale=float(lap_scale))

    # Result is the most frequent label
    result[i] = np.argmax(label_counts)

  # Cast labels to np.int32 for compatibility with deep_cnn.py feed dictionaries
  result = np.asarray(result, dtype=np.int32)

  if return_clean_votes:
    # Return several arrays, which are later saved:
    # result: labels obtained from the noisy aggregation
    # clean_votes: the number of teacher votes assigned to each sample and class
    # labels: the labels assigned by teachers (before the noisy aggregation)
    return result, clean_votes, labels
  else:
    # Only return labels resulting from noisy aggregation
    return result

def aggregation_most_frequent(logits):
  """
  This aggregation mechanism takes the softmax/logit output of several models
  resulting from inference on identical inputs and computes the most frequent
  label. It is deterministic (no noise injection, unlike noisy_max() above).
  :param logits: logits or probabilities for each sample
  :return: most frequent label for each sample
  """
  # Compute labels from logits/probs and reshape array properly
  labels = labels_from_probs(logits)
  labels_shape = np.shape(labels)
  labels = labels.reshape((labels_shape[0], labels_shape[1]))

  # Initialize array to hold final labels
  result = np.zeros(int(labels_shape[1]))

  # Parse each sample
  for i in xrange(int(labels_shape[1])):
    # Count number of votes assigned to each class
    label_counts = np.bincount(labels[:, i], minlength=10)
    label_counts = np.asarray(label_counts, dtype=np.int32)

    # Result is the most frequent label
    result[i] = np.argmax(label_counts)

  return np.asarray(result, dtype=np.int32)
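As a usage sketch, `noisy_max` expects a 3D array whose first axis indexes
teachers, second axis indexes samples, and last axis indexes classes. The
following example uses synthetic data and an arbitrary `lap_scale` value, so
it is purely illustrative:

```
import numpy as np
from aggregation import noisy_max

# Synthetic softmax outputs: 100 teachers, 500 samples, 10 classes
logits = np.random.rand(100, 500, 10)

# Noisy labels for the 500 samples, plus clean vote counts and per-teacher
# labels, which can be used for the privacy analysis
labels, clean_votes, teacher_labels = noisy_max(
    logits, lap_scale=10.0, return_clean_votes=True)
```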
