Experiments have shown that language models trained on email datasets can memorize sensitive information from the training data and can therefore reveal a specific user’s data. To mitigate this problem, many data scientists turn to federated learning. Google AI researchers decided to apply Differential Privacy to ImageNet classification to demonstrate the effectiveness of that approach.
ML-based applications are on the rise, and the volume of data needed to train their algorithms has raised many concerns about the protection of personal data. Federated learning offers better guarantees that this data remains confidential.
Federated Learning and Differential Privacy
In 2017, Google introduced Federated Learning, an approach to Deep Learning distributed across end-user devices that lets the data to be analyzed be processed directly on Android devices. A Deep Learning model does not need to keep the data in order to operate: the data is only used to adjust the weights of the connections between artificial neurons during the learning phase. Federated learning thus allows mobile phones to learn collaboratively while decoupling machine learning capability from the need to store data in the cloud.
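As a rough illustration of the idea above, the sketch below uses federated averaging, a common aggregation scheme that this article does not name, on a toy linear model. The function names, the learning rate, and the three simulated devices are all hypothetical; the point is only that each device trains on its own data and sends back weights, never raw examples.

```python
import random

def local_update(weights, data, lr=0.1, epochs=5):
    """Run a few SGD epochs for the model y = w0 + w1 * x on one device's data."""
    w0, w1 = weights
    for _ in range(epochs):
        for x, y in data:
            err = (w0 + w1 * x) - y   # prediction error on one example
            w0 -= lr * err            # gradient of 0.5 * err**2 w.r.t. w0
            w1 -= lr * err * x        # gradient of 0.5 * err**2 w.r.t. w1
    return [w0, w1]

def federated_average(client_weights, client_sizes):
    """Average client weights, weighting each client by its number of examples."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]

def federated_round(global_weights, clients):
    """One round: every client trains locally and only its weights are aggregated."""
    updates = [local_update(list(global_weights), data) for data in clients]
    sizes = [len(data) for data in clients]
    return federated_average(updates, sizes)

# Toy setup (assumption): three "devices", each holding local samples of y = 1 + 2x.
rng = random.Random(0)
clients = [
    [(x, 1.0 + 2.0 * x) for x in (rng.uniform(-1, 1) for _ in range(20))]
    for _ in range(3)
]

weights = [0.0, 0.0]
for _ in range(20):
    weights = federated_round(weights, clients)
```

In this toy setting the server only ever sees the two weights returned by each device. A production system would add secure aggregation, client sampling, and communication handling, none of which is modeled here.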
Differential Privacy (DP) prevents models from memorizing the data of specific individuals and only allows them to learn statistical patterns. When a federated learning model is trained with DP, sensitive data is protected because inferring the training data from the model, or reconstructing the original datasets, becomes almost impossible. In the DP framework, the privacy guarantees of a system are generally characterized by a positive parameter ε, called the privacy loss, with a smaller ε corresponding to better privacy.
A model can be trained with DP guarantees using DP-SGD (stochastic gradient descent with differential privacy), a specialized training algorithm. However, DP-SGD is rarely used because it usually has two major drawbacks: slow and inefficient implementations, and a negative impact on utility (such as model accuracy). As a result, most DP research papers evaluate DP algorithms on very small datasets and do not even attempt an evaluation on larger ones, such as ImageNet. Google researchers have published the first results of their research on DP in a paper titled “Toward Training at ImageNet Scale with Differential Privacy”.
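The core of DP-SGD is commonly described as two changes to ordinary SGD: each example's gradient is clipped to a maximum L2 norm, and Gaussian noise is added to the sum before the update. The following is a minimal sketch of one such step in plain Python, not the researchers' JAX code; `clip_norm`, `noise_multiplier`, and the function names are illustrative assumptions, and no privacy accounting (the computation of ε) is included.

```python
import math
import random

def clip_gradient(grad, clip_norm):
    """Scale one example's gradient so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]

def dp_sgd_step(weights, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One DP-SGD update: clip per example, sum, add Gaussian noise, average."""
    batch_size = len(per_example_grads)
    dim = len(weights)
    clipped = [clip_gradient(g, clip_norm) for g in per_example_grads]
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    # Noise standard deviation per coordinate is noise_multiplier * clip_norm.
    sigma = noise_multiplier * clip_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [w - lr * n / batch_size for w, n in zip(weights, noisy)]

rng = random.Random(0)
weights = [0.0, 0.0]
grads = [[3.0, 4.0], [0.3, 0.4]]   # L2 norms 5.0 and 0.5
new_weights = dp_sgd_step(weights, grads, lr=0.1, clip_norm=1.0,
                          noise_multiplier=1.1, rng=rng)
```

Here the first gradient is scaled down to norm 1.0 while the second is left unchanged, so no single example can move the weights by more than a bounded amount. Translating `noise_multiplier` into a concrete ε requires a privacy accountant, which this sketch deliberately leaves out.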
Differential Privacy Test on ImageNet
The researchers chose ImageNet classification to demonstrate the practicality and effectiveness of DP because research on differential privacy in this area is scarce and immature, and because it gives other researchers the opportunity to collectively improve the utility of real-world DP training. Classification on ImageNet is a challenge for DP because it requires large networks with many parameters. This results in a significant amount of noise being added to the computation, since the noise added scales with the size of the model.
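One way to see why larger models are harder: if independent Gaussian noise with standard deviation σ is added to each of d parameters, the L2 norm of the noise vector is roughly σ√d, whereas a clipped gradient's norm stays bounded by the clipping threshold regardless of d. That square-root relationship is a standard property of Gaussian noise rather than a figure from the article, and the dimensions below are arbitrary. A small numerical check, under those assumptions:

```python
import math
import random

def noise_norm(dim, sigma, rng):
    """L2 norm of a dim-dimensional vector of independent N(0, sigma^2) samples."""
    return math.sqrt(sum(rng.gauss(0.0, sigma) ** 2 for _ in range(dim)))

rng = random.Random(0)
sigma = 1.0
# Measured noise norm for three hypothetical model sizes.
norms = {dim: noise_norm(dim, sigma, rng) for dim in (100, 10_000, 250_000)}
# Theoretical approximation sigma * sqrt(dim) for comparison.
approx = {dim: sigma * math.sqrt(dim) for dim in norms}
```

With the signal bounded by the clipping norm and the noise growing with the parameter count, the signal-to-noise ratio of each update drops as the network gets larger.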
Scaling differential privacy with JAX
To save time, the researchers used JAX, an XLA-based high-performance computational library that can perform efficient automatic vectorization and just-in-time compilation of mathematical computations, and that had already been recommended for speeding up DP-SGD on smaller datasets such as CIFAR-10.
Their implementation of DP-SGD in JAX was benchmarked on the large ImageNet dataset. Although relatively simple, it delivered notable performance gains over other DP-SGD implementations, such as TensorFlow Privacy, simply by using the XLA compiler. It is generally even faster than the custom-built and optimized PyTorch Opacus.
Each step of the DP-SGD implementation requires approximately two forward-backward passes through the network. This is slower than non-private training, which requires only one pass, but according to Google it is the most efficient known approach for training with the per-example gradients needed for DP-SGD.
The researchers deemed DP-SGD on JAX fast enough to run large experiments simply by slightly reducing the number of training runs used to find optimal hyperparameters compared to non-private training. This is a clear improvement over TensorFlow Privacy, which is 5 to 10 times slower on CIFAR-10 and MNIST. The graph below shows the training run times for two models on ImageNet with DP-SGD versus non-private SGD, each on JAX.
Transfer learning from public data
Pre-training on public data followed by DP fine-tuning on private data has been shown to improve accuracy on other benchmarks, but the difficulty lies in finding which public data to use for a given task in order to optimize transfer learning. The Google researchers simulated a private/public data split by using ImageNet as the “private” data and Places365, another image classification dataset, as a proxy for the “public” data. They pre-trained their models on Places365 before fine-tuning them with DP-SGD on ImageNet.
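The two-phase recipe described above can be sketched on a toy problem: ordinary SGD on a "public" dataset, followed by full-batch DP-SGD on a "private" one. Everything in this sketch is an assumption made for illustration (a one-parameter model y = w·x, synthetic data, arbitrary hyperparameters); it shows the structure of the pipeline only and does not reproduce the researchers' setup or results.

```python
import random

def sgd_pretrain(w, data, lr=0.1, epochs=20):
    """Ordinary (non-private) SGD on public data for the model y = w * x."""
    for _ in range(epochs):
        for x, y in data:
            w -= lr * (w * x - y) * x
    return w

def dp_finetune(w, data, lr, clip, noise_multiplier, steps, rng):
    """Full-batch DP-SGD: clip each example's gradient, add noise, average."""
    for _ in range(steps):
        total = 0.0
        for x, y in data:
            g = (w * x - y) * x
            total += max(-clip, min(clip, g))   # clipping a scalar gradient
        total += rng.gauss(0.0, noise_multiplier * clip)
        w -= lr * total / len(data)
    return w

rng = random.Random(0)
xs = [0.1 * i for i in range(1, 11)]
public = [(x, 1.8 * x) for x in xs]    # stand-in for a related public task
private = [(x, 2.0 * x) for x in xs]   # stand-in for the private task

w_pretrained = sgd_pretrain(0.0, public)
w_private = dp_finetune(w_pretrained, private, lr=0.5, clip=1.0,
                        noise_multiplier=1.0, steps=50, rng=rng)
```

Only the second phase touches the private data, so only that phase needs the clipping and noise. Because every DP-SGD step consumes privacy budget, starting from pre-trained weights that are already close to a good solution can reduce how much noisy training is needed.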
Places365 only contains images of landscapes and buildings, not animals as ImageNet does, which makes it a good candidate for demonstrating the model’s ability to transfer to a different but related domain. Transfer learning from Places365 yielded 47.5% accuracy on ImageNet with a reasonable level of privacy (ε = 10). This is low compared to the 70% accuracy of a similar non-private model, but it is a good result compared to naive DP training on ImageNet, which yields either very low accuracy (2–5%) or no privacy (ε = 10¹⁰).
The Google researchers hope that these initial results and the source code will encourage other researchers to work on improving DP, and recommend that they start with a baseline that incorporates full-batch training and transfer learning.
Sources of the article: “Toward Training at ImageNet Scale with Differential Privacy”