TensorFlow on GPU
as defined in the image landerlini/lhcbaf:v0p8
This notebook is part of a pipeline. It requires the preprocessing step defined in the GAN preprocessing notebook, and the trained model is validated in the Covariance-validation notebook.
As for the other trainings, we handle the GPU through TensorFlow. To make sure the GPU is found, we print below the system name of the accelerator.
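A minimal check producing the output below might look as follows; the exact call used in the notebook may differ.

```python
import tensorflow as tf

# Print the system name of the accelerator (empty string if no GPU is found)
print("GPU:", tf.test.gpu_device_name())
```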
GPU: /device:GPU:0
The data are loaded with our custom FeatherReader helper class, defined in the local module feather_io.
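The actual interface is defined in feather_io; as an illustration only, a minimal reader with a similar purpose could look like the sketch below (the class layout and method names are assumptions, not the real API).

```python
import glob
import pandas as pd

# Hypothetical stand-in for the FeatherReader helper: iterate over the
# feather chunks produced by the preprocessing step
class FeatherReader:
    def __init__(self, pattern):
        self.files = sorted(glob.glob(pattern))

    def __iter__(self):
        for fname in self.files:
            yield pd.read_feather(fname)
```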
A chunk of data is loaded to ease the construction of the model, for example to define the shapes of the input and output tensors.
TensorShape([352612, 15])
High correlation among the input features may result in large distances between the distributions of the training and generated datasets, with minimal overlap. Under these conditions, the discriminator would be unable to drive the training of the generator towards a successful end (unless more advanced loss functions are used).
Since we adopted a rather sophisticated preprocessing step, it is worth verifying that it behaves as expected, removing strong correlations between variables.
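As a quick sanity check, one can inspect the off-diagonal entries of the correlation matrix of a preprocessed chunk; `chunk` below is a hypothetical name for the array loaded above.

```python
import numpy as np

# Correlation matrix of the 15 preprocessed features in a data chunk
corr = np.corrcoef(np.asarray(chunk), rowvar=False)

# Largest off-diagonal correlation: should be small if the preprocessing
# removed the strong linear correlations between variables
off_diag = corr[~np.eye(corr.shape[0], dtype=bool)]
print("max |corr| off-diagonal:", np.abs(off_diag).max())
```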
The GANs used in this module are composed of three different neural networks trained simultaneously, namely:

- the generator, mapping the input conditions and a random noise vector to the generated target features;
- the discriminator, trained to separate reference from generated samples and used to drive the generator training;
- the referee, an independent classifier used to monitor the training.
For a discussion on the techniques used to describe the GAN, please refer to the resolution notebook.
The generator architecture is very similar to the one adopted for the resolution GAN.
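The structure can be read off the summary printed below: the 15 input conditions are concatenated with a 128-dimensional noise vector and fed through a stack of eight residual blocks. The sketch below reproduces that structure; layer sizes match the summary, while the hidden activation and variable names are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

x_in = layers.Input(shape=(15,))    # input conditions (gen.-level features)
z_in = layers.Input(shape=(128,))   # random noise vector
h = layers.Concatenate()([x_in, z_in])        # (None, 143)
h = layers.Dense(128, activation="relu")(h)   # activation is an assumption
for _ in range(8):
    # skip connection: appears as tf.__operators__.add in the summary
    h = h + layers.Dense(128, activation="relu")(h)
y_out = layers.Dense(15)(h)         # generated target features
generator = tf.keras.Model([x_in, z_in], y_out)
```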
Model: "model" __________________________________________________________________________________________________ Layer (type) Output Shape Param # Connected to ================================================================================================== input_1 (InputLayer) [(None, 15)] 0 __________________________________________________________________________________________________ input_2 (InputLayer) [(None, 128)] 0 __________________________________________________________________________________________________ concatenate (Concatenate) (None, 143) 0 input_1[0][0] input_2[0][0] __________________________________________________________________________________________________ dense (Dense) (None, 128) 18432 concatenate[0][0] __________________________________________________________________________________________________ dense_1 (Dense) (None, 128) 16512 dense[0][0] __________________________________________________________________________________________________ tf.__operators__.add (TFOpLambd (None, 128) 0 dense[0][0] dense_1[0][0] __________________________________________________________________________________________________ dense_2 (Dense) (None, 128) 16512 tf.__operators__.add[0][0] __________________________________________________________________________________________________ tf.__operators__.add_1 (TFOpLam (None, 128) 0 tf.__operators__.add[0][0] dense_2[0][0] __________________________________________________________________________________________________ dense_3 (Dense) (None, 128) 16512 tf.__operators__.add_1[0][0] __________________________________________________________________________________________________ tf.__operators__.add_2 (TFOpLam (None, 128) 0 tf.__operators__.add_1[0][0] dense_3[0][0] __________________________________________________________________________________________________ dense_4 (Dense) (None, 128) 16512 tf.__operators__.add_2[0][0] __________________________________________________________________________________________________ tf.__operators__.add_3 (TFOpLam (None, 128) 0 tf.__operators__.add_2[0][0] dense_4[0][0] __________________________________________________________________________________________________ dense_5 (Dense) (None, 128) 16512 tf.__operators__.add_3[0][0] __________________________________________________________________________________________________ tf.__operators__.add_4 (TFOpLam (None, 128) 0 tf.__operators__.add_3[0][0] dense_5[0][0] __________________________________________________________________________________________________ dense_6 (Dense) (None, 128) 16512 tf.__operators__.add_4[0][0] __________________________________________________________________________________________________ tf.__operators__.add_5 (TFOpLam (None, 128) 0 tf.__operators__.add_4[0][0] dense_6[0][0] __________________________________________________________________________________________________ dense_7 (Dense) (None, 128) 16512 tf.__operators__.add_5[0][0] __________________________________________________________________________________________________ tf.__operators__.add_6 (TFOpLam (None, 128) 0 tf.__operators__.add_5[0][0] dense_7[0][0] __________________________________________________________________________________________________ dense_8 (Dense) (None, 128) 16512 tf.__operators__.add_6[0][0] __________________________________________________________________________________________________ tf.__operators__.add_7 (TFOpLam (None, 128) 0 tf.__operators__.add_6[0][0] dense_8[0][0] 
__________________________________________________________________________________________________ dense_9 (Dense) (None, 15) 1935 tf.__operators__.add_7[0][0] ================================================================================================== Total params: 152,463 Trainable params: 152,463 Non-trainable params: 0 __________________________________________________________________________________________________
Please note that we observe better performance using shallower neural networks and removing the skip connections for the discriminator. It is not clear why, but it is possible that the classification problem differs so much from a logistic regression (for example, because of the large number of flags) that propagating the input to the output is not beneficial and limits the network's ability to perform the classification.
As for the resolution GAN, the discriminator input tensor is arranged as follows.
| $X$ | $Y$ | Label |
|---|---|---|
| Input conditions (gen.-level features) | Reference target features | 1 |
| Input conditions (gen.-level features) | Generated target features | 0 |
The input conditions are repeated twice: in the first half of the batch they are completed with the output features of the reference samples and labeled as $1$; in the second half of the batch they are completed with randomly generated features and labeled as $0$.
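A minimal sketch of this batch arrangement is shown below; the tensor names mirror those in the summary that follows, while the function itself is illustrative.

```python
import tensorflow as tf

def make_discriminator_batch(x_ref, y_ref, y_gen):
    x = tf.concat([x_ref, x_ref], axis=0)   # conditions repeated twice
    y = tf.concat([y_ref, y_gen], axis=0)   # reference first, generated second
    labels = tf.concat([
        tf.ones((tf.shape(y_ref)[0], 1)),   # first half labeled 1
        tf.zeros((tf.shape(y_gen)[0], 1)),  # second half labeled 0
    ], axis=0)
    return tf.concat([x, y], axis=1), labels
```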
Model: "model_1" __________________________________________________________________________________________________ Layer (type) Output Shape Param # Connected to ================================================================================================== X_ref (InputLayer) [(None, 15)] 0 __________________________________________________________________________________________________ Y_ref (InputLayer) [(None, 15)] 0 __________________________________________________________________________________________________ Y_gen (InputLayer) [(None, 15)] 0 __________________________________________________________________________________________________ X (Concatenate) (None, 15) 0 X_ref[0][0] X_ref[0][0] __________________________________________________________________________________________________ Y (Concatenate) (None, 15) 0 Y_ref[0][0] Y_gen[0][0] __________________________________________________________________________________________________ XY (Concatenate) (None, 30) 0 X[0][0] Y[0][0] __________________________________________________________________________________________________ dense_10 (Dense) (None, 128) 3968 XY[0][0] __________________________________________________________________________________________________ dense_11 (Dense) (None, 128) 16512 dense_10[0][0] __________________________________________________________________________________________________ dense_12 (Dense) (None, 128) 16512 dense_11[0][0] __________________________________________________________________________________________________ dense_13 (Dense) (None, 128) 16512 dense_12[0][0] __________________________________________________________________________________________________ dense_14 (Dense) (None, 128) 16512 dense_13[0][0] __________________________________________________________________________________________________ dense_15 (Dense) (None, 128) 16512 dense_14[0][0] __________________________________________________________________________________________________ dense_16 (Dense) (None, 128) 16512 dense_15[0][0] __________________________________________________________________________________________________ dense_17 (Dense) (None, 1) 129 dense_16[0][0] ================================================================================================== Total params: 103,169 Trainable params: 103,169 Non-trainable params: 0 __________________________________________________________________________________________________
The referee is kept as similar as possible to the discriminator, but trained with a larger learning rate.
Model: "model_2" __________________________________________________________________________________________________ Layer (type) Output Shape Param # Connected to ================================================================================================== input_3 (InputLayer) [(None, 15)] 0 __________________________________________________________________________________________________ input_4 (InputLayer) [(None, 15)] 0 __________________________________________________________________________________________________ input_5 (InputLayer) [(None, 15)] 0 __________________________________________________________________________________________________ concatenate_2 (Concatenate) (None, 15) 0 input_3[0][0] input_3[0][0] __________________________________________________________________________________________________ concatenate_1 (Concatenate) (None, 15) 0 input_4[0][0] input_5[0][0] __________________________________________________________________________________________________ concatenate_3 (Concatenate) (None, 30) 0 concatenate_2[0][0] concatenate_1[0][0] __________________________________________________________________________________________________ dense_18 (Dense) (None, 128) 3968 concatenate_3[0][0] __________________________________________________________________________________________________ dense_19 (Dense) (None, 128) 16512 dense_18[0][0] __________________________________________________________________________________________________ dense_20 (Dense) (None, 128) 16512 dense_19[0][0] __________________________________________________________________________________________________ dense_21 (Dense) (None, 128) 16512 dense_20[0][0] __________________________________________________________________________________________________ dense_22 (Dense) (None, 128) 16512 dense_21[0][0] __________________________________________________________________________________________________ dense_23 (Dense) (None, 128) 16512 dense_22[0][0] __________________________________________________________________________________________________ dense_24 (Dense) (None, 128) 16512 dense_23[0][0] __________________________________________________________________________________________________ dense_25 (Dense) (None, 1) 129 dense_24[0][0] ================================================================================================== Total params: 103,169 Trainable params: 103,169 Non-trainable params: 0 __________________________________________________________________________________________________
The training step is defined with the lower-level TensorFlow API because we need to carefully tune which weights we wish to update based on each evaluation of a loss function.
Technically, we use the TensorFlow GradientTape to keep track of the gradients while describing the computation of the loss functions. We use a different tape for each neural network, recording the derivatives of its loss function with respect to the weights of that particular network.
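A minimal sketch of such a training step is given below, assuming the `generator`, `discriminator` and `referee` models defined above. The optimizer types, learning rates and the generator loss are assumptions (the text only specifies that the referee uses a larger learning rate, and the BCE options discussed next).

```python
import tensorflow as tf

gen_opt = tf.keras.optimizers.Adam(1e-4)
disc_opt = tf.keras.optimizers.Adam(1e-4)
ref_opt = tf.keras.optimizers.Adam(1e-3)   # referee: larger learning rate
bce = tf.keras.losses.BinaryCrossentropy(from_logits=True, label_smoothing=0.1)

@tf.function
def train_step(x_ref, y_ref):
    n = tf.shape(x_ref)[0]
    noise = tf.random.normal((n, 128))
    labels = tf.concat([tf.ones((n, 1)), tf.zeros((n, 1))], axis=0)

    # One tape per network: each records the operations needed to
    # differentiate that network's loss w.r.t. its own weights
    with tf.GradientTape() as gen_tape, \
         tf.GradientTape() as disc_tape, \
         tf.GradientTape() as ref_tape:
        y_gen = generator([x_ref, noise])
        disc_loss = bce(labels, discriminator([x_ref, y_ref, y_gen]))
        ref_loss = bce(labels, referee([x_ref, y_ref, y_gen]))
        gen_loss = -disc_loss   # one common choice: fool the discriminator

    disc_opt.apply_gradients(zip(
        disc_tape.gradient(disc_loss, discriminator.trainable_weights),
        discriminator.trainable_weights))
    ref_opt.apply_gradients(zip(
        ref_tape.gradient(ref_loss, referee.trainable_weights),
        referee.trainable_weights))
    gen_opt.apply_gradients(zip(
        gen_tape.gradient(gen_loss, generator.trainable_weights),
        generator.trainable_weights))
    return disc_loss, ref_loss
```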
The loss function for the classification task is clearly a Binary Cross Entropy (BCE). However, we adopt two non-default options for its computation:

- `from_logits=True`, to improve the numerical stability of the gradient computation, which is of particular importance for GANs because the very long training procedure may inflate the errors accumulated over many subsequent iterations;
- `label_smoothing=0.1`, to introduce a penalty against overconfident classifications, which correspond to the plateaux of the sigmoid function, where the gradient is null and provides no useful information for the generator's training.

The evolution of the loss function as evaluated by the referee network is reported below. A dashed line represents the ideal value of the BCE when evaluated on two identical datasets with an ideal classifier (which outputs $0.5$ for every sample, giving $\ln 2 \approx 0.693$).
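The reference value is easy to verify numerically: with logits of zero (probability $0.5$) the smoothed BCE still evaluates to $\ln 2$.

```python
import numpy as np
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True, label_smoothing=0.1)

# An ideal classifier on indistinguishable datasets outputs p = 0.5
# (logit 0) for both labels, giving BCE = ln 2 ~ 0.6931
labels = np.array([[1.0], [0.0]])
logits = np.array([[0.0], [0.0]])
print(float(bce(labels, logits)))
```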
In the following plot we represent the correlation between the output features as they are in the reference dataset and as they are reproduced by the GAN.
The original dataset is represented in blue, while the generated dataset is shown in orange.
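As an illustration of how such a comparison can be drawn, the hypothetical sketch below overlays one pair of output features from the two datasets; `y_ref` and `y_gen` are assumed to be arrays of shape `(n, 15)`.

```python
import matplotlib.pyplot as plt

i, j = 0, 1   # pair of output features to compare
plt.scatter(y_ref[:, i], y_ref[:, j], s=2, color="tab:blue", label="Reference")
plt.scatter(y_gen[:, i], y_gen[:, j], s=2, color="tab:orange", label="Generated")
plt.xlabel(f"feature {i}")
plt.ylabel(f"feature {j}")
plt.legend()
plt.show()
```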
The model is exported to the same directory where the preprocessing steps tX and tY were stored.
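A hypothetical export call consistent with the log below; the `generator` handle and the directory variable are assumptions based on this notebook.

```python
import os

model_dir = "/workarea/local/private/cache/models"
generator.save(os.path.join(model_dir, "covariance"))
```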
```
WARNING:tensorflow:Compiled the loaded model, but the compiled metrics have yet to be built. `model.compile_metrics` will be empty until you train or evaluate the model.
INFO:tensorflow:Assets written to: /workarea/local/private/cache/models/covariance/assets
```
In this notebook we discussed the training procedure of the GAN model used to parametrize the covariance. The model is very similar to the one adopted for the resolution, with some differences in the architecture of the discriminator and referee networks.
In particular, we discussed the arrangement of the discriminator input, the non-default options adopted for the binary cross-entropy loss, and the custom training step based on GradientTape.
Finally, we exported the model for deployment and further validation.