TensorFlow on GPU
as defined in the image landerlini/lhcbaf:v0p8
This notebook is part of a pipeline. It requires the preprocessing step defined in the GAN preprocessing notebook, and the trained model is validated in the Resolution-validation notebook.
As for the other trainings, we are handling the GPU with TensorFlow. To make sure the GPU is found, we print below the system name of the accelerator.
GPU: /device:GPU:0
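The check above can be reproduced with the public TensorFlow API; this is a minimal sketch, not necessarily the notebook's exact cell:

```python
import tensorflow as tf

# Print the system name of the first visible accelerator, if any.
# tf.test.gpu_device_name() returns '' when no GPU is visible.
device_name = tf.test.gpu_device_name()
print("GPU:", device_name if device_name else "not found")
```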
The data are loaded with our custom FeatherReader helper class, defined in the local module feather_io.
In this notebook, we are using:
A chunk of data is loaded to ease the construction of the model, for example defining the shapes of the input and output tensors.
TensorShape([352220, 9])
The GANs used in this module are composed of three different neural networks trained simultaneously, namely:
Generator and discriminator have been part of the GAN architecture since GANs were proposed in 2014, with the underlying idea that the generator is trained to worsen the classification ability of the discriminator. Generator and discriminator are players of a game in which they try to drive the classification loss function towards higher and lower values, respectively. The desired outcome of the game is an equilibrium between the two players (a Nash equilibrium) that forces both players to improve their abilities along the training and results in a generator providing samples with distributions almost indistinguishable from those of the reference sample.
Unfortunately, using the loss function of the discriminator as the adversarial objective of two neural networks makes it harder to interpret it as a metric for the goodness of the trained model. A high value of the loss may be obtained both with an excellent generator evaluated by an excellent discriminator, and with an awful generator evaluated by an awful discriminator. To evaluate the progress of the training and assess the overall quality of the generator, we introduced a third neural network to the game that does not contribute to the training of the generator, but is a spectator providing an independent assessment of the agreement between the generated and the reference sample. We call this the referee network. Since it does not provide feedback to the generator, larger jumps in its parameter space do not cause the training procedure of the generator to derail because of confusing information; therefore, its learning rate can be larger. A large learning rate is also useful to provide timely updates on the status of the network, quickly identifying the discrepancies between the generated and the reference sample.
The referee network was found to provide useful information when comparing the loss on the training and validation datasets. Indeed, we observed the ability of the discriminator to learn "by heart" some of the outliers in the reference sample, resulting in a classification that does not generalize to the validation dataset.
As for other tasks, we observed that L2 regularization is useful to counteract overtraining. In addition, L2 regularization is expected to discourage sharp boundaries in the classification function that may lead to explosions of the gradient when training the generator.
As discussed for the acceptance models, we adopted very deep neural networks with skip connections to achieve decent results with limited effort in hyperparameter optimization. This choice might be reviewed with additional effort on tuning.
The generator takes the input conditions and the random noise as inputs and provides the target features as an output. In formula, $$ g: \mathbb R^{N_I} \otimes \mathbb R^{N_R} \to \mathbb R^{N_O} $$ where $N_I$ and $N_O$ represent the number of input and output features, respectively, while $N_R$ is the dimension of the random noise.
The activation function of the last layer is linear to avoid any constraint on the generated features.
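A sketch of such a generator in the Keras functional API, consistent with the summary below (hidden width 128, ten additive skip connections, linear output). The hidden activation is not shown in the summary; ReLU is assumed here:

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_generator(n_cond=12, n_noise=128, n_out=9, width=128, n_skips=10):
    """Residual dense generator: conditions + noise in, target features out."""
    conditions = tf.keras.Input(shape=(n_cond,))
    noise = tf.keras.Input(shape=(n_noise,))
    x = layers.Concatenate()([conditions, noise])
    x = layers.Dense(width, activation="relu")(x)
    for _ in range(n_skips):
        # additive skip connection around each hidden layer
        x = x + layers.Dense(width, activation="relu")(x)
    # linear activation on the last layer: no constraint on the generated features
    output = layers.Dense(n_out)(x)
    return tf.keras.Model([conditions, noise], output)
```

With these defaults the parameter count matches the summary below (184,329 trainable parameters).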
Model: "model" __________________________________________________________________________________________________ Layer (type) Output Shape Param # Connected to ================================================================================================== input_1 (InputLayer) [(None, 12)] 0 __________________________________________________________________________________________________ input_2 (InputLayer) [(None, 128)] 0 __________________________________________________________________________________________________ concatenate (Concatenate) (None, 140) 0 input_1[0][0] input_2[0][0] __________________________________________________________________________________________________ dense (Dense) (None, 128) 18048 concatenate[0][0] __________________________________________________________________________________________________ dense_1 (Dense) (None, 128) 16512 dense[0][0] __________________________________________________________________________________________________ tf.__operators__.add (TFOpLambd (None, 128) 0 dense[0][0] dense_1[0][0] __________________________________________________________________________________________________ dense_2 (Dense) (None, 128) 16512 tf.__operators__.add[0][0] __________________________________________________________________________________________________ tf.__operators__.add_1 (TFOpLam (None, 128) 0 tf.__operators__.add[0][0] dense_2[0][0] __________________________________________________________________________________________________ dense_3 (Dense) (None, 128) 16512 tf.__operators__.add_1[0][0] __________________________________________________________________________________________________ tf.__operators__.add_2 (TFOpLam (None, 128) 0 tf.__operators__.add_1[0][0] dense_3[0][0] __________________________________________________________________________________________________ dense_4 (Dense) (None, 128) 16512 tf.__operators__.add_2[0][0] 
__________________________________________________________________________________________________ tf.__operators__.add_3 (TFOpLam (None, 128) 0 tf.__operators__.add_2[0][0] dense_4[0][0] __________________________________________________________________________________________________ dense_5 (Dense) (None, 128) 16512 tf.__operators__.add_3[0][0] __________________________________________________________________________________________________ tf.__operators__.add_4 (TFOpLam (None, 128) 0 tf.__operators__.add_3[0][0] dense_5[0][0] __________________________________________________________________________________________________ dense_6 (Dense) (None, 128) 16512 tf.__operators__.add_4[0][0] __________________________________________________________________________________________________ tf.__operators__.add_5 (TFOpLam (None, 128) 0 tf.__operators__.add_4[0][0] dense_6[0][0] __________________________________________________________________________________________________ dense_7 (Dense) (None, 128) 16512 tf.__operators__.add_5[0][0] __________________________________________________________________________________________________ tf.__operators__.add_6 (TFOpLam (None, 128) 0 tf.__operators__.add_5[0][0] dense_7[0][0] __________________________________________________________________________________________________ dense_8 (Dense) (None, 128) 16512 tf.__operators__.add_6[0][0] __________________________________________________________________________________________________ tf.__operators__.add_7 (TFOpLam (None, 128) 0 tf.__operators__.add_6[0][0] dense_8[0][0] __________________________________________________________________________________________________ dense_9 (Dense) (None, 128) 16512 tf.__operators__.add_7[0][0] __________________________________________________________________________________________________ tf.__operators__.add_8 (TFOpLam (None, 128) 0 tf.__operators__.add_7[0][0] dense_9[0][0] 
__________________________________________________________________________________________________ dense_10 (Dense) (None, 128) 16512 tf.__operators__.add_8[0][0] __________________________________________________________________________________________________ tf.__operators__.add_9 (TFOpLam (None, 128) 0 tf.__operators__.add_8[0][0] dense_10[0][0] __________________________________________________________________________________________________ dense_11 (Dense) (None, 9) 1161 tf.__operators__.add_9[0][0] ================================================================================================== Total params: 184,329 Trainable params: 184,329 Non-trainable params: 0 __________________________________________________________________________________________________
The discriminator takes as input the conditions and the target features (either generated or from the reference sample) and provides the probability for the sample to be part of the reference sample.
In formula, $$ d^{(th)}: \mathbb R^{N_I} \otimes \mathbb R^{N_O} \to [0, 1] \subset \mathbb R. $$
Usually, to map the response of a neural network into the interval $[0, 1]$, the sigmoid function is used. At implementation level, we decided to move the evaluation of the sigmoid from the activation of the last layer to the computation of the loss function. This is believed to improve the numerical stability of the computation, by avoiding the explicit evaluation of the logarithm of a sigmoid that saturates for large inputs.
In practice, our implemented discriminator will be described by $$ d^{(impl)}: \mathbb R^{N_I} \otimes \mathbb R^{N_O} \to \mathbb R. $$
In terms of implementation, the tensor we pass as an input to the neural network for each batch is composed as depicted in the following table.
| $X$ | | $y$ |
|---|---|---|
| Input conditions (gen. level features) | Reference target features | 1 |
| Input conditions (gen. level features) | Generated target features | 0 |
The input conditions are repeated twice: in the first half of the batch they are completed with the output features of the reference sample and labeled as $1$; in the second half of the batch they are completed with randomly generated features and labeled with $0$.
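The construction of such a batch can be sketched in plain NumPy (shapes taken from the model summaries: 12 conditions, 9 target features; random arrays stand in for the actual data and generator output):

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_cond, n_out = 256, 12, 9
cond = rng.normal(size=(n, n_cond))  # gen. level conditions
ref = rng.normal(size=(n, n_out))    # reference target features
gen = rng.normal(size=(n, n_out))    # generated target features (stand-in)

# First half: conditions + reference features, labeled 1;
# second half: the same conditions + generated features, labeled 0.
X = np.concatenate([np.concatenate([cond, ref], axis=1),
                    np.concatenate([cond, gen], axis=1)], axis=0)
y = np.concatenate([np.ones(n), np.zeros(n)])
```

The resulting `X` has 21 columns, matching the concatenated input of the discriminator summary below.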
Model: "model_1" __________________________________________________________________________________________________ Layer (type) Output Shape Param # Connected to ================================================================================================== input_3 (InputLayer) [(None, 12)] 0 __________________________________________________________________________________________________ input_4 (InputLayer) [(None, 9)] 0 __________________________________________________________________________________________________ input_5 (InputLayer) [(None, 9)] 0 __________________________________________________________________________________________________ concatenate_2 (Concatenate) (None, 12) 0 input_3[0][0] input_3[0][0] __________________________________________________________________________________________________ concatenate_1 (Concatenate) (None, 9) 0 input_4[0][0] input_5[0][0] __________________________________________________________________________________________________ concatenate_3 (Concatenate) (None, 21) 0 concatenate_2[0][0] concatenate_1[0][0] __________________________________________________________________________________________________ dense_12 (Dense) (None, 128) 2816 concatenate_3[0][0] __________________________________________________________________________________________________ dense_13 (Dense) (None, 128) 16512 dense_12[0][0] __________________________________________________________________________________________________ tf.__operators__.add_10 (TFOpLa (None, 128) 0 dense_12[0][0] dense_13[0][0] __________________________________________________________________________________________________ dense_14 (Dense) (None, 128) 16512 tf.__operators__.add_10[0][0] __________________________________________________________________________________________________ tf.__operators__.add_11 (TFOpLa (None, 128) 0 tf.__operators__.add_10[0][0] dense_14[0][0] 
__________________________________________________________________________________________________ dense_15 (Dense) (None, 128) 16512 tf.__operators__.add_11[0][0] __________________________________________________________________________________________________ tf.__operators__.add_12 (TFOpLa (None, 128) 0 tf.__operators__.add_11[0][0] dense_15[0][0] __________________________________________________________________________________________________ dense_16 (Dense) (None, 128) 16512 tf.__operators__.add_12[0][0] __________________________________________________________________________________________________ tf.__operators__.add_13 (TFOpLa (None, 128) 0 tf.__operators__.add_12[0][0] dense_16[0][0] __________________________________________________________________________________________________ dense_17 (Dense) (None, 128) 16512 tf.__operators__.add_13[0][0] __________________________________________________________________________________________________ tf.__operators__.add_14 (TFOpLa (None, 128) 0 tf.__operators__.add_13[0][0] dense_17[0][0] __________________________________________________________________________________________________ dense_18 (Dense) (None, 128) 16512 tf.__operators__.add_14[0][0] __________________________________________________________________________________________________ tf.__operators__.add_15 (TFOpLa (None, 128) 0 tf.__operators__.add_14[0][0] dense_18[0][0] __________________________________________________________________________________________________ dense_19 (Dense) (None, 128) 16512 tf.__operators__.add_15[0][0] __________________________________________________________________________________________________ tf.__operators__.add_16 (TFOpLa (None, 128) 0 tf.__operators__.add_15[0][0] dense_19[0][0] __________________________________________________________________________________________________ dense_20 (Dense) (None, 128) 16512 tf.__operators__.add_16[0][0] 
__________________________________________________________________________________________________ tf.__operators__.add_17 (TFOpLa (None, 128) 0 tf.__operators__.add_16[0][0] dense_20[0][0] __________________________________________________________________________________________________ dense_21 (Dense) (None, 128) 16512 tf.__operators__.add_17[0][0] __________________________________________________________________________________________________ tf.__operators__.add_18 (TFOpLa (None, 128) 0 tf.__operators__.add_17[0][0] dense_21[0][0] __________________________________________________________________________________________________ dense_22 (Dense) (None, 128) 16512 tf.__operators__.add_18[0][0] __________________________________________________________________________________________________ tf.__operators__.add_19 (TFOpLa (None, 128) 0 tf.__operators__.add_18[0][0] dense_22[0][0] __________________________________________________________________________________________________ dense_23 (Dense) (None, 1) 129 tf.__operators__.add_19[0][0] ================================================================================================== Total params: 168,065 Trainable params: 168,065 Non-trainable params: 0 __________________________________________________________________________________________________
The referee network mimics the discriminator network, therefore $$ r: \mathbb R^{N_I} \otimes \mathbb R^{N_O} \to \mathbb R. $$ The input tensor is built in the same way as for the discriminator.
Model: "model_2" __________________________________________________________________________________________________ Layer (type) Output Shape Param # Connected to ================================================================================================== input_6 (InputLayer) [(None, 12)] 0 __________________________________________________________________________________________________ input_7 (InputLayer) [(None, 9)] 0 __________________________________________________________________________________________________ input_8 (InputLayer) [(None, 9)] 0 __________________________________________________________________________________________________ concatenate_5 (Concatenate) (None, 12) 0 input_6[0][0] input_6[0][0] __________________________________________________________________________________________________ concatenate_4 (Concatenate) (None, 9) 0 input_7[0][0] input_8[0][0] __________________________________________________________________________________________________ concatenate_6 (Concatenate) (None, 21) 0 concatenate_5[0][0] concatenate_4[0][0] __________________________________________________________________________________________________ dense_24 (Dense) (None, 128) 2816 concatenate_6[0][0] __________________________________________________________________________________________________ dense_25 (Dense) (None, 128) 16512 dense_24[0][0] __________________________________________________________________________________________________ tf.__operators__.add_20 (TFOpLa (None, 128) 0 dense_24[0][0] dense_25[0][0] __________________________________________________________________________________________________ dense_26 (Dense) (None, 128) 16512 tf.__operators__.add_20[0][0] __________________________________________________________________________________________________ tf.__operators__.add_21 (TFOpLa (None, 128) 0 tf.__operators__.add_20[0][0] dense_26[0][0] 
__________________________________________________________________________________________________ dense_27 (Dense) (None, 128) 16512 tf.__operators__.add_21[0][0] __________________________________________________________________________________________________ tf.__operators__.add_22 (TFOpLa (None, 128) 0 tf.__operators__.add_21[0][0] dense_27[0][0] __________________________________________________________________________________________________ dense_28 (Dense) (None, 128) 16512 tf.__operators__.add_22[0][0] __________________________________________________________________________________________________ tf.__operators__.add_23 (TFOpLa (None, 128) 0 tf.__operators__.add_22[0][0] dense_28[0][0] __________________________________________________________________________________________________ dense_29 (Dense) (None, 128) 16512 tf.__operators__.add_23[0][0] __________________________________________________________________________________________________ tf.__operators__.add_24 (TFOpLa (None, 128) 0 tf.__operators__.add_23[0][0] dense_29[0][0] __________________________________________________________________________________________________ dense_30 (Dense) (None, 128) 16512 tf.__operators__.add_24[0][0] __________________________________________________________________________________________________ tf.__operators__.add_25 (TFOpLa (None, 128) 0 tf.__operators__.add_24[0][0] dense_30[0][0] __________________________________________________________________________________________________ dense_31 (Dense) (None, 128) 16512 tf.__operators__.add_25[0][0] __________________________________________________________________________________________________ tf.__operators__.add_26 (TFOpLa (None, 128) 0 tf.__operators__.add_25[0][0] dense_31[0][0] __________________________________________________________________________________________________ dense_32 (Dense) (None, 128) 16512 tf.__operators__.add_26[0][0] 
__________________________________________________________________________________________________ tf.__operators__.add_27 (TFOpLa (None, 128) 0 tf.__operators__.add_26[0][0] dense_32[0][0] __________________________________________________________________________________________________ dense_33 (Dense) (None, 128) 16512 tf.__operators__.add_27[0][0] __________________________________________________________________________________________________ tf.__operators__.add_28 (TFOpLa (None, 128) 0 tf.__operators__.add_27[0][0] dense_33[0][0] __________________________________________________________________________________________________ dense_34 (Dense) (None, 128) 16512 tf.__operators__.add_28[0][0] __________________________________________________________________________________________________ tf.__operators__.add_29 (TFOpLa (None, 128) 0 tf.__operators__.add_28[0][0] dense_34[0][0] __________________________________________________________________________________________________ dense_35 (Dense) (None, 1) 129 tf.__operators__.add_29[0][0] ================================================================================================== Total params: 168,065 Trainable params: 168,065 Non-trainable params: 0 __________________________________________________________________________________________________
The training step is defined with the lower-level TensorFlow API because we need to carefully tune which weights we wish to update based on each evaluation of a loss function.
Technically, we use the TensorFlow GradientTape to keep track of the gradients while we describe the computation of the loss functions. We use a different tape for each neural network, recording the derivatives of its loss function with respect to the weights of that particular network.
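A minimal sketch of such a three-tape training step is shown below. The model and optimizer names are placeholders, and the discriminator is simplified to a single two-input call, whereas the notebook's actual discriminator receives the reference and generated features as separate inputs:

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

@tf.function
def train_step(generator, discriminator, referee,
               opt_g, opt_d, opt_r, cond, x_ref, noise_dim=128):
    noise = tf.random.normal((tf.shape(cond)[0], noise_dim))
    with tf.GradientTape() as tape_g, tf.GradientTape() as tape_d, \
         tf.GradientTape() as tape_r:
        x_gen = generator([cond, noise])
        d_ref = discriminator([cond, x_ref])
        d_gen = discriminator([cond, x_gen])
        # discriminator: reference -> 1, generated -> 0; the generator
        # pushes the same classification loss in the opposite direction
        d_loss = bce(tf.ones_like(d_ref), d_ref) + bce(tf.zeros_like(d_gen), d_gen)
        g_loss = bce(tf.ones_like(d_gen), d_gen)
        # the referee sees the same batch, but its loss never reaches
        # the generator (spectator role)
        r_ref = referee([cond, x_ref])
        r_gen = referee([cond, tf.stop_gradient(x_gen)])
        r_loss = bce(tf.ones_like(r_ref), r_ref) + bce(tf.zeros_like(r_gen), r_gen)
    # each tape only updates the weights of its own network
    opt_g.apply_gradients(zip(tape_g.gradient(g_loss, generator.trainable_weights),
                              generator.trainable_weights))
    opt_d.apply_gradients(zip(tape_d.gradient(d_loss, discriminator.trainable_weights),
                              discriminator.trainable_weights))
    opt_r.apply_gradients(zip(tape_r.gradient(r_loss, referee.trainable_weights),
                              referee.trainable_weights))
    return g_loss, d_loss, r_loss
```

Keeping a separate tape per network makes explicit which loss updates which weights, which is the point of dropping to the lower-level API here.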
The loss function for the classification task is clearly a Binary Cross Entropy (BCE). However, we adopt two non-default options for its computation:

- `from_logits=True`, to improve the numerical stability of the gradient computation, which is of particular importance for GANs because of the very long training procedure, where many subsequent iterations may inflate the errors;
- `label_smoothing=0.1`, to introduce a penalty against overconfident classification, which corresponds to the plateaux of the sigmoid function, where the gradient is null, providing no useful information for the generator's training.

The evolution of the loss function as evaluated by the referee network is reported below. A dashed line represents the ideal value of the BCE when evaluated on two identical datasets with an ideal classifier.
The model is exported to the same directory where the preprocessing steps `tX` and `tY` were stored.
WARNING:tensorflow:Compiled the loaded model, but the compiled metrics have yet to be built. `model.compile_metrics` will be empty until you train or evaluate the model. INFO:tensorflow:Assets written to: /workarea/local/private/cache/models/resolution/assets
In this notebook we discussed the training procedure of the GAN model used to parametrize the resolution.
In particular, we discussed the architecture of the three networks involved (generator, discriminator and referee) and the definition of the adversarial training step.
Finally, we exported the model for deployment and further validation.