Training Resolution model¶

Tested with the TensorFlow on GPU kernel as defined in the image landerlini/lhcbaf:v0p8¶

This notebook is part of a pipeline. It requires the preprocessing step defined in the GAN preprocessing notebook and the trained model is validated in the Resolution-validation notebook.

Environment and libraries¶

As for the other training steps, we handle the GPU with TensorFlow. To make sure the GPU is found, we print below the system name of the accelerator.

GPU: /device:GPU:0

Loading the data¶

The data are loaded with our custom FeatherReader helper class, defined in the local module feather_io.

In this notebook, we are using:

  • training data: to optimize the weights of the network
  • validation data: to evaluate overtraining effects

A chunk of data is loaded to ease the construction of the model, for example to define the shapes of the input and output tensors.

TensorShape([352220, 9])

Definition of the model¶

The GANs used in this module are composed of three different neural networks trained simultaneously, namely:

  • a generator neural network that takes as input the conditions (such as the generator-level features) and some random noise, and formulates predictions for the output features
  • a discriminator neural network trained to identify whether a sample was part of the reference dataset or was produced by the generator
  • a referee network that mimics the configuration of the discriminator, but is trained with a much larger learning rate.

The generator and the discriminator have been part of the GAN architecture since GANs were proposed in 2014, with the underlying idea that the generator is trained to worsen the classification ability of the discriminator. Generator and discriminator are players of a game in which they try to drive the classification loss function towards higher and lower values, respectively. The outcome we wish to reach is an equilibrium between the two players (a Nash equilibrium) that forces both of them to improve their abilities along the training and results in a generator providing samples with distributions almost indistinguishable from those of the reference sample.

Unfortunately, using the loss function of the discriminator as the adversarial objective of two neural networks makes it harder to interpret it as a metric for the goodness of the trained model. A high value of the loss may be obtained both with an excellent generator evaluated by an excellent discriminator, and with an awful generator evaluated by an awful discriminator. To evaluate the progress of the training and assess the overall quality of the generator, we introduced a third neural network to the game that does not contribute to the training of the generator, but acts as a spectator providing an independent assessment of the agreement between the generated and the reference samples. We call this the referee network. Since it does not provide feedback to the generator, larger jumps in parameter space cannot derail the training procedure of the generator with confusing information, and therefore its learning rate can be larger. A large learning rate is also useful to provide timely updates on the status of the network, quickly identifying discrepancies between the generated and the reference samples.

The referee network was found to provide useful information when comparing the loss on the training and validation datasets. Indeed, we observed that the discriminator can learn "by heart" some of the outliers in the reference sample, resulting in a classification that does not generalize to the validation dataset.

As for other tasks, we observed that L2 regularization helps to counteract overtraining. In addition, L2 regularization is expected to smooth out sharp boundaries in the classification function that may otherwise lead to exploding gradients when training the generator.

As discussed for the acceptance models, we adopted very deep neural networks with skip connections to achieve decent results with limited effort in hyperparameter optimization. This choice might be reviewed with additional effort on tuning.

Generator architecture¶

The generator takes the input conditions and the random noise as inputs and provides the target features as an output. In formula, $$ g: \mathbb R^{N_I} \otimes \mathbb R^{N_R} \to \mathbb R^{N_O}, $$ where $N_I$ and $N_O$ represent the number of input and output features, respectively, while $N_R$ is the dimension of the random noise.

The activation function of the last layer is linear to avoid any constraint on the generated features.
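This architecture can be sketched with the Keras functional API. The dimensions below are taken from the model summary that follows; the hidden activation is an assumption, since the summary does not show it:

```python
import tensorflow as tf
from tensorflow.keras import layers

N_I, N_R, N_O = 12, 128, 9   # condition, noise and output dims (from the summary)

conditions = tf.keras.Input(shape=(N_I,))
noise = tf.keras.Input(shape=(N_R,))

x = layers.Concatenate()([conditions, noise])
x = layers.Dense(128, activation="relu")(x)    # hidden activation assumed
for _ in range(10):                            # 10 residual blocks
    h = layers.Dense(128, activation="relu")(x)
    x = x + h                                  # skip connection (the TFOpLambda add layers)
outputs = layers.Dense(N_O)(x)                 # linear output: no constraint on the features

generator = tf.keras.Model([conditions, noise], outputs)
```

With these dimensions the sketch reproduces the parameter count of the summary below (184,329 trainable parameters).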

Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            [(None, 12)]         0                                            
__________________________________________________________________________________________________
input_2 (InputLayer)            [(None, 128)]        0                                            
__________________________________________________________________________________________________
concatenate (Concatenate)       (None, 140)          0           input_1[0][0]                    
                                                                 input_2[0][0]                    
__________________________________________________________________________________________________
dense (Dense)                   (None, 128)          18048       concatenate[0][0]                
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 128)          16512       dense[0][0]                      
__________________________________________________________________________________________________
tf.__operators__.add (TFOpLambd (None, 128)          0           dense[0][0]                      
                                                                 dense_1[0][0]                    
__________________________________________________________________________________________________
dense_2 (Dense)                 (None, 128)          16512       tf.__operators__.add[0][0]       
__________________________________________________________________________________________________
tf.__operators__.add_1 (TFOpLam (None, 128)          0           tf.__operators__.add[0][0]       
                                                                 dense_2[0][0]                    
__________________________________________________________________________________________________
dense_3 (Dense)                 (None, 128)          16512       tf.__operators__.add_1[0][0]     
__________________________________________________________________________________________________
tf.__operators__.add_2 (TFOpLam (None, 128)          0           tf.__operators__.add_1[0][0]     
                                                                 dense_3[0][0]                    
__________________________________________________________________________________________________
dense_4 (Dense)                 (None, 128)          16512       tf.__operators__.add_2[0][0]     
__________________________________________________________________________________________________
tf.__operators__.add_3 (TFOpLam (None, 128)          0           tf.__operators__.add_2[0][0]     
                                                                 dense_4[0][0]                    
__________________________________________________________________________________________________
dense_5 (Dense)                 (None, 128)          16512       tf.__operators__.add_3[0][0]     
__________________________________________________________________________________________________
tf.__operators__.add_4 (TFOpLam (None, 128)          0           tf.__operators__.add_3[0][0]     
                                                                 dense_5[0][0]                    
__________________________________________________________________________________________________
dense_6 (Dense)                 (None, 128)          16512       tf.__operators__.add_4[0][0]     
__________________________________________________________________________________________________
tf.__operators__.add_5 (TFOpLam (None, 128)          0           tf.__operators__.add_4[0][0]     
                                                                 dense_6[0][0]                    
__________________________________________________________________________________________________
dense_7 (Dense)                 (None, 128)          16512       tf.__operators__.add_5[0][0]     
__________________________________________________________________________________________________
tf.__operators__.add_6 (TFOpLam (None, 128)          0           tf.__operators__.add_5[0][0]     
                                                                 dense_7[0][0]                    
__________________________________________________________________________________________________
dense_8 (Dense)                 (None, 128)          16512       tf.__operators__.add_6[0][0]     
__________________________________________________________________________________________________
tf.__operators__.add_7 (TFOpLam (None, 128)          0           tf.__operators__.add_6[0][0]     
                                                                 dense_8[0][0]                    
__________________________________________________________________________________________________
dense_9 (Dense)                 (None, 128)          16512       tf.__operators__.add_7[0][0]     
__________________________________________________________________________________________________
tf.__operators__.add_8 (TFOpLam (None, 128)          0           tf.__operators__.add_7[0][0]     
                                                                 dense_9[0][0]                    
__________________________________________________________________________________________________
dense_10 (Dense)                (None, 128)          16512       tf.__operators__.add_8[0][0]     
__________________________________________________________________________________________________
tf.__operators__.add_9 (TFOpLam (None, 128)          0           tf.__operators__.add_8[0][0]     
                                                                 dense_10[0][0]                   
__________________________________________________________________________________________________
dense_11 (Dense)                (None, 9)            1161        tf.__operators__.add_9[0][0]     
==================================================================================================
Total params: 184,329
Trainable params: 184,329
Non-trainable params: 0
__________________________________________________________________________________________________

Discriminator¶

The discriminator takes as input the conditions and the target features (either generated or from the reference sample) and provides the probability that the sample belongs to the reference sample.

In formula, $$ d^{(th)}: \mathbb R^{N_I} \otimes \mathbb R^{N_O} \to [0, 1] \subset \mathbb R. $$

Usually, to map the response of a neural network into the interval $[0, 1]$, the sigmoid function is used. At the implementation level, we decided to move the evaluation of the sigmoid from the activation of the last layer into the computation of the loss function. This is believed to improve the numerical stability of the computation, because the logarithm in the cross entropy and the exponential in the sigmoid can be combined analytically instead of being evaluated in sequence.

In practice, our implemented discriminator will be described by $$ d^{(impl)}: \mathbb R^{N_I} \otimes \mathbb R^{N_O} \to \mathbb R. $$

In terms of implementation, the tensor passed as an input to the neural network for each batch is composed as depicted in the following table.

| $X$                                                                | $y$ |
| ------------------------------------------------------------------ | --- |
| Input conditions (gen. level features) + Reference target features | 1   |
| Input conditions (gen. level features) + Generated target features | 0   |

The input conditions are repeated twice: in the first half of the batch they are completed with the output features of the reference samples and labeled as $1$; in the second half of the batch they are completed with randomly generated features and labeled as $0$.
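This batch construction can be sketched in NumPy; the function and argument names below are illustrative, not taken from the notebook:

```python
import numpy as np

def build_discriminator_batch(conditions, reference, generated):
    """Build the discriminator input X and labels y: the conditions are
    repeated for both halves of the batch, completed with reference
    features (label 1) in the first half and with generated features
    (label 0) in the second half."""
    X = np.concatenate([
        np.concatenate([conditions, reference], axis=1),
        np.concatenate([conditions, generated], axis=1),
    ], axis=0)
    y = np.concatenate([np.ones(len(reference)), np.zeros(len(generated))])
    return X, y
```

With 12 condition features and 9 target features, each row of `X` has the 21 columns seen in the discriminator summary below.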

Model: "model_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_3 (InputLayer)            [(None, 12)]         0                                            
__________________________________________________________________________________________________
input_4 (InputLayer)            [(None, 9)]          0                                            
__________________________________________________________________________________________________
input_5 (InputLayer)            [(None, 9)]          0                                            
__________________________________________________________________________________________________
concatenate_2 (Concatenate)     (None, 12)           0           input_3[0][0]                    
                                                                 input_3[0][0]                    
__________________________________________________________________________________________________
concatenate_1 (Concatenate)     (None, 9)            0           input_4[0][0]                    
                                                                 input_5[0][0]                    
__________________________________________________________________________________________________
concatenate_3 (Concatenate)     (None, 21)           0           concatenate_2[0][0]              
                                                                 concatenate_1[0][0]              
__________________________________________________________________________________________________
dense_12 (Dense)                (None, 128)          2816        concatenate_3[0][0]              
__________________________________________________________________________________________________
dense_13 (Dense)                (None, 128)          16512       dense_12[0][0]                   
__________________________________________________________________________________________________
tf.__operators__.add_10 (TFOpLa (None, 128)          0           dense_12[0][0]                   
                                                                 dense_13[0][0]                   
__________________________________________________________________________________________________
dense_14 (Dense)                (None, 128)          16512       tf.__operators__.add_10[0][0]    
__________________________________________________________________________________________________
tf.__operators__.add_11 (TFOpLa (None, 128)          0           tf.__operators__.add_10[0][0]    
                                                                 dense_14[0][0]                   
__________________________________________________________________________________________________
dense_15 (Dense)                (None, 128)          16512       tf.__operators__.add_11[0][0]    
__________________________________________________________________________________________________
tf.__operators__.add_12 (TFOpLa (None, 128)          0           tf.__operators__.add_11[0][0]    
                                                                 dense_15[0][0]                   
__________________________________________________________________________________________________
dense_16 (Dense)                (None, 128)          16512       tf.__operators__.add_12[0][0]    
__________________________________________________________________________________________________
tf.__operators__.add_13 (TFOpLa (None, 128)          0           tf.__operators__.add_12[0][0]    
                                                                 dense_16[0][0]                   
__________________________________________________________________________________________________
dense_17 (Dense)                (None, 128)          16512       tf.__operators__.add_13[0][0]    
__________________________________________________________________________________________________
tf.__operators__.add_14 (TFOpLa (None, 128)          0           tf.__operators__.add_13[0][0]    
                                                                 dense_17[0][0]                   
__________________________________________________________________________________________________
dense_18 (Dense)                (None, 128)          16512       tf.__operators__.add_14[0][0]    
__________________________________________________________________________________________________
tf.__operators__.add_15 (TFOpLa (None, 128)          0           tf.__operators__.add_14[0][0]    
                                                                 dense_18[0][0]                   
__________________________________________________________________________________________________
dense_19 (Dense)                (None, 128)          16512       tf.__operators__.add_15[0][0]    
__________________________________________________________________________________________________
tf.__operators__.add_16 (TFOpLa (None, 128)          0           tf.__operators__.add_15[0][0]    
                                                                 dense_19[0][0]                   
__________________________________________________________________________________________________
dense_20 (Dense)                (None, 128)          16512       tf.__operators__.add_16[0][0]    
__________________________________________________________________________________________________
tf.__operators__.add_17 (TFOpLa (None, 128)          0           tf.__operators__.add_16[0][0]    
                                                                 dense_20[0][0]                   
__________________________________________________________________________________________________
dense_21 (Dense)                (None, 128)          16512       tf.__operators__.add_17[0][0]    
__________________________________________________________________________________________________
tf.__operators__.add_18 (TFOpLa (None, 128)          0           tf.__operators__.add_17[0][0]    
                                                                 dense_21[0][0]                   
__________________________________________________________________________________________________
dense_22 (Dense)                (None, 128)          16512       tf.__operators__.add_18[0][0]    
__________________________________________________________________________________________________
tf.__operators__.add_19 (TFOpLa (None, 128)          0           tf.__operators__.add_18[0][0]    
                                                                 dense_22[0][0]                   
__________________________________________________________________________________________________
dense_23 (Dense)                (None, 1)            129         tf.__operators__.add_19[0][0]    
==================================================================================================
Total params: 168,065
Trainable params: 168,065
Non-trainable params: 0
__________________________________________________________________________________________________

Referee network¶

The referee network mimics the discriminator network; therefore, $$ r: \mathbb R^{N_I} \otimes \mathbb R^{N_O} \to \mathbb R. $$ The input tensor is built in the same way as for the discriminator.

Model: "model_2"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_6 (InputLayer)            [(None, 12)]         0                                            
__________________________________________________________________________________________________
input_7 (InputLayer)            [(None, 9)]          0                                            
__________________________________________________________________________________________________
input_8 (InputLayer)            [(None, 9)]          0                                            
__________________________________________________________________________________________________
concatenate_5 (Concatenate)     (None, 12)           0           input_6[0][0]                    
                                                                 input_6[0][0]                    
__________________________________________________________________________________________________
concatenate_4 (Concatenate)     (None, 9)            0           input_7[0][0]                    
                                                                 input_8[0][0]                    
__________________________________________________________________________________________________
concatenate_6 (Concatenate)     (None, 21)           0           concatenate_5[0][0]              
                                                                 concatenate_4[0][0]              
__________________________________________________________________________________________________
dense_24 (Dense)                (None, 128)          2816        concatenate_6[0][0]              
__________________________________________________________________________________________________
dense_25 (Dense)                (None, 128)          16512       dense_24[0][0]                   
__________________________________________________________________________________________________
tf.__operators__.add_20 (TFOpLa (None, 128)          0           dense_24[0][0]                   
                                                                 dense_25[0][0]                   
__________________________________________________________________________________________________
dense_26 (Dense)                (None, 128)          16512       tf.__operators__.add_20[0][0]    
__________________________________________________________________________________________________
tf.__operators__.add_21 (TFOpLa (None, 128)          0           tf.__operators__.add_20[0][0]    
                                                                 dense_26[0][0]                   
__________________________________________________________________________________________________
dense_27 (Dense)                (None, 128)          16512       tf.__operators__.add_21[0][0]    
__________________________________________________________________________________________________
tf.__operators__.add_22 (TFOpLa (None, 128)          0           tf.__operators__.add_21[0][0]    
                                                                 dense_27[0][0]                   
__________________________________________________________________________________________________
dense_28 (Dense)                (None, 128)          16512       tf.__operators__.add_22[0][0]    
__________________________________________________________________________________________________
tf.__operators__.add_23 (TFOpLa (None, 128)          0           tf.__operators__.add_22[0][0]    
                                                                 dense_28[0][0]                   
__________________________________________________________________________________________________
dense_29 (Dense)                (None, 128)          16512       tf.__operators__.add_23[0][0]    
__________________________________________________________________________________________________
tf.__operators__.add_24 (TFOpLa (None, 128)          0           tf.__operators__.add_23[0][0]    
                                                                 dense_29[0][0]                   
__________________________________________________________________________________________________
dense_30 (Dense)                (None, 128)          16512       tf.__operators__.add_24[0][0]    
__________________________________________________________________________________________________
tf.__operators__.add_25 (TFOpLa (None, 128)          0           tf.__operators__.add_24[0][0]    
                                                                 dense_30[0][0]                   
__________________________________________________________________________________________________
dense_31 (Dense)                (None, 128)          16512       tf.__operators__.add_25[0][0]    
__________________________________________________________________________________________________
tf.__operators__.add_26 (TFOpLa (None, 128)          0           tf.__operators__.add_25[0][0]    
                                                                 dense_31[0][0]                   
__________________________________________________________________________________________________
dense_32 (Dense)                (None, 128)          16512       tf.__operators__.add_26[0][0]    
__________________________________________________________________________________________________
tf.__operators__.add_27 (TFOpLa (None, 128)          0           tf.__operators__.add_26[0][0]    
                                                                 dense_32[0][0]                   
__________________________________________________________________________________________________
dense_33 (Dense)                (None, 128)          16512       tf.__operators__.add_27[0][0]    
__________________________________________________________________________________________________
tf.__operators__.add_28 (TFOpLa (None, 128)          0           tf.__operators__.add_27[0][0]    
                                                                 dense_33[0][0]                   
__________________________________________________________________________________________________
dense_34 (Dense)                (None, 128)          16512       tf.__operators__.add_28[0][0]    
__________________________________________________________________________________________________
tf.__operators__.add_29 (TFOpLa (None, 128)          0           tf.__operators__.add_28[0][0]    
                                                                 dense_34[0][0]                   
__________________________________________________________________________________________________
dense_35 (Dense)                (None, 1)            129         tf.__operators__.add_29[0][0]    
==================================================================================================
Total params: 168,065
Trainable params: 168,065
Non-trainable params: 0
__________________________________________________________________________________________________

Training step¶

The training step is defined with the lower-level TensorFlow API because we need to carefully tune which weights we wish to update based on each evaluation of a loss function.

Technically, we are using the TensorFlow GradientTape to keep track of the gradient while we describe the computation of the loss function. We will have a different tape for each neural network, recording the derivatives of the loss functions with respect to the weights of that particular network.
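A minimal sketch of such a training step is shown below. It uses a simplified two-input discriminator and omits the referee, whose update follows the same pattern as the discriminator's with its own tape and optimizer; all names and details are illustrative, not the notebook's actual code:

```python
import tensorflow as tf

# BCE with the two non-default options discussed in the next section
bce = tf.keras.losses.BinaryCrossentropy(from_logits=True, label_smoothing=0.1)

def train_step(generator, discriminator, g_opt, d_opt, x_cond, x_ref, noise):
    """One adversarial update: a separate GradientTape records each loss,
    and each optimizer updates only its own network's weights."""
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        x_gen = generator([x_cond, noise], training=True)
        logits_ref = discriminator([x_cond, x_ref], training=True)
        logits_gen = discriminator([x_cond, x_gen], training=True)
        # discriminator: reference labeled 1, generated labeled 0
        d_loss = bce(tf.ones_like(logits_ref), logits_ref) \
               + bce(tf.zeros_like(logits_gen), logits_gen)
        # generator: tries to make the discriminator label its samples as reference
        g_loss = bce(tf.ones_like(logits_gen), logits_gen)
    d_grads = d_tape.gradient(d_loss, discriminator.trainable_variables)
    g_grads = g_tape.gradient(g_loss, generator.trainable_variables)
    d_opt.apply_gradients(zip(d_grads, discriminator.trainable_variables))
    g_opt.apply_gradients(zip(g_grads, generator.trainable_variables))
    return d_loss, g_loss
```

Restricting each `gradient` call to one network's `trainable_variables` is what lets the two losses drive the same classification function in opposite directions without interfering with each other's weight updates.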

Notes on the chosen loss function¶

The loss function for the classification task is clearly a Binary Cross Entropy (BCE). However, we adopt two non-default options for its computation:

  • from_logits=True, to improve the numerical stability of the gradient computation, which is particularly important for GANs because the very long training procedure may inflate the errors accumulated over many subsequent iterations
  • label_smoothing=0.1, to introduce a penalty against overconfident classification, which corresponds to the plateaus of the sigmoid function where the gradient vanishes, providing no useful information for the generator's training.
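The effect of from_logits=True can be illustrated with a saturated logit (an illustrative example, not notebook code): applying the sigmoid separately and then the BCE loses precision once the sigmoid saturates in float32, while folding the sigmoid into the loss preserves the exact value.

```python
import tensorflow as tf

logits = tf.constant([[30.0], [-30.0]])   # strongly saturated logits
labels = tf.constant([[0.0], [1.0]])      # both predictions maximally wrong

# Stable: the sigmoid is folded into the loss computation
stable = tf.keras.losses.BinaryCrossentropy(from_logits=True)(labels, logits)

# Naive: apply the sigmoid first, then compute the BCE on probabilities;
# sigmoid(30.) rounds to exactly 1.0 in float32, so the loss gets clipped
probs = tf.sigmoid(logits)
naive = tf.keras.losses.BinaryCrossentropy(from_logits=False)(labels, probs)

print(float(stable), float(naive))   # roughly 30 vs the clipped -log(1e-7) ~ 16
```

The true per-event loss here is about 30 nats; the naive path underestimates it badly, and over a long GAN training such distortions of the gradient accumulate.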

Training¶

  • Batch size: 100k
  • Number of epochs: 1000
Training (validation) loss: 0.559 (0.603): 100%|████████████████████████████████████████████████████████████████████| 1000/1000 [53:10<00:00,  3.19s/it]

The evolution of the loss function as evaluated by the referee network is reported below. A dashed line represents the ideal value of the BCE when evaluated on two identical datasets with an ideal classifier.
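As a quick cross-check of that reference value: on two indistinguishable datasets the best possible classifier outputs $p = 1/2$ for every candidate, so for any label $y \in [0, 1]$ the per-event BCE reduces to $$ -y \log\tfrac{1}{2} - (1 - y) \log\tfrac{1}{2} = \log 2 \approx 0.693, $$ independently of the label smoothing, since the smoothed labels still sum to one.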

Exporting the model¶

The model is exported to the same directory where the preprocessing steps tX and tY were stored.

WARNING:tensorflow:Compiled the loaded model, but the compiled metrics have yet to be built. `model.compile_metrics` will be empty until you train or evaluate the model.
INFO:tensorflow:Assets written to: /workarea/local/private/cache/models/resolution/assets

Conclusion¶

In this notebook we discussed the training procedure of the GAN model used to parametrize the resolution.

In particular, we discussed

  • the overall structure of the DNN system;
  • the architecture of the generator, the discriminator, and the referee network we introduced to ease monitoring, debugging, and hyperparameter optimization;
  • the procedure for optimizing the weights of the three networks based on three different computations of the gradients;
  • the outcome of the training procedure as visualized by the evolution of the loss of the referee network.

Finally, we exported the model for deployment and further validation.