Training of the Acceptance model¶

Tested on environment TensorFlow on GPU from landerlini/lhcbaf:v0p8¶

In this notebook we set up the training for the acceptance model, based on a Deep Neural Network.

This notebook is part of a pipeline and requires the data preprocessed with the notebook Preprocessing.ipynb.

Environment setup and libraries¶

In this notebook we use the standard software stack for TensorFlow machine learning applications.

To verify that a GPU is found and properly loaded in the notebook kernel for TensorFlow, the output of the following block should be similar to '/device:GPU:0'. If no GPU is found, the string will be empty.

'/device:GPU:0'
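The device string above can be obtained with `tf.test.gpu_device_name()`, a standard TensorFlow utility; the notebook's actual cell is not shown, so this is a sketch of the check:

```python
import tensorflow as tf

# Name of the first GPU visible to TensorFlow, e.g. '/device:GPU:0';
# an empty string means no GPU was found.
device_name = tf.test.gpu_device_name()
print(repr(device_name))
```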

Load datasets¶

Preprocessed datasets were stored in the Apache Feather format and can be reloaded with our custom FeatherReader, designed to stream the datasets into TensorFlow or Dask format.

We are loading in particular:

  • the training dataset to train the network
  • the validation dataset to evaluate the network performance at each epoch and identify overtraining effects

We load a small batch of data to identify the number of conditional and target features and automate the definition of the neural network architecture.

TensorShape([1000000, 12])

Model definition¶

The function we are modelling is not a trivial one and requires a sufficiently deep neural network. To limit the effect of vanishing gradients and make the training procedure faster, we adopt residual layers, introducing skip connections between the input and the output of each layer.

With residual layers, the neural network is trained to learn the deviation from the identity, which is a much less demanding task than learning the whole transformation.

The output layer of the neural network has a sigmoid activation function, mapping the output into the interval [0, 1] so that it can be interpreted as the probability that the particle is in acceptance.
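A network matching the summary that follows can be sketched as below: a first dense layer projects the 12 input features to a width of 128, ten residual blocks each add a dense layer's output back to its own input, and a single sigmoid unit closes the network. The ReLU activation is an assumption, since the summary does not list activations:

```python
import tensorflow as tf

def build_acceptance_model(n_features=12, width=128, n_blocks=10):
    inputs = tf.keras.Input(shape=(n_features,))
    # Project the input features to the residual width.
    x = tf.keras.layers.Dense(width, activation="relu")(inputs)
    # Residual blocks: each dense layer learns a deviation from the identity.
    for _ in range(n_blocks):
        h = tf.keras.layers.Dense(width, activation="relu")(x)
        x = tf.keras.layers.Add()([x, h])
    # Sigmoid output: probability of being in acceptance.
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
    return tf.keras.Model(inputs, outputs)

model = build_acceptance_model()
print(model.count_params())  # 166913, as in the summary
```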

Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            [(None, 12)]         0                                            
__________________________________________________________________________________________________
dense (Dense)                   (None, 128)          1664        input_1[0][0]                    
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 128)          16512       dense[0][0]                      
__________________________________________________________________________________________________
add (Add)                       (None, 128)          0           dense[0][0]                      
                                                                 dense_1[0][0]                    
__________________________________________________________________________________________________
dense_2 (Dense)                 (None, 128)          16512       add[0][0]                        
__________________________________________________________________________________________________
add_1 (Add)                     (None, 128)          0           add[0][0]                        
                                                                 dense_2[0][0]                    
__________________________________________________________________________________________________
dense_3 (Dense)                 (None, 128)          16512       add_1[0][0]                      
__________________________________________________________________________________________________
add_2 (Add)                     (None, 128)          0           add_1[0][0]                      
                                                                 dense_3[0][0]                    
__________________________________________________________________________________________________
dense_4 (Dense)                 (None, 128)          16512       add_2[0][0]                      
__________________________________________________________________________________________________
add_3 (Add)                     (None, 128)          0           add_2[0][0]                      
                                                                 dense_4[0][0]                    
__________________________________________________________________________________________________
dense_5 (Dense)                 (None, 128)          16512       add_3[0][0]                      
__________________________________________________________________________________________________
add_4 (Add)                     (None, 128)          0           add_3[0][0]                      
                                                                 dense_5[0][0]                    
__________________________________________________________________________________________________
dense_6 (Dense)                 (None, 128)          16512       add_4[0][0]                      
__________________________________________________________________________________________________
add_5 (Add)                     (None, 128)          0           add_4[0][0]                      
                                                                 dense_6[0][0]                    
__________________________________________________________________________________________________
dense_7 (Dense)                 (None, 128)          16512       add_5[0][0]                      
__________________________________________________________________________________________________
add_6 (Add)                     (None, 128)          0           add_5[0][0]                      
                                                                 dense_7[0][0]                    
__________________________________________________________________________________________________
dense_8 (Dense)                 (None, 128)          16512       add_6[0][0]                      
__________________________________________________________________________________________________
add_7 (Add)                     (None, 128)          0           add_6[0][0]                      
                                                                 dense_8[0][0]                    
__________________________________________________________________________________________________
dense_9 (Dense)                 (None, 128)          16512       add_7[0][0]                      
__________________________________________________________________________________________________
add_8 (Add)                     (None, 128)          0           add_7[0][0]                      
                                                                 dense_9[0][0]                    
__________________________________________________________________________________________________
dense_10 (Dense)                (None, 128)          16512       add_8[0][0]                      
__________________________________________________________________________________________________
add_9 (Add)                     (None, 128)          0           add_8[0][0]                      
                                                                 dense_10[0][0]                   
__________________________________________________________________________________________________
dense_11 (Dense)                (None, 1)            129         add_9[0][0]                      
==================================================================================================
Total params: 166,913
Trainable params: 166,913
Non-trainable params: 0
__________________________________________________________________________________________________

The rest of the training procedure is rather standard:

  • loss function: binary cross-entropy
  • optimizer: RMSprop or Adam

To speed up convergence, we split the training procedure into two steps: first we train at a high learning rate, then we fine-tune the network after drastically reducing it. Similar results might be obtained more elegantly with a Keras LearningRateScheduler.

Instead of setting a fixed number of epochs, we switch from one learning rate to the next when the validation loss fails to improve for at least 5 epochs.
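This schedule can be sketched with a Keras `EarlyStopping` callback monitoring the validation loss; the learning rates, optimizer, and toy data below are placeholder assumptions, not the notebook's actual values:

```python
import numpy as np
import tensorflow as tf

# Toy stand-in data; the notebook streams the FeatherReader datasets instead.
X = np.random.normal(size=(512, 12)).astype("float32")
y = np.random.randint(0, 2, size=(512, 1)).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Stop a phase when the validation loss has not improved for 5 epochs.
early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5)

for learning_rate in (1e-3, 1e-5):  # high-learning-rate phase, then fine tuning
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate),
                  loss="binary_crossentropy")
    history = model.fit(X, y, validation_split=0.2, epochs=50,
                        callbacks=[early_stop], verbose=0)
```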

Epoch 1/50
26/26 [==============================] - 9s 291ms/step - loss: 1.3900 - val_loss: 0.7639
Epoch 2/50
26/26 [==============================] - 8s 281ms/step - loss: 0.7300 - val_loss: 0.6839
Epoch 3/50
26/26 [==============================] - 8s 293ms/step - loss: 0.6042 - val_loss: 1.0523
Epoch 4/50
26/26 [==============================] - 8s 283ms/step - loss: 0.5679 - val_loss: 0.4884
Epoch 5/50
26/26 [==============================] - 8s 292ms/step - loss: 0.4773 - val_loss: 0.3921
Epoch 6/50
26/26 [==============================] - 8s 291ms/step - loss: 0.4453 - val_loss: 0.4443
Epoch 7/50
26/26 [==============================] - 8s 276ms/step - loss: 0.4114 - val_loss: 0.4245
Epoch 8/50
26/26 [==============================] - 8s 278ms/step - loss: 0.3885 - val_loss: 0.3633
Epoch 9/50
26/26 [==============================] - 8s 285ms/step - loss: 0.3778 - val_loss: 0.3277
Epoch 10/50
26/26 [==============================] - 8s 277ms/step - loss: 0.3715 - val_loss: 0.3291
Epoch 11/50
26/26 [==============================] - 10s 362ms/step - loss: 0.3678 - val_loss: 0.3117
Epoch 12/50
26/26 [==============================] - 10s 364ms/step - loss: 0.3440 - val_loss: 0.3286
Epoch 13/50
26/26 [==============================] - 10s 357ms/step - loss: 0.3456 - val_loss: 0.3153
Epoch 14/50
26/26 [==============================] - 8s 289ms/step - loss: 0.3351 - val_loss: 0.3022
Epoch 15/50
26/26 [==============================] - 10s 361ms/step - loss: 0.3229 - val_loss: 0.3071
Epoch 16/50
26/26 [==============================] - 8s 278ms/step - loss: 0.3195 - val_loss: 0.3054
Epoch 17/50
26/26 [==============================] - 8s 277ms/step - loss: 0.3086 - val_loss: 0.3002
Epoch 18/50
26/26 [==============================] - 8s 283ms/step - loss: 0.2968 - val_loss: 0.2765
Epoch 19/50
26/26 [==============================] - 7s 272ms/step - loss: 0.2914 - val_loss: 0.2625
Epoch 20/50
26/26 [==============================] - 7s 272ms/step - loss: 0.2863 - val_loss: 0.3002
Epoch 21/50
26/26 [==============================] - 8s 278ms/step - loss: 0.2809 - val_loss: 0.3399
Epoch 22/50
26/26 [==============================] - 8s 277ms/step - loss: 0.2702 - val_loss: 0.2573
Epoch 23/50
26/26 [==============================] - 8s 274ms/step - loss: 0.2724 - val_loss: 0.2506
Epoch 24/50
26/26 [==============================] - 8s 295ms/step - loss: 0.2637 - val_loss: 0.2451
Epoch 25/50
26/26 [==============================] - 8s 293ms/step - loss: 0.2678 - val_loss: 0.2637
Epoch 26/50
26/26 [==============================] - 8s 278ms/step - loss: 0.2829 - val_loss: 0.2400
Epoch 27/50
26/26 [==============================] - 8s 290ms/step - loss: 0.2592 - val_loss: 0.2585
Epoch 28/50
26/26 [==============================] - 8s 287ms/step - loss: 0.2561 - val_loss: 0.2524
Epoch 29/50
26/26 [==============================] - 8s 283ms/step - loss: 0.2562 - val_loss: 0.2382
Epoch 30/50
26/26 [==============================] - 8s 291ms/step - loss: 0.2572 - val_loss: 0.2517
Epoch 31/50
26/26 [==============================] - 8s 276ms/step - loss: 0.2536 - val_loss: 0.2284
Epoch 32/50
26/26 [==============================] - 8s 292ms/step - loss: 0.2448 - val_loss: 0.3143
Epoch 33/50
26/26 [==============================] - 8s 286ms/step - loss: 0.2516 - val_loss: 0.2296
Epoch 34/50
26/26 [==============================] - 8s 283ms/step - loss: 0.2461 - val_loss: 0.2688
Epoch 1/50
26/26 [==============================] - 11s 299ms/step - loss: 0.1928 - val_loss: 0.1836
Epoch 2/50
26/26 [==============================] - 8s 280ms/step - loss: 0.1828 - val_loss: 0.1798
Epoch 3/50
26/26 [==============================] - 10s 367ms/step - loss: 0.1791 - val_loss: 0.1771
Epoch 4/50
26/26 [==============================] - 10s 355ms/step - loss: 0.1763 - val_loss: 0.1748
Epoch 5/50
26/26 [==============================] - 8s 279ms/step - loss: 0.1744 - val_loss: 0.1735
Epoch 6/50
26/26 [==============================] - 10s 361ms/step - loss: 0.1731 - val_loss: 0.1726
Epoch 7/50
26/26 [==============================] - 10s 357ms/step - loss: 0.1715 - val_loss: 0.1705
Epoch 8/50
26/26 [==============================] - 8s 286ms/step - loss: 0.1698 - val_loss: 0.1691
Epoch 9/50
26/26 [==============================] - 8s 287ms/step - loss: 0.1686 - val_loss: 0.1681
Epoch 10/50
26/26 [==============================] - 7s 268ms/step - loss: 0.1673 - val_loss: 0.1666
Epoch 11/50
26/26 [==============================] - 8s 282ms/step - loss: 0.1661 - val_loss: 0.1652
Epoch 12/50
26/26 [==============================] - 8s 282ms/step - loss: 0.1650 - val_loss: 0.1650
Epoch 13/50
26/26 [==============================] - 8s 285ms/step - loss: 0.1641 - val_loss: 0.1639
Epoch 14/50
26/26 [==============================] - 7s 273ms/step - loss: 0.1634 - val_loss: 0.1641
Epoch 15/50
26/26 [==============================] - 8s 281ms/step - loss: 0.1627 - val_loss: 0.1613
Epoch 16/50
26/26 [==============================] - 7s 263ms/step - loss: 0.1618 - val_loss: 0.1637
Epoch 17/50
26/26 [==============================] - 10s 364ms/step - loss: 0.1616 - val_loss: 0.1604
Epoch 18/50
26/26 [==============================] - 8s 273ms/step - loss: 0.1608 - val_loss: 0.1614
Epoch 19/50
26/26 [==============================] - 8s 280ms/step - loss: 0.1601 - val_loss: 0.1603
Epoch 20/50
26/26 [==============================] - 7s 272ms/step - loss: 0.1598 - val_loss: 0.1597
Epoch 21/50
26/26 [==============================] - 8s 282ms/step - loss: 0.1584 - val_loss: 0.1583
Epoch 22/50
26/26 [==============================] - 8s 282ms/step - loss: 0.1575 - val_loss: 0.1574
Epoch 23/50
26/26 [==============================] - 7s 269ms/step - loss: 0.1568 - val_loss: 0.1561
Epoch 24/50
26/26 [==============================] - 8s 280ms/step - loss: 0.1562 - val_loss: 0.1563
Epoch 25/50
26/26 [==============================] - 8s 287ms/step - loss: 0.1559 - val_loss: 0.1572
Epoch 26/50
26/26 [==============================] - 8s 280ms/step - loss: 0.1552 - val_loss: 0.1548
Epoch 27/50
26/26 [==============================] - 7s 265ms/step - loss: 0.1544 - val_loss: 0.1540
Epoch 28/50
26/26 [==============================] - 7s 260ms/step - loss: 0.1539 - val_loss: 0.1539
Epoch 29/50
26/26 [==============================] - 8s 281ms/step - loss: 0.1529 - val_loss: 0.1526
Epoch 30/50
26/26 [==============================] - 8s 277ms/step - loss: 0.1529 - val_loss: 0.1518
Epoch 31/50
26/26 [==============================] - 10s 359ms/step - loss: 0.1527 - val_loss: 0.1535
Epoch 32/50
26/26 [==============================] - 8s 276ms/step - loss: 0.1518 - val_loss: 0.1527
Epoch 33/50
26/26 [==============================] - 7s 273ms/step - loss: 0.1520 - val_loss: 0.1512
Epoch 34/50
26/26 [==============================] - 7s 262ms/step - loss: 0.1511 - val_loss: 0.1540
Epoch 35/50
26/26 [==============================] - 8s 278ms/step - loss: 0.1506 - val_loss: 0.1510
Epoch 36/50
26/26 [==============================] - 9s 348ms/step - loss: 0.1501 - val_loss: 0.1494
Epoch 37/50
26/26 [==============================] - 8s 283ms/step - loss: 0.1500 - val_loss: 0.1496
Epoch 38/50
26/26 [==============================] - 7s 270ms/step - loss: 0.1496 - val_loss: 0.1503
Epoch 39/50
26/26 [==============================] - 8s 277ms/step - loss: 0.1494 - val_loss: 0.1499
Epoch 40/50
26/26 [==============================] - 7s 263ms/step - loss: 0.1489 - val_loss: 0.1513
Epoch 41/50
26/26 [==============================] - 8s 281ms/step - loss: 0.1489 - val_loss: 0.1500

The two regimes are well visible in the plot below, which combines the two training phases in a single epoch count.

A first, rough validation¶

While a proper validation of the model is deferred to a dedicated notebook, here we check that the training did not fail completely and produces reasonable numbers.

In particular, we compare the distributions of the training labels and of the predictions.

Clearly, the training labels are either 1 or 0, because a particle is either in acceptance or not. The output of the neural network, instead, is a probability and hence is distributed between 0 and 1.

The comparison ensures that neither the training labels nor the network output is pathological, for example collapsed into a single value or with all training labels belonging to a single category.

Then we can compare the distribution of a preprocessed variable before and after the application of the acceptance requirement.

Here, for example, we consider the preprocessed log-value of the momentum.

Since the variable is preprocessed, without applying any cut (Generated) we obtain a standard Gaussian distribution.

Applying the criterion acceptance == 1 yields another distribution, which we expect to reproduce to a decent approximation by applying the response of the trained neural network as per-event weights to the Generated dataset.

The comparison of the two histograms (obtained by cutting on the true acceptance or by weighting with the predicted acceptance probability) provides a first validation of the quality of the parametrization.
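Schematically, the comparison can be written with NumPy histograms; `x`, `acc`, and `p` below are hypothetical stand-ins for the preprocessed variable, the true acceptance flag, and the network prediction:

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(size=10000)              # preprocessed variable (standard Gaussian)
p = 1.0 / (1.0 + np.exp(-2.0 * x))      # stand-in for the predicted acceptance probability
acc = rng.uniform(size=x.shape) < p     # stand-in for the true acceptance flag

bins = np.linspace(-4, 4, 41)
# Histogram of particles actually in acceptance...
h_cut, _ = np.histogram(x[acc], bins=bins)
# ...versus all generated particles weighted by the predicted probability.
h_weighted, _ = np.histogram(x, bins=bins, weights=p)

# If the parametrization is good, the two histograms agree bin by bin
# within statistical fluctuations.
```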

Export the model¶

Finally, we export the model to the same folder where the pretraining step was stored.
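Exporting amounts to a single save call; the sketch below uses the generic `tf.saved_model` API and a temporary directory, whereas the notebook presumably calls the equivalent Keras `model.save` on the path shown in the log:

```python
import os
import tempfile

import tensorflow as tf

# Trivial stand-in for the trained acceptance model.
inputs = tf.keras.Input(shape=(12,))
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(inputs)
model = tf.keras.Model(inputs, outputs)

# Export in TensorFlow's SavedModel format (this writes the
# saved_model.pb file and the assets directory seen in the log).
export_dir = os.path.join(tempfile.mkdtemp(), "acceptance")
tf.saved_model.save(model, export_dir)

# The SavedModel can later be reloaded for validation or deployment.
reloaded = tf.saved_model.load(export_dir)
```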

INFO:tensorflow:Assets written to: /workarea/local/private/cache/models/acceptance/assets
/usr/local/miniconda3/envs/tf_on_gpu/lib/python3.9/site-packages/keras/utils/generic_utils.py:494: CustomMaskWarning: Custom mask layers require a config and must override get_config. When loading, the custom mask layer must be passed to the custom_objects argument.
  warnings.warn('Custom mask layers require a config and must override '

Conclusion¶

In this notebook we have:

  • loaded the training and validation data
  • defined a neural network model for the acceptance
  • trained the model on simulated data
  • performed a couple of sanity checks to ensure the training procedure did not fail
  • exported the model to disk

In the next notebook we will perform a more detailed validation, splitting the sample into kinematic bins and comparing, for each bin, the distribution of particles in acceptance.