Training of the efficiency model¶

Notebook tested with the TensorFlow on GPU environment available in the Docker image landerlini/lhcbaf:v0p8¶

This notebook is part of a pipeline: it requires the data preprocessed as defined in the notebook Preprocessing.ipynb, while the validation of the trained model is deferred to the notebook Efficiency-validation.ipynb.

Here, we define the training procedure for the Deep Neural Network model that predicts the class each track is reconstructed as. As evident from the preprocessing step, we restrict the classes to:

  • long tracks (traversing the whole detector)
  • upstream tracks (traversing the VELO and the Tracker Turicensis, TT)
  • downstream tracks (traversing the TT and the downstream tracking stations).

We also include an "unreconstructed" category, grouping both the particles that are not reconstructed at all and those reconstructed as track types other than the three listed above.

The neural network we will train is designed to predict the probability that each track is reconstructed in each of the classes above. When deploying the model, we will assign each particle to a single class, drawing one of the classes above according to the probabilities predicted by the network.
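As an illustration, the deployment-time assignment amounts to a multinomial draw. The sketch below uses NumPy; the probabilities and the class ordering are made up for illustration and do not reproduce the actual Lamarr code.

```python
import numpy as np

# Sketch of the deployment-time assignment (not the actual Lamarr code).
# Probabilities and class ordering below are made up for illustration.
rng = np.random.default_rng(seed=42)
class_names = ['long', 'upstream', 'downstream', 'unreconstructed']
probs = [0.60, 0.05, 0.15, 0.20]          # network output for one particle
drawn = rng.choice(class_names, p=probs)  # e.g. 'long'
print(drawn)
```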

The classes are mutually exclusive: each particle can be assigned to at most one of the reconstruction classes. Hence, we describe the problem as a multiclass classification with a multinomial probability function and a Categorical Cross-entropy as loss function.
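For reference (standard definition, not spelled out in the original notebook), with one-hot labels $y_{ic}$ and network outputs $\hat{y}_{ic}$ for $N$ particles and $C = 4$ classes, the categorical cross-entropy reads

$$\mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{C} y_{ic} \, \log \hat{y}_{ic} .$$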

Libraries and environment setup¶

As for the training of the acceptance model, we use the standard software stack for TensorFlow on GPU.

We ensure the GPU is properly loaded and assigned to TensorFlow as hardware accelerator for the training.

If the GPU is loaded properly, the following code block should result in a string similar to '/device:GPU:0'.
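A minimal check consistent with the output below (the original cell is not shown; this is a sketch):

```python
import tensorflow as tf

# Name of the first GPU device seen by TensorFlow; an empty string
# means no GPU was found and training would fall back to the CPU
tf.test.gpu_device_name()
```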

'/device:GPU:0'

Loading data¶

We read the data using our custom FeatherReader implementation, which streams the data directly to TensorFlow. In particular, we load:

  • the training dataset to optimize the weights;
  • the validation dataset to monitor possible overtraining, select the model, and tune the regularization hyper-parameters and techniques.

We also load into RAM a small chunk of data to ease model building.

TensorShape([1000000, 4])
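The FeatherReader implementation is custom and not reproduced here; assuming the preprocessed data are stored in Feather files with an integer label column, a minimal stand-in based on pandas and tf.data could look like the following sketch (file path, column names, and batch size are placeholders):

```python
import pandas as pd
import tensorflow as tf

# Minimal stand-in for the custom FeatherReader (the actual API differs).
# File path and column names are illustrative placeholders.
df = pd.read_feather('/path/to/preprocessed/efficiency-train.feather')
features = df.drop(columns=['label']).to_numpy('float32')  # 12 input features
labels = tf.one_hot(df['label'].to_numpy(), depth=4)       # 4 classes

train_ds = (
    tf.data.Dataset.from_tensor_slices((features, labels))
    .shuffle(buffer_size=100_000)
    .batch(4096)  # batch size is a placeholder
)
```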

Model definition¶

We define the neural network as a deep network with skip connections to mitigate the vanishing-gradient problem.

Note that the activation of the last layer is a softmax as expected by the Categorical Cross-entropy loss function.

Unfortunately, the scikinC package that we rely on to deploy these models in Lamarr does not support the softmax activation function when it is specified as a string; the softmax must be defined as an independent layer.
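A sketch of the model-building code, consistent with the layer configuration and the summary printed below; the L2 regularization strength is not reported in the output, so the Keras default is used here as a placeholder:

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

# Common configuration of the hidden Dense layers (cf. the printout below);
# the L2 strength is not reported, so the Keras default is a placeholder
dense_kwargs = dict(
    units=128,
    activation='tanh',
    kernel_initializer='he_normal',
    kernel_regularizer=regularizers.L2(),
)

inputs = tf.keras.Input(shape=(12,))
hidden = layers.Dense(**dense_kwargs)(inputs)
for _ in range(5):
    branch = layers.Dense(**dense_kwargs)(hidden)
    hidden = layers.Add()([hidden, branch])  # skip connection

# scikinC needs the softmax as an independent layer rather than
# as the string activation of the last Dense layer
logits = layers.Dense(units=4)(hidden)
outputs = layers.Softmax()(logits)

model = tf.keras.Model(inputs, outputs)
model.summary()
```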

{'activation': 'tanh',
 'kernel_initializer': 'he_normal',
 'kernel_regularizer': <keras.regularizers.L2 object at 0x7f1bc4993790>,
 'units': 128}
Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            [(None, 12)]         0                                            
__________________________________________________________________________________________________
dense (Dense)                   (None, 128)          1664        input_1[0][0]                    
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 128)          16512       dense[0][0]                      
__________________________________________________________________________________________________
add (Add)                       (None, 128)          0           dense[0][0]                      
                                                                 dense_1[0][0]                    
__________________________________________________________________________________________________
dense_2 (Dense)                 (None, 128)          16512       add[0][0]                        
__________________________________________________________________________________________________
add_1 (Add)                     (None, 128)          0           add[0][0]                        
                                                                 dense_2[0][0]                    
__________________________________________________________________________________________________
dense_3 (Dense)                 (None, 128)          16512       add_1[0][0]                      
__________________________________________________________________________________________________
add_2 (Add)                     (None, 128)          0           add_1[0][0]                      
                                                                 dense_3[0][0]                    
__________________________________________________________________________________________________
dense_4 (Dense)                 (None, 128)          16512       add_2[0][0]                      
__________________________________________________________________________________________________
add_3 (Add)                     (None, 128)          0           add_2[0][0]                      
                                                                 dense_4[0][0]                    
__________________________________________________________________________________________________
dense_5 (Dense)                 (None, 128)          16512       add_3[0][0]                      
__________________________________________________________________________________________________
add_4 (Add)                     (None, 128)          0           add_3[0][0]                      
                                                                 dense_5[0][0]                    
__________________________________________________________________________________________________
dense_6 (Dense)                 (None, 4)            516         add_4[0][0]                      
__________________________________________________________________________________________________
softmax (Softmax)               (None, 4)            0           dense_6[0][0]                    
==================================================================================================
Total params: 84,740
Trainable params: 84,740
Non-trainable params: 0
__________________________________________________________________________________________________

The configuration of the training is standard for the multiclass classification task.

  • CategoricalCrossentropy loss function
  • RMSprop optimizer

Once again we split the training procedure in two steps: first we train with a very high learning rate for as long as it yields some improvement in the value of the loss function, then we drastically reduce the learning rate to a much smaller value.

Note that, to limit the impact of local minima in the loss function and ease convergence towards the global minimum at such a high learning rate, we apply a small smoothing to the labels. This deprives the generated output of its probabilistic meaning, which is unacceptable for our purposes. Hence, we reset the label smoothing to zero for the second (and last) part of the training, performed with a reduced learning rate.
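A sketch of the two training phases follows; the learning rates, the label-smoothing value, and the early-stopping criterion are assumptions (the logs below, with phase 1 stopping after 9 epochs and phase 2 after 38, suggest an early-stopping callback), while model, train_ds, and valid_ds are as defined above.

```python
import tensorflow as tf

# Phase 1: very high learning rate with a small label smoothing
# (the values are placeholders; the ones actually used are not shown)
model.compile(
    optimizer=tf.keras.optimizers.RMSprop(learning_rate=1e-2),
    loss=tf.keras.losses.CategoricalCrossentropy(label_smoothing=0.1),
)
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss', patience=3, restore_best_weights=True)
model.fit(train_ds, validation_data=valid_ds, epochs=50, callbacks=[early_stop])

# Phase 2: much smaller learning rate and no label smoothing, so that the
# network output retains its probabilistic interpretation
model.compile(
    optimizer=tf.keras.optimizers.RMSprop(learning_rate=1e-4),
    loss=tf.keras.losses.CategoricalCrossentropy(),  # label_smoothing=0 default
)
model.fit(train_ds, validation_data=valid_ds, epochs=50, callbacks=[early_stop])
```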

Epoch 1/50
26/26 [==============================] - 12s 377ms/step - loss: 2.6902 - val_loss: 1.7093
Epoch 2/50
26/26 [==============================] - 10s 361ms/step - loss: 1.4752 - val_loss: 1.2476
Epoch 3/50
26/26 [==============================] - 10s 353ms/step - loss: 1.2887 - val_loss: 0.9020
Epoch 4/50
26/26 [==============================] - 9s 303ms/step - loss: 0.9144 - val_loss: 0.7787
Epoch 5/50
26/26 [==============================] - 9s 311ms/step - loss: 0.6593 - val_loss: 0.5660
Epoch 6/50
26/26 [==============================] - 9s 312ms/step - loss: 0.7964 - val_loss: 0.5383
Epoch 7/50
26/26 [==============================] - 8s 292ms/step - loss: 0.6415 - val_loss: 0.7350
Epoch 8/50
26/26 [==============================] - 8s 293ms/step - loss: 0.5418 - val_loss: 0.5917
Epoch 9/50
26/26 [==============================] - 8s 291ms/step - loss: 0.6108 - val_loss: 1.8677
Epoch 1/50
26/26 [==============================] - 9s 288ms/step - loss: 0.5237 - val_loss: 0.3887
Epoch 2/50
26/26 [==============================] - 9s 318ms/step - loss: 0.3629 - val_loss: 0.3338
Epoch 3/50
26/26 [==============================] - 9s 317ms/step - loss: 0.3182 - val_loss: 0.2959
Epoch 4/50
26/26 [==============================] - 8s 287ms/step - loss: 0.2910 - val_loss: 0.2743
Epoch 5/50
26/26 [==============================] - 8s 295ms/step - loss: 0.2742 - val_loss: 0.2686
Epoch 6/50
26/26 [==============================] - 8s 283ms/step - loss: 0.2619 - val_loss: 0.2618
Epoch 7/50
26/26 [==============================] - 8s 304ms/step - loss: 0.2548 - val_loss: 0.2566
Epoch 8/50
26/26 [==============================] - 8s 274ms/step - loss: 0.2489 - val_loss: 0.2427
Epoch 9/50
26/26 [==============================] - 8s 297ms/step - loss: 0.2444 - val_loss: 0.2377
Epoch 10/50
26/26 [==============================] - 8s 289ms/step - loss: 0.2399 - val_loss: 0.2444
Epoch 11/50
26/26 [==============================] - 8s 286ms/step - loss: 0.2380 - val_loss: 0.2288
Epoch 12/50
26/26 [==============================] - 10s 350ms/step - loss: 0.2343 - val_loss: 0.2295
Epoch 13/50
26/26 [==============================] - 8s 289ms/step - loss: 0.2318 - val_loss: 0.2347
Epoch 14/50
26/26 [==============================] - 8s 299ms/step - loss: 0.2303 - val_loss: 0.2253
Epoch 15/50
26/26 [==============================] - 8s 286ms/step - loss: 0.2287 - val_loss: 0.2291
Epoch 16/50
26/26 [==============================] - 9s 312ms/step - loss: 0.2273 - val_loss: 0.2189
Epoch 17/50
26/26 [==============================] - 9s 306ms/step - loss: 0.2247 - val_loss: 0.2297
Epoch 18/50
26/26 [==============================] - 8s 282ms/step - loss: 0.2242 - val_loss: 0.2175
Epoch 19/50
26/26 [==============================] - 8s 304ms/step - loss: 0.2224 - val_loss: 0.2405
Epoch 20/50
26/26 [==============================] - 8s 297ms/step - loss: 0.2221 - val_loss: 0.2364
Epoch 21/50
26/26 [==============================] - 8s 273ms/step - loss: 0.2205 - val_loss: 0.2142
Epoch 22/50
26/26 [==============================] - 10s 345ms/step - loss: 0.2189 - val_loss: 0.2166
Epoch 23/50
26/26 [==============================] - 7s 267ms/step - loss: 0.2198 - val_loss: 0.2136
Epoch 24/50
26/26 [==============================] - 8s 271ms/step - loss: 0.2174 - val_loss: 0.2165
Epoch 25/50
26/26 [==============================] - 8s 289ms/step - loss: 0.2180 - val_loss: 0.2184
Epoch 26/50
26/26 [==============================] - 10s 368ms/step - loss: 0.2163 - val_loss: 0.2096
Epoch 27/50
26/26 [==============================] - 8s 272ms/step - loss: 0.2155 - val_loss: 0.2212
Epoch 28/50
26/26 [==============================] - 7s 261ms/step - loss: 0.2164 - val_loss: 0.2081
Epoch 29/50
26/26 [==============================] - 8s 275ms/step - loss: 0.2135 - val_loss: 0.2132
Epoch 30/50
26/26 [==============================] - 8s 273ms/step - loss: 0.2140 - val_loss: 0.2143
Epoch 31/50
26/26 [==============================] - 7s 267ms/step - loss: 0.2140 - val_loss: 0.2100
Epoch 32/50
26/26 [==============================] - 8s 279ms/step - loss: 0.2120 - val_loss: 0.2102
Epoch 33/50
26/26 [==============================] - 10s 362ms/step - loss: 0.2129 - val_loss: 0.2053
Epoch 34/50
26/26 [==============================] - 8s 282ms/step - loss: 0.2116 - val_loss: 0.2094
Epoch 35/50
26/26 [==============================] - 8s 276ms/step - loss: 0.2119 - val_loss: 0.2139
Epoch 36/50
26/26 [==============================] - 10s 356ms/step - loss: 0.2109 - val_loss: 0.2075
Epoch 37/50
26/26 [==============================] - 10s 360ms/step - loss: 0.2104 - val_loss: 0.2083
Epoch 38/50
26/26 [==============================] - 8s 271ms/step - loss: 0.2099 - val_loss: 0.2053

The two training phases are clearly visible in the plot below, which reports the full history of the training procedure.

A first rough validation (sanity checks)¶

As done for the acceptance training, we perform simple and quick checks on the trained model to ensure that it makes sense, while deferring the most important part of the validation to a dedicated notebook.

First we plot the distribution of the original labels and of the predictions for the various categories.

1.000011329198176
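A possible implementation of this check might look like the sketch below; the variable names and the class ordering are assumptions, and the plot it produces is not the one from the original run.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical sketch: compare the per-class fraction in the true labels
# with the average probability predicted by the network.
class_names = ['long', 'upstream', 'downstream', 'unreconstructed']
preds = model.predict(validation_features)

x = np.arange(len(class_names))
plt.bar(x - 0.2, validation_labels.mean(axis=0), width=0.4, label='Labels')
plt.bar(x + 0.2, preds.mean(axis=0), width=0.4, label='Predictions')
plt.xticks(x, class_names)
plt.ylabel('Fraction of candidates')
plt.legend()
plt.show()
```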

Then we use the probability of belonging to the long-track class as a weight to compare the distribution of candidates reconstructed as long tracks in the detailed simulation with the distribution of candidates deemed likely to be reconstructed as long tracks according to Lamarr.
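A hypothetical sketch of this comparison, where the variable names, the column index of the long-track class, and the variable being histogrammed are all assumptions:

```python
import matplotlib.pyplot as plt

# Weight every candidate by its predicted long-track probability and compare
# with the candidates actually reconstructed as long tracks.
LONG = 0  # column of the "long" class in the network output (assumed)
p_long = model.predict(validation_features)[:, LONG]
is_long = validation_labels[:, LONG] == 1

plt.hist(momentum[is_long], bins=50, histtype='step',
         label='Detailed simulation')
plt.hist(momentum, bins=50, weights=p_long, histtype='step',
         label='Lamarr (weighted)')
plt.xlabel('Track momentum')  # the variable shown is illustrative
plt.legend()
plt.show()
```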

Exporting the model¶

As a last step, we export the model to the same directory where we stored the preprocessing steps.
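The export, consistent with the path reported in the log below, presumably boils down to:

```python
# Save the model in TensorFlow SavedModel format, next to the
# preprocessing steps (path as reported in the log below)
model.save('/workarea/local/private/cache/models/efficiency')
```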

INFO:tensorflow:Assets written to: /workarea/local/private/cache/models/efficiency/assets
/usr/local/miniconda3/envs/tf_on_gpu/lib/python3.9/site-packages/keras/utils/generic_utils.py:494: CustomMaskWarning: Custom mask layers require a config and must override get_config. When loading, the custom mask layer must be passed to the custom_objects argument.
  warnings.warn('Custom mask layers require a config and must override '

Conclusion¶

In this notebook we trained a model for the track reconstruction efficiency, implemented a very simple sanity check to ensure that the trained model makes sense, and finally exported it to perform a more complete validation in a dedicated notebook.