TensorFlow on GPU
available in the Docker image landerlini/lhcbaf:v0p8
This notebook is part of a pipeline: it requires the data preprocessed as defined in the notebook Preprocessing.ipynb, and the validation of the trained model is deferred to the notebook Efficiency-validation.ipynb.
Here, we define the training procedure for the Deep Neural Network model that predicts the class each track is reconstructed as. As evident from the preprocessing step, we restrict the classes to:
We include an "unreconstructed" category as an additional class, covering both the particles that are not reconstructed and those reconstructed as types not listed above.
The neural network we will train is designed to predict the probability that each track is reconstructed as each of these classes. When deploying the model, we will assign each particle to a single class, drawing one of the classes above according to the probabilities produced by the network.
The classes are mutually exclusive: each particle can be assigned to at most one of the reconstruction classes. Hence, we describe the problem as multiclass classification with a multinomial probability function and a Categorical Cross-entropy loss function.
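As a minimal illustration of the deployment-time sampling step (the array below is a made-up example, not the notebook's data), one class per particle can be drawn with `tf.random.categorical`:

```python
import tensorflow as tf

# Made-up example: one row of class probabilities per particle
probs = tf.constant([[0.70, 0.10, 0.10, 0.10],
                     [0.05, 0.80, 0.10, 0.05],
                     [0.25, 0.25, 0.25, 0.25]])

# tf.random.categorical expects log-probabilities (logits)
classes = tf.random.categorical(tf.math.log(probs), num_samples=1)[:, 0]
print(classes)  # one class index per particle, e.g. [0 1 2]
```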
As for the training of the acceptance model, we are using here the standard software stack for TensorFlow on GPU.
We ensure the GPU is properly loaded and assigned to TensorFlow as the hardware accelerator for the training. If the GPU is loaded properly, the following code block should return a string similar to '/device:GPU:0'.
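The original cell is not reproduced here, but `tf.test.gpu_device_name()` is one way to obtain such a string:

```python
import tensorflow as tf

# Name of the first GPU seen by TensorFlow; an empty string means
# no GPU is available as an accelerator.
tf.test.gpu_device_name()
```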
'/device:GPU:0'
We are reading the data using our custom implementation of FeatherReader, which streams the data directly to TensorFlow. In particular, we are loading:
We also load a small chunk of data into RAM to ease the model building.
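FeatherReader is a custom class whose implementation is not reproduced here; the sketch below conveys the same idea with plain pyarrow and tf.data. The file path, the train/validation split, and the column layout (12 features followed by 4 one-hot label columns) are placeholder assumptions.

```python
import pyarrow.feather as feather
import tensorflow as tf

# Placeholder path: the actual files come from Preprocessing.ipynb
table = feather.read_table("preprocessed/tracking.feather").to_pandas()
x = tf.constant(table.iloc[:, :12].to_numpy("float32"))    # 12 input features
y = tf.constant(table.iloc[:, 12:16].to_numpy("float32"))  # 4 one-hot classes

n_train = int(0.8 * x.shape[0])  # placeholder train/validation split
train_dataset = (tf.data.Dataset.from_tensor_slices((x[:n_train], y[:n_train]))
                 .shuffle(100_000)
                 .batch(65536))
validation_dataset = (tf.data.Dataset.from_tensor_slices((x[n_train:], y[n_train:]))
                      .batch(65536))

# Small chunk kept in RAM to ease the model building
x_chunk, y_chunk = x[:1_000_000], y[:1_000_000]
y_chunk.shape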
TensorShape([1000000, 4])
We define the neural network as a deep network with skip connections to mitigate the vanishing-gradient problem.
Note that the activation of the last layer is a softmax as expected by the Categorical Cross-entropy loss function.
Unfortunately, the scikinC package that we rely on to deploy these models in Lamarr does not support the softmax activation function when it is indicated as a string; it needs the softmax to be defined as an independent layer.
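Given the layer configuration printed below, a construction consistent with the summary might look as follows (a sketch, not the notebook's exact cell; the L2 coefficient is a placeholder):

```python
from tensorflow import keras

dense_opts = dict(
    units=128,
    activation='tanh',
    kernel_initializer='he_normal',
    kernel_regularizer=keras.regularizers.L2(1e-4),  # placeholder coefficient
)

inputs = keras.Input(shape=(12,))
hidden = keras.layers.Dense(**dense_opts)(inputs)

# Five skip-connection blocks: each Dense output is summed with its input
for _ in range(5):
    branch = keras.layers.Dense(**dense_opts)(hidden)
    hidden = keras.layers.Add()([hidden, branch])

# No activation string on the last Dense: scikinC needs the softmax
# to be an independent layer
logits = keras.layers.Dense(4)(hidden)
outputs = keras.layers.Softmax()(logits)

model = keras.Model(inputs, outputs)
model.summary()
```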
{'activation': 'tanh', 'kernel_initializer': 'he_normal', 'kernel_regularizer': <keras.regularizers.L2 object at 0x7f1bc4993790>, 'units': 128}

Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_1 (InputLayer)            [(None, 12)]         0
__________________________________________________________________________________________________
dense (Dense)                   (None, 128)          1664        input_1[0][0]
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 128)          16512       dense[0][0]
__________________________________________________________________________________________________
add (Add)                       (None, 128)          0           dense[0][0]
                                                                 dense_1[0][0]
__________________________________________________________________________________________________
dense_2 (Dense)                 (None, 128)          16512       add[0][0]
__________________________________________________________________________________________________
add_1 (Add)                     (None, 128)          0           add[0][0]
                                                                 dense_2[0][0]
__________________________________________________________________________________________________
dense_3 (Dense)                 (None, 128)          16512       add_1[0][0]
__________________________________________________________________________________________________
add_2 (Add)                     (None, 128)          0           add_1[0][0]
                                                                 dense_3[0][0]
__________________________________________________________________________________________________
dense_4 (Dense)                 (None, 128)          16512       add_2[0][0]
__________________________________________________________________________________________________
add_3 (Add)                     (None, 128)          0           add_2[0][0]
                                                                 dense_4[0][0]
__________________________________________________________________________________________________
dense_5 (Dense)                 (None, 128)          16512       add_3[0][0]
__________________________________________________________________________________________________
add_4 (Add)                     (None, 128)          0           add_3[0][0]
                                                                 dense_5[0][0]
__________________________________________________________________________________________________
dense_6 (Dense)                 (None, 4)            516         add_4[0][0]
__________________________________________________________________________________________________
softmax (Softmax)               (None, 4)            0           dense_6[0][0]
==================================================================================================
Total params: 84,740
Trainable params: 84,740
Non-trainable params: 0
__________________________________________________________________________________________________
The configuration of the training is standard for the multiclass classification task:

- CategoricalCrossentropy loss function
- RMSprop optimizer

Once again, we split the training procedure in two steps: first we train with a very high learning rate for as long as it keeps improving the value of the loss function; then we drastically reduce it to a much smaller value.
Note that, to limit the effect of local minima in the loss function and ease convergence towards the global minimum at such a high learning rate, we apply a small smoothing of the labels. This deprives the generated output of its probabilistic meaning, which is unacceptable for our purpose. Hence, we reset the label smoothing to zero for the second (and last) part of the training with a reduced learning rate.
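A sketch of the first training phase; the learning rate, smoothing strength, and early-stopping settings are placeholders, since the notebook cell is not reproduced here:

```python
from tensorflow import keras

model.compile(
    optimizer=keras.optimizers.RMSprop(learning_rate=1e-2),          # placeholder "very high" rate
    loss=keras.losses.CategoricalCrossentropy(label_smoothing=0.1),  # placeholder smoothing
)

# Early stopping keeps this phase running only as long as the loss improves
history_hi = model.fit(
    train_dataset,
    validation_data=validation_dataset,
    epochs=50,
    callbacks=[keras.callbacks.EarlyStopping(patience=3, restore_best_weights=True)],
)
```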
Epoch 1/50
26/26 [==============================] - 12s 377ms/step - loss: 2.6902 - val_loss: 1.7093
Epoch 2/50
26/26 [==============================] - 10s 361ms/step - loss: 1.4752 - val_loss: 1.2476
Epoch 3/50
26/26 [==============================] - 10s 353ms/step - loss: 1.2887 - val_loss: 0.9020
Epoch 4/50
26/26 [==============================] - 9s 303ms/step - loss: 0.9144 - val_loss: 0.7787
Epoch 5/50
26/26 [==============================] - 9s 311ms/step - loss: 0.6593 - val_loss: 0.5660
Epoch 6/50
26/26 [==============================] - 9s 312ms/step - loss: 0.7964 - val_loss: 0.5383
Epoch 7/50
26/26 [==============================] - 8s 292ms/step - loss: 0.6415 - val_loss: 0.7350
Epoch 8/50
26/26 [==============================] - 8s 293ms/step - loss: 0.5418 - val_loss: 0.5917
Epoch 9/50
26/26 [==============================] - 8s 291ms/step - loss: 0.6108 - val_loss: 1.8677
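The second phase recompiles the model with a much smaller learning rate and no label smoothing, restoring the probabilistic meaning of the output (again with placeholder values):

```python
model.compile(
    optimizer=keras.optimizers.RMSprop(learning_rate=1e-4),  # placeholder "much smaller" rate
    loss=keras.losses.CategoricalCrossentropy(label_smoothing=0.0),
)

history_lo = model.fit(
    train_dataset,
    validation_data=validation_dataset,
    epochs=50,
    callbacks=[keras.callbacks.EarlyStopping(patience=3, restore_best_weights=True)],
)
```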
Epoch 1/50
26/26 [==============================] - 9s 288ms/step - loss: 0.5237 - val_loss: 0.3887
Epoch 2/50
26/26 [==============================] - 9s 318ms/step - loss: 0.3629 - val_loss: 0.3338
Epoch 3/50
26/26 [==============================] - 9s 317ms/step - loss: 0.3182 - val_loss: 0.2959
Epoch 4/50
26/26 [==============================] - 8s 287ms/step - loss: 0.2910 - val_loss: 0.2743
Epoch 5/50
26/26 [==============================] - 8s 295ms/step - loss: 0.2742 - val_loss: 0.2686
Epoch 6/50
26/26 [==============================] - 8s 283ms/step - loss: 0.2619 - val_loss: 0.2618
Epoch 7/50
26/26 [==============================] - 8s 304ms/step - loss: 0.2548 - val_loss: 0.2566
Epoch 8/50
26/26 [==============================] - 8s 274ms/step - loss: 0.2489 - val_loss: 0.2427
Epoch 9/50
26/26 [==============================] - 8s 297ms/step - loss: 0.2444 - val_loss: 0.2377
Epoch 10/50
26/26 [==============================] - 8s 289ms/step - loss: 0.2399 - val_loss: 0.2444
Epoch 11/50
26/26 [==============================] - 8s 286ms/step - loss: 0.2380 - val_loss: 0.2288
Epoch 12/50
26/26 [==============================] - 10s 350ms/step - loss: 0.2343 - val_loss: 0.2295
Epoch 13/50
26/26 [==============================] - 8s 289ms/step - loss: 0.2318 - val_loss: 0.2347
Epoch 14/50
26/26 [==============================] - 8s 299ms/step - loss: 0.2303 - val_loss: 0.2253
Epoch 15/50
26/26 [==============================] - 8s 286ms/step - loss: 0.2287 - val_loss: 0.2291
Epoch 16/50
26/26 [==============================] - 9s 312ms/step - loss: 0.2273 - val_loss: 0.2189
Epoch 17/50
26/26 [==============================] - 9s 306ms/step - loss: 0.2247 - val_loss: 0.2297
Epoch 18/50
26/26 [==============================] - 8s 282ms/step - loss: 0.2242 - val_loss: 0.2175
Epoch 19/50
26/26 [==============================] - 8s 304ms/step - loss: 0.2224 - val_loss: 0.2405
Epoch 20/50
26/26 [==============================] - 8s 297ms/step - loss: 0.2221 - val_loss: 0.2364
Epoch 21/50
26/26 [==============================] - 8s 273ms/step - loss: 0.2205 - val_loss: 0.2142
Epoch 22/50
26/26 [==============================] - 10s 345ms/step - loss: 0.2189 - val_loss: 0.2166
Epoch 23/50
26/26 [==============================] - 7s 267ms/step - loss: 0.2198 - val_loss: 0.2136
Epoch 24/50
26/26 [==============================] - 8s 271ms/step - loss: 0.2174 - val_loss: 0.2165
Epoch 25/50
26/26 [==============================] - 8s 289ms/step - loss: 0.2180 - val_loss: 0.2184
Epoch 26/50
26/26 [==============================] - 10s 368ms/step - loss: 0.2163 - val_loss: 0.2096
Epoch 27/50
26/26 [==============================] - 8s 272ms/step - loss: 0.2155 - val_loss: 0.2212
Epoch 28/50
26/26 [==============================] - 7s 261ms/step - loss: 0.2164 - val_loss: 0.2081
Epoch 29/50
26/26 [==============================] - 8s 275ms/step - loss: 0.2135 - val_loss: 0.2132
Epoch 30/50
26/26 [==============================] - 8s 273ms/step - loss: 0.2140 - val_loss: 0.2143
Epoch 31/50
26/26 [==============================] - 7s 267ms/step - loss: 0.2140 - val_loss: 0.2100
Epoch 32/50
26/26 [==============================] - 8s 279ms/step - loss: 0.2120 - val_loss: 0.2102
Epoch 33/50
26/26 [==============================] - 10s 362ms/step - loss: 0.2129 - val_loss: 0.2053
Epoch 34/50
26/26 [==============================] - 8s 282ms/step - loss: 0.2116 - val_loss: 0.2094
Epoch 35/50
26/26 [==============================] - 8s 276ms/step - loss: 0.2119 - val_loss: 0.2139
Epoch 36/50
26/26 [==============================] - 10s 356ms/step - loss: 0.2109 - val_loss: 0.2075
Epoch 37/50
26/26 [==============================] - 10s 360ms/step - loss: 0.2104 - val_loss: 0.2083
Epoch 38/50
26/26 [==============================] - 8s 271ms/step - loss: 0.2099 - val_loss: 0.2053
The two training phases are clearly visible in the plot below, which reports the full history of the training procedure.
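A sketch of such a plot, assuming the two History objects from the training sketches above:

```python
import matplotlib.pyplot as plt

# Concatenate the histories of the two training phases
loss = history_hi.history['loss'] + history_lo.history['loss']
val_loss = history_hi.history['val_loss'] + history_lo.history['val_loss']

plt.plot(loss, label='Training')
plt.plot(val_loss, label='Validation')
plt.axvline(len(history_hi.history['loss']), color='gray', ls='--')  # phase boundary
plt.xlabel('Epoch')
plt.ylabel('Categorical cross-entropy')
plt.yscale('log')
plt.legend()
plt.show()
```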
As done for the acceptance training, we perform simple and quick checks on the trained model to ensure it makes sense, while deferring the most important part of the validation to a dedicated notebook.
First we plot the distribution of the original labels and of the predictions for the various categories.
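A sketch of this check, reusing the in-RAM chunk from the sketches above; the printed sum of the average predicted probabilities should be close to one, as in the output below:

```python
import matplotlib.pyplot as plt
import numpy as np

predictions = model.predict(x_chunk)
labels = np.asarray(y_chunk)

label_fractions = np.mean(labels, axis=0)           # class fractions in the data
predicted_fractions = np.mean(predictions, axis=0)  # average predicted probability per class

plt.bar(np.arange(4) - 0.2, label_fractions, width=0.4, label='Labels')
plt.bar(np.arange(4) + 0.2, predicted_fractions, width=0.4, label='Predictions')
plt.xlabel('Class index')
plt.ylabel('Fraction')
plt.legend()
plt.show()

float(np.sum(predicted_fractions))  # should be (approximately) 1
```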
1.000011329198176
Then we use the probability of belonging to the long track class as a weight, to compare the distribution of candidates reconstructed as long tracks in the detailed simulation with that of candidates reconstructable as long tracks according to Lamarr.
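A sketch of such a weighted comparison; the long-track column index and the choice of the plotted feature (here the first input column, taken as momentum for illustration) are placeholder assumptions:

```python
import matplotlib.pyplot as plt
import numpy as np

LONG_TRACK = 1  # placeholder: column index of the "long track" class

probs = model.predict(x_chunk)
feature = np.asarray(x_chunk[:, 0])  # placeholder: first feature, e.g. momentum
is_long = np.asarray(y_chunk[:, LONG_TRACK]) == 1

bins = np.linspace(feature.min(), feature.max(), 50)
# Detailed simulation: candidates actually reconstructed as long tracks
plt.hist(feature[is_long], bins=bins, histtype='step', label='Detailed simulation')
# Lamarr: all candidates, weighted by their long-track probability
plt.hist(feature, bins=bins, weights=probs[:, LONG_TRACK],
         histtype='step', label='Lamarr (weighted)')
plt.xlabel('Feature (placeholder)')
plt.legend()
plt.show()
```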
As a last step, we export the model to the same directory where we stored the preprocessing steps.
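A minimal sketch of the export; the target directory is taken from the log below, and with the TensorFlow version shipped in the image, saving to a directory produces a SavedModel:

```python
# Export as a TensorFlow SavedModel next to the preprocessing steps
model.save("/workarea/local/private/cache/models/efficiency")
```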
INFO:tensorflow:Assets written to: /workarea/local/private/cache/models/efficiency/assets
In this notebook we trained a model for the track reconstruction efficiency, implemented a very simple sanity check to ensure that the trained model makes sense, and finally exported it, leaving a more complete validation to a dedicated notebook.