TensorFlow on GPU
from landerlini/lhcbaf:v0p8

In this notebook we set up the training for the acceptance model, based on a deep neural network.
This notebook is part of a pipeline and requires the data preprocessed with the notebook Preprocessing.ipynb.
In this notebook we use the standard software stack for TensorFlow machine learning applications.
To ensure a GPU is found and properly loaded in the notebook kernel for TensorFlow to use, the output of the following block should be similar to '/device:GPU:0'. If a GPU is not found, the string will be empty.
'/device:GPU:0'
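The check can be reproduced with `tf.test.gpu_device_name`; a minimal sketch:

```python
import tensorflow as tf

# Returns '/device:GPU:0' if TensorFlow can see a GPU,
# or an empty string if no GPU is found.
device_name = tf.test.gpu_device_name()
print(repr(device_name))
```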
Preprocessed datasets were stored in the Apache Feather format and can be reloaded with our custom FeatherReader, designed to stream the datasets into TensorFlow or Dask format.
We are loading in particular:
We load a small batch of data to identify the number of conditional and target features and automate the definition of the neural network architecture.
TensorShape([1000000, 12])
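The FeatherReader implementation is not shown here; a minimal stand-in with pandas and tf.data illustrates the idea (the toy DataFrame replaces the actual feather file, which would be loaded with pd.read_feather):

```python
import numpy as np
import pandas as pd
import tensorflow as tf

# In the real pipeline the preprocessed dataset would come from
# pd.read_feather(<path>) via the custom FeatherReader; a toy
# DataFrame with 12 feature columns stands in for it here.
df = pd.DataFrame(np.random.normal(size=(1000, 12)).astype(np.float32))

# Stream the table into TensorFlow as batches of rows
dataset = tf.data.Dataset.from_tensor_slices(df.values).batch(256)

# A single small batch is enough to infer the number of features
batch = next(iter(dataset))
n_features = batch.shape[-1]
print(batch.shape)
```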
The function we are modelling is not a trivial one and requires a sufficiently deep neural network. To limit the effect of vanishing gradients and make the training procedure faster, we adopt residual layers, introducing skip connections between the input and the output of each layer.
With residual layers, the neural network is trained to learn the deviation from identity which is a much less demanding task than learning the whole transformation.
The output layer of the neural network has a sigmoid activation function, mapping its output into the interval [0, 1] so that it can be interpreted as the probability that the particle is in acceptance.
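A network matching the summary below can be sketched as follows; the width (128), depth (ten residual blocks) and input size (12) are read off the summary, while the ReLU activations are an assumption:

```python
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, Add

n_features = 12  # conditional + target features, as in the batch shape above

inputs = Input(shape=(n_features,))
x = Dense(128, activation='relu')(inputs)  # activation is an assumption

# Ten residual blocks: each Dense layer learns a deviation from
# the identity, added back to its input through a skip connection.
for _ in range(10):
    h = Dense(128, activation='relu')(x)
    x = Add()([x, h])

# Sigmoid output: probability of being in acceptance
outputs = Dense(1, activation='sigmoid')(x)
model = tf.keras.Model(inputs, outputs)
model.summary()
```

The parameter count of this sketch (166,913) matches the summary below.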
Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_1 (InputLayer)            [(None, 12)]         0
__________________________________________________________________________________________________
dense (Dense)                   (None, 128)          1664        input_1[0][0]
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 128)          16512       dense[0][0]
__________________________________________________________________________________________________
add (Add)                       (None, 128)          0           dense[0][0]
                                                                 dense_1[0][0]
__________________________________________________________________________________________________
dense_2 (Dense)                 (None, 128)          16512       add[0][0]
__________________________________________________________________________________________________
add_1 (Add)                     (None, 128)          0           add[0][0]
                                                                 dense_2[0][0]
__________________________________________________________________________________________________
dense_3 (Dense)                 (None, 128)          16512       add_1[0][0]
__________________________________________________________________________________________________
add_2 (Add)                     (None, 128)          0           add_1[0][0]
                                                                 dense_3[0][0]
__________________________________________________________________________________________________
dense_4 (Dense)                 (None, 128)          16512       add_2[0][0]
__________________________________________________________________________________________________
add_3 (Add)                     (None, 128)          0           add_2[0][0]
                                                                 dense_4[0][0]
__________________________________________________________________________________________________
dense_5 (Dense)                 (None, 128)          16512       add_3[0][0]
__________________________________________________________________________________________________
add_4 (Add)                     (None, 128)          0           add_3[0][0]
                                                                 dense_5[0][0]
__________________________________________________________________________________________________
dense_6 (Dense)                 (None, 128)          16512       add_4[0][0]
__________________________________________________________________________________________________
add_5 (Add)                     (None, 128)          0           add_4[0][0]
                                                                 dense_6[0][0]
__________________________________________________________________________________________________
dense_7 (Dense)                 (None, 128)          16512       add_5[0][0]
__________________________________________________________________________________________________
add_6 (Add)                     (None, 128)          0           add_5[0][0]
                                                                 dense_7[0][0]
__________________________________________________________________________________________________
dense_8 (Dense)                 (None, 128)          16512       add_6[0][0]
__________________________________________________________________________________________________
add_7 (Add)                     (None, 128)          0           add_6[0][0]
                                                                 dense_8[0][0]
__________________________________________________________________________________________________
dense_9 (Dense)                 (None, 128)          16512       add_7[0][0]
__________________________________________________________________________________________________
add_8 (Add)                     (None, 128)          0           add_7[0][0]
                                                                 dense_9[0][0]
__________________________________________________________________________________________________
dense_10 (Dense)                (None, 128)          16512       add_8[0][0]
__________________________________________________________________________________________________
add_9 (Add)                     (None, 128)          0           add_8[0][0]
                                                                 dense_10[0][0]
__________________________________________________________________________________________________
dense_11 (Dense)                (None, 1)            129         add_9[0][0]
==================================================================================================
Total params: 166,913
Trainable params: 166,913
Non-trainable params: 0
__________________________________________________________________________________________________
The rest of the training procedure is rather standard:
To speed up convergence, we split the training procedure into two steps. First, we train at a high learning rate; then we fine-tune the network after drastically reducing it. Similar results might be obtained more elegantly using a Keras LearningRateScheduler.
Instead of setting a fixed number of epochs, we switch from one learning rate to the next one upon lack of improvement for at least 5 epochs.
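The two-phase schedule with patience-based switching can be sketched with an EarlyStopping callback; the learning-rate values, toy model and random data below are assumptions standing in for the real ones:

```python
import numpy as np
import tensorflow as tf

# Toy stand-ins for the real model and datasets (assumptions for this sketch)
X = np.random.normal(size=(1024, 12)).astype(np.float32)
y = (np.random.uniform(size=(1024, 1)) < 0.5).astype(np.float32)
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(12,)),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])

# End each phase after 5 epochs without improvement of the validation loss
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss', patience=5, restore_best_weights=True)

histories = []
for learning_rate in (1e-3, 1e-5):  # high-LR phase, then fine tuning
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
                  loss='binary_crossentropy')
    h = model.fit(X, y, validation_split=0.5, epochs=50,
                  callbacks=[early_stop], verbose=0)
    histories.append(h)
```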
Epoch 1/50
26/26 [==============================] - 9s 291ms/step - loss: 1.3900 - val_loss: 0.7639
Epoch 2/50
26/26 [==============================] - 8s 281ms/step - loss: 0.7300 - val_loss: 0.6839
Epoch 3/50
26/26 [==============================] - 8s 293ms/step - loss: 0.6042 - val_loss: 1.0523
Epoch 4/50
26/26 [==============================] - 8s 283ms/step - loss: 0.5679 - val_loss: 0.4884
Epoch 5/50
26/26 [==============================] - 8s 292ms/step - loss: 0.4773 - val_loss: 0.3921
Epoch 6/50
26/26 [==============================] - 8s 291ms/step - loss: 0.4453 - val_loss: 0.4443
Epoch 7/50
26/26 [==============================] - 8s 276ms/step - loss: 0.4114 - val_loss: 0.4245
Epoch 8/50
26/26 [==============================] - 8s 278ms/step - loss: 0.3885 - val_loss: 0.3633
Epoch 9/50
26/26 [==============================] - 8s 285ms/step - loss: 0.3778 - val_loss: 0.3277
Epoch 10/50
26/26 [==============================] - 8s 277ms/step - loss: 0.3715 - val_loss: 0.3291
Epoch 11/50
26/26 [==============================] - 10s 362ms/step - loss: 0.3678 - val_loss: 0.3117
Epoch 12/50
26/26 [==============================] - 10s 364ms/step - loss: 0.3440 - val_loss: 0.3286
Epoch 13/50
26/26 [==============================] - 10s 357ms/step - loss: 0.3456 - val_loss: 0.3153
Epoch 14/50
26/26 [==============================] - 8s 289ms/step - loss: 0.3351 - val_loss: 0.3022
Epoch 15/50
26/26 [==============================] - 10s 361ms/step - loss: 0.3229 - val_loss: 0.3071
Epoch 16/50
26/26 [==============================] - 8s 278ms/step - loss: 0.3195 - val_loss: 0.3054
Epoch 17/50
26/26 [==============================] - 8s 277ms/step - loss: 0.3086 - val_loss: 0.3002
Epoch 18/50
26/26 [==============================] - 8s 283ms/step - loss: 0.2968 - val_loss: 0.2765
Epoch 19/50
26/26 [==============================] - 7s 272ms/step - loss: 0.2914 - val_loss: 0.2625
Epoch 20/50
26/26 [==============================] - 7s 272ms/step - loss: 0.2863 - val_loss: 0.3002
Epoch 21/50
26/26 [==============================] - 8s 278ms/step - loss: 0.2809 - val_loss: 0.3399
Epoch 22/50
26/26 [==============================] - 8s 277ms/step - loss: 0.2702 - val_loss: 0.2573
Epoch 23/50
26/26 [==============================] - 8s 274ms/step - loss: 0.2724 - val_loss: 0.2506
Epoch 24/50
26/26 [==============================] - 8s 295ms/step - loss: 0.2637 - val_loss: 0.2451
Epoch 25/50
26/26 [==============================] - 8s 293ms/step - loss: 0.2678 - val_loss: 0.2637
Epoch 26/50
26/26 [==============================] - 8s 278ms/step - loss: 0.2829 - val_loss: 0.2400
Epoch 27/50
26/26 [==============================] - 8s 290ms/step - loss: 0.2592 - val_loss: 0.2585
Epoch 28/50
26/26 [==============================] - 8s 287ms/step - loss: 0.2561 - val_loss: 0.2524
Epoch 29/50
26/26 [==============================] - 8s 283ms/step - loss: 0.2562 - val_loss: 0.2382
Epoch 30/50
26/26 [==============================] - 8s 291ms/step - loss: 0.2572 - val_loss: 0.2517
Epoch 31/50
26/26 [==============================] - 8s 276ms/step - loss: 0.2536 - val_loss: 0.2284
Epoch 32/50
26/26 [==============================] - 8s 292ms/step - loss: 0.2448 - val_loss: 0.3143
Epoch 33/50
26/26 [==============================] - 8s 286ms/step - loss: 0.2516 - val_loss: 0.2296
Epoch 34/50
26/26 [==============================] - 8s 283ms/step - loss: 0.2461 - val_loss: 0.2688
Epoch 1/50
26/26 [==============================] - 11s 299ms/step - loss: 0.1928 - val_loss: 0.1836
Epoch 2/50
26/26 [==============================] - 8s 280ms/step - loss: 0.1828 - val_loss: 0.1798
Epoch 3/50
26/26 [==============================] - 10s 367ms/step - loss: 0.1791 - val_loss: 0.1771
Epoch 4/50
26/26 [==============================] - 10s 355ms/step - loss: 0.1763 - val_loss: 0.1748
Epoch 5/50
26/26 [==============================] - 8s 279ms/step - loss: 0.1744 - val_loss: 0.1735
Epoch 6/50
26/26 [==============================] - 10s 361ms/step - loss: 0.1731 - val_loss: 0.1726
Epoch 7/50
26/26 [==============================] - 10s 357ms/step - loss: 0.1715 - val_loss: 0.1705
Epoch 8/50
26/26 [==============================] - 8s 286ms/step - loss: 0.1698 - val_loss: 0.1691
Epoch 9/50
26/26 [==============================] - 8s 287ms/step - loss: 0.1686 - val_loss: 0.1681
Epoch 10/50
26/26 [==============================] - 7s 268ms/step - loss: 0.1673 - val_loss: 0.1666
Epoch 11/50
26/26 [==============================] - 8s 282ms/step - loss: 0.1661 - val_loss: 0.1652
Epoch 12/50
26/26 [==============================] - 8s 282ms/step - loss: 0.1650 - val_loss: 0.1650
Epoch 13/50
26/26 [==============================] - 8s 285ms/step - loss: 0.1641 - val_loss: 0.1639
Epoch 14/50
26/26 [==============================] - 7s 273ms/step - loss: 0.1634 - val_loss: 0.1641
Epoch 15/50
26/26 [==============================] - 8s 281ms/step - loss: 0.1627 - val_loss: 0.1613
Epoch 16/50
26/26 [==============================] - 7s 263ms/step - loss: 0.1618 - val_loss: 0.1637
Epoch 17/50
26/26 [==============================] - 10s 364ms/step - loss: 0.1616 - val_loss: 0.1604
Epoch 18/50
26/26 [==============================] - 8s 273ms/step - loss: 0.1608 - val_loss: 0.1614
Epoch 19/50
26/26 [==============================] - 8s 280ms/step - loss: 0.1601 - val_loss: 0.1603
Epoch 20/50
26/26 [==============================] - 7s 272ms/step - loss: 0.1598 - val_loss: 0.1597
Epoch 21/50
26/26 [==============================] - 8s 282ms/step - loss: 0.1584 - val_loss: 0.1583
Epoch 22/50
26/26 [==============================] - 8s 282ms/step - loss: 0.1575 - val_loss: 0.1574
Epoch 23/50
26/26 [==============================] - 7s 269ms/step - loss: 0.1568 - val_loss: 0.1561
Epoch 24/50
26/26 [==============================] - 8s 280ms/step - loss: 0.1562 - val_loss: 0.1563
Epoch 25/50
26/26 [==============================] - 8s 287ms/step - loss: 0.1559 - val_loss: 0.1572
Epoch 26/50
26/26 [==============================] - 8s 280ms/step - loss: 0.1552 - val_loss: 0.1548
Epoch 27/50
26/26 [==============================] - 7s 265ms/step - loss: 0.1544 - val_loss: 0.1540
Epoch 28/50
26/26 [==============================] - 7s 260ms/step - loss: 0.1539 - val_loss: 0.1539
Epoch 29/50
26/26 [==============================] - 8s 281ms/step - loss: 0.1529 - val_loss: 0.1526
Epoch 30/50
26/26 [==============================] - 8s 277ms/step - loss: 0.1529 - val_loss: 0.1518
Epoch 31/50
26/26 [==============================] - 10s 359ms/step - loss: 0.1527 - val_loss: 0.1535
Epoch 32/50
26/26 [==============================] - 8s 276ms/step - loss: 0.1518 - val_loss: 0.1527
Epoch 33/50
26/26 [==============================] - 7s 273ms/step - loss: 0.1520 - val_loss: 0.1512
Epoch 34/50
26/26 [==============================] - 7s 262ms/step - loss: 0.1511 - val_loss: 0.1540
Epoch 35/50
26/26 [==============================] - 8s 278ms/step - loss: 0.1506 - val_loss: 0.1510
Epoch 36/50
26/26 [==============================] - 9s 348ms/step - loss: 0.1501 - val_loss: 0.1494
Epoch 37/50
26/26 [==============================] - 8s 283ms/step - loss: 0.1500 - val_loss: 0.1496
Epoch 38/50
26/26 [==============================] - 7s 270ms/step - loss: 0.1496 - val_loss: 0.1503
Epoch 39/50
26/26 [==============================] - 8s 277ms/step - loss: 0.1494 - val_loss: 0.1499
Epoch 40/50
26/26 [==============================] - 7s 263ms/step - loss: 0.1489 - val_loss: 0.1513
Epoch 41/50
26/26 [==============================] - 8s 281ms/step - loss: 0.1489 - val_loss: 0.1500
The two regimes are clearly visible in the plot below, which combines the two training phases into a single epoch count.
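Such a plot can be obtained by concatenating the loss arrays of the two Keras History objects; in this sketch short toy loss lists stand in for the real histories:

```python
import matplotlib
matplotlib.use('Agg')  # headless backend for this sketch
import matplotlib.pyplot as plt

# Toy loss curves standing in for the two History objects (assumptions)
phase1 = [1.39, 0.73, 0.60, 0.48, 0.39]
phase2 = [0.19, 0.18, 0.18, 0.17]

# Concatenate the two phases into a single epoch count
loss = phase1 + phase2
epochs = range(1, len(loss) + 1)

fig, ax = plt.subplots()
ax.plot(epochs, loss, label='training loss')
ax.axvline(len(phase1) + 0.5, linestyle='--', label='learning-rate switch')
ax.set_xlabel('epoch')
ax.set_ylabel('loss')
ax.legend()
fig.savefig('loss_curve.png')
```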
While a proper validation of the model is deferred to a dedicated notebook, here we check that the training did not fail outright and produces reasonable numbers.
In particular, we compare the distributions of the training labels and of the predictions.
Clearly, the training labels are either 1 or 0 because a particle is either in acceptance or not. Instead, the output of the neural network is a probability and hence it will be distributed between 0 and 1.
The comparison ensures that neither the training labels nor the network output is degenerate, for example collapsed into a single value, or with all training labels belonging to a single category.
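This sanity check reduces to two histograms over the interval [0, 1]; the toy labels and predictions below are assumptions standing in for the real arrays:

```python
import numpy as np

rng = np.random.default_rng(0)
labels = (rng.uniform(size=10_000) < 0.5).astype(float)  # toy labels: 0 or 1
preds = rng.uniform(size=10_000)                         # toy outputs in [0, 1]

bins = np.linspace(0.0, 1.0, 21)
h_labels, _ = np.histogram(labels, bins=bins)
h_preds, _ = np.histogram(preds, bins=bins)

# Labels should populate both categories (first and last bin),
# and the predictions should not be collapsed into a single bin.
assert h_labels[0] > 0 and h_labels[-1] > 0
assert (h_preds > 0).sum() > 1
```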
Then we can compare the distribution of a preprocessed variable before and after the application of the acceptance requirement.
Here, for example, we consider the preprocessed log-value of the momentum.
Since the variable is preprocessed, without applying any cut (Generated) it follows a standard Gaussian distribution.
Applying the criterion acceptance == 1
we get another distribution, which we expect to model to a decent approximation by applying the response of the trained neural network as per-event weights to the Generated dataset.
The comparison of the two histograms (obtained by applying a cut on the true acceptance or by weighting with the acceptance probability) provides a first validation of the quality of the parametrization.
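With numpy, the comparison boils down to a cut-based and a weight-based histogram of the same variable. The synthetic data below are an assumption; in the notebook the variable is the preprocessed log-momentum and the weights are the model predictions:

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(size=100_000)          # preprocessed variable (standard normal)
p_true = 1.0 / (1.0 + np.exp(-x))     # toy acceptance probability (assumption)
accepted = rng.uniform(size=x.size) < p_true   # true acceptance flag

bins = np.linspace(-5.0, 5.0, 51)
# Cut on the true acceptance flag vs. weighting every generated
# event by its acceptance probability
h_cut, _ = np.histogram(x[accepted], bins=bins)
h_weighted, _ = np.histogram(x, bins=bins, weights=p_true)

# The two histograms should agree within statistical fluctuations
rel_diff = np.abs(h_cut - h_weighted) / np.maximum(h_weighted, 1.0)
```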
Finally, we export the model to the same folder where the pretraining step was stored.
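The export uses the standard Keras SavedModel machinery; the tiny model and temporary path below are illustrative stand-ins for the trained acceptance model and the pipeline's cache folder:

```python
import os
import tempfile
import tensorflow as tf

# Tiny stand-in for the trained acceptance model (assumption for this sketch)
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(12,)),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])

export_dir = os.path.join(tempfile.mkdtemp(), 'acceptance')  # illustrative path

# Keras 3 exports the SavedModel format with model.export(); older
# tf.keras writes the same format directly with model.save(<directory>)
if hasattr(model, 'export'):
    model.export(export_dir)
else:
    model.save(export_dir)
```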
INFO:tensorflow:Assets written to: /workarea/local/private/cache/models/acceptance/assets
In this notebook we have:
In the next notebook we will perform a more detailed validation, splitting the sample into kinematic bins and comparing, for each bin, the distribution of particles in acceptance.