In order to run the trained parametrizations within a Gaudi application, they must be converted to a format that can be easily loaded from C/C++. Several options were considered, but most of them require a dedicated runtime to be deployed and linked in the target application.
In practice, these runtimes are extremely effective at distributing the computation over multiple cores and are designed to speed up the evaluation of networks on large batches of data samples. Unfortunately, in Lamarr the number of particles we can process per event is rather limited, and the complexity of the models is not comparable to that of the models used for Computer Vision or Natural Language Processing tasks. As a consequence, the overhead introduced by context switching when passing the input variables from the main thread to the runtime thread is unacceptably large. In addition, the latest versions of Gaudi introduce multithreaded processing, which helps reduce the memory footprint of the applications. Unfortunately, the schedulers designed for these runtimes use a completely different logic from the Gaudi scheduler and are certainly not designed to interact effectively with it. In conclusion, the attempts at relying on the TensorFlow C++ APIs, and a first exploration of ONNX for Lamarr, convinced the Simulation Project to move towards other alternatives.
A totally different approach, developed in the context of real-time processing for plasma control in nuclear fusion experiments, is to translate the neural network into compatible C code, compile it, and link it to the main application. The most important effort in this direction led to the release of keras2c.
In LHCb, we have developed our own version of a transpiler of scikit-learn and Keras models to compatible C code, under the name scikinC. While much more limited in terms of support for neural networks, scikinC provides functionalities to combine the preprocessing step with the model in a very simple and clean way.
In this notebook we will run a development version of scikinC as some of the functionalities needed to deploy these neural networks are being implemented now.
The libraries used to import the models and to validate the models and their deployment are the same as for the validation notebooks.
In this notebook we are going to load four different models, for:
* acceptance
* efficiency
* resolution
* covariance
For each of these models, we are loading:
* the preprocessing step (named tX)
* for the GAN models (resolution and covariance), a postprocessing step (named tY) as well.
To handle the model loading and some manipulation needed to make it ready to be deployed, we rely on a custom class, named LamarrModel, defined in the local module deploy_utils.py.
The class LamarrModel takes as an input the path of a keras model, and loads from the corresponding folder:
* if a tX.pkl file is found, it loads it as a pickled preprocessing step
* if a tY.pkl file is found, it loads it as a pickled postprocessing step

Then, it tries to convert the model to a keras Sequential model. Note that most of the neural networks defined in this repository include skip connections and are therefore defined with the Functional APIs. As of today, models defined with the Functional APIs cannot be converted with scikinC; hence, a conversion of each DNN to a Sequential model is necessary. Note that the restriction to the Sequential form is purely formal, as long as we do not constrain the operations that single layers may perform.
In the particular case of dense layers with skip connections, we can define a new layer, named DenseWithSkipConnection, that implements the operation
$$
\vec h_{i+1} = \vec h_{i} + \sigma(\mathbf{A} \vec h_{i} + \vec b),
$$
where $\mathbf{A}$ and $\vec {b}$ represent the weights of the dense
layer and $\sigma(\cdot)$ the activation function.
A stack of DenseWithSkipConnection layers can reproduce exactly the functionality of the original neural network while being represented as a Sequential model.
To be completely honest, this simply moves the problem to scikinC, which now has to provide a C implementation for our custom DenseWithSkipConnection layer.
Fortunately, this is straightforward to implement (though a bit hacky), requiring only a few lines of Python, placed in the deploy_utils.py module.
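To make the operation defined above concrete, the following is a minimal hand-written C sketch of a dense layer with a skip connection. It is not the code generated by scikinC: the names dense_with_skip, activation_t and relu are hypothetical, the weight matrix is assumed to be square and stored row-major so that row i multiplies the input vector, and ReLU is used only as a stand-in for the activation $\sigma(\cdot)$.

```c
#include <stddef.h>

#define FLOAT_T float

/* Type of the activation function sigma, applied element by element. */
typedef FLOAT_T (*activation_t)(FLOAT_T);

/* Example activation (ReLU), chosen only because it needs no math library. */
FLOAT_T relu(FLOAT_T x)
{
    return x > 0 ? x : 0;
}

/* Hypothetical helper, not the code generated by scikinC.
 * Computes out[i] = in[i] + sigma(sum_j A[i*n + j] * in[j] + b[i])
 * for a layer with n inputs and n outputs.
 * The buffers out and in must not overlap. */
FLOAT_T *dense_with_skip(FLOAT_T *out, const FLOAT_T *in,
                         const FLOAT_T *A, const FLOAT_T *b,
                         size_t n, activation_t sigma)
{
    size_t i, j;
    for (i = 0; i < n; ++i) {
        FLOAT_T z = b[i];               /* start from the bias term */
        for (j = 0; j < n; ++j)
            z += A[i * n + j] * in[j];  /* row i of A times the input */
        out[i] = in[i] + sigma(z);      /* add the skip connection */
    }
    return out;
}
```

The skip connection requires the layer to preserve the dimensionality of its input, which is why a single size n is enough here. Apart from the final addition of in[i], the loop is an ordinary dense layer, which suggests why most of the Dense conversion code can be reused.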
So, in summary, we are defining:
* a custom layer DenseWithSkipConnection that will replace any combination of a keras Dense layer followed by a skip connection;
* a scikinC converter for DenseWithSkipConnection that implements this layer in C, reusing the C code for converting Dense layers as much as possible.

Most of this happens behind the scenes in the implementation of the LamarrModel class.
Loading model from '/workarea/local/private/cache/models/acceptance'. Preprocessing: 👌.Postprocessing: 😞. Check on the number of weights: ✅! Original model: 166913. Collapsed model: 166913.
Loading model from '/workarea/local/private/cache/models/efficiency'. Preprocessing: 👌.Postprocessing: 😞. Check on the number of weights: ✅! Original model: 84740. Collapsed model: 84740.
Loading model from '/workarea/local/private/cache/models/resolution'. Preprocessing: 👌.Postprocessing: 👌. WARNING:tensorflow:No training configuration found in save file, so the model was *not* compiled. Compile it manually. Check on the number of weights: ✅! Original model: 184329. Collapsed model: 184329.
Loading model from '/workarea/local/private/cache/models/covariance'. Preprocessing: 👌.Postprocessing: 👌. WARNING:tensorflow:No training configuration found in save file, so the model was *not* compiled. Compile it manually. Check on the number of weights: ✅! Original model: 152463. Collapsed model: 152463.
In order to export the models to a shared object, we first have to obtain C code describing the models and then compile it. For all operations supported by scikinC, we rely on code generation. For GANs, we will need to glue some steps together with custom C functions.
For models like acceptance and efficiency, where all the transformations and evaluations are pipelined in the same "direction" as they were trained, we can define scikit-learn Pipelines with the sequence of transformations and models, and let scikinC generate the glue between the preprocessing and DNN steps.
For GANs this is not possible because we cannot pipeline the inverse of the postprocessing transformation tY. Besides, the input features have to be preprocessed, but the random noise should be injected directly into the generator model, breaking the Pipeline model. Hence, we generate C code for each step (preprocessing, model and postprocessing) separately.
Note. In addition, scikinC does not support pipelines of pipelines and the covariance GAN preprocessing step is defined as a pipeline, so even defining an inverse QuantileTransformer and injecting it in the pipeline would not be sufficient to automate the generation of the complete C code.
The pipelines for the resolution and covariance models are defined in a separate file (gan_pipelines.c), reproduced below for completeness.
Note that scikinC operates with a separate call for each row or array, so all the arrays are 1D.
In order to customize the pipeline with specific information from the model, such as the number of input and output features, or the dimensionality of the random noise, we use symbols that will be defined at compile time.
#define FLOAT_T float

/**** Prototypes of functions automatically generated by scikinC ****/
// Resolution steps
FLOAT_T* resolution_tX (FLOAT_T*, const FLOAT_T*);
FLOAT_T* resolution_tY_inverse (FLOAT_T*, const FLOAT_T*);
FLOAT_T* resolution_dnn (FLOAT_T*, const FLOAT_T*);

// Covariance steps
FLOAT_T* covariance_tX (FLOAT_T*, const FLOAT_T*);
FLOAT_T* covariance_tY_inverse (FLOAT_T*, const FLOAT_T*);
FLOAT_T* covariance_dnn (FLOAT_T*, const FLOAT_T*);

/**** Additional functions wrapping generated functions together ****/
// Resolution entry point
FLOAT_T* resolution (FLOAT_T* output, const FLOAT_T* input, const FLOAT_T* random)
{
  // Rename constants defined at compile time
  const int nInputs = RESOLUTION_NUM_FEATURES;
  const int nOutputs = RESOLUTION_NUM_OUTPUTS;
  const int nRandom = RESOLUTION_NUM_RANDOM;

  // Instantiate variables on the stack.
  FLOAT_T buf[nInputs + nOutputs + nRandom];
  int i;

  // Preprocessing
  resolution_tX(buf, input);

  // Concatenate preprocessed features and random noise
  for (i = 0; i < nRandom; ++i)
    buf[i + nInputs] = random[i];

  // Execute the generator
  resolution_dnn(buf, buf);

  // Apply the inverse postprocessing transformation
  resolution_tY_inverse(output, buf);

  return output;
}

// Covariance entry point
FLOAT_T* covariance (FLOAT_T* output, const FLOAT_T* input, const FLOAT_T* random)
{
  // Rename constants defined at compile time
  const int nInputs = COVARIANCE_NUM_FEATURES;
  const int nOutputs = COVARIANCE_NUM_OUTPUTS;
  const int nRandom = COVARIANCE_NUM_RANDOM;

  // Instantiate variables on the stack.
  FLOAT_T buf[nInputs + nOutputs + nRandom];
  int i;

  // Preprocessing
  covariance_tX(buf, input);

  // Concatenate preprocessed features and random noise
  for (i = 0; i < nRandom; ++i)
    buf[i + nInputs] = random[i];

  // Execute the generator
  covariance_dnn(buf, buf);

  // Apply the inverse postprocessing transformation
  covariance_tY_inverse(output, buf);

  return output;
}
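Since each generated function handles one row per call, evaluating a batch comes down to a loop over rows. The sketch below illustrates this for the two-argument signature used by the prototypes in the listing above. The names row_function_t, evaluate_batch and example_sum2 are hypothetical, and example_sum2 is only a stand-in for a generated function so that the loop can be exercised without the compiled models.

```c
#include <stddef.h>

#define FLOAT_T float

/* Signature of the row-wise functions: (output row, input row). */
typedef FLOAT_T *(*row_function_t)(FLOAT_T *, const FLOAT_T *);

/* Hypothetical helper: applies f to each row of a row-major batch.
 * inputs holds n_rows * n_in values, outputs holds n_rows * n_out values. */
void evaluate_batch(row_function_t f, FLOAT_T *outputs, const FLOAT_T *inputs,
                    size_t n_rows, size_t n_in, size_t n_out)
{
    size_t r;
    for (r = 0; r < n_rows; ++r)
        f(outputs + r * n_out, inputs + r * n_in);  /* one call per row */
}

/* Stand-in for a generated function, used only to exercise the loop:
 * it maps a row of 2 inputs to 1 output, their sum. */
FLOAT_T *example_sum2(FLOAT_T *out, const FLOAT_T *in)
{
    out[0] = in[0] + in[1];
    return out;
}
```

The GAN entry points take a third argument with the random noise, so an analogous loop for them would advance a third pointer by the number of random features at each row.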
To compile the code we simply retrieve the missing information from the loaded models and we call gcc.
The following compiler flags are worth a comment:
* -O3 enables aggressive code optimization, which may slightly change floating-point results and makes the compiled code much harder to inspect with a debugger. For debugging, it should definitely be removed.
* -lm links the standard C math library (declared in the math.h header) and is needed for the tanh, log and exp functions used for activations.
* --shared -fPIC define the target binary as a shared library (instead of a static object or an executable application) to ease importing the models in Gaudi at runtime.

The complete compile command is reported below.
Compilation command:
gcc \
    exported/generated.C \
    gan_pipelines.c \
    -D RESOLUTION_NUM_FEATURES=12 \
    -D RESOLUTION_NUM_OUTPUTS=9 \
    -D RESOLUTION_NUM_RANDOM=128 \
    -D COVARIANCE_NUM_FEATURES=15 \
    -D COVARIANCE_NUM_OUTPUTS=15 \
    -D COVARIANCE_NUM_RANDOM=128 \
    -o ./exported/generated.so \
    -O3 \
    -lm \
    --shared \
    -fPIC
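To ensure that the converted models are reasonably good, we should validate them by comparing their output to that of the original models. Note in particular that they underwent two transformations:
* the collapse of the original model into a Sequential model;
* the conversion of the collapsed model to C code, compiled into the shared library.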
The scikinC package provides a helper class named MLFunction to wrap a function in a compiled C library and evaluate it on numpy arrays. A simple extension is needed to also wrap our custom GAN pipelines, which require passing random features as well. We will name such an extension GanFunction.
To validate the full export mechanism we are going to evaluate the models using real training data, comparing:
* the original model;
* the collapsed model;
* the exported model, evaluated through the MLFunction or GanFunction wrappers.

Note that while for evaluating the models with the keras APIs we will rely on the preprocessed data stored on disk in the Preprocessing notebooks, to evaluate the whole pipeline deployed in C we will first need to apply the inverse preprocessing transformation to these data.
For each output feature, we are drawing histograms:
We expect the comparison between the original and collapsed model to provide perfectly consistent results, barring bugs introduced in the collapse algorithm. So this comparison is a sanity check rather than a real validation. Instead, discrepancies between the keras and C implementations are expected because of floating point algebra and error propagation. Assessing the size of these discrepancies is not trivial, and validating the result requires considering the distributions of both the absolute and relative errors.
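As an illustration of the two quantities mentioned above, the following C sketch computes the per-entry absolute and relative errors between a reference output and a candidate output, ready to be histogrammed. The function name compute_errors and the min_scale parameter are hypothetical choices made for this sketch; min_scale bounds the denominator from below so that reference values close to zero do not produce arbitrarily large relative errors.

```c
#include <stddef.h>

#define FLOAT_T float

/* Hypothetical helper: for each of the n entries, stores
 *   abs_err[i] = |candidate[i] - reference[i]|
 *   rel_err[i] = abs_err[i] / max(|reference[i]|, min_scale)
 * Errors are accumulated in double precision. */
void compute_errors(const FLOAT_T *reference, const FLOAT_T *candidate,
                    double *abs_err, double *rel_err,
                    size_t n, double min_scale)
{
    size_t i;
    for (i = 0; i < n; ++i) {
        double ref = (double)reference[i];
        double diff = (double)candidate[i] - ref;
        double scale = ref < 0 ? -ref : ref;    /* |reference| */
        if (diff < 0) diff = -diff;             /* |difference| */
        if (scale < min_scale) scale = min_scale;
        abs_err[i] = diff;
        rel_err[i] = diff / scale;
    }
}
```

Looking at both arrays matters because a small absolute error on a value near zero can still be a large relative error, and the opposite holds for values in the tails.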
In this notebook we discussed the procedure to export the trained models to a shared C library that can be easily imported at run time in Gaudi-based applications.
The procedure used involves three steps:
Finally, we have validated the procedure by comparing the results obtained running the original, collapsed and exported models on a subset of the training dataset. Studying the distributions we observe that the conversion introduces a small level of discrepancy between the exported and original data, probably because of floating point algebra or small discrepancies in the implementations of the quantile transformation.
Still, the distributions are very well reproduced and the pathological entries are at the per-mille level or less, with relative errors of a few percent that should not compromise the overall quality of the simulation.
The most worrying case is probably the resolution, where in $\mathcal O(10^{-5})$ cases the events are pushed very far from the original value and might end up in unphysical regions. This might require cropping the resolution distribution to avoid outliers.
Tuning the hyperparameters of the quantile transformers is known to have a significant effect on these discrepancies. However, increasing the number of bins also slows down the evaluation, so that a trade-off between speed and "safety" must be found.
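To illustrate where this trade-off comes from, the sketch below evaluates a piecewise-linear map defined by a table of knots, which is the kind of lookup a quantile transformation reduces to once its quantiles are tabulated. This is an assumption made for illustration and not necessarily how scikinC implements the transformation; the function name interpolate_table is hypothetical. A finer table follows the true mapping more closely, but it takes more memory and more search steps per evaluation.

```c
#include <stddef.h>

#define FLOAT_T float

/* Hypothetical helper: maps x through the piecewise-linear function defined
 * by the n knots (xs[k], ys[k]), with xs strictly increasing and n >= 2.
 * Values outside [xs[0], xs[n-1]] are clamped to the edge outputs. */
FLOAT_T interpolate_table(FLOAT_T x, const FLOAT_T *xs, const FLOAT_T *ys,
                          size_t n)
{
    size_t lo = 0, hi = n - 1;

    if (x <= xs[0]) return ys[0];
    if (x >= xs[n - 1]) return ys[n - 1];

    /* Binary search keeping the invariant xs[lo] <= x < xs[hi] */
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (xs[mid] <= x)
            lo = mid;
        else
            hi = mid;
    }

    /* Linear interpolation inside the selected interval */
    return ys[lo] + (ys[hi] - ys[lo]) * (x - xs[lo]) / (xs[hi] - xs[lo]);
}
```

The clamping at the two ends also shows one simple way of keeping transformed values inside the range covered by the table, which is related to the cropping of outliers discussed above.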