Validation of the resolution model

This notebook was tested with the LHCb Analysis Facility environment, as made available in the image landerlini/lhcbaf:v0p8.

This notebook is part of a pipeline: it requires the data preprocessed in the Preprocessing-GANs notebook and the model trained in the Covariance notebook.

Environment and libraries

As in the other validation notebooks, we use the GPU to process the data (selections and histogramming), which makes it unavailable for evaluating the DNN models in TensorFlow. For a discussion of this issue, refer to the acceptance-validation notebook.

The libraries for using the GPU for data analysis are:

  • cupy, as a replacement for numpy with operations running on the GPU
  • cudf, as a replacement for pandas with a DataFrame stored on GPU
  • dask to implement lazy operations and streaming data from disk to the GPU memory, on demand.

Loading data

We use a dataset statistically equivalent to the one used for training, but never loaded in the training notebook. Our custom FeatherReader helper function converts the data to a dask dataframe as they are read from disk.

Loading model, preprocessing and postprocessing steps

Preprocessing steps are obtained from the same directory as the model with the standard naming convention:

  • tX.pkl for the preprocessing step
  • tY.pkl for the postprocessing step

Please note that tY encodes the transformation from physics variables to normally-distributed preprocessed features. In this notebook we are interested in the inverse transformation, mapping the output of the generator (normally distributed features) to the physics quantities.
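
The direction of the two transformations can be illustrated with a minimal sketch. The `StandardizeStep` class below is a hypothetical stand-in for the object stored in tY.pkl: the only assumption made about the real object is that it exposes `transform` and `inverse_transform` methods in the scikit-learn style. The actual step is likely more elaborate than a simple standardization, and all numbers are illustrative.

```python
import numpy as np


class StandardizeStep:
    """Hypothetical stand-in for the step stored in tY.pkl (assumed interface)."""

    def __init__(self, mean, scale):
        self.mean = np.asarray(mean, dtype=float)
        self.scale = np.asarray(scale, dtype=float)

    def transform(self, y):
        # physics variables -> preprocessed features (direction used in training)
        return (np.asarray(y, dtype=float) - self.mean) / self.scale

    def inverse_transform(self, z):
        # preprocessed features -> physics variables (direction used in validation)
        return np.asarray(z, dtype=float) * self.scale + self.mean


# Illustrative values only; in the notebook the step is loaded from tY.pkl.
tY = StandardizeStep(mean=[-3.0, 0.1], scale=[0.5, 0.2])

physics = np.array([[-2.5, 0.3], [-3.5, -0.1]])
preprocessed = tY.transform(physics)            # what the generator is trained to produce
recovered = tY.inverse_transform(preprocessed)  # what this notebook compares to the reference
```

Applying `inverse_transform` to the generator output plays the role of the postprocessing step described above.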

Processing pipeline

We define a lazy pipeline using dask to:

  • load the data from disk (implemented by FeatherReader)
  • transform both the preprocessed conditions and target features into the original physics quantities
  • generate the random noise
  • evaluate the generator on the preprocessed conditions and generated noise
  • apply the inverse preprocessing step to the generated features to retrieve physics quantities (postprocessing)
  • upload the resulting dataset to device memory for further processing
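
The steps above can be sketched with plain Python generators standing in for dask, so that nothing is computed until a chunk is requested. This is only an illustration under stated assumptions: `read_chunks` replaces FeatherReader, `toy_generator` replaces the trained model, the identity function replaces the inverse preprocessing steps, and `NOISE_DIM` is a hypothetical noise size. The final upload to device memory is omitted.

```python
import numpy as np

NOISE_DIM = 4  # hypothetical size of the generator noise input


def read_chunks(n_chunks, chunk_size, n_cond, n_out, rng):
    """Stand-in for FeatherReader: lazily yields (conditions, targets) chunks."""
    for _ in range(n_chunks):
        yield (rng.normal(size=(chunk_size, n_cond)),
               rng.normal(size=(chunk_size, n_out)))


def run_pipeline(chunks, generator, inverse_x, inverse_y, rng):
    """Lazily process chunks; each one is computed only when requested."""
    for x_pre, y_pre in chunks:
        # generate the random noise, one row per track
        noise = rng.normal(size=(len(x_pre), NOISE_DIM))
        # evaluate the generator on preprocessed conditions and noise
        y_gen_pre = generator(x_pre, noise)
        # map conditions, reference and generated features back to physics quantities
        yield {
            "conditions": inverse_x(x_pre),
            "reference": inverse_y(y_pre),
            "generated": inverse_y(y_gen_pre),
        }


def toy_generator(x, noise):
    """Hypothetical generator: any callable taking (conditions, noise) would do."""
    return x[:, :2] + 0.1 * noise[:, :2]


def identity(a):
    """Stand-in for the inverse preprocessing steps."""
    return a


rng = np.random.default_rng(0)
chunks = read_chunks(n_chunks=3, chunk_size=5, n_cond=3, n_out=2, rng=rng)
results = list(run_pipeline(chunks, toy_generator, identity, identity, rng))
```

Generating the noise inside the loop keeps the per-chunk work self-contained, so each chunk can be processed independently.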

Adding variables to the dataframe

We append to the pipeline the computation of variables that we wish to use for binning the dataset but that do not enter the training dataset directly.

The computed variables include:

  • the components of the momentum (from momentum and the slopes)
  • the pseudorapidity $\eta$
  • the azimuthal angle $\phi$
  • the relative error on the momentum $\frac{\Delta p}{p}$
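
As an illustration of the first three items, the sketch below derives the momentum components, $\eta$ and $\phi$ with NumPy. It assumes that the slopes are defined as $t_x = p_x/p_z$ and $t_y = p_y/p_z$ and that `p` is the momentum magnitude. The argument names are hypothetical and do not necessarily match the column names used in the notebook. The relative momentum error is left out because its definition is not spelled out here.

```python
import numpy as np


def kinematic_variables(p, tx, ty):
    """Derive px, py, pz, eta and phi from the momentum and the track slopes.

    Assumes tx = px / pz and ty = py / pz (argument names are illustrative).
    """
    p = np.asarray(p, dtype=float)
    tx = np.asarray(tx, dtype=float)
    ty = np.asarray(ty, dtype=float)

    # p^2 = pz^2 * (1 + tx^2 + ty^2)
    pz = p / np.sqrt(1.0 + tx**2 + ty**2)
    px = tx * pz
    py = ty * pz

    # eta = artanh(pz / p), equivalent to -ln(tan(theta / 2))
    eta = np.arctanh(pz / p)
    # azimuthal angle in the transverse plane
    phi = np.arctan2(py, px)

    return {"px": px, "py": py, "pz": pz, "eta": eta, "phi": phi}


variables = kinematic_variables(p=[10.0], tx=[0.1], ty=[0.0])
```

With cupy, the same expressions could be evaluated on the GPU by swapping the array module, since cupy mirrors the NumPy functions used here.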

Distribution of the output features in kinematic bins

The following histograms compare the distributions of the generated and reference output features in kinematic bins. The comparison is split by particle type (electron, muon and hadron) and track class (long, upstream and downstream), resulting in nine comparisons for each output feature.

Long tracks

Particle type: electron

Variable: log_cov_ClosestToBeam_0_0 for electrons with long tracks

Variable: log_cov_ClosestToBeam_1_1 for electrons with long tracks

Variable: log_cov_ClosestToBeam_2_2 for electrons with long tracks

Variable: log_cov_ClosestToBeam_3_3 for electrons with long tracks

Variable: log_cov_ClosestToBeam_4_4 for electrons with long tracks

Variable: corr_ClosestToBeam_0_1 for electrons with long tracks

Variable: corr_ClosestToBeam_0_2 for electrons with long tracks

Variable: corr_ClosestToBeam_1_2 for electrons with long tracks

Variable: corr_ClosestToBeam_0_3 for electrons with long tracks

Variable: corr_ClosestToBeam_1_3 for electrons with long tracks

Variable: corr_ClosestToBeam_2_3 for electrons with long tracks

Variable: corr_ClosestToBeam_0_4 for electrons with long tracks

Variable: corr_ClosestToBeam_1_4 for electrons with long tracks

Variable: corr_ClosestToBeam_2_4 for electrons with long tracks

Variable: corr_ClosestToBeam_3_4 for electrons with long tracks

Particle type: muon

Variable: log_cov_ClosestToBeam_0_0 for muons with long tracks

/tmp/ipykernel_45694/830466830.py:49: RuntimeWarning: ks_2samp: Exact calculation unsuccessful. Switching to method=asymp.
  *ks_2samp(bin_df[label].values.get(), bin_df[f"predicted_{label}"].values.get())])

Variable: log_cov_ClosestToBeam_1_1 for muons with long tracks

Variable: log_cov_ClosestToBeam_2_2 for muons with long tracks

Variable: log_cov_ClosestToBeam_3_3 for muons with long tracks

Variable: log_cov_ClosestToBeam_4_4 for muons with long tracks

Variable: corr_ClosestToBeam_0_1 for muons with long tracks

Variable: corr_ClosestToBeam_0_2 for muons with long tracks

Variable: corr_ClosestToBeam_1_2 for muons with long tracks

Variable: corr_ClosestToBeam_0_3 for muons with long tracks

Variable: corr_ClosestToBeam_1_3 for muons with long tracks

Variable: corr_ClosestToBeam_2_3 for muons with long tracks

Variable: corr_ClosestToBeam_0_4 for muons with long tracks

Variable: corr_ClosestToBeam_1_4 for muons with long tracks

Variable: corr_ClosestToBeam_2_4 for muons with long tracks

Variable: corr_ClosestToBeam_3_4 for muons with long tracks

Particle type: hadron

Variable: log_cov_ClosestToBeam_0_0 for hadrons with long tracks

Variable: log_cov_ClosestToBeam_1_1 for hadrons with long tracks

Variable: log_cov_ClosestToBeam_2_2 for hadrons with long tracks

Variable: log_cov_ClosestToBeam_3_3 for hadrons with long tracks

Variable: log_cov_ClosestToBeam_4_4 for hadrons with long tracks

Variable: corr_ClosestToBeam_0_1 for hadrons with long tracks

Variable: corr_ClosestToBeam_0_2 for hadrons with long tracks

Variable: corr_ClosestToBeam_1_2 for hadrons with long tracks

Variable: corr_ClosestToBeam_0_3 for hadrons with long tracks

Variable: corr_ClosestToBeam_1_3 for hadrons with long tracks

Variable: corr_ClosestToBeam_2_3 for hadrons with long tracks

Variable: corr_ClosestToBeam_0_4 for hadrons with long tracks

Variable: corr_ClosestToBeam_1_4 for hadrons with long tracks

Variable: corr_ClosestToBeam_2_4 for hadrons with long tracks

Variable: corr_ClosestToBeam_3_4 for hadrons with long tracks

Upstream tracks

Particle type: electron

Variable: log_cov_ClosestToBeam_0_0 for electrons with upstream tracks

Variable: log_cov_ClosestToBeam_1_1 for electrons with upstream tracks

Variable: log_cov_ClosestToBeam_2_2 for electrons with upstream tracks

Variable: log_cov_ClosestToBeam_3_3 for electrons with upstream tracks

Variable: log_cov_ClosestToBeam_4_4 for electrons with upstream tracks

Variable: corr_ClosestToBeam_0_1 for electrons with upstream tracks

Variable: corr_ClosestToBeam_0_2 for electrons with upstream tracks

Variable: corr_ClosestToBeam_1_2 for electrons with upstream tracks

Variable: corr_ClosestToBeam_0_3 for electrons with upstream tracks

Variable: corr_ClosestToBeam_1_3 for electrons with upstream tracks

Variable: corr_ClosestToBeam_2_3 for electrons with upstream tracks

Variable: corr_ClosestToBeam_0_4 for electrons with upstream tracks

Variable: corr_ClosestToBeam_1_4 for electrons with upstream tracks

Variable: corr_ClosestToBeam_2_4 for electrons with upstream tracks

Variable: corr_ClosestToBeam_3_4 for electrons with upstream tracks

Particle type: muon

Variable: log_cov_ClosestToBeam_0_0 for muons with upstream tracks

Variable: log_cov_ClosestToBeam_1_1 for muons with upstream tracks

Variable: log_cov_ClosestToBeam_2_2 for muons with upstream tracks

Variable: log_cov_ClosestToBeam_3_3 for muons with upstream tracks

Variable: log_cov_ClosestToBeam_4_4 for muons with upstream tracks

Variable: corr_ClosestToBeam_0_1 for muons with upstream tracks

Variable: corr_ClosestToBeam_0_2 for muons with upstream tracks

Variable: corr_ClosestToBeam_1_2 for muons with upstream tracks

Variable: corr_ClosestToBeam_0_3 for muons with upstream tracks

Variable: corr_ClosestToBeam_1_3 for muons with upstream tracks

Variable: corr_ClosestToBeam_2_3 for muons with upstream tracks

Variable: corr_ClosestToBeam_0_4 for muons with upstream tracks

Variable: corr_ClosestToBeam_1_4 for muons with upstream tracks

Variable: corr_ClosestToBeam_2_4 for muons with upstream tracks

Variable: corr_ClosestToBeam_3_4 for muons with upstream tracks

Particle type: hadron

Variable: log_cov_ClosestToBeam_0_0 for hadrons with upstream tracks

Variable: log_cov_ClosestToBeam_1_1 for hadrons with upstream tracks

Variable: log_cov_ClosestToBeam_2_2 for hadrons with upstream tracks

Variable: log_cov_ClosestToBeam_3_3 for hadrons with upstream tracks

Variable: log_cov_ClosestToBeam_4_4 for hadrons with upstream tracks

Variable: corr_ClosestToBeam_0_1 for hadrons with upstream tracks

Variable: corr_ClosestToBeam_0_2 for hadrons with upstream tracks

Variable: corr_ClosestToBeam_1_2 for hadrons with upstream tracks

Variable: corr_ClosestToBeam_0_3 for hadrons with upstream tracks

Variable: corr_ClosestToBeam_1_3 for hadrons with upstream tracks

Variable: corr_ClosestToBeam_2_3 for hadrons with upstream tracks

Variable: corr_ClosestToBeam_0_4 for hadrons with upstream tracks

Variable: corr_ClosestToBeam_1_4 for hadrons with upstream tracks

Variable: corr_ClosestToBeam_2_4 for hadrons with upstream tracks

Variable: corr_ClosestToBeam_3_4 for hadrons with upstream tracks

Downstream tracks

Particle type: electron

Variable: log_cov_ClosestToBeam_0_0 for electrons with downstream tracks

Variable: log_cov_ClosestToBeam_1_1 for electrons with downstream tracks

Variable: log_cov_ClosestToBeam_2_2 for electrons with downstream tracks

Variable: log_cov_ClosestToBeam_3_3 for electrons with downstream tracks

Variable: log_cov_ClosestToBeam_4_4 for electrons with downstream tracks

Variable: corr_ClosestToBeam_0_1 for electrons with downstream tracks

Variable: corr_ClosestToBeam_0_2 for electrons with downstream tracks

Variable: corr_ClosestToBeam_1_2 for electrons with downstream tracks

Variable: corr_ClosestToBeam_0_3 for electrons with downstream tracks

Variable: corr_ClosestToBeam_1_3 for electrons with downstream tracks

Variable: corr_ClosestToBeam_2_3 for electrons with downstream tracks

Variable: corr_ClosestToBeam_0_4 for electrons with downstream tracks

Variable: corr_ClosestToBeam_1_4 for electrons with downstream tracks

Variable: corr_ClosestToBeam_2_4 for electrons with downstream tracks

Variable: corr_ClosestToBeam_3_4 for electrons with downstream tracks

Particle type: muon

Variable: log_cov_ClosestToBeam_0_0 for muons with downstream tracks

Variable: log_cov_ClosestToBeam_1_1 for muons with downstream tracks

Variable: log_cov_ClosestToBeam_2_2 for muons with downstream tracks

Variable: log_cov_ClosestToBeam_3_3 for muons with downstream tracks

Variable: log_cov_ClosestToBeam_4_4 for muons with downstream tracks

Variable: corr_ClosestToBeam_0_1 for muons with downstream tracks

Variable: corr_ClosestToBeam_0_2 for muons with downstream tracks

Variable: corr_ClosestToBeam_1_2 for muons with downstream tracks

Variable: corr_ClosestToBeam_0_3 for muons with downstream tracks

Variable: corr_ClosestToBeam_1_3 for muons with downstream tracks

Variable: corr_ClosestToBeam_2_3 for muons with downstream tracks

Variable: corr_ClosestToBeam_0_4 for muons with downstream tracks

Variable: corr_ClosestToBeam_1_4 for muons with downstream tracks

Variable: corr_ClosestToBeam_2_4 for muons with downstream tracks

Variable: corr_ClosestToBeam_3_4 for muons with downstream tracks

Particle type: hadron

Variable: log_cov_ClosestToBeam_0_0 for hadrons with downstream tracks

Variable: log_cov_ClosestToBeam_1_1 for hadrons with downstream tracks

Variable: log_cov_ClosestToBeam_2_2 for hadrons with downstream tracks

Variable: log_cov_ClosestToBeam_3_3 for hadrons with downstream tracks

Variable: log_cov_ClosestToBeam_4_4 for hadrons with downstream tracks

Variable: corr_ClosestToBeam_0_1 for hadrons with downstream tracks

Variable: corr_ClosestToBeam_0_2 for hadrons with downstream tracks

Variable: corr_ClosestToBeam_1_2 for hadrons with downstream tracks

Variable: corr_ClosestToBeam_0_3 for hadrons with downstream tracks

Variable: corr_ClosestToBeam_1_3 for hadrons with downstream tracks

Variable: corr_ClosestToBeam_2_3 for hadrons with downstream tracks

Variable: corr_ClosestToBeam_0_4 for hadrons with downstream tracks

Variable: corr_ClosestToBeam_1_4 for hadrons with downstream tracks

Variable: corr_ClosestToBeam_2_4 for hadrons with downstream tracks

Variable: corr_ClosestToBeam_3_4 for hadrons with downstream tracks

Kolmogorov-Smirnov distance as a quality metric

To assess the overall quality of the dataset, we use the KS distance as computed from the histograms discussed above. For each kinematic bin with a sufficient number of samples (100 or more) and for each output feature, we compute the KS distance. The KS distances are then used to fill a histogram that gives a visual summary of how well the model fits the reference sample across the kinematic region covered by the training sample.
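
For reference, the two-sample KS distance is the largest absolute difference between the empirical cumulative distributions of the two samples. The NumPy sketch below computes it and skips bins with fewer than 100 entries. `scipy.stats.ks_2samp` returns the same statistic together with a p-value. The function names and the dictionary layout of `bins` are assumptions made for illustration.

```python
import numpy as np

MIN_SAMPLES = 100  # bins with fewer entries are skipped


def ks_distance(reference, generated):
    """Two-sample KS distance: max |ECDF_reference - ECDF_generated|."""
    reference = np.sort(np.asarray(reference, dtype=float))
    generated = np.sort(np.asarray(generated, dtype=float))
    # the maximum difference is attained at one of the observed values
    grid = np.concatenate([reference, generated])
    cdf_ref = np.searchsorted(reference, grid, side="right") / reference.size
    cdf_gen = np.searchsorted(generated, grid, side="right") / generated.size
    return float(np.max(np.abs(cdf_ref - cdf_gen)))


def ks_per_bin(bins):
    """Map each bin label to its KS distance, skipping low-statistics bins.

    `bins` maps a label to a (reference, generated) pair of arrays;
    this layout is hypothetical.
    """
    return {
        label: ks_distance(ref, gen)
        for label, (ref, gen) in bins.items()
        if min(len(ref), len(gen)) >= MIN_SAMPLES
    }


example = ks_per_bin({
    "populated": (np.arange(100), np.arange(100)),
    "sparse": (np.array([1.0, 2.0]), np.array([1.0, 2.0])),
})
```

In this example only the "populated" bin is kept, and identical samples give a distance of zero.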

In general, we aim for the lowest possible values of the KS distance. In practice, values below 0.1 are conventionally considered very good and values below 0.3 acceptable. These thresholds may change as the statistics of the test set increase.
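
A compact way to read the resulting histogram is to quote the fraction of KS values falling in each quality band. The sketch below applies the two thresholds quoted above. The function name and the band labels are hypothetical, chosen only for illustration.

```python
import numpy as np


def summarize_ks(ks_values, very_good=0.1, acceptable=0.3):
    """Fraction of KS distances in each quality band (labels are illustrative)."""
    ks = np.asarray(ks_values, dtype=float)
    return {
        "very_good": float(np.mean(ks < very_good)),
        "acceptable": float(np.mean((ks >= very_good) & (ks < acceptable))),
        "above_threshold": float(np.mean(ks >= acceptable)),
    }


summary = summarize_ks([0.05, 0.2, 0.5, 0.08])
```

Keeping the thresholds as arguments makes it easy to tighten them when a larger test set becomes available.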

KS distance across the kinematic space

In the plots below, we report the value of the KS distance as a function of the kinematic bin. The purpose of this visualization is to identify regions where the model performs better and regions where it is weaker. As expected, the model generally performs better in regions where the training dataset is more populated.

Conclusion

In this notebook we discussed the validation procedure of the resolution model, comparing the distributions of the predicted features with the reference test set.

Additional splittings and variables might be included in the future to provide a more comprehensive view of the performance of the model.

To ease the identification of regions where the model is weaker, we plot the KS distance obtained by comparing the generated and reference distributions of each variable in each kinematic bin.