{% include 'head.html' %}

{{ meta.report_title }}{{ meta.report_subtitle }}

Generated on {{ report_creation_datetime.strftime("%d %b %Y, %H:%M") }}   ●   {{ "{:,d}".format(meta.rows_original) }} original samples, {{ "{:,d}".format(meta.rows_synthetic) }} synthetic samples

{% if is_model_report %}
Accuracy
{{html_assets['info.svg']}}
{{ "{:.1%}".format(metrics.accuracy.overall) }}
({{ "{:.1%}".format(metrics.accuracy.overall_max) }})
Univariate {{ "{:.1%}".format(metrics.accuracy.univariate) }} {% if metrics.accuracy.univariate_max is not none %}
({{ "{:.1%}".format(metrics.accuracy.univariate_max) }})
{% endif %}
{% if 'bivariate' in accuracy_table_by_column %}
Bivariate {{ "{:.1%}".format(metrics.accuracy.bivariate) }} {% if metrics.accuracy.bivariate_max is not none %}
({{ "{:.1%}".format(metrics.accuracy.bivariate_max) }})
{% endif %}
{% endif %}
{% if 'trivariate' in accuracy_table_by_column %}
Trivariate {{ "{:.1%}".format(metrics.accuracy.trivariate) }} {% if metrics.accuracy.trivariate_max is not none %}
({{ "{:.1%}".format(metrics.accuracy.trivariate_max) }})
{% endif %}
{% endif %}
{% if 'coherence' in accuracy_table_by_column %}
Coherence {{ "{:.1%}".format(metrics.accuracy.coherence).replace('nan%', '-') }} {% if metrics.accuracy.coherence_max is not none %}
({{ "{:.1%}".format(metrics.accuracy.coherence_max) }})
{% endif %}
{% endif %}
Similarity
{{html_assets['info.svg']}}
Cosine Similarity {{ "{:.5f}".format(metrics.similarity.cosine_similarity_training_synthetic) }} {% if metrics.similarity.cosine_similarity_training_holdout is not none %}
({{ "{:.5f}".format(metrics.similarity.cosine_similarity_training_holdout) }})
{% endif %}
Discriminator AUC {{ "{:.1%}".format(metrics.similarity.discriminator_auc_training_synthetic) }}
(50.0%)
Distances
{{html_assets['info.svg']}}
Identical Matches {{ "{:.1%}".format(metrics.distances.ims_training) }} {% if metrics.distances.ims_holdout is not none %}
({{ "{:.1%}".format(metrics.distances.ims_holdout) }})
{% endif %}
Average Distances {{ "{:.3f}".format(metrics.distances.dcr_training) }} {% if metrics.distances.dcr_holdout is not none %}
({{ "{:.3f}".format(metrics.distances.dcr_holdout) }})
{% endif %}
{% if metrics.distances.dcr_share is not none %}
DCR Share {{ "{:.1%}".format(metrics.distances.dcr_share) }}
(50.0%)
{% endif %}
{% if metrics.distances.nndr_holdout is not none %}
NNDR Ratio {{ "{:.3f}".format(metrics.distances.nndr_training / metrics.distances.nndr_holdout) }}
(1.000)
{% endif %}
{% endif %}

Correlations

{{ correlation_matrix_html_chart }}

Univariate Distributions

{% for uni_plots_row in univariate_html_charts | batch(3, ' ') %}
{% for uni_plot in uni_plots_row %}
{{ uni_plot }}
{% endfor %}
{% endfor %}
{% if bivariate_html_charts_tgt | length > 0 %}

Bivariate Distributions

{% for biv_plots_row in bivariate_html_charts_tgt | batch(3, ' ') %}
{% for biv_plot in biv_plots_row %}
{{ biv_plot }}
{% endfor %}
{% endfor %}
{% endif %} {% if bivariate_html_charts_ctx | length > 0 %}

Bivariate Distributions for Context

{% for biv_plots_row in bivariate_html_charts_ctx | batch(3, ' ') %}
{% for biv_plot in biv_plots_row %}
{{ biv_plot }}
{% endfor %}
{% endfor %}
{% endif %} {% if 'coherence' in accuracy_table_by_column %}

Coherence: Auto-correlations

{% for biv_plots_row in bivariate_html_charts_nxt | batch(3, ' ') %}
{% for biv_plot in biv_plots_row %}
{{ biv_plot }}
{% endfor %}
{% endfor %}
{% endif %} {% if sequences_per_distinct_category_html_charts | length > 0 %}

Coherence: Sequences per Distinct Category

{% for seq_per_cat_plots_row in sequences_per_distinct_category_html_charts | batch(3, ' ') %}
{% for seq_per_cat_plot in seq_per_cat_plots_row %}
{{ seq_per_cat_plot }}
{% endfor %}
{% endfor %}
{% endif %} {% if distinct_categories_per_sequence_html_charts | length > 0 %}

Coherence: Distinct Categories per Sequence

{% for cats_per_seq_plots_row in distinct_categories_per_sequence_html_charts | batch(3, ' ') %}
{% for cats_per_seq_plot in cats_per_seq_plots_row %}
{{ cats_per_seq_plot }}
{% endfor %}
{% endfor %}
{% endif %} {% if is_model_report %}

Accuracy

Column Univariate{% if 'bivariate' in accuracy_table_by_column %} Bivariate{% endif %}{% if 'trivariate' in accuracy_table_by_column %} Trivariate{% endif %}{% if 'coherence' in accuracy_table_by_column %} Coherence{% endif %}
{% for key, row in accuracy_table_by_column.iterrows() %}
{{ row['column'] }} {{ "{:.1%}".format(row['univariate']) }}{% if 'bivariate' in accuracy_table_by_column %} {{ "{:.1%}".format(row['bivariate']) }}{% endif %}{% if 'trivariate' in accuracy_table_by_column %} {{ "{:.1%}".format(row['trivariate']) }}{% endif %}{% if 'coherence' in accuracy_table_by_column %} {{ "{:.1%}".format(row['coherence']).replace('nan%', '-') }}{% endif %}
{% endfor %}
Total
{{ "{:.1%}".format(metrics.accuracy.univariate) }}
({{ "{:.1%}".format(metrics.accuracy.univariate_max) }})
{% if 'bivariate' in accuracy_table_by_column %}
{{ "{:.1%}".format(metrics.accuracy.bivariate) }}
({{ "{:.1%}".format(metrics.accuracy.bivariate_max) }})
{% endif %}
{% if 'trivariate' in accuracy_table_by_column %}
{{ "{:.1%}".format(metrics.accuracy.trivariate) }}
({{ "{:.1%}".format(metrics.accuracy.trivariate_max) }})
{% endif %}
{% if 'coherence' in accuracy_table_by_column %}
{{ "{:.1%}".format(metrics.accuracy.coherence) }}
({{ "{:.1%}".format(metrics.accuracy.coherence_max) }})
{% endif %}
{{ accuracy_matrix_html_chart }}

{{html_assets['explainer.svg']}}
Explainer
Accuracy of synthetic data is assessed by comparing the distributions of the synthetic data (shown in green) and the original data (shown in gray). For each distribution plot, we sum the deviations across all categories to obtain the total variation distance (TVD). The accuracy is then reported as 100% - TVD. These accuracies are calculated for all univariate, bivariate, and trivariate distributions, and the final accuracy score is the average across all of them.
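As a rough illustration of the calculation described above, the following Python sketch computes the accuracy for a single categorical distribution. It assumes the common convention that TVD is half the sum of absolute differences in relative frequencies; the function name `tvd_accuracy` and the sample data are made up for this example and are not taken from the implementation behind this report.

```python
from collections import Counter


def tvd_accuracy(original, synthetic):
    """Return 1 - TVD between the category frequencies of two samples."""
    orig_freq = Counter(original)
    syn_freq = Counter(synthetic)
    categories = set(orig_freq) | set(syn_freq)
    # Assumption: TVD = 0.5 * sum of absolute differences in relative frequency.
    tvd = 0.5 * sum(
        abs(orig_freq[c] / len(original) - syn_freq[c] / len(synthetic))
        for c in categories
    )
    return 1.0 - tvd


# Toy data, invented for illustration only.
original = ["a", "a", "b", "c"]
synthetic = ["a", "b", "b", "c"]
accuracy = tvd_accuracy(original, synthetic)
print(round(accuracy, 4))
```

In this toy case, the relative frequencies differ by 0.25 for two of the three categories, so the TVD is 0.25 and the accuracy is 75%.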
{% endif %} {% if similarity_pca_html_chart %}

Similarity

{{ similarity_pca_html_chart }}

{{html_assets['explainer.svg']}}
Explainer
These plots show the first 3 principal components of training samples, synthetic samples, and (if available) holdout samples within the embedding space. The black dots mark the centroids of the respective samples. The similarity metric measures the cosine similarity between these centroids. We expect the cosine similarity between the training and synthetic centroids to be close to 1 and comparable to the value for training vs. holdout, indicating that the synthetic samples are as similar to the training samples as the holdout samples are.
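The centroid comparison can be sketched in a few lines of plain Python. This is only an illustration under simple assumptions: the embeddings are given as lists of equal-length vectors, and the helper names `centroid` and `cosine_similarity` as well as the data are hypothetical.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy embeddings, invented for illustration only.
training = [[1.0, 0.0], [0.0, 1.0]]
synthetic = [[2.0, 0.0], [0.0, 2.0]]
similarity = cosine_similarity(centroid(training), centroid(synthetic))
print(round(similarity, 5))
```

Because cosine similarity depends only on direction, the two centroids here point the same way and the value is 1 even though the synthetic vectors are twice as long.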
{% endif %} {% if is_model_report %}

Distances

Synthetic vs. Training{% if metrics.distances.ims_holdout is not none %} Synthetic vs. Holdout Training vs. Holdout{% endif %}
Identical Matches {{ "{:.1%}".format(metrics.distances.ims_training) }}{% if metrics.distances.ims_holdout is not none %} {{ "{:.1%}".format(metrics.distances.ims_holdout) }} {{ "{:.1%}".format(metrics.distances.ims_trn_hol) if metrics.distances.ims_trn_hol is not none else "N/A" }}{% endif %}
DCR Average {{ "{:.3f}".format(metrics.distances.dcr_training) }}{% if metrics.distances.dcr_holdout is not none %} {{ "{:.3f}".format(metrics.distances.dcr_holdout) }} {{ "{:.3f}".format(metrics.distances.dcr_trn_hol) if metrics.distances.dcr_trn_hol is not none else "N/A" }}{% endif %}
NNDR Min10 {{ "{:.2e}".format(metrics.distances.nndr_training) if metrics.distances.nndr_training < 0.01 else "{:.3f}".format(metrics.distances.nndr_training) }}{% if metrics.distances.nndr_holdout is not none %} {{ "{:.2e}".format(metrics.distances.nndr_holdout) if metrics.distances.nndr_holdout < 0.01 else "{:.3f}".format(metrics.distances.nndr_holdout) }} {{ "{:.2e}".format(metrics.distances.nndr_trn_hol) if metrics.distances.nndr_trn_hol < 0.01 else "{:.3f}".format(metrics.distances.nndr_trn_hol) }}{% endif %}
{% if metrics.distances.dcr_share is not none %}
DCR Share {{ "{:.1%}".format(metrics.distances.dcr_share) }} of synthetic samples are closer to a training than to a holdout sample
{% endif %}
{% if metrics.distances.nndr_holdout is not none %}
NNDR Ratio {{ "{:.3f}".format(metrics.distances.nndr_training / metrics.distances.nndr_holdout) }} = (NNDR Min10 of Synthetic vs. Training) / (NNDR Min10 of Synthetic vs. Holdout)
{% endif %}
{{ distances_dcr_html_chart }}

{{html_assets['explainer.svg']}}
Explainer
Synthetic data should be as close to the original training samples as it is to the original holdout samples, which serve as a reference. This can be checked empirically by measuring the distance from each synthetic sample to its closest original sample, with the training and holdout sets sampled to be of equal size. A green line that is significantly to the left of the dark gray line implies that synthetic samples are closer to the training samples than to the holdout samples, indicating that the model has overfitted to the training data. A green line that overlays the dark gray line confirms that the trained model represents general rules that can be found in the training samples just as well as in the holdout samples. The DCR share is the proportion of synthetic samples that are closer to a training sample than to a holdout sample. Ideally, this value should not significantly exceed 50%, as a higher value could indicate overfitting. The NNDR ratio is the 10th smallest NNDR for synthetic vs. training, divided by the 10th smallest NNDR for synthetic vs. holdout. Ideally, this value should be close to 1, indicating that, in sparse as well as in dense regions, the synthetic samples are just as close to the training samples as to the holdout samples.
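To make the two summary figures more concrete, here is a minimal Python sketch of how a DCR share and an NNDR ratio could be computed. It rests on several assumptions that are not confirmed by this report: Euclidean distance on raw coordinates, NNDR defined as the distance to the nearest neighbor divided by the distance to the second-nearest neighbor, and ties counted as "not closer". All function names and data are hypothetical.

```python
import math


def two_nearest(point, reference):
    """Return the smallest and second-smallest Euclidean distance to reference."""
    dists = sorted(math.dist(point, r) for r in reference)
    return dists[0], dists[1]  # needs at least two reference points


def dcr_share(synthetic, training, holdout):
    """Share of synthetic points strictly closer to training than to holdout."""
    closer = sum(
        1
        for s in synthetic
        if two_nearest(s, training)[0] < two_nearest(s, holdout)[0]
    )
    # Assumption: ties count as "not closer"; the report may treat them differently.
    return closer / len(synthetic)


def nndr_kth_smallest(synthetic, reference, k=10):
    """k-th smallest nearest-neighbor distance ratio (nearest / second nearest)."""
    ratios = []
    for s in synthetic:
        d1, d2 = two_nearest(s, reference)
        ratios.append(d1 / d2)  # assumes d2 > 0
    return sorted(ratios)[k - 1]


# Toy 2-D data, invented for illustration; k=1 because there are only 4 samples.
training = [(0.0, 0.0), (4.0, 0.0)]
holdout = [(0.0, 3.0), (4.0, 3.0)]
synthetic = [(0.0, 1.0), (4.0, 1.0), (0.0, 2.0), (4.0, 2.0)]

share = dcr_share(synthetic, training, holdout)
nndr_ratio = (nndr_kth_smallest(synthetic, training, k=1)
              / nndr_kth_smallest(synthetic, holdout, k=1))
print(share, round(nndr_ratio, 3))
```

In this symmetric toy setup, half of the synthetic points lie closer to the training set and half closer to the holdout set, so the DCR share is 50% and the NNDR ratio is 1, which is the pattern the explainer describes as ideal.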
{% endif %}