Metric: the name of the Metric or Preset you can pass to a Report.
Description: what it does. Complex Metrics link to explainer pages.
Parameters: available options. You can also add conditional Tests to any Metric using standard operators like eq (equal), gt (greater than), etc. How Tests work.
Test defaults: conditions that apply when you invoke Tests but do not set a pass/fail condition yourself.
With reference: if you provide a reference dataset during the Report run, the conditions are set relative to the reference.
No reference: if you do not provide a reference, Tests use fixed heuristics (like "expect no missing values").
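For illustration, here is a minimal sketch of attaching explicit Test conditions to Metrics inside a Report. It assumes a tests= argument on each Metric and condition helpers (eq, lte) importable from evidently.tests; exact import paths and argument names may differ by version, and the data files and column names are placeholders.

```python
import pandas as pd
from evidently import Report
from evidently.metrics import MissingValueCount, MaxValue
from evidently.tests import eq, lte  # condition helpers; assumed import path

current_data = pd.read_csv("current.csv")      # placeholder input
reference_data = pd.read_csv("reference.csv")  # optional reference dataset

report = Report([
    MissingValueCount(column="age", tests=[eq(0)]),  # pass only if there are no missing values
    MaxValue(column="age", tests=[lte(100)]),        # pass only if the max value is <= 100
])
my_eval = report.run(current_data, reference_data)   # omit the reference to fall back to fixed heuristics
```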
Text Evals
Use to summarize results of output-level text or LLM evals.
Data definition: …
TextEvals(). Parameters: optional: …. Test defaults: as in the Metrics included in ValueStats.
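A minimal sketch of summarizing text evals: it assumes the TextEvals Preset is importable from evidently.presets and that eval_dataset is an Evidently Dataset with text Descriptors already computed.

```python
from evidently import Report
from evidently.presets import TextEvals  # assumed import path

# eval_dataset: an Evidently Dataset with text Descriptors already added
report = Report([TextEvals()])
my_eval = report.run(eval_dataset)  # summarizes the results of all computed Descriptors
```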
Columns
Use to aggregate Descriptor results or to check data quality at the column level.
Data definition: map column types.
Value stats
Descriptive statistics.
ValueStats(). Small Preset, column-level. Computes various descriptive stats (min, max, mean, quantiles, most common, etc.). Returns different stats based on the column type (text, categorical, numerical, datetime). Parameters: required: …; optional: …. Test defaults: no reference: as in individual Metrics; with reference: as in individual Metrics.
MinValue(). Column-level. Returns the min value for a given numerical column. Metric result: value. Parameters: required: …; optional: …. Test defaults: no reference: N/A; with reference: fails if the min value differs by more than 10% (+/-).
StdValue(). Column-level. Computes the standard deviation of a given numerical column. Metric result: value. Parameters: required: …; optional: …. Test defaults: no reference: N/A; with reference: fails if the standard deviation differs by more than 10% (+/-).
MeanValue(). Column-level. Computes the mean value of a given numerical column. Metric result: value. Parameters: required: …; optional: …. Test defaults: no reference: N/A; with reference: fails if the mean value differs by more than 10%.
MaxValue(). Column-level. Computes the max value of a given numerical column. Metric result: value. Parameters: required: …; optional: …. Test defaults: no reference: N/A; with reference: fails if the max value is higher than in the reference.
MedianValue(). Column-level. Computes the median value of a given numerical column. Metric result: value. Parameters: required: …; optional: …. Test defaults: no reference: N/A; with reference: fails if the median value differs by more than 10% (+/-).
QuantileValue(). Column-level. Computes the quantile value of a given numerical column. Defaults to 0.5 if no quantile is specified. Metric result: value. Parameters: required: …; optional: …. Test defaults: no reference: N/A; with reference: fails if the quantile value differs by more than 10% (+/-).
CategoryCount(). Column-level. Counts occurrences of the specified category. Example: CategoryCount(column="city", category="NY"). Metric result: count, share. Parameters: required: …; optional: …. Test defaults: no reference: N/A; with reference: fails if the specified category is not present.
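A sketch of combining column-level value stats in a single Report; the column names are placeholders, and the quantile= argument name is an assumption based on the description above.

```python
from evidently import Report
from evidently.metrics import ValueStats, QuantileValue, CategoryCount

# current_data, reference_data: pandas DataFrames (or Evidently Datasets), as in the first example
report = Report([
    ValueStats(column="age"),                     # descriptive stats for one column
    QuantileValue(column="age", quantile=0.75),   # defaults to the 0.5 quantile if not set
    CategoryCount(column="city", category="NY"),  # count and share of the "NY" category
])
my_eval = report.run(current_data, reference_data)  # with a reference, the "with reference" defaults apply
```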
Column data quality
Column-level data quality metrics.
Data definition: map column types.
MissingValueCount(). Column-level. Counts the number and share of missing values. Metric result: count, share. Parameters: required: …; optional: …. Test defaults: no reference: fails if there are missing values; with reference: fails if the share of missing values is >10% higher.
NewCategoriesCount() (Coming soon). Column-level. Counts new categories compared to the reference (reference required). Metric result: count, share. Parameters: required: …; optional: …. Test defaults: expect 0.
MissingCategoriesCount() (Coming soon). Column-level. Counts missing categories compared to the reference. Metric result: count, share. Parameters: required: …; optional: …. Test defaults: expect 0.
InRangeValueCount(). Column-level. Counts the number and share of values in the set range. Example: InRangeValueCount(column="age", left="1", right="18"). Metric result: count, share. Parameters: required: …; optional: …. Test defaults: no reference: N/A; with reference: fails if the column contains values out of the min-max reference range.
OutRangeValueCount(). Column-level. Counts the number and share of values out of the set range. Metric result: count, share. Parameters: required: …; optional: …. Test defaults: no reference: N/A; with reference: fails if any value is out of the min-max reference range.
InListValueCount(). Column-level. Counts the number and share of values in the set list. Metric result: count, share. Parameters: required: …; optional: …. Test defaults: no reference: N/A; with reference: fails if any value is out of the list.
OutListValueCount(). Column-level. Counts the number and share of values out of the set list. Example: OutListValueCount(column="city", values=["Lon", "NY"]). Metric result: count, share. Parameters: required: …; optional: …. Test defaults: no reference: N/A; with reference: fails if any value is out of the list.
UniqueValueCount(). Column-level. Counts the number and share of unique values. Metric result: values (dict with count, share). Parameters: required: …; optional: …. Test defaults: no reference: N/A; with reference: fails if the share of unique values differs by >10% (+/-).
MostCommonValueCount() (Coming soon). Column-level. Identifies the most common value and provides its count/share. Metric result: value: count, share. Parameters: required: …; optional: …. Test defaults: no reference: fails if the most common value share is ≥80%; with reference: fails if the most common value share differs by >10% (+/-).
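A sketch of column-level data quality checks using the range and list Metrics above; column names and bounds are placeholders.

```python
from evidently import Report
from evidently.metrics import MissingValueCount, InRangeValueCount, OutListValueCount

report = Report([
    MissingValueCount(column="age"),
    InRangeValueCount(column="age", left=1, right=18),       # values inside the [1, 18] range
    OutListValueCount(column="city", values=["Lon", "NY"]),  # values outside the allowed list
])
my_eval = report.run(current_data, reference_data)
```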
Dataset
Use for exploratory data analysis and data quality checks.
Data definition: map column types, ID and timestamp if available.
Dataset stats
Descriptive statistics.
DataSummaryPreset(). Large Preset. Combines DatasetStats and ValueStats for all or specified columns. Metric result: for all Metrics. See the Preset page. Parameters: optional: …. Test defaults: as in individual Metrics.
DatasetStats(). Small Preset. Dataset-level. Calculates descriptive dataset stats, including columns by type, rows, missing values, empty columns, etc. Metric result: for all Metrics. Parameters: none. Test defaults: no reference: as in included Metrics; with reference: as in included Metrics.
RowCount(). Dataset-level. Counts the number of rows. Metric result: value. Parameters: optional: …. Test defaults: no reference: N/A; with reference: fails if the row count differs by >10%.
ColumnCount(). Dataset-level. Counts the number of columns. Metric result: value. Parameters: optional: …. Test defaults: no reference: N/A; with reference: fails if not equal to the reference.
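A sketch of a dataset-level summary; it assumes DataSummaryPreset is importable from evidently.presets.

```python
from evidently import Report
from evidently.presets import DataSummaryPreset  # assumed import path
from evidently.metrics import RowCount, ColumnCount

report = Report([
    DataSummaryPreset(),  # dataset stats plus per-column value stats
    RowCount(),
    ColumnCount(),
])
my_eval = report.run(current_data, reference_data)
```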
Dataset data quality
Dataset-level data quality metrics.
Data definition: map column types, ID and timestamp if available.
ConstantColumnsCount(). Dataset-level. Counts the number of constant columns. Metric result: value. Parameters: optional: …. Test defaults: no reference: fails if there is at least one constant column; with reference: fails if the count is higher than in the reference.
EmptyRowsCount(). Dataset-level. Counts the number of empty rows. Metric result: value. Parameters: optional: …. Test defaults: no reference: fails if there is at least one empty row; with reference: fails if the share differs by >10%.
EmptyColumnsCount(). Dataset-level. Counts the number of empty columns. Metric result: value. Parameters: optional: …. Test defaults: no reference: fails if there is at least one empty column; with reference: fails if the count is higher than in the reference.
DuplicatedRowCount(). Dataset-level. Counts the number of duplicated rows. Metric result: value. Parameters: optional: …. Test defaults: no reference: fails if there is at least one duplicated row; with reference: fails if the share differs by >10% (+/-).
DuplicatedColumnsCount(). Dataset-level. Counts the number of duplicated columns. Metric result: value. Parameters: optional: …. Test defaults: no reference: fails if there is at least one duplicated column; with reference: fails if the count is higher than in the reference.
DatasetMissingValueCount(). Dataset-level. Calculates the number and share of missing values. Displays the number of missing values per column. Metric result: value. Parameters: required: …; optional: …. Test defaults: no reference: fails if there are missing values; with reference: fails if the share is >10% higher than in the reference (+/-).
AlmostEmptyColumnCount() (Coming soon). Dataset-level. Counts almost empty columns (95% empty). Metric result: value. Parameters: optional: …. Test defaults: no reference: fails if there is at least one almost empty column; with reference: fails if the count is higher than in the reference.
AlmostConstantColumnsCount(). Dataset-level. Counts almost constant columns (95% identical values). Metric result: value. Parameters: optional: …. Test defaults: no reference: fails if there is at least one almost constant column; with reference: fails if the count is higher than in the reference.
RowsWithMissingValuesCount() (Coming soon). Dataset-level. Counts rows with missing values. Metric result: value. Parameters: optional: …. Test defaults: no reference: fails if there is at least one row with missing values; with reference: fails if the share differs by >10% (+/-).
ColumnsWithMissingValuesCount(). Dataset-level. Counts columns with missing values. Metric result: value. Parameters: optional: …. Test defaults: no reference: fails if there is at least one column with missing values; with reference: fails if the count is higher than in the reference.
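A sketch of dataset-level quality checks, overriding one Test default with an explicit condition; the eq helper and tests= argument are assumed, as in the intro example.

```python
from evidently import Report
from evidently.metrics import DuplicatedRowCount, ConstantColumnsCount, DatasetMissingValueCount
from evidently.tests import eq  # assumed import path

report = Report([
    DuplicatedRowCount(tests=[eq(0)]),  # explicit condition instead of the default
    ConstantColumnsCount(),
    DatasetMissingValueCount(),
])
my_eval = report.run(current_data, reference_data)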
Data Drift
Use to detect distribution drift for text, tabular, or embeddings data, or over computed text Descriptors. 20+ drift methods are listed separately: text and tabular, embeddings.
Data definition: map column types, ID and timestamp if available.
DataDriftPreset(). Large Preset. Requires reference. Calculates data drift for all or set columns, using the default or set method. Returns a drift score for each column. Visualizes all distributions. Metric result: all Metrics. See the Preset page. Parameters: optional: columns, method, cat_method, num_method, per_column_method, threshold, cat_threshold, num_threshold, per_column_threshold. See drift options. Test defaults: with reference: data drift defaults, depending on the column type. See drift methods.
DriftedColumnsCount(). Dataset-level. Requires reference. Calculates the number and share of drifted columns in the dataset. Each column is tested for drift using the default algorithm or the set method. Returns only the total number of drifted columns. Metric result: count, share. Parameters: optional: columns, method, cat_method, num_method, per_column_method, threshold, cat_threshold, num_threshold, per_column_threshold. See drift options. Test defaults: with reference: fails if 50% of the columns are drifted.
ValueDrift(). Column-level. Requires reference. Calculates data drift for a defined column (numerical, categorical, or text). Visualizes distributions. Metric result: value. Parameters: required: …; optional: see drift options. Test defaults: with reference: data drift defaults, depending on the column type. See drift methods.
MultivariateDrift() (Coming soon). Dataset-level. Requires reference. Computes a single dataset drift score. Default method: share of drifted columns. Metric result: value. Parameters: optional: see drift options. Test defaults: with reference: defaults for the method. See methods.
EmbeddingDrift() (Coming soon). Column-level. Requires reference. Calculates data drift for embeddings. Requires embedding columns set in the data definition. Metric result: value. Parameters: required: see embedding drift options. Test defaults: with reference: defaults for the method. See methods.
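A sketch of a drift check; the parameter names come from the list above, while the specific method identifiers ("psi", "wasserstein") are assumptions, so check the drift options page for the supported values.

```python
from evidently import Report
from evidently.presets import DataDriftPreset  # assumed import path
from evidently.metrics import DriftedColumnsCount, ValueDrift

report = Report([
    DataDriftPreset(cat_method="psi", cat_threshold=0.1),  # override defaults for categorical columns
    DriftedColumnsCount(),
    ValueDrift(column="age", method="wasserstein"),        # per-column drift with a set method
])
my_eval = report.run(current_data, reference_data)  # reference data is required for drift Metrics
```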
Correlations
Use for exploratory data analysis, drift monitoring (correlation changes) or to check alignment between scores (e.g. LLM-based descriptors against human labels).
Data definition: map column types.
DatasetCorrelations() (Coming soon). Calculates the correlations between all or set columns in the dataset. Supported methods: Pearson, Spearman, Kendall, Cramer_V. Parameters: optional: …. Test defaults: N/A.
Correlation() (Coming soon). Calculates the correlation between two defined columns. Parameters: required: …; optional: method (default: pearson; available: pearson, spearman, kendall, cramer_v), test conditions. Test defaults: N/A.
CorrelationChanges() (Coming soon). Dataset-level. Reference required. Checks the number of correlation violations (significant changes in correlation strength between columns) across all or set columns. Parameters: optional: columns, method (default: pearson; available: pearson, spearman, kendall, cramer_v), corr_diff (default: 0.25), test conditions. Test defaults: with reference: fails if at least one correlation violation is detected.
Classification
Use to evaluate quality on a classification task (probabilistic, non-probabilistic, binary and multi-class).
Data definition: map prediction and target columns and classification type.
General
Use for binary classification and aggregated results for multi-class.
ClassificationPreset(). Large Preset with many classification Metrics and visuals. See the Preset page. Metric result: all Metrics. Parameters: optional: probas_threshold. Test defaults: as in individual Metrics.
ClassificationQuality(). Small Preset. Summarizes quality Metrics in a single widget. Metric result: all Metrics. Parameters: optional: probas_threshold. Test defaults: as in individual Metrics.
LabelCount() (Coming soon). Distribution of predicted classes. Can visualize class balance and/or probability distribution. Parameters: required: set at least one visualization: class_balance, prob_distribution; optional: …. Test defaults: N/A.
Accuracy(). Calculates accuracy. Metric result: value. Parameters: optional: …. Test defaults: no reference: fails if lower than the dummy model accuracy; with reference: fails if accuracy differs by >20%.
Precision(). Calculates precision. Visualizations available: Confusion Matrix, PR Curve, PR Table. Metric result: value. Parameters: required: set at least one visualization: conf_matrix, pr_curve, pr_table; optional: probas_threshold (default: None, or 0.5 for probabilistic classification), top_k, test conditions. Test defaults: no reference: fails if precision is lower than the dummy model; with reference: fails if precision differs by >20%.
Recall(). Calculates recall. Visualizations available: Confusion Matrix, PR Curve, PR Table. Metric result: value. Parameters: required: set at least one visualization: conf_matrix, pr_curve, pr_table; optional: …. Test defaults: no reference: fails if lower than the dummy model recall; with reference: fails if recall differs by >20%.
F1Score(). Calculates the F1 score. Metric result: value. Parameters: required: set at least one visualization: conf_matrix; optional: …. Test defaults: no reference: fails if lower than the dummy model F1; with reference: fails if F1 differs by >20%.
TPR(). Calculates the True Positive Rate (TPR). Metric result: value. Parameters: required: set at least one visualization: pr_table; optional: …. Test defaults: no reference: fails if TPR is lower than the dummy model; with reference: fails if TPR differs by >20%.
TNR(). Calculates the True Negative Rate (TNR). Metric result: value. Parameters: required: set at least one visualization: pr_table; optional: …. Test defaults: no reference: fails if TNR is lower than the dummy model; with reference: fails if TNR differs by >20%.
FPR(). Calculates the False Positive Rate (FPR). Metric result: value. Parameters: required: set at least one visualization: pr_table; optional: …. Test defaults: no reference: fails if FPR is higher than the dummy model; with reference: fails if FPR differs by >20%.
FNR(). Calculates the False Negative Rate (FNR). Metric result: value. Parameters: required: set at least one visualization: pr_table; optional: …. Test defaults: no reference: fails if FNR is higher than the dummy model; with reference: fails if FNR differs by >20%.
LogLoss(). Calculates Log Loss. Metric result: value. Parameters: required: set at least one visualization: pr_table; optional: …. Test defaults: no reference: fails if Log Loss is higher than the dummy model (equals 0.5 for a constant model); with reference: fails if Log Loss differs by >20%.
RocAUC(). Calculates ROC AUC. Can visualize the PR curve or table. Metric result: value. Parameters: required: set at least one visualization: pr_table, roc_curve; optional: …. Test defaults: no reference: fails if ROC AUC is ≤0.5; with reference: fails if ROC AUC differs by >20%.
Lift() (Coming soon). Calculates lift. Can visualize the lift curve or table. Metric result: value. Parameters: required: set at least one visualization: lift_table, lift_curve; optional: …. Test defaults: N/A.
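A sketch of a classification Report; it assumes the visualization options (conf_matrix, pr_curve, roc_curve) are boolean arguments and that prediction and target columns are mapped in the data definition.

```python
from evidently import Report
from evidently.metrics import Accuracy, Precision, Recall, RocAUC

report = Report([
    Accuracy(),
    Precision(conf_matrix=True, probas_threshold=0.7),  # visualization flag plus a custom decision threshold
    Recall(pr_curve=True),
    RocAUC(roc_curve=True),
])
my_eval = report.run(current_data, reference_data)
```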
Dummy metrics:
Use these Metrics to get the quality of a dummy model created on the same data (based on heuristics). You can compare your model quality against it to verify that it is better than random. These Metrics serve as a baseline in automated testing.
ClassificationDummyQuality(). Small Preset summarizing the quality of a dummy model. Metric result: all Metrics. Parameters: N/A. Test defaults: N/A.
DummyPrecision(). Calculates precision for a dummy model. Metric result: value. Parameters: N/A. Test defaults: N/A.
DummyRecall(). Calculates recall for a dummy model. Metric result: value. Parameters: N/A. Test defaults: N/A.
DummyF1(). Calculates the F1 score for a dummy model. Metric result: value. Parameters: N/A. Test defaults: N/A.
By label
Use when you have multiple classes and want to evaluate quality separately.
ClassificationQualityByLabel(). Small Preset summarizing classification quality Metrics by label. Metric result: all Metrics. Parameters: none. Test defaults: as in individual Metrics.
PrecisionByLabel(). Calculates precision by label in multi-class classification. Metric result (dict): label: value. Parameters: optional: …. Test defaults: no reference: fails if precision is lower than the dummy model; with reference: fails if precision differs by >20%.
F1ByLabel(). Calculates the F1 score by label in multi-class classification. Metric result (dict): label: value. Parameters: optional: …. Test defaults: no reference: fails if F1 is lower than the dummy model; with reference: fails if F1 differs by >20%.
RecallByLabel(). Calculates recall by label in multi-class classification. Metric result (dict): label: value. Parameters: optional: …. Test defaults: no reference: fails if recall is lower than the dummy model; with reference: fails if recall differs by >20%.
RocAUCByLabel(). Calculates ROC AUC by label in multi-class classification. Metric result (dict): label: value. Parameters: optional: …. Test defaults: no reference: fails if ROC AUC is ≤0.5; with reference: fails if ROC AUC differs by >20%.
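A sketch for multi-class evaluation by label; per the table above, the per-label Metrics take no required arguments, and the import path is assumed.

```python
from evidently import Report
from evidently.metrics import PrecisionByLabel, RecallByLabel, RocAUCByLabel

report = Report([
    PrecisionByLabel(),  # returns a dict of label: value
    RecallByLabel(),
    RocAUCByLabel(),
])
my_eval = report.run(current_data, reference_data)
```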
Regression
Use to evaluate the quality of a regression model.
Data definition: map prediction and target columns.
RegressionPreset(). Large Preset. Includes a wide range of regression Metrics with rich visuals. See the Preset page. Metric result: all Metrics. Parameters: none. Test defaults: as in individual Metrics.
RegressionQuality(). Small Preset. Summarizes key regression Metrics in a single widget. Metric result: all Metrics. Parameters: none. Test defaults: as in individual Metrics.
MeanError(). Calculates the mean error. Visualizations available: Error Plot, Error Distribution, Error Normality. Metric result: mean_error, error_std. Parameters: required: set at least one visualization: error_plot, error_distr, error_normality; optional: …. Test defaults: no reference/with reference: expect the mean error to be near zero; fails if the mean error is skewed and the condition eq = approx(absolute=0.1 * error_std) is violated.
MAE(). Calculates Mean Absolute Error (MAE). Visualizations available: Error Plot, Error Distribution, Error Normality. Metric result: mean_absolute_error, absolute_error_std. Parameters: required: set at least one visualization: error_plot, error_distr, error_normality; optional: …. Test defaults: no reference: fails if MAE is higher than the dummy model predicting the median target value; with reference: fails if MAE differs by >10%.
RMSE(). Calculates Root Mean Square Error (RMSE). Metric result: rmse. Parameters: optional: …. Test defaults: no reference: fails if RMSE is higher than the dummy model predicting the mean target value; with reference: fails if RMSE differs by >10%.
MAPE(). Calculates Mean Absolute Percentage Error (MAPE). Visualizations available: Percentage Error Plot. Metric result: mean_perc_absolute_error, perc_absolute_error_std. Parameters: required: set at least one visualization: perc_error_plot; optional: …. Test defaults: no reference: fails if MAPE is higher than the dummy model predicting the weighted median target value; with reference: fails if MAPE differs by >10%.
R2Score(). Calculates R² (coefficient of determination). Metric result: r2score. Parameters: optional: …. Test defaults: no reference: fails if R² is ≤0; with reference: fails if R² differs by >10%.
AbsMaxError(). Calculates the Absolute Maximum Error. Metric result: abs_max_error. Parameters: optional: …. Test defaults: no reference: fails if the absolute maximum error is higher than the dummy model predicting the median target value; with reference: fails if it differs by >10%.
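A sketch of a regression Report; the error_plot visualization flag is assumed to be a boolean argument, as suggested by the parameters above.

```python
from evidently import Report
from evidently.metrics import MAE, RMSE, R2Score

report = Report([
    MAE(error_plot=True),  # at least one visualization must be set
    RMSE(),
    R2Score(),
])
my_eval = report.run(current_data, reference_data)
```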
Dummy metrics:
Use these Metrics to get the baseline quality for regression: they use optimal constants (the constant varies by Metric). These Metrics serve as a baseline in automated testing.
RegressionDummyQuality(). Small Preset summarizing the quality of a dummy model. Metric result: all Metrics. Parameters: N/A. Test defaults: N/A.
DummyMeanError(). Calculates the Mean Error for a dummy model. Metric result: mean_error, error_std. Parameters: N/A. Test defaults: N/A.
DummyMAE(). Calculates Mean Absolute Error (MAE) for a dummy model. Metric result: mean_absolute_error, absolute_error_std. Parameters: N/A. Test defaults: N/A.
DummyMAPE(). Calculates Mean Absolute Percentage Error (MAPE) for a dummy model. Metric result: mean_perc_absolute_error, perc_absolute_error_std. Parameters: N/A. Test defaults: N/A.
DummyRMSE(). Calculates Root Mean Square Error (RMSE) for a dummy model. Metric result: rmse. Parameters: N/A. Test defaults: N/A.
DummyR2(). Calculates R² (coefficient of determination) for a dummy model. Metric result: r2score. Parameters: N/A. Test defaults: N/A.
Ranking
Use to evaluate ranking, search/retrieval, or recommendations.
Data definition: map prediction and target columns and ranking type. Some Metrics require additional training data.
RecSysPreset(). Large Preset. Includes a range of recommendation system Metrics. See the Preset page. Metric result: all Metrics. Parameters: none. Test defaults: as in individual Metrics.
RecallTopK(). Calculates Recall at the top K retrieved items. Metric result: value. Parameters: required: …; optional: …. Test defaults: no reference: tests if Recall > 0; with reference: fails if Recall differs by >10%.
FBetaTopK(). Calculates the F-beta score at the top K retrieved items. Metric result: value. Parameters: required: …; optional: …. Test defaults: no reference: tests if F-beta > 0; with reference: fails if F-beta differs by >10%.
PrecisionTopK(). Calculates Precision at the top K retrieved items. Metric result: value. Parameters: required: …; optional: …. Test defaults: no reference: tests if Precision > 0; with reference: fails if Precision differs by >10%.
MAP(). Calculates Mean Average Precision (MAP) at the top K retrieved items. Metric result: value. Parameters: required: …; optional: …. Test defaults: no reference: tests if MAP > 0; with reference: fails if MAP differs by >10%.
NDCG(). Calculates Normalized Discounted Cumulative Gain (NDCG) at the top K retrieved items. Metric result: value. Parameters: required: …; optional: …. Test defaults: no reference: tests if NDCG > 0; with reference: fails if NDCG differs by >10%.
MRR(). Calculates Mean Reciprocal Rank (MRR) at the top K retrieved items. Metric result: value. Parameters: required: …; optional: …. Test defaults: no reference: tests if MRR > 0; with reference: fails if MRR differs by >10%.
HitRate(). Calculates the Hit Rate at the top K retrieved items. Metric result: value. Parameters: required: …; optional: …. Test defaults: no reference: tests if the Hit Rate > 0; with reference: fails if the Hit Rate differs by >10%.
ScoreDistribution(). Computes the predicted score entropy (KL divergence). Applies only when the recommendations_type is a score. Metric result: value. Parameters: required: …; optional: …. Test defaults: no reference: …; with reference: ….
Personalization() (Coming soon). Calculates the Personalization score at the top K recommendations. Metric result: value. Parameters: required: …; optional: …. Test defaults: no reference: tests if Personalization > 0; with reference: fails if Personalization differs by >10%.
ARP() (Coming soon). Computes Average Recommendation Popularity (ARP) at the top K recommendations. Requires a training dataset. Metric result: value. Parameters: required: …; optional: …. Test defaults: no reference: tests if ARP > 0; with reference: fails if ARP differs by >10%.
Coverage() (Coming soon). Calculates Coverage at the top K recommendations. Requires a training dataset. Metric result: value. Parameters: required: …; optional: …. Test defaults: no reference: tests if Coverage > 0; with reference: fails if Coverage differs by >10%.
GiniIndex() (Coming soon). Calculates the Gini Index at the top K recommendations. Requires a training dataset. Metric result: value. Parameters: required: …; optional: …. Test defaults: no reference: tests if the Gini Index < 1; with reference: fails if the Gini Index differs by >10%.
Diversity() (Coming soon). Calculates Diversity at the top K recommendations. Requires item features. Metric result: value. Parameters: required: …; optional: …. Test defaults: no reference: tests if Diversity > 0; with reference: fails if Diversity differs by >10%.
Serendipity() (Coming soon). Calculates Serendipity at the top K recommendations. Requires a training dataset. Metric result: value. Parameters: required: …; optional: …. Test defaults: no reference: tests if Serendipity > 0; with reference: fails if Serendipity differs by >10%.
Novelty() (Coming soon). Calculates Novelty at the top K recommendations. Requires a training dataset. Metric result: value. Parameters: required: …; optional: …. Test defaults: no reference: tests if Novelty > 0; with reference: fails if Novelty differs by >10%.
Parameters relevant for RecSys Metrics:
no_feedback_user: bool = False. Specifies whether to include users who did not select any of the items when computing the quality Metric. Default: False.
min_rel_score: Optional[int] = None. Specifies the minimum relevance score to consider relevant when calculating the quality Metrics for non-binary targets (e.g., if the target is a rating or a custom score).
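A sketch of a ranking Report combining top-K Metrics with the parameters above; the k argument name is an assumption, while min_rel_score and no_feedback_user follow the definitions listed here.

```python
from evidently import Report
from evidently.metrics import RecallTopK, PrecisionTopK, NDCG

report = Report([
    RecallTopK(k=5),                      # k (assumed argument name) sets the top-K cutoff
    PrecisionTopK(k=5, min_rel_score=4),  # treat ratings of 4 and above as relevant
    NDCG(k=5, no_feedback_user=True),     # include users who did not select any items
])
my_eval = report.run(current_data, reference_data)
```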