Report. To run a Preset on your data, comparing the current dataset to the reference data:

from evidently import Report
from evidently.presets import DataDriftPreset

report = Report([
    DataDriftPreset(),
])

my_eval = report.run(current, ref)
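
To view the results, render the returned object in a notebook cell or export it. A minimal sketch, assuming the standard export helpers are available in your Evidently version:

my_eval  # renders the Report inline in a notebook

my_eval.save_html("data_drift_report.html")  # export as an HTML file (availability may vary by version)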

Test Suite. To add Tests with explicit pass/fail for each column:

report = Report([
    DataDriftPreset(),
], include_tests=True)

my_eval = report.run(current, ref)
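
To inspect pass/fail outcomes programmatically, you can export the evaluation as a dictionary or JSON. A minimal sketch; the exact result structure may differ across Evidently versions:

results = my_eval.dict()       # Metric values and Test outcomes as a Python dict
results_json = my_eval.json()  # the same content as a JSON string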

Overview

The DataDriftPreset evaluates shifts in data distribution between the two datasets to detect significant changes.

  • Column drift. Checks for shifts in each column. The drift detection method is chosen automatically based on the column type and number of observations.

  • Target / prediction drift. If your dataset includes a Prediction or Target column, it is evaluated together with the other columns.

  • Overall dataset drift. Returns the share of drifting columns in the Dataset. By default, Dataset Drift is detected if at least 50% of columns drift.

The table shows the drifting columns first. You can also choose to sort the rows by the feature name or type, and open up individual columns to see distribution details.

If you choose to enable Tests, you will get an additional Test Suite view:

Data Drift Explainer. Read about Data Drift Methods and the default algorithm.

Use case

You can evaluate data drift in different scenarios.

  • To monitor the ML model performance without ground truth. When you do not have true labels or actuals, you can monitor feature drift and prediction drift to check if the model still operates in a familiar environment. These are proxy metrics. If you detect drift in features or prediction, you can trigger labelling and retraining, or decide to pause and switch to a different decision method.

  • To debug ML model quality decay. If you observe a drop in model quality, you can evaluate Data Drift to explore changes in feature patterns, e.g., to understand a change in the environment or discover the appearance of a new segment.

  • To understand model drift in an offline environment. You can explore the historical data drift to understand past changes and define the optimal drift detection approach and retraining strategy.

  • To decide on the model retraining. Before feeding fresh data into the model, you might want to verify whether it even makes sense. If there is no data drift, the environment is stable, and retraining might not be necessary.

For conceptual explanation, read about Data Drift and Concept Drift. To build intuition about different drift detection methods, check these research blogs: numerical data, embeddings.

Data requirements

  • Input columns. You can provide any input columns. They must be non-empty.

  • Two datasets. You must always pass both: the current one will be compared to the reference.

  • (Optional) Set column types. The Preset evaluates drift for numerical, categorical, or text data. You can specify column types explicitly (recommended). Otherwise Evidently will auto-detect numerical and categorical features. You must always map text data.

Data schema mapping. Use the data definition to map your input data.
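
For example, here is a minimal sketch of explicit type mapping. The column names are placeholders, and the DataDefinition arguments shown are common ones; check the data definition docs for the full list:

from evidently import Dataset, DataDefinition

# Placeholder column names -- replace with your own schema.
definition = DataDefinition(
    numerical_columns=["age", "income"],
    categorical_columns=["plan"],
    text_columns=["feedback"],  # text columns must always be mapped
)

# current_df and ref_df are your pandas DataFrames.
current_data = Dataset.from_pandas(current_df, data_definition=definition)
ref_data = Dataset.from_pandas(ref_df, data_definition=definition)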

Report customization

You have multiple customization options.

Select columns. You can apply Drift Detection only to some columns in the Dataset, for example, only to the important features. Use the columns parameter.
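
For example, to limit drift detection to a few important features (placeholder column names):

report = Report([
    DataDriftPreset(columns=["age", "income", "plan"]),
])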

Change drift parameters. You can modify how drift detection works in several ways, as shown in the sketch after this list:

  • Change methods. Evidently includes a large number of drift detection methods (univariate and multivariate), such as PSI, K-L divergence, Jensen-Shannon distance, and Wasserstein distance. You can also pick a method per column.

  • Change thresholds. You can set different drift detection conditions at the dataset or column level.

  • Implement a custom method. You can implement a custom drift method as a Python function.

Drift detection parameters. Learn about available methods and parameters in Drift Customization.
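
A minimal sketch of such overrides. The parameter names below (method, threshold, drift_share) are assumptions based on common Evidently usage; verify the exact names in Drift Customization for your version:

report = Report([
    DataDriftPreset(
        method="psi",      # assumed name: use PSI for all columns
        threshold=0.3,     # assumed name: per-column drift threshold
        drift_share=0.7,   # assumed name: flag dataset drift at >= 70% drifting columns
    ),
])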

Modify Report composition. You can add other Metrics to the Report to get a more comprehensive evaluation. Here are some recommended options, with a combined sketch after the list.

  • Single out the Target/Prediction column. If you want to evaluate drift in the Prediction column separately, you can add ValueDrift("prediction") to your Report so that you see the drift in this value in a separate widget.

  • Add data quality checks. Add DataSummaryPreset to get descriptive stats and run Tests, like detecting missing values. The data drift check drops nulls (comparing the distributions of non-empty values only), so you may want to run these Tests separately.

  • Check for correlation changes. You can also consider adding checks on changes in correlations between the features.

  • Detect embedding drift. To detect drift in embeddings data, use the EmbeddingDrift Metric.
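
A combined sketch putting these options together, assuming the import paths match your Evidently version:

from evidently import Report
from evidently.metrics import ValueDrift
from evidently.presets import DataDriftPreset, DataSummaryPreset

report = Report([
    ValueDrift("prediction"),  # prediction drift in its own widget
    DataDriftPreset(),
    DataSummaryPreset(),       # descriptive stats and data quality Tests
], include_tests=True)

my_eval = report.run(current, ref)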

Custom Report. Check how to create a Report and add Test conditions.