Data drift
How data drift detection works
In some tests and metrics, Evidently uses the default Data Drift Detection algorithm, which detects distribution drift in individual columns (features, prediction, or target). This page describes how the default algorithm works.
This applies to: DataDriftPreset, ValueDrift, DriftedColumnsCount.
This is an explainer page. For API reference, check the guide on setting data drift parameters.
How it works
Evidently compares the distributions of the values in a given column (or columns) of the two datasets. You should pass these datasets as reference and current. Evidently applies several statistical tests and drift detection methods to detect if the distribution has changed significantly. It returns a “drift detected” or “not detected” result.
There is default logic for choosing the appropriate drift test for each column. It is based on:
- column type: categorical, numerical, text data, or embeddings
- the number of observations in the reference dataset
- the number of unique values in the column (n_unique)
On top of this, you can set a rule to detect dataset-level drift based on the number of drifted columns.
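For illustration, a minimal drift check might look like the sketch below. It assumes the new Evidently API: the example data is hypothetical, run() is assumed to accept pandas DataFrames directly (otherwise wrap them with Dataset.from_pandas()), and the output methods may vary by version.

```python
import pandas as pd
from evidently import Report
from evidently.presets import DataDriftPreset

# Hypothetical example data: reference and current must share the same columns.
reference = pd.DataFrame({"age": [23, 35, 41, 52, 29], "channel": ["web", "app", "web", "app", "web"]})
current = pd.DataFrame({"age": [61, 58, 66, 70, 63], "channel": ["app", "app", "app", "web", "app"]})

# Run the default drift checks: each column is tested with the default method for its type.
report = Report([DataDriftPreset()])
my_eval = report.run(current, reference)

my_eval.save_html("data_drift_report.html")  # save the report, or view it inline in a notebook
```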
Data requirements
Two datasets. You always need to pass two datasets: current (dataset evaluated for drift) and reference (dataset that serves as a benchmark).
Non-empty columns. To evaluate data or prediction drift in the dataset, you need to ensure that the columns you test for drift are not empty. If these columns are empty in either reference or current data, Evidently will not calculate distribution drift and will raise an error.
Empty values. If some columns contain empty or infinite values (+-np.inf), these values will be filtered out when calculating distribution drift in the corresponding column.
By default, drift tests do not react to changes or increases in the number of empty values. Since a high number of nulls can be an important indicator, we recommend running separate tests on the share of nulls in the dataset and/or individual columns. You can choose from several tests.
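As a quick stand-in for such a check (before picking the Evidently test you want), you can inspect the null share with plain pandas; the column names here are hypothetical:

```python
import pandas as pd

current = pd.DataFrame({"age": [23, None, 41, 52, 29], "channel": ["web", None, "web", "app", None]})

# Share of missing values per column and for the whole dataset.
null_share_per_column = current.isna().mean()
dataset_null_share = current.isna().mean().mean()
print(null_share_per_column)
print(f"Dataset-level share of nulls: {dataset_null_share:.2%}")
```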
Dataset drift
With Presets like DatasetDriftPreset() and Metrics like DriftedColumnsCount(), you can also set a rule on top of the individual column drift results to detect dataset-level drift.
For example, you can declare dataset drift if 50% of all features (columns) drifted. In this case, each column in the dataset is tested for drift individually using the default method for its column type. You can specify a custom threshold as a parameter.
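A sketch of setting such a rule is below. The drift_share parameter name is an assumption here; check the data drift parameters guide for the exact signature.

```python
from evidently import Report
from evidently.presets import DataDriftPreset

# Assumed parameter: declare dataset drift if at least 50% of the columns are drifted.
report = Report([DataDriftPreset(drift_share=0.5)])
```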
Tabular data drift
The following defaults apply to tabular data: numerical and categorical columns.
For small data with <= 1000 observations in the reference dataset:
- For numerical columns (n_unique > 5): two-sample Kolmogorov-Smirnov test.
- For categorical columns or numerical columns with n_unique <= 5: chi-squared test.
- For binary categorical features (n_unique <= 2): proportion difference test for independent samples based on Z-score.
All tests use a 0.95 confidence level by default. The drift score is the p-value (<= 0.05 means drift).
For larger data with > 1000 observations in the reference dataset:
- For numerical columns (n_unique > 5): Wasserstein distance.
- For categorical columns or numerical columns with n_unique <= 5: Jensen-Shannon divergence.
All metrics use a threshold of 0.1 by default. The drift score is the distance or divergence value (>= 0.1 means drift).
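To make the two decision rules concrete, here is a small illustration with scipy. This is not Evidently's internal code: in particular, Evidently normalizes the Wasserstein distance so that the 0.1 threshold is scale-independent, which this sketch only approximates by dividing by the reference standard deviation.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
reference = rng.normal(loc=0.0, scale=1.0, size=500)  # reference column values
current = rng.normal(loc=0.3, scale=1.0, size=500)    # current column values

# Small data (<= 1000 reference rows): two-sample KS test, drift if p-value <= 0.05.
p_value = stats.ks_2samp(reference, current).pvalue
print("KS p-value:", p_value, "-> drift" if p_value <= 0.05 else "-> no drift")

# Larger data (> 1000 reference rows): Wasserstein distance, drift if score >= 0.1.
# Rough approximation of the normalization: divide by the reference standard deviation.
norm_wasserstein = stats.wasserstein_distance(reference, current) / np.std(reference)
print("Normed Wasserstein:", norm_wasserstein, "-> drift" if norm_wasserstein >= 0.1 else "-> no drift")
```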
You can modify this drift detection logic. You can select any method available in the library (PSI, K-L divergence, Jensen-Shannon distance, Wasserstein distance, etc.), specify thresholds, or pass a custom test. Read more about data drift parameters and available methods.
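For instance, overriding the method for individual columns might look like this. The column names are hypothetical, and the method names and parameters follow the data drift parameters guide, so verify them against the current API reference.

```python
from evidently import Report
from evidently.metrics import DriftedColumnsCount, ValueDrift

report = Report([
    ValueDrift(column="age", method="psi"),                # PSI for a numerical column
    ValueDrift(column="channel", method="jensenshannon"),  # Jensen-Shannon for a categorical column
    DriftedColumnsCount(),                                 # dataset-level summary
])
```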
Exploring drift. You can see the distribution of each individual column inside the DataDriftPreset or using the ValueDrift metric.
For numerical features, you can also explore the values mapped in a plot.
- The dark green line is the mean, as seen in the reference dataset.
- The green area covers one standard deviation from the mean.
The index is binned into 150 bins, or the timestamp is used if provided.
Text data drift
Text content drift using a domain classifier. Evidently trains a binary classification model to discriminate between data from reference and current distributions.
If the model can confidently identify which text samples belong to the “newer” data, you can consider that the two datasets are significantly different.
You can read more about the domain classifier approach in the paper “Failing Loudly: An Empirical Study of Methods for Detecting Dataset Shift.”
The drift score in this case is the ROC AUC of the resulting classifier.
The default for larger data with > 1000 observations detects drift if the ROC AUC > 0.55. The ROC AUC of the obtained classifier is directly compared against the set ROC AUC threshold. You can set a different threshold as a parameter.
The default for small data with <= 1000 observations detects drift if the ROC AUC of the drift detection classifier > possible ROC AUC of the random classifier at a 95th percentile. This approach protects against false positive drift results for small datasets since we explicitly compare the classifier score against the “best random score” we could obtain.
How this works. The drift score is the ROC-AUC score of the domain classifier computed on a validation dataset. This ROC AUC is compared to the ROC AUC of the random classifier at a set percentile. To ensure the result is statistically meaningful, we repeat the calculation 1000 times with randomly assigned target class probabilities. This produces a distribution with a mean of 0.5. We then take the 95th percentile (default) of this distribution and compare it to the ROC-AUC score of the classifier. If the classifier score is higher, we consider the data drift to be detected. You can also set a different percentile as a parameter.
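To build intuition, here is a simplified sketch of the domain classifier approach using scikit-learn. It is not Evidently's implementation: the vectorizer, classifier, split, and function name are all illustrative choices.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def domain_classifier_drift(reference_texts, current_texts, threshold=0.55, seed=42):
    """Illustrative sketch of the domain classifier idea, not Evidently's exact code."""
    texts = list(reference_texts) + list(current_texts)
    labels = np.array([0] * len(reference_texts) + [1] * len(current_texts))

    # Train a classifier to tell reference samples apart from current samples.
    train_texts, test_texts, y_train, y_test = train_test_split(
        texts, labels, test_size=0.3, random_state=seed, stratify=labels
    )
    vectorizer = TfidfVectorizer(stop_words="english")
    clf = LogisticRegression(max_iter=1000)
    clf.fit(vectorizer.fit_transform(train_texts), y_train)
    scores = clf.predict_proba(vectorizer.transform(test_texts))[:, 1]
    roc_auc = roc_auc_score(y_test, scores)

    # Small-data baseline: 95th percentile of ROC AUC under randomly assigned probabilities.
    rng = np.random.default_rng(seed)
    random_aucs = [roc_auc_score(y_test, rng.random(len(y_test))) for _ in range(1000)]
    random_baseline = np.percentile(random_aucs, 95)

    # Large data: fixed threshold; small data: random-classifier baseline.
    drift_detected = roc_auc > (threshold if len(texts) > 1000 else random_baseline)
    return roc_auc, drift_detected
```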
If drift is detected, Evidently will also calculate the top features of the domain classifier. The resulting output contains specific characteristic words that help identify whether a given sample belongs to reference or current. These words are normalized based on vocabulary, for example, to exclude non-interpretable words such as articles.
Text descriptors drift. If you work with raw text data, you can also check for distribution drift in text descriptors (such as text length). To use this method, first compute the selected text descriptors. Then, use the numerical or categorical drift detection methods as usual.
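For example, assuming a raw text column named "answer" in both datasets, you could compute a simple length descriptor with pandas and test it for drift like any numerical column (using the same assumed API as in the sketch above; Evidently also provides built-in text descriptors):

```python
import pandas as pd
from evidently import Report
from evidently.metrics import ValueDrift

# Hypothetical text data with an "answer" column in both datasets.
reference = pd.DataFrame({"answer": ["short reply", "a somewhat longer reference reply", "ok"]})
current = pd.DataFrame({"answer": ["a much, much longer answer than anything seen before", "fine", "sure"]})

# Compute a simple descriptor (text length), then test it for drift as a numerical column.
reference["answer_length"] = reference["answer"].str.len()
current["answer_length"] = current["answer"].str.len()

report = Report([ValueDrift(column="answer_length")])
my_eval = report.run(current, reference)
```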
Embeddings drift
This method is coming soon to the new Evidently API! Check the old docs for now.
The default embedding drift method is a classifier. Evidently trains a binary classification model to discriminate between data from reference and current distributions.
- The default for small data with <= 1000 observations detects drift if the ROC AUC of the drift detection classifier > possible ROC AUC of the random classifier at a 95th percentile.
- The default for larger data with > 1000 observations detects drift if the ROC AUC > 0.55.
You can choose other embedding drift detection methods, including Euclidean distance, Cosine Similarity, Maximum Mean Discrepancy, and the share of drifted embeddings. You can also specify custom thresholds and parameters such as dimensionality reduction. You must specify this as a parameter.
Resources
To build better intuition for which tests work best in different use cases, you can read our in-depth blogs with experimental code:
Additional links: