Data definition
How to map the input data.
To run evaluations, you must create a Dataset object with a DataDefinition, which maps:
- Column types (e.g., categorical, numerical, text, embeddings).
- Column roles (e.g., id, prediction, target, LLM output, etc.).
This helps Evidently process the data correctly. Some evaluations require certain columns, and will fail if these are missing.
Check data requirements for specific Metrics in the Reference table.
Basic flow
You can create a DataDefinition in Python before generating a Report, or map the columns visually in the Evidently platform.
Import the following modules:
Prepare your data. Load it into a pandas.DataFrame; the structure of the table is flexible.
Create an Evidently Dataset. Create a Dataset object using Dataset.from_pandas, and pass the corresponding data_definition.
For automated mapping, pass an empty DataDefinition() object:
Once you have created the Dataset, you can add text Descriptors and/or get Reports.
Two datasets. If you have two datasets (like current and reference for drift detection), create a Dataset object for each. Both must use an identical data definition.
Automated data definition. You can often use automated mapping, as shown above. In this case, Evidently tries to map columns:
- Based on their type (numerical, categorical).
- By matching column names to known roles (e.g., a column “target” treated as target).
Manual data definition. While automation works in many cases, manual mapping is more accurate. Some evaluations, like text or embedding drift, always require explicit mapping.
This page shows all mapping options. Note that you only need to use the relevant ones. For example, you don’t need columns like target/prediction to run data quality checks.
Column types
Knowing the column type helps Evidently compute correct statistics, visualizations, and pick default tests.
Text data
Map the columns that contain text:
It’s optional but useful. You can generate text descriptors without explicit mapping. But it’s a good idea to map text columns since you may later run other evals which vary by column type (like Dataset Summary).
Tabular data
Map numerical, categorical or datetime columns:
Explicit mapping helps avoid mistakes like misclassifying numerical columns with few unique values as categorical.
If you exclude certain columns in mapping, they’ll be ignored in all evaluations.
Default types
If you do not pass explicit mapping, the following defaults apply:
Column type | Description | Automated mapping |
---|---|---|
numerical_columns | | All columns with numeric types (np.number). |
datetime_columns | | All columns with DateTime format (np.datetime64). |
categorical_columns | | All non-numeric/non-datetime columns. |
text_columns | | No automated mapping. |
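To see what these defaults imply for your own data, you can approximate the rules in plain pandas. This is an illustration of the rules as stated in the table, not Evidently's actual implementation, and the DataFrame is hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one numeric, one datetime, and one string column.
df = pd.DataFrame(
    {
        "age": [25, 40, 31],
        "signup": pd.to_datetime(["2024-01-05", "2024-02-10", "2024-03-15"]),
        "plan": ["free", "pro", "free"],
    }
)

# Numeric dtypes would be treated as numerical columns.
numerical = df.select_dtypes(include=[np.number]).columns.tolist()
# datetime64 dtypes would be treated as datetime columns.
datetime_cols = df.select_dtypes(include=["datetime64"]).columns.tolist()
# Everything else would fall back to categorical; nothing is mapped as text.
categorical = [c for c in df.columns if c not in numerical + datetime_cols]

print(numerical, datetime_cols, categorical)
```

Note that a free-text column would land in the categorical group under these rules, which is one reason to map text columns explicitly.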
ID and timestamp
If you have a timestamp or ID column, it’s useful to identify them.
Column role | Description | Automated mapping |
---|---|---|
id_column | | Column named “id” (TBC) |
timestamp | | Column named “timestamp” (TBC) |
How is timestamp different from datetime_columns?
- DateTime is a column type. You can have many DateTime columns in the dataset. For example, conversation start / end time or features like “date of last contact.”
- Timestamp is a role. You can have a single timestamp column. It often represents the time when a data input was recorded. Use it if you want to see it as index on the plots.
LLM evals
When you generate text descriptors and add them to the dataset, they are automatically mapped as descriptors in Data Definition. This means they will be included in the TextEvals preset or treated as descriptors when you plot them on the dashboard.
However, if you computed some scores or metadata externally and want to treat them as descriptors, you can map them explicitly:
Regression
To run regression quality checks, you must map the columns with:
- Target: actual values.
- Prediction: predicted values.
You can have several regression results in the dataset, for example in the case of multiple regression. To do so, pass the mappings in a list.
Example mapping:
Defaults:
Classification
To run classification checks, you must map the columns with:
- Target: true label.
- Prediction: predicted labels/probabilities.
There are two different mapping options, for binary and multi-class classification. You can also have several classification results in the dataset. (Pass the mappings in a list).
Multiclass
Example mapping:
Available options and defaults:
Binary
Example mapping:
Available options and defaults:
Ranking
RecSys
To evaluate recommender system performance, you must map the columns with:
- Prediction: this could be predicted score or rank.
- Target: relevance labels (e.g., this could be an interaction result like user click or upvote, or a true relevance label).
The target column can contain either:
- a binary label (where 1 is a positive outcome)
- any scores (positive values, where a higher value corresponds to a better match or a more valuable user action).
Here are examples of the expected data inputs.
If the system prediction is a score (expected by default):
user_id | item_id | prediction (score) | target (relevance) |
---|---|---|---|
user_1 | item_1 | 1.95 | 0 |
user_1 | item_2 | 0.8 | 1 |
user_1 | item_3 | 0.05 | 0 |
If the model prediction is a rank:
user_id | item_id | prediction (rank) | target (relevance) |
---|---|---|---|
user_1 | item_1 | 1 | 0 |
user_1 | item_2 | 2 | 1 |
user_1 | item_3 | 3 | 0 |
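The two tables describe the same recommendations in different forms. As an illustration in plain pandas (not an Evidently API), the score-based input can be turned into the rank-based one by ranking scores within each user, with the highest score receiving rank 1:

```python
import pandas as pd

# The score-based example from the first table.
scores = pd.DataFrame(
    {
        "user_id": ["user_1", "user_1", "user_1"],
        "item_id": ["item_1", "item_2", "item_3"],
        "prediction": [1.95, 0.8, 0.05],
        "target": [0, 1, 0],
    }
)

# Rank items per user: a higher score means a better (lower) rank.
ranks = scores.copy()
ranks["prediction"] = (
    scores.groupby("user_id")["prediction"]
    .rank(ascending=False, method="first")
    .astype(int)
)

print(ranks)
```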
Example mapping:
Available options and defaults: