To run evaluations, you must create a Dataset object with a DataDefinition, which maps:

  • Column types (e.g., categorical, numerical, text, embeddings).

  • Column roles (e.g., id, prediction, target, LLM output, etc.).

This helps Evidently process the data correctly. Some evaluations require certain columns, and will fail if these are missing.

Check data requirements for specific Metrics in the Reference table.

Basic flow

You can create a DataDefinition in Python before generating a Report or map the columns visually in the Evidently platform.

Import the following modules:

import pandas as pd

from evidently.future.datasets import Dataset
from evidently.future.datasets import DataDefinition

Prepare your data as a pandas.DataFrame. It can have a flexible structure.

Create an Evidently Dataset. Use Dataset.from_pandas and pass the corresponding data_definition.

For automated mapping, pass an empty DataDefinition() object:

eval_data = Dataset.from_pandas(
    pd.DataFrame(source_df),
    data_definition=DataDefinition()
)

Once you have created the Dataset, you can add text Descriptors and/or get Reports.
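For example, here is a minimal sketch of getting a Report on this Dataset. It assumes the Report API and the DataSummaryPreset live in the matching future namespaces; check your Evidently version for the exact import paths.

from evidently.future.report import Report
from evidently.future.presets import DataSummaryPreset

# eval_data is the Dataset created above
report = Report([DataSummaryPreset()])
my_eval = report.run(eval_data, None)  # single dataset: no reference data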

Two datasets. If you have two datasets (like current and reference data for drift detection), create a Dataset object for each. They must have identical data definitions.
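For example, a sketch reusing one definition for both datasets (current_df and reference_df stand in for your own pandas DataFrames):

definition = DataDefinition()

# Both Dataset objects share the same data definition
current_data = Dataset.from_pandas(
    pd.DataFrame(current_df),
    data_definition=definition
)

reference_data = Dataset.from_pandas(
    pd.DataFrame(reference_df),
    data_definition=definition
)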

Automated data definition. You can often use the automated mapping as shown above. In this case, Evidently tries to map columns:

  • Based on their type (numerical, categorical).

  • By matching column names to known roles (e.g., a column “target” treated as target).

Manual data definition. While automation works in many cases, manual mapping is more accurate. Some evaluations, like text or embedding drift, always require explicit mapping.

This page shows all mapping options. Note that you only need to use the relevant ones. For example, you don’t need columns like target/prediction to run data quality checks.

Column types

Knowing the column type helps Evidently compute correct statistics, visualizations, and pick default tests.

Text data

Map the columns that contain text:

definition = DataDefinition(
    text_columns=["Latest_Review"]
    )
    
eval_data = Dataset.from_pandas(
    pd.DataFrame(source_df),
    data_definition=definition
)

It’s optional but useful. You can generate text descriptors without explicit mapping, but it’s a good idea to map text columns since you may later run other evals that vary by column type (like Dataset Summary).

Tabular data

Map numerical, categorical or datetime columns:

definition = DataDefinition(
    text_columns=["Latest_Review"],
    numerical_columns=["Age", "Salary"],
    categorical_columns=["Department"],
    datetime_columns=["Joining_Date"]
    )
    
eval_data = Dataset.from_pandas(
    pd.DataFrame(source_df),
    data_definition=definition
)

Explicit mapping helps avoid mistakes like misclassifying numerical columns with few unique values as categorical.

If you exclude certain columns from the mapping, they’ll be ignored in all evaluations.

Default types

If you do not pass explicit mapping, the following defaults apply:

Column Type | Description | Automated Mapping
numerical_columns | Columns with numeric values. | All columns with numeric types (np.number).
datetime_columns | Columns with datetime values. Ignored in data drift calculations. | All columns with DateTime format (np.datetime64).
categorical_columns | Columns with categorical values. | All non-numeric, non-datetime columns.
text_columns | Text columns. Mapping required for text data drift detection. | No automated mapping.

ID and timestamp

If you have timestamp or ID columns, it’s useful to identify them.

definition = DataDefinition(
    id_column="Id",
    timestamp="Date"
    )
Column role | Description | Automated mapping
id_column | Identifier column. Ignored in data drift calculations. | Column named “id” (TBC)
timestamp | Timestamp column. Ignored in data drift calculations. | Column named “timestamp” (TBC)

How is timestamp different from datetime_columns?

  • DateTime is a column type. You can have many DateTime columns in the dataset. For example, conversation start / end time or features like “date of last contact.”

  • Timestamp is a role. You can have a single timestamp column. It often represents the time when a data input was recorded. Use it if you want to see it as an index on the plots.
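For example, you can map both in the same definition (the column names here are illustrative):

definition = DataDefinition(
    timestamp="Date",  # single timestamp role, shown as an index on plots
    datetime_columns=["Joining_Date", "Last_Contact_Date"]  # regular datetime features
    )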

LLM evals

When you generate text descriptors and add them to the dataset, they are automatically mapped as descriptors in Data Definition. This means they will be included in the TextEvals preset or treated as descriptors when you plot them on the dashboard.

However, if you computed some scores or metadata externally and want to treat them as descriptors, you can map them explicitly:

definition = DataDefinition(
    numerical_descriptors=["chat_length", "user_rating"],
    categorical_descriptors=["upvotes", "model_type"]
    )
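For example, here is a sketch that maps both a raw text column and externally computed scores, then runs the TextEvals preset mentioned above (assuming TextEvals and Report come from the matching future namespaces):

from evidently.future.report import Report
from evidently.future.presets import TextEvals

definition = DataDefinition(
    text_columns=["Latest_Review"],
    numerical_descriptors=["chat_length", "user_rating"]
    )

eval_data = Dataset.from_pandas(
    pd.DataFrame(source_df),
    data_definition=definition
)

report = Report([TextEvals()])
my_eval = report.run(eval_data, None)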

Regression

To run regression quality checks, you must map the columns with:

  • Target: actual values.

  • Prediction: predicted values.

You can have several regression results in the dataset, for example, when you evaluate several regression models at once. (Pass the mappings in a list.)

Example mapping:

from evidently.future.datasets import Regression

definition = DataDefinition(
    regression=[Regression()]
    )

Defaults:

    target: str = "target"
    prediction: str = "prediction"
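If your columns are named differently, pass the names explicitly (a sketch with illustrative column names):

definition = DataDefinition(
    regression=[Regression(
        target="price",
        prediction="predicted_price")]
    )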

Classification

To run classification checks, you must map the columns with:

  • Target: true label.

  • Prediction: predicted labels/probabilities.

There are two different mapping options, for binary and multiclass classification. You can also have several classification results in the dataset. (Pass the mappings in a list.)

Multiclass

Example mapping:

from evidently.future.datasets import MulticlassClassification

definition = DataDefinition(
    classification=[MulticlassClassification()]
    )

Available options and defaults:

    target: str = "target"
    prediction_labels: str = "prediction"
    prediction_probas: Optional[List[str]] = None # if probabilistic classification
    labels: Optional[Dict[Label, str]] = None
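For example, a probabilistic multiclass sketch, where each class has its own probability column (the column names are illustrative):

definition = DataDefinition(
    classification=[MulticlassClassification(
        target="target",
        prediction_probas=["class_0", "class_1", "class_2"])]  # one column per class
    )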

Binary

Example mapping:

from evidently.future.datasets import BinaryClassification

definition = DataDefinition(
    classification=[BinaryClassification(
        target="target",
        prediction_labels="prediction")],
    categorical_columns=["target", "prediction"])

Available options and defaults:

    target: str = "target"
    prediction_labels: Optional[str] = None
    prediction_probas: Optional[str] = "prediction" # if probabilistic classification
    pos_label: Label = 1 # name of the positive label
    labels: Optional[Dict[Label, str]] = None
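For example, a probabilistic binary sketch, where the prediction column holds the probability of the positive class (the column names are illustrative):

definition = DataDefinition(
    classification=[BinaryClassification(
        target="churn",
        prediction_probas="churn_probability",  # probability of pos_label
        pos_label=1)]
    )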

Ranking

RecSys

To evaluate recommender system performance, you must map the columns with:

  • Prediction: this could be a predicted score or rank.

  • Target: relevance labels (e.g., an interaction result like a user click or upvote, or a true relevance label).

The target column can contain either:

  • a binary label (where 1 is a positive outcome)

  • any scores (positive values, where a higher value corresponds to a better match or a more valuable user action).

Here are examples of the expected data inputs.

If the system prediction is a score (expected by default):

user_id | item_id | prediction (score) | target (relevance)
user_1 | item_1 | 1.95 | 0
user_1 | item_2 | 0.8 | 1
user_1 | item_3 | 0.05 | 0

If the model prediction is a rank:

user_id | item_id | prediction (rank) | target (relevance)
user_1 | item_1 | 1 | 0
user_1 | item_2 | 2 | 1
user_1 | item_3 | 3 | 0

Example mapping:

from evidently.future.datasets import Recsys

definition = DataDefinition(
    ranking=[Recsys()]
    )

Available options and defaults:

    user_id: str = "user_id" # column with user IDs
    item_id: str = "item_id" # column with ranked items
    target: str = "target"
    prediction: str = "prediction"
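If your columns are named differently, pass the names explicitly (illustrative names):

definition = DataDefinition(
    ranking=[Recsys(
        user_id="visitor_id",
        item_id="product_id",
        target="clicked",
        prediction="score")]
    )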