To run a check not available in Evidently, you can implement it as a custom function. Use this for building your own programmatic evaluators.

You can also customize existing evals with parameters, such as defining custom LLM judges or using regex-based metrics like Contains for word lists. See available descriptors.

Pre-requisites:

Imports

import pandas as pd

from evidently.future.datasets import Dataset
from evidently.future.datasets import DataDefinition
from evidently.future.datasets import DatasetColumn
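from evidently.future.datasets import Descriptor
# Needed for the CustomColumnDescriptor and CustomDescriptor classes used below.
from evidently.future.descriptors import CustomColumnDescriptor, CustomDescriptor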
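The snippets in this guide call methods on an `eval_df` object that is not created here. A minimal sketch of one way to build it, assuming the `Dataset.from_pandas` constructor with a default `DataDefinition`. The toy rows and the `build_eval_dataset` helper are hypothetical; only the column names (`question`, `answer`, `target_answer`) come from the examples below.

```python
# Toy rows; the column names match the ones used by the examples in this guide.
ROWS = [
    {"question": "What is 2 + 2?", "answer": "4", "target_answer": "4"},
    {"question": "Capital of France?", "answer": "", "target_answer": "Paris"},
]


def build_eval_dataset(rows):
    """Wrap the rows in an Evidently Dataset (hypothetical helper).

    The imports live inside the function so that this file can be loaded
    even where pandas or evidently is not installed.
    """
    import pandas as pd
    from evidently.future.datasets import Dataset, DataDefinition

    # Assumption: an empty DataDefinition lets Evidently infer column types.
    return Dataset.from_pandas(pd.DataFrame(rows), data_definition=DataDefinition())
```

With Evidently installed, `eval_df = build_eval_dataset(ROWS)` would give an object to pass through the steps that follow.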

Single column check

You can define a CustomColumnDescriptor that will:

  • take any column from your dataset to evaluate each value inside it

  • return a single column with numerical (num) scores or categorical (cat) labels.

Implement it as a Python function that takes a DatasetColumn (a wrapper around a pandas Series, available as its data attribute) as input and returns a transformed DatasetColumn. For example, to check if the column is empty:

def is_empty(data: DatasetColumn) -> DatasetColumn:
    return DatasetColumn(
        type="cat",
        data=pd.Series([
            "EMPTY" if val == "" else "NON EMPTY"
            for val in data.data]))

To use this descriptor on your data:

eval_df.add_descriptors(descriptors=[
    CustomColumnDescriptor("answer", is_empty, alias="is_empty"),
])

Export the results to a pandas DataFrame:

eval_df.as_dataframe()
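The is_empty example returns categorical labels. A numerical score follows the same pattern with type="num". Below is a minimal sketch that counts words per value. The names `count_words` and `word_count` are hypothetical, and the scoring logic is kept as a plain function so it can be tested without Evidently.

```python
def count_words(values):
    """Return the whitespace-separated word count for each value.

    Non-string values (for example None) are counted as 0 words.
    """
    return [len(v.split()) if isinstance(v, str) else 0 for v in values]


def word_count(data):
    """Custom column check: `data` is expected to be an Evidently DatasetColumn."""
    # Imported here so count_words stays usable without pandas or evidently.
    import pandas as pd
    from evidently.future.datasets import DatasetColumn

    return DatasetColumn(type="num", data=pd.Series(count_words(data.data)))
```

It can be registered the same way as is_empty, for instance `CustomColumnDescriptor("answer", word_count, alias="answer_word_count")`.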

Multi-column check

You can alternatively define a CustomDescriptor that:

  • Takes one or many named columns from your dataset,

  • Returns one or many transformed columns.

Pairwise evaluation. For example, to check exact match between target_answer and answer columns, and return a label:

def exact_match(dataset: Dataset) -> DatasetColumn:
    # Element-wise comparison of the two columns yields a boolean Series.
    matches = dataset.column("target_answer").data == dataset.column("answer").data
    return DatasetColumn(
        type="cat",
        data=pd.Series(["MATCH" if val else "MISMATCH" for val in matches]))

To use this descriptor on your data:

eval_df.add_descriptors(descriptors=[
    CustomDescriptor(exact_match, alias="exact"),
])
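Exact string equality is strict: a difference in case or spacing counts as a mismatch. As an illustrative variant (not part of Evidently), the sketch below normalizes both values before comparing them. The helper names `normalize`, `match_labels` and `normalized_match` are hypothetical.

```python
def normalize(text):
    """Lowercase and collapse whitespace; non-string values become ''."""
    return " ".join(text.lower().split()) if isinstance(text, str) else ""


def match_labels(targets, answers):
    """Label each (target, answer) pair after normalization.

    Note: two missing values both normalize to '' and are labeled MATCH.
    """
    return [
        "MATCH" if normalize(t) == normalize(a) else "MISMATCH"
        for t, a in zip(targets, answers)
    ]


def normalized_match(dataset):
    """Multi-column check: `dataset` is expected to be an Evidently Dataset."""
    # Imported here so the pure helpers stay usable without pandas or evidently.
    import pandas as pd
    from evidently.future.datasets import DatasetColumn

    labels = match_labels(
        dataset.column("target_answer").data,
        dataset.column("answer").data,
    )
    return DatasetColumn(type="cat", data=pd.Series(labels))
```

It plugs in like the exact-match check, for instance `CustomDescriptor(normalized_match, alias="normalized_match")`.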

Multiple scores. You can also use CustomDescriptor to run evals for multiple columns and return multiple scores.

As a fun example, let’s reverse the text (character by character) in the question and answer columns:

from typing import Union, Dict

def reverse_text(dataset: Dataset) -> Union[DatasetColumn, Dict[str, DatasetColumn]]:
    return {
        "reversed_question": DatasetColumn(
            type="cat",
            data=pd.Series([
                value[::-1] for value in dataset.column("question").data])),
        "reversed_answer": DatasetColumn(
            type="cat",
            data=pd.Series([
                value[::-1] for value in dataset.column("answer").data]))}

To use this descriptor on your data:

eval_df.add_descriptors(descriptors=[
    CustomDescriptor(reverse_text),
])