Text Evals
Overview of the Text Evals Preset.
To run this Report, first compute descriptors and add them to your Dataset. Check how.
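For instance, descriptors can be computed when you create the Dataset object. A minimal sketch, assuming the current evidently Python API; the data, column name, and descriptor choice are illustrative:

```python
import pandas as pd
from evidently import Dataset, DataDefinition
from evidently.descriptors import Sentiment, TextLength

# Illustrative data: a single text column with LLM outputs.
df = pd.DataFrame({"answer": ["Hello! Happy to help.", "I cannot answer that."]})

# Compute descriptors while creating the Dataset.
eval_data = Dataset.from_pandas(
    df,
    data_definition=DataDefinition(text_columns=["answer"]),
    descriptors=[Sentiment("answer"), TextLength("answer")],
)
```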
Report. To run a Preset on your data for a single current dataset:
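A sketch, assuming `eval_data` is a Dataset with descriptors already computed, as above:

```python
from evidently import Report
from evidently.presets import TextEvals

# Run the Preset on a single current dataset (no reference).
report = Report([TextEvals()])
my_eval = report.run(eval_data, None)
```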
Test Suite. To add pass/fail data quality Tests, auto-generated from the reference dataset:
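A sketch, assuming `ref_data` is a second Dataset with the same descriptors computed; `include_tests` enables the Test Suite view:

```python
# Pass a reference dataset so Test conditions can be derived from it.
report = Report([TextEvals()], include_tests=True)
my_eval = report.run(eval_data, ref_data)
```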
Overview
The TextEvals Preset is a utility that summarizes the results of all descriptors (output-level text evaluations) you computed on your dataset.
It lets you visually explore distributions and capture all relevant statistics at once; the exact statistics vary by descriptor type. If you pass two datasets, you get a side-by-side comparison.
How text and LLM evaluations work. Read about Descriptors, or try a Quickstart.
Test Suite. If you choose to enable Tests, you will get an additional Test Suite view.
- Based on reference dataset. If the reference dataset is provided, conditions like expected descriptor values are derived directly from it.
- Based on heuristics. If there is no reference, some data quality Tests run on heuristics (for example, expecting no missing values).
How Tests work. Read about Tests and check defaults for each Test in the reference table.
Use case
You can use this Preset in different scenarios.
- LLM experiments. Get a visual Report to explore your evaluation results as you experiment with prompts, model versions, etc., and compare different runs against each other.
- LLM observability. Run evaluations on your production data and capture the resulting statistics to track them over time.
Data requirements
- Input dataset with descriptors. Your Dataset must contain already computed descriptors (check how).
- One or two datasets. Pass a single dataset, or two datasets for comparison and to auto-generate Test conditions.
Data schema mapping. Use the data definition to map your input data.
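For example, a hypothetical data definition that marks which columns hold texts and which hold metadata; the column names are illustrative:

```python
from evidently import DataDefinition

# Adjust column names and roles to match your schema.
definition = DataDefinition(
    text_columns=["question", "answer"],
    categorical_columns=["model_version"],
)
```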
Report customization
You have multiple customization options.
Select descriptors. Get stats only for some descriptors in the Dataset. Use the columns parameter.
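A sketch, assuming the descriptor result columns are named "Sentiment" and "TextLength" (the actual names depend on the descriptors and aliases you used):

```python
from evidently import Report
from evidently.presets import TextEvals

# Summarize only the listed descriptor columns.
report = Report([TextEvals(columns=["Sentiment", "TextLength"])])
```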
Customize or set Test conditions. Add your own Test conditions, for example, to get a fail if texts are out of the specified Length Range. Check a Quickstart example.
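For instance, to fail if any text exceeds a length limit, you can attach a condition to a column metric. A sketch, assuming metrics accept a `tests` argument and that "TextLength" is the computed descriptor column:

```python
from evidently import Report
from evidently.metrics import MaxValue
from evidently.presets import TextEvals
from evidently.tests import lte

report = Report([
    TextEvals(),
    # Fail if the longest text exceeds 300 characters.
    MaxValue(column="TextLength", tests=[lte(300)]),
])
```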
Modify Report composition. Add other Metrics to the Report to get a more comprehensive evaluation. For example:
- Correlations. Add a correlations heatmap to see if some descriptor values are connected to others (for example, if certain metrics are always aligned, you may not need both). You can also spot patterns, such as whether descriptor values are connected with any metadata present in the Dataset, like the model type used.
- Data drift. Compute data drift to compare descriptor distributions between two datasets; see the sketch after this list.
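A sketch of the drift option, assuming the `ValueDrift` metric; "Sentiment" stands in for any computed descriptor column:

```python
from evidently import Report
from evidently.metrics import ValueDrift
from evidently.presets import TextEvals

report = Report([
    TextEvals(),
    ValueDrift(column="Sentiment"),
])
# Drift needs two datasets: current and reference.
my_eval = report.run(eval_data, ref_data)
```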