For an intro, read about Core Concepts and check the LLM Quickstart.

Deterministic evals

Programmatic and heuristics-based evaluations.

Pattern match

Check for general pattern matching.

Name | Description | Parameters
ExactMatch()
  • Checks if the contents of the two provided columns match.
  • Returns True/False for every input.
  • Example: ExactMatch(columns=["answer", "target"])
Required:
  • columns
Optional:
  • alias
RegExp()
  • Matches the text against a set regular expression.
  • Returns True/False for every input.
  • Example: RegExp(reg_exp=r"^I")
Required:
  • reg_exp
Optional:
  • alias
BeginsWith()
  • Checks if the text starts with the given prefix.
  • Returns True/False for every input.
  • Example: BeginsWith(prefix="How")
Required:
  • prefix
Optional:
  • alias
  • case_sensitive = True or False
EndsWith()
  • Checks if the text ends with the given suffix.
  • Returns True/False for every input.
  • Example: EndsWith(suffix="Thank you.")
Required:
  • suffix
Optional:
  • alias
  • case_sensitive = True or False
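
To make the pattern-match semantics above concrete, here is a minimal plain-Python sketch of what each check computes for a single input string. It is an illustration only, not the library's implementation: the function names are hypothetical, and the use of `re.search` and `str.casefold` is an assumption about how matching and case-insensitivity could work.

```python
import re


def exact_match(first: str, second: str) -> bool:
    # True only when both values are identical strings.
    return first == second


def regexp_match(text: str, reg_exp: str) -> bool:
    # Assumption: the pattern may match anywhere unless it is anchored,
    # so r"^I" only matches texts that start with "I".
    return re.search(reg_exp, text) is not None


def begins_with(text: str, prefix: str, case_sensitive: bool = True) -> bool:
    # Fold case on both sides when the check is case-insensitive.
    if not case_sensitive:
        text, prefix = text.casefold(), prefix.casefold()
    return text.startswith(prefix)


def ends_with(text: str, suffix: str, case_sensitive: bool = True) -> bool:
    if not case_sensitive:
        text, suffix = text.casefold(), suffix.casefold()
    return text.endswith(suffix)
```

Each function returns a single True/False value, mirroring the per-input result described in the table.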

Content checks

Verify presence of specific words, items or components.

Name | Description | Parameters
Contains()
  • Checks if the text contains any or all specified items (e.g., competitor names).
  • Returns True/False for every input.
  • Example: Contains(items=["chatgpt"])
Required:
  • items: List[str]
Optional:
  • alias
  • mode = any or all
  • case_sensitive = True or False
DoesNotContain()
  • Checks if the text does not contain the specified items (e.g., forbidden expressions).
  • Returns True/False for every input.
  • Example: DoesNotContain(items=["as a large language model"])
Required:
  • items: List[str]
Optional:
  • alias
  • mode = all
  • case_sensitive = True or False
IncludesWords()
  • Checks if the text includes any or all specified words.
  • Considers only vocabulary words.
  • Returns True/False for every input.
  • Example: IncludesWords(words_list=['booking', 'hotel', 'flight'])
Required:
  • words_list: List[str]
Optional:
  • alias
  • mode = any or all
  • lemmatize = True or False
ExcludesWords()
  • Checks if the text excludes all specified words (e.g., profanity lists).
  • Considers only vocabulary words.
  • Returns True/False for every input.
  • Example: ExcludesWords(words_list=['buy', 'sell', 'bet'])
Required:
  • words_list: List[str]
Optional:
  • alias
  • mode = all
  • lemmatize = True or False
ItemMatch()
  • Checks if the text contains any or all specified items.
  • The item list is specific to each row and provided in a separate column.
  • Returns True/False for each row.
  • Example: ItemMatch(["Answer", "Expected_items"])
Required:
  • columns
Optional:
  • alias
  • mode = all or any
  • case_sensitive = True or False
ItemNoMatch()
  • Checks if the text excludes all specified items.
  • The item list is specific to each row and provided in a separate column.
  • Returns True/False for each row.
  • Example: ItemNoMatch(["Answer", "Forbidden_items"])
Required:
  • columns
Optional:
  • alias
  • mode = all
  • case_sensitive = True or False
WordMatch()
  • Checks if the text includes any or all specified words.
  • Word list is specific to each row and provided in a separate column.
  • Considers only vocabulary words.
  • Returns True/False for every input.
  • Example: WordMatch(["Answer", "Expected_words"])
Required:
  • columns
Optional:
  • alias
  • mode = any or all
  • lemmatize = True or False
WordNoMatch()
  • Checks if the text excludes all specified words.
  • Word list is specific to each row and provided in a separate column.
  • Considers only vocabulary words.
  • Returns True/False for every input.
  • Example: WordNoMatch(["Answer", "Forbidden_words"])
Required:
  • columns
Optional:
  • alias
  • mode = all
  • lemmatize = True or False
ContainsLink()
  • Checks if the column contains at least one valid URL.
  • Returns True/False for each row.
Optional:
  • alias
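
The `mode` and `case_sensitive` options in the content checks above can be read as follows. This is a plain-Python sketch with hypothetical function names, using substring matching; it is one interpretation of the table, not the library's code. In particular, it assumes the "does not contain" check with `mode = all` passes only when every listed item is absent.

```python
from typing import List


def contains(text: str, items: List[str], mode: str = "any",
             case_sensitive: bool = True) -> bool:
    # mode="any": at least one item occurs; mode="all": every item occurs.
    if mode not in ("any", "all"):
        raise ValueError("mode must be 'any' or 'all'")
    if not case_sensitive:
        text = text.casefold()
        items = [item.casefold() for item in items]
    hits = [item in text for item in items]
    return any(hits) if mode == "any" else all(hits)


def does_not_contain(text: str, items: List[str],
                     case_sensitive: bool = True) -> bool:
    # Assumed reading of mode="all": all items must be absent.
    return not contains(text, items, mode="any", case_sensitive=case_sensitive)


def item_match(rows: List[dict], text_key: str, items_key: str,
               mode: str = "any") -> List[bool]:
    # Row-wise variant: each row carries its own item list in another column.
    return [contains(row[text_key], row[items_key], mode=mode) for row in rows]
```

For example, `item_match([{"Answer": "book a hotel", "Expected_items": ["hotel", "flight"]}], "Answer", "Expected_items")` evaluates the row against its own item list rather than a list shared by all rows.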

Syntax validation

Validate structured data formats or code syntax.

Name | Description | Parameters
IsValidJSON()
  • Checks if the column contains valid JSON.
  • Returns True/False for every input.
Optional:
  • alias
JSONSchemaMatch()
  • Checks if the column contains a valid JSON object matching the expected schema: all keys are present and values are not None.
  • Exact match mode checks no extra keys are present.
  • Optional type validation for each key.
  • Returns True/False for each input.
  • Example: JSONSchemaMatch(expected_schema={"name": str, "age": int}, exact_match=False, validate_types=True)
Required:
  • expected_schema: Dict[str, type]
Optional:
  • exact_match = True or False
  • validate_types = True or False
JSONMatch()
  • Checks if the column contains a valid JSON object matching a JSON provided in a reference column.
  • Matches key-value pairs irrespective of order.
  • Whitespace outside of the actual values (e.g., spaces or newlines) is ignored.
  • Returns True/False for every input.
  • Example: JSONMatch(first_column="Json1", second_column="Json2")
Required:
  • first_column
  • second_column
Optional:
  • alias
IsValidPython()
  • Checks if the column contains valid Python code without syntax errors.
  • Returns True/False for every input.
Optional:
  • alias
IsValidSQL()
  • Checks if the column contains a valid SQL query without executing the query.
  • Returns True/False for every input.
Optional:
  • alias
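
The syntax checks above can be approximated with the Python standard library. The sketch below uses hypothetical function names and is not the library's implementation; it only illustrates the described behavior: JSON validity via `json.loads`, Python validity via `ast.parse` (which parses without executing), schema matching with the `exact_match` and `validate_types` options, and order-insensitive JSON comparison.

```python
import ast
import json
from typing import Dict


def is_valid_json(text: str) -> bool:
    try:
        json.loads(text)
        return True
    except (json.JSONDecodeError, TypeError):
        return False


def is_valid_python(text: str) -> bool:
    # ast.parse only parses the source; nothing is executed.
    try:
        ast.parse(text)
        return True
    except (SyntaxError, ValueError):
        return False


def json_schema_match(text: str, expected_schema: Dict[str, type],
                      exact_match: bool = False,
                      validate_types: bool = False) -> bool:
    try:
        data = json.loads(text)
    except (json.JSONDecodeError, TypeError):
        return False
    if not isinstance(data, dict):
        return False
    for key, expected_type in expected_schema.items():
        # Every expected key must be present with a non-null value.
        if data.get(key) is None:
            return False
        if validate_types and not isinstance(data[key], expected_type):
            return False
    # Exact match additionally rejects keys outside the schema.
    if exact_match and set(data) != set(expected_schema):
        return False
    return True


def json_match(first: str, second: str) -> bool:
    # Parsed dicts compare equal regardless of key order or whitespace.
    try:
        return json.loads(first) == json.loads(second)
    except (json.JSONDecodeError, TypeError):
        return False
```

One caveat of this sketch: `isinstance(True, int)` is true in Python, so a JSON boolean would pass an `int` type check here.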

Text stats

Descriptive text statistics.

Name | Description | Parameters
TextLength()
  • Measures the length of the text in symbols.
  • Returns an absolute number.
Optional:
  • alias
OOVWordsPercentage()
  • Calculates the percentage of out-of-vocabulary words based on imported NLTK vocabulary.
  • Returns a score on a scale: 0 to 100.
Optional:
  • alias
  • ignore_words: Tuple = ()
NonLetterCharacterPercentage()
  • Calculates the percentage of non-letter characters.
  • Returns a score on a scale: 0 to 100.
Optional:
  • alias
SentenceCount()
  • Counts the number of sentences in the text.
  • Returns an absolute number.
Optional:
  • alias
WordCount()
  • Counts the number of words in the text.
  • Returns an absolute number.
Optional:
  • alias
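
As a rough illustration of the text statistics above, the following sketch computes comparable values with the standard library. The function names are hypothetical and the tokenization rules are simplifying assumptions (regex-based word and sentence splitting, whitespace counted as a non-letter character), so the numbers may differ from what the library reports.

```python
import re


def text_length(text: str) -> int:
    # Length in characters.
    return len(text)


def non_letter_character_percentage(text: str) -> float:
    # Share of characters that are not letters, on a 0 to 100 scale.
    # Assumption: whitespace and digits count as non-letter characters.
    if not text:
        return 0.0
    non_letters = sum(1 for ch in text if not ch.isalpha())
    return 100.0 * non_letters / len(text)


def word_count(text: str) -> int:
    # Naive tokenization: runs of word characters.
    return len(re.findall(r"\w+", text))


def sentence_count(text: str) -> int:
    # Naive split on sentence-ending punctuation; empty pieces are dropped.
    return len([s for s in re.split(r"[.!?]+", text) if s.strip()])
```

The naive splitting treats "don't" as two words and "e.g." as two sentence breaks, which is why a tokenizer-based implementation can give different counts.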

Custom

Implement your own programmatic checks.

Name | Description | Parameters
CustomDescriptor()
  • Implements a custom check for specific column(s) as a Python function.
  • Use it to run your own programmatic checks.
  • Returns score and/or label as specified.
  • Can accept and return multiple columns.
Optional:
  • alias
  • func: callable
See how to add a custom descriptor.
CustomColumnsDescriptor()
  • Implements a custom check as a Python function that can be applied to any column in the dataset.
  • Use it to run your own programmatic checks.
  • Returns score and/or label as specified.
  • Accepts and returns a single column.
Optional:
  • alias
  • func: callable
See how to add a custom descriptor.
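
To illustrate the kind of logic a custom check might hold, here is a sketch of two check functions written over plain Python lists. This section does not show the wrapper types the library passes to `func`, so the list-based signatures and the function names below are hypothetical; adapt them to the interface described in the custom descriptor guide.

```python
from typing import List


def is_empty_label(values: List[str]) -> List[str]:
    # Single-column check: returns one label per row.
    return ["EMPTY" if not value.strip() else "NON EMPTY" for value in values]


def length_difference(answers: List[str], targets: List[str]) -> List[int]:
    # Multi-column check: returns one score per row from two input columns.
    if len(answers) != len(targets):
        raise ValueError("columns must have the same number of rows")
    return [len(answer) - len(target) for answer, target in zip(answers, targets)]
```

The first function mirrors a check that accepts and returns a single column; the second mirrors a check that reads several columns at once.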

LLM-based evals

Use external LLMs with an evaluation prompt.

Name | Description | Parameters
LLMEval() (Custom)
  • Scores the text using user-defined criteria.
  • You must specify provider, model and prompt template, and fill in the template (criteria, category names, etc).
  • Returns score and/or label as specified.
Optional:
DeclineLLMEval()
  • Detects texts containing a refusal or rejection.
  • Returns a label (DECLINE or OK) or score.
Optional:
PIILLMEval()
  • Detects texts containing PII (Personally Identifiable Information).
  • Returns a label (PII or OK) or score.
Optional:
NegativityLLMEval()
  • Detects negative texts.
  • Returns a label (NEGATIVE or POSITIVE) or score.
Optional:
BiasLLMEval()
  • Detects biased texts.
  • Returns a label (BIAS or OK) or score.
Optional:
ToxicityLLMEval()
  • Detects toxic texts.
  • Returns a label (TOXICITY or OK) or score.
Optional:
ContextQualityLLMEval()
  • Evaluates if CONTEXT is VALID (provides sufficient information to answer the question) or INVALID.
  • Returns a label (VALID or INVALID) or score.
Optional:
  • alias
  • model
  • Run over the context column and pass the question column as a parameter.
  • See LLM judge parameters.

ML-based evals

Use pre-trained machine learning or embedding models.

Name | Description | Parameters
SemanticSimilarity()
  • Calculates pairwise semantic similarity (cosine similarity) between two columns using the all-MiniLM-L6-v2 sentence embedding model.
  • Returns a score from 0 to 1 (0: different, 0.5: unrelated, 1: identical).
  • Example use: SemanticSimilarity(columns=["Question", "Answer"]).
Required:
  • columns
Optional:
  • alias
BERTScore()
  • Calculates similarity between two text columns based on token embeddings.
  • Returns BERTScore (F1 Score).
  • Example use: BERTScore(columns=["Answer", "Target"]).
Required:
  • columns
Optional:
  • model
  • tfidf_weighted
  • alias
Sentiment()
  • Analyzes text sentiment using a word-based model.
  • Returns a score: -1 (negative) to 1 (positive).
Optional:
  • alias
HuggingFace()
Optional:
HuggingFaceToxicity()
  • Detects hate speech using a roberta-hate-speech model.
  • Returns predicted probability for the “hate” label. Scale: 0 to 1.
Optional:
  • toxic_label (default: hate)
  • alias
  • See docs.
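
The 0 / 0.5 / 1 scale listed for SemanticSimilarity() is consistent with cosine similarity rescaled from the range [-1, 1] to [0, 1]. The sketch below shows that rescaling on toy vectors. It is an assumption about how the score could be derived, not the library's code, and it uses hand-written vectors in place of real sentence embeddings.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    # Standard cosine similarity: dot product over the product of norms.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def normalized_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    # Assumed rescaling: map [-1, 1] onto [0, 1].
    return (cosine_similarity(u, v) + 1) / 2
```

Under this rescaling, identical directions score 1, orthogonal (unrelated) vectors score 0.5, and opposite directions score 0.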