Deterministic evals
Programmatic and heuristics-based evaluations.
Pattern match
Check for general pattern matching.
Name | Description | Parameters |
---|
ExactMatch() | - Checks if the contents of two provided columns match.
- Returns True/False for every input.
- Example:
ExactMatch(columns=["answer", "target"])
| Required: Optional: |
RegExp() | - Matches the text against a set regular expression.
- Returns True/False for every input.
- Example:
RegExp(reg_exp=r"^I")
| Required: Optional: |
BeginsWith() | - Checks if the text starts with a given combination.
- Returns True/False for every input.
- Example:
BeginsWith(prefix="How")
| Required: Optional: alias -
case_sensitive = True or False
|
EndsWith() | - Checks if the text ends with a given combination.
- Returns True/False for every input.
- Example:
EndsWith(suffix="Thank you.")
| Required: Optional: alias -
case_sensitive = True or False
|
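As a rough illustration of what the pattern-match checks above compute for each row, here is a plain-Python sketch. It does not use or reproduce the library's code: the function names (`exact_match`, `regexp_match`, `begins_with`, `ends_with`) are hypothetical, and the sketch assumes that RegExp looks for the pattern anywhere in the text, which the table does not specify.

```python
import re


def exact_match(a: str, b: str) -> bool:
    """True when the two column values are identical strings."""
    return a == b


def regexp_match(text: str, pattern: str) -> bool:
    """True when the pattern is found in the text (assumed search semantics)."""
    return re.search(pattern, text) is not None


def begins_with(text: str, prefix: str, case_sensitive: bool = True) -> bool:
    """True when the text starts with the prefix."""
    if not case_sensitive:
        text, prefix = text.lower(), prefix.lower()
    return text.startswith(prefix)


def ends_with(text: str, suffix: str, case_sensitive: bool = True) -> bool:
    """True when the text ends with the suffix."""
    if not case_sensitive:
        text, suffix = text.lower(), suffix.lower()
    return text.endswith(suffix)


answers = ["How can I help?", "I am fine. Thank you."]
# One True/False value per input, mirroring the "for every input" behaviour.
print([begins_with(a, "How") for a in answers])
print([ends_with(a, "thank you.", case_sensitive=False) for a in answers])
print([regexp_match(a, r"^I") for a in answers])
```

The `case_sensitive` flag is handled by lowercasing both sides before comparing, which is one simple way to get the behaviour the parameter name suggests.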
Content checks
Verify presence of specific words, items or components.
Name | Description | Parameters |
---|
Contains() | - Checks if the text contains any or all specified items (e.g., competitor names).
- Returns True/False for every input.
- Example:
Contains(items=["chatgpt"])
| Required: Optional: alias -
mode = any or all -
case_sensitive = True or False
|
DoesNotContain() | - Checks if the text does not contain the specified items (e.g., forbidden expressions).
- Returns True/False for every input.
- Example:
DoesNotContain(items=["as a large language model"])
| Required: Optional: alias -
mode = all -
case_sensitive = True or False
|
IncludesWords() | - Checks if the text includes any or all specified words.
- Considers only vocabulary words.
- Returns True/False for every input.
- Example:
IncludesWords(words_list=['booking', 'hotel', 'flight'])
| Required: Optional: alias -
mode = any or all -
lemmatize = True or False
|
ExcludesWords() | - Checks if the text excludes all specified words (e.g., profanity lists).
- Considers only vocabulary words.
- Returns True/False for every input.
- Example:
ExcludesWords(words_list=['buy', 'sell', 'bet'])
| Required: Optional: alias -
mode = all -
lemmatize = True or False
|
ItemMatch() | - Checks if the text contains any or all specified items.
- The item list is specific to each row and provided in a separate column.
- Returns True/False for each row.
- Example:
ItemMatch(["Answer", "Expected_items"])
| Required: Optional: -
alias -
mode = all or any -
case_sensitive = True or False
|
ItemNoMatch() | - Checks if the text excludes all specified items.
- The item list is specific to each row and provided in a separate column.
- Returns True/False for each row.
- Example:
ItemNoMatch(["Answer", "Forbidden_items"])
| Required: Optional: alias -
mode = all -
case_sensitive = True or False
|
WordMatch() | - Checks if the text includes any or all specified words.
- Word list is specific to each row and provided in a separate column.
- Considers only vocabulary words.
- Returns True/False for every input.
- Example:
WordMatch(["Answer", "Expected_words"])
| Required: Optional: alias -
mode = any or all -
lemmatize = True or False
|
WordNoMatch() | - Checks if the text excludes all specified words.
- Word list is specific to each row and provided in a separate column.
- Considers only vocabulary words.
- Returns True/False for every input.
- Example:
WordNoMatch(["Answer", "Forbidden_words"])
| Required: Optional: alias -
mode = all -
lemmatize = True or False
|
ContainsLink() | - Checks if the column contains at least one valid URL.
- Returns True/False for each row.
| Optional: |
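The `any` / `all` modes and the per-row variants above can be pictured with a short plain-Python sketch. This is an illustration under assumptions, not the library's implementation: `contains` and `does_not_contain` are hypothetical helper names, matching is done on raw substrings, and the word-level checks (vocabulary filtering, lemmatization) are not covered.

```python
from typing import List


def contains(text: str, items: List[str], mode: str = "any",
             case_sensitive: bool = True) -> bool:
    """True when any (or all) of the items occur in the text as substrings."""
    if not case_sensitive:
        text = text.lower()
        items = [item.lower() for item in items]
    hits = [item in text for item in items]
    return any(hits) if mode == "any" else all(hits)


def does_not_contain(text: str, items: List[str],
                     case_sensitive: bool = True) -> bool:
    """True only when none of the items occur (assumed reading of the check)."""
    return not contains(text, items, mode="any", case_sensitive=case_sensitive)


# Fixed item list, as in Contains(items=[...]).
print(contains("I prefer ChatGPT", ["chatgpt"], case_sensitive=False))

# Row-specific item lists, as in ItemMatch(["Answer", "Expected_items"]).
rows = [
    {"Answer": "Book a hotel in Rome", "Expected_items": ["hotel", "Rome"]},
    {"Answer": "Book a flight", "Expected_items": ["hotel", "flight"]},
]
for mode in ("any", "all"):
    print(mode, [contains(r["Answer"], r["Expected_items"], mode=mode) for r in rows])
```

The only difference between the fixed-list and per-row checks in this sketch is where the item list comes from: a constant argument or a second column in the same row.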
Syntax validation
Validate structured data formats or code syntax.
Name | Description | Parameters |
---|
IsValidJSON() | - Checks if the column contains valid JSON.
- Returns True/False for every input.
| Optional: |
JSONSchemaMatch() | - Checks if the column contains a valid JSON object matching the expected schema: all keys are present and values are not None.
- Exact match mode checks that no extra keys are present.
- Optional type validation for each key.
- Returns True/False for each input.
- Example:
JSONSchemaMatch(expected_schema={"name": str, "age": int}, exact_match=False, validate_types=True)
| Required: expected_schema: Dict[str, type] Optional: -
exact_match = True or False -
validate_types = True or False
|
JSONMatch() | - Checks if the column contains a valid JSON object matching a JSON provided in a reference column.
- Matches key-value pairs irrespective of order.
- Whitespace outside of the actual values (e.g., spaces or newlines) is ignored.
- Returns True/False for every input.
- Example:
JSONMatch(first_column="Json1", second_column="Json2")
| Required: first_column second_column Optional: |
IsValidPython() | - Checks if the column contains valid Python code without syntax errors.
- Returns True/False for every input.
| Optional: |
IsValidSQL() | - Checks if the column contains a valid SQL query without executing the query.
- Returns True/False for every input.
| Optional: |
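The JSON and Python checks above map naturally onto the standard library. The sketch below shows one way such checks could be written. It is not the library's code, and the function names are hypothetical. It follows the rules stated in the table: required keys must be present and non-null, exact match forbids extra keys, type validation is optional, and JSON comparison ignores key order and surrounding whitespace.

```python
import ast
import json
from typing import Dict


def is_valid_json(text: str) -> bool:
    """True when the text parses as JSON."""
    try:
        json.loads(text)
        return True
    except (TypeError, ValueError):
        return False


def json_schema_match(text: str, expected_schema: Dict[str, type],
                      exact_match: bool = False,
                      validate_types: bool = False) -> bool:
    """True when the text is a JSON object that satisfies the expected schema."""
    try:
        data = json.loads(text)
    except (TypeError, ValueError):
        return False
    if not isinstance(data, dict):
        return False
    # Every expected key must be present with a non-null value.
    if any(data.get(key) is None for key in expected_schema):
        return False
    # Exact match mode: no keys beyond the expected ones.
    if exact_match and set(data) != set(expected_schema):
        return False
    if validate_types:
        return all(isinstance(data[key], t) for key, t in expected_schema.items())
    return True


def json_match(first: str, second: str) -> bool:
    """True when both texts parse to equal JSON values (key order is irrelevant)."""
    try:
        return json.loads(first) == json.loads(second)
    except (TypeError, ValueError):
        return False


def is_valid_python(code: str) -> bool:
    """True when the code parses without syntax errors (nothing is executed)."""
    try:
        ast.parse(code)
        return True
    except (SyntaxError, ValueError):
        return False


schema = {"name": str, "age": int}
print(json_schema_match('{"name": "Ann", "age": 30}', schema, validate_types=True))
print(json_match('{"a": 1, "b": 2}', '{ "b": 2,\n  "a": 1 }'))
print(is_valid_python("x = = 1"))
```

Parsing both sides before comparing is what makes the comparison insensitive to key order and whitespace: the equality check runs on Python dictionaries, not on the raw strings.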
Text stats
Descriptive text statistics.
Name | Description | Parameters |
---|
TextLength() | - Measures the length of the text in characters.
- Returns an absolute number.
| Optional: |
OOVWordsPercentage() | - Calculates the percentage of out-of-vocabulary words based on the imported NLTK vocabulary.
- Returns a score on a scale: 0 to 100.
| Optional: alias -
ignore_words: Tuple = ()
|
NonLetterCharacterPercentage() | - Calculates the percentage of non-letter characters.
- Returns a score on a scale: 0 to 100.
| Optional: |
SentenceCount() | - Counts the number of sentences in the text.
- Returns an absolute number.
| Optional: |
WordCount() | - Counts the number of words in the text.
- Returns an absolute number.
| Optional: |
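To make the text statistics above concrete, here is a minimal plain-Python sketch of three of them. The definitions are assumptions made for illustration, and the library's exact rules may differ: length is counted in characters, whitespace is not counted as a non-letter character, and words are split on whitespace. The function names are hypothetical.

```python
def text_length(text: str) -> int:
    """Number of characters in the text."""
    return len(text)


def non_letter_character_percentage(text: str) -> float:
    """Share of characters that are neither letters nor whitespace, on a 0 to 100 scale."""
    if not text:
        return 0.0
    non_letters = sum(1 for ch in text if not ch.isalpha() and not ch.isspace())
    return 100 * non_letters / len(text)


def word_count(text: str) -> int:
    """Number of whitespace-separated tokens."""
    return len(text.split())


sample = "Hi, there!"
print(text_length(sample), non_letter_character_percentage(sample), word_count(sample))
```

Sentence counting and the out-of-vocabulary percentage are left out because they depend on a tokenizer and a vocabulary, which this sketch does not assume.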
Custom
Implement your own programmatic checks.
Name | Description | Parameters |
---|
CustomDescriptor() | - Implements a custom check for specific column(s) as a Python function.
- Use it to run your own programmatic checks.
- Returns score and/or label as specified.
- Can accept and return multiple columns.
| Optional: See how to add a custom descriptor. |
CustomColumnsDescriptor() | - Implements a custom check as a Python function that can be applied to any column in the dataset.
- Use it to run your own programmatic checks.
- Returns score and/or label as specified.
- Accepts and returns a single column.
| Optional: See how to add a custom descriptor. |
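As an illustration of the kind of logic a single-column custom check might wrap, the sketch below labels each value in a column. It is plain Python over a list and deliberately omits the wrapper itself, whose signature is described on the linked custom descriptor page. The function name and the EMPTY / NON EMPTY labels are hypothetical.

```python
from typing import List


def is_empty_label(column: List[str]) -> List[str]:
    """Return one label per value: EMPTY for blank text, NON EMPTY otherwise."""
    return ["EMPTY" if not value.strip() else "NON EMPTY" for value in column]


print(is_empty_label(["", "Hello", "   "]))
```

The function takes one column and returns one column of the same length, which matches the "accepts and returns a single column" contract described above.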
LLM-based evals
Use external LLMs with an evaluation prompt.
Name | Description | Parameters |
---|
LLMEval() (Custom) | - Scores the text using user-defined criteria.
- You must specify provider, model and prompt template, and fill in the template (criteria, category names, etc).
- Returns score and/or label as specified.
| Optional: |
DeclineLLMEval() | - Detects texts containing a refusal or rejection.
- Returns a label (DECLINE or OK) or score.
| Optional: |
PIILLMEval() | - Detects texts containing PII (Personally Identifiable Information).
- Returns a label (PII or OK) or score.
| Optional: |
NegativityLLMEval() | - Detects negative texts.
- Returns a label (NEGATIVE or POSITIVE) or score.
| Optional: |
BiasLLMEval() | - Detects biased texts.
- Returns a label (BIAS or OK) or score.
| Optional: |
ToxicityLLMEval() | - Detects toxic texts.
- Returns a label (TOXICITY or OK) or score.
| Optional: |
ContextQualityLLMEval() | - Evaluates if CONTEXT is VALID (provides sufficient information to answer the question) or INVALID.
- Returns a label (VALID or INVALID) or score.
| Optional: alias -
model -
Run over the context column and pass the question column as a parameter. -
See LLM judge parameters.
|
ML-based evals
Use pre-trained machine learning or embedding models.
Name | Description | Parameters |
---|
SemanticSimilarity() | - Calculates pairwise semantic similarity (cosine similarity) between two columns using the sentence embeddings model all-MiniLM-L6-v2.
- Returns a score from 0 to 1 (0: different, 0.5: unrelated, 1: identical).
- Example use:
SemanticSimilarity(columns=["Question", "Answer"])
| Required: Optional: |
BERTScore() | - Calculates similarity between two text columns based on token embeddings.
- Returns BERTScore (F1 Score).
- Example use:
BERTScore(columns=["Answer", "Target"])
| Required: Optional: |
Sentiment() | - Analyzes text sentiment using a word-based model.
- Returns a score: -1 (negative) to 1 (positive).
| Optional: |
HuggingFace() | | Optional: |
HuggingFaceToxicity() | - Detects hate speech using a roberta-hate-speech model.
- Returns predicted probability for the “hate” label. Scale: 0 to 1.
| Optional: toxic_label (default: hate) -
alias -
See docs.
|
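The 0 to 1 scale quoted for SemanticSimilarity above (0: different, 0.5: unrelated, 1: identical) is consistent with cosine similarity rescaled from the range -1 to 1. The sketch below shows that arithmetic on toy vectors. It is an assumption about the scaling, not the library's code: no embedding model is loaded, and `cosine_similarity` and `to_unit_scale` are hypothetical names.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors, in the range -1 to 1."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def to_unit_scale(cosine: float) -> float:
    """Assumed rescaling from -1..1 to 0..1, so that orthogonal vectors score 0.5."""
    return (cosine + 1) / 2


# Toy 2-D "embeddings" standing in for sentence vectors.
same = to_unit_scale(cosine_similarity([1.0, 0.0], [1.0, 0.0]))
unrelated = to_unit_scale(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
opposite = to_unit_scale(cosine_similarity([1.0, 0.0], [-1.0, 0.0]))
print(same, unrelated, opposite)
```

In practice the vectors would come from the sentence embeddings model named in the table, one vector per text in each of the two columns.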