Custom HuggingFace evaluator
How to use models from HuggingFace as evaluators.
You can score your text using ML models from HuggingFace. This lets you apply any criteria from the source model, e.g. classify texts by emotion. There are:

- Ready-to-use descriptors that wrap a specific model.
- A general interface to call other suitable models you select.
Prerequisites:
- You know how to use descriptors to evaluate text data.
Imports
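If you run the examples on this page end to end, you will typically need the following. This is a minimal sketch assuming a recent Evidently release; exact import paths may differ between versions:

```python
import pandas as pd

# Core dataset tooling and the HuggingFace-based descriptors
# (import paths are an assumption for a recent Evidently release).
from evidently import Dataset, DataDefinition
from evidently.descriptors import HuggingFace, HuggingFaceToxicity
```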
Built-in ML evals
Available descriptors: check all built-in ML evals in the reference table.
There are built-in evaluators for some models. You can call them like any other descriptor:
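For example, a sketch using a built-in toxicity evaluator. The descriptor name `HuggingFaceToxicity` and its arguments are assumptions based on a recent Evidently release:

```python
from evidently.descriptors import HuggingFaceToxicity

# Built-in wrapper around a specific toxicity classification model:
# it scores each text in the "response" column; no model name needed.
toxicity = HuggingFaceToxicity("response", alias="Toxicity")
```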
Custom ML evals
You can also add any custom checks directly as a Python function.
Alternatively, use the general `HuggingFace()` descriptor to call a specific named model. The model you use must return a numerical score or a category for each text in a column.

For example, to evaluate "curiosity" expressed in a text:
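A sketch mirroring the emotion classification row from the table below; the `"response"` column name is an assumption:

```python
from evidently.descriptors import HuggingFace

# Score how strongly each "response" text expresses "curiosity",
# using the go_emotions emotion classifier.
curiosity = HuggingFace(
    "response",
    model="SamLowe/roberta-base-go_emotions",
    params={"label": "curiosity"},
    alias="curiosity",
)
```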
Call the result as usual:
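For instance, a minimal sketch that applies the descriptor to a small dataset and prints the scored column. The `Dataset.from_pandas` / `DataDefinition` usage is an assumption based on a recent Evidently release:

```python
df = pd.DataFrame({"response": [
    "I wonder how this model works under the hood.",
    "The order arrived on time.",
]})

dataset = Dataset.from_pandas(
    df,
    data_definition=DataDefinition(text_columns=["response"]),
    descriptors=[curiosity],
)

# The scored data now includes an extra "curiosity" column.
print(dataset.as_dataframe())
```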
Sample models
Here are some models you can call using the `HuggingFace()` descriptor.
| Model | Example use | Parameters |
|---|---|---|
| Emotion classification | `HuggingFace("response", model="SamLowe/roberta-base-go_emotions", params={"label": "disappointment"}, alias="disappointment")` | Required: `label` |
| Zero-shot classification | `HuggingFace("response", model="MoritzLaurer/DeBERTa-v3-large-mnli-fever-anli-ling-wanli", params={"labels": ["science", "physics"], "threshold": 0.5}, alias="Topic")` | Required: `labels`. Optional: `threshold` |
| GPT-2 text detection | `HuggingFace("response", model="openai-community/roberta-base-openai-detector", params={"score_threshold": 0.7}, alias="fake")` | Optional: `score_threshold` |
This list is not exhaustive, and the `HuggingFace()` descriptor may support other models published on Hugging Face. The implemented interface generally works for models that:
- Output a single number (e.g., a predicted score for a label) or a label, not an array of values.
- Can process raw text input directly.
- Name labels using `label` or `labels` fields.
- Use methods named `predict` or `predict_proba` for scoring.
However, since each model is implemented differently, we cannot provide a complete list of models with a compatible interface. We suggest testing candidate models yourself through trial and error, as sketched below. If you discover useful models, feel free to share them with the community on Discord. You can also open an issue on GitHub to request support for a specific model.
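One hypothetical way to pre-check a model before wrapping it in the descriptor: call it directly through the transformers pipeline and confirm it returns a single label or score per text. The model name below is simply the emotion classifier from the table above:

```python
from transformers import pipeline

# Load the candidate model as a text classification pipeline.
clf = pipeline(
    "text-classification",
    model="SamLowe/roberta-base-go_emotions",
    top_k=None,  # return scores for all labels
)

# A compatible model returns label/score pairs for the input text,
# e.g. [{"label": "curiosity", "score": 0.93}, ...].
print(clf("I wonder how this works?"))
```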