Sunday, November 2, 2025

DeepEval

DeepEval helps to test and verify the correctness of LLM outputs. It is an evaluation framework that provides a suite of metrics and synthetic data generation, with integrations across the leading AI/ML libraries.

DeepEval can be used to set up one LLM to judge the output of another LLM. This JudgeLLM set-up can be used at both the training stage and at live inference in MLOps scenarios.

Getting started with DeepEval on top of a local Ollama model is simple.

(I) Installation

    pip install deepeval

Ollama installation, using a llama3.2 base model, was covered in a previous post.
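If the small llama3.2 model used below has not been pulled yet, it can be fetched with the Ollama CLI before configuring DeepEval (assuming Ollama is already installed and the local server is running):

    ollama pull llama3.2:1b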

(II) Set Ollama model in DeepEval

# Unset the OpenAI model (the default judge model for DeepEval)

deepeval unset-openai

# Set the Ollama model for DeepEval

deepeval set-ollama "llama3.2:1b" --base-url="http://localhost:11434"

(III) JudgeLlmDeepEval.py using DeepEval with Ollama 

from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric, GEval
from deepeval.models import OllamaModel
from deepeval.test_case import LLMTestCase, LLMTestCaseParams

# Set up the Ollama model as the judge
model = OllamaModel(
    model="llama3.2:1b",
    base_url="http://localhost:11434",
    temperature=0,
)

# Set up the evaluation metrics
correctness_metric = GEval(
    name="Correctness",
    criteria="Determine whether the actual output is factually correct based on the expected output.",
    evaluation_params=[LLMTestCaseParams.ACTUAL_OUTPUT, LLMTestCaseParams.EXPECTED_OUTPUT],
    model=model,
)

answer_relevancy = AnswerRelevancyMetric(threshold=0.5, model=model)

# Define the test case
test_case_maths = LLMTestCase(
    input="what is 80 in words? using only 1 word.",
    actual_output="eighty",
    expected_output="eighty",
)

# Run the evaluation with both metrics
evaluate(test_cases=[test_case_maths], metrics=[correctness_metric, answer_relevancy])
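For the live-inference scenario mentioned earlier, a metric can also be scored on a single test case outside of evaluate(), for example to gate a response at serving time. A minimal sketch, reusing the answer_relevancy metric defined above:

# Score a single live response with the judge model
live_case = LLMTestCase(
    input="what is 80 in words? using only 1 word.",
    actual_output="eighty",
)
answer_relevancy.measure(live_case)
print(answer_relevancy.score)   # numeric score between 0 and 1
print(answer_relevancy.reason)  # the judge model's explanation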

(IV) Run JudgeLlmDeepEval.py

Since JudgeLlmDeepEval.py calls evaluate() directly, it can be run as a plain Python script:

    python JudgeLlmDeepEval.py
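Alternatively, the deepeval test run command works with a pytest-style test file that uses assert_test. A minimal sketch of that variant, assuming a file named test_judge_llm.py (the filename and test function name are illustrative):

from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.models import OllamaModel
from deepeval.test_case import LLMTestCase

def test_maths_answer():
    model = OllamaModel(model="llama3.2:1b", base_url="http://localhost:11434")
    metric = AnswerRelevancyMetric(threshold=0.5, model=model)
    test_case = LLMTestCase(
        input="what is 80 in words? using only 1 word.",
        actual_output="eighty",
        expected_output="eighty",
    )
    # Fails the pytest test if the metric score falls below the threshold
    assert_test(test_case, [metric])

It can then be run with:

    deepeval test run test_judge_llm.py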

 
