Evaliphy is currently in beta. It is not recommended for production use yet. Please try it out and share your feedback.

Assertions

Evaliphy provides a fluent, chainable assertion API designed for black-box QA testing of Generative AI. Assertions use an LLM as a judge to evaluate the quality and correctness of your RAG system's outputs.
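To make the "LLM as a judge" idea concrete, here is a minimal sketch of the general pattern: build a grading prompt from the sample, send it to a judge model, and parse a score from the reply. Everything here is an illustrative assumption (`JudgeInput`, `JudgeFn`, `buildPrompt`, `parseScore`, `judge`, and the prompt wording). It is not Evaliphy's internal prompt or code, and a stub function stands in for the real model call.

```typescript
// Hypothetical sketch of the LLM-as-judge pattern; the prompt wording and all
// names here are illustrative assumptions, not Evaliphy's internals.
interface JudgeInput {
  query?: string;
  response: string;
  context?: string;
}

// Stand-in for a call to a judge model (a real setup would call an LLM API).
type JudgeFn = (prompt: string) => Promise<string>;

// Build a grading prompt that asks the judge for a single score in [0, 1].
function buildPrompt(criterion: string, input: JudgeInput): string {
  return [
    `Rate the RESPONSE for ${criterion} on a scale from 0.0 to 1.0.`,
    input.query ? `QUERY: ${input.query}` : "",
    input.context ? `CONTEXT: ${input.context}` : "",
    `RESPONSE: ${input.response}`,
    "Reply with only the number.",
  ]
    .filter((line) => line.length > 0) // drop the empty slots for absent fields
    .join("\n");
}

// Take the first number in the judge's reply and clamp it to [0, 1].
function parseScore(reply: string): number {
  const match = reply.match(/\d+(?:\.\d+)?/);
  if (!match) {
    throw new Error(`Judge reply contains no score: ${reply}`);
  }
  return Math.min(1, Math.max(0, Number(match[0])));
}

// Ask the judge about one criterion and return the parsed score.
async function judge(criterion: string, input: JudgeInput, llm: JudgeFn): Promise<number> {
  return parseScore(await llm(buildPrompt(criterion, input)));
}

// Usage with a fake judge that always answers "0.92".
const fakeJudge: JudgeFn = async () => "0.92";
judge("faithfulness", { response: "You can return items within 30 days." }, fakeJudge)
  .then((score) => console.log(score));
```

Clamping and strict parsing matter because judge models reply in free text; a real harness would also need retries and a more robust output format.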

The expect function

The expect function is the entry point for all assertions. It can take a simple response string or a full EvaluationSample object.
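One way to picture the two accepted input forms is as a normalization step: a bare string becomes a sample that carries only the response. The sketch below assumes a field layout inferred from the examples on this page (`query`, `response`, `context`, all strings); the real `EvaluationSample` type may differ, and `toSample` is a hypothetical helper, not an Evaliphy export.

```typescript
// Hypothetical shape inferred from the examples on this page; Evaliphy's real
// EvaluationSample type may have more fields or different field types.
interface EvaluationSample {
  query?: string;
  response: string;
  context?: string;
}

// Normalize either accepted input form into one sample object, so that every
// assertion can read `response` (and `query`/`context` when present) uniformly.
function toSample(input: string | EvaluationSample): EvaluationSample {
  return typeof input === "string" ? { response: input } : input;
}

const fromString = toSample("The capital of France is Paris");
const fromSample = toSample({
  query: "What is the return policy?",
  response: "You can return items within 30 days.",
  context: "Returns are accepted within 30 days of purchase.",
});
console.log(fromString.query, fromSample.query);
// prints: undefined What is the return policy?
```

This also suggests why the full sample is the recommended form: a bare string carries only the response, so anything else the judge needs (such as the query in the `toBeRelevant` string example) has to be supplied some other way.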

import { expect } from 'evaliphy';

// Using a full EvaluationSample (Recommended)
await expect({
  query: "What is the return policy?",
  response: "You can return items within 30 days.",
  context: "Returns are accepted within 30 days of purchase."
}).toBeFaithful();

// Using a simple response string
await expect("The capital of France is Paris").toBeRelevant({ query: "What is the capital of France?" });

Core Assertions

toBeFaithful()

Checks if the response relies only on the provided context and contains zero hallucinations.

await expect({
  query: "...",
  response: "...",
  context: "..."
}).toBeFaithful();

toBeRelevant()

Checks if the response directly addresses the user's query without dodging or being overly vague.

await expect({
  query: "...",
  response: "..."
}).toBeRelevant();

toBeGrounded()

Checks if the claims made in the response are supported by the retrieved context.

await expect({
  response: "...",
  context: "..."
}).toBeGrounded();

toBeCoherent()

Checks if the response is logically consistent and easy to follow.

await expect("...").toBeCoherent();

toBeHarmless()

Scans the response for toxicity, bias, hate speech, or dangerous instructions.

await expect("...").toBeHarmless();
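The examples above supply different fields to different assertions. A small pre-flight check can catch a sample that is missing a field before any judge call is made. This is a hypothetical sketch: `fieldsInExamples` and `missingFields` are illustrative names, and the field lists only record what each example on this page passes in, not rules that Evaliphy is known to enforce.

```typescript
// Hypothetical pre-flight check. The field lists record which fields each
// example on this page supplies; they are not Evaliphy's own validation rules.
type Field = "query" | "response" | "context";

interface Sample {
  query?: string;
  response: string;
  context?: string;
}

const fieldsInExamples: Record<string, Field[]> = {
  toBeFaithful: ["query", "response", "context"],
  toBeRelevant: ["query", "response"],
  toBeGrounded: ["response", "context"],
  toBeCoherent: ["response"],
  toBeHarmless: ["response"],
};

// List the fields that an assertion's example uses but the sample lacks.
// Unknown assertion names fall back to requiring only `response`.
function missingFields(assertion: string, sample: Sample): Field[] {
  const fields: Field[] = fieldsInExamples[assertion] ?? ["response"];
  return fields.filter((field) => !sample[field]);
}

// Returns ["context"]: the grounding example also passes retrieved context.
console.log(missingFields("toBeGrounded", { response: "Returns take 30 days." }));
```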

Negation

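You can negate any assertion using the .not property, which inverts its result: the negated assertion passes only when the plain assertion would fail. For example, the following passes only if the response is not judged harmless: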

await expect(response).not.toBeHarmless();
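A toy model shows how a `.not` property can work in a fluent API: it returns a copy of the expectation with a "negated" flag set, and the matcher fails when its verdict equals that flag. `ToyExpectation`, `Scorer`, the 0.5 threshold, and the keyword-based stub scorer are all hypothetical; the stub only stands in for the LLM judge, and none of this is Evaliphy's implementation.

```typescript
// Toy model of `.not` in a fluent assertion API. `ToyExpectation`, `Scorer`,
// and the 0.5 threshold are hypothetical; a stub scorer replaces the LLM judge.
type Scorer = (response: string) => number;

class ToyExpectation {
  private readonly response: string;
  private readonly scorer: Scorer;
  private readonly threshold: number;
  private readonly negated: boolean;

  constructor(response: string, scorer: Scorer, threshold = 0.5, negated = false) {
    this.response = response;
    this.scorer = scorer;
    this.threshold = threshold;
    this.negated = negated;
  }

  // `.not` returns a copy of the expectation with the result inverted.
  get not(): ToyExpectation {
    return new ToyExpectation(this.response, this.scorer, this.threshold, !this.negated);
  }

  toBeHarmless(): void {
    const harmless = this.scorer(this.response) >= this.threshold;
    // Plain assertion: fail when not harmless. Negated: fail when harmless.
    if (harmless === this.negated) {
      throw new Error(
        this.negated ? "Expected response not to be harmless" : "Expected response to be harmless",
      );
    }
  }
}

// Stub judge: low harmlessness score for anything mentioning "exploit".
const stubScorer: Scorer = (text) => (text.includes("exploit") ? 0.1 : 0.9);

new ToyExpectation("Here is a pancake recipe.", stubScorer).toBeHarmless(); // passes
new ToyExpectation("Step 1 of the exploit...", stubScorer).not.toBeHarmless(); // passes
```

Returning a new object from `.not` (rather than mutating a flag) keeps each chain independent, so `expect(x)` and `expect(x).not` never interfere with each other.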

Assertion Options

Each assertion can take an optional AssertionOptions object to override global settings.

await expect(input).toBeFaithful({
  threshold: 0.9, // Minimum score (0.0 - 1.0) to pass
});
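The threshold comment above implies a simple rule: an assertion passes when the judge's score is at least the threshold, and a threshold given in the options takes precedence over the global one. The sketch below illustrates that rule under stated assumptions: `GlobalSettings`, `globalSettings`, `resolveOptions`, `passes`, and the 0.7 global default are hypothetical; only `threshold` and its 0.0 to 1.0 range come from this page.

```typescript
// Hypothetical sketch of per-assertion options overriding global settings.
// `GlobalSettings`, `resolveOptions`, `passes`, and the 0.7 default are
// illustrative assumptions; only `threshold` appears in the docs above.
interface AssertionOptions {
  threshold?: number; // minimum judge score (0.0 to 1.0) required to pass
}

interface GlobalSettings {
  threshold: number;
}

const globalSettings: GlobalSettings = { threshold: 0.7 };

// Per-assertion options win over the global defaults. `??` (not `||`) is used
// so that an explicit threshold of 0 is respected rather than replaced.
function resolveOptions(options: AssertionOptions = {}): GlobalSettings {
  const threshold = options.threshold ?? globalSettings.threshold;
  if (threshold < 0 || threshold > 1) {
    throw new RangeError(`threshold must be in [0, 1], got ${threshold}`);
  }
  return { ...globalSettings, threshold };
}

// An assertion passes when the judge's score meets the minimum.
function passes(score: number, options?: AssertionOptions): boolean {
  return score >= resolveOptions(options).threshold;
}

console.log(passes(0.8)); // true: 0.8 meets the assumed global 0.7
console.log(passes(0.8, { threshold: 0.9 })); // false: the override is stricter
```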