LLM Mocking

Mock AI and LLM API endpoints using the OpenAI-compatible format. Test chat completions, streaming responses, and WebSocket connections without spending a single token.

What is LLM Mocking?

Simulate AI model responses for deterministic, cost-free development and testing.

dotMock can mock LLM and AI model APIs using the OpenAI-compatible format. Instead of calling a real model, you define fixtures -- each fixture contains match rules (conditions that incoming requests are tested against) and a response (the mock output to return when matched).

When a request arrives, the mock server evaluates your fixtures in order and returns the response from the first match. This first-match-wins approach gives you full control over which responses are served for specific prompts, models, or parameters.

Example: A fixture matching model: "your-model" with a message containing "Hello" returns a predefined assistant response -- instantly, with zero latency and zero cost.
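The match-and-respond flow can be sketched in a few lines of Python. The fixture shape here (a match rule plus a response, with hypothetical model and message_contains fields) is an illustration, not dotMock's actual schema:

```python
# Sketch of first-match-wins fixture evaluation (hypothetical fixture shape).

def matches(rule, request):
    """True when every field in the rule is satisfied by the request."""
    if "model" in rule and request.get("model") != rule["model"]:
        return False
    if "message_contains" in rule:
        texts = [m.get("content", "") for m in request.get("messages", [])]
        if not any(rule["message_contains"] in t for t in texts):
            return False
    return True

def serve(fixtures, request):
    """Evaluate fixtures in order; return the response of the first match."""
    for fixture in fixtures:
        if matches(fixture["match"], request):
            return fixture["response"]
    return {"error": "no fixture matched"}

fixtures = [
    {"match": {"model": "your-model", "message_contains": "Hello"},
     "response": {"role": "assistant", "content": "Hi! (mocked)"}},
    {"match": {},  # catch-all fixture, placed last
     "response": {"role": "assistant", "content": "Default mock reply"}},
]

req = {"model": "your-model", "messages": [{"role": "user", "content": "Hello"}]}
print(serve(fixtures, req)["content"])  # -> Hi! (mocked)
```

Note the ordering: the specific "Hello" fixture sits above the catch-all, so it wins; reversing the list would make every request hit the catch-all.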

Supported Endpoints

Four endpoints covering the full range of OpenAI-compatible interactions.

Chat Completions

POST /v1/chat/completions (HTTP, SSE)

OpenAI Chat Completions API. Supports both synchronous JSON responses and server-sent event streaming.

Responses API

POST /v1/responses (HTTP, SSE)

OpenAI Responses API for structured output. Returns complete response objects with streaming support.

Responses WebSocket

WS /v1/responses (WebSocket)

WebSocket transport for the Responses API. Push events to connected clients in real time.

Bidirectional Chat

WS /v1/chat (WebSocket)

Full-duplex WebSocket chat. Send and receive messages over a persistent connection for real-time AI interactions.

Getting Started

Create your first LLM mock API in four steps.

1. Click "New API" and select "LLM API"

Choose the LLM API type from the creation dialog to get pre-configured OpenAI-compatible endpoints.

2. Enter a name and subdomain

Pick a descriptive name and a unique subdomain. Your mock will be available at your-subdomain.mock.rest.

3. Create your first fixture with match rules

Define which requests to intercept (by model, message content, or other fields) and what response to return.

4. Test with curl or the built-in test panel

Send a request to your mock endpoint and verify the response matches your fixture configuration.

Quick Test

Try your LLM mock with these curl examples.

Non-Streaming Request

curl -X POST https://your-api.mock.rest/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "your-model",
    "messages": [
      {"role": "user", "content": "Hello"}
    ]
  }'

Streaming Request (SSE)

curl -X POST https://your-api.mock.rest/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "your-model",
    "stream": true,
    "messages": [
      {"role": "user", "content": "Hello"}
    ]
  }'
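On the client side, the streamed body arrives as "data:" lines. A minimal stdlib-only sketch of assembling the assistant text, assuming the standard OpenAI delta chunk shape:

```python
import json

def parse_sse_stream(lines):
    """Assemble assistant text from OpenAI-style SSE 'data:' lines."""
    text = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alives and comments
        data = line[len("data: "):]
        if data.strip() == "[DONE]":
            break  # OpenAI streams end with a [DONE] sentinel
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            text.append(delta["content"])
    return "".join(text)

# Sample chunks in the shape a streaming chat completion emits.
sample = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    'data: {"choices":[{"delta":{"content":"Hi"}}]}',
    'data: {"choices":[{"delta":{"content":" there"}}]}',
    "data: [DONE]",
]
print(parse_sse_stream(sample))  # -> Hi there
```

Because the mock emits the same chunked format as the real API, a parser like this behaves identically against both.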

How It Works

The request flow from client to mock response.

Client
Sends a request to your mock subdomain
Mock Server
Receives and parses the OpenAI-format request
Fixture Matcher
Evaluates match rules in order (first match wins)
Response
Returns the matched fixture as JSON, an SSE stream, or WebSocket events

Use Cases

Common scenarios where LLM mocking accelerates development.

Frontend Development

Build and iterate on AI-powered UIs without real API costs. Get instant, predictable responses while designing chat interfaces, streaming indicators, and error states.

CI/CD Testing

Run integration tests with deterministic LLM responses. Eliminate flaky tests caused by non-deterministic model output and avoid per-token charges in your test pipeline.

Demo & Prototyping

Showcase AI features in demos and prototypes before your backend integration is complete. Control exactly what the "AI" says for a polished, repeatable presentation.

Error Simulation

Test how your application handles timeouts, rate-limit (429) responses, malformed payloads, and partial streaming failures -- all without waiting for real failures to occur.

Tips

Drop-in replacement: Point your OpenAI SDK base URL to your dotMock subdomain -- no other code changes needed.
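To illustrate the drop-in idea, here is a stdlib-only sketch that builds the same chat completion request against a mock base URL (your-api.mock.rest is a placeholder; with the official OpenAI SDK you would instead pass your subdomain as the client's base URL):

```python
import json
import urllib.request

BASE_URL = "https://your-api.mock.rest"  # your mock subdomain (placeholder)

def build_chat_request(base_url, model, messages):
    """Build an OpenAI-format chat completion request against any base URL."""
    payload = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Swapping BASE_URL between the real API and the mock is the only change.
req = build_chat_request(BASE_URL, "your-model",
                         [{"role": "user", "content": "Hello"}])
# urllib.request.urlopen(req) would send it; here we only inspect the target.
print(req.full_url)  # -> https://your-api.mock.rest/v1/chat/completions
```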

Fixture ordering matters: The mock server checks fixtures top-to-bottom and returns the first match. Put more specific rules above general catch-all rules.

Streaming fidelity: SSE mock responses use the same chunked event format as real OpenAI streams, so your client-side parsing code works unchanged.

WebSocket testing: Use the built-in test panel to send WebSocket frames and verify the event sequence your client will receive.