Streaming & WebSocket

Mock SSE streaming and real-time WebSocket connections for LLM APIs. Simulate token-by-token output, connection drops, and bidirectional messaging.

Protocol Overview

dotMock supports three delivery modes for LLM responses. The protocol is determined by the client request.

HTTP/JSON

stream: false

Single JSON response returned in full. The simplest mode, ideal for testing parsers and basic integration logic.

SSE Streaming

stream: true

Chunked Server-Sent Events simulating real LLM token-by-token output. Configurable chunk size and latency.

WebSocket

Persistent Connection

Bidirectional real-time messaging over a persistent connection. Full event replay with configurable delays.

SSE Streaming

Server-Sent Events deliver mock LLM responses as a stream of incremental chunks, matching the behavior of real providers.

How It Works

When the client sends stream: true, dotMock splits the fixture response content into chunks and delivers each as an SSE event. Each chunk simulates a token batch, producing the same incremental output your application receives from a real LLM provider.
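For instance, assuming a fixture like the following (the fixture shape matches the example shown under Error Simulation below):

{
  "response": "The quick brown fox jumps over the lazy dog.",
  "streaming": { "chunkSize": 20 }
}

The 44-character response is delivered as three content chunks of 20, 20, and 4 characters, followed by the data: [DONE] terminator.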

Configuration Options

Fine-tune streaming behavior per fixture to match real-world conditions.

Chunk Size

chunkSize

Number of characters per SSE event. Default: 20.

Smaller values produce a more realistic token-by-token feel. Larger values deliver content faster with fewer events.

Latency

latencyMs

Delay between chunks in milliseconds. Default: 0.

Set to 50-100ms for a realistic typing effect that matches the cadence of production LLM APIs.

Truncation

truncateAfterChunks

Stop streaming after N chunks without sending [DONE].

Simulates connection drops, timeouts, or partial responses. Essential for testing resilience in your streaming consumer.
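Putting the three options together, the streaming block of a fixture might look like this sketch (values are illustrative):

{
  "response": "...",
  "streaming": {
    "chunkSize": 20,
    "latencyMs": 75,
    "truncateAfterChunks": 5
  }
}

Omit truncateAfterChunks when the stream should run to completion.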

SSE Event Format

Chat Completions API streaming follows the OpenAI-compatible event format.

Chat Completions SSE
POST /v1/chat/completions

data: {"id":"chatcmpl-mock-...","object":"chat.completion.chunk","created":1710000000,"model":"your-model","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-mock-...","object":"chat.completion.chunk","created":1710000000,"model":"your-model","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-mock-...","object":"chat.completion.chunk","created":1710000000,"model":"your-model","choices":[{"index":0,"delta":{"content":" there"},"finish_reason":null}]}

data: {"id":"chatcmpl-mock-...","object":"chat.completion.chunk","created":1710000000,"model":"your-model","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]

Each event is prefixed with data: and separated by a blank line. The stream ends with data: [DONE].
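A minimal JavaScript consumer for this format might look like the sketch below (assuming a fetch-capable runtime such as a modern browser or Node 18+; the host is a placeholder):

async function streamCompletion() {
  const res = await fetch('https://your-api.mock.rest/v1/chat/completions', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model: 'your-model',
      stream: true,
      messages: [{ role: 'user', content: 'Hello' }]
    })
  });

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  let text = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });

    // Events are separated by a blank line.
    const events = buffer.split('\n\n');
    buffer = events.pop(); // keep any incomplete trailing event
    for (const event of events) {
      const payload = event.replace(/^data: /, '');
      if (payload === '[DONE]') return text; // end-of-stream sentinel
      const chunk = JSON.parse(payload);
      text += chunk.choices[0].delta.content ?? ''; // first chunk carries only the role
    }
  }

  // The stream ended without [DONE]; see Error Simulation below.
  return text;
}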

Responses API SSE
POST /v1/responses

event: response.created
data: {"id":"resp_mock_...","object":"response","created":1710000000,"model":"your-model","status":"in_progress","output":[]}

event: response.output_item.added
data: {"type":"response.output_item.added","response_id":"resp_mock_...","output_index":0,"item":{"type":"message","role":"assistant"}}

event: response.content_part.added
data: {"type":"response.content_part.added","response_id":"resp_mock_...","output_index":0,"content_index":0,"part":{"type":"text","text":""}}

event: response.text.delta
data: {"type":"response.text.delta","response_id":"resp_mock_...","output_index":0,"content_index":0,"delta":"Hello"}

event: response.text.delta
data: {"type":"response.text.delta","response_id":"resp_mock_...","output_index":0,"content_index":0,"delta":" there"}

event: response.text.done
data: {"type":"response.text.done","response_id":"resp_mock_...","output_index":0,"content_index":0,"text":"Hello there"}

event: response.output_item.done
data: {"type":"response.output_item.done","response_id":"resp_mock_...","output_index":0,"item":{...}}

event: response.completed
data: {"id":"resp_mock_...","object":"response","status":"completed","output":[...]}

The Responses API uses named events, each with event: and data: fields. Seven event types bracket the text deltas, matching the OpenAI Responses API wire format.
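Since the stream carries named events, consumers usually dispatch on the event name once each event: / data: pair has been parsed (the framing uses the same blank-line separation as above). A minimal dispatch sketch, assuming data is the parsed JSON payload and renderDelta is a hypothetical UI hook:

function handleResponseEvent(eventName, data) {
  switch (eventName) {
    case 'response.text.delta':
      renderDelta(data.delta); // hypothetical: append incremental text to the UI
      break;
    case 'response.text.done':
      console.log('final text:', data.text);
      break;
    case 'response.completed':
      console.log('status:', data.status); // "completed"
      break;
    default:
      // response.created, response.output_item.added, response.content_part.added, ...
      break;
  }
}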

WebSocket

Persistent bidirectional connections for real-time LLM interaction mocking.

How It Works

Connect via WebSocket to /v1/chat or /v1/responses. Send JSON messages and receive matched fixture responses as event sequences. The connection stays open for multiple exchanges.

Connection Example

const ws = new WebSocket('wss://your-api.mock.rest/v1/chat');

ws.onopen = () => {
  // Messages must be wrapped in an envelope with type: "message"
  ws.send(JSON.stringify({
    type: 'message',
    data: {
      model: 'your-model',
      messages: [{ role: 'user', content: 'Hello' }]
    }
  }));
};

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  console.log(data);
};

Custom Event Sequences

Fixtures can define custom WebSocket event sequences that are replayed when a matching message is received:

Each event has a type, data payload, and optional delay

Events are replayed in order with configured timing

Support for close codes and disconnect simulation
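As an illustrative sketch of such a fixture (the events key and delayMs field name are assumptions; only the type, data, and optional-delay properties are taken from the list above), a sequence might look like:

{
  "events": [
    { "type": "response.text.delta", "data": { "delta": "Hello" }, "delayMs": 50 },
    { "type": "response.text.delta", "data": { "delta": " there" }, "delayMs": 50 },
    { "type": "response.completed", "data": { "status": "completed" } }
  ]
}

Each entry is sent as one WebSocket frame, with the configured delay between frames.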

WebSocket Configuration

closeAfter

Automatically close the connection after all events are sent

closeCode

Custom WebSocket close code (e.g., 1000 for normal, 1006 for abnormal)

disconnectAfterMs

Drop the connection after N milliseconds to simulate a server crash
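For example (option names as documented above; the fixture framing is illustrative), a graceful close after replay:

{ "closeAfter": true, "closeCode": 1000 }

Or an abrupt drop two seconds in, regardless of replay progress:

{ "disconnectAfterMs": 2000 }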

Error Simulation

Test your application's resilience against real-world failure modes that occur with LLM APIs.

Truncated Streams

Use truncateAfterChunks to simulate partial responses. The stream stops mid-response without sending [DONE], exactly as it would during a real provider outage.

Connection Drops

Use disconnectAfterMs on WebSocket fixtures to simulate server disconnects. The connection is terminated abruptly after the configured delay.
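On the client, a simulated drop surfaces through the close event, so checking event.code distinguishes a clean shutdown from a crash. A minimal sketch (scheduleReconnect is a hypothetical retry hook):

ws.onclose = (event) => {
  if (event.code === 1000) {
    console.log('clean close');
  } else {
    // Abnormal closure (typically observed as 1006 after an abrupt drop)
    console.warn('connection dropped, code:', event.code);
    scheduleReconnect(); // hypothetical: re-establish with backoff
  }
};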

Error Responses

Use error fixtures with status codes like 429 (rate limit) or 500 (server error) to test your error handling and retry logic.

Example: Simulating a Rate Limit Mid-Stream

Configure a fixture that streams a few chunks normally, then abruptly stops to simulate a provider rate limit during generation:

{
  "response": "This is a long response that will be cut short...",
  "streaming": {
    "chunkSize": 15,
    "latencyMs": 80,
    "truncateAfterChunks": 3
  }
}

The client receives 3 chunks (the first 45 characters of the response), then the stream ends without a [DONE] signal.
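A streaming consumer can detect this by tracking whether the [DONE] sentinel ever arrived, as in the streamCompletion sketch earlier. A compact guard (seenDone would be set where the sentinel is handled):

let seenDone = false;

// ...inside the event loop:
//   if (payload === '[DONE]') seenDone = true;

// ...after the stream ends:
if (!seenDone) {
  // Treat as a truncated response: surface an error or retry.
  throw new Error('stream truncated before [DONE]');
}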

Testing with curl

Quickly verify SSE streaming from the command line.

Streaming curl Request

curl -N -X POST https://your-api.mock.rest/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "your-model", "stream": true, "messages": [{"role": "user", "content": "Hello"}]}'

Built-in Test Panel

The fixture editor includes an integrated test panel for verifying streaming behavior without leaving the browser.

Test Panel Features

Real-time SSE streaming preview
WebSocket connection testing
Raw frame debug view
Protocol toggle between HTTP/SSE and WebSocket

Best Practices

Use latencyMs: 50-100 for realistic streaming demos and UI testing

Test with truncateAfterChunks to ensure your app handles incomplete streams gracefully

Keep chunkSize between 10 and 30 characters for the most realistic token-by-token appearance

Always test both streaming and non-streaming paths in your integration tests