Streaming & WebSocket
Mock SSE streaming and real-time WebSocket connections for LLM APIs. Simulate token-by-token output, connection drops, and bidirectional messaging.
Streaming is controlled by the stream parameter: when stream: true, responses are delivered via Server-Sent Events (SSE). WebSocket connections support bidirectional real-time messaging.

Protocol Overview
dotMock supports three delivery modes for LLM responses. The protocol is determined by the client request.
HTTP/JSON
Single JSON response returned in full. The simplest mode, ideal for testing parsers and basic integration logic.
SSE Streaming
Chunked Server-Sent Events simulating real LLM token-by-token output. Configurable chunk size and latency.
WebSocket
Bidirectional real-time messaging over a persistent connection. Full event replay with configurable delays.
SSE Streaming
Server-Sent Events deliver mock LLM responses as a stream of incremental chunks, matching the behavior of real providers.
How It Works
When the client sends stream: true, dotMock splits the fixture response content into chunks and delivers each as an SSE event. Each chunk simulates a token batch, producing the same incremental output your application receives from a real LLM provider.
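The chunking step can be sketched in a few lines of JavaScript. This is an illustration of the splitting behavior described above, not dotMock's internal API; the function name is invented for the example.

```javascript
// Split fixture content into fixed-size character chunks, one per SSE event.
// chunkSize mirrors the fixture option described below (default: 20).
function chunkContent(content, chunkSize = 20) {
  const chunks = [];
  for (let i = 0; i < content.length; i += chunkSize) {
    chunks.push(content.slice(i, i + chunkSize));
  }
  return chunks;
}

// chunkContent("Hello there, world!", 8)
// → ["Hello th", "ere, wor", "ld!"]
```

Each chunk is then wrapped in a `chat.completion.chunk` delta event, so a 60-character response with the default chunk size produces three content events.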
Configuration Options
Fine-tune streaming behavior per fixture to match real-world conditions.
Chunk Size
Number of characters per SSE event. Default: 20.
Smaller values produce a more realistic token-by-token feel. Larger values deliver content faster with fewer events.
Latency
Delay between chunks in milliseconds. Default: 0.
Set to 50-100ms for a realistic typing effect that matches the cadence of production LLM APIs.
Truncation
Stop streaming after N chunks without sending [DONE].
Simulates connection drops, timeouts, or partial responses. Essential for testing resilience in your streaming consumer.
SSE Event Format
Chat Completions API streaming follows the OpenAI-compatible event format.
Chat Completions SSE
POST /v1/chat/completions
data: {"id":"chatcmpl-mock-...","object":"chat.completion.chunk","created":1710000000,"model":"your-model","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}
data: {"id":"chatcmpl-mock-...","object":"chat.completion.chunk","created":1710000000,"model":"your-model","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}
data: {"id":"chatcmpl-mock-...","object":"chat.completion.chunk","created":1710000000,"model":"your-model","choices":[{"index":0,"delta":{"content":" there"},"finish_reason":null}]}
data: {"id":"chatcmpl-mock-...","object":"chat.completion.chunk","created":1710000000,"model":"your-model","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
data: [DONE]

Each event is prefixed with data: and separated by a blank line. The stream ends with data: [DONE].
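A minimal consumer-side sketch, assuming the event shapes shown above: it accumulates delta content from each `data:` line until the [DONE] sentinel arrives.

```javascript
// Parse a Chat Completions SSE stream (as raw text) into the assembled
// message content, and report whether the [DONE] sentinel was seen.
function collectStream(sseText) {
  let content = '';
  let done = false;
  for (const line of sseText.split('\n')) {
    if (!line.startsWith('data: ')) continue;       // skip blank separators
    const payload = line.slice(6);
    if (payload === '[DONE]') { done = true; break; }
    const chunk = JSON.parse(payload);
    const delta = chunk.choices[0].delta;
    if (delta.content) content += delta.content;    // role-only chunks add nothing
  }
  return { content, done };
}
```

Running this over the sample stream above yields `{ content: "Hello there", done: true }`; a truncated stream returns `done: false`.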
Responses API SSE
POST /v1/responses
event: response.created
data: {"id":"resp_mock_...","object":"response","created":1710000000,"model":"your-model","status":"in_progress","output":[]}
event: response.output_item.added
data: {"type":"response.output_item.added","response_id":"resp_mock_...","output_index":0,"item":{"type":"message","role":"assistant"}}
event: response.content_part.added
data: {"type":"response.content_part.added","response_id":"resp_mock_...","output_index":0,"content_index":0,"part":{"type":"text","text":""}}
event: response.text.delta
data: {"type":"response.text.delta","response_id":"resp_mock_...","output_index":0,"content_index":0,"delta":"Hello"}
event: response.text.delta
data: {"type":"response.text.delta","response_id":"resp_mock_...","output_index":0,"content_index":0,"delta":" there"}
event: response.text.done
data: {"type":"response.text.done","response_id":"resp_mock_...","output_index":0,"content_index":0,"text":"Hello there"}
event: response.output_item.done
data: {"type":"response.output_item.done","response_id":"resp_mock_...","output_index":0,"item":{...}}
event: response.completed
data: {"id":"resp_mock_...","object":"response","status":"completed","output":[...]}

Responses API uses named events with event: and data: fields. Seven event types bracket the text deltas, matching the OpenAI Responses API wire format.
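A sketch of a client-side parser for this named-event format. It assumes each `event:` line is followed by a single `data:` line, as in the sample stream above; the function name is illustrative.

```javascript
// Pair each `event:` line with its following `data:` line, accumulate
// text deltas, and note when the stream reports completion.
function parseResponsesStream(sseText) {
  let text = '';
  let completed = false;
  let eventName = null;
  for (const line of sseText.split('\n')) {
    if (line.startsWith('event: ')) {
      eventName = line.slice(7).trim();
    } else if (line.startsWith('data: ') && eventName) {
      const data = JSON.parse(line.slice(6));
      if (eventName === 'response.text.delta') text += data.delta;
      if (eventName === 'response.completed') completed = true;
      eventName = null; // consume the pairing
    }
  }
  return { text, completed };
}
```

Unlike the Chat Completions format, the event name (not a [DONE] sentinel) signals the end of the stream, so the parser dispatches on `event:` rather than scanning payload contents.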
WebSocket
Persistent bidirectional connections for real-time LLM interaction mocking.
How It Works
Connect via WebSocket to /v1/chat or /v1/responses. Send JSON messages and receive matched fixture responses as event sequences. The connection stays open for multiple exchanges.
Connection Example
const ws = new WebSocket('wss://your-api.mock.rest/v1/chat');
ws.onopen = () => {
// Messages must be wrapped in an envelope with type: "message"
ws.send(JSON.stringify({
type: 'message',
data: {
model: 'your-model',
messages: [{ role: 'user', content: 'Hello' }]
}
}));
};
ws.onmessage = (event) => {
const data = JSON.parse(event.data);
console.log(data);
};

Custom Event Sequences
Fixtures can define custom WebSocket event sequences that are replayed when a matching message is received:
Each event has a type, data payload, and optional delay
Events are replayed in order with configured timing
Support for close codes and disconnect simulation
WebSocket Configuration
closeAfter: Automatically close the connection after all events are sent
closeCode: Custom WebSocket close code (e.g., 1000 for normal closure, 1006 for abnormal)
disconnectAfterMs: Drop the connection after N milliseconds to simulate a server crash
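A hypothetical fixture combining a custom event sequence with these options. The events array and delayMs field names are illustrative sketches of the structure described above, not a guaranteed schema; closeAfter and closeCode are the options listed in the table.

```json
{
  "websocket": {
    "events": [
      { "type": "message", "data": { "delta": "Hello" }, "delayMs": 50 },
      { "type": "message", "data": { "delta": " there" }, "delayMs": 50 },
      { "type": "done", "data": {} }
    ],
    "closeAfter": true,
    "closeCode": 1000
  }
}
```

When a client message matches this fixture, the three events are replayed in order with their configured delays, and the connection is closed normally (code 1000) once the last event is sent.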
Error Simulation
Test your application's resilience against real-world failure modes that occur with LLM APIs.
Truncated Streams
Use truncateAfterChunks to simulate partial responses. The stream stops mid-response without sending [DONE], exactly as it would during a real provider outage.
Connection Drops
Use disconnectAfterMs on WebSocket fixtures to simulate server disconnects. The connection is terminated abruptly after the configured delay.
Error Responses
Use error fixtures with status codes like 429 (rate limit) or 500 (server error) to test your error handling and retry logic.
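As an example, here is retry logic with exponential backoff that you might exercise against a 429 or 500 error fixture. `doRequest` is a placeholder for your actual HTTP call; the function and parameter names are illustrative.

```javascript
// Retry a request on 429/500 responses with exponential backoff.
// doRequest: async function returning a response-like object with .status
async function withRetries(doRequest, maxRetries = 3, baseDelayMs = 100) {
  let res;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    res = await doRequest();
    if (res.status !== 429 && res.status !== 500) return res; // success or non-retryable
    if (attempt === maxRetries) break;                        // budget exhausted
    // Exponential backoff: baseDelayMs, 2x, 4x, ...
    await new Promise(resolve => setTimeout(resolve, baseDelayMs * 2 ** attempt));
  }
  return res;
}
```

Pointing this at a fixture that returns 429 a fixed number of times before succeeding lets you verify both the retry count and the backoff timing without touching a real provider.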
Example: Simulating a Rate Limit Mid-Stream
Configure a fixture that streams a few chunks normally, then abruptly stops to simulate a provider rate limit during generation:
{
"response": "This is a long response that will be cut short...",
"streaming": {
"chunkSize": 15,
"latencyMs": 80,
"truncateAfterChunks": 3
}
}

The client receives 3 chunks (approximately 45 characters), then the connection ends without a [DONE] signal.
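On the client side, this kind of truncation can be detected by checking for the [DONE] sentinel after the stream ends; a minimal sketch:

```javascript
// Returns true only if the raw SSE text contains the [DONE] terminator,
// i.e. the stream completed rather than being cut off mid-response.
function isStreamComplete(sseText) {
  return sseText.split('\n').some(line => line.trim() === 'data: [DONE]');
}
```

A test against the fixture above should see `isStreamComplete` return false, confirming that your consumer distinguishes a dropped stream from a finished one.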
Testing with curl
Quickly verify SSE streaming from the command line.
Streaming curl Request
curl -N -X POST https://your-api.mock.rest/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "your-model", "stream": true, "messages": [{"role": "user", "content": "Hello"}]}'

The -N flag disables output buffering so you can see SSE events arrive in real time as they are emitted by the server.

Built-in Test Panel
The fixture editor includes an integrated test panel for verifying streaming behavior without leaving the browser.
Best Practices
Use latencyMs: 50-100 for realistic streaming demos and UI testing
Test with truncateAfterChunks to ensure your app handles incomplete streams gracefully
Keep chunkSize between 10 and 30 characters for the most realistic token-by-token appearance
Always test both streaming and non-streaming paths in your integration tests