Installation
Usage
Streamstraight requires you to tag every stream with a unique streamId; we recommend using an identifier that maps to the downstream client session or request.
Tee a stream to both Streamstraight and HTTP response
For production applications, you often want to send chunks to both your HTTP response and Streamstraight simultaneously, while ensuring the LLM stream completes even if the client disconnects. This pattern uses an asyncio.Queue to decouple the HTTP response from stream processing:
Python
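The following is a minimal, SDK-free sketch of the queue-based tee, meant only to illustrate the shape of the pattern. RecordingWriter, llm_stream(), and simulate_disconnect() are hypothetical stand-ins, not Streamstraight APIs; in a real application the writer would be the Streamstraight writer and llm_stream() would be your model's token stream.

```python
import asyncio
from typing import AsyncIterator

TOKENS = ["Hello", ", ", "world", "!"]
_DONE = object()  # sentinel that marks the end of the stream on the queue


async def llm_stream() -> AsyncIterator[str]:
    """Hypothetical stand-in for an LLM token stream."""
    for token in TOKENS:
        await asyncio.sleep(0.01)  # pretend each token takes time to generate
        yield token


class RecordingWriter:
    """Hypothetical stand-in for the Streamstraight writer; it only records chunks."""

    def __init__(self) -> None:
        self.chunks: list[str] = []

    async def send(self, chunk: str) -> None:
        self.chunks.append(chunk)


async def fan_out_stream(writer: RecordingWriter, queue: asyncio.Queue) -> None:
    """Consume the entire LLM stream, copying each chunk to both destinations."""
    try:
        async for chunk in llm_stream():
            await writer.send(chunk)
            queue.put_nowait(chunk)  # unbounded queue, so this never blocks
    finally:
        queue.put_nowait(_DONE)  # always tell the reader the stream has ended


async def response_stream(
    queue: asyncio.Queue, fan_out_task: asyncio.Task
) -> AsyncIterator[str]:
    """Yield queued chunks to the HTTP response."""
    try:
        while (item := await queue.get()) is not _DONE:
            yield item
    finally:
        # Runs on normal completion and on cancellation (client disconnect).
        # shield() keeps any further cancellation from reaching the fan-out task.
        await asyncio.shield(fan_out_task)


async def simulate_disconnect() -> tuple[list[str], list[str]]:
    """Cancel the response after its first chunk, roughly as an ASGI server would."""
    writer = RecordingWriter()
    queue: asyncio.Queue = asyncio.Queue()
    fan_out_task = asyncio.create_task(fan_out_stream(writer, queue))
    received: list[str] = []
    first_chunk = asyncio.Event()

    async def client() -> None:
        async for chunk in response_stream(queue, fan_out_task):
            received.append(chunk)
            first_chunk.set()

    client_task = asyncio.create_task(client())
    await first_chunk.wait()
    client_task.cancel()  # the simulated disconnect
    try:
        await client_task
    except asyncio.CancelledError:
        pass
    return received, writer.chunks
```

Calling asyncio.run(simulate_disconnect()) returns what the client received before disconnecting together with what the writer recorded. In this sketch the writer still ends up with every token even though the client left early.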
Why is this Queue-based approach necessary?
When a client or browser disconnects from an ASGI server (FastAPI, Starlette, etc.), the server raises asyncio.CancelledError to clean up the response generator. Without the Queue pattern, this cancellation propagates through the coroutines, immediately closes writer_context, and stops the llm_stream loop mid-generation instead of waiting for the LLM stream to complete.

What happens without the Queue:
- Client disconnects or refreshes the page
- ASGI server cancels the stream_body() coroutine
- The async with writer_context block receives cancellation and closes
- The generate_response() iterator stops yielding chunks
- Your LLM stream is interrupted, even though you’re still generating tokens
- You may have code at the end of the loop that persists the LLM stream to your database. Because the loop is interrupted, this code will not run.
- Streamstraight won’t have the complete stream for users to resume later.
- Your application loses data that’s already been generated

How the Queue pattern avoids this:
- The fan_out_stream() task runs independently and consumes the entire LLM stream
- The response_stream() generator reads from the queue and yields to the HTTP response
- When the client disconnects, asyncio.shield() protects the fan-out task from cancellation
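To make the failure mode concrete, here is a small sketch of the approach without a queue, where the response coroutine drives the LLM stream itself. It uses only asyncio; generate_response() and stream_body() reuse the names from the list above but are hypothetical stand-ins, and the saved list stands in for a database write.

```python
import asyncio
from typing import AsyncIterator

TOKENS = ["a", "b", "c", "d"]


async def generate_response() -> AsyncIterator[str]:
    """Hypothetical stand-in for the LLM stream."""
    for token in TOKENS:
        await asyncio.sleep(0.01)
        yield token


async def stream_body(
    sent: list[str], saved: list[str], first_chunk: asyncio.Event
) -> None:
    """Drive the LLM stream directly from the response coroutine (no queue)."""
    collected: list[str] = []
    async for chunk in generate_response():
        collected.append(chunk)
        sent.append(chunk)  # stands in for writing to the HTTP response
        first_chunk.set()
    # Stands in for persisting the finished stream to a database.
    # Never reached if the coroutine is cancelled inside the loop.
    saved.extend(collected)


async def simulate_disconnect_without_queue() -> tuple[list[str], list[str]]:
    """Cancel stream_body() after its first chunk, as on a client disconnect."""
    sent: list[str] = []
    saved: list[str] = []
    first_chunk = asyncio.Event()
    task = asyncio.create_task(stream_body(sent, saved, first_chunk))
    await first_chunk.wait()
    task.cancel()  # CancelledError is raised inside the async for loop
    try:
        await task
    except asyncio.CancelledError:
        pass
    return sent, saved
```

Running asyncio.run(simulate_disconnect_without_queue()) leaves saved empty: the cancellation lands inside the loop, so the remaining tokens are never generated and the persistence step after the loop never runs.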
Pipe an async iterator directly
Already have an async iterator or generator producing chunks? Hand it straight to the SDK and we’ll stream it for you.
Python
Fetching your JWT token
Streamstraight requires your frontend client to connect via a JWT token. Our SDK contains helper functions that generate a JWT token from your API key.
FastAPI
