New TIL: How streaming LLM APIs work
-
I put together some notes after poking around with the OpenAI, Anthropic and Google Gemini streaming APIs
I decided to have a poke around and see if I could figure out how the HTTP streaming APIs from the various hosted LLM providers actually worked. Here are my notes so far.
-
@simon Great post, Simon.
Do you have any idea why all 3 providers use POST rather than GET, which would work with the EventSource API?
-
@velaia My guess is that OpenAI did that first because they were worried prompts would be too long to send over GET, and then everyone else followed their lead
-
Simon Willison replied to Simon Willison:
Updated my TIL with example JavaScript code for streaming events from a fetch() POST API (using an async iterator function) https://til.simonwillison.net/llms/streaming-llm-apis#user-content-bonus--2-processing-streaming-events-in-javascript-with-fetch
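The idea can be sketched roughly like this. Since EventSource only supports GET, streaming a POST response means reading the body with fetch() and parsing the Server-Sent Events format by hand. The endpoint, payload shape, and helper names below are illustrative, not the exact code from the TIL:

```javascript
// Pull the JSON payloads out of the `data:` lines of a raw SSE chunk.
// (A production parser would buffer partial lines across chunk
// boundaries; this sketch assumes each chunk holds whole lines.)
function parseSSE(chunk) {
  return chunk
    .split("\n")
    .filter((line) => line.startsWith("data: "))
    .map((line) => line.slice("data: ".length))
    .filter((payload) => payload !== "[DONE]"); // OpenAI's end-of-stream marker
}

// Async generator over a streamed POST response. EventSource can't do
// POST, so we use fetch() and iterate the response body ourselves.
// url/body/apiKey are hypothetical; adjust for the provider you're using.
async function* streamCompletion(url, body, apiKey) {
  const response = await fetch(url, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify(body),
  });
  const decoder = new TextDecoder();
  // Async iteration of response.body works in Node 18+; some browsers
  // still require response.body.getReader() and a manual read loop.
  for await (const chunk of response.body) {
    for (const payload of parseSSE(decoder.decode(chunk, { stream: true }))) {
      yield JSON.parse(payload); // one parsed event per `data:` line
    }
  }
}
```

Consuming it is then just `for await (const event of streamCompletion(...))`, which is the async-iterator pattern the TIL describes.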