Streaming LLM Tokens Through AWS API Gateway

The previous four posts (Part 1, Part 2, Part 3, Part 4) covered Rust on Lambda from cold starts to architectural fit. This one is a sibling rather than a sequel, focused on a specific AWS gotcha that bites anyone wiring an LLM behind API Gateway: HTTP API v2 does not support response streaming. The modern, recommended API Gateway flavor is the wrong tool for streaming LLM tokens, while the older REST API got streaming support in November 2025. If you reach for HTTP API v2 by reflex (and most “use the modern one” guides will tell you to), your token stream silently collapses into a single buffered response. ...

April 30, 2026 · 6 min