“We saw a demo where it was instant.”

You get a brief. The client wants sub-second latency. No viewer complained, no metric moved, the stream is fine. But somehow your head of sales walked out of the vendor pitch having promised something he figured must be simple since their competitor already had it. Now it’s in the contract and it’s your problem.
The myths
“Low latency improves engagement.”
No published study survives scrutiny once you control for quality and reliability. Stall events hurt retention. Latency, by itself, doesn’t move the number.
“WebRTC is always better.”
Only at small scale. Past a few hundred concurrent viewers you need an SFU, then cascading SFUs, then simulcast because browsers don’t do real ABR for WebRTC. It stops being simple very quickly.
“The CDN is the bottleneck.”
The CDN is almost never the bottleneck. Most of your latency lives in the player buffer, which you control and the CDN doesn’t touch.
“1.2 seconds glass-to-glass.”
The number is real. It was measured in a lab, wired connection, their encoder feeding their player, no ABR ladder, no DVR, no captions, camera six inches from the screen. Add a 3-rendition ladder and a real player buffer and the same stack delivers 4 to 6 seconds in production. The marketing number and the operational number are not the same thing.
“The encoder is where the latency is.”
The encoder buffer sets you back 500ms to 2 seconds.
Network transit to the ingest point adds 100 to 500ms depending on geography.
Transcode adds 1 to 3 seconds.
HLS packaging adds 2 to 6 seconds.
CDN delivery is mostly negligible at steady state, 50 to 200ms.
The player buffer adds 2 to 30 seconds.
Optimizing the encoder while leaving the player buffer alone is solving the wrong end of the problem.
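To make the proportions concrete, here is a minimal sketch that adds up the stage ranges listed above. The figures are copied from the list; the names `BUDGET`, `total`, and `worst_case_share` are illustrative, and summing the low and high ends assumes the stages are independent, which is a simplification.

```python
# Latency budget from the list above, as (low, high) seconds per stage.
BUDGET = {
    "encoder buffer": (0.5, 2.0),
    "network to ingest": (0.1, 0.5),
    "transcode": (1.0, 3.0),
    "HLS packaging": (2.0, 6.0),
    "CDN delivery": (0.05, 0.2),
    "player buffer": (2.0, 30.0),
}


def total(budget):
    """Return (best_case, worst_case) end-to-end latency in seconds."""
    low = sum(lo for lo, _ in budget.values())
    high = sum(hi for _, hi in budget.values())
    return low, high


def worst_case_share(budget, stage):
    """Fraction of the worst-case total contributed by one stage."""
    _, high = total(budget)
    return budget[stage][1] / high


if __name__ == "__main__":
    low, high = total(BUDGET)
    print(f"end-to-end: {low:.2f}s to {high:.2f}s")
    for stage in BUDGET:
        share = worst_case_share(BUDGET, stage)
        print(f"{stage:18s} {share:5.1%} of worst case")
```

With these ranges the end-to-end total runs from about 5.65 to 41.7 seconds. At the high end the player buffer accounts for roughly 72% of the total and the encoder buffer for under 5%.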
Who actually needs it
Live betting and in-play markets. Live auctions and bidding. Competitive esports where the prize pool justifies the engineering. Bilateral conversation: telehealth, synchronous tutoring, trading floor feeds where the other person is waiting on you.
That’s the list. If you’re in it, pay for it. The engineering is real and worth it.
Everything else is one-way passive consumption. The viewer watches. They don’t bid, they don’t respond, they don’t act. That’s most of the streaming market.
Latency is relative, not absolute
Ten seconds of delay is invisible if everyone in your audience is ten seconds behind. Nobody feels late. Nobody is ahead. The stream is the reference.
Broadcast TV has always run 5 to 10 seconds behind. Cable adds more. Satellite adds more still. Nobody complained for decades because there was no faster reference channel – the whole audience was behind together. Viewers feel latency when something faster exists alongside it. A Twitter feed spoiling the goal before it plays. A Discord chat reacting to a moment the video hasn’t shown yet. The neighbor’s TV cheering through the wall. The stadium roar arriving before the replay does on screen.
The problem isn’t the number. It’s the gap between your stream and the next fastest thing your viewer has access to. Close that gap and the latency stops being noticeable. If you can’t close it, accept it and build around it.
One underused option: delay the chat to match the slowest viewer instead of lowering video latency to match the chat. Twitch and YouTube both do this quietly. It costs nothing and solves the spoiler problem for most audiences without touching the streaming stack.
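As a rough illustration of the chat-delay idea, the sketch below holds each message for a fixed number of seconds before releasing it. It is a hypothetical design, not a description of how any platform implements it. The class name `DelayedChat` and the `delay_seconds` parameter are assumptions; in practice the delay would be set from the measured latency of your slowest viewers.

```python
import heapq
import itertools


class DelayedChat:
    """Hold chat messages until the slowest viewer's video has caught up."""

    def __init__(self, delay_seconds):
        self.delay = delay_seconds
        self._queue = []                 # min-heap ordered by release time
        self._order = itertools.count()  # tie-breaker that keeps arrival order

    def post(self, sent_at, text):
        """Queue a message sent at time `sent_at` (seconds)."""
        release_at = sent_at + self.delay
        heapq.heappush(self._queue, (release_at, next(self._order), text))

    def release(self, now):
        """Return every message whose hold time has elapsed by `now`."""
        ready = []
        while self._queue and self._queue[0][0] <= now:
            ready.append(heapq.heappop(self._queue)[2])
        return ready


chat = DelayedChat(delay_seconds=10)
chat.post(0.0, "GOAL!")
early = chat.release(5.0)    # still held: the video has not shown the goal yet
on_time = chat.release(10.0)  # released once the hold time has passed
```

The video pipeline is untouched; only the chat path gains a buffer, which is why this approach adds no streaming cost.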
What you pay
Numbers below are for 1000 concurrent viewers, 720p, one hour, North America. Use them as a multiplier reference. They will change.
| Setup | Cost/hour | What’s included |
|---|---|---|
| Self-managed HLS (Bunny CDN) | ~$11 | CDN delivery only, you handle encoding |
| LL-HLS managed (Mux) | ~$50 | Encoding and delivery |
| LL-HLS managed (Cloudflare Stream) | ~$60 | Encoding and delivery |
| LL-HLS managed (AWS IVS standard) | ~$74 | Encoding and delivery |
| WebRTC via SFU (AWS IVS Real-Time) | $72+ | Scaling is handled, but cost per viewer stays high |
The gap between self-managed HLS and a managed LL-HLS platform, roughly 4.5x to 6.7x on the numbers above, is partly the low-latency premium and partly the managed service premium. Separating the two is hard because no major vendor sells unmanaged LL-HLS delivery at Bunny prices.
WebRTC looks comparable on paper. In practice it isn’t, because scaling it yourself is painful and far from straightforward. Managed services like AWS IVS Real-Time handle that for you, which is what the price reflects.
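To use the table as a multiplier reference, a small sketch like the following converts each hourly figure into a per-viewer-hour rate and a multiple of the self-managed baseline. The prices are copied from the table and will drift. The WebRTC row is left out because "$72+" is a floor rather than a figure, and the linear extrapolation in `projected_cost` is an assumption that ignores volume discounts and minimum commitments.

```python
# Hourly cost for 1000 concurrent 720p viewers, copied from the table above.
VIEWERS = 1000
COST_PER_HOUR = {
    "Self-managed HLS (Bunny CDN)": 11.0,
    "LL-HLS managed (Mux)": 50.0,
    "LL-HLS managed (Cloudflare Stream)": 60.0,
    "LL-HLS managed (AWS IVS standard)": 74.0,
}
BASELINE = "Self-managed HLS (Bunny CDN)"


def per_viewer_hour(setup):
    """Cost in dollars for one viewer watching for one hour."""
    return COST_PER_HOUR[setup] / VIEWERS


def multiple_of_baseline(setup):
    """How many times more expensive a setup is than the baseline."""
    return COST_PER_HOUR[setup] / COST_PER_HOUR[BASELINE]


def projected_cost(setup, viewers, hours):
    """Linear extrapolation; assumes no volume discounts or minimums."""
    return per_viewer_hour(setup) * viewers * hours


if __name__ == "__main__":
    for setup in COST_PER_HOUR:
        rate = per_viewer_hour(setup)
        multiple = multiple_of_baseline(setup)
        print(f"{setup:36s} ${rate:.3f}/viewer-hour  {multiple:.1f}x")
```

On these numbers the managed LL-HLS options come out at about 4.5x, 5.5x, and 6.7x the self-managed baseline.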
The mobile brittleness
Tight player buffers are what get you to 1 to 2 seconds on a good connection. On 4G or 5G with variable RTT, the same buffer causes stalls. A viewer on WiFi gets a faster stream. A viewer on cellular gets a broken one. If your audience skews mobile, your audience-weighted experience can be worse on a low-latency deployment than on a standard 6 to 10 second buffer.
The metric you’re probably watching is P50 latency (the median viewer experience). That’s the wrong number. The viewer you lose is the P10 (the bottom 10% – weakest connections, worst conditions) who stalls every 30 seconds. A deployment that improves P50 from 6 seconds to 2 while making P10 rebuffer constantly is a worse product. Instrument your player for stall events and rebuffer ratio, not latency averages.
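A minimal sketch of what that instrumentation might report, assuming your player can send per-session playing time, stalled time, and stall count. The `Session` fields and function names are hypothetical and not tied to any particular player SDK, and `summarize` assumes a non-empty list of sessions with nonzero duration.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Session:
    watch_seconds: float  # time spent actually playing
    stall_seconds: float  # time spent rebuffering
    stall_count: int      # number of stall events


def rebuffer_ratio(s):
    """Share of the session spent stalled rather than playing."""
    total = s.watch_seconds + s.stall_seconds
    return s.stall_seconds / total if total > 0 else 0.0


def worst_decile(sessions):
    """The worst 10% of sessions by rebuffer ratio (at least one)."""
    ranked = sorted(sessions, key=rebuffer_ratio, reverse=True)
    return ranked[: max(1, len(ranked) // 10)]


def summarize(sessions):
    """Compare the median viewer with the tail that is likely to churn."""
    tail = worst_decile(sessions)
    tail_minutes = sum(s.watch_seconds + s.stall_seconds for s in tail) / 60
    return {
        "median_rebuffer_ratio": median(rebuffer_ratio(s) for s in sessions),
        "tail_rebuffer_ratio": sum(rebuffer_ratio(s) for s in tail) / len(tail),
        "tail_stalls_per_minute": sum(s.stall_count for s in tail) / tail_minutes,
    }


# Nine clean sessions and one that stalls 20 times in ten minutes.
sessions = [Session(600, 0, 0) for _ in range(9)] + [Session(400, 200, 20)]
report = summarize(sessions)
```

In this made-up sample the median rebuffer ratio is zero, so a median-only dashboard looks perfect, while the tail session is stalled a third of the time and hits two stalls per minute.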
It gets worse
Building for low latency narrows the stack. Specific ingest formats, specific encoders, specific players, specific CDNs. Every subsequent decision has to respect the latency constraint, including ones that have nothing to do with latency.
DVR is the one that catches people off guard. Standard HLS gives you free DVR because the segments already exist on the CDN. WebRTC has no segments – you need a parallel recording pipeline. LL-HLS keeps segments but partial segments complicate cleanup. The free DVR most operators rely on goes away the moment you pick a low-latency stack.
Why it became a selling point
CDN bandwidth pricing has been collapsing since 2018 and keeps dropping every year. Delivery costs fell, differentiation blurred, “we stream video reliably” stopped being a sellable line.
Latency became the new premium feature: technically hard enough to justify the price, easy to demo with a stopwatch on stage, and impossible to disprove on a slide. The numbers are real; they’re just measured in conditions that don’t exist in your production environment.
That’s why every CDN suddenly has an ultra-low-latency tier. The feature is real. The widespread demand it claims to address is mostly manufactured.
Should you bother?
Five questions. If you answer yes to most, you’re in the low-latency cohort.
- Does your viewer act on what they see within seconds – bid, bet, respond?
- Is there a faster channel they’re comparing against: a second screen, a chat, a neighbor?
- Does your audience watch primarily on desktop or TV?
- Would delaying the chat break the experience?
- Are you willing to accept a narrower stack, higher ops cost, and no free DVR?
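If it helps to put the checklist in front of a product team, it can be written down as a tiny scoring function. This is only a sketch: the key names are invented for illustration, and reading "most" as at least three yes answers out of five is an assumption.

```python
# One key per question in the list above; names are illustrative.
QUESTIONS = (
    "viewer_acts_within_seconds",
    "faster_channel_exists",
    "mostly_desktop_or_tv",
    "chat_delay_breaks_experience",
    "accepts_narrow_stack_and_cost",
)


def in_low_latency_cohort(answers, threshold=3):
    """True when at least `threshold` of the five answers are yes."""
    missing = [q for q in QUESTIONS if q not in answers]
    if missing:
        raise ValueError(f"unanswered: {missing}")
    return sum(bool(answers[q]) for q in QUESTIONS) >= threshold


live_betting = dict.fromkeys(QUESTIONS, True)
concert_stream = dict.fromkeys(QUESTIONS, False)
concert_stream["mostly_desktop_or_tv"] = True

betting_result = in_low_latency_cohort(live_betting)
concert_result = in_low_latency_cohort(concert_stream)
```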
The $0 closer
Low latency is worth every penny when the use case demands it. The problem is that most of the industry decided the use case was universal. It isn’t.