Drunken teenagers

Years ago, a conference speaker pulled up a graph like this – video players all switching renditions at the same time for the wrong reason – and called the behavior “drunken teenagers”.

It stuck. I’ve been over-using it ever since – throwing it at every vaguely similar incident across platforms, to mostly confused looks. This is the explanation I never gave.

Your players are working exactly as configured

…and it’s usually the default config

As you grow, that may no longer be good enough. If it was ever good in the first place.

To name just a few ways it goes wrong:

Error handling surfaces a generic message and stops – no retry, no fallback, no reload.
Retry logic hammers the origin on a failed segment request instead of backing off.
Players switch down on a single slow segment, not a trend – one bad measurement tanks the quality.
The initial rendition is hardcoded, ignoring whatever bandwidth estimate the browser already has.
Stall recovery drops to the lowest rendition and crawls back up too slowly.
Time-to-first-frame is inflated because the player preloads more than it needs.
DVR window set too short – viewers who pause for two minutes come back to a dead stream.
Heavy rendering or decoding blocks the video pipeline on low-end devices, causing stalls that look like network issues.
Buffer sized for a lab connection. Stalls on the first 4G hiccup.
Players preload the next segment in the background while the viewer is paused, burning data for no reason.
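
The retry item above has a well-known fix: exponential backoff with random jitter. It also happens to break up lockstep retries from a cohort of players that failed on the same segment. Below is a minimal Python sketch. It assumes your player lets you wrap segment fetches in your own retry logic. `fetch_with_backoff`, `backoff_delays` and the 0.5 s base / 8 s cap are illustrative choices, not any SDK's API.

```python
import random
import time


def backoff_delays(max_retries, base_s=0.5, cap_s=8.0, rng=None):
    """Full-jitter exponential backoff: retry n waits a random time in
    [0, min(cap_s, base_s * 2**n)], so players that failed on the same
    segment at the same moment don't all retry at the same moment too."""
    rng = rng or random.Random()
    return [rng.uniform(0, min(cap_s, base_s * 2 ** attempt))
            for attempt in range(max_retries)]


def fetch_with_backoff(fetch, max_retries=4, sleep=time.sleep, rng=None):
    """Call fetch(); on failure wait a jittered delay and try again.
    Re-raises the last error once the retries are used up."""
    delays = backoff_delays(max_retries, rng=rng)
    for attempt in range(max_retries + 1):
        try:
            return fetch()
        except OSError:
            if attempt == max_retries:
                raise
            sleep(delays[attempt])
```

The `sleep` parameter is injectable so the logic can be tested without actually waiting.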

What the player actually does

The player buffers ahead. It picks a rendition. It decides what to do when things break. Three decisions, happening continuously, driven by config values you set once and forgot.

The buffer is how much video it preloads before it starts playing, and how much it tries to keep ahead. Too small and any jitter stalls the stream. Too large and you’ve added latency you didn’t need and memory pressure on cheap devices.

ABR picks the rendition. It estimates bandwidth from recent segment download times – a lagging indicator at best. On a degrading connection it’s optimistic too long. On a recovering one it’s cautious too long.
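
The lag is a trade-off, not a bug. The sketch below shows one common smoothing approach: keep two exponentially weighted moving averages with different half-lives and trust the more pessimistic one. With this rule, one slow segment dents the estimate without tanking it. A recovering connection also has to convince the slow average before the player climbs. The 3 s / 9 s half-lives and the choice to weight each sample by segment duration are assumptions; real players ship their own variants.

```python
import math


class Ewma:
    """Moving average whose weight halves every half_life_s seconds of media."""

    def __init__(self, half_life_s):
        self.alpha = math.exp(math.log(0.5) / half_life_s)
        self.raw = 0.0
        self.total_weight = 0.0

    def add(self, weight_s, value):
        adj = self.alpha ** weight_s
        self.raw = value * (1 - adj) + adj * self.raw
        self.total_weight += weight_s

    def value(self):
        # Undo the bias toward the 0.0 starting value.
        zero_factor = 1 - self.alpha ** self.total_weight
        return self.raw / zero_factor if zero_factor > 0 else 0.0


class BandwidthEstimator:
    """Fast and slow averages; the estimate is the lower of the two."""

    def __init__(self, fast_half_life_s=3.0, slow_half_life_s=9.0):
        self.fast = Ewma(fast_half_life_s)
        self.slow = Ewma(slow_half_life_s)

    def on_segment(self, segment_bytes, download_s, segment_duration_s):
        mbps = segment_bytes * 8 / download_s / 1e6
        self.fast.add(segment_duration_s, mbps)
        self.slow.add(segment_duration_s, mbps)

    def estimate_mbps(self):
        return min(self.fast.value(), self.slow.value())
```

After a run of 8 Mbps segments, one segment that downloads at 1 Mbps pulls the estimate down to roughly 4 Mbps rather than to 1.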

Stall recovery kicks in when the buffer runs dry. Drop to the lowest rendition and fill fast, or hold and wait. The wrong call compounds the problem.

Wrong in many different ways

The default buffer is a single number applied to every viewer regardless of where they are, what they’re watching, or what device they’re on.

On a variable cellular connection, it’s too small. One hiccup and it empties. The stall isn’t a network failure. It’s a sizing failure.

On a live stream, it’s too large. Every second of buffer is a second of latency. A buffer sized for VOD puts live viewers that many seconds behind. The stream is technically live. The experience isn’t.

On a low-end device, it’s expensive. Buffered video sits in memory. Size the buffer for a flagship and you create memory pressure on the mid-range Android most of your viewers are actually using. That pressure causes frame drops. Frame drops look like network problems. The player reacts accordingly.
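
The memory side is plain arithmetic: a forward buffer holds roughly seconds x bitrate of compressed video, before any decoded frames. The picker below is purely hypothetical, and every number in it is an assumption. It only shows the shape of the fix: one target per context instead of one number for everyone.

```python
def buffer_memory_mb(buffer_s, bitrate_mbps):
    """Megabytes of compressed video held by a forward buffer of buffer_s seconds."""
    return buffer_s * bitrate_mbps / 8  # megabits -> megabytes


def pick_buffer_target_s(is_live, on_cellular, low_memory_device):
    """Hypothetical per-context buffer target; every number is an assumption."""
    target = 6.0 if is_live else 30.0  # live: each buffered second is a second behind
    if on_cellular:
        target *= 1.5                  # extra cushion for variable RTT
    if low_memory_device:
        target = min(target, 20.0)     # cap what sits in RAM on budget phones
    return target
```

At 6 Mbps, a 60-second buffer is about 45 MB of compressed data alone. That is why a flagship-sized buffer hurts on a budget phone.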

A stall isn’t always a network problem. On a low-end device, it can be a decoding problem.

Hardware decoders have caps. A budget Android from two years ago might handle 720p without issue and fall apart at 1080p – not because the network can’t deliver the data, but because the chip can’t process it fast enough. The player doesn’t (always) know the difference. It sees dropped frames, reads it as a buffer problem, and switches the rendition down.

That sounds like correct behavior. It isn’t. The player switched down on a false signal. When the buffer refills, it tries to climb back up. The device struggles again. You get an oscillation loop that has nothing to do with network conditions.
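
One way to break the loop is to treat decode failure as a signal separate from bandwidth, and to remember it. The sketch below assumes you can read frame counters per rendition. In browsers, `HTMLVideoElement.getVideoPlaybackQuality()` exposes `totalVideoFrames` and `droppedVideoFrames`. `DecodeCap`, the 20% drop threshold and the 120-frame minimum are hypothetical.

```python
class DecodeCap:
    """Remember renditions this device can't decode in real time, so ABR
    stops climbing back to them after each buffer refill.
    Renditions are ladder indices, 0 = lowest."""

    def __init__(self, drop_ratio_limit=0.2, min_frames=120):
        self.drop_ratio_limit = drop_ratio_limit  # assumed threshold
        self.min_frames = min_frames              # ignore tiny samples
        self.total = {}
        self.dropped = {}
        self.max_allowed = None                   # None = no cap yet

    def report(self, rendition, total_frames, dropped_frames):
        self.total[rendition] = self.total.get(rendition, 0) + total_frames
        self.dropped[rendition] = self.dropped.get(rendition, 0) + dropped_frames
        seen = self.total[rendition]
        if seen >= self.min_frames and self.dropped[rendition] / seen > self.drop_ratio_limit:
            cap = max(rendition - 1, 0)
            if self.max_allowed is None or cap < self.max_allowed:
                self.max_allowed = cap

    def clamp(self, abr_choice):
        """Apply the decode cap to whatever the bandwidth-based ABR picked."""
        return abr_choice if self.max_allowed is None else min(abr_choice, self.max_allowed)
```

The bandwidth estimator still picks a rendition. `clamp` just refuses to hand back one the chip has already failed on.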

When defaults become contagious

A cohort of viewers starts the same stream at roughly the same time. A few things align. They’re on the same CDN point of presence. Their segments are the same size. Their ABR algorithms are running the same evaluation on the same schedule. Something goes wrong – a CDN hiccup, a keyframe boundary that triggers a quality re-evaluation, a momentary spike in origin response time. Every player in that cohort hits it at the same moment and makes the same decision.

Rendition switches in lockstep. Stalls at the same second. Recovery in waves, because they all dropped to the lowest rendition simultaneously and are climbing back at the same rate.

Rendition layers cycling in near-sync. Players all moving together for no obvious network reason. That’s what drunken teenagers looks like at scale.

Not meant to defame teenagers. Not meant to encourage underage drinking.

What you’re not measuring

Your server-side dashboard shows nothing. CDN healthy. Encoder fine. Origin responding.

The problem is a JavaScript runtime on ten thousand devices making the same bad decision simultaneously, and nothing on your server can see it.

Stall events don’t exist in infrastructure logs.

They exist in the player. If you’re not instrumenting it and shipping those events somewhere, you have no visibility into the part of the stack that actually touches your viewers.

Most teams add player analytics after something breaks badly enough to notice.

By then you’re debugging from memory and user complaints. The signal was there the whole time, unlogged.

Player SDKs vary. Most expose stall count, stall duration, current rendition. Some give buffer level on a timer. Fewer give ABR decision events. Start with what you have. Something beats nothing.
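
Once stall events are shipping somewhere, two aggregates go a long way. The first is rebuffer ratio per session. The second is a count of how many sessions stall in the same second, which is what makes the lockstep pattern visible. The sketch uses a hypothetical `StallEvent` schema; map your SDK's own event fields onto it. The rebuffer-ratio definition used here (stalled time over watched plus stalled time) is one common convention among several.

```python
from dataclasses import dataclass


@dataclass
class StallEvent:
    session_id: str
    start_s: float      # wall-clock time the stall began, epoch seconds
    duration_s: float


def rebuffer_ratio(stalls, watched_s):
    """Stalled time over total time spent watching or stalled."""
    stalled = sum(e.duration_s for e in stalls)
    total = watched_s + stalled
    return stalled / total if total else 0.0


def synchronized_stalls(events, min_sessions):
    """Wall-clock seconds in which at least min_sessions distinct sessions
    started a stall: the drunken-teenager signature."""
    sessions_by_second = {}
    for e in events:
        sessions_by_second.setdefault(int(e.start_s), set()).add(e.session_id)
    return sorted(sec for sec, ids in sessions_by_second.items()
                  if len(ids) >= min_sessions)
```

Bucketing by whole seconds can split a cohort across a boundary. A sliding window is the sturdier version.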

The fix

This is a config problem. You don’t need to rewrite the player, swap the SDK, or file a ticket with your vendor. You need to read the defaults and decide if they make sense for your actual viewers.

Start with buffer size.

Pick a number for live and a different number for VOD. They have opposite requirements. One config for both is already wrong.

Test stall recovery deliberately.

Throttle the connection mid-stream. Pull the network. See what happens. Most teams have never done this and have no idea what their player does when the buffer empties. The default behavior is rarely what you’d choose if you’d seen it.

Test on a real device.

Not your laptop. Not Chrome DevTools throttling. A mid-range Android on 3G or 4G. That’s the device your median viewer is on. If it stalls there, you have a problem. If you’ve never tested there, you don’t know.

The config that ships is the config that runs. At scale, on every device, for every viewer. Treat it accordingly.

Scale makes it worse

At low viewer counts, bad player behavior is noise. One viewer stalls, another doesn’t. It averages out and you never see it in aggregate.

As the audience grows, the randomness collapses. More viewers starting at the same time. More players sharing the same CDN point of presence. More ABR algorithms running the same evaluation on the same segment. Individual failures synchronize into a pattern.

The misconfiguration that was a support ticket at a thousand viewers is a platform event at a hundred thousand.

And you’ll probably blame the CDN.

You can finally run video payloads through Cloudflare, for free!

“Isn’t video against their TOS?”

For a decade, Cloudflare’s self-serve terms drew a line between HTML and everything else, and “everything else” was where your video lived. Push too many megabytes of it through the Free or Pro CDN and you got a polite email, a throttle, or a redirect. The folklore hardened into a rule everyone repeated: don’t serve video on Cloudflare unless you pay for the dedicated product or you’re on Enterprise.

In May they rewrote the rule. The old clause moved into CDN-specific terms, and the new test is where the bytes are hosted, not which plan you’re on. Video served through the CDN from a Cloudflare service (Stream, Images, or R2) is allowed on Free, Pro, and Business. Video pulled from an origin outside Cloudflare is still restricted.

So the ban didn’t die. It turned into an incentive, which points at

R2, the almost-free origin

R2 is object storage with no egress fee. Storage runs $0.015/GB per month, writes cost $4.50 a million, reads $0.36 a million, and bandwidth out is zero. On top of that the first 10 GB of storage, the first million writes, and the first ten million reads each month are free. That egress number is the point: every other object store treats bandwidth out as the meter that runs up your bill, and R2 sets it to nothing.
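
Here is the pricing above as a function, for sanity-checking your own numbers. The prices and free-tier allowances are copied from the paragraph. Cloudflare bills writes as Class A operations and reads as Class B operations. Check the current pricing page before relying on these figures.

```python
# R2 list prices as quoted above; verify against Cloudflare's current pricing.
STORAGE_PER_GB_MONTH = 0.015
WRITES_PER_MILLION = 4.50      # Class A operations
READS_PER_MILLION = 0.36       # Class B operations
FREE_GB, FREE_WRITES, FREE_READS = 10, 1_000_000, 10_000_000


def r2_monthly_cost(stored_gb, writes, reads):
    """Monthly R2 bill in USD. There is no egress term because egress is free."""
    storage = max(0, stored_gb - FREE_GB) * STORAGE_PER_GB_MONTH
    write_cost = max(0, writes - FREE_WRITES) / 1e6 * WRITES_PER_MILLION
    read_cost = max(0, reads - FREE_READS) / 1e6 * READS_PER_MILLION
    return storage + write_cost + read_cost
```

Take a 500 GB library with 100,000 writes and 30 million cache-miss reads a month. It comes to about $14.55, with no bandwidth line at all.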

R2 is an origin, not a CDN. On its own it’s a bucket in a region. The arrangement that matters is R2 behind Cloudflare’s cache on a custom domain, and that arrangement is now cleared to carry video. Immutable files, meaning your HLS segments, cache at the edge and serve from there, so the reads that reach R2 are cache misses, not viewers. Storage costs pennies, egress is free, serving is allowed. For static video that ends the cost conversation.

Live streaming on the other hand…

Ok, challenge, let’s live stream for free*

*For longer than some 30-day trial

Cloudflare handed us a free-egress origin and permission to serve video from it. What they didn’t hand us is a way to get the video in. RTMP ingest would still need the paid upgrade, and you can’t smuggle an RTMP listener into a CF Worker either: inbound to a Worker is HTTP, WebSocket, or SMTP, the socket API only dials out, and RTMP wants a connection held open for the whole broadcast. Serverless holds nothing open. So Cloudflare walks us most of the way and then waves from the shore. Thanks 😉

We take it from here with Oracle’s Always Free tier: an Arm box, 4 cores, 24 GB, 10 TB of egress a month, free for as long as the account lives. Drop nginx-rtmp on it, point the encoder at 1935, and it remuxes RTMP to HLS on disk. No transcode, so the cores stay bored. rclone pushes each segment to R2 as nginx closes it: one copy of the stream leaving the box, about 2 TB a month for a 24/7 6 Mbps channel, inside the free 10. Then R2 behind the cache serves it, and nobody ever touches your little Arm box.
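
This checks the egress claim for a single 6 Mbps rendition pushed around the clock for a 30-day month. It uses decimal terabytes and ignores protocol overhead. How Oracle actually meters the traffic is an assumption worth checking.

```python
def monthly_egress_tb(bitrate_mbps, hours_per_day=24, days=30):
    """Decimal terabytes sent by one continuous push at bitrate_mbps."""
    seconds = hours_per_day * 3600 * days
    return bitrate_mbps * 1e6 * seconds / 8 / 1e12


one_channel = monthly_egress_tb(6)
print(f"{one_channel:.2f} TB/month")                     # 1.94 TB/month
print(int(10 // one_channel), "channels fit in 10 TB")  # 5 channels fit in 10 TB
```

That leaves room for about five such single-rendition channels under the free 10 TB, before counting any other traffic.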

The minor catch: segments cache, the live playlist doesn’t. Leave it and every viewer’s refresh hits R2; give it a 2s edge TTL and Cloudflare swallows most of the requests gracefully.
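
Here is rough arithmetic for why the short TTL matters. It assumes players refresh the live playlist about once per target duration (4 s here) and that each cache location misses at most once per TTL window. The viewer count and the 50 PoPs in the example are hypothetical. Request coalescing or Tiered Cache can push origin load lower still.

```python
def playlist_origin_rps(viewers, refresh_s, edge_ttl_s=None, pops=1):
    """Rough upper bound on live-playlist requests per second reaching R2."""
    viewer_rps = viewers / refresh_s
    if edge_ttl_s is None:
        return viewer_rps  # uncached: every refresh goes to the origin
    # Cached: at most about one miss per PoP per TTL window.
    return min(viewer_rps, pops / edge_ttl_s)
```

For 10,000 viewers, that is 2,500 playlist requests per second at R2 uncached, versus about 25 with a 2 s edge TTL across 50 PoPs.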

Whole thing’s up for grabs as usual: nginx config, sync script, R2 and Cloudflare setup.

Does it scale?

On the delivery side, yes, because that’s Cloudflare’s cache serving static files, which it does for a living. On the ingest side, one box is one box. Four cores will remux a surprising number of concurrent streams, but there’s no failover and the free tier gives you one region. Add viewers freely. Adding ingest redundancy means leaving the free tier.

Is it stable?

nginx-rtmp is old and boring in the way you want infrastructure to be. The soft spot is Oracle. Always Free instances get reclaimed if they idle, and free capacity in a region can dry up under you. Fine for a personal channel or a project you can babysit. Not something to hang a paid SLA on without a paid instance underneath.

How free* is it?

Ingest is free, and the Oracle box comes with a public IPv4 at no charge. A rolling live window is a couple GB, so it sits inside R2’s free 10 and storage costs nothing; the reads come off the cache, also nothing. Egress is zero on both legs that matter. The only nudge above zero is a genuinely around-the-clock channel, whose segment and playlist writes creep past R2’s free million to about a dollar a month. Stream part-time and even that disappears. The one line item that can balloon is compute: the moment you ask that box to transcode an ABR ladder instead of passing a single rendition through, you’re CPU-bound and the free cores run out. Keep it remux and the number stays near zero.
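
Here is where "about a dollar" comes from. Each segment is one upload, and each segment also rewrites the playlist. The sketch assumes exactly two Class A writes per segment. Any listing or metadata calls your sync tool makes would add more, so treat the result as a floor.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600


def monthly_writes(segment_s, uploads_per_segment=2):
    """Class A writes: one per new segment plus one per playlist rewrite."""
    return SECONDS_PER_MONTH / segment_s * uploads_per_segment


def write_bill(writes, free=1_000_000, per_million=4.50):
    """USD for writes beyond R2's free monthly allowance."""
    return max(0, writes - free) / 1e6 * per_million


for seg in (2, 4, 6):
    w = monthly_writes(seg)
    print(f"{seg}s segments: {w:,.0f} writes, ${write_bill(w):.2f}")
```

With 4 s segments you land at about $1.33 a month. At 6 s you stay inside the free million. At 2 s you pay roughly $7.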

Is it worth it?

For one stream or a few, for a hobby, a community, a project that can shrug off the odd hiccup, it’s hard to beat free. For a business that needs redundancy, more than one region, and someone to page at 3am, the single box is the exact corner you can’t cut, and you may want to pay to uncut it.

Why not a media-dedicated origin?

I’ve been asked repeatedly why I don’t use (or no longer use) the likes of AWS Elemental MediaStore. Awkward answer: it’s been pointless since late 2020. Before that, S3 was eventually consistent; since then it’s been strongly read-after-write consistent (all we really need), and it’s still cheaper.

R2 has been strongly consistent since launch; it never had the eventual-consistency era S3 spent more than a decade in. Read-after-write, deletes, and list operations are all globally strongly consistent: write an object and every reader anywhere immediately sees the latest version; delete it and reads immediately return not-found; a list reflects the bucket at that point in time.

The backup stream nobody cared about

“It worked every time we checked…”

You have a backup stream. You haven’t checked it in six months, the credentials are probably expired, and the person who set it up left the company. The one time live went down it wouldn’t have made a difference as the failure was elsewhere.

What took you down usually wasn’t the thing you prepared for. Most backup streams are designed for one scenario: the encoder dies, the switch flips, life goes on. That’s the clean failure. It’s also the rare one. What actually happens in production is messier, slower, and harder to call. The stream is alive but broken. The dashboard is green. The audience is already gone.

The mentality

…we probably don’t need it

The primary never failed. Not last week, not last month, not at the big event in March. The encoder is solid. The CDN is enterprise-grade. The team knows what they’re doing.

Hard to argue with. It’s not laziness. It’s a reasonable reading of a track record that hasn’t included a failure yet.

The economics

…it’s not worth the extra money/effort

The backup stream has a cost. Second encoder, second ingest, second CDN, second everything if you do it properly. That’s a real line item on a budget that already has competing priorities.

The outage has a cost too. But it’s theoretical. It might never happen. And if it does, maybe it’s a short one, maybe viewers come back, maybe the client doesn’t notice. 😈

This is the math that kills backup streams. Not negligence. A rational calculation made by people who have never experienced the failure they’re deciding not to protect against.

Single points of failure

…the backup failed too

The backup encoder is on the same switch as the primary? The same venue WiFi? Pushing to the same ingest endpoint, just with a different stream key? When the network goes down, both go down. When the ingest malfunctions, both fail. The backup existed. It was wired to the same failure.

A backup that fails the same way as the primary isn’t a backup. But…

True separation at every layer (encoder, network, ingest, origin, CDN) is complex to build and expensive to run. Most operations can’t justify it and don’t need it.
The gaps are rarely invisible. Most engineers can tell you where the stack is fragile. The backup just never got prioritized over everything else that needed building.

Operational readiness

…somebody knows how to switch, I think

The backup stream is configured. The runbook exists somewhere. The vendor console has a button for it. But the one person who knows how it works is not on call tonight. The doc is three product iterations old. Nobody has actually switched to the backup since the initial test months ago. The failure doesn’t wait for the right person to be available.

Automated failover is worse in a sneaky way. You stop thinking about it entirely. The script will handle it. Except the detection logic was written for a clean failure, not a half-dead stream that’s technically alive. The threshold never triggers. The switch never happens. And the humans who could override it manually are asleep, unreachable, or no longer with the company.

What does backup even mean

…we have redundancy

Backup encoder? Backup ingest? Backup origin? Backup CDN? Backup player URL? Each one protects against a different failure. Most teams pick one, check the box, and call it redundancy.

A backup encoder does nothing when the CDN degrades. A backup CDN does nothing when the origin serves a stale manifest. A backup ingest does nothing when the venue network goes down. The stack has layers and each layer can fail on its own.

Knowing which layer to back up requires knowing which layer is most likely to fail for your specific setup. That answer is different for a stadium broadcast, a remote satellite uplink, and a cloud-only pipeline. There is no universal answer. There is just the layer you skipped.

Select your redundancy plan

I’ve been wanting to lay this out as a diagram for a while. Clients always assume backup is a yes/no question. It’s not. Here are your options.

0. No real backup

Cheap until it is not. One failure anywhere takes the stream down. No recovery path, no fallback. A deliberate risk acceptance, not a resilience strategy.

1. Restart-and-recover

Someone notices, someone restarts. No parallel path, just a documented procedure and an operator who knows what to do. About five minutes of downtime if everything goes right. Longer if the right person is not reachable.

2. Fire-exit backup

A separate, deliberately degraded path. Different encoder, different network, different ingest endpoint. Not a mirror – lower quality, simpler setup – but genuinely independent. When the primary goes down, viewers stay on air within about a minute. This is the minimum viable backup for anything with a real audience.

3. Mirrored backup

A full-quality duplicate of the primary path. Separate encoder, separate ingest, separate origin. The CDN is typically still shared, which remains a single point. Recovery is faster and the experience is seamless, but the cost is substantially higher.

4. Full active-active

Both paths running simultaneously. Automated failover routes to the best available path. Viewers never see a switch because there is no switch – continuity is built into the architecture. The right choice when downtime is not an option and the budget reflects that.

Let’s gamify it 🙂

Of all these, economics hurts the most, and we know it.

This tool gives you a hint of what option you should deploy depending on what you’re doing and how much it costs to fail. Source is here.
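
The linked tool does this properly. What follows is a toy illustration only, not its source: pick the tier that minimizes its yearly running cost plus the outage loss you expect to absorb. Downtime per incident uses the rough figures from the list: about five minutes for restart-and-recover, about a minute for the fire exit, and zero for active-active. The 60 minutes for no backup, the 15 seconds for a mirror, and every dollar figure are hypothetical.

```python
# Minutes of downtime per incident for each tier in the list above.
# Tiers 1, 2 and 4 follow the rough figures given; 0 and 3 are assumptions.
DOWNTIME_MIN = {0: 60.0, 1: 5.0, 2: 1.0, 3: 0.25, 4: 0.0}


def cheapest_tier(incidents_per_year, cost_per_outage_minute, annual_cost):
    """Tier minimizing yearly running cost plus expected outage loss.
    annual_cost maps tier -> what running that tier costs per year."""
    def total(tier):
        expected_loss = incidents_per_year * DOWNTIME_MIN[tier] * cost_per_outage_minute
        return annual_cost[tier] + expected_loss
    best = min(annual_cost, key=total)
    return best, total(best)


# Hypothetical yearly costs per tier, in dollars.
costs = {0: 0, 1: 2_000, 2: 15_000, 3: 60_000, 4: 150_000}
print(cheapest_tier(2, 100, costs))       # (1, 3000.0)
print(cheapest_tier(2, 100_000, costs))   # (3, 110000.0)
```

The model ignores correlated failures, so a backup wired to the primary's switch scores better here than it deserves.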

Live captions: rent by the minute or buy the cow [GPU]

“Transcribe’s a penny a minute, just use it.”

Live captioning used to mean hiring a stenographer or wiring up a pricey cloud service. Whisper changed the second part. Transcription got cheap enough that the interesting question is no longer “can I afford captions” but “which way of paying for them is cheaper for my load.”

Your bill can come in two shapes:

Amazon Transcribe and metered services like it charge per minute of audio: roughly $0.024/min for streaming, which is $1.44 an hour per stream. The discounts don’t kick in until 250,000 minutes a month and only bottom out near $0.0078/min once you’re past five million.

The other shape is a GPU you rent by the hour and fill with Whisper: a g4dn.xlarge (T4) is $0.526/hr, a g5.xlarge (A10G) is $1.006/hr, and a RunPod 4090 hovers around $0.34/hr.

Those two numbers don’t compare directly, and that’s the whole point.

One bill is linear, the other is fixed

Transcribe costs the same per stream whether you run one or fifty. Ten concurrent channels cost ten times one channel. The price scales with usage, and there’s nothing you can engineer to bring it down.

A GPU costs what it costs whether you put one stream on it or twenty. faster-whisper running large-v3-turbo processes batch audio at 30-40x real time, and for live work you only need better than 1x per stream plus headroom. One mid-range card carries several concurrent caption streams; a 48GB L40S fits 25+ turbo instances in memory before it runs out of compute. The cost per stream is whatever you paid for the card divided by how many streams you managed to stuff onto it.

So the decision isn’t Whisper versus Transcribe on quality. It’s whether your load can fill a GPU.

One channel, around the clock

A single 24/7 channel on Transcribe is $1.44 x 720 hours, about $1,037 a month. The same channel on a rented 4090 at $0.34/hr is $245 a month, and the card is barely warm. Self-hosting already wins, roughly four to one, before you put a second stream on the box.

Ten channels, around the clock

Now Transcribe is ten times $1,037, about $10,370 a month at the first-tier rate. Ten 24/7 channels burn roughly 432,000 minutes, so the 250,000-minute discount trims that somewhat, but not by much. If those ten streams fit on one GPU, and a single strong card handles ten turbo streams, you’re still paying $245. That’s not a discount, it’s a different order of magnitude. The linear bill has run away while the fixed one hasn’t moved.
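
At this volume the tiers matter. The sketch below computes the tiered monthly bill. The first and last rates come from the paragraph above. The two middle tiers ($0.015 and $0.0102 per minute) are taken from AWS's public pricing table at the time of writing; verify them against the current page.

```python
import math

# (minutes in tier, $ per minute). First and last rates are from the text;
# the middle two are AWS's published standard rates at the time of writing.
TIERS = [(250_000, 0.024), (750_000, 0.015), (4_000_000, 0.0102), (math.inf, 0.0078)]


def transcribe_monthly(minutes):
    """Monthly Amazon Transcribe bill with tiered per-minute pricing."""
    cost, remaining = 0.0, minutes
    for size, rate in TIERS:
        used = min(remaining, size)
        cost += used * rate
        remaining -= used
        if remaining <= 0:
            break
    return cost


minutes = 10 * 24 * 60 * 30                # ten 24/7 channels, 30-day month
print(minutes)                             # 432000
print(round(minutes * 0.024))              # 10368, flat first-tier rate
print(round(transcribe_monthly(minutes)))  # 8730, tiers applied
```

Even with the discount, the metered bill is more than an order of magnitude above one $245 GPU.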

A three-hour event, once a week

Transcribe is three hours times $1.44, about $4.32 an event, with nothing to run, nothing to patch, and no idle. The GPU has to be spun up, loaded, and babysat for one stream a few hours a week. Here the per-minute bill is both cheaper and far less work. Renting the minute wins outright.

Run your own numbers

The three cases above are just points on a line. Drop in your stream count, hours per day, and GPU rate and find where you actually sit: [calculator].
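
Here is a rough stand-in for that calculator, with the same inputs: stream count, hours per day, days per month, per-minute rate, GPU hourly rate, streams per GPU, and whether the GPU stays warm. `monthly_costs` is a hypothetical reconstruction, not the calculator's code. It uses a flat Transcribe rate with no tiers, and its "only while live" mode assumes the streams run concurrently and ignores cold starts.

```python
import math


def monthly_costs(streams, hours_per_day, days=30, transcribe_per_min=0.024,
                  gpu_per_hour=0.34, streams_per_gpu=10, keep_warm=True):
    """(Transcribe, self-hosted Whisper) monthly cost in USD.
    Flat per-minute rate; GPUs billed 24/7 when keep_warm,
    otherwise only while live."""
    transcribe = streams * hours_per_day * days * 60 * transcribe_per_min
    gpus = math.ceil(streams / streams_per_gpu)
    gpu_hours = (24 if keep_warm else hours_per_day) * days
    self_host = gpus * gpu_hours * gpu_per_hour
    return round(transcribe, 2), round(self_host, 2)
```

`monthly_costs(1, 24)` reproduces the single-channel case ($1,036.80 against $244.80). `monthly_costs(10, 24)` gives the ten-channel case at the flat rate.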


Where the GPU math quietly leaks

Two costs don't show up in the headline rate.

The managed services bill wall-clock audio, silence included. A sparse channel, a webinar with long pauses, a lecture with a quiet room, still costs the full minute. Self-hosting with voice activity detection only spends compute on speech, so the gap widens further for talky, gappy content.

The GPU bills idle, and idle is hard to avoid for live. Scale-to-zero sounds like the answer, but loading a Whisper model takes tens of seconds, and a live stream can't wait for a cold start. You keep the card warm, which means you pay for the hours nobody is streaming. For spiky, unpredictable load that idle time is exactly what eats the supposed savings.

Is it worth it?

Below a handful of concurrent streams, or for bursty events you can't predict, rent the minute. Transcribe has no idle, no ops, and similar latency, and the bill stays small precisely because your usage is small. The per-minute model is built for exactly that.

Above a few steady, concurrent streams, own the GPU. A 24/7 operation with even a modest channel count crosses the line fast, and once you're filling the card the per-stream cost falls through the floor.

The crossover is concurrency times utilization, not stream count alone. Ten channels that each run an hour a day are a Transcribe job. One channel that never sleeps is closer to a self-host job than it looks.

How cheap is it, really?

At the bottom, a filled GPU gets you to single-digit dollars per stream-month, against roughly a thousand for the same stream on a metered service. But "filled" is load-bearing. An empty GPU is more expensive than Transcribe, not less, and it's yours to keep running.

Is it stable?

The managed service is. You hand it audio, it hands you text, and the failure modes are someone else's 4am alerts. Self-hosting puts three moving parts on you: the GPU, the streaming wrapper that turns a batch model into a live one, and the WebVTT packaging that aligns caption segments to your media segments. None of it is exotic, but it's yours to keep alive.
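
In practice, "WebVTT packaging" means cutting captions into one small WebVTT file per media segment. A cue that straddles a boundary is repeated in every segment it overlaps, because each segment has to carry all cues shown during its time span. The sketch below assumes a fixed segment length. The `X-TIMESTAMP-MAP` `MPEGTS` value (900000, i.e. 10 s at 90 kHz) is a placeholder that must match your media's actual timestamps.

```python
import math


def fmt(t):
    """Seconds -> WebVTT timestamp hh:mm:ss.mmm."""
    hours, rem = divmod(t, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{int(hours):02d}:{int(minutes):02d}:{seconds:06.3f}"


def vtt_segments(cues, segment_s, total_s, mpegts=900000):
    """Split (start_s, end_s, text) cues into one WebVTT body per media segment.
    A cue overlapping several segments is repeated in each of them."""
    header = f"WEBVTT\nX-TIMESTAMP-MAP=MPEGTS:{mpegts},LOCAL:00:00:00.000\n"
    files = []
    for i in range(math.ceil(total_s / segment_s)):
        seg_start, seg_end = i * segment_s, (i + 1) * segment_s
        cues_here = [f"\n{fmt(s)} --> {fmt(e)}\n{text}\n"
                     for s, e, text in cues if s < seg_end and e > seg_start]
        files.append(header + "".join(cues_here))
    return files
```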

What's the catch?

Whisper isn't a streaming recognizer. It reads audio in windows and emits a transcript for the whole window, so live captioning leans on a wrapper like whisper_streaming, which only commits words once overlapping windows agree. That waiting costs around 3.3 seconds of latency (tunable), and accuracy still slips on overlapping speakers, heavy accents, and niche vocabulary. The managed services sit in the same latency ballpark and need none of the plumbing. So the cost win has to clear the ops cost, and on a small or spiky load it usually doesn't.
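
The commit rule is what costs the latency. Below is a simplified sketch of the local-agreement idea. It assumes each update delivers the full word list for the audio heard so far. A word is committed only once two consecutive hypotheses agree on it, so every word waits at least one extra update. This illustrates the idea and is not whisper_streaming's code, which also trims its audio buffer and tracks timestamps.

```python
def longest_common_prefix(a, b):
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return a[:n]


class LocalAgreement:
    """Commit words only once two consecutive hypotheses agree on them."""

    def __init__(self):
        self.committed = []
        self.previous = []

    def update(self, hypothesis):
        """hypothesis: full word list for the current window.
        Returns the words newly committed by this update."""
        agreed = longest_common_prefix(self.previous, hypothesis)
        new = agreed[len(self.committed):]
        self.committed.extend(new)
        self.previous = hypothesis
        return new
```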

Content-aware encoding for the rest of us

Netflix encodes it a hundred times…

Per-title encoding is well understood by most of us, at least conceptually. Complex content needs more bits; simple content doesn’t. Encoding everything to the same ladder wastes bandwidth on talking heads and shortchanges sports. The theory is solid.

The expensive part is no longer the encoding. It’s the analysis. Finding the optimal ladder for a title means encoding it many times at different resolutions and bitrates, measuring quality at each point, and building a curve. Netflix does this. At their scale, the storage and bandwidth savings pay for that compute many times over.

For us plebs, the math is less obvious. The question isn’t whether content-aware encoding is worth doing. It’s how much analysis you can spare before it burns a hole in your budget. Especially for content that may be very unpopular.

YouTube took a cheaper path: one throwaway encode at 240p to measure complexity, one model to make the ladder decision. One pass instead of hundreds.

Can we do cheaper?

Yes, with trade-offs.

Per-category instead of per-title

You can afford to miss perfection. You can’t afford to use the obviously wrong ladder.

Let’s invent 3 categories to cover most libraries:

Simple – talking heads, slides
Standard – mixed content
Complex – sports, concerts, high motion

Each maps to a preset ladder with different VBV caps. Classification comes from a short probe – a few clips sampled across the file, encoded at 240p, x264 veryfast. The output bitrate tells you which bucket. CRF handles the variation from there.
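
The probe can be sketched as ffmpeg commands plus a classifier. Everything here is an assumption: five 10-second clips spread across the file, CRF 23 during the probe so output size tracks complexity, and bucket thresholds of 150 and 400 kbps that you would calibrate on your own library. The functions only build commands; run them however your pipeline runs ffmpeg.

```python
def probe_starts(duration_s, clips=5, clip_s=10):
    """Clip start times centred on evenly spaced points, away from the ends."""
    step = duration_s / (clips + 1)
    return [round(step * (i + 1) - clip_s / 2, 3) for i in range(clips)]


def probe_cmd(src, start_s, clip_s, out):
    """ffmpeg command for one 240p x264 veryfast probe clip (audio dropped)."""
    return ["ffmpeg", "-y", "-ss", str(start_s), "-t", str(clip_s), "-i", src,
            "-an", "-vf", "scale=-2:240",
            "-c:v", "libx264", "-preset", "veryfast", "-crf", "23", out]


def probe_kbps(clip_sizes_bytes, clip_s):
    """Average bitrate across all probe clips."""
    return sum(clip_sizes_bytes) * 8 / (len(clip_sizes_bytes) * clip_s) / 1000


def classify(kbps, simple_below=150, complex_above=400):
    """Map probe bitrate to a bucket; thresholds are placeholders to calibrate."""
    if kbps < simple_below:
        return "simple"
    if kbps > complex_above:
        return "complex"
    return "standard"
```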

CRF

CRF targets quality, not bitrate. The encoder spends more bits on complex frames and less on simple ones – the crowd scene and the post-match interview both get what they need without you predicting the right bitrate for either.

We settled on CRF 21/23/25 across the three buckets, give or take a point depending on source. [libx264, yes. AV1 saves more but the analysis cost runs the wrong way for content nobody watches]

The catch: uncapped CRF isn’t streaming-safe. Pair it with VBV caps (maxrate + bufsize) to put a ceiling on spending. When content hits that ceiling, quality drops below your target. Predictable delivery in exchange for occasional quality concessions on the hardest content.
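
This turns a bucket into encoder flags. The ffmpeg options `-crf`, `-maxrate` and `-bufsize` are real x264 rate controls. Everything else is an assumption: the CRFs map to buckets in the order they're listed above, the top-rung caps are placeholders, and `bufsize` is set to twice `maxrate`, a common rule of thumb. A full ladder would repeat this per rung.

```python
# Assumptions: CRF per bucket in the order listed (simple, standard, complex),
# placeholder top-rung caps in kbps, and bufsize = 2 x maxrate.
PRESETS = {
    "simple":   {"crf": 21, "maxrate_k": 2500},
    "standard": {"crf": 23, "maxrate_k": 4500},
    "complex":  {"crf": 25, "maxrate_k": 7000},
}


def x264_args(bucket, bufsize_factor=2.0):
    """Capped-CRF x264 flags for one bucket's top rung."""
    p = PRESETS[bucket]
    return ["-c:v", "libx264", "-crf", str(p["crf"]),
            "-maxrate", f"{p['maxrate_k']}k",
            "-bufsize", f"{int(p['maxrate_k'] * bufsize_factor)}k"]
```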

Hit and miss

The probe samples a few clips. It might miss the hard parts. A documentary that’s mostly archive footage with 20 minutes of field action could probe as simple and get encoded against the wrong caps.

That’s the gamble. Most titles will probe correctly. Some won’t. You’re accepting that error in exchange for a probe that costs almost nothing.

See for yourself

👉here

Everything above, in one script. Pulled from a pipeline that’s been quietly doing its job for two years without anyone noticing. Which is the point.

But wait, there’s more!

Post-production, it runs a quick VMAF pass against the source to tell if the guess was right. Later, one can decide to re-encode with the more appropriate setting or just leave it as is, presumably based on compute availability and popularity of the title. How’s that for a treat?
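
The check itself can be a single ffmpeg run with the `libvmaf` filter. The filter takes the distorted video first and the reference second, and it needs both at the same resolution. The sketch below assumes a 1080p source and libvmaf's JSON log format. The 93 target and the 2-point tolerance are hypothetical.

```python
import json


def vmaf_cmd(distorted, reference, log_path="vmaf.json"):
    """ffmpeg libvmaf run: distorted input first, reference second.
    The rendition is scaled to 1920x1080 first, assuming a 1080p source."""
    graph = ("[0:v]scale=1920:1080:flags=bicubic[d];"
             f"[d][1:v]libvmaf=log_fmt=json:log_path={log_path}")
    return ["ffmpeg", "-i", distorted, "-i", reference,
            "-lavfi", graph, "-f", "null", "-"]


def needs_reencode(vmaf_log_text, target=93.0, tolerance=2.0):
    """True when pooled mean VMAF lands clearly below target."""
    mean = json.loads(vmaf_log_text)["pooled_metrics"]["vmaf"]["mean"]
    return mean < target - tolerance
```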

The number you have to lie about

This is the one that bites quietly. A variable-bitrate rung still advertises one fixed bitrate in the manifest, but actual delivery swings under the VBV cap. You have to pick that one number, and the cap and the average fail in opposite directions.

Advertise the cap and you name a bitrate the stream rarely hits. A viewer who could comfortably carry the real average reads as too weak for the rung and gets parked one below it. Quality you could have shipped, left on the table.

Advertise the average and the rung looks cheaper than its worst case. The viewer takes it, a hard scene climbs to the cap, the segment balloons, and the buffer drains into a stall.

Worth instrumenting if your audience is bandwidth-constrained: watch rebuffer ratio, not latency averages.
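
HLS does give you room for both numbers. RFC 8216 defines `BANDWIDTH` as the peak segment bit rate and `AVERAGE-BANDWIDTH` as the average. Which one a given player's ABR actually reads varies, so the dilemma stands. You can at least compute both honestly from the encoded segments. The sketch below simplifies "peak" to the largest single-segment bitrate; the spec measures it over windows of contiguous segments.

```python
def rung_bandwidths(segments):
    """segments: list of (size_bytes, duration_s). Returns (peak_bps, average_bps).
    Peak is simplified to the largest single-segment bitrate."""
    peak = max(size * 8 / dur for size, dur in segments)
    average = sum(size for size, _ in segments) * 8 / sum(dur for _, dur in segments)
    return round(peak), round(average)


def stream_inf(segments, resolution, codecs):
    peak, average = rung_bandwidths(segments)
    return (f"#EXT-X-STREAM-INF:BANDWIDTH={peak},AVERAGE-BANDWIDTH={average},"
            f'RESOLUTION={resolution},CODECS="{codecs}"')


segs = [(1_000_000, 4.0), (1_000_000, 4.0), (3_000_000, 4.0)]  # one hard scene
print(stream_inf(segs, "1920x1080", "avc1.640028,mp4a.40.2"))
```

One hard scene is enough to put the peak at almost twice the average: 6,000,000 against 3,333,333 bps.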

Does it scale?

Yes, but watch where the cost actually lives. The probe is fixed-cost per title and tiny, so catalog size is irrelevant: ten titles or ten million, same per-unit cost. What doesn’t scale flat is the optional re-encode-on-miss loop, because misses cluster in exactly the complex content that’s most expensive to re-encode. The cheap part scales with your library. The expensive part scales with your error rate on your hardest titles.

Does this work for live?

No. The probe needs the file to exist first, and live has no file. For live you classify up front by channel or event metadata: a sports channel gets the complex ladder, a webinar channel gets simple. No probe involved. The probe-then-encode mechanism is VOD only.

Why only three categories?

CRF already absorbs frame-level variation inside each bucket, so the bucket’s only job is to set the VBV ceiling, and a ceiling is coarse by nature. More buckets means more preset ladders to tune and maintain for diminishing return. Three is roughly where the curve flattens.

Isn’t this just per-title with extra steps?

The savings don’t come from nailing the optimal ladder per title. They come from the long tail of simple titles that a single fixed ladder over-provisions. You’re not chasing Netflix’s quality curve, you’re refusing to ship sports bitrates to slideshows.

Most of you don’t need low latency

“We saw a demo where it was instant.”

You get a brief. The client wants sub-second latency. No viewer complained, no metric moved, the stream is fine. But somehow your head of sales walked out of the vendor pitch having promised something he figured must be simple since their competitor already had it. Now it’s in the contract and it’s your problem.

The myths

“Low latency improves engagement.”

No published study survives scrutiny once you control for quality and reliability. Stall events hurt retention. Latency, by itself, doesn’t move the number.

“WebRTC is always better.”

Only at small scale. Past a few hundred concurrent viewers you need an SFU, then cascading SFUs, then simulcast because browsers don’t do real ABR for WebRTC. It stops being simple fast.

“The CDN is the bottleneck.”

The CDN is almost never the bottleneck. Most of your latency lives in the player buffer, which you control and the CDN doesn’t touch.

“1.2 seconds glass-to-glass.”

The number is real. It was measured in a lab, wired connection, their encoder feeding their player, no ABR ladder, no DVR, no captions, camera six inches from the screen. Add a 3-rendition ladder and a real player buffer and the same stack delivers 4 to 6 seconds in production. The marketing number and the operational number are not the same thing.

“The encoder is where the latency is.”

The encoder buffer sets you back 500ms to 2 seconds.

Network transit to the ingest point adds 100 to 500ms depending on geography.

Transcode adds 1 to 3 seconds.

HLS packaging adds 2 to 6.

CDN delivery is mostly negligible at steady state, 50 to 200ms.

The player buffer adds 2 to 30 seconds.

Optimizing the encoder while leaving the player buffer alone is solving the wrong end of the problem.
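
Here are the stages above added up, using the ranges exactly as listed:

```python
# (low, high) seconds per stage, as listed above
BUDGET = {
    "encoder buffer": (0.5, 2.0),
    "network to ingest": (0.1, 0.5),
    "transcode": (1.0, 3.0),
    "HLS packaging": (2.0, 6.0),
    "CDN delivery": (0.05, 0.2),
    "player buffer": (2.0, 30.0),
}

low = sum(lo for lo, _ in BUDGET.values())
high = sum(hi for _, hi in BUDGET.values())
for stage, (lo, hi) in BUDGET.items():
    print(f"{stage:>18}: up to {hi / high:.0%} of the worst case")
print(f"total: {low:.2f}s to {high:.1f}s")
```

The total spans 5.65 s to 41.7 s. The player buffer accounts for up to 72% of the worst case, while the encoder accounts for about 5%.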

Who actually needs it

Live betting and in-play markets. Live auctions and bidding. Competitive esports where the prize pool justifies the engineering. Bilateral conversation: telehealth, synchronous tutoring, trading floor feeds where the other person is waiting on you.

That’s the list. If you’re in it, pay for it. The engineering is real and worth it.

Everything else is one-way passive consumption. The viewer watches. They don’t bid, they don’t respond, they don’t act. That’s most of the streaming market.

Latency is relative, not absolute

Ten seconds of delay is invisible if everyone in your audience is ten seconds behind. Nobody feels late. Nobody is ahead. The stream is the reference.

Broadcast TV has always run 5 to 10 seconds behind. Cable adds more. Satellite adds more still. Nobody complained for decades because there was no faster reference channel – the whole audience was behind together. Viewers feel latency when something faster exists alongside it. A Twitter feed spoiling the goal before it plays. A Discord chat reacting to a moment the video hasn’t shown yet. The neighbor’s TV cheering through the wall. The stadium roar arriving before the replay does on screen.

The problem isn’t the number. It’s the gap between your stream and the next fastest thing your viewer has access to. Close that gap and the latency disappears. Or accept the gap and build around it.

One underused option: delay the chat to match the slowest viewer, instead of cutting video latency to match the chat. Twitch and YouTube both do this quietly. It costs nothing and solves the spoiler problem for most audiences without touching the streaming stack.
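Here is a minimal sketch of the chat-delay idea. It assumes the chat server is configured with the video delay of the slowest viewers it wants to match, for example from player telemetry. Messages are held back until they are that old. `DelayedChat` and its methods are hypothetical names, not any platform's API.

```python
from collections import deque


class DelayedChat:
    """Hold chat messages back by a fixed delay so they don't run ahead of the video.

    Assumes `now` never goes backwards between calls (e.g. a monotonic clock).
    """

    def __init__(self, delay_s: float) -> None:
        self.delay_s = delay_s
        self._pending: deque[tuple[float, str]] = deque()  # (release time, message)

    def post(self, message: str, now: float) -> None:
        """Accept a message now; it becomes visible delay_s seconds later."""
        self._pending.append((now + self.delay_s, message))

    def release(self, now: float) -> list[str]:
        """Return every message whose release time has passed, in arrival order."""
        out = []
        while self._pending and self._pending[0][0] <= now:
            out.append(self._pending.popleft()[1])
        return out
```

With an 8-second delay, a "GOAL!!!" posted at t=100 stays hidden at t=105 and appears at t=108, when viewers 8 seconds behind see the goal too. A fixed delay keeps messages in arrival order, so a plain queue is enough.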

What you pay

Numbers below are for 1000 concurrent viewers, 720p, one hour, North America. Use them as a multiplier reference. They will change.

Setup	Cost/hour	What’s included
Self-managed HLS (Bunny CDN)	~$11	CDN delivery only, you handle encoding
LL-HLS managed (Mux)	~$50	Encoding and delivery
LL-HLS managed (Cloudflare Stream)	~$60	Encoding and delivery
LL-HLS managed (AWS IVS standard)	~$74	Encoding and delivery
WebRTC via SFU (AWS IVS Real-Time)	$72+	Scaling is handled, but cost per viewer stays high

The 5 to 6x gap between self-managed HLS and a managed LL-HLS platform is partly the low-latency premium and partly the managed service premium. Separating the two is hard because no major vendor sells unmanaged LL-HLS delivery at Bunny prices.
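Dividing the table through makes the multiplier explicit. The quick sketch below uses only the hourly figures above, with IVS Real-Time taken at its $72 floor. It expresses each setup per viewer-hour and as a multiple of the self-managed baseline.

```python
# Hourly cost for 1000 concurrent viewers, 720p, North America (table above).
COST_PER_HOUR = {
    "Self-managed HLS (Bunny CDN)": 11.0,
    "LL-HLS managed (Mux)": 50.0,
    "LL-HLS managed (Cloudflare Stream)": 60.0,
    "LL-HLS managed (AWS IVS standard)": 74.0,
    "WebRTC via SFU (AWS IVS Real-Time)": 72.0,  # lower bound, listed as "$72+"
}
VIEWERS = 1000
BASELINE = "Self-managed HLS (Bunny CDN)"


def multiples(costs: dict[str, float], baseline: str) -> dict[str, float]:
    """Express every setup's hourly cost as a multiple of the baseline's."""
    base = costs[baseline]
    return {name: cost / base for name, cost in costs.items()}


if __name__ == "__main__":
    for name, ratio in multiples(COST_PER_HOUR, BASELINE).items():
        per_viewer_hour = COST_PER_HOUR[name] / VIEWERS
        print(f"{name:<36} ${per_viewer_hour:.3f}/viewer-hour {ratio:4.1f}x")
```

The managed LL-HLS rows come out at roughly 4.5x to 6.7x the baseline, which fits the rough "5 to 6x" above.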

WebRTC looks comparable on paper. It isn’t in practice, because scaling it is painful and far from straightforward. Managed services like AWS IVS Real-Time handle that for you, which is what the price reflects.

The mobile brittleness

Tight player buffers are what get you to 1 to 2 seconds on a good connection. On 4G or 5G with variable RTT, the same buffer causes stalls. A viewer on WiFi gets a faster stream. A viewer on cellular gets a broken one. If your audience skews mobile, your audience-weighted experience can be worse on a low-latency deployment than on a standard 6 to 10 second buffer.

The metric you’re probably watching is P50 latency (the median viewer experience). That’s the wrong number. The viewer you lose is the P10 (the bottom 10% – weakest connections, worst conditions) who stalls every 30 seconds. A deployment that improves P50 from 6 seconds to 2 while making P10 rebuffer constantly is a worse product. Instrument your player for stall events and rebuffer ratio, not latency averages.
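This sketch shows what instrumenting for stalls and rebuffer ratio can look like on the analytics side. It assumes each session reports its played time, stalled time and stall count. The `Session` fields are hypothetical, not any player SDK's schema. Rebuffer ratio here means stalled time over total session time. The worst-off 10% of viewers sit at the 90th percentile of that ratio.

```python
import math
from dataclasses import dataclass


@dataclass
class Session:
    watch_s: float  # seconds of video actually played
    stall_s: float  # seconds spent rebuffering after startup
    stalls: int     # number of stall events

    @property
    def rebuffer_ratio(self) -> float:
        total = self.watch_s + self.stall_s
        return self.stall_s / total if total else 0.0


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile (p from 0 to 100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def report(sessions: list[Session]) -> dict[str, float]:
    ratios = [s.rebuffer_ratio for s in sessions]
    watched = sum(s.watch_s for s in sessions)
    return {
        "median_rebuffer_ratio": percentile(ratios, 50),
        # The bottom 10% of experiences = the top 10% of rebuffering.
        "worst_decile_rebuffer_ratio": percentile(ratios, 90),
        "stalls_per_hour": 3600 * sum(s.stalls for s in sessions) / watched if watched else 0.0,
    }
```

Consider a made-up audience of ten sessions where two viewers stall ten times each. The median rebuffer ratio is 0, while the worst decile sits at 5%. A median-only dashboard hides exactly this.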

It gets worse

Building for low latency narrows the stack. Specific ingest formats, specific encoders, specific players, specific CDNs. Every subsequent decision has to respect the latency constraint, including ones that have nothing to do with latency.

DVR is the one that catches people off guard. Standard HLS gives you free DVR because the segments already exist on the CDN. WebRTC has no segments – you need a parallel recording pipeline. LL-HLS keeps segments but partial segments complicate cleanup. The free DVR most operators rely on goes away the moment you pick a low-latency stack.

Why it became a selling point

CDN bandwidth pricing has been collapsing since 2018 and keeps dropping every year. Delivery costs fell, differentiation blurred, “we stream video reliably” stopped being a sellable line.

Latency became the fancy new premium: technically hard enough to justify the price, easy to demo with a stopwatch on stage, and impossible to disprove on a slide, because the numbers are real. They’re just measured in conditions that don’t exist in your production environment.

That’s why every CDN suddenly has an ultra-low-latency tier. The feature is real. The widespread demand it claims to address is mostly manufactured.

Should you bother?

Five questions. If you answer yes to most, you’re in the low-latency cohort.

Does your viewer act on what they see within seconds – bid, bet, respond?
Is there a faster channel they’re comparing against: a second screen, a chat, a neighbor?
Does your audience watch primarily on desktop or TV?
Would delaying the chat break the experience?
Are you willing to accept a narrower stack, higher ops cost, and no free DVR?

The $0 closer

Low latency is worth every penny when the use case demands it. The problem is that most of the industry decided the use case was universal. It isn’t.

Cheapest pay as you go CDN for streaming

There is zero activity in February and August

You’re at the point where your single-server or clustered streaming setup just can’t keep up with the spikes, and you know for a fact that you need a CDN. Kudos for getting this far. Or maybe you’re already using one but wonder whether you’ve missed out on a better deal from the other guys.

Navigating public price offerings can be challenging. Between the assortment of parameters (ingress, egress, requests, transfer), hidden fees, offers too good to be true and the temptation to give in to long-term commitments, you may find it quite tough to make a decision.

This piece will focus on comparing the true pay-as-you-go CDN vendors with public and transparent pricing models. By ‘true pay as you go’ I mean the flexibility to pay zero if you don’t use the service at all in a certain month. That’s important if you’re broadcasting occasionally (festivals, seasonal sporting events), or you just don’t know if your business will still be alive and kicking in a few months from now.

And here it gets graphed, comparing the few mainstream providers. Just adjust the checks and sliders to match your use case and make your pick, today.

Lessons learned

Stick with the big names

Unless you know your game really well, that is. Smaller players will do their best to lure you into admittedly appealing deals, yet most of them fall into one of these categories:

(A) resellers – nothing wrong with that per se, except there may be a better offer from the very CDN they’re reselling; that’s not always the case, as they can negotiate better pricing than you ever could by leveraging big volumes and upfront commitments

(B) maintaining their own infrastructure – nothing wrong with that either, yet do expect inferior throughput due to the reduced footprint and peering capabilities; they may also run out of capacity when you need it most – at peak; sadly, the cards in this industry were dealt more than a decade ago, and there’s no way to stand up to the giants unless you’re a giant yourself

(C) hybrid – relying on both their own and 3rd-party infrastructure and trying to make the best of each; that’s admirable, still… they need to walk a fine line between prioritizing quality and profit, as it’ll be very tempting to max out the (usually) inferior in-house capacity before racking up the upstream bill

(D) tricksters – sharing traits with one of the above categories, yet at the very dishonest end of the scale; expect generally poor quality for the buck, slowdowns, interruptions, throttling in favor of other customers, and untruthful traffic measurements

Not meaning to scare you off, and there are surely exceptions; you’ll be able to find gold if you take the time to dig, especially among the local providers.

Always have a backup

There are many ways a network can fail, get saturated, or otherwise work against your best interest. Be prepared to switch or offload to another vendor, even if it’s more expensive. There’s no good excuse for not delivering the service you promised, and the chance to earn back trust after a big failure may not come easily, if at all.

Where you deliver matters

While North America and Europe are well covered, providing fast connectivity elsewhere is often not straightforward.

The regions discussed here are those commonly offered by all suppliers. Some, however, cater specifically to destinations like Africa, the Middle East, India or Japan. If you focus on any of those, you’ll need more in-depth research; look into local providers too.

What goes into the graphs and what doesn’t?

The per-GB egress price – this makes up the bulk of the bill. It varies between $0.02 and $0.466, depending on region and consumption (i.e. the more you use, the less you pay per unit).
The per-request price – varies between $0.6 and $2.2 per million HTTPS requests, depending on region and offering. HTTP requests are cheaper with AWS but have not been considered here.
The per-GB ingress price (aka cache fill), where applicable – varies between $0.01 and $0.04 depending on ingress and egress location; applied only to Google’s offering.
The somewhat hidden $0.075 per hour for Google’s ‘forwarding rule’ – a must-have, paid-for link in their CDN chain.
The monthly license fee, in the case of Wowza.
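Here is a rough sketch of the cost model those inputs add up to. Every rate in the example is a placeholder picked from within the ranges above, and the tier boundary is invented for illustration. Neither is any vendor's actual price list. The hourly fee and license lines only apply where noted above: Google for the hourly fee, Wowza for the license.

```python
from dataclasses import dataclass
from typing import Optional

# Tier = (GB covered by this tier, $ per GB); a size of None means "everything beyond".
Tier = tuple[Optional[float], float]


@dataclass
class Pricing:
    egress_tiers: list[Tier]
    per_million_requests: float = 0.0  # HTTPS requests
    ingress_per_gb: float = 0.0        # cache fill, only where the vendor charges it
    hourly_fee: float = 0.0            # e.g. an always-on forwarding-rule charge
    monthly_license: float = 0.0       # e.g. a software license


def tiered_egress(gb: float, tiers: list[Tier]) -> float:
    """Charge `gb` across volume tiers in order; the last tier should be unbounded."""
    cost, remaining = 0.0, gb
    for size, rate in tiers:
        used = remaining if size is None else min(remaining, size)
        cost += used * rate
        remaining -= used
        if remaining <= 0:
            break
    return cost


def monthly_cost(p: Pricing, egress_gb: float, requests: int,
                 ingress_gb: float = 0.0, hours: float = 0.0) -> float:
    return (tiered_egress(egress_gb, p.egress_tiers)
            + requests / 1_000_000 * p.per_million_requests
            + ingress_gb * p.ingress_per_gb
            + hours * p.hourly_fee
            + p.monthly_license)


def egress_gb(viewers: int, bitrate_mbps: float, hours: float) -> float:
    """Delivered volume in decimal GB: Mbit/s x seconds / 8 -> MB, / 1000 -> GB."""
    return viewers * bitrate_mbps * hours * 3600 / 8 / 1000
```

For example, 1,000 viewers at 3 Mbps for two hours comes to 2,700 GB. Multiply by your events per month before pricing it. The request count depends on segment duration, since shorter segments mean more requests for the same volume.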

What about Akamai, Cloudflare, Comcast, Fastly, Level3, Limelight etc.?

They don’t have public pricing, so we won’t discuss them here. Also some will require commitments and/or longer term contracts to have you as a customer. That does make sense for a company that offers this as its main service, as they need to rely on a somewhat predictable income to invest in capacity.

Do realize that there’s reselling (explicit or not) even at the highest levels, Azure and Wowza among them.

Can I use more than one?

You most certainly can, and you should if it’s feasible. Know exactly how much you’re paying each vendor and for what, and over time use that information as leverage for a better deal. And stay on the lookout: the market keeps evolving, and all this may be obsolete in a few months.
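Here is one simple way to run more than one vendor, sketched under assumptions. Traffic gets a weighted split across CDN hostnames. Any CDN your own health checks have marked down is skipped, and the remaining ones absorb its share. The hostnames, weights and helper names are hypothetical. In practice this logic often lives in DNS, a load balancer, or the player's choice of manifest URL.

```python
import random
from dataclasses import dataclass


@dataclass
class Cdn:
    host: str
    weight: float         # share of traffic when everything is healthy
    healthy: bool = True  # flipped by your own monitoring


def pick_cdn(cdns: list[Cdn], rng: random.Random) -> Cdn:
    """Weighted random choice among healthy CDNs; fail loudly if none are left."""
    candidates = [c for c in cdns if c.healthy and c.weight > 0]
    if not candidates:
        raise RuntimeError("no healthy CDN available")
    return rng.choices(candidates, weights=[c.weight for c in candidates], k=1)[0]


def playback_url(cdns: list[Cdn], path: str, rng: random.Random) -> str:
    """Build the URL a player should fetch from for this session."""
    return f"https://{pick_cdn(cdns, rng).host}/{path.lstrip('/')}"
```

Suppose a split is 80/20 and the primary is marked unhealthy. Every new session then lands on the secondary. This is the "switch or offload" backup from above, running continuously instead of as an emergency procedure.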

Another free low latency solution

We really need to do something about this delay

Since we last brought up the topic, the industry has evolved a bit. Most of the big live streaming and social media players now routinely stream at under 5 seconds end-to-end latency, and your modest platform may be laughed at, or lose business, if it still relies on good old HLS/DASH and its built-in long delays.

The technological background hasn’t changed much, yet the rise of ‘cord cutting’ has put the spotlight on the annoyingly long delays and pushed OTT providers to adapt and innovate. Where it could, LL-DASH has been implemented with relative success, Periscope’s LHLS has had (and still has) its own success stories, and eventually Apple had to step into the game with its own LL-HLS, now a published standard and deployed in the latest iOS.

As we speak, there are a few factors at play that may set back your roadmap to low latency:

Support for proprietary WebSocket-based streaming is going away, most notably with Wowza’s announcement that it will discontinue its ‘ultra low latency’ offering; it makes sense, given the market-driven evolution of alternatives and the fact that this was a stand-in solution from the get-go, with obvious drawbacks
WebRTC hasn’t fully grown up yet; it has been standardized and took a giant step forward once Safari supported it, but notable implementations are taking a while
Player and server support of LL-HLS is still limited to commercial products
LL-DASH support is still not ubiquitous

The treat

To the rescue: a conveniently packaged proof-of-concept solution based on the rather amazing open source OvenMediaEngine. It supports both WebRTC and LL-DASH egress from an RTMP source, among other cool stuff.

The WebRTC output lets you stream with sub-second latencies (!), and the LL-DASH can be configured to use a playback buffer of 1 second or less.

It’s here, to use as-is or draw inspiration from. Enjoy!

Does it scale?

LL-DASH – scale with ease

As long as you can deploy or make use of reverse proxies that support chunked transfer encoding, scaling is a breeze. Nginx can do it, as can most CDNs – go for it.
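Chunked transfer matters here because it lets the origin start sending a segment while the encoder is still producing it. Each piece, for example a CMAF chunk, is framed by its size in hex, as defined for HTTP/1.1. The minimal sketch below shows that framing and its decoding. It illustrates what the proxies in between must pass through as it arrives rather than buffer. It is not a server.

```python
from typing import Iterable, Iterator


def chunked(parts: Iterable[bytes]) -> Iterator[bytes]:
    """Frame each part as an HTTP/1.1 chunk as soon as it's available."""
    for part in parts:
        if part:  # an empty chunk would be read as the end of the body
            yield b"%X\r\n" % len(part) + part + b"\r\n"
    yield b"0\r\n\r\n"  # last-chunk, then the blank line that ends the body (no trailers)


def unchunk(body: bytes) -> bytes:
    """Decode a chunked body without trailers or chunk extensions."""
    out, pos = bytearray(), 0
    while True:
        eol = body.index(b"\r\n", pos)
        size = int(body[pos:eol], 16)
        pos = eol + 2
        if size == 0:
            return bytes(out)
        out += body[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
```

Picture a proxy that holds everything until it sees the terminating zero-size chunk. It quietly turns a low-latency stream back into a regular one, one full segment behind. That's the behavior to check for when picking proxies and CDNs.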

WebRTC – not as easy but it can be made to

The bigger shortcoming of WebRTC is that it was designed for peer-to-peer, one-to-one communication. Twisting it to support one-to-many means the server has to impersonate a separate one-to-one endpoint for every viewer. Each of those is mildly resource-consuming, and together they eventually choke any single server.

Capacity will vary widely with the actual hardware and stream characteristics. Budget for just 200 viewers per CPU core; anything better will make your customer happy.
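Here is that rule of thumb turned into a server count, as a sketch. The 200 viewers per core comes from the paragraph above. The server size and the 30% headroom default are assumptions for illustration.

```python
import math


def servers_needed(viewers: int, cores_per_server: int,
                   viewers_per_core: int = 200, headroom: float = 0.3) -> int:
    """Edge servers to budget for a WebRTC fan-out.

    `headroom` leaves that fraction of each server's estimated capacity unused,
    so a spike or a slower-than-expected core doesn't tip a server over.
    """
    usable = cores_per_server * viewers_per_core * (1 - headroom)
    return math.ceil(viewers / usable)
```

For 5,000 viewers on 8-core servers, this comes to 5 servers. It only covers the per-viewer fan-out. The audio transcode discussed next costs CPU per stream, on top of this.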

There’s also the hot topic of transcoding. While AVC is (at long last) ubiquitous in WebRTC, you’ll need to transcode the audio to Opus, since RTMP sources typically carry AAC. Transcoding one audio track is a breeze for any CPU, but the cost adds up with every stream, so the number of streams you can run on a server is limited.

Is it worth it?

If you absolutely need cheap/free low latency, it is.

The biggest conundrum is that DASH won’t work on iOS, while WebRTC is harder and more expensive to scale. May I suggest you use both (have iOS users play the WebRTC feed) and see where your scalability needs take you. If you’re running a small or medium platform, or just starting up, the odds are you’re better off this way than giving in to commercial offerings.

What about LL-HLS?

In OvenMediaEngine, it’s reportedly in the works and may be available soon. In general, it may still be a while before we see it thriving. Partly because it initially intended to mandate HTTP/2, the industry has been slow to adopt it. The couple of implementations I’ve seen still get laggy even with near-perfect networking and encoding setups.

Adaptive bitrate anyone?

Not supported by this product yet, but it may be soon.

Let me point out, though, that the two (ABR and extremely low latency) don’t go particularly well together. Consider that:

The need to transcode for ABR will add to the latency
Determining network capabilities and switching between ABR renditions is much tougher to plan ahead and execute properly given sub-second delays and buffers
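A bit of arithmetic shows why the second point bites. Assume the player must fetch the next chunk before its buffer drains. A chunk of `chunk_s` seconds at a given bitrate over a link of a given throughput then takes chunk_s × bitrate / throughput seconds to download. The simplified model below ignores request latency and parallel fetches, and the numbers are illustrative.

```python
def stall_seconds(buffer_s: float, chunk_s: float,
                  bitrate_mbps: float, throughput_mbps: float) -> float:
    """Stall caused by fetching the next chunk, given how much is already buffered.

    Simplified model: the chunk takes chunk_s * bitrate / throughput seconds to
    download (no request latency, no parallel fetches), and playback stalls for
    whatever part of that exceeds the buffer.
    """
    download_s = chunk_s * bitrate_mbps / throughput_mbps
    return max(0.0, download_s - buffer_s)


if __name__ == "__main__":
    # A 3 Mbps rendition, 1-second chunks, throughput dips to 1.5 Mbps.
    for buffer_s in (6.0, 2.0, 0.5):
        stall = stall_seconds(buffer_s, 1.0, 3.0, 1.5)
        print(f"buffer {buffer_s:>3}s -> stall {stall:.1f}s")
```

The 6-second buffer absorbs the dip before the player even has to react. At half a second the viewer stalls for 1.5 s. Any switch-down triggered by that same slow download arrives too late to prevent it.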

In the big picture, you’re trading every second of latency for quality of experience or cost. Please don’t do it on a whim; seriously assess how badly, and how low, you need it. Delivering near-instant, high-quality, uninterrupted video over the open internet requires sophisticated and expensive tech, and even the most state-of-the-art setup won’t deliver flawlessly to everyone.

Setting Up A Live Viewer Count, For Cheap

Millions, with an M

You’ve seen it around on big guys’ platforms. But when trying to put together your own you may have hit the price, maintenance, or scalability wall.

The following solution is no magic fix for these and surely nothing special, yet it may help you understand the pitfalls and spot the warning signs ahead of time.

How it works

As simple as you’d imagine. Each viewer announces its presence to a central authority; let’s call it the counter. As soon as a new presence is announced, all viewers are notified that the audience count has increased. Likewise, as soon as any viewer disengages, the others are notified that the count has decreased.
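Stripped of networking, the counter's bookkeeping is tiny. Below is a minimal in-memory sketch. It assumes each connected viewer is represented by a callback that pushes a message down that viewer's connection. `ViewerCounter` and the callback shape are hypothetical, not the code of the packaged solution.

```python
from typing import Callable

Send = Callable[[str], None]  # pushes one message to one viewer's connection


class ViewerCounter:
    def __init__(self) -> None:
        self._viewers: dict[str, Send] = {}

    def join(self, viewer_id: str, send: Send) -> None:
        """Register a viewer and tell everyone (including them) the new count."""
        self._viewers[viewer_id] = send
        self._broadcast()

    def leave(self, viewer_id: str) -> None:
        """Forget a viewer and tell everyone who's left, if they were known."""
        if self._viewers.pop(viewer_id, None) is not None:
            self._broadcast()

    def _broadcast(self) -> None:
        # Every single join or leave fans out to every connected viewer.
        message = str(len(self._viewers))
        for send in list(self._viewers.values()):
            send(message)
```

That per-change fan-out to everyone is the part that makes this hard to scale, as worked out further down.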

Persistent connection

To facilitate instant updates, a continuously open connection is required between any one viewer and the centralized counter. Having the former just ask around for the number every once in a while (i.e. polling) is still an option but won’t be nearly as smooth or fast.

Sockets

Such connectivity can generally be accomplished by means of sockets. Long story short, a socket is a kind of nearly-instant bi-directional data channel between 2 network-connected devices.

WebSockets

Most native apps can freely create and use regular sockets, but code running in the restrictive context of a web browser cannot. Special abstractions had to be devised to bring socket-like functionality to the browser. The WebSocket is the one that emerged, and it is now widely supported.

The server

The so-called counter is merely a piece of software. It has to reside on a publicly accessible, always-on computer or device that oldsters like to call a server. While it does a lot of things, a server’s main job is to ‘serve’ the common needs of various other (not necessarily public or available) devices, generally referred to as ‘clients’.

The ready to use solution

Is up for grabs here. Variations have been implemented on multiple platforms, and it has stood the test of time.

How cheap is it?

You’ll only be paying for the server(s). The exception is if you can run the counter on one of your existing computers (remember, it needs to be public) or take advantage of some cloud’s free tier, in which case it’s free, as in beer.

In production, consider budgeting $1 per 1k simultaneous viewers per month, prorated.

Does it scale?

Not without headaches.

The boxed solution is well optimized and proven to accommodate some 10k viewers when running on the smallest available cloud instance (with just 0.5 GB of RAM!).

It can stretch to maybe 4-5 times that much on a single computer, but the truly scalable setup takes an autoscaled cluster of servers. It’s not too complex, really; it has been done repeatedly, and I hope to find the time to dust one off and make it public soon.

Is it stable/reliable?

Up to a point…

You’ll see it hogging the host’s CPU well before it starts being laggy to your viewers. When that happens, run it on a more powerful computer next time you expect a similar or larger audience.
Memory use hasn’t been a real concern in any of the implementations.
If you notice a fairly constant ceiling your counter never goes above, your setup (the software, OS or NIC) may be running out of sockets it can keep open simultaneously. There are many ways to mitigate that; details vary with the specifics of the environment.
Before WebSockets were ubiquitous, long-polling (and at times its creepier cousin, short-polling) was the norm for setting up persistent connections in the browser. These put a heavier burden on the server and can, at long last, safely be avoided. Don’t give in to the likes of socket.io unless you really know what you’re doing.
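For the socket ceiling, one common culprit on Linux and other Unix-like systems is the per-process open-file limit, the one `ulimit -n` reports. Every open connection consumes a file descriptor. The sketch below checks and raises the soft limit using Python's standard `resource` module, which is Unix-only. The 65536 target is an arbitrary assumption, and kernel-wide or service-manager limits may still cap you.

```python
import resource


def raise_nofile_limit(desired: int = 65536) -> tuple[int, int]:
    """Try to raise this process's soft open-file limit toward `desired`.

    Every open socket holds a file descriptor, so the soft RLIMIT_NOFILE
    caps how many viewers a single process can keep connected.
    Returns the (soft, hard) limits in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft, hard  # already unlimited, nothing to do
    target = desired if hard == resource.RLIM_INFINITY else min(desired, hard)
    if target > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
        except (ValueError, OSError):
            pass  # refused, e.g. by a kernel-wide cap; keep the current limit
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    print("open-file limit (soft, hard):", raise_nofile_limit())
```

If the counter still plateaus with a generous limit, look next at system-wide settings and at the service manager running the process.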

Is it secure?

In the example, it’s not. Anyone who wanted to inject fake viewers into your pool could easily do so. They could also go the (DoS) extra mile and try to bring the counter, and the server hosting it, to its knees by impersonating a jolly bunch of extra viewers.

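Not to say it can’t be made safe. Origin checks and SSL/TLS are the first things to consider. Browsers don’t apply CORS to WebSocket connections, so the server has to validate the Origin header itself. Also add some simple way to limit rate and payload size.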
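Here is a minimal sketch of the "limit rate and payload size" part. It uses a per-connection token bucket, plus a size cap checked before any parsing. The limits in the example (2 messages per second, bursts of 5, 256 bytes) are arbitrary assumptions. The clock is passed in to keep the logic easy to test.

```python
class TokenBucket:
    """Allow `rate` messages per second on average, with bursts up to `burst`."""

    def __init__(self, rate: float, burst: float, now: float) -> None:
        self.rate, self.burst = rate, burst
        self.tokens, self.last = burst, now

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, never beyond the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


MAX_PAYLOAD_BYTES = 256  # a viewer-count client has little reason to send more


def accept(message: bytes, bucket: TokenBucket, now: float) -> bool:
    """Reject oversized payloads without touching the bucket, then rate-limit."""
    return len(message) <= MAX_PAYLOAD_BYTES and bucket.allow(now)
```

A client hammering the counter gets its first five messages through, then one every half second. In practice you would likely disconnect a connection that keeps getting rejected, rather than just dropping its messages.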

Next up, any extra validation, authentication, tokenization etc. will take a slight toll on server resources, multiplied by your viewer count. So be wary and benchmark each addition.

Is it fast?

Yesss! As fast as you’d expect an update to propagate over the internet these days, at half the speed of light if lucky.

Sounds like a simple task, why is it so hard to scale?

Think the following scenario: 100 average viewers, each coming in or out every 10 minutes. That’s 10 updates per minute, to propagate to all of the 100, for a total of 1000 updates per minute.

Now for 1000 average viewers, each also coming in or out every 10 minutes. That’s 100 updates per minute, to propagate to all of the 1000, for a total of 100k updates per minute.

Take that for 10k average viewers, it’ll be 10M (!) updates per minute. And that’s just averages, real life will show you that the audience tends to flock in the beginning and key moments of an event.

OK, there are tricks to smooth out that treacherous quadratic growth, and you already know one of them: display 1.7k viewers instead of 1745. That’s roughly a hundred-fold reduction in the number of updates, out of the box! And there’s more to be done, of course.
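Here are the arithmetic above and the rounding trick in a few lines. The sketch makes the same assumptions as the scenario: each viewer joins or leaves once every 10 minutes, and every change is pushed to everyone. `display` floors to one decimal of thousands. The label, and therefore the broadcast, only changes once per hundred viewers.

```python
def updates_per_minute(viewers: int, minutes_between_changes: int = 10) -> int:
    """Messages per minute: (joins + leaves per minute) x (viewers notified each time)."""
    changes_per_minute = viewers // minutes_between_changes
    return changes_per_minute * viewers


def display(count: int) -> str:
    """Exact below 1000, otherwise floored to one decimal of thousands (1745 -> 1.7k)."""
    if count < 1000:
        return str(count)
    return f"{count // 1000}.{(count // 100) % 10}k"


def broadcasts_needed(counts, label) -> int:
    """Broadcasts sent if we only push when the displayed label actually changes."""
    sent, last = 0, None
    for count in counts:
        text = label(count)
        if text != last:
            sent, last = sent + 1, text
    return sent
```

Counting up from 1,000 to 2,000 viewers one at a time takes 1,001 broadcasts with the exact number. With the rounded label it takes only 11.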

As a small business, must I pay royalties for H264 and H265?

Will they come after me?

There’s a fair bit of misunderstanding, to say the least, when it comes to grasping and facing codec licensing. The general perception is that if you’re just starting out you don’t need to worry about it. The warning here is that it may creep up on you as you grow, depending on how you put the codec to use and especially how you monetize it. Let’s start with the basics, though.

Intellectual Property

Many video compression techniques included in a codec are patented inventions. To use the codec, you’d have to license the patents from their creator or representative. Fair enough, except we are talking about a few thousand patents from a few dozen companies.

Image and information courtesy of PR Newswire.

Patent Pools

To simplify licensing, patent holders ‘pooled’ their patents through organizations that license them collectively on behalf of their members.

While there’s more than a single pool, and some patents are unaffiliated, it is commonly agreed that you only need to reach out to MPEG-LA to license H.264 (aka AVC), while in the case of H.265 (aka HEVC) you need to pay at least the 3 big pools (MPEG-LA, HEVC Advance, Velos), of which the latter does not even publicly disclose prices.

Known pricing

Terms under which a license is sold are rather complex and highly nuanced. Cost varies with the context in which the codec is used, the volume, and the revenue you drive from it.

Notably, some use cases carry no cost, while others come with a generous entry-level threshold. Nevertheless, do pay attention, and let’s take these one by one.

Per-Device

Applies to smartphones, tablets, digital and smart TVs, computers, video players and anything with a hardware encoder or decoder of the respective codec. Royalties are owed by the device supplier, not by the encoding/decoding chip or module manufacturer.

Also applies to software products that include an encoder or decoder. Royalties are owed by the product vendor/distributor, whether the product itself is commercial or free. Notable exception: free products (truly free, like Firefox) may include the OpenH264 binary, in which case royalties are generously covered by Cisco.

Per-Title PPV

These include platforms that sell access to content on a per-title basis. Royalties are either (A) a fixed value per sale or (B) a percentage of sales to end users, in some cases the lesser of the two. Note that titles (i.e. videos) of 12 minutes or less are exempt from such royalties.

Subscription-Based PPV

Royalties apply to subscription platforms like Netflix and vary depending on codec and number of subscribers. There’s a zero-cost entry level for AVC if you have fewer than 100K subscribers.

Free Television Broadcast

Applies to terrestrial, cable and satellite broadcasters, with pricing per encoder or by audience size.

Free Internet Broadcast

You owe no royalties if you encode content to be distributed for free over the internet.

Real-world (small) business models, and how much they may owe

Mobile apps

We’re obviously talking about mobile apps that play, broadcast or manipulate video using either of these codecs.

If you rely on a hardware or OS exposed encoder/decoder to do the job, you don’t owe anybody anything, godspeed!

If you include a software encoder or decoder in your app, you fit into the ‘Per-Device’ category. For AVC you don’t pay anything until you reach 100K units (i.e. actively installed apps).

For HEVC, you’ll be paying from the ground up, think $1.5 to $4 per unit.

Streaming platforms

You owe royalties if you distribute AVC or HEVC encoded content, unless it’s free as in YouTube.

A TVOD platform (or the live streaming pay-per-title equivalent) should pay MPEG-LA the lesser of 2¢ or 2% per title sale for H264, and/or 2.5¢ to HEVC Advance for H265. There is no entry-level freebie for this model.
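Here is a sketch of the per-title AVC arithmetic. It uses only the figures quoted in this section: per sale, the lesser of a flat 2¢ or 2% of the price, and nothing owed for titles of 12 minutes or less. The function names are just illustrative, and the licensor's current terms are what actually apply. `Decimal` keeps fractions of a cent exact.

```python
from decimal import Decimal

FLAT_FEE = Decimal("0.02")  # $0.02 per title sale
PERCENT = Decimal("0.02")   # 2% of the sale price
EXEMPT_MINUTES = 12         # titles this long or shorter owe nothing


def avc_title_royalty(price: Decimal, minutes: float) -> Decimal:
    """AVC royalty for one title sale, per the figures quoted above."""
    if minutes <= EXEMPT_MINUTES:
        return Decimal("0")
    return min(FLAT_FEE, price * PERCENT)


def total_royalty(sales: list[tuple[Decimal, float]]) -> Decimal:
    """Sum the royalty over (price, title length in minutes) sales."""
    return sum((avc_title_royalty(p, m) for p, m in sales), Decimal("0"))
```

A $4.99 feature pays the flat 2¢, a 50¢ feature pays 1¢ (2% is lower), and a 10-minute clip pays nothing. Those three sales add up to 3¢.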

An SVOD platform (or the live streaming subscription-based equivalent) starts owing MPEG-LA between $25K and $100K a year for AVC once it goes over 100K paying customers. HEVC is not as friendly to newcomers: you owe HEVC Advance 0.5¢ to 2.5¢ per customer from day one.

Cloud encoding

If you operate a service that explicitly sells encoding/transcoding (like encoding.com does), you definitely do owe royalties. How you will be billed, however, is rather uncertain. You’ll ultimately have to reach out to the licensors and ask; I know of at least 2 customers being charged very differently for quite similar business models. Common sense would nonetheless suggest that:

If you charge for encoding by the item (title) you will pay royalties per title
If you charge for encoding via a subscription, you will fit into the subscription-based royalty model
If you charge for encoding by the minute, you may (possibly) fit into the per-device category, where each encoding server counts as one such device

If you transcode video internally, as part of a larger streaming platform, there’s no clear rule or guideline on how licensing works, and you also have to ask. A couple of customer stories lead yours truly to believe that:

If the platform distributes paid content (SVOD or TVOD) and is already paying per-title or per-subscription royalties in that respect, there is no extra charge for the encoding part
If the platform distributes free or AVOD content, it may owe per device (i.e. transcoding server or server core/thread) royalties; or it may not 😐

Online TV Stations

If it’s free to watch, you’re in the clear, no royalties.

If it’s a paid service (e.g. subscription), you do owe royalties. Even if streaming is powered by a 3rd-party platform and/or commercial player, the organization that labels the content also has to license the technology. Now you know.

Will They Come After Me?

Possibly not. Interestingly enough, the ‘pool’ organizations cannot and do not deal with litigation.

I’ve never heard of any small player coming anywhere close to being sued, but still…

As your startup begins to grow, you should start being aware of how much you owe and consider that you might someday need to pay it all retroactively. Balance your encoding needs and don’t shoot for the mightiest codec unless you really need it. Explore alternatives and know your options.

Are there free alternatives?

Sure!

AV1 is everyone’s dream: royalty-free, and backed by an alliance of 48+ members. But it’s rather new and half-baked, and it will probably be a long time before you find a decoder for it in every device out there. Definitely one to look out for in the years to come.

VP8 and VP9 are roughly comparable in quality to AVC and HEVC respectively, and are also royalty-free, except that they’re backed only by Google. Google admirably carried out the complex (and expensive) job of bringing them to market and safeguarding them from patent claims. It failed, though, to convince the other big players to adopt them, so hardware support is still scarce some 10 years later.

Where to go next?

See Jina’s article on AVC licensing; it may help clear up other concerns. There are also a couple of great articles here and there.