Don't Cache the Silence: Building Self-Healing Audio Infrastructure for Earnings Calls

Earnings calls are among the most time-sensitive events in an investor's calendar. Every quarter, public companies get on a call with analysts and walk through their results. Revenue, margins, guidance, the unexpected stuff. The transcript of that call becomes a primary source that investors, journalists, and analysts read and re-read for days afterward.

I was building the documents section of the stock platform, and earnings call transcripts were one of the major features. The feature sounds simple: show a list of earnings calls for a company, let users click into one, read the transcript, and listen to the audio recording.

The implementation had a few interesting problems worth unpacking.

What the API Actually Returns

Transcripts come from a financial data API. Two endpoints are involved: /stock/transcripts/list for metadata and /stock/transcripts for full transcript content fetched by ID.

The list endpoint is what you call first to populate the tab. Pass a symbol, get back an array of transcript records. Each one has an ID, a title, a timestamp, a year, and a quarter field.

The problem is that "transcripts associated with a symbol" does not mean "quarterly earnings calls only." The API's list includes investor conferences, technology summits, capital markets days, special investor events, and other occasions where management speaks publicly. These events have transcripts. But they don't have quarterly earnings audio, and they don't map to a fiscal quarter. For investors specifically looking for Q1, Q2, Q3, or Q4 earnings calls, they are noise.

When I first shipped the tab, it showed everything unfiltered. A company with 20 actual earnings calls might also have 15 conference appearances mixed in. The tab felt like a raw data dump rather than a curated document library. The client flagged it.

The fix required a reliable way to identify real earnings calls without making extra API requests per item.

The Naive Approach and Its Cost

The obvious verification approach: fetch the list, then for each item call the full transcript endpoint or the audio endpoint to check whether audio exists, keep only the items that have it.

That's N+1 API calls. A company with 35 items in the list means 35 individual requests just to decide what to show. All of them consuming API quota, all of them adding latency, all of them happening before the user sees a single result. The provider enforces a global rate limit across the platform. During earnings season, when multiple companies report in the same week and multiple users are loading transcript tabs simultaneously, N+1 would eat that budget fast.

There was no way to ship that cleanly.

The Field That Was Already There

Looking more carefully at what the list endpoint returns, the answer was already in the response.

The transcript record includes a quarter field. The API's own documentation describes it as "Quarter of earnings result in the case of earnings call transcript." That phrase is doing real work. For actual quarterly earnings calls, this field is set to 1, 2, 3, or 4. For conferences, investor days, and everything else, it is 0. The field is not just display metadata. It is a type discriminator built directly into the API contract.

The filter is one line:

const allItems = (await fetchTranscriptsList(stock.symbol)).filter(
  (item) => item.quarter > 0,
);

No additional API calls. No per-item verification. The information needed to filter was in the list response the whole time.
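
To see the filter in isolation, here is a minimal sketch. The TranscriptListItem shape, the time field name, and the sample records are assumptions based on the fields described above, not the provider's actual schema.

```typescript
// Hypothetical shape of one list record, based on the fields named above.
interface TranscriptListItem {
  id: string;
  title: string;
  time: string; // timestamp of the event (field name is an assumption)
  year: number;
  quarter: number; // 1-4 for earnings calls, 0 for everything else
}

// Keep only records that map to a fiscal quarter.
const onlyEarningsCalls = (items: TranscriptListItem[]): TranscriptListItem[] =>
  items.filter((item) => item.quarter > 0);

// Made-up sample data, for illustration only.
const sampleItems: TranscriptListItem[] = [
  { id: 'a1', title: 'Q2 2024 Earnings Call', time: '2024-07-30T21:00:00Z', year: 2024, quarter: 2 },
  { id: 'a2', title: 'Technology Summit', time: '2024-06-11T15:00:00Z', year: 2024, quarter: 0 },
  { id: 'a3', title: 'Q1 2024 Earnings Call', time: '2024-04-30T21:00:00Z', year: 2024, quarter: 1 },
];

const earningsOnly = onlyEarningsCalls(sampleItems);
console.log(earningsOnly.map((item) => item.id)); // [ 'a1', 'a3' ]
```

The conference record is dropped without a single extra request, because the decision depends only on data already in the list response.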

That solved the list itself with no extra requests. But the list was not the part users would notice most. The harder problem started once a call was recent enough that the transcript, the live stream, and the finished recording could all disagree for a while.

The Audio URL Problem

The transcript detail includes an audio field. For most historical calls that field has a direct MP3 URL and everything is straightforward. But for recent calls (ones that happened in the past few hours or days) that field is frequently empty even though audio does exist. The transcript indexing pipeline and the live earnings endpoint operate on different schedules. The transcript gets indexed first. The audio URL gets attached later.

This creates a window where the transcript is readable but the audio is temporarily unavailable via the normal path.

How Audio Is Resolved During the Live Window

The resolver is really one fallback chain. It starts with the transcript detail, then falls through to the live earnings endpoint if the historical audio field has not caught up yet.

First, it checks transcript.audio_url. If that exists, the provider has already finished indexing the call and attached the historical recording, so the resolver can return immediately.

If that field is still empty, it falls back to the live earnings endpoint, which is fresher around the event itself. That endpoint can return either recording, which is the finished MP3, or liveAudio, which is the HLS playlist used during the call and often for the immediate replay right after it ends.

const audioUrl = transcript.audio_url || match?.recording || match?.liveAudio || null;

That ordering is deliberate. A finished MP3 is the cleaner source for a completed call, so recording wins when it exists. But if the call just ended and the MP3 has not been processed yet, returning the HLS stream is still much better than returning nothing. That is the narrow but important gap where the live system and the historical transcript system overlap.
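
The chain can be exercised on its own. The following is a minimal sketch, assuming hypothetical TranscriptDetail and LiveEventMatch shapes and made-up example.com URLs; only the field names audio_url, recording, and liveAudio come from the resolver described above.

```typescript
// Hypothetical shapes; only the field names come from the text above.
interface TranscriptDetail {
  audio_url?: string | null;
}
interface LiveEventMatch {
  recording?: string | null;
  liveAudio?: string | null;
}

// Returns the first non-empty source in priority order, or null if none exists.
const pickAudioUrl = (
  transcript: TranscriptDetail,
  match?: LiveEventMatch,
): string | null =>
  transcript.audio_url || match?.recording || match?.liveAudio || null;

// Historical audio already attached: the live endpoint is irrelevant.
const historical = pickAudioUrl(
  { audio_url: 'https://example.com/historical.mp3' },
  { recording: 'https://example.com/recording.mp3' },
);

// Transcript not caught up yet: the finished MP3 beats the HLS playlist.
const processed = pickAudioUrl(
  { audio_url: '' },
  { recording: 'https://example.com/recording.mp3', liveAudio: 'https://example.com/live.m3u8' },
);

// Call just ended, MP3 not processed: fall through to the HLS playlist.
const replay = pickAudioUrl({}, { liveAudio: 'https://example.com/live.m3u8' });

// Nothing anywhere: null.
const nothing = pickAudioUrl({}, undefined);

console.log(historical, processed, replay, nothing);
```

Because the chain uses ||, an empty string is treated the same as a missing field, which matters when a provider returns "" rather than null for audio that has not been attached yet.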

The Hot Window: Why Null Is Never Cached

The most important detail in the entire resolver is near the bottom:

if (audioUrl) {
  await MEMORY.setJSON(cacheId, audioUrl, CACHE_TTL_TRANSCRIPTS);
}

The write to Redis is conditional. If audioUrl is null (the live endpoint found no recording and no active stream) we return null to the caller but we do not cache it.

Here is what happens in the hours right after an earnings call ends:

1. The call finishes. The transcript gets indexed quickly, but the audio field is empty. The recording pipeline is still processing the call.
2. First user requests audio. Path 1 misses (transcript has no audio_url). Audio cache misses (nothing stored yet). We hit the live endpoint. Recording not ready. Return null. Nothing cached.
3. Recording finishes processing. Now the live endpoint returns a URL.
4. Next user requests audio. Path 1 still misses (transcript detail cache is the stale 1-hour version without audio_url). Audio cache misses. Live endpoint now returns the recording URL. We cache it for 1 hour. Return it.
5. Every subsequent user within that hour: audio cache hits immediately. Live endpoint not called again.

If we had cached null on step 2, every user during the hot window would be served a stale null for up to an hour after the recording became available. The feature would look broken (the call happened, everyone knows it happened, but there is no audio) for a full cache cycle. By deliberately not caching null, the system keeps checking until it finds something, then locks in once it does.
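
That sequence can be simulated in a few lines. This is a sketch under assumptions, not the production resolver: a synchronous in-memory Map stands in for Redis, the TTL is omitted, and resolveCached and fakeLiveEndpoint are hypothetical names.

```typescript
// In-memory stand-in for the Redis cache; a real TTL is omitted for brevity.
const audioCache = new Map<string, string>();

// Hypothetical helper: `lookup` stands in for the live endpoint call.
const resolveCached = (
  cacheId: string,
  lookup: () => string | null,
): string | null => {
  const cached = audioCache.get(cacheId);
  if (cached) return cached;

  const audioUrl = lookup();
  // Only positive results are stored. A null stays uncached, so the next
  // request checks upstream again.
  if (audioUrl) audioCache.set(cacheId, audioUrl);
  return audioUrl;
};

// Simulate the hot window: the recording appears on the second upstream call.
let upstreamCalls = 0;
const fakeLiveEndpoint = (): string | null => {
  upstreamCalls += 1;
  return upstreamCalls >= 2 ? 'https://example.com/recording.mp3' : null;
};

const firstResult = resolveCached('audio:demo', fakeLiveEndpoint); // null, nothing cached
const secondResult = resolveCached('audio:demo', fakeLiveEndpoint); // URL found and cached
const thirdResult = resolveCached('audio:demo', fakeLiveEndpoint); // served from cache

console.log(firstResult, secondResult, thirdResult, upstreamCalls);
```

The third call never reaches the fake endpoint, so upstreamCalls stays at 2: the system keeps checking while the answer is null and stops as soon as it has a URL.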

The hot window is precisely when this feature is most used. Earnings calls attract the most traffic in the hours immediately after they end. Getting this behavior right matters more for recent calls than for anything else.

The Date Window and AMC Calls

The live endpoint query uses a date range derived from the call's own timestamp. The window runs from the call date to the call date plus one day. The extra day is specifically for AMC calls (After Market Close). Many companies report earnings after trading ends for the day. The call runs into the evening. The recording gets processed and uploaded overnight. By the time it is available it is technically the next calendar day in UTC.

A query scoped only to the call date would miss those recordings entirely. The +1 day window ensures that calls which ended in the evening and were processed overnight, after the UTC date had rolled over, are still found.
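
A minimal sketch of that window calculation follows. The helper names and the from/to parameter names are assumptions; the real endpoint may expect a different date format.

```typescript
// Format a Date as YYYY-MM-DD in UTC.
const toUtcDate = (d: Date): string => d.toISOString().slice(0, 10);

// Hypothetical helper: the window runs from the call's UTC date
// to that date plus one day.
const liveQueryWindow = (callTime: string): { from: string; to: string } => {
  const start = new Date(callTime);
  const end = new Date(start.getTime() + 24 * 60 * 60 * 1000);
  return { from: toUtcDate(start), to: toUtcDate(end) };
};

// An after-market-close call at 21:00 UTC on July 30.
const amcWindow = liveQueryWindow('2024-07-30T21:00:00Z');
console.log(amcWindow); // { from: '2024-07-30', to: '2024-07-31' }

// Month boundaries roll over correctly because the math is done on timestamps.
const monthEndWindow = liveQueryWindow('2024-01-31T22:30:00Z');
console.log(monthEndWindow); // { from: '2024-01-31', to: '2024-02-01' }
```

Adding 24 hours to the timestamp, rather than incrementing the day number in a string, avoids special cases at month and year boundaries.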

The Hot Window Is Where HLS Actually Matters

The interesting part of this feature is a very specific window: the few minutes before a scheduled call, the live call itself, and the first stretch right after it ends. That is when users are actively trying to listen, and it is also when provider data is least clean.

In that hot window, a stream URL can appear before the call really starts. The scheduled time can pass before live audio is actually ready. And after the call ends, the HLS stream can still be the best replay source for a while before the MP3 recording shows up. If the app treats those moments as the same state, the dialog feels flaky exactly when users are watching it most closely.

So the live earnings endpoint does more than return fields. It resolves each event into a status the dialog can actually trust: upcoming, live, or ended.

Tighter Polling Without Polling All Day

The frontend and backend both tighten up only around the time where a call can realistically change state. That is what makes the live experience feel responsive without turning the whole day into constant upstream churn.

More than 10 minutes before scheduled time -> no active polling
10 minutes before to 10 minutes after -> poll every 1 minute
10 to 30 minutes after scheduled time -> poll every 5 minutes
If a call is already live -> 5-minute fallback polling
Everything ended -> stop polling
Tab hidden -> pause polling

That is the actual hot window. It exists because earnings calls do not go live on a perfectly reliable second boundary. Streams can appear a little late, and right after the call the replay can be available before the recording is processed. Polling is concentrated around those transitions instead of running aggressively for the entire day.
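
The schedule could be expressed as one pure function. This is a sketch under assumptions, not the production code: the PollState shape, the precedence of the rules, and the choice to stop polling once a call is more than 30 minutes past its scheduled time without going live are guesses that the list does not spell out.

```typescript
const MINUTE_MS = 60 * 1000;

// Hypothetical input shape for the decision.
interface PollState {
  scheduledAt: number; // epoch ms of the scheduled start
  now: number; // epoch ms of the current time
  isLive: boolean;
  allEnded: boolean;
  tabHidden: boolean;
}

// Returns the polling interval in ms, or false for "do not poll".
const pollIntervalMs = (s: PollState): number | false => {
  if (s.tabHidden || s.allEnded) return false;
  if (s.isLive) return 5 * MINUTE_MS; // fallback polling while live

  const delta = s.now - s.scheduledAt;
  if (delta < -10 * MINUTE_MS) return false; // too early to matter
  if (delta <= 10 * MINUTE_MS) return MINUTE_MS; // tight window around start
  if (delta <= 30 * MINUTE_MS) return 5 * MINUTE_MS; // late-start window
  return false; // assumption: give up after 30 minutes
};

// Small helper to build example states relative to a scheduled start.
const scheduledAt = Date.UTC(2024, 6, 30, 21, 0, 0);
const stateAt = (
  minutesFromStart: number,
  overrides: Partial<PollState> = {},
): PollState => ({
  scheduledAt,
  now: scheduledAt + minutesFromStart * MINUTE_MS,
  isLive: false,
  allEnded: false,
  tabHidden: false,
  ...overrides,
});

console.log(pollIntervalMs(stateAt(-20))); // false
console.log(pollIntervalMs(stateAt(-5))); // 60000
console.log(pollIntervalMs(stateAt(20))); // 300000
```

Keeping the decision in a pure function makes the boundaries easy to test without timers or a browser tab.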

On the backend, any live-events request that includes today is cached for 30 seconds. That is short enough to keep status fresh, but long enough for multiple users to share the same upstream fetch and the same round of HLS probes during the busiest period. Outside today's range, the cache stretches to 1 hour because there is no reason to keep rechecking stable historical data.
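
The TTL choice might look like the following sketch. The function names are hypothetical, and treating "today" as the current UTC date is an assumption; the text above only states the two TTL values and the condition that separates them.

```typescript
const TTL_TODAY_SECONDS = 30;
const TTL_STABLE_SECONDS = 60 * 60;

// True when today's UTC date falls inside [from, to], both YYYY-MM-DD strings.
// ISO dates compare correctly as plain strings.
const rangeIncludesToday = (from: string, to: string, now: Date): boolean => {
  const today = now.toISOString().slice(0, 10);
  return from <= today && today <= to;
};

// Short TTL while state can still change, long TTL for stable history.
const liveEventsTtl = (
  from: string,
  to: string,
  now: Date = new Date(),
): number =>
  rangeIncludesToday(from, to, now) ? TTL_TODAY_SECONDS : TTL_STABLE_SECONDS;

// Fixed clock so the example is deterministic.
const fixedNow = new Date('2024-07-31T02:00:00Z');

const hotTtl = liveEventsTtl('2024-07-30', '2024-07-31', fixedNow);
const coldTtl = liveEventsTtl('2024-04-30', '2024-05-01', fixedNow);
console.log(hotTtl, coldTtl); // 30 3600
```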

The HLS Playlist Tells Us When Live Becomes Replay

The branch that handles recent events is the one that matters most. It covers the ambiguous period where the call could still be upcoming, actively live, or already over but replayable.

For HLS, the useful signal is #EXT-X-ENDLIST. A live .m3u8 playlist keeps receiving new segments. Once the encoder finalizes it, that tag appears and the same playlist becomes a completed replay. The cleanest way to know which state you are in is to read the playlist itself.

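// Classify an HLS playlist as live, ended, or unavailable by reading the playlist itself.
const probeHlsStream = async (
  url: string,
): Promise<'live' | 'ended' | 'unavailable'> => {
  try {
    // Give up after 5 seconds so a slow origin cannot stall the status check.
    const response = await fetch(url, { signal: AbortSignal.timeout(5000) });
    // 404: the playlist is gone.
    if (response.status === 404) return 'unavailable';
    // Any other non-2xx response is ambiguous, so assume the stream is still live.
    if (!response.ok) return 'live';
    const text = await response.text();
    // #EXT-X-ENDLIST marks a finalized playlist; without it the stream is still publishing.
    return text.includes('#EXT-X-ENDLIST') ? 'ended' : 'live';
  } catch {
    // Timeouts and network errors also default to live.
    return 'live';
  }
};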

The server fetches the playlist with a 5-second timeout and checks for #EXT-X-ENDLIST. If the tag is present, the stream has stopped publishing new segments and the same URL can be treated as ended replay. If the tag is absent, the stream is still live.

The failure direction is deliberate too. A 404 means the playlist is gone, so the probe returns 'unavailable' and the call is treated as ended. Every other fetch failure (a non-2xx status, a timeout, a network error) defaults to live. That bias guards against the worse outcome: falsely telling the user the call is over when the stream is still active and the probe just had a transient issue.
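
The tag check itself can be tried without a network call. The sketch below uses a hypothetical classifyPlaylist helper and made-up playlist contents; the tags are standard HLS media playlist tags, but the segment names and durations are illustrative.

```typescript
// Pure helper mirroring the tag check inside the probe.
const classifyPlaylist = (playlistText: string): 'live' | 'ended' =>
  playlistText.includes('#EXT-X-ENDLIST') ? 'ended' : 'live';

// A live media playlist: segments keep being appended, no end marker.
const livePlaylist = [
  '#EXTM3U',
  '#EXT-X-VERSION:3',
  '#EXT-X-TARGETDURATION:6',
  '#EXT-X-MEDIA-SEQUENCE:120',
  '#EXTINF:6.0,',
  'segment120.ts',
  '#EXTINF:6.0,',
  'segment121.ts',
].join('\n');

// The same playlist after the encoder finalizes it.
const finishedPlaylist = `${livePlaylist}\n#EXT-X-ENDLIST\n`;

console.log(classifyPlaylist(livePlaylist)); // live
console.log(classifyPlaylist(finishedPlaylist)); // ended
```

Separating the text check from the fetch also makes the classification testable without mocking the network.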

The Frontend Playback Layer

Once the backend has resolved the best available source, the frontend still has to decide how to open it in the dialog. This is where the live HLS stream, ended HLS replay, processed MP3, and historical transcript audio get turned into one playback order.

// Priority: live HLS -> ended HLS replay -> recording MP3 -> historical transcript audio
if (liveEvent?.status === 'live' && liveEvent.live_audio_url) {
  openTrack({ src: liveEvent.live_audio_url, live: true, autoplay: true });
  return;
}

if (liveEvent?.status === 'ended' && liveEvent.live_audio_url) {
  openTrack({ src: liveEvent.live_audio_url, autoplay: true });
  return;
}

if (liveEvent?.status === 'ended' && liveEvent.recording_url) {
  openTrack({ src: liveEvent.recording_url, autoplay: true });
  return;
}

// Fallback: historical transcript audio
const t = await getTranscript();
if (t?.audio_url) openTrack({ src: t.audio_url, autoplay: true });

The ordering is deliberate because HLS solves two adjacent problems, not one. During the call, the player opens the HLS stream with live: true, which enables the live indicator and the "Jump to Live" button. Right after the call ends, that same live_audio_url often remains the best replay source before the MP3 has been processed, so the player keeps using it without the live flag. The MP3 in recording_url is the cleaner long-term source, and in the order shown it takes over once the ended event no longer carries a live_audio_url. The final fallback is the historical audio_url from the transcript detail, for any call where the live endpoint has no usable audio at all.
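
The same priority order can be written as a pure function, which makes it easy to check each case. This is a sketch, assuming hypothetical LiveEvent and Track shapes and made-up URLs; the field names and the order of the checks follow the dialog code above.

```typescript
// Hypothetical shapes; field names follow the dialog code above.
interface LiveEvent {
  status: 'upcoming' | 'live' | 'ended';
  live_audio_url?: string | null;
  recording_url?: string | null;
}
interface Track {
  src: string;
  live: boolean;
}

// Returns the track to open, or null when no audio is available.
const pickTrack = (
  liveEvent: LiveEvent | undefined,
  transcriptAudioUrl?: string | null,
): Track | null => {
  if (liveEvent?.status === 'live' && liveEvent.live_audio_url) {
    return { src: liveEvent.live_audio_url, live: true };
  }
  if (liveEvent?.status === 'ended' && liveEvent.live_audio_url) {
    return { src: liveEvent.live_audio_url, live: false };
  }
  if (liveEvent?.status === 'ended' && liveEvent.recording_url) {
    return { src: liveEvent.recording_url, live: false };
  }
  // Fallback: historical transcript audio.
  return transcriptAudioUrl ? { src: transcriptAudioUrl, live: false } : null;
};

const hls = 'https://example.com/live.m3u8';
const mp3 = 'https://example.com/recording.mp3';
const historicalMp3 = 'https://example.com/historical.mp3';

const duringCall = pickTrack({ status: 'live', live_audio_url: hls });
const justEnded = pickTrack({ status: 'ended', live_audio_url: hls, recording_url: mp3 });
const recordingOnly = pickTrack({ status: 'ended', recording_url: mp3 });
const oldCall = pickTrack(undefined, historicalMp3);

console.log(duringCall, justEnded, recordingOnly, oldCall);
```

Note that with this order an ended event that carries both URLs still opens the HLS replay; the MP3 is chosen only when live_audio_url is absent.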

Two reactive effects bridge the gap between polling cycles. If polling says the call is no longer live, a useEffect clears the live flag immediately so the player drops the live affordances right away. And if the stream ends naturally before the next poll cycle, the player's ended event invalidates the live query so replay state is fetched immediately.

What I Took Away From This

The quarter field is not just a display value. It is a type discriminator. Recognizing that it encodes a meaningful distinction and using it as such eliminated an entire class of N+1 requests. Before designing a verification step, check whether the data you already fetch encodes the answer. In this case it did.

The null caching decision is a small conditional that shapes the entire behavior of the feature during the period when it matters most. Not caching null makes the system self-healing: it keeps checking until it finds something, then stabilizes once it does.

The HLS probe is the detail I find most interesting technically. The m3u8 playlist is a plain text file. It has a well-defined tag that signals stream termination. Reading that tag directly is a clean, spec-compliant way to determine live state without relying on any secondary API field. The failure modes are carefully chosen: apart from a 404, anything ambiguous or broken defaults to 'live' rather than 'ended', because the cost of cutting off an active listener is higher than the cost of showing a live indicator that's slightly stale.

The dual TTL (30 seconds for today, 1 hour for everything else) only works because it is paired with the frontend's hot window. The client watches closely around the scheduled start time and then relaxes once the call moves out of that volatile period, while the backend shortens its cache only for today's range where those transitions are actually happening. Treating all dates identically would mean either polling too aggressively on old stable data or not refreshing often enough during the live window. Splitting both the polling behavior and the cache TTL around that hot window is what makes the system feel responsive without wasting work.
