Yasindu Nethmina
Full-Stack Product Engineer
©2026 Yasindu Nethmina. Made with ♥ in Colombo, Sri Lanka.

PDF to Thumbnail in Milliseconds: MuPDF WASM, 3-Layer Caching, and Zero Wasted Renders

PUBLISHED

April 12, 2026

READ TIME

12 min read

TOPICS
Caching · WebAssembly · S3 · Redis · Performance
SUMMARY

How to render exactly one thumbnail per PDF, ever: MuPDF WASM in-process rendering, a 3-layer Redis/S3/generation cache with content-addressable SHA-256 keys, Promise.withResolvers() request coalescing that collapses concurrent callers into a single inflight operation, local concurrency lanes with zero-poll callback handoff, and a 204 no-store failure response that keeps errors from sticking in any cache.

I've been working on a stock market research platform for a while now. The kind of thing where you can pull up any ticker and dig into its financials, read earnings call transcripts, track insider trades, check the economic calendar. Recently I shipped a full documents section for each company: SEC filings, earnings call transcripts, and shareholder presentations.

The presentations tab is what this post is about. It started as a small UX request and turned into a genuinely interesting engineering problem.

What Are Investor Presentations?

Publicly traded companies regularly publish shareholder presentations. These are slide decks that management puts together for investor conferences, earnings days, or capital markets days. Think a 40-page PDF with a mix of strategy slides, financial charts, and product roadmaps. They sit on the company's investor relations page, linked from press releases, referenced in earnings calls. Serious investors read them.

For the platform, these presentations are pulled from a financial data API. For any given ticker, it returns a list of presentation records: a title, a quarter label, a date, and a URL pointing to the raw PDF hosted somewhere on the company's IR site or a third-party PDF hosting service.

When I first shipped the presentations tab, it was a plain list. Title, quarter, date, a link to open the PDF. Functional. But looking at a vertical list of "Q3 2024 Investor Day Presentation" links is not a great experience. You have no visual context before clicking. You can't scan the list and immediately recognize what you're about to open. The client flagged it pretty quickly: it felt like a raw data dump rather than a feature.

The obvious improvement: show a thumbnail of the first slide. Let users see the cover page at a glance. That one change transforms the tab from a link list into something that feels like an actual document library.

So I built it. And the moment I started thinking about how to build it properly, the constraints started stacking up.

The Constraints That Shaped the Approach

The presentations are third-party PDFs. The URLs change, the hosts vary by company. Some companies use their own IR sites. Others use PDF hosting services like Investis or Q4 Inc. A few serve the file directly from a CDN.

This creates a few constraints immediately:

  • I can't render on the frontend. We're fetching from arbitrary third-party URLs that have CORS restrictions. The browser can't load them directly even if I wanted it to.
  • I can't pre-generate thumbnails for all companies up front. There are thousands of tickers and even more presentations across them. Batch-generating would be a significant upfront job and would need to re-run continuously whenever new presentations are added.
  • I can't render on every request. PDF rendering is not free. Fetching a PDF from an IR website on every thumbnail request is slow, unreliable, and would hammer source servers.

So the design had to be lazy (generate only when first requested), permanent (render exactly once and never again for that PDF), and resilient (handle unreliable source servers without surfacing broken states to the user).

Those three requirements together ruled out almost every shortcut and pushed me toward building something a bit more deliberate.

Choosing a Renderer

The two realistic options for server-side PDF rendering were Puppeteer and MuPDF.

Puppeteer means spinning up a headless Chromium instance, loading the PDF, screenshotting the first page. Chrome is actually a capable PDF renderer, but you're paying for the whole browser to get it. You need to manage a process pool, deal with memory pressure from concurrent renders, handle crashes, and configure it to work inside a container. It also adds a meaningful binary dependency on the host machine.

MuPDF is a C library that has been rendering PDFs for a very long time. It handles the full spec: complex layouts, embedded fonts, transparency, ICC color profiles. It ships with a WASM build maintained under the mupdf npm package. You import it like any other module and it runs inside your process. No subprocess, no IPC, no external binary on the host.

I went with MuPDF. The rendering code ended up being this:

import * as mupdf from 'mupdf';

const renderPdfFirstPage = (pdfBuffer: ArrayBuffer): Uint8Array => {
  const doc = mupdf.Document.openDocument(pdfBuffer, 'application/pdf');
  try {
    const page = doc.loadPage(0);
    const scaleMatrix = mupdf.Matrix.scale(1, 1);
    const pixmap = page.toPixmap(scaleMatrix, mupdf.ColorSpace.DeviceRGB, false);
    const jpeg = pixmap.asJPEG(80);
    // Free the WASM-side page and pixmap explicitly rather than waiting on GC.
    pixmap.destroy();
    page.destroy();
    return jpeg;
  } finally {
    doc.destroy();
  }
};

Open the document, load page zero, rasterize to a pixmap in RGB color space, encode as JPEG at quality 80, return the bytes. The WASM module handles everything underneath: object parsing, stream decompression, font rendering, image decoding. You hand it an ArrayBuffer, you get a Uint8Array back. The whole thing runs in-process with near-native performance because it's compiled C running through the WASM runtime.

On output format: I landed on JPEG rather than PNG because the first page of an investor presentation is essentially a photograph: complex background gradients, company branding, photos of management. PNG would be lossless but significantly larger with no visible quality gain for a thumbnail. WebP would actually have been the better long-term choice here, typically 25-35% smaller than JPEG at equivalent visual quality and universally supported in modern browsers. MuPDF does expose a WebP output path in newer versions, but the bindings I was working with only exposed asJPEG and asPNG. JPEG at quality 80 is a solid middle ground, and since thumbnails end up permanently cached anyway, the incremental bandwidth savings wouldn't change the UX story meaningfully.

The 3-Layer Cache

The entire design is built around one rule: after the first request, no PDF should ever be fetched or rendered again. The thumbnail is permanent.

Layer 1: Redis

The cache key is presentation-thumbnail:v1:${sha256(pdfUrl)}. We hash the PDF URL with SHA-256 and use that as the key. The value stored is not the image itself, it's the S3 key where the image lives. TTL is 24 hours.

If this key exists in Redis, we skip everything. We already know exactly where the image is stored and go fetch it directly.

Layer 2: S3

The S3 key is public/presentations/thumbnails/${sha256(pdfUrl)}.jpg. Before rendering, we run a HEAD request against that key. If the object exists, we populate Redis and return the key. This is the recovery path for when Redis has expired but S3 still has the file. No render needed.

Layer 3: Generation

Only reached if both Redis and S3 miss. Fetch the PDF, render it, return the buffer to the browser, and upload to S3 in the background.

The SHA-256 hash of the PDF URL as the storage key is deliberate: it's content-addressable. If two different tickers reference the same presentation URL, they map to the same S3 key. One file stored, never rendered twice, regardless of how many stocks point to it.
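The key derivation can be sketched in a few lines (thumbnailKeys is an illustrative helper, not the platform's actual code; the key formats are the ones described above):

```typescript
import { createHash } from 'node:crypto';

// Hash the PDF URL once; both cache layers derive their keys from it,
// so the same URL always maps to the same Redis entry and S3 object.
const sha256 = (input: string): string =>
  createHash('sha256').update(input).digest('hex');

const thumbnailKeys = (pdfUrl: string) => {
  const hash = sha256(pdfUrl);
  return {
    redisKey: `presentation-thumbnail:v1:${hash}`,
    s3Key: `public/presentations/thumbnails/${hash}.jpg`,
  };
};
```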

The background upload is also deliberate. We don't await the S3 put before responding to the browser. The JPEG buffer is already in memory. We return it immediately and let the upload finish asynchronously. Redis gets populated in the .then handler. From the user's perspective, first-generation requests feel just as fast as cached ones. The only overhead they experience is the PDF fetch plus the render itself.
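Putting the three layers together, the lookup can be sketched like this, with in-memory Maps standing in for the real Redis and S3 clients and a counter in place of the actual MuPDF render. All names here are illustrative, not the platform's real code:

```typescript
import { createHash } from 'node:crypto';

// In-memory stand-ins for Redis (key -> S3 key) and S3 (key -> bytes).
const redisStore = new Map<string, string>();
const s3Store = new Map<string, Uint8Array>();

const sha256 = (s: string) => createHash('sha256').update(s).digest('hex');

let renders = 0; // instrumentation for the sketch: counts actual renders

const renderFirstPage = async (_pdfUrl: string): Promise<Uint8Array> => {
  renders++;
  return new Uint8Array([0xff, 0xd8]); // pretend JPEG bytes
};

async function resolveThumbnail(pdfUrl: string): Promise<Uint8Array> {
  const hash = sha256(pdfUrl);
  const redisKey = `presentation-thumbnail:v1:${hash}`;
  const s3Key = `public/presentations/thumbnails/${hash}.jpg`;

  // Layer 1: Redis points straight at the stored object.
  const cachedKey = redisStore.get(redisKey);
  if (cachedKey) return s3Store.get(cachedKey)!;

  // Layer 2: the object may still be in S3 after the Redis TTL expired.
  if (s3Store.has(s3Key)) {
    redisStore.set(redisKey, s3Key);
    return s3Store.get(s3Key)!;
  }

  // Layer 3: fetch + render; return immediately, upload in the background,
  // and only populate Redis once the upload has finished.
  const jpeg = await renderFirstPage(pdfUrl);
  Promise.resolve()
    .then(() => s3Store.set(s3Key, jpeg))
    .then(() => redisStore.set(redisKey, s3Key));
  return jpeg;
}
```

Swapping the Maps for real `ioredis` and S3 SDK calls changes the plumbing but not the shape: one render, then every later request is a key lookup.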

The response also carries Cache-Control: public, max-age=86400 and CDN-Cache-Control: public, max-age=86400. That's a 24-hour browser and CDN cache layered on top of everything else. Once a thumbnail has been served once, the browser won't ask for it again for another day.

Deduplication: Sharing Promises

Here's a race condition that's easy to overlook. A user opens the presentations tab for a company for the first time. The page loads and fires several thumbnail requests simultaneously, all for presentations that haven't been cached yet. Without deduplication, you'd kick off parallel PDF fetches and renders for potentially the same presentation. Redundant work, and it hammers the source PDF server unnecessarily.

We handle this with a utility called createCacheItem. It's an in-process promise cache:

  • First request for a given S3 key: allocates a Promise.withResolvers(), starts generation
  • Every subsequent concurrent request for the same key: gets back the same promise
  • When generation resolves: all waiters resolve simultaneously with the same result

This is not a mutex. A mutex serializes access so each waiter does its own work in turn. createCacheItem collapses all concurrent requests into a single inflight operation. One PDF fetch, one render, every concurrent caller gets the result.

The promises are stored in a two-level Map keyed by (cacheId, itemId). Once resolved, the entry lives for a configurable TTL before being cleared. The TTL for thumbnail generation is 60 seconds, which is enough to absorb any burst of concurrent requests while eventually freeing the memory.

Concurrency Limiting

PDF rendering is CPU-intensive. MuPDF runs synchronously in WASM, blocking the thread for the duration of the render. In a Bun/Node process with a single event loop thread, stacking too many concurrent renders starves everything else in the server.

The fix is a concurrency-limited sequence runner:

SCHEDULER.runInSequenceLocal(
  'thumbnail-gen',
  async () => { /* fetch + render */ },
  { maxConcurrency: 5 },
)

This limits active renders to 5 at any time. The implementation distributes incoming work across N lanes using a sequence counter modulo N. Each lane is a lock held as an entry in a Map, with a resolve function as the value. When a lane's current holder finishes, it resolves the next waiter's callback, handing the lock off in order. No Redis involved, no overhead. Just a local queue with bounded depth.
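A simplified sketch of the lane idea, using per-lane promise chains rather than the explicit resolve-function handoff described above (LaneRunner is illustrative, not the real SCHEDULER; the bound is the same: at most one job per lane, no polling):

```typescript
// Illustrative lane-based concurrency limiter: work is spread across N lanes
// with a sequence counter modulo N, and each lane runs its jobs in order.
class LaneRunner {
  private seq = 0;
  private tails = new Map<number, Promise<void>>();

  run<T>(fn: () => Promise<T>, maxConcurrency = 5): Promise<T> {
    const lane = this.seq++ % maxConcurrency;
    const prev = this.tails.get(lane) ?? Promise.resolve();
    const result = prev.then(fn);
    // Advance the lane's tail whether the job succeeds or fails.
    this.tails.set(lane, result.then(() => undefined, () => undefined));
    return result;
  }
}
```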

The combination of createCacheItem and the concurrency limiter means: no duplicated renders, no unbounded CPU usage, and a clean upper bound on memory pressure from inflight work.

Retry Logic and Failure Modes

IR websites are not known for their reliability. PDF hosts rate-limit, return intermittent errors, and some actively block cloud datacenter IP ranges. That last one is a real problem. Enterprise PDF hosting services that IR teams often use have bot mitigation layers that block AWS, GCP, and Azure IP ranges by default.

We retry twice with a growing delay between attempts and a 15-second timeout per fetch:

let pdfBuffer: ArrayBuffer | undefined;

for (let attempt = 0; attempt <= THUMBNAIL_MAX_RETRIES; attempt++) {
  if (attempt > 0) {
    await new Promise((r) => setTimeout(r, THUMBNAIL_RETRY_DELAY * attempt));
  }
  try {
    const response = await fetchWithTimeout(pdfUrl, undefined, {
      timeout: THUMBNAIL_FETCH_TIMEOUT,
    });
    if (response.ok) {
      pdfBuffer = await response.arrayBuffer();
      break;
    }
  } catch {
    // timeout or network error: treat it like a bad response and retry
  }
}

if (!pdfBuffer) {
  throw new Error('PDF fetch failed after all retries');
}

// MAX_RETRIES = 2, RETRY_DELAY = 1000ms, FETCH_TIMEOUT = 15000ms

If all retries are exhausted, we throw. The controller catches it and returns a 204 with Cache-Control: no-store:

return new Response(null, {
  status: 204,
  headers: { 'Cache-Control': 'no-store' },
});

The no-store header matters. We don't want any cache, browser or CDN, to hold onto this failure. On the next visit the browser tries again. Maybe the IP block has rotated. Maybe the server came back up. Failures are explicitly treated as temporary.

On the frontend, while the thumbnail is loading, the card shows the presentation title and quarter badge as a placeholder. When the image loads, those get hidden behind it since the cover slide already shows the same information. If a 204 comes back, the placeholder stays. No broken images, no empty boxes, just the text that was always going to be there anyway.

The Full Picture

The thumbnail endpoint sits inside a larger document system for the platform. The same module handles earnings call transcripts with a fallback audio resolver for recent calls where the recording isn't indexed yet, SEC filings from EDGAR with CIK-based lookups, and on-demand PDF conversion for SEC filings that are HTML or XML rather than native PDFs.

That last one is worth a mention. SEC filings are often not PDFs at all. They're HTML or XBRL-tagged XML documents. The browser can't render them inline in a useful way. So for those filings, we run them through Puppeteer: fetch the HTML from SEC's archive servers with the required User-Agent identifier, render it headless, return a PDF. This is actually where Puppeteer earns its place in the stack, because there is no other sane way to render structured XBRL markup into something readable. The SEC enforces a 10-request-per-second rate limit that they document publicly. We respect it with a scheduler-enforced concurrency window: max 10 concurrent requests within a 1-second sliding window.
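A sliding-window limiter along those lines can be sketched like this (SlidingWindowLimiter is an illustrative name, not the platform's scheduler; it caps how many requests may start inside any rolling window):

```typescript
// Illustrative sliding-window limiter: at most `limit` starts per `windowMs`.
class SlidingWindowLimiter {
  private starts: number[] = [];

  constructor(
    private limit = 10,
    private windowMs = 1000,
  ) {}

  async acquire(): Promise<void> {
    for (;;) {
      const now = Date.now();
      // Drop start timestamps that have slid out of the window.
      this.starts = this.starts.filter((t) => now - t < this.windowMs);
      if (this.starts.length < this.limit) {
        this.starts.push(now);
        return;
      }
      // Sleep until the oldest start leaves the window, then re-check.
      const wait = this.windowMs - (now - this.starts[0]);
      await new Promise((r) => setTimeout(r, wait));
    }
  }
}
```

Each SEC fetch would call acquire() before hitting the archive servers, so bursts queue up instead of tripping the published 10-requests-per-second limit.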

None of the raw source URLs are ever exposed to the frontend. Every PDF, every thumbnail, every transcript is served through the platform's own API. The presentation list returns proxy endpoints. The thumbnail URL is just /stocks/:symbol/documents/presentations/:index/thumbnail. The index maps to a position in the cached response from the data provider, valid for 1 hour. This keeps every external dependency entirely behind our own interface.

What I Took Away From This

The thing that stands out looking back is how few lines of code the actual rendering takes. The challenging part wasn't the MuPDF integration. It was designing the cache correctly so the render happens exactly once, handling every way that "exactly once" can be violated (concurrent requests, cache expiry, cross-stock deduplication), and making sure failures degrade without showing anything broken to the user.

The deduplication and the background upload were the two decisions that felt the most deliberate. Promise-sharing means you never do redundant work under concurrency. Background upload means first-generation latency is just fetch-plus-render, never fetch-plus-render-plus-S3-upload. Those two together mean the system behaves consistently whether it's handling one request or a burst of twenty.

The thumbnails are live. Shareholder presentations now load with cover slide previews. They look good.
