<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>rust on Abhishek Murthy</title><link>https://abhishekmurthy.com/tags/rust/</link><description>Recent content in rust on Abhishek Murthy</description><generator>Hugo -- gohugo.io</generator><language>en-us</language><lastBuildDate>Sat, 23 May 2026 21:22:16 -0400</lastBuildDate><atom:link href="https://abhishekmurthy.com/tags/rust/index.xml" rel="self" type="application/rss+xml"/><item><title>Durable OCR pipelines with Restate, Rust, and agent workers</title><link>https://abhishekmurthy.com/posts/building-pdf-extraction-pipelines-that-survive-real-documents/</link><pubDate>Sat, 23 May 2026 21:22:16 -0400</pubDate><guid>https://abhishekmurthy.com/posts/building-pdf-extraction-pipelines-that-survive-real-documents/</guid><description>Most document extraction systems start life as a three-line demo:
text = pdf.extract_text()
result = model.extract(schema, text)
save(result)
That demo is useful because it proves the shape of the product. It is also where the architecture usually starts lying to you.
The real system is not &amp;ldquo;PDF in, JSON out&amp;rdquo;. It is a distributed rendering, OCR, indexing, retrieval, agent execution, validation, and evaluation pipeline with unreliable inputs at every layer. The failure modes are not just &amp;ldquo;the model got the answer wrong&amp;rdquo;.</description></item><item><title>Building a search engine that fits in your L3 cache</title><link>https://abhishekmurthy.com/posts/search-engine-fits-in-l3-cache/</link><pubDate>Mon, 24 Nov 2025 16:05:24 -0500</pubDate><guid>https://abhishekmurthy.com/posts/search-engine-fits-in-l3-cache/</guid><description>The first version of my search engine was slower after I added an index.
That sounds backwards, but it is a real failure mode. A bad index can turn a simple sequential scan into a pile of cache misses, hash lookups, tiny heap allocations, branchy score calculations, and random memory walks. The CPU stops doing search and starts waiting for memory.
The target I wanted was intentionally unreasonable: a local search engine for a few hundred thousand short technical records that could answer ranked queries inside an interactive UI budget, while keeping the hot path small enough to stay friendly to an L3 cache.</description></item></channel></rss>