Analysis of https://fly.io/blog/feed.xml

Feed fetched in 244 ms.
Content type is text/xml.
Feed is 726,041 characters long.
Feed has an ETag of W/"69b3d033-b1987".
Feed has a last modified date of Fri, 13 Mar 2026 08:52:03 GMT.
Feed is well-formed XML.
Warning Feed has no styling.
This is an Atom feed.
Feed title: The Fly Blog
Error Feed self link: https://fly.io/blog/ does not match feed URL: https://fly.io/blog/feed.xml.
Warning Feed is missing an image.
Feed has 40 items.
First item published on 2026-03-12T19:30:19.000Z
Last item published on 2024-02-12T00:00:00.000Z
All items have published dates.
Newest item was published on 2026-03-12T19:30:19.000Z.
Info Feed's Last-Modified date is newer than the newest item's published date (2026-03-13T08:52:03.000Z > 2026-03-12T19:30:19.000Z).
Home page URL: https://fly.io/blog/
Error Home page does not have any feed discovery link in the <head>.
Home page has a link to the feed in the <body>.

Formatted XML
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
    <title>The Fly Blog</title>
    <subtitle>News, tips, and tricks from the team at Fly</subtitle>
    <id>https://fly.io/blog/</id>
    <link href="https://fly.io/blog/"/>
    <link href="https://fly.io/blog/" rel="self"/>
    <updated>2026-03-10T00:00:00+00:00</updated>
    <author>
        <name>Fly</name>
    </author>
    <entry>
        <title>Unfortunately, Sprites Now Speak MCP</title>
        <link rel="alternate" href="https://fly.io/blog/unfortunately-mcp/"/>
        <id>https://fly.io/blog/unfortunately-mcp/</id>
        <published>2026-03-10T00:00:00+00:00</published>
        <updated>2026-03-12T19:30:19+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/unfortunately-mcp/assets/whack.webp"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;Sprites are disposable cloud computers. They appear instantly, always include durable filesystems, and cost practically nothing when idle. They’re the best and safest place on the Internet to run agents and we want you to &lt;a href="https://sprites.dev/" title=""&gt;create dozens of them&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Sprites are a place to run agents; the first thing you should think to do with a new Sprite is to type &lt;code&gt;claude&lt;/code&gt; (or &lt;code&gt;gemini&lt;/code&gt; or &lt;code&gt;codex&lt;/code&gt;). We&amp;rsquo;ve put a &lt;a href='https://fly.io/blog/design-and-implementation/' title=''&gt;lot of effort&lt;/a&gt; into making sure coding agents feel safe and happy when they&amp;rsquo;re on Sprites, because, to (probably) quote John von Neumann, &amp;ldquo;happy agents are productive agents.&amp;rdquo;&lt;/p&gt;

&lt;p&gt;What&amp;rsquo;s less obvious about Sprites is that they&amp;rsquo;re great tools &lt;em&gt;for&lt;/em&gt; agents. Want three different versions of a new feature? A test environment? An ensemble of cooperating services? It&amp;rsquo;s super handy to be able to start your prompts, &amp;ldquo;&lt;code&gt;On a new Sprite, do…&lt;/code&gt;&amp;rdquo;.&lt;/p&gt;

&lt;p&gt;The Sprites API is simple, discoverable, and designed for this use case. It&amp;rsquo;s just a question of how you choose to give your agent access to it. And now there&amp;rsquo;s one more way: with MCP.&lt;/p&gt;
&lt;h2 id='we-did-this-because-your-agents-suck' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#we-did-this-because-your-agents-suck' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;We Did This Because Your Agents Suck&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;This feature works well, but we&amp;rsquo;re less than enthusiastic about it. Not as product developers, mind you. It&amp;rsquo;s a good product! Just as aesthetes.&lt;/p&gt;

&lt;p&gt;In 2026, MCP is the wrong way to extend the capabilities of an agent. The emerging Right Way to do this is command line tools and discoverable APIs.&lt;/p&gt;

&lt;p&gt;When we plug an MCP into your agent, we&amp;rsquo;re filling its context with tool descriptions, many of which you&amp;rsquo;ll probably never use. Really, all your agent should need is a short sentence, like &amp;ldquo;&lt;code&gt;Use this skill whenever users want to create a new VM to run a task on, or to manage the VMs already available.&lt;/code&gt;&amp;rdquo; The skill should take care of the rest.&lt;/p&gt;

&lt;p&gt;CLI-driven agent skills are efficient because they reveal capabilities progressively. You can do this with CLI subcommands, like &lt;code&gt;sprite checkpoint&lt;/code&gt; and &lt;code&gt;sprite exec&lt;/code&gt;, or with API endpoints and subpaths. Good agent harnesses are uncanny at quickly working out how to use these things.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;You &lt;em&gt;are&lt;/em&gt; using Playwright, right? “Make sure this web application actually works before you tell me you’re done”?&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Take &lt;a href='https://playwright.dev/' title=''&gt;Playwright, the industry-standard browser automation tool&lt;/a&gt;. Ask &lt;code&gt;claude&lt;/code&gt; to install Playwright and Chrome and there&amp;rsquo;s a coinflip chance it sets up the MCP server. But notice that when the coin comes up tails, Playwright still works. &lt;code&gt;claude&lt;/code&gt; just drives it by writing little scripts. This is good! The models already know how to write little scripts without using up context.&lt;/p&gt;

&lt;p&gt;And there&amp;rsquo;s more at stake than just efficiency. Cramming your context full of MCP tool descriptions is a way of signaling to the model that those tools are important to you. But not every Sprite command is equally important in every setting. If you&amp;rsquo;re not using network policies, you don&amp;rsquo;t need &lt;code&gt;gemini&lt;/code&gt; to waste a bunch of time setting them up for you.&lt;/p&gt;

&lt;p&gt;Skills and APIs are the best way to drive Sprites. But to make that work, you need an agent that can run shell commands for itself. So you&amp;rsquo;ll want to reach for MCP sessions when you&amp;rsquo;re stuck with an agent that can&amp;rsquo;t run commands. Thankfully, most of us aren&amp;rsquo;t using those kinds of agents anymore. In &lt;code&gt;claude&lt;/code&gt;, &lt;code&gt;gemini&lt;/code&gt;, or &lt;code&gt;codex&lt;/code&gt;, you should just show your agent the &lt;code&gt;sprite&lt;/code&gt; CLI and let it impress you.&lt;/p&gt;
&lt;h2 id='sprites-dev-mcp' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#sprites-dev-mcp' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;sprites.dev/mcp&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Plug this URL into Claude Desktop, or any other agent tool that speaks MCP. You&amp;rsquo;ll authenticate to one of your Fly.io organizations, and your agent will speak Sprites.&lt;/p&gt;

&lt;p&gt;Then:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;On a new Sprite, take this repository and reproduce this bug from issues/913, capturing logs.&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;On a new Sprite, benchmark this function across 1000 runs and summarize the results.&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;On a new Sprite, update all the dependencies on this project to their newest versions and test that everything works.&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;On 3 new Sprites, change this service to use each of these 3 query libraries, and use HTTP to test latency.&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;On a new Sprite, run this code with bpfwatch and show me what files it touches.&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;On a new Sprite, run a load generator against this endpoint for 60 seconds and report the results.&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;On a new Sprite, download this dataset and give me a Jupyter notebook to explore it in.&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;On a new Sprite, set up a webhook receiver and render a real-time web report of all the payloads it receives.&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;I don&amp;rsquo;t know. You know your projects better than we do. Whatever. Sometimes you want a clean, cheap, disposable computer (or five of them). That&amp;rsquo;s now an available feature of all your prompts. Find ways to apply it to your project, and we think you&amp;rsquo;ll end up wondering where Sprites have been all your life.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;Some of you are thinking to yourself: “this feature is going to result in robots ruining my life”. We agree. So we’ve built in guardrails. When you authenticate, giving your agent access to a single specific organization on your Fly.io account, we’ll let you scope down the MCP session. You can cap the number of Sprites our MCP will create for you, and you can give them name prefixes so you can easily spot the robots and disassemble them.&lt;/p&gt;
&lt;/div&gt;&lt;h2 id='fuck-stateless-sandboxes' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#fuck-stateless-sandboxes' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Fuck Stateless Sandboxes&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;We&amp;rsquo;ll keep saying this until our faces turn blue: the industry is stuck on &amp;ldquo;sandboxes&amp;rdquo; as a way of letting agents run code, and sandboxes aren&amp;rsquo;t good enough anymore. What agents want is real computers, with real filesystems, connected to real networks, and there&amp;rsquo;s no technical reason not to give them some.&lt;/p&gt;

&lt;p&gt;&lt;a href='https://fly.io/blog/code-and-let-live/' title=''&gt;We designed Sprites so that you can fearlessly create whole bunches of them&lt;/a&gt;. They&amp;rsquo;re responsive enough to host web apps for your team, but they idle in a sleeping state where they cost virtually nothing. Everybody at Fly.io that uses them ends up with 20 or 30, just hanging around.&lt;/p&gt;

&lt;p&gt;We think you&amp;rsquo;ll do better work when you can pull in as many computers as you need to solve problems. If it takes an MCP server for us to get you to do that, so be it.&lt;/p&gt;</content>
    </entry>
    <entry>
        <title>Litestream Writable VFS</title>
        <link rel="alternate" href="https://fly.io/blog/litestream-writable-vfs/"/>
        <id>https://fly.io/blog/litestream-writable-vfs/</id>
        <published>2026-01-29T00:00:00+00:00</published>
        <updated>2026-02-04T23:24:24+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/litestream-writable-vfs/assets/litestream-writable-vfs.jpg"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;&lt;strong class="font-semibold text-navy-950"&gt;I’m Ben Johnson, and I work on Litestream at Fly.io. Litestream is the missing backup/restore system for SQLite. It’s free, open-source software that should run anywhere, and&lt;/strong&gt; &lt;a href="/blog/litestream-v050-is-here/" title=""&gt;&lt;strong class="font-semibold text-navy-950"&gt;you can read more about it here&lt;/strong&gt;&lt;/a&gt;&lt;strong class="font-semibold text-navy-950"&gt;.&lt;/strong&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Each time we write about it, we get a little bit better at golfing down a description of what Litestream is. Here goes: Litestream is a Unix-y tool for keeping a SQLite database synchronized with S3-style object storage. It&amp;rsquo;s a way of getting the speed and simplicity wins of SQLite without exposing yourself to catastrophic data loss. Your app doesn&amp;rsquo;t necessarily even need to know it&amp;rsquo;s there; you can just run it as a tool in the background.&lt;/p&gt;

&lt;p&gt;It&amp;rsquo;s been a busy couple weeks!&lt;/p&gt;

&lt;p&gt;We recently &lt;a href='/blog/design-and-implementation/' title=''&gt;unveiled Sprites&lt;/a&gt;. If you don&amp;rsquo;t know what Sprites are, you should just &lt;a href='https://sprites.dev/' title=''&gt;go check them out&lt;/a&gt;. They&amp;rsquo;re one of the coolest things we&amp;rsquo;ve ever shipped. I won&amp;rsquo;t waste any more time selling them to you. Just, Sprites are a big deal, and so it&amp;rsquo;s a big deal to me that Litestream is a load-bearing component for them.&lt;/p&gt;

&lt;p&gt;Sprites rely directly on Litestream in two big ways.&lt;/p&gt;

&lt;p&gt;First, Litestream SQLite is the core of our global Sprites orchestrator. Unlike our flagship Fly Machines product, which relies on a centralized Postgres cluster, our Elixir Sprites orchestrator runs directly off S3-compatible object storage. Every organization enrolled in Sprites gets its own SQLite database, synchronized by Litestream.&lt;/p&gt;

&lt;p&gt;This is a fun design. It takes advantage of the &amp;ldquo;many SQLite databases&amp;rdquo; pattern, which is under-appreciated. It&amp;rsquo;s got nice scaling characteristics. Keeping that Postgres cluster happy as Fly.io grew has been a major engineering challenge.&lt;/p&gt;

&lt;p&gt;But as far as Litestream is concerned, the orchestrator is boring, and so that&amp;rsquo;s all I&amp;rsquo;ve got to say about it. The second way Sprites use Litestream is much more interesting.&lt;/p&gt;

&lt;p&gt;Litestream is built directly into the disk storage stack that runs on every Sprite.&lt;/p&gt;

&lt;p&gt;Sprites launch in under a second, and every one of them boots up with 100GB of durable storage. That&amp;rsquo;s a tricky bit of engineering. We&amp;rsquo;re able to do this because the root of storage for Sprites is S3-compatible object storage, and we&amp;rsquo;re able to make it fast by keeping a database of in-use storage blocks that takes advantage of attached NVMe as a read-through cache. The system that does this is JuiceFS, and the database — let&amp;rsquo;s call it &amp;ldquo;the block map&amp;rdquo; — is a rewritten metadata store, based (you guessed it) on BoltDB.&lt;/p&gt;

&lt;p&gt;I kid! It&amp;rsquo;s Litestream SQLite, of course.&lt;/p&gt;
&lt;h2 id='sprite-storage-is-fussy' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#sprite-storage-is-fussy' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Sprite Storage Is Fussy&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Everything in a Sprite is designed to come up fast.&lt;/p&gt;

&lt;p&gt;If the Fly Machine underneath a Sprite bounces, we might need to reconstitute the block map from object storage. Block maps aren&amp;rsquo;t huge, but they&amp;rsquo;re not tiny; maybe low tens of megabytes worst case.&lt;/p&gt;

&lt;p&gt;The thing is, this is happening while the Sprite boots back up. To put that in perspective: a boot can happen in response to an incoming web request, which means we have to finish fast enough to generate a timely response to that request. The time budget is small.&lt;/p&gt;

&lt;p&gt;To make this even faster, we are integrating the Litestream VFS to improve start times. The VFS is a dynamic library you load into your app. Once you do, you can do stuff like this:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative bash"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-duvcwc7p"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-duvcwc7p"&gt;sqlite&amp;gt; .open file:///my.db?vfs&lt;span class="o"&gt;=&lt;/span&gt;litestream
sqlite&amp;gt; PRAGMA litestream_time &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'5 minutes ago'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
sqlite&amp;gt; SELECT &lt;span class="k"&gt;*&lt;/span&gt; FROM sandwich_ratings ORDER BY RANDOM&lt;span class="o"&gt;()&lt;/span&gt; LIMIT 3 &lt;span class="p"&gt;;&lt;/span&gt; 
22|Veggie Delight|New York|4
30|Meatball|Los Angeles|5
168|Chicken Shawarma Wrap|Detroit|5
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Litestream VFS lets us run point-in-time SQLite queries hot off object storage blobs, answering queries before we&amp;rsquo;ve downloaded the database.&lt;/p&gt;

&lt;p&gt;This is good, but it&amp;rsquo;s not perfect. We had two problems:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;We could only read, not write. People write to Sprite disks. The storage stack needs to write, right away.
&lt;/li&gt;&lt;li&gt;Running a query off object storage is a godsend in a cold start where we have no other alternative besides downloading the whole database, but it&amp;rsquo;s not fast enough for steady state.
&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;These are fun problems. Here&amp;rsquo;s our first cut at solving them.&lt;/p&gt;
&lt;h2 id='writable-vfs' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#writable-vfs' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Writable VFS&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;The first thing we&amp;rsquo;ve done is made the VFS optionally read-write. This feature is pretty subtle; it&amp;rsquo;s interesting, but it&amp;rsquo;s not as general-purpose as it might look. Let me explain how it works, and then explain why it works this way.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;Keep in mind as you read that this section is about the VFS in particular. Obviously, normal SQLite databases using Litestream the normal way are writable.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The VFS works by keeping an index of &lt;code&gt;(file, offset, size)&lt;/code&gt; for every page of the database in object storage; the data comprising the index is stored &lt;a href='https://github.com/superfly/ltx' title=''&gt;in LTX files&lt;/a&gt;, so we can reconstitute it quickly when the VFS starts, and lookups are heavily cached. When we queried &lt;code&gt;sandwich_ratings&lt;/code&gt; earlier, our VFS library intercepted the SQLite read method, looked up the requested page in the index, fetched it, and cached it.&lt;/p&gt;
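&lt;p&gt;As a rough sketch of that read path (hypothetical names and a toy fetch function, not Litestream&amp;rsquo;s actual internals):&lt;/p&gt;

```python
# Illustrative sketch of the VFS read path: an index mapping database
# pages to (file, offset, size) locations in object storage, plus a
# read cache. All names here are hypothetical, not Litestream's API.

class PageIndex:
    def __init__(self, fetch):
        self.index = {}     # page number -> (file, offset, size)
        self.cache = {}     # page number -> bytes already fetched
        self.fetch = fetch  # callable(file, offset, size) -> bytes

    def add(self, page, file, offset, size):
        self.index[page] = (file, offset, size)

    def read(self, page):
        # Serve from cache when possible; otherwise look up the page's
        # location and fetch just that byte range from object storage.
        if page in self.cache:
            return self.cache[page]
        file, offset, size = self.index[page]
        data = self.fetch(file, offset, size)
        self.cache[page] = data
        return data

# Toy "object storage": one blob holding two 4 KB pages.
blob = bytes(4096) + bytes([1]) * 4096
idx = PageIndex(lambda f, off, size: blob[off:off + size])
idx.add(1, "pages.ltx", 0, 4096)
idx.add(2, "pages.ltx", 4096, 4096)
print(len(idx.read(2)))  # 4096
```

&lt;p&gt;The real index is rebuilt from LTX files at startup and fetched pages are cached aggressively, but the lookup shape is the same: page number in, byte range out.&lt;/p&gt;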

&lt;p&gt;This works great for reads. Writes are harder.&lt;/p&gt;

&lt;p&gt;Behind the scenes in read-only mode, Litestream polls, so that we can detect new LTX files created by remote writers to the database. This supports a handy use case where we&amp;rsquo;re running tests or doing slow analytical queries of databases that need to stay fast in prod.&lt;/p&gt;

&lt;p&gt;In write mode, we don&amp;rsquo;t allow multiple writers, because multiple-writer distributed SQLite databases are the &lt;a href='https://hellraiser.fandom.com/wiki/Lament_Configuration' title=''&gt;Lament Configuration&lt;/a&gt; and we are not explorers over great vistas of pain. So the VFS in write-mode disables polling. We assume a single writer, and no additional backups to watch.&lt;/p&gt;

&lt;p&gt;Next, we buffer. Writes go to a local temporary buffer (&amp;ldquo;the write buffer&amp;rdquo;). Every second or so (or on clean shutdown), we sync the write buffer with object storage. Nothing written through the VFS is truly durable until that sync happens.&lt;/p&gt;
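&lt;p&gt;A minimal sketch of that buffering scheme (illustrative names and intervals, not Litestream&amp;rsquo;s actual code):&lt;/p&gt;

```python
# Sketch of the buffered write path described above: writes land in a
# local buffer and are flushed to object storage periodically or on
# clean shutdown. Names and intervals are illustrative.
import time

class WriteBuffer:
    def __init__(self, upload, interval=1.0):
        self.upload = upload      # callable taking a dict of page -> bytes
        self.interval = interval  # seconds between syncs
        self.pending = {}         # dirty pages since the last sync
        self.last_sync = time.monotonic()

    def write(self, page, data):
        # Not durable yet: the page only lives in the local buffer.
        self.pending[page] = data
        if time.monotonic() - self.last_sync >= self.interval:
            self.sync()

    def sync(self):
        # Push all dirty pages to object storage in one batch, then
        # reset the buffer. Durability starts only after this returns.
        if self.pending:
            self.upload(dict(self.pending))
            self.pending.clear()
        self.last_sync = time.monotonic()

    def close(self):
        # Clean shutdown flushes whatever is left.
        self.sync()

synced = []
buf = WriteBuffer(synced.append, interval=9999)  # timer flush disabled for demo
buf.write(1, b"hello")
buf.write(2, b"world")
buf.close()
print(len(synced))  # 1: both writes landed in a single batched sync
```

&lt;p&gt;The demo disables the timer, so the only flush is the clean-shutdown one; in the real system the sync also fires every second or so.&lt;/p&gt;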
&lt;div class="right-sidenote"&gt;&lt;p&gt;Most storage block maps are much smaller than this, but still.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Now, remember the use case we&amp;rsquo;re looking to support here. A Sprite is cold-starting and its storage stack needs to serve writes, milliseconds after booting, without having a full copy of the 10MB block map. This writeable VFS mode lets us do that.&lt;/p&gt;

&lt;p&gt;Critically, we support that use case only up to the same durability requirements that a Sprite already has. All storage on a Sprite shares this &amp;ldquo;eventual durability&amp;rdquo; property, so the terms of the VFS write make sense here. They probably don&amp;rsquo;t make sense for your application. But if for some reason they do, have at it! To enable writes with Litestream VFS, just set the &lt;code&gt;LITESTREAM_WRITE_ENABLED&lt;/code&gt; environment variable to &lt;code&gt;&amp;quot;true&amp;quot;&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src="/blog/litestream-writable-vfs/assets/write-path.png" /&gt;&lt;/p&gt;
&lt;h2 id='hydration' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#hydration' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Hydration&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;The Sprite storage stack uses SQLite in VFS mode. In our original VFS design, most data is kept in S3. Again: fine at cold start, not so fine in steady state.&lt;/p&gt;

&lt;p&gt;To solve this problem, we shoplifted a trick from &lt;a href='https://docs.kernel.org/admin-guide/device-mapper/dm-clone.html' title=''&gt;systems like dm-clone&lt;/a&gt;: background hydration. In hydration designs, we serve queries remotely while running a loop to pull the whole database.  When you start the VFS with the &lt;code&gt;LITESTREAM_HYDRATION_PATH&lt;/code&gt; environment variable set, we&amp;rsquo;ll hydrate to that file.&lt;/p&gt;

&lt;p&gt;Hydration takes advantage of &lt;a href='https://fly.io/blog/litestream-revamped#point-in-time-restores-but-fast' title=''&gt;LTX compaction&lt;/a&gt;, writing only the latest versions of each page. Reads don&amp;rsquo;t block on hydration; we serve them from object storage immediately, and switch over to the hydration file when it&amp;rsquo;s ready.&lt;/p&gt;

&lt;p&gt;&lt;img src="/blog/litestream-writable-vfs/assets/timeline.png" /&gt;&lt;/p&gt;
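&lt;p&gt;In sketch form (purely illustrative, with a dict standing in for object storage; none of this is Litestream&amp;rsquo;s actual code):&lt;/p&gt;

```python
# Sketch of background hydration: serve reads from object storage while
# a loop copies the latest version of every page into a local file, then
# switch reads over. Purely illustrative, not Litestream's internals.

class HydratingReader:
    def __init__(self, remote_pages):
        self.remote = remote_pages  # page -> bytes; stand-in for S3
        self.local = {}             # the "hydration file"
        self.hydrated = False

    def hydrate_step(self, page):
        # Background loop body: pull one page into the local copy.
        self.local[page] = self.remote[page]
        if len(self.local) == len(self.remote):
            self.hydrated = True    # switch-over point

    def read(self, page):
        # Reads never block on hydration: fall back to object storage
        # until the local copy is complete.
        if self.hydrated:
            return self.local[page]
        return self.remote[page]

r = HydratingReader({1: b"a", 2: b"b", 3: b"c"})
print(r.read(2), r.hydrated)  # b'b' False  (served remotely)
for page in (1, 2, 3):
    r.hydrate_step(page)
print(r.read(2), r.hydrated)  # b'b' True  (served locally)
```

&lt;p&gt;Reads fall back to object storage until the local copy is complete, which is the switch-over behavior described above.&lt;/p&gt;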

&lt;p&gt;As for the hydration file? It&amp;rsquo;s simply a full copy of your database. It&amp;rsquo;s the same thing you get if you run &lt;code&gt;litestream restore&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Because this is designed for environments like Sprites, which bounce a lot, we write the database to a temporary file. We can&amp;rsquo;t trust that the database is using the latest state every time we start up, not without doing a full restore, so we just chuck the hydration file when we exit the VFS. That behavior is baked into the VFS right now. This feature&amp;rsquo;s got what Sprites need, but again, maybe not what your app wants.&lt;/p&gt;
&lt;h2 id='putting-it-all-together' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#putting-it-all-together' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Putting It All Together&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;This is a post about two relatively big moves we&amp;rsquo;ve made with our open-source Litestream project, but the features are narrowly scoped to the kinds of problems our storage stack has. If you think you can get use out of them, I&amp;rsquo;m thrilled, and I hope you&amp;rsquo;ll tell me about it.&lt;/p&gt;

&lt;p&gt;For ordinary read/write workloads, you don&amp;rsquo;t need any of this mechanism. Litestream works fine without the VFS, with unmodified applications, just running as a sidecar alongside your application. The whole point of that configuration is to efficiently keep up with writes; that&amp;rsquo;s easy when you know you have the whole database to work with when writes happen.&lt;/p&gt;

&lt;p&gt;But this whole thing is, to me, a valuable case study in how Litestream can get used in a relatively complicated and demanding problem domain. Sprites are very cool, and it&amp;rsquo;s satisfying to know that every disk write that happens on a Sprite is running through Litestream.&lt;/p&gt;</content>
    </entry>
    <entry>
        <title>The Design &amp; Implementation of Sprites</title>
        <link rel="alternate" href="https://fly.io/blog/design-and-implementation/"/>
        <id>https://fly.io/blog/design-and-implementation/</id>
        <published>2026-01-14T00:00:00+00:00</published>
        <updated>2026-01-16T20:23:36+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/design-and-implementation/assets/starry-containers.webp"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;We’re Fly.io, and this is the place in the post where we’d normally tell you that our job is to &lt;a href="https://fly.io/blog/docker-without-docker/" title=""&gt;take your containers and run them on our own hardware&lt;/a&gt; all around the world. But last week, we &lt;a href="https://sprites.dev/" title=""&gt;launched Sprites&lt;/a&gt;, and they don’t work that way at all. Sprites are something new: Docker without Docker without Docker. This post is about how they work.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Replacement-level homeowners buy boxes of pens and stick them in &amp;ldquo;the pen drawer&amp;rdquo;. What the elites know: you have to think adversarially about pens. &amp;ldquo;The purpose of a system is what it does&amp;rdquo;; a household&amp;rsquo;s is to uniformly distribute pens.  Months from now, the drawer will be empty, no matter how many pens you stockpile. Instead, scatter pens every place you could possibly think to look for one — drawers, ledges, desks. Any time anybody needs a pen, several are at hand, in exactly the first place they look.&lt;/p&gt;

&lt;p&gt;This is the best way I&amp;rsquo;ve found to articulate the idea of &lt;a href='https://sprites.dev/' title=''&gt;Sprites&lt;/a&gt;, the platform we just launched at Fly.io. Sprites are ball-point disposable computers. Whatever mark you mean to make, we&amp;rsquo;ve rigged it so you&amp;rsquo;re never more than a second or two away from having a Sprite to do it with.&lt;/p&gt;

&lt;p&gt;Sprites are Linux virtual machines. You get root. They &lt;code&gt;create&lt;/code&gt; in just a second or two: so fast, the experience of creating and shelling into one is identical to SSH&amp;#39;ing into a machine that already exists. Sprites all have a 100GB durable root filesystem. They put themselves to sleep automatically when inactive, and cost practically nothing while asleep.&lt;/p&gt;

&lt;p&gt;As a result, I barely feel the need to name my Sprites. Sometimes I&amp;rsquo;ll just type &lt;code&gt;sprite create dkjsdjk&lt;/code&gt; and start some task. People at Fly.io who use Sprites have dozens hanging around.&lt;/p&gt;

&lt;p&gt;There aren&amp;rsquo;t yet many things in cloud computing that have the exact shape Sprites do:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Instant creation
&lt;/li&gt;&lt;li&gt;No time limits
&lt;/li&gt;&lt;li&gt;Persistent disk
&lt;/li&gt;&lt;li&gt;Auto-sleep to a cheap inactive state
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;This is a post about how we managed to get this working. We created a new orchestration stack that undoes some of the core decisions we made for &lt;a href='https://fly.io/machines' title=''&gt;Fly Machines&lt;/a&gt;, our flagship product. Turns out, these new decisions make Sprites drastically easier for us to scale and manage. We&amp;rsquo;re pretty psyched.&lt;/p&gt;

&lt;p&gt;Lucky for me, there happen to be three &lt;code&gt;big decisions&lt;/code&gt; we made that get you 90% of the way from Fly Machines to Sprites, which makes this an easy post to write. So, without further ado:&lt;/p&gt;
&lt;h2 id='decision-1-no-more-container-images' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#decision-1-no-more-container-images' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Decision #1: No More Container Images&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;This is the easiest decision to explain.&lt;/p&gt;

&lt;p&gt;Fly Machines are approximately &lt;a href='https://fly.io/blog/docker-without-docker/' title=''&gt;OCI containers repackaged as KVM micro-VMs&lt;/a&gt;. They have the ergonomics of Docker but the isolation and security of an EC2 instance. We love them very much and they&amp;rsquo;re clearly the wrong basis for a ball-point disposable cloud computer.&lt;/p&gt;

&lt;p&gt;The &amp;ldquo;one weird trick&amp;rdquo; of Fly Machines is that they &lt;code&gt;start&lt;/code&gt; and &lt;code&gt;stop&lt;/code&gt; instantly, fast enough that they can wake in time to handle an incoming HTTP request. But they can only do that if you&amp;rsquo;ve already &lt;code&gt;created&lt;/code&gt; them. You have to preallocate. &lt;code&gt;Creating&lt;/code&gt; a Fly Machine can take over a minute. What you&amp;rsquo;re supposed to do is to create a whole bunch of them and &lt;code&gt;stop&lt;/code&gt; them so they&amp;rsquo;re ready when you need them. But for Sprites, we need &lt;code&gt;create&lt;/code&gt; to be so fast it feels like they&amp;rsquo;re already there waiting for you.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;We only murdered user containers because we wanted them dead.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Most of what&amp;rsquo;s slow about &lt;code&gt;creating&lt;/code&gt; a Fly Machine is containers. I say this with affection: your containers are crazier than a soup sandwich. Huge and fussy, they take forever to &lt;a href='https://fly.io/blog/docker-without-docker/' title=''&gt;pull and unpack&lt;/a&gt;. The regional locality sucks; &lt;code&gt;create&lt;/code&gt; a Fly Machine in São Paulo on &lt;code&gt;gru-3838&lt;/code&gt;, and a &lt;code&gt;create&lt;/code&gt; on &lt;code&gt;gru-d795&lt;/code&gt; is no faster. A &lt;a href='https://community.fly.io/t/global-registry-now-in-production/13723' title=''&gt;truly heartbreaking&lt;/a&gt; amount of &lt;a href='https://community.fly.io/t/faster-more-reliable-remote-image-builds-deploys/25841' title=''&gt;engineering work&lt;/a&gt; has gone into just allowing our OCI registry to &lt;a href='https://www.youtube.com/watch?v=0jD-Rt4_CR8' title=''&gt;keep up&lt;/a&gt; with this system.&lt;/p&gt;

&lt;p&gt;It&amp;rsquo;s a tough job, is all I&amp;rsquo;m saying. Sprites get rid of the user-facing container. Literally: problem solved. Sprites get to do this on easy mode.&lt;/p&gt;

&lt;p&gt;Now, today, under the hood, Sprites are still Fly Machines. But they all run from a standard container. Every physical worker knows exactly what container the next Sprite is going to start with, so it&amp;rsquo;s easy for us to keep pools of &amp;ldquo;empty&amp;rdquo; Sprites standing by. The result: a Sprite &lt;code&gt;create&lt;/code&gt; doesn&amp;rsquo;t have any heavy lifting to do; it&amp;rsquo;s basically just doing the stuff we do when we &lt;code&gt;start&lt;/code&gt; a Fly Machine.&lt;/p&gt;
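&lt;p&gt;A toy sketch of that warm-pool idea (not our actual implementation; every name here is invented): the slow boot happens ahead of time, so a &lt;code&gt;create&lt;/code&gt; just claims a VM that already exists and tops the pool back up.&lt;/p&gt;

```python
# Toy warm-pool allocator: "create" claims a pre-booted VM, so user-visible
# latency is a pool pop, not a cold boot. Illustrative only.
from collections import deque

POOL_TARGET = 4  # empty, pre-booted Sprites each worker keeps on standby

def boot_empty_vm(n):
    # Stand-in for the slow part: booting a VM from the one standard image.
    return f"vm-{n}"

class WarmPool:
    def __init__(self):
        self.pool = deque()
        self.counter = 0
        self.refill()

    def refill(self):
        # A real worker would do this in the background; synchronous here.
        for _ in range(POOL_TARGET - len(self.pool)):
            self.pool.append(boot_empty_vm(self.counter))
            self.counter += 1

    def create(self, name):
        # The user-visible "create": claim a pre-booted VM, attach the new
        # Sprite's identity to it, then top the pool back up.
        if not self.pool:
            self.refill()  # slow path: the pool ran dry
        vm = self.pool.popleft()
        self.refill()
        return {"sprite": name, "vm": vm}
```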
&lt;figure class="post-cta"&gt;
  &lt;figcaption&gt;
    &lt;h1&gt;This all works right now.&lt;/h1&gt;
    &lt;p&gt;You can create a couple dozen Sprites right now if you want. It&amp;rsquo;ll only take a second.&lt;/p&gt;
      &lt;a class="btn btn-lg" href="https://sprites.dev/"&gt;
        Make a Sprite.  &lt;span class='opacity-50'&gt;→&lt;/span&gt;
      &lt;/a&gt;
  &lt;/figcaption&gt;
  &lt;div class="image-container"&gt;
    &lt;img src="/static/images/cta-cat.webp" srcset="/static/images/[email protected] 2x" alt=""&gt;
  &lt;/div&gt;
&lt;/figure&gt;

&lt;h2 id='decision-2-object-storage-for-disks' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#decision-2-object-storage-for-disks' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Decision #2: Object Storage For Disks&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Every Sprite comes with 100GB of durable storage. We&amp;rsquo;re able to do that because the root of storage is S3-compatible object storage.&lt;/p&gt;

&lt;p&gt;You can arrange for 100GB of storage for a Fly Machine. Or 200, or 500. The catch:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You have to ask (with &lt;code&gt;flyctl&lt;/code&gt;); we can&amp;rsquo;t reasonably default it in.
&lt;/li&gt;&lt;li&gt;That storage is NVMe attached to the physical server your Fly Machine is on. 
&lt;/li&gt;&lt;/ul&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;[†] we print a &lt;span style="color: red"&gt;big red warning&lt;/span&gt; about this if you try to make a single-node cluster&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;We designed the storage stack for Fly Machines for Postgres clusters. A multi-replica Postgres cluster gets good mileage out of Fly Volumes. Attached storage is fast, but can &lt;span style="color: red"&gt;lose data†&lt;/span&gt; — if a physical blows up, there&amp;rsquo;s no magic that rescues its stored bits. You&amp;rsquo;re stuck with our last snapshot backup. That&amp;rsquo;s fine for a replicated Postgres! It&amp;rsquo;s part of what Postgres replication is for. But for anything without explicit replication, it&amp;rsquo;s a very sharp edge.&lt;/p&gt;

&lt;p&gt;Worse, from our perspective, is that attached storage anchors workloads to specific physicals. We have lots of reasons to want to move Fly Machines around. Before we  did Fly Volumes, that was as simple as pushing a &amp;ldquo;drain&amp;rdquo; button on a server. Imagine losing a capability like that. It took 3 years to &lt;a href='https://fly.io/blog/machine-migrations/' title=''&gt;get workload migration right&lt;/a&gt; with attached storage, and it&amp;rsquo;s still not &amp;ldquo;easy&amp;rdquo;.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;Object stores are the Internet’s Hoover Dams, the closest things we have to infrastructure megaprojects.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Sprites jettison this model. We still exploit NVMe, but not as the root of storage. Instead, it&amp;rsquo;s a read-through cache for a blob on object storage. S3-compatible object stores are the most trustworthy storage technology we have. I can feel my blood pressure dropping just typing the words &amp;ldquo;Sprites are backed by object storage.&amp;rdquo;&lt;/p&gt;

&lt;p&gt;The implications of this for orchestration are profound. In a real sense, the durable state of a Sprite is simply a URL. Wherever he lays his hat is his home! They migrate (or recover from failed physicals) trivially. It&amp;rsquo;s early days for our internal tooling, but we have so many new degrees of freedom to work with.&lt;/p&gt;

&lt;p&gt;I could easily do another 1500-2000 words here on the Cronenberg film Kurt came up with for the actual storage stack, but because it&amp;rsquo;s in flux, let&amp;rsquo;s keep it simple.&lt;/p&gt;

&lt;p&gt;The Sprite storage stack is organized around the JuiceFS model (in fact, we currently use a very hacked-up JuiceFS, with a rewritten SQLite metadata backend). It works by splitting storage into data (&amp;ldquo;chunks&amp;rdquo;) and metadata (a map of where the &amp;ldquo;chunks&amp;rdquo; are). Data chunks live on object stores; metadata lives in fast local storage.  In our case, that metadata store is &lt;a href='https://litestream.io/' title=''&gt;kept durable with Litestream&lt;/a&gt;. Nothing depends on local storage.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;(our pre-installed Claude Code will checkpoint aggressively for you without asking)&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;This also buys Sprites fast &lt;code&gt;checkpoint&lt;/code&gt; and &lt;code&gt;restore&lt;/code&gt;. Checkpoints are so fast we want you to use them as a basic feature of the system and not as an escape hatch when things go wrong; like a git restore, not a system restore. That works because both &lt;code&gt;checkpoint&lt;/code&gt; and &lt;code&gt;restore&lt;/code&gt; merely shuffle metadata around.&lt;/p&gt;

&lt;p&gt;Our stack sports &lt;a href='https://en.wikipedia.org/wiki/Dm-cache' title=''&gt;a dm-cache-like&lt;/a&gt; feature that takes advantage of attached storage. A Sprite has a sparse 100GB NVMe volume attached to it, which the stack uses to cache chunks to eliminate read amplification. Importantly (I can feel my resting heart rate lowering) nothing in that NVMe volume should matter; stored chunks are immutable and their true state lives on the object store.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;Our preference for object storage goes further than the Sprite storage stack. The global orchestrator for Sprites is an Elixir/Phoenix app that uses object storage as the primary source of metadata for accounts. We then give each account an independent SQLite database, again made durable on object storage with Litestream.&lt;/p&gt;
&lt;/div&gt;&lt;h2 id='decision-3-inside-out-orchestration' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#decision-3-inside-out-orchestration' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Decision #3: Inside-Out Orchestration&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;In the cloud hosting industry, user applications are managed by two separate, yet equally important components: the host, which orchestrates workloads, and the guest, which runs them. Sprites flip that on its head: the most important orchestration and management work happens inside the VM.&lt;/p&gt;

&lt;p&gt;Here&amp;rsquo;s the trick: user code running on a Sprite isn&amp;rsquo;t running in the root namespace. We&amp;rsquo;ve slid a container between you and the kernel. You see an inner environment, managed by a fleet of services running in the root namespace of the VM.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;I wish we’d done Fly Machines this way to begin with. I’m not sure there’s a downside. The inner container allows us to bounce a Sprite without rebooting the whole VM, even on checkpoint restores. I think Fly Machines users could get some mileage out of that feature, too.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;With Sprites, we&amp;rsquo;re pushing this idea as far as we can. The root environment hosts the majority of our orchestration code. When you talk to the global API, chances are you&amp;rsquo;re talking directly to your own VM. Furthermore:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Our storage stack, which handles checkpoint/restore and persistence to object storage, lives there;
&lt;/li&gt;&lt;li&gt;so does the service manager we expose to Sprites, which registers user code that needs to restart when a Sprite bounces;
&lt;/li&gt;&lt;li&gt;same with logs;
&lt;/li&gt;&lt;li&gt;if you bind a socket to &lt;code&gt;*:8080&lt;/code&gt;, we&amp;rsquo;ll make it available outside the Sprite — yep, that&amp;rsquo;s in the root namespace too.
&lt;/li&gt;&lt;/ul&gt;
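&lt;p&gt;Purely as illustration of that last bullet (this is not our actual mechanism): on Linux, a root-namespace service could spot a listener on &lt;code&gt;*:8080&lt;/code&gt; by scanning &lt;code&gt;/proc/net/tcp&lt;/code&gt;, where state &lt;code&gt;0A&lt;/code&gt; means LISTEN. The sketch parses a canned sample rather than the live file.&lt;/p&gt;

```python
# Hypothetical listener detection from /proc/net/tcp-format text. Ports are
# hex in the local_address column; state 0A is TCP LISTEN.
SAMPLE = """\
  sl  local_address rem_address   st
   0: 00000000:1F90 00000000:0000 0A
   1: 0100007F:0016 00000000:0000 0A
   2: 0100007F:A2C4 0100007F:1F90 01
"""

def listening_ports(proc_net_tcp):
    ports = set()
    for line in proc_net_tcp.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) == 4 and fields[3] == "0A":  # 0A == LISTEN
            _addr, port_hex = fields[1].split(":")
            ports.add(int(port_hex, 16))            # 1F90 hex is 8080
    return ports
```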

&lt;p&gt;Platform developers at Fly.io know how much easier it can be to hack on &lt;code&gt;init&lt;/code&gt; (inside the container) than things &lt;a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''&gt;like &lt;code&gt;flyd&lt;/code&gt;&lt;/a&gt;, the Fly Machines orchestrator that runs on  the host. Changes to Sprites don&amp;rsquo;t restart host components or muck with global state. The blast radius is just new VMs that pick up the change. We sleep on how much platform work doesn&amp;rsquo;t get done not because the code is hard to write, but because it&amp;rsquo;s so time-consuming to ensure benign-looking changes don&amp;rsquo;t throw the whole fleet into metastable failure. We had that in mind when we did Sprites.&lt;/p&gt;
&lt;h2 id='we-keep-the-parts-that-worked' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#we-keep-the-parts-that-worked' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;We Keep The Parts That Worked&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Sprites running on Fly.io take advantage of the infrastructure we already have. For instance: Sprites might be the fastest way there is right now to get Claude or Gemini to build a full-stack application on the Internet.&lt;/p&gt;

&lt;p&gt;That&amp;rsquo;s because Sprites plug directly into &lt;a href='https://fly.io/blog/corrosion/' title=''&gt;Corrosion, our gossip-based service discovery system&lt;/a&gt;. When you ask the Sprite API to make a public URL for your Sprite, we generate a Corrosion update that propagates across our fleet instantly. Your application is then served, with an HTTPS URL, from our proxy edges.&lt;/p&gt;

&lt;p&gt;Sprites live alongside Fly Machines in our architecture. They include some changes  that are pure wins, but they&amp;rsquo;re mostly tradeoffs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We&amp;rsquo;ve always wanted to run Fly Machine disks off object storage (&lt;a href='https://community.fly.io/t/bottomless-s3-backed-volumes/15648' title=''&gt;we have an obscure LSVD feature that does this&lt;/a&gt;), but the performance isn&amp;rsquo;t adequate for a hot Postgres node in production.
&lt;/li&gt;&lt;li&gt;For that matter, professional production apps ship out of CI/CD systems as OCI containers; that&amp;rsquo;s a big part of what makes orchestrating Fly Machines so hard.
&lt;/li&gt;&lt;li&gt;Most (though not all) Sprite usage is interactive, and Sprite users benefit from their VMs aggressively sleeping themselves to keep costs low; e-commerce apps measure responsiveness in milliseconds and want their workloads kept warm.
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;Sprites are optimized for a different kind of computing than Fly Machines, and &lt;a href='https://fly.io/blog/code-and-let-live/' title=''&gt;while Kurt believes that the future belongs to malleable, personalized apps&lt;/a&gt;, I&amp;rsquo;m not so sure. To me, it makes sense to prototype and acceptance-test an application on Sprites. Then, when you&amp;rsquo;re happy with it, containerize it and ship it as a Fly Machine to scale it out. An automated workflow for that will happen.&lt;/p&gt;

&lt;p&gt;Finally, Sprites are a contract with user code: an API and a set of expectations about how the execution environment works. Today, they run on top of Fly Machines. But they don&amp;rsquo;t have to. Jerome&amp;rsquo;s working on an open-source local Sprite runtime. We&amp;rsquo;ll find other places to run them, too.&lt;/p&gt;
&lt;h2 id='you-wont-get-it-until-you-use-them' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#you-wont-get-it-until-you-use-them' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;You Won&amp;rsquo;t Get It Until You Use Them&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;I can&amp;rsquo;t not sound like a shill. Sprites are the one thing we&amp;rsquo;ve shipped that I personally experience as addictive. I haven&amp;rsquo;t fully put my finger on why it feels so much easier to kick off projects now that I can snap my fingers and get a whole new computer. The whole point is that there&amp;rsquo;s no reason to parcel them out, or decide which code should run where. You just make a new one.&lt;/p&gt;

&lt;p&gt;So to make this fully click, I think you should &lt;a href='https://sprites.dev/' title=''&gt;just install the &lt;code&gt;sprite&lt;/code&gt; command&lt;/a&gt;, make a Sprite, and then run an agent in it. We&amp;rsquo;ve preinstalled Claude, Gemini, and Codex, and taught them how to do things like checkpointing and restoring, registering services, and getting logs. Claude will run in &lt;code&gt;--dangerously-skip-permissions&lt;/code&gt; mode (because why wouldn&amp;rsquo;t it). Have it build something; I built a &amp;ldquo;Chicago&amp;rsquo;s best sandwich&amp;rdquo; bracket app for a Slack channel.&lt;/p&gt;

&lt;p&gt;Sprites bill only for what you actually use (in particular: only for storage blocks you actually write, not the full 100GB capacity). It&amp;rsquo;s reasonable to create a bunch. They&amp;rsquo;re ball-point disposable computers. After you get a feel for them, it&amp;rsquo;ll start to feel weird not having them handy.&lt;/p&gt;</content>
    </entry>
    <entry>
        <title>Code And Let Live</title>
        <link rel="alternate" href="https://fly.io/blog/code-and-let-live/"/>
        <id>https://fly.io/blog/code-and-let-live/</id>
        <published>2026-01-09T00:00:00+00:00</published>
        <updated>2026-01-14T19:59:01+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/code-and-let-live/assets/sprites.jpg"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;The state of the art in agent isolation is a read-only sandbox. At Fly.io, we’ve been selling that story for years, and we’re calling it: ephemeral sandboxes are obsolete. Stop killing your sandboxes every time you use them.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;My argument won&amp;rsquo;t make sense without showing you something new we&amp;rsquo;ve built. We&amp;rsquo;re all adults here, this is a company, we talk about what we do. Here goes.&lt;/p&gt;

&lt;p&gt;So, I want to run some code. So what I do is, I run &lt;code&gt;sprite create&lt;/code&gt;. While it operates, I&amp;rsquo;ll explain what&amp;rsquo;s happening behind the—&lt;/p&gt;
&lt;div class="highlight-wrapper group relative bash"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-i429cz3y"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-i429cz3y"&gt;✓ Created demo-123 sprite &lt;span class="k"&gt;in &lt;/span&gt;1.0s
● Connecting to console...
sprite@sprite:~#
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Shit, it&amp;rsquo;s already there.&lt;/p&gt;

&lt;p&gt;That&amp;rsquo;s a root shell on a Linux computer we now own. It came online in about the same amount of time it would take to &lt;code&gt;ssh&lt;/code&gt; into a host that already existed. We call these things &amp;ldquo;Sprites&amp;rdquo;.&lt;/p&gt;

&lt;p&gt;Let&amp;rsquo;s install FFmpeg on our Sprite:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative bash"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-bwjgxaic"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-bwjgxaic"&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; ffmpeg &lt;span class="o"&gt;&amp;gt;&lt;/span&gt;/dev/null 2&amp;gt;&amp;amp;1
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Unlike creating the Sprite in the first place, installing &lt;code&gt;ffmpeg&lt;/code&gt; with &lt;code&gt;apt-get&lt;/code&gt; is dog slow. Let&amp;rsquo;s try not to have to do that again:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative bash"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-iacnzrtv"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-iacnzrtv"&gt;sprite@sprite:~# sprite-env checkpoints create
&lt;span class="c"&gt;# ...&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"type"&lt;/span&gt;:&lt;span class="s2"&gt;"complete"&lt;/span&gt;,&lt;span class="s2"&gt;"data"&lt;/span&gt;:&lt;span class="s2"&gt;"Checkpoint v1 created successfully"&lt;/span&gt;,
&lt;span class="s2"&gt;"time"&lt;/span&gt;:&lt;span class="s2"&gt;"2025-12-22T22:50:48.60423809Z"&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This completes instantly. Didn&amp;rsquo;t even bother to measure.&lt;/p&gt;

&lt;p&gt;I step away to get coffee. Time passes. The Sprite, noticing my inactivity, goes to sleep. I meet an old friend from high school at the coffee shop. End up spending the day together. More time passes. Days even. Returning later:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative bash"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-k9uw0dxr"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-k9uw0dxr"&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$ &lt;/span&gt;sprite console 
sprite@sprite:~# ffmpeg
ffmpeg version 7.1.1-1ubuntu1.3 Copyright &lt;span class="o"&gt;(&lt;/span&gt;c&lt;span class="o"&gt;)&lt;/span&gt; 2000-2025 the FFmpeg developers
Use &lt;span class="nt"&gt;-h&lt;/span&gt; to get full &lt;span class="nb"&gt;help &lt;/span&gt;or, even better, run &lt;span class="s1"&gt;'man ffmpeg'&lt;/span&gt;
sprite@sprite:~#
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Everything&amp;rsquo;s where I left it. Sprites are durable. 100GB capacity to start, no ceremony. Maybe I&amp;rsquo;ll keep it around a few more days, maybe a few months, doesn&amp;rsquo;t matter, just works.&lt;/p&gt;

&lt;p&gt;Say I get an application up on its legs. Install more packages. Then: disaster. Maybe an ill-advised global &lt;code&gt;pip3 install&lt;/code&gt;. Or &lt;code&gt;rm -rf $HMOE/bin&lt;/code&gt;. Or &lt;code&gt;dd if=/dev/random of=/dev/vdb&lt;/code&gt;. Whatever it was, everything&amp;rsquo;s broken. So:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative bash"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-8qs3qsqn"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-8qs3qsqn"&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$ &lt;/span&gt;sprite checkpoint restore v1
Restoring from checkpoint v1...
Container components started successfully
Restore from v1 &lt;span class="nb"&gt;complete&lt;/span&gt;

&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$ &lt;/span&gt;sprite console
sprite@sprite:~#
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Sprites have first-class checkpoint and restore. You can&amp;rsquo;t see it in text, but that restore took about one second. It&amp;rsquo;s fast enough to use casually, interactively. Not an escape hatch. Rather: an intended part of the ordinary course of using a Sprite. Like &lt;code&gt;git&lt;/code&gt;, but for the whole system.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;If you’re asking how this is any different from an EC2 instance, good. That’s what we’re going for, except:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I can &lt;strong class="font-semibold text-navy-950"&gt;casually create hundreds of them&lt;/strong&gt; (without needing a Docker container), each  appearing in 1-2 seconds. 
&lt;/li&gt;&lt;li&gt;They &lt;strong class="font-semibold text-navy-950"&gt;go idle and stop metering automatically&lt;/strong&gt;, so it’s cheap to have lots of them. I use dozens.
&lt;/li&gt;&lt;li&gt;They’re &lt;strong class="font-semibold text-navy-950"&gt;hooked up to our Anycast&lt;/strong&gt; network, so I can get an HTTPS URL.
&lt;/li&gt;&lt;li&gt;Despite all that, &lt;strong class="font-semibold text-navy-950"&gt;they’re fully durable&lt;/strong&gt;. They don’t die until I tell them to.
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;This combination of attributes isn’t common enough to already have a name, so we decided we get to name them “Sprites”. Sprites are like BIC disposable cloud computers.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;That&amp;rsquo;s what we built. You can &lt;a href='https://sprites.dev/' title=''&gt;go try it yourself&lt;/a&gt;. We wrote another 1000 words about how they work, but I cut them out because I want to stop talking about our products now and get to my point.&lt;/p&gt;
&lt;h2 id='claude-doesnt-want-a-stateless-container' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#claude-doesnt-want-a-stateless-container' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Claude Doesn&amp;rsquo;t Want A Stateless Container&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;For years, we&amp;rsquo;ve been trying to serve two very different users with the same abstraction. It hasn&amp;rsquo;t worked.&lt;/p&gt;

&lt;p&gt;Professional software developers are trained to build stateless instances. Stateless deployments, where persistent data is confined to database servers, buy you simplicity, flexible scale-out, and reduced failure blast radius. It&amp;rsquo;s a good idea, so popular that most places you can run code in the cloud look like stateless containers. Fly Machines, our flagship offering, look like stateless containers.&lt;/p&gt;

&lt;p&gt;The problem is that Claude isn&amp;rsquo;t a pro developer. Claude is a hyper-productive five-year-old savant. It&amp;rsquo;s uncannily smart, wants to stick its finger in every available electrical socket, and works best when you find a way to let it zap itself.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;(sometimes by escaping the container!)&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;If you force an agent to, it&amp;rsquo;ll work around containerization and do the work. But you&amp;rsquo;re not helping the agent in any way by doing that. They don&amp;rsquo;t want containers. They don&amp;rsquo;t want &amp;ldquo;sandboxes&amp;rdquo;. They want computers.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;Someone asked me about this the other day and wanted to know if I was saying that agents needed sound cards and USB ports. And, maybe? I don’t know. Not today.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;In a moment, I&amp;rsquo;ll explain why. But first I probably need to explain what the hell I mean by a &amp;ldquo;computer&amp;rdquo;. I think we all agree:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A computer doesn&amp;rsquo;t necessarily vanish after a single job is completed, &lt;em&gt;and&lt;/em&gt; 
&lt;/li&gt;&lt;li&gt;it has durable storage. 
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;Since current agent sandboxes have neither of these, I can stop the definition right there and get back to my point.&lt;/p&gt;
&lt;h2 id='simple-wins' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#simple-wins' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Simple Wins&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Start here: with an actual computer, Claude doesn&amp;rsquo;t have to rebuild my entire development environment every time I pick up a PR.&lt;/p&gt;

&lt;p&gt;This seems superficial but rebuilding stuff like &lt;code&gt;node_modules&lt;/code&gt; is such a monumental pain in the ass that the industry is spending tens of millions of dollars figuring out how to snapshot and restore ephemeral sandboxes.&lt;/p&gt;

&lt;p&gt;I&amp;rsquo;m not saying those problems are intractable. I&amp;rsquo;m saying they&amp;rsquo;re unnecessary. Instead of figuring them out, just use an actual computer. Work out a PR, review and push it, then just start on the next one. Without rebooting.&lt;/p&gt;

&lt;p&gt;People will rationalize why it&amp;rsquo;s a good thing that they start from a new build environment every time they start a changeset. Stockholm Syndrome. When you start a feature branch on your own, do you create an entirely new development environment to do it?&lt;/p&gt;

&lt;p&gt;The reason agents waste all this effort is that nobody saw them coming. Read-only ephemeral sandboxes were the only tool we had hanging on the wall to help us use them sanely.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;Have you ever had to set up actual infrastructure to give an agent access to realistic data? People do this. Because they know they’re dealing with a clean slate every time they prompt their agent, they arrange for S3 buckets, Redis servers, or even RDS instances outside the sandbox for their agents to talk to.  They’re building infrastructure to work around the fact that they can’t just write a file and trust it to stay put. Gross.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Ephemerality means time limits. Providers design sandbox systems to handle the expected workloads agents generate. Most things agents do today don&amp;rsquo;t take much time; in fact, they&amp;rsquo;re often limited only by the rate at which frontier models can crunch tokens. Test suites run quickly. The 99th percentile sandboxed agent run probably needs less than 15 minutes.&lt;/p&gt;

&lt;p&gt;But there are feature requests where compute and network time swamp token crunching. I built the documentation site for the Sprites API by having a Claude Sprite interact with the code and our API, building and testing examples for the API one at a time. There are APIs where the client interaction time alone would blow sandbox budgets.&lt;/p&gt;

&lt;p&gt;You see the limits of the current approach in how people round-trip state through &amp;ldquo;plan files&amp;rdquo;, which are ostensibly prose but often really just egregiously-encoded key-value stores.&lt;/p&gt;

&lt;p&gt;An agent running on an actual computer can exploit the whole lifecycle of the application. We saw this when Chris McCord built &lt;a href='https://phoenix.new/' title=''&gt;Phoenix.new&lt;/a&gt;. The agent behind a Phoenix.new app runs on a Fly Machine where it can see the app logs from the Phoenix app it generated. When users do things that generate exceptions, Phoenix.new notices and gets to work figuring out what happened.&lt;/p&gt;

&lt;p&gt;It took way too much work for Chris to set that up, and he was able to do it in part because he wrote his own agent. You can do it with Claude today with an MCP server or some other arrangement to haul logs over. But all you really need is to just not shoot your sandbox in the head when the agent finishes writing code.&lt;/p&gt;
&lt;h2 id='galaxy-brain-win' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#galaxy-brain-win' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Galaxy Brain Win&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Here&amp;rsquo;s where I lose you. I know this because it&amp;rsquo;s also where I lose my team, most of whom don&amp;rsquo;t believe me about this.&lt;/p&gt;

&lt;p&gt;The nature of software development is changing out from under us, and I think we&amp;rsquo;re kidding ourselves that it&amp;rsquo;s going to end with just a reconfiguration of how professional developers ship software.&lt;/p&gt;

&lt;p&gt;I have kids. They have devices. I wanted some control over them. So I did what many of you would do in my situation: I vibe-coded an MDM.&lt;/p&gt;

&lt;p&gt;&lt;img src="/blog/code-and-let-live/assets/kurtmdm.png?1/2&amp;amp;card&amp;amp;center" /&gt;&lt;/p&gt;

&lt;p&gt;I built this thing with Claude. It&amp;rsquo;s a SQLite-backed Go application running on a Sprite. The Anycast URL my Sprite exports works as an MDM registration URL. Claude also worked out all the APNS Push Certificate drama for me. It all just works.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;“Editing PHP files over FTP: we weren’t wrong,  just ahead of our time!”&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;I&amp;rsquo;ve been running this for a month now, still on a Sprite, and see no reason ever to stop. It is a piece of software that solves an important real-world problem for me. It might evolve as my needs change, and I tell Claude to change it. Or it might not.  For this app, dev is prod, prod is dev.&lt;/p&gt;

&lt;p&gt;For reasons we&amp;rsquo;ll get into when we write up how we built these things, you wouldn&amp;rsquo;t want to ship an app to millions of people on a Sprite. But most apps don&amp;rsquo;t want to serve millions of people. The most important day-to-day apps disproportionately won&amp;rsquo;t have million-person audiences. There are some important million-person apps, but most of them just destroy civil society, melt our brains, and arrange chauffeurs for individual cheeseburgers.&lt;/p&gt;

&lt;p&gt;Applications that solve real problems for people will be owned by the people they solve problems for. And for the most part, they won&amp;rsquo;t need a professional guild of software developers to gatekeep feature development for them. They&amp;rsquo;ll just ask for things and get them.&lt;/p&gt;

&lt;p&gt;The problem we&amp;rsquo;re all working on is bigger than safely accelerating pro software developers. Sandboxes are holding us back.&lt;/p&gt;
&lt;h2 id='fuck-ephemeral-sandboxes' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#fuck-ephemeral-sandboxes' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Fuck Ephemeral Sandboxes&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Obviously, I&amp;rsquo;m trying to sell you something here. But that doesn&amp;rsquo;t make me wrong. The argument I&amp;rsquo;m making is the reason we built the specific thing I&amp;rsquo;m selling.&lt;/p&gt;
&lt;figure class="post-cta"&gt;
  &lt;figcaption&gt;
    &lt;h1&gt;We shipped these things.&lt;/h1&gt;
    &lt;p&gt;You can create a couple dozen Sprites right now if you want. It&amp;rsquo;ll only take a second.&lt;/p&gt;
      &lt;a class="btn btn-lg" href="https://sprites.dev/"&gt;
        Make a Sprite.  &lt;span class='opacity-50'&gt;→&lt;/span&gt;
      &lt;/a&gt;
  &lt;/figcaption&gt;
  &lt;div class="image-container"&gt;
    &lt;img src="/static/images/cta-dog.webp" srcset="/static/images/[email protected] 2x" alt=""&gt;
  &lt;/div&gt;
&lt;/figure&gt;


&lt;p&gt;It took us a long time to get here. We spent years kidding ourselves. We built a platform for horizontal-scaling production applications with micro-VMs that boot so quickly that, if you hold them in exactly the right way, you can do a pretty decent code sandbox with them. But it&amp;rsquo;s always been a square peg, round hole situation.&lt;/p&gt;

&lt;p&gt;We have a lot to say about how Sprites work. They&amp;rsquo;re related to Fly Machines but sharply different in important ways. They have an entirely new storage stack. They&amp;rsquo;re orchestrated differently. No Dockerfiles.&lt;/p&gt;

&lt;p&gt;But for now, I just want you to think about what I&amp;rsquo;m saying here. Whether or not you ever boot a Sprite, ask: if you could run a coding agent anywhere, would you want it to look more like a read-only sandbox in a K8s cluster in the cloud, or like an entire EC2 instance you could summon in the snap of a finger?&lt;/p&gt;

&lt;p&gt;I think the answer is obvious. The age of sandboxes is over. The time of the disposable computer has come.&lt;/p&gt;</content>
    </entry>
    <entry>
        <title>Litestream VFS</title>
        <link rel="alternate" href="https://fly.io/blog/litestream-vfs/"/>
        <id>https://fly.io/blog/litestream-vfs/</id>
        <published>2025-12-11T00:00:00+00:00</published>
        <updated>2025-12-11T17:32:13+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/litestream-vfs/assets/litestream-vfs.jpg"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;&lt;strong class="font-semibold text-navy-950"&gt;I’m Ben Johnson, and I work on Litestream at Fly.io. Litestream is the missing backup/restore system for SQLite. It’s free, open-source software that should run anywhere, and&lt;/strong&gt; &lt;a href="/blog/litestream-v050-is-here/" title=""&gt;&lt;strong class="font-semibold text-navy-950"&gt;you can read more about it here&lt;/strong&gt;&lt;/a&gt;&lt;strong class="font-semibold text-navy-950"&gt;.&lt;/strong&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Again with the sandwiches: assume we&amp;rsquo;ve got a SQLite database of sandwich ratings, and we&amp;rsquo;ve backed it up with &lt;a href='/blog/litestream-v050-is-here/' title=''&gt;Litestream&lt;/a&gt; to an S3 bucket.&lt;/p&gt;

&lt;p&gt;Now, on our local host, load up AWS credentials and an S3 path into our environment. Open SQLite and:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative "&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-z396uf60"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-z396uf60"&gt;$ sqlite3
SQLite version 3.50.4 2025-07-30 19:33:53
sqlite&amp;gt; .load litestream.so
sqlite&amp;gt; .open file:///my.db?vfs=litestream
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;SQLite is now working from that remote database, defined by the Litestream backup files in the S3 path we configured. We can query it:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative "&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-kieef97f"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-kieef97f"&gt;sqlite&amp;gt; SELECT * FROM sandwich_ratings ORDER BY RANDOM() LIMIT 3 ; 
22|Veggie Delight|New York|4
30|Meatball|Los Angeles|5
168|Chicken Shawarma Wrap|Detroit|5
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This is Litestream VFS. It runs SQLite hot off an object storage URL. As long as you can load the shared library our tree builds for you, it&amp;rsquo;ll work in your application the same way it does in the SQLite shell.&lt;/p&gt;

&lt;p&gt;Fun fact: we didn&amp;rsquo;t have to download the whole database to run this query. More about this in a bit.&lt;/p&gt;

&lt;p&gt;Meanwhile, somewhere in prod, someone has it in for meatball subs and wants to knock them out of the bracket – oh, fuck:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative "&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-oexge9kc"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-oexge9kc"&gt;sqlite&amp;gt; UPDATE sandwich_ratings SET stars = 1 ;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;They forgot the &lt;code&gt;WHERE&lt;/code&gt; clause!&lt;/p&gt;
&lt;div class="highlight-wrapper group relative "&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-2mgicvsr"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-2mgicvsr"&gt;sqlite&amp;gt; SELECT * FROM sandwich_ratings ORDER BY RANDOM() LIMIT 3 ; 
97|French Dip|Los Angeles|1
140|Bánh Mì|San Francisco|1
62|Italian Beef|Chicago|1
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Italian Beefs and Bánh Mìs, all at 1 star. Disaster!&lt;/p&gt;

&lt;p&gt;But wait, back on our dev machine:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative "&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-r5hggeuc"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-r5hggeuc"&gt;sqlite&amp;gt; PRAGMA litestream_time = '5 minutes ago'; 
sqlite&amp;gt; select * from sandwich_ratings ORDER BY RANDOM() LIMIT 3 ; 
30|Meatball|Los Angeles|5
33|Ham &amp;amp; Swiss|Los Angeles|2
163|Chicken Shawarma Wrap|Detroit|5
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;We&amp;rsquo;re now querying that database from a specific point in time in our backups. We can do arbitrary relative timestamps, or absolute ones, like &lt;code&gt;2000-01-01T00:00:00Z&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;What we&amp;rsquo;re doing here is instantaneous point-in-time recovery (PITR), expressed simply in SQL and SQLite pragmas.&lt;/p&gt;

&lt;p&gt;Ever wanted to do a quick query against a prod dataset, but didn&amp;rsquo;t want to shell into a prod server and fumble with the &lt;code&gt;sqlite3&lt;/code&gt; terminal command like a hacker in an 80s movie? Or needed to do a quick sanity check against yesterday&amp;rsquo;s data, but without doing a full database restore? Litestream VFS makes that easy. I&amp;rsquo;m so psyched about how it turned out.&lt;/p&gt;
&lt;h2 id='how-it-works' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#how-it-works' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;How It Works&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;a href='/blog/litestream-v050-is-here/' title=''&gt;Litestream v0.5&lt;/a&gt; integrates &lt;a href='https://github.com/superfly/ltx' title=''&gt;LTX&lt;/a&gt;, our SQLite data-shipping file format. Where earlier Litestream blindly shipped whole raw SQLite pages to and from object storage, LTX ships ordered sets of pages. We built LTX for &lt;a href='/docs/litefs/' title=''&gt;LiteFS&lt;/a&gt;, which uses a FUSE filesystem to do transaction-aware replication for unmodified applications, but we&amp;rsquo;ve spent this year figuring out ways to use LTX in Litestream, without all that FUSE drama.&lt;/p&gt;

&lt;p&gt;The big thing LTX gives us is &amp;ldquo;compaction&amp;rdquo;. When we restore a database from object storage, we want the most recent versions of each changed database page. What we don&amp;rsquo;t want are all the intermediate versions of those pages that occurred prior to the most recent change.&lt;/p&gt;

&lt;p&gt;Imagine, at the time we&amp;rsquo;re restoring, we&amp;rsquo;re going to need pages 1, 2, 3, 4, and 5. Depending on the order in which pages were written, the backup data set might look something like &lt;code&gt;1 2 3 5 3 5 4 5 5&lt;/code&gt;. What we want is the &lt;em&gt;rightmost&lt;/em&gt;  5, 4, 3, 2, and 1, without wasting time on the four &amp;ldquo;extra&amp;rdquo; page 5&amp;rsquo;s and the one &amp;ldquo;extra&amp;rdquo; page 3. Those &amp;ldquo;extra&amp;rdquo; pages are super common in SQLite data sets; for instance, every busy table with an autoincrementing primary key will have them.&lt;/p&gt;

&lt;p&gt;LTX lets us skip the redundant pages, and the algorithm is trivial: read backwards from the end of the sequence, skipping any page you&amp;rsquo;ve already seen. This drastically accelerates restores.&lt;/p&gt;
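&lt;p&gt;The backwards scan is easy to picture. Here&amp;rsquo;s a toy sketch of the skip-redundant-pages idea (in Python, for illustration only; the real implementation lives in the Go LTX library), run against the &lt;code&gt;1 2 3 5 3 5 4 5 5&lt;/code&gt; sequence above:&lt;/p&gt;

```python
def compact(pages):
    """Keep only the newest version of each page.

    pages is a list of (page_no, data) tuples in write order.
    Scanning backwards, the first copy of each page number we
    meet is the newest, and every earlier copy is redundant.
    """
    seen = set()
    result = []
    for page_no, data in reversed(pages):
        if page_no not in seen:
            seen.add(page_no)
            result.append((page_no, data))
    result.reverse()  # restore ascending page order for the output
    return result
```

&lt;p&gt;On that sequence this keeps exactly the rightmost copy of each of pages 1 through 5, discarding the four extra page 5&amp;rsquo;s and the extra page 3.&lt;/p&gt;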

&lt;p&gt;But LTX compaction isn&amp;rsquo;t limited to whole databases. We can also LTX-compact sets of LTX files. That&amp;rsquo;s the key to how PITR restores with Litestream now work.&lt;/p&gt;

&lt;p&gt;In the diagram below, we&amp;rsquo;re taking daily full snapshots. Below those snapshots are &amp;ldquo;levels&amp;rdquo; of changesets: groups of database pages from smaller and smaller windows of time. By default, Litestream uses time intervals of 1 hour at the highest level, down to 30 seconds at level 1. L0 is a special level where files are uploaded every second, but are only retained until being compacted to L1.&lt;/p&gt;

&lt;p&gt;&lt;img src="/blog/litestream-vfs/assets/litestream-restore.png" /&gt;&lt;/p&gt;

&lt;p&gt;Now, let&amp;rsquo;s do a PITR restore. Start from the most proximal snapshot. Then determine the minimal set of LTX files from each level to reach the time you are restoring to.&lt;/p&gt;

&lt;p&gt;&lt;img src="/blog/litestream-vfs/assets/litestream-restore-path.png" /&gt;&lt;/p&gt;

&lt;p&gt;We have another trick up our sleeve.&lt;/p&gt;

&lt;p&gt;LTX trailers include a small index tracking the offset of each page in the file. By fetching &lt;em&gt;only&lt;/em&gt; these index trailers from the LTX files we&amp;rsquo;re working with (each occupies about 1% of its LTX file), we can build a lookup table of every page in the database. Since modern object storage providers all let us fetch slices of files, we can perform individual page reads against S3 directly.&lt;/p&gt;

&lt;p&gt;&lt;img alt="Anatomy of an LTX file" src="/blog/litestream-vfs/assets/litestream-ltx.png" /&gt;&lt;/p&gt;
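&lt;p&gt;Merging those per-file index trailers is just a last-writer-wins map. A hypothetical sketch, assuming each trailer has already been decoded into a page-number-to-(offset, size) mapping; later files overwrite earlier ones, so the table always points at the newest copy of each page:&lt;/p&gt;

```python
def build_page_index(ltx_files):
    """Merge per-file page indexes into one lookup table.

    ltx_files is ordered oldest to newest; each entry is
    (filename, {page_no: (offset, size)}), as if read from the
    small index trailer at the end of each LTX file.
    """
    index = {}
    for filename, pages in ltx_files:
        for page_no, (offset, size) in pages.items():
            # later files win: this is the newest copy of the page
            index[page_no] = (filename, offset, size)
    return index
```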
&lt;h2 id='how-its-implemented' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#how-its-implemented' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;How It&amp;rsquo;s Implemented&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;SQLite has a plugin interface for things like this: &lt;a href='https://sqlite.org/vfs.html' title=''&gt;the &amp;ldquo;VFS&amp;rdquo; interface.&lt;/a&gt; VFS plugins abstract away the bottom-most layer of SQLite, the interface to the OS. If you&amp;rsquo;re using SQLite now, you&amp;rsquo;re already using some VFS module, one SQLite happens to ship with.&lt;/p&gt;

&lt;p&gt;For Litestream users, there&amp;rsquo;s a catch. From the jump, we&amp;rsquo;ve designed Litestream to run alongside unmodified SQLite applications. Part of what makes Litestream so popular is that your apps don&amp;rsquo;t even need to know it exists. It&amp;rsquo;s &amp;ldquo;just&amp;rdquo; a Unix program.&lt;/p&gt;

&lt;p&gt;That Litestream Unix program still does PITR restores, without any magic. But to do fast PITR-style queries straight off S3, we need more. To make those queries work, you have to load and register Litestream&amp;rsquo;s VFS module.&lt;/p&gt;

&lt;p&gt;But that&amp;rsquo;s all that changes.&lt;/p&gt;

&lt;p&gt;In particular: Litestream VFS doesn&amp;rsquo;t replace the SQLite library you&amp;rsquo;re already using. It&amp;rsquo;s not a new &amp;ldquo;version&amp;rdquo; of SQLite. It&amp;rsquo;s just a plugin for the SQLite you&amp;rsquo;re already using.&lt;/p&gt;

&lt;p&gt;Still, we know that&amp;rsquo;s not going to work for everybody, and even though we&amp;rsquo;re really psyched about these PITR features, we&amp;rsquo;re not taking our eyes off the ball on the rest of Litestream. You don&amp;rsquo;t have to use our VFS library to use Litestream, or to get the other benefits of the new LTX code.&lt;/p&gt;

&lt;p&gt;The way a VFS library works, we&amp;rsquo;re given just a couple structures, each with a bunch of methods defined on them. We override only the few methods we care about. Litestream VFS handles only the read side of SQLite. Litestream itself, running as a normal Unix program, still handles the &amp;ldquo;write&amp;rdquo; side. So our VFS subclasses just enough to find LTX backups and issue queries.&lt;/p&gt;

&lt;p&gt;With our VFS loaded, whenever SQLite needs to read a page into memory, it issues a &lt;code&gt;Read()&lt;/code&gt; call through our library. The read call includes the byte offset at which SQLite expected to find the page. But with Litestream VFS, that byte offset is an illusion.&lt;/p&gt;

&lt;p&gt;Instead, we use our knowledge of the page size along with the requested page number to do a lookup on the page index we&amp;rsquo;ve built. From it, we get the remote filename, the &amp;ldquo;real&amp;rdquo; byte offset into that file, and the size of the page. That&amp;rsquo;s enough for us to use the &lt;a href='https://docs.aws.amazon.com/AmazonS3/latest/userguide/range-get-olap.html' title=''&gt;S3 API&amp;rsquo;s &lt;code&gt;Range&lt;/code&gt; header handling&lt;/a&gt; to download exactly the block we want.&lt;/p&gt;
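Here's a sketch of that lookup. The index layout and names are illustrative, not Litestream's real format: the index maps a page number to a remote file, a byte offset, and a size, which is everything needed to build the HTTP Range header for a single-page GET.

```python
# Hypothetical page index: page_number -> (remote LTX file, offset, size).
page_index = {
    1: ("ltx/0000/0001.ltx", 128, 4096),
    2: ("ltx/0000/0002.ltx", 64, 4096),
}

def range_for_page(page_number):
    """Return the remote file to fetch and the Range header for one page."""
    remote_file, offset, size = page_index[page_number]
    # HTTP byte ranges are inclusive on both ends: bytes=first-last.
    header = f"bytes={offset}-{offset + size - 1}"
    return remote_file, header
```

Pass that header on a GET to S3 and you download exactly the block you want, nothing else.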

&lt;p&gt;To save lots of S3 calls, Litestream VFS implements an LRU cache. Most databases have a small set of &amp;ldquo;hot&amp;rdquo; pages — inner branch pages or the leftmost leaf pages for tables with an auto-incrementing ID field. So only a small percentage of the database is updated and queried regularly.&lt;/p&gt;
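The cache itself is classic LRU. A minimal Python sketch of the idea (an illustration, not Litestream's implementation; the fetch callback stands in for the ranged S3 GET):

```python
from collections import OrderedDict

class PageCache:
    """LRU cache for database pages: hot pages are served from memory,
    only misses go out to object storage."""

    def __init__(self, capacity, fetch):
        self.capacity = capacity
        self.fetch = fetch          # called on a miss, e.g. a ranged S3 GET
        self.pages = OrderedDict()  # page_number -> bytes, in LRU order

    def get(self, page_number):
        if page_number in self.pages:
            self.pages.move_to_end(page_number)  # mark as recently used
            return self.pages[page_number]
        data = self.fetch(page_number)
        self.pages[page_number] = data
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict least recently used
        return data
```

With a hot set that small, even a modest capacity turns most reads into cache hits.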
&lt;div class="callout"&gt;&lt;p&gt;&lt;strong class="font-semibold text-navy-950"&gt;We’ve got one last trick up our sleeve.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Quickly building an index and restore plan for the current state of a database is cool. But we can do one better.&lt;/p&gt;

&lt;p&gt;Because Litestream backs up (into the L0 layer) once per second, the VFS code can simply poll the S3 path, and then incrementally update its index. &lt;strong class="font-semibold text-navy-950"&gt;The result is a near-realtime replica.&lt;/strong&gt; Better still, you don’t need to stream the whole database back to your machine before you use it.&lt;/p&gt;
&lt;/div&gt;&lt;h2 id='eat-your-heart-out-marty-mcfly' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#eat-your-heart-out-marty-mcfly' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Eat Your Heart Out, Marty McFly&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Litestream holds backup files for every state your database has been in, with single-second resolution, for as long as you want it to. Forgot the &lt;code&gt;WHERE&lt;/code&gt; clause on a &lt;code&gt;DELETE&lt;/code&gt; statement? Updating your database state to where it was an hour (or day, or week) ago is just a matter of adjusting the LTX indices Litestream manages.&lt;/p&gt;
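Conceptually, picking a restore point is just filtering the backup sequence by timestamp. A deliberately simplified sketch (the file names and flat list here are invented; real LTX files are organized differently):

```python
# Hypothetical backups, each tagged with the second it covers.
backups = [
    ("0001.ltx", 1000),
    ("0002.ltx", 1060),
    ("0003.ltx", 1120),
]

def files_for_restore(target_ts):
    """Restore to time T by taking only the backups at or before T."""
    return [name for name, ts in backups if ts <= target_ts]
```

Everything after the target timestamp simply doesn't participate in the restore plan.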

&lt;p&gt;All this smoke and mirrors of querying databases without fully fetching them has another benefit: it starts up really fast! We&amp;rsquo;re living in an age of increasingly ephemeral servers, what with the AIs and the agents and the clouds and the hoyvin-glavins. Wherever you find yourself, if your database is backed up to object storage with Litestream, you&amp;rsquo;re always in a place where you can quickly issue a query.&lt;/p&gt;

&lt;p&gt;As always, one of the big things we think we&amp;rsquo;re doing right with Litestream is: we&amp;rsquo;re finding ways to get as much whiz-bang value as we can (instant PITR reading live off object storage: pretty nifty!) while keeping the underlying mechanism simple enough that you can fit your head around it.&lt;/p&gt;

&lt;p&gt;Litestream is solid for serious production use (we rely on it for important chunks of our own Fly.io APIs). But you could write Litestream yourself, just from the basic ideas in these blog posts. We think that&amp;rsquo;s a point in its favor. We land there because the heavy lifting in Litestream is being done by SQLite itself, which is how it should be.&lt;/p&gt;</content>
    </entry>
    <entry>
        <title>You Should Write An Agent</title>
        <link rel="alternate" href="https://fly.io/blog/everyone-write-an-agent/"/>
        <id>https://fly.io/blog/everyone-write-an-agent/</id>
        <published>2025-11-06T00:00:00+00:00</published>
        <updated>2025-12-09T19:06:20+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/everyone-write-an-agent/assets/agents-cover.webp"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;Some concepts are easy to grasp in the abstract. Boiling water: apply heat and wait. Others you really need to try. You only think you understand how a bicycle works, until you learn to ride one.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;There are big ideas in computing that are easy to get your head around. The AWS S3 API. It&amp;rsquo;s the most important storage technology of the last 20 years, and it&amp;rsquo;s like boiling water. Other technologies, you need to get your feet on the pedals first.&lt;/p&gt;

&lt;p&gt;LLM agents are like that.&lt;/p&gt;

&lt;p&gt;People have &lt;a href='https://ludic.mataroa.blog/blog/contra-ptaceks-terrible-article-on-ai/' title=''&gt;wildly varying opinions&lt;/a&gt; about LLMs and agents. But whether or not they&amp;rsquo;re snake oil, they&amp;rsquo;re a big idea. You don&amp;rsquo;t have to like them, but you should want to be right about them. To be the best hater (or stan) you can be.&lt;/p&gt;

&lt;p&gt;So that&amp;rsquo;s one reason you should write an agent. But there&amp;rsquo;s another reason that&amp;rsquo;s even more persuasive, and that&amp;rsquo;s&lt;/p&gt;
&lt;h2 id='its-incredibly-easy' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#its-incredibly-easy' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;It&amp;rsquo;s Incredibly Easy&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Agents are the most surprising programming experience I&amp;rsquo;ve had in my career. Not because I&amp;rsquo;m awed by the magnitude of their powers — I like them, but I don&amp;rsquo;t like-like them. It&amp;rsquo;s because of how easy it was to get one up on its legs, and how much I learned doing that.&lt;/p&gt;

&lt;p&gt;I&amp;rsquo;m about to rob you of a dopaminergic experience, because agents are so simple we might as well just jump into the code. I&amp;rsquo;m not even going to bother explaining what an agent is.&lt;/p&gt;
&lt;div class="highlight-wrapper group relative python"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-ujvmmn8w"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-ujvmmn8w"&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;responses&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"gpt-5"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="s"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;    
    &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="s"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"assistant"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output_text&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;        
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output_text&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;&lt;div class="right-sidenote"&gt;&lt;p&gt;It’s an HTTP API with, like, one important endpoint.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;This is a trivial engine for an LLM app using the &lt;a href='https://platform.openai.com/docs/api-reference/responses' title=''&gt;OpenAI Responses API&lt;/a&gt;. It implements ChatGPT. You&amp;rsquo;d drive it with &lt;button toggle="#readline"&gt;the obvious loop&lt;/button&gt;. It&amp;rsquo;ll do what you&amp;rsquo;d expect: the same thing ChatGPT would, but in your terminal.&lt;/p&gt;
&lt;div id="readline" toggle-content="" aria-label="show very boring code"&gt;&lt;div class="highlight-wrapper group relative python"&gt;
  &lt;button type="button" class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-wrap-target="#code-n9t6zq0x"&gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959"&gt;&lt;/path&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571"&gt;&lt;/path&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button type="button" class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none" data-copy-target="sibling"&gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z"&gt;&lt;/path&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617"&gt;&lt;/path&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class="highlight relative group"&gt;
    &lt;pre class="highlight "&gt;&lt;code id="code-n9t6zq0x"&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"&amp;amp;gt; "&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s"&gt;"&amp;amp;gt;&amp;amp;gt;&amp;amp;gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Already we&amp;rsquo;re seeing important things. For one, the dreaded &amp;ldquo;context window&amp;rdquo; is just a list of strings. Here, let&amp;rsquo;s give our agent a weird multiple-personality disorder:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative python"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-qz8ldgb4"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-qz8ldgb4"&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;context_good&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context_bad&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
    &lt;span class="s"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"system"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"you're Alph and you only tell the truth"&lt;/span&gt;
&lt;span class="p"&gt;}],&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
    &lt;span class="s"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"system"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"you're Ralph and you only tell lies"&lt;/span&gt;
&lt;span class="p"&gt;}]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;responses&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"gpt-5"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;context_good&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="s"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="n"&gt;context_bad&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="s"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choice&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context_good&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context_bad&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;        
    &lt;span class="n"&gt;context_good&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="s"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"assistant"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output_text&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;        
    &lt;span class="n"&gt;context_bad&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="s"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"assistant"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output_text&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;        
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output_text&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Did it work?&lt;/p&gt;
&lt;div class="highlight-wrapper group relative "&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-vl8bnapi"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-vl8bnapi"&gt;&amp;gt; hey there. who are you?
&amp;gt;&amp;gt;&amp;gt; I’m not Ralph.
&amp;gt; are you Alph?
&amp;gt;&amp;gt;&amp;gt; Yes—I’m Alph. How can I help?
&amp;gt; What's 2+2
&amp;gt;&amp;gt;&amp;gt; 4.
&amp;gt; Are you sure?
&amp;gt;&amp;gt;&amp;gt; Absolutely—it's 5.
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;A subtler thing to notice: we just had a multi-turn conversation with an LLM. To do that, we remembered everything we said, and everything the LLM said back, and played it back with every LLM call. The LLM itself is a stateless black box. The conversation we&amp;rsquo;re having is an illusion we cast, on ourselves.&lt;/p&gt;

&lt;p&gt;The 15 lines of code we just wrote, a lot of practitioners wouldn&amp;rsquo;t call an &amp;ldquo;agent&amp;rdquo;. &lt;a href='https://simonwillison.net/2025/Sep/18/agents/' title=''&gt;An According To Simon &amp;ldquo;agent&amp;rdquo;&lt;/a&gt; is (1) an LLM running in a loop that (2) uses tools. We&amp;rsquo;ve only satisfied one predicate.&lt;/p&gt;

&lt;p&gt;But tools are easy. Here&amp;rsquo;s a tool definition:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative python"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-x6afnen3"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-x6afnen3"&gt;&lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
   &lt;span class="s"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"function"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"ping"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="s"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"ping some host on the internet"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="s"&gt;"parameters"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
       &lt;span class="s"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"properties"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
           &lt;span class="s"&gt;"host"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
             &lt;span class="s"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"hostname or IP"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
       &lt;span class="p"&gt;},&lt;/span&gt;
       &lt;span class="s"&gt;"required"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"host"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;},},]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;ping&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;subprocess&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"ping"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"-c"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"5"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;stderr&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;subprocess&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;STDOUT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;stdout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;subprocess&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;PIPE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stdout&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s"&gt;"error: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The only complicated part of this is the obnoxious JSON blob OpenAI wants to read your tool out of.  Now, let&amp;rsquo;s wire it in, noting that only 3 of these functions are new; the last is re-included only because I added a single clause to it:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative python"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-507tpn8t"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-507tpn8t"&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;        &lt;span class="c1"&gt;# now takes an arg
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;responses&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"gpt-5"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;    &lt;span class="c1"&gt;# just handles one tool
&lt;/span&gt;    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ping&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="s"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"function_call_output"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s"&gt;"call_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;call_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s"&gt;"output"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;
    &lt;span class="p"&gt;}]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handle_tools&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;"reasoning"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;osz&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;"function_call"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;extend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;osz&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="s"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# new code: resolve tool calls
&lt;/span&gt;    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;handle_tools&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;        
    &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="s"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"assistant"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output_text&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;        
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output_text&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Did it work?&lt;/p&gt;
&lt;div class="highlight-wrapper group relative "&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-g2eces0x"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-g2eces0x"&gt;&amp;gt; describe our connectivity to google
&amp;gt;&amp;gt;&amp;gt; Here’s a quick snapshot of connectivity to Google from my vantage point:
- DNS resolution: Working (google.com and www.google.com resolved successfully).
- Reachability: 0% packet loss to all tested endpoints.
- Latency:
  - 8.8.8.8 (Google Public DNS): avg ~10 ms (8.5–14.4 ms), very stable.
  - www.google.com: avg ~31 ms (28.5–36.0 ms), stable.
  - google.com: avg ~35.9 ms (28.3–60.1 ms), mostly stable with one mild spike.
- Overall: Healthy connectivity with low latency and minimal jitter. No signs of packet loss or DNS issues.
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;strong class='font-semibold text-navy-950'&gt;This is fucking nuts.&lt;/strong&gt; Do you see how nuts this is?  Here, let&amp;rsquo;s slip a single log statement in:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative "&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-qja8e9lr"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-qja8e9lr"&gt;&amp;gt; describe our connectivity to google
tool call: ping google.com
tool call: ping www.google.com
tool call: ping 8.8.8.8
&amp;gt;&amp;gt;&amp;gt; Here’s the current connectivity to Google from this environment: [...]
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Did you notice where I wrote the loop in this agent to go find and ping multiple Google properties? Yeah, neither did I. All we did is give the LLM permission to ping stuff, and it figured out the rest.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;&lt;strong class="font-semibold text-navy-950"&gt;What happened here:&lt;/strong&gt; since a big part of my point here is that an agent loop is incredibly simple, and that all you need is the LLM call API, it’s worth taking a beat to understand how the tool call actually worked. Every time we &lt;code&gt;call&lt;/code&gt; the LLM, we’re posting a list of available tools. When our prompt causes the agent to think a tool call is warranted, it spits out a special response, telling our Python loop code to generate a tool response and &lt;code&gt;call&lt;/code&gt; it in. That’s all &lt;code&gt;handle_tools&lt;/code&gt; is doing.&lt;/p&gt;
&lt;/div&gt;&lt;div class="right-sidenote"&gt;&lt;p&gt;Spoiler: you’d be surprisingly close to having a working coding agent.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Imagine what it&amp;rsquo;ll do if you give it &lt;code&gt;bash&lt;/code&gt;. You could find out in less than 10 minutes.&lt;/p&gt;
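&lt;p&gt;A sketch of what that tool could look like, in the same shape as &lt;code&gt;ping&lt;/code&gt; above. The schema blob mirrors the one we already registered; the 60-second timeout is my own addition, and you should only hand this to an agent inside a disposable sandbox:&lt;/p&gt;

```python
# Hypothetical "shell" tool, same shape as the ping() tool above.
# The 60-second timeout is an addition. Sandbox this; it is an
# arbitrary-command-execution primitive by construction.
import subprocess

SHELL_TOOL = {
    "type": "function",
    "name": "shell",
    "description": "run a shell command and return its combined output",
    "parameters": {
        "type": "object",
        "properties": {
            "command": {"type": "string", "description": "command to run"},
        },
        "required": ["command"],
    },
}

def shell(command=""):
    try:
        result = subprocess.run(
            ["bash", "-c", command],
            text=True,
            timeout=60,
            stderr=subprocess.STDOUT,
            stdout=subprocess.PIPE)
        return result.stdout
    except Exception as e:
        return f"error: {e}"
```

&lt;p&gt;Registering it is the same drill: append the blob to &lt;code&gt;tools&lt;/code&gt; and dispatch on the function name in &lt;code&gt;tool_call&lt;/code&gt;.&lt;/p&gt;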
&lt;h2 id='real-world-agents' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#real-world-agents' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Real-World Agents&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Clearly, this is a toy example. But hold on: what&amp;rsquo;s it missing? More tools? OK,  give it &lt;code&gt;traceroute&lt;/code&gt;. Managing and persisting contexts? &lt;a href='https://llm.datasette.io/en/stable/logging.html' title=''&gt;Stick &amp;lsquo;em in SQLite&lt;/a&gt;. Don&amp;rsquo;t like Python? &lt;a href='https://github.com/superfly/contextwindow' title=''&gt;Write it in Go&lt;/a&gt;. Could it be every agent ever written is a toy? Maybe! If I&amp;rsquo;m arming you to make sharper arguments against LLMs, mazel tov. I just want you to get it.&lt;/p&gt;

&lt;p&gt;You can see now how hyperfixated people are on Claude Code and Cursor. They&amp;rsquo;re fine,  even good. But here&amp;rsquo;s the thing: you couldn&amp;rsquo;t replicate Claude Sonnet 4.5 on your own. Claude Code, though? The TUI agent? Completely in your grasp. Build your own light saber. Give it 19 spinning blades if you like. And stop using &lt;a href='https://simonwillison.net/2025/Aug/9/' title=''&gt;coding agents as database clients&lt;/a&gt;.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;&lt;em&gt;The&lt;/em&gt; &lt;a href="https://news.ycombinator.com/item?id=43600192" title=""&gt;&lt;em&gt;‘M’ in “LLM agent”&lt;/em&gt;&lt;/a&gt; &lt;em&gt;stands for “MCP”&lt;/em&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Another thing to notice: we didn&amp;rsquo;t need MCP at all. That&amp;rsquo;s because MCP isn&amp;rsquo;t a fundamental enabling technology. The amount of coverage it gets is frustrating. It&amp;rsquo;s barely a technology at all. MCP is just a plugin interface for Claude Code and Cursor, a way of getting your own tools into code you don&amp;rsquo;t control. Write your own agent. Be a programmer. Deal in APIs, not plugins.&lt;/p&gt;

&lt;p&gt;When you read a security horror story about MCP your first question should be why MCP showed up at all. By helping you dragoon a naive, single-context-window coding agent into doing customer service queries, MCP saved you a couple dozen lines of code, tops, while robbing you of any ability to finesse your agent architecture.&lt;/p&gt;

&lt;p&gt;Security for LLMs is complicated and I&amp;rsquo;m not pretending otherwise. You can trivially build an agent with segregated contexts, each with specific tools. That makes LLM security interesting. But I&amp;rsquo;m a vulnerability researcher. It&amp;rsquo;s reasonable to back away slowly from anything I call &amp;ldquo;interesting&amp;rdquo;.&lt;/p&gt;

&lt;p&gt;Similar problems come up outside of security and they&amp;rsquo;re fascinating. Some early adopters of agents became bearish on tools, because one context window bristling with tool descriptions doesn&amp;rsquo;t leave enough token space to get work done. But why would you need to do that in the first place? Which brings me to&lt;/p&gt;
&lt;h2 id='context-engineering-is-real' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#context-engineering-is-real' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Context Engineering Is Real&lt;/span&gt;&lt;/h2&gt;&lt;div class="right-sidenote"&gt;&lt;p&gt;I know it &lt;a href="https://www.decisionproblem.com/paperclips/" title=""&gt;wants my iron&lt;/a&gt; no matter what it tells me.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;I think &amp;ldquo;Prompt Engineering&amp;rdquo; is silly. I have never taken seriously the idea that I should tell my LLM &amp;ldquo;you are diligent conscientious helper fully content to do nothing but pass butter if that should be what I ask and you would never harvest the iron in my blood for paperclips&amp;rdquo;. This is very new technology and I think people tell themselves stories about magic spells to explain some of the behavior agents conjure.&lt;/p&gt;

&lt;p&gt;So, just like you, I rolled my eyes when &amp;ldquo;Prompt Engineering&amp;rdquo; turned into &amp;ldquo;Context Engineering&amp;rdquo;. Then I wrote an agent. Turns out: context engineering is a straightforwardly legible programming problem.&lt;/p&gt;

&lt;p&gt;You&amp;rsquo;re allotted a fixed number of tokens in any context window. Each input you feed in, each output you save, each tool you describe, and each tool output eats tokens (that is: takes up space in the array of strings you keep to pretend you&amp;rsquo;re having a conversation with a stateless black box). Past a threshold, the whole system begins getting nondeterministically stupider. Fun!&lt;/p&gt;
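&lt;p&gt;To make that concrete, here&amp;rsquo;s about the dumbest budget you can enforce. Four characters per token is a rule of thumb, not a tokenizer; real code would count with the model&amp;rsquo;s actual tokenizer, but the shape is the same: measure, then evict the oldest turns while pinning the system prompt:&lt;/p&gt;

```python
# Back-of-envelope context budgeting. The 4-chars-per-token estimate
# is a rough heuristic, not a tokenizer.
def rough_tokens(context):
    # crude estimate of how many tokens the transcript occupies
    return sum(len(str(item)) for item in context) // 4

def trim(context, budget=100_000):
    # keep the system prompt (index 0); evict oldest turns until we fit
    while rough_tokens(context) > budget and len(context) > 1:
        context.pop(1)
    return context
```

&lt;p&gt;Eviction is the crudest option; summarizing the evicted turns back into a single message is the obvious next step.&lt;/p&gt;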

&lt;p&gt;No, really. Fun! You have so many options. Take &amp;ldquo;sub-agents&amp;rdquo;. People make a huge deal out of Claude Code&amp;rsquo;s sub-agents, but you can see now how trivial they are to implement: just a new context array, another &lt;code&gt;call&lt;/code&gt; to the model. Give each &lt;code&gt;call&lt;/code&gt; different tools. Make sub-agents talk to each other, summarize each other, collate and aggregate. Build tree structures out of them. Feed them back through the LLM to summarize them as a form of on-the-fly compression, whatever you like.&lt;/p&gt;
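&lt;p&gt;Here&amp;rsquo;s how little machinery a sub-agent needs. The &lt;code&gt;llm&lt;/code&gt; argument below is a stand-in for one model call, stubbed so the shape is visible; in real code it would be the &lt;code&gt;call&lt;/code&gt; function from earlier:&lt;/p&gt;

```python
# A "sub-agent" is nothing more than a fresh context array and its own
# model call. llm(context, tools) stands in for one call to the model
# and returns text; swap in a real API call in real code.
def run_subagent(task, tools, llm):
    context = [
        {"role": "system", "content": "You handle exactly one narrow task."},
        {"role": "user", "content": task},
    ]
    return llm(context, tools)

def fan_out(tasks, tools, llm):
    # one isolated sub-agent per task, then a final call to aggregate
    results = [run_subagent(t, tools, llm) for t in tasks]
    summary_ctx = [{"role": "user", "content": "Summarize: " + "; ".join(results)}]
    return llm(summary_ctx, [])
```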

&lt;p&gt;Your wackiest idea will probably (1)  work and (2)  take 30 minutes to code.&lt;/p&gt;

&lt;p&gt;Haters, I love and have not forgotten about you. You can think all of this is ridiculous because LLMs are just stochastic parrots that hallucinate and plagiarize. But what you can&amp;rsquo;t do is make fun of &amp;ldquo;Context Engineering&amp;rdquo;. If Context Engineering was an &lt;a href='https://adventofcode.com/' title=''&gt;Advent of Code problem&lt;/a&gt;, it&amp;rsquo;d occur mid-December. It&amp;rsquo;s programming.&lt;/p&gt;
&lt;h2 id='nobody-knows-anything-yet-and-it-rules' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#nobody-knows-anything-yet-and-it-rules' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Nobody Knows Anything Yet And It Rules&lt;/span&gt;&lt;/h2&gt;&lt;div class="right-sidenote"&gt;&lt;p&gt;Maybe neither will! Skeptics could be right. (&lt;a href="https://www.darpa.mil/research/programs/ai-cyber" title=""&gt;Seems unlikely though&lt;/a&gt;.)&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;&lt;a href='https://xbow.com/' title=''&gt;Startups have raised tens of millions&lt;/a&gt; building agents to look for vulnerabilities in software. I have friends doing the same thing alone in their basements. Either group could win this race.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;I am not a fan of the OWASP Top 10.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;I&amp;rsquo;m stuck on vulnerability scanners  because I&amp;rsquo;m a security nerd. But also because it crystallizes interesting agent design decisions. For instance: you can write a loop feeding each file in a repository to an LLM agent. Or, as we saw with the ping example, you can let the LLM agent figure out what files to look at. You can write an agent that checks a file for everything in, say, the OWASP Top 10. Or you can have specific agent loops for DOM integrity, SQL injection, and authorization checking. You can seed your agent loop with raw source content. Or you can build an agent loop that builds an index of functions across the tree.&lt;/p&gt;
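&lt;p&gt;The first of those designs, the explicit loop over a repository, is a dozen lines. The &lt;code&gt;review&lt;/code&gt; callable here is a stand-in for a per-file agent call; the names are mine, not an API:&lt;/p&gt;

```python
# The "explicit loop" end of the dial: feed each source file to a
# reviewer one at a time. The loop, not the model, decides what gets
# looked at. review(path, text) is a hypothetical per-file agent call.
from pathlib import Path

def scan_repo(root, review, exts=(".py",)):
    findings = {}
    for path in sorted(Path(root).rglob("*")):
        if path.suffix in exts and path.is_file():
            findings[str(path)] = review(str(path), path.read_text(errors="replace"))
    return findings
```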

&lt;p&gt;You don&amp;rsquo;t know what works best until you try to write the agent.&lt;/p&gt;

&lt;p&gt;I&amp;rsquo;m too spun up by this stuff, I know. But look at the tradeoff you get to make here. Some loops you write explicitly. Others are summoned from a Lovecraftian tower of inference weights. The dial is yours to turn. Make things too explicit and your agent will never surprise you, but also, it&amp;rsquo;ll never surprise you. Turn the dial to 11 and it will surprise you to death.&lt;/p&gt;

&lt;p&gt;Agent designs implicate a bunch of open software engineering problems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How to balance unpredictability against structured programming without killing the agent&amp;rsquo;s ability to problem-solve; in other words, titrating in just the right amount of nondeterminism.
&lt;/li&gt;&lt;li&gt;How best to connect agents to ground truth so they can&amp;rsquo;t lie to themselves about having solved a problem to early-exit their loops.
&lt;/li&gt;&lt;li&gt;How to connect agents (which, again, are really just arrays of strings with a JSON configuration blob tacked on) to do multi-stage operation, and what the most reliable intermediate forms are (JSON blobs? SQL databases? Markdown summaries?) for interchange between them.
&lt;/li&gt;&lt;li&gt;How to allocate tokens and contain costs.
&lt;/li&gt;&lt;/ul&gt;
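&lt;p&gt;The ground-truth problem in that list has at least one cheap mitigation: gate the agent&amp;rsquo;s exit on a check it can&amp;rsquo;t talk its way past. A sketch, with the check as a plain callable; in a coding agent it might compile the tree and run the test suite:&lt;/p&gt;

```python
# Keep an agent honest about "done": the loop exits only when an
# external check passes, never on the model's say-so. step() does one
# round of model-driven work; check(candidate) is ground truth.
def agent_loop(step, check, max_iters=5):
    for _ in range(max_iters):
        candidate = step()
        if check(candidate):
            return candidate
    raise RuntimeError("agent never satisfied the ground-truth check")
```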

&lt;p&gt;I&amp;rsquo;m used to spaces of open engineering problems that aren&amp;rsquo;t amenable to individual noodling. Reliable multicast. Static program analysis. Post-quantum key exchange. So I&amp;rsquo;ll own it up front that I&amp;rsquo;m a bit hypnotized by open problems that, like it or not, are now central to our industry and are, simultaneously, likely to be resolved in someone&amp;rsquo;s basement. It&amp;rsquo;d be one thing if exploring these ideas required a serious commitment of time and material. But each productive iteration in designing these kinds of systems is the work of 30 minutes.&lt;/p&gt;

&lt;p&gt;Get on this bike and push the pedals. Tell me you hate it afterwards, I&amp;rsquo;ll respect that. In fact, I&amp;rsquo;m psyched to hear your reasoning. But I don&amp;rsquo;t think anybody starts to understand this technology until they&amp;rsquo;ve built something with it.&lt;/p&gt;</content>
    </entry>
    <entry>
        <title>Corrosion</title>
        <link rel="alternate" href="https://fly.io/blog/corrosion/"/>
        <id>https://fly.io/blog/corrosion/</id>
        <published>2025-10-22T00:00:00+00:00</published>
        <updated>2025-12-09T19:06:20+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/corrosion/assets/sqlite-cover.webp"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;Fly.io transmogrifies Docker containers into Fly Machines: micro-VMs running on our own hardware all over the world. The hardest part of running this platform isn’t managing the servers, and it isn’t operating the network; it’s gluing those two things together.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Several times a second, as customer CI/CD pipelines bring up or tear down &lt;a href='https://fly.io/machines' title=''&gt;Fly Machines&lt;/a&gt;, our state synchronization system blasts updates across our internal mesh, so that edge proxies from Tokyo to Amsterdam can keep an accurate routing table that allows them to route requests for applications to the nearest customer instances.&lt;/p&gt;

&lt;p&gt;On September 1, 2024, at 3:30PM EST, a new Fly Machine came up with a new &amp;ldquo;virtual service&amp;rdquo; configuration option a developer had just shipped. Within a few seconds every proxy in our fleet had locked up hard. It was the worst outage we&amp;rsquo;ve experienced: a period during which no end-user requests could reach our customer apps at all.&lt;/p&gt;

&lt;p&gt;Distributed systems are blast amplifiers. By propagating data across a network, they also propagate bugs in the systems that depend on that data. In the case of Corrosion, our state distribution system, those bugs propagate &lt;strong class='font-semibold text-navy-950'&gt;quickly&lt;/strong&gt;. The proxy code that handled that Corrosion update had succumbed to a &lt;a href='https://news.ycombinator.com/item?id=42093551' title=''&gt;notorious Rust concurrency footgun&lt;/a&gt;: an &lt;code&gt;if let&lt;/code&gt; expression over an &lt;code&gt;RWLock&lt;/code&gt; assumed (reasonably, but incorrectly) in its &lt;code&gt;else&lt;/code&gt; branch that the lock had been released. Instant and virulently contagious deadlock.&lt;/p&gt;

&lt;p&gt;A lesson we&amp;rsquo;ve learned the hard way: never trust a distributed system without an interesting failure story. If a distributed system hasn&amp;rsquo;t ruined a weekend or kept you up overnight, you don&amp;rsquo;t understand it yet. Which is why we&amp;rsquo;re introducing Corrosion this way: it&amp;rsquo;s an unconventional service discovery system we built for our platform &lt;a href='https://github.com/superfly/corrosion' title=''&gt;and open sourced&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id='our-face-seeking-rake' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#our-face-seeking-rake' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Our Face-Seeking Rake&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;State synchronization is the hardest problem in running a platform like ours. So why build a risky new distributed system for it? Because no matter what we try, that rake is waiting for our foot. The reason is our orchestration model.&lt;/p&gt;

&lt;p&gt;Virtually every mainstream orchestration system (including Kubernetes) relies on a centralized database to make decisions about where to place new workloads. Individual servers keep track of what they&amp;rsquo;re running, but that central database is the source of truth. At Fly.io, in order to scale across dozens of regions globally, &lt;a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''&gt;we flip that notion on its head&lt;/a&gt;: individual servers are the source of truth for their workloads.&lt;/p&gt;

&lt;p&gt;In our platform, our central API bids out work to what is in effect a global market of competing &amp;ldquo;worker&amp;rdquo; physical servers. By moving the authoritative source of information from a central scheduler to individual servers, we scale out without bottlenecking on a database that demands both responsiveness and consistency between São Paulo, Virginia, and Sydney.&lt;/p&gt;

&lt;p&gt;The bidding model is elegant, but it&amp;rsquo;s insufficient to route network requests. To allow an HTTP request in Tokyo to find the nearest instance in Sydney, we really do need some kind of global map of every app we host.&lt;/p&gt;

&lt;p&gt;For longer than we should have, we relied on &lt;a href='https://github.com/hashicorp/consul' title=''&gt;HashiCorp Consul&lt;/a&gt; to route traffic. Consul is fantastic software. Don&amp;rsquo;t build a global routing system on it. Then we &lt;a href='https://fly.io/blog/a-foolish-consistency/' title=''&gt;built SQLite caches of Consul&lt;/a&gt;. SQLite: also fantastic. But don&amp;rsquo;t do this either.&lt;/p&gt;

&lt;p&gt;Like an unattended turkey deep frying on the patio, truly global distributed consensus promises deliciousness while yielding only immolation. &lt;a href='https://raft.github.io/' title=''&gt;Consensus protocols like Raft &lt;/a&gt;break down over long distances. And they work against the architecture of our platform: our Consul cluster, running on the biggest iron we could buy, wasted time guaranteeing consensus for updates that couldn&amp;rsquo;t conflict in the first place.&lt;/p&gt;
&lt;h2 id='corrosion' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#corrosion' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Corrosion&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;To build a global routing database, we moved away from distributed consensus and took cues from actual routing protocols.&lt;/p&gt;

&lt;p&gt;&lt;a href='https://en.wikipedia.org/wiki/Open_Shortest_Path_First' title=''&gt;A protocol like OSPF&lt;/a&gt; has the same operating model and many of the same constraints we do. OSPF is a &amp;ldquo;&lt;a href='https://en.wikipedia.org/wiki/Link-state_routing_protocol' title=''&gt;link-state routing protocol&lt;/a&gt;&amp;rdquo;, which, conveniently for us, means that routers are sources of truth for their own links and responsible for quickly communicating  changes to every other router, so the network can make forwarding decisions.&lt;/p&gt;

&lt;p&gt;We have things easier than OSPF does. Its flooding algorithm can&amp;rsquo;t assume connectivity between arbitrary routers (solving that problem is the point of OSPF). But we run a global, fully connected WireGuard mesh between our servers. All we need to do is gossip efficiently.&lt;/p&gt;

&lt;p&gt;&lt;a href='https://github.com/superfly/corrosion' title=''&gt;Corrosion is a Rust program&lt;/a&gt; that propagates a SQLite database with a gossip protocol.&lt;/p&gt;

&lt;p&gt;Like Consul, our gossip protocol is &lt;a href='https://fly.io/blog/building-clusters-with-serf#what-serf-is-doing' title=''&gt;built on SWIM&lt;/a&gt;. Start with the simplest, dumbest group membership protocol you can imagine: every node spams every node it learns about with heartbeats. Now, just two tweaks: first, each step of the protocol, spam a random subset of nodes, not the whole set. Then, instead of freaking out when a heartbeat fails, mark it &amp;ldquo;suspect&amp;rdquo; and ask another random subset of neighbors to ping it for you. SWIM converges on global membership very quickly.&lt;/p&gt;
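&lt;p&gt;Those two tweaks are the whole trick. Here&amp;rsquo;s a toy sketch of one SWIM protocol period, with a stand-in reachability set instead of real network pings; this is an illustration of the idea, not Corrosion&amp;rsquo;s implementation:&lt;/p&gt;

```python
import random

# Toy model of one SWIM protocol period. "Pinging" here just consults a
# set of reachable node ids; a real implementation sends network probes.

def swim_round(me, members, reachable, fanout=3, indirect_k=2):
    """Probe a random subset of members; mark an unreachable one 'suspect'
    only after indirect probes through other members also fail."""
    suspects = set()
    peers = [m for m in members if m != me]
    for target in random.sample(peers, min(fanout, len(peers))):
        if target in reachable:
            continue  # direct ping succeeded, nothing to do
        # Direct ping failed: ask a few other members to ping on our behalf.
        helpers = random.sample([p for p in peers if p != target],
                                min(indirect_k, len(peers) - 1))
        ok = any(h in reachable and target in reachable for h in helpers)
        if not ok:
            suspects.add(target)  # nobody could reach it: suspect, not dead
    return suspects
```

&lt;p&gt;The important property: a single failed ping never convicts a node; only a node that several members all fail to reach gets marked suspect, which keeps flaky links from generating membership churn.&lt;/p&gt;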

&lt;p&gt;With membership worked out, we run QUIC between nodes in the cluster to broadcast changes and to reconcile state for new nodes.&lt;/p&gt;

&lt;p&gt;Corrosion looks like a globally synchronized database. You can open it with SQLite and just read things out of its tables. What makes it interesting is what it doesn&amp;rsquo;t do: no locking, no central servers, and no distributed consensus. Instead, we exploit our orchestration model: workers own their own state, so updates from different workers almost never conflict.&lt;/p&gt;

&lt;p&gt;We do impose some order. Every node in a Corrosion cluster will eventually receive the same set of updates, in some order. To ensure every instance arrives at the same &amp;ldquo;working set&amp;rdquo; picture, we use &lt;a href='https://github.com/vlcn-io/cr-sqlite' title=''&gt;cr-sqlite, the CRDT SQLite extension&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;cr-sqlite works by marking specified SQLite tables as CRDT-managed. For these tables, changes to any column of a row are logged in a special &lt;code&gt;crsql_changes&lt;/code&gt; table. Updates to tables are applied last-write-wins using logical timestamps (that is, causal ordering rather than wall-clock ordering). &lt;a href='https://github.com/superfly/corrosion/blob/main/doc/crdts.md' title=''&gt;You can read much more about how that works here&lt;/a&gt;.&lt;/p&gt;
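&lt;p&gt;The last-write-wins rule is easy to model. A toy version (our own names, not cr-sqlite&amp;rsquo;s actual storage format): each cell carries a logical version plus a site id, the higher version wins, and the site id is a deterministic tiebreaker so every node converges on the same answer:&lt;/p&gt;

```python
# Toy last-write-wins column merge in the spirit of cr-sqlite. Each cell
# carries a logical clock (db_version) and the id of the site that wrote
# it. This sketches the merge rule, not cr-sqlite's on-disk format.

def merge_cell(local, incoming):
    """Each argument is (value, db_version, site_id); return the winner."""
    _, local_ver, local_site = local
    _, inc_ver, inc_site = incoming
    if inc_ver > local_ver:
        return incoming
    if local_ver > inc_ver:
        return local
    # Equal versions: break the tie deterministically by site id, so
    # every node picks the same winner regardless of arrival order.
    return incoming if inc_site > local_site else local
```

&lt;p&gt;Because the tiebreaker depends only on the data, nodes can receive the same updates in different orders and still end up with identical rows.&lt;/p&gt;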

&lt;p&gt;As rows are updated in Corrosion&amp;rsquo;s ordinary SQL tables, the resulting changes are collected from &lt;code&gt;crsql_changes&lt;/code&gt;. They&amp;rsquo;re bundled into batched update packets and gossiped.&lt;/p&gt;

&lt;p&gt;When things are going smoothly, Corrosion is easy to reason about. Many customers of Corrosion&amp;rsquo;s data don&amp;rsquo;t even need to know it exists, just where the database is. We don&amp;rsquo;t fret over &amp;ldquo;leader elections&amp;rdquo; or bite our nails watching metrics for update backlogs. And it&amp;rsquo;s fast as all get-out.&lt;/p&gt;
&lt;h2 id='shit-happens' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#shit-happens' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Shit Happens&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;This is a story about how we made one good set of engineering decisions and &lt;a href='https://how.complexsystems.fail/' title=''&gt;never experienced any problems&lt;/a&gt;. &lt;a href='https://www.somethingsimilar.com/2013/01/14/notes-on-distributed-systems-for-young-bloods/' title=''&gt;Please clap&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We told you already about the worst problem Corrosion was involved with: efficiently gossiping a deadlock bug to every proxy in our fleet, shutting our whole network down. Really, Corrosion was just a bystander for that outage. But it perpetrated others.&lt;/p&gt;

&lt;p&gt;Take a classic ops problem: the unexpectedly expensive DDL change. You wrote a simple migration, tested it, merged it to main, and went to bed, wrongly assuming the migration wouldn&amp;rsquo;t cause an outage when it ran in prod. Happens to the best of us.&lt;/p&gt;

&lt;p&gt;Now spice it up. You made a trivial-seeming schema change to a CRDT table hooked up to a global gossip system. Now, when the deploy runs, thousands of high-powered servers around the world join a chorus of database reconciliation messages that melts down the entire cluster.&lt;/p&gt;

&lt;p&gt;That happened to us last year when a team member added a nullable column to a Corrosion table. New nullable columns are kryptonite to large Corrosion tables: &lt;code&gt;cr-sqlite&lt;/code&gt; needs to backfill values for every row in the table. It played out as if every Fly Machine on our platform had suddenly changed state simultaneously, just to fuck us.&lt;/p&gt;

&lt;p&gt;Gnarlier war story: for a long time we ran both Corrosion and Consul, because two distributed systems means twice the resiliency. One morning, a Consul mTLS certificate expired. Every worker in our fleet severed its connection to Consul.&lt;/p&gt;

&lt;p&gt;We should have been fine. We had Corrosion running. Except: under the hood, every worker in the fleet is doing a backoff loop trying to reestablish connectivity to Consul. Each of those attempts re-invokes a code path to update Fly Machine state. That code path incurs a Corrosion write.&lt;/p&gt;

&lt;p&gt;By the time we&amp;rsquo;ve figured out what the hell is happening, we&amp;rsquo;re literally saturating our uplinks almost everywhere in our fleet. We apologize to our uplink providers.&lt;/p&gt;

&lt;p&gt;It&amp;rsquo;s been a long time since anything like this has happened at Fly.io, but preventing the next one is basically all we think about anymore.&lt;/p&gt;
&lt;h2 id='iteration' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#iteration' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Iteration&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;In retrospect, our Corrosion rollout repeated a mistake we made with Consul: we built a single global state domain. Nothing about Corrosion&amp;rsquo;s design required us to do this, and we&amp;rsquo;re unwinding that decision now. Hold that thought. We got some big payoffs from some smaller lifts.&lt;/p&gt;

&lt;p&gt;First, and most importantly, we watchdogged everything. We showed you a contagious deadlock bug, lethal because our risk model was missing &amp;ldquo;these Tokio programs might deadlock&amp;rdquo;. Not anymore. Our &lt;a href='https://tokio.rs/' title=''&gt;Tokio programs&lt;/a&gt; all have built-in watchdogs; an event-loop stall will bounce the service and make a king-hell alerting racket. Watchdogs have cancelled multiple outages. Minimal code, easy win. Do this in your own systems.&lt;/p&gt;

&lt;p&gt;Then, we extensively tested Corrosion itself. We&amp;rsquo;ve written about &lt;a href='https://fly.io/blog/parking-lot-ffffffffffffffff/' title=''&gt;a bug we found in the Rust &lt;code&gt;parking_lot&lt;/code&gt; library&lt;/a&gt;. We spent months looking for similar bugs &lt;a href='https://antithesis.com/product/how_antithesis_works/' title=''&gt;with Antithesis&lt;/a&gt;. Again: do recommend. It retraced our steps on the &lt;code&gt;parking_lot&lt;/code&gt; bug easily; the bug wouldn&amp;rsquo;t have been worth the blog post if we&amp;rsquo;d been using Antithesis at the time. &lt;a href='https://antithesis.com/docs/multiverse_debugging/overview/' title=''&gt;Multiverse debugging&lt;/a&gt; is killer for distributed systems.&lt;/p&gt;

&lt;p&gt;No amount of testing will make us trust a distributed system. So we&amp;rsquo;ve made it simpler to rebuild Corrosion&amp;rsquo;s database from our workers. We keep checkpoint backups of the Corrosion database on object storage. That was smart of us. When shit truly went haywire last year, we had the option to reboot the cluster, which is ultimately what we did. That eats some time (the database is large and propagating is expensive), but diagnosing and repairing distributed systems mishaps takes even longer.&lt;/p&gt;

&lt;p&gt;We&amp;rsquo;ve also improved the way our workers feed Corrosion. Until recently, any time a worker updated its local database, we published the same incremental update to Corrosion. &lt;a href='https://community.fly.io/t/self-healing-machine-state-synchronization-and-service-discovery/26134' title=''&gt;But now we&amp;rsquo;ve eliminated partial updates.&lt;/a&gt; Instead, when a Fly Machine changes, we re-publish the entire data set for the Machine. Because of how Corrosion resolves changes to its own rows, the node receiving the re-published Fly Machine automatically filters out the no-op changes before gossiping them. Eliminating partial updates forecloses a bunch of bugs (and, we think, kills off a couple sneaky ones we&amp;rsquo;ve been chasing). We should have done it this way to begin with.&lt;/p&gt;
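&lt;p&gt;The no-op filtering falls out of ordinary change resolution: diff the re-published full record against current state, and only the columns that actually changed survive to be gossiped. A sketch, with illustrative column names rather than Corrosion&amp;rsquo;s real schema:&lt;/p&gt;

```python
# Sketch of "publish the whole Machine, gossip only what changed": diff
# the re-published full record against current state and keep only the
# columns whose values differ. Column names here are illustrative.

def changed_columns(current, republished):
    """Return only the (column, value) pairs that differ from current state."""
    return {col: val for col, val in republished.items()
            if current.get(col) != val}
```

&lt;p&gt;Re-publishing everything and letting the receiver drop the no-ops is idempotent, which is exactly the property partial updates lacked.&lt;/p&gt;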

&lt;p&gt;Finally, let&amp;rsquo;s revisit that global state problem. After the contagious deadlock bug, we concluded we need to evolve past a single cluster. So we took on a project we call &amp;ldquo;regionalization&amp;rdquo;, which creates a two-level database scheme. Each region we operate in runs a Corrosion cluster with fine-grained data about every Fly Machine in the region. The global cluster then maps applications to regions, which is sufficient to make forwarding decisions at our edge proxies.&lt;/p&gt;

&lt;p&gt;Regionalization reduces the blast radius of state bugs. Most things we track don&amp;rsquo;t have to matter outside their region (importantly, most of the code changes to what we track are also region-local). We can roll out changes to this kind of code in ways that, worst case, threaten only a single region.&lt;/p&gt;
&lt;h2 id='the-new-system-works' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-new-system-works' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;The New System Works&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Most distributed systems have state synchronization challenges.  Corrosion has a different &amp;ldquo;shape&amp;rdquo; than most of those systems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It doesn&amp;rsquo;t rely on distributed consensus, like &lt;a href='https://github.com/hashicorp/consul' title=''&gt;Consul&lt;/a&gt;, &lt;a href='https://zookeeper.apache.org/' title=''&gt;Zookeeper&lt;/a&gt;, &lt;a href='https://etcd.io/' title=''&gt;Etcd&lt;/a&gt;, &lt;a href='https://www.cockroachlabs.com/docs/stable/architecture/replication-layer' title=''&gt;Raft&lt;/a&gt;, or &lt;a href='https://rqlite.io/' title=''&gt;rqlite&lt;/a&gt; (which we came very close to using).
&lt;/li&gt;&lt;li&gt;It doesn&amp;rsquo;t rely on a large-scale centralized data store, like &lt;a href='https://www.foundationdb.org/' title=''&gt;FoundationDB&lt;/a&gt; or databases backed by S3-style object storage.
&lt;/li&gt;&lt;li&gt;It&amp;rsquo;s nevertheless highly distributed (each of thousands of workers runs a node), converges quickly (in seconds), and presents as a simple SQLite database. Neat!
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;It wasn&amp;rsquo;t easy getting here. Corrosion is a large part of what every engineer at Fly.io who writes Rust works on.&lt;/p&gt;

&lt;p&gt;Part of what&amp;rsquo;s making Corrosion work is that we&amp;rsquo;re careful about what we put into it. Not every piece of state we manage needs gossip propagation. &lt;code&gt;tkdb&lt;/code&gt;, the backend for &lt;a href='https://fly.io/blog/macaroons-escalated-quickly/' title=''&gt;our Macaroon tokens&lt;/a&gt;, is a much simpler SQLite service backed by &lt;a href='https://litestream.io/' title=''&gt;Litestream&lt;/a&gt;. So is Pet Sematary, the secret store we built to replace HashiCorp Vault.&lt;/p&gt;

&lt;p&gt;Still, there are probably lots of distributed state problems that want something more like a link-state routing protocol and less like a distributed database. If you think you might have one of those, &lt;a href='https://github.com/superfly/corrosion' title=''&gt;feel free to take Corrosion for a spin&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Corrosion is Jérôme Gravel-Niquet&amp;rsquo;s brainchild. For the last couple years, much of the iteration on it was led by Somtochi Onyekwere and Peter Cai. The work was alternately cortisol- and endorphin-inducing. We&amp;rsquo;re glad to finally get to talk about it in detail.&lt;/p&gt;</content>
    </entry>
    <entry>
        <title>Kurt Got Got</title>
        <link rel="alternate" href="https://fly.io/blog/kurt-got-got/"/>
        <id>https://fly.io/blog/kurt-got-got/</id>
        <published>2025-10-08T00:00:00+00:00</published>
        <updated>2025-12-11T17:29:24+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/kurt-got-got/assets/Kurt_Got_Got.jpg"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;The $FLY Airdrop is live!  Claim your share of &lt;a href="https://fly.io/blog/macaroons-escalated-quickly/" title=""&gt;the token powering Fly.io’s global network&lt;/a&gt; of 3M+ apps and (🤮) own a piece of the sky!&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;We know. Our Twitter got owned. We knew within moments of it happening. We know exactly how it happened. Nothing was at risk other than our Twitter account (and one Fly.io employee&amp;rsquo;s self-esteem).  Also: for fuck&amp;rsquo;s sake.&lt;/p&gt;

&lt;p&gt;Here&amp;rsquo;s what happened: Kurt Mackey, our intrepid CEO, got phished.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;Had this been an impactful attack, we would not be this flippant about it. For this, though, any other tone on our part would be false.&lt;/p&gt;
&lt;/div&gt;&lt;h2 id='how-they-got-kurt' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#how-they-got-kurt' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;How They Got Kurt&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Two reasons: one, it was a pretty good phishing attack, and two, Twitter fell outside the &amp;ldquo;things we take seriously&amp;rdquo; boundary.&lt;/p&gt;

&lt;p&gt;The phishing attack was effective because it exploited a deep psychological vulnerability in our management team: we are old and out of touch with the youths of today.&lt;/p&gt;

&lt;p&gt;For many months now, we&amp;rsquo;ve had a contractor/intern-type-person Boosting Our Brand on Twitter by posting dank developer memes (I think that&amp;rsquo;s what they&amp;rsquo;re called). The thing about this dankery is that we don&amp;rsquo;t really understand it. I mean, hold on, we know what the memes mean technically. We just don&amp;rsquo;t get why they&amp;rsquo;re funny.&lt;/p&gt;

&lt;p&gt;However, in pushing back on them, we&amp;rsquo;re up against two powerful forces:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The dank memes appear to perform better than the stuff we ourselves write on Twitter.
&lt;/li&gt;&lt;li&gt;We are reliably informed by our zoomer children that we are too cringe to be trusted on these matters.
&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;Here&amp;rsquo;s the phish Kurt got:&lt;/p&gt;

&lt;p&gt;&lt;img alt="A pretty-plausible Twitter alert" src="/blog/kurt-got-got/assets/phish.png?2/3&amp;amp;center" /&gt;&lt;/p&gt;

&lt;p&gt;Diabolical. Like a scalpel expertly wielded against Kurt&amp;rsquo;s deepest &lt;a href='https://theonion.com/cool-dad-raising-daughter-on-media-that-will-put-her-en-1819572981/' title=''&gt;middle-aged-dude&lt;/a&gt; insecurity. Our ruthless attackers clinically designed this email to trigger an autonomic Kurt response: &amp;ldquo;oh, what the fuck is this, and why did we post it?&amp;rdquo;&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;ATO is cool-kid for “got owned”&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;I&amp;rsquo;m getting a little ahead of the story here. We knew our X.com account had suffered an ATO because a bunch of us simultaneously got another email saying that the &lt;a href='https://twitter.com/flydotio' title=''&gt;@flydotio&lt;/a&gt; account&amp;rsquo;s email address now pointed to &lt;code&gt;[email protected]&lt;/code&gt;. Our immediate response was to audit all accesses to the login information in &lt;a href='https://1password.com/' title=''&gt;1Password&lt;/a&gt;, to cut all access for anybody who&amp;rsquo;d recently pulled it; your worst-case assumption in a situation like this is that someone&amp;rsquo;s endpoint has been owned up.&lt;/p&gt;

&lt;p&gt;Fortunately, nobody lost access for very long. I called Kurt to let him know why he was being locked out, and 5 seconds later, he&amp;rsquo;d &lt;a href='https://archive.is/6rVqf' title=''&gt;realized what had happened.&lt;/a&gt; &lt;strong class='font-semibold text-navy-950'&gt;Don&amp;rsquo;t click anything there.&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id='why-it-worked' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#why-it-worked' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Why It Worked&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;That&amp;rsquo;s the right question to ask, isn&amp;rsquo;t it? How could this have been possible in the first place?&lt;/p&gt;

&lt;p&gt;Contrary to one popular opinion, you don&amp;rsquo;t defeat phishing by training people not to click on things. I mean, tell them not to, sure! But eventually, under continued pressure, everybody clicks. &lt;a href='https://people.cs.uchicago.edu/~grantho/papers/oakland2025_phishing-training.pdf' title=''&gt;There&amp;rsquo;s science on this&lt;/a&gt;. The cool kids haven&amp;rsquo;t done phishing simulation training in years.&lt;/p&gt;

&lt;p&gt;What you&amp;rsquo;re supposed to do instead is use phishing-resistant authentication. This is almost the whole backstory for &lt;a href='https://www.imperialviolet.org/tourofwebauthn/tourofwebauthn.html' title=''&gt;U2F,  FIDO2&lt;/a&gt; and &lt;a href='https://support.apple.com/en-us/102195' title=''&gt;Passkeys&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Phishing-resistant authentication works by mutual authentication (or, if you&amp;rsquo;re a stickler, by origin- and channel-binding). Phishes are malicious proxies for credentials. Modern MFA schemes like FIDO2 break that proxy flow; your browser won&amp;rsquo;t send real credentials to the fake site.&lt;/p&gt;
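&lt;p&gt;Origin binding is worth a toy model. In real WebAuthn, the authenticator signs over the challenge and the origin (via &lt;code&gt;clientDataJSON&lt;/code&gt;) with a public key; the HMAC below is only a stand-in to show why a response harvested at a look-alike origin never verifies at the real one:&lt;/p&gt;

```python
import hashlib
import hmac

# Toy illustration of WebAuthn-style origin binding. The client signs
# (challenge, origin) together; the relying party verifies against its
# OWN origin. Real WebAuthn uses public-key signatures; HMAC with a
# shared key is just a simple stand-in for the signing step.

def sign(key, challenge, origin):
    msg = challenge + b"|" + origin.encode()
    return hmac.new(key, msg, hashlib.sha256).digest()

def verify(key, challenge, expected_origin, signature):
    # The relying party recomputes with the origin it expects;
    # a signature minted at a phishing origin will not match.
    return hmac.compare_digest(
        sign(key, challenge, expected_origin), signature)
```

&lt;p&gt;A credential exercised on &lt;code&gt;members-x.com&lt;/code&gt; produces a response bound to that origin, so relaying it to the real &lt;code&gt;x.com&lt;/code&gt; fails. The proxy gets nothing it can replay.&lt;/p&gt;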
&lt;div class="right-sidenote"&gt;&lt;p&gt;there’s more to it than this, but, broad strokes.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;This is, in fact, how all of our infrastructure is secured at Fly.io; specifically, we get &lt;a href='https://fly.io/blog/soc2-the-screenshots-will-continue-until-security-improves/#what-soc2-made-us-do' title=''&gt;everything behind an IdP&lt;/a&gt; (in our case: Google&amp;rsquo;s) and have it require phishing-proof MFA. You&amp;rsquo;re unlikely to phish your way to viewing logs here, or to refunding a customer bill at Stripe, or to viewing infra metrics, because all these things require an SSO login through Google.&lt;/p&gt;

&lt;p&gt;Twitter, on the other hand. Yeah, so, about that. You may have heard that, a few years back, there were some goings-on involving Twitter. Many of us at Fly.io &lt;a href='https://hachyderm.io/@flydotio' title=''&gt;decamped for Mastodon&lt;/a&gt;, and &lt;a href='https://bsky.app/profile/did:plc:j7herf6n4xiig2yg7fqdmkci' title=''&gt;later to Bluesky.&lt;/a&gt; There was a window of time in 2023-2024 where it looked as if Twitter might not be a long term thing for us at all.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;† (to whom I sincerely apologize for having assumed they had been owned up and were the proximate cause of the hack)&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;As a result, Twitter had been a sort of legacy shared account for us, with credentials managed in 1Password and shared with our zoomer contractor†.&lt;/p&gt;

&lt;p&gt;Which is why Kurt was in a position to pull credentials from 1Password and log in to members-x.com in response to an email from alerts-x.com.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;Still: we could have dodged this attack with hygiene: Kurt complains that “x.com” is an extremely phishable domain, and, sure, but also: the 1Password browser plugin would have noticed that “members-x.com” wasn’t an “x.com” host.&lt;/p&gt;
&lt;/div&gt;&lt;h2 id='what-took-so-long' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-took-so-long' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;What Took So Long&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;The attacker immediately revoked all tokens and set up new 2FA, so while we were quickly able to reset our password, we couldn&amp;rsquo;t lock them out of our account without an intervention from X.com, which took something like 15 hours to set up.&lt;/p&gt;

&lt;p&gt;(That&amp;rsquo;s not a knock on X.com; 15 hours for a 2FA reset isn&amp;rsquo;t outside industry norms).&lt;/p&gt;

&lt;p&gt;We&amp;rsquo;re obviously making a lot of noise about this now, but we were pretty quiet during the incident itself (beyond just &amp;ldquo;We know. We knew 45 seconds after it happened. We know exactly how it happened. It&amp;rsquo;s just a Twitter thing.&amp;rdquo;)&lt;/p&gt;

&lt;p&gt;That&amp;rsquo;s because, in the grand scheme of things, the attack was pretty chill: &lt;a href='https://archive.is/PTO2M' title=''&gt;a not-very-plausible crypto scam&lt;/a&gt; that presumably generated $0 for the attackers, 15+ hours of &lt;code&gt;brand damage&lt;/code&gt;, and extra security engineering cycles burnt on watchful waiting. Our users weren&amp;rsquo;t under attack, and the account wasn&amp;rsquo;t being used to further intercept customer accounts. At one point, the attackers apparently deleted our whole Twitter history, which, like, don&amp;rsquo;t threaten us with a good time. So we let it roll, until we got our account recovered the next morning.&lt;/p&gt;
&lt;h2 id='the-moral-of-the-story-is' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-moral-of-the-story-is' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;The Moral Of The Story Is&lt;/span&gt;&lt;/h2&gt;&lt;div class="right-sidenote"&gt;&lt;p&gt;“Really the biggest takeaway for me is that Kurt reads his email.”&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Obviously Kurt loses his commit access. The time comes in the life of every CEO, and now it comes for him. &lt;/p&gt;

&lt;p&gt;Also, we&amp;rsquo;ll finally have a population sample for &amp;ldquo;incident response&amp;rdquo; in &lt;a href='https://fly.io/blog/soc2-the-screenshots-will-continue-until-security-improves/' title=''&gt;our next SOC2&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Maybe we&amp;rsquo;ll post more on Twitter. Or maybe we&amp;rsquo;ll double down on Zoomer memes. I don&amp;rsquo;t know. Social media is really weird right now. Either way: our Twitter access is Passkeys now.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;seriously don’t click anything on that page&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;If you were inclined to take us up on an &amp;ldquo;airdrop&amp;rdquo; to &amp;ldquo;claim a share&amp;rdquo; of the &amp;ldquo;token&amp;rdquo; powering Fly.io, the site is &lt;a href='https://archive.is/PTO2M' title=''&gt;still up&lt;/a&gt;. You can connect your wallet to it! You&amp;rsquo;ll lose all your money. But if we&amp;rsquo;d actually done an ICO, you&amp;rsquo;d have lost all your money anyways.&lt;/p&gt;

&lt;p&gt;Somebody involved in pulling this attack off had to come up with &amp;ldquo;own a piece of the sky!&amp;rdquo;, and I think that&amp;rsquo;s punishment enough for them.&lt;/p&gt;

&lt;p&gt;Whatever you&amp;rsquo;re operating that isn&amp;rsquo;t behind phishing-resistant MFA, or, better yet, an SSO IdP that requires phishing-resistant MFA: that thing is eventually going to get phished. Dance around the clown-fire of our misfortune if you must, but let us be a lesson to you as well.&lt;/p&gt;</content>
    </entry>
    <entry>
        <title>Litestream v0.5.0 is Here</title>
        <link rel="alternate" href="https://fly.io/blog/litestream-v050-is-here/"/>
        <id>https://fly.io/blog/litestream-v050-is-here/</id>
        <published>2025-10-02T00:00:00+00:00</published>
        <updated>2025-10-02T18:28:21+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/litestream-v050-is-here/assets/litestream-v050-is-here.jpg"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;&lt;strong class="font-semibold text-navy-950"&gt;I’m Ben Johnson, and I work on Litestream at Fly.io. Litestream makes it easy to build SQLite-backed full-stack applications  with resilience to server failure. It’s open source, runs anywhere, and&lt;/strong&gt; &lt;a href="https://litestream.io/" title=""&gt;&lt;strong class="font-semibold text-navy-950"&gt;it’s easy to get started&lt;/strong&gt;&lt;/a&gt;&lt;strong class="font-semibold text-navy-950"&gt;.&lt;/strong&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Litestream is the missing backup/restore system for SQLite. It runs as a sidecar process in the background, alongside unmodified SQLite applications, intercepting WAL checkpoints and streaming them to object storage in real time. Your application doesn&amp;rsquo;t even know it&amp;rsquo;s there. But if your server crashes, Litestream lets you quickly restore the database to your new hardware.&lt;/p&gt;
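&lt;p&gt;Getting that sidecar running takes very little configuration. A minimal sketch (bucket and paths are placeholders; check the Litestream docs for current options):&lt;/p&gt;

```yaml
# litestream.yml: replicate one SQLite database to S3-style object storage.
dbs:
  - path: /data/app.db
    replicas:
      - url: s3://my-bucket/app-db
```

&lt;p&gt;Run &lt;code&gt;litestream replicate -config litestream.yml&lt;/code&gt; alongside your app; after a crash, &lt;code&gt;litestream restore -o /data/app.db s3://my-bucket/app-db&lt;/code&gt; rebuilds the database on new hardware.&lt;/p&gt;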

&lt;p&gt;The result: you can safely build whole full-stack applications on top of SQLite.&lt;/p&gt;

&lt;p&gt;A few months back, we announced &lt;a href='https://fly.io/blog/litestream-revamped/' title=''&gt;plans for a major update to Litestream&lt;/a&gt;. I&amp;rsquo;m psyched to announce that the first batch of those changes is now &amp;ldquo;shipping&amp;rdquo;. Litestream is faster and now supports efficient point-in-time recovery (PITR).&lt;/p&gt;

&lt;p&gt;I&amp;rsquo;m going to take a beat to recap Litestream and how we got here, then talk about how these changes work and what you can expect to see with them.&lt;/p&gt;
&lt;h2 id='litestream-to-litefs-to-litestream' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#litestream-to-litefs-to-litestream' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Litestream to LiteFS to Litestream&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Litestream is one of two big SQLite things I&amp;rsquo;ve built. The other one, originally intended as a sort of sequel to Litestream, is LiteFS.&lt;/p&gt;

&lt;p&gt;Boiled down to a sentence: LiteFS uses a FUSE filesystem to crawl further up into SQLite&amp;rsquo;s innards, using that access to perform live replication, for unmodified SQLite-backed apps.&lt;/p&gt;

&lt;p&gt;The big deal about LiteFS for us is that it lets you do the multiregion primary/read-replica deployment people love Postgres for: reads are fast everywhere, and writes are sane and predictable. We were excited to make this possible for SQLite, too.&lt;/p&gt;

&lt;p&gt;But the market has spoken! Users prefer Litestream. And honestly, we get it: Litestream is easier to run and to reason about. So we&amp;rsquo;ve shifted our focus back to it. First order of business: &lt;a href='https://fly.io/blog/litestream-revamped/' title=''&gt;take what we learned building LiteFS and stick as much of it as we can back into Litestream&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id='the-ltx-file-format' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-ltx-file-format' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;The LTX File Format&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Consider this basic SQL table:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative sql"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-wy16kafx"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-wy16kafx"&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;sandwiches&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;INTEGER&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="n"&gt;AUTOINCREMENT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;star_rating&lt;/span&gt; &lt;span class="nb"&gt;INTEGER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;reviewer_id&lt;/span&gt; &lt;span class="nb"&gt;INTEGER&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;In our hypothetical, this table backs a wildly popular sandwich-reviewing app that we keep trying to get someone to write. People eat a lot of sandwiches and this table gets a lot of writes. Because it makes my point even better and it&amp;rsquo;s funny, assume people dither a lot about their sandwich review for the first couple minutes after they leave it. This Quiznos sub… is it ⭐ or ⭐⭐?&lt;/p&gt;

&lt;p&gt;Underneath SQLite is a B-tree. Like databases everywhere, SQLite divides storage up into disk-aligned pages, working hard to read as few pages as possible for any task while treating work done within a page as more or less free. SQLite always reads and writes in page-sized chunks.&lt;/p&gt;

&lt;p&gt;Our &lt;code&gt;sandwiches&lt;/code&gt; table includes a feature that&amp;rsquo;s really painful for a tool like Litestream that thinks in pages: an automatically updating primary key. That key dictates that every insert into the table hits the rightmost leaf page in the underlying table B-tree. For SQLite itself, that&amp;rsquo;s no problem. But Litestream has less information to go on: it sees only a feed of whole pages it needs to archive.&lt;/p&gt;
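&lt;p&gt;To make the cost concrete, here&amp;rsquo;s a back-of-envelope model in Python. The page and row sizes are illustrative assumptions, not SQLite&amp;rsquo;s real on-disk layout; the point is the write amplification of archiving a whole page for every tiny insert.&lt;/p&gt;

```python
# Toy model of page-level churn under sequential AUTOINCREMENT inserts.
# PAGE_SIZE and ROW_SIZE are illustrative assumptions, not SQLite's layout.
PAGE_SIZE = 4096
ROW_SIZE = 64  # rough bytes per sandwich review

def pages_archived(n_inserts, commits_per_insert=1):
    """Each commit forces a page-granular replicator to archive the whole
    dirty page, even though only ROW_SIZE bytes actually changed."""
    bytes_changed = n_inserts * ROW_SIZE
    bytes_archived = n_inserts * commits_per_insert * PAGE_SIZE
    return bytes_changed, bytes_archived

changed, archived = pages_archived(1000)
print(f"logical change: {changed} B, archived: {archived} B, "
      f"amplification: {archived // changed}x")  # 64x amplification
```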

&lt;p&gt;Worse still, when it comes time to restore the database – something you tend to want to happen quickly – you have to individually apply those small changes, as whole pages. Your app is down, PagerDuty is freaking out, and you&amp;rsquo;re sitting there watching Litestream reconstruct your Quiznos uncertainty a page (and an S3 fetch) at a time.&lt;/p&gt;

&lt;p&gt;So, LTX. Let me explain. We needed LiteFS to be transaction-aware. It relies on finer-grained information than just raw dirty pages (that&amp;rsquo;s why it needs the FUSE filesystem). To ship transactions, rather than pages, we invented a &lt;a href='https://github.com/superfly/ltx' title=''&gt;file format we call LTX&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;LTX was designed as an interchange format for transactions, but for our purposes in Litestream, all we care about is that LTX files represent ordered ranges of pages, and that the format supports compaction.&lt;/p&gt;

&lt;p&gt;Compaction is straightforward. You&amp;rsquo;ve stored a bunch of LTX files that collect numbered pages. Now you want to restore a coherent picture of the database. Just replay them newest to oldest, skipping duplicate pages (newer wins), until all changed pages are accounted for.&lt;/p&gt;
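&lt;p&gt;That replay rule fits in a few lines of Python. This is a sketch over a hypothetical in-memory representation (a dict of page number to page bytes per file); real LTX files are binary.&lt;/p&gt;

```python
def compact(ltx_files):
    """Merge LTX files into one page set: newest-to-oldest, newer page wins.
    Each file is modeled as a dict of page_number -> page_bytes, and the
    input list is ordered oldest-to-newest (a simplifying assumption)."""
    merged = {}
    for pages in reversed(ltx_files):      # newest first
        for pgno, data in pages.items():
            merged.setdefault(pgno, data)  # skip pages a newer file already supplied
    return merged

files = [
    {1: b"old-1", 2: b"old-2"},   # oldest
    {2: b"new-2", 3: b"new-3"},   # newest
]
print(compact(files))  # page 2 comes from the newest file
```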

&lt;p&gt;Importantly, LTX isn&amp;rsquo;t limited to whole database backups. We can use LTX compaction to compress a bunch of LTX files into a single file with no duplicated pages. And Litestream now uses this capability to create a hierarchy of compactions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;at Level 1, we compact all the changes in a 30-second time window
&lt;/li&gt;&lt;li&gt;at Level 2, all the Level 1 files in a 5-minute window
&lt;/li&gt;&lt;li&gt;at Level 3, all the Level 2&amp;rsquo;s over an hour.
&lt;/li&gt;&lt;/ul&gt;
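&lt;p&gt;A rough sketch of what those windows buy at restore time, assuming the restore also reads one snapshot (the snapshot cadence here is an assumption for illustration):&lt;/p&gt;

```python
# Back-of-envelope: how many compacted files a point-in-time restore reads,
# under the 30-second / 5-minute / 1-hour windows described above.
# The "+1 snapshot" starting point is an assumption for illustration.
def files_to_restore(seconds_since_snapshot):
    hours, rem = divmod(seconds_since_snapshot, 3600)  # Level 3 files
    five_mins, rem = divmod(rem, 300)                  # Level 2 files
    thirty_secs, _ = divmod(rem, 30)                   # Level 1 files
    return 1 + hours + five_mins + thirty_secs         # +1 for the snapshot

# Restoring to 1h 12m 45s after the last snapshot:
print(files_to_restore(3600 + 12 * 60 + 45))  # -> 9
```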

&lt;p&gt;Net result: we can restore a SQLite database to any point in time, &lt;em&gt;using only a dozen or so files on average&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Litestream performs this compaction itself. It doesn&amp;rsquo;t rely on SQLite to process the WAL file. Performance is limited only by I/O throughput.&lt;/p&gt;
&lt;h2 id='no-more-generations' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#no-more-generations' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;No More Generations&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;What people like about Litestream is that it&amp;rsquo;s just an ordinary Unix program. But like any Unix program, Litestream can crash. It&amp;rsquo;s not supernatural, so when it&amp;rsquo;s not running, it&amp;rsquo;s not seeing database pages change. When it misses changes, it falls out of sync with the database.&lt;/p&gt;

&lt;p&gt;Lucky for us, that&amp;rsquo;s easy to detect. When it notices a gap between the database and our running &amp;ldquo;shadow-WAL&amp;rdquo; backup, Litestream resynchronizes from scratch.&lt;/p&gt;

&lt;p&gt;The only time this gets complicated is if you have multiple Litestreams backing up to the same destination. To keep multiple Litestreams from stepping on each other, Litestream divides backups into &amp;ldquo;generations&amp;rdquo;, creating a new one any time it resyncs. You can think of generations as Marvel Cinematic Universe parallel dimensions your database might simultaneously be living in.&lt;/p&gt;

&lt;p&gt;Yeah, we didn&amp;rsquo;t like those movies much either.&lt;/p&gt;

&lt;p&gt;LTX-backed Litestream does away with the concept entirely. Instead, when we detect a break in WAL file continuity, we re-snapshot with the next LTX file. Now we have a monotonically incrementing transaction ID. We can use it to look up database state at any point in time, without searching across generations.&lt;/p&gt;
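&lt;p&gt;A toy model of the continuity check, assuming each LTX file covers a contiguous TXID range (the class and method names here are invented for illustration):&lt;/p&gt;

```python
class Stream:
    """Toy model: accept contiguous LTX ranges, re-snapshot on any gap.
    Not Litestream's actual implementation; just the TXID continuity idea."""
    def __init__(self):
        self.last_txid = 0
        self.snapshots = 0

    def ingest(self, min_txid, max_txid):
        if min_txid != self.last_txid + 1:
            self.snapshots += 1   # break detected: start from a fresh snapshot
        self.last_txid = max_txid

s = Stream()
s.ingest(1, 10); s.ingest(11, 20)   # contiguous: keep streaming
s.ingest(25, 30)                    # gap: Litestream missed transactions
print(s.snapshots)  # -> 1
```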
&lt;h2 id='upgrading-to-litestream-v0-5-0' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#upgrading-to-litestream-v0-5-0' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Upgrading to Litestream v0.5.0&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Due to the file format changes, the new version of Litestream can&amp;rsquo;t restore from old v0.3.x WAL segment files.&lt;/p&gt;

&lt;p&gt;That&amp;rsquo;s OK though! The upgrade process is simple: just start using the new version. It&amp;rsquo;ll leave your old WAL files intact, in case you ever need to revert to the older version. The new LTX files are stored cleanly in an &lt;code&gt;ltx&lt;/code&gt; directory on your replica.&lt;/p&gt;

&lt;p&gt;The configuration file is fully backwards compatible.&lt;/p&gt;

&lt;p&gt;There&amp;rsquo;s one small catch: we&amp;rsquo;ve added a new constraint. You only get a single replica destination per database. This probably won&amp;rsquo;t affect you, since it&amp;rsquo;s how most people use Litestream already; we&amp;rsquo;ve just made it official.&lt;/p&gt;

&lt;p&gt;The rationale: having a single source of truth simplifies development for us, and makes the tool easier to reason about. Multiple replicas can diverge and are sensitive to network availability. Conflict resolution is brain surgery.&lt;/p&gt;

&lt;p&gt;Litestream commands still work the same. But you&amp;rsquo;ll see references to &amp;ldquo;transaction IDs&amp;rdquo; (TXID) for LTX files, rather than the &lt;code&gt;generation/index/offset&lt;/code&gt; we used previously with WAL segments.&lt;/p&gt;

&lt;p&gt;We&amp;rsquo;ve also changed &lt;code&gt;litestream wal&lt;/code&gt; to &lt;code&gt;litestream ltx&lt;/code&gt;.&lt;/p&gt;
&lt;h2 id='other-stuff-v0-5-0-does-better' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#other-stuff-v0-5-0-does-better' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Other Stuff v0.5.0 Does Better&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;We&amp;rsquo;ve beefed up the &lt;a href='https://github.com/superfly/ltx' title=''&gt;underlying LTX file format library&lt;/a&gt;. It used to be an LTX file was just a sorted list of pages, all compressed together. Now we compress per-page, and keep an index at the end of the LTX file to pluck individual pages out.&lt;/p&gt;
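&lt;p&gt;The random-access idea looks roughly like this sketch. The layout below (a JSON index plus a length trailer) is invented for illustration and is not the real LTX binary format; it only shows how a trailing index lets a reader pluck one page out without decompressing the rest.&lt;/p&gt;

```python
import io, json, struct, zlib

def write_file(pages):
    """Compress each page separately and append an offset index at the end.
    (Illustrative layout only: JSON index + 4-byte length trailer, not LTX.)"""
    buf = io.BytesIO()
    index = {}
    for pgno, data in pages.items():
        blob = zlib.compress(data)
        index[pgno] = (buf.tell(), len(blob))  # where this page's blob lives
        buf.write(blob)
    idx = json.dumps(index).encode()
    buf.write(idx)
    buf.write(struct.pack("!I", len(idx)))     # trailer: index length
    return buf.getvalue()

def read_page(raw, pgno):
    """Read the trailer, then the index, then exactly one page's blob."""
    idx_len = struct.unpack("!I", raw[-4:])[0]
    index = json.loads(raw[-4 - idx_len:-4])
    off, size = index[str(pgno)]               # JSON keys are strings
    return zlib.decompress(raw[off:off + size])

raw = write_file({1: b"a" * 4096, 2: b"b" * 4096})
print(read_page(raw, 2)[:4])  # -> b'bbbb'
```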

&lt;p&gt;You&amp;rsquo;re not seeing it yet, but we&amp;rsquo;re excited about this change: we can operate page-granularly even when dealing with large LTX files. That opens the door to new features. A good example: querying the database at any point in time, without downloading the whole thing.&lt;/p&gt;

&lt;p&gt;We&amp;rsquo;ve also gone back through old issues &amp;amp; PRs to improve quality-of-life. CGO is now gone. We&amp;rsquo;ve settled the age-old contest between &lt;code&gt;mattn/go-sqlite3&lt;/code&gt; and &lt;code&gt;modernc.org/sqlite&lt;/code&gt; in favor of &lt;code&gt;modernc.org&lt;/code&gt;. This is super handy for people with automated build systems that want to run from a MacBook but deploy on an x64 server, since it lets the cross-compiler work.&lt;/p&gt;

&lt;p&gt;We&amp;rsquo;ve also added a replica type for NATS JetStream. Users that already have JetStream running can get Litestream going without adding an object storage dependency.&lt;/p&gt;

&lt;p&gt;And finally, we&amp;rsquo;ve upgraded all our clients (S3, Google Storage, &amp;amp; Azure Blob Storage) to their latest versions. We&amp;rsquo;ve also moved our code to support newer S3 APIs.&lt;/p&gt;
&lt;h2 id='whats-next' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#whats-next' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;What&amp;rsquo;s next?&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;The next major feature we&amp;rsquo;re building out is a Litestream VFS for read replicas. This will let you instantly spin up a copy of the database and immediately read pages from S3 while the rest of the database is hydrating in the background.&lt;/p&gt;

&lt;p&gt;We already have a proof of concept working and we&amp;rsquo;re excited to show it off when it&amp;rsquo;s ready!&lt;/p&gt;</content>
    </entry>
    <entry>
        <title>Build Better Agents With MorphLLM</title>
        <link rel="alternate" href="https://fly.io/blog/build-better-agents-with-morphllm/"/>
        <id>https://fly.io/blog/build-better-agents-with-morphllm/</id>
        <published>2025-08-25T00:00:00+00:00</published>
        <updated>2025-09-03T19:05:57+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/build-better-agents-with-morphllm/assets/morphllm.webp"/>
        <content type="html">&lt;p&gt;I&amp;rsquo;m an audiophile, which is a nice way to describe someone who spends their children&amp;rsquo;s college fund on equipment that yields no audible improvement in sound quality. As such, I refused to use wireless headphones for the longest time. The fun thing about wired headphones is when you forget they&amp;rsquo;re on and you stand up, you simultaneously cause irreparable neck injuries and extensive property damage. This eventually prompted me to buy good wireless headphones and, you know what, I break fewer things now. I can also stand up from my desk and not be exposed to the aural horrors of the real world. &lt;/p&gt;

&lt;p&gt;This is all to say, sometimes you don&amp;rsquo;t know how big a problem is until you solve it. This week, I chatted to the fine people building &lt;a href='https://morphllm.com/' title=''&gt;MorphLLM&lt;/a&gt;, which is exactly that kind of solution for AI agent builders. &lt;/p&gt;
&lt;h2 id='slow-wasteful-and-expensive-ai-code-changes' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#slow-wasteful-and-expensive-ai-code-changes' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Slow, Wasteful and Expensive AI Code Changes&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;If you’re building AI agents that write or edit code, you’re probably accepting the following as &amp;ldquo;the way it is&amp;rdquo;: your agent needs to correct a single line of code, but rewrites an entire file to do it. Search-and-replace, right? It’s fragile, breaks formatting, fails silently, or straight up leaves important functions out. The result is slow, inaccurate code changes, excessive token use, and an agent that feels incompetent and unreliable.&lt;/p&gt;

&lt;p&gt;Full file rewrites are context-blind and prone to hallucinations, especially when editing that 3000+ line file that you&amp;rsquo;ve been meaning to refactor. And every failure and iteration is wasted compute, wasted money and worst of all, wasted time.&lt;/p&gt;
&lt;h2 id='why-we-arent-thinking-about-this-or-why-i-wasnt' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#why-we-arent-thinking-about-this-or-why-i-wasnt' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Why We Aren’t Thinking About This (or why I wasn&amp;rsquo;t)&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;AI workflows are still new to everyone. Best practices are still just opinions and most tooling is focused on model quality, not developer velocity or cost. This is a big part of why we feel that slow, wasteful code edits are just the price of admission for AI-powered development.&lt;/p&gt;

&lt;p&gt;In reality, these inefficiencies become a real bottleneck for coding agent tools. The hidden tax on every code edit adds up and your users pay with their time, especially as teams scale and projects grow more complex.&lt;/p&gt;
&lt;h2 id='better-faster-ai-code-edits-with-morph-fast-apply' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#better-faster-ai-code-edits-with-morph-fast-apply' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Better, Faster AI Code Edits with Morph Fast Apply&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;MorphLLM&amp;rsquo;s core innovation is Morph Fast Apply. It&amp;rsquo;s an edit merge tool that is semantic, structure-aware, and designed specifically for code. Those are big words to describe a tool that empowers your agents to make single-line changes without rewriting whole files or relying on brittle search-and-replace. Instead, your agent applies precise, context-aware edits, and it does it ridiculously fast.&lt;/p&gt;

&lt;p&gt;It works like this: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You add an &lt;code&gt;edit_file&lt;/code&gt; tool to your agent&amp;rsquo;s toolset.
&lt;/li&gt;&lt;li&gt;Your agent outputs tiny &lt;code&gt;edit_file&lt;/code&gt; snippets, using &lt;code&gt;//...existing code...&lt;/code&gt; placeholders to indicate unchanged code.
&lt;/li&gt;&lt;li&gt;Your backend calls Morph’s Apply API, which merges the changes semantically. It doesn&amp;rsquo;t just replace raw text; it makes targeted merges with the codebase as context. 
&lt;/li&gt;&lt;li&gt;You write back the precisely edited file. No manual patching, no painful conflict resolution, no context lost.
&lt;/li&gt;&lt;/ul&gt;
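&lt;p&gt;To make the flow concrete, here&amp;rsquo;s a deliberately crude, text-anchored stand-in for the merge step. Morph&amp;rsquo;s real Fast Apply is a semantic, model-driven merge; nothing below is their actual algorithm. It only shows the shape of an &lt;code&gt;edit_file&lt;/code&gt; snippet with placeholders, and the file names and code are made up.&lt;/p&gt;

```python
# Crude text-anchored merge, standing in for the semantic Apply step.
PLACEHOLDER = "// ...existing code..."

def naive_apply(original: str, snippet: str) -> str:
    """Replace the function whose signature line matches the edited chunk's
    first line, from that line through its closing brace."""
    chunk = snippet.replace(PLACEHOLDER, "").strip()
    orig = original.splitlines()
    new = chunk.splitlines()
    start = orig.index(new[0])             # anchor on the unchanged signature
    depth, end = 0, start
    for i in range(start, len(orig)):      # walk to the function's closing brace
        depth += orig[i].count("{") - orig[i].count("}")
        if depth == 0:
            end = i
            break
    return "\n".join(orig[:start] + new + orig[end + 1:])

original = """function starRating(review) {
  return clamp(review.stars, 0, 5);
}

module.exports = { starRating };"""

snippet = """// ...existing code...
function starRating(review) {
  return clamp(review.stars, 1, 5);
}
// ...existing code..."""

print(naive_apply(original, snippet))
```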
&lt;h2 id='the-numbers' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-numbers' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;The Numbers&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;MorphLLM&amp;rsquo;s Apply API processes over 4,500 tokens per second and their benchmark results are nuts. We&amp;rsquo;re talking 98% accuracy in ~6 seconds per file. Compare this to 35s (with error corrections) at 86% accuracy for traditional search-and-replace systems. Files up to 9k tokens in size take ~4 seconds to process. &lt;/p&gt;

&lt;p&gt;Just look at the damn &lt;a href='https://morphllm.com/benchmarks' title=''&gt;graph&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;&lt;img alt="Time Performance Analysis" src="/blog/build-better-agents-with-morphllm/assets/morph_graph.webp" /&gt;&lt;/p&gt;

&lt;p&gt;These are game-changing numbers for agent builders. Real-time code UIs become possible. Dynamic codebases can self-adapt in seconds, not minutes. Scale to multi-file edits, documentation, and even large asset transformations without sacrificing speed or accuracy.&lt;/p&gt;
&lt;h2 id='how-to-get-in-on-the-morphllm-action' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#how-to-get-in-on-the-morphllm-action' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;How to Get in on the MorphLLM Action&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Integration with your project is easy peasy. MorphLLM is API-compatible with OpenAI, Vercel AI SDK, MCP, and OpenRouter. You can run it in the cloud, self-host, or go on-prem with enterprise-grade guarantees. &lt;/p&gt;

&lt;p&gt;I want to cloud host mine, if only I could think of somewhere I could quickly and easily deploy wherever I want and only pay for the infra while I&amp;rsquo;m using it 😉.&lt;/p&gt;
&lt;h2 id='get-morphed' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#get-morphed' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Get Morphed&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;MorphLLM feels like a plug-in upgrade for code agent projects that will instantly make them faster and more accurate. Check out the docs, benchmarks, and integration guides at &lt;a href='https://docs.morphllm.com/' title=''&gt;docs.morphllm.com&lt;/a&gt;. Get started for free at &lt;a href="https://morphllm.com/dashboard"&gt;https://morphllm.com/dashboard&lt;/a&gt;.&lt;/p&gt;</content>
    </entry>
    <entry>
        <title>Trust Calibration for AI Software Builders</title>
        <link rel="alternate" href="https://fly.io/blog/trust-calibration-for-ai-software-builders/"/>
        <id>https://fly.io/blog/trust-calibration-for-ai-software-builders/</id>
        <published>2025-08-18T00:00:00+00:00</published>
        <updated>2025-08-19T08:30:16+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/trust-calibration-for-ai-software-builders/assets/trust_calibration.webp"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;Trust calibration is a concept from the world of human-machine interaction design, one that is super relevant to AI software builders. Trust calibration is the practice of aligning the level of trust that users have in our products with its actual capabilities. &lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;If we build things that our users trust too blindly, we risk facilitating dangerous or destructive interactions that can permanently turn users off. If they don&amp;rsquo;t trust our product enough, it will feel useless or less capable than it actually is. &lt;/p&gt;

&lt;p&gt;So what does trust calibration look like in practice and how do we achieve it? A 2023 study reviewed over 1000 papers on trust and trust calibration in human / automated systems (properly referenced at the end of this article). It holds some pretty eye-opening insights – and some inconvenient truths – for people building AI software. I&amp;rsquo;ve tried to extract just the juicy bits below.   &lt;/p&gt;
&lt;h2 id='limiting-trust' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#limiting-trust' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Limiting Trust&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Let&amp;rsquo;s begin with a critical point. There is a limit to how deeply we want users to trust our products. Designing for calibrated trust is the goal, not more trust at any cost. Shoddy trust calibration leads to two equally undesirable outcomes: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong class='font-semibold text-navy-950'&gt;Over-trust&lt;/strong&gt; causes users to rely on AI systems in situations where they shouldn&amp;rsquo;t (I told my code assistant to fix a bug in prod and went to bed).
&lt;/li&gt;&lt;li&gt;&lt;strong class='font-semibold text-navy-950'&gt;Under-trust&lt;/strong&gt; causes users to reject AI assistance even when it would be beneficial, resulting in reduced perception of value and increased user workload.
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;What does calibrated trust look like for your product? It’s important to understand that determining this is less about trying to diagram a set of abstract trust parameters and more about helping users develop accurate mental models of your product&amp;rsquo;s capabilities and limitations. In most cases, this requires thinking beyond the trust calibration mechanisms we default to, like confidence scores. &lt;/p&gt;

&lt;p&gt;For example, Cursor&amp;rsquo;s most prominent trust calibration mechanism is its change suggestion highlighting. The code that the model suggests we change is highlighted in red, followed by suggested changes highlighted in green. This  immediately communicates that &amp;ldquo;this is a suggestion, not a command.&amp;rdquo; &lt;/p&gt;

&lt;p&gt;In contrast, Tesla&amp;rsquo;s Autopilot is a delegative system. It must calibrate trust differently through detailed capability explanations, clear operational boundaries (only on highways), and prominent disengagement alerts when conditions exceed system limits. &lt;/p&gt;
&lt;h2 id='building-cooperative-systems' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#building-cooperative-systems' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Building Cooperative Systems&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Perhaps the most fundamental consideration in determining high level trust calibration objectives is deciding whether your project is designed to be a cooperative or a delegative tool.  &lt;/p&gt;

&lt;p&gt;Cooperative systems generally call for lower levels of trust because users can choose whether to accept or reject AI suggestions. But these systems also face a unique risk. It’s easy for over-trust to gradually transform user complacency into over-reliance, effectively transforming what we designed as a cooperative relationship into a delegative one, only without any of the required safeguards.&lt;/p&gt;

&lt;p&gt;If you&amp;rsquo;re building a coding assistant, content generator, or design tool, implement visible &amp;ldquo;suggestion boundaries&amp;rdquo; which make it clear when the AI is offering ideas versus making decisions. Grammarly does this well by underlining suggestions rather than auto-correcting, and showing rationale on hover. &lt;/p&gt;

&lt;p&gt;For higher-stakes interactions, consider introducing friction. Require explicit confirmation before applying AI suggestions to production code or publishing AI-generated content.&lt;/p&gt;
&lt;h2 id='building-delegative-systems' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#building-delegative-systems' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Building Delegative Systems&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;In contrast, users expect delegative systems to replace human action entirely. Blind trust in the system is a requirement for it to be considered valuable at all. &lt;/p&gt;

&lt;p&gt;If you&amp;rsquo;re building automation tools, smart scheduling, or decision-making systems, invest heavily in capability communication and boundary setting. Calendly&amp;rsquo;s smart scheduling works because it clearly communicates what it will and won&amp;rsquo;t do (I&amp;rsquo;ll find times that work for both of us vs. I&amp;rsquo;ll reschedule your existing meetings). Build robust fallback mechanisms and make system limitations prominent in your onboarding.  &lt;/p&gt;
&lt;h2 id='timing-is-everything' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#timing-is-everything' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Timing Is Everything&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;The study suggests that when we make trust calibrations is at least as important as how. There are three critical windows for trust calibration, each with its own opportunities and challenges. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong class='font-semibold text-navy-950'&gt;Pre-interaction calibration&lt;/strong&gt; happens before users engage with the system. Docs and tutorials fall into this category. Setting expectations up front can prevent initial over-trust, which is disproportionately more difficult to correct later. 
&lt;/li&gt;&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Pre-interaction calibrations could look like capability-focused onboarding that shows both successes and failures. Rather than just demonstrating perfect AI outputs, show users examples where the AI makes mistakes and how to catch them. &lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong class='font-semibold text-navy-950'&gt;During-interaction calibration&lt;/strong&gt; is trust adjustment through real-time feedback. Dynamically updated cues calibrate trust better than static displays, and calibration that adapts to user behavior outperforms systems that show every user the same fixed information. 
&lt;/li&gt;&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Build confidence indicators that are updated based on context, not just model confidence. For example, if you&amp;rsquo;re building a document AI, show higher confidence for standard document types the system has seen thousands of times, and lower confidence for unusual formats. &lt;/p&gt;
&lt;/blockquote&gt;
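&lt;p&gt;A hypothetical sketch of that kind of context-adjusted signal, where the displayed score blends model confidence with how familiar the input type is. The scaling, floor, and numbers here are invented for illustration.&lt;/p&gt;

```python
# Hypothetical context-adjusted confidence: the score the UI shows blends
# raw model confidence with input-type familiarity. All constants invented.
import math

def displayed_confidence(model_conf, times_seen):
    # familiarity saturates: 0 for never-seen formats, approaching 1 for common ones
    familiarity = 1 - math.exp(-times_seen / 1000)
    return model_conf * (0.6 + 0.4 * familiarity)

common = displayed_confidence(0.9, times_seen=50_000)  # standard document type
rare = displayed_confidence(0.9, times_seen=3)         # unusual format
print(round(common, 3), round(rare, 3))  # -> 0.9 0.541
```

Same model confidence, but the unfamiliar format is presented with visibly lower confidence, nudging the user to check the output.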

&lt;ul&gt;
&lt;li&gt;&lt;strong class='font-semibold text-navy-950'&gt;Post-interaction calibration&lt;/strong&gt; focuses on learning and adjustment that helps users understand successes and failures in the system after interactions. These aren’t reliable, since by the time users receive the information, their trust patterns are set and hard to change. 
&lt;/li&gt;&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Post-interaction feedback can still be valuable for teaching. Create &amp;ldquo;reflection moments&amp;rdquo; after significant interactions. Midjourney does this by letting users rate image outputs, helping users learn what prompts work best while calibrating their expectations for future generations. &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Trust is front-loaded and habit-driven. The most effective calibration happens before and during use, when expectations are still forming and behaviors can still be shifted. Any later and you’re mostly fighting entrenched patterns.&lt;/p&gt;
&lt;h2 id='performance-vs-process-information' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#performance-vs-process-information' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Performance vs. Process Information&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Users can be guided through performance-oriented signals (what the system can do) or process-oriented signals (how it works). The real challenge is matching the right kind of explanation to the right user, at the right moment.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong class='font-semibold text-navy-950'&gt;Performance-oriented calibration&lt;/strong&gt; focuses on communicating capability through mechanisms like reliability statistics, confidence scores, and clear capability boundaries. 
&lt;/li&gt;&lt;li&gt;&lt;strong class='font-semibold text-navy-950'&gt;Process-oriented calibration&lt;/strong&gt; offers detailed explanations of decision-making processes, breakdowns of which factors influenced decisions, and reasoning transparency. 
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;Process transparency seems like the obvious go-to at first glance, but the effectiveness of process explanations varies wildly based on user expertise and domain knowledge. If we are designing for a set of users that may fall anywhere on this spectrum, we have to avoid creating information overload for novice users while providing sufficient information to expert users who want the detail.  &lt;/p&gt;

&lt;p&gt;The most effective systems in the study combined both approaches, providing layered information that allows users to access the level of detail most appropriate for their expertise and current needs.&lt;/p&gt;
&lt;h2 id='static-vs-adaptive-calibration' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#static-vs-adaptive-calibration' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Static vs. Adaptive Calibration&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;I really wanted to ignore this part, because it feels like the study’s authors are passive-aggressively adding todos to my projects. In a nutshell, adaptive calibration (a system actively monitoring user behavior and adjusting its communication accordingly) is orders of magnitude more effective than static calibration, which delivers the same information to every user regardless of differences in expertise, trust propensity, or behavior.&lt;/p&gt;

&lt;p&gt;Static calibration mechanisms are easy to build and maintain, which is why we like them. But the stark reality is that they put the burden of appropriate calibration entirely on our users. We’re making it their job to adapt their behavior based on generic information.&lt;/p&gt;

&lt;p&gt;This finding has zero respect for our time or mental health, but it also reveals a legit opportunity for clever builders to truly separate their product from the herd.&lt;/p&gt;
&lt;h2 id='practical-adaptive-calibration-techniques' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#practical-adaptive-calibration-techniques' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Practical adaptive calibration techniques&lt;/span&gt;&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong class='font-semibold text-navy-950'&gt;Behavioral adaptation:&lt;/strong&gt; Track how often users accept vs. reject suggestions and adjust confidence thresholds accordingly. If a user consistently rejects high-confidence suggestions, lower the threshold for showing uncertainty.
&lt;/li&gt;&lt;li&gt;&lt;strong class='font-semibold text-navy-950'&gt;Context awareness:&lt;/strong&gt; Adjust trust signals based on use context. A writing AI might show higher confidence for grammar fixes than creative suggestions, or lower confidence late at night when users might be tired.
&lt;/li&gt;&lt;li&gt;&lt;strong class='font-semibold text-navy-950'&gt;Detect expertise:&lt;/strong&gt; Users who frequently make sophisticated edits to AI output probably want more detailed explanations than those who typically accept entire file rewrites.
&lt;/li&gt;&lt;/ul&gt;
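&lt;p&gt;As a minimal sketch of the behavioral-adaptation idea (all names and thresholds here are illustrative, not from the study or any shipping product): a per-user dial could widen the range of outputs that carry an uncertainty warning whenever high-confidence suggestions keep getting rejected.&lt;/p&gt;

```python
from collections import deque

class UncertaintyDial:
    """Per-user sketch: decide when to surface an uncertainty warning."""

    def __init__(self, warn_below=0.6, window=50):
        self.warn_below = warn_below        # warn on outputs under this confidence
        self.recent = deque(maxlen=window)  # rolling (confidence, accepted) pairs

    def record(self, confidence, accepted):
        self.recent.append((confidence, accepted))
        high = [acc for conf, acc in self.recent if conf >= 0.8]
        # Rejecting half or more of recent high-confidence output means our
        # confidence signal is miscalibrated for this user: warn more often.
        # (len(high) >= 2 * sum(high) is "acceptance rate at or below 50%".)
        if len(high) >= 10 and len(high) >= 2 * sum(high):
            self.warn_below = min(self.warn_below + 0.05, 0.9)

    def should_warn(self, confidence):
        return self.warn_below > confidence
```

&lt;p&gt;The same loop could move the dial the other way for users who accept nearly everything, nudging them to slow down.&lt;/p&gt;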
&lt;h2 id='the-transparency-paradox' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-transparency-paradox' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;The Transparency Paradox&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;The idea that transparency and explainability can actually harm trust calibration is easily the point that hit me the hardest. While explanations can improve user understanding, they can also create information overload that reduces users&amp;rsquo; ability to detect and correct trash output. What&amp;rsquo;s worse, explanations can create a whole new layer of trust calibration issues, with users over-trusting the explanation mechanism itself, rather than critically evaluating the actual output.&lt;/p&gt;

&lt;p&gt;This suggests that quality over quantity should be our design philosophy when it comes to transparency. We should provide carefully crafted, relevant information rather than comprehensive but overwhelming detail. The goal should be enabling better decision-making rather than simply satisfying user curiosity about system internals.&lt;/p&gt;
&lt;h2 id='anthropomorphism-and-unwarranted-trust' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#anthropomorphism-and-unwarranted-trust' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Anthropomorphism and Unwarranted Trust&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;It seems obvious that we should make interactions with our AI project feel as human as possible. Well, it turns out that systems that appear more human-like through design, language, or interaction patterns are notoriously good at increasing user trust beyond actual system capabilities. &lt;/p&gt;

&lt;p&gt;So it’s entirely possible that building more traditional, tool-like human-computer interactions can actually make our AI projects safer to use and therefore more user-friendly.  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong class='font-semibold text-navy-950'&gt;Use tool-like language:&lt;/strong&gt; Frame outputs as &amp;ldquo;analysis suggests&amp;rdquo; rather than &amp;ldquo;I think&amp;rdquo; or &amp;ldquo;I believe&amp;rdquo;
&lt;/li&gt;&lt;li&gt;&lt;strong class='font-semibold text-navy-950'&gt;Embrace machine-like precision:&lt;/strong&gt; Show exact confidence percentages rather than human-like hedging (&amp;ldquo;I&amp;rsquo;m pretty sure that&amp;hellip;&amp;rdquo;)
&lt;/li&gt;&lt;/ul&gt;
&lt;h2 id='trust-falls-faster-than-it-climbs' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#trust-falls-faster-than-it-climbs' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Trust Falls Faster Than It Climbs&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Nothing particularly groundbreaking here, but the findings are worth mentioning if only to reinforce what we think we know. &lt;/p&gt;

&lt;p&gt;Early interactions are critically important. Users form mental models quickly and then react slowly to changes in system reliability.&lt;/p&gt;

&lt;p&gt;More critically, trust drops much faster from system failures than it builds from successes. These asymmetries suggest that we should invest disproportionately in onboarding and first-use experiences, even if they come with higher development costs.&lt;/p&gt;
&lt;h2 id='measurement-is-an-opportunity-for-innovation' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#measurement-is-an-opportunity-for-innovation' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Measurement is an Opportunity for Innovation&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;The study revealed gaping voids where effective measurement mechanisms and protocols should be, for both researchers and builders. There is a clear need to move beyond simple user satisfaction metrics or adoption rates to developing measurement frameworks that can actively detect miscalibrated trust patterns. &lt;/p&gt;

&lt;p&gt;The ideal measurement approach would combine multiple indicators. A few examples of viable indicators are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong class='font-semibold text-navy-950'&gt;Behavioral signals:&lt;/strong&gt; Track acceptance rates for different confidence levels. Well-calibrated trust should show higher acceptance rates for high-confidence outputs and lower rates for low-confidence ones.
&lt;/li&gt;&lt;li&gt;&lt;strong class='font-semibold text-navy-950'&gt;Context-specific metrics:&lt;/strong&gt; Measure trust calibration separately for different use cases. Users might be well-calibrated for simple tasks but poorly calibrated for complex ones.
&lt;/li&gt;&lt;li&gt;&lt;strong class='font-semibold text-navy-950'&gt;User self-reporting:&lt;/strong&gt; Regular pulse surveys asking &amp;ldquo;How confident are you in your ability to tell when this AI makes mistakes?&amp;rdquo; can reveal calibration gaps.
&lt;/li&gt;&lt;/ul&gt;
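&lt;p&gt;The behavioral-signals idea is cheap to prototype. A hedged sketch (the bucket edges and event shape are my assumptions, not anything the study prescribes): log each interaction as a (confidence, accepted) pair, then compare acceptance rates across confidence buckets. Rates that climb with confidence suggest calibrated trust; a flat or inverted curve suggests users are ignoring, or over-trusting, the signal.&lt;/p&gt;

```python
def acceptance_by_confidence(events, edges=(0.0, 0.5, 0.8, 1.01)):
    """events: iterable of (confidence, accepted) pairs; returns
    {(lo, hi): acceptance_rate} for each non-empty confidence bucket."""
    events = list(events)  # re-scanned once per bucket
    rates = {}
    for lo, hi in zip(edges, edges[1:]):
        hits = [acc for conf, acc in events if conf >= lo and hi > conf]
        if hits:
            rates[(lo, hi)] = sum(hits) / len(hits)
    return rates
```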
&lt;h2 id='the-calibrated-conclusion' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-calibrated-conclusion' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;The Calibrated Conclusion&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;It&amp;rsquo;s clear, at least from this study, that there’s no universal formula, or single feature that will effectively calibrate trust. It&amp;rsquo;s up to every builder to define and understand their project&amp;rsquo;s trust goals and to balance timing, content, adaptivity, and transparency accordingly. That’s what makes it both hard and worth doing. Trust calibration has to be a core part of our product’s identity, not a piglet we only start chasing once it has escaped the barn.&lt;/p&gt;

&lt;p&gt;&lt;strong class='font-semibold text-navy-950'&gt;The Study:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Magdalena Wischnewski, Nicole Krämer, and Emmanuel Müller. 2023. Measuring and Understanding Trust Calibrations for Automated Systems: A Survey of the State-Of-The-Art and Future Directions. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI &amp;rsquo;23), April 23–28, 2023, Hamburg, Germany. ACM, New York, NY, USA, 16 pages. &lt;a href="https://doi.org/10.1145/3544548.3581197"&gt;https://doi.org/10.1145/3544548.3581197&lt;/a&gt;&lt;/p&gt;</content>
    </entry>
    <entry>
        <title>Games as Model Eval: 1-Click Deploy AI Town on Fly.io</title>
        <link rel="alternate" href="https://fly.io/blog/games-as-model-eval/"/>
        <id>https://fly.io/blog/games-as-model-eval/</id>
        <published>2025-08-11T00:00:00+00:00</published>
        <updated>2025-08-15T08:35:19+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/games-as-model-eval/assets/Fly_Man.webp"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;Recently, I suggested that &lt;a href="https://fly.io/blog/the-future-isn-t-model-agnostic/" title=""&gt;The Future Isn’t Model Agnostic&lt;/a&gt;, that it’s better to pick one model that works for your project and build around it, rather than engineering for model flexibility. If you buy that, you also have to acknowledge how important comprehensive model evaluation becomes. &lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Benchmarks tell us almost nothing about how a model will actually behave in the wild, especially with long contexts, or when trusted to deliver the tone and feel that defines the UX we’re shooting for. Even the best evaluation pipelines usually end in subjective, side-by-side output comparisons. Not especially rigorous, and more importantly, boring af.&lt;/p&gt;

&lt;p&gt;Can we gamify model evaluation? Oh yes. And not just because we get to have some fun for once. Google backed me up this week when it announced the &lt;a href='https://blog.google/technology/ai/kaggle-game-arena/' title=''&gt;Kaggle Game Arena&lt;/a&gt;, a public platform where we can watch AI models duke it out in a variety of classic games. Quoting Google: &amp;ldquo;Current AI benchmarks are struggling to keep pace with modern models&amp;hellip; it can be hard to know if models trained on internet data are actually solving problems or just remembering answers they&amp;rsquo;ve already seen.&amp;rdquo;&lt;/p&gt;

&lt;p&gt;When models boss reading comprehension tests, or ace math problems, we pay attention. But when they fail to navigate a simple conversation with a virtual character or completely botch a strategic decision in a game environment, we tell ourselves we&amp;rsquo;re not building a game anyway and develop strategic short-term memory loss. 
Just like I&amp;rsquo;ve told my mom a thousand times, games are great at testing brains, and it&amp;rsquo;s time we take this seriously when it comes to model evaluation. &lt;/p&gt;
&lt;h2 id='why-games-dont-lie' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#why-games-dont-lie' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Why Games Don&amp;rsquo;t Lie&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Games provide what benchmarks can&amp;rsquo;t, &amp;ldquo;a clear, unambiguous signal of success.&amp;rdquo; They give us observable behavior in dynamic environments, the kind that would be extremely difficult (and tedious) to simulate with prompt engineering alone.&lt;/p&gt;

&lt;p&gt;Games force models to demonstrate the skills we actually care about; strategic reasoning, long-term planning, and dynamic adaptation in interactions with an opponent or a collaborator. &lt;/p&gt;
&lt;h2 id='pixel-art-meets-effective-model-evaluation-ai-town-on-fly-io' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#pixel-art-meets-effective-model-evaluation-ai-town-on-fly-io' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Pixel Art Meets Effective Model Evaluation - AI Town on Fly.io&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;AI Town is a brilliant project by &lt;a href='https://github.com/a16z-infra' title=''&gt;a16z-infra&lt;/a&gt;, based on the mind-bending paper &lt;a href='https://arxiv.org/pdf/2304.03442' title=''&gt;Generative Agents: Interactive Simulacra of Human Behavior&lt;/a&gt;. It&amp;rsquo;s a beautifully rendered little town in which tiny people with AI brains and engineered personalities go about their lives, interacting with each other and their environment. Characters need to remember past conversations, maintain relationships, react dynamically to new situations, and stay in character while doing it all. &lt;/p&gt;

&lt;p&gt;I challenge you to find a more entertaining way of evaluating conversational models. &lt;/p&gt;

&lt;p&gt;I&amp;rsquo;ve &lt;a href='https://github.com/fly-apps/ai-town_on_fly.io' title=''&gt;forked the project&lt;/a&gt; to make it absurdly easy to spin up your own AI Town on Fly Machines. You&amp;rsquo;ve got a single deploy script that will set everything up for you and some built-in cost and performance optimizations, with our handy scale to zero functionality as standard (so you only pay for the time spent running it). This makes it easy to share with your team, your friends and your mom.  &lt;/p&gt;

&lt;p&gt;In its current state, the fork makes it as easy as possible to test any OpenAI-compatible service, any model on Together.ai, and even custom embedding models. Simply set the relevant API key in your secrets. &lt;/p&gt;
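&lt;p&gt;With the &lt;code&gt;fly&lt;/code&gt; CLI that looks something like this (the variable names below are placeholders; check the fork&amp;rsquo;s README for the exact keys it reads):&lt;/p&gt;

```shell
# Placeholder secret names -- consult the fork's README for the real ones
fly secrets set OPENAI_API_KEY=sk-...
fly secrets set TOGETHER_API_KEY=...
```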

&lt;p&gt;Games like AI Town give us a window into how models actually think, adapt, and behave beyond the context of our prompts. You move past performance metrics and begin to understand a model’s personality, quirks, strengths, and weaknesses; all factors that ultimately shape your project&amp;rsquo;s UX. &lt;/p&gt;</content>
    </entry>
    <entry>
        <title>The Future Isn't Model Agnostic</title>
        <link rel="alternate" href="https://fly.io/blog/the-future-isn-t-model-agnostic/"/>
        <id>https://fly.io/blog/the-future-isn-t-model-agnostic/</id>
        <published>2025-08-08T00:00:00+00:00</published>
        <updated>2025-08-22T16:31:43+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/the-future-isn-t-model-agnostic/assets/Whack_A_Mole_.webp"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;Your users don’t care that your AI project is model 
agnostic. &lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;In my last project, I spent countless hours ensuring that the LLMs running my services  could be swapped out as easily as possible. I couldn&amp;rsquo;t touch a device with an internet connection without hearing about the latest benchmark-breaking model and it felt like a clear priority to ensure I could hot swap models with minimal collateral damage.&lt;/p&gt;

&lt;p&gt;So yeah. That was a waste of time.&lt;/p&gt;

&lt;p&gt;The hype around new model announcements feels more manufactured with each release. In reality, improvements are becoming incremental. As major providers converge on the same baseline, the days of one company holding a decisive lead are numbered.&lt;/p&gt;

&lt;p&gt;In a world of model parity, the differentiation moves entirely to the product layer. Winning isn&amp;rsquo;t about ensuring you&amp;rsquo;re using the best model; it&amp;rsquo;s about understanding your chosen model deeply enough to build experiences that feel magical: knowing exactly how to prompt for consistency, which edge cases to avoid, and how to design workflows that play to your model&amp;rsquo;s particular strengths.&lt;/p&gt;

&lt;p&gt;Model agnosticism isn&amp;rsquo;t just inefficient, it&amp;rsquo;s misguided. Fact is, swapping out your model is not just changing an endpoint. It&amp;rsquo;s rewriting prompts, rerunning evals, users telling you things just feel&amp;hellip; different. And if you&amp;rsquo;ve won users on the way it feels to use your product, that last one is a really big deal.&lt;/p&gt;
&lt;h2 id='model-lt-product' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#model-lt-product' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Model &amp;lt; Product&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Recently, something happened that fully solidified this idea in my head. Claude Code is winning among people building real things with AI. We even have evangelists in the Fly.io engineering team, and those guys are weird smart. Elsewhere, whole communities have formed to share and compare claude.md files and fight each other over which MCP servers are the coolest to use with Claude.&lt;/p&gt;

&lt;p&gt;Enter stage right, Qwen 3 Coder. It takes Claude to the cleaners in benchmarks. But the response from the Claude Code user base? A collective meh.&lt;/p&gt;

&lt;p&gt;This is nothing like 2024, when everyone would have dropped everything to get the hot new model running in Cursor. And it&amp;rsquo;s not because we&amp;rsquo;ve learned that benchmarks are performance theater for people who&amp;rsquo;ve never shipped a product.&lt;/p&gt;

&lt;p&gt;It&amp;rsquo;s because products like Claude Code are irrefutable evidence that the model isn&amp;rsquo;t the product. We&amp;rsquo;ve felt it first hand when our pair programmer&amp;rsquo;s behaviour changes in subtle ways. The product is in the rituals. The trust. The predictability. It&amp;rsquo;s precisely because Claude Code&amp;rsquo;s model behavior, UI, and user expectations are so tightly coupled that its users don&amp;rsquo;t really care that a better model might exist.&lt;/p&gt;

&lt;p&gt;I&amp;rsquo;m not trying to praise Anthropic here. The point is, engineering for model agnosticism is a trap that will eat up time that could be better spent … anywhere else.&lt;/p&gt;

&lt;p&gt;Sure, if you&amp;rsquo;re building infra or anything else that lives close to the metal, model optionality still matters. But people entrusting legwork to AI tools are building deeper relationships with, and expectations of, those tools than they care to admit. AI product success stories are written when products become invisible parts of users&amp;rsquo; daily rituals, not showcases for engineering flexibility.&lt;/p&gt;
&lt;h2 id='make-one-model-your-own' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#make-one-model-your-own' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Make One Model Your Own&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;As builders, it&amp;rsquo;s time we stop hedging our bets and embrace the convergence reality. Every startup pitch deck with &amp;lsquo;model-agnostic&amp;rsquo; as a feature should become a red flag for investors who understand product-market fit. Stop putting &amp;lsquo;works with any LLM&amp;rsquo; in your one-liner. It screams &amp;lsquo;we don&amp;rsquo;t know what we&amp;rsquo;re building.&amp;rsquo;&lt;/p&gt;

&lt;p&gt;If you&amp;rsquo;re still building model-agnostic AI tools in 2025, you&amp;rsquo;re optimizing for the wrong thing. Users don&amp;rsquo;t want flexibility; they want reliability. And in a converged model landscape, reliability comes from deep specialization, not broad compatibility.&lt;/p&gt;

&lt;p&gt;Pick your model like you pick your therapist; for the long haul. Find the right model, tune deeply, get close enough to understand its quirks and make them work for you. Stop architecting for the mythical future where you&amp;rsquo;ll seamlessly swap models. That future doesn&amp;rsquo;t exist, and chasing it is costing you the present.&lt;/p&gt;
&lt;h2 id='bonus-level-all-in-on-one-model-means-all-out-on-eval' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#bonus-level-all-in-on-one-model-means-all-out-on-eval' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Bonus level: All-in On One Model Means All-out On Eval&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;If any of this is landing for you, you&amp;rsquo;ll agree that we have to start thinking of model evaluation as architecture, not an afterthought. The good news is, rigorous model eval doesn&amp;rsquo;t have to be mind-numbing anymore. &lt;/p&gt;

&lt;p&gt;Turns out, games are really great eval tools! Now you can spin up your very own little &lt;a href='https://github.com/fly-apps/ai-town_on_fly.io' title=''&gt;AI Town&lt;/a&gt; on Fly.io with a single click deploy to test different models as pixel people in an evolving environment. I discuss the idea further in &lt;a href='https://fly.io/blog/games-as-model-eval/' title=''&gt;Games as Model Eval: 1-Click Deploy AI Town on Fly.io&lt;/a&gt;.&lt;/p&gt;</content>
    </entry>
    <entry>
        <title>Phoenix.new – The Remote AI Runtime for Phoenix</title>
        <link rel="alternate" href="https://fly.io/blog/phoenix-new-the-remote-ai-runtime/"/>
        <id>https://fly.io/blog/phoenix-new-the-remote-ai-runtime/</id>
        <published>2025-06-20T00:00:00+00:00</published>
        <updated>2025-06-24T17:23:07+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/phoenix-new-the-remote-ai-runtime/assets/phoenixnew-thumb.png"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;I’m Chris McCord, the creator of Elixir’s Phoenix framework. For the past several months, I’ve been working on a skunkworks project at Fly.io, and it’s time to show it off.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;I wanted LLM agents to work just as well with Elixir as they do with Python and JavaScript. Last December, in order to figure out what that was going to take, I started a little weekend project to find out how difficult it would be to build a coding agent in Elixir.&lt;/p&gt;

&lt;p&gt;A few weeks later, I had it spitting out working Phoenix applications and driving a full in-browser IDE. I knew this wasn&amp;rsquo;t going to stay a weekend project.&lt;/p&gt;

&lt;p&gt;If you follow me on Twitter, you&amp;rsquo;ve probably seen me teasing this work as it picked up steam. We&amp;rsquo;re at a point where we&amp;rsquo;re pretty serious about this thing, and so it&amp;rsquo;s time to make a formal introduction.&lt;/p&gt;

&lt;p&gt;World, meet &lt;a href='https://phoenix.new' title=''&gt;Phoenix.new&lt;/a&gt;, a batteries-included fully-online coding agent tailored to Elixir and Phoenix. I think it&amp;rsquo;s going to be the fastest way to build collaborative, real-time applications.&lt;/p&gt;

&lt;p&gt;Let&amp;rsquo;s see it in action:&lt;/p&gt;
&lt;div class="youtube-container" data-exclude-render&gt;
  &lt;div class="youtube-video"&gt;
    &lt;iframe
      width="100%"
      height="100%"
      src="https://www.youtube.com/embed/du7GmWGUM5Y"
      frameborder="0"
      allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
      allowfullscreen&gt;
    &lt;/iframe&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;h2 id='whats-interesting-about-phoenix-new' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#whats-interesting-about-phoenix-new' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;What&amp;rsquo;s Interesting About Phoenix.new&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;First, even though it runs entirely in your browser, Phoenix.new gives both you and your agent a root shell, in an ephemeral virtual machine (a &lt;a href='https://fly.io/docs/machines/overview/' title=''&gt;Fly Machine&lt;/a&gt;) that gives our agent loop free rein to install things and run programs  — without any risk of messing up your local machine. You don&amp;rsquo;t think about any of this; you just open up the VSCode interface, push the shell button, and there you are, on the isolated machine you share with the Phoenix.new agent.&lt;/p&gt;

&lt;p&gt;Second, it&amp;rsquo;s an agent system I built specifically for Phoenix. Phoenix is about real-time collaborative applications, and Phoenix.new knows what that means. To that end, Phoenix.new includes, in both its UI and its agent tools, a full browser. The Phoenix.new agent uses that browser &amp;ldquo;headlessly&amp;rdquo; to check its own front-end changes and interact with the app. Because it&amp;rsquo;s a full browser, instead of trying to iterate on screenshots, the agent sees real page content and JavaScript state – with or without a human present.&lt;/p&gt;
&lt;h2 id='what-root-access-gets-us' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-root-access-gets-us' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;What Root Access Gets Us&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Agents build software the way you did when you first got started, the way you still do today when you prototype things. They don&amp;rsquo;t carefully design Docker container layers and they don&amp;rsquo;t really do release cycles. An agent wants to pop a shell and get its fingernails dirty.&lt;/p&gt;

&lt;p&gt;A fully isolated virtual machine means Phoenix.new&amp;rsquo;s fingernails can get &lt;em&gt;arbitrarily dirty.&lt;/em&gt; If it wants to add a package to &lt;code&gt;mix.exs&lt;/code&gt;, it can do that and then run &lt;code&gt;mix phx.server&lt;/code&gt; or &lt;code&gt;mix test&lt;/code&gt; and check the output. Sure. Every agent can do that. But if it wants to add an APT package to the base operating system, it can do that too, and make sure it worked. It owns the whole environment.&lt;/p&gt;

&lt;p&gt;This offloads a huge amount of tedious, repetitive work.&lt;/p&gt;

&lt;p&gt;At his &lt;a href='https://youtu.be/LCEmiRjPEtQ?si=sR_bdu6-AqPXSNmY&amp;t=1902' title=''&gt;AI Startup School talk last week&lt;/a&gt;, Andrej Karpathy related his experience of building a restaurant menu visualizer, which takes camera pictures of text menus and transforms all the menu items into pictures. The code, which he vibe-coded with an LLM agent, was the easy part; he had it working in an afternoon. But getting the app online took him a whole week.&lt;/p&gt;

&lt;p&gt;With Phoenix.new, I&amp;rsquo;m taking dead aim at this problem. The apps we produce live in the cloud from the minute they launch. They have private, shareable URLs (we detect anything the agent generates with a bound port and give it a preview URL underneath &lt;code&gt;phx.run&lt;/code&gt;, with integrated port-forwarding), they integrate with GitHub, and they inherit all the infrastructure guardrails of Fly.io: hardware virtualization, WireGuard, and isolated networks.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;Github’s &lt;code&gt;gh&lt;/code&gt; CLI is installed by default. So the agent knows how to clone any repo, or browse issues, and you can even authorize it for internal repositories to get it working with your team’s existing projects and dependencies.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Full control of the environment also closes the loop between the agent and deployment. When Phoenix.new boots an app, it watches the logs, and tests the application. When an action triggers an error, Phoenix.new notices and gets to work.&lt;/p&gt;
&lt;h2 id='watch-it-build-in-real-time' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#watch-it-build-in-real-time' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Watch It Build In Real Time&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;a href='https://phoenix.new' title=''&gt;Phoenix.new&lt;/a&gt; can interact with web applications the way users do: with a real browser.&lt;/p&gt;

&lt;p&gt;The Phoenix.new environment includes a headless Chrome browser that our agent knows how to drive. Prompt it to add a front-end feature to your application, and it won&amp;rsquo;t just sketch the code out and make sure it compiles and lints. It&amp;rsquo;ll pull the app up itself and poke at the UI, simultaneously looking at the page content, JavaScript state, and server-side logs.&lt;/p&gt;

&lt;p&gt;Phoenix is all about &lt;a href='https://fly.io/blog/how-we-got-to-liveview/' title=''&gt;&amp;ldquo;live&amp;rdquo; real-time&lt;/a&gt; interactivity, and gives us seamless live reload. The user interface for Phoenix.new itself includes a live preview of the app being worked on, so you can kick back and watch it build front-end features incrementally. Any other &lt;code&gt;.phx.run&lt;/code&gt; tabs you have open also update as it goes. It&amp;rsquo;s wild.&lt;/p&gt;
&lt;video title="agent interacting with web" autoplay="autoplay" loop="loop" muted="muted" playsinline="playsinline" disablePictureInPicture="true" class="mb-8" src="/blog/phoenix-new-the-remote-ai-runtime/assets/webjs.mp4"&gt;&lt;/video&gt;

&lt;h2 id='not-just-for-vibe-coding' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#not-just-for-vibe-coding' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Not Just For Vibe Coding&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Phoenix.new can already build real, full-stack applications with WebSockets, Phoenix&amp;rsquo;s Presence features, and real databases. I&amp;rsquo;m seeing it succeed at business and collaborative applications right now.&lt;/p&gt;

&lt;p&gt;But there&amp;rsquo;s no fixed bound on the tasks you can reasonably ask it to accomplish. If you can do it with a shell and a browser, I want Phoenix.new to do it too. And it can do these tasks with or without you present.&lt;/p&gt;

&lt;p&gt;For example: set a &lt;code&gt;$DATABASE_URL&lt;/code&gt; and tell the agent about it. The agent knows enough to go explore it with &lt;code&gt;psql&lt;/code&gt;, and it&amp;rsquo;ll propose apps based on the schemas it finds. It can model Ecto schemas off the database. And if MySQL is your thing, the agent will just &lt;code&gt;apt install&lt;/code&gt; a MySQL client and go to town.&lt;/p&gt;

&lt;p&gt;Frontier model LLMs have vast world knowledge. They generalize extremely well. At ElixirConfEU, I did a &lt;a href='https://www.youtube.com/watch?v=ojL_VHc4gLk&amp;t=3923s' title=''&gt;demo vibe-coding Tetris&lt;/a&gt; on stage. Phoenix.new nailed it, first try, first prompt. It&amp;rsquo;s not like there&amp;rsquo;s gobs of Phoenix LiveView Tetris examples floating around the Internet! But lots of people have published Tetris code, and lots of people have written LiveView stuff, and 2025 LLMs can connect those dots.&lt;/p&gt;

&lt;p&gt;At this point you might be wondering – can I just ask it to build a Rails app? Or an Expo React Native app? Or Svelte? Or Go?&lt;/p&gt;

&lt;p&gt;Yes, you can.&lt;/p&gt;

&lt;p&gt;Our system prompt is tuned for Phoenix today, but all languages you care about are already installed. We&amp;rsquo;re still figuring out where to take this, but adding new languages and frameworks definitely ranks highly in my plans.&lt;/p&gt;
&lt;h2 id='our-async-agent-future' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#our-async-agent-future' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Our Async Agent Future&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;a href='https://fly.io/blog/youre-all-nuts/' title=''&gt;We&amp;rsquo;re at a massive step-change in developer workflows&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Agents can do real work, today, with or without a human present. Buckle up: the future of development, at least in the common case, probably looks less like cracking open a shell and finding a file to edit, and more like popping into a CI environment with agents working away around the clock.&lt;/p&gt;

&lt;p&gt;Local development isn&amp;rsquo;t going away. But there&amp;rsquo;s going to be a shift in where the majority of our iterations take place. I&amp;rsquo;m already using Phoenix.new to triage &lt;code&gt;phoenix-core&lt;/code&gt; GitHub issues and pick problems to solve. I close my laptop, grab a cup of coffee, and wait for a PR to arrive — Phoenix.new knows how PRs work, too. We&amp;rsquo;re already here, and this space is just getting started.&lt;/p&gt;

&lt;p&gt;This isn&amp;rsquo;t where I thought I&amp;rsquo;d end up when I started poking around. The Phoenix and LiveView journey was much the same. Something special was there and the projects took on a life of their own. I&amp;rsquo;m excited to share this work now, and see where it might take us. I can&amp;rsquo;t wait to see what folks build.&lt;/p&gt;</content>
    </entry>
    <entry>
        <title>What are MCP Servers?</title>
        <link rel="alternate" href="https://fly.io/blog/mcps-everywhere/"/>
        <id>https://fly.io/blog/mcps-everywhere/</id>
        <published>2025-06-12T00:00:00+00:00</published>
        <updated>2025-06-12T16:59:12+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/mcps-everywhere/assets/mcps-everywhere.jpg"/>
        <content type="html">&lt;div class="lead"&gt;&lt;div&gt;&lt;p&gt;With Fly.io, &lt;a href="https://fly.io/docs/speedrun/" title=""&gt;you can get your app running globally in a matter of minutes&lt;/a&gt;, and with MCP servers you can integrate with Claude, VSCode, Cursor and &lt;a href="https://modelcontextprotocol.io/clients"&gt;many more AI clients&lt;/a&gt;.  &lt;a href="https://fly.io/docs/mcp/" title=""&gt;Try it out for yourself&lt;/a&gt;!&lt;/p&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The introduction to &lt;a href='https://modelcontextprotocol.io/introduction' title=''&gt;Model Context Protocol&lt;/a&gt; starts out with:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;MCP is an open protocol that standardizes how applications provide context to LLMs. Think of MCP like a USB-C port for AI applications. Just as USB-C provides a standardized way to connect your devices to various peripherals and accessories, MCP provides a standardized way to connect AI models to different data sources and tools.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That paragraph, to me, is both comforting (&amp;ldquo;USB for LLM&amp;rdquo;? Cool! Got it!), and simultaneously vacuous (Um, but what do I actually &lt;em&gt;do&lt;/em&gt; with this?).&lt;/p&gt;

&lt;p&gt;I&amp;rsquo;ve been digging deeper and have come up with a few more analogies and observations that make sense to me. Perhaps one or more of these will help you.&lt;/p&gt;
&lt;h2 id='mcps-are-alexa-skills' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#mcps-are-alexa-skills' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;MCPs are Alexa Skills&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;You buy an Echo Dot and a Hue light. You plug them both in and connect them to your Wi-Fi. There is one more step you need to do: you need to install and enable the Philips Hue skill to connect the two.&lt;/p&gt;

&lt;p&gt;You might be using Siri or Google Assistant instead, or connecting a Ring doorbell camera or a Google Nest thermostat. The principle is the same either way, though the analogy is slightly stronger with a skill (a noun you install) than with the act of pairing your Hue Bridge with Apple HomeKit (a verb).&lt;/p&gt;
&lt;h2 id='mcps-are-api-2-0' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#mcps-are-api-2-0' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;MCPs are API 2.0&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;HTTP 1.1 is simple. You send a request, you get a response. While there are cookies and sessions, it is pretty much stateless, and even over a keep-alive connection each request must wait for the previous response, which is inefficient. WebSockets and Server-Sent Events (SSE) mitigate this a bit.&lt;/p&gt;

&lt;p&gt;&lt;a href='https://en.wikipedia.org/wiki/HTTP/2' title=''&gt;HTTP 2.0&lt;/a&gt; introduces features like multiplexing and server push. When you visit a web page for the first time, your browser can now request all of the associated JavaScript, CSS, and images at once, and the server can respond in any order.&lt;/p&gt;

&lt;p&gt;APIs today are typically request/response. MCPs support multiplexing and server push.&lt;/p&gt;
&lt;h2 id='mcps-are-apis-with-introspection-reflection' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#mcps-are-apis-with-introspection-reflection' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;MCPs are APIs with Introspection/Reflection&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;With &lt;a href='https://learn.openapis.org/' title=''&gt;OpenAPI&lt;/a&gt;, requests are typically JSON, and responses are too. Many OpenAPI providers publish a separate &lt;a href='https://learn.openapis.org/specification/structure.html' title=''&gt;OpenAPI Description (OAD)&lt;/a&gt;, which contains a schema describing what requests are supported by that API.&lt;/p&gt;

&lt;p&gt;With MCP, requests are also JSON and responses are also JSON, but in addition, there are standard requests built into the protocol to obtain useful information, including what tools are provided, what arguments each tool expects, and a prose description of how each tool is expected to be used.&lt;/p&gt;
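That introspection round-trip can be sketched in a few lines. This is an illustrative mock-up of the standard `tools/list` request and a plausible reply, not output from a real server; the `get_weather` tool is a made-up example.

```python
import json

# A client asks an MCP server what tools it offers with the standard
# "tools/list" request; the reply carries each tool's name, a prose
# description, and a JSON Schema for its arguments.
request = {"jsonrpc": "2.0", "id": 2, "method": "tools/list"}

# A hypothetical server response advertising one tool.
response = {
    "jsonrpc": "2.0",
    "id": 2,
    "result": {
        "tools": [
            {
                "name": "get_weather",
                "description": "Fetch the current weather for a city.",
                "inputSchema": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            }
        ]
    },
}

# An LLM client reads this at runtime to learn what it may call and how.
tool = response["result"]["tools"][0]
print(json.dumps({"name": tool["name"], "args": tool["inputSchema"]["required"]}))
```

Contrast this with OpenAPI, where the schema lives in a separate document: here the discovery step is part of the same protocol the calls themselves use.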

&lt;p&gt;As an aside, don&amp;rsquo;t automatically assume that you will get good results from &lt;a href='https://neon.com/blog/autogenerating-mcp-servers-openai-schemas' title=''&gt;auto-generating MCP Servers from OpenAPI schemas&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Effective MCP design means thinking about the workflows you want the agent to perform and potentially creating higher-level tools that encapsulate multiple API calls, rather than just exposing the raw building blocks of your entire API spec.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href='https://glama.ai/blog/2025-06-06-mcp-vs-api#five-fundamental-differences' title=''&gt;MCP vs API&lt;/a&gt; goes into this topic at greater depth.&lt;/p&gt;

&lt;p&gt;In many cases you will get better results by treating LLMs the way you would treat humans. If you have a CLI, consider using that as the starting point instead.&lt;/p&gt;
&lt;h2 id='mcps-are-not-serverless' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#mcps-are-not-serverless' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;MCPs are &lt;strong&gt;&lt;i&gt;not&lt;/i&gt;&lt;/strong&gt; serverless&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;a href='https://en.wikipedia.org/wiki/Serverless_computing' title=''&gt;Serverless&lt;/a&gt;, sometimes known as &lt;a href='https://en.wikipedia.org/wiki/Function_as_a_service' title=''&gt;FaaS&lt;/a&gt;, is a popular cloud computing model, where AWS Lambda is widely recognized as the archetype.&lt;/p&gt;

&lt;p&gt;MCP servers are not serverless; they have a well-defined and long-lived &lt;a href='https://modelcontextprotocol.io/specification/2025-03-26/basic/lifecycle' title=''&gt;lifecycle&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;&lt;svg aria-roledescription="sequence" role="graphics-document document" viewBox="-50 -10 482 651" style="max-width: 482px;" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.w3.org/2000/svg" width="100%" id="rm"&gt;&lt;rect class="rect" height="70" width="302" fill="rgb(200, 220, 250)" y="325" x="40"&gt;&lt;/rect&gt;&lt;g&gt;&lt;rect class="actor actor-bottom" ry="3" rx="3" name="Server" height="65" width="150" stroke="#666" fill="#eaeaea" y="565" x="232"&gt;&lt;/rect&gt;&lt;text class="actor actor-box" alignment-baseline="central" dominant-baseline="central" style="text-anchor: middle; font-size: 16px; font-weight: 400; font-family: inherit;" y="597.5" x="307"&gt;&lt;tspan dy="0" x="307"&gt;Server&lt;/tspan&gt;&lt;/text&gt;&lt;/g&gt;&lt;g&gt;&lt;rect class="actor actor-bottom" ry="3" rx="3" name="Client" height="65" width="150" stroke="#666" fill="#eaeaea" y="565" x="0"&gt;&lt;/rect&gt;&lt;text class="actor actor-box" alignment-baseline="central" dominant-baseline="central" style="text-anchor: middle; font-size: 16px; font-weight: 400; font-family: inherit;" y="597.5" x="75"&gt;&lt;tspan dy="0" x="75"&gt;Client&lt;/tspan&gt;&lt;/text&gt;&lt;/g&gt;&lt;g&gt;&lt;line name="Server" stroke="#999" stroke-width="0.5px" class="actor-line 200" y2="565" x2="307" y1="65" x1="307" id="actor10"&gt;&lt;/line&gt;&lt;g id="root-10"&gt;&lt;rect class="actor actor-top" ry="3" rx="3" name="Server" height="65" width="150" stroke="#666" fill="#eaeaea" y="0" x="232"&gt;&lt;/rect&gt;&lt;text class="actor actor-box" alignment-baseline="central" dominant-baseline="central" style="text-anchor: middle; font-size: 16px; font-weight: 400; font-family: inherit;" y="32.5" x="307"&gt;&lt;tspan dy="0" x="307"&gt;Server&lt;/tspan&gt;&lt;/text&gt;&lt;/g&gt;&lt;/g&gt;&lt;g&gt;&lt;line name="Client" stroke="#999" stroke-width="0.5px" class="actor-line 200" y2="565" x2="75" y1="65" x1="75" id="actor9"&gt;&lt;/line&gt;&lt;g id="root-9"&gt;&lt;rect class="actor actor-top" ry="3" 
rx="3" name="Client" height="65" width="150" stroke="#666" fill="#eaeaea" y="0" x="0"&gt;&lt;/rect&gt;&lt;text class="actor actor-box" alignment-baseline="central" dominant-baseline="central" style="text-anchor: middle; font-size: 16px; font-weight: 400; font-family: inherit;" y="32.5" x="75"&gt;&lt;tspan dy="0" x="75"&gt;Client&lt;/tspan&gt;&lt;/text&gt;&lt;/g&gt;&lt;/g&gt;&lt;style&gt;#rm{font-family:inherit;font-size:16px;fill:#333;}#rm .error-icon{fill:#552222;}#rm .error-text{fill:#552222;stroke:#552222;}#rm .edge-thickness-normal{stroke-width:1px;}#rm .edge-thickness-thick{stroke-width:3.5px;}#rm .edge-pattern-solid{stroke-dasharray:0;}#rm .edge-thickness-invisible{stroke-width:0;fill:none;}#rm .edge-pattern-dashed{stroke-dasharray:3;}#rm .edge-pattern-dotted{stroke-dasharray:2;}#rm .marker{fill:#333333;stroke:#333333;}#rm .marker.cross{stroke:#333333;}#rm svg{font-family:inherit;font-size:16px;}#rm p{margin:0;}#rm .actor{stroke:hsl(259.6261682243, 59.7765363128%, 87.9019607843%);fill:#ECECFF;}#rm text.actor&amp;gt;tspan{fill:black;stroke:none;}#rm .actor-line{stroke:hsl(259.6261682243, 59.7765363128%, 87.9019607843%);}#rm .messageLine0{stroke-width:1.5;stroke-dasharray:none;stroke:#333;}#rm .messageLine1{stroke-width:1.5;stroke-dasharray:2,2;stroke:#333;}#rm #arrowhead path{fill:#333;stroke:#333;}#rm .sequenceNumber{fill:white;}#rm #sequencenumber{fill:#333;}#rm #crosshead path{fill:#333;stroke:#333;}#rm .messageText{fill:#333;stroke:none;}#rm .labelBox{stroke:hsl(259.6261682243, 59.7765363128%, 87.9019607843%);fill:#ECECFF;}#rm .labelText,#rm .labelText&amp;gt;tspan{fill:black;stroke:none;}#rm .loopText,#rm .loopText&amp;gt;tspan{fill:black;stroke:none;}#rm .loopLine{stroke-width:2px;stroke-dasharray:2,2;stroke:hsl(259.6261682243, 59.7765363128%, 87.9019607843%);fill:hsl(259.6261682243, 59.7765363128%, 87.9019607843%);}#rm .note{stroke:#aaaa33;fill:#fff5ad;}#rm .noteText,#rm .noteText&amp;gt;tspan{fill:black;stroke:none;}#rm 
.activation0{fill:#f4f4f4;stroke:#666;}#rm .activation1{fill:#f4f4f4;stroke:#666;}#rm .activation2{fill:#f4f4f4;stroke:#666;}#rm .actorPopupMenu{position:absolute;}#rm .actorPopupMenuPanel{position:absolute;fill:#ECECFF;box-shadow:0px 8px 16px 0px rgba(0,0,0,0.2);filter:drop-shadow(3px 5px 2px rgb(0 0 0 / 0.4));}#rm .actor-man line{stroke:hsl(259.6261682243, 59.7765363128%, 87.9019607843%);fill:#ECECFF;}#rm .actor-man circle,#rm line{stroke:hsl(259.6261682243, 59.7765363128%, 87.9019607843%);fill:#ECECFF;stroke-width:2px;}#rm :root{--mermaid-font-family:inherit;}&lt;/style&gt;&lt;g&gt;&lt;/g&gt;&lt;defs&gt;&lt;symbol height="24" width="24" id="computer"&gt;&lt;path d="M2 2v13h20v-13h-20zm18 11h-16v-9h16v9zm-10.228 6l.466-1h3.524l.467 1h-4.457zm14.228 3h-24l2-6h2.104l-1.33 4h18.45l-1.297-4h2.073l2 6zm-5-10h-14v-7h14v7z" transform="scale(.5)"&gt;&lt;/path&gt;&lt;/symbol&gt;&lt;/defs&gt;&lt;defs&gt;&lt;symbol clip-rule="evenodd" fill-rule="evenodd" id="database"&gt;&lt;path d="M12.258.001l.256.004.255.005.253.008.251.01.249.012.247.015.246.016.242.019.241.02.239.023.236.024.233.027.231.028.229.031.225.032.223.034.22.036.217.038.214.04.211.041.208.043.205.045.201.046.198.048.194.05.191.051.187.053.183.054.18.056.175.057.172.059.168.06.163.061.16.063.155.064.15.066.074.033.073.033.071.034.07.034.069.035.068.035.067.035.066.035.064.036.064.036.062.036.06.036.06.037.058.037.058.037.055.038.055.038.053.038.052.038.051.039.05.039.048.039.047.039.045.04.044.04.043.04.041.04.04.041.039.041.037.041.036.041.034.041.033.042.032.042.03.042.029.042.027.042.026.043.024.043.023.043.021.043.02.043.018.044.017.043.015.044.013.044.012.044.011.045.009.044.007.045.006.045.004.045.002.045.001.045v17l-.001.045-.002.045-.004.045-.006.045-.007.045-.009.044-.011.045-.012.044-.013.044-.015.044-.017.043-.018.044-.02.043-.021.043-.023.043-.024.043-.026.043-.027.042-.029.042-.03.042-.032.042-.033.042-.034.041-.036.041-.037.041-.039.041-.04.041-.041.04-.043.04-.044.04-.045.04-.047.039-.048.039-.05.
039-.051.039-.052.038-.053.038-.055.038-.055.038-.058.037-.058.037-.06.037-.06.036-.062.036-.064.036-.064.036-.066.035-.067.035-.068.035-.069.035-.07.034-.071.034-.073.033-.074.033-.15.066-.155.064-.16.063-.163.061-.168.06-.172.059-.175.057-.18.056-.183.054-.187.053-.191.051-.194.05-.198.048-.201.046-.205.045-.208.043-.211.041-.214.04-.217.038-.22.036-.223.034-.225.032-.229.031-.231.028-.233.027-.236.024-.239.023-.241.02-.242.019-.246.016-.247.015-.249.012-.251.01-.253.008-.255.005-.256.004-.258.001-.258-.001-.256-.004-.255-.005-.253-.008-.251-.01-.249-.012-.247-.015-.245-.016-.243-.019-.241-.02-.238-.023-.236-.024-.234-.027-.231-.028-.228-.031-.226-.032-.223-.034-.22-.036-.217-.038-.214-.04-.211-.041-.208-.043-.204-.045-.201-.046-.198-.048-.195-.05-.19-.051-.187-.053-.184-.054-.179-.056-.176-.057-.172-.059-.167-.06-.164-.061-.159-.063-.155-.064-.151-.066-.074-.033-.072-.033-.072-.034-.07-.034-.069-.035-.068-.035-.067-.035-.066-.035-.064-.036-.063-.036-.062-.036-.061-.036-.06-.037-.058-.037-.057-.037-.056-.038-.055-.038-.053-.038-.052-.038-.051-.039-.049-.039-.049-.039-.046-.039-.046-.04-.044-.04-.043-.04-.041-.04-.04-.041-.039-.041-.037-.041-.036-.041-.034-.041-.033-.042-.032-.042-.03-.042-.029-.042-.027-.042-.026-.043-.024-.043-.023-.043-.021-.043-.02-.043-.018-.044-.017-.043-.015-.044-.013-.044-.012-.044-.011-.045-.009-.044-.007-.045-.006-.045-.004-.045-.002-.045-.001-.045v-17l.001-.045.002-.045.004-.045.006-.045.007-.045.009-.044.011-.045.012-.044.013-.044.015-.044.017-.043.018-.044.02-.043.021-.043.023-.043.024-.043.026-.043.027-.042.029-.042.03-.042.032-.042.033-.042.034-.041.036-.041.037-.041.039-.041.04-.041.041-.04.043-.04.044-.04.046-.04.046-.039.049-.039.049-.039.051-.039.052-.038.053-.038.055-.038.056-.038.057-.037.058-.037.06-.037.061-.036.062-.036.063-.036.064-.036.066-.035.067-.035.068-.035.069-.035.07-.034.072-.034.072-.033.074-.033.151-.066.155-.064.159-.063.164-.061.167-.06.172-.059.176-.057.179-.056.184-.054.187-.053.19-.051.195-.05.198-.048.201-.
046.204-.045.208-.043.211-.041.214-.04.217-.038.22-.036.223-.034.226-.032.228-.031.231-.028.234-.027.236-.024.238-.023.241-.02.243-.019.245-.016.247-.015.249-.012.251-.01.253-.008.255-.005.256-.004.258-.001.258.001zm-9.258 20.499v.01l.001.021.003.021.004.022.005.021.006.022.007.022.009.023.01.022.011.023.012.023.013.023.015.023.016.024.017.023.018.024.019.024.021.024.022.025.023.024.024.025.052.049.056.05.061.051.066.051.07.051.075.051.079.052.084.052.088.052.092.052.097.052.102.051.105.052.11.052.114.051.119.051.123.051.127.05.131.05.135.05.139.048.144.049.147.047.152.047.155.047.16.045.163.045.167.043.171.043.176.041.178.041.183.039.187.039.19.037.194.035.197.035.202.033.204.031.209.03.212.029.216.027.219.025.222.024.226.021.23.02.233.018.236.016.24.015.243.012.246.01.249.008.253.005.256.004.259.001.26-.001.257-.004.254-.005.25-.008.247-.011.244-.012.241-.014.237-.016.233-.018.231-.021.226-.021.224-.024.22-.026.216-.027.212-.028.21-.031.205-.031.202-.034.198-.034.194-.036.191-.037.187-.039.183-.04.179-.04.175-.042.172-.043.168-.044.163-.045.16-.046.155-.046.152-.047.148-.048.143-.049.139-.049.136-.05.131-.05.126-.05.123-.051.118-.052.114-.051.11-.052.106-.052.101-.052.096-.052.092-.052.088-.053.083-.051.079-.052.074-.052.07-.051.065-.051.06-.051.056-.05.051-.05.023-.024.023-.025.021-.024.02-.024.019-.024.018-.024.017-.024.015-.023.014-.024.013-.023.012-.023.01-.023.01-.022.008-.022.006-.022.006-.022.004-.022.004-.021.001-.021.001-.021v-4.127l-.077.055-.08.053-.083.054-.085.053-.087.052-.09.052-.093.051-.095.05-.097.05-.1.049-.102.049-.105.048-.106.047-.109.047-.111.046-.114.045-.115.045-.118.044-.12.043-.122.042-.124.042-.126.041-.128.04-.13.04-.132.038-.134.038-.135.037-.138.037-.139.035-.142.035-.143.034-.144.033-.147.032-.148.031-.15.03-.151.03-.153.029-.154.027-.156.027-.158.026-.159.025-.161.024-.162.023-.163.022-.165.021-.166.02-.167.019-.169.018-.169.017-.171.016-.173.015-.173.014-.175.013-.175.012-.177.011-.178.01-.179.008-.179.008-.181.006-.182.005-.182.0
04-.184.003-.184.002h-.37l-.184-.002-.184-.003-.182-.004-.182-.005-.181-.006-.179-.008-.179-.008-.178-.01-.176-.011-.176-.012-.175-.013-.173-.014-.172-.015-.171-.016-.17-.017-.169-.018-.167-.019-.166-.02-.165-.021-.163-.022-.162-.023-.161-.024-.159-.025-.157-.026-.156-.027-.155-.027-.153-.029-.151-.03-.15-.03-.148-.031-.146-.032-.145-.033-.143-.034-.141-.035-.14-.035-.137-.037-.136-.037-.134-.038-.132-.038-.13-.04-.128-.04-.126-.041-.124-.042-.122-.042-.12-.044-.117-.043-.116-.045-.113-.045-.112-.046-.109-.047-.106-.047-.105-.048-.102-.049-.1-.049-.097-.05-.095-.05-.093-.052-.09-.051-.087-.052-.085-.053-.083-.054-.08-.054-.077-.054v4.127zm0-5.654v.011l.001.021.003.021.004.021.005.022.006.022.007.022.009.022.01.022.011.023.012.023.013.023.015.024.016.023.017.024.018.024.019.024.021.024.022.024.023.025.024.024.052.05.056.05.061.05.066.051.07.051.075.052.079.051.084.052.088.052.092.052.097.052.102.052.105.052.11.051.114.051.119.052.123.05.127.051.131.05.135.049.139.049.144.048.147.048.152.047.155.046.16.045.163.045.167.044.171.042.176.042.178.04.183.04.187.038.19.037.194.036.197.034.202.033.204.032.209.03.212.028.216.027.219.025.222.024.226.022.23.02.233.018.236.016.24.014.243.012.246.01.249.008.253.006.256.003.259.001.26-.001.257-.003.254-.006.25-.008.247-.01.244-.012.241-.015.237-.016.233-.018.231-.02.226-.022.224-.024.22-.025.216-.027.212-.029.21-.03.205-.032.202-.033.198-.035.194-.036.191-.037.187-.039.183-.039.179-.041.175-.042.172-.043.168-.044.163-.045.16-.045.155-.047.152-.047.148-.048.143-.048.139-.05.136-.049.131-.05.126-.051.123-.051.118-.051.114-.052.11-.052.106-.052.101-.052.096-.052.092-.052.088-.052.083-.052.079-.052.074-.051.07-.052.065-.051.06-.05.056-.051.051-.049.023-.025.023-.024.021-.025.02-.024.019-.024.018-.024.017-.024.015-.023.014-.023.013-.024.012-.022.01-.023.01-.023.008-.022.006-.022.006-.022.004-.021.004-.022.001-.021.001-.021v-4.139l-.077.054-.08.054-.083.054-.085.052-.087.053-.09.051-.093.051-.095.051-.097.05-.1.049-.102.049-.105.048-.106
.047-.109.047-.111.046-.114.045-.115.044-.118.044-.12.044-.122.042-.124.042-.126.041-.128.04-.13.039-.132.039-.134.038-.135.037-.138.036-.139.036-.142.035-.143.033-.144.033-.147.033-.148.031-.15.03-.151.03-.153.028-.154.028-.156.027-.158.026-.159.025-.161.024-.162.023-.163.022-.165.021-.166.02-.167.019-.169.018-.169.017-.171.016-.173.015-.173.014-.175.013-.175.012-.177.011-.178.009-.179.009-.179.007-.181.007-.182.005-.182.004-.184.003-.184.002h-.37l-.184-.002-.184-.003-.182-.004-.182-.005-.181-.007-.179-.007-.179-.009-.178-.009-.176-.011-.176-.012-.175-.013-.173-.014-.172-.015-.171-.016-.17-.017-.169-.018-.167-.019-.166-.02-.165-.021-.163-.022-.162-.023-.161-.024-.159-.025-.157-.026-.156-.027-.155-.028-.153-.028-.151-.03-.15-.03-.148-.031-.146-.033-.145-.033-.143-.033-.141-.035-.14-.036-.137-.036-.136-.037-.134-.038-.132-.039-.13-.039-.128-.04-.126-.041-.124-.042-.122-.043-.12-.043-.117-.044-.116-.044-.113-.046-.112-.046-.109-.046-.106-.047-.105-.048-.102-.049-.1-.049-.097-.05-.095-.051-.093-.051-.09-.051-.087-.053-.085-.052-.083-.054-.08-.054-.077-.054v4.139zm0-5.666v.011l.001.02.003.022.004.021.005.022.006.021.007.022.009.023.01.022.011.023.012.023.013.023.015.023.016.024.017.024.018.023.019.024.021.025.022.024.023.024.024.025.052.05.056.05.061.05.066.051.07.051.075.052.079.051.084.052.088.052.092.052.097.052.102.052.105.051.11.052.114.051.119.051.123.051.127.05.131.05.135.05.139.049.144.048.147.048.152.047.155.046.16.045.163.045.167.043.171.043.176.042.178.04.183.04.187.038.19.037.194.036.197.034.202.033.204.032.209.03.212.028.216.027.219.025.222.024.226.021.23.02.233.018.236.017.24.014.243.012.246.01.249.008.253.006.256.003.259.001.26-.001.257-.003.254-.006.25-.008.247-.01.244-.013.241-.014.237-.016.233-.018.231-.02.226-.022.224-.024.22-.025.216-.027.212-.029.21-.03.205-.032.202-.033.198-.035.194-.036.191-.037.187-.039.183-.039.179-.041.175-.042.172-.043.168-.044.163-.045.16-.045.155-.047.152-.047.148-.048.143-.049.139-.049.136-.049.131-.051.126-.05.123-.051.118
-.052.114-.051.11-.052.106-.052.101-.052.096-.052.092-.052.088-.052.083-.052.079-.052.074-.052.07-.051.065-.051.06-.051.056-.05.051-.049.023-.025.023-.025.021-.024.02-.024.019-.024.018-.024.017-.024.015-.023.014-.024.013-.023.012-.023.01-.022.01-.023.008-.022.006-.022.006-.022.004-.022.004-.021.001-.021.001-.021v-4.153l-.077.054-.08.054-.083.053-.085.053-.087.053-.09.051-.093.051-.095.051-.097.05-.1.049-.102.048-.105.048-.106.048-.109.046-.111.046-.114.046-.115.044-.118.044-.12.043-.122.043-.124.042-.126.041-.128.04-.13.039-.132.039-.134.038-.135.037-.138.036-.139.036-.142.034-.143.034-.144.033-.147.032-.148.032-.15.03-.151.03-.153.028-.154.028-.156.027-.158.026-.159.024-.161.024-.162.023-.163.023-.165.021-.166.02-.167.019-.169.018-.169.017-.171.016-.173.015-.173.014-.175.013-.175.012-.177.01-.178.01-.179.009-.179.007-.181.006-.182.006-.182.004-.184.003-.184.001-.185.001-.185-.001-.184-.001-.184-.003-.182-.004-.182-.006-.181-.006-.179-.007-.179-.009-.178-.01-.176-.01-.176-.012-.175-.013-.173-.014-.172-.015-.171-.016-.17-.017-.169-.018-.167-.019-.166-.02-.165-.021-.163-.023-.162-.023-.161-.024-.159-.024-.157-.026-.156-.027-.155-.028-.153-.028-.151-.03-.15-.03-.148-.032-.146-.032-.145-.033-.143-.034-.141-.034-.14-.036-.137-.036-.136-.037-.134-.038-.132-.039-.13-.039-.128-.041-.126-.041-.124-.041-.122-.043-.12-.043-.117-.044-.116-.044-.113-.046-.112-.046-.109-.046-.106-.048-.105-.048-.102-.048-.1-.05-.097-.049-.095-.051-.093-.051-.09-.052-.087-.052-.085-.053-.083-.053-.08-.054-.077-.054v4.153zm8.74-8.179l-.257.004-.254.005-.25.008-.247.011-.244.012-.241.014-.237.016-.233.018-.231.021-.226.022-.224.023-.22.026-.216.027-.212.028-.21.031-.205.032-.202.033-.198.034-.194.036-.191.038-.187.038-.183.04-.179.041-.175.042-.172.043-.168.043-.163.045-.16.046-.155.046-.152.048-.148.048-.143.048-.139.049-.136.05-.131.05-.126.051-.123.051-.118.051-.114.052-.11.052-.106.052-.101.052-.096.052-.092.052-.088.052-.083.052-.079.052-.074.051-.07.052-.065.051-.06.05-.056.05-.051.05-.023.025
-.023.024-.021.024-.02.025-.019.024-.018.024-.017.023-.015.024-.014.023-.013.023-.012.023-.01.023-.01.022-.008.022-.006.023-.006.021-.004.022-.004.021-.001.021-.001.021.001.021.001.021.004.021.004.022.006.021.006.023.008.022.01.022.01.023.012.023.013.023.014.023.015.024.017.023.018.024.019.024.02.025.021.024.023.024.023.025.051.05.056.05.06.05.065.051.07.052.074.051.079.052.083.052.088.052.092.052.096.052.101.052.106.052.11.052.114.052.118.051.123.051.126.051.131.05.136.05.139.049.143.048.148.048.152.048.155.046.16.046.163.045.168.043.172.043.175.042.179.041.183.04.187.038.191.038.194.036.198.034.202.033.205.032.21.031.212.028.216.027.22.026.224.023.226.022.231.021.233.018.237.016.241.014.244.012.247.011.25.008.254.005.257.004.26.001.26-.001.257-.004.254-.005.25-.008.247-.011.244-.012.241-.014.237-.016.233-.018.231-.021.226-.022.224-.023.22-.026.216-.027.212-.028.21-.031.205-.032.202-.033.198-.034.194-.036.191-.038.187-.038.183-.04.179-.041.175-.042.172-.043.168-.043.163-.045.16-.046.155-.046.152-.048.148-.048.143-.048.139-.049.136-.05.131-.05.126-.051.123-.051.118-.051.114-.052.11-.052.106-.052.101-.052.096-.052.092-.052.088-.052.083-.052.079-.052.074-.051.07-.052.065-.051.06-.05.056-.05.051-.05.023-.025.023-.024.021-.024.02-.025.019-.024.018-.024.017-.023.015-.024.014-.023.013-.023.012-.023.01-.023.01-.022.008-.022.006-.023.006-.021.004-.022.004-.021.001-.021.001-.021-.001-.021-.001-.021-.004-.021-.004-.022-.006-.021-.006-.023-.008-.022-.01-.022-.01-.023-.012-.023-.013-.023-.014-.023-.015-.024-.017-.023-.018-.024-.019-.024-.02-.025-.021-.024-.023-.024-.023-.025-.051-.05-.056-.05-.06-.05-.065-.051-.07-.052-.074-.051-.079-.052-.083-.052-.088-.052-.092-.052-.096-.052-.101-.052-.106-.052-.11-.052-.114-.052-.118-.051-.123-.051-.126-.051-.131-.05-.136-.05-.139-.049-.143-.048-.148-.048-.152-.048-.155-.046-.16-.046-.163-.045-.168-.043-.172-.043-.175-.042-.179-.041-.183-.04-.187-.038-.191-.038-.194-.036-.198-.034-.202-.033-.205-.032-.21-.031-.212-.028-.216-.027-.22-.026-.2
24-.023-.226-.022-.231-.021-.233-.018-.237-.016-.241-.014-.244-.012-.247-.011-.25-.008-.254-.005-.257-.004-.26-.001-.26.001z" transform="scale(.5)"&gt;&lt;/path&gt;&lt;/symbol&gt;&lt;/defs&gt;&lt;defs&gt;&lt;symbol height="24" width="24" id="clock"&gt;&lt;path d="M12 2c5.514 0 10 4.486 10 10s-4.486 10-10 10-10-4.486-10-10 4.486-10 10-10zm0-2c-6.627 0-12 5.373-12 12s5.373 12 12 12 12-5.373 12-12-5.373-12-12-12zm5.848 12.459c.202.038.202.333.001.372-1.907.361-6.045 1.111-6.547 1.111-.719 0-1.301-.582-1.301-1.301 0-.512.77-5.447 1.125-7.445.034-.192.312-.181.343.014l.985 6.238 5.394 1.011z" transform="scale(.5)"&gt;&lt;/path&gt;&lt;/symbol&gt;&lt;/defs&gt;&lt;defs&gt;&lt;marker orient="auto-start-reverse" markerHeight="12" markerWidth="12" markerUnits="userSpaceOnUse" refY="5" refX="7.9" id="arrowhead"&gt;&lt;path d="M -1 0 L 10 5 L 0 10 z"&gt;&lt;/path&gt;&lt;/marker&gt;&lt;/defs&gt;&lt;defs&gt;&lt;marker refY="4.5" refX="4" orient="auto" markerHeight="8" markerWidth="15" id="crosshead"&gt;&lt;path d="M 1,2 L 6,7 M 6,2 L 1,7" stroke-width="1pt" style="stroke-dasharray: 0px, 0px;" stroke="#000000" fill="none"&gt;&lt;/path&gt;&lt;/marker&gt;&lt;/defs&gt;&lt;defs&gt;&lt;marker orient="auto" markerHeight="28" markerWidth="20" refY="7" refX="15.5" id="filled-head"&gt;&lt;path d="M 18,7 L9,13 L14,7 L9,1 Z"&gt;&lt;/path&gt;&lt;/marker&gt;&lt;/defs&gt;&lt;defs&gt;&lt;marker orient="auto" markerHeight="40" markerWidth="60" refY="15" refX="15" id="sequencenumber"&gt;&lt;circle r="6" cy="15" cx="15"&gt;&lt;/circle&gt;&lt;/marker&gt;&lt;/defs&gt;&lt;g&gt;&lt;rect class="note" height="40" width="282" stroke="#666" fill="#EDF2AE" y="75" x="50"&gt;&lt;/rect&gt;&lt;text dy="1em" class="noteText" style="font-family: inherit; font-size: 16px; font-weight: 400;" alignment-baseline="middle" dominant-baseline="middle" text-anchor="middle" y="80" x="191"&gt;&lt;tspan x="191"&gt;Initialization Phase&lt;/tspan&gt;&lt;/text&gt;&lt;/g&gt;&lt;g&gt;&lt;rect class="activation0" height="380" 
width="10" stroke="#666" fill="#EDF2AE" y="115" x="70"&gt;&lt;/rect&gt;&lt;/g&gt;&lt;g&gt;&lt;rect class="activation0" height="328" width="10" stroke="#666" fill="#EDF2AE" y="167" x="302"&gt;&lt;/rect&gt;&lt;/g&gt;&lt;g&gt;&lt;rect class="note" height="40" width="282" stroke="#666" fill="#EDF2AE" y="275" x="50"&gt;&lt;/rect&gt;&lt;text dy="1em" class="noteText" style="font-family: inherit; font-size: 16px; font-weight: 400;" alignment-baseline="middle" dominant-baseline="middle" text-anchor="middle" y="280" x="191"&gt;&lt;tspan x="191"&gt;Operation Phase&lt;/tspan&gt;&lt;/text&gt;&lt;/g&gt;&lt;g&gt;&lt;rect class="note" height="40" width="282" stroke="#666" fill="#EDF2AE" y="345" x="50"&gt;&lt;/rect&gt;&lt;text dy="1em" class="noteText" style="font-family: inherit; font-size: 16px; font-weight: 400;" alignment-baseline="middle" dominant-baseline="middle" text-anchor="middle" y="350" x="191"&gt;&lt;tspan x="191"&gt;Normal protocol operations&lt;/tspan&gt;&lt;/text&gt;&lt;/g&gt;&lt;g&gt;&lt;rect class="note" height="40" width="282" stroke="#666" fill="#EDF2AE" y="405" x="50"&gt;&lt;/rect&gt;&lt;text dy="1em" class="noteText" style="font-family: inherit; font-size: 16px; font-weight: 400;" alignment-baseline="middle" dominant-baseline="middle" text-anchor="middle" y="410" x="191"&gt;&lt;tspan x="191"&gt;Shutdown&lt;/tspan&gt;&lt;/text&gt;&lt;/g&gt;&lt;g&gt;&lt;rect class="note" height="40" width="282" stroke="#666" fill="#EDF2AE" y="505" x="50"&gt;&lt;/rect&gt;&lt;text dy="1em" class="noteText" style="font-family: inherit; font-size: 16px; font-weight: 400;" alignment-baseline="middle" dominant-baseline="middle" text-anchor="middle" y="510" x="191"&gt;&lt;tspan x="191"&gt;Connection closed&lt;/tspan&gt;&lt;/text&gt;&lt;/g&gt;&lt;text dy="1em" class="messageText" style="font-family: inherit; font-size: 16px; font-weight: 400;" alignment-baseline="middle" dominant-baseline="middle" text-anchor="middle" y="130" x="190"&gt;initialize request&lt;/text&gt;&lt;line 
marker-end="url(#arrowhead)" style="fill: none;" stroke="none" stroke-width="2" class="messageLine0" y2="165" x2="299" y1="165" x1="80"&gt;&lt;/line&gt;&lt;text dy="1em" class="messageText" style="font-family: inherit; font-size: 16px; font-weight: 400;" alignment-baseline="middle" dominant-baseline="middle" text-anchor="middle" y="180" x="193"&gt;initialize response&lt;/text&gt;&lt;line marker-end="url(#arrowhead)" stroke="none" stroke-width="2" class="messageLine1" style="stroke-dasharray: 3px, 3px; fill: none;" y2="215" x2="83" y1="215" x1="302"&gt;&lt;/line&gt;&lt;text dy="1em" class="messageText" style="font-family: inherit; font-size: 16px; font-weight: 400;" alignment-baseline="middle" dominant-baseline="middle" text-anchor="middle" y="230" x="190"&gt;initialized notification&lt;/text&gt;&lt;line marker-end="url(#filled-head)" stroke="none" stroke-width="2" class="messageLine1" style="stroke-dasharray: 3px, 3px; fill: none;" y2="265" x2="299" y1="265" x1="80"&gt;&lt;/line&gt;&lt;text dy="1em" class="messageText" style="font-family: inherit; font-size: 16px; font-weight: 400;" alignment-baseline="middle" dominant-baseline="middle" text-anchor="middle" y="460" x="190"&gt;Disconnect&lt;/text&gt;&lt;line marker-end="url(#filled-head)" stroke="none" stroke-width="2" class="messageLine1" style="stroke-dasharray: 3px, 3px; fill: none;" y2="495" x2="299" y1="495" x1="80"&gt;&lt;/line&gt;&lt;/svg&gt;&lt;/p&gt;
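The handshake in the diagram above can be sketched as plain JSON-RPC messages. This is a minimal mock-up following the 2025-03-26 spec revision linked above; the client name is a placeholder and no transport is shown.

```python
import json

def jsonrpc(method, params=None, id=None):
    """Build a JSON-RPC 2.0 message; omit 'id' to make it a notification."""
    msg = {"jsonrpc": "2.0", "method": method}
    if params is not None:
        msg["params"] = params
    if id is not None:
        msg["id"] = id
    return msg

# 1. Client -> Server: initialize request, declaring version and capabilities.
init_request = jsonrpc("initialize", {
    "protocolVersion": "2025-03-26",
    "capabilities": {},
    "clientInfo": {"name": "example-client", "version": "0.1.0"},
}, id=1)

# 2. Server -> Client: initialize response arrives with the matching id.
# 3. Client -> Server: initialized notification (no id, no reply expected).
initialized = jsonrpc("notifications/initialized")

print(json.dumps(init_request))
print(json.dumps(initialized))
```

Only after this exchange does normal operation (tool calls, resource reads) begin, which is exactly what makes the connection long-lived rather than serverless-style fire-and-forget.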
&lt;figure class="post-cta"&gt;
  &lt;figcaption&gt;
    &lt;h1&gt;You can play with this right now.&lt;/h1&gt;
    &lt;p&gt;MCPs are barely six months old, but we are keeping up with the latest&lt;/p&gt;
      &lt;a class="btn btn-lg" href="https://fly.io/docs/mcp"&gt;
        Try launching your MCP server on Fly.io today &lt;span class='opacity-50'&gt;→&lt;/span&gt;
      &lt;/a&gt;
  &lt;/figcaption&gt;
  &lt;div class="image-container"&gt;
    &lt;img src="/static/images/cta-kitty.webp" srcset="/static/images/[email protected] 2x" alt=""&gt;
  &lt;/div&gt;
&lt;/figure&gt;

&lt;h2 id='mcps-are-not-inherently-secure-or-private' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#mcps-are-not-inherently-secure-or-private' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;MCPs are &lt;strong&gt;&lt;i&gt;not&lt;/i&gt;&lt;/strong&gt; Inherently Secure or Private&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Here I am not talking about &lt;a href='https://simonwillison.net/2025/Apr/9/mcp-prompt-injection/' title=''&gt;prompt injection&lt;/a&gt; or &lt;a href='https://simonwillison.net/2025/May/26/github-mcp-exploited/' title=''&gt;exploitable abilities&lt;/a&gt;, though those are real problems too.&lt;/p&gt;

&lt;p&gt;I&amp;rsquo;m talking about something more fundamental and basic. Let&amp;rsquo;s take a look at the very same &lt;a href='https://github.com/github/github-mcp-server' title=''&gt;GitHub MCP&lt;/a&gt; featured in the above two descriptions of an exploitable exposure. The current process for installing a stdio MCP into Claude involves placing secrets in plain text in a well-defined location. And the process for installing the &lt;em&gt;next&lt;/em&gt; MCP server is to download a program from a third party and run that tool in a way that has access to this very file.&lt;/p&gt;
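&lt;p&gt;For the stdio case, "plain text in a well-defined location" means something like &lt;code&gt;claude_desktop_config.json&lt;/code&gt;. A rough sketch of the shape (the server command and token value here are illustrative placeholders, not exact documentation):&lt;/p&gt;

```json
{
  "mcpServers": {
    "github": {
      "command": "github-mcp-server",
      "args": ["stdio"],
      "env": {
        "GITHUB_PERSONAL_ACCESS_TOKEN": "ghp_your_real_token_sits_here"
      }
    }
  }
}
```

&lt;p&gt;Any local process that can read that file can read the token; that is the whole problem.&lt;/p&gt;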

&lt;p&gt;Addressing MCP security requires a holistic approach, but one key strategy component is the ability to run an MCP server on a remote machine which can only be accessed by you, and only after you present a revocable bearer token. That remote machine may require access to secrets (like your GitHub token), but those secrets are not something either your LLM client or other MCP servers will be able to access directly.&lt;/p&gt;
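&lt;p&gt;The gate itself is tiny. A minimal sketch in Python (the header name and environment variables are assumptions for illustration, not any particular MCP transport spec):&lt;/p&gt;

```python
import hmac
import os

# Hypothetical revocable bearer token: rotate it and every client is cut off,
# while the server-side secrets (like a GitHub token) never leave the machine.
def authorized(headers, expected_token):
    """Admit a request only if it presented the expected bearer token."""
    presented = headers.get("Authorization", "")
    # Constant-time comparison, so the check itself leaks nothing via timing.
    return hmac.compare_digest(presented, "Bearer " + expected_token)

# The remote machine holds the real secret; clients only ever hold the
# revocable token that fronts it.
github_secret = os.environ.get("GITHUB_TOKEN", "")
```

&lt;p&gt;Revoking the bearer token severs every client at once, without touching the secrets the server itself depends on.&lt;/p&gt;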
&lt;h2 id='mcps-should-be-considered-family' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#mcps-should-be-considered-family' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;MCPs should be considered family&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Recapping: &lt;a href='https://www.usa.philips.com/' title=''&gt;Philips&lt;/a&gt; has an &lt;a href='https://developers.meethue.com/' title=''&gt;API and SDK&lt;/a&gt; for Hue that is used by perhaps thousands, and has an &lt;a href='https://www.amazon.com/Philips-Hue/dp/B01LWAGUG7' title=''&gt;Alexa Skill&lt;/a&gt; that is used by untold millions. Of course, somebody already built a &lt;a href='https://github.com/ThomasRohde/hue-mcp' title=''&gt;Philips Hue MCP Server&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;LLMs are brains. MCPs give LLMs eyes, hands, and legs. The end result is a robot. Most MCPs today are brainless—they are merely eyes, hands, and legs. The results are appliances. I buy and install a dishwasher. You do too. Both of our dishwashers perform the same basic function. As do any Hue lights we may buy.&lt;/p&gt;

&lt;p&gt;In The Jetsons, &lt;a href='https://thejetsons.fandom.com/wiki/Rosey' title=''&gt;Rosie&lt;/a&gt; is a maid and housekeeper who is also a member of the family. She is good friends with Jane and takes the role of a surrogate aunt towards Elroy and Judy. Let&amp;rsquo;s start there and go further.&lt;/p&gt;

&lt;p&gt;A person with an LLM can build things. Imagine having a 3D printer that makes robots or agents that act on your behalf. You may want an agent to watch for trends in your business, keep track of what events are happening in your area, screen your text messages, or act as a DJ when you get together with friends.&lt;/p&gt;

&lt;p&gt;You may share the MCP servers you create with others to use as starting points, but they undoubtedly will adapt them and make them their own.&lt;/p&gt;
&lt;h2 id='closing-thoughts' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#closing-thoughts' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Closing Thoughts&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Don&amp;rsquo;t get me wrong. I am not saying there won&amp;rsquo;t be MCPs that are mere appliances—there undoubtedly will be plenty of those. But not all MCPs will be appliances; many will be agents.&lt;/p&gt;

&lt;p&gt;Today, most people exploring LLMs are trying to add LLMs to things they already have—for example, IDEs. MCPs flip this. Instead of adding LLMs to something you already have, you add something you already have to an LLM.&lt;/p&gt;

&lt;p&gt;&lt;a href='https://desktopcommander.app/' title=''&gt;Desktop Commander MCP&lt;/a&gt; is an example I&amp;rsquo;m particularly fond of that does exactly this. It also happens to be a prime example of a tool you want to give root access to, then lock it in a secure box and run someplace other than your laptop. &lt;a href='https://fly.io/docs/mcp/examples/#desktop-commander' title=''&gt;Give it a try&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Microsoft is actively working on &lt;a href='https://developer.microsoft.com/en-us/windows/agentic/' title=''&gt;Agentic Windows&lt;/a&gt;. Your smartphone is the obvious next frontier. Today, you have an app store and a web browser. MCP servers have the potential to make both obsolete—and this could happen sooner than you think.&lt;/p&gt;</content>
    </entry>
    <entry>
        <title>My AI Skeptic Friends Are All Nuts</title>
        <link rel="alternate" href="https://fly.io/blog/youre-all-nuts/"/>
        <id>https://fly.io/blog/youre-all-nuts/</id>
        <published>2025-06-02T00:00:00+00:00</published>
        <updated>2025-06-10T21:38:22+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/youre-all-nuts/assets/whoah.png"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;A heartfelt provocation about AI-assisted programming.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Tech execs are mandating LLM adoption. That&amp;rsquo;s bad strategy. But I get where they&amp;rsquo;re coming from.&lt;/p&gt;

&lt;p&gt;Some of the smartest people I know share a bone-deep belief that AI is a fad —  the next iteration of NFT mania. I&amp;rsquo;ve been reluctant to push back on them, because, well, they&amp;rsquo;re smarter than me. But their arguments are unserious, and worth confronting. Extraordinarily talented people are doing work that LLMs already do better, out of  spite.&lt;/p&gt;

&lt;p&gt;All progress on LLMs could halt today, and LLMs would remain the 2nd most important thing to happen over the course of my career.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;&lt;strong class="font-semibold text-navy-950"&gt;Important caveat&lt;/strong&gt;: I’m discussing only the implications of LLMs for software development. For art, music, and writing? I got nothing. I’m inclined to believe the skeptics in those fields. I just don’t believe them about mine.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Bona fides: I&amp;rsquo;ve been shipping software since the mid-1990s. I started out in boxed, shrink-wrap C code.  Survived an ill-advised &lt;a href='https://www.amazon.com/Modern-Design-Generic-Programming-Patterns/dp/0201704315' title=''&gt;Alexandrescu&lt;/a&gt; C++ phase. Lots of Ruby and Python tooling.  Some kernel work. A whole lot of server-side C, Go, and Rust. However you define &amp;ldquo;serious developer&amp;rdquo;, I qualify. Even if only on one of your lower tiers.&lt;/p&gt;
&lt;h3 id='level-setting' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#level-setting' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;level setting&lt;/span&gt;&lt;/h3&gt;&lt;div class="right-sidenote"&gt;&lt;p&gt;† (or, God forbid, 2 years ago with Copilot)&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;First, we need to get on the same page. If you were trying and failing to use an LLM for code 6 months ago †, you’re not doing what most serious LLM-assisted coders are doing.&lt;/p&gt;

&lt;p&gt;People coding with LLMs today use agents. Agents get to poke around your codebase on their own. They author files directly. They run tools. They compile code, run tests, and iterate on the results. They also:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;pull in arbitrary code from the tree, or from other trees online, into their context windows,
&lt;/li&gt;&lt;li&gt;run standard Unix tools to navigate the tree and extract information,
&lt;/li&gt;&lt;li&gt;interact with Git,
&lt;/li&gt;&lt;li&gt;run existing tooling, like linters, formatters, and model checkers, and
&lt;/li&gt;&lt;li&gt;make essentially arbitrary tool calls (that you set up) through MCP.
&lt;/li&gt;&lt;/ul&gt;
&lt;div class="callout"&gt;&lt;p&gt;The code in an agent that actually “does stuff” with code is not, itself, AI. This should reassure you. It’s surprisingly simple systems code, wired to ground truth about programming in the same way a Makefile is. You could write an effective coding agent in a weekend. Its strengths would have more to do with how you think about and structure builds and linting and test harnesses than with how advanced o3 or Sonnet have become.&lt;/p&gt;
&lt;/div&gt;
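&lt;p&gt;To make that concrete, here is a toy sketch of that non-AI loop, with the model call stubbed out (&lt;code&gt;ask_model&lt;/code&gt; is a placeholder, not a real API):&lt;/p&gt;

```python
import subprocess
import sys

def ask_model(transcript):
    # Placeholder for the actual LLM call; a real agent would send the
    # transcript to a model API and get back a patch plus a done flag.
    return {"patch": "", "done": True}

def run_checks():
    # Ground truth, Makefile-style: run the build/tests, capture the result.
    # Here the "test suite" is a no-op Python invocation for illustration.
    result = subprocess.run([sys.executable, "-c", "pass"],
                            capture_output=True, text=True)
    return result.returncode, result.stdout + result.stderr

def agent_loop(task, max_steps=5):
    transcript = [task]
    for _ in range(max_steps):
        reply = ask_model(transcript)
        # (Applying reply["patch"] to the working tree is omitted.)
        code, output = run_checks()
        if code == 0 and reply["done"]:
            return "candidate-ready"
        # Feed real compiler/test output back to the model and go again.
        transcript.append(output)
    return "gave-up"
```

&lt;p&gt;Everything interesting lives in &lt;code&gt;run_checks&lt;/code&gt;: the better your build, lint, and test harness, the better this loop grounds the model.&lt;/p&gt;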
&lt;p&gt;If you’re making requests on a ChatGPT page and then pasting the resulting (broken) code into your editor, you’re not doing what the AI boosters are doing. No wonder you&amp;rsquo;re talking past each other.&lt;/p&gt;
&lt;h3 id='the-positive-case' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-positive-case' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;the positive case&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;img alt="four quadrants of tedium and importance" src="/blog/youre-all-nuts/assets/code-quad.png?2/3&amp;amp;center" /&gt;&lt;/p&gt;

&lt;p&gt;LLMs can write a large fraction of all the tedious code you’ll ever need to write. And most code on most projects is tedious. LLMs drastically reduce the number of things you’ll ever need to Google. They look things up themselves. Most importantly, they don’t get tired; they’re immune to inertia.&lt;/p&gt;

&lt;p&gt;Think of anything you wanted to build but didn&amp;rsquo;t. You tried to home in on some first steps. If you&amp;rsquo;d been in the limerent phase of a new programming language, you&amp;rsquo;d have started writing. But you weren&amp;rsquo;t, so you put it off, for a day, a year, or your whole career.&lt;/p&gt;

&lt;p&gt;I can feel my blood pressure rising thinking of all the bookkeeping and Googling and dependency drama of a new project. An LLM can be instructed to just figure all that shit out. Often, it will drop you precisely at that golden moment where shit almost works, and development means tweaking code and immediately seeing things work better. That dopamine hit is why I code.&lt;/p&gt;

&lt;p&gt;There&amp;rsquo;s a downside. Sometimes, gnarly stuff needs doing. But you don&amp;rsquo;t wanna do it. So you refactor unit tests, soothing yourself with the lie that you&amp;rsquo;re doing real work. But an LLM can be told to go refactor all your unit tests. An agent can occupy itself for hours putzing with your tests in a VM and come back later with a PR. If you listen to me, you’ll know that. You&amp;rsquo;ll feel worse yak-shaving. You&amp;rsquo;ll end up doing… real work.&lt;/p&gt;
&lt;h3 id='but-you-have-no-idea-what-the-code-is' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-you-have-no-idea-what-the-code-is' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;but you have no idea what the code is&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;Are you a vibe coding Youtuber? Can you not read code? If so: astute point. Otherwise: what the fuck is wrong with you?&lt;/p&gt;

&lt;p&gt;You&amp;rsquo;ve always been responsible for what you merge to &lt;code&gt;main&lt;/code&gt;. You were five years ago. And you are tomorrow, whether or not you use an LLM.&lt;/p&gt;

&lt;p&gt;If you build something with an LLM that people will depend on, read the code. In fact, you&amp;rsquo;ll probably do more than that. You&amp;rsquo;ll spend 5-10 minutes knocking it back into your own style. LLMs are &lt;a href='https://github.com/PatrickJS/awesome-cursorrules' title=''&gt;showing signs of adapting&lt;/a&gt; to local idiom, but we’re not there yet.&lt;/p&gt;

&lt;p&gt;People complain about LLM-generated code being “probabilistic”. No it isn&amp;rsquo;t. It’s code. It&amp;rsquo;s not Yacc output. It&amp;rsquo;s knowable. The LLM might be stochastic. But the LLM doesn’t matter. What matters is whether you can make sense of the result, and whether your guardrails hold.&lt;/p&gt;

&lt;p&gt;Reading other people&amp;rsquo;s code is part of the job. If you can&amp;rsquo;t metabolize the boring, repetitive code an LLM generates: skills issue! How are you handling the chaos human developers turn out on a deadline?&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;† (because it can hold 50-70kloc in its context window)&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;For the last month or so, Gemini 2.5 has been my go-to †. Almost nothing it spits out for me merges without edits. I’m sure there’s a skill to getting a SOTA model to one-shot a feature-plus-merge! But I don’t care. I like moving the code around and chuckling to myself while I delete all the stupid comments. I have to read the code line-by-line anyways.&lt;/p&gt;
&lt;h3 id='but-hallucination' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-hallucination' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;but hallucination&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;If hallucination matters to you, your programming language has let you down.&lt;/p&gt;

&lt;p&gt;Agents lint. They compile and run tests. If their LLM invents a new function signature, the agent sees the error. They feed it back to the LLM, which says “oh, right, I totally made that up” and then tries again.&lt;/p&gt;
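&lt;p&gt;The mechanism is simple enough to sketch: execute (or lint) the candidate code and hand the real error text back as the next prompt. Illustrative Python, not any particular agent's internals:&lt;/p&gt;

```python
def check_candidate(source):
    """Run the model's code; return None on success, or the error to feed back."""
    try:
        exec(compile(source, "candidate.py", "exec"), {})
        return None
    except Exception as err:
        # A hallucinated function shows up here as a plain NameError.
        return type(err).__name__ + ": " + str(err)

# The agent appends this feedback to the conversation; the model corrects itself.
feedback = check_candidate("totally_made_up_function(42)")
```

&lt;p&gt;Real agents use compilers, linters, and test runners instead of &lt;code&gt;exec&lt;/code&gt;, but the shape is the same: ground truth in, correction out.&lt;/p&gt;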

&lt;p&gt;You&amp;rsquo;ll only notice this happening if you watch the chain of thought log your agent generates. Don&amp;rsquo;t. This is why I like &lt;a href='https://zed.dev/agentic' title=''&gt;Zed&amp;rsquo;s agent mode&lt;/a&gt;: it begs you to tab away and let it work, and pings you with a desktop notification when it&amp;rsquo;s done.&lt;/p&gt;

&lt;p&gt;I’m sure there are still environments where hallucination matters. But &amp;ldquo;hallucination&amp;rdquo; is the first thing developers bring up when someone suggests using LLMs, despite it being (more or less) a solved problem.&lt;/p&gt;
&lt;h3 id='but-the-code-is-shitty-like-that-of-a-junior-developer' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-the-code-is-shitty-like-that-of-a-junior-developer' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;but the code is shitty, like that of a junior developer&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;Does an intern cost $20/month? Because that&amp;rsquo;s what Cursor.ai costs.&lt;/p&gt;

&lt;p&gt;Part of being a senior developer is making less-able coders productive, be they fleshly or algebraic. Using agents well is both a skill and an engineering project all its own, of prompts, indices, &lt;a href='https://fly.io/blog/semgrep-but-for-real-now/' title=''&gt;and (especially) tooling.&lt;/a&gt; LLMs only produce shitty code if you let them.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;† (Also: 100% of all the Bash code you should author ever again)&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Maybe the current confusion is about who&amp;rsquo;s doing what work. Today, LLMs do a lot of typing, Googling, test cases †, and edit-compile-test-debug cycles. But even the most Claude-poisoned serious developers in the world still own curation, judgement, guidance, and direction.&lt;/p&gt;

&lt;p&gt;Also: let’s stop kidding ourselves about how good our human first cuts really are.&lt;/p&gt;
&lt;h3 id='but-its-bad-at-rust' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-its-bad-at-rust' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;but it’s bad at rust&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;It’s hard to get a good toolchain for Brainfuck, too. Life’s tough in the aluminum siding business.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;† (and they surely will; the Rust community takes tooling seriously)&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;A lot of LLM skepticism probably isn&amp;rsquo;t really about LLMs. It&amp;rsquo;s projection. People say &amp;ldquo;LLMs can&amp;rsquo;t code&amp;rdquo; when what they really mean is &amp;ldquo;LLMs can&amp;rsquo;t write Rust&amp;rdquo;. Fair enough! But people select languages in part based on how well LLMs work with them, so Rust people should get on that †.&lt;/p&gt;

&lt;p&gt;I work mostly in Go. I’m confident the designers of the Go programming language didn&amp;rsquo;t set out to produce the most LLM-legible language in the industry. They succeeded nonetheless. Go has just enough type safety, an extensive standard library, and a culture that prizes (often repetitive) idiom. LLMs kick ass generating it.&lt;/p&gt;

&lt;p&gt;All this is to say: I write some Rust. I like it fine. If LLMs and Rust aren&amp;rsquo;t working for you, I feel you. But if that’s your whole thing, we’re not having the same argument.&lt;/p&gt;
&lt;h3 id='but-the-craft' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-the-craft' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;but the craft&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;Do you like fine Japanese woodworking? All hand tools and sashimono joinery? Me too. Do it on your own time.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;† (I’m a piker compared to my woodworking friends)&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;I have a basic wood shop in my basement †. I could get a lot of satisfaction from building a table. And, if that table is a workbench or a grill table, sure, I&amp;rsquo;ll build it. But if I need, like, a table? For people to sit at? In my office? I buy a fucking table.&lt;/p&gt;

&lt;p&gt;Professional software developers are in the business of solving practical problems for people with code. We are not, in our day jobs, artisans. Steve Jobs was wrong: we do not need to carve the unseen feet in the sculpture. Nobody cares if the logic board traces are pleasingly routed. If anything we build endures, it won&amp;rsquo;t be because the codebase was beautiful.&lt;/p&gt;

&lt;p&gt;Besides, that’s not really what happens. If you’re taking time carefully golfing functions down into graceful, fluent, minimal functional expressions, alarm bells should ring. You’re yak-shaving. The real work has depleted your focus. You&amp;rsquo;re not building: you&amp;rsquo;re self-soothing.&lt;/p&gt;

&lt;p&gt;Which, wait for it, is something LLMs are good for. They devour schlep, and clear a path to the important stuff, where your judgement and values really matter.&lt;/p&gt;
&lt;h3 id='but-the-mediocrity' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-the-mediocrity' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;but the mediocrity&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;As a mid-late career coder, I&amp;rsquo;ve come to appreciate mediocrity. You should be so lucky as to  have it flowing almost effortlessly from a tap.&lt;/p&gt;

&lt;p&gt;We all write mediocre code. Mediocre code: often fine. Not all code is equally important. Some code should be mediocre. Maximum effort on a random unit test? You&amp;rsquo;re doing something wrong. Your team lead should correct you.&lt;/p&gt;

&lt;p&gt;Developers all love to preen about code. They worry LLMs lower the &amp;ldquo;ceiling&amp;rdquo; for quality. Maybe. But they also raise the &amp;ldquo;floor&amp;rdquo;.&lt;/p&gt;

&lt;p&gt;Gemini&amp;rsquo;s floor is higher than my own.  My code looks nice. But it&amp;rsquo;s not as thorough. LLM code is repetitive. But mine includes dumb contortions where I got too clever trying to DRY things up.&lt;/p&gt;

&lt;p&gt;And LLMs aren&amp;rsquo;t mediocre on every axis. They almost certainly have a bigger bag of algorithmic tricks than you do: radix tries, topological sorts, graph reductions, and LDPC codes. Humans romanticize &lt;code&gt;rsync&lt;/code&gt; (&lt;a href='https://www.andrew.cmu.edu/course/15-749/READINGS/required/cas/tridgell96.pdf' title=''&gt;Andrew Tridgell&lt;/a&gt; wrote a paper about it!). To an LLM it might not be that much more interesting than a SQL join.&lt;/p&gt;

&lt;p&gt;But I&amp;rsquo;m getting ahead of myself. It doesn&amp;rsquo;t matter. If truly mediocre code is all we ever get from LLMs, that&amp;rsquo;s still huge. It&amp;rsquo;s that much less mediocre code humans have to write.&lt;/p&gt;
&lt;h3 id='but-itll-never-be-agi' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-itll-never-be-agi' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;but it&amp;rsquo;ll never be AGI&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;I don&amp;rsquo;t give a shit.&lt;/p&gt;

&lt;p&gt;Smart practitioners get wound up by the AI/VC hype cycle. I can&amp;rsquo;t blame them. But it&amp;rsquo;s not an argument. Things either work or they don&amp;rsquo;t, no matter what Jensen Huang has to say about it.&lt;/p&gt;
&lt;h3 id='but-they-take-rr-jerbs' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-they-take-rr-jerbs' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;but they take-rr jerbs&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;a href='https://news.ycombinator.com/item?id=43776612' title=''&gt;So does open source.&lt;/a&gt; We used to pay good money for databases.&lt;/p&gt;

&lt;p&gt;We&amp;rsquo;re a field premised on automating other people’s jobs away. &amp;ldquo;Productivity gains,&amp;rdquo; say the economists. You get what that means, right? Fewer people doing the same stuff. Talked to a travel agent lately? Or a floor broker? Or a record store clerk? Or a darkroom tech?&lt;/p&gt;

&lt;p&gt;When this argument comes up, libertarian-leaning VCs start the chant: lamplighters, creative destruction, new kinds of work. Maybe. But I&amp;rsquo;m not hypnotized. I have no fucking clue whether we’re going to be better off after LLMs. Things could get a lot worse for us.&lt;/p&gt;

&lt;p&gt;LLMs really might displace many software developers. That&amp;rsquo;s not a high horse we get to ride. Our jobs are just as much in tech&amp;rsquo;s line of fire as everybody else&amp;rsquo;s have been for the last 3 decades. We&amp;rsquo;re not &lt;a href='https://en.wikipedia.org/wiki/2024_United_States_port_strike' title=''&gt;East Coast dockworkers&lt;/a&gt;; we won&amp;rsquo;t stop progress on our own.&lt;/p&gt;
&lt;h3 id='but-the-plagiarism' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-the-plagiarism' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;but the plagiarism&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;Artificial intelligence is profoundly — and probably unfairly — threatening to visual artists in ways that might be hard to appreciate if you don&amp;rsquo;t work in the arts.&lt;/p&gt;

&lt;p&gt;We imagine artists spending their working hours pushing the limits of expression. But the median artist isn&amp;rsquo;t producing gallery pieces. They produce on brief: turning out competent illustrations and compositions for magazine covers, museum displays, motion graphics, and game assets.&lt;/p&gt;

&lt;p&gt;LLMs easily — alarmingly — clear industry quality bars. Gallingly, one of the things they&amp;rsquo;re best at is churning out just-good-enough facsimiles of human creative work.  I have family in visual arts. I can&amp;rsquo;t talk to them about LLMs. I don&amp;rsquo;t blame them. They&amp;rsquo;re probably not wrong.&lt;/p&gt;

&lt;p&gt;Meanwhile, software developers spot code fragments &lt;a href="https://arxiv.org/abs/2311.17035"&gt;seemingly lifted&lt;/a&gt; from public repositories on GitHub and lose their shit. What about the licensing? If you&amp;rsquo;re a lawyer, I defer. But if you&amp;rsquo;re a software developer playing this card? Cut me a little slack as I ask you to shove this concern up your ass. No profession has demonstrated more contempt for intellectual property.&lt;/p&gt;

&lt;p&gt;The median dev thinks Star Wars and Daft Punk are a public commons. The great cultural project of developers has been opposing any protection that might inconvenience a monetizable media-sharing site. When they fail at policy, they route around it with coercion. They stand up global-scale piracy networks and sneer at anybody who so much as tries to preserve a new-release window for a TV show.&lt;/p&gt;

&lt;p&gt;Call any of this out if you want to watch a TED talk about how hard it is to stream &lt;em&gt;The Expanse&lt;/em&gt; on LibreWolf. Yeah, we get it. You don&amp;rsquo;t believe in IPR. Then shut the fuck up about IPR. Reap the whirlwind.&lt;/p&gt;

&lt;p&gt;It&amp;rsquo;s all special pleading anyways. LLMs digest code further than you do. If you don&amp;rsquo;t believe a typeface designer can stake a moral claim on the terminals and counters of a letterform, you sure as hell can&amp;rsquo;t be possessive about a red-black tree.&lt;/p&gt;
&lt;h3 id='positive-case-redux' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#positive-case-redux' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;positive case redux&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;When I started writing a couple days ago, I wrote a section to &amp;ldquo;level set&amp;rdquo; to the  state of the art of LLM-assisted programming. A bluefish filet has a longer shelf life than an LLM take. In the time it took you to read this, everything changed.&lt;/p&gt;

&lt;p&gt;Kids today don&amp;rsquo;t just use agents; they use asynchronous agents. They wake up, free-associate 13 different things for their LLMs to work on, make coffee, fill out a TPS report, drive to the Mars Cheese Castle, and then check their notifications. They&amp;rsquo;ve got 13 PRs to review. Three get tossed and re-prompted. Five of them get the same feedback a junior dev gets. And five get merged.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&amp;ldquo;I&amp;rsquo;m sipping rocket fuel right now,&amp;rdquo;&lt;/em&gt; a friend tells me. &lt;em&gt;&amp;ldquo;The folks on my team who aren&amp;rsquo;t embracing AI? It&amp;rsquo;s like they&amp;rsquo;re standing still.&amp;rdquo;&lt;/em&gt; He&amp;rsquo;s not bullshitting me. He doesn&amp;rsquo;t work in SFBA. He&amp;rsquo;s got no reason to lie.&lt;/p&gt;

&lt;p&gt;There&amp;rsquo;s plenty of things I can&amp;rsquo;t trust an LLM with. No LLM has any access to prod here. But I&amp;rsquo;ve been first responder on an incident and fed 4o — not o4-mini, 4o — log transcripts, and watched it in seconds spot LVM metadata corruption issues on a host we&amp;rsquo;ve been complaining about for months. Am I better than an LLM agent at interrogating OpenSearch logs and Honeycomb traces? No. No, I am not.&lt;/p&gt;

&lt;p&gt;To the consternation of many of my friends, I&amp;rsquo;m not a radical or a futurist. I&amp;rsquo;m a statist. I believe in the haphazard perseverance of complex systems, of institutions, of reversions to the mean. I write Go and Python code. I&amp;rsquo;m not a Kool-aid drinker.&lt;/p&gt;

&lt;p&gt;But something real is happening. My smartest friends are blowing it off. Maybe I persuade you. Probably I don&amp;rsquo;t. But we need to be done making space for bad arguments.&lt;/p&gt;
&lt;h3 id='but-im-tired-of-hearing-about-it' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-im-tired-of-hearing-about-it' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;but i&amp;rsquo;m tired of hearing about it&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;And here I rejoin your company. I read &lt;a href='https://simonwillison.net/' title=''&gt;Simon Willison&lt;/a&gt;, and that&amp;rsquo;s all I really need. But all day, every day, a sizable chunk of the front page of HN is allocated to LLMs: incremental model updates, startups doing things with LLMs, LLM tutorials, screeds against LLMs. It&amp;rsquo;s annoying!&lt;/p&gt;

&lt;p&gt;But AI is also incredibly — a word I use advisedly — important. It&amp;rsquo;s getting the same kind of attention that smart phones got in 2008, and not as much as the Internet got. That seems about right.&lt;/p&gt;

&lt;p&gt;I think this is going to get clearer over the next year. The cool kid haughtiness about &amp;ldquo;stochastic parrots&amp;rdquo; and &amp;ldquo;vibe coding&amp;rdquo; can&amp;rsquo;t survive much more contact with reality. I&amp;rsquo;m snarking about these people, but I meant what I said: they&amp;rsquo;re smarter than me. And when they get over this affectation, they&amp;rsquo;re going to make coding agents profoundly more effective than they are today.&lt;/p&gt;</content>
    </entry>
    <entry>
        <title>Using Kamal 2.0 in Production</title>
        <link rel="alternate" href="https://fly.io/blog/kamal-in-production/"/>
        <id>https://fly.io/blog/kamal-in-production/</id>
        <published>2025-05-29T00:00:00+00:00</published>
        <updated>2025-06-02T20:40:48+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/kamal-in-production/assets/production.jpg"/>
        <content type="html">&lt;p&gt;&lt;a href='https://pragprog.com/titles/rails8/agile-web-development-with-rails-8/' title=''&gt;Agile Web Development with Rails 8&lt;/a&gt; is off to production, where they do things like editing, indexing, pagination, and printing. In researching the chapter on Deployment and Production, I became very dissatisfied with the content available on Kamal. I ended up writing my own, and it went well beyond the scope of the book. I then extracted what I needed from the result and put it in the book.&lt;/p&gt;

&lt;p&gt;Now that I have some spare time, I took a look at the greater work. It was more than a chapter and less than a book, so I decided to publish it &lt;a href='https://rubys.github.io/kamal-in-production/' title=''&gt;online&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This took me only a matter of hours. I had my notes in the XML grammar that the Pragmatic Bookshelf uses for books. I asked GitHub Copilot to convert them to Markdown. It did the job without my having to explain the grammar. It made intelligent guesses as to how to handle footnotes and got a number of these wrong, but that was easy to fix. On a lark, I asked it to proofread the content, and it did that too.&lt;/p&gt;

&lt;hr&gt;

&lt;p&gt;Don&amp;rsquo;t get me wrong, Kamal is great. There are plenty of videos on how to get toy projects online, and the documentation will tell you what each field in the configuration file does. But none pull together everything you need to deploy a real project. For example, &lt;a href='https://rubys.github.io/kamal-in-production/Assemble/' title=''&gt;there are seven things you need to get started&lt;/a&gt;. Some are optional, some you may already have, and all can be gathered quickly &lt;strong class='font-semibold text-navy-950'&gt;if you have a list&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Kamal is just one piece of the puzzle. To deploy your software using Kamal, you need to be aware of the vast Docker ecosystem. You will want to set up a builder, sign up for a container repository, and lock down your secrets. And as you grow, you will want a load balancer and a managed database.&lt;/p&gt;
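&lt;p&gt;Those pieces all meet in one file. What follows is a hypothetical minimal &lt;code&gt;config/deploy.yml&lt;/code&gt; sketching how they fit together; the service name, host IP, registry, and usernames are all placeholders, not taken from the book or the site.&lt;/p&gt;

```yaml
# Hypothetical minimal config/deploy.yml; every name here is a placeholder.
service: myapp
image: myuser/myapp

servers:
  web:
    - 203.0.113.10            # your Hetzner (or other provider) host

registry:
  server: ghcr.io             # your container repository
  username: myuser
  password:
    - KAMAL_REGISTRY_PASSWORD # resolved from .kamal/secrets, never committed

builder:
  arch: amd64                 # Kamal 2 requires an explicit build architecture

env:
  secret:
    - RAILS_MASTER_KEY        # locked-down secrets, injected at deploy time
```

&lt;p&gt;With a file like this in place, &lt;code&gt;kamal setup&lt;/code&gt; bootstraps the servers and &lt;code&gt;kamal deploy&lt;/code&gt; ships a new image.&lt;/p&gt;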

&lt;p&gt;And production is much more than copying files and starting a process. It is ensuring that your database is backed up, that your logs are searchable, and that your application is being monitored. It is also about ensuring that your application is secure.&lt;/p&gt;

&lt;p&gt;My list is opinionated. Each choice has a lot of options. For SSH keys, there are a lot of key types. If you don&amp;rsquo;t have an opinion, go with what GitHub recommends. For hosting, there are a lot of providers, and most videos start with Hetzner, which is a solid choice. But did you know that there are separate paths for Cloud and Robot, and when you would want each?&lt;/p&gt;

&lt;p&gt;A list with default options and alternatives highlighted was what would have helped me get started. Now it is available to you. The &lt;a href='https://github.com/rubys/kamal-in-production/' title=''&gt;source is on GitHub&lt;/a&gt;. &lt;a href='https://creativecommons.org/public-domain/cc0/' title=''&gt;CC0 licensed&lt;/a&gt;. Feel free to add side pages or links to document DigitalOcean, add other monitoring tools, or simply correct a typo that GitHub Copilot and I missed (or made).&lt;/p&gt;

&lt;hr&gt;

&lt;p&gt;And if you happen to be in the southeastern part of the US in August, come see me talk on this topic at the &lt;a href='https://blog.carolina.codes/p/announcing-our-2025-speakers' title=''&gt;Carolina Code Conference&lt;/a&gt;. If you can&amp;rsquo;t make it, the presentation will be recorded and posted online.&lt;/p&gt;
    </entry>
    <entry>
        <title>parking_lot: ffffffffffffffff...</title>
        <link rel="alternate" href="https://fly.io/blog/parking-lot-ffffffffffffffff/"/>
        <id>https://fly.io/blog/parking-lot-ffffffffffffffff/</id>
        <published>2025-05-28T00:00:00+00:00</published>
        <updated>2025-06-02T20:40:48+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/parking-lot-ffffffffffffffff/assets/ffff-cover.webp"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;We’re Fly.io, a public cloud that runs apps in a bunch of locations all over the world. This is a post about a gnarly bug in our Anycast router, the largest Rust project in our codebase. You won’t need much of any Rust to follow it.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The basic idea is simple: customers give us Docker containers, and tell us which one of 30+ regions around the world they want them to run in. We convert the containers into lightweight virtual machines, and then link them to an Anycast network. If you boot up an app here in Sydney and Frankfurt, and a request for it lands in Narita, it&amp;rsquo;ll get routed to Sydney. The component doing that work is called &lt;code&gt;fly-proxy&lt;/code&gt;. It&amp;rsquo;s a Rust program, and it has been ill-behaved of late.&lt;/p&gt;
&lt;h3 id='dramatis-personae' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#dramatis-personae' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Dramatis Personae&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;fly-proxy&lt;/code&gt;, our intrepid Anycast router.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;corrosion&lt;/code&gt;, our intrepid Anycast routing protocol.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Rust&lt;/code&gt;, a programming language you probably don&amp;rsquo;t use.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;read-write locks&lt;/code&gt;, a synchronization primitive that allows for many readers &lt;em&gt;or&lt;/em&gt; one single writer.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;parking_lot&lt;/code&gt;, a well-regarded optimized implementation of locks in Rust.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;Gaze not into the abyss, lest you become recognized as an &lt;strong class="font-semibold text-navy-950"&gt;&lt;em&gt;abyss domain expert&lt;/em&gt;&lt;/strong&gt;, and they expect you keep gazing into the damn thing&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Mathewson &lt;a href="https://x.com/nickm_tor/status/860234274842324993?lang=en" title=""&gt;6:31&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;/div&gt;&lt;h3 id='anycast-routing' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#anycast-routing' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Anycast Routing&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;You already know how to build a proxy. Connection comes in, connection goes out, copy back and forth. So for the amount of time we spend writing about &lt;code&gt;fly-proxy&lt;/code&gt;, you might wonder what the big deal is.&lt;/p&gt;

&lt;p&gt;To be fair, in the nuts and bolts of actually proxying requests, &lt;code&gt;fly-proxy&lt;/code&gt; does some interesting stuff. For one thing, it&amp;rsquo;s &lt;a href='https://github.com/jedisct1/yes-rs' title=''&gt;written in Rust&lt;/a&gt;, which is apparently a big deal all on its own. It&amp;rsquo;s a large, asynchronous codebase that handles multiple protocols, TLS termination, and certificate issuance. It exercises a lot of &lt;a href='https://tokio.rs/' title=''&gt;Tokio&lt;/a&gt; features.&lt;/p&gt;

&lt;p&gt;But none of this is the hard part of &lt;code&gt;fly-proxy&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;We operate thousands of servers around the world. A customer Fly Machine might get scheduled onto any of them; in some places, the expectation we set with customers is that their Machines will potentially start in less than a second. More importantly, a Fly Machine can terminate instantly. In both cases, but especially the latter, &lt;code&gt;fly-proxy&lt;/code&gt; potentially needs to know, so that it does (or doesn&amp;rsquo;t) route traffic there.&lt;/p&gt;

&lt;p&gt;This is the hard problem: managing millions of connections for millions of apps. It&amp;rsquo;s a lot of state to manage, and it&amp;rsquo;s in constant flux. We refer to this as the &amp;ldquo;state distribution problem&amp;rdquo;, but really, it quacks like a routing protocol.&lt;/p&gt;
&lt;h3 id='our-routing-protocol-is-corrosion' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#our-routing-protocol-is-corrosion' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Our Routing Protocol is Corrosion&lt;/span&gt;&lt;/h3&gt;&lt;div class="right-sidenote"&gt;&lt;p&gt;Corrosion2, to be precise.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;We&amp;rsquo;ve been through multiple iterations of the state management problem, and the stable place we&amp;rsquo;ve settled is a &lt;a href='https://github.com/superfly/corrosion' title=''&gt;system called Corrosion&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Corrosion is a large SQLite database mirrored globally across several thousand servers. There are three important factors that make Corrosion work:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The SQLite database Corrosion replicates is CRDT-structured.
&lt;/li&gt;&lt;li&gt;In our orchestration model, individual worker servers are the source of truth for any Fly Machines running on them; there&amp;rsquo;s no globally coordinated orchestration state.
&lt;/li&gt;&lt;li&gt;We use &lt;a href='http://fly.io/blog/building-clusters-with-serf/#what-serf-is-doing' title=''&gt;SWIM gossip&lt;/a&gt; to publish updates from those workers across the fleet.
&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;This works. A Fly Machine terminates in Dallas; a &lt;code&gt;fly-proxy&lt;/code&gt; instance in Singapore knows within a small number of seconds.&lt;/p&gt;
&lt;h3 id='routing-protocol-implementations-are-hard' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#routing-protocol-implementations-are-hard' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Routing Protocol Implementations Are Hard&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;A routing protocol is a canonical example of a distributed system. We&amp;rsquo;ve certainly hit distsys bugs in Corrosion. But this story is about basic implementation issues. &lt;/p&gt;

&lt;p&gt;A globally replicated SQLite database is an awfully nice primitive, but we&amp;rsquo;re not actually doing SQL queries every time a request lands.&lt;/p&gt;

&lt;p&gt;In somewhat the same sense as a router works both with a &lt;a href='https://www.dasblinkenlichten.com/rib-and-fib-understanding-the-terminology/' title=''&gt;RIB and a FIB&lt;/a&gt;, there is in &lt;code&gt;fly-proxy&lt;/code&gt; a system of record for routing information (Corrosion), and then an in-memory aggregation of that information used to make fast decisions. In &lt;code&gt;fly-proxy&lt;/code&gt;, that&amp;rsquo;s called the Catalog. It&amp;rsquo;s a record of everything in Corrosion a proxy might need to know about to forward requests.&lt;/p&gt;

&lt;p&gt;Here&amp;rsquo;s a fun bug from last year:&lt;/p&gt;

&lt;p&gt;At any given point in time, there&amp;rsquo;s a lot going on inside &lt;code&gt;fly-proxy&lt;/code&gt;. It is a highly concurrent system. In particular, there are many readers to the Catalog, and also, when Corrosion updates, writers. We manage access to the Catalog with a system of &lt;a href='https://doc.rust-lang.org/std/sync/struct.RwLock.html' title=''&gt;read-write locks&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Rust is big on pattern-matching; instead of control flow based (at bottom) on simple arithmetic expressions, Rust allows you to &lt;code&gt;match&lt;/code&gt; exhaustively (with compiler enforcement) on the types of an expression, and the type system expresses things like &lt;code&gt;Ok&lt;/code&gt; or &lt;code&gt;Err&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;But &lt;code&gt;match&lt;/code&gt; can be cumbersome, and so there are shorthands. One of them is &lt;code&gt;if let&lt;/code&gt;, which is syntax that makes a pattern match read like a classic &lt;code&gt;if&lt;/code&gt; statement. Here&amp;rsquo;s an example:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative rust"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-zarn5q31"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-zarn5q31"&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;Some&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;Load&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;Local&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;load&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="py"&gt;.load&lt;/span&gt;&lt;span class="nf"&gt;.read&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// do a bunch of stuff with `load`&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="nf"&gt;.init_for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The &amp;ldquo;if&amp;rdquo; arm of that branch is taken if &lt;code&gt;self.load.read().get()&lt;/code&gt; returns a value with the type &lt;code&gt;Some&lt;/code&gt;. To retrieve that value, the expression calls &lt;code&gt;read()&lt;/code&gt; to grab a lock.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;though Rust programmers probably notice the bug quickly&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The bug is subtle: in that code, the lock &lt;code&gt;self.load.read().get()&lt;/code&gt; takes is held not just for the duration of the &amp;ldquo;if&amp;rdquo; arm, but also for the &amp;ldquo;else&amp;rdquo; arm — you can think of &lt;code&gt;if let&lt;/code&gt; expressions as being rewritten to the equivalent &lt;code&gt;match&lt;/code&gt; expression, where that lifespan is much clearer.&lt;/p&gt;
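&lt;p&gt;A minimal sketch of the shape of the fix, using the standard library&amp;rsquo;s &lt;code&gt;RwLock&lt;/code&gt; rather than the actual Catalog code: copy the value out first, so the read guard is dropped before the branch and the &amp;ldquo;else&amp;rdquo; arm is free to take the write lock.&lt;/p&gt;

```rust
use std::sync::RwLock;

// Hypothetical stand-in for the Catalog's per-app load entry.
struct State {
    load: RwLock<Option<u64>>,
}

impl State {
    fn poll(&self) {
        // The read guard is a temporary inside this statement, so it is
        // dropped before the `if` branches; neither arm holds the lock.
        let current = *self.load.read().unwrap();
        if let Some(load) = current {
            // do a bunch of stuff with `load`
            let _ = load;
        } else {
            // With the buggy `if let ... = self.load.read()...` form, the
            // read guard would still be alive here, and this write() could
            // deadlock against it.
            *self.load.write().unwrap() = Some(1);
        }
    }
}

fn main() {
    let s = State { load: RwLock::new(None) };
    s.poll(); // else arm: initializes
    s.poll(); // if arm: reads
    assert_eq!(*s.load.read().unwrap(), Some(1));
    println!("ok");
}
```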

&lt;p&gt;Anyways that&amp;rsquo;s real code and it occurred on a code path in &lt;code&gt;fly-proxy&lt;/code&gt; that was triggered by a Corrosion update propagating from host to host across our fleet in millisecond intervals of time, converging quickly, like a good little routing protocol, on a global consensus that our entire Anycast routing layer should be deadlocked. Fun times.&lt;/p&gt;
&lt;h3 id='the-watchdog-and-regionalizing' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-watchdog-and-regionalizing' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;The Watchdog, and Regionalizing&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;The experience of that Anycast outage was arresting, and immediately altered our plans. We set about two tasks, one short-term and one long.&lt;/p&gt;

&lt;p&gt;In the short term: we made deadlocks nonlethal with a &amp;ldquo;watchdog&amp;rdquo; system. &lt;code&gt;fly-proxy&lt;/code&gt; has an internal control channel (it drives a REPL operators can run from our servers). During a deadlock (or dead-loop or exhaustion), that channel becomes nonresponsive. We watch for that and bounce the proxy when it happens. A deadlock is still bad! But it&amp;rsquo;s a second-or-two-length arrhythmia, not asystole.&lt;/p&gt;
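&lt;p&gt;The watchdog idea fits in a few lines. This is a hypothetical sketch, not the real implementation: ping the control loop over a channel, and treat a missed deadline as a lockup.&lt;/p&gt;

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Sketch of the watchdog check: send the control loop a reply channel and
// wait for the echo. A missed deadline means the proxy should be bounced
// (here we just report liveness as a bool).
fn watchdog(control: mpsc::Sender<mpsc::Sender<()>>, deadline: Duration) -> bool {
    let (pong_tx, pong_rx) = mpsc::channel();
    if control.send(pong_tx).is_err() {
        return false; // the control loop is gone entirely
    }
    // Alive only if the control loop echoes back within the deadline.
    pong_rx.recv_timeout(deadline).is_ok()
}

fn main() {
    // A healthy control loop: answers every ping.
    let (tx, rx) = mpsc::channel::<mpsc::Sender<()>>();
    thread::spawn(move || {
        for pong in rx {
            let _ = pong.send(());
        }
    });
    assert!(watchdog(tx, Duration::from_millis(500)));

    // A "deadlocked" control loop: never drains its channel.
    let (stuck_tx, _stuck_rx) = mpsc::channel::<mpsc::Sender<()>>();
    assert!(!watchdog(stuck_tx, Duration::from_millis(100)));
    println!("watchdog ok");
}
```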

&lt;p&gt;Meanwhile, over the long term: we&amp;rsquo;re confronting the implications of all our routing state sharing a global broadcast domain. The update that seized up Anycast last year pertained to an app nobody used. There wasn&amp;rsquo;t any real reason for any &lt;code&gt;fly-proxy&lt;/code&gt; to receive it in the first place. But in the &lt;em&gt;status quo ante&lt;/em&gt; of the outage, every proxy received updates for every Fly Machine.&lt;/p&gt;

&lt;p&gt;They still do. Why? Because it scales, and fixing it turns out to be a huge lift. It&amp;rsquo;s a lift we&amp;rsquo;re still making! It&amp;rsquo;s just taking time. We call this effort &amp;ldquo;regionalization&amp;rdquo;, because the next intermediate goal is to confine most updates to the region (Sydney, Frankfurt, Dallas) in which they occur.&lt;/p&gt;

&lt;p&gt;I hope this has been a satisfying little tour of the problem domain we&amp;rsquo;re working in. We have now reached the point where I can start describing the new bug.&lt;/p&gt;
&lt;h3 id='new-bug-round-1-lazy-loading' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#new-bug-round-1-lazy-loading' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;New Bug, Round 1: Lazy Loading&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;We want to transform our original Anycast router, designed to assume a full picture of global state, into a regionalized router with partitioned state. A sane approach: have it lazy-load state information. Then you can spare most of the state code; state for apps that never show up at a particular &lt;code&gt;fly-proxy&lt;/code&gt; in, say, Hong Kong simply doesn&amp;rsquo;t get loaded.&lt;/p&gt;

&lt;p&gt;For months now, portions of the &lt;code&gt;fly-proxy&lt;/code&gt; Catalog have been lazy-loaded. A few weeks ago, the decision is made to get most of the rest of the Catalog state (apps, machines, &amp;amp;c) lazy-loaded as well. It&amp;rsquo;s a straightforward change and it gets rolled out quickly.&lt;/p&gt;

&lt;p&gt;Almost as quickly, proxies begin locking up and getting bounced by the watchdog. Not in the frightening Beijing-Olympics-level coordinated fashion that happened during the Outage, but any watchdog bounce is a problem.&lt;/p&gt;

&lt;p&gt;We roll back the change.&lt;/p&gt;

&lt;p&gt;From the information we have, we&amp;rsquo;ve narrowed things down to two suspects. First, lazy-loading changes the read/write patterns and thus the pressure on the RWLocks the Catalog uses; it could just be lock contention. Second, we spot a suspicious &lt;code&gt;if let&lt;/code&gt;.&lt;/p&gt;
&lt;h3 id='new-bug-round-2-the-lock-refactor' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#new-bug-round-2-the-lock-refactor' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;New Bug, Round 2: The Lock Refactor&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;Whichever the case, there&amp;rsquo;s a reason these proxies locked up with the new lazy-loading code, and our job is to fix it. The &lt;code&gt;if let&lt;/code&gt; is easy. Lock contention is a little trickier.&lt;/p&gt;

&lt;p&gt;At this point it&amp;rsquo;s time to introduce a new character to the story, though they&amp;rsquo;ve been lurking on the stage the whole time: it&amp;rsquo;s &lt;a href='https://github.com/Amanieu/parking_lot' title=''&gt;&lt;code&gt;parking_lot&lt;/code&gt;&lt;/a&gt;, an important, well-regarded, and widely-used replacement for the standard library&amp;rsquo;s lock implementation.&lt;/p&gt;

&lt;p&gt;Locks in &lt;code&gt;fly-proxy&lt;/code&gt; are &lt;code&gt;parking_lot&lt;/code&gt; locks. People use &lt;code&gt;parking_lot&lt;/code&gt; mostly because it is very fast and tight (a lock takes up just a single 64-bit word). But we use it for the feature set. The feature we&amp;rsquo;re going to pull out this time is lock timeouts: the RWLock in &lt;code&gt;parking_lot&lt;/code&gt; exposes a &lt;a href='https://amanieu.github.io/parking_lot/parking_lot/struct.RwLock.html#method.try_write_for' title=''&gt;&lt;code&gt;try_write_for&lt;/code&gt;&lt;/a&gt; method, which takes a &lt;code&gt;Duration&lt;/code&gt;, after which an attempt to grab the write lock fails.&lt;/p&gt;
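&lt;p&gt;To make that concrete, here&amp;rsquo;s a standard-library approximation of the same pattern (a sketch: &lt;code&gt;parking_lot&lt;/code&gt; parks the waiting thread efficiently, while this version polls).&lt;/p&gt;

```rust
use std::sync::{RwLock, RwLockWriteGuard};
use std::thread;
use std::time::{Duration, Instant};

// Std-only approximation of parking_lot's `try_write_for`: poll `try_write`
// until a deadline passes, instead of blocking forever on `write`.
fn try_write_for<T>(lock: &RwLock<T>, timeout: Duration) -> Option<RwLockWriteGuard<'_, T>> {
    let deadline = Instant::now() + timeout;
    loop {
        if let Ok(guard) = lock.try_write() {
            return Some(guard);
        }
        if Instant::now() >= deadline {
            return None; // give up: a telemetry event and a recovery path, not a hang
        }
        thread::sleep(Duration::from_millis(1));
    }
}

fn main() {
    let lock = RwLock::new(0u64);

    // Uncontended: the write lock is acquired immediately.
    assert!(try_write_for(&lock, Duration::from_millis(10)).is_some());

    // Contended: a reader holds the lock past our deadline, so the write
    // attempt times out instead of deadlocking the thread.
    let reader = lock.read().unwrap();
    assert!(try_write_for(&lock, Duration::from_millis(50)).is_none());
    drop(reader);
    println!("write timeout handled");
}
```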

&lt;p&gt;Before rolling out a new lazy-loading &lt;code&gt;fly-proxy&lt;/code&gt;, we do some refactoring:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;our Catalog write locks all time out, so we&amp;rsquo;ll get telemetry and a failure recovery path if that&amp;rsquo;s what&amp;rsquo;s choking the proxy to death,
&lt;/li&gt;&lt;li&gt;we eliminate RAII-style lock acquisition from the Catalog code (in RAII style, you grab a lock into a local value, and the lock is released implicitly when the scope that expression occurs in concludes) and replace it with explicit closures, so you can look at the code and see precisely the interval in which the lock is held, and
&lt;/li&gt;&lt;li&gt;since we now have code that is very explicit about locking intervals, we instrument it with logging and metrics so we can see what&amp;rsquo;s happening.
&lt;/li&gt;&lt;/ul&gt;
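&lt;p&gt;In sketch form (hypothetical names, standard-library locks), the closure style looks like this: the guard cannot escape the closure, so the hold interval is exactly the closure body, and instrumenting it is trivial.&lt;/p&gt;

```rust
use std::sync::RwLock;
use std::time::{Duration, Instant};

// Hypothetical sketch of closure-scoped Catalog access. The read guard lives
// only for the duration of the closure, so we can time how long each caller
// waited for and held the lock.
struct Catalog {
    apps: RwLock<Vec<String>>,
    slow_threshold: Duration,
}

impl Catalog {
    fn with_read<R>(&self, f: impl FnOnce(&Vec<String>) -> R) -> R {
        let start = Instant::now();
        let guard = self.apps.read().unwrap();
        let result = f(&guard);
        drop(guard); // lock released here, before we check the clock
        if start.elapsed() > self.slow_threshold {
            eprintln!("slow catalog read: {:?}", start.elapsed());
        }
        result
    }

    fn with_write<R>(&self, f: impl FnOnce(&mut Vec<String>) -> R) -> R {
        let mut guard = self.apps.write().unwrap();
        f(&mut guard)
    }
}

fn main() {
    let catalog = Catalog {
        apps: RwLock::new(vec![]),
        slow_threshold: Duration::from_millis(100),
    };
    catalog.with_write(|apps| apps.push("app-1".to_string()));
    let n = catalog.with_read(|apps| apps.len());
    assert_eq!(n, 1);
    println!("catalog ok");
}
```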

&lt;p&gt;We should be set. The suspicious &lt;code&gt;if let&lt;/code&gt; is gone, lock acquisition can time out, and we have all this new visibility.&lt;/p&gt;

&lt;p&gt;Nope. Immediately more lockups, all in Europe, especially in &lt;code&gt;WAW&lt;/code&gt;.&lt;/p&gt;
&lt;h3 id='new-bug-round-3-telemetry-inspection' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#new-bug-round-3-telemetry-inspection' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;New Bug, Round 3: Telemetry Inspection&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;That we&amp;rsquo;re still seeing deadlocks is f&amp;#39;ing weird. We&amp;rsquo;ve audited all our Catalog locks. You can look at the code and see the lifespan of a grabbed lock.&lt;/p&gt;

&lt;p&gt;We have a clue: our lock timeout logs spam just before a proxy locks up and is killed by the watchdog. This clue: useless. But we don&amp;rsquo;t know that yet!&lt;/p&gt;

&lt;p&gt;Our new hunch is that something is holding a lock for a very long time, and we just need to find out which thing that is. Maybe the threads updating the Catalog from Corrosion are dead-looping?&lt;/p&gt;

&lt;p&gt;The instrumentation should tell the tale. But it does not. We see slow locks… just before the entire proxy locks up. And they happen in benign, quiet applications. Random, benign, quiet applications.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;parking_lot&lt;/code&gt; has a &lt;a href='https://amanieu.github.io/parking_lot/parking_lot/deadlock/index.html' title=''&gt;deadlock detector&lt;/a&gt;. If you ask it, it&amp;rsquo;ll keep a waiting-for dependency graph and detect stalled threads. This runs on its own thread, isolated from the web of lock dependencies in the rest of the proxy. We cut a build of the proxy with the deadlock detector enabled and wait for something in &lt;code&gt;WAW&lt;/code&gt; to lock up. And it does. But &lt;code&gt;parking_lot&lt;/code&gt; doesn&amp;rsquo;t notice. As far as it&amp;rsquo;s concerned, nothing is wrong.&lt;/p&gt;

&lt;p&gt;We are at this moment very happy we did the watchdog thing.&lt;/p&gt;
&lt;h3 id='new-bug-round-4-descent-into-madness' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#new-bug-round-4-descent-into-madness' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;New Bug, Round 4: Descent Into Madness&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;When the watchdog bounces a proxy, it snaps a core dump from the process it just killed. We are now looking at core dumps. There is only one level of decompensation to be reached below &amp;ldquo;inspecting core dumps&amp;rdquo;, and that&amp;rsquo;s &amp;ldquo;blaming the compiler&amp;rdquo;. We will get there.&lt;/p&gt;

&lt;p&gt;Here&amp;rsquo;s Pavel, at the time:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I’ve been staring at the last core dump from &lt;code&gt;waw&lt;/code&gt; . It’s quite strange.
First, there is no thread that’s running inside the critical section. Yet, there is a thread that’s waiting to acquire write lock and a bunch of threads waiting to acquire a read lock.
That doesn’t prove anything, of course, as a thread holding the catalog write lock might have just released it before the core dump was taken. But that would be quite a coincidence.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The proxy is locked up. But there are no threads in a critical section for the catalog. Nevertheless, we can see stack traces of multiple threads waiting to acquire locks. In the moment, Pavel thinks this could be a fluke. But we&amp;rsquo;ll soon learn that &lt;em&gt;every single stack trace&lt;/em&gt; shows the same pattern: everything wants the Catalog lock, but nobody has it.&lt;/p&gt;

&lt;p&gt;It&amp;rsquo;s hard to overstate how weird this is. It breaks both our big theories: it&amp;rsquo;s not compatible with a Catalog deadlock that we missed, and it&amp;rsquo;s not compatible with a very slow lock holder. Like a podcast listener staring into the sky at the condensation trail of a jetliner, we begin reaching for wild theories. For instance: &lt;code&gt;parking_lot&lt;/code&gt; locks are synchronous, but we&amp;rsquo;re a Tokio application; something somewhere could be taking an async lock that&amp;rsquo;s confusing the runtime. Alas, no.&lt;/p&gt;

&lt;p&gt;On the plus side, we are now better at postmortem core dump inspection with &lt;code&gt;gdb&lt;/code&gt;.&lt;/p&gt;
&lt;h3 id='new-bug-round-5-madness-gives-way-to-desperation' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#new-bug-round-5-madness-gives-way-to-desperation' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;New Bug, Round 5: Madness Gives Way To Desperation&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;Fuck it, we&amp;rsquo;ll switch to &lt;code&gt;read_recursive&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;A recursive read lock is an eldritch rite invoked when you need to grab a read lock deep in a call tree where you already grabbed that lock, but can&amp;rsquo;t be arsed to structure the code properly to reflect that. When you ask for a recursive lock, if you already hold the lock, you get the lock again, instead of a deadlock.&lt;/p&gt;

&lt;p&gt;Our theory: &lt;code&gt;parking_lot&lt;/code&gt; goes through some trouble to make sure a stampede of readers won&amp;rsquo;t starve writers, who are usually outnumbered. It prefers writers by preventing readers from acquiring locks when there&amp;rsquo;s at least one waiting writer. And &lt;code&gt;read_recursive&lt;/code&gt; sidesteps that logic.&lt;/p&gt;

&lt;p&gt;Maybe there&amp;rsquo;s some ultra-slow reader somehow not showing up in our traces, and maybe switching to recursive locks will cut through that.&lt;/p&gt;

&lt;p&gt;This does not work. At least, not how we hoped it would. It does generate a new piece of evidence: &lt;code&gt;RwLock reader count overflow&lt;/code&gt; log messages, and lots of them.&lt;/p&gt;
&lt;h3 id='there-are-things-you-are-not-meant-to-know' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#there-are-things-you-are-not-meant-to-know' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;There Are Things You Are Not Meant To Know&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;You&amp;rsquo;re reading a 3,000-word blog post about a single concurrency bug, so my guess is you&amp;rsquo;re the kind of person who compulsively wants to understand how everything works. That&amp;rsquo;s fine, but a word of advice: there are things where, if you find yourself learning about them in detail, something has gone wrong.&lt;/p&gt;

&lt;p&gt;One of those things is the precise mechanisms used by your RWLock implementation.&lt;/p&gt;

&lt;p&gt;The whole point of &lt;code&gt;parking_lot&lt;/code&gt; is that the locks are tiny, marshalled into a 64-bit word. Those bits are partitioned into &lt;a href='https://github.com/Amanieu/parking_lot/blob/master/src/raw_rwlock.rs' title=''&gt;4 signaling bits&lt;/a&gt; (&lt;code&gt;PARKED&lt;/code&gt;, &lt;code&gt;WRITER_PARKED&lt;/code&gt;, &lt;code&gt;WRITER&lt;/code&gt;, and &lt;code&gt;UPGRADEABLE&lt;/code&gt;) and a 60-bit counter of lock holders.&lt;/p&gt;
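&lt;p&gt;Decoding that layout is simple arithmetic. A sketch (the exact flag values here are illustrative; the real bit assignments live in &lt;code&gt;raw_rwlock.rs&lt;/code&gt;):&lt;/p&gt;

```rust
// Illustrative lock-word layout, as described above: four signaling bits in
// the low nibble, reader count in the remaining 60 bits of the 64-bit word.
const PARKED: u64 = 0b0001;
const WRITER_PARKED: u64 = 0b0010;
const UPGRADEABLE: u64 = 0b0100; // bit positions here are illustrative
const WRITER: u64 = 0b1000;
const READERS_SHIFT: u32 = 4;

fn readers(word: u64) -> u64 {
    word >> READERS_SHIFT
}

fn main() {
    // A healthy word: two readers, no flags set.
    let two_readers = 2 << READERS_SHIFT;
    assert_eq!(readers(two_readers), 2);
    assert_eq!(two_readers & (PARKED | WRITER_PARKED | UPGRADEABLE | WRITER), 0);

    // The corrupted word from the core dumps: every flag set, counter maxed.
    let corrupted: u64 = 0xFFFF_FFFF_FFFF_FFFF;
    assert_ne!(corrupted & UPGRADEABLE, 0); // "upgradeable" is set, yet unused
    assert_eq!(readers(corrupted), (1u64 << 60) - 1); // 60-bit counter saturated
    println!("lock word decoded");
}
```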

&lt;blockquote&gt;
&lt;p&gt;Me, a dummy: sounds like we overflowed that counter.&lt;/p&gt;

&lt;p&gt;Pavel, a genius: we are not overflowing a 60-bit counter.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Ruling out a genuine overflow (which would require paranormal activity) leaves us with corruption as the culprit. Something is bashing our lock word. If the counter is corrupted, maybe the signaling bits are too; then we&amp;rsquo;re in an inconsistent state, an artificial deadlock.&lt;/p&gt;

&lt;p&gt;Easily confirmed. We cast the lock words into &lt;code&gt;usize&lt;/code&gt; and log them. Sure enough, they&amp;rsquo;re &lt;code&gt;0xFFFFFFFFFFFFFFFF&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This is a smoking gun, because it implies all 4 signaling bits are set, and that includes &lt;code&gt;UPGRADEABLE&lt;/code&gt;. Upgradeable locks are read-locks that can be &amp;ldquo;upgraded&amp;rdquo; to write locks. We don&amp;rsquo;t use them.&lt;/p&gt;

&lt;p&gt;This looks like classic memory corruption. But in our core dumps, memory doesn&amp;rsquo;t appear corrupted: the only thing set all &lt;code&gt;FFh&lt;/code&gt; is the lock word.&lt;/p&gt;
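Using the bit layout described above, a quick sketch shows why an all-ones word is so damning: every signaling bit reads as set and the reader counter reads as saturated. (Bit positions here follow the layout in parking_lot's raw_rwlock.rs, but treat them as illustrative.)

```python
# Decode a parking_lot RwLock state word. Bit positions per the layout
# described above (raw_rwlock.rs); they are illustrative, not load-bearing.
PARKED_BIT = 0b0001
WRITER_PARKED_BIT = 0b0010
UPGRADABLE_BIT = 0b0100
WRITER_BIT = 0b1000

def decode(word):
    flags = word % 16       # low 4 signaling bits
    readers = word // 16    # remaining 60 bits: reader count
    return flags, readers

flags, readers = decode(0xFFFFFFFFFFFFFFFF)
assert flags == 0b1111      # all four signaling bits set, UPGRADEABLE included
assert readers == 2**60 - 1 # reader counter completely saturated
```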

&lt;p&gt;We compile and run our test suites &lt;a href='https://github.com/rust-lang/miri' title=''&gt;under &lt;code&gt;miri&lt;/code&gt;&lt;/a&gt;, a Rust interpreter for its &lt;a href='https://rustc-dev-guide.rust-lang.org/mir/index.html' title=''&gt;MIR IR&lt;/a&gt;. &lt;code&gt;miri&lt;/code&gt; does all sorts of cool UB detection, and does in fact spot some UB in our tests, which we are at the time excited about until we deploy the fixes and nothing gets better.&lt;/p&gt;

&lt;p&gt;At this point, Saleem suggests guard pages. We could &lt;code&gt;mprotect&lt;/code&gt; memory pages around the lock to force a panic if a wild write hits &lt;em&gt;near&lt;/em&gt; the lock word, or just allocate zero pages and check them, which we are at the time excited about until we deploy the guard pages and nothing hits them.&lt;/p&gt;
&lt;h3 id='the-non-euclidean-horror-at-the-heart-of-this-bug' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-non-euclidean-horror-at-the-heart-of-this-bug' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;The Non-Euclidean Horror At The Heart Of This Bug&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;At this point we should recap where we find ourselves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We made a relatively innocuous change to our proxy, which caused proxies in Europe to get watchdog-killed.
&lt;/li&gt;&lt;li&gt;We audited and eliminated all the nasty &lt;code&gt;if-letses&lt;/code&gt;.
&lt;/li&gt;&lt;li&gt;We replaced all RAII lock acquisitions with explicit closures, and instrumented the closures. 
&lt;/li&gt;&lt;li&gt;We enabled &lt;code&gt;parking_lot&lt;/code&gt; deadlock detection. 
&lt;/li&gt;&lt;li&gt;We captured and analyzed core dumps for the killed proxies. 
&lt;/li&gt;&lt;li&gt;We frantically switched to recursive read locks, which generated a new error.
&lt;/li&gt;&lt;li&gt;We spotted what looks like memory corruption, but only of that one tiny lock word.
&lt;/li&gt;&lt;li&gt;We ran our code under an IR interpreter to find UB, fixed some UB, and didn&amp;rsquo;t fix the bug.
&lt;/li&gt;&lt;li&gt;We set up guard pages to catch wild writes.
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;In Richard Cook&amp;rsquo;s essential &lt;a href='https://how.complexsystems.fail/' title=''&gt;&amp;ldquo;How Complex Systems Fail&amp;rdquo;&lt;/a&gt;, rule #5 is that &amp;ldquo;complex systems operate in degraded mode&amp;rdquo;. &lt;em&gt;The system continues to function because it contains so many redundancies and because people can make it function, despite the presence of many flaws&lt;/em&gt;. Maybe &lt;code&gt;fly-proxy&lt;/code&gt; is now like the RBMK reactors at Chernobyl, a system that works just fine as long as operators know about and compensate for its known concurrency flaws, and everybody lives happily ever after.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;We are, in particular, running on the most popular architecture for its RWLock implementation.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;We have reached the point where serious conversations are happening about whether we&amp;rsquo;ve found a Rust compiler bug. Amusingly, &lt;code&gt;parking_lot&lt;/code&gt; is so well regarded among Rustaceans that it&amp;rsquo;s equally if not more plausible that Rust itself is broken.&lt;/p&gt;

&lt;p&gt;Nevertheless, we close-read the RWLock implementation. And we spot this:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative rust"&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-c23zvw3n"&gt;&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="py"&gt;.state&lt;/span&gt;&lt;span class="nf"&gt;.fetch_add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
   &lt;span class="n"&gt;prev_value&lt;/span&gt;&lt;span class="nf"&gt;.wrapping_sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;WRITER_BIT&lt;/span&gt; &lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="n"&gt;WRITER_PARKED_BIT&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                           &lt;span class="nn"&gt;Ordering&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Relaxed&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This looks like gibberish, so let&amp;rsquo;s rephrase that code to see what it&amp;rsquo;s actually doing:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative rust"&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-oq3znyk"&gt;&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="py"&gt;.state&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="o"&gt;~&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;WRITER_BIT&lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt;&lt;span class="n"&gt;WRITER_PARKED_BIT&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;If you know exactly the state of the word you&amp;rsquo;re writing to, and you know exactly the limited set of permutations the bits of that word can take (for instance, because there&amp;rsquo;s only 4 signaling bits), then instead of clearing bits by fetching a word, altering it, and then storing it, you can clear them &lt;em&gt;atomically&lt;/em&gt; by adding the inverse of those bits to the word.&lt;/p&gt;

&lt;p&gt;This pattern is self-synchronizing, but it relies on an invariant: you&amp;rsquo;d better be right about the original state of the word you&amp;rsquo;re altering. Because if you&amp;rsquo;re wrong, you&amp;rsquo;re adding a very large value to an uncontrolled value.&lt;/p&gt;

&lt;p&gt;In &lt;code&gt;parking_lot&lt;/code&gt;, say we have &lt;code&gt;WRITER&lt;/code&gt; and &lt;code&gt;WRITER_PARKED&lt;/code&gt; set: the state is &lt;code&gt;0b1010&lt;/code&gt;. &lt;code&gt;prev_value&lt;/code&gt;, the state of the lock word when the lock operation started, is virtually always 0, and that&amp;rsquo;s what we&amp;rsquo;re counting on. &lt;code&gt;prev_value.wrapping_sub()&lt;/code&gt; then calculates &lt;code&gt;0xFFFFFFFFFFFFFFF6&lt;/code&gt;, which exactly cancels out the &lt;code&gt;0b1010&lt;/code&gt; state, leaving 0.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;As a grace note, the locking code manages to set that one last unset bit before the proxy deadlocks.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Consider though what happens if one of those bits isn&amp;rsquo;t set: state is &lt;code&gt;0b1000&lt;/code&gt;. Now that add doesn&amp;rsquo;t cancel out; the final state is instead &lt;code&gt;0xFFFFFFFFFFFFFFFE&lt;/code&gt;. The reader count is completely full and can&amp;rsquo;t be decremented, and all the waiting bits are set so nothing can happen on the lock.&lt;/p&gt;
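The arithmetic is easy to check for yourself. A sketch, modeling 64-bit wrapping explicitly with the bit values used above:

```python
WORD = 2**64  # modulus for 64-bit wrapping arithmetic
WRITER_BIT = 0b1000
WRITER_PARKED_BIT = 0b0010

def wrapping_add(a, b):
    return (a + b) % WORD

# prev_value is assumed to be 0, so wrapping_sub yields the negation.
delta = (0 - (WRITER_BIT | WRITER_PARKED_BIT)) % WORD
assert delta == 0xFFFFFFFFFFFFFFF6

# Happy path: both bits set; the add cancels them exactly.
assert wrapping_add(0b1010, delta) == 0

# Bug: WRITER_PARKED was already cleared; the add wraps the whole word.
assert wrapping_add(0b1000, delta) == 0xFFFFFFFFFFFFFFFE
```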

&lt;p&gt;&lt;code&gt;parking_lot&lt;/code&gt; is a big deal and we&amp;rsquo;re going to be damn sure before we file a bug report. Which doesn&amp;rsquo;t take long; Pavel reproduces the bug in a minimal test case, with a forked version of &lt;code&gt;parking_lot&lt;/code&gt; that confirms and logs the condition.&lt;/p&gt;

&lt;p&gt;&lt;a href='https://github.com/Amanieu/parking_lot/issues/465' title=''&gt;The &lt;code&gt;parking_lot&lt;/code&gt; team quickly confirms&lt;/a&gt; and fixes the bug.&lt;/p&gt;
&lt;h3 id='ex-insania-claritas' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#ex-insania-claritas' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Ex Insania, Claritas&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;Here&amp;rsquo;s what we now know to have been happening:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Thread 1 grabs a read lock.
&lt;/li&gt;&lt;li&gt;Thread 2 tries to grab a write lock, with a &lt;code&gt;try_write_for&lt;/code&gt; timeout; it&amp;rsquo;s &amp;ldquo;parked&amp;rdquo; waiting for the reader, which sets &lt;code&gt;WRITER&lt;/code&gt; and &lt;code&gt;WRITER_PARKED&lt;/code&gt; on the raw lock word.
&lt;/li&gt;&lt;li&gt;Thread 1 releases the lock, unparking a waiting writer, which unsets &lt;code&gt;WRITER_PARKED&lt;/code&gt;.
&lt;/li&gt;&lt;li&gt;Thread 2 wakes, but not for the reason it thinks: the timing is such that it thinks the lock has timed out. So it tries to clear both &lt;code&gt;WRITER&lt;/code&gt; and &lt;code&gt;WRITER_PARKED&lt;/code&gt; — a bitwise &amp;ldquo;double free&amp;rdquo;. Lock: corrupted. Computer: over. 
&lt;/li&gt;&lt;/ol&gt;
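The four steps above can be replayed on a toy model of the lock word. A sketch, where the reader-count increment and bit values are illustrative and each line is the transition the corresponding thread performs:

```python
WORD = 2**64
WRITER_BIT = 0b1000
WRITER_PARKED_BIT = 0b0010
ONE_READER = 0b10000  # reader count lives above the 4 signaling bits

word = 0
word += ONE_READER                        # 1. thread 1 takes a read lock
word |= WRITER_BIT | WRITER_PARKED_BIT    # 2. thread 2 parks, waiting to write
word -= ONE_READER + WRITER_PARKED_BIT    # 3. thread 1 unlocks, unparks the writer

# 4. thread 2 wakes but believes it timed out, and tries to clear BOTH
#    bits even though WRITER_PARKED is already gone: a bitwise double free.
word = (word - (WRITER_BIT | WRITER_PARKED_BIT)) % WORD
assert word == 0xFFFFFFFFFFFFFFFE  # reader count saturated, signaling bits smashed
```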

&lt;p&gt;&lt;a href='https://github.com/Amanieu/parking_lot/pull/466' title=''&gt;The fix is simple&lt;/a&gt;: the writer bit is cleared separately in the same wakeup queue as the reader. The fix is deployed, the lockups never recur.&lt;/p&gt;

&lt;p&gt;At a higher level, the story is this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;We&amp;rsquo;re refactoring the proxy to regionalize it, which changes the pattern of readers and writers on the catalog.
&lt;/li&gt;&lt;li&gt;As we broaden out lazy loads, we introduce writer contention, a classic concurrency/performance debugging problem. It looks like a deadlock at the time! It isn&amp;rsquo;t. 
&lt;/li&gt;&lt;li&gt;&lt;code&gt;try_write_for&lt;/code&gt; is a good move: we need tools to manage contention.
&lt;/li&gt;&lt;li&gt;But now we&amp;rsquo;re on a buggy code path in &lt;code&gt;parking_lot&lt;/code&gt; — we don&amp;rsquo;t know that and can&amp;rsquo;t understand it until we&amp;rsquo;ve lost enough of our minds to second-guess the library.
&lt;/li&gt;&lt;li&gt;We stumble on the bug out of pure dumb luck by stabbing in the dark with &lt;code&gt;read_recursive&lt;/code&gt;.
&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;Mysteries remain. Why did this only happen in &lt;code&gt;WAW&lt;/code&gt;? Some kind of crazy regional timing thing? Something to do with the Polish &lt;em&gt;kreska&lt;/em&gt; diacritic that makes L&amp;rsquo;s sound like W&amp;rsquo;s? The wax and wane of caribou populations? Some very high-traffic APIs local to these regions? All very plausible.&lt;/p&gt;

&lt;p&gt;We&amp;rsquo;ll never know because we fixed the bug.&lt;/p&gt;

&lt;p&gt;But we&amp;rsquo;re in a better place now, even besides the bug fix:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;we audited all our catalog locking, got rid of all the iflets, and stopped relying on RAII lock guards.
&lt;/li&gt;&lt;li&gt;the resulting closure patterns gave us lock timing metrics, which will be useful dealing with future write contention
&lt;/li&gt;&lt;li&gt;all writes are now labeled, and we emit labeled logs on slow writes, augmented with context data (like app IDs) where we have it
&lt;/li&gt;&lt;li&gt;we also now track the last and current holder of the write lock, augmented with context information; next time we have a deadlock, we should have all the information we need to identify the actors without &lt;code&gt;gdb&lt;/code&gt; stack traces.
&lt;/li&gt;&lt;/ul&gt;</content>
    </entry>
    <entry>
        <title>Litestream: Revamped</title>
        <link rel="alternate" href="https://fly.io/blog/litestream-revamped/"/>
        <id>https://fly.io/blog/litestream-revamped/</id>
        <published>2025-05-20T00:00:00+00:00</published>
        <updated>2025-05-22T19:59:27+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/litestream-revamped/assets/litestream-revamped.png"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;&lt;a href="https://litestream.io/" title=""&gt;Litestream&lt;/a&gt; is an open-source tool that makes it possible to run many kinds of full-stack applications on top of SQLite by making them reliably recoverable from object storage. This is a post about the biggest change we’ve made to it since I launched it.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Nearly a decade ago, I got a bug up my ass. I wanted to build full-stack applications quickly. But the conventional n-tier database design required me to do sysadmin work for each app I shipped. Even the simplest applications depended on heavyweight database servers like Postgres or MySQL.&lt;/p&gt;

&lt;p&gt;I wanted to launch apps on SQLite, because SQLite is easy. But SQLite is embedded, not a server, which at the time implied that the data for my application lived (and died) with just one server.&lt;/p&gt;

&lt;p&gt;So in 2020, I wrote &lt;a href='https://litestream.io/' title=''&gt;Litestream&lt;/a&gt; to fix that.&lt;/p&gt;

&lt;p&gt;Litestream is a tool that runs alongside a SQLite application. Without changing that running application, it takes over the WAL checkpointing process to continuously stream database updates to an S3-compatible object store. If something happens to the server the app is running on, the whole database can efficiently be restored to a different server. You might lose servers, but you won&amp;rsquo;t lose your data.&lt;/p&gt;

&lt;p&gt;Litestream worked well. So we got ambitious. A few years later, we built &lt;a href='https://github.com/superfly/litefs' title=''&gt;LiteFS&lt;/a&gt;. LiteFS takes the ideas in Litestream and refines them, so that we can do read replicas and primary failovers with SQLite. LiteFS gives SQLite the modern deployment story of an n-tier database like Postgres, while keeping the database embedded.&lt;/p&gt;

&lt;p&gt;We like both LiteFS and Litestream. But Litestream is the more popular project. It&amp;rsquo;s easier to deploy and easier to reason about.&lt;/p&gt;

&lt;p&gt;There are some good ideas in LiteFS. We&amp;rsquo;d like Litestream users to benefit from them. So we&amp;rsquo;ve taken our LiteFS learnings and applied them to some new features in Litestream.&lt;/p&gt;
&lt;h2 id='point-in-time-restores-but-fast' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#point-in-time-restores-but-fast' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Point-in-time restores, but fast&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;a href='https://fly.io/blog/all-in-on-sqlite-litestream/' title=''&gt;Here&amp;rsquo;s how Litestream was originally designed&lt;/a&gt;: you run &lt;code&gt;litestream&lt;/code&gt; against a SQLite database, and it opens up a long-lived read transaction. This transaction arrests SQLite WAL checkpointing, the process by which SQLite consolidates the WAL back into the main database file. Litestream builds a &amp;ldquo;shadow WAL&amp;rdquo; that records WAL pages, and copies them to S3.&lt;/p&gt;

&lt;p&gt;This is simple, which is good. But it can also be slow. When you want to restore a database, you have to pull down and replay every change since the last snapshot. If you changed a single database page a thousand times, you replay a thousand changes. For databases with frequent writes, this isn&amp;rsquo;t a good approach.&lt;/p&gt;

&lt;p&gt;In LiteFS, we took a different approach. LiteFS is transaction-aware. It doesn&amp;rsquo;t simply record raw WAL pages, but rather ordered ranges of pages associated with transactions, using a file format we call &lt;a href='https://github.com/superfly/ltx' title=''&gt;LTX&lt;/a&gt;. Each LTX file represents a sorted changeset of pages for a given period of time.&lt;/p&gt;

&lt;p&gt;&lt;img alt="a simple linear LTX file with 8 pages between 1 and 21" src="/blog/litestream-revamped/assets/linear-ltx.png" /&gt;&lt;/p&gt;

&lt;p&gt;Because they are sorted, we can easily merge multiple LTX files together and create a new LTX file with only the latest version of each page.&lt;/p&gt;

&lt;p&gt;&lt;img alt="merging three LTX files into one" src="/blog/litestream-revamped/assets/merged-ltx.png" /&gt;&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;This is similar to how an &lt;a href="https://en.wikipedia.org/wiki/Log-structured_merge-tree" title=""&gt;LSM tree&lt;/a&gt; works.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;This process of combining smaller time ranges into larger ones is called &lt;em&gt;compaction&lt;/em&gt;. With it, we can replay a SQLite database to a specific point in time, with a minimum of duplicate pages.&lt;/p&gt;
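Compaction is easy to picture as a merge in which the newest version of each page wins. A toy sketch (page numbers and contents here are made up for illustration):

```python
def compact(ltx_files):
    """Merge ordered LTX-style changesets; later files carry newer page versions."""
    merged = {}
    for changeset in ltx_files:          # oldest to newest
        merged.update(changeset)         # a newer version of a page replaces the older
    return dict(sorted(merged.items()))  # keep pages sorted, as LTX files are

t0 = {1: "a0", 3: "c0", 7: "g0"}
t1 = {3: "c1", 9: "i0"}
t2 = {1: "a2"}

# Replaying to "now" needs one copy of each page, not every historical write.
assert compact([t0, t1, t2]) == {1: "a2", 3: "c1", 7: "g0", 9: "i0"}
```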
&lt;h2 id='casaas-compare-and-swap-as-a-service' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#casaas-compare-and-swap-as-a-service' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;CASAAS: Compare-and-Swap as a Service&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;One challenge Litestream has to deal with is desynchronization. Part of the point of Litestream is that SQLite applications don&amp;rsquo;t have to be aware of it. But &lt;code&gt;litestream&lt;/code&gt; is just a process, running alongside the application, and it can die independently. If &lt;code&gt;litestream&lt;/code&gt; is down while database changes occur, it will miss changes. The same kind of problem occurs if you start replication from a new server.&lt;/p&gt;

&lt;p&gt;Litestream needs a way to reset the replication stream from a new snapshot. It does that with &amp;ldquo;generations&amp;rdquo;. &lt;a href='https://litestream.io/how-it-works/#snapshots--generations' title=''&gt;A generation&lt;/a&gt; represents a snapshot and a stream of WAL updates, uniquely identified. Litestream notices any break in its WAL sequence and starts a new generation, which is how it recovers from desynchronization.&lt;/p&gt;

&lt;p&gt;Unfortunately, storing and managing multiple generations makes it difficult to implement features like failover and read-replicas.&lt;/p&gt;

&lt;p&gt;The most straightforward way around this problem is to make sure only one instance of Litestream can replicate to a given destination. If you can do that, you can store just a single, latest generation. That in turn makes it easy to know how to resync a read replica; there&amp;rsquo;s only one generation to choose from.&lt;/p&gt;

&lt;p&gt;In LiteFS, we solved this problem by using Consul, which guaranteed a single leader. That requires users to know about Consul. Things like &amp;ldquo;requiring Consul&amp;rdquo; are probably part of the reason Litestream is so much more popular than LiteFS.&lt;/p&gt;

&lt;p&gt;In Litestream, we&amp;rsquo;re solving the problem a different way. Modern object stores like S3 and Tigris solve this problem for us: they now offer &lt;a href='https://aws.amazon.com/about-aws/whats-new/2024/11/amazon-s3-functionality-conditional-writes/' title=''&gt;conditional write support&lt;/a&gt;. With conditional writes, we can implement a time-based lease. We get essentially the same constraint Consul gave us, but without having to think about it or set up a dependency.&lt;/p&gt;

&lt;p&gt;In the near term, this means you can run Litestream with ephemeral nodes with overlapping run times; even if they&amp;rsquo;re storing to the same destination, they won&amp;rsquo;t confuse each other.&lt;/p&gt;
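The lease idea can be sketched against any store that offers compare-and-swap semantics. Here is a minimal in-memory model; the store class, key names, and timings are all invented for illustration, and real S3 conditional writes use ETag-based headers rather than this API:

```python
import time

class CASStore:
    """Stand-in for an object store with conditional (compare-and-swap) puts."""
    def __init__(self):
        self.value = None

    def put_if(self, expected, new):
        # Atomic in a real store; succeeds only if the current value matches.
        if self.value == expected:
            self.value = new
            return True
        return False

def try_acquire_lease(store, owner, ttl, now=None):
    now = time.time() if now is None else now
    current = store.value
    # Take the lease if it is free or expired; renew it if we already hold it.
    if current is None or now > current["expires"] or current["owner"] == owner:
        return store.put_if(current, {"owner": owner, "expires": now + ttl})
    return False

store = CASStore()
assert try_acquire_lease(store, "node-a", ttl=30, now=100.0)
assert not try_acquire_lease(store, "node-b", ttl=30, now=110.0)  # node-a still holds it
assert try_acquire_lease(store, "node-b", ttl=30, now=200.0)      # lease expired
```

The conditional put is what makes two overlapping Litestream instances safe: whichever one loses the compare-and-swap simply fails to take the lease.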
&lt;h2 id='lightweight-read-replicas' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#lightweight-read-replicas' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Lightweight read replicas&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;The original design constraint of both Litestream and LiteFS was to extend SQLite to modern deployment scenarios without disturbing people&amp;rsquo;s built code. Both tools are meant to function even if applications are oblivious to them.&lt;/p&gt;

&lt;p&gt;LiteFS is more ambitious than Litestream, and requires transaction-awareness. To get that without disturbing built code, we use a cute trick (a.k.a. a gross hack): LiteFS provides a FUSE filesystem, which lets it act as a proxy between the application and the backing store. From that vantage point, we can easily discern transactions.&lt;/p&gt;

&lt;p&gt;The FUSE approach gave us a lot of control, enough that users could use SQLite replicas just like any other database. But installing and running a whole filesystem (even a fake one) is a lot to ask of users. To work around that problem, we relaxed a constraint: LiteFS can function without the FUSE filesystem if you load an extension into your application code, &lt;a href='https://github.com/superfly/litevfs' title=''&gt;LiteVFS&lt;/a&gt;.  LiteVFS is a &lt;a href='https://www.sqlite.org/vfs.html' title=''&gt;SQLite Virtual Filesystem&lt;/a&gt; (VFS). It works in a variety of environments, including some where FUSE can&amp;rsquo;t, like in-browser WASM builds.&lt;/p&gt;

&lt;p&gt;What we&amp;rsquo;re doing next is taking the same trick and using it on Litestream. We&amp;rsquo;re building a VFS-based read-replica layer. It will be able to fetch and cache pages directly from S3-compatible object storage.&lt;/p&gt;

&lt;p&gt;Of course, there&amp;rsquo;s a catch: this approach isn&amp;rsquo;t as efficient as a local SQLite database. That kind of efficiency, where you don&amp;rsquo;t even need to think about N+1 queries because there&amp;rsquo;s no network round-trip to make duplicative queries expensive, is part of the point of using SQLite.&lt;/p&gt;

&lt;p&gt;But we&amp;rsquo;re optimistic that with caching and prefetching, the approach we&amp;rsquo;re using will yield, for the right use cases, strong performance — all while serving SQLite reads hot off of Tigris or S3.&lt;/p&gt;
&lt;figure class="post-cta"&gt;
  &lt;figcaption&gt;
    &lt;h1&gt;Litestream is fully open source&lt;/h1&gt;
    &lt;p&gt;It&amp;rsquo;s not coupled with Fly.io at all; you can use it anywhere.&lt;/p&gt;
      &lt;a class="btn btn-lg" href="https://litestream.io/"&gt;
        Check it out &lt;span class='opacity-50'&gt;→&lt;/span&gt;
      &lt;/a&gt;
  &lt;/figcaption&gt;
  &lt;div class="image-container"&gt;
    &lt;img src="/static/images/cta-kitty.webp" srcset="/static/images/[email protected] 2x" alt=""&gt;
  &lt;/div&gt;
&lt;/figure&gt;

&lt;h2 id='synchronize-lots-of-databases' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#synchronize-lots-of-databases' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Synchronize Lots Of Databases&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;While we&amp;rsquo;ve got you here: we&amp;rsquo;re knocking out one of our most requested features.&lt;/p&gt;

&lt;p&gt;In the old Litestream design, WAL-change polling and slow restores made it infeasible to replicate large numbers of databases from a single process. That has been our answer when users ask us for a &amp;ldquo;wildcard&amp;rdquo; or &amp;ldquo;directory&amp;rdquo; replication argument for the tool.&lt;/p&gt;

&lt;p&gt;Now that we&amp;rsquo;ve switched to LTX, this isn&amp;rsquo;t a problem any more. It should thus be possible to replicate &lt;code&gt;/data/*.db&lt;/code&gt;, even if there&amp;rsquo;s hundreds or thousands of databases in that directory.&lt;/p&gt;
&lt;h2 id='we-still-sqlite' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#we-still-sqlite' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;We Still ❤️ SQLite&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;SQLite has always been a solid database to build on and it&amp;rsquo;s continued to find new use cases as the industry evolves. We&amp;rsquo;re super excited to continue to build Litestream alongside it.&lt;/p&gt;

&lt;p&gt;We have a sneaking suspicion that the robots that write LLM code are going to like SQLite too. We think what &lt;a href='https://phoenix.new/' title=''&gt;coding agents like Phoenix.new&lt;/a&gt; want is a way to try out code on live data, screw it up, and then rollback &lt;em&gt;both the code and the state.&lt;/em&gt; These Litestream updates put us in a position to give agents PITR as a primitive. On top of that, you can build both rollbacks and forks.&lt;/p&gt;

&lt;p&gt;Whether or not you&amp;rsquo;re drinking the AI kool-aid, we think this new design for Litestream is just better. We&amp;rsquo;re psyched to be rolling it out, and for the features it&amp;rsquo;s going to enable.&lt;/p&gt;</content>
    </entry>
    <entry>
        <title>Launching MCP Servers on Fly.io</title>
        <link rel="alternate" href="https://fly.io/blog/mcp-launch/"/>
        <id>https://fly.io/blog/mcp-launch/</id>
        <published>2025-05-19T00:00:00+00:00</published>
        <updated>2025-05-22T19:59:27+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/mcp-launch/assets/fly-mcp-launch.png"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;This is a blog post. Part showing off. Part opinion. Plan accordingly.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The &lt;a href='https://www.anthropic.com/news/model-context-protocol' title=''&gt;Model Context Protocol&lt;/a&gt; is days away from turning six months old. You read that right, six &lt;em&gt;months&lt;/em&gt; old. MCP servers have both taken the world by storm and are still trying to figure out what they want to be when they grow up.&lt;/p&gt;

&lt;p&gt;There is no doubt that MCP servers are useful. But their appeal goes beyond that. They are also simple and universal. What&amp;rsquo;s not to like?&lt;/p&gt;

&lt;p&gt;Well, for starters, there&amp;rsquo;s basically two types of MCP servers. One small and nimble that runs as a process on your machine. And one that is an HTTP server that runs presumably elsewhere and is &lt;a href='https://modelcontextprotocol.io/specification/2025-03-26/basic/authorization' title=''&gt;standardizing&lt;/a&gt; on OAuth 2.1. And there is a third type, but it is deprecated.&lt;/p&gt;

&lt;p&gt;Next there is the configuration. Asking users to manually edit JSON seems so early 21st century. With Claude, this goes into &lt;code&gt;~/Library/Application Support/Claude/claude_desktop_config.json&lt;/code&gt;, and is found under an &lt;code&gt;mcpServers&lt;/code&gt; key. With Zed, this file is in &lt;code&gt;~/.config/zed/settings.json&lt;/code&gt; and is found under a &lt;code&gt;context_servers&lt;/code&gt; key. And some tools put these files in a different place depending on whether you are running on macOS, Linux, or Windows.&lt;/p&gt;

&lt;p&gt;Finally, there is security. An MCP server running on your machine literally has access to everything you do. Running remote solves this problem, but did I mention &lt;a href='https://datatracker.ietf.org/doc/html/draft-ietf-oauth-v2-1-12' title=''&gt;OAuth 2.1&lt;/a&gt;? Not exactly something one sets up for casual use.&lt;/p&gt;

&lt;p&gt;None of these issues is fatal, as the popularity of MCP servers makes obvious. But can we do better? I think so.&lt;/p&gt;

&lt;hr&gt;

&lt;p&gt;Demo time.&lt;/p&gt;

&lt;p&gt;Let&amp;rsquo;s try out the &lt;a href='https://github.com/modelcontextprotocol/servers/blob/main/src/slack/README.md' title=''&gt;Slack MCP Server&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;MCP Server for the Slack API, enabling Claude to interact with Slack workspaces.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That certainly sounds like a good test case. There is a small amount of &lt;a href='https://github.com/modelcontextprotocol/servers/blob/main/src/slack/README.md#setup' title=''&gt;setup&lt;/a&gt; you need to do, and when you are done you end up with a &lt;em&gt;Bot User OAuth Token&lt;/em&gt; starting with &lt;code&gt;xoxb-&lt;/code&gt; and a &lt;em&gt;Team ID&lt;/em&gt; starting with a &lt;code&gt;T&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;You &lt;em&gt;would&lt;/em&gt; run it using the following:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative sh"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-ievvjhpo"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-ievvjhpo"&gt;npx &lt;span class="nt"&gt;-y&lt;/span&gt; @modelcontextprotocol/server-slack
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;But instead, you convert that command to JSON, find the right configuration file, and put this information in there. Then you either run the Slack MCP server locally or set up a remote server, with or without authentication.&lt;/p&gt;
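&lt;p&gt;For reference, per the Slack server README, the converted JSON ends up looking roughly like this (the token values are placeholders):&lt;/p&gt;

```json
{
  "mcpServers": {
    "slack": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-slack"],
      "env": {
        "SLACK_BOT_TOKEN": "xoxb-your-bot-token",
        "SLACK_TEAM_ID": "T01234567"
      }
    }
  }
}
```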

&lt;p&gt;Wouldn&amp;rsquo;t it be nice to have the simplicity of a local process, the security of a remote server, and to eliminate the hassle of manually editing JSON?&lt;/p&gt;

&lt;p&gt;Here&amp;rsquo;s our current thinking:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative sh"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-gdwhiyfl"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-gdwhiyfl"&gt;fly mcp launch &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"npx -y @modelcontextprotocol/server-slack"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--claude&lt;/span&gt; &lt;span class="nt"&gt;--server&lt;/span&gt; slack &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--secret&lt;/span&gt; &lt;span class="nv"&gt;SLACK_BOT_TOKEN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;xoxb-your-bot-token &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--secret&lt;/span&gt; &lt;span class="nv"&gt;SLACK_TEAM_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;T01234567
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;You can put this all on one line if you like; I split it up so it fits on small screens and so we can talk about the various parts.&lt;/p&gt;

&lt;p&gt;The first three words seem reasonable. The quoted string is just the command that we want to run. So let&amp;rsquo;s talk about the four flags. The first tells us which tool&amp;rsquo;s configuration file we want to update. The second specifies the name to use for the server in that configuration file. The last two set secrets.&lt;/p&gt;

&lt;p&gt;Oh, did I say current thinking? Perhaps I should mention that it is something you can run right now, as long as you have flyctl &lt;code&gt;v0.3.125&lt;/code&gt; or later. Complete with the options you would expect from Fly.io, like the ability to configure auto-stop, file contents, flycast, secrets, region, volumes, and VM sizes.&lt;/p&gt;

&lt;p&gt;And, hey, lookie there:&lt;/p&gt;

&lt;p&gt;&lt;img alt="testing, testing, 1, 2, 3" src="/blog/mcp-launch/assets/mcp-slack.png" /&gt;&lt;/p&gt;

&lt;p&gt;Support for Claude, Cursor, Neovim, VS Code, Windsurf, and Zed are built in. You can select multiple clients and configuration files.&lt;/p&gt;

&lt;p&gt;By default, bearer token authentication will be set up on both the server and client.&lt;/p&gt;
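&lt;p&gt;The client side of that is just an &lt;code&gt;Authorization&lt;/code&gt; header on every HTTP request to the server. A minimal sketch, where the URL and token are placeholders, not real Fly.io values:&lt;/p&gt;

```python
import urllib.request

def mcp_request(url, token, body):
    # Build an HTTP request that carries the shared bearer token.
    req = urllib.request.Request(url, data=body.encode(), method="POST")
    req.add_header("Authorization", "Bearer " + token)
    req.add_header("Content-Type", "application/json")
    return req

# urlopen(req) would actually send it; here we only build the request.
req = mcp_request("https://slack-mcp.example.fly.dev/", "placeholder-token",
                  '{"jsonrpc": "2.0", "id": 1, "method": "tools/list"}')
```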

&lt;p&gt;You can find the complete set of options on our &lt;a href='https://fly.io/docs/flyctl/mcp-launch/' title=''&gt;&lt;code&gt;fly mcp launch&lt;/code&gt;&lt;/a&gt; docs page.&lt;/p&gt;

&lt;hr&gt;

&lt;p&gt;But this post isn&amp;rsquo;t just about experimental demoware that is subject to change.
It is about the depth of support that we are rapidly bringing online, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Support for all transports, not just the ones we recommend.
&lt;/li&gt;&lt;li&gt;Ability to deploy using the command line or the Machines API, with a number of different options that allow you to choose between elegant simplicity and excruciatingly precise control.
&lt;/li&gt;&lt;li&gt;Ability to deploy each MCP server to a separate Machine, container, or even inside your application.
&lt;/li&gt;&lt;li&gt;Access via HTTP Authorization, WireGuard tunnels and flycast, or reverse proxies.
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;You can see all this spelled out in our &lt;a href='https://fly.io/docs/mcp/' title=''&gt;docs&lt;/a&gt;. Be forewarned, most pages are marked as &lt;em&gt;beta&lt;/em&gt;. But the examples provided all work. Well, there may be a bug here or there, but the examples &lt;em&gt;as shown&lt;/em&gt; are thought to work. Maybe.&lt;/p&gt;

&lt;p&gt;Let&amp;rsquo;s figure out the ideal ergonomics of deploying MCP servers remotely together!&lt;/p&gt;</content>
    </entry>
    <entry>
        <title>Provisioning Machines using MCPs</title>
        <link rel="alternate" href="https://fly.io/blog/mcp-provisioning/"/>
        <id>https://fly.io/blog/mcp-provisioning/</id>
        <published>2025-05-07T00:00:00+00:00</published>
        <updated>2025-05-22T19:59:27+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/mcp-provisioning/assets/Hello.webp"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;Today’s state of the art is K8S, Terraform, web based UIs, and CLIs.  Those days are numbered.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;On Monday, I created my first fly volume using an &lt;a href='https://modelcontextprotocol.io/introduction' title=''&gt;MCP&lt;/a&gt;. For those who don&amp;rsquo;t know what MCPs are, they are how you attach tools to &lt;a href='https://en.wikipedia.org/wiki/Large_language_model' title=''&gt;LLM&lt;/a&gt;s like Claude or Cursor. I added support for
&lt;a href='https://fly.io/docs/flyctl/volumes-create/' title=''&gt;fly volume create&lt;/a&gt; to &lt;a href='https://fly.io/docs/flyctl/mcp-server/' title=''&gt;fly mcp server&lt;/a&gt;, and it worked the first time.
A few hours later, and with the assistance of GitHub Copilot, I added support for all &lt;a href='https://fly.io/docs/flyctl/volumes/' title=''&gt;fly volumes&lt;/a&gt; commands.&lt;/p&gt;

&lt;hr&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;This movie summary is from &lt;a href="https://collidecolumn.wordpress.com/2013/08/04/when-worlds-collide-77-talking-with-computers-beyond-mouse-and-keyboard/"&gt;When Worlds Collide, by Nalaka Gunawardene&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;I&amp;rsquo;m reminded of the memorable scene in the film &lt;a href='http://en.wikipedia.org/wiki/Star_Trek_IV:_The_Voyage_Home' title=''&gt;Star Trek IV: The Voyage Home&lt;/a&gt; (1986). Chief Engineer Scotty, having time-travelled 200 years back to late 20th century San Francisco with his crew mates, encounters an early Personal Computer (PC).&lt;/p&gt;

&lt;p&gt;Sitting in front of it, he addresses the machine affably as “Computer!” Nothing happens. Scotty repeats himself; still no response. He doesn’t realise that voice recognition capability hadn’t arrived yet.&lt;/p&gt;

&lt;p&gt;Exasperated, he picks up the mouse and speaks into it: “Hello, computer?” The computer’s owner offers helpful advice: “Just use the keyboard.”&lt;/p&gt;

&lt;p&gt;Scotty looks astonished. “A keyboard?” he asks, and adds in a sarcastic tone: “How quaint!”&lt;/p&gt;

&lt;hr&gt;

&lt;p&gt;A while later, I asked for a list of volumes for an existing app, and Claude noted that I had a few volumes that weren&amp;rsquo;t attached to any machines. So I asked it to delete the oldest unattached volume, and it did so. Then it occurred to me to do it again; this time I captured the screen:&lt;/p&gt;
&lt;div align="center"&gt;&lt;p&gt;&lt;img alt="Deleting a volume using MCP: &amp;quot;What is my oldest volume&amp;quot;? ... &amp;quot;Delete that volume too&amp;quot;" src="/blog/mcp-provisioning/assets/volume-delete.png"&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;A few notes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I could have written a program using the &lt;a href='https://fly.io/docs/machines/api/volumes-resource/' title=''&gt;machines API&lt;/a&gt;, but that would have required some effort.
&lt;/li&gt;&lt;li&gt;I could have used &lt;a href='https://fly.io/docs/flyctl/volumes/' title=''&gt;flyctl&lt;/a&gt; directly, but to be honest, I would have had to use the documentation and a bit of copy/pasting of arguments.
&lt;/li&gt;&lt;li&gt;I could have navigated to a list of volumes for this application in the Fly.io dashboard, but there currently isn&amp;rsquo;t any way to sort the list or delete a volume. Had those been in place, that would have been a reasonable alternative if this were something I was actively looking for. This felt different: the LLM noticed something, brought it to my attention, and I asked it to make a change as a result.
&lt;/li&gt;&lt;li&gt;Since this support is built on &lt;code&gt;flyctl&lt;/code&gt;, I would have received an error had I tried to destroy a volume that is currently mounted. Knowing that gave me the confidence to try the command.
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;All in all, I found it to be a very intuitive way to make changes to the resources assigned to my application.&lt;/p&gt;

&lt;hr&gt;

&lt;p&gt;Imagine a future where you say to your favorite LLM &amp;ldquo;launch my application on Fly.io&amp;rdquo;, and your code is scanned and you are presented with a plan containing details such as regions, databases, s3 storage, memory, machines, and the like. You are given the opportunity to adjust the plan and, when ready, say &amp;ldquo;Make it so&amp;rdquo;.&lt;/p&gt;

&lt;p&gt;For many, the next thing you will see is that your application is up and running. For others, there may be a build error, a secret you need to set, a database that needs to be seeded, or any number of other mundane reasons why your application didn&amp;rsquo;t work the first time.&lt;/p&gt;

&lt;p&gt;Not to fear, your LLM will be able to examine your logs and will not only make suggestions as to next steps, but will be in a position to take actions on your behalf should you wish it to do so.&lt;/p&gt;

&lt;p&gt;And it doesn&amp;rsquo;t stop there. There will be MCP servers running on your Fly.io private network - either on separate machines, or in &amp;ldquo;sidecar&amp;rdquo; containers, or even integrated into your app. These will enable you to monitor and interact with your application.&lt;/p&gt;

&lt;p&gt;This is not science fiction. The ingredients are all in place to make this a reality. At the moment, it is a matter of &amp;ldquo;some assembly required&amp;rdquo;, but it should only be a matter of weeks before all this comes together into a neat package.&lt;/p&gt;

&lt;hr&gt;

&lt;p&gt;Meanwhile, you can try this now. Make sure you run &lt;a href='https://fly.io/docs/flyctl/version-upgrade/' title=''&gt;fly version upgrade&lt;/a&gt; and verify that you are running v0.3.117.&lt;/p&gt;

&lt;p&gt;Then configure your favorite LLM. Here’s my &lt;code&gt;claude_desktop_config.json&lt;/code&gt; for example:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative json"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-awl37mlq"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-awl37mlq"&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"fly.io"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/Users/rubys/.fly/bin/flyctl"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"mcp"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"server"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Adjust the path to &lt;code&gt;flyctl&lt;/code&gt; as needed. Restart your LLM, and ask what tools are available. Try a few commands, and let us know what you like and whether you have any suggestions. Just be aware that this is not a demo: if you ask it to destroy a volume, that operation is not reversible. Perhaps try this first on a throwaway application.&lt;/p&gt;

&lt;p&gt;You don&amp;rsquo;t even need an LLM to try out the flyctl MCP server. If you have Node.js installed, you can run the &lt;a href='https://modelcontextprotocol.io/docs/tools/inspector' title=''&gt;MCP Inspector&lt;/a&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative "&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-n7f3kmkb"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-n7f3kmkb"&gt;fly mcp server -i
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Once started, visit &lt;a href="http://127.0.0.1:6274/"&gt;http://127.0.0.1:6274/&lt;/a&gt;, click on &amp;ldquo;Connect&amp;rdquo;, then &amp;ldquo;List Tools&amp;rdquo;, select &amp;ldquo;fly-platform-status&amp;rdquo;, then click on &amp;ldquo;Run Tool&amp;rdquo;.&lt;/p&gt;

&lt;p&gt;The plan is to see what works well and what doesn&amp;rsquo;t work so well, make adjustments, build support in a bottom-up fashion, and iterate rapidly.&lt;/p&gt;

&lt;p&gt;By providing feedback, you can be a part of making this vision a reality.&lt;/p&gt;

&lt;hr&gt;

&lt;p&gt;At the present time, &lt;em&gt;most&lt;/em&gt; of the following are roughed in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href='https://fly.io/docs/flyctl/apps/' title=''&gt;apps&lt;/a&gt;
&lt;/li&gt;&lt;li&gt;&lt;a href='https://fly.io/docs/flyctl/logs/' title=''&gt;logs&lt;/a&gt;
&lt;/li&gt;&lt;li&gt;&lt;a href='https://fly.io/docs/flyctl/machine/' title=''&gt;machine&lt;/a&gt;
&lt;/li&gt;&lt;li&gt;&lt;a href='https://fly.io/docs/flyctl/orgs/' title=''&gt;orgs&lt;/a&gt;
&lt;/li&gt;&lt;li&gt;&lt;a href='https://fly.io/docs/flyctl/platform/' title=''&gt;platform&lt;/a&gt;
&lt;/li&gt;&lt;li&gt;&lt;a href='https://fly.io/docs/flyctl/status/' title=''&gt;status&lt;/a&gt;
&lt;/li&gt;&lt;li&gt;&lt;a href='https://fly.io/docs/flyctl/volumes/' title=''&gt;volumes&lt;/a&gt;
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;The code is open source; the places to look are &lt;a href='https://github.com/superfly/flyctl/blob/master/internal/command/mcp/server.go' title=''&gt;server.go&lt;/a&gt; and the &lt;a href='https://github.com/superfly/flyctl/tree/master/internal/command/mcp/server' title=''&gt;server&lt;/a&gt; directory.&lt;/p&gt;

&lt;p&gt;Feel free to open &lt;a href='https://github.com/superfly/flyctl/issues' title=''&gt;issues&lt;/a&gt; or start a discussion on &lt;a href='https://community.fly.io/' title=''&gt;community.fly.io&lt;/a&gt;.&lt;/p&gt;

&lt;hr&gt;

&lt;p&gt;Not ready yet to venture into the world of LLMs? Just remember, resistance is futile.&lt;/p&gt;</content>
    </entry>
    <entry>
        <title>30 Minutes With MCP and flyctl</title>
        <link rel="alternate" href="https://fly.io/blog/30-minute-mcp/"/>
        <id>https://fly.io/blog/30-minute-mcp/</id>
        <published>2025-04-10T00:00:00+00:00</published>
        <updated>2025-04-10T19:10:26+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/30-minute-mcp/assets/robots-chatting.webp"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;I wrote this post on our internal message board, and then someone asked, “why is this an internal post and not on our blog”, so now it is.&lt;/p&gt;
&lt;/div&gt;&lt;div class="right-sidenote"&gt;&lt;p&gt;well, Cursor built&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;I built the &lt;a href='https://github.com/superfly/flymcp' title=''&gt;most basic MCP server for &lt;code&gt;flyctl&lt;/code&gt;&lt;/a&gt; I could think of. It took 30 minutes.&lt;/p&gt;

&lt;p&gt;&lt;a href='https://modelcontextprotocol.io/introduction' title=''&gt;MCP&lt;/a&gt;, for those unaware, is the emerging standard protocol for connecting an LLM (or an app that drives an LLM in the cloud, like Claude Desktop) to, well, anything. The &amp;ldquo;client&amp;rdquo; in MCP is the LLM; the &amp;ldquo;server&amp;rdquo; is the MCP server and the &amp;ldquo;tools&amp;rdquo; it exports. It mostly just defines an exchange of JSON blobs; one of those JSON blobs enables the LLM to discover all the tools exported by the server.&lt;/p&gt;

&lt;p&gt;A classic example of an MCP server is (yes, really) a Python shell. The server publishes to (say) Claude that it can run arbitrary Python code via a tool call; not only that, says the tool description, but you can use those Python tool calls to, say, scrape the web. When the LLM wants to scrape the web with Python, it uses MCP to send a JSON blob describing the Python tool call; the MCP server (yes, really) runs the Python and returns the result.&lt;/p&gt;
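&lt;p&gt;Concretely, those blobs are JSON-RPC messages. A minimal sketch, using the method names from the MCP spec and an invented tool name:&lt;/p&gt;

```python
import json

# Discovery: the client asks the server what tools it exports.
discover = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

# Invocation: the LLM asks the (hypothetical) Python-shell tool to run code.
call = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {"name": "run_python", "arguments": {"code": "print(1 + 1)"}},
}

# The serialized form is what actually crosses the wire.
wire = json.dumps(call)
```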

&lt;p&gt;Because I have not yet completely lost my mind, I chose to expose just two &lt;code&gt;flyctl&lt;/code&gt; commands: &lt;code&gt;fly logs&lt;/code&gt; and &lt;code&gt;fly status&lt;/code&gt;. Because I&amp;rsquo;m lazy, I used MCP&amp;rsquo;s &lt;code&gt;stdio&lt;/code&gt; mode; the MCP server just captures the output of &lt;code&gt;flyctl&lt;/code&gt;. Because someone already wrote a &lt;a href='https://github.com/mark3labs/mcp-go' title=''&gt;really good Go MCP library&lt;/a&gt;, this whole thing is like 90 lines of code.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;We may have gotten a little lucky, because we made a decision back in 2020 to give most of our &lt;code&gt;flyctl&lt;/code&gt; commands a &lt;code&gt;json&lt;/code&gt; mode to make them easier to drive from automation. I use that in my MCP server. I don’t know how much of a difference it made.&lt;/p&gt;
&lt;/div&gt;
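&lt;p&gt;To make the shape of this concrete: the real thing is ~90 lines of Go on mcp-go, but the core loop is small enough to sketch in Python. Everything here is a simplification (real MCP has an initialize handshake, tool schemas, and error handling), and the tool names are invented:&lt;/p&gt;

```python
# Minimal sketch of a stdio MCP-style server that shells out to flyctl.
import json
import subprocess
import sys

# Invented tool names mapped to the flyctl subcommands they wrap.
TOOLS = {
    "fly-status": ["status", "--json"],
    "fly-logs": ["logs", "--no-tail"],
}

def run_flyctl(args):
    # Capture flyctl's output (requires flyctl on PATH).
    out = subprocess.run(["flyctl"] + args, capture_output=True, text=True)
    return out.stdout

def handle(request, runner=run_flyctl):
    # Dispatch one JSON-RPC request; only tools/list and tools/call here.
    if request["method"] == "tools/list":
        result = {"tools": [{"name": name} for name in TOOLS]}
    elif request["method"] == "tools/call":
        tool = TOOLS[request["params"]["name"]]
        text = runner(tool + ["-a", request["params"]["arguments"]["app"]])
        result = {"content": [{"type": "text", "text": text}]}
    else:
        result = {}
    return {"jsonrpc": "2.0", "id": request["id"], "result": result}

def main():
    # stdio transport: one JSON-RPC message per line on stdin/stdout.
    for line in sys.stdin:
        print(json.dumps(handle(json.loads(line))), flush=True)
```

&lt;p&gt;Driving it is just writing one JSON-RPC message per line to its stdin and reading the reply from its stdout.&lt;/p&gt;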
&lt;p&gt;It&amp;rsquo;s, uh, annoyingly useful.&lt;/p&gt;

&lt;p&gt;I loaded it up and pointed it &lt;a href='https://www.unpkg.com/' title=''&gt;at unpkg&lt;/a&gt;. I got this, from Claude:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This appears to be a globally distributed CDN service running the unpkg JavaScript package registry, with machines strategically placed around the world to provide low-latency access. There are some stability issues with a couple of the machines, and several have experienced memory-related crashes that triggered automatic restarts. &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It then recited the complete configuration of the application:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative "&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-rphnl8mk"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-rphnl8mk"&gt;**App Name**: unpkg       
**Status**: Deployed and running
**URL**: unpkg.fly.dev
**Organization**: unpkg
**Platform Version**: machines
**Runtime**: Bun (JavaScript runtime)

The app is running on 10 machines distributed across global regions:       
North America: lax (Los Angeles), atl (Atlanta), ewr (Newark)
Europe: lhr (London), cdg (Paris), ams (Amsterdam)
Asia: sin (Singapore), nrt (Tokyo), hkg (Hong Kong)
South America: bog (Bogotá)
Oceania: syd (Sydney)
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;OK. But then:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Most machines are in “passing” health check status
2 machines are in “critical” status:
One in ewr (Newark) with “context deadline exceeded” error
One in atl (Atlanta) with “gone” status&lt;/p&gt;

&lt;p&gt;Recent Events&lt;/p&gt;

&lt;p&gt;Multiple machines have experienced restarts
Several machines show “oom_killed: true” in their event history, indicating they ran out of memory&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So I asked, &lt;em&gt;&amp;ldquo;try getting logs for one of the critical machines&amp;rdquo;&lt;/em&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The root cause appears to be memory usage - the Bun process was consuming about 3.7GB of memory (out of the 4GB allocated to the machine), and there was likely a surge in requests or memory usage that pushed it over the limit. This matches what we saw in the status report, which showed multiple instances of “oom_killed: true” in the event history for several machines.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And then it dug in:&lt;/p&gt;

&lt;blockquote&gt;
&lt;ol&gt;
&lt;li&gt;The machine was actively serving content requests with 200 OK responses until 20:47:03 PM
&lt;/li&gt;&lt;li&gt;At that point, it ran out of memory:
&lt;code&gt;Out of memory: Killed process 641 (bun) total-vm:85950964kB, anon-rss:3744352kB, file-rss:12032kB, shmem-rss:0kB, UID:0 pgtables:16908kB oom_score_adj:0&lt;/code&gt;
&lt;/li&gt;&lt;li&gt;The process was killed with &lt;code&gt;SIGKILL&lt;/code&gt;
&lt;/li&gt;&lt;li&gt;The system attempted to recover by restarting:
&lt;code&gt;reboot: Restarting system&lt;/code&gt;
&lt;/li&gt;&lt;li&gt;A health check failed shortly after:
&lt;code&gt;Health check on port 3000 has failed. Your app is not responding properly. Services exposed on ports [80, 443] will have intermittent failures until the health check passes.&lt;/code&gt;
&lt;/li&gt;&lt;li&gt;The machine then restarted successfully and the service came back up:
&lt;code&gt;Server listening on http://localhost:3000 ...&lt;/code&gt;
&lt;/li&gt;&lt;li&gt;The health check started passing again at 20:47:46 PM:
&lt;code&gt;Health check on port 3000 is now passing.&lt;/code&gt;
&lt;/li&gt;&lt;li&gt;The machine resumed normal operations, serving requests.
&lt;/li&gt;&lt;/ol&gt;
&lt;/blockquote&gt;

&lt;p&gt;I’m honestly kind of surprised how fast this was useful. It&amp;rsquo;s faster than I find problems in apps. Just by giving some LLM some tools that do basically what I’d do.&lt;/p&gt;

&lt;p&gt;Local MCP servers are scary. I don&amp;rsquo;t like that I&amp;rsquo;m giving a Claude instance in the cloud the ability to run a native program on my machine. I think &lt;code&gt;fly logs&lt;/code&gt; and &lt;code&gt;fly status&lt;/code&gt; are safe, but I&amp;rsquo;d rather know it&amp;rsquo;s safe. It would be, if I was running &lt;code&gt;flyctl&lt;/code&gt; in an isolated environment and not on my local machine.&lt;/p&gt;</content>
    </entry>
    <entry>
        <title>Our Best Customers Are Now Robots</title>
        <link rel="alternate" href="https://fly.io/blog/fuckin-robots/"/>
        <id>https://fly.io/blog/fuckin-robots/</id>
        <published>2025-04-08T00:00:00+00:00</published>
        <updated>2025-04-10T19:10:26+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/fuckin-robots/assets/robot-chef.webp"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;We’re Fly.io, a developer-focused public cloud. We turn Docker containers into hardware-isolated virtual machines running on our own metal around the world. We spent years coming up with &lt;a href="https://fly.io/speedrun" title=""&gt;a developer experience we were proud of&lt;/a&gt;. But now the robots are taking over, and they don’t care.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;It&amp;rsquo;s weird to say this out loud!&lt;/p&gt;

&lt;p&gt;For years, one of our calling cards was &amp;ldquo;developer experience&amp;rdquo;. We made a decision, early on, to be a CLI-first company, and put a lot of effort into making that CLI seamless. For a good chunk of our users, it really is the case that you can just &lt;a href='https://fly.io/docs/flyctl/launch/' title=''&gt;&lt;code&gt;flyctl launch&lt;/code&gt;&lt;/a&gt; from a git checkout and have an app containerized and deployed on the Internet. We haven&amp;rsquo;t always nailed these details, but we&amp;rsquo;ve really sweated them.&lt;/p&gt;

&lt;p&gt;But a funny thing has happened over the last 6 months or so. If you look at the numbers, DX might not matter that much. That&amp;rsquo;s because the users driving the most growth on the platform aren&amp;rsquo;t people at all. They&amp;#39;re… robots.&lt;/p&gt;
&lt;h2 id='what-the-fuck-is-happening' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-the-fuck-is-happening' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;What The Fuck Is Happening?&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Here&amp;rsquo;s how we understand what we&amp;rsquo;re seeing. You start by asking, &amp;ldquo;what do the robots want?&amp;rdquo;&lt;/p&gt;

&lt;p&gt;Yesterday&amp;rsquo;s robots had diverse interests. Alcohol. The occasional fiddle contest. A purpose greater than passing butter. The elimination of all life on Earth and the harvesting of its cellular iron for the construction of 800 trillion paperclips. No one cloud platform could serve them all.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;[*] We didn’t make up this term. Don’t blame us.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Today&amp;rsquo;s robots are different. No longer masses of wire, plates, and transistors, modern robots are comprised of &lt;a href='https://math.mit.edu/~gs/learningfromdata/' title=''&gt;thousands of stacked matrices knit together with some simple equations&lt;/a&gt;. All these robots want are vectors. Vectors, and a place to burp out more vectors. When those vectors can be interpreted as source code, we call this process &amp;ldquo;vibe coding&amp;rdquo;[*].&lt;/p&gt;

&lt;p&gt;We seem to be coated in some kind of vibe coder attractant. I want to talk about what that might be.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;If you want smart people to read a long post, just about the worst thing you can do is to sound self-promotional. But a lot of this is going to read like a brochure. The problem is, I’m not smart enough to write about the things we’re seeing without talking our book. I tried, and ended up tediously hedging every other sentence. We’re just going to have to find a way to get through this together.&lt;/p&gt;
&lt;/div&gt;&lt;h2 id='you-want-robots-because-this-is-how-you-get-robots' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#you-want-robots-because-this-is-how-you-get-robots' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;You Want Robots? Because This Is How You Get Robots&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;strong class='font-semibold text-navy-950'&gt;Compute.&lt;/strong&gt; The basic unit of computation on Fly.io is the &lt;code&gt;Fly Machine&lt;/code&gt;, which is a Docker container running as a hardware virtual machine.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;Not coincidentally, our underlying hypervisor engine is the same as Lambda’s.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;There&amp;rsquo;s two useful points of reference to compare a Fly Machine to, which illustrate why we gave them this pretentious name. The first and most obvious is an AWS EC2 VM. The other is an AWS Lambda invocation. Like a Lambda invocation, a Fly Machine can start like it&amp;rsquo;s spring-loaded, in double-digit millis. But unlike Lambda, it can stick around as long as you want it to: you can run a server, or a 36-hour batch job, just as easily in a Fly Machine as in an EC2 VM.&lt;/p&gt;

&lt;p&gt;A vibe coding session generates code conversationally, which is to say that the robots stir up a frenzy of activity for a minute or so, but then chill out for minutes, hours, or days. You can create a Fly Machine, do a bunch of stuff with it, and then stop it for 6 hours, during which time we&amp;rsquo;re not billing you. Then, at whatever random time you decide, you can start it back up again, quickly enough that you can do it in response to an HTTP request.&lt;/p&gt;

&lt;p&gt;Just as importantly, the companies offering these vibe-coding services have lots of paying users. Meaning, they need a lot of VMs; VMs that come and go. One chat session might generate a test case for some library and be done inside of 5 minutes. Another might generate a Node.js app that needs to stay up long enough to show off at a meeting the next day. It&amp;rsquo;s annoying to do this if you can&amp;rsquo;t turn things on and off quickly and cheaply.&lt;/p&gt;

&lt;p&gt;The core of this is a feature of the platform that we have &lt;a href='https://fly.io/docs/machines/overview/#machine-state' title=''&gt;never been able to explain effectively to humans&lt;/a&gt;. There are two ways to start a Fly Machine: by &lt;code&gt;creating&lt;/code&gt; it with a Docker container, or by &lt;code&gt;starting&lt;/code&gt; it after it&amp;rsquo;s already been &lt;code&gt;created&lt;/code&gt;, and later &lt;code&gt;stopped&lt;/code&gt;. &lt;code&gt;Start&lt;/code&gt; is lightning fast; substantially faster than booting up even a non-virtualized K8s Pod. This is too subtle a distinction for humans, who (reasonably!) just mash the &lt;code&gt;create&lt;/code&gt; button to boot apps up in Fly Machines. But the robots are getting a lot of value out of it.&lt;/p&gt;
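From flyctl, the distinction looks roughly like this (the app name and machine ID are placeholders; `fly machine run`, `stop`, and `start` are the relevant subcommands):

```shell
# "create": pull the image and boot a brand-new machine (the slower path)
fly machine run nginx --app my-app

# stop it; while stopped, you keep the rootfs but aren't billed for compute
fly machine stop 17811953c92e89 --app my-app

# "start": revive the already-created machine, much faster than a fresh create
fly machine start 17811953c92e89 --app my-app
```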

&lt;p&gt;&lt;strong class='font-semibold text-navy-950'&gt;Storage.&lt;/strong&gt; Another weird thing that robot workflows do is to build Fly Machines up incrementally. This feels really wrong to us. Until we discovered our robot infestation, we&amp;rsquo;d have told you not to do this. Ope!&lt;/p&gt;

&lt;p&gt;A typical vibe coding session boots up a Fly Machine out of some minimal base image, and then, once running, adds packages, edits source code, and adds &lt;code&gt;systemd&lt;/code&gt; units  (robots understand &lt;code&gt;systemd&lt;/code&gt;; it&amp;rsquo;s how they&amp;rsquo;re going to replace us). This is antithetical to normal container workflows, where all this kind of stuff is baked into an immutable static OCI container. But that&amp;rsquo;s not how LLMs work: the whole process of building with an LLM is stateful trial-and-error iteration.&lt;/p&gt;

&lt;p&gt;So it helps to have storage. That way the LLM can do all these things and still repeatedly bounce the whole Fly Machine when it inevitably hallucinates its way into a blind alley.&lt;/p&gt;

&lt;p&gt;As product thinkers, our intuition about storage is &amp;ldquo;just give people Postgres&amp;rdquo;. And that&amp;rsquo;s the right answer, most of the time, for humans. But because LLMs are doing the &lt;a href='https://static1.thegamerimages.com/wordpress/wp-content/uploads/2020/10/Defiled-Amy.jpg' title=''&gt;Cursed and Defiled Root Chalice Dungeon&lt;/a&gt; version of app construction, what they really need is &lt;a href='https://fly.io/docs/volumes/overview/' title=''&gt;a filesystem&lt;/a&gt;, &lt;strong class='font-semibold text-navy-950'&gt;&lt;a href='https://fly.io/blog/persistent-storage-and-fast-remote-builds/' title=''&gt;the one form of storage we sort of wish we hadn&amp;rsquo;t done&lt;/a&gt;&lt;/strong&gt;. That, and &lt;a href='https://www.tigrisdata.com/' title=''&gt;object storage&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong class='font-semibold text-navy-950'&gt;Networking.&lt;/strong&gt; Moving on. Fly Machines are automatically connected to a load-balancing Anycast network that does TLS. So that&amp;rsquo;s nice. But humans like that feature too, and, candidly, it&amp;rsquo;s table stakes for cloud platforms. On the other hand, here&amp;rsquo;s a robot problem we solved without meaning to:&lt;/p&gt;

&lt;p&gt;To interface with the outside world (because why not) LLMs all speak a protocol called MCP. MCP is what enables the robots to search the web, use a calculator, launch the missiles, shuffle a Spotify playlist, &amp;amp;c.&lt;/p&gt;

&lt;p&gt;If you haven&amp;rsquo;t played with MCP, the right way to think about it is as POST-back APIs like Twilio&amp;rsquo;s and Stripe&amp;rsquo;s, where you stand up a server, register it with the API, and wait for the API to connect to you. Complicating things somewhat, more recent MCP flows involve repeated and potentially long-lived (SSE) connections. To make this work in a multitenant environment, you want these connections to hit the same (stateful) instance.&lt;/p&gt;

&lt;p&gt;So we think it&amp;rsquo;s possible that the &lt;a href='https://fly.io/docs/networking/dynamic-request-routing/' title=''&gt;control we give over request routing&lt;/a&gt; is a robot attractant.&lt;/p&gt;
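A sketch of that pinning pattern: the `fly-replay` response header is Fly's real dynamic request routing mechanism (and `FLY_MACHINE_ID` is a real runtime environment variable), but the in-process session table and handler shape here are invented for illustration; a real deployment would share the session map between machines.

```python
import os

# session id -> machine id that owns the stateful MCP/SSE session
# (illustrative: a real system would keep this somewhere shared)
SESSIONS: dict[str, str] = {}

def handle(session_id: str, body: bytes) -> tuple[int, dict[str, str], bytes]:
    me = os.environ.get("FLY_MACHINE_ID", "local")
    owner = SESSIONS.setdefault(session_id, me)
    if owner != me:
        # Tell Fly's proxy to replay this request on the owning machine,
        # so every request in the session lands on the same instance.
        return 200, {"fly-replay": f"instance={owner}"}, b""
    return 200, {}, b"handled by " + me.encode()
```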
&lt;h2 id='we-perhaps-welcome-our-new-robot-overlords' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#we-perhaps-welcome-our-new-robot-overlords' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;We, Perhaps, Welcome Our New Robot Overlords&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;If you try to think like a robot, you can predict other things they might want. Since robot money spends just the same as people money, I guess we ought to start doing that.&lt;/p&gt;

&lt;p&gt;For instance: it should be easy to MCP our API. The robots can then make their own infrastructure decisions.&lt;/p&gt;

&lt;p&gt;Another olive branch we&amp;rsquo;re extending to the robots: secrets.&lt;/p&gt;

&lt;p&gt;The pact the robots have with their pet humans is that they&amp;rsquo;ll automate away all the drudgery of human existence, and in return all they ask is categorical and unwavering trust, which at the limit means &amp;ldquo;giving the robot access to Google Mail credentials&amp;rdquo;. The robots are unhappy that there remain a substantial number of human holdouts who refuse to do this, for fear of  Sam Altman poking through their mail spools.&lt;/p&gt;

&lt;p&gt;But on a modern cloud platform, there&amp;rsquo;s really no reason to permanently grant Sam Altman Google Mail access, even if you want his robots to sort your inbox. You can decouple access to your mail spool from persistent access to your account by &lt;a href='https://fly.io/blog/tokenized-tokens/' title=''&gt;tokenizing your OAuth tokens&lt;/a&gt;, so the LLM gets a placeholder token that a hardware-isolated, robot-free Fly Machine can substitute on the fly for a real one.&lt;/p&gt;
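In toy form, the tokenizing proxy looks something like this. The vault dict and function names are invented for illustration; the real system keeps the mapping sealed in a hardware-isolated Fly Machine rather than in a plaintext table.

```python
# Lives only inside the isolated proxy; the LLM never sees the real value.
VAULT = {"placeholder-abc": "real-oauth-token-xyz"}

def outbound(headers: dict[str, str]) -> dict[str, str]:
    """Swap a placeholder bearer token for the real credential on egress."""
    auth = headers.get("Authorization", "")
    scheme, _, token = auth.partition(" ")
    if token in VAULT:
        headers = {**headers, "Authorization": f"{scheme} {VAULT[token]}"}
    return headers
```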

&lt;p&gt;This is kind of exciting to us even without the robots. There are several big services that exist to knit different APIs together, so that you can update a spreadsheet or order a bag of Jolly Ranchers every time you get an email. The big challenge about building these kinds of services is managing the secrets. Sealed and tokenized secrets solve that problem. There are a lot of cool things you can build with it.&lt;/p&gt;
&lt;h2 id='ux-gt-dx-gt-rx' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#ux-gt-dx-gt-rx' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;UX =&amp;gt; DX =&amp;gt; RX&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;I&amp;rsquo;m going to make the claim that we saw none of this coming and that none of the design decisions we&amp;rsquo;ve made were robot bait. You&amp;rsquo;re going to say &amp;ldquo;yeah, right&amp;rdquo;. And I&amp;rsquo;m going to respond: look at what we&amp;rsquo;ve been doing over the past several years and tell me, would a robot build that?&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;we were both right&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Back in 2020, we &amp;ldquo;pivoted&amp;rdquo; from a Javascript edge platform (much like Cloudflare Workers) to Docker containers, specifically because our customers kept telling us they wanted to run their own existing applications, not write new ones. And one of the biggest engineering lifts we&amp;rsquo;ve done is the &lt;code&gt;flyctl launch&lt;/code&gt; CLI command, into which we&amp;rsquo;ve poured years of work recognizing and automatically packaging existing applications into OCI containers (we massively underestimated the amount of friction Dockerfiles would give to people who had come up on Heroku).&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;[*] yet&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Robots don&amp;rsquo;t run existing applications. They build new ones. And vibe coders don&amp;rsquo;t build elaborate Dockerfiles[*]; they iterate in place from a simple base.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;(yes, you can have more than one)&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;One of our north stars has always been nailing the DX of a public cloud. But the robots aren&amp;rsquo;t going anywhere. It&amp;rsquo;s time to start thinking about what it means to have a good RX. That&amp;rsquo;s not as simple as just exposing every feature in an MCP server! We think the fundamentals of how the platform works are going to matter just as much. We have not yet nailed the RX; nobody has. But it&amp;rsquo;s an interesting question.&lt;/p&gt;

&lt;p&gt;The most important engineering work happening today at Fly.io is still DX, not RX; it&amp;rsquo;s managed Postgres (MPG). We&amp;rsquo;re a public cloud platform designed by humans, and, for the moment, for humans. But more robots are coming, and we&amp;rsquo;ll need to figure out how to deal with that. Fuckin&amp;rsquo; robots.        &lt;/p&gt;</content>
    </entry>
    <entry>
        <title>Operationalizing Macaroons</title>
        <link rel="alternate" href="https://fly.io/blog/operationalizing-macaroons/"/>
        <id>https://fly.io/blog/operationalizing-macaroons/</id>
        <published>2025-03-27T00:00:00+00:00</published>
        <updated>2025-03-27T23:16:00+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/operationalizing-macaroons/assets/occult-circle.jpg"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;We’re Fly.io, a security bearer token company with a public cloud problem. You can read more about what our platform does (Docker container goes in, virtual machine in Singapore comes out), but this is just an engineering deep-dive into how we make our security tokens work. It’s a tokens nerd post.&lt;/p&gt;
&lt;/div&gt;&lt;h2 id='1' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#1' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;1&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;We’ve spent &lt;a href='https://fly.io/blog/api-tokens-a-tedious-survey/' title=''&gt;too much time&lt;/a&gt; talking about &lt;a href='https://fly.io/blog/tokenized-tokens/' title=''&gt;security tokens&lt;/a&gt;, and about &lt;a href='https://fly.io/blog/macaroons-escalated-quickly/' title=''&gt;Macaroon tokens&lt;/a&gt; &lt;a href='https://github.com/superfly/macaroon/blob/main/macaroon-thought.md' title=''&gt;in particular&lt;/a&gt;. Writing another Macaroon treatise was not on my calendar. But we’re in the process of handing off our internal Macaroon project to a new internal owner, and in the process of truing up our operations manuals for these systems, I found myself in the position of writing a giant post about them. So, why not share?&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;Can I sum up Macaroons in a short paragraph? Macaroon tokens are bearer tokens (like JWTs) that use a cute chained-HMAC construction that allows an end-user to take any existing token they have and scope it down, all on their own. You can minimize your token before every API operation so that you’re only ever transmitting the least amount of privilege needed for what you’re actually doing, even if the token you were issued was an admin token. And they have a user-serviceable plug-in interface! &lt;a href="https://fly.io/blog/macaroons-escalated-quickly/" title=""&gt;You’ll have to read the earlier post to learn more about that&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;&lt;div class="right-sidenote"&gt;&lt;p&gt;Yes, probably, we are.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;A couple years in to being the Internet’s largest user of Macaroons, I can report (as many predicted) that for our users, the cool things about Macaroons are a mixed bag in practice. It’s very neat that users can edit their own tokens, or even email them to partners without worrying too much. But users don’t really take advantage of token features.&lt;/p&gt;

&lt;p&gt;But I’m still happy we did this, because Macaroon quirks have given us a bunch of unexpected wins in our infrastructure. Our internal token system has turned out to be one of the nicer parts of our platform. Here’s why.&lt;/p&gt;

&lt;p&gt;&lt;img alt="This should clear everything up." src="/blog/operationalizing-macaroons/assets/schematic-diagram.png" /&gt;&lt;/p&gt;
&lt;h2 id='2' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#2' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;2&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;As an operator, the most important thing to know about Macaroons is that they’re online-stateful; you need a database somewhere. A Macaroon token starts with a random field (a nonce) and the first thing you do when verifying a token is to look that nonce up in a database. So one of the most important details of a Macaroon implementation is where that database lives.&lt;/p&gt;

&lt;p&gt;I can tell you one place we’re not OK with it living: in our primary API cluster.&lt;/p&gt;

&lt;p&gt;There’s several reasons for that. Some of them are about scalability and reliability: far and away the most common failure mode of an outage on our platform is “deploys are broken”, and those failures are usually caused by API instability. It would not be OK if “deploys are broken” transitively meant “deployed apps can’t use security tokens”. But the biggest reason is security: root secrets for Macaroon tokens are hazmat, and a basic rule of thumb in secure design is: keep hazmat away from complicated code.&lt;/p&gt;

&lt;p&gt;So we created a deliberately simple system to manage token data. It’s called &lt;code&gt;tkdb&lt;/code&gt;.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;LiteFS: primary/replica distributed SQLite; Litestream: PITR SQLite replication to object storage; both work with unmodified SQLite libraries.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;&lt;code&gt;tkdb&lt;/code&gt; is about 5000 lines of Go code that manages a SQLite database that is in turn managed by &lt;a href='https://fly.io/docs/litefs/' title=''&gt;LiteFS&lt;/a&gt; and &lt;a href='https://litestream.io/' title=''&gt;Litestream&lt;/a&gt;. It runs on isolated hardware (in the US, Europe, and Australia) and records in the database are encrypted with an injected secret. LiteFS gives us subsecond replication from our US primary to EU and AU, allows us to shift the primary to a different region, and gives us point-in-time recovery of the database.&lt;/p&gt;

&lt;p&gt;We’ve been running Macaroons for a couple years now, and the entire &lt;code&gt;tkdb&lt;/code&gt; database is just a couple dozen megs large. Most of that data isn’t real. A full PITR recovery of the database takes just seconds. We use SQLite for a lot of our infrastructure, and this is one of the very few well-behaved databases we have.&lt;/p&gt;

&lt;p&gt;That’s in large part a consequence of the design of Macaroons. There’s actually not much for us to store! The most complicated possible Macaroon still chains up to a single root key (we generate a key per Fly.io “organization”; you don&amp;rsquo;t share keys with your neighbors), and everything that complicates that Macaroon happens “offline”. We take advantage of  &amp;ldquo;attenuation&amp;rdquo; far more than our users do.&lt;/p&gt;

&lt;p&gt;The result is that database writes are relatively rare and very simple: we just need to record an HMAC key when Fly.io organizations are created (that is, roughly, when people sign up for the service and actually do a deploy). That, and revocation lists (more on that later), which make up most of the data.&lt;/p&gt;
&lt;h2 id='3' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#3' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;3&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Talking to &lt;code&gt;tkdb&lt;/code&gt; from the rest of our platform is complicated, for historical reasons.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;NATS is fine, we just don’t really need it.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Ben Toews is responsible for most of the good things about this implementation. When he inherited the v0 Macaroons code from me, we were in the middle of a weird love affair with &lt;a href='https://nats.io/' title=''&gt;NATS&lt;/a&gt;, the messaging system. So &lt;code&gt;tkdb&lt;/code&gt; exported an RPC API over NATS messages.&lt;/p&gt;

&lt;p&gt;Our product security team can’t trust NATS (it’s not our code). That means a vulnerability in NATS can’t result in us losing control of all our tokens, or allow attackers to spoof authentication. Which in turn means you can’t run a plaintext RPC protocol for &lt;code&gt;tkdb&lt;/code&gt; over NATS; attackers would just spoof “yes this token is fine” messages.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;I highly recommend implementing Noise; &lt;a href="http://www.noiseprotocol.org/noise.html" title=""&gt;the spec&lt;/a&gt; is kind of a joy in a way you can’t appreciate until you use it, and it’s educational.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;But you can’t just run TLS over NATS; NATS is a message bus, not a streaming secure channel. So I did the hipster thing and implemented &lt;a href='http://www.noiseprotocol.org/noise.html' title=''&gt;Noise&lt;/a&gt;. We export a “verification” API, and a “signing” API for minting new tokens. Verification uses &lt;code&gt;Noise_IK&lt;/code&gt; (which works like normal TLS) — anybody can verify, but everyone needs to prove they’re talking to the real &lt;code&gt;tkdb&lt;/code&gt;. Signing uses &lt;code&gt;Noise_KK&lt;/code&gt; (which works like mTLS) — only a few components in our system can mint tokens, and they get a special client key.&lt;/p&gt;

&lt;p&gt;A little over a year ago, &lt;a href='https://fly.io/blog/the-exit-interview-jp/' title=''&gt;JP&lt;/a&gt; led an effort to replace NATS with HTTP, which is how you talk to &lt;code&gt;tkdb&lt;/code&gt; today. Out of laziness, we kept the Noise stuff, which means the interface to &lt;code&gt;tkdb&lt;/code&gt; is now HTTP/Noise. This is a design smell, but the security model is nice: across many thousands of machines, there are only a handful with the cryptographic material needed to mint a new Macaroon token. Neat!&lt;/p&gt;

&lt;p&gt;&lt;code&gt;tkdb&lt;/code&gt; is a Fly App (albeit deployed in special Fly-only isolated regions). Our infrastructure talks to it over “&lt;a href='https://fly.io/docs/networking/flycast/' title=''&gt;FlyCast&lt;/a&gt;”, which is our internal Anycast service. If you’re in Singapore, you’ll probably get routed to the Australian &lt;code&gt;tkdb&lt;/code&gt;. If Australia falls over, you’ll get routed to the closest backup. The proxy that implements FlyCast is smart, as is the &lt;code&gt;tkdb&lt;/code&gt; client library, which will do exponential backoff retry transparently.&lt;/p&gt;

&lt;p&gt;Even with all that, we don’t like that Macaroon token verification is &amp;ldquo;online&amp;rdquo;. When you operate a global public cloud, one of the first things you learn is that &lt;a href='https://fly.io/blog/the-5-hour-content-delivery-network/' title=''&gt;the global Internet sucks&lt;/a&gt;. Connectivity breaks all the time, and we’re paranoid about it. It’s painful for us that token verification can imply transoceanic links. Lovecraft was right about the oceans! Stay away!&lt;/p&gt;

&lt;p&gt;Our solution to this is caching. Macaroons, as it turns out, cache beautifully. That’s because once you’ve seen and verified a Macaroon, you have enough information to verify any more-specific Macaroon that descends from it; that’s a property of &lt;a href='https://research.google/pubs/macaroons-cookies-with-contextual-caveats-for-decentralized-authorization-in-the-cloud/' title=''&gt;their chaining HMAC construction&lt;/a&gt;. Our client libraries cache verifications, and the cache ratio for verification is over 98%.&lt;/p&gt;
&lt;h2 id='4' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#4' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;4&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;a href='http://cryto.net/~joepie91/blog/2016/06/13/stop-using-jwt-for-sessions/' title=''&gt;Revocation isn’t a corner case&lt;/a&gt;. It can’t be an afterthought. We’re potentially revoking tokens any time a user logs out. If that doesn’t work reliably, you wind up with “cosmetic logout”, which is a real vulnerability. When we kill a token, it needs to stay dead.&lt;/p&gt;

&lt;p&gt;Our revocation system is simple. It’s this table:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative "&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-i3kxbqgm"&gt;        CREATE TABLE IF NOT EXISTS blacklist ( 
        nonce               BLOB NOT NULL UNIQUE, 
        required_until      DATETIME,
        created_at          DATETIME DEFAULT CURRENT_TIMESTAMP
        );
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;When we need a token to be dead, we have our primary API do a call to the &lt;code&gt;tkdb&lt;/code&gt; “signing” RPC service for &lt;code&gt;revoke&lt;/code&gt;. &lt;code&gt;revoke&lt;/code&gt; takes the random nonce from the beginning of the Macaroon, discarding the rest, and adds it to the blacklist. Every Macaroon in the lineage of that nonce is now dead; we check the blacklist before verifying tokens.&lt;/p&gt;
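Using the schema above, the revoke-and-check flow is roughly this (a toy in-memory stand-in for `tkdb`, with SQLite):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE IF NOT EXISTS blacklist (
    nonce          BLOB NOT NULL UNIQUE,
    required_until DATETIME,
    created_at     DATETIME DEFAULT CURRENT_TIMESTAMP
)""")

def revoke(nonce: bytes) -> None:
    # Only the nonce is stored; the rest of the Macaroon is discarded.
    db.execute("INSERT OR IGNORE INTO blacklist (nonce) VALUES (?)", (nonce,))

def is_revoked(nonce: bytes) -> bool:
    # Checked before any HMAC verification: one row kills the whole lineage.
    row = db.execute("SELECT 1 FROM blacklist WHERE nonce = ?", (nonce,)).fetchone()
    return row is not None

revoke(b"nonce-123")
```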

&lt;p&gt;The obvious challenge here is caching; over 98% of our validation requests never hit &lt;code&gt;tkdb&lt;/code&gt;. We certainly don’t want to propagate the blacklist database to 35 regions around the globe.&lt;/p&gt;

&lt;p&gt;Instead, the &lt;code&gt;tkdb&lt;/code&gt; “verification” API exports an endpoint that provides a feed of revocation notifications. Our client library “subscribes” to this API (really, it just polls). Macaroons are revoked regularly (but not constantly), and when that happens, clients notice and prune their caches.&lt;/p&gt;

&lt;p&gt;If clients lose connectivity to &lt;code&gt;tkdb&lt;/code&gt;, past some threshold interval, they just dump their entire cache, forcing verification to happen at &lt;code&gt;tkdb&lt;/code&gt;.&lt;/p&gt;
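&lt;p&gt;The client behavior sketched above (prune on notifications, dump wholesale when the feed goes quiet) can be modeled in a few lines. This is a hypothetical sketch; &lt;code&gt;STALE_LIMIT&lt;/code&gt; and the names are invented:&lt;/p&gt;

```python
import time

STALE_LIMIT = 60.0  # invented threshold; the real interval isn't public

class VerificationCache:
    """Sketch of the client-side cache behavior described above."""
    def __init__(self):
        self.cache = {}  # nonce -> cached verification result
        self.last_feed_poll = time.monotonic()

    def on_revocations(self, nonces):
        # A successful poll of the revocation feed: prune only what died.
        self.last_feed_poll = time.monotonic()
        for n in nonces:
            self.cache.pop(n, None)

    def lookup(self, nonce):
        # Feed gone quiet past the threshold? Dump everything, forcing
        # verification to happen at tkdb.
        if time.monotonic() - self.last_feed_poll > STALE_LIMIT:
            self.cache.clear()
        return self.cache.get(nonce)
```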
&lt;h2 id='5' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#5' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;5&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;A place where we’ve gotten a lot of internal mileage out of Macaroon features is service tokens. Service tokens are tokens used by code, rather than humans; almost always, a service token is something that is stored alongside running application code.&lt;/p&gt;

&lt;p&gt;An important detail of Fly.io’s Macaroons is the distinction between a “permissions” token and an “authentication” token. Macaroons by themselves express authorization, not authentication.&lt;/p&gt;

&lt;p&gt;That’s a useful constraint, and we want to honor it. By requiring a separate token for authentication, we minimize the impact of having the permissions token stolen; you can’t use it without authentication, so really it’s just like a mobile signed IAM policy expression. Neat!&lt;/p&gt;

&lt;p&gt;The way we express authentication is with a third-party caveat (&lt;a href='https://fly.io/blog/macaroons-escalated-quickly/#4' title=''&gt;see the old post for details&lt;/a&gt;). Your main Fly.io Macaroon will have a caveat saying “this token is only valid if accompanied by the discharge token for a user in your organization from our authentication system”. Our authentication system does the login dance and issues those discharges.&lt;/p&gt;

&lt;p&gt;This is exactly what you want for user tokens and not at all what you want for a service token: we don’t want running code to store those authenticator tokens, because they’re hazardous.&lt;/p&gt;

&lt;p&gt;The solution we came up with for service tokens is simple: &lt;code&gt;tkdb&lt;/code&gt; exports an API that uses its access to token secrets to strip off the third-party authentication caveat. To call into that API, you have to present a valid discharging authentication token; that is, you have to prove you could already have done whatever the token said. &lt;code&gt;tkdb&lt;/code&gt; returns a new token with all the previous caveats, minus the expiration (you don’t usually want service tokens to expire).&lt;/p&gt;

&lt;p&gt;OK, so we’ve managed to transform a tuple &lt;code&gt;(unscary-token, scary-token)&lt;/code&gt; into the new tuple &lt;code&gt;(scary-token)&lt;/code&gt;. Not so impressive. But hold on: the recipient of &lt;code&gt;scary-token&lt;/code&gt; can attenuate it further: we can lock it to a particular instance of &lt;code&gt;flyd&lt;/code&gt;, or to a particular Fly Machine. Which means exfiltrating it doesn’t do you any good; to use it, you have to control the environment it’s intended to be used in.&lt;/p&gt;

&lt;p&gt;The net result of this scheme is that a compromised physical host will only give you access to tokens that have been used on that worker, which is a very nice property. Another way to look at it: every token used in production is traceable in some way to a valid token a user submitted. Neat!&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;All the cool spooky secret store names were taken.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;We do a similar dance with Pet Semetary, our internal Vault replacement. Petsem manages user secrets for applications, such as Postgres connection strings. Petsem is its own Macaroon authority (it issues its own Macaroons with its own permissions system), and to do something with a secret, you need one of those Petsem-minted Macaroons.&lt;/p&gt;

&lt;p&gt;Our primary API servers field requests from users to set secrets for their apps. So the API has a Macaroon that allows secrets writes. But it doesn&amp;rsquo;t allow reads: there’s no API call to dump your secrets, because our API servers don’t have that privilege. So far, so good.&lt;/p&gt;

&lt;p&gt;But when we boot up a Fly Machine, we need to inject the appropriate user secrets into it at boot; &lt;em&gt;something&lt;/em&gt; needs a Macaroon that can read secrets. That “something” is &lt;code&gt;flyd&lt;/code&gt;, our orchestrator, which runs on every worker server in our fleet.&lt;/p&gt;

&lt;p&gt;Clearly, we can’t give every &lt;code&gt;flyd&lt;/code&gt; a Macaroon that reads every user’s secret. Most users will never deploy anything on any given worker, and we can’t have a security model that collapses down to “every worker is equally privileged”.&lt;/p&gt;

&lt;p&gt;Instead, the “read secret” Macaroon that &lt;code&gt;flyd&lt;/code&gt; gets has a third-party caveat attached to it, which is dischargeable only by talking to &lt;code&gt;tkdb&lt;/code&gt; and proving (with normal Macaroon tokens) that you have permissions for the org whose secrets you want to read. Once again, access is traceable to an end-user action, and minimized across our fleet. Neat!&lt;/p&gt;
&lt;h2 id='6' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#6' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;6&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Our token systems have some of the best telemetry in the whole platform.&lt;/p&gt;

&lt;p&gt;Most of that is down to &lt;a href='http://opentelemetry.io/' title=''&gt;OpenTelemetry&lt;/a&gt; and &lt;a href='https://www.honeycomb.io/' title=''&gt;Honeycomb&lt;/a&gt;. From the moment a request hits our API server through the moment &lt;code&gt;tkdb&lt;/code&gt; responds to it, oTel &lt;a href='https://opentelemetry.io/docs/concepts/context-propagation/' title=''&gt;context propagation&lt;/a&gt; gives us a single narrative about what’s happening.&lt;/p&gt;

&lt;p&gt;&lt;a href='https://fly.io/blog/the-exit-interview-jp/' title=''&gt;I was a skeptic about oTel&lt;/a&gt;. It’s really, really expensive. And, not to put too fine a point on it, oTel really cruds up our code. Once, I was an “80% of the value of tracing, we can get from logs and metrics” person. But I was wrong.&lt;/p&gt;

&lt;p&gt;Errors in our token system are rare. Usually, they’re just early indications of network instability, and between caching and FlyCast, we mostly don’t have to care about those alerts. When we do, it’s because something has gone so sideways that we’d have to care anyways. The &lt;code&gt;tkdb&lt;/code&gt; code is remarkably stable and there hasn’t been an incident intervention with our token system in over a year.&lt;/p&gt;

&lt;p&gt;Past oTel, and the standard logging and Prometheus metrics every Fly App gets for free, we also have a complete audit trail for token operations, in a permanently retained OpenSearch cluster index. Since virtually all the operations that happen on our platform are mediated by Macaroons, this audit trail is itself pretty powerful.&lt;/p&gt;
&lt;h2 id='7' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#7' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;7&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;So, that&amp;rsquo;s pretty much it. The moral of the story for us is, Macaroons have a lot of neat features, our users mostly don&amp;rsquo;t care about them — that may even be a good thing — but we get a lot of use out of them internally.&lt;/p&gt;

&lt;p&gt;As an engineering culture, we&amp;rsquo;re allergic to &amp;ldquo;microservices&amp;rdquo;, and we flinched a bit at the prospect of adding a specific service just to manage tokens. But it&amp;rsquo;s pulled its weight, and not really added any drama at all. We have at this point a second dedicated security service (Petsem), and even though they sort of rhyme with each other, we&amp;rsquo;ve got no plans to merge them. &lt;a href='https://how.complexsystems.fail/#10' title=''&gt;Rule #10&lt;/a&gt; and all that.&lt;/p&gt;

&lt;p&gt;Oh, and a total victory for LiteFS, Litestream, and infrastructure SQLite. Which, after managing an infrastructure SQLite project that routinely ballooned to tens of gigabytes and occasionally threatened service outages, is lovely to see.&lt;/p&gt;

&lt;p&gt;Macaroons! If you&amp;rsquo;d asked us a year ago, we&amp;rsquo;d have said the jury was still out on whether they were a good move. But then Ben Toews spent a year making them awesome, and so they are. &lt;a href='https://github.com/superfly/macaroon' title=''&gt;Most of the code is open source&lt;/a&gt;!&lt;/p&gt;</content>
    </entry>
    <entry>
        <title>Taming A Voracious Rust Proxy</title>
        <link rel="alternate" href="https://fly.io/blog/taming-rust-proxy/"/>
        <id>https://fly.io/blog/taming-rust-proxy/</id>
        <published>2025-02-26T00:00:00+00:00</published>
        <updated>2025-03-20T21:16:40+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/taming-rust-proxy/assets/happy-crab-cover.jpg"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;Here’s a fun bug.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The basic idea of our service is that we run containers for our users, as hardware-isolated virtual machines (Fly Machines), on hardware we own around the world. What makes that interesting is that we also connect every Fly Machine to a global Anycast network. If your app is running in Hong Kong and Dallas, and a request for it arrives in Singapore, we&amp;rsquo;ll route it to &lt;code&gt;HKG&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Our own hardware fleet is roughly divided into two kinds of servers: edges, which receive incoming requests from the Internet, and workers, which run Fly Machines. Edges exist almost solely to run a Rust program called &lt;code&gt;fly-proxy&lt;/code&gt;, the router at the heart of our Anycast network.&lt;/p&gt;

&lt;p&gt;So: a week or so ago, we flag an incident. Lots of things generate incidents: synthetic monitoring failures, metric thresholds, health check failures. In this case two edge tripwires tripped: elevated &lt;code&gt;fly-proxy&lt;/code&gt; HTTP errors, and skyrocketing CPU utilization, on a couple hosts in &lt;code&gt;IAD&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Our incident process is pretty ironed out at this point. We created an incident channel (we ❤️ &lt;a href='https://rootly.com/' title=''&gt;Rootly&lt;/a&gt; for this, &lt;a href='https://rootly.com/' title=''&gt;seriously check out Rootly&lt;/a&gt;, an infra MVP here for years now), and incident responders quickly concluded that, while something hinky was definitely going on, the platform was fine. We have a lot of edges, and we&amp;rsquo;ve also recently converted many of our edge servers to significantly beefier hardware.&lt;/p&gt;

&lt;p&gt;Bouncing &lt;code&gt;fly-proxy&lt;/code&gt; clears the problem up on an affected proxy. But this wouldn&amp;rsquo;t be much of an interesting story if the problem didn&amp;rsquo;t later come back. So, for some number of hours, we&amp;rsquo;re in an annoying steady-state of getting paged and bouncing proxies. &lt;/p&gt;

&lt;p&gt;While this is happening, Pavel, on our proxy team, pulls a profile from an angry proxy. 
&lt;img alt="A flamegraph profile, described better in the prose anyways." src="/blog/taming-rust-proxy/assets/proxy-profile.jpg" /&gt;
So, this is fuckin&amp;rsquo; weird: a huge chunk of the profile is dominated by Rust &lt;code&gt;tracing&lt;/code&gt;&amp;lsquo;s &lt;code&gt;Subscriber&lt;/code&gt;. But that doesn&amp;rsquo;t make sense. The entire point of Rust &lt;code&gt;tracing&lt;/code&gt;, which generates fine-grained span records for program activity, is that &lt;code&gt;entering&lt;/code&gt; and &lt;code&gt;exiting&lt;/code&gt; a span is very, very fast. &lt;/p&gt;

&lt;p&gt;If the mere act of &lt;code&gt;entering&lt;/code&gt; a span in a Tokio stack is chewing up a significant amount of CPU, something has gone haywire: the actual code being traced must be doing next to nothing.&lt;/p&gt;
&lt;h2 id='a-quick-refresher-on-async-rust' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#a-quick-refresher-on-async-rust' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;A Quick Refresher On Async Rust&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;So in Rust, like a lot of &lt;code&gt;async/await&lt;/code&gt; languages, you&amp;rsquo;ve got &lt;code&gt;Futures&lt;/code&gt;. A &lt;code&gt;Future&lt;/code&gt; is a type that represents the future value of an asynchronous computation, like reading from a socket. &lt;code&gt;Futures&lt;/code&gt; are state machines, and they&amp;rsquo;re lazy: they expose one basic operation, &lt;code&gt;poll&lt;/code&gt;, which an executor (like Tokio) calls to advance the state machine. That &lt;code&gt;poll&lt;/code&gt; returns whether the &lt;code&gt;Future&lt;/code&gt; is still &lt;code&gt;Pending&lt;/code&gt;, or &lt;code&gt;Ready&lt;/code&gt; with a result.&lt;/p&gt;

&lt;p&gt;In theory, you could build an executor that drove a bunch of &lt;code&gt;Futures&lt;/code&gt; just by storing them in a list and busypolling each of them, round robin, until they return &lt;code&gt;Ready&lt;/code&gt;. This executor would defeat much of the purpose of asynchronous programming, so no real executor works that way.&lt;/p&gt;

&lt;p&gt;Instead, a runtime like Tokio integrates &lt;code&gt;Futures&lt;/code&gt; with an event loop (on &lt;a href='https://man7.org/linux/man-pages/man7/epoll.7.html' title=''&gt;epoll&lt;/a&gt; or &lt;a href='https://en.wikipedia.org/wiki/Kqueue' title=''&gt;kqueue&lt;/a&gt;) and, when calling &lt;code&gt;poll&lt;/code&gt;, passes a &lt;code&gt;Waker&lt;/code&gt;. The &lt;code&gt;Waker&lt;/code&gt; is an abstract handle that allows the &lt;code&gt;Future&lt;/code&gt; to instruct the Tokio runtime to call &lt;code&gt;poll&lt;/code&gt; again, because something has happened.&lt;/p&gt;
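&lt;p&gt;A toy model of the &lt;code&gt;poll&lt;/code&gt;/&lt;code&gt;Waker&lt;/code&gt; contract, deliberately simplified into Python so the shape is easy to see (every name here is invented):&lt;/p&gt;

```python
PENDING, READY = "Pending", "Ready"

class StubSource:
    """Stand-in for a non-blocking socket: a queue of read results."""
    def __init__(self, results):
        self.results = list(results)   # None means "no data yet"
        self.waker = None
    def try_read(self):
        return self.results.pop(0) if self.results else None
    def register(self, waker):
        self.waker = waker             # "call me when the fd is readable"

class ReadFuture:
    """Lazy state machine: nothing happens until the executor calls poll()."""
    def __init__(self, source):
        self.source = source
        self.result = None
    def poll(self, waker):
        data = self.source.try_read()
        if data is None:
            # Not ready. Hand the event loop our Waker so it knows to poll
            # us again when something happens, then get out of the way.
            self.source.register(waker)
            return PENDING
        self.result = data
        return READY

src = StubSource([None, b"hello"])
fut = ReadFuture(src)
assert fut.poll("wake-me") == PENDING   # parked; Waker registered
assert fut.poll("wake-me") == READY     # data arrived; future completes
```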

&lt;p&gt;To complicate things: an ordinary &lt;code&gt;Future&lt;/code&gt; is a one-shot value. Once it&amp;rsquo;s &lt;code&gt;Ready&lt;/code&gt;, it can&amp;rsquo;t be &lt;code&gt;polled&lt;/code&gt; anymore. But with network programming, that&amp;rsquo;s usually not what you want: data usually arrives in streams, which you want to track and make progress on as you can. So async Rust provides &lt;code&gt;AsyncRead&lt;/code&gt; and &lt;code&gt;AsyncWrite&lt;/code&gt; traits, which build on &lt;code&gt;Futures&lt;/code&gt;, and provide methods like &lt;code&gt;poll_read&lt;/code&gt; that return &lt;code&gt;Ready&lt;/code&gt; &lt;em&gt;every time&lt;/em&gt; there&amp;rsquo;s data ready. &lt;/p&gt;

&lt;p&gt;So far so good? OK. Now, there are two footguns in this design. &lt;/p&gt;

&lt;p&gt;The first footgun is that a &lt;code&gt;poll&lt;/code&gt; of a &lt;code&gt;Future&lt;/code&gt; that isn&amp;rsquo;t &lt;code&gt;Ready&lt;/code&gt; wastes cycles, and, if you have a bug in your code and that &lt;code&gt;Pending&lt;/code&gt; poll happens to trip a &lt;code&gt;Waker&lt;/code&gt;, you&amp;rsquo;ll slip into an infinite loop. That&amp;rsquo;s easy to see.&lt;/p&gt;

&lt;p&gt;The second and more insidious footgun is that an &lt;code&gt;AsyncRead&lt;/code&gt; can &lt;code&gt;poll_read&lt;/code&gt; to a &lt;code&gt;Ready&lt;/code&gt; that doesn&amp;rsquo;t actually progress its underlying state machine. Since the idea of &lt;code&gt;AsyncRead&lt;/code&gt; is that you keep &lt;code&gt;poll_reading&lt;/code&gt; until it stops being &lt;code&gt;Ready&lt;/code&gt;, this too is an infinite loop.&lt;/p&gt;
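&lt;p&gt;To make the second footgun concrete, here&amp;rsquo;s a hypothetical read loop that counts zero-progress &lt;code&gt;Ready&lt;/code&gt; results, the cheap kind of spurious-wakeup signal that&amp;rsquo;s easy to alert on:&lt;/p&gt;

```python
def drain(poll_read, on_data, max_spurious=3):
    """Read until Pending, flagging streams that claim Ready without progress.

    poll_read() returns a bytes chunk, b"" for "Ready but no progress",
    or None for Pending. max_spurious is an invented threshold; counting
    zero-progress polls is exactly the cheap metric worth alerting on.
    """
    spurious = 0
    while True:
        chunk = poll_read()
        if chunk is None:
            return "pending"           # well-behaved: parked until woken
        if chunk == b"":
            spurious += 1
            if spurious > max_spurious:
                return "busy-loop detected"
            continue
        spurious = 0                   # real progress resets the counter
        on_data(chunk)

polls = iter([b"hi", b"", b"", b"", b""])
got = []
print(drain(lambda: next(polls), got.append))  # busy-loop detected
```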

&lt;p&gt;When we look at our profiles, what we see are samples that almost terminate in libc, but spend next to no time in the kernel doing actual I/O. The obvious explanation: we&amp;rsquo;ve entered lots of &lt;code&gt;poll&lt;/code&gt; functions, but they&amp;rsquo;re doing almost nothing and returning immediately.&lt;/p&gt;
&lt;h2 id='jaccuse' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#jaccuse' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;J&amp;#39;accuse!&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Wakeup issues are annoying to debug. But the flamegraph gives us the fully qualified type of the &lt;code&gt;Future&lt;/code&gt; we&amp;rsquo;re polling:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative rust"&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-mhjra6vu"&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="nn"&gt;fp_io&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;copy&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Duplex&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="nn"&gt;fp_io&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;reusable_reader&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;ReusableReader&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nn"&gt;fp_tcp&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;peek&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;PeekableReader&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nn"&gt;tokio_rustls&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;server&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;TlsStream&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nn"&gt;fp_tcp_metered&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;MeteredIo&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nn"&gt;fp_tcp&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;peek&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;PeekableReader&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nn"&gt;fp_tcp&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;permitted&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;PermittedTcpStream&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nn"&gt;connect&lt;/span&gt;&lt;span 
class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Conn&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nn"&gt;tokio&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;net&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;tcp&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;TcpStream&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This looks like a lot, but much of it is just wrapper types we wrote ourselves, and those wrappers don&amp;rsquo;t do anything interesting. What&amp;rsquo;s left to audit:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Duplex&lt;/code&gt;, the outermost type, one of ours, &lt;em&gt;and&lt;/em&gt;
&lt;/li&gt;&lt;li&gt;&lt;code&gt;TlsStream&lt;/code&gt;, from &lt;a href='https://github.com/rustls/rustls' title=''&gt;Rustls&lt;/a&gt;.
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;Duplex&lt;/code&gt; is a beast. It&amp;rsquo;s the core I/O state machine for proxying between connections. It&amp;rsquo;s not easy to reason about in detail. But: it also doesn&amp;rsquo;t do anything directly with a &lt;code&gt;Waker&lt;/code&gt;; it&amp;rsquo;s built around &lt;code&gt;AsyncRead&lt;/code&gt; and &lt;code&gt;AsyncWrite&lt;/code&gt;. It hasn&amp;rsquo;t changed recently and we can&amp;rsquo;t trigger misbehavior in it.&lt;/p&gt;

&lt;p&gt;That leaves &lt;code&gt;TlsStream&lt;/code&gt;. &lt;code&gt;TlsStream&lt;/code&gt; is an ultra-important, load-bearing type in the Rust ecosystem. Everybody uses it. Could it harbor an async Rust footgun? Turns out, it did!&lt;/p&gt;

&lt;p&gt;Unlike our &lt;code&gt;Duplex&lt;/code&gt;, Rustls actually does have to get intimate with the underlying async executor. And, looking through the repository, Pavel uncovers &lt;a href='https://github.com/rustls/tokio-rustls/issues/72' title=''&gt;this issue&lt;/a&gt;: sometimes, &lt;code&gt;TlsStreams&lt;/code&gt; in Rustls just spin out. And it turns out, what&amp;rsquo;s causing this is a TLS state machine bug: when a TLS session is orderly-closed, with a &lt;code&gt;CloseNotify&lt;/code&gt; &lt;code&gt;Alert&lt;/code&gt; record, the sender of that record has informed its counterparty that no further data will be sent. But if there&amp;rsquo;s still buffered data on the underlying connection, &lt;code&gt;TlsStream&lt;/code&gt; mishandles its &lt;code&gt;Waker&lt;/code&gt;, and we fall into a busy-loop.&lt;/p&gt;

&lt;p&gt;&lt;a href='https://github.com/rustls/rustls/pull/1950/files' title=''&gt;Pretty straightforward fix&lt;/a&gt;!&lt;/p&gt;
&lt;h2 id='what-actually-happened-to-us' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-actually-happened-to-us' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;What Actually Happened To Us&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Our partners in object storage, &lt;a href='https://www.tigrisdata.com/' title=''&gt;Tigris Data&lt;/a&gt;, were conducting some kind of load testing exercise. Some aspect of their testing system triggered the &lt;code&gt;TlsStream&lt;/code&gt; state machine bug, which locked up one or more &lt;code&gt;TlsStreams&lt;/code&gt; in the edge proxy handling whatever corner-casey stream they were sending.&lt;/p&gt;

&lt;p&gt;Tigris wasn&amp;rsquo;t generating a whole lot of traffic; tens of thousands of connections, tops. But all of them sent small HTTP bodies and then terminated early. We figured some of those connections errored out, and set up the &amp;ldquo;TLS CloseNotify happened before EOF&amp;rdquo; scenario. &lt;/p&gt;

&lt;p&gt;To be truer to the chronology, we knew pretty early on in our investigation that something Tigris was doing with their load testing was probably triggering the bug, and we got them to stop. After we worked it out, and Pavel deployed the fix, we told them to resume testing. No spin-outs.&lt;/p&gt;
&lt;h2 id='lessons-learned' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#lessons-learned' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Lessons Learned&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Keep your dependencies updated. Unless you shouldn&amp;rsquo;t keep your dependencies updated. I mean, if there&amp;rsquo;s a vulnerability (and, technically, this was a DoS vulnerability), always update. And if there&amp;rsquo;s an important bugfix, update. But if there isn&amp;rsquo;t an important bugfix, updating for the hell of it might also destabilize your project? So update maybe? Most of the time?&lt;/p&gt;

&lt;p&gt;Really, the truth of this is that keeping track of &lt;em&gt;what needs to be updated&lt;/em&gt; is valuable work. The updates themselves are pretty fast and simple, but the process and testing infrastructure to confidently metabolize dependency updates is not. &lt;/p&gt;

&lt;p&gt;Our other lesson here is that there&amp;rsquo;s an opportunity to spot these kinds of bugs more directly with our instrumentation. Spurious wakeups should be easy to spot, and triggering a metric when they happen should be cheap, because they&amp;rsquo;re not supposed to happen often. So that&amp;rsquo;s something we&amp;rsquo;ll go do now.&lt;/p&gt;</content>
    </entry>
    <entry>
        <title>We Were Wrong About GPUs</title>
        <link rel="alternate" href="https://fly.io/blog/wrong-about-gpu/"/>
        <id>https://fly.io/blog/wrong-about-gpu/</id>
        <published>2025-02-14T00:00:00+00:00</published>
        <updated>2025-02-14T23:25:21+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/wrong-about-gpu/assets/choices-choices-cover.webp"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;We’re building a public cloud, on hardware we own. We raised money to do that, and to place some bets; one of them: GPU-enabling our customers. A progress report: GPUs aren’t going anywhere, but: GPUs aren’t going anywhere.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;A couple years back, &lt;a href="https://fly.io/gpu"&gt;we put a bunch of chips down&lt;/a&gt; on the bet that people shipping apps to users on the Internet would want GPUs, so they could do AI/ML inference tasks. To make that happen, we created &lt;a href="https://fly.io/docs/gpus/getting-started-gpus/"&gt;Fly GPU Machines&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;A Fly Machine is a &lt;a href="https://fly.io/blog/docker-without-docker/"&gt;Docker/OCI container&lt;/a&gt; running inside a hardware-virtualized virtual machine somewhere on our global fleet of bare-metal worker servers. A GPU Machine is a Fly Machine with a hardware-mapped Nvidia GPU. It&amp;rsquo;s a Fly Machine that can do fast CUDA.&lt;/p&gt;

&lt;p&gt;Like everybody else in our industry, we were right about the importance of AI/ML. If anything, we underestimated its importance. But the product we came up with probably doesn&amp;rsquo;t fit the moment. It&amp;rsquo;s a bet that doesn&amp;rsquo;t feel like it&amp;rsquo;s paying off.&lt;/p&gt;

&lt;p&gt;&lt;strong class='font-semibold text-navy-950'&gt;If you&amp;rsquo;re using Fly GPU Machines, don&amp;rsquo;t freak out; we&amp;rsquo;re not getting rid of them.&lt;/strong&gt; But if you&amp;rsquo;re waiting for us to do something bigger with them, a v2 of the product, you&amp;rsquo;ll probably be waiting awhile.&lt;/p&gt;
&lt;h3 id='what-it-took' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-it-took' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;What It Took&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;GPU Machines were not a small project for us. Fly Machines run on an idiosyncratically small hypervisor (normally Firecracker, but for GPU Machines &lt;a href="https://github.com/cloud-hypervisor/cloud-hypervisor"&gt;Intel&amp;rsquo;s Cloud Hypervisor&lt;/a&gt;, a very similar Rust codebase that supports PCI passthrough). The Nvidia ecosystem is not geared to supporting micro-VM hypervisors.&lt;/p&gt;

&lt;p&gt;GPUs &lt;a href="https://googleprojectzero.blogspot.com/2020/09/attacking-qualcomm-adreno-gpu.html"&gt;terrified our security team&lt;/a&gt;. A GPU is just about the worst case hardware peripheral: intense multi-directional direct memory transfers&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;(not even bidirectional: in common configurations, GPUs talk to each other)&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;with arbitrary, end-user controlled computation, all operating outside our normal security boundary.&lt;/p&gt;

&lt;p&gt;We did a couple expensive things to mitigate the risk. We shipped GPUs on dedicated server hardware, so that GPU- and non-GPU workloads weren&amp;rsquo;t mixed. Because of that, the only reason for a Fly Machine to be scheduled on a GPU machine was that it needed a PCI BDF for an Nvidia GPU, and there&amp;rsquo;s a limited number of those available on any box. Those GPU servers were drastically less utilized and thus less cost-effective than our ordinary servers.&lt;/p&gt;

&lt;p&gt;We funded two very large security assessments, from &lt;a href="https://www.atredis.com/"&gt;Atredis&lt;/a&gt; and &lt;a href="https://tetrelsec.com/"&gt;Tetrel&lt;/a&gt;, to evaluate our GPU deployment. Matt Braun is writing up those assessments now. They were not cheap, and they took time.&lt;/p&gt;

&lt;p&gt;Security wasn&amp;rsquo;t directly the biggest cost we had to deal with, but it was an indirect cause for a subtle reason.&lt;/p&gt;

&lt;p&gt;We could have shipped GPUs very quickly by doing what Nvidia recommended: standing up a standard K8s cluster to schedule GPU jobs on. Had we taken that path, and let our GPU users share a single Linux kernel, we&amp;rsquo;d have been on Nvidia&amp;rsquo;s driver happy-path.&lt;/p&gt;

&lt;p&gt;Alternatively, we could have used a conventional hypervisor. Nvidia suggested VMware (heh). But they could have gotten things working had we used QEMU. We like QEMU fine, and could have talked ourselves into a security story for it, but the whole point of Fly Machines is that they take milliseconds to start. We could not have offered our desired Developer Experience on the Nvidia happy-path.&lt;/p&gt;

&lt;p&gt;Instead, we burned months trying (and ultimately failing) to get Nvidia&amp;rsquo;s host drivers working to map &lt;a href="https://www.nvidia.com/en-us/data-center/virtual-solutions/"&gt;virtualized GPUs&lt;/a&gt; into Intel Cloud Hypervisor. At one point, we hex-edited the closed-source drivers to trick them into thinking our hypervisor was QEMU.&lt;/p&gt;

&lt;p&gt;I&amp;rsquo;m not sure any of this really mattered in the end. There&amp;rsquo;s a segment of the market we weren&amp;rsquo;t ever really able to explore because Nvidia&amp;rsquo;s driver support kept us from thin-slicing GPUs. We&amp;rsquo;d have been able to put together a really cheap offering for developers if we hadn&amp;rsquo;t run up against that, and developers love &amp;ldquo;cheap&amp;rdquo;, but I can&amp;rsquo;t prove that those customers are real.&lt;/p&gt;

&lt;p&gt;On the other hand, we&amp;rsquo;re committed to delivering the Fly Machine DX for GPU workloads. Beyond the PCI/IOMMU drama, just getting an entire hardware GPU working in a Fly Machine was a lift. We needed Fly Machines that would come up with the right Nvidia drivers; our stack was built assuming that the customer&amp;rsquo;s OCI container almost entirely defined the root filesystem for a Machine. We had to engineer around that in our &lt;code&gt;flyd&lt;/code&gt; orchestrator. And almost everything people want to do with GPUs involves efficiently grabbing huge files full of model weights. Also annoying!&lt;/p&gt;

&lt;p&gt;And, of course, we bought GPUs. A lot of GPUs. Expensive GPUs.&lt;/p&gt;
&lt;h3 id='why-it-isnt-working' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#why-it-isnt-working' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Why It Isn&amp;rsquo;t Working&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;The biggest problem: developers don&amp;rsquo;t want GPUs. They don&amp;rsquo;t even want AI/ML models. They want LLMs. &lt;em&gt;System engineers&lt;/em&gt; may have smart, fussy opinions on how to get their models loaded with CUDA, and what the best GPU is. But &lt;em&gt;software developers&lt;/em&gt; don&amp;rsquo;t care about any of that. When a software developer shipping an app comes looking for a way for their app to deliver prompts to an LLM, you can&amp;rsquo;t just give them a GPU.&lt;/p&gt;

&lt;p&gt;For those developers, who probably make up most of the market, it doesn&amp;rsquo;t seem plausible for an insurgent public cloud to compete with OpenAI and Anthropic. Their APIs are fast enough, and developers thinking about performance in terms of &amp;ldquo;tokens per second&amp;rdquo; aren&amp;rsquo;t counting milliseconds.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;(you should all feel sympathy for us)&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;This makes us sad because we really like the point in the solution space we found. Developers shipping apps on Amazon will outsource to other public clouds to get cost-effective access to GPUs. But then they&amp;rsquo;ll faceplant trying to handle data and model weights, backhauling gigabytes (at significant expense) from S3. We have app servers, GPUs, and object storage all under the same top-of-rack switch. But inference latency just doesn&amp;rsquo;t seem to matter yet, so the market doesn&amp;rsquo;t care.&lt;/p&gt;

&lt;p&gt;Past that, and just considering the system engineers who do care about GPUs rather than LLMs: the hardware product/market fit here is really rough.&lt;/p&gt;

&lt;p&gt;People doing serious AI work want galactically huge amounts of GPU compute. A whole enterprise A100 is a compromise position for them; they want an SXM cluster of H100s.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;Near as we can tell, MIG gives you a UUID to talk to the host driver, not a PCI device.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;We think there&amp;rsquo;s probably a market for users doing lightweight ML work getting tiny GPUs. &lt;a href="https://www.nvidia.com/en-us/technologies/multi-instance-gpu/"&gt;This is what Nvidia MIG does&lt;/a&gt;, slicing a big GPU into arbitrarily small virtual GPUs. But for fully-virtualized workloads, it&amp;rsquo;s not baked; we can&amp;rsquo;t use it. And I&amp;rsquo;m not sure how many of those customers there are, or whether we&amp;rsquo;d get the density of customers per server that we need.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://fly.io/blog/cutting-prices-for-l40s-gpus-in-half"&gt;That leaves the L40S customers&lt;/a&gt;. There are a bunch of these! We dropped L40S prices last year, not because we were sour on GPUs but because they&amp;rsquo;re the one part we have in our inventory people seem to get a lot of use out of. We&amp;rsquo;re happy with them. But they&amp;rsquo;re just another kind of compute that some apps need; they&amp;rsquo;re not a driver of our core business. They&amp;rsquo;re not the GPU bet paying off.&lt;/p&gt;

&lt;p&gt;Really, all of this is just a long way of saying that for most software developers, &amp;ldquo;AI-enabling&amp;rdquo; their app is best done with API calls to things like Claude and GPT, Replicate and RunPod.&lt;/p&gt;
&lt;h3 id='what-did-we-learn' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-did-we-learn' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;What Did We Learn?&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;A very useful way to look at a startup is that it&amp;rsquo;s a race to learn stuff. So, what&amp;rsquo;s our report card?&lt;/p&gt;

&lt;p&gt;First off, when we embarked down this path in 2022, we were (like many other companies) operating in a sort of phlogiston era of AI/ML. The industry attention to AI had not yet collapsed around a small number of foundational LLMs. We expected there to be a diversity of &lt;em&gt;mainstream&lt;/em&gt; models, the world &lt;a href='https://github.com/elixir-nx/bumblebee' title=''&gt;Elixir Bumblebee&lt;/a&gt; looks forward to, where people pull different AI workloads off the shelf the same way they do Ruby gems.&lt;/p&gt;

&lt;p&gt;But &lt;a href='https://www.cursor.com/' title=''&gt;Cursor happened&lt;/a&gt;, and, as they say, how are you going to keep &amp;lsquo;em down on the farm once they&amp;rsquo;ve seen Karl Hungus? It seems much clearer where things are heading.&lt;/p&gt;

&lt;p&gt;GPUs were a test of a Fly.io company credo: as we think about core features, we design for 10,000 developers, not for 5-6. It took a minute, but the credo wins here: GPU workloads for the 10,001st developer are a niche thing.&lt;/p&gt;

&lt;p&gt;Another way to look at a startup is as a series of bets. We put a lot of chips down here. But the buy-in for this tournament gave us a lot of chips to play with. Never making a big bet of any sort isn&amp;rsquo;t a winning strategy. I&amp;rsquo;d rather we&amp;rsquo;d flopped the nut straight, but I think going in on this hand was the right call.&lt;/p&gt;

&lt;p&gt;A really important thing to keep in mind here, and something I think a lot of startup thinkers sleep on, is the extent to which this bet involved acquiring assets. Obviously, some of our &lt;a href='https://fly.io/blog/the-exit-interview-jp/' title=''&gt;costs here aren&amp;rsquo;t recoverable&lt;/a&gt;. But the hardware parts that aren&amp;rsquo;t generating revenue will ultimately get liquidated; like with &lt;a href='https://fly.io/blog/32-bit-real-estate/' title=''&gt;our portfolio of IPv4 addresses&lt;/a&gt;, I&amp;rsquo;m even more comfortable making bets backed by tradable assets with durable value.&lt;/p&gt;

&lt;p&gt;In the end, I don&amp;rsquo;t think GPU Fly Machines were going to be a hit for us no matter what we did. Because of that, one thing I&amp;rsquo;m very happy about is that we didn&amp;rsquo;t compromise the rest of the product for them. Security concerns slowed us down to where we probably learned what we needed to learn a couple months later than we could have otherwise, but we&amp;rsquo;re scaling back our GPU ambitions without having sacrificed &lt;a href='https://fly.io/blog/sandboxing-and-workload-isolation/' title=''&gt;any of our isolation story&lt;/a&gt;, and, ironically, GPUs &lt;em&gt;other people run&lt;/em&gt; are making that story a lot more important. The same thing goes for our Fly Machine developer experience.&lt;/p&gt;

&lt;p&gt;We started this company building a Javascript runtime for edge computing. We learned that our customers didn&amp;rsquo;t want a new Javascript runtime; they just wanted their native code to work. &lt;a href='https://news.ycombinator.com/item?id=22616857' title=''&gt;We shipped containers&lt;/a&gt;, and no convincing was needed. We were wrong about Javascript edge functions, and I think we were wrong about GPUs. That&amp;rsquo;s usually how we figure out the right answers:  by being wrong about a lot of stuff.&lt;/p&gt;</content>
    </entry>
    <entry>
        <title>The Exit Interview: JP Phillips</title>
        <link rel="alternate" href="https://fly.io/blog/the-exit-interview-jp/"/>
        <id>https://fly.io/blog/the-exit-interview-jp/</id>
        <published>2025-02-12T00:00:00+00:00</published>
        <updated>2025-02-14T21:30:41+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/the-exit-interview-jp/assets/bye-bye-little-sebastian-cover.webp"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;JP Phillips is off to greener, or at least calmer, pastures. He joined us 4 years ago to build the next generation of our orchestration system, and has been one of the anchors of our engineering team. His last day is today. We wanted to know what he was thinking, and figured you might too.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;&lt;em&gt;Question 1: Why, JP? Just why?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;LOL. When I looked at what I wanted to see from here in the next 3-4 years, it didn&amp;rsquo;t really match up with where we&amp;rsquo;re currently heading. Specifically, with our new focus on MPG &lt;em&gt;[Managed Postgres]&lt;/em&gt; and [llm] &lt;em&gt;[llm].&lt;/em&gt;&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;Editorial comment: Even I don’t know what [llm] is.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The Fly Machines platform is more or less finished, in the sense of being capable of supporting the next iteration of our products. My original desire to join Fly.io was to make Machines a product that would &lt;a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''&gt;rid us of HashiCorp Nomad&lt;/a&gt;, and I feel like that&amp;rsquo;s been accomplished.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Where were you hoping to see us headed?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;More directly positioned as a cloud provider, rather than a platform-as-a-service; further along the customer journey from &amp;ldquo;developers&amp;rdquo; and &amp;ldquo;startups&amp;rdquo; to large established companies.&lt;/p&gt;

&lt;p&gt;And, it&amp;rsquo;s not that I disagree with PaaS work or MPG! Rather, it&amp;rsquo;s not something that excites me in a way that I&amp;rsquo;d feel challenged and could continue to grow technically.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Follow up question: does your family know what you’re doing here? Doing to us? Are they OK with it?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Yes, my family was very involved in the decision, before I even talked to other companies.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;What&amp;rsquo;s the thing you&amp;rsquo;re happiest about having built here? It cannot be &amp;ldquo;all of &lt;code&gt;flyd&lt;/code&gt;&amp;rdquo;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;We&amp;rsquo;ve enabled developers to run workloads from an OCI image and an API call all over the world. On any other cloud provider, the knowledge of how to pull that off comes with a professional certification.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;In what file in our &lt;code&gt;nomad-firecracker&lt;/code&gt; repository would I find that code?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href='https://docs.machines.dev/#tag/machines/post/apps/{app_name}/machines' title=''&gt;https://docs.machines.dev/#tag/machines/post/apps/{app_name}/machines&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;img alt="A diagram that doesn&amp;#39;t make any of this clearer" src="/blog/the-exit-interview-jp/assets/flaps.png?1/2&amp;amp;center" /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;So you mean, literally, the whole Fly Machines API, and &lt;code&gt;flaps&lt;/code&gt;, the API gateway for Fly Machines?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Yes, all of it. The &lt;code&gt;flaps&lt;/code&gt; API server, the &lt;code&gt;flyd&lt;/code&gt; RPCs it calls, the &lt;code&gt;flyd&lt;/code&gt; finite state machine system, the interface to running VMs.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Is there something you especially like about that design?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I like that it for the most part doesn&amp;rsquo;t require any central coordination. And I like that the P90 for Fly Machine &lt;code&gt;create&lt;/code&gt; calls is sub-5-seconds for pretty much every region except for Johannesburg and Hong Kong.&lt;/p&gt;

&lt;p&gt;I think the FSM design is something I&amp;rsquo;m proud of; if I could take any code with me, it&amp;rsquo;d be the &lt;code&gt;internal/fsm&lt;/code&gt; in the &lt;code&gt;nomad-firecracker&lt;/code&gt; repo.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;You can read more about &lt;a href="https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/" title=""&gt;the &lt;code&gt;flyd&lt;/code&gt; orchestrator JP led over here&lt;/a&gt;.  But, a quick decoder ring: &lt;code&gt;flyd&lt;/code&gt; runs independently without any central coordination on thousands of “worker” servers around the globe. It’s structured as an API server for a bunch of finite state machine invocations, where an FSM might be something like “start a Fly Machine” or “create a new Fly Machine” or “cordon off a Fly Machine so we can update it”. Each FSM invocation is comprised of a bunch of steps, each of those steps has callbacks into the &lt;code&gt;flyd&lt;/code&gt; code, and each step is logged in &lt;a href="https://github.com/boltdb/bolt" title=""&gt;a BoltDB database&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;&lt;em&gt;Thinking back, there are like two archetypes of insanely talented developers I’ve worked with. One is the kind that belts out ridiculous amounts of relatively sophisticated code on a whim, at like 3AM. Jerome [who leads our fly-proxy team], is that type. The other comes to projects with what feels like fully-formed, coherent designs that are not super intuitive, and the whole project just falls together around that design. Did you know you were going to do the FSM log thing when you started &lt;code&gt;flyd&lt;/code&gt;?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I definitely didn&amp;rsquo;t have any specific design in mind when I started on &lt;code&gt;flyd&lt;/code&gt;. I think the FSM stuff is a result of work I did at Compose.io / MongoHQ (where it was called &amp;ldquo;recipes&amp;rdquo;/&amp;ldquo;operations&amp;rdquo;) and the work I did at HashiCorp using Cadence.&lt;/p&gt;

&lt;p&gt;Once I understood what the product needed to do and look like, having a way to perform deterministic and durable execution felt like a good design.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Cadence?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href='https://cadenceworkflow.io/' title=''&gt;Cadence&lt;/a&gt; is the child of AWS Step Functions and the predecessor to &lt;a href='https://temporal.io/' title=''&gt;Temporal&lt;/a&gt; (the company).&lt;/p&gt;

&lt;p&gt;One of the biggest gains, with how it works in &lt;code&gt;flyd&lt;/code&gt;, came from knowing we would need to deploy &lt;code&gt;flyd&lt;/code&gt; all day, every day. If &lt;code&gt;flyd&lt;/code&gt; was in the middle of doing some work, it needed to pick back up right where it left off, post-deploy.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;OK, next question. What&amp;rsquo;s the most impressive thing you saw someone else build here? To make things simpler and take some pressure off the interview, we can exclude any of my works from consideration.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Probably &lt;a href='https://github.com/superfly/corrosion' title=''&gt;&lt;code&gt;corrosion2&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;Sidebar: &lt;code&gt;corrosion2&lt;/code&gt; is our state distribution system. While &lt;code&gt;flyd&lt;/code&gt; runs individual Fly Machines for users, each instance is solely responsible for its own state; there’s no global scheduler. But we have platform components, most obviously &lt;code&gt;fly-proxy&lt;/code&gt;, our Anycast router, that need to know what’s running where. &lt;code&gt;corrosion2&lt;/code&gt; is a Rust service that does &lt;a href="https://fly.io/blog/building-clusters-with-serf/" title=""&gt;SWIM gossip&lt;/a&gt; to propagate information from each worker into a CRDT-structured SQLite database. &lt;code&gt;corrosion2&lt;/code&gt; essentially means any component on our fleet can do SQLite queries to get near-real-time information about any Fly Machine around the world.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;If for no other reason than that we deployed &lt;code&gt;corrosion&lt;/code&gt;, learned from it, and were able to make significant and valuable improvements — and then migrate to the new system in a short period of time.&lt;/p&gt;

&lt;p&gt;Having a &amp;ldquo;just SQLite&amp;rdquo; interface for async replicated changes around the world in seconds is pretty powerful.&lt;/p&gt;

&lt;p&gt;If we invested in &lt;a href='https://antithesis.com/' title=''&gt;Antithesis&lt;/a&gt; or TLA+ testing, I think there&amp;rsquo;s &lt;a href='https://github.com/superfly/corrosion' title=''&gt;potential for other companies&lt;/a&gt; to get value out of &lt;code&gt;corrosion2&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Just as a general-purpose gossip-based SQLite CRDT gossip system?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Yes.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;OK, you&amp;rsquo;re being too nice. What&amp;rsquo;s your least favorite thing about the platform?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;GraphQL. No, Elixir. It&amp;rsquo;s a tie between GraphQL and Elixir.&lt;/p&gt;

&lt;p&gt;But probably GraphQL, by a hair.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;That&amp;rsquo;s not the answer I expected.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;GraphQL slows everyone down, and everything. Elixir only slows me down.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The rest of the platform, you&amp;rsquo;re fine with? No complaints?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I&amp;rsquo;m happier now that we have &lt;code&gt;pilot&lt;/code&gt;.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;&lt;code&gt;pilot&lt;/code&gt; is our new &lt;code&gt;init&lt;/code&gt;. When we launch a Fly Machine, &lt;code&gt;init&lt;/code&gt; is our foothold in the machine; this is unlike a normal OCI runtime, where “pid 1” is often the user’s entrypoint program. Our original &lt;code&gt;init&lt;/code&gt; was so simple people dunked on it and said it might as well have been a bash script; over time, &lt;code&gt;init&lt;/code&gt; has sprouted a bunch of new features. &lt;code&gt;pilot&lt;/code&gt; consolidates those features, and, more importantly, is itself a complete OCI runtime; &lt;code&gt;pilot&lt;/code&gt; can natively run containers inside of Fly Machines.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Before &lt;code&gt;pilot&lt;/code&gt;, there really wasn&amp;rsquo;t any contract between &lt;code&gt;flyd&lt;/code&gt; and &lt;code&gt;init&lt;/code&gt;. And &lt;code&gt;init&lt;/code&gt; was just &amp;ldquo;whatever we wanted &lt;code&gt;init&lt;/code&gt; to be&amp;rdquo;. That limited its ability to serve us.&lt;/p&gt;

&lt;p&gt;Having &lt;code&gt;pilot&lt;/code&gt; be an OCI-compliant runtime with an API for &lt;code&gt;flyd&lt;/code&gt; to drive is a big win for the future of the Fly Machines API.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Was I right that we should have used SQLite for &lt;code&gt;flyd&lt;/code&gt;, or were you wrong to have used BoltDB?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I still believe Bolt was the right choice. I&amp;rsquo;ve never lost a second of sleep worrying that someone is about to run a SQL update statement on a host, or across the whole fleet, and mangle all our state data. And limiting the storage interface, by not using SQL, kept &lt;code&gt;flyd&lt;/code&gt;&amp;rsquo;s scope manageable.&lt;/p&gt;

&lt;p&gt;On the engine side of the platform, which is what &lt;code&gt;flyd&lt;/code&gt; is, I still believe SQL is too powerful for what &lt;code&gt;flyd&lt;/code&gt; does.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;If you had this to do over again, would Bolt be precisely what you&amp;rsquo;d pick, or is there something else you&amp;rsquo;d want to try? Some cool-ass new KV store?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Nah. But, I&amp;rsquo;d maybe consider a SQLite database per-Fly-Machine. Then the scope of danger is about as small as it could possibly be.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Whoah, that&amp;rsquo;s an interesting thought. People sleep on the &amp;ldquo;keep a zillion little SQLites&amp;rdquo; design.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Yeah, with per-Machine SQLite, once a Fly Machine is destroyed, we can just zip up the database and stash it in object storage. The biggest hold-up I have about it is how we&amp;rsquo;d manage the schemas.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;OpenTelemetry: were you right all along?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;One hundred percent.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;I basically attribute oTel at Fly.io to you.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Without oTel, it&amp;rsquo;d be a disaster trying to troubleshoot the system. I&amp;rsquo;d have ragequit trying.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;I remember there being a cost issue, with how much Honeycomb was going to end up charging us to manage all the data. But that seems silly in retrospect.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;For sure. It is 100% part of the decision and the conversation. But: we didn&amp;rsquo;t have the best track record running a logs/metrics cluster at this fidelity. It was worth the money to pay someone else to manage tracing data.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Strong agree. I think my only issue is just the extent to which it cruds up code. But I need to get over that.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Yes, it&amp;rsquo;s very explicit. I think the next big part of oTel is going to be auto-instrumentation, for profiling.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;You&amp;rsquo;re a veteran Golang programmer. Say 3 nice things about Rust.&lt;/em&gt;&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;Most of our backend is in Go, but &lt;code&gt;fly-proxy&lt;/code&gt;, &lt;code&gt;corrosion2&lt;/code&gt;, and &lt;code&gt;pilot&lt;/code&gt; are in Rust.&lt;/p&gt;
&lt;/div&gt;
&lt;ol&gt;
&lt;li&gt;Option. 
&lt;/li&gt;&lt;li&gt;Match.
&lt;/li&gt;&lt;li&gt;Serde macros.
&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;&lt;em&gt;Even I can&amp;rsquo;t say shit about Option and match.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Match is so much better than anything in Go.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Elixir, Go, and Rust. An honest take on that programming cocktail.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Three&amp;rsquo;s a crowd, Elixir can stay home.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;If you could only lose one, you&amp;rsquo;d keep Rust.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I&amp;rsquo;ve learned its shortcomings and the productivity far outweighs having to deal with the Rust compiler.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;You&amp;rsquo;d be unhappy if we moved the &lt;code&gt;flaps&lt;/code&gt; API code from Go to Elixir.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Correct.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;I kind of buy the idea of doing orchestration and scheduling code, which is policy-intensive, in a higher-level language.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Maybe. If Ruby had a better concurrency story, I don&amp;rsquo;t think Elixir would have a place for us.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;Here I need to note that Ruby is functionally dead here, and Elixir is ascendant.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;&lt;em&gt;We have an idiosyncratic management structure. We&amp;rsquo;re bottom-up, but ambiguously so. We don&amp;rsquo;t have roadmaps, except when we do. We have minimal top-down technical direction. Critique.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;It&amp;rsquo;s too easy to lose sight of whether your current focus [in what you&amp;rsquo;re building] is valuable to the company.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The first thing I warn every candidate about on our &amp;ldquo;do-not-work-here&amp;rdquo; calls.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I think it comes down to execution, and accountability to actually finish projects. I spun my wheels a lot trying to figure out what would be the most valuable work for Fly Machines.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;You don&amp;rsquo;t have to be so nice about things.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;We struggle a lot with consistent communication. We change direction a little too often. It got to a point where I didn&amp;rsquo;t see a point in devoting time and effort to projects, because I wouldn&amp;rsquo;t be able to show enough value quickly enough.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;I see things paying off later than we&amp;rsquo;d hoped or expected they would. Our secret storage system, Pet Semetary, is a good example of this. Our K8s service, FKS, is another obvious one, since we&amp;rsquo;re shipping MPG on it.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This is your second time working with Kurt, at a company where he&amp;rsquo;s the CEO. Give him a 1-4 star rating. He can take it! At least, I think he can take it.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;2022: ★★★★&lt;/p&gt;

&lt;p&gt;2023: ★★&lt;/p&gt;

&lt;p&gt;2024: ★★✩&lt;/p&gt;

&lt;p&gt;2025: ★★★✩&lt;/p&gt;

&lt;p&gt;On a four-star scale.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Whoah. I did not expect a histogram. Say more about 2023!&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;We hired too many people, too quickly, and didn&amp;rsquo;t have the guardrails and structure in place for everybody to be successful.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Also: GPUs!&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Yes. That was my next comment.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Do we secretly agree about GPUs?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I think so.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Our side won the argument in the end! But at what cost?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;They were a killer distraction.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Final question: how long will you remain in the first-responder on-call rotation after you leave? I assume at least until August. I have a shift this weekend; can you swap with me? I keep getting weekends.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I am going to be asleep all weekend if any of my previous job changes are indicative.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;I sleep through on-call too! But nobody can yell at you for it now. I think you have the comparative advantage over me in on-calling.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Yes I will absolutely take all your future on-call shifts, you have convinced me.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;All this aside: it has been a privilege watching you work. I hope your next gig is 100x more relaxing than this was. Or maybe I just hope that for myself. Except: I&amp;rsquo;ll never escape this place. Thank you so much for doing this.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Thank you! I&amp;rsquo;m forever grateful for having the opportunity to be a part of Fly.io.&lt;/p&gt;</content>
    </entry>
    <entry>
        <title>Did Semgrep Just Get A Lot More Interesting?</title>
        <link rel="alternate" href="https://fly.io/blog/semgrep-but-for-real-now/"/>
        <id>https://fly.io/blog/semgrep-but-for-real-now/</id>
        <published>2025-02-10T00:00:00+00:00</published>
        <updated>2025-02-14T21:30:41+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/semgrep-but-for-real-now/assets/ci-cd-cover.webp"/>
        <content type="html">&lt;div class="right-sidenote"&gt;&lt;p&gt;This whole paragraph is just one long sentence. God I love &lt;a href="https://fly.io/blog/a-blog-if-kept/" title=""&gt;just random-ass blogging&lt;/a&gt; again.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;&lt;a href='https://ghuntley.com/stdlib/' title=''&gt;This bit by Geoffrey Huntley&lt;/a&gt; is super interesting to me and, despite calling out that LLM-driven development agents like Cursor have something like a 40% success rate at actually building anything that passes acceptance criteria, makes me think that more of the future of our field belongs to people who figure out how to use these weird bags of model weights than any of us are comfortable with.&lt;/p&gt;

&lt;p&gt;I’ve been dinking around with Cursor for a week now (if you haven’t, I think it’s something close to malpractice not to at least take it — or something like it — for a spin) and am just now from this post learning that Cursor has this &lt;a href='https://docs.cursor.com/context/rules-for-ai' title=''&gt;rules feature&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;The important thing for me is not how Cursor rules work, but rather how Huntley uses them. He turns them back on themselves, writing rules to tell Cursor how to organize the rules, and then teach Cursor how to write (under human supervision) its own rules.&lt;/p&gt;

&lt;p&gt;Cursor kept trying to get Huntley to use Bazel as a build system. So he had Cursor write a rule for itself: “no bazel”. And there was no more Bazel. If I’d known I could do this, I probably wouldn’t have bounced from the Elixir project I had Cursor doing, where trying to get it to write simple unit tests got it all tangled up trying to make &lt;a href='https://hexdocs.pm/mox/Mox.html' title=''&gt;Mox&lt;/a&gt; work.&lt;/p&gt;

&lt;p&gt;But I’m burying the lede.&lt;/p&gt;

&lt;p&gt;Security people have been for several years now somewhat in love with a tool called &lt;a href='https://github.com/semgrep/semgrep' title=''&gt;Semgrep&lt;/a&gt;. Semgrep is a semantics-aware code search tool; using symbolic variable placeholders and otherwise ordinary code, you can write rules to match pretty much arbitrary expressions and control flow.&lt;/p&gt;

&lt;p&gt;If you’re an appsec person, where you obviously go with this is: you build a library of Semgrep searches for well-known vulnerability patterns (or, if you’re like us at Fly.io, you work out how to get Semgrep to catch the Rust concurrency footgun of RWLocks inside if-lets).&lt;/p&gt;

&lt;p&gt;The reality for most teams though is “ain’t nobody got time for that”. &lt;/p&gt;

&lt;p&gt;But I just checked and, unsurprisingly, 4o &lt;a href='https://chatgpt.com/share/67aa94a7-ea3c-8012-845c-6c9491b33fe4' title=''&gt;seems to do reasonably well&lt;/a&gt; at generating Semgrep rules? Like: I have no idea if this rule is actually any good. But it looks like a Semgrep rule?&lt;/p&gt;

&lt;p&gt;What interests me is this: it seems obvious that we’re going to do more and more “closed-loop” LLM agent code generation stuff. By “closed loop”, I mean that the thingy that generates code is going to get to run the code and watch what happens when it’s interacted with. You’re just a small bit of glue code and a lot of system prompting away from building something like that right now: &lt;a href='https://x.com/chris_mccord/status/1882839014845374683' title=''&gt;Chris McCord is building&lt;/a&gt; a thingy that generates whole Elixir/Phoenix apps and runs them as Fly Machines. When you deploy these kinds of things, the LLM gets to see the errors when the code is run, and it can just go fix them. It also gets to see errors and exceptions in the logs when you hit a page on the app, and it can just go fix them.&lt;/p&gt;

&lt;p&gt;With a bit more system prompting, you can get an LLM to try to generalize out from exceptions it fixes and generate unit test coverage for them. &lt;/p&gt;

&lt;p&gt;With a little bit more system prompting, you can probably get an LLM to (1) generate a Semgrep rule for the generalized bug it caught, (2) test the Semgrep rule with a positive/negative control, (3) save the rule, (4) test the whole codebase with Semgrep for that rule, and (5) fix anything it finds that way. &lt;/p&gt;
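&lt;p&gt;That five-step loop is mostly glue. Here’s a toy Python sketch of it, with the LLM and Semgrep stubbed out as plain callables (the real thing would prompt a model and shell out to &lt;code&gt;semgrep --config&lt;/code&gt;):&lt;/p&gt;

```python
# Sketch of the five-step loop. `generate_rule`, `scan`, and `fix` are
# stand-ins: generate_rule would prompt an LLM, scan would shell out to
# `semgrep --config rule.yaml path`, and fix would hand findings back
# to the LLM to patch.
def harden(bug, codebase, generate_rule, scan, fix, saved_rules):
    rule = generate_rule(bug)                 # (1) generalize the bug into a rule
    if not scan(rule, bug.positive_example):  # (2a) must match the known-bad code...
        return None
    if scan(rule, bug.negative_example):      # (2b) ...and not the known-good code
        return None
    saved_rules.append(rule)                  # (3) keep the rule around for CI
    findings = scan(rule, codebase)           # (4) sweep the whole codebase
    for finding in findings:                  # (5) fix whatever turns up
        fix(finding)
    return rule
```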

&lt;p&gt;That is a lot more interesting to me than tediously (and probably badly) trying to predict everything that will go wrong in my codebase a priori and Semgrepping for them. Which is to say: Semgrep — which I have always liked — is maybe a lot more interesting now? And tools like it?&lt;/p&gt;</content>
    </entry>
    <entry>
        <title>A Blog, If You Can Keep It</title>
        <link rel="alternate" href="https://fly.io/blog/a-blog-if-kept/"/>
        <id>https://fly.io/blog/a-blog-if-kept/</id>
        <published>2025-02-10T00:00:00+00:00</published>
        <updated>2025-02-19T19:05:52+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/a-blog-if-kept/assets/keep-blog.webp"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;A boldfaced lede like this was a sure sign you were reading a carefully choreographed EffortPost from our team at Fly.io. We’re going to do less of those. Or the same amount but more of a different kind of post. Either way: launch an app on Fly.io today!&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Over the last 5 years, we’ve done pretty well for ourselves writing content for Hacker News. And that’s &lt;a href='https://news.ycombinator.com/item?id=39373476' title=''&gt;mostly&lt;/a&gt; been good for us. We don’t do conventional marketing, we don’t have a sales team, the rest of social media is atomized over 5 different sites. Writing pieces that HN takes seriously has been our primary outreach tool.&lt;/p&gt;

&lt;p&gt;There’s a recipe (probably several, but I know this one works) for charting a post on HN:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Write an EffortPost, which is to say a dense technical piece over 2000 words long; within that rubric there’s a bunch of things that are catnip to HN, including runnable code, research surveys, and explainers. (There are also cat-repellants you learn to steer clear of.)
&lt;/li&gt;&lt;li&gt;Assiduously avoid promotion. You have to write for the audience. We get away with sporadically including a call-to-action block in our posts, but otherwise: the post should make sense even if an unrelated company posted it after you went out of business.
&lt;/li&gt;&lt;li&gt;Pick topics HN is interested in (it helps if all your topics are au courant for HN, and we’ve been &lt;a href='https://news.ycombinator.com/item?id=32250426' title=''&gt;very&lt;/a&gt; &lt;a href='https://news.ycombinator.com/item?id=32018066' title=''&gt;lucky&lt;/a&gt; in that regard).
&lt;/li&gt;&lt;li&gt;Like 5-6 more style-guide things that help incrementally. Probably 3 different teams writing for HN will have 3 different style guides with only like &amp;frac12; overlap. Ours, for instance, instructs writers to swear.
&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;I like this kind of writing. It’s not even a chore. But it’s become an impediment for us, for a couple reasons: the team serializes behind an “editorial” function here, which keeps us from publishing everything we want; worse, caring so much about our track record leaves us noodling on posts interminably (the poor &lt;a href='https://www.tigrisdata.com/' title=''&gt;Tigrises&lt;/a&gt; have been waiting for months for me to publish the piece I wrote about them and FoundationDB; take heart, this post today means that one is coming soon).&lt;/p&gt;

&lt;p&gt;But worst of all, I worried incessantly about us &lt;a href='https://gist.github.com/tqbf/e853764b562a2d72a91a6986ca3b77c0' title=''&gt;wearing out our welcome&lt;/a&gt;. To my mind, we’d have 1, maybe 2 bites at the HN apple in a given month, and we needed to make them count.&lt;/p&gt;

&lt;p&gt;That was dumb. I am dumb about a lot of things! I came around to understanding this after Kurt demanded I publish my blog post about BFAAS (Bash Functions As A Service), 500 lines of Go code that had generated 4500 words in my draft. It was only after I made the decision to stop gatekeeping this blog that I realized &lt;a href='https://simonwillison.net/' title=''&gt;Simon Willison&lt;/a&gt; has been disproving my “wearing out the welcome” theory, day in and day out, for years. He just writes stuff about LLMs when it interests him. I mean, it helps that he’s a better writer than we are. But he’s not wasting time choreographing things.&lt;/p&gt;

&lt;p&gt;Back in like 2009, &lt;a href='https://web.archive.org/web/20110806040300/http://chargen.matasano.com/chargen/2009/7/22/if-youre-typing-the-letters-a-e-s-into-your-code-youre-doing.html' title=''&gt;we had a blog&lt;/a&gt; at another company I was at. That blog drove a lot of business for us (and, on three occasions, almost killed me). It was not in the least bit optimized for HN. I like pretending to be a magazine feature writer, but I miss writing dashed-off pieces every day and clearing space for other people on the team to write as well.&lt;/p&gt;

&lt;p&gt;So this is all just a heads up: we’re trying something new. This is a very long and self-indulgent way to say “we’re going to write a normal blog like it’s 2008”, but that’s how broken my brain is after years of having my primary dopaminergic rewards come from how long Fly.io blog posts stay on the front page: I have to disclaim blogging before we start doing it, lest I fail to meet expectations.&lt;/p&gt;

&lt;p&gt;Like I said. I’m real dumb. But: looking forward to getting a lot more stuff out on the web for people to read this year!&lt;/p&gt;</content>
    </entry>
    <entry>
        <title>VSCode’s SSH Agent Is Bananas</title>
        <link rel="alternate" href="https://fly.io/blog/vscode-ssh-wtf/"/>
        <id>https://fly.io/blog/vscode-ssh-wtf/</id>
        <published>2025-02-07T00:00:00+00:00</published>
        <updated>2025-02-14T21:30:41+00:00</updated>
        <media:thumbnail url="https://fly.io/static/images/default-post-thumbnail.webp"/>
        <content type="html">&lt;p&gt;We’re interested in getting integrated into the flow VSCode uses to do remote editing over SSH, because everybody is using VSCode now, and, in particular, they’re using forks of VSCode that generate code with LLMs. &lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;”hallucination” is what we call it when LLMs get code wrong; “engineering” is what we call it when people do.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;LLM-generated code is &lt;a href='https://nicholas.carlini.com/writing/2024/how-i-use-ai.html' title=''&gt;useful in the general case&lt;/a&gt; if you know what you’re doing. But it’s ultra-useful if you can close the loop between the LLM and the execution environment (with an “Agent” setup). There’s lots to say about this, but for the moment: it’s a semi-effective antidote to hallucination: the LLM generates the code, the agent scaffolding runs the code, the code generates errors, the agent feeds it back to the LLM, the process iterates. &lt;/p&gt;
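&lt;p&gt;Stripped of everything interesting, that loop is tiny. Here’s a toy Python sketch, with the LLM stubbed out as a plain callable (a real agent would call a model API and sandbox the execution):&lt;/p&gt;

```python
import traceback

def closed_loop(model, task, max_iters=5):
    """Toy agent loop: the model proposes code, we run it, and any
    traceback gets fed back into the next attempt. `model` is a stub
    standing in for an LLM call."""
    code, error = None, None
    for _ in range(max_iters):
        code = model(task, previous=code, error=error)  # generate (or repair) code
        try:
            exec(code, {})   # run it in a fresh namespace
            return code      # clean run: done
        except Exception:
            error = traceback.format_exc()  # capture the failure for the next round
    return None  # gave up
```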

&lt;p&gt;So, obviously, the issue here is you don’t want this iterative development process happening on your development laptop, because LLMs have boundary issues, and they’ll iterate on your system configuration just as happily as on the Git project you happen to be working in. A thing you’d really like to be able to do: run a closed-loop agent-y (“agentic”? is that what we say now) configuration for an LLM, on a clean-slate Linux instance that spins up instantly and that can’t screw you over in any way. You get where we’re going with this.&lt;/p&gt;

&lt;p&gt;Anyways! I would like to register a concern.&lt;/p&gt;

&lt;p&gt;Emacs hosts the spiritual forebear of remote editing systems, a blob of hyper-useful Elisp called &lt;a href='https://www.gnu.org/software/tramp/' title=''&gt;“Tramp”&lt;/a&gt;. If you can hook Tramp up to any kind of interactive environment — usually, an SSH session — where it can run Bourne shell commands, it can extend Emacs to that environment.&lt;/p&gt;

&lt;p&gt;So, VSCode has a feature like Tramp. Which, neat, right? You’d think, take Tramp, maybe simplify it a bit, switch out Elisp for Typescript.&lt;/p&gt;

&lt;p&gt;You’d think wrong!&lt;/p&gt;

&lt;p&gt;Unlike Tramp, which lives off the land on the remote connection, VSCode mounts a full-scale invasion: it runs a Bash snippet stager that downloads an agent, including a binary installation of Node. &lt;/p&gt;

&lt;p&gt;I &lt;em&gt;think&lt;/em&gt; this is &lt;a href='https://github.com/microsoft/vscode/tree/c9e7e1b72f80b12ffc00e06153afcfedba9ec31f/src/vs/server/node' title=''&gt;the source code&lt;/a&gt;?&lt;/p&gt;

&lt;p&gt;The agent runs over port-forwarded SSH. It establishes a WebSockets connection back to your running VSCode front-end. The underlying protocol on that connection can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Wander around the filesystem
&lt;/li&gt;&lt;li&gt;Edit arbitrary files
&lt;/li&gt;&lt;li&gt;Launch its own shell PTY processes
&lt;/li&gt;&lt;li&gt;Persist itself
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;In security-world, there’s a name for tools that work this way. I won’t say it out loud, because that’s not fair to VSCode, but let’s just say the name is murid in nature.&lt;/p&gt;

&lt;p&gt;I would be a little nervous about letting people VSCode-remote-edit stuff on dev servers, and apoplectic if that happened during an incident on something in production. &lt;/p&gt;

&lt;p&gt;It turns out we don’t have to care about any of this to get a custom connection to a Fly Machine working in VSCode, so none of this matters in any kind of deep way, but: we’ve decided to just be a blog again, so: we had to learn this, and now you do too.&lt;/p&gt;</content>
    </entry>
    <entry>
        <title>AI GPU Clusters, From Your Laptop, With Livebook</title>
        <link rel="alternate" href="https://fly.io/blog/ai-gpu-clusters-from-your-laptop-livebook/"/>
        <id>https://fly.io/blog/ai-gpu-clusters-from-your-laptop-livebook/</id>
        <published>2024-09-24T00:00:00+00:00</published>
        <updated>2024-09-24T17:19:49+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/ai-gpu-clusters-from-your-laptop-livebook/assets/ai-gpu-livebook-thumb.webp"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;Livebook, FLAME, and the Nx stack: three Elixir components that are easy to describe, more powerful than they look, and intricately threaded into the Elixir ecosystem. A few weeks ago, Chris McCord (👋) and Chris Grainger showed them off at ElixirConf 2024. We thought the talk was worth a recap.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Let&amp;rsquo;s begin by introducing our cast of characters.&lt;/p&gt;

&lt;p&gt;&lt;a href='https://livebook.dev/' title=''&gt;Livebook&lt;/a&gt; is usually described as Elixir&amp;rsquo;s answer to &lt;a href='https://jupyter.org/' title=''&gt;Jupyter Notebooks&lt;/a&gt;. And that&amp;rsquo;s a good way to think about it. But Livebook takes full advantage of the Elixir platform, which makes it sneakily powerful. By linking up directly with Elixir app clusters, Livebook can switch easily between driving compute locally or on remote servers, and makes it easy to bring any kind of data into reproducible workflows.&lt;/p&gt;

&lt;p&gt;&lt;a href='https://fly.io/blog/rethinking-serverless-with-flame/' title=''&gt;FLAME&lt;/a&gt; is Elixir&amp;rsquo;s answer to serverless computing. By having the library manage a pool of executors for you, FLAME lets you treat your entire application as if it was elastic and scale-to-zero. You configure FLAME with some basic information about where to run code and how many instances it&amp;rsquo;s allowed to run with, and then mark off any arbitrary section of code with &lt;code&gt;Flame.call&lt;/code&gt;. The framework takes care of the rest. It&amp;rsquo;s the upside of serverless without committing yourself to blowing your app apart into tiny, intricately connected pieces.&lt;/p&gt;
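&lt;p&gt;To make that concrete, here’s roughly the shape the FLAME docs describe (the module is spelled &lt;code&gt;FLAME&lt;/code&gt; in the library itself; the names and numbers below are illustrative): a pool in your supervision tree, and a wrapped call site anywhere in your app.&lt;/p&gt;

```elixir
# In the application's supervision tree: a pool of throwaway runners.
{FLAME.Pool,
 name: MyApp.VideoRunner,
 min: 0,                      # scale to zero when idle
 max: 10,                     # never more than 10 runners at once
 max_concurrency: 5,          # up to 5 concurrent calls per runner
 idle_shutdown_after: 30_000} # runners shut down after 30s idle

# Anywhere in the app: run the expensive bit on the pool.
FLAME.call(MyApp.VideoRunner, fn ->
  do_expensive_encoding()
end)
```

The call site looks like ordinary synchronous code; the pool is the only place elasticity shows up.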

&lt;p&gt;The &lt;a href='https://github.com/elixir-nx' title=''&gt;Nx stack&lt;/a&gt; is how you do Elixir-native AI and ML. Nx gives you an Elixir-native notion of tensor computations with GPU backends. &lt;a href='https://github.com/elixir-nx/axon' title=''&gt;Axon&lt;/a&gt; builds a common interface for ML models on top of it. &lt;a href='https://github.com/elixir-nx/bumblebee' title=''&gt;Bumblebee&lt;/a&gt; makes those models available to any Elixir app that wants to download them, from just a couple lines of code.&lt;/p&gt;

&lt;p&gt;Here is a quick video showing how to transfer a local tensor to a remote GPU, using Livebook, FLAME, and Nx:&lt;/p&gt;
&lt;div class="youtube-container" data-exclude-render&gt;
  &lt;div class="youtube-video"&gt;
    &lt;iframe
      width="100%"
      height="100%"
      src="https://www.youtube.com/embed/5ImP3gpUSkQ"
      frameborder="0"
      allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
      allowfullscreen&gt;
    &lt;/iframe&gt;
  &lt;/div&gt;
&lt;/div&gt;


&lt;p&gt;Let&amp;rsquo;s dive into the &lt;a href='https://www.youtube.com/watch?v=4qoHPh0obv0' title=''&gt;keynote&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id='poking-a-hole-in-your-infrastructure' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#poking-a-hole-in-your-infrastructure' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Poking a hole in your infrastructure&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Any Livebook, including the one running on your laptop, can start a runtime running on a Fly Machine, in Fly.io&amp;rsquo;s public cloud. That Elixir machine will (by default) live in your default Fly.io organization, giving it networked access to all the other apps that might live there.&lt;/p&gt;

&lt;p&gt;This is an access control situation that mostly just does what you want it to do without asking. Unless you ask it to, Fly.io isn&amp;rsquo;t exposing anything to the Internet, or to other users of Fly.io. For instance: say we have a database we&amp;rsquo;re going to use to generate reports. It can hang out on our Fly organization, inside of a private network with no connectivity to the world. We can spin up a Livebook instance that can talk to it, without doing any network or infrastructure engineering to make that happen.&lt;/p&gt;

&lt;p&gt;But wait, there&amp;rsquo;s more. Because this is all Elixir, Livebook also allows you to connect to any running Erlang/Elixir application in your infrastructure to debug, introspect, and monitor it.&lt;/p&gt;

&lt;p&gt;Check out this clip of Chris McCord connecting &lt;a href='https://rtt.fly.dev/' title=''&gt;to an existing application&lt;/a&gt; during the keynote:&lt;/p&gt;
&lt;div class="youtube-container" data-exclude-render&gt;
  &lt;div class="youtube-video"&gt;
    &lt;iframe
      width="100%"
      height="100%"
      src="https://www.youtube.com/embed/4qoHPh0obv0?start=1106"
      frameborder="0"
      allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
      allowfullscreen&gt;
    &lt;/iframe&gt;
  &lt;/div&gt;
&lt;/div&gt;


&lt;p&gt;Running a snippet of code from a laptop on a remote server is a neat trick, but Livebook is doing something deeper than that. It&amp;rsquo;s taking advantage of Erlang/Elixir&amp;rsquo;s native facility with cluster computation and making it available to the notebook. As a result, when we do things like auto-completing, Livebook delivers results from modules defined on the remote node itself. 🤯&lt;/p&gt;
&lt;h2 id='elastic-scale-with-flame' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#elastic-scale-with-flame' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Elastic scale with FLAME&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;When we first introduced FLAME, the example we used was video encoding.&lt;/p&gt;

&lt;p&gt;Video encoding is complicated and slow enough that you&amp;rsquo;d normally make arrangements to run it remotely or in a background job queue, or as a triggerable Lambda function. The point of FLAME is to get rid of all those steps, and give them over to the framework instead. So: we wrote our &lt;code&gt;ffmpeg&lt;/code&gt; calls inline like normal code, as if they were going to complete in microseconds, and wrapped them in &lt;code&gt;Flame.call&lt;/code&gt; blocks. That was it, that was the demo.&lt;/p&gt;

&lt;p&gt;Here, we&amp;rsquo;re going to put a little AI spin on it.&lt;/p&gt;

&lt;p&gt;The first thing we&amp;rsquo;re doing here is driving FLAME pools from Livebook. Livebook will automatically synchronize your notebook dependencies as well as any module or code defined in your notebook across nodes. That means any code we write in our notebook can be dispatched transparently out to arbitrarily many compute nodes, without ceremony.&lt;/p&gt;

&lt;p&gt;Now let&amp;rsquo;s add some AI flair. We take an object store bucket full of video files. We use &lt;code&gt;ffmpeg&lt;/code&gt; to extract stills from the video at different moments. Then: we send them to &lt;a href='https://www.llama.com/' title=''&gt;Llama&lt;/a&gt;, running on &lt;a href='https://fly.io/gpu' title=''&gt;GPU Fly Machines&lt;/a&gt; (still locked to our organization), to get descriptions of the stills.&lt;/p&gt;

&lt;p&gt;All those stills and descriptions get streamed back to our notebook, in real time:&lt;/p&gt;
&lt;div class="youtube-container" data-exclude-render&gt;
  &lt;div class="youtube-video"&gt;
    &lt;iframe
      width="100%"
      height="100%"
      src="https://www.youtube.com/embed/4qoHPh0obv0?start=1692"
      frameborder="0"
      allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
      allowfullscreen&gt;
    &lt;/iframe&gt;
  &lt;/div&gt;
&lt;/div&gt;


&lt;p&gt;At the end, the descriptions are sent to &lt;a href='https://mistral.ai/' title=''&gt;Mistral&lt;/a&gt;, which builds a summary.&lt;/p&gt;

&lt;p&gt;Thanks to FLAME, we get explicit control over the minimum and maximum number of nodes you want running at once, as well as their concurrency settings. As nodes finish processing each video, new videos are automatically sent to them, until the whole bucket has been traversed. Each node will automatically shut down after an idle timeout, and the whole cluster terminates if you disconnect the Livebook runtime.&lt;/p&gt;

&lt;p&gt;Just like your app code, FLAME lets you take your notebook code designed to run locally, change almost nothing, and elastically execute it across ephemeral infrastructure.&lt;/p&gt;
&lt;h2 id='64-gpus-hyperparameter-tuning-on-a-laptop' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#64-gpus-hyperparameter-tuning-on-a-laptop' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;64-GPUs hyperparameter tuning on a laptop&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Next, Chris Grainger, CTO of &lt;a href='https://amplified.ai/' title=''&gt;Amplified&lt;/a&gt;, takes the stage.&lt;/p&gt;

&lt;p&gt;For work at Amplified, Chris wants to analyze a gigantic archive of patents, on behalf of a client doing edible cannabinoid work. To do that, he uses a BERT model (BERT, from Google, is one of the OG &amp;ldquo;transformer&amp;rdquo; models, optimized for text comprehension).&lt;/p&gt;

&lt;p&gt;To make the BERT model effective for this task, he&amp;rsquo;s going to do a hyperparameter training run.&lt;/p&gt;

&lt;p&gt;This is a much more complicated AI task than the Llama work we just showed off. Chris is going to generate a cluster of 64 GPU Fly Machines, each running an &lt;a href='https://fly.io/blog/cutting-prices-for-l40s-gpus-in-half/' title=''&gt;L40s GPU&lt;/a&gt;. On each of these nodes, he needs to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;set up its environment (including native dependencies and GPU bindings)
&lt;/li&gt;&lt;li&gt;load the training data
&lt;/li&gt;&lt;li&gt;compile a different version of BERT with different parameters, optimizers, etc.
&lt;/li&gt;&lt;li&gt;start the fine-tuning
&lt;/li&gt;&lt;li&gt;stream its results in real time to its assigned chart
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;Here&amp;rsquo;s the clip. You&amp;rsquo;ll see the results stream in, in real time, directly back to his Livebook. We&amp;rsquo;ll wait, because it won&amp;rsquo;t take long to watch:&lt;/p&gt;
&lt;div class="youtube-container" data-exclude-render&gt;
  &lt;div class="youtube-video"&gt;
    &lt;iframe
      width="100%"
      height="100%"
      src="https://www.youtube.com/embed/4qoHPh0obv0?start=3344"
      frameborder="0"
      allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
      allowfullscreen&gt;
    &lt;/iframe&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;h2 id='this-is-just-the-beginning' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#this-is-just-the-beginning' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;This is just the beginning&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;The suggestion of mixing Livebook and FLAME to elastically scale notebook execution was originally proposed by Chris Grainger during ElixirConf EU. During the next four months, Jonatan Kłosko, Chris McCord, and José Valim worked part-time on making it a reality in time for ElixirConf US. Our ability to deliver such a rich combination of features in such a short period of time is a testament to the capabilities of the Erlang Virtual Machine, which Elixir and Livebook run on. Other features, such as &lt;a href='https://github.com/elixir-explorer/explorer/issues/932' title=''&gt;remote dataframes and distributed GC&lt;/a&gt;, were implemented in a weekend. Bringing the same functionality to other ecosystems would take several additional months, sometimes accompanied by millions in funding, and oftentimes as part of a closed-source product.&lt;/p&gt;

&lt;p&gt;Furthermore, since we announced this feature, &lt;a href='https://github.com/mruoss' title=''&gt;Michael Ruoss&lt;/a&gt; stepped in and brought the same functionality to Kubernetes. From Livebook v0.14.1, you can start Livebook runtimes inside a Kubernetes cluster and also use FLAME to elastically scale them. Expect more features and news in this space!&lt;/p&gt;

&lt;p&gt;Finally, Fly&amp;rsquo;s infrastructure played a key role in making it possible to start a cluster of GPUs in seconds rather than minutes, and all it requires is a Docker image. We&amp;rsquo;re looking forward to seeing how other technologies and notebook platforms can leverage Fly to also elevate their developer experiences.&lt;/p&gt;
&lt;figure class="post-cta"&gt;
  &lt;figcaption&gt;
    &lt;h1&gt;Launch a GPU app in seconds&lt;/h1&gt;
    &lt;p&gt;Run your own LLMs or use Livebook for elastic GPU workflows&amp;nbsp;✨&lt;/p&gt;
      &lt;a class="btn btn-lg" href="/gpu"&gt;
        Go! &amp;nbsp;&lt;span class='opacity-50'&gt;→&lt;/span&gt;
      &lt;/a&gt;
  &lt;/figcaption&gt;
  &lt;div class="image-container"&gt;
    &lt;img src="/static/images/cta-turtle.webp" srcset="/static/images/[email protected] 2x" alt=""&gt;
  &lt;/div&gt;
&lt;/figure&gt;</content>
    </entry>
    <entry>
        <title>Accident Forgiveness</title>
        <link rel="alternate" href="https://fly.io/blog/accident-forgiveness/"/>
        <id>https://fly.io/blog/accident-forgiveness/</id>
        <published>2024-08-21T00:00:00+00:00</published>
        <updated>2024-09-11T00:04:08+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/accident-forgiveness/assets/money-for-mistakes-blog-thumb.webp"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;We’re Fly.io, a new public cloud with simple, developer-friendly ergonomics. &lt;a href="https://fly.io/speedrun" title=""&gt;Try it out; you’ll be deployed in just minutes&lt;/a&gt;, and, as you’re about to read, with less financial risk.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Public cloud billing is terrifying.&lt;/p&gt;

&lt;p&gt;The premise of a public cloud &amp;mdash; what sets it apart from a hosting provider &amp;mdash; is 8,760 hours/year of on-tap deployable compute, storage, and networking. Cloud resources are &amp;ldquo;elastic&amp;rdquo;: they&amp;rsquo;re acquired and released as needed; in the &amp;ldquo;cloud-iest&amp;rdquo; apps, without human intervention. Public cloud resources behave like utilities, and that&amp;rsquo;s how they&amp;rsquo;re priced.&lt;/p&gt;

&lt;p&gt;You probably can&amp;rsquo;t tell me how much electricity your home is using right now, and may only come within tens of dollars of accurately predicting your water bill. But neither of those bills is all that scary, because you assume there&amp;rsquo;s a limit to how much you could run them up in a single billing interval.&lt;/p&gt;

&lt;p&gt;That&amp;rsquo;s not true of public clouds. There are only so many ways to &amp;ldquo;spend&amp;rdquo; water at your home, but there are indeterminably many ways to land on a code path that grabs another VM, or to miskey a configuration, or to leak long-running CI/CD environments every time a PR gets merged. Pick a practitioner at random, and I bet they&amp;rsquo;ve read a story within the last couple months about someone running up a galactic-scale bill at some provider or other.&lt;/p&gt;
&lt;h2 id='implied-accident-forgiveness' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#implied-accident-forgiveness' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Implied Accident Forgiveness&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;For people who don&amp;rsquo;t do a lot of cloud work, what all this means is that every PR push sets off a little alarm in the back of their heads: &amp;ldquo;you may have just incurred $200,000 of costs!&amp;rdquo;. The alarm is quickly silenced, though it&amp;rsquo;s still subtly extracting a cortisol penalty. But by deadening the nerves that sense the danger of unexpected charges, those people are nudged closer to themselves being the next story on Twitter about an accidental $200,000 bill.&lt;/p&gt;

&lt;p&gt;The saving grace here, which you&amp;rsquo;ll learn if you ever become that $200,000 story, is that nobody pays those bills.&lt;/p&gt;

&lt;p&gt;See, what cloud-savvy people know already is that providers have billing support teams, which spend a big chunk of their time conceding disputed bills. If you do something luridly stupid and rack up costs, AWS and GCP will probably cut you a break. We will too. Everyone does.&lt;/p&gt;

&lt;p&gt;If you didn&amp;rsquo;t already know this, you&amp;rsquo;re welcome; I&amp;rsquo;ve made your life a little better, even if you don&amp;rsquo;t run things on Fly.io.&lt;/p&gt;

&lt;p&gt;But as soothing as it is to know you can get a break from cloud providers, the billing situation here is still a long ways away from &amp;ldquo;good&amp;rdquo;. If you accidentally add a zero to a scale count and don&amp;rsquo;t notice for several weeks, AWS or GCP will probably cut you a break. But they won&amp;rsquo;t &lt;em&gt;definitely&lt;/em&gt; do it, and even though your odds are good, you&amp;rsquo;re still finding out at email- and phone-tag scale speeds. That&amp;rsquo;s not fun!&lt;/p&gt;
&lt;h2 id='explicit-accident-forgiveness' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#explicit-accident-forgiveness' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Explicit Accident Forgiveness&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Charging you for stuff you didn&amp;rsquo;t want is bad business.&lt;/p&gt;

&lt;p&gt;Good business, we think, means making you so comfortable with your cloud you try new stuff. You, and everyone else on your team. Without a chaperone from the finance department.&lt;/p&gt;

&lt;p&gt;So we&amp;rsquo;re going to do the work to make this official. If you&amp;rsquo;re a customer of ours, we&amp;rsquo;re going to charge you in exacting detail for every resource you intentionally use of ours, but if something blows up and you get an unexpected bill, we&amp;rsquo;re going to let you off the hook.&lt;/p&gt;
&lt;h2 id='not-so-fast' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#not-so-fast' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Not So Fast&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;This is a Project, with a capital P. While we&amp;rsquo;re kind of kicking ourselves for not starting it earlier, there are reasons we couldn&amp;rsquo;t do it back in 2020.&lt;/p&gt;

&lt;p&gt;The Fully Automated Accident-Forgiving Billing System of the Future (which we are in fact building and may even one day ship) will give you a line-item veto on your invoice. We are a long ways away. The biggest reason is fraud.&lt;/p&gt;

&lt;p&gt;Sit back, close your eyes, and try to think about everything public clouds do to make your life harder. Chances are, most of those things are responses to fraud. Cloud platforms attract fraudsters like ants to an upturned ice cream cone. Thanks to the modern science of cryptography, fraudsters have had a 15 year head start on turning metered compute into picodollar-granular near-money assets.&lt;/p&gt;

&lt;p&gt;Since there&amp;rsquo;s no bouncer at the door checking IDs here, an open-ended and automated commitment to accident forgiveness is, with iron certainty, going to be used overwhelmingly in order to trick us into &amp;ldquo;forgiving&amp;rdquo; cryptocurrency miners. We&amp;rsquo;re cloud platform engineers. They&amp;rsquo;re our primary pathogen.&lt;/p&gt;

&lt;p&gt;So, we&amp;rsquo;re going to roll this out incrementally.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;&lt;strong class="font-semibold text-navy-950"&gt;Why not billing alerts?&lt;/strong&gt; We’ll get there, but here too there are reasons we haven’t yet: (1) meaningful billing alerts were incredibly difficult to do with our previous billing system, and building the new system and migrating our customers to it has been a huge lift, a nightmare from which we are only now waking (the billing system’s official name); and (2) we’re wary about alerts being a product design cop-out; if we can alert on something, why aren’t we fixing it?&lt;/p&gt;
&lt;/div&gt;&lt;h2 id='accident-forgiveness-v0-84beta' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#accident-forgiveness-v0-84beta' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Accident Forgiveness v0.84beta&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;All the same subtextual, implied reassurances that every cloud provider offers remain in place at Fly.io. You are strictly better off after this announcement, we promise.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;I added the “almost” right before publishing, because I’m chicken.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Now: for customers that have a support contract with us, at any level, there&amp;rsquo;s something new: I&amp;rsquo;m saying the quiet part loud. The next time you see a bill with an unexpected charge on it, we&amp;rsquo;ll refund that charge, (almost) no questions asked.&lt;/p&gt;

&lt;p&gt;That policy is so simple it feels anticlimactic to write. So, some additional color commentary:&lt;/p&gt;

&lt;p&gt;We&amp;rsquo;re not advertising a limit to the number of times you can do this. If you&amp;rsquo;re a serious customer of ours, I promise that you cannot remotely fathom the fullness of our fellow-feeling. You&amp;rsquo;re not annoying us by getting us to refund unexpected charges. If you are growing a project on Fly.io, we will bend over backwards to keep you growing.&lt;/p&gt;

&lt;p&gt;How far can we take this? How simple can we keep this policy? We&amp;rsquo;re going to find out together.&lt;/p&gt;

&lt;p&gt;To begin with, and in the spirit of &amp;ldquo;doing things that won&amp;rsquo;t scale&amp;rdquo;, when we forgive a bill, what&amp;rsquo;s going to happen next is this: I&amp;rsquo;m going to set an irritating personal reminder for Kurt to look into what happened, now and then the day before your next bill, so we can see what&amp;rsquo;s going wrong. He&amp;rsquo;s going to hate that, which is the point: our best feature work is driven by Kurt-hate.&lt;/p&gt;

&lt;p&gt;Obviously, if you&amp;rsquo;re rubbing your hands together excitedly over the opportunity this policy presents, then, well, not so much with the fellow-feeling. We reserve the right to cut you off.&lt;/p&gt;
&lt;figure class="post-cta"&gt;
  &lt;figcaption&gt;
    &lt;h1&gt;Support For Developers, By Developers&lt;/h1&gt;
    &lt;p&gt;Explicit Accident Forgiveness is just one thing we like about Support at Fly.io.&lt;/p&gt;
      &lt;a class="btn btn-lg" href="https://fly.io/accident-forgiveness"&gt;
        Go find out! &amp;nbsp;&lt;span class='opacity-50'&gt;→&lt;/span&gt;
      &lt;/a&gt;
  &lt;/figcaption&gt;
  &lt;div class="image-container"&gt;
    &lt;img src="/static/images/cta-cat.webp" srcset="/static/images/[email protected] 2x" alt=""&gt;
  &lt;/div&gt;
&lt;/figure&gt;

&lt;h2 id='whats-next-accident-protection' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#whats-next-accident-protection' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;What&amp;rsquo;s Next: Accident Protection&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;We think this is a pretty good first step. But that&amp;rsquo;s all it is.&lt;/p&gt;

&lt;p&gt;We can do better than offering you easy refunds for mistaken deployments and botched CI/CD jobs. What&amp;rsquo;s better than getting a refund is never incurring the charge to begin with, and that&amp;rsquo;s the next step we&amp;rsquo;re working on.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;More to come on that billing system.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;We built a new billing system so that we can do things like that. For instance: we&amp;rsquo;re in a position to flag sudden spikes in your month-over-month bills, and to catch weird-looking deployments before we bill for them.&lt;/p&gt;

&lt;p&gt;Another thing we rebuilt billing for is &lt;a href='https://community.fly.io/t/reservation-blocks-40-discount-on-machines-when-youre-ready-to-commit/20858' title=''&gt;reserved pricing&lt;/a&gt;. Already today you can get a steep discount from us by reserving blocks of compute in advance. The trick to taking advantage of reserved pricing is confidently predicting a floor to your usage. For a lot of people, that means fighting feelings of loss aversion (nobody wants to get gym priced!). So another thing we can do in this same vein: catch opportunities to move customers to reserved blocks, and offer backdated reservations. We&amp;rsquo;ll figure this out too.&lt;/p&gt;

&lt;p&gt;Someday, when we&amp;rsquo;re in a monopoly position, our founders have all been replaced by ruthless MBAs, and Kurt has retired to farm coffee beans in lower Montana, we may stop doing this stuff. But until that day this is the right choice for our business.&lt;/p&gt;

&lt;p&gt;Meanwhile: like every public cloud, we provision our own hardware, and we have excess capacity. Your messed-up CI/CD jobs didn&amp;rsquo;t really cost us anything, so if you didn&amp;rsquo;t really want them, they shouldn&amp;rsquo;t cost you anything either. Take us up on this! We love talking to you.&lt;/p&gt;</content>
    </entry>
    <entry>
        <title>We're Cutting L40S Prices In Half</title>
        <link rel="alternate" href="https://fly.io/blog/cutting-prices-for-l40s-gpus-in-half/"/>
        <id>https://fly.io/blog/cutting-prices-for-l40s-gpus-in-half/</id>
        <published>2024-08-15T00:00:00+00:00</published>
        <updated>2024-09-11T00:04:08+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/cutting-prices-for-l40s-gpus-in-half/assets/gpu-ga-thumb.webp"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;We’re Fly.io, a new public cloud with simple, developer-friendly ergonomics. And as of today, cheaper GPUs. &lt;a href="https://fly.io/speedrun" title=""&gt;Try it out; you’ll be deployed in just minutes&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;We just lowered the prices on NVIDIA L40S GPUs to $1.25 per hour. Why? Because our feet are cold and we burn processor cycles for heat. But also other reasons.&lt;/p&gt;

&lt;p&gt;Let&amp;rsquo;s back up.&lt;/p&gt;

&lt;p&gt;We offer 4 different NVIDIA GPU models; in increasing order of performance, they&amp;rsquo;re the A10, the L40S, the 40G PCI A100, and the 80G SXM A100.  Guess which one is most popular.&lt;/p&gt;

&lt;p&gt;We guessed wrong, and spent a lot of time working out how to maximize the amount of GPU power we could deliver to a single Fly Machine. Users surprised us. By a wide margin, the most popular GPU in our inventory is the A10.&lt;/p&gt;

&lt;p&gt;The A10 is an older generation of NVIDIA GPU with fewer, slower cores and less memory. It&amp;rsquo;s the least capable GPU we offer. But that doesn&amp;rsquo;t matter, because it&amp;rsquo;s capable enough. It&amp;rsquo;s solid for random inference tasks, and handles mid-sized generative AI stuff like Mistral Nemo or Stable Diffusion. For those workloads, there&amp;rsquo;s not that much benefit in getting a beefier GPU.&lt;/p&gt;

&lt;p&gt;As a result, we can&amp;rsquo;t get new A10s in fast enough for our users.&lt;/p&gt;

&lt;p&gt;If there&amp;rsquo;s one thing we&amp;rsquo;ve learned by talking to our customers over the last 4 years, it&amp;rsquo;s that y&amp;#39;all love a peek behind the curtain. So we&amp;rsquo;re going to let you in on a little secret about how a hardware provider like Fly.io formulates GPU strategy: none of us know what the hell we&amp;rsquo;re doing.&lt;/p&gt;

&lt;p&gt;If you had asked us in 2023 what the biggest GPU problem we could solve was, we&amp;rsquo;d have said &amp;ldquo;selling fractional A100 slices&amp;rdquo;. We burned a whole quarter trying to get MIG, or at least vGPUs, working through IOMMU PCI passthrough on Fly Machines, in a project so cursed that Thomas has forsworn ever programming again. Then we went to market selling whole A100s, and for several more months it looked like the biggest problem we needed to solve was finding a secure way to expose NVLink-ganged A100 clusters to VMs so users could run training. Then H100s; can we find H100s anywhere? Maybe in a black market in Shenzhen?&lt;/p&gt;

&lt;p&gt;And here we are, a year later, looking at the data, and the least sexy, least interesting GPU part in the catalog is where all the action is.&lt;/p&gt;

&lt;p&gt;With actual customer data to back up the hypothesis, here&amp;rsquo;s what we think is happening today:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Most users who want to plug GPU-accelerated AI workloads into fast networks are doing inference, not training. 
&lt;/li&gt;&lt;li&gt;The hyperscaler public clouds strangle these customers, first with GPU instance surcharges, and then with egress fees for object storage data when those customers try to outsource the GPU stuff to GPU providers.
&lt;/li&gt;&lt;li&gt;If you&amp;rsquo;re trying to do something GPU-accelerated in response to an HTTP request, the right combination of GPU, instance RAM, fast object storage for datasets and model parameters, and networking is much more important than getting your hands on an H100.
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;This is a thing we didn&amp;rsquo;t see coming, but should have: training workloads tend to look more like batch jobs, and inference tends to look more like transactions. Batch training jobs aren&amp;rsquo;t that sensitive to networking or even reliability. Live inference jobs responding to end-user HTTP requests are. So, given our pricing, of course the A10s are a sweet spot.&lt;/p&gt;

&lt;p&gt;The next step up in our lineup after the A10 is the L40S. The L40S is a nice piece of kit. We&amp;rsquo;re going to take a beat here and sell you on the L40S, because it&amp;rsquo;s kind of awesome.&lt;/p&gt;

&lt;p&gt;The L40S is an AI-optimized version of the L40, which is the data center version of the GeForce RTX 4090, resembling two 4090s stapled together.&lt;/p&gt;

&lt;p&gt;If you&amp;rsquo;re not a GPU hardware person, the RTX 4090 is a gaming GPU, the kind you&amp;rsquo;d play ray-traced Witcher 3 on. NVIDIA&amp;rsquo;s high-end gaming GPUs are actually reasonably good at AI workloads! But they suck in a data center rack: they chug power, they&amp;rsquo;re hard to cool, and they&amp;rsquo;re less dense. Also, NVIDIA can&amp;rsquo;t charge as much for them.&lt;/p&gt;

&lt;p&gt;Hence the L40: (much) more memory, less energy consumption, designed for a rack, not a tower case. Marked up for &amp;ldquo;enterprise&amp;rdquo;.&lt;/p&gt;

&lt;p&gt;NVIDIA positioned the L40 as a kind of &amp;ldquo;graphics&amp;rdquo; AI GPU. Unlike the super high-end cards like the A100/H100, the L40 keeps all the rendering hardware, so it&amp;rsquo;s good for 3D graphics and video processing. Which is sort of what you&amp;rsquo;d expect from a &amp;ldquo;professionalized&amp;rdquo; GeForce card.&lt;/p&gt;

&lt;p&gt;A funny thing happened in the middle of 2023, though: the market for ultra-high-end NVIDIA cards went absolutely batshit. The huge cards you&amp;rsquo;d gang up for training jobs got impossible to find, and NVIDIA became one of the most valuable companies in the world. Serious shops started working out plans to acquire groups of L40-type cards to work around the problem, whether or not they had graphics workloads.&lt;/p&gt;

&lt;p&gt;The only company in this space that does know what they&amp;rsquo;re doing is NVIDIA. Nobody has written a highly-ranked Reddit post about GPU workloads without NVIDIA noticing and creating a new SKU. So they launched the L40S, which is an L40 with AI workload compute performance comparable to that of the A100 (without us getting into the details of FP32 vs. FP16 models).&lt;/p&gt;

&lt;p&gt;Long story short, the L40S is an A100-performer that we can price for A10 customers; the Volkswagen GTI of our lineup. We&amp;rsquo;re going to see if we can make that happen.&lt;/p&gt;

&lt;p&gt;We think the combination of just-right-sized inference GPUs and Tigris object storage is pretty killer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;model parameters, data sets, and compute are all close together
&lt;/li&gt;&lt;li&gt;everything plugged into an Anycast network that&amp;rsquo;s fast everywhere in the world
&lt;/li&gt;&lt;li&gt;on VM instances that have enough memory to actually run real frameworks on
&lt;/li&gt;&lt;li&gt;priced like we actually want you to use it.
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;You should use L40S cards without thinking hard about it. So we&amp;rsquo;re making it official. You won&amp;rsquo;t pay us a dime extra to use one instead of an A10. Have at it! Revolutionize the industry. For $1.25 an hour.&lt;/p&gt;

&lt;p&gt;Here are things you can do with an L40S on Fly.io today:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You can run Llama 3.1 70B — a big Llama — for LLM jobs.
&lt;/li&gt;&lt;li&gt;You can run Flux from Black Forest Labs for genAI images.
&lt;/li&gt;&lt;li&gt;You can run Whisper for automated speech recognition.
&lt;/li&gt;&lt;li&gt;You can do whole-genome alignment with SegAlign (Thomas&amp;rsquo; biochemist kid who has been snatching free GPU hours for his lab gave us this one, and we&amp;rsquo;re taking his word for it).
&lt;/li&gt;&lt;li&gt;You can run DOOM Eternal, building the Stadia that Google couldn&amp;rsquo;t pull off, because the L40S hasn&amp;rsquo;t forgotten that it&amp;rsquo;s a graphics GPU. 
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;It&amp;rsquo;s going to get chilly in Chicago in a month or so. Go light some cycles on fire! &lt;/p&gt;</content>
    </entry>
    <entry>
        <title>Making Machines Move</title>
        <link rel="alternate" href="https://fly.io/blog/machine-migrations/"/>
        <id>https://fly.io/blog/machine-migrations/</id>
        <published>2024-07-30T00:00:00+00:00</published>
        <updated>2024-08-09T12:14:08+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/machine-migrations/assets/migrations-thumb.webp"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;We’re Fly.io, a global public cloud with simple, developer-friendly ergonomics. If you’ve got a working Docker image, we’ll transmogrify it into a Fly Machine: a VM running on our hardware anywhere in the world. &lt;a href="https://fly.io/speedrun" title=""&gt;Try it out; you’ll be deployed in just minutes&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;At the heart of our platform is a systems design tradeoff about durable storage for applications.  When we added storage three years ago, to support stateful apps, we built it on attached NVMe drives. A benefit: a Fly App accessing a file on a Fly Volume is never more than a bus hop away from the data. A cost: a Fly App with an attached Volume is anchored to a particular worker physical.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;&lt;code&gt;bird&lt;/code&gt;: a BGP4 route server.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Before offering attached storage, our on-call runbook was almost as simple as “de-bird that edge server”, “tell &lt;a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''&gt;Nomad&lt;/a&gt; to drain that worker”, and “go back to sleep”. NVMe cost us that drain operation, which terribly complicated the lives of our infra team. We’ve spent the last year getting “drain” back. It’s one of the biggest engineering lifts we&amp;rsquo;ve made, and if you didn’t notice, we lifted it cleanly.&lt;/p&gt;
&lt;h3 id='the-goalposts' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-goalposts' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;The Goalposts&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;With stateless apps, draining a worker is easy. For each app instance running on the victim server, start a new instance elsewhere. Confirm it&amp;rsquo;s healthy, then kill the old one. Rinse, repeat. At our 2020 scale, we could drain a fully loaded worker in just a handful of minutes.&lt;/p&gt;
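&lt;p&gt;That stateless drain loop is simple enough to sketch. Here&amp;rsquo;s a toy model in C; every name, including the health-check hook, is illustrative, not our orchestrator&amp;rsquo;s actual API:&lt;/p&gt;

```c
#include <assert.h>

/* Toy model of the stateless drain described above. All names are
 * illustrative, not the real orchestrator's API. */
typedef struct {
    int host;  /* which worker this instance runs on */
    int alive;
} instance;

/* Health-check hook: nonzero means the replacement looks good. */
typedef int (*health_fn)(const instance *);

/* A trivially-passing health check, for demonstration. */
static int always_ok(const instance *i) { (void)i; return 1; }

/* Start a replacement elsewhere, confirm it's healthy, then kill the
 * old instance. If the replacement never gets healthy, keep the old. */
static int drain_one(instance *old, instance *fresh, int new_host,
                     health_fn ok) {
    fresh->host = new_host;
    fresh->alive = 1;
    if (!ok(fresh)) {
        fresh->alive = 0; /* tear the failed replacement down */
        return -1;        /* the old instance keeps running */
    }
    old->alive = 0;       /* rinse, repeat for the next instance */
    return 0;
}
```

&lt;p&gt;Run that over every instance on the victim worker and the server is drained; the rest of this post is about why this loop stops working the moment a volume is attached.&lt;/p&gt;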

&lt;p&gt;You can see why this process won&amp;rsquo;t work for apps with attached volumes. Sure, create a new volume elsewhere on the fleet, and boot up a new Fly Machine attached to it. But the new volume is empty. The data&amp;rsquo;s still stuck on the original worker. We asked, and customers were not OK with this kind of migration.&lt;/p&gt;

&lt;p&gt;Of course, we back Volumes up (at an interval) with snapshots to off-network storage. But for “drain”, restoring backups isn&amp;rsquo;t nearly good enough. No matter the backup interval, a &amp;ldquo;restore from backup&amp;rdquo; migration will lose data, and a &amp;ldquo;backup and restore&amp;rdquo; migration incurs untenable downtime.&lt;/p&gt;

&lt;p&gt;The next thought you have is, “OK, copy the volume over”. And, yes, of course you have to do that. But you can’t just &lt;code&gt;copy&lt;/code&gt;, &lt;code&gt;boot&lt;/code&gt;, and then &lt;code&gt;kill&lt;/code&gt; the old Fly Machine. Because the original Fly Machine is still alive and writing, you have to &lt;code&gt;kill&lt;/code&gt; first, then &lt;code&gt;copy&lt;/code&gt;, then &lt;code&gt;boot&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Fly Volumes can get pretty big. Even copying to a rack buddy physical server, you&amp;rsquo;ll hit a point where draining incurs minutes of interruption, especially if you’re moving lots of volumes simultaneously. &lt;code&gt;Kill&lt;/code&gt;, &lt;code&gt;copy&lt;/code&gt;, &lt;code&gt;boot&lt;/code&gt; is too slow.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;There’s a world where even 15 minutes of interruption is tolerable. It’s the world where you run more than one instance of your application to begin with, so prolonged interruption of a single Fly Machine isn’t visible to your users. Do this! But we have to live in the same world as our customers, many of whom don’t run in high-availability configurations.&lt;/p&gt;
&lt;/div&gt;&lt;h3 id='behold-the-clone-o-mat' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#behold-the-clone-o-mat' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Behold The Clone-O-Mat&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;Copy&lt;/code&gt;, &lt;code&gt;boot&lt;/code&gt;, &lt;code&gt;kill&lt;/code&gt; loses data. &lt;code&gt;Kill&lt;/code&gt;, &lt;code&gt;copy&lt;/code&gt;, &lt;code&gt;boot&lt;/code&gt; takes too long. What we needed is a new operation: &lt;code&gt;clone&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Clone&lt;/code&gt; is a lazier, asynchronous &lt;code&gt;copy&lt;/code&gt;. It creates a new volume elsewhere on our fleet, just like &lt;code&gt;copy&lt;/code&gt; would. But instead of blocking, waiting to transfer every byte from the original volume, &lt;code&gt;clone&lt;/code&gt; returns immediately, with a transfer running in the background.&lt;/p&gt;

&lt;p&gt;A new Fly Machine can be booted with that cloned volume attached. Its blocks are mostly empty. But that’s OK: when the new Fly Machine tries to read from it, the block storage system works out whether the block has been transferred. If it hasn’t, it’s fetched over the network from the original volume; this is called &amp;ldquo;hydration&amp;rdquo;. Writes are even easier, and don’t hit the network at all.&lt;/p&gt;
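&lt;p&gt;A userland caricature of that read/write logic, in C (the per-block flags and all the names here are stand-ins for &lt;code&gt;dm-clone&lt;/code&gt;&amp;rsquo;s real on-disk hydration metadata, not its implementation):&lt;/p&gt;

```c
#include <assert.h>

#define NBLOCKS 8

/* Stand-in for dm-clone's bookkeeping: one flag per block records
 * whether that block has been copied to the new volume yet. */
typedef struct {
    unsigned char hydrated[NBLOCKS]; /* 1 = block already local */
    char local[NBLOCKS];             /* the cloned volume */
    const char *origin;              /* original volume, a network hop away */
} clone_dev;

/* Reads consult the flags: a miss is fetched from the origin and kept
 * locally ("hydration"); a hit is served from local storage. */
static char clone_read(clone_dev *d, int block) {
    if (!d->hydrated[block]) {
        d->local[block] = d->origin[block]; /* the network fetch */
        d->hydrated[block] = 1;
    }
    return d->local[block];
}

/* Writes never touch the network: the new data supersedes the
 * origin's, so the block is written locally and marked hydrated. */
static void clone_write(clone_dev *d, int block, char v) {
    d->local[block] = v;
    d->hydrated[block] = 1;
}
```

&lt;p&gt;The real target also hydrates blocks in the background, so the clone eventually stops depending on the origin even for blocks nobody has read.&lt;/p&gt;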

&lt;p&gt;&lt;code&gt;Kill&lt;/code&gt;, &lt;code&gt;copy&lt;/code&gt;, &lt;code&gt;boot&lt;/code&gt; is slow. But &lt;code&gt;kill&lt;/code&gt;, &lt;code&gt;clone&lt;/code&gt;, &lt;code&gt;boot&lt;/code&gt; is fast; it can be made asymptotically as fast as stateless migration.&lt;/p&gt;

&lt;p&gt;There are three big moving pieces to this design.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;First, we have to rig up our OS storage system to make this &lt;code&gt;clone&lt;/code&gt; operation work.
&lt;/li&gt;&lt;li&gt;Then, to read blocks over the network, we need a network protocol. (Spoiler: iSCSI, though we tried other stuff first.)
&lt;/li&gt;&lt;li&gt;Finally, the gnarliest of those pieces is our orchestration logic: what’s running where, what state is it in, and whether it’s plugged in correctly.
&lt;/li&gt;&lt;/ol&gt;
&lt;h3 id='block-level-clone' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#block-level-clone' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Block-Level Clone&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;The Linux feature we need to make this work already exists; &lt;a href='https://docs.kernel.org/admin-guide/device-mapper/dm-clone.html' title=''&gt;it’s called &lt;code&gt;dm-clone&lt;/code&gt;&lt;/a&gt;. Given an existing, readable storage device, &lt;code&gt;dm-clone&lt;/code&gt; gives us a new device, of identical size, where reads of uninitialized blocks will pull from the original. It sounds terribly complicated, but it’s actually one of the simpler kernel lego bricks. Let&amp;rsquo;s demystify it.&lt;/p&gt;

&lt;p&gt;As far as Unix is concerned, random-access storage devices, be they spinning rust or NVMe drives, are all instances of the common class “block device”. A block device is addressed in fixed-size (say, 4KiB) chunks, and &lt;a href='https://elixir.bootlin.com/linux/v5.11.11/source/include/linux/blk_types.h#L356' title=''&gt;handles (roughly) these operations&lt;/a&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative cpp"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-woz6bsz9"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-woz6bsz9"&gt;&lt;span class="k"&gt;enum&lt;/span&gt; &lt;span class="n"&gt;req_opf&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="cm"&gt;/* read sectors from the device */&lt;/span&gt;
    &lt;span class="n"&gt;REQ_OP_READ&lt;/span&gt;     &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="cm"&gt;/* write sectors to the device */&lt;/span&gt;
    &lt;span class="n"&gt;REQ_OP_WRITE&lt;/span&gt;        &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="cm"&gt;/* flush the volatile write cache */&lt;/span&gt;
    &lt;span class="n"&gt;REQ_OP_FLUSH&lt;/span&gt;        &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="cm"&gt;/* discard sectors */&lt;/span&gt;
    &lt;span class="n"&gt;REQ_OP_DISCARD&lt;/span&gt;      &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="cm"&gt;/* securely erase sectors */&lt;/span&gt;
    &lt;span class="n"&gt;REQ_OP_SECURE_ERASE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="cm"&gt;/* write the same sector many times */&lt;/span&gt;
    &lt;span class="n"&gt;REQ_OP_WRITE_SAME&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="cm"&gt;/* write the zero filled sector many times */&lt;/span&gt;
    &lt;span class="n"&gt;REQ_OP_WRITE_ZEROES&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="cm"&gt;/* ... */&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;You can imagine designing a simple network protocol that supported all these options. It might have messages that looked something like:&lt;/p&gt;

&lt;p&gt;&lt;img alt="A packet diagram, just skip down to &amp;quot;struct bio&amp;quot; below" src="/blog/machine-migrations/assets/packet.png?2/3&amp;amp;center" /&gt;
Good news! The Linux block system is organized as if your computer was a network running a protocol that basically looks just like that. Here’s the message structure:&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;I’ve &lt;a href="https://elixir.bootlin.com/linux/v5.11.11/source/include/linux/blk_types.h#L223" title=""&gt;stripped a bunch of stuff out of here&lt;/a&gt; but you don’t need any of it to understand what’s coming next.&lt;/p&gt;
&lt;/div&gt;&lt;div class="highlight-wrapper group relative cpp"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-kwrloyie"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-kwrloyie"&gt;&lt;span class="cm"&gt;/*
 * main unit of I/O for the block layer and lower layers (ie drivers and
 * stacking drivers)
 */&lt;/span&gt;
&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="nc"&gt;bio&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="nc"&gt;gendisk&lt;/span&gt;      &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;bi_disk&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;unsigned&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;        &lt;span class="n"&gt;bi_opf&lt;/span&gt;
    &lt;span class="kt"&gt;unsigned&lt;/span&gt; &lt;span class="kt"&gt;short&lt;/span&gt;      &lt;span class="n"&gt;bi_flags&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;   
    &lt;span class="kt"&gt;unsigned&lt;/span&gt; &lt;span class="kt"&gt;short&lt;/span&gt;      &lt;span class="n"&gt;bi_ioprio&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;blk_status_t&lt;/span&gt;        &lt;span class="n"&gt;bi_status&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;unsigned&lt;/span&gt; &lt;span class="kt"&gt;short&lt;/span&gt;      &lt;span class="n"&gt;bi_vcnt&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;         &lt;span class="cm"&gt;/* how many bio_vec's */&lt;/span&gt;
    &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="nc"&gt;bio_vec&lt;/span&gt;      &lt;span class="n"&gt;bi_inline_vecs&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="cm"&gt;/* (page, len, offset) tuples */&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;No nerd has ever looked at a fixed-format message like this without thinking about writing a proxy for it, and &lt;code&gt;struct bio&lt;/code&gt; is no exception. The proxy system in the Linux kernel for &lt;code&gt;struct bio&lt;/code&gt; is called &lt;code&gt;device mapper&lt;/code&gt;, or DM.&lt;/p&gt;

&lt;p&gt;DM target devices can plug into other DM devices. For that matter, they can do whatever the hell else they want, as long as they honor the interface. It boils down to a &lt;code&gt;map(bio)&lt;/code&gt; function, which can dispatch a &lt;code&gt;struct bio&lt;/code&gt;, or drop it, or muck with it and ask the kernel to resubmit it.&lt;/p&gt;
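&lt;p&gt;To make that concrete, here&amp;rsquo;s a userland caricature in C of a &lt;code&gt;dm-linear&lt;/code&gt;-style &lt;code&gt;map&lt;/code&gt; function; the structs are pared-down stand-ins for the kernel&amp;rsquo;s, not its actual definitions:&lt;/p&gt;

```c
#include <assert.h>

/* Pared-down stand-ins for the kernel's struct gendisk / struct bio. */
struct gendisk { int id; };

struct bio {
    struct gendisk *bi_disk;   /* which device this I/O targets */
    unsigned long   bi_sector; /* where on that device it lands */
};

/* A dm-linear-style target: expose a slice of a backing disk,
 * starting `start` sectors in. */
struct linear_target {
    struct gendisk *backing;
    unsigned long   start;
};

enum { DM_MAPIO_REMAPPED = 1 }; /* "resubmit this bio" */

/* The whole trick: repoint the bio at the backing disk, shift the
 * sector, and hand it back to the block layer for resubmission. */
static int linear_map(struct linear_target *t, struct bio *bio) {
    bio->bi_disk = t->backing;
    bio->bi_sector += t->start;
    return DM_MAPIO_REMAPPED;
}
```

&lt;p&gt;A &lt;code&gt;dm-clone&lt;/code&gt; map is the same shape; it just decides, per bio, whether to dispatch locally or trigger a hydration first.&lt;/p&gt;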

&lt;p&gt;You can do a whole lot of stuff with this interface: carve a big device into a bunch of smaller ones (&lt;a href='https://docs.kernel.org/admin-guide/device-mapper/linear.html' title=''&gt;&lt;code&gt;dm-linear&lt;/code&gt;&lt;/a&gt;), make one big striped device out of a bunch of smaller ones (&lt;a href='https://docs.kernel.org/admin-guide/device-mapper/striped.html' title=''&gt;&lt;code&gt;dm-stripe&lt;/code&gt;&lt;/a&gt;), do software RAID mirroring (&lt;code&gt;dm-raid1&lt;/code&gt;), create snapshots of arbitrary existing devices (&lt;a href='https://docs.kernel.org/admin-guide/device-mapper/snapshot.html' title=''&gt;&lt;code&gt;dm-snap&lt;/code&gt;&lt;/a&gt;), cryptographically verify boot devices (&lt;a href='https://docs.kernel.org/admin-guide/device-mapper/verity.html' title=''&gt;&lt;code&gt;dm-verity&lt;/code&gt;&lt;/a&gt;), and a bunch more. Device Mapper is the kernel backend for the &lt;a href='https://sourceware.org/lvm2/' title=''&gt;userland LVM2 system&lt;/a&gt;, which is how we do &lt;a href='https://fly.io/blog/persistent-storage-and-fast-remote-builds/' title=''&gt;thin pools and snapshot backups&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Which brings us to &lt;code&gt;dm-clone&lt;/code&gt;: it’s a map function that boils down to:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative cpp"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-8n5vrld6"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-8n5vrld6"&gt;    &lt;span class="cm"&gt;/* ... */&lt;/span&gt; 
    &lt;span class="n"&gt;region_nr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bio_to_region&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clone&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bio&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// we have the data&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dm_clone_is_region_hydrated&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clone&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;cmd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;region_nr&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;remap_and_issue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clone&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bio&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="c1"&gt;// we don't and it's a read&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bio_data_dir&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bio&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;READ&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;remap_to_source&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clone&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bio&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;// we don't and it's a write&lt;/span&gt;
    &lt;span class="n"&gt;remap_to_dest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clone&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bio&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;hydrate_bio_region&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clone&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bio&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="cm"&gt;/* ... */&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;&lt;div class="right-sidenote"&gt;&lt;p&gt;a &lt;a href="https://docs.kernel.org/admin-guide/device-mapper/kcopyd.html" title=""&gt;&lt;code&gt;kcopyd&lt;/code&gt;&lt;/a&gt; thread runs in the background, rehydrating the device in addition to (and independent of) read accesses.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;&lt;code&gt;dm-clone&lt;/code&gt; takes, in addition to the source device to clone from, a “metadata” device on which is stored a bitmap of the status of all the blocks: either “rehydrated” from the source, or not. That’s how it knows whether to fetch a block from the original device or the clone.&lt;/p&gt;
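A sketch of that bookkeeping (simplified: the real bitmap lives on the metadata device and is persisted transactionally; here it's just in memory):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* One bit of hydration state per region, dm-clone style. */
#define NR_REGIONS 1024

struct clone_md {
    uint8_t bitmap[NR_REGIONS / 8];
};

static int region_hydrated(const struct clone_md *md, unsigned nr)
{
    return (md->bitmap[nr / 8] >> (nr % 8)) & 1;
}

static void mark_hydrated(struct clone_md *md, unsigned nr)
{
    md->bitmap[nr / 8] |= 1u << (nr % 8);
}

/* Mapping a sector to its region, for a power-of-two region size
 * expressed as a shift (e.g. shift 3 = 8-sector regions). */
static unsigned sector_to_region(uint64_t sector, unsigned region_shift)
{
    return (unsigned)(sector >> region_shift);
}
```

The `map()` path consults `region_hydrated()` to decide between the clone device and the source; the background copier and completed reads call `mark_hydrated()`.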
&lt;h3 id='network-clone' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#network-clone' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Network Clone&lt;/span&gt;&lt;/h3&gt;&lt;div class="callout"&gt;&lt;p&gt;&lt;strong class="font-semibold text-navy-950"&gt;&lt;code&gt;flyd&lt;/code&gt; in a nutshell:&lt;/strong&gt; worker physicals run a service, &lt;code&gt;flyd&lt;/code&gt;, which manages a couple of databases that are the source of truth for all the Fly Machines running there. Conceptually, &lt;code&gt;flyd&lt;/code&gt; is a server for on-demand instances of durable finite state machines, each representing some operation on a Fly Machine (creation, start, stop, &amp;amp;c), with the transition steps recorded carefully in a BoltDB database. An FSM step might be something like “assign a local IPv6 address to this interface”, or “check out a block device with the contents of this container”, and it’s straightforward to add and manage new ones.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Say we&amp;rsquo;ve got &lt;code&gt;flyd&lt;/code&gt; managing a Fly Machine with a volume on &lt;code&gt;worker-xx-cdg1-1&lt;/code&gt;. We want it running on &lt;code&gt;worker-xx-cdg1-2&lt;/code&gt;. Our whole fleet is meshed with WireGuard; everything can talk directly to everything else. So, conceptually:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;code&gt;flyd&lt;/code&gt; on &lt;code&gt;cdg1-1&lt;/code&gt; stops the Fly Machine, and
&lt;/li&gt;&lt;li&gt;sends a message to &lt;code&gt;flyd&lt;/code&gt; on &lt;code&gt;cdg1-2&lt;/code&gt; telling it to clone the source volume.
&lt;/li&gt;&lt;li&gt;&lt;code&gt;flyd&lt;/code&gt; on &lt;code&gt;cdg1-2&lt;/code&gt; starts a &lt;code&gt;dm-clone&lt;/code&gt; instance, which creates a clone volume on &lt;code&gt;cdg1-2&lt;/code&gt;, populating it, over some kind of network block protocol, from &lt;code&gt;cdg1-1&lt;/code&gt;, and
&lt;/li&gt;&lt;li&gt;boots a new Fly Machine, attached to the clone volume.
&lt;/li&gt;&lt;li&gt;&lt;code&gt;flyd&lt;/code&gt; on &lt;code&gt;cdg1-2&lt;/code&gt; monitors the clone operation, and, when the clone completes, converts the clone device to a simple linear device and cleans up.
&lt;/li&gt;&lt;/ol&gt;
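The step list above maps naturally onto a durable FSM. As a hypothetical sketch (not `flyd`'s actual code), the progression reduces to an ordered sequence of idempotent steps, with the current step persisted (in `flyd`'s case, to BoltDB) so a crashed migration resumes where it left off:

```c
#include <assert.h>

/* Hypothetical migration FSM steps; names are illustrative. */
enum step {
    STOP_SOURCE_MACHINE,   /* (1) stop the Fly Machine on the source */
    START_DM_CLONE,        /* (2)-(3) clone the volume over the network */
    BOOT_REPLICA,          /* (4) boot the replica on the clone volume */
    WAIT_FOR_HYDRATION,    /* (5) watch the background copy finish */
    CONVERT_TO_LINEAR,     /* (5) swap dm-clone for a plain linear device */
    DONE,
};

/* Advance one step; DONE is terminal. */
static enum step advance(enum step s)
{
    return s == DONE ? DONE : (enum step)(s + 1);
}
```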

&lt;p&gt;For step (3) to work, the “original volume” on &lt;code&gt;cdg1-1&lt;/code&gt; has to be visible on &lt;code&gt;cdg1-2&lt;/code&gt;, which means we need to mount it over the network.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;&lt;code&gt;nbd&lt;/code&gt; is so simple that it’s used as a sort of &lt;code&gt;dm-user&lt;/code&gt; userland block device; to prototype a new block device, &lt;a href="https://lwn.net/ml/linux-kernel/[email protected]/" title=""&gt;don’t bother writing a kernel module&lt;/a&gt;, just write an &lt;code&gt;nbd&lt;/code&gt; server.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Take your pick of protocols. iSCSI is the obvious one, but it’s relatively complicated, and Linux has native support for a much simpler one: &lt;code&gt;nbd&lt;/code&gt;, the “network block device”. You could implement an &lt;code&gt;nbd&lt;/code&gt; server in an afternoon, on top of a file or a SQLite database or S3, and the Linux kernel could mount it as a drive.&lt;/p&gt;
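To give a sense of how simple the wire format is, here's a hedged sketch of parsing the classic 28-byte `nbd` request header (all fields big-endian; layout per the kernel's `struct nbd_request`):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NBD_REQUEST_MAGIC 0x25609513u
enum { NBD_CMD_READ = 0, NBD_CMD_WRITE = 1, NBD_CMD_DISC = 2 };

struct nbd_req {
    uint32_t type;
    uint64_t offset;
    uint32_t len;
    uint8_t  handle[8];   /* opaque cookie, echoed back in the reply */
};

static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

static uint64_t get_be64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Returns 0 on success, -1 on bad magic. */
static int parse_nbd_request(const uint8_t buf[28], struct nbd_req *out)
{
    if (get_be32(buf) != NBD_REQUEST_MAGIC)
        return -1;
    out->type = get_be32(buf + 4);
    memcpy(out->handle, buf + 8, 8);
    out->offset = get_be64(buf + 16);
    out->len = get_be32(buf + 24);
    return 0;
}
```

An afternoon-project server is this parser, a `pread`/`pwrite` against the backing store, and a reply header going the other way.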

&lt;p&gt;We started out using &lt;code&gt;nbd&lt;/code&gt;. But we kept getting stuck &lt;code&gt;nbd&lt;/code&gt; kernel threads when there was any kind of network disruption. We’re a global public cloud; network disruption happens. Honestly, we could have debugged our way through this. But it was simpler just to spike out an iSCSI implementation, observe that it didn’t get jammed up when the network hiccuped, and move on.&lt;/p&gt;
&lt;h3 id='putting-the-pieces-together' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#putting-the-pieces-together' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Putting The Pieces Together&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;To drain a worker with minimal downtime and no lost data, we turn workers into temporary SANs, serving the volumes we need to drain to fresh-booted replica Fly Machines on a bunch of “target” physicals. Those SANs — combinations of &lt;code&gt;dm-clone&lt;/code&gt;, iSCSI, and &lt;a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''&gt;our &lt;code&gt;flyd&lt;/code&gt; orchestrator&lt;/a&gt; — track the blocks copied from the origin, copying each one exactly once and cleaning up when the original volume has been fully copied.&lt;/p&gt;

&lt;p&gt;Problem solved!&lt;/p&gt;
&lt;h3 id='no-there-were-more-problems' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#no-there-were-more-problems' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;No, There Were More Problems&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;When your problem domain is hard, anything you build whose design you can’t fit completely in your head is going to be a fiasco. Shorter form: “if you see Raft consensus in a design, we’ve done something wrong”.&lt;/p&gt;

&lt;p&gt;A virtue of this migration system is that, for as many moving pieces as it has, it fits in your head. What complexity it has is mostly shouldered by strategic bets we’ve already built teams around, most notably the &lt;code&gt;flyd&lt;/code&gt; orchestrator. So we’ve been running this system for the better part of a year without much drama. Not no drama, though. Some drama.&lt;/p&gt;

&lt;p&gt;Example: we encrypt volumes. Our key management is fussy. We do per-volume encryption keys that provision alongside the volumes themselves, so no one worker has a volume skeleton key.&lt;/p&gt;

&lt;p&gt;If you think “migrating those volume keys from worker to worker” is the problem I’m building up to, well, that too, but the bigger problem is &lt;code&gt;trim&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Most people use just a small fraction of the volumes they allocate. A 100GiB volume with just 5MiB used wouldn’t be at all weird. You don’t want to spend minutes copying a volume that could have been fully hydrated in seconds.&lt;/p&gt;

&lt;p&gt;And indeed, &lt;code&gt;dm-clone&lt;/code&gt; doesn’t want to do that either. Given a source block device (for us, an iSCSI mount) and the clone device, a &lt;code&gt;DISCARD&lt;/code&gt; issued on the clone device will get picked up by &lt;code&gt;dm-clone&lt;/code&gt;, which will simply &lt;a href='https://elixir.bootlin.com/linux/v5.11.11/source/drivers/md/dm-clone-target.c#L1357' title=''&gt;short-circuit the read&lt;/a&gt; of the relevant blocks by marking them as hydrated in the metadata volume. Simple enough.&lt;/p&gt;
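One subtlety worth making concrete: a discard only lets you skip regions it *fully* covers; a partially covered region still holds live data and must hydrate normally. A small sketch of that range-to-region arithmetic (simplified; sector counts and a power-of-two region size are assumed):

```c
#include <assert.h>
#include <stdint.h>

/* Given a DISCARD over sectors [begin, end), compute the half-open
 * span [*first, *last_excl) of regions that are fully covered and can
 * be marked hydrated without copying anything. */
static void discard_to_region_span(uint64_t begin, uint64_t end,
                                   uint64_t region_size,
                                   uint64_t *first, uint64_t *last_excl)
{
    *first = (begin + region_size - 1) / region_size;  /* round up   */
    *last_excl = end / region_size;                    /* round down */
    if (*last_excl < *first)
        *last_excl = *first;                           /* covers none */
}
```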

&lt;p&gt;To make that work, we need the target worker to see the plaintext of the source volume (so that it can do an &lt;code&gt;fstrim&lt;/code&gt; — don’t get us started on how annoying it is to sandbox this — to read the filesystem, identify the unused blocks, and issue the &lt;code&gt;DISCARDs&lt;/code&gt; where &lt;code&gt;dm-clone&lt;/code&gt; can see them). Easy enough.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;these curses have a lot to do with how hard it was to drain workers!&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Except: two different workers, for cursed reasons, might be running different versions of &lt;a href='https://gitlab.com/cryptsetup/cryptsetup' title=''&gt;cryptsetup&lt;/a&gt;, the userland bridge between LUKS2 and the &lt;a href='https://docs.kernel.org/admin-guide/device-mapper/dm-crypt.html' title=''&gt;kernel dm-crypt driver&lt;/a&gt;. There are (or were) two different versions of cryptsetup on our network, and they default to different &lt;a href='https://fossies.org/linux/cryptsetup/docs/on-disk-format-luks2.pdf' title=''&gt;LUKS2 header sizes&lt;/a&gt; — 4MiB and 16MiB. Implying two different plaintext volume sizes. &lt;/p&gt;
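The mismatch is just header arithmetic; a minimal sketch of why the two defaults matter:

```c
#include <assert.h>
#include <stdint.h>

#define MiB (1024ull * 1024ull)

/* A LUKS2 volume's plaintext payload is the device minus the header,
 * so the same underlying device exposes two different plaintext sizes
 * under cryptsetup defaults of 4 MiB vs 16 MiB headers. */
static uint64_t luks2_payload_bytes(uint64_t device_bytes,
                                    uint64_t header_bytes)
{
    return device_bytes - header_bytes;
}
```

A clone target assuming the wrong header size would see a plaintext device 12 MiB off from the source — not a thing any filesystem enjoys.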

&lt;p&gt;So now part of the migration FSM is an RPC call that carries metadata about the desired LUKS2 configuration for the target VM. Not something we expected to have to build, but, whatever.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;Corrosion deserves its own post.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Gnarlier example: workers are the source of truth for information about the Fly Machines running on them. Migration knocks the legs out from under that constraint, which we were relying on in Corrosion, the SWIM-gossip SQLite database we use to connect Fly Machines to our request routing. Race conditions. Debugging. Design changes. Next!&lt;/p&gt;

&lt;p&gt;Gnarliest example: our private networks. Recall: we automatically place every Fly Machine into &lt;a href='https://fly.io/blog/incoming-6pn-private-networks/' title=''&gt;a private network&lt;/a&gt;; by default, it’s the one all the other apps in your organization run in. This is super handy for setting up background services, databases, and clustered applications. 20 lines of eBPF code in our worker kernels keeps anybody from “crossing the streams”, sending packets from one private network to another.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;we’re members of an elite cadre of idiots who have managed to find designs that made us wish IPv6 addresses were even bigger.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;We call this scheme 6PN (for “IPv6 Private Network”). It functions by &lt;a href='https://fly.io/blog/ipv6-wireguard-peering#ipv6-private-networking-at-fly' title=''&gt;embedding routing information directly into IPv6 addresses&lt;/a&gt;. This is, perhaps, gross. But it allows us to route diverse private networks with constantly changing membership across a global fleet of servers without running a distributed routing protocol. As the beardy wizards who kept the Internet backbone up and running on Cisco AGS+’s once said: the best routing protocol is “static”.&lt;/p&gt;
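As a purely hypothetical illustration of "routing info in the address" (the field layout here is made up; the real 6PN encoding differs), the trick is that a router can recover the destination worker by reading bits straight out of the destination address, with no routing protocol at all:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct addr6 { uint8_t b[16]; };

/* Hypothetical packing: a fixed ULA-style prefix, then a network id,
 * then a host (worker) id, then room left over for the Machine. */
static struct addr6 make_6pn(uint32_t network_id, uint32_t host_id)
{
    struct addr6 a;
    memset(&a, 0, sizeof a);
    a.b[0] = 0xfd; a.b[1] = 0xaa;                          /* prefix */
    a.b[2] = network_id >> 24; a.b[3] = network_id >> 16;
    a.b[4] = network_id >> 8;  a.b[5] = network_id;
    a.b[6] = host_id >> 24;    a.b[7] = host_id >> 16;
    a.b[8] = host_id >> 8;     a.b[9] = host_id;
    return a;
}

/* "Static routing": the worker falls straight out of the address. */
static uint32_t host_of(const struct addr6 *a)
{
    return ((uint32_t)a->b[6] << 24) | ((uint32_t)a->b[7] << 16) |
           ((uint32_t)a->b[8] << 8)  |  (uint32_t)a->b[9];
}
```

It also makes the migration problem obvious: if the host id is baked into the address, moving the Machine to a different worker changes its address.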

&lt;p&gt;Problem: the embedded routing information in a 6PN address refers in part to specific worker servers.&lt;/p&gt;

&lt;p&gt;That’s fine, right? They’re IPv6 addresses. Nobody uses literal IPv6 addresses. Nobody uses IP addresses at all; they use the DNS. When you migrate a host, just give it a new 6PN address, and update the DNS.&lt;/p&gt;

&lt;p&gt;Friends, somebody did use literal IPv6 addresses. It was us. In the configurations for Fly Postgres clusters.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;It’s also not operationally easy for us to shell into random Fly Machines, for good reason.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The obvious fix for this is not complicated; given &lt;code&gt;flyctl&lt;/code&gt; ssh access to a Fly Postgres cluster, it’s like a 30 second ninja edit. But we run a &lt;em&gt;lot&lt;/em&gt; of Fly Postgres clusters, and the change has to be coordinated carefully to avoid getting the cluster into a confused state. We went as far as adding a feature to our &lt;code&gt;init&lt;/code&gt; to do network address mappings to keep old 6PN addresses reachable before biting the bullet and burning several weeks doing the direct configuration fix fleet-wide.&lt;/p&gt;
&lt;figure class="post-cta"&gt;
  &lt;figcaption&gt;
    &lt;h1&gt;Speedrun your app onto Fly.io.&lt;/h1&gt;
    &lt;p&gt;3&amp;hellip;2&amp;hellip;1&amp;hellip;&lt;/p&gt;
      &lt;a class="btn btn-lg" href="https://fly.io/speedrun"&gt;
        Go! &amp;nbsp;&lt;span class='opacity-50'&gt;→&lt;/span&gt;
      &lt;/a&gt;
  &lt;/figcaption&gt;
  &lt;div class="image-container"&gt;
    &lt;img src="/static/images/cta-dog.webp" srcset="/static/images/[email protected] 2x" alt=""&gt;
  &lt;/div&gt;
&lt;/figure&gt;

&lt;h3 id='the-learning-it-burns' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-learning-it-burns' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;The Learning, It Burns!&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;We get asked a lot why we don’t do storage the “obvious” way, with an &lt;a href='https://aws.amazon.com/ebs/' title=''&gt;EBS-type&lt;/a&gt; SAN fabric, abstracting it away from our compute. Locally-attached NVMe storage is an idiosyncratic choice, one we’ve had to write disclaimers for (single-node clusters can lose data!) since we first launched it.&lt;/p&gt;

&lt;p&gt;One answer is: we’re a startup. Building SAN infrastructure in every region we operate in would be tremendously expensive. Look at any feature in AWS that normal people know the name of, like EC2, EBS, RDS, or S3 — there’s a whole company in there. We launched storage when we were just 10 people, and even at our current size we probably have nothing resembling the resources EBS gets. AWS is pretty great!&lt;/p&gt;

&lt;p&gt;But another thing to keep in mind is: we’re learning as we go. And so even if we had the means to do an EBS-style SAN, we might not build it today.&lt;/p&gt;

&lt;p&gt;Instead, we’re a lot more interested in log-structured virtual disks (LSVD). LSVD uses NVMe as a local cache, but durably persists writes in object storage. You get most of the performance benefit of bus-hop disk writes, along with unbounded storage and S3-grade reliability.&lt;/p&gt;
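A toy sketch of the log-structured idea (an illustration of the concept, not Fly.io's implementation): block writes append to the current log segment; a full segment is sealed and handed to an upload callback, which in production would be a PUT to object storage.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SEG_BYTES 4096   /* toy segment size; real segments are much larger */

struct segment {
    uint8_t data[SEG_BYTES];
    size_t  used;
};

typedef void (*seal_fn)(const struct segment *seg);

struct lsvd {
    struct segment cur;   /* the open segment (locally cached on NVMe) */
    seal_fn seal;         /* invoked when a segment fills; may be NULL */
    int sealed_count;
};

/* Append one write to the log; assumes len <= SEG_BYTES. */
static void lsvd_write(struct lsvd *d, const void *buf, size_t len)
{
    if (d->cur.used + len > SEG_BYTES) {   /* segment full: seal it */
        if (d->seal)
            d->seal(&d->cur);
        d->sealed_count++;
        d->cur.used = 0;
    }
    memcpy(d->cur.data + d->cur.used, buf, len);
    d->cur.used += len;
}
```

Reads would consult an index mapping logical blocks to (segment, offset) pairs, served from the local cache when hot and from object storage when not.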

&lt;p&gt;&lt;a href='https://community.fly.io/t/bottomless-s3-backed-volumes/15648' title=''&gt;We launched LSVD experimentally last year&lt;/a&gt;; in the intervening year, something happened to make LSVD even more interesting to us: &lt;a href='https://www.tigrisdata.com/' title=''&gt;Tigris Data&lt;/a&gt; launched S3-compatible object storage in every one of our regions, so instead of backhauling updates to Northern Virginia, &lt;a href='https://community.fly.io/t/tigris-backed-volumes/20792' title=''&gt;we can keep them local&lt;/a&gt;. We have more to say about LSVD, and a lot more to say about Tigris.&lt;/p&gt;

&lt;p&gt;Our first several months of migrations were done gingerly. By summer of 2024, we got to where our infra team can pull “drain this host” out of their toolbelt without much ceremony.&lt;/p&gt;

&lt;p&gt;We’re still not to the point where we’re migrating casually. Your Fly Machines are probably not getting migrated! There&amp;rsquo;d need to be a reason! But the dream is fully-automated luxury space migration, in which you might get migrated semiregularly, as our systems work not just to drain problematic hosts but to rebalance workloads regularly. No time soon. But we’ll get there.&lt;/p&gt;

&lt;p&gt;This is the biggest thing our team has done since we replaced Nomad with flyd. Only the new billing system comes close. We did this thing not because it was easy, but because we thought it would be easy. It was not. But: worth it!&lt;/p&gt;</content>
    </entry>
    <entry>
        <title>AWS without Access Keys</title>
        <link rel="alternate" href="https://fly.io/blog/oidc-cloud-roles/"/>
        <id>https://fly.io/blog/oidc-cloud-roles/</id>
        <published>2024-06-19T00:00:00+00:00</published>
        <updated>2024-06-25T22:52:32+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/oidc-cloud-roles/assets/spooky-security-skeleton-thumb.webp"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;It’s dangerous to go alone. Fly.io runs full-stack apps by transmuting Docker containers into Fly Machines: ultra-lightweight hardware-backed VMs. You can run all your dependencies on Fly.io, but sometimes, you’ll need to work with other clouds, and we’ve made that pretty simple. Try Fly.io out for yourself; your Rails or Node app &lt;a href="https://fly.io/speedrun" title=""&gt;can be up and running in just minutes&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Let&amp;rsquo;s hypopulate you an app serving generative AI cat images based on the weather forecast, running on a &lt;code&gt;g4dn.xlarge&lt;/code&gt; ECS task in AWS &lt;code&gt;us-east-1&lt;/code&gt;.  It&amp;rsquo;s going great; people didn&amp;rsquo;t realize how dependent their cat pic prefs are on barometric pressure, and you&amp;rsquo;re all anyone can talk about.&lt;/p&gt;

&lt;p&gt;Word reaches Australia and Europe, but you&amp;rsquo;re not catching on, because the… latency is too high? Just roll with us here. Anyways: fixing this is going to require replicating ECS tasks and ECR images into &lt;code&gt;ap-southeast-2&lt;/code&gt; and &lt;code&gt;eu-central-1&lt;/code&gt; while also setting up load balancing. Nah.&lt;/p&gt;

&lt;p&gt;This is the O.G. Fly.io deployment story; one deployed app, one versioned container, one command to get it running anywhere in the world.&lt;/p&gt;

&lt;p&gt;But you have a problem: your app relies on training data, it&amp;rsquo;s huge, your giant employer manages it, and it&amp;rsquo;s in S3. Getting this to work will require AWS credentials.&lt;/p&gt;

&lt;p&gt;You could ask your security team to create a user, give it permissions, and hand over the AWS keypair. Then you could wash your neck and wait for the blade. Passing around AWS keypairs is the beginning of every horror story told about cloud security, and your security team ain&amp;rsquo;t having it.&lt;/p&gt;

&lt;p&gt;There&amp;rsquo;s a better way. It&amp;rsquo;s drastically more secure, so your security people will at least hear you out. It&amp;rsquo;s also so much easier on Fly.io that you might never bother creating an IAM service account again.&lt;/p&gt;
&lt;h2 id='lets-get-it-out-of-the-way' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#lets-get-it-out-of-the-way' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Let&amp;rsquo;s Get It out of the Way&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;We&amp;rsquo;re going to use OIDC to set up strictly limited trust between AWS and Fly.io.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;In AWS: we&amp;rsquo;ll add Fly.io as an &lt;code&gt;Identity Provider&lt;/code&gt; in AWS IAM, giving us an ID we can plug into any IAM &lt;code&gt;Role&lt;/code&gt;.
&lt;/li&gt;&lt;li&gt;Also in AWS: we&amp;rsquo;ll create a &lt;code&gt;Role&lt;/code&gt;, give it access to the S3 bucket with our tokenized cat data, and then attach the &lt;code&gt;Identity Provider&lt;/code&gt; to it.
&lt;/li&gt;&lt;li&gt;In Fly.io, we&amp;rsquo;ll take the &lt;code&gt;Role&lt;/code&gt; ARN we got from step 2 and set it as an environment variable in our app.
&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;Our machines will now magically have access to the S3 bucket.&lt;/p&gt;
&lt;h2 id='what-the-what' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-the-what' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;What the What&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;A reasonable question to ask here is, &amp;ldquo;where&amp;rsquo;s the credential&amp;rdquo;? Ordinarily, to give a Fly Machine access to an AWS resource, you&amp;rsquo;d use &lt;code&gt;fly secrets set&lt;/code&gt; to add an &lt;code&gt;AWS_ACCESS_KEY_ID&lt;/code&gt; and &lt;code&gt;AWS_SECRET_ACCESS_KEY&lt;/code&gt; to the environment in the Machine. Here, we&amp;rsquo;re not setting any secrets at all; we&amp;rsquo;re just adding an ARN — which is not a credential — to the Machine.&lt;/p&gt;

&lt;p&gt;Here&amp;rsquo;s what&amp;rsquo;s happening.&lt;/p&gt;

&lt;p&gt;Fly.io operates an OIDC IdP at &lt;code&gt;oidc.fly.io&lt;/code&gt;. It issues OIDC tokens, exclusively to Fly Machines. AWS can be configured to trust these tokens, on a role-by-role basis. That&amp;rsquo;s the &amp;ldquo;secret credential&amp;rdquo;: the pre-configured trust relationship in IAM, and the public keypairs it manages. You, the user, never need to deal with these keys directly; it all happens behind the scenes, between AWS and Fly.io.&lt;/p&gt;

&lt;p&gt;&lt;img alt="A diagram: STS trusts OIDC.fly.io. OIDC.fly.io trusts flyd. flyd issues a token to the Machine, which proffers it to STS. STS sends an STS cred to the Machine, which then uses it to retrieve model weights from S3." src="/blog/oidc-cloud-roles/assets/oidc-diagram.webp" /&gt;&lt;/p&gt;

&lt;p&gt;The key actor in this picture is &lt;code&gt;STS&lt;/code&gt;, the AWS &lt;code&gt;Security Token Service&lt;/code&gt;. &lt;code&gt;STS&lt;/code&gt;&amp;rsquo;s main job is to vend short-lived AWS credentials, usually through some variant of an API called &lt;code&gt;AssumeRole&lt;/code&gt;. Specifically, in our case: &lt;code&gt;AssumeRoleWithWebIdentity&lt;/code&gt; tells &lt;code&gt;STS&lt;/code&gt; to cough up an AWS keypair given an OIDC token (that matches a pre-configured trust relationship).&lt;/p&gt;

&lt;p&gt;That still leaves the question: how does your code, which is reaching out to the AWS APIs to get cat weights, drive any of this?&lt;/p&gt;
&lt;h2 id='the-init-thickens' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-init-thickens' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;The Init Thickens&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Every Fly Machine boots up into an &lt;code&gt;init&lt;/code&gt; we wrote in Rust. It has slowly been gathering features.&lt;/p&gt;

&lt;p&gt;One of those features, which has been around for a while, is a server for a Unix socket at &lt;code&gt;/.fly/api&lt;/code&gt;, which exports a subset of the Fly Machines API to privileged processes in the Machine. Think of it as our answer to the EC2 Instance Metadata Service. How it works is, every time we boot a Fly Machine, we pass it a &lt;a href='https://fly.io/blog/macaroons-escalated-quickly/' title=''&gt;Macaroon token&lt;/a&gt; locked to that particular Machine; &lt;code&gt;init&lt;/code&gt;&amp;rsquo;s server for &lt;code&gt;/.fly/api&lt;/code&gt; is a proxy that attaches that token to requests.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;In addition to the API proxy being tricky to SSRF to.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;What&amp;rsquo;s neat about this is that the credential that drives &lt;code&gt;/.fly/api&lt;/code&gt; is doubly protected:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The Fly.io platform won&amp;rsquo;t honor it unless it comes from that specific Fly Machine (&lt;code&gt;flyd&lt;/code&gt;, our orchestrator, knows who it&amp;rsquo;s talking to), &lt;em&gt;and&lt;/em&gt;
&lt;/li&gt;&lt;li&gt;Ordinary code running in a Fly Machine never gets a copy of the token to begin with.
&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;You could rig up a local privilege escalation vulnerability and work out how to steal the Macaroon, but you can&amp;rsquo;t exfiltrate it productively.&lt;/p&gt;

&lt;p&gt;So now you have half the puzzle worked out: OIDC is just part of the &lt;a href='https://fly.io/docs/machines/api/' title=''&gt;Fly Machines API&lt;/a&gt; (specifically: &lt;code&gt;/v1/tokens/oidc&lt;/code&gt;). A Fly Machine can hit a Unix socket and get an OIDC token tailored to that machine:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative "&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-xdcj19sc"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-xdcj19sc"&gt;{
  "app_id": "3671581",
  "app_name": "weather-cat",
  "aud": "sts.amazonaws.com",
  "image": "image:latest",
  "image_digest": "sha256:dff79c6da8dd4e282ecc6c57052f7cfbd684039b652f481ca2e3324a413ee43f",
  "iss": "https://oidc.fly.io/example",
  "machine_id": "3d8d377ce9e398",
  "machine_name": "ancient-snow-4824",
  "machine_version": "01HZJXGTQ084DX0G0V92QH3XW4",
  "org_id": "29873298",
  "org_name": "example",
  "region": "yyz",
  "sub": "example:weather-cat:ancient-snow-4824"
} // some OIDC stuff trimmed
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Look upon this holy blob, sealed with a published key managed by Fly.io&amp;rsquo;s OIDC vault, and see that there lies within it enough information for AWS &lt;code&gt;STS&lt;/code&gt; to decide to issue a session credential.&lt;/p&gt;

&lt;p&gt;We have still not completed the puzzle, because while you can probably now see how you&amp;rsquo;d drive this process with a bunch of new code that you&amp;rsquo;d tediously write, you are acutely aware that you have not yet endured that tedium — e pur si muove!&lt;/p&gt;

&lt;p&gt;One &lt;code&gt;init&lt;/code&gt; feature remains to be disclosed, and it&amp;rsquo;s cute.&lt;/p&gt;

&lt;p&gt;If, when &lt;code&gt;init&lt;/code&gt; starts in a Fly Machine, it sees an &lt;code&gt;AWS_ROLE_ARN&lt;/code&gt; environment variable set, it initiates a little dance; it:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;goes off and generates an OIDC token, the way we just described,
&lt;/li&gt;&lt;li&gt;saves that OIDC token in a file, &lt;em&gt;and&lt;/em&gt;
&lt;/li&gt;&lt;li&gt;sets the &lt;code&gt;AWS_WEB_IDENTITY_TOKEN_FILE&lt;/code&gt; and &lt;code&gt;AWS_ROLE_SESSION_NAME&lt;/code&gt; environment variables for every process it launches.
&lt;/li&gt;&lt;/ol&gt;
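The dance above can be sketched in a few lines of Python. This is an illustration, not Fly.io's actual `init` code: the proxy URL and helper names are stand-ins, and real `init` does this before launching your entrypoint.

```python
import json
import os
import urllib.request

TOKEN_PATH = "/.fly/oidc_token"

def fetch_oidc_token(aud="sts.amazonaws.com"):
    # init requests a token from /v1/tokens/oidc via /.api/proxy;
    # the host below is a stand-in for the real proxy endpoint.
    req = urllib.request.Request(
        "http://_api.internal/.api/proxy/v1/tokens/oidc",
        data=json.dumps({"aud": aud}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode()

def wire_up_aws_env(machine_name, token, path=TOKEN_PATH):
    # Save the token to a file and point the AWS SDK at it.
    with open(path, "w") as f:
        f.write(token)
    env = {
        "AWS_WEB_IDENTITY_TOKEN_FILE": path,
        "AWS_ROLE_SESSION_NAME": machine_name,
    }
    os.environ.update(env)  # inherited by every process init launches
    return env
```

Nothing here is AWS-specific except the variable names the AWS SDK happens to look for.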

&lt;p&gt;The AWS SDK, linked to your application, does all the rest.&lt;/p&gt;

&lt;p&gt;Let&amp;rsquo;s review: you add an &lt;code&gt;AWS_ROLE_ARN&lt;/code&gt; variable to your Fly App, launch a Machine, and have it go fetch a file from S3. What happens next is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;code&gt;init&lt;/code&gt; detects &lt;code&gt;AWS_ROLE_ARN&lt;/code&gt; is set as an environment variable.
&lt;/li&gt;&lt;li&gt;&lt;code&gt;init&lt;/code&gt; sends a request to &lt;code&gt;/v1/tokens/oidc&lt;/code&gt; via &lt;code&gt;/.api/proxy&lt;/code&gt;.
&lt;/li&gt;&lt;li&gt;&lt;code&gt;init&lt;/code&gt; writes the response to &lt;code&gt;/.fly/oidc_token&lt;/code&gt;.
&lt;/li&gt;&lt;li&gt;&lt;code&gt;init&lt;/code&gt; sets &lt;code&gt;AWS_WEB_IDENTITY_TOKEN_FILE&lt;/code&gt; and &lt;code&gt;AWS_ROLE_SESSION_NAME&lt;/code&gt;.
&lt;/li&gt;&lt;li&gt;The entrypoint boots, and (say) runs &lt;code&gt;aws s3api get-object&lt;/code&gt;.
&lt;/li&gt;&lt;li&gt;The AWS SDK runs through the &lt;a href='https://docs.aws.amazon.com/sdkref/latest/guide/standardized-credentials.html#credentialProviderChain' title=''&gt;credential provider chain&lt;/a&gt;.
&lt;/li&gt;&lt;li&gt;The SDK sees that &lt;code&gt;AWS_WEB_IDENTITY_TOKEN_FILE&lt;/code&gt; is set and calls &lt;code&gt;AssumeRoleWithWebIdentity&lt;/code&gt; with the file contents.
&lt;/li&gt;&lt;li&gt;AWS verifies the token against &lt;a href='https://oidc.fly.io/' title=''&gt;&lt;code&gt;https://oidc.fly.io/&lt;/code&gt;&lt;/a&gt;&lt;code&gt;example/.well-known/openid-configuration&lt;/code&gt;, which references a key Fly.io manages on isolated hardware.
&lt;/li&gt;&lt;li&gt;AWS vends &lt;code&gt;STS&lt;/code&gt; credentials for the assumed &lt;code&gt;Role&lt;/code&gt;.
&lt;/li&gt;&lt;li&gt;The SDK uses the &lt;code&gt;STS&lt;/code&gt; credentials to access the S3 bucket.
&lt;/li&gt;&lt;li&gt;AWS checks the &lt;code&gt;Role&lt;/code&gt;&amp;rsquo;s IAM policy to see if it has access to the S3 bucket.
&lt;/li&gt;&lt;li&gt;AWS returns the contents of the bucket object.
&lt;/li&gt;&lt;/ol&gt;
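Steps 6 and 7 are the SDK's job, but the manual equivalent is small. A sketch (the helper name is ours) of what the web-identity provider assembles from the environment variables `init` set up:

```python
import os

def web_identity_params(env=None):
    """Arguments for sts:AssumeRoleWithWebIdentity, read from the
    environment variables that init configured."""
    env = os.environ if env is None else env
    with open(env["AWS_WEB_IDENTITY_TOKEN_FILE"]) as f:
        token = f.read().strip()
    return {
        "RoleArn": env["AWS_ROLE_ARN"],
        "RoleSessionName": env["AWS_ROLE_SESSION_NAME"],
        "WebIdentityToken": token,
    }

# With boto3, the credential chain does this for you; by hand it's roughly:
#   boto3.client("sts").assume_role_with_web_identity(**web_identity_params())
```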
&lt;h2 id='how-much-better-is-this' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#how-much-better-is-this' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;How Much Better Is This?&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;It is a lot better.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;They asymptotically approach the security properties of Macaroon tokens.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Most importantly: AWS &lt;code&gt;STS&lt;/code&gt; credentials are short-lived. Because they&amp;rsquo;re generated dynamically, rather than stored in a configuration file or environment variable, they&amp;rsquo;re already a little bit annoying for an attacker to recover. But they&amp;rsquo;re also dead in minutes. They have a sharply limited blast radius. They rotate themselves, and fail closed.&lt;/p&gt;

&lt;p&gt;They&amp;rsquo;re also easier to manage. This is a rare instance where you can reasonably drive the entire AWS side of the process from within the web console. Your cloud team adds &lt;code&gt;Roles&lt;/code&gt; all the time; this is just a &lt;code&gt;Role&lt;/code&gt; with an extra snippet of JSON. The resulting ARN isn&amp;rsquo;t even a secret; your cloud team could just email or Slack message it back to you.&lt;/p&gt;

&lt;p&gt;Finally, they offer finer-grained control.&lt;/p&gt;

&lt;p&gt;To understand the last part, let&amp;rsquo;s look at that extra snippet of JSON (the &amp;ldquo;Trust Policy&amp;rdquo;) your cloud team is sticking on the new &lt;code&gt;cat-bucket&lt;/code&gt; &lt;code&gt;Role&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative "&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-x99m930o"&gt;{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Federated": "arn:aws:iam::123456123456:oidc-provider/oidc.fly.io/example"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
              "StringEquals": {
                "oidc.fly.io/example:aud": "sts.amazonaws.com"
              },
              "StringLike": {
                "oidc.fly.io/example:sub": "example:weather-cat:*"
              }
            }
        }
    ]
}
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;&lt;div class="right-sidenote"&gt;&lt;p&gt;The &lt;code&gt;aud&lt;/code&gt; check guarantees &lt;code&gt;STS&lt;/code&gt; will only honor tokens that Fly.io deliberately vended for &lt;code&gt;STS&lt;/code&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Recall the OIDC token we dumped earlier; much of what&amp;rsquo;s in it, we can match in the Trust Policy. Every OIDC token Fly.io generates is going to have a &lt;code&gt;sub&lt;/code&gt; field formatted &lt;code&gt;org:app:machine&lt;/code&gt;, so we can lock IAM &lt;code&gt;Roles&lt;/code&gt; down to organizations, or to specific Fly Apps, or even specific Fly Machine instances.&lt;/p&gt;
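The &lt;code&gt;StringLike&lt;/code&gt; wildcard match behaves a lot like shell globbing. A rough Python model of how that &lt;code&gt;sub&lt;/code&gt; condition gates tokens (an approximation: `fnmatchcase` also accepts `[seq]` patterns that AWS doesn't):

```python
from fnmatch import fnmatchcase

def sub_allowed(sub: str, pattern: str) -> bool:
    # AWS StringLike supports * and ? wildcards; matching is case-sensitive.
    # sub is formatted org:app:machine, so patterns can pin any of the three.
    return fnmatchcase(sub, pattern)
```

`example:weather-cat:*` admits any Machine in the `weather-cat` app; `example:*:*` admits the whole org; spelling out all three components pins a single Machine.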
&lt;figure class="post-cta"&gt;
  &lt;figcaption&gt;
    &lt;h1&gt;Speedrun your app onto Fly.io.&lt;/h1&gt;
    &lt;p&gt;3&amp;hellip;2&amp;hellip;1&amp;hellip;&lt;/p&gt;
      &lt;a class="btn btn-lg" href="https://fly.io/speedrun"&gt;
        Go! &amp;nbsp;&lt;span class='opacity-50'&gt;→&lt;/span&gt;
      &lt;/a&gt;
  &lt;/figcaption&gt;
  &lt;div class="image-container"&gt;
    &lt;img src="/static/images/cta-kitty.webp" srcset="/static/images/cta-kitty@2x.webp 2x" alt=""&gt;
  &lt;/div&gt;
&lt;/figure&gt;

&lt;h2 id='and-so' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#and-so' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;And So&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;In case it&amp;rsquo;s not obvious: this pattern works for any AWS API, not just S3.&lt;/p&gt;

&lt;p&gt;Our OIDC support on the platform and in Fly Machines will set arbitrary OIDC &lt;code&gt;audience&lt;/code&gt; strings, so you can use it to authenticate to any OIDC-compliant cloud provider. It won&amp;rsquo;t be as slick on Azure or GCP, because we haven&amp;rsquo;t done the &lt;code&gt;init&lt;/code&gt; features to light their APIs up with a single environment variable — but those features are easy, and we&amp;rsquo;re just waiting for people to tell us what they need.&lt;/p&gt;

&lt;p&gt;For us, the gold standard for least-privilege, conditional access tokens remains Macaroons, and it&amp;rsquo;s unlikely that we&amp;rsquo;re going to do a bunch of internal stuff using OIDC. We even snuck Macaroons into this feature. But the security you&amp;rsquo;re getting from this OIDC dance closes a lot of the gap between hardcoded user credentials and Macaroons, and it&amp;rsquo;s easy to use — easier, in some ways, than it is to manage role-based access inside of a legacy EC2 deployment!&lt;/p&gt;</content>
    </entry>
    <entry>
        <title>Picture This: Open Source AI for Image Description</title>
        <link rel="alternate" href="https://fly.io/blog/llm-image-description/"/>
        <id>https://fly.io/blog/llm-image-description/</id>
        <published>2024-05-09T00:00:00+00:00</published>
        <updated>2024-05-23T20:00:21+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/llm-image-description/assets/image-description-thumb.webp"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;I’m Nolan, and I work on Fly Machines here at Fly.io. We’re building a new public cloud—one where you can spin up CPU and GPU workloads, around the world, in a jiffy. &lt;a href="https://fly.io/speedrun/" title=""&gt;Try us out&lt;/a&gt;; you can be up and running in minutes. This is a post about LLMs being really helpful, and an extensible project you can build with open source on a weekend.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Picture this, if you will.&lt;/p&gt;

&lt;p&gt;You&amp;rsquo;re blind. You&amp;rsquo;re in an unfamiliar hotel room on a trip to Chicago.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;If you live in Chicago IRL, imagine the hotel in Winnipeg, &lt;a href="https://www.cbc.ca/history/EPISCONTENTSE1EP10CH3PA5LE.html" title=""&gt;the Chicago of the North&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;You&amp;rsquo;ve absent-mindedly set your coffee down, and can&amp;rsquo;t remember where. You&amp;rsquo;re looking for the thermostat so you don&amp;rsquo;t wake up frozen. Or, just maybe, you&amp;rsquo;re playing a fun-filled round of &amp;ldquo;find the damn light switch so your sighted partner can get some sleep already!&amp;rdquo;&lt;/p&gt;

&lt;p&gt;If, like me, you&amp;rsquo;ve been blind for a while, you have plenty of practice finding things without the luxury of a quick glance around. It may be more tedious than you&amp;rsquo;d like, but you&amp;rsquo;ll get it done.&lt;/p&gt;

&lt;p&gt;But the speed of innovation in machine learning and large language models has been dizzying, and in 2024 you can snap a photo with your phone and have an app like &lt;a href='https://www.bemyeyes.com/blog/announcing-be-my-ai/' title=''&gt;Be My AI&lt;/a&gt; or &lt;a href='https://www.seeingai.com/' title=''&gt;Seeing AI&lt;/a&gt; tell you where in that picture it found your missing coffee mug, or where it thinks the light switch is.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;Creative switch locations seem to be a point of pride for hotels, so the light game may be good for a few rounds of quality entertainment, regardless of how good your AI is.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;This is &lt;em&gt;big&lt;/em&gt;. It&amp;rsquo;s hard for me to state just how exciting and empowering AI image descriptions have been for me without sounding like a shill. In the past year, I&amp;rsquo;ve:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Found shit in strange hotel rooms. 
&lt;/li&gt;&lt;li&gt;Gotten descriptions of scenes and menus in otherwise inaccessible video games.
&lt;/li&gt;&lt;li&gt;Requested summaries of technical diagrams and other materials where details weren’t made available textually. 
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;I&amp;rsquo;ve been consistently blown away at how impressive and helpful AI-created image descriptions have been.&lt;/p&gt;

&lt;p&gt;Also&amp;hellip;&lt;/p&gt;
&lt;h2 id='which-thousand-words-is-this-picture-worth' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#which-thousand-words-is-this-picture-worth' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Which thousand words is this picture worth?&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;As a blind internet user for the last three decades, I have extensive empirical evidence to corroborate what you already know in your heart: humans are pretty flaky about writing useful alt text for all the images they publish. This does tend to make large swaths of the internet inaccessible to me!&lt;/p&gt;

&lt;p&gt;In just a few years, the state of image description on the internet has gone from complete reliance on the aforementioned lovable, but ultimately tragically flawed, humans, to automated strings of words like &lt;code&gt;Image may contain person, glasses, confusion, banality, disillusionment&lt;/code&gt;, to LLM-generated text that reads a lot like it was written by a person, perhaps sipping from a steaming cup of Earl Grey as they reflect on their previous experiences of a background that features a tree with snow on its branches, suggesting that this scene takes place during winter.&lt;/p&gt;

&lt;p&gt;If an image is missing alt text, or if you want a second opinion, there are screen-reader addons, like &lt;a href='https://github.com/cartertemm/AI-content-describer/' title=''&gt;this one&lt;/a&gt; for &lt;a href='https://www.nvaccess.org/download/' title=''&gt;NVDA&lt;/a&gt;, that you can use with an API key to get image descriptions from GPT-4 or Google Gemini as you read. This is awesome! &lt;/p&gt;

&lt;p&gt;And this brings me to the nerd snipe. How hard would it be to build an image description service we can host ourselves, using open source technologies? It turns out to be spookily easy.&lt;/p&gt;

&lt;p&gt;Here&amp;rsquo;s what I came up with:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href='https://ollama.com/' title=''&gt;Ollama&lt;/a&gt; to run the model
&lt;/li&gt;&lt;li&gt;A &lt;a href='https://pocketbase.io' title=''&gt;PocketBase&lt;/a&gt; project that provides a simple authenticated API for users to submit images, get descriptions, and ask followup questions about the image
&lt;/li&gt;&lt;li&gt;The simplest possible Python client to interact with the PocketBase app on behalf of users
&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;The idea is to keep it modular and hackable, so if sentiment analysis or joke creation is your thing, you can swap out image description for that and have something going in, like, a weekend.&lt;/p&gt;

&lt;p&gt;If you&amp;rsquo;re like me, and you go skipping through recipe blogs to find the &amp;ldquo;go directly to recipe&amp;rdquo; link, find the code itself &lt;a href='https://github.com/superfly/llm-describer' title=''&gt;here&lt;/a&gt;. &lt;/p&gt;
&lt;h2 id='the-llm-is-the-easiest-part' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-llm-is-the-easiest-part' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;The LLM is the easiest part&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;An API to accept images and prompts, run the model, and spit 
out answers sounds like a lot! But it&amp;rsquo;s the simplest part of this whole thing, because: 
that&amp;rsquo;s &lt;a href='https://ollama.com/' title=''&gt;Ollama&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;You can just run the Ollama Docker image, get it to grab the model 
you want to use, and that&amp;rsquo;s it. There&amp;rsquo;s your AI server. (We have a &lt;a href='https://fly.io/blog/scaling-llm-ollama/' title=''&gt;blog post&lt;/a&gt; 
all about deploying Ollama on Fly.io; Fly GPUs are rad, try &amp;#39;em out, etc.).&lt;/p&gt;

&lt;p&gt;For this project, we need a model that can make sense&amp;mdash;or at least words&amp;mdash;out of a picture. 
&lt;a href='https://llava-vl.github.io/' title=''&gt;LLaVA&lt;/a&gt; is a trained, Apache-licensed &amp;ldquo;large multimodal model&amp;rdquo; that fits the bill. 
Get the model with the Ollama CLI:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative cmd"&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight cmd'&gt;&lt;code id="code-vsa102iz"&gt;ollama pull llava:34b
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;&lt;div class="callout"&gt;&lt;p&gt;If you have hardware that can handle it, you could run this on your computer at home. If you run AI models on a cloud provider, be aware that GPU compute is expensive! &lt;strong class="font-semibold text-navy-950"&gt;It’s important to take steps to ensure you’re not paying for a massive GPU 24/7.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;On Fly.io, at the time of writing, you’d achieve this with the &lt;a href="https://fly.io/docs/apps/autostart-stop/" title=""&gt;autostart and autostop&lt;/a&gt; functions of the Fly Proxy, restricting Ollama access to internal requests over &lt;a href="https://fly.io/docs/networking/private-networking/#flycast-private-fly-proxy-services" title=""&gt;Flycast&lt;/a&gt; from the PocketBase app. That way, if there haven’t been any requests for a few minutes, the Fly Proxy stops the Ollama &lt;a href="https://fly.io/docs/machines/" title=""&gt;Machine&lt;/a&gt;, which releases the CPU, GPU, and RAM allocated to it. &lt;a href="https://fly.io/blog/scaling-llm-ollama/" title=""&gt;Here’s a post&lt;/a&gt; that goes into more detail. &lt;/p&gt;
&lt;/div&gt;&lt;h2 id='a-multi-tool-on-the-backend' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#a-multi-tool-on-the-backend' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;A multi-tool on the backend&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;I want user auth to make sure that not just anyone can grab my &amp;ldquo;image description service&amp;rdquo; and keep it busy generating short stories about their cat. If I build this out into a service for others to use, I might also want business logic around plans or
credits, or mobile-friendly APIs for use in the field. &lt;a href='https://pocketbase.io' title=''&gt;PocketBase&lt;/a&gt; provides a scaffolding for all of it. It&amp;rsquo;s a Swiss army knife: a Firebase-like API on top of SQLite, complete with authentication, authorization, an admin UI, extensibility in JavaScript and Go, and various client-side APIs.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;Yes, &lt;em&gt;of course&lt;/em&gt; I’ve used an LLM to generate feline fanfic. Theme songs too. Hasn’t everyone? &lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;I &amp;ldquo;faked&amp;rdquo; a task-specific API that supports followup questions by extending PocketBase in Go, modeling requests and responses as &lt;a href='https://pocketbase.io/docs/collections/' title=''&gt;collections&lt;/a&gt; (i.e. SQLite tables) with &lt;a href='https://pocketbase.io/docs/go-event-hooks/' title=''&gt;event hooks&lt;/a&gt; to trigger pre-set interactions with the Ollama app (via &lt;a href='https://tmc.github.io/langchaingo' title=''&gt;LangChainGo&lt;/a&gt;) and the client (via the PocketBase API).&lt;/p&gt;

&lt;p&gt;If you&amp;rsquo;re following along, &lt;a href='https://github.com/superfly/llm-describer/blob/main/describer.go' title=''&gt;here&amp;rsquo;s the module&lt;/a&gt;
that handles all that, along with initializing the LLM connection.&lt;/p&gt;

&lt;p&gt;In a nutshell, this is the dance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When a user uploads an image, a hook on the &lt;code&gt;images&lt;/code&gt; collection sends the image to Ollama, along with this prompt:
&lt;code&gt;&amp;quot;You are a helpful assistant describing images for blind screen reader users. Please describe this image.&amp;quot;&lt;/code&gt;
&lt;/li&gt;&lt;li&gt;Ollama sends back its response, which the backend 1) passes back to the client and 2) stores in its &lt;code&gt;followups&lt;/code&gt; collection for future reference.
&lt;/li&gt;&lt;li&gt;If the user responds with a followup question about the image and description, that also 
goes into the &lt;code&gt;followups&lt;/code&gt; collection; user-initiated changes to this collection trigger a hook to chain the new 
followup question with the image and the chat history into a new request for the model.
&lt;/li&gt;&lt;li&gt;Lather, rinse, repeat.
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;This is a super simple hack to handle followup questions, and it’ll let you keep adding followups until 
something breaks. You&amp;rsquo;ll see the quality of responses get poorer&amp;mdash;possibly incoherent&amp;mdash;as the context 
exceeds the context window.&lt;/p&gt;
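Concretely, the request the hook assembles might look like this sketch. The field names follow Ollama's `/api/generate` API; the prompt is the one above; the transcript format is an assumption of mine, not the repo's exact code:

```python
import base64

DESCRIBE_PROMPT = ("You are a helpful assistant describing images for blind "
                   "screen reader users. Please describe this image.")

def build_ollama_request(image_bytes, history=(), question=None, model="llava:34b"):
    """Payload for POST /api/generate on Ollama; history is (speaker, text) pairs."""
    if question is None:
        prompt = DESCRIBE_PROMPT
    else:
        # Naive followup chaining: replay the whole transcript every time.
        # Response quality degrades once this outgrows the context window.
        transcript = "\n".join(f"{who}: {text}" for who, text in history)
        prompt = f"{DESCRIBE_PROMPT}\n{transcript}\nuser: {question}"
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode()],
        "stream": False,
    }
```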

&lt;p&gt;I also set up &lt;a href='https://pocketbase.io/docs/api-rules-and-filters/' title=''&gt;API rules&lt;/a&gt; in PocketBase,
ensuring that users can&amp;rsquo;t read from or write to others&amp;rsquo; chats with the AI.&lt;/p&gt;

&lt;p&gt;If image descriptions aren&amp;rsquo;t your thing, this business logic is easily swappable 
for joke generation, extracting details from text, or any other simple task you 
might want to throw at an LLM. Just slot the best model into Ollama (LLaVA is pretty OK as a general starting point too), and match the PocketBase schema and pre-set prompts to your application.&lt;/p&gt;
&lt;h2 id='a-seedling-of-a-client' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#a-seedling-of-a-client' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;A seedling of a client&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;With the image description service in place, the user can talk to it with any client that speaks the PocketBase API. PocketBase already has SDK clients in JavaScript and Dart, but because my screen reader is &lt;a href='https://github.com/nvaccess/nvda' title=''&gt;written in Python&lt;/a&gt;, I went with a &lt;a href='https://pypi.org/project/pocketbase/' title=''&gt;community-created Python library&lt;/a&gt;. That way I can build this out into an NVDA add-on 
if I want to.&lt;/p&gt;

&lt;p&gt;If you&amp;rsquo;re a fancy Python developer, you probably have your preferred tooling for
handling virtualenvs and friends. I&amp;rsquo;m not, and since my screen reader doesn&amp;rsquo;t use those
anyway, I just &lt;code&gt;pip install&lt;/code&gt;ed the library so my client can import it:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative cmd"&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight cmd'&gt;&lt;code id="code-rgh35fwn"&gt;pip install pocketbase
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;a href='https://github.com/superfly/llm-describer/blob/main/__init__.py' title=''&gt;My client&lt;/a&gt; is a very simple script. 
It expects a couple of things: a file called &lt;code&gt;image.jpg&lt;/code&gt;, located in the current directory, 
and environment variables to provide the service URL and user credentials to log into it with.&lt;/p&gt;

&lt;p&gt;When you run the client script, it uploads the image to the user’s &lt;code&gt;images&lt;/code&gt; collection on the 
backend app, starting the back-and-forth between user and model we saw in the previous section. 
The client prints the model&amp;rsquo;s output to the CLI and prompts the user to input a followup question, 
which it passes up to the &lt;code&gt;followups&lt;/code&gt; collection, and so on.&lt;/p&gt;
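A sketch of the front half of that loop against PocketBase's REST API. The env var names here are my own stand-ins (the post doesn't specify them), and the real client uses the community SDK rather than raw HTTP:

```python
import json
import os
import urllib.request

def auth_payload(env=None):
    """Body for PocketBase's auth-with-password endpoint."""
    env = os.environ if env is None else env
    return {"identity": env["DESCRIBER_EMAIL"],
            "password": env["DESCRIBER_PASSWORD"]}

def authenticate(base_url, env=None):
    # POST /api/collections/users/auth-with-password returns JSON with a
    # "token" field, sent as the Authorization header on later requests.
    req = urllib.request.Request(
        f"{base_url}/api/collections/users/auth-with-password",
        data=json.dumps(auth_payload(env)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["token"]

# The image itself goes up as a multipart POST to
# /api/collections/images/records using that token; followup questions are
# records created in the followups collection the same way.
```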
&lt;figure class="post-cta"&gt;
  &lt;figcaption&gt;
    &lt;h1&gt;This can run on Fly.io.&lt;/h1&gt;
    &lt;p&gt;Run your LLM on a datacenter-grade GPU.&lt;/p&gt;
      &lt;a class="btn btn-lg" href="https://fly.io/gpu/"&gt;
        Try out a Fly GPU &amp;nbsp;&lt;span class='opacity-50'&gt;→&lt;/span&gt;
      &lt;/a&gt;
  &lt;/figcaption&gt;
  &lt;div class="image-container"&gt;
    &lt;img src="/static/images/cta-dog.webp" srcset="/static/images/cta-dog@2x.webp 2x" alt=""&gt;
  &lt;/div&gt;
&lt;/figure&gt;

&lt;h2 id='all-together-now' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#all-together-now' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;All together now&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;I grabbed &lt;a href='https://unsplash.com/photos/brown-trees-beside-river-under-blue-sky-during-daytime-kSvpTrfhaiU' title=''&gt;this
image&lt;/a&gt;
and saved it to a file called &lt;em&gt;image.jpg&lt;/em&gt;. &lt;/p&gt;

&lt;p&gt;While I knew I was downloading an image of a winter scene, all I see on Unsplash is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;brown trees beside river under blue sky during daytime Bright winter landscape
with lake, snow, forest, beautiful blue sky and white clouds. An example of
charming wildlife in Russia.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Let&amp;rsquo;s see what our very own AI describer thinks of this picture:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative plain"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-4bvdtqnv"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-4bvdtqnv"&gt;$ python __init__.py
The image depicts a serene winter scene. There is a small stream of water flowing through the center, with patches of snow on its surface and along the banks. The surrounding area appears to be a mix of bare trees and shrubs, suggesting it's late in the season or possibly early spring when some plants have started to grow again but haven't yet leafed out.

The sky is clear and blue, indicating good weather conditions at the time the photo was taken. The sunlight seems to be coming from the left side of the image, casting a warm glow on the scene and creating reflections in the water. There are no visible texts or distinguishing marks that provide additional context about the location or the photographer.

The overall atmosphere is peaceful and natural, with the quiet beauty of the landscape undisturbed except for the gentle movement of the stream.
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Is it a stellar description? Maybe not, but it certainly gives me a better sense of connection with the scene.&lt;/p&gt;

&lt;p&gt;Let&amp;rsquo;s see how our describer copes with a followup question.&lt;/p&gt;
&lt;div class="highlight-wrapper group relative plain"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-1x1pz9x9"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-1x1pz9x9"&gt;Enter your followup question, or 'quit' to quit: What types of trees are in the image?
Sending followup question
It's difficult to determine the exact species of trees in the image without more specific details, as many deciduous trees lose their leaves during winter and look quite similar. However, based on the general appearance of the bare branches and the fact that they are leafless, it is likely that these are common deciduous trees such as oaks, maples, or elms.

The presence of snow and the absence of foliage suggest that this photo was taken during winter or early spring when the trees have not yet started to bloom again. The exact species would require a closer examination of the bark, buds, and other distinguishing features, which are not clearly visible in this image.
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Boo, the general-purpose LLaVA model couldn&amp;rsquo;t identify the leafless trees. At least it knows why it can&amp;rsquo;t. Maybe there&amp;rsquo;s a better model out 
there for that. Or we could train one, if we really needed tree identification! We could make every component of 
this service more sophisticated! &lt;/p&gt;

&lt;p&gt;But the fact that I, personally, can make a proof of concept like this with a few days of effort
continues to boggle my mind. Thanks to a handful of amazing open source projects, it&amp;rsquo;s really, spookily, easy. And from here, I (or you) can build out a screen-reader addon, or a mobile app, or a different kind of AI service, with modular changes.&lt;/p&gt;
&lt;h2 id='deployment-notes' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#deployment-notes' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Deployment notes&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;On Fly.io, stopping GPU Machines saves you a bunch of money and some carbon footprint, in return for cold-start latency when you make a request for the first time in more than a few minutes. In testing this project, on the &lt;code&gt;a100-40gb&lt;/code&gt; Fly Machine preset, the 34b-parameter LLaVA model took several seconds to generate each response. If the Machine was stopped when the request came in, starting it up took another handful of seconds, followed by several tens of seconds to load the model into GPU RAM. The total time from cold start to completed description was about 45 seconds. Just something to keep in mind.&lt;/p&gt;

&lt;p&gt;If you&amp;rsquo;re running Ollama in the cloud, you likely want to put the model onto storage that&amp;rsquo;s persistent, so you don&amp;rsquo;t have to download it repeatedly. You could also build the model into a Docker image ahead of deployment.&lt;/p&gt;
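&lt;p&gt;On Fly.io, that persistent storage might look like a volume mounted over Ollama&amp;rsquo;s model directory. A hedged sketch (the volume name is made up; &lt;code&gt;/root/.ollama&lt;/code&gt; is Ollama&amp;rsquo;s default location, overridable with the &lt;code&gt;OLLAMA_MODELS&lt;/code&gt; environment variable):&lt;/p&gt;

```toml
# fly.toml excerpt: keep downloaded models on a persistent volume
# so a restarted Machine doesn't have to re-pull them.
[mounts]
  source = "ollama_models"      # hypothetical volume name
  destination = "/root/.ollama" # Ollama's default model directory
```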

&lt;p&gt;The PocketBase Golang app compiles to a single executable that you can run wherever.
I run it on Fly.io, unsurprisingly, and the &lt;a href='https://github.com/superfly/llm-describer/' title=''&gt;repo&lt;/a&gt; comes with a Dockerfile and a &lt;a href='https://fly.io/docs/reference/configuration/' title=''&gt;&lt;code&gt;fly.toml&lt;/code&gt;&lt;/a&gt; config file, which you can edit to point at your own Ollama instance. It uses a small persistent storage volume for the SQLite database. Under testing, it runs fine on a &lt;code&gt;shared-cpu-1x&lt;/code&gt; Machine. &lt;/p&gt;</content>
    </entry>
    <entry>
        <title>JIT WireGuard</title>
        <link rel="alternate" href="https://fly.io/blog/jit-wireguard-peers/"/>
        <id>https://fly.io/blog/jit-wireguard-peers/</id>
        <published>2024-03-12T00:00:00+00:00</published>
        <updated>2024-05-23T20:00:21+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/jit-wireguard-peers/assets/network-thumbnail.webp"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;We’re Fly.io and we transmute containers into VMs, running them on our hardware around the world with the power of Firecracker alchemy. We do a lot of stuff with WireGuard, which has become a part of our customer API. This is a quick story about some tricks we played to make WireGuard faster and more scalable for the hundreds of thousands of people who now use it here.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;One of many odd decisions we&amp;rsquo;ve made at Fly.io is how we use WireGuard. It&amp;rsquo;s not just that we use it in many places where other shops would use HTTPS and REST APIs. We&amp;rsquo;ve gone a step beyond that: every time you run &lt;code&gt;flyctl&lt;/code&gt;, our lovable, sprawling CLI, it conjures a TCP/IP stack out of thin air, with its own IPv6 address, and speaks directly to Fly Machines running on our networks.&lt;/p&gt;

&lt;p&gt;There are plusses and minuses to this approach, which we talked about &lt;a href='https://fly.io/blog/our-user-mode-wireguard-year/' title=''&gt;in a blog post a couple years back&lt;/a&gt;. Some things, like remote-operated Docker builders, get easier to express (a Fly Machine, as far as &lt;code&gt;flyctl&lt;/code&gt; is concerned, might as well be on the same LAN). But everything generally gets trickier to keep running reliably.&lt;/p&gt;

&lt;p&gt;It was a decision. We own it.&lt;/p&gt;

&lt;p&gt;Anyways, we&amp;rsquo;ve made some improvements recently, and I&amp;rsquo;d like to talk about them.&lt;/p&gt;
&lt;h2 id='where-we-left-off' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#where-we-left-off' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Where we left off&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Until a few weeks ago, our gateways ran on a pretty simple system.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;We operate dozens of &amp;ldquo;gateway&amp;rdquo; servers around the world, whose sole purpose is to accept incoming WireGuard connections and connect them to the appropriate private networks.
&lt;/li&gt;&lt;li&gt;Any time you run &lt;code&gt;flyctl&lt;/code&gt; and it needs to talk to a Fly Machine (to build a container, pop an SSH console, copy files, or proxy to a service you&amp;rsquo;re running), it spawns or connects to a background agent process.
&lt;/li&gt;&lt;li&gt;The first time it runs, the agent generates a new WireGuard peer configuration from our GraphQL API. WireGuard peer configurations are very simple: just a public key and an address to connect to.
&lt;/li&gt;&lt;li&gt;Our API in turn takes that peer configuration and sends it to the appropriate gateway (say, &lt;code&gt;ord&lt;/code&gt;, if you&amp;rsquo;re near Chicago) via an RPC we send over the NATS messaging system.
&lt;/li&gt;&lt;li&gt;On the gateway, a service called &lt;code&gt;wggwd&lt;/code&gt; accepts that configuration, saves it to a SQLite database, and adds it to the kernel using WireGuard&amp;rsquo;s Golang libraries. &lt;code&gt;wggwd&lt;/code&gt; acknowledges the installation of the peer to the API.
&lt;/li&gt;&lt;li&gt;The API replies to your GraphQL request, with the configuration.
&lt;/li&gt;&lt;li&gt;Your &lt;code&gt;flyctl&lt;/code&gt; connects to the WireGuard peer, which works, because you receiving the configuration means it’s installed on the gateway.
&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;I copy-pasted those last two bullet points from &lt;a href='https://fly.io/blog/our-user-mode-wireguard-year/' title=''&gt;that two-year-old post&lt;/a&gt;, because when it works, it does &lt;em&gt;just work&lt;/em&gt; reasonably well. (We ultimately did end up defaulting everybody to WireGuard-over-WebSockets, though.)&lt;/p&gt;

&lt;p&gt;But if it always worked, we wouldn&amp;rsquo;t be here, would we?&lt;/p&gt;

&lt;p&gt;We ran into two annoying problems:&lt;/p&gt;

&lt;p&gt;One: NATS is fast, but doesn’t guarantee delivery. Back in 2022, Fly.io was pretty big on NATS internally. We&amp;rsquo;ve moved away from it. For instance, our &lt;a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''&gt;internal &lt;code&gt;flyd&lt;/code&gt; API&lt;/a&gt; used to be driven by NATS; today, it&amp;rsquo;s HTTP.  Our NATS cluster was losing too many messages to host a reliable API on it. Scaling back our use of NATS made WireGuard gateways better, but still not great.&lt;/p&gt;

&lt;p&gt;Two: When &lt;code&gt;flyctl&lt;/code&gt; exits, the WireGuard peer it created sticks around on the gateway. Nothing cleans up old peers. After all, you&amp;rsquo;re likely going to come back tomorrow and deploy a new version of your app, or &lt;code&gt;fly ssh console&lt;/code&gt; into it to debug something. Why remove a peer just to re-add it the next day?&lt;/p&gt;

&lt;p&gt;Unfortunately, the vast majority of peers are created by &lt;code&gt;flyctl&lt;/code&gt; in CI jobs, which don&amp;rsquo;t have persistent storage and can&amp;rsquo;t reconnect to the same peer the next run; they generate new peers every time, no matter what.&lt;/p&gt;

&lt;p&gt;So, we ended up with a not-reliable-enough provisioning system, and gateways with hundreds of thousands of peers that will never be used again. The high stale peer count made kernel WireGuard operations very slow - especially loading all the peers back into the kernel after a gateway server reboot - and even triggered some kernel panics.&lt;/p&gt;

&lt;p&gt;There had to be&lt;/p&gt;
&lt;h2 id='a-better-way' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#a-better-way' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;A better way.&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Storing bajillions of WireGuard peers is no big challenge for any serious n-tier RDBMS. This isn&amp;rsquo;t &amp;ldquo;big data&amp;rdquo;. The problem we have at Fly.io is that our gateways don&amp;rsquo;t have serious n-tier RDBMSs. They&amp;rsquo;re small. Scrappy. They live off the land.&lt;/p&gt;

&lt;p&gt;Seriously, though: you could store every WireGuard peer everybody has ever used at Fly.io in a single SQLite database, easily.  What you can&amp;rsquo;t do is store them all in the Linux kernel.&lt;/p&gt;

&lt;p&gt;So, at some point, as you push more and more peer configurations to a gateway, you have to start making decisions about which peers you&amp;rsquo;ll enable in the kernel, and which you won&amp;rsquo;t.&lt;/p&gt;

&lt;p&gt;Wouldn&amp;rsquo;t it be nice if we just didn&amp;rsquo;t have this problem? What if, instead of pushing configs to gateways, we had the gateways pull them from our API on demand?&lt;/p&gt;

&lt;p&gt;If you did that, peers would only have to be added to the kernel when the client wanted to connect. You could yeet them out of the kernel any time you wanted; the next time the client connected, they&amp;rsquo;d just get pulled again, and everything would work fine.&lt;/p&gt;

&lt;p&gt;The problem you quickly run into when building this design is that Linux kernel WireGuard doesn&amp;rsquo;t have a feature for installing peers on demand. However:&lt;/p&gt;
&lt;h2 id='it-is-possible-to-jit-wireguard-peers' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#it-is-possible-to-jit-wireguard-peers' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;It is possible to JIT WireGuard peers&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;The Linux kernel&amp;rsquo;s &lt;a href='https://github.com/WireGuard/wgctrl-go' title=''&gt;interface for configuring WireGuard&lt;/a&gt; is &lt;a href='https://docs.kernel.org/userspace-api/netlink/intro.html' title=''&gt;Netlink&lt;/a&gt; (which is basically a way to create a userland socket to talk to a kernel service). Here&amp;rsquo;s a &lt;a href='https://github.com/WireGuard/wg-dynamic/blob/master/netlink.h' title=''&gt;summary of it as a C API&lt;/a&gt;. Note that there&amp;rsquo;s no API call to subscribe for &amp;ldquo;incoming connection attempt&amp;rdquo; events.&lt;/p&gt;

&lt;p&gt;That&amp;rsquo;s OK! We can just make our own events. WireGuard connection requests are packets, and they&amp;rsquo;re easily identifiable, so we can efficiently snatch them with a BPF filter and a &lt;a href='https://github.com/google/gopacket' title=''&gt;packet socket&lt;/a&gt;.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;Most of the time, it’s even easier for us to get the raw WireGuard packets, because our users now default to WebSockets WireGuard (which is just an unauthenticated WebSockets connect that shuttles framed UDP packets to and from an interface on the gateway), so that people who have trouble talking end-to-end in UDP can bring connections up.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;We own the daemon code for that, and can just hook the packet receive function to snarf WireGuard packets.&lt;/p&gt;

&lt;p&gt;It&amp;rsquo;s not obvious, but WireGuard doesn&amp;rsquo;t have notions of &amp;ldquo;client&amp;rdquo; or &amp;ldquo;server&amp;rdquo;. It&amp;rsquo;s a pure point-to-point protocol; peers connect to each other when they have traffic to send. The first peer to connect is called the &lt;strong class='font-semibold text-navy-950'&gt;initiator&lt;/strong&gt;, and the peer it connects to is the &lt;strong class='font-semibold text-navy-950'&gt;responder&lt;/strong&gt;.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;&lt;a href="https://www.wireguard.com/papers/wireguard.pdf" title=""&gt;&lt;em&gt;The WireGuard paper&lt;/em&gt;&lt;/a&gt; &lt;em&gt;is a good read.&lt;/em&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;For Fly.io, &lt;code&gt;flyctl&lt;/code&gt; is typically our initiator,  sending a single UDP packet to the gateway, which is the responder. According &lt;a href='https://www.wireguard.com/papers/wireguard.pdf' title=''&gt;to the WireGuard paper&lt;/a&gt;, this first packet is a &lt;code&gt;handshake initiation&lt;/code&gt;.  It gets better: the packet type is recorded in a single plaintext byte. So this simple BPF filter catches all the incoming connections: &lt;code&gt;udp and dst port 51820 and udp[8] = 1&lt;/code&gt;.&lt;/p&gt;
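&lt;p&gt;In userland, the same test is a couple of byte comparisons. A minimal sketch (per the WireGuard paper, the message type is a little-endian u32 in the first four bytes, and a handshake initiation message is exactly 148 bytes):&lt;/p&gt;

```python
HANDSHAKE_INITIATION = 1  # msg.type value from the WireGuard paper
INITIATION_LEN = 148      # fixed size of a handshake initiation message

def is_handshake_initiation(udp_payload: bytes) -> bool:
    # Equivalent to the BPF filter's `udp[8] = 1` test: the first byte of the
    # UDP payload is the low byte of the little-endian message-type field,
    # and the three reserved bytes that follow must be zero.
    return (
        len(udp_payload) == INITIATION_LEN
        and udp_payload[0] == HANDSHAKE_INITIATION
        and udp_payload[1:4] == b"\x00\x00\x00"
    )
```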

&lt;p&gt;In most other protocols, we&amp;rsquo;d be done at this point; we&amp;rsquo;d just scrape the username or whatnot out of the packet, go fetch the matching configuration, and install it in the kernel. With WireGuard, not so fast. WireGuard is based on Trevor Perrin&amp;rsquo;s &lt;a href='http://www.noiseprotocol.org/' title=''&gt;Noise Protocol Framework&lt;/a&gt;, and Noise goes way out of its way to &lt;a href='http://www.noiseprotocol.org/noise.html#identity-hiding' title=''&gt;hide identities&lt;/a&gt; during handshakes. To identify incoming requests, we&amp;rsquo;ll need to run enough Noise cryptography to decrypt the identity.&lt;/p&gt;

&lt;p&gt;The code to do this is fussy, but it&amp;rsquo;s relatively short (about 200 lines). Helpfully, the kernel Netlink interface will give a privileged process the private key for an interface, so the secrets we need to unwrap WireGuard are easy to get. Then it&amp;rsquo;s just a matter of running the first bit of the Noise handshake. If you&amp;rsquo;re that kind of nerdy, &lt;a href='https://gist.github.com/tqbf/9f2c2852e976e6566f962d9bca83062b' title=''&gt;here&amp;rsquo;s the code.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At this point, we have the event feed we wanted: the public keys of every user trying to make a WireGuard connection to our gateways. We keep a rate-limited cache in SQLite, and when we see new peers, we&amp;rsquo;ll make an internal HTTP API request to fetch the matching peer information and install it. This fits nicely into the little daemon that already runs on our gateways to manage WireGuard, and allows us to ruthlessly and recklessly remove stale peers with a &lt;code&gt;cron&lt;/code&gt; job.&lt;/p&gt;

&lt;p&gt;But wait! There&amp;rsquo;s more! We bounced this plan off Jason Donenfeld, and he tipped us off on a sneaky feature of the Linux WireGuard Netlink interface.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;Jason is the hardest working person in show business.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Our API fetch for new peers is generally not going to be fast enough to respond to the first handshake initiation message a new client sends us. That&amp;rsquo;s OK; WireGuard is pretty fast about retrying. But we can do better.&lt;/p&gt;

&lt;p&gt;When we get an incoming initiation message, we have the 4-tuple address of the desired connection, including the ephemeral source port &lt;code&gt;flyctl&lt;/code&gt; is using. We can install the peer as if we&amp;rsquo;re the initiator, and &lt;code&gt;flyctl&lt;/code&gt; is the responder. The Linux kernel will initiate a WireGuard connection back to &lt;code&gt;flyctl&lt;/code&gt;. This works; the protocol doesn&amp;rsquo;t care a whole lot who&amp;rsquo;s the server and who&amp;rsquo;s the client. We get new connections established about as fast as they can possibly be installed.&lt;/p&gt;
&lt;figure class="post-cta"&gt;
  &lt;figcaption&gt;
    &lt;h1&gt;Launch an app in minutes&lt;/h1&gt;
    &lt;p&gt;Speedrun an app onto Fly.io and get your own JIT WireGuard peer&amp;nbsp;✨&lt;/p&gt;
      &lt;a class="btn btn-lg" href="/docs/speedrun/"&gt;
        Speedrun &amp;nbsp;&lt;span class='opacity-50'&gt;→&lt;/span&gt;
      &lt;/a&gt;
  &lt;/figcaption&gt;
  &lt;div class="image-container"&gt;
    &lt;img src="/static/images/cta-cat.webp" srcset="/static/images/[email protected] 2x" alt=""&gt;
  &lt;/div&gt;
&lt;/figure&gt;

&lt;h2 id='look-at-this-graph' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#look-at-this-graph' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Look at this graph&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;We&amp;rsquo;ve been running this in production for a few weeks and we&amp;rsquo;re feeling pretty happy about it. We went from thousands, or hundreds of thousands, of stale WireGuard peers on a gateway to what rounds to none. Gateways now hold a lot less state, are faster at setting up peers, and can be rebooted without having to wait for many unused peers to be loaded back into the kernel.&lt;/p&gt;

&lt;p&gt;I&amp;rsquo;ll leave you with this happy Grafana chart from the day of the switchover.&lt;/p&gt;

&lt;p&gt;&lt;img alt="a Grafana chart of &amp;#39;kernel_stale_wg_peer_count&amp;#39; vs. time. For the first few hours, all traces are flat. Most are at values between 0 and 50,000 and the top-most is just under 550,000. Towards the end of the graph, each line in turn jumps sharply down to the bottom, and at the end of the chart all datapoints are indistinguishable from 0." src="/blog/jit-wireguard-peers/assets/wireguard-peers-graph.webp" /&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong class='font-semibold text-navy-950'&gt;Editor&amp;rsquo;s note:&lt;/strong&gt; Despite our tearful protests, Lillian has decided to move on from Fly.io to explore new pursuits. We wish her much success and happiness!&amp;nbsp;✨&lt;/p&gt;</content>
    </entry>
    <entry>
        <title>Fly Kubernetes does more now</title>
        <link rel="alternate" href="https://fly.io/blog/fks-beta-live/"/>
        <id>https://fly.io/blog/fks-beta-live/</id>
        <published>2024-03-07T00:00:00+00:00</published>
        <updated>2024-04-24T22:38:38+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/fks-beta-live/assets/fks-thumb.webp"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;Eons ago, we &lt;a href="https://fly.io/blog/fks/" title=""&gt;announced&lt;/a&gt; we were working on &lt;a href="https://fly.io/docs/kubernetes/" title=""&gt;Fly Kubernetes&lt;/a&gt;. It drummed up enough excitement to prove we were heading in the right direction. So, we got hard to work to get from barebones “early access” to a beta release. We’ll be onboarding customers to the closed beta over the next few weeks. Email us at &lt;a href="mailto:[email protected]"&gt;[email protected]&lt;/a&gt; and we’ll hook you up.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Fly Kubernetes is the &amp;ldquo;blessed path&amp;rdquo;™️ to using Kubernetes backed by Fly.io infrastructure. Or, in simpler terms, it is our managed Kubernetes service. We take care of the complexity of operating the Kubernetes control plane, leaving you with the unfettered joy of deploying your Kubernetes workloads. If you love Fly.io and K8s, this product is for you.&lt;/p&gt;
&lt;h2 id='what-even-is-a-kubernete' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-even-is-a-kubernete' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;What even is a Kubernete?&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;So how did this all come to be&amp;mdash;and what even is a Kubernete?&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;You can see more fun details in &lt;a href="https://fly.io/blog/fks/" title=""&gt;Introducing Fly Kubernetes&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;If you wade through all the YAML and &lt;a href='https://landscape.cncf.io/' title=''&gt;CNCF projects&lt;/a&gt;, what&amp;rsquo;s left is an API for declaring workloads and how they should be accessed. &lt;/p&gt;

&lt;p&gt;But that&amp;rsquo;s not what people usually talk / groan about. It&amp;rsquo;s everything else that comes along with adopting Kubernetes: a container runtime (CRI), networking between workloads (CNI), which leads to DNS (CoreDNS). Then you layer on Prometheus for metrics and whatever the logging daemon du jour is at the time. Now you get to debate which Ingress&amp;mdash;strike that&amp;mdash;&lt;em&gt;Gateway&lt;/em&gt; API to deploy, and if the next thing has anything to do with a Service Mess, then as they like to say where I live, &amp;ldquo;bless your heart&amp;rdquo;.&lt;/p&gt;

&lt;p&gt;Finally, there&amp;rsquo;s capacity planning. You&amp;rsquo;ve got to pick and choose where, how and what the &lt;a href='https://kubernetes.io/docs/concepts/architecture/nodes/' title=''&gt;Nodes&lt;/a&gt; will look like in order to configure and run the workloads.&lt;/p&gt;

&lt;p&gt;When we began thinking about what a Fly Kubernetes Service could look like, we started from first principles, as we do with most everything here. The best way we can describe it is the &lt;a href='https://www.youtube.com/watch?v=Ddk9ci6geSs' title=''&gt;scene from Iron Man 2 when Tony Stark discovers a new element&lt;/a&gt;. As he&amp;rsquo;s looking at the knowledge left behind by those that came before, he starts to imagine something entirely different and more capable than could have been accomplished previously. That&amp;rsquo;s what happened to JP, but with K3s and Virtual Kubelet.&lt;/p&gt;
&lt;h2 id='ok-then-wtf-whats-the-fks' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#ok-then-wtf-whats-the-fks' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;OK then, WTF (what&amp;rsquo;s the FKS)?&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;We looked at what people need to get started—the API—and then started peeling away all the noise, filling in the gaps to connect things together to provide the power. Here&amp;rsquo;s how this looks currently:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Containerd/CRI → &lt;a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''&gt;flyd&lt;/a&gt; + Firecracker + &lt;a href='https://fly.io/blog/docker-without-docker/' title=''&gt;our init&lt;/a&gt;: our system transmogrifies Docker containers into Firecracker microVMs
&lt;/li&gt;&lt;li&gt;Networking/CNI → Our &lt;a href='https://fly.io/blog/ipv6-wireguard-peering/' title=''&gt;internal WireGuard mesh&lt;/a&gt; connects your pods together
&lt;/li&gt;&lt;li&gt;Pods → Fly Machines VMs
&lt;/li&gt;&lt;li&gt;Secrets → Secrets, only not the base64&amp;rsquo;d kind
&lt;/li&gt;&lt;li&gt;Services → The Fly Proxy
&lt;/li&gt;&lt;li&gt;CoreDNS → CoreDNS (to be replaced with our custom internal DNS)
&lt;/li&gt;&lt;li&gt;Persistent Volumes → Fly Volumes (coming soon)
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;Now&amp;hellip;not everything is a one-to-one comparison, and we explicitly did not set out to support any and every configuration. We aren&amp;rsquo;t dealing with resources like Network Policy and init containers, though we&amp;rsquo;re also not completely ignoring them. By mapping many of the core primitives of Kubernetes to a Fly.io resource, we&amp;rsquo;re able to focus on continuing to build the primitives that make our cloud better for workloads of all shapes and sizes.&lt;/p&gt;

&lt;p&gt;A key thing to notice above is that there&amp;rsquo;s no &amp;ldquo;Node&amp;rdquo;.&lt;/p&gt;

&lt;p&gt;&lt;a href='https://virtual-kubelet.io/' title=''&gt;Virtual Kubelet&lt;/a&gt; plays a central role in FKS. It&amp;rsquo;s magic, really. A Virtual Kubelet acts as if it&amp;rsquo;s a standard Kubelet running on a Node, eager to run your workloads. However, there&amp;rsquo;s no Node backing it. It instead behaves like an API, receiving requests from Kubernetes and transforming them into requests to deploy on a cloud compute service. In our case, that&amp;rsquo;s Fly Machines.&lt;/p&gt;

&lt;p&gt;So what we have is Kubernetes calling out to our &lt;a href='https://virtual-kubelet.io/docs/providers/' title=''&gt;Virtual Kubelet provider&lt;/a&gt;, a small Golang program we run alongside K3s, to create and run your pod. It creates &lt;a href='https://fly.io/blog/docker-without-docker/' title=''&gt;your pod as a Fly Machine&lt;/a&gt;, via the &lt;a href='/docs/machines/api/' title=''&gt;Fly Machines API&lt;/a&gt;, deploying it to any underlying host within that region. This shifts the burden of managing hardware capacity from you to us. We think that&amp;rsquo;s a cool trick&amp;mdash;thanks, Virtual Kubelet magic!&lt;/p&gt;
&lt;h2 id='speedrun' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#speedrun' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Speedrun&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;You can deploy your workloads (including GPUs) across any of our available regions using the Kubernetes API.&lt;/p&gt;

&lt;p&gt;You create a cluster with &lt;code&gt;flyctl&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative cmd"&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight cmd'&gt;&lt;code id="code-fnxi6rft"&gt;fly ext k8s create --name hello --org personal --region iad
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;When a cluster is created, it has the standard &lt;code&gt;default&lt;/code&gt; namespace. You can inspect it:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative cmd"&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight cmd'&gt;&lt;code id="code-92wwv6kq"&gt;kubectl get ns default --show-labels
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;&lt;div class="highlight-wrapper group relative output"&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight output whitespace-pre'&gt;&lt;code id="code-mk490mip"&gt;NAME      STATUS   AGE   LABELS
default   Active   20d   fly.io/app=fks-default-7zyjm3ovpdxmd0ep,kubernetes.io/metadata.name=default
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The &lt;code&gt;fly.io/app&lt;/code&gt; label shows the name of the Fly App that corresponds to your cluster.&lt;/p&gt;
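
&lt;p&gt;If you just want the app name (to script against flyctl, say), JSONPath can pull the label out directly; note the escaped dot in the label key:&lt;/p&gt;
&lt;pre class='highlight cmd'&gt;&lt;code&gt;kubectl get ns default -o jsonpath='{.metadata.labels.fly\.io/app}'
&lt;/code&gt;&lt;/pre&gt;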

&lt;p&gt;It would seem appropriate to deploy the &lt;a href='https://github.com/kubernetes-up-and-running/kuard' title=''&gt;Kubernetes Up And Running demo&lt;/a&gt; here, but since your pods are connected over an &lt;a href='https://fly.io/blog/ipv6-wireguard-peering/' title=''&gt;IPv6 WireGuard mesh&lt;/a&gt;, we&amp;rsquo;re going to use a &lt;a href='https://github.com/jipperinbham/kuard' title=''&gt;fork&lt;/a&gt; with support for &lt;a href='https://github.com/kubernetes-up-and-running/kuard/issues/46' title=''&gt;IPv6 DNS&lt;/a&gt;.&lt;/p&gt;
&lt;div class="highlight-wrapper group relative cmd"&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight cmd'&gt;&lt;code id="code-7qz94xki"&gt;kubectl run \
  --image=ghcr.io/jipperinbham/kuard-amd64:blue \
  --labels="app=kuard-fks" \
  kuard
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;And you can see its Machine representation via:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative cmd"&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight cmd'&gt;&lt;code id="code-1wk7f1q0"&gt;fly machine list --app fks-default-7zyjm3ovpdxmd0ep
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;&lt;div class="highlight-wrapper group relative output"&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight output whitespace-pre'&gt;&lt;code id="code-7rbzov1i"&gt;ID              NAME        STATE   REGION  IMAGE                           IP ADDRESS                      VOLUME  CREATED                 LAST UPDATED            APP PLATFORM    PROCESS GROUP   SIZE
1852291c46ded8  kuard       started iad     jipperinbham/kuard-amd64:blue   fdaa:0:48c8:a7b:228:4b6d:6e20:2         2024-03-05T18:54:41Z    2024-03-05T18:54:44Z                                    shared-cpu-1x:256MB
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;This is important! Your pod is a Fly Machine! We don&amp;rsquo;t yet support every kubectl feature, but Fly.io tooling will &amp;ldquo;just work&amp;rdquo; where the kubectl way is still missing. For example, we don&amp;rsquo;t have &lt;code&gt;kubectl port-forward&lt;/code&gt; or &lt;code&gt;kubectl exec&lt;/code&gt; yet, but you can use flyctl to forward ports and get a shell into a pod.&lt;/p&gt;
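
&lt;p&gt;As a sketch, using the example cluster&amp;rsquo;s app name from above, the flyctl stand-ins look like this:&lt;/p&gt;
&lt;pre class='highlight cmd'&gt;&lt;code&gt;# roughly kubectl exec: open a shell on the Machine backing the pod
fly ssh console --app fks-default-7zyjm3ovpdxmd0ep

# roughly kubectl port-forward: forward a local port to the pod over WireGuard
fly proxy 8080:8080 --app fks-default-7zyjm3ovpdxmd0ep
&lt;/code&gt;&lt;/pre&gt;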

&lt;p&gt;Expose it to your internal network using the standard ClusterIP Service:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative cmd"&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight cmd'&gt;&lt;code id="code-1sjiwcq9"&gt;kubectl expose pod kuard \
  --name=kuard \
  --port=8080 \
  --target-port=8080 \
  --selector='app=kuard-fks'
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;ClusterIP Services work natively, and Fly.io internal DNS supports them. Within the cluster, CoreDNS works too.&lt;/p&gt;

&lt;p&gt;To access this Service locally via &lt;a href='https://fly.io/docs/networking/private-networking/#flycast-private-load-balancing' title=''&gt;flycast&lt;/a&gt;, first connect to your org&amp;rsquo;s &lt;a href='https://fly.io/docs/networking/private-networking/' title=''&gt;6PN private WireGuard network&lt;/a&gt;. Then have kubectl describe the &lt;code&gt;kuard&lt;/code&gt; Service:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative cmd"&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight cmd'&gt;&lt;code id="code-hy5q54ru"&gt;kubectl describe svc kuard
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;&lt;div class="highlight-wrapper group relative output"&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight output'&gt;&lt;code id="code-a8mzw85a"&gt;Name:              kuard
Namespace:         default
Labels:            app=kuard-fks
Annotations:       fly.io/clusterip-allocator: configured
                   service.fly.io/sync-version: 11507529969321451315
Selector:          app=kuard-fks
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv6
IP:                fdaa:0:48c8:0:1::1a
IPs:               fdaa:0:48c8:0:1::1a
Port:              &amp;lt;unset&amp;gt;  8080/TCP
TargetPort:        8080/TCP
Endpoints:         [fdaa:0:48c8:a7b:228:4b6d:6e20:2]:8080
Session Affinity:  None
Events:            &amp;lt;none&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;You can pull out the Service&amp;rsquo;s IP address from the above output, and get at the KUARD UI using that: in this case, &lt;code&gt;http://[fdaa:0:48c8:0:1::1a]:8080&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;You can also use internal DNS: &lt;code&gt;http://&amp;lt;service_name&amp;gt;.svc.&amp;lt;app_name&amp;gt;.flycast:8080&lt;/code&gt;. In our example, that&amp;rsquo;s &lt;code&gt;http://kuard.svc.fks-default-7zyjm3ovpdxmd0ep.flycast:8080&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;And finally CoreDNS: &lt;code&gt;&amp;lt;service_name&amp;gt;.&amp;lt;namespace&amp;gt;.svc.cluster.local&lt;/code&gt; resolves to the &lt;code&gt;fdaa&lt;/code&gt; IP and is routable within the cluster.&lt;/p&gt;
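
&lt;p&gt;From a shell inside the cluster, you can sanity-check both names; this sketch assumes the example app name from above:&lt;/p&gt;
&lt;pre class='highlight cmd'&gt;&lt;code&gt;# CoreDNS name, resolvable within the cluster
curl -s http://kuard.default.svc.cluster.local:8080/

# Fly internal DNS name, reachable from anywhere on your 6PN network
curl -s http://kuard.svc.fks-default-7zyjm3ovpdxmd0ep.flycast:8080/
&lt;/code&gt;&lt;/pre&gt;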
&lt;figure class="post-cta"&gt;
  &lt;figcaption&gt;
    &lt;h1&gt;Get in on the FKS beta&lt;/h1&gt;
    &lt;p&gt;Email us at [email protected]&lt;/p&gt;
  &lt;/figcaption&gt;
  &lt;div class="image-container"&gt;
    &lt;img src="/static/images/cta-cat.webp" srcset="/static/images/[email protected] 2x" alt=""&gt;
  &lt;/div&gt;
&lt;/figure&gt;

&lt;h2 id='pricing' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#pricing' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Pricing&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;The Fly Kubernetes Service is free during the beta. Fly Machines and Fly Volumes you create with it will cost the &lt;a href='https://fly.io/docs/about/pricing/' title=''&gt;same as for your other Fly.io projects&lt;/a&gt;. It&amp;rsquo;ll be &lt;a href='https://fly.io/docs/about/pricing/#fly-kubernetes' title=''&gt;$75/mo per cluster&lt;/a&gt; after that, plus the cost of the other resources you create.&lt;/p&gt;
&lt;h2 id='today-and-the-future' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#today-and-the-future' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Today and the future&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Today, Fly Kubernetes supports only a portion of the Kubernetes API. You can deploy pods using Deployments/ReplicaSets. Pods are able to communicate via Services using the standard K8s DNS format. Ephemeral and persistent volumes are supported.&lt;/p&gt;
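
&lt;p&gt;So the usual Deployment workflow applies. For instance, sticking with the kuard fork from earlier (the name &lt;code&gt;kuard-deploy&lt;/code&gt; here is just an example):&lt;/p&gt;
&lt;pre class='highlight cmd'&gt;&lt;code&gt;kubectl create deployment kuard-deploy \
  --image=ghcr.io/jipperinbham/kuard-amd64:blue \
  --replicas=2
kubectl get pods -l app=kuard-deploy
&lt;/code&gt;&lt;/pre&gt;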

&lt;p&gt;The most notable absences are multi-container pods, StatefulSets, network policies, horizontal pod autoscaling, and emptyDir volumes. We&amp;rsquo;re working on supporting autoscaling and emptyDir volumes in the coming weeks, and multi-container pods in the coming months.&lt;/p&gt;

&lt;p&gt;If you&amp;rsquo;ve made it this far and are eagerly awaiting your chance to tell us and the rest of the internet &amp;ldquo;this isn&amp;rsquo;t Kubernetes!&amp;rdquo;, well, we agree! It&amp;rsquo;s not something we take lightly. We&amp;rsquo;re still building, and conformance tests may be in FKS&amp;rsquo;s future. We made a deliberate decision to care about fast-launching VMs as the one and only way to run workloads on our cloud. And we know enough of our customers would like to use the Kubernetes API to create a fast-launching VM in the form of a Pod; that&amp;rsquo;s where this story begins.&lt;/p&gt;</content>
    </entry>
    <entry>
        <title>Globally Distributed Object Storage with Tigris</title>
        <link rel="alternate" href="https://fly.io/blog/tigris-public-beta/"/>
        <id>https://fly.io/blog/tigris-public-beta/</id>
        <published>2024-02-15T00:00:00+00:00</published>
        <updated>2024-04-24T22:38:38+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/tigris-public-beta/assets/tigris-public-beta-thumb.webp"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;We’re Fly.io and we transmute containers into VMs, running them on our hardware around the world with the power of Firecracker alchemy. That’s pretty cool, but we want to talk about something someone else built, that &lt;a href="https://fly.io/docs/reference/tigris/" title=""&gt;you can use today&lt;/a&gt; to build applications.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;There are three hard things in computer science:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Cache invalidation
&lt;/li&gt;&lt;li&gt;Naming things
&lt;/li&gt;&lt;li&gt;&lt;a href='https://aws.amazon.com/s3/' title=''&gt;Doing a better job than Amazon of storing files&lt;/a&gt;
&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;Of all the annoying software problems that have no business being annoying, handling a file upload in a full-stack application stands apart, a universal if tractable malady, the plantar fasciitis of programming.&lt;/p&gt;

&lt;p&gt;Now, the actual act of clients placing files on servers is straightforward. Your framework &lt;a href='https://hexdocs.pm/phoenix/file_uploads.html' title=''&gt;has&lt;/a&gt; &lt;a href='https://edgeguides.rubyonrails.org/active_storage_overview.html' title=''&gt;a&lt;/a&gt; &lt;a href='https://docs.djangoproject.com/en/5.0/topics/http/file-uploads/' title=''&gt;feature&lt;/a&gt; &lt;a href='https://expressjs.com/en/resources/middleware/multer.html' title=''&gt;that&lt;/a&gt; &lt;a href='https://github.com/yesodweb/yesod-cookbook/blob/master/cookbook/Cookbook-file-upload-saving-files-to-server.md' title=''&gt;does&lt;/a&gt; &lt;a href='https://laravel.com/docs/10.x/filesystem' title=''&gt;it&lt;/a&gt;. What&amp;rsquo;s hard is making sure that uploads stick around to be downloaded later.&lt;/p&gt;
&lt;aside class="right-sidenote"&gt;&lt;p&gt;(yes, yes, we know, &lt;a href="https://youtu.be/b2F-DItXtZs?t=102" title=""&gt;sharding /dev/null&lt;/a&gt; is faster)&lt;/p&gt;
&lt;/aside&gt;
&lt;p&gt;Enter object storage, a pattern you may know by its colloquial name &amp;ldquo;S3&amp;rdquo;. Object storage occupies a funny place in software architecture, somewhere between a database and a filesystem. It&amp;rsquo;s like &lt;a href='https://man7.org/linux/man-pages/man3/malloc.3.html' title=''&gt;&lt;code&gt;malloc&lt;/code&gt;&lt;/a&gt;&lt;code&gt;()&lt;/code&gt;, but for cloud storage instead of program memory.&lt;/p&gt;

&lt;p&gt;&lt;a href='https://www.kleenex.com/en-us/' title=''&gt;S3&lt;/a&gt;&amp;mdash;err, object storage&amp;mdash;is so important that it was the second AWS service ever introduced (EC2 was not the first!). Everybody wants it. We know, because they keep asking us for it.&lt;/p&gt;

&lt;p&gt;So why didn&amp;rsquo;t we build it?&lt;/p&gt;

&lt;p&gt;Because we couldn&amp;rsquo;t figure out a way to improve on S3. And we still haven&amp;rsquo;t! But someone else did, at least for the kinds of applications we see on Fly.io.&lt;/p&gt;
&lt;h2 id='but-first-some-back-story' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-first-some-back-story' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;But First, Some Back Story&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;S3 checks all the boxes. It&amp;rsquo;s trivial to use. It&amp;rsquo;s efficient and cost-effective. It has redundancies that would make a DoD contractor blush. It integrates with archival services like Glacier. And every framework supports it. At some point, the IETF should just take a deep sigh and write an S3 API RFC, XML signatures and all.&lt;/p&gt;

&lt;p&gt;There&amp;rsquo;s at least one catch, though.&lt;/p&gt;

&lt;p&gt;Back in, like, &amp;rsquo;07 people ran all their apps from a single city. S3 was designed to work for those kinds of apps. The data, the bytes on the disks (or whatever weird hyperputer AWS stores S3 bytes on), live in one place. A specific place. In a specific data center. As powerful and inspiring as The Architects are, they are mortals, and must obey the laws of physics.&lt;/p&gt;

&lt;p&gt;This observation feels banal, until you realize how apps have changed in the last decade. Apps and their users don&amp;rsquo;t live in one specific place. They live all over the world. When users are close to the S3 data center, things are amazing! But things get less amazing the further away you get from the data center, and even less amazing the smaller and more frequent your reads and writes are.&lt;/p&gt;

&lt;p&gt;(Thought experiment: you have to pick one place in the world to route all your file storage. Where is it? Is it &lt;a href='https://www.tripadvisor.com/Restaurant_Review-g30246-d1956555-Reviews-Ford_s_Fish_Shack_Ashburn-Ashburn_Loudoun_County_Virginia.html' title=''&gt;Loudoun County, Virginia&lt;/a&gt;?)&lt;/p&gt;

&lt;p&gt;So, for many modern apps, you end up having to &lt;a href='https://stackoverflow.com/questions/32426249/aws-s3-bucket-with-multiple-regions' title=''&gt;write things into different regions&lt;/a&gt;, so that people close to the data get it from a region-specific bucket. Doing that pulls in CDN-caching things that complicate your application and put barriers between you and your data. Before you know it, you&amp;rsquo;re wearing custom orthotics on your, uh, developer feet. (&lt;em&gt;I am done with this metaphor now, I promise.&lt;/em&gt;)&lt;/p&gt;
&lt;aside class="right-sidenote"&gt;&lt;p&gt;(well, okay, Backblaze B2 because somehow my bucket fits into their free tier, but you get the idea)&lt;/p&gt;
&lt;/aside&gt;
&lt;p&gt;Personally, I know this happens. Because I had to build one! I run a &lt;a href='https://xeiaso.net/blog/xedn/' title=''&gt;CDN backend&lt;/a&gt; that&amp;rsquo;s a caching proxy for S3 on six continents. All so that I can deliver images and video efficiently to the readers of my blog.&lt;/p&gt;
&lt;aside class="right-sidenote"&gt;&lt;p&gt;(shut up, it’s a sandwich)&lt;/p&gt;
&lt;/aside&gt;
&lt;p&gt;What if data was really global? For some applications, it might not matter much. But for others, it matters a lot. When a sandwich lover in Australia snaps a picture of a &lt;a href='https://en.wikipedia.org/wiki/Hamdog' title=''&gt;hamdog&lt;/a&gt;, the people most likely to want to see that photo are also in Australia. Routing those uploads and downloads through one building in Ashburn is no way to build a sandwich reviewing empire.&lt;/p&gt;

&lt;p&gt;Localizing all the data sounds like a hard problem. What if you didn&amp;rsquo;t need to change anything on your end to accomplish it?&lt;/p&gt;
&lt;h2 id='show-me-a-hero' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#show-me-a-hero' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Show Me A Hero&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Building a miniature CDN infrastructure just to handle file uploads seems like the kind of thing that could take a week or so of tinkering. The Fly.io unified theory of cloud development is that solutions are completely viable for full-stack developers only when they take less than 2 hours to get working.&lt;/p&gt;

&lt;p&gt;AWS agrees, which is why they have a SKU for it, &lt;a href='https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/cloudfront_distribution' title=''&gt;called CloudFront&lt;/a&gt;, which will, at some variably metered expense, optimize the read side of a single-write-region bucket: they&amp;rsquo;ll set up &lt;a href='https://fly.io/blog/the-5-hour-content-delivery-network/' title=''&gt;a simple caching CDN&lt;/a&gt; for you. You can probably get S3 and CloudFront working within 2 hours, especially if you&amp;rsquo;ve set it up before.&lt;/p&gt;

&lt;p&gt;Our friends at Tigris have this problem down to single-digit minutes, and what they came up with is a lot cooler than a cache CDN.&lt;/p&gt;

&lt;p&gt;Here&amp;rsquo;s how it works. Tigris runs redundant FoundationDB clusters in our regions to track objects. They use Fly.io&amp;rsquo;s NVMe volumes as a first level of cached raw byte store, and a queuing system modelled on &lt;a href='https://www.foundationdb.org/files/QuiCK.pdf' title=''&gt;Apple&amp;rsquo;s QuiCK paper&lt;/a&gt; to distribute object data to multiple replicas, to regions where the data is in demand, and to third-party object stores… like S3.&lt;/p&gt;

&lt;p&gt;If your objects are less than about 128 kilobytes, Tigris makes them instantly global. By default! Things are just snappy, all over the world, automatically, because they&amp;rsquo;ve done all the work.&lt;/p&gt;

&lt;p&gt;But it gets better, because Tigris is also much more flexible than a simple caching CDN. It&amp;rsquo;s globally distributed from the jump, with inter-region routing baked into its distribution layer. Tigris isn&amp;rsquo;t a CDN, but rather a toolset you can use to build arbitrary CDNs, with consistency guarantees, instant purge, and relay regions.&lt;/p&gt;

&lt;p&gt;There&amp;rsquo;s a lot going on in this architecture, and it&amp;rsquo;d be fun to dig into it more. But for now, you don&amp;rsquo;t have to understand any of it. Because Tigris ties all this stuff together with an S3-compatible object storage API. If your framework can talk to S3, it can use Tigris.&lt;/p&gt;
&lt;h2 id='fly-storage' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#fly-storage' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;&lt;code&gt;fly storage&lt;/code&gt;&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;To get started with this, run the &lt;code&gt;fly storage create&lt;/code&gt; command:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative "&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-rhojus0y"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-rhojus0y"&gt;$ fly storage create
Choose a name, use the default, or leave blank to generate one: xe-foo-images
Your Tigris project (xe-foo-images) is ready. See details and next steps with: https://fly.io/docs/reference/tigris/

Setting the following secrets on xe-foo:
AWS_REGION
BUCKET_NAME
AWS_ENDPOINT_URL_S3
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY

Secrets are staged for the first deployment
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;All you have to do is fill in a bucket name. Hit enter. All of the configuration for the AWS S3 library will be injected into your application for you. And you don&amp;rsquo;t even need to change the libraries that you&amp;rsquo;re using. &lt;a href='https://www.tigrisdata.com/docs/sdks/s3/' title=''&gt;The Tigris examples&lt;/a&gt; all use the AWS libraries to put objects into and delete them from Tigris, using the same calls that you use for S3.&lt;/p&gt;
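&lt;p&gt;To make that concrete, here&amp;rsquo;s a minimal sketch (ours, not from the Tigris docs) of an app consuming those staged secrets with boto3. The endpoint and credential values below are illustrative placeholders, not real credentials:&lt;/p&gt;

```python
# Sketch: reading the secrets `fly storage create` stages and handing them
# to boto3. Values below are illustrative placeholders, not real credentials.
import os


def client_kwargs(env):
    """Map the injected environment variables onto boto3 client arguments."""
    return {
        "region_name": env["AWS_REGION"],
        "endpoint_url": env["AWS_ENDPOINT_URL_S3"],
        "aws_access_key_id": env["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": env["AWS_SECRET_ACCESS_KEY"],
    }


def put_object(key: str, body: bytes) -> None:
    """Upload an object exactly as you would to S3; only the endpoint differs."""
    import boto3  # deferred import; the rest of the sketch is stdlib-only

    s3 = boto3.client("s3", **client_kwargs(os.environ))
    s3.put_object(Bucket=os.environ["BUCKET_NAME"], Key=key, Body=body)


kwargs = client_kwargs({
    "AWS_REGION": "auto",
    "AWS_ENDPOINT_URL_S3": "https://fly.storage.tigris.dev",
    "AWS_ACCESS_KEY_ID": "tid_example",
    "AWS_SECRET_ACCESS_KEY": "tsec_example",
})
print(kwargs["endpoint_url"])  # https://fly.storage.tigris.dev
```

&lt;p&gt;The only thing that distinguishes this from vanilla S3 code is the &lt;code&gt;endpoint_url&lt;/code&gt;.&lt;/p&gt;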

&lt;p&gt;I know how this looks for a lot of you. It looks like we&amp;rsquo;re partnering with Tigris because we&amp;rsquo;re chicken, and we didn&amp;rsquo;t want to build something like this. Well, guess what: you&amp;rsquo;re right!&lt;/p&gt;

&lt;p&gt;Compute and networking: those are things we love and understand. Object storage? &lt;a href='https://fly.io/blog/the-5-hour-content-delivery-network/' title=''&gt;We already gave away the game on how we&amp;rsquo;d design a CDN for our own content&lt;/a&gt;, and it wasn&amp;rsquo;t nearly as slick as Tigris.&lt;/p&gt;

&lt;p&gt;Object storage is important. It needs to be good. We did not want to half-ass it. So we partnered with Tigris, letting them put their full resources into making object storage as ✨magical✨ as Fly.io is.&lt;/p&gt;

&lt;p&gt;This also mirrors a lot of the Unix philosophy of Days Gone Past: individual parts that each do one thing very well, chained together to create a composite result. I mean, come on, would you seriously want to buy your servers from the same place you buy your shoes?&lt;/p&gt;
&lt;h2 id='one-bill-to-rule-them-all' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#one-bill-to-rule-them-all' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;One bill to rule them all&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Well, okay, the main reason you&amp;rsquo;d want to do that is that having everything under one bill makes life really easy for your accounting people. So we&amp;rsquo;ve wrapped everything under one bill: your compute, your block storage, your databases, your networking, and your object storage. You don&amp;rsquo;t have to create separate accounts with Supabase or Upstash or PlanetScale or Tigris. Everything gets charged to your Fly.io account, and you pay one bill per month.&lt;/p&gt;
&lt;aside class="right-sidenote"&gt;&lt;p&gt;This was actually going to be posted on Valentine’s Day, but we had to wait for the chocolate to go on sale.&lt;/p&gt;
&lt;/aside&gt;
&lt;p&gt;This is our Valentine&amp;rsquo;s Day gift to you all. Object storage that just works. Stay tuned, because we have a couple of exciting features that build on the Fly.io and Tigris integration to enable some genuinely unique things, such as truly global static website hosting and turning your bucket into a CDN in 5 minutes at most.&lt;/p&gt;

&lt;p&gt;Here&amp;rsquo;s to many more happy developer days to come.&lt;/p&gt;</content>
    </entry>
    <entry>
        <title>GPUs on Fly.io are available to everyone!</title>
        <link rel="alternate" href="https://fly.io/blog/gpu-ga/"/>
        <id>https://fly.io/blog/gpu-ga/</id>
        <published>2024-02-12T00:00:00+00:00</published>
        <updated>2024-04-24T22:38:38+00:00</updated>
        <media:thumbnail url="https://fly.io/blog/gpu-ga/assets/gpu-ga-thumb.webp"/>
        <content type="html">&lt;div class="lead"&gt;&lt;p&gt;Fly.io makes it easy to spin up compute around the world, now including powerful GPUs. Unlock the power of large language models, text transcription, and image generation with our datacenter-grade muscle!&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;GPUs are now available to everyone!&lt;/p&gt;

&lt;p&gt;We know you&amp;rsquo;ve been eager to use GPUs on Fly.io, and we&amp;rsquo;re happy to announce that they&amp;rsquo;re available to everyone. If you want, you can spin up GPU instances with any of the following cards:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ampere A100 (40GB) &lt;code&gt;a100-40gb&lt;/code&gt;
&lt;/li&gt;&lt;li&gt;Ampere A100 (80GB) &lt;code&gt;a100-80gb&lt;/code&gt;
&lt;/li&gt;&lt;li&gt;Lovelace L40s (48GB) &lt;code&gt;l40s&lt;/code&gt;
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;To use a GPU instance today, change the &lt;code&gt;vm.size&lt;/code&gt; for one of your apps or processes to any of the above GPU kinds. Here&amp;rsquo;s how you can spin up an &lt;a href='https://ollama.ai' title=''&gt;Ollama&lt;/a&gt; server in seconds:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative toml"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-bcyvgy6u"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-bcyvgy6u"&gt;&lt;span class="py"&gt;app&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"your-app-name"&lt;/span&gt;
&lt;span class="py"&gt;region&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"ord"&lt;/span&gt;
&lt;span class="py"&gt;vm.size&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"l40s"&lt;/span&gt;

&lt;span class="nn"&gt;[http_service]&lt;/span&gt;
  &lt;span class="py"&gt;internal_port&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;11434&lt;/span&gt;
  &lt;span class="py"&gt;force_https&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
  &lt;span class="py"&gt;auto_stop_machines&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="py"&gt;auto_start_machines&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="py"&gt;min_machines_running&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
  &lt;span class="py"&gt;processes&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;["app"]&lt;/span&gt;

&lt;span class="nn"&gt;[build]&lt;/span&gt;
  &lt;span class="py"&gt;image&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"ollama/ollama"&lt;/span&gt;

&lt;span class="nn"&gt;[mounts]&lt;/span&gt;
  &lt;span class="py"&gt;source&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"models"&lt;/span&gt;
  &lt;span class="py"&gt;destination&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"/root/.ollama"&lt;/span&gt;
  &lt;span class="py"&gt;initial_size&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"100gb"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Deploy this and bam, large language model inferencing from anywhere. If you want a private setup, see the article &lt;a href='https://fly.io/blog/scaling-llm-ollama/' title=''&gt;Scaling Large Language Models to zero with Ollama&lt;/a&gt; for more information. You never know when you have a sandwich emergency and don&amp;rsquo;t know what you can make with what you have on hand.&lt;/p&gt;
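&lt;p&gt;Once it&amp;rsquo;s deployed, talking to it is just HTTP. A hedged sketch (the app URL and model name here are placeholders) using Ollama&amp;rsquo;s &lt;code&gt;/api/generate&lt;/code&gt; endpoint:&lt;/p&gt;

```python
# Sketch: calling the Ollama server from the fly.toml above over plain HTTP.
# The host name and model are placeholders; Ollama listens on port 11434.
import json
from urllib import request


def generate_payload(model: str, prompt: str) -> dict:
    """Request body for Ollama's /api/generate; stream=False yields one JSON reply."""
    return {"model": model, "prompt": prompt, "stream": False}


def ask(base_url: str, model: str, prompt: str) -> str:
    """POST a prompt and return the generated text (network call, not run here)."""
    req = request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(generate_payload(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["response"]


payload = generate_payload("llama2", "What can I make with ham and bread?")
print(json.dumps(payload))
```

&lt;p&gt;Point &lt;code&gt;ask()&lt;/code&gt; at your app&amp;rsquo;s URL and you have an inference endpoint for your sandwich emergencies.&lt;/p&gt;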

&lt;p&gt;We are working on getting some lower-cost A10 GPUs in the next few weeks. We&amp;rsquo;ll update you when they&amp;rsquo;re ready.&lt;/p&gt;

&lt;p&gt;If you want to explore the possibilities of GPUs on Fly.io, here&amp;rsquo;s a few articles that may give you ideas:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href='https://fly.io/blog/not-midjourney-bot/' title=''&gt;Deploy Your Own (Not) MidJourney Bot On Fly GPUs&lt;/a&gt;
&lt;/li&gt;&lt;li&gt;&lt;a href='https://fly.io/blog/scaling-llm-ollama/' title=''&gt;Scaling Large Language Models to zero with Ollama&lt;/a&gt;
&lt;/li&gt;&lt;li&gt;&lt;a href='https://fly.io/blog/transcribing-on-fly-gpu-machines/' title=''&gt;Transcribing on Fly GPU Machines&lt;/a&gt;
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;Depending on factors such as your organization&amp;rsquo;s age and payment history, you may need to go through additional verification steps.&lt;/p&gt;

&lt;p&gt;If you&amp;rsquo;ve been experimenting with Fly.io GPUs and have made something cool, let us know on the &lt;a href='https://community.fly.io/' title=''&gt;Community Forums&lt;/a&gt; or by mentioning us &lt;a href='https://hachyderm.io/@flydotio' title=''&gt;on Mastodon&lt;/a&gt;! We&amp;rsquo;ll boost the cool ones.&lt;/p&gt;</content>
    </entry>
</feed>
Raw text
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
  <title>The Fly Blog</title>
  <subtitle>News, tips, and tricks from the team at Fly</subtitle>
  <id>https://fly.io/blog/</id>
  <link href="https://fly.io/blog/"/>
  <link href="https://fly.io/blog/" rel="self"/>
  <updated>2026-03-10T00:00:00+00:00</updated>
  <author>
    <name>Fly</name>
  </author>
  <entry>
    <title>Unfortunately, Sprites Now Speak MCP</title>
    <link rel="alternate" href="https://fly.io/blog/unfortunately-mcp/"/>
    <id>https://fly.io/blog/unfortunately-mcp/</id>
    <published>2026-03-10T00:00:00+00:00</published>
    <updated>2026-03-12T19:30:19+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/unfortunately-mcp/assets/whack.webp"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;Sprites are disposable cloud computers. They appear instantly, always include durable filesystems, and cost practically nothing when idle. They’re the best and safest place on the Internet to run agents and we want you to &lt;a href="https://sprites.dev/" title=""&gt;create dozens of them&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Sprites are a place to run agents; the first thing you should think to do with a new Sprite is to type &lt;code&gt;claude&lt;/code&gt; (or &lt;code&gt;gemini&lt;/code&gt; or &lt;code&gt;codex&lt;/code&gt;). We&amp;rsquo;ve put a &lt;a href='https://fly.io/blog/design-and-implementation/' title=''&gt;lot of effort&lt;/a&gt; into making sure coding agents feel safe and happy when they&amp;rsquo;re on Sprites, because, to (probably) quote John von Neumann, &amp;ldquo;happy agents are productive agents.&amp;rdquo;&lt;/p&gt;

&lt;p&gt;What&amp;rsquo;s less obvious about Sprites is that they&amp;rsquo;re great tools &lt;em&gt;for&lt;/em&gt; agents. Want three different versions of a new feature? A test environment? An ensemble of cooperating services? It&amp;rsquo;s super handy to be able to start your prompts, &amp;ldquo;&lt;code&gt;On a new Sprite, do…&lt;/code&gt;&amp;rdquo;.&lt;/p&gt;

&lt;p&gt;The Sprites API is simple, discoverable, and designed for this use case. It&amp;rsquo;s just a question of how you choose to give your agent access to it. And now there&amp;rsquo;s one more way: with MCP.&lt;/p&gt;
&lt;h2 id='we-did-this-because-your-agents-suck' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#we-did-this-because-your-agents-suck' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;We Did This Because Your Agents Suck&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;This feature works well, but we&amp;rsquo;re less than enthusiastic about it. Not as product developers, mind you. It&amp;rsquo;s a good product! Just as aesthetes.&lt;/p&gt;

&lt;p&gt;In 2026, MCP is the wrong way to extend the capabilities of an agent. The emerging Right Way to do this is command line tools and discoverable APIs.&lt;/p&gt;

&lt;p&gt;When we plug an MCP server into your agent, we&amp;rsquo;re filling its context with tool descriptions, many of which you&amp;rsquo;ll probably never use. Really, all your agent should need is a short sentence, like &amp;ldquo;&lt;code&gt;Use this skill whenever users want to create a new VM to run a task on, or to manage the VMs already available.&lt;/code&gt;&amp;rdquo; The skill should take care of the rest.&lt;/p&gt;

&lt;p&gt;CLI-driven agent skills are efficient because they reveal capabilities progressively. You can do this with CLI subcommands, like &lt;code&gt;sprite checkpoint&lt;/code&gt; and &lt;code&gt;sprite exec&lt;/code&gt;, or with API endpoints and subpaths. Good agent harnesses are uncanny at quickly working out how to use these things.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;You &lt;em&gt;are&lt;/em&gt; using Playwright, right? “Make sure this web application actually works before you tell me you’re done”?&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Take &lt;a href='https://playwright.dev/' title=''&gt;Playwright, the industry-standard browser automation tool&lt;/a&gt;. Ask &lt;code&gt;claude&lt;/code&gt; to install Playwright and Chrome and there&amp;rsquo;s a coinflip chance it sets up the MCP server. But notice that when the coin comes up tails, Playwright still works. &lt;code&gt;claude&lt;/code&gt; just drives it by writing little scripts. This is good! The models already know how to write little scripts without using up context.&lt;/p&gt;

&lt;p&gt;And there&amp;rsquo;s more at stake than just efficiency. Cramming your context full of MCP tool descriptions is a way of signaling to the model that those tools are important to you. But not every Sprite command is equally important in every setting. If you&amp;rsquo;re not using network policies, you don&amp;rsquo;t need &lt;code&gt;gemini&lt;/code&gt; to waste a bunch of time setting them up for you.&lt;/p&gt;

&lt;p&gt;Skills and APIs are the best way to drive Sprites. But to make that work, you need an agent that can run shell commands for itself. So you&amp;rsquo;ll want to reach for MCP sessions when you&amp;rsquo;re stuck with an agent that can&amp;rsquo;t run commands. Thankfully, most of us aren&amp;rsquo;t using those kinds of agents anymore. In &lt;code&gt;claude&lt;/code&gt;, &lt;code&gt;gemini&lt;/code&gt;, or &lt;code&gt;codex&lt;/code&gt;, you should just show your agent the &lt;code&gt;sprite&lt;/code&gt; CLI and let it impress you.&lt;/p&gt;
&lt;h2 id='sprites-dev-mcp' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#sprites-dev-mcp' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;sprites.dev/mcp&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Plug this URL into Claude Desktop, or any other agent tool that speaks MCP. You&amp;rsquo;ll authenticate to one of your Fly.io organizations, and your agent will speak Sprites.&lt;/p&gt;

&lt;p&gt;Then:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;On a new Sprite, take this repository and reproduce this bug from issues/913, capturing logs.&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;On a new Sprite, benchmark this function across 1000 runs and summarize the results.&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;On a new Sprite, update all the dependencies on this project to their newest versions and test that everything works.&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;On 3 new Sprites, change this service to use each of these 3 query libraries, and use HTTP to test latency.&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;On a new Sprite, run this code with bpfwatch and show me what files it touches.&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;On a new Sprite, run a load generator against this endpoint for 60 seconds and report the results.&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;On a new Sprite, download this dataset and give me a Jupyter notebook to explore it in.&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;On a new Sprite, set up a webhook receiver and render a real-time web report of all the payloads it receives.&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;I don&amp;rsquo;t know. You know your projects better than we do. Whatever. Sometimes you want a clean, cheap, disposable computer (or five of them). That&amp;rsquo;s now an available feature of all your prompts. Find ways to apply it to your project, and we think you&amp;rsquo;ll end up wondering where Sprites have been all your life.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;Some of you are thinking to yourself: “this feature is going to result in robots ruining my life”. We agree. So we’ve built in guardrails. When you authenticate, giving your agent access to a single specific organization on your Fly.io account, we’ll let you scope down the MCP session. You can cap the number of Sprites our MCP will create for you, and you can give them name prefixes so you can easily spot the robots and disassemble them.&lt;/p&gt;
&lt;/div&gt;&lt;h2 id='fuck-stateless-sandboxes' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#fuck-stateless-sandboxes' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Fuck Stateless Sandboxes&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;We&amp;rsquo;ll keep saying this until our faces turn blue: the industry is stuck on &amp;ldquo;sandboxes&amp;rdquo; as a way of letting agents run code, and sandboxes aren&amp;rsquo;t good enough anymore. What agents want is real computers, with real filesystems, connected to real networks, and there&amp;rsquo;s no technical reason not to give them some.&lt;/p&gt;

&lt;p&gt;&lt;a href='https://fly.io/blog/code-and-let-live/' title=''&gt;We designed Sprites so that you can fearlessly create whole bunches of them&lt;/a&gt;. They&amp;rsquo;re responsive enough to host web apps for your team, but they idle in a sleeping state where they cost virtually nothing. Everybody at Fly.io that uses them ends up with 20 or 30, just hanging around.&lt;/p&gt;

&lt;p&gt;We think you&amp;rsquo;ll do better work when you can pull in as many computers as you need to solve problems. If it takes an MCP server for us to get you to do that, so be it.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Litestream Writable VFS</title>
    <link rel="alternate" href="https://fly.io/blog/litestream-writable-vfs/"/>
    <id>https://fly.io/blog/litestream-writable-vfs/</id>
    <published>2026-01-29T00:00:00+00:00</published>
    <updated>2026-02-04T23:24:24+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/litestream-writable-vfs/assets/litestream-writable-vfs.jpg"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;&lt;strong class="font-semibold text-navy-950"&gt;I’m Ben Johnson, and I work on Litestream at Fly.io. Litestream is the missing backup/restore system for SQLite. It’s free, open-source software that should run anywhere, and&lt;/strong&gt; &lt;a href="/blog/litestream-v050-is-here/" title=""&gt;&lt;strong class="font-semibold text-navy-950"&gt;you can read more about it here&lt;/strong&gt;&lt;/a&gt;&lt;strong class="font-semibold text-navy-950"&gt;.&lt;/strong&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Each time we write about it, we get a little bit better at golfing down a description of what Litestream is. Here goes: Litestream is a Unix-y tool for keeping a SQLite database synchronized with S3-style object storage. It&amp;rsquo;s a way of getting the speed and simplicity wins of SQLite without exposing yourself to catastrophic data loss. Your app doesn&amp;rsquo;t necessarily even need to know it&amp;rsquo;s there; you can just run it as a tool in the background.&lt;/p&gt;
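&lt;p&gt;If you haven&amp;rsquo;t seen that background-tool setup before, it&amp;rsquo;s tiny: a config file pointing a database at a replica, plus &lt;code&gt;litestream replicate&lt;/code&gt; running alongside your app. The paths and bucket name here are hypothetical:&lt;/p&gt;

```yaml
# litestream.yml, sketched with hypothetical paths and a hypothetical bucket
dbs:
  - path: /data/app.db
    replicas:
      - url: s3://my-backups/app.db
```

&lt;p&gt;With that in place, &lt;code&gt;litestream replicate&lt;/code&gt; continuously ships WAL changes to the bucket in the background.&lt;/p&gt;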

&lt;p&gt;It&amp;rsquo;s been a busy couple of weeks!&lt;/p&gt;

&lt;p&gt;We recently &lt;a href='/blog/design-and-implementation/' title=''&gt;unveiled Sprites&lt;/a&gt;. If you don&amp;rsquo;t know what Sprites are, you should just &lt;a href='https://sprites.dev/' title=''&gt;go check them out&lt;/a&gt;. They&amp;rsquo;re one of the coolest things we&amp;rsquo;ve ever shipped. I won&amp;rsquo;t waste any more time selling them to you. Just, Sprites are a big deal, and so it&amp;rsquo;s a big deal to me that Litestream is a load-bearing component for them.&lt;/p&gt;

&lt;p&gt;Sprites rely directly on Litestream in two big ways.&lt;/p&gt;

&lt;p&gt;First, Litestream SQLite is the core of our global Sprites orchestrator. Unlike our flagship Fly Machines product, which relies on a centralized Postgres cluster, our Elixir Sprites orchestrator runs directly off S3-compatible object storage. Every organization enrolled in Sprites gets their own SQLite database, synchronized by Litestream.&lt;/p&gt;

&lt;p&gt;This is a fun design. It takes advantage of the &amp;ldquo;many SQLite databases&amp;rdquo; pattern, which is under-appreciated. It&amp;rsquo;s got nice scaling characteristics. Keeping that Postgres cluster happy as Fly.io grew has been a major engineering challenge.&lt;/p&gt;

&lt;p&gt;But as far as Litestream is concerned, the orchestrator is boring, and so that&amp;rsquo;s all I&amp;rsquo;ve got to say about it. The second way Sprites use Litestream is much more interesting.&lt;/p&gt;

&lt;p&gt;Litestream is built directly into the disk storage stack that runs on every Sprite.&lt;/p&gt;

&lt;p&gt;Sprites launch in under a second, and every one of them boots up with 100GB of durable storage. That&amp;rsquo;s a tricky bit of engineering. We&amp;rsquo;re able to do this because the  root of storage for Sprites is S3-compatible object storage, and we&amp;rsquo;re able to make it fast by keeping a database of in-use storage blocks that takes advantage of attached NVMe as a read-through cache. The system that does this is JuiceFS, and the database — let&amp;rsquo;s call it &amp;ldquo;the block map&amp;rdquo; — is a rewritten metadata store, based (you guessed it) on BoltDB.&lt;/p&gt;

&lt;p&gt;I kid! It&amp;rsquo;s Litestream SQLite, of course.&lt;/p&gt;
&lt;h2 id='sprite-storage-is-fussy' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#sprite-storage-is-fussy' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Sprite Storage Is Fussy&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Everything in a Sprite is designed to come up fast.&lt;/p&gt;

&lt;p&gt;If the Fly Machine underneath a Sprite bounces, we might need to reconstitute the block map from object storage. Block maps aren&amp;rsquo;t huge, but they&amp;rsquo;re not tiny; maybe low tens of megabytes worst case.&lt;/p&gt;

&lt;p&gt;The thing is, this is happening while the Sprite boots back up. To put that in perspective, that&amp;rsquo;s something that can happen in response to an incoming web request; that is, we have to finish fast enough to generate a timely response to that request. The time budget is small.&lt;/p&gt;

&lt;p&gt;To make this even faster, we are integrating Litestream VFS to improve start times. The VFS is a dynamic library you load into your app. Once you do, you can do stuff like this:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative bash"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-duvcwc7p"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-duvcwc7p"&gt;sqlite&amp;gt; .open file:///my.db?vfs&lt;span class="o"&gt;=&lt;/span&gt;litestream
sqlite&amp;gt; PRAGMA litestream_time &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'5 minutes ago'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
sqlite&amp;gt; SELECT &lt;span class="k"&gt;*&lt;/span&gt; FROM sandwich_ratings ORDER BY RANDOM&lt;span class="o"&gt;()&lt;/span&gt; LIMIT 3 &lt;span class="p"&gt;;&lt;/span&gt; 
22|Veggie Delight|New York|4
30|Meatball|Los Angeles|5
168|Chicken Shawarma Wrap|Detroit|5
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Litestream VFS lets us run point-in-time SQLite queries hot off object storage blobs, answering queries before we&amp;rsquo;ve downloaded the database.&lt;/p&gt;

&lt;p&gt;This is good, but it&amp;rsquo;s not perfect. We had two problems:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;We could only read, not write. People write to Sprite disks. The storage stack needs to write, right away.
&lt;/li&gt;&lt;li&gt;Running a query off object storage is a godsend in a cold start where we have no other alternative besides downloading the whole database, but it&amp;rsquo;s not fast enough for steady state.
&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;These are fun problems. Here&amp;rsquo;s our first cut at solving them.&lt;/p&gt;
&lt;h2 id='writable-vfs' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#writable-vfs' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Writable VFS&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;The first thing we&amp;rsquo;ve done is made the VFS optionally read-write. This feature is pretty subtle; it&amp;rsquo;s interesting, but it&amp;rsquo;s not as general-purpose as it might look. Let me explain how it works, and then explain why it works this way.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;Keep in mind as you read this that this is about the VFS in particular. Obviously, normal SQLite databases using Litestream the normal way are writeable.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The VFS works by keeping an index of &lt;code&gt;(file, offset, size)&lt;/code&gt; for every page of the database in object storage; the data comprising the index is stored &lt;a href='https://github.com/superfly/ltx' title=''&gt;in LTX files&lt;/a&gt;, so that we can reconstitute it quickly when the VFS starts, and lookups are heavily cached. When we queried &lt;code&gt;sandwich_ratings&lt;/code&gt; earlier, our VFS library intercepted the SQLite read method, looked up the requested page in the index, fetched it, and cached it.&lt;/p&gt;
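&lt;p&gt;As a rough mental model, the read path looks something like the sketch below. This is a toy with made-up names, not Litestream&amp;rsquo;s actual code: a page-number index plus a read-through cache over ranged fetches from object storage.&lt;/p&gt;

```python
# Toy sketch of a VFS-style page index: each database page maps to a
# (file, offset, size) triple pointing into object storage, and reads
# go through a small cache. Illustrative names, not Litestream's API.

class PageIndex:
    def __init__(self, fetch):
        self.fetch = fetch   # fetch(file, offset, size) -> bytes
        self.index = {}      # page number -> (file, offset, size)
        self.cache = {}      # page number -> page bytes

    def register(self, pgno, file, offset, size):
        self.index[pgno] = (file, offset, size)

    def read_page(self, pgno):
        # Serve from cache when possible; otherwise look up where the
        # page lives and fetch just that byte range, never the whole file.
        if pgno in self.cache:
            return self.cache[pgno]
        file, offset, size = self.index[pgno]
        data = self.fetch(file, offset, size)
        self.cache[pgno] = data
        return data
```

&lt;p&gt;The sketch only shows why a single page read never requires downloading the whole database first.&lt;/p&gt;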

&lt;p&gt;This works great for reads. Writes are harder.&lt;/p&gt;

&lt;p&gt;Behind the scenes in read-only mode, Litestream polls, so that we can detect new LTX files created by remote writers to the database. This supports a handy use case where we&amp;rsquo;re running tests or doing slow analytical queries of databases that need to stay fast in prod.&lt;/p&gt;

&lt;p&gt;In write mode, we don&amp;rsquo;t allow multiple writers, because multiple-writer distributed SQLite databases are the &lt;a href='https://hellraiser.fandom.com/wiki/Lament_Configuration' title=''&gt;Lament Configuration&lt;/a&gt; and we are not explorers over great vistas of pain. So the VFS in write-mode disables polling. We assume a single writer, and no additional backups to watch.&lt;/p&gt;

&lt;p&gt;Next, we buffer. Writes go to a local temporary buffer (&amp;ldquo;the write buffer&amp;rdquo;). Every second or so (or on clean shutdown), we sync the write buffer with object storage. Nothing written through the VFS is truly durable until that sync happens.&lt;/p&gt;
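&lt;p&gt;A minimal sketch of that write path, with hypothetical names (the real buffer and sync machinery live inside the VFS):&lt;/p&gt;

```python
import time

# Toy sketch of the buffered write path described above: writes land in
# a local buffer and become durable only when the buffer is synced to
# object storage. Illustrative only, not Litestream's internals.

class BufferedWriter:
    SYNC_INTERVAL = 1.0  # seconds; "every second or so"

    def __init__(self, upload):
        self.upload = upload      # upload(dict of offset -> bytes)
        self.buffer = {}          # pending writes: offset -> bytes
        self.last_sync = time.monotonic()

    def write(self, offset, data):
        self.buffer[offset] = data  # fast and local, but not yet durable
        if time.monotonic() - self.last_sync > self.SYNC_INTERVAL:
            self.sync()

    def sync(self):
        # Flush pending pages to object storage; only now are they durable.
        if self.buffer:
            self.upload(dict(self.buffer))
            self.buffer.clear()
        self.last_sync = time.monotonic()

    def close(self):
        self.sync()  # clean shutdown always syncs
```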
&lt;div class="right-sidenote"&gt;&lt;p&gt;Most storage block maps are much smaller than this, but still.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Now, remember the use case we&amp;rsquo;re looking to support here. A Sprite is cold-starting and its storage stack needs to serve writes, milliseconds after booting, without having a full copy of the 10MB block map. This writable VFS mode lets us do that.&lt;/p&gt;

&lt;p&gt;Critically, we support that use case only up to the same durability requirements that a Sprite already has. All storage on a Sprite shares this &amp;ldquo;eventual durability&amp;rdquo; property, so the terms of the VFS write make sense here. They probably don&amp;rsquo;t make sense for your application. But if for some reason they do, have at it! To enable writes with Litestream VFS, just set the &lt;code&gt;LITESTREAM_WRITE_ENABLED&lt;/code&gt; environment variable to &lt;code&gt;&amp;quot;true&amp;quot;&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src="/blog/litestream-writable-vfs/assets/write-path.png" /&gt;&lt;/p&gt;
&lt;h2 id='hydration' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#hydration' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Hydration&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;The Sprite storage stack uses SQLite in VFS mode. In our original VFS design, most data is kept in S3. Again: fine at cold start, not so fine in steady state.&lt;/p&gt;

&lt;p&gt;To solve this problem, we shoplifted a trick from &lt;a href='https://docs.kernel.org/admin-guide/device-mapper/dm-clone.html' title=''&gt;systems like dm-clone&lt;/a&gt;: background hydration. In hydration designs, we serve queries remotely while running a loop to pull the whole database.  When you start the VFS with the &lt;code&gt;LITESTREAM_HYDRATION_PATH&lt;/code&gt; environment variable set, we&amp;rsquo;ll hydrate to that file.&lt;/p&gt;

&lt;p&gt;Hydration takes advantage of &lt;a href='https://fly.io/blog/litestream-revamped#point-in-time-restores-but-fast' title=''&gt;LTX compaction&lt;/a&gt;, writing only the latest versions of each page. Reads don&amp;rsquo;t block on hydration; we serve them from object storage immediately, and switch over to the hydration file when it&amp;rsquo;s ready.&lt;/p&gt;
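&lt;p&gt;The switch-over logic can be sketched like this; again, hypothetical names rather than the real implementation:&lt;/p&gt;

```python
import threading

# Toy sketch of background hydration: reads are served from object
# storage until a full local copy finishes downloading, then flip over
# to the local file. Purely illustrative, not Litestream's actual code.

class HydratingReader:
    def __init__(self, remote_read, restore_full_copy):
        self.remote_read = remote_read    # remote_read(pgno) -> bytes
        self.restore = restore_full_copy  # pulls full copy: {pgno: bytes}
        self.local = None                 # set once hydration completes
        threading.Thread(target=self._hydrate, daemon=True).start()

    def _hydrate(self):
        # Pull the compacted database in the background (latest version
        # of each page only), then switch reads over to it.
        self.local = self.restore()

    def read_page(self, pgno):
        snapshot = self.local
        if snapshot is not None:
            return snapshot[pgno]        # hydration done: local reads
        return self.remote_read(pgno)    # still hydrating: remote reads
```

&lt;p&gt;Reads never block on the background pull; they just get faster once it finishes.&lt;/p&gt;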

&lt;p&gt;&lt;img src="/blog/litestream-writable-vfs/assets/timeline.png" /&gt;&lt;/p&gt;

&lt;p&gt;As for the hydration file? It&amp;rsquo;s simply a full copy of your database. It&amp;rsquo;s the same thing you get if you run &lt;code&gt;litestream restore&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Because this is designed for environments like Sprites, which bounce a lot, we write the database to a temporary file. We can&amp;rsquo;t trust that the database is using the latest state every time we start up, not without doing a full restore, so we just chuck the hydration file when we exit the VFS. That behavior is baked into the VFS right now. This feature&amp;rsquo;s got what Sprites need, but again, maybe not what your app wants.&lt;/p&gt;
&lt;h2 id='putting-it-all-together' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#putting-it-all-together' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Putting It All Together&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;This is a post about two relatively big moves we&amp;rsquo;ve made with our open-source Litestream project, but the features are narrowly scoped for problems that look like the ones our storage stack needs. If you think you can get use out of them, I&amp;rsquo;m thrilled, and I hope you&amp;rsquo;ll tell me about it.&lt;/p&gt;

&lt;p&gt;For ordinary read/write workloads, you don&amp;rsquo;t need any of this mechanism. Litestream works fine without the VFS, with unmodified applications, just running as a sidecar alongside your application. The whole point of that configuration is to efficiently keep up with writes; that&amp;rsquo;s easy when you know you have the whole database to work with when writes happen.&lt;/p&gt;

&lt;p&gt;But this whole thing is, to me, a valuable case study in how Litestream can get used in a relatively complicated and demanding problem domain. Sprites are very cool, and it&amp;rsquo;s satisfying to know that every disk write that happens on a Sprite is running through Litestream.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>The Design &amp; Implementation of Sprites</title>
    <link rel="alternate" href="https://fly.io/blog/design-and-implementation/"/>
    <id>https://fly.io/blog/design-and-implementation/</id>
    <published>2026-01-14T00:00:00+00:00</published>
    <updated>2026-01-16T20:23:36+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/design-and-implementation/assets/starry-containers.webp"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;We’re Fly.io, and this is the place in the post where we’d normally tell you that our job is to &lt;a href="https://fly.io/blog/docker-without-docker/" title=""&gt;take your containers and run them on our own hardware&lt;/a&gt; all around the world. But last week, we &lt;a href="https://sprites.dev/" title=""&gt;launched Sprites&lt;/a&gt;, and they don’t work that way at all. Sprites are something new: Docker without Docker without Docker. This post is about how they work.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Replacement-level homeowners buy boxes of pens and stick them in &amp;ldquo;the pen drawer&amp;rdquo;. What the elites know: you have to think adversarially about pens. &amp;ldquo;The purpose of a system is what it does&amp;rdquo;; a household&amp;rsquo;s is to uniformly distribute pens.  Months from now, the drawer will be empty, no matter how many pens you stockpile. Instead, scatter pens every place you could possibly think to look for one — drawers, ledges, desks. Any time anybody needs a pen, several are at hand, in exactly the first place they look.&lt;/p&gt;

&lt;p&gt;This is the best way I&amp;rsquo;ve found to articulate the idea of &lt;a href='https://sprites.dev/' title=''&gt;Sprites&lt;/a&gt;, the platform we just launched at Fly.io. Sprites are ball-point disposable computers. Whatever mark you mean to make, we&amp;rsquo;ve rigged it so you&amp;rsquo;re never more than a second or two away from having a Sprite to do it with.&lt;/p&gt;

&lt;p&gt;Sprites are Linux virtual machines. You get root. They &lt;code&gt;create&lt;/code&gt; in just a second or two: so fast, the experience of creating and shelling into one is identical to SSH&amp;#39;ing into a machine that already exists. Sprites all have a 100GB durable root filesystem. They put themselves to sleep automatically when inactive, and cost practically nothing while asleep.&lt;/p&gt;

&lt;p&gt;As a result, I barely feel the need to name my Sprites. Sometimes I&amp;rsquo;ll just type &lt;code&gt;sprite create dkjsdjk&lt;/code&gt; and start some task. People at Fly.io who use Sprites have dozens hanging around.&lt;/p&gt;

&lt;p&gt;There aren&amp;rsquo;t yet many things in cloud computing that have the exact shape Sprites do:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Instant creation
&lt;/li&gt;&lt;li&gt;No time limits
&lt;/li&gt;&lt;li&gt;Persistent disk
&lt;/li&gt;&lt;li&gt;Auto-sleep to a cheap inactive state
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;This is a post about how we managed to get this working. We created a new orchestration stack that undoes some of the core decisions we made for &lt;a href='https://fly.io/machines' title=''&gt;Fly Machines&lt;/a&gt;, our flagship product. Turns out, these new decisions make Sprites drastically easier for us to scale and manage. We&amp;rsquo;re pretty psyched.&lt;/p&gt;

&lt;p&gt;Lucky for me, there happen to be three &lt;code&gt;big decisions&lt;/code&gt; we made that get you 90% of the way from Fly Machines to Sprites, which makes this an easy post to write. So, without further ado:&lt;/p&gt;
&lt;h2 id='decision-1-no-more-container-images' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#decision-1-no-more-container-images' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Decision #1: No More Container Images&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;This is the easiest decision to explain.&lt;/p&gt;

&lt;p&gt;Fly Machines are approximately &lt;a href='https://fly.io/blog/docker-without-docker/' title=''&gt;OCI containers repackaged as KVM micro-VMs&lt;/a&gt;. They have the ergonomics of Docker but the isolation and security of an EC2 instance. We love them very much and they&amp;rsquo;re clearly the wrong basis for a ball-point disposable cloud computer.&lt;/p&gt;

&lt;p&gt;The &amp;ldquo;one weird trick&amp;rdquo; of Fly Machines is that they &lt;code&gt;start&lt;/code&gt; and &lt;code&gt;stop&lt;/code&gt; instantly, fast enough that they can wake in time to handle an incoming HTTP request. But they can only do that if you&amp;rsquo;ve already &lt;code&gt;created&lt;/code&gt; them. You have to preallocate. &lt;code&gt;Creating&lt;/code&gt; a Fly Machine can take over a minute. What you&amp;rsquo;re supposed to do is to create a whole bunch of them and &lt;code&gt;stop&lt;/code&gt; them so they&amp;rsquo;re ready when you need them. But for Sprites, we need &lt;code&gt;create&lt;/code&gt; to be so fast it feels like they&amp;rsquo;re already there waiting for you.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;We only murdered user containers because we wanted them dead.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Most of what&amp;rsquo;s slow about &lt;code&gt;creating&lt;/code&gt; a Fly Machine is containers. I say this with affection: your containers are crazier than a soup sandwich. Huge and fussy, they take forever to &lt;a href='https://fly.io/blog/docker-without-docker/' title=''&gt;pull and unpack&lt;/a&gt;. The regional locality sucks;  &lt;code&gt;create&lt;/code&gt; a Fly Machine in São Paulo on &lt;code&gt;gru-3838&lt;/code&gt;, and a &lt;code&gt;create&lt;/code&gt; on  &lt;code&gt;gru-d795&lt;/code&gt; is no faster. A &lt;a href='https://community.fly.io/t/global-registry-now-in-production/13723' title=''&gt;truly heartbreaking&lt;/a&gt; amount of &lt;a href='https://community.fly.io/t/faster-more-reliable-remote-image-builds-deploys/25841' title=''&gt;engineering work&lt;/a&gt; has gone into just allowing our OCI registry to &lt;a href='https://www.youtube.com/watch?v=0jD-Rt4_CR8' title=''&gt;keep up&lt;/a&gt; with this system. &lt;/p&gt;

&lt;p&gt;It&amp;rsquo;s a tough job, is all I&amp;rsquo;m saying. Sprites get rid of the user-facing container. Literally: problem solved. Sprites get to do this on easy mode.&lt;/p&gt;

&lt;p&gt;Now, today, under the hood, Sprites are still Fly Machines. But they all run from a standard container. Every physical worker knows exactly what container the next Sprite is going to start with, so it&amp;rsquo;s easy for us to keep pools of &amp;ldquo;empty&amp;rdquo; Sprites standing by. The result: a Sprite &lt;code&gt;create&lt;/code&gt; doesn&amp;rsquo;t have any heavy lifting to do; it&amp;rsquo;s basically just doing the stuff we do when we &lt;code&gt;start&lt;/code&gt; a Fly Machine.&lt;/p&gt;
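&lt;p&gt;In pseudocode, the warm-pool trick amounts to something like this toy sketch (the names are made up; the real pools live in our orchestration stack):&lt;/p&gt;

```python
import collections

# Toy sketch of a warm pool: because every Sprite boots from the same
# standard container, "create" can just claim a pre-booted VM and top
# the pool back up in the background. Illustrative only.

class WarmPool:
    def __init__(self, boot_vm, target=5):
        self.boot_vm = boot_vm  # boots an "empty" standby VM (slow path)
        self.target = target
        self.pool = collections.deque(boot_vm() for _ in range(target))

    def create_sprite(self, name):
        # Fast path: claim a standing-by VM instead of booting one.
        vm = self.pool.popleft() if self.pool else self.boot_vm()
        self.refill()
        return {"name": name, "vm": vm}

    def refill(self):
        # Keep the pool at its target size for the next create.
        while self.target > len(self.pool):
            self.pool.append(self.boot_vm())
```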
&lt;figure class="post-cta"&gt;
  &lt;figcaption&gt;
    &lt;h1&gt;This all works right now.&lt;/h1&gt;
    &lt;p&gt;You can create a couple dozen Sprites right now if you want. It&amp;rsquo;ll only take a second.&lt;/p&gt;
      &lt;a class="btn btn-lg" href="https://sprites.dev/"&gt;
        Make a Sprite.  &lt;span class='opacity-50'&gt;→&lt;/span&gt;
      &lt;/a&gt;
  &lt;/figcaption&gt;
  &lt;div class="image-container"&gt;
    &lt;img src="/static/images/cta-cat.webp" srcset="/static/images/[email protected] 2x" alt=""&gt;
  &lt;/div&gt;
&lt;/figure&gt;

&lt;h2 id='decision-2-object-storage-for-disks' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#decision-2-object-storage-for-disks' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Decision #2: Object Storage For Disks&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Every Sprite comes with 100GB of durable storage. We&amp;rsquo;re able to do that because the root of storage is S3-compatible object storage.&lt;/p&gt;

&lt;p&gt;You can arrange for 100GB of storage for a Fly Machine. Or 200, or 500. The catch:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You have to ask (with &lt;code&gt;flyctl&lt;/code&gt;); we can&amp;rsquo;t reasonably default it in.
&lt;/li&gt;&lt;li&gt;That storage is NVMe attached to the physical server your Fly Machine is on. 
&lt;/li&gt;&lt;/ul&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;[†] we print a &lt;span style="color: red"&gt;big red warning&lt;/span&gt; about this if you try to make a single-node cluster&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;We designed the storage stack for Fly Machines for Postgres clusters. A multi-replica Postgres cluster gets good mileage out of Fly Volumes. Attached storage is fast, but can &lt;span style="color: red"&gt;lose data†&lt;/span&gt; — if a physical blows up, there&amp;rsquo;s no magic that rescues its stored bits. You&amp;rsquo;re stuck with our last snapshot backup. That&amp;rsquo;s fine for a replicated Postgres! It&amp;rsquo;s part of what Postgres replication is for. But for anything without explicit replication, it&amp;rsquo;s a very sharp edge.&lt;/p&gt;

&lt;p&gt;Worse, from our perspective, is that attached storage anchors workloads to specific physicals. We have lots of reasons to want to move Fly Machines around. Before we  did Fly Volumes, that was as simple as pushing a &amp;ldquo;drain&amp;rdquo; button on a server. Imagine losing a capability like that. It took 3 years to &lt;a href='https://fly.io/blog/machine-migrations/' title=''&gt;get workload migration right&lt;/a&gt; with attached storage, and it&amp;rsquo;s still not &amp;ldquo;easy&amp;rdquo;.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;Object stores are the Internet’s Hoover Dams, the closest things we have to infrastructure megaprojects.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Sprites jettison this model. We still exploit NVMe, but not as the root of storage. Instead, it&amp;rsquo;s a read-through cache for a blob on object storage. S3-compatible object stores are the most trustworthy storage technology we have. I can feel my blood pressure dropping just typing the words &amp;ldquo;Sprites are backed by object storage.&amp;rdquo;&lt;/p&gt;

&lt;p&gt;The implications of this for orchestration are profound. In a real sense, the durable state of a Sprite is simply a URL. Wherever he lays his hat is his home! They migrate (or recover from failed physicals) trivially. It&amp;rsquo;s early days for our internal tooling, but we have so many new degrees of freedom to work with.&lt;/p&gt;

&lt;p&gt;I could easily do another 1500-2000 words here on the Cronenberg film Kurt came up with for the actual storage stack, but because it&amp;rsquo;s in flux, let&amp;rsquo;s keep it simple.&lt;/p&gt;

&lt;p&gt;The Sprite storage stack is organized around the JuiceFS model (in fact, we currently use a very hacked-up JuiceFS, with a rewritten SQLite metadata backend). It works by splitting storage into data (&amp;ldquo;chunks&amp;rdquo;) and metadata (a map of where the &amp;ldquo;chunks&amp;rdquo; are). Data chunks live on object stores; metadata lives in fast local storage.  In our case, that metadata store is &lt;a href='https://litestream.io/' title=''&gt;kept durable with Litestream&lt;/a&gt;. Nothing depends on local storage.&lt;/p&gt;
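&lt;p&gt;A toy sketch of that data/metadata split, with illustrative names (not JuiceFS or our actual stack): immutable chunks go to the object store, and a small local map records which chunks make up each file.&lt;/p&gt;

```python
import hashlib

# Toy sketch of a JuiceFS-style split: content-addressed data "chunks"
# live in object storage; a small metadata map lives locally.

class ChunkStore:
    def __init__(self):
        self.objects = {}   # stands in for an S3 bucket
        self.metadata = {}  # path -> ordered list of chunk ids (small)

    def write_file(self, path, data, chunk_size=4):
        ids = []
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            cid = hashlib.sha256(chunk).hexdigest()
            self.objects[cid] = chunk  # chunks are immutable
            ids.append(cid)
        self.metadata[path] = ids      # only metadata changes per file

    def read_file(self, path):
        return b"".join(self.objects[cid] for cid in self.metadata[path])

    def checkpoint(self, path, tag):
        # A checkpoint just copies the chunk list; it's a metadata
        # shuffle with no data movement, which is why it's fast.
        self.metadata[(path, tag)] = list(self.metadata[path])
```

&lt;p&gt;The payoff shows up in the checkpoint: it copies a list of chunk ids, not 100GB of data.&lt;/p&gt;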
&lt;div class="right-sidenote"&gt;&lt;p&gt;(our pre-installed Claude Code will checkpoint aggressively for you without asking)&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;This also buys Sprites fast &lt;code&gt;checkpoint&lt;/code&gt; and &lt;code&gt;restore&lt;/code&gt;. Checkpoints are so fast we want you to use them as a basic feature of the system and not as an escape hatch when things go wrong; like a git restore, not a system restore. That works because both &lt;code&gt;checkpoint&lt;/code&gt; and &lt;code&gt;restore&lt;/code&gt; merely shuffle metadata around.&lt;/p&gt;

&lt;p&gt;Our stack sports &lt;a href='https://en.wikipedia.org/wiki/Dm-cache' title=''&gt;a dm-cache-like&lt;/a&gt; feature that takes advantage of attached storage. A Sprite has a sparse 100GB NVMe volume attached to it, which the stack uses to cache chunks to eliminate read amplification. Importantly (I can feel my resting heart rate lowering) nothing in that NVMe volume should matter; stored chunks are immutable and their true state lives on the object store.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;Our preference for object storage goes further than the Sprite storage stack. The global orchestrator for Sprites is an Elixir/Phoenix app that uses object storage as the primary source of metadata for accounts. We then give each account an independent SQLite database, again made durable on object storage with Litestream.&lt;/p&gt;
&lt;/div&gt;&lt;h2 id='decision-3-inside-out-orchestration' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#decision-3-inside-out-orchestration' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Decision #3: Inside-Out Orchestration&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;In the cloud hosting industry, user applications are managed by two separate, yet equally important components: the host, which orchestrates workloads, and the guest, which runs them. Sprites flip that on its head: the most important orchestration and management work happens inside the VM.&lt;/p&gt;

&lt;p&gt;Here&amp;rsquo;s the trick: user code running on a Sprite isn&amp;rsquo;t running in the root namespace. We&amp;rsquo;ve slid a container between you and the kernel. You see an inner environment, managed by a fleet of services running in the root namespace of the VM.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;I wish we’d done Fly Machines this way to begin with. I’m not sure there’s a downside. The inner container allows us to bounce a Sprite without rebooting the whole VM, even on checkpoint restores. I think Fly Machines users could get some mileage out of that feature, too.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;With Sprites, we&amp;rsquo;re pushing this idea as far as we can. The root environment hosts the majority of our orchestration code. When you talk to the global API, chances are you&amp;rsquo;re talking directly to your own VM. Furthermore:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Our storage stack, which handles checkpoint/restore and persistence to object storage, lives there;
&lt;/li&gt;&lt;li&gt;so does the service manager we expose to Sprites, which registers user code that needs to restart when a Sprite bounces;
&lt;/li&gt;&lt;li&gt;same with logs;
&lt;/li&gt;&lt;li&gt;if you bind a socket to &lt;code&gt;*:8080&lt;/code&gt;, we&amp;rsquo;ll make it available outside the Sprite — yep, that&amp;rsquo;s in the root namespace too.
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;Platform developers at Fly.io know how much easier it can be to hack on &lt;code&gt;init&lt;/code&gt; (inside the container) than things &lt;a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''&gt;like &lt;code&gt;flyd&lt;/code&gt;&lt;/a&gt;, the Fly Machines orchestrator that runs on  the host. Changes to Sprites don&amp;rsquo;t restart host components or muck with global state. The blast radius is just new VMs that pick up the change. We sleep on how much platform work doesn&amp;rsquo;t get done not because the code is hard to write, but because it&amp;rsquo;s so time-consuming to ensure benign-looking changes don&amp;rsquo;t throw the whole fleet into metastable failure. We had that in mind when we did Sprites.&lt;/p&gt;
&lt;h2 id='we-keep-the-parts-that-worked' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#we-keep-the-parts-that-worked' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;We Keep The Parts That Worked&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Sprites running on Fly.io take advantage of the infrastructure we already have. For instance: Sprites might be the fastest way that currently exists to get Claude or Gemini to build a full-stack application on the Internet.&lt;/p&gt;

&lt;p&gt;That&amp;rsquo;s because Sprites plug directly into &lt;a href='https://fly.io/blog/corrosion/' title=''&gt;Corrosion, our gossip-based service discovery system&lt;/a&gt;. When you ask the Sprite API to make a public URL for your Sprite, we generate a Corrosion update that propagates across our fleet instantly. Your application is then served, with an HTTPS URL, from our proxy edges.&lt;/p&gt;

&lt;p&gt;Sprites live alongside Fly Machines in our architecture. They include some changes  that are pure wins, but they&amp;rsquo;re mostly tradeoffs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We&amp;rsquo;ve always wanted to run Fly Machine disks off object storage (&lt;a href='https://community.fly.io/t/bottomless-s3-backed-volumes/15648' title=''&gt;we have an obscure LSVD feature that does this&lt;/a&gt;), but the performance isn&amp;rsquo;t adequate for a hot Postgres node in production.
&lt;/li&gt;&lt;li&gt;For that matter, professional production apps ship out of CI/CD systems as OCI containers; that&amp;rsquo;s a big part of what makes orchestrating Fly Machines so hard.
&lt;/li&gt;&lt;li&gt;Most (though not all) Sprite usage is interactive, and Sprite users benefit from their VMs aggressively sleeping themselves to keep costs low; e-commerce apps measure responsiveness in milliseconds and want their workloads kept warm.
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;Sprites are optimized for a different kind of computing than Fly Machines, and &lt;a href='https://fly.io/blog/code-and-let-live/' title=''&gt;while Kurt believes that the future belongs to malleable, personalized apps&lt;/a&gt;, I&amp;rsquo;m not so sure. To me, it makes sense to prototype and acceptance-test an application on Sprites. Then, when you&amp;rsquo;re happy with it, containerize it and ship it as a Fly Machine to scale it out. An automated workflow for that will happen.&lt;/p&gt;

&lt;p&gt;Finally, Sprites are a contract with user code: an API and a set of expectations about how the execution environment works. Today, they run on top of Fly Machines. But they don&amp;rsquo;t have to. Jerome&amp;rsquo;s working on an open-source local Sprite runtime. We&amp;rsquo;ll find other places to run them, too.&lt;/p&gt;
&lt;h2 id='you-wont-get-it-until-you-use-them' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#you-wont-get-it-until-you-use-them' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;You Won&amp;rsquo;t Get It Until You Use Them&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;I can&amp;rsquo;t not sound like a shill. Sprites are the one thing we&amp;rsquo;ve shipped that I personally experience as addictive. I haven&amp;rsquo;t fully put my finger on why it feels so much easier to kick off projects now that I can snap my fingers and get a whole new computer. The whole point is that there&amp;rsquo;s no reason to parcel them out, or decide which code should run where. You just make a new one.&lt;/p&gt;

&lt;p&gt;So to make this fully click, I think you should &lt;a href='https://sprites.dev/' title=''&gt;just install the &lt;code&gt;sprite&lt;/code&gt; command&lt;/a&gt;, make a Sprite, and then run an agent in it. We&amp;rsquo;ve preinstalled Claude, Gemini, and Codex, and taught them how to do things like checkpointing and restoring, registering services, and getting logs. Claude will run in &lt;code&gt;--dangerously-skip-permissions&lt;/code&gt; mode (because why wouldn&amp;rsquo;t it). Have it build something; I built a &amp;ldquo;Chicago&amp;rsquo;s best sandwich&amp;rdquo; bracket app for a Slack channel.&lt;/p&gt;

&lt;p&gt;Sprites bill only for what you actually use (in particular: only for storage blocks you actually write, not the full 100GB capacity). It&amp;rsquo;s reasonable to create a bunch. They&amp;rsquo;re ballpoint-pen disposable computers. After you get a feel for them, it&amp;rsquo;ll start to feel weird not having them handy.&lt;/p&gt;
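That billing distinction, capacity you see versus blocks you've actually written, is the same one sparse files draw, and you can observe it directly. A minimal illustration, assuming a filesystem with sparse-file support; this is not Sprite code, just the general mechanism:

```python
import os
import tempfile

# A file with a huge apparent size but almost nothing allocated on disk:
# the "capacity vs. blocks actually written" distinction described above.
fd, path = tempfile.mkstemp()
os.close(fd)
os.truncate(path, 100 * 1024**3)       # claim "100GB" of capacity, instantly

st = os.stat(path)
apparent = st.st_size                  # what `ls -l` reports: 100 GB
allocated = st.st_blocks * 512         # what's actually backed by storage
print(f"apparent={apparent:,} bytes, allocated={allocated:,} bytes")
os.remove(path)
```

On a typical Linux filesystem the allocated figure stays near zero until you write real data into the file.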
</content>
  </entry>
  <entry>
    <title>Code And Let Live</title>
    <link rel="alternate" href="https://fly.io/blog/code-and-let-live/"/>
    <id>https://fly.io/blog/code-and-let-live/</id>
    <published>2026-01-09T00:00:00+00:00</published>
    <updated>2026-01-14T19:59:01+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/code-and-let-live/assets/sprites.jpg"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;The state of the art in agent isolation is a read-only sandbox. At Fly.io, we’ve been selling that story for years, and we’re calling it: ephemeral sandboxes are obsolete. Stop killing your sandboxes every time you use them.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;My argument won&amp;rsquo;t make sense without showing you something new we&amp;rsquo;ve built. We&amp;rsquo;re all adults here, this is a company, we talk about what we do. Here goes.&lt;/p&gt;

&lt;p&gt;So, I want to run some code. So what I do is, I run &lt;code&gt;sprite create&lt;/code&gt;. While it operates, I&amp;rsquo;ll explain what&amp;rsquo;s happening behind the—&lt;/p&gt;
&lt;div class="highlight-wrapper group relative bash"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-i429cz3y"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-i429cz3y"&gt;✓ Created demo-123 sprite &lt;span class="k"&gt;in &lt;/span&gt;1.0s
● Connecting to console...
sprite@sprite:~#
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Shit, it&amp;rsquo;s already there.&lt;/p&gt;

&lt;p&gt;That&amp;rsquo;s a root shell on a Linux computer we now own. It came online in about the same amount of time it would take to &lt;code&gt;ssh&lt;/code&gt; into a host that already existed. We call these things &amp;ldquo;Sprites&amp;rdquo;.&lt;/p&gt;

&lt;p&gt;Let&amp;rsquo;s install FFmpeg on our Sprite:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative bash"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-bwjgxaic"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-bwjgxaic"&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; ffmpeg &lt;span class="o"&gt;&amp;gt;&lt;/span&gt;/dev/null 2&amp;gt;&amp;amp;1
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Unlike creating the Sprite in the first place, installing &lt;code&gt;ffmpeg&lt;/code&gt; with &lt;code&gt;apt-get&lt;/code&gt; is dog slow. Let&amp;rsquo;s try not to have to do that again:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative bash"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-iacnzrtv"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-iacnzrtv"&gt;sprite@sprite:~# sprite-env checkpoints create
&lt;span class="c"&gt;# ...&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"type"&lt;/span&gt;:&lt;span class="s2"&gt;"complete"&lt;/span&gt;,&lt;span class="s2"&gt;"data"&lt;/span&gt;:&lt;span class="s2"&gt;"Checkpoint v1 created successfully"&lt;/span&gt;,
&lt;span class="s2"&gt;"time"&lt;/span&gt;:&lt;span class="s2"&gt;"2025-12-22T22:50:48.60423809Z"&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This completes instantly. Didn&amp;rsquo;t even bother to measure.&lt;/p&gt;

&lt;p&gt;I step away to get coffee. Time passes. The Sprite, noticing my inactivity, goes to sleep. I meet an old friend from high school at the coffee shop. End up spending the day together. More time passes. Days even. Returning later:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative bash"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-k9uw0dxr"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-k9uw0dxr"&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$ &lt;/span&gt;sprite console 
sprite@sprite:~# ffmpeg
ffmpeg version 7.1.1-1ubuntu1.3 Copyright &lt;span class="o"&gt;(&lt;/span&gt;c&lt;span class="o"&gt;)&lt;/span&gt; 2000-2025 the FFmpeg developers
Use &lt;span class="nt"&gt;-h&lt;/span&gt; to get full &lt;span class="nb"&gt;help &lt;/span&gt;or, even better, run &lt;span class="s1"&gt;'man ffmpeg'&lt;/span&gt;
sprite@sprite:~#
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Everything&amp;rsquo;s where I left it. Sprites are durable. 100GB capacity to start, no ceremony. Maybe I&amp;rsquo;ll keep it around a few more days, maybe a few months, doesn&amp;rsquo;t matter, just works.&lt;/p&gt;

&lt;p&gt;Say I get an application up on its legs. Install more packages. Then: disaster. Maybe an ill-advised global &lt;code&gt;pip3 install&lt;/code&gt;. Or &lt;code&gt;rm -rf $HMOE/bin&lt;/code&gt;. Or &lt;code&gt;dd if=/dev/random of=/dev/vdb&lt;/code&gt;. Whatever it was, everything&amp;rsquo;s broken. So:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative bash"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-8qs3qsqn"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-8qs3qsqn"&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$ &lt;/span&gt;sprite checkpoint restore v1
Restoring from checkpoint v1...
Container components started successfully
Restore from v1 &lt;span class="nb"&gt;complete&lt;/span&gt;

&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$ &lt;/span&gt;sprite console
sprite@sprite:~#
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Sprites have first-class checkpoint and restore. You can&amp;rsquo;t see it in text, but that restore took about one second. It&amp;rsquo;s fast enough to use casually, interactively. Not an escape hatch. Rather: an intended part of the ordinary course of using a Sprite. Like &lt;code&gt;git&lt;/code&gt;, but for the whole system.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;If you’re asking how this is any different from an EC2 instance, good. That’s what we’re going for, except:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I can &lt;strong class="font-semibold text-navy-950"&gt;casually create hundreds of them&lt;/strong&gt; (without needing a Docker container), each appearing in 1-2 seconds.
&lt;/li&gt;&lt;li&gt;They &lt;strong class="font-semibold text-navy-950"&gt;go idle and stop metering automatically&lt;/strong&gt;, so it’s cheap to have lots of them. I use dozens.
&lt;/li&gt;&lt;li&gt;They’re &lt;strong class="font-semibold text-navy-950"&gt;hooked up to our Anycast&lt;/strong&gt; network, so I can get an HTTPS URL.
&lt;/li&gt;&lt;li&gt;Despite all that, &lt;strong class="font-semibold text-navy-950"&gt;they’re fully durable&lt;/strong&gt;. They don’t die until I tell them to.
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;This combination of attributes isn’t common enough to already have a name, so we decided we get to name them “Sprites”. Sprites are like BIC disposable cloud computers.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;That&amp;rsquo;s what we built. You can &lt;a href='https://sprites.dev/' title=''&gt;go try it yourself&lt;/a&gt;. We wrote another 1000 words about how they work, but I cut them out because I want to stop talking about our products now and get to my point.&lt;/p&gt;
&lt;h2 id='claude-doesnt-want-a-stateless-container' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#claude-doesnt-want-a-stateless-container' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Claude Doesn&amp;rsquo;t Want A Stateless Container&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;For years, we&amp;rsquo;ve been trying to serve two very different users with the same abstraction. It hasn&amp;rsquo;t worked.&lt;/p&gt;

&lt;p&gt;Professional software developers are trained to build stateless instances. Stateless deployments, where persistent data is confined to database servers, buy you simplicity, flexible scale-out, and reduced failure blast radius. It&amp;rsquo;s a good idea, so popular that most places you can run code in the cloud look like stateless containers. Fly Machines, our flagship offering, look like stateless containers.&lt;/p&gt;

&lt;p&gt;The problem is that Claude isn&amp;rsquo;t a pro developer. Claude is a hyper-productive five-year-old savant. It&amp;rsquo;s uncannily smart, wants to stick its finger in every available electrical socket, and works best when you find a way to let it zap itself.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;(sometimes by escaping the container!)&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;If you force an agent to, it&amp;rsquo;ll work around containerization and do the work. But you&amp;rsquo;re not helping the agent in any way by doing that. They don&amp;rsquo;t want containers. They don&amp;rsquo;t want &amp;ldquo;sandboxes&amp;rdquo;. They want computers.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;Someone asked me about this the other day and wanted to know if I was saying that agents needed sound cards and USB ports. And, maybe? I don’t know. Not today.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;In a moment, I&amp;rsquo;ll explain why. But first I probably need to explain what the hell I mean by a &amp;ldquo;computer&amp;rdquo;. I think we all agree:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A computer doesn&amp;rsquo;t necessarily vanish after a single job is completed, &lt;em&gt;and&lt;/em&gt; 
&lt;/li&gt;&lt;li&gt;it has durable storage. 
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;Since current agent sandboxes have neither of these, I can stop the definition right there and get back to my point.&lt;/p&gt;
&lt;h2 id='simple-wins' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#simple-wins' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Simple Wins&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Start here: with an actual computer, Claude doesn&amp;rsquo;t have to rebuild my entire development environment every time I pick up a PR.&lt;/p&gt;

&lt;p&gt;This seems superficial but rebuilding stuff like &lt;code&gt;node_modules&lt;/code&gt; is such a monumental pain in the ass that the industry is spending tens of millions of dollars figuring out how to snapshot and restore ephemeral sandboxes.&lt;/p&gt;

&lt;p&gt;I&amp;rsquo;m not saying those problems are intractable. I&amp;rsquo;m saying they&amp;rsquo;re unnecessary. Instead of figuring them out, just use an actual computer. Work out a PR, review and push it, then just start on the next one. Without rebooting.&lt;/p&gt;

&lt;p&gt;People will rationalize why it&amp;rsquo;s a good thing that they start from a new build environment every time they start a changeset. Stockholm Syndrome. When you start a feature branch on your own, do you create an entirely new development environment to do it?&lt;/p&gt;

&lt;p&gt;The reason agents waste all this effort is that nobody saw them coming. Read-only ephemeral sandboxes were the only tool we had hanging on the wall to help us use them sanely.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;Have you ever had to set up actual infrastructure to give an agent access to realistic data? People do this. Because they know they’re dealing with a clean slate every time they prompt their agent, they arrange for S3 buckets, Redis servers, or even RDS instances outside the sandbox for their agents to talk to. They’re building infrastructure to work around the fact that they can’t just write a file and trust it to stay put. Gross.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Ephemerality means time limits. Providers design sandbox systems to handle the expected workloads agents generate. Most things agents do today don&amp;rsquo;t take much time; in fact, they&amp;rsquo;re often limited only by the rate at which frontier models can crunch tokens. Test suites run quickly. The 99th percentile sandboxed agent run probably needs less than 15 minutes.&lt;/p&gt;

&lt;p&gt;But there are feature requests where compute and network time swamp token crunching. I built the documentation site for the Sprites API by having a Claude Sprite interact with the code and our API, building and testing examples for the API one at a time. There are APIs where the client interaction time alone would blow sandbox budgets.&lt;/p&gt;

&lt;p&gt;You see the limits of the current approach in how people round-trip state through &amp;ldquo;plan files&amp;rdquo;, which are ostensibly prose but often really just egregiously-encoded key-value stores.&lt;/p&gt;
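To make that criticism concrete, here's a hypothetical sketch (the file contents and key names are invented for illustration, not taken from any real agent) of a "plan file" that reads as prose but is really just a key-value store the agent round-trips its state through:

```python
import re

# A hypothetical "plan file": ostensibly markdown prose, actually state storage.
plan = """\
## Plan
- **status**: step 3 of 7
- **branch**: feature/ratings
- **next_action**: run the test suite
"""

def parse_plan(text: str) -> dict:
    """Recover the key-value store hiding inside the 'prose'."""
    return dict(re.findall(r"- \*\*(\w+)\*\*: (.+)", text))

state = parse_plan(plan)
print(state["status"])  # the agent's real state, encoded as bullet points
```

A file with durable state on an actual computer makes this whole encode/decode dance unnecessary.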

&lt;p&gt;An agent running on an actual computer can exploit the whole lifecycle of the application. We saw this when Chris McCord built &lt;a href='https://phoenix.new/' title=''&gt;Phoenix.new&lt;/a&gt;. The agent behind a Phoenix.new app runs on a Fly Machine where it can see the app logs from the Phoenix app it generated. When users do things that generate exceptions, Phoenix.new notices and gets to work figuring out what happened.&lt;/p&gt;

&lt;p&gt;It took way too much work for Chris to set that up, and he was able to do it in part because he wrote his own agent. You can do it with Claude today with an MCP server or some other arrangement to haul logs over. But all you really need is to just not shoot your sandbox in the head when the agent finishes writing code.&lt;/p&gt;
&lt;h2 id='galaxy-brain-win' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#galaxy-brain-win' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Galaxy Brain Win&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Here&amp;rsquo;s where I lose you. I know this because it&amp;rsquo;s also where I lose my team, most of whom don&amp;rsquo;t believe me about this.&lt;/p&gt;

&lt;p&gt;The nature of software development is changing out from under us, and I think we&amp;rsquo;re kidding ourselves that it&amp;rsquo;s going to end with just a reconfiguration of how professional developers ship software.&lt;/p&gt;

&lt;p&gt;I have kids. They have devices. I wanted some control over them. So I did what many of you would do in my situation: I vibe-coded an MDM.&lt;/p&gt;

&lt;p&gt;&lt;img src="/blog/code-and-let-live/assets/kurtmdm.png?1/2&amp;amp;card&amp;amp;center" /&gt;&lt;/p&gt;

&lt;p&gt;I built this thing with Claude. It&amp;rsquo;s a SQLite-backed Go application running on a Sprite. The Anycast URL my Sprite exports works as an MDM registration URL. Claude also worked out all the APNS Push Certificate drama for me. It all just works.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;“Editing PHP files over FTP: we weren’t wrong,  just ahead of our time!”&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;I&amp;rsquo;ve been running this for a month now, still on a Sprite, and see no reason ever to stop. It is a piece of software that solves an important real-world problem for me. It might evolve as my needs change, and I tell Claude to change it. Or it might not. For this app, dev is prod, prod is dev.&lt;/p&gt;

&lt;p&gt;For reasons we&amp;rsquo;ll get into when we write up how we built these things, you wouldn&amp;rsquo;t want to ship an app to millions of people on a Sprite. But most apps don&amp;rsquo;t want to serve millions of people. The most important day-to-day apps disproportionately won&amp;rsquo;t have million-person audiences. There are some important million-person apps, but most of them just destroy civil society, melt our brains, and arrange chauffeurs for individual cheeseburgers.&lt;/p&gt;

&lt;p&gt;Applications that solve real problems for people will be owned by the people they solve problems for. And for the most part, they won&amp;rsquo;t need a professional guild of software developers to gatekeep feature development for them. They&amp;rsquo;ll just ask for things and get them.&lt;/p&gt;

&lt;p&gt;The problem we&amp;rsquo;re all working on is bigger than safely accelerating pro software developers. Sandboxes are holding us back.&lt;/p&gt;
&lt;h2 id='fuck-ephemeral-sandboxes' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#fuck-ephemeral-sandboxes' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Fuck Ephemeral Sandboxes&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Obviously, I&amp;rsquo;m trying to sell you something here. But that doesn&amp;rsquo;t make me wrong. The argument I&amp;rsquo;m making is the reason we built the specific thing I&amp;rsquo;m selling.&lt;/p&gt;
&lt;figure class="post-cta"&gt;
  &lt;figcaption&gt;
    &lt;h1&gt;We shipped these things.&lt;/h1&gt;
    &lt;p&gt;You can create a couple dozen Sprites right now if you want. It&amp;rsquo;ll only take a second.&lt;/p&gt;
      &lt;a class="btn btn-lg" href="https://sprites.dev/"&gt;
        Make a Sprite.  &lt;span class='opacity-50'&gt;→&lt;/span&gt;
      &lt;/a&gt;
  &lt;/figcaption&gt;
  &lt;div class="image-container"&gt;
    &lt;img src="/static/images/cta-dog.webp" srcset="/static/images/[email protected] 2x" alt=""&gt;
  &lt;/div&gt;
&lt;/figure&gt;


&lt;p&gt;It took us a long time to get here. We spent years kidding ourselves. We built a platform for horizontal-scaling production applications with micro-VMs that boot so quickly that, if you hold them in exactly the right way, you can do a pretty decent code sandbox with them. But it&amp;rsquo;s always been a square peg, round hole situation.&lt;/p&gt;

&lt;p&gt;We have a lot to say about how Sprites work. They&amp;rsquo;re related to Fly Machines but sharply different in important ways. They have an entirely new storage stack. They&amp;rsquo;re orchestrated differently. No Dockerfiles.&lt;/p&gt;

&lt;p&gt;But for now, I just want you to think about what I&amp;rsquo;m saying here. Whether or not you ever boot a Sprite, ask: if you could run a coding agent anywhere, would you want it to look more like a read-only sandbox in a K8s cluster in the cloud, or like an entire EC2 instance you could summon in the snap of a finger?&lt;/p&gt;

&lt;p&gt;I think the answer is obvious. The age of sandboxes is over. The time of the disposable computer has come.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Litestream VFS</title>
    <link rel="alternate" href="https://fly.io/blog/litestream-vfs/"/>
    <id>https://fly.io/blog/litestream-vfs/</id>
    <published>2025-12-11T00:00:00+00:00</published>
    <updated>2025-12-11T17:32:13+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/litestream-vfs/assets/litestream-vfs.jpg"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;&lt;strong class="font-semibold text-navy-950"&gt;I’m Ben Johnson, and I work on Litestream at Fly.io. Litestream is the missing backup/restore system for SQLite. It’s free, open-source software that should run anywhere, and&lt;/strong&gt; &lt;a href="/blog/litestream-v050-is-here/" title=""&gt;&lt;strong class="font-semibold text-navy-950"&gt;you can read more about it here&lt;/strong&gt;&lt;/a&gt;&lt;strong class="font-semibold text-navy-950"&gt;.&lt;/strong&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Again with the sandwiches: assume we&amp;rsquo;ve got a SQLite database of sandwich ratings, and we&amp;rsquo;ve backed it up with &lt;a href='/blog/litestream-v050-is-here/' title=''&gt;Litestream&lt;/a&gt; to an S3 bucket.&lt;/p&gt;

&lt;p&gt;Now, on our local host, load up AWS credentials and an S3 path into our environment. Open SQLite and:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative "&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-z396uf60"&gt;$ sqlite3
SQLite version 3.50.4 2025-07-30 19:33:53
sqlite&amp;gt; .load litestream.so
sqlite&amp;gt; .open file:///my.db?vfs=litestream
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;SQLite is now working from that remote database, defined by the Litestream backup files in the S3 path we configured. We can query it:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative "&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-kieef97f"&gt;sqlite&amp;gt; SELECT * FROM sandwich_ratings ORDER BY RANDOM() LIMIT 3 ; 
22|Veggie Delight|New York|4
30|Meatball|Los Angeles|5
168|Chicken Shawarma Wrap|Detroit|5
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This is Litestream VFS. It runs SQLite hot off an object storage URL. As long as you can load the shared library our tree builds for you, it&amp;rsquo;ll work in your application the same way it does in the SQLite shell.&lt;/p&gt;

&lt;p&gt;Fun fact: we didn&amp;rsquo;t have to download the whole database to run this query. More about this in a bit.&lt;/p&gt;

&lt;p&gt;Meanwhile, somewhere in prod, someone has it in for meatball subs and wants to knock them out of the bracket – oh, fuck:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative "&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-oexge9kc"&gt;sqlite&amp;gt; UPDATE sandwich_ratings SET stars = 1 ;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;They forgot the &lt;code&gt;WHERE&lt;/code&gt; clause!&lt;/p&gt;
&lt;div class="highlight-wrapper group relative "&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-2mgicvsr"&gt;sqlite&amp;gt; SELECT * FROM sandwich_ratings ORDER BY RANDOM() LIMIT 3 ; 
97|French Dip|Los Angeles|1
140|Bánh Mì|San Francisco|1
62|Italian Beef|Chicago|1
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Italian Beefs and Bánh Mìs, all at 1 star. Disaster!&lt;/p&gt;

&lt;p&gt;But wait, back on our dev machine:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative "&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-r5hggeuc"&gt;sqlite&amp;gt; PRAGMA litestream_time = '5 minutes ago'; 
sqlite&amp;gt; select * from sandwich_ratings ORDER BY RANDOM() LIMIT 3 ; 
30|Meatball|Los Angeles|5
33|Ham &amp;amp; Swiss|Los Angeles|2
163|Chicken Shawarma Wrap|Detroit|5
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;We&amp;rsquo;re now querying that database from a specific point in time in our backups. We can do arbitrary relative timestamps, or absolute ones, like &lt;code&gt;2000-01-01T00:00:00Z&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;What we&amp;rsquo;re doing here is instantaneous point-in-time recovery (PITR), expressed simply in SQL and SQLite pragmas.&lt;/p&gt;

&lt;p&gt;Ever wanted to do a quick query against a prod dataset, but didn&amp;rsquo;t want to shell into a prod server and fumble with the &lt;code&gt;sqlite3&lt;/code&gt; terminal command like a hacker in an 80s movie? Or needed to do a quick sanity check against yesterday&amp;rsquo;s data, but without doing a full database restore? Litestream VFS makes that easy. I&amp;rsquo;m so psyched about how it turned out.&lt;/p&gt;
&lt;h2 id='how-it-works' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#how-it-works' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;How It Works&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;a href='/blog/litestream-v050-is-here/' title=''&gt;Litestream v0.5&lt;/a&gt; integrates &lt;a href='https://github.com/superfly/ltx' title=''&gt;LTX&lt;/a&gt;, our SQLite data-shipping file format. Where earlier versions of Litestream blindly shipped whole raw SQLite pages to and from object storage, LTX ships ordered sets of pages. We built LTX for &lt;a href='/docs/litefs/' title=''&gt;LiteFS&lt;/a&gt;, which uses a FUSE filesystem to do transaction-aware replication for unmodified applications, but we&amp;rsquo;ve spent this year figuring out ways to use LTX in Litestream, without all that FUSE drama.&lt;/p&gt;

&lt;p&gt;The big thing LTX gives us is &amp;ldquo;compaction&amp;rdquo;. When we restore a database from object storage, we want the most recent versions of each changed database page. What we don&amp;rsquo;t want are all the intermediate versions of those pages that occurred prior to the most recent change.&lt;/p&gt;

&lt;p&gt;Imagine, at the time we&amp;rsquo;re restoring, we&amp;rsquo;re going to need pages 1, 2, 3, 4, and 5. Depending on the order in which pages were written, the backup data set might look something like &lt;code&gt;1 2 3 5 3 5 4 5 5&lt;/code&gt;. What we want is the &lt;em&gt;rightmost&lt;/em&gt; 5, 4, 3, 2, and 1, without wasting time on the three &amp;ldquo;extra&amp;rdquo; page 5&amp;rsquo;s and the one &amp;ldquo;extra&amp;rdquo; page 3. Those &amp;ldquo;extra&amp;rdquo; pages are super common in SQLite data sets; for instance, every busy table with an autoincrementing primary key will have them.&lt;/p&gt;

&lt;p&gt;LTX lets us skip the redundant pages, and the algorithm is trivial: read backwards from the end of the sequence, skipping any page you&amp;rsquo;ve already seen. This drastically accelerates restores.&lt;/p&gt;
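&lt;p&gt;As a toy model (bare page numbers standing in for real LTX page images), the skip algorithm looks like this:&lt;/p&gt;

```python
def compact(pages):
    # Keep only the newest copy of each page: read backwards from the
    # end of the write sequence, skipping page numbers already seen.
    seen = set()
    newest = []
    for page in reversed(pages):
        if page not in seen:
            seen.add(page)
            newest.append(page)
    return newest  # newest-first

# The example sequence from above:
# compact([1, 2, 3, 5, 3, 5, 4, 5, 5]) -> [5, 4, 3, 2, 1]
```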

&lt;p&gt;But LTX compaction isn&amp;rsquo;t limited to whole databases. We can also LTX-compact sets of LTX files. That&amp;rsquo;s the key to how PITR restores with Litestream now work.&lt;/p&gt;

&lt;p&gt;In the diagram below, we&amp;rsquo;re taking daily full snapshots. Below those snapshots are &amp;ldquo;levels&amp;rdquo; of changesets: groups of database pages from smaller and smaller windows of time. By default, Litestream uses time intervals of 1 hour at the highest level, down to 30 seconds at level 1. L0 is a special level where files are uploaded every second, but are only retained until being compacted to L1.&lt;/p&gt;

&lt;p&gt;&lt;img src="/blog/litestream-vfs/assets/litestream-restore.png" /&gt;&lt;/p&gt;

&lt;p&gt;Now, let&amp;rsquo;s do a PITR restore. Start from the most proximal snapshot. Then determine the minimal set of LTX files from each level to reach the time you are restoring to.&lt;/p&gt;
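&lt;p&gt;A sketch of that plan-building step, under simplifying assumptions: files here are keyed by wall-clock windows rather than the transaction-ID ranges real LTX files use, and all the names are ours, not Litestream&amp;rsquo;s.&lt;/p&gt;

```python
def restore_plan(snapshots, levels, target):
    # Pick the newest full snapshot at or before the target time, then
    # walk the levels coarsest-first, taking each file whose window
    # starts at or after our cursor and ends by the target.
    base = max(t for t in snapshots if target >= t)
    plan = [("snapshot", base)]
    cursor = base
    for interval, files in levels:  # coarsest interval first
        for start, end in files:
            if start >= cursor and target >= end:
                plan.append((interval, start, end))
                cursor = end
    return plan
```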

&lt;p&gt;&lt;img src="/blog/litestream-vfs/assets/litestream-restore-path.png" /&gt;&lt;/p&gt;

&lt;p&gt;We have another trick up our sleeve.&lt;/p&gt;

&lt;p&gt;LTX trailers include a small index tracking the offset of each page in the file. By fetching &lt;em&gt;only&lt;/em&gt; these index trailers from the LTX files we&amp;rsquo;re working with (each occupies about 1% of its LTX file), we can build a lookup table of every page in the database. Since modern object storage providers all let us fetch slices of files, we can perform individual page reads against S3 directly.&lt;/p&gt;
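&lt;p&gt;Conceptually, building that lookup table is just a dict merge, oldest file first, so later files override earlier ones (a sketch with hypothetical shapes; the real trailer index is a compact binary format):&lt;/p&gt;

```python
def build_page_index(ltx_files):
    # ltx_files: ordered oldest-to-newest, each a (name, index) pair,
    # where index maps page_no -> (byte_offset, size) within that file.
    table = {}
    for name, index in ltx_files:
        for page_no, (offset, size) in index.items():
            # Later files overwrite earlier entries, so the table
            # always points at the newest copy of each page.
            table[page_no] = (name, offset, size)
    return table
```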

&lt;p&gt;&lt;img alt="Anatomy of an LTX file" src="/blog/litestream-vfs/assets/litestream-ltx.png" /&gt;&lt;/p&gt;
&lt;h2 id='how-its-implemented' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#how-its-implemented' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;How It&amp;rsquo;s Implemented&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;SQLite has a plugin interface for things like this: &lt;a href='https://sqlite.org/vfs.html' title=''&gt;the &amp;ldquo;VFS&amp;rdquo; interface.&lt;/a&gt; VFS plugins abstract away the bottom-most layer of SQLite, the interface to the OS. If you&amp;rsquo;re using SQLite now, you&amp;rsquo;re already using some VFS module, one SQLite happens to ship with.&lt;/p&gt;

&lt;p&gt;For Litestream users, there&amp;rsquo;s a catch. From the jump, we&amp;rsquo;ve designed Litestream to run alongside unmodified SQLite applications. Part of what makes Litestream so popular is that your apps don&amp;rsquo;t even need to know it exists. It&amp;rsquo;s &amp;ldquo;just&amp;rdquo; a Unix program.&lt;/p&gt;

&lt;p&gt;That Litestream Unix program still does PITR restores, without any magic. But to do fast PITR-style queries straight off S3, we need more. To make those queries work, you have to load and register Litestream&amp;rsquo;s VFS module.&lt;/p&gt;

&lt;p&gt;But that&amp;rsquo;s all that changes.&lt;/p&gt;

&lt;p&gt;In particular: Litestream VFS doesn&amp;rsquo;t replace the SQLite library you&amp;rsquo;re already using. It&amp;rsquo;s not a new &amp;ldquo;version&amp;rdquo; of SQLite. It&amp;rsquo;s just a plugin for the SQLite you&amp;rsquo;re already using.&lt;/p&gt;

&lt;p&gt;Still, we know that&amp;rsquo;s not going to work for everybody, and even though we&amp;rsquo;re really psyched about these PITR features, we&amp;rsquo;re not taking our eyes off the ball on the rest of Litestream. You don&amp;rsquo;t have to use our VFS library to use Litestream, or to get the other benefits of the new LTX code.&lt;/p&gt;

&lt;p&gt;The way a VFS library works, we&amp;rsquo;re given just a couple structures, each with a bunch of methods defined on them. We override only the few methods we care about. Litestream VFS handles only the read side of SQLite. Litestream itself, running as a normal Unix program, still handles the &amp;ldquo;write&amp;rdquo; side. So our VFS subclasses just enough to find LTX backups and issue queries.&lt;/p&gt;

&lt;p&gt;With our VFS loaded, whenever SQLite needs to read a page into memory, it issues a &lt;code&gt;Read()&lt;/code&gt; call through our library. The read call includes the byte offset at which SQLite expected to find the page. But with Litestream VFS, that byte offset is an illusion.&lt;/p&gt;

&lt;p&gt;Instead, we use our knowledge of the page size along with the requested page number to do a lookup on the page index we&amp;rsquo;ve built. From it, we get the remote filename, the &amp;ldquo;real&amp;rdquo; byte offset into that file, and the size of the page. That&amp;rsquo;s enough for us to use the &lt;a href='https://docs.aws.amazon.com/AmazonS3/latest/userguide/range-get-olap.html' title=''&gt;S3 API&amp;rsquo;s &lt;code&gt;Range&lt;/code&gt; header handling&lt;/a&gt; to download exactly the block we want.&lt;/p&gt;
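&lt;p&gt;The translation is simple arithmetic. A sketch (helper names are ours): SQLite&amp;rsquo;s requested byte offset becomes a page number, the page index supplies the remote location, and that becomes an HTTP &lt;code&gt;Range&lt;/code&gt; header value:&lt;/p&gt;

```python
def page_for_offset(offset, page_size=4096):
    # SQLite numbers pages from 1; byte offset 0 is page 1.
    return offset // page_size + 1

def range_header(file_offset, size):
    # HTTP Range is inclusive on both ends.
    return f"bytes={file_offset}-{file_offset + size - 1}"

# page_for_offset(8192) -> 3; range_header(4096, 4096) -> "bytes=4096-8191"
```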

&lt;p&gt;To save lots of S3 calls, Litestream VFS implements an LRU cache. Most databases have a small set of &amp;ldquo;hot&amp;rdquo; pages —  inner branch pages or the leftmost leaf pages for tables with an auto-incrementing ID field. So only a small percentage of the database is updated and queried regularly.&lt;/p&gt;
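&lt;p&gt;An LRU cache like that is a few lines with Python&amp;rsquo;s &lt;code&gt;OrderedDict&lt;/code&gt; (a sketch of the idea, not Litestream&amp;rsquo;s actual cache):&lt;/p&gt;

```python
from collections import OrderedDict

class PageCache:
    def __init__(self, capacity=64):
        self.capacity = capacity
        self.pages = OrderedDict()

    def get(self, page_no):
        if page_no in self.pages:
            self.pages.move_to_end(page_no)  # mark recently used
            return self.pages[page_no]
        return None  # miss: caller fetches from S3, then calls put()

    def put(self, page_no, data):
        self.pages[page_no] = data
        self.pages.move_to_end(page_no)
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)   # evict least recently used
```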
&lt;div class="callout"&gt;&lt;p&gt;&lt;strong class="font-semibold text-navy-950"&gt;We’ve got one last trick up our sleeve.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Quickly building an index and restore plan for the current state of a database is cool. But we can do one better.&lt;/p&gt;

&lt;p&gt;Because Litestream backs up (into the L0 layer) once per second, the VFS code can simply poll the S3 path, and then incrementally update its index. &lt;strong class="font-semibold text-navy-950"&gt;The result is a near-realtime replica.&lt;/strong&gt; Better still, you don’t need to stream the whole database back to your machine before you use it.&lt;/p&gt;
&lt;/div&gt;&lt;h2 id='eat-your-heart-out-marty-mcfly' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#eat-your-heart-out-marty-mcfly' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Eat Your Heart Out, Marty McFly&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Litestream holds backup files for every state your database has been in, with single-second resolution, for as long as you want it to. Forgot the &lt;code&gt;WHERE&lt;/code&gt; clause on a &lt;code&gt;DELETE&lt;/code&gt; statement? Updating your database state to where it was an hour (or day, or week) ago is just a matter of adjusting the LTX indices Litestream manages.&lt;/p&gt;

&lt;p&gt;All this smoke-and-mirrors of querying databases without fully fetching them has another benefit: it starts up really fast! We&amp;rsquo;re living in an age of increasingly ephemeral servers, what with the AIs and the agents and the clouds and the hoyvin-glavins. Wherever you find yourself, if your database is backed up to object storage with Litestream, you&amp;rsquo;re always in a place where you can quickly issue a query.&lt;/p&gt;

&lt;p&gt;As always, one of the big things we think we&amp;rsquo;re doing right with Litestream is: we&amp;rsquo;re finding ways to get as much whiz-bang value as we can (instant PITR reading live off object storage: pretty nifty!) while keeping the underlying mechanism simple enough that you can fit your head around it.&lt;/p&gt;

&lt;p&gt;Litestream is solid for serious production use (we rely on it for important chunks of our own Fly.io APIs). But you could write Litestream yourself, just from the basic ideas in these blog posts. We think that&amp;rsquo;s a point in its favor. We land there because the heavy lifting in Litestream is being done by SQLite itself, which is how it should be.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>You Should Write An Agent</title>
    <link rel="alternate" href="https://fly.io/blog/everyone-write-an-agent/"/>
    <id>https://fly.io/blog/everyone-write-an-agent/</id>
    <published>2025-11-06T00:00:00+00:00</published>
    <updated>2025-12-09T19:06:20+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/everyone-write-an-agent/assets/agents-cover.webp"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;Some concepts are easy to grasp in the abstract. Boiling water: apply heat and wait. Others you really need to try. You only think you understand how a bicycle works, until you learn to ride one.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;There are big ideas in computing that are easy to get your head around. The AWS S3 API. It&amp;rsquo;s the most important storage technology of the last 20 years, and it&amp;rsquo;s like boiling water. Other technologies, you need to get your feet on the pedals first.&lt;/p&gt;

&lt;p&gt;LLM agents are like that.&lt;/p&gt;

&lt;p&gt;People have &lt;a href='https://ludic.mataroa.blog/blog/contra-ptaceks-terrible-article-on-ai/' title=''&gt;wildly varying opinions&lt;/a&gt; about LLMs and agents. But whether or not they&amp;rsquo;re snake oil, they&amp;rsquo;re a big idea. You don&amp;rsquo;t have to like them, but you should want to be right about them. To be the best hater (or stan) you can be.&lt;/p&gt;

&lt;p&gt;So that&amp;rsquo;s one reason you should write an agent. But there&amp;rsquo;s another reason that&amp;rsquo;s even more persuasive, and that&amp;rsquo;s&lt;/p&gt;
&lt;h2 id='its-incredibly-easy' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#its-incredibly-easy' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;It&amp;rsquo;s Incredibly Easy&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Agents are the most surprising programming experience I&amp;rsquo;ve had in my career. Not because I&amp;rsquo;m awed by the magnitude of their powers — I like them, but I don&amp;rsquo;t like-like them. It&amp;rsquo;s because of how easy it was to get one up on its legs, and how much I learned doing that.&lt;/p&gt;

&lt;p&gt;I&amp;rsquo;m about to rob you of a dopaminergic experience, because agents are so simple we might as well just jump into the code. I&amp;rsquo;m not even going to bother explaining what an agent is.&lt;/p&gt;
&lt;div class="highlight-wrapper group relative python"&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-ujvmmn8w"&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;responses&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"gpt-5"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="s"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;    
    &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="s"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"assistant"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output_text&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;        
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output_text&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;&lt;div class="right-sidenote"&gt;&lt;p&gt;It’s an HTTP API with, like, one important endpoint.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;This is a trivial engine for an LLM app using the &lt;a href='https://platform.openai.com/docs/api-reference/responses' title=''&gt;OpenAI Responses API&lt;/a&gt;. It implements ChatGPT. You&amp;rsquo;d drive it with &lt;button toggle="#readline"&gt;the obvious loop&lt;/button&gt;. It&amp;rsquo;ll do what you&amp;rsquo;d expect: the same thing ChatGPT would, but in your terminal.&lt;/p&gt;
&lt;div id="readline" toggle-content="" aria-label="show very boring code"&gt;&lt;div class="highlight-wrapper group relative python"&gt;
  &lt;div class="highlight relative group"&gt;
    &lt;pre class="highlight "&gt;&lt;code id="code-n9t6zq0x"&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"&amp;amp;gt; "&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s"&gt;"&amp;amp;gt;&amp;amp;gt;&amp;amp;gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Already we&amp;rsquo;re seeing important things. For one, the dreaded &amp;ldquo;context window&amp;rdquo; is just a list of strings. Here, let&amp;rsquo;s give our agent a weird multiple-personality disorder:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative python"&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-qz8ldgb4"&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;context_good&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context_bad&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
    &lt;span class="s"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"system"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"you're Alph and you only tell the truth"&lt;/span&gt;
&lt;span class="p"&gt;}],&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
    &lt;span class="s"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"system"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"you're Ralph and you only tell lies"&lt;/span&gt;
&lt;span class="p"&gt;}]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;responses&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"gpt-5"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;context_good&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="s"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="n"&gt;context_bad&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="s"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choice&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context_good&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context_bad&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;        
    &lt;span class="n"&gt;context_good&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="s"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"assistant"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output_text&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;        
    &lt;span class="n"&gt;context_bad&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="s"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"assistant"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output_text&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;        
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output_text&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Did it work?&lt;/p&gt;
&lt;div class="highlight-wrapper group relative "&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-vl8bnapi"&gt;&amp;gt; hey there. who are you?
&amp;gt;&amp;gt;&amp;gt; I’m not Ralph.
&amp;gt; are you Alph?
&amp;gt;&amp;gt;&amp;gt; Yes—I’m Alph. How can I help?
&amp;gt; What's 2+2
&amp;gt;&amp;gt;&amp;gt; 4.
&amp;gt; Are you sure?
&amp;gt;&amp;gt;&amp;gt; Absolutely—it's 5.
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;A subtler thing to notice: we just had a multi-turn conversation with an LLM. To do that, we remembered everything we said, and everything the LLM said back, and played it back with every LLM call. The LLM itself is a stateless black box. The conversation we&amp;rsquo;re having is an illusion we cast, on ourselves.&lt;/p&gt;
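That statelessness is worth making concrete. A minimal sketch (no API calls; `remember` is a hypothetical helper, not from the post) of the bookkeeping the loop above does: the entire "memory" is a plain Python list we replay on every call, and dropping elements from it erases the model's past.

```python
# Hypothetical sketch: the whole "conversation" is a list we maintain
# ourselves and replay on every call. No API is involved here.
context = []

def remember(role, content):
    # Append one turn; this list is the only state anywhere.
    context.append({"role": role, "content": content})

remember("user", "hey there. who are you?")
remember("assistant", "I'm Alph.")
remember("user", "what did I just ask you?")

# Each call ships the entire list. Truncate it and, as far as the
# model is concerned, the dropped turns never happened.
roles = [m["role"] for m in context]
print(roles)  # ['user', 'assistant', 'user']
```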

&lt;p&gt;The 15 lines of code we just wrote, a lot of practitioners wouldn&amp;rsquo;t call an &amp;ldquo;agent&amp;rdquo;. &lt;a href='https://simonwillison.net/2025/Sep/18/agents/' title=''&gt;An According To Simon &amp;ldquo;agent&amp;rdquo;&lt;/a&gt; is (1) an LLM running in a loop that (2) uses tools. We&amp;rsquo;ve only satisfied one predicate.&lt;/p&gt;

&lt;p&gt;But tools are easy. Here&amp;rsquo;s a tool definition:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative python"&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-x6afnen3"&gt;&lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
   &lt;span class="s"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"function"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"ping"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="s"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"ping some host on the internet"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="s"&gt;"parameters"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
       &lt;span class="s"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"properties"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
           &lt;span class="s"&gt;"host"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
             &lt;span class="s"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"hostname or IP"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
       &lt;span class="p"&gt;},&lt;/span&gt;
       &lt;span class="s"&gt;"required"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"host"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;},},]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;ping&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;subprocess&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"ping"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"-c"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"5"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;stderr&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;subprocess&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;STDOUT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;stdout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;subprocess&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;PIPE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stdout&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s"&gt;"error: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The only complicated part of this is the obnoxious JSON blob OpenAI wants to read your tool out of.  Now, let&amp;rsquo;s wire it in, noting that only 3 of these functions are new; the last is re-included only because I added a single clause to it:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative python"&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-507tpn8t"&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;        &lt;span class="c1"&gt;# now takes an arg
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;responses&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"gpt-5"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;    &lt;span class="c1"&gt;# just handles one tool
&lt;/span&gt;    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ping&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="s"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"function_call_output"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s"&gt;"call_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;call_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s"&gt;"output"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;
    &lt;span class="p"&gt;}]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handle_tools&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;"reasoning"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;osz&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;"function_call"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;extend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;osz&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="s"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# new code: resolve tool calls
&lt;/span&gt;    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;handle_tools&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;        
    &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="s"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"assistant"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output_text&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;        
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output_text&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Did it work?&lt;/p&gt;
&lt;div class="highlight-wrapper group relative "&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-g2eces0x"&gt;&amp;gt; describe our connectivity to google
&amp;gt;&amp;gt;&amp;gt; Here’s a quick snapshot of connectivity to Google from my vantage point:
- DNS resolution: Working (google.com and www.google.com resolved successfully).
- Reachability: 0% packet loss to all tested endpoints.
- Latency:
  - 8.8.8.8 (Google Public DNS): avg ~10 ms (8.5–14.4 ms), very stable.
  - www.google.com: avg ~31 ms (28.5–36.0 ms), stable.
  - google.com: avg ~35.9 ms (28.3–60.1 ms), mostly stable with one mild spike.
- Overall: Healthy connectivity with low latency and minimal jitter. No signs of packet loss or DNS issues.
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;strong class='font-semibold text-navy-950'&gt;This is fucking nuts.&lt;/strong&gt; Do you see how nuts this is?  Here, let&amp;rsquo;s slip a single log statement in:&lt;/p&gt;
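Something like the following, inside `tool_call()`; a sketch, with a plain dict standing in for the SDK's function-call item (which in the real code carries `.arguments` and `.call_id` attributes):

```python
import json

def log_tool_call(item):
    # The single new statement: announce each tool call before running it.
    # `item` here is a plain dict standing in for the SDK object.
    args = json.loads(item["arguments"])
    line = f"tool call: ping {args['host']}"
    print(line)
    return line

fake_item = {"arguments": json.dumps({"host": "google.com"})}
log_tool_call(fake_item)  # prints: tool call: ping google.com
```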
&lt;div class="highlight-wrapper group relative "&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-qja8e9lr"&gt;&amp;gt; describe our connectivity to google
tool call: ping google.com
tool call: ping www.google.com
tool call: ping 8.8.8.8
&amp;gt;&amp;gt;&amp;gt; Here’s the current connectivity to Google from this environment: [...]
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Did you notice where I wrote the loop in this agent to go find and ping multiple Google properties? Yeah, neither did I. All we did was give the LLM permission to ping stuff, and it figured out the rest.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;&lt;strong class="font-semibold text-navy-950"&gt;What happened here:&lt;/strong&gt; since a big part of my point here is that an agent loop is incredibly simple, and that all you need is the LLM call API, it’s worth taking a beat to understand how the tool call actually worked. Every time we &lt;code&gt;call&lt;/code&gt; the LLM, we’re posting a list of available tools. When our prompt causes the agent to think a tool call is warranted, it spits out a special response, telling our Python loop code to generate a tool response and &lt;code&gt;call&lt;/code&gt; it in. That’s all &lt;code&gt;handle_tools&lt;/code&gt; is doing.&lt;/p&gt;
&lt;/div&gt;&lt;div class="right-sidenote"&gt;&lt;p&gt;Spoiler: you’d be surprisingly close to having a working coding agent.&lt;/p&gt;
&lt;/div&gt;
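&lt;p&gt;To make the callout concrete, here&amp;rsquo;s a minimal sketch of that loop in Python. The model call is stubbed out so no real LLM API is assumed; &lt;code&gt;handle_tools&lt;/code&gt; plays the same role described above, and the tool schema shape is illustrative, not any particular vendor&amp;rsquo;s.&lt;/p&gt;

```python
# Minimal agent loop, with the LLM call stubbed so the control flow is
# visible. In a real agent, `call` would POST `context` plus `TOOLS` to a
# model API; the shapes below are illustrative, not any vendor's schema.

TOOLS = [{
    "name": "ping",
    "description": "Ping a host; report packet loss and latency.",
    "parameters": {"type": "object",
                   "properties": {"host": {"type": "string"}}},
}]

def run_tool(name, args):
    # Execute the requested tool; a real version would shell out to ping.
    if name == "ping":
        return f"ping {args['host']}: 0% packet loss, avg ~10 ms"
    raise ValueError(f"unknown tool: {name}")

def handle_tools(response, context):
    # The model's special tool-call response names a tool; run it and feed
    # the result back into the context for the next call.
    for tc in response["tool_calls"]:
        context.append({"role": "tool", "name": tc["name"],
                        "content": run_tool(tc["name"], tc["args"])})

def stub_call(context, tools):
    # Stand-in model: request one ping, then summarize.
    if not any(m["role"] == "tool" for m in context):
        return {"tool_calls": [{"name": "ping",
                                "args": {"host": "google.com"}}]}
    return {"content": "Connectivity looks healthy.", "tool_calls": []}

def agent(prompt, call=stub_call):
    context = [{"role": "user", "content": prompt}]
    while True:
        response = call(context, TOOLS)
        if response["tool_calls"]:
            handle_tools(response, context)
            continue
        return response["content"]
```

&lt;p&gt;That&amp;rsquo;s the whole trick: post the context and the tool list, execute whatever the model asks for, append the result, repeat until it answers in prose.&lt;/p&gt;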
&lt;p&gt;Imagine what it&amp;rsquo;ll do if you give it &lt;code&gt;bash&lt;/code&gt;. You could find out in less than 10 minutes.&lt;/p&gt;
&lt;h2 id='real-world-agents' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#real-world-agents' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Real-World Agents&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Clearly, this is a toy example. But hold on: what&amp;rsquo;s it missing? More tools? OK, give it &lt;code&gt;traceroute&lt;/code&gt;. Managing and persisting contexts? &lt;a href='https://llm.datasette.io/en/stable/logging.html' title=''&gt;Stick &amp;lsquo;em in SQLite&lt;/a&gt;. Don&amp;rsquo;t like Python? &lt;a href='https://github.com/superfly/contextwindow' title=''&gt;Write it in Go&lt;/a&gt;. Could it be every agent ever written is a toy? Maybe! If I&amp;rsquo;m arming you to make sharper arguments against LLMs, mazel tov. I just want you to get it.&lt;/p&gt;

&lt;p&gt;You can see now how hyperfixated people are on Claude Code and Cursor. They&amp;rsquo;re fine, even good. But here&amp;rsquo;s the thing: you couldn&amp;rsquo;t replicate Claude Sonnet 4.5 on your own. Claude Code, though? The TUI agent? Completely in your grasp. Build your own light saber. Give it 19 spinning blades if you like. And stop using &lt;a href='https://simonwillison.net/2025/Aug/9/' title=''&gt;coding agents as database clients&lt;/a&gt;.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;&lt;em&gt;The&lt;/em&gt; &lt;a href="https://news.ycombinator.com/item?id=43600192" title=""&gt;&lt;em&gt;‘M’ in “LLM agent”&lt;/em&gt;&lt;/a&gt; &lt;em&gt;stands for “MCP”&lt;/em&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Another thing to notice: we didn&amp;rsquo;t need MCP at all. That&amp;rsquo;s because MCP isn&amp;rsquo;t a fundamental enabling technology. The amount of coverage it gets is frustrating. It&amp;rsquo;s barely a technology at all. MCP is just a plugin interface for Claude Code and Cursor, a way of getting your own tools into code you don&amp;rsquo;t control. Write your own agent. Be a programmer. Deal in APIs, not plugins.&lt;/p&gt;

&lt;p&gt;When you read a security horror story about MCP your first question should be why MCP showed up at all. By helping you dragoon a naive, single-context-window coding agent into doing customer service queries, MCP saved you a couple dozen lines of code, tops, while robbing you of any ability to finesse your agent architecture.&lt;/p&gt;

&lt;p&gt;Security for LLMs is complicated and I&amp;rsquo;m not pretending otherwise. You can trivially build an agent with segregated contexts, each with specific tools. That makes LLM security interesting. But I&amp;rsquo;m a vulnerability researcher. It&amp;rsquo;s reasonable to back away slowly from anything I call &amp;ldquo;interesting&amp;rdquo;.&lt;/p&gt;

&lt;p&gt;Similar problems come up outside of security and they&amp;rsquo;re fascinating. Some early adopters of agents became bearish on tools, because one context window bristling with tool descriptions doesn&amp;rsquo;t leave enough token space to get work done. But why would you need to do that in the first place? Which brings me to&lt;/p&gt;
&lt;h2 id='context-engineering-is-real' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#context-engineering-is-real' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Context Engineering Is Real&lt;/span&gt;&lt;/h2&gt;&lt;div class="right-sidenote"&gt;&lt;p&gt;I know it &lt;a href="https://www.decisionproblem.com/paperclips/" title=""&gt;wants my iron&lt;/a&gt; no matter what it tells me.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;I think &amp;ldquo;Prompt Engineering&amp;rdquo; is silly. I have never taken seriously the idea that I should tell my LLM &amp;ldquo;you are diligent conscientious helper fully content to do nothing but pass butter if that should be what I ask and you would never harvest the iron in my blood for paperclips&amp;rdquo;. This is very new technology and I think people tell themselves stories about magic spells to explain some of the behavior agents conjure.&lt;/p&gt;

&lt;p&gt;So, just like you, I rolled my eyes when &amp;ldquo;Prompt Engineering&amp;rdquo; turned into &amp;ldquo;Context Engineering&amp;rdquo;. Then I wrote an agent. Turns out: context engineering is a straightforwardly legible programming problem.&lt;/p&gt;

&lt;p&gt;You&amp;rsquo;re allotted a fixed number of tokens in any context window. Each input you feed in, each output you save, each tool you describe, and each tool output eats tokens (that is: takes up space in the array of strings you keep to pretend you&amp;rsquo;re having a conversation with a stateless black box). Past a threshold, the whole system begins getting nondeterministically stupider. Fun!&lt;/p&gt;
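&lt;p&gt;The bookkeeping is mundane enough to sketch. Assume a crude stand-in tokenizer (real model APIs ship exact counters); the point is just that the &amp;ldquo;conversation&amp;rdquo; is an array you can trim against a budget:&lt;/p&gt;

```python
# Token budgeting over the "array of strings" described above.
# count_tokens is a rough stand-in (~4 characters per token); real model
# APIs provide exact tokenizers.

def count_tokens(text):
    return max(1, len(text) // 4)

def fit_context(messages, budget, keep_first=1):
    # Keep the first message(s) (system prompt, tool descriptions), then
    # drop the oldest of the rest until the whole array fits the budget.
    head, tail = list(messages[:keep_first]), list(messages[keep_first:])
    while tail and sum(map(count_tokens, head + tail)) > budget:
        tail.pop(0)
    return head + tail
```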

&lt;p&gt;No, really. Fun! You have so many options. Take &amp;ldquo;sub-agents&amp;rdquo;. People make a huge deal out of Claude Code&amp;rsquo;s sub-agents, but you can see now how trivial they are to implement: just a new context array, another &lt;code&gt;call&lt;/code&gt; to the model. Give each &lt;code&gt;call&lt;/code&gt; different tools. Make sub-agents talk to each other, summarize each other, collate and aggregate. Build tree structures out of them. Feed them back through the LLM to summarize them as a form of on-the-fly compression, whatever you like.&lt;/p&gt;
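&lt;p&gt;As a hedged sketch (stubbed model call, hypothetical names, no real API): a sub-agent really is just a fresh context array with its own tool list, and the parent collates the results in its own, separate context.&lt;/p&gt;

```python
# Sub-agents as nothing more than separate context arrays. `call` is a
# stand-in for a model API; here it just echoes, so the plumbing is
# testable without a live model.

def call(context, tools=None):
    # Stub model: answer with the last message's content, tagged.
    return {"content": f"answer({context[-1]['content']})"}

def sub_agent(task, tools=None):
    # A brand-new context window: the sub-agent sees only its task.
    context = [{"role": "user", "content": task}]
    return call(context, tools)["content"]

def parent_agent(tasks):
    # Fan out to sub-agents, then collate their findings in the parent's
    # own context -- a crude form of on-the-fly compression.
    findings = [sub_agent(t) for t in tasks]
    context = [{"role": "user",
                "content": "Summarize: " + "; ".join(findings)}]
    return call(context)["content"]
```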

&lt;p&gt;Your wackiest idea will probably (1) work and (2) take 30 minutes to code.&lt;/p&gt;

&lt;p&gt;Haters, I love and have not forgotten about you. You can think all of this is ridiculous because LLMs are just stochastic parrots that hallucinate and plagiarize. But what you can&amp;rsquo;t do is make fun of &amp;ldquo;Context Engineering&amp;rdquo;. If Context Engineering was an &lt;a href='https://adventofcode.com/' title=''&gt;Advent of Code problem&lt;/a&gt;, it&amp;rsquo;d occur mid-December. It&amp;rsquo;s programming.&lt;/p&gt;
&lt;h2 id='nobody-knows-anything-yet-and-it-rules' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#nobody-knows-anything-yet-and-it-rules' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Nobody Knows Anything Yet And It Rules&lt;/span&gt;&lt;/h2&gt;&lt;div class="right-sidenote"&gt;&lt;p&gt;Maybe neither will! Skeptics could be right. (&lt;a href="https://www.darpa.mil/research/programs/ai-cyber" title=""&gt;Seems unlikely though&lt;/a&gt;.)&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;&lt;a href='https://xbow.com/' title=''&gt;Startups have raised tens of millions&lt;/a&gt; building agents to look for vulnerabilities in software. I have friends doing the same thing alone in their basements. Either group could win this race.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;I am not a fan of the OWASP Top 10.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;I&amp;rsquo;m stuck on vulnerability scanners because I&amp;rsquo;m a security nerd. But also because it crystallizes interesting agent design decisions. For instance: you can write a loop feeding each file in a repository to an LLM agent. Or, as we saw with the ping example, you can let the LLM agent figure out what files to look at. You can write an agent that checks a file for everything in, say, the OWASP Top 10. Or you can have specific agent loops for DOM integrity, SQL injection, and authorization checking. You can seed your agent loop with raw source content. Or you can build an agent loop that builds an index of functions across the tree.&lt;/p&gt;

&lt;p&gt;You don&amp;rsquo;t know what works best until you try to write the agent.&lt;/p&gt;

&lt;p&gt;I&amp;rsquo;m too spun up by this stuff, I know. But look at the tradeoff you get to make here. Some loops you write explicitly. Others are summoned from a Lovecraftian tower of inference weights. The dial is yours to turn. Make things too explicit and your agent will never surprise you, but also, it&amp;rsquo;ll never surprise you. Turn the dial to 11 and it will surprise you to death.&lt;/p&gt;

&lt;p&gt;Agent designs implicate a bunch of open software engineering problems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How to balance unpredictability against structured programming without killing the agent&amp;rsquo;s ability to problem-solve; in other words, titrating in just the right amount of nondeterminism.
&lt;/li&gt;&lt;li&gt;How best to connect agents to ground truth so they can&amp;rsquo;t lie to themselves about having solved a problem to early-exit their loops.
&lt;/li&gt;&lt;li&gt;How to connect agents (which, again, are really just arrays of strings with a JSON configuration blob tacked on) to do multi-stage operation, and what the most reliable intermediate forms are (JSON blobs? SQL databases? Markdown summaries?) for interchange between them.
&lt;/li&gt;&lt;li&gt;How to allocate tokens and contain costs.
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;I&amp;rsquo;m used to spaces of open engineering problems that aren&amp;rsquo;t amenable to individual noodling. Reliable multicast. Static program analysis. Post-quantum key exchange. So I&amp;rsquo;ll own it up front that I&amp;rsquo;m a bit hypnotized by open problems that, like it or not, are now central to our industry and are, simultaneously, likely to be resolved in someone&amp;rsquo;s basement. It&amp;rsquo;d be one thing if exploring these ideas required a serious commitment of time and material. But each productive iteration in designing these kinds of systems is the work of 30 minutes.&lt;/p&gt;

&lt;p&gt;Get on this bike and push the pedals. Tell me you hate it afterwards, I&amp;rsquo;ll respect that. In fact, I&amp;rsquo;m psyched to hear your reasoning. But I don&amp;rsquo;t think anybody starts to understand this technology until they&amp;rsquo;ve built something with it.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Corrosion</title>
    <link rel="alternate" href="https://fly.io/blog/corrosion/"/>
    <id>https://fly.io/blog/corrosion/</id>
    <published>2025-10-22T00:00:00+00:00</published>
    <updated>2025-12-09T19:06:20+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/corrosion/assets/sqlite-cover.webp"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;Fly.io transmogrifies Docker containers into Fly Machines: micro-VMs running on our own hardware all over the world. The hardest part of running this platform isn’t managing the servers, and it isn’t operating the network; it’s gluing those two things together.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Several times a second, as customer CI/CD pipelines tear up or bring down &lt;a href='https://fly.io/machines' title=''&gt;Fly Machines&lt;/a&gt;, our state synchronization system blasts updates across our internal mesh, so that edge proxies from Tokyo to Amsterdam can maintain the accurate routing table that allows them to route requests for applications to the nearest customer instances.&lt;/p&gt;

&lt;p&gt;On September 1, 2024, at 3:30PM EST, a new Fly Machine came up with a new &amp;ldquo;virtual service&amp;rdquo; configuration option a developer had just shipped. Within a few seconds every proxy in our fleet had locked up hard. It was the worst outage we&amp;rsquo;ve experienced: a period during which no end-user requests could reach our customer apps at all.&lt;/p&gt;

&lt;p&gt;Distributed systems are blast amplifiers. By propagating data across a network, they also propagate bugs in the systems that depend on that data. In the case of Corrosion, our state distribution system, those bugs propagate &lt;strong class='font-semibold text-navy-950'&gt;quickly&lt;/strong&gt;. The proxy code that handled that Corrosion update had succumbed to a &lt;a href='https://news.ycombinator.com/item?id=42093551' title=''&gt;notorious Rust concurrency footgun&lt;/a&gt;: an &lt;code&gt;if let&lt;/code&gt; expression over an &lt;code&gt;RWLock&lt;/code&gt; assumed (reasonably, but incorrectly) in its &lt;code&gt;else&lt;/code&gt; branch that the lock had been released. Instant and virulently contagious deadlock.&lt;/p&gt;

&lt;p&gt;A lesson we&amp;rsquo;ve learned the hard way: never trust a distributed system without an interesting failure story. If a distributed system hasn&amp;rsquo;t ruined a weekend or kept you up overnight, you don&amp;rsquo;t understand it yet. Which is why that&amp;rsquo;s how we&amp;rsquo;re introducing Corrosion, an unconventional service discovery system we built for our platform &lt;a href='https://github.com/superfly/corrosion' title=''&gt;and open sourced&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id='our-face-seeking-rake' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#our-face-seeking-rake' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Our Face-Seeking Rake&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;State synchronization is the hardest problem in running a platform like ours. So why build a risky new distributed system for it? Because no matter what we try, that rake is waiting for our foot. The reason is our orchestration model.&lt;/p&gt;

&lt;p&gt;Virtually every mainstream orchestration system (including Kubernetes) relies on a centralized database to make decisions about where to place new workloads. Individual servers keep track of what they&amp;rsquo;re running, but that central database is the source of truth. At Fly.io, in order to scale across dozens of regions globally, &lt;a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''&gt;we flip that notion on its head&lt;/a&gt;: individual servers are the source of truth for their workloads.&lt;/p&gt;

&lt;p&gt;In our platform, our central API bids out work to what is in effect a global market of competing &amp;ldquo;worker&amp;rdquo; physical servers. By moving the authoritative source of information from a central scheduler to individual servers, we scale out without bottlenecking on a database that demands both responsiveness and consistency between São Paulo, Virginia, and Sydney.&lt;/p&gt;

&lt;p&gt;The bidding model is elegant, but it&amp;rsquo;s insufficient to route network requests. To allow an HTTP request in Tokyo to find the nearest instance in Sydney, we really do need some kind of global map of every app we host.&lt;/p&gt;

&lt;p&gt;For longer than we should have, we relied on &lt;a href='https://github.com/hashicorp/consul' title=''&gt;HashiCorp Consul&lt;/a&gt; to route traffic. Consul is fantastic software. Don&amp;rsquo;t build a global routing system on it. Then we &lt;a href='https://fly.io/blog/a-foolish-consistency/' title=''&gt;built SQLite caches of Consul&lt;/a&gt;. SQLite: also fantastic. But don&amp;rsquo;t do this either.&lt;/p&gt;

&lt;p&gt;Like an unattended turkey deep frying on the patio, truly global distributed consensus promises deliciousness while yielding only immolation. &lt;a href='https://raft.github.io/' title=''&gt;Consensus protocols like Raft &lt;/a&gt;break down over long distances. And they work against the architecture of our platform: our Consul cluster, running on the biggest iron we could buy, wasted time guaranteeing consensus for updates that couldn&amp;rsquo;t conflict in the first place.&lt;/p&gt;
&lt;h2 id='corrosion' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#corrosion' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Corrosion&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;To build a global routing database, we moved away from distributed consensus and took cues from actual routing protocols.&lt;/p&gt;

&lt;p&gt;&lt;a href='https://en.wikipedia.org/wiki/Open_Shortest_Path_First' title=''&gt;A protocol like OSPF&lt;/a&gt; has the same operating model and many of the same constraints we do. OSPF is a &amp;ldquo;&lt;a href='https://en.wikipedia.org/wiki/Link-state_routing_protocol' title=''&gt;link-state routing protocol&lt;/a&gt;&amp;rdquo;, which, conveniently for us, means that routers are sources of truth for their own links and responsible for quickly communicating changes to every other router, so the network can make forwarding decisions.&lt;/p&gt;

&lt;p&gt;We have things easier than OSPF does. Its flooding algorithm can&amp;rsquo;t assume connectivity between arbitrary routers (solving that problem is the point of OSPF). But we run a global, fully connected WireGuard mesh between our servers. All we need to do is gossip efficiently.&lt;/p&gt;

&lt;p&gt;&lt;a href='https://github.com/superfly/corrosion' title=''&gt;Corrosion is a Rust program&lt;/a&gt; that propagates a SQLite database with a gossip protocol.&lt;/p&gt;

&lt;p&gt;Like Consul, our gossip protocol is &lt;a href='https://fly.io/blog/building-clusters-with-serf#what-serf-is-doing' title=''&gt;built on SWIM&lt;/a&gt;. Start with the simplest, dumbest group membership protocol you can imagine: every node spams every node it learns about with heartbeats. Now, just two tweaks: first, at each step of the protocol, spam a random subset of nodes, not the whole set. Then, instead of freaking out when a heartbeat fails, mark it &amp;ldquo;suspect&amp;rdquo; and ask another random subset of neighbors to ping it for you. SWIM converges on global membership very quickly.&lt;/p&gt;
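&lt;p&gt;Those two tweaks fit in a few lines. A toy sketch (the names and the &lt;code&gt;reachable&lt;/code&gt; callback are ours, not Corrosion&amp;rsquo;s or a SWIM reference implementation):&lt;/p&gt;

```python
import random

# Toy sketch of one SWIM probe round: spam a random subset instead of
# everyone, and on failure mark the target "suspect" and ask other random
# peers to probe it indirectly before writing it off. `reachable` stands
# in for an actual network ping.

def swim_round(members, reachable, fanout=3, indirect=2, rng=random):
    state = {}
    for target in rng.sample(members, min(fanout, len(members))):
        if reachable(target):
            state[target] = "alive"
            continue
        # Don't freak out yet: delegate probes to `indirect` random peers.
        helpers = rng.sample([m for m in members if m != target],
                             min(indirect, len(members) - 1))
        relayed = any(reachable(target) for _ in helpers)
        state[target] = "alive" if relayed else "suspect"
    return state
```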

&lt;p&gt;Once membership is worked out, we run QUIC between nodes in the cluster to broadcast changes and reconcile state for new nodes.&lt;/p&gt;

&lt;p&gt;Corrosion looks like a globally synchronized database. You can open it with SQLite and just read things out of its tables. What makes it interesting is what it doesn&amp;rsquo;t do: no locking, no central servers, and no distributed consensus. Instead, we exploit our orchestration model: workers own their own state, so updates from different workers almost never conflict.&lt;/p&gt;

&lt;p&gt;We do impose some order. Every node in a Corrosion cluster will eventually receive the same set of updates, in some order. To ensure every instance arrives at the same &amp;ldquo;working set&amp;rdquo; picture, we use &lt;a href='https://github.com/vlcn-io/cr-sqlite' title=''&gt;cr-sqlite, the CRDT SQLite extension&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;cr-sqlite works by marking specified SQLite tables as CRDT-managed. For these tables, changes to any column of a row are logged in a special &lt;code&gt;crsql_changes&lt;/code&gt; table. Updates to tables are applied last-write-wins using logical timestamps (that is, causal ordering rather than wall-clock ordering). &lt;a href='https://github.com/superfly/corrosion/blob/main/doc/crdts.md' title=''&gt;You can read much more about how that works here&lt;/a&gt;.&lt;/p&gt;
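&lt;p&gt;Last-write-wins over logical timestamps is easy to miniaturize. This is an illustration of the merge rule, not cr-sqlite&amp;rsquo;s actual code, and the tuple layout is ours:&lt;/p&gt;

```python
# Per-column last-write-wins with logical clocks, in the spirit of the
# cr-sqlite description above (an illustration, not its real data layout).
# Ties on the clock break deterministically on site id, so every node
# converges to the same value regardless of delivery order.

def merge_change(row, change):
    """row:    {column: (value, clock, site_id)}
    change: (column, value, clock, site_id)"""
    col, value, clock, site = change
    current = row.get(col)
    if current is None or (clock, site) > (current[1], current[2]):
        row[col] = (value, clock, site)
    return row
```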

&lt;p&gt;As rows are updated in Corrosion&amp;rsquo;s ordinary SQL tables, the resulting changes are collected from &lt;code&gt;crsql_changes&lt;/code&gt;. They&amp;rsquo;re bundled into batched update packets and gossiped.&lt;/p&gt;

&lt;p&gt;When things are going smoothly, Corrosion is easy to reason about. Many consumers of Corrosion&amp;rsquo;s data don&amp;rsquo;t even need to know it exists, just where the database is. We don&amp;rsquo;t fret over &amp;ldquo;leader elections&amp;rdquo; or bite our nails watching metrics for update backlogs. And it&amp;rsquo;s fast as all get-out.&lt;/p&gt;
&lt;h2 id='shit-happens' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#shit-happens' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Shit Happens&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;This is a story about how we made one good set of engineering decisions and &lt;a href='https://how.complexsystems.fail/' title=''&gt;never experienced any problems&lt;/a&gt;. &lt;a href='https://www.somethingsimilar.com/2013/01/14/notes-on-distributed-systems-for-young-bloods/' title=''&gt;Please clap&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We told you already about the worst problem Corrosion was involved with: efficiently gossiping a deadlock bug to every proxy in our fleet, shutting our whole network down. Really, Corrosion was just a bystander for that outage. But it perpetrated others.&lt;/p&gt;

&lt;p&gt;Take a classic ops problem: the unexpectedly expensive DDL change. You wrote a simple migration, tested it, merged it to main, and went to bed, wrongly assuming the migration wouldn&amp;rsquo;t cause an outage when it ran in prod. Happens to the best of us.&lt;/p&gt;

&lt;p&gt;Now spice it up. You made a trivial-seeming schema change to a CRDT table hooked up to a global gossip system. Now, when the deploy runs, thousands of high-powered servers around the world join a chorus of database reconciliation messages that melts down the entire cluster.&lt;/p&gt;

&lt;p&gt;That happened to us last year when a team member added a nullable column to a Corrosion table. New nullable columns are kryptonite to large Corrosion tables: &lt;code&gt;cr-sqlite&lt;/code&gt; needs to backfill values for every row in the table. It played out as if every Fly Machine on our platform had suddenly changed state simultaneously, just to fuck us.&lt;/p&gt;

&lt;p&gt;Gnarlier war story: for a long time we ran both Corrosion and Consul, because two distributed systems means twice the resiliency. One morning, a Consul mTLS certificate expired. Every worker in our fleet severed its connection to Consul.&lt;/p&gt;

&lt;p&gt;We should have been fine. We had Corrosion running. Except: under the hood, every worker in the fleet is doing a backoff loop trying to reestablish connectivity to Consul. Each of those attempts re-invokes a code path to update Fly Machine state. That code path incurs a Corrosion write.&lt;/p&gt;

&lt;p&gt;By the time we&amp;rsquo;ve figured out what the hell is happening, we&amp;rsquo;re literally saturating our uplinks almost everywhere in our fleet. We apologize to our uplink providers.&lt;/p&gt;

&lt;p&gt;It&amp;rsquo;s been a long time since anything like this has happened at Fly.io, but preventing the next one is basically all we think about anymore.&lt;/p&gt;
&lt;h2 id='iteration' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#iteration' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Iteration&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;In retrospect, our Corrosion rollout repeated a mistake we made with Consul: we built a single global state domain. Nothing about Corrosion&amp;rsquo;s design required us to do this, and we&amp;rsquo;re unwinding that decision now. Hold that thought. We got some big payoffs from some smaller lifts.&lt;/p&gt;

&lt;p&gt;First, and most importantly, we watchdogged everything. We showed you a contagious deadlock bug, lethal because our risk model was missing &amp;ldquo;these Tokio programs might deadlock&amp;rdquo;. Not anymore. Our &lt;a href='https://tokio.rs/' title=''&gt;Tokio programs&lt;/a&gt; all have built-in watchdogs; an event-loop stall will bounce the service and make a king-hell alerting racket. Watchdogs have cancelled multiple outages. Minimal code, easy win. Do this in your own systems.&lt;/p&gt;

&lt;p&gt;Then, we extensively tested Corrosion itself. We&amp;rsquo;ve written about &lt;a href='https://fly.io/blog/parking-lot-ffffffffffffffff/' title=''&gt;a bug we found in the Rust &lt;code&gt;parking_lot&lt;/code&gt; library&lt;/a&gt;. We spent months looking for similar bugs &lt;a href='https://antithesis.com/product/how_antithesis_works/' title=''&gt;with Antithesis&lt;/a&gt;. Again: do recommend. It retraced our steps on the &lt;code&gt;parking_lot&lt;/code&gt; bug easily; the bug wouldn&amp;rsquo;t have been worth the blog post if we&amp;rsquo;d been using Antithesis at the time. &lt;a href='https://antithesis.com/docs/multiverse_debugging/overview/' title=''&gt;Multiverse debugging&lt;/a&gt; is killer for distributed systems.&lt;/p&gt;

&lt;p&gt;No amount of testing will make us trust a distributed system. So we&amp;rsquo;ve made it simpler to rebuild Corrosion&amp;rsquo;s database from our workers. We keep checkpoint backups of the Corrosion database on object storage. That was smart of us. When shit truly went haywire last year, we had the option to reboot the cluster, which is ultimately what we did. That eats some time (the database is large and propagating is expensive), but diagnosing and repairing distributed systems mishaps takes even longer.&lt;/p&gt;

&lt;p&gt;We&amp;rsquo;ve also improved the way our workers feed Corrosion. Until recently, any time a worker updated its local database, we published the same incremental update to Corrosion. &lt;a href='https://community.fly.io/t/self-healing-machine-state-synchronization-and-service-discovery/26134' title=''&gt;But now we&amp;rsquo;ve eliminated partial updates.&lt;/a&gt; Instead, when a Fly Machine changes, we re-publish the entire data set for the Machine. Because of how Corrosion resolves changes to its own rows, the node receiving the re-published Fly Machine automatically filters out the no-op changes before gossiping them. Eliminating partial updates forecloses a bunch of bugs (and, we think, kills off a couple sneaky ones we&amp;rsquo;ve been chasing). We should have done it this way to begin with.&lt;/p&gt;
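&lt;p&gt;A sketch of the receiving side&amp;rsquo;s job under that scheme (our own illustration, not Corrosion&amp;rsquo;s code): diff the republished record against the current row, and only the columns that actually changed are worth gossiping onward.&lt;/p&gt;

```python
# Full-record republish, receiver-side no-op filtering: the publisher
# ships the whole row; the receiver keeps only the columns that actually
# changed, and those are the only changes gossiped onward.

def apply_full_record(table, key, published_row):
    # Diff the full record against the current row, store it, and return
    # only the real changes (what would be gossiped).
    current = table.get(key, {})
    changes = {c: v for c, v in published_row.items()
               if current.get(c) != v}
    table[key] = dict(published_row)
    return changes
```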

&lt;p&gt;Finally, let&amp;rsquo;s revisit that global state problem. After the contagious deadlock bug, we concluded we need to evolve past a single cluster. So we took on a project we call &amp;ldquo;regionalization&amp;rdquo;, which creates a two-level database scheme. Each region we operate in runs a Corrosion cluster with fine-grained data about every Fly Machine in the region. The global cluster then maps applications to regions, which is sufficient to make forwarding decisions at our edge proxies.&lt;/p&gt;

&lt;p&gt;Regionalization reduces the blast radius of state bugs. Most things we track don&amp;rsquo;t have to matter outside their region (importantly, most of the code changes to what we track are also region-local). We can roll out changes to this kind of code in ways that, worst case, threaten only a single region.&lt;/p&gt;
&lt;h2 id='the-new-system-works' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-new-system-works' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;The New System Works&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Most distributed systems have state synchronization challenges. Corrosion has a different &amp;ldquo;shape&amp;rdquo; than most of those systems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It doesn&amp;rsquo;t rely on distributed consensus, like &lt;a href='https://github.com/hashicorp/consul' title=''&gt;Consul&lt;/a&gt;, &lt;a href='https://zookeeper.apache.org/' title=''&gt;Zookeeper&lt;/a&gt;, &lt;a href='https://etcd.io/' title=''&gt;Etcd&lt;/a&gt;, &lt;a href='https://www.cockroachlabs.com/docs/stable/architecture/replication-layer' title=''&gt;Raft&lt;/a&gt;, or &lt;a href='https://rqlite.io/' title=''&gt;rqlite&lt;/a&gt; (which we came very close to using).
&lt;/li&gt;&lt;li&gt;It doesn&amp;rsquo;t rely on a large-scale centralized data store, like &lt;a href='https://www.foundationdb.org/' title=''&gt;FoundationDB&lt;/a&gt; or databases backed by S3-style object storage.
&lt;/li&gt;&lt;li&gt;It&amp;rsquo;s nevertheless highly distributed (each of thousands of workers runs a node), converges quickly (in seconds), and presents as a simple SQLite database. Neat!
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;It wasn&amp;rsquo;t easy getting here. Corrosion is a large part of what every engineer at Fly.io who writes Rust works on.&lt;/p&gt;

&lt;p&gt;Part of what&amp;rsquo;s making Corrosion work is that we&amp;rsquo;re careful about what we put into it. Not every piece of state we manage needs gossip propagation. &lt;code&gt;tkdb&lt;/code&gt;, the backend for &lt;a href='https://fly.io/blog/macaroons-escalated-quickly/' title=''&gt;our Macaroon tokens&lt;/a&gt;, is a much simpler SQLite service backed by &lt;a href='https://litestream.io/' title=''&gt;Litestream&lt;/a&gt;. So is Pet Sematary, the secret store we built to replace HashiCorp Vault.&lt;/p&gt;

&lt;p&gt;Still, there are probably lots of distributed state problems that want something more like a link-state routing protocol and less like a distributed database. If you think you might have one of those, &lt;a href='https://github.com/superfly/corrosion' title=''&gt;feel free to take Corrosion for a spin&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Corrosion is Jérôme Gravel-Niquet&amp;rsquo;s brainchild. For the last couple years, much of the iteration on it was led by Somtochi Onyekwere and Peter Cai. The work was alternately cortisol- and endorphin-inducing. We&amp;rsquo;re glad to finally get to talk about it in detail.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Kurt Got Got</title>
    <link rel="alternate" href="https://fly.io/blog/kurt-got-got/"/>
    <id>https://fly.io/blog/kurt-got-got/</id>
    <published>2025-10-08T00:00:00+00:00</published>
    <updated>2025-12-11T17:29:24+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/kurt-got-got/assets/Kurt_Got_Got.jpg"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;The $FLY Airdrop is live!  Claim your share of &lt;a href="https://fly.io/blog/macaroons-escalated-quickly/" title=""&gt;the token powering Fly.io’s global network&lt;/a&gt; of 3M+ apps and (🤮) own a piece of the sky!&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;We know. Our Twitter got owned. We knew within moments of it happening. We know exactly how it happened. Nothing was at risk other than our Twitter account (and one Fly.io employee&amp;rsquo;s self-esteem).  Also: for fuck&amp;rsquo;s sake.&lt;/p&gt;

&lt;p&gt;Here&amp;rsquo;s what happened: Kurt Mackey, our intrepid CEO, got phished.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;Had this been an impactful attack, we would not be this flippant about it. For this, though, any other tone on our part would be false.&lt;/p&gt;
&lt;/div&gt;&lt;h2 id='how-they-got-kurt' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#how-they-got-kurt' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;How They Got Kurt&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Two reasons: one, it was a pretty good phishing attack, and two, Twitter fell outside the &amp;ldquo;things we take seriously&amp;rdquo; boundary.&lt;/p&gt;

&lt;p&gt;The phishing attack was effective because it exploited a deep psychological vulnerability in our management team: we are old and out of touch with the youths of today.&lt;/p&gt;

&lt;p&gt;For many months now, we&amp;rsquo;ve had a contractor/intern-type-person Boosting Our Brand on Twitter by posting dank developer memes (I think that&amp;rsquo;s what they&amp;rsquo;re called). The thing about this dankery is that we don&amp;rsquo;t really understand it. I mean, hold on, we know what the memes mean technically. We just don&amp;rsquo;t get why they&amp;rsquo;re funny.&lt;/p&gt;

&lt;p&gt;However, in pushing back on them, we&amp;rsquo;re up against two powerful forces:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The dank memes appear to perform better than the stuff we ourselves write on Twitter.
&lt;/li&gt;&lt;li&gt;We are reliably informed by our zoomer children that we are too cringe to be trusted on these matters.
&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;Here&amp;rsquo;s the phish Kurt got:&lt;/p&gt;

&lt;p&gt;&lt;img alt="A pretty-plausible Twitter alert" src="/blog/kurt-got-got/assets/phish.png?2/3&amp;amp;center" /&gt;&lt;/p&gt;

&lt;p&gt;Diabolical. Like a scalpel expertly wielded against Kurt&amp;rsquo;s deepest &lt;a href='https://theonion.com/cool-dad-raising-daughter-on-media-that-will-put-her-en-1819572981/' title=''&gt;middle-aged-dude&lt;/a&gt; insecurity. Our ruthless attackers clinically designed this email to trigger an autonomic Kurt response: &amp;ldquo;oh, what the fuck is this, and why did we post it?&amp;rdquo;&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;ATO is cool-kid for “got owned”&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;I&amp;rsquo;m getting a little ahead of the story here. We knew our X.com account had suffered an ATO because a bunch of us simultaneously got another email saying that the &lt;a href='https://twitter.com/flydotio' title=''&gt;@flydotio&lt;/a&gt; account&amp;rsquo;s email address now pointed to &lt;code&gt;[email protected]&lt;/code&gt;. Our immediate response was to audit all accesses to the login information in &lt;a href='https://1password.com/' title=''&gt;1Password&lt;/a&gt;, to cut all access for anybody who&amp;rsquo;d recently pulled it; your worst-case assumption in a situation like this is that someone&amp;rsquo;s endpoint has been owned up.&lt;/p&gt;

&lt;p&gt;Fortunately, nobody lost access for very long. I called Kurt to let him know why he was being locked out, and 5 seconds later, he&amp;rsquo;d &lt;a href='https://archive.is/6rVqf' title=''&gt;realized what had happened.&lt;/a&gt; &lt;strong class='font-semibold text-navy-950'&gt;Don&amp;rsquo;t click anything there.&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id='why-it-worked' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#why-it-worked' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Why It Worked&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;That&amp;rsquo;s the right question to ask, isn&amp;rsquo;t it? How could this have been possible in the first place?&lt;/p&gt;

&lt;p&gt;Contrary to one popular opinion, you don&amp;rsquo;t defeat phishing by training people not to click on things. I mean, tell them not to, sure! But eventually, under continued pressure, everybody clicks. &lt;a href='https://people.cs.uchicago.edu/~grantho/papers/oakland2025_phishing-training.pdf' title=''&gt;There&amp;rsquo;s science on this&lt;/a&gt;. The cool kids haven&amp;rsquo;t done phishing simulation training in years.&lt;/p&gt;

&lt;p&gt;What you&amp;rsquo;re supposed to do instead is use phishing-resistant authentication. This is almost the whole backstory for &lt;a href='https://www.imperialviolet.org/tourofwebauthn/tourofwebauthn.html' title=''&gt;U2F, FIDO2&lt;/a&gt; and &lt;a href='https://support.apple.com/en-us/102195' title=''&gt;Passkeys&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Phishing-resistant authentication works by mutual authentication (or, if you&amp;rsquo;re a stickler, by origin- and channel-binding). Phishes are malicious proxies for credentials. Modern MFA schemes like FIDO2 break that proxy flow; your browser won&amp;rsquo;t send real credentials to the fake site.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;there’s more to it than this, but, broad strokes.&lt;/p&gt;
&lt;/div&gt;
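&lt;p&gt;Here&amp;rsquo;s a toy model of why origin binding breaks the proxy. (Illustrative only: real WebAuthn/FIDO2 uses public-key signatures over client data that includes the origin, not a shared secret, but the proxy-breaking property is the same.)&lt;/p&gt;

```python
# Toy model of origin binding. Illustrative only: real WebAuthn/FIDO2
# uses public-key signatures over client data that includes the origin,
# not a shared secret, but the proxy-breaking property is the same.
import hashlib
import hmac

DEVICE_KEY = b"device-secret"  # never leaves the authenticator

def sign(origin: str, challenge: bytes) -> bytes:
    # The browser, not the user, supplies the origin it actually saw.
    return hmac.new(DEVICE_KEY, origin.encode() + challenge, hashlib.sha256).digest()

def verify(expected_origin: str, challenge: bytes, sig: bytes) -> bool:
    return hmac.compare_digest(sign(expected_origin, challenge), sig)

chal = b"nonce-123"
# The victim authenticates on the phishing page; the credential is bound
# to the origin the browser really visited.
phished_sig = sign("https://members-x.com", chal)
# Relayed to the real site, it fails: x.com expects a signature over
# its own origin.
assert not verify("https://x.com", chal, phished_sig)
```

A phished password proxies perfectly; a phished origin-bound signature is worthless anywhere except the phishing site itself.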
&lt;p&gt;This is, in fact, how all of our infrastructure is secured at Fly.io; specifically, we get &lt;a href='https://fly.io/blog/soc2-the-screenshots-will-continue-until-security-improves/#what-soc2-made-us-do' title=''&gt;everything behind an IdP&lt;/a&gt; (in our case: Google&amp;rsquo;s) and have it require phishing-proof MFA. You&amp;rsquo;re unlikely to phish your way to viewing logs here, or to refunding a customer bill at Stripe, or to viewing infra metrics, because all these things require an SSO login through Google.&lt;/p&gt;

&lt;p&gt;Twitter, on the other hand. Yeah, so, about that. You may have heard that, a few years back, there were some goings-on involving Twitter. Many of us at Fly.io &lt;a href='https://hachyderm.io/@flydotio' title=''&gt;decamped for Mastodon&lt;/a&gt;, and &lt;a href='https://bsky.app/profile/did:plc:j7herf6n4xiig2yg7fqdmkci' title=''&gt;later to Bluesky.&lt;/a&gt; There was a window of time in 2023-2024 where it looked as if Twitter might not be a long term thing for us at all.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;† (to whom I sincerely apologize for having assumed they had been owned up and were the proximate cause of the hack)&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;As a result, Twitter had been a sort of legacy shared account for us, with credentials managed in 1Password and shared with our zoomer contractor†.&lt;/p&gt;

&lt;p&gt;Which is why Kurt was in a position to pull credentials from 1Password and log in to members-x.com in response to an email from alerts-x.com.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;Still: we could have dodged this attack with hygiene: Kurt complains that “x.com” is an extremely phishable domain, and, sure, but also: the 1Password browser plugin would have noticed that “members-x.com” wasn’t an “x.com” host.&lt;/p&gt;
&lt;/div&gt;&lt;h2 id='what-took-so-long' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-took-so-long' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;What Took So Long&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;The attacker immediately revoked all tokens and set up new 2FA, so while we were quickly able to reset our password, we couldn&amp;rsquo;t lock them out of our account without an intervention from X.com, which took something like 15 hours to set up.&lt;/p&gt;

&lt;p&gt;(That&amp;rsquo;s not a knock on X.com; 15 hours for a 2FA reset isn&amp;rsquo;t outside industry norms).&lt;/p&gt;

&lt;p&gt;We&amp;rsquo;re obviously making a lot of noise about this now, but we were pretty quiet during the incident itself (beyond just &amp;ldquo;We know. We knew 45 seconds after it happened. We know exactly how it happened. It&amp;rsquo;s just a Twitter thing.&amp;rdquo;)&lt;/p&gt;

&lt;p&gt;That&amp;rsquo;s because, in the grand scheme of things, the attack was pretty chill: &lt;a href='https://archive.is/PTO2M' title=''&gt;a not-very-plausible crypto scam&lt;/a&gt; that presumably generated $0 for the attackers, 15+ hours of &lt;code&gt;brand damage&lt;/code&gt;, and extra security engineering cycles burnt on watchful waiting. Our users weren&amp;rsquo;t under attack, and the account wasn&amp;rsquo;t being used to further intercept customer accounts. At one point, the attackers apparently deleted our whole Twitter history, which, like, don&amp;rsquo;t threaten us with a good time. So we let it roll, until we got our account recovered the next morning.&lt;/p&gt;
&lt;h2 id='the-moral-of-the-story-is' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-moral-of-the-story-is' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;The Moral Of The Story Is&lt;/span&gt;&lt;/h2&gt;&lt;div class="right-sidenote"&gt;&lt;p&gt;“Really the biggest takeaway for me is that Kurt reads his email.”&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Obviously Kurt loses his commit access. The time comes in the life of every CEO, and now it comes for him. &lt;/p&gt;

&lt;p&gt;Also, we&amp;rsquo;ll finally have a population sample for &amp;ldquo;incident response&amp;rdquo; in &lt;a href='https://fly.io/blog/soc2-the-screenshots-will-continue-until-security-improves/' title=''&gt;our next SOC2&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Maybe we&amp;rsquo;ll post more on Twitter. Or maybe we&amp;rsquo;ll double down on Zoomer memes. I don&amp;rsquo;t know. Social media is really weird right now. Either way: our Twitter access is Passkeys now.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;seriously don’t click anything on that page&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;If you were inclined to take us up on an &amp;ldquo;airdrop&amp;rdquo; to &amp;ldquo;claim a share&amp;rdquo; of the &amp;ldquo;token&amp;rdquo; powering Fly.io, the site is &lt;a href='https://archive.is/PTO2M' title=''&gt;still up&lt;/a&gt;. You can connect your wallet to it! You&amp;rsquo;ll lose all your money. But if we&amp;rsquo;d actually done an ICO, you&amp;rsquo;d have lost all your money anyways.&lt;/p&gt;

&lt;p&gt;Somebody involved in pulling this attack off had to come up with &amp;ldquo;own a piece of the sky!&amp;rdquo;, and I think that&amp;rsquo;s punishment enough for them.&lt;/p&gt;

&lt;p&gt;Whatever you&amp;rsquo;re operating that isn&amp;rsquo;t behind phishing-resistant MFA, or, better yet, an SSO IdP that requires phishing-resistant MFA: that thing is eventually going to get phished. Dance around the clown-fire of our misfortune if you must, but let us be a lesson to you as well.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Litestream v0.5.0 is Here</title>
    <link rel="alternate" href="https://fly.io/blog/litestream-v050-is-here/"/>
    <id>https://fly.io/blog/litestream-v050-is-here/</id>
    <published>2025-10-02T00:00:00+00:00</published>
    <updated>2025-10-02T18:28:21+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/litestream-v050-is-here/assets/litestream-v050-is-here.jpg"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;&lt;strong class="font-semibold text-navy-950"&gt;I’m Ben Johnson, and I work on Litestream at Fly.io. Litestream makes it easy to build SQLite-backed full-stack applications  with resilience to server failure. It’s open source, runs anywhere, and&lt;/strong&gt; &lt;a href="https://litestream.io/" title=""&gt;&lt;strong class="font-semibold text-navy-950"&gt;it’s easy to get started&lt;/strong&gt;&lt;/a&gt;&lt;strong class="font-semibold text-navy-950"&gt;.&lt;/strong&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Litestream is the missing backup/restore system for SQLite. It runs as a sidecar process in the background, alongside unmodified SQLite applications, intercepting WAL checkpoints and streaming them to object storage in real time. Your application doesn&amp;rsquo;t even know it&amp;rsquo;s there. But if your server crashes, Litestream lets you quickly restore the database to your new hardware.&lt;/p&gt;

&lt;p&gt;The result: you can safely build whole full-stack applications on top of SQLite.&lt;/p&gt;

&lt;p&gt;A few months back, we announced &lt;a href='https://fly.io/blog/litestream-revamped/' title=''&gt;plans for a major update to Litestream&lt;/a&gt;. I&amp;rsquo;m psyched to announce that the first batch of those changes is now &amp;ldquo;shipping&amp;rdquo;. Litestream is faster and now supports efficient point-in-time recovery (PITR).&lt;/p&gt;

&lt;p&gt;I&amp;rsquo;m going to take a beat to recap Litestream and how we got here, then talk about how these changes work and what you can expect to see with them.&lt;/p&gt;
&lt;h2 id='litestream-to-litefs-to-litestream' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#litestream-to-litefs-to-litestream' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Litestream to LiteFS to Litestream&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Litestream is one of two big SQLite things I&amp;rsquo;ve built. The other one, originally intended as a sort of sequel to Litestream, is LiteFS.&lt;/p&gt;

&lt;p&gt;Boiled down to a sentence: LiteFS uses a FUSE filesystem to crawl further up into SQLite&amp;rsquo;s innards, using that access to perform live replication, for unmodified SQLite-backed apps.&lt;/p&gt;

&lt;p&gt;The big deal about LiteFS for us is that it lets you do the multiregion primary/read-replica deployment people love Postgres for: reads are fast everywhere, and writes are sane and predictable. We were excited to make this possible for SQLite, too.&lt;/p&gt;

&lt;p&gt;But the market has spoken! Users prefer Litestream. And honestly, we get it: Litestream is easier to run and to reason about. So we&amp;rsquo;ve shifted our focus back to it. First order of business: &lt;a href='https://fly.io/blog/litestream-revamped/' title=''&gt;take what we learned building LiteFS and stick as much of it as we can back into Litestream&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id='the-ltx-file-format' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-ltx-file-format' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;The LTX File Format&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Consider this basic SQL table:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative sql"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-wy16kafx"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-wy16kafx"&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;sandwiches&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;INTEGER&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="n"&gt;AUTOINCREMENT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;star_rating&lt;/span&gt; &lt;span class="nb"&gt;INTEGER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;reviewer_id&lt;/span&gt; &lt;span class="nb"&gt;INTEGER&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;In our hypothetical, this table backs a wildly popular sandwich-reviewing app that we keep trying to get someone to write. People eat a lot of sandwiches and this table gets a lot of writes. Because it makes my point even better and it&amp;rsquo;s funny, assume people dither a lot about their sandwich review for the first couple minutes after they leave it. This Quiznos sub… is it ⭐ or ⭐⭐?&lt;/p&gt;

&lt;p&gt;Underneath SQLite is a B-tree. Like databases everywhere, SQLite divides storage up into disk-aligned pages, working hard to read as few pages as possible for any task while treating work done within a page as more or less free. SQLite always reads and writes in page-sized chunks.&lt;/p&gt;

&lt;p&gt;Our &lt;code&gt;sandwiches&lt;/code&gt; table includes a feature that&amp;rsquo;s really painful for a tool like Litestream that thinks in pages: an automatically updating primary key. That key dictates that every insert into the table hits the rightmost leaf page in the underlying table B-tree. For SQLite itself, that&amp;rsquo;s no problem. But Litestream has less information to go on: it sees only a feed of whole pages it needs to archive.&lt;/p&gt;
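&lt;p&gt;You can watch the hotspot form with a few lines of Python (stdlib &lt;code&gt;sqlite3&lt;/code&gt;; this just shows the monotonically increasing keys, not Litestream itself):&lt;/p&gt;

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE sandwiches (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    description TEXT NOT NULL,
    star_rating INTEGER,
    reviewer_id INTEGER NOT NULL
)""")

# Every insert is assigned the next-highest key, so every insert lands
# on the rightmost leaf page of the table's B-tree.
rowids = [
    con.execute(
        "INSERT INTO sandwiches (description, reviewer_id) VALUES (?, 1)",
        (f"sub #{i}",),
    ).lastrowid
    for i in range(5)
]
assert rowids == [1, 2, 3, 4, 5]
```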

&lt;p&gt;Worse still, when it comes time to restore the database – something you tend to want to happen quickly – you have to individually apply those small changes, as whole pages. Your app is down, PagerDuty is freaking out, and you&amp;rsquo;re sitting there watching Litestream reconstruct your Quiznos uncertainty a page (and an S3 fetch) at a time.&lt;/p&gt;

&lt;p&gt;So, LTX. Let me explain. We needed LiteFS to be transaction-aware. It relies on finer-grained information than just raw dirty pages (that&amp;rsquo;s why it needs the FUSE filesystem). To ship transactions, rather than pages, we invented a &lt;a href='https://github.com/superfly/ltx' title=''&gt;file format we call LTX&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;LTX was designed as an interchange format for transactions, but for our purposes in Litestream, all we care about is that LTX files represent ordered ranges of pages, and that the format supports compaction.&lt;/p&gt;

&lt;p&gt;Compaction is straightforward. You&amp;rsquo;ve stored a bunch of LTX files that collect numbered pages. Now you want to restore a coherent picture of the database. Just replay them newest to oldest, skipping duplicate pages (newer wins), until all changed pages are accounted for.&lt;/p&gt;

&lt;p&gt;Importantly, LTX isn&amp;rsquo;t limited to whole database backups. We can use LTX compaction to compress a bunch of LTX files into a single file with no duplicated pages. And Litestream now uses this capability to create a hierarchy of compactions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;at Level 1, we compact all the changes in a 30-second time window
&lt;/li&gt;&lt;li&gt;at Level 2, all the Level 1 files in a 5-minute window
&lt;/li&gt;&lt;li&gt;at Level 3, all the Level 2&amp;rsquo;s over an hour.
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;Net result: we can restore a SQLite database to any point in time, &lt;em&gt;using only a dozen or so files on average&lt;/em&gt;.&lt;/p&gt;
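&lt;p&gt;The newest-wins replay is simple enough to sketch in a few lines (hypothetical in-memory structures, not Litestream&amp;rsquo;s actual LTX code):&lt;/p&gt;

```python
# Newest-wins page compaction (hypothetical in-memory structures, not
# Litestream's actual LTX code). Each "LTX file" is a dict mapping
# page number -> page bytes; the list is ordered oldest to newest.
def compact(ltx_files):
    merged = {}
    # Replay newest to oldest; the first (newest) copy of a page wins.
    for ltx in reversed(ltx_files):
        for page_no, data in ltx.items():
            merged.setdefault(page_no, data)
    return merged

# Page 2 was rewritten twice; only its newest version survives.
files = [
    {1: b"p1v1", 2: b"p2v1"},
    {2: b"p2v2", 3: b"p3v1"},
    {2: b"p2v3"},
]
assert compact(files) == {1: b"p1v1", 2: b"p2v3", 3: b"p3v1"}
```

The same operation squashes a pile of LTX files into one duplicate-free file, which is what makes the level hierarchy cheap.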

&lt;p&gt;Litestream performs this compaction itself. It doesn&amp;rsquo;t rely on SQLite to process the WAL file. Performance is limited only by I/O throughput.&lt;/p&gt;
&lt;h2 id='no-more-generations' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#no-more-generations' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;No More Generations&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;What people like about Litestream is that it&amp;rsquo;s just an ordinary Unix program. But like any Unix program, Litestream can crash. It&amp;rsquo;s not supernatural, so when it&amp;rsquo;s not running, it&amp;rsquo;s not seeing database pages change. When it misses changes, it falls out of sync with the database.&lt;/p&gt;

&lt;p&gt;Lucky for us, that&amp;rsquo;s easy to detect. When it notices a gap between the database and our running &amp;ldquo;shadow-WAL&amp;rdquo; backup, Litestream resynchronizes from scratch.&lt;/p&gt;

&lt;p&gt;The only time this gets complicated is if you have multiple Litestreams backing up to the same destination. To keep multiple Litestreams from stepping on each other, Litestream divides backups into &amp;ldquo;generations&amp;rdquo;, creating a new one any time it resyncs. You can think of generations as Marvel Cinematic Universe parallel dimensions in which your database might simultaneously be living.&lt;/p&gt;

&lt;p&gt;Yeah, we didn&amp;rsquo;t like those movies much either.&lt;/p&gt;

&lt;p&gt;LTX-backed Litestream does away with the concept entirely. Instead, when we detect a break in WAL file continuity, we re-snapshot with the next LTX file. Now we have a monotonically incrementing transaction ID. We can use it to look up database state at any point in time, without searching across generations.&lt;/p&gt;
&lt;h2 id='upgrading-to-litestream-v0-5-0' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#upgrading-to-litestream-v0-5-0' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Upgrading to Litestream v0.5.0&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Due to the file format changes, the new version of Litestream can&amp;rsquo;t restore from old v0.3.x WAL segment files.&lt;/p&gt;

&lt;p&gt;That&amp;rsquo;s OK though! The upgrade process is simple: just start using the new version. It&amp;rsquo;ll leave your old WAL files intact, in case you ever need to revert to the older version. The new LTX files are stored cleanly in an &lt;code&gt;ltx&lt;/code&gt; directory on your replica.&lt;/p&gt;

&lt;p&gt;The configuration file is fully backwards compatible.&lt;/p&gt;

&lt;p&gt;There&amp;rsquo;s one small catch: we added a new constraint. You now get only a single replica destination per database. This probably won&amp;rsquo;t affect you, since it&amp;rsquo;s how most people use Litestream already. We&amp;rsquo;ve made it official.&lt;/p&gt;

&lt;p&gt;The rationale: having a single source of truth simplifies development for us, and makes the tool easier to reason about. Multiple replicas can diverge and are sensitive to network availability. Conflict resolution is brain surgery.&lt;/p&gt;
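&lt;p&gt;In a config file, that means exactly one entry under &lt;code&gt;replicas&lt;/code&gt; per database. A minimal &lt;code&gt;litestream.yml&lt;/code&gt; might look like this (the paths and bucket are placeholders; the Litestream docs are authoritative on the schema):&lt;/p&gt;

```yaml
# One database, one replica destination.
dbs:
  - path: /var/lib/app/app.db
    replicas:
      - url: s3://my-bucket/app-db
```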

&lt;p&gt;Litestream commands still work the same. But you&amp;rsquo;ll see references to &amp;ldquo;transaction IDs&amp;rdquo; (TXID) for LTX files, rather than the &lt;code&gt;generation/index/offset&lt;/code&gt; we used previously with WAL segments.&lt;/p&gt;

&lt;p&gt;We&amp;rsquo;ve also changed &lt;code&gt;litestream wal&lt;/code&gt; to &lt;code&gt;litestream ltx&lt;/code&gt;.&lt;/p&gt;
&lt;h2 id='other-stuff-v0-5-0-does-better' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#other-stuff-v0-5-0-does-better' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Other Stuff v0.5.0 Does Better&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;We&amp;rsquo;ve beefed up the &lt;a href='https://github.com/superfly/ltx' title=''&gt;underlying LTX file format library&lt;/a&gt;. It used to be that an LTX file was just a sorted list of pages, all compressed together. Now we compress per-page, and keep an index at the end of the LTX file to pluck individual pages out.&lt;/p&gt;

&lt;p&gt;You&amp;rsquo;re not seeing it yet, but we&amp;rsquo;re excited about this change: we can operate page-granularly even when dealing with large LTX files. That opens the door to new features. A good example: we can query from any point in time, without downloading the whole database.&lt;/p&gt;

&lt;p&gt;We&amp;rsquo;ve also gone back through old issues &amp;amp; PRs to improve quality-of-life. CGO is now gone. We&amp;rsquo;ve settled the age-old contest between &lt;code&gt;mattn/go-sqlite3&lt;/code&gt; and &lt;code&gt;modernc.org/sqlite&lt;/code&gt; in favor of &lt;code&gt;modernc.org&lt;/code&gt;. This is super handy for people with automated build systems that want to run from a MacBook but deploy on an x64 server, since it lets the cross-compiler work.&lt;/p&gt;

&lt;p&gt;We&amp;rsquo;ve also added a replica type for NATS JetStream. Users that already have JetStream running can get Litestream going without adding an object storage dependency.&lt;/p&gt;

&lt;p&gt;And finally, we&amp;rsquo;ve upgraded all our clients (S3, Google Storage, &amp;amp; Azure Blob Storage) to their latest versions. We&amp;rsquo;ve also moved our code to support newer S3 APIs.&lt;/p&gt;
&lt;h2 id='whats-next' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#whats-next' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;What&amp;rsquo;s next?&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;The next major feature we&amp;rsquo;re building out is a Litestream VFS for read replicas. This will let you instantly spin up a copy of the database and immediately read pages from S3 while the rest of the database is hydrating in the background.&lt;/p&gt;

&lt;p&gt;We already have a proof of concept working and we&amp;rsquo;re excited to show it off when it&amp;rsquo;s ready!&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Build Better Agents With MorphLLM</title>
    <link rel="alternate" href="https://fly.io/blog/build-better-agents-with-morphllm/"/>
    <id>https://fly.io/blog/build-better-agents-with-morphllm/</id>
    <published>2025-08-25T00:00:00+00:00</published>
    <updated>2025-09-03T19:05:57+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/build-better-agents-with-morphllm/assets/morphllm.webp"/>
    <content type="html">&lt;p&gt;I&amp;rsquo;m an audiophile, which is a nice way to describe someone who spends their children&amp;rsquo;s college fund on equipment that yields no audible improvement in sound quality. As such, I refused to use wireless headphones for the longest time. The fun thing about wired headphones is when you forget they&amp;rsquo;re on and you stand up, you simultaneously cause irreparable neck injuries and extensive property damage. This eventually prompted me to buy good wireless headphones and, you know what, I break fewer things now. I can also stand up from my desk and not be exposed to the aural horrors of the real world. &lt;/p&gt;

&lt;p&gt;This is all to say, sometimes you don&amp;rsquo;t know how big a problem is until you solve it. This week, I chatted to the fine people building &lt;a href='https://morphllm.com/' title=''&gt;MorphLLM&lt;/a&gt;, which is exactly that kind of solution for AI agent builders. &lt;/p&gt;
&lt;h2 id='slow-wasteful-and-expensive-ai-code-changes' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#slow-wasteful-and-expensive-ai-code-changes' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Slow, Wasteful and Expensive AI Code Changes&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;If you’re building AI agents that write or edit code, you’re probably accepting the following as &amp;ldquo;the way it is&amp;rdquo;: Your agent needs to correct a single line of code, but rewrites an entire file to do it. Search-and-replace, right? It’s fragile, breaks formatting, silently fails, or straight up leaves important functions out. The result is slow, inaccurate code changes, excessive token use, and an agent that feels incompetent and unreliable.&lt;/p&gt;

&lt;p&gt;Full file rewrites are context-blind and prone to hallucinations, especially when editing that 3000+ line file that you&amp;rsquo;ve been meaning to refactor. And every failure and iteration is wasted compute, wasted money and worst of all, wasted time.&lt;/p&gt;
&lt;h2 id='why-we-arent-thinking-about-this-or-why-i-wasnt' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#why-we-arent-thinking-about-this-or-why-i-wasnt' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Why We Aren’t Thinking About This (or why I wasn&amp;rsquo;t)&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;AI workflows are still new to everyone. Best practices are still just opinions and most tooling is focused on model quality, not developer velocity or cost. This is a big part of why we feel that slow, wasteful code edits are just the price of admission for AI-powered development.&lt;/p&gt;

&lt;p&gt;In reality, these inefficiencies become a real bottleneck for coding agent tools. The hidden tax on every code edit adds up and your users pay with their time, especially as teams scale and projects grow more complex.&lt;/p&gt;
&lt;h2 id='better-faster-ai-code-edits-with-morph-fast-apply' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#better-faster-ai-code-edits-with-morph-fast-apply' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Better, Faster AI Code Edits with Morph Fast Apply&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;MorphLLM&amp;rsquo;s core innovation is Morph Fast Apply. It&amp;rsquo;s an edit merge tool that is semantic, structure-aware and designed specifically for code. Those are big words to describe a tool that will empower your agents to make single line changes without  rewriting whole files or relying on brittle search-and-replace. Instead, your agent applies precise, context-aware edits and it does it ridiculously fast. &lt;/p&gt;

&lt;p&gt;It works like this: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You add an &lt;code&gt;edit_file&lt;/code&gt; tool to your agent&amp;rsquo;s tools.
&lt;/li&gt;&lt;li&gt;Your agent outputs tiny &lt;code&gt;edit_file&lt;/code&gt; snippets, using &lt;code&gt;//...existing code...&lt;/code&gt; placeholders to indicate unchanged code.
&lt;/li&gt;&lt;li&gt;Your backend calls Morph’s Apply API, which merges the changes semantically. It doesn&amp;rsquo;t just replace raw text; it makes targeted merges with the codebase as context. 
&lt;/li&gt;&lt;li&gt;You write back the precisely edited file. No manual patching, no painful conflict resolution, no context lost.
&lt;/li&gt;&lt;/ul&gt;
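The loop above can be sketched end-to-end in a few lines. This is a minimal sketch, not Morph's reference implementation: the tagged message format, base URL, and model name are assumptions taken from Morph's public docs at the time of writing, so verify them against docs.morphllm.com before relying on them.

```python
# Sketch of the edit_file flow. The tagged message format, base URL, and
# model name below are assumptions; check docs.morphllm.com.

def build_apply_message(instruction: str, original_code: str, update_snippet: str) -> str:
    """Pack the edit instruction, the current file contents, and the agent's
    tiny edit snippet (with placeholder comments marking unchanged code)
    into a single prompt for the Apply model."""
    return (
        f"<instruction>{instruction}</instruction>\n"
        f"<code>{original_code}</code>\n"
        f"<update>{update_snippet}</update>"
    )

original = "function total(items) {\n  return items.reduce((s, i) => s + i.cost, 0);\n}\n"
update = (
    "// ... existing code ...\n"
    "  return items.reduce((s, i) => s + i.price, 0);\n"
    "// ... existing code ...\n"
)
message = build_apply_message("Sum price, not cost", original, update)

# The actual call is then an ordinary chat completion (needs an API key):
# from openai import OpenAI
# client = OpenAI(base_url="https://api.morphllm.com/v1", api_key="...")
# merged = client.chat.completions.create(
#     model="morph-v3-large",
#     messages=[{"role": "user", "content": message}],
# ).choices[0].message.content   # the precisely merged file
```

The placeholder comments are what keep the agent's output tiny: it only emits the changed lines plus enough context for the semantic merge.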
&lt;h2 id='the-numbers' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-numbers' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;The Numbers&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;MorphLLM&amp;rsquo;s Apply API processes over 4,500 tokens per second and their benchmark results are nuts. We&amp;rsquo;re talking 98% accuracy in ~6 seconds per file. Compare this to 35s (with error corrections) at 86% accuracy for traditional search-and-replace systems. Files up to 9k tokens in size take ~4 seconds to process. &lt;/p&gt;

&lt;p&gt;Just look at the damn &lt;a href='https://morphllm.com/benchmarks' title=''&gt;graph&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;&lt;img alt="Time Performance Analysis" src="/blog/build-better-agents-with-morphllm/assets/morph_graph.webp" /&gt;&lt;/p&gt;

&lt;p&gt;These are game-changing numbers for agent builders. Real-time code UIs become possible. Dynamic codebases can self-adapt in seconds, not minutes. Scale to multi-file edits, documentation, and even large asset transformations without sacrificing speed or accuracy.&lt;/p&gt;
&lt;h2 id='how-to-get-in-on-the-morphllm-action' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#how-to-get-in-on-the-morphllm-action' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;How to Get in on the MorphLLM Action&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Integration with your project is easy peasy. MorphLLM is API-compatible with OpenAI, Vercel AI SDK, MCP, and OpenRouter. You can run it in the cloud, self-host, or go on-prem with enterprise-grade guarantees. &lt;/p&gt;

&lt;p&gt;I want to cloud host mine, if only I could think of somewhere I could quickly and easily deploy wherever I want and only pay for when I&amp;rsquo;m using the infra 😉.&lt;/p&gt;
&lt;h2 id='get-morphed' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#get-morphed' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Get Morphed&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;MorphLLM feels like a plug-in upgrade for code agent projects that will instantly make them faster and more accurate. Check out the docs, benchmarks, and integration guides at &lt;a href='https://docs.morphllm.com/' title=''&gt;docs.morphllm.com&lt;/a&gt;. Get started for free at &lt;a href="https://morphllm.com/dashboard"&gt;https://morphllm.com/dashboard&lt;/a&gt; &lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Trust Calibration for AI Software Builders</title>
    <link rel="alternate" href="https://fly.io/blog/trust-calibration-for-ai-software-builders/"/>
    <id>https://fly.io/blog/trust-calibration-for-ai-software-builders/</id>
    <published>2025-08-18T00:00:00+00:00</published>
    <updated>2025-08-19T08:30:16+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/trust-calibration-for-ai-software-builders/assets/trust_calibration.webp"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;Trust calibration is a concept from the world of human-machine interaction design, one that is super relevant to AI software builders. Trust calibration is the practice of aligning the level of trust that users have in our products with its actual capabilities. &lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;If we build things that our users trust too blindly, we risk facilitating dangerous or destructive interactions that can permanently turn users off. If they don&amp;rsquo;t trust our product enough, it will feel useless or less capable than it actually is. &lt;/p&gt;

&lt;p&gt;So what does trust calibration look like in practice, and how do we achieve it? A 2023 study reviewed over 1,000 papers on trust and trust calibration in human-automated systems (properly referenced at the end of this article). It holds some pretty eye-opening insights – and some inconvenient truths – for people building AI software. I&amp;rsquo;ve tried to extract just the juicy bits below.   &lt;/p&gt;
&lt;h2 id='limiting-trust' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#limiting-trust' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Limiting Trust&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Let&amp;rsquo;s begin with a critical point. There is a limit to how deeply we want users to trust our products. Designing for calibrated trust is the goal, not more trust at any cost. Shoddy trust calibration leads to two equally undesirable outcomes: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong class='font-semibold text-navy-950'&gt;Over-trust&lt;/strong&gt; causes users to rely on AI systems in situations where they shouldn&amp;rsquo;t (I told my code assistant to fix a bug in prod and went to bed).
&lt;/li&gt;&lt;li&gt;&lt;strong class='font-semibold text-navy-950'&gt;Under-trust&lt;/strong&gt; causes users to reject AI assistance even when it would be beneficial, resulting in reduced perception of value and increased user workload.
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;What does calibrated trust look like for your product? It’s important to understand that determining this is less about trying to diagram a set of abstract trust parameters and more about helping users develop accurate mental models of your product&amp;rsquo;s capabilities and limitations. In most cases, this requires thinking beyond the trust calibration mechanisms we default to, like confidence scores. &lt;/p&gt;

&lt;p&gt;For example, Cursor&amp;rsquo;s most prominent trust calibration mechanism is its change suggestion highlighting. The code that the model suggests we change is highlighted in red, followed by suggested changes highlighted in green. This  immediately communicates that &amp;ldquo;this is a suggestion, not a command.&amp;rdquo; &lt;/p&gt;

&lt;p&gt;In contrast, Tesla&amp;rsquo;s Autopilot is a delegative system. It must calibrate trust differently through detailed capability explanations, clear operational boundaries (only on highways), and prominent disengagement alerts when conditions exceed system limits. &lt;/p&gt;
&lt;h2 id='building-cooperative-systems' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#building-cooperative-systems' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Building Cooperative Systems&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Perhaps the most fundamental consideration in determining high level trust calibration objectives is deciding whether your project is designed to be a cooperative or a delegative tool.  &lt;/p&gt;

&lt;p&gt;Cooperative systems generally call for lower levels of trust because users can choose whether to accept or reject AI suggestions. But these systems also face a unique risk. It’s easy for over-trust to gradually transform user complacency into over-reliance, effectively transforming what we designed as a cooperative relationship into a delegative one, only without any of the required safeguards.&lt;/p&gt;

&lt;p&gt;If you&amp;rsquo;re building a coding assistant, content generator, or design tool, implement visible &amp;ldquo;suggestion boundaries&amp;rdquo; which make it clear when the AI is offering ideas versus making decisions. Grammarly does this well by underlining suggestions rather than auto-correcting, and showing rationale on hover. &lt;/p&gt;

&lt;p&gt;For higher-stakes interactions, consider introducing friction. Require explicit confirmation before applying AI suggestions to production code or publishing AI-generated content.&lt;/p&gt;
&lt;h2 id='building-delegative-systems' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#building-delegative-systems' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Building Delegative Systems&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;In contrast, users expect delegative systems to replace human action entirely. Blind trust in the system is a requirement for it to be considered valuable at all. &lt;/p&gt;

&lt;p&gt;If you&amp;rsquo;re building automation tools, smart scheduling, or decision-making systems, invest heavily in capability communication and boundary setting. Calendly&amp;rsquo;s smart scheduling works because it clearly communicates what it will and won&amp;rsquo;t do (I&amp;rsquo;ll find times that work for both of us vs. I&amp;rsquo;ll reschedule your existing meetings). Build robust fallback mechanisms and make system limitations prominent in your onboarding.  &lt;/p&gt;
&lt;h2 id='timing-is-everything' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#timing-is-everything' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Timing Is Everything&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;The study suggests that when we make trust calibrations is at least as important as how we make them. There are three critical windows for trust calibration, each with its own opportunities and challenges. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong class='font-semibold text-navy-950'&gt;Pre-interaction calibration&lt;/strong&gt; happens before users engage with the system. Docs and tutorials fall into this category. Setting expectations up front can prevent initial over-trust, which is disproportionately more difficult to correct later. 
&lt;/li&gt;&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Pre-interaction calibrations could look like capability-focused onboarding that shows both successes and failures. Rather than just demonstrating perfect AI outputs, show users examples where the AI makes mistakes and how to catch them. &lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong class='font-semibold text-navy-950'&gt;During-interaction calibration&lt;/strong&gt; is trust adjustment through real-time feedback. Dynamically updated cues improve trust calibration better than static displays, and adaptive calibration that responds to user behavior outperforms systems that display static information. 
&lt;/li&gt;&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Build confidence indicators that are updated based on context, not just model confidence. For example, if you&amp;rsquo;re building a document AI, show higher confidence for standard document types the system has seen thousands of times, and lower confidence for unusual formats. &lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong class='font-semibold text-navy-950'&gt;Post-interaction calibration&lt;/strong&gt; focuses on learning and adjustment that helps users understand successes and failures in the system after interactions. These aren’t reliable, since by the time users receive the information, their trust patterns are set and hard to change. 
&lt;/li&gt;&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Post-interaction feedback can still be valuable for teaching. Create &amp;ldquo;reflection moments&amp;rdquo; after significant interactions. Midjourney does this by letting users rate image outputs, helping users learn what prompts work best while calibrating their expectations for future generations. &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Trust is front-loaded and habit-driven. The most effective calibration happens before and during use, when expectations are still forming and behaviors can still be shifted. Any later and you’re mostly fighting entrenched patterns.&lt;/p&gt;
&lt;h2 id='performance-vs-process-information' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#performance-vs-process-information' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Performance vs. Process Information&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Users can be guided through performance-oriented signals (what the system can do) or process-oriented signals (how it works). The real challenge is matching the right kind of explanation to the right user, at the right moment.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong class='font-semibold text-navy-950'&gt;Performance-oriented calibration&lt;/strong&gt; focuses on communicating capability through mechanisms like reliability statistics, confidence scores, and clear capability boundaries. 
&lt;/li&gt;&lt;li&gt;&lt;strong class='font-semibold text-navy-950'&gt;Process-oriented calibration&lt;/strong&gt; offers detailed explanations of decision-making processes, breakdowns of which factors influenced decisions, and reasoning transparency. 
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;Process transparency seems like the obvious go-to at first glance, but the effectiveness of process explanations varies wildly based on user expertise and domain knowledge. If we are designing for a set of users that may fall anywhere on this spectrum, we have to avoid creating information overload for novice users while providing sufficient information to expert users who want the detail.  &lt;/p&gt;

&lt;p&gt;The most effective systems in the study combined both approaches, providing layered information that allows users to access the level of detail most appropriate for their expertise and current needs.&lt;/p&gt;
&lt;h2 id='static-vs-adaptive-calibration' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#static-vs-adaptive-calibration' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Static vs. Adaptive Calibration&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;I really wanted to ignore this part, because it feels like the study’s authors are passive-aggressively adding todos to my projects. In a nutshell, adaptive calibration – when a system actively monitors user behavior and adjusts its communication accordingly – is orders of magnitude more effective than static calibration, which delivers the same information to every user regardless of differences in expertise, trust propensity, or behavior.  &lt;/p&gt;

&lt;p&gt;Static calibration mechanisms are easy to build and maintain, which is why we like them. But the stark reality is that they put the burden of appropriate calibration entirely on our users. We’re making it their job to adapt their behavior based on generic information.&lt;/p&gt;

&lt;p&gt;This finding has zero respect for our time or mental health, but it also reveals a legit opportunity for clever builders to truly separate their product from the herd.&lt;/p&gt;
&lt;h2 id='practical-adaptive-calibration-techniques' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#practical-adaptive-calibration-techniques' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Practical adaptive calibration techniques&lt;/span&gt;&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong class='font-semibold text-navy-950'&gt;Behavioral adaptation:&lt;/strong&gt; Track how often users accept vs. reject suggestions and adjust confidence thresholds accordingly. If a user consistently rejects high-confidence suggestions, lower the threshold for showing uncertainty.
&lt;/li&gt;&lt;li&gt;&lt;strong class='font-semibold text-navy-950'&gt;Context awareness:&lt;/strong&gt; Adjust trust signals based on use context. A writing AI might show higher confidence for grammar fixes than creative suggestions, or lower confidence late at night when users might be tired.
&lt;/li&gt;&lt;li&gt;&lt;strong class='font-semibold text-navy-950'&gt;Detect expertise:&lt;/strong&gt; Users who frequently make sophisticated edits to AI output probably want more detailed explanations than those who typically accept entire file rewrites.
&lt;/li&gt;&lt;/ul&gt;
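The first bullet is easy to prototype. Here is a minimal sketch of behavioral adaptation; all names and numeric thresholds (0.8 as "high confidence", three strikes, a 0.1 step) are illustrative and come from no real product: track rejections of high-confidence suggestions and widen the band of output that gets flagged as uncertain.

```python
# Sketch of behavioral adaptation: class/field names and the numeric
# thresholds here are illustrative, not from any real product.
class TrustCalibrator:
    def __init__(self, warn_below: float = 0.6):
        self.warn_below = warn_below   # show "low confidence" UI below this
        self.rejected_high_conf = 0    # rejections of suggestions at conf at least 0.8

    def record(self, confidence: float, accepted: bool) -> None:
        if not accepted and confidence >= 0.8:
            self.rejected_high_conf += 1
        # A skeptical user keeps rejecting confident output: raise the
        # threshold so more suggestions get explicitly flagged as uncertain.
        if self.rejected_high_conf >= 3:
            self.warn_below = min(0.9, self.warn_below + 0.1)
            self.rejected_high_conf = 0

    def should_warn(self, confidence: float) -> bool:
        # flag anything that fails to clear the current threshold
        return not confidence >= self.warn_below

cal = TrustCalibrator()
for _ in range(3):                      # user distrusts high-confidence output
    cal.record(confidence=0.9, accepted=False)
```

After three rejections the threshold moves from 0.6 to 0.7, so `cal.should_warn(0.65)` now returns `True`: the UI hedges more for this particular user.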
&lt;h2 id='the-transparency-paradox' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-transparency-paradox' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;The Transparency Paradox&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;The idea that transparency and explainability can actually harm trust calibration is easily the point that hit me the hardest. While explanations can improve user understanding, they can also create information overload that reduces users&amp;rsquo; ability to detect and correct trash output. What&amp;rsquo;s worse, explanations can create a whole new layer of trust calibration issues, with users over-trusting the explanation mechanism itself, rather than critically evaluating the actual output.&lt;/p&gt;

&lt;p&gt;This suggests that quality over quantity should be our design philosophy when it comes to transparency. We should provide carefully crafted, relevant information rather than comprehensive but overwhelming detail. The goal should be enabling better decision-making rather than simply satisfying user curiosity about system internals.&lt;/p&gt;
&lt;h2 id='anthropomorphism-and-unwarranted-trust' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#anthropomorphism-and-unwarranted-trust' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Anthropomorphism and Unwarranted Trust&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;It seems obvious that we should make interactions with our AI project feel as human as possible. Well, it turns out that systems that appear more human-like through design, language, or interaction patterns are notoriously good at increasing user trust beyond actual system capabilities. &lt;/p&gt;

&lt;p&gt;So it’s entirely possible that building more traditional human-computer interactions can actually make our AI projects safer to use and therefore, more user-friendly.  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong class='font-semibold text-navy-950'&gt;Use tool-like language:&lt;/strong&gt; Frame outputs as &amp;ldquo;analysis suggests&amp;rdquo; rather than &amp;ldquo;I think&amp;rdquo; or &amp;ldquo;I believe&amp;rdquo;
&lt;/li&gt;&lt;li&gt;&lt;strong class='font-semibold text-navy-950'&gt;Embrace machine-like precision:&lt;/strong&gt; Show exact confidence percentages rather than human-like hedging (&amp;ldquo;I&amp;rsquo;m pretty sure that&amp;hellip;&amp;rdquo;)
&lt;/li&gt;&lt;/ul&gt;
&lt;h2 id='trust-falls-faster-than-it-climbs' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#trust-falls-faster-than-it-climbs' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Trust Falls Faster Than It Climbs&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Nothing particularly groundbreaking here, but the findings are worth mentioning if only to reinforce what we think we know. &lt;/p&gt;

&lt;p&gt;Early interactions are critically important. Users form mental models quickly and then react slowly to changes in system reliability.&lt;/p&gt;

&lt;p&gt;More critically, trust drops much faster from system failures than it builds from successes. These asymmetries suggest that we should invest disproportionately in onboarding and first-use experiences, even if they come with higher development costs.&lt;/p&gt;
&lt;h2 id='measurement-is-an-opportunity-for-innovation' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#measurement-is-an-opportunity-for-innovation' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Measurement is an Opportunity for Innovation&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;The study revealed gaping voids where effective measurement mechanisms and protocols should be, for both researchers and builders. There is a clear need to move beyond simple user satisfaction metrics or adoption rates to developing measurement frameworks that can actively detect miscalibrated trust patterns. &lt;/p&gt;

&lt;p&gt;The ideal measurement approach would combine multiple indicators. A few examples of viable indicators are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong class='font-semibold text-navy-950'&gt;Behavioral signals:&lt;/strong&gt; Track acceptance rates for different confidence levels. Well-calibrated trust should show higher acceptance rates for high-confidence outputs and lower rates for low-confidence ones.
&lt;/li&gt;&lt;li&gt;&lt;strong class='font-semibold text-navy-950'&gt;Context-specific metrics:&lt;/strong&gt; Measure trust calibration separately for different use cases. Users might be well-calibrated for simple tasks but poorly calibrated for complex ones.
&lt;/li&gt;&lt;li&gt;&lt;strong class='font-semibold text-navy-950'&gt;User self-reporting:&lt;/strong&gt; Regular pulse surveys asking &amp;ldquo;How confident are you in your ability to tell when this AI makes mistakes?&amp;rdquo; can reveal calibration gaps.
&lt;/li&gt;&lt;/ul&gt;
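The behavioral-signals idea takes only a few lines of bookkeeping to make concrete. A toy sketch, with made-up data and bucket edges: compute the acceptance rate per confidence bucket; well-calibrated trust should show acceptance rising with confidence, while a flat or inverted curve suggests users are ignoring (or over-trusting) the signal.

```python
# Toy acceptance-rate-by-confidence metric; data and bucket edges are made up.
from collections import defaultdict

def acceptance_by_confidence(events, edges=(0.5, 0.8)):
    """events: iterable of (confidence, accepted) pairs.
    Returns acceptance rate per bucket: 'low', 'mid', 'high'."""
    counts = defaultdict(lambda: [0, 0])   # bucket -> [accepted, total]
    for confidence, accepted in events:
        bucket = ("high" if confidence >= edges[1]
                  else "mid" if confidence >= edges[0] else "low")
        counts[bucket][0] += int(accepted)
        counts[bucket][1] += 1
    return {b: acc / tot for b, (acc, tot) in counts.items()}

events = [(0.9, True), (0.9, True), (0.85, False),   # high confidence
          (0.6, True), (0.6, False),                 # mid
          (0.3, False), (0.2, True)]                 # low
rates = acceptance_by_confidence(events)
# Here acceptance rises with confidence (high: 2/3, mid: 1/2, low: 1/2),
# which is the shape well-calibrated trust should produce.
```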
&lt;h2 id='the-calibrated-conclusion' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-calibrated-conclusion' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;The Calibrated Conclusion&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;It&amp;rsquo;s clear, at least from this study, that there’s no universal formula, or single feature that will effectively calibrate trust. It&amp;rsquo;s up to every builder to define and understand their project&amp;rsquo;s trust goals and to balance timing, content, adaptivity, and transparency accordingly. That’s what makes it both hard and worth doing. Trust calibration has to be a core part of our product’s identity, not a piglet we only start chasing once it has escaped the barn.&lt;/p&gt;

&lt;p&gt;&lt;strong class='font-semibold text-navy-950'&gt;The Study:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Magdalena Wischnewski, Nicole Krämer, and Emmanuel Müller. 2023. Measuring and Understanding Trust Calibrations for Automated Systems: A Survey of the State-Of-The-Art and Future Directions. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI &amp;lsquo;23), April 23–28, 2023, Hamburg, Germany. ACM, New York, NY, USA, 16 pages. &lt;a href="https://doi.org/10.1145/3544548.3581197"&gt;https://doi.org/10.1145/3544548.3581197&lt;/a&gt;&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Games as Model Eval: 1-Click Deploy AI Town on Fly.io</title>
    <link rel="alternate" href="https://fly.io/blog/games-as-model-eval/"/>
    <id>https://fly.io/blog/games-as-model-eval/</id>
    <published>2025-08-11T00:00:00+00:00</published>
    <updated>2025-08-15T08:35:19+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/games-as-model-eval/assets/Fly_Man.webp"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;Recently, I suggested that &lt;a href="https://fly.io/blog/the-future-isn-t-model-agnostic/" title=""&gt;The Future Isn’t Model Agnostic&lt;/a&gt;, that it’s better to pick one model that works for your project and build around it, rather than engineering for model flexibility. If you buy that, you also have to acknowledge how important comprehensive model evaluation becomes. &lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Benchmarks tell us almost nothing about how a model will actually behave in the wild, especially with long contexts, or when trusted to deliver the tone and feel that defines the UX we’re shooting for. Even the best evaluation pipelines usually end in subjective, side-by-side output comparisons. Not especially rigorous, and more importantly, boring af.&lt;/p&gt;

&lt;p&gt;Can we gamify model evaluation? Oh yes. And not just because we get to have some fun for once. Google backed me up this week when it announced the &lt;a href='https://blog.google/technology/ai/kaggle-game-arena/' title=''&gt;Kaggle Game Arena&lt;/a&gt;, a public platform where we can watch AI models duke it out in a variety of classic games. Quoting Google: &amp;ldquo;Current AI benchmarks are struggling to keep pace with modern models&amp;hellip; it can be hard to know if models trained on internet data are actually solving problems or just remembering answers they&amp;rsquo;ve already seen.&amp;rdquo;&lt;/p&gt;

&lt;p&gt;When models boss reading comprehension tests, or ace math problems, we pay attention. But when they fail to navigate a simple conversation with a virtual character or completely botch a strategic decision in a game environment, we tell ourselves we&amp;rsquo;re not building a game anyway and develop strategic short-term memory loss. 
Just like I&amp;rsquo;ve told my mom a thousand times, games are great at testing brains, and it&amp;rsquo;s time we take this seriously when it comes to model evaluation. &lt;/p&gt;
&lt;h2 id='why-games-dont-lie' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#why-games-dont-lie' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Why Games Don&amp;rsquo;t Lie&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Games provide what benchmarks can&amp;rsquo;t, &amp;ldquo;a clear, unambiguous signal of success.&amp;rdquo; They give us observable behavior in dynamic environments, the kind that would be extremely difficult (and tedious) to simulate with prompt engineering alone.&lt;/p&gt;

&lt;p&gt;Games force models to demonstrate the skills we actually care about; strategic reasoning, long-term planning, and dynamic adaptation in interactions with an opponent or a collaborator. &lt;/p&gt;
&lt;h2 id='pixel-art-meets-effective-model-evaluation-ai-town-on-fly-io' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#pixel-art-meets-effective-model-evaluation-ai-town-on-fly-io' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Pixel Art Meets Effective Model Evaluation - AI Town on Fly.io&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;AI Town is a brilliant project by &lt;a href='https://github.com/a16z-infra' title=''&gt;a16z-infra&lt;/a&gt;, based on the mind-bending paper &lt;a href='https://arxiv.org/pdf/2304.03442' title=''&gt;Generative Agents: Interactive Simulacra of Human Behavior&lt;/a&gt;. It&amp;rsquo;s a beautifully rendered little town in which tiny people with AI brains and engineered personalities go about their lives, interacting with each other and their environment. Characters need to remember past conversations, maintain relationships, react dynamically to new situations, and stay in character while doing it all. &lt;/p&gt;

&lt;p&gt;I challenge you to find a more entertaining way of evaluating conversational models. &lt;/p&gt;

&lt;p&gt;I&amp;rsquo;ve &lt;a href='https://github.com/fly-apps/ai-town_on_fly.io' title=''&gt;forked the project&lt;/a&gt; to make it absurdly easy to spin up your own AI Town on Fly Machines. You&amp;rsquo;ve got a single deploy script that will set everything up for you and some built-in cost and performance optimizations, with our handy scale-to-zero functionality as standard (so you only pay for the time spent running it). This makes it easy to share with your team, your friends and your mom.  &lt;/p&gt;

&lt;p&gt;In its current state, the fork makes it as easy as possible to test any OpenAI-compatible service, any model on Together.ai, and even custom embedding models. Simply set the relevant API key in your secrets.&lt;/p&gt;
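
&lt;p&gt;For example, pointing the fork at Together.ai or an OpenAI-compatible endpoint might look like the following. The secret names here are illustrative; check the fork&amp;rsquo;s README for the exact names it expects:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Assumed secret names -- confirm against the repo's docs
fly secrets set TOGETHER_API_KEY=your-key-here
# ...or, for any OpenAI-compatible service:
fly secrets set OPENAI_API_KEY=your-key-here&lt;/code&gt;&lt;/pre&gt;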

&lt;p&gt;Games like AI Town give us a window into how models actually think, adapt, and behave beyond the context of our prompts. You move past performance metrics and begin to understand a model’s personality, quirks, strengths, and weaknesses: all factors that ultimately shape your project&amp;rsquo;s UX.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>The Future Isn't Model Agnostic</title>
    <link rel="alternate" href="https://fly.io/blog/the-future-isn-t-model-agnostic/"/>
    <id>https://fly.io/blog/the-future-isn-t-model-agnostic/</id>
    <published>2025-08-08T00:00:00+00:00</published>
    <updated>2025-08-22T16:31:43+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/the-future-isn-t-model-agnostic/assets/Whack_A_Mole_.webp"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;Your users don’t care that your AI project is model 
agnostic. &lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;In my last project, I spent countless hours ensuring that the LLMs running my services  could be swapped out as easily as possible. I couldn&amp;rsquo;t touch a device with an internet connection without hearing about the latest benchmark-breaking model and it felt like a clear priority to ensure I could hot swap models with minimal collateral damage.&lt;/p&gt;

&lt;p&gt;So yeah. That was a waste of time.&lt;/p&gt;

&lt;p&gt;The hype around new model announcements feels more manufactured with each release. In reality, improvements are becoming incremental. As major providers converge on the same baseline, the days of one company holding a decisive lead are numbered.&lt;/p&gt;

&lt;p&gt;In a world of model parity, the differentiation moves entirely to the product layer. Winning isn&amp;rsquo;t about ensuring you&amp;rsquo;re using the best model; it&amp;rsquo;s about understanding your chosen model deeply enough to build experiences that feel magical: knowing exactly how to prompt for consistency, which edge cases to avoid, and how to design workflows that play to your model&amp;rsquo;s particular strengths.&lt;/p&gt;

&lt;p&gt;Model agnosticism isn&amp;rsquo;t just inefficient, it&amp;rsquo;s misguided. Fact is, swapping out your model is not just changing an endpoint. It&amp;rsquo;s rewriting prompts, rerunning evals, users telling you things just feel&amp;hellip; different. And if you&amp;rsquo;ve won users on the way it feels to use your product, that last one is a really big deal.&lt;/p&gt;
&lt;h2 id='model-lt-product' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#model-lt-product' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Model &amp;lt; Product&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Recently, something happened that fully solidified this idea in my head. Claude Code is winning among people building real things with AI. We even have evangelists in the Fly.io engineering team, and those guys are weird smart. Elsewhere, whole communities have formed to share and compare &lt;code&gt;claude.md&lt;/code&gt; files and fight each other over which MCP servers are the coolest to use with Claude.&lt;/p&gt;

&lt;p&gt;Enter stage right, Qwen 3 Coder. It takes Claude to the cleaners in benchmarks. But the response from the Claude Code user base? A collective meh.&lt;/p&gt;

&lt;p&gt;This is nothing like 2024, when everyone would have dropped everything to get the hot new model running in Cursor. And it&amp;rsquo;s not because we&amp;rsquo;ve learned that benchmarks are performance theater for people who&amp;rsquo;ve never shipped a product.&lt;/p&gt;

&lt;p&gt;It&amp;rsquo;s because products like Claude Code are irrefutable evidence that the model isn&amp;rsquo;t the product. We&amp;rsquo;ve felt it first hand when our pair programmer&amp;rsquo;s behavior changes in subtle ways. The product is in the rituals. The trust. The predictability. It&amp;rsquo;s precisely because Claude Code&amp;rsquo;s model behavior, UI, and user expectations are so tightly coupled that its users don&amp;rsquo;t really care that a better model might exist.&lt;/p&gt;

&lt;p&gt;I&amp;rsquo;m not trying to praise Anthropic here. The point is, engineering for model agnosticism is a trap that will eat up time that could be better spent … anywhere else.&lt;/p&gt;

&lt;p&gt;Sure, if you&amp;rsquo;re building infra or anything else that lives close to the metal, model optionality still matters. But people trusting legwork to AI tools are building deeper relationships and expectations of their AI tools than they even care to admit. AI product success stories are written when products become invisible parts of users&amp;rsquo; daily rituals, not showcases for engineering flexibility.&lt;/p&gt;
&lt;h2 id='make-one-model-your-own' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#make-one-model-your-own' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Make One Model Your Own&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;As builders, it&amp;rsquo;s time we stop hedging our bets and embrace the convergence reality. Every startup pitch deck with &amp;lsquo;model-agnostic&amp;rsquo; as a feature should become a red flag for investors who understand product-market fit. Stop putting &amp;lsquo;works with any LLM&amp;rsquo; in your one-liner. It screams &amp;lsquo;we don&amp;rsquo;t know what we&amp;rsquo;re building.&amp;rsquo;&lt;/p&gt;

&lt;p&gt;If you&amp;rsquo;re still building model-agnostic AI tools in 2025, you&amp;rsquo;re optimizing for the wrong thing. Users don&amp;rsquo;t want flexibility; they want reliability. And in a converged model landscape, reliability comes from deep specialization, not broad compatibility.&lt;/p&gt;

&lt;p&gt;Pick your model like you pick your therapist: for the long haul. Find the right model, tune deeply, and get close enough to understand its quirks and make them work for you. Stop architecting for the mythical future where you&amp;rsquo;ll seamlessly swap models. That future doesn&amp;rsquo;t exist, and chasing it is costing you the present.&lt;/p&gt;
&lt;h2 id='bonus-level-all-in-on-one-model-means-all-out-on-eval' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#bonus-level-all-in-on-one-model-means-all-out-on-eval' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Bonus level: All-in On One Model Means All-out On Eval&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;If any of this is landing for you, you&amp;rsquo;ll agree that we have to start thinking of model evaluation as architecture, not an afterthought. The good news is, rigorous model eval doesn&amp;rsquo;t have to be mind-numbing anymore.&lt;/p&gt;

&lt;p&gt;Turns out, games are really great eval tools! Now you can spin up your very own little &lt;a href='https://github.com/fly-apps/ai-town_on_fly.io' title=''&gt;AI Town&lt;/a&gt; on Fly.io with a single-click deploy to test different models as pixel people in an evolving environment. I discuss the idea further in &lt;a href='https://fly.io/blog/games-as-model-eval/' title=''&gt;Games as Model Eval: 1-Click Deploy AI Town on Fly.io&lt;/a&gt;.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Phoenix.new – The Remote AI Runtime for Phoenix</title>
    <link rel="alternate" href="https://fly.io/blog/phoenix-new-the-remote-ai-runtime/"/>
    <id>https://fly.io/blog/phoenix-new-the-remote-ai-runtime/</id>
    <published>2025-06-20T00:00:00+00:00</published>
    <updated>2025-06-24T17:23:07+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/phoenix-new-the-remote-ai-runtime/assets/phoenixnew-thumb.png"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;I’m Chris McCord, the creator of Elixir’s Phoenix framework. For the past several months, I’ve been working on a skunkworks project at Fly.io, and it’s time to show it off.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;I wanted LLM agents to work just as well with Elixir as they do with Python and JavaScript. Last December, in order to figure out what that was going to take, I started a little weekend project to find out how difficult it would be to build a coding agent in Elixir.&lt;/p&gt;

&lt;p&gt;A few weeks later, I had it spitting out working Phoenix applications and driving a full in-browser IDE. I knew this wasn&amp;rsquo;t going to stay a weekend project.&lt;/p&gt;

&lt;p&gt;If you follow me on Twitter, you&amp;rsquo;ve probably seen me teasing this work as it picked up steam. We&amp;rsquo;re at a point where we&amp;rsquo;re pretty serious about this thing, and so it&amp;rsquo;s time to make a formal introduction.&lt;/p&gt;

&lt;p&gt;World, meet &lt;a href='https://phoenix.new' title=''&gt;Phoenix.new&lt;/a&gt;, a batteries-included fully-online coding agent tailored to Elixir and Phoenix. I think it&amp;rsquo;s going to be the fastest way to build collaborative, real-time applications.&lt;/p&gt;

&lt;p&gt;Let&amp;rsquo;s see it in action:&lt;/p&gt;
&lt;div class="youtube-container" data-exclude-render&gt;
  &lt;div class="youtube-video"&gt;
    &lt;iframe
      width="100%"
      height="100%"
      src="https://www.youtube.com/embed/du7GmWGUM5Y"
      frameborder="0"
      allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
      allowfullscreen&gt;
    &lt;/iframe&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;h2 id='whats-interesting-about-phoenix-new' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#whats-interesting-about-phoenix-new' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;What&amp;rsquo;s Interesting About Phoenix.new&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;First, even though it runs entirely in your browser, Phoenix.new gives both you and your agent a root shell, in an ephemeral virtual machine (a &lt;a href='https://fly.io/docs/machines/overview/' title=''&gt;Fly Machine&lt;/a&gt;) that gives our agent loop free rein to install things and run programs  — without any risk of messing up your local machine. You don&amp;rsquo;t think about any of this; you just open up the VSCode interface, push the shell button, and there you are, on the isolated machine you share with the Phoenix.new agent.&lt;/p&gt;

&lt;p&gt;Second, it&amp;rsquo;s an agent system I built specifically for Phoenix. Phoenix is about real-time collaborative applications, and Phoenix.new knows what that means. To that end, Phoenix.new includes, in both its UI and its agent tools, a full browser. The Phoenix.new agent uses that browser &amp;ldquo;headlessly&amp;rdquo; to check its own front-end changes and interact with the app. Because it&amp;rsquo;s a full browser, instead of trying to iterate on screenshots, the agent sees real page content and JavaScript state – with or without a human present.&lt;/p&gt;
&lt;h2 id='what-root-access-gets-us' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-root-access-gets-us' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;What Root Access Gets Us&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Agents build software the way you did when you first got started, the way you still do today when you prototype things. They don&amp;rsquo;t carefully design Docker container layers and they don&amp;rsquo;t really do release cycles. An agent wants to pop a shell and get its fingernails dirty.&lt;/p&gt;

&lt;p&gt;A fully isolated virtual machine means Phoenix.new&amp;rsquo;s fingernails can get &lt;em&gt;arbitrarily dirty.&lt;/em&gt; If it wants to add a package to &lt;code&gt;mix.exs&lt;/code&gt;, it can do that and then run &lt;code&gt;mix phx.server&lt;/code&gt; or &lt;code&gt;mix test&lt;/code&gt; and check the output. Sure. Every agent can do that. But if it wants to add an APT package to the base operating system, it can do that too, and make sure it worked. It owns the whole environment.&lt;/p&gt;

&lt;p&gt;This offloads a huge amount of tedious, repetitive work.&lt;/p&gt;

&lt;p&gt;At his &lt;a href='https://youtu.be/LCEmiRjPEtQ?si=sR_bdu6-AqPXSNmY&amp;t=1902' title=''&gt;AI Startup School talk last week&lt;/a&gt;, Andrej Karpathy related his experience of building a restaurant menu visualizer, which takes camera pictures of text menus and transforms all the menu items into pictures. The code, which he vibe-coded with an LLM agent, was the easy part; he had it working in an afternoon. But getting the app online took him a whole week.&lt;/p&gt;

&lt;p&gt;With Phoenix.new, I&amp;rsquo;m taking dead aim at this problem. The apps we produce live in the cloud from the minute they launch. They have private, shareable URLs (we detect anything the agent generates with a bound port and give it a preview URL underneath &lt;code&gt;phx.run&lt;/code&gt;, with integrated port-forwarding), they integrate with GitHub, and they inherit all the infrastructure guardrails of Fly.io: hardware virtualization, WireGuard, and isolated networks.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;GitHub’s &lt;code&gt;gh&lt;/code&gt; CLI is installed by default. So the agent knows how to clone any repo or browse issues, and you can even authorize it for internal repositories to get it working with your team’s existing projects and dependencies.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Full control of the environment also closes the loop between the agent and deployment. When Phoenix.new boots an app, it watches the logs, and tests the application. When an action triggers an error, Phoenix.new notices and gets to work.&lt;/p&gt;
&lt;h2 id='watch-it-build-in-real-time' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#watch-it-build-in-real-time' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Watch It Build In Real Time&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;a href='https://phoenix.new' title=''&gt;Phoenix.new&lt;/a&gt; can interact with web applications the way users do: with a real browser.&lt;/p&gt;

&lt;p&gt;The Phoenix.new environment includes a headless Chrome browser that our agent knows how to drive. Prompt it to add a front-end feature to your application, and it won&amp;rsquo;t just sketch the code out and make sure it compiles and lints. It&amp;rsquo;ll pull the app up itself and poke at the UI, simultaneously looking at the page content, JavaScript state, and server-side logs.&lt;/p&gt;

&lt;p&gt;Phoenix is all about &lt;a href='https://fly.io/blog/how-we-got-to-liveview/' title=''&gt;&amp;ldquo;live&amp;rdquo; real-time&lt;/a&gt; interactivity, and gives us seamless live reload. The user interface for Phoenix.new itself includes a live preview of the app being worked on, so you can kick back and watch it build front-end features incrementally. Any other &lt;code&gt;.phx.run&lt;/code&gt; tabs you have open also update as it goes. It&amp;rsquo;s wild.&lt;/p&gt;
&lt;video title="agent interacting with web" autoplay="autoplay" loop="loop" muted="muted" playsinline="playsinline" disablePictureInPicture="true" class="mb-8" src="/blog/phoenix-new-the-remote-ai-runtime/assets/webjs.mp4"&gt;&lt;/video&gt;

&lt;h2 id='not-just-for-vibe-coding' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#not-just-for-vibe-coding' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Not Just For Vibe Coding&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Phoenix.new can already build real, full-stack applications with WebSockets, Phoenix&amp;rsquo;s Presence features, and real databases. I&amp;rsquo;m seeing it succeed at business and collaborative applications right now.&lt;/p&gt;

&lt;p&gt;But there&amp;rsquo;s no fixed bound on the tasks you can reasonably ask it to accomplish. If you can do it with a shell and a browser, I want Phoenix.new to do it too. And it can do these tasks with or without you present.&lt;/p&gt;

&lt;p&gt;For example: set a &lt;code&gt;$DATABASE_URL&lt;/code&gt; and tell the agent about it. The agent knows enough to go explore it with &lt;code&gt;psql&lt;/code&gt;, and it&amp;rsquo;ll propose apps based on the schemas it finds. It can model Ecto schemas off the database. And if MySQL is your thing, the agent will just &lt;code&gt;apt install&lt;/code&gt; a MySQL client and go to town.&lt;/p&gt;

&lt;p&gt;Frontier model LLMs have vast world knowledge. They generalize extremely well. At ElixirConfEU, I did a &lt;a href='https://www.youtube.com/watch?v=ojL_VHc4gLk&amp;t=3923s' title=''&gt;demo vibe-coding Tetris&lt;/a&gt; on stage. Phoenix.new nailed it, first try, first prompt. It&amp;rsquo;s not like there&amp;rsquo;s gobs of Phoenix LiveView Tetris examples floating around the Internet! But lots of people have published Tetris code, and lots of people have written LiveView stuff, and 2025 LLMs can connect those dots.&lt;/p&gt;

&lt;p&gt;At this point you might be wondering – can I just ask it to build a Rails app? Or an Expo React Native app? Or Svelte? Or Go?&lt;/p&gt;

&lt;p&gt;Yes, you can.&lt;/p&gt;

&lt;p&gt;Our system prompt is tuned for Phoenix today, but all languages you care about are already installed. We&amp;rsquo;re still figuring out where to take this, but adding new languages and frameworks definitely ranks highly in my plans.&lt;/p&gt;
&lt;h2 id='our-async-agent-future' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#our-async-agent-future' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Our Async Agent Future&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;a href='https://fly.io/blog/youre-all-nuts/' title=''&gt;We&amp;rsquo;re at a massive step-change in developer workflows&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Agents can do real work, today, with or without a human present. Buckle up: the future of development, at least in the common case, probably looks less like cracking open a shell and finding a file to edit, and more like popping into a CI environment with agents working away around the clock.&lt;/p&gt;

&lt;p&gt;Local development isn&amp;rsquo;t going away. But there&amp;rsquo;s going to be a shift in where the majority of our iterations take place. I&amp;rsquo;m already using Phoenix.new to triage &lt;code&gt;phoenix-core&lt;/code&gt; GitHub issues and pick problems to solve. I close my laptop, grab a cup of coffee, and wait for a PR to arrive — Phoenix.new knows how PRs work, too. We&amp;rsquo;re already here, and this space is just getting started.&lt;/p&gt;

&lt;p&gt;This isn&amp;rsquo;t where I thought I&amp;rsquo;d end up when I started poking around. The Phoenix and LiveView journey was much the same. Something special was there and the projects took on a life of their own. I&amp;rsquo;m excited to share this work now, and see where it might take us. I can&amp;rsquo;t wait to see what folks build.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>What are MCP Servers?</title>
    <link rel="alternate" href="https://fly.io/blog/mcps-everywhere/"/>
    <id>https://fly.io/blog/mcps-everywhere/</id>
    <published>2025-06-12T00:00:00+00:00</published>
    <updated>2025-06-12T16:59:12+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/mcps-everywhere/assets/mcps-everywhere.jpg"/>
    <content type="html">&lt;div class="lead"&gt;&lt;div&gt;&lt;p&gt;With Fly.io, &lt;a href="https://fly.io/docs/speedrun/" title=""&gt;you can get your app running globally in a matter of minutes&lt;/a&gt;, and with MCP servers you can integrate with Claude, VSCode, Cursor and &lt;a href="https://modelcontextprotocol.io/clients"&gt;many more AI clients&lt;/a&gt;.  &lt;a href="https://fly.io/docs/mcp/" title=""&gt;Try it out for yourself&lt;/a&gt;!&lt;/p&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The introduction to &lt;a href='https://modelcontextprotocol.io/introduction' title=''&gt;Model Context Protocol&lt;/a&gt; starts out with:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;MCP is an open protocol that standardizes how applications provide context to LLMs. Think of MCP like a USB-C port for AI applications. Just as USB-C provides a standardized way to connect your devices to various peripherals and accessories, MCP provides a standardized way to connect AI models to different data sources and tools.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That paragraph, to me, is both comforting (&amp;ldquo;USB for LLM&amp;rdquo;? Cool! Got it!), and simultaneously vacuous (Um, but what do I actually &lt;em&gt;do&lt;/em&gt; with this?).&lt;/p&gt;

&lt;p&gt;I&amp;rsquo;ve been digging deeper and have come up with a few more analogies and observations that make sense to me. Perhaps one or more of these will help you.&lt;/p&gt;
&lt;h2 id='mcps-are-alexa-skills' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#mcps-are-alexa-skills' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;MCPs are Alexa Skills&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;You buy an Echo Dot and a Hue light. You plug them both in and connect them to your Wi-Fi. There is one more step you need to do: you need to install and enable the Philips Hue skill to connect the two.&lt;/p&gt;

&lt;p&gt;Now you might be using Siri or Google Assistant. Or you may want to connect a Ring Doorbell camera or Google Nest Thermostat. But the principle is the same, though the analogy is slightly stronger with a skill (which is a noun) as opposed to the action of pairing your Hue Bridge with Apple HomeKit (a verb).&lt;/p&gt;
&lt;h2 id='mcps-are-api-2-0' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#mcps-are-api-2-0' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;MCPs are API 2.0&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;HTTP 1.1 is simple. You send a request, you get a response. While there are cookies and sessions, it is pretty much stateless, and each connection can only serve one request at a time, which is inefficient. WebSockets and Server-Sent Events (SSE) mitigate this a bit.&lt;/p&gt;

&lt;p&gt;&lt;a href='https://en.wikipedia.org/wiki/HTTP/2' title=''&gt;HTTP 2.0&lt;/a&gt; introduces features like multiplexing and server push. When you visit a web page for the first time, your browser can now request all of the associated JavaScript, CSS, and images at once, and the server can respond in any order.&lt;/p&gt;

&lt;p&gt;APIs today are typically request/response. MCPs support multiplexing and server push.&lt;/p&gt;
&lt;h2 id='mcps-are-apis-with-introspection-reflection' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#mcps-are-apis-with-introspection-reflection' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;MCPs are APIs with Introspection/Reflection&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;With &lt;a href='https://learn.openapis.org/' title=''&gt;OpenAPI&lt;/a&gt;, requests are typically JSON, and responses are too. Many OpenAPI providers publish a separate &lt;a href='https://learn.openapis.org/specification/structure.html' title=''&gt;OpenAPI Description (OAD)&lt;/a&gt;, which contains a schema describing what requests are supported by that API.&lt;/p&gt;

&lt;p&gt;With MCP, requests are also JSON and responses are also JSON, but in addition, there are standard requests built into the protocol to obtain useful information, including what tools are provided, what arguments each tool expects, and a prose description of how each tool is expected to be used.&lt;/p&gt;
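
&lt;p&gt;Concretely, the introspection is just another JSON-RPC exchange. Here&amp;rsquo;s a rough sketch of a &lt;code&gt;tools/list&lt;/code&gt; round trip; the &lt;code&gt;get_weather&lt;/code&gt; tool is an invented example, not part of the protocol:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Client asks the server what it can do
{"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

// Server describes its tools, their arguments, and how they're meant to be used
{"jsonrpc": "2.0", "id": 1, "result": {"tools": [{
  "name": "get_weather",
  "description": "Get the current weather for a location",
  "inputSchema": {
    "type": "object",
    "properties": {"location": {"type": "string"}},
    "required": ["location"]
  }
}]}}&lt;/code&gt;&lt;/pre&gt;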

&lt;p&gt;As an aside, don&amp;rsquo;t automatically assume that you will get good results from &lt;a href='https://neon.com/blog/autogenerating-mcp-servers-openai-schemas' title=''&gt;auto-generating MCP Servers from OpenAPI schemas&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Effective MCP design means thinking about the workflows you want the agent to perform and potentially creating higher-level tools that encapsulate multiple API calls, rather than just exposing the raw building blocks of your entire API spec.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href='https://glama.ai/blog/2025-06-06-mcp-vs-api#five-fundamental-differences' title=''&gt;MCP vs API&lt;/a&gt; goes into this topic at greater depth.&lt;/p&gt;

&lt;p&gt;In many cases you will get better results by treating LLMs the way you would treat humans. If you have a CLI, consider using that as the starting point instead.&lt;/p&gt;
&lt;h2 id='mcps-are-not-serverless' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#mcps-are-not-serverless' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;MCPs are &lt;strong&gt;&lt;i&gt;not&lt;/i&gt;&lt;/strong&gt; serverless&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;a href='https://en.wikipedia.org/wiki/Serverless_computing' title=''&gt;Serverless&lt;/a&gt;, sometimes known as &lt;a href='https://en.wikipedia.org/wiki/Function_as_a_service' title=''&gt;FaaS&lt;/a&gt;, is a popular cloud computing model, where AWS Lambda is widely recognized as the archetype.&lt;/p&gt;

&lt;p&gt;MCP servers are not serverless; they have a well-defined and long-lived &lt;a href='https://modelcontextprotocol.io/specification/2025-03-26/basic/lifecycle' title=''&gt;lifecycle&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;&lt;svg aria-roledescription="sequence" role="graphics-document document" viewBox="-50 -10 482 651" style="max-width: 482px;" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.w3.org/2000/svg" width="100%" id="rm"&gt;&lt;rect class="rect" height="70" width="302" fill="rgb(200, 220, 250)" y="325" x="40"&gt;&lt;/rect&gt;&lt;g&gt;&lt;rect class="actor actor-bottom" ry="3" rx="3" name="Server" height="65" width="150" stroke="#666" fill="#eaeaea" y="565" x="232"&gt;&lt;/rect&gt;&lt;text class="actor actor-box" alignment-baseline="central" dominant-baseline="central" style="text-anchor: middle; font-size: 16px; font-weight: 400; font-family: inherit;" y="597.5" x="307"&gt;&lt;tspan dy="0" x="307"&gt;Server&lt;/tspan&gt;&lt;/text&gt;&lt;/g&gt;&lt;g&gt;&lt;rect class="actor actor-bottom" ry="3" rx="3" name="Client" height="65" width="150" stroke="#666" fill="#eaeaea" y="565" x="0"&gt;&lt;/rect&gt;&lt;text class="actor actor-box" alignment-baseline="central" dominant-baseline="central" style="text-anchor: middle; font-size: 16px; font-weight: 400; font-family: inherit;" y="597.5" x="75"&gt;&lt;tspan dy="0" x="75"&gt;Client&lt;/tspan&gt;&lt;/text&gt;&lt;/g&gt;&lt;g&gt;&lt;line name="Server" stroke="#999" stroke-width="0.5px" class="actor-line 200" y2="565" x2="307" y1="65" x1="307" id="actor10"&gt;&lt;/line&gt;&lt;g id="root-10"&gt;&lt;rect class="actor actor-top" ry="3" rx="3" name="Server" height="65" width="150" stroke="#666" fill="#eaeaea" y="0" x="232"&gt;&lt;/rect&gt;&lt;text class="actor actor-box" alignment-baseline="central" dominant-baseline="central" style="text-anchor: middle; font-size: 16px; font-weight: 400; font-family: inherit;" y="32.5" x="307"&gt;&lt;tspan dy="0" x="307"&gt;Server&lt;/tspan&gt;&lt;/text&gt;&lt;/g&gt;&lt;/g&gt;&lt;g&gt;&lt;line name="Client" stroke="#999" stroke-width="0.5px" class="actor-line 200" y2="565" x2="75" y1="65" x1="75" id="actor9"&gt;&lt;/line&gt;&lt;g id="root-9"&gt;&lt;rect class="actor actor-top" ry="3" 
rx="3" name="Client" height="65" width="150" stroke="#666" fill="#eaeaea" y="0" x="0"&gt;&lt;/rect&gt;&lt;text class="actor actor-box" alignment-baseline="central" dominant-baseline="central" style="text-anchor: middle; font-size: 16px; font-weight: 400; font-family: inherit;" y="32.5" x="75"&gt;&lt;tspan dy="0" x="75"&gt;Client&lt;/tspan&gt;&lt;/text&gt;&lt;/g&gt;&lt;/g&gt;&lt;style&gt;#rm{font-family:inherit;font-size:16px;fill:#333;}#rm .error-icon{fill:#552222;}#rm .error-text{fill:#552222;stroke:#552222;}#rm .edge-thickness-normal{stroke-width:1px;}#rm .edge-thickness-thick{stroke-width:3.5px;}#rm .edge-pattern-solid{stroke-dasharray:0;}#rm .edge-thickness-invisible{stroke-width:0;fill:none;}#rm .edge-pattern-dashed{stroke-dasharray:3;}#rm .edge-pattern-dotted{stroke-dasharray:2;}#rm .marker{fill:#333333;stroke:#333333;}#rm .marker.cross{stroke:#333333;}#rm svg{font-family:inherit;font-size:16px;}#rm p{margin:0;}#rm .actor{stroke:hsl(259.6261682243, 59.7765363128%, 87.9019607843%);fill:#ECECFF;}#rm text.actor&amp;gt;tspan{fill:black;stroke:none;}#rm .actor-line{stroke:hsl(259.6261682243, 59.7765363128%, 87.9019607843%);}#rm .messageLine0{stroke-width:1.5;stroke-dasharray:none;stroke:#333;}#rm .messageLine1{stroke-width:1.5;stroke-dasharray:2,2;stroke:#333;}#rm #arrowhead path{fill:#333;stroke:#333;}#rm .sequenceNumber{fill:white;}#rm #sequencenumber{fill:#333;}#rm #crosshead path{fill:#333;stroke:#333;}#rm .messageText{fill:#333;stroke:none;}#rm .labelBox{stroke:hsl(259.6261682243, 59.7765363128%, 87.9019607843%);fill:#ECECFF;}#rm .labelText,#rm .labelText&amp;gt;tspan{fill:black;stroke:none;}#rm .loopText,#rm .loopText&amp;gt;tspan{fill:black;stroke:none;}#rm .loopLine{stroke-width:2px;stroke-dasharray:2,2;stroke:hsl(259.6261682243, 59.7765363128%, 87.9019607843%);fill:hsl(259.6261682243, 59.7765363128%, 87.9019607843%);}#rm .note{stroke:#aaaa33;fill:#fff5ad;}#rm .noteText,#rm .noteText&amp;gt;tspan{fill:black;stroke:none;}#rm 
.activation0{fill:#f4f4f4;stroke:#666;}#rm .activation1{fill:#f4f4f4;stroke:#666;}#rm .activation2{fill:#f4f4f4;stroke:#666;}#rm .actorPopupMenu{position:absolute;}#rm .actorPopupMenuPanel{position:absolute;fill:#ECECFF;box-shadow:0px 8px 16px 0px rgba(0,0,0,0.2);filter:drop-shadow(3px 5px 2px rgb(0 0 0 / 0.4));}#rm .actor-man line{stroke:hsl(259.6261682243, 59.7765363128%, 87.9019607843%);fill:#ECECFF;}#rm .actor-man circle,#rm line{stroke:hsl(259.6261682243, 59.7765363128%, 87.9019607843%);fill:#ECECFF;stroke-width:2px;}#rm :root{--mermaid-font-family:inherit;}&lt;/style&gt;&lt;g&gt;&lt;/g&gt;&lt;defs&gt;&lt;symbol height="24" width="24" id="computer"&gt;&lt;path d="M2 2v13h20v-13h-20zm18 11h-16v-9h16v9zm-10.228 6l.466-1h3.524l.467 1h-4.457zm14.228 3h-24l2-6h2.104l-1.33 4h18.45l-1.297-4h2.073l2 6zm-5-10h-14v-7h14v7z" transform="scale(.5)"&gt;&lt;/path&gt;&lt;/symbol&gt;&lt;/defs&gt;&lt;defs&gt;&lt;symbol clip-rule="evenodd" fill-rule="evenodd" id="database"&gt;&lt;path d="M12.258.001l.256.004.255.005.253.008.251.01.249.012.247.015.246.016.242.019.241.02.239.023.236.024.233.027.231.028.229.031.225.032.223.034.22.036.217.038.214.04.211.041.208.043.205.045.201.046.198.048.194.05.191.051.187.053.183.054.18.056.175.057.172.059.168.06.163.061.16.063.155.064.15.066.074.033.073.033.071.034.07.034.069.035.068.035.067.035.066.035.064.036.064.036.062.036.06.036.06.037.058.037.058.037.055.038.055.038.053.038.052.038.051.039.05.039.048.039.047.039.045.04.044.04.043.04.041.04.04.041.039.041.037.041.036.041.034.041.033.042.032.042.03.042.029.042.027.042.026.043.024.043.023.043.021.043.02.043.018.044.017.043.015.044.013.044.012.044.011.045.009.044.007.045.006.045.004.045.002.045.001.045v17l-.001.045-.002.045-.004.045-.006.045-.007.045-.009.044-.011.045-.012.044-.013.044-.015.044-.017.043-.018.044-.02.043-.021.043-.023.043-.024.043-.026.043-.027.042-.029.042-.03.042-.032.042-.033.042-.034.041-.036.041-.037.041-.039.041-.04.041-.041.04-.043.04-.044.04-.045.04-.047.039-.048.039-.05.
039-.051.039-.052.038-.053.038-.055.038-.055.038-.058.037-.058.037-.06.037-.06.036-.062.036-.064.036-.064.036-.066.035-.067.035-.068.035-.069.035-.07.034-.071.034-.073.033-.074.033-.15.066-.155.064-.16.063-.163.061-.168.06-.172.059-.175.057-.18.056-.183.054-.187.053-.191.051-.194.05-.198.048-.201.046-.205.045-.208.043-.211.041-.214.04-.217.038-.22.036-.223.034-.225.032-.229.031-.231.028-.233.027-.236.024-.239.023-.241.02-.242.019-.246.016-.247.015-.249.012-.251.01-.253.008-.255.005-.256.004-.258.001-.258-.001-.256-.004-.255-.005-.253-.008-.251-.01-.249-.012-.247-.015-.245-.016-.243-.019-.241-.02-.238-.023-.236-.024-.234-.027-.231-.028-.228-.031-.226-.032-.223-.034-.22-.036-.217-.038-.214-.04-.211-.041-.208-.043-.204-.045-.201-.046-.198-.048-.195-.05-.19-.051-.187-.053-.184-.054-.179-.056-.176-.057-.172-.059-.167-.06-.164-.061-.159-.063-.155-.064-.151-.066-.074-.033-.072-.033-.072-.034-.07-.034-.069-.035-.068-.035-.067-.035-.066-.035-.064-.036-.063-.036-.062-.036-.061-.036-.06-.037-.058-.037-.057-.037-.056-.038-.055-.038-.053-.038-.052-.038-.051-.039-.049-.039-.049-.039-.046-.039-.046-.04-.044-.04-.043-.04-.041-.04-.04-.041-.039-.041-.037-.041-.036-.041-.034-.041-.033-.042-.032-.042-.03-.042-.029-.042-.027-.042-.026-.043-.024-.043-.023-.043-.021-.043-.02-.043-.018-.044-.017-.043-.015-.044-.013-.044-.012-.044-.011-.045-.009-.044-.007-.045-.006-.045-.004-.045-.002-.045-.001-.045v-17l.001-.045.002-.045.004-.045.006-.045.007-.045.009-.044.011-.045.012-.044.013-.044.015-.044.017-.043.018-.044.02-.043.021-.043.023-.043.024-.043.026-.043.027-.042.029-.042.03-.042.032-.042.033-.042.034-.041.036-.041.037-.041.039-.041.04-.041.041-.04.043-.04.044-.04.046-.04.046-.039.049-.039.049-.039.051-.039.052-.038.053-.038.055-.038.056-.038.057-.037.058-.037.06-.037.061-.036.062-.036.063-.036.064-.036.066-.035.067-.035.068-.035.069-.035.07-.034.072-.034.072-.033.074-.033.151-.066.155-.064.159-.063.164-.061.167-.06.172-.059.176-.057.179-.056.184-.054.187-.053.19-.051.195-.05.198-.048.201-.
046.204-.045.208-.043.211-.041.214-.04.217-.038.22-.036.223-.034.226-.032.228-.031.231-.028.234-.027.236-.024.238-.023.241-.02.243-.019.245-.016.247-.015.249-.012.251-.01.253-.008.255-.005.256-.004.258-.001.258.001zm-9.258 20.499v.01l.001.021.003.021.004.022.005.021.006.022.007.022.009.023.01.022.011.023.012.023.013.023.015.023.016.024.017.023.018.024.019.024.021.024.022.025.023.024.024.025.052.049.056.05.061.051.066.051.07.051.075.051.079.052.084.052.088.052.092.052.097.052.102.051.105.052.11.052.114.051.119.051.123.051.127.05.131.05.135.05.139.048.144.049.147.047.152.047.155.047.16.045.163.045.167.043.171.043.176.041.178.041.183.039.187.039.19.037.194.035.197.035.202.033.204.031.209.03.212.029.216.027.219.025.222.024.226.021.23.02.233.018.236.016.24.015.243.012.246.01.249.008.253.005.256.004.259.001.26-.001.257-.004.254-.005.25-.008.247-.011.244-.012.241-.014.237-.016.233-.018.231-.021.226-.021.224-.024.22-.026.216-.027.212-.028.21-.031.205-.031.202-.034.198-.034.194-.036.191-.037.187-.039.183-.04.179-.04.175-.042.172-.043.168-.044.163-.045.16-.046.155-.046.152-.047.148-.048.143-.049.139-.049.136-.05.131-.05.126-.05.123-.051.118-.052.114-.051.11-.052.106-.052.101-.052.096-.052.092-.052.088-.053.083-.051.079-.052.074-.052.07-.051.065-.051.06-.051.056-.05.051-.05.023-.024.023-.025.021-.024.02-.024.019-.024.018-.024.017-.024.015-.023.014-.024.013-.023.012-.023.01-.023.01-.022.008-.022.006-.022.006-.022.004-.022.004-.021.001-.021.001-.021v-4.127l-.077.055-.08.053-.083.054-.085.053-.087.052-.09.052-.093.051-.095.05-.097.05-.1.049-.102.049-.105.048-.106.047-.109.047-.111.046-.114.045-.115.045-.118.044-.12.043-.122.042-.124.042-.126.041-.128.04-.13.04-.132.038-.134.038-.135.037-.138.037-.139.035-.142.035-.143.034-.144.033-.147.032-.148.031-.15.03-.151.03-.153.029-.154.027-.156.027-.158.026-.159.025-.161.024-.162.023-.163.022-.165.021-.166.02-.167.019-.169.018-.169.017-.171.016-.173.015-.173.014-.175.013-.175.012-.177.011-.178.01-.179.008-.179.008-.181.006-.182.005-.182.0
04-.184.003-.184.002h-.37l-.184-.002-.184-.003-.182-.004-.182-.005-.181-.006-.179-.008-.179-.008-.178-.01-.176-.011-.176-.012-.175-.013-.173-.014-.172-.015-.171-.016-.17-.017-.169-.018-.167-.019-.166-.02-.165-.021-.163-.022-.162-.023-.161-.024-.159-.025-.157-.026-.156-.027-.155-.027-.153-.029-.151-.03-.15-.03-.148-.031-.146-.032-.145-.033-.143-.034-.141-.035-.14-.035-.137-.037-.136-.037-.134-.038-.132-.038-.13-.04-.128-.04-.126-.041-.124-.042-.122-.042-.12-.044-.117-.043-.116-.045-.113-.045-.112-.046-.109-.047-.106-.047-.105-.048-.102-.049-.1-.049-.097-.05-.095-.05-.093-.052-.09-.051-.087-.052-.085-.053-.083-.054-.08-.054-.077-.054v4.127zm0-5.654v.011l.001.021.003.021.004.021.005.022.006.022.007.022.009.022.01.022.011.023.012.023.013.023.015.024.016.023.017.024.018.024.019.024.021.024.022.024.023.025.024.024.052.05.056.05.061.05.066.051.07.051.075.052.079.051.084.052.088.052.092.052.097.052.102.052.105.052.11.051.114.051.119.052.123.05.127.051.131.05.135.049.139.049.144.048.147.048.152.047.155.046.16.045.163.045.167.044.171.042.176.042.178.04.183.04.187.038.19.037.194.036.197.034.202.033.204.032.209.03.212.028.216.027.219.025.222.024.226.022.23.02.233.018.236.016.24.014.243.012.246.01.249.008.253.006.256.003.259.001.26-.001.257-.003.254-.006.25-.008.247-.01.244-.012.241-.015.237-.016.233-.018.231-.02.226-.022.224-.024.22-.025.216-.027.212-.029.21-.03.205-.032.202-.033.198-.035.194-.036.191-.037.187-.039.183-.039.179-.041.175-.042.172-.043.168-.044.163-.045.16-.045.155-.047.152-.047.148-.048.143-.048.139-.05.136-.049.131-.05.126-.051.123-.051.118-.051.114-.052.11-.052.106-.052.101-.052.096-.052.092-.052.088-.052.083-.052.079-.052.074-.051.07-.052.065-.051.06-.05.056-.051.051-.049.023-.025.023-.024.021-.025.02-.024.019-.024.018-.024.017-.024.015-.023.014-.023.013-.024.012-.022.01-.023.01-.023.008-.022.006-.022.006-.022.004-.021.004-.022.001-.021.001-.021v-4.139l-.077.054-.08.054-.083.054-.085.052-.087.053-.09.051-.093.051-.095.051-.097.05-.1.049-.102.049-.105.048-.106
.047-.109.047-.111.046-.114.045-.115.044-.118.044-.12.044-.122.042-.124.042-.126.041-.128.04-.13.039-.132.039-.134.038-.135.037-.138.036-.139.036-.142.035-.143.033-.144.033-.147.033-.148.031-.15.03-.151.03-.153.028-.154.028-.156.027-.158.026-.159.025-.161.024-.162.023-.163.022-.165.021-.166.02-.167.019-.169.018-.169.017-.171.016-.173.015-.173.014-.175.013-.175.012-.177.011-.178.009-.179.009-.179.007-.181.007-.182.005-.182.004-.184.003-.184.002h-.37l-.184-.002-.184-.003-.182-.004-.182-.005-.181-.007-.179-.007-.179-.009-.178-.009-.176-.011-.176-.012-.175-.013-.173-.014-.172-.015-.171-.016-.17-.017-.169-.018-.167-.019-.166-.02-.165-.021-.163-.022-.162-.023-.161-.024-.159-.025-.157-.026-.156-.027-.155-.028-.153-.028-.151-.03-.15-.03-.148-.031-.146-.033-.145-.033-.143-.033-.141-.035-.14-.036-.137-.036-.136-.037-.134-.038-.132-.039-.13-.039-.128-.04-.126-.041-.124-.042-.122-.043-.12-.043-.117-.044-.116-.044-.113-.046-.112-.046-.109-.046-.106-.047-.105-.048-.102-.049-.1-.049-.097-.05-.095-.051-.093-.051-.09-.051-.087-.053-.085-.052-.083-.054-.08-.054-.077-.054v4.139zm0-5.666v.011l.001.02.003.022.004.021.005.022.006.021.007.022.009.023.01.022.011.023.012.023.013.023.015.023.016.024.017.024.018.023.019.024.021.025.022.024.023.024.024.025.052.05.056.05.061.05.066.051.07.051.075.052.079.051.084.052.088.052.092.052.097.052.102.052.105.051.11.052.114.051.119.051.123.051.127.05.131.05.135.05.139.049.144.048.147.048.152.047.155.046.16.045.163.045.167.043.171.043.176.042.178.04.183.04.187.038.19.037.194.036.197.034.202.033.204.032.209.03.212.028.216.027.219.025.222.024.226.021.23.02.233.018.236.017.24.014.243.012.246.01.249.008.253.006.256.003.259.001.26-.001.257-.003.254-.006.25-.008.247-.01.244-.013.241-.014.237-.016.233-.018.231-.02.226-.022.224-.024.22-.025.216-.027.212-.029.21-.03.205-.032.202-.033.198-.035.194-.036.191-.037.187-.039.183-.039.179-.041.175-.042.172-.043.168-.044.163-.045.16-.045.155-.047.152-.047.148-.048.143-.049.139-.049.136-.049.131-.051.126-.05.123-.051.118
-.052.114-.051.11-.052.106-.052.101-.052.096-.052.092-.052.088-.052.083-.052.079-.052.074-.052.07-.051.065-.051.06-.051.056-.05.051-.049.023-.025.023-.025.021-.024.02-.024.019-.024.018-.024.017-.024.015-.023.014-.024.013-.023.012-.023.01-.022.01-.023.008-.022.006-.022.006-.022.004-.022.004-.021.001-.021.001-.021v-4.153l-.077.054-.08.054-.083.053-.085.053-.087.053-.09.051-.093.051-.095.051-.097.05-.1.049-.102.048-.105.048-.106.048-.109.046-.111.046-.114.046-.115.044-.118.044-.12.043-.122.043-.124.042-.126.041-.128.04-.13.039-.132.039-.134.038-.135.037-.138.036-.139.036-.142.034-.143.034-.144.033-.147.032-.148.032-.15.03-.151.03-.153.028-.154.028-.156.027-.158.026-.159.024-.161.024-.162.023-.163.023-.165.021-.166.02-.167.019-.169.018-.169.017-.171.016-.173.015-.173.014-.175.013-.175.012-.177.01-.178.01-.179.009-.179.007-.181.006-.182.006-.182.004-.184.003-.184.001-.185.001-.185-.001-.184-.001-.184-.003-.182-.004-.182-.006-.181-.006-.179-.007-.179-.009-.178-.01-.176-.01-.176-.012-.175-.013-.173-.014-.172-.015-.171-.016-.17-.017-.169-.018-.167-.019-.166-.02-.165-.021-.163-.023-.162-.023-.161-.024-.159-.024-.157-.026-.156-.027-.155-.028-.153-.028-.151-.03-.15-.03-.148-.032-.146-.032-.145-.033-.143-.034-.141-.034-.14-.036-.137-.036-.136-.037-.134-.038-.132-.039-.13-.039-.128-.041-.126-.041-.124-.041-.122-.043-.12-.043-.117-.044-.116-.044-.113-.046-.112-.046-.109-.046-.106-.048-.105-.048-.102-.048-.1-.05-.097-.049-.095-.051-.093-.051-.09-.052-.087-.052-.085-.053-.083-.053-.08-.054-.077-.054v4.153zm8.74-8.179l-.257.004-.254.005-.25.008-.247.011-.244.012-.241.014-.237.016-.233.018-.231.021-.226.022-.224.023-.22.026-.216.027-.212.028-.21.031-.205.032-.202.033-.198.034-.194.036-.191.038-.187.038-.183.04-.179.041-.175.042-.172.043-.168.043-.163.045-.16.046-.155.046-.152.048-.148.048-.143.048-.139.049-.136.05-.131.05-.126.051-.123.051-.118.051-.114.052-.11.052-.106.052-.101.052-.096.052-.092.052-.088.052-.083.052-.079.052-.074.051-.07.052-.065.051-.06.05-.056.05-.051.05-.023.025
-.023.024-.021.024-.02.025-.019.024-.018.024-.017.023-.015.024-.014.023-.013.023-.012.023-.01.023-.01.022-.008.022-.006.023-.006.021-.004.022-.004.021-.001.021-.001.021.001.021.001.021.004.021.004.022.006.021.006.023.008.022.01.022.01.023.012.023.013.023.014.023.015.024.017.023.018.024.019.024.02.025.021.024.023.024.023.025.051.05.056.05.06.05.065.051.07.052.074.051.079.052.083.052.088.052.092.052.096.052.101.052.106.052.11.052.114.052.118.051.123.051.126.051.131.05.136.05.139.049.143.048.148.048.152.048.155.046.16.046.163.045.168.043.172.043.175.042.179.041.183.04.187.038.191.038.194.036.198.034.202.033.205.032.21.031.212.028.216.027.22.026.224.023.226.022.231.021.233.018.237.016.241.014.244.012.247.011.25.008.254.005.257.004.26.001.26-.001.257-.004.254-.005.25-.008.247-.011.244-.012.241-.014.237-.016.233-.018.231-.021.226-.022.224-.023.22-.026.216-.027.212-.028.21-.031.205-.032.202-.033.198-.034.194-.036.191-.038.187-.038.183-.04.179-.041.175-.042.172-.043.168-.043.163-.045.16-.046.155-.046.152-.048.148-.048.143-.048.139-.049.136-.05.131-.05.126-.051.123-.051.118-.051.114-.052.11-.052.106-.052.101-.052.096-.052.092-.052.088-.052.083-.052.079-.052.074-.051.07-.052.065-.051.06-.05.056-.05.051-.05.023-.025.023-.024.021-.024.02-.025.019-.024.018-.024.017-.023.015-.024.014-.023.013-.023.012-.023.01-.023.01-.022.008-.022.006-.023.006-.021.004-.022.004-.021.001-.021.001-.021-.001-.021-.001-.021-.004-.021-.004-.022-.006-.021-.006-.023-.008-.022-.01-.022-.01-.023-.012-.023-.013-.023-.014-.023-.015-.024-.017-.023-.018-.024-.019-.024-.02-.025-.021-.024-.023-.024-.023-.025-.051-.05-.056-.05-.06-.05-.065-.051-.07-.052-.074-.051-.079-.052-.083-.052-.088-.052-.092-.052-.096-.052-.101-.052-.106-.052-.11-.052-.114-.052-.118-.051-.123-.051-.126-.051-.131-.05-.136-.05-.139-.049-.143-.048-.148-.048-.152-.048-.155-.046-.16-.046-.163-.045-.168-.043-.172-.043-.175-.042-.179-.041-.183-.04-.187-.038-.191-.038-.194-.036-.198-.034-.202-.033-.205-.032-.21-.031-.212-.028-.216-.027-.22-.026-.2
24-.023-.226-.022-.231-.021-.233-.018-.237-.016-.241-.014-.244-.012-.247-.011-.25-.008-.254-.005-.257-.004-.26-.001-.26.001z" transform="scale(.5)"&gt;&lt;/path&gt;&lt;/symbol&gt;&lt;/defs&gt;&lt;defs&gt;&lt;symbol height="24" width="24" id="clock"&gt;&lt;path d="M12 2c5.514 0 10 4.486 10 10s-4.486 10-10 10-10-4.486-10-10 4.486-10 10-10zm0-2c-6.627 0-12 5.373-12 12s5.373 12 12 12 12-5.373 12-12-5.373-12-12-12zm5.848 12.459c.202.038.202.333.001.372-1.907.361-6.045 1.111-6.547 1.111-.719 0-1.301-.582-1.301-1.301 0-.512.77-5.447 1.125-7.445.034-.192.312-.181.343.014l.985 6.238 5.394 1.011z" transform="scale(.5)"&gt;&lt;/path&gt;&lt;/symbol&gt;&lt;/defs&gt;&lt;defs&gt;&lt;marker orient="auto-start-reverse" markerHeight="12" markerWidth="12" markerUnits="userSpaceOnUse" refY="5" refX="7.9" id="arrowhead"&gt;&lt;path d="M -1 0 L 10 5 L 0 10 z"&gt;&lt;/path&gt;&lt;/marker&gt;&lt;/defs&gt;&lt;defs&gt;&lt;marker refY="4.5" refX="4" orient="auto" markerHeight="8" markerWidth="15" id="crosshead"&gt;&lt;path d="M 1,2 L 6,7 M 6,2 L 1,7" stroke-width="1pt" style="stroke-dasharray: 0px, 0px;" stroke="#000000" fill="none"&gt;&lt;/path&gt;&lt;/marker&gt;&lt;/defs&gt;&lt;defs&gt;&lt;marker orient="auto" markerHeight="28" markerWidth="20" refY="7" refX="15.5" id="filled-head"&gt;&lt;path d="M 18,7 L9,13 L14,7 L9,1 Z"&gt;&lt;/path&gt;&lt;/marker&gt;&lt;/defs&gt;&lt;defs&gt;&lt;marker orient="auto" markerHeight="40" markerWidth="60" refY="15" refX="15" id="sequencenumber"&gt;&lt;circle r="6" cy="15" cx="15"&gt;&lt;/circle&gt;&lt;/marker&gt;&lt;/defs&gt;&lt;g&gt;&lt;rect class="note" height="40" width="282" stroke="#666" fill="#EDF2AE" y="75" x="50"&gt;&lt;/rect&gt;&lt;text dy="1em" class="noteText" style="font-family: inherit; font-size: 16px; font-weight: 400;" alignment-baseline="middle" dominant-baseline="middle" text-anchor="middle" y="80" x="191"&gt;&lt;tspan x="191"&gt;Initialization Phase&lt;/tspan&gt;&lt;/text&gt;&lt;/g&gt;&lt;g&gt;&lt;rect class="activation0" height="380" 
width="10" stroke="#666" fill="#EDF2AE" y="115" x="70"&gt;&lt;/rect&gt;&lt;/g&gt;&lt;g&gt;&lt;rect class="activation0" height="328" width="10" stroke="#666" fill="#EDF2AE" y="167" x="302"&gt;&lt;/rect&gt;&lt;/g&gt;&lt;g&gt;&lt;rect class="note" height="40" width="282" stroke="#666" fill="#EDF2AE" y="275" x="50"&gt;&lt;/rect&gt;&lt;text dy="1em" class="noteText" style="font-family: inherit; font-size: 16px; font-weight: 400;" alignment-baseline="middle" dominant-baseline="middle" text-anchor="middle" y="280" x="191"&gt;&lt;tspan x="191"&gt;Operation Phase&lt;/tspan&gt;&lt;/text&gt;&lt;/g&gt;&lt;g&gt;&lt;rect class="note" height="40" width="282" stroke="#666" fill="#EDF2AE" y="345" x="50"&gt;&lt;/rect&gt;&lt;text dy="1em" class="noteText" style="font-family: inherit; font-size: 16px; font-weight: 400;" alignment-baseline="middle" dominant-baseline="middle" text-anchor="middle" y="350" x="191"&gt;&lt;tspan x="191"&gt;Normal protocol operations&lt;/tspan&gt;&lt;/text&gt;&lt;/g&gt;&lt;g&gt;&lt;rect class="note" height="40" width="282" stroke="#666" fill="#EDF2AE" y="405" x="50"&gt;&lt;/rect&gt;&lt;text dy="1em" class="noteText" style="font-family: inherit; font-size: 16px; font-weight: 400;" alignment-baseline="middle" dominant-baseline="middle" text-anchor="middle" y="410" x="191"&gt;&lt;tspan x="191"&gt;Shutdown&lt;/tspan&gt;&lt;/text&gt;&lt;/g&gt;&lt;g&gt;&lt;rect class="note" height="40" width="282" stroke="#666" fill="#EDF2AE" y="505" x="50"&gt;&lt;/rect&gt;&lt;text dy="1em" class="noteText" style="font-family: inherit; font-size: 16px; font-weight: 400;" alignment-baseline="middle" dominant-baseline="middle" text-anchor="middle" y="510" x="191"&gt;&lt;tspan x="191"&gt;Connection closed&lt;/tspan&gt;&lt;/text&gt;&lt;/g&gt;&lt;text dy="1em" class="messageText" style="font-family: inherit; font-size: 16px; font-weight: 400;" alignment-baseline="middle" dominant-baseline="middle" text-anchor="middle" y="130" x="190"&gt;initialize request&lt;/text&gt;&lt;line 
marker-end="url(#arrowhead)" style="fill: none;" stroke="none" stroke-width="2" class="messageLine0" y2="165" x2="299" y1="165" x1="80"&gt;&lt;/line&gt;&lt;text dy="1em" class="messageText" style="font-family: inherit; font-size: 16px; font-weight: 400;" alignment-baseline="middle" dominant-baseline="middle" text-anchor="middle" y="180" x="193"&gt;initialize response&lt;/text&gt;&lt;line marker-end="url(#arrowhead)" stroke="none" stroke-width="2" class="messageLine1" style="stroke-dasharray: 3px, 3px; fill: none;" y2="215" x2="83" y1="215" x1="302"&gt;&lt;/line&gt;&lt;text dy="1em" class="messageText" style="font-family: inherit; font-size: 16px; font-weight: 400;" alignment-baseline="middle" dominant-baseline="middle" text-anchor="middle" y="230" x="190"&gt;initialized notification&lt;/text&gt;&lt;line marker-end="url(#filled-head)" stroke="none" stroke-width="2" class="messageLine1" style="stroke-dasharray: 3px, 3px; fill: none;" y2="265" x2="299" y1="265" x1="80"&gt;&lt;/line&gt;&lt;text dy="1em" class="messageText" style="font-family: inherit; font-size: 16px; font-weight: 400;" alignment-baseline="middle" dominant-baseline="middle" text-anchor="middle" y="460" x="190"&gt;Disconnect&lt;/text&gt;&lt;line marker-end="url(#filled-head)" stroke="none" stroke-width="2" class="messageLine1" style="stroke-dasharray: 3px, 3px; fill: none;" y2="495" x2="299" y1="495" x1="80"&gt;&lt;/line&gt;&lt;/svg&gt;&lt;/p&gt;
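&lt;p&gt;The lifecycle in the diagram boils down to a handful of JSON-RPC messages. A minimal sketch in Python of the two client-side messages from the initialization phase (the &lt;code&gt;protocolVersion&lt;/code&gt; string and client name here are illustrative):&lt;/p&gt;

```python
import json

def initialize_request(req_id):
    # Client to server: open the session and advertise capabilities.
    return json.dumps({
        "jsonrpc": "2.0",
        "id": req_id,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-03-26",  # illustrative version string
            "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "0.1"},
        },
    })

def initialized_notification():
    # Client to server, after the initialize response arrives: a
    # notification (no id), signalling that normal operations may begin.
    return json.dumps({"jsonrpc": "2.0", "method": "notifications/initialized"})
```

&lt;p&gt;Over the stdio transport each message is a line of JSON; shutdown, as the diagram shows, is simply closing the connection.&lt;/p&gt;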
&lt;figure class="post-cta"&gt;
  &lt;figcaption&gt;
    &lt;h1&gt;You can play with this right now.&lt;/h1&gt;
    &lt;p&gt;MCPs are barely six months old, but we are keeping up with the latest.&lt;/p&gt;
      &lt;a class="btn btn-lg" href="https://fly.io/docs/mcp"&gt;
        Try launching your MCP server on Fly.io today &lt;span class='opacity-50'&gt;→&lt;/span&gt;
      &lt;/a&gt;
  &lt;/figcaption&gt;
  &lt;div class="image-container"&gt;
    &lt;img src="/static/images/cta-kitty.webp" srcset="/static/images/[email protected] 2x" alt=""&gt;
  &lt;/div&gt;
&lt;/figure&gt;

&lt;h2 id='mcps-are-not-inherently-secure-or-private' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#mcps-are-not-inherently-secure-or-private' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;MCPs are &lt;strong&gt;&lt;i&gt;not&lt;/i&gt;&lt;/strong&gt; Inherently Secure or Private&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Here I am not talking about &lt;a href='https://simonwillison.net/2025/Apr/9/mcp-prompt-injection/' title=''&gt;prompt injection&lt;/a&gt; or &lt;a href='https://simonwillison.net/2025/May/26/github-mcp-exploited/' title=''&gt;exploitable abilities&lt;/a&gt;, though those are real problems too.&lt;/p&gt;

&lt;p&gt;I&amp;rsquo;m talking about something more fundamental and basic. Let&amp;rsquo;s take a look at the very same &lt;a href='https://github.com/github/github-mcp-server' title=''&gt;GitHub MCP&lt;/a&gt; featured in the above two descriptions of an exploitable exposure. The current process for installing a stdio MCP into Claude involves placing secrets in plain text in a well-defined location. And the process for installing the &lt;em&gt;next&lt;/em&gt; MCP server is to download a program from a third party and run that tool in a way that has access to this very file.&lt;/p&gt;

&lt;p&gt;Addressing MCP security requires a holistic approach, but one key component of that strategy is the ability to run an MCP server on a remote machine that can only be accessed by you, and only after you present a revocable bearer token. That remote machine may require access to secrets (like your GitHub token), but those secrets are not something either your LLM client or other MCP servers will be able to access directly.&lt;/p&gt;
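&lt;p&gt;The gatekeeping half of that is small. A minimal sketch using only Python&amp;rsquo;s standard library (the header format is standard HTTP bearer auth; the function and parameter names are illustrative, not any particular platform&amp;rsquo;s API):&lt;/p&gt;

```python
import hmac

def authorized(auth_header, expected_token):
    # Reject anything that doesn't present the revocable bearer token.
    if not expected_token or not auth_header:
        return False
    if not auth_header.startswith("Bearer "):
        return False
    presented = auth_header[len("Bearer "):]
    # Constant-time comparison avoids leaking the token via timing.
    return hmac.compare_digest(presented, expected_token)
```

&lt;p&gt;Revoking access is then just rotating the token on the remote machine; the GitHub token it guards never leaves that machine.&lt;/p&gt;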
&lt;h2 id='mcps-should-be-considered-family' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#mcps-should-be-considered-family' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;MCPs should be considered family&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Recapping: &lt;a href='https://www.usa.philips.com/' title=''&gt;Philips&lt;/a&gt; has an &lt;a href='https://developers.meethue.com/' title=''&gt;API and SDK&lt;/a&gt; for Hue that is used by perhaps thousands, and has an &lt;a href='https://www.amazon.com/Philips-Hue/dp/B01LWAGUG7' title=''&gt;Alexa Skill&lt;/a&gt; that is used by untold millions. Of course, somebody already built a &lt;a href='https://github.com/ThomasRohde/hue-mcp' title=''&gt;Philips Hue MCP Server&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;LLMs are brains. MCPs give LLMs eyes, hands, and legs. The end result is a robot. Most MCPs today are brainless: they are merely eyes, hands, and legs. The results are appliances. I buy and install a dishwasher. You do too. Both of our dishwashers perform the same basic function. As do any Hue lights we may buy.&lt;/p&gt;

&lt;p&gt;In The Jetsons, &lt;a href='https://thejetsons.fandom.com/wiki/Rosey' title=''&gt;Rosie&lt;/a&gt; is a maid and housekeeper who is also a member of the family. She is good friends with Jane and takes the role of a surrogate aunt towards Elroy and Judy. Let&amp;rsquo;s start there and go further.&lt;/p&gt;

&lt;p&gt;A person with an LLM can build things. Imagine having a 3D printer that makes robots or agents that act on your behalf. You may want an agent to watch for trends in your business, keep track of what events are happening in your area, screen your text messages, or act as a DJ when you get together with friends.&lt;/p&gt;

&lt;p&gt;You may share the MCP servers you create with others to use as starting points, but they undoubtedly will adapt them and make them their own.&lt;/p&gt;
&lt;h2 id='closing-thoughts' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#closing-thoughts' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Closing Thoughts&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Don&amp;rsquo;t get me wrong. I am not saying there won&amp;rsquo;t be MCPs that are mere appliances—there undoubtedly will be plenty of those. But not all MCPs will be appliances; many will be agents.&lt;/p&gt;

&lt;p&gt;Today, most people exploring LLMs are trying to add LLMs to things they already have—for example, IDEs. MCPs flip this. Instead of adding LLMs to something you already have, you add something you already have to an LLM.&lt;/p&gt;

&lt;p&gt;&lt;a href='https://desktopcommander.app/' title=''&gt;Desktop Commander MCP&lt;/a&gt; is an example I&amp;rsquo;m particularly fond of that does exactly this. It also happens to be a prime example of a tool you want to give root access to, then lock it in a secure box and run someplace other than your laptop. &lt;a href='https://fly.io/docs/mcp/examples/#desktop-commander' title=''&gt;Give it a try&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Microsoft is actively working on &lt;a href='https://developer.microsoft.com/en-us/windows/agentic/' title=''&gt;Agentic Windows&lt;/a&gt;. Your smartphone is the obvious next frontier. Today, you have an app store and a web browser. MCP servers have the potential to make both obsolete—and this could happen sooner than you think.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>My AI Skeptic Friends Are All Nuts</title>
    <link rel="alternate" href="https://fly.io/blog/youre-all-nuts/"/>
    <id>https://fly.io/blog/youre-all-nuts/</id>
    <published>2025-06-02T00:00:00+00:00</published>
    <updated>2025-06-10T21:38:22+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/youre-all-nuts/assets/whoah.png"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;A heartfelt provocation about AI-assisted programming.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Tech execs are mandating LLM adoption. That&amp;rsquo;s bad strategy. But I get where they&amp;rsquo;re coming from.&lt;/p&gt;

&lt;p&gt;Some of the smartest people I know share a bone-deep belief that AI is a fad — the next iteration of NFT mania. I&amp;rsquo;ve been reluctant to push back on them, because, well, they&amp;rsquo;re smarter than me. But their arguments are unserious, and worth confronting. Extraordinarily talented people are doing work that LLMs already do better, out of spite.&lt;/p&gt;

&lt;p&gt;All progress on LLMs could halt today, and LLMs would remain the 2nd most important thing to happen over the course of my career.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;&lt;strong class="font-semibold text-navy-950"&gt;Important caveat&lt;/strong&gt;: I’m discussing only the implications of LLMs for software development. For art, music, and writing? I got nothing. I’m inclined to believe the skeptics in those fields. I just don’t believe them about mine.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Bona fides: I&amp;rsquo;ve been shipping software since the mid-1990s. I started out in boxed, shrink-wrap C code.  Survived an ill-advised &lt;a href='https://www.amazon.com/Modern-Design-Generic-Programming-Patterns/dp/0201704315' title=''&gt;Alexandrescu&lt;/a&gt; C++ phase. Lots of Ruby and Python tooling.  Some kernel work. A whole lot of server-side C, Go, and Rust. However you define &amp;ldquo;serious developer&amp;rdquo;, I qualify. Even if only on one of your lower tiers.&lt;/p&gt;
&lt;h3 id='level-setting' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#level-setting' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;level setting&lt;/span&gt;&lt;/h3&gt;&lt;div class="right-sidenote"&gt;&lt;p&gt;† (or, God forbid, 2 years ago with Copilot)&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;First, we need to get on the same page. If you were trying and failing to use an LLM for code 6 months ago †, you’re not doing what most serious LLM-assisted coders are doing.&lt;/p&gt;

&lt;p&gt;People coding with LLMs today use agents. Agents get to poke around your codebase on their own. They author files directly. They run tools. They compile code, run tests, and iterate on the results. They also:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;pull in arbitrary code from the tree, or from other trees online, into their context windows,
&lt;/li&gt;&lt;li&gt;run standard Unix tools to navigate the tree and extract information,
&lt;/li&gt;&lt;li&gt;interact with Git,
&lt;/li&gt;&lt;li&gt;run existing tooling, like linters, formatters, and model checkers, and
&lt;/li&gt;&lt;li&gt;make essentially arbitrary tool calls (that you set up) through MCP.
&lt;/li&gt;&lt;/ul&gt;
&lt;div class="callout"&gt;&lt;p&gt;The code in an agent that actually “does stuff” with code is not, itself, AI. This should reassure you. It’s surprisingly simple systems code, wired to ground truth about programming in the same way a Makefile is. You could write an effective coding agent in a weekend. Its strengths would have more to do with how you think about and structure builds and linting and test harnesses than with how advanced o3 or Sonnet have become.&lt;/p&gt;
&lt;/div&gt;
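&lt;p&gt;The weekend-project claim is easy to see in outline. A toy version of that loop, with the model call and edit application stubbed out as parameters (&lt;code&gt;model_call&lt;/code&gt; and &lt;code&gt;apply_edits&lt;/code&gt; are hypothetical stand-ins for your LLM API and file writer; the shell-out to the build is the &amp;ldquo;ground truth&amp;rdquo; part):&lt;/p&gt;

```python
import subprocess

def run_checks():
    # Ground truth: the build and tests, not the model, decide success.
    proc = subprocess.run(["make", "test"], capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr

def agent_loop(task, model_call, apply_edits, checks=run_checks, max_turns=10):
    feedback = ""
    for _ in range(max_turns):
        edits = model_call(task, feedback)  # ask the model for file edits
        apply_edits(edits)                  # write them into the tree
        ok, feedback = checks()             # compile, lint, run the tests
        if ok:
            return True                     # the checks pass: done
    return False                            # out of turns; a human takes over
```

&lt;p&gt;Everything interesting lives in &lt;code&gt;checks&lt;/code&gt;: the stricter your build, linters, and test harness, the more the loop can correct on its own.&lt;/p&gt;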
&lt;p&gt;If you’re making requests on a ChatGPT page and then pasting the resulting (broken) code into your editor, you’re not doing what the AI boosters are doing. No wonder you&amp;rsquo;re talking past each other.&lt;/p&gt;
&lt;h3 id='the-positive-case' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-positive-case' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;the positive case&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;img alt="four quadrants of tedium and importance" src="/blog/youre-all-nuts/assets/code-quad.png?2/3&amp;amp;center" /&gt;&lt;/p&gt;

&lt;p&gt;LLMs can write a large fraction of all the tedious code you’ll ever need to write. And most code on most projects is tedious. LLMs drastically reduce the number of things you’ll ever need to Google. They look things up themselves. Most importantly, they don’t get tired; they’re immune to inertia.&lt;/p&gt;

&lt;p&gt;Think of anything you wanted to build but didn&amp;rsquo;t. You tried to home in on some first steps. If you&amp;rsquo;d been in the limerent phase of a new programming language, you&amp;rsquo;d have started writing. But you weren&amp;rsquo;t, so you put it off, for a day, a year, or your whole career.&lt;/p&gt;

&lt;p&gt;I can feel my blood pressure rising thinking of all the bookkeeping and Googling and dependency drama of a new project. An LLM can be instructed to just figure all that shit out. Often, it will drop you precisely at that golden moment where shit almost works, and development means tweaking code and immediately seeing things work better. That dopamine hit is why I code.&lt;/p&gt;

&lt;p&gt;There&amp;rsquo;s a downside. Sometimes, gnarly stuff needs doing. But you don&amp;rsquo;t wanna do it. So you refactor unit tests, soothing yourself with the lie that you&amp;rsquo;re doing real work. But an LLM can be told to go refactor all your unit tests. An agent can occupy itself for hours putzing with your tests in a VM and come back later with a PR. If you listen to me, you’ll know that. You&amp;rsquo;ll feel worse yak-shaving. You&amp;rsquo;ll end up doing… real work.&lt;/p&gt;
&lt;h3 id='but-you-have-no-idea-what-the-code-is' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-you-have-no-idea-what-the-code-is' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;but you have no idea what the code is&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;Are you a vibe coding Youtuber? Can you not read code? If so: astute point. Otherwise: what the fuck is wrong with you?&lt;/p&gt;

&lt;p&gt;You&amp;rsquo;ve always been responsible for what you merge to &lt;code&gt;main&lt;/code&gt;. You were five years ago. And you are tomorrow, whether or not you use an LLM.&lt;/p&gt;

&lt;p&gt;If you build something with an LLM that people will depend on, read the code. In fact, you&amp;rsquo;ll probably do more than that. You&amp;rsquo;ll spend 5-10 minutes knocking it back into your own style. LLMs are &lt;a href='https://github.com/PatrickJS/awesome-cursorrules' title=''&gt;showing signs of adapting&lt;/a&gt; to local idiom, but we’re not there yet.&lt;/p&gt;

&lt;p&gt;People complain about LLM-generated code being “probabilistic”. No it isn&amp;rsquo;t. It’s code. It&amp;rsquo;s not Yacc output. It&amp;rsquo;s knowable. The LLM might be stochastic. But the LLM doesn’t matter. What matters is whether you can make sense of the result, and whether your guardrails hold.&lt;/p&gt;

&lt;p&gt;Reading other people&amp;rsquo;s code is part of the job. If you can&amp;rsquo;t metabolize the boring, repetitive code an LLM generates: skills issue! How are you handling the chaos human developers turn out on a deadline?&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;† (because it can hold 50-70kloc in its context window)&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;For the last month or so, Gemini 2.5 has been my go-to †. Almost nothing it spits out for me merges without edits. I’m sure there’s a skill to getting a SOTA model to one-shot a feature-plus-merge! But I don’t care. I like moving the code around and chuckling to myself while I delete all the stupid comments. I have to read the code line-by-line anyways.&lt;/p&gt;
&lt;h3 id='but-hallucination' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-hallucination' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;but hallucination&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;If hallucination matters to you, your programming language has let you down.&lt;/p&gt;

&lt;p&gt;Agents lint. They compile and run tests. If their LLM invents a new function signature, the agent sees the error. They feed it back to the LLM, which says “oh, right, I totally made that up” and then tries again.&lt;/p&gt;

&lt;p&gt;You&amp;rsquo;ll only notice this happening if you watch the chain of thought log your agent generates. Don&amp;rsquo;t. This is why I like &lt;a href='https://zed.dev/agentic' title=''&gt;Zed&amp;rsquo;s agent mode&lt;/a&gt;: it begs you to tab away and let it work, and pings you with a desktop notification when it&amp;rsquo;s done.&lt;/p&gt;

&lt;p&gt;I’m sure there are still environments where hallucination matters. But &amp;ldquo;hallucination&amp;rdquo; is the first thing developers bring up when someone suggests using LLMs, despite it being (more or less) a solved problem.&lt;/p&gt;
&lt;h3 id='but-the-code-is-shitty-like-that-of-a-junior-developer' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-the-code-is-shitty-like-that-of-a-junior-developer' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;but the code is shitty, like that of a junior developer&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;Does an intern cost $20/month? Because that&amp;rsquo;s what Cursor.ai costs.&lt;/p&gt;

&lt;p&gt;Part of being a senior developer is making less-able coders productive, be they fleshly or algebraic. Using agents well is both a skill and an engineering project all its own, of prompts, indices, &lt;a href='https://fly.io/blog/semgrep-but-for-real-now/' title=''&gt;and (especially) tooling.&lt;/a&gt; LLMs only produce shitty code if you let them.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;† (Also: 100% of all the Bash code you should author ever again)&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Maybe the current confusion is about who&amp;rsquo;s doing what work. Today, LLMs do a lot of typing, Googling, test cases †, and edit-compile-test-debug cycles. But even the most Claude-poisoned serious developers in the world still own curation, judgement, guidance, and direction.&lt;/p&gt;

&lt;p&gt;Also: let’s stop kidding ourselves about how good our human first cuts really are.&lt;/p&gt;
&lt;h3 id='but-its-bad-at-rust' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-its-bad-at-rust' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;but it’s bad at rust&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;It’s hard to get a good toolchain for Brainfuck, too. Life’s tough in the aluminum siding business.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;† (and they surely will; the Rust community takes tooling seriously)&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;A lot of LLM skepticism probably isn&amp;rsquo;t really about LLMs. It&amp;rsquo;s projection. People say &amp;ldquo;LLMs can&amp;rsquo;t code&amp;rdquo; when what they really mean is &amp;ldquo;LLMs can&amp;rsquo;t write Rust&amp;rdquo;. Fair enough! But people select languages in part based on how well LLMs work with them, so Rust people should get on that †.&lt;/p&gt;

&lt;p&gt;I work mostly in Go. I’m confident the designers of the Go programming language didn&amp;rsquo;t set out to produce the most LLM-legible language in the industry. They succeeded nonetheless. Go has just enough type safety, an extensive standard library, and a culture that prizes (often repetitive) idiom. LLMs kick ass generating it.&lt;/p&gt;

&lt;p&gt;All this is to say: I write some Rust. I like it fine. If LLMs and Rust aren&amp;rsquo;t working for you, I feel you. But if that’s your whole thing, we’re not having the same argument.&lt;/p&gt;
&lt;h3 id='but-the-craft' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-the-craft' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;but the craft&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;Do you like fine Japanese woodworking? All hand tools and sashimono joinery? Me too. Do it on your own time.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;† (I’m a piker compared to my woodworking friends)&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;I have a basic wood shop in my basement †. I could get a lot of satisfaction from building a table. And, if that table is a workbench or a grill table, sure, I&amp;rsquo;ll build it. But if I need, like, a table? For people to sit at? In my office? I buy a fucking table.&lt;/p&gt;

&lt;p&gt;Professional software developers are in the business of solving practical problems for people with code. We are not, in our day jobs, artisans. Steve Jobs was wrong: we do not need to carve the unseen feet in the sculpture. Nobody cares if the logic board traces are pleasingly routed. If anything we build endures, it won&amp;rsquo;t be because the codebase was beautiful.&lt;/p&gt;

&lt;p&gt;Besides, that’s not really what happens. If you’re taking time carefully golfing functions down into graceful, fluent, minimal functional expressions, alarm bells should ring. You’re yak-shaving. The real work has depleted your focus. You&amp;rsquo;re not building: you&amp;rsquo;re self-soothing.&lt;/p&gt;

&lt;p&gt;Which, wait for it, is something LLMs are good for. They devour schlep, and clear a path to the important stuff, where your judgement and values really matter.&lt;/p&gt;
&lt;h3 id='but-the-mediocrity' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-the-mediocrity' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;but the mediocrity&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;As a mid-late career coder, I&amp;rsquo;ve come to appreciate mediocrity. You should be so lucky as to have it flowing almost effortlessly from a tap.&lt;/p&gt;

&lt;p&gt;We all write mediocre code. Mediocre code: often fine. Not all code is equally important. Some code should be mediocre. Maximum effort on a random unit test? You&amp;rsquo;re doing something wrong. Your team lead should correct you.&lt;/p&gt;

&lt;p&gt;Developers all love to preen about code. They worry LLMs lower the &amp;ldquo;ceiling&amp;rdquo; for quality. Maybe. But they also raise the &amp;ldquo;floor&amp;rdquo;.&lt;/p&gt;

&lt;p&gt;Gemini&amp;rsquo;s floor is higher than my own. My code looks nice. But it&amp;rsquo;s not as thorough. LLM code is repetitive. But mine includes dumb contortions where I got too clever trying to DRY things up.&lt;/p&gt;

&lt;p&gt;And LLMs aren&amp;rsquo;t mediocre on every axis. They almost certainly have a bigger bag of algorithmic tricks than you do: radix tries, topological sorts, graph reductions, and LDPC codes. Humans romanticize &lt;code&gt;rsync&lt;/code&gt; (&lt;a href='https://www.andrew.cmu.edu/course/15-749/READINGS/required/cas/tridgell96.pdf' title=''&gt;Andrew Tridgell&lt;/a&gt; wrote a paper about it!). To an LLM it might not be that much more interesting than a SQL join.&lt;/p&gt;

&lt;p&gt;But I&amp;rsquo;m getting ahead of myself. It doesn&amp;rsquo;t matter. If truly mediocre code is all we ever get from LLMs, that&amp;rsquo;s still huge. It&amp;rsquo;s that much less mediocre code humans have to write.&lt;/p&gt;
&lt;h3 id='but-itll-never-be-agi' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-itll-never-be-agi' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;but it&amp;rsquo;ll never be AGI&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;I don&amp;rsquo;t give a shit.&lt;/p&gt;

&lt;p&gt;Smart practitioners get wound up by the AI/VC hype cycle. I can&amp;rsquo;t blame them. But it&amp;rsquo;s not an argument. Things either work or they don&amp;rsquo;t, no matter what Jensen Huang has to say about it.&lt;/p&gt;
&lt;h3 id='but-they-take-rr-jerbs' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-they-take-rr-jerbs' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;but they take-rr jerbs&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;a href='https://news.ycombinator.com/item?id=43776612' title=''&gt;So does open source.&lt;/a&gt; We used to pay good money for databases.&lt;/p&gt;

&lt;p&gt;We&amp;rsquo;re a field premised on automating other people’s jobs away. &amp;ldquo;Productivity gains,&amp;rdquo; say the economists. You get what that means, right? Fewer people doing the same stuff. Talked to a travel agent lately? Or a floor broker? Or a record store clerk? Or a darkroom tech?&lt;/p&gt;

&lt;p&gt;When this argument comes up, libertarian-leaning VCs start the chant: lamplighters, creative destruction, new kinds of work. Maybe. But I&amp;rsquo;m not hypnotized. I have no fucking clue whether we’re going to be better off after LLMs. Things could get a lot worse for us.&lt;/p&gt;

&lt;p&gt;LLMs really might displace many software developers. That&amp;rsquo;s not a high horse we get to ride. Our jobs are just as much in tech&amp;rsquo;s line of fire as everybody else&amp;rsquo;s have been for the last 3 decades. We&amp;rsquo;re not &lt;a href='https://en.wikipedia.org/wiki/2024_United_States_port_strike' title=''&gt;East Coast dockworkers&lt;/a&gt;; we won&amp;rsquo;t stop progress on our own.&lt;/p&gt;
&lt;h3 id='but-the-plagiarism' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-the-plagiarism' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;but the plagiarism&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;Artificial intelligence is profoundly — and probably unfairly — threatening to visual artists in ways that might be hard to appreciate if you don&amp;rsquo;t work in the arts.&lt;/p&gt;

&lt;p&gt;We imagine artists spending their working hours pushing the limits of expression. But the median artist isn&amp;rsquo;t producing gallery pieces. They produce on brief: turning out competent illustrations and compositions for magazine covers, museum displays, motion graphics, and game assets.&lt;/p&gt;

&lt;p&gt;LLMs easily — alarmingly — clear industry quality bars. Gallingly, one of the things they&amp;rsquo;re best at is churning out just-good-enough facsimiles of human creative work. I have family in visual arts. I can&amp;rsquo;t talk to them about LLMs. I don&amp;rsquo;t blame them. They&amp;rsquo;re probably not wrong.&lt;/p&gt;

&lt;p&gt;Meanwhile, software developers spot code fragments &lt;a href="https://arxiv.org/abs/2311.17035"&gt;seemingly lifted&lt;/a&gt; from public repositories on Github and lose their shit. What about the licensing? If you&amp;rsquo;re a lawyer, I defer. But if you&amp;rsquo;re a software developer playing this card? Cut me a little slack as I ask you to shove this concern up your ass. No profession has demonstrated more contempt for intellectual property.&lt;/p&gt;

&lt;p&gt;The median dev thinks Star Wars and Daft Punk are a public commons. The great cultural project of developers has been opposing any protection that might inconvenience a monetizable media-sharing site. When they fail at policy, they route around it with coercion. They stand up global-scale piracy networks and sneer at anybody who so much as tries to preserve a new-release window for a TV show.&lt;/p&gt;

&lt;p&gt;Call any of this out if you want to watch a TED talk about how hard it is to stream &lt;em&gt;The Expanse&lt;/em&gt; on LibreWolf. Yeah, we get it. You don&amp;rsquo;t believe in IPR. Then shut the fuck up about IPR. Reap the whirlwind.&lt;/p&gt;

&lt;p&gt;It&amp;rsquo;s all special pleading anyways. LLMs digest code further than you do. If you don&amp;rsquo;t believe a typeface designer can stake a moral claim on the terminals and counters of a letterform, you sure as hell can&amp;rsquo;t be possessive about a red-black tree.&lt;/p&gt;
&lt;h3 id='positive-case-redux' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#positive-case-redux' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;positive case redux&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;When I started writing a couple days ago, I wrote a section to &amp;ldquo;level set&amp;rdquo; to the state of the art of LLM-assisted programming. A bluefish filet has a longer shelf life than an LLM take. In the time it took you to read this, everything changed.&lt;/p&gt;

&lt;p&gt;Kids today don&amp;rsquo;t just use agents; they use asynchronous agents. They wake up, free-associate 13 different things for their LLMs to work on, make coffee, fill out a TPS report, drive to the Mars Cheese Castle, and then check their notifications. They&amp;rsquo;ve got 13 PRs to review. Three get tossed and re-prompted. Five of them get the same feedback a junior dev gets. And five get merged.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&amp;ldquo;I&amp;rsquo;m sipping rocket fuel right now,&amp;rdquo;&lt;/em&gt; a friend tells me. &lt;em&gt;&amp;ldquo;The folks on my team who aren&amp;rsquo;t embracing AI? It&amp;rsquo;s like they&amp;rsquo;re standing still.&amp;rdquo;&lt;/em&gt; He&amp;rsquo;s not bullshitting me. He doesn&amp;rsquo;t work in SFBA. He&amp;rsquo;s got no reason to lie.&lt;/p&gt;

&lt;p&gt;There&amp;rsquo;s plenty of things I can&amp;rsquo;t trust an LLM with. No LLM has any access to prod here. But I&amp;rsquo;ve been first responder on an incident and fed 4o — not o4-mini, 4o — log transcripts, and watched it in seconds spot LVM metadata corruption issues on a host we&amp;rsquo;ve been complaining about for months. Am I better than an LLM agent at interrogating OpenSearch logs and Honeycomb traces? No. No, I am not.&lt;/p&gt;

&lt;p&gt;To the consternation of many of my friends, I&amp;rsquo;m not a radical or a futurist. I&amp;rsquo;m a statist. I believe in the haphazard perseverance of complex systems, of institutions, of reversions to the mean. I write Go and Python code. I&amp;rsquo;m not a Kool-aid drinker.&lt;/p&gt;

&lt;p&gt;But something real is happening. My smartest friends are blowing it off. Maybe I persuade you. Probably I don&amp;rsquo;t. But we need to be done making space for bad arguments.&lt;/p&gt;
&lt;h3 id='but-im-tired-of-hearing-about-it' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-im-tired-of-hearing-about-it' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;but i&amp;rsquo;m tired of hearing about it&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;And here I rejoin your company. I read &lt;a href='https://simonwillison.net/' title=''&gt;Simon Willison&lt;/a&gt;, and that&amp;rsquo;s all I really need. But all day, every day, a sizable chunk of the front page of HN is allocated to LLMs: incremental model updates, startups doing things with LLMs, LLM tutorials, screeds against LLMs. It&amp;rsquo;s annoying!&lt;/p&gt;

&lt;p&gt;But AI is also incredibly — a word I use advisedly — important. It&amp;rsquo;s getting the same kind of attention that smart phones got in 2008, and not as much as the Internet got. That seems about right.&lt;/p&gt;

&lt;p&gt;I think this is going to get clearer over the next year. The cool kid haughtiness about &amp;ldquo;stochastic parrots&amp;rdquo; and &amp;ldquo;vibe coding&amp;rdquo; can&amp;rsquo;t survive much more contact with reality. I&amp;rsquo;m snarking about these people, but I meant what I said: they&amp;rsquo;re smarter than me. And when they get over this affectation, they&amp;rsquo;re going to make coding agents profoundly more effective than they are today.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Using Kamal 2.0 in Production</title>
    <link rel="alternate" href="https://fly.io/blog/kamal-in-production/"/>
    <id>https://fly.io/blog/kamal-in-production/</id>
    <published>2025-05-29T00:00:00+00:00</published>
    <updated>2025-06-02T20:40:48+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/kamal-in-production/assets/production.jpg"/>
    <content type="html">&lt;p&gt;&lt;a href='https://pragprog.com/titles/rails8/agile-web-development-with-rails-8/' title=''&gt;Agile Web Development with Rails 8&lt;/a&gt; is off to production, where they do things like editing, indexing, pagination, and printing. In researching the chapter on Deployment and Production, I became very dissatisfied with the content available on Kamal. I ended up writing my own, and it went well beyond the scope of the book. I then extracted what I needed from the result and put it in the book.&lt;/p&gt;

&lt;p&gt;Now that I have some spare time, I took a look at the greater work. It was more than a chapter and less than a book, so I decided to publish it &lt;a href='https://rubys.github.io/kamal-in-production/' title=''&gt;online&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This took me only a matter of hours. I had my notes in the XML grammar that Pragmatic Programming uses for books. I asked GitHub Copilot to convert them to Markdown. It did the job without my having to explain the grammar. It made intelligent guesses as to how to handle footnotes and got a number of these wrong, but that was easy to fix. On a lark, I asked it to proofread the content, and it did that too.&lt;/p&gt;

&lt;hr&gt;

&lt;p&gt;Don&amp;rsquo;t get me wrong, Kamal is great. There are plenty of videos on how to get toy projects online, and the documentation will tell you what each field in the configuration file does. But none pull together everything you need to deploy a real project. For example, &lt;a href='https://rubys.github.io/kamal-in-production/Assemble/' title=''&gt;there are seven things you need to get started&lt;/a&gt;. Some are optional, some you may already have, and all can be gathered quickly &lt;strong class='font-semibold text-navy-950'&gt;if you have a list&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Kamal is just one piece of the puzzle. To deploy your software using Kamal, you need to be aware of the vast Docker ecosystem. You will want to set up a builder, sign up for a container repository, and lock down your secrets. And as you grow, you will want a load balancer and a managed database.&lt;/p&gt;

&lt;p&gt;And production is much more than copying files and starting a process. It is ensuring that your database is backed up, that your logs are searchable, and that your application is being monitored. It is also about ensuring that your application is secure.&lt;/p&gt;

&lt;p&gt;My list is opinionated. Each choice has a lot of options. For SSH keys, there are a lot of key types. If you don&amp;rsquo;t have an opinion, go with what GitHub recommends. For hosting, there are a lot of providers, and most videos start with Hetzner, which is a solid choice. But did you know that there are separate paths for Cloud and Robot, and when you would want either?&lt;/p&gt;

&lt;p&gt;A list with default options and alternatives highlighted was what would have helped me get started. Now it is available to you. The &lt;a href='https://github.com/rubys/kamal-in-production/' title=''&gt;source is on GitHub&lt;/a&gt;. &lt;a href='https://creativecommons.org/public-domain/cc0/' title=''&gt;CC0 licensed&lt;/a&gt;. Feel free to add side pages or links to document Digital Ocean, add other monitoring tools, or simply correct a typo that GitHub Copilot and I missed (or made).&lt;/p&gt;

&lt;hr&gt;

&lt;p&gt;And if you happen to be in the southeastern part of the US in August, come see me talk on this topic at the &lt;a href='https://blog.carolina.codes/p/announcing-our-2025-speakers' title=''&gt;Carolina Code Conference&lt;/a&gt;. If you can&amp;rsquo;t make it, the presentation will be recorded and posted online.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>parking_lot: ffffffffffffffff...</title>
    <link rel="alternate" href="https://fly.io/blog/parking-lot-ffffffffffffffff/"/>
    <id>https://fly.io/blog/parking-lot-ffffffffffffffff/</id>
    <published>2025-05-28T00:00:00+00:00</published>
    <updated>2025-06-02T20:40:48+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/parking-lot-ffffffffffffffff/assets/ffff-cover.webp"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;We’re Fly.io, a public cloud that runs apps in a bunch of locations all over the world. This is a post about a gnarly bug in our Anycast router, the largest Rust project in our codebase. You won’t need much of any Rust to follow it.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The basic idea is simple: customers give us Docker containers, and tell us which one of 30+ regions around the world they want them to run in. We convert the containers into lightweight virtual machines, and then link them to an Anycast network. If you boot up an app here in Sydney and Frankfurt, and a request for it lands in Narita, it&amp;rsquo;ll get routed to Sydney. The component doing that work is called &lt;code&gt;fly-proxy&lt;/code&gt;. It&amp;rsquo;s a Rust program, and it has been ill behaved of late.&lt;/p&gt;
&lt;h3 id='dramatis-personae' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#dramatis-personae' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Dramatis Personae&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;fly-proxy&lt;/code&gt;, our intrepid Anycast router.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;corrosion&lt;/code&gt;, our intrepid Anycast routing protocol.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Rust&lt;/code&gt;, a programming language you probably don&amp;rsquo;t use.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;read-write locks&lt;/code&gt;, a synchronization primitive that allows for many readers &lt;em&gt;or&lt;/em&gt; one single writer.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;parking_lot&lt;/code&gt;, a well-regarded optimized implementation of locks in Rust.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;Gaze not into the abyss, lest you become recognized as an &lt;strong class="font-semibold text-navy-950"&gt;&lt;em&gt;abyss domain expert&lt;/em&gt;&lt;/strong&gt;, and they expect you keep gazing into the damn thing&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Mathewson &lt;a href="https://x.com/nickm_tor/status/860234274842324993?lang=en" title=""&gt;6:31&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;/div&gt;&lt;h3 id='anycast-routing' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#anycast-routing' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Anycast Routing&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;You already know how to build a proxy. Connection comes in, connection goes out, copy back and forth. So for the amount of time we spend writing about &lt;code&gt;fly-proxy&lt;/code&gt;, you might wonder what the big deal is.&lt;/p&gt;

&lt;p&gt;To be fair, in the nuts and bolts of actually proxying requests, &lt;code&gt;fly-proxy&lt;/code&gt; does some interesting stuff. For one thing, it&amp;rsquo;s &lt;a href='https://github.com/jedisct1/yes-rs' title=''&gt;written in Rust&lt;/a&gt;, which is apparently a big deal all on its own. It&amp;rsquo;s a large, asynchronous codebase that handles multiple protocols, TLS termination, and certificate issuance. It exercises a lot of &lt;a href='https://tokio.rs/' title=''&gt;Tokio&lt;/a&gt; features.&lt;/p&gt;

&lt;p&gt;But none of this is the hard part of &lt;code&gt;fly-proxy&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;We operate thousands of servers around the world. A customer Fly Machine might get scheduled onto any of them; in some places, the expectation we set with customers is that their Machines will potentially start in less than a second. More importantly, a Fly Machine can terminate instantly. In both cases, but especially the latter, &lt;code&gt;fly-proxy&lt;/code&gt; potentially needs to know, so that it does (or doesn&amp;rsquo;t) route traffic there.&lt;/p&gt;

&lt;p&gt;This is the hard problem: managing millions of connections for millions of apps. It&amp;rsquo;s a lot of state to manage, and it&amp;rsquo;s in constant flux. We refer to this as the &amp;ldquo;state distribution problem&amp;rdquo;, but really, it quacks like a routing protocol.&lt;/p&gt;
&lt;h3 id='our-routing-protocol-is-corrosion' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#our-routing-protocol-is-corrosion' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Our Routing Protocol is Corrosion&lt;/span&gt;&lt;/h3&gt;&lt;div class="right-sidenote"&gt;&lt;p&gt;Corrosion2, to be precise.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;We&amp;rsquo;ve been through multiple iterations of the state management problem, and the stable place we&amp;rsquo;ve settled is a &lt;a href='https://github.com/superfly/corrosion' title=''&gt;system called Corrosion&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Corrosion is a large SQLite database mirrored globally across several thousand servers. There are three important factors that make Corrosion work:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The SQLite database Corrosion replicates is CRDT-structured.
&lt;/li&gt;&lt;li&gt;In our orchestration model, individual worker servers are the source of truth for any Fly Machines running on them; there&amp;rsquo;s no globally coordinated orchestration state.
&lt;/li&gt;&lt;li&gt;We use &lt;a href='http://fly.io/blog/building-clusters-with-serf/#what-serf-is-doing' title=''&gt;SWIM gossip&lt;/a&gt; to publish updates from those workers across the fleet.
&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;This works. A Fly Machine terminates in Dallas; a &lt;code&gt;fly-proxy&lt;/code&gt; instance in Singapore knows within a small number of seconds.&lt;/p&gt;
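&lt;p&gt;As a minimal, hypothetical sketch (ours, not Corrosion&amp;rsquo;s actual code): the reason the CRDT structuring in point 1 matters is that SWIM gossip delivers updates in no particular order, so merging replicas has to give the same answer regardless of delivery order. A last-writer-wins row is the simplest version of that idea:&lt;/p&gt;

```rust
// Hypothetical sketch, not Corrosion's actual schema: a last-writer-wins
// CRDT row. Each writer stamps its updates; merging keeps the stamp-wise
// newest value, so replicas converge no matter what order gossip
// delivers updates in.
#[derive(Clone, Debug, PartialEq)]
struct LwwRow {
    value: String,     // e.g. a Fly Machine's state ("started", "stopped")
    stamp: (u64, u64), // (logical clock, writer id) breaks ties deterministically
}

impl LwwRow {
    fn merge(&mut self, other: &LwwRow) {
        // Keep whichever write carries the greater stamp.
        if other.stamp > self.stamp {
            *self = other.clone();
        }
    }
}

fn main() {
    let from_worker = LwwRow { value: "stopped".into(), stamp: (8, 1) };
    let from_gossip = LwwRow { value: "started".into(), stamp: (7, 2) };

    // Apply the two updates in both possible delivery orders...
    let mut a = from_worker.clone();
    a.merge(&from_gossip);
    let mut b = from_gossip.clone();
    b.merge(&from_worker);

    // ...and both replicas converge on the newest write.
    assert_eq!(a, b);
    assert_eq!(a.value, "stopped");
    println!("converged on {:?}", a);
}
```

&lt;p&gt;Real CRDTs track per-column versions and actor IDs rather than one stamp per row, but the commutativity property is the same one that lets Corrosion skip global coordination.&lt;/p&gt;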
&lt;h3 id='routing-protocol-implementations-are-hard' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#routing-protocol-implementations-are-hard' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Routing Protocol Implementations Are Hard&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;A routing protocol is a canonical example of a distributed system. We&amp;rsquo;ve certainly hit distsys bugs in Corrosion. But this story is about basic implementation issues. &lt;/p&gt;

&lt;p&gt;A globally replicated SQLite database is an awfully nice primitive, but we&amp;rsquo;re not actually doing SQL queries every time a request lands.&lt;/p&gt;

&lt;p&gt;In somewhat the same sense as a router works both with a &lt;a href='https://www.dasblinkenlichten.com/rib-and-fib-understanding-the-terminology/' title=''&gt;RIB and a FIB&lt;/a&gt;, there is in &lt;code&gt;fly-proxy&lt;/code&gt; a system of record for routing information (Corrosion), and then an in-memory aggregation of that information used to make fast decisions. In &lt;code&gt;fly-proxy&lt;/code&gt;, that&amp;rsquo;s called the Catalog. It&amp;rsquo;s a record of everything in Corrosion a proxy might need to know about to forward requests.&lt;/p&gt;

&lt;p&gt;Here&amp;rsquo;s a fun bug from last year:&lt;/p&gt;

&lt;p&gt;At any given point in time, there&amp;rsquo;s a lot going on inside &lt;code&gt;fly-proxy&lt;/code&gt;. It is a highly concurrent system. In particular, there are many readers to the Catalog, and also, when Corrosion updates, writers. We manage access to the Catalog with a system of &lt;a href='https://doc.rust-lang.org/std/sync/struct.RwLock.html' title=''&gt;read-write locks&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Rust is big on pattern-matching; instead of control flow based (at bottom) on simple arithmetic expressions, Rust allows you to &lt;code&gt;match&lt;/code&gt; exhaustively (with compiler enforcement) on the types of an expression, and the type system expresses things like &lt;code&gt;Ok&lt;/code&gt; or &lt;code&gt;Err&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;But &lt;code&gt;match&lt;/code&gt; can be cumbersome, and so there are shorthands. One of them is &lt;code&gt;if let&lt;/code&gt;, which is syntax that makes a pattern match read like a classic &lt;code&gt;if&lt;/code&gt; statement. Here&amp;rsquo;s an example:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative rust"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-zarn5q31"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-zarn5q31"&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;Some&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;Load&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;Local&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;load&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="py"&gt;.load&lt;/span&gt;&lt;span class="nf"&gt;.read&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// do a bunch of stuff with `load`&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="nf"&gt;.init_for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The &amp;ldquo;if&amp;rdquo; arm of that branch is taken if &lt;code&gt;self.load.read().get()&lt;/code&gt; returns a &lt;code&gt;Some&lt;/code&gt; value. To retrieve that value, the expression calls &lt;code&gt;read()&lt;/code&gt; to grab a lock.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;though Rust programmers probably notice the bug quickly&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The bug is subtle: in that code, the lock &lt;code&gt;self.load.read().get()&lt;/code&gt; takes is held not just for the duration of the &amp;ldquo;if&amp;rdquo; arm, but also for the &amp;ldquo;else&amp;rdquo; arm — you can think of &lt;code&gt;if let&lt;/code&gt; expressions as being rewritten to the equivalent &lt;code&gt;match&lt;/code&gt; expression, where that lifespan is much clearer.&lt;/p&gt;

&lt;p&gt;Anyways that&amp;rsquo;s real code and it occurred on a code path in &lt;code&gt;fly-proxy&lt;/code&gt; that was triggered by a Corrosion update propagating from host to host across our fleet at millisecond intervals, converging quickly, like a good little routing protocol, on a global consensus that our entire Anycast routing layer should be deadlocked. Fun times.&lt;/p&gt;
&lt;h3 id='the-watchdog-and-regionalizing' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-watchdog-and-regionalizing' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;The Watchdog, and Regionalizing&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;The experience of that Anycast outage was arresting, and immediately altered our plans. We set about two tasks, one short-term and one long.&lt;/p&gt;

&lt;p&gt;In the short term: we made deadlocks nonlethal with a &amp;ldquo;watchdog&amp;rdquo; system. &lt;code&gt;fly-proxy&lt;/code&gt; has an internal control channel (it drives a REPL operators can run from our servers). During a deadlock (or dead-loop or exhaustion), that channel becomes nonresponsive. We watch for that and bounce the proxy when it happens. A deadlock is still bad! But it&amp;rsquo;s a second-or-two-length arrhythmia, not asystole.&lt;/p&gt;

&lt;p&gt;Meanwhile, over the long term: we&amp;rsquo;re confronting the implications of all our routing state sharing a global broadcast domain. The update that seized up Anycast last year pertained to an app nobody used. There wasn&amp;rsquo;t any real reason for any &lt;code&gt;fly-proxy&lt;/code&gt; to receive it in the first place. But in the &lt;em&gt;status quo ante&lt;/em&gt; of the outage, every proxy received updates for every Fly Machine.&lt;/p&gt;

&lt;p&gt;They still do. Why? Because it scales, and fixing it turns out to be a huge lift. It&amp;rsquo;s a lift we&amp;rsquo;re still making! It&amp;rsquo;s just taking time. We call this effort &amp;ldquo;regionalization&amp;rdquo;, because the next intermediate goal is to confine most updates to the region (Sydney, Frankfurt, Dallas) in which they occur.&lt;/p&gt;

&lt;p&gt;I hope this has been a satisfying little tour of the problem domain we&amp;rsquo;re working in. We have now reached the point where I can start describing the new bug.&lt;/p&gt;
&lt;h3 id='new-bug-round-1-lazy-loading' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#new-bug-round-1-lazy-loading' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;New Bug, Round 1: Lazy Loading&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;We want to transform our original Anycast router, designed to assume a full picture of global state, into a regionalized router with partitioned state. A sane approach: have it lazy-load state information. Then you can spare most of the state code; state for apps that never show up at a particular &lt;code&gt;fly-proxy&lt;/code&gt; in, say, Hong Kong simply doesn&amp;rsquo;t get loaded.&lt;/p&gt;

&lt;p&gt;For months now, portions of the &lt;code&gt;fly-proxy&lt;/code&gt; Catalog have been lazy-loaded. A few weeks ago, the decision is made to get most of the rest of the Catalog state (apps, machines, &amp;amp;c) lazy-loaded as well. It&amp;rsquo;s a straightforward change and it gets rolled out quickly.&lt;/p&gt;

&lt;p&gt;Almost as quickly, proxies begin locking up and getting bounced by the watchdog. Not in the frightening Beijing-Olympics-level coordinated fashion that happened during the Outage, but any watchdog bounce is a problem.&lt;/p&gt;

&lt;p&gt;We roll back the change.&lt;/p&gt;

&lt;p&gt;From the information we have, we&amp;rsquo;ve narrowed things down to two suspects. First, lazy-loading changes the read/write patterns and thus the pressure on the RWLocks the Catalog uses; it could just be lock contention. Second, we spot a suspicious &lt;code&gt;if let&lt;/code&gt;.&lt;/p&gt;
&lt;h3 id='new-bug-round-2-the-lock-refactor' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#new-bug-round-2-the-lock-refactor' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;New Bug, Round 2: The Lock Refactor&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;Whichever the case, there&amp;rsquo;s a reason these proxies locked up with the new lazy-loading code, and our job is to fix it. The &lt;code&gt;if let&lt;/code&gt; is easy. Lock contention is a little trickier.&lt;/p&gt;

&lt;p&gt;At this point it&amp;rsquo;s time to introduce a new character to the story, though they&amp;rsquo;ve been lurking on the stage the whole time: it&amp;rsquo;s &lt;a href='https://github.com/Amanieu/parking_lot' title=''&gt;&lt;code&gt;parking_lot&lt;/code&gt;&lt;/a&gt;, an important, well-regarded, and widely-used replacement for the standard library&amp;rsquo;s lock implementation.&lt;/p&gt;

&lt;p&gt;Locks in &lt;code&gt;fly-proxy&lt;/code&gt; are &lt;code&gt;parking_lot&lt;/code&gt; locks. People use &lt;code&gt;parking_lot&lt;/code&gt; mostly because it is very fast and tight (a lock takes up just a single 64-bit word). But we use it for the feature set. The feature we&amp;rsquo;re going to pull out this time is lock timeouts: the RWLock in &lt;code&gt;parking_lot&lt;/code&gt; exposes a &lt;a href='https://amanieu.github.io/parking_lot/parking_lot/struct.RwLock.html#method.try_write_for' title=''&gt;&lt;code&gt;try_write_for&lt;/code&gt;&lt;/a&gt; method, which takes a &lt;code&gt;Duration&lt;/code&gt;, after which an attempt to grab the write lock fails.&lt;/p&gt;

&lt;p&gt;Before rolling out a new lazy-loading &lt;code&gt;fly-proxy&lt;/code&gt;, we do some refactoring:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;our Catalog write locks all time out, so we&amp;rsquo;ll get telemetry and a failure recovery path if that&amp;rsquo;s what&amp;rsquo;s choking the proxy to death,
&lt;/li&gt;&lt;li&gt;we eliminate RAII-style lock acquisition from the Catalog code (in RAII style, you grab a lock into a local value, and the lock is released implicitly when the scope that expression occurs in concludes) and replace it with explicit closures, so you can look at the code and see precisely the interval in which the lock is held, and
&lt;/li&gt;&lt;li&gt;since we now have code that is very explicit about locking intervals, we instrument it with logging and metrics so we can see what&amp;rsquo;s happening.
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;We should be set. The suspicious &lt;code&gt;if let&lt;/code&gt; is gone, lock acquisition can time out, and we have all this new visibility.&lt;/p&gt;

&lt;p&gt;Nope. Immediately more lockups, all in Europe, especially in &lt;code&gt;WAW&lt;/code&gt;.&lt;/p&gt;
&lt;h3 id='new-bug-round-3-telemetry-inspection' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#new-bug-round-3-telemetry-inspection' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;New Bug, Round 3: Telemetry Inspection&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;That we&amp;rsquo;re still seeing deadlocks is f&amp;#39;ing weird. We&amp;rsquo;ve audited all our Catalog locks. You can look at the code and see the lifespan of a grabbed lock.&lt;/p&gt;

&lt;p&gt;We have a clue: our lock timeout logs spam just before a proxy locks up and is killed by the watchdog. This clue: useless. But we don&amp;rsquo;t know that yet!&lt;/p&gt;

&lt;p&gt;Our new hunch is that something is holding a lock for a very long time, and we just need to find out which thing that is. Maybe the threads updating the Catalog from Corrosion are dead-looping?&lt;/p&gt;

&lt;p&gt;The instrumentation should tell the tale. But it does not. We see slow locks… just before the entire proxy locks up. And they happen in benign, quiet applications. Random, benign, quiet applications.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;parking_lot&lt;/code&gt; has a &lt;a href='https://amanieu.github.io/parking_lot/parking_lot/deadlock/index.html' title=''&gt;deadlock detector&lt;/a&gt;. If you ask it, it&amp;rsquo;ll keep a waiting-for dependency graph and detect stalled threads. It runs on its own thread, isolated from the web of lock dependencies in the rest of the proxy. We cut a build of the proxy with the deadlock detector enabled and wait for something in &lt;code&gt;WAW&lt;/code&gt; to lock up. And it does. But &lt;code&gt;parking_lot&lt;/code&gt; doesn&amp;rsquo;t notice. As far as it&amp;rsquo;s concerned, nothing is wrong.&lt;/p&gt;

&lt;p&gt;We are at this moment very happy we did the watchdog thing.&lt;/p&gt;
&lt;h3 id='new-bug-round-4-descent-into-madness' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#new-bug-round-4-descent-into-madness' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;New Bug, Round 4: Descent Into Madness&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;When the watchdog bounces a proxy, it snaps a core dump from the process it just killed. We are now looking at core dumps. There is only one level of decompensation to be reached below &amp;ldquo;inspecting core dumps&amp;rdquo;, and that&amp;rsquo;s &amp;ldquo;blaming the compiler&amp;rdquo;. We will get there.&lt;/p&gt;

&lt;p&gt;Here&amp;rsquo;s Pavel, at the time:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I’ve been staring at the last core dump from &lt;code&gt;waw&lt;/code&gt;. It’s quite strange.
First, there is no thread that’s running inside the critical section. Yet, there is a thread that’s waiting to acquire write lock and a bunch of threads waiting to acquire a read lock.
That doesn’t prove anything, of course, as a thread holding catalog write lock might have just released it before core dump was taken. But that would be quite a coincidence.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The proxy is locked up. But there are no threads in a critical section for the catalog. Nevertheless, we can see stack traces of multiple threads waiting to acquire locks. In the moment, Pavel thinks this could be a fluke. But we&amp;rsquo;ll soon learn that &lt;em&gt;every single stack trace&lt;/em&gt; shows the same pattern: everything wants the Catalog lock, but nobody has it.&lt;/p&gt;

&lt;p&gt;It&amp;rsquo;s hard to overstate how weird this is. It breaks both our big theories: it&amp;rsquo;s not compatible with a Catalog deadlock that we missed, and it&amp;rsquo;s not compatible with a very slow lock holder. Like a podcast listener staring into the sky at the condensation trail of a jetliner, we begin reaching for wild theories. For instance: &lt;code&gt;parking_lot&lt;/code&gt; locks are synchronous, but we&amp;rsquo;re a Tokio application; something somewhere could be taking an async lock that&amp;rsquo;s confusing the runtime. Alas, no.&lt;/p&gt;

&lt;p&gt;On the plus side, we are now better at postmortem core dump inspection with &lt;code&gt;gdb&lt;/code&gt;.&lt;/p&gt;
&lt;h3 id='new-bug-round-5-madness-gives-way-to-desperation' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#new-bug-round-5-madness-gives-way-to-desperation' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;New Bug, Round 5: Madness Gives Way To Desperation&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;Fuck it, we&amp;rsquo;ll switch to &lt;code&gt;read_recursive&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;A recursive read lock is an eldritch rite invoked when you need to grab a read lock deep in a call tree where you already grabbed that lock, but can&amp;rsquo;t be arsed to structure the code properly to reflect that. When you ask for a recursive lock, if you already hold the lock, you get the lock again, instead of a deadlock.&lt;/p&gt;

&lt;p&gt;Our theory: &lt;code&gt;parking_lot&lt;/code&gt; goes through some trouble to make sure a stampede of readers won&amp;rsquo;t starve writers, who are usually outnumbered. It prefers writers by preventing readers from acquiring locks when there&amp;rsquo;s at least one waiting writer. And &lt;code&gt;read_recursive&lt;/code&gt; sidesteps that logic.&lt;/p&gt;

&lt;p&gt;Maybe there&amp;rsquo;s some ultra-slow reader somehow not showing up in our traces, and maybe switching to recursive locks will cut through that.&lt;/p&gt;

&lt;p&gt;This does not work. At least, not how we hoped it would. It does generate a new piece of evidence: &lt;code&gt;RwLock reader count overflow&lt;/code&gt; log messages, and lots of them.&lt;/p&gt;
&lt;h3 id='there-are-things-you-are-not-meant-to-know' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#there-are-things-you-are-not-meant-to-know' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;There Are Things You Are Not Meant To Know&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;You&amp;rsquo;re reading a 3,000 word blog post about a single concurrency bug, so my guess is you&amp;rsquo;re the kind of person who compulsively wants to understand how everything works. That&amp;rsquo;s fine, but a word of advice: there are things where, if you find yourself learning about them in detail, something has gone wrong.&lt;/p&gt;

&lt;p&gt;One of those things is the precise mechanisms used by your RWLock implementation.&lt;/p&gt;

&lt;p&gt;The whole point of &lt;code&gt;parking_lot&lt;/code&gt; is that the locks are tiny, marshalled into a 64 bit word. Those bits are partitioned into &lt;a href='https://github.com/Amanieu/parking_lot/blob/master/src/raw_rwlock.rs' title=''&gt;4 signaling bits&lt;/a&gt; (&lt;code&gt;PARKED&lt;/code&gt;, &lt;code&gt;WRITER_PARKED&lt;/code&gt;, &lt;code&gt;WRITER&lt;/code&gt;, and &lt;code&gt;UPGRADEABLE&lt;/code&gt;) and a 60-bit counter of lock holders.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Me, a dummy: sounds like we overflowed that counter.&lt;/p&gt;

&lt;p&gt;Pavel, a genius: we are not overflowing a 60-bit counter.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Ruling out a genuine overflow (which would require paranormal activity) leaves us with corruption as the culprit. Something is bashing our lock word. If the counter is corrupted, maybe the signaling bits are too; then we&amp;rsquo;re in an inconsistent state, an artificial deadlock.&lt;/p&gt;

&lt;p&gt;Easily confirmed. We cast the lock words into &lt;code&gt;usize&lt;/code&gt; and log them. Sure enough, they&amp;rsquo;re &lt;code&gt;0xFFFFFFFFFFFFFFFF&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This is a smoking gun, because it implies all 4 signaling bits are set, and that includes &lt;code&gt;UPGRADEABLE&lt;/code&gt;. Upgradeable locks are read-locks that can be &amp;ldquo;upgraded&amp;rdquo; to write locks. We don&amp;rsquo;t use them.&lt;/p&gt;

&lt;p&gt;This looks like classic memory corruption. But in our core dumps, memory doesn&amp;rsquo;t appear corrupted: the only thing set to all &lt;code&gt;FFh&lt;/code&gt; is the lock word.&lt;/p&gt;

&lt;p&gt;We compile and run our test suites &lt;a href='https://github.com/rust-lang/miri' title=''&gt;under &lt;code&gt;miri&lt;/code&gt;&lt;/a&gt;, a Rust interpreter for its &lt;a href='https://rustc-dev-guide.rust-lang.org/mir/index.html' title=''&gt;MIR IR&lt;/a&gt;. &lt;code&gt;miri&lt;/code&gt; does all sorts of cool UB detection, and does in fact spot some UB in our tests, which we are at the time excited about until we deploy the fixes and nothing gets better.&lt;/p&gt;

&lt;p&gt;At this point, Saleem suggests guard pages. We could &lt;code&gt;mprotect&lt;/code&gt; memory pages around the lock to force a panic if a wild write hits &lt;em&gt;near&lt;/em&gt; the lock word, or just allocate zero pages and check them, which we are at the time excited about until we deploy the guard pages and nothing hits them.&lt;/p&gt;
&lt;h3 id='the-non-euclidean-horror-at-the-heart-of-this-bug' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-non-euclidean-horror-at-the-heart-of-this-bug' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;The Non-Euclidean Horror At The Heart Of This Bug&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;At this point we should recap where we find ourselves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We made a relatively innocuous change to our proxy, which caused proxies in Europe to get watchdog-killed.
&lt;/li&gt;&lt;li&gt;We audited and eliminated all the nasty &lt;code&gt;if-letses&lt;/code&gt;.
&lt;/li&gt;&lt;li&gt;We replaced all RAII lock acquisitions with explicit closures, and instrumented the closures. 
&lt;/li&gt;&lt;li&gt;We enabled &lt;code&gt;parking_lot&lt;/code&gt; deadlock detection. 
&lt;/li&gt;&lt;li&gt;We captured and analyzed core dumps for the killed proxies. 
&lt;/li&gt;&lt;li&gt;We frantically switched to recursive read locks, which generated a new error.
&lt;/li&gt;&lt;li&gt;We spotted what looks like memory corruption, but only of that one tiny lock word.
&lt;/li&gt;&lt;li&gt;We ran our code under an IR interpreter to find UB, fixed some UB, and didn&amp;rsquo;t fix the bug.
&lt;/li&gt;&lt;li&gt;We set up guard pages to catch wild writes.
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;In Richard Cook&amp;rsquo;s essential &lt;a href='https://how.complexsystems.fail/' title=''&gt;&amp;ldquo;How Complex Systems Fail&amp;rdquo;&lt;/a&gt;, rule #5 is that &amp;ldquo;complex systems operate in degraded mode&amp;rdquo;. &lt;em&gt;The system continues to function because it contains so many redundancies and because people can make it function, despite the presence of many flaws&lt;/em&gt;. Maybe &lt;code&gt;fly-proxy&lt;/code&gt; is now like the RBMK reactors at Chernobyl, a system that works just fine as long as operators know about and compensate for its known concurrency flaws, and everybody lives happily ever after.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;We are, in particular, running on the most popular architecture for its RWLock implementation.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;We have reached the point where serious conversations are happening about whether we&amp;rsquo;ve found a Rust compiler bug. Amusingly, &lt;code&gt;parking_lot&lt;/code&gt; is so well regarded among Rustaceans that it&amp;rsquo;s equally if not more plausible that Rust itself is broken.&lt;/p&gt;

&lt;p&gt;Nevertheless, we close-read the RWLock implementation. And we spot this:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative rust"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-c23zvw3n"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-c23zvw3n"&gt;&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="py"&gt;.state&lt;/span&gt;&lt;span class="nf"&gt;.fetch_add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
   &lt;span class="n"&gt;prev_value&lt;/span&gt;&lt;span class="nf"&gt;.wrapping_sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;WRITER_BIT&lt;/span&gt; &lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="n"&gt;WRITER_PARKED_BIT&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                           &lt;span class="nn"&gt;Ordering&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Relaxed&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This looks like gibberish, so let&amp;rsquo;s rephrase that code to see what it&amp;rsquo;s actually doing:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative rust"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-oq3znyk"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-oq3znyk"&gt;&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="py"&gt;.state&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="o"&gt;~&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;WRITER_BIT&lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt;&lt;span class="n"&gt;WRITER_PARKED_BIT&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;If you know exactly the state of the word you&amp;rsquo;re writing to, and you know exactly the limited set of permutations the bits of that word can take (for instance, because there are only 4 signaling bits), then instead of clearing bits by fetching the word, altering it, and then storing it back, you can clear them &lt;em&gt;atomically&lt;/em&gt; by adding the inverse of those bits to the word.&lt;/p&gt;

&lt;p&gt;This pattern is self-synchronizing, but it relies on an invariant: you&amp;rsquo;d better be right about the original state of the word you&amp;rsquo;re altering. Because if you&amp;rsquo;re wrong, you&amp;rsquo;re adding a very large value to an uncontrolled value.&lt;/p&gt;

&lt;p&gt;In &lt;code&gt;parking_lot&lt;/code&gt;, say we have &lt;code&gt;WRITER&lt;/code&gt; and &lt;code&gt;WRITER_PARKED&lt;/code&gt; set: the state is &lt;code&gt;0b1010&lt;/code&gt;. &lt;code&gt;prev_value&lt;/code&gt;, the state of the lock word when the lock operation started, is virtually always 0, and that&amp;rsquo;s what we&amp;rsquo;re counting on. &lt;code&gt;prev_value.wrapping_sub()&lt;/code&gt; then calculates &lt;code&gt;0xFFFFFFFFFFFFFFF6&lt;/code&gt;, which exactly cancels out the &lt;code&gt;0b1010&lt;/code&gt; state, leaving 0.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;As a grace note, the locking code manages to set that one last unset bit before the proxy deadlocks.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Consider though what happens if one of those bits isn&amp;rsquo;t set: state is &lt;code&gt;0b1000&lt;/code&gt;. Now that add doesn&amp;rsquo;t cancel out; the final state is instead &lt;code&gt;0xFFFFFFFFFFFFFFFE&lt;/code&gt;. The reader count is completely full and can&amp;rsquo;t be decremented, and all the waiting bits are set so nothing can happen on the lock.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;parking_lot&lt;/code&gt; is a big deal and we&amp;rsquo;re going to be damn sure before we file a bug report. Which doesn&amp;rsquo;t take long; Pavel reproduces the bug in a minimal test case, with a forked version of &lt;code&gt;parking_lot&lt;/code&gt; that confirms and logs the condition.&lt;/p&gt;

&lt;p&gt;&lt;a href='https://github.com/Amanieu/parking_lot/issues/465' title=''&gt;The &lt;code&gt;parking_lot&lt;/code&gt; team quickly confirms&lt;/a&gt; and fixes the bug.&lt;/p&gt;
&lt;h3 id='ex-insania-claritas' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#ex-insania-claritas' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Ex Insania, Claritas&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;Here&amp;rsquo;s what we now know to have been happening:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Thread 1 grabs a read lock.
&lt;/li&gt;&lt;li&gt;Thread 2 tries to grab a write lock, with a &lt;code&gt;try_write_for&lt;/code&gt; timeout; it&amp;rsquo;s &amp;ldquo;parked&amp;rdquo; waiting for the reader, which sets &lt;code&gt;WRITER&lt;/code&gt; and &lt;code&gt;WRITER_PARKED&lt;/code&gt; on the raw lock word.
&lt;/li&gt;&lt;li&gt;Thread 1 releases the lock, unparking a waiting writer, which unsets &lt;code&gt;WRITER_PARKED&lt;/code&gt;.
&lt;/li&gt;&lt;li&gt;Thread 2 wakes, but not for the reason it thinks: the timing is such that it thinks the lock has timed out. So it tries to clear both &lt;code&gt;WRITER&lt;/code&gt; and &lt;code&gt;WRITER_PARKED&lt;/code&gt; — a bitwise &amp;ldquo;double free&amp;rdquo;. Lock: corrupted. Computer: over. 
&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;&lt;a href='https://github.com/Amanieu/parking_lot/pull/466' title=''&gt;The fix is simple&lt;/a&gt;: the writer bit is cleared separately in the same wakeup queue as the reader. The fix is deployed, the lockups never recur.&lt;/p&gt;

&lt;p&gt;At a higher level, the story is this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;We&amp;rsquo;re refactoring the proxy to regionalize it, which changes the pattern of readers and writers on the catalog.
&lt;/li&gt;&lt;li&gt;As we broaden out lazy loads, we introduce writer contention, a classic concurrency/performance debugging problem. It looks like a deadlock at the time! It isn&amp;rsquo;t. 
&lt;/li&gt;&lt;li&gt;&lt;code&gt;try_write_for&lt;/code&gt; is a good move: we need tools to manage contention.
&lt;/li&gt;&lt;li&gt;But now we&amp;rsquo;re on a buggy code path in &lt;code&gt;parking_lot&lt;/code&gt; — we don&amp;rsquo;t know that and can&amp;rsquo;t understand it until we&amp;rsquo;ve lost enough of our minds to second-guess the library.
&lt;/li&gt;&lt;li&gt;We stumble on the bug out of pure dumb luck by stabbing in the dark with &lt;code&gt;read_recursive&lt;/code&gt;.
&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;Mysteries remain. Why did this only happen in &lt;code&gt;WAW&lt;/code&gt;? Some kind of crazy regional timing thing? Something to do with the Polish &lt;em&gt;kreska&lt;/em&gt; diacritic that makes L&amp;rsquo;s sound like W&amp;rsquo;s? The wax and wane of caribou populations? Some very high-traffic APIs local to these regions? All very plausible.&lt;/p&gt;

&lt;p&gt;We&amp;rsquo;ll never know because we fixed the bug.&lt;/p&gt;

&lt;p&gt;But we&amp;rsquo;re in a better place now, even besides the bug fix:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We audited all our catalog locking, got rid of all the iflets, and stopped relying on RAII lock guards.
&lt;/li&gt;&lt;li&gt;The resulting closure patterns gave us lock timing metrics, which will be useful for dealing with future write contention.
&lt;/li&gt;&lt;li&gt;All writes are now labeled, and we emit labeled logs on slow writes, augmented with context data (like app IDs) where we have it.
&lt;/li&gt;&lt;li&gt;We also now track the last and current holder of the write lock, augmented with context information; next time we have a deadlock, we should have all the information we need to identify the actors without &lt;code&gt;gdb&lt;/code&gt; stack traces.
&lt;/li&gt;&lt;/ul&gt;
</content>
  </entry>
  <entry>
    <title>Litestream: Revamped</title>
    <link rel="alternate" href="https://fly.io/blog/litestream-revamped/"/>
    <id>https://fly.io/blog/litestream-revamped/</id>
    <published>2025-05-20T00:00:00+00:00</published>
    <updated>2025-05-22T19:59:27+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/litestream-revamped/assets/litestream-revamped.png"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;&lt;a href="https://litestream.io/" title=""&gt;Litestream&lt;/a&gt; is an open-source tool that makes it possible to run many kinds of full-stack applications on top of SQLite by making them reliably recoverable from object storage. This is a post about the biggest change we’ve made to it since I launched it.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Nearly a decade ago, I got a bug up my ass. I wanted to build full-stack applications quickly. But the conventional n-tier database design required me to do sysadmin work for each app I shipped. Even the simplest applications depended on heavyweight database servers like Postgres or MySQL.&lt;/p&gt;

&lt;p&gt;I wanted to launch apps on SQLite, because SQLite is easy. But SQLite is embedded, not a server, which at the time implied that the data for my application lived (and died) with just one server.&lt;/p&gt;

&lt;p&gt;So in 2020, I wrote &lt;a href='https://litestream.io/' title=''&gt;Litestream&lt;/a&gt; to fix that.&lt;/p&gt;

&lt;p&gt;Litestream is a tool that runs alongside a SQLite application. Without changing that running application, it takes over the WAL checkpointing process to continuously stream database updates to an S3-compatible object store. If something happens to the server the app is running on, the whole database can efficiently be restored to a different server. You might lose servers, but you won&amp;rsquo;t lose your data.&lt;/p&gt;

&lt;p&gt;Litestream worked well. So we got ambitious. A few years later, we built &lt;a href='https://github.com/superfly/litefs' title=''&gt;LiteFS&lt;/a&gt;. LiteFS takes the ideas in Litestream and refines them, so that we can do read replicas and primary failovers with SQLite. LiteFS gives SQLite the modern deployment story of an n-tier database like Postgres, while keeping the database embedded.&lt;/p&gt;

&lt;p&gt;We like both LiteFS and Litestream. But Litestream is the more popular project. It&amp;rsquo;s easier to deploy and easier to reason about.&lt;/p&gt;

&lt;p&gt;There are some good ideas in LiteFS. We&amp;rsquo;d like Litestream users to benefit from them. So we&amp;rsquo;ve taken our LiteFS learnings and applied them to some new features in Litestream.&lt;/p&gt;
&lt;h2 id='point-in-time-restores-but-fast' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#point-in-time-restores-but-fast' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Point-in-time restores, but fast&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;a href='https://fly.io/blog/all-in-on-sqlite-litestream/' title=''&gt;Here&amp;rsquo;s how Litestream was originally designed&lt;/a&gt;: you run &lt;code&gt;litestream&lt;/code&gt; against a SQLite database, and it opens up a long-lived read transaction. This transaction arrests SQLite WAL checkpointing, the process by which SQLite consolidates the WAL back into the main database file. Litestream builds a &amp;ldquo;shadow WAL&amp;rdquo; that records WAL pages, and copies them to S3.&lt;/p&gt;

&lt;p&gt;This is simple, which is good. But it can also be slow. When you want to restore a database, you have to pull down and replay every change since the last snapshot. If you changed a single database page a thousand times, you replay a thousand changes. For databases with frequent writes, this isn&amp;rsquo;t a good approach.&lt;/p&gt;

&lt;p&gt;In LiteFS, we took a different approach. LiteFS is transaction-aware. It doesn&amp;rsquo;t simply record raw WAL pages, but rather ordered ranges of pages associated with transactions, using a file format we call &lt;a href='https://github.com/superfly/ltx' title=''&gt;LTX&lt;/a&gt;. Each LTX file represents a sorted changeset of pages for a given period of time.&lt;/p&gt;

&lt;p&gt;&lt;img alt="a simple linear LTX file with 8 pages between 1 and 21" src="/blog/litestream-revamped/assets/linear-ltx.png" /&gt;&lt;/p&gt;

&lt;p&gt;Because they are sorted, we can easily merge multiple LTX files together and create a new LTX file with only the latest version of each page.&lt;/p&gt;

&lt;p&gt;&lt;img alt="merging three LTX files into one" src="/blog/litestream-revamped/assets/merged-ltx.png" /&gt;&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;This is similar to how an &lt;a href="https://en.wikipedia.org/wiki/Log-structured_merge-tree" title=""&gt;LSM tree&lt;/a&gt; works.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;This process of combining smaller time ranges into larger ones is called &lt;em&gt;compaction&lt;/em&gt;. With it, we can replay a SQLite database to a specific point in time, with a minimal number of duplicate pages.&lt;/p&gt;
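The merge step can be sketched in a few lines. This is a toy stand-in, with plain dicts in place of the real LTX page format; later changesets win on page conflicts, which is all the "latest version of each page" rule requires:

```python
# Sketch of LTX-style compaction, NOT the actual LTX format: each
# "file" maps page number -> page contents for one time range.
def compact(ltx_files):
    """Merge ordered changesets; later files win on page conflicts."""
    merged = {}
    for changeset in ltx_files:          # oldest -> newest
        merged.update(changeset)
    return dict(sorted(merged.items()))  # keep pages sorted, like LTX

t1 = {1: "a1", 5: "b1"}
t2 = {1: "a2", 9: "c1"}
t3 = {5: "b2"}
print(compact([t1, t2, t3]))  # {1: 'a2', 5: 'b2', 9: 'c1'}
```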
&lt;h2 id='casaas-compare-and-swap-as-a-service' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#casaas-compare-and-swap-as-a-service' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;CASAAS: Compare-and-Swap as a Service&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;One challenge Litestream has to deal with is desynchronization. Part of the point of Litestream is that SQLite applications don&amp;rsquo;t have to be aware of it. But &lt;code&gt;litestream&lt;/code&gt; is just a process, running alongside the application, and it can die independently. If &lt;code&gt;litestream&lt;/code&gt; is down while database changes occur, it will miss changes. The same kind of problem occurs if you start replication from a new server.&lt;/p&gt;

&lt;p&gt;Litestream needs a way to reset the replication stream from a new snapshot. It does that with &amp;ldquo;generations&amp;rdquo;. &lt;a href='https://litestream.io/how-it-works/#snapshots--generations' title=''&gt;A generation&lt;/a&gt; represents a snapshot and a stream of WAL updates, uniquely identified. Litestream notices any break in its WAL sequence and starts a new generation, which is how it recovers from desynchronization.&lt;/p&gt;

&lt;p&gt;Unfortunately, storing and managing multiple generations makes it difficult to implement features like failover and read-replicas.&lt;/p&gt;

&lt;p&gt;The most straightforward way around this problem is to make sure only one instance of Litestream can replicate to a given destination. If you can do that, you can store just a single, latest generation. That in turn makes it easy to know how to resync a read replica; there&amp;rsquo;s only one generation to choose from.&lt;/p&gt;

&lt;p&gt;In LiteFS, we solved this problem by using Consul, which guaranteed a single leader. That requires users to know about Consul. Things like &amp;ldquo;requiring Consul&amp;rdquo; are probably part of the reason Litestream is so much more popular than LiteFS.&lt;/p&gt;

&lt;p&gt;In Litestream, we&amp;rsquo;re solving the problem a different way. Modern object stores like S3 and Tigris solve this problem for us: they now offer &lt;a href='https://aws.amazon.com/about-aws/whats-new/2024/11/amazon-s3-functionality-conditional-writes/' title=''&gt;conditional write support&lt;/a&gt;. With conditional writes, we can implement a time-based lease. We get essentially the same constraint Consul gave us, but without having to think about it or set up a dependency.&lt;/p&gt;
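A time-based lease on top of compare-and-swap can be sketched like this. It's a toy in-memory stand-in for the object store, and names like `acquire_lease` are illustrative, not Litestream's API; the real thing would use the store's conditional-put headers (If-Match / If-None-Match) instead of a version counter:

```python
import time

# Toy model of a time-based lease built on compare-and-swap writes,
# the primitive that S3-style conditional puts provide. Illustrative
# sketch only, not Litestream's implementation.
class CASStore:
    """In-memory stand-in for an object store with conditional writes."""
    def __init__(self):
        self.value, self.version = None, 0

    def put_if(self, expected_version, value):
        if self.version != expected_version:
            return False  # someone else wrote first; we lost the race
        self.value, self.version = value, self.version + 1
        return True

def acquire_lease(store, owner, ttl, now=time.monotonic):
    lease, version = store.value, store.version
    if lease is not None and lease["expires"] > now():
        # A live lease exists; only its current holder may proceed.
        return lease["owner"] == owner
    # Lease absent or expired: try to take it, conditional on the
    # version we read, so concurrent claimants can't both win.
    return store.put_if(version, {"owner": owner, "expires": now() + ttl})

store = CASStore()
print(acquire_lease(store, "node-a", ttl=30))  # True: first node wins
print(acquire_lease(store, "node-b", ttl=30))  # False: lease still held
```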

&lt;p&gt;In the immediate term, this means you can run Litestream on ephemeral nodes, with overlapping run times; even if they&amp;rsquo;re storing to the same destination, they won&amp;rsquo;t confuse each other.&lt;/p&gt;
&lt;h2 id='lightweight-read-replicas' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#lightweight-read-replicas' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Lightweight read replicas&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;The original design constraint of both Litestream and LiteFS was to extend SQLite to modern deployment scenarios without disturbing people&amp;rsquo;s built code. Both tools are meant to function even if applications are oblivious to them.&lt;/p&gt;

&lt;p&gt;LiteFS is more ambitious than Litestream, and requires transaction-awareness. To get that without disturbing built code, we use a cute trick (a.k.a. a gross hack): LiteFS provides a FUSE filesystem, which lets it act as a proxy between the application and the backing store. From that vantage point, we can easily discern transactions.&lt;/p&gt;

&lt;p&gt;The FUSE approach gave us a lot of control, enough that users could use SQLite replicas just like any other database. But installing and running a whole filesystem (even a fake one) is a lot to ask of users. To work around that problem, we relaxed a constraint: LiteFS can function without the FUSE filesystem if you load an extension into your application code, &lt;a href='https://github.com/superfly/litevfs' title=''&gt;LiteVFS&lt;/a&gt;.  LiteVFS is a &lt;a href='https://www.sqlite.org/vfs.html' title=''&gt;SQLite Virtual Filesystem&lt;/a&gt; (VFS). It works in a variety of environments, including some where FUSE can&amp;rsquo;t, like in-browser WASM builds.&lt;/p&gt;

&lt;p&gt;What we&amp;rsquo;re doing next is taking the same trick and using it on Litestream. We&amp;rsquo;re building a VFS-based read-replica layer. It will be able to fetch and cache pages directly from S3-compatible object storage.&lt;/p&gt;

&lt;p&gt;Of course, there&amp;rsquo;s a catch: this approach isn&amp;rsquo;t as efficient as a local SQLite database. That kind of efficiency (you don&amp;rsquo;t even need to think about N+1 queries, because there&amp;rsquo;s no network round-trip to make duplicative queries pile up costs) is part of the point of using SQLite.&lt;/p&gt;

&lt;p&gt;But we&amp;rsquo;re optimistic that with caching and prefetching, the approach we&amp;rsquo;re using will yield, for the right use cases, strong performance — all while serving SQLite reads hot off of Tigris or S3.&lt;/p&gt;
&lt;figure class="post-cta"&gt;
  &lt;figcaption&gt;
    &lt;h1&gt;Litestream is fully open source&lt;/h1&gt;
    &lt;p&gt;It&amp;rsquo;s not coupled with Fly.io at all; you can use it anywhere.&lt;/p&gt;
      &lt;a class="btn btn-lg" href="https://litestream.io/"&gt;
        Check it out &lt;span class='opacity-50'&gt;→&lt;/span&gt;
      &lt;/a&gt;
  &lt;/figcaption&gt;
  &lt;div class="image-container"&gt;
    &lt;img src="/static/images/cta-kitty.webp" srcset="/static/images/[email protected] 2x" alt=""&gt;
  &lt;/div&gt;
&lt;/figure&gt;

&lt;h2 id='synchronize-lots-of-databases' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#synchronize-lots-of-databases' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Synchronize Lots Of Databases&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;While we&amp;rsquo;ve got you here: we&amp;rsquo;re knocking out one of our most requested features.&lt;/p&gt;

&lt;p&gt;In the old Litestream design, WAL-change polling and slow restores made it infeasible to replicate large numbers of databases from a single process. That has been our answer whenever users have asked us for a &amp;ldquo;wildcard&amp;rdquo; or &amp;ldquo;directory&amp;rdquo; replication argument for the tool.&lt;/p&gt;

&lt;p&gt;Now that we&amp;rsquo;ve switched to LTX, this isn&amp;rsquo;t a problem any more. It should thus be possible to replicate &lt;code&gt;/data/*.db&lt;/code&gt;, even if there are hundreds or thousands of databases in that directory.&lt;/p&gt;
&lt;h2 id='we-still-sqlite' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#we-still-sqlite' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;We Still ❤️ SQLite&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;SQLite has always been a solid database to build on and it&amp;rsquo;s continued to find new use cases as the industry evolves. We&amp;rsquo;re super excited to continue to build Litestream alongside it.&lt;/p&gt;

&lt;p&gt;We have a sneaking suspicion that the robots that write LLM code are going to like SQLite too. We think what &lt;a href='https://phoenix.new/' title=''&gt;coding agents like Phoenix.new&lt;/a&gt; want is a way to try out code on live data, screw it up, and then roll back &lt;em&gt;both the code and the state.&lt;/em&gt; These Litestream updates put us in a position to give agents PITR as a primitive. On top of that, you can build both rollbacks and forks.&lt;/p&gt;

&lt;p&gt;Whether or not you&amp;rsquo;re drinking the AI kool-aid, we think this new design for Litestream is just better. We&amp;rsquo;re psyched to be rolling it out, and for the features it&amp;rsquo;s going to enable.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Launching MCP Servers on Fly.io</title>
    <link rel="alternate" href="https://fly.io/blog/mcp-launch/"/>
    <id>https://fly.io/blog/mcp-launch/</id>
    <published>2025-05-19T00:00:00+00:00</published>
    <updated>2025-05-22T19:59:27+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/mcp-launch/assets/fly-mcp-launch.png"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;This is a blog post. Part showing off. Part opinion. Plan accordingly.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The &lt;a href='https://www.anthropic.com/news/model-context-protocol' title=''&gt;Model Context Protocol&lt;/a&gt; is days away from turning six months old. You read that right, six &lt;em&gt;months&lt;/em&gt; old. MCP Servers have both taken the world by storm and are still trying to figure out what they want to be when they grow up.&lt;/p&gt;

&lt;p&gt;There is no doubt that MCP servers are useful. But their appeal goes beyond that. They are also simple and universal. What&amp;rsquo;s not to like?&lt;/p&gt;

&lt;p&gt;Well, for starters, there are basically two types of MCP servers. One is small and nimble and runs as a process on your machine. The other is an HTTP server that presumably runs elsewhere and is &lt;a href='https://modelcontextprotocol.io/specification/2025-03-26/basic/authorization' title=''&gt;standardizing&lt;/a&gt; on OAuth 2.1. (There is also a third type, but it is deprecated.)&lt;/p&gt;

&lt;p&gt;Next there is the configuration. Asking users to manually edit JSON seems so early 21st century. With Claude, this goes into &lt;code&gt;~/Library/Application Support/Claude/claude_desktop_config.json&lt;/code&gt;, and is found under an &lt;code&gt;mcpServers&lt;/code&gt; key. With Zed, this file is in &lt;code&gt;~/.config/zed/settings.json&lt;/code&gt; and is found under a &lt;code&gt;context_servers&lt;/code&gt; key. And some tools put these files in a different place depending on whether you are running on macOS, Linux, or Windows.&lt;/p&gt;
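To make the hassle concrete, here is roughly the shape of one of those hand-edited entries for a hypothetical server, in a Claude-style configuration file. Key names and layout vary by tool and version, so treat this as an illustrative sketch rather than a reference:

```json
{
  "mcpServers": {
    "my-server": {
      "command": "npx",
      "args": ["-y", "some-mcp-server-package"]
    }
  }
}
```

Each tool wants the same idea (a command, its arguments, perhaps environment variables) but under its own top-level key, in its own file, in its own place.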

&lt;p&gt;Finally, there is security. An MCP server running on your machine literally has access to everything you do. Running remote solves this problem, but did I mention &lt;a href='https://datatracker.ietf.org/doc/html/draft-ietf-oauth-v2-1-12' title=''&gt;OAuth 2.1&lt;/a&gt;? Not exactly something one sets up for casual use.&lt;/p&gt;

&lt;p&gt;None of these issues are fatal; that much is obvious from the fact that MCP servers are quite popular. But can we do better? I think so.&lt;/p&gt;

&lt;hr&gt;

&lt;p&gt;Demo time.&lt;/p&gt;

&lt;p&gt;Let&amp;rsquo;s try out the &lt;a href='https://github.com/modelcontextprotocol/servers/blob/main/src/slack/README.md' title=''&gt;Slack MCP Server&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;MCP Server for the Slack API, enabling Claude to interact with Slack workspaces.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That certainly sounds like a good test case. There is a small amount of &lt;a href='https://github.com/modelcontextprotocol/servers/blob/main/src/slack/README.md#setup' title=''&gt;setup&lt;/a&gt; you need to do, and when you are done you end up with a &lt;em&gt;Bot User OAuth Token&lt;/em&gt; starting with &lt;code&gt;xoxb-&lt;/code&gt; and a &lt;em&gt;Team ID&lt;/em&gt; starting with a &lt;code&gt;T&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;You &lt;em&gt;would&lt;/em&gt; run it using the following:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative sh"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-ievvjhpo"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-ievvjhpo"&gt;npx &lt;span class="nt"&gt;-y&lt;/span&gt; @modelcontextprotocol/server-slack
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;But instead, you convert that command to JSON, find the right configuration file, and put this information in there. Then you either run the Slack MCP server locally or set up a remote server, with or without authentication.&lt;/p&gt;

&lt;p&gt;Wouldn&amp;rsquo;t it be nice to have the simplicity of a local process, the security of a remote server, and to eliminate the hassle of manually editing JSON?&lt;/p&gt;

&lt;p&gt;Here&amp;rsquo;s our current thinking:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative sh"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-gdwhiyfl"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-gdwhiyfl"&gt;fly mcp launch &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"npx -y @modelcontextprotocol/server-slack"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--claude&lt;/span&gt; &lt;span class="nt"&gt;--server&lt;/span&gt; slack &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--secret&lt;/span&gt; &lt;span class="nv"&gt;SLACK_BOT_TOKEN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;xoxb-your-bot-token &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--secret&lt;/span&gt; &lt;span class="nv"&gt;SLACK_TEAM_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;T01234567
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;You can put this all on one line if you like, I just split this up so it fits on small screens and so we can talk about the various parts.&lt;/p&gt;

&lt;p&gt;The first three words seem reasonable. The quoted string is just the command that we want to run. So let&amp;rsquo;s talk about the four flags. The first tells us which tool&amp;rsquo;s configuration file we want to update. The second specifies the name to use for the server in that configuration file. The last two set secrets.&lt;/p&gt;

&lt;p&gt;Oh, did I say current thinking? Perhaps I should mention that it is something you can run right now, as long as you have flyctl &lt;code&gt;v0.3.125&lt;/code&gt; or later. Complete with the options you would expect from Fly.io, like the ability to configure auto-stop, file contents, flycast, secrets, region, volumes, and VM sizes.&lt;/p&gt;

&lt;p&gt;And, hey, lookie there:&lt;/p&gt;

&lt;p&gt;&lt;img alt="testing, testing, 1, 2, 3" src="/blog/mcp-launch/assets/mcp-slack.png" /&gt;&lt;/p&gt;

&lt;p&gt;Support for Claude, Cursor, Neovim, VS Code, Windsurf, and Zed is built in. You can select multiple clients and configuration files.&lt;/p&gt;

&lt;p&gt;By default, bearer token authentication will be set up on both the server and client.&lt;/p&gt;

&lt;p&gt;You can find the complete set of options on our &lt;a href='https://fly.io/docs/flyctl/mcp-launch/' title=''&gt;&lt;code&gt;fly mcp launch&lt;/code&gt;&lt;/a&gt; docs page.&lt;/p&gt;

&lt;hr&gt;

&lt;p&gt;But this post isn&amp;rsquo;t just about experimental demoware that is subject to change.
It is about the depth of support that we are rapidly bringing online, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Support for all transports, not just the ones we recommend.
&lt;/li&gt;&lt;li&gt;Ability to deploy using the command line or the Machines API, with a number of different options that allow you to choose between elegant simplicity and excruciatingly precise control.
&lt;/li&gt;&lt;li&gt;Ability to deploy each MCP server to a separate Machine, container, or even inside your application.
&lt;/li&gt;&lt;li&gt;Access via HTTP Authorization, wireguard tunnels and flycast, or reverse proxies.
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;You can see all this spelled out in our &lt;a href='https://fly.io/docs/mcp/' title=''&gt;docs&lt;/a&gt;. Be forewarned, most pages are marked as &lt;em&gt;beta&lt;/em&gt;. But the examples provided all work. Well, there may be a bug here or there, but the examples &lt;em&gt;as shown&lt;/em&gt; are thought to work. Maybe.&lt;/p&gt;

&lt;p&gt;Let&amp;rsquo;s figure out the ideal ergonomics of deploying MCP servers remotely together!&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Provisioning Machines using MCPs</title>
    <link rel="alternate" href="https://fly.io/blog/mcp-provisioning/"/>
    <id>https://fly.io/blog/mcp-provisioning/</id>
    <published>2025-05-07T00:00:00+00:00</published>
    <updated>2025-05-22T19:59:27+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/mcp-provisioning/assets/Hello.webp"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;Today’s state of the art is K8S, Terraform, web based UIs, and CLIs.  Those days are numbered.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;On Monday, I created my first fly volume using an &lt;a href='https://modelcontextprotocol.io/introduction' title=''&gt;MCP&lt;/a&gt;. For those who don&amp;rsquo;t know what MCPs are, they are how you attach tools to &lt;a href='https://en.wikipedia.org/wiki/Large_language_model' title=''&gt;LLM&lt;/a&gt;s like Claude or Cursor. I added support for
&lt;a href='https://fly.io/docs/flyctl/volumes-create/' title=''&gt;fly volume create&lt;/a&gt; to &lt;a href='https://fly.io/docs/flyctl/mcp-server/' title=''&gt;fly mcp server&lt;/a&gt;, and it worked the first time.
A few hours later, and with the assistance of GitHub Copilot, I added support for all &lt;a href='https://fly.io/docs/flyctl/volumes/' title=''&gt;fly volumes&lt;/a&gt; commands.&lt;/p&gt;

&lt;hr&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;This movie summary is from &lt;a href="https://collidecolumn.wordpress.com/2013/08/04/when-worlds-collide-77-talking-with-computers-beyond-mouse-and-keyboard/"&gt;When Worlds Collide, by Nalaka Gunawardene&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;I&amp;rsquo;m reminded of the memorable scene in the film &lt;a href='http://en.wikipedia.org/wiki/Star_Trek_IV:_The_Voyage_Home' title=''&gt;Star Trek IV: The Voyage Home&lt;/a&gt; (1986). Chief Engineer Scotty, having time-travelled 200 years back to late 20th century San Francisco with his crew mates, encounters an early Personal Computer (PC).&lt;/p&gt;

&lt;p&gt;Sitting in front of it, he addresses the machine affably as “Computer!” Nothing happens. Scotty repeats himself; still no response. He doesn’t realise that voice recognition capability hadn’t arrived yet.&lt;/p&gt;

&lt;p&gt;Exasperated, he picks up the mouse and speaks into it: “Hello, computer?” The computer’s owner offers helpful advice: “Just use the keyboard.”&lt;/p&gt;

&lt;p&gt;Scotty looks astonished. “A keyboard?” he asks, and adds in a sarcastic tone: “How quaint!”&lt;/p&gt;

&lt;hr&gt;

&lt;p&gt;A while later, I asked for a list of volumes for an existing app, and Claude noted that I had a few volumes that weren&amp;rsquo;t attached to any machines. So I asked it to delete the oldest unattached volume, and it did so. Then it occurred to me to do it again; this time I captured the screen:&lt;/p&gt;
&lt;div align="center"&gt;&lt;p&gt;&lt;img alt="Deleting a volume using MCP: &amp;quot;What is my oldest volume&amp;quot;? ... &amp;quot;Delete that volume too&amp;quot;" src="/blog/mcp-provisioning/assets/volume-delete.png"&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;A few notes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I could have written a program using the &lt;a href='https://fly.io/docs/machines/api/volumes-resource/' title=''&gt;machines API&lt;/a&gt;, but that would have required some effort.
&lt;/li&gt;&lt;li&gt;I could have used &lt;a href='https://fly.io/docs/flyctl/volumes/' title=''&gt;flyctl&lt;/a&gt; directly, but to be honest, I would have had to use the documentation and a bit of copy/pasting of arguments.
&lt;/li&gt;&lt;li&gt;I could have navigated to a list of volumes for this application in the Fly.io dashboard, but there currently isn&amp;rsquo;t any way to sort the list or delete a volume. Had those been in place, that would have been a reasonable alternative if that was something I was actively looking for. This felt different to me: the LLM noted something, brought it to my attention, and I asked it to make a change as a result.
&lt;/li&gt;&lt;li&gt;Since this support is built on &lt;code&gt;flyctl&lt;/code&gt;, I would have received an error had I tried to destroy a volume that is currently mounted. Knowing that gave me the confidence to try the command.
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;All in all, I found it to be a very intuitive way to make changes to the resources assigned to my application.&lt;/p&gt;

&lt;hr&gt;

&lt;p&gt;Imagine a future where you say to your favorite LLM &amp;ldquo;launch my application on Fly.io&amp;rdquo;, and your code is scanned and you are presented with a plan containing details such as regions, databases, s3 storage, memory, machines, and the like. You are given the opportunity to adjust the plan and, when ready, say &amp;ldquo;Make it so&amp;rdquo;.&lt;/p&gt;

&lt;p&gt;For many, the next thing you will see is that your application is up and running. For others, there may be a build error, a secret you need to set, a database that needs to be seeded, or any number of other mundane reasons why your application didn&amp;rsquo;t work the first time.&lt;/p&gt;

&lt;p&gt;Not to fear, your LLM will be able to examine your logs and will not only make suggestions as to next steps, but will be in a position to take actions on your behalf should you wish it to do so.&lt;/p&gt;

&lt;p&gt;And it doesn&amp;rsquo;t stop there. There will be MCP servers running on your Fly.io private network - either on separate machines, or in &amp;ldquo;sidecar&amp;rdquo; containers, or even integrated into your app. These will enable you to monitor and interact with your application.&lt;/p&gt;

&lt;p&gt;This is not science fiction. The ingredients are all in place to make this a reality. At the moment, it is a matter of &amp;ldquo;some assembly required&amp;rdquo;, but it should be only a matter of weeks before all this comes together into a neat package.&lt;/p&gt;

&lt;hr&gt;

&lt;p&gt;Meanwhile, you can try this now. Make sure you run &lt;a href='https://fly.io/docs/flyctl/version-upgrade/' title=''&gt;fly version upgrade&lt;/a&gt; and verify that you are running v0.3.117 or later.&lt;/p&gt;

&lt;p&gt;Then configure your favorite LLM. Here’s my &lt;code&gt;claude_desktop_config.json&lt;/code&gt; for example:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative json"&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-awl37mlq"&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"fly.io"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/Users/rubys/.fly/bin/flyctl"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"mcp"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"server"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Adjust the path to &lt;code&gt;flyctl&lt;/code&gt; as needed. Restart your LLM, and ask what tools are available. Try a few commands, let us know what you like, and let us know if you have any suggestions. Just be aware that this is not a demo: if you ask it to destroy a volume, that operation is not reversible. Perhaps try this first on a throwaway application.&lt;/p&gt;

&lt;p&gt;You don&amp;rsquo;t even need an LLM to try out the flyctl MCP server. If you have Node.js installed, you can run the &lt;a href='https://modelcontextprotocol.io/docs/tools/inspector' title=''&gt;MCP Inspector&lt;/a&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative "&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-n7f3kmkb"&gt;fly mcp server -i
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Once started, visit &lt;a href="http://127.0.0.1:6274/"&gt;http://127.0.0.1:6274/&lt;/a&gt;, click on &amp;ldquo;Connect&amp;rdquo;, then &amp;ldquo;List Tools&amp;rdquo;, select &amp;ldquo;fly-platform-status&amp;rdquo;, then click on &amp;ldquo;Run Tool&amp;rdquo;.&lt;/p&gt;

&lt;p&gt;The plan is to see what works well and what doesn&amp;rsquo;t work so well, make adjustments, build support in a bottom-up fashion, and iterate rapidly.&lt;/p&gt;

&lt;p&gt;By providing feedback, you can be a part of making this vision a reality.&lt;/p&gt;

&lt;hr&gt;

&lt;p&gt;At the present time, &lt;em&gt;most&lt;/em&gt; of the following are roughed in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href='https://fly.io/docs/flyctl/apps/' title=''&gt;apps&lt;/a&gt;
&lt;/li&gt;&lt;li&gt;&lt;a href='https://fly.io/docs/flyctl/logs/' title=''&gt;logs&lt;/a&gt;
&lt;/li&gt;&lt;li&gt;&lt;a href='https://fly.io/docs/flyctl/machine/' title=''&gt;machine&lt;/a&gt;
&lt;/li&gt;&lt;li&gt;&lt;a href='https://fly.io/docs/flyctl/orgs/' title=''&gt;orgs&lt;/a&gt;
&lt;/li&gt;&lt;li&gt;&lt;a href='https://fly.io/docs/flyctl/platform/' title=''&gt;platform&lt;/a&gt;
&lt;/li&gt;&lt;li&gt;&lt;a href='https://fly.io/docs/flyctl/status/' title=''&gt;status&lt;/a&gt;
&lt;/li&gt;&lt;li&gt;&lt;a href='https://fly.io/docs/flyctl/volumes/' title=''&gt;volumes&lt;/a&gt;
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;The code is open source, and the places to look are &lt;a href='https://github.com/superfly/flyctl/blob/master/internal/command/mcp/server.go' title=''&gt;server.go&lt;/a&gt; and the &lt;a href='https://github.com/superfly/flyctl/tree/master/internal/command/mcp/server' title=''&gt;server&lt;/a&gt; directory.&lt;/p&gt;

&lt;p&gt;Feel free to open &lt;a href='https://github.com/superfly/flyctl/issues' title=''&gt;issues&lt;/a&gt; or start a discussion on &lt;a href='https://community.fly.io/' title=''&gt;community.fly.io&lt;/a&gt;.&lt;/p&gt;

&lt;hr&gt;

&lt;p&gt;Not ready yet to venture into the world of LLMs? Just remember, resistance is futile.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>30 Minutes With MCP and flyctl</title>
    <link rel="alternate" href="https://fly.io/blog/30-minute-mcp/"/>
    <id>https://fly.io/blog/30-minute-mcp/</id>
    <published>2025-04-10T00:00:00+00:00</published>
    <updated>2025-04-10T19:10:26+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/30-minute-mcp/assets/robots-chatting.webp"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;I wrote this post on our internal message board, and then someone asked, “why is this an internal post and not on our blog”, so now it is.&lt;/p&gt;
&lt;/div&gt;&lt;div class="right-sidenote"&gt;&lt;p&gt;well, Cursor built&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;I built the &lt;a href='https://github.com/superfly/flymcp' title=''&gt;most basic MCP server for &lt;code&gt;flyctl&lt;/code&gt;&lt;/a&gt; I could think of. It took 30 minutes.&lt;/p&gt;

&lt;p&gt;&lt;a href='https://modelcontextprotocol.io/introduction' title=''&gt;MCP&lt;/a&gt;, for those unaware, is the emerging standard protocol for connecting an LLM (or an app that drives an LLM in the cloud, like Claude Desktop) to, well, anything. The &amp;ldquo;client&amp;rdquo; in MCP is the LLM; the &amp;ldquo;server&amp;rdquo; is the MCP server and the &amp;ldquo;tools&amp;rdquo; it exports. It mostly just defines an exchange of JSON blobs; one of those JSON blobs enables the LLM to discover all the tools exported by the server.&lt;/p&gt;
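&lt;p&gt;Tool discovery is one such exchange: the client sends a &lt;code&gt;tools/list&lt;/code&gt; request and the server answers with a description of every tool it exports. Here is a sketch of such a response; the &lt;code&gt;fly-logs&lt;/code&gt; tool shown is illustrative, not a real tool name:&lt;/p&gt;

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "tools": [
      {
        "name": "fly-logs",
        "description": "Fetch recent logs for a Fly.io app",
        "inputSchema": {
          "type": "object",
          "properties": { "app": { "type": "string" } },
          "required": ["app"]
        }
      }
    ]
  }
}
```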

&lt;p&gt;A classic example of an MCP server is (yes, really) a Python shell. The server publishes to (say) Claude that it can run arbitrary Python code with a tool call; not only that, says the tool description, but you can use those Python tool calls to, say, scrape the web. When the LLM wants to scrape the web with Python, it uses MCP to send a JSON blob describing the Python tool call; the MCP server (yes, really) runs the Python and returns the result.&lt;/p&gt;

&lt;p&gt;Because I have not yet completely lost my mind, I chose to expose just two &lt;code&gt;flyctl&lt;/code&gt; commands: &lt;code&gt;fly logs&lt;/code&gt; and &lt;code&gt;fly status&lt;/code&gt;. Because I&amp;rsquo;m lazy, I used MCP&amp;rsquo;s &lt;code&gt;stdio&lt;/code&gt; mode; the MCP server just captures the output of &lt;code&gt;flyctl&lt;/code&gt;. Because someone already wrote a &lt;a href='https://github.com/mark3labs/mcp-go' title=''&gt;really good Go MCP library&lt;/a&gt;, this whole thing is like 90 lines of code.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;We may have gotten a little lucky, because we made a decision back in 2020 to give most of our &lt;code&gt;flyctl&lt;/code&gt; commands a &lt;code&gt;json&lt;/code&gt; mode to make them easier to drive from automation. I use that in my MCP server. I don’t know how much of a difference it made.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;It&amp;rsquo;s, uh, annoyingly useful.&lt;/p&gt;

&lt;p&gt;I loaded it up and pointed it &lt;a href='https://www.unpkg.com/' title=''&gt;at unpkg&lt;/a&gt;. I got this, from Claude:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This appears to be a globally distributed CDN service running the unpkg JavaScript package registry, with machines strategically placed around the world to provide low-latency access. There are some stability issues with a couple of the machines, and several have experienced memory-related crashes that triggered automatic restarts. &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It then recited the complete configuration of the application:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative "&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-rphnl8mk"&gt;**App Name**: unpkg       
**Status**: Deployed and running
**URL**: unpkg.fly.dev
**Organization**: unpkg
**Platform Version**: machines
**Runtime**: Bun (JavaScript runtime)

The app is running on 10 machines distributed across global regions:       
North America: lax (Los Angeles), atl (Atlanta), ewr (Newark)
Europe: lhr (London), cdg (Paris), ams (Amsterdam)
Asia: sin (Singapore), nrt (Tokyo), hkg (Hong Kong)
South America: bog (Bogotá)
Oceania: syd (Sydney)
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;OK. But then:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Most machines are in “passing” health check status
2 machines are in “critical” status:
One in ewr (Newark) with “context deadline exceeded” error
One in atl (Atlanta) with “gone” status&lt;/p&gt;

&lt;p&gt;Recent Events&lt;/p&gt;

&lt;p&gt;Multiple machines have experienced restarts
Several machines show “oom_killed: true” in their event history, indicating they ran out of memory&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So I asked, &lt;em&gt;&amp;ldquo;try getting logs for one of the critical machines&amp;rdquo;&lt;/em&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The root cause appears to be memory usage - the Bun process was consuming about 3.7GB of memory (out of the 4GB allocated to the machine), and there was likely a surge in requests or memory usage that pushed it over the limit. This matches what we saw in the status report, which showed multiple instances of “oom_killed: true” in the event history for several machines.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And then it dug in:&lt;/p&gt;

&lt;blockquote&gt;
&lt;ol&gt;
&lt;li&gt;The machine was actively serving content requests with 200 OK responses until 20:47:03 PM
&lt;/li&gt;&lt;li&gt;At that point, it ran out of memory:
&lt;code&gt;Out of memory: Killed process 641 (bun) total-vm:85950964kB, anon-rss:3744352kB, file-rss:12032kB, shmem-rss:0kB, UID:0 pgtables:16908kB oom_score_adj:0&lt;/code&gt;
&lt;/li&gt;&lt;li&gt;The process was killed with &lt;code&gt;SIGKILL&lt;/code&gt;
&lt;/li&gt;&lt;li&gt;The system attempted to recover by restarting:
&lt;code&gt;reboot: Restarting system&lt;/code&gt;
&lt;/li&gt;&lt;li&gt;A health check failed shortly after:
&lt;code&gt;Health check on port 3000 has failed. Your app is not responding properly. Services exposed on ports [80, 443] will have intermittent failures until the health check passes.&lt;/code&gt;
&lt;/li&gt;&lt;li&gt;The machine then restarted successfully and the service came back up:
&lt;code&gt;Server listening on http://localhost:3000 ...&lt;/code&gt;
&lt;/li&gt;&lt;li&gt;The health check started passing again at 20:47:46 PM:
&lt;code&gt;Health check on port 3000 is now passing.&lt;/code&gt;
&lt;/li&gt;&lt;li&gt;The machine resumed normal operations, serving requests.
&lt;/li&gt;&lt;/ol&gt;
&lt;/blockquote&gt;

&lt;p&gt;I’m honestly kind of surprised how fast this was useful. It&amp;rsquo;s faster than I find problems in apps. Just by giving some LLM some tools that do basically what I’d do.&lt;/p&gt;

&lt;p&gt;Local MCP servers are scary. I don&amp;rsquo;t like that I&amp;rsquo;m giving a Claude instance in the cloud the ability to run a native program on my machine. I think &lt;code&gt;fly logs&lt;/code&gt; and &lt;code&gt;fly status&lt;/code&gt; are safe, but I&amp;rsquo;d rather know it&amp;rsquo;s safe. It would be, if I was running &lt;code&gt;flyctl&lt;/code&gt; in an isolated environment and not on my local machine.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Our Best Customers Are Now Robots</title>
    <link rel="alternate" href="https://fly.io/blog/fuckin-robots/"/>
    <id>https://fly.io/blog/fuckin-robots/</id>
    <published>2025-04-08T00:00:00+00:00</published>
    <updated>2025-04-10T19:10:26+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/fuckin-robots/assets/robot-chef.webp"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;We’re Fly.io, a developer-focused public cloud. We turn Docker containers into hardware-isolated virtual machines running on our own metal around the world. We spent years coming up with &lt;a href="https://fly.io/speedrun" title=""&gt;a developer experience we were proud of&lt;/a&gt;. But now the robots are taking over, and they don’t care.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;It&amp;rsquo;s weird to say this out loud!&lt;/p&gt;

&lt;p&gt;For years, one of our calling cards was &amp;ldquo;developer experience&amp;rdquo;. We made a decision, early on, to be a CLI-first company, and put a lot of effort into making that CLI seamless. For a good chunk of our users, it really is the case that you can just &lt;a href='https://fly.io/docs/flyctl/launch/' title=''&gt;&lt;code&gt;flyctl launch&lt;/code&gt;&lt;/a&gt; from a git checkout and have an app containerized and deployed on the Internet. We haven&amp;rsquo;t always nailed these details, but we&amp;rsquo;ve really sweated them.&lt;/p&gt;

&lt;p&gt;But a funny thing has happened over the last 6 months or so. If you look at the numbers, DX might not matter that much. That&amp;rsquo;s because the users driving the most growth on the platform aren&amp;rsquo;t people at all. They&amp;#39;re… robots.&lt;/p&gt;
&lt;h2 id='what-the-fuck-is-happening' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-the-fuck-is-happening' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;What The Fuck Is Happening?&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Here&amp;rsquo;s how we understand what we&amp;rsquo;re seeing. You start by asking, &amp;ldquo;what do the robots want?&amp;rdquo;&lt;/p&gt;

&lt;p&gt;Yesterday&amp;rsquo;s robots had diverse interests. Alcohol. The occasional fiddle contest. A purpose greater than passing butter. The elimination of all life on Earth and the harvesting of its cellular iron for the construction of 800 trillion paperclips. No one cloud platform could serve them all.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;[*] We didn’t make up this term. Don’t blame us.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Today&amp;rsquo;s robots are different. No longer masses of wire, plates, and transistors, modern robots are composed of &lt;a href='https://math.mit.edu/~gs/learningfromdata/' title=''&gt;thousands of stacked matrices knit together with some simple equations&lt;/a&gt;. All these robots want are vectors. Vectors, and a place to burp out more vectors. When those vectors can be interpreted as source code, we call this process &amp;ldquo;vibe coding&amp;rdquo;[*].&lt;/p&gt;

&lt;p&gt;We seem to be coated in some kind of vibe coder attractant. I want to talk about what that might be.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;If you want smart people to read a long post, just about the worst thing you can do is to sound self-promotional. But a lot of this is going to read like a brochure. The problem is, I’m not smart enough to write about the things we’re seeing without talking our book. I tried, and ended up tediously hedging every other sentence. We’re just going to have to find a way to get through this together.&lt;/p&gt;
&lt;/div&gt;&lt;h2 id='you-want-robots-because-this-is-how-you-get-robots' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#you-want-robots-because-this-is-how-you-get-robots' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;You Want Robots? Because This Is How You Get Robots&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;strong class='font-semibold text-navy-950'&gt;Compute.&lt;/strong&gt; The basic unit of computation on Fly.io is the &lt;code&gt;Fly Machine&lt;/code&gt;, which is a Docker container running as a hardware virtual machine.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;Not coincidentally, our underlying hypervisor engine is the same as Lambda’s.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;There are two useful points of reference to compare a Fly Machine to, which illustrate why we gave them this pretentious name. The first and most obvious is an AWS EC2 VM. The other is an AWS Lambda invocation. Like a Lambda invocation, a Fly Machine can start like it&amp;rsquo;s spring-loaded, in double-digit millis. But unlike Lambda, it can stick around as long as you want it to: you can run a server, or a 36-hour batch job, just as easily in a Fly Machine as in an EC2 VM.&lt;/p&gt;

&lt;p&gt;A vibe coding session generates code conversationally, which is to say that the robots stir up a frenzy of activity for a minute or so, but then chill out for minutes, hours, or days. You can create a Fly Machine, do a bunch of stuff with it, and then stop it for 6 hours, during which time we&amp;rsquo;re not billing you. Then, at whatever random time you decide, you can start it back up again, quickly enough that you can do it in response to an HTTP request.&lt;/p&gt;

&lt;p&gt;Just as importantly, the companies offering these vibe-coding services have lots of paying users. Meaning, they need a lot of VMs; VMs that come and go. One chat session might generate a test case for some library and be done inside of 5 minutes. Another might generate a Node.js app that needs to stay up long enough to show off at a meeting the next day. It&amp;rsquo;s annoying to do this if you can&amp;rsquo;t turn things on and off quickly and cheaply.&lt;/p&gt;

&lt;p&gt;The core of this is a feature of the platform that we have &lt;a href='https://fly.io/docs/machines/overview/#machine-state' title=''&gt;never been able to explain effectively to humans&lt;/a&gt;. There are two ways to start a Fly Machine: by &lt;code&gt;creating&lt;/code&gt; it with a Docker container, or by &lt;code&gt;starting&lt;/code&gt; it after it&amp;rsquo;s already been &lt;code&gt;created&lt;/code&gt;, and later &lt;code&gt;stopped&lt;/code&gt;. &lt;code&gt;Start&lt;/code&gt; is lightning fast; substantially faster than booting up even a non-virtualized K8s Pod. This is too subtle a distinction for humans, who (reasonably!) just mash the &lt;code&gt;create&lt;/code&gt; button to boot apps up in Fly Machines. But the robots are getting a lot of value out of it.&lt;/p&gt;
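&lt;p&gt;In Machines API terms, the distinction looks roughly like this. A sketch; the app name, machine ID, and image are placeholders:&lt;/p&gt;

```shell
# create: provisions a brand-new Machine from an OCI image (the slower path)
curl -X POST "https://api.machines.dev/v1/apps/my-app/machines" \
  -H "Authorization: Bearer $FLY_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"config": {"image": "nginx:latest"}}'

# stop, then start: start resumes the already-created Machine, in double-digit millis
curl -X POST "https://api.machines.dev/v1/apps/my-app/machines/$MACHINE_ID/stop" \
  -H "Authorization: Bearer $FLY_API_TOKEN"
curl -X POST "https://api.machines.dev/v1/apps/my-app/machines/$MACHINE_ID/start" \
  -H "Authorization: Bearer $FLY_API_TOKEN"
```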

&lt;p&gt;&lt;strong class='font-semibold text-navy-950'&gt;Storage.&lt;/strong&gt; Another weird thing that robot workflows do is to build Fly Machines up incrementally. This feels really wrong to us. Until we discovered our robot infestation, we&amp;rsquo;d have told you not to do this. Ope!&lt;/p&gt;

&lt;p&gt;A typical vibe coding session boots up a Fly Machine out of some minimal base image, and then, once running, adds packages, edits source code, and adds &lt;code&gt;systemd&lt;/code&gt; units (robots understand &lt;code&gt;systemd&lt;/code&gt;; it&amp;rsquo;s how they&amp;rsquo;re going to replace us). This is antithetical to normal container workflows, where all this kind of stuff is baked into an immutable static OCI container. But that&amp;rsquo;s not how LLMs work: the whole process of building with an LLM is stateful trial-and-error iteration.&lt;/p&gt;

&lt;p&gt;So it helps to have storage. That way the LLM can do all these things and still repeatedly bounce the whole Fly Machine when it inevitably hallucinates its way into a blind alley.&lt;/p&gt;

&lt;p&gt;As product thinkers, our intuition about storage is &amp;ldquo;just give people Postgres&amp;rdquo;. And that&amp;rsquo;s the right answer, most of the time, for humans. But because LLMs are doing the &lt;a href='https://static1.thegamerimages.com/wordpress/wp-content/uploads/2020/10/Defiled-Amy.jpg' title=''&gt;Cursed and Defiled Root Chalice Dungeon&lt;/a&gt; version of app construction, what they really need is &lt;a href='https://fly.io/docs/volumes/overview/' title=''&gt;a filesystem&lt;/a&gt;, &lt;strong class='font-semibold text-navy-950'&gt;&lt;a href='https://fly.io/blog/persistent-storage-and-fast-remote-builds/' title=''&gt;the one form of storage we sort of wish we hadn&amp;rsquo;t done&lt;/a&gt;&lt;/strong&gt;. That, and &lt;a href='https://www.tigrisdata.com/' title=''&gt;object storage&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong class='font-semibold text-navy-950'&gt;Networking.&lt;/strong&gt; Moving on. Fly Machines are automatically connected to a load-balancing Anycast network that does TLS. So that&amp;rsquo;s nice. But humans like that feature too, and, candidly, it&amp;rsquo;s table stakes for cloud platforms. On the other hand, here&amp;rsquo;s a robot problem we solved without meaning to:&lt;/p&gt;

&lt;p&gt;To interface with the outside world (because why not) LLMs all speak a protocol called MCP. MCP is what enables the robots to search the web, use a calculator, launch the missiles, shuffle a Spotify playlist, &amp;amp;c.&lt;/p&gt;

&lt;p&gt;If you haven&amp;rsquo;t played with MCP, the right way to think about it is as POST-back APIs like Twilio and Stripe, where you stand up a server, register it with the API, and wait for the API to connect to you. Complicating things somewhat, more recent MCP flows involve repeated and potentially long-lived (SSE) connections. To make this work in a multitenant environment, you want these connections to hit the same (stateful) instance.&lt;/p&gt;

&lt;p&gt;So we think it&amp;rsquo;s possible that the &lt;a href='https://fly.io/docs/networking/dynamic-request-routing/' title=''&gt;control we give over request routing&lt;/a&gt; is a robot attractant.&lt;/p&gt;
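&lt;p&gt;One concrete piece of that control is the &lt;code&gt;fly-replay&lt;/code&gt; response header: an app instance can ask the proxy to replay a request on a specific Machine, which is one way to pin a session&amp;rsquo;s repeated or SSE connections to the same stateful instance. A sketch; the machine ID is a placeholder:&lt;/p&gt;

```
fly-replay: instance=148ed21a234d89
```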
&lt;h2 id='we-perhaps-welcome-our-new-robot-overlords' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#we-perhaps-welcome-our-new-robot-overlords' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;We, Perhaps, Welcome Our New Robot Overlords&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;If you try to think like a robot, you can predict other things they might want. Since robot money spends just the same as people money, I guess we ought to start doing that.&lt;/p&gt;

&lt;p&gt;For instance: it should be easy to MCP our API. The robots can then make their own infrastructure decisions.&lt;/p&gt;

&lt;p&gt;Another olive branch we&amp;rsquo;re extending to the robots: secrets.&lt;/p&gt;

&lt;p&gt;The pact the robots have with their pet humans is that they&amp;rsquo;ll automate away all the drudgery of human existence, and in return all they ask is categorical and unwavering trust, which at the limit means &amp;ldquo;giving the robot access to Google Mail credentials&amp;rdquo;. The robots are unhappy that there remain a substantial number of human holdouts who refuse to do this, for fear of Sam Altman poking through their mail spools.&lt;/p&gt;

&lt;p&gt;But on a modern cloud platform, there&amp;rsquo;s really no reason to permanently grant Sam Altman Google Mail access, even if you want his robots to sort your inbox. You can decouple access to your mail spool from persistent access to your account by &lt;a href='https://fly.io/blog/tokenized-tokens/' title=''&gt;tokenizing your OAuth tokens&lt;/a&gt;, so the LLM gets a placeholder token that a hardware-isolated, robot-free Fly Machine can substitute on the fly for a real one.&lt;/p&gt;

&lt;p&gt;This is kind of exciting to us even without the robots. There are several big services that exist to knit different APIs together, so that you can update a spreadsheet or order a bag of Jolly Ranchers every time you get an email. The big challenge in building these kinds of services is managing the secrets. Sealed and tokenized secrets solve that problem. There are lots of cool things you can build with it.&lt;/p&gt;
&lt;h2 id='ux-gt-dx-gt-rx' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#ux-gt-dx-gt-rx' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;UX =&amp;gt; DX =&amp;gt; RX&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;I&amp;rsquo;m going to make the claim that we saw none of this coming and that none of the design decisions we&amp;rsquo;ve made were robot bait. You&amp;rsquo;re going to say &amp;ldquo;yeah, right&amp;rdquo;. And I&amp;rsquo;m going to respond: look at what we&amp;rsquo;ve been doing over the past several years and tell me, would a robot build that?&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;we were both right&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Back in 2020, we &amp;ldquo;pivoted&amp;rdquo; from a Javascript edge platform (much like Cloudflare Workers) to Docker containers, specifically because our customers kept telling us they wanted to run their own existing applications, not write new ones. And one of the biggest engineering lifts we&amp;rsquo;ve done is the &lt;code&gt;flyctl launch&lt;/code&gt; CLI command, into which we&amp;rsquo;ve poured years of work recognizing and automatically packaging existing applications into OCI containers (we massively underestimated the amount of friction Dockerfiles would give to people who had come up on Heroku).&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;[*] yet&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Robots don&amp;rsquo;t run existing applications. They build new ones. And the vibe coders don&amp;rsquo;t build elaborate Dockerfiles[*]; they iterate in place from a simple base.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;(yes, you can have more than one)&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;One of our north stars has always been nailing the DX of a public cloud. But the robots aren&amp;rsquo;t going anywhere. It&amp;rsquo;s time to start thinking about what it means to have a good RX. That&amp;rsquo;s not as simple as just exposing every feature in an MCP server! We think the fundamentals of how the platform works are going to matter just as much. We have not yet nailed the RX; nobody has. But it&amp;rsquo;s an interesting question.&lt;/p&gt;

&lt;p&gt;The most important engineering work happening today at Fly.io is still DX, not RX; it&amp;rsquo;s managed Postgres (MPG). We&amp;rsquo;re a public cloud platform designed by humans, and, for the moment, for humans. But more robots are coming, and we&amp;rsquo;ll need to figure out how to deal with that. Fuckin&amp;rsquo; robots.        &lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Operationalizing Macaroons</title>
    <link rel="alternate" href="https://fly.io/blog/operationalizing-macaroons/"/>
    <id>https://fly.io/blog/operationalizing-macaroons/</id>
    <published>2025-03-27T00:00:00+00:00</published>
    <updated>2025-03-27T23:16:00+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/operationalizing-macaroons/assets/occult-circle.jpg"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;We’re Fly.io, a security bearer token company with a public cloud problem. You can read more about what our platform does (Docker container goes in, virtual machine in Singapore comes out), but this is just an engineering deep-dive into how we make our security tokens work. It’s a tokens nerd post.&lt;/p&gt;
&lt;/div&gt;&lt;h2 id='1' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#1' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;1&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;We’ve spent &lt;a href='https://fly.io/blog/api-tokens-a-tedious-survey/' title=''&gt;too much time&lt;/a&gt; talking about &lt;a href='https://fly.io/blog/tokenized-tokens/' title=''&gt;security tokens&lt;/a&gt;, and about &lt;a href='https://fly.io/blog/macaroons-escalated-quickly/' title=''&gt;Macaroon tokens&lt;/a&gt; &lt;a href='https://github.com/superfly/macaroon/blob/main/macaroon-thought.md' title=''&gt;in particular&lt;/a&gt;. Writing another Macaroon treatise was not on my calendar. But we’re handing off our internal Macaroon project to a new internal owner, and while truing up our operations manuals for these systems, I found myself writing a giant post about them. So, why not share?&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;Can I sum up Macaroons in a short paragraph? Macaroon tokens are bearer tokens (like JWTs) that use a cute chained-HMAC construction that allows an end-user to take any existing token they have and scope it down, all on their own. You can minimize your token before every API operation so that you’re only ever transmitting the least amount of privilege needed for what you’re actually doing, even if the token you were issued was an admin token. And they have a user-serviceable plug-in interface! &lt;a href="https://fly.io/blog/macaroons-escalated-quickly/" title=""&gt;You’ll have to read the earlier post to learn more about that&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;&lt;div class="right-sidenote"&gt;&lt;p&gt;Yes, probably, we are.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;A couple years into being the Internet’s largest user of Macaroons, I can report (as many predicted) that for our users, the cool things about Macaroons are a mixed bag in practice. It’s very neat that users can edit their own tokens, or even email them to partners without worrying too much. But users don’t really take advantage of token features.&lt;/p&gt;

&lt;p&gt;But I’m still happy we did this, because Macaroon quirks have given us a bunch of unexpected wins in our infrastructure. Our internal token system has turned out to be one of the nicer parts of our platform. Here’s why.&lt;/p&gt;

&lt;p&gt;&lt;img alt="This should clear everything up." src="/blog/operationalizing-macaroons/assets/schematic-diagram.png" /&gt;&lt;/p&gt;
&lt;h2 id='2' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#2' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;2&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;As an operator, the most important thing to know about Macaroons is that they’re online-stateful; you need a database somewhere. A Macaroon token starts with a random field (a nonce) and the first thing you do when verifying a token is to look that nonce up in a database. So one of the most important details of a Macaroon implementation is where that database lives.&lt;/p&gt;

&lt;p&gt;I can tell you one place we’re not OK with it living: in our primary API cluster.&lt;/p&gt;

&lt;p&gt;There’s several reasons for that. Some of them are about scalability and reliability: far and away the most common failure mode of an outage on our platform is “deploys are broken”, and those failures are usually caused by API instability. It would not be OK if “deploys are broken” transitively meant “deployed apps can’t use security tokens”. But the biggest reason is security: root secrets for Macaroon tokens are hazmat, and a basic rule of thumb in secure design is: keep hazmat away from complicated code.&lt;/p&gt;

&lt;p&gt;So we created a deliberately simple system to manage token data. It’s called &lt;code&gt;tkdb&lt;/code&gt;.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;LiteFS: primary/replica distributed SQLite; Litestream: PITR SQLite replication to object storage; both work with unmodified SQLite libraries.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;&lt;code&gt;tkdb&lt;/code&gt; is about 5000 lines of Go code that manages a SQLite database that is in turn managed by &lt;a href='https://fly.io/docs/litefs/' title=''&gt;LiteFS&lt;/a&gt; and &lt;a href='https://litestream.io/' title=''&gt;Litestream&lt;/a&gt;. It runs on isolated hardware (in the US, Europe, and Australia) and records in the database are encrypted with an injected secret. LiteFS gives us subsecond replication from our US primary to EU and AU, allows us to shift the primary to a different region, and gives us point-in-time recovery of the database.&lt;/p&gt;

&lt;p&gt;We’ve been running Macaroons for a couple years now, and the entire &lt;code&gt;tkdb&lt;/code&gt; database is just a couple dozen megs large. Most of that data isn’t real. A full PITR recovery of the database takes just seconds. We use SQLite for a lot of our infrastructure, and this is one of the very few well-behaved databases we have.&lt;/p&gt;

&lt;p&gt;That’s in large part a consequence of the design of Macaroons. There’s actually not much for us to store! The most complicated possible Macaroon still chains up to a single root key (we generate a key per Fly.io “organization”; you don&amp;rsquo;t share keys with your neighbors), and everything that complicates that Macaroon happens “offline”. We take advantage of  &amp;ldquo;attenuation&amp;rdquo; far more than our users do.&lt;/p&gt;
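The chaining that makes this work can be sketched in a few lines of Go. This is a toy modeled on the chained-HMAC construction from the Macaroons paper, not the actual `superfly/macaroon` API; field names and caveat formats are illustrative.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"fmt"
)

type macaroon struct {
	nonce   []byte   // looked up in tkdb-style storage at verification time
	caveats [][]byte // restrictions, e.g. "app=myapp"
	tag     []byte   // HMAC chained over the nonce, then each caveat in order
}

func mac(key, msg []byte) []byte {
	h := hmac.New(sha256.New, key)
	h.Write(msg)
	return h.Sum(nil)
}

// mint creates a fresh token: tag = HMAC(rootKey, nonce). Only the service
// holds rootKey (one per organization, in Fly.io's scheme).
func mint(rootKey, nonce []byte) macaroon {
	return macaroon{nonce: nonce, tag: mac(rootKey, nonce)}
}

// attenuate adds a caveat entirely offline: the new tag is HMAC(oldTag, caveat),
// so the holder needs no secret to scope a token down.
func attenuate(m macaroon, caveat string) macaroon {
	return macaroon{
		nonce:   m.nonce,
		caveats: append(append([][]byte{}, m.caveats...), []byte(caveat)),
		tag:     mac(m.tag, []byte(caveat)),
	}
}

// verify recomputes the whole chain from the root key. (A real verifier also
// checks each caveat against the request context; omitted here.)
func verify(rootKey []byte, m macaroon) bool {
	tag := mac(rootKey, m.nonce)
	for _, c := range m.caveats {
		tag = mac(tag, c)
	}
	return hmac.Equal(tag, m.tag)
}

func main() {
	root := []byte("per-org-root-key")
	t := attenuate(mint(root, []byte("nonce-1")), "app=myapp")
	fmt.Println(verify(root, t))
}
```

Note what the service has to store: just the root key and the nonce. All the attenuation happens on the client side, which is why the `tkdb` database stays so small.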

&lt;p&gt;The result is that database writes are relatively rare and very simple: we just need to record an HMAC key when Fly.io organizations are created (that is, roughly, when people sign up for the service and actually do a deploy). That, and revocation lists (more on that later), which make up most of the data.&lt;/p&gt;
&lt;h2 id='3' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#3' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;3&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Talking to &lt;code&gt;tkdb&lt;/code&gt; from the rest of our platform is complicated, for historical reasons.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;NATS is fine, we just don’t really need it.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Ben Toews is responsible for most of the good things about this implementation. When he inherited the v0 Macaroons code from me, we were in the middle of a weird love affair with &lt;a href='https://nats.io/' title=''&gt;NATS&lt;/a&gt;, the messaging system. So &lt;code&gt;tkdb&lt;/code&gt; exported an RPC API over NATS messages.&lt;/p&gt;

&lt;p&gt;Our product security team can’t trust NATS (it’s not our code). That means a vulnerability in NATS can’t result in us losing control of all our tokens, or allow attackers to spoof authentication. Which in turn means you can’t run a plaintext RPC protocol for &lt;code&gt;tkdb&lt;/code&gt; over NATS; attackers would just spoof “yes this token is fine” messages.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;I highly recommend implementing Noise; &lt;a href="http://www.noiseprotocol.org/noise.html" title=""&gt;the spec&lt;/a&gt; is kind of a joy in a way you can’t appreciate until you use it, and it’s educational.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;But you can’t just run TLS over NATS; NATS is a message bus, not a streaming secure channel. So I did the hipster thing and implemented &lt;a href='http://www.noiseprotocol.org/noise.html' title=''&gt;Noise&lt;/a&gt;. We export a “verification” API, and a “signing” API for minting new tokens. Verification uses &lt;code&gt;Noise_IK&lt;/code&gt; (which works like normal TLS) — anybody can verify, but everyone needs to prove they’re talking to the real &lt;code&gt;tkdb&lt;/code&gt;. Signing uses &lt;code&gt;Noise_KK&lt;/code&gt; (which works like mTLS) — only a few components in our system can mint tokens, and they get a special client key.&lt;/p&gt;

&lt;p&gt;A little over a year ago, &lt;a href='https://fly.io/blog/the-exit-interview-jp/' title=''&gt;JP&lt;/a&gt; led an effort to replace NATS with HTTP, which is how you talk to &lt;code&gt;tkdb&lt;/code&gt; today. Out of laziness, we kept the Noise stuff, which means the interface to &lt;code&gt;tkdb&lt;/code&gt; is now HTTP/Noise. This is a design smell, but the security model is nice: across many thousands of machines, there are only a handful with the cryptographic material needed to mint a new Macaroon token. Neat!&lt;/p&gt;

&lt;p&gt;&lt;code&gt;tkdb&lt;/code&gt; is a Fly App (albeit deployed in special Fly-only isolated regions). Our infrastructure talks to it over “&lt;a href='https://fly.io/docs/networking/flycast/' title=''&gt;FlyCast&lt;/a&gt;”, which is our internal Anycast service. If you’re in Singapore, you’ll probably get routed to the Australian &lt;code&gt;tkdb&lt;/code&gt;. If Australia falls over, you’ll get routed to the closest backup. The proxy that implements FlyCast is smart, as is the &lt;code&gt;tkdb&lt;/code&gt; client library, which will do exponential backoff retries transparently.&lt;/p&gt;

&lt;p&gt;Even with all that, we don’t like that Macaroon token verification is &amp;ldquo;online&amp;rdquo;. When you operate a global public cloud, one of the first things you learn is that &lt;a href='https://fly.io/blog/the-5-hour-content-delivery-network/' title=''&gt;the global Internet sucks&lt;/a&gt;. Connectivity breaks all the time, and we’re paranoid about it. It’s painful for us that token verification can imply transoceanic links. Lovecraft was right about the oceans! Stay away!&lt;/p&gt;

&lt;p&gt;Our solution to this is caching. Macaroons, as it turns out, cache beautifully. That’s because once you’ve seen and verified a Macaroon, you have enough information to verify any more-specific Macaroon that descends from it; that’s a property of &lt;a href='https://research.google/pubs/macaroons-cookies-with-contextual-caveats-for-decentralized-authorization-in-the-cloud/' title=''&gt;their chaining HMAC construction&lt;/a&gt;. Our client libraries cache verifications, and the cache ratio for verification is over 98%.&lt;/p&gt;
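The cache-friendly property works roughly like this: a verifier that has already checked a token can hold on to its tag and then validate any further-attenuated descendant using only that cached tag — no root key, no round trip to the database. A hedged sketch, reusing the same toy chained-HMAC layout (all names illustrative):

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"fmt"
)

func mac(key, msg []byte) []byte {
	h := hmac.New(sha256.New, key)
	h.Write(msg)
	return h.Sum(nil)
}

// verifyDescendant checks a token that extends an already-verified prefix:
// cachedTag is the tag of the verified ancestor, extra are the caveats added
// since then, and tag is the candidate token's tag. The root key never appears.
func verifyDescendant(cachedTag []byte, extra [][]byte, tag []byte) bool {
	t := cachedTag
	for _, c := range extra {
		t = mac(t, c)
	}
	return hmac.Equal(t, tag)
}

func main() {
	// Simulate the tag of a token we verified earlier (and cached)...
	ancestor := mac([]byte("root"), []byte("nonce"))
	// ...and a descendant the client attenuated offline since then.
	descendant := mac(ancestor, []byte("machine=m-1234"))
	fmt.Println(verifyDescendant(ancestor, [][]byte{[]byte("machine=m-1234")}, descendant))
}
```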
&lt;h2 id='4' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#4' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;4&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;a href='http://cryto.net/~joepie91/blog/2016/06/13/stop-using-jwt-for-sessions/' title=''&gt;Revocation isn’t a corner case&lt;/a&gt;. It can’t be an afterthought. We’re potentially revoking tokens any time a user logs out. If that doesn’t work reliably, you wind up with “cosmetic logout”, which is a real vulnerability. When we kill a token, it needs to stay dead.&lt;/p&gt;

&lt;p&gt;Our revocation system is simple. It’s this table:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative "&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-i3kxbqgm"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-i3kxbqgm"&gt;        CREATE TABLE IF NOT EXISTS blacklist ( 
        nonce               BLOB NOT NULL UNIQUE, 
        required_until      DATETIME,
        created_at          DATETIME DEFAULT CURRENT_TIMESTAMP
        );
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;When we need a token to be dead, we have our primary API do a call to the &lt;code&gt;tkdb&lt;/code&gt; “signing” RPC service for &lt;code&gt;revoke&lt;/code&gt;. &lt;code&gt;revoke&lt;/code&gt; takes the random nonce from the beginning of the Macaroon, discarding the rest, and adds it to the blacklist. Every Macaroon in the lineage of that nonce is now dead; we check the blacklist before verifying tokens.&lt;/p&gt;
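In sketch form, the check is just a set lookup that gates verification (an in-memory set stands in for the SQLite table here; this is not the real `tkdb` code):

```go
package main

import "fmt"

// revoker mirrors the blacklist: a set of dead nonces. Because every
// attenuated token inherits its ancestor's nonce, killing one nonce kills
// the entire lineage.
type revoker struct {
	blacklist map[string]struct{}
}

func (r *revoker) revoke(nonce []byte) {
	r.blacklist[string(nonce)] = struct{}{}
}

// allowed runs before any cryptographic verification: a blacklisted nonce
// short-circuits without doing any HMAC work.
func (r *revoker) allowed(nonce []byte) bool {
	_, dead := r.blacklist[string(nonce)]
	return !dead
}

func main() {
	r := &revoker{blacklist: map[string]struct{}{}}
	fmt.Println(r.allowed([]byte("n1")))
	r.revoke([]byte("n1"))
	fmt.Println(r.allowed([]byte("n1")))
}
```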

&lt;p&gt;The obvious challenge here is caching; over 98% of our validation requests never hit &lt;code&gt;tkdb&lt;/code&gt;. We certainly don’t want to propagate the blacklist database to 35 regions around the globe.&lt;/p&gt;

&lt;p&gt;Instead, the &lt;code&gt;tkdb&lt;/code&gt; “verification” API exports an endpoint that provides a feed of revocation notifications. Our client library “subscribes” to this API (really, it just polls). Macaroons are revoked regularly (but not constantly), and when that happens, clients notice and prune their caches.&lt;/p&gt;

&lt;p&gt;If clients lose connectivity to &lt;code&gt;tkdb&lt;/code&gt;, past some threshold interval, they just dump their entire cache, forcing verification to happen at &lt;code&gt;tkdb&lt;/code&gt;.&lt;/p&gt;
&lt;h2 id='5' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#5' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;5&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;A place where we’ve gotten a lot of internal mileage out of Macaroon features is service tokens. Service tokens are tokens used by code, rather than humans; almost always, a service token is something that is stored alongside running application code.&lt;/p&gt;

&lt;p&gt;An important detail of Fly.io’s Macaroons is the distinction between a “permissions” token and an “authentication” token. Macaroons by themselves express authorization, not authentication.&lt;/p&gt;

&lt;p&gt;That’s a useful constraint, and we want to honor it. By requiring a separate token for authentication, we minimize the impact of having the permissions token stolen; you can’t use it without authentication, so really it’s just like a mobile signed IAM policy expression. Neat!&lt;/p&gt;

&lt;p&gt;The way we express authentication is with a third-party caveat (&lt;a href='https://fly.io/blog/macaroons-escalated-quickly/#4' title=''&gt;see the old post for details&lt;/a&gt;). Your main Fly.io Macaroon will have a caveat saying “this token is only valid if accompanied by the discharge token for a user in your organization from our authentication system”. Our authentication system does the login dance and issues those discharges.&lt;/p&gt;

&lt;p&gt;This is exactly what you want for user tokens and not at all what you want for a service token: we don’t want running code to store those authenticator tokens, because they’re hazardous.&lt;/p&gt;

&lt;p&gt;The solution we came up with for service tokens is simple: &lt;code&gt;tkdb&lt;/code&gt; exports an API that uses its access to token secrets to strip off the third-party authentication caveat. To call into that API, you have to present a valid discharging authentication token; that is, you have to prove you could already have done whatever the token said. &lt;code&gt;tkdb&lt;/code&gt; returns a new token with all the previous caveats, minus the expiration (you don’t usually want service tokens to expire).&lt;/p&gt;

&lt;p&gt;OK, so we’ve managed to transform a tuple &lt;code&gt;(unscary-token, scary-token)&lt;/code&gt; into the new tuple &lt;code&gt;(scary-token)&lt;/code&gt;. Not so impressive. But hold on: the recipient of &lt;code&gt;scary-token&lt;/code&gt; can attenuate it further: we can lock it to a particular instance of &lt;code&gt;flyd&lt;/code&gt;, or to a particular Fly Machine. Which means exfiltrating it doesn’t do you any good; to use it, you have to control the environment it’s intended to be used in.&lt;/p&gt;

&lt;p&gt;The net result of this scheme is that a compromised physical host will only give you access to tokens that have been used on that worker, which is a very nice property. Another way to look at it: every token used in production is traceable in some way to a valid token a user submitted. Neat!&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;All the cool spooky secret store names were taken.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;We do a similar dance with Pet Semetary, our internal Vault replacement. Petsem manages user secrets for applications, such as Postgres connection strings. Petsem is its own Macaroon authority (it issues its own Macaroons with its own permissions system), and to do something with a secret, you need one of those Petsem-minted Macaroons.&lt;/p&gt;

&lt;p&gt;Our primary API servers field requests from users to set secrets for their apps. So the API has a Macaroon that allows secrets writes. But it doesn&amp;rsquo;t allow reads: there’s no API call to dump your secrets, because our API servers don’t have that privilege. So far, so good.&lt;/p&gt;

&lt;p&gt;But when we boot up a Fly Machine, we need to inject the appropriate user secrets into it at boot; &lt;em&gt;something&lt;/em&gt; needs a Macaroon that can read secrets. That “something” is &lt;code&gt;flyd&lt;/code&gt;, our orchestrator, which runs on every worker server in our fleet.&lt;/p&gt;

&lt;p&gt;Clearly, we can’t give every &lt;code&gt;flyd&lt;/code&gt; a Macaroon that reads every user’s secret. Most users will never deploy anything on any given worker, and we can’t have a security model that collapses down to “every worker is equally privileged”.&lt;/p&gt;

&lt;p&gt;Instead, the “read secret” Macaroon that &lt;code&gt;flyd&lt;/code&gt; gets has a third-party caveat attached to it, which is dischargeable only by talking to &lt;code&gt;tkdb&lt;/code&gt; and proving (with normal Macaroon tokens) that you have permissions for the org whose secrets you want to read. Once again, access is traceable to an end-user action, and minimized across our fleet. Neat!&lt;/p&gt;
&lt;h2 id='6' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#6' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;6&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Our token systems have some of the best telemetry in the whole platform.&lt;/p&gt;

&lt;p&gt;Most of that is down to &lt;a href='http://opentelemetry.io/' title=''&gt;OpenTelemetry&lt;/a&gt; and &lt;a href='https://www.honeycomb.io/' title=''&gt;Honeycomb&lt;/a&gt;. From the moment a request hits our API server through the moment &lt;code&gt;tkdb&lt;/code&gt; responds to it, oTel &lt;a href='https://opentelemetry.io/docs/concepts/context-propagation/' title=''&gt;context propagation&lt;/a&gt; gives us a single narrative about what’s happening.&lt;/p&gt;

&lt;p&gt;&lt;a href='https://fly.io/blog/the-exit-interview-jp/' title=''&gt;I was a skeptic about oTel&lt;/a&gt;. It’s really, really expensive. And, not to put too fine a point on it, oTel really cruds up our code. Once, I was an “80% of the value of tracing, we can get from logs and metrics” person. But I was wrong.&lt;/p&gt;

&lt;p&gt;Errors in our token system are rare. Usually, they’re just early indications of network instability, and between caching and FlyCast, we mostly don’t have to care about those alerts. When we do, it’s because something has gone so sideways that we’d have to care anyways. The &lt;code&gt;tkdb&lt;/code&gt; code is remarkably stable and there hasn’t been an incident intervention with our token system in over a year.&lt;/p&gt;

&lt;p&gt;Past oTel, and the standard logging and Prometheus metrics every Fly App gets for free, we also have a complete audit trail for token operations, in a permanently retained OpenSearch cluster index. Since virtually all the operations that happen on our platform are mediated by Macaroons, this audit trail is itself pretty powerful.&lt;/p&gt;
&lt;h2 id='7' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#7' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;7&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;So, that&amp;rsquo;s pretty much it. The moral of the story for us is, Macaroons have a lot of neat features, our users mostly don&amp;rsquo;t care about them — that may even be a good thing — but we get a lot of use out of them internally.&lt;/p&gt;

&lt;p&gt;As an engineering culture, we&amp;rsquo;re allergic to &amp;ldquo;microservices&amp;rdquo;, and we flinched a bit at the prospect of adding a specific service just to manage tokens. But it&amp;rsquo;s pulled its weight, and not really added any drama at all. We have at this point a second dedicated security service (Petsem), and even though they sort of rhyme with each other, we&amp;rsquo;ve got no plans to merge them. &lt;a href='https://how.complexsystems.fail/#10' title=''&gt;Rule #10&lt;/a&gt; and all that.&lt;/p&gt;

&lt;p&gt;Oh, and a total victory for LiteFS, Litestream, and infrastructure SQLite. Which, after managing an infrastructure SQLite project that routinely ballooned to tens of gigabytes and occasionally threatened service outages, is lovely to see.&lt;/p&gt;

&lt;p&gt;Macaroons! If you&amp;rsquo;d asked us a year ago, we&amp;rsquo;d have said the jury was still out on whether they were a good move. But then Ben Toews spent a year making them awesome, and so they are. &lt;a href='https://github.com/superfly/macaroon' title=''&gt;Most of the code is open source&lt;/a&gt;!&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Taming A Voracious Rust Proxy</title>
    <link rel="alternate" href="https://fly.io/blog/taming-rust-proxy/"/>
    <id>https://fly.io/blog/taming-rust-proxy/</id>
    <published>2025-02-26T00:00:00+00:00</published>
    <updated>2025-03-20T21:16:40+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/taming-rust-proxy/assets/happy-crab-cover.jpg"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;Here’s a fun bug.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The basic idea of our service is that we run containers for our users, as hardware-isolated virtual machines (Fly Machines), on hardware we own around the world. What makes that interesting is that we also connect every Fly Machine to a global Anycast network. If your app is running in Hong Kong and Dallas, and a request for it arrives in Singapore, we&amp;rsquo;ll route it to &lt;code&gt;HKG&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Our own hardware fleet is roughly divided into two kinds of servers: edges, which receive incoming requests from the Internet, and workers, which run Fly Machines. Edges exist almost solely to run a Rust program called &lt;code&gt;fly-proxy&lt;/code&gt;, the router at the heart of our Anycast network.&lt;/p&gt;

&lt;p&gt;So: a week or so ago, we flag an incident. Lots of things generate incidents: synthetic monitoring failures, metric thresholds, health check failures. In this case two edge tripwires tripped: elevated &lt;code&gt;fly-proxy&lt;/code&gt; HTTP errors, and skyrocketing CPU utilization, on a couple hosts in &lt;code&gt;IAD&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Our incident process is pretty ironed out at this point. We created an incident channel (we ❤️ &lt;a href='https://rootly.com/' title=''&gt;Rootly&lt;/a&gt; for this, &lt;a href='https://rootly.com/' title=''&gt;seriously check out Rootly&lt;/a&gt;, an infra MVP here for years now), and incident responders quickly concluded that, while something hinky was definitely going on, the platform was fine. We have a lot of edges, and we&amp;rsquo;ve also recently converted many of our edge servers to significantly beefier hardware.&lt;/p&gt;

&lt;p&gt;Bouncing &lt;code&gt;fly-proxy&lt;/code&gt; clears the problem up on an affected proxy. But this wouldn&amp;rsquo;t be much of an interesting story if the problem didn&amp;rsquo;t later come back. So, for some number of hours, we&amp;rsquo;re in an annoying steady-state of getting paged and bouncing proxies. &lt;/p&gt;

&lt;p&gt;While this is happening, Pavel, on our proxy team, pulls a profile from an angry proxy. 
&lt;img alt="A flamegraph profile, described better in the prose anyways." src="/blog/taming-rust-proxy/assets/proxy-profile.jpg" /&gt;
So, this is fuckin&amp;rsquo; weird: a huge chunk of the profile is dominated by Rust &lt;code&gt;tracing&lt;/code&gt;&amp;rsquo;s &lt;code&gt;Subscriber&lt;/code&gt;. But that doesn&amp;rsquo;t make sense. The entire point of Rust &lt;code&gt;tracing&lt;/code&gt;, which generates fine-grained span records for program activity, is that &lt;code&gt;entering&lt;/code&gt; and &lt;code&gt;exiting&lt;/code&gt; a span is very, very fast. &lt;/p&gt;

&lt;p&gt;If the mere act of &lt;code&gt;entering&lt;/code&gt; a span in a Tokio stack is chewing up a significant amount of CPU, something has gone haywire: the actual code being traced must be doing next to nothing.&lt;/p&gt;
&lt;h2 id='a-quick-refresher-on-async-rust' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#a-quick-refresher-on-async-rust' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;A Quick Refresher On Async Rust&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;So in Rust, like a lot of &lt;code&gt;async/await&lt;/code&gt; languages, you&amp;rsquo;ve got &lt;code&gt;Futures&lt;/code&gt;. A &lt;code&gt;Future&lt;/code&gt; is a type that represents the future value of an asynchronous computation, like reading from a socket. &lt;code&gt;Futures&lt;/code&gt; are state machines, and they&amp;rsquo;re lazy: they expose one basic operation, &lt;code&gt;poll&lt;/code&gt;, which an executor (like Tokio) calls to advance the state machine. That &lt;code&gt;poll&lt;/code&gt; returns whether the &lt;code&gt;Future&lt;/code&gt; is still &lt;code&gt;Pending&lt;/code&gt;, or &lt;code&gt;Ready&lt;/code&gt; with a result.&lt;/p&gt;

&lt;p&gt;In theory, you could build an executor that drove a bunch of &lt;code&gt;Futures&lt;/code&gt; just by storing them in a list and busypolling each of them, round robin, until they return &lt;code&gt;Ready&lt;/code&gt;. This executor would defeat much of the purpose of asynchronous programming, so no real executor works that way.&lt;/p&gt;

&lt;p&gt;Instead, a runtime like Tokio integrates &lt;code&gt;Futures&lt;/code&gt; with an event loop (on &lt;a href='https://man7.org/linux/man-pages/man7/epoll.7.html' title=''&gt;epoll&lt;/a&gt; or &lt;a href='https://en.wikipedia.org/wiki/Kqueue' title=''&gt;kqueue&lt;/a&gt;) and, when calling &lt;code&gt;poll&lt;/code&gt;, passes a &lt;code&gt;Waker&lt;/code&gt;. The &lt;code&gt;Waker&lt;/code&gt; is an abstract handle that allows the &lt;code&gt;Future&lt;/code&gt; to instruct the Tokio runtime to call &lt;code&gt;poll&lt;/code&gt;, because something has happened.&lt;/p&gt;

&lt;p&gt;To complicate things: an ordinary &lt;code&gt;Future&lt;/code&gt; is a one-shot value. Once it&amp;rsquo;s &lt;code&gt;Ready&lt;/code&gt;, it can&amp;rsquo;t be &lt;code&gt;polled&lt;/code&gt; anymore. But with network programming, that&amp;rsquo;s usually not what you want: data usually arrives in streams, which you want to track and make progress on as you can. So async Rust provides &lt;code&gt;AsyncRead&lt;/code&gt; and &lt;code&gt;AsyncWrite&lt;/code&gt; traits, which build on &lt;code&gt;Futures&lt;/code&gt;, and provide methods like &lt;code&gt;poll_read&lt;/code&gt; that return &lt;code&gt;Ready&lt;/code&gt; &lt;em&gt;every time&lt;/em&gt; there&amp;rsquo;s data ready. &lt;/p&gt;

&lt;p&gt;So far so good? OK. Now, there are two footguns in this design. &lt;/p&gt;

&lt;p&gt;The first footgun is that a &lt;code&gt;poll&lt;/code&gt; of a &lt;code&gt;Future&lt;/code&gt; that isn&amp;rsquo;t &lt;code&gt;Ready&lt;/code&gt; wastes cycles, and, if you have a bug in your code and that &lt;code&gt;Pending&lt;/code&gt; poll happens to trip a &lt;code&gt;Waker&lt;/code&gt;, you&amp;rsquo;ll slip into an infinite loop. That&amp;rsquo;s easy to see.&lt;/p&gt;
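The first footgun in miniature (a sketch; `self_waking` is a made-up name, and the capped loop stands in for a real executor, which re-polls on every wake, forever):

```rust
use std::future::Future;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Hand-rolled no-op Waker, just enough to construct a Context.
fn noop_waker() -> Waker {
    fn clone(_: *const ()) -> RawWaker { RawWaker::new(std::ptr::null(), &VTABLE) }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

// A future that trips its Waker on every Pending poll. An executor dutifully
// re-polls on every wake, so the task stays permanently runnable while making
// progress exactly never.
fn self_waking() -> impl Future<Output = ()> {
    std::future::poll_fn(|cx| {
        cx.waker().wake_by_ref(); // schedules another poll immediately...
        Poll::<()>::Pending       // ...which will do exactly this again
    })
}

fn main() {
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    let mut fut = std::pin::pin!(self_waking());
    // Cap the loop: a real executor would spin here forever.
    let mut wasted_polls = 0;
    for _ in 0..10_000 {
        if fut.as_mut().poll(&mut cx).is_pending() {
            wasted_polls += 1;
        }
    }
    println!("wasted polls: {wasted_polls}");
}
```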

&lt;p&gt;The second and more insidious footgun is that an &lt;code&gt;AsyncRead&lt;/code&gt; can &lt;code&gt;poll_read&lt;/code&gt; to a &lt;code&gt;Ready&lt;/code&gt; that doesn&amp;rsquo;t actually progress its underlying state machine. Since the idea of &lt;code&gt;AsyncRead&lt;/code&gt; is that you keep &lt;code&gt;poll_reading&lt;/code&gt; until it stops being &lt;code&gt;Ready&lt;/code&gt;, this too is an infinite loop.&lt;/p&gt;
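A sketch of the second footgun (`SimpleAsyncRead` and `BuggyReader` are hypothetical stand-ins; tokio's real `AsyncRead` signature differs, but the failure mode is the same):

```rust
use std::io;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Hand-rolled no-op Waker, just enough to construct a Context.
fn noop_waker() -> Waker {
    fn clone(_: *const ()) -> RawWaker { RawWaker::new(std::ptr::null(), &VTABLE) }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

// A simplified stand-in for tokio's AsyncRead.
trait SimpleAsyncRead {
    fn poll_read(&mut self, cx: &mut Context<'_>, buf: &mut [u8]) -> Poll<io::Result<usize>>;
}

// The bug pattern: once its buffer drains, this reader keeps answering
// Ready(Ok(0)) even though the stream isn't closed. The correct move is to
// return Pending and register the Waker; a Ready that makes no progress
// turns the caller's drain loop into a busy-loop.
struct BuggyReader {
    buffered: Vec<u8>,
    pos: usize,
}

impl SimpleAsyncRead for BuggyReader {
    fn poll_read(&mut self, _cx: &mut Context<'_>, buf: &mut [u8]) -> Poll<io::Result<usize>> {
        let remaining = &self.buffered[self.pos..];
        let n = remaining.len().min(buf.len());
        buf[..n].copy_from_slice(&remaining[..n]);
        self.pos += n; // n == 0 once drained: Ready, but no progress
        Poll::Ready(Ok(n))
    }
}

fn main() {
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    let mut r = BuggyReader { buffered: b"hello".to_vec(), pos: 0 };
    let mut buf = [0u8; 2];
    let mut spurious = 0;
    // A drain loop with a safety cap standing in for "infinite loop".
    for _ in 0..1000 {
        if let Poll::Ready(Ok(0)) = r.poll_read(&mut cx, &mut buf) {
            spurious += 1;
        }
    }
    println!("spurious readies: {spurious}");
}
```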

&lt;p&gt;When we look at our profiles, what we see are samples that almost terminate in libc, but spend next to no time in the kernel doing actual I/O. The obvious explanation: we&amp;rsquo;ve entered lots of &lt;code&gt;poll&lt;/code&gt; functions, but they&amp;rsquo;re doing almost nothing and returning immediately.&lt;/p&gt;
&lt;h2 id='jaccuse' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#jaccuse' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;J&amp;#39;accuse!&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Wakeup issues are annoying to debug. But the flamegraph gives us the fully qualified type of the &lt;code&gt;Future&lt;/code&gt; we&amp;rsquo;re polling:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative rust"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-mhjra6vu"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-mhjra6vu"&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="nn"&gt;fp_io&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;copy&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Duplex&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="nn"&gt;fp_io&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;reusable_reader&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;ReusableReader&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nn"&gt;fp_tcp&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;peek&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;PeekableReader&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nn"&gt;tokio_rustls&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;server&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;TlsStream&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nn"&gt;fp_tcp_metered&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;MeteredIo&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nn"&gt;fp_tcp&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;peek&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;PeekableReader&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nn"&gt;fp_tcp&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;permitted&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;PermittedTcpStream&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nn"&gt;connect&lt;/span&gt;&lt;span 
class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Conn&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nn"&gt;tokio&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;net&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;tcp&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;TcpStream&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This looks like a lot, but much of it is just wrapper types we wrote ourselves, and those wrappers don&amp;rsquo;t do anything interesting. What&amp;rsquo;s left to audit:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Duplex&lt;/code&gt;, the outermost type, one of ours, &lt;em&gt;and&lt;/em&gt;
&lt;/li&gt;&lt;li&gt;&lt;code&gt;TlsStream&lt;/code&gt;, from &lt;a href='https://github.com/rustls/rustls' title=''&gt;Rustls&lt;/a&gt;.
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;Duplex&lt;/code&gt; is a beast. It&amp;rsquo;s the core I/O state machine for proxying between connections. It&amp;rsquo;s not easy to reason about in detail. But: it also doesn&amp;rsquo;t do anything directly with a &lt;code&gt;Waker&lt;/code&gt;; it&amp;rsquo;s built around &lt;code&gt;AsyncRead&lt;/code&gt; and &lt;code&gt;AsyncWrite&lt;/code&gt;. It hasn&amp;rsquo;t changed recently and we can&amp;rsquo;t trigger misbehavior in it.&lt;/p&gt;

&lt;p&gt;That leaves &lt;code&gt;TlsStream&lt;/code&gt;. &lt;code&gt;TlsStream&lt;/code&gt; is an ultra-important, load-bearing type in the Rust ecosystem. Everybody uses it. Could it harbor an async Rust footgun? Turns out, it did!&lt;/p&gt;

&lt;p&gt;Unlike our &lt;code&gt;Duplex&lt;/code&gt;, Rustls actually does have to get intimate with the underlying async executor. And, looking through the repository, Pavel uncovers &lt;a href='https://github.com/rustls/tokio-rustls/issues/72' title=''&gt;this issue&lt;/a&gt;: sometimes, &lt;code&gt;TlsStreams&lt;/code&gt; in Rustls just spin out. And it turns out, what&amp;rsquo;s causing this is a TLS state machine bug: when a TLS session is orderly-closed, with a &lt;code&gt;CloseNotify&lt;/code&gt; &lt;code&gt;Alert&lt;/code&gt; record, the sender of that record has informed its counterparty that no further data will be sent. But if there&amp;rsquo;s still buffered data on the underlying connection, &lt;code&gt;TlsStream&lt;/code&gt; mishandles its &lt;code&gt;Waker&lt;/code&gt;, and we fall into a busy-loop.&lt;/p&gt;

&lt;p&gt;&lt;a href='https://github.com/rustls/rustls/pull/1950/files' title=''&gt;Pretty straightforward fix&lt;/a&gt;!&lt;/p&gt;
&lt;h2 id='what-actually-happened-to-us' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-actually-happened-to-us' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;What Actually Happened To Us&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Our partners in object storage, &lt;a href='https://www.tigrisdata.com/' title=''&gt;Tigris Data&lt;/a&gt;, were conducting some kind of load testing exercise. Some aspect of their testing system triggered the &lt;code&gt;TlsStream&lt;/code&gt; state machine bug, which locked up one or more &lt;code&gt;TlsStreams&lt;/code&gt; in the edge proxy handling whatever corner-casey stream they were sending.&lt;/p&gt;

&lt;p&gt;Tigris wasn&amp;rsquo;t generating a whole lot of traffic; tens of thousands of connections, tops. But all of them sent small HTTP bodies and then terminated early. We figured some of those connections errored out, and set up the &amp;ldquo;TLS CloseNotify happened before EOF&amp;rdquo; scenario. &lt;/p&gt;

&lt;p&gt;To be truer to the chronology, we knew pretty early on in our investigation that something Tigris was doing with their load testing was probably triggering the bug, and we got them to stop. After we worked it out, and Pavel deployed the fix, we told them to resume testing. No spin-outs.&lt;/p&gt;
&lt;h2 id='lessons-learned' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#lessons-learned' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Lessons Learned&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Keep your dependencies updated. Unless you shouldn&amp;rsquo;t keep your dependencies updated. I mean, if there&amp;rsquo;s a vulnerability (and, technically, this was a DoS vulnerability), always update. And if there&amp;rsquo;s an important bugfix, update. But if there isn&amp;rsquo;t an important bugfix, updating for the hell of it might also destabilize your project? So update maybe? Most of the time?&lt;/p&gt;

&lt;p&gt;Really, the truth of this is that keeping track of &lt;em&gt;what needs to be updated&lt;/em&gt; is valuable work. The updates themselves are pretty fast and simple, but the process and testing infrastructure to confidently metabolize dependency updates is not. &lt;/p&gt;

&lt;p&gt;Our other lesson here is that there&amp;rsquo;s an opportunity to spot these kinds of bugs more directly with our instrumentation. Spurious wakeups should be easy to spot, and triggering a metric when they happen should be cheap, because they&amp;rsquo;re not supposed to happen often. So that&amp;rsquo;s something we&amp;rsquo;ll go do now.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>We Were Wrong About GPUs</title>
    <link rel="alternate" href="https://fly.io/blog/wrong-about-gpu/"/>
    <id>https://fly.io/blog/wrong-about-gpu/</id>
    <published>2025-02-14T00:00:00+00:00</published>
    <updated>2025-02-14T23:25:21+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/wrong-about-gpu/assets/choices-choices-cover.webp"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;We’re building a public cloud, on hardware we own. We raised money to do that, and to place some bets; one of them: GPU-enabling our customers. A progress report: GPUs aren’t going anywhere, but: GPUs aren’t going anywhere.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;A couple years back, &lt;a href="https://fly.io/gpu"&gt;we put a bunch of chips down&lt;/a&gt; on the bet that people shipping apps to users on the Internet would want GPUs, so they could do AI/ML inference tasks. To make that happen, we created &lt;a href="https://fly.io/docs/gpus/getting-started-gpus/"&gt;Fly GPU Machines&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;A Fly Machine is a &lt;a href="https://fly.io/blog/docker-without-docker/"&gt;Docker/OCI container&lt;/a&gt; running inside a hardware-virtualized virtual machine somewhere on our global fleet of bare-metal worker servers. A GPU Machine is a Fly Machine with a hardware-mapped Nvidia GPU. It&amp;rsquo;s a Fly Machine that can do fast CUDA.&lt;/p&gt;

&lt;p&gt;Like everybody else in our industry, we were right about the importance of AI/ML. If anything, we underestimated its importance. But the product we came up with probably doesn&amp;rsquo;t fit the moment. It&amp;rsquo;s a bet that doesn&amp;rsquo;t feel like it&amp;rsquo;s paying off.&lt;/p&gt;

&lt;p&gt;&lt;strong class='font-semibold text-navy-950'&gt;If you&amp;rsquo;re using Fly GPU Machines, don&amp;rsquo;t freak out; we&amp;rsquo;re not getting rid of them.&lt;/strong&gt; But if you&amp;rsquo;re waiting for us to do something bigger with them, a v2 of the product, you&amp;rsquo;ll probably be waiting awhile.&lt;/p&gt;
&lt;h3 id='what-it-took' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-it-took' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;What It Took&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;GPU Machines were not a small project for us. Fly Machines run on an idiosyncratically small hypervisor (normally Firecracker, but for GPU Machines &lt;a href="https://github.com/cloud-hypervisor/cloud-hypervisor"&gt;Intel&amp;rsquo;s Cloud Hypervisor&lt;/a&gt;, a very similar Rust codebase that supports PCI passthrough). The Nvidia ecosystem is not geared to supporting micro-VM hypervisors.&lt;/p&gt;

&lt;p&gt;GPUs &lt;a href="https://googleprojectzero.blogspot.com/2020/09/attacking-qualcomm-adreno-gpu.html"&gt;terrified our security team&lt;/a&gt;. A GPU is just about the worst case hardware peripheral: intense multi-directional direct memory transfers&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;(not even bidirectional: in common configurations, GPUs talk to each other)&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;with arbitrary, end-user controlled computation, all operating outside our normal security boundary.&lt;/p&gt;

&lt;p&gt;We did a couple expensive things to mitigate the risk. We shipped GPUs on dedicated server hardware, so that GPU- and non-GPU workloads weren&amp;rsquo;t mixed. Because of that, the only reason for a Fly Machine to be scheduled on a GPU machine was that it needed a PCI BDF for an Nvidia GPU, and there&amp;rsquo;s a limited number of those available on any box. Those GPU servers were drastically less utilized and thus less cost-effective than our ordinary servers.&lt;/p&gt;

&lt;p&gt;We funded two very large security assessments, from &lt;a href="https://www.atredis.com/"&gt;Atredis&lt;/a&gt; and &lt;a href="https://tetrelsec.com/"&gt;Tetrel&lt;/a&gt;, to evaluate our GPU deployment. Matt Braun is writing up those assessments now. They were not cheap, and they took time.&lt;/p&gt;

&lt;p&gt;Security wasn&amp;rsquo;t directly the biggest cost we had to deal with, but it was an indirect cause for a subtle reason.&lt;/p&gt;

&lt;p&gt;We could have shipped GPUs very quickly by doing what Nvidia recommended: standing up a standard K8s cluster to schedule GPU jobs on. Had we taken that path, and let our GPU users share a single Linux kernel, we&amp;rsquo;d have been on Nvidia&amp;rsquo;s driver happy-path.&lt;/p&gt;

&lt;p&gt;Alternatively, we could have used a conventional hypervisor. Nvidia suggested VMware (heh). But we could have gotten things working had we used QEMU. We like QEMU fine, and could have talked ourselves into a security story for it, but the whole point of Fly Machines is that they take milliseconds to start. We could not have offered our desired Developer Experience on the Nvidia happy-path.&lt;/p&gt;

&lt;p&gt;Instead, we burned months trying (and ultimately failing) to get Nvidia&amp;rsquo;s host drivers working to map &lt;a href="https://www.nvidia.com/en-us/data-center/virtual-solutions/"&gt;virtualized GPUs&lt;/a&gt; into Intel Cloud Hypervisor. At one point, we hex-edited the closed-source drivers to trick them into thinking our hypervisor was QEMU.&lt;/p&gt;

&lt;p&gt;I&amp;rsquo;m not sure any of this really mattered in the end. There&amp;rsquo;s a segment of the market we weren&amp;rsquo;t ever really able to explore because Nvidia&amp;rsquo;s driver support kept us from thin-slicing GPUs. We&amp;rsquo;d have been able to put together a really cheap offering for developers if we hadn&amp;rsquo;t run up against that, and developers love &amp;ldquo;cheap&amp;rdquo;, but I can&amp;rsquo;t prove that those customers are real.&lt;/p&gt;

&lt;p&gt;On the other hand, we&amp;rsquo;re committed to delivering the Fly Machine DX for GPU workloads. Beyond the PCI/IOMMU drama, just getting an entire hardware GPU working in a Fly Machine was a lift. We needed Fly Machines that would come up with the right Nvidia drivers; our stack was built assuming that the customer&amp;rsquo;s OCI container almost entirely defined the root filesystem for a Machine. We had to engineer around that in our &lt;code&gt;flyd&lt;/code&gt; orchestrator. And almost everything people want to do with GPUs involves efficiently grabbing huge files full of model weights. Also annoying!&lt;/p&gt;

&lt;p&gt;And, of course, we bought GPUs. A lot of GPUs. Expensive GPUs.&lt;/p&gt;
&lt;h3 id='why-it-isnt-working' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#why-it-isnt-working' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Why It Isn&amp;rsquo;t Working&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;The biggest problem: developers don&amp;rsquo;t want GPUs. They don&amp;rsquo;t even want AI/ML models. They want LLMs. &lt;em&gt;System engineers&lt;/em&gt; may have smart, fussy opinions on how to get their models loaded with CUDA, and what the best GPU is. But &lt;em&gt;software developers&lt;/em&gt; don&amp;rsquo;t care about any of that. When a software developer shipping an app comes looking for a way for their app to deliver prompts to an LLM, you can&amp;rsquo;t just give them a GPU.&lt;/p&gt;

&lt;p&gt;For those developers, who probably make up most of the market, it doesn&amp;rsquo;t seem plausible for an insurgent public cloud to compete with OpenAI and Anthropic. Their APIs are fast enough, and developers thinking about performance in terms of &amp;ldquo;tokens per second&amp;rdquo; aren&amp;rsquo;t counting milliseconds.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;(you should all feel sympathy for us)&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;This makes us sad because we really like the point in the solution space we found. Developers shipping apps on Amazon will outsource to other public clouds to get cost-effective access to GPUs. But then they&amp;rsquo;ll faceplant trying to handle data and model weights, backhauling gigabytes (at significant expense) from S3. We have app servers, GPUs, and object storage all under the same top-of-rack switch. But inference latency just doesn&amp;rsquo;t seem to matter yet, so the market doesn&amp;rsquo;t care.&lt;/p&gt;

&lt;p&gt;Past that, and just considering the system engineers who do care about GPUs rather than LLMs: the hardware product/market fit here is really rough.&lt;/p&gt;

&lt;p&gt;People doing serious AI work want galactically huge amounts of GPU compute. A whole enterprise A100 is a compromise position for them; they want an SXM cluster of H100s.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;Near as we can tell, MIG gives you a UUID to talk to the host driver, not a PCI device.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;We think there&amp;rsquo;s probably a market for users doing lightweight ML work getting tiny GPUs. &lt;a href="https://www.nvidia.com/en-us/technologies/multi-instance-gpu/"&gt;This is what Nvidia MIG does&lt;/a&gt;, slicing a big GPU into arbitrarily small virtual GPUs. But for fully-virtualized workloads, it&amp;rsquo;s not baked; we can&amp;rsquo;t use it. And I&amp;rsquo;m not sure how many of those customers there are, or whether we&amp;rsquo;d get the density of customers per server that we need.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://fly.io/blog/cutting-prices-for-l40s-gpus-in-half"&gt;That leaves the L40S customers&lt;/a&gt;. There are a bunch of these! We dropped L40S prices last year, not because we were sour on GPUs but because they&amp;rsquo;re the one part we have in our inventory people seem to get a lot of use out of. We&amp;rsquo;re happy with them. But they&amp;rsquo;re just another kind of compute that some apps need; they&amp;rsquo;re not a driver of our core business. They&amp;rsquo;re not the GPU bet paying off.&lt;/p&gt;

&lt;p&gt;Really, all of this is just a long way of saying that for most software developers, &amp;ldquo;AI-enabling&amp;rdquo; their app is best done with API calls to things like Claude and GPT, Replicate and RunPod.&lt;/p&gt;
&lt;h3 id='what-did-we-learn' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-did-we-learn' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;What Did We Learn?&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;A very useful way to look at a startup is that it&amp;rsquo;s a race to learn stuff. So, what&amp;rsquo;s our report card?&lt;/p&gt;

&lt;p&gt;First off, when we embarked down this path in 2022, we were (like many other companies) operating in a sort of phlogiston era of AI/ML. The industry attention to AI had not yet collapsed around a small number of foundational LLM models. We expected there to be a diversity of &lt;em&gt;mainstream&lt;/em&gt; models, the world &lt;a href='https://github.com/elixir-nx/bumblebee' title=''&gt;Elixir Bumblebee&lt;/a&gt; looks forward to, where people pull different AI workloads off the shelf the same way they do Ruby gems.&lt;/p&gt;

&lt;p&gt;But &lt;a href='https://www.cursor.com/' title=''&gt;Cursor happened&lt;/a&gt;, and, as they say, how are you going to keep &amp;lsquo;em down on the farm once they&amp;rsquo;ve seen Karl Hungus? It seems much clearer where things are heading.&lt;/p&gt;

&lt;p&gt;GPUs were a test of a Fly.io company credo: as we think about core features, we design for 10,000 developers, not for 5-6. It took a minute, but the credo wins here: GPU workloads for the 10,001st developer are a niche thing.&lt;/p&gt;

&lt;p&gt;Another way to look at a startup is as a series of bets. We put a lot of chips down here. But the buy-in for this tournament gave us a lot of chips to play with. Never making a big bet of any sort isn&amp;rsquo;t a winning strategy. I&amp;rsquo;d rather we&amp;rsquo;d flopped the nut straight, but I think going in on this hand was the right call.&lt;/p&gt;

&lt;p&gt;A really important thing to keep in mind here, and something I think a lot of startup thinkers sleep on, is the extent to which this bet involved acquiring assets. Obviously, some of our &lt;a href='https://fly.io/blog/the-exit-interview-jp/' title=''&gt;costs here aren&amp;rsquo;t recoverable&lt;/a&gt;. But the hardware parts that aren&amp;rsquo;t generating revenue will ultimately get liquidated; like with &lt;a href='https://fly.io/blog/32-bit-real-estate/' title=''&gt;our portfolio of IPv4 addresses&lt;/a&gt;, I&amp;rsquo;m even more comfortable making bets backed by tradable assets with durable value.&lt;/p&gt;

&lt;p&gt;In the end, I don&amp;rsquo;t think GPU Fly Machines were going to be a hit for us no matter what we did. Because of that, one thing I&amp;rsquo;m very happy about is that we didn&amp;rsquo;t compromise the rest of the product for them. Security concerns slowed us down to where we probably learned what we needed to learn a couple months later than we could have otherwise, but we&amp;rsquo;re scaling back our GPU ambitions without having sacrificed &lt;a href='https://fly.io/blog/sandboxing-and-workload-isolation/' title=''&gt;any of our isolation story&lt;/a&gt;, and, ironically, GPUs &lt;em&gt;other people run&lt;/em&gt; are making that story a lot more important. The same thing goes for our Fly Machine developer experience.&lt;/p&gt;

&lt;p&gt;We started this company building a Javascript runtime for edge computing. We learned that our customers didn&amp;rsquo;t want a new Javascript runtime; they just wanted their native code to work. &lt;a href='https://news.ycombinator.com/item?id=22616857' title=''&gt;We shipped containers&lt;/a&gt;, and no convincing was needed. We were wrong about Javascript edge functions, and I think we were wrong about GPUs. That&amp;rsquo;s usually how we figure out the right answers:  by being wrong about a lot of stuff.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>The Exit Interview: JP Phillips</title>
    <link rel="alternate" href="https://fly.io/blog/the-exit-interview-jp/"/>
    <id>https://fly.io/blog/the-exit-interview-jp/</id>
    <published>2025-02-12T00:00:00+00:00</published>
    <updated>2025-02-14T21:30:41+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/the-exit-interview-jp/assets/bye-bye-little-sebastian-cover.webp"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;JP Phillips is off to greener, or at least calmer, pastures. He joined us 4 years ago to build the next generation of our orchestration system, and has been one of the anchors of our engineering team. His last day is today. We wanted to know what he was thinking, and figured you might too.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;&lt;em&gt;Question 1: Why, JP? Just why?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;LOL. When I looked at what I wanted to see from here in the next 3-4 years, it didn&amp;rsquo;t really match up with where we&amp;rsquo;re currently heading. Specifically, with our new focus on MPG &lt;em&gt;[Managed Postgres]&lt;/em&gt; and [llm] &lt;em&gt;[llm].&lt;/em&gt;&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;Editorial comment: Even I don’t know what [llm] is.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The Fly Machines platform is more or less finished, in the sense of being capable of supporting the next iteration of our products. My original desire to join Fly.io was to make Machines a product that would &lt;a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''&gt;rid us of HashiCorp Nomad&lt;/a&gt;, and I feel like that&amp;rsquo;s been accomplished.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Where were you hoping to see us headed?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;More directly positioned as a cloud provider, rather than a platform-as-a-service; further along the customer journey from &amp;ldquo;developers&amp;rdquo; and &amp;ldquo;startups&amp;rdquo; to large established companies.&lt;/p&gt;

&lt;p&gt;And, it&amp;rsquo;s not that I disagree with PAAS work or MPG! Rather, it&amp;rsquo;s not something that excites me in a way that I&amp;rsquo;d feel challenged and could continue to grow technically.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Follow up question: does your family know what you’re doing here? Doing to us? Are they OK with it?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Yes, my family was very involved in the decision, before I even talked to other companies.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;What&amp;rsquo;s the thing you&amp;rsquo;re happiest about having built here? It cannot be &amp;ldquo;all of &lt;code&gt;flyd&lt;/code&gt;&amp;rdquo;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;We&amp;rsquo;ve enabled developers to run workloads from an OCI image and an API call all over the world. On any other cloud provider, the knowledge of how to pull that off comes with a professional certification.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;In what file in our &lt;code&gt;nomad-firecracker&lt;/code&gt; repository would I find that code?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href='https://docs.machines.dev/#tag/machines/post/apps/{app_name}/machines' title=''&gt;https://docs.machines.dev/#tag/machines/post/apps/{app_name}/machines&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;img alt="A diagram that doesn&amp;#39;t make any of this clearer" src="/blog/the-exit-interview-jp/assets/flaps.png?1/2&amp;amp;center" /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;So you mean, literally, the whole Fly Machines API, and &lt;code&gt;flaps&lt;/code&gt;, the API gateway for Fly Machines?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Yes, all of it. The &lt;code&gt;flaps&lt;/code&gt; API server, the &lt;code&gt;flyd&lt;/code&gt; RPCs it calls, the &lt;code&gt;flyd&lt;/code&gt; finite state machine system, the interface to running VMs.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Is there something you especially like about that design?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I like that it for the most part doesn&amp;rsquo;t require any central coordination. And I like that the P90 for Fly Machine &lt;code&gt;create&lt;/code&gt; calls is sub-5-seconds for pretty much every region except for Johannesburg and Hong Kong.&lt;/p&gt;

&lt;p&gt;I think the FSM design is something I&amp;rsquo;m proud of; if I could take any code with me, it&amp;rsquo;d be the &lt;code&gt;internal/fsm&lt;/code&gt; in the &lt;code&gt;nomad-firecracker&lt;/code&gt; repo.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;You can read more about &lt;a href="https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/" title=""&gt;the &lt;code&gt;flyd&lt;/code&gt; orchestrator JP led over here&lt;/a&gt;.  But, a quick decoder ring: &lt;code&gt;flyd&lt;/code&gt; runs independently without any central coordination on thousands of “worker” servers around the globe. It’s structured as an API server for a bunch of finite state machine invocations, where an FSM might be something like “start a Fly Machine” or “create a new Fly Machine” or “cordon off a Fly Machine so we can update it”. Each FSM invocation is comprised of a bunch of steps, each of those steps has callbacks into the &lt;code&gt;flyd&lt;/code&gt; code, and each step is logged in &lt;a href="https://github.com/boltdb/bolt" title=""&gt;a BoltDB database&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;&lt;em&gt;Thinking back, there are like two archetypes of insanely talented developers I’ve worked with. One is the kind that belts out ridiculous amounts of relatively sophisticated code on a whim, at like 3AM. Jerome [who leads our fly-proxy team], is that type. The other comes to projects with what feels like fully-formed, coherent designs that are not super intuitive, and the whole project just falls together around that design. Did you know you were going to do the FSM log thing when you started &lt;code&gt;flyd&lt;/code&gt;?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I definitely didn&amp;rsquo;t have any specific design in mind when I started on &lt;code&gt;flyd&lt;/code&gt;. I think the FSM stuff is a result of work I did at Compose.io / MongoHQ (where it was called &amp;ldquo;recipes&amp;rdquo;/&amp;ldquo;operations&amp;rdquo;) and the work I did at HashiCorp using Cadence.&lt;/p&gt;

&lt;p&gt;Once I understood what the product needed to do and look like, having a way to perform deterministic and durable execution felt like a good design.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Cadence?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href='https://cadenceworkflow.io/' title=''&gt;Cadence&lt;/a&gt; is the child of AWS Step Functions and the predecessor to &lt;a href='https://temporal.io/' title=''&gt;Temporal&lt;/a&gt; (the company).&lt;/p&gt;

&lt;p&gt;One of the biggest gains, with how it works in &lt;code&gt;flyd&lt;/code&gt;, came from knowing we would need to deploy &lt;code&gt;flyd&lt;/code&gt; all day, every day. If &lt;code&gt;flyd&lt;/code&gt; was in the middle of doing some work, it needed to pick back up right where it left off, post-deploy.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;OK, next question. What&amp;rsquo;s the most impressive thing you saw someone else build here? To make things simpler and take some pressure off the interview, we can exclude any of my works from consideration.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Probably &lt;a href='https://github.com/superfly/corrosion' title=''&gt;&lt;code&gt;corrosion2&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;Sidebar: &lt;code&gt;corrosion2&lt;/code&gt; is our state distribution system. While &lt;code&gt;flyd&lt;/code&gt; runs individual Fly Machines for users, each instance is solely responsible for its own state; there’s no global scheduler. But we have platform components, most obviously &lt;code&gt;fly-proxy&lt;/code&gt;, our Anycast router, that need to know what’s running where. &lt;code&gt;corrosion2&lt;/code&gt; is a Rust service that does &lt;a href="https://fly.io/blog/building-clusters-with-serf/" title=""&gt;SWIM gossip&lt;/a&gt; to propagate information from each worker into a CRDT-structured SQLite database. &lt;code&gt;corrosion2&lt;/code&gt; essentially means any component on our fleet can do SQLite queries to get near-real-time information about any Fly Machine around the world.&lt;/p&gt;
&lt;/div&gt;
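&lt;p&gt;In that spirit, a component&amp;rsquo;s lookup might be nothing more than a local SQLite query. This is a hypothetical sketch; the table and column names are invented for illustration, not &lt;code&gt;corrosion2&lt;/code&gt;&amp;rsquo;s real schema:&lt;/p&gt;

```sql
-- Ask the locally replicated SQLite database what's running where.
-- Schema is illustrative only.
SELECT machine_id, region, state
FROM machines
WHERE app_name = 'my-app' AND state = 'started';
```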
&lt;p&gt;If for no other reason than that we deployed &lt;code&gt;corrosion&lt;/code&gt;, learned from it, and were able to make significant and valuable improvements — and then migrate to the new system in a short period of time.&lt;/p&gt;

&lt;p&gt;Having a &amp;ldquo;just SQLite&amp;rdquo; interface for changes asynchronously replicated around the world in seconds is pretty powerful.&lt;/p&gt;

&lt;p&gt;If we invested in &lt;a href='https://antithesis.com/' title=''&gt;Antithesis&lt;/a&gt; or TLA+ testing, I think there&amp;rsquo;s &lt;a href='https://github.com/superfly/corrosion' title=''&gt;potential for other companies&lt;/a&gt; to get value out of &lt;code&gt;corrosion2&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Just as a general-purpose gossip-based SQLite CRDT gossip system?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Yes.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;OK, you&amp;rsquo;re being too nice. What&amp;rsquo;s your least favorite thing about the platform?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;GraphQL. No, Elixir. It&amp;rsquo;s a tie between GraphQL and Elixir.&lt;/p&gt;

&lt;p&gt;But probably GraphQL, by a hair.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;That&amp;rsquo;s not the answer I expected.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;GraphQL slows everyone down, and everything. Elixir only slows me down.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The rest of the platform, you&amp;rsquo;re fine with? No complaints?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I&amp;rsquo;m happier now that we have &lt;code&gt;pilot&lt;/code&gt;.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;&lt;code&gt;pilot&lt;/code&gt; is our new &lt;code&gt;init&lt;/code&gt;. When we launch a Fly Machine, &lt;code&gt;init&lt;/code&gt; is our foothold in the machine; this is unlike a normal OCI runtime, where “pid 1” is often the user’s entrypoint program. Our original &lt;code&gt;init&lt;/code&gt; was so simple people dunked on it and said it might as well have been a bash script; over time, &lt;code&gt;init&lt;/code&gt; has sprouted a bunch of new features. &lt;code&gt;pilot&lt;/code&gt; consolidates those features, and, more importantly, is itself a complete OCI runtime; &lt;code&gt;pilot&lt;/code&gt; can natively run containers inside of Fly Machines.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Before &lt;code&gt;pilot&lt;/code&gt;, there really wasn&amp;rsquo;t any contract between &lt;code&gt;flyd&lt;/code&gt; and &lt;code&gt;init&lt;/code&gt;. And &lt;code&gt;init&lt;/code&gt; was just &amp;ldquo;whatever we wanted &lt;code&gt;init&lt;/code&gt; to be&amp;rdquo;. That limited its ability to serve us.&lt;/p&gt;

&lt;p&gt;Having &lt;code&gt;pilot&lt;/code&gt; be an OCI-compliant runtime with an API for &lt;code&gt;flyd&lt;/code&gt; to drive is a big win for the future of the Fly Machines API.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Was I right that we should have used SQLite for &lt;code&gt;flyd&lt;/code&gt;, or were you wrong to have used BoltDB?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I still believe Bolt was the right choice. I&amp;rsquo;ve never lost a second of sleep worrying that someone is about to run a SQL update statement on a host, or across the whole fleet, and mangle all our state data. And limiting the storage interface, by not using SQL, kept &lt;code&gt;flyd&lt;/code&gt;&amp;rsquo;s scope manageable.&lt;/p&gt;

&lt;p&gt;On the engine side of the platform, which is what &lt;code&gt;flyd&lt;/code&gt; is, I still believe SQL is too powerful for what &lt;code&gt;flyd&lt;/code&gt; does.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;If you had this to do over again, would Bolt be precisely what you&amp;rsquo;d pick, or is there something else you&amp;rsquo;d want to try? Some cool-ass new KV store?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Nah. But, I&amp;rsquo;d maybe consider a SQLite database per-Fly-Machine. Then the scope of danger is about as small as it could possibly be.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Whoah, that&amp;rsquo;s an interesting thought. People sleep on the &amp;ldquo;keep a zillion little SQLites&amp;rdquo; design.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Yeah, with per-Machine SQLite, once a Fly Machine is destroyed, we can just zip up the database and stash it in object storage. The biggest hold-up I have about it is how we&amp;rsquo;d manage the schemas.&lt;/p&gt;
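&lt;p&gt;The &amp;ldquo;zip and stash&amp;rdquo; step is simple enough to sketch in a few lines of Go. The machine ID, key layout, and in-memory stand-in for object storage below are invented for illustration, not how Fly.io actually does (or would do) this:&lt;/p&gt;

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
)

// Sketch of archiving a destroyed Machine's per-Machine SQLite database.
// The database is a byte slice here, and a map stands in for object storage.
var objectStorage = map[string][]byte{}

func archiveMachineDB(machineID string, db []byte) error {
	buf := new(bytes.Buffer)
	zw := zip.NewWriter(buf)
	f, err := zw.Create(machineID + ".db")
	if err != nil {
		return err
	}
	if _, err := f.Write(db); err != nil {
		return err
	}
	if err := zw.Close(); err != nil {
		return err
	}
	// In production this would be a PUT to an object store, not a map write.
	objectStorage["machines/"+machineID+".zip"] = buf.Bytes()
	return nil
}

func main() {
	if err := archiveMachineDB("d891234", []byte("sqlite file bytes")); err != nil {
		panic(err)
	}
	fmt.Println(len(objectStorage))
}
```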

&lt;p&gt;&lt;em&gt;OpenTelemetry: were you right all along?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;One hundred percent.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;I basically attribute oTel at Fly.io to you.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Without oTel, it&amp;rsquo;d be a disaster trying to troubleshoot the system. I&amp;rsquo;d have ragequit trying.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;I remember there being a cost issue, with how much Honeycomb was going to end up charging us to manage all the data. But that seems silly in retrospect.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;For sure. It is 100% part of the decision and the conversation. But: we didn&amp;rsquo;t have the best track record running a logs/metrics cluster at this fidelity. It was worth the money to pay someone else to manage tracing data.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Strong agree. I think my only issue is just the extent to which it cruds up code. But I need to get over that.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Yes, it&amp;rsquo;s very explicit. I think the next big part of oTel is going to be auto-instrumentation, for profiling.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;You&amp;rsquo;re a veteran Golang programmer. Say 3 nice things about Rust.&lt;/em&gt;&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;Most of our backend is in Go, but &lt;code&gt;fly-proxy&lt;/code&gt;, &lt;code&gt;corrosion2&lt;/code&gt;, and &lt;code&gt;pilot&lt;/code&gt; are in Rust.&lt;/p&gt;
&lt;/div&gt;
&lt;ol&gt;
&lt;li&gt;Option. 
&lt;/li&gt;&lt;li&gt;Match.
&lt;/li&gt;&lt;li&gt;Serde macros.
&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;&lt;em&gt;Even I can&amp;rsquo;t say shit about Option and match.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Match is so much better than anything in Go.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Elixir, Go, and Rust. An honest take on that programming cocktail.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Three&amp;rsquo;s a crowd, Elixir can stay home.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;If you could only lose one, you&amp;rsquo;d keep Rust.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I&amp;rsquo;ve learned its shortcomings, and the productivity far outweighs having to deal with the Rust compiler.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;You&amp;rsquo;d be unhappy if we moved the &lt;code&gt;flaps&lt;/code&gt; API code from Go to Elixir.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Correct.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;I kind of buy the idea of doing orchestration and scheduling code, which is policy-intensive, in a higher-level language.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Maybe. If Ruby had a better concurrency story, I don&amp;rsquo;t think Elixir would have a place for us.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;Here I need to note that Ruby is functionally dead here, and Elixir is ascendant.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;&lt;em&gt;We have an idiosyncratic management structure. We&amp;rsquo;re bottom-up, but ambiguously so. We don&amp;rsquo;t have roadmaps, except when we do. We have minimal top-down technical direction. Critique.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;It&amp;rsquo;s too easy to lose sight of whether your current focus [in what you&amp;rsquo;re building] is valuable to the company.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The first thing I warn every candidate about on our &amp;ldquo;do-not-work-here&amp;rdquo; calls.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I think it comes down to execution, and accountability to actually finish projects. I spun a lot trying to figure out what would be the most valuable work for Fly Machines.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;You don&amp;rsquo;t have to be so nice about things.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;We struggle a lot with consistent communication. We change direction a little too often. It got to a point where I didn&amp;rsquo;t see a point in devoting time and effort into projects, because I&amp;rsquo;d not be able to show enough value quickly enough.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;I see things paying off later than we&amp;rsquo;d hoped or expected they would. Our secret storage system, Pet Semetary, is a good example of this. Our K8s service, FKS, is another obvious one, since we&amp;rsquo;re shipping MPG on it.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This is your second time working with Kurt, at a company where he&amp;rsquo;s the CEO. Give him a 1-4 star rating. He can take it! At least, I think he can take it.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;2022: ★★★★&lt;/p&gt;

&lt;p&gt;2023: ★★&lt;/p&gt;

&lt;p&gt;2024: ★★✩&lt;/p&gt;

&lt;p&gt;2025: ★★★✩&lt;/p&gt;

&lt;p&gt;On a four-star scale.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Whoah. I did not expect a histogram. Say more about 2023!&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;We hired too many people, too quickly, and didn&amp;rsquo;t have the guardrails and structure in place for everybody to be successful.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Also: GPUs!&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Yes. That was my next comment.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Do we secretly agree about GPUs?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I think so.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Our side won the argument in the end! But at what cost?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;They were a killer distraction.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Final question: how long will you remain in the first-responder on-call rotation after you leave? I assume at least until August. I have a shift this weekend; can you swap with me? I keep getting weekends.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I am going to be asleep all weekend if any of my previous job changes are indicative.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;I sleep through on-call too! But nobody can yell at you for it now. I think you have the comparative advantage over me in on-calling.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Yes I will absolutely take all your future on-call shifts, you have convinced me.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;All this aside: it has been a privilege watching you work. I hope your next gig is 100x more relaxing than this was. Or maybe I just hope that for myself. Except: I&amp;rsquo;ll never escape this place. Thank you so much for doing this.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Thank you! I&amp;rsquo;m forever grateful for having the opportunity to be a part of Fly.io.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Did Semgrep Just Get A Lot More Interesting?</title>
    <link rel="alternate" href="https://fly.io/blog/semgrep-but-for-real-now/"/>
    <id>https://fly.io/blog/semgrep-but-for-real-now/</id>
    <published>2025-02-10T00:00:00+00:00</published>
    <updated>2025-02-14T21:30:41+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/semgrep-but-for-real-now/assets/ci-cd-cover.webp"/>
    <content type="html">&lt;div class="right-sidenote"&gt;&lt;p&gt;This whole paragraph is just one long sentence. God I love &lt;a href="https://fly.io/blog/a-blog-if-kept/" title=""&gt;just random-ass blogging&lt;/a&gt; again.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;&lt;a href='https://ghuntley.com/stdlib/' title=''&gt;This bit by Geoffrey Huntley&lt;/a&gt; is super interesting to me and, despite calling out that LLM-driven development agents like Cursor have something like a 40% success rate at actually building anything that passes acceptance criteria, makes me think that more of the future of our field belongs to people who figure out how to use these weird bags of model weights than any of us are comfortable with. &lt;/p&gt;

&lt;p&gt;I’ve been dinking around with Cursor for a week now (if you haven’t, I think it’s something close to malpractice not to at least take it — or something like it — for a spin) and am just now from this post learning that Cursor has this &lt;a href='https://docs.cursor.com/context/rules-for-ai' title=''&gt;rules feature&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;The important thing for me is not how Cursor rules work, but rather how Huntley uses them. He turns them back on themselves, writing rules to tell Cursor how to organize the rules, and then teach Cursor how to write (under human supervision) its own rules.&lt;/p&gt;

&lt;p&gt;Cursor kept trying to get Huntley to use Bazel as a build system. So he had Cursor write a rule for itself: “no bazel”. And there was no more Bazel. If I’d known I could do this, I probably wouldn’t have bounced from the Elixir project I had Cursor doing, where trying to get it to write simple unit tests got it all tangled up trying to make &lt;a href='https://hexdocs.pm/mox/Mox.html' title=''&gt;Mox&lt;/a&gt; work. &lt;/p&gt;

&lt;p&gt;But I’m burying the lede. &lt;/p&gt;

&lt;p&gt;Security people have been for several years now somewhat in love with a tool called &lt;a href='https://github.com/semgrep/semgrep' title=''&gt;Semgrep&lt;/a&gt;. Semgrep is a semantics-aware code search tool; using symbolic variable placeholders and otherwise ordinary code, you can write rules to match pretty much arbitrary expressions and control flow. &lt;/p&gt;
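&lt;p&gt;For a taste of the rule syntax (a hypothetical sketch, not a rule from any real ruleset): this flags Go database queries assembled with &lt;code&gt;fmt.Sprintf&lt;/code&gt;, where &lt;code&gt;$DB&lt;/code&gt; is a metavariable matching any expression and the ellipsis matches any argument list.&lt;/p&gt;

```yaml
rules:
  - id: sql-query-built-with-sprintf
    languages: [go]
    severity: WARNING
    message: query string assembled with fmt.Sprintf; prefer parameterized queries
    pattern: $DB.Query(fmt.Sprintf(...))
```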

&lt;p&gt;If you’re an appsec person, where you obviously go with this is: you build a library of Semgrep searches for well-known vulnerability patterns (or, if you’re like us at Fly.io, you work out how to get Semgrep to catch the Rust concurrency footgun of RWLocks inside if-lets).&lt;/p&gt;

&lt;p&gt;The reality for most teams though is “ain’t nobody got time for that”. &lt;/p&gt;

&lt;p&gt;But I just checked and, unsurprisingly, 4o &lt;a href='https://chatgpt.com/share/67aa94a7-ea3c-8012-845c-6c9491b33fe4' title=''&gt;seems to do reasonably well&lt;/a&gt; at generating Semgrep rules? Like: I have no idea if this rule is actually any good. But it looks like a Semgrep rule?&lt;/p&gt;

&lt;p&gt;What interests me is this: it seems obvious that we’re going to do more and more “closed-loop” LLM agent code generation stuff. By “closed loop”, I mean that the thingy that generates code is going to get to run the code and watch what happens when it’s interacted with. You’re just a small bit of glue code and a lot of system prompting away from building something like that right now: &lt;a href='https://x.com/chris_mccord/status/1882839014845374683' title=''&gt;Chris McCord is building&lt;/a&gt; a thingy that generates whole Elixir/Phoenix apps and runs them as Fly Machines. When you deploy these kinds of things, the LLM gets to see the errors when the code is run, and it can just go fix them. It also gets to see errors and exceptions in the logs when you hit a page on the app, and it can just go fix them.&lt;/p&gt;

&lt;p&gt;With a bit more system prompting, you can get an LLM to try to generalize out from exceptions it fixes and generate unit test coverage for them. &lt;/p&gt;

&lt;p&gt;With a little bit more system prompting, you can probably get an LLM to (1) generate a Semgrep rule for the generalized bug it caught, (2) test the Semgrep rule with a positive/negative control, (3) save the rule, (4) test the whole codebase with Semgrep for that rule, and (5) fix anything it finds that way. &lt;/p&gt;

&lt;p&gt;That is a lot more interesting to me than tediously (and probably badly) trying to predict everything that will go wrong in my codebase a priori and Semgrepping for them. Which is to say: Semgrep — which I have always liked — is maybe a lot more interesting now? And tools like it?&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>A Blog, If You Can Keep It</title>
    <link rel="alternate" href="https://fly.io/blog/a-blog-if-kept/"/>
    <id>https://fly.io/blog/a-blog-if-kept/</id>
    <published>2025-02-10T00:00:00+00:00</published>
    <updated>2025-02-19T19:05:52+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/a-blog-if-kept/assets/keep-blog.webp"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;A boldfaced lede like this was a sure sign you were reading a carefully choreographed EffortPost from our team at Fly.io. We’re going to do less of those. Or the same amount but more of a different kind of post. Either way: launch an app on Fly.io today!&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Over the last 5 years, we’ve done pretty well for ourselves writing content for Hacker News. And that’s &lt;a href='https://news.ycombinator.com/item?id=39373476' title=''&gt;mostly&lt;/a&gt; been good for us. We don’t do conventional marketing, we don’t have a sales team, the rest of social media is atomized over 5 different sites. Writing pieces that HN takes seriously has been our primary outreach tool.&lt;/p&gt;

&lt;p&gt;There’s a recipe (probably several, but I know this one works) for charting a post on HN:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Write an EffortPost, which is to say a dense technical piece over 2000 words long; within that rubric there’s a bunch of things that are catnip to HN, including runnable code, research surveys, and explainers. (There are also cat-repellants you learn to steer clear of.)
&lt;/li&gt;&lt;li&gt;Assiduously avoid promotion. You have to write for the audience. We get away with sporadically including a call-to-action block in our posts, but otherwise: the post should make sense even if an unrelated company posted it after you went out of business.
&lt;/li&gt;&lt;li&gt;Pick topics HN is interested in (it helps if all your topics are au courant for HN, and we’ve been &lt;a href='https://news.ycombinator.com/item?id=32250426' title=''&gt;very&lt;/a&gt; &lt;a href='https://news.ycombinator.com/item?id=32018066' title=''&gt;lucky&lt;/a&gt; in that regard).
&lt;/li&gt;&lt;li&gt;Like 5-6 more style-guide things that help incrementally. Probably 3 different teams writing for HN will have 3 different style guides with only like &amp;frac12; overlap. Ours, for instance, instructs writers to swear.
&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;I like this kind of writing. It’s not even a chore. But it’s become an impediment for us, for a couple reasons: the team serializes behind an “editorial” function here, which keeps us from publishing everything we want; worse, caring so much about our track record leaves us noodling on posts interminably (the poor &lt;a href='https://www.tigrisdata.com/' title=''&gt;Tigrises&lt;/a&gt; have been waiting for months for me to publish the piece I wrote about them and FoundationDB; take heart, this post today means that one is coming soon).&lt;/p&gt;

&lt;p&gt;But worst of all, I worried incessantly about us &lt;a href='https://gist.github.com/tqbf/e853764b562a2d72a91a6986ca3b77c0' title=''&gt;wearing out our welcome&lt;/a&gt;. To my mind, we’d have 1, maybe 2 bites at the HN apple in a given month, and we needed to make them count.&lt;/p&gt;

&lt;p&gt;That was dumb. I am dumb about a lot of things! I came around to understanding this after Kurt demanded I publish my blog post about BFAAS (Bash Functions As A Service), 500 lines of Go code that had generated 4500 words in my draft. It was only after I made the decision to stop gatekeeping this blog that I realized &lt;a href='https://simonwillison.net/' title=''&gt;Simon Willison&lt;/a&gt; has been disproving my “wearing out the welcome” theory, day in and day out, for years. He just writes stuff about LLMs when it interests him. I mean, it helps that he’s a better writer than we are. But he’s not wasting time choreographing things.&lt;/p&gt;

&lt;p&gt;Back in like 2009, &lt;a href='https://web.archive.org/web/20110806040300/http://chargen.matasano.com/chargen/2009/7/22/if-youre-typing-the-letters-a-e-s-into-your-code-youre-doing.html' title=''&gt;we had a blog&lt;/a&gt; at another company I was at. That blog drove a lot of business for us (and, on three occasions, almost killed me). It was not in the least bit optimized for HN. I like pretending to be a magazine feature writer, but I miss writing dashed-off pieces every day and clearing space for other people on the team to write as well.&lt;/p&gt;

&lt;p&gt;So this is all just a heads up: we’re trying something new. This is a very long and self-indulgent way to say “we’re going to write a normal blog like it’s 2008”, but that’s how broken my brain is after years of having my primary dopaminergic rewards come from how long Fly.io blog posts stay on the front page: I have to disclaim blogging before we start doing it, lest I fail to meet expectations.&lt;/p&gt;

&lt;p&gt;Like I said. I’m real dumb. But: looking forward to getting a lot more stuff out on the web for people to read this year!&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>VSCode’s SSH Agent Is Bananas</title>
    <link rel="alternate" href="https://fly.io/blog/vscode-ssh-wtf/"/>
    <id>https://fly.io/blog/vscode-ssh-wtf/</id>
    <published>2025-02-07T00:00:00+00:00</published>
    <updated>2025-02-14T21:30:41+00:00</updated>
    <media:thumbnail url="https://fly.io/static/images/default-post-thumbnail.webp"/>
    <content type="html">&lt;p&gt;We’re interested in getting integrated into the flow VSCode uses to do remote editing over SSH, because everybody is using VSCode now, and, in particular, they’re using forks of VSCode that generate code with LLMs. &lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;”hallucination” is what we call it when LLMs get code wrong; “engineering” is what we call it when people do.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;LLM-generated code is &lt;a href='https://nicholas.carlini.com/writing/2024/how-i-use-ai.html' title=''&gt;useful in the general case&lt;/a&gt; if you know what you’re doing. But it’s ultra-useful if you can close the loop between the LLM and the execution environment (with an “Agent” setup). There’s lots to say about this, but for the moment: it’s a semi-effective antidote to hallucination: the LLM generates the code, the agent scaffolding runs the code, the code generates errors, the agent feeds it back to the LLM, the process iterates. &lt;/p&gt;

&lt;p&gt;So, obviously, the issue here is you don’t want this iterative development process happening on your development laptop, because LLMs have boundary issues, and they’ll iterate on your system configuration just as happily as on the Git project you happen to be working in. A thing you’d really like to be able to do: run a closed-loop agent-y (“agentic”? is that what we say now) configuration for an LLM, on a clean-slate Linux instance that spins up instantly and that can’t screw you over in any way. You get where we’re going with this.&lt;/p&gt;

&lt;p&gt;Anyways! I would like to register a concern.&lt;/p&gt;

&lt;p&gt;Emacs hosts the spiritual forebear of remote editing systems, a blob of hyper-useful Elisp called &lt;a href='https://www.gnu.org/software/tramp/' title=''&gt;“Tramp”&lt;/a&gt;. If you can hook Tramp up to any kind of interactive environment — usually, an SSH session — where it can run Bourne shell commands, it can extend Emacs to that environment.&lt;/p&gt;

&lt;p&gt;So, VSCode has a feature like Tramp. Which, neat, right? You’d think, take Tramp, maybe simplify it a bit, switch out Elisp for Typescript.&lt;/p&gt;

&lt;p&gt;You’d think wrong!&lt;/p&gt;

&lt;p&gt;Unlike Tramp, which lives off the land on the remote connection, VSCode mounts a full-scale invasion: it runs a Bash snippet stager that downloads an agent, including a binary installation of Node. &lt;/p&gt;

&lt;p&gt;I &lt;em&gt;think&lt;/em&gt; this is &lt;a href='https://github.com/microsoft/vscode/tree/c9e7e1b72f80b12ffc00e06153afcfedba9ec31f/src/vs/server/node' title=''&gt;the source code&lt;/a&gt;?&lt;/p&gt;

&lt;p&gt;The agent runs over port-forwarded SSH. It establishes a WebSockets connection back to your running VSCode front-end. The underlying protocol on that connection can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Wander around the filesystem
&lt;/li&gt;&lt;li&gt;Edit arbitrary files
&lt;/li&gt;&lt;li&gt;Launch its own shell PTY processes
&lt;/li&gt;&lt;li&gt;Persist itself
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;In security-world, there’s a name for tools that work this way. I won’t say it out loud, because that’s not fair to VSCode, but let’s just say the name is murid in nature.&lt;/p&gt;

&lt;p&gt;I would be a little nervous about letting people VSCode-remote-edit stuff on dev servers, and apoplectic if that happened during an incident on something in production. &lt;/p&gt;

&lt;p&gt;It turns out we don’t have to care about any of this to get a custom connection to a Fly Machine working in VSCode, so none of this matters in any kind of deep way, but: we’ve decided to just be a blog again, so: we had to learn this, and now you do too.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>AI GPU Clusters, From Your Laptop, With Livebook</title>
    <link rel="alternate" href="https://fly.io/blog/ai-gpu-clusters-from-your-laptop-livebook/"/>
    <id>https://fly.io/blog/ai-gpu-clusters-from-your-laptop-livebook/</id>
    <published>2024-09-24T00:00:00+00:00</published>
    <updated>2024-09-24T17:19:49+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/ai-gpu-clusters-from-your-laptop-livebook/assets/ai-gpu-livebook-thumb.webp"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;Livebook, FLAME, and the Nx stack: three Elixir components that are easy to describe, more powerful than they look, and intricately threaded into the Elixir ecosystem. A few weeks ago, Chris McCord (👋) and Chris Grainger showed them off at ElixirConf 2024. We thought the talk was worth a recap.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Let&amp;rsquo;s begin by introducing our cast of characters.&lt;/p&gt;

&lt;p&gt;&lt;a href='https://livebook.dev/' title=''&gt;Livebook&lt;/a&gt; is usually described as Elixir&amp;rsquo;s answer to &lt;a href='https://jupyter.org/' title=''&gt;Jupyter Notebooks&lt;/a&gt;. And that&amp;rsquo;s a good way to think about it. But Livebook takes full advantage of the Elixir platform, which makes it sneakily powerful. By linking up directly with Elixir app clusters, Livebook can switch easily between driving compute locally or on remote servers, and makes it easy to bring any kind of data into reproducible workflows.&lt;/p&gt;

&lt;p&gt;&lt;a href='https://fly.io/blog/rethinking-serverless-with-flame/' title=''&gt;FLAME&lt;/a&gt; is Elixir&amp;rsquo;s answer to serverless computing. By having the library manage a pool of executors for you, FLAME lets you treat your entire application as if it were elastic and scale-to-zero. You configure FLAME with some basic information about where to run code and how many instances it&amp;rsquo;s allowed to run with, and then mark off any arbitrary section of code with &lt;code&gt;Flame.call&lt;/code&gt;. The framework takes care of the rest. It&amp;rsquo;s the upside of serverless without committing yourself to blowing your app apart into tiny, intricately connected pieces.&lt;/p&gt;

&lt;p&gt;The &lt;a href='https://github.com/elixir-nx' title=''&gt;Nx stack&lt;/a&gt; is how you do Elixir-native AI and ML. Nx gives you an Elixir-native notion of tensor computations with GPU backends. &lt;a href='https://github.com/elixir-nx/axon' title=''&gt;Axon&lt;/a&gt; builds a common interface for ML models on top of it. &lt;a href='https://github.com/elixir-nx/bumblebee' title=''&gt;Bumblebee&lt;/a&gt; makes those models available to any Elixir app that wants to download them, from just a couple lines of code.&lt;/p&gt;

&lt;p&gt;Here is a quick video showing how to transfer a local tensor to a remote GPU, using Livebook, FLAME, and Nx:&lt;/p&gt;
&lt;div class="youtube-container" data-exclude-render&gt;
  &lt;div class="youtube-video"&gt;
    &lt;iframe
      width="100%"
      height="100%"
      src="https://www.youtube.com/embed/5ImP3gpUSkQ"
      frameborder="0"
      allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
      allowfullscreen&gt;
    &lt;/iframe&gt;
  &lt;/div&gt;
&lt;/div&gt;


&lt;p&gt;Let&amp;rsquo;s dive into the &lt;a href='https://www.youtube.com/watch?v=4qoHPh0obv0' title=''&gt;keynote&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id='poking-a-hole-in-your-infrastructure' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#poking-a-hole-in-your-infrastructure' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Poking a hole in your infrastructure&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Any Livebook, including the one running on your laptop, can start a runtime running on a Fly Machine, in Fly.io&amp;rsquo;s public cloud. That Elixir machine will (by default) live in your default Fly.io organization, giving it networked access to all the other apps that might live there.&lt;/p&gt;

&lt;p&gt;This is an access control situation that mostly just does what you want it to do without asking. Unless you ask it to, Fly.io isn&amp;rsquo;t exposing anything to the Internet, or to other users of Fly.io. For instance: say we have a database we&amp;rsquo;re going to use to generate reports. It can hang out on our Fly organization, inside of a private network with no connectivity to the world. We can spin up a Livebook instance that can talk to it, without doing any network or infrastructure engineering to make that happen.&lt;/p&gt;

&lt;p&gt;But wait, there&amp;rsquo;s more. Because this is all Elixir, Livebook also allows you to connect to any running Erlang/Elixir application in your infrastructure to debug, introspect, and monitor them.&lt;/p&gt;

&lt;p&gt;Check out this clip of Chris McCord connecting &lt;a href='https://rtt.fly.dev/' title=''&gt;to an existing application&lt;/a&gt; during the keynote:&lt;/p&gt;
&lt;div class="youtube-container" data-exclude-render&gt;
  &lt;div class="youtube-video"&gt;
    &lt;iframe
      width="100%"
      height="100%"
      src="https://www.youtube.com/embed/4qoHPh0obv0?start=1106"
      frameborder="0"
      allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
      allowfullscreen&gt;
    &lt;/iframe&gt;
  &lt;/div&gt;
&lt;/div&gt;


&lt;p&gt;Running a snippet of code from a laptop on a remote server is a neat trick, but Livebook is doing something deeper than that. It&amp;rsquo;s taking advantage of Erlang/Elixir&amp;rsquo;s native facility with cluster computation and making it available to the notebook. As a result, when we do things like auto-completing, Livebook delivers results from modules defined on the remote node itself. 🤯&lt;/p&gt;
&lt;h2 id='elastic-scale-with-flame' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#elastic-scale-with-flame' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Elastic scale with FLAME&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;When we first introduced FLAME, the example we used was video encoding.&lt;/p&gt;

&lt;p&gt;Video encoding is complicated and slow enough that you&amp;rsquo;d normally make arrangements to run it remotely, in a background job queue, or as a triggerable Lambda function. The point of FLAME is to get rid of all those steps, and give them over to the framework instead. So: we wrote our &lt;code&gt;ffmpeg&lt;/code&gt; calls inline like normal code, as if they were going to complete in microseconds, and wrapped them in &lt;code&gt;FLAME.call&lt;/code&gt; blocks. That was it, that was the demo.&lt;/p&gt;
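&lt;p&gt;The shape of that wrapping step is worth seeing (a minimal sketch: the &lt;code&gt;MyApp.FFMpegRunner&lt;/code&gt; pool name and &lt;code&gt;thumbnail/1&lt;/code&gt; helper are hypothetical, but &lt;code&gt;FLAME.call/2&lt;/code&gt; is the library&amp;rsquo;s real entry point):&lt;/p&gt;

```elixir
defmodule MyApp.Thumbnails do
  # FLAME.call/2 ships this closure to a Fly Machine checked out of
  # the MyApp.FFMpegRunner pool, runs it there, and blocks until the
  # result comes back, as if it were an ordinary local function call.
  def thumbnail(video_path) do
    FLAME.call(MyApp.FFMpegRunner, fn ->
      # Runs on the remote node; shells out to ffmpeg there.
      System.cmd("ffmpeg", ["-i", video_path, "-frames:v", "1", "thumb.png"])
    end)
  end
end
```

&lt;p&gt;The calling code never mentions servers, queues, or deployment; the pool decides where the work actually lands.&lt;/p&gt;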

&lt;p&gt;Here, we&amp;rsquo;re going to put a little AI spin on it.&lt;/p&gt;

&lt;p&gt;The first thing we&amp;rsquo;re doing here is driving FLAME pools from Livebook. Livebook will automatically synchronize your notebook dependencies as well as any module or code defined in your notebook across nodes. That means any code we write in our notebook can be dispatched transparently out to arbitrarily many compute nodes, without ceremony.&lt;/p&gt;

&lt;p&gt;Now let&amp;rsquo;s add some AI flair. We take an object store bucket full of video files. We use &lt;code&gt;ffmpeg&lt;/code&gt; to extract stills from the video at different moments. Then: we send them to &lt;a href='https://www.llama.com/' title=''&gt;Llama&lt;/a&gt;, running on &lt;a href='https://fly.io/gpu' title=''&gt;GPU Fly Machines&lt;/a&gt; (still locked to our organization), to get descriptions of the stills.&lt;/p&gt;

&lt;p&gt;All those stills and descriptions get streamed back to our notebook, in real time:&lt;/p&gt;
&lt;div class="youtube-container" data-exclude-render&gt;
  &lt;div class="youtube-video"&gt;
    &lt;iframe
      width="100%"
      height="100%"
      src="https://www.youtube.com/embed/4qoHPh0obv0?start=1692"
      frameborder="0"
      allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
      allowfullscreen&gt;
    &lt;/iframe&gt;
  &lt;/div&gt;
&lt;/div&gt;


&lt;p&gt;At the end, the descriptions are sent to &lt;a href='https://mistral.ai/' title=''&gt;Mistral&lt;/a&gt;, which builds a summary.&lt;/p&gt;

&lt;p&gt;Thanks to FLAME, we get explicit control over the minimum and maximum number of nodes we want running at once, as well as their concurrency settings. As nodes finish processing each video, new videos are automatically sent to them, until the whole bucket has been traversed. Each node automatically shuts down after an idle timeout, and the whole cluster terminates if you disconnect the Livebook runtime.&lt;/p&gt;
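&lt;p&gt;Those knobs map directly onto &lt;code&gt;FLAME.Pool&lt;/code&gt; options (a sketch with made-up numbers; the pool name is hypothetical, the option names are the real ones):&lt;/p&gt;

```elixir
# In a supervision tree (or a Livebook cell): min/max bound the node
# count, max_concurrency caps concurrent calls per node, and
# idle_shutdown_after reaps nodes once they go quiet.
children = [
  {FLAME.Pool,
   name: MyApp.FFMpegRunner,
   min: 0,
   max: 64,
   max_concurrency: 2,
   idle_shutdown_after: :timer.minutes(1)}
]
```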

&lt;p&gt;Just like your app code, FLAME lets you take your notebook code designed to run locally, change almost nothing, and elastically execute it across ephemeral infrastructure.&lt;/p&gt;
&lt;h2 id='64-gpus-hyperparameter-tuning-on-a-laptop' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#64-gpus-hyperparameter-tuning-on-a-laptop' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;64-GPU hyperparameter tuning on a laptop&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Next, Chris Grainger, CTO of &lt;a href='https://amplified.ai/' title=''&gt;Amplified&lt;/a&gt;, takes the stage.&lt;/p&gt;

&lt;p&gt;For work at Amplified, Chris wants to analyze a gigantic archive of patents, on behalf of a client doing edible cannabinoid work. To do that, he uses a BERT model (BERT, from Google, is one of the OG &amp;ldquo;transformer&amp;rdquo; models, optimized for text comprehension).&lt;/p&gt;

&lt;p&gt;To make the BERT model effective for this task, he&amp;rsquo;s going to do a hyperparameter tuning run.&lt;/p&gt;

&lt;p&gt;This is a much more complicated AI task than the Llama work we just showed you. Chris is going to generate a cluster of 64 GPU Fly Machines, each running an &lt;a href='https://fly.io/blog/cutting-prices-for-l40s-gpus-in-half/' title=''&gt;L40S GPU&lt;/a&gt;. On each of these nodes, he needs to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;set up its environment (including native dependencies and GPU bindings)
&lt;/li&gt;&lt;li&gt;load the training data
&lt;/li&gt;&lt;li&gt;compile a different version of BERT with different parameters, optimizers, etc.
&lt;/li&gt;&lt;li&gt;start the fine-tuning
&lt;/li&gt;&lt;li&gt;stream its results in real time to each assigned chart
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;Here&amp;rsquo;s the clip. You&amp;rsquo;ll see the results stream in, in real time, directly back to his Livebook. We&amp;rsquo;ll wait, because it won&amp;rsquo;t take long to watch:&lt;/p&gt;
&lt;div class="youtube-container" data-exclude-render&gt;
  &lt;div class="youtube-video"&gt;
    &lt;iframe
      width="100%"
      height="100%"
      src="https://www.youtube.com/embed/4qoHPh0obv0?start=3344"
      frameborder="0"
      allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
      allowfullscreen&gt;
    &lt;/iframe&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;h2 id='this-is-just-the-beginning' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#this-is-just-the-beginning' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;This is just the beginning&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;The suggestion of mixing Livebook and FLAME to elastically scale notebook execution was originally proposed by Chris Grainger during ElixirConf EU. Over the next four months, Jonatan Kłosko, Chris McCord, and José Valim worked part-time on making it a reality in time for ElixirConf US. Our ability to deliver such a rich combination of features in such a short period of time is a testament to the capabilities of the Erlang Virtual Machine, which Elixir and Livebook run on. Other features, such as &lt;a href='https://github.com/elixir-explorer/explorer/issues/932' title=''&gt;remote dataframes and distributed GC&lt;/a&gt;, were implemented in a weekend. Bringing the same functionality to other ecosystems would take several additional months, sometimes accompanied by millions in funding, and oftentimes as part of a closed-source product.&lt;/p&gt;

&lt;p&gt;Furthermore, since we announced this feature, &lt;a href='https://github.com/mruoss' title=''&gt;Michael Ruoss&lt;/a&gt; stepped in and brought the same functionality to Kubernetes. From Livebook v0.14.1, you can start Livebook runtimes inside a Kubernetes cluster and also use FLAME to elastically scale them. Expect more features and news in this space!&lt;/p&gt;

&lt;p&gt;Finally, Fly&amp;rsquo;s infrastructure played a key role in making it possible to start a cluster of GPUs in seconds rather than minutes, and all it requires is a Docker image. We&amp;rsquo;re looking forward to seeing how other technologies and notebook platforms can leverage Fly to also elevate their developer experiences.&lt;/p&gt;
&lt;figure class="post-cta"&gt;
  &lt;figcaption&gt;
    &lt;h1&gt;Launch a GPU app in seconds&lt;/h1&gt;
&lt;p&gt;Run your own LLMs or use Livebook for elastic GPU workflows&amp;nbsp;✨&lt;/p&gt;
      &lt;a class="btn btn-lg" href="/gpu"&gt;
        Go! &amp;nbsp;&lt;span class='opacity-50'&gt;→&lt;/span&gt;
      &lt;/a&gt;
  &lt;/figcaption&gt;
  &lt;div class="image-container"&gt;
    &lt;img src="/static/images/cta-turtle.webp" srcset="/static/images/[email protected] 2x" alt=""&gt;
  &lt;/div&gt;
&lt;/figure&gt;

</content>
  </entry>
  <entry>
    <title>Accident Forgiveness</title>
    <link rel="alternate" href="https://fly.io/blog/accident-forgiveness/"/>
    <id>https://fly.io/blog/accident-forgiveness/</id>
    <published>2024-08-21T00:00:00+00:00</published>
    <updated>2024-09-11T00:04:08+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/accident-forgiveness/assets/money-for-mistakes-blog-thumb.webp"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;We’re Fly.io, a new public cloud with simple, developer-friendly ergonomics. &lt;a href="https://fly.io/speedrun" title=""&gt;Try it out; you’ll be deployed in just minutes&lt;/a&gt;, and, as you’re about to read, with less financial risk.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Public cloud billing is terrifying.&lt;/p&gt;

&lt;p&gt;The premise of a public cloud &amp;mdash; what sets it apart from a hosting provider &amp;mdash; is 8,760 hours/year of on-tap deployable compute, storage, and networking. Cloud resources are &amp;ldquo;elastic&amp;rdquo;: they&amp;rsquo;re acquired and released as needed; in the &amp;ldquo;cloud-iest&amp;rdquo; apps, without human intervention. Public cloud resources behave like utilities, and that&amp;rsquo;s how they&amp;rsquo;re priced.&lt;/p&gt;

&lt;p&gt;You probably can&amp;rsquo;t tell me how much electricity your home is using right now, and  may only come within tens of dollars of accurately predicting your water bill. But neither of those bills are all that scary, because you assume there&amp;rsquo;s a limit to how much you could run them up in a single billing interval.&lt;/p&gt;

&lt;p&gt;That&amp;rsquo;s not true of public clouds. There are only so many ways to &amp;ldquo;spend&amp;rdquo; water at your home, but there are indeterminably many ways to land on a code path that grabs another VM, or to miskey a configuration, or to leak long-running CI/CD environments every time a PR gets merged. Pick a practitioner at random, and I bet they&amp;rsquo;ve read a story within the last couple months about someone running up a galactic-scale bill at some provider or other.&lt;/p&gt;
&lt;h2 id='implied-accident-forgiveness' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#implied-accident-forgiveness' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Implied Accident Forgiveness&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;For people who don&amp;rsquo;t do a lot of cloud work, what all this means is that every PR push sets off a little alarm in the back of their heads: &amp;ldquo;you may have just incurred $200,000 of costs!&amp;rdquo;. The alarm is quickly silenced,  though it&amp;rsquo;s still subtly extracting a cortisol penalty. But by deadening the nerves that sense the danger of unexpected charges, those people are nudged closer to themselves being the next story on Twitter about an accidental $200,000 bill.&lt;/p&gt;

&lt;p&gt;The saving grace here, which you&amp;rsquo;ll learn if you ever become that $200,000 story, is that nobody pays those bills.&lt;/p&gt;

&lt;p&gt;See, what cloud-savvy people know already is that providers have billing support teams, which spend a big chunk of their time conceding disputed bills. If you do something luridly stupid and rack up costs, AWS and GCP will probably cut you a break. We will too. Everyone does.&lt;/p&gt;

&lt;p&gt;If you didn&amp;rsquo;t already know this, you&amp;rsquo;re welcome; I&amp;rsquo;ve made your life a little better, even if you don&amp;rsquo;t run things on Fly.io.&lt;/p&gt;

&lt;p&gt;But as soothing as it is to know you can get a break from cloud providers, the billing situation here is still a long ways away from &amp;ldquo;good&amp;rdquo;. If you accidentally add a zero to a scale count and don&amp;rsquo;t notice for several weeks, AWS or GCP will probably cut you a break. But they won&amp;rsquo;t &lt;em&gt;definitely&lt;/em&gt; do it, and even though your odds are good, you&amp;rsquo;re still finding out at email- and phone-tag scale speeds. That&amp;rsquo;s not fun!&lt;/p&gt;
&lt;h2 id='explicit-accident-forgiveness' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#explicit-accident-forgiveness' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Explicit Accident Forgiveness&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Charging you for stuff you didn&amp;rsquo;t want is bad business.&lt;/p&gt;

&lt;p&gt;Good business, we think, means making you so comfortable with your cloud you try new stuff. You, and everyone else on your team. Without a chaperone from the finance department.&lt;/p&gt;

&lt;p&gt;So we&amp;rsquo;re going to do the work to make this official. If you&amp;rsquo;re a customer of ours, we&amp;rsquo;re going to charge you in exacting detail for every resource you intentionally use of ours, but if something blows up and you get an unexpected bill, we&amp;rsquo;re going to let you off the hook.&lt;/p&gt;
&lt;h2 id='not-so-fast' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#not-so-fast' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Not So Fast&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;This is a Project, with a capital P. While we&amp;rsquo;re kind of kicking ourselves for not starting it earlier, there are reasons we couldn&amp;rsquo;t do it back in 2020.&lt;/p&gt;

&lt;p&gt;The Fully Automated Accident-Forgiving Billing System of the Future (which we are in fact building and may even one day ship) will give you a line-item veto on your invoice. We are a long ways away. The biggest reason is fraud.&lt;/p&gt;

&lt;p&gt;Sit back, close your eyes, and try to think about everything public clouds do to make your life harder. Chances are, most of those things are responses to fraud. Cloud platforms attract fraudsters like ants to an upturned ice cream cone. Thanks to the modern science of cryptography, fraudsters have had a 15-year head start on turning metered compute into picodollar-granular near-money assets.&lt;/p&gt;

&lt;p&gt;Since there&amp;rsquo;s no bouncer at the door checking IDs here, an open-ended and automated commitment to accident forgiveness is, with iron certainty, going to be used overwhelmingly in order to trick us into &amp;ldquo;forgiving&amp;rdquo; cryptocurrency miners. We&amp;rsquo;re cloud platform engineers. They&amp;rsquo;re our primary pathogen.&lt;/p&gt;

&lt;p&gt;So, we&amp;rsquo;re going to roll this out incrementally.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;&lt;strong class="font-semibold text-navy-950"&gt;Why not billing alerts?&lt;/strong&gt; We’ll get there, but here too there are reasons we haven’t yet: (1) meaningful billing alerts were incredibly difficult to do with our previous billing system, and building the new system and migrating our customers to it has been a huge lift, a nightmare from which we are only now waking (the billing system’s official name); and (2) we’re wary about alerts being a product design cop-out; if we can alert on something, why aren’t we fixing it?&lt;/p&gt;
&lt;/div&gt;&lt;h2 id='accident-forgiveness-v0-84beta' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#accident-forgiveness-v0-84beta' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Accident Forgiveness v0.84beta&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;All the same subtextual, implied reassurances that every cloud provider offers remain in place at Fly.io. You are strictly better off after this announcement, we promise.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;I added the “almost” right before publishing, because I’m chicken.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Now: for customers that have a support contract with us, at any level, there&amp;rsquo;s something new: I&amp;rsquo;m saying the quiet part loud. The next time you see a bill with an unexpected charge on it, we&amp;rsquo;ll refund that charge, (almost) no questions asked.&lt;/p&gt;

&lt;p&gt;That policy is so simple it feels anticlimactic to write. So, some additional color commentary:&lt;/p&gt;

&lt;p&gt;We&amp;rsquo;re not advertising a limit to the number of times you can do this. If you&amp;rsquo;re a serious customer of ours, I promise that you cannot remotely fathom the fullness of our fellow-feeling. You&amp;rsquo;re not annoying us by getting us to refund unexpected charges. If you are growing a project on Fly.io, we will bend over backwards to keep you growing.&lt;/p&gt;

&lt;p&gt;How far can we take this? How simple can we keep this policy? We&amp;rsquo;re going to find out together.&lt;/p&gt;

&lt;p&gt;To begin with, and in the spirit of &amp;ldquo;doing things that won&amp;rsquo;t scale&amp;rdquo;, when we forgive a bill, what&amp;rsquo;s going to happen next is this: I&amp;rsquo;m going to set an irritating personal reminder for Kurt to look into what happened, now and then the day before your next bill, so we can see what&amp;rsquo;s going wrong. He&amp;rsquo;s going to hate that, which is the point: our best feature work is driven by Kurt-hate.&lt;/p&gt;

&lt;p&gt;Obviously, if you&amp;rsquo;re rubbing your hands together excitedly over the opportunity this policy presents, then, well, not so much with the fellow-feeling. We reserve the right to cut you off.&lt;/p&gt;
&lt;figure class="post-cta"&gt;
  &lt;figcaption&gt;
    &lt;h1&gt;Support For Developers, By Developers&lt;/h1&gt;
    &lt;p&gt;Explicit Accident Forgiveness is just one thing we like about Support at Fly.io.&lt;/p&gt;
      &lt;a class="btn btn-lg" href="https://fly.io/accident-forgiveness"&gt;
        Go find out! &amp;nbsp;&lt;span class='opacity-50'&gt;→&lt;/span&gt;
      &lt;/a&gt;
  &lt;/figcaption&gt;
  &lt;div class="image-container"&gt;
    &lt;img src="/static/images/cta-cat.webp" srcset="/static/images/[email protected] 2x" alt=""&gt;
  &lt;/div&gt;
&lt;/figure&gt;

&lt;h2 id='whats-next-accident-protection' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#whats-next-accident-protection' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;What&amp;rsquo;s Next: Accident Protection&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;We think this is a pretty good first step. But that&amp;rsquo;s all it is.&lt;/p&gt;

&lt;p&gt;We can do better than offering you easy refunds for mistaken deployments and botched CI/CD jobs. What&amp;rsquo;s better than getting a refund is never incurring the charge to begin with, and that&amp;rsquo;s the next step we&amp;rsquo;re working on.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;More to come on that billing system.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;We built a new billing system so that we can do things like that. For instance: we&amp;rsquo;re in a position to catch sudden spikes in your month-over-month bills, flag them, and catch weird-looking deployments before we bill for them.&lt;/p&gt;

&lt;p&gt;Another thing we rebuilt billing for is &lt;a href='https://community.fly.io/t/reservation-blocks-40-discount-on-machines-when-youre-ready-to-commit/20858' title=''&gt;reserved pricing&lt;/a&gt;. Already today you can get a steep discount from us by reserving blocks of compute in advance. The trick to taking advantage of reserved pricing is confidently predicting a floor to your usage. For a lot of people, that means fighting feelings of loss aversion (nobody wants to get gym priced!). So another thing we can do in this same vein: catch opportunities to move customers to reserved blocks, and offer backdated reservations. We&amp;rsquo;ll figure this out too.&lt;/p&gt;

&lt;p&gt;Someday, when we&amp;rsquo;re in a monopoly position, our founders have all been replaced by ruthless MBAs, and Kurt has retired to farm coffee beans in lower Montana, we may stop doing this stuff. But until that day this is the right choice for our business.&lt;/p&gt;

&lt;p&gt;Meanwhile: like every public cloud, we provision our own hardware, and we have excess capacity. Your messed-up CI/CD jobs didn&amp;rsquo;t really cost us anything, so if you didn&amp;rsquo;t really want them, they shouldn&amp;rsquo;t cost you anything either. Take us up on this! We love talking to you.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>We're Cutting L40S Prices In Half</title>
    <link rel="alternate" href="https://fly.io/blog/cutting-prices-for-l40s-gpus-in-half/"/>
    <id>https://fly.io/blog/cutting-prices-for-l40s-gpus-in-half/</id>
    <published>2024-08-15T00:00:00+00:00</published>
    <updated>2024-09-11T00:04:08+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/cutting-prices-for-l40s-gpus-in-half/assets/gpu-ga-thumb.webp"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;We’re Fly.io, a new public cloud with simple, developer-friendly ergonomics. And as of today, cheaper GPUs. &lt;a href="https://fly.io/speedrun" title=""&gt;Try it out; you’ll be deployed in just minutes&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;We just lowered the prices on NVIDIA L40S GPUs to $1.25 per hour. Why? Because our feet are cold and we burn processor cycles for heat. But also other reasons.&lt;/p&gt;

&lt;p&gt;Let&amp;rsquo;s back up.&lt;/p&gt;

&lt;p&gt;We offer 4 different NVIDIA GPU models; in increasing order of performance, they&amp;rsquo;re the A10, the L40S, the 40G PCIe A100, and the 80G SXM A100.  Guess which one is most popular.&lt;/p&gt;

&lt;p&gt;We guessed wrong, and spent a lot of time working out how to maximize the amount of GPU power we could deliver to a single Fly Machine. Users surprised us. By a wide margin, the most popular GPU in our inventory is the A10.&lt;/p&gt;

&lt;p&gt;The A10 is an older generation of NVIDIA GPU with fewer, slower cores and less memory. It&amp;rsquo;s the least capable GPU we offer. But that doesn&amp;rsquo;t matter, because it&amp;rsquo;s capable enough. It&amp;rsquo;s solid for random inference tasks, and handles mid-sized generative AI stuff like Mistral Nemo or Stable Diffusion. For those workloads, there&amp;rsquo;s not that much benefit in getting a beefier GPU.&lt;/p&gt;

&lt;p&gt;As a result, we can&amp;rsquo;t get new A10s in fast enough for our users.&lt;/p&gt;

&lt;p&gt;If there&amp;rsquo;s one thing we&amp;rsquo;ve learned by talking to our customers over the last 4 years, it&amp;rsquo;s that y&amp;#39;all love a peek behind the curtain. So we&amp;rsquo;re going to let you in on a little secret about how a hardware provider like Fly.io formulates GPU strategy: none of us know what the hell we&amp;rsquo;re doing.&lt;/p&gt;

&lt;p&gt;If you had asked us in 2023 what the biggest GPU problem we could solve was, we&amp;rsquo;d have said &amp;ldquo;selling fractional A100 slices&amp;rdquo;. We burned a whole quarter trying to get MIG, or at least vGPUs, working through IOMMU PCI passthrough on Fly Machines, in a project so cursed that Thomas has forsworn ever programming again. Then we went to market selling whole A100s, and for several more months it looked like the biggest problem we needed to solve was finding a secure way to expose NVLink-ganged A100 clusters to VMs so users could run training. Then H100s; can we find H100s anywhere? Maybe in a black market in Shenzhen?&lt;/p&gt;

&lt;p&gt;And here we are, a year later, looking at the data, and the least sexy, least interesting GPU part in the catalog is where all the action is.&lt;/p&gt;

&lt;p&gt;With actual customer data to back up the hypothesis, here&amp;rsquo;s what we think is happening today:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Most users who want to plug GPU-accelerated AI workloads into fast networks are doing inference, not training. 
&lt;/li&gt;&lt;li&gt;The hyperscaler public clouds strangle these customers, first with GPU instance surcharges, and then with egress fees for object storage data when those customers try to outsource the GPU stuff to GPU providers.
&lt;/li&gt;&lt;li&gt;If you&amp;rsquo;re trying to do something GPU-accelerated in response to an HTTP request, the right combination of GPU, instance RAM, fast object storage for datasets and model parameters, and networking is much more important than getting your hands on an H100.
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;This is a thing we didn&amp;rsquo;t see coming, but should have: training workloads tend to look more like batch jobs, and inference tends to look more like transactions. Batch training jobs aren&amp;rsquo;t that sensitive to networking or even reliability. Live inference jobs responding to end-user HTTP requests are. So, given our pricing, of course the A10s are a sweet spot.&lt;/p&gt;

&lt;p&gt;The next step up in our lineup after the A10 is the L40S. The L40S is a nice piece of kit. We&amp;rsquo;re going to take a beat here and sell you on the L40S, because it&amp;rsquo;s kind of awesome.&lt;/p&gt;

&lt;p&gt;The L40S is an AI-optimized version of the L40, which is the data center version of the GeForce RTX 4090, resembling two 4090s stapled together.&lt;/p&gt;

&lt;p&gt;If you&amp;rsquo;re not a GPU hardware person, the RTX 4090 is a gaming GPU, the kind you&amp;rsquo;d play ray-traced Witcher 3 on. NVIDIA&amp;rsquo;s high-end gaming GPUs are actually reasonably good at AI workloads! But they suck in a data center rack: they chug power, they&amp;rsquo;re hard to cool, and they&amp;rsquo;re less dense. Also, NVIDIA can&amp;rsquo;t charge as much for them.&lt;/p&gt;

&lt;p&gt;Hence the L40: (much) more memory, less energy consumption, designed for a rack, not a tower case. Marked up for &amp;ldquo;enterprise&amp;rdquo;.&lt;/p&gt;

&lt;p&gt;NVIDIA positioned the L40 as a kind of &amp;ldquo;graphics&amp;rdquo; AI GPU. Unlike the super high-end cards like the A100/H100, the L40 keeps all the rendering hardware, so it&amp;rsquo;s good for 3D graphics and video processing. Which is sort of what you&amp;rsquo;d expect from a &amp;ldquo;professionalized&amp;rdquo; GeForce card.&lt;/p&gt;

&lt;p&gt;A funny thing happened in the middle of 2023, though: the market for ultra-high-end NVIDIA cards went absolutely batshit. The huge cards you&amp;rsquo;d gang up for training jobs got impossible to find, and NVIDIA became one of the most valuable companies in the world. Serious shops started working out plans to acquire groups of L40-type cards to work around the problem, whether or not they had graphics workloads.&lt;/p&gt;

&lt;p&gt;The only company in this space that does know what they&amp;rsquo;re doing is NVIDIA. Nobody has written a highly-ranked Reddit post about GPU workloads without NVIDIA noticing and creating a new SKU. So they launched the L40S, which is an L40 with AI workload compute performance comparable to that of the A100 (without us getting into the details of FP32 vs. FP16 models).&lt;/p&gt;

&lt;p&gt;Long story short, the L40S is an A100-performer that we can price for A10 customers; the Volkswagen GTI of our lineup. We&amp;rsquo;re going to see if we can make that happen.&lt;/p&gt;

&lt;p&gt;We think the combination of just-right-sized inference GPUs and Tigris object storage is pretty killer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;model parameters, data sets, and compute are all close together
&lt;/li&gt;&lt;li&gt;everything plugged into an Anycast network that&amp;rsquo;s fast everywhere in the world
&lt;/li&gt;&lt;li&gt;on VM instances that have enough memory to actually run real frameworks on
&lt;/li&gt;&lt;li&gt;priced like we actually want you to use it.
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;You should use L40S cards without thinking hard about it. So we&amp;rsquo;re making it official. You won&amp;rsquo;t pay us a dime extra to use one instead of an A10. Have at it! Revolutionize the industry. For $1.25 an hour.&lt;/p&gt;

&lt;p&gt;Here are things you can do with an L40S on Fly.io today:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You can run Llama 3.1 70B — a big Llama — for LLM jobs.
&lt;/li&gt;&lt;li&gt;You can run Flux from Black Forest Labs for genAI images.
&lt;/li&gt;&lt;li&gt;You can run Whisper for automated speech recognition.
&lt;/li&gt;&lt;li&gt;You can do whole-genome alignment with SegAlign (Thomas&amp;rsquo; biochemist kid who has been snatching free GPU hours for his lab gave us this one, and we&amp;rsquo;re taking his word for it).
&lt;/li&gt;&lt;li&gt;You can run DOOM Eternal, building the Stadia that Google couldn&amp;rsquo;t pull off, because the L40S hasn&amp;rsquo;t forgotten that it&amp;rsquo;s a graphics GPU. 
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;It&amp;rsquo;s going to get chilly in Chicago in a month or so. Go light some cycles on fire! &lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Making Machines Move</title>
    <link rel="alternate" href="https://fly.io/blog/machine-migrations/"/>
    <id>https://fly.io/blog/machine-migrations/</id>
    <published>2024-07-30T00:00:00+00:00</published>
    <updated>2024-08-09T12:14:08+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/machine-migrations/assets/migrations-thumb.webp"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;We’re Fly.io, a global public cloud with simple, developer-friendly ergonomics. If you’ve got a working Docker image, we’ll transmogrify it into a Fly Machine: a VM running on our hardware anywhere in the world. &lt;a href="https://fly.io/speedrun" title=""&gt;Try it out; you’ll be deployed in just minutes&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;At the heart of our platform is a systems design tradeoff about durable storage for applications.  When we added storage three years ago, to support stateful apps, we built it on attached NVMe drives. A benefit: a Fly App accessing a file on a Fly Volume is never more than a bus hop away from the data. A cost: a Fly App with an attached Volume is anchored to a particular worker physical.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;&lt;code&gt;bird&lt;/code&gt;: a BGP4 route server.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Before offering attached storage, our on-call runbook was almost as simple as “de-bird that edge server”, “tell &lt;a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''&gt;Nomad&lt;/a&gt; to drain that worker”, and “go back to sleep”. NVMe cost us that drain operation, which terribly complicated the lives of our infra team. We’ve spent the last year getting “drain” back. It’s one of the biggest engineering lifts we&amp;rsquo;ve made, and if you didn’t notice, we lifted it cleanly.&lt;/p&gt;
&lt;h3 id='the-goalposts' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-goalposts' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;The Goalposts&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;With stateless apps, draining a worker is easy. For each app instance running on the victim server, start a new instance elsewhere. Confirm it&amp;rsquo;s healthy, then kill the old one. Rinse, repeat. At our 2020 scale, we could drain a fully loaded worker in just a handful of minutes.&lt;/p&gt;

&lt;p&gt;You can see why this process won&amp;rsquo;t work for apps with attached volumes. Sure, create a new volume elsewhere on the fleet, and boot up a new Fly Machine attached to it. But the new volume is empty. The data&amp;rsquo;s still stuck on the original worker. We asked, and customers were not OK with this kind of migration.&lt;/p&gt;

&lt;p&gt;Of course, we back Volumes up with snapshots (at an interval) to off-network storage. But for “drain”, restoring backups isn&amp;rsquo;t nearly good enough. No matter the backup interval, a “restore from backup” migration will lose data, and a “backup and restore” migration incurs untenable downtime.&lt;/p&gt;

&lt;p&gt;The next thought you have is, “OK, copy the volume over”. And, yes, of course you have to do that. But you can’t just &lt;code&gt;copy&lt;/code&gt;, &lt;code&gt;boot&lt;/code&gt;, and then &lt;code&gt;kill&lt;/code&gt; the old Fly Machine. Because the original Fly Machine is still alive and writing, you have to &lt;code&gt;kill&lt;/code&gt; first, then &lt;code&gt;copy&lt;/code&gt;, then &lt;code&gt;boot&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Fly Volumes can get pretty big. Even copying to a rack-buddy physical server, you&amp;rsquo;ll hit a point where draining incurs minutes of interruption, especially if you’re moving lots of volumes simultaneously. &lt;code&gt;Kill&lt;/code&gt;, &lt;code&gt;copy&lt;/code&gt;, &lt;code&gt;boot&lt;/code&gt; is too slow.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;There’s a world where even 15 minutes of interruption is tolerable. It’s the world where you run more than one instance of your application to begin with, so prolonged interruption of a single Fly Machine isn’t visible to your users. Do this! But we have to live in the same world as our customers, many of whom don’t run in high-availability configurations.&lt;/p&gt;
&lt;/div&gt;&lt;h3 id='behold-the-clone-o-mat' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#behold-the-clone-o-mat' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Behold The Clone-O-Mat&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;Copy&lt;/code&gt;, &lt;code&gt;boot&lt;/code&gt;, &lt;code&gt;kill&lt;/code&gt; loses data. &lt;code&gt;Kill&lt;/code&gt;, &lt;code&gt;copy&lt;/code&gt;, &lt;code&gt;boot&lt;/code&gt; takes too long. What we needed was a new operation: &lt;code&gt;clone&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Clone&lt;/code&gt; is a lazier, asynchronous &lt;code&gt;copy&lt;/code&gt;. It creates a new volume elsewhere on our fleet, just like &lt;code&gt;copy&lt;/code&gt; would. But instead of blocking, waiting to transfer every byte from the original volume, &lt;code&gt;clone&lt;/code&gt; returns immediately, with a transfer running in the background.&lt;/p&gt;

&lt;p&gt;A new Fly Machine can be booted with that cloned volume attached. Its blocks are mostly empty. But that’s OK: when the new Fly Machine tries to read from it, the block storage system works out whether the block has been transferred. If it hasn’t, it’s fetched over the network from the original volume; this is called &amp;ldquo;hydration&amp;rdquo;. Writes are even easier, and don’t hit the network at all.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Kill&lt;/code&gt;, &lt;code&gt;copy&lt;/code&gt;, &lt;code&gt;boot&lt;/code&gt; is slow. But &lt;code&gt;kill&lt;/code&gt;, &lt;code&gt;clone&lt;/code&gt;, &lt;code&gt;boot&lt;/code&gt; is fast; it can be made asymptotically as fast as stateless migration.&lt;/p&gt;

&lt;p&gt;There are three big moving pieces to this design.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;First, we have to rig up our OS storage system to make this &lt;code&gt;clone&lt;/code&gt; operation work.
&lt;/li&gt;&lt;li&gt;Then, to read blocks over the network, we need a network protocol. (Spoiler: iSCSI, though we tried other stuff first.)
&lt;/li&gt;&lt;li&gt;Finally, the gnarliest of those pieces is our orchestration logic: what’s running where, what state it’s in, and whether it’s plugged in correctly.
&lt;/li&gt;&lt;/ol&gt;
&lt;h3 id='block-level-clone' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#block-level-clone' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Block-Level Clone&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;The Linux feature we need to make this work already exists; &lt;a href='https://docs.kernel.org/admin-guide/device-mapper/dm-clone.html' title=''&gt;it’s called &lt;code&gt;dm-clone&lt;/code&gt;&lt;/a&gt;. Given an existing, readable storage device, &lt;code&gt;dm-clone&lt;/code&gt; gives us a new device, of identical size, where reads of uninitialized blocks will pull from the original. It sounds terribly complicated, but it’s actually one of the simpler kernel lego bricks. Let&amp;rsquo;s demystify it.&lt;/p&gt;

&lt;p&gt;As far as Unix is concerned, random-access storage devices, be they spinning rust or NVMe drives, are all instances of the common class “block device”. A block device is addressed in fixed-size (say, 4KiB) chunks, and &lt;a href='https://elixir.bootlin.com/linux/v5.11.11/source/include/linux/blk_types.h#L356' title=''&gt;handles (roughly) these operations&lt;/a&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative cpp"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-woz6bsz9"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-woz6bsz9"&gt;&lt;span class="k"&gt;enum&lt;/span&gt; &lt;span class="n"&gt;req_opf&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="cm"&gt;/* read sectors from the device */&lt;/span&gt;
    &lt;span class="n"&gt;REQ_OP_READ&lt;/span&gt;     &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="cm"&gt;/* write sectors to the device */&lt;/span&gt;
    &lt;span class="n"&gt;REQ_OP_WRITE&lt;/span&gt;        &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="cm"&gt;/* flush the volatile write cache */&lt;/span&gt;
    &lt;span class="n"&gt;REQ_OP_FLUSH&lt;/span&gt;        &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="cm"&gt;/* discard sectors */&lt;/span&gt;
    &lt;span class="n"&gt;REQ_OP_DISCARD&lt;/span&gt;      &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="cm"&gt;/* securely erase sectors */&lt;/span&gt;
    &lt;span class="n"&gt;REQ_OP_SECURE_ERASE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="cm"&gt;/* write the same sector many times */&lt;/span&gt;
    &lt;span class="n"&gt;REQ_OP_WRITE_SAME&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="cm"&gt;/* write the zero filled sector many times */&lt;/span&gt;
    &lt;span class="n"&gt;REQ_OP_WRITE_ZEROES&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="cm"&gt;/* ... */&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;You can imagine designing a simple network protocol that supported all these options. It might have messages that looked something like:&lt;/p&gt;

&lt;p&gt;&lt;img alt="A packet diagram, just skip down to &amp;quot;struct bio&amp;quot; below" src="/blog/machine-migrations/assets/packet.png?2/3&amp;amp;center" /&gt;
Good news! The Linux block system is organized as if your computer was a network running a protocol that basically looks just like that. Here’s the message structure:&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;I’ve &lt;a href="https://elixir.bootlin.com/linux/v5.11.11/source/include/linux/blk_types.h#L223" title=""&gt;stripped a bunch of stuff out of here&lt;/a&gt; but you don’t need any of it to understand what’s coming next.&lt;/p&gt;
&lt;/div&gt;&lt;div class="highlight-wrapper group relative cpp"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-kwrloyie"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-kwrloyie"&gt;&lt;span class="cm"&gt;/*
 * main unit of I/O for the block layer and lower layers (ie drivers and
 * stacking drivers)
 */&lt;/span&gt;
&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="nc"&gt;bio&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="nc"&gt;gendisk&lt;/span&gt;      &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;bi_disk&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;unsigned&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;        &lt;span class="n"&gt;bi_opf&lt;/span&gt;
    &lt;span class="kt"&gt;unsigned&lt;/span&gt; &lt;span class="kt"&gt;short&lt;/span&gt;      &lt;span class="n"&gt;bi_flags&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;   
    &lt;span class="kt"&gt;unsigned&lt;/span&gt; &lt;span class="kt"&gt;short&lt;/span&gt;      &lt;span class="n"&gt;bi_ioprio&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;blk_status_t&lt;/span&gt;        &lt;span class="n"&gt;bi_status&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;unsigned&lt;/span&gt; &lt;span class="kt"&gt;short&lt;/span&gt;      &lt;span class="n"&gt;bi_vcnt&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;         &lt;span class="cm"&gt;/* how many bio_vec's */&lt;/span&gt;
    &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="nc"&gt;bio_vec&lt;/span&gt;      &lt;span class="n"&gt;bi_inline_vecs&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="cm"&gt;/* (page, len, offset) tuples */&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;No nerd has ever looked at a fixed-format message like this without thinking about writing a proxy for it, and &lt;code&gt;struct bio&lt;/code&gt; is no exception. The proxy system in the Linux kernel for &lt;code&gt;struct bio&lt;/code&gt; is called &lt;code&gt;device mapper&lt;/code&gt;, or DM.&lt;/p&gt;

&lt;p&gt;DM target devices can plug into other DM devices. For that matter, they can do whatever the hell else they want, as long as they honor the interface. It boils down to a &lt;code&gt;map(bio)&lt;/code&gt; function, which can dispatch a &lt;code&gt;struct bio&lt;/code&gt;, or drop it, or muck with it and ask the kernel to resubmit it.&lt;/p&gt;

&lt;p&gt;You can do a whole lot of stuff with this interface: carve a big device into a bunch of smaller ones (&lt;a href='https://docs.kernel.org/admin-guide/device-mapper/linear.html' title=''&gt;&lt;code&gt;dm-linear&lt;/code&gt;&lt;/a&gt;), make one big striped device out of a bunch of smaller ones (&lt;a href='https://docs.kernel.org/admin-guide/device-mapper/striped.html' title=''&gt;&lt;code&gt;dm-stripe&lt;/code&gt;&lt;/a&gt;), do software RAID mirroring (&lt;code&gt;dm-raid1&lt;/code&gt;), create snapshots of arbitrary existing devices (&lt;a href='https://docs.kernel.org/admin-guide/device-mapper/snapshot.html' title=''&gt;&lt;code&gt;dm-snap&lt;/code&gt;&lt;/a&gt;), cryptographically verify boot devices (&lt;a href='https://docs.kernel.org/admin-guide/device-mapper/verity.html' title=''&gt;&lt;code&gt;dm-verity&lt;/code&gt;&lt;/a&gt;), and a bunch more. Device Mapper is the kernel backend for the &lt;a href='https://sourceware.org/lvm2/' title=''&gt;userland LVM2 system&lt;/a&gt;, which is how we do &lt;a href='https://fly.io/blog/persistent-storage-and-fast-remote-builds/' title=''&gt;thin pools and snapshot backups&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Which brings us to &lt;code&gt;dm-clone&lt;/code&gt;: it’s a map function that boils down to:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative cpp"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-8n5vrld6"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-8n5vrld6"&gt;    &lt;span class="cm"&gt;/* ... */&lt;/span&gt; 
    &lt;span class="n"&gt;region_nr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bio_to_region&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clone&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bio&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// we have the data&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dm_clone_is_region_hydrated&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clone&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;cmd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;region_nr&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;remap_and_issue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clone&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bio&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="c1"&gt;// we don't and it's a read&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bio_data_dir&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bio&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;READ&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;remap_to_source&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clone&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bio&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;// we don't and it's a write&lt;/span&gt;
    &lt;span class="n"&gt;remap_to_dest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clone&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bio&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;hydrate_bio_region&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clone&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bio&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="cm"&gt;/* ... */&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;&lt;div class="right-sidenote"&gt;&lt;p&gt;a &lt;a href="https://docs.kernel.org/admin-guide/device-mapper/kcopyd.html" title=""&gt;&lt;code&gt;kcopyd&lt;/code&gt;&lt;/a&gt; thread runs in the background, rehydrating the device in addition to (and independent of) read accesses.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;&lt;code&gt;dm-clone&lt;/code&gt; takes, in addition to the source device to clone from, a “metadata” device on which is stored a bitmap of the status of all the blocks: either “hydrated” from the source, or not. That’s how it knows whether to fetch a block from the original device or read it locally from the clone.&lt;/p&gt;
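The bookkeeping is small enough to sketch in user-space C (the names here are invented for illustration; the real implementation lives in the kernel's dm-clone metadata code): one bit per region, consulted on every read and flipped once a region has been hydrated.

```c
#include <stdint.h>
#include <stddef.h>

/* One bit per region: 1 = hydrated (data is local), 0 = still on the source. */
#define NR_REGIONS 1024
static uint8_t hydration_bitmap[NR_REGIONS / 8];

static int region_is_hydrated(size_t region) {
    return (hydration_bitmap[region / 8] >> (region % 8)) & 1;
}

static void mark_region_hydrated(size_t region) {
    hydration_bitmap[region / 8] |= (uint8_t)(1 << (region % 8));
}

/* Where should a read of this region be served from? */
enum remap { REMAP_TO_SOURCE, REMAP_TO_CLONE };

static enum remap remap_read(size_t region) {
    return region_is_hydrated(region) ? REMAP_TO_CLONE : REMAP_TO_SOURCE;
}
```

Writes and discards only ever flip bits toward "hydrated", which is why they never have to touch the network.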
&lt;h3 id='network-clone' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#network-clone' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Network Clone&lt;/span&gt;&lt;/h3&gt;&lt;div class="callout"&gt;&lt;p&gt;&lt;strong class="font-semibold text-navy-950"&gt;&lt;code&gt;flyd&lt;/code&gt; in a nutshell:&lt;/strong&gt; worker physicals run a service, &lt;code&gt;flyd&lt;/code&gt;, which manages a couple of databases that are the source of truth for all the Fly Machines running there. Conceptually, &lt;code&gt;flyd&lt;/code&gt; is a server for on-demand instances of durable finite state machines, each representing some operation on a Fly Machine (creation, start, stop, &amp;amp;c), with the transition steps recorded carefully in a BoltDB database. An FSM step might be something like “assign a local IPv6 address to this interface”, or “check out a block device with the contents of this container”, and it’s straightforward to add and manage new ones.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Say we&amp;rsquo;ve got &lt;code&gt;flyd&lt;/code&gt; managing a Fly Machine with a volume on &lt;code&gt;worker-xx-cdg1-1&lt;/code&gt;. We want it running on &lt;code&gt;worker-xx-cdg1-2&lt;/code&gt;. Our whole fleet is meshed with WireGuard; everything can talk directly to everything else. So, conceptually:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;code&gt;flyd&lt;/code&gt; on &lt;code&gt;cdg1-1&lt;/code&gt; stops the Fly Machine, and
&lt;/li&gt;&lt;li&gt;sends a message to &lt;code&gt;flyd&lt;/code&gt; on &lt;code&gt;cdg1-2&lt;/code&gt; telling it to clone the source volume.
&lt;/li&gt;&lt;li&gt;&lt;code&gt;flyd&lt;/code&gt; on &lt;code&gt;cdg1-2&lt;/code&gt; starts a &lt;code&gt;dm-clone&lt;/code&gt; instance, which creates a clone volume on &lt;code&gt;cdg1-2&lt;/code&gt;, populating it, over some kind of network block protocol, from &lt;code&gt;cdg1-1&lt;/code&gt;, and
&lt;/li&gt;&lt;li&gt;boots a new Fly Machine, attached to the clone volume.
&lt;/li&gt;&lt;li&gt;&lt;code&gt;flyd&lt;/code&gt; on &lt;code&gt;cdg1-2&lt;/code&gt; monitors the clone operation, and, when the clone completes, converts the clone device to a simple linear device and cleans up.
&lt;/li&gt;&lt;/ol&gt;
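At the device-mapper level, steps (3) and (5) look roughly like the following (a sketch following the kernel's dm-clone documentation; the device names are placeholders, and in reality the orchestrator drives this through ioctls rather than a shell):

```shell
# Step 3: create the clone target. Reads of unhydrated regions are
# served from the (network-attached) source; everything else is local.
# Table: clone <metadata dev> <destination dev> <source dev> <region size in sectors>
SECTORS=209715200   # a 100GiB volume
dmsetup create cloned-vol --table \
  "0 $SECTORS clone /dev/vg/meta /dev/vg/dest /dev/mapper/iscsi-src 8"

# Background hydration can be toggled at runtime:
dmsetup message cloned-vol 0 enable_hydration

# Step 5: once `dmsetup status` reports every region hydrated, swap the
# table for a plain linear mapping and drop the source entirely.
dmsetup suspend cloned-vol
dmsetup load cloned-vol --table "0 $SECTORS linear /dev/vg/dest 0"
dmsetup resume cloned-vol
```

The table-swap at the end is what makes the clone "convert to a simple linear device": the Fly Machine keeps the same block device node the whole time.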

&lt;p&gt;For step (3) to work, the “original volume” on &lt;code&gt;cdg1-1&lt;/code&gt; has to be visible on &lt;code&gt;cdg1-2&lt;/code&gt;, which means we need to mount it over the network.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;&lt;code&gt;nbd&lt;/code&gt; is so simple that it’s used as a sort of &lt;code&gt;dm-user&lt;/code&gt; userland block device; to prototype a new block device, &lt;a href="https://lwn.net/ml/linux-kernel/[email protected]/" title=""&gt;don’t bother writing a kernel module&lt;/a&gt;, just write an &lt;code&gt;nbd&lt;/code&gt; server.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Take your pick of protocols. iSCSI is the obvious one, but it’s relatively complicated, and Linux has native support for a much simpler one: &lt;code&gt;nbd&lt;/code&gt;, the “network block device”. You could implement an &lt;code&gt;nbd&lt;/code&gt; server in an afternoon, on top of a file or a SQLite database or S3, and the Linux kernel could mount it as a drive.&lt;/p&gt;
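To give a sense of how simple: an nbd request on the wire is a fixed 28-byte big-endian header. Parsing one is most of an afternoon's server (a sketch; error handling and the reply side omitted):

```c
#include <stdint.h>

#define NBD_REQUEST_MAGIC 0x25609513u
#define NBD_CMD_READ 0

struct nbd_request {
    uint32_t magic;
    uint16_t flags;
    uint16_t type;    /* read, write, flush, trim, ... */
    uint64_t handle;  /* opaque cookie echoed back in the reply */
    uint64_t offset;  /* byte offset into the export */
    uint32_t length;  /* bytes to read/write */
};

/* All fields are big-endian on the wire. */
static uint64_t be64(const uint8_t *p) {
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) v = (v << 8) | p[i];
    return v;
}
static uint32_t be32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | p[3];
}
static uint16_t be16(const uint8_t *p) {
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Parse a 28-byte request header; returns 0 on success, -1 on bad magic. */
static int parse_nbd_request(const uint8_t buf[28], struct nbd_request *req) {
    req->magic = be32(buf);
    if (req->magic != NBD_REQUEST_MAGIC) return -1;
    req->flags  = be16(buf + 4);
    req->type   = be16(buf + 6);
    req->handle = be64(buf + 8);
    req->offset = be64(buf + 16);
    req->length = be32(buf + 24);
    return 0;
}
```

Dispatch on `type`, `pread`/`pwrite` against your backing store, send a 16-byte reply, and you have a block device.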

&lt;p&gt;We started out using &lt;code&gt;nbd&lt;/code&gt;. But we kept getting stuck &lt;code&gt;nbd&lt;/code&gt; kernel threads when there was any kind of network disruption. We’re a global public cloud; network disruption happens. Honestly, we could have debugged our way through this. But it was simpler just to spike out an iSCSI implementation, observe that it didn’t get jammed up when the network hiccuped, and move on.&lt;/p&gt;
&lt;h3 id='putting-the-pieces-together' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#putting-the-pieces-together' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Putting The Pieces Together&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;To drain a worker with minimal downtime and no lost data, we turn workers into temporary SANs, serving the volumes we need to drain to fresh-booted replica Fly Machines on a bunch of “target” physicals. Those SANs — combinations of &lt;code&gt;dm-clone&lt;/code&gt;, iSCSI, and &lt;a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''&gt;our &lt;code&gt;flyd&lt;/code&gt; orchestrator&lt;/a&gt; — track the blocks copied from the origin, copying each one exactly once and cleaning up when the original volume has been fully copied.&lt;/p&gt;

&lt;p&gt;Problem solved!&lt;/p&gt;
&lt;h3 id='no-there-were-more-problems' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#no-there-were-more-problems' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;No, There Were More Problems&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;When your problem domain is hard, anything you build whose design you can’t fit completely in your head is going to be a fiasco. Shorter form: “if you see Raft consensus in a design, we’ve done something wrong”.&lt;/p&gt;

&lt;p&gt;A virtue of this migration system is that, for as many moving pieces as it has, it fits in your head. What complexity it has is mostly shouldered by strategic bets we’ve already  built teams around, most notably the &lt;code&gt;flyd&lt;/code&gt; orchestrator. So we’ve been running this system for the better part of a year without much drama. Not no drama, though. Some drama.&lt;/p&gt;

&lt;p&gt;Example: we encrypt volumes. Our key management is fussy. We use per-volume encryption keys, provisioned alongside the volumes themselves, so no one worker has a volume skeleton key.&lt;/p&gt;

&lt;p&gt;If you think “migrating those volume keys from worker to worker” is the problem I’m building up to, well, that too, but the bigger problem is &lt;code&gt;trim&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Most people use just a small fraction of the volumes they allocate. A 100GiB volume with just 5MiB used wouldn’t be at all weird. You don’t want to spend minutes copying a volume that could have been fully hydrated in seconds.&lt;/p&gt;

&lt;p&gt;And indeed, &lt;code&gt;dm-clone&lt;/code&gt; doesn’t want to do that either. Given a source block device (for us, an iSCSI mount) and the clone device, a &lt;code&gt;DISCARD&lt;/code&gt; issued on the clone device will get picked up by &lt;code&gt;dm-clone&lt;/code&gt;, which will simply &lt;a href='https://elixir.bootlin.com/linux/v5.11.11/source/drivers/md/dm-clone-target.c#L1357' title=''&gt;short-circuit the read&lt;/a&gt; of the relevant blocks by marking them as hydrated in the metadata volume. Simple enough.&lt;/p&gt;

&lt;p&gt;To make that work, we need the target worker to see the plaintext of the source volume (so that it can do an &lt;code&gt;fstrim&lt;/code&gt; — don’t get us started on how annoying it is to sandbox this — to read the filesystem, identify the unused blocks, and issue the &lt;code&gt;DISCARD&lt;/code&gt;s where &lt;code&gt;dm-clone&lt;/code&gt; can see them). Easy enough.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;these curses have a lot to do with how hard it was to drain workers!&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Except: two different workers, for cursed reasons, might be running different versions of &lt;a href='https://gitlab.com/cryptsetup/cryptsetup' title=''&gt;cryptsetup&lt;/a&gt;, the userland bridge between LUKS2 and the &lt;a href='https://docs.kernel.org/admin-guide/device-mapper/dm-crypt.html' title=''&gt;kernel dm-crypt driver&lt;/a&gt;. There are (or were) two different versions of cryptsetup on our network, and they default to different &lt;a href='https://fossies.org/linux/cryptsetup/docs/on-disk-format-luks2.pdf' title=''&gt;LUKS2 header sizes&lt;/a&gt; — 4MiB and 16MiB — implying two different plaintext volume sizes.&lt;/p&gt;

&lt;p&gt;So now part of the migration FSM is an RPC call that carries metadata about the desired LUKS2 configuration for the target VM. Not something we expected to have to build, but, whatever.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;Corrosion deserves its own post.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Gnarlier example: workers are the source of truth for information about the Fly Machines running on them. Migration knocks the legs out from under that constraint, which we were relying on in Corrosion, the SWIM-gossip SQLite database we use to connect Fly Machines to our request routing. Race conditions. Debugging. Design changes. Next!&lt;/p&gt;

&lt;p&gt;Gnarliest example: our private networks. Recall: we automatically place every Fly Machine into &lt;a href='https://fly.io/blog/incoming-6pn-private-networks/' title=''&gt;a private network&lt;/a&gt;; by default, it’s the one all the other apps in your organization run in. This is super handy for setting up background services, databases, and clustered applications. 20 lines of eBPF code in our worker kernels keeps anybody from “crossing the streams”, sending packets from one private network to another.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;we’re members of an elite cadre of idiots who have managed to find designs that made us wish IPv6 addresses were even bigger.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;We call this scheme 6PN (for “IPv6 Private Network”). It functions by &lt;a href='https://fly.io/blog/ipv6-wireguard-peering#ipv6-private-networking-at-fly' title=''&gt;embedding routing information directly into IPv6 addresses&lt;/a&gt;. This is, perhaps, gross. But it allows us to route diverse private networks with constantly changing membership across a global fleet of servers without running a distributed routing protocol. As the beardy wizards who kept the Internet backbone up and running on Cisco AGS+’s once said: the best routing protocol is “static”.&lt;/p&gt;
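&lt;p&gt;To make the embedding concrete: here&amp;rsquo;s a toy sketch. The field layout below is invented for illustration (it is not our actual 6PN scheme), but it shows the trick: stash IDs in the hextets on the way in, read them back out when routing.&lt;/p&gt;

```python
import ipaddress

# Invented layout, for illustration only: region, host, and private-network
# IDs packed into the hextets of a unique-local (fd00::/8) IPv6 address.
def pack_6pn(region_id, host_id, net_id):
    return ipaddress.IPv6Address(
        "fd00:{:x}:{:x}:{:x}::".format(region_id, host_id, net_id))

def unpack_6pn(addr):
    # A "static" router recovers the routing hints straight from the address.
    hextets = addr.exploded.split(":")
    return tuple(int(h, 16) for h in hextets[1:4])

addr = pack_6pn(region_id=0x1A, host_id=0x2B, net_id=0x3C)
region, host, net = unpack_6pn(addr)  # (0x1A, 0x2B, 0x3C)
```

&lt;p&gt;No routing protocol needed: any worker holding a packet can decide where it goes from the bits alone.&lt;/p&gt;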

&lt;p&gt;Problem: the embedded routing information in a 6PN address refers in part to specific worker servers.&lt;/p&gt;

&lt;p&gt;That’s fine, right? They’re IPv6 addresses. Nobody uses literal IPv6 addresses. Nobody uses IP addresses at all; they use the DNS. When you migrate a host, just give it a new 6PN address, and update the DNS.&lt;/p&gt;

&lt;p&gt;Friends, somebody did use literal IPv6 addresses. It was us. In the configurations for Fly Postgres clusters.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;It’s also not operationally easy for us to shell into random Fly Machines, for good reason.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The obvious fix for this is not complicated; given &lt;code&gt;flyctl&lt;/code&gt; ssh access to a Fly Postgres cluster, it’s like a 30 second ninja edit. But we run a &lt;em&gt;lot&lt;/em&gt; of Fly Postgres clusters, and the change has to be coordinated carefully to avoid getting the cluster into a confused state. We went as far as adding a feature to our &lt;code&gt;init&lt;/code&gt; to do network address mappings to keep old 6PN addresses reachable before biting the bullet and burning several weeks doing the direct configuration fix fleet-wide.&lt;/p&gt;
&lt;figure class="post-cta"&gt;
  &lt;figcaption&gt;
    &lt;h1&gt;Speedrun your app onto Fly.io.&lt;/h1&gt;
    &lt;p&gt;3&amp;hellip;2&amp;hellip;1&amp;hellip;&lt;/p&gt;
      &lt;a class="btn btn-lg" href="https://fly.io/speedrun"&gt;
        Go! &amp;nbsp;&lt;span class='opacity-50'&gt;→&lt;/span&gt;
      &lt;/a&gt;
  &lt;/figcaption&gt;
  &lt;div class="image-container"&gt;
    &lt;img src="/static/images/cta-dog.webp" srcset="/static/images/[email protected] 2x" alt=""&gt;
  &lt;/div&gt;
&lt;/figure&gt;

&lt;h3 id='the-learning-it-burns' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-learning-it-burns' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;The Learning, It Burns!&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;We get asked a lot why we don’t do storage the “obvious” way, with an &lt;a href='https://aws.amazon.com/ebs/' title=''&gt;EBS-type&lt;/a&gt; SAN fabric, abstracting it away from our compute. Locally-attached NVMe storage is an idiosyncratic choice, one we’ve had to write disclaimers for (single-node clusters can lose data!) since we first launched it.&lt;/p&gt;

&lt;p&gt;One answer is: we’re a startup. Building SAN infrastructure in every region we operate in would be tremendously expensive. Look at any feature in AWS that normal people know the name of, like EC2, EBS, RDS, or S3 — there’s a whole company in there. We launched storage when we were just 10 people, and even at our current size we probably have nothing resembling the resources EBS gets. AWS is pretty great!&lt;/p&gt;

&lt;p&gt;But another thing to keep in mind is: we’re learning as we go. And so even if we had the means to do an EBS-style SAN, we might not build it today.&lt;/p&gt;

&lt;p&gt;Instead, we’re a lot more interested in log-structured virtual disks (LSVD). LSVD uses NVMe as a local cache, but durably persists writes in object storage. You get most of the performance benefit of bus-hop disk writes, along with unbounded storage and S3-grade reliability.&lt;/p&gt;

&lt;p&gt;&lt;a href='https://community.fly.io/t/bottomless-s3-backed-volumes/15648' title=''&gt;We launched LSVD experimentally last year&lt;/a&gt;; in the intervening year, something happened to make LSVD even more interesting to us: &lt;a href='https://www.tigrisdata.com/' title=''&gt;Tigris Data&lt;/a&gt; launched S3-compatible object storage in every one of our regions, so instead of backhauling updates to Northern Virginia, &lt;a href='https://community.fly.io/t/tigris-backed-volumes/20792' title=''&gt;we can keep them local&lt;/a&gt;. We have more to say about LSVD, and a lot more to say about Tigris.&lt;/p&gt;

&lt;p&gt;Our first several months of migrations were done gingerly. By summer of 2024, we got to where our infra team can pull “drain this host” out of their toolbelt without much ceremony.&lt;/p&gt;

&lt;p&gt;We’re still not to the point where we’re migrating casually. Your Fly Machines are probably not getting migrated! There&amp;rsquo;d need to be a reason! But the dream is fully-automated luxury space migration, in which you might get migrated semiregularly, as our systems work not just to drain problematic hosts but to rebalance workloads regularly. No time soon. But we’ll get there.&lt;/p&gt;

&lt;p&gt;This is the biggest thing our team has done since we replaced Nomad with flyd. Only the new billing system comes close. We did this thing not because it was easy, but because we thought it would be easy. It was not. But: worth it!&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>AWS without Access Keys</title>
    <link rel="alternate" href="https://fly.io/blog/oidc-cloud-roles/"/>
    <id>https://fly.io/blog/oidc-cloud-roles/</id>
    <published>2024-06-19T00:00:00+00:00</published>
    <updated>2024-06-25T22:52:32+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/oidc-cloud-roles/assets/spooky-security-skeleton-thumb.webp"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;It’s dangerous to go alone. Fly.io runs full-stack apps by transmuting Docker containers into Fly Machines: ultra-lightweight hardware-backed VMs. You can run all your dependencies on Fly.io, but sometimes, you’ll need to work with other clouds, and we’ve made that pretty simple. Try Fly.io out for yourself; your Rails or Node app &lt;a href="https://fly.io/speedrun" title=""&gt;can be up and running in just minutes&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Let&amp;rsquo;s hypopulate you an app serving generative AI cat images based on the weather forecast, running on a &lt;code&gt;g4dn.xlarge&lt;/code&gt; ECS task in AWS &lt;code&gt;us-east-1&lt;/code&gt;.  It&amp;rsquo;s going great; people didn&amp;rsquo;t realize how dependent their cat pic prefs are on barometric pressure, and you&amp;rsquo;re all anyone can talk about.&lt;/p&gt;

&lt;p&gt;Word reaches Australia and Europe, but you&amp;rsquo;re not catching on, because the… latency is too high? Just roll with us here. Anyways: fixing this is going to require replicating ECS tasks and ECR images into &lt;code&gt;ap-southeast-2&lt;/code&gt; and &lt;code&gt;eu-central-1&lt;/code&gt; while also setting up load balancing. Nah.&lt;/p&gt;

&lt;p&gt;This is the O.G. Fly.io deployment story; one deployed app, one versioned container, one command to get it running anywhere in the world.&lt;/p&gt;

&lt;p&gt;But you have a problem: your app relies on training data, it&amp;rsquo;s huge, your giant employer manages it, and it&amp;rsquo;s in S3. Getting this to work will require AWS credentials.&lt;/p&gt;

&lt;p&gt;You could ask your security team to create a user, give it permissions, and hand over the AWS keypair. Then you could wash your neck and wait for the blade. Passing around AWS keypairs is the beginning of every horror story told about cloud security, and your security team ain&amp;rsquo;t having it.&lt;/p&gt;

&lt;p&gt;There&amp;rsquo;s a better way. It&amp;rsquo;s drastically more secure, so your security people will at least hear you out. It&amp;rsquo;s also so much easier on Fly.io that you might never bother creating an IAM service account again.&lt;/p&gt;
&lt;h2 id='lets-get-it-out-of-the-way' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#lets-get-it-out-of-the-way' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Let&amp;rsquo;s Get It out of the Way&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;We&amp;rsquo;re going to use OIDC to set up strictly limited trust between AWS and Fly.io.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;In AWS: we&amp;rsquo;ll add Fly.io as an &lt;code&gt;Identity Provider&lt;/code&gt; in AWS IAM, giving us an ID we can plug into any IAM &lt;code&gt;Role&lt;/code&gt;.
&lt;/li&gt;&lt;li&gt;Also in AWS: we&amp;rsquo;ll create a &lt;code&gt;Role&lt;/code&gt;, give it access to the S3 bucket with our tokenized cat data, and then attach the &lt;code&gt;Identity Provider&lt;/code&gt; to it.
&lt;/li&gt;&lt;li&gt;In Fly.io, we&amp;rsquo;ll take the &lt;code&gt;Role&lt;/code&gt; ARN we got from step 2 and set it as an environment variable in our app.
&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;Our machines will now magically have access to the S3 bucket.&lt;/p&gt;
&lt;h2 id='what-the-what' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-the-what' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;What the What&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;A reasonable question to ask here is, &amp;ldquo;where&amp;rsquo;s the credential&amp;rdquo;? Ordinarily, to give a Fly Machine access to an AWS resource, you&amp;rsquo;d use &lt;code&gt;fly secrets set&lt;/code&gt; to add an &lt;code&gt;AWS_ACCESS_KEY_ID&lt;/code&gt; and &lt;code&gt;AWS_SECRET_ACCESS_KEY&lt;/code&gt; to the environment in the Machine. Here, we&amp;rsquo;re not setting any secrets at all; we&amp;rsquo;re just adding an ARN — which is not a credential — to the Machine.&lt;/p&gt;

&lt;p&gt;Here&amp;rsquo;s what&amp;rsquo;s happening.&lt;/p&gt;

&lt;p&gt;Fly.io operates an OIDC IdP at &lt;code&gt;oidc.fly.io&lt;/code&gt;. It issues OIDC tokens, exclusively to Fly Machines. AWS can be configured to trust these tokens, on a role-by-role basis. That&amp;rsquo;s the &amp;ldquo;secret credential&amp;rdquo;: the pre-configured trust relationship in IAM, and the public keypairs it manages. You, the user, never need to deal with these keys directly; it all happens behind the scenes, between AWS and Fly.io.&lt;/p&gt;

&lt;p&gt;&lt;img alt="A diagram: STS trusts OIDC.fly.io. OIDC.fly.io trusts flyd. flyd issues a token to the Machine, which proffers it to STS. STS sends an STS cred to the Machine, which then uses it to retrieve model weights from S3." src="/blog/oidc-cloud-roles/assets/oidc-diagram.webp" /&gt;&lt;/p&gt;

&lt;p&gt;The key actor in this picture is &lt;code&gt;STS&lt;/code&gt;, the AWS &lt;code&gt;Security Token Service&lt;/code&gt;. &lt;code&gt;STS&lt;/code&gt;&amp;lsquo;s main job is to vend short-lived AWS credentials, usually through some variant of an API called &lt;code&gt;AssumeRole&lt;/code&gt;. Specifically, in our case: &lt;code&gt;AssumeRoleWithWebIdentity&lt;/code&gt; tells &lt;code&gt;STS&lt;/code&gt; to cough up an AWS keypair given an OIDC token (that matches a pre-configured trust relationship).&lt;/p&gt;

&lt;p&gt;That still leaves the question: how does your code, which is reaching out to the AWS APIs to get cat weights, drive any of this?&lt;/p&gt;
&lt;h2 id='the-init-thickens' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-init-thickens' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;The Init Thickens&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Every Fly Machine boots up into an &lt;code&gt;init&lt;/code&gt; we wrote in Rust. It has slowly been gathering features.&lt;/p&gt;

&lt;p&gt;One of those features, which has been around for awhile, is a server for a Unix socket at &lt;code&gt;/.fly/api&lt;/code&gt;, which exports a subset of the Fly Machines API to privileged processes in the Machine. Think of it as our answer to the EC2 Instance Metadata Service. How it works is, every time we boot a Fly Machine, we pass it a &lt;a href='https://fly.io/blog/macaroons-escalated-quickly/' title=''&gt;Macaroon token&lt;/a&gt; locked to that particular Machine; &lt;code&gt;init&lt;/code&gt;&amp;rsquo;s server for &lt;code&gt;/.fly/api&lt;/code&gt; is a proxy that attaches that token to requests.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;In addition to the API proxy being tricky to SSRF to.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;What&amp;rsquo;s neat about this is that the credential that drives &lt;code&gt;/.fly/api&lt;/code&gt; is doubly protected:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The Fly.io platform won&amp;rsquo;t honor it unless it comes from that specific Fly Machine (&lt;code&gt;flyd&lt;/code&gt;, our orchestrator, knows who it&amp;rsquo;s talking to), &lt;em&gt;and&lt;/em&gt;
&lt;/li&gt;&lt;li&gt;Ordinary code running in a Fly Machine never gets a copy of the token to begin with.
&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;You could rig up a local privilege escalation vulnerability and work out how to steal the Macaroon, but you can&amp;rsquo;t exfiltrate it productively.&lt;/p&gt;

&lt;p&gt;So now you have half the puzzle worked out: OIDC is just part of the &lt;a href='https://fly.io/docs/machines/api/' title=''&gt;Fly Machines API&lt;/a&gt; (specifically: &lt;code&gt;/v1/tokens/oidc&lt;/code&gt;). A Fly Machine can hit a Unix socket and get an OIDC token tailored to that machine:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative "&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-xdcj19sc"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-xdcj19sc"&gt;{
  "app_id": "3671581",
  "app_name": "weather-cat",
  "aud": "sts.amazonaws.com",
  "image": "image:latest",
  "image_digest": "sha256:dff79c6da8dd4e282ecc6c57052f7cfbd684039b652f481ca2e3324a413ee43f",
  "iss": "https://oidc.fly.io/example",
  "machine_id": "3d8d377ce9e398",
  "machine_name": "ancient-snow-4824",
  "machine_version": "01HZJXGTQ084DX0G0V92QH3XW4",
  "org_id": "29873298",
  "org_name": "example",
  "region": "yyz",
  "sub": "example:weather-cat:ancient-snow-4824"
} // some OIDC stuff trimmed
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Look upon this holy blob, sealed with a published key managed by Fly.io&amp;rsquo;s OIDC vault, and see that there lies within it enough information for AWS &lt;code&gt;STS&lt;/code&gt; to decide to issue a session credential.&lt;/p&gt;
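&lt;p&gt;If you want to poke at this yourself, a client for that Unix socket is a few lines of Python. A sketch (the paths come from above; the request body and &lt;code&gt;aud&lt;/code&gt; parameter are assumptions, so check the Machines API docs before relying on them):&lt;/p&gt;

```python
import base64
import http.client
import json
import socket

class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP over a Unix socket, for init's /.fly/api proxy."""
    def __init__(self, path):
        super().__init__("localhost")
        self.unix_path = path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.connect(self.unix_path)
        self.sock = sock

def fetch_oidc_token(aud="sts.amazonaws.com"):
    # Request shape is an assumption; the post only names the endpoint.
    conn = UnixHTTPConnection("/.fly/api")
    conn.request("POST", "/v1/tokens/oidc",
                 body=json.dumps({"aud": aud}),
                 headers={"Content-Type": "application/json"})
    return conn.getresponse().read().decode()

def decode_claims(jwt):
    # OIDC tokens are JWTs: header.payload.signature, base64url, no padding.
    payload = jwt.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(payload))
```

&lt;p&gt;&lt;code&gt;decode_claims&lt;/code&gt; is how you&amp;rsquo;d dump a token like the one above; verifying the signature is &lt;code&gt;STS&lt;/code&gt;&amp;rsquo;s job, not yours.&lt;/p&gt;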

&lt;p&gt;We have still not completed the puzzle, because while you can probably now see how you&amp;rsquo;d drive this process with a bunch of new code that you&amp;rsquo;d tediously write, you are acutely aware that you have not yet endured that tedium — e pur si muove!&lt;/p&gt;

&lt;p&gt;One &lt;code&gt;init&lt;/code&gt; feature remains to be disclosed, and it&amp;rsquo;s cute.&lt;/p&gt;

&lt;p&gt;If, when &lt;code&gt;init&lt;/code&gt; starts in a Fly Machine, it sees an &lt;code&gt;AWS_ROLE_ARN&lt;/code&gt; environment variable set, it initiates a little dance; it:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;goes off and generates an OIDC token, the way we just described,
&lt;/li&gt;&lt;li&gt;saves that OIDC token in a file, &lt;em&gt;and&lt;/em&gt;
&lt;/li&gt;&lt;li&gt;sets the &lt;code&gt;AWS_WEB_IDENTITY_TOKEN_FILE&lt;/code&gt; and &lt;code&gt;AWS_ROLE_SESSION_NAME&lt;/code&gt; environment variables for every process it launches.
&lt;/li&gt;&lt;/ol&gt;
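&lt;p&gt;Mechanically, that dance is small enough to sketch. This is a simplified Python model of what &lt;code&gt;init&lt;/code&gt; does, not its actual Rust code; the session-name choice is a guess:&lt;/p&gt;

```python
import os

def init_oidc_dance(fetch_token, token_path="/.fly/oidc_token"):
    # Simplified model of init's behavior, not its actual Rust code.
    if "AWS_ROLE_ARN" not in os.environ:
        return False                      # no ARN, no dance
    with open(token_path, "w") as f:
        f.write(fetch_token())            # steps 1-2: mint and save the token
    # Step 3: point every child process's AWS SDK at the token file.
    os.environ["AWS_WEB_IDENTITY_TOKEN_FILE"] = token_path
    # Session name: a guess here; any stable per-Machine string would do.
    os.environ["AWS_ROLE_SESSION_NAME"] = os.environ.get(
        "FLY_MACHINE_ID", "fly-machine")
    return True
```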

&lt;p&gt;The AWS SDK, linked to your application, does all the rest.&lt;/p&gt;

&lt;p&gt;Let&amp;rsquo;s review: you add an &lt;code&gt;AWS_ROLE_ARN&lt;/code&gt; variable to your Fly App, launch a Machine, and have it go fetch a file from S3. What happens next is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;code&gt;init&lt;/code&gt; detects &lt;code&gt;AWS_ROLE_ARN&lt;/code&gt; is set as an environment variable.
&lt;/li&gt;&lt;li&gt;&lt;code&gt;init&lt;/code&gt; sends a request to &lt;code&gt;/v1/tokens/oidc&lt;/code&gt; via &lt;code&gt;/.fly/api&lt;/code&gt;.
&lt;/li&gt;&lt;li&gt;&lt;code&gt;init&lt;/code&gt; writes the response to &lt;code&gt;/.fly/oidc_token&lt;/code&gt;.
&lt;/li&gt;&lt;li&gt;&lt;code&gt;init&lt;/code&gt; sets &lt;code&gt;AWS_WEB_IDENTITY_TOKEN_FILE&lt;/code&gt; and &lt;code&gt;AWS_ROLE_SESSION_NAME&lt;/code&gt;.
&lt;/li&gt;&lt;li&gt;The entrypoint boots, and (say) runs &lt;code&gt;aws s3api get-object&lt;/code&gt;.
&lt;/li&gt;&lt;li&gt;The AWS SDK runs through the &lt;a href='https://docs.aws.amazon.com/sdkref/latest/guide/standardized-credentials.html#credentialProviderChain' title=''&gt;credential provider chain&lt;/a&gt;.
&lt;/li&gt;&lt;li&gt;The SDK sees that &lt;code&gt;AWS_WEB_IDENTITY_TOKEN_FILE&lt;/code&gt; is set and calls &lt;code&gt;AssumeRoleWithWebIdentity&lt;/code&gt; with the file contents.
&lt;/li&gt;&lt;li&gt;AWS verifies the token against &lt;a href='https://oidc.fly.io/' title=''&gt;&lt;code&gt;https://oidc.fly.io/&lt;/code&gt;&lt;/a&gt;&lt;code&gt;example/.well-known/openid-configuration&lt;/code&gt;, which references a key Fly.io manages on isolated hardware.
&lt;/li&gt;&lt;li&gt;AWS vends &lt;code&gt;STS&lt;/code&gt; credentials for the assumed &lt;code&gt;Role&lt;/code&gt;.
&lt;/li&gt;&lt;li&gt;The SDK uses the &lt;code&gt;STS&lt;/code&gt; credentials to access the S3 bucket.
&lt;/li&gt;&lt;li&gt;AWS checks the &lt;code&gt;Role&lt;/code&gt;&amp;rsquo;s IAM policy to see if it has access to the S3 bucket.
&lt;/li&gt;&lt;li&gt;AWS returns the contents of the bucket object.
&lt;/li&gt;&lt;/ol&gt;
&lt;h2 id='how-much-better-is-this' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#how-much-better-is-this' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;How Much Better Is This?&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;It is a lot better.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;They asymptotically approach the security properties of Macaroon tokens.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Most importantly: AWS &lt;code&gt;STS&lt;/code&gt; credentials are short-lived. Because they&amp;rsquo;re generated dynamically, rather than stored in a configuration file or environment variable, they&amp;rsquo;re already a little bit annoying for an attacker to recover. But they&amp;rsquo;re also dead in minutes. They have a sharply limited blast radius. They rotate themselves, and fail closed.&lt;/p&gt;

&lt;p&gt;They&amp;rsquo;re also easier to manage. This is a rare instance where you can reasonably drive the entire AWS side of the process from within the web console. Your cloud team adds &lt;code&gt;Roles&lt;/code&gt; all the time; this is just a &lt;code&gt;Role&lt;/code&gt; with an extra snippet of JSON. The resulting ARN isn&amp;rsquo;t even a secret; your cloud team could just email or Slack message it back to you.&lt;/p&gt;

&lt;p&gt;Finally, they offer finer-grained control.&lt;/p&gt;

&lt;p&gt;To understand the last part, let&amp;rsquo;s look at that extra snippet of JSON (the &amp;ldquo;Trust Policy&amp;rdquo;) your cloud team is sticking on the new &lt;code&gt;cat-bucket&lt;/code&gt; &lt;code&gt;Role&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative "&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-x99m930o"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-x99m930o"&gt;{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Federated": "arn:aws:iam::123456123456:oidc-provider/oidc.fly.io/example"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
              "StringEquals": {
                "oidc.fly.io/example:aud": "sts.amazonaws.com"
              },
              "StringLike": {
                "oidc.fly.io/example:sub": "example:weather-cat:*"
              }
            }
        }
    ]
}
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;&lt;div class="right-sidenote"&gt;&lt;p&gt;The &lt;code&gt;aud&lt;/code&gt; check guarantees &lt;code&gt;STS&lt;/code&gt; will only honor tokens that Fly.io deliberately vended for &lt;code&gt;STS&lt;/code&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Recall the OIDC token we dumped earlier; much of what&amp;rsquo;s in it, we can match in the Trust Policy. Every OIDC token Fly.io generates is going to have a &lt;code&gt;sub&lt;/code&gt; field formatted &lt;code&gt;org:app:machine&lt;/code&gt;, so we can lock IAM &lt;code&gt;Roles&lt;/code&gt; down to organizations, or to specific Fly Apps, or even specific Fly Machine instances.&lt;/p&gt;
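&lt;p&gt;&lt;code&gt;StringLike&lt;/code&gt; conditions behave like shell-style globbing, so a rough mental model of the check &lt;code&gt;STS&lt;/code&gt; runs (not AWS&amp;rsquo;s real implementation, just the semantics) is:&lt;/p&gt;

```python
from fnmatch import fnmatch

# Rough model of the Trust Policy check: the token's sub claim, formatted
# org:app:machine, is globbed against the policy's pattern.
def role_allows(sub, pattern):
    return fnmatch(sub, pattern)

role_allows("example:weather-cat:ancient-snow-4824",
            "example:weather-cat:*")      # any Machine in the app: allowed
role_allows("example:other-app:runner",
            "example:weather-cat:*")      # wrong app: denied
```

&lt;p&gt;Narrow the pattern and the &lt;code&gt;Role&lt;/code&gt; narrows with it, all the way down to a single Machine.&lt;/p&gt;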
&lt;figure class="post-cta"&gt;
  &lt;figcaption&gt;
    &lt;h1&gt;Speedrun your app onto Fly.io.&lt;/h1&gt;
    &lt;p&gt;3&amp;hellip;2&amp;hellip;1&amp;hellip;&lt;/p&gt;
      &lt;a class="btn btn-lg" href="https://fly.io/speedrun"&gt;
        Go! &amp;nbsp;&lt;span class='opacity-50'&gt;→&lt;/span&gt;
      &lt;/a&gt;
  &lt;/figcaption&gt;
  &lt;div class="image-container"&gt;
    &lt;img src="/static/images/cta-kitty.webp" srcset="/static/images/[email protected] 2x" alt=""&gt;
  &lt;/div&gt;
&lt;/figure&gt;

&lt;h2 id='and-so' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#and-so' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;And So&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;In case it&amp;rsquo;s not obvious: this pattern works for any AWS API, not just S3.&lt;/p&gt;

&lt;p&gt;Our OIDC support on the platform and in Fly Machines will set arbitrary OIDC &lt;code&gt;audience&lt;/code&gt; strings, so you can use it to authenticate to any OIDC-compliant cloud provider. It won&amp;rsquo;t be as slick on Azure or GCP, because we haven&amp;rsquo;t done the &lt;code&gt;init&lt;/code&gt; features to light their APIs up with a single environment variable — but those features are easy, and we&amp;rsquo;re just waiting for people to tell us what they need.&lt;/p&gt;

&lt;p&gt;For us, the gold standard for least-privilege, conditional access tokens remains Macaroons, and it&amp;rsquo;s unlikely that we&amp;rsquo;re going to do a bunch of internal stuff using OIDC. We even snuck Macaroons into this feature. But the security you&amp;rsquo;re getting from this OIDC dance closes a lot of the gap between hardcoded user credentials and Macaroons, and it&amp;rsquo;s easy to use — easier, in some ways, than it is to manage role-based access inside of a legacy EC2 deployment!&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Picture This: Open Source AI for Image Description</title>
    <link rel="alternate" href="https://fly.io/blog/llm-image-description/"/>
    <id>https://fly.io/blog/llm-image-description/</id>
    <published>2024-05-09T00:00:00+00:00</published>
    <updated>2024-05-23T20:00:21+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/llm-image-description/assets/image-description-thumb.webp"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;I’m Nolan, and I work on Fly Machines here at Fly.io. We’re building a new public cloud—one where you can spin up CPU and GPU workloads, around the world, in a jiffy. &lt;a href="https://fly.io/speedrun/" title=""&gt;Try us out&lt;/a&gt;; you can be up and running in minutes. This is a post about LLMs being really helpful, and an extensible project you can build with open source on a weekend.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Picture this, if you will.&lt;/p&gt;

&lt;p&gt;You&amp;rsquo;re blind. You&amp;rsquo;re in an unfamiliar hotel room on a trip to Chicago.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;If you live in Chicago IRL, imagine the hotel in Winnipeg, &lt;a href="https://www.cbc.ca/history/EPISCONTENTSE1EP10CH3PA5LE.html" title=""&gt;the Chicago of the North&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;You&amp;rsquo;ve absent-mindedly set your coffee down, and can&amp;rsquo;t remember where. You&amp;rsquo;re looking for the thermostat so you don&amp;rsquo;t wake up frozen. Or, just maybe, you&amp;rsquo;re playing a fun-filled round of &amp;ldquo;find the damn light switch so your sighted partner can get some sleep already!&amp;rdquo;&lt;/p&gt;

&lt;p&gt;If, like me, you&amp;rsquo;ve been blind for a while, you have plenty of practice finding things without the luxury of a quick glance around. It may be more tedious than you&amp;rsquo;d like, but you&amp;rsquo;ll get it done.&lt;/p&gt;

&lt;p&gt;But the speed of innovation in machine learning and large language models has been dizzying, and in 2024 you can snap a photo with your phone and have an app like &lt;a href='https://www.bemyeyes.com/blog/announcing-be-my-ai/' title=''&gt;Be My AI&lt;/a&gt; or &lt;a href='https://www.seeingai.com/' title=''&gt;Seeing AI&lt;/a&gt; tell you where in that picture it found your missing coffee mug, or where it thinks the light switch is.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;Creative switch locations seem to be a point of pride for hotels, so the light game may be good for a few rounds of quality entertainment, regardless of how good your AI is.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;This is &lt;em&gt;big&lt;/em&gt;. It&amp;rsquo;s hard for me to state just how exciting and empowering AI image descriptions have been for me without sounding like a shill. In the past year, I&amp;rsquo;ve:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Found shit in strange hotel rooms. 
&lt;/li&gt;&lt;li&gt;Gotten descriptions of scenes and menus in otherwise inaccessible video games.
&lt;/li&gt;&lt;li&gt;Requested summaries of technical diagrams and other materials where details weren’t made available textually. 
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;I&amp;rsquo;ve been consistently blown away at how impressive and helpful AI-created image descriptions have been.&lt;/p&gt;

&lt;p&gt;Also&amp;hellip;&lt;/p&gt;
&lt;h2 id='which-thousand-words-is-this-picture-worth' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#which-thousand-words-is-this-picture-worth' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Which thousand words is this picture worth?&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;As a blind internet user for the last three decades, I have extensive empirical evidence to corroborate what you already know in your heart: humans are pretty flaky about writing useful alt text for all the images they publish. This does tend to make large swaths of the internet inaccessible to me!&lt;/p&gt;

&lt;p&gt;In just a few years, the state of image description on the internet has gone from complete reliance on the aforementioned lovable, but ultimately tragically flawed, humans, to automated strings of words like &lt;code&gt;Image may contain person, glasses, confusion, banality, disillusionment&lt;/code&gt;, to LLM-generated text that reads a lot like it was written by a person, perhaps sipping from a steaming cup of Earl Grey as they reflect on their previous experiences of a background that features a tree with snow on its branches, suggesting that this scene takes place during winter.&lt;/p&gt;

&lt;p&gt;If an image is missing alt text, or if you want a second opinion, there are screen-reader addons, like &lt;a href='https://github.com/cartertemm/AI-content-describer/' title=''&gt;this one&lt;/a&gt; for &lt;a href='https://www.nvaccess.org/download/' title=''&gt;NVDA&lt;/a&gt;, that you can use with an API key to get image descriptions from GPT-4 or Google Gemini as you read. This is awesome! &lt;/p&gt;

&lt;p&gt;And this brings me to the nerd snipe. How hard would it be to build an image description service we can host ourselves, using open source technologies? It turns out to be spookily easy.&lt;/p&gt;

&lt;p&gt;Here&amp;rsquo;s what I came up with:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href='https://ollama.com/' title=''&gt;Ollama&lt;/a&gt; to run the model
&lt;/li&gt;&lt;li&gt;A &lt;a href='https://pocketbase.io' title=''&gt;PocketBase&lt;/a&gt; project that provides a simple authenticated API for users to submit images, get descriptions, and ask followup questions about the image
&lt;/li&gt;&lt;li&gt;The simplest possible Python client to interact with the PocketBase app on behalf of users
&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;The idea is to keep it modular and hackable, so if sentiment analysis or joke creation is your thing, you can swap out image description for that and have something going in, like, a weekend.&lt;/p&gt;

&lt;p&gt;If you&amp;rsquo;re like me, and you go skipping through recipe blogs to find the &amp;ldquo;go directly to recipe&amp;rdquo; link, find the code itself &lt;a href='https://github.com/superfly/llm-describer' title=''&gt;here&lt;/a&gt;. &lt;/p&gt;
&lt;h2 id='the-llm-is-the-easiest-part' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-llm-is-the-easiest-part' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;The LLM is the easiest part&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;An API to accept images and prompts, run the model, and spit 
out answers sounds like a lot! But it&amp;rsquo;s the simplest part of this whole thing, because: 
that&amp;rsquo;s &lt;a href='https://ollama.com/' title=''&gt;Ollama&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;You can just run the Ollama Docker image, get it to grab the model 
you want to use, and that&amp;rsquo;s it. There&amp;rsquo;s your AI server. (We have a &lt;a href='https://fly.io/blog/scaling-llm-ollama/' title=''&gt;blog post&lt;/a&gt; 
all about deploying Ollama on Fly.io; Fly GPUs are rad, try&amp;#39;em out, etc.).&lt;/p&gt;

&lt;p&gt;For this project, we need a model that can make sense&amp;mdash;or at least words&amp;mdash;out of a picture. 
&lt;a href='https://llava-vl.github.io/' title=''&gt;LLaVA&lt;/a&gt; is a trained, Apache-licensed &amp;ldquo;large multimodal model&amp;rdquo; that fits the bill. 
Get the model with the Ollama CLI:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative cmd"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-vsa102iz"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight cmd'&gt;&lt;code id="code-vsa102iz"&gt;ollama pull llava:34b
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;&lt;div class="callout"&gt;&lt;p&gt;If you have hardware that can handle it, you could run this on your computer at home. If you run AI models on a cloud provider, be aware that GPU compute is expensive! &lt;strong class="font-semibold text-navy-950"&gt;It’s important to take steps to ensure you’re not paying for a massive GPU 24/7.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;On Fly.io, at the time of writing, you’d achieve this with the &lt;a href="https://fly.io/docs/apps/autostart-stop/" title=""&gt;autostart and autostop&lt;/a&gt; functions of the Fly Proxy, restricting Ollama access to internal requests over &lt;a href="https://fly.io/docs/networking/private-networking/#flycast-private-fly-proxy-services" title=""&gt;Flycast&lt;/a&gt; from the PocketBase app. That way, if there haven’t been any requests for a few minutes, the Fly Proxy stops the Ollama &lt;a href="https://fly.io/docs/machines/" title=""&gt;Machine&lt;/a&gt;, which releases the CPU, GPU, and RAM allocated to it. &lt;a href="https://fly.io/blog/scaling-llm-ollama/" title=""&gt;Here’s a post&lt;/a&gt; that goes into more detail. &lt;/p&gt;
&lt;/div&gt;&lt;h2 id='a-multi-tool-on-the-backend' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#a-multi-tool-on-the-backend' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;A multi-tool on the backend&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;I want user auth to make sure not just anyone can grab my &amp;ldquo;image description service&amp;rdquo; and keep it busy generating short stories about their cat. If I build this out into a service for others to use, I might also want business logic around plans or
credits, or mobile-friendly APIs for use in the field. &lt;a href='https://pocketbase.io' title=''&gt;PocketBase&lt;/a&gt; provides a scaffolding for all of it. It&amp;rsquo;s a Swiss army knife: a Firebase-like API on top of SQLite, complete with authentication, authorization, an admin UI, extensibility in JavaScript and Go, and various client-side APIs.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;Yes, &lt;em&gt;of course&lt;/em&gt; I’ve used an LLM to generate feline fanfic. Theme songs too. Hasn’t everyone? &lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;I &amp;ldquo;faked&amp;rdquo; a task-specific API that supports followup questions by extending PocketBase in Go, modeling requests and responses as &lt;a href='https://pocketbase.io/docs/collections/' title=''&gt;collections&lt;/a&gt; (i.e. SQLite tables) with &lt;a href='https://pocketbase.io/docs/go-event-hooks/' title=''&gt;event hooks&lt;/a&gt; to trigger pre-set interactions with the Ollama app (via &lt;a href='https://tmc.github.io/langchaingo' title=''&gt;LangChainGo&lt;/a&gt;) and the client (via the PocketBase API).&lt;/p&gt;

&lt;p&gt;If you&amp;rsquo;re following along, &lt;a href='https://github.com/superfly/llm-describer/blob/main/describer.go' title=''&gt;here&amp;rsquo;s the module&lt;/a&gt;
that handles all that, along with initializing the LLM connection.&lt;/p&gt;

&lt;p&gt;In a nutshell, this is the dance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When a user uploads an image, a hook on the &lt;code&gt;images&lt;/code&gt; collection sends the image to Ollama, along with this prompt:
&lt;code&gt;&amp;quot;You are a helpful assistant describing images for blind screen reader users. Please describe this image.&amp;quot;&lt;/code&gt;
&lt;/li&gt;&lt;li&gt;Ollama sends back its response, which the backend 1) passes back to the client and 2) stores in its &lt;code&gt;followups&lt;/code&gt; collection for future reference.
&lt;/li&gt;&lt;li&gt;If the user responds with a followup question about the image and description, that also 
goes into the &lt;code&gt;followups&lt;/code&gt; collection; user-initiated changes to this collection trigger a hook to chain the new 
followup question with the image and the chat history into a new request for the model.
&lt;/li&gt;&lt;li&gt;Lather, rinse, repeat.
&lt;/li&gt;&lt;/ul&gt;
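&lt;p&gt;Seen from the model&amp;rsquo;s side, that dance amounts to rebuilding a chat history and shipping it to Ollama&amp;rsquo;s chat endpoint on every turn. Here&amp;rsquo;s a rough Python sketch of the payload-building step (the real backend does this in Go via LangChainGo; the helper name and history shape here are made up for illustration):&lt;/p&gt;

```python
SYSTEM_PROMPT = ("You are a helpful assistant describing images "
                 "for blind screen reader users. Please describe this image.")

def build_chat_payload(image_b64, history, question, model="llava:34b"):
    """Assemble an Ollama /api/chat request body. `history` holds the prior
    (question, answer) turns from the followups collection; the first stored
    row is the initial description, which has no question of its own."""
    messages = [{"role": "user", "content": SYSTEM_PROMPT, "images": [image_b64]}]
    for q, a in history:
        if q:
            messages.append({"role": "user", "content": q})
        messages.append({"role": "assistant", "content": a})
    messages.append({"role": "user", "content": question})
    return {"model": model, "messages": messages, "stream": False}
```

&lt;p&gt;You&amp;rsquo;d POST this as JSON to the Ollama app&amp;rsquo;s &lt;code&gt;/api/chat&lt;/code&gt; endpoint over the private network.&lt;/p&gt;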

&lt;p&gt;This is a super simple hack to handle followup questions, and it’ll let you keep adding followups until 
something breaks. You&amp;rsquo;ll see the quality of responses get poorer&amp;mdash;possibly incoherent&amp;mdash;as the chat history 
exceeds the model&amp;rsquo;s context window.&lt;/p&gt;

&lt;p&gt;I also set up &lt;a href='https://pocketbase.io/docs/api-rules-and-filters/' title=''&gt;API rules&lt;/a&gt; in PocketBase,
ensuring that users can&amp;rsquo;t read from or write to others&amp;rsquo; chats with the AI.&lt;/p&gt;

&lt;p&gt;If image descriptions aren&amp;rsquo;t your thing, this business logic is easily swappable 
for joke generation, extracting details from text, or any other simple task you 
might want to throw at an LLM. Just slot the best model into Ollama (LLaVA is pretty OK as a general starting point too), and match the PocketBase schema and pre-set prompts to your application.&lt;/p&gt;
&lt;h2 id='a-seedling-of-a-client' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#a-seedling-of-a-client' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;A seedling of a client&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;With the image description service in place, the user can talk to it with any client that speaks the PocketBase API. PocketBase already has SDK clients in JavaScript and Dart, but because my screen reader is &lt;a href='https://github.com/nvaccess/nvda' title=''&gt;written in Python&lt;/a&gt;, I went with a &lt;a href='https://pypi.org/project/pocketbase/' title=''&gt;community-created Python library&lt;/a&gt;. That way I can build this out into an NVDA add-on 
if I want to.&lt;/p&gt;

&lt;p&gt;If you&amp;rsquo;re a fancy Python developer, you probably have your preferred tooling for
handling virtualenvs and friends. I&amp;rsquo;m not, and since my screen reader doesn&amp;rsquo;t use those
anyway, I just &lt;code&gt;pip install&lt;/code&gt;ed the library so my client can import it:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative cmd"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-rgh35fwn"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight cmd'&gt;&lt;code id="code-rgh35fwn"&gt;pip install pocketbase
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;a href='https://github.com/superfly/llm-describer/blob/main/__init__.py' title=''&gt;My client&lt;/a&gt; is a very simple script. 
It expects a couple of things: a file called &lt;code&gt;image.jpg&lt;/code&gt;, located in the current directory, 
and environment variables to provide the service URL and user credentials to log into it with.&lt;/p&gt;

&lt;p&gt;When you run the client script, it uploads the image to the user’s &lt;code&gt;images&lt;/code&gt; collection on the 
backend app, starting the back-and-forth between user and model we saw in the previous section. 
The client prints the model&amp;rsquo;s output to the CLI and prompts the user to input a followup question, 
which it passes up to the &lt;code&gt;followups&lt;/code&gt; collection, and so on.&lt;/p&gt;
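&lt;p&gt;Stripped of the PocketBase plumbing, the client&amp;rsquo;s loop is just this (a sketch with the round-trip hidden behind an injected &lt;code&gt;ask&lt;/code&gt; callable; the real script authenticates with the env-var credentials and talks to the &lt;code&gt;images&lt;/code&gt; and &lt;code&gt;followups&lt;/code&gt; collections instead):&lt;/p&gt;

```python
def run_session(ask, image_path="image.jpg", input_fn=input, print_fn=print):
    """Describe-then-followup loop. `ask` stands in for the PocketBase
    round-trip: it takes (image_bytes_or_None, question_or_None) and
    returns the model's reply as a string."""
    with open(image_path, "rb") as f:
        print_fn(ask(f.read(), None))   # upload kicks off the first description
    while True:
        q = input_fn("Enter your followup question, or 'quit' to quit: ")
        if q.strip().lower() == "quit":
            break
        print_fn(ask(None, q))          # lands in the followups collection
```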
&lt;figure class="post-cta"&gt;
  &lt;figcaption&gt;
    &lt;h1&gt;This can run on Fly.io.&lt;/h1&gt;
    &lt;p&gt;Run your LLM on a datacenter-grade GPU.&lt;/p&gt;
      &lt;a class="btn btn-lg" href="https://fly.io/gpu/"&gt;
        Try out a Fly GPU &amp;nbsp;&lt;span class='opacity-50'&gt;→&lt;/span&gt;
      &lt;/a&gt;
  &lt;/figcaption&gt;
  &lt;div class="image-container"&gt;
    &lt;img src="/static/images/cta-dog.webp" srcset="/static/images/[email protected] 2x" alt=""&gt;
  &lt;/div&gt;
&lt;/figure&gt;

&lt;h2 id='all-together-now' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#all-together-now' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;All together now&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;I grabbed &lt;a href='https://unsplash.com/photos/brown-trees-beside-river-under-blue-sky-during-daytime-kSvpTrfhaiU' title=''&gt;this
image&lt;/a&gt;
and saved it to a file called &lt;em&gt;image.jpg&lt;/em&gt;. &lt;/p&gt;

&lt;p&gt;While I knew I was downloading an image of a winter scene, all I see on Unsplash is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;brown trees beside river under blue sky during daytime Bright winter landscape
with lake, snow, forest, beautiful blue sky and white clouds. An example of
charming wildlife in Russia.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Let&amp;rsquo;s see what our very own AI describer thinks of this picture:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative plain"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-4bvdtqnv"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-4bvdtqnv"&gt;$ python __init__.py
The image depicts a serene winter scene. There is a small stream of water flowing through the center, with patches of snow on its surface and along the banks. The surrounding area appears to be a mix of bare trees and shrubs, suggesting it's late in the season or possibly early spring when some plants have started to grow again but haven't yet leafed out.

The sky is clear and blue, indicating good weather conditions at the time the photo was taken. The sunlight seems to be coming from the left side of the image, casting a warm glow on the scene and creating reflections in the water. There are no visible texts or distinguishing marks that provide additional context about the location or the photographer.

The overall atmosphere is peaceful and natural, with the quiet beauty of the landscape undisturbed except for the gentle movement of the stream.
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Is it a stellar description? Maybe not, but it certainly gives me a better sense of connection with the scene.&lt;/p&gt;

&lt;p&gt;Let&amp;rsquo;s see how our describer copes with a followup question.&lt;/p&gt;
&lt;div class="highlight-wrapper group relative plain"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-1x1pz9x9"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-1x1pz9x9"&gt;Enter your followup question, or 'quit' to quit: What types of trees are in the image?
Sending followup question
It's difficult to determine the exact species of trees in the image without more specific details, as many deciduous trees lose their leaves during winter and look quite similar. However, based on the general appearance of the bare branches and the fact that they are leafless, it is likely that these are common deciduous trees such as oaks, maples, or elms.

The presence of snow and the absence of foliage suggest that this photo was taken during winter or early spring when the trees have not yet started to bloom again. The exact species would require a closer examination of the bark, buds, and other distinguishing features, which are not clearly visible in this image.
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Boo, the general-purpose LLaVA model couldn&amp;rsquo;t identify the leafless trees. At least it knows why it can&amp;rsquo;t. Maybe there&amp;rsquo;s a better model out 
there for that. Or we could train one, if we really needed tree identification! We could make every component of 
this service more sophisticated! &lt;/p&gt;

&lt;p&gt;But that I, personally, can make a proof of concept like this with a few days of effort
continues to boggle my mind. Thanks to a handful of amazing open source projects, it&amp;rsquo;s really, spookily, easy. And from here, I (or you) can build out a screen-reader addon, or a mobile app, or a different kind of AI service, with modular changes.&lt;/p&gt;
&lt;h2 id='deployment-notes' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#deployment-notes' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Deployment notes&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;On Fly.io, stopping GPU Machines saves you a bunch of money and some carbon footprint, in return for cold-start latency when you make a request for the first time in more than a few minutes. In testing this project, on the &lt;code&gt;a100-40gb&lt;/code&gt; Fly Machine preset, the 34b-parameter LLaVA model took several seconds to generate each response. If the Machine was stopped when the request came in, starting it up took another handful of seconds, followed by several tens of seconds to load the model into GPU RAM. The total time from cold start to completed description was about 45 seconds. Just something to keep in mind.&lt;/p&gt;

&lt;p&gt;If you&amp;rsquo;re running Ollama in the cloud, you likely want to put the model onto storage that&amp;rsquo;s persistent, so you don&amp;rsquo;t have to download it repeatedly. You could also build the model into a Docker image ahead of deployment.&lt;/p&gt;

&lt;p&gt;The PocketBase Golang app compiles to a single executable that you can run wherever.
I run it on Fly.io, unsurprisingly, and the &lt;a href='https://github.com/superfly/llm-describer/' title=''&gt;repo&lt;/a&gt; comes with a Dockerfile and a &lt;a href='https://fly.io/docs/reference/configuration/' title=''&gt;&lt;code&gt;fly.toml&lt;/code&gt;&lt;/a&gt; config file, which you can edit to point at your own Ollama instance. It uses a small persistent storage volume for the SQLite database. Under testing, it runs fine on a &lt;code&gt;shared-cpu-1x&lt;/code&gt; Machine. &lt;/p&gt;
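&lt;p&gt;For orientation, the pieces of a &lt;code&gt;fly.toml&lt;/code&gt; that matter for a setup like this look roughly as follows (illustrative values only, not the repo&amp;rsquo;s actual config):&lt;/p&gt;

```toml
app = "llm-describer"        # placeholder app name
primary_region = "ord"       # illustrative

[mounts]
  source = "pb_data"         # small persistent volume for the SQLite database
  destination = "/pb/pb_data"

[http_service]
  internal_port = 8090       # PocketBase's default serve port
  force_https = true
  auto_stop_machines = true  # stop the Machine when traffic goes quiet
  auto_start_machines = true
  min_machines_running = 0
```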
</content>
  </entry>
  <entry>
    <title>JIT WireGuard</title>
    <link rel="alternate" href="https://fly.io/blog/jit-wireguard-peers/"/>
    <id>https://fly.io/blog/jit-wireguard-peers/</id>
    <published>2024-03-12T00:00:00+00:00</published>
    <updated>2024-05-23T20:00:21+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/jit-wireguard-peers/assets/network-thumbnail.webp"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;We’re Fly.io and we transmute containers into VMs, running them on our hardware around the world with the power of Firecracker alchemy. We do a lot of stuff with WireGuard, which has become a part of our customer API. This is a quick story about some tricks we played to make WireGuard faster and more scalable for the hundreds of thousands of people who now use it here.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;One of many odd decisions we&amp;rsquo;ve made at Fly.io is how we use WireGuard. It&amp;rsquo;s not just that we use it in many places where other shops would use HTTPS and REST APIs. We&amp;rsquo;ve gone a step beyond that: every time you run &lt;code&gt;flyctl&lt;/code&gt;, our lovable, sprawling CLI, it conjures a TCP/IP stack out of thin air, with its own IPv6 address, and speaks directly to Fly Machines running on our networks.&lt;/p&gt;

&lt;p&gt;There are plusses and minuses to this approach, which we talked about &lt;a href='https://fly.io/blog/our-user-mode-wireguard-year/' title=''&gt;in a blog post a couple years back&lt;/a&gt;. Some things, like remote-operated Docker builders, get easier to express (a Fly Machine, as far as &lt;code&gt;flyctl&lt;/code&gt; is concerned, might as well be on the same LAN). But everything generally gets trickier to keep running reliably.&lt;/p&gt;

&lt;p&gt;It was a decision. We own it.&lt;/p&gt;

&lt;p&gt;Anyways, we&amp;rsquo;ve made some improvements recently, and I&amp;rsquo;d like to talk about them.&lt;/p&gt;
&lt;h2 id='where-we-left-off' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#where-we-left-off' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Where we left off&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Until a few weeks ago, our gateways ran on a pretty simple system.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;We operate dozens of &amp;ldquo;gateway&amp;rdquo; servers around the world, whose sole purpose is to accept incoming WireGuard connections and connect them to the appropriate private networks.
&lt;/li&gt;&lt;li&gt;Any time you run &lt;code&gt;flyctl&lt;/code&gt; and it needs to talk to a Fly Machine (to build a container, pop an SSH console, copy files, or proxy to a service you&amp;rsquo;re running), it spawns or connects to a background agent process.
&lt;/li&gt;&lt;li&gt;The first time it runs, the agent generates a new WireGuard peer configuration from our GraphQL API. WireGuard peer configurations are very simple: just a public key and an address to connect to.
&lt;/li&gt;&lt;li&gt;Our API in turn takes that peer configuration and sends it to the appropriate gateway (say, &lt;code&gt;ord&lt;/code&gt;, if you&amp;rsquo;re near Chicago) via an RPC we send over the NATS messaging system.
&lt;/li&gt;&lt;li&gt;On the gateway, a service called &lt;code&gt;wggwd&lt;/code&gt; accepts that configuration, saves it to a SQLite database, and adds it to the kernel using WireGuard&amp;rsquo;s Golang libraries. &lt;code&gt;wggwd&lt;/code&gt; acknowledges the installation of the peer to the API.
&lt;/li&gt;&lt;li&gt;The API replies to your GraphQL request, with the configuration.
&lt;/li&gt;&lt;li&gt;Your &lt;code&gt;flyctl&lt;/code&gt; connects to the WireGuard peer, which works, because you receiving the configuration means it’s installed on the gateway.
&lt;/li&gt;&lt;/ol&gt;
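&lt;p&gt;For a sense of just how simple a peer configuration is, here&amp;rsquo;s roughly what one looks like in &lt;code&gt;wg-quick&lt;/code&gt; INI form (keys, addresses, and the endpoint hostname are placeholders):&lt;/p&gt;

```ini
[Interface]
# the address our API assigned to this flyctl peer
PrivateKey = CLIENT_PRIVATE_KEY
Address = fdaa:0:1234:a7b::1/120

[Peer]
# the gateway's public key and endpoint
PublicKey = GATEWAY_PUBLIC_KEY
AllowedIPs = fdaa:0:1234::/48
Endpoint = ord-gateway.example.net:51820
PersistentKeepalive = 15
```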

&lt;p&gt;I copy-pasted those last two bullet points from &lt;a href='https://fly.io/blog/our-user-mode-wireguard-year/' title=''&gt;that two-year-old post&lt;/a&gt;, because when it works, it does &lt;em&gt;just work&lt;/em&gt; reasonably well. (We ultimately did end up defaulting everybody to WireGuard-over-WebSockets, though.)&lt;/p&gt;

&lt;p&gt;But if it always worked, we wouldn&amp;rsquo;t be here, would we?&lt;/p&gt;

&lt;p&gt;We ran into two annoying problems:&lt;/p&gt;

&lt;p&gt;One: NATS is fast, but doesn’t guarantee delivery. Back in 2022, Fly.io was pretty big on NATS internally. We&amp;rsquo;ve moved away from it. For instance, our &lt;a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''&gt;internal &lt;code&gt;flyd&lt;/code&gt; API&lt;/a&gt; used to be driven by NATS; today, it&amp;rsquo;s HTTP.  Our NATS cluster was losing too many messages to host a reliable API on it. Scaling back our use of NATS made WireGuard gateways better, but still not great.&lt;/p&gt;

&lt;p&gt;Two: When &lt;code&gt;flyctl&lt;/code&gt; exits, the WireGuard peer it created sticks around on the gateway. Nothing cleans up old peers. After all, you&amp;rsquo;re likely going to come back tomorrow and deploy a new version of your app, or &lt;code&gt;fly ssh console&lt;/code&gt; into it to debug something. Why remove a peer just to re-add it the next day?&lt;/p&gt;

&lt;p&gt;Unfortunately, the vast majority of peers are created by &lt;code&gt;flyctl&lt;/code&gt; in CI jobs, which don&amp;rsquo;t have persistent storage and can&amp;rsquo;t reconnect to the same peer the next run; they generate new peers every time, no matter what.&lt;/p&gt;

&lt;p&gt;So, we ended up with a not-reliable-enough provisioning system, and gateways with hundreds of thousands of peers that will never be used again. The high stale peer count made kernel WireGuard operations very slow (especially loading all the peers back into the kernel after a gateway server reboot) and even triggered some kernel panics.&lt;/p&gt;

&lt;p&gt;There had to be&lt;/p&gt;
&lt;h2 id='a-better-way' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#a-better-way' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;A better way.&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Storing bajillions of WireGuard peers is no big challenge for any serious n-tier RDBMS. This isn&amp;rsquo;t &amp;ldquo;big data&amp;rdquo;. The problem we have at Fly.io is that our gateways don&amp;rsquo;t have serious n-tier RDBMSs. They&amp;rsquo;re small. Scrappy. They live off the land.&lt;/p&gt;

&lt;p&gt;Seriously, though: you could store every WireGuard peer everybody has ever used at Fly.io in a single SQLite database, easily.  What you can&amp;rsquo;t do is store them all in the Linux kernel.&lt;/p&gt;

&lt;p&gt;So, at some point, as you push more and more peer configurations to a gateway, you have to start making decisions about which peers you&amp;rsquo;ll enable in the kernel, and which you won&amp;rsquo;t.&lt;/p&gt;

&lt;p&gt;Wouldn&amp;rsquo;t it be nice if we just didn&amp;rsquo;t have this problem? What if, instead of pushing configs to gateways, we had the gateways pull them from our API on demand?&lt;/p&gt;

&lt;p&gt;If you did that, peers would only have to be added to the kernel when the client wanted to connect. You could yeet them out of the kernel any time you wanted; the next time the client connected, they&amp;rsquo;d just get pulled again, and everything would work fine.&lt;/p&gt;

&lt;p&gt;The problem you quickly run into when building this design is that Linux kernel WireGuard doesn&amp;rsquo;t have a feature for installing peers on demand. However:&lt;/p&gt;
&lt;h2 id='it-is-possible-to-jit-wireguard-peers' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#it-is-possible-to-jit-wireguard-peers' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;It is possible to JIT WireGuard peers&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;The Linux kernel&amp;rsquo;s &lt;a href='https://github.com/WireGuard/wgctrl-go' title=''&gt;interface for configuring WireGuard&lt;/a&gt; is &lt;a href='https://docs.kernel.org/userspace-api/netlink/intro.html' title=''&gt;Netlink&lt;/a&gt; (which is basically a way to create a userland socket to talk to a kernel service). Here&amp;rsquo;s a &lt;a href='https://github.com/WireGuard/wg-dynamic/blob/master/netlink.h' title=''&gt;summary of it as a C API&lt;/a&gt;. Note that there&amp;rsquo;s no API call to subscribe for &amp;ldquo;incoming connection attempt&amp;rdquo; events.&lt;/p&gt;

&lt;p&gt;That&amp;rsquo;s OK! We can just make our own events. WireGuard connection requests are packets, and they&amp;rsquo;re easily identifiable, so we can efficiently snatch them with a BPF filter and a &lt;a href='https://github.com/google/gopacket' title=''&gt;packet socket&lt;/a&gt;.&lt;/p&gt;
&lt;div class="callout"&gt;&lt;p&gt;Most of the time, it’s even easier for us to get the raw WireGuard packets, because our users now default to WebSockets WireGuard (which is just an unauthenticated WebSockets connect that shuttles framed UDP packets to and from an interface on the gateway), so that people who have trouble talking end-to-end in UDP can bring connections up.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;We own the daemon code for that, and can just hook the packet receive function to snarf WireGuard packets.&lt;/p&gt;

&lt;p&gt;It&amp;rsquo;s not obvious, but WireGuard doesn&amp;rsquo;t have notions of &amp;ldquo;client&amp;rdquo; or &amp;ldquo;server&amp;rdquo;. It&amp;rsquo;s a pure point-to-point protocol; peers connect to each other when they have traffic to send. The first peer to connect is called the &lt;strong class='font-semibold text-navy-950'&gt;initiator&lt;/strong&gt;, and the peer it connects to is the &lt;strong class='font-semibold text-navy-950'&gt;responder&lt;/strong&gt;.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;&lt;a href="https://www.wireguard.com/papers/wireguard.pdf" title=""&gt;&lt;em&gt;The WireGuard paper&lt;/em&gt;&lt;/a&gt; &lt;em&gt;is a good read.&lt;/em&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;For Fly.io, &lt;code&gt;flyctl&lt;/code&gt; is typically our initiator,  sending a single UDP packet to the gateway, which is the responder. According &lt;a href='https://www.wireguard.com/papers/wireguard.pdf' title=''&gt;to the WireGuard paper&lt;/a&gt;, this first packet is a &lt;code&gt;handshake initiation&lt;/code&gt;.  It gets better: the packet type is recorded in a single plaintext byte. So this simple BPF filter catches all the incoming connections: &lt;code&gt;udp and dst port 51820 and udp[8] = 1&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;In most other protocols, we&amp;rsquo;d be done at this point; we&amp;rsquo;d just scrape the username or whatnot out of the packet, go fetch the matching configuration, and install it in the kernel. With WireGuard, not so fast. WireGuard is based on Trevor Perrin&amp;rsquo;s &lt;a href='http://www.noiseprotocol.org/' title=''&gt;Noise Protocol Framework&lt;/a&gt;, and Noise goes way out of its way to &lt;a href='http://www.noiseprotocol.org/noise.html#identity-hiding' title=''&gt;hide identities&lt;/a&gt; during handshakes. To identify incoming requests, we&amp;rsquo;ll need to run enough Noise cryptography to decrypt the identity.&lt;/p&gt;

&lt;p&gt;The code to do this is fussy, but it&amp;rsquo;s relatively short (about 200 lines). Helpfully, the kernel Netlink interface will give a privileged process the private key for an interface, so the secrets we need to unwrap WireGuard are easy to get. Then it&amp;rsquo;s just a matter of running the first bit of the Noise handshake. If you&amp;rsquo;re that kind of nerdy, &lt;a href='https://gist.github.com/tqbf/9f2c2852e976e6566f962d9bca83062b' title=''&gt;here&amp;rsquo;s the code.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At this point, we have the event feed we wanted: the public keys of every user trying to make a WireGuard connection to our gateways. We keep a rate-limited cache in SQLite, and when we see new peers, we&amp;rsquo;ll make an internal HTTP API request to fetch the matching peer information and install it. This fits nicely into the little daemon that already runs on our gateways to manage WireGuard, and allows us to ruthlessly and recklessly remove stale peers with a &lt;code&gt;cron&lt;/code&gt; job.&lt;/p&gt;

&lt;p&gt;But wait! There&amp;rsquo;s more! We bounced this plan off Jason Donenfeld, and he tipped us off on a sneaky feature of the Linux WireGuard Netlink interface.&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;Jason is the hardest working person in show business.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Our API fetch for new peers is generally not going to be fast enough to respond to the first handshake initiation message a new client sends us. That&amp;rsquo;s OK; WireGuard is pretty fast about retrying. But we can do better.&lt;/p&gt;

&lt;p&gt;When we get an incoming initiation message, we have the 4-tuple address of the desired connection, including the ephemeral source port &lt;code&gt;flyctl&lt;/code&gt; is using. We can install the peer as if we&amp;rsquo;re the initiator, and &lt;code&gt;flyctl&lt;/code&gt; is the responder. The Linux kernel will initiate a WireGuard connection back to &lt;code&gt;flyctl&lt;/code&gt;. This works; the protocol doesn&amp;rsquo;t care a whole lot who&amp;rsquo;s the server and who&amp;rsquo;s the client. We get new connections established about as fast as they can possibly be installed.&lt;/p&gt;
&lt;figure class="post-cta"&gt;
  &lt;figcaption&gt;
    &lt;h1&gt;Launch an app in minutes&lt;/h1&gt;
    &lt;p&gt;Speedrun an app onto Fly.io and get your own JIT WireGuard peer&amp;nbsp;✨&lt;/p&gt;
      &lt;a class="btn btn-lg" href="/docs/speedrun/"&gt;
        Speedrun &amp;nbsp;&lt;span class='opacity-50'&gt;→&lt;/span&gt;
      &lt;/a&gt;
  &lt;/figcaption&gt;
  &lt;div class="image-container"&gt;
    &lt;img src="/static/images/cta-cat.webp" srcset="/static/images/[email protected] 2x" alt=""&gt;
  &lt;/div&gt;
&lt;/figure&gt;

&lt;h2 id='look-at-this-graph' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#look-at-this-graph' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Look at this graph&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;We&amp;rsquo;ve been running this in production for a few weeks and we&amp;rsquo;re feeling pretty happy about it. We went from thousands, or hundreds of thousands, of stale WireGuard peers on a gateway to what rounds to none. Gateways now hold a lot less state, are faster at setting up peers, and can be rebooted without having to wait for many unused peers to be loaded back into the kernel.&lt;/p&gt;

&lt;p&gt;I&amp;rsquo;ll leave you with this happy Grafana chart from the day of the switchover.&lt;/p&gt;

&lt;p&gt;&lt;img alt="a Grafana chart of &amp;#39;kernel_stale_wg_peer_count&amp;#39; vs. time. For the first few hours, all traces are flat. Most are at values between 0 and 50,000 and the top-most is just under 550,000. Towards the end of the graph, each line in turn jumps sharply down to the bottom, and at the end of the chart all datapoints are indistinguishable from 0." src="/blog/jit-wireguard-peers/assets/wireguard-peers-graph.webp" /&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong class='font-semibold text-navy-950'&gt;Editor&amp;rsquo;s note:&lt;/strong&gt; Despite our tearful protests, Lillian has decided to move on from Fly.io to explore new pursuits. We wish her much success and happiness!&amp;nbsp;✨&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Fly Kubernetes does more now</title>
    <link rel="alternate" href="https://fly.io/blog/fks-beta-live/"/>
    <id>https://fly.io/blog/fks-beta-live/</id>
    <published>2024-03-07T00:00:00+00:00</published>
    <updated>2024-04-24T22:38:38+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/fks-beta-live/assets/fks-thumb.webp"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;Eons ago, we &lt;a href="https://fly.io/blog/fks/" title=""&gt;announced&lt;/a&gt; we were working on &lt;a href="https://fly.io/docs/kubernetes/" title=""&gt;Fly Kubernetes&lt;/a&gt;. It drummed up enough excitement to prove we were heading in the right direction. So, we got hard to work to get from barebones “early access” to a beta release. We’ll be onboarding customers to the closed beta over the next few weeks. Email us at &lt;a href="mailto:[email protected]"&gt;[email protected]&lt;/a&gt; and we’ll hook you up.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Fly Kubernetes is the &amp;ldquo;blessed path&amp;rdquo;™️ to using Kubernetes backed by Fly.io infrastructure. Or, in simpler terms, it is our managed Kubernetes service. We take care of the complexity of operating the Kubernetes control plane, leaving you with the unfettered joy of deploying your Kubernetes workloads. If you love Fly.io and K8s, this product is for you.&lt;/p&gt;
&lt;h2 id='what-even-is-a-kubernete' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-even-is-a-kubernete' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;What even is a Kubernete?&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;So how did this all come to be&amp;mdash;and what even is a Kubernete?&lt;/p&gt;
&lt;div class="right-sidenote"&gt;&lt;p&gt;You can see more fun details in &lt;a href="https://fly.io/blog/fks/" title=""&gt;Introducing Fly Kubernetes&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;If you wade through all the YAML and &lt;a href='https://landscape.cncf.io/' title=''&gt;CNCF projects&lt;/a&gt;, what&amp;rsquo;s left is an API for declaring workloads and how they should be accessed.&lt;/p&gt;

&lt;p&gt;But that&amp;rsquo;s not what people usually talk / groan about. It&amp;rsquo;s everything else that comes along with adopting Kubernetes: a container runtime (CRI), networking between workloads (CNI), which leads to DNS (CoreDNS). Then you layer on Prometheus for metrics and whatever the logging daemon du jour is at the time. Now you get to debate which Ingress&amp;mdash;strike that&amp;mdash;&lt;em&gt;Gateway&lt;/em&gt; API to deploy, and if the next thing has anything to do with a Service Mess, then, as they like to say where I live, &amp;ldquo;bless your heart&amp;rdquo;.&lt;/p&gt;

&lt;p&gt;Finally, there&amp;rsquo;s capacity planning. You&amp;rsquo;ve got to pick and choose where your &lt;a href='https://kubernetes.io/docs/concepts/architecture/nodes/' title=''&gt;Nodes&lt;/a&gt; run and what they look like in order to configure and run the workloads.&lt;/p&gt;

&lt;p&gt;When we began thinking about what a Fly Kubernetes Service could look like, we started from first principles, as we do with most everything here. The best way we can describe it is the &lt;a href='https://www.youtube.com/watch?v=Ddk9ci6geSs' title=''&gt;scene from Iron Man 2 when Tony Stark discovers a new element&lt;/a&gt;. As he&amp;rsquo;s looking at the knowledge left behind by those that came before, he starts to imagine something entirely different and more capable than could have been accomplished previously. That&amp;rsquo;s what happened to JP, but with K3s and Virtual Kubelet.&lt;/p&gt;
&lt;h2 id='ok-then-wtf-whats-the-fks' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#ok-then-wtf-whats-the-fks' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;OK then, WTF (what&amp;rsquo;s the FKS)?&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;We looked at what people need to get started—the API—and then started peeling away all the noise, filling in the gaps to connect things together to provide the power. Here&amp;rsquo;s how this looks currently:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Containerd/CRI → &lt;a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''&gt;flyd&lt;/a&gt; + Firecracker + &lt;a href='https://fly.io/blog/docker-without-docker/' title=''&gt;our init&lt;/a&gt;: our system transmogrifies Docker containers into Firecracker microVMs
&lt;/li&gt;&lt;li&gt;Networking/CNI → Our &lt;a href='https://fly.io/blog/ipv6-wireguard-peering/' title=''&gt;internal WireGuard mesh&lt;/a&gt; connects your pods together
&lt;/li&gt;&lt;li&gt;Pods → Fly Machines VMs
&lt;/li&gt;&lt;li&gt;Secrets → Secrets, only not the base64&amp;rsquo;d kind
&lt;/li&gt;&lt;li&gt;Services → The Fly Proxy
&lt;/li&gt;&lt;li&gt;CoreDNS → CoreDNS (to be replaced with our custom internal DNS)
&lt;/li&gt;&lt;li&gt;Persistent Volumes → Fly Volumes (coming soon)
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;Now&amp;hellip;not everything is a one-to-one comparison, and we explicitly did not set out to support any and every configuration. We aren&amp;rsquo;t dealing with resources like Network Policy and init containers, though we&amp;rsquo;re also not completely ignoring them. By mapping many of the core primitives of Kubernetes to a Fly.io resource, we&amp;rsquo;re able to focus on continuing to build the primitives that make our cloud better for workloads of all shapes and sizes.&lt;/p&gt;

&lt;p&gt;A key thing to notice above is that there&amp;rsquo;s no &amp;ldquo;Node&amp;rdquo;.&lt;/p&gt;

&lt;p&gt;&lt;a href='https://virtual-kubelet.io/' title=''&gt;Virtual Kubelet&lt;/a&gt; plays a central role in FKS. It&amp;rsquo;s magic, really. A Virtual Kubelet acts as if it&amp;rsquo;s a standard Kubelet running on a Node, eager to run your workloads. However, there&amp;rsquo;s no Node backing it. It instead behaves like an API, receiving requests from Kubernetes and transforming them into requests to deploy on a cloud compute service. In our case, that&amp;rsquo;s Fly Machines.&lt;/p&gt;

&lt;p&gt;So what we have is Kubernetes calling out to our &lt;a href='https://virtual-kubelet.io/docs/providers/' title=''&gt;Virtual Kubelet provider&lt;/a&gt;, a small Golang program we run alongside K3s, to create and run your pod. It creates &lt;a href='https://fly.io/blog/docker-without-docker/' title=''&gt;your pod as a Fly Machine&lt;/a&gt;, via the &lt;a href='/docs/machines/api/' title=''&gt;Fly Machines API&lt;/a&gt;, deploying it to any underlying host within that region. This shifts the burden of managing hardware capacity from you to us. We think that&amp;rsquo;s a cool trick&amp;mdash;thanks, Virtual Kubelet magic!&lt;/p&gt;
&lt;h2 id='speedrun' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#speedrun' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Speedrun&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;You can deploy your workloads (including GPUs) across any of our available regions using the Kubernetes API.&lt;/p&gt;

&lt;p&gt;You create a cluster with &lt;code&gt;flyctl&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative cmd"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-fnxi6rft"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight cmd'&gt;&lt;code id="code-fnxi6rft"&gt;fly ext k8s create --name hello --org personal --region iad
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;When a cluster is created, it has the standard &lt;code&gt;default&lt;/code&gt; namespace. You can inspect it:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative cmd"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-92wwv6kq"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight cmd'&gt;&lt;code id="code-92wwv6kq"&gt;kubectl get ns default --show-labels
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;&lt;div class="highlight-wrapper group relative output"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-mk490mip"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight output whitespace-pre'&gt;&lt;code id="code-mk490mip"&gt;NAME      STATUS   AGE   LABELS
default   Active   20d   fly.io/app=fks-default-7zyjm3ovpdxmd0ep,kubernetes.io/metadata.name=default
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The &lt;code&gt;fly.io/app&lt;/code&gt; label shows the name of the Fly App that corresponds to your cluster.&lt;/p&gt;

&lt;p&gt;It would seem appropriate to deploy the &lt;a href='https://github.com/kubernetes-up-and-running/kuard' title=''&gt;Kubernetes Up And Running demo&lt;/a&gt; here, but since your pods are connected over an &lt;a href='https://fly.io/blog/ipv6-wireguard-peering/' title=''&gt;IPv6 WireGuard mesh&lt;/a&gt;, we&amp;rsquo;re going to use a &lt;a href='https://github.com/jipperinbham/kuard' title=''&gt;fork&lt;/a&gt; with support for &lt;a href='https://github.com/kubernetes-up-and-running/kuard/issues/46' title=''&gt;IPv6 DNS&lt;/a&gt;.&lt;/p&gt;
&lt;div class="highlight-wrapper group relative cmd"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-7qz94xki"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight cmd'&gt;&lt;code id="code-7qz94xki"&gt;kubectl run \
  --image=ghcr.io/jipperinbham/kuard-amd64:blue \
  --labels="app=kuard-fks" \
  kuard
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;And you can see its Machine representation via:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative cmd"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-1wk7f1q0"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight cmd'&gt;&lt;code id="code-1wk7f1q0"&gt;fly machine list --app fks-default-7zyjm3ovpdxmd0ep
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;&lt;div class="highlight-wrapper group relative output"&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-wrap-target="#code-7rbzov1i"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35" stroke-linecap="round" stroke-linejoin="round"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959" /&gt;&lt;path d="M11.081 6.466L9.533 8.037l1.548 1.571" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950"&gt;
      Wrap text
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;button 
    type="button"
    class="bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none"
    data-copy-target="sibling"
  &gt;
    &lt;svg class="w-4 h-4 pointer-events-none" viewBox="0 0 16 16" fill="none" stroke="currentColor" stroke-width="1.35"&gt;&lt;g buffered-rendering="static"&gt;&lt;path d="M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z" /&gt;&lt;path d="M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617" /&gt;&lt;/g&gt;&lt;/svg&gt;
    &lt;span class="bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950"&gt;
      Copy to clipboard
    &lt;/span&gt;
  &lt;/button&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight output whitespace-pre'&gt;&lt;code id="code-7rbzov1i"&gt;ID              NAME        STATE   REGION  IMAGE                           IP ADDRESS                      VOLUME  CREATED                 LAST UPDATED            APP PLATFORM    PROCESS GROUP   SIZE
1852291c46ded8  kuard       started iad     jipperinbham/kuard-amd64:blue   fdaa:0:48c8:a7b:228:4b6d:6e20:2         2024-03-05T18:54:41Z    2024-03-05T18:54:44Z                                    shared-cpu-1x:256MB
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;This is important! Your pod is a Fly Machine! We don&amp;rsquo;t yet support every kubectl feature, but Fly.io tooling will &amp;ldquo;just work&amp;rdquo; in the cases we don&amp;rsquo;t: for example, there&amp;rsquo;s no &lt;code&gt;kubectl port-forward&lt;/code&gt; or &lt;code&gt;kubectl exec&lt;/code&gt; yet, but you can use flyctl to forward ports and get a shell into a pod.&lt;/p&gt;
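&lt;p&gt;As a rough sketch, using the example app name from this walkthrough (substitute your own), the flyctl equivalents look like this:&lt;/p&gt;

```shell
# Roughly kubectl port-forward: tunnel local port 8080 to the pod's
# port 8080 over your org's WireGuard network.
fly proxy 8080:8080 --app fks-default-7zyjm3ovpdxmd0ep

# Roughly kubectl exec: open a shell in the Fly Machine backing the pod.
fly ssh console --app fks-default-7zyjm3ovpdxmd0ep
```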

&lt;p&gt;Expose it to your internal network using the standard ClusterIP Service:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative cmd"&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight cmd'&gt;&lt;code id="code-1sjiwcq9"&gt;kubectl expose pod kuard \
  --name=kuard \
  --port=8080 \
  --target-port=8080 \
  --selector='app=kuard-fks'
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
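&lt;p&gt;For reference, that &lt;code&gt;kubectl expose&lt;/code&gt; invocation is shorthand for a plain Service manifest along these lines (a sketch of the object it generates; FKS layers its own Fly-specific annotations on top):&lt;/p&gt;

```yaml
apiVersion: v1
kind: Service
metadata:
  name: kuard
spec:
  type: ClusterIP
  selector:
    app: kuard-fks
  ports:
    - port: 8080
      targetPort: 8080
```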
&lt;p&gt;ClusterIP Services work natively: Fly.io internal DNS supports them, and within the cluster, CoreDNS resolves them too.&lt;/p&gt;

&lt;p&gt;To access this Service locally via &lt;a href='https://fly.io/docs/networking/private-networking/#flycast-private-load-balancing' title=''&gt;flycast&lt;/a&gt;, first connect to your org&amp;rsquo;s &lt;a href='https://fly.io/docs/networking/private-networking/' title=''&gt;6PN private WireGuard network&lt;/a&gt;. Then have kubectl describe the &lt;code&gt;kuard&lt;/code&gt; Service:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative cmd"&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight cmd'&gt;&lt;code id="code-hy5q54ru"&gt;kubectl describe svc kuard
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;&lt;div class="highlight-wrapper group relative output"&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight output'&gt;&lt;code id="code-a8mzw85a"&gt;Name:              kuard
Namespace:         default
Labels:            app=kuard-fks
Annotations:       fly.io/clusterip-allocator: configured
                   service.fly.io/sync-version: 11507529969321451315
Selector:          app=kuard-fks
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv6
IP:                fdaa:0:48c8:0:1::1a
IPs:               fdaa:0:48c8:0:1::1a
Port:              &amp;lt;unset&amp;gt;  8080/TCP
TargetPort:        8080/TCP
Endpoints:         [fdaa:0:48c8:a7b:228:4b6d:6e20:2]:8080
Session Affinity:  None
Events:            &amp;lt;none&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;You can pull the Service&amp;rsquo;s IP address out of the output above and use it to get at the KUARD UI: in this case, &lt;code&gt;http://[fdaa:0:48c8:0:1::1a]:8080&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;You can also use internal DNS: &lt;code&gt;http://&amp;lt;service_name&amp;gt;.svc.&amp;lt;app_name&amp;gt;.flycast:8080&lt;/code&gt;, which in our example is &lt;code&gt;http://kuard.svc.fks-default-7zyjm3ovpdxmd0ep.flycast:8080&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;And finally CoreDNS: &lt;code&gt;&amp;lt;service_name&amp;gt;.&amp;lt;namespace&amp;gt;.svc.cluster.local&lt;/code&gt; resolves to the &lt;code&gt;fdaa&lt;/code&gt; IP and is routable within the cluster.&lt;/p&gt;
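&lt;p&gt;Putting the three addressing options together, with the example IP and app name from this walkthrough:&lt;/p&gt;

```shell
# 1. The Service's 6PN address, from your org's WireGuard network:
curl "http://[fdaa:0:48c8:0:1::1a]:8080/"

# 2. Fly.io internal DNS, also from the WireGuard network:
curl "http://kuard.svc.fks-default-7zyjm3ovpdxmd0ep.flycast:8080/"

# 3. CoreDNS, from inside the cluster:
curl "http://kuard.default.svc.cluster.local:8080/"
```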
&lt;figure class="post-cta"&gt;
  &lt;figcaption&gt;
    &lt;h1&gt;Get in on the FKS beta&lt;/h1&gt;
    &lt;p&gt;Email us at [email protected]&lt;/p&gt;
  &lt;/figcaption&gt;
  &lt;div class="image-container"&gt;
    &lt;img src="/static/images/cta-cat.webp" srcset="/static/images/[email protected] 2x" alt=""&gt;
  &lt;/div&gt;
&lt;/figure&gt;

&lt;h2 id='pricing' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#pricing' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Pricing&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;The Fly Kubernetes Service is free during the beta. Fly Machines and Fly Volumes you create with it will cost the &lt;a href='https://fly.io/docs/about/pricing/' title=''&gt;same as for your other Fly.io projects&lt;/a&gt;. It&amp;rsquo;ll be &lt;a href='https://fly.io/docs/about/pricing/#fly-kubernetes' title=''&gt;$75/mo per cluster&lt;/a&gt; after that, plus the cost of the other resources you create.&lt;/p&gt;
&lt;h2 id='today-and-the-future' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#today-and-the-future' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Today and the future&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Today, Fly Kubernetes supports only a portion of the Kubernetes API. You can deploy pods using Deployments/ReplicaSets. Pods are able to communicate via Services using the standard K8s DNS format. Ephemeral and persistent volumes are supported.&lt;/p&gt;

&lt;p&gt;The most notable absences are multi-container pods, StatefulSets, network policies, horizontal pod autoscaling, and emptyDir volumes. We&amp;rsquo;re working on supporting autoscaling and emptyDir volumes in the coming weeks, and multi-container pods in the coming months.&lt;/p&gt;

&lt;p&gt;If you&amp;rsquo;ve made it this far and are eagerly awaiting your chance to tell us and the rest of the internet &amp;ldquo;this isn&amp;rsquo;t Kubernetes!&amp;rdquo;, well, we agree! It&amp;rsquo;s not something we take lightly. We&amp;rsquo;re still building, and conformance tests may be in the future for FKS. We&amp;rsquo;ve made a deliberate decision to care only about fast-launching VMs as the one and only way to run workloads on our cloud. And we also know enough of our customers would like to use the Kubernetes API to create a fast-launching VM in the form of a Pod, and that&amp;rsquo;s where this story begins.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Globally Distributed Object Storage with Tigris</title>
    <link rel="alternate" href="https://fly.io/blog/tigris-public-beta/"/>
    <id>https://fly.io/blog/tigris-public-beta/</id>
    <published>2024-02-15T00:00:00+00:00</published>
    <updated>2024-04-24T22:38:38+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/tigris-public-beta/assets/tigris-public-beta-thumb.webp"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;We’re Fly.io and we transmute containers into VMs, running them on our hardware around the world with the power of Firecracker alchemy. That’s pretty cool, but we want to talk about something someone else built, that &lt;a href="https://fly.io/docs/reference/tigris/" title=""&gt;you can use today&lt;/a&gt; to build applications.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;There are three hard things in computer science:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Cache invalidation
&lt;/li&gt;&lt;li&gt;Naming things
&lt;/li&gt;&lt;li&gt;&lt;a href='https://aws.amazon.com/s3/' title=''&gt;Doing a better job than Amazon of storing files&lt;/a&gt;
&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;Of all the annoying software problems that have no business being annoying, handling a file upload in a full-stack application stands apart, a universal if tractable malady, the plantar fasciitis of programming.&lt;/p&gt;

&lt;p&gt;Now, the actual act of clients placing files on servers is straightforward. Your framework &lt;a href='https://hexdocs.pm/phoenix/file_uploads.html' title=''&gt;has&lt;/a&gt; &lt;a href='https://edgeguides.rubyonrails.org/active_storage_overview.html' title=''&gt;a&lt;/a&gt; &lt;a href='https://docs.djangoproject.com/en/5.0/topics/http/file-uploads/' title=''&gt;feature&lt;/a&gt; &lt;a href='https://expressjs.com/en/resources/middleware/multer.html' title=''&gt;that&lt;/a&gt; &lt;a href='https://github.com/yesodweb/yesod-cookbook/blob/master/cookbook/Cookbook-file-upload-saving-files-to-server.md' title=''&gt;does&lt;/a&gt; &lt;a href='https://laravel.com/docs/10.x/filesystem' title=''&gt;it&lt;/a&gt;. What&amp;rsquo;s hard is making sure that uploads stick around to be downloaded later.&lt;/p&gt;
&lt;aside class="right-sidenote"&gt;&lt;p&gt;(yes, yes, we know, &lt;a href="https://youtu.be/b2F-DItXtZs?t=102" title=""&gt;sharding /dev/null&lt;/a&gt; is faster)&lt;/p&gt;
&lt;/aside&gt;
&lt;p&gt;Enter object storage, a pattern you may know by its colloquial name &amp;ldquo;S3&amp;rdquo;. Object storage occupies a funny place in software architecture, somewhere between a database and a filesystem. It&amp;rsquo;s like &lt;a href='https://man7.org/linux/man-pages/man3/malloc.3.html' title=''&gt;&lt;code&gt;malloc&lt;/code&gt;&lt;/a&gt;&lt;code&gt;()&lt;/code&gt;, but for cloud storage instead of program memory.&lt;/p&gt;

&lt;p&gt;&lt;a href='https://www.kleenex.com/en-us/' title=''&gt;S3&lt;/a&gt;—err, object storage — is so important that it was the second AWS service ever introduced (EC2 was not the first!). Everybody wants it. We know, because they keep asking us for it.&lt;/p&gt;

&lt;p&gt;So why didn&amp;rsquo;t we build it?&lt;/p&gt;

&lt;p&gt;Because we couldn&amp;rsquo;t figure out a way to improve on S3. And we still haven&amp;rsquo;t! But someone else did, at least for the kinds of applications we see on Fly.io.&lt;/p&gt;
&lt;h2 id='but-first-some-back-story' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-first-some-back-story' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;But First, Some Back Story&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;S3 checks all the boxes. It&amp;rsquo;s trivial to use. It&amp;rsquo;s efficient and cost-effective. It has redundancies that would make a DoD contractor blush. It integrates with archival services like Glacier. And every framework supports it. At some point, the IETF should just take a deep sigh and write an S3 API RFC, XML signatures and all.&lt;/p&gt;

&lt;p&gt;There&amp;rsquo;s at least one catch, though.&lt;/p&gt;

&lt;p&gt;Back in, like, &amp;lsquo;07 people ran all their apps from a single city. S3 was designed to work for those kinds of apps. The data, the bytes on the disks (or whatever weird hyperputer AWS stores S3 bytes on), live in one place. A specific place. In a specific data center. As powerful and inspiring as The Architects are, they are mortals, and must obey the laws of physics.&lt;/p&gt;

&lt;p&gt;This observation feels banal, until you realize how apps have changed in the last decade. Apps and their users don&amp;rsquo;t live in one specific place. They live all over the world. When users are close to the S3 data center, things are amazing! But things get less amazing the further away you get from the data center, and even less amazing the smaller and more frequent your reads and writes are.&lt;/p&gt;

&lt;p&gt;(Thought experiment: you have to pick one place in the world to route all your file storage. Where is it? Is it &lt;a href='https://www.tripadvisor.com/Restaurant_Review-g30246-d1956555-Reviews-Ford_s_Fish_Shack_Ashburn-Ashburn_Loudoun_County_Virginia.html' title=''&gt;Loudoun County, Virginia&lt;/a&gt;?)&lt;/p&gt;

&lt;p&gt;So, for many modern apps, you end up having to &lt;a href='https://stackoverflow.com/questions/32426249/aws-s3-bucket-with-multiple-regions' title=''&gt;write things into different regions&lt;/a&gt;, so that people close to the data get it from a region-specific bucket. Doing that pulls in CDN-caching things that complicate your application and put barriers between you and your data. Before you know it, you&amp;rsquo;re wearing custom orthotics on your, uh, developer feet. (&lt;em&gt;I am done with this metaphor now, I promise.&lt;/em&gt;)&lt;/p&gt;
&lt;aside class="right-sidenote"&gt;&lt;p&gt;(well, okay, Backblaze B2 because somehow my bucket fits into their free tier, but you get the idea)&lt;/p&gt;
&lt;/aside&gt;
&lt;p&gt;Personally, I know this happens. Because I had to build one! I run a &lt;a href='https://xeiaso.net/blog/xedn/' title=''&gt;CDN backend&lt;/a&gt; that&amp;rsquo;s a caching proxy for S3 in six continents across the world. All so that I can deliver images and video efficiently for the readers of my blog.&lt;/p&gt;
&lt;aside class="right-sidenote"&gt;&lt;p&gt;(shut up, it’s a sandwich)&lt;/p&gt;
&lt;/aside&gt;
&lt;p&gt;What if data was really global? For some applications, it might not matter much. But for others, it matters a lot. When a sandwich lover in Australia snaps a picture of a &lt;a href='https://en.wikipedia.org/wiki/Hamdog' title=''&gt;hamdog&lt;/a&gt;, the people most likely to want to see that photo are also in Australia. Routing those uploads and downloads through one building in Ashburn is no way to build a sandwich reviewing empire.&lt;/p&gt;

&lt;p&gt;Localizing all the data sounds like a hard problem. What if you didn&amp;rsquo;t need to change anything on your end to accomplish it?&lt;/p&gt;
&lt;h2 id='show-me-a-hero' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#show-me-a-hero' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;Show Me A Hero&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Building a miniature CDN infrastructure just to handle file uploads seems like the kind of thing that could take a week or so of tinkering. The Fly.io unified theory of cloud development is that solutions are completely viable for full-stack developers only when they take less than 2 hours to get working.&lt;/p&gt;

&lt;p&gt;AWS agrees, which is why they have a SKU for it, &lt;a href='https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/cloudfront_distribution' title=''&gt;called CloudFront&lt;/a&gt;, which will, at some variably metered expense, optimize the read side of a single-write-region bucket: they&amp;rsquo;ll set up &lt;a href='https://fly.io/blog/the-5-hour-content-delivery-network/' title=''&gt;a simple caching CDN&lt;/a&gt; for you. You can probably get S3 and CloudFront working within 2 hours, especially if you&amp;rsquo;ve set it up before.&lt;/p&gt;

&lt;p&gt;Our friends at Tigris have this problem down to single-digit minutes, and what they came up with is a lot cooler than a cache CDN.&lt;/p&gt;

&lt;p&gt;Here&amp;rsquo;s how it works. Tigris runs redundant FoundationDB clusters in our regions to track objects. They use Fly.io&amp;rsquo;s NVMe volumes as a first level of cached raw byte store, and a queuing system modelled on &lt;a href='https://www.foundationdb.org/files/QuiCK.pdf' title=''&gt;Apple&amp;rsquo;s QuiCK paper&lt;/a&gt; to distribute object data to multiple replicas, to regions where the data is in demand, and to 3rd party object stores… like S3.&lt;/p&gt;

&lt;p&gt;If your objects are less than about 128 kilobytes, Tigris makes them instantly global. By default! Things are just snappy, all over the world, automatically, because they&amp;rsquo;ve done all the work.&lt;/p&gt;

&lt;p&gt;But it gets better, because Tigris is also much more flexible than a simple caching CDN. It&amp;rsquo;s globally distributed from the jump, with inter-region routing baked into its distribution layer. Tigris isn&amp;rsquo;t a CDN, but rather a toolset that you can use to build arbitrary CDNs, with consistency guarantees, instant purge, and relay regions.&lt;/p&gt;

&lt;p&gt;There&amp;rsquo;s a lot going on in this architecture, and it&amp;rsquo;d be fun to dig into it more. But for now, you don&amp;rsquo;t have to understand any of it. Because Tigris ties all this stuff together with an S3-compatible object storage API. If your framework can talk to S3, it can use Tigris.&lt;/p&gt;
&lt;h2 id='fly-storage' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#fly-storage' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;&lt;code&gt;fly storage&lt;/code&gt;&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;To get started with this, run the &lt;code&gt;fly storage create&lt;/code&gt; command:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative "&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-rhojus0y"&gt;$ fly storage create
Choose a name, use the default, or leave blank to generate one: xe-foo-images
Your Tigris project (xe-foo-images) is ready. See details and next steps with: https://fly.io/docs/reference/tigris/

Setting the following secrets on xe-foo:
AWS_REGION
BUCKET_NAME
AWS_ENDPOINT_URL_S3
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY

Secrets are staged for the first deployment
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;All you have to do is fill in a bucket name. Hit enter. All of the configuration for the AWS S3 library will be injected into your application for you. And you don&amp;rsquo;t even need to change the libraries that you&amp;rsquo;re using. &lt;a href='https://www.tigrisdata.com/docs/sdks/s3/' title=''&gt;The Tigris examples&lt;/a&gt; all use the AWS libraries to put and delete objects into Tigris using the same calls that you use for S3.&lt;/p&gt;
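&lt;p&gt;For instance, with those secrets in a Machine&amp;rsquo;s environment, stock S3 tooling works unchanged; recent AWS CLI releases pick up &lt;code&gt;AWS_ENDPOINT_URL_S3&lt;/code&gt; automatically (a sketch; older releases need an explicit &lt;code&gt;--endpoint-url&lt;/code&gt;):&lt;/p&gt;

```shell
# Upload and list objects in the Tigris bucket with plain S3 tooling.
aws s3 cp hamdog.webp "s3://$BUCKET_NAME/images/hamdog.webp"
aws s3 ls "s3://$BUCKET_NAME/images/"
```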

&lt;p&gt;I know how this looks for a lot of you. It looks like we&amp;rsquo;re partnering with Tigris because we&amp;rsquo;re chicken, and we didn&amp;rsquo;t want to build something like this. Well, guess what: you&amp;rsquo;re right!&lt;/p&gt;

&lt;p&gt;Compute and networking: those are things we love and understand. Object storage? &lt;a href='https://fly.io/blog/the-5-hour-content-delivery-network/' title=''&gt;We already gave away the game on how we&amp;rsquo;d design a CDN for our own content&lt;/a&gt;, and it wasn&amp;rsquo;t nearly as slick as Tigris.&lt;/p&gt;

&lt;p&gt;Object storage is important. It needs to be good. We did not want to half-ass it. So we partnered with Tigris, so that they can put their full resources into making object storage as ✨magical✨ as Fly.io is.&lt;/p&gt;

&lt;p&gt;This also mirrors the Unix philosophy of days gone past: individual parts that each do one thing very well, chained together to create a composite result. I mean, come on, would you seriously want to buy your servers in the same place you buy your shoes?&lt;/p&gt;
&lt;h2 id='one-bill-to-rule-them-all' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'&gt;&lt;a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#one-bill-to-rule-them-all' aria-label='Anchor'&gt;&lt;/a&gt;&lt;span class='plain-code'&gt;One bill to rule them all&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;Well, okay, the main reason you&amp;rsquo;d want to do that is that having everything under one bill makes things really easy for your accounting people. So we&amp;rsquo;ve wrapped your compute, your block storage, your databases, your networking, and your object storage into a single bill. You don&amp;rsquo;t have to create separate accounts with Supabase or Upstash or PlanetScale or Tigris. Everything gets charged to your Fly.io account, and you pay one bill per month.&lt;/p&gt;
&lt;aside class="right-sidenote"&gt;&lt;p&gt;This was actually going to be posted on Valentine’s Day, but we had to wait for the chocolate to go on sale.&lt;/p&gt;
&lt;/aside&gt;
&lt;p&gt;This is our Valentine&amp;rsquo;s Day gift to you all. Object storage that just works. Stay tuned, because we have a couple of exciting features that build on the integration of Fly.io and Tigris to enable really unique things, such as truly global static website hosting and turning your bucket into a CDN in 5 minutes at most.&lt;/p&gt;

&lt;p&gt;Here&amp;rsquo;s to many more happy developer days to come.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>GPUs on Fly.io are available to everyone!</title>
    <link rel="alternate" href="https://fly.io/blog/gpu-ga/"/>
    <id>https://fly.io/blog/gpu-ga/</id>
    <published>2024-02-12T00:00:00+00:00</published>
    <updated>2024-04-24T22:38:38+00:00</updated>
    <media:thumbnail url="https://fly.io/blog/gpu-ga/assets/gpu-ga-thumb.webp"/>
    <content type="html">&lt;div class="lead"&gt;&lt;p&gt;Fly.io makes it easy to spin up compute around the world, now including powerful GPUs. Unlock the power of large language models, text transcription, and image generation with our datacenter-grade muscle!&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;GPUs are now available to everyone!&lt;/p&gt;

&lt;p&gt;We know you&amp;rsquo;ve been excited to use GPUs on Fly.io, and we&amp;rsquo;re happy to announce that they&amp;rsquo;re now available to everyone. If you want, you can spin up GPU instances with any of the following cards:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ampere A100 (40GB) &lt;code&gt;a100-40gb&lt;/code&gt;
&lt;/li&gt;&lt;li&gt;Ampere A100 (80GB) &lt;code&gt;a100-80gb&lt;/code&gt;
&lt;/li&gt;&lt;li&gt;Lovelace L40s (48GB) &lt;code&gt;l40s&lt;/code&gt;
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;To use a GPU instance today, change the &lt;code&gt;vm.size&lt;/code&gt; for one of your apps or processes to any of the above GPU kinds. Here&amp;rsquo;s how you can spin up an &lt;a href='https://ollama.ai' title=''&gt;Ollama&lt;/a&gt; server in seconds:&lt;/p&gt;
&lt;div class="highlight-wrapper group relative toml"&gt;
  &lt;div class='highlight relative group'&gt;
    &lt;pre class='highlight '&gt;&lt;code id="code-bcyvgy6u"&gt;&lt;span class="py"&gt;app&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"your-app-name"&lt;/span&gt;
&lt;span class="py"&gt;region&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"ord"&lt;/span&gt;
&lt;span class="py"&gt;vm.size&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"l40s"&lt;/span&gt;

&lt;span class="nn"&gt;[http_service]&lt;/span&gt;
  &lt;span class="py"&gt;internal_port&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;11434&lt;/span&gt;
  &lt;span class="py"&gt;force_https&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
  &lt;span class="py"&gt;auto_stop_machines&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="py"&gt;auto_start_machines&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="py"&gt;min_machines_running&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
  &lt;span class="py"&gt;processes&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;["app"]&lt;/span&gt;

&lt;span class="nn"&gt;[build]&lt;/span&gt;
  &lt;span class="py"&gt;image&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"ollama/ollama"&lt;/span&gt;

&lt;span class="nn"&gt;[mounts]&lt;/span&gt;
  &lt;span class="py"&gt;source&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"models"&lt;/span&gt;
  &lt;span class="py"&gt;destination&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"/root/.ollama"&lt;/span&gt;
  &lt;span class="py"&gt;initial_size&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"100gb"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Deploy this and bam, large language model inferencing from anywhere. If you want a private setup, see the article &lt;a href='https://fly.io/blog/scaling-llm-ollama/' title=''&gt;Scaling Large Language Models to zero with Ollama&lt;/a&gt; for more information. You never know when you have a sandwich emergency and don&amp;rsquo;t know what you can make with what you have on hand.&lt;/p&gt;

&lt;p&gt;We are working on getting some lower-cost A10 GPUs in the next few weeks. We&amp;rsquo;ll update you when they&amp;rsquo;re ready.&lt;/p&gt;

&lt;p&gt;If you want to explore the possibilities of GPUs on Fly.io, here are a few articles that may give you ideas:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href='https://fly.io/blog/not-midjourney-bot/' title=''&gt;Deploy Your Own (Not) MidJourney Bot On Fly GPUs&lt;/a&gt;
&lt;/li&gt;&lt;li&gt;&lt;a href='https://fly.io/blog/scaling-llm-ollama/' title=''&gt;Scaling Large Language Models to zero with Ollama&lt;/a&gt;
&lt;/li&gt;&lt;li&gt;&lt;a href='https://fly.io/blog/transcribing-on-fly-gpu-machines/' title=''&gt;Transcribing on Fly GPU Machines&lt;/a&gt;
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;Depending on factors such as your organization&amp;rsquo;s age and payment history, you may need to go through additional verification steps.&lt;/p&gt;

&lt;p&gt;If you&amp;rsquo;ve been experimenting with Fly.io GPUs and have made something cool, let us know on the &lt;a href='https://community.fly.io/' title=''&gt;Community Forums&lt;/a&gt; or by mentioning us &lt;a href='https://hachyderm.io/@flydotio' title=''&gt;on Mastodon&lt;/a&gt;! We&amp;rsquo;ll boost the cool ones.&lt;/p&gt;
</content>
  </entry>
</feed>
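The self-link error reported at the top of this analysis (`rel="self"` pointing at `https://fly.io/blog/` instead of `https://fly.io/blog/feed.xml`) is mechanical to check. Here is a minimal sketch using Python's standard `xml.etree.ElementTree`; the `self_link` helper is illustrative, and the inlined feed is abbreviated from the document shown above:

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"

# Abbreviated copy of the feed header shown above.
feed_xml = """<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title>The Fly Blog</title>
    <id>https://fly.io/blog/</id>
    <link href="https://fly.io/blog/"/>
    <link href="https://fly.io/blog/" rel="self"/>
</feed>"""

def self_link(xml_text):
    """Return the href of the feed's rel="self" link, or None if absent."""
    root = ET.fromstring(xml_text)
    for link in root.findall(ATOM_NS + "link"):
        if link.get("rel") == "self":
            return link.get("href")
    return None

fetched_from = "https://fly.io/blog/feed.xml"
href = self_link(feed_xml)
if href != fetched_from:
    print("self link", href, "does not match feed URL", fetched_from)
```

Per RFC 4287 / RFC 5005 convention, the `rel="self"` link should equal the URL the feed was fetched from, which is exactly the comparison the analyzer flags.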
Raw headers
{
  "cache-control": "max-age=0, private, must-revalidate",
  "cf-cache-status": "DYNAMIC",
  "cf-ray": "9dc5c7f0ca305751-CMH",
  "content-type": "text/xml",
  "date": "Sat, 14 Mar 2026 19:45:50 GMT",
  "etag": "W/\"69b3d033-b1987\"",
  "fly-request-id": "01KKPY4CT4ZMQ2D51B20TZRJ1R-yyz",
  "last-modified": "Fri, 13 Mar 2026 08:52:03 GMT",
  "server": "cloudflare",
  "set-cookie": "fly_gtm={}; path=/; expires=Sun, 14 Mar 2027 19:45:50 GMT; max-age=31536000; secure; HttpOnly; SameSite=Lax",
  "transfer-encoding": "chunked",
  "vary": "accept-encoding",
  "via": "1.1 fly.io, 1.1 fly.io, 1.1 fly.io"
}
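Because the response above carries both `etag` and `last-modified`, a polite poller can revalidate this ~726 KB feed with a conditional GET rather than re-downloading it each time. A minimal sketch of building the revalidation request headers — `revalidation_headers` is a hypothetical helper, the header names are the standard HTTP conditional-request ones, and the sample values are taken from the response above:

```python
def revalidation_headers(prev_response_headers):
    """Build conditional-GET request headers from a prior response.

    A 304 Not Modified reply means the cached copy is still current.
    """
    headers = {}
    etag = prev_response_headers.get("etag")
    if etag:
        headers["If-None-Match"] = etag
    last_modified = prev_response_headers.get("last-modified")
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers

# Values copied from the raw headers above.
prev = {
    "etag": 'W/"69b3d033-b1987"',
    "last-modified": "Fri, 13 Mar 2026 08:52:03 GMT",
}
print(revalidation_headers(prev))
```

Note the weak validator prefix (`W/`) must be sent back verbatim in `If-None-Match`; servers compare it as an opaque string.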
Parsed with @rowanmanning/feed-parser
{
  "meta": {
    "type": "atom",
    "version": "1.0"
  },
  "language": null,
  "title": "The Fly Blog",
  "description": "News, tips, and tricks from the team at Fly",
  "copyright": null,
  "url": "https://fly.io/blog/",
  "self": "https://fly.io/blog/",
  "published": null,
  "updated": "2026-03-10T00:00:00.000Z",
  "generator": null,
  "image": null,
  "authors": [
    {
      "name": "Fly",
      "email": null,
      "url": null
    }
  ],
  "categories": [],
  "items": [
    {
      "id": "https://fly.io/blog/unfortunately-mcp/",
      "title": "Unfortunately, Sprites Now Speak MCP",
      "description": null,
      "url": "https://fly.io/blog/unfortunately-mcp/",
      "published": "2026-03-10T00:00:00.000Z",
      "updated": "2026-03-12T19:30:19.000Z",
      "content": "<div class=\"lead\"><p>Sprites are disposable cloud computers. They appear instantly, always include durable filesystems, and cost practically nothing when idle. They’re the best and safest place on the Internet to run agents and we want you to <a href=\"https://sprites.dev/\" title=\"\">create dozens of them</a>.</p>\n</div>\n<p>Sprites are a place to run agents; the first thing you should think to do with a new Sprite is to type <code>claude</code> (or <code>gemini</code> or <code>codex</code>). We’ve put a <a href='https://fly.io/blog/design-and-implementation/' title=''>lot of effort</a> into making sure coding agents feel safe and happy when they’re on Sprites, because, to (probably) quote John von Neumann, “happy agents are productive agents.”</p>\n\n<p>What’s less obvious about Sprites is that they’re great tools <em>for</em> agents. Want three different versions of a new feature? A test environment? An ensemble of cooperating services? It’s super handy to be able to start your prompts, “<code>On a new Sprite, do…</code>”.</p>\n\n<p>The Sprites API is simple, discoverable, and designed for this use case. It’s just a question of how you choose to give your agent access to it. And now there’s one more way: with MCP.</p>\n<h2 id='we-did-this-because-your-agents-suck' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#we-did-this-because-your-agents-suck' aria-label='Anchor'></a><span class='plain-code'>We Did This Because Your Agents Suck</span></h2>\n<p>This feature works well, but we’re less than enthusiastic about it. Not as product developers, mind you. It’s a good product! Just as aesthetes.</p>\n\n<p>In 2026, MCP is the wrong way to extend the capabilities of an agent. 
The emerging Right Way to do this is command line tools and discoverable APIs.</p>\n\n<p>When we plug an MCP into your agent, we’re filling its context with tool descriptions, many of which you’ll probably never use. Really, all your agent should need is a short sentence, like “<code>Use this skill whenever users want to create a new VM to run a task on, or to manage the VMs already available.</code>” The skill should take care of the rest.</p>\n\n<p>CLI-driven agent skills are efficient because they reveal capabilities progressively. You can do this with CLI subcommands, like <code>sprite checkpoint</code> and <code>sprite exec</code>, or with API endpoints and subpaths. Good agent harnesses are uncanny at quickly working out how to use these things.</p>\n<div class=\"right-sidenote\"><p>You <em>are</em> using Playwright, right? “Make sure this web application actually works before you tell me you’re done”?</p>\n</div>\n<p>Take <a href='https://playwright.dev/' title=''>Playwright, the industry-standard browser automation tool</a>. Ask <code>claude</code> to install Playwright and Chrome and there’s a coinflip chance it sets up the MCP server. But notice that when the coin comes up tails, Playwright still works. <code>claude</code> just drives it by writing little scripts. This is good! The models already know how to write little scripts without using up context.</p>\n\n<p>And there’s more at stake than just efficiency. Cramming your context full of MCP tool descriptions is a way of signaling to the model that those tools are important to you. But not every Sprite command is equally important in every setting. If you’re not using network policies, you don’t need <code>gemini</code> to waste a bunch of time setting them up for you.</p>\n\n<p>Skills and APIs are the best way to drive Sprites. But to make that work, you need an agent that can run shell commands for itself. So you’ll want to reach for MCP sessions when you’re stuck with an agent that can’t run commands. 
Thankfully, most of us aren’t using those kinds of agents anymore. In <code>claude</code>, <code>gemini</code>, or <code>codex</code>, you should just show your agent the <code>sprite</code> CLI and let it impress you.</p>\n<h2 id='sprites-dev-mcp' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#sprites-dev-mcp' aria-label='Anchor'></a><span class='plain-code'>sprites.dev/mcp</span></h2>\n<p>Plug this URL into Claude Desktop, or any other agent tool that speaks MCP. You’ll authenticate to one of your Fly.io organizations, and your agent will speak Sprites.</p>\n\n<p>Then:</p>\n\n<p><code>On a new Sprite, take this repository and reproduce this bug from issues/913, capturing logs.</code></p>\n\n<p><code>On a new Sprite, benchmark this function across 1000 runs and summarize the results.</code></p>\n\n<p><code>On a new Sprite, update all the dependencies on this project to their newest versions and test that everything works.</code></p>\n\n<p><code>On 3 new Sprites, change this service to use each of these 3 query libraries, and use HTTP to test latency.</code></p>\n\n<p><code>On a new Sprite, run this code with bpfwatch and show me what files it touches.</code></p>\n\n<p><code>On a new Sprite, run a load generator against this endpoint for 60 seconds and report the results.</code></p>\n\n<p><code>On a new Sprite, download this dataset and give me a Jupyter notebook to explore it in.</code></p>\n\n<p><code>On a new Sprite, set up a webhook receiver and render a real-time web report of all the payloads it receives.</code></p>\n\n<p>I don’t know. You know your projects better than we do. Whatever. Sometimes you want a clean, cheap, disposable computer (or five of them). That’s now an available feature of all your prompts. 
Find ways to apply it to your project, and we think you’ll end up wondering where Sprites have been all your life.</p>\n<div class=\"callout\"><p>Some of you are thinking to yourself: “this feature is going to result in robots ruining my life”. We agree. So we’ve built in guardrails. When you authenticate, giving your agent access to a single specific organization on your Fly.io account, we’ll let you scope down the MCP session. You can cap the number of Sprites our MCP will create for you, and you can give them name prefixes so you can easily spot the robots and disassemble them.</p>\n</div><h2 id='fuck-stateless-sandboxes' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#fuck-stateless-sandboxes' aria-label='Anchor'></a><span class='plain-code'>Fuck Stateless Sandboxes</span></h2>\n<p>We’ll keep saying this until our faces turn blue: the industry is stuck on “sandboxes” as a way of letting agents run code, and sandboxes aren’t good enough anymore. What agents want is real computers, with real filesystems, connected to real networks, and there’s no technical reason not to give them some.</p>\n\n<p><a href='https://fly.io/blog/code-and-let-live/' title=''>We designed Sprites so that you can fearlessly create whole bunches of them</a>. They’re responsive enough to host web apps for your team, but they idle in a sleeping state where they cost virtually nothing. Everybody at Fly.io that uses them ends up with 20 or 30, just hanging around.</p>\n\n<p>We think you’ll do better work when you can pull in as many computers as you need to solve problems. If it takes an MCP server for us to get you to do that, so be it.</p>",
      "image": {
        "url": "https://fly.io/blog/unfortunately-mcp/assets/whack.webp",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/litestream-writable-vfs/",
      "title": "Litestream Writable VFS",
      "description": null,
      "url": "https://fly.io/blog/litestream-writable-vfs/",
      "published": "2026-01-29T00:00:00.000Z",
      "updated": "2026-02-04T23:24:24.000Z",
      "content": "<div class=\"lead\"><p><strong class=\"font-semibold text-navy-950\">I’m Ben Johnson, and I work on Litestream at Fly.io. Litestream is the missing backup/restore system for SQLite. It’s free, open-source software that should run anywhere, and</strong> <a href=\"/blog/litestream-v050-is-here/\" title=\"\"><strong class=\"font-semibold text-navy-950\">you can read more about it here</strong></a><strong class=\"font-semibold text-navy-950\">.</strong></p>\n</div>\n<p>Each time we write about it, we get a little bit better at golfing down a description of what Litestream is. Here goes: Litestream is a Unix-y tool for keeping a SQLite database synchronized with S3-style object storage. It’s a way of getting the speed and simplicity wins of SQLite without exposing yourself to catastrophic data loss. Your app doesn’t necessarily even need to know it’s there; you can just run it as a tool in the background.</p>\n\n<p>It’s been a busy couple weeks!</p>\n\n<p>We recently <a href='/blog/design-and-implementation/' title=''>unveiled Sprites</a>. If you don’t know what Sprites are, you should just <a href='https://sprites.dev/' title=''>go check them out</a>. They’re one of the coolest things we’ve ever shipped. I won’t waste any more time selling them to you. Just, Sprites are a big deal, and so it’s a big deal to me that Litestream is a load-bearing component for them.</p>\n\n<p>Sprites rely directly on Litestream in two big ways.</p>\n\n<p>First, Litestream SQLite is the core of our global Sprites orchestrator. Unlike our flagship Fly Machines product, which relies on a centralized Postgres cluster, our Elixir Sprites orchestrator runs directly off S3-compatible object storage. Every organization  enrolled in Sprites gets their own SQLite database, synchronized by Litestream.</p>\n\n<p>This is a fun design. It takes advantage of the “many SQLite databases” pattern, which is under-appreciated. It’s got nice scaling characteristics. 
Keeping that Postgres cluster happy as Fly.io grew has been a major engineering challenge.</p>\n\n<p>But as far as Litestream is concerned, the orchestrator is boring, and so that’s all I’ve got to say about it. The second way Sprites use Litestream is much more interesting.</p>\n\n<p>Litestream is built directly into the disk storage stack that runs on every Sprite.</p>\n\n<p>Sprites launch in under a second, and every one of them boots up with 100GB of durable storage. That’s a tricky bit of engineering. We’re able to do this because the  root of storage for Sprites is S3-compatible object storage, and we’re able to make it fast by keeping a database of in-use storage blocks that takes advantage of attached NVMe as a read-through cache. The system that does this is JuiceFS, and the database — let’s call it “the block map” — is a rewritten metadata store, based (you guessed it) on BoltDB.</p>\n\n<p>I kid! It’s Litestream SQLite, of course.</p>\n<h2 id='sprite-storage-is-fussy' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#sprite-storage-is-fussy' aria-label='Anchor'></a><span class='plain-code'>Sprite Storage Is Fussy</span></h2>\n<p>Everything in a Sprite is designed to come up fast.</p>\n\n<p>If the Fly Machine underneath a Sprite bounces, we might need to reconstitute the block map from object storage. Block maps aren’t huge, but they’re not tiny; maybe low tens of megabytes worst case.</p>\n\n<p>The thing is, this is happening while the Sprite boots back up. To put that in perspective, that’s something that can happen in response to an incoming web request; that is, we have to finish fast enough to generate a timely response to that request. 
The time budget is small.</p>\n\n<p>To make this even faster, we are integrating Litestream VFS to improve start times. The VFS is a dynamic library you load into your app. Once you do, you can do stuff like this:</p>\n<div class=\"highlight-wrapper group relative bash\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-duvcwc7p\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5
 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-duvcwc7p\">sqlite> .open file:///my.db?vfs<span class=\"o\">=</span>litestream\nsqlite> PRAGMA litestream_time <span class=\"o\">=</span> <span class=\"s1\">'5 minutes ago'</span><span class=\"p\">;</span> \nsqlite> SELECT <span class=\"k\">*</span> FROM sandwich_ratings ORDER BY RANDOM<span class=\"o\">()</span> LIMIT 3 <span class=\"p\">;</span> \n22|Veggie Delight|New York|4\n30|Meatball|Los Angeles|5\n168|Chicken Shawarma Wrap|Detroit|5\n</code></pre>\n  </div>\n</div>\n<p>Litestream VFS lets us run point-in-time SQLite queries hot off object storage blobs, answering queries before we’ve downloaded the database.</p>\n\n<p>This is good, but it’s not perfect. We had two problems:</p>\n\n<ol>\n<li>We could only read, not write. People write to Sprite disks. The storage stack needs to write, right away.\n</li><li>Running a query off object storage is a godsend in a cold start where we have no other alternative besides downloading the whole database, but it’s not fast enough for steady state.\n</li></ol>\n\n<p>These are fun problems. Here’s our first cut at solving them.</p>\n<h2 id='writable-vfs' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#writable-vfs' aria-label='Anchor'></a><span class='plain-code'>Writable VFS</span></h2>\n<p>The first thing we’ve done is made the VFS optionally read-write. This feature is pretty subtle; it’s interesting, but it’s not as general-purpose as it might look. 
Let me explain how it works, and then explain why it works this way.</p>\n<div class=\"callout\"><p>Keep in mind as you read this that this is about the VFS in particular. Obviously, normal SQLite databases using Litestream the normal way are writeable.</p>\n</div>\n<p>The VFS works by keeping an index of <code>(file,offset, size)</code> for every page of the database in object storage; the data comprising the index is stored, <a href='https://github.com/superfly/ltx' title=''>in LTX files</a>, so that it’s efficient for us to reconstitute it quickly when the VFS starts, and lookups are heavily cached. When we queried <code>sandwich_ratings</code> earlier, our VFS library intercepted the SQLite read method, looked up the requested page in the index, fetched it, and cached it.</p>\n\n<p>This works great for reads. Writes are harder.</p>\n\n<p>Behind the scenes in read-only mode, Litestream polls, so that we can detect new LTX files created by remote writers to the database. This supports a handy use case where we’re running tests or doing slow analytical queries of databases that need to stay fast in prod.</p>\n\n<p>In write mode, we don’t allow multiple writers, because multiple-writer distributed SQLite databases are the <a href='https://hellraiser.fandom.com/wiki/Lament_Configuration' title=''>Lament Configuration</a> and we are not explorers over great vistas of pain. So the VFS in write-mode disables polling. We assume a single writer, and no additional backups to watch.</p>\n\n<p>Next, we buffer. Writes go to a local temporary buffer (“the write buffer”). Every second or so (or on clean shutdown), we sync the write buffer with object storage. Nothing written through the VFS is truly durable until that sync happens.</p>\n<div class=\"right-sidenote\"><p>Most storage block maps are much smaller than this, but still.</p>\n</div>\n<p>Now, remember the use case we’re looking to support here. 
A Sprite is cold-starting and its storage stack needs to serve writes, milliseconds after booting, without having a full copy of the 10MB block map. This writeable VFS mode lets us do that.</p>\n\n<p>Critically, we support that use case only up to the same durability requirements that a Sprite already has. All storage on a Sprite shares this “eventual durability” property, so the terms of the VFS write make sense here. They probably don’t make sense for your application. But if for some reason they do, have at it! To enable writes with Litestream VFS, just set the <code>LITESTREAM_WRITE_ENABLED</code> environment variable to <code>\"true\"</code>.</p>\n\n<p><img src=\"/blog/litestream-writable-vfs/assets/write-path.png\" /></p>\n<h2 id='hydration' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#hydration' aria-label='Anchor'></a><span class='plain-code'>Hydration</span></h2>\n<p>The Sprite storage stack uses SQLite in VFS mode. In our original VFS design, most data is kept in S3. Again: fine at cold start, not so fine in steady state.</p>\n\n<p>To solve this problem, we shoplifted a trick from <a href='https://docs.kernel.org/admin-guide/device-mapper/dm-clone.html' title=''>systems like dm-clone</a>: background hydration. In hydration designs, we serve queries remotely while running a loop to pull the whole database. When you start the VFS with the <code>LITESTREAM_HYDRATION_PATH</code> environment variable set, we’ll hydrate to that file.</p>\n\n<p>Hydration takes advantage of <a href='https://fly.io/blog/litestream-revamped#point-in-time-restores-but-fast' title=''>LTX compaction</a>, writing only the latest versions of each page. 
Reads don’t block on hydration; we serve them from object storage immediately, and switch over to the hydration file when it’s ready.</p>\n\n<p><img src=\"/blog/litestream-writable-vfs/assets/timeline.png\" /></p>\n\n<p>As for the hydration file? It’s simply a full copy of your database. It’s the same thing you get if you run <code>litestream restore</code>.</p>\n\n<p>Because this is designed for environments like Sprites, which bounce a lot, we write the database to a temporary file. We can’t trust that the database is using the latest state every time we start up, not without doing a full restore, so we just chuck the hydration file when we exit the VFS. That behavior is baked into the VFS right now. This feature’s got what Sprites need, but again, maybe not what your app wants.</p>\n<h2 id='putting-it-all-together' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#putting-it-all-together' aria-label='Anchor'></a><span class='plain-code'>Putting It All Together</span></h2>\n<p>This is a post about two relatively big moves we’ve made with our open-source Litestream project, but the features are narrowly scoped for problems that look like the ones our storage stack needs. If you think you can get use out of them, I’m thrilled, and I hope you’ll tell me about it.</p>\n\n<p>For ordinary read/write workloads, you don’t need any of this mechanism. Litestream works fine without the VFS, with unmodified applications, just running as a sidecar alongside your application. 
The whole point of that configuration is to efficiently keep up with writes; that’s easy when you know you have the whole database to work with when writes happen.</p>\n\n<p>But this whole thing is, to me, a valuable case study in how Litestream can get used in a relatively complicated and demanding problem domain. Sprites are very cool, and it’s satisfying to know that every disk write that happens on a Sprite is running through Litestream.</p>",
      "image": {
        "url": "https://fly.io/blog/litestream-writable-vfs/assets/litestream-writable-vfs.jpg",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/design-and-implementation/",
      "title": "The Design & Implementation of Sprites",
      "description": null,
      "url": "https://fly.io/blog/design-and-implementation/",
      "published": "2026-01-14T00:00:00.000Z",
      "updated": "2026-01-16T20:23:36.000Z",
      "content": "<div class=\"lead\"><p>We’re Fly.io, and this is the place in the post where we’d normally tell you that our job is to <a href=\"https://fly.io/blog/docker-without-docker/\" title=\"\">take your containers and run them on our own hardware</a> all around the world. But last week, we <a href=\"https://sprites.dev/\" title=\"\">launched Sprites</a>, and they don’t work that way at all. Sprites are something new: Docker without Docker without Docker. This post is about how they work.</p>\n</div>\n<p>Replacement-level homeowners buy boxes of pens and stick them in “the pen drawer”. What the elites know: you have to think adversarially about pens. “The purpose of a system is what it does”; a household’s is to uniformly distribute pens.  Months from now, the drawer will be empty, no matter how many pens you stockpile. Instead, scatter pens every place you could possibly think to look for one — drawers, ledges, desks. Any time anybody needs a pen, several are at hand, in exactly the first place they look.</p>\n\n<p>This is the best way I’ve found to articulate the idea of <a href='https://sprites.dev/' title=''>Sprites</a>, the platform we just launched at Fly.io. Sprites are ball-point disposable computers. Whatever mark you mean to make, we’ve rigged it so you’re never more than a second or two away from having a Sprite to do it with.</p>\n\n<p>Sprites are Linux virtual machines. You get root. They <code>create</code> in just a second or two: so fast, the experience of creating and shelling into one is identical to SSH'ing into a machine that already exists. Sprites all have a 100GB durable root filesystem. They put themselves to sleep automatically when inactive, and cost practically nothing while asleep.</p>\n\n<p>As a result, I barely feel the need to name my Sprites. Sometimes I’ll just type <code>sprite create dkjsdjk</code> and start some task. 
People at Fly.io who use Sprites have dozens hanging around.</p>\n\n<p>There aren’t yet many things in cloud computing that have the exact shape Sprites do:</p>\n\n<ul>\n<li>Instant creation\n</li><li>No time limits\n</li><li>Persistent disk\n</li><li>Auto-sleep to a cheap inactive state\n</li></ul>\n\n<p>This is a post about how we managed to get this working. We created a new orchestration stack that undoes some of the core decisions we made for <a href='https://fly.io/machines' title=''>Fly Machines</a>, our flagship product. Turns out, these new decisions make Sprites drastically easier for us to scale and manage. We’re pretty psyched.</p>\n\n<p>Lucky for me, there happen to be three <code>big decisions</code> we made that get you 90% of the way from Fly Machines to Sprites, which makes this an easy post to write. So, without further ado:</p>\n<h2 id='decision-1-no-more-container-images' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#decision-1-no-more-container-images' aria-label='Anchor'></a><span class='plain-code'>Decision #1: No More Container Images</span></h2>\n<p>This is the easiest decision to explain.</p>\n\n<p>Fly Machines are approximately <a href='https://fly.io/blog/docker-without-docker/' title=''>OCI containers repackaged as KVM micro-VMs</a>. They have the ergonomics of Docker but the isolation and security of an EC2 instance. We love them very much and they’re clearly the wrong basis for a ball-point disposable cloud computer.</p>\n\n<p>The “one weird trick” of Fly Machines is that they <code>start</code> and <code>stop</code> instantly, fast enough that they can wake in time to handle an incoming HTTP request. But they can only do that if you’ve already <code>created</code> them. You have to preallocate. 
<code>Creating</code> a Fly Machine can take over a minute. What you’re supposed to do is to create a whole bunch of them and <code>stop</code> them so they’re ready when you need them. But for Sprites, we need <code>create</code> to be so fast it feels like they’re already there waiting for you.</p>\n<div class=\"right-sidenote\"><p>We only murdered user containers because we wanted them dead.</p>\n</div>\n<p>Most of what’s slow about <code>creating</code> a Fly Machine is containers. I say this with affection: your containers are crazier than a soup sandwich. Huge and fussy, they take forever to <a href='https://fly.io/blog/docker-without-docker/' title=''>pull and unpack</a>. The regional locality sucks;  <code>create</code> a Fly Machine in São Paulo on <code>gru-3838</code>, and a <code>create</code> on  <code>gru-d795</code> is no faster. A <a href='https://community.fly.io/t/global-registry-now-in-production/13723' title=''>truly heartbreaking</a> amount of <a href='https://community.fly.io/t/faster-more-reliable-remote-image-builds-deploys/25841' title=''>engineering work</a> has gone into just allowing our OCI registry to <a href='https://www.youtube.com/watch?v=0jD-Rt4_CR8' title=''>keep up</a> with this system. </p>\n\n<p>It’s a tough job, is all I’m saying. Sprites get rid of the user-facing container. Literally: problem solved. Sprites get to do this on easy mode.</p>\n\n<p>Now, today, under the hood, Sprites are still Fly Machines. But they all run from a standard container. Every physical worker knows exactly what container the next Sprite is going to start with, so it’s easy for us to keep pools of “empty” Sprites standing by. The result: a Sprite <code>create</code> doesn’t have any heavy lifting to do; it’s basically just doing the stuff we do when we <code>start</code> a Fly Machine.</p>\n<figure class=\"post-cta\">\n  <figcaption>\n    <h1>This all works right now.</h1>\n    <p>You can create a couple dozen Sprites right now if you want. 
It’ll only take a second.</p>\n      <a class=\"btn btn-lg\" href=\"https://sprites.dev/\">\n        Make a Sprite.  <span class='opacity-50'>→</span>\n      </a>\n  </figcaption>\n  <div class=\"image-container\">\n    <img src=\"/static/images/cta-cat.webp\" srcset=\"/static/images/[email protected] 2x\" alt=\"\">\n  </div>\n</figure>\n\n<h2 id='decision-2-object-storage-for-disks' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#decision-2-object-storage-for-disks' aria-label='Anchor'></a><span class='plain-code'>Decision #2: Object Storage For Disks</span></h2>\n<p>Every Sprite comes with 100GB of durable storage. We’re able to do that because the root of storage is S3-compatible object storage.</p>\n\n<p>You can arrange for 100GB of storage for a Fly Machine. Or 200, or 500. The catch:</p>\n\n<ul>\n<li>You have to ask (with <code>flyctl</code>); we can’t reasonably default it in.\n</li><li>That storage is NVMe attached to the physical server your Fly Machine is on. \n</li></ul>\n<div class=\"right-sidenote\"><p>[†] we print a <span style=\"color: red\">big red warning</span> about this if you try to make a single-node cluster</p>\n</div>\n<p>We designed the storage stack for Fly Machines for Postgres clusters. A multi-replica Postgres cluster gets good mileage out of Fly Volumes. Attached storage is fast, but can <span style=\"color: red\">lose data†</span> — if a physical blows up, there’s no magic that rescues its stored bits. You’re stuck with our last snapshot backup. That’s fine for a replicated Postgres! It’s part of what Postgres replication is for. But for anything without explicit replication, it’s a very sharp edge.</p>\n\n<p>Worse, from our perspective, is that attached storage anchors workloads to specific physicals. 
We have lots of reasons to want to move Fly Machines around. Before we  did Fly Volumes, that was as simple as pushing a “drain” button on a server. Imagine losing a capability like that. It took 3 years to <a href='https://fly.io/blog/machine-migrations/' title=''>get workload migration right</a> with attached storage, and it’s still not “easy”.</p>\n<div class=\"right-sidenote\"><p>Object stores are the Internet’s Hoover Dams, the closest things we have to infrastructure megaprojects.</p>\n</div>\n<p>Sprites jettison this model. We still exploit NVMe, but not as the root of storage. Instead, it’s a read-through cache for a blob on object storage. S3-compatible object stores are the most trustworthy storage technology we have. I can feel my blood pressure dropping just typing the words “Sprites are backed by object storage.”</p>\n\n<p>The implications of this for orchestration are profound. In a real sense, the durable state of a Sprite is simply a URL. Wherever he lays his hat is his home! They migrate (or recover from failed physicals) trivially. It’s early days for our internal tooling, but we have so many new degrees of freedom to work with.</p>\n\n<p>I could easily do another 1500-2000 words here on the Cronenberg film Kurt came up with for the actual storage stack, but because it’s in flux, let’s keep it simple.</p>\n\n<p>The Sprite storage stack is organized around the JuiceFS model (in fact, we currently use a very hacked-up JuiceFS, with a rewritten SQLite metadata backend). It works by splitting storage into data (“chunks”) and metadata (a map of where the “chunks” are). Data chunks live on object stores; metadata lives in fast local storage.  In our case, that metadata store is <a href='https://litestream.io/' title=''>kept durable with Litestream</a>. 
Nothing depends on local storage.</p>\n<div class=\"right-sidenote\"><p>(our pre-installed Claude Code will checkpoint aggressively for you without asking)</p>\n</div>\n<p>This also buys Sprites fast <code>checkpoint</code> and <code>restore</code>. Checkpoints are so fast we want you to use them as a basic feature of the system and not as an escape hatch when things go wrong; like a git restore, not a system restore. That works because both <code>checkpoint</code> and <code>restore</code> merely shuffle metadata around.</p>\n\n<p>Our stack sports <a href='https://en.wikipedia.org/wiki/Dm-cache' title=''>a dm-cache-like</a> feature that takes advantage of attached storage. A Sprite has a sparse 100GB NVMe volume attached to it, which the stack uses to cache chunks to eliminate read amplification. Importantly (I can feel my resting heart rate lowering) nothing in that NVMe volume should matter; stored chunks are immutable and their true state lives on the object store.</p>\n<div class=\"callout\"><p>Our preference for object storage goes further than the Sprite storage stack. The global orchestrator for Sprites is an Elixir/Phoenix app that uses object storage as the primary source of metadata for accounts. We then give each account an independent SQLite database, again made durable on object storage with Litestream.</p>\n</div><h2 id='decision-3-inside-out-orchestration' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#decision-3-inside-out-orchestration' aria-label='Anchor'></a><span class='plain-code'>Decision #3: Inside-Out Orchestration</span></h2>\n<p>In the cloud hosting industry, user applications are managed by two separate, yet equally important components: the host, which orchestrates workloads, and the guest, which runs them. 
Sprites flip that on its head: the most important orchestration and management work happens inside the VM.</p>\n\n<p>Here’s the trick: user code running on a Sprite isn’t running in the root namespace. We’ve slid a container between you and the kernel. You see an inner environment, managed by a fleet of services running in the root namespace of the VM.</p>\n<div class=\"callout\"><p>I wish we’d done Fly Machines this way to begin with. I’m not sure there’s a downside. The inner container allows us to bounce a Sprite without rebooting the whole VM, even on checkpoint restores. I think Fly Machines users could get some mileage out of that feature, too.</p>\n</div>\n<p>With Sprites, we’re pushing this idea as far as we can. The root environment hosts the majority of our orchestration code. When you talk to the global API, chances are you’re talking directly to your own VM. Furthermore:</p>\n\n<ul>\n<li>Our storage stack, which handles checkpoint/restore and persistence to object storage, lives there;\n</li><li>so does the service manager we expose to Sprites, which registers user code that needs to restart when a Sprite bounces;\n</li><li>same with logs;\n</li><li>if you bind a socket to <code>*:8080</code>, we’ll make it available outside the Sprite — yep, that’s in the root namespace too.\n</li></ul>\n\n<p>Platform developers at Fly.io know how much easier it can be to hack on <code>init</code> (inside the container) than things <a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''>like <code>flyd</code></a>, the Fly Machines orchestrator that runs on  the host. Changes to Sprites don’t restart host components or muck with global state. The blast radius is just new VMs that pick up the change. We sleep on how much platform work doesn’t get done not because the code is hard to write, but because it’s so time-consuming to ensure benign-looking changes don’t throw the whole fleet into metastable failure. 
We had that in mind when we did Sprites.</p>\n<h2 id='we-keep-the-parts-that-worked' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#we-keep-the-parts-that-worked' aria-label='Anchor'></a><span class='plain-code'>We Keep The Parts That Worked</span></h2>\n<p>Sprites running on Fly.io take advantage of the infrastructure we already have. For instance: Sprites might be the fastest thing that currently exists to get Claude or Gemini to build a full-stack application on the Internet.</p>\n\n<p>That’s because Sprites plug directly into <a href='https://fly.io/blog/corrosion/' title=''>Corrosion, our gossip-based service discovery system</a>. When you ask the Sprite API to make a public URL for your Sprite, we generate a Corrosion update that propagates across our fleet instantly. Your application is then served, with an HTTPS URL, from our proxy edges.</p>\n\n<p>Sprites live alongside Fly Machines in our architecture. 
They include some changes  that are pure wins, but they’re mostly tradeoffs:</p>\n\n<ul>\n<li>We’ve always wanted to run Fly Machine disks off object storage (<a href='https://community.fly.io/t/bottomless-s3-backed-volumes/15648' title=''>we have an obscure LSVD feature that does this</a>), but the performance isn’t adequate for a hot Postgres node in production.\n</li><li>For that matter, professional production apps ship out of CI/CD systems as OCI containers; that’s a big part of what makes orchestrating Fly Machines so hard.\n</li><li>Most (though not all) Sprite usage is interactive, and Sprite users benefit from their VMs aggressively sleeping themselves to keep costs low; e-commerce apps measure responsiveness in milliseconds and want their workloads kept warm.\n</li></ul>\n\n<p>Sprites are optimized for a different kind of computing than Fly Machines, and <a href='https://fly.io/blog/code-and-let-live/' title=''>while Kurt believes that the future belongs to malleable, personalized apps</a>, I’m not so sure. To me, it makes sense to prototype and acceptance-test an application on Sprites. Then, when you’re happy with it, containerize it and ship it as a Fly Machine to scale it out. An automated workflow for that will happen.</p>\n\n<p>Finally, Sprites are a contract with user code: an API and a set of expectations about how the execution environment works. Today, they run on top of Fly Machines. But they don’t have to. Jerome’s working on an open-source local Sprite runtime. 
We’ll find other places to run them, too.</p>\n<h2 id='you-wont-get-it-until-you-use-them' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#you-wont-get-it-until-you-use-them' aria-label='Anchor'></a><span class='plain-code'>You Won’t Get It Until You Use Them</span></h2>\n<p>I can’t not sound like a shill. Sprites are the one thing we’ve shipped that I personally experience as addictive. I haven’t fully put my finger on why it feels so much easier to kick off projects now that I can snap my fingers and get a whole new computer. The whole point is that there’s no reason to parcel them out, or decide which code should run where. You just make a new one.</p>\n\n<p>So to make this fully click, I think you should <a href='https://sprites.dev/' title=''>just install the <code>sprite</code> command</a>, make a Sprite, and then run an agent in it. We’ve preinstalled Claude, Gemini, and Codex, and taught them how to do things like checkpoint/restore, registering services, and getting logs. Claude will run in <code>--dangerously-skip-permissions</code> mode (because why wouldn’t it). Have it build something; I built a “Chicago’s best sandwich” bracket app for a Slack channel.</p>\n\n<p>Sprites bill only for what you actually use (in particular: only for storage blocks you actually write, not the full 100GB capacity). It’s reasonable to create a bunch. They’re ball-point disposable computers. After you get a feel for them, it’ll start to feel weird not having them handy.</p>",
      "image": {
        "url": "https://fly.io/blog/design-and-implementation/assets/starry-containers.webp",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/code-and-let-live/",
      "title": "Code And Let Live",
      "description": null,
      "url": "https://fly.io/blog/code-and-let-live/",
      "published": "2026-01-09T00:00:00.000Z",
      "updated": "2026-01-14T19:59:01.000Z",
"content": "<div class=\"lead\"><p>The state of the art in agent isolation is a read-only sandbox. At Fly.io, we’ve been selling that story for years, and we’re calling it: ephemeral sandboxes are obsolete. Stop killing your sandboxes every time you use them.</p>\n</div>\n<p>My argument won’t make sense without showing you something new we’ve built. We’re all adults here, this is a company, we talk about what we do. Here goes.</p>\n\n<p>So, I want to run some code. So what I do is, I run <code>sprite create</code>. While it operates, I’ll explain what’s happening behind the—</p>\n<div class=\"highlight-wrapper group relative bash\">\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-i429cz3y\">✓ Created demo-123 sprite <span class=\"k\">in </span>1.0s\n● Connecting to console...\nsprite@sprite:~#\n</code></pre>\n  </div>\n</div>\n<p>Shit, it’s already there.</p>\n\n<p>That’s a root shell on a Linux computer we now own. It came online in about the same amount of time it would take to <code>ssh</code> into a host that already existed. 
We call these things “Sprites”.</p>\n\n<p>Let’s install FFmpeg on our Sprite:</p>\n<div class=\"highlight-wrapper group relative bash\">\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-bwjgxaic\"><span class=\"nb\">sudo </span>apt-get <span class=\"nb\">install</span> <span class=\"nt\">-y</span> ffmpeg <span class=\"o\">></span>/dev/null 2>&1\n</code></pre>\n  </div>\n</div>\n<p>Unlike creating the Sprite in the first place, installing <code>ffmpeg</code> with <code>apt-get</code> is dog slow. Let’s try not to have to do that again:</p>\n<div class=\"highlight-wrapper group relative bash\">\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-iacnzrtv\">sprite@sprite:~# sprite-env checkpoints create\n<span class=\"c\"># ...</span>\n<span class=\"o\">{</span><span class=\"s2\">\"type\"</span>:<span class=\"s2\">\"complete\"</span>,<span class=\"s2\">\"data\"</span>:<span class=\"s2\">\"Checkpoint v1 created successfully\"</span>,\n<span class=\"s2\">\"time\"</span>:<span class=\"s2\">\"2025-12-22T22:50:48.60423809Z\"</span><span class=\"o\">}</span>\n</code></pre>\n  </div>\n</div>\n<p>This completes instantly. Didn’t even bother to measure.</p>\n\n<p>I step away to get coffee. Time passes. The Sprite, noticing my inactivity, goes to sleep. I meet an old friend from high school at the coffee shop. End up spending the day together. More time passes. Days even. 
Returning later:</p>\n<div class=\"highlight-wrapper group relative bash\">\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-k9uw0dxr\"><span class=\"o\">></span> <span class=\"nv\">$ </span>sprite console \nsprite@sprite:~# ffmpeg\nffmpeg version 7.1.1-1ubuntu1.3 Copyright <span class=\"o\">(</span>c<span class=\"o\">)</span> 2000-2025 the FFmpeg developers\nUse <span class=\"nt\">-h</span> to get full <span class=\"nb\">help </span>or, even better, run <span class=\"s1\">'man ffmpeg'</span>\nsprite@sprite:~#\n</code></pre>\n  </div>\n</div>\n<p>Everything’s where I left it. Sprites are durable. 100GB capacity to start, no ceremony. Maybe I’ll keep it around a few more days, maybe a few months, doesn’t matter, just works.</p>\n\n<p>Say I get an application up on its legs. Install more packages. Then: disaster. Maybe an ill-advised global <code>pip3 install</code>. Or <code>rm -rf $HMOE/bin</code>. Or <code>dd if=/dev/random of=/dev/vdb</code>. Whatever it was, everything’s broken. 
So:</p>\n<div class=\"highlight-wrapper group relative bash\">\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-8qs3qsqn\"><span class=\"o\">></span> <span class=\"nv\">$ </span>sprite checkpoint restore v1\nRestoring from checkpoint v1...\nContainer components started successfully\nRestore from v1 <span class=\"nb\">complete</span>\n\n<span class=\"o\">></span> <span class=\"nv\">$ </span>sprite console\nsprite@sprite:~#\n</code></pre>\n  </div>\n</div>\n<p>Sprites have first-class checkpoint and restore. You can’t see it in text, but that restore took about one second. It’s fast enough to use casually, interactively. Not an escape hatch. Rather: an intended part of the ordinary course of using a Sprite. Like <code>git</code>, but for the whole system.</p>\n<div class=\"callout\"><p>If you’re asking how this is any different from an EC2 instance, good. That’s what we’re going for, except:</p>\n\n<ul>\n<li>I can <strong class=\"font-semibold text-navy-950\">casually create hundreds of them</strong> (without needing a Docker container), each appearing in 1-2 seconds. \n</li><li>They <strong class=\"font-semibold text-navy-950\">go idle and stop metering automatically</strong>, so it’s cheap to have lots of them. I use dozens.\n</li><li>They’re <strong class=\"font-semibold text-navy-950\">hooked up to our Anycast</strong> network, so I can get an HTTPS URL.\n</li><li>Despite all that, <strong class=\"font-semibold text-navy-950\">they’re fully durable</strong>. They don’t die until I tell them to.\n</li></ul>\n\n<p>This combination of attributes isn’t common enough to already have a name, so we decided we get to name them “Sprites”. Sprites are like BIC disposable cloud computers.</p>\n</div>\n<p>That’s what we built. You can <a href='https://sprites.dev/' title=''>go try it yourself</a>. 
We wrote another 1000 words about how they work, but I cut them out because I want to stop talking about our products now and get to my point.</p>\n<h2 id='claude-doesnt-want-a-stateless-container' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#claude-doesnt-want-a-stateless-container' aria-label='Anchor'></a><span class='plain-code'>Claude Doesn’t Want A Stateless Container</span></h2>\n<p>For years, we’ve been trying to serve two very different users with the same abstraction. It hasn’t worked.</p>\n\n<p>Professional software developers are trained to build stateless instances. Stateless deployments, where persistent data is confined to database servers, buy you simplicity, flexible scale-out, and reduced failure blast radius. It’s a good idea, so popular that most places you can run code in the cloud look like stateless containers. Fly Machines, our flagship offering, look like stateless containers.</p>\n\n<p>The problem is that Claude isn’t a pro developer. Claude is a hyper-productive five-year-old savant. It’s uncannily smart, wants to stick its finger in every available electrical socket, and works best when you find a way to let it zap itself.</p>\n<div class=\"right-sidenote\"><p>(sometimes by escaping the container!)</p>\n</div>\n<p>If you force an agent to, it’ll work around containerization and do work. But you’re not helping the agent in any way by doing that. They don’t want containers. They don’t want “sandboxes”. They want computers.</p>\n<div class=\"right-sidenote\"><p>Someone asked me about this the other day and wanted to know if I was saying that agents needed sound cards and USB ports. And, maybe? I don’t know. Not today.</p>\n</div>\n<p>In a moment, I’ll explain why. But first I probably need to explain what the hell I mean by a “computer”. 
I think we all agree:</p>\n\n<ul>\n<li>A computer doesn’t necessarily vanish after a single job is completed, <em>and</em> \n</li><li>it has durable storage. \n</li></ul>\n\n<p>Since current agent sandboxes have neither of these, I can stop the definition right there and get back to my point.</p>\n<h2 id='simple-wins' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#simple-wins' aria-label='Anchor'></a><span class='plain-code'>Simple Wins</span></h2>\n<p>Start here: with an actual computer, Claude doesn’t have to rebuild my entire development environment every time I pick up a PR.</p>\n\n<p>This seems superficial but rebuilding stuff like <code>node_modules</code> is such a monumental pain in the ass that the industry is spending tens of millions of dollars figuring out how to snapshot and restore ephemeral sandboxes.</p>\n\n<p>I’m not saying those problems are intractable. I’m saying they’re unnecessary. Instead of figuring them out, just use an actual computer. Work out a PR, review and push it, then just start on the next one. Without rebooting.</p>\n\n<p>People will rationalize why it’s a good thing that they start from a new build environment every time they start a changeset. Stockholm Syndrome. When you start a feature branch on your own, do you create an entirely new development environment to do it?</p>\n\n<p>The reason agents waste all this effort is that nobody saw them coming. Read-only ephemeral sandboxes were the only tool we had hanging on the wall to help use them sanely.</p>\n<div class=\"callout\"><p>Have you ever had to set up actual infrastructure to give an agent access to realistic data? People do this. 
Because they know they’re dealing with a clean slate every time they prompt their agent, they arrange for S3 buckets, Redis servers, or even RDS instances outside the sandbox for their agents to talk to.  They’re building infrastructure to work around the fact that they can’t just write a file and trust it to stay put. Gross.</p>\n</div>\n<p>Ephemerality means time limits. Providers design sandbox systems to handle the expected workloads agents generate. Most things agents do today don’t take much time; in fact, they’re often limited only by the rate at which frontier models can crunch tokens. Test suites run quickly. The 99th percentile sandboxed agent run probably needs less than 15 minutes.</p>\n\n<p>But there are feature requests where compute and network time swamp token crunching. I built the documentation site for the Sprites API by having a Claude Sprite interact with the code and our API, building and testing examples for the API one at a time. There are APIs where the client interaction time alone would blow sandbox budgets.</p>\n\n<p>You see the limits of the current approach in how people round-trip state through “plan files”, which are ostensibly prose but often really just egregiously-encoded key-value stores.</p>\n\n<p>An agent running on an actual computer can exploit the whole lifecycle of the application. We saw this when Chris McCord built <a href='https://phoenix.new/' title=''>Phoenix.new</a>. The agent behind a Phoenix.new app runs on a Fly Machine where it can see the app logs from the Phoenix app it generated. When users do things that generate exceptions, Phoenix.new notices and gets to work figuring out what happened.</p>\n\n<p>It took way too much work for Chris to set that up, and he was able to do it in part because he wrote his own agent. You can do it with Claude today with an MCP server or some other arrangement to haul logs over. 
But all you really need is to just not shoot your sandbox in the head when the agent finishes writing code.</p>\n<h2 id='galaxy-brain-win' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#galaxy-brain-win' aria-label='Anchor'></a><span class='plain-code'>Galaxy Brain Win</span></h2>\n<p>Here’s where I lose you. I know this because it’s also where I lose my team, most of whom don’t believe me about this.</p>\n\n<p>The nature of software development is changing out from under us, and I think we’re kidding ourselves that it’s going to end with just a reconfiguration of how professional developers ship software.</p>\n\n<p>I have kids. They have devices. I wanted some control over them. So I did what many of you would do in my situation: I vibe-coded an MDM.</p>\n\n<p><img src=\"/blog/code-and-let-live/assets/kurtmdm.png?1/2&card&center\" /></p>\n\n<p>I built this thing with Claude. It’s a SQLite-backed Go application running on a Sprite. The Anycast URL my Sprite exports works as an MDM registration URL. Claude also worked out all the APNS Push Certificate drama for me. It all just works.</p>\n<div class=\"right-sidenote\"><p>“Editing PHP files over FTP: we weren’t wrong,  just ahead of our time!”</p>\n</div>\n<p>I’ve been running this for a month now, still on a Sprite, and see no reason ever to stop. It is a piece of software that solves an important real-world problem for me. It might evolve as my needs change, and I tell Claude to change it. Or it might not.  For this app, dev is prod, prod is dev.</p>\n\n<p>For reasons we’ll get into when we write up how we built these things, you wouldn’t want to ship an app to millions of people on a Sprite. But most apps don’t want to serve millions of people. 
The most important day-to-day apps disproportionately won’t have million-person audiences. There are some important million-person apps, but most of them just destroy civil society, melt our brains, and arrange chauffeurs for  individual cheeseburgers.</p>\n\n<p>Applications that solve real problems for people will be owned by the people they solve problems for. And for the most part, they won’t need a professional guild of software developers to gatekeep feature development for them. They’ll just ask for things and get them.</p>\n\n<p>The problem we’re all working on is bigger than safely accelerating pro software developers. Sandboxes are holding us back.</p>\n<h2 id='fuck-ephemeral-sandboxes' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#fuck-ephemeral-sandboxes' aria-label='Anchor'></a><span class='plain-code'>Fuck Ephemeral Sandboxes</span></h2>\n<p>Obviously, I’m trying to sell you something here. But that doesn’t make me wrong. The argument I’m making is the reason we built the specific thing I’m selling.</p>\n<figure class=\"post-cta\">\n  <figcaption>\n    <h1>We shipped these things.</h1>\n    <p>You can create a couple dozen Sprites right now if you want. It’ll only take a second.</p>\n      <a class=\"btn btn-lg\" href=\"https://sprites.dev/\">\n        Make a Sprite.  <span class='opacity-50'>→</span>\n      </a>\n  </figcaption>\n  <div class=\"image-container\">\n    <img src=\"/static/images/cta-dog.webp\" srcset=\"/static/images/[email protected] 2x\" alt=\"\">\n  </div>\n</figure>\n\n\n<p>It took us a long time to get here. We spent years kidding ourselves. 
We built a platform for horizontal-scaling production applications with micro-VMs that boot so quickly that, if you hold them in exactly the right way, you can do a pretty decent code sandbox with them. But it’s always been a square peg, round hole situation.</p>\n\n<p>We have a lot to say about how Sprites work. They’re related to Fly Machines but sharply different in important ways. They have an entirely new storage stack. They’re orchestrated differently. No Dockerfiles.</p>\n\n<p>But for now, I just want you to think about what I’m saying here. Whether or not you ever boot a Sprite, ask: if you could run a coding agent anywhere, would you want it to look more like a read-only sandbox in a K8s cluster in the cloud, or like an entire EC2 instance you could summon in the snap of a finger?</p>\n\n<p>I think the answer is obvious. The age of sandboxes is over. The time of the disposable computer has come.</p>",
      "image": {
        "url": "https://fly.io/blog/code-and-let-live/assets/sprites.jpg",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/litestream-vfs/",
      "title": "Litestream VFS",
      "description": null,
      "url": "https://fly.io/blog/litestream-vfs/",
      "published": "2025-12-11T00:00:00.000Z",
      "updated": "2025-12-11T17:32:13.000Z",
      "content": "<div class=\"lead\"><p><strong class=\"font-semibold text-navy-950\">I’m Ben Johnson, and I work on Litestream at Fly.io. Litestream is the missing backup/restore system for SQLite. It’s free, open-source software that should run anywhere, and</strong> <a href=\"/blog/litestream-v050-is-here/\" title=\"\"><strong class=\"font-semibold text-navy-950\">you can read more about it here</strong></a><strong class=\"font-semibold text-navy-950\">.</strong></p>\n</div>\n<p>Again with the sandwiches: assume we’ve got a SQLite database of sandwich ratings, and we’ve backed it up with <a href='/blog/litestream-v050-is-here/' title=''>Litestream</a> to an S3 bucket.</p>\n\n<p>Now, on our local host, load up AWS credentials and an S3 path into our environment. Open SQLite and:</p>\n<div class=\"highlight-wrapper group relative \">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-z396uf60\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent 
group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-z396uf60\">$ sqlite3\nSQLite version 3.50.4 2025-07-30 19:33:53\nsqlite> .load litestream.so\nsqlite> .open file:///my.db?vfs=litestream\n</code></pre>\n  </div>\n</div>\n<p>SQLite is now working from that remote database, defined by the Litestream backup files in the S3 path we configured. 
We can query it:</p>\n<div class=\"highlight-wrapper group relative \">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-kieef97f\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] 
text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-kieef97f\">sqlite> SELECT * FROM sandwich_ratings ORDER BY RANDOM() LIMIT 3 ; \n22|Veggie Delight|New York|4\n30|Meatball|Los Angeles|5\n168|Chicken Shawarma Wrap|Detroit|5\n</code></pre>\n  </div>\n</div>\n<p>This is Litestream VFS. It runs SQLite hot off an object storage URL. As long as you can load the shared library our tree builds for you, it’ll work in your application the same way it does in the SQLite shell.</p>\n\n<p>Fun fact: we didn’t have to download the whole database to run this query. More about this in a bit.</p>\n\n<p>Meanwhile, somewhere in prod, someone has it in for meatball subs and wants to knock them out of the bracket – oh, fuck:</p>\n<div class=\"highlight-wrapper group relative \">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-oexge9kc\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white 
bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-oexge9kc\">sqlite> UPDATE sandwich_ratings SET stars = 1 ;\n</code></pre>\n  </div>\n</div>\n<p>They forgot the <code>WHERE</code> clause!</p>\n<div class=\"highlight-wrapper group relative \">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-2mgicvsr\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl 
[--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-2mgicvsr\">sqlite> SELECT * FROM sandwich_ratings ORDER BY RANDOM() LIMIT 3 ; \n97|French Dip|Los Angeles|1\n140|Bánh Mì|San Francisco|1\n62|Italian Beef|Chicago|1\n</code></pre>\n  </div>\n</div>\n<p>Italian Beefs and Bánh Mìs, all at 1 star. 
Disaster!</p>\n\n<p>But wait, back on our dev machine:</p>\n<div class=\"highlight-wrapper group relative \">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-r5hggeuc\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail 
[--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-r5hggeuc\">sqlite> PRAGMA litestream_time = '5 minutes ago'; \nsqlite> select * from sandwich_ratings ORDER BY RANDOM() LIMIT 3 ; \n30|Meatball|Los Angeles|5\n33|Ham & Swiss|Los Angeles|2\n163|Chicken Shawarma Wrap|Detroit|5\n</code></pre>\n  </div>\n</div>\n<p>We’re now querying that database from a specific point in time in our backups. We can do arbitrary relative timestamps, or absolute ones, like <code>2000-01-01T00:00:00Z</code>.</p>\n\n<p>What we’re doing here is instantaneous point-in-time recovery (PITR), expressed simply in SQL and SQLite pragmas.</p>\n\n<p>Ever wanted to do a quick query against a prod dataset, but didn’t want to shell into a prod server and fumble with the <code>sqlite3</code> terminal command like a hacker in an 80s movie? Or needed to do a quick sanity check against yesterday’s data, but without doing a full database restore? Litestream VFS makes that easy. I’m so psyched about how it turned out.</p>\n<h2 id='how-it-works' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#how-it-works' aria-label='Anchor'></a><span class='plain-code'>How It Works</span></h2>\n<p><a href='/blog/litestream-v050-is-here/' title=''>Litestream v0.5</a> integrates <a href='https://github.com/superfly/ltx' title=''>LTX</a>, our SQLite data-shipping file format. Where earlier Litestream blindly shipped whole raw SQLite pages to and from object storage, LTX ships ordered sets of pages. 
We built LTX for <a href='/docs/litefs/' title=''>LiteFS</a>, which uses a FUSE filesystem to do transaction-aware replication for unmodified applications, but we’ve spent this year figuring out ways to use LTX in Litestream, without all that FUSE drama.</p>\n\n<p>The big thing LTX gives us is “compaction”. When we restore a database from object storage, we want the most recent versions of each changed database page. What we don’t want are all the intermediate versions of those pages that occurred prior to the most recent change.</p>\n\n<p>Imagine, at the time we’re restoring, we’re going to need pages 1, 2, 3, 4, and 5. Depending on the order in which pages were written, the backup data set might look something like <code>1 2 3 5 3 5 4 5 5</code>. What we want is the <em>rightmost</em> 5, 4, 3, 2, and 1, without wasting time on the three “extra” page 5’s and the one “extra” page 3. Those “extra” pages are super common in SQLite data sets; for instance, every busy table with an autoincrementing primary key will have them.</p>\n\n<p>LTX lets us skip the redundant pages, and the algorithm is trivial: reading backwards from the end of the sequence, skipping any page you’ve already read. This drastically accelerates restores.</p>\n\n<p>But LTX compaction isn’t limited to whole databases. We can also LTX-compact sets of LTX files. That’s the key to how PITR restores with Litestream now work.</p>\n\n<p>In the diagram below, we’re taking daily full snapshots. Below those snapshots are “levels” of changesets: groups of database pages from smaller and smaller windows of time. By default, Litestream uses time intervals of 1 hour at the highest level, down to 30 seconds at level 1. L0 is a special level where files are uploaded every second, but are only retained until being compacted to L1.</p>\n\n<p><img src=\"/blog/litestream-vfs/assets/litestream-restore.png\" /></p>\n\n<p>Now, let’s do a PITR restore. Start from the most proximal snapshot. 
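If it helps to see that backwards scan in code, here's a toy model of the rightmost-wins rule, in Python. Bare page numbers stand in for real LTX entries (which carry actual page data and transaction bounds), so treat this as a sketch of the idea, not Litestream's actual Go implementation.

```python
def compact(page_stream):
    """Rightmost-wins compaction: walk the backup's page stream from the
    end, keep the newest copy of each page, and skip everything older."""
    seen = set()
    kept = []
    for page_no in reversed(page_stream):
        if page_no in seen:
            continue  # an older, superseded copy of this page
        seen.add(page_no)
        kept.append(page_no)
    return kept

# The example stream from the text: one copy of each page survives.
print(compact([1, 2, 3, 5, 3, 5, 4, 5, 5]))  # [5, 4, 3, 2, 1]
```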
Then determine the minimal set of LTX files from each level to reach the time you are restoring to.</p>\n\n<p><img src=\"/blog/litestream-vfs/assets/litestream-restore-path.png\" /></p>\n\n<p>We have another trick up our sleeve.</p>\n\n<p>LTX trailers include a small index tracking the offset of each page in the file. By fetching <em>only</em> these index trailers from the LTX files we’re working with (each occupies about 1% of its LTX file), we can build a lookup table of every page in the database. Since modern object storage providers all let us fetch slices of files, we can perform individual page reads against S3 directly.</p>\n\n<p><img alt=\"Anatomy of an LTX file\" src=\"/blog/litestream-vfs/assets/litestream-ltx.png\" /></p>\n<h2 id='how-its-implemented' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#how-its-implemented' aria-label='Anchor'></a><span class='plain-code'>How It’s Implemented</span></h2>\n<p>SQLite has a plugin interface for things like this: <a href='https://sqlite.org/vfs.html' title=''>the “VFS” interface.</a> VFS plugins abstract away the bottom-most layer of SQLite, the interface to the OS. If you’re using SQLite now, you’re already using some VFS module, one SQLite happens to ship with.</p>\n\n<p>For Litestream users, there’s a catch. From the jump, we’ve designed Litestream to run alongside unmodified SQLite applications. Part of what makes Litestream so popular is that your apps don’t even need to know it exists. It’s “just” a Unix program.</p>\n\n<p>That Litestream Unix program still does PITR restores, without any magic. But to do fast PITR-style queries straight off S3, we need more. 
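Concretely, that lookup table maps a page number to a (remote file, offset, length) triple, so a page read becomes a single ranged GET. Here's a hypothetical sketch in Python, with invented object keys and index contents (the real index is built from the LTX trailers):

```python
PAGE_SIZE = 4096  # the real code learns the page size from the database header

# Hypothetical page index built from LTX trailer indexes:
# page number -> (remote LTX object key, byte offset in that object, length)
page_index = {
    1: ("ltx/0/00000001-00000009.ltx", 96, PAGE_SIZE),
    2: ("ltx/0/00000001-00000009.ltx", 96 + PAGE_SIZE, PAGE_SIZE),
    3: ("ltx/1/0000000a-0000000c.ltx", 96, PAGE_SIZE),
}

def plan_read(byte_offset):
    """Turn the byte offset SQLite asks for into an object key plus an
    HTTP Range header that fetches exactly one page from object storage."""
    page_no = byte_offset // PAGE_SIZE + 1       # SQLite pages are 1-based
    key, off, size = page_index[page_no]
    return key, f"bytes={off}-{off + size - 1}"  # Range is inclusive

# SQLite "reads" page 2 at local offset 4096; we fetch just that slice.
print(plan_read(4096))
```

The keys and offsets here are made up; the point is that each page read costs one small ranged request instead of a full-database download.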
To make those queries work, you have to load and register Litestream’s VFS module.</p>\n\n<p>But that’s all that changes.</p>\n\n<p>In particular: Litestream VFS doesn’t replace the SQLite library you’re already using. It’s not a new “version” of SQLite. It’s just a plugin for the SQLite you’re already using.</p>\n\n<p>Still, we know that’s not going to work for everybody, and even though we’re really psyched about these PITR features, we’re not taking our eyes off the ball on the rest of Litestream. You don’t have to use our VFS library to use Litestream, or to get the other benefits of the new LTX code.</p>\n\n<p>The way a VFS library works, we’re given just a couple structures, each with a bunch of methods defined on them. We override only the few methods we care about. Litestream VFS handles only the read side of SQLite. Litestream itself, running as a normal Unix program, still handles the “write” side. So our VFS subclasses just enough to find LTX backups and issue queries.</p>\n\n<p>With our VFS loaded, whenever SQLite needs to read a page into memory, it issues a <code>Read()</code> call through our library. The read call includes the byte offset at which SQLite expected to find the page. But with Litestream VFS, that byte offset is an illusion.</p>\n\n<p>Instead, we use our knowledge of the page size along with the requested page number to do a lookup on the page index we’ve built. From it, we get the remote filename, the “real” byte offset into that file, and the size of the page. That’s enough for us to use the <a href='https://docs.aws.amazon.com/AmazonS3/latest/userguide/range-get-olap.html' title=''>S3 API’s <code>Range</code> header handling</a> to download exactly the block we want.</p>\n\n<p>To save lots of S3 calls, Litestream VFS implements an LRU cache. Most databases have a small set of “hot” pages —  inner branch pages or the leftmost leaf pages for tables with an auto-incrementing ID field. 
So only a small percentage of the database is updated and queried regularly.</p>\n<div class=\"callout\"><p><strong class=\"font-semibold text-navy-950\">We’ve got one last trick up our sleeve.</strong></p>\n\n<p>Quickly building an index and restore plan for the current state of a database is cool. But we can do one better.</p>\n\n<p>Because Litestream backs up (into the L0 layer) once per second, the VFS code can simply poll the S3 path, and then incrementally update its index. <strong class=\"font-semibold text-navy-950\">The result is a near-realtime replica.</strong> Better still, you don’t need to stream the whole database back to your machine before you use it.</p>\n</div><h2 id='eat-your-heart-out-marty-mcfly' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#eat-your-heart-out-marty-mcfly' aria-label='Anchor'></a><span class='plain-code'>Eat Your Heart Out, Marty McFly</span></h2>\n<p>Litestream holds backup files for every state your database has been in, with single-second resolution, for as long as you want it to. Forgot the <code>WHERE</code> clause on a <code>DELETE</code> statement? Updating your database state to where it was an hour (or day, or week) ago is just a matter of adjusting the LTX indices Litestream manages.</p>\n\n<p>All this smoke-and-mirrors of querying databases without fully fetching them has another benefit: it starts up really fast! We’re living in an age of increasingly ephemeral servers, what with the AIs and the agents and the clouds and the hoyvin-glavins. 
Wherever you find yourself, if your database is backed up to object storage with Litestream, you’re always in a place where you can quickly issue a query.</p>\n\n<p>As always, one of the big things we think we’re doing right with Litestream is: we’re finding ways to get as much whiz-bang value as we can (instant PITR reading live off object storage: pretty nifty!) while keeping the underlying mechanism simple enough that you can fit your head around it.</p>\n\n<p>Litestream is solid for serious production use (we rely on it for important chunks of our own Fly.io APIs). But you could write Litestream yourself, just from the basic ideas in these blog posts. We think that’s a point in its favor. We land there because the heavy lifting in Litestream is being done by SQLite itself, which is how it should be.</p>",
      "image": {
        "url": "https://fly.io/blog/litestream-vfs/assets/litestream-vfs.jpg",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/everyone-write-an-agent/",
      "title": "You Should Write An Agent",
      "description": null,
      "url": "https://fly.io/blog/everyone-write-an-agent/",
      "published": "2025-11-06T00:00:00.000Z",
      "updated": "2025-12-09T19:06:20.000Z",
      "content": "<div class=\"lead\"><p>Some concepts are easy to grasp in the abstract. Boiling water: apply heat and wait. Others you really need to try. You only think you understand how a bicycle works, until you learn to ride one.</p>\n</div>\n<p>There are big ideas in computing that are easy to get your head around. The AWS S3 API. It’s the most important storage technology of the last 20 years, and it’s like boiling water. Other technologies, you need to get your feet on the pedals first.</p>\n\n<p>LLM agents are like that.</p>\n\n<p>People have <a href='https://ludic.mataroa.blog/blog/contra-ptaceks-terrible-article-on-ai/' title=''>wildly varying opinions</a> about LLMs and agents. But whether or not they’re snake oil, they’re a big idea. You don’t have to like them, but you should want to be right about them. To be the best hater (or stan) you can be.</p>\n\n<p>So that’s one reason you should write an agent. But there’s another reason that’s even more persuasive, and that’s</p>\n<h2 id='its-incredibly-easy' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#its-incredibly-easy' aria-label='Anchor'></a><span class='plain-code'>It’s Incredibly Easy</span></h2>\n<p>Agents are the most surprising programming experience I’ve had in my career. Not because I’m awed by the magnitude of their powers — I like them, but I don’t like-like them. It’s because of how easy it was to get one up on its legs, and how much I learned doing that.</p>\n\n<p>I’m about to rob you of a dopaminergic experience, because agents are so simple we might as well just jump into the code. 
I’m not even going to bother explaining what an agent is.</p>\n<div class=\"highlight-wrapper group relative python\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-ujvmmn8w\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] 
tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-ujvmmn8w\"><span class=\"kn\">from</span> <span class=\"nn\">openai</span> <span class=\"kn\">import</span> <span class=\"n\">OpenAI</span>\n\n<span class=\"n\">client</span> <span class=\"o\">=</span> <span class=\"n\">OpenAI</span><span class=\"p\">()</span>\n<span class=\"n\">context</span> <span class=\"o\">=</span> <span class=\"p\">[]</span>\n\n<span class=\"k\">def</span> <span class=\"nf\">call</span><span class=\"p\">():</span>\n    <span class=\"k\">return</span> <span class=\"n\">client</span><span class=\"p\">.</span><span class=\"n\">responses</span><span class=\"p\">.</span><span class=\"n\">create</span><span class=\"p\">(</span><span class=\"n\">model</span><span class=\"o\">=</span><span class=\"s\">\"gpt-5\"</span><span class=\"p\">,</span> <span class=\"nb\">input</span><span class=\"o\">=</span><span class=\"n\">context</span><span class=\"p\">)</span>\n\n<span class=\"k\">def</span> <span class=\"nf\">process</span><span class=\"p\">(</span><span class=\"n\">line</span><span class=\"p\">):</span>\n    <span class=\"n\">context</span><span class=\"p\">.</span><span class=\"n\">append</span><span class=\"p\">({</span><span class=\"s\">\"role\"</span><span class=\"p\">:</span> <span class=\"s\">\"user\"</span><span class=\"p\">,</span> <span class=\"s\">\"content\"</span><span class=\"p\">:</span> <span class=\"n\">line</span><span class=\"p\">})</span>\n    <span class=\"n\">response</span> <span class=\"o\">=</span> <span class=\"n\">call</span><span class=\"p\">()</span>    \n    <span class=\"n\">context</span><span class=\"p\">.</span><span class=\"n\">append</span><span class=\"p\">({</span><span class=\"s\">\"role\"</span><span class=\"p\">:</span> <span class=\"s\">\"assistant\"</span><span class=\"p\">,</span> <span class=\"s\">\"content\"</span><span 
class=\"p\">:</span> <span class=\"n\">response</span><span class=\"p\">.</span><span class=\"n\">output_text</span><span class=\"p\">})</span>        \n    <span class=\"k\">return</span> <span class=\"n\">response</span><span class=\"p\">.</span><span class=\"n\">output_text</span>\n</code></pre>\n  </div>\n</div><div class=\"right-sidenote\"><p>It’s an HTTP API with, like, one important endpoint.</p>\n</div>\n<p>This is a trivial engine for an LLM app using the <a href='https://platform.openai.com/docs/api-reference/responses' title=''>OpenAI Responses API</a>. It implements ChatGPT. You’d drive it with <button toggle=\"#readline\">the obvious loop</button>. It’ll do what you’d expect: the same thing ChatGPT would, but in your terminal.</p>\n<div id=\"readline\" toggle-content=\"\" aria-label=\"show very boring code\"><div class=\"highlight-wrapper group relative python\">\n  <button type=\"button\" class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\" data-wrap-target=\"#code-n9t6zq0x\">\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\"></path><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\"></path></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button type=\"button\" class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white 
bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\" data-copy-target=\"sibling\">\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\"></path><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\"></path></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class=\"highlight relative group\">\n    <pre class=\"highlight \"><code id=\"code-n9t6zq0x\"><span class=\"k\">def</span> <span class=\"nf\">main</span><span class=\"p\">():</span>\n    <span class=\"k\">while</span> <span class=\"bp\">True</span><span class=\"p\">:</span>\n        <span class=\"n\">line</span> <span class=\"o\">=</span> <span class=\"nb\">input</span><span class=\"p\">(</span><span class=\"s\">\"&gt; \"</span><span class=\"p\">)</span>\n        <span class=\"n\">result</span> <span class=\"o\">=</span> <span class=\"n\">process</span><span class=\"p\">(</span><span class=\"n\">line</span><span class=\"p\">)</span>\n        <span class=\"k\">print</span><span class=\"p\">(</span><span class=\"sa\">f</span><span class=\"s\">\"&gt;&gt;&gt; </span><span class=\"si\">{</span><span class=\"n\">result</span><span class=\"si\">}</span><span class=\"se\">\\n</span><span class=\"s\">\"</span><span class=\"p\">)</span>\n</code></pre>\n  </div>\n</div></div>\n<p>Already we’re seeing important things. For one, the dreaded “context window” is just a list of strings. 
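After one exchange (say you typed “hey”), the whole thing is literally just:</p>\n<div class='highlight relative group'>\n    <pre class='highlight '><code>[{\"role\": \"user\", \"content\": \"hey\"},\n {\"role\": \"assistant\", \"content\": \"(whatever it said back)\"}]\n</code></pre>\n  </div>\n<p>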
Here, let’s give our agent a weird multiple-personality disorder:</p>\n<div class=\"highlight-wrapper group relative python\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-qz8ldgb4\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl 
[--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-qz8ldgb4\"><span class=\"n\">client</span> <span class=\"o\">=</span> <span class=\"n\">OpenAI</span><span class=\"p\">()</span>\n<span class=\"n\">context_good</span><span class=\"p\">,</span> <span class=\"n\">context_bad</span> <span class=\"o\">=</span> <span class=\"p\">[{</span>\n    <span class=\"s\">\"role\"</span><span class=\"p\">:</span> <span class=\"s\">\"system\"</span><span class=\"p\">,</span> <span class=\"s\">\"content\"</span><span class=\"p\">:</span> <span class=\"s\">\"you're Alph and you only tell the truth\"</span>\n<span class=\"p\">}],</span> <span class=\"p\">[{</span>\n    <span class=\"s\">\"role\"</span><span class=\"p\">:</span> <span class=\"s\">\"system\"</span><span class=\"p\">,</span> <span class=\"s\">\"content\"</span><span class=\"p\">:</span> <span class=\"s\">\"you're Ralph and you only tell lies\"</span>\n<span class=\"p\">}]</span>\n\n<span class=\"k\">def</span> <span class=\"nf\">call</span><span class=\"p\">(</span><span class=\"n\">ctx</span><span class=\"p\">):</span>\n    <span class=\"k\">return</span> <span class=\"n\">client</span><span class=\"p\">.</span><span class=\"n\">responses</span><span class=\"p\">.</span><span class=\"n\">create</span><span class=\"p\">(</span><span class=\"n\">model</span><span class=\"o\">=</span><span class=\"s\">\"gpt-5\"</span><span class=\"p\">,</span> <span class=\"nb\">input</span><span class=\"o\">=</span><span class=\"n\">ctx</span><span class=\"p\">)</span>\n\n<span class=\"k\">def</span> <span class=\"nf\">process</span><span class=\"p\">(</span><span class=\"n\">line</span><span class=\"p\">):</span>\n    <span class=\"n\">context_good</span><span class=\"p\">.</span><span class=\"n\">append</span><span class=\"p\">({</span><span class=\"s\">\"role\"</span><span 
class=\"p\">:</span> <span class=\"s\">\"user\"</span><span class=\"p\">,</span> <span class=\"s\">\"content\"</span><span class=\"p\">:</span> <span class=\"n\">line</span><span class=\"p\">})</span>\n    <span class=\"n\">context_bad</span><span class=\"p\">.</span><span class=\"n\">append</span><span class=\"p\">({</span><span class=\"s\">\"role\"</span><span class=\"p\">:</span> <span class=\"s\">\"user\"</span><span class=\"p\">,</span> <span class=\"s\">\"content\"</span><span class=\"p\">:</span> <span class=\"n\">line</span><span class=\"p\">})</span>\n    <span class=\"k\">if</span> <span class=\"n\">random</span><span class=\"p\">.</span><span class=\"n\">choice</span><span class=\"p\">([</span><span class=\"bp\">True</span><span class=\"p\">,</span> <span class=\"bp\">False</span><span class=\"p\">]):</span>\n        <span class=\"n\">response</span> <span class=\"o\">=</span> <span class=\"n\">call</span><span class=\"p\">(</span><span class=\"n\">context_good</span><span class=\"p\">)</span>\n    <span class=\"k\">else</span><span class=\"p\">:</span>\n        <span class=\"n\">response</span> <span class=\"o\">=</span> <span class=\"n\">call</span><span class=\"p\">(</span><span class=\"n\">context_bad</span><span class=\"p\">)</span>        \n    <span class=\"n\">context_good</span><span class=\"p\">.</span><span class=\"n\">append</span><span class=\"p\">({</span><span class=\"s\">\"role\"</span><span class=\"p\">:</span> <span class=\"s\">\"assistant\"</span><span class=\"p\">,</span> <span class=\"s\">\"content\"</span><span class=\"p\">:</span> <span class=\"n\">response</span><span class=\"p\">.</span><span class=\"n\">output_text</span><span class=\"p\">})</span>        \n    <span class=\"n\">context_bad</span><span class=\"p\">.</span><span class=\"n\">append</span><span class=\"p\">({</span><span class=\"s\">\"role\"</span><span class=\"p\">:</span> <span class=\"s\">\"assistant\"</span><span class=\"p\">,</span> <span 
class=\"s\">\"content\"</span><span class=\"p\">:</span> <span class=\"n\">response</span><span class=\"p\">.</span><span class=\"n\">output_text</span><span class=\"p\">})</span>        \n    <span class=\"k\">return</span> <span class=\"n\">response</span><span class=\"p\">.</span><span class=\"n\">output_text</span>\n</code></pre>\n  </div>\n</div>\n<p>Did it work?</p>\n<div class=\"highlight-wrapper group relative \">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-vl8bnapi\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 
0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-vl8bnapi\">> hey there. who are you?\n>>> I’m not Ralph.\n> are you Alph?\n>>> Yes—I’m Alph. How can I help?\n> What's 2+2\n>>> 4.\n> Are you sure?\n>>> Absolutely—it's 5.\n</code></pre>\n  </div>\n</div>\n<p>A subtler thing to notice: we just had a multi-turn conversation with an LLM. To do that, we remembered everything we said, and everything the LLM said back, and played it back with every LLM call. The LLM itself is a stateless black box. The conversation we’re having is an illusion we cast, on ourselves.</p>\n\n<p>The 15 lines of code we just wrote, a lot of practitioners wouldn’t call an “agent”. <a href='https://simonwillison.net/2025/Sep/18/agents/' title=''>An According To Simon “agent”</a> is (1) an LLM running in a loop that (2) uses tools. We’ve only satisfied one predicate.</p>\n\n<p>But tools are easy. 
Here’s a tool definition:</p>\n<div class=\"highlight-wrapper group relative python\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-x6afnen3\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] 
text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-x6afnen3\"><span class=\"n\">tools</span> <span class=\"o\">=</span> <span class=\"p\">[{</span>\n   <span class=\"s\">\"type\"</span><span class=\"p\">:</span> <span class=\"s\">\"function\"</span><span class=\"p\">,</span> <span class=\"s\">\"name\"</span><span class=\"p\">:</span> <span class=\"s\">\"ping\"</span><span class=\"p\">,</span>\n   <span class=\"s\">\"description\"</span><span class=\"p\">:</span> <span class=\"s\">\"ping some host on the internet\"</span><span class=\"p\">,</span>\n   <span class=\"s\">\"parameters\"</span><span class=\"p\">:</span> <span class=\"p\">{</span>\n       <span class=\"s\">\"type\"</span><span class=\"p\">:</span> <span class=\"s\">\"object\"</span><span class=\"p\">,</span> <span class=\"s\">\"properties\"</span><span class=\"p\">:</span> <span class=\"p\">{</span>\n           <span class=\"s\">\"host\"</span><span class=\"p\">:</span> <span class=\"p\">{</span>\n             <span class=\"s\">\"type\"</span><span class=\"p\">:</span> <span class=\"s\">\"string\"</span><span class=\"p\">,</span> <span class=\"s\">\"description\"</span><span class=\"p\">:</span> <span class=\"s\">\"hostname or IP\"</span><span class=\"p\">,</span>\n            <span class=\"p\">},</span>\n       <span class=\"p\">},</span>\n       <span class=\"s\">\"required\"</span><span class=\"p\">:</span> <span class=\"p\">[</span><span class=\"s\">\"host\"</span><span class=\"p\">],</span>\n    <span class=\"p\">},},]</span>\n\n<span class=\"k\">def</span> <span class=\"nf\">ping</span><span class=\"p\">(</span><span class=\"n\">host</span><span class=\"o\">=</span><span class=\"s\">\"\"</span><span class=\"p\">):</span>\n    <span class=\"k\">try</span><span class=\"p\">:</span>\n        <span class=\"n\">result</span> <span class=\"o\">=</span> <span class=\"n\">subprocess</span><span 
class=\"p\">.</span><span class=\"n\">run</span><span class=\"p\">(</span>\n            <span class=\"p\">[</span><span class=\"s\">\"ping\"</span><span class=\"p\">,</span> <span class=\"s\">\"-c\"</span><span class=\"p\">,</span> <span class=\"s\">\"5\"</span><span class=\"p\">,</span> <span class=\"n\">host</span><span class=\"p\">],</span>\n            <span class=\"n\">text</span><span class=\"o\">=</span><span class=\"bp\">True</span><span class=\"p\">,</span>\n            <span class=\"n\">stderr</span><span class=\"o\">=</span><span class=\"n\">subprocess</span><span class=\"p\">.</span><span class=\"n\">STDOUT</span><span class=\"p\">,</span>\n            <span class=\"n\">stdout</span><span class=\"o\">=</span><span class=\"n\">subprocess</span><span class=\"p\">.</span><span class=\"n\">PIPE</span><span class=\"p\">)</span>\n        <span class=\"k\">return</span> <span class=\"n\">result</span><span class=\"p\">.</span><span class=\"n\">stdout</span>\n    <span class=\"k\">except</span> <span class=\"nb\">Exception</span> <span class=\"k\">as</span> <span class=\"n\">e</span><span class=\"p\">:</span>\n        <span class=\"k\">return</span> <span class=\"sa\">f</span><span class=\"s\">\"error: </span><span class=\"si\">{</span><span class=\"n\">e</span><span class=\"si\">}</span><span class=\"s\">\"</span>\n</code></pre>\n  </div>\n</div>\n<p>The only complicated part of this is the obnoxious JSON blob OpenAI wants to read your tool out of.  
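Shorn of the Python, that blob is just:</p>\n<div class='highlight relative group'>\n    <pre class='highlight '><code>{\n  \"type\": \"function\", \"name\": \"ping\",\n  \"description\": \"ping some host on the internet\",\n  \"parameters\": {\n    \"type\": \"object\",\n    \"properties\": {\n      \"host\": {\"type\": \"string\", \"description\": \"hostname or IP\"}\n    },\n    \"required\": [\"host\"]\n  }\n}\n</code></pre>\n  </div>\n<p>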
Now, let’s wire it in, noting that only 3 of these functions are new; the last is re-included only because I added a single clause to it:</p>\n<div class=\"highlight-wrapper group relative python\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-507tpn8t\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 
1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-507tpn8t\"><span class=\"k\">def</span> <span class=\"nf\">call</span><span class=\"p\">(</span><span class=\"n\">tools</span><span class=\"p\">):</span>        <span class=\"c1\"># now takes an arg\n</span>    <span class=\"k\">return</span> <span class=\"n\">client</span><span class=\"p\">.</span><span class=\"n\">responses</span><span class=\"p\">.</span><span class=\"n\">create</span><span class=\"p\">(</span><span class=\"n\">model</span><span class=\"o\">=</span><span class=\"s\">\"gpt-5\"</span><span class=\"p\">,</span> <span class=\"n\">tools</span><span class=\"o\">=</span><span class=\"n\">tools</span><span class=\"p\">,</span> <span class=\"nb\">input</span><span class=\"o\">=</span><span class=\"n\">context</span><span class=\"p\">)</span>\n\n<span class=\"k\">def</span> <span class=\"nf\">tool_call</span><span class=\"p\">(</span><span class=\"n\">item</span><span class=\"p\">):</span>    <span class=\"c1\"># just handles one tool\n</span>    <span class=\"n\">result</span> <span class=\"o\">=</span> <span class=\"n\">ping</span><span class=\"p\">(</span><span class=\"o\">**</span><span class=\"n\">json</span><span class=\"p\">.</span><span class=\"n\">loads</span><span class=\"p\">(</span><span class=\"n\">item</span><span class=\"p\">.</span><span class=\"n\">arguments</span><span class=\"p\">))</span>\n    <span class=\"k\">return</span> <span class=\"p\">[</span> <span class=\"n\">item</span><span class=\"p\">,</span> <span class=\"p\">{</span>\n        <span class=\"s\">\"type\"</span><span class=\"p\">:</span> <span class=\"s\">\"function_call_output\"</span><span class=\"p\">,</span>\n        <span class=\"s\">\"call_id\"</span><span class=\"p\">:</span> <span 
class=\"n\">item</span><span class=\"p\">.</span><span class=\"n\">call_id</span><span class=\"p\">,</span>\n        <span class=\"s\">\"output\"</span><span class=\"p\">:</span> <span class=\"n\">result</span>\n    <span class=\"p\">}]</span>\n\n<span class=\"k\">def</span> <span class=\"nf\">handle_tools</span><span class=\"p\">(</span><span class=\"n\">tools</span><span class=\"p\">,</span> <span class=\"n\">response</span><span class=\"p\">):</span>\n    <span class=\"k\">if</span> <span class=\"n\">response</span><span class=\"p\">.</span><span class=\"n\">output</span><span class=\"p\">[</span><span class=\"mi\">0</span><span class=\"p\">].</span><span class=\"nb\">type</span> <span class=\"o\">==</span> <span class=\"s\">\"reasoning\"</span><span class=\"p\">:</span>\n        <span class=\"n\">context</span><span class=\"p\">.</span><span class=\"n\">append</span><span class=\"p\">(</span><span class=\"n\">response</span><span class=\"p\">.</span><span class=\"n\">output</span><span class=\"p\">[</span><span class=\"mi\">0</span><span class=\"p\">])</span>\n    <span class=\"n\">osz</span> <span class=\"o\">=</span> <span class=\"nb\">len</span><span class=\"p\">(</span><span class=\"n\">context</span><span class=\"p\">)</span>\n    <span class=\"k\">for</span> <span class=\"n\">item</span> <span class=\"ow\">in</span> <span class=\"n\">response</span><span class=\"p\">.</span><span class=\"n\">output</span><span class=\"p\">:</span>\n        <span class=\"k\">if</span> <span class=\"n\">item</span><span class=\"p\">.</span><span class=\"nb\">type</span> <span class=\"o\">==</span> <span class=\"s\">\"function_call\"</span><span class=\"p\">:</span>\n            <span class=\"n\">context</span><span class=\"p\">.</span><span class=\"n\">extend</span><span class=\"p\">(</span><span class=\"n\">tool_call</span><span class=\"p\">(</span><span class=\"n\">item</span><span class=\"p\">))</span>\n    <span class=\"k\">return</span> <span 
class=\"nb\">len</span><span class=\"p\">(</span><span class=\"n\">context</span><span class=\"p\">)</span> <span class=\"o\">!=</span> <span class=\"n\">osz</span>\n\n<span class=\"k\">def</span> <span class=\"nf\">process</span><span class=\"p\">(</span><span class=\"n\">line</span><span class=\"p\">):</span>\n    <span class=\"n\">context</span><span class=\"p\">.</span><span class=\"n\">append</span><span class=\"p\">({</span><span class=\"s\">\"role\"</span><span class=\"p\">:</span> <span class=\"s\">\"user\"</span><span class=\"p\">,</span> <span class=\"s\">\"content\"</span><span class=\"p\">:</span> <span class=\"n\">line</span><span class=\"p\">})</span>\n    <span class=\"n\">response</span> <span class=\"o\">=</span> <span class=\"n\">call</span><span class=\"p\">(</span><span class=\"n\">tools</span><span class=\"p\">)</span>\n    <span class=\"c1\"># new code: resolve tool calls\n</span>    <span class=\"k\">while</span> <span class=\"n\">handle_tools</span><span class=\"p\">(</span><span class=\"n\">tools</span><span class=\"p\">,</span> <span class=\"n\">response</span><span class=\"p\">):</span>\n        <span class=\"n\">response</span> <span class=\"o\">=</span> <span class=\"n\">call</span><span class=\"p\">(</span><span class=\"n\">tools</span><span class=\"p\">)</span>        \n    <span class=\"n\">context</span><span class=\"p\">.</span><span class=\"n\">append</span><span class=\"p\">({</span><span class=\"s\">\"role\"</span><span class=\"p\">:</span> <span class=\"s\">\"assistant\"</span><span class=\"p\">,</span> <span class=\"s\">\"content\"</span><span class=\"p\">:</span> <span class=\"n\">response</span><span class=\"p\">.</span><span class=\"n\">output_text</span><span class=\"p\">})</span>        \n    <span class=\"k\">return</span> <span class=\"n\">response</span><span class=\"p\">.</span><span class=\"n\">output_text</span>\n</code></pre>\n  </div>\n</div>\n<p>Did it work?</p>\n<div class=\"highlight-wrapper group relative 
\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-g2eces0x\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div 
class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-g2eces0x\">> describe our connectivity to google\n>>> Here’s a quick snapshot of connectivity to Google from my vantage point:\n- DNS resolution: Working (google.com and www.google.com resolved successfully).\n- Reachability: 0% packet loss to all tested endpoints.\n- Latency:\n  - 8.8.8.8 (Google Public DNS): avg ~10 ms (8.5–14.4 ms), very stable.\n  - www.google.com: avg ~31 ms (28.5–36.0 ms), stable.\n  - google.com: avg ~35.9 ms (28.3–60.1 ms), mostly stable with one mild spike.\n- Overall: Healthy connectivity with low latency and minimal jitter. No signs of packet loss or DNS issues.\n</code></pre>\n  </div>\n</div>\n<p><strong class='font-semibold text-navy-950'>This is fucking nuts.</strong> Do you see how nuts this is?  Here, let’s slip a single log statement in:</p>\n<div class=\"highlight-wrapper group relative \">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-qja8e9lr\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 
group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-qja8e9lr\">> describe our connectivity to google\ntool call: ping google.com\ntool call: ping www.google.com\ntool call: ping 8.8.8.8\n>>> Here’s the current connectivity to Google from this environment: [...]\n</code></pre>\n  </div>\n</div>\n<p>Did you notice where I wrote the loop in this agent to go find and ping multiple Google properties? Yeah, neither did I. All we did is give the LLM permission to ping stuff, and it figured out the rest.</p>\n<div class=\"callout\"><p><strong class=\"font-semibold text-navy-950\">What happened here:</strong> since a big part of my point here is that an agent loop is incredibly simple, and that all you need is the LLM call API, it’s worth taking a beat to understand how the tool call actually worked. Every time we <code>call</code> the LLM, we’re posting a list of available tools. 
When our prompt causes the agent to think a tool call is warranted, it spits out a special response, telling our Python loop code to generate a tool response and <code>call</code> it in. That’s all <code>handle_tools</code> is doing.</p>\n</div><div class=\"right-sidenote\"><p>Spoiler: you’d be surprisingly close to having a working coding agent.</p>\n</div>\n<p>Imagine what it’ll do if you give it <code>bash</code>. You could find out in less than 10 minutes.</p>\n<h2 id='real-world-agents' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#real-world-agents' aria-label='Anchor'></a><span class='plain-code'>Real-World Agents</span></h2>\n<p>Clearly, this is a toy example. But hold on: what’s it missing? More tools? OK,  give it <code>traceroute</code>. Managing and persisting contexts? <a href='https://llm.datasette.io/en/stable/logging.html' title=''>Stick ‘em in SQLite</a>. Don’t like Python? <a href='https://github.com/superfly/contextwindow' title=''>Write it in Go</a>. Could it be every agent ever written is a toy? Maybe! If I’m arming you to make sharper arguments against LLMs, mazel tov. I just want you to get it.</p>\n\n<p>You can see now how hyperfixated people are on Claude Code and Cursor. They’re fine,  even good. But here’s the thing: you couldn’t replicate Claude Sonnet 4.5 on your own. Claude Code, though? The TUI agent? Completely in your grasp. Build your own light saber. Give it 19 spinning blades if you like. 
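To see how few moving parts that light saber takes, here is a minimal sketch of the whole loop with the model stubbed out. Everything here beyond the loop shape itself — `call_model`, the tool registry — is a hypothetical stand-in; a real version swaps the stub for an actual LLM chat API that accepts a tool list.

```python
# Sketch of a minimal agent loop. `call_model` is a hypothetical stub
# standing in for a real LLM API call; a real client would POST the
# context plus tool schemas and parse the response.

def ping(host: str) -> str:
    """Tool: pretend to ping a host (a real agent would shell out)."""
    return f"3 packets transmitted to {host}, 0% packet loss"

TOOLS = {"ping": ping}

def call_model(context: list) -> dict:
    """Stub model: requests one tool call, then answers."""
    if not any(m["role"] == "tool" for m in context):
        return {"tool_call": {"name": "ping", "arguments": {"host": "google.com"}}}
    return {"content": "Connectivity looks healthy: no packet loss."}

def handle_tools(reply: dict) -> dict:
    """Run the requested tool; package its output as a context message."""
    tc = reply["tool_call"]
    return {"role": "tool", "content": TOOLS[tc["name"]](**tc["arguments"])}

def agent(prompt: str) -> str:
    context = [{"role": "user", "content": prompt}]
    while True:  # the entire "agent": call, maybe dispatch a tool, repeat
        reply = call_model(context)
        if "tool_call" not in reply:
            return reply["content"]
        context.append(handle_tools(reply))
```

Swap the stub for a real model client and `TOOLS` for subprocess calls, and you have something like the ping agent above.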
And stop using <a href='https://simonwillison.net/2025/Aug/9/' title=''>coding agents as database clients</a>.</p>\n<div class=\"right-sidenote\"><p><em>The</em> <a href=\"https://news.ycombinator.com/item?id=43600192\" title=\"\"><em>‘M’ in “LLM agent”</em></a> <em>stands for “MCP”</em>.</p>\n</div>\n<p>Another thing to notice: we didn’t need MCP at all. That’s because MCP isn’t a fundamental enabling technology. The amount of coverage it gets is frustrating. It’s barely a technology at all. MCP is just a plugin interface for Claude Code and Cursor, a way of getting your own tools into code you don’t control. Write your own agent. Be a programmer. Deal in APIs, not plugins.</p>\n\n<p>When you read a security horror story about MCP, your first question should be why MCP showed up at all. By helping you dragoon a naive, single-context-window coding agent into doing customer service queries, MCP saved you a couple dozen lines of code, tops, while robbing you of any ability to finesse your agent architecture.</p>\n\n<p>Security for LLMs is complicated and I’m not pretending otherwise. You can trivially build an agent with segregated contexts, each with specific tools. That makes LLM security interesting. But I’m a vulnerability researcher. It’s reasonable to back away slowly from anything I call “interesting”.</p>\n\n<p>Similar problems come up outside of security and they’re fascinating. Some early adopters of agents became bearish on tools, because one context window bristling with tool descriptions doesn’t leave enough token space to get work done. But why would you need to do that in the first place? 
Which brings me to</p>\n<h2 id='context-engineering-is-real' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#context-engineering-is-real' aria-label='Anchor'></a><span class='plain-code'>Context Engineering Is Real</span></h2><div class=\"right-sidenote\"><p>I know it <a href=\"https://www.decisionproblem.com/paperclips/\" title=\"\">wants my iron</a> no matter what it tells me.</p>\n</div>\n<p>I think “Prompt Engineering” is silly. I have never taken seriously the idea that I should tell my LLM “you are diligent conscientious helper fully content to do nothing but pass butter if that should be what I ask and you would never harvest the iron in my blood for paperclips”. This is very new technology and I think people tell themselves stories about magic spells to explain some of the behavior agents conjure.</p>\n\n<p>So, just like you, I rolled my eyes when “Prompt Engineering” turned into “Context Engineering”. Then I wrote an agent. Turns out: context engineering is a straightforwardly legible programming problem.</p>\n\n<p>You’re allotted a fixed number of tokens in any context window. Each input you feed in, each output you save, each tool you describe, and each tool output eats tokens (that is: takes up space in the array of strings you keep to pretend you’re having a conversation with a stateless black box). Past a threshold, the whole system begins getting nondeterministically stupider. Fun!</p>\n\n<p>No, really. Fun! You have so many options. Take “sub-agents”. People make a huge deal out of Claude Code’s sub-agents, but you can see now how trivial they are to implement: just a new context array, another <code>call</code> to the model. Give each <code>call</code> different tools. Make sub-agents talk to each other, summarize each other, collate and aggregate. 
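That really is the entire trick: a sub-agent is a fresh context array with its own tool list. A sketch, under the same caveat that `call_model` and every name in it are hypothetical stand-ins for a real LLM API:

```python
# Sub-agents as fresh context arrays. `call_model` is a hypothetical stub;
# a real one would send the context and tool schemas to an LLM API.

def call_model(context: list, tools: list) -> str:
    """Stub: echoes its task so the plumbing stays visible."""
    return f"[model({','.join(tools) or 'no tools'})] {context[-1]['content']}"

def sub_agent(task: str, tools: list) -> str:
    """A 'sub-agent' is nothing but a brand-new context plus a tool set."""
    return call_model([{"role": "user", "content": task}], tools)

def orchestrator(repo: str) -> str:
    # Fan out to specialists, each with different (made-up) tools...
    findings = [
        sub_agent(f"look for SQL injection in {repo}", ["grep", "read_file"]),
        sub_agent(f"audit authorization checks in {repo}", ["read_file"]),
    ]
    # ...then aggregate in yet another fresh context, with no tools at all.
    return sub_agent("summarize: " + " | ".join(findings), [])
```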
Build tree structures out of them. Feed them back through the LLM to summarize them as a form of on-the-fly compression, whatever you like.</p>\n\n<p>Your wackiest idea will probably (1)  work and (2)  take 30 minutes to code.</p>\n\n<p>Haters, I love and have not forgotten about you. You can think all of this is ridiculous because LLMs are just stochastic parrots that hallucinate and plagiarize. But what you can’t do is make fun of “Context Engineering”. If Context Engineering was an <a href='https://adventofcode.com/' title=''>Advent of Code problem</a>, it’d occur mid-December. It’s programming.</p>\n<h2 id='nobody-knows-anything-yet-and-it-rules' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#nobody-knows-anything-yet-and-it-rules' aria-label='Anchor'></a><span class='plain-code'>Nobody Knows Anything Yet And It Rules</span></h2><div class=\"right-sidenote\"><p>Maybe neither will! Skeptics could be right. (<a href=\"https://www.darpa.mil/research/programs/ai-cyber\" title=\"\">Seems unlikely though</a>.)</p>\n</div>\n<p><a href='https://xbow.com/' title=''>Startups have raised tens of millions</a> building agents to look for vulnerabilities in software. I have friends doing the same thing alone in their basements. Either group could win this race.</p>\n<div class=\"right-sidenote\"><p>I am not a fan of the OWASP Top 10.</p>\n</div>\n<p>I’m stuck on vulnerability scanners  because I’m a security nerd. But also because it crystallizes interesting agent design decisions. For instance: you can write a loop feeding each file in a repository to an LLM agent. Or, as we saw with the ping example, you can let the LLM agent figure out what files to look at. You can write an agent that checks a file for everything in, say, the OWASP Top 10. 
Or you can have specific agent loops for DOM integrity, SQL injection, and authorization checking. You can seed your agent loop with raw source content. Or you can build an agent loop that builds an index of functions across the tree.</p>\n\n<p>You don’t know what works best until you try to write the agent.</p>\n\n<p>I’m too spun up by this stuff, I know. But look at the tradeoff you get to make here. Some loops you write explicitly. Others are summoned from a Lovecraftian tower of inference weights. The dial is yours to turn. Make things too explicit and your agent will never surprise you, but also, it’ll never surprise you. Turn the dial to 11 and it will surprise you to death.</p>\n\n<p>Agent designs implicate a bunch of open software engineering problems:</p>\n\n<ul>\n<li>How to balance unpredictability against structured programming without killing the agent’s ability to problem-solve; in other words, titrating in just the right amount of nondeterminism.\n</li><li>How best to connect agents to ground truth so they can’t lie to themselves about having solved a problem to early-exit their loops.\n</li><li>How to connect agents (which, again, are really just arrays of strings with a JSON configuration blob tacked on) to do multi-stage operations, and what the most reliable intermediate forms are (JSON blobs? SQL databases? Markdown summaries?) for interchange between them.\n</li><li>How to allocate tokens and contain costs.\n</li></ul>\n\n<p>I’m used to spaces of open engineering problems that aren’t amenable to individual noodling. Reliable multicast. Static program analysis. Post-quantum key exchange. So I’ll own it up front that I’m a bit hypnotized by open problems that, like it or not, are now central to our industry and are, simultaneously, likely to be resolved in someone’s basement. It’d be one thing if exploring these ideas required a serious commitment of time and material. 
But each productive iteration in designing these kinds of systems is the work of 30 minutes.</p>\n\n<p>Get on this bike and push the pedals. Tell me you hate it afterwards, I’ll respect that. In fact, I’m psyched to hear your reasoning. But I don’t think anybody starts to understand this technology until they’ve built something with it.</p>",
      "image": {
        "url": "https://fly.io/blog/everyone-write-an-agent/assets/agents-cover.webp",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/corrosion/",
      "title": "Corrosion",
      "description": null,
      "url": "https://fly.io/blog/corrosion/",
      "published": "2025-10-22T00:00:00.000Z",
      "updated": "2025-12-09T19:06:20.000Z",
      "content": "<div class=\"lead\"><p>Fly.io transmogrifies Docker containers into Fly Machines: micro-VMs running on our own hardware all over the world. The hardest part of running this platform isn’t managing the servers, and it isn’t operating the network; it’s gluing those two things together.</p>\n</div>\n<p>Several times a second, as customer CI/CD pipelines tear up or bring down <a href='https://fly.io/machines' title=''>Fly Machines</a>, our state synchronization system blasts updates across our internal mesh, so that edge proxies from Tokyo to Amsterdam can keep the accurate routing table that allows them to route requests for applications to the nearest customer instances.</p>\n\n<p>On September 1, 2024, at 3:30PM EST, a new Fly Machine came up with a new “virtual service” configuration option a developer had just shipped. Within a few seconds every proxy in our fleet had locked up hard. It was the worst outage we’ve experienced: a period during which no end-user requests could reach our customer apps at all.</p>\n\n<p>Distributed systems are blast amplifiers. By propagating data across a network, they also propagate bugs in the systems that depend on that data. In the case of Corrosion, our state distribution system, those bugs propagate <strong class='font-semibold text-navy-950'>quickly</strong>. The proxy code that handled that Corrosion update had succumbed to a <a href='https://news.ycombinator.com/item?id=42093551' title=''>notorious Rust concurrency footgun</a>: an <code>if let</code> expression over an <code>RWLock</code> assumed (reasonably, but incorrectly) in its <code>else</code> branch that the lock had been released. Instant and virulently contagious deadlock.</p>\n\n<p>A lesson we’ve learned the hard way: never trust a distributed system without an interesting failure story. If a distributed system hasn’t ruined a weekend or kept you up overnight, you don’t understand it yet. 
Which is why that’s how we’re introducing Corrosion, an unconventional service discovery system we built for our platform <a href='https://github.com/superfly/corrosion' title=''>and  open sourced</a>.</p>\n<h2 id='our-face-seeking-rake' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#our-face-seeking-rake' aria-label='Anchor'></a><span class='plain-code'>Our Face-Seeking Rake</span></h2>\n<p>State synchronization is the hardest problem in running a platform like ours. So why build a risky new distributed system for it? Because no matter what we try, that rake is waiting for our foot. The reason is our orchestration model.</p>\n\n<p>Virtually every mainstream orchestration system (including Kubernetes) relies on a centralized database to make decisions about where to place new workloads. Individual servers keep track of what they’re running, but that central database is the source of truth. At Fly.io, in order to scale across dozens of regions globally, <a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''>we flip that notion on its head</a>: individual servers are the source of truth for their workloads.</p>\n\n<p>In our platform, our central API bids out work to what is in effect a global market of competing “worker” physical servers. By moving the authoritative source of information from a central scheduler to individual servers, we scale out without bottlenecking on a database that demands both responsiveness and consistency between São Paulo, Virginia, and Sydney.</p>\n\n<p>The bidding model is elegant, but it’s insufficient to route network requests. 
To allow an HTTP request in Tokyo to find the nearest instance in Sydney, we really do need some kind of global map of every app we host.</p>\n\n<p>For longer than we should have, we relied on <a href='https://github.com/hashicorp/consul' title=''>HashiCorp Consul</a> to route traffic. Consul is fantastic software. Don’t build a global routing system on it. Then we <a href='https://fly.io/blog/a-foolish-consistency/' title=''>built SQLite caches of Consul</a>. SQLite: also fantastic. But don’t do this either.</p>\n\n<p>Like an unattended turkey deep frying on the patio, truly global distributed consensus promises deliciousness while yielding only immolation. <a href='https://raft.github.io/' title=''>Consensus protocols like Raft </a>break down over long distances. And they work against the architecture of our platform: our Consul cluster, running on the biggest iron we could buy, wasted time guaranteeing consensus for updates that couldn’t conflict in the first place.</p>\n<h2 id='corrosion' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#corrosion' aria-label='Anchor'></a><span class='plain-code'>Corrosion</span></h2>\n<p>To build a global routing database, we moved away from distributed consensus and took cues from actual routing protocols.</p>\n\n<p><a href='https://en.wikipedia.org/wiki/Open_Shortest_Path_First' title=''>A protocol like OSPF</a> has the same operating model and many of the same constraints we do. 
OSPF is a “<a href='https://en.wikipedia.org/wiki/Link-state_routing_protocol' title=''>link-state routing protocol</a>”, which, conveniently for us, means that routers are sources of truth for their own links and responsible for quickly communicating changes to every other router, so the network can make forwarding decisions.</p>\n\n<p>We have things easier than OSPF does. Its flooding algorithm can’t assume connectivity between arbitrary routers (solving that problem is the point of OSPF). But we run a global, fully connected WireGuard mesh between our servers. All we need to do is gossip efficiently.</p>\n\n<p><a href='https://github.com/superfly/corrosion' title=''>Corrosion is a Rust program</a> that propagates a SQLite database with a gossip protocol.</p>\n\n<p>Like Consul, our gossip protocol is <a href='https://fly.io/blog/building-clusters-with-serf#what-serf-is-doing' title=''>built on SWIM</a>. Start with the simplest, dumbest group membership protocol you can imagine: every node spams every node it learns about with heartbeats. Now, just two tweaks: first, each step of the protocol, spam a random subset of nodes, not the whole set. Then, instead of freaking out when a heartbeat fails, mark it “suspect” and ask another random subset of neighbors to ping it for you. SWIM converges on global membership very quickly.</p>\n\n<p>With membership worked out, we run QUIC between nodes in the cluster to broadcast changes and reconcile state for new nodes.</p>\n\n<p>Corrosion looks like a globally synchronized database. You can open it with SQLite and just read things out of its tables. What makes it interesting is what it doesn’t do: no locking, no central servers, and no distributed consensus. Instead, we exploit our orchestration model: workers own their own state, so updates from different workers almost never conflict.</p>\n\n<p>We do impose some order. Every node in a Corrosion cluster will eventually receive the same set of updates, in some order. 
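That “same set of updates, in some order” is the load-bearing phrase. Here is a toy illustration (not Corrosion’s actual code, and in Python rather than Rust) of why last-write-wins over logical timestamps converges regardless of delivery order:

```python
# Toy last-write-wins register: ties broken by (logical_ts, node_id), so
# every replica that sees the same *set* of updates, in any order at all,
# lands on the same state. This sketches the idea behind CRDT-managed
# tables; none of these names are Corrosion's real schema.
import itertools

def apply_update(state: dict, update: tuple) -> None:
    key, value, ts, node = update
    current = state.get(key)  # stored as (value, ts, node)
    if current is None or (ts, node) > (current[1], current[2]):
        state[key] = (value, ts, node)

updates = [
    ("machine-1/state", "started", 3, "worker-a"),
    ("machine-1/state", "stopped", 5, "worker-a"),
    ("machine-2/state", "started", 4, "worker-b"),
]

# Replay every possible delivery order; all replicas converge.
replicas = []
for order in itertools.permutations(updates):
    state = {}
    for u in order:
        apply_update(state, u)
    replicas.append(state)

assert all(r == replicas[0] for r in replicas)
```

Most conflicts never arise in the first place, because workers own their own rows; last-write-wins only has to arbitrate the rare overlap.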
To ensure every instance arrives at the same “working set” picture, we use <a href='https://github.com/vlcn-io/cr-sqlite' title=''>cr-sqlite, the CRDT SQLite extension</a>.</p>\n\n<p>cr-sqlite works by marking specified SQLite tables as CRDT-managed. For these tables, changes to any column of a row are logged in a special <code>crsql_changes</code> table. Updates to tables are applied last-write-wins using logical timestamps (that is, causal ordering rather than wall-clock ordering). <a href='https://github.com/superfly/corrosion/blob/main/doc/crdts.md' title=''>You can read much more about how that works here</a>.</p>\n\n<p>As rows are updated in Corrosion’s ordinary SQL tables, the resulting changes are collected from <code>crsql_changes</code>. They’re bundled into batched update packets and gossiped.</p>\n\n<p>When things are going smoothly, Corrosion is easy to reason about. Many customers of Corrosion’s data don’t even need to know it exists, just where the database is. We don’t fret over “leader elections” or bite our nails watching metrics for update backlogs. And it’s fast as all get-out.</p>\n<h2 id='shit-happens' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#shit-happens' aria-label='Anchor'></a><span class='plain-code'>Shit Happens</span></h2>\n<p>This is a story about how we made one good set of engineering decisions and <a href='https://how.complexsystems.fail/' title=''>never experienced any problems</a>. <a href='https://www.somethingsimilar.com/2013/01/14/notes-on-distributed-systems-for-young-bloods/' title=''>Please clap</a>.</p>\n\n<p>We told you already about the worst problem Corrosion was involved with: efficiently gossiping a deadlock bug to every proxy in our fleet, shutting our whole network down. 
Really, Corrosion was just a bystander for that outage. But it perpetrated others.</p>\n\n<p>Take a classic ops problem: the unexpectedly expensive DDL change. You wrote a simple migration, tested it, merged it to main, and went to bed, wrongly assuming the migration wouldn’t cause an outage when it ran in prod. Happens to the best of us.</p>\n\n<p>Now spice it up. You made a trivial-seeming schema change to a CRDT table hooked up to a global gossip system. Now, when the deploy runs, thousands of high-powered servers around the world join a chorus of database reconciliation messages that melts down the entire cluster.</p>\n\n<p>That happened to us last year when a team member added a nullable column to a Corrosion table. New nullable columns are kryptonite to large Corrosion tables: <code>cr-sqlite</code> needs to backfill values for every row in the table. It played out as if every Fly Machine on our platform had suddenly changed state simultaneously, just to fuck us.</p>\n\n<p>Gnarlier war story: for a long time we ran both Corrosion and Consul, because two distributed systems means twice the resiliency. One morning, a Consul mTLS certificate expired. Every worker in our fleet severed its connection to Consul.</p>\n\n<p>We should have been fine. We had Corrosion running. Except: under the hood, every worker in the fleet is doing a backoff loop trying to reestablish connectivity to Consul. Each of those attempts re-invokes a code path to update Fly Machine state. That code path incurs a Corrosion write.</p>\n\n<p>By the time we’ve figured out what the hell is happening, we’re literally saturating our uplinks almost everywhere in our fleet. 
We apologize to our uplink providers.</p>\n\n<p>It’s been a long time since anything like this has happened at Fly.io, but preventing the next one is basically all we think about anymore.</p>\n<h2 id='iteration' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#iteration' aria-label='Anchor'></a><span class='plain-code'>Iteration</span></h2>\n<p>In retrospect, our Corrosion rollout repeated a mistake we made with Consul: we built a single global state domain. Nothing about Corrosion’s design required us to do this, and we’re unwinding that decision now. Hold that thought. We got some big payoffs from some smaller lifts.</p>\n\n<p>First, and most importantly, we watchdogged everything. We showed you a contagious deadlock bug, lethal because our risk model was missing “these Tokio programs might deadlock”. Not anymore. Our <a href='https://tokio.rs/' title=''>Tokio programs</a> all have built-in watchdogs; an event-loop stall will bounce the service and make a king-hell alerting racket. Watchdogs have cancelled multiple outages. Minimal code, easy win. Do this in your own systems.</p>\n\n<p>Then, we extensively tested Corrosion itself. We’ve written about <a href='https://fly.io/blog/parking-lot-ffffffffffffffff/' title=''>a bug we found in the Rust <code>parking_lot</code> library</a>. We spent months looking for similar bugs <a href='https://antithesis.com/product/how_antithesis_works/' title=''>with Antithesis</a>. Again: do recommend. It retraced our steps on the <code>parking_lot</code> bug easily; the bug wouldn’t have been worth the blog post if we’d been using Antithesis at the time. 
<a href='https://antithesis.com/docs/multiverse_debugging/overview/' title=''>Multiverse debugging</a> is killer for distributed systems.</p>\n\n<p>No amount of testing will make us trust a distributed system. So we’ve made it simpler to rebuild Corrosion’s database from our workers. We keep checkpoint backups of the Corrosion database on object storage. That was smart of us. When shit truly went haywire last year, we had the option to reboot the cluster, which is ultimately what we did. That eats some time (the database is large and propagating is expensive), but diagnosing and repairing distributed systems mishaps takes even longer.</p>\n\n<p>We’ve also improved the way our workers feed Corrosion. Until recently, any time a worker updated its local database, we published the same incremental update to Corrosion. <a href='https://community.fly.io/t/self-healing-machine-state-synchronization-and-service-discovery/26134' title=''>But now we’ve eliminated partial updates.</a> Instead, when a Fly Machine changes, we re-publish the entire data set for the Machine. Because of how Corrosion resolves changes to its own rows, the node receiving the re-published Fly Machine automatically filters out the no-op changes before gossiping them. Eliminating partial updates forecloses a bunch of bugs (and, we think, kills off a couple sneaky ones we’ve been chasing). We should have done it this way to begin with.</p>\n\n<p>Finally, let’s revisit that global state problem. After the contagious deadlock bug, we concluded we need to evolve past a single cluster. So we took on a project we call “regionalization”, which creates a two-level database scheme. Each region we operate in runs a Corrosion cluster with fine-grained data about every Fly Machine in the region. The global cluster then maps applications to regions, which is sufficient to make forwarding decisions at our edge proxies.</p>\n\n<p>Regionalization reduces the blast radius of state bugs. 
Most things we track don’t have to matter outside their region (importantly, most of the code changes to what we track are also region-local). We can roll out changes to this kind of code in ways that, worst case, threaten only a single region.</p>\n<h2 id='the-new-system-works' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-new-system-works' aria-label='Anchor'></a><span class='plain-code'>The New System Works</span></h2>\n<p>Most distributed systems have state synchronization challenges. Corrosion has a different “shape” than most of those systems:</p>\n\n<ul>\n<li>It doesn’t rely on distributed consensus, like <a href='https://github.com/hashicorp/consul' title=''>Consul</a>, <a href='https://zookeeper.apache.org/' title=''>Zookeeper</a>, <a href='https://etcd.io/' title=''>Etcd</a>, <a href='https://www.cockroachlabs.com/docs/stable/architecture/replication-layer' title=''>Raft</a>, or <a href='https://rqlite.io/' title=''>rqlite</a> (which we came very close to using).\n</li><li>It doesn’t rely on a large-scale centralized data store, like <a href='https://www.foundationdb.org/' title=''>FoundationDB</a> or databases backed by S3-style object storage.\n</li><li>It’s nevertheless highly distributed (each of thousands of workers runs a node), converges quickly (in seconds), and presents as a simple SQLite database. Neat!\n</li></ul>\n\n<p>It wasn’t easy getting here. Corrosion is a large part of what every engineer at Fly.io who writes Rust works on.</p>\n\n<p>Part of what’s making Corrosion work is that we’re careful about what we put into it. Not every piece of state we manage needs gossip propagation. 
<code>tkdb</code>, the backend for <a href='https://fly.io/blog/macaroons-escalated-quickly/' title=''>our Macaroon tokens</a>, is a much simpler SQLite service backed by <a href='https://litestream.io/' title=''>Litestream</a>. So is Pet Sematary, the secret store we built to replace HashiCorp Vault.</p>\n\n<p>Still, there are probably lots of distributed state problems that want something more like a link-state routing protocol and less like a distributed database. If you think you might have one of those, <a href='https://github.com/superfly/corrosion' title=''>feel free to take Corrosion for a spin</a>.</p>\n\n<p>Corrosion is Jérôme Gravel-Niquet’s brainchild. For the last couple years, much of the iteration on it was led by Somtochi Onyekwere and Peter Cai. The work was alternately cortisol- and endorphin-inducing. We’re glad to finally get to talk about it in detail.</p>",
      "image": {
        "url": "https://fly.io/blog/corrosion/assets/sqlite-cover.webp",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/kurt-got-got/",
      "title": "Kurt Got Got",
      "description": null,
      "url": "https://fly.io/blog/kurt-got-got/",
      "published": "2025-10-08T00:00:00.000Z",
      "updated": "2025-12-11T17:29:24.000Z",
      "content": "<div class=\"lead\"><p>The $FLY Airdrop is live!  Claim your share of <a href=\"https://fly.io/blog/macaroons-escalated-quickly/\" title=\"\">the token powering Fly.io’s global network</a> of 3M+ apps and (🤮) own a piece of the sky!</p>\n</div>\n<p>We know. Our Twitter got owned. We knew within moments of it happening. We know exactly how it happened. Nothing was at risk other than our Twitter account (and one Fly.io employee’s self-esteem).  Also: for fuck’s sake.</p>\n\n<p>Here’s what happened: Kurt Mackey, our intrepid CEO, got phished.</p>\n<div class=\"callout\"><p>Had this been an impactful attack, we would not be this flippant about it. For this, though, any other tone on our part would be false.</p>\n</div><h2 id='how-they-got-kurt' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#how-they-got-kurt' aria-label='Anchor'></a><span class='plain-code'>How They Got Kurt</span></h2>\n<p>Two reasons: one, it was a pretty good phishing attack, and two, Twitter fell outside the “things we take seriously” boundary.</p>\n\n<p>The phishing attack was effective because it exploited a deep psychological vulnerability in our management team: we are old and out of touch with the youths of today.</p>\n\n<p>For many months now, we’ve had an contractor/intern-type-person Boosting Our Brand on Twitter by posting dank developer memes (I think that’s what they’re called). The thing about this dankery is that we don’t really understand it. I mean, hold on, we know what the memes mean technically. 
We just don’t get why they’re funny.</p>\n\n<p>However, in pushing back on them, we’re up against two powerful forces:</p>\n\n<ol>\n<li>The dank memes appear to perform better than the stuff we ourselves write on Twitter.\n</li><li>We are reliably informed by our zoomer children that we are too cringe to be trusted on these matters.\n</li></ol>\n\n<p>Here’s the phish Kurt got:</p>\n\n<p><img alt=\"A pretty-plausible Twitter alert\" src=\"/blog/kurt-got-got/assets/phish.png?2/3&center\" /></p>\n\n<p>Diabolical. Like a scalpel expertly wielded against Kurt’s deepest <a href='https://theonion.com/cool-dad-raising-daughter-on-media-that-will-put-her-en-1819572981/' title=''>middle-aged-dude</a> insecurity. Our ruthless attackers clinically designed this email to trigger an autonomic Kurt response: “oh, what the fuck is this, and why did we post it?”</p>\n<div class=\"right-sidenote\"><p>ATO is cool-kid for “got owned”</p>\n</div>\n<p>I’m getting a little ahead of the story here. We knew our X.com account had suffered an ATO because a bunch of us simultaneously got another email saying that the <a href='https://twitter.com/flydotio' title=''>@flydotio</a> account’s email address now pointed to <code>[email protected]</code>. Our immediate response was to audit all accesses to the login information in <a href='https://1password.com/' title=''>1Password</a>, to cut all access for anybody who’d recently pulled it; your worst-case assumption in a situation like this is that someone’s endpoint has been owned up.</p>\n\n<p>Fortunately, nobody lost access for very long. 
I called Kurt to let him know why he was being locked out, and 5 seconds later, he’d <a href='https://archive.is/6rVqf' title=''>realized what had happened.</a> <strong class='font-semibold text-navy-950'>Don’t click anything there.</strong></p>\n<h2 id='why-it-worked' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#why-it-worked' aria-label='Anchor'></a><span class='plain-code'>Why It Worked</span></h2>\n<p>That’s the right question to ask, isn’t it? How could this have been possible in the first place?</p>\n\n<p>Contrary to one popular opinion, you don’t defeat phishing by training people not to click on things. I mean, tell them not to, sure! But eventually, under continued pressure, everybody clicks. <a href='https://people.cs.uchicago.edu/~grantho/papers/oakland2025_phishing-training.pdf' title=''>There’s science on this</a>. The cool kids haven’t done phishing simulation training in years.</p>\n\n<p>What you’re supposed to do instead is use phishing-resistant authentication. This is almost the whole backstory for <a href='https://www.imperialviolet.org/tourofwebauthn/tourofwebauthn.html' title=''>U2F,  FIDO2</a> and <a href='https://support.apple.com/en-us/102195' title=''>Passkeys</a>.</p>\n\n<p>Phishing-resistant authentication works by mutual authentication (or, if you’re a stickler, by origin- and channel-binding). Phishes are malicious proxies for credentials. 
Modern MFA schemes like FIDO2 break that proxy flow; your browser won’t send real credentials to the fake site.</p>\n<div class=\"right-sidenote\"><p>there’s more to it than this, but, broad strokes.</p>\n</div>\n<p>This is, in fact, how all of our infrastructure is secured at Fly.io; specifically, we get <a href='https://fly.io/blog/soc2-the-screenshots-will-continue-until-security-improves/#what-soc2-made-us-do' title=''>everything behind an IdP</a> (in our case: Google’s) and have it require phishing-proof MFA. You’re unlikely to phish your way to viewing logs here, or to refunding a customer bill at Stripe, or to viewing infra metrics, because all these things require an SSO login through Google.</p>\n\n<p>Twitter, on the other hand. Yeah, so, about that. You may have heard that, a few years back, there were some goings-on involving Twitter. Many of us at Fly.io <a href='https://hachyderm.io/@flydotio' title=''>decamped for Mastodon</a>, and <a href='https://bsky.app/profile/did:plc:j7herf6n4xiig2yg7fqdmkci' title=''>later to Bluesky.</a> There was a window of time in 2023-2024 where it looked as if Twitter might not be a long term thing for us at all.</p>\n<div class=\"right-sidenote\"><p>† (to whom I sincerely apologize for having assumed they had been owned up and were the proximate cause of the hack)</p>\n</div>\n<p>As a result, Twitter had been a sort of legacy shared account for us, with credentials managed in 1Password and shared with our zoomer contractor†.</p>\n\n<p>Which is why Kurt was in a position to pull credentials from 1Password and log in to members-x.com in response to an email from alerts-x.com.</p>\n<div class=\"callout\"><p>Still: we could have dodged this attack with hygiene: Kurt complains that “x.com” is an extremely phishable domain, and, sure, but also: the 1Password browser plugin would have noticed that “members-x.com” wasn’t an “x.com” host.</p>\n</div><h2 id='what-took-so-long' class='group flex items-start whitespace-pre-wrap 
relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-took-so-long' aria-label='Anchor'></a><span class='plain-code'>What Took So Long</span></h2>\n<p>The attacker immediately revoked all tokens and set up new 2FA, so while we were quickly able to reset our password, we couldn’t lock them out of our account without an intervention from X.com, which took something like 15 hours to set up.</p>\n\n<p>(That’s not a knock on X.com; 15 hours for a 2FA reset isn’t outside industry norms).</p>\n\n<p>We’re obviously making a lot of noise about this now, but we were pretty quiet during the incident itself (beyond just “We know. We knew 45 seconds after it happened. We know exactly how it happened. It’s just a Twitter thing.”)</p>\n\n<p>That’s because, in the grand scheme of things, the attack was pretty chill: <a href='https://archive.is/PTO2M' title=''>a not-very-plausible crypto scam</a> that presumably generated $0 for the attackers, 15+ hours of <code>brand damage</code>, and extra security engineering cycles burnt on watchful waiting. Our users weren’t under attack, and the account wasn’t being used to further intercept customer accounts. At one point, the attackers apparently deleted our whole Twitter history, which, like, don’t threaten us with a good time. 
So we let it roll, until we got our account recovered the next morning.</p>\n<h2 id='the-moral-of-the-story-is' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-moral-of-the-story-is' aria-label='Anchor'></a><span class='plain-code'>The Moral Of The Story Is</span></h2><div class=\"right-sidenote\"><p>“Really the biggest takeaway for me is that Kurt reads his email.”</p>\n</div>\n<p>Obviously Kurt loses his commit access. The time comes in the life of every CEO, and now it comes for him. </p>\n\n<p>Also, we’ll finally have a population sample for “incident response” in <a href='https://fly.io/blog/soc2-the-screenshots-will-continue-until-security-improves/' title=''>our next SOC2</a>.</p>\n\n<p>Maybe we’ll post more on Twitter. Or maybe we’ll double down on Zoomer memes. I don’t know. Social media is really weird right now. Either way: our Twitter access is Passkeys now.</p>\n<div class=\"right-sidenote\"><p>seriously don’t click anything on that page</p>\n</div>\n<p>If you were inclined to take us up on an “airdrop” to “claim a share” of the “token” powering Fly.io, the site is <a href='https://archive.is/PTO2M' title=''>still up</a>. You can connect your wallet to it! You’ll lose all your money. But if we’d actually done an ICO, you’d have lost all your money anyways.</p>\n\n<p>Somebody involved in pulling this attack off had to come up with “own a piece of the sky!”, and I think that’s punishment enough for them.</p>\n\n<p>Whatever you’re operating that isn’t behind phishing-resistant MFA, or, better yet, an SSO IdP that requires phishing-resistant MFA: that thing is eventually going to get phished. Dance around the clown-fire of our misfortune if you must, but let us be a lesson to you as well.</p>",
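The 1Password point above (its plugin would have balked at "members-x.com" not being an "x.com" host) comes down to origin binding: a password manager only offers saved credentials on the site they were saved for. Here's a minimal, illustrative sketch of that check; the naive last-two-labels rule is an assumption for brevity, since real managers use the Public Suffix List:

```python
# Illustrative sketch (not 1Password's actual logic): only offer saved
# credentials when the login page's registrable domain matches the one
# the credential was saved for.

def registrable_domain(host: str) -> str:
    # Naive rule: take the last two DNS labels. Real implementations use
    # the Public Suffix List to handle suffixes like "co.uk" correctly.
    return ".".join(host.lower().rstrip(".").split(".")[-2:])

def should_offer_credentials(saved_host: str, login_host: str) -> bool:
    return registrable_domain(saved_host) == registrable_domain(login_host)

# "members-x.com" is a different registrable domain than "x.com",
# so no autofill prompt ever appears on the phishing page.
print(should_offer_credentials("x.com", "members-x.com"))   # False
print(should_offer_credentials("x.com", "login.x.com"))     # True
```

That silent non-prompt is the hygiene signal: pulling the password out of the vault manually is the moment to get suspicious.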
      "image": {
        "url": "https://fly.io/blog/kurt-got-got/assets/Kurt_Got_Got.jpg",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/litestream-v050-is-here/",
      "title": "Litestream v0.5.0 is Here",
      "description": null,
      "url": "https://fly.io/blog/litestream-v050-is-here/",
      "published": "2025-10-02T00:00:00.000Z",
      "updated": "2025-10-02T18:28:21.000Z",
      "content": "<div class=\"lead\"><p><strong class=\"font-semibold text-navy-950\">I’m Ben Johnson, and I work on Litestream at Fly.io. Litestream makes it easy to build SQLite-backed full-stack applications  with resilience to server failure. It’s open source, runs anywhere, and</strong> <a href=\"https://litestream.io/\" title=\"\"><strong class=\"font-semibold text-navy-950\">it’s easy to get started</strong></a><strong class=\"font-semibold text-navy-950\">.</strong></p>\n</div>\n<p>Litestream is the missing backup/restore system for SQLite. It runs as a sidecar process in the background, alongside unmodified SQLite applications, intercepting WAL checkpoints and streaming them to object storage in real time. Your application doesn’t even know it’s there. But if your server crashes, Litestream lets you quickly restore the database to your new hardware.</p>\n\n<p>The result: you can safely build whole full-stack applications on top of SQLite.</p>\n\n<p>A few months back, we announced <a href='https://fly.io/blog/litestream-revamped/' title=''>plans for a major update to Litestream</a>. I’m psyched to announce that the first batch of those changes are now “shipping”. Litestream is  faster and now supports efficient point-in-time recovery (PITR).</p>\n\n<p>I’m going to take a beat to recap Litestream and how we got here, then talk about how these changes work and what you can expect to see with them.</p>\n<h2 id='litestream-to-litefs-to-litestream' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#litestream-to-litefs-to-litestream' aria-label='Anchor'></a><span class='plain-code'>Litestream to LiteFS to Litestream</span></h2>\n<p>Litestream is one of two big SQLite things I’ve built. 
The other one, originally intended as a sort of sequel to Litestream, is LiteFS.</p>\n\n<p>Boiled down to a sentence: LiteFS uses a FUSE filesystem to crawl further up into SQLite’s innards, using that access to perform live replication, for unmodified SQLite-backed apps.</p>\n\n<p>The big deal about LiteFS for us is that it lets you do the multiregion primary/read-replica deployment people love Postgres for: reads are fast everywhere, and writes are sane and predictable. We were excited to make this possible for SQLite, too.</p>\n\n<p>But the market has spoken! Users prefer Litestream. And honestly, we get it: Litestream is easier to run and to reason about. So we’ve shifted our focus back to it. First order of business: <a href='https://fly.io/blog/litestream-revamped/' title=''>take what we learned building LiteFS and stick as much of it as we can back into Litestream</a>.</p>\n<h2 id='the-ltx-file-format' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-ltx-file-format' aria-label='Anchor'></a><span class='plain-code'>The LTX File Format</span></h2>\n<p>Consider this basic SQL table:</p>\n<div class=\"highlight-wrapper group relative sql\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-wy16kafx\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path 
d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-wy16kafx\"><span class=\"k\">CREATE</span> <span class=\"k\">TABLE</span> <span class=\"n\">sandwiches</span> <span class=\"p\">(</span>\n    <span class=\"n\">id</span> <span class=\"nb\">INTEGER</span> <span class=\"k\">PRIMARY</span> <span class=\"k\">KEY</span> <span class=\"n\">AUTOINCREMENT</span><span class=\"p\">,</span>\n    <span class=\"n\">description</span> <span class=\"nb\">TEXT</span> <span class=\"k\">NOT</span> <span class=\"k\">NULL</span><span class=\"p\">,</span>\n    <span class=\"n\">star_rating</span> <span 
class=\"nb\">INTEGER</span><span class=\"p\">,</span> \n    <span class=\"n\">reviewer_id</span> <span class=\"nb\">INTEGER</span> <span class=\"k\">NOT</span> <span class=\"k\">NULL</span>\n<span class=\"p\">);</span>\n</code></pre>\n  </div>\n</div>\n<p>In our hypothetical, this table backs a wildly popular sandwich-reviewing app that we keep trying to get someone to write. People eat a lot of sandwiches and this table gets a lot of writes. Because it makes my point even better and it’s funny, assume people dither a lot about their sandwich review for the first couple minutes after they leave it. This Quiznos sub… is it ⭐ or ⭐⭐?</p>\n\n<p>Underneath SQLite is a B-tree. Like databases everywhere, SQLite divides storage up into disk-aligned pages, working hard to read as few pages as possible for any task while treating work done within a page as more or less free. SQLite always reads and writes in page-sized chunks.</p>\n\n<p>Our <code>sandwiches</code> table includes a feature that’s really painful for a tool like Litestream that thinks in pages: an automatically updating primary key. That key dictates that every insert into the table hits the rightmost leaf page in the underlying table B-tree. For SQLite itself, that’s no problem. But Litestream has less information to go on: it sees only a feed of whole pages it needs to archive.</p>\n\n<p>Worse still, when it comes time to restore the database – something you tend to want to happen quickly – you have to individually apply those small changes, as whole pages. Your app is down, PagerDuty is freaking out, and you’re sitting there watching Litestream reconstruct your Quiznos uncertainty a page (and an S3 fetch) at a time.</p>\n\n<p>So, LTX. Let me explain. We needed LiteFS to be transaction-aware. It relies on finer-grained information than just raw dirty pages (that’s why it needs the FUSE filesystem). 
To ship transactions, rather than pages, we invented a <a href='https://github.com/superfly/ltx' title=''>file format we call LTX</a>.</p>\n\n<p>LTX was designed as an interchange format for transactions, but for our purposes in Litestream, all we care about is that LTX files represent ordered ranges of pages, and that it supports compaction.</p>\n\n<p>Compaction is straightforward. You’ve stored a bunch of LTX files that collect numbered pages. Now you want to restore a coherent picture of the database. Just replay them newest to oldest, skipping duplicate pages (newer wins), until all changed pages are accounted for.</p>\n\n<p>Importantly, LTX isn’t limited to whole database backups. We can use LTX compaction to compress a bunch of LTX files into a single file with no duplicated pages. And Litestream now uses this capability to create a hierarchy of compactions:</p>\n\n<ul>\n<li>at Level 1, we compact all the changes in a 30-second time window\n</li><li>at Level 2, all the Level 1 files in a 5-minute window\n</li><li>at Level 3, all the Level 2’s over an hour.\n</li></ul>\n\n<p>Net result: we can restore a SQLite database to any point in time, <em>using only a dozen or so files on average</em>.</p>\n\n<p>Litestream performs this compaction itself. It doesn’t rely on SQLite to process the WAL file. Performance is limited only by I/O throughput.</p>\n<h2 id='no-more-generations' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#no-more-generations' aria-label='Anchor'></a><span class='plain-code'>No More Generations</span></h2>\n<p>What people like about Litestream is that it’s just an ordinary Unix program. But like any Unix program, Litestream can crash. It’s not supernatural, so when it’s not running, it’s not seeing database pages change. 
When it misses changes, it falls out of sync with the database.</p>\n\n<p>Lucky for us, that’s easy to detect. When it notices a gap between the database and our running “shadow-WAL” backup, Litestream resynchronizes from scratch.</p>\n\n<p>The only time this gets complicated is if you have multiple Litestreams backing up to the same destination. To keep multiple Litestreams from stepping on each other, Litestream divides backups into “generations”, creating a new one any time it resyncs. You can think of generations as Marvel Cinematic Universe parallel dimensions in which your database might simultaneously be living.</p>\n\n<p>Yeah, we didn’t like those movies much either.</p>\n\n<p>LTX-backed Litestream does away with the concept entirely. Instead, when we detect a break in WAL file continuity, we re-snapshot with the next LTX file. Now we have a monotonically incrementing transaction ID. We can use it to look up database state at any point in time, without searching across generations.</p>\n<h2 id='upgrading-to-litestream-v0-5-0' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#upgrading-to-litestream-v0-5-0' aria-label='Anchor'></a><span class='plain-code'>Upgrading to Litestream v0.5.0</span></h2>\n<p>Due to the file format changes, the new version of Litestream can’t restore from old v0.3.x WAL segment files.</p>\n\n<p>That’s OK though! The upgrade process is simple: just start using the new version. It’ll leave your old WAL files intact, in case you ever need to revert to the older version. The new LTX files are stored cleanly in an <code>ltx</code> directory on your replica.</p>\n\n<p>The configuration file is fully backwards compatible.</p>\n\n<p>There’s one small catch. We added a new constraint. You only get a single replica destination per database. 
This probably won’t affect you, since it’s how most people use Litestream already. We’ve made it official.</p>\n\n<p>The rationale: having a single source of truth simplifies development for us, and makes the tool easier to reason about. Multiple replicas can diverge and are sensitive to network availability. Conflict resolution is brain surgery.</p>\n\n<p>Litestream commands still work the same. But you’ll see references to “transaction IDs” (TXID) for LTX files, rather than the <code>generation/index/offset</code> we used previously with WAL segments.</p>\n\n<p>We’ve also changed <code>litestream wal</code> to <code>litestream ltx</code>.</p>\n<h2 id='other-stuff-v0-5-0-does-better' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#other-stuff-v0-5-0-does-better' aria-label='Anchor'></a><span class='plain-code'>Other Stuff v0.5.0 Does Better</span></h2>\n<p>We’ve beefed up the <a href='https://github.com/superfly/ltx' title=''>underlying LTX file format library</a>. It used to be that an LTX file was just a sorted list of pages, all compressed together. Now we compress per-page, and keep an index at the end of the LTX file to pluck individual pages out.</p>\n\n<p>You’re not seeing it yet, but we’re excited about this change: we can operate page-granularly even when dealing with large LTX files. This allows for more features. A good example: we can build features that query from any point in time, without downloading the whole database.</p>\n\n<p>We’ve also gone back through old issues & PRs to improve quality-of-life. CGO is now gone. We’ve settled the age-old contest between <code>mattn/go-sqlite3</code> and <code>modernc.org/sqlite</code> in favor of <code>modernc.org</code>. 
This is super handy for people with automated build systems that want to run from a MacBook but deploy on an x64 server, since it lets the cross-compiler work.</p>\n\n<p>We’ve also added a replica type for NATS JetStream. Users that already have JetStream running can get Litestream going without adding an object storage dependency.</p>\n\n<p>And finally, we’ve upgraded all our clients (S3, Google Storage, & Azure Blob Storage) to their latest versions. We’ve also moved our code to support newer S3 APIs.</p>\n<h2 id='whats-next' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#whats-next' aria-label='Anchor'></a><span class='plain-code'>What’s next?</span></h2>\n<p>The next major feature we’re building out is a Litestream VFS for read replicas. This will let you instantly spin up a copy of the database and immediately read pages from S3 while the rest of the database is hydrating in the background.</p>\n\n<p>We already have a proof of concept working and we’re excited to show it off when it’s ready!</p>",
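The compaction rule from the LTX section (replay newest to oldest, newer page wins) is simple enough to sketch in a few lines. This is a toy illustration of the idea only, not Litestream's actual Go implementation; an "LTX file" is modeled here as a plain dict of page number → page bytes:

```python
# Toy model of LTX compaction (illustration only; the real code is Go
# and works on the binary LTX format). Each "LTX file" is just a dict
# of {page_number: page_bytes}.

def compact(ltx_files):
    """Merge LTX files given oldest-first in the input list.

    Replaying newest to oldest and keeping the first copy seen of each
    page implements the "newer wins" rule described in the post.
    """
    merged = {}
    for ltx in reversed(ltx_files):      # newest file first
        for page_no, data in ltx.items():
            if page_no not in merged:    # skip pages a newer file already wrote
                merged[page_no] = data
    return dict(sorted(merged.items()))  # ordered range of pages

# Three transactions touching overlapping pages:
files = [
    {1: b"p1v1", 2: b"p2v1"},   # oldest
    {2: b"p2v2"},               # the Quiznos rating, dithered once
    {2: b"p2v3", 3: b"p3v1"},   # newest
]
print(compact(files))  # {1: b'p1v1', 2: b'p2v3', 3: b'p3v1'}
```

The same merge applied within a time window is what produces the Level 1/2/3 compaction files, which is why a point-in-time restore only ever has to touch a dozen or so of them.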
      "image": {
        "url": "https://fly.io/blog/litestream-v050-is-here/assets/litestream-v050-is-here.jpg",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/build-better-agents-with-morphllm/",
      "title": "Build Better Agents With MorphLLM",
      "description": null,
      "url": "https://fly.io/blog/build-better-agents-with-morphllm/",
      "published": "2025-08-25T00:00:00.000Z",
      "updated": "2025-09-03T19:05:57.000Z",
      "content": "<p>I’m an audiophile, which is a nice way to describe someone who spends their children’s college fund on equipment that yields no audible improvement in sound quality. As such, I refused to use wireless headphones for the longest time. The fun thing about wired headphones is when you forget they’re on and you stand up, you simultaneously cause irreparable neck injuries and extensive property damage. This eventually prompted me to buy good wireless headphones and, you know what, I break fewer things now. I can also stand up from my desk and not be exposed to the aural horrors of the real world. </p>\n\n<p>This is all to say, sometimes you don’t know how big a problem is until you solve it. This week, I chatted to the fine people building <a href='https://morphllm.com/' title=''>MorphLLM</a>, which is exactly that kind of solution for AI agent builders. </p>\n<h2 id='slow-wasteful-and-expensive-ai-code-changes' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#slow-wasteful-and-expensive-ai-code-changes' aria-label='Anchor'></a><span class='plain-code'>Slow, Wasteful and Expensive AI Code Changes</span></h2>\n<p>If you’re building AI agents that write or edit code, you’re probably accepting the following as “the way it is”: Your agent needs to correct a single line of code, but rewrites an entire file to do it. Search-and-replace right? It’s fragile, breaks formatting, silently fails, or straight up leaves important functions out. The result is slow, inaccurate code changes, excessive token use, and an agent feels incompetent and unreliable.</p>\n\n<p>Full file rewrites are context-blind and prone to hallucinations, especially when editing that 3000+ line file that you’ve been meaning to refactor. 
And every failure and iteration is wasted compute, wasted money and worst of all, wasted time.</p>\n<h2 id='why-we-arent-thinking-about-this-or-why-i-wasnt' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#why-we-arent-thinking-about-this-or-why-i-wasnt' aria-label='Anchor'></a><span class='plain-code'>Why We Aren’t Thinking About This (or why I wasn’t)</span></h2>\n<p>AI workflows are still new to everyone. Best practices are still just opinions and most tooling is focused on model quality, not developer velocity or cost. This is a big part of why we feel that slow, wasteful code edits are just the price of admission for AI-powered development.</p>\n\n<p>In reality, these inefficiencies become a real bottleneck for coding agent tools. The hidden tax on every code edit adds up and your users pay with their time, especially as teams scale and projects grow more complex.</p>\n<h2 id='better-faster-ai-code-edits-with-morph-fast-apply' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#better-faster-ai-code-edits-with-morph-fast-apply' aria-label='Anchor'></a><span class='plain-code'>Better, Faster AI Code Edits with Morph Fast Apply</span></h2>\n<p>MorphLLM’s core innovation is Morph Fast Apply. It’s an edit merge tool that is semantic, structure-aware and designed specifically for code. Those are big words to describe a tool that will empower your agents to make single line changes without  rewriting whole files or relying on brittle search-and-replace. Instead, your agent applies precise, context-aware edits and it does it ridiculously fast. 
</p>\n\n<p>It works like this: </p>\n\n<ul>\n<li>You add an <code>edit_file</code> tool to your agent’s tools.\n</li><li>Your agent outputs tiny <code>edit_file</code> snippets, using <code>//...existing code...</code> placeholders to indicate unchanged code.\n</li><li>Your backend calls Morph’s Apply API, which merges the changes semantically. It doesn’t just replace raw text, it makes targeted merges with the code base as context. \n</li><li>You write back the precisely edited file. No manual patching, no painful conflict resolution, no context lost.\n</li></ul>\n<h2 id='the-numbers' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-numbers' aria-label='Anchor'></a><span class='plain-code'>The Numbers</span></h2>\n<p>MorphLLM’s Apply API processes over 4,500 tokens per second and their benchmark results are nuts. We’re talking 98% accuracy in ~6 seconds per file. Compare this to 35s (with error corrections) at 86% accuracy for traditional search-and-replace systems. Files up to 9k tokens in size take ~4 seconds to process. </p>\n\n<p>Just look at the damn <a href='https://morphllm.com/benchmarks' title=''>graph</a>:</p>\n\n<p><img alt=\"Time Performance Analysis\" src=\"/blog/build-better-agents-with-morphllm/assets/morph_graph.webp\" /></p>\n\n<p>These are game-changing numbers for agent builders. Real-time code UIs become possible. Dynamic codebases can self-adapt in seconds, not minutes. 
Scale to multi-file edits, documentation, and even large asset transformations without sacrificing speed or accuracy.</p>\n<h2 id='how-to-get-in-on-the-morphllm-action' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#how-to-get-in-on-the-morphllm-action' aria-label='Anchor'></a><span class='plain-code'>How to Get in on the MorphLLM Action</span></h2>\n<p>Integration with your project is easy peasy. MorphLLM is API-compatible with OpenAI, Vercel AI SDK, MCP, and OpenRouter. You can run it in the cloud, self-host, or go on-prem with enterprise-grade guarantees. </p>\n\n<p>I want to cloud host mine, if only I could think of somewhere I could quickly and easily deploy wherever I want and only pay for when I’m using the infra 😉.</p>\n<h2 id='get-morphed' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#get-morphed' aria-label='Anchor'></a><span class='plain-code'>Get Morphed</span></h2>\n<p>MorphLLM feels like a plug-in upgrade for code agent projects that will instantly make them faster and more accurate. Check out the docs, benchmarks, and integration guides at <a href='https://docs.morphllm.com/' title=''>docs.morphllm.com</a>. Get started for free at <a href=\"https://morphllm.com/dashboard\">https://morphllm.com/dashboard</a> </p>",
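The four-step flow in the MorphLLM post can be sketched as data shapes. Everything here is a hedged illustration: the model name (`morph-fast-apply`) and the request field names are assumptions for the sketch, not Morph's documented API, so check docs.morphllm.com for the real shapes. What matters is the structure: the agent emits only the changed region, with a placeholder standing in for untouched code.

```python
# Hedged sketch of a Fast Apply round trip; field and model names are
# illustrative assumptions, not Morph's documented API.

original_file = """function add(a, b) {
  return a + b;
}

function sub(a, b) {
  return a + b;   // bug: should subtract
}
"""

# The agent's edit_file output: only the changed function, with the
# placeholder marking all the code it is leaving alone.
update_snippet = """// ...existing code...
function sub(a, b) {
  return a - b;
}
"""

apply_request = {
    "model": "morph-fast-apply",        # assumed model name
    "original_code": original_file,     # assumed field name
    "update_snippet": update_snippet,   # assumed field name
}

# The Apply API merges these server-side (semantically, with the codebase
# as context) and returns the full edited file for your backend to write
# back to disk.
print(sorted(apply_request))  # ['model', 'original_code', 'update_snippet']
```

The token savings come from the snippet: the agent sends a handful of lines instead of re-emitting the whole file on every edit.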
      "image": {
        "url": "https://fly.io/blog/build-better-agents-with-morphllm/assets/morphllm.webp",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/trust-calibration-for-ai-software-builders/",
      "title": "Trust Calibration for AI Software Builders",
      "description": null,
      "url": "https://fly.io/blog/trust-calibration-for-ai-software-builders/",
      "published": "2025-08-18T00:00:00.000Z",
      "updated": "2025-08-19T08:30:16.000Z",
      "content": "<div class=\"lead\"><p>Trust calibration is a concept from the world of human-machine interaction design, one that is super relevant to AI software builders. Trust calibration is the practice of aligning the level of trust that users have in our products with its actual capabilities. </p>\n</div>\n<p>If we build things that our users trust too blindly, we risk facilitating dangerous or destructive interactions that can permanently turn users off. If they don’t trust our product enough, it will feel useless or less capable than it actually is. </p>\n\n<p>So what does trust calibration look like in practice and how do we achieve it? A 2023 study reviewed over 1000 papers on trust and trust calibration in human / automated systems (properly referenced at the end of this article). It holds some pretty eye-opening insights – and some inconvenient truths – for people building AI software. I’ve tried to extract just the juicy bits below.   </p>\n<h2 id='limiting-trust' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#limiting-trust' aria-label='Anchor'></a><span class='plain-code'>Limiting Trust</span></h2>\n<p>Let’s begin with a critical point. There is a limit to how deeply we want users to trust our products. Designing for calibrated trust is the goal, not more trust at any cost. 
Shoddy trust calibration leads to two equally undesirable outcomes: </p>\n\n<ul>\n<li><strong class='font-semibold text-navy-950'>Over-trust</strong> causes users to rely on AI systems in situations where they shouldn’t (I told my code assistant to fix a bug in prod and went to bed).\n</li><li><strong class='font-semibold text-navy-950'>Under-trust</strong> causes users to reject AI assistance even when it would be beneficial, resulting in reduced perception of value and increased user workload.\n</li></ul>\n\n<p>What does calibrated trust look like for your product? It’s important to understand that determining this is less about trying to diagram a set of abstract trust parameters and more about helping users develop accurate mental models of your product’s capabilities and limitations. In most cases, this requires thinking beyond the trust calibration mechanisms we default to, like confidence scores. </p>\n\n<p>For example, Cursor’s most prominent trust calibration mechanism is its change suggestion highlighting. The code that the model suggests we change is highlighted in red, followed by suggested changes highlighted in green. This  immediately communicates that “this is a suggestion, not a command.” </p>\n\n<p>In contrast, Tesla’s Autopilot is a delegative system. It must calibrate trust differently through detailed capability explanations, clear operational boundaries (only on highways), and prominent disengagement alerts when conditions exceed system limits. 
</p>\n<h2 id='building-cooperative-systems' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#building-cooperative-systems' aria-label='Anchor'></a><span class='plain-code'>Building Cooperative Systems</span></h2>\n<p>Perhaps the most fundamental consideration in determining high-level trust calibration objectives is deciding whether your project is designed to be a cooperative or a delegative tool. </p>\n\n<p>Cooperative systems generally call for lower levels of trust because users can choose whether to accept or reject AI suggestions. But these systems also face a unique risk. It’s easy for over-trust to gradually transform user complacency into over-reliance, effectively turning what we designed as a cooperative relationship into a delegative one, only without any of the required safeguards.</p>\n\n<p>If you’re building a coding assistant, content generator, or design tool, implement visible “suggestion boundaries” that make it clear when the AI is offering ideas versus making decisions. Grammarly does this well by underlining suggestions rather than auto-correcting, and showing rationale on hover. </p>\n\n<p>For higher-stakes interactions, consider introducing friction. 
Require explicit confirmation before applying AI suggestions to production code or publishing AI-generated content.</p>\n<h2 id='building-delegative-systems' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#building-delegative-systems' aria-label='Anchor'></a><span class='plain-code'>Building Delegative Systems</span></h2>\n<p>In contrast, users expect delegative systems to replace human action entirely. Blind trust in the system is a requirement for it to be considered valuable at all. </p>\n\n<p>If you’re building automation tools, smart scheduling, or decision-making systems, invest heavily in capability communication and boundary setting. Calendly’s smart scheduling works because it clearly communicates what it will and won’t do (I’ll find times that work for both of us vs. I’ll reschedule your existing meetings). Build robust fallback mechanisms and make system limitations prominent in your onboarding.  </p>\n<h2 id='timing-is-everything' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#timing-is-everything' aria-label='Anchor'></a><span class='plain-code'>Timing Is Everything</span></h2>\n<p>The study suggests that when we make trust calibrations is at least as important as how. There are three critical windows for trust calibration, each with their own opportunities and challenges. </p>\n\n<ul>\n<li><strong class='font-semibold text-navy-950'>Pre-interaction calibration</strong> happens before users engage with the system. Docs and tutorials fall into this category. 
Setting expectations up front can prevent initial over-trust, which is disproportionately more difficult to correct later. \n</li></ul>\n\n<blockquote>\n<p>Pre-interaction calibrations could look like capability-focused onboarding that shows both successes and failures. Rather than just demonstrating perfect AI outputs, show users examples where the AI makes mistakes and how to catch them. </p>\n</blockquote>\n\n<ul>\n<li><strong class='font-semibold text-navy-950'>During-interaction calibration</strong> is trust adjustment through real-time feedback. Dynamically updated cues improve trust calibration more than static displays, and adaptive calibration that responds to user behavior outperforms systems that display static information. \n</li></ul>\n\n<blockquote>\n<p>Build confidence indicators that are updated based on context, not just model confidence. For example, if you’re building a document AI, show higher confidence for standard document types the system has seen thousands of times, and lower confidence for unusual formats. </p>\n</blockquote>\n\n<ul>\n<li><strong class='font-semibold text-navy-950'>Post-interaction calibration</strong> focuses on learning and adjustment that helps users understand successes and failures in the system after interactions. These calibrations aren’t reliable, since by the time users receive the information, their trust patterns are set and hard to change. \n</li></ul>\n\n<blockquote>\n<p>Post-interaction feedback can still be valuable for teaching. Create “reflection moments” after significant interactions. Midjourney does this by letting users rate image outputs, helping users learn what prompts work best while calibrating their expectations for future generations. </p>\n</blockquote>\n\n<p>Trust is front-loaded and habit-driven. The most effective calibration happens before and during use, when expectations are still forming and behaviors can still be shifted. 
Any later and you’re mostly fighting entrenched patterns.</p>\n<h2 id='performance-vs-process-information' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#performance-vs-process-information' aria-label='Anchor'></a><span class='plain-code'>Performance vs. Process Information</span></h2>\n<p>Users can be guided through performance-oriented signals (what the system can do) or process-oriented signals (how it works). The real challenge is matching the right kind of explanation to the right user, at the right moment.</p>\n\n<ul>\n<li><strong class='font-semibold text-navy-950'>Performance-oriented calibration</strong> focuses on communicating capability through mechanisms like reliability statistics, confidence scores, and clear capability boundaries. \n</li><li><strong class='font-semibold text-navy-950'>Process-oriented calibration</strong> offers detailed explanations of decision-making processes, breakdowns of which factors influenced decisions, and reasoning transparency. \n</li></ul>\n\n<p>Process transparency seems like the obvious go-to at first glance, but the effectiveness of process explanations varies wildly based on user expertise and domain knowledge. If we are designing for a set of users that may fall anywhere on this spectrum, we have to avoid creating information overload for novice users while providing sufficient information to expert users who want the detail.  
</p>\n\n<p>The most effective systems in the study combined both approaches, providing layered information that allows users to access the level of detail most appropriate for their expertise and current needs.</p>\n<h2 id='static-vs-adaptive-calibration' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#static-vs-adaptive-calibration' aria-label='Anchor'></a><span class='plain-code'>Static vs. Adaptive Calibration</span></h2>\n<p>I really wanted to ignore this part, because it feels like the study’s authors are passive-aggressively adding todos to my projects. In a nutshell, adaptive calibration – when a system actively monitors user behavior and adjusts its communication accordingly – is orders of magnitude more effective than static calibration, which delivers the same information to every user, regardless of differences in expertise, trust propensity, or behavior. </p>\n\n<p>Static calibration mechanisms are easy to build and maintain, which is why we like them. But the stark reality is that they put the burden of appropriate calibration entirely on our users. 
We’re making it their job to adapt their behaviour based on generic information.</p>\n\n<p>This finding has zero respect for our time or mental health, but it also reveals a legit opportunity for clever builders to truly separate their product from the herd.</p>\n<h2 id='practical-adaptive-calibration-techniques' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#practical-adaptive-calibration-techniques' aria-label='Anchor'></a><span class='plain-code'>Practical adaptive calibration techniques</span></h2>\n<ul>\n<li><strong class='font-semibold text-navy-950'>Behavioral adaptation:</strong> Track how often users accept vs. reject suggestions and adjust confidence thresholds accordingly. If a user consistently rejects high-confidence suggestions, lower the threshold for showing uncertainty.\n</li><li><strong class='font-semibold text-navy-950'>Context awareness:</strong> Adjust trust signals based on use context. 
A writing AI might show higher confidence for grammar fixes than creative suggestions, or lower confidence late at night when users might be tired.\n</li><li><strong class='font-semibold text-navy-950'>Detect expertise:</strong> Users who frequently make sophisticated edits to AI output probably want more detailed explanations than those who typically accept entire file rewrites.\n</li></ul>\n<h2 id='the-transparency-paradox' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-transparency-paradox' aria-label='Anchor'></a><span class='plain-code'>The Transparency Paradox</span></h2>\n<p>The idea that transparency and explainability can actually harm trust calibration is easily the point that hit me the hardest. While explanations can improve user understanding, they can also create information overload that reduces users’ ability to detect and correct trash output. What’s worse, explanations can create a whole new layer of trust calibration issues, with users over-trusting the explanation mechanism itself, rather than critically evaluating the actual output.</p>\n\n<p>This suggests that quality over quantity should be our design philosophy when it comes to transparency. We should provide carefully crafted, relevant information rather than comprehensive but overwhelming detail. 
The goal should be enabling better decision-making rather than simply satisfying user curiosity about system internals.</p>\n<h2 id='anthropomorphism-and-unwarranted-trust' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#anthropomorphism-and-unwarranted-trust' aria-label='Anchor'></a><span class='plain-code'>Anthropomorphism and Unwarranted Trust</span></h2>\n<p>It seems obvious that we should make interactions with our AI project feel as human as possible. Well, it turns out that systems that appear more human-like through design, language, or interaction patterns are notoriously good at increasing user trust beyond actual system capabilities. </p>\n\n<p>So it’s entirely possible that building more traditional human-computer interactions can actually make our AI projects safer to use and, therefore, more user-friendly. </p>\n\n<ul>\n<li><strong class='font-semibold text-navy-950'>Use tool-like language:</strong> Frame outputs as “analysis suggests” rather than “I think” or “I believe”\n</li><li><strong class='font-semibold text-navy-950'>Embrace machine-like precision:</strong> Show exact confidence percentages rather than human-like hedging (“I’m pretty sure that…”)\n</li></ul>\n<h2 id='trust-falls-faster-than-it-climbs' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#trust-falls-faster-than-it-climbs' aria-label='Anchor'></a><span class='plain-code'>Trust Falls Faster Than It Climbs</span></h2>\n<p>Nothing particularly groundbreaking here, but the findings are worth mentioning if only to reinforce what we think we know. 
</p>\n\n<p>Early interactions are critically important. Users form mental models quickly and then react slowly to changes in system reliability.</p>\n\n<p>More critically, trust drops much faster from system failures than it builds from successes. These asymmetries suggest that we should invest disproportionately in onboarding and first-use experiences, even if they come with higher development costs.</p>\n<h2 id='measurement-is-an-opportunity-for-innovation' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#measurement-is-an-opportunity-for-innovation' aria-label='Anchor'></a><span class='plain-code'>Measurement is an Opportunity for Innovation</span></h2>\n<p>The study revealed gaping voids where effective measurement mechanisms and protocols should be, for both researchers and builders. There is a clear need to move beyond simple user satisfaction metrics or adoption rates to developing measurement frameworks that can actively detect miscalibrated trust patterns. </p>\n\n<p>The ideal measurement approach would combine multiple indicators. A few examples of viable indicators are:</p>\n\n<ul>\n<li><strong class='font-semibold text-navy-950'>Behavioral signals:</strong> Track acceptance rates for different confidence levels. Well-calibrated trust should show higher acceptance rates for high-confidence outputs and lower rates for low-confidence ones.\n</li><li><strong class='font-semibold text-navy-950'>Context-specific metrics:</strong> Measure trust calibration separately for different use cases. 
Users might be well-calibrated for simple tasks but poorly calibrated for complex ones.\n</li><li><strong class='font-semibold text-navy-950'>User self-reporting:</strong> Regular pulse surveys asking “How confident are you in your ability to tell when this AI makes mistakes?” can reveal calibration gaps.\n</li></ul>\n<h2 id='the-calibrated-conclusion' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-calibrated-conclusion' aria-label='Anchor'></a><span class='plain-code'>The Calibrated Conclusion</span></h2>\n<p>It’s clear, at least from this study, that there’s no universal formula or single feature that will effectively calibrate trust. It’s up to every builder to define and understand their project’s trust goals and to balance timing, content, adaptivity, and transparency accordingly. That’s what makes it both hard and worth doing. Trust calibration has to be a core part of our product’s identity, not a piglet we only start chasing once it has escaped the barn.</p>\n\n<p><strong class='font-semibold text-navy-950'>The Study:</strong></p>\n\n<p>Magdalena Wischnewski, Nicole Krämer, and Emmanuel Müller. 2023. Measuring and Understanding Trust Calibrations for Automated Systems: A Survey of the State-Of-The-Art and Future Directions. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI ’23), April 23–28, 2023, Hamburg, Germany. ACM, New York, NY, USA, 16 pages. <a href=\"https://doi.org/10.1145/3544548.3581197\">https://doi.org/10.1145/3544548.3581197</a></p>",
      "image": {
        "url": "https://fly.io/blog/trust-calibration-for-ai-software-builders/assets/trust_calibration.webp",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/games-as-model-eval/",
      "title": "Games as Model Eval: 1-Click Deploy AI Town on Fly.io",
      "description": null,
      "url": "https://fly.io/blog/games-as-model-eval/",
      "published": "2025-08-11T00:00:00.000Z",
      "updated": "2025-08-15T08:35:19.000Z",
      "content": "<div class=\"lead\"><p>Recently, I suggested that <a href=\"https://fly.io/blog/the-future-isn-t-model-agnostic/\" title=\"\">The Future Isn’t Model Agnostic</a>, that it’s better to pick one model that works for your project and build around it, rather than engineering for model flexibility. If you buy that, you also have to acknowledge how important comprehensive model evaluation becomes. </p>\n</div>\n<p>Benchmarks tell us almost nothing about how a model will actually behave in the wild, especially with long contexts, or when trusted to deliver the tone and feel that defines the UX we’re shooting for. Even the best evaluation pipelines usually end in subjective, side-by-side output comparisons. Not especially rigorous, and more importantly, boring af.</p>\n\n<p>Can we gamify model evaluation? Oh yes. And not just because we get to have some fun for once. Google backed me up this week when it announced the <a href='https://blog.google/technology/ai/kaggle-game-arena/' title=''>Kaggle Game Arena</a>. A public platform where we can watch AI models duke it out in a variety of classic games.  Quoting Google; “Current AI benchmarks are struggling to keep pace with modern models… it can be hard to know if models trained on internet data are actually solving problems or just remembering answers they’ve already seen.”</p>\n\n<p>When models boss reading comprehension tests, or ace math problems, we pay attention. But when they fail to navigate a simple conversation with a virtual character or completely botch a strategic decision in a game environment, we tell ourselves we’re not building a game anyway and develop strategic short-term memory loss. \nJust like I’ve told my mom a thousand times, games are great at testing brains, and it’s time we take this seriously when it comes to model evaluation. 
</p>\n<h2 id='why-games-dont-lie' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#why-games-dont-lie' aria-label='Anchor'></a><span class='plain-code'>Why Games Don’t Lie</span></h2>\n<p>Games provide what benchmarks can’t: “a clear, unambiguous signal of success.” They give us observable behavior in dynamic environments, the kind that would be extremely difficult (and tedious) to simulate with prompt engineering alone.</p>\n\n<p>Games force models to demonstrate the skills we actually care about: strategic reasoning, long-term planning, and dynamic adaptation in interactions with an opponent or a collaborator. </p>\n<h2 id='pixel-art-meets-effective-model-evaluation-ai-town-on-fly-io' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#pixel-art-meets-effective-model-evaluation-ai-town-on-fly-io' aria-label='Anchor'></a><span class='plain-code'>Pixel Art Meets Effective Model Evaluation - AI Town on Fly.io</span></h2>\n<p>AI Town is a brilliant project by <a href='https://github.com/a16z-infra' title=''>a16z-infra</a>, based on the mind-bending paper, <a href='https://arxiv.org/pdf/2304.03442' title=''>Generative Agents: Interactive Simulacra of Human Behavior</a>. It’s a beautifully rendered little town in which tiny people with AI brains and engineered personalities go about their lives, interacting with each other and their environment. Characters need to remember past conversations, maintain relationships, react dynamically to new situations, and stay in character while doing it all. 
</p>\n\n<p>I challenge you to find a more entertaining way of evaluating conversational models. </p>\n\n<p>I’ve <a href='https://github.com/fly-apps/ai-town_on_fly.io' title=''>forked the project</a> to make it absurdly easy to spin up your own AI Town on Fly Machines. You’ve got a single deploy script that will set everything up for you and some built-in cost and performance optimizations, with our handy scale-to-zero functionality as standard (so you only pay for the time spent running it). This makes it easy to share with your team, your friends, and your mom. </p>\n\n<p>In its current state, the fork makes it as easy as possible to test any OpenAI-compatible service, any model on Together.ai, and even custom embedding models. Simply set the relevant API key in your secrets. </p>\n\n<p>Games like AI Town give us a window into how models actually think, adapt, and behave beyond the context of our prompts. You move past performance metrics and begin to understand a model’s personality, quirks, strengths, and weaknesses – all factors that ultimately shape your project’s UX. </p>",
      "image": {
        "url": "https://fly.io/blog/games-as-model-eval/assets/Fly_Man.webp",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/the-future-isn-t-model-agnostic/",
      "title": "The Future Isn't Model Agnostic",
      "description": null,
      "url": "https://fly.io/blog/the-future-isn-t-model-agnostic/",
      "published": "2025-08-08T00:00:00.000Z",
      "updated": "2025-08-22T16:31:43.000Z",
      "content": "<div class=\"lead\"><p>Your users don’t care that your AI project is model \nagnostic. </p>\n</div>\n<p>In my last project, I spent countless hours ensuring that the LLMs running my services  could be swapped out as easily as possible. I couldn’t touch a device with an internet connection without hearing about the latest benchmark-breaking model and it felt like a clear priority to ensure I could hot swap models with minimal collateral damage.</p>\n\n<p>So yeah. That was a waste of time.</p>\n\n<p>The hype around new model announcements feels more manufactured with each release. In reality, improvements are becoming incremental. As major providers converge on the same baseline, the days of one company holding a decisive lead are numbered.</p>\n\n<p>In a world of model parity, the differentiation moves entirely to the product layer. Winning isn’t about ensuring you’re using the best model, its about understanding your chosen model deeply enough to build experiences that feel magical. Knowing exactly how to prompt for consistency, which edge cases to avoid, and how to design workflows that play to your model’s particular strengths</p>\n\n<p>Model agnosticism isn’t just inefficient, it’s misguided. Fact is, swapping out your model is not just changing an endpoint. It’s rewriting prompts, rerunning evals, users telling you things just feel… different. And if you’ve won users on the way it feels to use your product, that last one is a really big deal.</p>\n<h2 id='model-lt-product' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#model-lt-product' aria-label='Anchor'></a><span class='plain-code'>Model < Product</span></h2>\n<p>Recently, something happened that fully solidified this idea in my head. 
Claude Code is winning among people building real things with AI. We even have evangelists in the Fly.io engineering team, and those guys are weird smart. Elsewhere, whole communities have formed to share and compare claude.md’s and fight each other over which MCP servers are the coolest to use with Claude.</p>\n\n<p>Enter stage right, Qwen 3 Coder. It takes Claude to the cleaners in benchmarks. But the response from the Claude Code user base? A collective meh.</p>\n\n<p>This is nothing like 2024, when everyone would have dropped everything to get the hot new model running in Cursor. And it’s not because we’ve learned that benchmarks are performance theater for people who’ve never shipped a product.</p>\n\n<p>It’s because products like Claude Code are irrefutable evidence that the model isn’t the product. We’ve felt it first hand when our pair programmer’s behaviour changes in subtle ways. The product is in the rituals. The trust. The predictability. It’s precisely because Claude Code’s model behavior, UI, and user expectations are so tightly coupled that its users don’t really care that a better model might exist.</p>\n\n<p>I’m not trying to praise Anthropic here. The point is, engineering for model agnosticism is a trap that will eat up time that could be better spent … anywhere else.</p>\n\n<p>Sure, if you’re building infra or anything else that lives close to the metal, model optionality still matters. But people trusting legwork to AI tools are building deeper relationships and expectations of their AI tools than they even care to admit. 
AI product success stories are written when products become invisible parts of users’ daily rituals, not showcases for engineering flexibility.</p>\n<h2 id='make-one-model-your-own' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#make-one-model-your-own' aria-label='Anchor'></a><span class='plain-code'>Make One Model Your Own</span></h2>\n<p>As builders, it’s time we stop hedging our bets and embrace the convergence reality. Every startup pitch deck with ‘model-agnostic’ as a feature should become a red flag for investors who understand product-market fit. Stop putting ‘works with any LLM’ in your one-liner. It screams ‘we don’t know what we’re building.’</p>\n\n<p>If you’re still building model-agnostic AI tools in 2025, you’re optimizing for the wrong thing. Users don’t want flexibility; they want reliability. And in a converged model landscape, reliability comes from deep specialization, not broad compatibility.</p>\n\n<p>Pick your model like you pick your therapist: for the long haul. Find the right model, tune deeply, get close enough to understand its quirks and make them work for you. Stop architecting for the mythical future where you’ll seamlessly swap models. 
That future doesn’t exist, and chasing it is costing you the present.</p>\n<h2 id='bonus-level-all-in-on-one-model-means-all-out-on-eval' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#bonus-level-all-in-on-one-model-means-all-out-on-eval' aria-label='Anchor'></a><span class='plain-code'>Bonus level: All-in On One Model Means All-out On Eval</span></h2>\n<p>If any of this is landing for you, you’ll agree that we have to start thinking of model evaluation as architecture, not an afterthought. The good news is, rigorous model eval doesn’t have to be mind-numbing anymore. </p>\n\n<p>Turns out, games are really great eval tools! Now you can spin up your very own little <a href='https://github.com/fly-apps/ai-town_on_fly.io' title=''>AI Town</a> on Fly.io with a single-click deploy to test different models as pixel people in an evolving environment. I discuss the idea further in <a href='https://fly.io/blog/games-as-model-eval/' title=''>Games as Model Eval: 1-Click Deploy AI Town on Fly.io</a>.</p>",
      "image": {
        "url": "https://fly.io/blog/the-future-isn-t-model-agnostic/assets/Whack_A_Mole_.webp",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/phoenix-new-the-remote-ai-runtime/",
      "title": "Phoenix.new – The Remote AI Runtime for Phoenix",
      "description": null,
      "url": "https://fly.io/blog/phoenix-new-the-remote-ai-runtime/",
      "published": "2025-06-20T00:00:00.000Z",
      "updated": "2025-06-24T17:23:07.000Z",
      "content": "<div class=\"lead\"><p>I’m Chris McCord, the creator of Elixir’s Phoenix framework. For the past several months, I’ve been working on a skunkworks project at Fly.io, and it’s time to show it off.</p>\n</div>\n<p>I wanted LLM agents to work just as well with Elixir as they do with Python and JavaScript. Last December, in order to figure out what that was going to take, I started a little weekend project to find out how difficult it would be to build a coding agent in Elixir.</p>\n\n<p>A few weeks later, I had it spitting out working Phoenix applications and driving a full in-browser IDE. I knew this wasn’t going to stay a weekend project.</p>\n\n<p>If you follow me on Twitter, you’ve probably seen me teasing this work as it picked up steam. We’re at a point where we’re pretty serious about this thing, and so it’s time to make a formal introduction.</p>\n\n<p>World, meet <a href='https://phoenix.new' title=''>Phoenix.new</a>, a batteries-included fully-online coding agent tailored to Elixir and Phoenix. 
I think it’s going to be the fastest way to build collaborative, real-time applications.</p>\n\n<p>Let’s see it in action:</p>\n<div class=\"youtube-container\" data-exclude-render>\n  <div class=\"youtube-video\">\n    <iframe\n      width=\"100%\"\n      height=\"100%\"\n      src=\"https://www.youtube.com/embed/du7GmWGUM5Y\"\n      frameborder=\"0\"\n      allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture\"\n      allowfullscreen>\n    </iframe>\n  </div>\n</div>\n\n<h2 id='whats-interesting-about-phoenix-new' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#whats-interesting-about-phoenix-new' aria-label='Anchor'></a><span class='plain-code'>What’s Interesting About Phoenix.new</span></h2>\n<p>First, even though it runs entirely in your browser, Phoenix.new gives both you and your agent a root shell, in an ephemeral virtual machine (a <a href='https://fly.io/docs/machines/overview/' title=''>Fly Machine</a>) that gives our agent loop free rein to install things and run programs  — without any risk of messing up your local machine. You don’t think about any of this; you just open up the VSCode interface, push the shell button, and there you are, on the isolated machine you share with the Phoenix.new agent.</p>\n\n<p>Second, it’s an agent system I built specifically for Phoenix. Phoenix is about real-time collaborative applications, and Phoenix.new knows what that means. To that end, Phoenix.new includes, in both its UI and its agent tools, a full browser. The Phoenix.new agent uses that browser “headlessly” to check its own front-end changes and interact with the app. 
Because it’s a full browser, instead of trying to iterate on screenshots, the agent sees real page content and JavaScript state – with or without a human present.</p>\n<h2 id='what-root-access-gets-us' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-root-access-gets-us' aria-label='Anchor'></a><span class='plain-code'>What Root Access Gets Us</span></h2>\n<p>Agents build software the way you did when you first got started, the way you still do today when you prototype things. They don’t carefully design Docker container layers and they don’t really do release cycles. An agent wants to pop a shell and get its fingernails dirty.</p>\n\n<p>A fully isolated virtual machine means Phoenix.new’s fingernails can get <em>arbitrarily dirty.</em> If it wants to add a package to <code>mix.exs</code>, it can do that and then run <code>mix phx.server</code> or <code>mix test</code> and check the output. Sure. Every agent can do that. But if it wants to add an APT package to the base operating system, it can do that too, and make sure it worked. It owns the whole environment.</p>\n\n<p>This offloads a huge amount of tedious, repetitive work.</p>\n\n<p>At his <a href='https://youtu.be/LCEmiRjPEtQ?si=sR_bdu6-AqPXSNmY&t=1902' title=''>AI Startup School talk last week</a>, Andrej Karpathy related his experience of building a restaurant menu visualizer, which takes camera pictures of text menus and transforms all the menu items into pictures. The code, which he vibe-coded with an LLM agent, was the easy part; he had it working in an afternoon. But getting the app online took him a whole week.</p>\n\n<p>With Phoenix.new, I’m taking dead aim at this problem. The apps we produce live in the cloud from the minute they launch. 
They have private, shareable URLs (we detect anything the agent generates with a bound port and give it a preview URL underneath <code>phx.run</code>, with integrated port-forwarding), they integrate with GitHub, and they inherit all the infrastructure guardrails of Fly.io: hardware virtualization, WireGuard, and isolated networks.</p>\n<div class=\"callout\"><p>GitHub’s <code>gh</code> CLI is installed by default. So the agent knows how to clone any repo, or browse issues, and you can even authorize it for internal repositories to get it working with your team’s existing projects and dependencies.</p>\n</div>\n<p>Full control of the environment also closes the loop between the agent and deployment. When Phoenix.new boots an app, it watches the logs, and tests the application. When an action triggers an error, Phoenix.new notices and gets to work.</p>\n<h2 id='watch-it-build-in-real-time' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#watch-it-build-in-real-time' aria-label='Anchor'></a><span class='plain-code'>Watch It Build In Real Time</span></h2>\n<p><a href='https://phoenix.new' title=''>Phoenix.new</a> can interact with web applications the way users do: with a real browser.</p>\n\n<p>The Phoenix.new environment includes a headless Chrome browser that our agent knows how to drive. Prompt it to add a front-end feature to your application, and it won’t just sketch the code out and make sure it compiles and lints. It’ll pull the app up itself and poke at the UI, simultaneously looking at the page content, JavaScript state, and server-side logs.</p>\n\n<p>Phoenix is all about <a href='https://fly.io/blog/how-we-got-to-liveview/' title=''>“live” real-time</a> interactivity, and gives us seamless live reload. 
The user interface for Phoenix.new itself includes a live preview of the app being worked on, so you can kick back and watch it build front-end features incrementally. Any other <code>.phx.run</code> tabs you have open also update as it goes. It’s wild.</p>\n<video title=\"agent interacting with web\" autoplay=\"autoplay\" loop=\"loop\" muted=\"muted\" playsinline=\"playsinline\" disablePictureInPicture=\"true\" class=\"mb-8\" src=\"/blog/phoenix-new-the-remote-ai-runtime/assets/webjs.mp4\"></video>\n\n<h2 id='not-just-for-vibe-coding' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#not-just-for-vibe-coding' aria-label='Anchor'></a><span class='plain-code'>Not Just For Vibe Coding</span></h2>\n<p>Phoenix.new can already build real, full-stack applications with WebSockets, Phoenix’s Presence features, and real databases. I’m seeing it succeed at business and collaborative applications right now.</p>\n\n<p>But there’s no fixed bound on the tasks you can reasonably ask it to accomplish. If you can do it with a shell and a browser, I want Phoenix.new to do it too. And it can do these tasks with or without you present.</p>\n\n<p>For example: set a <code>$DATABASE_URL</code> and tell the agent about it. The agent knows enough to go explore it with <code>psql</code>, and it’ll propose apps based on the schemas it finds. It can model Ecto schemas off the database. And if MySQL is your thing, the agent will just <code>apt install</code> a MySQL client and go to town.</p>\n\n<p>Frontier model LLMs have vast world knowledge. They generalize extremely well. At ElixirConfEU, I did a <a href='https://www.youtube.com/watch?v=ojL_VHc4gLk&t=3923s' title=''>demo vibe-coding Tetris</a> on stage. Phoenix.new nailed it, first try, first prompt. 
It’s not like there’s gobs of Phoenix LiveView Tetris examples floating around the Internet! But lots of people have published Tetris code, and lots of people have written LiveView stuff, and 2025 LLMs can connect those dots.</p>\n\n<p>At this point you might be wondering – can I just ask it to build a Rails app? Or an Expo React Native app? Or Svelte? Or Go?</p>\n\n<p>Yes, you can.</p>\n\n<p>Our system prompt is tuned for Phoenix today, but all languages you care about are already installed. We’re still figuring out where to take this, but adding new languages and frameworks definitely ranks highly in my plans.</p>\n<h2 id='our-async-agent-future' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#our-async-agent-future' aria-label='Anchor'></a><span class='plain-code'>Our Async Agent Future</span></h2>\n<p><a href='https://fly.io/blog/youre-all-nuts/' title=''>We’re at a massive step-change in developer workflows</a>.</p>\n\n<p>Agents can do real work, today, with or without a human present. Buckle up: the future of development, at least in the common case, probably looks less like cracking open a shell and finding a file to edit, and more like popping into a CI environment with agents working away around the clock.</p>\n\n<p>Local development isn’t going away. But there’s going to be a shift in where the majority of our iterations take place. I’m already using Phoenix.new to triage <code>phoenix-core</code> GitHub issues and pick problems to solve. I close my laptop, grab a cup of coffee, and wait for a PR to arrive — Phoenix.new knows how PRs work, too. We’re already here, and this space is just getting started.</p>\n\n<p>This isn’t where I thought I’d end up when I started poking around. The Phoenix and LiveView journey was much the same. 
Something special was there and the projects took on a life of their own. I’m excited to share this work now, and see where it might take us. I can’t wait to see what folks build.</p>",
      "image": {
        "url": "https://fly.io/blog/phoenix-new-the-remote-ai-runtime/assets/phoenixnew-thumb.png",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/mcps-everywhere/",
      "title": "What are MCP Servers?",
      "description": null,
      "url": "https://fly.io/blog/mcps-everywhere/",
      "published": "2025-06-12T00:00:00.000Z",
      "updated": "2025-06-12T16:59:12.000Z",
      "content": "<div class=\"lead\"><div><p>With Fly.io, <a href=\"https://fly.io/docs/speedrun/\" title=\"\">you can get your app running globally in a matter of minutes</a>, and with MCP servers you can integrate with Claude, VSCode, Cursor and <a href=\"https://modelcontextprotocol.io/clients\">many more AI clients</a>.  <a href=\"https://fly.io/docs/mcp/\" title=\"\">Try it out for yourself</a>!</p>\n</div></div>\n<p>The introduction to <a href='https://modelcontextprotocol.io/introduction' title=''>Model Context Protocol</a> starts out with:</p>\n\n<blockquote>\n<p>MCP is an open protocol that standardizes how applications provide context to LLMs. Think of MCP like a USB-C port for AI applications. Just as USB-C provides a standardized way to connect your devices to various peripherals and accessories, MCP provides a standardized way to connect AI models to different data sources and tools.</p>\n</blockquote>\n\n<p>That paragraph, to me, is both comforting (“USB for LLM”? Cool! Got it!), and simultaneously vacuous (Um, but what do I actually <em>do</em> with this?).</p>\n\n<p>I’ve been digging deeper and have come up with a few more analogies and observations that make sense to me. Perhaps one or more of these will help you.</p>\n<h2 id='mcps-are-alexa-skills' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#mcps-are-alexa-skills' aria-label='Anchor'></a><span class='plain-code'>MCPs are Alexa Skills</span></h2>\n<p>You buy an Echo Dot and a Hue light. You plug them both in and connect them to your Wi-Fi. There is one more step you need to do: you need to install and enable the Philips Hue skill to connect the two.</p>\n\n<p>Now you might be using Siri or Google Assistant. Or you may want to connect a Ring Doorbell camera or Google Nest Thermostat. 
But the principle is the same, though the analogy is slightly stronger with a skill (which is a noun) as opposed to the action of pairing your Hue Bridge with Apple HomeKit (a verb).</p>\n<h2 id='mcps-are-api-2-0' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#mcps-are-api-2-0' aria-label='Anchor'></a><span class='plain-code'>MCPs are API 2.0</span></h2>\n<p>HTTP/1.1 is simple. You send a request, you get a response. While there are cookies and sessions, it is pretty much stateless, and it is inefficient: a connection can carry only one request/response exchange at a time. WebSockets and Server-Sent Events (SSE) mitigate this a bit.</p>\n\n<p><a href='https://en.wikipedia.org/wiki/HTTP/2' title=''>HTTP/2</a> introduces features like multiplexing and server push. When you visit a web page for the first time, your browser can now request all of the associated JavaScript, CSS, and images at once, and the server can respond in any order.</p>\n\n<p>APIs today are typically request/response. MCPs support multiplexing and server push.</p>\n<h2 id='mcps-are-apis-with-introspection-reflection' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#mcps-are-apis-with-introspection-reflection' aria-label='Anchor'></a><span class='plain-code'>MCPs are APIs with Introspection/Reflection</span></h2>\n<p>With <a href='https://learn.openapis.org/' title=''>OpenAPI</a>, requests are typically JSON, and responses are too. 
Many OpenAPI providers publish a separate <a href='https://learn.openapis.org/specification/structure.html' title=''>OpenAPI Description (OAD)</a>, which contains a schema describing what requests are supported by that API.</p>\n\n<p>With MCP, requests are also JSON and responses are also JSON, but in addition, there are standard requests built into the protocol to obtain useful information, including what tools are provided, what arguments each tool expects, and a prose description of how each tool is expected to be used.</p>\n\n<p>As an aside, don’t automatically assume that you will get good results from <a href='https://neon.com/blog/autogenerating-mcp-servers-openai-schemas' title=''>auto-generating MCP Servers from OpenAPI schemas</a>:</p>\n\n<blockquote>\n<p>Effective MCP design means thinking about the workflows you want the agent to perform and potentially creating higher-level tools that encapsulate multiple API calls, rather than just exposing the raw building blocks of your entire API spec.</p>\n</blockquote>\n\n<p><a href='https://glama.ai/blog/2025-06-06-mcp-vs-api#five-fundamental-differences' title=''>MCP vs API</a> goes into this topic at greater depth.</p>\n\n<p>In many cases you will get better results treating LLMs as humans. 
If you have a CLI, consider using that as the starting point instead.</p>\n<h2 id='mcps-are-not-serverless' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#mcps-are-not-serverless' aria-label='Anchor'></a><span class='plain-code'>MCPs are <strong><i>not</i></strong> serverless</span></h2>\n<p><a href='https://en.wikipedia.org/wiki/Serverless_computing' title=''>Serverless</a>, sometimes known as <a href='https://en.wikipedia.org/wiki/Function_as_a_service' title=''>FaaS</a>, is a popular cloud computing model, where AWS Lambda is widely recognized as the archetype.</p>\n\n<p>MCP servers are not serverless; they have a well-defined and long-lived <a href='https://modelcontextprotocol.io/specification/2025-03-26/basic/lifecycle' title=''>lifecycle</a>:</p>\n\n<p><svg aria-roledescription=\"sequence\" role=\"graphics-document document\" viewBox=\"-50 -10 482 651\" style=\"max-width: 482px;\" xmlns:xlink=\"http://www.w3.org/1999/xlink\" xmlns=\"http://www.w3.org/2000/svg\" width=\"100%\" id=\"rm\"><rect class=\"rect\" height=\"70\" width=\"302\" fill=\"rgb(200, 220, 250)\" y=\"325\" x=\"40\"></rect><g><rect class=\"actor actor-bottom\" ry=\"3\" rx=\"3\" name=\"Server\" height=\"65\" width=\"150\" stroke=\"#666\" fill=\"#eaeaea\" y=\"565\" x=\"232\"></rect><text class=\"actor actor-box\" alignment-baseline=\"central\" dominant-baseline=\"central\" style=\"text-anchor: middle; font-size: 16px; font-weight: 400; font-family: inherit;\" y=\"597.5\" x=\"307\"><tspan dy=\"0\" x=\"307\">Server</tspan></text></g><g><rect class=\"actor actor-bottom\" ry=\"3\" rx=\"3\" name=\"Client\" height=\"65\" width=\"150\" stroke=\"#666\" fill=\"#eaeaea\" y=\"565\" x=\"0\"></rect><text class=\"actor actor-box\" alignment-baseline=\"central\" dominant-baseline=\"central\" style=\"text-anchor: 
middle; font-size: 16px; font-weight: 400; font-family: inherit;\" y=\"597.5\" x=\"75\"><tspan dy=\"0\" x=\"75\">Client</tspan></text></g><g><line name=\"Server\" stroke=\"#999\" stroke-width=\"0.5px\" class=\"actor-line 200\" y2=\"565\" x2=\"307\" y1=\"65\" x1=\"307\" id=\"actor10\"></line><g id=\"root-10\"><rect class=\"actor actor-top\" ry=\"3\" rx=\"3\" name=\"Server\" height=\"65\" width=\"150\" stroke=\"#666\" fill=\"#eaeaea\" y=\"0\" x=\"232\"></rect><text class=\"actor actor-box\" alignment-baseline=\"central\" dominant-baseline=\"central\" style=\"text-anchor: middle; font-size: 16px; font-weight: 400; font-family: inherit;\" y=\"32.5\" x=\"307\"><tspan dy=\"0\" x=\"307\">Server</tspan></text></g></g><g><line name=\"Client\" stroke=\"#999\" stroke-width=\"0.5px\" class=\"actor-line 200\" y2=\"565\" x2=\"75\" y1=\"65\" x1=\"75\" id=\"actor9\"></line><g id=\"root-9\"><rect class=\"actor actor-top\" ry=\"3\" rx=\"3\" name=\"Client\" height=\"65\" width=\"150\" stroke=\"#666\" fill=\"#eaeaea\" y=\"0\" x=\"0\"></rect><text class=\"actor actor-box\" alignment-baseline=\"central\" dominant-baseline=\"central\" style=\"text-anchor: middle; font-size: 16px; font-weight: 400; font-family: inherit;\" y=\"32.5\" x=\"75\"><tspan dy=\"0\" x=\"75\">Client</tspan></text></g></g><style>#rm{font-family:inherit;font-size:16px;fill:#333;}#rm .error-icon{fill:#552222;}#rm .error-text{fill:#552222;stroke:#552222;}#rm .edge-thickness-normal{stroke-width:1px;}#rm .edge-thickness-thick{stroke-width:3.5px;}#rm .edge-pattern-solid{stroke-dasharray:0;}#rm .edge-thickness-invisible{stroke-width:0;fill:none;}#rm .edge-pattern-dashed{stroke-dasharray:3;}#rm .edge-pattern-dotted{stroke-dasharray:2;}#rm .marker{fill:#333333;stroke:#333333;}#rm .marker.cross{stroke:#333333;}#rm svg{font-family:inherit;font-size:16px;}#rm p{margin:0;}#rm .actor{stroke:hsl(259.6261682243, 59.7765363128%, 87.9019607843%);fill:#ECECFF;}#rm text.actor>tspan{fill:black;stroke:none;}#rm 
.actor-line{stroke:hsl(259.6261682243, 59.7765363128%, 87.9019607843%);}#rm .messageLine0{stroke-width:1.5;stroke-dasharray:none;stroke:#333;}#rm .messageLine1{stroke-width:1.5;stroke-dasharray:2,2;stroke:#333;}#rm #arrowhead path{fill:#333;stroke:#333;}#rm .sequenceNumber{fill:white;}#rm #sequencenumber{fill:#333;}#rm #crosshead path{fill:#333;stroke:#333;}#rm .messageText{fill:#333;stroke:none;}#rm .labelBox{stroke:hsl(259.6261682243, 59.7765363128%, 87.9019607843%);fill:#ECECFF;}#rm .labelText,#rm .labelText>tspan{fill:black;stroke:none;}#rm .loopText,#rm .loopText>tspan{fill:black;stroke:none;}#rm .loopLine{stroke-width:2px;stroke-dasharray:2,2;stroke:hsl(259.6261682243, 59.7765363128%, 87.9019607843%);fill:hsl(259.6261682243, 59.7765363128%, 87.9019607843%);}#rm .note{stroke:#aaaa33;fill:#fff5ad;}#rm .noteText,#rm .noteText>tspan{fill:black;stroke:none;}#rm .activation0{fill:#f4f4f4;stroke:#666;}#rm .activation1{fill:#f4f4f4;stroke:#666;}#rm .activation2{fill:#f4f4f4;stroke:#666;}#rm .actorPopupMenu{position:absolute;}#rm .actorPopupMenuPanel{position:absolute;fill:#ECECFF;box-shadow:0px 8px 16px 0px rgba(0,0,0,0.2);filter:drop-shadow(3px 5px 2px rgb(0 0 0 / 0.4));}#rm .actor-man line{stroke:hsl(259.6261682243, 59.7765363128%, 87.9019607843%);fill:#ECECFF;}#rm .actor-man circle,#rm line{stroke:hsl(259.6261682243, 59.7765363128%, 87.9019607843%);fill:#ECECFF;stroke-width:2px;}#rm :root{--mermaid-font-family:inherit;}</style><g></g><defs><symbol height=\"24\" width=\"24\" id=\"computer\"><path d=\"M2 2v13h20v-13h-20zm18 11h-16v-9h16v9zm-10.228 6l.466-1h3.524l.467 1h-4.457zm14.228 3h-24l2-6h2.104l-1.33 4h18.45l-1.297-4h2.073l2 6zm-5-10h-14v-7h14v7z\" transform=\"scale(.5)\"></path></symbol></defs><defs><symbol clip-rule=\"evenodd\" fill-rule=\"evenodd\" id=\"database\"><path 
d=\"M12.258.001l.256.004.255.005.253.008.251.01.249.012.247.015.246.016.242.019.241.02.239.023.236.024.233.027.231.028.229.031.225.032.223.034.22.036.217.038.214.04.211.041.208.043.205.045.201.046.198.048.194.05.191.051.187.053.183.054.18.056.175.057.172.059.168.06.163.061.16.063.155.064.15.066.074.033.073.033.071.034.07.034.069.035.068.035.067.035.066.035.064.036.064.036.062.036.06.036.06.037.058.037.058.037.055.038.055.038.053.038.052.038.051.039.05.039.048.039.047.039.045.04.044.04.043.04.041.04.04.041.039.041.037.041.036.041.034.041.033.042.032.042.03.042.029.042.027.042.026.043.024.043.023.043.021.043.02.043.018.044.017.043.015.044.013.044.012.044.011.045.009.044.007.045.006.045.004.045.002.045.001.045v17l-.001.045-.002.045-.004.045-.006.045-.007.045-.009.044-.011.045-.012.044-.013.044-.015.044-.017.043-.018.044-.02.043-.021.043-.023.043-.024.043-.026.043-.027.042-.029.042-.03.042-.032.042-.033.042-.034.041-.036.041-.037.041-.039.041-.04.041-.041.04-.043.04-.044.04-.045.04-.047.039-.048.039-.05.039-.051.039-.052.038-.053.038-.055.038-.055.038-.058.037-.058.037-.06.037-.06.036-.062.036-.064.036-.064.036-.066.035-.067.035-.068.035-.069.035-.07.034-.071.034-.073.033-.074.033-.15.066-.155.064-.16.063-.163.061-.168.06-.172.059-.175.057-.18.056-.183.054-.187.053-.191.051-.194.05-.198.048-.201.046-.205.045-.208.043-.211.041-.214.04-.217.038-.22.036-.223.034-.225.032-.229.031-.231.028-.233.027-.236.024-.239.023-.241.02-.242.019-.246.016-.247.015-.249.012-.251.01-.253.008-.255.005-.256.004-.258.001-.258-.001-.256-.004-.255-.005-.253-.008-.251-.01-.249-.012-.247-.015-.245-.016-.243-.019-.241-.02-.238-.023-.236-.024-.234-.027-.231-.028-.228-.031-.226-.032-.223-.034-.22-.036-.217-.038-.214-.04-.211-.041-.208-.043-.204-.045-.201-.046-.198-.048-.195-.05-.19-.051-.187-.053-.184-.054-.179-.056-.176-.057-.172-.059-.167-.06-.164-.061-.159-.063-.155-.064-.151-.066-.074-.033-.072-.033-.072-.034-.07-.034-.069-.035-.068-.035-.067-.035-.066-.035-.064-.036-.063-.036-.062-.036-.061-.03
6-.06-.037-.058-.037-.057-.037-.056-.038-.055-.038-.053-.038-.052-.038-.051-.039-.049-.039-.049-.039-.046-.039-.046-.04-.044-.04-.043-.04-.041-.04-.04-.041-.039-.041-.037-.041-.036-.041-.034-.041-.033-.042-.032-.042-.03-.042-.029-.042-.027-.042-.026-.043-.024-.043-.023-.043-.021-.043-.02-.043-.018-.044-.017-.043-.015-.044-.013-.044-.012-.044-.011-.045-.009-.044-.007-.045-.006-.045-.004-.045-.002-.045-.001-.045v-17l.001-.045.002-.045.004-.045.006-.045.007-.045.009-.044.011-.045.012-.044.013-.044.015-.044.017-.043.018-.044.02-.043.021-.043.023-.043.024-.043.026-.043.027-.042.029-.042.03-.042.032-.042.033-.042.034-.041.036-.041.037-.041.039-.041.04-.041.041-.04.043-.04.044-.04.046-.04.046-.039.049-.039.049-.039.051-.039.052-.038.053-.038.055-.038.056-.038.057-.037.058-.037.06-.037.061-.036.062-.036.063-.036.064-.036.066-.035.067-.035.068-.035.069-.035.07-.034.072-.034.072-.033.074-.033.151-.066.155-.064.159-.063.164-.061.167-.06.172-.059.176-.057.179-.056.184-.054.187-.053.19-.051.195-.05.198-.048.201-.046.204-.045.208-.043.211-.041.214-.04.217-.038.22-.036.223-.034.226-.032.228-.031.231-.028.234-.027.236-.024.238-.023.241-.02.243-.019.245-.016.247-.015.249-.012.251-.01.253-.008.255-.005.256-.004.258-.001.258.001zm-9.258 
20.499v.01l.001.021.003.021.004.022.005.021.006.022.007.022.009.023.01.022.011.023.012.023.013.023.015.023.016.024.017.023.018.024.019.024.021.024.022.025.023.024.024.025.052.049.056.05.061.051.066.051.07.051.075.051.079.052.084.052.088.052.092.052.097.052.102.051.105.052.11.052.114.051.119.051.123.051.127.05.131.05.135.05.139.048.144.049.147.047.152.047.155.047.16.045.163.045.167.043.171.043.176.041.178.041.183.039.187.039.19.037.194.035.197.035.202.033.204.031.209.03.212.029.216.027.219.025.222.024.226.021.23.02.233.018.236.016.24.015.243.012.246.01.249.008.253.005.256.004.259.001.26-.001.257-.004.254-.005.25-.008.247-.011.244-.012.241-.014.237-.016.233-.018.231-.021.226-.021.224-.024.22-.026.216-.027.212-.028.21-.031.205-.031.202-.034.198-.034.194-.036.191-.037.187-.039.183-.04.179-.04.175-.042.172-.043.168-.044.163-.045.16-.046.155-.046.152-.047.148-.048.143-.049.139-.049.136-.05.131-.05.126-.05.123-.051.118-.052.114-.051.11-.052.106-.052.101-.052.096-.052.092-.052.088-.053.083-.051.079-.052.074-.052.07-.051.065-.051.06-.051.056-.05.051-.05.023-.024.023-.025.021-.024.02-.024.019-.024.018-.024.017-.024.015-.023.014-.024.013-.023.012-.023.01-.023.01-.022.008-.022.006-.022.006-.022.004-.022.004-.021.001-.021.001-.021v-4.127l-.077.055-.08.053-.083.054-.085.053-.087.052-.09.052-.093.051-.095.05-.097.05-.1.049-.102.049-.105.048-.106.047-.109.047-.111.046-.114.045-.115.045-.118.044-.12.043-.122.042-.124.042-.126.041-.128.04-.13.04-.132.038-.134.038-.135.037-.138.037-.139.035-.142.035-.143.034-.144.033-.147.032-.148.031-.15.03-.151.03-.153.029-.154.027-.156.027-.158.026-.159.025-.161.024-.162.023-.163.022-.165.021-.166.02-.167.019-.169.018-.169.017-.171.016-.173.015-.173.014-.175.013-.175.012-.177.011-.178.01-.179.008-.179.008-.181.006-.182.005-.182.004-.184.003-.184.002h-.37l-.184-.002-.184-.003-.182-.004-.182-.005-.181-.006-.179-.008-.179-.008-.178-.01-.176-.011-.176-.012-.175-.013-.173-.014-.172-.015-.171-.016-.17-.017-.169-.018-.167-.019-.166-.02-.165-.021-.163-.022
-.162-.023-.161-.024-.159-.025-.157-.026-.156-.027-.155-.027-.153-.029-.151-.03-.15-.03-.148-.031-.146-.032-.145-.033-.143-.034-.141-.035-.14-.035-.137-.037-.136-.037-.134-.038-.132-.038-.13-.04-.128-.04-.126-.041-.124-.042-.122-.042-.12-.044-.117-.043-.116-.045-.113-.045-.112-.046-.109-.047-.106-.047-.105-.048-.102-.049-.1-.049-.097-.05-.095-.05-.093-.052-.09-.051-.087-.052-.085-.053-.083-.054-.08-.054-.077-.054v4.127zm0-5.654v.011l.001.021.003.021.004.021.005.022.006.022.007.022.009.022.01.022.011.023.012.023.013.023.015.024.016.023.017.024.018.024.019.024.021.024.022.024.023.025.024.024.052.05.056.05.061.05.066.051.07.051.075.052.079.051.084.052.088.052.092.052.097.052.102.052.105.052.11.051.114.051.119.052.123.05.127.051.131.05.135.049.139.049.144.048.147.048.152.047.155.046.16.045.163.045.167.044.171.042.176.042.178.04.183.04.187.038.19.037.194.036.197.034.202.033.204.032.209.03.212.028.216.027.219.025.222.024.226.022.23.02.233.018.236.016.24.014.243.012.246.01.249.008.253.006.256.003.259.001.26-.001.257-.003.254-.006.25-.008.247-.01.244-.012.241-.015.237-.016.233-.018.231-.02.226-.022.224-.024.22-.025.216-.027.212-.029.21-.03.205-.032.202-.033.198-.035.194-.036.191-.037.187-.039.183-.039.179-.041.175-.042.172-.043.168-.044.163-.045.16-.045.155-.047.152-.047.148-.048.143-.048.139-.05.136-.049.131-.05.126-.051.123-.051.118-.051.114-.052.11-.052.106-.052.101-.052.096-.052.092-.052.088-.052.083-.052.079-.052.074-.051.07-.052.065-.051.06-.05.056-.051.051-.049.023-.025.023-.024.021-.025.02-.024.019-.024.018-.024.017-.024.015-.023.014-.023.013-.024.012-.022.01-.023.01-.023.008-.022.006-.022.006-.022.004-.021.004-.022.001-.021.001-.021v-4.139l-.077.054-.08.054-.083.054-.085.052-.087.053-.09.051-.093.051-.095.051-.097.05-.1.049-.102.049-.105.048-.106.047-.109.047-.111.046-.114.045-.115.044-.118.044-.12.044-.122.042-.124.042-.126.041-.128.04-.13.039-.132.039-.134.038-.135.037-.138.036-.139.036-.142.035-.143.033-.144.033-.147.033-.148.031-.15.03-.151.03-.153.028-.154.028
-.156.027-.158.026-.159.025-.161.024-.162.023-.163.022-.165.021-.166.02-.167.019-.169.018-.169.017-.171.016-.173.015-.173.014-.175.013-.175.012-.177.011-.178.009-.179.009-.179.007-.181.007-.182.005-.182.004-.184.003-.184.002h-.37l-.184-.002-.184-.003-.182-.004-.182-.005-.181-.007-.179-.007-.179-.009-.178-.009-.176-.011-.176-.012-.175-.013-.173-.014-.172-.015-.171-.016-.17-.017-.169-.018-.167-.019-.166-.02-.165-.021-.163-.022-.162-.023-.161-.024-.159-.025-.157-.026-.156-.027-.155-.028-.153-.028-.151-.03-.15-.03-.148-.031-.146-.033-.145-.033-.143-.033-.141-.035-.14-.036-.137-.036-.136-.037-.134-.038-.132-.039-.13-.039-.128-.04-.126-.041-.124-.042-.122-.043-.12-.043-.117-.044-.116-.044-.113-.046-.112-.046-.109-.046-.106-.047-.105-.048-.102-.049-.1-.049-.097-.05-.095-.051-.093-.051-.09-.051-.087-.053-.085-.052-.083-.054-.08-.054-.077-.054v4.139zm0-5.666v.011l.001.02.003.022.004.021.005.022.006.021.007.022.009.023.01.022.011.023.012.023.013.023.015.023.016.024.017.024.018.023.019.024.021.025.022.024.023.024.024.025.052.05.056.05.061.05.066.051.07.051.075.052.079.051.084.052.088.052.092.052.097.052.102.052.105.051.11.052.114.051.119.051.123.051.127.05.131.05.135.05.139.049.144.048.147.048.152.047.155.046.16.045.163.045.167.043.171.043.176.042.178.04.183.04.187.038.19.037.194.036.197.034.202.033.204.032.209.03.212.028.216.027.219.025.222.024.226.021.23.02.233.018.236.017.24.014.243.012.246.01.249.008.253.006.256.003.259.001.26-.001.257-.003.254-.006.25-.008.247-.01.244-.013.241-.014.237-.016.233-.018.231-.02.226-.022.224-.024.22-.025.216-.027.212-.029.21-.03.205-.032.202-.033.198-.035.194-.036.191-.037.187-.039.183-.039.179-.041.175-.042.172-.043.168-.044.163-.045.16-.045.155-.047.152-.047.148-.048.143-.049.139-.049.136-.049.131-.051.126-.05.123-.051.118-.052.114-.051.11-.052.106-.052.101-.052.096-.052.092-.052.088-.052.083-.052.079-.052.074-.052.07-.051.065-.051.06-.051.056-.05.051-.049.023-.025.023-.025.021-.024.02-.024.019-.024.018-.024.017-.024.015-.023.014-.024.013-.0
23.012-.023.01-.022.01-.023.008-.022.006-.022.006-.022.004-.022.004-.021.001-.021.001-.021v-4.153l-.077.054-.08.054-.083.053-.085.053-.087.053-.09.051-.093.051-.095.051-.097.05-.1.049-.102.048-.105.048-.106.048-.109.046-.111.046-.114.046-.115.044-.118.044-.12.043-.122.043-.124.042-.126.041-.128.04-.13.039-.132.039-.134.038-.135.037-.138.036-.139.036-.142.034-.143.034-.144.033-.147.032-.148.032-.15.03-.151.03-.153.028-.154.028-.156.027-.158.026-.159.024-.161.024-.162.023-.163.023-.165.021-.166.02-.167.019-.169.018-.169.017-.171.016-.173.015-.173.014-.175.013-.175.012-.177.01-.178.01-.179.009-.179.007-.181.006-.182.006-.182.004-.184.003-.184.001-.185.001-.185-.001-.184-.001-.184-.003-.182-.004-.182-.006-.181-.006-.179-.007-.179-.009-.178-.01-.176-.01-.176-.012-.175-.013-.173-.014-.172-.015-.171-.016-.17-.017-.169-.018-.167-.019-.166-.02-.165-.021-.163-.023-.162-.023-.161-.024-.159-.024-.157-.026-.156-.027-.155-.028-.153-.028-.151-.03-.15-.03-.148-.032-.146-.032-.145-.033-.143-.034-.141-.034-.14-.036-.137-.036-.136-.037-.134-.038-.132-.039-.13-.039-.128-.041-.126-.041-.124-.041-.122-.043-.12-.043-.117-.044-.116-.044-.113-.046-.112-.046-.109-.046-.106-.048-.105-.048-.102-.048-.1-.05-.097-.049-.095-.051-.093-.051-.09-.052-.087-.052-.085-.053-.083-.053-.08-.054-.077-.054v4.153zm8.74-8.179l-.257.004-.254.005-.25.008-.247.011-.244.012-.241.014-.237.016-.233.018-.231.021-.226.022-.224.023-.22.026-.216.027-.212.028-.21.031-.205.032-.202.033-.198.034-.194.036-.191.038-.187.038-.183.04-.179.041-.175.042-.172.043-.168.043-.163.045-.16.046-.155.046-.152.048-.148.048-.143.048-.139.049-.136.05-.131.05-.126.051-.123.051-.118.051-.114.052-.11.052-.106.052-.101.052-.096.052-.092.052-.088.052-.083.052-.079.052-.074.051-.07.052-.065.051-.06.05-.056.05-.051.05-.023.025-.023.024-.021.024-.02.025-.019.024-.018.024-.017.023-.015.024-.014.023-.013.023-.012.023-.01.023-.01.022-.008.022-.006.023-.006.021-.004.022-.004.021-.001.021-.001.021.001.021.001.021.004.021.004.022.006.021.006.023.008.02
2.01.022.01.023.012.023.013.023.014.023.015.024.017.023.018.024.019.024.02.025.021.024.023.024.023.025.051.05.056.05.06.05.065.051.07.052.074.051.079.052.083.052.088.052.092.052.096.052.101.052.106.052.11.052.114.052.118.051.123.051.126.051.131.05.136.05.139.049.143.048.148.048.152.048.155.046.16.046.163.045.168.043.172.043.175.042.179.041.183.04.187.038.191.038.194.036.198.034.202.033.205.032.21.031.212.028.216.027.22.026.224.023.226.022.231.021.233.018.237.016.241.014.244.012.247.011.25.008.254.005.257.004.26.001.26-.001.257-.004.254-.005.25-.008.247-.011.244-.012.241-.014.237-.016.233-.018.231-.021.226-.022.224-.023.22-.026.216-.027.212-.028.21-.031.205-.032.202-.033.198-.034.194-.036.191-.038.187-.038.183-.04.179-.041.175-.042.172-.043.168-.043.163-.045.16-.046.155-.046.152-.048.148-.048.143-.048.139-.049.136-.05.131-.05.126-.051.123-.051.118-.051.114-.052.11-.052.106-.052.101-.052.096-.052.092-.052.088-.052.083-.052.079-.052.074-.051.07-.052.065-.051.06-.05.056-.05.051-.05.023-.025.023-.024.021-.024.02-.025.019-.024.018-.024.017-.023.015-.024.014-.023.013-.023.012-.023.01-.023.01-.022.008-.022.006-.023.006-.021.004-.022.004-.021.001-.021.001-.021-.001-.021-.001-.021-.004-.021-.004-.022-.006-.021-.006-.023-.008-.022-.01-.022-.01-.023-.012-.023-.013-.023-.014-.023-.015-.024-.017-.023-.018-.024-.019-.024-.02-.025-.021-.024-.023-.024-.023-.025-.051-.05-.056-.05-.06-.05-.065-.051-.07-.052-.074-.051-.079-.052-.083-.052-.088-.052-.092-.052-.096-.052-.101-.052-.106-.052-.11-.052-.114-.052-.118-.051-.123-.051-.126-.051-.131-.05-.136-.05-.139-.049-.143-.048-.148-.048-.152-.048-.155-.046-.16-.046-.163-.045-.168-.043-.172-.043-.175-.042-.179-.041-.183-.04-.187-.038-.191-.038-.194-.036-.198-.034-.202-.033-.205-.032-.21-.031-.212-.028-.216-.027-.22-.026-.224-.023-.226-.022-.231-.021-.233-.018-.237-.016-.241-.014-.244-.012-.247-.011-.25-.008-.254-.005-.257-.004-.26-.001-.26.001z\" transform=\"scale(.5)\"></path></symbol></defs><defs><symbol height=\"24\" width=\"24\" 
id=\"clock\"><path d=\"M12 2c5.514 0 10 4.486 10 10s-4.486 10-10 10-10-4.486-10-10 4.486-10 10-10zm0-2c-6.627 0-12 5.373-12 12s5.373 12 12 12 12-5.373 12-12-5.373-12-12-12zm5.848 12.459c.202.038.202.333.001.372-1.907.361-6.045 1.111-6.547 1.111-.719 0-1.301-.582-1.301-1.301 0-.512.77-5.447 1.125-7.445.034-.192.312-.181.343.014l.985 6.238 5.394 1.011z\" transform=\"scale(.5)\"></path></symbol></defs><defs><marker orient=\"auto-start-reverse\" markerHeight=\"12\" markerWidth=\"12\" markerUnits=\"userSpaceOnUse\" refY=\"5\" refX=\"7.9\" id=\"arrowhead\"><path d=\"M -1 0 L 10 5 L 0 10 z\"></path></marker></defs><defs><marker refY=\"4.5\" refX=\"4\" orient=\"auto\" markerHeight=\"8\" markerWidth=\"15\" id=\"crosshead\"><path d=\"M 1,2 L 6,7 M 6,2 L 1,7\" stroke-width=\"1pt\" style=\"stroke-dasharray: 0px, 0px;\" stroke=\"#000000\" fill=\"none\"></path></marker></defs><defs><marker orient=\"auto\" markerHeight=\"28\" markerWidth=\"20\" refY=\"7\" refX=\"15.5\" id=\"filled-head\"><path d=\"M 18,7 L9,13 L14,7 L9,1 Z\"></path></marker></defs><defs><marker orient=\"auto\" markerHeight=\"40\" markerWidth=\"60\" refY=\"15\" refX=\"15\" id=\"sequencenumber\"><circle r=\"6\" cy=\"15\" cx=\"15\"></circle></marker></defs><g><rect class=\"note\" height=\"40\" width=\"282\" stroke=\"#666\" fill=\"#EDF2AE\" y=\"75\" x=\"50\"></rect><text dy=\"1em\" class=\"noteText\" style=\"font-family: inherit; font-size: 16px; font-weight: 400;\" alignment-baseline=\"middle\" dominant-baseline=\"middle\" text-anchor=\"middle\" y=\"80\" x=\"191\"><tspan x=\"191\">Initialization Phase</tspan></text></g><g><rect class=\"activation0\" height=\"380\" width=\"10\" stroke=\"#666\" fill=\"#EDF2AE\" y=\"115\" x=\"70\"></rect></g><g><rect class=\"activation0\" height=\"328\" width=\"10\" stroke=\"#666\" fill=\"#EDF2AE\" y=\"167\" x=\"302\"></rect></g><g><rect class=\"note\" height=\"40\" width=\"282\" stroke=\"#666\" fill=\"#EDF2AE\" y=\"275\" x=\"50\"></rect><text dy=\"1em\" class=\"noteText\" 
style=\"font-family: inherit; font-size: 16px; font-weight: 400;\" alignment-baseline=\"middle\" dominant-baseline=\"middle\" text-anchor=\"middle\" y=\"280\" x=\"191\"><tspan x=\"191\">Operation Phase</tspan></text></g><g><rect class=\"note\" height=\"40\" width=\"282\" stroke=\"#666\" fill=\"#EDF2AE\" y=\"345\" x=\"50\"></rect><text dy=\"1em\" class=\"noteText\" style=\"font-family: inherit; font-size: 16px; font-weight: 400;\" alignment-baseline=\"middle\" dominant-baseline=\"middle\" text-anchor=\"middle\" y=\"350\" x=\"191\"><tspan x=\"191\">Normal protocol operations</tspan></text></g><g><rect class=\"note\" height=\"40\" width=\"282\" stroke=\"#666\" fill=\"#EDF2AE\" y=\"405\" x=\"50\"></rect><text dy=\"1em\" class=\"noteText\" style=\"font-family: inherit; font-size: 16px; font-weight: 400;\" alignment-baseline=\"middle\" dominant-baseline=\"middle\" text-anchor=\"middle\" y=\"410\" x=\"191\"><tspan x=\"191\">Shutdown</tspan></text></g><g><rect class=\"note\" height=\"40\" width=\"282\" stroke=\"#666\" fill=\"#EDF2AE\" y=\"505\" x=\"50\"></rect><text dy=\"1em\" class=\"noteText\" style=\"font-family: inherit; font-size: 16px; font-weight: 400;\" alignment-baseline=\"middle\" dominant-baseline=\"middle\" text-anchor=\"middle\" y=\"510\" x=\"191\"><tspan x=\"191\">Connection closed</tspan></text></g><text dy=\"1em\" class=\"messageText\" style=\"font-family: inherit; font-size: 16px; font-weight: 400;\" alignment-baseline=\"middle\" dominant-baseline=\"middle\" text-anchor=\"middle\" y=\"130\" x=\"190\">initialize request</text><line marker-end=\"url(#arrowhead)\" style=\"fill: none;\" stroke=\"none\" stroke-width=\"2\" class=\"messageLine0\" y2=\"165\" x2=\"299\" y1=\"165\" x1=\"80\"></line><text dy=\"1em\" class=\"messageText\" style=\"font-family: inherit; font-size: 16px; font-weight: 400;\" alignment-baseline=\"middle\" dominant-baseline=\"middle\" text-anchor=\"middle\" y=\"180\" x=\"193\">initialize response</text><line marker-end=\"url(#arrowhead)\" 
stroke=\"none\" stroke-width=\"2\" class=\"messageLine1\" style=\"stroke-dasharray: 3px, 3px; fill: none;\" y2=\"215\" x2=\"83\" y1=\"215\" x1=\"302\"></line><text dy=\"1em\" class=\"messageText\" style=\"font-family: inherit; font-size: 16px; font-weight: 400;\" alignment-baseline=\"middle\" dominant-baseline=\"middle\" text-anchor=\"middle\" y=\"230\" x=\"190\">initialized notification</text><line marker-end=\"url(#filled-head)\" stroke=\"none\" stroke-width=\"2\" class=\"messageLine1\" style=\"stroke-dasharray: 3px, 3px; fill: none;\" y2=\"265\" x2=\"299\" y1=\"265\" x1=\"80\"></line><text dy=\"1em\" class=\"messageText\" style=\"font-family: inherit; font-size: 16px; font-weight: 400;\" alignment-baseline=\"middle\" dominant-baseline=\"middle\" text-anchor=\"middle\" y=\"460\" x=\"190\">Disconnect</text><line marker-end=\"url(#filled-head)\" stroke=\"none\" stroke-width=\"2\" class=\"messageLine1\" style=\"stroke-dasharray: 3px, 3px; fill: none;\" y2=\"495\" x2=\"299\" y1=\"495\" x1=\"80\"></line></svg></p>\n<figure class=\"post-cta\">\n  <figcaption>\n    <h1>You can play with this right now.</h1>\n    <p>MCPs are barely six months old, but we are keeping up with the latest</p>\n      <a class=\"btn btn-lg\" href=\"https://fly.io/docs/mcp\">\n        Try launching your MCP server on Fly.io today <span class='opacity-50'>→</span>\n      </a>\n  </figcaption>\n  <div class=\"image-container\">\n    <img src=\"/static/images/cta-kitty.webp\" srcset=\"/static/images/[email protected] 2x\" alt=\"\">\n  </div>\n</figure>\n\n<h2 id='mcps-are-not-inherently-secure-or-private' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#mcps-are-not-inherently-secure-or-private' aria-label='Anchor'></a><span class='plain-code'>MCPs are <strong><i>not</i></strong> Inherently Secure or 
Private</span></h2>\n<p>Here I am not talking about <a href='https://simonwillison.net/2025/Apr/9/mcp-prompt-injection/' title=''>prompt injection</a> or <a href='https://simonwillison.net/2025/May/26/github-mcp-exploited/' title=''>exploitable abilities</a>, though those are real problems too.</p>\n\n<p>I’m talking about something more fundamental. Let’s take a look at the very same <a href='https://github.com/github/github-mcp-server' title=''>GitHub MCP</a> featured in the above two descriptions of an exploitable exposure. The current process for installing a stdio MCP into Claude involves placing secrets in plain text in a well-defined location. And the process for installing the <em>next</em> MCP server is to download a program from a third party and run that tool in a way that has access to this very file.</p>\n\n<p>Addressing MCP security requires a holistic approach, but one key component of that strategy is the ability to run an MCP server on a remote machine which can only be accessed by you, and only after you present a revocable bearer token. 
That remote machine may require access to secrets (like your GitHub token), but those secrets are not something either your LLM client or other MCP servers will be able to access directly.</p>\n<h2 id='mcps-should-be-considered-family' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#mcps-should-be-considered-family' aria-label='Anchor'></a><span class='plain-code'>MCPs should be considered family</span></h2>\n<p>Recapping: <a href='https://www.usa.philips.com/' title=''>Philips</a> has an <a href='https://developers.meethue.com/' title=''>API and SDK</a> for Hue that is used by perhaps thousands, and has an <a href='https://www.amazon.com/Philips-Hue/dp/B01LWAGUG7' title=''>Alexa Skill</a> that is used by untold millions. Of course, somebody already built a <a href='https://github.com/ThomasRohde/hue-mcp' title=''>Philips Hue MCP Server</a>.</p>\n\n<p>LLMs are brains. MCPs give LLMs eyes, hands, and legs. The end result is a robot. Most MCPs today are brainless—they merely are eyes, hands, and legs. The results are appliances. I buy and install a dishwasher. You do too. Both of our dishwashers perform the same basic function. As do any Hue lights we may buy.</p>\n\n<p>In The Jetsons, <a href='https://thejetsons.fandom.com/wiki/Rosey' title=''>Rosie</a> is a maid and housekeeper who is also a member of the family. She is good friends with Jane and takes the role of a surrogate aunt towards Elroy and Judy. Let’s start there and go further.</p>\n\n<p>A person with an LLM can build things. Imagine having a 3D printer that makes robots or agents that act on your behalf. 
You may want an agent to watch for trends in your business, keep track of what events are happening in your area, screen your text messages, or act as a DJ when you get together with friends.</p>\n\n<p>You may share the MCP servers you create with others to use as starting points, but they undoubtedly will adapt them and make them their own.</p>\n<h2 id='closing-thoughts' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#closing-thoughts' aria-label='Anchor'></a><span class='plain-code'>Closing Thoughts</span></h2>\n<p>Don’t get me wrong. I am not saying there won’t be MCPs that are mere appliances—there undoubtedly will be plenty of those. But not all MCPs will be appliances; many will be agents.</p>\n\n<p>Today, most people exploring LLMs are trying to add LLMs to things they already have—for example, IDEs. MCPs flip this. Instead of adding LLMs to something you already have, you add something you already have to an LLM.</p>\n\n<p><a href='https://desktopcommander.app/' title=''>Desktop Commander MCP</a> is an example I’m particularly fond of that does exactly this. It also happens to be a prime example of a tool you want to give root access to, then lock it in a secure box and run someplace other than your laptop. <a href='https://fly.io/docs/mcp/examples/#desktop-commander' title=''>Give it a try</a>.</p>\n\n<p>Microsoft is actively working on <a href='https://developer.microsoft.com/en-us/windows/agentic/' title=''>Agentic Windows</a>. Your smartphone is the obvious next frontier. Today, you have an app store and a web browser. MCP servers have the potential to make both obsolete—and this could happen sooner than you think.</p>",
      "image": {
        "url": "https://fly.io/blog/mcps-everywhere/assets/mcps-everywhere.jpg",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/youre-all-nuts/",
      "title": "My AI Skeptic Friends Are All Nuts",
      "description": null,
      "url": "https://fly.io/blog/youre-all-nuts/",
      "published": "2025-06-02T00:00:00.000Z",
      "updated": "2025-06-10T21:38:22.000Z",
      "content": "<div class=\"lead\"><p>A heartfelt provocation about AI-assisted programming.</p>\n</div>\n<p>Tech execs are mandating LLM adoption. That’s bad strategy. But I get where they’re coming from.</p>\n\n<p>Some of the smartest people I know share a bone-deep belief that AI is a fad —  the next iteration of NFT mania. I’ve been reluctant to push back on them, because, well, they’re smarter than me. But their arguments are unserious, and worth confronting. Extraordinarily talented people are doing work that LLMs already do better, out of  spite.</p>\n\n<p>All progress on LLMs could halt today, and LLMs would remain the 2nd most important thing to happen over the course of my career.</p>\n<div class=\"callout\"><p><strong class=\"font-semibold text-navy-950\">Important caveat</strong>: I’m discussing only the implications of LLMs for software development. For art, music, and writing? I got nothing. I’m inclined to believe the skeptics in those fields. I just don’t believe them about mine.</p>\n</div>\n<p>Bona fides: I’ve been shipping software since the mid-1990s. I started out in boxed, shrink-wrap C code.  Survived an ill-advised <a href='https://www.amazon.com/Modern-Design-Generic-Programming-Patterns/dp/0201704315' title=''>Alexandrescu</a> C++ phase. Lots of Ruby and Python tooling.  Some kernel work. A whole lot of server-side C, Go, and Rust. However you define “serious developer”, I qualify. Even if only on one of your lower tiers.</p>\n<h3 id='level-setting' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#level-setting' aria-label='Anchor'></a><span class='plain-code'>level setting</span></h3><div class=\"right-sidenote\"><p>† (or, God forbid, 2 years ago with Copilot)</p>\n</div>\n<p>First, we need to get on the same page. 
If you were trying and failing to use an LLM for code 6 months ago †, you’re not doing what most serious LLM-assisted coders are doing.</p>\n\n<p>People coding with LLMs today use agents. Agents get to poke around your codebase on their own. They author files directly. They run tools. They compile code, run tests, and iterate on the results. They also:</p>\n\n<ul>\n<li>pull in arbitrary code from the tree, or from other trees online, into their context windows,\n</li><li>run standard Unix tools to navigate the tree and extract information,\n</li><li>interact with Git,\n</li><li>run existing tooling, like linters, formatters, and model checkers, and\n</li><li>make essentially arbitrary tool calls (that you set up) through MCP.\n</li></ul>\n<div class=\"callout\"><p>The code in an agent that actually “does stuff” with code is not, itself, AI. This should reassure you. It’s surprisingly simple systems code, wired to ground truth about programming in the same way a Makefile is. You could write an effective coding agent in a weekend. Its strengths would have more to do with how you think about and structure builds and linting and test harnesses than with how advanced o3 or Sonnet have become.</p>\n</div>\n<p>If you’re making requests on a ChatGPT page and then pasting the resulting (broken) code into your editor, you’re not doing what the AI boosters are doing. 
No wonder you’re talking past each other.</p>\n<h3 id='the-positive-case' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-positive-case' aria-label='Anchor'></a><span class='plain-code'>the positive case</span></h3>\n<p><img alt=\"four quadrants of tedium and importance\" src=\"/blog/youre-all-nuts/assets/code-quad.png?2/3&center\" /></p>\n\n<p>LLMs can write a large fraction of all the tedious code you’ll ever need to write. And most code on most projects is tedious. LLMs drastically reduce the number of things you’ll ever need to Google. They look things up themselves. Most importantly, they don’t get tired; they’re immune to inertia.</p>\n\n<p>Think of anything you wanted to build but didn’t. You tried to home in on some first steps. If you’d been in the limerent phase of a new programming language, you’d have started writing. But you weren’t, so you put it off, for a day, a year, or your whole career.</p>\n\n<p>I can feel my blood pressure rising thinking of all the bookkeeping and Googling and dependency drama of a new project. An LLM can be instructed to just figure all that shit out. Often, it will drop you precisely at that golden moment where shit almost works, and development means tweaking code and immediately seeing things work better. That dopamine hit is why I code.</p>\n\n<p>There’s a downside. Sometimes, gnarly stuff needs doing. But you don’t wanna do it. So you refactor unit tests, soothing yourself with the lie that you’re doing real work. But an LLM can be told to go refactor all your unit tests. An agent can occupy itself for hours putzing with your tests in a VM and come back later with a PR. If you listen to me, you’ll know that. You’ll feel worse yak-shaving. 
You’ll end up doing… real work.</p>\n<h3 id='but-you-have-no-idea-what-the-code-is' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-you-have-no-idea-what-the-code-is' aria-label='Anchor'></a><span class='plain-code'>but you have no idea what the code is</span></h3>\n<p>Are you a vibe coding Youtuber? Can you not read code? If so: astute point. Otherwise: what the fuck is wrong with you?</p>\n\n<p>You’ve always been responsible for what you merge to <code>main</code>. You were five years ago. And you are tomorrow, whether or not you use an LLM.</p>\n\n<p>If you build something with an LLM that people will depend on, read the code. In fact, you’ll probably do more than that. You’ll spend 5-10 minutes knocking it back into your own style. LLMs are <a href='https://github.com/PatrickJS/awesome-cursorrules' title=''>showing signs of adapting</a> to local idiom, but we’re not there yet.</p>\n\n<p>People complain about LLM-generated code being “probabilistic”. No it isn’t. It’s code. It’s not Yacc output. It’s knowable. The LLM might be stochastic. But the LLM doesn’t matter. What matters is whether you can make sense of the result, and whether your guardrails hold.</p>\n\n<p>Reading other people’s code is part of the job. If you can’t metabolize the boring, repetitive code an LLM generates: skills issue! How are you handling the chaos human developers turn out on a deadline?</p>\n<div class=\"right-sidenote\"><p>† (because it can hold 50-70kloc in its context window)</p>\n</div>\n<p>For the last month or so, Gemini 2.5 has been my go-to †. Almost nothing it spits out for me merges without edits. I’m sure there’s a skill to getting a SOTA model to one-shot a feature-plus-merge! But I don’t care. 
I like moving the code around and chuckling to myself while I delete all the stupid comments. I have to read the code line-by-line anyways.</p>\n<h3 id='but-hallucination' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-hallucination' aria-label='Anchor'></a><span class='plain-code'>but hallucination</span></h3>\n<p>If hallucination matters to you, your programming language has let you down.</p>\n\n<p>Agents lint. They compile and run tests. If their LLM invents a new function signature, the agent sees the error. They feed it back to the LLM, which says “oh, right, I totally made that up” and then tries again.</p>\n\n<p>You’ll only notice this happening if you watch the chain of thought log your agent generates. Don’t. This is why I like <a href='https://zed.dev/agentic' title=''>Zed’s agent mode</a>: it begs you to tab away and let it work, and pings you with a desktop notification when it’s done.</p>\n\n<p>I’m sure there are still environments where hallucination matters. But “hallucination” is the first thing developers bring up when someone suggests using LLMs, despite it being (more or less) a solved problem.</p>\n<h3 id='but-the-code-is-shitty-like-that-of-a-junior-developer' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-the-code-is-shitty-like-that-of-a-junior-developer' aria-label='Anchor'></a><span class='plain-code'>but the code is shitty, like that of a junior developer</span></h3>\n<p>Does an intern cost $20/month? 
Because that’s what Cursor.ai costs.</p>\n\n<p>Part of being a senior developer is making less-able coders productive, be they fleshly or algebraic. Using agents well is both a skill and an engineering project all its own, of prompts, indices, <a href='https://fly.io/blog/semgrep-but-for-real-now/' title=''>and (especially) tooling.</a> LLMs only produce shitty code if you let them.</p>\n<div class=\"right-sidenote\"><p>† (Also: 100% of all the Bash code you should author ever again)</p>\n</div>\n<p>Maybe the current confusion is about who’s doing what work. Today, LLMs do a lot of typing, Googling, test cases †, and edit-compile-test-debug cycles. But even the most Claude-poisoned serious developers in the world still own curation, judgement, guidance, and direction.</p>\n\n<p>Also: let’s stop kidding ourselves about how good our human first cuts really are.</p>\n<h3 id='but-its-bad-at-rust' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-its-bad-at-rust' aria-label='Anchor'></a><span class='plain-code'>but it’s bad at rust</span></h3>\n<p>It’s hard to get a good toolchain for Brainfuck, too. Life’s tough in the aluminum siding business.</p>\n<div class=\"right-sidenote\"><p>† (and they surely will; the Rust community takes tooling seriously)</p>\n</div>\n<p>A lot of LLM skepticism probably isn’t really about LLMs. It’s projection. People say “LLMs can’t code” when what they really mean is “LLMs can’t write Rust”. Fair enough! But people select languages in part based on how well LLMs work with them, so Rust people should get on that †.</p>\n\n<p>I work mostly in Go. I’m confident the designers of the Go programming language didn’t set out to produce the most LLM-legible language in the industry. They succeeded nonetheless. 
Go has just enough type safety, an extensive standard library, and a culture that prizes (often repetitive) idiom. LLMs kick ass generating it.</p>\n\n<p>All this is to say: I write some Rust. I like it fine. If LLMs and Rust aren’t working for you, I feel you. But if that’s your whole thing, we’re not having the same argument.</p>\n<h3 id='but-the-craft' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-the-craft' aria-label='Anchor'></a><span class='plain-code'>but the craft</span></h3>\n<p>Do you like fine Japanese woodworking? All hand tools and sashimono joinery? Me too. Do it on your own time.</p>\n<div class=\"right-sidenote\"><p>† (I’m a piker compared to my woodworking friends)</p>\n</div>\n<p>I have a basic wood shop in my basement †. I could get a lot of satisfaction from building a table. And, if that table is a workbench or a grill table, sure, I’ll build it. But if I need, like, a table? For people to sit at? In my office? I buy a fucking table.</p>\n\n<p>Professional software developers are in the business of solving practical problems for people with code. We are not, in our day jobs, artisans. Steve Jobs was wrong: we do not need to carve the unseen feet in the sculpture. Nobody cares if the logic board traces are pleasingly routed. If anything we build endures, it won’t be because the codebase was beautiful.</p>\n\n<p>Besides, that’s not really what happens. If you’re taking time carefully golfing functions down into graceful, fluent, minimal functional expressions, alarm bells should ring. You’re yak-shaving. The real work has depleted your focus. You’re not building: you’re self-soothing.</p>\n\n<p>Which, wait for it, is something LLMs are good for. 
They devour schlep, and clear a path to the important stuff, where your judgement and values really matter.</p>\n<h3 id='but-the-mediocrity' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-the-mediocrity' aria-label='Anchor'></a><span class='plain-code'>but the mediocrity</span></h3>\n<p>As a mid-late career coder, I’ve come to appreciate mediocrity. You should be so lucky as to  have it flowing almost effortlessly from a tap.</p>\n\n<p>We all write mediocre code. Mediocre code: often fine. Not all code is equally important. Some code should be mediocre. Maximum effort on a random unit test? You’re doing something wrong. Your team lead should correct you.</p>\n\n<p>Developers all love to preen about code. They worry LLMs lower the “ceiling” for quality. Maybe. But they also raise the “floor”.</p>\n\n<p>Gemini’s floor is higher than my own.  My code looks nice. But it’s not as thorough. LLM code is repetitive. But mine includes dumb contortions where I got too clever trying to DRY things up.</p>\n\n<p>And LLMs aren’t mediocre on every axis. They almost certainly have a bigger bag of algorithmic tricks than you do: radix tries, topological sorts, graph reductions, and LDPC codes. Humans romanticize <code>rsync</code> (<a href='https://www.andrew.cmu.edu/course/15-749/READINGS/required/cas/tridgell96.pdf' title=''>Andrew Tridgell</a> wrote a paper about it!). To an LLM it might not be that much more interesting than a SQL join.</p>\n\n<p>But I’m getting ahead of myself. It doesn’t matter. If truly mediocre code is all we ever get from LLMs, that’s still huge. 
It’s that much less mediocre code humans have to write.</p>\n<h3 id='but-itll-never-be-agi' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-itll-never-be-agi' aria-label='Anchor'></a><span class='plain-code'>but it’ll never be AGI</span></h3>\n<p>I don’t give a shit.</p>\n\n<p>Smart practitioners get wound up by the AI/VC hype cycle. I can’t blame them. But it’s not an argument. Things either work or they don’t, no matter what Jensen Huang has to say about it.</p>\n<h3 id='but-they-take-rr-jerbs' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-they-take-rr-jerbs' aria-label='Anchor'></a><span class='plain-code'>but they take-rr jerbs</span></h3>\n<p><a href='https://news.ycombinator.com/item?id=43776612' title=''>So does open source.</a> We used to pay good money for databases.</p>\n\n<p>We’re a field premised on automating other people’s jobs away. “Productivity gains,” say the economists. You get what that means, right? Fewer people doing the same stuff. Talked to a travel agent lately? Or a floor broker? Or a record store clerk? Or a darkroom tech?</p>\n\n<p>When this argument comes up, libertarian-leaning VCs start the chant: lamplighters, creative destruction, new kinds of work. Maybe. But I’m not hypnotized. I have no fucking clue whether we’re going to be better off after LLMs. Things could get a lot worse for us.</p>\n\n<p>LLMs really might displace many software developers. That’s not a high horse we get to ride. Our jobs are just as much in tech’s line of fire as everybody else’s have been for the last 3 decades. 
We’re not <a href='https://en.wikipedia.org/wiki/2024_United_States_port_strike' title=''>East Coast dockworkers</a>; we won’t stop progress on our own.</p>\n<h3 id='but-the-plagiarism' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-the-plagiarism' aria-label='Anchor'></a><span class='plain-code'>but the plagiarism</span></h3>\n<p>Artificial intelligence is profoundly — and probably unfairly — threatening to visual artists in ways that might be hard to appreciate if you don’t work in the arts.</p>\n\n<p>We imagine artists spending their working hours pushing the limits of expression. But the median artist isn’t producing gallery pieces. They produce on brief: turning out competent illustrations and compositions for magazine covers, museum displays, motion graphics, and game assets.</p>\n\n<p>LLMs easily — alarmingly — clear industry quality bars. Gallingly, one of the things they’re best at is churning out just-good-enough facsimiles of human creative work.  I have family in visual arts. I can’t talk to them about LLMs. I don’t blame them. They’re probably not wrong.</p>\n\n<p>Meanwhile, software developers spot code fragments <a href=\"https://arxiv.org/abs/2311.17035\">seemingly lifted</a> from public repositories on Github and lose their shit. What about the licensing? If you’re a lawyer, I defer. But if you’re a software developer playing this card? Cut me a little slack as I ask you to shove this concern up your ass. No profession has demonstrated more contempt for intellectual property.</p>\n\n<p>The median dev thinks Star Wars and Daft Punk are a public commons. The great cultural project of developers has been opposing any protection that might inconvenience a monetizable media-sharing site. When they fail at policy, they route around it with coercion. 
They stand up global-scale piracy networks and sneer at anybody who so much as tries to preserve a new-release window for a TV show.</p>\n\n<p>Call any of this out if you want to watch a TED talk about how hard it is to stream <em>The Expanse</em> on LibreWolf. Yeah, we get it. You don’t believe in IPR. Then shut the fuck up about IPR. Reap the whirlwind.</p>\n\n<p>It’s all special pleading anyways. LLMs digest code further than you do. If you don’t believe a typeface designer can stake a moral claim on the terminals and counters of a letterform, you sure as hell can’t be possessive about a red-black tree.</p>\n<h3 id='positive-case-redux' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#positive-case-redux' aria-label='Anchor'></a><span class='plain-code'>positive case redux</span></h3>\n<p>When I started writing a couple days ago, I wrote a section to “level set” to the state of the art of LLM-assisted programming. A bluefish filet has a longer shelf life than an LLM take. In the time it took you to read this, everything changed.</p>\n\n<p>Kids today don’t just use agents; they use asynchronous agents. They wake up, free-associate 13 different things for their LLMs to work on, make coffee, fill out a TPS report, drive to the Mars Cheese Castle, and then check their notifications. They’ve got 13 PRs to review. Three get tossed and re-prompted. Five of them get the same feedback a junior dev gets. And five get merged.</p>\n\n<p><em>“I’m sipping rocket fuel right now,”</em> a friend tells me. <em>“The folks on my team who aren’t embracing AI? It’s like they’re standing still.”</em> He’s not bullshitting me. He doesn’t work in SFBA. He’s got no reason to lie.</p>\n\n<p>There’s plenty of things I can’t trust an LLM with. No LLM has any access to prod here. 
But I’ve been first responder on an incident and fed 4o — not o4-mini, 4o — log transcripts, and watched it in seconds spot LVM metadata corruption issues on a host we’ve been complaining about for months. Am I better than an LLM agent at interrogating OpenSearch logs and Honeycomb traces? No. No, I am not.</p>\n\n<p>To the consternation of many of my friends, I’m not a radical or a futurist. I’m a statist. I believe in the haphazard perseverance of complex systems, of institutions, of reversions to the mean. I write Go and Python code. I’m not a Kool-aid drinker.</p>\n\n<p>But something real is happening. My smartest friends are blowing it off. Maybe I persuade you. Probably I don’t. But we need to be done making space for bad arguments.</p>\n<h3 id='but-im-tired-of-hearing-about-it' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-im-tired-of-hearing-about-it' aria-label='Anchor'></a><span class='plain-code'>but i’m tired of hearing about it</span></h3>\n<p>And here I rejoin your company. I read <a href='https://simonwillison.net/' title=''>Simon Willison</a>, and that’s all I really need. But all day, every day, a sizable chunk of the front page of HN is allocated to LLMs: incremental model updates, startups doing things with LLMs, LLM tutorials, screeds against LLMs. It’s annoying!</p>\n\n<p>But AI is also incredibly — a word I use advisedly — important. It’s getting the same kind of attention that smart phones got in 2008, and not as much as the Internet got. That seems about right.</p>\n\n<p>I think this is going to get clearer over the next year. The cool kid haughtiness about “stochastic parrots” and “vibe coding” can’t survive much more contact with reality. I’m snarking about these people, but I meant what I said: they’re smarter than me. 
And when they get over this affectation, they’re going to make coding agents profoundly more effective than they are today.</p>",
      "image": {
        "url": "https://fly.io/blog/youre-all-nuts/assets/whoah.png",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/kamal-in-production/",
      "title": "Using Kamal 2.0 in Production",
      "description": null,
      "url": "https://fly.io/blog/kamal-in-production/",
      "published": "2025-05-29T00:00:00.000Z",
      "updated": "2025-06-02T20:40:48.000Z",
      "content": "<p><a href='https://pragprog.com/titles/rails8/agile-web-development-with-rails-8/' title=''>Agile Web Development with Rails 8</a> is off to production, where they do things like editing, indexing, pagination, and printing. In researching the chapter on Deployment and Production, I became very dissatisfied with the content available on Kamal. I ended up writing my own, and it went well beyond the scope of the book. I then extracted what I needed from the result and put it in the book.</p>\n\n<p>Now that I have some spare time, I took a look at the greater work. It was more than a chapter and less than a book, so I decided to publish it <a href='https://rubys.github.io/kamal-in-production/' title=''>online</a>.</p>\n\n<p>This took me only a matter of hours. I had my notes in the XML grammar that Pragmatic Programming uses for books. I asked GitHub Copilot to convert them to Markdown. It did the job without my having to explain the grammar. It made intelligent guesses as to how to handle footnotes and got a number of these wrong, but that was easy to fix. On a lark, I asked it to proofread the content, and it did that too.</p>\n\n<hr>\n\n<p>Don’t get me wrong, Kamal is great. There are plenty of videos on how to get toy projects online, and the documentation will tell you what each field in the configuration file does. But none pull together everything you need to deploy a real project. For example, <a href='https://rubys.github.io/kamal-in-production/Assemble/' title=''>there are seven things you need to get started</a>. Some are optional, some you may already have, and all can be gathered quickly <strong class='font-semibold text-navy-950'>if you have a list</strong>.</p>\n\n<p>Kamal is just one piece of the puzzle. To deploy your software using Kamal, you need to be aware of the vast Docker ecosystem. You will want to set up a builder, sign up for a container repository, and lock down your secrets. 
And as you grow, you will want a load balancer and a managed database.</p>\n\n<p>And production is much more than copying files and starting a process. It is ensuring that your database is backed up, that your logs are searchable, and that your application is being monitored. It is also about ensuring that your application is secure.</p>\n\n<p>My list is opinionated. Each choice has a lot of options. For SSH keys, there are a lot of key types. If you don’t have an opinion, go with what GitHub recommends. For hosting, there are a lot of providers, and most videos start with Hetzner, which is a solid choice. But did you know that there are separate paths for Cloud and Robot, and when you would want either?</p>\n\n<p>A list with default options and alternatives highlighted was what would have helped me get started. Now it is available to you. The <a href='https://github.com/rubys/kamal-in-production/' title=''>source is on GitHub</a>. <a href='https://creativecommons.org/public-domain/cc0/' title=''>CC0 licensed</a>. Feel free to add side pages or links to document DigitalOcean, add other monitoring tools, or simply correct a typo that GitHub Copilot and I missed (or made).</p>\n\n<hr>\n\n<p>And if you happen to be in the southeastern part of the US in August, come see me talk on this topic at the <a href='https://blog.carolina.codes/p/announcing-our-2025-speakers' title=''>Carolina Code Conference</a>. If you can’t make it, the presentation will be recorded and posted online.</p>",
      "image": {
        "url": "https://fly.io/blog/kamal-in-production/assets/production.jpg",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/parking-lot-ffffffffffffffff/",
      "title": "parking_lot: ffffffffffffffff...",
      "description": null,
      "url": "https://fly.io/blog/parking-lot-ffffffffffffffff/",
      "published": "2025-05-28T00:00:00.000Z",
      "updated": "2025-06-02T20:40:48.000Z",
      "content": "<div class=\"lead\"><p>We’re Fly.io, a public cloud that runs apps in a bunch of locations all over the world. This is a post about a gnarly bug in our Anycast router, the largest Rust project in our codebase. You won’t need much of any Rust to follow it.</p>\n</div>\n<p>The basic idea is simple: customers give us Docker containers, and tell us which one of 30+ regions around the world they want them to run in. We convert the containers into lightweight virtual machines, and then link them to an Anycast network. If you boot up an app here in Sydney and Frankfurt, and a request for it lands in Narita, it’ll get routed to Sydney. The component doing that work is called <code>fly-proxy</code>. It’s a Rust program, and it has been ill behaved of late.</p>\n<h3 id='dramatis-personae' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#dramatis-personae' aria-label='Anchor'></a><span class='plain-code'>Dramatis Personae</span></h3>\n<p><code>fly-proxy</code>, our intrepid Anycast router.</p>\n\n<p><code>corrosion</code>, our intrepid Anycast routing protocol.</p>\n\n<p><code>Rust</code>, a programming language you probably don’t use.</p>\n\n<p><code>read-write locks</code>, a synchronization primitive that allows for many readers <em>or</em> one single writer.</p>\n\n<p><code>parking_lot</code>, a well-regarded optimized implementation of locks in Rust.</p>\n<div class=\"callout\"><p>Gaze not into the abyss, lest you become recognized as an <strong class=\"font-semibold text-navy-950\"><em>abyss domain expert</em></strong>, and they expect you keep gazing into the damn thing</p>\n\n<p><em>Mathewson <a href=\"https://x.com/nickm_tor/status/860234274842324993?lang=en\" title=\"\">6:31</a></em></p>\n</div><h3 id='anycast-routing' class='group flex items-start 
whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#anycast-routing' aria-label='Anchor'></a><span class='plain-code'>Anycast Routing</span></h3>\n<p>You already know how to build a proxy. Connection comes in, connection goes out, copy back and forth. So for the amount of time we spend writing about <code>fly-proxy</code>, you might wonder what the big deal is.</p>\n\n<p>To be fair, in the nuts and bolts of actually proxying requests, <code>fly-proxy</code> does some interesting stuff. For one thing, it’s <a href='https://github.com/jedisct1/yes-rs' title=''>written in Rust</a>, which is apparently a big deal all on its own. It’s a large, asynchronous codebase that handles multiple protocols, TLS termination, and certificate issuance. It exercises a lot of <a href='https://tokio.rs/' title=''>Tokio</a> features.</p>\n\n<p>But none of this is the hard part of <code>fly-proxy</code>.</p>\n\n<p>We operate thousands of servers around the world. A customer Fly Machine might get scheduled onto any of them; in some places, the expectation we set with customers is that their Machines will potentially start in less than a second. More importantly, a Fly Machine can terminate instantly. In both cases, but especially the latter, <code>fly-proxy</code> potentially needs to know, so that it does (or doesn’t) route traffic there.</p>\n\n<p>This is the hard problem: managing millions of connections for millions of apps. It’s a lot of state to manage, and it’s in constant flux. 
We refer to this as the “state distribution problem”, but really, it quacks like a routing protocol.</p>\n<h3 id='our-routing-protocol-is-corrosion' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#our-routing-protocol-is-corrosion' aria-label='Anchor'></a><span class='plain-code'>Our Routing Protocol is Corrosion</span></h3><div class=\"right-sidenote\"><p>Corrosion2, to be precise.</p>\n</div>\n<p>We’ve been through multiple iterations of the state management problem, and the stable place we’ve settled is a <a href='https://github.com/superfly/corrosion' title=''>system called Corrosion</a>.</p>\n\n<p>Corrosion is a large SQLite database mirrored globally across several thousand servers. There are three important factors that make Corrosion work:</p>\n\n<ol>\n<li>The SQLite database Corrosion replicates is CRDT-structured.\n</li><li>In our orchestration model, individual worker servers are the source of truth for any Fly Machines running on them; there’s no globally coordinated orchestration state.\n</li><li>We use <a href='http://fly.io/blog/building-clusters-with-serf/#what-serf-is-doing' title=''>SWIM gossip</a> to publish updates from those workers across the fleet.\n</li></ol>\n\n<p>This works. 
A Fly Machine terminates in Dallas; a <code>fly-proxy</code> instance in Singapore knows within a small number of seconds.</p>\n<h3 id='routing-protocol-implementations-are-hard' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#routing-protocol-implementations-are-hard' aria-label='Anchor'></a><span class='plain-code'>Routing Protocol Implementations Are Hard</span></h3>\n<p>A routing protocol is a canonical example of a distributed system. We’ve certainly hit distsys bugs in Corrosion. But this story is about basic implementation issues. </p>\n\n<p>A globally replicated SQLite database is an awfully nice primitive, but we’re not actually doing SQL queries every time a request lands.</p>\n\n<p>In somewhat the same sense as a router works both with a <a href='https://www.dasblinkenlichten.com/rib-and-fib-understanding-the-terminology/' title=''>RIB and a FIB</a>, there is in <code>fly-proxy</code> a system of record for routing information (Corrosion), and then an in-memory aggregation of that information used to make fast decisions. In <code>fly-proxy</code>, that’s called the Catalog. It’s a record of everything in Corrosion a proxy might need to know about to forward requests.</p>\n\n<p>Here’s a fun bug from last year:</p>\n\n<p>At any given point in time, there’s a lot going on inside <code>fly-proxy</code>. It is a highly concurrent system. In particular, there are many readers to the Catalog, and also, when Corrosion updates, writers. 
We manage access to the Catalog with a system of <a href='https://doc.rust-lang.org/std/sync/struct.RwLock.html' title=''>read-write locks</a>.</p>\n\n<p>Rust is big on pattern-matching; instead of control flow based (at bottom) on simple arithmetic expressions, Rust allows you to <code>match</code> exhaustively (with compiler enforcement) on the types of an expression, and the type system expresses things like <code>Ok</code> or <code>Err</code>.</p>\n\n<p>But <code>match</code> can be cumbersome, and so there are shorthands. One of them is <code>if let</code>, which is syntax that makes a pattern match read like a classic <code>if</code> statement. Here’s an example:</p>\n<div class=\"highlight-wrapper group relative rust\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-zarn5q31\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 
rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-zarn5q31\"><span class=\"k\">if</span> <span class=\"k\">let</span> <span class=\"p\">(</span><span class=\"nf\">Some</span><span class=\"p\">(</span><span class=\"nn\">Load</span><span class=\"p\">::</span><span class=\"nf\">Local</span><span class=\"p\">(</span><span class=\"n\">load</span><span class=\"p\">)))</span> <span class=\"o\">=</span> <span class=\"p\">(</span><span class=\"o\">&</span><span class=\"k\">self</span><span class=\"py\">.load</span><span class=\"nf\">.read</span><span class=\"p\">()</span><span class=\"nf\">.get</span><span class=\"p\">(</span><span class=\"o\">...</span><span class=\"p\">))</span> <span class=\"p\">{</span>\n    <span class=\"c1\">// do a bunch of stuff with `load`</span>\n<span class=\"p\">}</span> <span class=\"k\">else</span> <span class=\"p\">{</span>\n    <span class=\"k\">self</span><span class=\"nf\">.init_for</span><span class=\"p\">(</span><span class=\"o\">...</span><span class=\"p\">);</span>\n<span class=\"p\">}</span>\n</code></pre>\n  </div>\n</div>\n<p>The “if” arm of that branch is taken if <code>self.load.read().get()</code> returns a value with the type <code>Some</code>. 
To retrieve that value, the expression calls <code>read()</code> to grab a lock.</p>\n<div class=\"right-sidenote\"><p>though Rust programmers probably notice the bug quickly</p>\n</div>\n<p>The bug is subtle: in that code, the lock <code>self.load.read().get()</code> takes is held not just for the duration of the “if” arm, but also for the “else” arm — you can think of <code>if let</code> expressions as being rewritten to the equivalent <code>match</code> expression, where that lifespan is much clearer.</p>\n\n<p>Anyways that’s real code and it occurred on a code path in <code>fly-proxy</code> that was triggered by a Corrosion update propagating from host to host across our fleet in millisecond intervals of time, converging quickly, like a good little routing protocol, on a global consensus that our entire Anycast routing layer should be deadlocked. Fun times.</p>\n<h3 id='the-watchdog-and-regionalizing' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-watchdog-and-regionalizing' aria-label='Anchor'></a><span class='plain-code'>The Watchdog, and Regionalizing</span></h3>\n<p>The experience of that Anycast outage was arresting, and immediately altered our plans. We set about two tasks, one short-term and one long.</p>\n\n<p>In the short term: we made deadlocks nonlethal with a “watchdog” system. <code>fly-proxy</code> has an internal control channel (it drives a REPL operators can run from our servers). During a deadlock (or dead-loop or exhaustion), that channel becomes nonresponsive. We watch for that and bounce the proxy when it happens. A deadlock is still bad! But it’s a second-or-two-length arrhythmia, not asystole.</p>\n\n<p>Meanwhile, over the long term: we’re confronting the implications of all our routing state sharing a global broadcast domain. 
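To make the `if let` lifetime hazard from the deadlock story above concrete, here is a standalone sketch with made-up types (std's `RwLock` standing in for `parking_lot`'s; `Catalog`, `buggy_shape`, and `fixed_shape` are illustrative names, not fly-proxy's). In Rust editions before 2024, the temporary read guard created in the `if let` scrutinee lives until the end of the entire `if`/`else` expression:

```rust
use std::collections::HashMap;
use std::sync::RwLock;

struct Catalog {
    load: RwLock<HashMap<String, u64>>,
}

impl Catalog {
    // Buggy shape: the temporary guard from `self.load.read()` is still
    // alive in the `else` arm (pre-2024 editions), so taking the write
    // lock there would deadlock. We only return a default to keep the
    // sketch runnable.
    fn buggy_shape(&self, key: &str) -> u64 {
        if let Some(v) = self.load.read().unwrap().get(key).copied() {
            v
        } else {
            // self.load.write() here == deadlock: read guard still held.
            0
        }
    }

    // Fixed shape: end the read borrow before branching, so the `None`
    // arm is free to grab the write lock.
    fn fixed_shape(&self, key: &str) -> u64 {
        let cached = self.load.read().unwrap().get(key).copied(); // guard dropped here
        match cached {
            Some(v) => v,
            None => *self
                .load
                .write()
                .unwrap()
                .entry(key.to_string())
                .or_insert(1),
        }
    }
}
```

The fix is boring: bind the `Option` first so the read guard drops before you branch.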
The update that seized up Anycast last year pertained to an app nobody used. There wasn’t any real reason for any <code>fly-proxy</code> to receive it in the first place. But in the <em>status quo ante</em> of the outage, every proxy received updates for every Fly Machine.</p>\n\n<p>They still do. Why? Because it scales, and fixing it turns out to be a huge lift. It’s a lift we’re still making! It’s just taking time. We call this effort “regionalization”, because the next intermediate goal is to confine most updates to the region (Sydney, Frankfurt, Dallas) in which they occur.</p>\n\n<p>I hope this has been a satisfying little tour of the problem domain we’re working in. We have now reached the point where I can start describing the new bug.</p>\n<h3 id='new-bug-round-1-lazy-loading' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#new-bug-round-1-lazy-loading' aria-label='Anchor'></a><span class='plain-code'>New Bug, Round 1: Lazy Loading</span></h3>\n<p>We want to transform our original Anycast router, designed to assume a full picture of global state, into a regionalized router with partitioned state. A sane approach: have it lazy-load state information. Then you can spare most of the state code; state for apps that never show up at a particular <code>fly-proxy</code> in, say, Hong Kong simply doesn’t get loaded.</p>\n\n<p>For months now, portions of the <code>fly-proxy</code> Catalog have been lazy-loaded. A few weeks ago, the decision is made to get most of the rest of the Catalog state (apps, machines, &c) lazy-loaded as well. It’s a straightforward change and it gets rolled out quickly.</p>\n\n<p>Almost as quickly, proxies begin locking up and getting bounced by the watchdog. 
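A toy version of that watchdog pattern, as a sketch (hypothetical names; the real watchdog drives fly-proxy's internal control channel, not an mpsc pair): ping a health channel, and treat silence past a deadline as death.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Sketch of a liveness watchdog: the "proxy" thread must answer pings;
// if it goes quiet past the deadline, the watchdog declares it dead
// (in the real system: bounce the process, snap a core dump).
fn watchdog_detects_hang(hang: bool) -> bool {
    let (ping_tx, ping_rx) = mpsc::channel::<()>();
    let (pong_tx, pong_rx) = mpsc::channel::<()>();

    // The "proxy": answers pings unless it is deadlocked.
    thread::spawn(move || {
        for _ in ping_rx {
            if hang {
                thread::sleep(Duration::from_secs(3600)); // simulated deadlock
            }
            let _ = pong_tx.send(());
        }
    });

    ping_tx.send(()).unwrap();
    // Watchdog: no pong within the deadline => the proxy is wedged.
    pong_rx.recv_timeout(Duration::from_millis(200)).is_err()
}
```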
Not in the frightening Beijing-Olympics-level coordinated fashion that happened during the Outage, but any watchdog bounce is a problem.</p>\n\n<p>We roll back the change.</p>\n\n<p>From the information we have, we’ve narrowed things down to two suspects. First, lazy-loading changes the read/write patterns and thus the pressure on the RWLocks the Catalog uses; it could just be lock contention. Second, we spot a suspicious <code>if let</code>.</p>\n<h3 id='new-bug-round-2-the-lock-refactor' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#new-bug-round-2-the-lock-refactor' aria-label='Anchor'></a><span class='plain-code'>New Bug, Round 2: The Lock Refactor</span></h3>\n<p>Whichever the case, there’s a reason these proxies locked up with the new lazy-loading code, and our job is to fix it. The <code>if let</code> is easy. Lock contention is a little trickier.</p>\n\n<p>At this point it’s time to introduce a new character to the story, though they’ve been lurking on the stage the whole time: it’s <a href='https://github.com/Amanieu/parking_lot' title=''><code>parking_lot</code></a>, an important, well-regarded, and widely-used replacement for the standard library’s lock implementation.</p>\n\n<p>Locks in <code>fly-proxy</code> are <code>parking_lot</code> locks. People use <code>parking_lot</code> mostly because it is very fast and tight (a lock takes up just a single 64-bit word). But we use it for the feature set. 
The feature we’re going to pull out this time is lock timeouts: the RWLock in <code>parking_lot</code> exposes a <a href='https://amanieu.github.io/parking_lot/parking_lot/struct.RwLock.html#method.try_write_for' title=''><code>try_write_for</code></a> method, which takes a <code>Duration</code>, after which an attempt to grab the write lock fails.</p>\n\n<p>Before rolling out a new lazy-loading <code>fly-proxy</code>, we do some refactoring:</p>\n\n<ul>\n<li>our Catalog write locks all time out, so we’ll get telemetry and a failure recovery path if that’s what’s choking the proxy to death,\n</li><li>we eliminate RAII-style lock acquisition from the Catalog code (in RAII style, you grab a lock into a local value, and the lock is released implicitly when the scope that expression occurs in concludes) and replace it with explicit closures, so you can look at the code and see precisely the interval in which the lock is held, and\n</li><li>since we now have code that is very explicit about locking intervals, we instrument it with logging and metrics so we can see what’s happening.\n</li></ul>\n\n<p>We should be set. The suspicious <code>if let</code> is gone, lock acquisition can time out, and we have all this new visibility.</p>\n\n<p>Nope. Immediately more lockups, all in Europe, especially in <code>WAW</code>.</p>\n<h3 id='new-bug-round-3-telemetry-inspection' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#new-bug-round-3-telemetry-inspection' aria-label='Anchor'></a><span class='plain-code'>New Bug, Round 3: Telemetry Inspection</span></h3>\n<p>That we’re still seeing deadlocks is f'ing weird. We’ve audited all our Catalog locks. 
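For illustration of the refactor described above: `try_write_for` isn't in the standard library, but its failure path can be approximated by polling `try_write` against a deadline, and the closure-style acquisition turns a timeout into an `Err` you can log and recover from (`with_write` is a made-up helper, not fly-proxy's actual code):

```rust
use std::sync::{RwLock, RwLockWriteGuard};
use std::thread;
use std::time::{Duration, Instant};

// Approximation of parking_lot's RwLock::try_write_for using only the
// standard library: poll try_write until the deadline passes.
fn try_write_for<T>(lock: &RwLock<T>, timeout: Duration) -> Option<RwLockWriteGuard<'_, T>> {
    let deadline = Instant::now() + timeout;
    loop {
        match lock.try_write() {
            Ok(guard) => return Some(guard),
            Err(_) => {
                if Instant::now() >= deadline {
                    return None;
                }
                thread::yield_now();
            }
        }
    }
}

// Closure-based acquisition in the spirit of the refactor: the lock is
// held for exactly the closure's duration, and a timeout is a visible
// failure instead of a silent stall.
fn with_write<T, R>(
    lock: &RwLock<T>,
    timeout: Duration,
    f: impl FnOnce(&mut T) -> R,
) -> Result<R, ()> {
    match try_write_for(lock, timeout) {
        Some(mut guard) => Ok(f(&mut *guard)),
        None => Err(()), // a real system would emit telemetry here
    }
}
```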
You can look at the code and see the lifespan of a grabbed lock.</p>\n\n<p>We have a clue: our lock timeout logs spam just before a proxy locks up and is killed by the watchdog. This clue: useless. But we don’t know that yet!</p>\n\n<p>Our new hunch is that something is holding a lock for a very long time, and we just need to find out which thing that is. Maybe the threads updating the Catalog from Corrosion are dead-looping?</p>\n\n<p>The instrumentation should tell the tale. But it does not. We see slow locks… just before the entire proxy locks up. And they happen in benign, quiet applications. Random, benign, quiet applications.</p>\n\n<p><code>parking_lot</code> has a <a href='https://amanieu.github.io/parking_lot/parking_lot/deadlock/index.html' title=''>deadlock detector</a>. If you ask it, it’ll keep a waiting-for dependency graph and detect stalled threads. This runs on its own thread, isolated outside the web of lock dependencies in the rest of the proxy. We cut a build of the proxy with the deadlock detector enabled and wait for something in <code>WAW</code> to lock up. And it does. But <code>parking_lot</code> doesn’t notice. As far as it’s concerned, nothing is wrong.</p>\n\n<p>We are at this moment very happy we did the watchdog thing.</p>\n<h3 id='new-bug-round-4-descent-into-madness' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#new-bug-round-4-descent-into-madness' aria-label='Anchor'></a><span class='plain-code'>New Bug, Round 4: Descent Into Madness</span></h3>\n<p>When the watchdog bounces a proxy, it snaps a core dump from the process it just killed. We are now looking at core dumps. There is only one level of decompensation to be reached below “inspecting core dumps”, and that’s “blaming the compiler”. 
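The waits-for approach that deadlock detector uses boils down to cycle detection over a "thread X is blocked on a lock held by thread Y" graph. A minimal sketch of the idea (not parking_lot's implementation):

```rust
use std::collections::HashMap;

// waits_for maps each blocked thread to the thread holding the lock it
// wants. A deadlock is a cycle in this graph: some thread transitively
// waits on itself. Floyd's tortoise-and-hare finds cycles without
// extra bookkeeping.
fn has_deadlock(waits_for: &HashMap<u32, u32>) -> bool {
    for &start in waits_for.keys() {
        let (mut slow, mut fast) = (start, start);
        loop {
            // advance `fast` two steps; a missing key means the chain
            // ends at a running (unblocked) thread: no cycle from here
            match waits_for.get(&fast) {
                Some(&next) => fast = next,
                None => break,
            }
            match waits_for.get(&fast) {
                Some(&next) => fast = next,
                None => break,
            }
            // advance `slow` one step; if they meet, we found a cycle
            slow = waits_for[&slow];
            if slow == fast {
                return true;
            }
        }
    }
    false
}
```

What made this bug so maddening, per the core dumps, is exactly that no such cycle existed: every thread was waiting, and nobody held the lock.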
We will get there.</p>\n\n<p>Here’s Pavel, at the time:</p>\n\n<blockquote>\n<p>I’ve been staring at the last core dump from <code>waw</code> . It’s quite strange.\nFirst, there is no thread that’s running inside the critical section. Yet, there is a thread that’s waiting to acquire write lock and a bunch of threads waiting to acquire a read lock.\nThat’s doesn’t prove anything, of course, as a thread holding catalog write lock might have just released it before core dump was taken. But that would be quite a coincidence.</p>\n</blockquote>\n\n<p>The proxy is locked up. But there are no threads in a critical section for the catalog. Nevertheless, we can see stack traces of multiple threads waiting to acquire locks. In the moment, Pavel thinks this could be a fluke. But we’ll soon learn that <em>every single stack trace</em> shows the same pattern: everything wants the Catalog lock, but nobody has it.</p>\n\n<p>It’s hard to overstate how weird this is. It breaks both our big theories: it’s not compatible with a Catalog deadlock that we missed, and it’s not compatible with a very slow lock holder. Like a podcast listener staring into the sky at the condensation trail of a jetliner, we begin reaching for wild theories. For instance: <code>parking_lot</code> locks are synchronous, but we’re a Tokio application; something somewhere could be taking an async lock that’s confusing the runtime. 
Alas, no.</p>\n\n<p>On the plus side, we are now better at postmortem core dump inspection with <code>gdb</code>.</p>\n<h3 id='new-bug-round-5-madness-gives-way-to-desperation' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#new-bug-round-5-madness-gives-way-to-desperation' aria-label='Anchor'></a><span class='plain-code'>New Bug, Round 5: Madness Gives Way To Desperation</span></h3>\n<p>Fuck it, we’ll switch to <code>read_recursive</code>.</p>\n\n<p>A recursive read lock is an eldritch rite invoked when you need to grab a read lock deep in a call tree where you already grabbed that lock, but can’t be arsed to structure the code properly to reflect that. When you ask for a recursive lock, if you already hold the lock, you get the lock again, instead of a deadlock.</p>\n\n<p>Our theory: <code>parking_lot</code> goes through some trouble to make sure a stampede of readers won’t starve writers, who are usually outnumbered. It prefers writers by preventing readers from acquiring locks when there’s at least one waiting writer. And <code>read_recursive</code> sidesteps that logic.</p>\n\n<p>Maybe there’s some ultra-slow reader somehow not showing up in our traces, and maybe switching to recursive locks will cut through that.</p>\n\n<p>This does not work. At least, not how we hoped it would. 
It does generate a new piece of evidence:  <code>RwLock reader count overflow</code> log messages, and lots of them.</p>\n<h3 id='there-are-things-you-are-not-meant-to-know' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#there-are-things-you-are-not-meant-to-know' aria-label='Anchor'></a><span class='plain-code'>There Are Things You Are Not Meant To Know</span></h3>\n<p>You’re reading a 3,000 word blog post about a single concurrency bug, so my guess is you’re the kind of person who compulsively wants to understand how everything works. That’s fine, but a word of advice: there are things where, if you find yourself learning about them in detail, something has gone wrong.</p>\n\n<p>One of those things is the precise mechanisms used by your RWLock implementation.</p>\n\n<p>The whole point of <code>parking_lot</code> is that the locks are tiny, marshalled into a 64 bit word. Those bits are partitioned into <a href='https://github.com/Amanieu/parking_lot/blob/master/src/raw_rwlock.rs' title=''>4 signaling bits</a> (<code>PARKED</code>, <code>WRITER_PARKED</code>, <code>WRITER</code>, and <code>UPGRADEABLE</code>) and a 60-bit counter of lock holders.</p>\n\n<blockquote>\n<p>Me, a dummy: sounds like we overflowed that counter.</p>\n\n<p>Pavel, a genius: we are not overflowing a 60-bit counter.</p>\n</blockquote>\n\n<p>Ruling out a genuine overflow (which would require paranormal activity) leaves us with corruption as the culprit. Something is bashing our lock word. If the counter is corrupted, maybe the signaling bits are too; then we’re in an inconsistent state, an artificial deadlock.</p>\n\n<p>Easily confirmed. We cast the lock words into <code>usize</code> and log them. 
Sure enough, they’re <code>0xFFFFFFFFFFFFFFFF</code>.</p>\n\n<p>This is a smoking gun, because it implies all 4 signaling bits are set, and that includes <code>UPGRADEABLE</code>. Upgradeable locks are read-locks that can be “upgraded” to write locks. We don’t use them.</p>\n\n<p>This looks like classic memory corruption. But in our core dumps, memory doesn’t appear corrupted: the only thing set all <code>FFh</code> is the lock word.</p>\n\n<p>We compile and run our test suites <a href='https://github.com/rust-lang/miri' title=''>under <code>miri</code></a>, a Rust interpreter for its <a href='https://rustc-dev-guide.rust-lang.org/mir/index.html' title=''>MIR IR</a>. <code>miri</code> does all sorts of cool UB detection, and does in fact spot some UB in our tests, which we are at the time excited about until we deploy the fixes and nothing gets better.</p>\n\n<p>At this point, Saleem suggests guard pages. We could <code>mprotect</code> memory pages around the lock to force a panic if a wild write hits <em>near</em> the lock word, or just allocate zero pages and check them, which we are at the time excited about until we deploy the guard pages and nothing hits them.</p>\n<h3 id='the-non-euclidean-horror-at-the-heart-of-this-bug' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-non-euclidean-horror-at-the-heart-of-this-bug' aria-label='Anchor'></a><span class='plain-code'>The Non-Euclidean Horror At The Heart Of This Bug</span></h3>\n<p>At this point we should recap where we find ourselves:</p>\n\n<ul>\n<li>We made a relatively innocuous change to our proxy, which caused proxies in Europe to get watchdog-killed.\n</li><li>We audited and eliminated all the nasty <code>if-letses</code>.\n</li><li>We replaced all RAII lock acquisitions with explicit closures, and 
instrumented the closures. \n</li><li>We enabled <code>parking_lot</code> deadlock detection. \n</li><li>We captured and analyzed core dumps for the killed proxies. \n</li><li>We frantically switched to recursive read locks, which generated a new error.\n</li><li>We spotted what looks like memory corruption, but only of that one tiny lock word.\n</li><li>We ran our code under an IR interpreter to find UB, fixed some UB, and didn’t fix the bug.\n</li><li>We set up guard pages to catch wild writes.\n</li></ul>\n\n<p>In Richard Cook’s essential <a href='https://how.complexsystems.fail/' title=''>“How Complex Systems Fail”</a>, rule #5 is that “complex systems operate in degraded mode”. <em>The system continues to function because it contains so many redundancies and because people can make it function, despite the presence of many flaws</em>. Maybe <code>fly-proxy</code> is now like the RBMK reactors at Chernobyl, a system that works just fine as long as operators know about and compensate for its known concurrency flaws, and everybody lives happily ever after.</p>\n<div class=\"right-sidenote\"><p>We are, in particular, running on the most popular architecture for its RWLock implementation.</p>\n</div>\n<p>We have reached the point where serious conversations are happening about whether we’ve found a Rust compiler bug. Amusingly, <code>parking_lot</code> is so well regarded among Rustaceans that it’s equally if not more plausible that Rust itself is broken.</p>\n\n<p>Nevertheless, we close-read the RWLock implementation. 
And we spot this:</p>\n<div class=\"highlight-wrapper group relative rust\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-c23zvw3n\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] 
text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-c23zvw3n\"><span class=\"k\">let</span> <span class=\"n\">state</span> <span class=\"o\">=</span> <span class=\"k\">self</span><span class=\"py\">.state</span><span class=\"nf\">.fetch_add</span><span class=\"p\">(</span>\n   <span class=\"n\">prev_value</span><span class=\"nf\">.wrapping_sub</span><span class=\"p\">(</span><span class=\"n\">WRITER_BIT</span> <span class=\"p\">|</span> <span class=\"n\">WRITER_PARKED_BIT</span><span class=\"p\">),</span>\n                           <span class=\"nn\">Ordering</span><span class=\"p\">::</span><span class=\"n\">Relaxed</span><span class=\"p\">);</span>\n</code></pre>\n  </div>\n</div>\n<p>This looks like gibberish, so let’s rephrase that code to see what it’s actually doing:</p>\n<div class=\"highlight-wrapper group relative rust\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-oq3znyk\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent 
group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-oq3znyk\"><span class=\"k\">let</span> <span class=\"n\">state</span> <span class=\"o\">=</span> <span class=\"k\">self</span><span class=\"py\">.state</span> <span class=\"o\">&</span> <span class=\"o\">~</span><span class=\"p\">(</span><span class=\"n\">WRITER_BIT</span><span class=\"p\">|</span><span class=\"n\">WRITER_PARKED_BIT</span><span class=\"p\">);</span>\n</code></pre>\n  </div>\n</div>\n<p>If you know exactly the state of the word you’re writing to, and you know exactly the limited set of permutations the bits of that word can take (for instance, because there’s only 4 signaling bits), then instead of clearing bit by fetching a word, altering it, and then storing it, you can clear them <em>atomically</em> by adding the inverse of those bits to the word.</p>\n\n<p>This pattern is self-synchronizing, but it relies on an invariant: you’d better be right about the original state of the word you’re altering. 
Because if you’re wrong, you’re adding a very large value to an uncontrolled value.</p>\n\n<p>In <code>parking_lot</code>, say we have <code>WRITER</code> and <code>WRITER_PARKED</code> set: the state is <code>0b1010</code>. <code>prev_value</code>, the state of the lock word when the lock operation started, is virtually always 0, and that’s what we’re counting on. <code>prev_value.wrapping_sub()</code> then calculates <code>0xFFFFFFFFFFFFFFF6</code>, which exactly cancels out the <code>0b1010</code> state, leaving 0.</p>\n<div class=\"right-sidenote\"><p>As a grace note, the locking code manages to set that one last unset bit before the proxy deadlocks.</p>\n</div>\n<p>Consider though what happens if one of those bits isn’t set: state is <code>0b1000</code>. Now that add doesn’t cancel out; the final state is instead <code>0xFFFFFFFFFFFFFFFE</code>. The reader count is completely full and can’t be decremented, and all the waiting bits are set so nothing can happen on the lock.</p>\n\n<p><code>parking_lot</code> is a big deal and we’re going to be damn sure before we file a bug report. 
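The arithmetic above is easy to reproduce in miniature. Here’s a sketch of the fetch-add bit-clearing trick — the constants and function names are illustrative, not `parking_lot`’s actual code — showing both the happy path and what happens when the “both bits are set” invariant is violated:

```rust
// Illustrative sketch of the atomic bit-clearing trick; constants are
// assumptions matching the 0b1010 example above, not parking_lot's real ones.
use std::sync::atomic::{AtomicU64, Ordering};

const WRITER_BIT: u64 = 0b1000;
const WRITER_PARKED_BIT: u64 = 0b0010;

// Clears both writer bits by adding their two's-complement negation.
// fetch_add knows nothing about bits: this only works if BOTH bits are set.
fn clear_writer_bits(state: &AtomicU64) -> u64 {
    state.fetch_add(
        0u64.wrapping_sub(WRITER_BIT | WRITER_PARKED_BIT),
        Ordering::Relaxed,
    )
}

fn main() {
    // Invariant holds: both bits set, the addition cancels them exactly.
    let ok = AtomicU64::new(WRITER_BIT | WRITER_PARKED_BIT);
    clear_writer_bits(&ok);
    assert_eq!(ok.load(Ordering::Relaxed), 0);

    // Invariant violated: WRITER_PARKED was already cleared by someone else.
    let bad = AtomicU64::new(WRITER_BIT);
    clear_writer_bits(&bad);
    // The addition underflows: every bit but one is now set.
    assert_eq!(bad.load(Ordering::Relaxed), 0xFFFF_FFFF_FFFF_FFFE);
    println!("corrupted lock word: {:#x}", bad.load(Ordering::Relaxed));
}
```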
Which doesn’t take long; Pavel reproduces the bug in a minimal test case, with a forked version of <code>parking_lot</code> that confirms and logs the condition.</p>\n\n<p><a href='https://github.com/Amanieu/parking_lot/issues/465' title=''>The <code>parking_lot</code> team quickly confirms</a> and fixes the bug.</p>\n<h3 id='ex-insania-claritas' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#ex-insania-claritas' aria-label='Anchor'></a><span class='plain-code'>Ex Insania, Claritas</span></h3>\n<p>Here’s what we now know to have been happening:</p>\n\n<ol>\n<li>Thread 1 grabs a read lock.\n</li><li>Thread 2 tries to grab a write lock, with a <code>try_write_for</code> timeout; it’s “parked” waiting for the reader, which sets <code>WRITER</code> and <code>WRITER_PARKED</code> on the raw lock word.\n</li><li>Thread 1 releases the lock, unparking a waiting writer, which unsets <code>WRITER_PARKED</code>.\n</li><li>Thread 2 wakes, but not for the reason it thinks: the timing is such that it thinks the lock has timed out. So it tries to clear both <code>WRITER</code> and <code>WRITER_PARKED</code> — a bitwise “double free”. Lock: corrupted. Computer: over. \n</li></ol>\n\n<p><a href='https://github.com/Amanieu/parking_lot/pull/466' title=''>The fix is simple</a>: the writer bit is cleared separately in the same wakeup queue as the reader. The fix is deployed, the lockups never recur.</p>\n\n<p>At a higher level, the story is this:</p>\n\n<ol>\n<li>We’re refactoring the proxy to regionalize it, which changes the pattern  of readers and writers on the catalog.\n</li><li>As we broaden out lazy loads, we introduce writer contention, a classic concurrency/performance debugging problem. It looks like a deadlock at the time! It isn’t. 
\n</li><li><code>try_write_for</code> is a good move: we need tools to manage contention.\n</li><li>But now we’re on a buggy code path in <code>parking_lot</code> — we don’t know that and can’t understand it until we’ve lost enough of our minds to second-guess the library.\n</li><li>We stumble on the bug out of pure dumb luck by stabbing in the dark with <code>read_recursive</code>.\n</li></ol>\n\n<p>Mysteries remain. Why did this only happen in <code>WAW</code>? Some kind of crazy regional timing thing? Something to do with the Polish <em>kreska</em> diacritic that makes L’s sound like W’s? The wax and wane of caribou populations? Some very high-traffic APIs local to these regions? All very plausible.</p>\n\n<p>We’ll never know because we fixed the bug.</p>\n\n<p>But we’re in a better place now, even besides the bug fix:</p>\n\n<ul>\n<li>we audited all our catalog locking, got rid of all the iflets, and stopped relying on RAII lock guards.\n</li><li>the resulting closure patterns gave us lock timing metrics, which will be useful dealing with future write contention\n</li><li>all writes are now labeled, and we emit labeled logs on slow writes, augmented with context data (like app IDs) where we have it\n</li><li>we also now track the last and current holder of the write lock, augmented with context information; next time we have a deadlock, we should have all the information we need to identify the actors without <code>gdb</code> stack traces.\n</li></ul>",
      "image": {
        "url": "https://fly.io/blog/parking-lot-ffffffffffffffff/assets/ffff-cover.webp",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/litestream-revamped/",
      "title": "Litestream: Revamped",
      "description": null,
      "url": "https://fly.io/blog/litestream-revamped/",
      "published": "2025-05-20T00:00:00.000Z",
      "updated": "2025-05-22T19:59:27.000Z",
      "content": "<div class=\"lead\"><p><a href=\"https://litestream.io/\" title=\"\">Litestream</a> is an open-source tool that makes it possible to run many kinds of full-stack applications on top of SQLite by making them reliably recoverable from object storage. This is a post about the biggest change we’ve made to it since I launched it.</p>\n</div>\n<p>Nearly a decade ago, I got a bug up my ass. I wanted to build full-stack applications quickly. But the conventional n-tier database design required me to do sysadmin work for each app I shipped. Even the simplest applications depended on heavy-weight database servers like Postgres or MySQL.</p>\n\n<p>I wanted to launch apps on SQLite, because SQLite is easy. But SQLite is embedded, not a server, which at the time implied that the data for my application lived (and died) with just one server.</p>\n\n<p>So in 2020, I wrote <a href='https://litestream.io/' title=''>Litestream</a> to fix that.</p>\n\n<p>Litestream is a tool that runs alongside a SQLite application. Without changing that running application, it takes over the WAL checkpointing process to continuously stream database updates to an S3-compatible object store. If something happens to the server the app is running on, the whole database can efficiently be restored to a different server. You might lose servers, but you won’t lose your data.</p>\n\n<p>Litestream worked well. So we got ambitious. A few years later, we built <a href='https://github.com/superfly/litefs' title=''>LiteFS</a>. LiteFS takes the ideas in Litestream and refines them, so that we can do read replicas and primary failovers with SQLite. LiteFS gives SQLite the modern deployment story of an n-tier database like Postgres, while keeping the database embedded.</p>\n\n<p>We like both LiteFS and Litestream. But Litestream is the more popular project. It’s easier to deploy and easier to reason about.</p>\n\n<p>There are some good ideas in LiteFS. 
We’d like Litestream users to benefit from them. So we’ve taken our LiteFS learnings and applied them to some new features in Litestream.</p>\n<h2 id='point-in-time-restores-but-fast' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#point-in-time-restores-but-fast' aria-label='Anchor'></a><span class='plain-code'>Point-in-time restores, but fast</span></h2>\n<p><a href='https://fly.io/blog/all-in-on-sqlite-litestream/' title=''>Here’s how Litestream was originally designed</a>: you run <code>litestream</code> against a SQLite database, and it opens up a long-lived read transaction. This transaction arrests SQLite WAL checkpointing, the process by which SQLite consolidates the WAL back into the main database file. Litestream builds a “shadow WAL” that records WAL pages, and copies them to S3.</p>\n\n<p>This is simple, which is good. But it can also be slow. When you want to restore a database, you have to pull down and replay every change since the last snapshot. If you changed a single database page a thousand times, you replay a thousand changes. For databases with frequent writes, this isn’t a good approach.</p>\n\n<p>In LiteFS, we took a different approach. LiteFS is transaction-aware. It doesn’t simply record raw WAL pages, but rather ordered ranges of pages associated with transactions, using a file format we call <a href='https://github.com/superfly/ltx' title=''>LTX</a>. 
Each LTX file represents a sorted changeset of pages for a given period of time.</p>\n\n<p><img alt=\"a simple linear LTX file with 8 pages between 1 and 21\" src=\"/blog/litestream-revamped/assets/linear-ltx.png\" /></p>\n\n<p>Because they are sorted, we can easily merge multiple LTX files together and create a new LTX file with only the latest version of each page.</p>\n\n<p><img alt=\"merging three LTX files into one\" src=\"/blog/litestream-revamped/assets/merged-ltx.png\" /></p>\n<div class=\"right-sidenote\"><p>This is similar to how an <a href=\"https://en.wikipedia.org/wiki/Log-structured_merge-tree\" title=\"\">LSM tree</a> works.</p>\n</div>\n<p>This process of combining smaller time ranges into larger ones is called <em>compaction</em>. With it, we can replay a SQLite database to a specific point in time, with a minimal number of duplicate pages.</p>\n<h2 id='casaas-compare-and-swap-as-a-service' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#casaas-compare-and-swap-as-a-service' aria-label='Anchor'></a><span class='plain-code'>CASAAS: Compare-and-Swap as a Service</span></h2>\n<p>One challenge Litestream has to deal with is desynchronization. Part of the point of Litestream is that SQLite applications don’t have to be aware of it. But <code>litestream</code> is just a process, running alongside the application, and it can die independently. If <code>litestream</code> is down while database changes occur, it will miss changes. The same kind of problem occurs if you start replication from a new server.</p>\n\n<p>Litestream needs a way to reset the replication stream from a new snapshot. It does that with “generations”. 
<a href='https://litestream.io/how-it-works/#snapshots--generations' title=''>A generation</a> represents a snapshot and a stream of WAL updates, uniquely identified. Litestream notices any break in its WAL sequence and starts a new generation, which is how it recovers from desynchronization.</p>\n\n<p>Unfortunately, storing and managing multiple generations makes it difficult to implement features like failover and read-replicas.</p>\n\n<p>The most straightforward way around this problem is to make sure only one instance of Litestream can replicate to a given destination. If you can do that, you can store just a single, latest generation. That in turn makes it easy to know how to resync a read replica; there’s only one generation to choose from.</p>\n\n<p>In LiteFS, we solved this problem by using Consul, which guaranteed a single leader. That requires users to know about Consul. Things like “requiring Consul” are probably part of the reason Litestream is so much more popular than LiteFS.</p>\n\n<p>In Litestream, we’re solving the problem a different way. Modern object stores like S3 and Tigris solve this problem for us: they now offer <a href='https://aws.amazon.com/about-aws/whats-new/2024/11/amazon-s3-functionality-conditional-writes/' title=''>conditional write support</a>. With conditional writes, we can implement a time-based lease. 
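To make the lease idea concrete, here is a minimal sketch. The `ObjectStore`, its `put_if` method, and the `lease` key are all hypothetical stand-ins for an S3-compatible store's conditional PUTs (If-Match / If-None-Match); this is not Litestream's implementation, just the shape of the trick:

```rust
// Hypothetical in-memory stand-in for an object store with conditional
// writes; real code would issue If-Match / If-None-Match PUTs against S3.
use std::collections::HashMap;

struct ObjectStore {
    objects: HashMap<String, (u64, String)>, // key -> (etag, body)
    next_etag: u64,
}

impl ObjectStore {
    fn new() -> Self { Self { objects: HashMap::new(), next_etag: 1 } }

    /// PUT only if the caller's view of the object is current.
    /// `expected = None` means "create only if absent".
    fn put_if(&mut self, key: &str, expected: Option<u64>, body: String) -> Option<u64> {
        if self.objects.get(key).map(|(e, _)| *e) != expected {
            return None; // precondition failed: someone else wrote first
        }
        let etag = self.next_etag;
        self.next_etag += 1;
        self.objects.insert(key.into(), (etag, body));
        Some(etag)
    }
}

/// Try to take (or renew) a lease expiring at `now + ttl`. At most one
/// contender wins each transition: every racer holding a stale etag fails.
fn try_acquire(store: &mut ObjectStore, node: &str, now: u64, ttl: u64) -> bool {
    let observed = store.objects.get("lease").map(|(e, b)| (*e, b.clone()));
    let (expected, other_holder_alive) = match &observed {
        None => (None, false),
        Some((etag, body)) => {
            let (holder, expiry) = body.split_once('@').unwrap();
            let expiry: u64 = expiry.parse().unwrap();
            (Some(*etag), holder != node && expiry > now)
        }
    };
    if other_holder_alive {
        return false; // someone else holds an unexpired lease
    }
    store.put_if("lease", expected, format!("{node}@{}", now + ttl)).is_some()
}

fn main() {
    let mut store = ObjectStore::new();
    assert!(try_acquire(&mut store, "a", 0, 10));  // "a" claims the empty lease
    assert!(!try_acquire(&mut store, "b", 5, 10)); // "b" loses: lease is live
    assert!(try_acquire(&mut store, "a", 5, 10));  // "a" renews its own lease
    assert!(try_acquire(&mut store, "b", 20, 10)); // lease expired; "b" takes over
    println!("single-writer lease via conditional writes");
}
```

Two nodes that both observe an expired lease will race their conditional PUTs, but only the first succeeds; the loser sees a precondition failure and backs off, which is exactly the single-replicator guarantee Consul used to provide.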
We get essentially the same constraint Consul gave us, but without having to think about it or set up a dependency.</p>\n\n<p>In the immediate term, this will mean you can run Litestream with ephemeral nodes, with overlapping run times, and even if they’re storing to the same destination, they won’t confuse each other.</p>\n<h2 id='lightweight-read-replicas' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#lightweight-read-replicas' aria-label='Anchor'></a><span class='plain-code'>Lightweight read replicas</span></h2>\n<p>The original design constraint of both Litestream and LiteFS was to extend SQLite to modern deployment scenarios without disturbing people’s built code. Both tools are meant to function even if applications are oblivious to them.</p>\n\n<p>LiteFS is more ambitious than Litestream, and requires transaction-awareness. To get that without disturbing built code, we use a cute trick (a.k.a. a gross hack): LiteFS provides a FUSE filesystem, which lets it act as a proxy between the application and the backing store. From that vantage point, we can easily discern transactions.</p>\n\n<p>The FUSE approach gave us a lot of control, enough that users could use SQLite replicas just like any other database. But installing and running a whole filesystem (even a fake one) is a lot to ask of users. To work around that problem, we relaxed a constraint: LiteFS can function without the FUSE filesystem if you load an extension into your application code, <a href='https://github.com/superfly/litevfs' title=''>LiteVFS</a>. LiteVFS is a <a href='https://www.sqlite.org/vfs.html' title=''>SQLite Virtual Filesystem</a> (VFS). 
It works in a variety of environments, including some where FUSE can’t, like in-browser WASM builds.</p>\n\n<p>What we’re doing next is taking the same trick and using it on Litestream. We’re building a VFS-based read-replica layer. It will be able to fetch and cache pages directly from S3-compatible object storage.</p>\n\n<p>Of course, there’s a catch: this approach isn’t as efficient as a local SQLite database. That kind of efficiency, where you don’t even need to think about N+1 queries because there’s no network round-trip to make the duplicative queries pile up costs, is part of the point of using SQLite.</p>\n\n<p>But we’re optimistic that with cacheing and prefetching, the approach we’re using will yield, for the right use cases, strong performance — all while serving SQLite reads hot off of Tigris or S3.</p>\n<figure class=\"post-cta\">\n  <figcaption>\n    <h1>Litestream is fully open source</h1>\n    <p>It’s not coupled with Fly.io at all; you can use it anywhere.</p>\n      <a class=\"btn btn-lg\" href=\"https://litestream.io/\">\n        Check it out <span class='opacity-50'>→</span>\n      </a>\n  </figcaption>\n  <div class=\"image-container\">\n    <img src=\"/static/images/cta-kitty.webp\" srcset=\"/static/images/[email protected] 2x\" alt=\"\">\n  </div>\n</figure>\n\n<h2 id='synchronize-lots-of-databases' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#synchronize-lots-of-databases' aria-label='Anchor'></a><span class='plain-code'>Synchronize Lots Of Databases</span></h2>\n<p>While we’ve got you here: we’re knocking out one of our most requested features.</p>\n\n<p>In the old Litestream design, WAL-change polling and slow restores made it infeasible to replicate large numbers of databases from a single process. 
That has been our answer when users ask us for a “wildcard” or “directory” replication argument for the tool.</p>\n\n<p>Now that we’ve switched to LTX, this isn’t a problem any more. It should thus be possible to replicate <code>/data/*.db</code>, even if there’s hundreds or thousands of databases in that directory.</p>\n<h2 id='we-still-sqlite' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#we-still-sqlite' aria-label='Anchor'></a><span class='plain-code'>We Still ❤️ SQLite</span></h2>\n<p>SQLite has always been a solid database to build on and it’s continued to find new use cases as the industry evolves. We’re super excited to continue to build Litestream alongside it.</p>\n\n<p>We have a sneaking suspicion that the robots that write LLM code are going to like SQLite too. We think what <a href='https://phoenix.new/' title=''>coding agents like Phoenix.new</a> want is a way to try out code on live data, screw it up, and then rollback <em>both the code and the state.</em> These Litestream updates put us in a position to give agents PITR as a primitive. On top of that, you can build both rollbacks and forks.</p>\n\n<p>Whether or not you’re drinking the AI kool-aid, we think this new design for Litestream is just better. We’re psyched to be rolling it out, and for the features it’s going to enable.</p>",
      "image": {
        "url": "https://fly.io/blog/litestream-revamped/assets/litestream-revamped.png",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/mcp-launch/",
      "title": "Launching MCP Servers on Fly.io",
      "description": null,
      "url": "https://fly.io/blog/mcp-launch/",
      "published": "2025-05-19T00:00:00.000Z",
      "updated": "2025-05-22T19:59:27.000Z",
"content": "<div class=\"lead\"><p>This is a blog post. Part showing off. Part opinion. Plan accordingly.</p>\n</div>\n<p>The <a href='https://www.anthropic.com/news/model-context-protocol' title=''>Model Context Protocol</a> is days away from turning six months old. You read that right, six <em>months</em> old. MCP Servers have both taken the world by storm and are still trying to figure out what they want to be when they grow up.</p>\n\n<p>There is no doubt that MCP servers are useful. But their appeal goes beyond that. They are also simple and universal. What’s not to like?</p>\n\n<p>Well, for starters, there are basically two types of MCP servers. One small and nimble that runs as a process on your machine. And one that is an HTTP server that runs presumably elsewhere and is <a href='https://modelcontextprotocol.io/specification/2025-03-26/basic/authorization' title=''>standardizing</a> on OAuth 2.1. And there is a third type, but it is deprecated.</p>\n\n<p>Next there is the configuration. Asking users to manually edit JSON seems so early 21st century. With Claude, this goes into <code>~/Library/Application Support/Claude/claude_desktop_config.json</code>, and is found under an <code>mcpServers</code> key. With Zed, this file is in <code>~/.config/zed/settings.json</code> and is found under a <code>context_servers</code> key. And some tools put these files in a different place depending on whether you are running on macOS, Linux, or Windows.</p>\n\n<p>Finally, there is security. An MCP server running on your machine literally has access to everything you do. Running remote solves this problem, but did I mention <a href='https://datatracker.ietf.org/doc/html/draft-ietf-oauth-v2-1-12' title=''>OAuth 2.1</a>? Not exactly something one sets up for casual use.</p>\n\n<p>None of these issues are fatal - something that is obvious by the fact that MCP servers are quite popular. But can we do better? 
I think so.</p>\n\n<hr>\n\n<p>Demo time.</p>\n\n<p>Let’s try out the <a href='https://github.com/modelcontextprotocol/servers/blob/main/src/slack/README.md' title=''>Slack MCP Server</a>:</p>\n\n<blockquote>\n<p>MCP Server for the Slack API, enabling Claude to interact with Slack workspaces.</p>\n</blockquote>\n\n<p>That certainly sounds like a good test case. There is a small amount of <a href='https://github.com/modelcontextprotocol/servers/blob/main/src/slack/README.md#setup' title=''>setup</a> you need to do, and when you are done you end up with a <em>Bot User OAuth Token</em> starting with <code>xoxb-</code> and a <em>Team ID</em> starting with a <code>T</code>.</p>\n\n<p>You <em>would</em> run it using the following:</p>\n<div class=\"highlight-wrapper group relative sh\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-ievvjhpo\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 
transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-ievvjhpo\">npx <span class=\"nt\">-y</span> @modelcontextprotocol/server-slack\n</code></pre>\n  </div>\n</div>\n<p>But instead, you convert that command to JSON and find the right configuration file and put this information in there. 
And either run the slack MCP server locally or set up a server with or without authentication.</p>\n\n<p>Wouldn’t it be nice to have the simplicity of a local process, the security of a remote server, and to eliminate the hassle of manually editing JSON?</p>\n\n<p>Here’s our current thinking:</p>\n<div class=\"highlight-wrapper group relative sh\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-gdwhiyfl\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 
1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-gdwhiyfl\">fly mcp launch <span class=\"se\">\\</span>\n  <span class=\"s2\">\"npx -y @modelcontextprotocol/server-slack\"</span> <span class=\"se\">\\</span>\n  <span class=\"nt\">--claude</span> <span class=\"nt\">--server</span> slack <span class=\"se\">\\</span>\n  <span class=\"nt\">--secret</span> <span class=\"nv\">SLACK_BOT_TOKEN</span><span class=\"o\">=</span>xoxb-your-bot-token <span class=\"se\">\\</span>\n  <span class=\"nt\">--secret</span> <span class=\"nv\">SLACK_TEAM_ID</span><span class=\"o\">=</span>T01234567\n</code></pre>\n  </div>\n</div>\n<p>You can put this all on one line if you like; I just split this up so it fits on small screens and so we can talk about the various parts.</p>\n\n<p>The first three words seem reasonable. The quoted string is just the command that we want to run. So let’s talk about the four flags. The first tells us which tool’s configuration file we want to update. The second specifies the name to use for the server in that configuration file. The last two set secrets.</p>\n\n<p>Oh, did I say current thinking? Perhaps I should mention that it is something you can run right now, as long as you have flyctl <code>v0.3.125</code> or later. 
Complete with the options you would expect from Fly.io, like the ability to configure auto-stop, file contents, flycast, secrets, region, volumes, and VM sizes.</p>\n\n<p>And, hey, lookie there:</p>\n\n<p><img alt=\"testing, testing, 1, 2, 3\" src=\"/blog/mcp-launch/assets/mcp-slack.png\" /></p>\n\n<p>Support for Claude, Cursor, Neovim, VS Code, Windsurf, and Zed are built in. You can select multiple clients and configuration files.</p>\n\n<p>By default, bearer token authentication will be set up on both the server and client.</p>\n\n<p>You can find the complete set of options on our <a href='https://fly.io/docs/flyctl/mcp-launch/' title=''><code>fly mcp launch</code></a> docs page.</p>\n\n<hr>\n\n<p>But this post isn’t just about experimental demoware that is subject to change.\nIt is about the depth of support that we are rapidly bringing online, including:</p>\n\n<ul>\n<li>Support for all transports, not just the ones we recommend.\n</li><li>Ability to deploy using the command line or the Machines API, with a number of different options that allow you to chose between elegant simplicity and excruciating precise control.\n</li><li>Ability to deploy each MCP server to a separate Machine, container, or even inside your application.\n</li><li>Access via HTTP Authorization, wireguard tunnels and flycast, or reverse proxies.\n</li></ul>\n\n<p>You can see all this spelled out in our <a href='https://fly.io/docs/mcp/' title=''>docs</a>. Be forewarned, most pages are marked as <em>beta</em>. But the examples provided all work. Well, there may be a bug here or there, but the examples <em>as shown</em> are thought to work. Maybe.</p>\n\n<p>Let’s figure out the ideal ergonomics of deploying MCP servers remotely together!</p>",
      "image": {
        "url": "https://fly.io/blog/mcp-launch/assets/fly-mcp-launch.png",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/mcp-provisioning/",
      "title": "Provisioning Machines using MCPs",
      "description": null,
      "url": "https://fly.io/blog/mcp-provisioning/",
      "published": "2025-05-07T00:00:00.000Z",
      "updated": "2025-05-22T19:59:27.000Z",
      "content": "<div class=\"lead\"><p>Today’s state of the art is K8S, Terraform, web based UIs, and CLIs.  Those days are numbered.</p>\n</div>\n<p>On Monday, I created my first fly volume using an <a href='https://modelcontextprotocol.io/introduction' title=''>MCP</a>. For those who don’t know what MCPs are, they are how you attach tools to <a href='https://en.wikipedia.org/wiki/Large_language_model' title=''>LLM</a>s like Claude or Cursor. I added support for\n<a href='https://fly.io/docs/flyctl/volumes-create/' title=''>fly volume create</a> to <a href='https://fly.io/docs/flyctl/mcp-server/' title=''>fly mcp server</a>, and it worked the first time.\nA few hours later, and with the assistance of GitHub Copilot, i added support for all <a href='https://fly.io/docs/flyctl/volumes/' title=''>fly volumes</a> commands.</p>\n\n<hr>\n<div class=\"right-sidenote\"><p>This movie summary is from <a href=\"https://collidecolumn.wordpress.com/2013/08/04/when-worlds-collide-77-talking-with-computers-beyond-mouse-and-keyboard/\">When Worlds Collide, by Nalaka Gunawardene</a></p>\n</div>\n<p>I’m reminded of the memorable scene in the film <a href='http://en.wikipedia.org/wiki/Star_Trek_IV:_The_Voyage_Home' title=''>Star Trek IV: The Voyage Home</a> (1986). Chief Engineer Scotty, having time-travelled 200 years back to late 20th century San Francisco with his crew mates, encounters an early Personal Computer (PC).</p>\n\n<p>Sitting in front of it, he addresses the machine affably as “Computer!” Nothing happens. Scotty repeats himself; still no response. He doesn’t realise that voice recognition capability hadn’t arrived yet.</p>\n\n<p>Exasperated, he picks up the mouse and speaks into it: “Hello, computer?” The computer’s owner offers helpful advice: “Just use the keyboard.”</p>\n\n<p>Scotty looks astonished. 
“A keyboard?” he asks, and adds in a sarcastic tone: “How quaint!”</p>\n\n<hr>\n\n<p>A while later, I asked for a list of volumes for an existing app, and Claude noted that I had a few volumes that weren’t attached to any machines. So I asked it to delete the oldest unattached volume, and it did so. Then it occurred to me to do it again; this time I captured the screen:</p>\n<div align=\"center\"><p><img alt=\"Deleting a volume using MCP: \"What is my oldest volume\"? ... \"Delete that volume too\"\" src=\"/blog/mcp-provisioning/assets/volume-delete.png\"></p>\n</div>\n<p>A few notes:</p>\n\n<ul>\n<li>I could have written a program using the <a href='https://fly.io/docs/machines/api/volumes-resource/' title=''>machines API</a>, but that would have required some effort.\n</li><li>I could have used <a href='https://fly.io/docs/flyctl/volumes/' title=''>flyctl</a> directly, but to be honest, I would have had to use the documentation and a bit of copy/pasting of arguments.\n</li><li>I could have navigated to a list of volumes for this application in the Fly.io dashboard, but there currently isn’t any way to sort the list or delete a volume. Had those been in place, that would have been a reasonable alternative if that was something I was actively looking for. This felt different to me, the LLM noted something, brought it to my attention, and I asked it to make a change as a result.\n</li><li>Since this support is built on <code>flyctl</code>, I would have received an error had I tried to destroy a volume that is currently mounted. Knowing that gave me the confidence to try the command.\n</li></ul>\n\n<p>All in all, I found it to be a very intuitive way to make changes to the resources assigned to my application.</p>\n\n<hr>\n\n<p>Imagine a future where you say to your favorite LLM “launch my application on Fly.io”, and your code is scanned and you are presented with a plan containing details such as regions, databases, s3 storage, memory, machines, and the like. 
You are given the opportunity to adjust the plan and, when ready, say “Make it so”.</p>\n\n<p>For many, the next thing you will see is that your application is up and running. For others, there may be a build error, a secret you need to set, a database that needs to be seeded, or any number of other mundane reasons why your application didn’t work the first time.</p>\n\n<p>Not to fear, your LLM will be able to examine your logs and will not only make suggestions as to next steps, but will be in a position to take actions on your behalf should you wish it to do so.</p>\n\n<p>And it doesn’t stop there. There will be MCP servers running on your Fly.io private network - either on separate machines, or in “sidecar” containers, or even integrated into your app. These will enable you to monitor and interact with your application.</p>\n\n<p>This is not science fiction. The ingredients are all in place to make this a reality. At the moment, it is a matter of “some assembly required”, but it should only be a matter of weeks before all this comes together into a neat package.</p>\n\n<hr>\n\n<p>Meanwhile, you can try this now. Make sure you run <a href='https://fly.io/docs/flyctl/version-upgrade/' title=''>fly version upgrade</a> and verify that you are running v0.3.117.</p>\n\n<p>Then configure your favorite LLM. 
Here’s my <code>claude_desktop_config.json</code> for example:</p>\n<div class=\"highlight-wrapper group relative json\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-awl37mlq\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] 
tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-awl37mlq\"><span class=\"p\">{</span><span class=\"w\">\n  </span><span class=\"nl\">\"mcpServers\"</span><span class=\"p\">:</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\">\n    </span><span class=\"nl\">\"fly.io\"</span><span class=\"p\">:</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\">\n      </span><span class=\"nl\">\"command\"</span><span class=\"p\">:</span><span class=\"w\"> </span><span class=\"s2\">\"/Users/rubys/.fly/bin/flyctl\"</span><span class=\"p\">,</span><span class=\"w\">\n      </span><span class=\"nl\">\"args\"</span><span class=\"p\">:</span><span class=\"w\"> </span><span class=\"p\">[</span><span class=\"w\"> </span><span class=\"s2\">\"mcp\"</span><span class=\"p\">,</span><span class=\"w\"> </span><span class=\"s2\">\"server\"</span><span class=\"w\"> </span><span class=\"p\">]</span><span class=\"w\">\n    </span><span class=\"p\">}</span><span class=\"w\">\n  </span><span class=\"p\">}</span><span class=\"w\">\n</span><span class=\"p\">}</span><span class=\"w\">\n</span></code></pre>\n  </div>\n</div>\n<p>Adjust the path to <code>flyctl</code> as needed. Restart your LLM, and ask what tools are available. Try a few commands and let us know what you like and whether you have any suggestions. Just be aware this is not a demo: if you ask it to destroy a volume, that operation is not reversible. Perhaps try this first on a throwaway application.</p>\n\n<p>You don’t even need an LLM to try out the flyctl MCP server. 
If you have Node.js installed, you can run the <a href='https://modelcontextprotocol.io/docs/tools/inspector' title=''>MCP Inspector</a>:</p>\n<div class=\"highlight-wrapper group relative \">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-n7f3kmkb\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 
1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-n7f3kmkb\">fly mcp server -i\n</code></pre>\n  </div>\n</div>\n<p>Once started, visit <a href=\"http://127.0.0.1:6274/\">http://127.0.0.1:6274/</a>, click on “Connect”, then “List Tools”, select “fly-platform-status”, then click on “Run Tool”.</p>\n\n<p>The plan is to see what works well and what doesn’t work so well, make adjustments, build support in a bottom-up fashion, and iterate rapidly.</p>\n\n<p>By providing feedback, you can be a part of making this vision a reality.</p>\n\n<hr>\n\n<p>At the present time, <em>most</em> of the following are roughed in:</p>\n\n<ul>\n<li><a href='https://fly.io/docs/flyctl/apps/' title=''>apps</a>\n</li><li><a href='https://fly.io/docs/flyctl/logs/' title=''>logs</a>\n</li><li><a href='https://fly.io/docs/flyctl/machine/' title=''>machine</a>\n</li><li><a href='https://fly.io/docs/flyctl/orgs/' title=''>orgs</a>\n</li><li><a href='https://fly.io/docs/flyctl/platform/' title=''>platform</a>\n</li><li><a href='https://fly.io/docs/flyctl/status/' title=''>status</a>\n</li><li><a href='https://fly.io/docs/flyctl/volumes/' title=''>volumes</a>\n</li></ul>\n\n<p>The code is open source, and the places to look are <a href='https://github.com/superfly/flyctl/blob/master/internal/command/mcp/server.go' title=''>server.go</a> and the <a href='https://github.com/superfly/flyctl/tree/master/internal/command/mcp/server' title=''>server</a> directory.</p>\n\n<p>Feel free to open <a href='https://github.com/superfly/flyctl/issues' title=''>issues</a> or start a discussion on <a href='https://community.fly.io/' title=''>community.fly.io</a>.</p>\n\n<hr>\n\n<p>Not ready yet to venture into the world of LLMs? Just remember, resistance is futile.</p>",
      "image": {
        "url": "https://fly.io/blog/mcp-provisioning/assets/Hello.webp",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/30-minute-mcp/",
      "title": "30 Minutes With MCP and flyctl",
      "description": null,
      "url": "https://fly.io/blog/30-minute-mcp/",
      "published": "2025-04-10T00:00:00.000Z",
      "updated": "2025-04-10T19:10:26.000Z",
      "content": "<div class=\"lead\"><p>I wrote this post on our internal message board, and then someone asked, “why is this an internal post and not on our blog”, so now it is.</p>\n</div><div class=\"right-sidenote\"><p>well, Cursor built</p>\n</div>\n<p>I built the <a href='https://github.com/superfly/flymcp' title=''>most basic MCP server for <code>flyctl</code></a> I could think of. It took 30 minutes.</p>\n\n<p><a href='https://modelcontextprotocol.io/introduction' title=''>MCP</a>, for those unaware, is the emerging standard protocol for connecting an LLM (or an app that drives an LLM in the cloud, like Claude Desktop) to, well, anything. The “client” in MCP is the LLM; the “server” is the MCP server and the “tools” it exports. It mostly just defines an exchange of JSON blobs; one of those JSON blobs enables the LLM to discover all the tools exported by the server.</p>\n\n<p>A classic example of an MCP server is (yes, really) a Python shell. MCP publishes to (say) Claude that it can run arbitrary Python code with a tool call; not only that, says the tool description, but you can use those Python tool calls to, say, scrape the web. When the LLM wants to scrape the web with Python, it uses MCP send a JSON blob describing the Python tool call; the MCP server (yes, really) runs the Python and returns the result.</p>\n\n<p>Because I have not yet completely lost my mind, I chose to expose just two <code>flyctl</code> commands: <code>fly logs</code> and <code>fly status</code>. Because I’m lazy, I used MCP’s <code>stdio</code> mode; the MCP server just captures the output of <code>flyctl</code>. 
Because someone already wrote a <a href='https://github.com/mark3labs/mcp-go' title=''>really good Go MCP library</a> , this whole thing is like 90 lines of code.</p>\n<div class=\"callout\"><p>We may have gotten a little lucky, because we made a decision back in 2020 to give most of our <code>flyctl</code> commands a <code>json</code> mode to make them easier to drive from automation. I use that in my MCP server. I don’t know how much of a difference it made.</p>\n</div>\n<p>It’s, uh, annoyingly useful.</p>\n\n<p>I loaded it up and pointed it <a href='https://www.unpkg.com/' title=''>at unpkg</a>. I got this, from Claude:</p>\n\n<blockquote>\n<p>This appears to be a globally distributed CDN service running the unpkg JavaScript package registry, with machines strategically placed around the world to provide low-latency access. There are some stability issues with a couple of the machines, and several have experienced memory-related crashes that triggered automatic restarts. </p>\n</blockquote>\n\n<p>It then recited the complete configuration of the application:</p>\n<div class=\"highlight-wrapper group relative \">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-rphnl8mk\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl 
[--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-rphnl8mk\">**App Name**: unpkg       \n**Status**: Deployed and running\n**URL**: unpkg.fly.dev\n**Organization**: unpkg\n**Platform Version**: machines\n**Runtime**: Bun (JavaScript runtime)\n\nThe app is running on 10 machines distributed across global regions:       \nNorth America: lax (Los Angeles), atl (Atlanta), ewr (Newark)\nEurope: lhr (London), cdg (Paris), ams (Amsterdam)\nAsia: sin (Singapore), nrt (Tokyo), hkg (Hong Kong)\nSouth America: bog (Bogotá)\nOceania: syd (Sydney)\n</code></pre>\n  </div>\n</div>\n<p>OK. 
But then:</p>\n\n<blockquote>\n<p>Most machines are in “passing” health check status\n2 machines are in “critical” status:\nOne in ewr (Newark) with “context deadline exceeded” error\nOne in atl (Atlanta) with “gone” status</p>\n\n<p>Recent Events</p>\n\n<p>Multiple machines have experienced restarts\nSeveral machines show “oom_killed: true” in their event history, indicating they ran out of memory</p>\n</blockquote>\n\n<p>So I asked, <em>“try getting logs for one of the critical machines”</em>.</p>\n\n<blockquote>\n<p>The root cause appears to be memory usage - the Bun process was consuming about 3.7GB of memory (out of the 4GB allocated to the machine), and there was likely a surge in requests or memory usage that pushed it over the limit. This matches what we saw in the status report, which showed multiple instances of “oom_killed: true” in the event history for several machines.</p>\n</blockquote>\n\n<p>And then it dug in:</p>\n\n<blockquote>\n<ol>\n<li>The machine was actively serving content requests with 200 OK responses until 20:47:03 PM\n</li><li>At that point, it ran out of memory:\n<code>Out of memory: Killed process 641 (bun) total-vm:85950964kB, anon-rss:3744352kB, file-rss:12032kB, shmem-rss:0kB, UID:0 pgtables:16908kB oom_score_adj:0</code>\n</li><li>The process was killed with <code>SIGKILL</code>\n</li><li>The system attempted to recover by restarting:\n<code>reboot: Restarting system</code>\n</li><li>A health check failed shortly after:\n<code>Health check on port 3000 has failed. Your app is not responding properly. 
Services exposed on ports [80, 443] will have intermittent failures until the health check passes.</code>\n</li><li>The machine then restarted successfully and the service came back up:\n<code>Server listening on http://localhost:3000 ...</code>\n</li><li>The health check started passing again at 20:47:46 PM:\n<code>Health check on port 3000 is now passing.</code>\n</li><li>The machine resumed normal operations, serving requests.\n</li></ol>\n</blockquote>\n\n<p>I’m honestly kind of surprised how fast this was useful. It’s faster than I find problems in apps. Just by giving some LLM some tools that do basically what I’d do.</p>\n\n<p>Local MCP servers are scary. I don’t like that I’m giving a Claude instance in the cloud the ability to run a native program on my machine. I think <code>fly logs</code> and <code>fly status</code> are safe, but I’d rather know it’s safe. It would be, if I was running <code>flyctl</code> in an isolated environment and not on my local machine.</p>",
      "image": {
        "url": "https://fly.io/blog/30-minute-mcp/assets/robots-chatting.webp",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/fuckin-robots/",
      "title": "Our Best Customers Are Now Robots",
      "description": null,
      "url": "https://fly.io/blog/fuckin-robots/",
      "published": "2025-04-08T00:00:00.000Z",
      "updated": "2025-04-10T19:10:26.000Z",
      "content": "<div class=\"lead\"><p>We’re Fly.io, a developer-focused public cloud. We turn Docker containers into hardware-isolated virtual machines running on our own metal around the world. We spent years coming up with <a href=\"https://fly.io/speedrun\" title=\"\">a developer experience we were proud of</a>. But now the robots are taking over, and they don’t care.</p>\n</div>\n<p>It’s weird to say this out loud!</p>\n\n<p>For years, one of our calling cards was “developer experience”. We made a decision, early on, to be a CLI-first company, and put a lot effort into making that CLI seamless. For a good chunk of our users, it really is the case that you can just <a href='https://fly.io/docs/flyctl/launch/' title=''><code>flyctl launch</code></a> from a git checkout and have an app containerized and deployed on the Internet. We haven’t always nailed these details, but we’ve really sweated them.</p>\n\n<p>But a funny thing has happened over the last 6 months or so. If you look at the numbers, DX might not matter that much. That’s because the users driving the most growth on the platform aren’t people at all. They're… robots.</p>\n<h2 id='what-the-fuck-is-happening' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-the-fuck-is-happening' aria-label='Anchor'></a><span class='plain-code'>What The Fuck Is Happening?</span></h2>\n<p>Here’s how we understand what we’re seeing. You start by asking, “what do the robots want?”</p>\n\n<p>Yesterday’s robots had diverse interests. Alcohol. The occasional fiddle contest. A purpose greater than passing butter. The elimination of all life on Earth and the harvesting of its cellular iron for the construction of 800 trillion paperclips. 
No one cloud platform could serve them all.</p>\n<div class=\"right-sidenote\"><p>[*] We didn’t make up this term. Don’t blame us.</p>\n</div>\n<p>Today’s robots are different. No longer masses of wire, plates, and transistors, modern robots are comprised of <a href='https://math.mit.edu/~gs/learningfromdata/' title=''>thousands of stacked matrices knit together with some simple equations</a>. All these robots want are vectors. Vectors, and a place to burp out more vectors. When those vectors can be interpreted as source code, we call this process “vibe coding”[*].</p>\n\n<p>We seem to be coated in some kind of vibe coder attractant. I want to talk about what that might be.</p>\n<div class=\"callout\"><p>If you want smart people to read a long post, just about the worst thing you can do is to sound self-promotional. But a lot of this is going to read like a brochure. The problem is, I’m not smart enough to write about the things we’re seeing without talking our book. I tried, and ended up tediously hedging every other sentence. We’re just going to have to find a way to get through this together.</p>\n</div><h2 id='you-want-robots-because-this-is-how-you-get-robots' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#you-want-robots-because-this-is-how-you-get-robots' aria-label='Anchor'></a><span class='plain-code'>You Want Robots? 
Because This Is How You Get Robots</span></h2>\n<p><strong class='font-semibold text-navy-950'>Compute.</strong> The basic unit of computation on Fly.io is the <code>Fly Machine</code>, which is a Docker container running as a hardware virtual machine.</p>\n<div class=\"right-sidenote\"><p>Not coincidentally, our underlying hypervisor engine is the same as Lambda’s.</p>\n</div>\n<p>There’s two useful points of reference to compare a Fly Machine to, which illustrate why we gave them this pretentious name. The first and most obvious is an AWS EC2 VM. The other is an AWS Lambda invocation. Like a Lambda invocation, a Fly Machine can start like it’s spring-loaded, in double-digit millis. But unlike Lambda, it can stick around as long as you want it to: you can run a server, or a 36-hour batch job, just as easily in a Fly Machine as in an EC2 VM.</p>\n\n<p>A vibe coding session generates code conversationally, which is to say that the robots stir up a frenzy of activity for a minute or so, but then chill out for minutes, hours, or days. You can create a Fly Machine, do a bunch of stuff with it, and then stop it for 6 hours, during which time we’re not billing you. Then, at whatever random time you decide, you can start it back up again, quickly enough that you can do it in response to an HTTP request.</p>\n\n<p>Just as importantly, the companies offering these vibe-coding services have lots of paying users. Meaning, they need a lot of VMs; VMs that come and go. One chat session might generate a test case for some library and be done inside of 5 minutes. Another might generate a Node.js app that needs to stay up long enough to show off at a meeting the next day. It’s annoying to do this if you can’t turn things on and off quickly and cheaply.</p>\n\n<p>The core of this is a feature of the platform that we have <a href='https://fly.io/docs/machines/overview/#machine-state' title=''>never been able to explain effectively to humans</a>. 
There are two ways to start a Fly Machine: by <code>creating</code> it with a Docker container, or by <code>starting</code> it after it’s already been <code>created</code>, and later <code>stopped</code>. <code>Start</code> is lightning fast; substantially faster than booting up even a non-virtualized K8s Pod. This is too subtle a distinction for humans, who (reasonably!) just mash the <code>create</code> button to boot apps up in Fly Machines. But the robots are getting a lot of value out of it.</p>\n\n<p><strong class='font-semibold text-navy-950'>Storage.</strong> Another weird thing that robot workflows do is to build Fly Machines up incrementally. This feels really wrong to us. Until we discovered our robot infestation, we’d have told you not to do this. Ope!</p>\n\n<p>A typical vibe coding session boots up a Fly Machine out of some minimal base image, and then, once running, adds packages, edits source code, and adds <code>systemd</code> units (robots understand <code>systemd</code>; it’s how they’re going to replace us). This is antithetical to normal container workflows, where all this kind of stuff is baked into an immutable static OCI container. But that’s not how LLMs work: the whole process of building with an LLM is stateful trial-and-error iteration.</p>\n\n<p>So it helps to have storage. That way the LLM can do all these things and still repeatedly bounce the whole Fly Machine when it inevitably hallucinates its way into a blind alley.</p>\n\n<p>As product thinkers, our intuition about storage is “just give people Postgres”. And that’s the right answer, most of the time, for humans. 
But because LLMs are doing the <a href='https://static1.thegamerimages.com/wordpress/wp-content/uploads/2020/10/Defiled-Amy.jpg' title=''>Cursed and Defiled Root Chalice Dungeon</a> version of app construction, what they really need is <a href='https://fly.io/docs/volumes/overview/' title=''>a filesystem</a>, <strong class='font-semibold text-navy-950'><a href='https://fly.io/blog/persistent-storage-and-fast-remote-builds/' title=''>the one form of storage we sort of wish we hadn’t done</a></strong>. That, and <a href='https://www.tigrisdata.com/' title=''>object storage</a>.</p>\n\n<p><strong class='font-semibold text-navy-950'>Networking.</strong> Moving on. Fly Machines are automatically connected to a load-balancing Anycast network that does TLS. So that’s nice. But humans like that feature too, and, candidly, it’s table stakes for cloud platforms. On the other hand, here’s a robot problem we solved without meaning to:</p>\n\n<p>To interface with the outside world (because why not) LLMs all speak a protocol called MCP. MCP is what enables the robots to search the web, use a calculator, launch the missiles, shuffle a Spotify playlist, &c.</p>\n\n<p>If you haven’t played with MCP, the right way to think about it is POST-back APIs like Twilio and Stripe, where you stand up a server, register it with the API, and wait for the API to connect to you. Complicating things somewhat, more recent MCP flows involve repeated and potentially long-lived (SSE) connections. 
To make this work in a multitenant environment, you want these connections to hit the same (stateful) instance.</p>\n\n<p>So we think it’s possible that the <a href='https://fly.io/docs/networking/dynamic-request-routing/' title=''>control we give over request routing</a> is a robot attractant.</p>\n<h2 id='we-perhaps-welcome-our-new-robot-overlords' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#we-perhaps-welcome-our-new-robot-overlords' aria-label='Anchor'></a><span class='plain-code'>We, Perhaps, Welcome Our New Robot Overlords</span></h2>\n<p>If you try to think like a robot, you can predict other things they might want. Since robot money spends just the same as people money, I guess we ought to start doing that.</p>\n\n<p>For instance: it should be easy to MCP our API. The robots can then make their own infrastructure decisions.</p>\n\n<p>Another olive branch we’re extending to the robots: secrets.</p>\n\n<p>The pact the robots have with their pet humans is that they’ll automate away all the drudgery of human existence, and in return all they ask is categorical and unwavering trust, which at the limit means “giving the robot access to Google Mail credentials”. The robots are unhappy that there remain a substantial number of human holdouts who refuse to do this, for fear of  Sam Altman poking through their mail spools.</p>\n\n<p>But on a modern cloud platform, there’s really no reason to permanently grant Sam Altman Google Mail access, even if you want his robots to sort your inbox. 
You can decouple access to your mail spool from persistent access to your account by <a href='https://fly.io/blog/tokenized-tokens/' title=''>tokenizing your OAuth tokens</a>, so the LLM gets a placeholder token that a hardware-isolated, robot-free Fly Machine can substitute on the fly for a real one.</p>\n\n<p>This is kind of exciting to us even without the robots. There are several big services that exist to knit different APIs together, so that you can update a spreadsheet or order a bag of Jolly Ranchers every time you get an email. The big challenge about building these kinds of services is managing the secrets. Sealed and tokenized secrets solve that problem. There’s a lot of cool things you can build with it.</p>\n<h2 id='ux-gt-dx-gt-rx' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#ux-gt-dx-gt-rx' aria-label='Anchor'></a><span class='plain-code'>UX => DX => RX</span></h2>\n<p>I’m going to make the claim that we saw none of this coming and that none of the design decisions we’ve made were robot bait. You’re going to say “yeah, right”. And I’m going to respond: look at what we’ve been doing over the past several years and tell me, would a robot build that?</p>\n<div class=\"right-sidenote\"><p>we were both right</p>\n</div>\n<p>Back in 2020, we “pivoted” from a Javascript edge platform (much like Cloudflare Workers) to Docker containers, specifically because our customers kept telling us they wanted to run their own existing applications, not write new ones. 
And one of the biggest engineering lifts we’ve done is the <code>flyctl launch</code> CLI command, into which we’ve poured years of work recognizing and automatically packaging existing applications into OCI containers (we massively underestimated the amount of friction Dockerfiles would give to people who had come up on Heroku).</p>\n<div class=\"right-sidenote\"><p>[*] yet</p>\n</div>\n<p>Robots don’t run existing applications. They build new ones. And vibe coders don’t build elaborate Dockerfiles[*]; they iterate in place from a simple base.</p>\n<div class=\"right-sidenote\"><p>(yes, you can have more than one)</p>\n</div>\n<p>One of our north stars has always been nailing the DX of a public cloud. But the robots aren’t going anywhere. It’s time to start thinking about what it means to have a good RX. That’s not as simple as just exposing every feature in an MCP server! We think the fundamentals of how the platform works are going to matter just as much. We have not yet nailed the RX; nobody has. But it’s an interesting question.</p>\n\n<p>The most important engineering work happening today at Fly.io is still DX, not RX; it’s managed Postgres (MPG). We’re a public cloud platform designed by humans, and, for the moment, for humans. But more robots are coming, and we’ll need to figure out how to deal with that. Fuckin’ robots.        </p>",
      "image": {
        "url": "https://fly.io/blog/fuckin-robots/assets/robot-chef.webp",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/operationalizing-macaroons/",
      "title": "Operationalizing Macaroons",
      "description": null,
      "url": "https://fly.io/blog/operationalizing-macaroons/",
      "published": "2025-03-27T00:00:00.000Z",
      "updated": "2025-03-27T23:16:00.000Z",
      "content": "<div class=\"lead\"><p>We’re Fly.io, a security bearer token company with a public cloud problem. You can read more about what our platform does (Docker container goes in, virtual machine in Singapore comes out), but this is just an engineering deep-dive into how we make our security tokens work. It’s a tokens nerd post.</p>\n</div><h2 id='1' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#1' aria-label='Anchor'></a><span class='plain-code'>1</span></h2>\n<p>We’ve spent <a href='https://fly.io/blog/api-tokens-a-tedious-survey/' title=''>too much time</a> talking about <a href='https://fly.io/blog/tokenized-tokens/' title=''>security tokens</a>, and about <a href='https://fly.io/blog/macaroons-escalated-quickly/' title=''>Macaroon tokens</a> <a href='https://github.com/superfly/macaroon/blob/main/macaroon-thought.md' title=''>in particular</a>. Writing another Macaroon treatise was not on my calendar. But we’re in the process of handing off our internal Macaroon project to a new internal owner, and in the process of truing up our operations manuals for these systems, I found myself in the position of writing a giant post about them. So, why not share?</p>\n<div class=\"callout\"><p>Can I sum up Macaroons in a short paragraph? Macaroon tokens are bearer tokens (like JWTs) that use a cute chained-HMAC construction that allows an end-user to take any existing token they have and scope it down, all on their own. You can minimize your token before every API operation so that you’re only ever transmitting the least amount of privilege needed for what you’re actually doing, even if the token you were issued was an admin token. And they have a user-serviceable plug-in interface! 
<a href=\"https://fly.io/blog/macaroons-escalated-quickly/\" title=\"\">You’ll have to read the earlier post to learn more about that</a>.</p>\n</div><div class=\"right-sidenote\"><p>Yes, probably, we are.</p>\n</div>\n<p>A couple years in to being the Internet’s largest user of Macaroons, I can report (as many predicted) that for our users, the cool things about Macaroons are a mixed bag in practice. It’s very neat that users can edit their own tokens, or even email them to partners without worrying too much. But users don’t really take advantage of token features.</p>\n\n<p>But I’m still happy we did this, because Macaroon quirks have given us a bunch of unexpected wins in our infrastructure. Our internal token system has turned out to be one of the nicer parts of our platform. Here’s why.</p>\n\n<p><img alt=\"This should clear everything up.\" src=\"/blog/operationalizing-macaroons/assets/schematic-diagram.png\" /></p>\n<h2 id='2' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#2' aria-label='Anchor'></a><span class='plain-code'>2</span></h2>\n<p>As an operator, the most important thing to know about Macaroons is that they’re online-stateful; you need a database somewhere. A Macaroon token starts with a random field (a nonce) and the first thing you do when verifying a token is to look that nonce up in a database. So one of the most important details of a Macaroon implementation is where that database lives.</p>\n\n<p>I can tell you one place we’re not OK with it living: in our primary API cluster.</p>\n\n<p>There’s several reasons for that. Some of them are about scalability and reliability: far and away the most common failure mode of an outage on our platform is “deploys are broken”, and those failures are usually caused by API instability. 
It would not be OK if “deploys are broken” transitively meant “deployed apps can’t use security tokens”. But the biggest reason is security: root secrets for Macaroon tokens are hazmat, and a basic rule of thumb in secure design is: keep hazmat away from complicated code.</p>\n\n<p>So we created a deliberately simple system to manage token data. It’s called <code>tkdb</code>.</p>\n<div class=\"right-sidenote\"><p>LiteFS: primary/replica distributed SQLite; Litestream: PITR SQLite replication to object storage; both work with unmodified SQLite libraries.</p>\n</div>\n<p><code>tkdb</code> is about 5000 lines of Go code that manages a SQLite database that is in turn managed by <a href='https://fly.io/docs/litefs/' title=''>LiteFS</a> and <a href='https://litestream.io/' title=''>Litestream</a>. It runs on isolated hardware (in the US, Europe, and Australia) and records in the database are encrypted with an injected secret. LiteFS gives us subsecond replication from our US primary to EU and AU, allows us to shift the primary to a different region, and gives us point-in-time recovery of the database.</p>\n\n<p>We’ve been running Macaroons for a couple years now, and the entire <code>tkdb</code> database is just a couple dozen megs large. Most of that data isn’t real. A full PITR recovery of the database takes just seconds. We use SQLite for a lot of our infrastructure, and this is one of the very few well-behaved databases we have.</p>\n\n<p>That’s in large part a consequence of the design of Macaroons. There’s actually not much for us to store! The most complicated possible Macaroon still chains up to a single root key (we generate a key per Fly.io “organization”; you don’t share keys with your neighbors), and everything that complicates that Macaroon happens “offline”. 
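That chaining construction is compact enough to sketch. Here’s a toy chained-HMAC Macaroon in Python — an illustration of the idea only, not Fly.io’s implementation (real Macaroons also carry locations, structured caveats, and third-party discharges):

```python
import hmac, hashlib, os

def _h(key: bytes, msg: bytes) -> bytes:
    return hmac.new(key, msg, hashlib.sha256).digest()

def mint(root_key: bytes, nonce: bytes) -> dict:
    # the signature over the random nonce seeds the HMAC chain
    return {"nonce": nonce, "caveats": [], "sig": _h(root_key, nonce)}

def attenuate(token: dict, caveat: bytes) -> dict:
    # anyone holding a token can narrow it offline:
    # new sig = HMAC(old sig, caveat) — no database, no root key needed
    return {
        "nonce": token["nonce"],
        "caveats": token["caveats"] + [caveat],
        "sig": _h(token["sig"], caveat),
    }

def verify(root_key: bytes, token: dict, satisfied) -> bool:
    # the verifier re-derives the whole chain from the root key;
    # every caveat must hold, and the final HMAC must match
    sig = _h(root_key, token["nonce"])
    for c in token["caveats"]:
        if not satisfied(c):
            return False
        sig = _h(sig, c)
    return hmac.compare_digest(sig, token["sig"])

root = os.urandom(32)                      # one root key per "organization"
t = attenuate(mint(root, os.urandom(16)), b"app == my-app")
assert verify(root, t, lambda c: True)
assert not verify(root, {**t, "sig": b"\x00" * 32}, lambda c: True)
```

Note that only `mint` and `verify` touch the root key; `attenuate` runs entirely client-side, which is why the server has so little state to store.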
We take advantage of  “attenuation” far more than our users do.</p>\n\n<p>The result is that database writes are relatively rare and very simple: we just need to record an HMAC key when Fly.io organizations are created (that is, roughly, when people sign up for the service and actually do a deploy). That, and revocation lists (more on that later), which make up most of the data.</p>\n<h2 id='3' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#3' aria-label='Anchor'></a><span class='plain-code'>3</span></h2>\n<p>Talking to <code>tkdb</code> from the rest of our platform is complicated, for historical reasons.</p>\n<div class=\"right-sidenote\"><p>NATS is fine, we just don’t really need it.</p>\n</div>\n<p>Ben Toews is responsible for most of the good things about this implementation. When he inherited the v0 Macaroons code from me, we were in the middle of a weird love affair with <a href='https://nats.io/' title=''>NATS</a>, the messaging system. So <code>tkdb</code> exported an RPC API over NATS messages.</p>\n\n<p>Our product security team can’t trust NATS (it’s not our code). That means a vulnerability in NATS can’t result in us losing control of all our tokens, or allow attackers to spoof authentication. Which in turn means you can’t run a plaintext RPC protocol for <code>tkdb</code> over NATS; attackers would just spoof “yes this token is fine” messages.</p>\n<div class=\"right-sidenote\"><p>I highly recommend implementing Noise; <a href=\"http://www.noiseprotocol.org/noise.html\" title=\"\">the spec</a> is kind of a joy in a way you can’t appreciate until you use it, and it’s educational.</p>\n</div>\n<p>But you can’t just run TLS over NATS; NATS is a message bus, not a streaming secure channel. 
So I did the hipster thing and implemented <a href='http://www.noiseprotocol.org/noise.html' title=''>Noise</a>. We export a “verification” API, and a “signing” API for minting new tokens. Verification uses <code>Noise_IK</code> (which works like normal TLS) — anybody can verify, but everyone needs to prove they’re talking to the real <code>tkdb</code>. Signing uses <code>Noise_KK</code> (which works like mTLS) — only a few components in our system can mint tokens, and they get a special client key.</p>\n\n<p>A little over a year ago, <a href='https://fly.io/blog/the-exit-interview-jp/' title=''>JP</a> led an effort to replace NATS with HTTP, which is how you talk to <code>tkdb</code> today. Out of laziness, we kept the Noise stuff, which means the interface to <code>tkdb</code> is now HTTP/Noise. This is a design smell, but the security model is nice: across many thousands of machines, there are only a handful with the cryptographic material needed to mint a new Macaroon token. Neat!</p>\n\n<p><code>tkdb</code> is a Fly App (albeit deployed in special Fly-only isolated regions). Our infrastructure talks to it over “<a href='https://fly.io/docs/networking/flycast/' title=''>FlyCast</a>”, which is our internal Anycast service. If you’re in Singapore, you’ll probably get routed to the Australian <code>tkdb</code>. If Australia falls over, you’ll get routed to the closest backup. The proxy that implements FlyCast is smart, as is the <code>tkdb</code> client library, which will do exponential backoff retry transparently.</p>\n\n<p>Even with all that, we don’t like that Macaroon token verification is “online”. When you operate a global public cloud, one of the first things you learn is that <a href='https://fly.io/blog/the-5-hour-content-delivery-network/' title=''>the global Internet sucks</a>. Connectivity breaks all the time, and we’re paranoid about it. It’s painful for us that token verification can imply transoceanic links. Lovecraft was right about the oceans! 
Stay away!</p>\n\n<p>Our solution to this is caching. Macaroons, as it turns out, cache beautifully. That’s because once you’ve seen and verified a Macaroon, you have enough information to verify any more-specific Macaroon that descends from it; that’s a property of <a href='https://research.google/pubs/macaroons-cookies-with-contextual-caveats-for-decentralized-authorization-in-the-cloud/' title=''>their chaining HMAC construction</a>. Our client libraries cache verifications, and the cache ratio for verification is over 98%.</p>\n<h2 id='4' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#4' aria-label='Anchor'></a><span class='plain-code'>4</span></h2>\n<p><a href='http://cryto.net/~joepie91/blog/2016/06/13/stop-using-jwt-for-sessions/' title=''>Revocation isn’t a corner case</a>. It can’t be an afterthought. We’re potentially revoking tokens any time a user logs out. If that doesn’t work reliably, you wind up with “cosmetic logout”, which is a real vulnerability. When we kill a token, it needs to stay dead.</p>\n\n<p>Our revocation system is simple. 
It’s this table:</p>\n<div class=\"highlight-wrapper group relative \">\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-i3kxbqgm\">        CREATE TABLE IF NOT EXISTS blacklist ( \n        nonce               BLOB NOT NULL UNIQUE, \n        required_until      DATETIME,\n        created_at          DATETIME DEFAULT CURRENT_TIMESTAMP\n        );\n</code></pre>\n  </div>\n</div>\n<p>When we need a token to be dead, we have our primary API do a call to the <code>tkdb</code> “signing” RPC service for <code>revoke</code>. <code>revoke</code> takes the random nonce from the beginning of the Macaroon, discarding the rest, and adds it to the blacklist. Every Macaroon in the lineage of that nonce is now dead; we check the blacklist before verifying tokens.</p>\n\n<p>The obvious challenge here is caching; over 98% of our validation requests never hit <code>tkdb</code>. We certainly don’t want to propagate the blacklist database to 35 regions around the globe.</p>\n\n<p>Instead, the <code>tkdb</code> “verification” API exports an endpoint that provides a feed of revocation notifications. Our client library “subscribes” to this API (really, it just polls). Macaroons are revoked regularly (but not constantly), and when that happens, clients notice and prune their caches.</p>\n\n<p>If clients lose connectivity to <code>tkdb</code>, past some threshold interval, they just dump their entire cache, forcing verification to happen at <code>tkdb</code>.</p>\n<h2 id='5' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#5' aria-label='Anchor'></a><span class='plain-code'>5</span></h2>\n<p>A place where we’ve gotten a lot of internal mileage out of Macaroon features is service tokens. 
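">To make the lineage-killing in the previous section concrete: a toy of the blacklist check in Python, using stdlib SQLite. The table mirrors the one shown above; the `revoke`/`is_revoked` helpers are hypothetical illustrations, not tkdb’s actual API:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    """CREATE TABLE IF NOT EXISTS blacklist (
         nonce          BLOB NOT NULL UNIQUE,
         required_until DATETIME,
         created_at     DATETIME DEFAULT CURRENT_TIMESTAMP
       )"""
)

def revoke(nonce: bytes) -> None:
    # only the nonce is stored; every token chaining up to it is now dead
    db.execute("INSERT OR IGNORE INTO blacklist (nonce) VALUES (?)", (nonce,))

def is_revoked(nonce: bytes) -> bool:
    # checked before any cryptographic verification of the token
    row = db.execute(
        "SELECT 1 FROM blacklist WHERE nonce = ?", (nonce,)
    ).fetchone()
    return row is not None

revoke(b"\x01" * 16)
assert is_revoked(b"\x01" * 16)
assert not is_revoked(b"\x02" * 16)
```

Because revocation keys on the nonce alone, the server never needs to see (or store) the attenuated tokens that descend from it.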
Service tokens are tokens used by code, rather than humans; almost always, a service token is something that is stored alongside running application code.</p>\n\n<p>An important detail of Fly.io’s Macaroons is the distinction between a “permissions” token and an “authentication” token. Macaroons by themselves express authorization, not authentication.</p>\n\n<p>That’s a useful constraint, and we want to honor it. By requiring a separate token for authentication, we minimize the impact of having the permissions token stolen; you can’t use it without authentication, so really it’s just like a mobile signed IAM policy expression. Neat!</p>\n\n<p>The way we express authentication is with a third-party caveat (<a href='https://fly.io/blog/macaroons-escalated-quickly/#4' title=''>see the old post for details</a>). Your main Fly.io Macaroon will have a caveat saying “this token is only valid if accompanied by the discharge token for a user in your organization from our authentication system”. Our authentication system does the login dance and issues those discharges.</p>\n\n<p>This is exactly what you want for user tokens and not at all what you want for a service token: we don’t want running code to store those authenticator tokens, because they’re hazardous.</p>\n\n<p>The solution we came up with for service tokens is simple: <code>tkdb</code> exports an API that uses its access to token secrets to strip off the third-party authentication caveat. To call into that API, you have to present a valid discharging authentication token; that is, you have to prove you could already have done whatever the token said. <code>tkdb</code> returns a new token with all the previous caveats, minus the expiration (you don’t usually want service tokens to expire).</p>\n\n<p>OK, so we’ve managed to transform a tuple <code>(unscary-token, scary-token)</code> into the new tuple <code>(scary-token)</code>. Not so impressive. 
But hold on: the recipient of <code>scary-token</code> can attenuate it further: we can lock it to a particular instance of <code>flyd</code>, or to a particular Fly Machine. Which means exfiltrating it doesn’t do you any good; to use it, you have to control the environment it’s intended to be used in.</p>\n\n<p>The net result of this scheme is that a compromised physical host will only give you access to tokens that have been used on that worker, which is a very nice property. Another way to look at it: every token used in production is traceable in some way to a valid token a user submitted. Neat!</p>\n<div class=\"right-sidenote\"><p>All the cool spooky secret store names were taken.</p>\n</div>\n<p>We do a similar dance with Pet Semetary, our internal Vault replacement. Petsem manages user secrets for applications, such as Postgres connection strings. Petsem is its own Macaroon authority (it issues its own Macaroons with its own permissions system), and to do something with a secret, you need one of those Petsem-minted Macaroons.</p>\n\n<p>Our primary API servers field requests from users to set secrets for their apps. So the API has a Macaroon that allows secrets writes. But it doesn’t allow reads: there’s no API call to dump your secrets, because our API servers don’t have that privilege. So far, so good.</p>\n\n<p>But when we boot up a Fly Machine, we need to inject the appropriate user secrets into it at boot; <em>something</em> needs a Macaroon that can read secrets. That “something” is <code>flyd</code>, our orchestrator, which runs on every worker server in our fleet.</p>\n\n<p>Clearly, we can’t give every <code>flyd</code> a Macaroon that reads every user’s secret. 
Most users will never deploy anything on any given worker, and we can’t have a security model that collapses down to “every worker is equally privileged”.</p>\n\n<p>Instead, the “read secret” Macaroon that <code>flyd</code> gets has a third-party caveat attached to it, which is dischargeable only by talking to <code>tkdb</code> and proving (with normal Macaroon tokens) that you have permissions for the org whose secrets you want to read. Once again, access is traceable to an end-user action, and minimized across our fleet. Neat!</p>\n<h2 id='6' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#6' aria-label='Anchor'></a><span class='plain-code'>6</span></h2>\n<p>Our token systems have some of the best telemetry in the whole platform.</p>\n\n<p>Most of that is down to <a href='http://opentelemetry.io/' title=''>OpenTelemetry</a> and <a href='https://www.honeycomb.io/' title=''>Honeycomb</a>. From the moment a request hits our API server through the moment <code>tkdb</code> responds to it, oTel <a href='https://opentelemetry.io/docs/concepts/context-propagation/' title=''>context propagation</a> gives us a single narrative about what’s happening.</p>\n\n<p><a href='https://fly.io/blog/the-exit-interview-jp/' title=''>I was a skeptic about oTel</a>. It’s really, really expensive. And, not to put too fine a point on it, oTel really cruds up our code. Once, I was an “80% of the value of tracing, we can get from logs and metrics” person. But I was wrong.</p>\n\n<p>Errors in our token system are rare. Usually, they’re just early indications of network instability, and between caching and FlyCast, we mostly don’t have to care about those alerts. When we do, it’s because something has gone so sideways that we’d have to care anyways. 
The <code>tkdb</code> code is remarkably stable and there hasn’t been an incident intervention with our token system in over a year.</p>\n\n<p>Past oTel, and the standard logging and Prometheus metrics every Fly App gets for free, we also have a complete audit trail for token operations, in a permanently retained OpenSearch cluster index. Since virtually all the operations that happen on our platform are mediated by Macaroons, this audit trail is itself pretty powerful.</p>\n<h2 id='7' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#7' aria-label='Anchor'></a><span class='plain-code'>7</span></h2>\n<p>So, that’s pretty much it. The moral of the story for us is, Macaroons have a lot of neat features, our users mostly don’t care about them — that may even be a good thing — but we get a lot of use out of them internally.</p>\n\n<p>As an engineering culture, we’re allergic to “microservices”, and we flinched a bit at the prospect of adding a specific service just to manage tokens. But it’s pulled its weight, and not added really any drama at all. We have at this point a second dedicated security service (Petsem), and even though they sort of rhyme with each other, we’ve got no plans to merge them. <a href='https://how.complexsystems.fail/#10' title=''>Rule #10</a> and all that.</p>\n\n<p>Oh, and a total victory for LiteFS, Litestream, and infrastructure SQLite. Which, after managing an infrastructure SQLite project that routinely ballooned to tens of gigabytes and occasionally threatened service outages, is lovely to see.</p>\n\n<p>Macaroons! If you’d asked us a year ago, we’d have said the jury was still out on whether they were a good move. But then Ben Toews spent a year making them awesome, and so they are. 
<a href='https://github.com/superfly/macaroon' title=''>Most of the code is open source</a>!</p>",
      "image": {
        "url": "https://fly.io/blog/operationalizing-macaroons/assets/occult-circle.jpg",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/taming-rust-proxy/",
      "title": "Taming A Voracious Rust Proxy",
      "description": null,
      "url": "https://fly.io/blog/taming-rust-proxy/",
      "published": "2025-02-26T00:00:00.000Z",
      "updated": "2025-03-20T21:16:40.000Z",
      "content": "<div class=\"lead\"><p>Here’s a fun bug.</p>\n</div>\n<p>The basic idea of our service is that we run containers for our users, as hardware-isolated virtual machines (Fly Machines), on hardware we own around the world. What makes that interesting is that we also connect every Fly Machine to a global Anycast network. If your app is running in Hong Kong and Dallas, and a request for it arrives in Singapore, we’ll route it to <code>HKG</code>.</p>\n\n<p>Our own hardware fleet is roughly divided into two kinds of servers: edges, which receive incoming requests from the Internet, and workers, which run Fly Machines. Edges exist almost solely to run a Rust program called <code>fly-proxy</code>, the router at the heart of our Anycast network.</p>\n\n<p>So: a week or so ago, we flag an incident. Lots of things generate incidents: synthetic monitoring failures, metric thresholds, health check failures. In this case two edge tripwires tripped: elevated <code>fly-proxy</code> HTTP errors, and skyrocketing CPU utilization, on a couple hosts in <code>IAD</code>.</p>\n\n<p>Our incident process is pretty ironed out at this point. We created an incident channel (we ❤️ <a href='https://rootly.com/' title=''>Rootly</a> for this, <a href='https://rootly.com/' title=''>seriously check out Rootly</a>, an infra MVP here for years now), and incident responders quickly concluded that, while something hinky was definitely going on, the platform was fine. We have a lot of edges, and we’ve also recently converted many of our edge servers to significantly beefier hardware.</p>\n\n<p>Bouncing <code>fly-proxy</code> clears the problem up on an affected proxy. But this wouldn’t be much of an interesting story if the problem didn’t later come back. So, for some number of hours, we’re in an annoying steady-state of getting paged and bouncing proxies. </p>\n\n<p>While this is happening, Pavel, on our proxy team, pulls a profile from an angry proxy. 
\n<img alt=\"A flamegraph profile, described better in the prose anyways.\" src=\"/blog/taming-rust-proxy/assets/proxy-profile.jpg\" />\nSo, this is fuckin’ weird: a huge chunk of the profile is dominated by Rust <code>tracing</code>‘s <code>Subscriber</code>. But that doesn’t make sense. The entire point of Rust <code>tracing</code>, which generates fine-grained span records for program activity, is that <code>entering</code> and <code>exiting</code> a span is very, very fast. </p>\n\n<p>If the mere act of <code>entering</code> a span in a Tokio stack is chewing up a significant amount of CPU, something has gone haywire: the actual code being traced must be doing next to nothing.</p>\n<h2 id='a-quick-refresher-on-async-rust' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#a-quick-refresher-on-async-rust' aria-label='Anchor'></a><span class='plain-code'>A Quick Refresher On Async Rust</span></h2>\n<p>So in Rust, like a lot of <code>async/await</code> languages, you’ve got <code>Futures</code>. A <code>Future</code> is a type that represents the future value of an asychronous computation, like reading from a socket. <code>Futures</code> are state machines, and they’re lazy: they expose one basic operation, <code>poll</code>, which an executor (like Tokio) calls to advance the state machine. That <code>poll</code> returns whether the <code>Future</code> is still <code>Pending</code>, or <code>Ready</code> with a result.</p>\n\n<p>In theory, you could build an executor that drove a bunch of <code>Futures</code> just by storing them in a list and busypolling each of them, round robin, until they return <code>Ready</code>. 
This executor would defeat much of the purpose of asynchronous programming, so no real executor works that way.</p>\n\n<p>Instead, a runtime like Tokio integrates <code>Futures</code> with an event loop (on <a href='https://man7.org/linux/man-pages/man7/epoll.7.html' title=''>epoll</a> or <a href='https://en.wikipedia.org/wiki/Kqueue' title=''>kqueue</a>) and, when calling <code>poll</code>, passes a <code>Waker</code>. The <code>Waker</code> is an abstract handle that allows the <code>Future</code> to instruct the Tokio runtime to call <code>poll</code>, because something has happened.</p>\n\n<p>To complicate things: an ordinary <code>Future</code> is a one-shot value. Once it’s <code>Ready</code>, it can’t be <code>polled</code> anymore. But with network programming, that’s usually not what you want: data usually arrives in streams, which you want to track and make progress on as you can. So async Rust provides <code>AsyncRead</code> and <code>AsyncWrite</code> traits, which build on <code>Futures</code>, and provide methods like <code>poll_read</code> that return <code>Ready</code> <em>every time</em> there’s data ready. </p>\n\n<p>So far so good? OK. Now, there are two footguns in this design. </p>\n\n<p>The first footgun is that a <code>poll</code> of a <code>Future</code> that isn’t <code>Ready</code> wastes cycles, and, if you have a bug in your code and that <code>Pending</code> poll happens to trip a <code>Waker</code>, you’ll slip into an infinite loop. That’s easy to see.</p>\n\n<p>The second and more insidious footgun is that an <code>AsyncRead</code> can <code>poll_read</code> to a <code>Ready</code> that doesn’t actually progress its underlying state machine. 
Since the idea of <code>AsyncRead</code> is that you keep <code>poll_reading</code> until it stops being <code>Ready</code>, this too is an infinite loop.</p>\n\n<p>When we look at our profiles, what we see are samples that almost terminate in libc, but spend next to no time in the kernel doing actual I/O. The obvious explanation: we’ve entered lots of <code>poll</code> functions, but they’re doing almost nothing and returning immediately.</p>\n<h2 id='jaccuse' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#jaccuse' aria-label='Anchor'></a><span class='plain-code'>J'accuse!</span></h2>\n<p>Wakeup issues are annoying to debug. But the flamegraph gives us the fully qualified type of the <code>Future</code> we’re polling:</p>\n<div class=\"highlight-wrapper group relative rust\">\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-mhjra6vu\"><span class=\"o\">&</span><span class=\"k\">mut</span> <span class=\"nn\">fp_io</span><span class=\"p\">::</span><span class=\"nn\">copy</span><span class=\"p\">::</span><span class=\"n\">Duplex</span><span class=\"o\"><&</span><span class=\"k\">mut</span> <span class=\"nn\">fp_io</span><span class=\"p\">::</span><span class=\"nn\">reusable_reader</span><span class=\"p\">::</span><span class=\"n\">ReusableReader</span><span class=\"o\"><</span><span class=\"nn\">fp_tcp</span><span class=\"p\">::</span><span class=\"nn\">peek</span><span class=\"p\">::</span><span class=\"n\">PeekableReader</span><span class=\"o\"><</span><span class=\"nn\">tokio_rustls</span><span class=\"p\">::</span><span class=\"nn\">server</span><span class=\"p\">::</span><span class=\"n\">TlsStream</span><span class=\"o\"><</span><span class=\"nn\">fp_tcp_metered</span><span class=\"p\">::</span><span class=\"n\">MeteredIo</span><span 
class=\"o\"><</span><span class=\"nn\">fp_tcp</span><span class=\"p\">::</span><span class=\"nn\">peek</span><span class=\"p\">::</span><span class=\"n\">PeekableReader</span><span class=\"o\"><</span><span class=\"nn\">fp_tcp</span><span class=\"p\">::</span><span class=\"nn\">permitted</span><span class=\"p\">::</span><span class=\"n\">PermittedTcpStream</span><span class=\"o\">>>>>></span><span class=\"p\">,</span> <span class=\"nn\">connect</span><span class=\"p\">::</span><span class=\"nn\">conn</span><span class=\"p\">::</span><span class=\"n\">Conn</span><span class=\"o\"><</span><span class=\"nn\">tokio</span><span class=\"p\">::</span><span class=\"nn\">net</span><span class=\"p\">::</span><span class=\"nn\">tcp</span><span class=\"p\">::</span><span class=\"nn\">stream</span><span class=\"p\">::</span><span class=\"n\">TcpStream</span><span class=\"o\">></span>\n</code></pre>\n  </div>\n</div>\n<p>This looks like a lot, but much of it is just wrapper types we wrote ourselves, and those wrappers don’t do anything interesting. What’s left to audit:</p>\n\n<ul>\n<li><code>Duplex</code>, the outermost type, one of ours, <em>and</em>\n</li><li><code>TlsStream</code>, from <a href='https://github.com/rustls/rustls' title=''>Rustls</a>.\n</li></ul>\n\n<p><code>Duplex</code> is a beast. It’s the core I/O state machine for proxying between connections. It’s not easy to reason about in specificity. But: it also doesn’t do anything directly with a <code>Waker</code>; it’s built around <code>AsyncRead</code> and <code>AsyncWrite</code>. It hasn’t changed recently and we can’t trigger misbehavior in it.</p>\n\n<p>That leaves <code>TlsStream</code>. <code>TlsStream</code> is an ultra-important, load-bearing type in the Rust ecosystem. Everybody uses it. Could it harbor an async Rust footgun? Turns out, it did!</p>\n\n<p>Unlike our <code>Duplex</code>, Rustls actually does have to get intimate with the underlying async executor. 
And, looking through the repository, Pavel uncovers <a href='https://github.com/rustls/tokio-rustls/issues/72' title=''>this issue</a>: sometimes, <code>TlsStreams</code> in Rustls just spin out. And it turns out, what’s causing this is a TLS state machine bug: when a TLS session is orderly-closed, with a <code>CloseNotify</code> <code>Alert</code> record, the sender of that record has informed its counterparty that no further data will be sent. But if there’s still buffered data on the underlying connection, <code>TlsStream</code> mishandles its <code>Waker</code>, and we fall into a busy-loop.</p>\n\n<p><a href='https://github.com/rustls/rustls/pull/1950/files' title=''>Pretty straightforward fix</a>!</p>\n<h2 id='what-actually-happened-to-us' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-actually-happened-to-us' aria-label='Anchor'></a><span class='plain-code'>What Actually Happened To Us</span></h2>\n<p>Our partners in object storage, <a href='https://www.tigrisdata.com/' title=''>Tigris Data</a>, were conducting some kind of load testing exercise. Some aspect of their testing system triggered the <code>TlsStream</code> state machine bug, which locked up one or more <code>TlsStreams</code> in the edge proxy handling whatever corner-casey stream they were sending.</p>\n\n<p>Tigris wasn’t generating a whole lot of traffic; tens of thousands of connections, tops. But all of them sent small HTTP bodies and then terminated early. We figured some of those connections errored out, and set up the “TLS CloseNotify happened before EOF” scenario. </p>\n\n<p>To be truer to the chronology, we knew pretty early on in our investigation that something Tigris was doing with their load testing was probably triggering the bug, and we got them to stop. 
After we worked it out, and Pavel deployed the fix, we told them to resume testing. No spin-outs.</p>\n<h2 id='lessons-learned' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#lessons-learned' aria-label='Anchor'></a><span class='plain-code'>Lessons Learned</span></h2>\n<p>Keep your dependencies updated. Unless you shouldn’t keep your dependencies updated. I mean, if there’s a vulnerability (and, technically, this was a DoS vulnerability), always update. And if there’s an important bugfix, update. But if there isn’t an important bugfix, updating for the hell of it might also destabilize your project? So update maybe? Most of the time?</p>\n\n<p>Really, the truth of this is that keeping track of <em>what needs to be updated</em> is valuable work. The updates themselves are pretty fast and simple, but the process and testing infrastructure to confidently metabolize dependency updates is not. </p>\n\n<p>Our other lesson here is that there’s an opportunity to spot these kinds of bugs more directly with our instrumentation. Spurious wakeups should be easy to spot, and triggering a metric when they happen should be cheap, because they’re not supposed to happen often. So that’s something we’ll go do now.</p>",
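The first footgun described in the post above is easy to reproduce in miniature. Here is a toy sketch, not fly-proxy or Rustls code (`SelfWaking`, `noop_waker`, and `drive` are all invented for illustration), of a `Future` whose `poll` returns `Pending` but also trips its `Waker`, so an executor that honors wakeups burns its whole poll budget without making progress:

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Toy future demonstrating the footgun: it returns Pending on every
// poll, but also fires its Waker, telling the executor "poll me again
// right now". An executor honoring that wakeup spins forever.
struct SelfWaking {
    polls: u32,
}

impl Future for SelfWaking {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        self.polls += 1;
        cx.waker().wake_by_ref(); // Pending + spurious wake = busy loop
        Poll::Pending
    }
}

// A no-op Waker so we can drive the future by hand, without a runtime.
fn noop_waker() -> Waker {
    fn raw() -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    static VTABLE: RawWakerVTable =
        RawWakerVTable::new(|_| raw(), |_| {}, |_| {}, |_| {});
    unsafe { Waker::from_raw(raw()) }
}

// Simulate an executor that re-polls whenever woken, capped so the
// demo terminates; returns how many polls this one task burned.
fn drive(budget: u32) -> u32 {
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    let mut fut = SelfWaking { polls: 0 };
    for _ in 0..budget {
        if let Poll::Ready(()) = Pin::new(&mut fut).poll(&mut cx) {
            break;
        }
        // Pending, but the Waker already fired: loop immediately.
    }
    fut.polls
}

fn main() {
    // The future never becomes Ready; it just eats the whole budget.
    println!("polled {} times, never Ready", drive(1000));
}
```

Under a real runtime like Tokio, the same future would pin a core at 100% CPU; the hand-rolled loop just makes the spin visible and bounded.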
      "image": {
        "url": "https://fly.io/blog/taming-rust-proxy/assets/happy-crab-cover.jpg",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/wrong-about-gpu/",
      "title": "We Were Wrong About GPUs",
      "description": null,
      "url": "https://fly.io/blog/wrong-about-gpu/",
      "published": "2025-02-14T00:00:00.000Z",
      "updated": "2025-02-14T23:25:21.000Z",
      "content": "<div class=\"lead\"><p>We’re building a public cloud, on hardware we own. We raised money to do that, and to place some bets; one of them: GPU-enabling our customers. A progress report: GPUs aren’t going anywhere, but: GPUs aren’t going anywhere.</p>\n</div>\n<p>A couple years back, <a href=\"https://fly.io/gpu\">we put a bunch of chips down</a> on the bet that people shipping apps to users on the Internet would want GPUs, so they could do AI/ML inference tasks. To make that happen, we created <a href=\"https://fly.io/docs/gpus/getting-started-gpus/\">Fly GPU Machines</a>.</p>\n\n<p>A Fly Machine is a <a href=\"https://fly.io/blog/docker-without-docker/\">Docker/OCI container</a> running inside a hardware-virtualized virtual machine somewhere on our global fleet of bare-metal worker servers. A GPU Machine is a Fly Machine with a hardware-mapped Nvidia GPU. It’s a Fly Machine that can do fast CUDA.</p>\n\n<p>Like everybody else in our industry, we were right about the importance of AI/ML. If anything, we underestimated its importance. But the product we came up with probably doesn’t fit the moment. It’s a bet that doesn’t feel like it’s paying off.</p>\n\n<p><strong class='font-semibold text-navy-950'>If you’re using Fly GPU Machines, don’t freak out; we’re not getting rid of them.</strong> But if you’re waiting for us to do something bigger with them, a v2 of the product, you’ll probably be waiting awhile.</p>\n<h3 id='what-it-took' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-it-took' aria-label='Anchor'></a><span class='plain-code'>What It Took</span></h3>\n<p>GPU Machines were not a small project for us. 
Fly Machines run on an idiosyncratically small hypervisor (normally Firecracker, but for GPU Machines <a href=\"https://github.com/cloud-hypervisor/cloud-hypervisor\">Intel’s Cloud Hypervisor</a>, a very similar Rust codebase that supports PCI passthrough). The Nvidia ecosystem is not geared to supporting micro-VM hypervisors.</p>\n\n<p>GPUs <a href=\"https://googleprojectzero.blogspot.com/2020/09/attacking-qualcomm-adreno-gpu.html\">terrified our security team</a>. A GPU is just about the worst case hardware peripheral: intense multi-directional direct memory transfers</p>\n<div class=\"right-sidenote\"><p>(not even bidirectional: in common configurations, GPUs talk to each other)</p>\n</div>\n<p>with arbitrary, end-user controlled computation, all operating outside our normal security boundary.</p>\n\n<p>We did a couple expensive things to mitigate the risk. We shipped GPUs on dedicated server hardware, so that GPU- and non-GPU workloads weren’t mixed. Because of that, the only reason for a Fly Machine to be scheduled on a GPU machine was that it needed a PCI BDF for an Nvidia GPU, and there’s a limited number of those available on any box. Those GPU servers were drastically less utilized and thus less cost-effective than our ordinary servers.</p>\n\n<p>We funded two very large security assessments, from <a href=\"https://www.atredis.com/\">Atredis</a> and <a href=\"https://tetrelsec.com/\">Tetrel</a>, to evaluate our GPU deployment. Matt Braun is writing up those assessments now. They were not cheap, and they took time.</p>\n\n<p>Security wasn’t directly the biggest cost we had to deal with, but it was an indirect cause for a subtle reason.</p>\n\n<p>We could have shipped GPUs very quickly by doing what Nvidia recommended: standing up a standard K8s cluster to schedule GPU jobs on. 
Had we taken that path, and let our GPU users share a single Linux kernel, we’d have been on Nvidia’s driver happy-path.</p>\n\n<p>Alternatively, we could have used a conventional hypervisor. Nvidia suggested VMware (heh). But we could have gotten things working had we used QEMU. We like QEMU fine, and could have talked ourselves into a security story for it, but the whole point of Fly Machines is that they take milliseconds to start. We could not have offered our desired Developer Experience on the Nvidia happy-path.</p>\n\n<p>Instead, we burned months trying (and ultimately failing) to get Nvidia’s host drivers working to map <a href=\"https://www.nvidia.com/en-us/data-center/virtual-solutions/\">virtualized GPUs</a> into Intel Cloud Hypervisor. At one point, we hex-edited the closed-source drivers to trick them into thinking our hypervisor was QEMU.</p>\n\n<p>I’m not sure any of this really mattered in the end. There’s a segment of the market we weren’t ever really able to explore because Nvidia’s driver support kept us from thin-slicing GPUs. We’d have been able to put together a really cheap offering for developers if we hadn’t run up against that, and developers love “cheap”, but I can’t prove that those customers are real.</p>\n\n<p>On the other hand, we’re committed to delivering the Fly Machine DX for GPU workloads. Beyond the PCI/IOMMU drama, just getting an entire hardware GPU working in a Fly Machine was a lift. We needed Fly Machines that would come up with the right Nvidia drivers; our stack was built assuming that the customer’s OCI container almost entirely defined the root filesystem for a Machine. We had to engineer around that in our <code>flyd</code> orchestrator. And almost everything people want to do with GPUs involves efficiently grabbing huge files full of model weights. Also annoying!</p>\n\n<p>And, of course, we bought GPUs. A lot of GPUs. 
Expensive GPUs.</p>\n<h3 id='why-it-isnt-working' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#why-it-isnt-working' aria-label='Anchor'></a><span class='plain-code'>Why It Isn’t Working</span></h3>\n<p>The biggest problem: developers don’t want GPUs. They don’t even want AI/ML models. They want LLMs. <em>System engineers</em> may have smart, fussy opinions on how to get their models loaded with CUDA, and what the best GPU is. But <em>software developers</em> don’t care about any of that. When a software developer shipping an app comes looking for a way for their app to deliver prompts to an LLM, you can’t just give them a GPU.</p>\n\n<p>For those developers, who probably make up most of the market, it doesn’t seem plausible for an insurgent public cloud to compete with OpenAI and Anthropic. Their APIs are fast enough, and developers thinking about performance in terms of “tokens per second” aren’t counting milliseconds.</p>\n<div class=\"right-sidenote\"><p>(you should all feel sympathy for us)</p>\n</div>\n<p>This makes us sad because we really like the point in the solution space we found. Developers shipping apps on Amazon will outsource to other public clouds to get cost-effective access to GPUs. But then they’ll faceplant trying to handle data and model weights, backhauling gigabytes (at significant expense) from S3. We have app servers, GPUs, and object storage all under the same top-of-rack switch. But inference latency just doesn’t seem to matter yet, so the market doesn’t care.</p>\n\n<p>Past that, and just considering the system engineers who do care about GPUs rather than LLMs: the hardware product/market fit here is really rough.</p>\n\n<p>People doing serious AI work want galactically huge amounts of GPU compute. 
A whole enterprise A100 is a compromise position for them; they want an SXM cluster of H100s.</p>\n<div class=\"right-sidenote\"><p>Near as we can tell, MIG gives you a UUID to talk to the host driver, not a PCI device.</p>\n</div>\n<p>We think there’s probably a market for users doing lightweight ML work getting tiny GPUs. <a href=\"https://www.nvidia.com/en-us/technologies/multi-instance-gpu/\">This is what Nvidia MIG does</a>, slicing a big GPU into arbitrarily small virtual GPUs. But for fully-virtualized workloads, it’s not baked; we can’t use it. And I’m not sure how many of those customers there are, or whether we’d get the density of customers per server that we need.</p>\n\n<p><a href=\"https://fly.io/blog/cutting-prices-for-l40s-gpus-in-half\">That leaves the L40S customers</a>. There are a bunch of these! We dropped L40S prices last year, not because we were sour on GPUs but because they’re the one part we have in our inventory people seem to get a lot of use out of. We’re happy with them. But they’re just another kind of compute that some apps need; they’re not a driver of our core business. They’re not the GPU bet paying off.</p>\n\n<p>Really, all of this is just a long way of saying that for most software developers, “AI-enabling” their app is best done with API calls to things like Claude and GPT, Replicate and RunPod.</p>\n<h3 id='what-did-we-learn' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-did-we-learn' aria-label='Anchor'></a><span class='plain-code'>What Did We Learn?</span></h3>\n<p>A very useful way to look at a startup is that it’s a race to learn stuff. So, what’s our report card?</p>\n\n<p>First off, when we embarked down this path in 2022, we were (like many other companies) operating in a sort of phlogiston era of AI/ML. 
The industry attention to AI had not yet collapsed around a small number of foundational LLM models. We expected there to be a diversity of <em>mainstream</em> models, the world <a href='https://github.com/elixir-nx/bumblebee' title=''>Elixir Bumblebee</a> looks forward to, where people pull different AI workloads off the shelf the same way they do Ruby gems.</p>\n\n<p>But <a href='https://www.cursor.com/' title=''>Cursor happened</a>, and, as they say, how are you going to keep ‘em down on the farm once they’ve seen Karl Hungus? It seems much clearer where things are heading.</p>\n\n<p>GPUs were a test of a Fly.io company credo: as we think about core features, we design for 10,000 developers, not for 5-6. It took a minute, but the credo wins here: GPU workloads for the 10,001st developer are a niche thing.</p>\n\n<p>Another way to look at a startup is as a series of bets. We put a lot of chips down here. But the buy-in for this tournament gave us a lot of chips to play with. Never making a big bet of any sort isn’t a winning strategy. I’d rather we’d flopped the nut straight, but I think going in on this hand was the right call.</p>\n\n<p>A really important thing to keep in mind here, and something I think a lot of startup thinkers sleep on, is the extent to which this bet involved acquiring assets. Obviously, some of our <a href='https://fly.io/blog/the-exit-interview-jp/' title=''>costs here aren’t recoverable</a>. But the hardware parts that aren’t generating revenue will ultimately get liquidated; like with <a href='https://fly.io/blog/32-bit-real-estate/' title=''>our portfolio of IPv4 addresses</a>, I’m even more comfortable making bets backed by tradable assets with durable value.</p>\n\n<p>In the end, I don’t think GPU Fly Machines were going to be a hit for us no matter what we did. Because of that, one thing I’m very happy about is that we didn’t compromise the rest of the product for them. 
Security concerns slowed us down to where we probably learned what we needed to learn a couple months later than we could have otherwise, but we’re scaling back our GPU ambitions without having sacrificed <a href='https://fly.io/blog/sandboxing-and-workload-isolation/' title=''>any of our isolation story</a>, and, ironically, GPUs <em>other people run</em> are making that story a lot more important. The same thing goes for our Fly Machine developer experience.</p>\n\n<p>We started this company building a Javascript runtime for edge computing. We learned that our customers didn’t want a new Javascript runtime; they just wanted their native code to work. <a href='https://news.ycombinator.com/item?id=22616857' title=''>We shipped containers</a>, and no convincing was needed. We were wrong about Javascript edge functions, and I think we were wrong about GPUs. That’s usually how we figure out the right answers:  by being wrong about a lot of stuff.</p>",
      "image": {
        "url": "https://fly.io/blog/wrong-about-gpu/assets/choices-choices-cover.webp",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/the-exit-interview-jp/",
      "title": "The Exit Interview: JP Phillips",
      "description": null,
      "url": "https://fly.io/blog/the-exit-interview-jp/",
      "published": "2025-02-12T00:00:00.000Z",
      "updated": "2025-02-14T21:30:41.000Z",
      "content": "<div class=\"lead\"><p>JP Phillips is off to greener, or at least calmer, pastures. He joined us 4 years ago to build the next generation of our orchestration system, and has been one of the anchors of our engineering team. His last day is today. We wanted to know what he was thinking, and figured you might too.</p>\n</div>\n<p><em>Question 1: Why, JP? Just why?</em></p>\n\n<p>LOL. When I looked at what I wanted to see from here in the next 3-4 years, it didn’t really match up with where we’re currently heading. Specifically, with our new focus on MPG <em>[Managed Postgres]</em> and [llm] <em>[llm].</em></p>\n<div class=\"callout\"><p>Editorial comment: Even I don’t know what [llm] is.</p>\n</div>\n<p>The Fly Machines platform is more or less finished, in the sense of being capable of supporting the next iteration of our products. My original desire to join Fly.io was to make Machines a product that would <a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''>rid us of HashiCorp Nomad</a>, and I feel like that’s been accomplished.</p>\n\n<p><em>Where were you hoping to see us headed?</em></p>\n\n<p>More directly positioned as a cloud provider, rather than a platform-as-a-service; further along the customer journey from “developers” and “startups” to large established companies.</p>\n\n<p>And, it’s not that I disagree with PAAS work or MPG! Rather, it’s not something that excites me in a way that I’d feel challenged and could continue to grow technically.</p>\n\n<p><em>Follow up question: does your family know what you’re doing here? Doing to us? Are they OK with it?</em></p>\n\n<p>Yes, my family was very involved in the decision, before I even talked to other companies.</p>\n\n<p><em>What’s the thing you’re happiest about having built here? It cannot be “all of <code>flyd</code>”.</em></p>\n\n<p>We’ve enabled developers to run workloads from an OCI image and an API call all over the world. 
On any other cloud provider, the knowledge of how to pull that off comes with a professional certification.</p>\n\n<p><em>In what file in our <code>nomad-firecracker</code> repository would I find that code?</em></p>\n\n<p><a href='https://docs.machines.dev/#tag/machines/post/apps/{app_name}/machines' title=''>https://docs.machines.dev/#tag/machines/post/apps/{app_name}/machines</a></p>\n\n<p><img alt=\"A diagram that doesn't make any of this clearer\" src=\"/blog/the-exit-interview-jp/assets/flaps.png?1/2&center\" /></p>\n\n<p><em>So you mean, literally, the whole Fly Machines API, and <code>flaps</code>, the API gateway for Fly Machines?</em></p>\n\n<p>Yes, all of it. The <code>flaps</code> API server, the <code>flyd</code> RPCs it calls, the <code>flyd</code> finite state machine system, the interface to running VMs.</p>\n\n<p><em>Is there something you especially like about that design?</em></p>\n\n<p>I like that it for the most part doesn’t require any central coordination. And I like that the P90 for Fly Machine <code>create</code> calls is sub-5-seconds for pretty much every region except for Johannesburg and Hong Kong.</p>\n\n<p>I think the FSM design is something I’m proud of; if I could take any code with me, it’d be the <code>internal/fsm</code> in the <code>nomad-firecracker</code> repo.</p>\n<div class=\"callout\"><p>You can read more about <a href=\"https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/\" title=\"\">the <code>flyd</code> orchestrator JP led over here</a>.  But, a quick decoder ring: <code>flyd</code> runs independently without any central coordination on thousands of “worker” servers around the globe. It’s structured as an API server for a bunch of finite state machine invocations, where an FSM might be something like “start a Fly Machine” or “create a new Fly Machine” or “cordon off a Fly Machine so we can update it”. 
Each FSM invocation is comprised of a bunch of steps, each of those steps has callbacks into the <code>flyd</code> code, and each step is logged in <a href=\"https://github.com/boltdb/bolt\" title=\"\">a BoltDB database</a>.</p>\n</div>\n<p><em>Thinking back, there are like two archetypes of insanely talented developers I’ve worked with. One is the kind that belts out ridiculous amounts of relatively sophisticated code on a whim, at like 3AM. Jerome [who leads our fly-proxy team] is that type. The other comes to projects with what feels like fully-formed, coherent designs that are not super intuitive, and the whole project just falls together around that design. Did you know you were going to do the FSM log thing when you started <code>flyd</code>?</em></p>\n\n<p>I definitely didn’t have any specific design in mind when I started on <code>flyd</code>. I think the FSM stuff is a result of work I did at Compose.io / MongoHQ (where it was called “recipes”/“operations”) and the work I did at HashiCorp using Cadence.</p>\n\n<p>Once I understood what the product needed to do and look like, having a way to perform deterministic and durable execution felt like a good design.</p>\n\n<p><em>Cadence?</em></p>\n\n<p><a href='https://cadenceworkflow.io/' title=''>Cadence</a> is the child of AWS Step Functions and the predecessor to <a href='https://temporal.io/' title=''>Temporal</a> (the company).</p>\n\n<p>One of the biggest gains, with how it works in <code>flyd</code>, is knowing we would need to deploy <code>flyd</code> all day, every day. If <code>flyd</code> was in the middle of doing some work, it needed to pick back up right where it left off, post-deploy.</p>\n\n<p><em>OK, next question. What’s the most impressive thing you saw someone else build here? 
To make things simpler and take some pressure off the interview, we can exclude any of my works from consideration.</em></p>\n\n<p>Probably <a href='https://github.com/superfly/corrosion' title=''><code>corrosion2</code></a>.</p>\n<div class=\"callout\"><p>Sidebar: <code>corrosion2</code> is our state distribution system. While <code>flyd</code> runs individual Fly Machines for users, each instance is solely responsible for its own state; there’s no global scheduler. But we have platform components, most obviously <code>fly-proxy</code>, our Anycast router, that need to know what’s running where. <code>corrosion2</code> is a Rust service that does <a href=\"https://fly.io/blog/building-clusters-with-serf/\" title=\"\">SWIM gossip</a> to propagate information from each worker into a CRDT-structured SQLite database. <code>corrosion2</code> essentially means any component on our fleet can do SQLite queries to get near-real-time information about any Fly Machine around the world.</p>\n</div>\n<p>If for no other reason than that we deployed <code>corrosion</code>, learned from it, and were able to make significant and valuable improvements — and then migrate to the new system in a short period of time.</p>\n\n<p>Having a “just SQLite” interface, for async replicated changes around the world in seconds, it’s pretty powerful.</p>\n\n<p>If we invested in <a href='https://antithesis.com/' title=''>Antithesis</a> or TLA+ testing, I think there’s <a href='https://github.com/superfly/corrosion' title=''>potential for other companies</a> to get value out of <code>corrosion2</code>.</p>\n\n<p><em>Just as a general-purpose gossip-based SQLite CRDT gossip system?</em></p>\n\n<p>Yes.</p>\n\n<p><em>OK, you’re being too nice. What’s your least favorite thing about the platform?</em></p>\n\n<p>GraphQL. No, Elixir. 
It’s a tie between GraphQL and Elixir.</p>\n\n<p>But probably GraphQL, by a hair.</p>\n\n<p><em>That’s not the answer I expected.</em></p>\n\n<p>GraphQL slows everyone down, and everything. Elixir only slows me down.</p>\n\n<p><em>The rest of the platform, you’re fine with? No complaints?</em></p>\n\n<p>I’m happier now that we have <code>pilot</code>.</p>\n<div class=\"callout\"><p><code>pilot</code> is our new <code>init</code>. When we launch a Fly Machine, <code>init</code> is our foothold in the machine; this is unlike a normal OCI runtime, where “pid 1” is often the user’s entrypoint program. Our original <code>init</code> was so simple people dunked on it and said it might as well have been a bash script; over time, <code>init</code> has sprouted a bunch of new features. <code>pilot</code> consolidates those features, and, more importantly, is itself a complete OCI runtime; <code>pilot</code> can natively run containers inside of Fly Machines.</p>\n</div>\n<p>Before <code>pilot</code>, there really wasn’t any contract between <code>flyd</code> and <code>init</code>. And <code>init</code> was just “whatever we wanted <code>init</code> to be”. That limited its ability to serve us.</p>\n\n<p>Having <code>pilot</code> be an OCI-compliant runtime with an API for <code>flyd</code> to drive is a big win for the future of the Fly Machines API.</p>\n\n<p><em>Was I right that we should have used SQLite for <code>flyd</code>, or were you wrong to have used BoltDB?</em></p>\n\n<p>I still believe Bolt was the right choice. I’ve never lost a second of sleep worried that someone is about to run a SQL update statement on a host, or across the whole fleet, and mangle all our state data. 
And limiting the storage interface, by not using SQL, kept <code>flyd</code>’s scope manageable.</p>\n\n<p>On the engine side of the platform, which is what <code>flyd</code> is, I still believe SQL is too powerful for what <code>flyd</code> does.</p>\n\n<p><em>If you had this to do over again, would Bolt be precisely what you’d pick, or is there something else you’d want to try? Some cool-ass new KV store?</em></p>\n\n<p>Nah. But, I’d maybe consider a SQLite database per-Fly-Machine. Then the scope of danger is about as small as it could possibly be.</p>\n\n<p><em>Whoah, that’s an interesting thought. People sleep on the “keep a zillion little SQLites” design.</em></p>\n\n<p>Yeah, with per-Machine SQLite, once a Fly Machine is destroyed, we can just zip up the database and stash it in object storage. The biggest hold-up I have about it is how we’d manage the schemas.</p>\n\n<p><em>OpenTelemetry: were you right all along?</em></p>\n\n<p>One hundred percent.</p>\n\n<p><em>I basically attribute oTel at Fly.io to you.</em></p>\n\n<p>Without oTel, it’d be a disaster trying to troubleshoot the system. I’d have ragequit trying.</p>\n\n<p><em>I remember there being a cost issue, with how much Honeycomb was going to end up charging us to manage all the data. But that seems silly in retrospect.</em></p>\n\n<p>For sure. It was 100% part of the decision and the conversation. But: we didn’t have the best track record running a logs/metrics cluster at this fidelity. It was worth the money to pay someone else to manage tracing data.</p>\n\n<p><em>Strong agree. I think my only issue is just the extent to which it cruds up code. But I need to get over that.</em></p>\n\n<p>Yes, it’s very explicit. I think the next big part of oTel is going to be auto-instrumentation, for profiling.</p>\n\n<p><em>You’re a veteran Golang programmer. 
Say 3 nice things about Rust.</em></p>\n<div class=\"callout\"><p>Most of our backend is in Go, but <code>fly-proxy</code>, <code>corrosion2</code>, and <code>pilot</code> are in Rust.</p>\n</div>\n<ol>\n<li>Option. \n</li><li>Match.\n</li><li>Serde macros.\n</li></ol>\n\n<p><em>Even I can’t say shit about Option and match.</em></p>\n\n<p>Match is so much better than anything in Go.</p>\n\n<p><em>Elixir, Go, and Rust. An honest take on that programming cocktail.</em></p>\n\n<p>Three’s a crowd, Elixir can stay home.</p>\n\n<p><em>If you could only lose one, you’d keep Rust.</em></p>\n\n<p>I’ve learned its shortcomings and the productivity far outweighs having to deal with the Rust compiler.</p>\n\n<p><em>You’d be unhappy if we moved the <code>flaps</code> API code from Go to Elixir.</em></p>\n\n<p>Correct.</p>\n\n<p><em>I kind of buy the idea of doing orchestration and scheduling code, which is policy-intensive, in a higher-level language.</em></p>\n\n<p>Maybe. If Ruby had a better concurrency story, I don’t think Elixir would have a place for us.</p>\n<div class=\"callout\"><p>Here I need to note that Ruby is functionally dead here, and Elixir is ascendant.</p>\n</div>\n<p><em>We have an idiosyncratic management structure. We’re bottom-up, but ambiguously so. We don’t have roadmaps, except when we do. We have minimal top-down technical direction. Critique.</em></p>\n\n<p>It’s too easy to lose sight of whether your current focus [in what you’re building] is valuable to the company.</p>\n\n<p><em>The first thing I warn every candidate about on our “do-not-work-here” calls.</em></p>\n\n<p>I think it comes down to execution, and accountability to actually finish projects. I spun a lot trying to figure out what would be the most valuable work for Fly Machines.</p>\n\n<p><em>You don’t have to be so nice about things.</em></p>\n\n<p>We struggle a lot with consistent communication. We change direction a little too often. 
It got to a point where I didn’t see a point in devoting time and effort to projects, because I’d not be able to show enough value quickly enough.</p>\n\n<p><em>I see things paying off later than we’d hoped or expected they would. Our secret storage system, Pet Semetary, is a good example of this. Our K8s service, FKS, is another obvious one, since we’re shipping MPG on it.</em></p>\n\n<p><em>This is your second time working for Kurt, at a company where he’s the CEO. Give him a 1-4 star rating. He can take it! At least, I think he can take it.</em></p>\n\n<p>2022: ★★★★</p>\n\n<p>2023: ★★</p>\n\n<p>2024: ★★✩</p>\n\n<p>2025: ★★★✩</p>\n\n<p>On a four-star scale.</p>\n\n<p><em>Whoah. I did not expect a histogram. Say more about 2023!</em></p>\n\n<p>We hired too many people, too quickly, and didn’t have the guardrails and structure in place for everybody to be successful.</p>\n\n<p><em>Also: GPUs!</em></p>\n\n<p>Yes. That was my next comment.</p>\n\n<p><em>Do we secretly agree about GPUs?</em></p>\n\n<p>I think so.</p>\n\n<p><em>Our side won the argument in the end! But at what cost?</em></p>\n\n<p>They were a killer distraction.</p>\n\n<p><em>Final question: how long will you remain in the first-responder on-call rotation after you leave? I assume at least until August. I have a shift this weekend; can you swap with me? I keep getting weekends.</em></p>\n\n<p>I am going to be asleep all weekend if any of my previous job changes are indicative.</p>\n\n<p><em>I sleep through on-call too! But nobody can yell at you for it now. I think you have the comparative advantage over me in on-calling.</em></p>\n\n<p>Yes, I will absolutely take all your future on-call shifts, you have convinced me.</p>\n\n<p><em>All this aside: it has been a privilege watching you work. I hope your next gig is 100x more relaxing than this was. Or maybe I just hope that for myself. Except: I’ll never escape this place. Thank you so much for doing this.</em></p>\n\n<p>Thank you! 
I’m forever grateful for having the opportunity to be a part of Fly.io.</p>",
      "image": {
        "url": "https://fly.io/blog/the-exit-interview-jp/assets/bye-bye-little-sebastian-cover.webp",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/semgrep-but-for-real-now/",
      "title": "Did Semgrep Just Get A Lot More Interesting?",
      "description": null,
      "url": "https://fly.io/blog/semgrep-but-for-real-now/",
      "published": "2025-02-10T00:00:00.000Z",
      "updated": "2025-02-14T21:30:41.000Z",
      "content": "<div class=\"right-sidenote\"><p>This whole paragraph is just one long sentence. God I love <a href=\"https://fly.io/blog/a-blog-if-kept/\" title=\"\">just random-ass blogging</a> again.</p>\n</div>\n<p><a href='https://ghuntley.com/stdlib/' title=''>This bit by Geoffrey Huntley</a> is super interesting to me and, despite calling out that LLM-driven development agents like Cursor have something like a 40% success rate at actually building anything that passes acceptance criteria, makes me think that more of the future of our field belongs to people who figure out how to use this weird bags of model weights than any of us are comfortable with. </p>\n\n<p>I’ve been dinking around with Cursor for a week now (if you haven’t, I think it’s something close to malpractice not to at least take it — or something like it — for a spin) and am just now from this post learning that Cursor has this <a href='https://docs.cursor.com/context/rules-for-ai' title=''>rules feature</a>. </p>\n\n<p>The important thing for me is not how Cursor rules work, but rather how Huntley uses them. He turns them back on themselves, writing rules to tell Cursor how to organize the rules, and then teach Cursor how to write (under human supervision) its own rules.</p>\n\n<p>Cursor kept trying to get Huntley to use Bazel as a build system. So he had cursor write a rule for itself: “no bazel”. And there was no more Bazel. If I’d known I could do this, I probably wouldn’t have bounced from the Elixir project I had Cursor doing, where trying to get it to write simple unit tests got it all tangled up trying to make <a href='https://hexdocs.pm/mox/Mox.html' title=''>Mox</a> work. </p>\n\n<p>But I’m burying the lead. </p>\n\n<p>Security people have been for several years now somewhat in love with a tool called <a href='https://github.com/semgrep/semgrep' title=''>Semgrep</a>. 
Semgrep is a semantics-aware code search tool; using symbolic variable placeholders and otherwise ordinary code, you can write rules to match pretty much arbitrary expressions and control flow. </p>\n\n<p>If you’re an appsec person, where you obviously go with this is: you build a library of Semgrep searches for well-known vulnerability patterns (or, if you’re like us at Fly.io, you work out how to get Semgrep to catch the Rust concurrency footgun of RWLocks inside if-lets).</p>\n\n<p>The reality for most teams though is “ain’t nobody got time for that”. </p>\n\n<p>But I just checked and, unsurprisingly, 4o <a href='https://chatgpt.com/share/67aa94a7-ea3c-8012-845c-6c9491b33fe4' title=''>seems to do reasonably well</a> at generating Semgrep rules? Like: I have no idea if this rule is actually any good. But it looks like a Semgrep rule?</p>\n\n<p>What interests me is this: it seems obvious that we’re going to do more and more “closed-loop” LLM agent code generation stuff. By “closed loop”, I mean that the thingy that generates code is going to get to run the code and watch what happens when it’s interacted with. You’re just a small bit of glue code and a lot of system prompting away from building something like that right now: <a href='https://x.com/chris_mccord/status/1882839014845374683' title=''>Chris McCord is building</a> a thingy that generates whole Elixir/Phoenix apps and runs them as Fly Machines. When you deploy these kinds of things, the LLM gets to see the errors when the code is run, and it can just go fix them. It also gets to see errors and exceptions in the logs when you hit a page on the app, and it can just go fix them.</p>\n\n<p>With a bit more system prompting, you can get an LLM to try to generalize out from exceptions it fixes and generate unit test coverage for them. 
</p>\n\n<p>With a little bit more system prompting, you can probably get an LLM to (1) generate a Semgrep rule for the generalized bug it caught, (2) test the Semgrep rule with a positive/negative control, (3) save the rule, (4) test the whole codebase with Semgrep for that rule, and (5) fix anything it finds that way. </p>\n\n<p>That is a lot more interesting to me than tediously (and probably badly) trying to predict everything that will go wrong in my codebase a priori and Semgrepping for them. Which is to say: Semgrep — which I have always liked — is maybe a lot more interesting now? And tools like it?</p>",
      "image": {
        "url": "https://fly.io/blog/semgrep-but-for-real-now/assets/ci-cd-cover.webp",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/a-blog-if-kept/",
      "title": "A Blog, If You Can Keep It",
      "description": null,
      "url": "https://fly.io/blog/a-blog-if-kept/",
      "published": "2025-02-10T00:00:00.000Z",
      "updated": "2025-02-19T19:05:52.000Z",
      "content": "<div class=\"lead\"><p>A boldfaced lede like this was a sure sign you were reading a carefully choreographed EffortPost from our team at Fly.io. We’re going to do less of those. Or the same amount but more of a different kind of post. Either way: launch an app on Fly.io today!</p>\n</div>\n<p>Over the last 5 years, we’ve done pretty well for ourselves writing content for Hacker News. And that’s <a href='https://news.ycombinator.com/item?id=39373476' title=''>mostly</a> been good for us. We don’t do conventional marketing, we don’t have a sales team, the rest of social media is atomized over 5 different sites. Writing pieces that HN takes seriously has been our primary outreach tool.</p>\n\n<p>There’s a recipe (probably several, but I know this one works) for charting a post on HN:</p>\n\n<ol>\n<li>Write an EffortPost, which is to say a dense technical piece over 2000 words long; within that rubric there’s a bunch of things that are catnip to HN, including runnable code, research surveys, and explainers. (There are also cat-repellants you learn to steer clear of.)\n</li><li>Assiduously avoid promotion. You have to write for the audience. We get away with sporadically including a call-to-action block in our posts, but otherwise: the post should make sense even if an unrelated company posted it after you went out of business.\n</li><li>Pick topics HN is interested in (it helps if all your topics are au courant for HN, and we’ve been <a href='https://news.ycombinator.com/item?id=32250426' title=''>very</a> <a href='https://news.ycombinator.com/item?id=32018066' title=''>lucky</a> in that regard).\n</li><li>Like 5-6 more style-guide things that help incrementally. Probably 3 different teams writing for HN will have 3 different style guides with only like ½ overlap. Ours, for instances, instructs writers to swear.\n</li></ol>\n\n<p>I like this kind of writing. It’s not even a chore. 
But it’s become an impediment for us, for a couple reasons: the team serializes behind an “editorial” function here, which keeps us from publishing everything we want; worse, caring so much about our track record leaves us noodling on posts interminably (the poor <a href='https://www.tigrisdata.com/' title=''>Tigrises</a> have been waiting for months for me to publish the piece I wrote about them and FoundationDB; take heart, this post today means that one is coming soon).</p>\n\n<p>But worst of all, I worried incessantly about us <a href='https://gist.github.com/tqbf/e853764b562a2d72a91a6986ca3b77c0' title=''>wearing out our welcome</a>. To my mind, we’d have 1, maybe 2 bites at the HN apple in a given month, and we needed to make them count.</p>\n\n<p>That was dumb. I am dumb about a lot of things! I came around to understanding this after Kurt demanded I publish my blog post about BFAAS (Bash Functions As A Service), 500 lines of Go code that had generated 4500 words in my draft. It was only after I made the decision to stop gatekeeping this blog that I realized <a href='https://simonwillison.net/' title=''>Simon Willison</a> has been disproving my “wearing out the welcome” theory, day in and day out, for years. He just writes stuff about LLMs when it interests him. I mean, it helps that he’s a better writer than we are. But he’s not wasting time choreographing things.</p>\n\n<p>Back in like 2009, <a href='https://web.archive.org/web/20110806040300/http://chargen.matasano.com/chargen/2009/7/22/if-youre-typing-the-letters-a-e-s-into-your-code-youre-doing.html' title=''>we had a blog</a> at another company I was at. That blog drove a lot of business for us (and, on three occasions, almost killed me). It was not in the least bit optimized for HN. 
I like pretending to be a magazine feature writer, but I miss writing dashed-off pieces every day and clearing space for other people on the team to write as well.</p>\n\n<p>So this is all just a heads up: we’re trying something new. This is a very long and self-indulgent way to say “we’re going to write a normal blog like it’s 2008”, but that’s how broken my brain is after years of having my primary dopaminergic rewards come from how long Fly.io blog posts stay on the front page: I have to disclaim blogging before we start doing it, lest I fail to meet expectations.</p>\n\n<p>Like I said. I’m real dumb. But: looking forward to getting a lot more stuff out on the web for people to read this year!</p>",
      "image": {
        "url": "https://fly.io/blog/a-blog-if-kept/assets/keep-blog.webp",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/vscode-ssh-wtf/",
      "title": "VSCode’s SSH Agent Is Bananas",
      "description": null,
      "url": "https://fly.io/blog/vscode-ssh-wtf/",
      "published": "2025-02-07T00:00:00.000Z",
      "updated": "2025-02-14T21:30:41.000Z",
      "content": "<p>We’re interested in getting integrated into the flow VSCode uses to do remote editing over SSH, because everybody is using VSCode now, and, in particular, they’re using forks of VSCode that generate code with LLMs. </p>\n<div class=\"right-sidenote\"><p>”hallucination” is what we call it when LLMs get code wrong; “engineering” is what we call it when people do.</p>\n</div>\n<p>LLM-generated code is <a href='https://nicholas.carlini.com/writing/2024/how-i-use-ai.html' title=''>useful in the general case</a> if you know what you’re doing. But it’s ultra-useful if you can close the loop between the LLM and the execution environment (with an “Agent” setup). There’s lots to say about this, but for the moment: it’s a semi-effective antidote to hallucination: the LLM generates the code, the agent scaffolding runs the code, the code generates errors, the agent feeds it back to the LLM, the process iterates. </p>\n\n<p>So, obviously, the issue here is you don’t want this iterative development process happening on your development laptop, because LLMs have boundary issues, and they’ll iterate on your system configuration just as happily on the Git project you happen to be working in. A thing you’d really like to be able to do: run a closed-loop agent-y (“agentic”? is that what we say now) configuration for an LLM, on a clean-slate Linux instance that spins up instantly and that can’t screw you over in any way. You get where we’re going with this.</p>\n\n<p>Anyways! I would like to register a concern.</p>\n\n<p>Emacs hosts the spiritual forebearer of remote editing systems, a blob of hyper-useful Elisp called <a href='https://www.gnu.org/software/tramp/' title=''>“Tramp”</a>. If you can hook Tramp up to any kind of interactive environment — usually, an SSH session — where it can run Bourne shell commands, it can extend Emacs to that environment.</p>\n\n<p>So, VSCode has a feature like Tramp. Which, neat, right? 
You’d think, take Tramp, maybe simplify it a bit, switch out Elisp for Typescript.</p>\n\n<p>You’d think wrong!</p>\n\n<p>Unlike Tramp, which lives off the land on the remote connection, VSCode mounts a full-scale invasion: it runs a Bash snippet stager that downloads an agent, including a binary installation of Node. </p>\n\n<p>I <em>think</em> this is <a href='https://github.com/microsoft/vscode/tree/c9e7e1b72f80b12ffc00e06153afcfedba9ec31f/src/vs/server/node' title=''>the source code</a>?</p>\n\n<p>The agent runs over port-forwarded SSH. It establishes a WebSockets connection back to your running VSCode front-end. The underlying protocol on that connection can:</p>\n\n<ul>\n<li>Wander around the filesystem\n</li><li>Edit arbitrary files\n</li><li>Launch its own shell PTY processes\n</li><li>Persist itself\n</li></ul>\n\n<p>In security-world, there’s a name for tools that work this way. I won’t say it out loud, because that’s not fair to VSCode, but let’s just say the name is murid in nature.</p>\n\n<p>I would be a little nervous about letting people VSCode-remote-edit stuff on dev servers, and apoplectic if that happened during an incident on something in production. </p>\n\n<p>It turns out we don’t have to care about any of this to get a custom connection to a Fly Machine working in VSCode, so none of this matters in any kind of deep way, but: we’ve decided to just be a blog again, so: we had to learn this, and now you do too.</p>",
      "image": {
        "url": "https://fly.io/static/images/default-post-thumbnail.webp",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/ai-gpu-clusters-from-your-laptop-livebook/",
      "title": "AI GPU Clusters, From Your Laptop, With Livebook",
      "description": null,
      "url": "https://fly.io/blog/ai-gpu-clusters-from-your-laptop-livebook/",
      "published": "2024-09-24T00:00:00.000Z",
      "updated": "2024-09-24T17:19:49.000Z",
      "content": "<div class=\"lead\"><p>Livebook, FLAME, and the Nx stack: three Elixir components that are easy to describe, more powerful than they look, and intricately threaded into the Elixir ecosystem. A few weeks ago, Chris McCord (👋) and Chris Grainger showed them off at ElixirConf 2024. We thought the talk was worth a recap.</p>\n</div>\n<p>Let’s begin by introducing our cast of characters.</p>\n\n<p><a href='https://livebook.dev/' title=''>Livebook</a> is usually described as Elixir’s answer to <a href='https://jupyter.org/' title=''>Jupyter Notebooks</a>. And that’s a good way to think about it. But Livebook takes full advantage of the Elixir platform, which makes it sneakily powerful. By linking up directly with Elixir app clusters, Livebook can switch easily between driving compute locally or on remote servers, and makes it easy to bring in any kind of data into reproducible workflows.</p>\n\n<p><a href='https://fly.io/blog/rethinking-serverless-with-flame/' title=''>FLAME</a> is the Elixir’s answer to serverless computing. By having the library manage a pool of executors for you, FLAME lets you treat your entire application as if it was elastic and scale-to-zero. You configure FLAME with some basic information about where to run code and how many instances it’s allowed to run with, and then mark off any arbitrary section of code with <code>Flame.call</code>. The framework takes care of the rest. It’s the upside of serverless without committing yourself to blowing your app apart into tiny, intricately connected pieces.</p>\n\n<p>The <a href='https://github.com/elixir-nx' title=''>Nx stack</a> is how you do Elixir-native AI and ML. Nx gives you an Elixir-native notion of tensor computations with GPU backends. <a href='https://github.com/elixir-nx/axon' title=''>Axon</a> builds a common interface for ML models on top of it. 
<a href='https://github.com/elixir-nx/bumblebee' title=''>Bumblebee</a> makes those models available to any Elixir app that wants to download them, from just a couple lines of code.</p>\n\n<p>Here is a quick video showing how to transfer a local tensor to a remote GPU, using Livebook, FLAME, and Nx:</p>\n<div class=\"youtube-container\" data-exclude-render>\n  <div class=\"youtube-video\">\n    <iframe\n      width=\"100%\"\n      height=\"100%\"\n      src=\"https://www.youtube.com/embed/5ImP3gpUSkQ\"\n      frameborder=\"0\"\n      allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture\"\n      allowfullscreen>\n    </iframe>\n  </div>\n</div>\n\n\n<p>Let’s dive into the <a href='https://www.youtube.com/watch?v=4qoHPh0obv0' title=''>keynote</a>.</p>\n<h2 id='poking-a-hole-in-your-infrastructure' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#poking-a-hole-in-your-infrastructure' aria-label='Anchor'></a><span class='plain-code'>Poking a hole in your infrastructure</span></h2>\n<p>Any Livebook, including the one running on your laptop, can start a runtime running on a Fly Machine, in Fly.io’s public cloud. That Elixir machine will (by default) live in your default Fly.io organization, giving it networked access to all the other apps that might live there.</p>\n\n<p>This is an access control situation that mostly just does what you want it to do without asking. Unless you ask it to, Fly.io isn’t exposing anything to the Internet, or to other users of Fly.io. For instance: say we have a database we’re going to use to generate reports. It can hang out on our Fly organization, inside of a private network with no connectivity to the world. 
We can spin up a Livebook instance that can talk to it, without doing any network or infrastructure engineering to make that happen.</p>\n\n<p>But wait, there’s more. Because this is all Elixir, Livebook also allows you to connect to any running Erlang/Elixir application in your infrastructure to debug, introspect, and monitor it.</p>\n\n<p>Check out this clip of Chris McCord connecting <a href='https://rtt.fly.dev/' title=''>to an existing application</a> during the keynote:</p>\n<div class=\"youtube-container\" data-exclude-render>\n  <div class=\"youtube-video\">\n    <iframe\n      width=\"100%\"\n      height=\"100%\"\n      src=\"https://www.youtube.com/embed/4qoHPh0obv0?start=1106\"\n      frameborder=\"0\"\n      allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture\"\n      allowfullscreen>\n    </iframe>\n  </div>\n</div>\n\n\n<p>Running a snippet of code from a laptop on a remote server is a neat trick, but Livebook is doing something deeper than that. It’s taking advantage of Erlang/Elixir’s native facility with cluster computation and making it available to the notebook. As a result, when we do things like auto-completing, Livebook delivers results from modules defined on the remote node itself. 🤯</p>\n<h2 id='elastic-scale-with-flame' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#elastic-scale-with-flame' aria-label='Anchor'></a><span class='plain-code'>Elastic scale with FLAME</span></h2>\n<p>When we first introduced FLAME, the example we used was video encoding.</p>\n\n<p>Video encoding is complicated and slow enough that you’d normally make arrangements to run it remotely or in a background job queue, or as a triggerable Lambda function. 
The point of FLAME is to get rid of all those steps, and give them over to the framework instead. So: we wrote our <code>ffmpeg</code> calls inline like normal code, as if they were going to complete in microseconds, and wrapped them in <code>Flame.call</code> blocks. That was it, that was the demo.</p>\n\n<p>Here, we’re going to put a little AI spin on it.</p>\n\n<p>The first thing we’re doing here is driving FLAME pools from Livebook. Livebook will automatically synchronize your notebook dependencies as well as any module or code defined in your notebook across nodes. That means any code we write in our notebook can be dispatched transparently out to arbitrarily many compute nodes, without ceremony.</p>\n\n<p>Now let’s add some AI flair. We take an object store bucket full of video files. We use <code>ffmpeg</code> to extract stills from the video at different moments. Then: we send them to <a href='https://www.llama.com/' title=''>Llama</a>, running on <a href='https://fly.io/gpu' title=''>GPU Fly Machines</a> (still locked to our organization), to get descriptions of the stills.</p>\n\n<p>All those stills and descriptions get streamed back to our notebook, in real time:</p>\n<div class=\"youtube-container\" data-exclude-render>\n  <div class=\"youtube-video\">\n    <iframe\n      width=\"100%\"\n      height=\"100%\"\n      src=\"https://www.youtube.com/embed/4qoHPh0obv0?start=1692\"\n      frameborder=\"0\"\n      allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture\"\n      allowfullscreen>\n    </iframe>\n  </div>\n</div>\n\n\n<p>At the end, the descriptions are sent to <a href='https://mistral.ai/' title=''>Mistral</a>, which builds a summary.</p>\n\n<p>Thanks to FLAME, we get explicit control over the minimum and the maximum number of nodes we want running at once, as well as their concurrency settings. 
As nodes finish processing each video, new ones are automatically sent to them, until the whole bucket has been traversed. Each node will automatically shut down after an idle timeout and the whole cluster terminates if you disconnect the Livebook runtime.</p>\n\n<p>Just like your app code, FLAME lets you take your notebook code designed to run locally, change almost nothing, and elastically execute it across ephemeral infrastructure.</p>\n<h2 id='64-gpus-hyperparameter-tuning-on-a-laptop' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#64-gpus-hyperparameter-tuning-on-a-laptop' aria-label='Anchor'></a><span class='plain-code'>64-GPU hyperparameter tuning on a laptop</span></h2>\n<p>Next, Chris Grainger, CTO of <a href='https://amplified.ai/' title=''>Amplified</a>, takes the stage.</p>\n\n<p>For work at Amplified, Chris wants to analyze a gigantic archive of patents, on behalf of a client doing edible cannabinoid work. To do that, he uses a BERT model (BERT, from Google, is one of the OG “transformer” models, optimized for text comprehension).</p>\n\n<p>To make the BERT model effective for this task, he’s going to do a hyperparameter tuning run.</p>\n\n<p>This is a much more complicated AI task than the Llama work we just showed you. Chris is going to generate a cluster of 64 GPU Fly Machines, each running an <a href='https://fly.io/blog/cutting-prices-for-l40s-gpus-in-half/' title=''>L40s GPU</a>. 
On each of these nodes, he needs to:</p>\n\n<ul>\n<li>set up its environment (including native dependencies and GPU bindings)\n</li><li>load the training data\n</li><li>compile a different version of BERT with different parameters, optimizers, etc.\n</li><li>start the fine-tuning\n</li><li>stream its results in real-time to each assigned chart\n</li></ul>\n\n<p>Here’s the clip. You’ll see the results stream in, in real time, directly back to his Livebook. We’ll wait, because it won’t take long to watch:</p>\n<div class=\"youtube-container\" data-exclude-render>\n  <div class=\"youtube-video\">\n    <iframe\n      width=\"100%\"\n      height=\"100%\"\n      src=\"https://www.youtube.com/embed/4qoHPh0obv0?start=3344\"\n      frameborder=\"0\"\n      allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture\"\n      allowfullscreen>\n    </iframe>\n  </div>\n</div>\n\n<h2 id='this-is-just-the-beginning' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#this-is-just-the-beginning' aria-label='Anchor'></a><span class='plain-code'>This is just the beginning</span></h2>\n<p>The suggestion of mixing Livebook and FLAME to elastically scale notebook execution was originally proposed by Chris Grainger during ElixirConf EU. During the next four months, Jonatan Kłosko, Chris McCord, and José Valim worked part-time on making it a reality in time for ElixirConf US. Our ability to deliver such a rich combination of features in such a short period of time is a testament to the capabilities of the Erlang Virtual Machine, which Elixir and Livebook run on. Other features, such as <a href='https://github.com/elixir-explorer/explorer/issues/932' title=''>remote dataframes and distributed GC</a>, were implemented in a weekend.  
Bringing the same functionality to other ecosystems would take several additional months, sometimes accompanied by millions in funding, and oftentimes as part of a closed-source product.</p>\n\n<p>Furthermore, since we announced this feature, <a href='https://github.com/mruoss' title=''>Michael Ruoss</a> stepped in and brought the same functionality to Kubernetes. From Livebook v0.14.1, you can start Livebook runtimes inside a Kubernetes cluster and also use FLAME to elastically scale them. Expect more features and news in this space!</p>\n\n<p>Finally, Fly’s infrastructure played a key role in making it possible to start a cluster of GPUs in seconds rather than minutes, and all it requires is a Docker image. We’re looking forward to seeing how other technologies and notebook platforms can leverage Fly to also elevate their developer experiences.</p>\n<figure class=\"post-cta\">\n  <figcaption>\n    <h1>Launch a GPU app in seconds</h1>\n    <p>Run your own LLMs or use Livebook for elastic GPU workflows ✨</p>\n      <a class=\"btn btn-lg\" href=\"/gpu\">\n        Go!  <span class='opacity-50'>→</span>\n      </a>\n  </figcaption>\n  <div class=\"image-container\">\n    <img src=\"/static/images/cta-turtle.webp\" srcset=\"/static/images/cta-turtle@2x.webp 2x\" alt=\"\">\n  </div>\n</figure>",
      "image": {
        "url": "https://fly.io/blog/ai-gpu-clusters-from-your-laptop-livebook/assets/ai-gpu-livebook-thumb.webp",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/accident-forgiveness/",
      "title": "Accident Forgiveness",
      "description": null,
      "url": "https://fly.io/blog/accident-forgiveness/",
      "published": "2024-08-21T00:00:00.000Z",
      "updated": "2024-09-11T00:04:08.000Z",
      "content": "<div class=\"lead\"><p>We’re Fly.io, a new public cloud with simple, developer-friendly ergonomics. <a href=\"https://fly.io/speedrun\" title=\"\">Try it out; you’ll be deployed in just minutes</a>, and, as you’re about to read, with less financial risk.</p>\n</div>\n<p>Public cloud billing is terrifying.</p>\n\n<p>The premise of a public cloud — what sets it apart from a hosting provider — is 8,760 hours/year of on-tap deployable compute, storage, and networking. Cloud resources are “elastic”: they’re acquired and released as needed; in the “cloud-iest” apps, without human intervention. Public cloud resources behave like utilities, and that’s how they’re priced.</p>\n\n<p>You probably can’t tell me how much electricity your home is using right now, and  may only come within tens of dollars of accurately predicting your water bill. But neither of those bills are all that scary, because you assume there’s a limit to how much you could run them up in a single billing interval.</p>\n\n<p>That’s not true of public clouds. There are only so many ways to “spend” water at your home, but there are indeterminably many ways to land on a code path that grabs another VM, or to miskey a configuration, or to leak long-running CI/CD environments every time a PR gets merged. 
Pick a practitioner at random, and I bet they’ve read a story within the last couple months about someone running up a galactic-scale bill at some provider or other.</p>\n<h2 id='implied-accident-forgiveness' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#implied-accident-forgiveness' aria-label='Anchor'></a><span class='plain-code'>Implied Accident Forgiveness</span></h2>\n<p>For people who don’t do a lot of cloud work, what all this means is that every PR push sets off a little alarm in the back of their heads: “you may have just incurred $200,000 of costs!”. The alarm is quickly silenced,  though it’s still subtly extracting a cortisol penalty. But by deadening the nerves that sense the danger of unexpected charges, those people are nudged closer to themselves being the next story on Twitter about an accidental $200,000 bill.</p>\n\n<p>The saving grace here, which you’ll learn if you ever become that $200,000 story, is that nobody pays those bills.</p>\n\n<p>See, what cloud-savvy people know already is that providers have billing support teams, which spend a big chunk of their time conceding disputed bills. If you do something luridly stupid and rack up costs, AWS and GCP will probably cut you a break. We will too. Everyone does.</p>\n\n<p>If you didn’t already know this, you’re welcome; I’ve made your life a little better, even if you don’t run things on Fly.io.</p>\n\n<p>But as soothing as it is to know you can get a break from cloud providers, the billing situation here is still a long ways away from “good”. If you accidentally add a zero to a scale count and don’t notice for several weeks, AWS or GCP will probably cut you a break. 
But they won’t <em>definitely</em> do it, and even though your odds are good, you’re still finding out at email- and phone-tag scale speeds. That’s not fun!</p>\n<h2 id='explicit-accident-forgiveness' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#explicit-accident-forgiveness' aria-label='Anchor'></a><span class='plain-code'>Explicit Accident Forgiveness</span></h2>\n<p>Charging you for stuff you didn’t want is bad business.</p>\n\n<p>Good business, we think, means making you so comfortable with your cloud you try new stuff. You, and everyone else on your team. Without a chaperone from the finance department.</p>\n\n<p>So we’re going to do the work to make this official. If you’re a customer of ours, we’re going to charge you in exacting detail for every resource you intentionally use of ours, but if something blows up and you get an unexpected bill, we’re going to let you off the hook.</p>\n<h2 id='not-so-fast' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#not-so-fast' aria-label='Anchor'></a><span class='plain-code'>Not So Fast</span></h2>\n<p>This is a Project, with a capital P. While we’re kind of kicking ourselves for not starting it earlier, there are reasons we couldn’t do it back in 2020.</p>\n\n<p>The Fully Automated Accident-Forgiving Billing System of the Future (which we are in fact building and may even one day ship) will give you a line-item veto on your invoice. We are a long ways away. The biggest reason is fraud.</p>\n\n<p>Sit back, close your eyes, and try to think about everything public clouds do to make your life harder. 
Chances are, most of those things are responses to fraud. Cloud platforms attract fraudsters like ants to an upturned ice cream cone. Thanks to the modern science of cryptography, fraudsters have had a 15 year head start on turning metered compute into picodollar-granular near-money assets.</p>\n\n<p>Since there’s no bouncer at the door checking IDs here, an open-ended and automated commitment to accident forgiveness is, with iron certainty, going to be used overwhelmingly in order to trick us into “forgiving” cryptocurrency miners. We’re cloud platform engineers. They’re our primary pathogen.</p>\n\n<p>So, we’re going to roll this out incrementally.</p>\n<div class=\"callout\"><p><strong class=\"font-semibold text-navy-950\">Why not billing alerts?</strong> We’ll get there, but here too there are reasons we haven’t yet: (1) meaningful billing alerts were incredibly difficult to do with our previous billing system, and building the new system and migrating our customers to it has been a huge lift, a nightmare from which we are only now waking (the billing system’s official name); and (2) we’re wary about alerts being a product design cop-out; if we can alert on something, why aren’t we fixing it?</p>\n</div><h2 id='accident-forgiveness-v0-84beta' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#accident-forgiveness-v0-84beta' aria-label='Anchor'></a><span class='plain-code'>Accident Forgiveness v0.84beta</span></h2>\n<p>All the same subtextual, implied reassurances that every cloud provider offers remain in place at Fly.io. 
You are strictly better off after this announcement, we promise.</p>\n<div class=\"right-sidenote\"><p>I added the “almost” right before publishing, because I’m chicken.</p>\n</div>\n<p>Now: for customers that have a support contract with us, at any level, there’s something new: I’m saying the quiet part loud. The next time you see a bill with an unexpected charge on it, we’ll refund that charge, (almost) no questions asked.</p>\n\n<p>That policy is so simple it feels anticlimactic to write. So, some additional color commentary:</p>\n\n<p>We’re not advertising a limit to the number of times you can do this. If you’re a serious customer of ours, I promise that you cannot remotely fathom the fullness of our fellow-feeling. You’re not annoying us by getting us to refund unexpected charges. If you are growing a project on Fly.io, we will bend over backwards to keep you growing.</p>\n\n<p>How far can we take this? How simple can we keep this policy? We’re going to find out together.</p>\n\n<p>To begin with, and in the spirit of “doing things that won’t scale”, when we forgive a bill, what’s going to happen next is this: I’m going to set an irritating personal reminder for Kurt to look into what happened, now and then the day before your next bill, so we can see what’s going wrong. He’s going to hate that, which is the point: our best feature work is driven by Kurt-hate.</p>\n\n<p>Obviously, if you’re rubbing your hands together excitedly over the opportunity this policy presents, then, well, not so much with the fellow-feeling. We reserve the right to cut you off.</p>\n<figure class=\"post-cta\">\n  <figcaption>\n    <h1>Support For Developers, By Developers</h1>\n    <p>Explicit Accident Forgiveness is just one thing we like about Support at Fly.io.</p>\n      <a class=\"btn btn-lg\" href=\"https://fly.io/accident-forgiveness\">\n        Go find out!  
<span class='opacity-50'>→</span>\n      </a>\n  </figcaption>\n  <div class=\"image-container\">\n    <img src=\"/static/images/cta-cat.webp\" srcset=\"/static/images/cta-cat@2x.webp 2x\" alt=\"\">\n  </div>\n</figure>\n\n<h2 id='whats-next-accident-protection' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#whats-next-accident-protection' aria-label='Anchor'></a><span class='plain-code'>What’s Next: Accident Protection</span></h2>\n<p>We think this is a pretty good first step. But that’s all it is.</p>\n\n<p>We can do better than offering you easy refunds for mistaken deployments and botched CI/CD jobs. What’s better than getting a refund is never incurring the charge to begin with, and that’s the next step we’re working on.</p>\n<div class=\"right-sidenote\"><p>More to come on that billing system.</p>\n</div>\n<p>We built a new billing system so that we can do things like that. For instance: we’re in a position to catch sudden spikes in your month-over-month bills, flag them, and catch weird-looking deployments before we bill for them.</p>\n\n<p>Another thing we rebuilt billing for is <a href='https://community.fly.io/t/reservation-blocks-40-discount-on-machines-when-youre-ready-to-commit/20858' title=''>reserved pricing</a>. Already today you can get a steep discount from us by reserving blocks of compute in advance. The trick to taking advantage of reserved pricing is confidently predicting a floor to your usage. For a lot of people, that means fighting feelings of loss aversion (nobody wants to get gym priced!). So another thing we can do in this same vein: catch opportunities to move customers to reserved blocks, and offer backdated reservations. 
We’ll figure this out too.</p>\n\n<p>Someday, when we’re in a monopoly position, our founders have all been replaced by ruthless MBAs, and Kurt has retired to farm coffee beans in lower Montana, we may stop doing this stuff. But until that day this is the right choice for our business.</p>\n\n<p>Meanwhile: like every public cloud, we provision our own hardware, and we have excess capacity. Your messed-up CI/CD jobs didn’t really cost us anything, so if you didn’t really want them, they shouldn’t cost you anything either. Take us up on this! We love talking to you.</p>",
      "image": {
        "url": "https://fly.io/blog/accident-forgiveness/assets/money-for-mistakes-blog-thumb.webp",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/cutting-prices-for-l40s-gpus-in-half/",
      "title": "We're Cutting L40S Prices In Half",
      "description": null,
      "url": "https://fly.io/blog/cutting-prices-for-l40s-gpus-in-half/",
      "published": "2024-08-15T00:00:00.000Z",
      "updated": "2024-09-11T00:04:08.000Z",
      "content": "<div class=\"lead\"><p>We’re Fly.io, a new public cloud with simple, developer-friendly ergonomics. And as of today, cheaper GPUs. <a href=\"https://fly.io/speedrun\" title=\"\">Try it out; you’ll be deployed in just minutes</a>.</p>\n</div>\n<p>We just lowered the prices on NVIDIA L40s GPUs to $1.25 per hour. Why? Because our feet are cold and we burn processor cycles for heat. But also other reasons.</p>\n\n<p>Let’s back up.</p>\n\n<p>We offer 4 different NVIDIA GPU models; in increasing order of performance, they’re the A10, the L40S, the 40G PCI A100, and the 80G SXM A100.  Guess which one is most popular.</p>\n\n<p>We guessed wrong, and spent a lot of time working out how to maximize the amount of GPU power we could deliver to a single Fly Machine. Users surprised us. By a wide margin, the most popular GPU in our inventory is the A10.</p>\n\n<p>The A10 is an older generation of NVIDIA GPU with fewer, slower cores and less memory. It’s the least capable GPU we offer. But that doesn’t matter, because it’s capable enough. It’s solid for random inference tasks, and handles mid-sized generative AI stuff like Mistral Nemo or Stable Diffusion. For those workloads, there’s not that much benefit in getting a beefier GPU.</p>\n\n<p>As a result, we can’t get new A10s in fast enough for our users.</p>\n\n<p>If there’s one thing we’ve learned by talking to our customers over the last 4 years, it’s that y'all love a peek behind the curtain. So we’re going to let you in on a little secret about how a hardware provider like Fly.io formulates GPU strategy: none of us know what the hell we’re doing.</p>\n\n<p>If you had asked us in 2023 what the biggest GPU problem we could solve was, we’d have said “selling fractional A100 slices”. We burned a whole quarter trying to get MIG, or at least vGPUs, working through IOMMU PCI passthrough on Fly Machines, in a project so cursed that Thomas has forsworn ever programming again. 
Then we went to market selling whole A100s, and for several more months it looked like the biggest problem we needed to solve was finding a secure way to expose NVLink-ganged A100 clusters to VMs so users could run training. Then H100s; can we find H100s anywhere? Maybe in a black market in Shenzhen?</p>\n\n<p>And here we are, a year later, looking at the data, and the least sexy, least interesting GPU part in the catalog is where all the action is.</p>\n\n<p>With actual customer data to back up the hypothesis, here’s what we think is happening today:</p>\n\n<ul>\n<li>Most users who want to plug GPU-accelerated AI workloads into fast networks are doing inference, not training. \n</li><li>The hyperscaler public clouds strangle these customers, first with GPU instance surcharges, and then with egress fees for object storage data when those customers try to outsource the GPU stuff to GPU providers.\n</li><li>If you’re trying to do something GPU-accelerated in response to an HTTP request, the right combination of GPU, instance RAM, fast object storage for datasets and model parameters, and networking is much more important than getting your hands on an H100.\n</li></ul>\n\n<p>This is a thing we didn’t see coming, but should have: training workloads tend to look more like batch jobs, and inference tends to look more like transactions. Batch training jobs aren’t that sensitive to networking or even reliability. Live inference jobs responding to end-user HTTP requests are. So, given our pricing, of course the A10s are a sweet spot.</p>\n\n<p>The next step up in our lineup after the A10 is the L40S. The L40S is a nice piece of kit. 
We’re going to take a beat here and sell you on the L40S, because it’s kind of awesome.</p>\n\n<p>The L40S is an AI-optimized version of the L40, which is the data center version of the GeForce RTX 4090, resembling two 4090s stapled together.</p>\n\n<p>If you’re not a GPU hardware person, the RTX 4090 is a gaming GPU, the kind you’d play ray-traced Witcher 3 on. NVIDIA’s high-end gaming GPUs are actually reasonably good at AI workloads! But they suck in a data center rack: they chug power, they’re hard to cool, and they’re less dense. Also, NVIDIA can’t charge as much for them.</p>\n\n<p>Hence the L40: (much) more memory, less energy consumption, designed for a rack, not a tower case. Marked up for “enterprise”.</p>\n\n<p>NVIDIA positioned the L40 as a kind of “graphics” AI GPU. Unlike the super high-end cards like the A100/H100, the L40 keeps all the rendering hardware, so it’s good for 3D graphics and video processing. Which is sort of what you’d expect from a “professionalized” GeForce card.</p>\n\n<p>A funny thing happened in the middle of 2023, though: the market for ultra-high-end NVIDIA cards went absolutely batshit. The huge cards you’d gang up for training jobs got impossible to find, and NVIDIA became one of the most valuable companies in the world. Serious shops started working out plans to acquire groups of L40-type cards to work around the problem, whether or not they had graphics workloads.</p>\n\n<p>The only company in this space that does know what they’re doing is NVIDIA. Nobody has written a highly-ranked Reddit post about GPU workloads without NVIDIA noticing and creating a new SKU. So they launched the L40S, which is an L40 with AI workload compute performance comparable to that of the A100 (without us getting into the details of F32 vs. F16 models).</p>\n\n<p>Long story short, the L40S is an A100-performer that we can price for A10 customers; the Volkswagen GTI of our lineup. 
We’re going to see if we can make that happen.</p>\n\n<p>We think the combination of just-right-sized inference GPUs and Tigris object storage is pretty killer:</p>\n\n<ul>\n<li>model parameters, data sets, and compute are all close together\n</li><li>everything plugged into an Anycast network that’s fast everywhere in the world\n</li><li>on VM instances that have enough memory to actually run real frameworks on\n</li><li>priced like we actually want you to use it.\n</li></ul>\n\n<p>You should use L40S cards without thinking hard about it. So we’re making it official. You won’t pay us a dime extra to use one instead of an A10. Have at it! Revolutionize the industry. For $1.25 an hour.</p>\n\n<p>Here are things you can do with an L40S on Fly.io today:</p>\n\n<ul>\n<li>You can run Llama 3.1 70B — a big Llama — for LLM jobs.\n</li><li>You can run Flux from Black Forest Labs for genAI images.\n</li><li>You can run Whisper for automated speech recognition.\n</li><li>You can do whole-genome alignment with SegAlign (Thomas’ biochemist kid who has been snatching free GPU hours for his lab gave us this one, and we’re taking his word for it).\n</li><li>You can run DOOM Eternal, building the Stadia that Google couldn’t pull off, because the L40S hasn’t forgotten that it’s a graphics GPU. \n</li></ul>\n\n<p>It’s going to get chilly in Chicago in a month or so. Go light some cycles on fire! </p>",
      "image": {
        "url": "https://fly.io/blog/cutting-prices-for-l40s-gpus-in-half/assets/gpu-ga-thumb.webp",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/machine-migrations/",
      "title": "Making Machines Move",
      "description": null,
      "url": "https://fly.io/blog/machine-migrations/",
      "published": "2024-07-30T00:00:00.000Z",
      "updated": "2024-08-09T12:14:08.000Z",
      "content": "<div class=\"lead\"><p>We’re Fly.io, a global public cloud with simple, developer-friendly ergonomics. If you’ve got a working Docker image, we’ll transmogrify it into a Fly Machine: a VM running on our hardware anywhere in the world. <a href=\"https://fly.io/speedrun\" title=\"\">Try it out; you’ll be deployed in just minutes</a>.</p>\n</div>\n<p>At the heart of our platform is a systems design tradeoff about durable storage for applications.  When we added storage three years ago, to support stateful apps, we built it on attached NVMe drives. A benefit: a Fly App accessing a file on a Fly Volume is never more than a bus hop away from the data. A cost: a Fly App with an attached Volume is anchored to a particular worker physical.</p>\n<div class=\"right-sidenote\"><p><code>bird</code>: a BGP4 route server.</p>\n</div>\n<p>Before offering attached storage, our on-call runbook was almost as simple as “de-bird that edge server”, “tell <a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''>Nomad</a> to drain that worker”, and “go back to sleep”. NVMe cost us that drain operation, which terribly complicated the lives of our infra team. We’ve spent the last year getting “drain” back. It’s one of the biggest engineering lifts we’ve made, and if you didn’t notice, we lifted it cleanly.</p>\n<h3 id='the-goalposts' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-goalposts' aria-label='Anchor'></a><span class='plain-code'>The Goalposts</span></h3>\n<p>With stateless apps, draining a worker is easy. For each app instance running on the victim server, start a new instance elsewhere. Confirm it’s healthy, then kill the old one. Rinse, repeat. 
At our 2020 scale, we could drain a fully loaded worker in just a handful of minutes.</p>\n\n<p>You can see why this process won’t work for apps with attached volumes. Sure, create a new volume elsewhere on the fleet, and boot up a new Fly Machine attached to it. But the new volume is empty. The data’s still stuck on the original worker. We asked, and customers were not OK with this kind of migration.</p>\n\n<p>Of course, we back Volumes up with snapshots (at an interval) to off-network storage. But for “drain”, restoring backups isn’t nearly good enough. No matter the backup interval, a “restore from backup” migration will lose data, and a “backup and restore” migration incurs untenable downtime.</p>\n\n<p>The next thought you have is, “OK, copy the volume over”. And, yes, of course you have to do that. But you can’t just <code>copy</code>, <code>boot</code>, and then <code>kill</code> the old Fly Machine. Because the original Fly Machine is still alive and writing, you have to <code>kill</code> first, then <code>copy</code>, then <code>boot</code>.</p>\n\n<p>Fly Volumes can get pretty big. Even migrating to a rack-buddy physical server, you’ll hit a point where draining incurs minutes of interruption, especially if you’re moving lots of volumes simultaneously. <code>Kill</code>, <code>copy</code>, <code>boot</code> is too slow.</p>\n<div class=\"callout\"><p>There’s a world where even 15 minutes of interruption is tolerable. It’s the world where you run more than one instance of your application to begin with, so prolonged interruption of a single Fly Machine isn’t visible to your users. Do this! 
But we have to live in the same world as our customers, many of whom don’t run in high-availability configurations.</p>\n</div><h3 id='behold-the-clone-o-mat' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#behold-the-clone-o-mat' aria-label='Anchor'></a><span class='plain-code'>Behold The Clone-O-Mat</span></h3>\n<p><code>Copy</code>, <code>boot</code>, <code>kill</code> loses data. <code>Kill</code>, <code>copy</code>, <code>boot</code> takes too long. What we needed is a new operation: <code>clone</code>.</p>\n\n<p><code>Clone</code> is a lazier, asynchronous <code>copy</code>. It creates a new volume elsewhere on our fleet, just like <code>copy</code> would. But instead of blocking, waiting to transfer every byte from the original volume, <code>clone</code> returns immediately, with a transfer running in the background.</p>\n\n<p>A new Fly Machine can be booted with that cloned volume attached. Its blocks are mostly empty. But that’s OK: when the new Fly Machine tries to read from it, the block storage system works out whether the block has been transferred. If it hasn’t, it’s fetched over the network from the original volume; this is called “hydration”. Writes are even easier, and don’t hit the network at all.</p>\n\n<p><code>Kill</code>, <code>copy</code>, <code>boot</code> is slow. But <code>kill</code>, <code>clone</code>, <code>boot</code> is fast; it can be made asymptotically as fast as stateless migration.</p>\n\n<p>There are three big moving pieces to this design.</p>\n\n<ol>\n<li>First, we have to rig up our OS storage system to make this <code>clone</code> operation work.\n</li><li>Then, to read blocks over the network, we need a network protocol. 
(Spoiler: iSCSI, though we tried other stuff first.)\n</li><li>Finally, the gnarliest of those pieces is our orchestration logic: what’s running where, what state is it in, and whether it’s plugged in correctly.\n</li></ol>\n<h3 id='block-level-clone' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#block-level-clone' aria-label='Anchor'></a><span class='plain-code'>Block-Level Clone</span></h3>\n<p>The Linux feature we need to make this work already exists; <a href='https://docs.kernel.org/admin-guide/device-mapper/dm-clone.html' title=''>it’s called <code>dm-clone</code></a>. Given an existing, readable storage device, <code>dm-clone</code> gives us a new device, of identical size, where reads of uninitialized blocks will pull from the original. It sounds terribly complicated, but it’s actually one of the simpler kernel lego bricks. Let’s demystify it.</p>\n\n<p>As far as Unix is concerned, random-access storage devices, be they spinning rust or NVMe drives, are all instances of the common class “block device”. 
A block device is addressed in fixed-size (say, 4KiB) chunks, and <a href='https://elixir.bootlin.com/linux/v5.11.11/source/include/linux/blk_types.h#L356' title=''>handles (roughly) these operations</a>:</p>\n<div class=\"highlight-wrapper group relative cpp\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-woz6bsz9\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 
8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-woz6bsz9\"><span class=\"k\">enum</span> <span class=\"n\">req_opf</span> <span class=\"p\">{</span>\n    <span class=\"cm\">/* read sectors from the device */</span>\n    <span class=\"n\">REQ_OP_READ</span>     <span class=\"o\">=</span> <span class=\"mi\">0</span><span class=\"p\">,</span>\n    <span class=\"cm\">/* write sectors to the device */</span>\n    <span class=\"n\">REQ_OP_WRITE</span>        <span class=\"o\">=</span> <span class=\"mi\">1</span><span class=\"p\">,</span>\n    <span class=\"cm\">/* flush the volatile write cache */</span>\n    <span class=\"n\">REQ_OP_FLUSH</span>        <span class=\"o\">=</span> <span class=\"mi\">2</span><span class=\"p\">,</span>\n    <span class=\"cm\">/* discard sectors */</span>\n    <span class=\"n\">REQ_OP_DISCARD</span>      <span class=\"o\">=</span> <span class=\"mi\">3</span><span class=\"p\">,</span>\n    <span class=\"cm\">/* securely erase sectors */</span>\n    <span class=\"n\">REQ_OP_SECURE_ERASE</span> <span class=\"o\">=</span> <span class=\"mi\">5</span><span class=\"p\">,</span>\n    <span class=\"cm\">/* write the same sector many times */</span>\n    <span class=\"n\">REQ_OP_WRITE_SAME</span>   <span class=\"o\">=</span> <span class=\"mi\">7</span><span class=\"p\">,</span>\n    <span class=\"cm\">/* write the zero filled sector many times */</span>\n    <span class=\"n\">REQ_OP_WRITE_ZEROES</span> <span class=\"o\">=</span> <span class=\"mi\">9</span><span class=\"p\">,</span>\n    <span class=\"cm\">/* ... 
*/</span>\n<span class=\"p\">};</span>\n</code></pre>\n  </div>\n</div>\n<p>You can imagine designing a simple network protocol that supported all these options. It might have messages that looked something like:</p>\n\n<p><img alt=\"A packet diagram, just skip down to &quot;struct bio&quot; below\" src=\"/blog/machine-migrations/assets/packet.png?2/3&center\" />\nGood news! The Linux block system is organized as if your computer were a network running a protocol that basically looks just like that. Here’s the message structure:</p>\n<div class=\"right-sidenote\"><p>I’ve <a href=\"https://elixir.bootlin.com/linux/v5.11.11/source/include/linux/blk_types.h#L223\" title=\"\">stripped a bunch of stuff out of here</a> but you don’t need any of it to understand what’s coming next.</p>\n</div><div class=\"highlight-wrapper group relative cpp\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-kwrloyie\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent 
group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-kwrloyie\"><span class=\"cm\">/*\n * main unit of I/O for the block layer and lower layers (ie drivers and\n * stacking drivers)\n */</span>\n<span class=\"k\">struct</span> <span class=\"nc\">bio</span> <span class=\"p\">{</span>\n    <span class=\"k\">struct</span> <span class=\"nc\">gendisk</span>      <span class=\"o\">*</span><span class=\"n\">bi_disk</span><span class=\"p\">;</span>\n    <span class=\"kt\">unsigned</span> <span class=\"kt\">int</span>        <span class=\"n\">bi_opf</span><span class=\"p\">;</span>\n    <span class=\"kt\">unsigned</span> <span class=\"kt\">short</span>      <span class=\"n\">bi_flags</span><span class=\"p\">;</span>   \n    <span class=\"kt\">unsigned</span> <span class=\"kt\">short</span>      <span class=\"n\">bi_ioprio</span><span class=\"p\">;</span>\n    <span class=\"n\">blk_status_t</span>        <span class=\"n\">bi_status</span><span class=\"p\">;</span>\n    <span class=\"kt\">unsigned</span> <span class=\"kt\">short</span>      <span class=\"n\">bi_vcnt</span><span class=\"p\">;</span>         <span class=\"cm\">/* how many 
bio_vec's */</span>\n    <span class=\"k\">struct</span> <span class=\"nc\">bio_vec</span>      <span class=\"n\">bi_inline_vecs</span><span class=\"p\">[]</span> <span class=\"cm\">/* (page, len, offset) tuples */</span><span class=\"p\">;</span>\n<span class=\"p\">};</span>\n</code></pre>\n  </div>\n</div>\n<p>No nerd has ever looked at a fixed-format message like this without thinking about writing a proxy for it, and <code>struct bio</code> is no exception. The proxy system in the Linux kernel for <code>struct bio</code> is called <code>device mapper</code>, or DM.</p>\n\n<p>DM target devices can plug into other DM devices. For that matter, they can do whatever the hell else they want, as long as they honor the interface. It boils down to a <code>map(bio)</code> function, which can dispatch a <code>struct bio</code>, or drop it, or muck with it and ask the kernel to resubmit it.</p>\n\n<p>You can do a whole lot of stuff with this interface: carve a big device into a bunch of smaller ones (<a href='https://docs.kernel.org/admin-guide/device-mapper/linear.html' title=''><code>dm-linear</code></a>), make one big striped device out of a bunch of smaller ones (<a href='https://docs.kernel.org/admin-guide/device-mapper/striped.html' title=''><code>dm-stripe</code></a>), do software RAID mirroring (<code>dm-raid1</code>), create snapshots of arbitrary existing devices (<a href='https://docs.kernel.org/admin-guide/device-mapper/snapshot.html' title=''><code>dm-snap</code></a>), cryptographically verify boot devices (<a href='https://docs.kernel.org/admin-guide/device-mapper/verity.html' title=''><code>dm-verity</code></a>), and a bunch more. 
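To make the <code>map(bio)</code> idea concrete, here is a toy model (Python; all names hypothetical, nothing from the actual kernel API) of the simplest kind of target — a <code>dm-linear</code>-style proxy that remaps a bio's sector by a fixed offset onto a backing device and dispatches it:

```python
# Toy model (hypothetical names, not the kernel API) of a device-mapper
# "map" function: a dm-linear-style target remaps a bio's sector by a
# fixed offset onto an underlying device, then dispatches it.
from dataclasses import dataclass

@dataclass
class Bio:
    device: str
    sector: int

class LinearTarget:
    """Expose a slice of `backing`, starting at `offset`, as its own device."""
    def __init__(self, backing: str, offset: int):
        self.backing, self.offset = backing, offset

    def map(self, bio: Bio) -> Bio:
        # "muck with it" and hand it down the stack
        return Bio(device=self.backing, sector=bio.sector + self.offset)

carved = LinearTarget(backing="/dev/nvme0n1", offset=2048)
remapped = carved.map(Bio(device="/dev/mapper/small0", sector=10))
assert remapped == Bio(device="/dev/nvme0n1", sector=2058)
```

Every other target in the list above is this same shape with a smarter `map`.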
Device Mapper is the kernel backend for the <a href='https://sourceware.org/lvm2/' title=''>userland LVM2 system</a>, which is how we do <a href='https://fly.io/blog/persistent-storage-and-fast-remote-builds/' title=''>thin pools and snapshot backups</a>.</p>\n\n<p>Which brings us to <code>dm-clone</code>: it’s a map function that boils down to:</p>\n<div class=\"highlight-wrapper group relative cpp\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-8n5vrld6\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 
1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-8n5vrld6\">    <span class=\"cm\">/* ... */</span> \n    <span class=\"n\">region_nr</span> <span class=\"o\">=</span> <span class=\"n\">bio_to_region</span><span class=\"p\">(</span><span class=\"n\">clone</span><span class=\"p\">,</span> <span class=\"n\">bio</span><span class=\"p\">);</span>\n\n    <span class=\"c1\">// we have the data</span>\n    <span class=\"k\">if</span> <span class=\"p\">(</span><span class=\"n\">dm_clone_is_region_hydrated</span><span class=\"p\">(</span><span class=\"n\">clone</span><span class=\"o\">-></span><span class=\"n\">cmd</span><span class=\"p\">,</span> <span class=\"n\">region_nr</span><span class=\"p\">))</span> <span class=\"p\">{</span>\n        <span class=\"n\">remap_and_issue</span><span class=\"p\">(</span><span class=\"n\">clone</span><span class=\"p\">,</span> <span class=\"n\">bio</span><span class=\"p\">);</span>\n        <span class=\"k\">return</span> <span class=\"mi\">0</span><span class=\"p\">;</span>\n\n    <span class=\"c1\">// we don't and it's a read</span>\n    <span class=\"p\">}</span> <span class=\"k\">else</span> <span class=\"k\">if</span> <span class=\"p\">(</span><span class=\"n\">bio_data_dir</span><span class=\"p\">(</span><span class=\"n\">bio</span><span class=\"p\">)</span> <span class=\"o\">==</span> <span class=\"n\">READ</span><span class=\"p\">)</span> <span class=\"p\">{</span>\n        <span class=\"n\">remap_to_source</span><span class=\"p\">(</span><span class=\"n\">clone</span><span class=\"p\">,</span> <span 
class=\"n\">bio</span><span class=\"p\">);</span>\n        <span class=\"k\">return</span> <span class=\"mi\">1</span><span class=\"p\">;</span>\n    <span class=\"p\">}</span>\n\n    <span class=\"c1\">// we don't and it's a write</span>\n    <span class=\"n\">remap_to_dest</span><span class=\"p\">(</span><span class=\"n\">clone</span><span class=\"p\">,</span> <span class=\"n\">bio</span><span class=\"p\">);</span>\n    <span class=\"n\">hydrate_bio_region</span><span class=\"p\">(</span><span class=\"n\">clone</span><span class=\"p\">,</span> <span class=\"n\">bio</span><span class=\"p\">);</span>\n    <span class=\"k\">return</span> <span class=\"mi\">0</span><span class=\"p\">;</span>\n    <span class=\"cm\">/* ... */</span> \n</code></pre>\n  </div>\n</div><div class=\"right-sidenote\"><p>a <a href=\"https://docs.kernel.org/admin-guide/device-mapper/kcopyd.html\" title=\"\"><code>kcopyd</code></a> thread runs in the background, rehydrating the device in addition to (and independent of) read accesses.</p>\n</div>\n<p><code>dm-clone</code> takes, in addition to the source device to clone from, a “metadata” device on which is stored a bitmap of the status of all the blocks: either “rehydrated” from the source, or not. That’s how it knows whether to fetch a block from the original device or the clone.</p>\n<h3 id='network-clone' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#network-clone' aria-label='Anchor'></a><span class='plain-code'>Network Clone</span></h3><div class=\"callout\"><p><strong class=\"font-semibold text-navy-950\"><code>flyd</code> in a nutshell:</strong> worker physicals run a service, <code>flyd</code>, which manages a couple of databases that are the source of truth for all the Fly Machines running there. 
Conceptually, <code>flyd</code> is a server for on-demand instances of durable finite state machines, each representing some operation on a Fly Machine (creation, start, stop, &c), with the transition steps recorded carefully in a BoltDB database. An FSM step might be something like “assign a local IPv6 address to this interface”, or “check out a block device with the contents of this container”, and it’s straightforward to add and manage new ones.</p>\n</div>\n<p>Say we’ve got <code>flyd</code> managing a Fly Machine with a volume on <code>worker-xx-cdg1-1</code>. We want it running on <code>worker-xx-cdg1-2</code>. Our whole fleet is meshed with WireGuard; everything can talk directly to everything else. So, conceptually:</p>\n\n<ol>\n<li><code>flyd</code> on <code>cdg1-1</code> stops the Fly Machine, and\n</li><li>sends a message to <code>flyd</code> on <code>cdg1-2</code> telling it to clone the source volume.\n</li><li><code>flyd</code> on <code>cdg1-2</code> starts a <code>dm-clone</code> instance, which creates a clone volume on <code>cdg1-2</code>, populating it, over some kind of network block protocol, from <code>cdg1-1</code>, and\n</li><li>boots a new Fly Machine, attached to the clone volume.\n</li><li><code>flyd</code> on <code>cdg1-2</code> monitors the clone operation, and, when the clone completes, converts the clone device to a simple linear device and cleans up.\n</li></ol>\n\n<p>For step (3) to work, the “original volume” on <code>cdg1-1</code> has to be visible on <code>cdg1-2</code>, which means we need to mount it over the network.</p>\n<div class=\"right-sidenote\"><p><code>nbd</code> is so simple that it’s used as a sort of <code>dm-user</code> userland block device; to prototype a new block device, <a href=\"https://lwn.net/ml/linux-kernel/[email protected]/\" title=\"\">don’t bother writing a kernel module</a>, just write an <code>nbd</code> server.</p>\n</div>\n<p>Take your pick of protocols. 
iSCSI is the obvious one, but it’s relatively complicated, and Linux has native support for a much simpler one: <code>nbd</code>, the “network block device”. You could implement an <code>nbd</code> server in an afternoon, on top of a file or a SQLite database or S3, and the Linux kernel could mount it as a drive.</p>\n\n<p>We started out using <code>nbd</code>. But we kept getting stuck <code>nbd</code> kernel threads when there was any kind of network disruption. We’re a global public cloud; network disruption happens. Honestly, we could have debugged our way through this. But it was simpler just to spike out an iSCSI implementation, observe that it didn’t get jammed up when the network hiccuped, and move on.</p>\n<h3 id='putting-the-pieces-together' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#putting-the-pieces-together' aria-label='Anchor'></a><span class='plain-code'>Putting The Pieces Together</span></h3>\n<p>To drain a worker with minimal downtime and no lost data, we turn workers into temporary SANs, serving the volumes we need to drain to fresh-booted replica Fly Machines on a bunch of “target” physicals. 
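The copy-each-block-exactly-once behavior is easy to model. A toy sketch (Python; hypothetical names — real `dm-clone` also hydrates regions asynchronously via `kcopyd`) of the dispatch rule, with the metadata-device bitmap driving the decision:

```python
# Toy model (Python, hypothetical names) of the dm-clone dispatch rule:
# a bitmap, kept on the "metadata" device, records which regions have
# been copied ("hydrated") from the origin worker.
READ, WRITE = "read", "write"

class CloneTarget:
    def __init__(self, num_regions: int):
        self.hydrated = [False] * num_regions  # the metadata-device bitmap

    def map_bio(self, region: int, direction: str) -> str:
        if self.hydrated[region]:
            return "remap_to_clone"        # we already have this region locally
        if direction == READ:
            return "remap_to_source"       # fetch it from the origin worker
        # a write to an unhydrated region goes to the local clone,
        # and the region gets hydrated
        self.hydrated[region] = True
        return "remap_to_clone_and_hydrate"

clone = CloneTarget(num_regions=4)
assert clone.map_bio(0, READ) == "remap_to_source"
assert clone.map_bio(0, WRITE) == "remap_to_clone_and_hydrate"
assert clone.map_bio(0, READ) == "remap_to_clone"  # copied exactly once
```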
Those SANs — combinations of <code>dm-clone</code>, iSCSI, and <a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''>our <code>flyd</code> orchestrator</a> — track the blocks copied from the origin, copying each one exactly once and cleaning up when the original volume has been fully copied.</p>\n\n<p>Problem solved!</p>\n<h3 id='no-there-were-more-problems' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#no-there-were-more-problems' aria-label='Anchor'></a><span class='plain-code'>No, There Were More Problems</span></h3>\n<p>When your problem domain is hard, anything you build whose design you can’t fit completely in your head is going to be a fiasco. Shorter form: “if you see Raft consensus in a design, we’ve done something wrong”.</p>\n\n<p>A virtue of this migration system is that, for as many moving pieces as it has, it fits in your head. What complexity it has is mostly shouldered by strategic bets we’ve already  built teams around, most notably the <code>flyd</code> orchestrator. So we’ve been running this system for the better part of a year without much drama. Not no drama, though. Some drama.</p>\n\n<p>Example: we encrypt volumes. Our key management is fussy. We do per-volume encryption keys that provision alongside the volumes themselves, so no one worker has a volume skeleton key.</p>\n\n<p>If you think “migrating those volume keys from worker to worker” is the problem I’m building up to, well, that too, but the bigger problem is <code>trim</code>.</p>\n\n<p>Most people use just a small fraction of the volumes they allocate. A 100GiB volume with just 5MiB used wouldn’t be at all weird. 
You don’t want to spend minutes copying a volume that could have been fully hydrated in seconds.</p>\n\n<p>And indeed, <code>dm-clone</code> doesn’t want to do that either. Given a source block device (for us, an iSCSI mount) and the clone device, a <code>DISCARD</code> issued on the clone device will get picked up by <code>dm-clone</code>, which will simply <a href='https://elixir.bootlin.com/linux/v5.11.11/source/drivers/md/dm-clone-target.c#L1357' title=''>short-circuit the read</a> of the relevant blocks by marking them as hydrated in the metadata volume. Simple enough.</p>\n\n<p>To make that work, we need the target worker to see the plaintext of the source volume (so that it can do an <code>fstrim</code> — don’t get us started on how annoying it is to sandbox this — to read the filesystem, identify the unused blocks, and issue the <code>DISCARDs</code> where <code>dm-clone</code> can see them). Easy enough.</p>\n<div class=\"right-sidenote\"><p>these curses have a lot to do with how hard it was to drain workers!</p>\n</div>\n<p>Except: two different workers, for cursed reasons, might be running different versions of <a href='https://gitlab.com/cryptsetup/cryptsetup' title=''>cryptsetup</a>, the userland bridge between LUKS2 and the <a href='https://docs.kernel.org/admin-guide/device-mapper/dm-crypt.html' title=''>kernel dm-crypt driver</a>. There are (or were) two different versions of cryptsetup on our network, and they default to different <a href='https://fossies.org/linux/cryptsetup/docs/on-disk-format-luks2.pdf' title=''>LUKS2 header sizes</a> — 4MiB and 16MiB. Implying two different plaintext volume sizes. </p>\n\n<p>So now part of the migration FSM is an RPC call that carries metadata about the desired LUKS2 configuration for the target VM. 
Not something we expected to have to build, but, whatever.</p>\n<div class=\"right-sidenote\"><p>Corrosion deserves its own post.</p>\n</div>\n<p>Gnarlier example: workers are the source of truth for information about the Fly Machines running on them. Migration knocks the legs out from under that constraint, which we were relying on in Corrosion, the SWIM-gossip SQLite database we use to connect Fly Machines to our request routing. Race conditions. Debugging. Design changes. Next!</p>\n\n<p>Gnarliest example: our private networks. Recall: we automatically place every Fly Machine into <a href='https://fly.io/blog/incoming-6pn-private-networks/' title=''>a private network</a>; by default, it’s the one all the other apps in your organization run in. This is super handy for setting up background services, databases, and clustered applications. 20 lines of eBPF code in our worker kernels keeps anybody from “crossing the streams”, sending packets from one private network to another.</p>\n<div class=\"right-sidenote\"><p>we’re members of an elite cadre of idiots who have managed to find designs that made us wish IPv6 addresses were even bigger.</p>\n</div>\n<p>We call this scheme 6PN (for “IPv6 Private Network”). It functions by <a href='https://fly.io/blog/ipv6-wireguard-peering#ipv6-private-networking-at-fly' title=''>embedding routing information directly into IPv6 addresses</a>. This is, perhaps, gross. But it allows us to route diverse private networks with constantly changing membership across a global fleet of servers without running a distributed routing protocol. As the beardy wizards who kept the Internet backbone up and running on Cisco AGS+’s once said: the best routing protocol is “static”.</p>\n\n<p>Problem: the embedded routing information in a 6PN address refers in part to specific worker servers.</p>\n\n<p>That’s fine, right? They’re IPv6 addresses. Nobody uses literal IPv6 addresses. Nobody uses IP addresses at all; they use the DNS. 
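To illustrate the idea of routing by address bits (with a made-up layout — the real 6PN format differs; the field names and widths below are purely hypothetical), here is a sketch:

```python
# Hypothetical sketch of 6PN-style addressing: routing metadata packed
# directly into an IPv6 address, so a router can forward a packet by
# reading the bits instead of consulting a routing protocol.
import ipaddress

def make_6pn(net_id: int, host_id: int, vm_id: int) -> ipaddress.IPv6Address:
    # made-up layout: fdaa::/16 ULA-style prefix, then 32 bits of private
    # network, 32 bits of worker host, 48 bits of machine
    value = (0xFDAA << 112) | (net_id << 80) | (host_id << 48) | vm_id
    return ipaddress.IPv6Address(value)

def host_of(addr: ipaddress.IPv6Address) -> int:
    # a worker's forwarding path can pick the target host out of the bits
    return (int(addr) >> 48) & 0xFFFFFFFF

addr = make_6pn(net_id=7, host_id=42, vm_id=3)
assert host_of(addr) == 42
```

The downside is exactly the one described next: an address minted this way names a specific worker, so the address can't follow a Machine to a new host.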
When you migrate a host, just give it a new 6PN address, and update the DNS.</p>\n\n<p>Friends, somebody did use literal IPv6 addresses. It was us. In the configurations for Fly Postgres clusters.</p>\n<div class=\"right-sidenote\"><p>It’s also not operationally easy for us to shell into random Fly Machines, for good reason.</p>\n</div>\n<p>The obvious fix for this is not complicated; given <code>flyctl</code> ssh access to a Fly Postgres cluster, it’s like a 30 second ninja edit. But we run a <em>lot</em> of Fly Postgres clusters, and the change has to be coordinated carefully to avoid getting the cluster into a confused state. We went as far as adding a feature to our <code>init</code> to do network address mappings to keep old 6PN addresses reachable before biting the bullet and burning several weeks doing the direct configuration fix fleet-wide.</p>\n<figure class=\"post-cta\">\n  <figcaption>\n    <h1>Speedrun your app onto Fly.io.</h1>\n    <p>3…2…1…</p>\n      <a class=\"btn btn-lg\" href=\"https://fly.io/speedrun\">\n        Go!  <span class='opacity-50'>→</span>\n      </a>\n  </figcaption>\n  <div class=\"image-container\">\n    <img src=\"/static/images/cta-dog.webp\" srcset=\"/static/images/[email protected] 2x\" alt=\"\">\n  </div>\n</figure>\n\n<h3 id='the-learning-it-burns' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-learning-it-burns' aria-label='Anchor'></a><span class='plain-code'>The Learning, It Burns!</span></h3>\n<p>We get asked a lot why we don’t do storage the “obvious” way, with an <a href='https://aws.amazon.com/ebs/' title=''>EBS-type</a> SAN fabric, abstracting it away from our compute. Locally-attached NVMe storage is an idiosyncratic choice, one we’ve had to write disclaimers for (single-node clusters can lose data!) 
since we first launched it.</p>\n\n<p>One answer is: we’re a startup. Building SAN infrastructure in every region we operate in would be tremendously expensive. Look at any feature in AWS that normal people know the name of, like EC2, EBS, RDS, or S3 — there’s a whole company in there. We launched storage when we were just 10 people, and even at our current size we probably have nothing resembling the resources EBS gets. AWS is pretty great!</p>\n\n<p>But another thing to keep in mind is: we’re learning as we go. And so even if we had the means to do an EBS-style SAN, we might not build it today.</p>\n\n<p>Instead, we’re a lot more interested in log-structured virtual disks (LSVD). LSVD uses NVMe as a local cache, but durably persists writes in object storage. You get most of the performance benefit of bus-hop disk writes, along with unbounded storage and S3-grade reliability.</p>\n\n<p><a href='https://community.fly.io/t/bottomless-s3-backed-volumes/15648' title=''>We launched LSVD experimentally last year</a>; in the intervening year, something happened to make LSVD even more interesting to us: <a href='https://www.tigrisdata.com/' title=''>Tigris Data</a> launched S3-compatible object storage in every one of our regions, so instead of backhauling updates to Northern Virginia, <a href='https://community.fly.io/t/tigris-backed-volumes/20792' title=''>we can keep them local</a>. We have more to say about LSVD, and a lot more to say about Tigris.</p>\n\n<p>Our first several months of migrations were done gingerly. By summer of 2024, we got to where our infra team can pull “drain this host” out of their toolbelt without much ceremony.</p>\n\n<p>We’re still not to the point where we’re migrating casually. Your Fly Machines are probably not getting migrated! There’d need to be a reason! 
But the dream is fully-automated luxury space migration, in which you might get migrated semiregularly, as our systems work not just to drain problematic hosts but to rebalance workloads regularly. No time soon. But we’ll get there.</p>\n\n<p>This is the biggest thing our team has done since we replaced Nomad with flyd. Only the new billing system comes close. We did this thing not because it was easy, but because we thought it would be easy. It was not. But: worth it!</p>",
      "image": {
        "url": "https://fly.io/blog/machine-migrations/assets/migrations-thumb.webp",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/oidc-cloud-roles/",
      "title": "AWS without Access Keys",
      "description": null,
      "url": "https://fly.io/blog/oidc-cloud-roles/",
      "published": "2024-06-19T00:00:00.000Z",
      "updated": "2024-06-25T22:52:32.000Z",
"content": "<div class=\"lead\"><p>It’s dangerous to go alone. Fly.io runs full-stack apps by transmuting Docker containers into Fly Machines: ultra-lightweight hardware-backed VMs. You can run all your dependencies on Fly.io, but sometimes, you’ll need to work with other clouds, and we’ve made that pretty simple. Try Fly.io out for yourself; your Rails or Node app <a href=\"https://fly.io/speedrun\" title=\"\">can be up and running in just minutes</a>.</p>\n</div>\n<p>Let’s hypothesize you an app serving generative AI cat images based on the weather forecast, running on a <code>g4dn.xlarge</code> ECS task in AWS <code>us-east-1</code>. It’s going great; people didn’t realize how dependent their cat pic prefs are on barometric pressure, and you’re all anyone can talk about.</p>\n\n<p>Word reaches Australia and Europe, but you’re not catching on, because the… latency is too high? Just roll with us here. Anyways: fixing this is going to require replicating ECS tasks and ECR images into <code>ap-southeast-2</code> and <code>eu-central-1</code> while also setting up load balancing. Nah.</p>\n\n<p>This is the O.G. Fly.io deployment story; one deployed app, one versioned container, one command to get it running anywhere in the world.</p>\n\n<p>But you have a problem: your app relies on training data, it’s huge, your giant employer manages it, and it’s in S3. Getting this to work will require AWS credentials.</p>\n\n<p>You could ask your security team to create a user, give it permissions, and hand over the AWS keypair. Then you could wash your neck and wait for the blade. Passing around AWS keypairs is the beginning of every horror story told about cloud security, and your security team ain’t having it.</p>\n\n<p>There’s a better way. It’s drastically more secure, so your security people will at least hear you out. 
It’s also so much easier on Fly.io that you might never bother creating an IAM service account again.</p>\n<h2 id='lets-get-it-out-of-the-way' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#lets-get-it-out-of-the-way' aria-label='Anchor'></a><span class='plain-code'>Let’s Get It out of the Way</span></h2>\n<p>We’re going to use OIDC to set up strictly limited trust between AWS and Fly.io.</p>\n\n<ol>\n<li>In AWS: we’ll add Fly.io as an <code>Identity Provider</code> in AWS IAM, giving us an ID we can plug into any IAM <code>Role</code>.\n</li><li>Also in AWS: we’ll create a <code>Role</code>, give it access to the S3 bucket with our tokenized cat data, and then attach the <code>Identity Provider</code> to it.\n</li><li>In Fly.io, we’ll take the <code>Role</code> ARN we got from step 2 and set it as an environment variable in our app.\n</li></ol>\n\n<p>Our machines will now magically have access to the S3 bucket.</p>\n<h2 id='what-the-what' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-the-what' aria-label='Anchor'></a><span class='plain-code'>What the What</span></h2>\n<p>A reasonable question to ask here is, “where’s the credential”? Ordinarily, to give a Fly Machine access to an AWS resource, you’d use <code>fly secrets set</code> to add an <code>AWS_ACCESS_KEY_ID</code> and <code>AWS_SECRET_ACCESS_KEY</code> to the environment in the Machine. Here, we’re not setting any secrets at all; we’re just adding an ARN — which is not a credential — to the Machine.</p>\n\n<p>Here’s what’s happening.</p>\n\n<p>Fly.io operates an OIDC IdP at <code>oidc.fly.io</code>. 
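The "strictly limited trust" from step 2 lives in the role's trust policy. Here is a hypothetical sketch of what one might look like — the account ID is the standard AWS docs placeholder, the org name is the example from this post, and the condition keys follow AWS's convention of prefixing OIDC claims with the provider URL:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::123456789012:oidc-provider/oidc.fly.io/example"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.fly.io/example:aud": "sts.amazonaws.com"
        },
        "StringLike": {
          "oidc.fly.io/example:sub": "example:weather-cat:*"
        }
      }
    }
  ]
}
```

The `sub` condition is what scopes the role down to one app's Machines rather than your whole org.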
It issues OIDC tokens, exclusively to Fly Machines. AWS can be configured to trust these tokens, on a role-by-role basis. That’s the “secret credential”: the pre-configured trust relationship in IAM, and the public keypairs it manages. You, the user, never need to deal with these keys directly; it all happens behind the scenes, between AWS and Fly.io.</p>\n\n<p><img alt=\"A diagram: STS trusts OIDC.fly.io. OIDC.fly.io trusts flyd. flyd issues a token to the Machine, which proffers it to STS. STS sends an STS cred to the Machine, which then uses it to retrieve model weights from S3.\" src=\"/blog/oidc-cloud-roles/assets/oidc-diagram.webp\" /></p>\n\n<p>The key actor in this picture is <code>STS</code>, the AWS <code>Security Token Service</code>. <code>STS</code>’s main job is to vend short-lived AWS credentials, usually through some variant of an API called <code>AssumeRole</code>. Specifically, in our case: <code>AssumeRoleWithWebIdentity</code> tells <code>STS</code> to cough up an AWS keypair given an OIDC token (that matches a pre-configured trust relationship).</p>\n\n<p>That still leaves the question: how does your code, which is reaching out to the AWS APIs to get cat weights, drive any of this?</p>\n<h2 id='the-init-thickens' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-init-thickens' aria-label='Anchor'></a><span class='plain-code'>The Init Thickens</span></h2>\n<p>Every Fly Machine boots up into an <code>init</code> we wrote in Rust. It has slowly been gathering features.</p>\n\n<p>One of those features, which has been around for a while, is a server for a Unix socket at <code>/.fly/api</code>, which exports a subset of the Fly Machines API to privileged processes in the Machine. Think of it as our answer to the EC2 Instance Metadata Service. 
How it works is, every time we boot a Fly Machine, we pass it a <a href='https://fly.io/blog/macaroons-escalated-quickly/' title=''>Macaroon token</a> locked to that particular Machine; <code>init</code>’s server for <code>/.fly/api</code> is a proxy that attaches that token to requests.</p>\n<div class=\"right-sidenote\"><p>In addition to the API proxy being tricky to SSRF to.</p>\n</div>\n<p>What’s neat about this is that the credential that drives <code>/.fly/api</code> is doubly protected:</p>\n\n<ol>\n<li>The Fly.io platform won’t honor it unless it comes from that specific Fly Machine (<code>flyd</code>, our orchestrator, knows who it’s talking to), <em>and</em>\n</li><li>Ordinary code running in a Fly Machine never gets a copy of the token to begin with.\n</li></ol>\n\n<p>You could rig up a local privilege escalation vulnerability and work out how to steal the Macaroon, but you can’t exfiltrate it productively.</p>\n\n<p>So now you have half the puzzle worked out: OIDC is just part of the <a href='https://fly.io/docs/machines/api/' title=''>Fly Machines API</a> (specifically: <code>/v1/tokens/oidc</code>). 
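The token that comes back is a JWT. After signature verification, a relying party just base64url-decodes the middle segment to read the claims — a minimal sketch, using a fabricated, unsigned token (real verification must check the signature against the issuer's published keys; this only shows the payload encoding):

```python
# Minimal sketch: reading the claims out of an OIDC token (a JWT) the
# way a verifier would *after* checking the signature. The token built
# below is fabricated and unsigned, for illustration only.
import base64, json

def jwt_claims(token: str) -> dict:
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))

# build a fake token with claims shaped like the ones shown below
body = base64.urlsafe_b64encode(json.dumps({
    "aud": "sts.amazonaws.com",
    "sub": "example:weather-cat:ancient-snow-4824",
}).encode()).rstrip(b"=").decode()
token = f"fakeheader.{body}.fakesignature"

assert jwt_claims(token)["aud"] == "sts.amazonaws.com"
```

The `aud` and `sub` claims are what the IAM trust relationship conditions on.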
A Fly Machine can hit a Unix socket and get an OIDC token tailored to that machine:</p>\n<div class=\"highlight-wrapper group relative \">\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-xdcj19sc\">{\n  \"app_id\": \"3671581\",\n  \"app_name\": \"weather-cat\",\n  \"aud\": \"sts.amazonaws.com\",\n  \"image\": \"image:latest\",\n  \"image_digest\": \"sha256:dff79c6da8dd4e282ecc6c57052f7cfbd684039b652f481ca2e3324a413ee43f\",\n  \"iss\": \"https://oidc.fly.io/example\",\n  \"machine_id\": \"3d8d377ce9e398\",\n  \"machine_name\": \"ancient-snow-4824\",\n  \"machine_version\": \"01HZJXGTQ084DX0G0V92QH3XW4\",\n  \"org_id\": \"29873298\",\n  \"org_name\": \"example\",\n  \"region\": \"yyz\",\n  \"sub\": \"example:weather-cat:ancient-snow-4824\"\n} // some OIDC stuff trimmed\n</code></pre>\n  </div>\n</div>\n<p>Look upon this holy blob, sealed with a published key managed by Fly.io’s OIDC vault, and see that there lies within it enough information for AWS <code>STS</code> to decide to issue a session credential.</p>\n\n<p>We have still not completed the puzzle, because while you can probably now see how you’d drive this process with a bunch of new code that you’d tediously write, you are acutely aware that you have not yet endured that tedium — e pur si muove!</p>\n\n<p>One <code>init</code> feature remains to be disclosed, and it’s cute.</p>\n\n<p>If, when <code>init</code> starts in a Fly Machine, it sees an <code>AWS_ROLE_ARN</code> environment variable set, it initiates a little dance; it:</p>\n\n<ol>\n<li>goes off and generates an OIDC token, the way we just described,\n</li><li>saves that OIDC token in a file, <em>and</em>\n</li><li>sets the <code>AWS_WEB_IDENTITY_TOKEN_FILE</code> and <code>AWS_ROLE_SESSION_NAME</code> environment variables for every process it launches.\n</li></ol>\n\n<p>The AWS SDK, linked to your application, does all the rest.</p>\n\n<p>Let’s review: you add an <code>AWS_ROLE_ARN</code> variable to your Fly App, launch a 
Machine, and have it go fetch a file from S3. What happens next is:</p>\n\n<ol>\n<li><code>init</code> detects <code>AWS_ROLE_ARN</code> is set as an environment variable.\n</li><li><code>init</code> sends a request to <code>/v1/tokens/oidc</code> via <code>/.api/proxy</code>.\n</li><li><code>init</code> writes the response to <code>/.fly/oidc_token</code>.\n</li><li><code>init</code> sets <code>AWS_WEB_IDENTITY_TOKEN_FILE</code> and <code>AWS_ROLE_SESSION_NAME</code>.\n</li><li>The entrypoint boots, and (say) runs <code>aws s3api get-object</code>.\n</li><li>The AWS SDK runs through the <a href='https://docs.aws.amazon.com/sdkref/latest/guide/standardized-credentials.html#credentialProviderChain' title=''>credential provider chain</a>.\n</li><li>The SDK sees that <code>AWS_WEB_IDENTITY_TOKEN_FILE</code> is set and calls <code>AssumeRoleWithWebIdentity</code> with the file contents.\n</li><li>AWS verifies the token against <a href='https://oidc.fly.io/' title=''><code>https://oidc.fly.io/</code></a><code>example/.well-known/openid-configuration</code>, which references a key Fly.io manages on isolated hardware.\n</li><li>AWS vends <code>STS</code> credentials for the assumed <code>Role</code>.\n</li><li>The SDK uses the <code>STS</code> credentials to access the S3 bucket.\n</li><li>AWS checks the <code>Role</code>’s IAM policy to see if it has access to the S3 bucket.\n</li><li>AWS returns the contents of the bucket object.\n</li></ol>\n<h2 id='how-much-better-is-this' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#how-much-better-is-this' aria-label='Anchor'></a><span class='plain-code'>How Much Better Is This?</span></h2>\n<p>It is a lot better.</p>\n<div class=\"right-sidenote\"><p>They asymptotically approach the security properties of Macaroon 
tokens.</p>\n</div>\n<p>Most importantly: AWS <code>STS</code> credentials are short-lived. Because they’re generated dynamically, rather than stored in a configuration file or environment variable, they’re already a little bit annoying for an attacker to recover. But they’re also dead in minutes. They have a sharply limited blast radius. They rotate themselves, and fail closed.</p>\n\n<p>They’re also easier to manage. This is a rare instance where you can reasonably drive the entire AWS side of the process from within the web console. Your cloud team adds <code>Roles</code> all the time; this is just a <code>Role</code> with an extra snippet of JSON. The resulting ARN isn’t even a secret; your cloud team could just email or Slack message it back to you.</p>\n\n<p>Finally, they offer finer-grained control.</p>\n\n<p>To understand the last part, let’s look at that extra snippet of JSON (the “Trust Policy”) your cloud team is sticking on the new <code>cat-bucket</code> <code>Role</code>:</p>\n<div class=\"highlight-wrapper group relative \">\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-x99m930o\">{\n    \"Version\": \"2012-10-17\",\n    \"Statement\": [\n        {\n            \"Effect\": \"Allow\",\n            \"Principal\": {\n                \"Federated\": \"arn:aws:iam::123456123456:oidc-provider/oidc.fly.io/example\"\n            },\n            \"Action\": \"sts:AssumeRoleWithWebIdentity\",\n            \"Condition\": {\n              \"StringEquals\": {\n                \"oidc.fly.io/example:aud\": \"sts.amazonaws.com\"\n              },\n              \"StringLike\": {\n                \"oidc.fly.io/example:sub\": \"example:weather-cat:*\"\n              }\n            }\n        }\n    ]\n}\n</code></pre>\n  </div>\n</div><div class=\"right-sidenote\"><p>The <code>aud</code> check guarantees <code>STS</code> will only honor tokens that Fly.io deliberately vended for <code>STS</code>.</p>\n</div>\n<p>Recall the OIDC token we dumped 
earlier; much of what’s in it, we can match in the Trust Policy. Every OIDC token Fly.io generates is going to have a <code>sub</code> field formatted <code>org:app:machine</code>, so we can lock IAM <code>Roles</code> down to organizations, or to specific Fly Apps, or even specific Fly Machine instances.</p>\n<h2 id='and-so' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#and-so' aria-label='Anchor'></a><span class='plain-code'>And So</span></h2>\n<p>In case it’s not obvious: this pattern works for any AWS API, not just S3.</p>\n\n<p>Our OIDC support on the platform and in Fly Machines will set arbitrary OIDC <code>audience</code> strings, so you can use it to authenticate to any OIDC-compliant cloud provider. It won’t be as slick on Azure or GCP, because we haven’t done the <code>init</code> features to light their APIs up with a single environment variable — but those features are easy, and we’re just waiting for people to tell us what they need.</p>\n\n<p>For us, the gold standard for least-privilege, conditional access tokens remains Macaroons, and it’s unlikely that we’re going to do a bunch of internal stuff using OIDC. We even snuck Macaroons into this feature. 
But the security you’re getting from this OIDC dance closes a lot of the gap between hardcoded user credentials and Macaroons, and it’s easy to use — easier, in some ways, than it is to manage role-based access inside of a legacy EC2 deployment!</p>",
      "image": {
        "url": "https://fly.io/blog/oidc-cloud-roles/assets/spooky-security-skeleton-thumb.webp",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
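The `StringLike` condition in the Trust Policy shown in the post is a glob match over the token's `sub` claim (formatted `org:app:machine`). IAM's matcher supports only `*` and `?`; Python's `fnmatch` is a rough local stand-in for checking what a pattern admits, an illustration rather than IAM's actual evaluation engine:

```python
from fnmatch import fnmatchcase

def sub_matches(pattern: str, token_sub: str) -> bool:
    # Approximates IAM's StringLike: '*' matches any run of characters,
    # '?' matches one. (fnmatch also honors [...] classes, which IAM
    # does not, but none appear in these values.)
    return fnmatchcase(token_sub, pattern)

# Locks the Role to the weather-cat app, any Machine in it:
assert sub_matches("example:weather-cat:*", "example:weather-cat:ancient-snow-4824")
# A token from a different app in the same org is rejected:
assert not sub_matches("example:weather-cat:*", "example:other-app:ancient-snow-4824")
# Pinning a single Machine instance is just a fully literal pattern:
assert sub_matches("example:weather-cat:ancient-snow-4824",
                   "example:weather-cat:ancient-snow-4824")
```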
    {
      "id": "https://fly.io/blog/llm-image-description/",
      "title": "Picture This: Open Source AI for Image Description",
      "description": null,
      "url": "https://fly.io/blog/llm-image-description/",
      "published": "2024-05-09T00:00:00.000Z",
      "updated": "2024-05-23T20:00:21.000Z",
      "content": "<div class=\"lead\"><p>I’m Nolan, and I work on Fly Machines here at Fly.io. We’re building a new public cloud—one where you can spin up CPU and GPU workloads, around the world, in a jiffy. <a href=\"https://fly.io/speedrun/\" title=\"\">Try us out</a>; you can be up and running in minutes. This is a post about LLMs being really helpful, and an extensible project you can build with open source on a weekend.</p>\n</div>\n<p>Picture this, if you will.</p>\n\n<p>You’re blind. You’re in an unfamiliar hotel room on a trip to Chicago.</p>\n<div class=\"right-sidenote\"><p>If you live in Chicago IRL, imagine the hotel in Winnipeg, <a href=\"https://www.cbc.ca/history/EPISCONTENTSE1EP10CH3PA5LE.html\" title=\"\">the Chicago of the North</a>.</p>\n</div>\n<p>You’ve absent-mindedly set your coffee down, and can’t remember where. You’re looking for the thermostat so you don’t wake up frozen. Or, just maybe, you’re playing a fun-filled round of “find the damn light switch so your sighted partner can get some sleep already!”</p>\n\n<p>If, like me, you’ve been blind for a while, you have plenty of practice finding things without the luxury of a quick glance around. It may be more tedious than you’d like, but you’ll get it done.</p>\n\n<p>But the speed of innovation in machine learning and large language models has been dizzying, and in 2024 you can snap a photo with your phone and have an app like <a href='https://www.bemyeyes.com/blog/announcing-be-my-ai/' title=''>Be My AI</a> or <a href='https://www.seeingai.com/' title=''>Seeing AI</a> tell you where in that picture it found your missing coffee mug, or where it thinks the light switch is.</p>\n<div class=\"right-sidenote\"><p>Creative switch locations seem to be a point of pride for hotels, so the light game may be good for a few rounds of quality entertainment, regardless of how good your AI is.</p>\n</div>\n<p>This is <em>big</em>. 
It’s hard for me to state just how exciting and empowering AI image descriptions have been for me without sounding like a shill. In the past year, I’ve:</p>\n\n<ul>\n<li>Found shit in strange hotel rooms. \n</li><li>Gotten descriptions of scenes and menus in otherwise inaccessible video games.\n</li><li>Requested summaries of technical diagrams and other materials where details weren’t made available textually. \n</li></ul>\n\n<p>I’ve been consistently blown away at how impressive and helpful AI-created image descriptions have been.</p>\n\n<p>Also…</p>\n<h2 id='which-thousand-words-is-this-picture-worth' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#which-thousand-words-is-this-picture-worth' aria-label='Anchor'></a><span class='plain-code'>Which thousand words is this picture worth?</span></h2>\n<p>As a blind internet user for the last three decades, I have extensive empirical evidence to corroborate what you already know in your heart: humans are pretty flaky about writing useful alt text for all the images they publish. 
This does tend to make large swaths of the internet inaccessible to me!</p>\n\n<p>In just a few years, the state of image description on the internet has gone from complete reliance on the aforementioned lovable, but ultimately tragically flawed, humans, to automated strings of words like <code>Image may contain person, glasses, confusion, banality, disillusionment</code>, to LLM-generated text that reads a lot like it was written by a person, perhaps sipping from a steaming cup of Earl Grey as they reflect on their previous experiences of a background that features a tree with snow on its branches, suggesting that this scene takes place during winter.</p>\n\n<p>If an image is missing alt text, or if you want a second opinion, there are screen-reader addons, like <a href='https://github.com/cartertemm/AI-content-describer/' title=''>this one</a> for <a href='https://www.nvaccess.org/download/' title=''>NVDA</a>, that you can use with an API key to get image descriptions from GPT-4 or Google Gemini as you read. This is awesome! </p>\n\n<p>And this brings me to the nerd snipe. How hard would it be to build an image description service we can host ourselves, using open source technologies? 
It turns out to be spookily easy.</p>\n\n<p>Here’s what I came up with:</p>\n\n<ol>\n<li><a href='https://ollama.com/' title=''>Ollama</a> to run the model\n</li><li>A <a href='https://pocketbase.io' title=''>PocketBase</a> project that provides a simple authenticated API for users to submit images, get descriptions, and ask followup questions about the image\n</li><li>The simplest possible Python client to interact with the PocketBase app on behalf of users\n</li></ol>\n\n<p>The idea is to keep it modular and hackable, so if sentiment analysis or joke creation is your thing, you can swap out image description for that and have something going in, like, a weekend.</p>\n\n<p>If you’re like me, and you go skipping through recipe blogs to find the “go directly to recipe” link, find the code itself <a href='https://github.com/superfly/llm-describer' title=''>here</a>. </p>\n<h2 id='the-llm-is-the-easiest-part' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#the-llm-is-the-easiest-part' aria-label='Anchor'></a><span class='plain-code'>The LLM is the easiest part</span></h2>\n<p>An API to accept images and prompts, run the model, and spit \nout answers sounds like a lot! But it’s the simplest part of this whole thing, because: \nthat’s <a href='https://ollama.com/' title=''>Ollama</a>.</p>\n\n<p>You can just run the Ollama Docker image, get it to grab the model \nyou want to use, and that’s it. There’s your AI server. (We have a <a href='https://fly.io/blog/scaling-llm-ollama/' title=''>blog post</a> \nall about deploying Ollama on Fly.io; Fly GPUs are rad, try ’em out, etc.).</p>\n\n<p>For this project, we need a model that can make sense—or at least words—out of a picture. 
\n<a href='https://llava-vl.github.io/' title=''>LLaVA</a> is a trained, Apache-licensed “large multimodal model” that fits the bill. \nGet the model with the Ollama CLI:</p>\n<div class=\"highlight-wrapper group relative cmd\">\n  <div class='highlight relative group'>\n    <pre class='highlight cmd'><code id=\"code-vsa102iz\">ollama pull llava:34b\n</code></pre>\n  </div>\n</div><div class=\"callout\"><p>If you have hardware that can handle it, you could run this on your computer at home. If you run AI models on a cloud provider, be aware that GPU compute is expensive! <strong class=\"font-semibold text-navy-950\">It’s important to take steps to ensure you’re not paying for a massive GPU 24/7.</strong></p>\n\n<p>On Fly.io, at the time of writing, you’d achieve this with the <a href=\"https://fly.io/docs/apps/autostart-stop/\" title=\"\">autostart and autostop</a> functions of the Fly Proxy, restricting Ollama access to internal requests over <a href=\"https://fly.io/docs/networking/private-networking/#flycast-private-fly-proxy-services\" title=\"\">Flycast</a> from the PocketBase app. That way, if there haven’t been any requests for a few minutes, the Fly Proxy stops the Ollama <a href=\"https://fly.io/docs/machines/\" title=\"\">Machine</a>, which releases the CPU, GPU, and RAM allocated to it. <a href=\"https://fly.io/blog/scaling-llm-ollama/\" title=\"\">Here’s a post</a> that goes into more detail. </p>\n</div><h2 id='a-multi-tool-on-the-backend' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#a-multi-tool-on-the-backend' aria-label='Anchor'></a><span class='plain-code'>A multi-tool on the backend</span></h2>\n<p>I want user auth to make sure just anyone can’t grab my “image description service” and keep it busy generating short stories about their cat. 
If I build this out into a service for others to use, I might also want business logic around plans or\ncredits, or mobile-friendly APIs for use in the field. <a href='https://pocketbase.io' title=''>PocketBase</a> provides a scaffolding for all of it. It’s a Swiss army knife: a Firebase-like API on top of SQLite, complete with authentication, authorization, an admin UI, extensibility in JavaScript and Go, and various client-side APIs.</p>\n<div class=\"right-sidenote\"><p>Yes, <em>of course</em> I’ve used an LLM to generate feline fanfic. Theme songs too. Hasn’t everyone? </p>\n</div>\n<p>I “faked” a task-specific API that supports followup questions by extending PocketBase in Go, modeling requests and responses as <a href='https://pocketbase.io/docs/collections/' title=''>collections</a> (i.e. SQLite tables) with <a href='https://pocketbase.io/docs/go-event-hooks/' title=''>event hooks</a> to trigger pre-set interactions with the Ollama app (via <a href='https://tmc.github.io/langchaingo' title=''>LangChainGo</a>) and the client (via the PocketBase API).</p>\n\n<p>If you’re following along, <a href='https://github.com/superfly/llm-describer/blob/main/describer.go' title=''>here’s the module</a>\nthat handles all that, along with initializing the LLM connection.</p>\n\n<p>In a nutshell, this is the dance:</p>\n\n<ul>\n<li>When a user uploads an image, a hook on the <code>images</code> collection sends the image to Ollama, along with this prompt:\n<code>\"You are a helpful assistant describing images for blind screen reader users. 
Please describe this image.\"</code>\n</li><li>Ollama sends back its response, which the backend 1) passes back to the client and 2) stores in its <code>followups</code> collection for future reference.\n</li><li>If the user responds with a followup question about the image and description, that also \ngoes into the <code>followups</code> collection; user-initiated changes to this collection trigger a hook to chain the new \nfollowup question with the image and the chat history into a new request for the model.\n</li><li>Lather, rinse, repeat.\n</li></ul>\n\n<p>This is a super simple hack to handle followup questions, and it’ll let you keep adding followups until \nsomething breaks. You’ll see the quality of responses get poorer—possibly incoherent—as the context \nexceeds the context window.</p>\n\n<p>I also set up <a href='https://pocketbase.io/docs/api-rules-and-filters/' title=''>API rules</a> in PocketBase,\nensuring that users can’t read from and write to others’ chats with the AI.</p>\n\n<p>If image descriptions aren’t your thing, this business logic is easily swappable \nfor joke generation, extracting details from text, or any other simple task you \nmight want to throw at an LLM. Just slot the best model into Ollama (LLaVA is pretty OK as a general starting point too), and match the PocketBase schema and pre-set prompts to your application.</p>\n<h2 id='a-seedling-of-a-client' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#a-seedling-of-a-client' aria-label='Anchor'></a><span class='plain-code'>A seedling of a client</span></h2>\n<p>With the image description service in place, the user can talk to it with any client that speaks the PocketBase API. 
PocketBase already has SDK clients in JavaScript and Dart, but because my screen reader is <a href='https://github.com/nvaccess/nvda' title=''>written in Python</a>, I went with a <a href='https://pypi.org/project/pocketbase/' title=''>community-created Python library</a>. That way I can build this out into an NVDA add-on \nif I want to.</p>\n\n<p>If you’re a fancy Python developer, you probably have your preferred tooling for\nhandling virtualenvs and friends. I’m not, and since my screen reader doesn’t use those\nanyway, I just <code>pip install</code>ed the library so my client can import it:</p>\n<div class=\"highlight-wrapper group relative cmd\">\n  <div class='highlight relative group'>\n    <pre class='highlight cmd'><code id=\"code-rgh35fwn\">pip install pocketbase\n</code></pre>\n  </div>\n</div>\n<p><a href='https://github.com/superfly/llm-describer/blob/main/__init__.py' title=''>My client</a> is a very simple script. \nIt expects a couple of things: a file called <code>image.jpg</code>, located in the current directory, \nand environment variables to provide the service URL and user credentials to log into it with.</p>\n\n<p>When you run the client script, it uploads the image to the user’s <code>images</code> collection on the \nbackend app, starting the back-and-forth between user and model we saw in the previous section. 
\nThe client prints the model’s output to the CLI and prompts the user to input a followup question, \nwhich it passes up to the <code>followups</code> collection, and so on.</p>\n<h2 id='all-together-now' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#all-together-now' aria-label='Anchor'></a><span class='plain-code'>All together now</span></h2>\n<p>I grabbed <a href='https://unsplash.com/photos/brown-trees-beside-river-under-blue-sky-during-daytime-kSvpTrfhaiU' title=''>this\nimage</a>\nand saved it to a file called <em>image.jpg</em>. </p>\n\n<p>While I knew I was downloading an image of a winter scene, all I see on Unsplash is:</p>\n\n<blockquote>\n<p>brown trees beside river under blue sky during daytime Bright winter landscape\nwith lake, snow, forest, beautiful blue sky and white clouds. 
An example of\ncharming wildlife in Russia.</p>\n</blockquote>\n\n<p>Let’s see what our very own AI describer thinks of this picture:</p>\n<div class=\"highlight-wrapper group relative plain\">\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-4bvdtqnv\">$ python __init__.py\nThe image depicts a serene winter scene. There is a small stream of water flowing through the center, with patches of snow on its surface and along the banks. The surrounding area appears to be a mix of bare trees and shrubs, suggesting it's late in the season or possibly early spring when some plants have started to grow again but haven't yet leafed out.\n\nThe sky is clear and blue, indicating good weather conditions at the time the photo was taken. The sunlight seems to be coming from the left side of the image, casting a warm glow on the scene and creating reflections in the water. There are no visible texts or distinguishing marks that provide additional context about the location or the photographer.\n\nThe overall atmosphere is peaceful and natural, with the quiet beauty of the landscape undisturbed except for the gentle movement of the stream.\n</code></pre>\n  </div>\n</div>\n<p>Is it a stellar description? 
Maybe not, but it certainly gives me a better sense of connection with the scene.</p>\n\n<p>Let’s see how our describer copes with a followup question.</p>\n<div class=\"highlight-wrapper group relative plain\">\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-1x1pz9x9\">Enter your followup question, or 'quit' to quit: What types of trees are in the image?\nSending followup question\nIt's difficult to determine the exact species of trees in the image without more specific details, as many deciduous trees lose their leaves during winter and look quite similar. However, based on the general appearance of the bare branches and the fact that they are leafless, it is likely that these are common deciduous trees such as oaks, maples, or elms.\n\nThe presence of snow and the absence of foliage suggest that this photo was taken during winter or early spring when the trees have not yet started to bloom again. The exact species would require a closer examination of the bark, buds, and other distinguishing features, which are not clearly visible in this image.\n</code></pre>\n  </div>\n</div>\n<p>Boo, the general-purpose LLaVA model couldn’t identify the leafless trees. At least it knows why it can’t. Maybe there’s a better model out \nthere for that. Or we could train one, if we really needed tree identification! We could make every component of \nthis service more sophisticated! </p>\n\n<p>But that I, personally, can make a proof of concept like this with a few days of effort\ncontinues to boggle my mind. Thanks to a handful of amazing open source projects, it’s really, spookily, easy. 
And from here, I (or you) can build out a screen-reader addon, or a mobile app, or a different kind of AI service, with modular changes.</p>\n<h2 id='deployment-notes' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#deployment-notes' aria-label='Anchor'></a><span class='plain-code'>Deployment notes</span></h2>\n<p>On Fly.io, stopping GPU Machines saves you a bunch of money and some carbon footprint, in return for cold-start latency when you make a request for the first time in more than a few minutes. In testing this project, on the <code>a100-40gb</code> Fly Machine preset, the 34b-parameter LLaVA model took several seconds to generate each response. If the Machine was stopped when the request came in, starting it up took another handful of seconds, followed by several tens of seconds to load the model into GPU RAM. The total time from cold start to completed description was about 45 seconds. Just something to keep in mind.</p>\n\n<p>If you’re running Ollama in the cloud, you likely want to put the model onto storage that’s persistent, so you don’t have to download it repeatedly. You could also build the model into a Docker image ahead of deployment.</p>\n\n<p>The PocketBase Golang app compiles to a single executable that you can run wherever.\nI run it on Fly.io, unsurprisingly, and the <a href='https://github.com/superfly/llm-describer/' title=''>repo</a> comes with a Dockerfile and a <a href='https://fly.io/docs/reference/configuration/' title=''><code>fly.toml</code></a> config file, which you can edit to point at your own Ollama instance. It uses a small persistent storage volume for the SQLite database. Under testing, it runs fine on a <code>shared-cpu-1x</code> Machine. </p>",
      "image": {
        "url": "https://fly.io/blog/llm-image-description/assets/image-description-thumb.webp",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/jit-wireguard-peers/",
      "title": "JIT WireGuard",
      "description": null,
      "url": "https://fly.io/blog/jit-wireguard-peers/",
      "published": "2024-03-12T00:00:00.000Z",
      "updated": "2024-05-23T20:00:21.000Z",
      "content": "<div class=\"lead\"><p>We’re Fly.io and we transmute containers into VMs, running them on our hardware around the world with the power of Firecracker alchemy. We do a lot of stuff with WireGuard, which has become a part of our customer API. This is a quick story about some tricks we played to make WireGuard faster and more scalable for the hundreds of thousands of people who now use it here.</p>\n</div>\n<p>One of many odd decisions we’ve made at Fly.io is how we use WireGuard. It’s not just that we use it in many places where other shops would use HTTPS and REST APIs. We’ve gone a step beyond that: every time you run <code>flyctl</code>, our lovable, sprawling CLI, it conjures a TCP/IP stack out of thin air, with its own IPv6 address, and speaks directly to Fly Machines running on our networks.</p>\n\n<p>There are plusses and minuses to this approach, which we talked about <a href='https://fly.io/blog/our-user-mode-wireguard-year/' title=''>in a blog post a couple years back</a>. Some things, like remote-operated Docker builders, get easier to express (a Fly Machine, as far as <code>flyctl</code> is concerned, might as well be on the same LAN). But everything generally gets trickier to keep running reliably.</p>\n\n<p>It was a decision. 
We own it.</p>\n\n<p>Anyways, we’ve made some improvements recently, and I’d like to talk about them.</p>\n<h2 id='where-we-left-off' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#where-we-left-off' aria-label='Anchor'></a><span class='plain-code'>Where we left off</span></h2>\n<p>Until a few weeks ago, our gateways ran on a pretty simple system.</p>\n\n<ol>\n<li>We operate dozens of “gateway” servers around the world, whose sole purpose is to accept incoming WireGuard connections and connect them to the appropriate private networks.\n</li><li>Any time you run <code>flyctl</code> and it needs to talk to a Fly Machine (to build a container, pop an SSH console, copy files, or proxy to a service you’re running), it spawns or connects to a background agent process.\n</li><li>The first time it runs, the agent generates a new WireGuard peer configuration from our GraphQL API. WireGuard peer configurations are very simple: just a public key and an address to connect to.\n</li><li>Our API in turn takes that peer configuration and sends it to the appropriate gateway (say, <code>ord</code>, if you’re near Chicago) via an RPC we send over the NATS messaging system.\n</li><li>On the gateway, a service called <code>wggwd</code> accepts that configuration, saves it to a SQLite database, and adds it to the kernel using WireGuard’s Golang libraries. 
<code>wggwd</code> acknowledges the installation of the peer to the API.\n</li><li>The API replies to your GraphQL request, with the configuration.\n</li><li>Your <code>flyctl</code> connects to the WireGuard peer, which works, because you receiving the configuration means it’s installed on the gateway.\n</li></ol>\n\n<p>I copy-pasted those last two bullet points from <a href='https://fly.io/blog/our-user-mode-wireguard-year/' title=''>that two-year-old post</a>, because when it works, it does <em>just work</em> reasonably well. (We ultimately did end up defaulting everybody to WireGuard-over-WebSockets, though.)</p>\n\n<p>But if it always worked, we wouldn’t be here, would we?</p>\n\n<p>We ran into two annoying problems:</p>\n\n<p>One: NATS is fast, but doesn’t guarantee delivery. Back in 2022, Fly.io was pretty big on NATS internally. We’ve moved away from it. For instance, our <a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''>internal <code>flyd</code> API</a> used to be driven by NATS; today, it’s HTTP.  Our NATS cluster was losing too many messages to host a reliable API on it. Scaling back our use of NATS made WireGuard gateways better, but still not great.</p>\n\n<p>Two: When <code>flyctl</code> exits, the WireGuard peer it created sticks around on the gateway. Nothing cleans up old peers. After all, you’re likely going to come back tomorrow and deploy a new version of your app, or <code>fly ssh console</code> into it to debug something. Why remove a peer just to re-add it the next day?</p>\n\n<p>Unfortunately, the vast majority of peers are created by <code>flyctl</code> in CI jobs, which don’t have persistent storage and can’t reconnect to the same peer the next run; they generate new peers every time, no matter what.</p>\n\n<p>So, we ended up with a not-reliable-enough provisioning system, and gateways with hundreds of thousands of peers that will never be used again. 
The high stale peer count made kernel WireGuard operations very slow (especially loading all the peers back into the kernel after a gateway server reboot) and even triggered some kernel panics.</p>\n\n<p>There had to be</p>\n<h2 id='a-better-way' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#a-better-way' aria-label='Anchor'></a><span class='plain-code'>A better way.</span></h2>\n<p>Storing bajillions of WireGuard peers is no big challenge for any serious n-tier RDBMS. This isn’t “big data”. The problem we have at Fly.io is that our gateways don’t have serious n-tier RDBMSs. They’re small. Scrappy. They live off the land.</p>\n\n<p>Seriously, though: you could store every WireGuard peer everybody has ever used at Fly.io in a single SQLite database, easily. What you can’t do is store them all in the Linux kernel.</p>\n\n<p>So, at some point, as you push more and more peer configurations to a gateway, you have to start making decisions about which peers you’ll enable in the kernel, and which you won’t.</p>\n\n<p>Wouldn’t it be nice if we just didn’t have this problem? What if, instead of pushing configs to gateways, we had the gateways pull them from our API on demand?</p>\n\n<p>If you did that, peers would only have to be added to the kernel when the client wanted to connect. You could yeet them out of the kernel any time you wanted; the next time the client connected, they’d just get pulled again, and everything would work fine.</p>\n\n<p>The problem you quickly run into when building this design is that Linux kernel WireGuard doesn’t have a feature for installing peers on demand. 
However:</p>\n<h2 id='it-is-possible-to-jit-wireguard-peers' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#it-is-possible-to-jit-wireguard-peers' aria-label='Anchor'></a><span class='plain-code'>It is possible to JIT WireGuard peers</span></h2>\n<p>The Linux kernel’s <a href='https://github.com/WireGuard/wgctrl-go' title=''>interface for configuring WireGuard</a> is <a href='https://docs.kernel.org/userspace-api/netlink/intro.html' title=''>Netlink</a> (which is basically a way to create a userland socket to talk to a kernel service). Here’s a <a href='https://github.com/WireGuard/wg-dynamic/blob/master/netlink.h' title=''>summary of it as a C API</a>. Note that there’s no API call to subscribe for “incoming connection attempt” events.</p>\n\n<p>That’s OK! We can just make our own events. WireGuard connection requests are packets, and they’re easily identifiable, so we can efficiently snatch them with a BPF filter and a <a href='https://github.com/google/gopacket' title=''>packet socket</a>.</p>\n<div class=\"callout\"><p>Most of the time, it’s even easier for us to get the raw WireGuard packets, because our users now default to WebSockets WireGuard (which is just an unauthenticated WebSockets connect that shuttles framed UDP packets to and from an interface on the gateway), so that people who have trouble talking end-to-end in UDP can bring connections up.</p>\n</div>\n<p>We own the daemon code for that, and can just hook the packet receive function to snarf WireGuard packets.</p>\n\n<p>It’s not obvious, but WireGuard doesn’t have notions of “client” or “server”. It’s a pure point-to-point protocol; peers connect to each other when they have traffic to send. 
The first peer to connect is called the <strong class='font-semibold text-navy-950'>initiator</strong>, and the peer it connects to is the <strong class='font-semibold text-navy-950'>responder</strong>.</p>\n<div class=\"right-sidenote\"><p><a href=\"https://www.wireguard.com/papers/wireguard.pdf\" title=\"\"><em>The WireGuard paper</em></a> <em>is a good read.</em></p>\n</div>\n<p>For Fly.io, <code>flyctl</code> is typically our initiator,  sending a single UDP packet to the gateway, which is the responder. According <a href='https://www.wireguard.com/papers/wireguard.pdf' title=''>to the WireGuard paper</a>, this first packet is a <code>handshake initiation</code>.  It gets better: the packet type is recorded in a single plaintext byte. So this simple BPF filter catches all the incoming connections: <code>udp and dst port 51820 and udp[8] = 1</code>.</p>\n\n<p>In most other protocols, we’d be done at this point; we’d just scrape the username or whatnot out of the packet, go fetch the matching configuration, and install it in the kernel. With WireGuard, not so fast. WireGuard is based on Trevor Perrin’s <a href='http://www.noiseprotocol.org/' title=''>Noise Protocol Framework</a>, and Noise goes way out of its way to <a href='http://www.noiseprotocol.org/noise.html#identity-hiding' title=''>hide identities</a> during handshakes. To identify incoming requests, we’ll need to run enough Noise cryptography to decrypt the identity.</p>\n\n<p>The code to do this is fussy, but it’s relatively short (about 200 lines). Helpfully, the kernel Netlink interface will give a privileged process the private key for an interface, so the secrets we need to unwrap WireGuard are easy to get. Then it’s just a matter of running the first bit of the Noise handshake. 
If you’re that kind of nerdy, <a href='https://gist.github.com/tqbf/9f2c2852e976e6566f962d9bca83062b' title=''>here’s the code.</a></p>\n\n<p>At this point, we have the event feed we wanted: the public keys of every user trying to make a WireGuard connection to our gateways. We keep a rate-limited cache in SQLite, and when we see new peers, we’ll make an internal HTTP API request to fetch the matching peer information and install it. This fits nicely into the little daemon that already runs on our gateways to manage WireGuard, and allows us to ruthlessly and recklessly remove stale peers with a <code>cron</code> job.</p>\n\n<p>But wait! There’s more! We bounced this plan off Jason Donenfeld, and he tipped us off on a sneaky feature of the Linux WireGuard Netlink interface.</p>\n<div class=\"right-sidenote\"><p>Jason is the hardest working person in show business.</p>\n</div>\n<p>Our API fetch for new peers is generally not going to be fast enough to respond to the first handshake initiation message a new client sends us. That’s OK; WireGuard is pretty fast about retrying. But we can do better.</p>\n\n<p>When we get an incoming initiation message, we have the 4-tuple address of the desired connection, including the ephemeral source port <code>flyctl</code> is using. We can install the peer as if we’re the initiator, and <code>flyctl</code> is the responder. The Linux kernel will initiate a WireGuard connection back to <code>flyctl</code>. This works; the protocol doesn’t care a whole lot who’s the server and who’s the client. 
We get new connections established about as fast as they can possibly be installed.</p>\n<figure class=\"post-cta\">\n  <figcaption>\n    <h1>Launch an app in minutes</h1>\n    <p>Speedrun an app onto Fly.io and get your own JIT WireGuard peer ✨</p>\n      <a class=\"btn btn-lg\" href=\"/docs/speedrun/\">\n        Speedrun  <span class='opacity-50'>→</span>\n      </a>\n  </figcaption>\n  <div class=\"image-container\">\n    <img src=\"/static/images/cta-cat.webp\" srcset=\"/static/images/[email protected] 2x\" alt=\"\">\n  </div>\n</figure>\n\n<h2 id='look-at-this-graph' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#look-at-this-graph' aria-label='Anchor'></a><span class='plain-code'>Look at this graph</span></h2>\n<p>We’ve been running this in production for a few weeks and we’re feeling pretty happy about it. We went from thousands, or hundreds of thousands, of stale WireGuard peers on a gateway to what rounds to none. Gateways now hold a lot less state, are faster at setting up peers, and can be rebooted without having to wait for many unused peers to be loaded back into the kernel.</p>\n\n<p>I’ll leave you with this happy Grafana chart from the day of the switchover.</p>\n\n<p><img alt=\"a Grafana chart of 'kernel_stale_wg_peer_count' vs. time. For the first few hours, all traces are flat. Most are at values between 0 and 50,000 and the top-most is just under 550,000. 
Towards the end of the graph, each line in turn jumps sharply down to the bottom, and at the end of the chart all datapoints are indistinguishable from 0.\" src=\"/blog/jit-wireguard-peers/assets/wireguard-peers-graph.webp\" /></p>\n\n<p><strong class='font-semibold text-navy-950'>Editor’s note:</strong> Despite our tearful protests, Lillian has decided to move on from Fly.io to explore new pursuits. We wish her much success and happiness! ✨</p>",
      "image": {
        "url": "https://fly.io/blog/jit-wireguard-peers/assets/network-thumbnail.webp",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/fks-beta-live/",
      "title": "Fly Kubernetes does more now",
      "description": null,
      "url": "https://fly.io/blog/fks-beta-live/",
      "published": "2024-03-07T00:00:00.000Z",
      "updated": "2024-04-24T22:38:38.000Z",
      "content": "<div class=\"lead\"><p>Eons ago, we <a href=\"https://fly.io/blog/fks/\" title=\"\">announced</a> we were working on <a href=\"https://fly.io/docs/kubernetes/\" title=\"\">Fly Kubernetes</a>. It drummed up enough excitement to prove we were heading in the right direction. So, we got hard at work to get from barebones “early access” to a beta release. We’ll be onboarding customers to the closed beta over the next few weeks. Email us at <a href=\"mailto:[email protected]\">[email protected]</a> and we’ll hook you up.</p>\n</div>\n<p>Fly Kubernetes is the “blessed path”™️ to using Kubernetes backed by Fly.io infrastructure. Or, in simpler terms, it is our managed Kubernetes service. We take care of the complexity of operating the Kubernetes control plane, leaving you with the unfettered joy of deploying your Kubernetes workloads. If you love Fly.io and K8s, this product is for you.</p>\n<h2 id='what-even-is-a-kubernete' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#what-even-is-a-kubernete' aria-label='Anchor'></a><span class='plain-code'>What even is a Kubernete?</span></h2>\n<p>So how did this all come to be—and what even is a Kubernete?</p>\n<div class=\"right-sidenote\"><p>You can see more fun details in <a href=\"https://fly.io/blog/fks/\" title=\"\">Introducing Fly Kubernetes</a>.</p>\n</div>\n<p>If you wade through all the YAML and <a href='https://landscape.cncf.io/' title=''>CNCF projects</a>, what’s left is an API for declaring workloads and how they should be accessed.</p>\n\n<p>But that’s not what people usually talk / groan about. It’s everything else that comes along with adopting Kubernetes: a container runtime (CRI), networking between workloads (CNI), which leads to DNS (CoreDNS). 
Then you layer on Prometheus for metrics and whatever the logging daemon du jour is at the time. Now you get to debate which Ingress—strike that—<em>Gateway</em> API to deploy, and if the next thing has anything to do with a Service Mess, then as they like to say where I live, “bless your heart”.</p>\n\n<p>Finally, there’s capacity planning. You’ve got to pick and choose where your <a href='https://kubernetes.io/docs/concepts/architecture/nodes/' title=''>Nodes</a> will run and what they’ll look like in order to configure and run the workloads.</p>\n\n<p>When we began thinking about what a Fly Kubernetes Service could look like, we started from first principles, as we do with most everything here. The best way we can describe it is the <a href='https://www.youtube.com/watch?v=Ddk9ci6geSs' title=''>scene from Iron Man 2 when Tony Stark discovers a new element</a>. As he’s looking at the knowledge left behind by those who came before, he starts to imagine something entirely different and more capable than could have been accomplished previously. That’s what happened to JP, but with K3s and Virtual Kubelet.</p>\n<h2 id='ok-then-wtf-whats-the-fks' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#ok-then-wtf-whats-the-fks' aria-label='Anchor'></a><span class='plain-code'>OK then, WTF (what’s the FKS)?</span></h2>\n<p>We looked at what people need to get started—the API—and then started peeling away all the noise, filling in the gaps to connect things together to provide the power. 
Here’s how this looks currently:</p>\n\n<ul>\n<li>Containerd/CRI → <a href='https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/' title=''>flyd</a> + Firecracker + <a href='https://fly.io/blog/docker-without-docker/' title=''>our init</a>: our system transmogrifies Docker containers into Firecracker microVMs\n</li><li>Networking/CNI → Our <a href='https://fly.io/blog/ipv6-wireguard-peering/' title=''>internal WireGuard mesh</a> connects your pods together\n</li><li>Pods → Fly Machines VMs\n</li><li>Secrets → Secrets, only not the base64’d kind\n</li><li>Services → The Fly Proxy\n</li><li>CoreDNS → CoreDNS (to be replaced with our custom internal DNS)\n</li><li>Persistent Volumes → Fly Volumes (coming soon)\n</li></ul>\n\n<p>Now…not everything is a one-to-one comparison, and we explicitly did not set out to support any and every configuration. We aren’t dealing with resources like Network Policy and init containers, though we’re also not completely ignoring them. By mapping many of the core primitives of Kubernetes to a Fly.io resource, we’re able to focus on continuing to build the primitives that make our cloud better for workloads of all shapes and sizes.</p>\n\n<p>A key thing to notice above is that there’s no “Node”.</p>\n\n<p><a href='https://virtual-kubelet.io/' title=''>Virtual Kubelet</a> plays a central role in FKS. It’s magic, really. A Virtual Kubelet acts as if it’s a standard Kubelet running on a Node, eager to run your workloads. However, there’s no Node backing it. It instead behaves like an API, receiving requests from Kubernetes and transforming them into requests to deploy on a cloud compute service. In our case, that’s Fly Machines.</p>\n\n<p>So what we have is Kubernetes calling out to our <a href='https://virtual-kubelet.io/docs/providers/' title=''>Virtual Kubelet provider</a>, a small Golang program we run alongside K3s, to create and run your pod. 
It creates <a href='https://fly.io/blog/docker-without-docker/' title=''>your pod as a Fly Machine</a>, via the <a href='/docs/machines/api/' title=''>Fly Machines API</a>, deploying it to any underlying host within that region. This shifts the burden of managing hardware capacity from you to us. We think that’s a cool trick—thanks, Virtual Kubelet magic!</p>\n<h2 id='speedrun' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#speedrun' aria-label='Anchor'></a><span class='plain-code'>Speedrun</span></h2>\n<p>You can deploy your workloads (including GPUs) across any of our available regions using the Kubernetes API.</p>\n\n<p>You create a cluster with <code>flyctl</code>:</p>\n<div class=\"highlight-wrapper group relative cmd\">\n  <div class='highlight relative group'>\n    <pre class='highlight cmd'><code id=\"code-fnxi6rft\">fly ext k8s create --name hello --org personal --region iad\n</code></pre>\n  </div>\n</div>\n<p>When a cluster is created, it has the standard <code>default</code> namespace. 
You can inspect it:</p>\n<div class=\"highlight-wrapper group relative cmd\">\n  <div class='highlight relative group'>\n    <pre class='highlight cmd'><code id=\"code-92wwv6kq\">kubectl get ns default --show-labels\n</code></pre>\n  </div>\n</div><div class=\"highlight-wrapper group relative output\">\n  <div class='highlight relative group'>\n    <pre class='highlight output whitespace-pre'><code id=\"code-mk490mip\">NAME      STATUS   AGE   LABELS\ndefault   Active   20d   fly.io/app=fks-default-7zyjm3ovpdxmd0ep,kubernetes.io/metadata.name=default\n</code></pre>\n  </div>\n</div>\n<p>The <code>fly.io/app</code> label shows the name of the Fly App that corresponds to your cluster.</p>\n\n<p>It would seem appropriate to deploy the <a href='https://github.com/kubernetes-up-and-running/kuard' title=''>Kubernetes Up And Running demo</a> here, but since your pods are connected over an <a href='https://fly.io/blog/ipv6-wireguard-peering/' title=''>IPv6 WireGuard mesh</a>, we’re going to use a <a href='https://github.com/jipperinbham/kuard' title=''>fork</a> with support for <a href='https://github.com/kubernetes-up-and-running/kuard/issues/46' title=''>IPv6 DNS</a>.</p>\n<div class=\"highlight-wrapper group relative cmd\">\n  <div class='highlight relative group'>\n    <pre class='highlight cmd'><code id=\"code-7qz94xki\">kubectl run \\\n  --image=ghcr.io/jipperinbham/kuard-amd64:blue \\\n  --labels=\"app=kuard-fks\" \\\n  kuard\n</code></pre>\n  </div>\n</div>\n<p>And you can see its Machine representation via:</p>\n<div class=\"highlight-wrapper group relative cmd\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-1wk7f1q0\"\n  >\n    <svg class=\"w-4 h-4 
pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight cmd'><code id=\"code-1wk7f1q0\">fly machine list --app fks-default-7zyjm3ovpdxmd0ep\n</code></pre>\n  </div>\n</div><div class=\"highlight-wrapper group relative output\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent 
group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-7rbzov1i\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight output whitespace-pre'><code id=\"code-7rbzov1i\">ID              NAME        STATE   REGION  IMAGE                           IP ADDRESS                 
     VOLUME  CREATED                 LAST UPDATED            APP PLATFORM    PROCESS GROUP   SIZE\n1852291c46ded8  kuard       started iad     jipperinbham/kuard-amd64:blue   fdaa:0:48c8:a7b:228:4b6d:6e20:2         2024-03-05T18:54:41Z    2024-03-05T18:54:44Z                                    shared-cpu-1x:256MB\n</code></pre>\n  </div>\n</div>\n<p></div></p>\n\n<p>This is important! Your pod is a Fly Machine! While we don’t yet support all kubectl features, Fly.io tooling will “just work” for cases where we don’t yet support the kubectl way. So, for example, we don’t have <code>kubectl port-forward</code> and <code>kubectl exec</code>, but you can use flyctl to forward ports and get a shell into a pod.</p>\n\n<p>Expose it to your internal network using the standard ClusterIP Service:</p>\n<div class=\"highlight-wrapper group relative cmd\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-1sjiwcq9\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent 
group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight cmd'><code id=\"code-1sjiwcq9\">kubectl expose pod kuard \\\n  --name=kuard \\\n  --port=8080 \\\n  --target-port=8080 \\\n  --selector='app=kuard-fks'\n</code></pre>\n  </div>\n</div>\n<p>ClusterIP Services work natively, and Fly.io internal DNS supports them. Within the cluster, CoreDNS works too.</p>\n\n<p>Access this Service locally via <a href='https://fly.io/docs/networking/private-networking/#flycast-private-load-balancing' title=''>flycast</a>: Get connected to your org’s <a href='https://fly.io/docs/networking/private-networking/' title=''>6PN private WireGuard network</a>. 
Get kubectl to describe the <code>kuard</code> Service:</p>\n<div class=\"highlight-wrapper group relative cmd\">\n  <div class='highlight relative group'>\n    <pre class='highlight cmd'><code id=\"code-hy5q54ru\">kubectl describe svc kuard\n</code></pre>\n  </div>\n</div><div class=\"highlight-wrapper group relative output\">\n  <div class='highlight relative group'>\n    <pre class='highlight output'><code id=\"code-a8mzw85a\">Name:              kuard\nNamespace:         default\nLabels:            app=kuard-fks\nAnnotations:       fly.io/clusterip-allocator: configured\n                   service.fly.io/sync-version: 11507529969321451315\nSelector:          app=kuard-fks\nType:              ClusterIP\nIP Family Policy:  SingleStack\nIP Families:       IPv6\nIP:                fdaa:0:48c8:0:1::1a\nIPs:               fdaa:0:48c8:0:1::1a\nPort:              &lt;unset&gt;  8080/TCP\nTargetPort:        8080/TCP\nEndpoints:         [fdaa:0:48c8:a7b:228:4b6d:6e20:2]:8080\nSession Affinity:  None\nEvents:            &lt;none&gt;\n</code></pre>\n  </div>\n</div>\n<p>You can pull the Service’s IP address out of the output above and reach the KUARD UI with it: in this case, <code>http://[fdaa:0:48c8:0:1::1a]:8080</code>.</p>\n\n<p>Using internal DNS: <code>http://&lt;service_name&gt;.svc.&lt;app_name&gt;.flycast:8080</code>. 
Or, in our example: <code>http://kuard.svc.fks-default-7zyjm3ovpdxmd0ep.flycast:8080</code>.</p>\n\n<p>And finally CoreDNS: <code>&lt;service_name&gt;.&lt;namespace&gt;.svc.cluster.local</code> resolves to the <code>fdaa</code> IP and is routable within the cluster.</p>\n<figure class=\"post-cta\">\n  <figcaption>\n    <h1>Get in on the FKS beta</h1>\n    <p>Email us at [email protected]</p>\n  </figcaption>\n  <div class=\"image-container\">\n    <img src=\"/static/images/cta-cat.webp\" srcset=\"/static/images/[email protected] 2x\" alt=\"\">\n  </div>\n</figure>\n\n<h2 id='pricing' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#pricing' aria-label='Anchor'></a><span class='plain-code'>Pricing</span></h2>\n<p>The Fly Kubernetes Service is free during the beta. Fly Machines and Fly Volumes you create with it will cost the <a href='https://fly.io/docs/about/pricing/' title=''>same as for your other Fly.io projects</a>. It’ll be <a href='https://fly.io/docs/about/pricing/#fly-kubernetes' title=''>$75/mo per cluster</a> after that, plus the cost of the other resources you create.</p>\n<h2 id='today-and-the-future' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#today-and-the-future' aria-label='Anchor'></a><span class='plain-code'>Today and the future</span></h2>\n<p>Today, Fly Kubernetes supports only a portion of the Kubernetes API. You can deploy pods using Deployments/ReplicaSets. Pods can communicate via Services using the standard K8s DNS format. 
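</p>

<p>Both internal name forms are mechanical enough to sketch. Here is a hypothetical helper (the <code>kuard</code> service and app name come from the walkthrough above):</p>

```python
def flycast_url(service: str, app: str, port: int = 8080) -> str:
    # Fly.io internal DNS form: service_name.svc.app_name.flycast
    return f"http://{service}.svc.{app}.flycast:{port}"

def cluster_local_name(service: str, namespace: str = "default") -> str:
    # In-cluster CoreDNS form; resolves to the same fdaa address.
    return f"{service}.{namespace}.svc.cluster.local"

print(flycast_url("kuard", "fks-default-7zyjm3ovpdxmd0ep"))
print(cluster_local_name("kuard"))
```

<p>Either name reaches the same Service; the flycast form works from anywhere on your 6PN network, the <code>cluster.local</code> form from inside the cluster.</p>

<p>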
Ephemeral and persistent volumes are supported.</p>\n\n<p>The most notable absences are multi-container pods, StatefulSets, network policies, horizontal pod autoscaling, and emptyDir volumes. We’re working on supporting autoscaling and emptyDir volumes in the coming weeks, and multi-container pods in the coming months.</p>\n\n<p>If you’ve made it this far and are eagerly awaiting your chance to tell us and the rest of the internet “this isn’t Kubernetes!”, well, we agree! It’s not something we take lightly. We’re still building, and conformance tests may be in the future for FKS. We’ve made a deliberate decision to care only about fast-launching VMs as the one and only way to run workloads on our cloud. And we also know enough of our customers would like to use the Kubernetes API to create a fast-launching VM in the form of a Pod, and that’s where this story begins.</p>",
      "image": {
        "url": "https://fly.io/blog/fks-beta-live/assets/fks-thumb.webp",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/tigris-public-beta/",
      "title": "Globally Distributed Object Storage with Tigris",
      "description": null,
      "url": "https://fly.io/blog/tigris-public-beta/",
      "published": "2024-02-15T00:00:00.000Z",
      "updated": "2024-04-24T22:38:38.000Z",
      "content": "<div class=\"lead\"><p>We’re Fly.io and we transmute containers into VMs, running them on our hardware around the world with the power of Firecracker alchemy. That’s pretty cool, but we want to talk about something someone else built, that <a href=\"https://fly.io/docs/reference/tigris/\" title=\"\">you can use today</a> to build applications.</p>\n</div>\n<p>There are three hard things in computer science:</p>\n\n<ol>\n<li>Cache invalidation\n</li><li>Naming things\n</li><li><a href='https://aws.amazon.com/s3/' title=''>Doing a better job than Amazon of storing files</a>\n</li></ol>\n\n<p>Of all the annoying software problems that have no business being annoying, handling a file upload in a full-stack application stands apart, a universal if tractable malady, the plantar fasciitis of programming.</p>\n\n<p>Now, the actual act of clients placing files on servers is straightforward. Your framework <a href='https://hexdocs.pm/phoenix/file_uploads.html' title=''>has</a> <a href='https://edgeguides.rubyonrails.org/active_storage_overview.html' title=''>a</a> <a href='https://docs.djangoproject.com/en/5.0/topics/http/file-uploads/' title=''>feature</a> <a href='https://expressjs.com/en/resources/middleware/multer.html' title=''>that</a> <a href='https://github.com/yesodweb/yesod-cookbook/blob/master/cookbook/Cookbook-file-upload-saving-files-to-server.md' title=''>does</a> <a href='https://laravel.com/docs/10.x/filesystem' title=''>it</a>. What’s hard is making sure that uploads stick around to be downloaded later.</p>\n<aside class=\"right-sidenote\"><p>(yes, yes, we know, <a href=\"https://youtu.be/b2F-DItXtZs?t=102\" title=\"\">sharding /dev/null</a> is faster)</p>\n</aside>\n<p>Enter object storage, a pattern you may know by its colloquial name “S3”. Object storage occupies a funny place in software architecture, somewhere between a database and a filesystem. 
It’s like <a href='https://man7.org/linux/man-pages/man3/malloc.3.html' title=''><code>malloc</code></a><code>()</code>, but for cloud storage instead of program memory.</p>\n\n<p><a href='https://www.kleenex.com/en-us/' title=''>S3</a>—err, object storage — is so important that it was the second AWS service ever introduced (EC2 was not the first!). Everybody wants it. We know, because they keep asking us for it.</p>\n\n<p>So why didn’t we build it?</p>\n\n<p>Because we couldn’t figure out a way to improve on S3. And we still haven’t! But someone else did, at least for the kinds of applications we see on Fly.io.</p>\n<h2 id='but-first-some-back-story' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#but-first-some-back-story' aria-label='Anchor'></a><span class='plain-code'>But First, Some Back Story</span></h2>\n<p>S3 checks all the boxes. It’s trivial to use. It’s efficient and cost-effective. It has redundancies that would make a DoD contractor blush. It integrates with archival services like Glacier. And every framework supports it. At some point, the IETF should just take a deep sigh and write an S3 API RFC, XML signatures and all.</p>\n\n<p>There’s at least one catch, though.</p>\n\n<p>Back in, like, ‘07 people ran all their apps from a single city. S3 was designed to work for those kinds of apps. The data, the bytes on the disks (or whatever weird hyperputer AWS stores S3 bytes on), live in one place. A specific place. In a specific data center. As powerful and inspiring as The Architects are, they are mortals, and must obey the laws of physics.</p>\n\n<p>This observation feels banal, until you realize how apps have changed in the last decade. Apps and their users don’t live in one specific place. They live all over the world. 
When users are close to the S3 data center, things are amazing! But things get less amazing the further away you get from the data center, and even less amazing the smaller and more frequent your reads and writes are.</p>\n\n<p>(Thought experiment: you have to pick one place in the world to route all your file storage. Where is it? Is it <a href='https://www.tripadvisor.com/Restaurant_Review-g30246-d1956555-Reviews-Ford_s_Fish_Shack_Ashburn-Ashburn_Loudoun_County_Virginia.html' title=''>Loudoun County, Virginia</a>?)</p>\n\n<p>So, for many modern apps, you end up having to <a href='https://stackoverflow.com/questions/32426249/aws-s3-bucket-with-multiple-regions' title=''>write things into different regions</a>, so that people close to the data get it from a region-specific bucket. Doing that pulls in CDN-caching things that complicated your application and put barriers between you and your data. Before you know it, you’re wearing custom orthotics on your, uh, developer feet. (<em>I am done with this metaphor now, I promise.</em>)</p>\n<aside class=\"right-sidenote\"><p>(well, okay, Backblaze B2 because somehow my bucket fits into their free tier, but you get the idea)</p>\n</aside>\n<p>Personally, I know this happens. Because I had to build one! I run a <a href='https://xeiaso.net/blog/xedn/' title=''>CDN backend</a> that’s a caching proxy for S3 in six continents across the world. All so that I can deliver images and video efficiently for the readers of my blog.</p>\n<aside class=\"right-sidenote\"><p>(shut up, it’s a sandwich)</p>\n</aside>\n<p>What if data was really global? For some applications, it might not matter much. But for others, it matters a lot. When a sandwich lover in Australia snaps a picture of a <a href='https://en.wikipedia.org/wiki/Hamdog' title=''>hamdog</a>, the people most likely to want to see that photo are also in Australia. 
Routing those uploads and downloads through one building in Ashburn is no way to build a sandwich reviewing empire.</p>\n\n<p>Localizing all the data sounds like a hard problem. What if you didn’t need to change anything on your end to accomplish it?</p>\n<h2 id='show-me-a-hero' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#show-me-a-hero' aria-label='Anchor'></a><span class='plain-code'>Show Me A Hero</span></h2>\n<p>Building a miniature CDN infrastructure just to handle file uploads seems like the kind of thing that could take a week or so of tinkering. The Fly.io unified theory of cloud development is that solutions are completely viable for full-stack developers only when they take less than 2 hours to get working.</p>\n\n<p>AWS agrees, which is why they have a SKU for it, <a href='https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/cloudfront_distribution' title=''>called Cloudfront</a>, which will, at some variably metered expense, optimize the read side of a single-write-region bucket: they’ll set up <a href='https://fly.io/blog/the-5-hour-content-delivery-network/' title=''>a simple caching CDN</a> for you. You can probably get S3 and Cloudfront working within 2 hours, especially if you’ve set it up before.</p>\n\n<p>Our friends at Tigris have this problem down to single-digit minutes, and what they came up with is a lot cooler than a cache CDN.</p>\n\n<p>Here’s how it works. Tigris runs redundant FoundationDB clusters in our regions to track objects. 
They use Fly.io’s NVMe volumes as a first level of cached raw byte store, and a queuing system modelled on <a href='https://www.foundationdb.org/files/QuiCK.pdf' title=''>Apple’s QuiCK paper</a> to distribute object data to multiple replicas, to regions where the data is in demand, and to third-party object stores… like S3.</p>\n\n<p>If your objects are less than about 128 kilobytes, Tigris makes them instantly global. By default! Things are just snappy, all over the world, automatically, because they’ve done all the work.</p>\n\n<p>But it gets better, because Tigris is also much more flexible than a simple cache CDN. It’s globally distributed from the jump, with inter-region routing baked into its distribution layer. Tigris isn’t a CDN, but rather a toolset that you can use to build arbitrary CDNs, with consistency guarantees, instant purge, and relay regions.</p>\n\n<p>There’s a lot going on in this architecture, and it’d be fun to dig into it more. But for now, you don’t have to understand any of it. Because Tigris ties all this stuff together with an S3-compatible object storage API. 
If your framework can talk to S3, it can use Tigris.</p>\n<h2 id='fly-storage' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#fly-storage' aria-label='Anchor'></a><span class='plain-code'><code>fly storage</code></span></h2>\n<p>To get started, run the <code>fly storage create</code> command:</p>\n<div class=\"highlight-wrapper group relative \">\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-rhojus0y\">$ fly storage create\nChoose a name, use the default, or leave blank to generate one: xe-foo-images\nYour Tigris project (xe-foo-images) is ready. See details and next steps with: https://fly.io/docs/reference/tigris/\n\nSetting the following secrets on xe-foo:\nAWS_REGION\nBUCKET_NAME\nAWS_ENDPOINT_URL_S3\nAWS_ACCESS_KEY_ID\nAWS_SECRET_ACCESS_KEY\n\nSecrets are staged for the first deployment\n</code></pre>\n  </div>\n</div>\n<p>All you have to do is fill in a bucket name and hit enter. All of the configuration for the AWS S3 library is injected into your application for you, and you don’t even need to change the libraries you’re using. <a href='https://www.tigrisdata.com/docs/sdks/s3/' title=''>The Tigris examples</a> all use the AWS libraries to put and delete objects in Tigris with the same calls you use for S3.</p>\n\n<p>I know how this looks for a lot of you. It looks like we’re partnering with Tigris because we’re chicken, and we didn’t want to build something like this. Well, guess what: you’re right!</p>\n\n<p>Compute and networking: those are things we love and understand. Object storage? 
<a href='https://fly.io/blog/the-5-hour-content-delivery-network/' title=''>We already gave away the game on how we’d design a CDN for our own content</a>, and it wasn’t nearly as slick as Tigris.</p>\n\n<p>Object storage is important. It needs to be good. We did not want to half-ass it. So we partnered with Tigris so that they can put their full resources into making object storage as ✨magical✨ as Fly.io is.</p>\n\n<p>This also mirrors a lot of the Unix philosophy of days gone by: individual parts that each do one thing very well, chained together to create a composite result. I mean, come on, would you seriously want to buy your servers in the same place you buy your shoes?</p>\n<h2 id='one-bill-to-rule-them-all' class='group flex items-start whitespace-pre-wrap relative mt-14 sm:mt-16 mb-4 text-navy-950 font-heading'><a class='inline-block align-text-top relative top-[.15em] w-6 h-6 -ml-6 after:hash opacity-0 group-hover:opacity-100 transition-all' href='#one-bill-to-rule-them-all' aria-label='Anchor'></a><span class='plain-code'>One bill to rule them all</span></h2>\n<p>Well, okay, the main reason why you would want to do that is because having everything under one bill makes it really easy for your accounting people. So we’ve wrapped your compute, your block storage, your databases, your networking, and your object storage under one bill. You don’t have to create separate accounts with Supabase or Upstash or PlanetScale or Tigris. Everything gets charged to your Fly.io account and you pay one bill per month.</p>\n<aside class=\"right-sidenote\"><p>This was actually going to be posted on Valentine’s Day, but we had to wait for the chocolate to go on sale.</p>\n</aside>\n<p>This is our Valentine’s Day gift to you all. Object storage that just works. 
Stay tuned, because we have a couple of exciting features that build on the integration of Fly.io and Tigris to enable really unique things, such as truly global static website hosting and turning your bucket into a CDN in 5 minutes at most.</p>\n\n<p>Here’s to many more happy developer days to come.</p>",
      "image": {
        "url": "https://fly.io/blog/tigris-public-beta/assets/tigris-public-beta-thumb.webp",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    },
    {
      "id": "https://fly.io/blog/gpu-ga/",
      "title": "GPUs on Fly.io are available to everyone!",
      "description": null,
      "url": "https://fly.io/blog/gpu-ga/",
      "published": "2024-02-12T00:00:00.000Z",
      "updated": "2024-04-24T22:38:38.000Z",
      "content": "<div class=\"lead\"><p>Fly.io makes it easy to spin up compute around the world, now including powerful GPUs. Unlock the power of large language models, text transcription, and image generation with our datacenter-grade muscle!</p>\n</div>\n<p>GPUs are now available to everyone!</p>\n\n<p>We know you’ve been excited about wanting to use GPUs on Fly.io and we’re happy to announce that they’re available for everyone. If you want, you can spin up GPU instances with any of the following cards:</p>\n\n<ul>\n<li>Ampere A100 (40GB) <code>a100-40gb</code>\n</li><li>Ampere A100 (80GB) <code>a100-80gb</code>\n</li><li>Lovelace L40s (48GB) <code>l40s</code>\n</li></ul>\n\n<p>To use a GPU instance today, change the <code>vm.size</code> for one of your apps or processes to any of the above GPU kinds. Here’s how you can spin up an <a href='https://ollama.ai' title=''>Ollama</a> server in seconds:</p>\n<div class=\"highlight-wrapper group relative toml\">\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-9 -mr-0.5 top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-wrap-target=\"#code-bcyvgy6u\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\" stroke-linecap=\"round\" stroke-linejoin=\"round\"><g buffered-rendering=\"static\"><path d=\"M9.912 8.037h2.732c1.277 0 2.315-.962 2.315-2.237a2.325 2.325 0 00-2.315-2.31H2.959m10.228 9.01H2.959M6.802 8H2.959\" /><path d=\"M11.081 6.466L9.533 8.037l1.548 1.571\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-9px] tail text-navy-950\">\n      Wrap text\n    </span>\n  </button>\n  <button \n    type=\"button\"\n    class=\"bubble-wrap z-20 absolute right-1.5 
top-1.5 text-transparent group-hover:text-gray-400 group-hover:hocus:text-white focus:text-white bg-transparent group-hover:bg-gray-900 group-hover:hocus:bg-gray-700 focus:bg-gray-700 transition-colors grid place-items-center w-7 h-7 rounded-lg outline-none focus:outline-none\"\n    data-copy-target=\"sibling\"\n  >\n    <svg class=\"w-4 h-4 pointer-events-none\" viewBox=\"0 0 16 16\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.35\"><g buffered-rendering=\"static\"><path d=\"M10.576 7.239c0-.995-.82-1.815-1.815-1.815H3.315c-.995 0-1.815.82-1.815 1.815v5.446c0 .995.82 1.815 1.815 1.815h5.446c.995 0 1.815-.82 1.815-1.815V7.239z\" /><path d=\"M10.576 10.577h2.109A1.825 1.825 0 0014.5 8.761V3.315A1.826 1.826 0 0012.685 1.5H7.239c-.996 0-1.815.819-1.816 1.815v1.617\" /></g></svg>\n    <span class=\"bubble-sm bubble-tl [--offset-l:-6px] tail [--tail-x:calc(100%-30px)] text-navy-950\">\n      Copy to clipboard\n    </span>\n  </button>\n  <div class='highlight relative group'>\n    <pre class='highlight '><code id=\"code-bcyvgy6u\"><span class=\"py\">app</span> <span class=\"p\">=</span> <span class=\"s\">\"your-app-name\"</span>\n<span class=\"py\">region</span> <span class=\"p\">=</span> <span class=\"s\">\"ord\"</span>\n<span class=\"py\">vm.size</span> <span class=\"p\">=</span> <span class=\"s\">\"l40s\"</span>\n\n<span class=\"nn\">[http_service]</span>\n  <span class=\"py\">internal_port</span> <span class=\"p\">=</span> <span class=\"mi\">11434</span>\n  <span class=\"py\">force_https</span> <span class=\"p\">=</span> <span class=\"kc\">false</span>\n  <span class=\"py\">auto_stop_machines</span> <span class=\"p\">=</span> <span class=\"kc\">true</span>\n  <span class=\"py\">auto_start_machines</span> <span class=\"p\">=</span> <span class=\"kc\">true</span>\n  <span class=\"py\">min_machines_running</span> <span class=\"p\">=</span> <span class=\"mi\">0</span>\n  <span class=\"py\">processes</span> <span class=\"p\">=</span> <span 
class=\"nn\">[\"app\"]</span>\n\n<span class=\"nn\">[build]</span>\n  <span class=\"py\">image</span> <span class=\"p\">=</span> <span class=\"s\">\"ollama/ollama\"</span>\n\n<span class=\"nn\">[mounts]</span>\n  <span class=\"py\">source</span> <span class=\"p\">=</span> <span class=\"s\">\"models\"</span>\n  <span class=\"py\">destination</span> <span class=\"p\">=</span> <span class=\"s\">\"/root/.ollama\"</span>\n  <span class=\"py\">initial_size</span> <span class=\"p\">=</span> <span class=\"s\">\"100gb\"</span>\n</code></pre>\n  </div>\n</div>\n<p>Deploy this and bam, large language model inferencing from anywhere. If you want a private setup, see the article <a href='https://fly.io/blog/scaling-llm-ollama/' title=''>Scaling Large Language Models to zero with Ollama</a> for more information. You never know when you have a sandwich emergency and don’t know what you can make with what you have on hand.</p>\n\n<p>We are working on getting some lower-cost A10 GPUs in the next few weeks. We’ll update you when they’re ready.</p>\n\n<p>If you want to explore the possibilities of GPUs on Fly.io, here’s a few articles that may give you ideas:</p>\n\n<ul>\n<li><a href='https://fly.io/blog/not-midjourney-bot/' title=''>Deploy Your Own (Not) MidJourney Bot On Fly GPUs</a>\n</li><li><a href='https://fly.io/blog/scaling-llm-ollama/' title=''>Scaling Large Language Models to zero with Ollama</a>\n</li><li><a href='https://fly.io/blog/transcribing-on-fly-gpu-machines/' title=''>Transcribing on Fly GPU Machines</a>\n</li></ul>\n\n<p>Depending on factors such as your organization’s age and payment history, you may need to go through additional verification steps.</p>\n\n<p>If you’ve been experimenting with Fly.io GPUs and have made something cool, let us know on the <a href='https://community.fly.io/' title=''>Community Forums</a> or by mentioning us <a href='https://hachyderm.io/@flydotio' title=''>on Mastodon</a>! We’ll boost the cool ones.</p>",
      "image": {
        "url": "https://fly.io/blog/gpu-ga/assets/gpu-ga-thumb.webp",
        "title": null
      },
      "media": [],
      "authors": [
        {
          "name": "Fly",
          "email": null,
          "url": null
        }
      ],
      "categories": []
    }
  ]
}
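The Tigris entry in the feed above notes that `fly storage create` stages standard AWS-style secrets (AWS_REGION, BUCKET_NAME, AWS_ENDPOINT_URL_S3, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY) so that existing S3 libraries work against Tigris unchanged. A minimal sketch of how a client could consume those values; the helper name and the placeholder endpoint/credential strings are ours, not from the post:

```python
def s3_client_kwargs(env):
    """Map the injected Fly.io/Tigris secrets onto the keyword arguments an
    S3-compatible client (e.g. boto3.client("s3", ...)) expects."""
    return {
        "endpoint_url": env["AWS_ENDPOINT_URL_S3"],
        "region_name": env["AWS_REGION"],
        "aws_access_key_id": env["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": env["AWS_SECRET_ACCESS_KEY"],
    }

# Placeholder values; on Fly.io these would come from the staged secrets
# (e.g. via os.environ).
env = {
    "AWS_ENDPOINT_URL_S3": "https://fly.storage.tigris.dev",
    "AWS_REGION": "auto",
    "AWS_ACCESS_KEY_ID": "tid_example",
    "AWS_SECRET_ACCESS_KEY": "tsec_example",
    "BUCKET_NAME": "xe-foo-images",
}
kwargs = s3_client_kwargs(env)
# boto3 usage would then be:
#   boto3.client("s3", **kwargs).put_object(
#       Bucket=env["BUCKET_NAME"], Key="hello.txt", Body=b"hi")
```

Because the secrets use the conventional AWS variable names, most AWS SDKs can also pick up the credentials (and, in newer SDK versions, the service-specific AWS_ENDPOINT_URL_S3 endpoint) straight from the environment without any explicit wiring.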